How to compare a filed of file1 with values in file2



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

Re: How to compare a filed of file1 with values in file2

Postby lakshmipathik » Sun Apr 14, 2013 1:25 am

Hi,
Based on Skolusu's example given to me, I have modified my code according to my requirement and started running now,
it is failing with 'SORT CAPACITY EXCEEDED - RECORD COUNT 88792978'

I tried giving options like below, but still it is giving same error. Please let me know how can I overcome from this issue.
Option 1:
1) //SORTWKXX DD UNIT=SYSDA,SPACE=(CYL,(6200,6200),RLSE) - -like this I have given 29 files

Option 2: remove SORTWK files and instead add below
//DFSPARM DD *
OPTION DYNALLOC=(SYSDA,255)
/*

My current code is below
//SAUSRT02 EXEC PGM=ICETOOL,                                         
//             COND=(0,NE)                                           
//TOOLMSG  DD  SYSOUT=A,FLASH=F003,                                   
//             DEST=LOCAL                                             
//DFSMSG   DD  SYSOUT=A,FLASH=F003,                                   
//             DEST=LOCAL                                             
//SYSPRINT DD  SYSOUT=A,FLASH=F003,                                   
//             DEST=LOCAL                                             
//SYSOUT   DD  SYSOUT=A,FLASH=F003,                                   
//             DEST=LOCAL                                             
//INFILE1  DD  DSN=A.B.C, - - - - 25 million records             
//             DISP=SHR                                               
//INFILE2  DD  DSN=B.C.D.,DISP=SHR - - - - -17000 RECORDS       
//TEMP1    DD  DSN=TEMP1,DISP=(NEW,CATLG,DELETE),
//             DCB=(SRCDSCB,RECFM=FB,LRECL=1025),                     
//             SPACE=(1025,(50,50),RLSE),AVGREC=K                     
//TEMP2    DD  DSN=TEMP2,DISP=(NEW,CATLG,DELETE),
//             DCB=(SRCDSCB,RECFM=FB,LRECL=1025),                     
//             SPACE=(1025,(50,50),RLSE),AVGREC=K                     
//TEMP3    DD  DSN=TEMP3,DISP=(NEW,CATLG,DELETE),
//             DCB=(SRCDSCB,RECFM=FB,LRECL=1024),                     
//             SPACE=(1024,(50,50),RLSE),AVGREC=K                     
//TEMP5    DD  DSN=&&TWMC5,DISP=(NEW,DELETE,DELETE)               
//TEMP6    DD  DSN=&&TWMC6,DISP=(MOD,DELETE,DELETE)               
//DFSPARM  DD  *                                                   
  OPTION DYNALLOC=(SYSDA,255)                                     
/*                                                                 
//TOOLIN   DD  *                                                   
  COPY   FROM(INFILE1) TO(TEMP1)     USING(XBI0)                   
  COPY   JKFROM        TO(TEMP2)     USING(XBI1)                   
  COPY   FROM(TEMP2)   TO(TEMP3)     USING(XBI2)                   
/*                                                                 
//XBI0CNTL DD  *                                                   
  OUTREC FIELDS=(1,37,38,2,BI,TO=ZD,LENGTH=03,40,985)             
/*                                                                 
//XBI1CNTL DD  *                                                   
  JOINKEYS F1=TEMP1,FIELDS=(1026,1,A),SORTED,NOSEQCK               
  JOINKEYS F2=INFILE2,FIELDS=(64,1,A),SORTED,NOSEQCK               
  REFORMAT FIELDS=(F1:1,1034,F2:2,12)                             
  INREC IFOUTLEN=1034,                                             
  IFTHEN=(WHEN=((960,1,CH,EQ,C'N'),AND,(1041,4,SS,EQ,C'**'),AND,   
                (38,3,ZD,GE,1035,3,ZD),AND,(38,3,ZD,LE,1038,3,ZD)),
  BUILD=(1,574,1045,2,577,449,1027,8,X)),                         
  IFTHEN=(WHEN=((960,1,CH,EQ,C'N'),AND,(131,4,ZD,EQ,1041,4,ZD),AND,
                (38,3,ZD,GE,1035,3,ZD),AND,(38,3,ZD,LE,1038,3,ZD)), 
  BUILD=(1,574,1045,2,577,449,1027,8,X)),                           
  IFTHEN=(WHEN=((960,1,CH,EQ,C'Y'),AND,(131,4,ZD,EQ,1041,4,ZD),AND, 
                (1035,3,SS,EQ,C'**'),AND,(1038,3,SS,EQ,C'**')),     
  BUILD=(1,574,1045,2,577,449,1027,8,X)),                           
  IFTHEN=(WHEN=NONE,                                                 
  BUILD=(1,574,C'OT',577,449,1027,8,C'1'))                           
                                                                     
  SORT FIELDS=(1026,9,CH,A)                                         
                                                                     
  OUTFIL REMOVECC,NODETAIL,BUILD=(1025X),                           
  SECTIONS=(1026,8,HEADER3=(1,1025))                                 
/*                                                                   
//JNF1CNTL DD *                                                     
  INREC BUILD=(1,1025,X,SEQNUM,8,ZD)                                 
//*                                                                 
//JNF2CNTL DD *                                                     
  INREC BUILD=(1,63,X)                                               
//*                                                                 
/*                                                                   
//XBI2CNTL DD  *                                                     
  OUTREC FIELDS=(1,37,38,2,BI,TO=ZD,LENGTH=03,40,985)               
/*
lakshmipathik
 
Posts: 12
Joined: Thu May 08, 2008 11:14 am
Location: bangalore
Has thanked: 5 times
Been thanked: 0 time

Re: Re: How to compare a filed of file1 with values in file2

 

Re: How to compare a filed of file1 with values in file2

Postby BillyBoyo » Sun Apr 14, 2013 2:08 am

Can you just take a pause.

I don't think that if he'd known of your volumes Kolusu would have suggested doing it like this.

You have failed at 88,000,000 records (or so). You are due to process 425,000,000,000, meaning you are only about 1/4800th of the way through the task.

That is just the number of records. That's more that 425,000,000 Megabytes, or 425,000 Gigabytes.

Do you really want to try to get that to run?

So, don't do any more runs for now.

Fully explain what you are doing, in all its detail, please.
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 264 times

Re: How to compare a filed of file1 with values in file2

Postby lakshmipathik » Sun Apr 14, 2013 3:59 am

Hi Billyboyo,

I tried to explain as much as possible.

Existing: In our process we are reading a file (with 25M records) and based on 2 fields (X, Y) from the input file, we are calling a subprogram(which has a seperate logic and process) and getting corresponding entry for the matching fields and update the same in the input file.

Current:
Now we are no longer suppose to use the above subprogram to get the above said corresponding entries.
Instead we will be given a extract (with 17000 records) data shown like below
Now what we need to do is we have to read the same input file(I WILL CALL IT AS FILE1) used above (with 25m records) and search for the required entries in the latest extract(i WILL CALL IT AS FILE2) given to us based on below condition and if matches get the corresponding entries (a field value from pos 12-13) and update it in the input file read-FILE1, if not matching update with default value.

Values will be using for comparison from Input file - FILE1:
POSITION   FIELD NAME     DECLARATION
38-39   - bus-code       (declaration 01 bus-CODE   PIC S9(04) COMP) and it will have 3 digit value always.eX: 600
575-576   - com-code     (declaration 01 com-code   pic x(2))
131-134   - code-first    (declaration 01 code-first pic x(4))
959         - ind         (declaration 01 ind        pic x(1))   AND it will always have value either 'Y' or 'N' to indicate if it valid record or not


Values will be using for comparison from Input file - FILE2:
POSITION   FIELD NAME
2-4      -  MIN
5-7      - MAX
8-11     - CODE-FIRST   and
12-13    - COM-CODE




Conditions:

If ind = 'Y'(from FILE1)
Search if code-first from FILE1 is within All valid code-first field listed in FILE2
If found
get COM-CODE from FILE2 and updated the same in FILE1 at positino 575 and 576
else
update FILE1 position 575 and 576 with default value 'XX'


If ind = 'N'(from FILE1)
Search if (BUS-CODE from FILE1 is within the range of the MIN and MAX fields of FILE2) and (If CODE-FIRST from FILE2 = '**' )
get COM-CODE from FILE2 and updated the same in FILE1 at positino 575 and 576
else
Search if(BUS-CODE from FILE1 is within the range of the MIN and MAX fields of FILE2) and
if (code-first from FILE1 is within All valid code-first listed in FILE2)
If found
get COM-CODE from FILE2 and updated the same in FILE1 at positino 575 and 576
else
update FILE1 position 575 and 576 with default value 'XX'.

FILE2:
----+----1----+----2----+----3----+----4----+----5----+----6---
N600689**  ABXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
N691693**  BCyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
N1901997711DEZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
Y** ** 5199FGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
Y** ** 8527GHKKKKKKKKKKKKKKKKKKK
(ie. I have below values at the positions
Position value   FIELD
1           N  - IND
2-4       600  - MIN
5-7       692  - MAX
8-11     '**  '- CODE-FIRST   and
12-13     AB  - COM-CODE)
lakshmipathik
 
Posts: 12
Joined: Thu May 08, 2008 11:14 am
Location: bangalore
Has thanked: 5 times
Been thanked: 0 time

Re: How to compare a filed of file1 with values in file2

Postby BillyBoyo » Sun Apr 14, 2013 4:29 am

After your first change, Kolusu gave you a solution with a "cartesian join". This means every record on file 1 matches every record on file 2.

The "cartesian product" which results is then sorted.

There is no way Kolusu would have done that if there was so much as a hint about the number of records on your files.

I think you need to explain why you can no longer use the Cobol sub-routine. SORT does not have file loop-ups. I can't think of a reasonable SORT solution for the look-up without passing the data twice, other than the case of a "SORT Exit", which would be a Cobol sub-routine, which DFSORT calls instead of a Cobol program calling it.

I've not even looked at all the other new processing.

If you are replacing some processing, you first have to design the new processing. You can't get it done by just saying "let's use SORT, get some code from the web, and go with that".
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 264 times

Re: How to compare a filed of file1 with values in file2

Postby lakshmipathik » Sun Apr 14, 2013 4:40 am

Hi BillyBoyo,
Thanks for the clarification.
Actually, I first thought of using COBOL only as it involves several loopings and thought It will take lot more processing time, since my input file is huge. I was looking at the sort by thinking it will reduce lot of processing time.

just for curiosity, One more question, If I want to use the above JCL which I have developed, then until how many number of input records it can handle.

regards,
..lakshmipathi.
lakshmipathik
 
Posts: 12
Joined: Thu May 08, 2008 11:14 am
Location: bangalore
Has thanked: 5 times
Been thanked: 0 time

Re: How to compare a filed of file1 with values in file2

Postby dick scherrer » Sun Apr 14, 2013 8:36 am

Hello,


If I want to use the above JCL which I have developed, then until how many number of input records it can handle.

This will depend on limitations of the run environment. . .

Suggest you cut the volume in half and try. If that fails, try 1/4, etc to learn what volume your process can handle.

There are other approaches to the result that may help . . .
Hope this helps,
d.sch.

These users thanked the author dick scherrer for the post:
lakshmipathik (Mon Apr 15, 2013 3:26 am)
User avatar
dick scherrer
Global moderator
 
Posts: 6304
Joined: Sat Jun 09, 2007 8:58 am
Has thanked: 3 times
Been thanked: 91 times

Re: How to compare a filed of file1 with values in file2

Postby BillyBoyo » Sun Apr 14, 2013 1:34 pm

If we keep your file 2 at 17,000 records, then you can process 1,471 records on file 1 (no, there is no typo there) before you exceed 25,000,000 records out of the join.

That is:

1,471 * 17,000 = 25,007,000


If you run this type of setup, I think it would be "quicker" with file 1 and file 2 "swapped", as enrico mentioned.

If you want to reduce the 17,000, you can see the calculation. The 25,000,000 is (a) the "amount of processing I can put up with", the 17,000 is (b) the "minimum look-up keys I need" and (c) is the "approximate number of records I can process".

(a) / (b) = (c)


Did you not notice anything "strange" when you ran your test data? It is important to look at the sysout, have a glance at the CPU and EXCPs when doing testing.

These users thanked the author BillyBoyo for the post:
lakshmipathik (Mon Apr 15, 2013 3:25 am)
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 264 times

Re: How to compare a filed of file1 with values in file2

Postby lakshmipathik » Mon Apr 15, 2013 3:25 am

Hi skolusu, BillyBoyo,dick scherrer,and enrico ,

Thank you so much for your valuable time spent on this post. I truly admit that I have learnt several things over this post.

Now by seeing the kind of data, cartesian join is not suitable in this scenario, and hence I am going with COBOL subroutines.

regards,
....lakshmi pathi.
lakshmipathik
 
Posts: 12
Joined: Thu May 08, 2008 11:14 am
Location: bangalore
Has thanked: 5 times
Been thanked: 0 time

Re: How to compare a filed of file1 with values in file2

Postby BillyBoyo » Mon Apr 15, 2013 4:20 am

Again, you're being a little too quick.

The IO processing of DFSORT is much faster than COBOL. You have 25m records. You might want to consider and experiment with the SORT EXIT, which can be written in COBOL, which can then do the look-up part.

Looking up a 17,000 item table 25m times is something you want to code efficiently.

An important part of getting the code right is knowing the data. The fastest way to do anything, in terms of code, is to not do it. So you'll want to find an effective method to do the look-up, which is still easy to understand and maintain.

Are you running this once-off, daily, weekly, what? Any why?

If you have any problems or questions with the look-up, post them in the Cobol part of the forum.
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 264 times

Re: How to compare a filed of file1 with values in file2

Postby skolusu » Mon Apr 15, 2013 10:31 pm

lakshmipathik wrote:INFILE1: A.B.C – LRECL=1025 – (I have around 2,50,00,000 records like below with different values)
INFILE2: B.C.D – LRECL=63 and below is the data
(I have around 17000 records like below with different values)

Second time, I ran by taking 2,29,00,000 records from INFILE1 and 17000 records from INFILE2 and it is almost 2.5 hours and still the job is running.


Woah. How hard is it to specify the LRECL and RECFM upfront. I was under the assumption that you just have 20 byte file and we can get away even with a cartesian join. Now you blow that LRECL by 50 times and the volume to 25 million.

I am not even sure if the JCL you show ran correctly as your copy statement is writing to a TEMP1 file and you are using the Same file as INPUT to JOINKEYS. Did that job even run?

Why are you even running ICETOOL? Do you have some other operators to do in the same step?

lakshmipathik wrote:just for curiosity, One more question, If I want to use the above JCL which I have developed, then until how many number of input records it can handle.


Just because something can be done, doesn't mean you have to do it. You need to think of ways to get the results.

Ideally I would load up the 17,000 records in a COBOL internal table and perform a search looking for the range. If written efficiently this code should run in less than 15 mins.

If you insist on doing in sort.

1. Find the MIN and MAX values from the 17000 file and use them as INCLUDE to filter out records from file1
2. Run file 1 thru the above min/max values and write 2 files. one which matches ALL your conditions and which are in the range of min/max values. The other one would be your unmatched file.
3. Reduce the file lengths drastically picking ONLY the fields you need in the final output.
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
skolusu
 
Posts: 586
Joined: Wed Apr 02, 2008 10:38 pm
Has thanked: 0 time
Been thanked: 39 times

PreviousNext

Return to DFSORT/ICETOOL/ICEGENER

 


  • Related topics
    Replies
    Views
    Last post