Compare multipe records in a single file



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

Compare multipe records in a single file

Postby ragsara » Tue Feb 18, 2014 11:10 pm

Hi ,

Please find my below requirement.
1) Need to compare multiple records for the same key ( Key being the ID)
2) Pull all records for the same key if there are any mismatch between First Name and DOB within multiple records for the same key.
3) Field Lengths - ID : 11 , First Name : 15 , Last NAme : 15 , DOB : 10 - Total : 51
4) I have been analysing GROUP option and IFTHEN option and KEYBEGIN options, but couldn arrive at the correct code. Please guide me through the solution

Input :

ID           First Name        Last Name         DOB
1            Jack              Daniels           1955-04-19
1            Jack              Daniels           1955-04-19
1            Jck               Daniels           1954-04-19
384          Joaan             Stanley           1996-04-01
384          Joaan             Stanley           1996-04-01
384          Joaan             Stanley           1996-04-01
384          Joaan             Stanley           1996-04-01
4789         William           Forster           1992-06-12
4789         William           Forter            1991-06-12

Output :

ID           First Name        Last Name         DOB
1            Jack              Daniels           1955-04-19
1            Jack              Daniels           1955-04-19
1            Jck               Daniels           1954-04-19
4789         William           Forster           1992-06-12
4789         William           Forter            1991-06-12

ragsara
 
Posts: 23
Joined: Thu Mar 22, 2012 5:17 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Compare multipe records in a single file

 

Re: Compare multipe records in a single file

Postby skolusu » Wed Feb 19, 2014 2:56 am

Do you need the entire group of keys or can it be just 1 key and the mismatched key?

for example for key 1

1          JACK           DANIELS        1955-04-19
1          JCK            DANIELS        1954-04-19


Instead of
1          JACK           DANIELS        1955-04-19
1          JACK           DANIELS        1955-04-19
1          JCK            DANIELS        1954-04-19
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
skolusu
 
Posts: 586
Joined: Wed Apr 02, 2008 10:38 pm
Has thanked: 0 time
Been thanked: 39 times

Re: Compare multipe records in a single file

Postby ragsara » Wed Feb 19, 2014 3:21 am

Hi Kolusu,

One record should be fine, but we need to make sure if we do not miss out if there are more than one mismatched records for a unique key.

I will later collect the set of keys from the output and extract their corresponding records from initial file through JOINKEYS.
ragsara
 
Posts: 23
Joined: Thu Mar 22, 2012 5:17 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Compare multipe records in a single file

Postby skolusu » Wed Feb 19, 2014 11:48 pm

ragsara wrote:Hi Kolusu,

One record should be fine, but we need to make sure if we do not miss out if there are more than one mismatched records for a unique key.

I will later collect the set of keys from the output and extract their corresponding records from initial file through JOINKEYS.


Well if you are running the joinkeys once again then why not simply say that you need the complete set of records? Use the following ICETOOL JCL which will give you the desired results

//STEP0100 EXEC PGM=ICETOOL                                   
//TOOLMSG  DD SYSOUT=*                                       
//DFSMSG   DD SYSOUT=*                                       
//IN       DD DISP=SHR,DSN=Your Input 51 byte FB file           
//T1       DD DSN=&&T1,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)   
//OUT      DD SYSOUT=*                                       
//TOOLIN   DD *                                             
  SELECT FROM(IN) TO(T1) ON(01,11,UFF) ALLDUPS USING(CTL1)   
  COPY JKFROM TO(OUT) USING(CTL2)                           
//*                                                         
//CTL1CNTL DD *                                             
  SORT FIELDS=(01,11,UFF,A,                                 
               12,15,CH,A,                                   
               27,15,CH,A,                                   
               42,10,CH,A)                                   
  SUM FIELDS=NONE                                           
  OUTFIL FNAMES=T1,REMOVECC,NODETAIL,BUILD=(11X),           
  SECTIONS=(1,11,TRAILER3=(1,11))                           
//*                                                         
//CTL2CNTL DD *                                             
  JOINKEYS F1=IN,FIELDS=(1,11,A)                             
  JOINKEYS F2=T1,FIELDS=(1,11,A),SORTED,NOSEQCK             
  REFORMAT FIELDS=(F1:1,51)                                 
//*                                                         
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
skolusu
 
Posts: 586
Joined: Wed Apr 02, 2008 10:38 pm
Has thanked: 0 time
Been thanked: 39 times

Re: Compare multipe records in a single file

Postby ragsara » Fri Feb 21, 2014 2:37 am

Hi Kolusu,

Thanks for your help. I have a huge number of records to be processed, the total number comes around 450 Million records.

Because of this huge number of records,job is failing with RC 0016. (I have provided the code and the error messages below)

PLease let me know if we can dynamically increase the main storage to handle huge records or is there any other way to handle this error.

JCL :

//STEP0100 EXEC PGM=ICETOOL                                     
//TOOLMSG  DD SYSOUT=*                                         
//DFSMSG   DD SYSOUT=*                                         
//IN       DD DISP=SHR,DSN=F6435T.ALLSRC.BOB.D140218           
//T1       DD DSN=&&T1,DISP=(,PASS),SPACE=(CYL,(4000,4000),RLSE)
//OUT      DD DSN=F6435T.ALLSRC.BOB.D140218.ICE,               
//            DISP=(NEW,CATLG,DELETE),                         
//            DCB=(LRECL=302,RECFM=FB,BLKSIZE=0),               
//            SPACE=(CYL,(4000,4000),RLSE)                     
//SORTWK01 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK02 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK03 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK04 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK05 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK06 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK07 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK08 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK09 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))             
//SORTWK10 DD UNIT=SYSALLDA,SPACE=(CYL,(4200,4500))   
//TOOLIN   DD *                                           
  SELECT FROM(IN) TO(T1) ON(03,11,UFF) ALLDUPS USING(CTL1)
  COPY JKFROM TO(OUT) USING(CTL2)                         
/*                                                       
//CTL1CNTL DD *                                           
  SORT FIELDS=(03,11,UFF,A,                               
               198,35,CH,A,                               
               233,60,CH,A,                               
               293,10,CH,A)                               
  SUM FIELDS=NONE                                         
  OUTFIL FNAMES=T1,REMOVECC,NODETAIL,BUILD=(11X),         
  SECTIONS=(3,11,TRAILER3=(3,11))                         
/*                                                       
//CTL2CNTL DD *                                           
  JOINKEYS F1=IN,FIELDS=(3,11,A)                         
  JOINKEYS F2=T1,FIELDS=(1,11,A),SORTED,NOSEQCK           
  REFORMAT FIELDS=(F1:1,302)                             
/*                                                                 


Error message
TOOLMSG
ICE632I 0 SOURCE FOR ICETOOL STATEMENTS:  TOOLIN                               
                                                                               
                                                                               
ICE630I 0 MODE IN EFFECT:  STOP                                               
                                                                               
            SELECT FROM(IN) TO(T1) ON(03,11,UFF) ALLDUPS USING(CTL1)           
ICE606I 0 DFSORT CALL 0001 FOR SORT  FROM IN       TO T1       USING CTL1CNTL COMPLETED
ICE628I 0 RECORD COUNT:  000000216466627                                       
ICE638I 0 NUMBER OF RECORDS RESULTING FROM CRITERIA:  000000031195324         
ICE602I 0 OPERATION RETURN CODE:  00                                           
                                                                               
            COPY JKFROM TO(OUT) USING(CTL2)                                   
ICE606I 0 DFSORT CALL 0002 FOR COPY  FROM JOINKEYS TO OUT      USING CTL2CNTL TERMINATED
ICE602I 0 OPERATION RETURN CODE:  16                                           
                                                                               
                                                                               
ICE601I 0 DFSORT ICETOOL UTILITY RUN ENDED - RETURN CODE:  16                 

DFSMSG
ICE039A 1 INSUFFICIENT MAIN STORAGE - ADD AT LEAST 28K BYTES
ragsara
 
Posts: 23
Joined: Thu Mar 22, 2012 5:17 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Compare multipe records in a single file

Postby skolusu » Fri Feb 21, 2014 3:05 am

ragsara wrote:Hi Kolusu,

Thanks for your help. I have a huge number of records to be processed, the total number comes around 450 Million records.

Because of this huge number of records,job is failing with RC 0016. (I have provided the code and the error messages below)

PLease let me know if we can dynamically increase the main storage to handle huge records or is there any other way to handle this error.
ICE039A 1 INSUFFICIENT MAIN STORAGE - ADD AT LEAST 28K BYTES


ragsara,

Since a JOINKEYS application uses three tasks, it can require more storage than a regular DFSORT application. You may need to use REGION=0M for some JOINKEYS applications. So Add REGION=0M to your job like this

PGM=ICETOOL,REGION=0M 


You need to give us the complete details and we could have suggested an alternative as the first sort doesn't require the entire 302 bytes to be sorted. You can just pick the fields you want to test using INREC statement.

Also get rid off your JCL sortwork allocations and use the dynamic allocation of DFSORT which allocates work datasets as needed. so add
OPTION DYNALLOC=(SYSALLDA,50) which will allocate the sortworks needed for your job.

Change your SELECT statement to the following
//TOOLIN   DD *                                             
  SELECT FROM(IN) TO(T1) ON(01,11,UFF) ALLDUPS USING(CTL1)
//*


Change your CTL1CNTL to the following

//CTL1CNTL DD * 
  INREC BUILD=(3,11,198,35,233,60,293,10)
                       
  OPTION DYNALLOC=(SYSALLDA,50)
                 
  SORT FIELDS=(01,011,UFF,A,
               12,105,CH,A)
                               
  SUM FIELDS=NONE                                         
  OUTFIL FNAMES=T1,REMOVECC,NODETAIL,BUILD=(11X),         
  SECTIONS=(1,11,TRAILER3=(1,11))                         
/*


Change your CTL2CNTL to the following
//CTL2CNTL DD *       
  OPTION DYNALLOC=(SYSALLDA,50)
  JOINKEYS F1=IN,FIELDS=(3,11,A)                         
  JOINKEYS F2=T1,FIELDS=(1,11,A),SORTED,NOSEQCK           
  REFORMAT FIELDS=(F1:1,302)                             
/*
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
skolusu
 
Posts: 586
Joined: Wed Apr 02, 2008 10:38 pm
Has thanked: 0 time
Been thanked: 39 times

Re: Compare multipe records in a single file

Postby ragsara » Mon Feb 24, 2014 10:56 pm

Hi Kolusu,

I tried your code and its pulling all the records into output file, irrespective of the fact whether the key has same first name or different first name, same DOB or different DOB.

Just a recap of my requirements, i need those records whose key has different first name and different DOB among its set of records.

Please find my below testing of your code.

Input file :
Key : Starting position : 3 , Length : 11

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
***************************** Top of Data ******************************
CO1          PNX2010400708380400                                       
CS1          02011400002734910900001                                   
PA1          IL-CO-359002043-00                                         
PA1          NI-22-7083804-01                                           
PA1          RI-11-0303460-01-E                                         
CO21         PNX2071700708382500                                       
CS21         07024350052238821000001                                   
PA21         IL-CO-334002568-00                                         
PA21         NI-22-7083825-01                                           
CO41         PNX2010400143858100                                       
PA41         IL-CO-359002207-00                                         
PA41         NI-22-1438581-01                                           
PA41         NI-22-7083847-01                                           
PD41         002487161_00000000000056362851_S260_00008444               
PA61         IL-CO-359001976-00                                         
PA61         NI-22-7083871-01                                           
PD61         002488020_00000000000056436101_S260_00008444                 


First name : Starting position : 198 , Length : 35

4----+----5----+----6----+----7----+----8----+----9----+----0----+----1-
***************************** Top of Data ******************************
                                                          GRAHAM       
                                                          GRAHAM       
                                                          GRAHAM       
                                                          GRAHAM       
                                                          GRAHAM       
                                                          LYLE         
        834495305                                         LYLE         
                                                          LYLE         
                                                          LYLEE         
                                                          LAWRENCE     
                                                          LAWRENCE A   
                                                          LAWRENCE     
                                                          LAWRENCE     
        7083847                                           LAWRENCE     
                                                          JOE           
                                                          JOEY         
        7083871                                           JOE           


DOB : Starting position : 293, Length : 10

----+----4----+----5----+----6----+----7----+----8----+----9----+----0--
***************************** Top of Data ******************************
  BARGER                                                      1939-05-29
  BARGER                                                      1939-05-29
  BARGER                                                      1939-05-29
  BARGER                                                      1939-05-29
  BARGER                                                      1939-05-29
  PHIPPS                                                      1935-12-22
  PHIPPS                                                      1935-12-22
  PHIPPS                                                      1935-12-22
  PHIPPS                                                      1945-12-22
  CORIAIN                                                     1934-04-11
  CORIAIN                                                     1934-04-11
  CORIAIN                                                     1934-04-11
  CORIAIN                                                     1934-04-11
  CORIAIN                                                     1934-04-11
  SHEARRIN                                                    1931-03-19
  SHEARRIN                                                    1931-04-19
  SHEARRIN                                                    1931-03-19


JCL

//STEP0100 EXEC PGM=ICETOOL                                 
//TOOLMSG  DD SYSOUT=*                                     
//DFSMSG   DD SYSOUT=*                                     
//IN       DD DISP=SHR,DSN=F6435T.ALLSRC.BOB.D140218.SAMPLE
//T1       DD DSN=F6435T.ALLSRC.BOB.D140218.ICE.TEMP,       
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=11,RECFM=FB,BLKSIZE=0),           
//            SPACE=(CYL,(4000,4000),RLSE)                 
//OUT      DD DSN=F6435T.ALLSRC.BOB.D140218.ICE,           
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=302,RECFM=FB,BLKSIZE=0),           
//            SPACE=(CYL,(4000,4000),RLSE)                 
//TOOLIN   DD *                                             
  SELECT FROM(IN) TO(T1) ON(03,11,UFF) ALLDUPS USING(CTL1) 
  COPY JKFROM TO(OUT) USING(CTL2)                           
/*                                                         
//CTL1CNTL DD *                                 
  INREC BUILD=(3,11,198,35,293,10)               
  OPTION DYNALLOC=(SYSALLDA,50)                 
  SORT FIELDS=(01,011,UFF,A,12,45,CH,A)         
  SUM FIELDS=NONE                               
  OUTFIL FNAMES=T1,REMOVECC,NODETAIL,BUILD=(11X),
  SECTIONS=(1,11,TRAILER3=(1,11))               
/*                                               
//CTL2CNTL DD *                                 
  OPTION DYNALLOC=(SYSALLDA,50)                 
  JOINKEYS F1=IN,FIELDS=(3,11,A)                 
  JOINKEYS F2=T1,FIELDS=(1,11,A),SORTED,NOSEQCK 
  REFORMAT FIELDS=(F1:1,302)                     
/*                                               


Output file : T1

 BROWSE    F6435T.ALLSRC.BOB.D140218.ICE.TEMP         Line 00000000 Col 001 011
 Command ===>                                                  Scroll ===> CSR 
********************************* Top of Data **********************************
1                                                                               
21                                                                             
41                                                                             
61                                                                             
******************************** Bottom of Data ********************************


Output file : OUT

 BROWSE    F6435T.ALLSRC.BOB.D140218.ICE              Line 00000000 Col 0
 Command ===>                                                  Scroll ===
********************************* Top of Data ***************************
CO1          PNX2010400708380400                                         
CS1          02011400002734910900001                                     
PA1          IL-CO-359002043-00                                         
PA1          NI-22-7083804-01                                           
PA1          RI-11-0303460-01-E                                         
CO21         PNX2071700708382500                                         
CS21         07024350052238821000001                                     
PA21         IL-CO-334002568-00                                         
PA21         NI-22-7083825-01                                           
CO41         PNX2010400143858100                                         
PA41         IL-CO-359002207-00                                         
PA41         NI-22-1438581-01                                           
PA41         NI-22-7083847-01                                           
PD41         002487161_00000000000056362851_S260_00008444               
PA61         IL-CO-359001976-00                                         
PA61         NI-22-7083871-01                                           
PD61         002488020_00000000000056436101_S260_00008444               


To sum up the testing :

Input file has 4 keys : 1 , 21 , 41 , 61

Key 1 : Same First name (GRAHAM) and Same DOB (1939-05-29) for all records - Should not be written into output file
Key 21 : Different First name (LYLE & LYLEE) and Different DOB (1935-12-22 & 1945-12-22) - Should be wrritten into output file
Key 41 : Different First name (LAWRENCE & LAWRENCE A) and same DOB (1934-04-11) - Should not be wrriten into output file
Key 61 : Different First name (JOE & JOEY) and Different DOB (1931-03-19 & 1931-04-19) - Should be written into output file.

But in my testing all records form input file has been written into output file.

But my expected test cases were as below :

When a record has both ( as that of Key 21 and 61), different first name and different DOB from other set of records belonging to same key , it has to be written in output file.
When a records has different first name but same DOB (Key 41) or vice versa (Same first name and different DOB), then it has to be skipped and shouldnt be written into output file.

Hope am clear, a bit complex requirement hoping to have a solution from you.

Thanks in advance and being patient in giving your solution.
ragsara
 
Posts: 23
Joined: Thu Mar 22, 2012 5:17 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Compare multipe records in a single file

Postby skolusu » Tue Feb 25, 2014 1:23 am

ragsara wrote:Hi Kolusu,

I tried your code and its pulling all the records into output file, irrespective of the fact whether the key has same first name or different first name, same DOB or different DOB.

Just a recap of my requirements, i need those records whose key has different first name and different DOB among its set of records.

Please find my below testing of your code.

//TOOLIN   DD *                                            
  SELECT FROM(IN) TO(T1) ON(03,11,UFF) ALLDUPS USING(CTL1)  
//CTL1CNTL DD *                                  
  INREC BUILD=(3,11,198,35,293,10)              
 




Ragsara,

You really NEED to pay attention to the control cards suggested. You FORGOT to make the CHANGE to the positions on your SELECT operator when you used INREC in CTL1. I specifically mentioned about changing the SELECT statement in this post. dfsort-icetool-icegener/topic9776.html#p48743 but you seem to ignore that and hence the wrong results. Correct that and you will see that you will not pick up the key 1.



ragsara wrote:Input file has 4 keys : 1 , 21 , 41 , 61

Key 1 : Same First name (GRAHAM) and Same DOB (1939-05-29) for all records - Should not be written into output file
Key 21 : Different First name (LYLE & LYLEE) and Different DOB (1935-12-22 & 1945-12-22) - Should be wrritten into output file
Key 41 : Different First name (LAWRENCE & LAWRENCE A) and same DOB (1934-04-11) - Should not be wrriten into output file
Key 61 : Different First name (JOE & JOEY) and Different DOB (1931-03-19 & 1931-04-19) - Should be written into output file.

But in my testing all records form input file has been written into output file.


You seem to be going in circles. Your testing sample shows ONLY 3 key fields and your real data has 4 key fields and you don't even mention the rules about the missing field. I cannot read your mind nor your data, so you need to do a really really good job of explaining everything in detail.

Show just the data of 4 fields in question mentioning the positions and format of them and write down ALL the rules. Show records to match each rule and expected output.
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
skolusu
 
Posts: 586
Joined: Wed Apr 02, 2008 10:38 pm
Has thanked: 0 time
Been thanked: 39 times

Re: Compare multipe records in a single file

Postby ragsara » Tue Feb 25, 2014 2:25 am

Hi Kolusu,

Apologize for not updating correct positions of fields in SELECT statement.

Detailed explanation of my requirement :

Please find the below fields, positions and their formats involved in testing

Field Name : ID (Key) , Starting position : 3 , Length : 11 , Format : Numenric
Field Name : First Name , Starting position : 198 , Length : 35 , Format : Text
Field Name : DOB , Starting position : 293 , Length : 10 , Format : YYYY-MM-DD

Rules :

1) For a given unique key, if a record has DIFFERENT FIRST NAME AND DIFFERENT DOB from other records belonging to the same key, then the entire set of records belonging to the unique key has to be written into output file. (Second record in the below mentioned example)

For Eg)
----+----1----+----2----+----3----+----4----+----5----+-
***************************** Top of Data **************
ID         FIRST NAME                         DOB       
1          BARGER                             1939-05-29
1          BARGERET                           1949-06-19
1          BARGER                             1939-05-29
1          BARGER                             1939-05-29


2) For a given key, if a record has SAME FIRST NAME but DIFFERENT DOB from other records belonging to the same key, then the entire set of records belonging to the unique key needs to be skipped and should NOT be written into output file. (Second record in the below mentioned example)

For Eg )

----+----1----+----2----+----3----+----4----+----5----+-
***************************** Top of Data **************
ID         FIRST NAME                         DOB       
1          BARGER                             1939-05-29
1          BARGER                             1949-06-19
1          BARGER                             1939-05-29
1          BARGER                             1939-05-29


3) For a given key, if a record has DIFFERENT FIRST NAME but SAME DOB from other records belonging to the same key, then the entire set of records belonging to the unique key needs to be skipped and should NOT be written into output file. (Second record in the below mentioned example)

For Eg )

----+----1----+----2----+----3----+----4----+----5----+-
***************************** Top of Data **************
ID         FIRST NAME                         DOB       
1          BARGER                             1939-05-29
1          BARGERET                           1939-05-29
1          BARGER                             1939-05-29
1          BARGER                             1939-05-29


4) For a given key, if all the records have SAME FIRST NAME AND SAME DOB , then the entire set of records belonging to the unique key needs to be skipped and should NOT be written into output file.

For Eg )

----+----1----+----2----+----3----+----4----+----5----+-
***************************** Top of Data **************
ID         FIRST NAME                         DOB       
1          BARGER                             1939-05-29
1          BARGER                             1939-05-29
1          BARGER                             1939-05-29
1          BARGER                             1939-05-29


Please let me know if you need more information or clarification on my requirement.
ragsara
 
Posts: 23
Joined: Thu Mar 22, 2012 5:17 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Compare multipe records in a single file

Postby skolusu » Tue Feb 25, 2014 5:32 am

ragsara,

So we are sticking with 3 keys to compare instead of 4 keys? You need to be clear on this as the solution will differ if you have 4 keys or 3 keys
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
skolusu
 
Posts: 586
Joined: Wed Apr 02, 2008 10:38 pm
Has thanked: 0 time
Been thanked: 39 times

Next

Return to DFSORT/ICETOOL/ICEGENER

 


  • Related topics
    Replies
    Views
    Last post