Keeping / Matching duplicate records within each input file



Post by claywilly » Sat Jun 14, 2008 2:47 am

Hello

We want to keep the duplicates within each file while splicing out the records that match records in the other file.

We have a job that runs twice a month (15th and 30th).

The file created on the 15th consists of 159-byte records and can contain duplicates.
The file created on the 30th also consists of 159-byte records and can contain duplicates; it also contains the records from the file created on the 15th.

We want to run a sort/splice job that compares both files, removes the records that match (including the matching duplicates), and produces a new file consisting of the remaining records (with their duplicates).

The whole record (159 bytes) will be compared. Keeping the records in their original order would be nice too.

Sample data:


File 1 (15th)
AAAA…
BBBB…
CCCC…
CCCC…

File 2 (30th)
AAAA…
BBBB…
CCCC…
CCCC…
DDDD…
EEEE…
FFFF…
FFFF…
GGGG…

New File
DDDD…
EEEE…
FFFF…
FFFF…
GGGG…
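
For readers who want to check a candidate solution against this sample, the requirement can be read as an order-preserving multiset difference: each record in the earlier file cancels one identical record in the later file. The sketch below is Python rather than DFSORT, and it rests on assumptions: the function name is made up for illustration, and the four-character strings stand in for the full 159-byte records that the sample truncates.

```python
from collections import Counter

def remaining_records(earlier, later):
    """Return the records of `later` that are left after each record in
    `earlier` has cancelled one identical record, keeping the original order."""
    unmatched = Counter(earlier)          # how many copies are still waiting for a match
    result = []
    for record in later:
        if unmatched[record] > 0:
            unmatched[record] -= 1        # this copy matches one from the earlier file
        else:
            result.append(record)         # no copy left to match, so keep it
    return result

# Stand-ins for the sample data (whole-record comparison, shortened to 4 bytes).
file_15th = ["AAAA", "BBBB", "CCCC", "CCCC"]
file_30th = ["AAAA", "BBBB", "CCCC", "CCCC",
             "DDDD", "EEEE", "FFFF", "FFFF", "GGGG"]

new_file = remaining_records(file_15th, file_30th)
print(new_file)
```

For the sample above this leaves DDDD, EEEE, FFFF, FFFF and GGGG, in the order they appear in the file from the 30th.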


We’ve tried all sorts of combinations, but haven’t gotten the correct results yet.
Any help would be appreciated. Thanks...

Re: Keeping / Matching duplicate records within each input file

Post by dragone007 » Mon Jun 16, 2008 8:15 pm

Hello Claywilly,

I've done something similar to your requirement. I'm posting it here assuming LRECL=80 for both files, with the key in bytes 1 to 4:

//MATCH EXEC PGM=ICETOOL
//*
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN1 DD *
AAAA11111111111
BBBB22222222222
CCCC33333333333
CCCC44444444444
//IN2 DD *
AAAA22222222222
BBBB33333333333
CCCC44444444444
CCCC55555555555
DDDD66666666666
EEEE77777777777
FFFF88888888888
FFFF99999999999
GGGG11111111111
//OUT DD SYSOUT=*
//OUT1 DD SYSOUT=*
//T1 DD DSN=&&T1,DISP=(MOD,PASS),SPACE=(CYL,(1,50)),UNIT=SYSDA
//TOOLIN DD *
  COPY FROM(IN1) USING(WK01)
  COPY FROM(IN2) USING(WK02)
  SPLICE FROM(T1) TO(OUT) ON(01,04,CH) WITHALL -
    WITH(01,80) KEEPNODUPS KEEPBASE USING(WK03)
//WK01CNTL DD *
  OUTFIL FNAMES=T1,BUILD=(1,80,81:1,4)
//WK02CNTL DD *
  OUTFIL FNAMES=T1,OVERLAY=(81:4X)
//WK03CNTL DD *
  OUTFIL FNAMES=OUT,BUILD=(1,80),
    INCLUDE=(1,4,CH,EQ,81,4,CH,AND,81,4,CH,NE,C' ')
  OUTFIL FNAMES=OUT1,SAVE
/*

The result in Out:
AAAA11111111111
AAAA22222222222
BBBB22222222222
BBBB33333333333
CCCC33333333333
CCCC44444444444
CCCC44444444444
CCCC55555555555

The result in Out1:
DDDD66666666666
EEEE77777777777
FFFF88888888888
FFFF99999999999
GGGG11111111111
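
As far as the control statements and the two result listings show, this job matches on the 4-byte key only, not on the whole record: every record whose key occurs in IN1 ends up in OUT, and everything else ends up in OUT1. The Python sketch below models that behaviour so it can be compared with the listings; it is not DFSORT syntax, and the function name and the `key_len` parameter are made up for illustration.

```python
def split_by_key(in1, in2, key_len=4):
    """Model of the key-based split: records whose key occurs in `in1`
    go to the first list, all other records go to the second list."""
    keys_in_first = {rec[:key_len] for rec in in1}
    # The job sorts on the key. Python's sort is stable, so records with the
    # same key keep their input order, IN1 records ahead of IN2 records.
    merged = sorted(in1 + in2, key=lambda rec: rec[:key_len])
    out = [rec for rec in merged if rec[:key_len] in keys_in_first]
    out1 = [rec for rec in merged if rec[:key_len] not in keys_in_first]
    return out, out1

in1 = ["AAAA11111111111", "BBBB22222222222",
       "CCCC33333333333", "CCCC44444444444"]
in2 = ["AAAA22222222222", "BBBB33333333333", "CCCC44444444444",
       "CCCC55555555555", "DDDD66666666666", "EEEE77777777777",
       "FFFF88888888888", "FFFF99999999999", "GGGG11111111111"]

out, out1 = split_by_key(in1, in2)
print("\n".join(out))
print("\n".join(out1))
```

Because only the key is compared, records such as AAAA11111111111 and AAAA22222222222 count as a match here, which differs from the whole-record comparison asked for in the first post.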

Maybe this can help you out.

Denis

Re: Keeping / Matching duplicate records within each input file

Post by skolusu » Tue Jun 17, 2008 3:27 am

Claywilly,

The following DFSORT/ICETOOL JCL will give you the desired results.

//STEP0100 EXEC PGM=ICETOOL                                   
//TOOLMSG  DD SYSOUT=*                                       
//DFSMSG   DD SYSOUT=*                                       
//IN1      DD *                                               
AAAA                                                         
BBBB                                                         
CCCC                                                         
CCCC                                                         
//IN2      DD *                                               
AAAA                                                         
BBBB                                                         
CCCC                                                         
CCCC                                                         
DDDD                                                         
EEEE                                                         
FFFF                                                         
FFFF                                                         
GGGG                                                         
//T1       DD DSN=&&T1,DISP=(MOD,PASS),SPACE=(CYL,(1,1),RLSE)
//OUT      DD SYSOUT=*                                       
//TOOLIN   DD *                                               
  SORT FROM(IN1) USING(CTL1)                                 
  SORT FROM(IN2) USING(CTL1)                                 
  SELECT FROM(T1) TO(OUT) ON(1,167,CH) NODUPS USING(CTL3)     
//CTL1CNTL DD *                                               
  SORT FIELDS=(1,159,CH,A)                                   
  OUTFIL FNAMES=T1,OVERLAY=(160:SEQNUM,8,ZD,RESTART=(1,159)) 
//CTL3CNTL DD *                                               
  OUTFIL FNAMES=OUT,BUILD=(01,159)                           
/*
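
A note on why this works, as read from the control statements: each file is sorted on the full 159 bytes and a sequence number that restarts for every new record value is placed in bytes 160-167, so the first CCCC becomes "CCCC + 1" and the second "CCCC + 2" in both files. SELECT with NODUPS on all 167 bytes then keeps only the record-plus-number combinations that occur once. The Python sketch below models that logic on the sample data; it is an illustration rather than DFSORT syntax, and the function names are made up.

```python
from collections import Counter
from itertools import groupby

def number_within_groups(records):
    """Sort the records and pair each with a sequence number that restarts
    at 1 whenever the record value changes (the SEQNUM ... RESTART step)."""
    numbered = []
    for _, group in groupby(sorted(records)):
        for seq, record in enumerate(group, start=1):
            numbered.append((record, seq))
    return numbered

def select_nodups(in1, in2):
    """Keep the records whose (record, sequence number) pair occurs exactly
    once across both inputs, returned in sorted order."""
    tagged = number_within_groups(in1) + number_within_groups(in2)
    counts = Counter(tagged)
    return [record for record, seq in sorted(tagged)
            if counts[(record, seq)] == 1]

in1 = ["AAAA", "BBBB", "CCCC", "CCCC"]
in2 = ["AAAA", "BBBB", "CCCC", "CCCC",
       "DDDD", "EEEE", "FFFF", "FFFF", "GGGG"]

result = select_nodups(in1, in2)
print(result)
```

Two things follow from this model. The output comes back in sorted order, so the original order survives only if the input was already in that order. The comparison is also symmetric: a record that appears only in IN1 is kept just like one that appears only in IN2.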
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

Re: Keeping / Matching duplicate records within each input file

Post by claywilly » Tue Jun 17, 2008 5:45 am

Thanks Skolusu, it works great when the job runs on the 15th and then on the 30th. But in the 'gap' from the 30th to the next 15th, there won't be any matching data (every record is a nodup). In that case we only want the data from the 15th run and not the data from the 30th of the prior month. Does that make sense?

For example, when the job runs again on the 15th, it could have the following data:
File 1 (15th), new data
AAAA…
BBBB…
CCCC…
CCCC…

File 2 (prior 30th), which would be the OUT file created by the JCL that Skolusu provided
DDDD…
EEEE…
FFFF…
FFFF…
GGGG…

Since there is no data that matches, what happens? Will the new OUT file combine data from both files?
We only want to keep the new data from file 1 and not from file 2.
Then when the job runs on the 30th it will follow the original scenario and solution as provided by Skolusu.

Any Ideas?
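
To make the question concrete: a NODUPS-style selection treats both inputs alike, so when nothing matches, every record is unique and the output would hold both files combined. One way to picture the behaviour asked for here is to tag each record with the file it came from and keep an unmatched record only if it came from the current run. The Python sketch below shows both results side by side; it is not DFSORT syntax, and the function names and the "current" / "previous" labels are assumptions made for illustration.

```python
from collections import Counter

def occurrence_tagged(records, source):
    """Pair each record with its running occurrence number and a source label."""
    seen = Counter()
    tagged = []
    for record in records:
        seen[record] += 1
        tagged.append((record, seen[record], source))
    return tagged

def unmatched(current, previous):
    """Return (unmatched records from either file,
               unmatched records from the current file only)."""
    tagged = (occurrence_tagged(current, "current")
              + occurrence_tagged(previous, "previous"))
    pair_counts = Counter((rec, occ) for rec, occ, _ in tagged)
    either = [rec for rec, occ, _ in tagged if pair_counts[(rec, occ)] == 1]
    current_only = [rec for rec, occ, src in tagged
                    if pair_counts[(rec, occ)] == 1 and src == "current"]
    return either, current_only

new_15th = ["AAAA", "BBBB", "CCCC", "CCCC"]          # File 1, new data
prior_out = ["DDDD", "EEEE", "FFFF", "FFFF", "GGGG"]  # File 2, prior OUT file

either, current_only = unmatched(new_15th, prior_out)
print(either)        # both files combined, since nothing matches
print(current_only)  # only the new data from the 15th
```

With the current run always passed as the first argument, the same rule also covers the run on the 30th: passing the file from the 30th as `current` and the file from the 15th as `previous` leaves DDDD, EEEE, FFFF, FFFF and GGGG.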