De-duplicating records in the outfil




Postby dja2 » Tue Oct 02, 2012 8:45 pm

As you can see from the attached code, I am creating two datasets, with data from different parts of the input record.

Is it possible to de-duplicate the records in the output dataset in the same SORT step? (Code follows).
  SORT FIELDS=COPY
  OUTFIL FNAMES=CONNID,
         BUILD=(1,6,C' ',264,4)
  OUTFIL FNAMES=INTCSTID,
         BUILD=(1,6,C' ',278,4)

I can de-duplicate the datasets in a separate, following SORT step (and have done so). I just wondered whether it is possible to do it all in one step.

Re: De-duplicating records in the outfil

Postby BillyBoyo » Tue Oct 02, 2012 8:58 pm

If you show all the code for both the steps then we might have suggestions.

Re: De-duplicating records in the outfil

Postby dick scherrer » Tue Oct 02, 2012 9:02 pm

Hello,

I suggest you post a bit of sample input data and the output you want when this input is processed. The records do not need to be full-width, just wide enough to show what you have and what you want to do with it.

If I understand what you are asking, yes, this can be done in a single step.
Hope this helps,
d.sch.

Re: De-duplicating records in the outfil

Postby skolusu » Tue Oct 02, 2012 10:36 pm

dja2,

If your intention is to write out 2 records for every record in the input file, with contents from positions 264 and 278, you can use the following DFSORT JCL. The "/" operator in BUILD splits each input record into 2 output records.

//STEP0100 EXEC PGM=SORT         
//SYSOUT   DD SYSOUT=*           
//SORTIN   DD DSN=Your input file,DISP=SHR 
//SORTOUT  DD SYSOUT=*           
//SYSIN    DD *                 
  SORT FIELDS=COPY               
  OUTFIL BUILD=(1,6,X,264,4,/,   
                1,6,X,278,4)     
//*
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

Re: De-duplicating records in the outfil

Postby dja2 » Wed Oct 03, 2012 1:30 pm

Apologies for not giving enough information.

This is a sample of the input data
=COLS>    ----+----1----+----2----+----3----+----4----+----5--
000008    WAVE 3 15202510252825      15202510252825 KEY BB RD
000009    WAVE 3 15202510283461      15202510283461 KEY BB RD
000010    WAVE 3 15202510283461      15202510283461 KEY BB RD
000011    WAVE 3 15202510283488      15202510283488 KEY BB RD
000012    WAVE 3 15202510283488      15202510283488 KEY BB RD


From this sample input I create three output files. Every file contains columns 1 to 6 ("WAVE 3").
Output dataset 1 also contains columns 8 to 27.
Output dataset 2 also contains columns 28 to 41.
Output dataset 3 also contains columns 43 to 56.

So, each output dataset will contain 5 records.

Output datasets 1 and 2 will contain 3 distinct records.
Output dataset 3 will contain 1 distinct record.

To concentrate on output dataset 1, this will contain:

=COLS>    ----+----1----+----2----+--
000001    WAVE 3 15202510252825     
000002    WAVE 3 15202510283461     
000003    WAVE 3 15202510283461     
000004    WAVE 3 15202510283488     
000005    WAVE 3 15202510283488     


My question is: how do I remove the duplicate records from the output datasets, to give the following?

=COLS>    ----+----1----+----2----+--
000001    WAVE 3 15202510252825     
000002    WAVE 3 15202510283461     
000003    WAVE 3 15202510283488     


A portion of my JCL follows

//STEP100  EXEC PGM=SORT,COND=(0,NE)                       
//SORTIN   DD DSN=XXXX.SEQ.WAVE4,DISP=OLD 
//CONNID   DD DSN=XXXX.SEQ.CONNID.UNSRTD, 
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=27,RECFM=FB),                     
//            DSORG=PS,                                   
//            SPACE=(CYL,(1,20),RLSE)                     
//INTCSTID DD DSN=XXXX.SEQ.INTCSTID.UNSRTD,
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=21,RECFM=FB),                     
//            DSORG=PS,                                   
//            SPACE=(CYL,(1,20),RLSE)                     
//CIN      DD DSN=XXXX.SEQ.CIN.UNSRTD,     
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=21,RECFM=FB),                     
//            DSORG=PS,                                   
//            SPACE=(CYL,(1,20),RLSE)                     
//SORTMSG  DD SYSOUT=*                                     
//SYSOUT   DD SYSOUT=*                     
//SYSPRINT DD SYSOUT=*                     
//SYSUDUMP DD SYSOUT=*                     
//SYSIN    DD *                           
  SORT FIELDS=COPY                         
  OUTFIL FNAMES=CONNID,                   
         BUILD=(1,27)                      
  OUTFIL FNAMES=INTCSTID,                 
         BUILD=(1,7,28,14)                 
  OUTFIL FNAMES=CIN,                       
         BUILD=(1,7,43,14)                 
//*                                       


Above corrected to Output dataset 3 also contains columns 43 to 56.

Re: De-duplicating records in the outfil

Postby NicC » Wed Oct 03, 2012 1:37 pm

Do you mean:
Output dataset 3 also contains columns 43 to 56
The problem I have is that people can explain things quickly but I can only comprehend slowly.
Regards
Nic

Re: De-duplicating records in the outfil

Postby dja2 » Wed Oct 03, 2012 2:01 pm

NicC

Apologies - typo - I should have written Output dataset 3.

Re: De-duplicating records in the outfil

Postby BillyBoyo » Wed Oct 03, 2012 2:14 pm

If the duplicates can only be contiguous, then have a look at the "reporting functions" on OUTFIL. SECTIONS=(start,length,...) with a TRAILERn will do a "consolidation". With REMOVECC (don't include the printer control-code) and NODETAIL (exclude detail records) you could get what you want.

If the "key fields" are not contiguous, you are going to have to SORT in a separate step.
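
As an illustration of the OUTFIL reporting approach described above, here is a minimal, untested sketch for the CONNID output only. It assumes identical records are always adjacent in the input, and it treats the whole 27-byte record as the section break field. The comment statements (asterisk in column 1) are added for explanation.

```jcl
  SORT FIELDS=COPY
* A new section starts whenever columns 1-27 change.
* NODETAIL suppresses the data records themselves.
* TRAILER3 writes one line per section, taken from the last
* record of that section. REMOVECC drops the control character.
  OUTFIL FNAMES=CONNID,REMOVECC,NODETAIL,
         BUILD=(1,27),
         SECTIONS=(1,27,TRAILER3=(1,27))
```

If the same value can reappear later in the file after a different value, each run produces its own section, so duplicates would survive. That is why this only suits input that is already in key sequence.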


Re: De-duplicating records in the outfil

Postby dja2 » Wed Oct 03, 2012 2:25 pm

BillyBoyo

Thanks very much. The "Key Fields" will not necessarily be contiguous, so it looks as though another SORT will have to be done.
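
For reference, a separate de-duplicating step of the kind mentioned here might look like the sketch below. It is an assumption about the shape of such a step, not the poster's actual job: the step name DEDUP and the output dataset name XXXX.SEQ.CONNID.SRTD are hypothetical, and only the CONNID dataset is shown.

```jcl
//DEDUP    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=XXXX.SEQ.CONNID.UNSRTD,DISP=SHR
//SORTOUT  DD DSN=XXXX.SEQ.CONNID.SRTD,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(1,20),RLSE)
//SYSIN    DD *
* Sort on the whole 27-byte record so that identical
* records end up next to each other.
  SORT FIELDS=(1,27,CH,A)
* Keep only one record for each distinct sort key.
  SUM FIELDS=NONE
/*
```

Because the sort key covers the entire record, SUM FIELDS=NONE removes exact duplicates regardless of where they appeared in the unsorted file.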

Re: De-duplicating records in the outfil

Postby skolusu » Wed Oct 03, 2012 9:39 pm

dja2,

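If your intention is to get the unique records in columns 8 to 27 and columns 43 to 56, then the following DFSORT JCL will give you the desired results. The trick here is to read the same file twice and join on a unique sequence number, which we create in JNF1CNTL and JNF2CNTL. The numbers are odd for one file and even for the other, so no records ever pair up and JOIN UNPAIRED passes every record from both sides through to the main task.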


//STEP0100 EXEC PGM=SORT                                         
//SYSOUT   DD SYSOUT=*                                           
//INA      DD *                                                   
WAVE 3 15202510252825      15202510252825 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
//INB      DD *                                                   
WAVE 3 15202510252825      15202510252825 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
//CONNID   DD SYSOUT=*                                           
//CIN      DD SYSOUT=*                                           
//SYSIN    DD *                                                   
  JOINKEYS F1=INA,FIELDS=(29,8,A),SORTED,NOSEQCK                 
  JOINKEYS F2=INB,FIELDS=(29,8,A),SORTED,NOSEQCK                 
  JOIN UNPAIRED                                                   
  REFORMAT FIELDS=(F1:1,36,F2:1,36,?)                             
  INREC IFOUTLEN=28,IFTHEN=(WHEN=(73,1,CH,EQ,C'2'),BUILD=(37,28))
  SORT FIELDS=(8,21,CH,A)                                         
  SUM FIELDS=NONE                                                 
  OUTFIL FNAMES=CONNID,INCLUDE=(28,1,CH,EQ,C'A'),BUILD=(1,27)     
  OUTFIL FNAMES=CIN,INCLUDE=(28,1,CH,EQ,C'B'),BUILD=(1,21)       
//*                                                               
//JNF1CNTL DD *                                                   
  INREC BUILD=(1,27,C'A',SEQNUM,8,ZD,START=1,INCR=2)             
//*                                                               
//JNF2CNTL DD *                                                   
  INREC BUILD=(1,7,43,14,28:C'B',SEQNUM,8,ZD,START=2,INCR=2)     
//*
Kolusu - DFSORT Development Team (IBM)
