De-duplicating records in the outfil

Posted: Tue Oct 02, 2012 8:45 pm
by dja2
As you can see from the attached code, I am creating two datasets, with data from different parts of the input record.

Is it possible to de-duplicate the records in the output dataset in the same SORT step? (Code follows).
  SORT FIELDS=COPY
  OUTFIL FNAMES=CONNID,
         BUILD=(1,6,C' ',264,4)
  OUTFIL FNAMES=INTCSTID,
         BUILD=(1,6,C' ',278,4)

I can, (and have), de-duplicated the datasets in a separate, following, SORT step. I just wondered if it was possible to do it all in one step.

Re: De-duplicating records in the outfil

Posted: Tue Oct 02, 2012 8:58 pm
by BillyBoyo
If you show all the code for both the steps then we might have suggestions.

Re: De-duplicating records in the outfil

Posted: Tue Oct 02, 2012 9:02 pm
by dick scherrer
Hello,

Suggest you post a bit of sample input data and the output you want when this input is processed. The records do not need to be full-width, just enough to show what you have and what you want to do with it.

If I understand what you are asking, yes, this can be done in a single step.

Re: De-duplicating records in the outfil

Posted: Tue Oct 02, 2012 10:36 pm
by skolusu
dja2,

If your intention is to write out 2 records for every record in the input file, with contents from pos 264 and 278, you can use the following DFSORT JCL. You need to use the "/" operator to split each record into 2.

//STEP0100 EXEC PGM=SORT         
//SYSOUT   DD SYSOUT=*           
//SORTIN   DD DSN=Your input file,DISP=SHR 
//SORTOUT  DD SYSOUT=*           
//SYSIN    DD *                 
  SORT FIELDS=COPY               
  OUTFIL BUILD=(1,6,X,264,4,/,   
                1,6,X,278,4)     
//*

Re: De-duplicating records in the outfil

Posted: Wed Oct 03, 2012 1:30 pm
by dja2
Apologies for not giving enough information.

This is a sample of the input data
=COLS>    ----+----1----+----2----+----3----+----4----+----5--
000008    WAVE 3 15202510252825      15202510252825 KEY BB RD
000009    WAVE 3 15202510283461      15202510283461 KEY BB RD
000010    WAVE 3 15202510283461      15202510283461 KEY BB RD
000011    WAVE 3 15202510283488      15202510283488 KEY BB RD
000012    WAVE 3 15202510283488      15202510283488 KEY BB RD


From this input I create three output files. Every file contains columns 1 to 6 ("WAVE 3").
Output dataset 1 also contains columns 8 to 27.
Output dataset 2 also contains columns 28 to 41.
Output dataset 3 also contains columns 43 to 56.

So, each output dataset will contain 5 records.

Output datasets 1 and 2 will contain 3 distinct records.
Output dataset 3 will contain 1 distinct record.

To concentrate on output dataset 1, this will contain:

=COLS>    ----+----1----+----2----+--
000001    WAVE 3 15202510252825     
000002    WAVE 3 15202510283461     
000003    WAVE 3 15202510283461     
000004    WAVE 3 15202510283488     
000005    WAVE 3 15202510283488     


My question is, how do I remove the duplicate records from the output datasets, to give the following?

=COLS>    ----+----1----+----2----+--
000001    WAVE 3 15202510252825     
000002    WAVE 3 15202510283461     
000003    WAVE 3 15202510283488     


A portion of my JCL follows

//STEP100  EXEC PGM=SORT,COND=(0,NE)                       
//SORTIN   DD DSN=XXXX.SEQ.WAVE4,DISP=OLD 
//CONNID   DD DSN=XXXX.SEQ.CONNID.UNSRTD, 
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=27,RECFM=FB),                     
//            DSORG=PS,                                   
//            SPACE=(CYL,(1,20),RLSE)                     
//INTCSTID DD DSN=XXXX.SEQ.INTCSTID.UNSRTD,
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=21,RECFM=FB),                     
//            DSORG=PS,                                   
//            SPACE=(CYL,(1,20),RLSE)                     
//CIN      DD DSN=XXXX.SEQ.CIN.UNSRTD,     
//            DISP=(NEW,CATLG,DELETE),                     
//            DCB=(LRECL=21,RECFM=FB),                     
//            DSORG=PS,                                   
//            SPACE=(CYL,(1,20),RLSE)                     
//SORTMSG  DD SYSOUT=*                                     
//SYSOUT   DD SYSOUT=*                     
//SYSPRINT DD SYSOUT=*                     
//SYSUDUMP DD SYSOUT=*                     
//SYSIN    DD *                           
  SORT FIELDS=COPY                         
  OUTFIL FNAMES=CONNID,                   
         BUILD=(1,27)
  OUTFIL FNAMES=INTCSTID,                 
         BUILD=(1,7,28,14)                 
  OUTFIL FNAMES=CIN,                       
         BUILD=(1,7,43,14)                 
//*                                       


Edit: the list above has been corrected to read "Output dataset 3 also contains columns 43 to 56".

Re: De-duplicating records in the outfil

Posted: Wed Oct 03, 2012 1:37 pm
by NicC
Do you mean:
Output dataset 3 also contains columns 43 to 56

Re: De-duplicating records in the outfil

Posted: Wed Oct 03, 2012 2:01 pm
by dja2
NicC

Apologies - typo - I should have written Output dataset 3.

Re: De-duplicating records in the outfil

Posted: Wed Oct 03, 2012 2:14 pm
by BillyBoyo
If the duplicates can only be contiguous, then have a look at the "reporting functions" on OUTFIL. SECTIONS=(start,length,...) with a section trailer (TRAILER3) will do a "consolidation". With REMOVECC (don't include the printer control-code) and NODETAIL (exclude detail records) you could get what you want.

If records with the same "key fields" are not contiguous in the input, you are going to have to SORT in a separate step.
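
A minimal, untested sketch of the SECTIONS approach described above, using the DD names and columns from dja2's step. It assumes duplicates are always adjacent in the input and that columns 1 to 7 do not change within a run of equal values. BUILD is kept only on the assumption that it still sets the output record length when NODETAIL suppresses the detail records. The input field positions inside TRAILER3 are illustrative.

```jcl
//SYSIN    DD *
  SORT FIELDS=COPY
* A new section starts whenever the SECTIONS field changes.
* TRAILER3 writes one line per section, NODETAIL drops the data
* records, and REMOVECC drops the carriage-control character.
  OUTFIL FNAMES=CONNID,REMOVECC,NODETAIL,
         BUILD=(1,27),
         SECTIONS=(1,27,TRAILER3=(1,27))
  OUTFIL FNAMES=INTCSTID,REMOVECC,NODETAIL,
         BUILD=(1,7,28,14),
         SECTIONS=(28,14,TRAILER3=(1,7,28,14))
  OUTFIL FNAMES=CIN,REMOVECC,NODETAIL,
         BUILD=(1,7,43,14),
         SECTIONS=(43,14,TRAILER3=(1,7,43,14))
/*
```

Because the step still uses SORT FIELDS=COPY, equal values that are separated by other records end up in different sections and are written more than once.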

Re: De-duplicating records in the outfil

Posted: Wed Oct 03, 2012 2:25 pm
by dja2
BillyBoyo

Thanks very much. The "Key Fields" will not necessarily be contiguous, so it looks as though another SORT will have to be done.
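
For reference, a follow-on de-duplication step of the kind mentioned here might look like the sketch below. The step name and the output dataset name XXXX.SEQ.CONNID.DEDUP are hypothetical. The sort key covers the whole 27-byte CONNID record, and one such step would be needed for each output dataset.

```jcl
//STEP200  EXEC PGM=SORT,COND=(0,NE)
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=XXXX.SEQ.CONNID.UNSRTD,DISP=SHR
//SORTOUT  DD DSN=XXXX.SEQ.CONNID.DEDUP,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(1,20),RLSE)
//SYSIN    DD *
* Sort on the whole record, then keep one record per key value.
  SORT FIELDS=(1,27,CH,A)
  SUM FIELDS=NONE
/*
```

SUM FIELDS=NONE keeps a single record for each distinct sort key, so the key has to include every column that makes a record a duplicate.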

Re: De-duplicating records in the outfil

Posted: Wed Oct 03, 2012 9:39 pm
by skolusu
dja2,

If your intention is to get the unique records in columns 8 to 27 and columns 43 to 56, then the following DFSORT JCL will give you the desired results. The trick here is to read the same file twice (as INA and INB) and join on a unique key, built in JNF1CNTL and JNF2CNTL, that never matches between the two files.


//STEP0100 EXEC PGM=SORT                                         
//SYSOUT   DD SYSOUT=*                                           
//INA      DD *                                                   
WAVE 3 15202510252825      15202510252825 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
//INB      DD *                                                   
WAVE 3 15202510252825      15202510252825 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283461      15202510283461 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
WAVE 3 15202510283488      15202510283488 KEY BB RD               
//CONNID   DD SYSOUT=*                                           
//CIN      DD SYSOUT=*                                           
//SYSIN    DD *                                                   
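* Keys in 29,8 are odd for INA, even for INB, so nothing pairs.
  JOINKEYS F1=INA,FIELDS=(29,8,A),SORTED,NOSEQCK
  JOINKEYS F2=INB,FIELDS=(29,8,A),SORTED,NOSEQCK
* Keep the unpaired records from both files.
  JOIN UNPAIRED
* 1-36 = F1 record, 37-72 = F2 record, 73 = match indicator.
  REFORMAT FIELDS=(F1:1,36,F2:1,36,?)
* For F2-only records, move the F2 data and its C'B' flag to 1-28.
  INREC IFOUTLEN=28,IFTHEN=(WHEN=(73,1,CH,EQ,C'2'),BUILD=(37,28))
* Sort on the data plus the A/B flag, then drop duplicates.
  SORT FIELDS=(8,21,CH,A)
  SUM FIELDS=NONE
* Split by flag: A = columns 8 to 27, B = columns 43 to 56.
  OUTFIL FNAMES=CONNID,INCLUDE=(28,1,CH,EQ,C'A'),BUILD=(1,27)
  OUTFIL FNAMES=CIN,INCLUDE=(28,1,CH,EQ,C'B'),BUILD=(1,21)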
//*                                                               
//JNF1CNTL DD *                                                   
  INREC BUILD=(1,27,C'A',SEQNUM,8,ZD,START=1,INCR=2)             
//*                                                               
//JNF2CNTL DD *                                                   
  INREC BUILD=(1,7,43,14,28:C'B',SEQNUM,8,ZD,START=2,INCR=2)     
//*