Removing duplicates from the second file only

Post by sujithsamuel » Thu Nov 03, 2011 9:27 am

Dear Forum members

I have two files (LRECL 80), for example:

File 1
This file contains the following records:

ABDULLAH
MOHAMED
RESET
MOVE
7
9
4
2

File 2
This file contains the following records:

NEXT
INSERT
RESET
55
0
9
8
2

When I sort both files into one file to remove the duplicates (using columns 1-80 as the key), the order of the records changes. The output becomes:
ABDULLAH
INSERT
MOHAMED
MOVE
NEXT
RESET
0
2
4
55
7
8
9

I do want the duplicates removed, but the order of the records should not change. If duplicates exist within file 1, they should be removed. In addition, if any record in file 2 matches a record in file 1, the file 2 record should be removed. The result should look like this:

ABDULLAH
MOHAMED
RESET
MOVE
7
9
4
2
NEXT
INSERT
55
0
8

This has to be done using SYNCSORT.

Kindly help

Thanks
Sujith

Re: Removing duplicates from the second file only

Post by ectgunner64 » Wed Nov 09, 2011 8:18 pm

This can be done in two sort steps. The first step concatenates the two input files, appends a sequence number in columns 81-88, sorts on columns 1-80 and drops duplicate records, writing the result to an intermediate file. EQUALS is specified so that, of each set of duplicates, the first record in input order (the file 1 copy) is the one kept. The second step sorts the intermediate file by the sequence number in columns 81-88 and strips it off, which restores the original order and gives you the desired result.

//SORT1    EXEC PGM=SORT
//* First and second input files, concatenated in that order
//SORTIN   DD DSN=SORT1.INPFILE1,DISP=SHR
//         DD DSN=SORT1.INPFILE2,DISP=SHR
//* Intermediate file: data in 1-80, sequence number in 81-88
//* Add SPACE (and UNIT, if needed) to suit your site
//SORTOUT  DD DSN=SORT1.TEMPFILE,
//            DCB=(RECFM=FB,LRECL=88,BLKSIZE=0),
//            DISP=(,CATLG,DELETE)
//SYSOUT   DD SYSOUT=*
//SYSIN    DD *
  INREC FIELDS=(1,80,SEQNUM,8,ZD)
  SORT FIELDS=(1,80,CH,A),EQUALS
  SUM FIELDS=NONE
/*
//STEP0020 EXEC PGM=SORT
//* Intermediate file from SORT1, deleted after a successful run
//SORTIN   DD DSN=SORT1.TEMPFILE,
//            DISP=(OLD,DELETE,KEEP)
//* Final output: the 13 records in the sequence you want
//* Add SPACE (and UNIT, if needed) to suit your site
//SORTOUT  DD DSN=SORT2.OUTFILE,
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=0),
//            DISP=(,CATLG,DELETE)
//SYSOUT   DD SYSOUT=*
//SYSIN    DD *
  SORT FIELDS=(81,8,CH,A)
  OUTREC FIELDS=(1:1,80)
/*
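
If you want to check the logic of the two steps away from the mainframe, here is a rough Python sketch of the same idea. It is only a simulation, not anything SyncSort provides: the function name two_pass_dedup and its parameters are made up for illustration. It assumes the first sort keeps equal-key records in their input order (which is what the EQUALS option requests), so the file 1 copy of a duplicate is the one that survives. The sample data is the data from the question, with the two names spelled as in the expected output.

```python
from itertools import groupby


def two_pass_dedup(file1, file2, lrecl=80):
    """Simulate the two sort steps on lists of records (hypothetical helper)."""
    # Step 1, INREC: concatenate the inputs, pad each record to LRECL and
    # tag it with a sequence number (the stand-in for columns 81-88).
    tagged = [(seq, rec.ljust(lrecl))
              for seq, rec in enumerate(file1 + file2, start=1)]

    # Step 1, SORT: order by the 80-byte key in EBCDIC (cp037) collating
    # sequence. sorted() is stable, so equal keys stay in input order.
    by_key = sorted(tagged, key=lambda t: t[1].encode("cp037"))

    # Step 1, SUM FIELDS=NONE: keep the first record of each run of equal keys.
    unique = [next(group) for _, group in groupby(by_key, key=lambda t: t[1])]

    # Step 2: sort by sequence number, then drop it and the padding.
    return [rec.rstrip() for _, rec in sorted(unique)]


file1 = ["ABDULLAH", "MOHAMED", "RESET", "MOVE", "7", "9", "4", "2"]
file2 = ["NEXT", "INSERT", "RESET", "55", "0", "9", "8", "2"]

for record in two_pass_dedup(file1, file2):
    print(record)
```

With this data the sketch prints the 13 records in the order asked for: the 8 records of file 1 unchanged, followed by NEXT, INSERT, 55, 0 and 8 from file 2.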