
Remove duplicates for only specific records

Posted: Fri Oct 15, 2010 4:50 pm
by santhoshkumar_sm
Hi All,

I have a requirement to remove duplicates for specific records only.

Input (the data varies, but it always has the format below):

AAAA ***fdsf***sdf**
BBBBB **gfhfghfghfg*
CCCC 1234
CCCC 5895
CCCC 1234
CCCC 7545
CCCC 8877
AAAA ***fdsf***sdf**
BBBBB **gfhfghfghfg*
CCCC 8585

Requirement:

I need to remove duplicates only among the records starting with CCCC; all of the other records should remain as they are. So my output should look like this:

OUTPUT:

AAAA ***fdsf***sdf**
BBBBB **gfhfghfghfg*
CCCC 1234
CCCC 5895
CCCC 7545
CCCC 8877
AAAA ***fdsf***sdf**
BBBBB **gfhfghfghfg*
CCCC 8585

Many thanks.

Re: Remove duplicates for only specific records

Posted: Fri Oct 15, 2010 10:35 pm
by Frank Yaeger
You can use a DFSORT job like the following to do what you asked for. I assumed your input file has RECFM=FB and LRECL=80, but the job can be changed appropriately for other attributes.

//S1    EXEC  PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN DD *
AAAA ***fdsf***sdf**
BBBBB **gfhfghfghfg*
CCCC 1234
CCCC 5895
CCCC 1234
CCCC 7545
CCCC 8877
AAAA ***fdsf***sdf**
BBBBB **gfhfghfghfg*
CCCC 8585
/*
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD SYSOUT=*
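//TOOLIN DD *
* TAG THE RECORDS AND KEEP THE FIRST OF EACH DUPLICATE SET IN T1
SELECT FROM(IN) TO(T1) ON(81,8,ZD) ON(5,76,CH) FIRST USING(CTL1)
* PUT THE RECORDS BACK IN ORIGINAL ORDER AND DROP BYTES 81-96
SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
* 81-88: SEQUENCE NUMBER, OVERLAID WITH ZEROS FOR CCCC RECORDS
* 89-96: COPY OF THE SEQUENCE NUMBER, USED TO RESTORE THE ORDER
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,8,ZD,89:81,8)),
    IFTHEN=(WHEN=(1,4,CH,EQ,C'CCCC'),OVERLAY=(81:8C'0'))
/*
//CTL2CNTL DD *
* SORT ON THE ORIGINAL SEQUENCE NUMBER AND KEEP ONLY BYTES 1-80
  SORT FIELDS=(89,8,ZD,A)
  OUTREC BUILD=(1,80)
/*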
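For anyone following along without a mainframe, the logic of the job can be traced with a small Python simulation. This is a hypothetical sketch, not DFSORT itself: it assumes 80-byte fixed-length records, compares the ZD fields as plain integers, and the function names (`ctl1_inrec`, `select_step`, `ctl2_sort`, `run_job`) are made up for illustration, one per piece of the job.

```python
# Hypothetical Python simulation of the ICETOOL job (not DFSORT itself).
# Assumes fixed-length 80-byte records, as with RECFM=FB, LRECL=80.


def ctl1_inrec(lines):
    """CTL1 INREC: append 81-88 (sequence number, zeros for CCCC) and 89-96 (copy)."""
    out = []
    for n, line in enumerate(lines, start=1):  # SEQNUM starts at 1
        rec = line.ljust(80)[:80]
        seq = f"{n:08d}"
        dup_key = seq                           # WHEN=INIT: 81:SEQNUM,8,ZD
        if rec[0:4] == "CCCC":                  # WHEN=(1,4,CH,EQ,C'CCCC')
            dup_key = "0" * 8                   # OVERLAY=(81:8C'0')
        out.append(rec + dup_key + seq)         # 89-96: original sequence number (89:81,8)
    return out


def select_step(recs):
    """SELECT ON(81,8,ZD) ON(5,76,CH) FIRST: keep one record per key, in key order."""
    first = {}
    for rec in recs:                            # earliest record of each key wins
        first.setdefault((int(rec[80:88]), rec[4:80]), rec)
    return [first[k] for k in sorted(first)]


def ctl2_sort(recs):
    """CTL2: SORT FIELDS=(89,8,ZD,A) restores input order; OUTREC BUILD=(1,80)."""
    return [rec[0:80] for rec in sorted(recs, key=lambda r: int(r[88:96]))]


def run_job(lines):
    t1 = select_step(ctl1_inrec(lines))             # SELECT FROM(IN) TO(T1)
    return [rec.rstrip() for rec in ctl2_sort(t1)]  # SORT FROM(T1) TO(OUT)


if __name__ == "__main__":
    data = [
        "AAAA ***fdsf***sdf**", "BBBBB **gfhfghfghfg*",
        "CCCC 1234", "CCCC 5895", "CCCC 1234", "CCCC 7545", "CCCC 8877",
        "AAAA ***fdsf***sdf**", "BBBBB **gfhfghfghfg*", "CCCC 8585",
    ]
    for line in run_job(data):
        print(line)
```

Run against the ten input records, it prints the nine records of the expected OUTPUT: the second CCCC 1234 is gone, while the repeated AAAA and BBBBB records survive because each one carries its own sequence number.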

Re: Remove duplicates for only specific records

Posted: Fri Oct 15, 2010 10:53 pm
by santhoshkumar_sm
Hi Frank,

Thanks for your help. I can only try this out on Monday.
If you don't mind, could you please explain the sort cards you used? I am new to the mainframe and still a beginner.

SELECT FROM(IN) TO(T1) ON(81,8,ZD) ON(5,76,CH) FIRST USING(CTL1)

INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,8,ZD,89:81,8)),
IFTHEN=(WHEN=(1,4,CH,EQ,C'CCCC'),OVERLAY=(81:8C'0'))

Thanks a lot.

Re: Remove duplicates for only specific records

Posted: Fri Oct 15, 2010 11:38 pm
by santhoshkumar_sm
Hi Frank,

I also wanted to know where the duplicates are being removed.
Thanks in advance.

Re: Remove duplicates for only specific records

Posted: Sat Oct 16, 2010 12:00 am
by Frank Yaeger
If you're not familiar with DFSORT and DFSORT's ICETOOL, I'd suggest reading through "z/OS DFSORT: Getting Started". It's an excellent tutorial, with lots of examples, that will show you how to use DFSORT, DFSORT's ICETOOL and DFSORT Symbols. You can access it online, along with all of the other DFSORT books, from:

http://www.ibm.com/support/docview.wss? ... g3T7000080

SELECT is an ICETOOL operator - see:

http://publibz.boulder.ibm.com/cgi-bin/ ... 0630155256

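FIRST keeps the first record of each set of duplicates, where a set of duplicates is all of the records that have the same values in the ON fields (here, positions 81-88 and 5-80).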
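To make the ON/FIRST part concrete, here is a minimal Python sketch of what SELECT ... FIRST does, assuming it behaves as described above: records whose ON fields are all equal form one set, only the first record of each set is kept, and the output comes out in ON-field order (which is why the job needs a second step to restore the original order). The helper names `on_field`, `select_first` and `JOB_ON_FIELDS` are hypothetical.

```python
def on_field(record: str, pos: int, length: int) -> str:
    """Return the field that ON(pos,length,...) refers to (pos is 1-based)."""
    return record[pos - 1 : pos - 1 + length]


def select_first(records, on_fields):
    """Hypothetical model of SELECT ... FIRST.

    Records whose ON fields are all equal form one set of duplicates; only the
    first record of each set (in input order) is kept. The result is returned
    in ascending key order, not input order. Fields are compared as characters,
    which matches numeric order for fixed-width, zero-padded numbers like the
    sequence numbers in this job.
    """
    first = {}
    for rec in records:
        key = tuple(on_field(rec, p, n) for p, n in on_fields)
        first.setdefault(key, rec)  # keeps the earliest record per key
    return [first[k] for k in sorted(first)]


# ON(81,8,ZD) ON(5,76,CH) from the job, as (position, length) pairs:
JOB_ON_FIELDS = [(81, 8), (5, 76)]
```

For example, `select_first(["X 9", "X 1", "X 9"], [(3, 1)])` returns `["X 1", "X 9"]`: the repeated record is dropped and the survivors come back sorted on position 3.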

If you comment out the SORT operator (* SORT FROM...) and change TO(T1) to TO(OUT), the job will run with just the SELECT operator and write its output to SYSOUT. You can then look at the output and you'll see what I'm doing with the sequence numbers. Basically I'm giving all of the CCCC records a sequence number of 0 in positions 81-88, so CCCC records that also match in positions 5-80 will be treated as duplicates, and all of the other records a unique sequence number so they won't be treated as duplicates. The second sequence number, in positions 89-96, is used to get the records back in their original order.
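The sequence-number trick can be checked in isolation with a short Python sketch that builds the same key SELECT uses (positions 81-88 plus 5-80) and reports which input records would be dropped. It is an illustration that assumes 80-byte records; `dropped_records` is a hypothetical helper, not part of DFSORT.

```python
def dropped_records(lines):
    """Return the 1-based input positions that SELECT ... FIRST would drop,
    using the same key as the job: 81-88 (sequence number, zeros for CCCC) + 5-80.
    Hypothetical helper for illustration only."""
    seen = set()
    dropped = []
    for n, line in enumerate(lines, start=1):
        rec = line.ljust(80)[:80]
        # Positions 81-88: 00000000 for CCCC records, a unique sequence number otherwise
        seq_key = "00000000" if rec[0:4] == "CCCC" else f"{n:08d}"
        key = (seq_key, rec[4:80])  # ON(81,8,ZD) ON(5,76,CH)
        if key in seen:
            dropped.append(n)       # a later duplicate: FIRST does not keep it
        else:
            seen.add(key)
    return dropped
```

With the ten input records from the question, it returns `[5]`: only the second CCCC 1234 collides with an earlier key, while the two AAAA records (and the two BBBBB records) differ in positions 81-88 and are both kept.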

Re: Remove duplicates for only specific records

Posted: Sat Oct 16, 2010 11:46 pm
by santhoshkumar_sm
Hi Frank,

Can you please explain the following statement you used?

ON(81,8,ZD) ON(5,76,CH) FIRST USING(CTL1)

I went through the document, and this is the only part I could not understand.

I would be thankful if you could help me.

Re: Remove duplicates for only specific records

Posted: Sun Oct 17, 2010 4:14 am
by dick scherrer
Hello,

The bit of code you have pasted is part of the SELECT. As Frank explained, the FIRST controls the elimination of the duplicates.

Did you read this suggestion Frank posted:
If you comment out the SORT operator (* SORT FROM...), the job will run with just the SELECT operator. You can then look at the output and you'll see what I'm doing with the sequence numbers.
Suggest you try this when you are logged on again on Monday. It will make the process more clear to you.

Do you understand this explanation?
Basically I'm giving all of the CCCC records a sequence number of 0 so they will be treated as duplicates, and all of the other records a unique sequence number so they won't be treated as duplicates. The second sequence number is used to get the records back in their original order.

As I mentioned, this will be clearer when you can see the output from the SELECT.

Re: Remove duplicates for only specific records

Posted: Mon Oct 18, 2010 10:28 pm
by santhoshkumar_sm
Hi Frank,

Thank you so much. It worked great, and I clearly understood what you have done. Brilliant! :)