
Sorting and removing duplicates in a single sort step

Posted: Tue May 10, 2011 5:05 am
by drucky
Hi,

Please see my requirements below:

1. I need to sort a file on a 5-byte key.
2. Where several records share the same key, I need to keep only the one with the most recent date-time stamp.

For example:

11111 20110509 01:14:56
11111 20110509 02:00:00
22222 20110508 01:30:30
22222 20110509 07:15:00
33333 20110509 08:00:00
44444 20110509 09:00:00

The output file should be:

11111 20110509 02:00:00
22222 20110509 07:15:00
33333 20110509 08:00:00
44444 20110509 09:00:00

The date and time fields are actually packed decimal fields, but in the example above I have shown them as readable text for illustration purposes.

The input file record length is 1000 bytes.
Key position - 1; length 5 bytes
Date position - 6; packed decimal 9(09), 5 bytes
Time position - 11; packed decimal 9(09), 5 bytes

Can someone suggest how to achieve this using a single SORT card? I know we can achieve this using ICETOOL, but I would prefer to do it using just DFSORT.

Please let me know if you need some more clarifications.

Thanks,
Drucky

Re: Sorting and removing duplicates in a single sort step

Posted: Tue May 10, 2011 6:45 am
by Frank Yaeger
drucky wrote: Can someone suggest how to achieve this using a single SORT card? I know we can achieve this using ICETOOL, but I would prefer to do it using just DFSORT.


Why? To make life difficult? Since you want to SUM on one field, but need to SORT on several fields, you can't do this
using a single SORT card.
You can do it using a single ICETOOL SELECT pass as follows:

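//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=... input file
//OUT DD DSN=... output file
//* SELECT keeps only the FIRST record for each key in positions 1-5
//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,5,CH) FIRST USING(CTL1)
/*
//* CTL1: key ascending, then date and time descending,
//* so the newest record for each key comes first
//CTL1CNTL DD *
  SORT FIELDS=(1,5,CH,A,6,5,PD,D,11,5,PD,D)
/*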


Perhaps you weren't aware that ICETOOL's SELECT could do it in a single pass?

If you want to do this with DFSORT instead of DFSORT's ICETOOL for some reason, use two steps: first sort the records on the three fields (key ascending, date and time descending) to create a temp file, then SORT on the key with EQUALS and SUM FIELDS=NONE on the temp file, so the first (most recent) record for each key is kept. This will accomplish in two passes what you can do with ICETOOL in one pass. After all, why be efficient?
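A minimal sketch of that two-step DFSORT approach, assuming the record layout from the first post. The temporary data set name &&TEMP and the SPACE values are illustrative assumptions, not from the original posts; size them for your data. DSN=... stands for your own data sets.

```jcl
//* Hypothetical two-step job; &&TEMP and SPACE values are assumptions
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=...,DISP=SHR
//SORTOUT  DD DSN=&&TEMP,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(CYL,(10,10),RLSE)
//SYSIN    DD *
* Key ascending; date and time descending, so the newest record is first
  SORT FIELDS=(1,5,CH,A,6,5,PD,D,11,5,PD,D)
/*
//STEP2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=&&TEMP,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=...
//SYSIN    DD *
* Sort on the key only; EQUALS keeps the STEP1 order within each key,
* so SUM FIELDS=NONE retains the first (newest) record per key
  SORT FIELDS=(1,5,CH,A),EQUALS
  SUM FIELDS=NONE
/*
```

The design relies on EQUALS: with EQUALS in effect, SUM FIELDS=NONE keeps the first record of each set of equal keys, which STEP1 has already made the newest one. Without EQUALS, which duplicate survives is not guaranteed.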

Re: Sorting and removing duplicates in a single sort step

Posted: Tue May 10, 2011 10:20 am
by drucky
Thank you Frank,

Unfortunately ICETOOL is not recommended in our organisation because of maintainability concerns. After all, not everyone is aware of its entire functionality, including me. I suppose I'll have to go with two SORT steps instead.

Thanks again for your suggestion.

Re: Sorting and removing duplicates in a single sort step

Posted: Tue May 10, 2011 4:52 pm
by archnXSP
Why not use SUM FIELDS=NONE in the second line of your sort card?

But then again, it won't work if you are using more than one field to SORT...
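To make the limitation concrete, here is a sketch of the single-step version (a hypothetical fragment, not taken from the original posts), assuming the 5-byte key in positions 1-5:

```jcl
* Hypothetical fragment: removes duplicate keys in one pass,
* but keeps the first record per key in INPUT order, not the newest
  SORT FIELDS=(1,5,CH,A),EQUALS
  SUM FIELDS=NONE
```

With the sample input from the first post, this would keep 11111 20110509 01:14:56 rather than 02:00:00. Adding the date and time to SORT FIELDS does not help either, because SUM only treats records as duplicates when all of their SORT fields are equal, so records with different time stamps are never collapsed.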

Regards,
Sam

Re: Sorting and removing duplicates in a single sort step

Posted: Tue May 10, 2011 9:07 pm
by skolusu
archnXSP wrote: Why not use SUM FIELDS=NONE in the second line of your sort card?

But then again, it won't work if you are using more than one field to SORT...


archnXSP,

Are you contradicting your first statement with your second statement? Did you try adding SUM FIELDS=NONE and checking whether you got the desired results? What exactly did you want to convey in the above post?

drucky,

As Frank mentioned, ICETOOL's SELECT is the ideal choice for such requests. Here is a solution using only DFSORT, for your input file with RECFM=FB and LRECL=1000.

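//STEP0100 EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=Your FB input 1000 byte file,DISP=SHR
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
* Key ascending; newest date and time first within each key
  SORT FIELDS=(1,5,CH,A,6,5,PD,D,11,5,PD,D),EQUALS
* Number the records within each key group (1, 2, ...) in 1001-1008
  OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,5),PUSH=(1001:SEQ=8))
* Keep only the first (newest) record per key; drop the sequence number
  OUTFIL BUILD=(1,1000),INCLUDE=(1001,8,ZD,EQ,1)
//*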

Re: Sorting and removing duplicates in a single sort step

Posted: Wed May 11, 2011 12:59 am
by Frank Yaeger
drucky wrote: Unfortunately ICETOOL is not recommended in our organisation because of maintainability concerns. After all, not everyone is aware of its entire functionality, including me. I suppose I'll have to go with two SORT steps instead.


.soapbox on
This is a ridiculous statement. Do you think that anyone in your organization is actually aware of the entire functionality of DFSORT (including you)? If you are, then you must have done a lot of reading. ICETOOL, like DFSORT, is fully documented, so anyone can become aware of the functions of either one equally. ICETOOL has been part of DFSORT since 1991 - it's not exactly something new. It's just as easy to "maintain" ICETOOL by reading its documentation as it is to maintain DFSORT by reading its documentation. I just don't understand organizations that set crazy restrictions like this based on "laziness". Your organization is paying for the functionality in ICETOOL, so why not spend some time to take advantage of what you're paying for? Look at that SELECT operator - does it really seem complicated to you?
.soapbox off