
Removing Dups, select last date/time

Posted: Thu Jan 12, 2012 10:02 pm
by Peter_Mann
Hi folks, I've been working on a SyncSort job that reads a report of jobnames. The input data contains a list of jobs that ran during the month, and I need to select only the last run of each job for each month.
Below is a sample of the data I've been working with:
----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
 DTST2520  DTST2520 0003926 01/01/12-01:30  01/01/12-01:30     0:09  SIN
 DTST2520  DTST2520 0005114 01/08/12-01:30  01/08/12-01:30     0:09  SIN
 DTST2002  DTST2002 0004119 01/02/12-05:15  01/02/12-05:15     0:19  SIN
 DTST2003  DTST2003 0004148 01/02/12-08:00  01/02/12-08:07     7:29  SIN
 DTST2001  DTST2001 0004147 01/02/12-08:00  01/02/12-08:08     8:25  SIN
 DTST2002  DTST2002 0005305 01/09/12-05:15  01/09/12-05:15     0:19  SIN
 DTST2003  DTST2003 0005337 01/09/12-08:00  01/09/12-08:00     0:05  SIN

My code below
//SYSIN    DD  *                                                       
  SORT FIELDS=(12,8,CH,A,29,13,CH,D)                                   
  SUM FIELDS=NONE                                                       
  OUTFIL OUTREC=(2:12,8,                                               
                12:29,30,                                               
                50:104,4,                                               
                55:C'                                             '),   
         HEADER1=(3:'JOBNAME',                                         
                  17:'START',                                           
                  33:'END',                                             
                  50:'SYSID',                                           
                  59:'PAGE:',                                           
                  68:&PAGE,/,                                           
                  17:'TIME',                                           
                  32:'TIME',//)                                         
/*                                                                     

is working, but it is not removing the duplicates. From what I understand from reading the Programmer's Guide, SUM FIELDS is what's needed to remove duplicates on the SORT FIELDS keys, but my output still contains duplicates. Can anyone eyeball my code and tell me what I'm missing?
Thanks
Output data follows:
  JOBNAME       START          END               SYSID    PAGE:                 
                TIME           TIME                                             
                                                                               
                                                                               
 DSUT1000  01/10/12-00:01  01/10/12-00:01        TST1                           
 DSUT1000  01/09/12-00:01  01/09/12-00:01        TST1                           
 DSUT1000  01/08/12-00:01  01/08/12-00:01        TST1                           
 DSUT1000  01/07/12-00:01  01/07/12-00:01        TST1                           
 DSUT1000  01/06/12-00:01  01/06/12-00:01        TST1                           
 DSUT1000  01/05/12-00:01  01/05/12-00:01        TST1                           
 DSUT1000  01/04/12-00:01  01/04/12-00:01        TST1                           
 DSUT1000  01/03/12-00:01  01/03/12-00:01        TST1                           
 DSUT1000  01/02/12-00:01  01/02/12-00:01        TST1                           
 DSUT1000  01/01/12-00:01  01/01/12-00:01        TST1                           
 DSUT1001  01/10/12-00:01  01/10/12-00:01        TST1                           
 DSUT1001  01/09/12-00:01  01/09/12-00:01        TST1                           
 DSUT1001  01/08/12-00:01  01/08/12-00:01        TST1                           
 DSUT1001  01/07/12-00:01  01/07/12-00:01        TST1                           
 DSUT1001  01/06/12-00:01  01/06/12-00:01        TST1                           
 DSUT1001  01/05/12-00:01  01/05/12-00:01        TST1                           
 DSUT1001  01/04/12-00:01  01/04/12-00:01        TST1                           
 DSUT1001  01/03/12-00:01  01/03/12-00:01        TST1                           
 DSUT1001  01/02/12-00:01  01/02/12-00:01        TST1 

Re: Removing Dups, select last date/time

Posted: Thu Jan 12, 2012 11:50 pm
by BillyBoyo
You don't have any duplicates.

You have sorted on the date/time after sorting on the jobname, so records would only be duplicates if their (start) date/times were also equal.

You haven't included the final digit of the minutes.

You will not get correct output if your data spans more than one year, as you are sorting the mm/dd/yy date as a character string, that is, on mm, dd, yy rather than yy, mm, dd.

You will need to sort on the longer key (jobname plus date/time) and then drop duplicates on the shorter key (jobname alone). There are a couple of recent examples.

Edit: Here is an example. http://www.ibmmainframeforum.com/syncsort-synctool/topic6819.html
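To see why SUM FIELDS=NONE removed nothing, here is a rough Python model of its behaviour: records count as duplicates only when the whole SORT key is equal, so with the date/time in the key every run is unique. This is a sketch of the idea, not SyncSort itself; the pairs stand in for the jobname and start date/time fields, and `sort_sum_none` is a hypothetical helper name.

```python
from itertools import groupby

# Sample runs from the thread, modelled as (jobname, start date/time) pairs.
RUNS = [
    ("DTST2520", "01/01/12-01:30"),
    ("DTST2520", "01/08/12-01:30"),
    ("DTST2002", "01/02/12-05:15"),
    ("DTST2002", "01/09/12-05:15"),
]

def sort_sum_none(records, key):
    """Rough model of SORT + SUM FIELDS=NONE: sort on `key`, then keep one
    record per distinct key value (here, the first one in sorted order)."""
    ordered = sorted(records, key=key)
    return [next(group) for _, group in groupby(ordered, key=key)]

# Key = jobname + date/time, as in the original SORT FIELDS: every run is unique.
full_key = sort_sum_none(RUNS, key=lambda r: (r[0], r[1]))
# Key = jobname only (the "shorter key"): one record per job remains.
job_only = sort_sum_none(RUNS, key=lambda r: r[0])

print(len(full_key))  # 4
print(len(job_only))  # 2
```

Note that the shorter key alone does not pick the *latest* run; that depends on the order the records arrive in, which is why the date/time still has to be sorted descending first.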

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 12:06 am
by Peter_Mann
Thanks Billy. I think I've included the entire date/time. I may have confused things a bit by not showing the input data relevant to the output (the right records), but the date/time is intact:
DSUT1000  DSUT1000 0003908 01/01/12-00:01  01/01/12-00:01     0:04  CTM-CONTROL
DSUT1001  DSUT1001 0003909 01/01/12-00:01  01/01/12-00:01     0:03  CTM-CONTROL
DSUT1002  DSUT1002 0003914 01/01/12-00:01  01/01/12-00:01     0:04  CTM-CONTROL
DSUT1003  DSUT1003 0003915 01/01/12-00:01  01/01/12-00:01     0:04  CTM-CONTROL
DSUT1007  DSUT1007 0003912 01/01/12-00:01  01/01/12-00:01     0:05  CTO-CONTROL
DSUT1007  DSUT1007 0003913 01/01/12-00:01  01/01/12-00:01     0:03  CTO-CONTROL
DSUT1008  DSUT1008 0003910 01/01/12-00:01  01/01/12-00:01     0:06  IOA-CONTROL
DSUT1004  DSUT1004 0003911 01/01/12-00:01  01/01/12-00:01     0:05  IOA-CONTROL

The above data is from the input file.
I reviewed the example you provided before posting, but I had some trouble following the logic and code; I'll take a harder look at it.
Thanks for your help.

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 12:29 am
by BillyBoyo
It generates a sequence number which is reset on change of value of a particular defined field, so in your case it would be the jobname. Then it excludes all those which do not have a sequence number of one, which will be the record which has sorted first.
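The sequence-number idea can be sketched in Python as follows. It assumes the records have already been sorted by jobname ascending and newest run first, and the helper names `number_within_group`, `keep_first_per_group` and `jobname` are hypothetical; it models what SEQNUM with RESTART= followed by an INCLUDE on the sequence number does, not the product's internals.

```python
# Records assumed to be sorted already: jobname ascending, newest run first.
SORTED_RUNS = [
    ("DSUT1000", "01/10/12-00:01"),
    ("DSUT1000", "01/09/12-00:01"),
    ("DSUT1001", "01/10/12-00:01"),
    ("DSUT1001", "01/09/12-00:01"),
]

def jobname(rec):
    """Group key: the jobname field of a (jobname, start) record."""
    return rec[0]

def number_within_group(records, group_key):
    """Pair each record with a counter that restarts at 1 whenever
    group_key changes (the idea behind SEQNUM with RESTART=)."""
    numbered = []
    previous = None  # no real jobname is None, so the first record starts at 1
    seq = 0
    for rec in records:
        current = group_key(rec)
        seq = 1 if current != previous else seq + 1
        numbered.append((seq, rec))
        previous = current
    return numbered

def keep_first_per_group(records, group_key):
    """Keep only the records numbered 1 (the idea behind INCLUDE=(...,EQ,1))."""
    return [rec for seq, rec in number_within_group(records, group_key) if seq == 1]

print([seq for seq, _ in number_within_group(SORTED_RUNS, jobname)])  # [1, 2, 1, 2]
print(keep_first_per_group(SORTED_RUNS, jobname))
# [('DSUT1000', '01/10/12-00:01'), ('DSUT1001', '01/10/12-00:01')]
```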

I'd sort on the date as yymmdd, even if your data will "never" include dates outside one month. You never know when you'll want to do a quick copy of a piece of "working" code, only to find that you have to change it...
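A small Python sketch of why yymmdd matters: comparing the raw mm/dd/yy-hh:mm strings as characters puts the month first, so a stamp from a previous year can look "newest". The December 2011 stamp below is made up purely for illustration, and `sortable_key` is a hypothetical helper that assumes all two-digit years fall in the same century.

```python
def sortable_key(stamp):
    """Rearrange 'mm/dd/yy-hh:mm' into 'yymmddhhmm' so that character
    order matches time order (assumes all years are in one century)."""
    mm, dd, yy = stamp[0:2], stamp[3:5], stamp[6:8]
    hh, mi = stamp[9:11], stamp[12:14]
    return yy + mm + dd + hh + mi

stamps = ["12/31/11-23:00", "01/10/12-00:01", "01/09/12-00:01"]

print(max(stamps))                    # 12/31/11-23:00  (wrong: Dec 2011 looks latest)
print(max(stamps, key=sortable_key))  # 01/10/12-00:01  (correct)
```

In SORT terms this corresponds to listing the year, month and day fields as separate keys in that order, rather than the date as one character field.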

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 12:46 am
by Peter_Mann
Got it! Getting someone else's eyes on it always helps; I was off by one on the date/time, these old eyes... I was able to get my results with a two-pass sort using your suggestion: I sorted on the date first to get the last value, then sorted again, removing the duplicate jobnames. Thanks so very much for your help.
I've used SyncSort to sort data since the late 70s and supported the product since the 90s; I've just never been asked to do any real data manipulation. Simple for most folks, I'd gather, but a bugger for me. Again, thanks.
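The two-pass approach described above can be modelled in Python like this: pass 1 orders by start date/time descending, and pass 2 orders by jobname only and keeps the first record of each job. Python's `sorted` is stable, so the newest run stays first within each job; my understanding is that in the SORT product the record kept by SUM FIELDS=NONE is only predictable when EQUALS is in effect, so that option is worth checking in the second step. The record layout and the `start_time` helper are assumptions for illustration.

```python
from datetime import datetime
from itertools import groupby

def start_time(record):
    """Parse the mm/dd/yy-hh:mm start stamp of a (jobname, stamp) record."""
    return datetime.strptime(record[1], "%m/%d/%y-%H:%M")

runs = [
    ("DTST2002", "01/02/12-05:15"),
    ("DTST2520", "01/01/12-01:30"),
    ("DTST2002", "01/09/12-05:15"),
    ("DTST2520", "01/08/12-01:30"),
]

# Pass 1: newest run first (like sorting on the date/time, descending).
pass1 = sorted(runs, key=start_time, reverse=True)

# Pass 2: sort on jobname only and keep the first record of each group.
# The stable sort preserves pass 1's newest-first order within each job.
pass2 = sorted(pass1, key=lambda r: r[0])
latest = [next(group) for _, group in groupby(pass2, key=lambda r: r[0])]

print(latest)  # [('DTST2002', '01/09/12-05:15'), ('DTST2520', '01/08/12-01:30')]
```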

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 1:35 am
by BillyBoyo
Glad you got it, glad I could help. I also remember the days when SORT sorted data :-)

I'm at a PC without an emulator at the moment. If you try the below, it might give you the result in one step. I'll test it later unless you get there first.

//SORT1 EXEC PGM=SORT                                             
//SORTIN  DD *                                                   
 DSUT1000  DSUT1000 0003908 01/01/12-00:01  01/01/12-00:01     0:04  CTM-CONTROL
 DSUT1001  DSUT1001 0003909 01/01/12-00:01  01/01/12-00:01     0:03  CTM-CONTROL
 DSUT1002  DSUT1002 0003914 01/01/12-00:01  01/01/12-00:01     0:04  CTM-CONTROL
 DSUT1003  DSUT1003 0003915 01/01/12-00:01  01/01/12-00:01     0:04  CTM-CONTROL
 DSUT1007  DSUT1007 0003912 01/01/12-00:01  01/01/12-00:01     0:05  CTO-CONTROL
 DSUT1007  DSUT1007 0003913 01/01/12-00:01  01/01/12-00:01     0:03  CTO-CONTROL
 DSUT1008  DSUT1008 0003910 01/01/12-00:01  01/01/12-00:01     0:06  IOA-CONTROL
 DSUT1004  DSUT1004 0003911 01/01/12-00:01  01/01/12-00:01     0:05  IOA-CONTROL
//SORTOUT DD SYSOUT=*                                             
//SYSOUT  DD SYSOUT=*                                             
//SYSIN   DD *                                                   
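* KEYS: JOBNAME ASCENDING, THEN START DATE/TIME NEWEST FIRST.
* THE DATE IS MM/DD/YY, SO COMPARE YY (35,2), MM (29,2), DD (32,2),
* THEN HH:MM (38,5).
  SORT FIELDS=(12,8,CH,A,35,2,CH,D,29,2,CH,D,32,2,CH,D,
               38,5,CH,D),EQUALS
* NUMBER THE RECORDS IN 121-125, RESTARTING AT 1 FOR EACH JOBNAME.
  OUTREC IFTHEN=(WHEN=INIT,OVERLAY=(121:SEQNUM,5,ZD,RESTART=(12,8)))
* KEEP ONLY THE LATEST RUN OF EACH JOB AND DROP THE SEQNUM.
  OUTFIL INCLUDE=(121,5,ZD,EQ,1),BUILD=(1,120)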
/*


I've tried to double-check, so I hope everything is in the right place. I've assumed a record length of 120 for the report data coming in; change the 120/121 if necessary.

Basically, it sorts the records into the correct sequence, and then, while building the output records, it appends a sequence number that restarts on each change of jobname. The OUTFIL INCLUDE then selects only the latest run of each job (sequence number one, because the date/time is sorted descending), and the remaining records with that same jobname do not appear in the output.
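Because the whole job depends on the key positions lining up with the report layout, here is a Python sketch for checking them: the hypothetical `field` helper converts a SORT-style 1-based start and length into a string slice. It assumes a blank in column 1 of each record, as in the ISPF view at the top of the thread; the sample line is the second DTST2520 run from that view. The key order checked is jobname (12,8), then yy (35,2), mm (29,2), dd (32,2) and hh:mm (38,5), since the dates are mm/dd/yy.

```python
# One input line, assumed to have a blank in column 1 as in the ISPF view.
RECORD = " DTST2520  DTST2520 0005114 01/08/12-01:30  01/08/12-01:30     0:09  SIN"

def field(record, start, length):
    """Return the field at SORT-style 1-based `start` with `length` bytes."""
    return record[start - 1:start - 1 + length]

# Jobname, then the start date/time pieces as yy, mm, dd, hh:mm.
KEY_FIELDS = [(12, 8), (35, 2), (29, 2), (32, 2), (38, 5)]
print([field(RECORD, s, n) for s, n in KEY_FIELDS])
# ['DTST2520', '12', '01', '08', '01:30']

# The original 29,13 stops one byte short of the stamp; 29,14 covers all of it.
print(field(RECORD, 29, 13))  # 01/08/12-01:3
print(field(RECORD, 29, 14))  # 01/08/12-01:30
```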

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 1:47 am
by dick scherrer
Hello,

simple for most folks I'd gather,
You might be surprised at how few folks are knowledgeable enough to do much more than simply putting things in sequence, merging things, or selectively copying things. We are fortunate to have several people (including developers/support people from IBM and Syncsort) who do these advanced tasks very well 8-)

In fact, several of the sites I've supported will not allow many of the advanced features to be used. . .

It does seem to be getting better, though :)

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 2:16 am
by Peter_Mann
BillyBoyo wrote: Glad you got it, glad I could help. I also remember the days when SORT sorted data :-)

Billy, thank you, this looks a lot cleaner than what I came up with. I'll give it a go. I don't think we're too worried about the record length, as long as I can supply all the fields they need.
:D

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 2:23 am
by Peter_Mann
dick scherrer wrote: We are fortunate to have several people (including developers/support people from IBM and Syncsort) who do these advanced tasks very well 8-)

I've come across some great examples from the support folks here. I've used the search tool more often than I've posted, because of the time and effort you all put into explaining the tools and options in your spare time.
Thanks to everyone for all your time and efforts.

Re: Removing Dups, select last date/time

Posted: Fri Jan 13, 2012 2:35 am
by dick scherrer
Hi Peter,

I've used the search tool more often than I've posted
Cool 8-)

We try to encourage folks to search, but someone is usually here when there are questions.

Good to hear the forum is useful to you :)

d