remove duplicate records



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

remove duplicate records

Postby Namrata_Singri » Tue Jun 21, 2016 12:21 pm

Hi

I need to remove duplicate records from a file

Input file
field timestamp
123 2016-06-16-08.39.38.523669
123 2016-06-20-07.39.38.523669
456 2016-06-16-08.39.38.523669


the ouput file should be i.e., it should contain latest record from duplicate records,as in the input file we can see the duplicate record is not having same timestamp,in the output file i need data with 2016-06-20-07.39.38.523669 timestamp not 2016-06-16-08.39.38.523669.
Output file:
field timestamp
123 2016-06-20-07.39.38.523669
456 2016-06-16-08.39.38.523669

Thanks
Namrata_Singri
 
Posts: 13
Joined: Mon Jun 20, 2016 6:18 pm
Has thanked: 2 times
Been thanked: 0 time

Re: remove duplicate records

Postby Aki88 » Tue Jun 21, 2016 1:50 pm

Hello,

Like your earlier post, there are multiple problems with this query; for instance the SORT keys are still not clearly defined; is it only the first three characters, or are the fields starting at column 5 going on for 10 bytes (date I guess) to be considered in SORT, or is it some other way.
Because depending on this, the duplicate/no-duplicate case changes.

Though, if only first three bytes are to be taken, then the below DFSORT solution gives the requested output.
Note that if you have SYNCSORT, then things might be slightly different; also if the aforementioned points pertaining to SORT keys, change, then the below DFSORT solution will require modifications:


000007 //STEP001  EXEC PGM=ICETOOL            
000008 //DFSMSG   DD SYSOUT=*                  
000009 //TOOLMSG  DD SYSOUT=*                  
000010 //IN       DD *                        
=COLS> ----+----1----+----2----+----3----+----4
000011 123 2016-06-16-08.39.38.523669          
000012 123 2016-06-20-07.39.38.523669          
000013 456 2016-06-16-08.39.38.523669          
000014 /*                                      
000015 //OUT      DD SYSOUT=*                  
000016 //TOOLIN   DD *                        
000017  SELECT FROM(IN) TO(OUT) ON(1,3,CH) LAST
000018 /*                                      
 


Output:


123 2016-06-20-07.39.38.523669
456 2016-06-16-08.39.38.523669
 


Refer DFSORT Application Programming Guide, to understand what each keyword means.

Hth.
Aki88
 
Posts: 381
Joined: Tue Jan 28, 2014 1:52 pm
Has thanked: 33 times
Been thanked: 36 times

Re: remove duplicate records

Postby Namrata_Singri » Tue Jun 21, 2016 2:21 pm

Hi

I want to use sortcard and need to consider feild and timestamp both as sort keys.
The solution you provided will not meet my requirement.
Namrata_Singri
 
Posts: 13
Joined: Mon Jun 20, 2016 6:18 pm
Has thanked: 2 times
Been thanked: 0 time

Re: remove duplicate records

Postby Aki88 » Tue Jun 21, 2016 2:31 pm

Namrata_Singri wrote:Hi

I want to use sortcard and need to consider feild and timestamp both as sort keys.
The solution you provided will not meet my requirement.


Dearie, what is the position of timestamp in your input? If we assume that same starts at 16th position, then you do not have duplicates in your input; it is as simple as that.
The first record has the value: 08.39.38.523669, while the second record has the value: 07.39.38.523669; which as far as I can tell are not duplicate.

Hope this helps.


Edit: Earlier shared solution is indeed a DFSORT solution (ICETOOL is a DFSORT utility), with a SORT card. ;)
Aki88
 
Posts: 381
Joined: Tue Jan 28, 2014 1:52 pm
Has thanked: 33 times
Been thanked: 36 times

Re: remove duplicate records

Postby Namrata_Singri » Tue Jun 21, 2016 2:47 pm

timestamp and date is only one field (2016-06-16-08.39.38.523669)it is starting from 12th position and length is 26.
i want the latest one in the output file, i guess i have confused you.


Let me tell you clearly,as you can see in the file 123 is occuring 2 times but its timestamp is different,first record is having 2016-06-16-08.39.38.52366 and 2nd record is having 2016-06-20-07.39.38.523669,i have to remove this duplicate record so that the latest timestamp should come in the output i.e., 123 with 2016-06-16-08.39.38.52366 this timestamp.
Hope its clear now
Namrata_Singri
 
Posts: 13
Joined: Mon Jun 20, 2016 6:18 pm
Has thanked: 2 times
Been thanked: 0 time

Re: remove duplicate records

Postby Aki88 » Tue Jun 21, 2016 2:55 pm

Hello,

Namrata_Singri wrote:timestamp and date is only one field (2016-06-16-08.39.38.523669)it is starting from 12th position and length is 26.
i want the latest one in the output file, i guess i have confused you.


Let me tell you clearly,as you can see in the file 123 is occuring 2 times but its timestamp is different,first record is having 2016-06-16-08.39.38.52366 and 2nd record is having 2016-06-20-07.39.38.523669,i have to remove this duplicate record so that the latest timestamp should come in the output i.e., 123 with 2016-06-16-08.39.38.52366 this timestamp.
Hope its clear now


As stated earlier, if above is the case then please refer the earlier shared solution; it does what you need using DFSORT.

NOTE (with extra emphasis) that it assumes that the arrangement of records is such that the 'latest time-stamp' is always appearing as the last, in the sequence of 'duplicate field-1'.

If otherwise, tweak the shared SORT card by referring the 'DFSORT Application Programming Guide', if you get stuck, share what you have tried, and we'll be able to guide you.
Aki88
 
Posts: 381
Joined: Tue Jan 28, 2014 1:52 pm
Has thanked: 33 times
Been thanked: 36 times

Re: remove duplicate records

Postby Namrata_Singri » Tue Jun 21, 2016 3:10 pm

thanks...will get back to you if i face any issue
Namrata_Singri
 
Posts: 13
Joined: Mon Jun 20, 2016 6:18 pm
Has thanked: 2 times
Been thanked: 0 time


Return to DFSORT/ICETOOL/ICEGENER

 


  • Related topics
    Replies
    Views
    Last post