Page 1 of 1

remove duplicate records by considering date and timestamp

PostPosted: Mon Jun 20, 2016 6:49 pm
by Namrata_Singri
Hi

I have 1 file which consist of duplicate records like

Feild1   feild2    date(yyyymmdd)         timestamp(hhmmss)
123       ABC      20160620                         083938
123       ABC      20160620                         053623
145       LMN      20160619                        123456
145       LMN      20160620                        116712
178       PQR      20160620                        051122
178       PQR      20160620                        051122


i need to remove duplicate records and write into another file but based on feild1+date+timestamp combination. As shown above, it should consider the latest record.
i.e., the output file must be

Feild1   feild2    date(yyyymmdd)         timestamp(hhmmss)
123       ABC      20160620                         083938
145       LMN      20160620                        116712
178       PQR      20160620                        051122


it should write only one record in output file if the date and timestamp is same in input file like record3

(178       PQR      20160620    051122).


Coded

Re: remove duplicate records by considering date and timest

PostPosted: Mon Jun 20, 2016 8:09 pm
by Namrata_Singri
can you please give me the solution using sortcard

Re: remove duplicate records by considering date and timest

PostPosted: Mon Jun 20, 2016 9:07 pm
by Akatsukami
Please remember to use the Code tags when posting anything where alignment may be critical.

Additionally, you should post your questions in a forum relevant to the desired answer. JCL is, of course, unable to do what you have asked. I have moved your query to the DFSORT forum; if your shop uses a different sort product, let me know and I will move the query to the relevant forum.

Re: remove duplicate records by considering date and timest

PostPosted: Mon Jun 20, 2016 10:07 pm
by ottoelflaco
I m not sure if I understand you. You need two output?:
- Output1: latest record without duplicated
- Output2: duplicated with the same date+time


If it is like I have understood, maybe this work:

//SORT10B  EXEC PGM=SORT          
//SORTIN   DD *                                          
123       ABC      20160620                        083938
123       ABC      20160620                        053623
145       LMN      20160619                        123456
145       LMN      20160620                        116712
178       PQR      20160620                        051122
178       PQR      20160620                        051122
//SORTOUT  DD DSN=dataset1          
//SORTXSUM DD DSN=dataset2                                  
  SORT FIELDS=(1,3,CH,A,11,3,CH,A,20,8,CH,D,52,6,CH,D)  
  SUM FIELDS=NONE,XSUM                                  
//SORT10B  EXEC PGM=SORT
//SORTIN   DD DSN=dataset1
//SORTOUT  DD DSN=dataset3                    
  SORT FIELDS=(1,3,CH,A,11,3,CH,A),EQUALS              
  SUM FIELDS=NONE                                      
 


dataset2:

178       PQR      20160620                        051122
 


dataset3:

123       ABC      20160620                        083938
145       LMN      20160620                        116712
178       PQR      20160620                        051122
 

Re: remove duplicate records by considering date and timest

PostPosted: Tue Jun 21, 2016 12:03 pm
by Namrata_Singri
i dont need 2 output, only 1 output file

Re: remove duplicate records by considering date and timest

PostPosted: Tue Jun 21, 2016 1:04 pm
by Aki88
Hello,

Couple of queries:
a. Which SORT product you have: DFSORT or SYNCSORT; the solution provided by Mr. ottoelflaco is a SYNCSORT solution going by the 'SUM FIELDS=NONE,XSUM' statement and will not work with DFSORT.
b. The time-stamp data start and end-position are not consistent for all records; can you please specify the column numbers where they start; or are your records of variable length?
c. The last line of your post is tad-bit confusing; when you say 'write only one record', do you mean you want only this one record written in output or the other two records with different value in field-1 as well:

Namrata_Singri wrote:it should write only one record in output file if the date and timestamp is same in input file like record3

(178       PQR      20160620    051122).


Coded


d. Do you want the header line preserved in output?
Namrata_Singri wrote:

Feild1   feild2    date(yyyymmdd)         timestamp(hhmmss)
123       ABC      20160620                         083938
123       ABC      20160620                         053623
145       LMN      20160619                        123456
145       LMN      20160620                        116712
178       PQR      20160620                        051122
178       PQR      20160620                        051122


e. Do you want the final output to be sorted or do you want the original order preserved?
f. Lastly, your SORT keys vs your output do not match, for example - if your SORT key is Field-1, date and time-stamp, then both records for field-1 = 123, will be retained, as 'timestamp' is not duplicate there.

If the above queries pertaining to the SORT key are cleared, then this will require a simple SUM FIELDS=NONE; in case this does get a bit complicated, then an ICETOOL with DISCARD operator should give you what is needed.