How to add numbers to groups of records...

Posted: Wed May 24, 2017 6:25 pm
by Dmitriy
Hello colleagues!
I need to add a sequence number to each group of records, like unique word IDs.

input:

630774221
630774221
963495850
963495850
963495850
345695561
678609548
678609548
678609548
918367402
279702180
 


output:

630774221 0001
630774221 0001
963495850 0002
963495850 0002
963495850 0002
345695561 0003
678609548 0004
678609548 0004
678609548 0004
918367402 0005
279702180 0006
 


The file contains billions of records, so how can I do this with maximum performance?
Can you help me, please? Thanks in advance!

Re: How to add numbers to groups of records...

Posted: Mon May 29, 2017 11:29 am
by Aki88
Hello,

A few questions before we look at the solution:
a. You do not want the records sorted while the ID is added? The output you've shown retains the original record order.
b. Can a group key appear again further down the file? If so, how do you want that handled? For example:

630774221
630774221
963495850
963495850
963495850
345695561
678609548
678609548
678609548
630774221 --> here this appears again
630774221 --> here this appears again  
918367402
279702180
 


c. You mention that the input can hold billions of records, but the sample output shows a 4-byte identifier, which can accommodate at most 9999 groups.
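
As a quick check of the arithmetic in point c, and assuming the counter starts at 1 as in the sample output, an n-digit decimal ID can label at most 10^n - 1 groups. A minimal Python sketch (the helper name `max_group_ids` is made up for illustration):

```python
def max_group_ids(width: int) -> int:
    """Largest value an n-digit decimal counter that starts at 1 can hold."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return 10 ** width - 1


# A 4-digit ID, as in the requested output, tops out at 9999 groups.
print(max_group_ids(4))
```

So the ID width has to be chosen from the expected number of groups, not from the number of records.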

The solution is fairly straightforward as long as the complexities above do not apply: group the records and PUSH an ID onto each one. DFSORT allows a zoned decimal ID of up to 15 digits, so the maximum value is 999,999,999,999,999:


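//* STEP01 is a placeholder step name; add your own JOB card.
//STEP01   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD *
630774221
630774221
963495850
963495850
963495850
345695561
678609548
678609548
678609548
918367402
279702180
/*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
* Copy in input order; start a new group when columns 1-9 change
* and push a 15-digit group ID into column 11 onward.
 SORT FIELDS=COPY
 INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,9),
               PUSH=(11:ID=15))
/*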
 


Output:


630774221 000000000000001
630774221 000000000000001
963495850 000000000000002
963495850 000000000000002
963495850 000000000000002
345695561 000000000000003
678609548 000000000000004
678609548 000000000000004
678609548 000000000000004
918367402 000000000000005
279702180 000000000000006
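
For anyone who wants to check the grouping logic away from the mainframe, here is a rough Python emulation. It is not DFSORT, only a sketch based on my understanding that KEYBEGIN starts a new group each time the key value differs from the previous record's. The function name `push_group_id` and its parameters are made up for illustration; `width=4` mirrors the 4-digit ID in the original request.

```python
from itertools import groupby
from typing import Iterable, Iterator


def push_group_id(records: Iterable[str], key_len: int = 9,
                  width: int = 4) -> Iterator[str]:
    """Append a zero-padded ID that increments whenever the leading key changes.

    Records are read once, in input order, so nothing is sorted or
    held in memory beyond the current record.
    """
    group_id = 0
    # groupby only merges *consecutive* equal keys, like a key-change break.
    for _, group in groupby(records, key=lambda rec: rec[:key_len]):
        group_id += 1
        if group_id >= 10 ** width:
            raise OverflowError(f"more groups than a {width}-digit ID can hold")
        for rec in group:
            # Key in columns 1-9, a blank in column 10, ID from column 11.
            yield f"{rec[:key_len]} {group_id:0{width}d}"


# Question b: a key that comes back later is treated as a new group.
sample = ["630774221", "630774221", "963495850", "345695561", "630774221"]
for line in push_group_id(sample):
    print(line)
```

In this emulation the second run of 630774221 gets ID 0004 rather than reusing 0001. If a repeated key must keep its first ID, the data would have to be sorted on the key first (losing the original order) or handled with a lookup.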
 

Re: How to add numbers to groups of records...

Posted: Mon May 29, 2017 11:40 am
by enrico-sorichetti
the number of records is NOT related to the number of groups/identifiers
:mrgreen:

Re: How to add numbers to groups of records...

Posted: Mon May 29, 2017 11:53 am
by Aki88
Hello Mr. Sorichetti,

enrico-sorichetti wrote: the number of records is NOT related to the number of groups/identifiers
:mrgreen:


Yes, I completely agree. Going by the representative data, though, some keys have only a single record (instead of paired/grouped entries), so the number of groups could be large.
Hence the SORT card above is written for the maximum possible number of groups; TS is expected to tweak the ID length to fit his needs.
I'd be very surprised if ONLY 9999 groups were possible in the actual 'billions of records'. :)

Best regards.

Re: How to add numbers to groups of records...

Posted: Mon May 29, 2017 12:56 pm
by prino
Dmitriy wrote: ... and file contains billions of records ...

And if my uncle was a woman he'd be my aunt...

Which PHB has come up with this ludicrous time-wasting requirement?