Compare set of records and filter them



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

Compare set of records and filter them

Postby Aki88 » Wed Feb 01, 2017 7:45 pm

Hello,

Apologies for a reapeated query.

1. Two FB datasets; DS-A, LRECL=19, has duplicates, is unsorted; DS-B, LRECL=600, has 19 byte data at 'n' column (fixed position), these may be repeated, records are not duplicate but valid repetitions (data is arranged on the basis of SEQ number in column 1 to 6, ascending, though can be unsorted on the aforementioned 19 bytes).
2. Data content-- DS-A can have upwards of 1.5 mill records, DS-B can have upwards of 500k records, as of now < 1 mill.
3. DS-B structure:
--> Has a header record with some-text and date (not necessarily current date)
--> Has detail records with aforementioned sequence in column 1 to 6; yes, the sequence will be repeated from '1' once the limit of '999999' is completed.
--> Has 19 byte data at column 'n' which is to be matched with DS-A
--> Has another 19 byte number in the format: bbbb.cc, on another column
--> Has a tail record which has some text in column 1 to 3, detail-record count ONLY (total count minus head and tail), and lastly the summation of the 19 byte number which is in format bbbb.cc, translated to 15 byte ZD

Requirement: Omit all the records of DS-A from DS-B, rebuild DS-B in the original order with columns 1 to 6 resequenced, and tail rebuilt with updated COUNT and summation.

Since DS-A was large in size (and also because I couldn't think of anything better), hence I went the JOINKEYS-way; below is the current solution:

Step-1:

DS-A is being sorted and SUM FIELDS=NONEd at this point. Original tail record is being dropped here. The set of matched records is being saved for reconciliation.


//SYSIN    DD *                                              
 JOINKEYS FILES=F1,FIELDS=(1,19,A)                          
 JOINKEYS FILES=F2,FIELDS=(39,19,A)                          
 JOIN UNPAIRED F1,F2                                        
 REFORMAT FIELDS=(F1:1,19,F2:1,600,?)                        
 SORT FIELDS=COPY                                            
 OUTFIL FNAMES=BOTH,REMOVECC,                                
        INCLUDE=(620,1,CH,EQ,C'B'),                          
        BUILD=(20,600)                                      
 OUTFIL FNAMES=ONLYDSB,REMOVECC,                            
        INCLUDE=((620,1,CH,EQ,C'2'),AND,(20,3,CH,NE,C'999')),
        BUILD=(20,600)                                      
/*                                                          
 


Step-2:
Handled the header record, and rebuilt the tail.


//TOOLIN   DD *                                    
 DATASORT FROM(IN) TO(OUT) USING(SRT1) FIRST      
/*                                                
//SRT1CNTL DD *                                    
 SORT FIELDS=(1,6,CH,A)                            
 OUTFIL FNAMES=OUT,REMOVECC,                      
        IFTHEN=(WHEN=(1,3,CH,NE,C'HEA'),          
                BUILD=(SEQNUM,6,ZD,7,594)),        
        TRAILER1=(C'999',                          
                  COUNT-1=(M11,LENGTH=9),          
                  TOT=(258,18,UFF,TO=ZD,LENGTH=15))
/*                                                
 


Query: Can this be done in a more simpler way, can we drop JOINKEYS altogether, an alternate approach maybe?

Thank you.
Aki88
 
Posts: 371
Joined: Tue Jan 28, 2014 1:52 pm
Has thanked: 33 times
Been thanked: 35 times

Re: Compare set of records and filter them

 

Re: Compare set of records and filter them

Postby BillyBoyo » Thu Feb 02, 2017 3:55 pm

Since the JOINKEYS is the way to match, the only way to be able to drop it is to use another language :-)

You only need REMOVECC on OUTFIL if you are using reporting-features to generate a file rather than a report. With no use if reporting-features, REMOVECC is ignored.

You would be able to integrate the second step into the first. More work (for you) less resources (to pay for).

Exactly how depends largely on exactly what your headers and trailers look like.

At a quick reading, I'm not sure of your purpose of using SUM FIELDS=NONE in JNFnCNTL dataset that you've not shown.
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 264 times

Re: Compare set of records and filter them

Postby Aki88 » Thu Feb 02, 2017 4:11 pm

Hello Billy,

Thank you for looking this one up.

'REMOVECC' on OUTFIL is and old bad-habit, something I tend to type-in eveytime I use OUTFIL. Trying to unlearn it.

Header record explicitly looks as:

XXXXXDD/MM/YYYY< rest of the chunk is 585 spaces >   the date here comes from an external source and can be any day on a calender and the Xs are unique identifier
 


Tail:

999< 9 bytes ZD for count of records><15bytes ZD for balance totals>< 573 bytes of space >
 


The SUM FIELDS=NONE is to drop the duplicate key-records from the DS-A, this is to avoid the extra cartesian JOINs which will happen if duplicate records are present in DS-A.
This in turn will have an impact on the summation of checksum value which will result in incorrect reconciliation-balance calculation for the dropped records.

Hope I was able to bring more clarity on the process.
What I was thinking of was to add an identifier for HEADER during JOINKEYS pre-processing, then use 'IFTHEN=(WHEN' on the OUTFIL while building the data and handle the header and rebuild of records. But after some thought to it, I could not visualize it, hence the current solution.

Any guidance is much appreciated.

Thank you.
Aki88
 
Posts: 371
Joined: Tue Jan 28, 2014 1:52 pm
Has thanked: 33 times
Been thanked: 35 times


Return to DFSORT/ICETOOL/ICEGENER

 


  • Related topics
    Replies
    Views
    Last post