
Clunky DFSORT

Posted: Mon May 09, 2016 6:50 pm
by Aki88
Hello,


 OPTION SKIPREC=60000                                              
 OMIT COND=(5,14,CH,EQ,C'00000000000000',OR,                      
            5,14,CH,EQ,C'99999999999999')                          
 INREC IFTHEN=(WHEN=(79,4,CH,EQ,C'XXXX'),                          
                    PARSE=(%01=(STARTAFT=C'<idrequest>',          
                                ENDBEFR=C'</idrequest>',FIXLEN=8)),
                      BUILD=(1:1,4,5:%01,13:C'¦',14:5,4000)),      
       IFTHEN=(WHEN=(79,4,CH,EQ,C'YYYY'),                          
                    PARSE=(%02=(STARTAFT=C'?><',FIXLEN=4)),        
               BUILD=(1:1,4,5:259,8,13:C'¦',14:5,4000))            
 OUTFIL REMOVECC,VTOF,BUILD=(1:5,4013),VLFILL=C' '                
 SORT FIELDS=(1,8,CH,A)                                            
 SUM FIELDS=NONE                                                  
 


We have a VB input file (relevant attributes given below) which has over 60 million records in XML layout (this dataset is a combined, unsorted data pool of over 50 datasets from different sources). The records are a mix of variable-format and fixed-format records: in fixed-format records, data is present at a specific location controlled by record-layout copybooks, whereas in variable-layout records the data is free-form. Variable record data is built by reference modification and padding of XML tags.


Organization  . . . : PS  
Record format . . . : VB  
Record length . . . : 30018
Block size  . . . . : 30022
 


Both record types present in this dataset have an identifier (at 259,8 for fixed layout; immediately preceding the </idrequest> tag for variable layout). The aim is to extract the records with unique identifiers from the input file.

Current approach:
a. Since the input is variable length, extend the records temporarily on the left.
b. Put the identifier in this extended field.
c. Once INREC processing is completed, SORT the data and apply SUM FIELDS=NONE on the extended key.

Problem with this approach:
a. I feel that this code can be made better, much better; it is very clunky at the moment.
b. If the record count increases by even a million, the SORT goes haywire.
c. To fix the problem in point b, a two-step approach is taken: build the INREC data first, then have another step which SORTs and applies SUM FIELDS=NONE on the key position.

Any suggestions on trimming the above SORT card are much appreciated.

PS: The SKIPREC is to avoid a junk chunk of data, and can be ignored. The 'XXXX' and 'YYYY' are identifiers that segregate the fixed/variable records. Xs are variable records, Ys are fixed records.

Thank you.

Re: Clunky DFSORT

Posted: Tue May 10, 2016 11:01 pm
by BillyBoyo
OPTION SKIPREC=60000                                              
 OMIT COND=(5,14,CH,EQ,C'00000000000000',OR,                      
            5,14,CH,EQ,C'99999999999999')                          
 INREC IFTHEN=(WHEN=(79,4,CH,EQ,C'XXXX'),                          
                    PARSE=(%01=(STARTAFT=C'<idrequest>',          
                                ENDBEFR=C'</idrequest>',FIXLEN=8)),
                      BUILD=(1:1,4,5:%01,13:C'¦',14:5,4000)),      
       IFTHEN=(WHEN=(79,4,CH,EQ,C'YYYY'),                          
                    PARSE=(%02=(STARTAFT=C'?><',FIXLEN=4)),        
               BUILD=(1:1,4,5:259,8,13:C'¦',14:5,4000))            
 OUTFIL REMOVECC,VTOF,BUILD=(1:5,4013),VLFILL=C' '                
 SORT FIELDS=(1,8,CH,A)                                            
 SUM FIELDS=NONE  


OK, several things. Firstly, unless you are leaving "gaps" (which will be automatically filled with blanks) it is only a headache to include column-numbers in a BUILD. In all your uses, the data naturally occupies the next available position. For instance:

BUILD=(1,4,259,8,C'¦',5,4000)


You have a PARSE for %02, but you don't use it. So not needed at all.

Unless you are using the OUTFIL reporting features, no carriage-control character (CC) is generated, so there is no need for REMOVECC. The default fill for VTOF is blank, so there is no need for VLFILL either.

The SORT is not doing what you think. It does not matter what order you put the control cards in; SORT will process them exactly how they should be processed. In your case that is OMIT COND, INREC, SORT, SUM, OUTFIL. By the time the SORT runs, INREC has already put the identifier at position 5 (after the 4-byte RDW), which means your SORT key is wrong: you need it to be 5,8.
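
A rough sketch of the cards with the points so far applied. It is untested and assumes the field positions and lengths from the original post are right. Column numbers are dropped from BUILD, the unused %02 PARSE is gone, REMOVECC and VLFILL are removed, and the SORT key is 5,8. The cards are written in processing order purely for readability; the order coded makes no difference.

```
 OPTION SKIPREC=60000
 OMIT COND=(5,14,CH,EQ,C'00000000000000',OR,
            5,14,CH,EQ,C'99999999999999')
 INREC IFTHEN=(WHEN=(79,4,CH,EQ,C'XXXX'),
               PARSE=(%01=(STARTAFT=C'<idrequest>',
                           ENDBEFR=C'</idrequest>',FIXLEN=8)),
               BUILD=(1,4,%01,C'¦',5,4000)),
       IFTHEN=(WHEN=(79,4,CH,EQ,C'YYYY'),
               BUILD=(1,4,259,8,C'¦',5,4000))
 SORT FIELDS=(5,8,CH,A)
 SUM FIELDS=NONE
 OUTFIL VTOF,BUILD=(5,4013)
```

After INREC the record is RDW (1,4), key (5,8), separator (13,1), then the original data, so SORT and SUM work on 5,8 and OUTFIL drops the RDW by starting at 5.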

I assume, since you have no other treatment for otherwise, that XXXX and YYYY are the only things you can have. Which means you can use WHEN=NONE rather than the second condition (so like an ELSE/WHEN OTHER).
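
As a sketch only, and assuming every record that is not XXXX really is a fixed-layout YYYY record, the second condition could be replaced like this:

```
 INREC IFTHEN=(WHEN=(79,4,CH,EQ,C'XXXX'),
               PARSE=(%01=(STARTAFT=C'<idrequest>',
                           ENDBEFR=C'</idrequest>',FIXLEN=8)),
               BUILD=(1,4,%01,C'¦',5,4000)),
       IFTHEN=(WHEN=NONE,
               BUILD=(1,4,259,8,C'¦',5,4000))
```

If anything other than XXXX and YYYY can turn up, WHEN=NONE would treat it as a fixed-layout record and pick up whatever is at 259,8, so in that case keep the explicit test.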

Are your keys contiguous on the input data? If so, you don't need the SORT and SUM, you can instead actually use the OUTFIL reporting features, with SECTIONS and TRAILER3.

Some representative sample data and expected output (shortened data) would help to see what you are expecting to happen.

Re: Clunky DFSORT

Posted: Wed May 11, 2016 10:40 am
by Aki88
BillyBoyo wrote: Which means your SORT key is wrong, you need it to be 5,8.


Thank you Billy; my bad for totally goofing up the SORT key. I missed it in the whole mess, forgetting that OUTFIL is processed at the end.
The PARSE for %02 is leftover code from a previous test run :oops: I'll clean it up.

VLFILL- yup, that should definitely save me a bunch, thank you.
A specific check for 'YYYY' was added to avoid junk data turning up among the fixed-length records; that is what had gone wrong during the initial runs, which were originally coded with WHEN=NONE.

Have added the sample input/output records for variable type data below (the three lines in the text pasted are but one - a single continuous record spanned across 30k LRECL). The position/length of 'VALUE NEEDED AS SORT KEY' can change depending upon the type of data that is being pushed in. The length of the complete record will also vary depending upon the type of records in question.
In output, this key needs to come at the start of the record, so that SORTing can be carried out on this data.


Sample Input:
YYMMDDTIME-VAR<RANDOM GENERATED CONSTANT><?xml version="1.0"?><XXXX><ident-1><ident-2>Another Value Here</ident-2><Date-Time>MMMMMMMMNNNNNNN</Date-Time><idrequest>VALUE NEEDED AS SORT KEY</idrequest><ident-3>Some Random Number Here</ident-3></ident-1><ident-4><ident-5><ident-6>Another Random Number Here</ident-6></ident-5></ident-4></XXXX>


Sample Output:
VALUE NEEDED AS SORT KEY|YYMMDDTIME-VAR<RANDOM GENERATED CONSTANT><?xml version="1.0"?><XXXX><ident-1><ident-2>Another Value Here</ident-2><Date-Time>MMMMMMMMNNNNNNN</Date-Time><idrequest>VALUE NEEDED AS SORT KEY</idrequest><ident-3>Some Random Number Here</ident-3></ident-1><ident-4><ident-5><ident-6>Another Random Number Here</ident-6></ident-5></ident-4></XXXX>

 


Lastly, completely agree on 'it is only a headache to include column-numbers in a BUILD'; it has become a habit of sorts to add the column; will work on fixing it :)

Once again, thank you for the help.