
Count and segregate records basis conditions

Posted: Wed Feb 08, 2017 6:26 pm
by Aki88
Hello,

Scenario:
a. 10 to 20 fixed-length VSAM KSDS; key length: 22; 40 to 50 million records per DS.
b. In each DS, one particular field contains a date in packed-decimal format; this is the field to be tested.

Requirement: For each record in all the VSAM DS, test this date field against several date ranges, and count the records falling in each range, keeping a separate count per range.

A very crude pseudo-logic flow would appear as:


PERFORM THIS LOOP INCREMENTING N FROM 1 BY 1, 20 TIMES, FOR EACH SET OF VSAM DS
        READ DATASET-N UNTIL END                                    
               EVALUATE TRUE                                        
                        WHEN DATE-DS-N  IS < RANGE-1 AND IS > RANGE-2
                             INCREMENT COUNTER-1                    
                        WHEN DATE-DS-N  IS < RANGE-3 AND IS > RANGE-4
                             INCREMENT COUNTER-2                    
               ......                                                
               ......                                                
                        WHEN OTHER                                  
                             INCREMENT COUNTER-M                    
               END-EVALUATE                                          
        END-READ                                                    
END-PERFORM.                                                        
      DISPLAY 'COUNTER-1:' COUNTER-1                                
      ....                                                          
      ....                                                          
      DISPLAY 'COUNTER-M:' COUNTER-M                                
 


This can easily be achieved with multiple passes over the data. For a single pass, I coded the card below:


//SYSIN    DD *                                    
 MERGE FIELDS=(4,19,CH,A)                          
 OUTFIL FNAMES=OUT0,INCLUDE=(MY DATE CHECK RANGE1),
        REMOVECC,NODETAIL,                        
        TRAILER1=(COUNT15)                        
 OUTFIL FNAMES=OUT1,INCLUDE=(MY DATE CHECK RANGE2),
        REMOVECC,NODETAIL,                        
        TRAILER1=(COUNT15)                        
 OUTFIL FNAMES=OUT2,INCLUDE=(MY DATE CHECK RANGE3),
        REMOVECC,NODETAIL,                        
        TRAILER1=(COUNT15)                        
 OUTFIL FNAMES=OUT3,INCLUDE=(MY DATE CHECK RANGE4),
        REMOVECC,NODETAIL,                        
        TRAILER1=(COUNT15)                        
 OUTFIL FNAMES=OUT4,INCLUDE=(MY DATE CHECK RANGE5),
        REMOVECC,NODETAIL,                        
        TRAILER1=(COUNT15)                        
/*                                                
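For illustration only, here is how one of the placeholder conditions in the card above (MY DATE CHECK RANGE1) might be coded. The field position (30), its length (5 bytes, i.e. 9 packed digits holding CCYYMMDD) and the range limits are hypothetical assumptions, not values from the actual data sets.

```
* HYPOTHETICAL: DATE ASSUMED AT POSITION 30, 5 BYTES, PACKED CCYYMMDD
 OUTFIL FNAMES=OUT0,
        INCLUDE=(30,5,PD,GE,+20170101,AND,30,5,PD,LE,+20170131),
        REMOVECC,NODETAIL,
        TRAILER1=(COUNT15)
```

Because a packed CCYYMMDD value compares numerically in date order, a GE/LE pair is enough to bound each range; if the real field uses a different layout, the constants have to follow that layout instead.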
 


This fails when the data goes out of (sorted) order across two or more VSAM DS, because MERGE requires its input to be in sequence on the MERGE FIELDS: [ICE068A 0 OUT OF SEQUENCE <SORTIN DSN here>]

Query: Can I use ICETOOL COUNT to achieve this, or some similar logic that handles all the VSAM DS in one go and produces a simple count? I tried coding an IFTHEN condition in the USING statements, but couldn't work out how to split the final count across multiple conditions.

Thank you.

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 5:21 am
by BillyBoyo
I don't fully understand what you want, or why you wanted to use MERGE.

Why not use the 1/0 technique, with one temporary field per condition, and then TOT/TOTAL each of those fields in TRAILER1? If you have to test the same record more than once (it's not clear whether you have dates at multiple locations), then include HIT=NEXT in the IFTHEN.

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 10:59 am
by Aki88
Hello Billy,

There is only one date field in the input DS. It is to be tested against several date ranges (which can be coded as multiple INCLUDE/test conditions), with a separate COUNT bucket created for each range.

The reason for using MERGE was to handle all the VSAM DS in a single step. That way the job would be a simple top-down read of every record in every VSAM DS, with the COUNTs segregated at the end.

The other approach I could think of was to use ICETOOL to read all the VSAM KSDS in one go, instead of running multiple steps that each create a separate count output, which would then have to be merged in one last step. But I am not sure the COUNT operator allows that kind of segregation. I tried using COUNT with USING, coding an IFTHEN construct to see whether separate buckets were created; but it needed an extra step to sum the counts from the 20 VSAM KSDS, and COUNT did not actually produce separate counts for the different conditions passed in one USING clause.

Putting the requirement more simply:
20 VSAM DS, each with a few million records and a date field at a fixed position; this date field is to be tested against a set of date ranges.
At the end, the final count for each date-range bucket has to be written to the output. In the sample test from my first post, 5 buckets are shown (as 5 OUTFIL statements).

Your guidance is much appreciated.

Thank you.

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 12:44 pm
by BillyBoyo
Why are you not specifying the full key on the MERGE statement?

Set "n" additional fields to zero (they only need to be one byte long), then use a set of IFTHENs to set the correct one to one. One OUTFIL, with "n" TOT/TOTALs on TRAILER1.

You could compare the performance of what you have with that (for a maximum of one input data set).
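A minimal, untested sketch of this 1/0 technique for one input data set follows. The date position (30), its 5-byte packed CCYYMMDD layout, the two calendar-year ranges and the bucket labels are hypothetical. WHEN=INIT keeps only the date and appends one flag byte per bucket set to C'0'; the first matching IFTHEN overlays its flag with C'1' (WHEN=NONE catches everything else), so TRAILER1 can total each flag byte as zoned decimal to get the per-bucket counts.

```
* SKETCH ONLY: ONE INPUT DATA SET, 1/0 TECHNIQUE.
* HYPOTHETICAL: DATE AT POSITION 30, 5 BYTES, PACKED CCYYMMDD.
* WHEN=INIT KEEPS THE DATE IN 1-5 AND ADDS THREE FLAG BYTES (6-8)
* SET TO C'0'; THE MATCHING IFTHEN OVERLAYS ITS FLAG WITH C'1'.
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,BUILD=(1:30,5,6:3C'0')),
        IFTHEN=(WHEN=(1,5,PD,GE,+20160101,AND,1,5,PD,LE,+20161231),
                OVERLAY=(6:C'1')),
        IFTHEN=(WHEN=(1,5,PD,GE,+20170101,AND,1,5,PD,LE,+20171231),
                OVERLAY=(7:C'1')),
        IFTHEN=(WHEN=NONE,OVERLAY=(8:C'1'))
* ONE OUTFIL, NO DETAIL RECORDS, ONE TOT PER FLAG BYTE.
  OUTFIL REMOVECC,NODETAIL,
    TRAILER1=('RANGE-1: ',TOT=(6,1,ZD,EDIT=(IIIIIIIIIIIIIIT)),/,
              'RANGE-2: ',TOT=(7,1,ZD,EDIT=(IIIIIIIIIIIIIIT)),/,
              'OTHER  : ',TOT=(8,1,ZD,EDIT=(IIIIIIIIIIIIIIT)))
```

Since these ranges do not overlap, no HIT=NEXT is needed: once a WHEN=(...) condition is satisfied, the remaining IFTHEN clauses are not tested for that record.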

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 1:09 pm
by BillyBoyo
Actually, the key is relevant to you, isn't it? You don't care what order things are counted in :-)

Use INREC to OVERLAY a value (one byte long) at, for instance, position one on each record.

Specify that one byte on the MERGE FIELDS=. It will at least minimise the key processing for MERGE.
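A JCL sketch of that suggestion might look like the following. The step name, the KSDS names and the use of three inputs are hypothetical. Each VSAM data set gets its own SORTINnn DD, INREC overlays position 1 of every record with the same byte, and MERGE uses only that byte; since every key is equal, no input can be flagged as out of sequence. Overlaying byte 1 changes the first key byte, which does not matter when the records are only being counted. The OUTFIL just produces a total count, as a stand-in for the real range logic.

```
//* SKETCH ONLY: STEP AND DATA SET NAMES BELOW ARE HYPOTHETICAL
//COUNTDT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DISP=SHR,DSN=MY.VSAM.KSDS01
//SORTIN02 DD DISP=SHR,DSN=MY.VSAM.KSDS02
//SORTIN03 DD DISP=SHR,DSN=MY.VSAM.KSDS03
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
* SAME ONE-BYTE "KEY" ON EVERY RECORD, SO MERGE NEVER SEES A
* SEQUENCE BREAK, WHATEVER ORDER THE REAL KEYS ARE IN.
  INREC OVERLAY=(1:C'1')
  MERGE FIELDS=(1,1,CH,A)
  OUTFIL REMOVECC,NODETAIL,
    TRAILER1=('TOTAL RECORDS: ',COUNT15)
/*
```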

VSAM data sets are tricky: without special provision (writing your own code, or purchasing a product that does it; I think SAS does, and so does the other SORT product, though I've never used it to see how effective it is, and it is basically the same as what you are attempting with your MERGE), they can't be concatenated.

These data sets don't happen to have sequential backups lying around, do they?

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 2:08 pm
by Aki88
Hello Billy,

BillyBoyo wrote: Why are you not specifying the full key on the MERGE statement?
...


I used only the part of the key whose values change; the first 3 bytes are constant throughout (I could have used the complete key for cleaner MERGE processing, but the key is irrelevant here anyway). If a single VSAM DS is taken as input at a time, the MERGE will be replaced with COPY.

BillyBoyo wrote:....
Set "n" additional fields to zero (they only need to be one byte long), then use a set of IFTHENs to set the correct one to one. One OUTFIL, with "n" TOT/TOTALs on TRAILER1.

You could compare the performance of what you have with that (for a maximum of one input data set).


I wonder which is quicker: OUTFIL/NODETAIL/COUNT into multiple data sets followed by a summation in the last step, or TRAILER1/TOT followed by a summation in the last step. I'll test it just for thrills; it is a fairly small piece of code.

The main challenge is processing multiple VSAM DS in one go, hence the MERGE; otherwise, as I mentioned earlier, a simple COUNT is all that is needed.

Thank you.

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 2:33 pm
by Aki88
Hello Billy,

I missed this post; it looks like I started writing my earlier post a while back but only posted it later.

BillyBoyo wrote: Actually, the key is relevant to you, isn't it? You don't care what order things are counted in :-)
...


Bingo! You're spot on with that.

BillyBoyo wrote:...
Use INREC to OVERLAY a value (one byte long) at, for instance, position one on each record.

Specify that one byte on the MERGE FIELDS=. It will at least minimise the key processing for MERGE.
....


Umm, now that you mention it, yes, I'll have to overlay a large enough SEQNUM, because when the transition between data sets happens (from VSAM-DS-1 to VSAM-DS-n), the key may go out of sequence. I'll try this with MERGE.

BillyBoyo wrote: ...VSAM data sets are tricky: without special provision (writing your own code, or purchasing a product that does it; I think SAS does, and so does the other SORT product, though I've never used it to see how effective it is, and it is basically the same as what you are attempting with your MERGE), they can't be concatenated.

These data sets don't happen to have sequential backups lying around, do they?


You've put my problem in one precise sentence: VSAM DS cannot be DD-concatenated. Nope, no PS copies of these DS.

Maybe a DFSORT wishlist item for Kolusu: an enhanced ICETOOL COUNT that allows multiple breaks. I am sure he'll point me to a solution that is already there and that I've missed in the good book. ;) :D

Thank you for looking this one up.

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 2:47 pm
by BillyBoyo
If you make the merge-key one byte long, each with the same value, you will have no problem with the keys being out-of-sequence.

Previously I meant to type irrelevant, not relevant. For simple counting, the order of the input does not matter to you, does it? So the easiest way to make sure MERGE doesn't trip is to use a very short key with the same value on every record, which is where the INREC comes in. Just splat data byte one with X and use that as the "key". You don't care about the actual "merge"; you are just interested in reading multiple VSAM data sets in one step.

Re: Count and segregate records basis conditions

Posted: Thu Feb 09, 2017 3:05 pm
by Aki88
Yeah, I read it as you'd intended and not as you'd typed :)

And voila, it works :D
Below is the change made. In the actual card I have also shortened the record by keeping only the required date field; the rest of the code can be kept the same as in my first post, or be driven through IFTHEN and then summed:

 INREC OVERLAY=(1:C'1')
 MERGE FIELDS=(1,1,CH,A)
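Combining the constant merge byte with the 1/0 flags gives a single-step card along the following lines, to be used with one SORTINnn DD per KSDS and a SORTOUT DD for the counts. This is an untested sketch: the date position (30), its 5-byte packed CCYYMMDD layout, the two ranges and the labels are hypothetical. WHEN=INIT builds a short record (the merge byte, the date, and three flag bytes), in line with keeping only the required date field.

```
* SKETCH ONLY. HYPOTHETICAL: DATE AT POSITION 30, 5 BYTES, PACKED.
* RECORD AFTER INREC: 1 = CONSTANT MERGE BYTE, 2-6 = DATE,
* 7-9 = ONE FLAG PER BUCKET (C'0', OR C'1' FOR THE MATCHING ONE).
  INREC IFTHEN=(WHEN=INIT,BUILD=(1:C'1',2:30,5,7:3C'0')),
        IFTHEN=(WHEN=(2,5,PD,GE,+20160101,AND,2,5,PD,LE,+20161231),
                OVERLAY=(7:C'1')),
        IFTHEN=(WHEN=(2,5,PD,GE,+20170101,AND,2,5,PD,LE,+20171231),
                OVERLAY=(8:C'1')),
        IFTHEN=(WHEN=NONE,OVERLAY=(9:C'1'))
* MERGE FIELDS REFER TO THE RECORD AS REBUILT BY INREC.
  MERGE FIELDS=(1,1,CH,A)
  OUTFIL REMOVECC,NODETAIL,
    TRAILER1=('RANGE-1: ',TOT=(7,1,ZD,EDIT=(IIIIIIIIIIIIIIT)),/,
              'RANGE-2: ',TOT=(8,1,ZD,EDIT=(IIIIIIIIIIIIIIT)),/,
              'OTHER  : ',TOT=(9,1,ZD,EDIT=(IIIIIIIIIIIIIIT)))
```

TRAILER1 is written once, after all the merged input has been read, so each total covers every KSDS in a single pass.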
 


Thanks a ton for the pointer, Billy :D