Page 1 of 1

Join matching keys for more than two files

PostPosted: Mon Aug 03, 2015 4:39 pm
by hariharan_bk
Hi All,

I have got in a new requirement, where in multiple files need to be scanned for a common key across them and if found, then we need to reformat the matched key records from multiple files.

INPUT:
FILE1: 123ABCDEFGH
FILE2: 123STUVWXYZ
FILE3: 123ABCDABCD
FILE4: 123WXYZWXYZ

OUTPUT: 123ABUVCDXY (matching key+ some fields from every file)

Joining 2 files based on m matching key can be done either using SORTJOIN or SPLICE operators. But I am not sure how we can we achieve when multiple files are fed as input for a JCL step and we need to join them based on matching key.

Is this possible in a single JCL step using ICETOOL (or) this is only possible in multiple steps using of either SORTJOIN or SPLICE?

Please advise.

Re: JOIN MATCHING KEYS FROM MORE THAN 2 FILES

PostPosted: Tue Aug 04, 2015 2:54 am
by NicC
Surely you know by now that this sort of thing shoud have been posted in the sort forum for your sort product and not in the JCL section. You are not after JCL but sort control cards - in fact you are only asking if it is possible so you are asking for a "Yes" or "No" answer.

Moved to DFsort as ICETOOL was mentioned not SYNCTOOL.

Re: JOIN MATCHING KEYS FROM MORE THAN 2 FILES

PostPosted: Tue Aug 04, 2015 2:48 pm
by hariharan_bk
Apologies and Thanks !!

Can anyone pls let me know the possibility of the request !!

Re: JOIN MATCHING KEYS FROM MORE THAN 2 FILES

PostPosted: Tue Aug 04, 2015 3:38 pm
by BillyBoyo
You need a lot more description of what you want to do and how you arrive at that output from that input.

Re: JOIN MATCHING KEYS FROM MORE THAN 2 FILES

PostPosted: Tue Aug 04, 2015 3:52 pm
by hariharan_bk
INPUT:
-------
FILE1: 123 ABCDEFGH
FILE2: 123 STUVWXYZ
FILE3: 123 ABCDABCD
FILE4: 123 WXYZWXYZ

OUTPUT:
----------
123 ABUVCDXY (matching key+ some fields from every file)


Lets consider the input and output files have same length - 12. The string '123' is seen common on these files and this is the matching key to be used.

Using this key position (1-3), we need to look for matching records in all the 4 input files. If found, these matched records need to be reformatted to hold fields from all 4 input files.
For example in the output record,
123 - matching key field, AB - is picked from input file 1 (pos - 5-6) , UV is picked from second file (7-8), CD is picked from third file (pos -7-8) and XY is picked from 4th file (pos - 10-11)

Just to make sure that the matching record (output record) have to hold some fields from all 4 input fiels, in addition to the matching key value.

Re: JOIN MATCHING KEYS FROM MORE THAN 2 FILES

PostPosted: Tue Aug 04, 2015 3:55 pm
by BillyBoyo
What if there are three matches? Or two? Or none?

Re: JOIN MATCHING KEYS FROM MORE THAN 2 FILES

PostPosted: Wed Aug 05, 2015 9:55 am
by hariharan_bk
We need not consider them. I mean we can discard partial match or no match records. Records found in all four input files are our scope. Rest can be ignored or discarded

Re: JOIN MATCHING KEYS FROM MORE THAN 2 FILES

PostPosted: Wed Aug 05, 2015 2:33 pm
by BillyBoyo
OK, so the first thing you will need is a dummy file header (assuming you don't have file headers).

You need a simple COPY step, which takes a single record from one of you (presumed identical DCB) files, and uses OVERLAY to mark it as the dummy header.

  OPTION COPY,STOPAFT=1
  INREC OVERLAY=(1:C'HEADER')


You either run this each time, or as a once-of, initially and any time the RECFM or LRECL of the file changes in the future.

Then into your main SORT step, you concatenate the dummy header, prior to each of your input files.
//SORTIN DD DSN=dummy-header
//       DD DSN=yourfile1
//       DD DSN=dummy-header
//       DD DSN=yourfile2
//       DD DSN=dummy-header
//       DD DSN=yourfile3
//       DD DSN=dummy-header
//       DD DSN=yourfile4


Then you use WHEN=GROUP to identify the header, and PUSH an ID to temporary extension to the record.
  INREC IFTHEN=(WHEN=GROUP,
                    BEGIN=(1,6,CH,EQ,C'HEADER),
                    PUSH=(81:ID=1))


This will mark every record (including the header) with a 1 in position 81 for the first file, 2 for the second, etc.

Then you can SORT your data on the key, and then on the extended ID, giving you this:
HEADER1
HEADER2
HEADER3
HEADER4
AAA   1
AAA   2
AAA   3
AAA   4
BBB   2
CCC   1
CCC   3
DDD   1
DDD   2
DDD   3
DDD   4


Then in OUTREC you need WHEN=GROUP with KEYBEGIN for your key, and PUSH SEQ=1 to another extended byte.

Continuing in OUTREC, you have three WHEN=GROUP which test for the extended ID being 1 then 2 then 3. You PUSH the information that you need from that particular record into an extended location (one for each file). You use RECORDS=4, RECORDS=3 and RECORDS=2 in the WHEN=GROUP.

The fourth record of each group (the extended SEQ = 4) is the one you want, and it will contain all the data you need.

In OUTFIL, use INCLUDE= to select the 4s only (and to ignore the HEADER record which happens to be number four).

Then use BUILD to do your final formatting.

Re: Join matching keys for more than two files

PostPosted: Thu Aug 06, 2015 7:15 pm
by hariharan_bk
Thanks BillyBoyo. Will try this and update.