Drop records from DS-A, basis records in DS-B





Postby Aki88 » Thu Jun 09, 2016 9:29 pm

Hello,

My apologies for another done-to-death query.

We have an FB dataset with LRECL 20K. The data is a mix of XML records that vary in their actual tags and content length: tag and data positions can vary from record to record, a single record can repeat tags and their data content, and there can be duplicate records.

We have another dataset whose records contain only key1 (16 bytes) and key2 (8 bytes); either key may or may not be present in the first dataset.

We need to drop every record from the first dataset that has both key1 AND key2 present.

Have tried two approaches:
a. Ugly method: dynamically created a SORT card using the 'SS' (substring search) operator with an OMIT, the two keys of each combination separated by an AND and each combination separated by an OR. This failed, as the control statements ran to more than 40K lines (23K key1/key2 combinations, each pair ANDed, the whole lot ORed together).
b. Dirty method: JOINKEYS. Parsed dataset-1 to get key1 and key2 into an extended record, JOINed it with dataset-2 (the one with the keys), and waited for the entire horde to get through.

It would be really kind if someone could point out a simpler method, as I seem to have knocked my brains out.

Thank you.
Aki88
 

Re: Drop records from DS-A, basis records in DS-B

Postby BillyBoyo » Thu Jun 09, 2016 10:17 pm

Are the key1 and key2 in the XML identifiable by tag?

Can you make some representative data, with expected output?

What is the volume of the XML data?

If it is a large volume with identifiable keys, I'd rip the keys out of the XML along with the original record's "sequence number", use JOINKEYS to identify the matching keys, and output the sequence numbers you don't want, in sequence-number order. Then JOINKEYS that file to the XML on the sequence number (generated for the XML), with SORTED,NOSEQCK on both JOINKEYS statements. JOIN UNPAIRED,F1,ONLY then gives you output excluding the records matched on key.
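To make the outline concrete, here is a minimal Python sketch of the two passes, assuming the key pairs have already been extracted from each XML record together with that record's sequence number (the tuple layout and the sample values are hypothetical). Pass 1 yields the sequence numbers to drop; pass 2 keeps every other record in its original order, which is the effect JOIN UNPAIRED,F1,ONLY has.

```python
def two_pass_drop(xml_records, extracted, drop_pairs):
    """Drop whole XML records by sequence number (illustrative sketch).

    xml_records -- DS-A records in original order; seq = 1-based position
    extracted   -- (seq, key_n, key_1) tuples pulled from the XML (hypothetical layout)
    drop_pairs  -- set of (key_n, key_1) tuples read from DS-B
    """
    # Pass 1: sequence numbers of records with a matching key pair, ascending, no duplicates.
    drop_seqs = sorted({seq for seq, kn, k1 in extracted if (kn, k1) in drop_pairs})
    # Pass 2: keep only records whose sequence number was not matched
    # (the effect of JOIN UNPAIRED,F1,ONLY).
    drop = set(drop_seqs)
    kept = [rec for seq, rec in enumerate(xml_records, start=1) if seq not in drop]
    return drop_seqs, kept


if __name__ == "__main__":
    xml = ["r1", "r2", "r3"]
    extracted = [(1, "K2", "A"), (2, "K4", "B"), (3, "K6", "A"), (3, "K2", "A")]
    seqs, kept = two_pass_drop(xml, extracted, {("K2", "A"), ("K6", "A")})
    print(seqs, kept)  # [1, 3] ['r2']
```

A Python set does the matching here for brevity; JOINKEYS instead sorts its inputs on the keys, which is why only the small extracted records, never the 20K XML records, have to be sorted.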
BillyBoyo

Re: Drop records from DS-A, basis records in DS-B

Postby Aki88 » Fri Jun 10, 2016 11:33 am

Hello Billy,

a. DS-A has 30 million records and up (they are received from an external system, so we have no control over the record count or how the records are built); DS-B has 23K records, each one a key combination.
b. The mainframe is accessed via a remote server, so I am unable to copy the data as-is; for representation, though, the relevant parts of the data look something like the below:



Dataset-A (the one from which records are to be dropped)

<xml informational tag><identifier-1>data</identifier-1><identifier-2>data</identifier-2><identifier-3><identifier-4><identifier-5>data</identifier-5><key-1>data</key-1></identifier-4><identifier-6>data</identifier-6><identifier-7>data</identifier-7><key-2>data</key-2></identifier-3><identifier-8><identifier-9><identifier-10>data</identifier-10><key-1>data</key-1></identifier-9><identifier-11>data</identifier-11><identifier-12>data</identifier-12><key-3>data</key-3></identifier-8>................</xml informational tag>

<xml informational tag><identifier-1>data</identifier-1><identifier-2>data</identifier-2><identifier-3><identifier-4><identifier-5>data</identifier-5><key-1>data</key-1></identifier-4><identifier-6>data</identifier-6><identifier-7>data</identifier-7><key-4>data</key-4></identifier-3><identifier-8><identifier-9><identifier-10>data</identifier-10><key-1>data</key-1></identifier-9><identifier-11>data</identifier-11><identifier-12>data</identifier-12><key-5>data</key-5></identifier-8>.......</xml informational tag>

<xml informational tag><identifier-1>data</identifier-1><identifier-2>data</identifier-2><identifier-3><identifier-4><identifier-5>data</identifier-5><key-1>data</key-1></identifier-4><identifier-6>data</identifier-6><identifier-7>data</identifier-7><key-6>data</key-6></identifier-3><identifier-8><identifier-9><identifier-10>data</identifier-10><key-1>data</key-1></identifier-9><identifier-11>data</identifier-11><identifier-12>data</identifier-12><key-7>data</key-7></identifier-8>.......</xml informational tag>

...
...
 



Dataset-B (the keys that are to be dropped)

key-2 - 16 bytes (3 bytes space) key-1 - 8 bytes
key-4 - 16 bytes (3 bytes space) key-1 - 8 bytes
key-6 - 16 bytes (3 bytes space) key-1 - 8 bytes
...
...
 


In the samples above, DS-A records can vary in length and in key combinations. Some records contain errors, meaning they might have key-1 or a different value altogether in place of key-1, and the position of key-1 is not fixed either; the representative data shows a simpler record.

Duplicate records are possible; repeated key combinations on the same record are possible; repeated key combinations on different records are possible. In simple terms, the data is virtually free-form, with key combinations placed here and there.

The aim is to drop the entire XML record from DS-A if a combination of key-1 and key-n given in DS-B is present anywhere in that record: if both values are present in the same record, drop the record, regardless of where in the record they appear. If none of the DS-B combinations is found in a DS-A record, copy that record as-is from DS-A to the output dataset.
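As a reference for testing on a small extract, the rule above can be written as a brute-force check. Reading "present" as a plain substring match (as the SS operator in approach (a) did) and the DS-B layout of 16 bytes, 3 spaces, 8 bytes are assumptions taken from the samples; the key values are made up. Checked this way, 30 million records against 23K pairs is on the order of 6.9 × 10^11 pair tests, which is why a join-based approach is needed at this volume.

```python
def parse_ds_b(line):
    """Split one DS-B record: key-n in bytes 1-16, 3 spaces, key-1 in bytes 20-27."""
    return line[0:16].rstrip(), line[19:27].rstrip()


def keep_record(xml_record, pairs):
    """True if no (key_n, key_1) pair from DS-B has BOTH values somewhere in the record.

    Presence is a plain substring test (like DFSORT's SS), so the position of
    either value within the record does not matter.
    """
    return not any(kn in xml_record and k1 in xml_record for kn, k1 in pairs)


if __name__ == "__main__":
    ds_b = [f"{'ACCT000000000002':<16}   {'BR000001':<8}"]  # hypothetical key values
    pairs = [parse_ds_b(line) for line in ds_b]
    ds_a = [
        "<x><key-1>BR000001</key-1><key-2>ACCT000000000002</key-2></x>",
        "<x><key-1>BR000001</key-1><key-4>ACCT000000000004</key-4></x>",
    ]
    output = [rec for rec in ds_a if keep_record(rec, pairs)]
    print(len(output))  # 1: only the second record survives
```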

So, if one were to write a COBOL program for this, with the logic for parsing the data already in place, it would simply be:


03  WS-KEY-COMBINATIONS       PIC X(27)  VALUE SPACES.
    88 88-KEY-COMBINATION              VALUE
           'key-2 - 16 bytes (3 bytes space) key-1 - 8 bytes'
           'key-4 - 16 bytes (3 bytes space) key-1 - 8 bytes'
           'key-6 - 16 bytes (3 bytes space) key-1 - 8 bytes'
---
---
.

---
---

IF NOT 88-KEY-COMBINATION
    MOVE DS-A-RECORD TO OUTPUT-RECORD
END-IF

---
---
 


I hope I was able to state the requirement clearly; any pointers to simplify the approach would be really helpful.

Thank you.
Aki88
 

Re: Drop records from DS-A, basis records in DS-B

Postby BillyBoyo » Fri Jun 10, 2016 10:35 pm

First stage, then: a JOINKEYS. In JNF1CNTL, use PARSE to create, from your XML, records containing the key-1 and key-2 data plus the sequence number of the source record.
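A rough Python picture of what the F1 extraction has to produce: each XML record gets a sequence number, every key-1 value and every other key value is pulled out, and one (key-n, key-1, sequence number) record is emitted per combination, since the values can sit anywhere in the record and repeat. The `<key-N>` tag names come from the representative sample and the regular expression is a stand-in; in DFSORT the fields would be located with PARSE operands such as STARTAFT and ENDBEFR, not regular expressions.

```python
import re

# Hypothetical tag names from the representative sample: <key-1> holds the
# 8-byte key; any other <key-N> holds a 16-byte key-n candidate.
TAG = re.compile(r"<(key-\d+)>([^<]*)</\1>")


def extract_pairs(xml_records):
    """Yield (key_n, key_1, seq) for every key-n/key-1 combination in each record.

    seq is the 1-based position of the record in DS-A, playing the role of the
    sequence number attached in JNF1CNTL.
    """
    for seq, rec in enumerate(xml_records, start=1):
        key1_values, keyn_values = set(), set()
        for tag, value in TAG.findall(rec):
            (key1_values if tag == "key-1" else keyn_values).add(value)
        # Every key-n paired with every key-1 in the same record, because
        # position within the record does not matter.
        for kn in sorted(keyn_values):
            for k1 in sorted(key1_values):
                yield kn, k1, seq


if __name__ == "__main__":
    rec = ("<x><i4><key-1>BR000001</key-1></i4><key-2>ACCT000000000001</key-2>"
           "<i9><key-1>BR000001</key-1></i9><key-3>ACCT000000000003</key-3></x>")
    for row in extract_pairs([rec]):
        print(row)
```

Repeated key-1 tags collapse to one value per record here; duplicates that do slip through are harmless, because the later SUM FIELDS=NONE on the sequence number removes them.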

Allow the JOINKEYS to sort that data on key-1/key-2.

F2 will be your keys file. If already sorted, specify SORTED,NOSEQCK on the JOINKEYS for it, else let it sort.

You want the matches only, so no need for a JOIN statement.

REFORMAT can be just the sequence number.

In the main task, SORT on the sequence number. You may have duplicates, and since you are already sorting, SUM FIELDS=NONE takes care of those.
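The first step can be mimicked with a sort-merge join, in the spirit of JOINKEYS, which sorts both inputs on the keys before matching: sort the extracted records and the DS-B pairs on (key-n, key-1), keep only the sequence number of each match (the REFORMAT), then sort and de-duplicate those numbers as SUM FIELDS=NONE would. This is an illustrative sketch with made-up data, not DFSORT's internal algorithm.

```python
def stage1(extracted, ds_b_pairs):
    """Inner join on (key_n, key_1), keep only seq, then sort and de-duplicate.

    extracted  -- (key_n, key_1, seq) tuples from the XML (F1)
    ds_b_pairs -- (key_n, key_1) tuples from DS-B (F2)
    """
    f1 = sorted(extracted)          # F1 sorted on the keys
    # F2 sorted on the keys; duplicates removed here for simplicity (they would
    # only create duplicate matches, which the final de-duplication drops anyway).
    f2 = sorted(set(ds_b_pairs))
    matched_seqs = []
    i = j = 0
    while i < len(f1) and j < len(f2):
        key = f1[i][:2]
        if key < f2[j]:
            i += 1                  # F1 key with no DS-B partner
        elif key > f2[j]:
            j += 1                  # DS-B key not present in the XML
        else:
            matched_seqs.append(f1[i][2])  # REFORMAT: just the sequence number
            i += 1                  # F2 key may match further F1 records
    # Main task: SORT on sequence number; SUM FIELDS=NONE removes duplicates.
    return sorted(set(matched_seqs))


if __name__ == "__main__":
    extracted = [("K2", "A", 1), ("K4", "B", 2), ("K2", "A", 3), ("K9", "A", 3), ("K2", "A", 1)]
    print(stage1(extracted, [("K2", "A"), ("K9", "A")]))  # [1, 3]
```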

Second step.

Another JOINKEYS.

On F1, the XML, use INREC in JNF1CNTL to append or prepend a sequence number.

Keys for the JOINKEYS statements are the sequence numbers. SORTED,NOSEQCK for both.

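JOIN is UNPAIRED,F1,ONLY, so that only the F1 records with no matching sequence number are written; remove the added sequence number from those records before output.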
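The second step needs no sort of the XML: both inputs are already in sequence-number order, so a single forward pass can match them, which is what SORTED,NOSEQCK on both JOINKEYS statements relies on. The sketch below models JOIN UNPAIRED,F1,ONLY in Python, assuming the drop list from step 1 is ascending and unique; the record contents are placeholders.

```python
def stage2(xml_records, drop_seqs):
    """Keep F1 records whose sequence number has no partner in drop_seqs.

    xml_records -- DS-A records in original order (F1; seq = 1-based position)
    drop_seqs   -- ascending, unique sequence numbers from step 1 (F2)
    """
    kept = []
    j = 0
    for seq, rec in enumerate(xml_records, start=1):  # F1 seq ascending by construction
        # Advance F2 past numbers lower than the current F1 sequence number.
        while j < len(drop_seqs) and drop_seqs[j] < seq:
            j += 1
        if j < len(drop_seqs) and drop_seqs[j] == seq:
            continue          # paired record: excluded by ONLY
        kept.append(rec)      # unpaired F1 record: written as-is
    return kept


if __name__ == "__main__":
    print(stage2(["a", "b", "c", "d"], [1, 3]))  # ['b', 'd']
```

Because each input is read once in order, the 30 million XML records are only passed through, never sorted.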

That should be about there.

The XML never needs to be SORTed, just the keys and then the record sequence-numbers.
BillyBoyo

Re: Drop records from DS-A, basis records in DS-B

Postby Aki88 » Mon Jun 13, 2016 11:02 am

Thank you, Billy. JOINKEYS it is, then; my earlier solution was close to your suggestion minus the sequence numbers, but this one is much better.
Aki88
 