
MERGE AND SORT

Posted: Wed Aug 19, 2015 4:18 pm
by hariharan_bk
Hi All,

I have two input files, both FB with a record length of 54 (FB/54).
I need to merge these and sort them, and then pick only the first occurrence of a record for each key combination.

For this I tried the following code, and it is working fine.

//TOOLIN DD *                                                       
  COPY FROM(IN1) TO(TMPIN)                                         
  COPY FROM(IN2) TO(TMPIN)                                         
  SORT FROM(TMPIN) TO(TMPOUT) USING(CTL1)                           
  SELECT FROM(TMPOUT) TO(FNLOUT) ON(1,8,BI) ON(11,10,CH) FIRST     
/*                                                                 
//CTL1CNTL DD *                                                     
  SORT FIELDS=(1,8,BI,A,11,10,CH,A,39,4,CH,D,33,2,CH,D,36,2,CH,D)   
/*
//CTL2CNTL DD *                                                     
  SORT FIELDS=COPY
/*                                                 


My question is about what happens when I define both input files under the same DD name, IN1:

IN1 DD DSN=FILE,DISP=SHR
      DD DSN=FILE,DISP=SHR


If I then change the TOOLIN like this, the sort does not consider all the records from both files, and the sort order is not as expected.
//TOOLIN DD *                                                       
  COPY FROM(IN) TO(TMPOUT) USING(CTL1)                           
  SELECT FROM(TMPOUT) TO(FNLOUT) ON(1,8,BI) ON(11,10,CH) FIRST     
/* 


Why does this happen, and why is the sort order not as expected, when the first approach above works?

Re: MERGE AND SORT

Posted: Wed Aug 19, 2015 4:52 pm
by BillyBoyo
MERGE has a specific meaning in SORT. What you are talking about is not MERGEing, but concatenating input DSNs to SORT.

If you are able to concatenate in the JCL, you don't need to use ICETOOL, just a simple SORT with the datasets concatenated on SORTIN.

ICETOOL has a SORT operator, so why use COPY with a SORT statement in the USING dataset?
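
A sketch of the changed TOOLIN, assuming the concatenation is on a DD named IN1 (as in your JCL fragment) and that your existing CTL1CNTL is reused unchanged:

//TOOLIN DD *
* SORT OPERATOR INSTEAD OF COPY, SO THAT CTL1 ACTUALLY SORTS
  SORT FROM(IN1) TO(TMPOUT) USING(CTL1)
  SELECT FROM(TMPOUT) TO(FNLOUT) ON(1,8,BI) ON(11,10,CH) FIRST
/*

The FROM name has to match the DD name carrying the concatenation.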

Without knowing what you think has gone wrong (sample data, the output you get, and the output you expected), it is difficult to be certain. But if you have duplicate keys, you can get different output from different runs, because the order of records with duplicate keys is not guaranteed. To guarantee that records with duplicate keys stay in their input order, use OPTION EQUALS or specify EQUALS on the SORT statement.
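
A minimal sketch of a plain SORT step with the inputs concatenated on SORTIN and EQUALS in effect. The step name, data set names, UNIT and SPACE values are placeholders I have made up; only the SORT key is taken from your CTL1. This step only sorts, it does not remove any duplicates.

//* HYPOTHETICAL STEP AND DATA SET NAMES, ADJUST TO YOUR SITE
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* BOTH INPUTS CONCATENATED UNDER ONE DD NAME
//SORTIN   DD DISP=SHR,DSN=YOUR.FIRST.INPUT
//         DD DISP=SHR,DSN=YOUR.SECOND.INPUT
//SORTOUT  DD DSN=YOUR.SORTED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* EQUALS KEEPS RECORDS WITH EQUAL KEYS IN THEIR INPUT ORDER
  OPTION EQUALS
  SORT FIELDS=(1,8,BI,A,11,10,CH,A,39,4,CH,D,33,2,CH,D,36,2,CH,D)
/*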

You don't seem to need the SELECT. SUM FIELDS=NONE will do that in a simple SORT.

Re: MERGE AND SORT

Posted: Wed Aug 19, 2015 5:21 pm
by hariharan_bk
Yes, concatenating on the SORTIN DD name can serve the purpose.

Also, SUM FIELDS=NONE can remove the duplicates. But how can we make sure that only the first record among the duplicates is retained?

In the existing code we referred to, EQUALS has been used. So we thought that the duplicate records keep their positions and that none of the duplicates are removed.

We want the first record to be kept and the remaining duplicates to be discarded. Can this be achieved with SUM FIELDS=NONE?

Re: MERGE AND SORT

Posted: Wed Aug 19, 2015 5:28 pm
by hariharan_bk
The sort card used for sorting and the one we require for DUPS removal are different.

Also I have tried SORT operator in ICETOOL. Even that dint give the latest record in the top

Re: MERGE AND SORT

Posted: Wed Aug 19, 2015 6:34 pm
by BillyBoyo
SUM FIELDS=NONE will retain the first record of each set of duplicates, provided EQUALS is in effect (without it, which record is kept is unpredictable). All the other records will be discarded.

You should be able to do what you want with just the SELECT, if I have understood correctly.

The two high-order fields of your SORT key are the same as the ON fields of your SELECT. Specify a USING data set for the SELECT which contains your full SORT key, ensuring that EQUALS is used. Concatenate your datasets on the input DD for the SELECT.
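
Something along these lines, as an untested sketch. The step name, data set names, UNIT and SPACE values are placeholders I have made up; the ON fields and the SORT key are the ones from your post.

//* HYPOTHETICAL STEP AND DATA SET NAMES, ADJUST TO YOUR SITE
//S1       EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//* BOTH INPUTS CONCATENATED ON THE SELECT INPUT DD
//IN1      DD DISP=SHR,DSN=YOUR.FIRST.INPUT
//         DD DISP=SHR,DSN=YOUR.SECOND.INPUT
//FNLOUT   DD DSN=YOUR.FINAL.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//TOOLIN   DD *
* KEEP THE FIRST RECORD FOR EACH ON-FIELD COMBINATION
  SELECT FROM(IN1) TO(FNLOUT) ON(1,8,BI) ON(11,10,CH) FIRST -
         USING(CTL1)
/*
//CTL1CNTL DD *
* FULL SORT KEY, THE TWO HIGH-ORDER FIELDS MATCH THE ON FIELDS
  OPTION EQUALS
  SORT FIELDS=(1,8,BI,A,11,10,CH,A,39,4,CH,D,33,2,CH,D,36,2,CH,D)
/*

The idea is that the three low-order fields of the SORT key decide which record comes first within each ON-field combination, and FIRST then keeps that one. No intermediate TMPIN or TMPOUT data sets are needed.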

Re: MERGE AND SORT

Posted: Wed Aug 19, 2015 6:40 pm
by steve-myers
Copying and the MERGE function in sort are two separate concepts.

For example -

----+----1----+
KEY1     DS1   
KEY2     DS1
(The ruler is not part of the data) and
----+----1----+
KEY1     DS2   
KEY2     DS2

When the two data sets were processed by this job -
//A       EXEC PGM=IEFBR14                                   
//SORTOUT  DD  DISP=(MOD,DELETE),UNIT=SYSALLDA,SPACE=(TRK,0),
//             DSN=&SYSUID..TESTDATA.MERGED                 
//B       EXEC PGM=SORT                                     
//SYSOUT   DD  SYSOUT=*                                     
//SORTIN01 DD  DISP=SHR,DSN=&SYSUID..TESTDATA.DS1.DATA       
//SORTIN02 DD  DISP=SHR,DSN=&SYSUID..TESTDATA.DS2.DATA       
//SORTOUT  DD  DISP=(,CATLG),UNIT=SYSDA,SPACE=(TRK,(1,1)),   
//             DSN=*.A.SORTOUT                               
//SYSIN    DD  *                                             
 MERGE FIELDS=(1,8,CH,A)
produced this output -
KEY1     DS1
KEY1     DS2
KEY2     DS2
KEY2     DS1
Your exact order may vary. In a sort MERGE, each input data set is already sorted by the key; the merge combines them into a single output that is also sorted by the key.
"Even that dint give the latest record in the top"
Just exactly what do you mean by "latest record"? The output order when you have duplicate keys is not defined by sort, though there are options in the SORT and MERGE control statements that can affect the order. See the fine manual for more information.
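
As one example of such an option, here is a sketch of the MERGE statement from the job above with EQUALS added. As I read the manual, this should make records with equal keys come out in input file order (SORTIN01 before SORTIN02), so KEY2 DS1 would precede KEY2 DS2. I have not rerun the job to confirm it.

 MERGE FIELDS=(1,8,CH,A),EQUALS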

By the way, we try to avoid tricks you might use when sending texts from a cell phone or "smart" phone here. By "dint" I think most of us assumed "didn't." Even "didn't" is discouraged in "formal" English used in preparing documents of this sort. "Did not" is preferred. "Dint" might be used by a writer attempting to express ethnic speech by a poorly educated person.