Page 1 of 2

Splitting the records based on conditions

PostPosted: Tue Sep 22, 2009 3:49 am
by vinu78
Hi,

I have 1 million records in an input file.
Using SORT, I want to split the records to 3 files.
- If the column 5 - 10 bytes contains the value '00000', it should go to File 1
- 50% of the remaining records should go to File 2
- 50% of the remaining records should go to File 3

Could you please help me in writing JCL using SORT utility. (I can't ICEMAN here in my company).

Thanks
Vinu

Re: Splitting the records based on conditions

PostPosted: Tue Sep 22, 2009 4:39 am
by Frank Yaeger
If the column 5 - 10 bytes contains the value '00000', it should go to File 1


5-10 bytes is a length is a length of 6 whereas '00000' is a length of 5. Which is it - a length of 6 or a length of 5?

- 50% of the remaining records should go to File 2
- 50% of the remaining records should go to File 3


So if the remaining records are 1-10, do you want 1,3,5,7,9 in file2 and 2,4,6,8,10 in file3, or do you want 1-5 in file2 and 6-10 in file3?
How do you want them split if the remaining records are 1-9?

What is the RECFM and LRECL of the input file?

Could you please help me in writing JCL using SORT utility. (I can't ICEMAN here in my company).


This statement makes no sense. PGM=SORT and PGM=ICEMAN both invoke the SORT utility at your site.

Which SORT product are you using- DFSORT or Syncsort?

Re: Splitting the records based on conditions

PostPosted: Tue Sep 22, 2009 4:51 am
by vinu78
Sorry for the incomplete data.
The RECL = 150bytes, If 5 bytes from 5th byte (5 - 9) of the input file is '00000', then it should go to File 1.
The remaining records should go to File 2 and File 3. (It can either go as 1,3,5,7 records in File 1 and 2,4,6,8 records in File 2 OR 1- 100 records in file1 and 101-200 records in File 2) like that. Actually I am not concerned about how the records are getting split in File 2 and 3.

The reason behind splitting of the records in 2 files is that since 50 million records can come, 9 prod disk packs each can be allocated to each dataset. I will then SORT/Merge this 2 files to TAPE dataset. I can't reserve 18 prod disk packs in one go. that is why I am splitting the file to 2.

Appreciate your help.

Thanks
Chidam

Re: Splitting the records based on conditions

PostPosted: Tue Sep 22, 2009 5:24 am
by vinu78
Hi Frank,

Our company uses DFSORT.
Regarding the splitting of files, I came across a command SPLITBY however was not sure how I can use it because I want to filter out 5- 9 byte = '00000' to another file too.

Please help.

Thanks
Vinu

Re: Splitting the records based on conditions

PostPosted: Tue Sep 22, 2009 8:58 pm
by Frank Yaeger
Here's a DFSORT job that will do what you asked for:

//S1    EXEC  PGM=SORT
//SYSOUT    DD  SYSOUT=*
//SORTIN DD DSN=...  input file (FB/150)
//OUT1 DD DSN=...  output file1 (FB/150)
//OUT2 DD DSN=...  output file2 (FB/150)
//OUT3 DD DSN=...  output file3 (FB/150)
//SYSIN    DD    *
   OPTION COPY
   OUTFIL FNAMES=OUT1,INCLUDE=(5,5,CH,EQ,C'00000')
   OUTFIL FNAMES=(OUT2,OUT3),SAVE,SPLIT
/*

Re: Splitting the records based on conditions

PostPosted: Tue Sep 22, 2009 10:13 pm
by vinu78
Hi Frank,

Thanks a lot for the JCL. It worked fine.
Just out of curiosity can I know whether the particular records can be split like the following:-
The first half of the records which doesn't have '00000' as 5-9 bytes should go to File 2 and the remaining half as bulk should go to File 3. The File 1 will be having the records with '00000' as mentioned earlier.

Eventhough I don't care the way it was split for my requirement, just for my knowledge sake, I thought of having the logic of dividing the records exactly first half to File 2 and second half of remaining records to File 3.

Thanks
Vinu

Re: Splitting the records based on conditions

PostPosted: Tue Sep 22, 2009 10:38 pm
by Frank Yaeger
You could modify the job I gave you to use the technique discussed in the "Split a file to n output files dynamically" Smart DFSORT Trick at:

http://www.ibm.com/support/docview.wss? ... g3T7000094

Re: Splitting the records based on conditions

PostPosted: Wed Sep 23, 2009 2:13 am
by vinu78
Thanks for the document.

Re: Splitting the records based on conditions

PostPosted: Thu Mar 31, 2011 12:44 am
by vnktrrd
Hi Frank,

I have got the same situation. Pls help...

I have an input file which may contain any number of records and I want to split it to multiple files based on number of records in each output file.
suppose if the input file contains 124000 records, I want to split this file in to 50000 records in each file.that means 1st file should contain first 50000 records(1 - 49000) of input file, 2nd file should contain the next 50000 records(51000 - 100000) and the third file should contain the last 24000 records(100001 - 124000) of input file .so total of three files in this case.

thanks a lot for the help.

Re: Splitting the records based on conditions

PostPosted: Thu Mar 31, 2011 1:56 am
by Frank Yaeger
I want to split this file in to 50000 records in each file.that means 1st file should contain first 50000 records(1 - 49000) of input file, 2nd file should contain the next 50000 records(51000 - 100000) and the third file should contain the last 24000 records(100001 - 124000)


Your math is a bit off. I assume you mean 1-50000 to output file1, 50001-100000 to output file2 and 100001 to 124000 to output file3.

You can use a DFSORT job like the following for that:

//S1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=... input file
//OUT1 DD DSN=...  output file1
//OUT2 DD DSN=...  output file2
//OUT3 DD DSN=...  output file3
//SYSIN DD *
   OPTION COPY
   OUTFIL FNAMES=(OUT1,OUT2,OUT3),SPLIT1R=50000
/*