Splitting a file




Post by smita257 » Mon Jul 19, 2010 2:55 pm

I have a requirement to split the input file shown below.
Sample input (XML):
AA1-AA001-3-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
AA1-AA002-4-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
---------------------------------------
----------------------------------

Now a job needs to be written to split the above input into separate output files (the number of files depends on the records present in the main input file), each starting from the header part and ending with </UPOS>.
In the example above there will be 2 output files. The first file will contain:
AA1-AA001-3-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
The second file will contain:
AA1-AA002-4-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
The number of records in the main input file is variable, and the number of separate output files to be generated depends on it.
smita257
 
Posts: 13
Joined: Mon Jul 19, 2010 2:39 pm
Has thanked: 0 time
Been thanked: 0 time

Re: Splitting a file

Post by Frank Yaeger » Mon Jul 19, 2010 9:02 pm

What identifies the "header part"? Is it 'AA' in positions 1-2, or ".xml" somewhere in the record, or what?

What is the RECFM and LRECL of the input file?

What is the maximum number of output files you will have?
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
Frank Yaeger
Global moderator
 
Posts: 1079
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 15 times

Re: Splitting a file

Post by smita257 » Tue Jul 20, 2010 10:13 am

Hi Frank,
Thanks for your response.
There is a change in the requirement. The header part will be identified by <Upos>. The first output file name will be
AA1-AA001-3-V001.xml, the second output file name will be AA1-AA002-4-V001.xml, and so on,
and each output file will contain the records from <Upos> through </Upos>.
The input file is a report file with attributes RECFM=FBA,LRECL=133.
The maximum number of output files is variable; it depends on the records present in the input file.

Thanks,
Debosmita.

Re: Splitting a file

Post by Frank Yaeger » Tue Jul 20, 2010 10:09 pm

Based on the information you've given I suggest you write a program to do what you want so you can dynamically allocate the correct number of output data sets with the correct names.
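Here is a minimal Python sketch of the kind of program Frank suggests. It assumes the report lines have already been read into a list, with any ASA carriage-control byte removed, and that a file-name line is a single token ending in ".xml". The function name split_by_name is hypothetical. The sketch only collects each group under its file name. Allocating and writing the z/OS data sets, which is the reason to use a program here, is not shown.

```python
import re
from typing import Dict, List, Optional

# Assumption: a file-name line is one token ending in ".xml",
# e.g. AA1-AA001-3-V001.xml (carriage-control byte already removed).
NAME_LINE = re.compile(r"^\S+\.xml$")


def split_by_name(lines: List[str]) -> Dict[str, List[str]]:
    """Map each file-name line to the lines from <Upos> through </UPOS>."""
    outputs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    inside = False
    for line in lines:
        text = line.strip()
        if NAME_LINE.match(text):
            current = text            # the following group belongs to this name
            outputs[current] = []
            inside = False
        elif current is not None and text.upper() == "<UPOS>":
            inside = True             # case-insensitive: <Upos> or <UPOS>
            outputs[current].append(line)
        elif inside:
            outputs[current].append(line)
            if text.upper() == "</UPOS>":
                inside = False        # group closed; lines after it are ignored
    return outputs


sample = [
    "AA1-AA001-3-V001.xml", "<Upos>", "<Upo><Deelnemer><b>...", "</UPOS>",
    "AA1-AA002-4-V001.xml", "<Upos>", "<Upo><Deelnemer><b>...", "</UPOS>",
]
for name, body in split_by_name(sample).items():
    print(name, len(body))
```

The tag comparison here is deliberately case-insensitive because the posts mix <Upos>, </UPOS> and </Upos>. A real program would then allocate one output data set per dictionary key.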

If you were willing to hardcode the output DD statements or generate them as part of a job for the internal reader, you could probably do this with DFSORT using its group function and OUTFIL statements, but given that you don't even know the maximum number of output files, it would be difficult.
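If the statements are generated rather than hardcoded, a small helper could count the groups and emit one OUTFIL per group ID. This Python sketch follows the DFSORT job later in this thread: it assumes <UPOS> starts in position 2 of each FBA record and that the group ID is pushed to positions 134-141. The helper names and the 99-file limit (two-digit OUTnn ddnames) are assumptions. The matching //OUTnn DD statements are left out because their data set names and space are site-specific.

```python
from typing import List


def count_groups(records: List[str]) -> int:
    """Count group headers, assuming <UPOS> in positions 2-7 (FBA data)."""
    return sum(1 for rec in records if rec[1:7] == "<UPOS>")


def outfil_statements(count: int) -> List[str]:
    """One OUTFIL per group ID, in the same form as the hardcoded job."""
    if not 1 <= count <= 99:
        # Assumption: two-digit OUTnn ddnames, so at most 99 outputs here.
        raise ValueError("count must be between 1 and 99")
    # Indented like the thread's statements: DFSORT reserves column 1 for labels.
    return [
        f"  OUTFIL FNAMES=OUT{n:02d},BUILD=(1,133),"
        f"INCLUDE=(134,8,ZD,EQ,{n})"
        for n in range(1, count + 1)
    ]


records = [" AA1-AA001-3-V001.xml", " <UPOS>", " </UPOS>",
           " AA1-AA002-4-V001.xml", " <UPOS>", " </UPOS>"]
for stmt in outfil_statements(count_groups(records)):
    print(stmt)
```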

Re: Splitting a file

Post by smita257 » Wed Jul 21, 2010 10:30 am

Basically, the input XML file is generated as part of the Coolgen output. The requirement is to break the main input file into several output files based on the conditions I have described above.
It has been proposed to handle this functionality through JCL only, so I was wondering whether this can be achieved with the SORT utility, without writing any program.
If I have a fixed number of files, for example 20 output files, can you please suggest how I can write the OUTFIL statements for them? The records in each file have to start from <Upos> and end at </UPOS>. Also, I can't use SKIPREC or STOPAFT, since it's not known how many records each output file will contain. Generally, I have used OUTFIL statements with an INCLUDE condition that is unique to a particular file. In this case, every output file starts at <Upos> and ends with </Upos>.
Is there any facility where we can specify the starting and ending records of the data to be written to a file?
For example, something like START(1,6,CH,EQ,C'<UPOS>'),END(1,7,CH,EQ,C'</UPOS>'): is there any provision like that in SORT? Then the records from <upos> through </upos> would be written to one output file. But that only handles one output file. How would the next output file be written, since the input has to be positioned at the second <upos>?
Hope you are getting my point.
Kindly suggest.
Thanks.
Thanks.

Re: Splitting a file

Post by Frank Yaeger » Wed Jul 21, 2010 10:29 pm

If you're willing to hardcode the DD statements, you can use a DFSORT job with the group function like the following. Since your input file has RECFM=FBA, I assumed <UPOS> and </UPOS> start in position 2 after the carriage control character, not in position 1.
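DFSORT field positions are 1-based, while Python slices are 0-based. The hypothetical helper dfsort_field below shows why the BEGIN and END tests start at position 2 for an FBA record, whose first byte is the ASA carriage-control character (assumed blank here):

```python
def dfsort_field(record: str, position: int, length: int) -> str:
    """Return the field DFSORT means by (position,length); positions are 1-based."""
    start = position - 1
    return record[start:start + length]


# FBA/133 record: byte 1 is the ASA carriage-control character
# (a blank here), so the text itself starts in position 2.
rec = (" " + "<UPOS>").ljust(133)
print(dfsort_field(rec, 2, 6))   # <UPOS>
print(dfsort_field(rec, 1, 6))   # " <UPOS" (shifted by the control byte)
```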

//S1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=...  input file (FBA/133)
//OUT01 DD DSN=...  output file1 (FBA/133)
//OUT02 DD DSN=...  output file2 (FBA/133)
//OUT03 DD DSN=...  output file3 (FBA/133)
...  as many OUTnn DDs as you might need
//SYSIN DD *
  OPTION COPY
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(2,6,CH,EQ,C'<UPOS>'),
    END=(2,7,CH,EQ,C'</UPOS>'),PUSH=(134:ID=8))
  OUTFIL FNAMES=OUT01,BUILD=(1,133),INCLUDE=(134,8,ZD,EQ,1)
  OUTFIL FNAMES=OUT02,BUILD=(1,133),INCLUDE=(134,8,ZD,EQ,2)
  OUTFIL FNAMES=OUT03,BUILD=(1,133),INCLUDE=(134,8,ZD,EQ,3)
  ...  as many OUTFIL statements as you might need
/*
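To see what the job does record by record, here is a rough Python model of the WHEN=GROUP and OUTFIL logic. It is a simplified sketch, not DFSORT's implementation. The names push_group_ids and route_outfil are hypothetical, and DFSORT may handle edge cases differently, such as a BEGIN record that appears inside an open group. Like the job, the model compares the uppercase constants exactly. CH comparisons are case-sensitive, so if the data really opens each group with <Upos>, as in the posted sample, the BEGIN constant (and possibly END) would need to match that case.

```python
from typing import Dict, List

LRECL = 133  # FBA/133 input, as in the job


def push_group_ids(records: List[str]) -> List[str]:
    """Rough analogue of INREC IFTHEN=(WHEN=GROUP,BEGIN=(2,6,CH,EQ,C'<UPOS>'),
    END=(2,7,CH,EQ,C'</UPOS>'),PUSH=(134:ID=8))."""
    out: List[str] = []
    group_id = 0
    in_group = False
    for rec in records:
        rec = rec.ljust(LRECL)[:LRECL]
        # BEGIN: positions 2-7 (Python slice 1:7) equal '<UPOS>', exact case.
        if not in_group and rec[1:7] == "<UPOS>":
            group_id += 1
            in_group = True
        # Records in a group get the 8-digit ID; others are left blank here.
        out.append(rec + (f"{group_id:08d}" if in_group else " " * 8))
        # END: positions 2-8 equal '</UPOS>'; this record still belongs to the group.
        if in_group and rec[1:8] == "</UPOS>":
            in_group = False
    return out


def route_outfil(extended: List[str], n_files: int) -> Dict[str, List[str]]:
    """Rough analogue of OUTFIL FNAMES=OUTnn,BUILD=(1,133),INCLUDE=(134,8,ZD,EQ,nn)
    for nn = 1..n_files."""
    files: Dict[str, List[str]] = {f"OUT{n:02d}": [] for n in range(1, n_files + 1)}
    for rec in extended:
        ident = rec[LRECL:LRECL + 8]
        if not ident.strip():
            continue  # outside any group: no INCLUDE matches, record is not written
        name = f"OUT{int(ident):02d}"
        if name in files:  # a group with no matching OUTFIL is silently lost
            files[name].append(rec[:LRECL])  # BUILD=(1,133) drops the ID again
    return files


sample = [" AA1-AA001-3-V001.xml", " <UPOS>", " <Upo>first", " </UPOS>",
          " AA1-AA002-4-V001.xml", " <UPOS>", " <Upo>second", " </UPOS>"]
for name, recs in route_outfil(push_group_ids(sample), 3).items():
    print(name, [r.strip() for r in recs])
```

With three OUTFILs and two groups, OUT03 is simply empty. In the reverse case, a group whose ID has no OUTFIL statement is not written anywhere. That is why the maximum number of output files matters when the DD statements are hardcoded.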

Re: Splitting a file

Post by smita257 » Thu Jul 22, 2010 12:27 pm

Thank you, Frank, for your help.
If you don't mind, can you please tell me what PUSH=(134:ID=8) means here?
I found the syntax PUSH=(c:item) in the sort manual, where c is the starting position. What is the function of ID=8 here?
Please explain.

Re: Splitting a file

Post by smita257 » Thu Jul 22, 2010 5:39 pm

Does it mean that an identifier will be added for each group in positions 134 to 141, and increased by 1 for each new group?
Thanks.

Re: Splitting a file

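Post by Frank Yaeger » Thu Jul 22, 2010 10:33 pm

Yes, that's what it means. Each record of the first "group" will have ID=1, each record of the second "group" will have ID=2, and so on. The 8 in ID=8 is the length of the identifier, so it occupies positions 134-141 as an 8-digit zoned-decimal number (00000001, 00000002, ...).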

Re: Splitting a file

Post by smita257 » Fri Jul 23, 2010 9:48 am

Thank you!

