
Splitting a file

PostPosted: Mon Jul 19, 2010 2:55 pm
by smita257
I have a requirement to split the sample input file, given below.
A sample XML file:
AA1-AA001-3-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
AA1-AA002-4-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
---------------------------------------
----------------------------------

Now, a job needs to be written to split the above file into separate files (the number of files depends on the records present in the main input file), each running from the header part to the end of </UPOS>.
In the above example there will be 2 files. The first file will contain:
AA1-AA001-3-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
And the 2nd file will contain:
AA1-AA002-4-V001.xml
<Upos>
<Upo><Deelnemer><b>---------------------------------------------------
-------------------------------------------------------
</UPOS>
Now the number of records in the main input file is variable, and separate files need to be generated accordingly.

Re: Splitting a file

PostPosted: Mon Jul 19, 2010 9:02 pm
by Frank Yaeger
What identifies the "header part"? Is it 'AA' in positions 1-2, or ".xml" somewhere in the record, or what?

What is the RECFM and LRECL of the input file?

What is the maximum number of output files you will have?

Re: Splitting a file

PostPosted: Tue Jul 20, 2010 10:13 am
by smita257
Hi Frank,
Thanks for your response.
There is a change in the requirement. The header part will be identified by <Upos>, and the first output file name will be
AA1-AA001-3-V001.xml. Similarly, the second output file name will be AA1-AA002-4-V001.xml, and so on.
Each output file will contain the records from <Upos> to </Upos>.
The input file is a report file with attributes RECFM=FBA, LRECL=133.
The maximum number of output files is variable; it depends on the records present in the input file.

Thanks,
Debosmita.

Re: Splitting a file

PostPosted: Tue Jul 20, 2010 10:09 pm
by Frank Yaeger
Based on the information you've given, I suggest you write a program to do what you want so you can dynamically allocate the correct number of output data sets with the correct names.

If you were willing to hardcode the output DD statements or generate them as part of a job for the internal reader, you could probably do this with DFSORT using its group function and OUTFIL statements, but given that you don't even know the maximum number of output files, it would be difficult.
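
In case it is useful, here is a rough, untested sketch of what such a program might look like, written as a REXX exec run under IKJEFT01. The DD name INREPT, the HLQ.SPLIT.Gnnnnnnn output naming scheme and the allocation values are assumptions for illustration only; names like AA1-AA001-3-V001.xml can't be used as z/OS data set names as-is, so some naming convention has to be chosen in any case.

/* REXX - rough sketch only, not a tested solution.                 */
/* Read the report from DD INREPT and, for every <UPOS>...</UPOS>   */
/* group, allocate a new data set and write the group's records     */
/* to it. Records outside a group (such as the xml name lines)      */
/* are skipped, matching the revised requirement.                   */
"EXECIO * DISKR INREPT (STEM rec. FINIS"

grp   = 0                              /* group (output file) counter */
ingrp = 0                              /* 1 while inside a group      */

do i = 1 to rec.0
  data = translate(substr(rec.i,2))    /* skip ASA char, uppercase    */

  if ingrp = 0 & pos('<UPOS>',data) = 1 then do
    grp   = grp + 1                    /* start of a new group        */
    ingrp = 1
    n     = 0
    drop out.
  end

  if ingrp = 1 then do                 /* collect records in the group */
    n     = n + 1
    out.n = rec.i                      /* keep the record as-is        */

    if pos('</UPOS>',data) = 1 then do /* end of group - write it out  */
      dsn = "'HLQ.SPLIT.G"right(grp,7,'0')"'"
      "ALLOC FI(OUTDD) DA("dsn") NEW CATALOG UNIT(SYSDA)",
            "SPACE(15,15) TRACKS RECFM(F B A) LRECL(133)"
      "EXECIO" n "DISKW OUTDD (STEM out. FINIS"
      "FREE FI(OUTDD)"
      ingrp = 0
    end
  end
end

exit 0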

Re: Splitting a file

PostPosted: Wed Jul 21, 2010 10:30 am
by smita257
Basically, the input xml file is generated as part of Coolgen output. The requirement is to break the main input file into several output files based on the conditions I have written above.
It has been proposed to handle this functionality through JCL only, so I was wondering whether this can be achieved with the SORT utility, without writing any program.
If I fix the number of output files, for example 20, can you please suggest how to write the OUTFIL statements for them, since the records in each file have to start at <Upos> and end at </UPOS>? I can't use SKIPREC or STOPAFT, since it is not known how many records each output file will contain. Generally, I have coded OUTFIL statements with an INCLUDE condition that is unique for a particular file; in this case every output file starts at <Upos> and ends with </Upos>.
Is there any facility where we can specify the starting record and the ending record of the data to be written to a file?
For example, something like START(1,6,CH,EQ,C'<UPOS>'), END(1,7,CH,EQ,C'</UPOS>') -- is there any provision in SORT like that? Then the desired data would be written to a particular output file, including the records between the <Upos> and </Upos> lines. But that covers only one output file; how would the next one be written, since the input would have to continue from the 2nd <Upos>?
I hope you are getting my point.
Kindly suggest.
Thanks.

Re: Splitting a file

PostPosted: Wed Jul 21, 2010 10:29 pm
by Frank Yaeger
If you're willing to hardcode the DD statements, you can use a DFSORT job with the group function like the following. Since your input file has RECFM=FBA, I assumed <UPOS> and </UPOS> start in position 2 after the carriage control character, not in position 1.

//S1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=...  input file (FBA/133)
//OUT01 DD DSN=...  output file1 (FBA/133)
//OUT02 DD DSN=...  output file2 (FBA/133)
//OUT03 DD DSN=...  output file3 (FBA/133)
...  as many OUTnn DDs as you might need
//SYSIN DD *
  OPTION COPY
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(2,6,CH,EQ,C'<UPOS>'),
    END=(2,7,CH,EQ,C'</UPOS>'),PUSH=(134:ID=8))
  OUTFIL FNAMES=OUT01,BUILD=(1,133),INCLUDE=(134,8,ZD,EQ,1)
  OUTFIL FNAMES=OUT02,BUILD=(1,133),INCLUDE=(134,8,ZD,EQ,2)
  OUTFIL FNAMES=OUT03,BUILD=(1,133),INCLUDE=(134,8,ZD,EQ,3)
  ...  as many OUTFIL statements as you might need
/*

Re: Splitting a file

PostPosted: Thu Jul 22, 2010 12:27 pm
by smita257
Thank you Frank, for your help.
If you don't mind, can you please tell me what PUSH=(134:ID=8) means here?
I found the syntax PUSH=(c:item) in the SORT manual, where c is the starting position. What is the function of ID=8 here?
Please explain.

Re: Splitting a file

PostPosted: Thu Jul 22, 2010 5:39 pm
by smita257
Does it mean an identifier will be added for each group from position 134 to 141 and increased by 1 for each group?
Thanks.

Re: Splitting a file

PostPosted: Thu Jul 22, 2010 10:33 pm
by Frank Yaeger
Yes - that's what it means. So each record of the first "group" will have id=1, each record of the second "group" will have id=2, etc.

Re: Splitting a file

PostPosted: Fri Jul 23, 2010 9:48 am
by smita257
Thank You!