Splitting a very large file
Posted: Fri Aug 01, 2008 3:49 am
Hi,
We have a file of approximately 30 to 40 million records with an LRECL of 5493. The number of records can vary from run to run.
I need to split this file into smaller files of 1 million records each. The source file is on cartridge and the smaller files will also be going to cartridge. The source file is already sorted in the order that we want, so no sorting needs to be done.
I also need to add a header and footer to each file that is generated from splitting the file into smaller chunks.
The header will need to contain the following:
An identifier of ten 0's, then a blank, then a timestamp in the format YYYY-MM-DD-HH.MM.SS.TTTTTT (this timestamp must be the time that the job started and must be the same across all files), then the text 'ATO Temporary Resident Super File', then the number of the file (PIC 9(2)).
So the header for the fifth file would look like this:
0000000000 2008-08-01-15.30.32.123456ATO Temporary Resident Super File 05
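To make the header layout concrete, here is a rough Python sketch of the format only. It is not an ICETOOL solution; the name build_header is made up for illustration, and the blank before the file number is an assumption taken from the example line above rather than from the field list.

```python
from datetime import datetime

HEADER_TEXT = "ATO Temporary Resident Super File"


def build_header(job_start: datetime, file_number: int) -> str:
    """Hypothetical helper: format the header line for one split file."""
    # The file number is PIC 9(2), so it must fit in two digits.
    if not 1 <= file_number <= 99:
        raise ValueError("file number must fit PIC 9(2)")
    # %f is the six-digit microsecond field, matching TTTTTT.
    timestamp = job_start.strftime("%Y-%m-%d-%H.%M.%S.%f")
    # Assumption: one blank before the file number, as in the example line.
    return f"{'0' * 10} {timestamp}{HEADER_TEXT} {file_number:02d}"


# The same job start time is passed for every file so all headers agree.
job_start = datetime(2008, 8, 1, 15, 30, 32, 123456)
print(build_header(job_start, 5))
```

The job start time is captured once and reused, which is the behaviour needed for the timestamp to be identical across all the split files.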
The footer will need to contain the following:
An identifier of ten 9's, then the number of records in the file (PIC 9(7)), then a count of the number of records in all the files up to and including this file (PIC 9(8)).
So the footer for the fifth file would look like this:
9999999999100000005000000
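The footer layout and the running count can be sketched the same way. Again this is only a Python illustration of the expected output, not ICETOOL; build_footer and footers_for are hypothetical names, and the sketch assumes the last file simply holds whatever records are left over.

```python
def build_footer(records_in_file: int, records_so_far: int) -> str:
    """Hypothetical helper: format the footer line for one split file."""
    # PIC 9(7) for this file's count, PIC 9(8) for the cumulative count.
    if not 0 <= records_in_file <= 9_999_999:
        raise ValueError("file record count must fit PIC 9(7)")
    if not 0 <= records_so_far <= 99_999_999:
        raise ValueError("cumulative record count must fit PIC 9(8)")
    return f"{'9' * 10}{records_in_file:07d}{records_so_far:08d}"


def footers_for(total_records: int, chunk_size: int = 1_000_000) -> list[str]:
    """Footer lines for every split file, in order."""
    footers = []
    written = 0
    while written < total_records:
        # Assumption: the final file takes the remainder, so it may be short.
        in_file = min(chunk_size, total_records - written)
        written += in_file
        footers.append(build_footer(in_file, written))
    return footers


print(build_footer(1_000_000, 5_000_000))
for line in footers_for(2_500_000):
    print(line)
```

The cumulative count includes the current file, which is what the example for the fifth file implies (5,000,000 after five files of 1,000,000).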
Lastly, how much temporary work space does ICETOOL need to do this? I know it is a lot of data, and I am not sure how much to allocate for it to run efficiently.
Any help would be very much appreciated.
Aaron