Incoming file, delimited, no line feed, spanning records?




Postby Peter J » Wed May 28, 2014 7:59 pm

Hello, I have been searching but could not find out whether this is possible to handle:

Data looks like

2014-01-01 12:12:12,data,data,,,,data,data,,,data,,2014-01-01 12:12:22,data,data,,,,data,data,data,data,data,,2014-01-01 12:12:32,data,data,,,,data,data,,,data,,<continues to EOF>da
ta,data,,2014-01-01 12:12:12,data,data,,,,data,data,,,data,,

until the maximum (VB) length is reached, and then the data wraps. So in essence it's one giant record. A 50GB record.
Each logical record begins with a timestamp, and then data values or nulls based on the comma delimiter.

Can I read this effectively and create separate records?

2014-01-01 12:12:12,data,data,,data,,data,data,,,data,,<CRLF>
2014-01-01 12:12:22,data,data,,,,data,data,data,data,data,,<CRLF>
2014-01-01 12:12:32,data,data,,,,data,data,,,data,,<CRLF>
...

This is some sort of internet data dump coming in, and we're arguing for line feeds.

Re: Incoming file, delimited, no line feed, spanning records?

 


Postby steve-myers » Wed May 28, 2014 8:56 pm

The data you described probably started life as a Windoze text file and was sent to MVS as a binary transfer, or by the 3270 file transfer service using ASCII without CRLF. This assumes <CRLF> is not a figment of your imagination.

OS/360 data sets do not use record delimiter characters of any sort. IBM made that mistake with the 14xx series of computers and made a deliberate decision not to repeat it with System/360. RECFM V and VB data sets are delimited by physical record boundaries, Block Descriptor Words (BDWs), and Record Descriptor Words (RDWs). RECFM F and FB data sets are delimited by physical record boundaries and the data set's LRECL. RECFM U data sets are delimited by physical record boundaries. The Selecting Record Formats for Non-VSAM Data Sets chapter in DFSMS Using Data Sets for your z/OS release discusses this matter in more detail.
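
To make the BDW/RDW layout concrete, here is a minimal Python sketch of how one RECFM=VB block could be walked off-platform. It assumes the block was captured byte for byte with its descriptor words intact, that the BDW is in the standard 4-byte form, and that no records are spanned. The function name `iter_vb_records` is made up for illustration and is not part of any IBM tool.

```python
import struct
from typing import Iterator


def iter_vb_records(block: bytes) -> Iterator[bytes]:
    """Yield the logical records in one RECFM=VB block (BDW included).

    Assumptions (not from the thread): standard 4-byte BDW, unspanned
    records, and a block that was transferred without any translation.
    """
    # BDW: 2-byte big-endian block length (it counts the BDW itself),
    # followed by 2 reserved bytes.
    (block_len,) = struct.unpack_from(">H", block, 0)
    pos = 4
    while pos < block_len:
        # RDW: 2-byte big-endian record length (it counts the RDW itself),
        # followed by 2 bytes that are zero for unspanned records.
        (rec_len,) = struct.unpack_from(">H", block, pos)
        if rec_len < 4:
            raise ValueError(f"bad RDW at offset {pos}")
        yield block[pos + 4 : pos + rec_len]
        pos += rec_len
```

The point of the sketch is that the record boundaries come from the length fields, never from a delimiter character inside the data.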

The good news is it should be fairly easy to transform this data to a standard OS/360 data set. Your sort product may be able to do it, though that's out of my field, or it should be possible to write a program for the purpose.

Re: Incoming file, delimited, no line feed, spanning records

Postby BillyBoyo » Wed May 28, 2014 10:19 pm

And there's me just about to say it'll be tricky in SORT. Perhaps.

Is there a fixed number of fields? How many? Maximum fields on a record? LRECL? What SORT product? What version of same (from sysout of any SORT step)?

Re: Incoming file, delimited, no line feed, spanning records

Postby Peter J » Wed May 28, 2014 10:44 pm

z/OS DFSORT V1R12

Fixed number of fields - so they say, 28. But when you come to the end of a column delimiter, the 28th field may or may not be the timestamp that separates data.
Current leaning is to max VB length - 32767.

Which means, I suspect, I would have to examine each data element, determine if it starts with 2014- or 2015- and so on to say - ah, ok this is new.
If there were one or two records per row, then no problem, but what I can't determine is this: if I come to the end of a line and the record continues on the next one, how would I represent that?
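
One way to picture the "record continues on the next one" case is a carry-over buffer: keep whatever is left at the end of one physical record and prepend it to the next. The Python sketch below is only an illustration of that idea, not a DFSORT solution. It assumes every logical record starts with a `YYYY-MM-DD HH:MM:SS,` timestamp and that this pattern never appears inside a data field. The names `STAMP` and `split_records` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumption: a logical record always begins with 'YYYY-MM-DD HH:MM:SS,'
# and that pattern never occurs inside an ordinary data field.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},")


def split_records(chunks: Iterable[str]) -> Iterator[str]:
    """Re-cut wrapped physical records into one string per logical record."""
    buf = ""  # holds the logical record that is still incomplete
    for chunk in chunks:
        buf += chunk  # a record cut off by the wrap is simply continued
        while True:
            # Search from offset 1 so the timestamp that opens the current
            # record is skipped and only the *next* record start is found.
            match = STAMP.search(buf, 1)
            if match is None:
                break  # the next timestamp has not arrived (or is cut off)
            yield buf[: match.start()]
            buf = buf[match.start() :]
    if buf:
        yield buf  # whatever remains at end of file is the last record
```

Because the split is driven by the timestamp rather than by counting 28 fields, a field or even a timestamp that is cut in half by the wrap is rejoined before it is examined.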

