Parsing Excel CSV file with enclosed quotes



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

Parsing Excel CSV file with enclosed quotes

Postby MitchellSTL » Wed Feb 16, 2011 9:19 am

Input record format: Variable
Input record length: 256
Input record organizations: PS

The CSV input file layout is:

“ID #”, “”, “Acct #”,”Timestamp”,”Free form text #1”,”Free form text #2”,”Timestamp”

Field #1 – ID #; Most likely (always?) 15 positions
Field #2 – Not used; Always empty
Field #3 – Not used; Acct #
Field #4 – Not used; Timestamp #1
Field #5 – Not used; Free form text #1
Field #6 – Free form text #2
Field #7 – Timestamp #2 (format: 2011/02/13 07:06:25)

Output format:
Field #1 – 15 positions
Field #7 – 20 positions; Timestamp #2
Field #6 – 100 positions; Free form text #2

I was using the following SYSIN from another JCL that we use to convert VB to FB record when parsing fixed CSV files. This is the SYSIN minus the last line which is the OUTREC.

//SYSIN    DD *                               
  OPTION VLSHRT                               
  SORT FIELDS=(5,1,CH,A)                       
  INCLUDE COND=(5,4,CH,EQ,C'"ID #')           
  OUTFIL FNAMES=REFORMAT,                     
  CONVERT,                                     
  OUTREC .....


TIA
MitchellSTL
 
Posts: 27
Joined: Wed Feb 16, 2011 8:57 am
Has thanked: 1 time
Been thanked: 0 time

Re: Parsing Excel CSV file with enclosed quotes

 

Re: Parsing Excel CSV file with enclosed quotes

Postby Frank Yaeger » Thu Feb 17, 2011 12:21 am

It isn't clear what you want to do. It isn't clear if you want to keep the quotes in the output. It isn't clear whether or not every input field is enclosed in quotes. It isn't clear what the input record looks like when a field isn't used.

Please show a better example of the records in your input file for various possible cases you need to handle and what you expect for output. Explain the "rules" for getting from input to output. Give the RECFM and LRECL of the input file and output file.
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1080
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 14 times

Re: Parsing Excel CSV file with enclosed quotes

Postby MitchellSTL » Thu Feb 17, 2011 12:56 am

Sorry.. Hopefully this will help.

Yes, everY column is enclosed with double quotes. Here's a few records from the file...

"RECD AAAAAAAAAA","","00000X","2011/02/13 07:27:18 ",".79 miles NNW of Prathersville, MO","At Columbia, MO","2011/02/13 07:27:59 "
"RECD AAAAAAAAAA","","00000X","2011/02/13 07:08:44 ","3.81 miles E of Guasti, CA","5.49 miles WSW of Fontana, CA","2011/02/13 07:09:09 "
"RECD AAAAAA    ","","00000X","2011/02/13 07:08:51 ","3.82 miles NNW of Glen Avon, CA","5.46 miles WSW of Fontana, CA","2011/02/13 07:09:14 "
"RECD AAAA      ","","00000X","2011/02/14 06:56:22 ","1.44 miles WNW of Sky Lake, FL","5.97 miles SW of Orlando, FL","2011/02/14 06:57:03 "
"RECD AAAAAAAAAA","","00000X","2011/02/13 07:22:09 ","4.27 miles SSW of Cornersville, TN","43.99 miles NNW of Huntsville, AL","2011/02/13 07:22:34 "
"RECD AAAAAAAAAA","","00000X","2011/02/14 06:20:44 ","3.79 miles E of Guasti, CA","5.49 miles WSW of Fontana, CA","2011/02/14 06:21:20 "
"RECD AAAAAAA   ","","00000x","2007/10/27 06:33:30 ","UNKNOWN","UNKNOWN","UNKNOWN             "


I'd like the output to not have the enclosed quotes. When the a field is not used, it has double quotes. For example, the 2nd field is not used.

The output would look like...

RECD AAAAAAAAAA2011/02/13 07:27:59At Columbia, MO
RECD AAAAAAAAAA2011/02/13 07:09:095.49 miles WSW of Fontana, CA
RECD AAAAAAAAAA2011/02/13 07:09:145.46 miles WSW of Fontana, CA
RECD AAAAAAAAAA2011/02/14 06:57:035.97 miles SW of Orlando, FL
RECD AAAAAAAAAA2011/02/13 07:22:3443.99 miles NNW of Huntsville, AL
RECD AAAAAAAAAA2011/02/14 06:21:205.49 miles WSW of Fontana, CA
RECD AAAAAAA   UNKNOWN            UNKNOWN


This is what I came up with so far.. It works (sorta), BUT it leaves a trailing double quote at the end..

//SYSIN    DD *                                         
  OPTION COPY                                           
  INCLUDE COND=(1,6,CH,EQ,C'"XTRA ')                   
  OUTFIL FNAMES=REFORMAT                               
  INREC PARSE=(%00=(ENDBEFR=C',',PAIR=QUOTE,FIXLEN=17),
                 %=(ENDBEFR=C',',PAIR=QUOTE),           
                 %=(ENDBEFR=C',',PAIR=QUOTE),           
                 %=(ENDBEFR=C',',PAIR=QUOTE),           
                 %=(ENDBEFR=C',',PAIR=QUOTE),           
               %05=(ENDBEFR=C',',PAIR=QUOTE,FIXLEN=52),
               %06=(ENDBEFR=C',',PAIR=QUOTE,FIXLEN=22)),
        BUILD=(%00,%06,%05)                             
  OUTREC BUILD=(2,15,19,19,41,50)


The output from that is...

****** *********************************** Top of Data ************************************
- - -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 14 Line(s) not Displayed
000015 RECD AAAAAAAAAA2011/02/13 07:27:59At Columbia, MO"                                 
000016 RECD AAAAAAAAAA2011/02/13 07:09:095.49 miles WSW of Fontana, CA"                   
000017 RECD AAAAAA    2011/02/13 07:09:145.46 miles WSW of Fontana, CA"                   
000018 RECD AAAA      2011/02/14 06:57:035.97 miles SW of Orlando, FL"                     
000019 RECD AAAAAAAAAA2011/02/13 07:22:3443.99 miles NNW of Huntsville, AL"               
000020 RECD AAAAAAAAAA2011/02/14 06:21:205.49 miles WSW of Fontana, CA"                   
000021 RECD AAAAAAA   UNKNOWN            UNKNOWN"                                         
- - -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 58 Line(s) not Displayed
****** ********************************** Bottom of Data **********************************
MitchellSTL
 
Posts: 27
Joined: Wed Feb 16, 2011 8:57 am
Has thanked: 1 time
Been thanked: 0 time

Re: Parsing Excel CSV file with enclosed quotes

Postby Frank Yaeger » Thu Feb 17, 2011 2:35 am

I don't know how your job (sorta) works given that

1) it appears to assume FB input, but you said your input is actually VB
2) your INCLUDE statement would not get any hits.

Assuming your input really is VB and you want the output to be FB, here's a DFSORT job that will do what you want:

//S1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=...  input file (VB/256)
//REFORMAT DD DSN=...  output file (FB/84)
//SYSIN    DD *
  OPTION COPY
  INREC PARSE=(%00=(ABSPOS=6,ENDBEFR=C'","',FIXLEN=15),
                 %=(ENDBEFR=C'","'),
                 %=(ENDBEFR=C'","'),
                 %=(ENDBEFR=C'","'),
                 %=(ENDBEFR=C'","'),
               %05=(ENDBEFR=C'","',FIXLEN=50),
               %06=(ENDBEFR=C'"',FIXLEN=19)),
        BUILD=(1,4,%00,%06,%05)
  OUTFIL FNAMES=REFORMAT,VTOF,
     BUILD=(5,84)
/*
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1080
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 14 times

Re: Parsing Excel CSV file with enclosed quotes

Postby MitchellSTL » Thu Feb 17, 2011 5:54 am

Thanks! That did the trick!

Yes it is VB. I could not get the CSV parsing AND VB to FB conversion to occur within the same step. My solution was to do the conversion in a previous step. So by the time I replied / posted I did have it in a FB. BUT my ultimate goal to achieve what you did you in your solution. The INCLUDE that I had was something I accidentally omitted from my requirement (there were some junk records in the front of the data). The INCLUDE would work if it was a FB file. This is what I ended up with.

//SYSIN    DD *                                         
  OPTION VLSCMP                                         
  SORT FIELDS=(5,1,CH,A)                               
  INCLUDE COND=(5,6,CH,EQ,C'"RECD ')                   
  INREC PARSE=(%00=(ABSPOS=6,ENDBEFR=C'","',FIXLEN=15),
                 %=(ENDBEFR=C'","'),                   
                 %=(ENDBEFR=C'","'),                   
                 %=(ENDBEFR=C'","'),                   
                 %=(ENDBEFR=C'","'),                   
               %05=(ENDBEFR=C'","',FIXLEN=50),         
               %06=(ENDBEFR=C'"',FIXLEN=19)),           
        BUILD=(1,4,%00,%06,%05)                         
  OUTFIL FNAMES=REFORMAT,VTOF,                         
     BUILD=(5,84)                                       
/*                                                     


If you see anything that I did wrong, please feel free to suggest any improvements. I added the INCLUDE & SORT parameters and modified the OPTION from COPY to VLSCMP.

Love that this can be done on the mainframe now. I appreciate your help!!
MitchellSTL
 
Posts: 27
Joined: Wed Feb 16, 2011 8:57 am
Has thanked: 1 time
Been thanked: 0 time

Re: Parsing Excel CSV file with enclosed quotes

Postby Frank Yaeger » Thu Feb 17, 2011 6:18 am

I would just suggest moving the SORT statement between the INREC and OUTFIL statements to reflect the order in which the statements are actually processed (INCLUDE, INREC, SORT, OUTFIL). I find that to be a good visual trick. (I assume you know that the SORT statement processes the reformatted INREC records rather than the original input records.)

Glad I could help.
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1080
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 14 times


Return to DFSORT/ICETOOL/ICEGENER

 


  • Related topics
    Replies
    Views
    Last post