Page 1 of 2

Calculate exact length of PARSEd field

PostPosted: Fri Apr 05, 2013 8:40 pm
by mendijur
Hello,

I have the following requirement and I started trying DFSORT's PARSE statement:

The input is a CSV file (LRECL=80, LRECFM=FB). Each record will be inserted in a DB2 table. Each comma-separated field is the value of a column of the target DB2 table. Records that start with a semicolon are comments and so must be omitted/ignored.

So the object is to transform each CSV record into an SQL INSERT statement.

Sample CSV input:
;XMLTAG,MANDATORY(YES/NO),TAGTYPE(Internal/Ctrl/Normal)
<SctiesSttlmCondModStsAdvc>,YES,CTRL
<ReqRef>,NO,CTRL
<Id>,YES,NORMAL
</ReqRef>,NO,CTRL
...


Those 4 first records would be at a later stage transformed into something similar to the following:
INSERT INTO MyTableName VALUES(005,'<SctiesSttlmCondModStsAdvc>','Y', 'C', 27);
INSERT INTO MyTableName VALUES(010,'<ReqRef>','N', 'C',  8);
INSERT INTO MyTableName VALUES(015,'<Id>','Y', 'N', 4);
INSERT INTO MyTableName VALUES(020,'</ReqRef>','N', 'C', 9);

Where the first field in the SQL statement is a sequence number (incremented by 5), and the last field is the length of the XML tag in the input CSV file.

Using DFSORT's PARSE function, I've almost managed to parse the input, but I can't find a way to calculate the length.

This is my question: How can I calculate with DFSORT the exact length of a parsed field? Scrutinizing DFSORT manual I haven't come up with a "length" function. Any hint or suggestion would be highly appreciated.

Note: I know it's possible to LOAD a table using CSV syntax. I'm not interested in this. What interests me is the feasibility of calculating the length of a parsed field.

This is my approach using DFSORT statements:
//SYSIN DD *
  OPTION COPY
* Ignore comments in input
  OMIT COND=(1,1,CH,EQ,C';')
* Parse input fields
  OUTFIL PARSE=(%01=(ENDBEFR=C',',PAIR=QUOTE,FIXLEN=80), * XML Tag, max 80 bytes
                %02=(ENDBEFR=C',',FIXLEN=1),  * Mandatory: (Y)/(N)
                %03=(ENDBEFR=C',',FIXLEN=1)), * TagType: (I)nternal/(C)trl/(N)ormal
         BUILD=(SEQNUM,3,ZD,START=5,INCR=5,   * Writes Sequence number
                %01,    * writes XML tag
                %02,    * writes Y or N (mandatory)
                %03,    * writes I C or N (tagtype)
                ????? ) * How to calculate exact LENGTH OF %01 ??????
/*

Re: Calculate exact length of PARSEd field

PostPosted: Fri Apr 05, 2013 11:34 pm
by BillyBoyo
If you notice, you tell PARSE the length.

Do you mean "the length of the data excluding trailing blanks" or what?

Re: Calculate exact length of PARSEd field

PostPosted: Sat Apr 06, 2013 1:19 am
by mendijur
BillyBoyo wrote:If you notice, you tell PARSE the length.

Do you mean "the length of the data excluding trailing blanks" or what?


Yes BillyBoyo, I mean the length of the original data parsed, i.e., in my example above, I need the exact length of the XML tag name, up to the closing '>' character, excluding any trailing blank.

As far as I understand, specifying FIXLEN=80 (FIXLEN is mandatory) for PARSE will make DFSORT pad with spaces on the right if the data parsed is less than 80 bytes long, such as the manual states "FIXLEN=m specifies the length (m) of the fixed area to contain the extracted variable-length data for the %nn fixed parsed field".

Thanks for your reply.

Re: Calculate exact length of PARSEd field

PostPosted: Sun Apr 07, 2013 12:56 pm
by BillyBoyo
How many lengths of fields do you want to know at a time?

Can there be embedded blanks in the data?

Say you have a FIXLEN=5, and you put the parsed field at position 1. This is when you can have embedded blanks:
  ... IFTHEN=(WHEN=INIT,OVERLAY=(6:C'5')),
      IFTHEN=(WHEN=(1,5,CH,EQ,C' '),
                       OVERLAY=(6:C'0')),
      IFTHEN=(WHEN=(2,4,CH,EQ,C' '),
                       OVERLAY=(6:C'1')),
      IFTHEN=(WHEN=(3,3,CH,EQ,C' '),
                       OVERLAY=(6:C'2')),
      IFTHEN=(WHEN=(4,2,CH,EQ,C' '),
                       OVERLAY=(6:C'3')),
      IFTHEN=(WHEN=(5,1,CH,EQ,C' '),
                       OVERLAY=(6:C'4'))


You might think that that's a lot of code, but since you want the answer... The code can be "generated" and then you copy it in (saves typos).

If there is no chance of embedded blanks, there is a "home-rolled" "length function" :-)

Re: Calculate exact length of PARSEd field

PostPosted: Sun Apr 07, 2013 7:36 pm
by mendijur
Many thanks for your answer, BillyBoyo.

Yes, it seems quite a bit of code, and even more if the field is 80 bytes long, as in my example. As you suggest, the code could somehow be automatically generated from a template.

However, in this other post, another user had the same need as me, and it seems there's no easy/direct way to calculate the exact length of a parsed field, even if DFSORT must internally know. I wonder how the length of a parsed field could be externally exposed/accessible. I also wonder when. In future versions?
dfsort-icetool-icegener/topic6704.html

Anyway, I think I'll resort to plan B: do the parsing using AWK scripting language from z/OS UNIX. AWK has a built-in function "length" and can be applied to any parsed field.

Again, many thanks for your effort and responsiveness, BillyBoyo.
Kind regards.

Re: Calculate exact length of PARSEd field

PostPosted: Sun Apr 07, 2013 8:56 pm
by BillyBoyo
Yes, if you need to know the lengths, AWK would be a good choice of language, or Rexx, perhaps. What are your data volumes?

As to "when" this may happen, if you can describe why you need the length, I'm sure Kolusu, a DFSORT developer at IBM, will see the post.

Re: Calculate exact length of PARSEd field

PostPosted: Tue Apr 09, 2013 2:41 am
by skolusu
well here is a simplified version to calculate the length.

//STEP0100 EXEC PGM=SORT                 
//SYSOUT   DD SYSOUT=*                   
//SYMNAMES DD *                         
T,C'1234567890'                         
//SORTIN   DD *                         
<SCTIESSTTLMCONDMODSTSADVC>             
<REQREF>                                 
<ID>                                     
</REQREF>                               
<LENGTH OF 79                           
<LENGTH OF 80 BYTES                     
<BLAH BLAH BLAH BAS>                     
<DUMMY RECORD>                           
                                         
//SORTOUT  DD SYSOUT=*                   
//SYSIN    DD *                         
  OPTION COPY                                                       
  INREC IFTHEN=(WHEN=INIT,PARSE=(%01=(ENDAT=C'>',FIXLEN=80)),       
  BUILD=(%01,%01,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=82))), 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'00'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'10'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'20'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'30'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'40'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'50'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'60'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'70'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,SS,NE,C'",0'),                                 
  BUILD=(+80,SUB,(166,2,ZD,ADD,162,1,ZD),EDIT=(TTT),1,80)),         
  IFTHEN=(WHEN=(162,1,CH,EQ,C'"'),BUILD=(C'080',1,80)),             
  IFTHEN=(WHEN=(162,1,CH,EQ,C'0'),                                   
  BUILD=(+80,SUB,(+10,ADD,166,2,ZD,ADD,162,1,ZD),EDIT=(TTT),1,80))   
//*


The output from this job is
027<SCTIESSTTLMCONDMODSTSADVC>     
008<REQREF>                       
004<ID>                           
009</REQREF>                       
079<LENGTH OF 79                   
080<LENGTH OF 80 BYTES             
020<BLAH BLAH BLAH BAS>           
014<DUMMY RECORD>                 
000                               

Re: Calculate exact length of PARSEd field

PostPosted: Tue Apr 09, 2013 6:03 am
by BillyBoyo
Well, I'd already prepared one for if there were no embedded blanks.

After seeing Kolusu's code, I realised I could apply the same technique (which uses FINDREP) whilst preserving the embedded blanks by using SQZ with PAIR=QUOTE.

I have symbols for counting, expressed as binary values. Symbols cannot be continued, so I need three of them. I need to bound them by quotes (got me scratching my head until I realised X'40' was being squeezed from my counts :-) ), and prefix the first quote by "something" (I chose a binary zero) to avoid confusion from consecutive quotes.

//STEP0200 EXEC PGM=SORT
//SYMNAMES DD *
A-QUOTE,C'"'
A-BINARY-ZERO,X'00'
C1,X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F'
C2,X'202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F'
C3,X'404142434445464748494A4B4C4D4E4F50'
//SYMNOUT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
                                                                       
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,
                  PARSE=(%01=(ENDAT=C'>',FIXLEN=80)),
                  BUILD=(%01,%01)),
  IFTHEN=(WHEN=INIT,
         FINDREP=(IN=C'"',
                  OUT=X'FF',
                  STARTPOS=81,
                  ENDPOS=160)),
  IFTHEN=(WHEN=INIT,
         OVERLAY=(81:81,80,
                  JFY=(SHIFT=LEFT,
                  LEAD=C'"',
                  TRAIL=C'"',
                  LENGTH=82),
                  A-BINARY-ZERO,A-QUOTE,C1,C2,C3,A-QUOTE)),
  IFTHEN=(WHEN=INIT,
        OVERLAY=(81:81,165,SQZ=(SHIFT=LEFT,PAIR=QUOTE))),
  IFTHEN=(WHEN=INIT,
  BUILD=(+80,SUB,165,1,BI,EDIT=(TTT),1,80))
//SORTIN   DD *
"<SCTIESSTTLMCONDMODSTSADVC012>
"<SCTIESSTTLMCONDMODSTSADVC""">
<SCTIESSTTLMCONDMODSTSADVC>
<REQREF>
<ID>
</REQREF>
<LENGTH OF 79                                                                 > 
<LENGTH OF OF 80 BYTES                                                            >
<BLAH BLAH BLAH BAS>
<"LAH BLAH BLAH BAS>
<BLAH BLAH "LAH BAS>
<"BLAH BLAH BLAH BAS">
<DUMMY RECORD>
>
1>


Produces:
031"<SCTIESSTTLMCONDMODSTSADVC012>
031"<SCTIESSTTLMCONDMODSTSADVC""">
027<SCTIESSTTLMCONDMODSTSADVC>   
008<REQREF>                       
004<ID>                           
009</REQREF>                     
079<LENGTH OF 79                                                                 >
080<LENGTH OF 80 BYTES                                                            >
020<BLAH BLAH BLAH BAS>           
020<"LAH BLAH BLAH BAS>           
020<BLAH BLAH "LAH BAS>           
022<"BLAH BLAH BLAH BAS">         
014<DUMMY RECORD>                 
001>                             
0021>                             
000                               

Re: Calculate exact length of PARSEd field

PostPosted: Tue Apr 09, 2013 5:19 pm
by mendijur
BillyBoyo wrote:Yes, if you need to know the lengths, AWK would be a good choice of language, or Rexx, perhaps. What are your data volumes?

As to "when" this may happen, if you can describe why you need the length, I'm sure Kolusu, a DFSORT developer at IBM, will see the post.


Data volumes are presumably small. Even if at production might be higher, the process I'm working on would only be run once, just to automate the initial load of data. Afterwards, the user could use the "modify" function/process to change the data loaded accordingly.

If you have a look at the example above, the length I need is the length of an XML tag. One of the pieces of input data is the XML tag. I wanted to abstract from counting and let the process calculate it automatically. In this particular case, I was interested in the length of only one of the parsed fields (in the example, the first, i.e., the length of the XML tag).

Having used AWK has been easier. The first chance was to use DFSORT's PARSE function, which I managed to code, but then I missed a "length" function.

Let me study your answer and Kolusu's.
Many many thanks for your support guys.

Re: Calculate exact length of PARSEd field

PostPosted: Sun Apr 14, 2013 3:11 am
by mendijur
I find AWK, for this particular case, a more flexible approach than using DFSORT.

I'm very grateful to you guys for your support.
Kindest regards.