Calculate exact length of PARSEd field



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

Calculate exact length of PARSEd field

Postby mendijur » Fri Apr 05, 2013 8:40 pm

Hello,

I have the following requirement and I started trying DFSORT's PARSE statement:

The input is a CSV file (LRECL=80, LRECFM=FB). Each record will be inserted in a DB2 table. Each comma-separated field is the value of a column of the target DB2 table. Records that start with a semicolon are comments and so must be omitted/ignored.

So the object is to transform each CSV record into an SQL INSERT statement.

Sample CSV input:
;XMLTAG,MANDATORY(YES/NO),TAGTYPE(Internal/Ctrl/Normal)
<SctiesSttlmCondModStsAdvc>,YES,CTRL
<ReqRef>,NO,CTRL
<Id>,YES,NORMAL
</ReqRef>,NO,CTRL
...


Those 4 first records would be at a later stage transformed into something similar to the following:
INSERT INTO MyTableName VALUES(005,'<SctiesSttlmCondModStsAdvc>','Y', 'C', 27);
INSERT INTO MyTableName VALUES(010,'<ReqRef>','N', 'C',  8);
INSERT INTO MyTableName VALUES(015,'<Id>','Y', 'N', 4);
INSERT INTO MyTableName VALUES(020,'</ReqRef>','N', 'C', 9);

Where the first field in the SQL statement is a sequence number (incremented by 5), and the last field is the length of the XML tag in the input CSV file.

Using DFSORT's PARSE function, I've almost managed to parse the input, but I can't find a way to calculate the length.

This is my question: How can I calculate with DFSORT the exact length of a parsed field? Scrutinizing DFSORT manual I haven't come up with a "length" function. Any hint or suggestion would be highly appreciated.

Note: I know it's possible to LOAD a table using CSV syntax. I'm not interested in this. What interests me is the feasibility of calculating the length of a parsed field.

This is my approach using DFSORT statements:
//SYSIN DD *
  OPTION COPY
* Ignore comments in input
  OMIT COND=(1,1,CH,EQ,C';')
* Parse input fields
  OUTFIL PARSE=(%01=(ENDBEFR=C',',PAIR=QUOTE,FIXLEN=80), * XML Tag, max 80 bytes
                %02=(ENDBEFR=C',',FIXLEN=1),  * Mandatory: (Y)/(N)
                %03=(ENDBEFR=C',',FIXLEN=1)), * TagType: (I)nternal/(C)trl/(N)ormal
         BUILD=(SEQNUM,3,ZD,START=5,INCR=5,   * Writes Sequence number
                %01,    * writes XML tag
                %02,    * writes Y or N (mandatory)
                %03,    * writes I C or N (tagtype)
                ????? ) * How to calculate exact LENGTH OF %01 ??????
/*
mendijur
 
Posts: 14
Joined: Tue Feb 14, 2012 3:14 am
Has thanked: 2 times
Been thanked: 0 time

Re: Calculate exact length of PARSEd field

Postby BillyBoyo » Fri Apr 05, 2013 11:34 pm

If you notice, you tell PARSE the length.

Do you mean "the length of the data excluding trailing blanks" or what?
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 265 times

Re: Calculate exact length of PARSEd field

Postby mendijur » Sat Apr 06, 2013 1:19 am

BillyBoyo wrote:If you notice, you tell PARSE the length.

Do you mean "the length of the data excluding trailing blanks" or what?


Yes BillyBoyo, I mean the length of the original data parsed, i.e., in my example above, I need the exact length of the XML tag name, up to the closing '>' character, excluding any trailing blank.

As far as I understand, specifying FIXLEN=80 (FIXLEN is mandatory) for PARSE will make DFSORT pad with spaces on the right if the data parsed is less than 80 bytes long, such as the manual states "FIXLEN=m specifies the length (m) of the fixed area to contain the extracted variable-length data for the %nn fixed parsed field".

Thanks for your reply.
mendijur
 
Posts: 14
Joined: Tue Feb 14, 2012 3:14 am
Has thanked: 2 times
Been thanked: 0 time

Re: Calculate exact length of PARSEd field

Postby BillyBoyo » Sun Apr 07, 2013 12:56 pm

How many lengths of fields do you want to know at a time?

Can there be embedded blanks in the data?

Say you have a FIXLEN=5, and you put the parsed field at position 1. This is when you can have embedded blanks:
  ... IFTHEN=(WHEN=INIT,OVERLAY=(6:C'5')),
      IFTHEN=(WHEN=(1,5,CH,EQ,C' '),
                       OVERLAY=(6:C'0')),
      IFTHEN=(WHEN=(2,4,CH,EQ,C' '),
                       OVERLAY=(6:C'1')),
      IFTHEN=(WHEN=(3,3,CH,EQ,C' '),
                       OVERLAY=(6:C'2')),
      IFTHEN=(WHEN=(4,2,CH,EQ,C' '),
                       OVERLAY=(6:C'3')),
      IFTHEN=(WHEN=(5,1,CH,EQ,C' '),
                       OVERLAY=(6:C'4'))


You might think that that's a lot of code, but since you want the answer... The code can be "generated" and then you copy it in (saves typos).

If there is no chance of embedded blanks, there is a "home-rolled" "length function" :-)
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 265 times

Re: Calculate exact length of PARSEd field

Postby mendijur » Sun Apr 07, 2013 7:36 pm

Many thanks for your answer, BillyBoyo.

Yes, it seems quite a bit of code, and even more if the field is 80 bytes long, as in my example. As you suggest, the code could somehow be automatically generated from a template.

However, in this other post, another user had the same need as me, and it seems there's no easy/direct way to calculate the exact length of a parsed field, even if DFSORT must internally know. I wonder how the length of a parsed field could be externally exposed/accessible. I also wonder when. In future versions?
dfsort-icetool-icegener/topic6704.html

Anyway, I think I'll resort to plan B: do the parsing using AWK scripting language from z/OS UNIX. AWK has a built-in function "length" and can be applied to any parsed field.

Again, many thanks for your effort and responsiveness, BillyBoyo.
Kind regards.
mendijur
 
Posts: 14
Joined: Tue Feb 14, 2012 3:14 am
Has thanked: 2 times
Been thanked: 0 time

Re: Calculate exact length of PARSEd field

Postby BillyBoyo » Sun Apr 07, 2013 8:56 pm

Yes, if you need to know the lengths, AWK would be a good choice of language, or Rexx, perhaps. What are your data volumes?

As to "when" this may happen, if you can describe why you need the length, I'm sure Kolusu, a DFSORT developer at IBM, will see the post.
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 265 times

Re: Calculate exact length of PARSEd field

Postby skolusu » Tue Apr 09, 2013 2:41 am

well here is a simplified version to calculate the length.

//STEP0100 EXEC PGM=SORT                 
//SYSOUT   DD SYSOUT=*                   
//SYMNAMES DD *                         
T,C'1234567890'                         
//SORTIN   DD *                         
<SCTIESSTTLMCONDMODSTSADVC>             
<REQREF>                                 
<ID>                                     
</REQREF>                               
<LENGTH OF 79                           
<LENGTH OF 80 BYTES                     
<BLAH BLAH BLAH BAS>                     
<DUMMY RECORD>                           
                                         
//SORTOUT  DD SYSOUT=*                   
//SYSIN    DD *                         
  OPTION COPY                                                       
  INREC IFTHEN=(WHEN=INIT,PARSE=(%01=(ENDAT=C'>',FIXLEN=80)),       
  BUILD=(%01,%01,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=82))), 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'00'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'10'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'20'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'30'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'40'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'50'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'60'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,CH,EQ,C' '),OVERLAY=(81:81,82,SQZ=(SHIFT=LEFT,
          PAIR=QUOTE,TRAIL=T),166:C'70'),HIT=NEXT),                 
  IFTHEN=(WHEN=(162,1,SS,NE,C'",0'),                                 
  BUILD=(+80,SUB,(166,2,ZD,ADD,162,1,ZD),EDIT=(TTT),1,80)),         
  IFTHEN=(WHEN=(162,1,CH,EQ,C'"'),BUILD=(C'080',1,80)),             
  IFTHEN=(WHEN=(162,1,CH,EQ,C'0'),                                   
  BUILD=(+80,SUB,(+10,ADD,166,2,ZD,ADD,162,1,ZD),EDIT=(TTT),1,80))   
//*


The output from this job is
027<SCTIESSTTLMCONDMODSTSADVC>     
008<REQREF>                       
004<ID>                           
009</REQREF>                       
079<LENGTH OF 79                   
080<LENGTH OF 80 BYTES             
020<BLAH BLAH BLAH BAS>           
014<DUMMY RECORD>                 
000                               
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
skolusu
 
Posts: 586
Joined: Wed Apr 02, 2008 10:38 pm
Has thanked: 0 time
Been thanked: 39 times

Re: Calculate exact length of PARSEd field

Postby BillyBoyo » Tue Apr 09, 2013 6:03 am

Well, I'd already prepared one for if there were no embedded blanks.

After seeing Kolusu's code, I realised I could apply the same technique (which uses FINDREP) whilst preserving the embedded blanks by using SQZ with PAIR=QUOTE.

I have symbols for counting, expressed as binary values. Symbols cannot be continued, so I need three of them. I need to bound them by quotes (got me scratching my head until I realised X'40' was being squeezed from my counts :-) ), and prefix the first quote by "something" (I chose a binary zero) to avoid confusion from consecutive quotes.

//STEP0200 EXEC PGM=SORT
//SYMNAMES DD *
A-QUOTE,C'"'
A-BINARY-ZERO,X'00'
C1,X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F'
C2,X'202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F'
C3,X'404142434445464748494A4B4C4D4E4F50'
//SYMNOUT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
                                                                       
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,
                  PARSE=(%01=(ENDAT=C'>',FIXLEN=80)),
                  BUILD=(%01,%01)),
  IFTHEN=(WHEN=INIT,
         FINDREP=(IN=C'"',
                  OUT=X'FF',
                  STARTPOS=81,
                  ENDPOS=160)),
  IFTHEN=(WHEN=INIT,
         OVERLAY=(81:81,80,
                  JFY=(SHIFT=LEFT,
                  LEAD=C'"',
                  TRAIL=C'"',
                  LENGTH=82),
                  A-BINARY-ZERO,A-QUOTE,C1,C2,C3,A-QUOTE)),
  IFTHEN=(WHEN=INIT,
        OVERLAY=(81:81,165,SQZ=(SHIFT=LEFT,PAIR=QUOTE))),
  IFTHEN=(WHEN=INIT,
  BUILD=(+80,SUB,165,1,BI,EDIT=(TTT),1,80))
//SORTIN   DD *
"<SCTIESSTTLMCONDMODSTSADVC012>
"<SCTIESSTTLMCONDMODSTSADVC""">
<SCTIESSTTLMCONDMODSTSADVC>
<REQREF>
<ID>
</REQREF>
<LENGTH OF 79                                                                 > 
<LENGTH OF OF 80 BYTES                                                            >
<BLAH BLAH BLAH BAS>
<"LAH BLAH BLAH BAS>
<BLAH BLAH "LAH BAS>
<"BLAH BLAH BLAH BAS">
<DUMMY RECORD>
>
1>


Produces:
031"<SCTIESSTTLMCONDMODSTSADVC012>
031"<SCTIESSTTLMCONDMODSTSADVC""">
027<SCTIESSTTLMCONDMODSTSADVC>   
008<REQREF>                       
004<ID>                           
009</REQREF>                     
079<LENGTH OF 79                                                                 >
080<LENGTH OF 80 BYTES                                                            >
020<BLAH BLAH BLAH BAS>           
020<"LAH BLAH BLAH BAS>           
020<BLAH BLAH "LAH BAS>           
022<"BLAH BLAH BLAH BAS">         
014<DUMMY RECORD>                 
001>                             
0021>                             
000                               
BillyBoyo
Global moderator
 
Posts: 3804
Joined: Tue Jan 25, 2011 12:02 am
Has thanked: 22 times
Been thanked: 265 times

Re: Calculate exact length of PARSEd field

Postby mendijur » Tue Apr 09, 2013 5:19 pm

BillyBoyo wrote:Yes, if you need to know the lengths, AWK would be a good choice of language, or Rexx, perhaps. What are your data volumes?

As to "when" this may happen, if you can describe why you need the length, I'm sure Kolusu, a DFSORT developer at IBM, will see the post.


Data volumes are presumably small. Even if at production might be higher, the process I'm working on would only be run once, just to automate the initial load of data. Afterwards, the user could use the "modify" function/process to change the data loaded accordingly.

If you have a look at the example above, the length I need is the length of an XML tag. One of the pieces of input data is the XML tag. I wanted to abstract from counting and let the process calculate it automatically. In this particular case, I was interested in the length of only one of the parsed fields (in the example, the first, i.e., the length of the XML tag).

Having used AWK has been easier. The first chance was to use DFSORT's PARSE function, which I managed to code, but then I missed a "length" function.

Let me study your answer and Kolusu's.
Many many thanks for your support guys.
mendijur
 
Posts: 14
Joined: Tue Feb 14, 2012 3:14 am
Has thanked: 2 times
Been thanked: 0 time

Re: Calculate exact length of PARSEd field

Postby mendijur » Sun Apr 14, 2013 3:11 am

I find AWK, for this particular case, a more flexible approach than using DFSORT.

I'm very grateful to you guys for your support.
Kindest regards.
mendijur
 
Posts: 14
Joined: Tue Feb 14, 2012 3:14 am
Has thanked: 2 times
Been thanked: 0 time

Next

Return to DFSORT/ICETOOL/ICEGENER

 


  • Related topics
    Replies
    Views
    Last post