Formatting free form data

Posted: Tue Feb 28, 2017 7:12 pm
by Aki88
Hello,

I am trying to format data that is currently spread across multiple rows in a dataset and generate a simplified single-line CSV output.

For example, one of the formatting requests concerns the sample IEBPTPCH output given below; this is a portion of the merged dump of all Connect-Direct control-cards on a given LPAR:

VMEMBER NAME  AAAAAAAA
V SUBMIT PROC=AAAAAAAXW &PNODE=YYY                             -
V                      &SNODE=MMM                        -
V            &PATHNAME='C:\TEST'      -
V                      &XLATE=YES                              -
V                      &DATATYPE=TEXT                          -
V                      &D1='TEST(+1)'     -
V                      &DISP=(NEW,CATLG,DELETE)                -
V                      &LRECL=60                               -
V                      &RECFM=FB                               -
V                      &SPACE=CYL                              -
V                      &PQTY=10                                -
V                      &SQTY=10
VMEMBER NAME  AAAAAAAXW
VAAAAAAAXD PROCESS PNODE=&PNODE                                    -
V                 SNODE=&SNODE
VSTEP1    COPY    FROM (SNODE DSN=&PATHNAME DISP=SHR              -
V                 SYSOPTS="XLATE(&XLATE) DATATYPE(&DATATYPE)")    -
V                 TO  (PNODE DSN=&D1 DISP=&DISP                   -
V                 DCB=(LRECL=&LRECL,RECFM=&RECFM,BLKSIZE=0)       -
V                 SPACE=(CYL,(&PQTY,&SQTY),RLSE))                 -
V                 CKPT=20K                                        -
V                 COMPRESS EXTENDED
V        IF (STEP1 = 0 ) THEN
VSTEP2    RUN JOB (DSN=WINNT)                                     -
V                 SYSOPTS=\'PGM(C:\\BIN\\TEST.BAT) \        -
V                         \ARGS(TEST.TEST)'\                -
V                 SNODE
V         EXIT
V        EIF
VMEMBER NAME  BBBBBBD
VBBBBBBD PROCESS PNODE=&PNODE                                    -
V                 SNODE=&SNODE
VSTEP1    COPY    FROM (SNODE DSN=&PATHNAME DISP=SHR              -
V                 SYSOPTS="XLATE(&XLATE) DATATYPE(&DATATYPE)")    -
V                 TO  (PNODE DSN=&D1 DISP=&DISP                   -
V                 DCB=(LRECL=&LRECL,RECFM=&RECFM,BLKSIZE=0)       -
V                 SPACE=(CYL,(&PQTY,&SQTY),RLSE))                 -
V                 CKPT=20K                                        -
V                 COMPRESS EXTENDED
V        IF (STEP1 = 0 ) THEN
VSTEP2    RUN JOB (DSN=WINNT)                                     -
V                 SYSOPTS=\'PGM(C:\\BIN\\TEST.BAT) \        -
V                         \ARGS(&SARG)'\               -
V                 SNODE
V         EXIT
V        EIF
VMEMBER NAME  BBBBBBP
V SUBMIT PROC=BBBBBBW &PNODE=YYY                             -
V                      &SNODE=MMM                        -
V         &PATHNAME='C:\TEST'         -
V                      &XLATE=YES                              -
V                      &DATATYPE=TEXT                          -
V                      &D1='TEST(+1)'     -
V                      &DISP=(NEW,CATLG,DELETE)                -
V                      &LRECL=60                               -
V                      &RECFM=FB                               -
V                      &SPACE=CYL                              -
V                      &PQTY=10                                -
V                      &SQTY=10
VMEMBER NAME  BBBBBBW
VBBBBBBD PROCESS PNODE=&PNODE                                    -
V                 SNODE=&SNODE
VSTEP1    COPY    FROM (SNODE DSN=&PATHNAME DISP=SHR              -
V                 SYSOPTS="XLATE(&XLATE) DATATYPE(&DATATYPE)")    -
V                 TO  (PNODE DSN=&D1 DISP=&DISP                   -
V                 DCB=(LRECL=&LRECL,RECFM=&RECFM,BLKSIZE=0)       -
V                 SPACE=(CYL,(&PQTY,&SQTY),RLSE))                 -
V                 CKPT=20K                                        -
V                 COMPRESS EXTENDED
V        IF (STEP1 = 0 ) THEN
VSTEP2    RUN JOB (DSN=WINNT)                                     -
V                 SYSOPTS=\'PGM(C:\\BIN\\TEST.BAT) \        -
V                         \ARGS(TEST.TEST)'\                -
V                 SNODE
V         EXIT
V        EIF
 


The requirement is to generate a single formatted record of the resolved values from this data. 'Resolved' here means any data which is not passed as a symbolic substitution; for example, &D1 is resolved as 'TEST(+1)' and &PNODE is resolved as 'YYY'.

Similarly, there are other IEBPTPCH outputs which are again spread across multiple lines, and a single formatted output has to be generated for each.

All these are segregated on the basis of the 'MEMBER NAME'.

A pseudo-algorithm with *SORT would be very helpful for me; I can convert it into code.

My current thought process is to first get all the lines of data falling under a group into one single row, after which I can PARSE it and format accordingly.
Now, getting it onto a single line is another tough cookie.

Thank you.

Re: Formatting free form data

Posted: Tue Feb 28, 2017 8:10 pm
by enrico-sorichetti
unfortunately Your explanation is clear as mud :(

post a SHORT sample of the input and the relative expected output

Re: Formatting free form data

Posted: Tue Feb 28, 2017 8:54 pm
by Aki88
Hello Mr. Sorichetti,

My sincere apologies, on re-reading I realized that the description is indeed not clear.

Kindly refer to the data in the original post; I am pointing back to it because it mixes a few of the permutations in which the actual data can occur.

The requirement-
a. This input is IEBPTPCH output, hence the data has to be grouped on the basis of MEMBER names
b. Refer to the below set of data from the first MEMBER group (of the sample input in the earlier post):


VMEMBER NAME  AAAAAAAA
V SUBMIT PROC=AAAAAAAXW &PNODE=YYY                             -
V                      &SNODE=MMM                        -
V            &PATHNAME='C:\TEST'      -
V                      &XLATE=YES                              -
V                      &DATATYPE=TEXT                          -
V                      &D1='TEST(+1)'     -
V                      &DISP=(NEW,CATLG,DELETE)                -
V                      &LRECL=60                               -
V                      &RECFM=FB                               -
V                      &SPACE=CYL                              -
V                      &PQTY=10                                -
V                      &SQTY=10
 


If this type of data is encountered, then the output should appear as follows (the headings have been added only for illustration):


MEMBER NAME | PROC | PNODE | SNODE | PATHNAME | D1 |
AAAAAAAA | AAAAAAAXW | YYY | MMM | C:\TEST | TEST(+1) |
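For readers who want to see the mapping in (b) spelled out, here is a minimal Python sketch, not DFSORT. It assumes the continued SUBMIT lines have already been joined into one string with the trailing "-" removed; the helper name submit_row and the WANTED list are made up for illustration and simply mirror the headings above.

```python
import re

# Hypothetical list of keywords to report, taken from the headings in the post.
WANTED = ["PROC", "PNODE", "SNODE", "PATHNAME", "D1"]

def submit_row(member, statement):
    """Build 'MEMBER | PROC | PNODE | ... |' from one joined SUBMIT statement."""
    values = []
    for key in WANTED:
        # Optional '&' prefix, then KEY=, then either a quoted string
        # or a run of non-blank characters.
        pattern = r"(?<![A-Z0-9])&?" + key + r"=('[^']*'|\S+)"
        found = re.search(pattern, statement)
        # A keyword that is absent is reported as an empty column.
        values.append(found.group(1).strip("'") if found else "")
    return " | ".join([member] + values) + " |"

if __name__ == "__main__":
    joined = (r"SUBMIT PROC=AAAAAAAXW &PNODE=YYY &SNODE=MMM "
              r"&PATHNAME='C:\TEST' &XLATE=YES &D1='TEST(+1)'")
    print(submit_row("AAAAAAAA", joined))
```

The quotes around PATHNAME and D1 are stripped here because the expected output above shows them without quotes.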
 


c. If the data is of the form:


VMEMBER NAME  AAAAAAAXW
VAAAAAAAXD PROCESS PNODE=&PNODE                                    -
V                 SNODE=&SNODE
VSTEP1    COPY    FROM (SNODE DSN=&PATHNAME DISP=SHR              -
V                 SYSOPTS="XLATE(&XLATE) DATATYPE(&DATATYPE)")    -
V                 TO  (PNODE DSN=&D1 DISP=&DISP                   -
V                 DCB=(LRECL=&LRECL,RECFM=&RECFM,BLKSIZE=0)       -
V                 SPACE=(CYL,(&PQTY,&SQTY),RLSE))                 -
V                 CKPT=20K                                        -
V                 COMPRESS EXTENDED
V        IF (STEP1 = 0 ) THEN
VSTEP2    RUN JOB (DSN=WINNT)                                     -
V                 SYSOPTS=\'PGM(C:\\BIN\\TEST.BAT) \        -
V                         \ARGS(TEST.TEST)'\                -
V                 SNODE
V         EXIT
V        EIF
 


Then only the below portion is to be extracted:


MEMBER NAME | DSN | SYSOPTS |
AAAAAAAXW | WINNT | \'PGM(C:\\BIN\\TEST.BAT) \\ARGS(TEST.TEST)'\ |
 



My thought process was to collate all the lines falling under any given 'MEMBER NAME' onto a single line, and then PARSE this data to extract the required details using INREC/IFTHEN. I was looking for an alternative approach or an algorithm which could achieve this output.
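The collate-then-parse idea can be sketched outside of SORT as follows. This is Python, not a SORT card, and it rests on a few assumptions: column 1 is the print control character ("V" in the sample), a member starts at a line beginning "MEMBER NAME", and a trailing "-" always means continuation. The function name logical_records is hypothetical.

```python
def logical_records(lines):
    """Yield (member, statement) pairs from IEBPTPCH-style print lines."""
    member, parts = None, []
    for raw in lines:
        text = raw[1:].rstrip()            # drop the control byte in column 1
        if not text:
            continue                       # ignore blank lines
        if text.startswith("MEMBER NAME"):
            member, parts = text.split()[-1], []
            continue
        continued = text.endswith("-")     # assumed continuation marker
        if continued:
            text = text[:-1]
        parts.append(text.strip())
        if not continued:                  # statement complete: emit one line
            yield member, " ".join(p for p in parts if p)
            parts = []
    if parts:                              # input ended in mid-continuation
        yield member, " ".join(p for p in parts if p)

if __name__ == "__main__":
    sample = [
        "VMEMBER NAME  AAAAAAAA",
        "V SUBMIT PROC=AAAAAAAXW &PNODE=YYY                             -",
        "V                      &SNODE=MMM                        -",
        "V                      &SQTY=10",
    ]
    for member, statement in logical_records(sample):
        print(member, "|", statement)
```

A value that legitimately ends in "-" would be misread as a continuation by this sketch; the real data would need checking for that case.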

Hope I was able to put the requirement across in a better form this time around.
The solution to this will form the base logic for other SORT cards which will be developed to format similar IEBPTPCH outputs (the data arrangement in all these datasets is unformatted).

Thank you.

Re: Formatting free form data

Posted: Sat Mar 04, 2017 3:45 am
by Aki88
Hello,

For the moment, I have coded a COBOL program; it enabled me to play around with the free-form data as and how I wished, without being restrained by my lack of knowledge of the tool.

Having said that, I would definitely like to achieve it using *SORT (DFSORT or SYNCSORT), because I am aware that this is achievable with a bit of code; only a bit of clarity on the algorithm is required: how to handle multiple rows of data per record, and then how to filter only the details needed from them.

Will post the solution, should I hit gold.

Thank you.

Re: Formatting free form data

Posted: Mon Mar 06, 2017 4:05 am
by BillyBoyo
You only need six pieces of data from one type, and three from the other.

I'd work on the three first.

Identify something which gives you the first line from where you want to extract the DSN and something which indicates the "end" of that.

The member-name you just identify and PUSH with WHEN=GROUP.

You also PUSH the data you need.

When you reach the "end" you identified, you can pick up the information you have PUSHed.

The split-across-a-line is a minor thing, as you have a "-" showing connectivity. You'll probably want a SEQ on the group for that, and then include that in a further part of the condition for another WHEN=GROUP. The condition for GROUP can be complex, with AND and OR, and you can define multiple GROUPs and you can intersperse with WHEN=INIT if needed. You'll need a bit of a SQZ and removal of "-" at some point, but get all the data you want first, then do the formatting of it.
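As a mental model only, the effect of WHEN=GROUP with PUSH, ID and SEQ can be imitated in a few lines of Python. This is a rough emulation, not DFSORT: it models BEGIN only (no END or RECORDS), and the names when_group, begins and push are invented for the sketch.

```python
def when_group(records, begins, push):
    """Rough emulation of IFTHEN=(WHEN=GROUP,BEGIN=...,PUSH=...).

    begins(rec) is True on the record that starts a new group.
    push(rec) picks the value to carry from that first record onto
    every following record of the same group.
    Yields (group_id, seq, pushed_value, rec); records seen before the
    first group starts get id 0, seq 0 and no pushed value.
    """
    group_id, seq, pushed = 0, 0, None
    for rec in records:
        if begins(rec):
            group_id, seq, pushed = group_id + 1, 0, push(rec)
        if group_id:
            seq += 1                      # like SEQ=n: position in the group
        yield group_id, seq, pushed, rec

if __name__ == "__main__":
    data = ["MEMBER NAME  A", " x -", " y", "MEMBER NAME  B", " z"]
    for row in when_group(data,
                          begins=lambda r: r.startswith("MEMBER NAME"),
                          push=lambda r: r.split()[-1]):
        print(row)
```

The point of the model is that the member name is captured once and then rides along on every record of the group, so a later condition can test it together with the sequence number.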

Don't be worried about developing it in multiple steps. They can be combined later. You don't have to work on everything at once, break it down.

Re: Formatting free form data

Posted: Mon Mar 06, 2017 3:40 pm
by Aki88
Thank you Billy for looking this one up.

BillyBoyo wrote:...The split-across-a-line is a minor thing, as you have a "-" showing connectivity. ....


The splits were what was giving me a headache, because neither the position at which my data tags occur nor the continuation is fixed. The 'how to hold the data in sequence with DFSORT' part, for instance like a working-storage variable in COBOL, took me a while to grasp. The only way I could think of was to create an extended record using 'GROUP' every time a piece of data that I needed was encountered; the only way around that was PARSE; but HOW to club it all together had me lost.

BillyBoyo wrote:... but get all the data you want first, then do the formatting of it. ...

Don't be worried about developing it in multiple steps. They can be combined later. You don't have to work on everything at once, break it down.


And now it makes more sense: break it down and create a multi-stage solution first. Once a solution is visible, it can be tuned further, since we'll have a framework to work on; I will work on this.

Thank you once again.

Re: Formatting free form data

Posted: Mon Mar 06, 2017 3:55 pm
by BillyBoyo
The "-" as continuation must always be the last thing on the line. You can make use of that fact to get it into a fixed position. JFY with SHIFT=RIGHT and "bang", the last character of the (logical) line is now the last character of the (physical) line, as you have got rid of the pesky trailing blanks.

Doesn't have to be on the line, and doesn't have to be a full line in an extension, if you have a look at LENGTH.
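To picture what the right-justify trick buys, here is a small Python imitation (not DFSORT). It assumes a 72-byte data area, as used elsewhere in this thread; shift_right and is_continued are hypothetical names.

```python
def shift_right(line, width=72):
    """Imitate right-justification: drop trailing blanks, pad on the left."""
    return line[:width].rstrip().rjust(width)

def is_continued(line, width=72):
    """After shifting, the continuation mark sits in the last column."""
    return shift_right(line, width)[-1:] == "-"

if __name__ == "__main__":
    for text in ("V   &SNODE=MMM        -     ", "V   &SQTY=10"):
        print(repr(shift_right(text, 32)), is_continued(text, 32))
```

Once the last non-blank character is always in a known column, a plain fixed-position test can replace a substring search for "-" anywhere in the line.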

Re: Formatting free form data

Posted: Wed Mar 15, 2017 3:37 pm
by Aki88
Thank you Billy for the pointers; I worked on this one in my spare time, and the SORT solution is ready now :D. It took me longer than I'd thought to understand the data pattern; the field-names were not consistent across all records, and there were a few other gaps too. Not as simple and straightforward as I'd thought in my first post. BUT, it is done now.

Logic employed was:
a. First group the data on the basis of IEBPTPCH 'MEMBER NAME'
b. Now sub-group each member on the basis of the PROCs that are employed within it; these can be 'n' in number (1 <= n <= 20)
c. Once the data is grouped on the basis of members and PROCs, bring the complete sub-group PROC data onto one line, so that elements can be parsed individually to cherry-pick.
d. Cherry-pick, format and write to output.

The core SORT-card (minus the code piece for decoration):


Group the data:

//SYSIN    DD *                                                    
 INREC IFTHEN=(WHEN=INIT,BUILD=(2000:1,72)),                        
       IFTHEN=(WHEN=GROUP,BEGIN=(2000,72,SS,EQ,C'MEMBER'),          
                          PUSH=(2076:2014,8,2085:ID=5)),            
       IFTHEN=(WHEN=GROUP,BEGIN=(2000,72,SS,EQ,C'SUBMIT PROC='),    
                          END=(2000,72,SS,NE,C'-'),                
                          PUSH=(2091:ID=5,2097:SEQ=3)),            
*                                                                  
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,001),                
                       PUSH=(1:2076,9,2001,72)),                    
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,002),                
                     END=(2000,72,SS,NE,C'-'),PUSH=(00081:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,003),                
                     END=(2000,72,SS,NE,C'-'),PUSH=(00161:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,004),                
                     END=(2000,72,SS,NE,C'-'),PUSH=(00241:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,005),                
                     END=(2000,72,SS,NE,C'-'),PUSH=(00321:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,006),                
                     END=(2000,72,SS,NE,C'-'),PUSH=(00401:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,007),                
                     END=(2000,72,SS,NE,C'-'),PUSH=(00481:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,008),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(00561:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,009),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(00641:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,010),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(00721:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,011),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(00801:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,012),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(00881:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,013),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(00961:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,014),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(01041:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,015),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(01121:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,016),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(01201:2001,72)),
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,017),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(01281:2001,72)),  
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,018),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(01361:2001,72)),  
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,019),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(01441:2001,72)),  
       IFTHEN=(WHEN=GROUP,BEGIN=(2097,3,ZD,EQ,020),                  
                     END=(2000,72,SS,NE,C'-'),PUSH=(01521:2001,72)),  
       IFTHEN=(WHEN=(2097,3,ZD,EQ,001),                              
               OVERLAY=(81:1520X))                                    
*                                                                    
 OUTFIL INCLUDE=(2091,2,CH,NE,C'  '),                                
        REMOVECC,NODETAIL,BUILD=(1,1600),                            
 SECTIONS=(2091,5,                                                    
          TRAILER3=(1,72,                                            
                    81,72,                                            
                    161,72,                                          
                    241,72,                                          
                    321,72,                                          
                    401,72,                                          
                    481,72,                                          
                    561,72,  
                    641,72,  
                    721,72,  
                    801,72,  
                    881,72,  
                    961,72,  
                   1041,72,  
                   1121,72,  
                   1201,72,  
                   1281,72,  
                   1361,72,  
                   1441,72,  
                   1521,72))  
*                            
 SORT FIELDS=COPY            
/*                            
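The column numbers in the card above follow a simple pattern: line n of a SUBMIT group lands at column 80*(n-1)+1, which gives 1, 81, 161 and so on up to 1521 for line 20. A minimal Python sketch of that layout follows (not DFSORT). It simplifies slot 1, where the card also places the pushed member name in front of the data, and the names slot_column and flatten are made up here.

```python
SLOT = 80        # spacing between PUSH targets in the card (1, 81, 161, ...)
WIDTH = 72       # bytes copied from each input line
MAX_LINES = 20   # the card provides 20 slots

def slot_column(seq):
    """1-based output column for the seq-th line of a group."""
    if not 1 <= seq <= MAX_LINES:
        raise ValueError("group has more lines than the card has slots")
    return SLOT * (seq - 1) + 1

def flatten(lines):
    """Lay the lines of one group side by side in a 1600-byte record."""
    record = [" "] * (SLOT * MAX_LINES)
    for seq, line in enumerate(lines, start=1):
        start = slot_column(seq) - 1                 # 0-based index
        record[start:start + WIDTH] = line[:WIDTH].ljust(WIDTH)
    return "".join(record)

if __name__ == "__main__":
    print([slot_column(n) for n in (1, 2, 3, 20)])
    print(len(flatten(["SUBMIT PROC=X -", "&PNODE=YYY"])))
```

A group longer than 20 lines raises an error in the sketch; the card as posted has no slot for such lines.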

Now that the data is in a single row, start parsing and formatting it:

//TOOLIN   DD *                                              
 COPY FROM(IN1) TO(IN2) USING(CTL1)                          
 COPY FROM(IN2) TO(OUT) USING(CTL2)                          
/*                                                          
//CTL1CNTL DD *                                              
 INREC IFTHEN=(WHEN=INIT,                                    
               FINDREP=(IN=C'DSN2',OUT=C'PATHNAME')),        
       IFTHEN=(WHEN=INIT,                                    
               FINDREP=(IN=C'DSN1',OUT=C'D1')),              
       IFTHEN=(WHEN=(1,1000,SS,EQ,C'||'),                    
               FINDREP=(IN=(C'|',C'-',C' '),OUT=C'',        
                            STARTPOS=146,ENDPOS=265))        
 OUTFIL REMOVECC,                                            
        IFTHEN=(WHEN=INIT,                                  
              PARSE=(%00=(ENDBEFR=C' ',FIXLEN=8))),          
        IFTHEN=(WHEN=INIT,                                  
              PARSE=(%01=(STARTAFT=C'PROC=',                
                     ENDBEFR=C' ',FIXLEN=8))),              
        IFTHEN=(WHEN=INIT,                                  
              PARSE=(%02=(STARTAFT=C'&PNODE=',              
                     ENDBEFR=C' ',FIXLEN=8))),          
        IFTHEN=(WHEN=INIT,                              
              PARSE=(%03=(STARTAFT=C'&SNODE=',          
                     ENDBEFR=C' ',FIXLEN=12))),          
        IFTHEN=(WHEN=INIT,                              
              PARSE=(%04=(STARTAFT=C'&D1=',              
                     ENDBEFR=C' ',FIXLEN=46))),          
        IFTHEN=(WHEN=INIT,                              
              PARSE=(%05=(STARTAFT=C'&PATHNAME=',        
                     ENDBEFR=C'  ',FIXLEN=90)),          
*                                                        
                     BUILD=(%00,C' | ',                  
                            %01,C' | ',                  
                            %02,C' | ',                  
                            %03,C' | ',                  
                            %04,C' | ',                  
                            %05))                        
/*                                                      
//CTL2CNTL DD *                                          
 INREC IFTHEN=(WHEN=INIT,                                
               FINDREP=(IN=(C'''',C'-'),OUT=C'',  
                        STARTPOS=90))            
/*  
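For anyone reading the PARSE clauses in CTL1 without a SORT manual at hand, the STARTAFT/ENDBEFR/FIXLEN combination can be approximated in Python as below. This is an approximation and not DFSORT: it searches each field from the start of the record and returns blanks when the STARTAFT string is missing, both of which are assumptions about how the card behaves. The name parse_field is hypothetical.

```python
def parse_field(record, startaft=None, endbefr=" ", fixlen=8):
    """Approximate %nn=(STARTAFT=...,ENDBEFR=...,FIXLEN=...) for one field."""
    start = 0
    if startaft is not None:
        pos = record.find(startaft)
        if pos < 0:
            return " " * fixlen            # string not found: blank field
        start = pos + len(startaft)
    end = record.find(endbefr, start)
    if end < 0:
        end = len(record)                  # no delimiter: take the rest
    # FIXLEN truncates long values and pads short ones with blanks.
    return record[start:end][:fixlen].ljust(fixlen)

if __name__ == "__main__":
    rec = "AAAAAAAA SUBMIT PROC=AAAAAAAXW &PNODE=YYY &SNODE=MMM"
    fields = [
        parse_field(rec),                               # member name
        parse_field(rec, "PROC="),
        parse_field(rec, "&PNODE="),
        parse_field(rec, "&SNODE=", fixlen=12),
    ]
    print(" | ".join(fields))
```

Note that with FIXLEN=8 a nine-character value such as AAAAAAAXW from the sample is cut to eight characters.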


PS: COBOL was simpler to code, but this was fun ;)