input file delimited by comma

Posted: Tue Mar 15, 2011 7:30 pm
by Manju Venkat
I have an input file which is delimited by commas. Some fields in the input file are enclosed in double quotes, and those fields can themselves contain commas. Commas inside the double quotes must not be treated as delimiters. I also have to remove the double quotes in my output file. Please suggest some logic.
Input file format:
aaaaaa,bbbbb,"ccc,ccc",ddd,ee,"ff,ff",

The output file format should be as follows:

aaaaaa bbbbb ccc,ccc ddd ee ff,ff

Can anyone please suggest the code?

Re: input file delimited by comma

Posted: Tue Mar 15, 2011 7:44 pm
by prino
Scan the record and remember when you're in a quote-delimited field.

Re: input file delimited by comma

Posted: Tue Mar 15, 2011 7:49 pm
by BillyBoyo
Do you really want to remove all your field delimiters and leave it as one big string? If there is any possibility of blanks in any fields, you won't be able to tell what is what if you just remove the quotes and unquoted commas.

Re: input file delimited by comma

Posted: Tue Mar 15, 2011 7:54 pm
by Manju Venkat
Yes. I want to remove the commas, but commas within double quotes should not be removed. That is the requirement.

But the output file is fixed format. That means if a field is blank, spaces fill the whole length of the field, so I think there will be no problem in the output file.

Re: input file delimited by comma

Posted: Tue Mar 15, 2011 10:32 pm
by BillyBoyo
OK then. Seems odd. Any chance of getting the file created not as a "CSV" but as a plain text file, like the output you would otherwise have to create yourself? Then you'd just have the stuff you want, without the quotes to protect the embedded commas. Text, with the actual commas, and nothing else. Saves you writing the program.

Re: input file delimited by comma

Posted: Tue Mar 15, 2011 10:52 pm
by BillyBoyo
So, you want a loop. One subscript/index for the input, one for the output. A flag to ignore a comma (on to start with). A flag for opening or closing quote (set to expect an opening quote to start with). Look at each character on input. If it is a quote and the quote flag is expecting an opening quote: turn off the comma flag, set the quote flag to expect a closing quote, ignore the input byte. If it is a quote and the quote flag is expecting a closing quote: turn on the comma flag, set the quote flag to expect an opening quote, ignore the input byte. If it is a comma and the ignore-comma flag is on, ignore the input byte. Otherwise copy the byte to the output. You will, of course, have to determine the end of your record, but you have given no details on that so I assume you can handle it.
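
The loop described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the program name UNQUOTE, all the WS- names, the 80-byte record size and the hard-coded input length of 38 are made up for illustration, and the sample record is the one from the first post. Because the comma flag and the quote flag always change together, the sketch folds them into a single flag.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNQUOTE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Sample record from the first post; a real program would
      * read it from the input file and work out its length.
       01  WS-IN-REC            PIC X(80) VALUE
           'aaaaaa,bbbbb,"ccc,ccc",ddd,ee,"ff,ff",'.
       01  WS-OUT-REC           PIC X(80) VALUE SPACES.
      * Assumed: number of significant bytes in WS-IN-REC.
       01  WS-IN-LEN            PIC S9(4) COMP VALUE 38.
       01  WS-IN-IDX            PIC S9(4) COMP.
       01  WS-OUT-IDX           PIC S9(4) COMP VALUE 0.
       01  WS-CHAR              PIC X.
      * One flag replaces the separate comma and quote flags.
       01  WS-QUOTE-FLAG        PIC X VALUE 'N'.
           88  INSIDE-QUOTES    VALUE 'Y'.
           88  OUTSIDE-QUOTES   VALUE 'N'.
       PROCEDURE DIVISION.
           PERFORM VARYING WS-IN-IDX FROM 1 BY 1
                   UNTIL WS-IN-IDX > WS-IN-LEN
              MOVE WS-IN-REC(WS-IN-IDX:1) TO WS-CHAR
              EVALUATE TRUE
      *          Opening quote: drop it, start protecting commas
                 WHEN WS-CHAR = '"' AND OUTSIDE-QUOTES
                    SET INSIDE-QUOTES TO TRUE
      *          Closing quote: drop it, commas delimit again
                 WHEN WS-CHAR = '"'
                    SET OUTSIDE-QUOTES TO TRUE
      *          Delimiting comma: ignore the input byte
                 WHEN WS-CHAR = ',' AND OUTSIDE-QUOTES
                    CONTINUE
      *          Anything else, including a quoted comma: copy it
                 WHEN OTHER
                    ADD 1 TO WS-OUT-IDX
                    MOVE WS-CHAR TO WS-OUT-REC(WS-OUT-IDX:1)
              END-EVALUATE
           END-PERFORM
      * Assumes at least one byte was copied (WS-OUT-IDX > 0).
           DISPLAY WS-OUT-REC(1:WS-OUT-IDX)
           GOBACK.
```

With the sample record this should leave aaaaaabbbbbccc,cccdddeeff,ff in WS-OUT-REC, that is, one big string with only the quoted commas kept. To get a blank between fields, as in the requested output, the comma branch could move a space to the output instead of skipping the byte.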

Re: input file delimited by comma

Posted: Tue Mar 15, 2011 11:41 pm
by Quasar
Reminds me of a class on automata theory and Turing machines. My professors used to say "create an automaton to accept all strings that start with a, end with b and contain...". I surely miss those days, when I learnt about parsers.

Re: input file delimited by comma

Posted: Wed Mar 16, 2011 5:51 am
by BillyBoyo
Manju Venkat wrote: Yes. I want to remove the commas, but commas within double quotes should not be removed. That is the requirement.

But the output file is fixed format. That means if a field is blank, spaces fill the whole length of the field, so I think there will be no problem in the output file.


Reading this again, it is still not clear to me. "Yes" is the answer to my question, meaning you are not concerned about losing the distinction between a field and a string of characters. Then in the second paragraph, you talk about fields.

For fields, I would suggest a different method.

Anyway, the code outline as provided can still give you problems. Presumably this has come from some user application which can "export" a CSV. The problem is, if there is a human putting the data in, are they restricted to only messing you about with embedded commas? What about embedded quotes? Such a thing would mess up that code, whether or not it occurs in a field bounded by quotes.

Without full knowledge of the possible inputs, the program code is more complicated. If a quote can occur in the data (if it is user typing, it will occur unless prevented) you also have to know that only quotes in an expected delimiting position should be treated as delimiters. Except, what about a necessary quote in the first position? And then, if you are looking for quotes as delimiters (so, something like ", or ,") what if one of those combinations occurs in the text? So in the end, you have to start from the beginning and make fields, also start from the end of the line and make other fields, see if they are the same, and decide what to do if not.

As I have said, much simpler just to get a "text" file exported instead of the CSV, if at all possible. If not, you need a full specification of the possible data. Then maybe we can see again.

Re: input file delimited by comma

Posted: Wed Mar 16, 2011 9:03 am
by Quasar
Hi -

Here's a code snippet to help you.

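DATA DIVISION.
WORKING-STORAGE SECTION.
*> Sample record; note it uses single quotes where the
*> original file has double quotes.
01  WS-STRING                        PIC X(38)
                 VALUE "aaaaaa,bbbbb,'ccc,ccc',ddd,ee,'ff,ff',".
01  WS-I                             PIC S9(04) COMP-3
                                     VALUE 0.
01  WS-CHARACTER                     PIC X.
*> Flag starts off: we begin outside any quoted field.
01  WS-QUOTE-FLAG                    PIC X VALUE SPACE.
    88 QUOTE-ON                      VALUE "'".

PROCEDURE DIVISION.
    PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 38
       MOVE WS-STRING(WS-I:1) TO WS-CHARACTER

       IF QUOTE-ON
          *> Inside a quoted field: commas are kept as data
          IF WS-CHARACTER = "'"
             MOVE SPACES TO WS-CHARACTER
          END-IF
       ELSE
          *> Outside quotes: a comma is a delimiter, blank it
          IF WS-CHARACTER = ","
             MOVE SPACES TO WS-CHARACTER
          END-IF
          *> Opening quote: blank it and switch the flag on
          IF WS-CHARACTER = "'"
             MOVE SPACES TO WS-CHARACTER
             SET QUOTE-ON TO TRUE
          END-IF
       END-IF
       *> Store the (possibly blanked) character back
       MOVE WS-CHARACTER TO WS-STRING(WS-I:1)
    END-PERFORM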


Thank you very much.

Re: input file delimited by comma

Posted: Wed Mar 16, 2011 9:05 am
by Quasar
You also need to turn QUOTE-ON off when the second (closing) quote is encountered. I forgot that possibility. The first branch of the IF becomes:
IF QUOTE-ON
   IF WS-CHARACTER = "'"
      *> Closing quote: blank it and switch the flag off
      MOVE SPACES TO WS-CHARACTER
                     WS-QUOTE-FLAG
   END-IF