Page 1 of 1

Removing spaces from within quoted fields

PostPosted: Fri May 06, 2016 2:47 pm
by LundyGruffalo
A source file is written in CSV format, as a VB file.

The problem is that they didn't "trim" the strings before putting them in quotes, which means the average record length is over 3,000, when it should be about 200.

e.g.
123,"ABC ",45.67,"DEF ",45,"GHI ",999
123,"ABC ",7689.99,"DEF ",234,"GHI ",999

Therefore there is a lot of wasted space in the file. The numeric values can be different lengths, which means not all fields are lined up.

What's the best way of stripping out the spaces from within the quotes?

Re: Removing spaces from within quoted fields

PostPosted: Fri May 06, 2016 3:31 pm
by NicC
As you can see, there are no extra spaces in your post. (You should have spotted this and corrected when you posted.)

However, if you had used the code tags then this is what you would have:
123,"ABC      ",45.67,"DEF                   ",45,"GHI      ",999
123,"ABC      ",7689.99,"DEF                   ",234,"GHI      ",999

Re: Removing spaces from within quoted fields

PostPosted: Fri May 06, 2016 5:38 pm
by BillyBoyo
Can there be valid embedded spaces? If no, then you can arrange to SQZ the record, and all spaces will disappear (with the record-length automatically adjusted).

If you can have valid embedded blanks the problem is different, so let us know.

Re: Removing spaces from within quoted fields

PostPosted: Mon May 09, 2016 11:04 am
by LundyGruffalo
There can be valid embedded spaces. The fields typically contain strings of four or five words, so the spaces between those words must stay. It is just the spaces between the last "valid" character and the closing quotes that need to be removed.

Re: Removing spaces from within quoted fields

PostPosted: Mon May 09, 2016 7:49 pm
by BillyBoyo
You'll need SQZ with MID=C' '. Note this will also squeeze multiple contiguous embedded blanks if you have those.

You'll need to extend your records to fixed-length first, with an X in the column after the last of the possible data (LRECL+1). Then you can specify the SQZ on all the data, and that will take care of the lengths. I assume you have variable-length records? You will need a FINDREP if you want to get rid of the single trailing blank, IN=C' "',OUT=C'"'.