How to use ASCII condition in DFSORT

Posted: Tue Sep 20, 2022 10:15 pm
by Prasanna G
Hi Team

My requirement is to do the following:

1. Take one long sequential record from a mainframe file.
2. Read one character at a time, if > ASCII 127, ignore, else keep.
3. Extract each sequence of contiguous ASCII characters as a separate word.
4. Reconstruct all the words into a sentence with spaces in between and write the output.
5. The final output will be a dataset of lines with only ASCII words.

Is it possible to achieve this using DFSORT?

Thank You
Regards
Prasanna G.

Re: How to use ASCII condition in DFSORT

Posted: Tue Sep 20, 2022 11:26 pm
by sergeyken
IMHO, in that case REXX would be a better solution than any SORT tool.

SORT has a very limited set of options for working with single bytes.

In the time needed to build such a sophisticated solution for SORT, one could create 5 to 10 similar solutions in REXX.
IMHO.

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 4:58 am
by Prasanna G
sergeyken wrote: IMHO, in that case REXX would be a better solution than any SORT tool.

SORT has a very limited set of options for working with single bytes.

In the time needed to build such a sophisticated solution for SORT, one could create 5 to 10 similar solutions in REXX.
IMHO.


Hi Sergeyken

The files that I am going to deal with will have millions of records, so I thought processing them with REXX would be time-consuming and inefficient.
Could any REXX or SORT gurus please provide suggestions?

Thank You
Regards
Prasanna G.

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 1:39 pm
by prino
Prasanna G wrote: Could any REXX or SORT gurus please provide suggestions?

Square pegs don't fit in round holes. Use the right tool: write a program in PL/I (or, if you're so inclined, COBOL).

And by the way mainframe files are usually in EBCDIC, and they're called datasets!
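
To illustrate why the encoding matters here, the following is a minimal C sketch, not part of the original posts. It assumes the byte values of EBCDIC code page 037; count_kept() and the two sample arrays are hypothetical names introduced only for this illustration. It counts how many bytes of the same text survive a "keep if <= 127" test in each encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many bytes of buf pass the "keep if <= 127" test. */
size_t count_kept(const unsigned char *buf, size_t len)
{
    size_t kept = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] < 0x80)
            kept++;
    return kept;
}

/* "Hello 42" encoded in ASCII: every byte is below 0x80. */
const unsigned char hello_ascii[8] =
    { 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x34, 0x32 };

/* The same text in EBCDIC (assumed code page 037): letters and digits
   are all 0x81 or higher, only the space (0x40) is below 0x80. */
const unsigned char hello_ebcdic[8] =
    { 0xC8, 0x85, 0x93, 0x93, 0x96, 0x40, 0xF4, 0xF2 };
```

With these samples, count_kept(hello_ascii, 8) is 8, while count_kept(hello_ebcdic, 8) is 1. So a "> 127" filter only makes sense if the dataset really holds ASCII bytes; applied to EBCDIC text it would discard the letters and digits.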

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 4:55 pm
by sergeyken
Prasanna G wrote: 2. Read one character at a time, if > ASCII 127, ignore, else keep.

Is it possible to achieve this using DFSORT?

This is definitely impossible in SORT.
It works with the records of datasets, never with individual characters in files.

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 4:57 pm
by sergeyken
Prasanna G wrote: The files that I am going to deal with will have millions of records, so I thought processing them with REXX would be time-consuming and inefficient.

If so, I would recommend either C/C++, or Assembler.

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 5:37 pm
by sergeyken
Something like this, without unimportant details

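#include <fstream>
#include <iostream>
#include <string>
using namespace std;
 
ifstream infile;
ofstream outfile;
string inrec, outrec;   // "inline" is a reserved word in C++, so it cannot be a variable name
. . . . . . . . . .

while ( getline( infile, inrec ) ) {
   outrec = "";
   size_t len = inrec.length();
   // i = j at the end of each pass moves on to the next word (otherwise the loop never ends)
   for ( size_t i = 0, j = 0; i < len; i = j ) {
      // skip bytes above 127; compare as unsigned because plain char may be signed
      while ( i < len && (unsigned char) inrec[i] >= 0x80 ) i++;
      // find the end of the run of bytes <= 127 (the bytes to keep) that starts at i
      for ( j = i; j < len && (unsigned char) inrec[j] < 0x80; ) j++;
      if ( i < len )
         outrec += inrec.substr( i, j - i ) + " ";   // second argument is a length
   }
   if ( outrec.length() > 0 )
      outfile << outrec << '\n';   // '\n' avoids the flush that endl forces on every record
}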

. . . . . . . . . .
 

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 5:51 pm
by sergeyken
1) My mistake: substr() takes a length as its second argument, so it should be j - i, not j - i + 1.


2) If performance is a real issue, I'd recommend switching to plain C, without C++ classes.

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 6:17 pm
by Prasanna G
Thanks, Sergeyken. I will try that out.

Re: How to use ASCII condition in DFSORT

Posted: Wed Sep 21, 2022 7:24 pm
by sergeyken
In C, it might look like this:

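#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1000
/* a byte belongs to a word if it is <= 127 and is not the EOL kept by fgets() */
#define KEEP(c) ( (c) < 0x80 && (c) != '\n' && (c) != '\0' )
 
FILE *infile, *outfile;
/* "inline" is a C99 keyword, hence inrec/outrec; a space follows every word,
   so the output can be one byte longer than the input */
unsigned char inrec[MAX_LINE], outrec[MAX_LINE + 1];
. . . . . . . . . .

if ( NULL == (infile = fopen( "........", "r" ) ) ) exit(100);
if ( NULL == (outfile = fopen( ".......", "w" )) ) exit(200);

/* test the result of fgets() itself: feof() becomes true only after a failed read */
while ( fgets( (char *) inrec, sizeof(inrec), infile ) != NULL ) {
   unsigned char *ichar, *jchar, *ochar;
   outrec[0] = '\0';
   /* ichar = jchar at the end of each pass moves on to the next word */
   for ( ichar = inrec, ochar = outrec; *ichar != '\0'; ichar = jchar ) {
      while ( *ichar != '\0' && !KEEP(*ichar) ) ichar++;   // skip bytes to ignore
      for ( jchar = ichar; KEEP(*jchar); ) jchar++;        // find the end of the word
      if ( *ichar != '\0' ) {
         size_t word_size = (size_t)(jchar - ichar);
         memcpy( ochar, ichar, word_size );
         ochar += word_size;
         *ochar++ = ' ';
         *ochar = '\0';
      }
   }
   if ( outrec[0] != '\0' ) {
      fputs( (char *) outrec, outfile );
      fputc( '\n', outfile );    // because fputs() does not add EOL after the line...
   }
}

fclose(infile);
fclose(outfile);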

. . . . . . . . . .
 


Here, walking the buffers with char * pointers, rather than with line indexes or line-scanning functions such as strlen() and strcat(), can significantly improve performance.
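
As a self-contained illustration of the pointer-only approach, here is a minimal sketch, assuming the input is a NUL-terminated buffer and that "keep" means bytes <= 127, as in the original requirement. The function name ascii_words() is hypothetical and not from the posts above. It is a variant rather than a copy of the snippet: it makes a single pass and writes a space only between words, so there is no trailing blank.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy every run of bytes <= 127 from the NUL-terminated string `in`
 * to `out`, separating the runs with a single space.
 * `out` must have room for as many bytes as `in` occupies, NUL included.
 * Returns the number of bytes written, not counting the final NUL.
 */
size_t ascii_words(const unsigned char *in, unsigned char *out)
{
    unsigned char *o = out;
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        while (*in >= 0x80) in++;           /* skip bytes above 127 */
        if (*in == '\0') break;             /* nothing left to keep */
        if (o != out) *o++ = ' ';           /* separator between words only */
        while (*in != '\0' && *in < 0x80)   /* copy one run of kept bytes */
            *o++ = *in++;
    }
    *o = '\0';
    return (size_t)(o - out);
}
```

For an input holding the bytes `ab`, 0xC3, 0xA9, `cd`, 0xFF, this writes `ab cd` and returns 5. The output never exceeds the input length, because each inserted space replaces at least one dropped byte.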