IBM Mainframe Forum

Posted: **Tue Sep 20, 2022 10:15 pm**

Hi Team

My requirement is to do the following:

1. Take one long sequential record from a mainframe file.
2. Read one character at a time, if > ASCII 127, ignore, else keep.
3. Extract each sequence of contiguous ASCII characters as a separate word.
4. Reconstruct all the words into a sentence with spaces in between and write the output.
5. The final output will be a dataset of lines with only ASCII words.

Is it possible to achieve this using DFSORT?

Thank You
Regards
Prasanna G.

Posted: **Tue Sep 20, 2022 11:26 pm**

IMHO, in that case REXX would be a better solution than any SORT tool.

SORT has a very limited set of options for working with single bytes.

In the time needed to build such a sophisticated solution for SORT, one could create 5 to 10 similar solutions in REXX.
IMHO.

Posted: **Wed Sep 21, 2022 4:58 am**

sergeyken wrote: IMHO, in that case REXX would be a better solution than any SORT tool.

SORT has a very limited set of options for working with single bytes.

In the time needed to build such a sophisticated solution for SORT, one could create 5 to 10 similar solutions in REXX.
IMHO.

Hi Sergeyken

The files that I am going to deal with will have millions of records. Hence I thought that processing them with REXX would be time-consuming and inefficient.
Could any REXX or SORT gurus please share their suggestions?

Thank You
Regards
Prasanna G.

Posted: **Wed Sep 21, 2022 1:39 pm**

Prasanna G wrote: Could any REXX or SORT gurus please share their suggestions?

Square pegs don't fit in round holes. Use the right tool: write a program in PL/I (or, if you're so inclined, COBOL).

And by the way, mainframe files are usually in EBCDIC, and they're called datasets!
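
To illustrate the EBCDIC point: in the common EBCDIC code pages the letters and digits sit at byte values of 0x81 and above (for example 'A' is 0xC1, 'b' is 0x82, '1' is 0xF1), while the blank is 0x40. A "keep everything up to 127" test applied to untranslated EBCDIC text would therefore throw away nearly all of the readable characters. Below is a minimal sketch of that effect; the helper names is_7bit and count_7bit are hypothetical, and the byte values are hard-coded (code page 037 values are assumed) so that it behaves the same on any platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when the byte would pass a "<= 127" test.
bool is_7bit(unsigned char byte) {
    return byte < 0x80;
}

// Count the bytes of a buffer that pass the 7-bit test.
std::size_t count_7bit(const unsigned char* data, std::size_t size) {
    std::size_t kept = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (is_7bit(data[i])) ++kept;
    return kept;
}

// The text "Ab1 " encoded in ASCII and in EBCDIC (code page 037 assumed).
const unsigned char ascii_text[]  = {0x41, 0x62, 0x31, 0x20};
const unsigned char ebcdic_text[] = {0xC1, 0x82, 0xF1, 0x40};

void ebcdic_demo() {
    assert(count_7bit(ascii_text, 4) == 4);   // every ASCII byte is kept
    assert(count_7bit(ebcdic_text, 4) == 1);  // only the EBCDIC blank (0x40) is kept
}
```

So before applying any "ASCII" filter it is worth checking whether the dataset really holds ASCII bytes or EBCDIC text.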

Posted: **Wed Sep 21, 2022 4:55 pm**

Prasanna G wrote: 2. Read one character at a time, if > ASCII 127, ignore, else keep.

Is it possible to achieve this using DFSORT?

This is definitely impossible in SORT.
It works with the records of datasets, never with individual characters in files.

Posted: **Wed Sep 21, 2022 4:57 pm**

Prasanna G wrote: The files that I am going to deal with will have millions of records. Hence I thought that processing them with REXX would be time-consuming and inefficient.

If so, I would recommend either C/C++, or Assembler.

Posted: **Wed Sep 21, 2022 5:37 pm**

Something like this, leaving out the unimportant details:


#include <fstream>
#include <iostream>
#include <string>
using namespace std;

ifstream infile;
ofstream outfile;
string inbuf, outline;
. . . . . . . . . .

while ( getline( infile, inbuf ) ) {
   outline = "";
   size_t len = inbuf.length();
   for ( size_t i = 0, j = 0; i < len; i = j ) {
      // skip the bytes above 127 (the cast matters where plain char is signed)
      while ( i < len && (unsigned char) inbuf[i] >= 0x80 ) i++;
      // find the end of the run of ASCII bytes that starts at i
      for ( j = i; j < len && (unsigned char) inbuf[j] < 0x80; ) j++;
      if ( i < len )
         outline += inbuf.substr( i, j - i ) + " ";
   }
   if ( outline.length() > 0 )
      outfile << outline << endl;
}

. . . . . . . . . .
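
For anyone who wants to try the idea outside of the file-handling code, here is a minimal, self-contained sketch of the requirement from the first post (keep the bytes below 0x80, treat each contiguous run as a word, put one blank after each word). The function name ascii_words and the small demo are hypothetical, and this is only one possible way to write it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the runs of 7-bit bytes found in `record`, each followed by one blank.
std::string ascii_words(const std::string& record) {
    std::string out;
    const std::size_t len = record.length();
    std::size_t i = 0;
    while (i < len) {
        // Skip bytes with the high bit set (cast: plain char may be signed).
        while (i < len && static_cast<unsigned char>(record[i]) >= 0x80) ++i;
        // Find the end of the run of 7-bit bytes that starts at i.
        std::size_t j = i;
        while (j < len && static_cast<unsigned char>(record[j]) < 0x80) ++j;
        if (j > i) {
            out.append(record, i, j - i);   // second count is a length, not an end index
            out += ' ';
        }
        i = j;                              // continue after the run just copied
    }
    return out;
}

// The literals are split so that the hex escapes do not swallow the letters after them.
void ascii_words_demo() {
    assert(ascii_words("abc\xE9\xE8" "def") == "abc def ");
    assert(ascii_words("\xFF\xFE").empty());
}
```

Note that a blank is itself an ASCII byte, so blanks already present in the record stay inside the extracted words.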
 

Posted: **Wed Sep 21, 2022 5:51 pm**

1) A note on substr(): its second argument is a length, not an end position, so the word must be taken as substr( i, j - i ), not substr( i, j - i + 1 ).

2) If performance is a real issue, I'd recommend switching to plain C, without the C++ classes.

Posted: **Wed Sep 21, 2022 6:17 pm**

Thanks, Sergeyken. I will try that out.

Posted: **Wed Sep 21, 2022 7:24 pm**

In C, it might look like this:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1000

FILE *infile, *outfile;
char inbuf[MAX_LINE], outline[MAX_LINE + 2];   // +2: room for the blank after the last word
. . . . . . . . . .

if ( NULL == (infile = fopen( "........", "r" ) ) ) exit(100);
if ( NULL == (outfile = fopen( ".......", "w" )) ) exit(200);

while ( fgets( inbuf, sizeof(inbuf), infile ) != NULL ) {
   inbuf[ strcspn( inbuf, "\n" ) ] = '\0';      // drop the EOL that fgets() keeps
   unsigned char *ichar = (unsigned char *) inbuf, *jchar;
   char *ochar = outline;
   outline[0] = '\0';
   while ( *ichar != '\0' ) {
      while ( *ichar >= 0x80 ) ichar++;         // skip bytes above 127; '\0' stops the scan
      for ( jchar = ichar; *jchar != '\0' && *jchar < 0x80; ) jchar++;
      if ( jchar > ichar ) {
         size_t word_size = jchar - ichar;
         memcpy( ochar, ichar, word_size );
         ochar += word_size;
         *ochar++ = ' ';
         *ochar = '\0';
      }
      ichar = jchar;                            // continue after the word just copied
   }
   if ( outline[0] != '\0' ) {
      fputs( outline, outfile );
      fputc( '\n', outfile );    // because fputs() does not add EOL after the line...
   }
}

fclose(infile);
fclose(outfile);

. . . . . . . . . .
 

Here, walking the buffers with char * pointers, rather than with line indexes or line-scanning functions such as strlen() and strcat(), can significantly improve performance.
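
The pointer-walking idea can also be tried in isolation. The sketch below is written C-style but compiled as C++; the function name ascii_words_ptr and the rule that the caller provides an output buffer of at least strlen(in) + 2 bytes are assumptions made for this example only. Each input byte is visited a constant number of times, and nothing rescans the line from its start the way repeated strlen() or strcat() calls would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies the runs of 7-bit bytes from the NUL-terminated `in` to `out`,
// with one blank after each run. `out` must hold at least strlen(in) + 2 bytes.
// Returns the number of bytes written, not counting the final NUL.
std::size_t ascii_words_ptr(const char* in, char* out) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(in);
    char* o = out;
    while (*p != '\0') {
        while (*p >= 0x80) ++p;                 // skip non-ASCII bytes; NUL stops the scan
        const unsigned char* q = p;
        while (*q != '\0' && *q < 0x80) ++q;    // q ends up one past the ASCII run
        if (q > p) {
            const std::size_t n = static_cast<std::size_t>(q - p);
            std::memcpy(o, p, n);               // copy the word in one call
            o += n;
            *o++ = ' ';
        }
        p = q;                                  // continue after the run just copied
    }
    *o = '\0';
    return static_cast<std::size_t>(o - out);
}

void ascii_words_ptr_demo() {
    char out[16];
    const std::size_t n = ascii_words_ptr("ab\xC3\xA9" "cd", out);
    assert(n == 6);
    assert(std::strcmp(out, "ab cd ") == 0);
}
```

Whether this is measurably faster than the std::string version depends on the compiler and on the data, so it is worth timing both on a realistic dataset.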

How to use ASCII condition in DFSORT
