
Efficient Way of Splitting records

Posted: Fri Jul 30, 2010 4:12 pm
by cvrupesh
What is an efficient way to split the records of File1 into File2 and File3 based on the value of a certain field?

I've seen many posts on splitting records with DFSORT/ICETOOL, but I want something that performs well. Performance matters because File1 has around 60 million records, of which roughly 50 million belong in one output file and the rest in the other. The current job takes 10 minutes to do this.

My current TOOLIN statements and control cards are listed below:

//TOOLIN   DD *                                                         
  COPY FROM(IN1) TO(OUT1) USING(CTL1)                                   
  COPY FROM(IN1) TO(OUT2) USING(CTL2)                                   
/*                                                                     
//CTL1CNTL DD *                                                         
  INCLUDE COND=(34,3,CH,EQ,C'AA')                     
/*                                                                     
//CTL2CNTL DD *                                                         
  INCLUDE COND=(34,3,CH,EQ,C'BB')                     
/*                                                                     
//* 


As far as I can tell, this reads the input file twice: once for the first COPY and again for the second.
There should be some way to send records that meet a condition to one file and, in the same pass, the rest to the other file.

Re: Efficient Way of Splitting records

Posted: Fri Jul 30, 2010 4:43 pm
by Robert Sample
10 minutes to process 60 million records is pretty fast. I'm not sure there is much you could do to improve that time -- and even if you figure out a way to save 50% (not likely), you're only saving 5 minutes. Why not just leave this job alone and go find a REAL performance issue to work on?

Re: Efficient Way of Splitting records

Posted: Fri Jul 30, 2010 5:37 pm
by cvrupesh
Hi Robert,

My problem has two sides.

One is performance: yes, I would like to save even a couple of minutes, because my job is critical and every minute delays the downstream processes.

The other is that the code itself shows I am reading the 60 million records twice, so I am looking for a way to do the split in a single pass instead.

Hope you understand my need now.

Re: Efficient Way of Splitting records

Posted: Fri Jul 30, 2010 8:35 pm
by Frank Yaeger
You can do it in one pass with a DFSORT/ICETOOL job like this:

//S1    EXEC  PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN1 DD DSN=...  input file
//OUT1 DD DSN=...  output file1
//OUT2 DD DSN=...  output file2
//TOOLIN DD *
COPY FROM(IN1) USING(CTL1)
/*
//CTL1CNTL DD *
  OUTFIL FNAMES=OUT1,INCLUDE=(34,3,CH,EQ,C'AA')
  OUTFIL FNAMES=OUT2,INCLUDE=(34,3,CH,EQ,C'BB')
/*
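The original question also asked how to send "the rest" of the records to the other file. Within the same single pass, OUTFIL's SAVE parameter does that: a SAVE group receives every record not written by any other OUTFIL group. A minimal sketch of an alternative CTL1CNTL, assuming every record that is not 'AA' should go to OUT2 (unlike the explicit 'BB' test, SAVE also catches any other values in positions 34-36); the TOOLIN and DD statements stay as in the job above.

```jcl
//CTL1CNTL DD *
* Records with 'AA' in the 3-byte field at position 34 go to OUT1
  OUTFIL FNAMES=OUT1,INCLUDE=(34,3,CH,EQ,C'AA')
* SAVE writes every record not written by another OUTFIL group to OUT2
  OUTFIL FNAMES=OUT2,SAVE
/*
```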


If you're not familiar with DFSORT and DFSORT's ICETOOL, I'd suggest reading through "z/OS DFSORT: Getting Started". It's an excellent tutorial, with lots of examples, that will show you how to use DFSORT, DFSORT's ICETOOL and DFSORT Symbols. You can access it online, along with all of the other DFSORT books, from:

http://www.ibm.com/support/docview.wss? ... g3T7000080
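For a single COPY operation, ICETOOL is not strictly required: plain DFSORT performs the same one-pass split with OPTION COPY and the same OUTFIL statements, because each input record is read once and tested against every OUTFIL group. A sketch, assuming PGM=SORT invokes DFSORT at your site (PGM=ICEMAN names DFSORT directly); the DSN=... placeholders are left for your own data set names.

```jcl
//S1      EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=...  input file
//OUT1     DD DSN=...  output file1
//OUT2     DD DSN=...  output file2
//SYSIN    DD *
* Copy without sorting; OUTFIL routes each record in one pass
  OPTION COPY
  OUTFIL FNAMES=OUT1,INCLUDE=(34,3,CH,EQ,C'AA')
  OUTFIL FNAMES=OUT2,INCLUDE=(34,3,CH,EQ,C'BB')
/*
```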