BUFNO Parm, COBOL & Performance



Support for OS/VS COBOL, VS COBOL II, COBOL for OS/390 & VM and Enterprise COBOL for z/OS

BUFNO Parm, COBOL & Performance

Postby ITConsultant » Wed Mar 10, 2010 5:03 am

I am trying to improve the performance of a COBOL (IBM Enterprise Ver 4.1) batch module, both for CPU and elapsed time. The module dynamically calls 13 submodules (yes, I will also look into switching to NODYNAM).

To start with, I am overriding the default of 5 buffers by explicitly coding the BUFNO parameter in the DCB for an input file (VB, sequential, average record length 4096, on 3390) and setting it to 31 (block size is 27998).
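For reference, the DD override looks something like this (dataset and DD names are placeholders):

```jcl
//* Hypothetical input DD: override the QSAM default of 5
//* buffers with 31 buffers of BLKSIZE (27998) bytes each.
//INFILE   DD  DSN=MY.INPUT.FILE,DISP=SHR,DCB=BUFNO=31
```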

Would MVS allocate these 31 buffers one time and keep "reusing" them, or would it "destroy" and then "reallocate" them, so that the system may wait on the allocation?

If I code this parameter on the input DD, do I also need to code it on the output DD?


I found a performance tuning white paper on the IBM website, "IBM Enterprise COBOL Version 3 Release 1", whose section on QSAM files suggests increasing the number of I/O buffers. My COBOL version is 4.1, but I did not see a similar manual for that version.
ITConsultant
 
Posts: 15
Joined: Wed Mar 10, 2010 4:28 am
Has thanked: 0 time
Been thanked: 0 time

Re: BUFNO Parm, COBOL & Performance

Postby ITConsultant » Wed Mar 10, 2010 5:36 am

I have changed the 31 to 30 because I think that would be 1 cylinder on 3390.
ITConsultant
 

Re: BUFNO Parm, COBOL & Performance

Postby Robert Sample » Wed Mar 10, 2010 5:58 am

The average record length is useless in this context -- you must know the block size (CI size for VSAM), as that is the size of a buffer. Using BUFNO on a VSAM file is also useless: VSAM won't use DCB=BUFNO, so you must specify AMP=('BUFND=??,BUFNI=??') for buffering.

The buffers are allocated from memory in the address space your program is running in and remain allocated for the length of the program execution. They are not "destroyed" -- after the last (the 31st in your example) is filled by reading from disk, the next read reuses the first buffer, then the second, and so forth.

Buffering is applied per DD statement, so anything you do to your input DD statement would need to be replicated on the output DD statement to reduce the overhead of the output file as well. Adding buffers may require a larger REGION parameter on your step or job, since each buffer added requires some memory; adding 26 buffers to both input and output files may take another megabyte or more.
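A sketch of what that looks like in the JCL (program, dataset, and DD names are hypothetical, and the output attributes are purely illustrative) -- the same BUFNO on both DDs, with REGION raised to cover the extra buffer space:

```jcl
//* 31 buffers x 27998 bytes is roughly 850K per file, so two
//* files buffered this way need close to 2M beyond the default.
//STEP1    EXEC PGM=MYPROG,REGION=32M
//INFILE   DD  DSN=MY.INPUT.FILE,DISP=SHR,DCB=BUFNO=31
//OUTFILE  DD  DSN=MY.OUTPUT.FILE,DISP=(NEW,CATLG,DELETE),
//             UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE),
//             DCB=(RECFM=VB,BLKSIZE=27998,BUFNO=31)
```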

A program may be CPU-bound (the bottleneck in processing is lack of CPU time), or I/O-bound (the bottleneck in processing is I/O). If a program is CPU-bound, reducing I/O to zero would have no impact on overall time required by the program -- only if the program is I/O-bound will using buffering reduce the time required by the program. Fortunately, these days most programs are I/O-bound so buffering is likely to help your job. However, your statement that there are 13 dynamic subprograms does indicate the potential for a CPU-bound process.
Robert Sample
Global moderator
 
Posts: 3376
Joined: Sat Dec 19, 2009 8:32 pm
Location: Dubuque, Iowa, USA
Has thanked: 1 time
Been thanked: 224 times

Re: BUFNO Parm, COBOL & Performance

Postby Robert Sample » Wed Mar 10, 2010 6:00 am

With modern machines, I usually see improvement in overall times going as high as 50 or 60 buffers per file. This is something that you can experiment with; getting up over 100 buffers per file seems to increase the time rather than decrease it (although I haven't measured this in the last few years). The difference between 31 and 30 buffers is not likely to be large.
Robert Sample

Re: BUFNO Parm, COBOL & Performance

Postby dick scherrer » Wed Mar 10, 2010 9:29 am

Hello,

One thing to consider is that while making the i/o "go faster" could reduce run time, doing significantly less i/o will usually provide more improvement. If there is some monster process that does billions of i/o and by using a better approach the i/o is cut to only millions, the process becomes a non-issue. With this much i/o reduction, the cpu usage would also be cut drastically.

Your situation sounds very much like a conversion task i was asked to look into several years ago. The Customer Service system was being completely re-designed and re-written (changing database software in addition to the logical redesign). The direction had been chosen to use "common edit/process subroutines" for all of the online and batch input processes of the new Production system.

Someone decided that the conversion process would use the same common subroutines. The "main" conversion program called many subprograms to provide various editing/updating. In small tests everything ran well. When the first full-volume conversion test was run over a weekend it was discovered/estimated that the full conversion might run over 4 days - nonstop. This was when i was asked to participate. . .

Many of the common subroutines performed very similar functions, but not exactly the same, so there was at least double the overhead. In order to be "fresh", each subroutine was loaded from the library for each iteration. Many validations were performed against various validation tables -- multiple times. And on and on and on. . .

Long story short, by removing most of the subroutine calls/loads, combining similar functions, loading reference tables into the main module, creating files to load rather than directly inserting each row, etc, the runtime was reduced to under 6 hours. . .

Unfortunately, this was not a magic bullet to accomplish major resource reduction with almost no work. It was considerable work to redo the process and an even bigger issue to convince the management that the "common routines" would not be used by the conversion process.
Hope this helps,
d.sch.
dick scherrer
Global moderator
 
Posts: 6304
Joined: Sat Jun 09, 2007 8:58 am
Has thanked: 3 times
Been thanked: 91 times

Re: BUFNO Parm, COBOL & Performance

Postby ITConsultant » Thu Mar 11, 2010 10:17 pm

Dick / Robert, thank you both for replying.

I have similar experiences of my own, though maybe not on the same scale. I was asked to look at a conversion process that was converting 15 records per hour with exclusive rights to the CPU (I guess that means non-swappable). I think they needed to convert something like 10 million records in 10 legs. The process had two inputs (an IMS database with one million records and a DB2 table with around 40 million rows). For every IMS record, it would open a cursor on DB2, process the record, and then close the cursor. It took about 3.5 minutes to open the cursor, so you do the math. I don't remember the exact number now, but I think after the fix it was running under 30 minutes per leg.

Anyhow, I tried BUFNO=60 with REGION set to 30M on the step (removed REGION from the job card) and saw a little improvement, but nothing significant (processing went from 810 records per second to the 850 range). Would there be anything in JES/SDSF to tell me some statistics on the I/O?

This is not a conversion job, but I am expecting a three-fold volume increase, so I want to reduce the ELAPSED & CPU times. Since I/O buffers did not show improvement, I am now switching to static calls. Currently all calls are made via working-storage definitions. The sub-modules, however, do not have CANCEL (they have GOBACK), so my question is: in the absence of CANCEL, would MVS swap the loads in and out? If so, there may be millions of load swaps, and I may see a very significant improvement.
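For the static-call switch, my understanding (a sketch only; the step name is illustrative) is to recompile with NODYNAM and link the subprograms into the load module -- noting that CALL identifier (a working-storage data-name) stays dynamic regardless of the option, so the calls would also need to change to CALL 'literal':

```jcl
//* Hypothetical compile step: under NODYNAM, CALL 'literal'
//* is resolved statically by the binder instead of being
//* loaded at run time. CALL identifier remains dynamic.
//COBOL    EXEC PGM=IGYCRCTL,PARM='NODYNAM'
```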

Looking forward to your comments.
ITConsultant
 

Re: BUFNO Parm, COBOL & Performance

Postby Robert Sample » Fri Mar 12, 2010 12:02 am

It is sounding like your process is CPU-bound, not I/O-bound. Changing from dynamic to static calls will have some impact, but probably not a huge amount in the absence of CANCEL statements. MVS loads a dynamically called routine into memory upon first call, and unless there is a CANCEL, the routine remains in memory until end of step. If you have a run-time analyzer such as Compuware's STROBE product, you can tell a lot about the bottleneck; otherwise, you might be able to find some of the indicators in the SMF data for the job. In a typical shop, there's not much data about I/O in the job output.

If the shop has Mainview or Omegamon, you could start monitoring the WLM service class, start the job, and watch the service class to see the delay reasons. Mainview supports digging into a job for some details about what is causing waits (delays), and I assume Omegamon and other similar tools have similar features.
Robert Sample

Re: BUFNO Parm, COBOL & Performance

Postby ITConsultant » Fri Mar 12, 2010 12:07 am

Thanks Robert. I will give Omegamon a try.
ITConsultant
 

Re: BUFNO Parm, COBOL & Performance

Postby dick scherrer » Fri Mar 12, 2010 1:11 am

Hello,

Suggest you "play computer" and map the various parts of the process, or have some tool identify the heavy-usage routines. Identify things that are done repeatedly, with the goal of doing them less often or maybe even only once at the start of the run. You might add some counters and displays to show how many rows/records are processed in each part of the process each time through.

Doing "things" more efficiently can improve the run. Eliminating "things" can drastically improve the run.

Good luck :)

d
dick scherrer

