
Reducing I/O amount

Posted: Mon Aug 01, 2011 10:53 pm
by sensuixel
Hello,

I've written a little program that just counts the number of records in a sequential file.

Basically, it opens the file, issues the GET macro in a loop and uses AR to count, and then closes the file.
It works fine.

Today I tried to do the same task using ICETOOL. Fortunately I get the same result, but the EXCP counts are
not even close:

using my code ==> 16,193 EXCPs on a file of 5,000,000 records of length 14
using ICETOOL ==> 293 EXCPs on the same file

The gap is tremendous, so I'm looking for a way to at least "slightly" narrow it, but I don't really know where to start.
Buffer pool, GETMAIN...?

I started reading http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DGT2D470/CCONTENTS?DT=20080602122917
and http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DGT1D405/3.3.10?DT=19990106110554

Any suggestion is welcome ;)

Re: Reducing I/O amount

Posted: Mon Aug 01, 2011 11:05 pm
by Robert Sample
sensuixel wrote:using my code ==> 16,193 EXCPs on a file of 5,000,000 records of length 14
using ICETOOL ==> 293 EXCPs on the same file
First, why are you giving LRECL when it is not needed? What's important in looking at I/O (EXCP, etc) is the physical block size, not the record length -- the system reads and writes BLOCKS, not RECORDS.

Next, what does your JCL look like to access the file? JCL parameters can make a big difference in EXCP counts.

Re: Reducing I/O amount

Posted: Tue Aug 02, 2011 12:28 am
by steve-myers
There are some tricks an Assembler programmer can use to reduce the number of real EXCPs needed to read and write datasets. The DFSORT developers know all these tricks; most likely they are not using GET at all, but EXCP, with channel programs that read as much as one cylinder per EXCP. You can do this yourself, but it's not quite as easy to do as it is for me to write about.

Re: Reducing I/O amount

Posted: Tue Aug 02, 2011 10:00 am
by steve-myers
Robert - While more complete information would have been useful, just knowing the LRECL lets us estimate the BLKSIZE: 5000000/16192 is about 308 records per block, so the BLKSIZE is probably 4312 (308 x 14). From this we could work out the dataset size, though quite honestly I'm not terribly interested. With the dataset size we might be able to deduce exactly what DFSORT is actually doing.

To return to the original post: I can't think of anything simple that could be done to the original program to achieve the EXCP reduction that was observed.
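Steve's BLKSIZE estimate can be reproduced in a few lines. This is only a sketch: subtracting one EXCP for overhead (the 16192 in his post) and truncating the records-per-EXCP figure are assumptions, and, as the thread goes on to show, the LRECL of 14 was itself wrong.

```python
def estimate_blksize(records: int, excps: int, lrecl: int, overhead_excps: int = 1):
    """Rough BLKSIZE estimate from a QSAM EXCP count.

    Assumes one EXCP per physical block plus `overhead_excps` extra EXCPs
    (a hypothetical allowance for OPEN/CLOSE overhead).
    """
    data_excps = excps - overhead_excps        # 16193 - 1 = 16192
    records_per_block = records // data_excps  # 5000000 / 16192 = 308.8, truncated
    return records_per_block, records_per_block * lrecl

rpb, blksize = estimate_blksize(records=5_000_000, excps=16_193, lrecl=14)
print(rpb, blksize)  # 308 4312
```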

Re: Reducing I/O amount

Posted: Tue Aug 02, 2011 12:15 pm
by sensuixel
Actually the LRECL is 89 :oops:
The BLKSIZE is 27,946 and the number of records is 5,079,294.

That gives me a total of 16,193 EXCPs.


Here are the DD card and the DCB I use (nothing special, as you can see):

//SYSUT1   DD DISP=SHR,DSN=R$TRF.EXT.ISICA.SRC.T11101B8 


SYSUT1   DCB DDNAME=SYSUT1,DSORG=PS,MACRF=(GL),EODAD=ENDFIC           


And the basic routine that counts the records:

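         SR    R3,R3              Clear the record counter
         LA    R4,R1              Increment of 1 (R1 is assumed EQU 1)
         OPEN  (SYSUT1,INPUT)
NEXTREC  GET   SYSUT1             Locate mode: R1 -> next record
         AR    R3,R4              Count the record
         B     NEXTREC            Go read the next one
ENDFIC   EQU   *                  EODAD exit: R3 = record count
         CLOSE (SYSUT1)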


I'll go read up on channel programming, as steve-myers suggested.

Thanks for your reply.

Re: Reducing I/O amount

Posted: Tue Aug 02, 2011 5:27 pm
by enrico-sorichetti
That's exactly what could have been predicted without any need to test:
89 x 5079294 = 452057166
452057166 / 27946 = 16176.09, i.e. 16176 full blocks plus one short last block
which implies one EXCP per physical block
(the few extra EXCPs are probably due to general data management overhead)
Sort uses quite clever I/O techniques, so its EXCP count will be much lower
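Enrico's arithmetic can be sketched in Python under the same assumption of one EXCP per physical block. Counting whole blocks rather than dividing bytes shows the file is 16176 full blocks of 314 records plus one short final block, which leaves 16 of the observed EXCPs unexplained by data blocks:

```python
LRECL = 89
BLKSIZE = 27_946
RECORDS = 5_079_294
OBSERVED_EXCPS = 16_193

records_per_block = BLKSIZE // LRECL                 # 314 records fill a block exactly
full_blocks, leftover = divmod(RECORDS, records_per_block)
blocks = full_blocks + (1 if leftover else 0)        # a partial last block is still one I/O
extra = OBSERVED_EXCPS - blocks                      # EXCPs not explained by data blocks

print(records_per_block, full_blocks, leftover, blocks, extra)  # 314 16176 30 16177 16
```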

Re: Reducing I/O amount

Posted: Tue Aug 02, 2011 6:32 pm
by Robert Sample
Add some buffers to the DD statement. The QSAM default is 5 buffers, which is by no means enough. DCB=BUFNO=50 (or 100) should be a good starting place. However, be aware that you are not very likely going to get anywhere near the sort product's EXCP count since the sort products are optimized for I/O and a lot of very experienced programmers have worked to improve I/O performance of the sort.
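Extra buffers also cost storage. Assuming each QSAM buffer is exactly one BLKSIZE long (buffer pool rounding and control blocks are ignored, so this is a lower bound), the buffer storage for sensuixel's 27,946-byte blocks grows like this:

```python
BLKSIZE = 27_946

def buffer_storage(bufno: int, blksize: int = BLKSIZE) -> int:
    """Bytes of QSAM buffer storage, assuming one BLKSIZE-long buffer per BUFNO."""
    return bufno * blksize

for bufno in (5, 50, 100):
    print(f"BUFNO={bufno:<3} {buffer_storage(bufno):>9,} bytes")
```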

Re: Reducing I/O amount

Posted: Tue Aug 02, 2011 6:45 pm
by steve-myers
Actually, by my analysis, the EXCP count seems to be a little high. This is probably caused by embedded short blocks, which are created when records are added to an existing dataset with DISP=MOD. If that is really the case, an EXCP version of sensuixel's program would have to be much more complicated. Other than as a learning experience, it is not worth the effort. It's time for a war story.
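A small model shows why appending with DISP=MOD can raise the block count. It assumes (hypothetically) that each append starts a new block, so every appended run can end in a short block; the 17-way split below is invented purely for illustration and is not taken from sensuixel's dataset:

```python
def blocks_for_segments(segment_records, records_per_block=314):
    """Physical blocks in a dataset built from several appended runs.

    Assumption: every run (the first write plus each DISP=MOD append)
    starts a new block, so each run's last block may be short.
    """
    total = 0
    for n in segment_records:
        full, rest = divmod(n, records_per_block)
        total += full + (1 if rest else 0)
    return total

one_pass = blocks_for_segments([5_079_294])      # written in a single run
appended = blocks_for_segments([298_782] * 17)   # same 5,079,294 records in 17 runs (hypothetical)
print(one_pass, appended)  # 16177 16184
```

Each additional run can add at most one block, so under this model the 16 unexplained EXCPs would need at least 17 runs if short blocks were their only cause.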

Back in 1975 my shop was trying to copy a moderately large unblocked dataset fairly quickly. They could not rebuild it as a blocked dataset because it was used by an application that treated it as if it were a BDAM dataset. In 1975, using VS/2 Version 1 (or early MVS releases, for that matter), this could be accomplished using PCI chained scheduling in a V=R region, but getting such an animal was difficult for operational reasons I won't discuss here. I proposed an EXCP solution that worked very well. In the later 1970s (or maybe the early 1980s, I don't remember exactly) IBM released a product called SAM-E to improve the performance of sequential datasets. At first SAM-E stunk, but IBM eventually fixed the bugs, and by the middle 1980s it had added SAM-E to its mainline DFP/370 product. My EXCP solution was no longer required. If sensuixel can get an EXCP version of his program to run, he'll find it reduces his EXCP count, but it will not affect his elapsed time.

One other point.

         LA    R4,R1
is perfectly legal and does exactly what sensuixel wants, but it's confusing to read because the R1 in the second operand is not register 1; it is just the value 1. You should write it as LA R4,1.

Re: Reducing I/O amount

Posted: Thu Aug 04, 2011 11:02 pm
by sensuixel
Robert Sample wrote:Add some buffers to the DD statement. The QSAM default is 5 buffers, which is by no means enough. DCB=BUFNO=50 (or 100) should be a good starting place. However, be aware that you are not very likely going to get anywhere near the sort product's EXCP count since the sort products are optimized for I/O and a lot of very experienced programmers have worked to improve I/O performance of the sort.


I tried adding buffers (up to 500), but I did not see even a slight difference. Anyway, I'm still reading up on channel programming.

And of course I don't expect to compete with sort performance, but I would like to reduce the amount of I/O at least a little.

Re: Reducing I/O amount

Posted: Fri Aug 05, 2011 9:37 am
by steve-myers
sensuixel wrote:... I tried adding buffers (up to 500), but I did not see even a slight difference. Anyway, I'm still reading up on channel programming.

And of course I don't expect to compete with sort performance, but I would like to reduce the amount of I/O at least a little.
Adding buffers should not change the EXCP count when using QSAM. It may reduce elapsed time a little, but that's all. With a great deal of effort, you may reduce the EXCP count by using EXCP yourself, but you are not reducing the total I/O; you are just increasing the amount of data transferred by each EXCP. As I said before, you probably won't see much, if any, reduction in elapsed time.
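The point that chaining more data onto each EXCP lowers the count without lowering the data moved can be put in numbers. This sketch assumes the dataset is on 3390-type DASD, where two 27,946-byte blocks fit on a track and a cylinder has 15 tracks; the thread never says which device was used, and the model does not claim to reproduce DFSORT's 293 EXCPs.

```python
RECORDS, LRECL, BLKSIZE = 5_079_294, 89, 27_946
RECORDS_PER_BLOCK = BLKSIZE // LRECL            # 314
BLOCKS = -(-RECORDS // RECORDS_PER_BLOCK)       # ceiling division: 16177 blocks
TOTAL_BYTES = RECORDS * LRECL                   # data moved, whatever the EXCP count

# Assumed 3390 geometry (not stated in the thread).
BLOCKS_PER_TRACK = 2
TRACKS_PER_CYLINDER = 15

def excps_needed(blocks_per_excp: int) -> int:
    """EXCPs to read every block when each channel program reads blocks_per_excp blocks."""
    return -(-BLOCKS // blocks_per_excp)

for label, per_excp in [
    ("block", 1),
    ("track", BLOCKS_PER_TRACK),
    ("cylinder", BLOCKS_PER_TRACK * TRACKS_PER_CYLINDER),
]:
    print(f"one {label:<8} per EXCP: {excps_needed(per_excp):>6} EXCPs, {TOTAL_BYTES:,} bytes")
```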

There is a complex trade-off here; it was more extreme before the days of control units with disk caches. Whenever you do any sort of I/O, it uses three real resources behind the scenes: the channel, the control unit, and the device. Often, all three resources are required at once. By being clever, you (or the operating system) can reduce the use of these resources.

In a real sense, there have been four generations of DASD connectivity starting with System/360.

In the first generation, you had a "selector" channel that supported one I/O event at a time, coupled to a control unit that supported one I/O event at a time. A complete (but simplified) channel program to read one record might be -

CCW seek,disk-address,command-chain,6
CCW search ID equal,record ID,command-chain,5
CCW transfer in channel,*-8,0,0
CCW read,buffer address,0,buffer size

You, as a programmer, do not provide the first CCW: OS/360 provides it. You, as a programmer, provide the last 3 CCWs.
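To make the listing concrete, here is a sketch that packs the four CCWs into the 8-byte format-0 layout (command code, 24-bit data address, flags, a zero byte, 16-bit count). The storage addresses are hypothetical, and the sketch only builds the bytes; it does not start any I/O. The seek's data is the 6-byte BBCCHH address and the search's is the 5-byte CCHHR record ID.

```python
import struct

# Format-0 CCW command codes used in the listing above.
SEEK, SEARCH_ID_EQ, TIC, READ_DATA = 0x07, 0x31, 0x08, 0x06
CC = 0x40  # command-chaining flag: go on to the next CCW when this one ends

def ccw(cmd: int, addr: int, flags: int, count: int) -> bytes:
    """Pack one format-0 CCW: command, 24-bit address, flags, zero byte, count."""
    if not 0 <= addr < 1 << 24:
        raise ValueError("format-0 CCWs carry a 24-bit data address")
    return struct.pack(">IBBH", (cmd << 24) | addr, flags, 0, count)

def read_one_record(base, seek_addr, record_id_addr, buffer_addr, bufsize):
    """Build the 4-CCW program; `base` is the (hypothetical) address of the first CCW.

    In OS/360 the system supplies the seek; it is included here for completeness.
    """
    search_ccw = base + 8  # TIC *-8 branches back to the search
    return b"".join([
        ccw(SEEK, seek_addr, CC, 6),               # 6-byte BBCCHH seek address
        ccw(SEARCH_ID_EQ, record_id_addr, CC, 5),  # 5-byte CCHHR record ID
        ccw(TIC, search_ccw, 0, 0),                # skipped once the search is satisfied
        ccw(READ_DATA, buffer_addr, 0, bufsize),   # last CCW: no chaining
    ])

prog = read_one_record(0x1000, 0x2000, 0x2008, 0x3000, 27_946)
print(prog.hex(" ", 8))
```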

The channel, control unit, and device are busy for the entire duration of this channel program.

There is room for improvement here. Once the seek is issued, the device can, on its own, move the access mechanism to the right place and select the appropriate read/write mechanism, and the channel and control unit can be released so that other channel programs can use these resources. What really happened is that OS/360 would run an I/O with just the first CCW. As soon as the channel and control unit sent the seek address to the device, the channel and control unit were free for other users. When the device completed the seek command, it notified the control unit and channel that it was complete, and the channel presented an interrupt to the CPU. OS/360 would then start a second real I/O with the full channel program. In other words, with first generation DASD every disk I/O required two real I/O events: the stand-alone seek, and the complete channel program.

The next resource killer is the search ID equal command. It requires all three resources. The control unit reads the track until the next record appears at the read/write mechanism, and then compares that record's ID with the record ID in CPU storage. If they are equal, the control unit (in essence) tells the channel to skip the next CCW; otherwise the next CCW, the transfer in channel, executes and the search is repeated. On average, this business requires half the revolution time of the disk.
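The half-revolution figure is an average. Assuming the wanted record is equally likely to be anywhere around the track when the search starts, the expected wait is half a revolution; a quick simulation with a hypothetical 25 ms revolution time shows the average converging there:

```python
import random

def average_search_wait(rev_ms: float, samples: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the average search wait, in milliseconds.

    Assumption: when the search starts, the wanted record is equally likely
    to be anywhere around the track, so each wait is uniform on [0, rev_ms].
    """
    rng = random.Random(seed)
    return sum(rng.uniform(0.0, rev_ms) for _ in range(samples)) / samples

REV_MS = 25.0  # hypothetical revolution time
print(f"average wait {average_search_wait(REV_MS):.2f} ms vs half a revolution {REV_MS / 2} ms")
```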

Finally, we have the read CCW. As with the search CCW, all three resources are required. The time required depends on the actual record length, unless the record length specified in the CCW is less than the actual record length.

The second generation sought to improve system performance. This required three things: a new kind of channel, the block multiplexer channel, which could manage one I/O event for each device but transfer data for only one at a time; control units that could manage an I/O event for each device; and a new device capability, rotational position sensing, which divides each track into a number of fixed-length sectors. On command, the device can locate a specific sector without requiring assistance from the channel and control unit. The "typical" channel program becomes -

CCW seek
CCW set sector
CCW search ID equal
CCW transfer in channel,*-8
CCW read

Once the channel has sent the seek address to the control unit and device, it can disconnect from the control unit while the device moves the access mechanism. Once the device has positioned the access mechanism, it wakes up the channel to send the sector number to the device, and the channel goes back to sleep. Once the device has found the sector, the channel wakes up and we execute the search ID equal, with (hopefully) the wanted record coming up almost immediately, so we never actually execute the transfer in channel.
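A toy model of how much channel and control unit time one read ties up under each approach described so far. Every timing is a hypothetical round number, and the model ignores queuing and the brief reconnection for the set sector command; it only illustrates why moving the seek and the rotational wait off the channel helps other users of the channel, not this program's own elapsed time:

```python
def connected_ms(seek_ms, rev_ms, read_ms, mode, reconnect_ms=0.5):
    """Rough time one read ties up the channel and control unit.

    gen1:            seek, search and read all run while connected
    standalone-seek: seek runs disconnected; search + read run connected
    rps:             seek and rotational wait run disconnected; after reconnecting
                     only a short search (reconnect_ms, hypothetical) and the read remain
    """
    search_ms = rev_ms / 2  # average wait for the record to come round
    if mode == "gen1":
        return seek_ms + search_ms + read_ms
    if mode == "standalone-seek":
        return search_ms + read_ms
    if mode == "rps":
        return reconnect_ms + read_ms
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical figures: 30 ms seek, 25 ms revolution, 5 ms data transfer.
for mode in ("gen1", "standalone-seek", "rps"):
    print(mode, connected_ms(30.0, 25.0, 5.0, mode))
```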

"Ah ha!" you say, "Where does the sector number come from?" A good question. You can calculate it, or you can cheat, sort of, and get the device to tell you by extending the channel program -

CCW seek
CCW set sector
CCW search ID equal
CCW transfer in channel,*-8
CCW read
CCW read count
CCW read sector

The read count command reads the count area, which contains the record ID and the record length, for the next record. The read sector command returns to your program the sector number of the last record processed, ready for use in a later set sector.
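The count area that read count returns is 8 bytes: CC HH R (the 5-byte record ID), then KL (key length, 1 byte) and DL (data length, 2 bytes). A sketch that decodes one; the sample bytes are hypothetical:

```python
import struct
from typing import NamedTuple

class CountArea(NamedTuple):
    cc: int  # cylinder
    hh: int  # head (track)
    r: int   # record number on the track
    kl: int  # key length
    dl: int  # data length

def parse_count(raw: bytes) -> CountArea:
    """Decode an 8-byte count area laid out as CCHHR KL DLDL."""
    if len(raw) != 8:
        raise ValueError("count area is 8 bytes")
    return CountArea(*struct.unpack(">HHBBH", raw))

sample = bytes.fromhex("0064000302006D2A")  # hypothetical: cyl 100, head 3, record 2, no key, 27946 bytes
print(parse_count(sample))  # CountArea(cc=100, hh=3, r=2, kl=0, dl=27946)
```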

This sector stuff doesn't do your program any good at all; it frees channel and control unit resources that others can use.

The third generation included caching controllers, which did not alter our programming but made this sector stuff, which didn't work all that well anyway, superfluous. The fourth generation is "RAID" DASD, which makes serious I/O errors a thing of the past. In 1996 I was working on a program that was supposed to include very fancy I/O error recovery when I realized I was never going to see any I/O errors to recover from! So I discarded the code I had written, which was just as well, since it could not be tested.