Extended Format Datasets

Posted: Mon Aug 05, 2013 9:04 pm
by mfrookie
All,

We are trying to see if Extended format datasets can be used for certain datasets we have. These datasets are quite large (sequential datasets) and are over 100 GB.

So I wanted to know whether any of you have experience with these types of datasets, and whether there are any precautions or points you would suggest.

Based on the manual:
1) It cannot be used for datasets accessed using EXCP - I believe EXCP is quite good as compared to BSAM and QSAM.
2) Once allocated, these datasets cannot grow. Whatever growth is required can be achieved only on the existing volumes.
3) Striping needs certain hardware - what kind of hardware does it need?

I would also like to know whether these datasets are supported by commonly used IBM and third-party tools such as SYNCSORT (I believe it does, since DFSORT supports them), File Manager, File-Aid, IBM Debugger, etc.

Also, what is the impact of compressing and decompressing the data? Are there any problems in viewing the data using ISPF option 3.4 (probably a stupid question)?

Thanks.

Re: Extended Format Datasets

Posted: Mon Aug 05, 2013 9:54 pm
by dick scherrer
Hello,

Why does someone believe this should be considered? What is your goal?

If you explain more clearly what you want to accomplish, someone may have some ideas.

Re: Extended Format Datasets

Posted: Tue Aug 06, 2013 12:27 am
by steve-myers
mfrookie wrote: ... 1) It can not be used for datasets accessed using EXCP - I believe EXCP is quite good as compared to BSAM and QSAM. ...
Actually QSAM (especially) and BSAM are very, very good. It takes a very experienced and clever EXCP programmer to improve upon their performance. Been there, done that.

Re: Extended Format Datasets

Posted: Tue Aug 06, 2013 3:50 pm
by mfrookie
We are basically looking to improve performance and at the same time reduce the storage by compressing the data.

I know that compressing and decompressing the data will probably incur some overhead but it should be outweighed by the savings in terms of CPU and I/O.

If anyone has used them, please share your experiences.

Thanks.

Re: Extended Format Datasets

Posted: Tue Aug 06, 2013 5:42 pm
by Robert Sample
mfrookie wrote: I know that compressing and decompressing the data will probably incur some overhead but it should be outweighed by the savings in terms of CPU and I/O.
If you are compressing data, the CPU overhead of compressing it before writing and uncompressing it after reading will probably well exceed any CPU savings you expect to see.

Re: Extended Format Datasets

Posted: Tue Aug 06, 2013 9:19 pm
by steve-myers
mfrookie wrote: We are basically looking to improve performance and at the same time reduce the storage by compressing the data. ...
Granted that mainframe DASD is ridiculously overpriced compared to toy machine disk storage, but it is almost certainly cheaper than the CPU clicks spent compressing and decompressing data.

You may see a small run time improvement in reduced I/O time. I don't know if any studies have been published about this, and I'm too lazy to research this myself, but you're welcome to use your favorite search tool to find out. Let us know what you find.

Re: Extended Format Datasets

Posted: Tue Aug 06, 2013 11:57 pm
by dick scherrer
Hello,

mfrookie wrote: We are basically looking to improve performance . . . .
How much input is being processed, and how long does it take to sort? How much improvement is wanted?

Re: Extended Format Datasets

Posted: Sat Aug 10, 2013 12:54 pm
by mfrookie
Sorry for the delay in replying.

The datasets are partitioned by key. Each partition is well over 100 GB and is already defined as DSNTYPE=LARGE. Sorting the files takes anywhere from 40 to 70 minutes, depending on the workload on the LPAR.

We just want to do a small PoC (Proof of Concept) to see whether Extended format datasets can help improve performance, and how much time (CPU and elapsed) compressing and decompressing the data takes. If compression takes more time, can we simply skip it and use the data as it is?
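
Before running the PoC on the host, the shape of the measurement described above can be rehearsed off-host. The following is only a rough sketch: it uses Python's zlib as a stand-in compressor (it is not the compression z/OS applies to extended format datasets), and the record layout, `make_records`, and `measure` are hypothetical names invented for illustration. The absolute numbers will not carry over to the mainframe; the point is to record size ratio, compression CPU, and decompression CPU separately.

```python
import time
import zlib


def make_records(count: int, lrecl: int = 80) -> bytes:
    """Build synthetic fixed-length records (hypothetical layout: key + padded text)."""
    rows = []
    for i in range(count):
        text = f"{i:010d}CUSTOMER{i % 97:04d}".encode("ascii")
        rows.append(text.ljust(lrecl, b" "))  # pad each record to lrecl bytes
    return b"".join(rows)


def measure(data: bytes, level: int = 6) -> dict:
    """Return the size ratio and CPU seconds for one compress/decompress round trip."""
    start = time.process_time()
    packed = zlib.compress(data, level)  # zlib is an assumption, not the z/OS algorithm
    compress_cpu = time.process_time() - start

    start = time.process_time()
    unpacked = zlib.decompress(packed)
    decompress_cpu = time.process_time() - start

    return {
        "ratio": len(packed) / len(data),  # below 1.0 means the data shrank
        "compress_cpu": compress_cpu,
        "decompress_cpu": decompress_cpu,
        "round_trip_ok": unpacked == data,
    }


if __name__ == "__main__":
    print(measure(make_records(200_000)))
```

On the real system the equivalent figures would come from the job's CPU and elapsed times with and without compression, measured on the same input and under a comparable LPAR workload.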

Please share your experiences if you have already used them.

Thanks.

Re: Extended Format Datasets

Posted: Mon Aug 12, 2013 3:07 am
by dick scherrer
Hello,

The CPU usage to compress the data will greatly outweigh what you currently see. The data will also have to be decompressed before it can be used. Decompression costs less than compression, but it is still not free.

I suspect that when the data is sorted/used, only some of the records are needed and only some of the fields in those records are needed. What you describe sounds like poor design (it is far easier for the coder to sort all of the fields and all of the records all of the time and not be concerned about system usage).

Re: Extended Format Datasets

Posted: Mon Aug 12, 2013 4:04 pm
by j2422tw
Hi, mfrookie:

It looks like you are confusing 'large' format and 'extended' format datasets.
Large format datasets can be used for SAM; extended format datasets can be used for SAM and VSAM.
Extended format datasets must be SMS-managed, but large format datasets need not be.
If you want compression, SMS is required.

A striped dataset is another case of an extended format dataset. It needs hardware support. 'Sustained Data Rate (SDR)' is used to set up striping, and I think the hardware all of us use right now supports this.

We use extended format datasets with compression for SAM and VSAM at my site. It reduces I/O usage (because of compression) and increases CPU usage (because of decompression), as dick scherrer explained: the batch job's CPU usage to compress the data will greatly outweigh what you currently see.
But when I/O usage is reduced, CPU consumption is reduced too.
So if you use extended format datasets for SORT when the CPU is not busy, the elapsed time will drop, the total CPU time will not differ much from the original, and the peak CPU usage will increase.
But when the CPU is very busy, the elapsed time and CPU time will be almost the same as for the original job.

A striped dataset divides the data into multiple parts so that I/O can run in parallel. We use it to replace an EMC product (TeraSAM), and it works fine.
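
To picture the idea of dividing data into parts that can be read or written in parallel, here is a purely conceptual sketch. It assumes simple round-robin placement of blocks across stripes; that placement rule and the names `stripe` and `unstripe` are assumptions made for illustration, not a description of how DFSMS lays out a striped dataset.

```python
def stripe(blocks: list[bytes], stripe_count: int) -> list[list[bytes]]:
    """Distribute blocks round-robin across stripe_count stripes (assumed policy)."""
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    stripes: list[list[bytes]] = [[] for _ in range(stripe_count)]
    for index, block in enumerate(blocks):
        stripes[index % stripe_count].append(block)
    return stripes


def unstripe(stripes: list[list[bytes]]) -> list[bytes]:
    """Rebuild the original block order by reading the stripes in rotation."""
    total = sum(len(s) for s in stripes)
    count = len(stripes)
    return [stripes[i % count][i // count] for i in range(total)]


if __name__ == "__main__":
    blocks = [f"block{i}".encode("ascii") for i in range(10)]
    parts = stripe(blocks, 4)
    print([len(p) for p in parts])
    print(unstripe(parts) == blocks)
```

With four stripes, each part holds roughly a quarter of the blocks, which is why the parts can be transferred concurrently when they sit on separate volumes.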

Hope this gives you some help.

Jerry