Extended Format Datasets

Postby mfrookie » Mon Aug 05, 2013 9:04 pm

All,

We are trying to see if extended format datasets can be used for certain datasets we have. These are large sequential datasets, each over 100 GB.

Does anyone have experience with these types of datasets, and are there any precautions or points you would suggest?

Based on the manual:
1) It cannot be used for datasets accessed using EXCP. I believe EXCP is quite good compared to BSAM and QSAM.
2) Once allocated, these datasets cannot grow onto additional volumes; any growth must be achieved on the existing volumes.
3) Striping needs certain hardware. What kind of hardware does it need?

I would also like to know whether these datasets are supported by commonly used IBM and third-party tools such as SYNCSORT (I believe it does, as DFSORT supports them), File Manager, File-Aid, IBM Debugger, etc.

Also, what is the impact of compressing and decompressing the data? Are there any problems viewing the data using ISPF option 3.4 (probably a stupid question)?

Thanks.
mfrookie

Re: Extended Format Datasets

Postby dick scherrer » Mon Aug 05, 2013 9:54 pm

Hello,

Why does someone believe this should be considered? What is your goal?

If you explain more clearly what you want to accomplish, someone may have some ideas.
Hope this helps,
d.sch.

Re: Extended Format Datasets

Postby steve-myers » Tue Aug 06, 2013 12:27 am

mfrookie wrote:... 1) It can not be used for datasets accessed using EXCP - I believe EXCP is quite good as compared to BSAM and QSAM. ...
Actually QSAM (especially) and BSAM are very, very good. It takes a very experienced and clever EXCP programmer to improve upon their performance. Been there, done that.
Been thanked: 243 times

Re: Extended Format Datasets

Postby mfrookie » Tue Aug 06, 2013 3:50 pm

We are basically looking to improve performance and at the same time reduce the storage by compressing the data.

I know that compressing and decompressing the data will probably incur some overhead but it should be outweighed by the savings in terms of CPU and I/O.

If anyone has used them, please share your experiences.

Thanks.
mfrookie

Re: Extended Format Datasets

Postby Robert Sample » Tue Aug 06, 2013 5:42 pm

I know that compressing and decompressing the data will probably incur some overhead but it should be outweighed by the savings in terms of CPU and I/O.
If you are compressing data, the CPU overhead of compressing it before writing and uncompressing it after reading will probably well exceed any CPU savings you expect to see.

Re: Extended Format Datasets

Postby steve-myers » Tue Aug 06, 2013 9:19 pm

mfrookie wrote:We are basically looking to improve performance and at the same time reduce the storage by compressing the data. ...
Granted, mainframe DASD is ridiculously overpriced compared to toy-machine disk storage, but it is almost certainly cheaper than the CPU cycles spent compressing and decompressing data.

You may see a small run time improvement in reduced I/O time. I don't know if any studies have been published about this, and I'm too lazy to research this myself, but you're welcome to use your favorite search tool to find out. Let us know what you find.

Re: Extended Format Datasets

Postby dick scherrer » Tue Aug 06, 2013 11:57 pm

Hello,

We are basically looking to improve performance . . . .
How much input is being processed, and how long does it take to sort? How much improvement is wanted?
Hope this helps,
d.sch.

Re: Extended Format Datasets

Postby mfrookie » Sat Aug 10, 2013 12:54 pm

Sorry for the delay in replying.

The datasets are partitioned by key and each partition is well over 100 GB. They are already defined as DSNTYPE=LARGE, and sorting a file takes anywhere from 40 to 70 minutes, depending on the workload on the LPAR.
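
For a rough sense of scale, those figures imply an average data rate. This sketch assumes the sort reads each 100 GB partition exactly once (1 GB = 1024 MB) and ignores output and work-file I/O, so the real I/O rate is higher; the function name is hypothetical.

```python
def implied_rate_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average MB/s needed to read `size_gb` once in `minutes` (1 GB = 1024 MB)."""
    return size_gb * 1024 / (minutes * 60)


# Best and worst elapsed times reported for one 100 GB partition.
for minutes in (40, 70):
    print(f"100 GB in {minutes} min: {implied_rate_mb_per_s(100, minutes):.1f} MB/s")
```

This prints about 42.7 MB/s and 24.4 MB/s. Comparing such numbers with what the DASD subsystem can sustain is one way to judge whether the sort is I/O-bound before trying compression.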

We just want to do a small PoC (proof of concept) to see whether extended format datasets can improve performance, and how much time (CPU / elapsed) compressing and decompressing the data takes. If it takes more time, can we simply leave the data uncompressed and use it as is?

Please share your experiences if you have already used them.

Thanks.
mfrookie

Re: Extended Format Datasets

Postby dick scherrer » Mon Aug 12, 2013 3:07 am

Hello,

The CPU usage to compress the data will greatly outweigh what you currently see. The data will also have to be decompressed before it can be used; that costs less than compression, but is still not free.

I suspect that when the data is sorted/used, only some of the records are needed, and only some of the fields in those records. What you describe sounds like poor design (it is far easier for the coder to sort all of the fields and all of the records all of the time and not be concerned about system usage).
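
The idea of sorting only what is needed can be illustrated with a small Python sketch. The record layout and field names are hypothetical; at the sort step itself, DFSORT's INCLUDE and INREC control statements fill the roles of the record filter and the field selection shown here.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """Hypothetical record layout; a real one would come from the copybook."""
    key: str
    region: str
    amount: int
    filler: str  # wide field the downstream step never reads


def slim_sort(records: Iterable[Record], region: str) -> list[tuple[str, int]]:
    # 1. Keep only the records the consumer needs (the INCLUDE role).
    wanted = (r for r in records if r.region == region)
    # 2. Carry only the fields it reads (the INREC role).
    slim = ((r.key, r.amount) for r in wanted)
    # 3. Sort the much smaller result (by key, then amount).
    return sorted(slim)


data = [
    Record("K3", "EAST", 30, "x" * 200),
    Record("K1", "WEST", 10, "x" * 200),
    Record("K2", "EAST", 20, "x" * 200),
]
print(slim_sort(data, "EAST"))
```

The wide filler field and the unwanted records never reach the sort, which is where the savings come from.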
Hope this helps,
d.sch.

Re: Extended Format Datasets

Postby j2422tw » Mon Aug 12, 2013 4:04 pm

Hi, mfrookie:

It looks like you are confusing 'large' and 'extended' format datasets.
Large format datasets can be used for SAM; extended format datasets can be used for SAM and VSAM.
Extended format datasets must be SMS-managed; large format datasets need not be.
If you want compression, SMS is needed.

Striped datasets are another case of extended format datasets. They need hardware support: the 'Sustained Data Rate (SDR)' storage class attribute is used to set up a striped dataset, and I think the hardware all of us use right now supports this.

We use extended format datasets with compression for SAM and VSAM at my site. It reduces I/O (because of compression) and increases CPU usage (because of decompression); as dick scherrer explained, the CPU a batch job uses to compress the data will greatly outweigh what you currently see.
But when I/O is reduced, the CPU consumed by I/O processing is reduced too.
So if you use extended format datasets for SORT when the CPU environment is not busy, the elapsed time will decrease, the total CPU time will not differ much from the original, and peak CPU usage will increase.
But when the CPU environment is very busy, the elapsed time and CPU time will be almost the same as for the original job.
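
The trade-off described above can be made concrete with a toy model. Every number below (data volume, compression ratio, transfer rate, per-MB CPU costs) is a made-up assumption for illustration, not a measurement, and the model assumes CPU and I/O overlap perfectly on an uncontended processor; a real PoC would measure these values instead.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    data_mb: float             # uncompressed data moved by the job (assumed)
    ratio: float               # compressed size / uncompressed size; 1.0 = none
    io_mb_per_s: float         # effective DASD transfer rate (assumed)
    app_cpu_s: float           # CPU for the sort logic itself (assumed)
    io_cpu_s_per_mb: float     # CPU to drive each MB actually transferred (assumed)
    codec_cpu_s_per_mb: float  # CPU to compress/expand each uncompressed MB (assumed)


def estimate(s: Scenario) -> dict[str, float]:
    transferred_mb = s.data_mb * s.ratio      # compression shrinks what hits DASD
    io_s = transferred_mb / s.io_mb_per_s
    cpu_s = (s.app_cpu_s
             + transferred_mb * s.io_cpu_s_per_mb  # less I/O, less I/O CPU
             + s.data_mb * s.codec_cpu_s_per_mb)   # but codec work on every MB
    # With perfect overlap and idle CPUs, elapsed time is at least the larger of the two.
    return {"cpu_s": cpu_s, "io_s": io_s, "elapsed_floor_s": max(cpu_s, io_s)}


plain = Scenario(data_mb=102_400, ratio=1.0, io_mb_per_s=40.0,
                 app_cpu_s=600.0, io_cpu_s_per_mb=0.002, codec_cpu_s_per_mb=0.0)
compressed = Scenario(data_mb=102_400, ratio=0.4, io_mb_per_s=40.0,
                      app_cpu_s=600.0, io_cpu_s_per_mb=0.002, codec_cpu_s_per_mb=0.01)
for name, s in (("plain", plain), ("compressed", compressed)):
    print(name, {k: round(v) for k, v in estimate(s).items()})
```

With these invented inputs, compression cuts I/O time from 2560 s to 1024 s but roughly doubles CPU (about 805 s to about 1706 s). Elapsed time improves only while spare CPU is available; on a busy LPAR the extra CPU competes with other work, which matches the behavior described in the post.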

Striping divides the data into a multi-part dataset to increase I/O parallelism. We use it to replace an EMC product (TeraSAM), and it works fine.
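
To illustrate why striping helps, here is a generic round-robin layout sketch. It assumes consecutive tracks are dealt across stripes in turn and that each stripe sits on its own volume that can transfer in parallel; this is the textbook striping idea, not a description of DFSMS internals, and the function names are hypothetical.

```python
def stripe_layout(num_tracks: int, num_stripes: int) -> list[list[int]]:
    """Deal logical tracks 0..num_tracks-1 across stripes round-robin."""
    if num_stripes < 1:
        raise ValueError("need at least one stripe")
    stripes: list[list[int]] = [[] for _ in range(num_stripes)]
    for track in range(num_tracks):
        stripes[track % num_stripes].append(track)
    return stripes


def read_rounds(num_tracks: int, num_stripes: int) -> int:
    """Transfer rounds for a sequential read, assuming every stripe (volume)
    moves one track per round and all stripes work in parallel."""
    return max(len(s) for s in stripe_layout(num_tracks, num_stripes))


print(stripe_layout(8, 4))
print(read_rounds(1000, 1), read_rounds(1000, 4))
```

Under these assumptions four stripes cut 1,000 sequential track transfers to 250 parallel rounds; in practice channel, controller, and CPU limits keep real gains below that ideal.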

Hope this helps.

Jerry
