
Re: Reading Type80 SMF Records file

Posted: Mon Dec 26, 2011 11:17 pm
by BillyBoyo
There are a large number of SMF records, varying in complexity.

If the "raw" data is, or can be made, available as formatted reports/files instead, that will make your task a good deal easier.

As enrico has pointed out, the "sort" utility at your site may well have built-in functions for understanding various parts of the SMF data and be able to produce what you require in a report/file format.

Overall, the total time expended will (should) be less if the data can be formatted on the mainframe and downloaded to you.

If all the data you need is already reported upon, then there is not a great deal of effort expended on the mainframe, just some stuff to collect it all together and to "bullet-proof" it, so they don't give you inconsistent data.

If some stuff is not available, that does need more effort on the mainframe to produce it.

The least effort on the mainframe is to give you all the data as a hex/character "dump" and let you get on with it, but this is also the least "stable" solution, in that you have to get translations right, which are done "automatically" on the mainframe (to display the date/time for instance).
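To illustrate the kind of translation meant here: SMF record headers carry the time as a binary count of hundredths of a second since midnight, and the date as packed decimal in the form 0cyydddF. A minimal Python sketch, assuming those header formats apply to the fields you are reading; the function names are made up for illustration.

```python
import datetime
import struct


def decode_smf_time(raw: bytes) -> datetime.time:
    """Decode a 4-byte big-endian count of hundredths of a second since midnight."""
    (hundredths,) = struct.unpack(">I", raw)
    seconds, hs = divmod(hundredths, 100)
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    return datetime.time(h, m, s, hs * 10_000)


def decode_smf_date(raw: bytes) -> datetime.date:
    """Decode 4-byte packed decimal 0cyydddF (c = centuries after 1900, ddd = day of year)."""
    digits = raw.hex()                 # for example "0111360f"
    century = int(digits[1])
    yy = int(digits[2:4])
    ddd = int(digits[4:7])
    year = 1900 + 100 * century + yy
    return datetime.date(year, 1, 1) + datetime.timedelta(days=ddd - 1)
```

Neither field survives a blanket EBCDIC to ASCII translation, which is why a raw dump has to be downloaded in binary and decoded field by field.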

The reports/files would be entirely in "character" format (numbers, letters, that sort of stuff) which would (should) survive EBCDIC to ASCII translation without problems. You then get to process the text/numbers. You'd still have the situation of records of the same type containing "variable" amounts of data, in certain cases.

The 102 data seems to be complex, but it will depend on what you need from it as to how reliable the initial output is.

Bear in mind that especially for any new reports, the output you are given might not be correct, but that is part of the fun of testing anyway :-)

If your site has a package which does SMF reporting, the process should be smoother.

I can't help feeling that this project is being done on sufferance from the mainframe staff, and that you currently have the dirty end of the stick. There seems to be a whole chunk of "data analysis" missing which is replaced by you "looking at the data" and taking it from there. However, if that work is not done, I don't see how it is going to fly without them just giving you all the data. You'll get countless iterations of "this data is missing from the file for that record-type" if you do the "as you go along" version.

I hope I'm wrong.

Re: Reading Type80 SMF Records file

Posted: Tue Dec 27, 2011 11:43 am
by dick scherrer
Hello,

Sounds like you are caught up in politics (someone trying to prove a point) rather than getting some technical solution. . . Bummer.

Most mainframe organizations have processes available to extract particular info from the raw SMF data as well as report on it. Most of the mainframe application developers are largely unaware of these, as they have no real need to be familiar with them. Someone from another environment (*nix, Win-based, etc.) would be most unlikely to just "grab the file" and do what they want. There is considerable complexity in the SMF data, and the data and record formats are not so friendly once downloaded from the mainframe (unless it is a "friendly" data extract).

Once this process provides some useful output, what will it provide?

Re: Reading Type80 SMF Records file

Posted: Sat Jan 21, 2012 11:08 am
by angrybeaver
It is possible, and it saves you a lot in mainframe processing costs, to work with SMF in a distributed environment. You can load it all into a database once you've unloaded the raw SMF to ASCII text and query over a full year. If you compress the tables it will only be a few gigs of data, depending on what you decide to keep (i.e., only type 80s). Have you ever tried to concatenate a bunch of monthly SMF unloads and run a job? When you came back 3 hours later did you find your storage class was wrong or filters were slightly off? Did you have to re-run it and write it to tape and then download an 8 gig file of text? Did it cost you a few hundred dollars and take a few runs? There are tons of tools to do this on the mainframe which will unload to plain text. The problem is always going to be cost: processing and storage. You can invest $2k in a simple Linux server and within the first month you would recoup all the costs of the equivalent mainframe jobs you would have to run. A lot of mainframers will balk at the idea of using a distributed server to process the data. Let them keep their mainframe tools, wait in line in SDSF and spend their hundreds of thousands each year.

First of all you have to retain the RDW record as mentioned before. If the SMF is written to a DASD file then you want to use the following FTP parameters:
bin
site rdw
get 'your.smf.file' smfdata

If the SMF is written to tape you will use these FTP parameters:
bin
site rdw readtapeformat=s
get 'your.smf.file' smfdata

If you want to validate that what you have is accurate then use whichever archaic SMF unloader you'd like and run a job to dump a subset of raw type 80 records. Download it using the instructions above and in Linux use
hexdump -vC smfdata | less

You will see the value 50 in the 6th byte of the file (after the 4-byte RDW and one flag byte). x'50' = 80 decimal.

Keep in mind that SMF is full of multi-segment and variable length records. The basic layout for ANY record is as follows:

SMFLEN < 65k (SMF Length)
SMFSEG=0,1,2 (SMF Segments) (1 and 2 are multi-segment records. 0 is single segment) SMFLEN + SMFSEG = RDW

So you're basically reading in the RDW to find out how much more to read. Read in the Record Type field that follows the RDW and if it's what you want (ie; type 80) then process everything after it. If not, you know how far to read until the next record. For a type 80 record the SMFRTY data is comprised of an event code and data type for relocation records.
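The read loop described above can be sketched roughly as follows. This is only a sketch under stated assumptions: the file was downloaded in binary with the RDWs kept, each record is complete (no segment joining), the first two RDW bytes are a big-endian length that includes the 4-byte RDW itself, and the record type is the 6th byte of the record. The name iter_records is made up for illustration.

```python
import struct
from typing import Iterator, Tuple


def iter_records(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_type, record) for each RDW-prefixed record in data."""
    pos = 0
    while pos + 6 <= len(data):
        # First two RDW bytes: record length, big-endian, RDW included.
        (length,) = struct.unpack_from(">h", data, pos)
        if length < 6 or pos + length > len(data):
            raise ValueError(f"bad RDW length {length} at offset {pos}")
        record = data[pos:pos + length]
        # Byte 4 is the SMF flag byte, byte 5 is the record type.
        yield record[5], record
        # The length tells us where the next record starts.
        pos += length
```

Picking out the type 80 records is then a filter over the iterator, for example `[rec for rtype, rec in iter_records(data) if rtype == 80]`.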

Re: Reading Type80 SMF Records file

Posted: Sat Jan 21, 2012 12:56 pm
by steve-myers
angrybeaver wrote:... The basic layout for ANY record is as follows:

SMFLEN < 65k (SMF Length)
SMFSEG=0,1,2 (SMF Segments) (1 and 2 are multi-segment records. 0 is single segment) SMFLEN + SMFSEG = RDW

So you're basically reading in the RDW to find out how much more to read. Read in the Record Type field that follows the RDW and if it's what you want (ie; type 80) then process everything after it. If not, you know how far to read until the next record. For a type 80 record the SMFRTY data is comprised of an event code and data type for relocation records.

This is not quite correct. The RDW is 4 bytes. The first two bytes are the record or segment length, expressed as a signed big-endian binary number. This may be a problem on x86 workstations, where little-endian binary numbers are typically used. The second two bytes are a code (not the code shown by angrybeaver), expressed as a signed 16-bit number, that allows the segments to be connected. This code is 0 for a complete record, with separate codes for the first segment of a multiple-segment record, a middle segment, and the last segment. I don't have these codes memorized; they are discussed in "DFSMS Using Data Sets."

This scheme does not provide an indication of how much of a multiple-segment record remains. However, the most common method used to read this data on the mainframe automagically connects the segments together with a maximum record length of 32760 (not 64K) bytes; the user program is seldom aware of how the record was actually constructed. Again, the record length is a signed 16-bit big-endian number, not an unsigned one. When the SMF data for a record actually exceeds 32760 bytes, other methods are used to connect the records. I know this is an issue with type 30 records; I don't think it's an issue with type 80 records.
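For anyone decoding the descriptor word on a workstation, a minimal Python sketch follows. It assumes the layout given in "DFSMS Using Data Sets" for spanned records: a signed big-endian length in bytes 0-1, the segment control code in the low two bits of byte 2 (0 = complete, 1 = first, 2 = last, 3 = middle), and byte 3 reserved. Whether a particular download actually carries non-zero codes is something to check against your own data. The names decode_rdw and SEGMENT_CODES are made up for illustration.

```python
import struct

# Segment control codes, assumed from "DFSMS Using Data Sets".
SEGMENT_CODES = {0: "complete", 1: "first", 2: "last", 3: "middle"}


def decode_rdw(rdw: bytes):
    """Return (length, segment_kind) from a 4-byte descriptor word."""
    # ">h" = signed 16-bit big-endian; "B" = one unsigned byte.
    length, code, _reserved = struct.unpack(">hBB", rdw)
    return length, SEGMENT_CODES[code & 0x03]
```

The byte order matters: the bytes 01 18 are 280 when read big-endian, but 6145 when read with the little-endian default of an x86 machine.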

It is possible to build a variable length dataset with logical records greater than 32760 bytes, but writing and reading them is rather complex and is not used for SMF data.

Re: Reading Type80 SMF Records file

Posted: Sat Jan 21, 2012 8:28 pm
by angrybeaver
Dealing with the endianness is as simple as googling. He is writing it in C, so he can AND each byte individually against 0xff. There is a little bit of trickiness to joining the two bytes up (i.e., 1 line of code), but I'm sure he'll figure it out if he googles for more than 5 mins (http://www.codeguru.com/forum/showthread.php?t=292902 2nd hit on Google, you're welcome). He seemed to be proficient in C, and his real barrier appeared to be retaining the RDW info, which was the same issue I ran into. Had he forgotten to code for big->little endian, I'm sure this would have been apparent the first time he tried processing the first record of a file and everything was truncated or the end was misaligned. You seemed to know it was stored as a signed big endian. Surely you didn't just figure that out yourself; you got it from the same documentation I did, the same documentation he is apparently reading.

Joining the segments is just as simple. 0 is complete. If you have a multi-seg record (ie; the last segment you read was NOT a 0) then you're reading an additional segment. 0 and 1's can start a record. If the last record you read is a 1 then a 1 or a 0 will terminate the segment. We should all have access to the IBM record layout documentation so I won't copy and paste it all for you but that should get him and anybody else interested well on their way.
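A rough Python sketch of the joining logic, for anyone who wants a starting point. Note the assumption: it uses the segment control codes as listed in "DFSMS Using Data Sets" (0 = complete, 1 = first, 3 = middle, 2 = last), which may not match the shorthand above, so check them against your own data. It takes (code, payload) pairs with the descriptor words already stripped; join_segments is a made-up name.

```python
def join_segments(segments):
    """Reassemble logical records from (code, payload) pairs."""
    records = []
    parts = None                      # None means no record is in progress
    for code, payload in segments:
        if code in (0, 1):
            if parts is not None:
                raise ValueError("new record started before the previous one ended")
            if code == 0:
                records.append(payload)      # complete record, nothing to join
            else:
                parts = [payload]            # first segment of a longer record
        elif code in (2, 3):
            if parts is None:
                raise ValueError("continuation segment without a first segment")
            parts.append(payload)
            if code == 2:                    # last segment closes the record
                records.append(b"".join(parts))
                parts = None
        else:
            raise ValueError(f"unknown segment code {code}")
    if parts is not None:
        raise ValueError("input ended in the middle of a record")
    return records
```

Raising on an unexpected sequence, rather than guessing, makes a dropped RDW or a bad transfer show up on the first run.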

I had the same problems as he did when I was starting up. "That's impossible! The records are too complex! Just run a job and then download the report." were the words I heard from everybody who was a supposed expert. I see the same thing in this thread. I think the people saying these things have just never taken the time to read the publicly available documentation from IBM, have never written anything more complex than simple JCL, or just don't care if there is a better way to go about processing SMF and are content submitting job after job. I guess there is job security in being inefficient since everybody thinks "what you do takes so long so it must be really complicated and you say no other way exists". SMF is not trivial and writing a parser for every record/field can be daunting. It's fully documented and FAR from rocket science.

He will run into all the weird gotchas of processing SMF as he goes. I don't imagine anybody would code it perfectly, given the strangeness of the design (my opinion) and the painful detail of the documentation. I'm confident any coder who is strong in whatever language they choose, understands the differences between mainframes and distributed servers (i.e., endianness, character encoding, etc.) and can read IBM's documentation would eventually figure it out. It took me a week to figure out the RDW field and after that it was relatively straightforward.

Another option some of you may want to consider, if you process a lot of SMF reports (audit? fraud investigation? cleaning up security and wanting to run scenarios against historical SMF to see impacts?): there is readily available documentation from IBM on storing SMF in DB2. You'll find it's much faster than processing through the records byte by byte with any existing unloaders each time you need a report. This could be something to consider for those of you who are C/Linux/Windows averse but want a better/faster/cheaper way to store and research SMF and currently think the unloader you're presently using is the only/best option.

Re: Reading Type80 SMF Records file

Posted: Sun Jan 22, 2012 2:06 am
by steve-myers
  • SMF records are broken into segments when
    • The record is longer than the physical record length of the dataset.
    • The proposed record is too long to fit into the current physical record. Then, as much of the proposed record as will fit into the current physical record will be written to the current physical record as the first segment, and the remainder will be written to subsequent physical records.
  • I discussed the big endian / little endian issue because this is an issue that will affect people attempting to analyze the data on an X86 work station.
  • I mentioned that RDW lengths are signed 16-bit values because that's why the 32760 byte length is imposed.
  • I should have mentioned that all binary data in the data records is in the big endian format, and all character data is EBCDIC.
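On the workstation side, those two points come down to decoding each field according to its type. A minimal Python sketch, assuming code page 037 for the character data (the right EBCDIC code page for a given site is something to verify; Python also ships cp500 and cp1140). The helper names are made up for illustration.

```python
import struct

# Assumed EBCDIC code page; confirm which one your site actually uses.
EBCDIC = "cp037"


def decode_field(raw: bytes) -> str:
    """Translate an EBCDIC character field and drop trailing blanks."""
    return raw.decode(EBCDIC).rstrip()


def decode_fullword(raw: bytes) -> int:
    """Interpret 4 bytes as a signed big-endian binary integer."""
    return struct.unpack(">i", raw)[0]
```

For example, the bytes E2 D4 C6 40 40 decode to "SMF", and 00 00 00 50 decodes to 80. Translating the whole file as text would corrupt the binary fields, so the translation has to be applied field by field.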
Back in 2002, I think it was, I wrote a large Assembler program to process type 30 SMF records. Part of the output was a CSV-formatted text dataset that was loaded into workstation spreadsheet programs to generate graphs.