IBM Mainframe Forum

by **thermalchu** » Mon Sep 17, 2012 5:46 pm

What is the best way to compare two sorted files of 66 million records each, with a record length of 8500 bytes, using copybooks? The two copybooks differ in that the positions of fields have changed and some fields have even been removed.

by **BillyBoyo** » Mon Sep 17, 2012 6:01 pm

I can't resist: the best way is surely to use a computer.

Are your keys unique? Are there the same number of records on both datasets?

You want to do a field-level compare? What do you want to do about fields which have been removed - just not "compare" them?

For speed, I'd look at identifying the largest "lumps" of data you can at first. The "lumps" have to be the same length and number on both files, but don't have to be in the same position.

Then compare on the "lumps", outputting records which mismatch. Then do the field-level comparison on the mismatch files which have been output. You don't want to do a field-level comparison on 8500 bytes times 66,000,000 records. You want to do it only on those which you already know are wrong.
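The "lump" check described above could look roughly like the following Python sketch. It is only an illustration of the idea, not code from any of the products mentioned in this thread. The `Lump` type, the `lumps_differ` function and the sample offsets are all hypothetical, and records are assumed to be fixed-length byte strings.

```python
from typing import NamedTuple, Sequence


class Lump(NamedTuple):
    """A run of bytes that has the same length in both layouts (hypothetical type)."""
    old_start: int  # 0-based offset in the old record
    new_start: int  # 0-based offset in the new record
    length: int     # identical on both sides


def lumps_differ(old_rec: bytes, new_rec: bytes, lumps: Sequence[Lump]) -> bool:
    """Return True as soon as any lump differs between the two records."""
    for lump in lumps:
        old_part = old_rec[lump.old_start:lump.old_start + lump.length]
        new_part = new_rec[lump.new_start:lump.new_start + lump.length]
        if old_part != new_part:
            return True
    return False


# Assumed sample layout: the old record has a 2-byte field at offset 4
# which the new record no longer contains.
LUMPS = [Lump(0, 0, 4), Lump(6, 4, 4)]

old = b"AAAAxxBBBB"
new_same = b"AAAABBBB"   # identical once the dropped field is skipped
new_diff = b"AAAABBBC"   # last byte of the second lump changed

print(lumps_differ(old, new_same, LUMPS))
print(lumps_differ(old, new_diff, LUMPS))
```

Only the pairs for which `lumps_differ` returns True would go forward to the field-level comparison.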

Which Sort product do you use? What "file comparison" products do you have?

by **thermalchu** » Mon Sep 17, 2012 6:11 pm

Yes, keys are unique. And both files have the same number of records, in sorted order.
I need a field-level compare. One is an old copybook. The new copybook has some fields removed and new fields added. The order of the fields has also changed. We have to compare only the common fields. We have File-AID, SuperC, ICETOOL and Easytrieve.

by **BillyBoyo** » Mon Sep 17, 2012 6:44 pm

OK, thanks.

I know you want a field-level compare as your final output.

8500 bytes is going to be a lot of fields. That would be enough to hold about 280 address lines of 30 bytes each.

For 60 million records, that is going to be slow.

That is why I suggested first finding records with significant differences in fields as large as possible.

Then you have the data where you know a field-level comparison is going to yield results. The other x-million (hopefully close to 60...) you then ignore. Just work on the subset, to identify which field(s) make the difference(s) on each record.

If the records match, no need to confirm at field level.

With Easytrieve Plus, you can do both at once.

Use the matched-file processing.

Taking account of missing fields and new fields, describe both records in the big lumps.

Compare the big lumps. If all equal, get on with the next input.

If difference found, do the field-level comparison and report the outcome.
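As a rough illustration of that flow (match on key, test the lumps, drop to field level only on a mismatch), here is a minimal Python sketch. It is not Easytrieve Plus and does not reproduce its matched-file syntax. The key length, the lump offsets, the field names and the sample records are all assumptions made up for the example, and both inputs are assumed to be sorted on a key held in the first bytes of each record.

```python
KEY_LEN = 3  # assumption: the key is the first 3 bytes in both layouts

# (old_start, new_start, length) per lump; hypothetical offsets
LUMPS = [(3, 3, 4), (9, 7, 2)]

# field name -> (old_start, new_start, length) for the common fields; hypothetical
FIELDS = {"NAME": (3, 3, 2), "CODE": (5, 5, 2), "FLAG": (9, 7, 2)}


def differs(old, new, spec):
    """Compare one slice of the old record with its counterpart in the new one."""
    old_start, new_start, length = spec
    return old[old_start:old_start + length] != new[new_start:new_start + length]


def compare_files(old_recs, new_recs):
    """Yield (key, findings) for unmatched keys and for records that differ."""
    old_it, new_it = iter(old_recs), iter(new_recs)
    old, new = next(old_it, None), next(new_it, None)
    while old is not None or new is not None:
        old_key = old[:KEY_LEN] if old is not None else None
        new_key = new[:KEY_LEN] if new is not None else None
        if new is None or (old is not None and old_key < new_key):
            yield old_key, ["<no match on new file>"]
            old = next(old_it, None)
        elif old is None or new_key < old_key:
            yield new_key, ["<no match on old file>"]
            new = next(new_it, None)
        else:
            # Keys match: cheap lump test first, field-level test only if needed.
            if any(differs(old, new, lump) for lump in LUMPS):
                yield old_key, [name for name, spec in FIELDS.items()
                                if differs(old, new, spec)]
            old, new = next(old_it, None), next(new_it, None)


# Sample data: the old record carries a dropped 2-byte field at offset 7.
OLD = [b"001ABCDxxYY", b"002ABCDxxYY", b"003ABCDxxYY"]
NEW = [b"001ABCDYY", b"002ABXXYY", b"003ABCDYZ"]

for key, findings in compare_files(OLD, NEW):
    print(key, findings)
```

Records whose lumps all match never reach the per-field loop, which is where the saving comes from.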

by **thermalchu** » Mon Sep 17, 2012 6:53 pm

OK. Around 5 million records have significant differences between the two files, within the first 5000 bytes of the record. Most of the fields are interchanged there.

by **BillyBoyo** » Mon Sep 17, 2012 7:15 pm

You mean the length of a number of fields is different, for instance?

Do you already have an Easytrieve Plus layout for the old and new records? Are you using Sort symbols/SYMNAMES for the old and new records?

If not, either can be generated. CA provide some code to do it. DFSort provides code. I don't know about SyncSort.

Choose which you'd like to use.

Generate the tests for the matching at field level by a program (can be Easytrieve Plus or your Sort product) reading the record layout.
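To give an idea of what "generated from the record layout" means, here is a small Python sketch which derives the list of field-level tests from two layouts. It is not the CA or DFSort generator mentioned above. It assumes each layout has already been reduced to simple (name, length) pairs with fixed lengths (no ODO, no REDEFINES), and that a common field has the same name and length in both copybooks. The field names are made up.

```python
def offsets(layout):
    """Map field name -> (start, length) for a list of (name, length) pairs."""
    result, pos = {}, 0
    for name, length in layout:
        result[name] = (pos, length)
        pos += length
    return result


def common_fields(old_layout, new_layout):
    """Return (name, old_start, new_start, length) for fields present in both layouts."""
    old_off, new_off = offsets(old_layout), offsets(new_layout)
    tests = []
    for name, (old_start, old_len) in old_off.items():
        # Assumption: a field only counts as common if its length is unchanged.
        if name in new_off and new_off[name][1] == old_len:
            tests.append((name, old_start, new_off[name][0], old_len))
    return tests


# Hypothetical layouts: one field removed, one added, two swapped.
OLD_LAYOUT = [("KEY", 3), ("NAME", 2), ("CODE", 2), ("OLDONLY", 2), ("FLAG", 2)]
NEW_LAYOUT = [("KEY", 3), ("CODE", 2), ("NAME", 2), ("FLAG", 2), ("NEWONLY", 4)]

for test in common_fields(OLD_LAYOUT, NEW_LAYOUT):
    print(test)
```

Each tuple is one comparison to generate, in whichever tool you choose.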

The code for Easytrieve Plus will be easier to generate: it is more "wordy" but each test can happily be one statement. In Sort the tests are all part of the same statement (INREC, for instance). You'd also not need to worry about running into possible limits (like the size of a statement) or the need to "map" data to a REFORMAT record for the JOINKEYS if going with Sort.

by **thermalchu** » Mon Sep 17, 2012 7:30 pm

I meant the positions of the fields have changed. For example, if some fields are removed, the remaining fields move up into those positions.
I don't have an Easytrieve layout. But I have a program which can compare two huge files using DFSort, though not using a copybook.

by **dick scherrer** » Mon Sep 17, 2012 7:40 pm

Hello,

If you have a COBOL copybook for the old/new file layout, you can probably generate Easytrieve file layouts from these. Make sure you pay attention to any ODO structures (occurs depending on). Easytrieve comes with macros to do this.

by **BillyBoyo** » Mon Sep 17, 2012 8:33 pm

If you have 50 fields, then a field dropped, then 70 fields, then a field dropped, then 90 fields - you have three "lumps". Lump 1 on the old file starts at a position, lump 2 starts at a position and lump 3 starts at a position.

The lumps may be contiguous on the new file, but if you treat them as lumps of the same length, the processing will be considerably quicker.
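The 50/70/90 example can be worked through in a few lines of Python. This is only a sketch of one way to find the lumps: merge neighbouring common fields for as long as they are adjacent on both records. The `build_lumps` function, the field names and the uniform 10-byte field length are assumptions for the illustration.

```python
def build_lumps(common):
    """Merge (old_start, new_start, length) tuples, sorted by old_start, into lumps.

    Two neighbours are merged only when they are contiguous on BOTH records.
    """
    lumps = []
    for old_start, new_start, length in common:
        if lumps:
            p_old, p_new, p_len = lumps[-1]
            if p_old + p_len == old_start and p_new + p_len == new_start:
                lumps[-1] = (p_old, p_new, p_len + length)
                continue
        lumps.append((old_start, new_start, length))
    return lumps


FIELD_LEN = 10  # assumption: every field is 10 bytes long

# Old layout: 50 fields, a dropped field, 70 fields, a dropped field, 90 fields.
old_names = ([f"F{i}" for i in range(1, 51)] + ["DROP1"]
             + [f"F{i}" for i in range(51, 121)] + ["DROP2"]
             + [f"F{i}" for i in range(121, 211)])
# New layout: the same fields in the same order, without the two dropped ones.
new_names = [n for n in old_names if not n.startswith("DROP")]

old_pos = {name: i * FIELD_LEN for i, name in enumerate(old_names)}
new_pos = {name: i * FIELD_LEN for i, name in enumerate(new_names)}

common = [(old_pos[n], new_pos[n], FIELD_LEN) for n in new_names]
lumps = build_lumps(common)
print(lumps)
```

With these assumptions the 210 common fields collapse into three lumps, so a matching pair of records needs three comparisons instead of 210.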

As I said, it can be done in one program in Easytrieve Plus.

It can also be done in one sort step (JOINKEYS) but there is some added complexity in defining the data on the REFORMAT record.

Logically your Easytrieve Plus is like this:

Match on the keys
Test for lumps to see if two matching records contain differences

Only if they do, do the field-level test.

This, assuming that you basically expect the data to match, will save you hundreds of millions of comparisons - enough that you will notice.

by **thermalchu** » Tue Sep 18, 2012 8:51 am

Thank you

Compare 2 files
