IBM Mainframe Forum

by **ibmmf4u** » Mon Jul 30, 2012 10:33 pm

Hi Everyone,

My requirement is as follows. I would like to remove duplicates from a file. Below is an example of the input file.

Input file:-

00001234 TEST1 TESTING
00001234 DESCRIPTION
00003456 TEST2 TESTER2
00003456 DESC
00001234 TEST1 TESTER1
00001234 EXAMPLE1
00003456 TEST2 TESTER2
00003456 EXAMPLE2
00001234 TEST1 TESTSAMP
00001234 SAMPLE1

The first 8 positions are the key field, and positions 13 through 17 are the descriptive field. If the same key field and the same descriptive field are encountered again, I would like to remove those later occurrences.

The output file should contain the following records, with the other duplicate occurrences of the key field + descriptive field eliminated.

Output file:-

00001234 TEST1 TESTING
00001234 DESCRIPTION
00003456 TEST2 TESTER2
00003456 DESC

I would like to remove the other occurrences of the fields.

by **dick scherrer** » Mon Jul 30, 2012 11:45 pm

Hello,

What business requirement will be met by this? Why are duplicates (that aren't really duplicates, as the data other than the key is different) being discarded?

Why is the data not in sequence?

If you post what is really going on, someone may have a suggestion.

by **BillyBoyo** » Tue Jul 31, 2012 4:02 am

The method is quite simple, but as Dick has said, I'm not sure that you have explained enough about what is happening.

You have groups of data again. You need to sort on the groups and decide which group to retain. There must be a business requirement which details how you decide. That may well be crucial to the solution.

by **bodatrinadh** » Tue Jul 31, 2012 2:44 pm

Hi ibmmf4u,

You can try this code.

//STEP1 EXEC PGM=SORT
//SORTIN DD *
00001234 TEST1 TESTING
00001234 DESCRIPTION
00003456 TEST2 TESTER2
00003456 DESC
00001234 TEST1 TESTER1
00001234 EXAMPLE1
00003456 TEST2 TESTER2
00003456 EXAMPLE2
00001234 TEST1 TESTSAMP
00001234 SAMPLE1
//SORTOUT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSIN DD *
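* Sort on the 8-byte key; EQUALS keeps the input order of equal keys
  SORT FIELDS=(1,8,CH,A),EQUALS
* Add a 4-digit sequence number at 61 that restarts at 1 for each new key
  OUTREC IFTHEN=(WHEN=INIT,OVERLAY=(61:SEQNUM,4,ZD,RESTART=(1,8)))
* Drop the third and later records of each key; pad the output to 80 bytes
  OUTFIL BUILD=(1,60,20X),OMIT=(61,4,ZD,GE,+3)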

Output :-

00001234 TEST1 TESTING
00001234 DESCRIPTION
00003456 TEST2 TESTER2
00003456 DESC
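
For readers who want to check the logic away from the mainframe, here is a minimal Python sketch of what the job above appears to do: sort on the key while keeping input order within a key, number the records of each key from 1, and drop everything numbered 3 or higher. The function name `keep_first_n_per_key` and the parameter `n` are made up for illustration; this is a model of the control cards, not a replacement for them.

```python
from itertools import groupby


def keep_first_n_per_key(records, n=2, key_len=8):
    """Model of: SORT on the key, SEQNUM restarting per key, OMIT seq > n."""
    # sorted() is stable, so records with equal keys keep their input order.
    ordered = sorted(records, key=lambda r: r[:key_len])
    kept = []
    for _, group in groupby(ordered, key=lambda r: r[:key_len]):
        # seq plays the role of the restarted sequence number at position 61.
        for seq, rec in enumerate(group, start=1):
            if seq <= n:
                kept.append(rec)
    return kept


sample = [
    "00001234 TEST1 TESTING",
    "00001234 DESCRIPTION",
    "00003456 TEST2 TESTER2",
    "00003456 DESC",
    "00001234 TEST1 TESTER1",
    "00001234 EXAMPLE1",
    "00003456 TEST2 TESTER2",
    "00003456 EXAMPLE2",
    "00001234 TEST1 TESTSAMP",
    "00001234 SAMPLE1",
]

for rec in keep_first_n_per_key(sample):
    print(rec)
```

Note that the cut-off is a fixed record count per key, so this only gives the wanted result when every transaction has exactly two lines.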

by **dick scherrer** » Tue Jul 31, 2012 7:48 pm

Hello,

Thank you for the contribution, but i believe TS needs to provide some more clarification before taking some code and running with it.

In addition to getting what was asked for, i believe more needs to be understood (by TS's organization) about the real intent of this process. I'm scratching my head as to how it might be all right to arbitrarily toss away "stuff" from some unsorted input file. . .

by **ibmmf4u** » Tue Jul 31, 2012 7:57 pm

Hello Dick/Bill,

It's mainly for reporting purposes. Every day we send out reports containing the transactions that were invalid/failed, along with a description of each transaction. If a transaction is already in the report with a description, it is already known to be invalid/failed, so we need not send out the other occurrences of the same transaction with their descriptions; hence we eliminate the other occurrences. That's the reason behind it.

Bill, Thanks for the inputs.

Hi bodatrinadh,

Thanks for the above piece of code. I will test it and let you know the outcome.

Thank you All!!!!!!!!!!!

by **ibmmf4u** » Tue Jul 31, 2012 8:20 pm

Hi bodatrinadh,

Thanks a lot. The above piece of code works fine if the description contains only two lines, but in practice we aren't sure of the number of descriptor lines.

Pasted below is an example of the Input file.

00001234 TEST1 TESTING
00001234 DESCRIPTION
00001234 TESTDUPL
00003456 TYPE2 TESTER2
00003456 DESC1
00003456 DESC2
00003456 DESC3
00004567 ERROR1 ERROR DESC1
00004567 DESC2
00004567 DESC3
00001234 TEST1 TESTER1
00001234 EXAMPLE1
00003456 TYPE2 TESTER2
00003456 EXAMPLE2
00004567 ERROR1 DESC LINE1
00004567 DESC LINE2
00001234 TEST1 TESTSAMP
00001234 SAMPLE1

Expected output file:-

00001234 TEST1 TESTING
00001234 DESCRIPTION
00001234 TESTDUPL
00003456 TYPE2 TESTER2
00003456 DESC1
00003456 DESC2
00003456 DESC3
00004567 ERROR1 ERROR DESC1
00004567 DESC2
00004567 DESC3

Please help me in achieving the above.

Thanks in advance!!!

by **BillyBoyo** » Tue Jul 31, 2012 8:37 pm

If you are certain that this is what is needed, then OK. However, your users are going to be annoyed if you show them there is one error, which they correct, and then the following day there is another that was previously known.

If you do a GROUP on the INREC, PUSHing an ID, then a GROUP on the OUTREC, PUSHing that existing ID to a new position, then on OUTFIL you can INCLUDE those where the two IDs are equal (or OMIT when they are not equal).

The GROUPs have to be done with the RESTART as before, because there is no KEYBEGIN in Syncsort.

Note that contiguous error messages on the input file will be included as one GROUP (you'd find that in your testing, I hope). If that is not what you want, then you'd need to include the error message presence in the initial GROUP identification.

by **ibmmf4u** » Fri Aug 10, 2012 12:22 pm

Hi Bill,

Sorry for the delayed reply.

I tried coding per your instructions but got stuck at the 3rd and 4th scenarios:

"a GROUP on the OUTREC, PUSHing that existing ID to a new position,"

"The GROUPs have to be done with the RESTART as before"

Pasted below is the piece of code and their outcome in each scenario.

Source code:-

//SYSIN DD *
  INREC IFTHEN=(WHEN=INIT,
                OVERLAY=(41:SEQNUM,8,ZD,
                         RESTART=(1,8)))
  SORT FIELDS=COPY

Output:-

00001234 TEST1 TESTING 00000001
00001234 DESCRIPTION 00000002
00001234 TESTDUPL 00000003
00003456 TYPE2 TESTER2 00000001
00003456 DESC1 00000002
00003456 DESC2 00000003
00003456 DESC3 00000004
00004567 ERROR1 ERROR DESC1 00000001
00004567 DESC2 00000002
00004567 DESC3 00000003
00001234 TEST1 TESTER1 00000001
00001234 EXAMPLE1 00000002
00003456 TYPE2 TESTER2 00000001
00003456 EXAMPLE2 00000002
00004567 ERROR1 DESC LINE1 00000001
00004567 DESC LINE2 00000002
00001234 TEST1 TESTSAMP 00000001
00001234 SAMPLE1 00000002

Source code:-

//SYSIN DD *
  INREC IFTHEN=(WHEN=INIT,
                OVERLAY=(41:SEQNUM,8,ZD,
                         RESTART=(1,8))),
        IFTHEN=(WHEN=GROUP,
                BEGIN=(41,8,CH,EQ,C'00000001'),
                PUSH=(51:1,8,60:ID=1))
  SORT FIELDS=COPY
//*

Output:-

00001234 TEST1 TESTING 00000001 00001234 1
00001234 DESCRIPTION 00000002 00001234 1
00001234 TESTDUPL 00000003 00001234 1
00003456 TYPE2 TESTER2 00000001 00003456 2
00003456 DESC1 00000002 00003456 2
00003456 DESC2 00000003 00003456 2
00003456 DESC3 00000004 00003456 2
00004567 ERROR1 ERROR DESC1 00000001 00004567 3
00004567 DESC2 00000002 00004567 3
00004567 DESC3 00000003 00004567 3
00001234 TEST1 TESTER1 00000001 00001234 4
00001234 EXAMPLE1 00000002 00001234 4
00003456 TYPE2 TESTER2 00000001 00003456 5
00003456 EXAMPLE2 00000002 00003456 5
00004567 ERROR1 DESC LINE1 00000001 00004567 6
00004567 DESC LINE2 00000002 00004567 6
00001234 TEST1 TESTSAMP 00000001 00001234 7
00001234 SAMPLE1 00000002 00001234 7
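
The two columns added above can be modelled in a few lines of Python, which may make it easier to see why the same key gets several IDs. This is only a sketch of the behaviour shown in the output: the sequence number restarts whenever the key differs from the previous record, and a new group ID is issued each time the sequence number is 1. The name `tag_groups` is hypothetical.

```python
def tag_groups(records, key_len=8):
    """Return (record, seq, group_id) tuples in input order.

    seq models SEQNUM with RESTART on the key: it goes back to 1 whenever
    the key differs from the previous record's key.
    group_id models WHEN=GROUP with BEGIN on seq == 1 and PUSH of ID.
    """
    tagged = []
    prev_key = None
    seq = 0
    group_id = 0
    for rec in records:
        key = rec[:key_len]
        if key == prev_key:
            seq += 1
        else:
            seq = 1
            group_id += 1  # a new group starts on every key change
        prev_key = key
        tagged.append((rec, seq, group_id))
    return tagged


demo = [
    "00001234 TEST1 TESTING",
    "00001234 DESCRIPTION",
    "00003456 TYPE2 TESTER2",
    "00001234 TEST1 TESTER1",
    "00001234 EXAMPLE1",
]

for rec, seq, group_id in tag_groups(demo):
    print(f"{rec:<30}{seq:08d} {group_id}")
```

Because the ID changes only when the key changes, two different errors for the same transaction that sit next to each other in the input end up in one group.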

I am not sure whether I interpreted your statements correctly. Kindly help me achieve this.

I am sorry once again for the delayed response.

Thanks in advance!!!

by **BillyBoyo** » Fri Aug 10, 2012 1:49 pm

That's OK as far as it goes. You do not need to have the 1,8 PUSHed, as it is already on each record.

Now you SORT, with EQUALS, on 1,8 and the ID.

On OUTFIL, define another GROUP (same technique) and PUSH the ID to another position, so you have two of them:

11111111 1 1
11111111 1 1
11111111 2 1
11111111 2 1

Then in INCLUDE/OMIT on the OUTFIL, you can test first ID against second. If they are equal, that is the group you let through.

If you use an ID of length 1, you will get trouble with more than 10 groups of errors.

You need to look at the "contiguous" error messages on your input file, which will be treated as one group.
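
Putting the steps from this post together, the whole approach can be sketched in Python as a way to check the logic against the sample data before coding the sort cards. It is a model under stated assumptions, not Syncsort syntax: the function name `first_group_per_key` is invented, the key is taken as the first 8 characters, and the group ID is an unbounded integer, so the ID-length problem mentioned above does not show up here.

```python
def first_group_per_key(records, key_len=8):
    """Keep only the first contiguous group of records for each key."""
    # Step 1 (INREC): give every run of consecutive same-key records an ID.
    tagged = []
    prev_key = None
    group_id = 0
    for rec in records:
        key = rec[:key_len]
        if key != prev_key:
            group_id += 1
        prev_key = key
        tagged.append((key, group_id, rec))

    # Step 2 (SORT with EQUALS): order by key, then by group ID.
    # list.sort() is stable, so lines inside a group keep their order.
    tagged.sort(key=lambda t: (t[0], t[1]))

    # Step 3 (OUTFIL): remember the ID of the first record of each key
    # (the second, PUSHed ID) and keep records whose own ID matches it.
    kept = []
    prev_key = None
    first_id = None
    for key, gid, rec in tagged:
        if key != prev_key:
            first_id = gid
            prev_key = key
        if gid == first_id:
            kept.append(rec)
    return kept


report_input = [
    "00001234 TEST1 TESTING",
    "00001234 DESCRIPTION",
    "00001234 TESTDUPL",
    "00003456 TYPE2 TESTER2",
    "00003456 DESC1",
    "00003456 DESC2",
    "00003456 DESC3",
    "00004567 ERROR1 ERROR DESC1",
    "00004567 DESC2",
    "00004567 DESC3",
    "00001234 TEST1 TESTER1",
    "00001234 EXAMPLE1",
    "00003456 TYPE2 TESTER2",
    "00003456 EXAMPLE2",
    "00004567 ERROR1 DESC LINE1",
    "00004567 DESC LINE2",
    "00001234 TEST1 TESTSAMP",
    "00001234 SAMPLE1",
]

for rec in first_group_per_key(report_input):
    print(rec)
```

Run against the 18-record sample from earlier in the thread, this keeps the ten records listed as the expected output. The contiguous-groups caveat still applies: two error groups for the same key that are adjacent in the input are treated as one.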

Removing duplicates!!!
