Find Duplicate records using cobol internal sort

Posted: Thu Nov 05, 2009 5:59 pm
by Jikesh Patel
I have a file in NACHA format and I need to find duplicate records. The reason I want to use a COBOL internal sort is that a NACHA-formatted file has several layers: file header, batch header, detail records, and so on. I need to find the duplicates that belong to a particular batch header. My plan is to extract two fields from the batch header, plug them into the detail records under that batch header, and then look for duplicates, so that each duplicate is identified uniquely within its batch.

Re: Find Duplicate records using cobol internal sort

Posted: Fri Nov 06, 2009 1:34 am
by dick scherrer
Hello,

You need to post some sample input data and the output you want from that sample input. You do not need to use "full size" sample data - just enough to show the requirement and the various conditions that might exist in the real input later. Show some with duplicates and some without and explain the processing rules.

Once you have shown the data and explained the rules, someone should be able to offer a suggestion on how to get the desired result.

Re: Find Duplicate records using cobol internal sort

Posted: Mon Nov 16, 2009 6:16 pm
by Jikesh Patel
Thanks, Dick, for your quick reply.

Here are the sample data and the expected result.

INPUT FILE:
101 321171184 6910001340910301446C094101BANKNAME  FSB          ASF APPLICATION SUPERVI
5200VALLEY CREDIT UNACH ORIG            1321171773PPDACH ORIG  OCT 300910303061321171770000001
62732516117440048225342      0000060523040000000156534LE,SEAN                 0321171770000007
62732516117440048225342      0000060523040000000156534LE,SEAN                 0321171770000007
820000000100321171180000000809270000000000001321171773                         321171770000001
5225BANKNAME LOAN   ACH                 5251450340PPDEZ-PAY    4049590910293061101006690000059
6262710708015900075792       0000011440000002714562697STEELE, WANDA           1101006691385075
735D02273050302730274      40402639                                            304046691385075
822500000200271070800000000114400000000000009221400000                         101006690000059
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824
710MES030402010203120304                      PATRICK GERVAIS                          4834032


UNIQUE ENTRY RECORD:
62732516117440048225342      0000060523040000000156534LE,SEAN                 0321171770000007
6262710708015900075792       0000011440000002714562697STEELE, WANDA           1101006691385075
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824


DUPLICATE ENTRY RECORD:
62732516117440048225342      0000060523040000000156534LE,SEAN                 0321171770000007
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824
6475151433349771964865       4000045444CPL RTL NRG PMTNANYEN CHOU           S 4143303428292824


You will need the NACHA format to understand what each field means; if you don't have it, I can provide it.
Also, XSUM is not working in my shop, so I cannot use the sort utility here, but we do have ICETOOL. I tried everything I know in ICETOOL but didn't get the result.

Thanks a lot for all your kind help... :)

Re: Find Duplicate records using cobol internal sort

Posted: Tue Nov 17, 2009 12:18 am
by dick scherrer
Hello,

For future posts, suggest you become familiar/comfortable with the "Code" tag - your posted data has now been "Code"d. There is a "Preview" feature so you can see your post as it will appear to the forum (rather than only in the Reply Editor). When you have Previewed and are satisfied with the post, Submit.

You need to post the "rules" for processing that input to get the posted outputs. Mention the relevant data positions and what data should be included in each output file and why. Also mention the recfm and lrecl of each file.

If you are trying to use xsum and ICETOOL, how is this related to a cobol internal sort? Please verify which sort product is used on your system and what release/ptf level.

Re: Find Duplicate records using cobol internal sort

Posted: Tue Nov 17, 2009 1:17 pm
by Jikesh Patel
Thanks Dick,

I will take care to use the Code tag and the Preview option before submitting.

Here is a sample that is different from the example given above.
Input file with recfm=fb, lrecl=20.
The fields are:
1rdfi_no bank_name --> First digit 1 indicates a file header rec
5bank_id Mics_reco  --> First digit 5 indicates a batch header rec
6trno Accnt_no amt  --> First digit 6 indicates a detail rec

The Input file records are:
12343454 BANK_Name
5sd34245 rec
62345 12345678 345
62345 12345678 345
62345 12345678 345
62445 12343658 355
5er32345 rec
62446 12343653 455
62446 32342653 555
62446 12343653 455


Output:
1) Unique Record file with recfm=fb,lrecl=40
rdfi_no, bank_id, trno, Accnt_no, amt
2343454, sd34245, 23,   12345678, 345
2343454, sd34245, 24,   12343658, 355
2343454, er32345, 24,   12343653, 455
2343454, er32345, 24,   32342653, 555

2) Duplicate record file with recfm=fb,lrecl=40
rdfi_no, bank_id, trno, Accnt_no, amt
2343454, sd34245, 23,   12345678, 345
2343454, sd34245, 23,   12345678, 345
2343454, er32345, 24,   12343653, 455


The rules to get this result are:
1) Plug rdfi_no and bank_id from the file header and batch header into each detail record.
2) Find duplicates after the above step (every field must match, i.e. the full lrecl).
3) Create two files: one holding a single entry for each record, the other holding only the duplicates from the second occurrence onward. For example, if a record occurs 4 times in the input file, write one entry to the unique record file and 3 entries to the duplicate file.

My approach:
Step 1: I wanted to create an intermediate file using a COBOL internal sort, then release only the duplicate records to the output file and write all other records to the unique record file, but that did not seem possible. So I prepared the intermediate file with simple COBOL code; it looks like this:
rdfi_no, bank_id, trno, Accnt_no, amt
2343454, sd34245, 23,   12345678, 345
2343454, sd34245, 23,   12345678, 345
2343454, sd34245, 23,   12345678, 345
2343454, sd34245, 24,   12343658, 355
2343454, er32345, 24,   12343653, 455
2343454, er32345, 24,   32342653, 555
2343454, er32345, 24,   12343653, 455

Step 2: Create the two output files shown above: 1) the unique record file and 2) the duplicate record file.
--> This is where I am stuck, because XSUM is not working in my shop and I could not find an ICETOOL option to separate the duplicates from the second occurrence onward.


Thanks for all your kind help. :)

Re: Find Duplicate records using cobol internal sort

Posted: Wed Nov 18, 2009 2:52 am
by dick scherrer
Hello,

"Create an intermediate file using cobol internal sort and then release only duplicate record to output file and write all other records to unique record file but its not possible"

Uhh. . . I believe it is possible. I also believe that both outputs could be written directly rather than going through an intermediate file. Possibly there is something I misunderstand. . .

I would define the sort file (SD) with all of the control fields as the key, and in the OUTPUT PROCEDURE I would compare the current record to the previous record; if it is a duplicate, write it to output 2. The first record for each "control" would go to output 1.

As I said, I may be misunderstanding something. . .
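
A minimal sketch of the compare-to-previous idea described above, not a tested solution. It assumes the header fields have already been plugged into each detail record, giving 40-byte fixed records as in the posted output layout, and it treats the whole record as the sort key. The program name DUPSPLIT and the DD names INFILE, SORTWK, UNIQOUT and DUPOUT are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DUPSPLIT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * DD names INFILE, SORTWK, UNIQOUT and DUPOUT are hypothetical.
           SELECT IN-FILE   ASSIGN TO INFILE.
           SELECT SORT-FILE ASSIGN TO SORTWK.
           SELECT UNIQ-FILE ASSIGN TO UNIQOUT.
           SELECT DUP-FILE  ASSIGN TO DUPOUT.
       DATA DIVISION.
       FILE SECTION.
       FD  IN-FILE RECORDING MODE IS F.
       01  IN-REC               PIC X(40).
       SD  SORT-FILE.
       01  SORT-REC.
      * Assumption: every byte of the record is part of the key.
           05  SORT-KEY         PIC X(40).
       FD  UNIQ-FILE RECORDING MODE IS F.
       01  UNIQ-REC             PIC X(40).
       FD  DUP-FILE RECORDING MODE IS F.
       01  DUP-REC              PIC X(40).
       WORKING-STORAGE SECTION.
      * HIGH-VALUES: the first record never matches as a duplicate.
       01  WS-PREV-KEY          PIC X(40) VALUE HIGH-VALUES.
       01  WS-SORT-SW           PIC X     VALUE 'N'.
           88  SORT-EOF                   VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
           SORT SORT-FILE
               ON ASCENDING KEY SORT-KEY
               USING IN-FILE
               OUTPUT PROCEDURE IS 2000-SPLIT
           STOP RUN.
       2000-SPLIT.
           OPEN OUTPUT UNIQ-FILE DUP-FILE
           RETURN SORT-FILE
               AT END SET SORT-EOF TO TRUE
           END-RETURN
           PERFORM UNTIL SORT-EOF
      * Same key as the previous record: second or later occurrence.
               IF SORT-KEY = WS-PREV-KEY
                   WRITE DUP-REC FROM SORT-REC
               ELSE
      * New key: first occurrence goes to the unique file.
                   WRITE UNIQ-REC FROM SORT-REC
                   MOVE SORT-KEY TO WS-PREV-KEY
               END-IF
               RETURN SORT-FILE
                   AT END SET SORT-EOF TO TRUE
               END-RETURN
           END-PERFORM
           CLOSE UNIQ-FILE DUP-FILE.
```

With this shape, a record that occurs 4 times in the input would be written once to the unique file and 3 times to the duplicate file, because the previous key is only updated when a new key is seen.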

Re: Find Duplicate records using cobol internal sort

Posted: Wed Nov 18, 2009 12:14 pm
by Jikesh Patel
Thanks dick,

That's a very good approach you have suggested.
But as you said, there is something you misunderstood:
--> The key of each record lies not only on that record but also on other records. Let me explain: you first need to combine the required fields from the file header, batch header and detail record, and then sort on all of those fields.
Input file:
12343454BANK_Name   --> file header; holds part of the duplicate key (12343454), common to all following records until the next file header
5sd34245rec         --> batch header; holds part of the duplicate key (5sd34245), common to all following records until the next batch header
62345 12345678 345  --> detail record; holds the rest of the duplicate key
62345 12345678 345
62345 12345678 345
62445 12343658 355
5er32345 rec
62446 12343653 455
62446 32342653 555
62446 12343653 455

output file contains records from file header, batch header and detail record
rdfi_no, bank_id, trno, Accnt_no, amt
2343454, sd34245,23,   12345678, 345 -->contains one field from file header,batch header and detail record
2343454, sd34245,24,   12343658, 355
2343454, er32345,24,   12343653, 455
2343454, er32345,24,   32342653, 555


--> The second point is: how do I create two files from a single file using a COBOL internal sort?

I hope my requirement is clear now. I am sure I will get a couple of very good suggestions.

Thanks for all your kind help. :)

Re: Find Duplicate records using cobol internal sort

Posted: Thu Nov 19, 2009 1:02 am
by dick scherrer
Hello,

Yes, as I mentioned, build the sort key using all of the necessary fields from whatever record they come from. One record in the sort-file will have control fields from all 3 records. After the control fields, I'd use a "record type" to identify which header or detail record this sort record was built from. The sort-file would contain all of the data from the original records (if anything other than the control info is needed from a header). Write the appropriate output(s) depending on whether this is a control break or a duplicate.

So far, I see no reason this will not do what you want . . . Where do you see this approach not doing what you want?

Re: Find Duplicate records using cobol internal sort

Posted: Thu Nov 19, 2009 1:05 pm
by Jikesh Patel
Thanks dick,

Wow dick, that's a very good suggestion. I never thought of that.
I tried to code it according to your suggestions and am pasting it below; it has only the logic part.
This is the very first time I am using a COBOL internal sort, so please feel free to correct my mistakes.
I haven't run the code yet, as there are millions of records in the input file.
      * ENVIRONMENT DIVISION and SELECT clauses are not shown.
       FILE SECTION.
       FD  INPUTFILE.
       01  INPUT-REC.
           05  FILLER          PIC X(20).
       SD  SORTTEST.
       01  SORT-REC.
           05  RDFI-NO         PIC 9(7).
           05  BANK-ID         PIC X(7).
           05  TRNO            PIC 9(2).
           05  ACCNT-NO        PIC X(8).
           05  AMNT            PIC 9(3).
       FD  OUTPUT1.
       01  OUTPUT1-REC.
           05  O1RDFI-NO       PIC 9(7).
           05  O1BANK-ID       PIC X(7).
           05  O1TRNO          PIC 9(2).
           05  O1ACCNT-NO      PIC X(8).
           05  O1AMNT          PIC 9(3).
       FD  OUTPUT2.
       01  OUTPUT2-REC.
           05  O2RDFI-NO       PIC 9(7).
           05  O2BANK-ID       PIC X(7).
           05  O2TRNO          PIC 9(2).
           05  O2ACCNT-NO      PIC X(8).
           05  O2AMNT          PIC 9(3).
       WORKING-STORAGE SECTION.
      * Header fields carried forward onto every detail record.
       01  WS-REC.
           05  WS-RDFI-NO      PIC 9(7).
      * Bank id is alphanumeric in the sample data (e.g. sd34245).
           05  WS-BANK-ID      PIC X(7).
           05  WS-DETAIL       PIC X(13).
       01  WS-SORT-REC         PIC X(27).
      * HIGH-VALUES: the first record never matches as a duplicate.
       01  WS-PREV-SORT        PIC X(27) VALUE HIGH-VALUES.
       01  WS-INPUT-SW         PIC X     VALUE 'N'.
           88  EOF-INPUT                 VALUE 'Y'.
       01  WS-SORT-SW          PIC X     VALUE 'N'.
           88  EOF-SORT                  VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
           SORT SORTTEST
               ON ASCENDING KEY RDFI-NO BANK-ID TRNO ACCNT-NO AMNT
               INPUT PROCEDURE IS 1000-INPUT
               OUTPUT PROCEDURE IS 1000-OUTPUT
      * Stop here so control does not fall through into 1000-INPUT.
           STOP RUN.
       1000-INPUT.
           OPEN INPUT INPUTFILE
           READ INPUTFILE
               AT END SET EOF-INPUT TO TRUE
           END-READ
           PERFORM UNTIL EOF-INPUT
      * EVALUATE keeps the next READ outside the record-type test.
               EVALUATE INPUT-REC(1:1)
                   WHEN '1'
                       MOVE INPUT-REC(2:7) TO WS-RDFI-NO
                   WHEN '5'
                       MOVE INPUT-REC(2:7) TO WS-BANK-ID
                   WHEN '6'
                       MOVE INPUT-REC(2:13) TO WS-DETAIL
                       MOVE WS-REC TO SORT-REC
                       RELEASE SORT-REC
               END-EVALUATE
               READ INPUTFILE
                   AT END SET EOF-INPUT TO TRUE
               END-READ
           END-PERFORM
           CLOSE INPUTFILE.
       1000-OUTPUT.
           OPEN OUTPUT OUTPUT1 OUTPUT2
           RETURN SORTTEST
               AT END SET EOF-SORT TO TRUE
           END-RETURN
           PERFORM UNTIL EOF-SORT
               MOVE SORT-REC TO WS-SORT-REC
               IF WS-SORT-REC = WS-PREV-SORT
                   WRITE OUTPUT2-REC FROM WS-SORT-REC
               ELSE
                   WRITE OUTPUT1-REC FROM WS-SORT-REC
      * Remember the first occurrence for the next comparison.
                   MOVE WS-SORT-REC TO WS-PREV-SORT
               END-IF
               RETURN SORTTEST
                   AT END SET EOF-SORT TO TRUE
               END-RETURN
           END-PERFORM
           CLOSE OUTPUT1 OUTPUT2.


Thanks for all your kind help. :)

Re: Find Duplicate records using cobol internal sort

Posted: Fri Nov 20, 2009 12:53 am
by dick scherrer
Hello,

Suggest you add a bit of code to stop reading after 100 records or copy the first 100 records to a test file.

Later I will try to look over the code, but that won't be for several hours.

Good luck :)

d
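
One way the "copy the first 100 records to a test file" suggestion might look, as a minimal sketch only. It assumes the 20-byte fixed-length input from the sample; the program name COPY100 and the DD names INFILE and TESTOUT are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COPY100.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * DD names INFILE and TESTOUT are hypothetical.
           SELECT IN-FILE   ASSIGN TO INFILE.
           SELECT TEST-FILE ASSIGN TO TESTOUT.
       DATA DIVISION.
       FILE SECTION.
       FD  IN-FILE RECORDING MODE IS F.
       01  IN-REC               PIC X(20).
       FD  TEST-FILE RECORDING MODE IS F.
       01  TEST-REC             PIC X(20).
       WORKING-STORAGE SECTION.
       01  WS-MAX-RECS          PIC 9(4) VALUE 100.
       01  WS-REC-COUNT         PIC 9(4) VALUE ZERO.
       01  WS-INPUT-SW          PIC X    VALUE 'N'.
           88  INPUT-EOF                 VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
           OPEN INPUT  IN-FILE
                OUTPUT TEST-FILE
      * Stop at end of file or once the record limit is reached.
           PERFORM UNTIL INPUT-EOF
                   OR WS-REC-COUNT >= WS-MAX-RECS
               READ IN-FILE
                   AT END
                       SET INPUT-EOF TO TRUE
                   NOT AT END
                       WRITE TEST-REC FROM IN-REC
                       ADD 1 TO WS-REC-COUNT
               END-READ
           END-PERFORM
           CLOSE IN-FILE TEST-FILE
           DISPLAY 'RECORDS COPIED: ' WS-REC-COUNT
           STOP RUN.
```

Because the cut-off is a plain record count, the test file may end in the middle of a batch; that should not matter for checking the duplicate logic, since the file header and batch header records come first.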