how to remove duplicates



Support for OS/VS COBOL, VS COBOL II, COBOL for OS/390 & VM and Enterprise COBOL for z/OS


Postby pmerc8888 » Thu Dec 11, 2008 12:51 am

Greetings!

I wonder if someone can help. I have the following arrays:

05 WS-TABLE-1  OCCURS  20 TIMES.
   10 WS-T1-ID.
      15 WS-T1-CAR-CODE    PIC X(03).
      15 WS-T1-SER-NUM     PIC X(04).
      15 WS-T1-DEL-DATE    PIC X(06).
   10 WS-T1-DPT-IND        PIC X(01).
   10 WS-T1-EQP-CAT        PIC X(01).
   10 WS-T1-A-MIN          PIC 9(04).
   10 WS-T1-DUP-FL         PIC X(01).

05 WS-TABLE-2  OCCURS  20 TIMES.
   10 WS-T2-ID.
      15 WS-T2-CAR-CODE    PIC X(03).
      15 WS-T2-SER-NUM     PIC X(04).
      15 WS-T2-DEL-DATE    PIC X(06).
   10 WS-T2-DPT-IND        PIC X(01).
   10 WS-T2-EQP-CAT        PIC X(01).
   10 WS-T2-A-MIN          PIC 9(04).
   10 WS-T2-DUP-FL         PIC X(01).


Actually, the contents of WS-TABLE-2 are exactly the same as those of WS-TABLE-1. I just created it in case I could use it to match against WS-TABLE-1.

Anyway, the thing is, I need to produce an array that looks like WS-TABLE-1 but contains only unique occurrences, with WS-T1-DUP-FL (subscript) = "Y" wherever a duplicate was found. Here's what I mean:

Input:
AAA1111081203NA0010
BBB1111081203NA0010
AAA1111081203NA0010
CCC1111081203NA0010
EEE1111081203NA0010
DDD1111081203NA0010
EEE1111081203NA0010
PPP1111081203NA0010

Output:
AAA1111081203NA0010Y
BBB1111081203NA0010
CCC1111081203NA0010
EEE1111081203NA0010Y
DDD1111081203NA0010
PPP1111081203NA0010

Thanks for your time.

Kind regards,
pmerc
pmerc8888
 
Posts: 16
Joined: Thu Aug 14, 2008 4:06 pm
Location: China
Has thanked: 0 time
Been thanked: 0 time

Re: how to remove duplicates

Postby dick scherrer » Thu Dec 11, 2008 3:03 am

Hello,

Where does the data that you are loading into the table originate? Does it come from some external file that is read by the program?

You should not need 2 arrays.

One easy thing to do is sort the array entries before this process begins. Then, when loading into the array, simply check whether the entry being processed is equal to the one just processed; if so, set the duplicate indicator to Y but do not move the entry to the array.
Hope this helps,
d.sch.
dick scherrer
Global moderator
 
Posts: 6268
Joined: Sat Jun 09, 2007 8:58 am
Has thanked: 3 times
Been thanked: 93 times

Re: how to remove duplicates

Postby pmerc8888 » Thu Dec 11, 2008 4:48 am

Greetings Dick,

Thank you kindly for the prompt reply.

The data in WS-TABLE-1 comes from different sources and is a result of several processes at different stages of the program.

Thanks for the idea of sorting. Hope you don't mind if I ask some more questions. Do you see the need for an INPUT and/or OUTPUT PROCEDURE in the SORT? Also, when you said "do not move the entry to the array", you're referring to the DUP, right? Would I be correct in thinking that I would move the "first occurrence" to the array after setting the indicator when the "keys" change? And then I compare that "new key" to the "key" of the next occurrence, and so on and so forth?

Kind regards,
pmerc

Re: how to remove duplicates

Postby dick scherrer » Thu Dec 11, 2008 5:57 am

Hello,

"is a result of several processes at different stages of the program"
Yup, time for 2 tables :)

Then (as you cannot easily control the populating of the table), I would let the code continue as it does now and build the table.

When you get to the point that you need to remove the duplicates, you could use the internal sort with both an input and an output procedure (in answer to your question about "an INPUT and/or OUTPUT PROCEDURE in the SORT"). The input procedure would release the individual table 1 entries to the sort. The output procedure would return the sorted entries and load table 2, marking duplicate entries and discarding the duplicate records rather than loading them into the table.

When you return a value compare it to the current table 2 entry. If it is equal, set the indicator and get the next sorted record. If the value is not the same as the current table 2 entry, increment the position in table 2 and move in the new value. On the first return, you need to "seed" the first entry as there is nothing to compare.
Hope this helps,
d.sch.
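
The compare-and-load step described above can be sketched as a small stand-alone program. This is only an illustration under stated assumptions, not code from the thread: the entries are taken to be already in sorted order (the sample rows from the first post are hard-coded that way), WS-TABLES is a hypothetical 01 level added to hold the two tables, WS-T1-LIMIT is assumed to hold the number of loaded entries, and reference modification (1:19) is used to compare everything except the flag byte.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEDUPSEQ.
      * Sketch: collapse duplicates from an already-sorted table.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Sample rows from the first post, pre-sorted (assumption).
       01  WS-SAMPLE-DATA.
           05 FILLER PIC X(19) VALUE 'AAA1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'AAA1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'BBB1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'CCC1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'DDD1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'EEE1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'EEE1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'PPP1111081203NA0010'.
       01  FILLER REDEFINES WS-SAMPLE-DATA.
           05 WS-SAMPLE-ROW PIC X(19) OCCURS 8 TIMES.
      * WS-TABLES is a hypothetical 01 level for both tables.
       01  WS-TABLES.
           05 WS-TABLE-1 OCCURS 20 TIMES.
              10 WS-T1-ID.
                 15 WS-T1-CAR-CODE  PIC X(03).
                 15 WS-T1-SER-NUM   PIC X(04).
                 15 WS-T1-DEL-DATE  PIC X(06).
              10 WS-T1-DPT-IND      PIC X(01).
              10 WS-T1-EQP-CAT      PIC X(01).
              10 WS-T1-A-MIN        PIC 9(04).
              10 WS-T1-DUP-FL       PIC X(01).
           05 WS-TABLE-2 OCCURS 20 TIMES.
              10 WS-T2-ID.
                 15 WS-T2-CAR-CODE  PIC X(03).
                 15 WS-T2-SER-NUM   PIC X(04).
                 15 WS-T2-DEL-DATE  PIC X(06).
              10 WS-T2-DPT-IND      PIC X(01).
              10 WS-T2-EQP-CAT      PIC X(01).
              10 WS-T2-A-MIN        PIC 9(04).
              10 WS-T2-DUP-FL       PIC X(01).
       01  WS-COUNTERS.
           05 WS-S1                 PIC 9(02) COMP.
           05 WS-S2                 PIC 9(02) COMP VALUE ZERO.
           05 WS-T1-LIMIT           PIC 9(02) COMP VALUE 8.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Load the sample rows; the flag byte is padded with a space.
           PERFORM VARYING WS-S1 FROM 1 BY 1
                   UNTIL WS-S1 > WS-T1-LIMIT
              MOVE WS-SAMPLE-ROW (WS-S1) TO WS-TABLE-1 (WS-S1)
           END-PERFORM
      * Seed the first entry, then compare each entry with the
      * current table 2 entry: equal sets the flag, unequal loads.
           PERFORM VARYING WS-S1 FROM 1 BY 1
                   UNTIL WS-S1 > WS-T1-LIMIT
              IF WS-S2 = ZERO
                 PERFORM 1000-ADD-ENTRY
              ELSE
                 IF WS-TABLE-1 (WS-S1) (1:19)
                       = WS-TABLE-2 (WS-S2) (1:19)
                    MOVE 'Y' TO WS-T2-DUP-FL (WS-S2)
                 ELSE
                    PERFORM 1000-ADD-ENTRY
                 END-IF
              END-IF
           END-PERFORM
      * Show the de-duplicated table.
           PERFORM VARYING WS-S1 FROM 1 BY 1 UNTIL WS-S1 > WS-S2
              DISPLAY WS-TABLE-2 (WS-S1)
           END-PERFORM
           GOBACK.
      * Append the current table 1 entry to table 2, flag cleared.
       1000-ADD-ENTRY.
           ADD 1 TO WS-S2
           MOVE WS-TABLE-1 (WS-S1) TO WS-TABLE-2 (WS-S2)
           MOVE SPACE TO WS-T2-DUP-FL (WS-S2).
```

Because the input is sorted, the surviving entries come out in key order rather than in the order shown in the first post.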

Re: how to remove duplicates

Postby pmerc8888 » Thu Dec 11, 2008 5:39 pm

Greetings Dick,

Once again, thank you kindly for the reply.

I shall work on the code and let you know how it goes.

Kindest regards,
pmerc

Re: how to remove duplicates

Postby pmerc8888 » Fri Dec 12, 2008 12:16 am

Hi again, Dick... I hope you don't mind some follow up queries. Please bear with me... I have VERY little experience using internal sort.

Following are my questions:

1. Am I correct in thinking that I still need to have an SD statement for the sort file even if this program is a subroutine which is called by both an online and a batch driver? If so, would I be correct in thinking that in the INPUT-OUTPUT SECTION I need to have something like:

SELECT SORTWORK ASSIGN SORTWK

Also, would the SD have to look something like the following?

SD  SORTWORK.   
               
01  SORT-RECORD.
    05 S-ITEM  OCCURS  20 TIMES.             
       10 S-ID.                           
          15 S-CAR-CODE            PIC X(03).
          15 S-SER-NUM             PIC X(04).
          15 S-DEL-DATE            PIC X(06).
       10 S-DPT-IND                PIC X(01).
       10 S-EQP-CAT                PIC X(01).
       10 S-A-MIN                  PIC 9(04).
       10 S-DUP-FL                 PIC X(01).


2. Would you kindly look at the following and let me know whether this is more or less how it should look:

7700-SORT-TABLE-1 SECTION.                                   
                                                             
    SORT SORTWORK                                         
      ON ASCENDING KEY S-ID                       
                       S-DPT-IND                         
                       S-EQP-CAT                       
                       S-A-MIN                       
      INPUT PROCEDURE IS 7730-SORT-INPUT-PROC               
      OUTPUT PROCEDURE IS 7750-SORT-OUTPUT-PROC.             
                                                             
7700-EXIT.                                                   
    EXIT.                                                   


7730-SORT-INPUT-PROC SECTION.                               
                                                             
    PERFORM VARYING WS-S1 FROM 1 BY 1                         
      UNTIL WS-S1 > WS-T1-LIMIT                               
                                                             
      MOVE WS-TABLE-1 (WS-S1)           TO S-ITEM (WS-S1)
      RELEASE SORT-RECORD                                                             
    END-PERFORM.                                             
                                                             
7730-EXIT.                                                   
    EXIT.                                                     


7750-SORT-OUTPUT-PROC SECTION.                                 
                                                               
    MOVE ZERO                           TO WS-S2.               
    MOVE 'N'                            TO                     
                                        WS-NO-MORE-SRTD-ITEM-SW.
    RETURN SORTWORK                                             
      AT END WS-NO-MORE-SRTD-ITEM                               
      NOT AT END MOVE 'Y'               TO WS-FIRST-TIME-SW.   
                                                               
    PERFORM VARYING WS-S1 FROM 1 BY 1                           
      UNTIL WS-NO-MORE-SRTD-ITEM                               
                                                               
      IF WS-FIRST-TIME                                         
        ADD 1                           TO WS-S2               
        MOVE S-ITEM (WS-S1)             TO WS-TABLE-2 (WS-S2)   
        MOVE 'N'                        TO WS-FIRST-TIME       
      ELSE                                                     
        IF  S-ID      (WS-S1)  =  WS-T2-ID      (WS-S2)     
        AND S-DPT-IND (WS-S1)  =  WS-T2-DPT-IND (WS-S2)     
        AND S-EQP-CAT (WS-S1)  =  WS-T2-EQP-CAT (WS-S2)   
        AND S-A-MIN   (WS-S1)  =  WS-T2-A-MIN   (WS-S2)     
           MOVE 'Y'                     TO WS-T2-DUP-FL (WS-S2)
        ELSE                                                   
           ADD 1                        TO WS-S2               
           MOVE S-ITEM (WS-S1)          TO WS-TABLE-2 (WS-S2)   
        END-IF                                                 
                                                               
        RETURN SORTWORK                                         
          AT END WS-NO-MORE-SRTD-ITEM                           
                                                               
      END-IF                                                   
                                                               
    END-PERFORM.                                               
                                                               
7750-EXIT.                                                     
    EXIT.                                                       

Re: how to remove duplicates

Postby dick scherrer » Fri Dec 12, 2008 1:15 am

Hello,

Whoa :)
"even if this program is a subroutine which is called by both an online and a batch driver?"
What is "online" here? CICS? If so, this will not work for you.

If you want to continue with the batch process, you would need an SD to invoke the internal sort.

The SD as defined will not work because it should not contain an array; it should describe the fields for one occurrence of the array. Once the original array is loaded, it would be processed 1 entry at a time, releasing the records to the sort (input procedure). The output procedure would do as I mentioned earlier and return records from the sort, loading them into table 2 while marking the duplicate indicator and discarding the duplicate entries.

Most systems have separate called routines for CICS and batch. You might check with your system support to find out what the proper method is on your system.
Hope this helps,
d.sch.
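
As a rough sketch of the batch approach described above, the program below defines the SD record as a single occurrence, releases one table entry per record in the input procedure, and loads table 2 from the output procedure. It is an illustration under assumptions, not tested on z/OS: the tables are condensed to a 19-byte data area plus the flag, WS-TABLES and paragraph 7760-ADD-ENTRY are hypothetical names, and the sample rows from the first post are hard-coded. The SORTWORK, S- and 77nn names follow the code posted earlier in the thread.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEDUPSRT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORTWORK ASSIGN TO SORTWK.
       DATA DIVISION.
       FILE SECTION.
       SD  SORTWORK.
      * One table entry per sort record: no OCCURS here.
       01  SORT-RECORD.
           05 S-ID.
              10 S-CAR-CODE     PIC X(03).
              10 S-SER-NUM      PIC X(04).
              10 S-DEL-DATE     PIC X(06).
           05 S-DPT-IND         PIC X(01).
           05 S-EQP-CAT         PIC X(01).
           05 S-A-MIN           PIC 9(04).
           05 S-DUP-FL          PIC X(01).
       WORKING-STORAGE SECTION.
      * Sample rows from the first post, in their original order.
       01  WS-SAMPLE-DATA.
           05 FILLER PIC X(19) VALUE 'AAA1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'BBB1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'AAA1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'CCC1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'EEE1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'DDD1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'EEE1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'PPP1111081203NA0010'.
       01  FILLER REDEFINES WS-SAMPLE-DATA.
           05 WS-SAMPLE-ROW PIC X(19) OCCURS 8 TIMES.
      * Condensed tables (assumption): 19 data bytes plus the flag.
       01  WS-TABLES.
           05 WS-TABLE-1 OCCURS 20 TIMES.
              10 WS-T1-DATA     PIC X(19).
              10 WS-T1-DUP-FL   PIC X(01).
           05 WS-TABLE-2 OCCURS 20 TIMES.
              10 WS-T2-DATA     PIC X(19).
              10 WS-T2-DUP-FL   PIC X(01).
       01  WS-WORK.
           05 WS-S1             PIC 9(02) COMP.
           05 WS-S2             PIC 9(02) COMP VALUE ZERO.
           05 WS-T1-LIMIT       PIC 9(02) COMP VALUE 8.
           05 WS-NO-MORE-SRTD-ITEM-SW PIC X(01) VALUE 'N'.
              88 WS-NO-MORE-SRTD-ITEM VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM VARYING WS-S1 FROM 1 BY 1
                   UNTIL WS-S1 > WS-T1-LIMIT
              MOVE WS-SAMPLE-ROW (WS-S1) TO WS-TABLE-1 (WS-S1)
           END-PERFORM
           SORT SORTWORK
              ON ASCENDING KEY S-ID S-DPT-IND S-EQP-CAT S-A-MIN
              INPUT PROCEDURE IS 7730-SORT-INPUT-PROC
              OUTPUT PROCEDURE IS 7750-SORT-OUTPUT-PROC
           PERFORM VARYING WS-S1 FROM 1 BY 1 UNTIL WS-S1 > WS-S2
              DISPLAY WS-TABLE-2 (WS-S1)
           END-PERFORM
           GOBACK.
      * Release each table 1 entry to the sort as its own record.
       7730-SORT-INPUT-PROC.
           PERFORM VARYING WS-S1 FROM 1 BY 1
                   UNTIL WS-S1 > WS-T1-LIMIT
              RELEASE SORT-RECORD FROM WS-TABLE-1 (WS-S1)
           END-PERFORM.
      * Return sorted records; load new keys, flag repeated ones.
       7750-SORT-OUTPUT-PROC.
           MOVE ZERO TO WS-S2
           MOVE 'N' TO WS-NO-MORE-SRTD-ITEM-SW
           PERFORM UNTIL WS-NO-MORE-SRTD-ITEM
              RETURN SORTWORK
                 AT END
                    SET WS-NO-MORE-SRTD-ITEM TO TRUE
                 NOT AT END
                    IF WS-S2 = ZERO
                       PERFORM 7760-ADD-ENTRY
                    ELSE
                       IF SORT-RECORD (1:19) = WS-T2-DATA (WS-S2)
                          MOVE 'Y' TO WS-T2-DUP-FL (WS-S2)
                       ELSE
                          PERFORM 7760-ADD-ENTRY
                       END-IF
                    END-IF
              END-RETURN
           END-PERFORM.
      * Append the returned record to table 2 with the flag cleared.
       7760-ADD-ENTRY.
           ADD 1 TO WS-S2
           MOVE SORT-RECORD (1:19) TO WS-T2-DATA (WS-S2)
           MOVE SPACE TO WS-T2-DUP-FL (WS-S2).
```

Each RETURN is issued exactly once per loop pass, so the first record is seeded and then the next record is fetched, which avoids comparing the seeded record with itself and flagging it as a duplicate.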

Re: how to remove duplicates

Postby pmerc8888 » Fri Dec 12, 2008 2:21 am

Hi Dick,

Thanks for the reply.

This is a subroutine that is invoked by IMS programs (DB and DC).

Can I still use internal sort?

Till next time...

pmerc

Re: how to remove duplicates

Postby dick scherrer » Fri Dec 12, 2008 2:45 am

Hello,

"This is a subroutine that is invoked by IMS programs (DB and DC). Can I still use internal sort?"
I don't know; I'm not an IMS person. I suspect someone in your organization can tell you if there are online IMS programs that use an internal sort.

Maybe someone else who reads the topic will post a reply.
Hope this helps,
d.sch.
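
If an internal sort turns out not to be usable in the online environment, one possible alternative for a table of only 20 entries is to remove the duplicates in storage with a nested loop. This is a different technique from the sort suggested in the thread and nobody in the thread proposed it; treat it as a hedged sketch. The condensed 19-byte layout, WS-TABLES, WS-S3 and WS-FOUND-SW are hypothetical names introduced for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEDUPTBL.
      * Sketch: remove duplicates in storage, without SORT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Sample rows from the first post, in their original order.
       01  WS-SAMPLE-DATA.
           05 FILLER PIC X(19) VALUE 'AAA1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'BBB1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'AAA1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'CCC1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'EEE1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'DDD1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'EEE1111081203NA0010'.
           05 FILLER PIC X(19) VALUE 'PPP1111081203NA0010'.
       01  FILLER REDEFINES WS-SAMPLE-DATA.
           05 WS-SAMPLE-ROW PIC X(19) OCCURS 8 TIMES.
      * Condensed tables (assumption): 19 data bytes plus the flag.
       01  WS-TABLES.
           05 WS-TABLE-1 OCCURS 20 TIMES.
              10 WS-T1-DATA     PIC X(19).
              10 WS-T1-DUP-FL   PIC X(01).
           05 WS-TABLE-2 OCCURS 20 TIMES.
              10 WS-T2-DATA     PIC X(19).
              10 WS-T2-DUP-FL   PIC X(01).
       01  WS-WORK.
           05 WS-S1             PIC 9(02) COMP.
           05 WS-S2             PIC 9(02) COMP VALUE ZERO.
           05 WS-S3             PIC 9(02) COMP.
           05 WS-T1-LIMIT       PIC 9(02) COMP VALUE 8.
           05 WS-FOUND-SW       PIC X(01) VALUE 'N'.
              88 WS-FOUND       VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM VARYING WS-S1 FROM 1 BY 1
                   UNTIL WS-S1 > WS-T1-LIMIT
              MOVE WS-SAMPLE-ROW (WS-S1) TO WS-TABLE-1 (WS-S1)
           END-PERFORM
      * For each table 1 entry, look for it among the entries
      * already kept in table 2.
           PERFORM VARYING WS-S1 FROM 1 BY 1
                   UNTIL WS-S1 > WS-T1-LIMIT
              MOVE 'N' TO WS-FOUND-SW
              PERFORM VARYING WS-S3 FROM 1 BY 1
                      UNTIL WS-S3 > WS-S2 OR WS-FOUND
                 IF WS-T1-DATA (WS-S1) = WS-T2-DATA (WS-S3)
      *             Seen before: flag the kept entry, skip this one.
                    MOVE 'Y' TO WS-T2-DUP-FL (WS-S3)
                    SET WS-FOUND TO TRUE
                 END-IF
              END-PERFORM
              IF NOT WS-FOUND
      *          First occurrence: append it with the flag cleared.
                 ADD 1 TO WS-S2
                 MOVE WS-T1-DATA (WS-S1) TO WS-T2-DATA (WS-S2)
                 MOVE SPACE TO WS-T2-DUP-FL (WS-S2)
              END-IF
           END-PERFORM
           PERFORM VARYING WS-S1 FROM 1 BY 1 UNTIL WS-S1 > WS-S2
              DISPLAY WS-TABLE-2 (WS-S1)
           END-PERFORM
           GOBACK.
```

Unlike the sorted approach, this keeps the first occurrences in their original order, which is the order shown in the expected output of the first post. The number of comparisons grows with the square of the table size, which should be negligible for 20 entries but would not suit large tables.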

Re: how to remove duplicates

Postby pmerc8888 » Fri Dec 12, 2008 2:48 am

Hi Dick,

Thanks for trying to help.

Kind regards,
pmerc
