Duplicates between two files

Posted: Mon Feb 14, 2011 12:47 am
by Brifka
Hello,
First of all, I would like to thank this forum for saving my life a couple of times ;)
It contains SO MUCH useful information and examples and I learned a lot from it.

But now I have a task that I don't know how to solve.

I have two VB/606 files.

Example of file 1 (aaa is the prefix of each record; the key is 4,4,ZD):
aaa1234lalalala
aaa1234mmmm
aaa1235kkkkk
aaa3456ktktkt
aaa3456yyyyy
aaa3567kkkkk
aaa3567kkky

Example of file 2 (bbb is the prefix of each record; the key is 4,4,ZD):
bbb1234nata
bbb2345mmmm
bbb2345jtjtj
bbb3456nnnn
bbb3456hhhh

The output file (VB/606) must contain only those records whose key appears in both files. Both files contain duplicate keys.
Output file:
aaa1234lalalala
aaa1234mmmm
bbb1234nata
aaa3456ktktkt
aaa3456yyyyy
bbb3456nnnn
bbb3456hhhh

I tried :
SELECT FROM(IN) TO(OUT) ON(10,1,CH) ON(4,4,zD) -
ALLDUPS

but it didn't give me the desired result. :(
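
The requirement stated above (keep every record whose key occurs in both files, duplicates included) can be pinned down with a small Python sketch. This is only an illustration of the selection rule, not DFSORT code. The helper names are hypothetical, and the key position 4,4 is counted on the sample text exactly as shown, without any RDW.

```python
from collections import defaultdict

def key_of(record, start=4, length=4):
    """Return the key field at 1-based position `start` with the given length."""
    return record[start - 1:start - 1 + length]

def records_with_common_keys(file1, file2):
    """Keep all records (from both files) whose key occurs in both files."""
    common = {key_of(r) for r in file1} & {key_of(r) for r in file2}
    grouped = defaultdict(list)
    # File 1 records come first within each key, then file 2 records.
    for rec in list(file1) + list(file2):
        k = key_of(rec)
        if k in common:
            grouped[k].append(rec)
    # Emit the groups in ascending key order.
    return [rec for k in sorted(grouped) for rec in grouped[k]]

file1 = ["aaa1234lalalala", "aaa1234mmmm", "aaa1235kkkkk",
         "aaa3456ktktkt", "aaa3456yyyyy", "aaa3567kkkkk", "aaa3567kkky"]
file2 = ["bbb1234nata", "bbb2345mmmm", "bbb2345jtjtj",
         "bbb3456nnnn", "bbb3456hhhh"]

result = records_with_common_keys(file1, file2)
for rec in result:
    print(rec)
```

On the sample data this keeps the seven records with keys 1234 and 3456 and drops keys 1235, 2345 and 3567, which occur in only one file.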

Re: Duplicates between two files

Posted: Mon Feb 14, 2011 3:22 am
by dick scherrer
Hello and welcome with your first post,

Which release of which sort product is being used?

If you are not sure, run the following and post the informational output including all message ids:
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD *
ABC
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  SORT     FIELDS=COPY


Hint - it is also best to use the Code Tag when posting data as this will preserve alignment :)

Re: Duplicates between two files

Posted: Mon Feb 14, 2011 6:25 pm
by Brifka
Hello,
Thank you for your reply and for the hint :)
Unfortunately I cannot edit my opening post, but I will take it into account next time.
I ran the job and found that I am using
Z/OS DFSORT V1R10

Re: Duplicates between two files

Posted: Mon Feb 14, 2011 10:54 pm
by skolusu
Brifka wrote: I ran the job and found that I am using
Z/OS DFSORT V1R10


Brifka,

We need more than just one line about the version of DFSORT you are running. We need to see message ICE201I, which tells us the level of DFSORT you are running. Check this link for a better explanation:

dfsort-icetool-icegener/topic2894.html

Re: Duplicates between two files

Posted: Mon Feb 14, 2011 11:13 pm
by Brifka
Hi,
Sorry. Here is the message:
ICE201I H RECORD TYPE IS F - DATA STARTS IN POSITION 1

Re: Duplicates between two files

Posted: Mon Feb 14, 2011 11:19 pm
by Brifka
I tried:

//SYSIN    DD    *                               
  JOINKEYS F1=IN1,FIELDS=(4,4,A)                 
  JOINKEYS F2=IN2,FIELDS=(4,4,A)                 
  JOIN UNPAIRED,F1,F2                           
  REFORMAT FIELDS=(?,F1:1,20,F2:1,20)           
  OPTION COPY                                   
  OUTFIL FNAMES=OUT1,INCLUDE=(1,1,CH,EQ,C'B'),   
      BUILD=(2,40)                               
/*                                               


and got a Cartesian join, not the desired result.
I can split the files, remove duplicates and then merge and sort, but I'm looking for a smarter solution :roll:
Thank you
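
Why a plain key join does not fit this task can be seen in a short Python sketch. It is an illustration only, not DFSORT code. It pairs every file 1 record with every file 2 record that has the same key, using the sample data from the first post. The helper name `key_of` is hypothetical, and unpaired records are ignored here.

```python
def key_of(record, start=4, length=4):
    """Return the key field at 1-based position `start` with the given length."""
    return record[start - 1:start - 1 + length]

file1 = ["aaa1234lalalala", "aaa1234mmmm", "aaa1235kkkkk",
         "aaa3456ktktkt", "aaa3456yyyyy", "aaa3567kkkkk", "aaa3567kkky"]
file2 = ["bbb1234nata", "bbb2345mmmm", "bbb2345jtjtj",
         "bbb3456nnnn", "bbb3456hhhh"]

# A key join emits one joined record per matching (file1, file2) pair.
pairs = [(a, b) for a in file1 for b in file2 if key_of(a) == key_of(b)]

for a, b in pairs:
    print(a, "|", b)
print(len(pairs), "joined records")
```

Key 1234 contributes 2 x 1 pairs and key 3456 contributes 2 x 2 pairs, so the join yields six combined records, while the wanted output is the seven original records, each listed once.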

Re: Duplicates between two files

Posted: Tue Feb 15, 2011 12:51 am
by Brifka
One more question:
in my real file the key is a combination of CH and PD fields.
Will it work with JOIN?

Re: Duplicates between two files

Posted: Tue Feb 15, 2011 2:24 am
by skolusu
Brifka,

The following DFSORT JCL will give you the desired results.

//STEP0100 EXEC PGM=SORT                                        
//SYSOUT   DD SYSOUT=*                                          
//SORTIN   DD *                                                
//SORTOUT  DD DSN=&&HDR,DISP=(,PASS),SPACE=(TRK,(1,0),RLSE)    
//SYSIN    DD *                                                
  SORT FIELDS=COPY                                              
  OUTFIL REMOVECC,FTOV,BUILD=(80X),HEADER1=('$$$',10X)          
//*
//STEP0200 EXEC PGM=ICETOOL                                  
//TOOLMSG  DD SYSOUT=*                                        
//DFSMSG   DD SYSOUT=*                                        
//INP      DD DSN=&&HDR,DISP=SHR,VOL=REF=*.STEP0100.SORTOUT  
//         DD DSN=your input VB 606 file A,DISP=SHR
//         DD DSN=&&HDR,DISP=SHR,VOL=REF=*.STEP0100.SORTOUT  
//         DD DSN=your input VB 606 file B,DISP=SHR
//DUP      DD DSN=&&DUP,DISP=(,PASS),SPACE=(CYL,(1,1),RLSE)  
//INA      DD DSN=your input VB 606 file A,DISP=SHR
//         DD DSN=your input VB 606 file B,DISP=SHR
//OUT      DD SYSOUT=*
//TOOLIN   DD *                                                  
  SORT FROM(INP) USING(CTL1)                                    
  SORT JKFROM USING(CTL2)                                        
//CTL1CNTL DD *                                                  
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,5,3,X,8,4)),                
  IFTHEN=(WHEN=GROUP,BEGIN=(5,3,CH,EQ,C'$$$'),PUSH=(8:ID=1))    
  SORT FIELDS=(8,5,CH,A),EQUALS                                  
  SUM FIELDS=NONE                                                
  OUTFIL FNAMES=DUP,VTOF,OMIT=(5,3,CH,EQ,C'$$$'),BUILD=(8,5)    
//CTL2CNTL DD *                                                  
  OPTION COPY                                                    
  JOINKEYS F1=INA,FIELDS=(8,4,A)                                
  JOINKEYS F2=DUP,FIELDS=(2,4,A)                                
  JOIN UNPAIRED                                                  
  REFORMAT FIELDS=(F1:1,4,?,F2:1,1,F1:5)                        
  OUTFIL FNAMES=OUT,INCLUDE=(5,2,CH,EQ,C'B3'),BUILD=(1,4,7)      
//JNF2CNTL DD *                                                  
  SUM FIELDS=(1,1,ZD)                                            
//*


Brifka wrote: One more question:
in my real file the key is a combination of CH and PD fields.


The join is performed treating the keys as binary. If all your PD fields have positive values, the match will work fine. However, if you have both positive and negative values, you need to normalize the keys.

Re: Duplicates between two files

Posted: Tue Feb 15, 2011 2:27 am
by Frank Yaeger
Brifka,

Something doesn't add up here. You say that your input files are VB and your key is 4,4,ZD. But that doesn't match the example you show - a VB file has an RDW in positions 1-4, so your key would be 8,4,ZD.

If you actually used 4,4,ZD, you wouldn't get any matches so how did you get a "cartesian join"?

Please verify that your input files are actually VB and your key is 8,4,ZD.
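
The position shift caused by the RDW can be sketched in Python. This is an illustration only. It assumes a non-spanned record whose 4-byte RDW holds a 2-byte big-endian length (data plus the 4 RDW bytes) followed by two zero bytes, and it uses ASCII bytes instead of EBCDIC for readability. The helper names are hypothetical.

```python
import struct

def add_rdw(data: bytes) -> bytes:
    """Prefix data with a 4-byte RDW: 2-byte length (including the RDW), 2 zero bytes."""
    return struct.pack(">HH", len(data) + 4, 0) + data

def field(record: bytes, start: int, length: int) -> bytes:
    """Extract a field using a 1-based start position, as sort control statements do."""
    return record[start - 1:start - 1 + length]

data = b"aaa1234lalalala"   # record as shown in the first post
vb = add_rdw(data)          # same record as it looks with an RDW in front

print(field(data, 4, 4))    # key when positions are counted on the data alone
print(field(vb, 4, 4))      # position 4 now covers the last RDW byte plus 'aaa'
print(field(vb, 8, 4))      # the key sits at position 8 once the RDW is counted
```

With the RDW counted, position 4 no longer points at the key, which is why the field has to be given as 8,4 for a VB file.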

in my real file key is combination of CH and PD .
will it work with join?


JOINKEYS works with binary keys. CH is binary. PD can be treated as binary in many cases.
You'd have to show an example of what your PD values actually look like in hex (positive values only? positive and negative values?) before I could answer your question for sure.
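
What "look like in hex" means for PD values can be sketched in Python. This is an illustration only and says nothing about what DFSORT does internally. It assumes the usual packed decimal layout: one digit per nibble, with the last nibble holding the sign (C or F for positive, D for negative). The function name and its parameters are hypothetical.

```python
def pack_decimal(value: int, digits: int = 7, positive_sign: int = 0xC) -> bytes:
    """Encode an integer as packed decimal: digit nibbles followed by a sign nibble."""
    sign = positive_sign if value >= 0 else 0xD
    text = str(abs(value)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit in the requested number of digits")
    nibbles = [int(ch) for ch in text] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad on the left to fill whole bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

plus_c = pack_decimal(123, digits=3)                      # +123 with sign C
plus_f = pack_decimal(123, digits=3, positive_sign=0xF)   # +123 with sign F
minus = pack_decimal(-123, digits=3)                      # -123 with sign D

print(plus_c.hex(), plus_f.hex(), minus.hex())
print(plus_c == plus_f)   # same numeric value, different bytes
print(plus_c < minus)     # byte order differs from numeric order
```

Two things show up when the bytes are compared as plain binary: the same positive value with sign C and sign F gives different bytes, and -123 compares higher than +123. Whether either case matters depends on the actual data, which is why the hex values are needed.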

Re: Duplicates between two files

Posted: Tue Feb 15, 2011 2:28 am
by Brifka
Thank you so much..