discarding duplicates using condition in SORT

Posted: Thu Feb 22, 2018 3:02 pm
by parthiban_82
I have 2 files that I need to combine into a union with duplicates removed. Which copy of a duplicate is kept depends on a 3rd file: if the duplicated account is present in the 3rd file, I need to keep the record from the 1st file; otherwise I need to keep the record from the 2nd file.

Sample

1st file :
account1record from file1
account2record from file1
account3record from file1

2nd file :
account1record from file2
account2record from file2
account4record from file2

3rd file :
account1record from file3

Expected result
account1record from file1 ------ since this is present in 3rd file
account2record from file2 ------ since this is not present in 3rd file
account3record from file1
account4record from file2

Re: discarding duplicates using condition in SORT

Posted: Thu Feb 22, 2018 3:50 pm
by NicC
They are not 'files' but data sets. You would need two steps: step one matches data set 1 against data set 3, dropping the duplicates and creating a 4th data set. This would then be merged with data set 2 to create data set 5.
Or you could write a program to do a 3 data set match.

Re: discarding duplicates using condition in SORT

Posted: Thu Feb 22, 2018 6:54 pm
by parthiban_82
Hi Nic, thanks for the reply. But if my 3rd dataset has other accounts, they should not be part of my output. The 3rd dataset is just a reference to check whether an account in the 1st dataset is present or not. Please correct me if I am missing something.


Sample

1st file :
account1record from file1
account2record from file1
account3record from file1

2nd file :
account1record from file2
account2record from file2
account4record from file2

3rd file :
account1record from file3
account5record from file3

Expected result
account1record from file1 ------ since this is present in 3rd file
account2record from file2 ------ since this is not present in 3rd file
account3record from file1
account4record from file2
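
For anyone who wants to check the selection rule outside of SORT, here is a minimal Python sketch of the logic described above. It is not a SORT solution, only an illustration of the expected behaviour. The names KEY_LEN, key_of and select_records are made up for this example, and it assumes the account key is the first 8 characters of each record and that each account appears at most once per file.

```python
KEY_LEN = 8  # assumption: the account key is the first 8 characters of a record


def key_of(record):
    """Return the account key portion of a record (hypothetical layout)."""
    return record[:KEY_LEN]


def select_records(file1, file2, file3):
    """Union of file1 and file2 with one record per account.

    For an account present in both file1 and file2, keep the file1 record
    if the account also appears in file3, otherwise keep the file2 record.
    file3 is used only as a lookup; its records never reach the output.
    """
    ref_keys = {key_of(r) for r in file3}
    by_key1 = {key_of(r): r for r in file1}
    by_key2 = {key_of(r): r for r in file2}

    result = []
    for key in sorted(by_key1.keys() | by_key2.keys()):
        if key in by_key1 and key in by_key2:
            # Duplicate account: file3 decides which copy wins.
            result.append(by_key1[key] if key in ref_keys else by_key2[key])
        elif key in by_key1:
            result.append(by_key1[key])
        else:
            result.append(by_key2[key])
    return result


file1 = ["account1record from file1",
         "account2record from file1",
         "account3record from file1"]
file2 = ["account1record from file2",
         "account2record from file2",
         "account4record from file2"]
file3 = ["account1record from file3",
         "account5record from file3"]

output = select_records(file1, file2, file3)
for record in output:
    print(record)
```

With the sample data above this prints the four records listed under "Expected result"; account5 is dropped because it exists only in the 3rd file.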