I have two files keyed on a 45-byte key. The key identifies a single record, but the records are grouped into lots, and a 22-byte key is enough to identify a lot.
A sample 45-byte key would be:
I want to match only the first 22 bytes of the F1 key against F2, not all 45 bytes, to determine whether the lot exists in F2. There may be cases where the record itself is absent from F2 but its lot is still present, so the match has to be on the 22 bytes.
Then, using OUTFIL, I separate the records into two outputs:
> records where there is a match and a given field is zero, or where there is no match and a date field is older than 5 days;
> records where there is a match and that field is non-zero, or where there is no match and the date field is within the last 5 days.
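To make the two rules concrete, here is roughly how they might be coded. Everything in this sketch is an assumption for illustration, not my actual card: the match indicator in position 101 (the `?` byte from a JOINKEYS REFORMAT, `'B'` = matched, `'1'` = F1 only), the zero/non-zero flag at position 80, a C'yyyymmdd' date at position 60, and the DFSORT-style `DATE1-5` ("current date minus 5 days") constant, which your Syncsort level may or may not accept:

```
* Sketch only - positions, lengths and formats are assumptions.
* 101,1 = match indicator from REFORMAT ('B'=both, '1'=F1 only)
  OUTFIL FNAMES=OUT1,
    INCLUDE=(101,1,CH,EQ,C'B',AND,80,1,ZD,EQ,0,OR,
             101,1,CH,EQ,C'1',AND,60,8,CH,LT,DATE1-5)
  OUTFIL FNAMES=OUT2,SAVE
```

The `SAVE` on the second OUTFIL picks up every record not selected by the first, which matches the complementary wording of the two rules.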
Please find my sort card below
The problem is that the input (in both files) may run to billions of records, yes billions, and this card, though it works, is very inefficient.
The join creates a huge number of duplicates and then removes them; for example, if a lot key appears three times in F2, every matching F1 record comes out of the join three times. I want to know if there is a way around this.
Bottom line: the record count of F1 should be preserved across the split outputs. The driver (file F1) records should not be duplicated (on the 45-byte key) after the join, and no driver record should be missing. The input file F2 will also contain duplicates on the 22-byte key.
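For reference, this is the general shape of a 22-byte JOINKEYS match that keeps every F1 record and flags whether the lot was found. The DD names, the 100-byte record length, and the indicator position are assumptions, not my actual card:

```
* Sketch only - assumed 100-byte F1 records, key in positions 1-22.
  JOINKEYS F1=INF1,FIELDS=(1,22,A)
  JOINKEYS F2=INF2,FIELDS=(1,22,A)
  JOIN UNPAIRED,F1
* Keep the whole F1 record plus a 1-byte match indicator (?).
  REFORMAT FIELDS=(F1:1,100,?)
  SORT FIELDS=COPY
```

I wondered whether dropping the F2 duplicates on the 22-byte key before the join (for example with `SUM FIELDS=NONE` in a JNF2CNTL step) is the right direction, but I am not sure.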
Could you please help me achieve this with Syncsort in the most efficient manner?
I have searched everywhere but have not found any clue.
Thanks in advance for your help.