Hi ,
I have a real complex requirement. Here it goes.
Input
Dataset 0 (FB):
A list of 62000 Policies
Dataset 1 (VB): ( named in date format, M0208 for Feb 2008, M1198 for November 1998 )
Date record ( specifies when the Dataset is generated )
Over a million policy records, sorted order, Policy Number being the key
Dataset 2(VB):
Date record ( specifies when the Dataset is generated )
Over a million policy records, sorted order, Policy Number being the key
So on to about 250 datasets.
Requirement
I need to extract policies from all the 250 datasets that match the Dataset 0, of 62000 policies.
I need the duplicates. So if a policy is found in all 250 datasets, I should have 250 records for that policy.
I have done this part, in two steps.
Step 1. To merge all the 250 dataset records ( with same filter conditions that match the criteria of Dataset 0 ) ,
Step 2. Used WAAPDSUT to match the Dataset 0 with the output of step 1.
Here comes the complex bit, When I read the final output, I should be able to tell the Data set it has come from ( that can be deduced from the Date field of the Date record ).
Obviously the above 2 steps discards the Date record form all the 250 Datasets., Even if the date records were to be kept they would be sorted and clustered at one place and can not be associated with each record.
The only way I can think of is to append a corresponding Date Identifier at the end of each record of all of the 250 dataset, before I start to merge these 250 datasets.
Here is the Illustration:
Dataset 0: Policies
12345
23456
34567
45678
56789
DataSet 1:
M0188 (Date Record)
12345ZZZZZZZZZZZZXXXXXXXXXXXXXXKKKKKKKAAAAAAAAA
23456AAAAAAAAAAJJJJJJJJJJJJJDDDDDDDDDDLLLLLLLLEEEEEEEEEE
89898SFADSFJA;LDFSKDSLFSDFADSFASDFASDFDS
DataSet 2:
M0288
12345SSSSDDDDDDDDDKKKKKKKKKKKEWWWWWWWWWWWLLLLLLLL
23456FFFFFLLLLLLLLDDDDDDDDDDIIIIIIIIIIIEEEEEEEEEE
34567IIIIIUUUUUTTTTTTTTLLLLLLLLLEEEOOOOOOOOEEEE
22222ASF;LSJDFLDFSADSLFADSFLADSF
DataSet 250:
M0909
12345KKKKKKKKKKKJJJJJJJRRRRRRRRRRRRLLLLLLLLLLEEEEEEEE
23456IIIOOOOOOLLLLLLLLLLLLEEEEEUUUUUUUUUUURRRRRRRRRRRRRRRKKKK
34567SFJAEIRAWE;SLKDFAS;FASDF;ALSDFJASDFA
45678SFASDFSADFJDS;LDFSSDFDSFDSFASDF
56789ASDFDSFAOFSDF;LASDFIEWRWEQ
88888UUUUUUUUUUULLLLLLLLLLLLLLADFERWREQRWE
77777DFSASDFJDSFASDFADSFADSFADSFDS
Output:
12345ZZZZZZZZZZZZXXXXXXXXXXXXXXKKKKKKKAAAAAAAAAM0188
12345SSSSDDDDDDDDDKKKKKKKKKKKEWWWWWWWWWWWLLLLLLLLM0288
12345KKKKKKKKKKKJJJJJJJRRRRRRRRRRRRLLLLLLLLLLEEEEEEEEM0909
23456AAAAAAAAAAJJJJJJJJJJJJJDDDDDDDDDDLLLLLLLLEEEEEEEEEEM0188
23456FFFFFLLLLLLLLDDDDDDDDDDIIIIIIIIIIIEEEEEEEEEEM0288
23456IIIOOOOOOLLLLLLLLLLLLEEEEEUUUUUUUUUUURRRRRRRRRRRRRRRKKKKM0909
34567IIIIIUUUUUTTTTTTTTLLLLLLLLLEEEOOOOOOOOEEEEM0288
34567SFJAEIRAWE;SLKDFAS;FASDF;ALSDFJASDFAM0909
45678SFASDFSADFJDS;LDFSSDFDSFDSFASDFM0909
56789ASDFDSFAOFSDF;LASDFIEWRWEQM0909
The Date field added at the end of each record is for illustration, but to add a field may be only possible way out.
Could somebody help me on this please?
Regards,
Tilak