match data across four files and write matching data



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

match data across four files and write matching data

Postby savitha_y » Mon Jun 01, 2009 11:07 pm

Hi,

I have a requirement where i need to match data across four files for the first two fields and write matching data into an output file. The file1 is master file and having duplicate records where as the other three file will have single record for the same key combination. I need to write an output will all duplicate records from the first file with the data from the other three files as shown below. I tried SPLICE function with WITHALL option. It did not work out. Could any one please suggest me..

File 1

name length start end
field1 8 1 8
field2 8 9 16
field3 5 17 21
field4 20 22 41
field5 5 42 46
field6 12 47 58


----+----1----+----2----+----3----+----4----+----5----+---
***************************** Top of Data ****************
0501082612893 1021 40477332363003 1 1049
0501082612893 1021 40477332363011 3 0
0501082612893 1021 4543120131583538 8 0
0501082612893 1021 12 0
0501082612893 1026 3 41037
0501082612893 1035 1 1738
0501082612893 1035 3 1
0501082612893 1048 1 1037
0501082612893 1048 3 25000
0489458212903 1021 40477361641204 1 412
0489458212903 1021 40477391690809 2 0
0489458212903 1021 40477332650177 3 0
0489458212903 1021 12 0
0489458212903 3005 8 -797
0489458212903 9017 2 0
0489458212903 9017 2 0
0489458212903 9017 2 0


File 2

name length start end
field1 8 1 8
field2 8 9 16
field3 8 17 24
field4 1 25 25
field5 1 26 26
field6 8 27 34

----+----1----+----2----+----3----
0183177212892 200404260020050615
0501082612893 200404260020070831
0489458212903 200404260020070831

File 3

name length start end
field1 8 1 8
field2 8 9 16
field3 8 17 24

----+----1----+----2----
0044402712889 20090314
0501082612893 20090508
0304833412902 20090524
0489458212903 20071227


File 4

name length start end
field1 8 1 8
field2 8 9 16
field3 1 17 17

----+----1----+----2----+
0183177212892 A20070829
0501082612893 U20070829
0303768512894 U20070829
0489458212903 U20070829


Output file

name length start end
CUST_ID 8 1 8
EWISE_ID 6 9 14
INST_ID 5 15 19
ACC_TYPE 5 20 24
ACC_NUM 30 25 54
REC_ACDT 8 55 62
BALANCE 12 63 74
MKT_ALL 1 75 75
MKT_WEB 1 76 76
REG_DT 8 77 84
REG_DT 8 85 92
USER_STATUS 1 93 93


----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9---
********************************* Top of Data ***********************************************
0501082612893 1021 1 40477332363003 200708311049 002004042620090508U
0501082612893 1021 3 40477332363011 200708310 002004042620090508U
0501082612893 1021 8 4543120131583538 200708310 002004042620090508U
0501082612893 1021 12 200708310 002004042620090508U
0501082612893 1026 3 2007083141037 002004042620090508U
0501082612893 1035 1 200708311738 002004042620090508U
0501082612893 1035 3 200708311 002004042620090508U
0501082612893 1048 1 200708311037 002004042620090508U
0501082612893 1048 3 2007083125000 002004042620090508U
0489458212903 1021 1 40477361641204 20070831412 002004042620071227U
0489458212903 1021 2 40477391690809 200708310 002004042620071227U
0489458212903 1021 3 40477332650177 200708310 002004042620071227U
0489458212903 1021 12 200708310 002004042620071227U
0489458212903 3005 8 20070831-797 002004042620071227U
0489458212903 9017 2 200708310 002004042620071227U
0489458212903 9017 2 200708310 002004042620071227U
0489458212903 9017 2 200708310 002004042620071227U
savitha_y
 
Posts: 14
Joined: Mon May 11, 2009 3:05 pm
Has thanked: 0 time
Been thanked: 0 time

Re: match data across four files and write matching data

 

Re: match data across four files and write matching data

Postby Frank Yaeger » Tue Jun 02, 2009 1:03 am

For the layout of input File1, File2, File3 and File4 you used Field1...Fieldn names. Then for the output, you used different names (e.g. CUST_ID). How are we supposed to relate the input field names to the output field names? Please use the same names for the input fields and output fields, e.g. something like F1_FLD1, F1_FLD2, ..., F2_FLD1, ...

And please use ubb code tags so you can line up your examples the way they should look ... put <code> before the data and </code> after the data, but use [ ] instead of < >. Here's an example:

A   B   C
D   E   F
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1080
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 14 times

Re: match data across four files and write matching data

Postby arcvns » Tue Jun 02, 2009 11:16 pm

put <code> before the data and </code> after the data, but use [ ] instead of < >. Here's an example:
Or else just select the part of your post which you want to "Code" and click on the 'Code' Button provided just above the typing area.
Arun
User avatar
arcvns
 
Posts: 55
Joined: Sat Feb 28, 2009 12:36 am
Location: India
Has thanked: 0 time
Been thanked: 0 time

Re: match data across four files and write matching data

Postby savitha_y » Tue Jun 02, 2009 11:27 pm

Hi Frank,

Please find file data and layouts given below.
File 1

name      length      start    end
F1-Field1   8      1   8
F1-Field2   8      9   16
F1-Field3   5      17   21
F1-Field4   20      22   41
F1-Field5   5      42   46
F1-Field6   12      47   58


----+----1----+----2----+----3----+----4----+----5----+---
***************************** Top of Data ****************
0501082612893   1021 40477332363003      1    1049       
0501082612893   1021 40477332363011      3    0           
0501082612893   1021 4543120131583538    8    0           
0501082612893   1021                     12   0           
0501082612893   1026                     3    41037       
0501082612893   1035                     1    1738       
0501082612893   1035                     3    1           
0501082612893   1048                     1    1037       
0501082612893   1048                     3    25000       
0489458212903   1021 40477361641204      1    412         
0489458212903   1021 40477391690809      2    0           
0489458212903   1021 40477332650177      3    0           
0489458212903   1021                     12   0           
0489458212903   3005                     8    -797       
0489458212903   9017                     2    0           
0489458212903   9017                     2    0           
0489458212903   9017                     2    0           


File 2

name      length   start   end
F2-Field1   8   1   8
F2-Field2   8   9   16
F2-Field3   8   17   24
F2-Field4   1   25   25
F2-Field5   1   26   26
F2-Field6   8   27   34

----+----1----+----2----+----3----
0183177212892   200404260020050615
0501082612893   200404260020070831
0489458212903   200404260020070831

File 3

name      length   start    end
F3-Field1   8   1   8
F3-Field2   8   9   16
F3-Field3   8   17   24

----+----1----+----2----
0044402712889   20090314
0501082612893   20090508
0304833412902   20090524
0489458212903   20071227


File 4

name      length   start   end
F4-Field1   8   1   8
F4-Field2   8   9   16
F4-Field3   1   17   17

----+----1----+----2----+
0183177212892   A20070829
0501082612893   U20070829
0303768512894   U20070829
0489458212903   U20070829


Output file

name      length   start    end
F1-Field1   8   1   8
F1-Field2   6   9   14
F1-Field3   5   15   19
F1-Field5   5   20   24
F1-Field4   30   25   54
F2-Field6   8   55   62
F1-Field6   12   63   74
F2-Field5   1   75   75
F2-Field4   1   76   76
F2-Field3   8   77   84
F3-Field3   8   85   92
F4-Field3   1   93   93

though F1-Field4 is of length 20 it needs to be written as 30 bytes length with 10 bytes spaces.
from F1-Field2 first 6 bytes needs to be written onto output. first field and
first 6bytes of second field needs to be matched across four files.


----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9---
********************************* Top of Data ***********************************************
0501082612893 1021 1    40477332363003                200708311049        002004042620090508U     
0501082612893 1021 3    40477332363011                200708310           002004042620090508U     
0501082612893 1021 8    4543120131583538              200708310           002004042620090508U     
0501082612893 1021 12                                 200708310           002004042620090508U     
0501082612893 1026 3                                  2007083141037       002004042620090508U     
0501082612893 1035 1                                  200708311738        002004042620090508U     
0501082612893 1035 3                                  200708311           002004042620090508U     
0501082612893 1048 1                                  200708311037        002004042620090508U     
0501082612893 1048 3                                  2007083125000       002004042620090508U     
0489458212903 1021 1    40477361641204                20070831412         002004042620071227U         
0489458212903 1021 2    40477391690809                200708310           002004042620071227U             
0489458212903 1021 3    40477332650177                200708310           002004042620071227U             
0489458212903 1021 12                                 200708310           002004042620071227U             
0489458212903 3005 8                                  20070831-797        002004042620071227U             
0489458212903 9017 2                                  200708310           002004042620071227U             
0489458212903 9017 2                                  200708310           002004042620071227U             
0489458212903 9017 2                                  200708310           002004042620071227U             
savitha_y
 
Posts: 14
Joined: Mon May 11, 2009 3:05 pm
Has thanked: 0 time
Been thanked: 0 time

Re: match data across four files and write matching data

Postby Frank Yaeger » Tue Jun 02, 2009 11:48 pm

first field and first 6bytes of second field needs to be matched across four files


So you only want records that have a match for ALL four files on the first 14 bytes of each file?

In your example, file1 has duplicates within it, but the other files don't. Is that always the case, or can any of the other input files have duplicates within it (e.g. two 0501082612893 keys in file2)?
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1080
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 14 times

Re: match data across four files and write matching data

Postby savitha_y » Wed Jun 03, 2009 11:15 am

Hi Frank,

Only file1 will have duplicates where as other 3 files don't have any duplicates. I need to get all the duplicates from file1 for matching key values splicing with the required data from the other three files as shown in my output.

Thanks..
savitha_y
 
Posts: 14
Joined: Mon May 11, 2009 3:05 pm
Has thanked: 0 time
Been thanked: 0 time

Re: match data across four files and write matching data

Postby Frank Yaeger » Wed Jun 03, 2009 9:13 pm

Sigh. I hate to have to keep repeating my questions, but I don't want to spend time doing a solution and then have you tell me that's not what you wanted. So again:

So you only want records that have a match for ALL four files on the first 14 bytes of each file?


The example you showed has records with two keys, both of which have a match in ALL of the other three files. Is it possible for file1 to have a key that isn't in one of the other three files and if so, do you want the file1 record with that key in the output file, or do you only want records from file1 that have keys in ALL FOUR FILES? So for example, if file1 had key1, and file2 had key1, but file3 and file4 did not have key1, would you want key1 from file1 in the output, or do you only want key1 from file1 in the output if key1 appears in file2, file3 AND file4? It would help if you would show me an example with all of the possible variations (key in file1 but not in other files, key in file1 and one other file, key in all four files, etc).
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1080
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 14 times

Re: match data across four files and write matching data

Postby savitha_y » Wed Jun 03, 2009 9:26 pm

Sorry Frank if i am not clear enough.

I need records from file1 that have matching keys in ALL FOUR FILES as shown in my example. Only file1 will have duplicates.
savitha_y
 
Posts: 14
Joined: Mon May 11, 2009 3:05 pm
Has thanked: 0 time
Been thanked: 0 time

Re: match data across four files and write matching data

Postby savitha_y » Thu Jun 04, 2009 12:40 am

Hi Frank, please find all possibilities given below for match.
File 1

name      length      start    end
F1-Field1   8      1   8
F1-Field2   8      9   16
F1-Field3   5      17   21
F1-Field4   20      22   41
F1-Field5   5      42   46
F1-Field6   12      47   58


----+----1----+----2----+----3----+----4----+----5----+---
***************************** Top of Data ****************
0501082612893   1021 40477332363003      1    1049       
0501082612893   1021 40477332363011      3    0           
0501082612893   1021 4543120131583538    8    0           
0501082612893   1021                     12   0           
0501082612893   1026                     3    41037       
0501082612893   1035                     1    1738       
0501082612893   1035                     3    1           
0501082612893   1048                     1    1037       
0501082612893   1048                     3    25000       
0489458212903   1021 40477361641204      1    412         
0489458212903   1021 40477391690809      2    0           
0489458212903   1021 40477332650177      3    0           
0489458212903   1021                     12   0           
0489458212903   3005                     8    -797       
0489458212903   9017                     2    0           
0489458212903   9017                     2    0           
0489458212903   9017                     2    0
0183177212892   3425                     3    0
0304833412902   7863                     12   45
0345477383222   1235                     3    0



File 2

name      length   start   end
F2-Field1   8   1   8
F2-Field2   8   9   16
F2-Field3   8   17   24
F2-Field4   1   25   25
F2-Field5   1   26   26
F2-Field6   8   27   34

----+----1----+----2----+----3----
0183177212892   200404260020050615
0501082612893   200404260020070831
0489458212903   200404260020070831
0304833412902   200404260020070831

File 3

name      length   start    end
F3-Field1   8   1   8
F3-Field2   8   9   16
F3-Field3   8   17   24

----+----1----+----2----
0044402712889   20090314
0501082612893   20090508
0304833412902   20090524
0489458212903   20071227


File 4

name      length   start   end
F4-Field1   8   1   8
F4-Field2   8   9   16
F4-Field3   1   17   17

----+----1----+----2----+
0501082612893   U20070829
0303768512894   U20070829
0489458212903   U20070829

Key 0304833412902 is not available on file 4 hence it should not be written onto output
Key 0183177212892 is not available on file 3 and file 4 hence it should not be written onto output
Key 0345477383222 is not available on file 2,file 3,file 4 hence it should not be written onto output
Keys 0501082612893 and 0489458212903 are available on four files. Hence those needs to be written onto output
as shown below

Output file

name      length   start    end
F1-Field1   8   1   8
F1-Field2   6   9   14
F1-Field3   5   15   19
F1-Field5   5   20   24
F1-Field4   30   25   54
F2-Field6   8   55   62
F1-Field6   12   63   74
F2-Field5   1   75   75
F2-Field4   1   76   76
F2-Field3   8   77   84
F3-Field3   8   85   92
F4-Field3   1   93   93

though F1-Field4 is of length 20 it needs to be written as 30 bytes length with 10 bytes spaces.
from F1-Field2 first 6 bytes needs to be written onto output.


----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9---
********************************* Top of Data ***********************************************
0501082612893 1021 1    40477332363003                200708311049        002004042620090508U     
0501082612893 1021 3    40477332363011                200708310           002004042620090508U     
0501082612893 1021 8    4543120131583538              200708310           002004042620090508U     
0501082612893 1021 12                                 200708310           002004042620090508U     
0501082612893 1026 3                                  2007083141037       002004042620090508U     
0501082612893 1035 1                                  200708311738        002004042620090508U     
0501082612893 1035 3                                  200708311           002004042620090508U     
0501082612893 1048 1                                  200708311037        002004042620090508U     
0501082612893 1048 3                                  2007083125000       002004042620090508U     
0489458212903 1021 1    40477361641204                20070831412         002004042620071227U         
0489458212903 1021 2    40477391690809                200708310           002004042620071227U             
0489458212903 1021 3    40477332650177                200708310           002004042620071227U             
0489458212903 1021 12                                 200708310           002004042620071227U             
0489458212903 3005 8                                  20070831-797        002004042620071227U             
0489458212903 9017 2                                  200708310           002004042620071227U             
0489458212903 9017 2                                  200708310           002004042620071227U             
0489458212903 9017 2                                  200708310           002004042620071227U             
savitha_y
 
Posts: 14
Joined: Mon May 11, 2009 3:05 pm
Has thanked: 0 time
Been thanked: 0 time

Re: match data across four files and write matching data

Postby Frank Yaeger » Thu Jun 04, 2009 1:17 am

You can use a DFSORT/ICETOOL job like the following. The output records will be in sorted order by the key. If you really want them in their original file1 input order as shown in your example, you'll need another pass (to add sequence numbers to file1 and sort on them).

//S1   EXEC  PGM=ICETOOL
//TOOLMSG   DD  SYSOUT=*
//DFSMSG    DD  SYSOUT=*
//IN1 DD DSN=...  input file1 (FB/58)
//IN2 DD DSN=...  input file1 (FB/34)
//IN3 DD DSN=...  input file1 (FB/24)
//IN4 DD DSN=...  input file1 (FB/17)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(MOD,PASS)
//T2 DD DSN=&&T2,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(MOD,PASS)
//OUT DD DSN=...  output file (FB/93)
//TOOLIN DD *
COPY FROM(IN2) TO(T1) USING(CTL1)
COPY FROM(IN3) TO(T1) USING(CTL2)
COPY FROM(IN4) TO(T1) USING(CTL3)
SPLICE FROM(T1) TO(T2) ON(1,14,CH) WITHANY -
 WITH(55,8) WITH(75,1) WITH(76,1) WITH(77,8) WITH(85,8) WITH(93,1)-
 WITH(94,1) WITH(95,1) WITH(96,1) USING(CTL4)
COPY FROM(IN1) TO(T2) USING(CTL5)
SPLICE FROM(T2) TO(OUT) ON(1,14,CH) WITHALL -
  WITH(1,54) WITH(63,12) WITH(94,1) USING(CTL6)
/*
//CTL1CNTL DD *
  INREC BUILD=(1,14,55:27,8,75:26,1,76:25,1,77:17,8,94:C'B',2X)
/*
//CTL2CNTL DD *
  INREC BUILD=(1,14,85:17,8,95:C'B',X)
/*
//CTL3CNTL DD *
  INREC BUILD=(1,14,93:17,1,96:C'B')
/*
//CTL4CNTL DD *
  OUTFIL FNAMES=T2,INCLUDE=(94,3,CH,EQ,C'BBB')
/*
//CTL5CNTL DD *
  INREC BUILD=(1,14,15:17,5,20:42,5,25:22,20,63:47,12,94:C'VV',X)
/*
//CTL6CNTL DD *
  OUTFIL FNAMES=OUT,INCLUDE=(94,2,CH,EQ,C'VB'),
    BUILD=(1,93)
/*
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1080
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 14 times

Next

Return to DFSORT/ICETOOL/ICEGENER

 


  • Related topics
    Replies
    Views
    Last post