Keep only the highest score for each subject



IBM's flagship sort product DFSORT for sorting, merging, copying, data manipulation and reporting. Includes ICETOOL and ICEGENER

Keep only the highest score for each subject

Postby cbrio » Fri Apr 18, 2008 9:19 am

I have exam data for which I want to keep only the highest score for each ID by subject.

Sample input:

bytes 1-9 represent the ID, 10-12 subject, 13-22 date, and 23-25 score (scores are character fields because they contain the values 000-100 and bbP, bbF, and Wbb (bbP is a higher score than Wbb, Wbb is a higher score than bbF; bbP is equivalent to scores GE 055, bbF is equivalent to scores LE 054).

12345678904806/10/2004065
12345678904801/10/2003048
12345678904806/10/2005086
12345678905901/10/2003bbP
12345678905906/10/2002bbF
12345678905901/10/2005bbW
98765432120306/10/2002000
98765432120306/10/2003025
98765432120301/10/2003048
98765432130406/10/2005Wbb
98765432130401/10/2006bbF

Desired output
12345678904806/10/2005086
12345678905901/10/2003bbP
98765432120301/10/2003048
98765432130406/10/2005Wbb


I tried using ICETOOL SELECT operator, but couldn't get it to work. Any suggestions would be greatly appreciated.

Thanks.
Cindy
cbrio
 
Posts: 15
Joined: Thu Apr 17, 2008 11:34 am
Has thanked: 0 time
Been thanked: 0 time

Re: Keep only the highest score for each subject

Postby Frank Yaeger » Fri Apr 18, 2008 11:15 pm

Your description of which scores are higher than other scores is rather difficult to understand.

Please just list the scores in order from highest to lowest - something like (and I doubt this is correct):

bbP
Wbb
100
...
000
bbF
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1079
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 15 times

Re: Keep only the highest score for each subject

Postby cbrio » Mon Apr 21, 2008 7:55 am

Here's how I would rank them if I were selecting them by hand. Though I'm not entirely confident in this order because the decision rules are complex. (A score of W means that the test was not required, however, if they took the test and passed even though it is not required, we want to record this. But if they fail it, we don't want to penalize them for attempting it. The regular passing score is 65, scores between 55 and 64 are considered a variant pass, and anything below 55 is failing.) Numeric scores take precedence over P/F. Scores of W take precedence over failing scores (alpha or numeric). Passing scores (alpha or numeric) take precedence over W.

100
99
.
.
.
65
bbP
64
63
.
.
.
55
Wbb
54
53
52
.
.
.
0
bbF
cbrio
 
Posts: 15
Joined: Thu Apr 17, 2008 11:34 am
Has thanked: 0 time
Been thanked: 0 time

Re: Keep only the highest score for each subject

Postby Frank Yaeger » Mon Apr 21, 2008 9:21 pm

If you're not sure of the "rules", then it's difficult to tell you how to encode them in a DFSORT job.

However, conceptually you can just add a new field with the score multiplied by 10 for the numeric values, and values in between for the non-numeric values: 645 for ' P', 545 for 'W ' and -01 for ' F'. You can then sort on the id and subject ascending and the new field descending to get the records in the correct order. Finally, you can use OUTFIL with SECTIONS and HEADER3 to get the records you want. Here's a DFSORT job that shows how to do that:

//S1    EXEC  PGM=ICEMAN
//SYSOUT    DD  SYSOUT=*
//SORTIN DD DSN=...  input file
//SORTOUT DD DSN=...  output file
//SYSIN    DD    *
  INREC IFTHEN=(WHEN=(23,3,CH,EQ,C'  P'),OVERLAY=(26:C'645')),
    IFTHEN=(WHEN=(23,3,CH,EQ,C'W'),OVERLAY=(26:C'545')),
    IFTHEN=(WHEN=(23,3,CH,EQ,C'  F'),OVERLAY=(26:C'-01')),
    IFTHEN=(WHEN=NONE,
      OVERLAY=(26:23,3,ZD,MUL,+10,TO=ZD,LENGTH=3))
  SORT FIELDS=(1,9,CH,A,10,3,CH,A,26,3,FS,D)
  OUTFIL REMOVECC,NODETAIL,
    BUILD=(1,25),
    SECTIONS=(1,12,
      HEADER3=(1,25))
/*
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1079
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 15 times

Re: Keep only the highest score for each subject

Postby cbrio » Tue Apr 22, 2008 9:27 pm

Thank you very much Frank.

I have one more question. Does the LENGTH=3 parameter refer to the length of the newly created multiple of the score?
cbrio
 
Posts: 15
Joined: Thu Apr 17, 2008 11:34 am
Has thanked: 0 time
Been thanked: 0 time

Re: Keep only the highest score for each subject

Postby Frank Yaeger » Tue Apr 22, 2008 9:36 pm

Does the LENGTH=3 parameter refer to the length of the newly created multiple of the score?


Yes, it sets the length of the new score to 3 bytes (all we need) in positions 26-28. Without LENGTH=3, the length would be 15 by default.
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
User avatar
Frank Yaeger
Global moderator
 
Posts: 1079
Joined: Sat Jun 09, 2007 8:44 pm
Has thanked: 0 time
Been thanked: 15 times


Return to DFSORT/ICETOOL/ICEGENER