Page 1 of 1

Keep only the highest score for each subject

PostPosted: Fri Apr 18, 2008 9:19 am
by cbrio
I have exam data for which I want to keep only the highest score for each ID by subject.

Sample input:

bytes 1-9 represent the ID, 10-12 subject, 13-22 date, and 23-25 score (scores are character fields because they contain the values 000-100 and bbP, bbF, and Wbb (bbP is a higher score than Wbb, Wbb is a higher score than bbF; bbP is equivalent to scores GE 055, bbF is equivalent to scores LE 054).

12345678904806/10/2004065
12345678904801/10/2003048
12345678904806/10/2005086
12345678905901/10/2003bbP
12345678905906/10/2002bbF
12345678905901/10/2005bbW
98765432120306/10/2002000
98765432120306/10/2003025
98765432120301/10/2003048
98765432130406/10/2005Wbb
98765432130401/10/2006bbF

Desired output
12345678904806/10/2005086
12345678905901/10/2003bbP
98765432120301/10/2003048
98765432130406/10/2005Wbb


I tried using ICETOOL SELECT operator, but couldn't get it to work. Any suggestions would be greatly appreciated.

Thanks.
Cindy

Re: Keep only the highest score for each subject

PostPosted: Fri Apr 18, 2008 11:15 pm
by Frank Yaeger
Your description of which scores are higher than other scores is rather difficult to understand.

Please just list the scores in order from highest to lowest - something like (and I doubt this is correct):

bbP
Wbb
100
...
000
bbF

Re: Keep only the highest score for each subject

PostPosted: Mon Apr 21, 2008 7:55 am
by cbrio
Here's how I would rank them if I were selecting them by hand. Though I'm not entirely confident in this order because the decision rules are complex. (A score of W means that the test was not required, however, if they took the test and passed even though it is not required, we want to record this. But if they fail it, we don't want to penalize them for attempting it. The regular passing score is 65, scores between 55 and 64 are considered a variant pass, and anything below 55 is failing.) Numeric scores take precedence over P/F. Scores of W take precedence over failing scores (alpha or numeric). Passing scores (alpha or numeric) take precedence over W.

100
99
.
.
.
65
bbP
64
63
.
.
.
55
Wbb
54
53
52
.
.
.
0
bbF

Re: Keep only the highest score for each subject

PostPosted: Mon Apr 21, 2008 9:21 pm
by Frank Yaeger
If you're not sure of the "rules", then it's difficult to tell you how to encode them in a DFSORT job.

However, conceptually you can just add a new field with the score multiplied by 10 for the numeric values, and values in between for the non-numeric values: 645 for ' P', 545 for 'W ' and -01 for ' F'. You can then sort on the id and subject ascending and the new field descending to get the records in the correct order. Finally, you can use OUTFIL with SECTIONS and HEADER3 to get the records you want. Here's a DFSORT job that shows how to do that:

//S1    EXEC  PGM=ICEMAN
//SYSOUT    DD  SYSOUT=*
//SORTIN DD DSN=...  input file
//SORTOUT DD DSN=...  output file
//SYSIN    DD    *
  INREC IFTHEN=(WHEN=(23,3,CH,EQ,C'  P'),OVERLAY=(26:C'645')),
    IFTHEN=(WHEN=(23,3,CH,EQ,C'W'),OVERLAY=(26:C'545')),
    IFTHEN=(WHEN=(23,3,CH,EQ,C'  F'),OVERLAY=(26:C'-01')),
    IFTHEN=(WHEN=NONE,
      OVERLAY=(26:23,3,ZD,MUL,+10,TO=ZD,LENGTH=3))
  SORT FIELDS=(1,9,CH,A,10,3,CH,A,26,3,FS,D)
  OUTFIL REMOVECC,NODETAIL,
    BUILD=(1,25),
    SECTIONS=(1,12,
      HEADER3=(1,25))
/*

Re: Keep only the highest score for each subject

PostPosted: Tue Apr 22, 2008 9:27 pm
by cbrio
Thank you very much Frank.

I have one more question. Does the LENGTH=3 parameter refer to the length of the newly created multiple of the score?

Re: Keep only the highest score for each subject

PostPosted: Tue Apr 22, 2008 9:36 pm
by Frank Yaeger
Does the LENGTH=3 parameter refer to the length of the newly created multiple of the score?


Yes, it sets the length of the new score to 3 bytes (all we need) in positions 26-28. Without LENGTH=3, the length would be 15 by default.