- Table of Contents
- Summary of Test Statistics
- Test Frequency Distribution
- Item Difficulty and Discrimination: Quintile Table
- Interpreting Item Statistics
- MERMAC - Test Analysis and Questionnaire Package
TEST ITEM PERFORMANCE: THE ITEM ANALYSIS
The ITEM ANALYSIS output consists of four parts: a summary of test statistics, a
test frequency distribution, an item quintile table, and item statistics. The
analysis can be run for an entire class or, to compare item performance across
test forms, separately for each test form. The Division of Measurement and
Evaluation staff is available to help instructors interpret their item analysis
data.
Summary of Test Statistics
Part I of the ITEM ANALYSIS consists of a summary of the following
statistics:
* * * MERMAC -- TEST ANALYSIS AND QUESTIONNAIRE PACKAGE * * *
SAMPLE ITEM ANALYSIS
SUMMARY OF TEST STATISTICS
NUMBER OF ITEMS: 80
(Number of items on the test.)
MEAN SCORE 60.92
(Arithmetic average; the sum of
all scores divided by the
number of scores.)
MEDIAN SCORE 63.15
(The raw score point that divides the
raw score distribution in half; 50%
of the scores fall above the median
and 50% fall below.)
STANDARD DEVIATION 12.24
(A measure of the spread, or variability,
of the score distribution. Other things
being equal, a larger standard deviation
means the test spreads students out more,
that is, discriminates better among
student performance levels.)
RELIABILITY (KR-20) 0.915
(An estimate of test reliability based
on the internal consistency of the test.
Values normally range from 0.00 to 1.00;
a reliability of .70 or better is
desirable for classroom tests.)
RELIABILITY (KR-21) 0.915
(An estimate of internal-consistency
reliability that assumes all items are
of approximately equal difficulty.
Values normally range from 0.00 to 1.00;
a reliability of .70 or better is
desirable for classroom tests.)
S.E. OF MEASUREMENT 3.58
(The accuracy of measurement expressed
in test score units. The larger the
standard error, the less precise the
measure of student achievement. About
two-thirds of the time, a test taker's
obtained score falls within one standard
error of measurement of his or her true
score.)
POSSIBLE LOW SCORE 0
(The possible low score.)
POSSIBLE HIGH SCORE 80
(The possible high score.)
OBTAINED LOW SCORE 0
(The obtained low score.)
OBTAINED HIGH SCORE 80
(The obtained high score.)
NUMBER OF SCORES 603
(The number of answer sheets submitted
for scoring.)
BLANK SCORES¹ 0
(Number of test scores that could not
be computed.)
INVALID SCORES 0
(Number of test scores outside the range
specified by the user.)
VALID SCORES 603
(Number of scores within the range
specified by the user. Only these scores
are included in the analysis, which gives
the user the option of disregarding
certain scores.)
¹ Blank and invalid scores (those falling outside the specified
range) are counted but omitted from the analysis.
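The Part I statistics can be computed from scored answer sheets. The Python sketch below is illustrative only and is not MERMAC's code. It makes several assumptions. The input is a hypothetical list holding one row of 0/1 item scores per student. Variances are population (divide-by-N) variances. KR-20 and KR-21 use the standard textbook formulas. The median is treated as an interpolated (grouped) median, because the printed 63.15 is not a whole-number score.

```python
from math import sqrt
from statistics import mean, median_grouped, pstdev, pvariance

# Hypothetical input: one row per student, one 0/1 entry per item (1 = correct).

def total_scores(responses):
    """Raw score for each student: the number of items answered correctly."""
    return [sum(row) for row in responses]

def describe(totals):
    """Mean, interpolated median, and population standard deviation."""
    return {"mean": mean(totals),
            "median": median_grouped(totals),  # assumption: interpolated median
            "sd": pstdev(totals)}

def kr20(responses):
    """KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k, n = len(responses[0]), len(responses)
    var_total = pvariance(total_scores(responses))
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)

def kr21(responses):
    """KR-21: like KR-20, but assumes every item has the same difficulty."""
    k = len(responses[0])
    totals = total_scores(responses)
    m, var_total = mean(totals), pvariance(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * var_total))

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)
```

With the sample's printed values, `sem(12.24, 0.915)` comes to about 3.57, close to the printed 3.58. The small gap may come from rounding of the printed reliability.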
Test Frequency Distribution
Part II of the ITEM ANALYSIS program displays a test frequency distribution. The
raw scores are ordered from high to low with corresponding statistics:
- Standard score: a linear transformation of the raw score that sets the mean
equal to 500 and the standard deviation equal to 100. In normal score
distributions for classes of 500 or more students, standard scores usually fall
between 200 and 800 (within three standard deviations of the mean); for classes
with fewer than 30 students, they usually fall within two standard deviations of
the mean, i.e., between 300 and 700.
- Percentile rank: the percentage of individuals who received a score lower than
the given score, plus half the percentage of individuals who received the given
score. This measure indicates a person's relative position within a group.
- Percent: the percentage of people in the total group who received the given
score.
- Frequency: the number of individuals who received the given score.
- Cumulative frequency: the number of individuals who scored at or below the
given score.
* * * MERMAC -- TEST ANALYSIS AND QUESTIONNAIRE PACKAGE * * *
SAMPLE ITEM ANALYSIS
TEST FREQUENCY DISTRIBUTION
RAW SCORE  STANDARD SCORE  PERCENTILE  PERCENT  FREQ  CUM FREQ  (EACH * REPRESENTS 1 PERSON)
92 717 99 0.2 1 603 *
91 708 99 0.3 2 602 **
90 700 99 0.0 0 600
89 691 99 0.2 1 600 *
88 683 99 0.8 5 599 *****
87 675 99 0.3 2 594 **
86 666 98 1.0 6 592 ******
85 658 97 1.3 8 586 ********
84 649 96 1.2 7 578 *******
83 641 95 2.0 12 571 ************
82 632 93 1.7 10 559 **********
81 624 91 1.5 9 549 *********
80 615 90 1.5 9 540 *********
79 607 88 2.8 17 531 *****************
78 598 85 4.1 25 514 *************************
77 590 81 2.3 14 489 **************
76 562 79 4.0 24 475 ************************
75 573 75 2.2 13 451 *************
74 565 73 3.3 20 438 ********************
73 556 69 2.0 12 418 ************
72 548 67 3.8 23 406 ***********************
71 539 64 2.8 17 383 *****************
70 531 61 3.0 18 366 ******************
69 522 58 3.2 19 326 *******************
67 505 51 3.6 22 307 **********************
66 497 47 3.8 23 285 ***********************
65 489 43 2.7 16 262 ****************
64 480 41 3.2 19 246 *******************
63 472 38 2.5 15 227 ***************
62 463 35 3.2 19 212 *******************
61 455 32 2.5 15 193 ***************
60 446 30 1.8 11 178 ***********
59 438 28 2.3 14 167 **************
58 429 25 3.0 18 153 ******************
57 421 22 1.7 10 135 **********
56 413 21 3.2 12 106 ************
54 396 16 1.7 10 94 **********
53 387 14 1.5 9 84 *********
52 379 12 1.2 7 75 *******
51 370 11 2.0 12 68 ************
50 362 9 1.2 7 56 *******
49 353 8 1.3 8 49 ********
48 345 7 1.7 10 41 **********
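The Part II columns can be sketched from a list of raw scores. This Python fragment is a hypothetical illustration and is not MERMAC's code. It applies the definitions above: standard score = 500 + 100 × z, and percentile rank = percent scoring below plus half the percent at the score. It uses the population standard deviation. It lists every raw score from the highest obtained to the lowest obtained, including scores nobody received, as the printout does for a raw score of 90.

```python
from collections import Counter
from statistics import mean, pstdev

def frequency_distribution(totals):
    """One row per raw score, high to low:
    (raw, standard score, percentile rank, percent, freq, cum freq)."""
    n = len(totals)
    m, sd = mean(totals), pstdev(totals)
    counts = Counter(totals)            # scores nobody received count as 0
    rows = []
    at_or_below = n                     # students scoring at or below `raw`
    for raw in range(max(totals), min(totals) - 1, -1):
        freq = counts[raw]
        below = at_or_below - freq      # students scoring strictly below `raw`
        standard = round(500 + 100 * (raw - m) / sd)
        pct_rank = 100 * (below + freq / 2) / n
        percent = 100 * freq / n
        rows.append((raw, standard, pct_rank, percent, freq, at_or_below))
        at_or_below = below
    return rows

def histogram_bar(freq):
    """The printout's bar: one '*' per person."""
    return "*" * freq
```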
Item Difficulty and Discrimination: Quintile Table
Part III of the ITEM ANALYSIS output, an item quintile table, aids in
interpreting Part IV, which compares each item's responses with the total score
distribution. A good item discriminates between students who scored high and
students who scored low on the examination as a whole. To compare different
performance levels, the score distribution is divided into fifths, or quintiles.
The first fifth includes students who scored between the 81st and 100th
percentiles; the second fifth includes students who scored between the 61st and
80th percentiles; and so forth. Because all students with the same raw score are
placed in the same quintile, the fifths are seldom exactly equal in size. When the
score distribution is skewed, more than one-fifth of the students may fall within
a given quintile and, as a result, fewer than one-fifth may fall within another.
The table gives the sample size, the proportion of the distribution, and the
score range within each fifth.
* * * MERMAC -- TEST ANALYSIS AND QUESTIONNAIRE PACKAGE * * *
THE QUINTILE GRAPH AND MATRIX OF RESPONSES
APPEARING WITH EACH ITEM ARE BASED ON THE
STATISTICS INDICATED IN THE TABLE BELOW:
QUINTILE SAMPLE SIZE PROPORTION SCORE RANGE
1ST 128 0.21 77 - 92
2ND 127 0.21 70 - 76
3RD 121 0.20 64 - 69
4TH 121 0.20 56 - 63
5TH 106 0.18 24 - 55
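The quintile table can be sketched as follows. The tie rule is an assumption and is not documented MERMAC behavior. Students with the same raw score always share a quintile. A score group is placed in quintile floor(5 × n_above / N) + 1, where n_above is the number of students scoring higher. Applied to the cumulative frequencies in the sample, this rule gives the same score ranges and sample sizes as the table above. For example, the 14 students at 77 have 114 students above them, so they join the first fifth, which then holds 128 students, more than one-fifth of 603.

```python
from collections import Counter

def assign_quintiles(totals):
    """Map each raw score to a quintile (1 = top fifth, 5 = bottom fifth).

    Assumed tie rule: a score group goes to quintile floor(5 * n_above / N) + 1,
    where n_above is the number of students who scored strictly higher.
    """
    n = len(totals)
    counts = Counter(totals)
    quintile_of = {}
    n_above = 0
    for raw in sorted(counts, reverse=True):
        quintile_of[raw] = 5 * n_above // n + 1
        n_above += counts[raw]
    return quintile_of

def quintile_table(totals):
    """Rows of (quintile, sample size, proportion, low score, high score)."""
    quintile_of = assign_quintiles(totals)
    n = len(totals)
    table = []
    for q in range(1, 6):
        members = [t for t in totals if quintile_of[t] == q]
        if members:  # a fifth can be empty when many students share a score
            table.append((q, len(members), round(len(members) / n, 2),
                          min(members), max(members)))
    return table
```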
Interpreting Item Statistics
Part IV of ITEM ANALYSIS presents item statistics that can help determine which
items are good and which need improvement or deletion from the examination. The
quintile graph on the left side of the output shows the percentage of students
within each fifth who answered the item correctly. A good, discriminating item is
one that students who scored well on the examination answered correctly more
often than students who did not score well. The points on the quintile graph
should therefore form a line running from the bottom left-hand corner (low
percent correct in the bottom fifth) to the top right-hand corner (high percent
correct in the top fifth). Item 1 in the sample output shows this type of
positive linear relationship. Item 2 is also a discriminating item: although few
students answered it correctly, students in the first fifth answered it correctly
more often than students in the rest of the score distribution. Item 3 is a poor
item; its graph shows no relationship between the fifths of the score
distribution and the percentage of correct responses. However, this item was
probably miskeyed by the instructor. Note the response pattern for alternative
B, which 125 of the 128 students in the first fifth chose.
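For one item, the quintile graph plots the percentage of each fifth that chose the keyed answer. Below is a minimal Python sketch. It assumes hypothetical parallel lists holding each student's answer and quintile. The top-minus-bottom difference is an added rough check (an upper-lower discrimination index). MERMAC does not print this statistic.

```python
def percent_correct_by_fifths(answers, quintiles, key):
    """Percent of each fifth (1 = top) whose answer matches the key.

    answers[i]: student i's choice, e.g. "A".."E", or None for an omit.
    quintiles[i]: student i's quintile, 1 through 5.
    """
    pcts = {}
    for q in range(1, 6):
        group = [a for a, g in zip(answers, quintiles) if g == q]
        if group:
            pcts[q] = 100 * sum(a == key for a in group) / len(group)
    return pcts

def top_minus_bottom(pcts):
    """Rough check: positive when the top fifth outscores the bottom fifth."""
    return pcts[1] - pcts[5]
```

Take item 1 in the sample (key E). Its matrix gives 102 of 128 correct in the top fifth (about 80%) and 5 of 106 correct in the bottom fifth (about 5%). Item 3 (key E) has 0 of 128 correct in the top fifth, so its difference is negative.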
A. Evaluating Item Distractors: Matrix of Responses
On the right-hand side of the output, a matrix of responses by fifths shows how
many students within each fifth chose each alternative and how many omitted the
item. This information can help identify distractors, or incorrect alternatives,
that are not working because (a) they are not plausible, so few or no students
chose them (see alternatives D and E, item 2), or (b) too many students,
especially students in the top fifths of the distribution, chose the incorrect
alternative instead of the correct response (see alternative B, item 3). In a
good item, students in the top fifths choose the correct response more often than
students in the lower fifths, and students in the lower fifths choose the
incorrect alternatives more often than students in the top fifths. The correct
response is named on the right-hand side of each item's heading (e.g., "E IS
CORRECT RESPONSE") and is enclosed in parentheses in the matrix.
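The matrix of responses and the two distractor problems described above can be sketched in Python. Three choices here are hypothetical, for illustration only: the five-choice layout, the use of `None` for an omit, and the 2% "rarely chosen" cutoff. The text itself gives no numeric cutoff.

```python
from collections import Counter

OPTIONS = ["A", "B", "C", "D", "E"]   # assumed five-choice item
COLUMNS = OPTIONS + [None]            # None stands for an omitted item

def response_matrix(answers, quintiles):
    """{fifth: [count of A, B, C, D, E, OMIT]} for one item."""
    tallies = {q: Counter() for q in range(1, 6)}
    for a, q in zip(answers, quintiles):
        tallies[q][a] += 1
    return {q: [c[col] for col in COLUMNS] for q, c in tallies.items()}

def weak_distractors(matrix, key, min_share=0.02):
    """Flag distractors that are (a) rarely chosen or (b) drawn to the top fifth."""
    total = sum(sum(row) for row in matrix.values())
    top_n, bottom_n = sum(matrix[1]), sum(matrix[5])
    flags = {}
    for j, opt in enumerate(OPTIONS):
        if opt == key:
            continue
        chosen = sum(row[j] for row in matrix.values())
        if chosen / total < min_share:
            flags[opt] = "rarely chosen; may not be plausible"
        elif top_n and bottom_n and matrix[1][j] / top_n > matrix[5][j] / bottom_n:
            flags[opt] = "chosen more often by the top fifth than the bottom fifth"
    return flags
```

The sample matrices give results that agree with the examples in the text. Item 2 (key A) flags D and E, and item 3 (key E) flags B.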
B. Item Difficulty: The PROP Statistic
The proportion (PROP) of students who chose each alternative and who omitted
the item is printed in the first row below the matrix. The item difficulty is
the proportion of students in the sample who answered the item correctly (the
PROP value in parentheses). To obtain the maximum spread of student scores, it
is best to use items of moderate difficulty. Moderate difficulty can be defined
as the point halfway between a perfect score and the chance score. For a
five-choice item, moderate difficulty is .60, or a range of .50 to .70, because
100% correct is perfect and about 20% of the group would be expected to answer
the item correctly by blind guessing: (1.00 + .20) / 2 = .60.
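The PROP row and the moderate-difficulty rule come down to simple arithmetic. The Python sketch below assumes a hypothetical matrix layout of `{fifth: [A, B, C, D, E, OMIT] counts}`. Run on item 1's matrix, it reproduces that item's printed PROP row.

```python
def prop_row(matrix):
    """PROP: proportion of all students choosing each alternative, then OMIT."""
    total = sum(sum(row) for row in matrix.values())
    columns = zip(*matrix.values())          # transpose: one tuple per alternative
    return [round(sum(col) / total, 2) for col in columns]

def moderate_difficulty(n_choices):
    """Halfway between a perfect score (1.00) and the chance score (1/n_choices)."""
    chance = 1 / n_choices
    return (1 + chance) / 2
```

`moderate_difficulty(5)` gives .60, as in the text. For a four-choice item, the same rule gives .625.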
Evaluating Item Difficulty. For the most part, items which are too
easy or too difficult cannot discriminate adequately between student performance
levels. Item 2 in the sample output is an exception; although the item
difficulty is .23, the item is a good, discriminating one. In item 4, everyone
correctly answered the item; the item difficulty is 1.00. Such an item does not
discriminate at all between good and poor students, and therefore does not
contribute statistically to the effectiveness of the examination. However, if
one of the instructor's goals is to check that all students grasp certain basic
concepts and if the examination is long enough to contain a sufficient number of
discriminating items, then such an item may remain on the examination.
C. Item Discrimination: Point Biserial Correlation (RPBI)
Interpreting the RPBI Statistic. The point biserial correlation
(RPBI) for each alternative and for omits is printed below the PROP row. It
indicates the relationship between the item response and the total test
score within the group tested, i.e., it measures the discriminating power of
an item. It is interpreted like other correlation coefficients.
Assuming that the total test score accurately discriminates among
individuals in the group tested, high positive RPBIs for the correct
responses represent the most discriminating items. That is, students
who chose the correct response scored well on the examination, whereas
students who did not choose the correct response did not score well on the
examination. It is also useful to check the RPBIs for the item
distractors, or incorrect alternatives. For a distractor, the relationship
between total score and choice of the alternative should run in the opposite
direction from that for the correct alternative. Whereas a high positive RPBI
is desired for the correct alternative, a high negative RPBI is
desirable for a distractor, i.e., students who chose the
incorrect alternative did not score well on the total examination.
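The RPBI for one alternative is the correlation between a 0/1 flag (chose the alternative or not) and the total score. The Python sketch below uses the standard point-biserial formula. The source does not say whether MERMAC's total score includes the item being analyzed, so the sketch simply takes whatever totals it is given.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(chose, totals):
    """RPBI = (M1 - M0) / SD * sqrt(p * q).

    chose[i] is 1 if student i picked the alternative, else 0. M1 and M0 are
    the mean totals of choosers and non-choosers, SD is the population SD of
    all totals, and p is the proportion choosing. This equals the Pearson r
    between the 0/1 flag and the total. Returns 0.0 when nobody or everybody
    chose the alternative, as for item 4 in the sample output.
    """
    n = len(totals)
    p = sum(chose) / n
    sd = pstdev(totals)
    if p in (0, 1) or sd == 0:
        return 0.0
    m1 = mean(t for c, t in zip(chose, totals) if c)
    m0 = mean(t for c, t in zip(chose, totals) if not c)
    return (m1 - m0) / sd * sqrt(p * (1 - p))
```

A distractor chosen mostly by low scorers yields a negative value, which is the desired pattern.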
Because a continuous variable (total examination score) is being correlated
with a dichotomous variable (choosing vs. not choosing an alternative), the
highest possible RPBI is about .80 rather than the usual maximum of 1.00 for a
correlation (assuming total scores are roughly normally distributed). This
maximum is directly influenced by the item difficulty level. The ceiling of
about .80 occurs for items of moderate difficulty; the further the difficulty
level deviates from moderate in either direction, the lower the ceiling on
RPBI. For example, the maximum RPBI is about .58 for difficulty levels of .10 or
.90. Therefore, to maximize item discrimination, items of moderate difficulty are
preferred, although easy and difficult items can still be discriminating (see
item 2 in the sample output).
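The ceiling values quoted above follow from normal theory. Suppose total scores are normally distributed. Then the largest possible point-biserial for an item with difficulty p is φ(z)/√(pq). Here φ is the standard normal density and z is the cut point with a proportion p above it. The Python sketch below uses the standard library's `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

def max_point_biserial(p):
    """Ceiling on RPBI for difficulty p, assuming normally distributed totals."""
    if not 0 < p < 1:
        return 0.0                      # no variation in responses: r is 0
    std_normal = NormalDist()           # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1 - p)       # cut point leaving proportion p above it
    return std_normal.pdf(z) / (p * (1 - p)) ** 0.5
```

`max_point_biserial(0.5)` is about .80, and `max_point_biserial(0.1)` or `max_point_biserial(0.9)` is about .58, matching the figures in the text. At the five-choice moderate difficulty of .60, the ceiling is still close to .79.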
Evaluating Item Discrimination. When an instructor examines the item
analysis data, the RPBI is an important indicator in deciding which items
are discriminating and should be retained, and which items are not
discriminating and should be revised or replaced by a better item (other
content considerations aside). The quintile graph also illustrates this same
relationship between item response and total scores. However, the RPBI is a
more accurate representation of this relationship. An item with an RPBI of
.25 or below should be examined critically for revision or deletion; items
with RPBIs of .40 and above are good discriminators. Note that all items,
not only those with RPBIs lower than .25, can be improved. An examination of
the matrix of responses by fifths for all items may point out weaknesses,
such as implausible distractors, that can be reduced by modifying the item.
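The rules of thumb above can be bundled into a quick screening helper. This is a hypothetical sketch and is not part of MERMAC. The RPBI cutoffs (.25 and .40) come from the text. Extending the ±.10 moderate-difficulty band from five-choice items to other item formats is an assumption. The helper's output is only a prompt for review; content and validity still decide.

```python
def review_item(rpbi_key, prop_key, n_choices=5):
    """Hypothetical screening notes for one item's keyed (correct) alternative."""
    notes = []
    if rpbi_key <= 0.25:
        notes.append("RPBI of .25 or below: examine for revision or deletion")
    elif rpbi_key >= 0.40:
        notes.append("RPBI of .40 or above: good discriminator")
    moderate = (1 + 1 / n_choices) / 2         # e.g. .60 for five choices
    if abs(prop_key - moderate) > 0.10:        # assumed band of +/- .10
        notes.append(f"difficulty {prop_key:.2f} is outside "
                     f"{moderate - 0.1:.2f}-{moderate + 0.1:.2f}")
    return notes
```

Item 3 in the sample (RPBI .13, PROP .05) triggers both checks. For item 1 (RPBI .51, PROP .44), the RPBI is good, but the item is harder than the moderate band.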
It is important to keep in mind that the statistical functioning of an item
should not be the sole basis for deleting or retaining an item. The most
important quality of a classroom test is its validity, the extent to which
items measure relevant tasks. Items that perform poorly statistically might
be retained (and perhaps revised) if they correspond to specific
instructional objectives in the course. Items that perform well
statistically but are not related to specific instructional objectives
should be reviewed carefully before being reused.
* * * MERMAC -- TEST ANALYSIS AND QUESTIONNAIRE PACKAGE * * *
ITEM 1                                        E IS CORRECT RESPONSE
[Quintile graph: PERCENT OF CORRECT RESPONSE BY FIFTHS, 1ST to 5TH, scale 0 to 100]
MATRIX OF RESPONSES BY FIFTHS
         A       B       C       D      (E)    OMIT
1ST      0      25       1       0     102       0
2ND      1      45       6       0      75       0
3RD      1      63       5       3      49       0
4TH      2      76       9       0      34       0
5TH     11      73      13       4       5       0
PROP  0.02    0.47    0.06    0.01   (0.44)   0.00
RPBI -0.20   -0.33   -0.20   -0.13   (0.51)   0.00
ITEM 2                                        A IS CORRECT RESPONSE
[Quintile graph: PERCENT OF CORRECT RESPONSE BY FIFTHS, 1ST to 5TH, scale 0 to 100]
MATRIX OF RESPONSES BY FIFTHS
        (A)      B       C       D       E     OMIT
1ST      83     35      10       0       0       0
2ND      19     85      23       0       0       0
3RD      17     67      37       0       0       0
4TH      13     78      30       0       0       0
5TH       6     84      16       0       0       0
PROP  (0.23)  0.57    0.19    0.00    0.00    0.00
RPBI  (0.43) -0.33   -0.05    0.00    0.00    0.00
ITEM 3                                        E IS CORRECT RESPONSE
[Quintile graph: PERCENT OF CORRECT RESPONSE BY FIFTHS, 1ST to 5TH, scale 0 to 100]
MATRIX OF RESPONSES BY FIFTHS
         A       B       C       D      (E)    OMIT
1ST      2     125       0       1       0       0
2ND      6     109       0       8       4       0
3RD     14      86       4       7      10       0
4TH     23      71       2      19       6       0
5TH     29      45       8      15       8       1
PROP  0.12    0.72    0.02    0.08   (0.05)   0.00
RPBI -0.24    0.45   -0.16   -0.17   (0.13)  -0.14
ITEM 4                                        E IS CORRECT RESPONSE
[Quintile graph: PERCENT OF CORRECT RESPONSE BY FIFTHS, 1ST to 5TH, scale 0 to 100]
MATRIX OF RESPONSES BY FIFTHS
         A       B       C       D      (E)    OMIT
1ST      0       0       0       0     128       0
2ND      0       0       0       0     127       0
3RD      0       0       0       0     121       0
4TH      0       0       0       0     121       0
5TH      0       0       0       0     106       0
PROP  0.00    0.00    0.00    0.00   (1.00)   0.00
RPBI  0.00    0.00    0.00    0.00   (0.00)   0.00
Kathy Duvall
Measurement and Evaluation
Room 247 Armory, MC-528
505 East Armory Avenue Champaign, IL 61820
(217) 333-3490, e-mail: kduvall@illinois.edu
University of Illinois at Urbana-Champaign