
 

Improving Your Test Questions

Table of Contents

Choosing between Objective and Subjective Test Items

Suggestions for Using and Writing Test Items

Multiple Choice

True-False

Matching

Completion

Essay

Problem Solving

Performance

Two Methods for Assessing Test Item Quality

Assistance Offered by The Center for Teaching Excellence (CTE)

References for Further Reading

 

I. CHOOSING BETWEEN OBJECTIVE AND SUBJECTIVE TEST ITEMS

There are two general categories of test items: (1) objective items, which require students to select the correct response from several alternatives or to supply a word or short phrase to answer a question or complete a statement; and (2) subjective or essay items, which permit the student to organize and present an original answer. Objective items include multiple-choice, true-false, matching and completion, while subjective items include short-answer essay, extended-response essay, problem solving and performance test items. For some instructional purposes one or the other item type may prove more efficient and appropriate. To begin our discussion of the relative merits of each type of test item, test your knowledge of these two item types by answering the following questions.

Test Item Quiz

 

(circle the correct answer)

1. Essay exams are easier to construct than are objective exams.   T   F   ?

2. Essay exams require more thorough student preparation and study time than objective exams.   T   F   ?

3. Essay exams require writing skills where objective exams do not.   T   F   ?

4. Essay exams teach a person how to write.   T   F   ?

5. Essay exams are more subjective in nature than are objective exams.   T   F   ?

6. Objective exams encourage guessing more so than essay exams.   T   F   ?

7. Essay exams limit the extent of content covered.   T   F   ?

8. Essay and objective exams can be used to measure the same content or ability.   T   F   ?

9. Essay and objective exams are both good ways to evaluate a student's level of knowledge.   T   F   ?

Quiz Answers

1.

TRUE

Essay items are generally easier and less time consuming to construct than are most objective test items. Technically correct and content appropriate multiple-choice and true-false test items require an extensive amount of time to write and revise. For example, a professional item writer produces only 9-10 good multiple-choice items in a day's time.

2.

?

According to research findings it is still undetermined whether or not essay tests require or facilitate more thorough (or even different) student study preparation.

3.

TRUE

Writing skills do affect a student's ability to communicate the correct "factual" information through an essay response. Consequently, students with good writing skills have an advantage over students who have difficulty expressing themselves through writing.

4.

FALSE

Essays do not teach a student how to write, but they can emphasize the importance of being able to communicate through writing. Constant use of essay tests may encourage the student who is knowledgeable but writes poorly to improve his/her writing ability in order to improve performance.

5.

TRUE

Essays are more subjective in nature due to their susceptibility to scoring influences. Different readers can rate identical responses differently; the same reader can rate the same paper differently over time; handwriting, neatness or punctuation can unintentionally affect a paper's grade; and the lack of anonymity can affect the grading process. While impossible to eliminate, scoring influences or biases can be minimized through procedures discussed later in this booklet.

6.

?

Both item types encourage some form of guessing. Multiple-choice, true-false and matching items can be correctly answered through blind guessing, yet essay items can be responded to satisfactorily through well written bluffing.

7.

TRUE

Due to the extent of time required by the student to respond to an essay question, only a few essay questions can be included on a classroom exam. Consequently, a larger number of objective items can be tested in the same amount of time, thus enabling the test to cover more content.

8.

TRUE

Both item types can measure similar content or learning objectives. Research has shown that students respond almost identically to essay and objective test items covering the same content. Studies¹ by Sax & Collet (1968) and Paterson (1926), conducted forty-two years apart, reached the same conclusion:

"...there seems to be no escape from the conclusions that the two types of exams are measuring identical things." (Paterson, p. 246)

This conclusion should not be surprising; after all, a well written essay item requires that the student (1) have a store of knowledge, (2) be able to relate facts and principles, and (3) be able to organize such information into a coherent and logical written expression, whereas an objective test item requires that the student (1) have a store of knowledge, (2) be able to relate facts and principles, and (3) be able to organize such information into a coherent and logical choice among several alternatives.

9.

TRUE

Both objective and essay test items are good devices for measuring student achievement. However, as seen in the previous quiz answers, there are particular measurement situations where one item type is more appropriate than the other. Following is a set of recommendations for using either objective or essay test items: (Adapted from Robert L. Ebel, Essentials of Educational Measurement, 1972, p. 144).

¹ Gilbert Sax and LeVerne S. Collet, "An Empirical Comparison of the Effects of Recall and Multiple-Choice Tests on Student Achievement," Journal of Educational Measurement, vol. 5 (1968), 169-73.

Donald G. Paterson, "Do New and Old Type Examinations Measure Different Mental Functions?" School and Society, vol. 24 (August 21, 1926), 246-48.

WHEN TO USE ESSAY OR OBJECTIVE TESTS

Essay tests are especially appropriate when:

  • the group to be tested is small and the test is not to be reused.
  • you wish to encourage and reward the development of student skill in writing.
  • you are more interested in exploring the student's attitudes than in measuring his/her achievement.
  • you are more confident of your ability as a critical and fair reader than as an imaginative writer of good objective test items.

Objective tests are especially appropriate when:

  • the group to be tested is large and the test may be reused.
  • highly reliable test scores must be obtained as efficiently as possible.
  • impartiality of evaluation, absolute fairness, and freedom from possible test scoring influences (e.g., fatigue, lack of anonymity) are essential.
  • you are more confident of your ability to express objective test items clearly than of your ability to judge essay test answers correctly.
  • there is more pressure for speedy reporting of scores than for speedy test preparation.

Either essay or objective tests can be used to:

  • measure almost any important educational achievement a written test can measure.
  • test understanding and ability to apply principles.
  • test ability to think critically.
  • test ability to solve problems.
  • test ability to select relevant facts and principles and to integrate them toward the solution of complex problems.

In addition to the preceding suggestions, it is important to realize that certain item types are better suited than others for measuring particular learning objectives. For example, learning objectives requiring the student to demonstrate or to show may be better measured by performance test items, whereas objectives requiring the student to explain or to describe may be better measured by essay test items. Matching learning objective expectations with certain item types can help you select an appropriate kind of test item for your classroom exam as well as provide a higher degree of test validity (i.e., testing what is supposed to be tested). To further illustrate, several sample learning objectives and appropriate test items are provided below.

Learning Objective: The student will be able to categorize and name the parts of the human skeletal system.
Most Suitable Test Item: Objective Test Item (M-C, T-F, Matching)

Learning Objective: The student will be able to critique and appraise another student's English composition on the basis of its organization.
Most Suitable Test Item: Essay Test Item (Extended-Response)

Learning Objective: The student will demonstrate safe laboratory skills.
Most Suitable Test Item: Performance Test Item

Learning Objective: The student will be able to cite four examples of satire that Twain uses in Huckleberry Finn.
Most Suitable Test Item: Essay Test Item (Short-Answer)

After you have decided to use an objective exam, an essay exam, or a combination of both, the next step is to select the kind(s) of objective or essay item that you wish to include on the exam. To help you make such a choice, the different kinds of objective and essay items are presented in the following section of this booklet. The various kinds of items are briefly described and compared to one another in terms of their advantages and limitations for use. Also presented is a set of general suggestions for the construction of each item variation.



II. SUGGESTIONS FOR USING AND WRITING TEST ITEMS

 

MULTIPLE-CHOICE TEST ITEMS

The multiple-choice item consists of two parts: (a) the stem, which identifies the question or problem and (b) the response alternatives. Students are asked to select the one alternative that best completes the statement or answers the question. For example,

Sample Multiple-Choice Item

(a) Item Stem: Which of the following is a chemical change?

(b) Response Alternatives:

a. Evaporation of alcohol
b. Freezing of water
*c. Burning of oil
d. Melting of wax

*correct response

Advantages in Using Multiple-Choice Items

Multiple-choice items can provide ...

  • versatility in measuring all levels of cognitive ability.
  • highly reliable test scores.
  • scoring efficiency and accuracy.
  • objective measurement of student achievement or ability.
  • a wide sampling of content or objectives.
  • a reduced guessing factor when compared to true-false items.
  • different response alternatives which can provide diagnostic feedback.

Limitations in Using Multiple-Choice Items

Multiple-choice items ...

  • are difficult and time consuming to construct.
  • lead an instructor to favor simple recall of facts.
  • place a high degree of dependence on the student's reading ability and instructor's writing ability.

SUGGESTIONS FOR WRITING MULTIPLE-CHOICE TEST ITEMS

The Stem

1.

When possible, state the stem as a direct question rather than as an incomplete statement.

Undesirable:

Alloys are ordinarily produced by ...

Desirable:

How are alloys ordinarily produced?

 

2.

Present a definite, explicit and singular question or problem in the stem.

Undesirable:

Psychology ...

Desirable:

The science of mind and behavior is called ...

 

3.

Eliminate excessive verbiage or irrelevant information from the stem.

Undesirable:

While ironing her formal, Jane burned her hand accidentally on the hot iron. This was due to a transfer of heat be ...

Desirable:

Which of the following ways of heat transfer explains why Jane's hand was burned after she touched a hot iron?

 

4.

Include in the stem any word(s) that might otherwise be repeated in each alternative.

Undesirable:

In national elections in the United States the President is officially

a. chosen by the people.
b. chosen by members of Congress.
c. chosen by the House of Representatives.
*d. chosen by the Electoral College.

 

Desirable:

In national elections in the United States the President is officially chosen by

a. the people.
b. members of Congress.
c. the House of Representatives.
*d. the Electoral College.
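The revision in suggestion 4 can also be checked mechanically. The following is a minimal sketch, not part of the original booklet: the function name split_common_lead is hypothetical, and it assumes that "repeated words" means the words shared at the very start of every alternative.

```python
def split_common_lead(alternatives):
    """Return (words shared at the start of every alternative, trimmed alternatives)."""
    token_lists = [alt.split() for alt in alternatives]
    shared = []
    # Walk the alternatives word by word until they stop agreeing.
    for words in zip(*token_lists):
        if all(w == words[0] for w in words):
            shared.append(words[0])
        else:
            break
    trimmed = [" ".join(tokens[len(shared):]) for tokens in token_lists]
    return " ".join(shared), trimmed


stem = "In national elections in the United States the President is officially"
alternatives = [
    "chosen by the people.",
    "chosen by members of Congress.",
    "chosen by the House of Representatives.",
    "chosen by the Electoral College.",
]

lead, trimmed = split_common_lead(alternatives)
revised_stem = f"{stem} {lead}".strip()

print(revised_stem)
for letter, alt in zip("abcd", trimmed):
    print(f"{letter}. {alt}")
```

Run on the undesirable item, the sketch moves "chosen by" into the stem and leaves the four shortened alternatives, which is the desirable version shown above.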

 

5.

Use negatively stated stems sparingly. When used, underline and/or capitalize the negative word.

Undesirable:

Which of the following is not cited as an accomplishment of the Kennedy administration?

Desirable:

Which of the following is NOT cited as an accomplishment of the Kennedy administration?

Item Alternatives

 

6.

Make all alternatives plausible and attractive to the less knowledgeable or skillful student.

What process is most nearly the opposite of photosynthesis?

  Undesirable          Desirable
  a. Digestion         a. Digestion
  b. Relaxation        b. Assimilation
 *c. Respiration      *c. Respiration
  d. Exertion          d. Catabolism

 

 

7. Make the alternatives grammatically parallel with each other, and consistent with the stem.

Undesirable:

What would do most to advance the application of atomic discoveries to medicine?

*a. Standardized techniques for treatment of patients.
b. Train the average doctor to apply radioactive treatments.
c. Remove the restriction on the use of radioactive substances.
d. Establishing hospitals staffed by highly trained radioactive therapy specialists.

 

Desirable:

What would do most to advance the application of atomic discoveries to medicine?

*a. Development of standardized techniques for treatment of patients.
b. Training of the average doctor in application of radioactive treatments.
c. Removal of restriction on the use of radioactive substances.
d. Addition of trained radioactive therapy specialists to hospital staffs.

 

 

8. Make the alternatives mutually exclusive.

Undesirable:

The daily minimum required amount of milk that a 10 year old child should drink is

a. 1-2 glasses.
*b. 2-3 glasses.
*c. 3-4 glasses.
d. at least 4 glasses.

 

Desirable:

What is the daily minimum required amount of milk a 10 year old child should drink?

a. 1 glass.
b. 2 glasses.
*c. 3 glasses.
d. 4 glasses.
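When alternatives are numeric, overlap can be tested directly. The following is a minimal sketch under stated assumptions: each alternative is represented as an inclusive (low, high) range, "at least 4 glasses" is modeled with an infinite upper bound, and the function name overlapping_pairs is hypothetical.

```python
from itertools import combinations


def overlapping_pairs(ranges):
    """Return pairs of labels whose inclusive numeric ranges share at least one value."""
    clashes = []
    for (label_a, (lo_a, hi_a)), (label_b, (lo_b, hi_b)) in combinations(ranges.items(), 2):
        # Two inclusive ranges overlap when each one starts before the other ends.
        if lo_a <= hi_b and lo_b <= hi_a:
            clashes.append((label_a, label_b))
    return clashes


INF = float("inf")

# Alternatives from the milk example above, expressed as glasses per day.
undesirable = {"a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (4, INF)}
desirable = {"a": (1, 1), "b": (2, 2), "c": (3, 3), "d": (4, 4)}

print(overlapping_pairs(undesirable))  # pairs that are not mutually exclusive
print(overlapping_pairs(desirable))    # empty when every alternative stands alone
```

In the undesirable version each neighboring pair of alternatives shares a boundary value, which is why more than one response can be defended as correct.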

 

9. When possible, present alternatives in some logical order (e.g., chronological, most to least, alphabetical).

At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles per hour and the other truck averages 38 miles per hour. At what time will they be 24 miles apart?

  Undesirable          Desirable
  a. 6 p.m.            a. 1 a.m.
  b. 9 p.m.            b. 6 a.m.
  c. 1 a.m.            c. 9 a.m.
 *d. 1 p.m.           *d. 1 p.m.
  e. 6 a.m.            e. 6 p.m.
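The keyed answer and the chronological ordering in this example can both be verified with a short calculation. The sketch below is illustrative only: the helper names hours_until_gap and hour_of are hypothetical, the trucks are assumed to leave together and hold their average speeds, and the alternatives are those of the Desirable column in a deliberately scrambled order.

```python
def hours_until_gap(speed_a_mph, speed_b_mph, gap_miles):
    """Hours until two vehicles that leave together, heading the same way, are gap_miles apart."""
    return gap_miles / abs(speed_a_mph - speed_b_mph)


def hour_of(label):
    """Convert a label such as '6 p.m.' to an hour on the 24-hour clock."""
    number, meridiem = label.split()
    hour = int(number) % 12
    if meridiem.lower().startswith("p"):
        hour += 12
    return hour


departure_hour = 7  # the trucks leave at 7 a.m.
answer_hour = departure_hour + hours_until_gap(42, 38, 24)

scrambled = ["6 p.m.", "9 a.m.", "1 a.m.", "1 p.m.", "6 a.m."]
ordered = sorted(scrambled, key=hour_of)  # chronological order

for letter, label in zip("abcde", ordered):
    marker = "*" if hour_of(label) == answer_hour else " "
    print(f"{marker}{letter}. {label}")
```

The gap grows at 42 - 38 = 4 miles per hour, so 24 miles takes 6 hours and the trucks are 24 miles apart at 1 p.m., which lands in position d once the alternatives are sorted.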

 

10. Be sure there is only one correct or best response to the item.

Undesirable:

The two most desired characteristics in a classroom test are validity and

a. precision.
*b. reliability.
c. objectivity.
*d. consistency.

 

Desirable:

The two most desired characteristics in a classroom test are validity and

a. precision.
*b. reliability.
c. objectivity.
d. standardization.

 

11. Make alternatives approximately equal in length.

Undesirable:

The most general cause of low individual incomes in the United States is

*a. lack of valuable productive services to sell.
b. unwillingness to work.
c. automation.
d. inflation.

 

Desirable:

What is the most general cause of low individual incomes in the United States?

*a. A lack of valuable productive services to sell.
b. The population's overall unwillingness to work.
c. The nation's increased reliance on automation.
d. An increasing national level of inflation.
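One rough way to screen for uneven alternatives is to compare word counts. This is only a sketch: the booklet gives no numeric rule, so the 2.0 ratio used in is_balanced is an assumed threshold, and both function names are hypothetical.

```python
def length_spread(alternatives):
    """Ratio of the longest alternative to the shortest, measured in words."""
    lengths = [len(alt.split()) for alt in alternatives]
    return max(lengths) / min(lengths)


def is_balanced(alternatives, max_ratio=2.0):
    """True when no alternative is more than max_ratio times as long as another.

    max_ratio is an assumed cut-off for illustration, not a published standard.
    """
    return length_spread(alternatives) <= max_ratio


undesirable = [
    "lack of valuable productive services to sell.",
    "unwillingness to work.",
    "automation.",
    "inflation.",
]
desirable = [
    "A lack of valuable productive services to sell.",
    "The population's overall unwillingness to work.",
    "The nation's increased reliance on automation.",
    "An increasing national level of inflation.",
]

print(length_spread(undesirable), is_balanced(undesirable))
print(round(length_spread(desirable), 2), is_balanced(desirable))
```

In the undesirable item the keyed response is seven times as long as the shortest distractor, a cue a test-wise student may notice; in the desirable item the alternatives run six to eight words each.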

 

 

12. Avoid irrelevant clues such as grammatical structure, well known verbal associations or connections between stem and answer.

Undesirable (grammatical clue):

A chain of islands is called an:

*a. archipelago.
b. peninsula.
c. continent.
d. isthmus.

 

Undesirable (verbal association clue):

The reliability of a test can be estimated by a coefficient of:

a. measurement.
*b. correlation.
c. testing.
d. error.

 

Undesirable (connection between stem and answer clue):

The height to which a water dam is built depends on

a. the length of the reservoir behind the dam.
b. the volume of water behind the dam.
*c. the height of water behind the dam.
d. the strength of the reinforcing wall.

 

13. Use at least four alternatives for each item to lower the probability of getting the item correct by guessing.

14. Randomly distribute the correct response among the alternative positions throughout the test, so that alternatives a, b, c, d and e each serve as the correct response in approximately the same proportion.

15. Use the alternatives "none of the above" and "all of the above" sparingly. When used, such alternatives should occasionally be used as the correct response.
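Suggestions 13 and 14 lend themselves to a short illustration. The sketch below assumes blind guessing (every alternative equally likely to be chosen) and shows one possible way to spread the keyed position evenly; the function names and the seed parameter are hypothetical and not drawn from the booklet.

```python
import random
from collections import Counter


def chance_of_guessing(num_alternatives):
    """Probability of answering one item correctly by blind guessing."""
    return 1 / num_alternatives


def balanced_answer_key(num_items, letters="abcde", seed=None):
    """Return a shuffled answer key in which each position is keyed about equally often."""
    rng = random.Random(seed)
    repeats = -(-num_items // len(letters))  # ceiling division
    pool = (list(letters) * repeats)[:num_items]
    rng.shuffle(pool)
    return pool


for n in (2, 3, 4, 5):
    print(f"{n} alternatives: {chance_of_guessing(n):.0%} chance per item")

key = balanced_answer_key(20, seed=1)
print("".join(key))
print(Counter(key))
```

With four alternatives the chance of a lucky guess on a single item falls to 25%, compared with 50% for a two-choice item. For a 20-item test with five alternatives, the key above places the correct response in each position exactly four times while leaving the order unpredictable.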



TRUE-FALSE TEST ITEMS

A true-false item can be written in one of three forms: simple, complex, or compound. Answers can consist of only two choices (simple), more than two choices (complex), or two choices plus a conditional completion response (compound). An example of each type of true-false item follows:

Sample True-False Item: Simple

The acquisition of morality is a developmental process.

True

False

Sample True-False Item: Complex

The acquisition of morality is a developmental process.

True

False

Opinion

Sample True-False Item: Compound

The acquisition of morality is a developmental process.

If this statement is false, what makes it false?

True

False


Advantages in Using True-False Items

True-False items can provide ...

  • the widest sampling of content or objectives per unit of testing time.
  • scoring efficiency and accuracy.
  • versatility in measuring all levels of cognitive ability.
  • highly reliable test scores.
  • an objective measurement of student achievement or ability.

Limitations in Using True-False Items

True-false items ...

  • incorporate an extremely high guessing factor. For simple true-false items, each student has a 50/50 chance of correctly answering the item without any knowledge of the item's content.
  • can often lead an instructor to write ambiguous statements due to the difficulty of writing statements which are unequivocally true or false.
  • do not discriminate between students of varying ability as well as other item types.
  • can often include more irrelevant clues than do other item types.
  • can often lead an instructor to favor testing of trivial knowledge.
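The size of the guessing factor can be estimated with the binomial distribution. The following sketch assumes a student who guesses blindly and independently on every simple true-false item; the function name prob_at_least and the example cut-off of 7 out of 10 are illustrative assumptions.

```python
from math import comb


def prob_at_least(num_items, min_correct, p_guess=0.5):
    """Probability of getting at least min_correct items right by guessing alone."""
    return sum(
        comb(num_items, k) * p_guess**k * (1 - p_guess) ** (num_items - k)
        for k in range(min_correct, num_items + 1)
    )


# Chance of scoring 70% or better on tests of different lengths by blind guessing.
for items, needed in ((10, 7), (20, 14), (50, 35)):
    print(f"{needed}/{items} or better: {prob_at_least(items, needed):.4f}")
```

On a 10-item test the chance of reaching 7 correct by guessing is 176/1024, or about 17%. The same calculation shows that the chance drops as items are added, which is one reason longer true-false tests are less vulnerable to guessing.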

SUGGESTIONS FOR WRITING TRUE-FALSE TEST ITEMS

1.

Base true-false items upon statements that are absolutely true or false, without qualifications or exceptions.

Undesirable:

Nearsightedness is hereditary in origin.

Desirable:

Geneticists and eye specialists believe that the predisposition to nearsightedness is hereditary.

 

2.

Express the item statement as simply and as clearly as possible.

Undesirable:

When you see a highway with a marker that reads, "Interstate 80" you know that the construction and upkeep of that road is built and maintained by the state and federal government.

Desirable:

The construction and maintenance of interstate highways is provided by both state and federal governments.

 

3.

Express a single idea in each test item.

Undesirable:

Water will boil at a higher temperature if the atmospheric pressure on its surface is increased and more heat is applied to the container.

Desirable:

Water will boil at a higher temperature if the atmospheric pressure on its surface is increased.
and/or
Water will boil at a higher temperature if more heat is applied to the container.

 

4.

Include enough background information and qualifications so that the ability to respond correctly to the item does not depend on some special, uncommon knowledge.

Undesirable:

The second principle of education is that the individual gathers knowledge.

Desirable:

According to John Dewey, the second principle of education is that the individual gathers knowledge.

 

5.

Avoid lifting statements from the text, lecture or other materials so that memory alone will not permit a correct answer.

Undesirable:

For every action there is an opposite and equal reaction.

Desirable:

If you were to stand in a canoe and throw a life jacket forward to another canoe, chances are your canoe would jerk backward.

 

6.

Avoid using negatively stated item statements.

Undesirable:

The Supreme Court is not composed of nine justices.

Desirable:

The Supreme Court is composed of nine justices.

 

7.

Avoid the use of unfamiliar vocabulary.

Undesirable:

According to some politicians, the raison d'etre for capital punishment is retribution.

Desirable:

According to some politicians, justification for the existence of capital punishment is retribution.

 

8.

Avoid the use of specific determiners which would permit a test-wise but unprepared examinee to respond correctly. Specific determiners refer to sweeping terms like "all," "always," "none," "never," "impossible," "inevitable," etc. Statements including such terms are likely to be false. On the other hand, statements using qualifying determiners such as "usually," "sometimes," "often," etc., are likely to be true. When statements do require the use of specific determiners, make sure they appear in both true and false items.

Undesirable:

All sessions of Congress are called by the President. (F)

The Supreme Court is frequently required to rule on the constitutionality of a law.  (T)

An objective test is generally easier to score than an essay test.  (T)

Desirable:

(When specific determiners are used, reverse the expected outcomes.)

The sum of the angles of a triangle is always 180°. (T)

Each molecule of a given compound is chemically the same as every other molecule of that compound. (T)

The galvanometer is the instrument usually used for the metering of electrical energy used in a home. (F)
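A draft set of true-false statements can be screened for these terms before the test is assembled. The sketch below is a simple word lookup and nothing more: the two word lists contain only the terms named in suggestion 8 (the "etc." is not filled in), and the function name find_determiners is hypothetical.

```python
import re

# Only the terms listed in suggestion 8; extend these sets for your own use.
SWEEPING = {"all", "always", "none", "never", "impossible", "inevitable"}
QUALIFYING = {"usually", "sometimes", "often"}


def find_determiners(statement):
    """Return the sweeping and qualifying determiners that appear in a statement."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return {
        "sweeping": sorted(words & SWEEPING),
        "qualifying": sorted(words & QUALIFYING),
    }


items = [
    ("All sessions of Congress are called by the President.", False),
    ("The sum of the angles of a triangle is always 180 degrees.", True),
    ("The galvanometer is the instrument usually used for the metering "
     "of electrical energy used in a home.", False),
]

for statement, keyed_true in items:
    found = find_determiners(statement)
    print(f"keyed {'T' if keyed_true else 'F'}: {found}")
```

A flagged statement is not necessarily flawed. The point is to confirm that sweeping terms appear in some true items and qualifying terms in some false items, as in the desirable examples above.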

 

9.

False items tend to discriminate more highly than true items. Therefore, use more false items than true items (but no more than 15% additional false items).



MATCHING TEST ITEMS

In general, matching items consist of a column of stimuli presented on the left side of the exam page and a column of responses placed on the right side of the page. Students are required to match the response associated with a given stimulus. For example,

Sample Matching Test Item

Directions:

On the line to the left of each factual statement, write the letter of the principle which best explains the statement's occurrence. Each principle may be used more than once.

 

Factual Statements

1. Fossils of primates first appear in the Cenozoic rock strata, while trilobite remains are found in the Proterozoic rocks.
2. The Arctic and Antarctic regions are sparsely populated.
3. Plants have no nervous system.
4. Large coal beds exist in Alaska.

Principles

a. There have been profound changes in the climate on earth.
b. Coordination and integration of action is generally slower in plants than in animals.
c. There is an increasing complexity of structure and functions from lower to higher forms of life.
d. All life comes from life and produces its own kind of living organisms.
e. Light is a limiting factor to life.

 

Advantages in Using Matching Items

Matching items

  • require short periods of reading and response time, allowing you to cover more content.
  • provide objective measurement of student achievement or ability.
  • provide highly reliable test scores.
  • provide scoring efficiency and accuracy.

Limitations in Using Matching Items

Matching items

  • have difficulty measuring learning objectives requiring more than simple recall of information.
  • are difficult to construct due to the problem of selecting a common set of stimuli and responses.

SUGGESTIONS FOR WRITING MATCHING TEST ITEMS

1.

Include directions which clearly state the basis for matching the stimuli with the responses. Explain whether or not a response can be used more than once and indicate where to write the answer.

Undesirable:

Directions:

Match the following.

Desirable:

Directions:

On the line to the left of each identifying location and characteristics in Column I, write the letter of the country in Column II that is best defined. Each country in Column II may be used more than once.

 

2.

Use only homogeneous material in matching items.

Undesirable:

Directions: Match the following.

 

1. ___ Water                                     A. NaCl
2. ___ Discovered Radium                         B. Fermi
3. ___ Salt                                      C. NH3
4. ___ Year of the 1st Nuclear Fission by Man    D. H2O
5. ___ Ammonia                                   E. 1942
                                                 F. Curie

 

Desirable:

Directions:

On the line to the left of each compound in Column I, write the letter of the compound's formula presented in Column II. Use each formula only once.

 

Column I                  Column II

1. ___ Water              A. H2SO4
2. ___ Salt               B. HCl
3. ___ Ammonia            C. NaCl
4. ___ Sulfuric Acid      D. H2O
                          E. NH3

 

 

3.

Arrange the list of responses in some systematic order if possible (e.g., chronological, alphabetical).

Directions:  On the line to the left of each definition in Column I, write the letter of the defense mechanism in Column II that is described. Use each defense mechanism only once.

Column I

____1. Hunting for reasons to support one's beliefs.
____2. Accepting the values and norms of others as one's own even when they are contrary to previously held values.
____3. Attributing to others one's own unacceptable impulses, thoughts and desires.
____4. Ignoring disagreeable situations, topics, sights.

Column II (Undesirable)     Column II (Desirable)
a. Rationalization          a. Denial of reality
b. Identification           b. Identification
c. Projection               c. Introjection
d. Introjection             d. Projection
e. Denial of reality        e. Rationalization

 

 

4.

Avoid grammatical or other clues to the correct response.

Undesirable:

Directions: Match the following in order to complete the sentences on the left.

 

___ 1. Igneous rocks are formed            A. a hardness of 7.
___ 2. The formation of coal requires      B. with crystalline rock.
___ 3. A geode is filled                   C. a metamorphic rock.
___ 4. Feldspar is classified as           D. heat and pressure.
                                           E. through the solidification of molten lava.

 

Desirable:

Avoid sentence completion due to grammatical clues.

 

5. Keep matching items brief, limiting the list of stimuli to under 10.

6. Include more responses than stimuli to help prevent answering through the process of elimination.

7. When possible, reduce the amount of reading time by including only short phrases or single words in the response list.
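The two countable guidelines, fewer than 10 stimuli and more responses than stimuli, can be checked automatically when a matching item is drafted. The following is a minimal sketch; the function name check_matching_item and the wording of its messages are hypothetical.

```python
def check_matching_item(stimuli, responses, max_stimuli=9):
    """Return a list of guideline problems; an empty list means none were found."""
    problems = []
    if len(stimuli) > max_stimuli:
        problems.append(f"{len(stimuli)} stimuli; keep the list under 10")
    if len(responses) <= len(stimuli):
        problems.append(
            f"{len(responses)} responses for {len(stimuli)} stimuli; "
            "add responses to discourage answering by elimination"
        )
    return problems


# The desirable chemistry item from suggestion 2: four compounds, five formulas.
compounds = ["Water", "Salt", "Ammonia", "Sulfuric Acid"]
formulas = ["H2SO4", "HCl", "NaCl", "H2O", "NH3"]
print(check_matching_item(compounds, formulas))

# Dropping the extra formula leaves as many responses as stimuli.
print(check_matching_item(compounds, formulas[1:]))
```

The second check matters most when each response may be used only once. When directions allow responses to be reused, as in the sample item at the start of this section, elimination is less of a concern.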

COMPLETION TEST ITEMS

The completion item requires the student to answer a question or to finish an incomplete statement by filling in a blank with the correct word or phrase. For example,

Sample Completion Item


According to Freud, personality is made up of three major systems, the _________, the ________ and the ________.

Advantages in Using Completion Items

Completion items

  • can provide a wide sampling of content.
  • can efficiently measure lower levels of cognitive ability.
  • can minimize guessing as compared to multiple-choice or true-false items.
  • can usually provide an objective measure of student achievement or ability.

Limitations in Using Completion Items

Completion items

  • are difficult to construct so that the desired response is clearly indicated.
  • have difficulty measuring learning objectives requiring more than simple recall of information.
  • can often include more irrelevant clues than do other item types.
  • are more time consuming to score when compared to multiple-choice or true-false items.
  • are more difficult to score since more than one answer may have to be considered correct if the item was not properly prepared.

SUGGESTIONS FOR WRITING COMPLETION TEST ITEMS

1.

Omit only significant words from the statement.

Undesirable:

Every atom has a central (core) called a nucleus.

Desirable:

Every atom has a central core called a(n) (nucleus).

 

2.

Do not omit so many words from the statement that the intended meaning is lost.

Undesirable:

The ___________ were to Egypt as the ____________ were to Persia and as __________ were to the early tribes of Israel.

Desirable:

The Pharaohs were to Egypt as the __________ were to Persia and as ____________ were to the early tribes of Israel.

 

3.

Avoid grammatical or other clues to the correct response.

Undesirable:

Most of the United States' libraries are organized according to the (Dewey) decimal system.

Desirable:

Which organizational system is used by most of the United States' libraries? (Dewey decimal)

 

4.

Be sure there is only one correct response.

Undesirable:

Trees which shed their leaves annually are (seed-bearing, common).