Standardized Tests
Problems with Individually
Administered Tests
Time required to administer test
Need for trained examiners
Unsuited for administration to large numbers of people
Group Intelligence Tests
Robert M. Yerkes
Army Alpha & Army Beta tests for
WWI recruits (1917)
These tests initiated mass testing
Within a few years of the war’s end,
mass testing moved to the schools
8,000 students took the SAT when it
was first administered in 1926
Nearly 3 million take it annually now
Items from Army Beta Test
Group Tests of Intelligence:
The Cognitive Abilities Test (COGAT)
Latest revision is form 6 (2001)
Includes a kindergarten level, 2 levels for grades
1 & 2, and 8 levels (A to H) for grades 3 to 12
Each level is printed in a separate booklet
Levels A to H
Contain the same nine subtests, grouped into three batteries
Each subtest preceded by practice exercises with
detailed explanations
Provides three separate scores: a verbal, quantitative &
nonverbal score
Scores have mean of 100, standard deviation of 16
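A standard-score scale of this kind can be illustrated with a simple linear rescaling. This is only a sketch: the function name and the norm-group figures are hypothetical, and published tests such as the CogAT derive their scores from age-based norm tables, not from a single formula.

```python
def to_standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=16.0):
    """Linearly rescale a raw score onto a standard-score scale.

    norm_mean and norm_sd describe the raw scores of the norm group
    (hypothetical values in this example).
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd      # distance from the norm mean in SD units
    return scale_mean + scale_sd * z     # re-express on the reporting scale


# Hypothetical norm group: mean raw score 40, SD 8.
# A raw score one SD above the mean maps to 100 + 16 = 116.
print(to_standard_score(48, norm_mean=40, norm_sd=8))  # 116.0
```

Passing scale_sd=15 gives the scale used by tests that report a standard deviation of 15 instead of 16.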
Reliability & Validity
Reliabilities in the .90’s for each of the scores
Good validity: correlates with other tests &
school grades
Correlates with scores in social studies, math,
first grade reading, musical ability, even social
Nonverbal Group Tests:
Raven’s Standard Progressive Matrices
Developed in UK by J.C. Raven (1938)
Can be administered to individuals or groups
aged 5 to elderly adult
Consists of 60 matrices, each containing a
logical pattern or design with a missing part, of
increasing difficulty
Reliability & Validity
Internal consistency studies using either the split-half method
corrected for length or KR20 estimates result in values ranging
from .60 to .98, with a median of .90
Test-retest correlations range from a low of .46 for an eleven-year interval to a high of .97 for a two-day interval. The median
test-retest value is approximately .82.
Test-retest coefficients for several age groups: .88 (13 yrs. plus),
.93 (under 30 yrs.), .88 (30-39 yrs.), .87 (40-49 yrs.), .83 (50 yrs.
and over).
Concurrent validity coefficients between the SPM and the
Stanford-Binet and Wechsler scales range between .54 and .88,
with the majority in the .70s and .80s.
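The two internal-consistency estimates named above, the split-half method corrected for length (the Spearman-Brown formula) and KR20, can be sketched as follows. The response matrix is invented for illustration and is not SPM data. The odd/even split and the use of population variance in KR20 are assumptions of this sketch; other splits and the sample variance are also used in practice.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_brown(r_half):
    """Correct a half-test correlation to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Odd/even split-half reliability, corrected for length."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 0 or 1."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # population variance of total scores
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Invented data: 6 test-takers (rows) by 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print(round(split_half_reliability(responses), 3))
print(round(kr20(responses), 3))
```

With so few people and items the estimates are unstable; the point is only to show what each coefficient computes.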
Benefits of Using SPM
Can be used without any verbal instructions
with young children, culturally deprived,
language-handicapped, brain-injured individuals
Minimizes the effects of language & culture
Differences between African Americans &
Caucasians are smaller (7 or 8 points) with Raven's Progressive Matrices (RPM)
than with Stanford-Binet (SB) or Wechsler scales
Goodenough-Harris Drawing Test
Individual instructed to draw a picture of a
whole man & do the best job possible
Respondents given credit for each item included
in drawings
Each detail given 1 point (to a total of 70)
Raw scores converted to standard scores with a
mean of 100, s.d. of 15, using age norms
Reliability & Validity
Reliabilities (split-half, test-retest, inter-scorer)
range from high .60’s to low .90’s
Scores level off at ages 14 or 15, so can only be
used with younger children
Reasonable validity; correlation with standard
IQ tests in one study was .81
Tests of Aptitude & Achievement
Used in making decisions about admission to
universities at the undergraduate level, graduate
level, and to business & professional schools
Referred to as “high stakes” tests because of the
impact they have on people’s lives
The Scholastic Assessment Test
Until 1995, known as the Scholastic Aptitude Test
Has been in use since 1926
Most widely used of university entrance tests
Given to nearly 3 million students each year
Newest form was introduced in March 2005, for
entry into university in fall of 2006
There is a Reasoning Test (general aptitude test)
and Subject Tests in various subjects
Reasoning Test (formerly SAT-I)
“The SAT Reasoning Test is a measure of the
critical thinking skills you'll need for academic
success in college. The SAT assesses how well
you analyze and solve problems—skills you
learned in school that you'll need in college.”
Three sections:
Critical reading
 Mathematics
 Writing
Each section of the SAT is scored on a scale of
200-800, and the writing section generates two subscores
Administered seven times a year in the U.S.,
Puerto Rico, and U.S. Territories, and six times a
year in other countries.
Critical Reading Section
Reading comprehension,
sentence completions, and
paragraph-length critical reading
Hoping to _______ the
dispute, negotiators proposed
a compromise that they felt
would be _______ to both
labor and management.
(A) enforce . . useful
(B) end . . divisive
(C) overcome . . unattractive
(D) extend . . satisfactory
(E) resolve . . acceptable
Mathematics Section
Content: Number and
operations; algebra and
functions; geometry;
statistics, probability, and
data analysis
Item-types: Five-choice
multiple-choice questions
and student-produced responses
Writing Section
Multiple choice questions (35 min.) and student-written essay (25 min.)
E.g., The following sentences test your ability to recognize grammar
and usage errors. Each sentence contains either a single error or no
error at all. No sentence contains more than one error. The error, if
there is one, is underlined and lettered. If the sentence contains an
error, select the one underlined part that must be changed to make the
sentence correct. If the sentence is correct, select choice E. In
choosing answers, follow the requirements of standard written English.
The other delegates (A) and him (B) immediately (C) accepted the
resolution drafted (D) by the neutral states. No error (E)
Subject Tests (formerly SAT-II)
Subject Tests are designed to measure students'
knowledge and skills in particular subject areas,
as well as their ability to apply that knowledge.
Students take the Subject Tests to demonstrate
to universities their mastery of specific subjects
like English, history, mathematics, science, and language.
Reliability & Validity
Studies of old SAT show high internal consistency
(>.90), test-retest reliability (>.85 over 10 months)
Predictive validity of test, using university grades as the
criterion, is quite high
May 4, 2005
SAT Essay Test Rewards Length and Ignores Errors
Graduate Record Exam (GRE)
One of the most commonly used tests for graduate-school entrance
Used in combination with undergraduate grades, letters of
recommendation in selecting students for graduate school
General Test produces three scores:
Verbal (GRE-V)
Quantitative (GRE-Q)
Analytic (GRE-A)
Subject Tests in biology, chemistry, literature, psychology,
All scores have a mean of 500, standard deviation of 100
GRE Structure
Present your perspective
Analyze an argument
Sentence Completions
Reading Comprehension
Data analysis
Sample Questions
See http://www.gre.org/
Reliability & Validity
Stability (test-retest) & split-half reliability are good
Predictive validity “far from convincing” (Kaplan &
Saccuzzo, 2005, p. 330)
Correlations between GRE and grade point average are
low (.22 to .33 in one study, accounting for 5 to 10% of the variance)
High false negative rates
When combined with undergraduate grades, correlated
.63 with graduate grade point average
See http://www.fairtest.org/facts/gre.htm
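The 5 to 10% figure follows from squaring the correlation: r squared is the proportion of variance in one measure that is accounted for by the other. A minimal sketch (the function name is illustrative) applied to the correlations quoted on this slide:

```python
def variance_explained(r):
    """Proportion of variance accounted for by a correlation r (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlations quoted above: GRE alone (.22 and .33), GRE plus grades (.63).
for r in (0.22, 0.33, 0.63):
    print(f"r = {r:.2f} -> {variance_explained(r):.1%} of variance")
# r = 0.22 -> 4.8% of variance
# r = 0.33 -> 10.9% of variance
# r = 0.63 -> 39.7% of variance
```

Even the combined predictor of .63 leaves roughly 60% of the variance in graduate grades unaccounted for.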
High Stakes Tests in the Schools
Several states in the US, as well as Great Britain & New Zealand,
have implemented state-wide or national testing programs
Bill Clinton’s proposal in 1997 to implement nationwide testing aroused considerable debate
In 1999 National Academy of Sciences published
report entitled “High Stakes: Testing for Tracking,
Promotion & Graduation”
Generally supported testing, but expressed concern that
test results are commonly misinterpreted &
misunderstanding of test results can damage individuals
Testing in Canada
A number of provinces, including Alberta &
Ontario, administer standardized ability tests to
all students in their jurisdictions
In Ontario, these tests are coordinated by the
Education Quality & Accountability Office
Budget for EQAO: approximately $50 million
The Ontario Secondary School
Literacy Test (OSSLT)
Given every fall to assess the reading and writing
abilities of Grade 10 students
Students must pass the OSSLT in order to
obtain an Ontario Secondary School diploma
Students who don’t pass can retake the test an
unlimited number of times
Their school transcript will only list whether or
not they passed the OSSLT, not how many times
they attempted the test.
OSSLT (continued)
Reading: Students are given examples of different types
of reading selections. They are then tested on their
comprehension of what they have read.
Writing: Students are required to write four different
types of work
A summary
An opinion piece
An information paragraph
A news report
EQAO changes to standardized testing make them less disruptive
but do not address the fundamental validity of the tests
September 23, 2004
(Toronto) - “The changes to standardized testing in Ontario’s schools
announced by the Education Quality and Accountability Office (EQAO)
today do not address the fundamental question posed by educators
and parents as to whether the testing is in fact valid,” said Rhonda
Kimberley-Young, president of the Ontario Secondary School
Teachers’ Federation.
“These changes will mean that these intrusive tests will not disrupt the
learning of students to the same degree as they have until now, but
simply making the tests shorter and changing how the results are
reported does not mean that the testing is in any way a valid measure of
student achievement.
“Teachers and educational workers believe the Ontario government
should now take the next logical step and immediately conduct a
validity study of the standardized testing taking place in Ontario
“The EQAO and the testing it is conducting is a multi million dollar
expense. At a time when financial resources for schools and students
are stretched, OSSTF believes these education dollars would be far
better spent on meeting the educational needs of students,” concluded
Kimberley-Young.
OSSTF Position on Grade 10 Literacy Test
The EQAO Grade 10 literacy test is not a fair measure.
The test is not administered consistently across the province.
It is impossible to standardize preparation and administration
conditions in a standardized test.
According to Alfie Kohn, who crusades against standardized
tests in the United States, socioeconomic status accounts for
"an overwhelming proportion of the variance in test scores".
Time is taken away from the regular curriculum in preparing
for the test. Student anxiety affects learning in other areas.
OSSTF Criticisms (cont’d)
The EQAO Grade 10 literacy test is not a valid measure of
student reading and writing.
The test is very heavily weighted to writing.
Students need over 60% in BOTH reading and writing to pass.
No marked tests will be returned. Students who fail receive limited, vague feedback.
There are very few funds or opportunities to provide help to students
who perform poorly or fail.
Instructions for questions are unclear. On a question which asked for one
paragraph, students who wrote more than one paragraph failed the
question because they did not follow the instructions exactly.
EQAO is secretive and will reveal neither the marking criteria nor what
constitutes a pass.
OSSTF Criticisms (cont’d)
Cost of administering the tests
The cost of last year’s literacy test was $15
million at the same time as there were textbook
shortages, and cuts to library, music, guidance,
educational assistants and support staff.
Canadian Teachers Federation
High stakes testing
Encourages “teaching to the test”
Creates a situation in which students struggling with the
material or who have special needs are seen as a liability
because their low score influences averages
Squeezes “non-tested” subjects out of the curriculum
Is frequently biased against certain groups of students
Perpetuates the idea that a good education equals high test scores
Transfers control over curriculum to the body that controls
the exam
Not long ago, a widely respected middle-school teacher in
Wisconsin, famous for helping students design their own
innovative learning projects, stood up at a community meeting
and announced that he "used to be" a good teacher. The
auditorium fell silent at his use of the past tense. These days, he
explained, he just handed out textbooks and quizzed his students
on what they had memorized. The reason was very simple. He
and his colleagues were increasingly being held accountable for
raising test scores. The kind of wide-ranging and enthusiastic
exploration of ideas that once characterized his classroom could
no longer survive when the emphasis was on preparing students
to take a standardized examination.
Benefits of Standardized Tests
Allow for identification of children with
problems, so that remediation can take place
Allow for identification of schools that may
need extra resources
Increase accountability of schools to parents,
Boards of Education, government
What do you think about
standardized tests?
