Standardized Tests

Problems with Individually Administered Tests
- Time required to administer the test
- Expense
- Need for trained examiners
- Unsuited for administration to large numbers of people

Group Intelligence Tests
- Robert M. Yerkes: Army Alpha & Army Beta tests for WWI recruits (1917)
- These tests initiated mass testing
- Within a few years of the war’s end, mass testing moved to the schools
- 8,000 students took the SAT when it was first administered in 1926
- Nearly 3 million take it annually now

Items from Army Beta Test

Group Tests of Intelligence: The Cognitive Abilities Test (COGAT)
- Latest revision is Form 6 (2001)
- Includes a kindergarten level, 2 levels for grades 1 & 2, and 8 levels (A to H) for grades 3 to 12
- Each level is printed in a separate booklet

Levels A to H
- Contain the same nine subtests, grouped into three batteries:
  - Verbal
  - Quantitative
  - Nonverbal
- Each subtest is preceded by practice exercises with detailed explanations
- Provides three separate scores: verbal, quantitative & nonverbal
- Scores have a mean of 100 and a standard deviation of 16

Reliability & Validity
- Reliabilities in the .90s for each of the scores
- Good validity: correlates with other tests & school grades
- Correlates with scores in social studies, math, first grade reading, musical ability, even social status

Nonverbal Group Tests: Raven’s Standard Progressive Matrices
- Developed in the UK by J.C. Raven (1938)
- Can be administered to individuals or groups, from age 5 to elderly adults
- Consists of 60 matrices of increasing difficulty, each containing a logical pattern or design with a missing part

Reliability & Validity
- Internal consistency studies using either the split-half method corrected for length or KR20 estimates result in values ranging from .60 to .98, with a median of .90
- Test-retest correlations range from a low of .46 for an eleven-year interval to a high of .97 for a two-day interval. The median test-retest value is approximately .82.
- Test-retest coefficients for several age groups: .88 (13 yrs. plus), .93 (under 30 yrs.), .88 (30-39 yrs.), .87 (40-49 yrs.), .83 (50 yrs. and over).
- Concurrent validity coefficients between the SPM and the Stanford-Binet and Wechsler scales range between .54 and .88, with the majority in the .70s and .80s.

Benefits of Using SPM
- Can be used without any verbal instructions with young children and with culturally deprived, language-handicapped, or brain-injured individuals
- Minimizes the effects of language & culture
- Differences between African Americans & Caucasians are smaller (7 or 8 points) with RPM than with SB or Wechsler scales

Goodenough-Harris Drawing Test
- Individual is instructed to draw a picture of a whole man & do the best job possible
- Respondents are given credit for each item included in their drawings
- Each detail is given 1 point (to a total of 70)
- Raw scores are converted to standard scores with a mean of 100, s.d. of 15, using age norms

Reliability & Validity
- Reliabilities (split-half, test-retest, inter-scorer) range from high .60s to low .90s
- Scores level off at ages 14 or 15, so the test can only be used with younger children
- Reasonable validity; correlation with standard IQ tests in one study was .81

Tests of Aptitude & Achievement
- Used in making decisions about admission to universities at the undergraduate level, graduate level, and to business & professional schools
- Referred to as “high stakes” tests because of the impact they have on people’s lives

The Scholastic Assessment Test
- Until 1995, known as the Scholastic Aptitude Test
- Has been in use since 1926
- Most widely used of university entrance tests
- Given to nearly 3 million students each year
- Newest form was introduced in March 2005, for entry into university in fall of 2006
- There is a Reasoning Test (general aptitude test) and Subject Tests in various subjects

Reasoning Test (formerly SAT-I)
“The SAT Reasoning Test is a measure of the critical thinking skills you'll need for academic success in college.
The SAT assesses how well you analyze and solve problems—skills you learned in school that you'll need in college.”
- Three sections:
  - Critical reading
  - Mathematics
  - Writing
- Each section of the SAT is scored on a scale of 200-800, and the writing section generates two subscores.
- Administered seven times a year in the U.S., Puerto Rico, and U.S. Territories, and six times a year in other countries.

Critical Reading Section
- Reading comprehension, sentence completions, and paragraph-length critical reading
- Example: Hoping to _______ the dispute, negotiators proposed a compromise that they felt would be _______ to both labor and management.
  (A) enforce . . useful
  (B) end . . divisive
  (C) overcome . . unattractive
  (D) extend . . satisfactory
  (E) resolve . . acceptable

Mathematics Section
- Content: Number and operations; algebra and functions; geometry; statistics, probability, and data analysis
- Item types: Five-choice multiple-choice questions and student-produced responses

Writing Section
- Multiple-choice questions (35 min.) and student-written essay (25 min.)
- E.g., The following sentences test your ability to recognize grammar and usage errors. Each sentence contains either a single error or no error at all. No sentence contains more than one error. The error, if there is one, is underlined and lettered. If the sentence contains an error, select the one underlined part that must be changed to make the sentence correct. If the sentence is correct, select choice E. In choosing answers, follow the requirements of standard written English.
- Example: The other delegates (A) and him (B) immediately (C) accepted the resolution drafted (D) by the neutral states. No error (E)

Subject Tests (formerly SAT-II)
Subject Tests are designed to measure students' knowledge and skills in particular subject areas, as well as their ability to apply that knowledge.
Students take the Subject Tests to demonstrate to universities their mastery of specific subjects like English, history, mathematics, science, and language.

Reliability & Validity
- Studies of the old SAT show high internal consistency (>.90) and test-retest reliability (>.85 over 10 months)
- Predictive validity of the test, using university grades as the criterion, is quite high

May 4, 2005
ON EDUCATION
SAT Essay Test Rewards Length and Ignores Errors
By MICHAEL WINERIP
http://www.nytimes.com/2005/05/04/education/04education.html?ei=5090&en=94808505ef7bed5a&ex=1272859200&partner=rssuserland&emc=rss&pagewanted=print&position=

Graduate Record Exam (GRE)
- One of the most commonly used tests for graduate school entrance
- Used in combination with undergraduate grades and letters of recommendation in selecting students for graduate school
- General Test produces three scores:
  - Verbal (GRE-V)
  - Quantitative (GRE-Q)
  - Analytic (GRE-A)
- Subject Tests in biology, chemistry, literature, psychology, etc.
- All scores have a mean of 500, standard deviation of 100

GRE Structure
GRE (General)
GRE-V                   GRE-Q           GRE-A
Antonyms                Arithmetic      Present your perspective
Analogies               Algebra         Analyze an argument
Sentence Completions    Geometry
Reading Comprehension   Data analysis

Sample Questions
- See http://www.gre.org/

Reliability & Validity
- Stability (test-retest) & split-half reliability are good
- Predictive validity “far from convincing” (Kaplan & Saccuzzo, 2005, p. 330)
- Correlations between GRE and grade point average are low (.22 to .33 in one study, accounting for 5 to 10% of variance)
- High false negative rates
- When combined with undergraduate grades, correlated .63 with graduate grade point average
- See http://www.fairtest.org/facts/gre.htm

High Stakes Tests in the Schools
- Several states in the US, as well as Great Britain and New Zealand, have implemented national testing programs
- Bill Clinton’s proposal in 1997 to implement nationwide testing aroused considerable debate
- In 1999 the National Academy of Sciences published a report entitled “High Stakes: Testing for Tracking, Promotion & Graduation”
- The report generally supported testing, but expressed concern that test results are commonly misinterpreted & that misunderstanding of test results can damage individuals

Testing in Canada
- A number of provinces, including Alberta & Ontario, administer standardized ability tests to all students in their jurisdictions
- In Ontario, these tests are coordinated by the Education Quality & Accountability Office (EQAO)
- Budget for EQAO: approximately $50 million annually
- The Ontario Secondary School Literacy Test (OSSLT) is given every fall to assess the reading and writing abilities of Grade 10 students
- Students must pass the OSSLT in order to obtain an Ontario Secondary School diploma
- Students who don’t pass can retake the test an unlimited number of times
- Their school transcript will only list whether or not they passed the OSSLT, not how many times they attempted the test.

OSSLT (continued)
Reading: Students are given examples of different types of reading selections. They are then tested on their comprehension of what they have read.
Writing: Students are required to write four different types of work:
- A summary
- An opinion piece
- An information paragraph
- A news report

EQAO changes to standardized testing make them less disruptive but do not address the fundamental validity of the tests

September 23, 2004 (Toronto) - “The changes to standardized testing in Ontario’s schools announced by the Education Quality and Accountability Office (EQAO) today do not address the fundamental question posed by educators and parents as to whether the testing is in fact valid,” said Rhonda Kimberley-Young, president of the Ontario Secondary School Teachers’ Federation.

“These changes will mean that these intrusive tests will not disrupt the learning of students to the same degree as they have until now, but simply making the tests shorter and changing how the results are reported does not mean that the testing is any way a valid measure of student achievement.

“Teachers and educational workers believe the Ontario government should now take the next logical step and immediately conduct a validity study of the standardized testing taking place in Ontario schools.

“The EQAO and the testing it is conducting is a multi million dollar expense. At a time when financial resources for schools and students are stretched, OSSTF believes these education dollars would be far better spent on meeting the educational needs of students,” concluded Kimberley-Young.

OSSTF Position on Grade 10
- The EQAO Grade 10 literacy test is not a fair measure.
- The test is not administered consistently across the province.
- It is impossible to standardize preparation and administration conditions in a standardized test.
- According to Alfie Kohn, who crusades against standardized tests in the United States, socioeconomic status accounts for "an overwhelming proportion of the variance in test scores".
- Time is taken away from the regular curriculum in preparing for the test.
- Student anxiety affects learning in other areas.
OSSTF Criticisms (cont’d)
- The EQAO Grade 10 literacy test is not a valid measure of student reading and writing.
- The test is very heavily weighted to writing.
- Students need over 60% in BOTH reading and writing to pass.
- No marked tests will be returned.
- Students who fail receive limited, vague feedback.
- There are very few funds or opportunities to provide help to students who perform poorly or fail.
- Instructions for questions are unclear. On a question which asked for one paragraph, students who wrote more than one paragraph failed the question because they did not follow the instructions exactly.
- EQAO is secretive and will reveal neither the marking criteria nor what constitutes a pass.

OSSTF Criticisms (cont’d)
- Cost of administering the tests
- The cost of last year’s literacy test was $15 million, at a time when there were textbook shortages and cuts to library, music, guidance, educational assistants and support staff.

Canadian Teachers’ Federation
High stakes testing:
- Encourages “teaching to the test”
- Creates a situation in which students who are struggling with the material or who have special needs are seen as a liability because their low scores influence averages
- Squeezes “non-tested” subjects out of the curriculum
- Is frequently biased against certain groups of students
- Perpetuates the idea that a good education equals high test scores
- Transfers control over curriculum to the body that controls the exam

Not long ago, a widely respected middle-school teacher in Wisconsin, famous for helping students design their own innovative learning projects, stood up at a community meeting and announced that he "used to be" a good teacher. The auditorium fell silent at his use of the past tense. These days, he explained, he just handed out textbooks and quizzed his students on what they had memorized. The reason was very simple. He and his colleagues were increasingly being held accountable for raising test scores. The kind of wide-ranging and enthusiastic exploration of ideas that once characterized his classroom could no longer survive when the emphasis was on preparing students to take a standardized examination.

Benefits of Standardized Tests
- Allow for identification of children with problems, so that remediation can take place
- Allow for identification of schools that may need extra resources
- Increase accountability of schools to parents, Boards of Education, government

What do you think about standardized tests?
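
Worked Example: Checking the Reliability & Validity Figures

The Reliability & Validity slides quote split-half reliabilities "corrected for length", KR20 estimates, correlations with the share of variance they account for, and raw scores converted to standard scores. The sketch below shows one way that arithmetic could be carried out. It is an illustration only, not the procedure used by any of the test publishers: the function names and the small response matrix are made up for this example, the length correction is assumed to be the usual Spearman-Brown formula, and KR-20 is computed with the population variance of total scores (some textbooks use the sample variance instead).

```python
from statistics import mean, pvariance


def spearman_brown(r_half, factor=2.0):
    """Correct a half-test correlation for length (factor=2 doubles the test)."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def kr20(responses):
    """KR-20 for right/wrong items; rows are examinees, columns are items (1 = correct)."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # assumption: population variance convention
    if k < 2 or var_total == 0:
        raise ValueError("need at least two items and some spread in total scores")
    # Sum of p * q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        pq_sum += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)


def variance_explained(r):
    """Proportion of criterion variance accounted for by a correlation r."""
    return r * r


def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a standard score using the norm group's mean and s.d."""
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


# Hypothetical data: 5 examinees answering 4 items (not taken from any real test).
sample_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"{spearman_brown(0.80):.3f}")          # 0.889
print(f"{kr20(sample_responses):.3f}")        # 0.800
print(f"{variance_explained(0.22):.3f}")      # 0.048
print(f"{variance_explained(0.33):.3f}")      # 0.109
print(f"{standard_score(40, 30, 10):.1f}")    # 115.0
```

Squaring the GRE correlations quoted earlier gives roughly 5% and 11% of variance, so the "5 to 10%" on the slide reads as a rounded range. Passing scale_sd=16 to standard_score gives the COGAT scale (mean 100, s.d. 16) instead of the Goodenough-Harris scale (mean 100, s.d. 15).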