Read this Article on History and Research Overview
Assessment is as old as instruction. Speaking of reading comprehension assessment, Pearson and Hamm (2005) comment, “Although reading comprehension assessment as a formal identifiable activity is a 20th century phenomenon, it has been part of classrooms for as long as there have been schools, required texts, students who are required to read them, and teachers wanting to know whether students understood them” (p. 145).
The scientific measurement of reading began to appear in the first decade of the twentieth century. In 1909, Edward Lee Thorndike of Teachers College, Columbia University, introduced a handwriting scale, which was published the following year (Smith, 1967). The publication of his scale marked the introduction of scientific measurement in reading and writing. Other scales and assessments soon followed. The Gray Standard Oral Reading Paragraphs was published in 1915. A much revised version, the Gray Oral Reading Test (GORT), is still in use. Still, as Smith (1965) notes, “Reading was the last of the tools subjects to yield itself to the testing movement.” One reason for the delay was that oral reading predominated and did not lend itself to standardized assessment. Nevertheless, between 1915 and 1918, four standardized tests of silent reading were published. For the most part, the tests measured speed of reading and comprehension. The appearance of silent reading tests fostered the practice of silent reading in the schools, proving once again that what gets tested gets taught. Ironically, those early tests used retelling scores as a measure of comprehension.
Meanwhile, intelligence testing was having an impact on education. In his preface to Terman’s description of the Stanford-Binet Intelligence Test, the first individual intelligence test created in the United States, Cubberley (1916) commented that:
The educational significance of the results to be obtained from careful measurement of the intelligence of children can hardly be overestimated. Questions relating to the choice of studies, vocational guidance, schoolroom procedures, the grading of pupils, promotional schemes, the study of the retardation of children in the schools, juvenile delinquency, and the proper handling of subnormals on one hand and gifted children on the other,—all alike acquire new meaning and significance when viewed in the light of the measurement of intelligence. …. (pp. vii to viii)
Citing statistics that indicated between a third and a half of school children repeated one or more grades, Terman (1916) proposed that the solution was to measure students’ mental ability and then base instruction on that ability. “The remedy, of course, is to measure out the work for each child in proportion to his mental ability” (p. 4). Terman’s remedy was based on the assumptions that intelligence was fixed and not affected by the environment and that the Stanford-Binet would reveal the child’s “true intelligence.” Both of these assumptions have been proved false. Research over the years has demonstrated that intelligence is a difficult concept to define and, consequently, to measure, and that it is affected by the environment. However, the Stanford-Binet and a number of group intelligence tests that came after it were used widely and played an important role in educational decisions. The discrepancy definition of a learning disability has its roots in Terman’s belief that a test of mental ability provided a criterion for the level of work a student should be able to do.
In the 1930s, technology had a dramatic impact on the format of tests. IBM introduced a system of machine scoring. With the introduction of machine scoring, multiple-choice items and scorable answer sheets became widespread. There was an increase in the use of group reading and group intelligence tests.
In 1953, Wilson Taylor created the cloze procedure. The advantage of cloze was that it would measure comprehension without the interference of comprehension questions, which could be tricky or subjective. Cloze was popular for a time but is apparently rarely used in its classical form. It is now mostly used in an adapted form in which the reader selects from three to five options. This, of course, changes the nature of the task from one of prediction to one of selection. Modified cloze is used in several currently published tests, including the Degrees of Reading Power, Scholastic Inventory, and STAR. It is also used in mazes, a curriculum-based measure in which readers complete as many items as they can in two and a half or three minutes.
Impact of the Cognitive Revolution and Reader Response Theory
With the switch from a behavioral orientation to a cognitive one, reading tests also changed. Because reading came to be seen as the construction of meaning, retelling became popular. Systems for evaluating the quality of the retellings were created. Think-alouds, in which readers were asked to tell what they were thinking as they read, were also used to assess comprehension. Think-alouds are available for students reading on a sixth-grade level and above in the Qualitative Reading Inventory-4 (Allyn & Bacon). The TARA: Think-aloud Reading Assessment (Monti & Cicchetti, 1996) combines interviews and think-alouds. Designed to provide information about the textbase and situation model, TARA assesses fluency, reading rate, miscues, pre-reading strategies, prior knowledge, comprehension monitoring, fix-up strategies, and retelling. Assessment was also affected by reader response theory. In reader response theory, there is a transaction between the reader and the text so that both are impacted. The reader is changed by the text, and the text is changed by the reader. Although guided by the text, the reader’s response is affected by the reader’s background, so that aesthetic responses tend to be personalized. Tests began featuring longer texts and texts that were authentic.
Introduction of NAEP
Assessment was also influenced by the National Assessment of Educational Progress (NAEP). First administered in 1969, NAEP reading tests are now administered every two years to a national sample of students. Frameworks based on prevailing theories of comprehension were used as a basis for constructing the tests. Early versions emphasized analysis and interpretation. The framework for tests administered from 1992 through 2008 emphasized reader response. The 2009 framework represents a cognitive approach to describing skills and strategies. It describes skills and strategies in terms of the cognitive processes needed to implement them and includes three levels: locate and recall, integrate and interpret, and critique and evaluate.
Role of the Informal Inventory
One of the most frequently used assessments is the informal reading inventory, which uses a series of graded passages to determine students’ reading levels. Emmett Betts is typically credited with creating the informal reading inventory. In 1941, Betts reported on the use of informal reading inventories in the Reading Clinic at Pennsylvania State College (Johns, 2008). The inventory yielded four levels: independent, instructional, frustrational, and listening capacity. Criteria were validated in a study by Killgallon (1942). As originally designed, the inventory had students read passages from increasingly advanced basal readers until their reading levels were established. The inventories were constructed by teachers. The first commercially produced inventories were created by McCracken (1964) and Silvaroli (1969). Silvaroli created a shortened inventory, The Classroom Inventory, that could be administered by the classroom teacher in as little as ten minutes. The Classroom Inventory is now in its tenth edition and is one of more than a dozen commercially produced inventories.
Informal inventories were influenced by Goodman’s (1974) miscue theory. Instead of regarding students’ misreadings as errors, Goodman analyzed them according to their semantic, syntactic, and graphic similarity to the misread word and dubbed these misreadings “miscues.” An analysis of miscues provided insight into the students’ reading processes and could be used to plan instruction. Authors of commercial inventories adapted miscue analysis. Miscue analysis also became a prominent part of running records. Running records, which are based on the concept of the informal reading inventory, have been used to monitor the progress of Reading Recovery students and others, plan instruction, and determine the suitability of texts being used.
Curriculum-Based Assessment and Curriculum-Based Measurement
Curriculum-based assessment (CBA) is designed to assess students’ academic skills based on instructional materials actually used by students (Shapiro, 1996). A CBA informal reading inventory would use the texts that students are reading in their language arts or content area classes. “A curriculum-based assessment (CBA) is a criterion-referenced test that is teacher constructed and designed to reflect curriculum content” (Idol, Nevin, & Paolucci-Whitcomb, 1999, p. ix). CBA can also include work samples to evaluate students’ progress.
Curriculum-Based Measurement (CBM) is a form of CBA. CBM grew out of work being conducted at the University of Minnesota to help special education teachers build more effective programs. The intent was to create valid and reliable formative indicators of progress that could be administered efficiently and frequently (Deno, 1985). Since traditional CBA measures were tied to a particular curriculum and therefore might reflect only mastery of specific content rather than acquisition of generalized skills, tasks were chosen that were not tied to a particular curriculum but that assessed general skill acquisition. For instance, a measure of oral reading fluency would be a general outcome measure that would indicate acquisition of decoding skills and fluency but would not be tied to a particular reading program. Tasks were also chosen that could be measured repeatedly and thus used to monitor students’ progress frequently. In reading, general outcome measures include naming the letters of the alphabet, reading lists of words, oral reading fluency, and completing maze passages. One of the most widely used CBMs is the DIBELS, which contains oral reading passages, measures of phonological awareness, and reading of nonsense words.
With the implementation of Reading First, a program designed to help students in grades K-3 living in impoverished areas, CBM came into wide use. More traditional measures lacked the technical adequacy required by Reading First and other federally sponsored programs. Response to Intervention (RTI) requires progress monitoring, as explained below:
To ensure that underachievement in a child suspected of having a specific learning disability is not due to lack of appropriate instruction in reading or math, the group must consider, as part of the evaluation described in 34 CFR 300.304 through 300.306:
• Data that demonstrate that prior to, or as a part of, the referral process, the child was provided appropriate instruction in regular education settings, delivered by qualified personnel; and
• Data-based documentation of repeated assessments of achievement at reasonable intervals, reflecting formal assessment of student progress during instruction, which was provided to the child’s parents. (U. S. Department of Education, 2006)
CBMs fit the description of the assessments called for by RTI. However, other measures are also being used. CBMs, while adequate for assessing lower-level skills such as fluency and decoding, do not have the assessment power needed to monitor comprehension. A current key issue in assessment is obtaining instruments that have technical adequacy but that also measure essential skills and strategies in an authentic manner (Pearson & Hamm, 2005).
1. What assessment tools are you familiar with?
2. As a reading specialist, do you think intelligence tests should be administered? Why or why not?
3. Which of the assessment tools analyzed above represent best practice?
4. What are some informal measures that might be used to assess literacy development?
5. As a teacher, do you honestly think portfolios are valuable?
6. Do you presently have portfolios in your classroom? Should the idea of portfolios be discouraged or encouraged, and why?