Few would challenge the claim that teachers make a difference for student learning.  Many, however, question whether "teacher effectiveness" should be defined by student test scores, or whether those scores should be used to evaluate teachers.

Since the Kansas Individual Data on Students (KIDS) system will soon make it possible to connect student data to individual teachers (even though both will be identified only by randomly generated ID numbers), it is important to understand the problems that arise when that capability is applied simplistically.


For years, measurement experts have cautioned against using student achievement test scores for measuring teacher performance.  As early as 1974, Gene Glass pointed out that such use is invalid and unreliable.

More recently, states and school systems have begun using various "value-added" statistical models that use student test scores to assess teacher effectiveness.  The state of Tennessee has the longest history in this arena, but even there, William Sanders, who developed the first model, acknowledged statistical difficulties that would have to be overcome for a value-added model to be accurate.

Another researcher, Ray Fenton, raised this concern: "My friends who are statisticians tell me, if the N is 50, generalizing may be OK. If it is less than 30, don't do it."  [N refers to the number of students whose scores are used to evaluate an individual teacher.]
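The intuition behind that warning can be sketched with a little arithmetic: the fewer students behind a class average, the more that average can swing by chance alone. The Python below is only an illustration under stated assumptions. The 15-point standard deviation and the function name are hypothetical, the formula treats scores as independent draws, and the normal approximation it relies on is itself rough for small N. It does not derive the 30 and 50 thresholds quoted above.

```python
import math


def margin_of_error(sd, n, z=1.96):
    """Approximate 95% margin of error for the average of n independent scores."""
    return z * sd / math.sqrt(n)


# Hypothetical spread of student scores; not taken from any real test.
ASSUMED_SD = 15.0

for n in (10, 20, 30, 50, 100):
    moe = margin_of_error(ASSUMED_SD, n)
    print(f"N={n:>3}: class average uncertain by roughly +/-{moe:.1f} points")
```

Under these assumptions the uncertainty shrinks only with the square root of N, so a teacher with a small class is judged on a much noisier average than a teacher with a large one.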

The current scene

Most recently, the National Comprehensive Center for Teacher Quality (NCCTQ) released a series of reports that provide the most up-to-date information about evaluating teachers, including the strengths and weaknesses of using student achievement data for that purpose.  NCCTQ is a collaborative effort among the Educational Testing Service (ETS), Learning Point Associates and Vanderbilt University, and is funded by the US Department of Education.

In short, the reports raise the following concerns about using student test scores to evaluate teachers:

  • It assumes perfect alignment between the test and the curriculum.
  • It assumes that the tests reflect improvement, which standardized tests often do not.
  • It assumes that test performance equals student knowledge and skills, even though test performance is often affected by other influences [motivation, attitude, test-taking skills, etc.].
  • It averages test scores across all students in a classroom.
  • It assumes that teachers are solely accountable for student learning, ignoring the influences of families, peers, the school and school system, etc.

Drawing conclusions about individual teachers based on student test scores is not valid since teachers are not randomly assigned to schools and students are not randomly assigned to teachers. That said, there are potential positive uses for student data linked to individual teachers:

  • Teachers can use the data for formative assessment purposes, to help with instructional decisions prior to a summative assessment.
  • Teachers and schools can use the data to identify areas for professional development.
  • Teachers and schools can use the data to identify groups of students with specific needs that should be addressed.

