Teacher Evaluation: In search of the Holy Grail


Wednesday, May 18th, 2011 - 14:07
How should the performance of the nation’s teachers and schools be measured? This may well be the single most contentious measurement-related issue confronting federal, state and local leaders today.

Standardized tests are praised by some as a rational route to evaluating individual teachers or districts, while others vilify the technique as an outright attack on the entire educational profession. This debate, highly politicized, has emerged in many press articles as the beginning and the end of discussions about teacher evaluation.

Barrett and Greene

The Race to the Top program, passed as part of the stimulus package, however, has encouraged a fair amount of experimentation in this area. The grant competition awarded cash to a number of states that were able to demonstrate they had strong and potentially sensible ideas relating to reform in schools. Of course, one of the big areas for recommended reforms involved methods for evaluating success.

A few years ago, Secretary of Education Arne Duncan described the Race to the Top as a “moon shot,” in that it was a first-ever opportunity to break with failed policies of the past. It doesn’t appear as if any of the states have yet planted their flag on the lunar surface, though. But one thing that’s clear is that the status quo of years past – in which many places made meager efforts to evaluate their educational systems or teachers – runs counter to the trend of making sure that taxpayers get the best bang for the bucks they hand over to government.

“The status quo around teacher evaluation in the past, in many places, was equivalent to educational malpractice,” says Dr. Raymond Pecheone, the Executive Director of the Stanford Center for Assessment, Learning, and Equity (SCALE). “The data that came out of it wasn’t trustworthy and there was little or no evidence that there was validity and reliability with respect to teacher competence. Race to the Top has transformed that conversation.”

Though we have come to believe that Race to the Top has real potential, it’s only fair to point out that the program has its critics, many of whom complain about particular approaches being taken. John Ewing, president of Math for America, for example, criticizes the degree to which people are talking about so-called value-added measurement methods. He reflects on some school superintendents he has seen planning to set salaries based on test score changes in summer school, a process he calls “breathtakingly stupid.”

But whether it’s a question of standardized tests, value-added or any other means, a growing number of experts in the states are pushing for a broad mixture of measures, intended to paint a complete portrait. Said one, “I'm not against tests, but they are only part of a much larger collection of judgments.”

Maryland is leading in the effort to create a spectrum of measures which leaders there hope will be both fair and useful to educational policy makers. In 2010, the state received $250 million in Race to the Top money, to be spread out over four years. “Maryland has committed to multiple measures,” says Mary L. Gable, Assistant State Superintendent in the Division of Academic Policy of the Maryland State Department of Education. “If you link it to just one assessment there is a concern” that the students’ results on a particular day won’t be “reflective of all that is happening in the classroom. . . . We are putting an emphasis on multiple measures so that the teacher’s evaluation is not just one assessment that occurs in May.” What’s more, about half the teachers in the state work in non-tested areas, like art, so relying on tests to evaluate them is a physical impossibility.

Gable describes some of the measures Maryland is considering—these are tentative and not set in stone:

  • Benchmark testing, which would measure how well the students are learning small chunks of information at points throughout the school year. The advantage here is that it would provide the capacity to see where students or classes are falling behind long before it is too late (and long before any kind of broad-scale standardized test could be given).
  • Examining the work of an entire school or district in moving students forward from the beginning to the end of the year in an effort to see how educators work together to improve an entire school.
  • Developing portfolios – actual samples of students’ work. “For example, if they were writing essays, use a rubric to assess a sample of work at the beginning of the year and also applying a rubric to that work at the end of the year,” explains Gable.

Of course, all of these are quantitative components. And educational leaders in Maryland, as well as other states, are increasingly interested in using qualitative measures that look at things like planning, preparation, classroom environment and so on. Such measures make up about half of Maryland’s evaluation.

Some states are considering moving to a system of self-evaluation by teachers. In Ohio, for example, teachers would assess themselves, identify strengths and weaknesses and go over ideas with an administrator. This process could also incorporate some student results to see how well the perception aligns with the data.

Though there’s reason for optimism in such efforts, at best, evaluating teachers is fraught with peril. In Georgia, the 80,000-member Professional Association of Georgia Educators runs a so-called “star teacher/star student program,” according to Tim Callahan, a spokesman for the organization. Each year, the group selects outstanding high school seniors from across the state and then asks them to name and describe the characteristics of the teacher who had the greatest impact on them. There’s a fair amount of uniformity in the kinds of adjectives the students use to describe the teachers they’ve chosen as outstanding. Four of the qualities the students most often cite are that the teacher knew the subject; exhibited enthusiasm; seemed to take a personal interest in the student; and had a sense of humor.

Callahan’s point is that there’s no system he’s seen that actually evaluates teachers on these four characteristics – even though they appear to be the ones that matter most to students. “We have something that captures [the idea that] teaching is almost as much art as it is science,” he says.

As much as we enjoy seeing states trying different approaches, we have to admit to a certain amount of concern about rushing to find the final, one-size-fits-all approach. Holy Grails are pretty hard to find in the real world. “I wouldn’t rush to judgment about putting policies in place that are not well researched,” says Pecheone. “One of the worries is that policies get embedded in regulations or statutes without having critical support. That doesn’t mean you have to wait decades for the research, but at a minimum those decisions should be made once you start collecting data.”


Photo credit: Paul Stevenson