Validating the Invalid?

The Tennessee House of Representatives passed legislation today (HB 108) that makes changes to current practice in teacher evaluation as Tennessee transitions to its new testing regime, TNReady.

The changes reduce the share of a teacher’s evaluation that depends on TVAAS scores to 10% next year and 20% the following year, returning to the current 35% by the 2017-18 academic year.

This plan is designed to allow a transition period to the new TNReady tests, which will include constructed-response questions and be aligned to the so-called Tennessee standards that match up with the Common Core State Standards.

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not yet been fully designed. Second, the test is in a different format: it is both computer-based and contains constructed-response questions. That is, students must write out answers and/or show their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

Clearly, legislators feel that, at the very least, this is an improvement: a reasonable accommodation to teachers as our state makes a transition.

But, how is using 10% of an invalid number a good thing? Should any part of a teacher’s evaluation be made up of a number that reveals nothing at all about that teacher’s performance?

While value-added data alone is a relatively poor predictor of teacher performance, the value-added estimate used next year is especially poor because it is not at all valid.

But don’t just take my word for it. Researchers studying the validity of value-added measures asked whether value-added gave different results depending on the type of question asked. That question is particularly relevant now, because Tennessee is shifting to a new test with different types of questions.

Here’s what Lockwood and McCaffrey (2007) had to say in the Journal of Educational Measurement:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.

These findings align with similar results from Martineau (2006) and Schmidt et al. (2005): you get different results depending on the type of question you’re measuring.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured. 

And they concluded:

Our results provide a clear example that caution is needed when interpreting estimated teacher effects because there is the potential for teacher performance to depend on the skills that are measured by the achievement tests.

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady measures different skills in a different format than TCAP did. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to establish some correlation between past TCAP results and TNReady scores.

Or, if the state is determined to use growth scores (and wants to use them accurately), it should wait several years and build completely new growth models based on TNReady alone. At least three years of data would be needed to build such a model.

It seems likely that the Senate will follow the House’s lead on Monday and overwhelmingly support the proposed evaluation changes. But in doing so, they should be asking themselves if it’s really ok to base any part of a teacher’s evaluation on numbers that reliably predict nothing.

More on Value-Added:

Real World Harms of Value-Added Data

Struggles with Value-Added Data


Do Your Job, Get Less Money

Over at Bluff City Ed, there’s an article analyzing the new pay scale for teachers in Shelby County Schools. The scale is weighted toward TVAAS data and the evaluation rubric, which rates teachers on a scale of 1-5, 1 being significantly below expectations and 5 being significantly above. A teacher earning a 3 “meets expectations.” That means they are doing their job and doing it well.

Jon does a nice job of breaking down what it means to “meet expectations.” But here’s the problem he’s highlighting: teachers who meet expectations in the new system would see a reduction in their annual step raise. That’s right: they do their job and meet the district’s performance expectations, yet earn LESS than they would under the current pay system.

Jon puts it this way:

But what the district outlines as meeting expectations exemplifies a hardworking and effective educator who is making real progress with their community, school and students. If a teacher is doing all these things, I believe that they should be in line for a yearly raise, not a cut. At its core, this new merit pay system devalues our teachers who fulfill their professional duties in every conceivable way.

I would add to this argument that to the extent that the new pay scale is based on a flawed TVAAS system which provides minimal differentiation among teachers, it is also flawed. Value-added data does not reveal much about the differences in teacher performance. As such, this data shouldn’t weigh heavily (or at all) in performance pay schemes.

Systems like Shelby County may be better served by a pay scale that starts teachers at a high salary and rewards them well over time. Increasing pay overall creates the type of economic incentives that both attract strong teachers and encourage school systems to develop talent and counsel out low performers.

Shelby County can certainly do more to attract and retain strong teaching talent. But the new pay scale is the wrong way to achieve that goal.

For more on education politics and policy in Tennessee, follow @TNEdReport


NEA President Visits Nashville

National Education Association President Lily Eskelsen Garcia was in Nashville today to kick off American Education Week.

While in town, she visited Shwab Elementary, where she toured the school and served as a guest teacher in a first grade classroom.

After the tour and class visit, Garcia was available to the media.

Here are some highlights of what she had to say:

On education policymaking:

“Policymakers should respect educators. We don’t need top-down management of teachers. We need to trust teachers and treat them like professionals. When we begin trusting teachers and providing them with resources, we’ll unleash a true revolution in education.”

On Common Core:

Garcia says she was initially a Common Core skeptic, but she reviewed the standards for 6th grade, which she taught, and found them to be reasonable. She said Common Core is and should be a state initiative.

“Common Core belongs to the states and states should adapt it to meet their needs. In order for Common Core to work, we need to get back to trusting teachers. Common Core sets the standard. Teachers should decide how to meet those standards. Where Common Core has failed, it is because of top-down management. Implementation must include teachers and trust teachers to meet the standards.”

On Value-Added Modeling:

“Voodoo value-added models are silly. They are silly because the voodoo formula can’t control for factors like poverty that impact kids. They can’t control for the fact that a kid may be hungry or may be an English Language Learner taking a test in English instead of their native language.

“I was the Utah Teacher of the Year. I know that kids are more than a test score. I’m not afraid of evaluation, I welcome it. Data can be helpful, but high-stakes use of value-added data is not appropriate.”

On NEA’s Education Agenda:

“NEA wants to end No Child Left Untested,” Garcia said. “2014 is the magic year when all kids were supposed to be proficient. Now, we’ve got a waiver process because that goal is simply not possible with human students. This just shows that NCLB was a fraud.

“NEA wants the federal government to set standards and provide resources and then listen to teachers and local communities.”

On Tennessee Senator Lamar Alexander’s Agenda with the Health, Education, Labor and Pensions (HELP) Committee:

“NEA shares common ground with Sen. Alexander on the need for local control and an end to the waiver process for NCLB. We also agree with him on the need to focus more on National Board Certification for teachers.

“Where we differ with Sen. Alexander is on his push for privatization, whether it be vouchers or charters. If Sen. Alexander respects science and data, he’ll see that charters and vouchers simply don’t work.”

On creating an “all-choice” zone in East Nashville:

Garcia said she wasn’t familiar with the specifics of the East Nashville plan, but said, “Whenever you see people pushing grand plans to expand charters, they’re just not reading the research. The research shows that charters aren’t any better than district schools.”

She also suggested that the few charter success stories happen as a result of significant outside money being poured in. “If districts saw that kind of money coming into their schools, they’d see a difference, too.”

For more on education politics and policy in Tennessee, follow @TNEdReport