Haslam’s TNReady Pit

Chalkbeat has a story demonstrating just how out of touch outgoing Governor Bill Haslam is. The story details Haslam’s belief that tying teacher evaluation to TNReady results is a key element in Tennessee’s recent education success.

Here’s some truth: Over the past few years, Tennessee has seen high school graduation rates and average ACT scores climb while also seeing the number of students requiring remediation at state schools decline. All of that is encouraging. All of it happened in a climate where the TNReady test was unreliable and poorly administered. In other words, Tennessee’s testing system had nothing to do with student performance. All other indicators point to teachers getting the job done and students hitting ever higher marks.

Here’s more truth:

Does basing teacher evaluation on student test scores get results that impact student outcomes?

No.

That’s the conclusion from a years-long study funded by the Gates Foundation that included Memphis/Shelby County Schools.

It’s also worth noting that while Haslam touts the “fastest-improving” NAEP results from back in 2013, further evidence suggests the results then were likely an outlier.

Here’s more from Chalkbeat:

Gov. Bill Haslam says he had a “pit” in his stomach every day of Tennessee’s testing season this spring when a parade of technical problems vexed students and teachers in the bumpy transition to computerized exams.

He also worries that three straight years of frustrations with the state’s 3-year-old standardized assessment, TNReady, could unravel policies that he believes led to students’ gains on national tests.

“Do we really want to go back? Do we really want to go back to when Tennessee was in the 40s out of the states ranked 1 to 50?” the outgoing Republican governor asked recently in an exclusive interview with Chalkbeat.

First, no serious policymaker is suggesting Tennessee adopt weaker or lower standards for students.

Second, as noted above, other significant indicators demonstrate Tennessee students are improving — even without a reliable annual test.

Third, Haslam’s “beliefs” about policies have not been tested on a statewide level, in part because his own administration failed to properly administer the tests. Haslam has allowed Commissioner of Education Candice McQueen to keep her job despite multiple testing failures with different vendors. In fact, Haslam joined McQueen in touting a “new” testing vendor that turned out to be the parent company of the current vendor.

More from Chalkbeat:

“Hopefully Tennessee and the new administration won’t have the same struggles we’ve had this year with testing. But there will be some struggles; there just are by the very nature of it,” he said. “I worry that the struggles will cause us to say, ‘OK, we give. We’re no longer going to have an evaluation that’s tied to an assessment.’”

To this, I’d note that experts suggest no state has had a more tumultuous transition to online testing than Tennessee:

“I’m not aware of a state that has had a more troubled transition” to online testing, said Douglas A. Levin of the consulting group EdTech Strategies.

In terms of an evaluation tied to an assessment, even if TNReady had gone well, the results in the initial years would not be in any way valid for use in teacher evaluation. That’s because the nature of value-added assessment requires multiple years of similar testing in order to produce results that are even vaguely reliable predictors of teacher performance. Here’s a bit more on that:

Lockwood and McCaffrey (2007), writing in the Journal of Educational Measurement, put it this way:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
These findings align with similar results from Martineau (2006) and Schmidt et al. (2005): you get different results depending on which skills a test measures.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

And they concluded:

Our results provide a clear example that caution is needed when interpreting estimated teacher effects because there is the potential for teacher performance to depend on the skills that are measured by the achievement tests.

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady measures different skills in a different format than TCAP did. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. Only as more years of data become available might it be possible to establish some correlation between past TCAP results and TNReady scores.
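A toy simulation makes the point concrete. This is not the researchers’ model or TVAAS itself, just a hypothetical sketch: when two tests weight two underlying skills differently, the same teachers come out ranked differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_students = 50, 30

# Each teacher imparts two latent skills (say, procedures vs. problem solving).
skill_a = rng.normal(0, 1, n_teachers)
skill_b = rng.normal(0, 1, n_teachers)

def estimated_effects(weight_a, weight_b):
    """Estimate each teacher's 'effect' as the mean student gain on a test
    that weights the two skills according to weight_a and weight_b."""
    effects = []
    for t in range(n_teachers):
        true_gain = weight_a * skill_a[t] + weight_b * skill_b[t]
        observed = true_gain + rng.normal(0, 1, n_students)  # student-level noise
        effects.append(observed.mean())
    return np.array(effects)

# Test 1 emphasizes skill A; test 2 emphasizes skill B.
test1 = estimated_effects(0.9, 0.1)
test2 = estimated_effects(0.1, 0.9)

# The correlation lands well below 1: a teacher's estimated "effect"
# depends heavily on which skills the test happens to measure.
print(np.corrcoef(test1, test2)[0, 1])
```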

But, TNReady hasn’t gone well. At all. It’s been so bad that the Department of Education has been unveiling a series of pie charts to demonstrate how it is attempting to correlate test scores and teacher evaluation. The first chart broke the growth score down like this:

This chart is crazy. A teacher’s growth score is factored from tests given in three different years and in three different formats.

15% of the growth score comes from the old TCAP (the test given in 2014-15, because the 2015-16 test had problems). Then, 10% comes from last year’s TNReady, which was given with paper and pencil. Last year was the first full administration of TNReady, and there were a few problems with the data calculation. A final 10% comes from this year’s TNReady, given online.

So, you have data from the old test, a skipped year, data from last year’s test (the first time TNReady had truly been administered), and data from this year’s messed-up test.
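To put rough numbers on it, here is a back-of-the-envelope sketch with made-up growth scores. The actual TVAAS calculation is a complex statistical model, not a simple weighted average; this only shows how the 15/10/10 weights blend three very different tests into one number.

```python
# Hypothetical growth indices for one teacher; the values are invented.
tcap_2014_15    = 3.0  # old TCAP, weighted at 15%
tnready_2016_17 = 4.0  # paper-and-pencil TNReady, weighted at 10%
tnready_2017_18 = 2.0  # this year's online TNReady, weighted at 10%

# The weights sum to 0.35 -- the growth component's share of the evaluation.
growth_component = (0.15 * tcap_2014_15
                    + 0.10 * tnready_2016_17
                    + 0.10 * tnready_2017_18)
print(growth_component)  # one number built from three unlike tests
```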

There is no way this creates any kind of valid score related to teacher performance. At all.

After teachers expressed outrage that the DOE was going to count this year’s scores in their evaluations, the legislature finally acted, passing legislation providing that teachers could face “no adverse action” based on this year’s test results.

So, now the Department of Education has more pie charts and a lot of explanations:

What is included in teacher evaluation generally?

There are many factors that go into a teacher’s overall evaluation. One of those, the individual growth component (in gray in the charts in this document), is typically based on a three-year TVAAS measure if data is available. However, for the phase-in period there are two key items to note for the growth component:

• If the current single-year growth score – in this case, 2017-18 data – provides the educator with a higher overall composite, it will be used as the full growth score.

• Additionally, if a teacher has 2017-18 TNReady data included in any part of their evaluation, they will be able to nullify their entire LOE this year.

What is included in teacher evaluation in 2017-18 for a teacher with 3 years of TVAAS data?

There are three composite options for this teacher:

• Option 1: TVAAS data from 2017-18 will be factored in at 10%, TVAAS data from 2016-17 will be factored in at 10% and TVAAS data from 2015-16 will be factored in at 15% if it benefits the teacher.

• Option 2: TVAAS data from 2017-18 and 2016-17 will be factored in at 35%.

• Option 3: TVAAS data from 2017-18 will be factored in at 35%.

The option that results in the highest LOE for the teacher will be automatically applied. Since 2017-18 TNReady data is included in this calculation, this teacher may nullify his or her entire LOE this year.

And if you only have one or two years of TVAAS data or if you teach in a non-tested subject? Well, the key line continues to apply: Since 2017-18 TNReady data is included in this calculation, this teacher may nullify his or her entire LOE this year.
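To make the pick-the-best-option mechanics concrete, here is a minimal sketch with hypothetical single-year growth scores. It assumes Option 2 blends its two years equally (the DOE document doesn’t say how) and that, with every other evaluation component held fixed, the highest growth component produces the highest LOE.

```python
def best_growth_option(tvaas_15_16, tvaas_16_17, tvaas_17_18):
    """Return the best-scoring of the three growth options described above.
    Inputs are hypothetical growth indices; the equal blend in Option 2
    is an assumption made for illustration."""
    options = {
        "Option 1 (15/10/10)":       0.15 * tvaas_15_16 + 0.10 * tvaas_16_17 + 0.10 * tvaas_17_18,
        "Option 2 (two years, 35%)": 0.35 * (tvaas_16_17 + tvaas_17_18) / 2,
        "Option 3 (2017-18, 35%)":   0.35 * tvaas_17_18,
    }
    return max(options.items(), key=lambda kv: kv[1])

# A teacher with a strong 2017-18 year is carried by Option 3:
print(best_growth_option(2.0, 3.0, 5.0))  # ('Option 3 (2017-18, 35%)', 1.75)
```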

What does this mean? Well, it means you’d have a year with no evaluation score. Sounds fine, right? No. It’s not fine.

In order to achieve tenure, a teacher must have consecutive years of evaluation scores at Level 4 or 5. A year with no score at all means that teacher would then need TWO MORE YEARS of high scores in order to be tenure-eligible. While it seems unlikely a teacher would choose to nullify their entire score if they achieved a high rank, it also seems only fair to allow that teacher to simply exclude the TNReady data and receive their LOE rating based on all the other factors that go into a TEAM rating.
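Here is a toy illustration of that tenure-timing problem, assuming the consecutive-high-scores rule works as just described:

```python
def consecutive_high_years(loe_history):
    """Count the trailing run of years with an LOE of 4 or 5.
    A nullified year (None) has no score at all, so it breaks the run."""
    run = 0
    for score in reversed(loe_history):
        if score is not None and score >= 4:
            run += 1
        else:
            break
    return run

# Two straight high scores: on track for tenure.
print(consecutive_high_years([5, 4]))     # 2
# Nullifying 2017-18 wipes out the streak: two more high years needed.
print(consecutive_high_years([5, None]))  # 0
```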

But wait: excluding 2017-18 TNReady data is NOT one of the options provided. It’s either count it as 10%, count it as 35%, or nullify your entire LOE score. Any of those choices could have an adverse impact on a teacher.

In short, the TNReady mess has made teacher evaluation a mess. Still, a host of indicators suggest Tennessee’s teachers are hitting the mark. One might conclude that tying a suspect teacher evaluation model to an unreliable test is, in fact, not the key to educational progress in our state. Unfortunately, Governor Bill Haslam has concluded the opposite.

For more on education politics and policy in Tennessee, follow @TNEdReport
