The Worst Teachers?

“There is a decently large percentage of teachers who are saying that they feel evaluation isn’t fair,” he (state data guru Nate Schwartz) said. “That’s something we need to think about in the process we use to evaluate teachers … and what we can do to make clear to teachers how this process works so they feel more secure about it.”

This from a story about the recently released 2015 Educator Survey regarding teacher attitudes in Tennessee.

One reason teachers might feel the evaluation is unfair is the continued push to align observation scores with TVAAS (Tennessee Value-Added Assessment System) data – data that purportedly captures student growth and thereby represents an indicator of teacher performance.

From WPLN:

Classroom observation scores calculated by principals should roughly line up with how a teacher’s students do on standardized tests. That’s what state education officials believe. But the numbers on the state’s five-point scale don’t match up well.

“The gap between observation and individual growth largely exists because we see so few evaluators giving 1s or 2s on observation,” the report states.

“The goal is not perfect alignment,” Department of Education assistant commissioner Paul Fleming says, acknowledging that a teacher could be doing many of the right things at the front of the class and still not get the test results to show for it. But the two figures should be close.

To better align observation scores with TVAAS scores, principals could start by assigning lower scores to sixth and seventh grade teachers. At least, that’s what the findings of a study by Jessica Holloway-Libell, published in June in the Teachers College Record, suggest.

Holloway-Libell studied value-added scores assigned to individual schools in 10 urban and suburban Tennessee districts and found:

In ELA in 2013, schools were, across the board, much more likely to receive positive value-added scores for ELA in fourth and eighth grades than in other grades (see Table 1). Simultaneously, districts struggled to yield positive value-added scores for their sixth and seventh grades in the same subject-areas. Fifth grade scores fell consistently in the middle range, while the third-grade scores varied across districts.

Table 1. Percent of Schools that had Positive Value-Added Scores in English/Language Arts by Grade and District (2013) (Districts in which fewer than 25% of schools showed positive growth are in bold)

District       Third   Fourth   Fifth   Sixth   Seventh   Eighth
Memphis         41%     43%      45%     19%     14%       76%
Nashville       NA      43%      28%     16%     15%       74%
Knox            72%     79%      47%     14%      7%       73%
Hamilton        38%     64%      48%     33%     29%       81%
Shelby          97%     76%      61%      6%     50%       69%
Sumner          77%     85%      42%     17%     33%       83%
Montgomery      NA      71%      62%      0%      0%       71%
Rutherford      83%     92%      63%     15%     23%       85%
Williamson      NA      88%      58%     11%     33%      100%
Murfreesboro    NA      90%      50%     30%     NA        NA

SOURCE: Teachers College Record, published June 8, 2015, http://www.tcrecord.org, ID Number: 17987, accessed July 27, 2015.

In examining three-year averages, Holloway-Libell found:

The three-year composite scores were similar except even more schools received positive value-added scores for the fifth and eighth grades. In fact, in each of the nine districts that had a composite score for eighth grade, at least 86% of their schools received positive value-added scores at the eighth-grade level.

By contrast, results in math were more consistently positive across grade levels and district types:

In particular, the fourth and seventh grade-level scores were consistently higher than those of the third, fifth, sixth, and eighth grades, which illustrated much greater variation across districts. The three-year composite scores were similar. In fact, a majority of schools across the state received positive value-added scores in mathematics across all grade levels.

So, what does this mean?

Well, it could mean that Tennessee’s 6th and 7th grade ELA teachers are the worst in the state. Or, it could mean that math teachers in Tennessee are better teachers than ELA teachers. Or, it could mean that 8th grade ELA teachers are rock stars.

Alternatively, one might suspect that the results of Holloway-Libell’s analysis suggest both grade level and subject matter bias in TVAAS.

In short, TVAAS is an unreliable predictor of teacher performance. Or, teaching 6th and 7th grade students reading is really hard.

Holloway-Libell’s findings are consistent with those of Lockwood and McCaffrey (2007) published in the Journal of Educational Measurement:

The researchers tested various VAM models and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

That is, it’s totally consistent with VAM to have different estimates for math and ELA teachers, for example. Math questions are often asked in a different manner than ELA questions, and the assessments cover different subject matter.

So, TVAAS is like other VAM models in this respect. Which means, as Lockwood and McCaffrey suggest, “caution is needed when interpreting estimated teacher effects” when using VAM models (like TVAAS).
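That sensitivity is easy to demonstrate with a toy simulation. This is a hypothetical sketch with invented numbers, not the actual TVAAS model: give each teacher one fixed “true” effect, estimate it from two achievement measures that each add their own measure-specific noise, and see how much the rankings shuffle.

```python
import random
import statistics

random.seed(42)

N_TEACHERS = 20
STUDENTS = 25     # students per classroom
TRUE_SD = 0.15    # spread of true teacher effects (invented)
MEASURE_SD = 0.5  # measure-specific noise: different skills tested (invented)

# Each teacher has exactly one underlying effect in this toy model.
true_effects = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]

def estimate_effects():
    """Estimate each teacher's effect as the mean classroom score on one
    achievement measure: true effect + student noise + measure noise."""
    return [
        statistics.mean(
            e + random.gauss(0, 1) + random.gauss(0, MEASURE_SD)
            for _ in range(STUDENTS)
        )
        for e in true_effects
    ]

est_a = estimate_effects()  # e.g., a procedures-focused test
est_b = estimate_effects()  # e.g., a problem-solving test

def ranks(xs):
    """Rank positions (0 = lowest) for each teacher."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

ra, rb = ranks(est_a), ranks(est_b)
moved = sum(abs(a - b) >= N_TEACHERS // 5 for a, b in zip(ra, rb))
print(f"{moved} of {N_TEACHERS} teachers move a quintile or more between measures")
```

Every teacher’s underlying effect is identical across the two runs; only the test changed. Yet the measure-specific noise alone reshuffles teacher rankings, which is exactly the caution Lockwood and McCaffrey raise.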

In other words: TVAAS is not a reliable predictor of teacher performance.

Which raises the question: Why is the Tennessee Department of Education attempting to force correlation between observed teacher behavior and a flawed, unreliable measure of teacher performance? More importantly, why is such an unreliable measure being used to evaluate (and in some districts, reward with salary increases) teachers?

Don’t Tennessee’s students and parents deserve a teacher evaluation system that actually reveals strong teaching and provides support for teachers who need improvement?

Aren’t Tennessee’s teachers deserving of meaningful evaluation based on sound evidence instead of a system that is consistent only in its unreliability?

The American Statistical Association has said value-added models generally are unreliable as predictors of teacher performance. Now, there’s Tennessee-specific evidence that suggests strongly that TVAAS is biased, unreliable, and not effective as a predictor of teacher performance.

Unless, that is, you believe that 6th and 7th grade ELA teachers are our state’s worst.

For more on education politics and policy in Tennessee, follow @TNEdReport



Is John Oliver Reading TN Ed Report?

John Oliver recently took on the issue of standardized testing and it sounds like he’s been reading Tennessee Education Report. In 18 brilliant minutes, he hits on a number of topics covered here time and again.

Oliver discussed teacher merit pay, the recruiting tactics of testing companies, value-added assessment, and testing transparency.

Back in 2013, Tennessee’s State Board of Education moved toward merit pay based on value-added data.

This year, while adding nearly $100 million to the pot for teacher compensation, Governor Haslam continued a push for merit pay.

While Oliver noted that Pearson recruits test scorers on Craigslist, Tennessee’s new testing vendor, Measurement, Inc., uses the same practice.

And of course, there’s the issue of value-added assessment — in Tennessee, called TVAAS. While it yields some interesting information, it’s not a reliable predictor of teacher performance and it’s going to be even more unreliable going forward, due to the shift from TCAP to TNReady. Here’s what we’ve learned from TVAAS in Tennessee:

In fact, this analysis demonstrates that the difference between a value-added identified “great” teacher and a value-added identified “average” teacher is about $300 in earnings per year per student.  So, not that much at all.  Statistically speaking, we’d call that insignificant.  That’s not to say that teachers don’t impact students.  It IS to say that TVAAS data tells us very little about HOW teachers impact students.

Surprisingly, Tennessee has spent roughly $326 million on TVAAS and attendant assessment over the past 20 years. That’s $16 million a year on a system that is not yielding much useful information.

And then there’s testing transparency. Oliver points out that it’s difficult if not impossible to get access to the actual test questions. In fact, Tennessee’s testing vendor, Measurement, Inc., has a contract with Utah’s testing vendor that involves a fine if test questions are revealed — $5000 per question:

The contract further notes that any release of the questions either by accident or as required by law, will result in a fee of $5000 per test item released. That means if Tennessee wants to release a bank of questions generated from the Utah test and used for Tennessee’s assessment, the state would pay $5000 per question.

Here’s the clip from John Oliver:


For more on education politics and policy in Tennessee, follow @TNEdReport


Validating the Invalid?

The Tennessee House of Representatives passed legislation today (HB 108) that makes changes to current practice in teacher evaluation as Tennessee transitions to its new testing regime, TNReady.

The changes adjust the percentage of a teacher’s evaluation that is dependent on TVAAS scores to 10% next year, 20% the following year, and back to the current 35% by the 2017-18 academic year.

This plan is designed to allow for a transition period to the new TNReady tests, which will include constructed-response questions and be aligned to the so-called Tennessee standards, which match up with the Common Core State Standards.

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format: it’s computer-based, and it contains constructed-response questions. That is, students must write out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

Clearly, legislators feel that this is, at the very least, an improvement: a reasonable accommodation to teachers as our state makes a transition.

But, how is using 10% of an invalid number a good thing? Should any part of a teacher’s evaluation be made up of a number that reveals nothing at all about that teacher’s performance?

While value-added data alone is a relatively poor predictor of teacher performance, the value-added estimate used next year is especially poor because it is not at all valid.

But, don’t just take my word for it. Researchers studying the validity of value-added measures asked whether value-added gave different results depending on the type of question asked. That is particularly relevant now, because Tennessee is shifting to a new test with different types of questions.

Here’s what Lockwood and McCaffrey (2007) had to say in the Journal of Educational Measurement:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
These findings align with similar findings by Martineau (2006) and Schmidt et al. (2005): you get different results depending on the type of question you’re measuring.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured. 

And they concluded:

Our results provide a clear example that caution is needed when interpreting estimated teacher effects because there is the potential for teacher performance to depend on the skills that are measured by the achievement tests.

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

Or, if the state is determined to use growth scores (and wants to use them with accuracy), it should wait several years and build completely new growth models based on TNReady alone. At least three years of data would be needed to build such a model.
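To see why a baseline on the same test matters, here is a deliberately simplified growth model. This is a hypothetical sketch with invented numbers, not TVAAS’s actual multivariate longitudinal methodology: fit a statewide prediction of this year’s score from last year’s score, then credit a teacher with the average amount her students beat that prediction. The first step requires prior-year scores on the same assessment, which is precisely what a brand-new test lacks.

```python
import random
import statistics

random.seed(0)

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Simulated statewide data: prior-year score predicts the current score.
# All parameters here are invented for illustration.
prior = [random.gauss(50, 10) for _ in range(1000)]
current = [0.8 * p + 10 + random.gauss(0, 5) for p in prior]

a, b = linear_fit(prior, current)  # the statewide "expected growth" line

# One hypothetical classroom of 25 students whose teacher adds 2 points.
class_prior = prior[:25]
class_current = [c + 2.0 for c in current[:25]]

# Value-added = mean amount the students beat their predicted scores by.
value_added = statistics.mean(
    c - (a + b * p) for c, p in zip(class_current, class_prior)
)
print(f"estimated value-added: {value_added:.1f} points")
```

Strike the `prior` list, as happens in year one of a new test, and the prediction line cannot be fit at all; there is simply nothing to measure growth against until several years of same-test data accumulate.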

It seems likely that the Senate will follow the House’s lead on Monday and overwhelmingly support the proposed evaluation changes. But in doing so, they should be asking themselves if it’s really ok to base any part of a teacher’s evaluation on numbers that reliably predict nothing.

More on Value-Added:

Real World Harms of Value-Added Data

Struggles with Value-Added Data


Value-Added Changes


In what is certain to be welcome news to many teachers across the state, Governor Bill Haslam announced yesterday that he will be proposing changes to the state’s teacher evaluation process in the 2015 legislative session.

Perhaps the most significant proposal is to reduce the weight of value-added data on teacher evaluations during the transition to a new test for Tennessee students.

From the Governor’s press release explaining the proposed changes:

The governor’s proposal would:
•        Adjust the weighting of student growth data in a teacher’s evaluation so that the new state assessments in ELA and math will count 10 percent of the overall evaluation in the first year of administration (2016), 20 percent in year two (2017) and 35 percent in year three (2018). Currently 35 percent of an educator’s evaluation is comprised of student achievement data based on student growth;

•        Lower the weight of student achievement growth for teachers in non-tested grades and subjects from 25 percent to 15 percent;

•        And make explicit local school district discretion in both the qualitative teacher evaluation model that is used for the observation portion of the evaluation as well as the specific weight student achievement growth in evaluations will play in personnel decisions made by the district.


The proposal does not go as far as some have proposed, but it does represent a transition period to new tests that teachers have been seeking.  It also provides more local discretion in how evaluations are conducted.

Some educators and critics question the ability of value-added modeling to accurately predict teacher performance.

In fact, the American Statistical Association released a statement on value-added models that says, in part:

Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores

Additional analysis of the ability of value-added modeling to predict significant differences in teacher performance finds that this data doesn’t effectively differentiate among teachers.

I certainly have been critical of the over-reliance on value-added modeling in the TEAM evaluation model used in Tennessee. While the proposed change ultimately returns to using VAM for a significant portion of teacher scores, it also represents an opportunity to both transition to a new test AND explore other options for improving the teacher evaluation system.

For more on value-added modeling and its impact on the teaching profession:

Saving Money and Supporting Teachers

Real World Harms of Value-Added Data

Struggles with Value-Added Data

An Ineffective Teacher?

Principals’ Group Challenges VAM


For more on education policy and politics in Tennessee, follow @TNEdReport

Ravitch: Ed Reform is a Hoax

Education scholar and activist Diane Ravitch spoke at Vanderbilt University in Nashville last night at an event hosted by Tennesseans Reclaiming Educational Excellence (TREE), the Tennessee BATs (Badass Teachers), and the Momma Bears.

Ravitch touched on a number of hot-button education issues, including vouchers, charter schools, teacher evaluations, and testing. Many of these issues are seeing plenty of attention in Tennessee public policy circles both on the local and state levels.

She singled out K12, Inc. as a bad actor in the education space, calling the Tennessee Virtual Academy it runs a “sham.”

Attempts have been made to cap enrollment and shut down K12, Inc. in Tennessee, but it is still operating this year. More recently, the Union County School Board defied the State Department of Education and allowed 626 students to remain enrolled in the troubled school. The reason? Union County gets a payoff of $132,000 from its contract with K12.

Ravitch noted that there are good actors in the charter sector, but also said she adamantly opposes for-profit charter schools. Legislation that ultimately failed in 2014 would have allowed for-profit charter management companies to be hired by Tennessee charter schools.

On vouchers, an issue that has been a hot topic in the last two General Assemblies, Ravitch pointed to well-established data from Milwaukee that vouchers have made no difference in overall student performance.

Despite the evidence against vouchers, it seems quite likely they will again be an issue in the 2015 General Assembly. In fact, the Koch Brothers and their allies spent heavily in the recent elections to ensure that vouchers are back on the agenda.

Ravitch told the crowd that using value-added data to evaluate teachers makes no sense. The Tennessee Value-Added Assessment System (TVAAS) has been around since the BEP in 1992. It was created by UT Ag Professor Bill Sanders. Outgoing Commissioner of Education Kevin Huffman made an attempt to tie teacher licenses to TVAAS scores, but that was later repealed by the state board of education. A careful analysis of the claims of value-added proponents demonstrates that the data reveals very little in terms of differentiation among teachers.

Ravitch said that instead of punitive evaluation systems, teachers need resources and support. Specifically, she mentioned Peer Assistance and Review as an effective way to provide support and meaningful development to teachers.

A crowd of around 400 listened and responded positively throughout the hour-long speech. Ravitch encouraged the audience to speak up about the harms of ed reform and rally for the reforms and investments our schools truly need.

For more on education politics and policy in Tennessee, follow @TNEdReport