New and Not Ready

Connie Kirby and Carol Bomar-Nelson, English teachers at Warren County High School, share their frustration with the transition to TNReady and what it means for teacher evaluation.

Connie Kirby:

This is going to be long, but I don’t usually take to social media to “air my grievances.” Today I feel like there’s no better answer than to share how I feel. It’s been a long year with some of the highest of the highs and lowest of the lows. I work in a wonderful department at a great school with some of the most intelligent, hard-working people I know. As the years have progressed, we have gone through many changes together and supported each other through the good and the bad (personally and professionally). We do our best to “comply” with the demands that the state has put on us, but this year everything that we’ve been hearing about and preparing for for years has come to fruition. We’re finally getting familiar with the “real deal” test, instead of dealing with EOCs and wondering how it’s going to change. I’ve seen the posts and rants about Common Core and have refrained from jumping on the bandwagon because I have had no issues with the new standards. I do, however, see an issue with the new assessment, so I have held my hand in the hopes that I might find something worth sharing and putting my name next to. Today, I witnessed an exchange between one of my colleagues and the state, and I couldn’t have said it better myself. With her permission, I am sharing her words.

Carol Bomar-Nelson:

I don’t know how to fix the problems with the test. I agree that teachers should have accountability, and I think student test scores are one way of doing that. Having said that, if the state is going to hold teachers accountable for student test scores, then the test needs to be fair. From what I have seen, I firmly believe that is not the case. I am not just basing this conclusion on the one “Informational Test” in MICA. Other quizzes I have generated in MICA have had similar flaws. When my department and I design common assessments in our PLCs, we all take the tests and compare answers to see which questions are perhaps ambiguous or fallacious in some way. I do not see any evidence that the state is doing this for the tests that it is manufacturing. A team of people can make a test that is perfect with respect to having good distractors, clear wording, complex passages, and all the other components that make up a “good” test, but until several people take the test, compare answers, and discuss what they missed, that test is not ready for students to take, especially not on a high-stakes test that is supposed to measure teacher effectiveness. I understand that this is the first year of this test. I am sympathetic to the fact that everyone is going through a ‘learning process’ as they adapt to the new test. Students have to learn how to use the technology; teachers have to learn how to prepare their students for a new type of test; administrators have to figure out how to administer the test; the state has to work out the kinks in the test itself… The state is asking everyone to be “patient” with the new system. But what about for the teachers? Yes, the teacher effectiveness data only counts for 10% this year, but that 10% still represents how I am as a teacher. In essence, this new test is like a pretest, correct? A pretest to get a benchmark about where students stand at the end of the year with this new test that has so many flaws and so many unknowns.
In the teaching profession, I think all would agree that it is bad practice to count a pretest AT ALL for a student’s grade. Not 35%, not 25%, not even 10%. So how is it acceptable practice to count a flawed test for 10% of a teacher’s evaluation? We can quibble all day about which practice questions…are good and which questions are flawed, but that will not fix the problem. The problem lies in the test development process. If the practice questions go through the same process as the real questions, it would stand to reason that the real test questions are just as flawed as the practice questions. My students have to take that test; I never get to see it to determine if it is a fair test or not, and yet it still counts as 10% of my evaluation that shows my effectiveness as a teacher. How is that fair in any way whatsoever? In what other profession are people evaluated on something that they never get to see? Especially when that evaluation ‘tool’ is new and not ready for use?

I know how to select complex texts. I know how to collaborate with my PLC. I can teach my students how to read, think critically, analyze, and write. When I do not know how to do something, I have no problem asking other teachers or administrators for suggestions, advice, and help. I am managing all of the things that are in my control to give my students the best possible education. Yet in the midst of all of these things, my teacher accountability is coming from a test that is generated by people who have no one holding them accountable. And at the end of the year, when those scores come back to me, I have no way to see the test to analyze its validity and object if it is flawed.

For more on education politics and policy in Tennessee, follow @TNEdReport

A Little Less Bad

From a story in Chalkbeat:

Tennessee’s teacher evaluation system is more accurate than ever in measuring teacher quality…

That’s the conclusion drawn from a report on the state’s teacher evaluation system conducted by the State Department of Education.

The idea is that the system is improving.

Here’s the evidence the report uses to justify the claim of an improving evaluation system:

1) Teacher observation scores now more closely align with teacher TVAAS scores — TVAAS is the value-added modeling system used to determine a teacher’s impact on student growth

2) More teachers in untested subjects are now being evaluated using the portfolio system rather than TVAAS data from students they never taught

On the second item, I’d note that previously, 3 districts were using a portfolio model and now 11 districts use it. This model allows related-arts teachers and those in other untested subjects to present a portfolio of student work to demonstrate that teacher’s impact on growth. The model is generally applauded by teachers who have a chance to use it.

However, there are 141 districts in Tennessee and 11 use this model. Part of the reason is the time it takes to assess portfolios well and another reason is the cost associated with having trained evaluators assess the portfolios. Since the state has not (yet) provided funding for the use of portfolios, it’s no surprise more districts haven’t adopted the model. If the state wants the evaluation model to really improve (and thereby improve teaching practice), they should support districts in their efforts to provide meaningful evaluation to teachers.

A portfolio system could work well for all teachers, by the way. The state could move to a system of project-based learning and thus provide a rich source of material for both evaluating student mastery of concepts AND teacher ability to impact student learning.

On to the issue of TVAAS and observation alignment. Here’s what the report noted:

Among the findings, state education leaders are touting the higher correlation between a teacher’s value-added score (TVAAS), which estimates how much teachers contribute to students’ growth on statewide assessments, and observation scores conducted primarily by administrators.

First, the purpose of using multiple measures of teacher performance is not to find perfect alignment, or even strong correlation, but to utilize multiple inputs to assess performance. Pushing for alignment suggests that the department is actually looking for a way to make TVAAS the central input driving teacher evaluation.

Advocates of this approach will suggest that student growth can be determined accurately by TVAAS and that TVAAS is a reliable predictor of teacher performance.

I would suggest that TVAAS, like most value-added models, is not a significant differentiator of teacher performance. I’ve written before about the need for caution when using value-added data to evaluate teachers.

More recently, I wrote about the problems inherent in attempting to assign growth scores when shifting to a new testing regime, as Tennessee will do next year when it moves from TCAP to TNReady. In short, it’s not possible to assign valid growth scores when comparing two entirely different tests.  Researchers at RAND noted:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
These findings align with similar findings by Martineau (2006) and Schmidt et al (2005)
You get different results depending on the type of question you’re measuring.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured. 

And they concluded:

Our results provide a clear example that caution is needed when interpreting estimated teacher effects because there is the potential for teacher performance to depend on the skills that are measured by the achievement tests.

So, even if you buy the idea that TVAAS is a significant differentiator of teacher performance, any conclusions drawn from next year’s TNReady-based growth scores simply will not be reliable.

The state is touting improvement in a flawed system that may now be a little less bad. And because they insist on estimating growth from two different tests with differing methodologies, the growth estimates in 2016 will be unreliable at best. If they wanted to improve the system, they would take two to three years to build growth data based on TNReady. That would mean two to three years of NO TVAAS data in teacher evaluation.

Alternatively, the state could move to a system of project-based learning and teacher evaluation and professional development based on a Peer Assistance and Review Model. Such an approach would be both student-centered and result in giving teachers the professional respect they deserve. It also carries a price tag — but our students are worth doing the work of both reallocating existing education dollars and finding new ways to invest in our schools.

Validating the Invalid?

The Tennessee House of Representatives passed legislation today (HB 108) that makes changes to current practice in teacher evaluation as Tennessee transitions to its new testing regime, TNReady.

The changes adjust the percentage of a teacher’s evaluation that is dependent on TVAAS scores to 10% next year, 20% the following year, and back to the current 35% by the 2017-18 academic year.
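
To make the phase-in concrete, here is a minimal sketch of the arithmetic. It assumes (this is not taken from the legislation) that the overall evaluation is a simple weighted average of a TVAAS-based growth score and one combined score for everything else, both on a 1-to-5 scale. The function names, the scale, and the example scores are hypothetical; only the 10%, 20%, and 35% weights come from the plan described above.

```python
# Hypothetical sketch: how much a TVAAS-based growth score can move an
# overall evaluation under each weight in the phase-in schedule.
# Assumption (not from HB 108): the composite is a simple weighted average
# of a growth score and one combined score for all other measures.

# Weights from the plan described above; the labels are shorthand.
PHASE_IN_WEIGHTS = {
    "next year": 0.10,
    "following year": 0.20,
    "2017-18": 0.35,
}


def composite_score(growth_score: float, other_score: float,
                    growth_weight: float) -> float:
    """Return the weighted average of the growth score and the other measures."""
    if not 0.0 <= growth_weight <= 1.0:
        raise ValueError("growth_weight must be between 0 and 1")
    return growth_weight * growth_score + (1.0 - growth_weight) * other_score


def growth_swing(growth_weight: float, low: float = 1.0, high: float = 5.0,
                 other_score: float = 4.0) -> float:
    """Return how far the composite moves when the growth score goes from low to high."""
    best = composite_score(high, other_score, growth_weight)
    worst = composite_score(low, other_score, growth_weight)
    return best - worst


if __name__ == "__main__":
    for label, weight in PHASE_IN_WEIGHTS.items():
        print(f"{label}: the growth score can move the composite by "
              f"{growth_swing(weight):.2f} points")
```

Under these assumptions the swing is just the weight times the four-point range of the scale: roughly 0.4, 0.8, and 1.4 points. Even at 10%, a growth score that carries no real information can still shift the final number.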

This plan is designed to allow for a transition period to the new TNReady tests which will include constructed-response questions and be aligned to the so-called Tennessee standards which match up with the Common Core State Standards.

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format. It’s both computer-based and it contains constructed-response questions. That is, students must write out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

Clearly, legislators feel like at the very least, this is an improvement. A reasonable accommodation to teachers as our state makes a transition.

But, how is using 10% of an invalid number a good thing? Should any part of a teacher’s evaluation be made up of a number that reveals nothing at all about that teacher’s performance?

While value-added data alone is a relatively poor predictor of teacher performance, the value-added estimate used next year is especially poor because it is not at all valid.

But, don’t just take my word for it. Researchers studying the validity of value-added measures asked whether value-added gave different results depending on the type of question asked. That is particularly relevant now because Tennessee is shifting to a new test with different types of questions.

Here’s what Lockwood and McCaffrey (2007) had to say in the Journal of Educational Measurement:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
These findings align with similar findings by Martineau (2006) and Schmidt et al (2005)
You get different results depending on the type of question you’re measuring.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured. 

And they concluded:

Our results provide a clear example that caution is needed when interpreting estimated teacher effects because there is the potential for teacher performance to depend on the skills that are measured by the achievement tests.

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

Or, if the state is determined to use growth scores (and wants to use them with accuracy), they will wait several years and build completely new growth models based on TNReady alone. At least three years of data would be needed in order to build such a model.

It seems likely that the Senate will follow the House’s lead on Monday and overwhelmingly support the proposed evaluation changes. But in doing so, they should be asking themselves if it’s really ok to base any part of a teacher’s evaluation on numbers that reliably predict nothing.

More on Value-Added:

Real World Harms of Value-Added Data

Struggles with Value-Added Data

Why TN Teachers Didn’t Like Kevin Huffman

Kevin Huffman announced yesterday he’s leaving his post as Commissioner of Education. The news was met positively by many teachers around the state. But, why didn’t Tennessee teachers care for Kevin Huffman? Why did a number of local teacher associations vote “no confidence” in Huffman in 2013? Why did Directors from across the state sign a letter telling the Governor that Huffman needed to do a better job?

I wrote a post for a different blog back in 2011, Huffman’s first year, about his remarks on teacher evaluation. In short, he got off to a bad start in terms of communicating with and about teachers, and never recovered.

Here’s that post from 2011 in its entirety, with some notes about what has happened since then included:

Tennessee’s Commissioner of Education, Kevin Huffman, offered his thoughts today on the state’s new evaluation system for teachers which takes effect this year.

While I certainly agree that the evaluation system needed significant improvement, I have some concerns about the Commissioner’s statements.

Specifically, he notes:

Tennessee is now a few weeks into a new era of evaluation. The new system is strong, though not perfect, and it represents a dramatic leap forward over the past system that told nearly all teachers they had succeeded, even when students had failed.

This statement assumes that the poor performance of Tennessee students on the National Assessment of Educational Progress (NAEP) was solely or primarily the result of bad teachers. By his calculations, since 70 percent of students failed to meet satisfactory progress on the NAEP, 70 percent of Tennessee teachers must not be performing up to par.

What’s missing from his analysis, however, is the reality that until 2010, Tennessee had incredibly low standards relative to the NAEP. In fact, nearly 87% of students were deemed proficient on TCAPs despite only 27% testing proficient on the NAEP. Here’s the deal: Tennessee schools were held accountable under NCLB for hitting TCAP benchmarks. Tennessee policymakers set the standard. And Tennessee teachers were hitting the mark they were told was important. In fact, data suggest more and more Tennessee students were marching toward TCAP proficiency each year. By that indicator, Tennessee teachers were doing a fine job. Policymakers set a target, and Tennessee teachers hit it year after year. Since curriculum and accountability were not tied to NAEP, it seems unreasonable to expect that teachers would be helping students hit NAEP benchmarks.

Huffman’s remarks also ignore this reality: Tennessee spends less per student than most of our neighboring states. Eight states test 100% of graduates on the ACT. Tennessee ranks 7th in that group, below every other state that spends MORE per pupil than Tennessee. Kentucky spends about $1500 more per student than Tennessee and gets significantly better results on the NAEP year after year. The point being: teachers can only do so much with limited resources and our state has done a pretty good job of limiting the resources.

Huffman also notes:

As new student assessments are developed and vetted by Tennessee educators and experts, we expect that next year, it will be possible for 70 percent of teachers to be evaluated by their own student-assessment results. Eventually, more than 90 percent of teachers will have such options.

This dream still hasn’t been realized — Portfolios are available for some non-tested subjects, but are not in wide use due to cost.

So more teachers will have their own value-added data. This means more assessments (TESTS) for Tennessee students. Will there now be TCAP-like tests in grades K-2? As the parent of a Kindergartener, I certainly hope not. What about related arts? Will there be a written test for an instrumental music course? Or is the value-added that a student who previously struggled with the flute now excels? How is that measured? In performance-based art, music, and theatre classes, will more time be spent drilling on concepts so a kid can pass a written test rather than on actually improving one’s ability to draw, sing, or perform?

Finally, the new evaluations are time-intensive and do provide regular feedback. That’s a good thing. However, there’s no indication of available funding for meaningful professional development tied to the evaluations. There is yet to be a serious discussion of funding for mentors for early career teachers to help them get up to speed on key concepts and improve their technique. Teach for America (where Huffman worked as a teacher and then as a national organizational leader) relies heavily on intensive support for their Corps members. Lessons are videotaped, coaches are provided, feedback is regular and strategies for improvement are offered. Research suggests that intensive mentoring in the first two years of a teacher’s career not only improves their practice and increases retention, but also results in higher student achievement.

Tennessee’s new evaluation system for teachers is no doubt an improvement. But unless that system is coupled with meaningful support for teachers and adequate classroom resources, we’ll still find ourselves far behind the rest of the country.

There’s been no significant commitment to professional development or intensive mentoring by the state. Teachers didn’t get a promised raise this year.

So, Tennessee teachers started off hearing from Huffman that they had failed. Then, resources for support didn’t materialize and the transition to Common Core wasn’t well-communicated. Huffman suggested the same flawed, value-added based evaluations were responsible for a 2013 NAEP boost, and then a promised pay raise was taken away.

Is it any wonder Tennessee teachers aren’t too sad to see Huffman go?