Flexible Validity

Commissioner of Education Candice McQueen today provided additional information on how teacher evaluations will be handled, given the flexibility the department is granting educators in light of TNReady troubles.

First, the email from McQueen, then some thoughts:

Dear educators,

Thank you for all of your thoughtful questions in response to Gov. Haslam’s proposal to create evaluation flexibility during our transition to TNReady. Last month, we shared an overview of the governor’s proposal (here). Earlier this week, the legislation began moving through the legislative process, so I’m writing to share more detailed information regarding the proposal, specifically how it is designed to create evaluation flexibility for you.

The department has developed an FAQ document on Evaluation Flexibility for Teachers (here) which provides detailed information regarding how this flexibility will affect teachers in different subjects and grades. I encourage you to closely read this document to learn how the flexibility applies to your unique situation.

Meanwhile, I wanted to share a few highlights. The governor’s proposal would provide you the option to include or not include results from the 2015-16 TNReady and TCAP tests within the student growth component of your evaluation, depending on which scenario benefits you the most. In other words, if student growth scores from this year help you earn a higher evaluation score, they will be used. If they do not help you earn a higher score, they will not be used. The option that helps your score the most will automatically be incorporated into your evaluation. This applies to all grades and subjects, including science and social studies.

Because Tennessee teachers will meet over this spring and summer to establish scoring guidelines and cut scores for the new assessment, achievement scores will not be available until the fall. TVAAS scores, however, will be available this summer because cut scores for proficiency levels are not required to calculate growth scores.

You can follow the progress of the governor’s proposal as it moves through the legislative process at the Tennessee General Assembly website (here). If you have additional questions about how this may apply to you, please contact TEAM.Questions@tn.gov.

We hope this evaluation flexibility eases concerns as we transition to a new, more rigorous assessment that is fully aligned to our Tennessee Academic Standards, as well as navigate the challenge of moving to a paper-based test this year. Thank you for your ongoing commitment to Tennessee students, as well as your continued flexibility as we transition to an assessment that will provide us with better information about our students’ progress on the path to college and career readiness.

My thoughts:

While flexibility is good, and the TVAAS waiver is needed, this sentence is troubling:

TVAAS scores, however, will be available this summer because cut scores for proficiency levels are not required to calculate growth scores.

The plan is to allow teachers to include TNReady TVAAS scores if they improve the teacher’s overall 1-5 TEAM rating. That’s all well and good, except that there can be no valid TVAAS score generated from this year’s TNReady data. This fact seems to have escaped the data gurus at the Department of Education.
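The "whichever helps you most" rule from the commissioner's email can be sketched in a few lines. This is only an illustration: the function name and the example scores are hypothetical, and the email does not describe how the department actually computes each candidate score.

```python
def final_evaluation_score(score_with_growth, score_without_growth):
    """Return whichever candidate evaluation score is higher.

    Both arguments are hypothetical inputs: the overall evaluation score
    computed with 2015-16 growth data included, and the same score
    computed with that data left out.
    """
    return max(score_with_growth, score_without_growth)


# Hypothetical example: this year's growth data would lower the score,
# so the version computed without it is the one that counts.
print(final_evaluation_score(3.2, 3.8))
```

Note what the rule does and does not do: it keeps a score from being lowered by this year's data, but it says nothing about whether the growth number being compared is valid in the first place.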

Here’s what I wrote after analyzing studies of value-added data and teacher performance when using different types of assessments:

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

This year’s TNReady-based TVAAS scores will be invalid. So will next year’s, for that matter. There’s not enough comparative data to make a predictive inference regarding past TCAP performance as it relates to current TNReady performance. In other words, it’s like comparing apples to oranges. Or, pulling a number out of your ass.

IT’S WRONG!

But, there’s also the fact that in states with both paper-based and online testing, students score significantly higher on the paper tests. No one is talking about how this year’s mixed approach (some 20,000 students completed a portion of the test online on day one) will impact any supposed TVAAS number.

How about we simply don’t count test scores in teacher evaluations at all this year? Or for the next three years? We don’t even have a valid administration of TNReady – there have been errors, delays, and there still are graders hired from Craigslist.

Let’s take a step back and get it right, even if that means not counting TNReady at all this year: not for teachers, not for students, not for schools or districts. If this 11-hour test is really the best thing since sliced bread, let’s take the time to get it right. Or, here’s an idea: let’s stop TNReady for this year and allow students and teachers to go about the business of teaching and learning.

As Flexible as a Brick Wall

Grace Tatter reports that officials at the Tennessee Department of Education are “perplexed” by concerns over using TNReady data in this year’s teacher evaluations.

While a number of districts have passed resolutions asking for a waiver from including TVAAS scores in this year’s teacher evaluations due to the transition to TNReady, a department spokesperson said:

“Districts have complete discretion to choose how they want to factor that data,” Ball said Thursday. “They don’t have to use TNReady or growth data in hiring, firing, retention or promotion.”

As Tatter’s story notes, however, data from TNReady will still be a part of a teacher’s TVAAS score — 10%. And that score becomes a part of a teacher’s overall evaluation score — a ranking from 1-5 that purports to measure a teacher’s relative effectiveness.

10% is enough to move a ranking up or down a number, and that can have significant impacts on a teacher’s career, even if they are not fired and their pay is not impacted. Of course, some districts may use this year’s data for those purposes, since it is not prohibited under the evaluation changes passed last year.

Dan Lawson outlines some of the impact faced by teachers based on that final number:

The statutorily revised “new tenure” requires five years of service (probationary period) as well as an overall score of “4” or “5” for two consecutive years preceding the recommendation to the Board of Education. Last year, no social studies assessment score was provided since it was field tested and the teacher was compelled to select a school wide measure of growth. He chose POORLY and his observation score of a “4.38” paired with a school wide growth score in the selected area of a “2” producing a sum teacher score of “3” thereby making him ineligible for tenure nomination.

According to TCA 49-5-503, a teacher may not be awarded tenure unless she achieves a TEAM score of 4 or 5 in two consecutive years immediately prior to being tenure eligible. That means a TVAAS score that takes a teacher from a 4 to a 3 would render her ineligible.

Further, a tenured teacher who receives a TEAM score of a 1 or 2 in two consecutive years is returned to probationary status (TCA 49-5-504). So, that tenured teacher who was a 2 last year could be impacted by a TNReady-based TVAAS score that moves a TEAM score of a 3 down to a 2.
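As a rough sketch of the two rules described above (a paraphrase of this post's summary, not the statute's text), the checks might look like this. The function names and the example score histories are hypothetical, and the five-year service requirement comes from the tenure description quoted earlier.

```python
def eligible_for_tenure(years_of_service, team_scores):
    """Tenure check as summarized in the post: five years of service and
    a TEAM score of 4 or 5 in each of the two most recent years."""
    if years_of_service < 5 or len(team_scores) < 2:
        return False
    return all(score >= 4 for score in team_scores[-2:])


def returned_to_probation(team_scores):
    """A tenured teacher with a TEAM score of 1 or 2 in two consecutive
    years (here, the two most recent) returns to probationary status."""
    if len(team_scores) < 2:
        return False
    return all(score <= 2 for score in team_scores[-2:])


# Hypothetical histories: a one-point drop changes the outcome.
print(eligible_for_tenure(5, [4, 4]))   # True
print(eligible_for_tenure(5, [4, 3]))   # False
print(returned_to_probation([2, 3]))    # False
print(returned_to_probation([2, 2]))    # True
```

In both cases a single point on the 1-5 scale is the difference between outcomes, which is why even a 10% component matters.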

Districts don’t have “complete discretion” to waive state law as TNDOE spokesperson Ashley Ball seems to imply.

Further, basing any part of a teacher’s evaluation on TVAAS scores based on TNReady creates problems with validity. Why include a number in a teacher’s evaluation that is fundamentally invalid?

Teachers want an evaluation process that is fair and transparent. There’s nothing perplexing about that.

For more on education politics and policy in Tennessee, follow @TNEdReport

New and Not Ready

Connie Kirby and Carol Bomar-Nelson, English teachers at Warren County High School, share their frustration with the transition to TNReady and what it means for teacher evaluation.

Connie Kirby:

This is going to be long, but I don’t usually take to social media to “air my grievances.” Today I feel like there’s no better answer than to share how I feel. It’s been a long year with some of the highest of the highs and lowest of the lows. I work in a wonderful department at a great school with some of the most intelligent, hard-working people I know. As the years have progressed, we have gone through many changes together and supported each other through the good and the bad (personally and professionally). We do our best to “comply” with the demands that the state has put on us, but this year everything that we’ve been hearing about and preparing for for years has come to fruition. We’re finally getting familiar with the “real deal” test, instead of dealing with EOCs and wondering how it’s going to change. I’ve seen the posts and rants about Common Core and have refrained from jumping on the bandwagon because I have had no issues with the new standards. I do, however, see an issue with the new assessment, so I have held my hand in the hopes that I might find something worth sharing and putting my name next to. Today, I witnessed an exchange between one of my colleagues and the state, and I couldn’t have said it better myself. With her permission, I am sharing her words.

Carol Bomar-Nelson:

I don’t know how to fix the problems with the test. I agree that teachers should have accountability, and I think student test scores are one way of doing that. Having said that, if the state is going to hold teachers accountable for student test scores, then the test needs to be fair. From what I have seen, I firmly believe that is not the case. I am not just basing this conclusion on the one “Informational Test” in MICA. Other quizzes I have generated in MICA have had similar flaws. When my department and I design common assessments in our PLCs, we all take the tests and compare answers to see which questions are perhaps ambiguous or fallacious in some way. I do not see any evidence that the state is doing this for the tests that it is manufacturing. A team of people can make a test that is perfect with respect to having good distractors, clear wording, complex passages, and all the other components that make up a “good” test, but until several people take the test, compare answers, and discuss what they missed, that test is not ready for students to take–especially not on a high stakes test that is supposed to measure teacher effectiveness.

I understand that this is the first year of this test. I am sympathetic to the fact that everyone is going through a ‘learning process’ as they adapt to the new test. Students have to learn how to use the technology; teachers have to learn how to prepare their students for a new type of test; administrators have to figure out how to administer the test; the state has to work out the kinks in the test itself…The state is asking everyone to be “patient” with the new system. But what about for the teachers? Yes, the teacher effectiveness data only counts for 10% this year, but that 10% still represents how I am as a teacher. In essence, this new test is like a pretest, correct? A pretest to get a benchmark about where students stand at the end of the year with this new test that has so many flaws and so many unknowns.

In the teaching profession, I think all would agree that it is bad practice to count a pretest AT ALL for a student’s grade. Not 35%, not 25%, not even 10%. So how is it acceptable practice to count a flawed test for 10% of a teacher’s evaluation? We can quibble all day about which practice questions…are good and which questions are flawed, but that will not fix the problem. The problem lies in the test development process. If the practice questions go through the same process as the real questions, it would stand to reason that the real test questions are just as flawed as the practice questions. My students have to take that test; I never get to see it to determine if it is a fair test or not, and yet it still counts as 10% of my evaluation that shows my effectiveness as a teacher. How is that fair in any way whatsoever? In what other profession are people evaluated on something that they never get to see? Especially when that evaluation ‘tool’ is new and not ready for use?

I know how to select complex texts. I know how to collaborate with my PLC. I can teach my students how to read, think critically, analyze, and write. When I do not know how to do something, I have no problem asking other teachers or administrators for suggestions, advice, and help. I am managing all of the things that are in my control to give my students the best possible education. Yet in the midst of all of these things, my teacher accountability is coming from a test that is generated by people who have no one holding them accountable. And at the end of the year, when those scores come back to me, I have no way to see the test to analyze its validity and object if it is flawed.

Not Yet TNReady?

As students and teachers prepare for this year’s standardized tests, there is more anxiety than usual due to the switch to the new TNReady testing regime. That is according to a story in the Tennessean by Jason Gonzalez.

Teachers ask for “grace”

In his story, Gonzalez notes:

While teachers and students work through first-year struggles, teachers said the state will need to be understanding. At the Governor’s Teacher Cabinet meeting Thursday in Nashville, 18 educators from throughout the state told Gov. Bill Haslam and McQueen there needs to be “grace” over this year’s test.

The state has warned this year’s test scores will likely dip as it switches to a new baseline measure. TCAP scores can’t be easily compared to TNReady scores.

Despite the fact that the scores “can’t be easily compared,” the state will still use them in teacher evaluations. At the same time, the state is allowing districts to waive the requirement that the scores count toward student grades, as the TCAP and End of Course tests have in the past.

In this era of accountability, it seems odd that students would be relieved of accountability while teachers will still be held accountable.

While that may be one source of anxiety, another is that by using TNReady in the state’s TVAAS formula, the state is introducing a highly suspect means of evaluating teachers. It is, in fact, a statistically invalid approach.

As I noted back in March, citing an article from the Journal of Educational Measurement:

These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

That means that the shift to TNReady will change the way TVAAS estimates teacher effect. How? No one knows. We can’t know. We can’t know because the test hasn’t been administered and so we don’t have any results. Without results, we can’t compare TNReady to TCAP. And, even once we have this year’s results, we can’t fairly establish a pattern — because we will only have one year of data. What if this year’s results are an anomaly? With three or more years of results, we MAY be able to make some estimates as to how TCAP compares to TNReady and then possibly correlate those findings into teacher effect estimates. But, we could just end up compounding error rates.

Nevertheless, the state will count the TNReady results on this year’s teacher evaluations using a flawed TVAAS formula. And the percentage these results will count will grow in subsequent years, even if the confidence we have in the estimate does not. Meanwhile, students are given a reprieve…some “grace” if you will.

I’d say that’s likely to induce some anxiety.
