A Lot of Words

The Murfreesboro City School Board has already expressed concern about the state’s TNReady tests and the delay in receiving results.

More recently, Board members expressed frustration with the response they received from Education Commissioner Candice McQueen.

The Murfreesboro Post reports:

“I felt like it was a lot of words for not really answering our questions,” said Board Member Jared Barrett. He referred to the response as having “excuses” and “dodging the question.”

“My first response when I read this letter was that there’s something in here that doesn’t add up,” said Board Member Phil King. “My fear is they haven’t solved the problem of getting the paper tests in our hands in a timely manner.”

King suggested moving away from using TNReady in teacher evaluations until the state can prove it can get results back to districts in a timely manner.

The Murfreesboro School Board meeting happened before the most recent round of TNReady troubles, with some students receiving incorrect scores and some teachers not having students properly counted in their TVAAS scores.

In response to those issues, House Speaker Beth Harwell has called for hearings on the issue of state testing.

Additionally, yesterday, the United Education Association of Shelby County called for TNReady scores for this year to be invalidated and for a moratorium on including TNReady scores in accountability measures until 2021.

For more on education politics and policy in Tennessee, follow @TNEdReport

Dear Educator

The Tennessee Department of Education explains the case of the missing students as some 900 teachers see their TVAAS scores recalculated.

Here’s the email those educators were sent:

Dear Educator,

We wanted to share an update with you regarding your individual TVAAS data.

The department has processed about 1.5 million records to generate individual TVAAS scores for nearly 19,000 educators based on the assessment results from over 1.9 million student tests in grades 2-8 and high school. During the review process with districts, we found that a small number of educators did not have all of their teacher-student claiming linkage records fully processed in data files released in early September. All linkage data that was captured in EdTools directly was fully incorporated as expected. However, due to a coding error in their software, our data processing vendor, RANDA Solutions, did not fully apply the linkage information that districts provided in supplemental Excel files over the summer. As a result, we are working with Randa to ensure that this additional data is included in final TVAAS processing.

 

You have been identified as an educator with some linkage data submitted via an Excel file that was not fully processed. This means after our statistical analysis vendor, SAS, receives these additional linkage records your score may be revised to reflect all the students you identified in the teacher-student claiming process. Only students marked “F” for instructional availability are used when calculating individual TVAAS data. Based on our records, there will be [X] additional students marked “F” for instructional availability linked to you when the additional data is incorporated.

 

Your district’s and school’s TVAAS scores are not affected by this situation given that all students are included in these metrics, regardless of which teacher is linked to them, so no other part of your evaluation composite would change. Moreover, only those teachers with this additional linkage data in Excel files are impacted, so the vast majority of your colleagues across the state have their final individual TVAAS composites, which are inclusive of all student data.

 

We expect to share your final growth score and overall level of effectiveness later this year. While we do not have more specific timing to share right now, we are expediting this process with our vendors to get you accurate feedback. We will follow-up with more detailed information in the next couple of weeks. Also, as announced to districts earlier this month, the department and your districts will be using new systems and processes this year that will ensure that this type of oversight does not happen again.

 

Thank you for your patience as we work to share complete and accurate feedback for you. We deeply value each Tennessee educator and apologize for this delay in providing your final TVAAS results. Please contact our office via the email address below if you have any questions.

 

Respectfully,

 

Office of Assessment Logistics

Tennessee Department of Education

A few things stand out about this communication:

  1. Tennessee continues to experience challenges with the rollout of TNReady. That’s to be expected, but it raises the question: Why are we rushing this? Why not take some time, hit pause, and get this right?
  2. The Department says, “Thank you for your patience as we work to share complete and accurate feedback for you.” If accurate feedback were important, the state would take the time to build a value-added data set based on TNReady. This would take three to five years, but it would improve the accuracy of the information provided to educators. As it stands, the state is comparing apples to oranges and generating value-added scores of little real value.
  3. On the topic of value-added data generally, it is important to note that even with a complete data set, TVAAS data is of limited value in terms of evaluating teacher effectiveness. A recent federal lawsuit settlement in Houston ended the use of value-added data for teacher evaluation there. Additionally, a judge in New York ruled the use of value-added data in teacher evaluation was “arbitrary and capricious.”
  4. When will teachers have access to this less-than-accurate data? Here’s what the TDOE says: “We expect to share your final growth score and overall level of effectiveness later this year. While we do not have more specific timing to share right now, we are expediting this process with our vendors to get you accurate feedback.” Maybe they aren’t setting a clear deadline because they have a track record of missing deadlines?
  5. It’s amazing to me that a teacher’s “overall level of effectiveness” can only be determined once TVAAS data is included in their evaluation score. It’s as if there’s no other way to determine an overall level of a teacher’s effectiveness. Not through principal observation. Not through analysis of data points on student progress taken throughout the year. Not through robust peer-evaluation systems.
  6. Let’s assume for a moment that the “level of effectiveness” indicator is useful for teacher development. Providing that score “later” is not exactly helpful. Ideally, actionable insight would be provided to a teacher and his/her administrators near the end of a school year. This would allow for targeted professional development to address areas that need improvement. Of course, this assumes targeted PD is even available.
  7. Accountability. This is the latest in a series of mishaps related to the new testing regimen known as TNReady. Teachers are held accountable through their evaluation scores, and in some districts, their pay is tied to those scores. Schools and districts are held accountable for growth and achievement scores and must develop School Improvement Plans to target areas of weakness. On the other hand, the Department of Education continues to make mistakes in the TNReady transition and no one is held accountable.

The email to impacted teachers goes to great lengths to establish the enormous scope of the TNReady transition. Lots of tests, lots of students, not too many mistakes. If this were the only error so far in the TNReady process, all could be forgiven. Instead, it is the latest in a long line of bumps. Perhaps it will all smooth out in time, which only makes the case for hitting pause all the stronger.

Muddy Waters

Laura Faith Kebede of Chalkbeat reports on the challenges in generating reliable TVAAS scores as a result of TNReady trouble last year. Her story cites a statistician from the Center for Assessment who explains the issue this way:

Damian Betebenner, a senior associate at Center for Assessment that regularly consults with state departments, said missing data on top of a testing transition “muddies the water” on results.

“When you look at growth over two years, so how much the student grew from third to fifth grade, then it’s probably going to be a meaningful quantity,” he said. “But to then assert that it isolates the school contribution becomes a pretty tenuous assertion… It adds another thing that’s changing underneath the scene.”

In other words, it’s difficult to get a meaningful result given the current state of testing in Tennessee. I wrote recently about this very issue and the problem with the validity of the growth scores this year.

Additionally, two years ago, I pointed out the challenges the state would face when shifting to a new test. Keep in mind, this was before all the TNReady trouble that further muddied the waters. Here’s what I said in March of 2015:

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format. It is both computer-based and built around constructed-response questions. That is, students must write out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

The way to address this issue? Build multiple years of data in order to obtain reliable results:

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

So, now we have two challenges: We have two different types of tests AND we have a missing year of data. Either one of these challenges creates statistical problems. The combination of the two calls for a serious reset of the state’s approach to accountability.

As I suggested yesterday, taking the time to get this right would mean not using the TNReady data for accountability for teachers, students, or schools until 2019 at the earliest. If our state is committed to TNReady, we should be committed to getting it right. We’re spending a lot of money on both TNReady and on TVAAS. If we’re going to invest in these approaches, we should also take the time to be sure that investment yields useful, reliable information.

Why does any of this matter? Because, as Kebede points out:

At the same time, TVAAS scores for struggling schools will be a significant factor to determine which improvement tracks they will be placed on under the state’s new accountability system as outlined in its plan to comply with the federal Every Student Succeeds Act. For some schools, their TVAAS score will be the difference between continuing under a local intervention model or being eligible to enter the state-run Achievement School District. The school growth scores will also determine which charter schools are eligible for a new pot of state money for facilities.

TVAAS scores also count in teacher evaluations. TNReady scores were expected to count in student grades until the quick scores weren’t back in time. If all goes well with the online administration of TNReady this year, the scores will count for students.

The state says TNReady matters. The state evaluates schools based on TVAAS scores. The state teacher evaluation formula includes TVAAS scores for teachers and TNReady scores as one measure of achievement that can be selected.

In short: Getting this right matters.

Flexible Validity

Commissioner of Education Candice McQueen today provided additional information on how teacher evaluations will be handled under the flexibility the department is granting educators in light of TNReady troubles.

First, the email from McQueen, then some thoughts:

Dear educators,

Thank you for all of your thoughtful questions in response to Gov. Haslam’s proposal to create evaluation flexibility during our transition to TNReady. Last month, we shared an overview of the governor’s proposal (here). Earlier this week, the legislation began moving through the legislative process, so I’m writing to share more detailed information regarding the proposal, specifically how it is designed to create evaluation flexibility for you.

The department has developed an FAQ document on Evaluation Flexibility for Teachers (here) which provides detailed information regarding how this flexibility will affect teachers in different subjects and grades. I encourage you to closely read this document to learn how the flexibility applies to your unique situation.

Meanwhile, I wanted to share a few highlights. The governor’s proposal would provide you the option to include or not include results from the 2015-16 TNReady and TCAP tests within the student growth component of your evaluation, depending on which scenario benefits you the most. In other words, if student growth scores from this year help you earn a higher evaluation score, they will be used. If they do not help you earn a higher score, they will not be used. The option that helps your score the most will automatically be incorporated into your evaluation. This applies to all grades and subjects, including science and social studies.

Because Tennessee teachers will meet over this spring and summer to establish scoring guidelines and cut scores for the new assessment, achievement scores will not be available until the fall. TVAAS scores, however, will be available this summer because cut scores for proficiency levels are not required to calculate growth scores.

You can follow the progress of the governor’s proposal as it moves through the legislative process at the Tennessee General Assembly website (here). If you have additional questions about how this may apply to you, please contact TEAM.Questions@tn.gov.

We hope this evaluation flexibility eases concerns as we transition to a new, more rigorous assessment that is fully aligned to our Tennessee Academic Standards, as well as navigate the challenge of moving to a paper-based test this year. Thank you for your ongoing commitment to Tennessee students, as well as your continued flexibility as we transition to an assessment that will provide us with better information about our students’ progress on the path to college and career readiness.

My thoughts:

While flexibility is good, and the TVAAS waiver is needed, this sentence is troubling:

TVAAS scores, however, will be available this summer because cut scores for proficiency levels are not required to calculate growth scores.

The plan is to allow teachers to include TNReady TVAAS scores if they improve the teacher’s overall 1-5 TEAM rating. That’s all well and good, except that there can be no valid TVAAS score generated from this year’s TNReady data. This fact seems to have escaped the data gurus at the Department of Education.

Here’s what I wrote after analyzing studies of value-added data and teacher performance when using different types of assessments:

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

This year’s TNReady-based TVAAS scores will be invalid. So will next year’s, for that matter. There’s not enough comparative data to make a predictive inference regarding past TCAP performance as it relates to current TNReady performance. In other words, it’s like comparing apples to oranges. Or, pulling a number out of your ass.

IT’S WRONG!

But, there’s also the fact that in states with both paper-based and online testing, students score significantly higher on the paper tests. No one is talking about how this year’s mixed approach (some 20,000 students completed a portion of the test online on day one) will impact any supposed TVAAS number.

How about we simply don’t count test scores in teacher evaluations at all this year? Or for the next three years? We don’t even have a valid administration of TNReady – there have been errors and delays, and there are still graders hired from Craigslist.

Let’s take a step back and get it right, even if that means not counting TNReady at all this year: not for teachers, not for students, not for schools or districts. If this 11-hour test is really the best thing since sliced bread, let’s take the time to get it right. Or, here’s an idea, let’s stop TNReady for this year and allow students and teachers to go about the business of teaching and learning.

As Flexible as a Brick Wall

Grace Tatter reports that officials at the Tennessee Department of Education are “perplexed” by concerns over using TNReady data in this year’s teacher evaluations.

While a number of districts have passed resolutions asking for a waiver from including TVAAS scores in this year’s teacher evaluations due to the transition to TNReady, department spokesperson Ashley Ball said:

“Districts have complete discretion to choose how they want to factor that data,” Ball said Thursday. “They don’t have to use TNReady or growth data in hiring, firing, retention or promotion.”

As Tatter’s story notes, however, data from TNReady will still be a part of a teacher’s TVAAS score — 10%. And that score becomes a part of a teacher’s overall evaluation score — a ranking from 1-5 that purports to measure a teacher’s relative effectiveness.

10% is enough to move a ranking up or down a number, and that can have significant impacts on a teacher’s career, even if they are not fired and their pay is not impacted. Of course, some districts may use this year’s data for those purposes, since it is not prohibited under the evaluation changes passed last year.

Dan Lawson outlines some of the impact faced by teachers based on that final number:

The statutorily revised “new tenure” requires five years of service (probationary period) as well as an overall score of “4” or “5” for two consecutive years preceding the recommendation to the Board of Education. Last year, no social studies assessment score was provided since it was a field tested and the teacher was compelled to select a school wide measure of growth.  He chose POORLY and his observation score of a “4.38” paired with a school wide growth score in the selected area of a “2” producing a sum teacher score of “3” thereby making him ineligible for tenure nomination.

According to TCA 49-5-503, a teacher may not be awarded tenure unless she achieves a TEAM score of 4 or 5 in two consecutive years immediately prior to being tenure eligible. That means a TVAAS score that takes a teacher from a 4 to a 3 would render her ineligible.

Further, a tenured teacher who receives a TEAM score of a 1 or 2 in two consecutive years is returned to probationary status (TCA 49-5-504). So, that tenured teacher who was a 2 last year could be impacted by a TNReady-based TVAAS score that moves a TEAM score of a 3 down to a 2.
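
To make those two rules concrete, here is a minimal sketch of how they play out. It’s an illustration only, not the statutory text: the function names and the list-of-scores input are my own assumptions, and the five-year probationary period comes from Lawson’s description above.

```python
# Hypothetical helpers illustrating the tenure rules as described in this post.
# Not an official formula; names and input format are assumptions.

def tenure_eligible(years_of_service, team_scores):
    """team_scores: overall TEAM scores (1-5), oldest first."""
    # Assumes a five-year probationary period and at least two scored years.
    if years_of_service < 5 or len(team_scores) < 2:
        return False
    # Both of the two most recent scores must be a 4 or a 5.
    return all(score >= 4 for score in team_scores[-2:])


def returned_to_probation(team_scores):
    """True if a tenured teacher's two most recent scores are each a 1 or 2."""
    if len(team_scores) < 2:
        return False
    return all(score <= 2 for score in team_scores[-2:])
```

Under this sketch, a teacher with five years of service and scores of 4 then 3 is not eligible for tenure, and a tenured teacher who scored a 2 last year is returned to probationary status if this year’s score is also a 2, but not if it is a 3. A single point of movement in either direction decides the outcome.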

Districts don’t have “complete discretion” to waive state law as TNDOE spokesperson Ashley Ball seems to imply.

Further, basing any part of a teacher’s evaluation on TVAAS scores based on TNReady creates problems with validity. Why include a number in a teacher’s evaluation that is fundamentally invalid?

Teachers want an evaluation process that is fair and transparent. There’s nothing perplexing about that.

New and Not Ready

Connie Kirby and Carol Bomar-Nelson, English teachers at Warren County High School, share their frustration with the transition to TNReady and what it means for teacher evaluation.

Connie Kirby:

This is going to be long, but I don’t usually take to social media to “air my grievances.” Today I feel like there’s no better answer than to share how I feel. It’s been a long year with some of the highest of the highs and lowest of the lows. I work in a wonderful department at a great school with some of the most intelligent, hard-working people I know. As the years have progressed, we have gone through many changes together and supported each other through the good and the bad (personally and professionally). We do our best to “comply” with the demands that the state has put on us, but this year everything that we’ve been hearing about and preparing for for years has come to fruition. We’re finally getting familiar with the “real deal” test, instead of dealing with EOCs and wondering how it’s going to change. I’ve seen the posts and rants about Common Core and have refrained from jumping on the bandwagon because I have had no issues with the new standards. I do, however, see an issue with the new assessment, so I have held my hand in the hopes that I might find something worth sharing and putting my name next to. Today, I witnessed an exchange between one of my colleagues and the state, and I couldn’t have said it better myself. With her permission, I am sharing her words.

Carol Bomar-Nelson:

I don’t know how to fix the problems with the test. I agree that teachers should have accountability, and I think student test scores are one way of doing that. Having said that, if the state is going to hold teachers accountable for student test scores, then the test needs to be fair. From what I have seen, I firmly believe that is not the case. I am not just basing this conclusion on the one “Informational Test” in MICA. Other quizzes I have generated in MICA have had similar flaws. When my department and I design common assessments in our PLC’s, we all take the tests and compare answers to see which questions are perhaps ambiguous or fallacious in some way. I do not see any evidence that the state is doing this for the tests that it is manufacturing. A team of people can make a test that is perfect with respect to having good distractors, clear wording, complex passages, and all the other components that make up a “good” test, but until several people take the test, compare answers, and discuss what they missed, that test is not ready for students to take–especially not on a high stakes test that is supposed to measure teacher effectiveness. I understand that this is the first year of this test. I am sympathetic to the fact that everyone is going through a ‘learning process’ as they adapt to the new test. Students have to learn how to use the technology; teachers have to learn how to prepare their students for a new type of tests; administrators have to figure out how to administer the test; the state has to work out the kinks in the test itself…The state is asking everyone to be “patient” with the new system. But what about for the teachers? Yes, the teacher effectiveness data only counts for 10% this year, but that 10% still represents how I am as a teacher. In essence, this new tests is like a pretest, correct? A pretest to get a benchmark about where students stand at the end of the year with this new test that has so many flaws and so many unknowns. 
In the teaching profession, I think all would agree that it is bad practice to count a pretest AT ALL for a student’s grade. Not 35%, not 25%, not even 10%. So how is it acceptable practice to count a flawed test for 10% of a teacher’s evaluation? We can quibble all day about which practice questions…are good and which questions are flawed, but that will not fix the problem. The problem lies in the test development process. If the practice questions go through the same process as the real questions, it would stand to reason that the real test questions are just as flawed as the practice questions. My students have to take that test; I never get to see it to determine if it is a fair test or not, and yet it still counts as 10% of my evaluation that shows my effectiveness as a teacher. How is that fair in any way whatsoever? In what other profession are people evaluated on something that they never get to see? Especially when that evaluation ‘tool’ is new and not ready for use?

I know how to select complex texts. I know how to collaborate with my PLC. I can teach my students how to read, think critically, analyze, and write. When I do not know how to do something, I have no problem asking other teachers or administrators for suggestions, advice, and help. I am managing all of the things that are in my control to give my students the best possible education. Yet in the midst of all of these things, my teacher accountability is coming from a test that is generated by people who have no one holding them accountable. And at the end of the year, when those scores come back to me, I have no way to see the test to analyze its validity and object if it is flawed.

Not Yet TNReady?

As students and teachers prepare for this year’s standardized tests, there is more anxiety than usual due to the switch to the new TNReady testing regime, according to a story in the Tennessean by Jason Gonzalez.

Teachers ask for “grace”

In his story, Gonzalez notes:

While teachers and students work through first-year struggles, teachers said the state will need to be understanding. At the Governor’s Teacher Cabinet meeting Thursday in Nashville, 18 educators from throughout the state told Gov. Bill Haslam and McQueen there needs to be “grace” over this year’s test.

The state has warned this year’s test scores will likely dip as it switches to a new baseline measure. TCAP scores can’t be easily compared to TNReady scores.

Despite the fact that the scores “can’t be easily compared,” the state will still use them in teacher evaluations. At the same time, the state is allowing districts to waive the requirement that the scores count toward student grades, as the TCAP and End of Course tests have in the past.

In this era of accountability, it seems odd that students would be relieved of accountability while teachers will still be held accountable.

While that may be one source of anxiety, another is that by using TNReady in the state’s TVAAS formula, the state is introducing a highly suspect means of evaluating teachers. It is, in fact, a statistically invalid approach.

As I noted back in March, citing an article from the Journal of Educational Measurement:

These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured. 

 

That means that the shift to TNReady will change the way TVAAS estimates teacher effect. How? No one knows. We can’t know. We can’t know because the test hasn’t been administered and so we don’t have any results. Without results, we can’t compare TNReady to TCAP. And, even once we have this year’s results, we can’t fairly establish a pattern — because we will only have one year of data. What if this year’s results are an anomaly? With three or more years of results, we MAY be able to make some estimates as to how TCAP compares to TNReady and then possibly correlate those findings into teacher effect estimates. But, we could just end up compounding error rates.

Nevertheless, the state will count the TNReady results on this year’s teacher evaluations using a flawed TVAAS formula. And the percentage these results will count will grow in subsequent years, even if the confidence we have in the estimate does not. Meanwhile, students are given a reprieve…some “grace” if you will.

I’d say that’s likely to induce some anxiety.
