No Adverse Action

After much wrangling, on a day that saw the Tennessee House of Representatives hold up proceedings in order to push for truly holding students, teachers, and schools harmless in light of this year’s TNReady trouble, it appears a compromise of sorts has been reached.

Here’s the language just adopted by the Senate and subsequently passed by the House:

SECTION 1. Tennessee Code Annotated, Title 49, Chapter 6, Part 60, is amended by adding the following language as a new section: Notwithstanding any law to the contrary, no adverse action may be taken against any student, teacher, school, or LEA based, in whole or in part, on student achievement data generated from the 2017-2018 TNReady assessments. For purposes of this section, “adverse action” includes, but is not limited to, the identification of a school as a priority school and the assignment of a school to the achievement school district.

This language does not explicitly address the issue of using TNReady for TVAAS, but it has an effect similar to legislation passed in 2016 during that year’s TNReady trouble. Yes, it seems problems with testing in Tennessee are the norm rather than the exception.

Here’s what this should mean for teachers: Yes, a TVAAS score will be calculated based on this year’s TNReady. But, if that TVAAS score lowers your overall TEAM score, it will be excluded — lowering your TEAM score would be an “adverse action.”
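As a rough sketch of how that exclusion could work, assume it mirrors the 2016 hold-harmless approach: the growth score counts only when it helps. The weights and function names below are illustrative assumptions, not the state’s actual TEAM formula.

```python
# Illustrative sketch only: the TEAM weights and this exclusion rule are
# assumptions modeled on the 2016 hold-harmless, not the state's formula.

def team_composite(observation: float, growth: float, achievement: float,
                   use_growth: bool = True) -> float:
    """Combine TEAM components into a composite score on a 1-5 scale."""
    if use_growth:
        # Hypothetical weights for a tested-subject teacher.
        return 0.50 * observation + 0.35 * growth + 0.15 * achievement
    # With growth excluded, reweight the remaining components.
    return (0.50 * observation + 0.15 * achievement) / 0.65

def hold_harmless(observation: float, growth: float,
                  achievement: float) -> float:
    """Count the TVAAS growth score only if it does not lower the composite."""
    with_growth = team_composite(observation, growth, achievement, True)
    without_growth = team_composite(observation, growth, achievement, False)
    return max(with_growth, without_growth)

# A teacher with strong observations (4.5) and a weak growth score (1.0)
# keeps the higher, growth-free composite (about 4.2 instead of 3.05).
print(hold_harmless(observation=4.5, growth=1.0, achievement=3.0))
```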

While not perfect, this compromise is a victory. The TNReady data from a messed-up test will not harm student grades, will not be used in the state’s A-F report card for schools, and will not be used to assign a negative growth score to a teacher via TVAAS.

Yes, TVAAS is still suspect, but there’s an election in November and a new Commissioner of Education coming after that. Heading into the November election is a great time to talk with candidates for the legislature and for Governor about the importance of evaluations that are fair and not based on voodoo math like TVAAS. Remember, even under the best of circumstances, TVAAS would not have yielded valid results this year.

While it is disappointing that Senators did not want to follow the lead of their House counterparts and explicitly deal with the TVAAS issue, there’s no doubt that persistent outreach by constituents moved the needle on this issue.

For more on education politics and policy in Tennessee, follow @TNEdReport

If you enjoy the education news provided here, consider becoming a patron!



TNReady and TVAAS: A Teacher’s Perspective

Nashville teacher Amanda Kail talks about the connection between TNReady and TVAAS and the importance of legislation moving TODAY that could actually hold teachers harmless.

QUESTION: I thought the legislature said the tests wouldn’t count. What’s going on?
ANSWER: The state legislature was moved by all the horror stories surrounding testing problems to tack a bunch of amendments on to the only remaining education bill of the session (HB1109/SB0987) which attempted to “hold harmless” students, teachers, and schools for the results of the test. What this technically means is that local boards of education can vote on how much they want the students’ scores to count towards their grades (0-15%), and that the data cannot be used to issue a letter grade to schools (A-F, another asinine idea designed to find new ways to punish schools that serve mostly poor kids, but I digress).
However, for teachers the bill specified only that the results of the testing could not be used for decisions regarding employment and compensation. It does not say anything about the scores not being used for EVALUATIONS. Because of this, many teachers across the state pushed TEA to go back to the legislature and demand that the legislation be amended to exclude this year’s scores from TVAAS. You can read more about the particulars of that in Andy Spears’ excellent article for the Tennessee Education Report.
As a result, the House Finance Committee voted to strip all the amendments from HB1109 and start over again with the “hold harmless” language. That needs to happen TOMORROW (4/24/18 — TODAY).
QUESTION: What is TVAAS?
ANSWER: Teachers in Tennessee have evaluations based partly on value-added measures (we call it “TVAAS” here). What this means is that the Tennessee Department of Education uses some sort of mystical secret algorithm (based on models originally developed for cattle breeding. REALLY!) to calculate how much growth each student is expected to generate on statewide tests. If a student shows less growth than predicted (because, like, maybe their test crashed 10 times and they weren’t really concentrating so much anymore), that student’s teacher receives a negative number that is factored into their yearly effectiveness score. TVAAS has been decried by everyone from our state teacher union to the American Statistical Association (and when you upset the statisticians, you have really gone too far), but the state continues to defend its use.
QUESTION: What if I am a teacher who didn’t experience any problems, and I think my students did great on the test? Why would I want to oppose using this year’s data for TVAAS?
ANSWER: Thousands of your colleagues around the state don’t have that luxury, because they DID have problems, and their students’ scores suffered as a result. In fact, even in a good year, thousands of your colleagues have effectiveness scores based on subjects they don’t even teach, because TVAAS covers only tested subjects (math, ELA, and, depending on the year, science and social studies). The fact is that TVAAS is a rotten system. If it benefits you individually as a teacher, that’s great for you. But too many of your colleagues are driven out of the classroom by the absurdity of being held accountable for things completely beyond their control. As a fellow professional, I hope you see the wisdom in advocating for a sane system over one that just benefits you personally.
QUESTION: Okay. So what do we do now?
ANSWER: Contact your state house and senate representatives! TODAY! These are the last days of the legislative session, so it is IMPERATIVE that you contact them now and tell them to support amendments to HB1109 and SB0987 that will stop the use of this year’s testing data towards TVAAS. You can find your legislators here.
Don’t leave teachers holding the bag for the state’s mistakes. AGAIN.
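To make the TVAAS mechanics described in the Q&A above a bit more concrete, here is a heavily simplified sketch of the general value-added idea: predict each student’s score from prior scores, then attribute the class’s average gap between actual and predicted scores to the teacher. The actual EVAAS model is proprietary and far more complex; all numbers and names below are invented for illustration.

```python
# Toy illustration of value-added scoring. The real TVAAS/EVAAS model
# is proprietary and much more complex; everything here is invented.
import numpy as np

rng = np.random.default_rng(0)

# "Statewide" toy population: current score tracks prior score plus noise.
state_prior = rng.uniform(30, 90, 1000)
state_current = 0.9 * state_prior + 10 + rng.normal(0, 5, 1000)

# Fit the statewide prediction: current ~ a * prior + b.
a, b = np.polyfit(state_prior, state_current, 1)

def value_added(prior, current):
    """Teacher 'effect': mean gap between a class's actual scores
    and the statewide prediction."""
    prior = np.asarray(prior, dtype=float)
    current = np.asarray(current, dtype=float)
    return float(np.mean(current - (a * prior + b)))

# One class, two scenarios: a normal test day vs. a disrupted one
# (crashes, restarts). Same kids, same teaching, very different "effect".
prior = [40, 50, 60, 70, 80]
normal_day = [46, 55, 64, 73, 82]
glitchy_day = [41, 49, 57, 64, 72]  # scores depressed by test trouble

print(value_added(prior, normal_day))   # near zero
print(value_added(prior, glitchy_day))  # clearly negative
```

In the second scenario, the disruption, not the teaching, is what drives the negative number, which is exactly the objection raised above.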
For more on education politics and policy in Tennessee, follow @TNEdReport



More TNReady Fallout

As the state continues to experience challenges with TNReady implementation, districts are speaking out. In October, the Williamson County school board adopted resolutions asking for changes to how the state will assign letter grades to schools and asking that TNReady scores not be included in report cards for students in grades 3-5.

This week, Knox County adopted three resolutions relevant to the current testing troubles.

All three were sponsored by Board Member Amber Rountree.

One addresses the proposed letter grading of individual schools:

The Knox County Board of Education hereby urges the Senate to amend legislation SB 535 in the upcoming session by assigning a school level designation that aligns with the district designation, rather than assigning a letter grade to each school; and BE IT FURTHER RESOLVED, The Knox County Board of Education hereby urges Governor Haslam, the State Board of Education, and the Tennessee General Assembly to consider a moratorium in using any school or district designation based on data obtained via the TNReady assessment which was administered in School Year 2016-17.

Another relates to the use of TNReady data for student grades and teacher evaluation:

The Knox County Board of Education opposes the use of TCAP data for any percentage of teacher evaluations and student grades for School Year 2017-2018 and urges the General Assembly and the State Board of Education to provide a one-year waiver, as was previously provided for School Year 2015-2016.

And then there’s one similar to Williamson’s request to exclude TNReady data from report cards for students in grades 3-5:

WHEREAS, the Knox County Board of Education submits student scores on the Tennessee comprehensive assessment program’s grades 3-5 achievement test scores should not comprise a percentage of the student’s final grade for the spring semester in the areas of mathematics, reading/language arts, science and social studies.

NOW THEREFORE BE IT RESOLVED BY THE KNOX COUNTY BOARD OF EDUCATION AS FOLLOWS: The Knox County Board of Education hereby urges the Tennessee General Assembly amend Tennessee Code Annotated, Section 49-1-617 to remove the requirement of using any portion of the Tennessee comprehensive assessment program scores as a percentage of the students in grades 3-5 spring semester grade


No word yet on a state response to these two districts speaking out on the proper use of TNReady data.

For more on education politics and policy in Tennessee, follow @TNEdReport



TC Talks Testing

Nashville education blogger TC Weber talks about testing (and a lot of other things) in his latest post.

Specifically, he talks about the release of data on TNReady tests and the comparisons being made to previous TCAP tests.

Keep in mind: We didn’t have a complete administration of TNReady in 2016, which means 2017 was effectively TNReady’s first year. It also means the comparisons being made are to a different test taken two years earlier: “growth” on 5th grade TNReady results is being measured against 3rd grade TCAP results.

It’s apples and oranges. 

Here’s what TC has to say:

Let’s approach this in a different manner though. Say I run a 5k race annually and each year my time improves a little bit, so I’m feeling like I want something different. After year 5, I switch to a 10k race. My performance in that race is substantially worse. What conclusions can I draw from that difference? Am I really not that good a 5k runner? Is the course really that much harder than the 5k I was running? Is my training off? Am I not that good a runner?
I’d say there are very few conclusions that can be drawn from comparing my 5k and 10k times. It could be that the length of the course was a bigger adjustment than anticipated. It could be that conditions were worse on the day I ran the 10k than on the day I ran the 5k. It could be that one course was flatter and one was hillier. A kid could be good at bubble-in questions but not write-ins. How do we know that improvement isn’t contingent just on familiarity with the course? Or the test?
I know people will argue that we should all be training to run hills instead of flat races. But does running hills well really indicate that I am a better runner? Terrain is just another variable. My liberal arts education always taught me that in order to get the most accurate measurement possible, you need to remove as many variables as possible.
One year of data is not a real indication of anything other than that kids are not very good at taking this test. In order to draw any meaningful conclusions, you would need a set of data you could analyze for trends. Simply comparing a 10k race’s results to a 5k race’s results, just because both are races, is not a valid way to draw conclusions about a runner’s abilities. The same holds true for students and testing.
If TNReady really is the amazing test we’ve all been waiting for, why not take the time to build a reliable set of data? The results from year one don’t really tell us much of anything. Because we skipped* 2016, it’s even MORE difficult to draw meaningful conclusions about the transition from TCAP to TNReady.
TC talks about these challenges and more. Check it out.
*We didn’t actually skip the 2016 test. Instead, many students attempted to take the test only to face glitches with the online system. Schools were then given various new start dates for testing, only to have those dates changed and, ultimately, to see the test cancelled.
Kids were jerked around with messages about how the “important test” was coming up next week only to have it not happen. Teachers were told they’d be proctoring tests and instead had to quickly plan lessons. Our schools and students adapted, to be sure. But, there is no way to give back the instructional time lost in 2016.
Now, we have students taking THE test in 2017 only to see a slow drip of data come back. Students are told the test matters, it will count toward their grades. Teachers have growth scores based on it. Schools are assigned ratings based on it. But, getting it right doesn’t matter. Well, unless it does.
Oh, and we spend a lot of money on a testing system that produces questionable results, with data coming back at a time that reduces its usefulness.
What’s next? This year, we’ll try again to administer TNReady online across the state. That didn’t work so well with the previous vendor, but maybe it will this time. Of course, online administration adds another variable to the mix. So, 2018 will be the first time many students have taken a fully online TNReady test. Assuming it works, online administration could address the challenges of getting results back in a timely fashion. But, the transition could impact student performance, once again calling into question the legitimacy of growth scores assigned to students and schools.
For more on education politics and policy in Tennessee, follow @TNEdReport



Apples and Oranges

Here’s what Director of Schools Dorsey Hopson had to say amid reports that schools in his Shelby County district showed low growth according to recently released state test data:

Hopson acknowledged concerns over how the state compares results from “two very different tests which clearly are apples and oranges,” but he added that the district won’t use that as an excuse.

“Notwithstanding those questions, it’s the system upon which we’re evaluated on and judged,” he said.

State officials stand by TVAAS. They say drops in proficiency rates resulting from a harder test have no impact on the ability of teachers, schools and districts to earn strong TVAAS scores, since all students are experiencing the same change.

That’s all well and good, except when the system upon which you are evaluated is seriously flawed, it seems there’s an obligation to speak out and fight back.
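It’s worth unpacking the state’s “same change for everyone” defense. A growth model normed to the statewide average really is immune to a uniform drop: subtract the same constant from every score and every relative position is unchanged. But a new test that weights different skills does not change everyone’s scores uniformly. Here is a toy demonstration (invented numbers, not the EVAAS model):

```python
# Toy demonstration, not the EVAAS model: a uniform drop leaves
# mean-centered standings untouched; a test that weights different
# skills reshuffles them.
import numpy as np

rng = np.random.default_rng(1)

# Five schools, each with a different mix of two skills.
computation = rng.normal(60, 10, 5)  # skill the old test emphasized
writing = rng.normal(60, 10, 5)      # skill the new test emphasizes

old_test = 0.8 * computation + 0.2 * writing
harder_uniform = old_test - 15                 # same test, everyone drops 15
new_test = 0.2 * computation + 0.8 * writing   # different construct

def relative_standing(scores):
    """Mean-centered scores: roughly what a norm-referenced growth model sees."""
    return scores - scores.mean()

print(relative_standing(old_test))
print(relative_standing(harder_uniform))  # identical: a uniform drop is harmless
print(relative_standing(new_test))        # reshuffled: a different construct is not
```

A harder version of the same test fits the state’s defense. A different kind of test, which is what TNReady was, does not.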

Two years ago, ahead of what should have been the first year of TNReady, I wrote about the challenges of creating valid TVAAS scores while transitioning to a new test. TNReady was not just a different test; it was (is) a different type of test from the previous TCAP. For example, it included constructed-response questions instead of only multiple-choice bubble-in questions.

Here’s what I wrote:

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format. It’s both computer-based and it contains constructed-response questions. That is, students must write out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

Here’s what Lockwood and McCaffrey (2007) had to say in the Journal of Educational Measurement, the academic article I cited to support this claim:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers.
You get different value-added results depending on the type of test you use. That is, you can’t just say, “this is a new test, but we’ll compare peer groups from the old test and see what happens.” Plus, TNReady presents the added challenge of not having been fully administered last year, so you’re now looking at data from two years ago and extrapolating to this year’s results.
Of course, the company paid millions to crunch the TVAAS numbers says this transition presents no problem at all. Here’s what its technical document has to say about the matter:
In 2015-16, Tennessee implemented new End-of-Course (EOC) assessments in math and English/language arts. Redesigned assessments in Math and English/language arts were also implemented in grades 3-8 during the 2016-17 school year. Changes in testing regimes occur at regular intervals within any state, and these changes need not disrupt the continuity and use of value-added reporting by educators and policymakers. Based on twenty years of experience with providing value-added and growth reporting to Tennessee educators, EVAAS has developed several ways to accommodate changes in testing regimes.
Prior to any value-added analyses with new tests, EVAAS verifies that the test’s scaling properties are suitable for such reporting. In addition to the criteria listed above, EVAAS verifies that the new test is related to the old test to ensure that the comparison from one year to the next is statistically reliable. Perfect correlation is not required, but there should be a strong relationship between the new test and old test. For example, a new Algebra I exam should be correlated to previous math scores in grades seven and eight and to a lesser extent other grades and subjects such as English/language arts and science. Once suitability of any new assessment has been confirmed, it is possible to use both the historical testing data and the new testing data to avoid any breaks or delays in value-added reporting.
A few problems with this. First, there was NO complete administration of a new testing regime in 2015-16. It didn’t happen.
Second, EVAAS doesn’t get paid if there’s no way to generate these “growth scores,” so it is in the company’s interest to find some justification for comparing the two very different tests.
Third, researchers who study value-added modeling are highly skeptical of the reliability of comparisons between different types of tests when it comes to generating value-added scores. I noted Lockwood and McCaffrey (2007) above. Here are some more:
John Papay (2011) did a similar study using three different reading tests, with similar results. He stated his conclusion as follows: [T]he correlations between teacher value-added estimates derived from three separate reading tests — the state test, SRI [Scholastic Reading Inventory], and SAT [Stanford Achievement Test] — range from 0.15 to 0.58 across a wide range of model specifications. Although these correlations are moderately high, these assessments produce substantially different answers about individual teacher performance and do not rank individual teachers consistently. Even using the same test but varying the timing of the baseline and outcome measure introduces a great deal of instability to teacher rankings.
Two points worth noting here: First, different tests yield different value-added scores. Second, even using the same test but varying the timing can create instability in growth measures.
Then there’s data from the Measures of Effective Teaching (MET) Project, which included data from Memphis. On the reliability of value-added across different types of tests, here’s what MET reported:
Once more, the MET study offered corroborating evidence. The correlation between value-added scores based on two different mathematics tests given to the same students the same year was only .38. For 2 different reading tests, the correlation was .22 (the MET Project, 2010, pp. 23, 25).
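Here is a hedged toy simulation of the pattern these studies report (all parameters invented for illustration): give every teacher a true effect, add independent test-specific error for each of two tests, and the correlation between the two sets of value-added estimates lands in the neighborhood of the MET figures.

```python
# Toy simulation of why value-added estimates from two different tests
# correlate weakly. All parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n_teachers = 500

true_effect = rng.normal(0, 1, n_teachers)

# Each test layers its own test-specific error (item format, skills
# emphasized, timing) on top of the same true teaching effect.
noise_scale = 1.3
estimate_a = true_effect + rng.normal(0, noise_scale, n_teachers)
estimate_b = true_effect + rng.normal(0, noise_scale, n_teachers)

r = np.corrcoef(estimate_a, estimate_b)[0, 1]
print(f"correlation between the two tests' estimates: {r:.2f}")
# Expected correlation is 1 / (1 + noise_scale**2), about 0.37 -- close
# to the MET mathematics figure of .38, even though both tests observe
# exactly the same teaching. Teacher rankings disagree accordingly.
```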
Despite the claims of EVAAS, the academic research raises significant concerns about extrapolating results from different types of tests. In short, when you move to a different test, you get different value-added results. As I noted in 2015:

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

Or, if the state is determined to use growth scores (and wants to use them with accuracy), it will wait several years and build completely new growth models based on TNReady alone. At least three years of data would be needed to build such a model.

Dorsey Hopson and other Directors of Schools should be pushing back aggressively. Educators should be outraged. After all, this unreliable data will be used as a portion of their teacher evaluations this year. Schools are being rated on a 1-5 scale based on a growth model grounded in suspect methods.

How much is this apple like last year’s orange? How much will this apple ever be like last year’s orange?

If we’re determined to use value-added modeling to measure school-wide growth or district performance, we should at least be determined to do it in a way that ensures valid, reliable results.

For more on education politics and policy in Tennessee, follow @TNEdReport



It Doesn’t Matter Except When It Does

This year’s TNReady quick score setback means some districts will use the results in student report cards and some won’t. Of course, that’s nobody’s fault. 

One interesting point in all of this came when Commissioner McQueen noted that quick scores aren’t what really matters anyway. Chalkbeat reports:

The commissioner emphasized that the data that matters most is not the preliminary data but the final score reports, which are scheduled for release in July for high schools and the fall for grades 3-8. Those scores are factored into teachers’ evaluations and are also used to measure the effectiveness of schools and districts.

“Not until you get the score report will you have the full context of a student’s performance level and strengths and weaknesses in relation to the standards,” she said.

The early data matters to districts, though, since Tennessee has tied the scores to student grades since 2011.

First, tying the quick scores to student grades is problematic. Assuming TNReady is a good, reliable test, we’d want the best results to be used in any grade calculation. Using pencil and paper this year makes that impossible. Even when we switch to a test fully administered online, it may not be possible to get the full scores back in time to use them in student grades.

Shifting to a model that uses TNReady to inform and diagnose rather than evaluate students and teachers could help address this issue. Shifting further to a project-based assessment model could actually help students while also serving as a more accurate indicator of whether they have met the standards.

Next, the story notes that teachers will be evaluated based on the scores. This will be done via TVAAS — the state’s value-added modeling system. Even as more states move away from value-added models in teacher evaluation, Tennessee continues to insist on using this flawed model.

Again, let’s assume TNReady is an amazing test that truly measures student mastery of standards. It’s still NOT designed for the purpose of evaluating teacher performance. Further, this is the first year the test has been administered. That means it’s simply not possible to generate valid data on teacher performance from this year’s results. You can’t just take this year’s test (TNReady) and compare it to the TCAP from two years ago. They are different tests designed to measure different standards in a different way. You know, the old apples and oranges thing.

One teacher had this to say about the situation:

“There’s so much time and stress on students, and here again it’s not ready,” said Tikeila Rucker, a Memphis teacher who is president of the United Education Association of Shelby County.

For more on education politics and policy in Tennessee, follow @TNEdReport



Hamilton Principals Call for TNReady Waiver

A group of school principals in Hamilton County is joining the call for a waiver of the use of TNReady scores in teacher evaluations and accountability data, in light of day-one problems with the administration of the online assessment.

Here’s the resolution:

[Embedded document: HCPA Resolution Regarding State Assessments]