Supposedly

Recently, Chalkbeat asked readers to pose questions about TNReady in light of the latest round of trouble for the state’s standardized test. One question asked about the validity of the scoring, given that “scorers are hired off Craigslist.”

Here’s what the Tennessee Department of Education had to say:

“Questar does not use Craigslist. Several years ago, another assessment company supposedly posted advertisements on Craigslist, but Questar does not. We provide opportunities for our educators to be involved in developing our test, and we also encourage Tennessee teachers to apply to hand-score TNReady.”

So, good news: scorers for the new vendor are not hired off of Craigslist. But it’s disturbing that the TDOE used the hedge “supposedly.” Back in 2015, I wrote about Measurement, Inc.’s ads on Craigslist:

Certainly, quality scorers for TNReady can be found for $10.70-$11.20 an hour via ads posted on Craigslist. I’m sure parents in the state are happy to know this may be the pool of scorers determining their child’s test score. And teachers, whose evaluations are based on growth estimates from these tests, are also sure to be encouraged by the validity of results obtained in this fashion.

My post even included a copy of the ad being used by Measurement, Inc. Then, in 2016, WSMV ran a story on scorers being hired via Craigslist ads.

Another response from the TDOE also caught my attention. This one dealt with the validity of comparisons between the old TCAP test and the new TNReady. The TDOE suggested this is like a group of runners switching from running 5Ks to running a 10K.

Runner and blogger TC Weber has a good response.

Then, when the issue of students not taking the tests seriously (thanks to perennial problems with returning data) comes up, the TDOE engages in more blame shifting:

“We believe that if districts and schools set the tone that performing your best on TNReady is important, then students will take the test seriously, regardless of whether TNReady factors into their grade. We should be able to expect our students will try and do their best at any academic exercise, whether or not it is graded. This is a value that is established through local communication from educators and leaders, and it will always be key to our test administration.”

So, the fact that testing data has been returned late or that the quick score calculation method has changed has nothing to do with how students understand the test. If only those pesky school districts and their troublesome teachers would get on board and reinforce the right “values,” everything would be fine.

Here’s a hint, TDOE: Take some damn responsibility. TNReady has been a dumpster fire. Before that, you couldn’t get TCAP scores back in a reliable fashion. When districts told the TDOE that TNReady’s online administration wasn’t going to go well in 2016, the TDOE ignored them. Now, some students are wary of the test and whether or not it has any impact on their grades or any relevance to their learning. The TDOE simply responds by telling districts that if they just stopped asking so many questions and started drilling in the right messages, all would be well.

The disconnect is real.

As I noted in an earlier piece, accountability is a one-way street when it comes to the TDOE. This message is worth repeating:

How many warning signs will be ignored? How important is the test that it must be administered at all costs and the mistakes must be excused away because “accountability” demands it?

How can you hold students and teachers and schools accountable when no one is holding the Department of Education accountable? How long will legislators tolerate a testing regime that creates nightmares for our students and headaches for our teachers while yielding little in terms of educational value?

Apparently, according to Governor Haslam, everything is fine.

Still, the legislature meets again starting in January. And there’s a Governor’s race next year as well. Perhaps the combination of those events will lead to an environment that produces real answers.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

This is Fine

Amid the latest round of TNReady troubles that included both miscalculated student scores and errors in how those scores were used in some teacher evaluations, the House of Representatives held hearings last week to search for answers.

On the same day of the committee hearings, Governor Bill Haslam let everyone know that things were going well.

Chalkbeat reports:

Earlier in the day, Gov. Bill Haslam called the controversy overblown because this year’s errors were discovered as part of the state’s process for vetting scores.

“I think the one thing that’s gotten lost in all this discussion is the process worked,” Haslam told reporters. “It was during the embargo period before any of the results were sent out to students and their families that this was caught.”

Here’s the deal: If this were the only problem with TNReady so far, Governor Haslam would be right. This would be no big deal. But, you know, it’s not the only problem. At all.

Let’s start from the beginning. Which was supposed to be 2016. Except it didn’t happen. And then it kept not happening. For full disclosure, I have a child who was in 4th grade at the time of what was to be the inaugural year of TNReady. The frustration of watching her prepare for a week of testing only to be told it would happen later and then later and then maybe never was infuriating. That adults at decision-making levels think it is just fine to treat students that way is telling. It also says something that when some adults try to stand up for their students, they are smacked down by our Commissioner of Education.

As for the aforementioned Commissioner of Education, some may remember the blame shifting and finger pointing engaged in by Commissioner McQueen and then-TNReady vendor Measurement, Inc. That same attitude was on display again this year when key deadlines were missed for the return of “quick scores” to school districts.

Which brings us to the perennial issue of delivering accurate score reports to districts. This year was the fourth in a row with problems getting those results back to school districts. Each year, we hear excuses and promises about how it will be better next year. Then, it isn’t.

Oh, and what if you’re a parent like me and you’re so frustrated you just want to opt your child out of testing? Well, according to Commissioner McQueen and the Governor who supports her, that’s not an option. Sadly, many districts have fallen in line with this way of thinking.

Here’s the thing: McQueen’s reasoning is missing something. Yes, she lacks credibility generally. But, specifically, she’s ignoring some key evidence. As I noted previously:

All along, the state has argued a district’s federal funds could be in jeopardy due to refusal to administer the test or a district’s inability to test at least 95% of its students.

As such, the argument goes, districts should fight back against opt-outs and test refusals by adopting policies that penalize students for taking these actions.

There’s just one problem: The federal government has not (yet) penalized a single district for failing to hit the 95% benchmark. In fact, in the face of significant opt-outs in New York last year (including one district where 89% of students opted out), the U.S. Department of Education communicated a clear message to New York state education leaders: Districts and states will not suffer a loss of federal dollars due to high test refusal rates. The USDOE left it up to New York to decide whether or not to penalize districts financially.

So, you have a system that is far from perfect and based on this system (TNReady), you penalize teachers (through their evaluations) and schools (through an A-F school grading system). Oh yeah, and you generate “growth” scores and announce “reward” schools based on what can best be described as a problematic (so far) measuring stick with no true comparability to the previous measuring stick.

Anyway, Bill Haslam is probably right. This is fine.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

A Lot of Words

The Murfreesboro City School Board has already expressed concern about the state’s TNReady tests and the delay in receiving results.

More recently, Board members expressed frustration with the response they received from Education Commissioner Candice McQueen.

The Murfreesboro Post reports:

“I felt like it was a lot of words for not really answering our questions,” said Board Member Jared Barrett. He referred to the response as having “excuses” and “dodging the question.”

“My first response when I read this letter was that there’s something in here that doesn’t add up,” said Board Member Phil King. “My fear is they haven’t solved the problem of getting the paper tests in our hands in a timely manner.”

King suggested moving away from using TNReady in teacher evaluations until the state can prove it can get results back to districts in a timely manner.

The Murfreesboro School Board meeting happened before the most recent round of TNReady troubles, in which some students received incorrect scores and some teachers did not have their students properly counted in their TVAAS scores.

In response to those issues, House Speaker Beth Harwell has called for hearings on the issue of state testing.

Additionally, yesterday, the United Education Association of Shelby County called for TNReady scores for this year to be invalidated and for a moratorium on including TNReady scores in accountability measures until 2021.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

Dear Educator

The Tennessee Department of Education explains the case of the missing students as some 900 teachers see their TVAAS scores recalculated.

Here’s the email those educators were sent:

Dear Educator,

We wanted to share an update with you regarding your individual TVAAS data.

The department has processed about 1.5 million records to generate individual TVAAS scores for nearly 19,000 educators based on the assessment results from over 1.9 million student tests in grades 2-8 and high school. During the review process with districts, we found that a small number of educators did not have all of their teacher-student claiming linkage records fully processed in data files released in early September. All linkage data that was captured in EdTools directly was fully incorporated as expected. However, due to a coding error in their software, our data processing vendor, RANDA Solutions, did not fully apply the linkage information that districts provided in supplemental Excel files over the summer. As a result, we are working with Randa to ensure that this additional data is included in final TVAAS processing.


You have been identified as an educator with some linkage data submitted via an Excel file that was not fully processed. This means after our statistical analysis vendor, SAS, receives these additional linkage records your score may be revised to reflect all the students you identified in the teacher-student claiming process. Only students marked “F” for instructional availability are used when calculating individual TVAAS data. Based on our records, there will be [X] additional students marked “F” for instructional availability linked to you when the additional data is incorporated.


Your district’s and school’s TVAAS scores are not affected by this situation given that all students are included in these metrics, regardless of which teacher is linked to them, so no other part of your evaluation composite would change. Moreover, only those teachers with this additional linkage data in Excel files are impacted, so the vast majority of your colleagues across the state have their final individual TVAAS composites, which are inclusive of all student data.


We expect to share your final growth score and overall level of effectiveness later this year. While we do not have more specific timing to share right now, we are expediting this process with our vendors to get you accurate feedback. We will follow-up with more detailed information in the next couple of weeks. Also, as announced to districts earlier this month, the department and your districts will be using new systems and processes this year that will ensure that this type of oversight does not happen again.


Thank you for your patience as we work to share complete and accurate feedback for you. We deeply value each Tennessee educator and apologize for this delay in providing your final TVAAS results. Please contact our office via the email address below if you have any questions.


Respectfully,


Office of Assessment Logistics

Tennessee Department of Education

A few things stand out about this communication:

  1. Tennessee continues to experience challenges with the rollout of TNReady. That’s to be expected, but it raises the question: Why are we rushing this? Why not take some time, hit pause, and get this right?
  2. The Department says, “Thank you for your patience as we work to share complete and accurate feedback for you.” If accurate feedback was important, the state would take the time to build a value-added data set based on TNReady. This would take three to five years, but would improve the accuracy of the information provided to educators. As it stands, the state is comparing apples to oranges and generating value-added scores of little real value.
  3. On the topic of value-added data generally, it is important to note that even with a complete data set, TVAAS data is of limited value in terms of evaluating teacher effectiveness. A recent federal lawsuit settlement in Houston ended the use of value-added data for teacher evaluation there. Additionally, a judge in New York ruled the use of value-added data in teacher evaluation was “arbitrary and capricious.”
  4. When will teachers have access to this less-than-accurate data? Here’s what the TDOE says: “We expect to share your final growth score and overall level of effectiveness later this year. While we do not have more specific timing to share right now, we are expediting this process with our vendors to get you accurate feedback.” Maybe they aren’t setting a clear deadline because they have a track record of missing deadlines?
  5. It’s amazing to me that a teacher’s “overall level of effectiveness” can only be determined once TVAAS data is included in their evaluation score. It’s as if there’s no other way to determine an overall level of a teacher’s effectiveness. Not through principal observation. Not through analysis of data points on student progress taken throughout the year. Not through robust peer-evaluation systems.
  6. Let’s assume for a moment that the “level of effectiveness” indicator is useful for teacher development. Providing that score “later” is not exactly helpful. Ideally, actionable insight would be provided to a teacher and his/her administrators near the end of a school year. This would allow for targeted professional development to address areas that need improvement. Of course, this assumes targeted PD is even available.
  7. Accountability. This is the latest in a series of mishaps related to the new testing regimen known as TNReady. Teachers are held accountable through their evaluation scores, and in some districts, their pay is tied to those scores. Schools and districts are held accountable for growth and achievement scores and must develop School Improvement Plans to target areas of weakness. On the other hand, the Department of Education continues to make mistakes in the TNReady transition and no one is held accountable.

The email to impacted teachers goes to great lengths to establish the enormous scope of the TNReady transition. Lots of tests, lots of students, not too many mistakes. If this were the only error so far in the TNReady process, all could be forgiven. Instead, it is the latest in a long line of bumps. Perhaps it will all smooth out in time. Which only makes the case for hitting pause all the stronger.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

Wrong Answer

In the never-ending saga that is testing in Tennessee, the latest chapter spins a familiar but frustrating tale. It seems the state’s testing vendor incorrectly scored thousands of TNReady tests, impacting student score reports and the teacher evaluation scores based on them.

Jennifer Pignolet and Jason Gonzales have more:

About 9,400 TNReady tests across the state were scored incorrectly, according to the Tennessee Department of Education.

The scoring issue impacted about 70 schools in 33 districts. Just over 1,000 of the incorrectly scored tests were in Shelby County Schools, according to an email from Superintendent Dorsey Hopson to his board on Friday.

Approximately 1,700 of the total incorrect test scores, once corrected, changed what scoring category that test fell into, possibly affecting whether a student passed the test.

The error also impacted value-added scores for up to 230 teachers. A separate problem could impact TVAAS scores for as many as 900 teachers.

The scope of the error means scores in nearly 25% of the state’s school districts will need to be corrected. The Department of Education says the testing vendor, Questar, is re-scoring the tests.

UPDATE — Here’s a list of districts impacted:

  • Achievement School District
  • Anderson County
  • Benton County
  • Bradley County
  • Bristol City
  • Carter County
  • Cocke County
  • Collierville City
  • Crockett County
  • Davidson County
  • Elizabethton City
  • Giles County
  • Hamilton County
  • Hardin County
  • Henry County
  • Huntingdon Special School District
  • Jackson-Madison County
  • Knox County
  • Lewis County
  • Lincoln County
  • Marshall County
  • Maryville City
  • Monroe County
  • Montgomery County
  • Obion County
  • Putnam County
  • Roane County
  • Rutherford County
  • Shelby County
  • Smith County
  • Sumner County
  • Union City
  • Weakley County

The State of Tennessee has spent millions of dollars on a new testing regime supposedly better able to assess student mastery of state standards. So far, all that most students, teachers, and parents have seen is problems.

The first set of problems happened on day one of the initial online administration of the test in 2016. Then, a series of missed deadlines led to the state firing then-vendor Measurement, Inc. That’s the same company that hired test scorers via ads on Craigslist.

Of course, this is the same Department of Education that has repeatedly had issues with test score data.

If only there had been warning signs or calls to take the time to phase-in TNReady so that it best serves students and educators.

You know, something like:

TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

Or, if the state is determined to use growth scores (and wants to use them with accuracy), they will wait several years and build completely new growth models based on TNReady alone. At least three years of data would be needed in order to build such a model.

That’s from an article I wrote in March of 2015 about TNReady data and the challenges of adapting to a new test using our current accountability system.

That was BEFORE the 2016 TNReady mess. It was before the state had a problem getting data back this year.

How many warning signs will be ignored? How important is the test that it must be administered at all costs and the mistakes must be excused away because “accountability” demands it?

How can you hold students and teachers and schools accountable when no one is holding the Department of Education accountable? How long will legislators tolerate a testing regime that creates nightmares for our students and headaches for our teachers while yielding little in terms of educational value?

At least one school board has complained about the state’s handling of TNReady data this year. I suspect more will follow in the wake of this latest mistake.

So far, TNReady has sent one clear message: Accountability is a one-way street in Tennessee, and students, teachers, and districts are on the wrong end.


For more on education politics and policy in Tennessee, follow @TNEdReport


 

TC Talks Testing

Nashville education blogger TC Weber talks about testing (and a lot of other things) in his latest post.

Specifically, he talks about the release of data on TNReady tests and the comparisons being made to previous TCAP tests.

Keep in mind: We didn’t have a complete administration of TNReady in 2016, which means 2017 was the first real year for TNReady. It also means the comparisons being made are to a different test taken two years ago. So, you have analysis of 5th grade results and “growth” on TNReady being made in comparison to 3rd grade results on TCAP.

It’s apples and oranges. 

Here’s what TC has to say:

Let’s approach this in a different manner though. Say I annually run a 5k race and each year my timing goes up a little bit, so I’m feeling like I want something different. After year 5 I change to a 10k race. My time for that race is substantially lower. What conclusions can I draw from that difference in time? Am I really not that good a 5k runner? Is the course really that much harder than the 5k I was running? Is my training off? Am I not that good a runner?

I’d say there are very few conclusions, based on comparing the results between my 5k and my 10k time, that can be drawn. It could be that the length of the course was a bigger adjustment than anticipated. It could be that conditions were worse on the day I ran the 10k vs the 5k. It could be that one course was flatter and one was hillier. A kid could be good at bubble-in questions but not write-ins. How do we know that improvement isn’t contingent just on familiarity with the course? Or the test?

I know people will argue that we should all be training to run hills instead of flat races. But does running hills well really indicate that I am a better runner? Terrain is just another variable. My liberal arts education always explained to me that in order to get the most accurate measurement possible you need to remove as many of the variables as possible.

One year of data is not a real indication of anything other than that kids are not very good at taking this test. In order to draw any meaningful conclusions, you would have to have a set of data that you could analyze for trends. Simply taking a 10k race and comparing its results to a 5k race’s results, just because both are races, is not a valid means to draw conclusions about a runner’s abilities. The same holds true for students and testing.
If TNReady really is the amazing test we’ve all been waiting for, why not take the time to build a reliable set of data? The results from year one don’t really tell us much of anything. Because we skipped* 2016, it’s even MORE difficult to draw meaningful conclusions about the transition from TCAP to TNReady.

TC talks about these challenges and more in his post. Check it out.

*We didn’t actually skip the 2016 test. Instead, many students attempted to take the test only to face glitches with the online system. Schools then were given various new times for testing to start, only to have those dates changed and, ultimately, to see the test cancelled.

Kids were jerked around with messages about how the “important test” was coming up next week only to have it not happen. Teachers were told they’d be proctoring tests and instead had to quickly plan lessons. Our schools and students adapted, to be sure. But there is no way to give back the instructional time lost in 2016.

Now, we have students taking THE test in 2017 only to see a slow drip of data come back. Students are told the test matters and that it will count toward their grades. Teachers have growth scores based on it. Schools are assigned ratings based on it. But getting it right doesn’t matter. Well, unless it does.

Oh, and we spend a lot of money on a testing system that produces questionable results, with data coming back at a time that reduces its usefulness.

What’s next? This year, we’ll try again to administer TNReady online across the state. That didn’t work so well with the previous vendor, but maybe it will this time. Of course, online administration adds another variable to the mix. So, 2018 will be the first time many students have taken a fully online TNReady test. Assuming it works, online administration could address the challenges of getting results back in a timely fashion. But the transition could impact student performance, once again calling into question the legitimacy of growth scores assigned to students and schools.
For more on education politics and policy in Tennessee, follow @TNEdReport


 

Muddy Waters

Laura Faith Kebede of Chalkbeat reports on the challenges in generating reliable TVAAS scores as a result of TNReady trouble last year. Her story cites a statistician from the Center for Assessment who explains the issue this way:

Damian Betebenner, a senior associate at Center for Assessment that regularly consults with state departments, said missing data on top of a testing transition “muddies the water” on results.

“When you look at growth over two years, so how much the student grew from third to fifth grade, then it’s probably going to be a meaningful quantity,” he said. “But to then assert that it isolates the school contribution becomes a pretty tenuous assertion… It adds another thing that’s changing underneath the scene.”

In other words, it’s difficult to get a meaningful result given the current state of testing in Tennessee. I wrote recently about this very issue and the problem with the validity of the growth scores this year.

Additionally, two years ago, I pointed out the challenges the state would face when shifting to a new test. Keep in mind, this was before all the TNReady trouble that further muddied the waters. Here’s what I said in March of 2015:

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format. It’s both computer-based and it contains constructed-response questions. That is, students must write-out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

The way to address this issue? Build multiple years of data in order to obtain reliable results:

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

So, now we have two challenges: We have two different types of tests AND we have a missing year of data. Either one of these challenges creates statistical problems. The combination of the two calls for a serious reset of the state’s approach to accountability.

As I suggested yesterday, taking the time to get this right would mean not using the TNReady data for accountability for teachers, students, or schools until 2019 at the earliest. If our state is committed to TNReady, we should be committed to getting it right. We’re spending a lot of money on both TNReady and on TVAAS. If we’re going to invest in these approaches, we should also take the time to be sure that investment yields useful, reliable information.

Why does any of this matter? Because, as Kebede points out:

At the same time, TVAAS scores for struggling schools will be a significant factor to determine which improvement tracks they will be placed on under the state’s new accountability system as outlined in its plan to comply with the federal Every Student Succeeds Act. For some schools, their TVAAS score will be the difference between continuing under a local intervention model or being eligible to enter the state-run Achievement School District. The school growth scores will also determine which charter schools are eligible for a new pot of state money for facilities.

TVAAS scores also count in teacher evaluations. TNReady scores were expected to count in student grades until the quick scores weren’t back in time. If all goes well with the online administration of TNReady this year, the scores will count for students.

The state says TNReady matters. The state evaluates schools based on TVAAS scores. The state teacher evaluation formula includes TVAAS scores for teachers and TNReady scores as one measure of achievement that can be selected.

In short: Getting this right matters.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

Seeping Scores Sour School Board

Members of the Murfreesboro City School Board are not happy with the slow pace of results coming from the state’s new TNReady test. All seven elected board members sent a letter to Commissioner of Education Candice McQueen expressing their concerns.

The Daily News Journal reports:

“However, currently those test scores seep ever-so-slowly back to their source of origin from September until January,” the letter states. “And every year, precious time is lost. We encourage you to do everything possible to get test results — all the test results — to schools in a timely manner.

“We also encourage you to try to schedule distribution of those results at one time so that months are not consumed in interpreting, explaining and responding to those results,” the letter continued.

A Department of Education spokesperson suggested the state wants the results back sooner, too:

“We know educators, families and community members want these results so they can make key decisions and improve, and we want them to be in their hands as soon as possible,” Gast said. “We, at the department, also desire these results sooner.”

Of course, this is the same department that continues to have trouble releasing quick score data in time for schools to use it in student report cards. In fact, this marked the fourth consecutive year there has been a problem with end-of-year data — either the timely release of that data or the clear calculation of it.

TDOE spokesperson Sara Gast went further in distancing the department from blame, saying:

Local schools should go beyond TNReady tests in determining student placement and teacher evaluations, Gast said.

“All personnel decisions, including retaining, placing, and paying educators, are decisions that are made locally, and they are not required to be based on TNReady results,” Gast said. “We hope that local leaders use multiple sources of feedback in making those determinations, not just one source, but local officials have discretion on their processes for those decisions.”

Here’s the problem with that statement: This is THE test. It is the test that determines a school’s achievement and growth score. It is THE test used to calculate an (albeit invalid) TVAAS score for teachers. It is THE test used in student report cards (when the quick scores come back on time). This is THE test.

Teachers are being asked RIGHT NOW to make choices about the achievement measure they will be evaluated on for their 2017-18 TEAM evaluation. One choice: THE test. The TNReady test. But there aren’t results available to allow teachers and principals to make informed choices.

One possible solution to the concern expressed by the Murfreesboro School Board is to press the pause button. That is, get the testing right before using it for any type of accountability measure. Build some data in order to establish the validity of the growth scores. Administer the test, get the results back, and use the time to work out any challenges. Set a goal of 2019 to have full use of TNReady results.

Another solution is to move to a different set of assessments. Students in Tennessee spend a lot of time taking tests. Perhaps a set of assessments that was less time-consuming could allow for both more instructional time and more useful feedback. I’ve heard some educators suggest the ACT suite of assessments could be adapted in a way that’s relevant to Tennessee classrooms.

It will be interesting to see if more school districts challenge the Department of Education on the current testing situation.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

Apples and Oranges

Here’s what Director of Schools Dorsey Hopson had to say amid reports that schools in his Shelby County district showed low growth according to recently released state test data:

Hopson acknowledged concerns over how the state compares results from “two very different tests which clearly are apples and oranges,” but he added that the district won’t use that as an excuse.

“Notwithstanding those questions, it’s the system upon which we’re evaluated on and judged,” he said.

State officials stand by TVAAS. They say drops in proficiency rates resulting from a harder test have no impact on the ability of teachers, schools and districts to earn strong TVAAS scores, since all students are experiencing the same change.

That’s all well and good, except when the system upon which you are evaluated is seriously flawed, it seems there’s an obligation to speak out and fight back.

Two years ago, ahead of what should have been the first year of TNReady, I wrote about the challenges of creating valid TVAAS scores while transitioning to a new test. TNReady was not just a different test; it was (is) a different type of test than the previous TCAP. For example, it included constructed-response questions instead of simply multiple-choice bubble-in questions.

Here’s what I wrote:

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format. It’s both computer-based and it contains constructed-response questions. That is, students must write-out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

Here’s what Lockwood and McCaffrey (2007), in the academic article I cited to support this claim, had to say in the Journal of Educational Measurement:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers.
You get different value-added results depending on the type of test you use. That is, you can’t just say this is a new test but we’ll compare peer groups from the old test and see what happens. Plus, TNReady presents the added challenge of not having been fully administered last year, so you’re now looking at data from two years ago and extrapolating to this year’s results.

Of course, the company paid millions to crunch the TVAAS numbers says that this transition presents no problem at all. Here’s what their technical document has to say about the matter:

In 2015-16, Tennessee implemented new End-of-Course (EOC) assessments in math and English/language arts. Redesigned assessments in math and English/language arts were also implemented in grades 3-8 during the 2016-17 school year. Changes in testing regimes occur at regular intervals within any state, and these changes need not disrupt the continuity and use of value-added reporting by educators and policymakers. Based on twenty years of experience with providing value-added and growth reporting to Tennessee educators, EVAAS has developed several ways to accommodate changes in testing regimes.

Prior to any value-added analyses with new tests, EVAAS verifies that the test’s scaling properties are suitable for such reporting. In addition to the criteria listed above, EVAAS verifies that the new test is related to the old test to ensure that the comparison from one year to the next is statistically reliable. Perfect correlation is not required, but there should be a strong relationship between the new test and old test. For example, a new Algebra I exam should be correlated to previous math scores in grades seven and eight and to a lesser extent other grades and subjects such as English/language arts and science. Once suitability of any new assessment has been confirmed, it is possible to use both the historical testing data and the new testing data to avoid any breaks or delays in value-added reporting.

A couple of problems with this. First, there was NO complete administration of a new testing regime in 2015-16. It didn’t happen.

Second, EVAAS doesn’t get paid if there’s not a way to generate these “growth scores,” so it is in their interest to find some justification for comparing the two very different tests.

Third, researchers who study value-added modeling are highly skeptical of the reliability of comparisons between different types of tests when it comes to generating value-added scores. I noted Lockwood and McCaffrey (2007) above. Here are some more:

John Papay (2011) did a similar study using three different reading tests, with similar results. He stated his conclusion as follows: [T]he correlations between teacher value-added estimates derived from three separate reading tests — the state test, SRI [Scholastic Reading Inventory], and SAT [Stanford Achievement Test] — range from 0.15 to 0.58 across a wide range of model specifications. Although these correlations are moderately high, these assessments produce substantially different answers about individual teacher performance and do not rank individual teachers consistently. Even using the same test but varying the timing of the baseline and outcome measure introduces a great deal of instability to teacher rankings.

Two points worth noting here: First, different tests yield different value-added scores. Second, even using the same test but varying the timing can create instability in growth measures.

Then, there’s data from the Measures of Effective Teaching (MET) Project, which included data from Memphis. In terms of reliability when using value-added among different types of tests, here’s what MET reported:

Once more, the MET study offered corroborating evidence. The correlation between value-added scores based on two different mathematics tests given to the same students the same year was only .38. For 2 different reading tests, the correlation was .22 (the MET Project, 2010, pp. 23, 25).

Despite the claims of EVAAS, the academic research raises significant concerns about extrapolating results from different types of tests. In short, when you move to a different test, you get different value-added results. As I noted in 2015:
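To make that concrete, here’s a minimal simulation sketch of the problem. It is not EVAAS’s model and uses no real data; the teacher count, the 0.6 skill-overlap parameter, and all the names in it are hypothetical. It simply shows that when two tests share only part of what they measure, the “value-added” estimates they produce agree about as weakly as the MET correlations quoted above:

```python
import numpy as np

# Hypothetical illustration: "value-added" estimates for the same teachers,
# derived from two tests that only partially measure the same skills.
rng = np.random.default_rng(42)

n_teachers = 1000
true_effect = rng.normal(0.0, 1.0, n_teachers)  # the real, unobservable teacher effect

# Each test's estimate mixes the true effect with test-specific content and noise.
# overlap = 1.0 would mean both tests measure exactly the same thing; 0.6 is assumed.
overlap = 0.6
noise_scale = np.sqrt(1.0 - overlap**2)
est_test_a = overlap * true_effect + noise_scale * rng.normal(0.0, 1.0, n_teachers)
est_test_b = overlap * true_effect + noise_scale * rng.normal(0.0, 1.0, n_teachers)

# Agreement between the two sets of estimates. Expected value is overlap^2 = 0.36,
# in the neighborhood of the .38 / .22 correlations the MET study reported.
r = np.corrcoef(est_test_a, est_test_b)[0, 1]
print(f"correlation between the two tests' value-added estimates: {r:.2f}")

# How many teachers rated top-quintile by test A keep that rating under test B?
quintile = n_teachers // 5
top_a = set(np.argsort(est_test_a)[-quintile:])
top_b = set(np.argsort(est_test_b)[-quintile:])
print(f"top-quintile agreement: {len(top_a & top_b) / quintile:.0%}")
```

Under those made-up assumptions, much of a teacher’s ranking turns on which test was used rather than on anything the teacher did, which is Lockwood and McCaffrey’s point in miniature.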

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

Or, if the state is determined to use growth scores (and wants to use them with accuracy), they will wait several years and build completely new growth models based on TNReady alone. At least three years of data would be needed in order to build such a model.

Dorsey Hopson and other Directors of Schools should be pushing back aggressively. Educators should be outraged. After all, this unreliable data will be used as a portion of their teacher evaluations this year. Schools are being rated on a 1-5 scale based on a growth model grounded in suspect methods.

How much is this apple like last year’s orange? How much will this apple ever be like last year’s orange?

If we’re determined to use value-added modeling to measure school-wide growth or district performance, we should at least be determined to do it in a way that ensures valid, reliable results.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

The Data Wars: Herb Strikes Back

Yes, the Data Wars continue. Metro Nashville Public Schools (MNPS) gained new hope recently when 33 members of Nashville’s Metro Council penned a letter supporting resistance to the Achievement School District’s request for student data.

Now, Tennessee’s Attorney General has weighed in and says the alliance of MNPS and Shelby County must comply with the ASD’s request. What happens if they don’t? Nate Rau notes in the Tennessean:

McQueen’s warning leaves open the possibility the state would dock education dollars from Metro and Shelby schools if they continue to deny her request.

It wouldn’t be the first time for Nashville, as the Haslam administration withheld $3.4 million in state funds in 2012 after the school board refused to approve controversial Great Hearts charter school.

Withholding state BEP funds is a favorite “ultimate weapon,” used in the Great Hearts controversy and also threatened during the TNReady debacle in year one of that test that wasn’t.

During the debate that ultimately saw Nashville schools lose funds in a BEP penalty, Commissioner Kevin Huffman and the Department of Education had an ally in then-Nashville Mayor Karl Dean. Joey Garrison reported in the (now defunct) City Paper at the time:

By this point, Huffman had already facilitated a July 26 meeting to discuss Great Hearts’ next move, a gathering that took place just hours before Great Hearts’ revised application would go before the Metro board for second consideration. The meeting site: the office of Mayor Karl Dean, also a Great Hearts backer. In attendance, among others, were Huffman, Dean, Barbic, Deputy Mayor Greg Hinote, Great Hearts officials Dan Scoggin and Peter Bezanson, and Bill DeLoache, a wealthy Nashville investor and one of the state’s leading charter school proponents.

As Rau points out, the current controversy stems from a newly-passed state law giving charter schools the opportunity to request student data from district schools. It seems, however, that there is some dispute over the intent of that law. Rau explains:

Slatery’s opinion also said that the student data may be used for the ASD to promote its schools to prospective students. State Rep. John Forgety, who chairs a House education committee and supported the legislation, told The Tennessean the intent was not to create a law that allowed districts to market to each other’s students.

So it seems the legislature may need to revisit the issue to clear things up.

Also unclear: Where do the current candidates for Governor stand on protecting student data vs. providing marketing information to competing districts and schools?

Stay tuned for more. Will the Shelby-MNPS alliance continue their resistance? Will Commissioner McQueen unleash the power of BEP fund withholding? Will this issue end up in court?

For more on education politics and policy in Tennessee, follow @TNEdReport