Driving Teachers Crazy

State Representative Jeremy Faison of Cosby says the state’s teacher evaluation system, and especially the portion that relies on student scores on TNReady, is causing headaches for Tennessee’s teachers.

Faison made the remarks at a hearing of the House Government Operations Committee, which he chairs. The hearing featured teachers, administrators, and representatives from the Department of Education and Tennessee’s testing vendor, Questar.

Zach Vance of the Johnson City Press reports:

“What we’re doing is driving the teachers crazy. They’re scared to death to teach anything other than get prepared for this test. They’re not even enjoying life right now. They’re not even enjoying teaching because we’ve put so much emphasis on this evaluation,” Faison said.

Faison also said that if the Department of Education were getting ratings on a scale of 1 to 5, as teachers do under the state’s evaluation system (the TEAM model), there are a number of areas where the Department would receive a 1. Chief among them is communication:

“We’ve put an immense amount of pressure on my educators, and when I share with you what I think you’d get a one on, I’m speaking for the people of East Tennessee, the 11th House District, from what I’m hearing from 99.9 percent of my educators, my principal and my school superintendents.”

Rather frankly, Faison said both the state Department of Education and Questar should receive a one for their communication with local school districts regarding the standardized tests.

Faison’s concerns about the lack of communication from the TNDOE echo concerns expressed recently by Wilson County Director of Schools Donna Wright on a different issue. While discussing the state’s new A-F report card for rating schools, Wright said:

We have to find a way to take care of our kids and particularly when you have to look at kids in kindergarten, kids in the 504 plan and kids in IEP. When you ask the Department of Education right now, we’re not getting any answers.

As for including student test scores in teacher evaluations: currently, a system known as the Tennessee Value-Added Assessment System (TVAAS) is used to estimate the impact a teacher has on a student’s growth over the course of the year. At best, TVAAS is a very rough estimate of a fraction of a teacher’s impact. The American Statistical Association has noted that teachers account for between 1 and 14 percent of the variability in student test scores.
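To make that figure concrete, here is a rough simulation. It is purely illustrative, my own toy model rather than the ASA’s analysis or the actual TVAAS methodology, and it assumes teacher effects carry about 10 percent of score variance, the middle of the ASA’s range:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (an assumption, not the ASA's analysis): each student's
# score is a teacher effect plus everything else (family, peers,
# prior achievement, luck). Teacher effects are set to carry ~10%
# of total score variance, mid-range of the ASA's 1-14% figure.
n_teachers, class_size = 200, 25
teacher_effect = rng.normal(0.0, np.sqrt(0.10), size=n_teachers)
everything_else = rng.normal(0.0, np.sqrt(0.90), size=(n_teachers, class_size))
scores = teacher_effect[:, None] + everything_else

# Share of total score variance that sits *between* classrooms --
# the signal a value-added model is trying to isolate.
grand_mean = scores.mean()
between_ss = class_size * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
total_ss = ((scores - grand_mean) ** 2).sum()
share = between_ss / total_ss
print(f"Between-classroom share of variance: {share:.1%}")
```

Even this slightly overstates the teacher signal, since each classroom average carries sampling noise from only 25 students. The point is simply that, under these assumptions, the large majority of score variation lies outside any teacher’s control.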

Now, however, Tennessee is in the midst of a testing transition. While Commissioner Candice McQueen notes that value-added scores count for less in evaluations during the transition (15 percent this past year, 20 percent for the current year), why count any percentage of a flawed score? When changing tests, the value of TVAAS is particularly limited:

Here’s what Lockwood and McCaffrey (2007) had to say in the Journal of Educational Measurement:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
These findings align with similar findings by Martineau (2006) and Schmidt et al. (2005): you get different results depending on what the test questions measure.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

And they concluded:

Our results provide a clear example that caution is needed when interpreting estimated teacher effects because there is the potential for teacher performance to depend on the skills that are measured by the achievement tests.

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best.

After the meeting, Faison confirmed that legislation will be forthcoming that detaches TNReady data from teacher evaluation and student grades.

Faison’s move represents policy grounded in an acknowledgment that TNReady is in its early stages and that more years of data are needed to ensure a better performance estimate. Or, as one principal who testified before the committee said, there’s nothing wrong with taking the time to get this right.

For more on education politics and policy in Tennessee, follow @TNEdReport


 

A Lot of Words

The Murfreesboro City School Board has already expressed concern about the state’s TNReady tests and the delay in receiving results.

More recently, Board members expressed frustration with the response they received from Education Commissioner Candice McQueen.

The Murfreesboro Post reports:

“I felt like it was a lot of words for not really answering our questions,” said Board Member Jared Barrett. He referred to the response as having “excuses” and “dodging the question.”

“My first response when I read this letter was that there’s something in here that doesn’t add up,” said Board Member Phil King. “My fear is they haven’t solved the problem of getting the paper tests in our hands in a timely manner.”

King suggested moving away from using TNReady in teacher evaluations until the state can prove it can get results back to districts in a timely manner.

The Murfreesboro School Board meeting happened before the most recent round of TNReady troubles, in which some students received incorrect scores and some teachers did not have students properly counted in their TVAAS scores.

In response to those issues, House Speaker Beth Harwell has called for hearings on the issue of state testing.

Additionally, yesterday, the United Education Association of Shelby County called for TNReady scores for this year to be invalidated and for a moratorium on including TNReady scores in accountability measures until 2021.



 

 

Dear Educator

The Tennessee Department of Education explains the case of the missing students as some 900 teachers see their TVAAS scores recalculated.

Here’s the email those educators were sent:

Dear Educator,

We wanted to share an update with you regarding your individual TVAAS data.

The department has processed about 1.5 million records to generate individual TVAAS scores for nearly 19,000 educators based on the assessment results from over 1.9 million student tests in grades 2-8 and high school. During the review process with districts, we found that a small number of educators did not have all of their teacher-student claiming linkage records fully processed in data files released in early September. All linkage data that was captured in EdTools directly was fully incorporated as expected. However, due to a coding error in their software, our data processing vendor, RANDA Solutions, did not fully apply the linkage information that districts provided in supplemental Excel files over the summer. As a result, we are working with Randa to ensure that this additional data is included in final TVAAS processing.

 

You have been identified as an educator with some linkage data submitted via an Excel file that was not fully processed. This means after our statistical analysis vendor, SAS, receives these additional linkage records your score may be revised to reflect all the students you identified in the teacher-student claiming process. Only students marked “F” for instructional availability are used when calculating individual TVAAS data. Based on our records, there will be [X] additional students marked “F” for instructional availability linked to you when the additional data is incorporated.

 

Your district’s and school’s TVAAS scores are not affected by this situation given that all students are included in these metrics, regardless of which teacher is linked to them, so no other part of your evaluation composite would change. Moreover, only those teachers with this additional linkage data in Excel files are impacted, so the vast majority of your colleagues across the state have their final individual TVAAS composites, which are inclusive of all student data.

 

We expect to share your final growth score and overall level of effectiveness later this year. While we do not have more specific timing to share right now, we are expediting this process with our vendors to get you accurate feedback. We will follow-up with more detailed information in the next couple of weeks. Also, as announced to districts earlier this month, the department and your districts will be using new systems and processes this year that will ensure that this type of oversight does not happen again.

 

Thank you for your patience as we work to share complete and accurate feedback for you. We deeply value each Tennessee educator and apologize for this delay in providing your final TVAAS results. Please contact our office via the email address below if you have any questions.

 

Respectfully,

 

Office of Assessment Logistics

Tennessee Department of Education

A few things stand out about this communication:

  1. Tennessee continues to experience challenges with the rollout of TNReady. That’s to be expected, but it raises the question: Why are we rushing this? Why not take some time, hit pause, and get this right?
  2. The Department says, “Thank you for your patience as we work to share complete and accurate feedback for you.” If accurate feedback were important, the state would take the time to build a value-added data set based on TNReady. This would take three to five years, but would improve the accuracy of the information provided to educators. As it stands, the state is comparing apples to oranges and generating value-added scores of little real value.
  3. On the topic of value-added data generally, it is important to note that even with a complete data set, TVAAS data is of limited value in terms of evaluating teacher effectiveness. A recent federal lawsuit settlement in Houston ended the use of value-added data for teacher evaluation there. Additionally, a judge in New York ruled the use of value-added data in teacher evaluation was “arbitrary and capricious.”
  4. When will teachers have access to this less-than-accurate data? Here’s what the TDOE says: “We expect to share your final growth score and overall level of effectiveness later this year. While we do not have more specific timing to share right now, we are expediting this process with our vendors to get you accurate feedback.” Maybe they aren’t setting a clear deadline because they have a track record of missing deadlines?
  5. It’s amazing to me that a teacher’s “overall level of effectiveness” can only be determined once TVAAS data is included in their evaluation score. It’s as if there’s no other way to determine an overall level of a teacher’s effectiveness. Not through principal observation. Not through analysis of data points on student progress taken throughout the year. Not through robust peer-evaluation systems.
  6. Let’s assume for a moment that the “level of effectiveness” indicator is useful for teacher development. Providing that score “later” is not exactly helpful. Ideally, actionable insight would be provided to a teacher and his/her administrators near the end of a school year. This would allow for targeted professional development to address areas that need improvement. Of course, this assumes targeted PD is even available.
  7. Accountability. This is the latest in a series of mishaps related to the new testing regimen known as TNReady. Teachers are held accountable through their evaluation scores, and in some districts, their pay is tied to those scores. Schools and districts are held accountable for growth and achievement scores and must develop School Improvement Plans to target areas of weakness. On the other hand, the Department of Education continues to make mistakes in the TNReady transition and no one is held accountable.

The email to impacted teachers goes to great lengths to establish the enormous scope of the TNReady transition. Lots of tests, lots of students, not too many mistakes. If this were the only error so far in the TNReady process, all could be forgiven. Instead, it is the latest in a long line of bumps. Perhaps it will all smooth out in time, but that pattern only makes the case for hitting pause all the stronger.



 

Muddy Waters

Laura Faith Kebede of Chalkbeat reports on the challenges in generating reliable TVAAS scores as a result of TNReady trouble last year. Her story cites a statistician from the Center for Assessment who explains the issue this way:

Damian Betebenner, a senior associate at Center for Assessment that regularly consults with state departments, said missing data on top of a testing transition “muddies the water” on results.

“When you look at growth over two years, so how much the student grew from third to fifth grade, then it’s probably going to be a meaningful quantity,” he said. “But to then assert that it isolates the school contribution becomes a pretty tenuous assertion… It adds another thing that’s changing underneath the scene.”

In other words, it’s difficult to get a meaningful result given the current state of testing in Tennessee. I wrote recently about this very issue and the problem with the validity of the growth scores this year.

Additionally, two years ago, I pointed out the challenges the state would face when shifting to a new test. Keep in mind, this was before all the TNReady trouble that further muddied the waters. Here’s what I said in March of 2015:

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format. It’s both computer-based and it contains constructed-response questions. That is, students must write out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

The way to address this issue? Build multiple years of data in order to obtain reliable results:

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

So, now we have two challenges: We have two different types of tests AND we have a missing year of data. Either one of these challenges creates statistical problems. The combination of the two calls for a serious reset of the state’s approach to accountability.

As I suggested yesterday, taking the time to get this right would mean not using the TNReady data for accountability for teachers, students, or schools until 2019 at the earliest. If our state is committed to TNReady, we should be committed to getting it right. We’re spending a lot of money on both TNReady and on TVAAS. If we’re going to invest in these approaches, we should also take the time to be sure that investment yields useful, reliable information.

Why does any of this matter? Because, as Kebede points out:

At the same time, TVAAS scores for struggling schools will be a significant factor to determine which improvement tracks they will be placed on under the state’s new accountability system as outlined in its plan to comply with the federal Every Student Succeeds Act. For some schools, their TVAAS score will be the difference between continuing under a local intervention model or being eligible to enter the state-run Achievement School District. The school growth scores will also determine which charter schools are eligible for a new pot of state money for facilities.

TVAAS scores also count in teacher evaluations. TNReady scores were expected to count in student grades until the quick scores weren’t back in time. If all goes well with the online administration of TNReady this year, the scores will count for students.

The state says TNReady matters. The state evaluates schools based on TVAAS scores. The state teacher evaluation formula includes TVAAS scores for teachers and TNReady scores as one measure of achievement that can be selected.

In short: Getting this right matters.



 

Apples and Oranges

Here’s what Director of Schools Dorsey Hopson had to say amid reports that schools in his Shelby County district showed low growth according to recently released state test data:

Hopson acknowledged concerns over how the state compares results from “two very different tests which clearly are apples and oranges,” but he added that the district won’t use that as an excuse.

“Notwithstanding those questions, it’s the system upon which we’re evaluated on and judged,” he said.

State officials stand by TVAAS. They say drops in proficiency rates resulting from a harder test have no impact on the ability of teachers, schools and districts to earn strong TVAAS scores, since all students are experiencing the same change.

That’s all well and good, except when the system upon which you are evaluated is seriously flawed, it seems there’s an obligation to speak out and fight back.

Two years ago, ahead of what should have been the first year of TNReady, I wrote about the challenges of creating valid TVAAS scores while transitioning to a new test. TNReady was not just a different test; it was (and is) a different type of test than the previous TCAP. For example, it included constructed-response questions instead of simply multiple-choice, bubble-in questions.

Here’s what I wrote:

Here’s the problem: There is no statistically valid way to predict expected growth on a new test based on the historic results of TCAP. First, the new test has (supposedly) not been fully designed. Second, the test is in a different format. It’s both computer-based and it contains constructed-response questions. That is, students must write out answers and/or demonstrate their work.

Since Tennessee has never had a test like this, it’s impossible to predict growth at all. Not even with 10% confidence. Not with any confidence. It is the textbook definition of comparing apples to oranges.

Here’s a statement from the academic article I cited to support this claim:

Here’s what Lockwood and McCaffrey (2007) had to say in the Journal of Educational Measurement:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers.
You get different value-added results depending on the type of test you use. That is, you can’t simply declare TNReady a new test, compare peer groups from the old test, and see what happens. Plus, TNReady presents the added challenge of not having been fully administered last year, so you’re now looking at data from two years ago and extrapolating to this year’s results.

Of course, the company paid millions to crunch the TVAAS numbers says that this transition presents no problem at all. Here’s what their technical document has to say about the matter:

In 2015-16, Tennessee implemented new End-of-Course (EOC) assessments in math and English/language arts. Redesigned assessments in math and English/language arts were also implemented in grades 3-8 during the 2016-17 school year. Changes in testing regimes occur at regular intervals within any state, and these changes need not disrupt the continuity and use of value-added reporting by educators and policymakers. Based on twenty years of experience with providing value-added and growth reporting to Tennessee educators, EVAAS has developed several ways to accommodate changes in testing regimes.

Prior to any value-added analyses with new tests, EVAAS verifies that the test’s scaling properties are suitable for such reporting. In addition to the criteria listed above, EVAAS verifies that the new test is related to the old test to ensure that the comparison from one year to the next is statistically reliable. Perfect correlation is not required, but there should be a strong relationship between the new test and old test. For example, a new Algebra I exam should be correlated to previous math scores in grades seven and eight and to a lesser extent other grades and subjects such as English/language arts and science. Once suitability of any new assessment has been confirmed, it is possible to use both the historical testing data and the new testing data to avoid any breaks or delays in value-added reporting.

A couple of problems with this. First, there was NO complete administration of a new testing regime in 2015-16. It didn’t happen.

Second, EVAAS doesn’t get paid if there’s not a way to generate these “growth scores,” so it is in the company’s interest to find some justification for comparing the two very different tests.

Third, researchers who study value-added modeling are highly skeptical of the reliability of comparisons between different types of tests when it comes to generating value-added scores. I noted Lockwood and McCaffrey (2007) above. Here are some more:

John Papay (2011) did a similar study using three different reading tests, with similar results. He stated his conclusion as follows: [T]he correlations between teacher value-added estimates derived from three separate reading tests — the state test, SRI [Scholastic Reading Inventory], and SAT [Stanford Achievement Test] — range from 0.15 to 0.58 across a wide range of model specifications. Although these correlations are moderately high, these assessments produce substantially different answers about individual teacher performance and do not rank individual teachers consistently. Even using the same test but varying the timing of the baseline and outcome measure introduces a great deal of instability to teacher rankings.

Two points worth noting here: First, different tests yield different value-added scores. Second, even using the same test but varying the timing can create instability in growth measures.

Then, there’s data from the Measures of Effective Teaching (MET) Project, which included data from Memphis. In terms of reliability when using value-added among different types of tests, here’s what MET reported:

Once more, the MET study offered corroborating evidence. The correlation between value-added scores based on two different mathematics tests given to the same students the same year was only .38. For two different reading tests, the correlation was .22 (the MET Project, 2010, pp. 23, 25).

Despite the claims of EVAAS, the academic research raises significant concerns about extrapolating results from different types of tests. In short, when you move to a different test, you get different value-added results. As I noted in 2015:

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable. As more years of data become available, it may be possible to make some correlation between past TCAP results and TNReady scores.

Or, if the state is determined to use growth scores (and wants to use them with accuracy), it should wait several years and build completely new growth models based on TNReady alone. At least three years of data would be needed to build such a model.
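For a sense of what correlations like those reported above mean in practice, here is a quick simulation. It is illustrative only; the teacher pool, the quintile framing, and the bivariate-normal setup are my assumptions, not MET’s methodology. It draws paired value-added estimates correlated at roughly the MET mathematics figure and checks how often two tests agree on a teacher’s ranking:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000  # hypothetical teacher pool

# Paired value-added estimates from two different tests, correlated
# at ~0.38 (the MET math-test figure quoted above; the rest of the
# setup is an illustrative assumption).
r = 0.38
cov = [[1.0, r], [r, 1.0]]
estimates = rng.multivariate_normal([0.0, 0.0], cov, size=n_teachers)

def quintile(x: np.ndarray) -> np.ndarray:
    """Rank scores and bucket them into quintiles 1 (bottom) to 5 (top)."""
    ranks = x.argsort().argsort()  # 0-based rank of each teacher
    return ranks * 5 // len(x) + 1

q_test_a = quintile(estimates[:, 0])
q_test_b = quintile(estimates[:, 1])

same = (q_test_a == q_test_b).mean()
far_apart = (np.abs(q_test_a - q_test_b) >= 2).mean()
print(f"Same quintile on both tests:  {same:.0%}")
print(f"Two or more quintiles apart:  {far_apart:.0%}")
```

Under these assumptions, most teachers do not land in the same quintile on both tests, and a sizable share move two or more quintiles. That is the ranking instability Papay and the MET project describe, and it is the instability a 1-5 school or teacher rating inherits when the underlying test changes.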

Dorsey Hopson and other Directors of Schools should be pushing back aggressively. Educators should be outraged. After all, this unreliable data will be used as a portion of their teacher evaluations this year. Schools are being rated on a 1-5 scale based on a growth model grounded in suspect methods.

How much is this apple like last year’s orange? How much will this apple ever be like last year’s orange?

If we’re determined to use value-added modeling to measure school-wide growth or district performance, we should at least be determined to do it in a way that ensures valid, reliable results.



 

It Doesn’t Matter Except When It Does

This year’s TNReady quick score setback means some districts will use the results in student report cards and some won’t. Of course, that’s nobody’s fault. 

One interesting note out of all of this came as Commissioner McQueen noted that quick scores aren’t what really matters anyway. Chalkbeat reports:

The commissioner emphasized that the data that matters most is not the preliminary data but the final score reports, which are scheduled for release in July for high schools and the fall for grades 3-8. Those scores are factored into teachers’ evaluations and are also used to measure the effectiveness of schools and districts.

“Not until you get the score report will you have the full context of a student’s performance level and strengths and weaknesses in relation to the standards,” she said.

The early data matters to districts, though, since Tennessee has tied the scores to student grades since 2011.

First, tying the quick scores to student grades is problematic. Assuming TNReady is a good, reliable test, we’d want the best results to be used in any grade calculation. Using pencil and paper this year makes that impossible. Even when we switch to a test fully administered online, it may not be possible to get the full scores back in time to use those in student grades.

Shifting to a model that uses TNReady to inform and diagnose rather than evaluate students and teachers could help address this issue. Shifting further to a project-based assessment model could actually help students while also serving as a more accurate indicator of whether they have met the standards.

Next, the story notes that teachers will be evaluated based on the scores. This will be done via TVAAS — the state’s value-added modeling system. Even as more states move away from value-added models in teacher evaluation, Tennessee continues to insist on using this flawed model.

Again, let’s assume TNReady is an amazing test that truly measures student mastery of standards. It’s still NOT designed for the purpose of evaluating teacher performance. Further, this is the first year the test has been administered. That means it’s simply not possible to generate valid data on teacher performance from this year’s results. You can’t just take this year’s test (TNReady) and compare it to the TCAP from two years ago. They are different tests designed to measure different standards in a different way. You know, the old apples and oranges thing.

One teacher had this to say about the situation:

“There’s so much time and stress on students, and here again it’s not ready,” said Tikeila Rucker, a Memphis teacher who is president of the United Education Association of Shelby County.



 

It May Be Ready, But is it Valid?

In today’s edition of Commissioner Candice McQueen’s Educator Update, she talks about pending legislation addressing teacher evaluation and TNReady.

Here’s what McQueen has to say about the issue:

As we continue to support students and educators in the transition to TNReady, the department has proposed legislation (HB 309) that lessens the impact of state test results on students’ grades and teachers’ evaluations this year.

In 2015, the Tennessee Teaching Evaluation Enhancement Act created a phase-in of TNReady in evaluation to acknowledge the state’s move to a new assessment that is fully aligned to Tennessee state standards with new types of test questions. Under the current law, TNReady data would be weighted at 20 percent for the 2016-17 year.

However, in the spirit of the original bill, the department’s new legislation resets the phase-in of growth scores from TNReady assessments as was originally proposed in the Tennessee Teaching Evaluation Enhancement Act. Additionally, moving forward, the most recent year’s growth score will be used for a teacher’s entire growth component if such use results in a higher evaluation score for the teacher.

We will update you as this bill moves through the legislative process, and if signed into law, we will share detailed guidance that includes the specific options available for educators this year. As we announced last year, if a teacher’s 2015-16 individual growth data ever negatively impacts his or her overall evaluation, it will be excluded. Additionally, as noted above, teachers will be able to use 2016-17 growth data as 35 percent of their evaluation if it results in a higher overall level of effectiveness.

And here’s a handy graphic that describes the change:

TNReady Graphic

 

 

Of course, there’s a problem with all of this: There’s not going to be valid data to use for TVAAS. Not this year. It’s bad enough that the state is transitioning from one type of test to another. That alone would call into question the validity of any comparison used to generate a value-added score. Now, there’s a gap in the data. As you might recall, there wasn’t a complete TNReady test last year. So, to generate a TVAAS score, the state will have to compare 2014-15 data from the old TCAP tests to 2016-17 data from what we hope is a sound administration of TNReady.

We really need at least three years of data from the new test to make anything approaching a valid comparison. Or, we should start over building a data-set with this year as the baseline. Better yet, we could go the way of Hawaii and Oklahoma and just scrap the use of value-added scores altogether.

Even in the best of scenarios — a smooth transition from TCAP to TNReady — data validity was going to be a challenge.

As I noted when the issue of testing transition first came up:

Here’s what Lockwood and McCaffrey (2007) had to say in the Journal of Educational Measurement:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
These findings align with similar findings by Martineau (2006) and Schmidt et al. (2005): you get different results depending on what the test questions measure.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

And they concluded:

Our results provide a clear example that caution is needed when interpreting estimated teacher effects because there is the potential for teacher performance to depend on the skills that are measured by the achievement tests.

If you measure different skills, you get different results. That decreases (or eliminates) the reliability of those results. TNReady is measuring different skills in a different format than TCAP. It’s BOTH a different type of test AND a test on different standards. Any value-added comparison between the two tests is statistically suspect, at best. In the first year, such a comparison is invalid and unreliable.

So, we’re transitioning from TCAP to TNReady AND we have a gap in years of data. That’s especially problematic, but not problematic enough to keep the Department of Education from plowing ahead (and patting itself on the back) with a scheme that relies on results sure to be invalid.

For more on education politics and policy in Tennessee, follow @TNEdReport

Knox County Takes a Stand

Last night, the Knox County School Board voted 6-3 in favor of a resolution calling on the General Assembly and State Board of Education to waive the use of TCAP/TNReady data in student grades and teacher evaluations this year.

The move comes as the state prepares to administer the tests this year with a new vendor following last year’s TNReady disaster. The lack of a complete testing cycle last year plus the addition of a new vendor means this year is the first year of the new test.

The Board passed the resolution in spite of Governor Haslam’s warning against taking such a step.

In his warning, Haslam said:

“The results we’ve seen are not by accident in Tennessee, and I think you have to be really careful about doing anything that could cause that to back up,” Haslam said.

Haslam attributed that progress to three things, including tying standardized tests to teacher evaluations.

“It’s about raising our standards and expectations, it’s about having year-end assessments that match those standards and then I think it’s about having assessments that are part of teachers’ evaluations,” Haslam said. “I think that you have to have all of those for a recipe for success.”

Haslam can present no evidence for his claim about the use of student assessment in teacher evaluation. In fact, it’s worth noting that prior to 2008, Tennessee students achieved at a high level according to what were then the state standards. While the standards themselves were later determined to need improvement, the point is that teachers were helping students hit the designated mark.

Teachers were moving students forward at this time without evaluations tied to student test results. Policymakers set a mark for student performance; teachers worked to hit that mark and succeeded. Standards were raised in 2008, and since then, Tennessee has seen detectable growth in overall results, including some exciting news when NAEP results are released.

To suggest that a year without the use of TVAAS scores in teacher evaluations will cause a setback is to insult Tennessee’s teachers. As if they’ll just relax and not teach as hard.

Another argument raised against the resolution is that it will somehow absolve teachers and students of accountability.

Joe Sullivan reports in the Knoxville Mercury:

In an email to board members, [Interim Director of Schools Buzz] Thomas asserted that, “We need a good standardized test each year to tell us how we are doing compared to others across the state and the nation. We will achieve greatness not by shying away from this accountability but by embracing it.” And he fretted that, “This resolution puts that at risk. In short, it will divide us. Once again we could find ourselves in two disputing camps. The pro-achievement folks on the one side and the pro-teacher folks on the other.”

Right now, we don’t know if we have a good standardized test. Taking a year to get it right is important, especially in light of the frustrations of last year’s TNReady experience.

Of course, there’s no need for pro-achievement and pro-teacher folks to be divided into two camps, either. Tennessee can have a good, solid test that is an accurate measure of student achievement and also treat teachers fairly in the evaluation process.

To be clear, teachers aren’t asking for a waiver from all evaluation. They are asking for a fair, transparent evaluation system. TVAAS has long been criticized as neither. Even under the best of circumstances, TVAAS provides a minimal level of useful information about teacher performance.

Now, we’re shifting to a new test. That shift alone makes it impossible to achieve a valid value-added score. In fact, researchers in the Journal of Educational Measurement have said:

We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
These findings align with similar findings by Martineau (2006) and Schmidt et al. (2005): you get different results depending on the type of question you’re measuring.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

Changing to a new type of test creates value-added uncertainty. That means teacher effect estimates based on a comparison of this year’s tests and the old tests will not be valid.

While insisting that districts use TVAAS in teacher evaluations this year, the state is also admitting it’s not quite sure how that will work.

From Sullivan’s story:

When asked how these determinations will be made, a spokesperson for the state Department of Education acknowledges that a different methodology will have to be employed and says that, “we are still working with various statisticians and experts to determine the exact methodology we will use this year.”

Why not take at least a year, be sure there’s a test that works, and then build a model based on that? What harm would come from giving teachers and students a year with a test that’s just a test? Moreover, the best education researchers have already warned that testing transitions create value-added bumps. Why not avoid the bumps and work to create an evaluation system that is fair and transparent?

Knox County has taken a stand. We’ll soon see if others follow suit. And if the state is listening.

For more on education politics and policy in Tennessee, follow @TNEdReport



Ready for a Fight

Yesterday, Williamson County Director of Schools Mike Looney issued a statement saying his district would not administer the high school end-of-course tests, in addition to the already-suspended grades 3-8 TNReady tests.

Commissioner McQueen is not very happy about that. She served notice to Looney and all other directors that refusing to administer the EOC would be considered a violation of state law.

Here’s the email she sent to Directors of Schools:

First, I want to thank you for your partnership and support as we have worked together to implement and administer the first year of a new assessment. I know you share my disappointment and frustration with the inability of our vendor to deliver on this higher quality assessment in grades 3-8, and I truly appreciate your patience and leadership.


I want to reiterate that the state’s termination of its contract with the testing vendor Measurement Incorporated (MI) and the related suspension of grades 3-8 testing does not apply to high school and End of Course (EOC) exams, and, therefore, all school districts are required to administer these assessments.


The state of Tennessee and local districts are under an obligation under both federal and state law, as well as state board of education rules and regulations, to administer annual assessments to our students. My decision to suspend grade 3-8 testing was based on the impossibility of testing and made in close consultation with the U.S. Department of Education (USDOE). Based on the fact that testing in grades 3-8 was not feasible due to the failure of MI to meet its contractual obligations, the USDOE has acknowledged that the department made a good faith effort to administer the assessments to all students in grades 3-8. Unlike grades 3-8, districts are in receipt of EOC exams and the challenges associated with the delivery of grades 3-8 do not exist.


Because EOC exams have been delivered, students should have the opportunity to show what they know to measure their progress toward postsecondary and the workforce. Failure to administer the high school assessments will adversely impact students who will not only lose the experience of an improved, high quality test aligned to our higher standards but also the information we plan to provide to students, parents and educators relative to student performance. In addition, districts will eliminate the option for their teachers to use this year’s student achievement data as part of their teacher evaluation if the data results in a higher score.


Because of these factors and because state or district action to cancel high school testing would willfully violate the laws that have been set forth relative to state assessment, neither the state nor districts have the authority to cancel EOC exams. Districts that have taken action to cancel EOC exams or communicated such action are in violation of the law and should rescind this action or communication.

What Does This Mean?

When the Murfreesboro City School Board considered refusing to administer Phase II of TNReady, the Department of Education issued a statement noting that doing so would be considered a major violation of state law and that withholding state funds was a possible penalty.

McQueen doesn’t say what the penalty would be if districts like Williamson proceed with their refusal to administer the EOCs, but she may well attempt to impose a financial penalty.

In her email, McQueen says:

Failure to administer the high school assessments will adversely impact students who will not only lose the experience of an improved, high quality test aligned to our higher standards but also the information we plan to provide to students, parents and educators relative to student performance.

Just what students want and need: Another test. Some have proposed using the ACT battery of tests as the high school testing measure rather than the current EOC structure.

McQueen also says:

In addition, districts will eliminate the option for their teachers to use this year’s student achievement data as part of their teacher evaluation if the data results in a higher score. 

While the idea of flexibility seems nice, I want to reiterate that any data gleaned from this year’s test is invalid as a value-added indicator of teacher performance. As such, there’s no useful information to be gained relative to teacher performance from this year’s EOCs. Put another way, McQueen’s argument about depriving teachers of an opportunity is invalid.

While the use of value-added data to assess teacher performance is of limited usefulness under optimum conditions, under this year’s transition, it is clearly and plainly invalid. If the goal of using such data is to improve teacher performance, why use data that yields essentially no information?

I have not yet seen a response from Dr. Looney or any other directors. But a fight could be brewing.

For more on education politics and policy in Tennessee, follow @TNEdReport


Ready to Refuse

As Tennessee schools prepare for Phase II of TNReady, the Department of Education has sent districts a memo outlining how they should handle students who refuse or attempt to “opt-out” of the test.

The general gist, according to reporting by Grace Tatter, is that you can’t opt out or refuse. She reports:

District leaders received a memo last week instructing schools to “address student absences on testing days in the same manner as they would address a student’s failure to participate in any other mandatory activity at school (e.g. final exams) by applying the district’s or school’s attendance policies.”

The memo specifically notes:

 “State and federal law also requires student participation in state assessments. In fact, these statutes specifically reference the expectation that all students enrolled in public schools in Tennessee will complete annual assessments.”

That’s not entirely true.

Federal law, even with the newly passed Every Student Succeeds Act (ESSA), requires states to administer annual assessments in grades 3-8 and at least once in high school.

But there’s a difference between requiring a state to administer an assessment and requiring a student to complete it. Federal law requires administration of the test, but does not compel students to complete the exams.

Then, there is state law. The memo lacks specific references to Tennessee statute, but there are a few sections that relate to testing.

TCA 49-1-6 includes references to performance assessment and the Tennessee Value-Added Assessment System (TVAAS). This portion of state law says that annual assessments will be administered in grades 3-8 and then outlines the secondary school testing schedule. Here again, the law notes tests will be administered, but contains no compulsory language for students.

Then there’s TCA 49-6-60 dealing with proficiency testing. This section specifically details testing to be administered in grades 8, 10, and 11 as a strategy to promote college readiness. As these three tests are required for graduation, they are essentially mandated. Students who don’t take them won’t complete the graduation requirements.

What’s missing? Language that compels a student to take the test or requires a district to compel students to take the test. The memo says that “state and federal” statutes specifically reference the expectation that students will complete the assessment. True, TVAAS and other accountability measures are made valid by significant student participation in state tests. But, that alone doesn’t make them compulsory. Unless it’s one of the three proficiency tests specifically referenced in the graduation requirements section, there’s no language directly compelling students to participate in annual assessments.

It’s worth noting that while the Department of Education has said there would be penalties if districts refused to administer the TNReady tests, the memo says districts are not authorized to allow “opting-out” or test refusal. What it doesn’t say is what impact allowing opt-out would have on the district. If a district offers the test, and students refuse, then what?

Stay tuned as Phase II starts later this month.

For more on education politics and policy in Tennessee, follow @TNEdReport