A Matter of Fairness

A coalition of education advocacy groups released an online petition today calling for a one-year waiver from using student test scores in teacher evaluations in Tennessee.

Here’s the press release:

A coalition of groups supporting public education today launched an online petition asking the Tennessee General Assembly and Governor Bill Haslam to grant teachers a grace period from the use of student test scores in their evaluations in the first year of new TNReady tests. The petition tracks language adopted unanimously by the Knox County School Board, which passed a resolution last week opposing the use of student test scores in teacher evaluation for this academic year.

“The state has granted waivers so that TNReady scores aren’t required to be counted in student grades for this year,” said Lyn Hoyt, president of Tennesseans Reclaiming Educational Excellence (TREE). “If TNReady won’t count in student grades, it’s only fair that it shouldn’t count for teacher evaluation.” Hoyt noted that the transition to the new test means entering uncharted territory in terms of student scores and impact on teacher evaluation scores. As such, she said, there should be a grace period of one year or more to allow for adjustment to the new testing regime.

“TNReady is different than the standardized tests we’ve had in the past,” Hoyt said. “Our students and teachers both deserve a reasonable transition period. We support the Knox County resolution and we are calling on the General Assembly to take notice and take action. Taking a thoughtful path transitioning to the new test can also build confidence and trust in the process.”

Hoyt also cited a recent policy statement by the American Educational Research Association that cautions against using value-added data in teacher evaluations and for high-stakes purposes. “Researchers who study value-added data are urging states to be cautious in how it is used to evaluate teachers,” Hoyt said. “The transition to TNReady is the perfect time to take a closer look at how test scores are used in teacher evaluations. Let’s take a year off, and give our students and teachers time to adjust. It’s a matter of fundamental fairness.”

Groups supporting the petition include:

Strong Schools (Sumner County)
Williamson Strong (Williamson County)
SPEAK (Students, Parents, Educators Across Knox County)
SOCM (Statewide Organizing for Community eMpowerment)
Middle TN CAPE (Coalition Advocating for Public Education)
Momma Bears Blog
Advocates for Change in Education (Hamilton County)
Concerned Parents of Franklin County (Franklin County)
Parents of Wilson County, TN, Schools
Friends of Oak Ridge Schools (City of Oak Ridge Schools)
TNBATs (State branch of National BATs)
TREE (Tennesseans Reclaiming Educational Excellence)
TEA (Tennessee Education Association)

For more on education politics and policy in Tennessee, follow @TNEdReport

New and Not Ready

Connie Kirby and Carol Bomar-Nelson, English teachers at Warren County High School, share their frustration with the transition to TNReady and what it means for teacher evaluation.

Connie Kirby:

This is going to be long, but I don’t usually take to social media to “air my grievances.” Today I feel like there’s no better answer than to share how I feel. It’s been a long year with some of the highest of the highs and lowest of the lows. I work in a wonderful department at a great school with some of the most intelligent, hard-working people I know. As the years have progressed, we have gone through many changes together and supported each other through the good and the bad (personally and professionally).

We do our best to “comply” with the demands that the state has put on us, but this year everything that we’ve been hearing about and preparing for for years has come to fruition. We’re finally getting familiar with the “real deal” test, instead of dealing with EOCs and wondering how it’s going to change. I’ve seen the posts and rants about Common Core and have refrained from jumping on the bandwagon because I have had no issues with the new standards. I do, however, see an issue with the new assessment, so I have held my hand in the hopes that I might find something worth sharing and putting my name next to.

Today, I witnessed an exchange between one of my colleagues and the state, and I couldn’t have said it better myself. With her permission, I am sharing her words.

Carol Bomar-Nelson:

I don’t know how to fix the problems with the test. I agree that teachers should have accountability, and I think student test scores are one way of doing that. Having said that, if the state is going to hold teachers accountable for student test scores, then the test needs to be fair. From what I have seen, I firmly believe that is not the case. I am not just basing this conclusion on the one “Informational Test” in MICA. Other quizzes I have generated in MICA have had similar flaws. When my department and I design common assessments in our PLCs, we all take the tests and compare answers to see which questions are perhaps ambiguous or fallacious in some way. I do not see any evidence that the state is doing this for the tests that it is manufacturing. A team of people can make a test that is perfect with respect to having good distractors, clear wording, complex passages, and all the other components that make up a “good” test, but until several people take the test, compare answers, and discuss what they missed, that test is not ready for students to take, especially not on a high-stakes test that is supposed to measure teacher effectiveness.

I understand that this is the first year of this test. I am sympathetic to the fact that everyone is going through a ‘learning process’ as they adapt to the new test. Students have to learn how to use the technology; teachers have to learn how to prepare their students for a new type of test; administrators have to figure out how to administer the test; the state has to work out the kinks in the test itself… The state is asking everyone to be “patient” with the new system. But what about the teachers? Yes, the teacher effectiveness data only counts for 10% this year, but that 10% still represents how I am as a teacher. In essence, this new test is like a pretest, correct? A pretest to get a benchmark about where students stand at the end of the year with this new test that has so many flaws and so many unknowns. In the teaching profession, I think all would agree that it is bad practice to count a pretest AT ALL for a student’s grade. Not 35%, not 25%, not even 10%. So how is it acceptable practice to count a flawed test for 10% of a teacher’s evaluation?

We can quibble all day about which practice questions…are good and which questions are flawed, but that will not fix the problem. The problem lies in the test development process. If the practice questions go through the same process as the real questions, it would stand to reason that the real test questions are just as flawed as the practice questions. My students have to take that test; I never get to see it to determine if it is a fair test or not, and yet it still counts as 10% of my evaluation that shows my effectiveness as a teacher. How is that fair in any way whatsoever? In what other profession are people evaluated on something that they never get to see? Especially when that evaluation ‘tool’ is new and not ready for use?

I know how to select complex texts. I know how to collaborate with my PLC. I can teach my students how to read, think critically, analyze, and write. When I do not know how to do something, I have no problem asking other teachers or administrators for suggestions, advice, and help. I am managing all of the things that are in my control to give my students the best possible education. Yet in the midst of all of these things, my teacher accountability is coming from a test that is generated by people who have no one holding them accountable. And at the end of the year, when those scores come back to me, I have no way to see the test to analyze its validity and object if it is flawed.

For more on education politics and policy in Tennessee, follow @TNEdReport

Not Yet Ready for Teacher Evaluation?

Last night, the Knox County Board of Education passed a resolution asking the state not to count this year’s new TNReady test in teacher evaluation.

Board members cited the grace period the state is granting to students as one reason for the request. While standardized test scores ordinarily count toward student grades, the state has granted a waiver of that requirement for the first year of the new test.

However, no such waiver was granted for teachers, who are evaluated using student test scores and a metric known as value-added modeling that purports to reflect student growth.
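
For readers unfamiliar with the term, the generic value-added idea can be sketched in a few lines. This is a deliberately simplified illustration with made-up numbers, not the proprietary TVAAS formula:

```python
import numpy as np

# The generic value-added idea (a sketch, not the actual TVAAS model):
# predict each student's score from prior performance, then credit the
# teacher with the average amount by which students beat the prediction.
prior = np.array([48.0, 55.0, 61.0, 70.0])    # hypothetical prior-year scores
actual = np.array([52.0, 54.0, 66.0, 75.0])   # hypothetical current-year scores

predicted = prior + 2.0   # assume the statewide model expects ~2 points of growth
teacher_effect = (actual - predicted).mean()
print(f"estimated teacher effect: {teacher_effect:+.1f} points")  # +1.2 for these made-up numbers
```

Everything contested about TVAAS lives in the “predicted” step: the real model is far more elaborate, and modeling choices there can move a teacher’s number substantially.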

Instead, the Department of Education proposed and the legislature supported a plan to phase in TNReady scores in teacher evaluations. This plan presents problems in terms of statistical validity.

Additionally, the American Educational Research Association released a statement recently cautioning states against using value-added models in high-stakes decisions involving teachers:

In a statement released today, the American Educational Research Association (AERA) advises those using or considering use of value-added models (VAM) about the scientific and technical limitations of these measures for evaluating educators and programs that prepare teachers. The statement, approved by AERA Council, cautions against the use of VAM for high-stakes decisions regarding educators.

So, regardless of the phase-in of TNReady, value-added models for evaluating teachers are problematic. When you add the transition to a new test to the mix, you only compound the existing problems, making any “score” assigned to a teacher even more unreliable.

Tullahoma City Schools Superintendent Dan Lawson spoke to the challenges with TVAAS in a recently released letter, noting:

Our teachers are tasked with a tremendous responsibility and our principals who provide direct supervision assign teachers to areas where they are most needed. The excessive reliance on production of a “teacher number” produces stress, a lack of confidence and a drive to first protect oneself rather than best educate the child.

It will be interesting to see if other school systems follow Knox County’s lead on this front. Even more interesting: Will the legislature take action and at the least, waive the TNReady scores from teacher evaluations in the first year of the new test?

A more serious, long-term concern is the use of value-added modeling in teacher evaluation and, especially, in high-stakes decisions like the granting of tenure, pay, and hiring/firing.

More on Value-Added Modeling

The Absurdity of VAM

Unreliable and Invalid

Some Inconvenient Facts About VAM

For more on education politics and policy in Tennessee, follow @TNEdReport

It All Comes Down to a Number

Dan Lawson is the Director of Schools for Tullahoma City Schools. He sent this message and the American Educational Research Association press release to a group of Tennessee lawmakers.

I am the superintendent of Tullahoma City Schools and in light of the media coverage associated with Representative Holt and a dialogue with teachers in west Tennessee I wanted to share a few thoughts with each of you who represent teachers in other districts in Tennessee. I am thankful that each of you have a commitment to service and work to cultivate a great relationship with teachers and communities that you represent.

While it is certainly troubling that the standards taught are disconcerting in that developmental appropriateness is in question by many, and that the actual test administration may be a considerable challenge due to hardware, software and capacity concerns, I think one of the major issues has been overlooked and is one that could easily address many concerns and restore a sense of confidence in many of our teachers.

Earlier this week the American Educational Research Association released a statement (see below) cautioning states “against the use of VAM for high-stakes decisions regarding educators.” It seems to me that no matter what counsel I provide, what resources I bring to assist and how much I share our corporate school district priorities, we boil our work and worth as a teacher down to a number. And for many that number is a product of how well they guess on what a school-wide number could be since they don’t have a tested area.

Our teachers are tasked with a tremendous responsibility and our principals who provide direct supervision assign teachers to areas where they are most needed. The excessive reliance on production of a “teacher number” produces stress, a lack of confidence and a drive to first protect oneself rather than best educate the child. As an example, one of my principals joined me in meeting with an exceptional middle school math teacher, Trent Stout. Trent expressed great concerns about the order in which the standards were presented (grade level) and advised that our math department was confident that a different order would better serve our students developmentally and better prepare them for higher level math courses offered in our community. He went on to opine that while he thought we (and he) would take a “hit” on our eighth grade assessment it would serve our students better to adopt the proposed timeline. I agreed. It is important to note that I was able to dialogue with this professional out of a sense of joint respect and trust and with knowledge that his status with our district was solely controlled by local decision makers. He is a recipient of “old tenure.” However, don’t mishear me, I am not requesting the restoration of “old tenure,” simply a modification of the newly enacted statute. I propose that a great deal of confidence in “listening and valuing” teachers could be restored by amending the tenure statute to allow local control rather than state eligibility.

I have teachers in my employ with no test data who guess well and are eligible for the tenure status, while I have others who guess poorly and are not eligible. Certainly, the final decision to award tenure is a local one, but local based on state produced data that may be flawed or based on teachers other than the potential nominee. Furthermore, if we opine that tenure does indeed have value, I am absolutely lost when I attempt to explain to new teachers that if they are not eligible for tenure I may employ them for an unlimited number of added contracts but if they are eligible based on their number and our BOE decides that they will not award tenure to anyone I am compelled to non-renew those who may be highly effective teachers. The thought that statute allows me to reemploy a level 1 teacher while compelling me to non-renew a level 5 teacher seems more than a bit ironic and ridiculous.

I greatly appreciate your service to our state and our future and would love to see an extensive dialogue associated to the adoption of Common Sense.

The American Educational Research Association Statement on Value-Added Modeling:

In a statement released today, the American Educational Research Association (AERA) advises those using or considering use of value-added models (VAM) about the scientific and technical limitations of these measures for evaluating educators and programs that prepare teachers. The statement, approved by AERA Council, cautions against the use of VAM for high-stakes decisions regarding educators.

In recent years, many states and districts have attempted to use VAM to determine the contributions of educators, or the programs in which they were trained, to student learning outcomes, as captured by standardized student tests. The AERA statement speaks to the formidable statistical and methodological issues involved in isolating either the effects of educators or teacher preparation programs from a complex set of factors that shape student performance.

“This statement draws on the leading testing, statistical, and methodological expertise in the field of education research and related sciences, and on the highest standards that guide education research and its applications in policy and practice,” said AERA Executive Director Felice J. Levine.

The statement addresses the challenges facing the validity of inferences from VAM, as well as specifies eight technical requirements that must be met for the use of VAM to be accurate, reliable, and valid. It cautions that these requirements cannot be met in most evaluative contexts.

The statement notes that, while VAM may be superior to some other models of measuring teacher impacts on student learning outcomes, “it does not mean that they are ready for use in educator or program evaluation. There are potentially serious negative consequences in the context of evaluation that can result from the use of VAM based on incomplete or flawed data, as well as from the misinterpretation or misuse of the VAM results.”

The statement also notes that there are promising alternatives to VAM currently in use in the United States that merit attention, including the use of teacher observation data and peer assistance and review models that provide formative and summative assessments of teaching and honor teachers’ due process rights.

The statement concludes: “The value of high-quality, research-based evidence cannot be over-emphasized. Ultimately, only rigorously supported inferences about the quality and effectiveness of teachers, educational leaders, and preparation programs can contribute to improved student learning.” Thus, the statement also calls for substantial investment in research on VAM and on alternative methods and models of educator and educator preparation program evaluation.

The AERA Statement includes 8 technical requirements for the use of VAM:

  1. “VAM scores must only be derived from students’ scores on assessments that meet professional standards of reliability and validity for the purpose to be served…Relevant evidence should be reported in the documentation supporting the claims and proposed uses of VAM results, including evidence that the tests used are a valid measure of growth [emphasis added] by measuring the actual subject matter being taught and the full range of student achievement represented in teachers’ classrooms” (p. 3).
  2. “VAM scores must be accompanied by separate lines of evidence of reliability and validity that support each [and every] claim and interpretative argument” (p. 3).
  3. “VAM scores must be based on multiple years of data from sufficient numbers of students…[Related,] VAM scores should always be accompanied by estimates of uncertainty to guard against [simplistic] overinterpretation[s] of [simple] differences” (p. 3).
  4. “VAM scores must only be calculated from scores on tests that are comparable over time…[In addition,] VAM scores should generally not be employed across transitions [to new, albeit different tests over time]” (AERA Council, 2015, p. 3).
  5. “VAM scores must not be calculated in grades or for subjects where there are not standardized assessments that are accompanied by evidence of their reliability and validity…When standardized assessment data are not available across all grades (K–12) and subjects (e.g., health, social studies) in a state or district, alternative measures (e.g., locally developed assessments, proxy measures, observational ratings) are often employed in those grades and subjects to implement VAM. Such alternative assessments should not be used unless they are accompanied by evidence of reliability and validity as required by the AERA, APA, and NCME Standards for Educational and Psychological Testing” (p. 3).
  6. “VAM scores must never be used alone or in isolation in educator or program evaluation systems…Other measures of practice and student outcomes should always be integrated into judgments about overall teacher effectiveness” (p. 3).
  7. “Evaluation systems using VAM must include ongoing monitoring for technical quality and validity of use…Ongoing monitoring is essential to any educator evaluation program and especially important for those incorporating indicators based on VAM that have only recently been employed widely. If authorizing bodies mandate the use of VAM, they, together with the organizations that implement and report results, are responsible for conducting the ongoing evaluation of both intended and unintended consequences. The monitoring should be of sufficient scope and extent to provide evidence to document the technical quality of the VAM application and the validity of its use within a given evaluation system” (AERA Council, 2015, p. 3).
  8. “Evaluation reports and determinations based on VAM must include statistical estimates of error associated with student growth measures and any ratings or measures derived from them…There should be transparency with respect to VAM uses and the overall evaluation systems in which they are embedded. Reporting should include the rationale and methods used to estimate error and the precision associated with different VAM scores. Also, their reliability from year to year and course to course should be reported. Additionally, when cut scores or performance levels are established for the purpose of evaluative decisions, the methods used, as well as estimates of classification accuracy, should be documented and reported. Justification should [also] be provided for the inclusion of each indicator and the weight accorded to it in the evaluation process…Dissemination should [also] include accessible formats that are widely available to the public, as well as to professionals” (pp. 3-4).
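
Requirements 3 and 8 deserve particular attention. Here is a minimal sketch, with assumed numbers, of the kind of uncertainty AERA says should accompany any reported score; a single classroom simply doesn’t contain much statistical information:

```python
import numpy as np

rng = np.random.default_rng(3)

# One teacher, 25 students: a classroom-mean "growth" estimate and its
# standard error. All values here are assumptions for illustration,
# not TVAAS parameters.
true_effect = 0.10
student_gains = true_effect + rng.normal(0.0, 1.0, 25)

estimate = student_gains.mean()
std_error = student_gains.std(ddof=1) / np.sqrt(len(student_gains))
low, high = estimate - 2 * std_error, estimate + 2 * std_error

# The interval is typically wide enough to be consistent with both an
# above-average and a below-average teacher, which is why AERA insists
# that error estimates accompany any reported score.
print(f"estimate: {estimate:+.2f}, rough 95% interval: {low:+.2f} to {high:+.2f}")
```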

The bottom line: Tennessee’s use of TVAAS in teacher evaluations is highly problematic.

More on TVAAS:

Not Yet TNReady

The Worst Teachers

Validating the Invalid

More on Peer Assistance and Review:

Is PAR a Worthy Investment?

For more on education politics and policy in Tennessee, follow @TNEdReport

Not Yet TNReady?

As students and teachers prepare for this year’s standardized tests, there is more anxiety than usual due to the switch to the new TNReady testing regime. This according to a story in the Tennessean by Jason Gonzalez.

Teachers ask for “grace”

In his story, Gonzalez notes:

While teachers and students work through first-year struggles, teachers said the state will need to be understanding. At the Governor’s Teacher Cabinet meeting Thursday in Nashville, 18 educators from throughout the state told Gov. Bill Haslam and [Education Commissioner Candice] McQueen there needs to be “grace” over this year’s test.

The state has warned this year’s test scores will likely dip as it switches to a new baseline measure. TCAP scores can’t be easily compared to TNReady scores.

Despite the fact that the scores “can’t be easily compared,” the state will still use them in teacher evaluations. At the same time, the state is allowing districts to waive the requirement that the scores count toward student grades, as the TCAP and End of Course tests have in the past.

In this era of accountability, it seems odd that students would be relieved of accountability while teachers will still be held accountable.

While that may be one source of anxiety, another is that by using TNReady in the state’s TVAAS formula, the state is introducing a highly suspect means of evaluating teachers. It is, in fact, a statistically invalid approach.

As noted back in March, citing an article from the Journal of Educational Measurement:

These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.

The researchers tested various VAM models (including the type used in TVAAS) and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured. 

That means that the shift to TNReady will change the way TVAAS estimates teacher effect. How? No one knows. We can’t know. We can’t know because the test hasn’t been administered and so we don’t have any results. Without results, we can’t compare TNReady to TCAP. And, even once we have this year’s results, we can’t fairly establish a pattern — because we will only have one year of data. What if this year’s results are an anomaly? With three or more years of results, we MAY be able to make some estimates as to how TCAP compares to TNReady and then possibly translate those findings into teacher effect estimates. But, we could just end up compounding error rates.
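
To make the point concrete, here’s a toy simulation. It is not the TVAAS model, and the noise levels are assumptions; it simply illustrates how loosely a single noisy transition year tracks teachers’ “true” effects compared to a multi-year average:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 500

# Hypothetical "true" teacher effects. The noise levels are assumptions,
# with extra noise in the transition year for the new, uncalibrated test.
true_effect = rng.normal(0.0, 1.0, n_teachers)

def yearly_estimate(noise_sd):
    return true_effect + rng.normal(0.0, noise_sd, n_teachers)

single_transition_year = yearly_estimate(noise_sd=1.5)
three_year_average = np.mean(
    [yearly_estimate(noise_sd=1.0) for _ in range(3)], axis=0)

print("corr(truth, one transition year):",
      round(np.corrcoef(true_effect, single_transition_year)[0, 1], 2))
print("corr(truth, three-year average): ",
      round(np.corrcoef(true_effect, three_year_average)[0, 1], 2))
```

Under these assumptions, the single-year estimates track the true effects far more loosely than the three-year averages do, and that’s before accounting for any systematic mismatch between the old test and the new one.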

Nevertheless, the state will count the TNReady results in this year’s teacher evaluations using a flawed TVAAS formula. And the weight those results carry will grow in subsequent years, even if the confidence we have in the estimates does not. Meanwhile, students are given a reprieve…some “grace” if you will.

I’d say that’s likely to induce some anxiety.

For more on education politics and policy in Tennessee, follow @TNEdReport

Testing Time

While Tennessee teachers are raising concerns about the amount of time spent on testing and test preparation, the Department of Education is lauding the new TNReady tests as an improvement for Tennessee students.

According to an AP story:

However, the survey of nearly 37,000 teachers showed 60 percent say they spend too much time helping students prepare for statewide exams, and seven out of ten believe their students spend too much time taking exams.

“What teachers recognize is the unfortunate fact that standardized testing is the only thing valued by the state,” said Jim Wrye, assistant executive director of the Tennessee Education Association, the state’s largest teachers’ union.

“Teachers and parents know there are so many things that affect future student success that are not measured by these tests, like social and emotional skills, cooperative behaviors, and academic abilities that do not lend themselves to be measured this way.”

Despite teacher concerns, the Department of Education says the new tests will be better indicators of student performance, noting that it will be harder for students to “game” the tests. That’s because the tests will include some open-ended questions.

What they don’t mention is that the company administering the tests, Measurement, Inc., is seeking test graders on Craigslist. And, according to a recent New York Times story, graders of tests like TNReady have “…the possibility of small bonuses if they hit daily quality and volume targets.” The more you grade, the more you earn, in other words.

Chalkbeat summarizes the move to TNReady like this:

The state was supposed to move in 2015 to the PARCC, a Common Core-aligned assessment shared by several states, but the legislature voted in 2014 to stick to its multiple-choice TCAP test while state education leaders searched for a test similar to the PARCC but designed exclusively for Tennessee students.

Except the test is not exactly exclusive to Tennessee. That’s because Measurement, Inc. has a contract with AIR (the American Institutes for Research) to use test questions already in use in Utah for tests in Florida, Arizona, and Tennessee.

And, for those concerned that students already spend too much time taking standardized tests, the DOE offers this reassurance about TNReady:

The estimated time for TNReady includes 25-50 percent more time per question than on the prior TCAP for English and math. This ensures that all students have plenty of time to answer each test question, while also keeping each TNReady test short enough to fit into a school’s regular daily schedule.

According to the schedule, the first phase of testing will start in February/March and the second phase in April/May. That means the tests are not only longer, but they also start earlier and consume more instructional time.

For teachers, that means it is critical to get as much curriculum covered as possible by February. This is because teachers are evaluated in part based on TVAAS — Tennessee Value-Added Assessment System — a particularly problematic statistical formula that purports to measure teacher impact on student learning.

So, if you want Tennessee students to spend more time preparing for and taking tests that will be graded by people recruited on Craigslist and paid bonuses based on how quickly they grade, TNReady is for you. And, you’re in luck, because testing time will start earlier than ever this year.

Interestingly, the opt-out movement hasn’t gotten much traction in Tennessee yet. TNReady may be just the catalyst it needs.

For more on education politics and policy in Tennessee, follow @TNEdReport

That’s Not That Much, Really

So, statewide TCAP results are out and as soon as they were released, the Achievement School District (ASD) touted its gains.

But, what does all that mean? How are these schools doing relative to the goal of taking them from the bottom 5% of schools to the top 25% within 5 years, as founder Chris Barbic boasted before his recent revelation that educating poor kids can be difficult?

Fortunately, Gary Rubinstein has done some analysis. Here’s what he found:

By this metric the top performing ASD school from the first cohort was Corning with a score of 48.6 followed by Brick Church (47.9), Frayser (45.2), Westside (42.1), Cornerstone (37.6), and Hume (33.1).  To check where these scores ranked compared to all the Tennessee schools, I calculated this metric for all 1358 schools that had 3-8 math and reading and sorted them from high to low.

The values below represent the school’s overall score and their percentile relative to the rest of the state, in that order.

Hume 33.1 1.5%
Cornerstone 37.6 2.6%
Westside 42.1 3.2%
Frayser 45.2 4.1%
Brick Church 47.9 5.2%
Corning 48.6 5.5%

As you can see, four of the original six schools are still in the bottom 5% while the other two have now ‘catapulted’ to the bottom 6%.  Perhaps this is one reason that Chris Barbic recently announced he is resigning at the end of the year.

So, the schools that have been in the ASD the longest, making the greatest gains, are at best in the bottom 6% of all schools in the state. That’s a long, long way from the top 25.
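
For readers who want the mechanics, Rubinstein’s percentile calculation is straightforward to reproduce. The sketch below uses his six reported ASD composites, but since the full 1,358-school dataset isn’t reproduced here, the statewide distribution is simulated and the printed percentiles are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the statewide distribution of school composite scores.
# Rubinstein used the actual proficient/advanced composites of all 1,358
# schools with grade 3-8 math and reading results; the distribution here
# is an assumption, so the percentiles printed are illustrative only.
statewide = np.clip(rng.normal(loc=75.0, scale=12.0, size=1358), 0.0, 100.0)

asd_schools = {
    "Hume": 33.1, "Cornerstone": 37.6, "Westside": 42.1,
    "Frayser": 45.2, "Brick Church": 47.9, "Corning": 48.6,
}

for name, score in asd_schools.items():
    percentile = 100.0 * (statewide <= score).mean()  # share of schools at or below
    print(f"{name:12s} {score:5.1f}  ~{percentile:.1f}th percentile")
```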

But here’s something else. Back in December, the ASD decided to take over Neely’s Bend Middle School in Nashville. The school had been on the priority list, after all, and it was declared the victor in a school vs. school battle against Madison Middle.

I reported earlier in the week about the impressive gains at Neely’s Bend. In fact, the state’s TVAAS website shows Neely’s Bend receiving a 5 overall in its growth score — the state’s highest number.

I wondered where Neely’s Bend might fall in comparison to Rubinstein’s analysis of the ASD schools that had been under management for the past three years. Turns out, Neely’s Bend’s proficient/advanced composite for reading and math is 54.4.

Yes, you read that right. Neely’s Bend’s score is 5.8 points higher than the best-performing school that’s been under ASD control the longest.

Neely’s Bend is being taken over and converted to a charter school and yet the school posted significant gains (above district average), has a TVAAS overall score of 5, and has a higher percentage of students at the proficient/advanced level than the BEST schools under ASD management.

For more on education politics and policy in Tennessee, follow @TNEdReport

The Worst Teachers?

“There is a decently large percentage of teachers who are saying that they feel evaluation isn’t fair,” he (state data guru Nate Schwartz) said. “That’s something we need to think about in the process we use to evaluate teachers … and what we can do to make clear to teachers how this process works so they feel more secure about it.”

This from a story about the recently released 2015 Educator Survey regarding teacher attitudes in Tennessee.

One reason teachers might feel the evaluation is unfair is the continued push to align observation scores with TVAAS (Tennessee Value-Added Assessment System) data – data that purportedly captures student growth and thereby represents an indicator of teacher performance.

From WPLN:

Classroom observation scores calculated by principals should roughly line up with how a teacher’s students do on standardized tests. That’s what state education officials believe. But the numbers on the state’s five point scale don’t match up well.

“The gap between observation and individual growth largely exists because we see so few evaluators giving 1s or 2s on observation,” the report states.

“The goal is not perfect alignment,” Department of Education assistant commissioner Paul Fleming says, acknowledging that a teacher could be doing many of the right things at the front of the class and still not get the test results to show for it. But the two figures should be close.

In order to be better at aligning observation scores with TVAAS scores, principals could start by assigning lower scores to sixth and seventh grade teachers. At least, that’s what the findings of a study by Jessica Holloway-Libell published in June in the Teachers College Record suggest.

Holloway-Libell studied value-added scores assigned to individual schools in 10 Tennessee districts, urban and suburban, and found:

In ELA in 2013, schools were, across the board, much more likely to receive positive value-added scores for ELA in fourth and eighth grades than in other grades (see Table 1). Simultaneously, districts struggled to yield positive value-added scores for their sixth and seventh grades in the same subject-areas. Fifth grade scores fell consistently in the middle range, while the third-grade scores varied across districts.

Table 1. Percent of Schools that had Positive Value-Added Scores in English/language arts by Grade and District (2013) (Districts in which fewer than 25% of schools showed positive growth are in bold)
District       Third   Fourth   Fifth   Sixth   Seventh   Eighth
Memphis         41%     43%      45%     19%      14%       76%
Nashville       NA      43%      28%     16%      15%       74%
Knox            72%     79%      47%     14%       7%       73%
Hamilton        38%     64%      48%     33%      29%       81%
Shelby          97%     76%      61%      6%      50%       69%
Sumner          77%     85%      42%     17%      33%       83%
Montgomery      NA      71%      62%      0%       0%       71%
Rutherford      83%     92%      63%     15%      23%       85%
Williamson      NA      88%      58%     11%      33%      100%
Murfreesboro    NA      90%      50%     30%      NA        NA

SOURCE: Teachers College Record, June 8, 2015, http://www.tcrecord.org, ID Number: 17987. Accessed July 27, 2015.

In examining three-year averages, Holloway-Libell found:

The three-year composite scores were similar except even more schools received positive value-added scores for the fifth and eighth grades. In fact, in each of the nine districts that had a composite score for eighth grade, at least 86% of their schools received positive value-added scores at the eighth-grade level.

By contrast, results in math were consistently positive across grade level and district type:

In particular, the fourth and seventh grade-level scores were consistently higher than those of the third, fifth, sixth, and eighth grades, which illustrated much greater variation across districts. The three-year composite scores were similar. In fact, a majority of schools across the state received positive value-added scores in mathematics across all grade levels.

So, what does this mean?

Well, it could mean that Tennessee’s 6th and 7th grade ELA teachers are the worst in the state. Or, it could mean that math teachers in Tennessee are better teachers than ELA teachers. Or, it could mean that 8th grade ELA teachers are rock stars.

Alternatively, one might suspect that the results of Holloway-Libell’s analysis suggest both grade level and subject matter bias in TVAAS.

In short, TVAAS is an unreliable predictor of teacher performance. Or, teaching 6th and 7th grade students reading is really hard.

Holloway-Libell’s findings are consistent with those of Lockwood and McCaffrey (2007) published in the Journal of Educational Measurement:

The researchers tested various VAM models and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

That is, it’s totally consistent with VAM to have different estimates for math and ELA teachers, for example. Math questions are often asked in a different manner than ELA questions and the assessment is covering different subject matter.
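
A toy simulation makes the sensitivity concrete. Here each teacher has two “true” effects, one per measure, that correlate imperfectly (the correlation and noise levels are assumptions), along with noisy classroom-level estimates of each:

```python
import numpy as np

rng = np.random.default_rng(2)
n_teachers, class_size = 200, 25

# Each teacher's "true" impact on two different measures of the same
# subject: correlated, but not identical (rho = 0.7 is an assumption).
rho = 0.7
effect_a = rng.normal(0.0, 1.0, n_teachers)
effect_b = rho * effect_a + np.sqrt(1 - rho**2) * rng.normal(0.0, 1.0, n_teachers)

def estimated_effects(true_effects):
    # Classroom-mean student gains: true effect plus sampling noise.
    noise = rng.normal(0.0, 1.0, (n_teachers, class_size)).mean(axis=1)
    return true_effects + noise

est_a = estimated_effects(effect_a)
est_b = estimated_effects(effect_b)

bottom_a = est_a.argsort().argsort() < n_teachers // 5  # bottom quintile, measure A
bottom_b = est_b.argsort().argsort() < n_teachers // 5  # bottom quintile, measure B
print("flagged on measure A but not on B:", int((bottom_a & ~bottom_b).sum()))
print("correlation of the two estimates: ", round(np.corrcoef(est_a, est_b)[0, 1], 2))
```

Even with a fairly generous correlation between the two measures, a meaningful share of teachers lands in the bottom quintile on one measure but not the other, which is exactly the sensitivity Lockwood and McCaffrey describe.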

So, TVAAS is like other VAM models in this respect. Which means, as Lockwood and McCaffrey suggest, “caution is needed when interpreting estimated teacher effects” when using VAM models (like TVAAS).

In other words: TVAAS is not a reliable predictor of teacher performance.

Which raises the question: Why is the Tennessee Department of Education attempting to force correlation between observed teacher behavior and a flawed, unreliable measure of teacher performance? More importantly, why is such an unreliable measure being used to evaluate (and in some districts, reward with salary increases) teachers?

Don’t Tennessee’s students and parents deserve a teacher evaluation system that actually reveals strong teaching and provides support for teachers who need improvement?

Aren’t Tennessee’s teachers deserving of meaningful evaluation based on sound evidence instead of a system that is consistent only in its unreliability?

The American Statistical Association has said value-added models generally are unreliable as predictors of teacher performance. Now, there’s Tennessee-specific evidence that suggests strongly that TVAAS is biased, unreliable, and not effective as a predictor of teacher performance.

Unless, that is, you believe that 6th and 7th grade ELA teachers are our state’s worst.

For more on education politics and policy in Tennessee, follow @TNEdReport