The Worst Teachers?

“There is a decently large percentage of teachers who are saying that they feel evaluation isn’t fair,” he (state data guru Nate Schwartz) said. “That’s something we need to think about in the process we use to evaluate teachers … and what we can do to make clear to teachers how this process works so they feel more secure about it.”

This is from a story about the recently released 2015 Educator Survey on teacher attitudes in Tennessee.

One reason teachers might feel the evaluation is unfair is the continued push to align observation scores with TVAAS (Tennessee Value-Added Assessment System) data – data that purportedly captures student growth and thereby represents an indicator of teacher performance.

From WPLN:

Classroom observation scores calculated by principals should roughly line up with how a teacher’s students do on standardized tests. That’s what state education officials believe. But the numbers on the state’s five point scale don’t match up well.

“The gap between observation and individual growth largely exists because we see so few evaluators giving 1s or 2s on observation,” the report states.

“The goal is not perfect alignment,” Department of Education assistant commissioner Paul Fleming says, acknowledging that a teacher could be doing many of the right things at the front of the class and still not get the test results to show for it. But the two figures should be close.

In order to be better at aligning observation scores with TVAAS scores, principals could start by assigning lower scores to sixth and seventh grade teachers. At least, that’s what the findings of a study by Jessica Holloway-Libell published in June in the Teachers College Record suggest.

Holloway-Libell studied value-added scores assigned to individual schools in 10 Tennessee districts, both urban and suburban, and found:

In ELA in 2013, schools were, across the board, much more likely to receive positive value-added scores for ELA in fourth and eighth grades than in other grades (see Table 1). Simultaneously, districts struggled to yield positive value-added scores for their sixth and seventh grades in the same subject-areas. Fifth grade scores fell consistently in the middle range, while the third-grade scores varied across districts.

Table 1. Percent of Schools that had Positive Value-Added Scores in English/language arts by Grade and District (2013) (Districts which had less than 25% of schools indicate positive growth are in bold)
District      Third  Fourth  Fifth  Sixth  Seventh  Eighth
Memphis       41%    43%     45%    19%    14%      76%
Nashville     NA     43%     28%    16%    15%      74%
Knox          72%    79%     47%    14%    7%       73%
Hamilton      38%    64%     48%    33%    29%      81%
Shelby        97%    76%     61%    6%     50%      69%
Sumner        77%    85%     42%    17%    33%      83%
Montgomery    NA     71%     62%    0%     0%       71%
Rutherford    83%    92%     63%    15%    23%      85%
Williamson    NA     88%     58%    11%    33%      100%
Murfreesboro  NA     90%     50%    30%    NA       NA

SOURCE: Teachers College Record, Date Published: June 08, 2015 ID Number: 17987, Date Accessed: 7/27/2015
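
To make the grade-level pattern in Table 1 easier to see, here is a minimal Python sketch that averages the district percentages for each grade and counts how many districts fall below the table's 25% cutoff. The names TABLE_1 and grade_summary are made up for this illustration, and the averages are simple unweighted means across districts (each district counts equally, regardless of how many schools it has). They are not figures reported in the study.

```python
from statistics import mean

GRADES = ["Third", "Fourth", "Fifth", "Sixth", "Seventh", "Eighth"]

# Percent of schools with positive ELA value-added scores (2013),
# copied from Table 1. None stands for an "NA" cell.
TABLE_1 = {
    "Memphis":      [41, 43, 45, 19, 14, 76],
    "Nashville":    [None, 43, 28, 16, 15, 74],
    "Knox":         [72, 79, 47, 14, 7, 73],
    "Hamilton":     [38, 64, 48, 33, 29, 81],
    "Shelby":       [97, 76, 61, 6, 50, 69],
    "Sumner":       [77, 85, 42, 17, 33, 83],
    "Montgomery":   [None, 71, 62, 0, 0, 71],
    "Rutherford":   [83, 92, 63, 15, 23, 85],
    "Williamson":   [None, 88, 58, 11, 33, 100],
    "Murfreesboro": [None, 90, 50, 30, None, None],
}


def grade_summary(table, threshold=25):
    """Summarize each grade column, skipping NA cells.

    Returns, per grade: how many districts reported a value, the
    unweighted mean percentage, and how many districts fell below
    `threshold` percent.
    """
    summary = {}
    for i, grade in enumerate(GRADES):
        values = [row[i] for row in table.values() if row[i] is not None]
        summary[grade] = {
            "districts": len(values),
            "mean_pct": round(mean(values), 1),
            "below_threshold": sum(v < threshold for v in values),
        }
    return summary


if __name__ == "__main__":
    for grade, stats in grade_summary(TABLE_1).items():
        print(
            f"{grade:<8} mean {stats['mean_pct']:5.1f}%  "
            f"below 25%: {stats['below_threshold']} of {stats['districts']} districts"
        )
```

On these numbers, the sixth-grade column averages about 16% and falls below the cutoff in 8 of 10 districts, while the eighth-grade column averages about 79% and never falls below it.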

In examining three-year averages, Holloway-Libell found:

The three-year composite scores were similar except even more schools received positive value-added scores for the fifth and eighth grades. In fact, in each of the nine districts that had a composite score for eighth grade, at least 86% of their schools received positive value-added scores at the eighth-grade level.

By contrast, results in math were consistently positive across grade level and district type:

In particular, the fourth and seventh grade-level scores were consistently higher than those of the third, fifth, sixth, and eighth grades, which illustrated much greater variation across districts. The three-year composite scores were similar. In fact, a majority of schools across the state received positive value-added scores in mathematics across all grade levels.

So, what does this mean?

Well, it could mean that Tennessee’s 6th and 7th grade ELA teachers are the worst in the state. Or, it could mean that math teachers in Tennessee are better teachers than ELA teachers. Or, it could mean that 8th grade ELA teachers are rock stars.

Alternatively, one might suspect that the results of Holloway-Libell’s analysis suggest both grade level and subject matter bias in TVAAS.

In short, TVAAS is an unreliable predictor of teacher performance. Or, teaching 6th and 7th grade students reading is really hard.

Holloway-Libell’s findings are consistent with those of Lockwood and McCaffrey (2007) published in the Journal of Educational Measurement:

The researchers tested various VAM models and found that teacher effect estimates changed significantly based on both what was being measured AND how it was measured.

That is, it’s totally consistent with VAM to have different estimates for math and ELA teachers, for example. Math questions are often asked in a different manner than ELA questions and the assessment is covering different subject matter.

So, TVAAS is like other value-added models in this respect, which means, as Lockwood and McCaffrey suggest, that “caution is needed when interpreting estimated teacher effects” from VAMs such as TVAAS.

In other words: TVAAS is not a reliable predictor of teacher performance.

Which raises the question: Why is the Tennessee Department of Education attempting to force a correlation between observed teacher behavior and a flawed, unreliable measure of teacher performance? More importantly, why is such an unreliable measure being used to evaluate (and in some districts, reward with salary increases) teachers?

Don’t Tennessee’s students and parents deserve a teacher evaluation system that actually reveals strong teaching and provides support for teachers who need improvement?

Aren’t Tennessee’s teachers deserving of meaningful evaluation based on sound evidence instead of a system that is consistent only in its unreliability?

The American Statistical Association has said value-added models generally are unreliable as predictors of teacher performance. Now, there’s Tennessee-specific evidence that suggests strongly that TVAAS is biased, unreliable, and not effective as a predictor of teacher performance.

Unless, that is, you believe that 6th and 7th grade ELA teachers are our state’s worst.

For more on education politics and policy in Tennessee, follow @TNEdReport




12 thoughts on “The Worst Teachers?”

  1. It isn’t just the TVAAS that is biased. The observation scores themselves are more often than not biased. Administrators conducting the observations often give teachers who are excellent at “brown nosing” higher marks even if they do not deserve them. It is human nature. The administrators in a building should not be conducting the observations. Their relationships with the teachers are too personal, and therefore the observations are more subjective than objective. Furthermore, I have experienced administrators with ZERO elementary experience observing my kindergarten classroom with no understanding of what the specific points on the rubric look like in a kindergarten classroom. The entire evaluation process is severely flawed and not a useful tool.

  4. It’s a bad system. Statistical research seems to indicate TVAAS is flawed, and unless you have evaluators who are truly independent of the system, politics, personal relationships, etc., it’s hard to have an objective evaluation. Still, the evaluators, who are mostly administrative staff such as principals, have enough to do managing their appointed school; then they have to go in and observe a teacher do his or her job according to the state’s model/guideline (just don’t call it a checklist in front of a state person). The catch is that if the person being observed scores low, more observations have to be made by an already time-crunched administrator. Not to mention the paperwork and the pre- and post-meetings with the observed teacher.
    People want to know why schools are having a hard time. The state and federal legislatures and education departments do not understand all the things that are asked of a school these days. There is only so much blood you can squeeze from a turnip. I think this is the reason for the increases in retirements and for new teachers not staying in the profession.

  10. Math is often a straightforward, concrete process (often with formulas provided and calculators used), whereas reading and other areas involve nuanced analysis and more abstract synthesis, with questions worded in ways that themselves need more time to process and analyze (it is hard to write clear questions).
