VAM and Merit Pay Resources
  • bvaliantbvaliant
    Posts: 185
         Robert Valiant
         Here you will find articles and data to refute the claims of proponents of VAM and merit pay.
             *
         Robert Valiant

    http://gfbrandenburg.wordpress.com/2012/03/05/more-on-the-utter-stupidity-of-nycs-value-added-machinations/

    or, if you prefer,
    http://bit.ly/zNLAbn

    and
    http://bit.ly/zXPBK1

    Guy Brandenburg's analysis of NYC VAM evaluations








        John Rogers

         Director, UCLA's Institute for Democracy, Education, and Access
         From Huffington Post
         Posted: August 24, 2010 11:05 AM

         Value Added is No Magic: Assessing Teacher Effectiveness







         That old sorcerer has vanished
         And for once has gone away!
         Spirits called by him, now banished,
         My commands shall soon obey.


         In Goethe's classic, the apprentice uses a sorcerer's spell to
    ease his daily chores. Chanting the master's words, he brings a
    broomstick to life and tells it to fetch water to clean the workshop.
    The broomstick obeys, only too well. It races to the well and back
    until the workshop begins to flood. Although the apprentice had enough
    knowledge to set magic in motion, he could not think ahead to what he
    did not know.



         I worry about a similar flood of unintended consequences if the
    Los Angeles Times moves forward with its plans to publish a database
    that places 6,000 Los Angeles third- to fifth-grade teachers on a
    spectrum from "least effective" to "most effective." The Times believes
    that the data will be a powerful tool to force better teaching, but it
    cannot anticipate all of the consequences. For example, consider that
    capable prospective teachers might avoid a profession in which they risk
    public embarrassment based on an undeveloped science. Consider the
    well-documented estimates that 25% of the value-added assessments are
    likely to be in error.



         Publishing the database might easily undermine parent and teacher
    morale and make it more difficult for principals to advance school
    improvement. Being told that their child's teacher is "ineffective," or
    even marginally less effective than a teacher across the hall, may lead
    some parents to pressure the principal to place their child with a
    "high-scoring" teacher. Pitting parents against one another or against
    their principal is not a recipe for school improvement.



         The Times' teacher effectiveness rankings are based on an
    elaborate statistical model created by Richard Buddin, a senior
    economist and education researcher at the Rand Corporation.
    (Significantly, Buddin did not attach teachers' names to his analysis;
    that was done by the Times.)



         Buddin is one of many researchers across the country exploring
    so-called value-added approaches to assessing teacher quality. The
    assessments measure gains that students make on standardized tests from
    one year to the next. For example, researchers compare test scores of
    fourth graders with their scores as third graders to determine the
    "value added" by the fourth grade teacher. Proponents believe that the
    "value added" reliably distinguishes between more and less effective
    teachers. And they think that school officials would use such
    comparisons to target support to struggling teachers and motivate them
    to do better.
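
To make the "value added" idea above concrete, here is a minimal sketch of the simplest possible version: the average score gain of each teacher's students from one year to the next. The teacher labels and scores are hypothetical, and this is not Buddin's model, which is described above only as an elaborate statistical model; real value-added analyses add statistical controls that this sketch leaves out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, third-grade score, fourth-grade score).
records = [
    ("Teacher A", 310, 335),
    ("Teacher A", 290, 305),
    ("Teacher B", 350, 360),
    ("Teacher B", 330, 350),
]

def mean_gain_by_teacher(rows):
    """Average year-over-year score gain for each teacher's students."""
    gains = defaultdict(list)
    for teacher, prior, current in rows:
        gains[teacher].append(current - prior)
    return {teacher: mean(values) for teacher, values in gains.items()}

# Naive "value added": Teacher A's students gained 20 points on average,
# Teacher B's gained 15.
value_added = mean_gain_by_teacher(records)
print(value_added)
```

Note that in this made-up data Teacher B's students score higher in both years yet show the smaller gain, which is why gain-based rankings can differ from rankings based on raw scores.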



         Yet value-added analyses focus narrowly on standardized tests,
    usually in math and English Language Arts. These tests give important
    information about student learning, but they ignore much learning that
    matters to students, parents, and teachers. That's why value-added
    analysis can be a useful tool but cannot possibly stand alone as a measure of
    "effectiveness." The National Academy of Sciences has identified several
    of the problems posed by value-added methods. These cautions should be
    taken seriously.



         * First, student assignments to schools and classrooms are rarely
    random. As a consequence it is not possible to definitively determine
    whether higher or lower student test scores result from teacher
    effectiveness or are an artifact of how students are distributed.



         * Second, it is difficult to compare growth of struggling students
    with the growth of high performers. In technical terms, standardized
    tests do not form equal interval scales. Enabling students to move from
    the 20th percentile to the 30th is not the same as helping students move
    from the 80th to the 90th percentile. These test score numbers are not
    like inches along a tape measure that have the same value regardless of
    where they occur.



         * Third, estimates of teacher effectiveness can range widely from
    year to year. In recent studies, 10-15% of teachers in the lowest
    category of effectiveness one year moved to the highest category the
    following year while 10-15% of teachers in the highest category fell to
    the lowest tier.
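
The second caution above, that percentile gains are not equal intervals, can be illustrated with a short sketch. It assumes, purely for illustration, that test scores follow a normal (bell-curve) distribution; the function name `score_gap` is hypothetical, and this is not the National Academy's own analysis.

```python
from statistics import NormalDist

# Assumption: scores are normally distributed, measured in standard
# deviations (SD) from the mean.
standard_normal = NormalDist()

def score_gap(from_percentile, to_percentile):
    """Distance in SD units between two percentile ranks."""
    low = standard_normal.inv_cdf(from_percentile / 100)
    high = standard_normal.inv_cdf(to_percentile / 100)
    return high - low

gap_low = score_gap(20, 30)   # struggling students
gap_high = score_gap(80, 90)  # high performers
print(f"20th -> 30th: {gap_low:.2f} SD; 80th -> 90th: {gap_high:.2f} SD")
```

Under this assumption, the same ten-percentile move covers roughly 0.32 SD near the bottom and roughly 0.44 SD near the top, so equal percentile gains do not represent equal amounts of score growth.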




         The National Academy of Sciences concluded that value-added
    analysis "should not be used as the sole or primary basis for making
    operational decisions because the extent to which the measures reflect
    the contribution of teachers themselves, rather than other factors, is
    not understood."



         And yet, the Los Angeles Times is about to publish a database with
    the teacher effectiveness rankings of 6,000 elementary school teachers.
    The Times argues that its role is to provide "parents and the public
    ... information that would otherwise be withheld" about the "performance
    of public employees." The Times should not believe in the magic of this
    data, and should realize that it cannot foresee or control all of the
    consequences.




         about 10 months ago
        *
         Robert Valiant
         Evidence about the use of test scores to evaluate teachers: Economic Policy Institute, 2010


         “…there is broad agreement among statisticians, psychometricians,
    and economists that student test scores alone are not sufficiently
    reliable and valid indicators of teacher effectiveness to be used in
    high-stakes personnel decisions, even when the most sophisticated
    statistical applications such as value-added modeling are employed.


         For a variety of reasons, analyses of VAM results have led
    researchers to doubt whether the methodology can accurately identify
    more and less effective teachers. VAM estimates have proven to be
    unstable across statistical models, years, and classes that teachers
    teach. One study found that across five large urban districts, among
    teachers who were ranked in the top 20% of effectiveness in the first
    year, fewer than a third were in that top group the next year, and
    another third moved all the way down to the bottom 40%. Another found
    that teachers’ effectiveness ratings in one year could only predict from
    4% to 16% of the variation in such ratings in the following year. Thus,
    a teacher who appears to be very ineffective in one year might have a
    dramatically different result the following year. The same dramatic
    fluctuations were found for teachers ranked at the bottom in the first
    year of analysis. This runs counter to most people’s notions that the
    true quality of a teacher is likely to change very little over time and
    raises questions about whether what is measured is largely a “teacher
    effect” or the effect of a wide variety of other factors.”
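
A quick way to read the "4% to 16% of the variation" figure quoted above: if it is interpreted as the R-squared of a simple one-predictor linear relationship between one year's ratings and the next (an assumption; the excerpt does not say how the figure was computed), the implied year-to-year correlation is its square root. The function name below is hypothetical.

```python
import math

def implied_correlation(share_of_variance):
    """Correlation implied by an R-squared value, assuming a simple
    linear relationship with a single predictor and a positive sign."""
    return math.sqrt(share_of_variance)

low = implied_correlation(0.04)   # 4% of variation explained
high = implied_correlation(0.16)  # 16% of variation explained
print(f"implied correlation: {low:.1f} to {high:.1f}")
```

On that reading, the ratings in consecutive years correlate at only about 0.2 to 0.4, where 1.0 would mean perfectly stable ratings.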


         about 10 months ago
        *
         Robert Valiant
         Neither Fair Nor Accurate • Research-Based Reasons Why High-Stakes Tests Should Not Be Used to Evaluate Teachers



         By Wayne Au


         A pitched battle raged in my hometown of Seattle this fall.
    Superintendent Maria Goodloe-Johnson and the Seattle Public Schools
    district fought with the Seattle Education Association over their most
    recent teachers’ union contract. At the heart of the dispute: Should
    teacher evaluations be based in part on student scores on standardized
    tests?



         Seattle is not unique in this struggle, and it is clear that
    Superintendent Goodloe-Johnson takes her cue from what is happening
    nationally.



         In August, for instance, the Los Angeles Times printed a massive
    study in which LA student test scores were used to rate individual
    teacher effectiveness. The study was based on a statistical model
    referred to as value-added measurement (VAM). As part of the story, the
    Times published the names of roughly 6,000 teachers and their VAM
    ratings (see sidebar, p. 37).



         In October the New York City Department of Education followed
    suit, publicizing plans to release the VAM scores for nearly 12,000
    public school teachers. U.S. Secretary of Education Arne Duncan lauded
    both the Times study and the NYC Department of Education plans, a stance
    consistent with Race to the Top guidelines and President Obama’s
    support for using test scores to evaluate teachers and determine merit
    pay.



         Current and former leaders of many major urban school districts,
    including Washington, D.C.’s Michelle Rhee and New Orleans’ Paul Vallas,
    have sought to use tests to evaluate teachers. In fact, the use of
    high-stakes standardized tests to evaluate teacher performance à la VAM
    has become one of the cornerstones of current efforts to reshape public
    education along the lines of the free market.



         On the surface, the logic of VAM and using student scores to
    evaluate teachers seems like common sense: The more effective a teacher,
    the better his or her students should do on standardized tests.



         However, although research tells us that teacher quality has an
    effect on test scores, this does not mean that a specific teacher is
    responsible for how a specific student performs on a standardized test.
    Nor does it mean we can equate effective teaching (or actual learning)
    with higher test scores.



         Given the current attacks on teachers, teachers’ unions, and
    public education through the use of educational accountability schemes
    based wholly or partly on high-stakes standardized test scores and VAM,
    it is important that educators, students, and parents understand why,
    based on educational research, such tests should not be used to evaluate
    teachers.



         Although there are many well-documented problems with using VAM to
    evaluate teachers, I’ve chosen to highlight six critical issues with
    VAM that are so problematic they alone should be enough to stop the use
    of high-stakes standardized tests for such evaluations. I hope these
    will be helpful as talking points for op-ed pieces, blogs, and
    discussions at school board meetings, PTA meetings, and in the bleachers
    at basketball games.


         Statistical Error Rates


         There is a statistical error rate of 35 percent when using one
    year’s worth of test data to measure a teacher’s effectiveness, and an
    error rate of 25 percent when using data from three years, researchers
    Peter Schochet and Hanley Chiang find in their 2010 report “Error Rates
    in Measuring Teacher and School Performance Based on Test Score Gains,”
    released by the U.S. Department of Education’s National Center for
    Education Statistics.



         Bruce Baker, finance expert at Rutgers University, explains that
    using high-stakes test scores to evaluate teachers in this manner means
    there is a one-in-four chance that a teacher rated as “average” could be
    incorrectly rated as “below average” and face disciplinary measures.
    Because of these error rates, a teacher’s performance evaluation may
    pivot on what amounts to a statistical roll of the dice.


         Year-to-Year Test Score Instability


         As Tim Sass, economics professor at Florida State University,
    points out in “The Stability of Value-Added Measures of Teacher Quality
    and Implications for Teacher Compensation Policy,” test scores of
    students taught by the same teacher fluctuate wildly from year to year.
    In one study comparing two years of test scores across five urban
    districts, more than two-thirds of the bottom-ranked teachers one year
    had moved out of the bottom ranks the next year. Of this group, a full
    third went from the bottom 20 percent one year to the top 40 percent the
    next. Similarly, only one-third of the teachers who ranked highest one
    year kept their top ranking the next, and almost a third of the formerly
    top-ranked teachers landed in the bottom 40 percent in year two.



         If test scores were an accurate measurement of teacher
    effectiveness, “effective” teachers would rate high consistently from
    year to year because they are good teachers; and one would expect
    “ineffective” teachers to rate low in terms of test scores just as
    consistently. Instead, the year-to-year instability that Sass highlights
    shows that test scores have very little to do with the effectiveness of
    a single teacher and have more to do with the change of students from
    year to year (unless, of course, one believes that one-third of the
    highest ranked teachers in the first year of the study simply decided to
    teach poorly in the second).
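
The kind of churn described above can be reproduced in a small simulation. The sketch below is not Sass's method; it assumes a hypothetical model in which each teacher has a fixed "true" effect and each year's rating adds independent random noise. The `reliability` parameter (the share of rating variance due to the true effect) and the function name are assumptions made for illustration.

```python
import random

def top_quintile_persistence(reliability, n_teachers=20000, seed=0):
    """Share of year-1 top-20% teachers who are also top-20% in year 2.

    Assumed model: rating = fixed true effect + independent yearly noise,
    with total rating variance equal to 1.
    """
    rng = random.Random(seed)
    signal_sd = reliability ** 0.5
    noise_sd = (1 - reliability) ** 0.5
    true_effect = [rng.gauss(0, signal_sd) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0, noise_sd) for t in true_effect]

    def top_fifth(scores):
        # Indices of teachers at or above the 80th-percentile score.
        cutoff = sorted(scores)[int(0.8 * len(scores))]
        return {i for i, s in enumerate(scores) if s >= cutoff}

    top1, top2 = top_fifth(year1), top_fifth(year2)
    return len(top1 & top2) / len(top1)

for r in (0.0, 0.3, 0.9):
    print(f"reliability {r:.1f}: {top_quintile_persistence(r):.0%} stay on top")
```

In this model no teacher's true effect ever changes, yet the less reliable the rating, the closer the share who "stay on top" falls toward the 20 percent expected from pure chance.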


         Day-to-Day Score Instability


         Fifty to 80 percent of any improvement or decline in a student’s
    standardized test scores can be attributed to one-time, randomly
    occurring factors, according to Thomas Kane of Harvard University and
    Douglas Staiger of Dartmouth College in their research report
    “Volatility in Test Scores.”



         This means that factors such as whether or not a child ate
    breakfast on test day, whether or not a child got into an argument with
    parents or peers on the way to school, which other students happened to
    be in attendance while taking the test, and the child’s feelings about
    the test administrator account for at least half of any given student’s
    standardized test score gains or losses. Some factors, such as a dog
    barking outside an open window, can affect an entire class.



         Kane and Staiger’s findings illustrate that using tests to
    evaluate teachers ignores the reality that a host of individual daily
    factors that are completely out of a teacher’s control contribute to how
    a student performs on any given test. To reward or punish a teacher
    based on such scores could literally mean rewarding or punishing a
    teacher based on how well or poorly a student’s morning went.


         Nonrandom Student Assignments


         The grouping of students—either within schools through formal and
    informal tracking or across schools through race, socioeconomic class,
    and linguistic (ELL) segregation—greatly influences VAM test results, as
    10 leading researchers in teacher quality and educational assessment
    highlight in their policy brief “Problems with the Use of Student Test
    Scores to Evaluate Teachers,” published by the Economic Policy
    Institute.



         These researchers note that “teachers who have chosen to teach in
    schools serving more affluent students may appear to be more effective
    simply because they have students with more home and school supports for
    their prior and current learning, and not because they are better
    teachers.”



         Even when VAM models attempt to take into account a student’s
    prior achievement or demographic characteristics, the models assume that
    all students will show test gains at an equal rate. This assumption,
    however, does not necessarily hold true for groups of students who
    historically have performed poorly on tests, for English language
    learners who are asked to become proficient in both a new language and a
    tested subject area, or for students with disabilities whose test-based
    rates of progress may not be comparable to those of any other student.



         Nonrandom student assignment means that a teacher could be
    punished, dismissed, or lose tenure purely because the course they teach
    or the school they teach in has a significant population of
    traditionally low-scoring students who may show variable or slower test
    score gains.


         Imprecise Measurement


         High-stakes, standardized tests are also unable to account for the
    complexities of learning (and, by extension, teaching). For instance,
    we know from the linguistic research of Steven Pinker and others that
    learning often happens in a U-shape—that making mistakes is an integral
    part of the learning process. When children are tested, we never quite
    know where on the U-shaped learning curve they might be, nor do we
    realize that their mistakes could be a vital part of a natural learning
    process. When tests are used to evaluate teachers, it is possible that
    highly effective teachers who push students out of their cognitive
    comfort zones are penalized for provoking the deep learning that
    requires students to make mistakes on the way to greater understanding.



         Standardized tests are also too crude to account for the
    possibility of cognitive transfer of skills that students learn across
    different subjects. Using VAM, as the researchers in the above-mentioned
    Economic Policy Institute policy brief explain, means that “the essay
    writing a student learns from his history teacher may be credited to his
    English teacher, even if the English teacher assigns no writing; the
    mathematics a student learns in her physics class may be credited to her
    math teacher.” In other words, we can never be certain which class and
    which teacher contributed to a given student’s test performance in any
    given subject.


         Out-of-School Factors


         Out-of-school factors such as inadequate access to health care,
    food insecurity, and poverty-related stress, among others, negatively
    impact the in-school achievement of students so profoundly that they
    severely limit what schools and teachers can do on their own, explains
    David Berliner, Regents Professor of Education at Arizona State
    University, in his report “Poverty and Potential.”



         Although it is clear from the research of Stanford University’s
    Linda Darling-Hammond and others that teachers play an absolutely
    pivotal role in student success, when we use high-stakes tests to
    evaluate teachers, we incorrectly assume that teachers have the ability
    to overcome any obstacle in students’ lives to improve learning.
    Although good teachers are critically necessary, they are not always
    sufficient.



         To assume otherwise is to think that teachers (and schools) can
    somehow make up for the lack of housing, food, safety, and living wage
    employment, among other factors, all on their own. The social safety net
    is the responsibility of a much broader socioeconomic network—not the
    sole responsibility of the teacher.


         Politics, Not Reality


         The reality of standardized tests is that they are too imprecise
    and inaccurate to measure the effectiveness of individual teachers. The
    sad thing is that testing experts, researchers, and psychometricians
    have known this for quite some time. In 1999, for instance, the expert
    panel that made up the Committee on Appropriate Test Use of the National
    Research Council cautioned that “an educational decision that will have
    a major impact on a test-taker should not be made solely or
    automatically on the basis of a single test score.”



         Yet two short years later, a bipartisan Congress and the
    presidential administration of George W. Bush passed No Child Left
    Behind and its test-and-punish approach to school reform into law.



         Although the Bush administration seemed to ignore educational
    research as a matter of policy (as illustrated through NCLB’s Reading
    First program and the advocacy of using phonics-only teaching methods
    that had little basis in research), many hoped for something different
    with the election of President Obama.



         Unfortunately, the Obama administration has sent a clear message:
    When it comes to high-stakes standardized testing, the research doesn’t
    matter.



         It hasn’t mattered that, according to the above cited U.S.
    Department of Education report, “More than 90 percent of the variation
    in student gain scores is due to the variation in student-level factors
    that are not under control of the teacher.”



         It hasn’t mattered that the National Research Council of the
    National Academy of Sciences has stated that “VAM estimates of teacher
    effectiveness should not be used to make operational decisions because
    such estimates are far too unstable to be considered fair or reliable.”



         It hasn’t mattered that even the researchers who completed the Los
    Angeles Times study acknowledged that VAM data were too unreliable to
    use as the sole measure of teacher performance (a point that the Times
    neglected to clearly articulate in their article).


         Sadly, as with Bush, now with Obama, politics and ideology trump educational research.


         One would think that all of the policy makers, politicians,
    pundits, superintendents, talk show hosts, documentary movie makers,
    business leaders, and philanthropic foundations so in love with the idea
    of using test score data to evaluate teachers would be equally
    passionate about accuracy. People’s lives are at stake, and yet the
    “data” underlying important decisions about teacher performance couldn’t
    be shakier.



         The shakiness of test-based VAM data illustrates that the current
    fight over teacher “accountability” isn’t really about effectiveness.
    The more substantial public conversation we should be having about
    rising poverty, the racial resegregation of our schools, increasing
    unemployment, lack of health care, and the steady defunding of the
    public sector—all factors that have an overwhelming impact on students’
    educational achievement—has been buried. Instead, teachers and their
    unions have become convenient scapegoats for our social, educational,
    and economic woes.



         Yes, teachers’ performance needs to be evaluated, but in a manner
    that is fair and accurate. Using high-stakes standardized tests and VAM
    to make such evaluations is neither.


         A former high school teacher, Wayne Au is a Rethinking Schools
    editor and assistant professor at the University of Washington, Bothell
    Campus.



         about 9 months ago
        *
         School District Citizens

         One of the best compendiums of arguments against VAM can be found
    here:
    <http://rdsathene.blogspot.com/2011/02/are-value-added-methods-vam-new-flat.html>

         about 8 months ago
        *
         School District Citizens
         Here is another great source for arguing against VAM: http://www.njspotlight.com/ets_symposium/
         about 8 months ago
        *
         School District Citizens
         Read the EPI study of VAM here. Their findings: VAM is a SCAM.

         http://voices.washingtonpost.com/answer-sheet/teachers/new-study-blasts-popular-teach.html
         about 8 months ago
        *
         School District Citizens
         http://www.economics.harvard.edu/faculty/fryer/files/teacher+incentives.pdf

         ABSTRACT

         Financial incentives for teachers to increase student performance
    is an increasingly popular education policy around the world. This paper
    describes a school-based randomized trial in over two-hundred New York
    City public schools designed to better understand the impact of teacher
    incentives on student achievement. I find no evidence that teacher
    incentives increase student performance, attendance, or graduation, nor
    do I find any evidence that the incentives change student or teacher
    behavior. If anything, teacher incentives may decrease student
    achievement, especially in larger schools. The paper concludes with a
    specu
  • bvaliantbvaliant
    Posts: 185
    From Fairtest, a Fact Sheet on the use of VAM:  http://www.fairtest.org/teacher-evaluation-shouldnt-rest-on-test-scores

  • bvaliantbvaliant
    Posts: 185
    Here is one of the conclusions of a major new study on using VAM to evaluate teachers:

    "We cannot at this time encourage anyone to
    use VAM in a high stakes endeavor."

    http://s.shr.lc/Z4B6Mk
  • bvaliantbvaliant
    Posts: 185
    Value Added scores are about as accurate as guessing what the temperature
    will be next year in Louisiana based only on these statistics.

    http://crazycrawfish.wordpress.com/2012/11/02/855/
  • bvaliantbvaliant
    Posts: 185
    VAM Compilation via Dora Taylor

    Dora Taylor has compiled a collection of research on VAM that should be at
    the top of your list when looking for resources to WHAM the VAM: http://seattleducation2010.wordpress.com/2013/02/07/a-look-at-the-map-test-and-value-added-measures-vam/

  • bvaliantbvaliant
    Posts: 185
    Rothstein Study:  Economic Policy Institute:

    http://www.epi.org/publication/bp278/

    This is from 2010, but it is still one of the best studies looking at the use of student test scores to evaluate teachers and schools.

  • bvaliantbvaliant
    Posts: 185
    It turns out VAM doesn't work for evaluation of principals either (but that won't stop the Rheeformers from trying). http://nepc.colorado.edu/newsletter/2013/03/review-estimating-effect-principals
  • bvaliantbvaliant
    Posts: 185
    VAM as "junk science"

    And yet another piece on VAM as "junk science" by Ravitch: http://dianeravitch.net/2013/05/07/is-vam-junk-science-matt-di-carlo-and-i-disagree/