High-stakes testing vulnerabilities
  • bvaliant
    Posts: 213
    High-stakes testing vulnerabilities

    This discussion is open to all who see that high-stakes testing vulnerabilities are being exposed and a variety of groups are working to either opt out or end the tests.  Bring your ideas here for discussion and to gather support for the weeks ahead.

    http://dumpduncan.org/forum/discussion/27/high-stakes-testing-vulnerabilities#Item_1
  • bvaliant
    Posts: 213
    Here is the first major study of testing errors I know about.  I worked with one of the authors, Rhoades, while I was doing a study of the WA State WASL test.  Although significant errors have been found from the beginning of the high-stakes testing movement, this work has gone unreported in mainstream media.   http://www.ascd.org/publications/researchbrief/v1n16/toc.aspx
  • bvaliant
    Posts: 213
    Has anyone done cost studies for high-stakes testing?  Even the state contracts with Pearson and others would be significant. I see these in passing from various states, but a state-by-state listing would be powerful.  A longitudinal study would be even better.
  • Sandra
    Posts: 23
    Here is one study of the overall fiscal impact of the Common Core elements. Testing is found on pg. 8.
    http://www.pioneerinstitute.org/pdf/120222_CCSSICost.pdf
  • bvaliant
    Posts: 213
    Thanks, Sandra.
  • Sandra
    Posts: 23
    Here's some more on costs.
    Implementing Common Core Could Cost States $30 Billion
    RACHEL SHEFFIELD  November 28, 2011   Heartlander


    Cost of Common Core Standards will be $800 million in California, $300 million in Washington state
    November 10, 2011 by Liv Finne  Washington Policy Center
  • bvaliant
    Posts: 213
    I don't generally trust the WA Policy Center, but these numbers look OK.  All of these estimates under-value the cost to local districts, however.  Administrative costs and opportunity costs (the time lost to real instruction and other services like library time, counselling time) add millions to the total for each state but are never accounted for.
  • bvaliant
    Posts: 213
    The Case Against High Stakes Testing can be found at http://www.fairtest.org/k-12/high stakes.  Fairtest is the first place I go with concerns about testing.
  • Sandra
    Posts: 23
    Bob - I visit Fair Test, but search for any published views on costs. I do not remember exactly how I stumbled on these and do not know anything about the researcher. I am in Florida. Few such state/district level reports exist.

    I agree that these costs should be viewed as estimates, but they are a start.
  • Sandra
    Posts: 23
    ok, I see now, this "think-tank" is pro market-driven approaches. They are non-partisan! HA...in today's reality, that view is working well in both establishments. If the data is useable for questioning fiscal responsibility and the estimates seem reasonable, while incomplete, the info may have some use. I am sticking to parent opposition and resistance. We all find a piece we can contribute to, or try.

    TY for your efforts.
  • bvaliant
    Posts: 213
    Here is a brief strategy discussion from Dump Duncan, Facebook.  http://www.facebook.com/groups/dumpduncan/doc/284830531610479/

  • A question was posted on how much testing costs. The closest attempt to look at costs comes from the Pioneer Institute. I wondered if their analysis had gotten any reaction. There was, here:

    http://stateimpact.npr.org/indiana/2012/04/19/how-much-new-nationwide-academic-standards-could-cost-indiana/

    The problem for me is the claim that there will be cost savings, but critics fail to translate that into dollars and cents.

    Thoughts on this critique?

  • bvaliant
    Posts: 213
    These early cost estimates are usually way too low and these are really only the cost to the state.  Local districts end up getting stuck with administrative costs on top of all this, and then there are the opportunity costs which never get counted.  While the kids are being tested their teachers are not teaching but doing proctoring or some other low-skill job while they are being paid a professional salary.  This is a ridiculous waste of money and of highly skilled professionals' time.
  • Bob - So in the end, no one knows the true costs. For a data-hungry ed reform environment, the basics of costs to the taxpayers are hidden. You'd think that those who scream the loudest about fiscal responsibility and small government might be asking now; but confirmation after confirmation comes that both political Establishments and powerful $ forces are fueling this drift to centralized control. 

    Found a tweet on today's ALEC meeting posted by Joy Resmovits, who now writes for the Huffington Post. The board has decided to "delay endorsing or rejecting the Common Core."
  • bvaliant
    Posts: 213
    The Seattle Times costs for WA suffer from the same problem I stated above.  These are only the state costs.  The State Superintendent's office, and the legislature, act as if the districts have no costs associated with administering the tests, purchase of test prep materials, or cost of professional development.  I also have questions about the reported state costs.  In my earlier studies of the WASL I found they hid all kinds of costs of the tests such as training programs, publication of materials, costs of reporting the results on their websites, contracts with consultants, etc.  The number they report is usually just the contract figure they have with the testing company.
  • Then the question should be asked at that level and the answers collected, and in the meantime you can keep pointing out your observation.
  • bvaliant
    Posts: 213
    I found this on another page and really enjoyed reading it.


    QUOTE: Although I’m an individual with a background in the sciences –
    both from a formal education and as a hobbyist, my 15 years experience
    as a teacher has me convinced beyond a shadow of a doubt that educating
    our youth effectively is an art. Innovation, creativity, and passion are
    requisites. Without these, even the brightest would be rendered useless
    as educators.

    from Ed Komperda

    "A process of
    evaluating and paying police officers dependent upon the amount of
    crimes committed in their jurisdiction, or judging firefighters on the
    rate of fires or accidents in a specific neighborhood would be absurd.
    These factors are beyond the control of the hardest working, most caring,
    and competent professionals. Similarly, teachers, administrators – and
    for that matter schools overall, cannot be accurately evaluated via
    standardized tests. Factors of diversity and variability, both
    inter/intra-community, make this ideology impossible. The job
    responsibilities of school professionals cannot be measured
    quantitatively with any degree of accuracy. Standardized test scores
    have no bearing on the quality of an educational environment,
    attitude/ability of teachers, administrators or support staff, nor do
    they accurately assess learning in individual students.

    The
    current standardized assessments mandated by New York State Department
    of Education are a giant leap back in time. They fail to address the
    growth we’ve made as a nation in accepting all people of various
    ethnic/racial backgrounds, socioeconomics, traditional vs.
    non-traditional families, talents, and disabilities. A one-size-fits-all
    education doesn’t work. Adding insult to injury, is a thinly veiled
    attempt at political correctness through an assortment of
    difficult-to-pronounce names randomly tossed into state test questions
    for an artificial ‘multi-cultural’ appearance. This only serves to
    confuse students and detracts from accurate measurement of their
    comprehension level. Most disturbing is witnessing special education
    students who are non-readers, but required to sit through a reading test
    on their chronological age level -- that they cannot read. New York
    State chooses not to disclose its grading procedure to the public.
    Consequently, scores that students receive are often confusing and can
    be manipulated each year by NYSED for its own agenda. As a teacher, it
    would be preposterous to give a test and refuse to tell students and
    parents how the grade was determined. NYSED's attempt at the
    standardization of students assessments is a textbook example of the
    opposite -- total inconsistency of the tests themselves. Since the late
    1990's, NYSED has, at its own whim, altered the format and lengthened
    their standardized tests numerous times. Meanwhile they attempt to
    compare results on these tests year after year.

    NYSED chooses
    not to review the work of the majority of students on their standardized
    tests. Instead, school districts are left no choice than to hire
    substitute teachers and teachers are removed from the classroom for a
    day(s) to sit and evaluate state test questions and/or staff development
    days are spent grading state tests. When this occurs, certainly no
    "development" transpires on these days intended for professional growth.
    Teachers do not assign overall scores to the tests. Instead, they are
    provided criteria from NYSED which attempt to explain scoring for
    individual answers. Generally speaking, the answers are highly
    subjective as are the scoring rubrics generated by NYSED. Many questions
    have possible scores ranging between a low of zero points and a high of
    two points. NYSED explains to school districts that they can differ by a
    point either way on each question and that this will not pose a problem
    in the event of an audit. However, it is a huge problem for teachers
    who all wish to grade each and every student fairly. So a student who
    deserves a grade of a 1 out of a possible 2 points can "fairly",
    according to NYSED, receive a 0 or a 2 and the grading cannot be
    questioned. In the context of a conventional test, this would mean a
    student can fairly be graded 0% (0 out of 2) by one scorer and 100% (2
    out of 2) by another. This is completely unfair to all students.
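
    To make the arithmetic in the paragraph above concrete, here is a minimal sketch. It assumes the audit rule works the way the essay describes it (graders may differ by one point either way on a 0 to 2 point question). The function names and the `tolerance` parameter are made up for illustration and do not come from any NYSED document.

```python
def allowed_scores(deserved, tolerance=1, max_points=2):
    """Scores that would pass an audit if graders may differ by `tolerance` points.

    Hypothetical helper: the one-point tolerance is the rule as described
    in the essay, not a documented NYSED procedure.
    """
    low = max(0, deserved - tolerance)
    high = min(max_points, deserved + tolerance)
    return list(range(low, high + 1))


def as_percent(score, max_points=2):
    """Express a question score as a percentage of the points available."""
    return 100 * score / max_points


scores = allowed_scores(deserved=1)
percents = [as_percent(s) for s in scores]
print(scores, percents)  # [0, 1, 2] [0.0, 50.0, 100.0]
```

    Under that assumption, a response that deserves 1 point can land anywhere from 0% to 100% on the question, which is the spread the essay objects to.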


    On a hit or miss basis, "field questions" are included in the tests.
    NYSED claims these are trial questions with which they are experimenting
    on students for the creation of future tests. NYSED does not disclose
    to students or teachers which questions on tests fall into this
    category. They state that these questions do not count towards a child's
    grade.

    Teachers give tests to help children learn. Students
    who have difficulty with specific concept(s) on a test are provided one
    on one or small group instruction. When a large portion of a class finds
    a topic challenging on a test, that concept is revisited and reviewed by
    the teacher. Months are spent preparing for NYS tests, yet NYSED gives
    children no opportunity to learn from their mistakes. That's because
    students, parents, and teachers are not given the right to see the
    'graded' tests. Several months after the tests are administered, a
    letter arrives in the mail showing a student's score. It's an attempt by
    NYSED to appear intellectually inclined, as they provide
    "data" along with a colorful graph. Unfortunately, this teaches the child
    nothing. This process defies the most basic axiom of education.
    Furthermore, it shows a complete disinterest and lack of respect for
    students as learners and their parents as mentors and tax payers.


    The students who do not score at what NYSED deems a “proficient” level
    on these tests are required by the state to be pulled out of class for
    remedial services. This method is highly inaccurate in identifying
    students who need extra support. It’s not uncommon to witness students
    who perform on a regular basis at an average or even above average level
    in the classroom attending these remediation services. This comes as a
    disservice to all students. Those in need of remediation are receiving
    less support due to remedial teachers being spread thinner amongst
    larger groups of students. Students who are forced to attend remediation
    and do not truly need it are missing out on valuable classroom
    instruction and activity on their own level. At times they're at risk of
    becoming discipline problems since the instructional level is beneath
    them. Currently, NYSED has branded a large number of schools statewide
    as "schools in need of improvement." Surely, these schools are like most
    others -- students are working hard as are the educators and
    administrators. Often this label is a result of one (though sometimes
    several) subject areas in which overall students "haven't made
    improvement" in the past two years. This initiates a so-called "school
    improvement process" in which schools are required to have a third party
    work with teachers for the purpose of improving scores on NYS tests.
    Once again, more time, efforts, and money are spent on state tests which
    could be used productively to provide real education.

    From a
    teacher’s perspective, more standardized testing makes for an easier –
    though less rewarding job. An increasingly scripted school year requires
    less time planning. Due to pressure felt by school districts, some have
    resorted to programs which require a specific lesson to be taught each
    and every day of the school year in a given subject area. The goal:
    Prepare all year for the state tests. This strategy neglects students'
    grasp of concepts -- both as a group and as individuals. Some years,
    students may grasp specific topics better than others. Certain students
    may have better background knowledge on specific concepts. This method
    sets students up for failure and adds unnecessary, counterproductive
    pressure. A whole school year can never be planned day by day. School
    districts are pitted against one another and students are lost in the
    shuffle.

    McGraw-Hill, an educational publishing company, has
    found standardized assessments to be quite lucrative for their own
    financial gain. Since 1996, their stock value has quintupled. However,
    Pearson Education has recently garnered a $33 million contract to print
    standardized assessments in NYS and continues to spend millions on
    lobbying. (Apparently going as far as footing the bill for politicians'
    trips amongst other perks cast Pearson Education in a more favorable
    light to those in power.) The financial success of these companies is
    largely the result of “involuntary customers” – students who are
    required to take the test and teachers/parents who are also mandated to
    comply. Our youth are our future – our most important resource for
    society, not mere ‘tools’ for personal gain.

    Are students
    entering college more proficient today than a decade or so ago
    prior to excessive testing? Are people entering the "real world" and
    workforce more competent today than their predecessors due to
    standardized testing? Is the increase in the number of tests and amount
    of time spent testing and preparing for standardized tests benefiting
    anyone besides those who make the tests?

    Although I’m an
    individual with a background in the sciences – both from a formal
    education and as a hobbyist, my 15 years experience as a teacher has me
    convinced beyond a shadow of a doubt that educating our youth
    effectively is an art. Innovation, creativity, and passion are
    requisites. Without these, even the brightest would be rendered useless
    as educators. Bringing interesting and unique subject matter to the
    classroom enriches curriculum by capturing students' interest and
    generating enthusiasm. These emotions drive learning. During a recent
    two week period, 4th graders have spent 9 hours taking standardized
    tests. Other students throughout various grade levels have done
    likewise. Some special education students get double time for their
    tests. So we have 10-year-olds with special needs working on tests for
    18 hours within the span of two weeks. Instead, this time could have
    been spent on learning. Never have I witnessed enthusiasm by students
    sitting for extended periods of time filling in bubbles on a state test
    or answering questions on generally non-classic literature (i.e. 8th
    graders reading a tale entitled "The Hare And The Pineapple.").


    In spite of tax payers in New York State paying some of the highest
    rates in the nation, the state has cried poverty in recent years and cut
    funding to schools. Don't be fooled, these are crocodile tears. NYS has
    forced many schools to reduce the number of educators, administrators,
    support staff, as well as cutting student programs and basic educational
    supplies. Yet, NYS continues to spend haphazardly -- over $30 Million in
    taxpayers' money on state tests. Instead these funds could have been
    utilized productively in the best interest of students.

    Quality
    education allows students to progress at their own rates, while
    striving for benchmarks and utilizing standards for reference points.
    Students’ progress is most accurately measured holistically, on a day to
    day basis, and through various modalities of formal and informal
    assessments. The intention in schools is to prioritize students, not
    make standardized testing the focal point. The grand finale of learning
    should not be a standardized test. As is life, true education is a
    journey, not a destination.

  • http://staugustine.com/news/local-news/2012-05-15/superintendents-want-know-what-wrong-writing-test#.T7Q31Gt5mSP

    Florida is THE poster child of messed up high stakes assessment.

    State is in chaos.
  • JonL
    Posts: 2
    Absenteeism and VAM
    I have one question on VAM that I think is important and would love to
    have answered.  It's based on a new study on absenteeism that just came
    out. http://tinyurl.com/8ybm5ng

    Basically, the data collected on absenteeism is of very poor
    quality and is not the same across jurisdictions. As the report shows, chronic
    absenteeism has a strong negative effect on individual student outcomes as well as on a teacher's ability to move the whole class ahead. So does the lack of data on this critical factor further degrade the already poor accuracy of VAM-based
    teacher evaluations? Here are some snips from the executive summary of the report to show in
    part what I mean:


    "Chronic absenteeism is not the same as truancy or average daily attendance – the attendance rate
    schools use for state report cards and federal accountability. Chronic absenteeism means missing
    10 percent of a school year for any reason. A school can have average daily attendance of 90
    percent and still have 40 percent of its students chronically absent, because on different days,
    different students make up that 90 percent."

    "Chronic absenteeism is most prevalent among low-income students. Gender and ethnic
    background do not appear to play a role in this. The youngest and the oldest students tend to
    have the highest rates of chronic absenteeism, with students attending most regularly in third
    through fifth grades."

    "America’s education system is based on the assumption that barring illness or an extraordinary
    event, students are in class every weekday. So strong is this assumption that it is not even
    measured. Indeed, it is the rare state education department, school district or principal that can
    tell you how many students have missed 10 percent or more of the school year or in the previous
    year missed a month or more school − two common definitions of chronic absence."

    "Because it is not measured, chronic absenteeism is not acted upon. Like bacteria in a hospital,
    chronic absenteeism can wreak havoc long before it is discovered. If the evidence in this report is
    borne out through more systematic data collection and analysis, that havoc may have already
    undermined school reform efforts of the past quarter century and negated the positive impact of
    future efforts."

    Executive summary of the report:   http://tinyurl.com/d2hrxn4
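
    The claim in the first snippet above (90 percent average daily attendance alongside 40 percent chronic absence) can look contradictory, so here is a quick check with made-up numbers. It assumes a hypothetical school of 100 students and a 180-day year, with 40 students missing 30 days each and 60 missing 10 days each. None of these figures come from the report, and the function name is only illustrative.

```python
def attendance_summary(days_missed, school_days=180, chronic_threshold=0.10):
    """Return (average daily attendance, share of students chronically absent).

    Hypothetical helper. A student counts as chronically absent when they
    miss at least `chronic_threshold` of the school year, matching the
    10 percent definition quoted from the report.
    """
    n_students = len(days_missed)
    possible_student_days = n_students * school_days
    ada = 1 - sum(days_missed) / possible_student_days
    chronic = sum(1 for d in days_missed if d / school_days >= chronic_threshold)
    return ada, chronic / n_students


# Invented data: 40 students miss 30 days each, 60 students miss 10 days each.
days_missed = [30] * 40 + [10] * 60
ada, chronic_share = attendance_summary(days_missed)
print(f"ADA {ada:.0%}, chronically absent {chronic_share:.0%}")
# ADA 90%, chronically absent 40%
```

    The school-wide average hides the split because the 60 students with few absences pull the average up, which is why an attendance rate alone says little about how many individual students are missing a month or more.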
  • bvaliant
    Posts: 213
    Here is a whole pile of resources on VAM and Merit Pay:

    VAM Analysis and Merit Pay

    Robert Valiant

    Here you will find articles and data
    to refute the claims of proponents of VAM and merit pay.

    Robert Valiant

    http://gfbrandenburg.wordpress.com/2012/03/05/more-on-the-utter-stupidity-of-nycs-value-added-machinations/

    or, if you prefer,

    http://bit.ly/zNLAbn

    and

    http://bit.ly/zXPBK1

    Guy Brandenburg analysis of NYC VAM evaluations
        
    John Rogers

    Director, UCLA's Institute for
    Democracy, Education, and Access

    From Huffington Post

    Posted: August 24, 2010 11:05 AM

    Value Added is No Magic: Assessing
    Teacher Effectiveness

    That old sorcerer has vanished
    And for once has gone away!
    Spirits called by him, now banished,
    My commands shall soon obey.

    In Goethe's classic, the apprentice
    uses a sorcerer's spell to ease his daily chores. Chanting the master's words,
    he brings a broomstick to life and tells it to fetch water to clean the
    workshop. The broomstick obeys, only too well. It races between the well and back
    until the workshop begins to flood. Although the apprentice had enough
    knowledge to set magic in motion, he could not think ahead to what he did not
    know.

    I worry about a similar flood of
    unintended consequences if the Los Angeles Times moves forward with its plans
    to publish a database that places 6,000 Los Angeles third- to fifth-grade
    teachers on a spectrum from "least effective" to "most
    effective." The Times believes that the data will be a powerful tool to
    force better teaching, but it cannot anticipate all of the consequences. For
    example, consider that capable prospective teachers might avoid a profession in
    which they risk public embarrassment based on an undeveloped science. Consider
    the well-documented estimates that 25% of the value-added assessments are
    likely to be in error.

    Publishing the database might easily
    undermine parent and teacher morale and make it more difficult for principals
    to advance school improvement. Being told that their child's teacher is
    "ineffective," or even marginally less effective than a teacher
    across the hall, may lead some parents to pressure the principal to place their
    child with a "high-scoring" teacher. Pitting parents against one
    another or against their principal is not a recipe for school improvement.

    The Times' teacher effectiveness
    rankings are based on an elaborate statistical model created by Richard Buddin,
    a senior economist and education researcher at the Rand Corporation. (Significantly,
    Buddin did not attach teachers' names to his analysis; that was done by the
    Times.)

    Buddin is one of many researchers
    across the country exploring so-called value-added approaches to assessing
    teacher quality. The assessments measure gains that students make on standardized
    tests from one year to the next. For example, researchers compare test scores
    of fourth graders with their scores as third graders to determine the
    "value added" by the fourth grade teacher. Proponents believe that
    the "value added" reliably distinguishes between more and less
    effective teachers. And they think that school officials would use such
    comparisons to target support to struggling teachers and motivate them to do
    better.

    Yet value-added analyses focus
    narrowly on standardized tests, usually in math and English Language Arts.
    These tests give important information about student learning, but they ignore
    much learning that matters to students, parents, and teachers. That's why it
    can be a useful tool, but cannot possibly stand alone as a measure of
    "effectiveness." The National Academy of Sciences has identified
    several of the problems posed by value-added methods. These cautions should be
    taken seriously.

    * First, student assignments to
    schools and classrooms are rarely random. As a consequence it is not possible
    to definitively determine whether higher or lower student test scores result
    from teacher effectiveness or are an artifact of how students are distributed.

    * Second, it is difficult to compare
    growth of struggling students with the growth of high performers. In technical
    terms, standardized tests do not form equal interval scales. Enabling students
    to move from the 20th percentile to the 30th is not the same as helping
    students move from the 80th to the 90th percentile. These test score numbers
    are not like inches along a tape measure that have the same value regardless of
    where they occur.

    * Third, estimates of teacher
    effectiveness can range widely from year to year. In recent studies, 10-15% of
    teachers in the lowest category of effectiveness one year moved to the highest
    category the following year while 10-15% of teachers in the highest category
    fell to the lowest tier.

    The National Academy of Sciences
    concluded that value-added analysis "should not be used as the sole or
    primary basis for making operational decisions because the extent to which the
    measures reflect the contribution of teachers themselves, rather than other
    factors, is not understood."

    And yet, the Los Angeles Times is
    about to publish a database with the teacher effectiveness rankings of 6,000
    elementary school teachers. The Times argues that its role is to provide
    "parents and the public ... information that would otherwise be
    withheld" about the "performance of public employees." The Times
    should not believe in the magic of this data, and should realize that it cannot
    foresee or control all of the consequences.

    Robert Valiant

    Evidence about the use of test
    scores to evaluate teachers: Economic Policy Institute, 2010

    “…there is broad agreement among
    statisticians, psychometricians, and economists that student test scores alone
    are not sufficiently reliable and valid indicators of teacher effectiveness to
    be used in high-stakes personnel decisions, even when the most sophisticated
    statistical applications such as value-added modeling are employed.

    For a variety of reasons, analyses
    of VAM results have led researchers to doubt whether the methodology can
    accurately identify more and less effective teachers. VAM estimates have proven
    to be unstable across statistical models, years, and classes that teachers
    teach. One study found that across five large urban districts, among teachers
    who were ranked in the top 20% of effectiveness in the first year, fewer than a
    third were in that top group the next year, and another third moved all the way
    down to the bottom 40%. Another found that teachers’ effectiveness ratings in
    one year could only predict from 4% to 16% of the variation in such ratings in
    the following year. Thus, a teacher who appears to be very ineffective in one
    year might have a dramatically different result the following year. The same
    dramatic fluctuations were found for teachers ranked at the bottom in the first
    year of analysis. This runs counter to most people’s notions that the true
    quality of a teacher is likely to change very little over time and raises
    questions about whether what is measured is largely a “teacher effect” or the
    effect of a wide variety of other factors.”
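
    One way to read the 4% to 16% figure above: if "predict X% of the variation" means R-squared from a simple one-predictor linear regression (an assumption, since the excerpt does not say which statistic the study used), then the implied year-to-year correlation is the square root of that share. The helper name below is hypothetical.

```python
import math


def implied_correlation(share_of_variation_explained):
    """Absolute correlation implied by R-squared in a one-predictor regression.

    Assumes the quoted "percent of variation predicted" is R-squared;
    under that reading |r| = sqrt(R^2).
    """
    return math.sqrt(share_of_variation_explained)


low = implied_correlation(0.04)
high = implied_correlation(0.16)
print(f"year-to-year correlation of roughly {low:.1f} to {high:.1f}")
# year-to-year correlation of roughly 0.2 to 0.4
```

    On that reading, one year's rating and the next year's rating for the same teacher would correlate at only about 0.2 to 0.4, which fits the excerpt's point that the estimates are unstable.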



     



    yes">      Robert Valiant



    yes">      Neither Fair Nor Accurate •
    Research-Based Reasons Why High-Stakes Tests Should Not Be Used to Evaluate
    Teachers



     



     



     



    yes">      By Wayne Au



     



    yes">      A pitched battle raged in my
    hometown of Seattle this fall. Superintendent Maria Goodloe-Johnson and the
    Seattle Public Schools district fought with the Seattle Education Association over
    their most recent teachers’ union contract. At the heart of the dispute: Should
    teacher evaluations be based in part on student scores on standardized tests?



     



    yes">      Seattle is not unique in this
    struggle, and it is clear that Superintendent Goodloe-Johnson takes her cue
    from what is happening nationally.



     



    yes">      In August, for instance, the Los
    Angeles Times printed a massive study in which LA student test scores were used
    to rate individual teacher effectiveness. The study was based on a statistical
    model referred to as value-added measurement (VAM). As part of the story, the
    Times published the names of roughly 6,000 teachers and their VAM ratings (see
    sidebar, p. 37).



     



    yes">      In October the New York City
    Department of Education followed suit, publicizing plans to release the VAM
    scores for nearly 12,000 public school teachers. U.S. Secretary of Education
    Arne Duncan lauded both the Times study and the NYC Department of Education
    plans, a stance consistent with Race to the Top guidelines and President
    Obama’s support for using test scores to evaluate teachers and determine merit
    pay.



     



    yes">      Current and former leaders of many
    major urban school districts, including Washington, D.C.’s Michelle Rhee and
    New Orleans’ Paul Vallas, have sought to use tests to evaluate teachers. In
    fact, the use of high-stakes standardized tests to evaluate teacher performance
    à la VAM has become one of the cornerstones of current efforts to reshape
    public education along the lines of the free market.



     



    yes">      On the surface, the logic of VAM and
    using student scores to evaluate teachers seems like common sense: The more
    effective a teacher, the better his or her students should do on standardized
    tests.



     



    yes">      However, although research tells us
    that teacher quality has an effect on test scores, this does not mean that a
    specific teacher is responsible for how a specific student performs on a
    standardized test. Nor does it mean we can equate effective teaching (or actual
    learning) with higher test scores.

    Given the current attacks on
    teachers, teachers’ unions, and public education through the use of educational
    accountability schemes based wholly or partly on high-stakes standardized test
    scores and VAM, it is important that educators, students, and parents
    understand why, based on educational research, such tests should not be used to
    evaluate teachers.

    Although there are many
    well-documented problems with using VAM to evaluate teachers, I’ve chosen to
    highlight six critical issues with VAM that are so problematic they alone
    should be enough to stop the use of high-stakes standardized tests for such
    evaluations. I hope these will be helpful as talking points for op-ed pieces,
    blogs, and discussions at school board meetings, PTA meetings, and in the
    bleachers at basketball games.

    Statistical Error Rates

    There is a statistical error
    rate of 35 percent when using one year’s worth of test data to measure a
    teacher’s effectiveness, and an error rate of 25 percent when using data from
    three years, researchers Peter Schochet and Hanley Chiang find in their 2010
    report “Error Rates in Measuring Teacher and School Performance Based on Test
    Score Gains,” released by the U.S. Department of Education’s National Center
    for Education Statistics.

    Bruce Baker, finance expert at
    Rutgers University, explains that using high-stakes test scores to evaluate
    teachers in this manner means there is a one-in-four chance that a teacher
    rated as “average” could be incorrectly rated as “below average” and face
    disciplinary measures. Because of these error rates, a teacher’s performance
    evaluation may pivot on what amounts to a statistical roll of the dice.

    Year-to-Year Test Score Instability

    As Tim Sass, economics professor at
    Florida State University, points out in “The Stability of Value-Added Measures
    of Teacher Quality and Implications for Teacher Compensation Policy,” test
    scores of students taught by the same teacher fluctuate wildly from year to
    year. In one study comparing two years of test scores across five urban
    districts, more than two-thirds of the bottom-ranked teachers one year had
    moved out of the bottom ranks the next year. Of this group, a full third went
    from the bottom 20 percent one year to the top 40 percent the next. Similarly,
    only one-third of the teachers who ranked highest one year kept their top
    ranking the next, and almost a third of the formerly top-ranked teachers landed
    in the bottom 40 percent in year two.

    If test scores were an accurate
    measurement of teacher effectiveness, “effective” teachers would rate high
    consistently from year to year because they are good teachers; and one would
    expect “ineffective” teachers to rate low in terms of test scores just as
    consistently. Instead, the year-to-year instability that Sass highlights shows
    that test scores have very little to do with the effectiveness of a single
    teacher and have more to do with the change of students from year to year
    (unless, of course, one believes that one-third of the highest ranked teachers
    in the first year of the study simply decided to teach poorly in the second).
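
    The churn described above can be illustrated with a minimal Python sketch. It is
    not taken from Sass's paper: the function name, the sample size, and the noise
    level are all illustrative assumptions. Each simulated teacher gets a fixed "true"
    effect, independent noise (standing in for the change of students) is added in
    each of two years, and the function reports what share of year-one bottom-quintile
    teachers are still in the bottom quintile in year two.

```python
import random


def bottom_quintile_persistence(n_teachers=10_000, noise_sd=1.5, seed=0):
    """Share of year-one bottom-quintile teachers still there in year two.

    Hypothetical model: observed score = fixed true effect + yearly noise.
    noise_sd is an assumed value, not an estimate from any study.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    # Each year's observed score adds fresh, independent noise to the same effect.
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]

    def quintile(scores):
        # Rank teachers from lowest to highest score; quintile 0 is the bottom 20%.
        order = sorted(range(len(scores)), key=scores.__getitem__)
        result = [0] * len(scores)
        for rank, idx in enumerate(order):
            result[idx] = rank * 5 // len(scores)
        return result

    q1, q2 = quintile(year1), quintile(year2)
    bottom = [i for i in range(n_teachers) if q1[i] == 0]
    return sum(1 for i in bottom if q2[i] == 0) / len(bottom)


if __name__ == "__main__":
    print(f"noisy scores: {bottom_quintile_persistence():.2f}")
    print(f"no noise:     {bottom_quintile_persistence(noise_sd=0.0):.2f}")
```

    Under these assumed settings, the noisy run keeps noticeably fewer bottom-ranked
    teachers in place than the noise-free run, even though no simulated teacher's
    underlying effect changed between the two years.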



     



    Day-to-Day Score Instability

    Fifty to 80 percent of any
    improvement or decline in a student’s standardized test scores can be
    attributed to one-time, randomly occurring factors, according to Thomas Kane of
    Harvard University and Douglas Staiger of Dartmouth College in their research
    report “Volatility in Test Scores.”

    This means that factors such as
    whether or not a child ate breakfast on test day, whether or not a child got
    into an argument with parents or peers on the way to school, which other
    students happened to be in attendance while taking the test, and the child’s
    feelings about the test administrator account for at least half of any given
    student’s standardized test score gains or losses. Some factors, such as a dog
    barking outside an open window, can affect an entire class.

    Kane and Staiger’s findings
    illustrate that using tests to evaluate teachers ignores the reality that a
    host of individual daily factors that are completely out of a teacher’s control
    contribute to how a student performs on any given test. To reward or punish a
    teacher based on such scores could literally mean rewarding or punishing a
    teacher based on how well or poorly a student’s morning went.

    Nonrandom Student Assignments

    The grouping of students, either
    within schools through formal and informal tracking or across schools through
    race, socioeconomic class, and linguistic (ELL) segregation, greatly influences
    VAM test results, as 10 leading researchers in teacher quality and educational
    assessment highlight in their policy brief “Problems with the Use of Student
    Test Scores to Evaluate Teachers,” published by the Economic Policy Institute.

    These researchers note that
    “teachers who have chosen to teach in schools serving more affluent students
    may appear to be more effective simply because they have students with more
    home and school supports for their prior and current learning, and not because
    they are better teachers.”

    Even when VAM models attempt to take
    into account a student’s prior achievement or demographic characteristics, the
    models assume that all students will show test gains at an equal rate. This
    assumption, however, does not necessarily hold true for groups of students who
    historically have performed poorly on tests, for English language learners who
    are asked to become proficient in both a new language and a tested subject
    area, or for students with disabilities whose test-based rates of progress may
    be incomparable to any other student.

    Nonrandom student assignment means
    that a teacher could be punished, dismissed, or lose tenure purely because the
    course they teach or the school they teach in has a significant population of
    traditionally low-scoring students who may show variable or slower test score
    gains.

    Imprecise Measurement

    High-stakes, standardized tests are
    also unable to account for the complexities of learning (and, by extension,
    teaching). For instance, we know from the linguistic research of Steven Pinker
    and others that learning often happens in a U-shape—that making mistakes is an
    integral part of the learning process. When children are tested, we never quite
    know where on the U-shaped learning curve they might be, nor do we realize that
    their mistakes could be a vital part of a natural learning process. When tests
    are used to evaluate teachers, it is possible that highly effective teachers
    who push students out of their cognitive comfort zones are penalized for
    provoking the deep learning that requires students to make mistakes on the way
    to greater understanding.

    Standardized tests are also too
    crude to account for the possibility of cognitive transfer of skills that
    students learn across different subjects. Using VAM, as the researchers in the
    above-mentioned Economic Policy Institute policy brief explain, means that “the
    essay writing a student learns from his history teacher may be credited to his
    English teacher, even if the English teacher assigns no writing; the
    mathematics a student learns in her physics class may be credited to her math
    teacher.” In other words, we can never be certain which class and which teacher
    contributed to a given student’s test performance in any given subject.

    Out-of-School Factors

    Out-of-school factors such as
    inadequate access to health care, food insecurity, and poverty-related stress,
    among others, negatively impact the in-school achievement of students so
    profoundly that they severely limit what schools and teachers can do on their
    own, explains David Berliner, Regents Professor of Education at Arizona State
    University, in his report “Poverty and Potential.”

    Although it is clear from the
    research of Stanford University’s Linda Darling-Hammond and others that
    teachers play an absolutely pivotal role in student success, when we use
    high-stakes tests to evaluate teachers, we incorrectly assume that teachers
    have the ability to overcome any obstacle in students’ lives to improve
    learning. Although good teachers are critically necessary, they are not always
    sufficient.

    To assume otherwise is to think that
    teachers (and schools) can somehow make up for the lack of housing, food,
    safety, and living wage employment, among other factors, all on their own. The
    social safety net is the responsibility of a much broader socioeconomic
    network, not the sole responsibility of the teacher.

    Politics, Not Reality

    The reality of standardized tests is
    that they are too imprecise and inaccurate to measure the effectiveness of
    individual teachers. The sad thing is that testing experts, researchers, and
    psychometricians have known this for quite some time. In 1999, for instance,
    the expert panel that made up the Committee on Appropriate Test Use of the
    National Research Council cautioned that “an educational decision that will
    have a major impact on a test-taker should not be made solely or automatically
    on the basis of a single test score.”

    Yet two short years later, a
    bipartisan Congress and the presidential administration of George W. Bush
    passed No Child Left Behind and its test-and-punish approach to school reform
    into law.

    Although the Bush administration
    seemed to ignore educational research as a matter of policy (as illustrated
    through NCLB’s Reading First program and the advocacy of using phonics-only
    teaching methods that had little basis in research), many hoped for something
    different with the election of President Obama.

    Unfortunately, the Obama
    administration has sent a clear message: When it comes to high-stakes
    standardized testing, the research doesn’t matter.

    It hasn’t mattered that, according
    to the above cited U.S. Department of Education report, “More than 90 percent
    of the variation in student gain scores is due to the variation in student-level
    factors that are not under control of the teacher.”

    It hasn’t mattered that the National
    Research Council of the National Academy of Sciences has stated that “VAM
    estimates of teacher effectiveness should not be used to make operational
    decisions because such estimates are far too unstable to be considered fair or
    reliable.”

    It hasn’t mattered that even the
    researchers who completed the Los Angeles Times study acknowledged that VAM
    data were too unreliable to use as the sole measure of teacher performance (a
    point that the Times neglected to clearly articulate in their article).

    Sadly, as with Bush, now with Obama,
    politics and ideology trump educational research.

    One would think that all of the
    policy makers, politicians, pundits, superintendents, talk show hosts,
    documentary movie makers, business leaders, and philanthropic foundations so in
    love with the idea of using test score data to evaluate teachers would be
    equally as passionate about accuracy. People’s lives are at stake, and yet the
    “data” underlying important decisions about teacher performance couldn’t be
    shakier.

    The shakiness of test-based VAM data
    illustrates that the current fight over teacher “accountability” isn’t really
    about effectiveness. The more substantial public conversation we should be
    having about rising poverty, the racial resegregation of our schools,
    increasing unemployment, lack of health care, and the steady defunding of the
    public sector—all factors that have an overwhelming impact on students’
    educational achievement—has been buried. Instead, teachers and their unions
    have become convenient scapegoats for our social, educational, and economic
    woes.

    Yes, teachers’ performance needs to
    be evaluated, but in a manner that is fair and accurate. Using high-stakes
    standardized tests and VAM to make such evaluations is neither.

    A former high school teacher, Wayne
    Au is a Rethinking Schools editor and assistant professor at the University of
    Washington, Bothell Campus.

    about 9 months ago

    School District Citizens

    One of the best compendiums of
    arguments against VAM can be found here:
    <http://rdsathene.blogspot.com/2011/02/are-value-added-methods-vam-new-flat.html>



    about 8 months ago

    School District Citizens

    Here is another great source for
    arguing against VAM: http://www.njspotlight.com/ets_symposium/



    about 8 months ago

    School District Citizens

    Read the EPI study of VAM here.
    Their findings: VAM is a SCAM.

    http://voices.washingtonpost.com/answer-sheet/teachers/new-study-blasts-popular-teach.html



    about 8 months ago

    School District Citizens

    http://www.economics.harvard.edu/faculty/fryer/files/teacher+incentives.pdf

    ABSTRACT

    Financial incentives for teachers to
    increase student performance is an increasingly popular education policy around
    the world. This paper describes a school-based randomized trial in over
    two-hundred New York City public schools designed to better understand the
    impact of teacher incentives on student achievement. I find no evidence that
    teacher incentives increase student performance, attendance, or graduation, nor
    do I find any evidence that the incentives change student or teacher behavior.
    If anything, teacher incentives may decrease student achievement, especially in
    larger schools. The paper concludes with a speculative discussion of theories
    that may explain these stark results.

    Roland G. Fryer, Department of
    Economics, Harvard University



    about 7 months ago

    School District Citizens

    Of course we should hold teachers accountable, but this does not mean we have to
    pretend that mathematical models can do something they cannot. Of course we should
    rid our schools of incompetent teachers, but value-added models are an exceedingly
    blunt tool for this purpose. In any case, we ought to expect more from our teachers
    than what value-added attempts to measure.

    John Ewing



     



    I came across this article by
    mathematician John Ewing and wanted to share it with you.

    Dora

    Mathematical Intimidation: Driven by the Data

    by John Ewing



     



    Mathematicians occasionally worry about the misuse of their subject. G. H. Hardy
    famously wrote about mathematics used for war in his autobiography, A Mathematician’s
    Apology (and solidified his reputation as a foe of applied mathematics in doing so).
    More recently, groups of mathematicians tried to organize a boycott of the Star Wars
    project on the grounds that it was an abuse of mathematics. And even more recently
    some fretted about the role of mathematics in the financial meltdown.



     



    But the most common misuse of mathematics is simpler, more pervasive, and (alas)
    more insidious: mathematics employed as a rhetorical weapon, an intellectual
    credential to convince the public that an idea or a process is “objective” and hence
    better than other competing ideas or processes. This is mathematical intimidation.
    It is especially persuasive because so many people are awed by mathematics and yet
    do not understand it, a dangerous combination.



     



    The latest instance of the phenomenon is value-added modeling (VAM), used to
    interpret test data. Value-added modeling pops up everywhere today, from newspapers
    to television to political campaigns. VAM is heavily promoted with unbridled and
    uncritical enthusiasm by the press, by politicians, and even by (some) educational
    experts, and it is touted as the modern, “scientific” way to measure educational
    success in everything from charter schools to individual teachers.



     



    Yet most of those promoting value-added modeling are ill-equipped to judge either
    its effectiveness or its limitations. Some of those who are equipped make extravagant
    claims without much detail, reassuring us that someone has checked into our concerns
    and we shouldn’t worry. Value-added modeling is promoted because it has the right
    pedigree, because it is based on “sophisticated mathematics”. As a consequence,
    mathematics that ought to be used to illuminate ends up being used to intimidate.
    When that happens, mathematicians have a responsibility to speak out.



     



    Background

    Value-added models are all about tests: standardized tests that have become
    ubiquitous in K–12 education in the past few decades. These tests have been around
    for many years, but their scale, scope, and potential utility have changed
    dramatically.



     



    Fifty years ago, at a few key points in their education, schoolchildren would bring
    home a piece of paper that showed academic achievement, usually with a percentile
    score showing where they landed among a large group. Parents could take pride in
    their child’s progress (or fret over its lack); teachers could sort students into
    those who excelled and those who needed remediation; students could make plans for
    higher education.



     



    Today, tests have more consequences. “No Child Left Behind” mandated that tests in
    reading and mathematics be administered in grades 3–8. Often more tests are given in
    high school, including high-stakes tests for graduation. With all that accumulating
    data, it was inevitable that people would want to use tests to evaluate everything
    educational: not merely teachers, schools, and entire states but also new curricula,
    teacher training programs, or teacher selection criteria. Are the new standards
    better than the old? Are experienced teachers better than novice? Do teachers need
    to know the content they teach? Using data from tests to answer such questions is
    part of the current “student achievement” ethos, the belief that the goal of
    education is to produce high test scores. But it is also part of a broader trend in
    modern society to place a higher value on numerical (objective) measurements than
    verbal (subjective) evidence. But using tests to evaluate teachers, schools, or
    programs has many problems. (For a readable and comprehensive account, see [Koretz
    2008].) Here are four of the most important problems, taken from a much longer list.



     



    1. Influences. Test scores are affected by many factors, including the incoming
    levels of achievement, the influence of previous teachers, the attitudes of peers,
    and parental support. One cannot immediately separate the influence of a particular
    teacher or program among all those variables.

    2. Polls. Like polls, tests are only samples. They cover only a small selection of
    material from a larger domain. A student’s score is meant to represent how much has
    been learned on all material, but tests (like polls) can be misleading.



     



    3. Intangibles. Tests (especially multiple-choice tests) measure the learning of
    facts and procedures rather than the many other goals of teaching. Attitude,
    engagement, and the ability to learn further on one’s own are difficult to measure
    with tests. In some cases, these “intangible” goals may be more important than those
    measured by tests. (The father of modern standardized testing, E. F. Lindquist,
    wrote eloquently about this [Lindquist 1951]; a synopsis of his comments can be
    found in [Koretz 2008, 37].)



     



    4. Inflation. Test scores can be increased without increasing student learning. This
    assertion has been convincingly demonstrated, but it is widely ignored by many in
    the education establishment [Koretz 2008, chap. 10]. In fact, the assertion should
    not be surprising. Every teacher knows that providing strategies for test-taking can
    improve student performance and that narrowing the curriculum to conform precisely
    to the test (“teaching to the test”) can have an even greater effect. The evidence
    shows that these effects can be substantial: One can dramatically increase test
    scores while at the same time actually decreasing student learning. “Test scores”
    are not the same as “student achievement”.



     



    This last problem plays a larger role as the stakes increase. This is often referred
    to as Campbell’s Law: “The more any quantitative social indicator is used for social
    decision-making, the more subject it will be to corruption pressures and the more
    apt it will be to distort and corrupt the social processes it is intended to
    measure” [Campbell 1976]. In its simplest form, this can mean that high-stakes tests
    are likely to induce some people (students, teachers, or administrators) to
    cheat…and they do [Gabriel 2010]. But the more common consequence of Campbell’s Law
    is a distortion of the education experience, ignoring things that are not tested
    (for example, student engagement and attitude) and concentrating on precisely those
    things that are.



     



    The remainder of this paper can be read at Mathematical Intimidation:
    Driven by the Data.

    about 5 months ago

    School District Citizens

    May 15, 2011

    To the New York State Board of Regents:



    As researchers who have done
    extensive work in the area of testing and measurement, and the use of
    value-added methods of analysis, we write to express our concern about the
    decision pending before the Board of Regents to require the use of state test
    scores as 40% of the evaluation decision for teachers.



    As the enclosed report from the
    Economic Policy Institute describes, the research literature includes many
    cautions about the problems of basing teacher evaluations on student test
    scores. These include problems of attributing student gains to specific
    teachers; concerns about overemphasis on “teaching to the test” at the expense
    of other kinds of learning; and disincentives for teachers to serve high-need
    students, for example, those who do not yet speak English and those who have
    special education needs.



    Reviews of research on value-added
    methodologies for estimating teacher “effects” based on student test scores
    have concluded that these measures are too unstable and too vulnerable to many
    sources of error to be used as a major part of teacher evaluation. A report by
    the RAND Corporation concluded that:



    “The research base is currently
    insufficient to support the use of VAM for high-stakes decisions about
    individual teachers or schools.” [1]



    The Board on Testing and Assessment
    of the National Research Council of the National Academy of Sciences stated,



    “...VAM estimates of teacher
    effectiveness ... should not be used to make operational decisions because such
    estimates are far too unstable to be considered fair or reliable.”



    Henry Braun, then of the Educational
    Testing Service, concluded in his review of research:



    “VAM results should not serve as the
    sole or principal basis for making consequential decisions about teachers.
    There are many pitfalls to making causal attributions of teacher effectiveness
    on the basis of the kinds of data available from typical school districts. We
    still lack sufficient understanding of how seriously the different technical
    problems threaten the validity of such interpretations.” [2]



    According to these studies, the
    problems with using value-added testing models to determine teacher
    effectiveness include:



    [1] Daniel F. McCaffrey, Daniel Koretz, J. R. Lockwood, Laura S. Hamilton (2005).
    Evaluating Value-Added Models for Teacher Accountability. Santa Monica: RAND
    Corporation.
    [2] Henry Braun, Using Student Progress to Evaluate Teachers: A Primer on
    Value-Added Models (Princeton, NJ: ETS, 2005), p. 17.




  • bvaliantbvaliant
    Posts: 213
    Bruce Baker on High-Stakes Tests

    http://epaa.asu.edu/ojs/article/view/1298/1043

    Abstract: In this article, we explain how overly prescriptive, rigid state statutory and regulatory
    policy frameworks regarding teacher evaluation, tenure and employment decisions outstrip the
    statistical reliability and validity of proposed measures of teaching effectiveness. We begin with a
    discussion of the emergence of highly prescriptive state legislation regarding the use of student
    testing data within teacher evaluation systems, specifically for
    purposes of making employment decisions. Next, we explain the most
    problematic features of those policies, which include a)
    requirements that test-based measures constitute fixed, non-negotiable weight in final decisions, b)
    that test-based measures are used to place teachers into categories of effectiveness by applying
    numerical cutoffs beyond the precision or accuracy of the available data, and c) that professional
    judgment is removed from personnel decisions by legislating (or regulating) specific actions be taken
    when teachers fall into certain performance categories. In the subsequent section, we point out that
    different types of measures are being developed and implemented across states, and we explain that
    while value-added metrics in particular are, in fact, designed to estimate a teacher’s effect on student
    outcomes, descriptive growth percentile measures are not designed for making such inference and
    thus have no place in making determinations regarding teacher effectiveness. We also explain that,
    due to the properties of value-added estimates, they have no place in making high-stakes decisions
    based on rigid policy frameworks like those described herein. We evaluate the legal implications of
    rigid reliance on measures of teaching effectiveness that a) lack reliability and b) may be entirely
    invalid.
  • bvaliantbvaliant
    Posts: 213
    Bracey on Testing: 

    Gerald Bracey was my go-to guy on assessment until his passing. Here is an article he wrote several years ago that should shine a light into some dark corners of testing. I especially like his reference to "reading level" as expressed in his answer to question #7. I think this is one of the biggest misunderstandings related to test scores. Grade level does not mean what virtually everyone thinks, which is why many of the test publishers practically abandoned its use for a couple of decades. https://resources.oncourse.iu.edu/access/content/user/fpawan/L540 _ CBI/bracey_standard-tests.pdf
  • bvaliantbvaliant
    Posts: 213
    This teacher documents some of the uncounted administrative costs of mandated testing:  https://criticalclassrooms.wordpress.com/2013/05/25/an-open-letter-to-bill-gates-from-an-overwhelmed-teacher/
  • bvaliantbvaliant
    Posts: 213
    High-stakes Assessments: Oppose. These tests were never designed for the purposes
    for which they are being used; they test an incredibly narrow range of
    knowledge and skills even within the domain for which they were
    designed; over time they narrow the taught curriculum to the type of
    items tested; and they often lead to manipulation of scores (Campbell's Law).
    They are costly both in dollars and in the instructional time lost to
    their administration. There is no evidence that their use over the past
    decade has led to any significant gains in student achievement as
    measured by the National Assessment of Educational Progress.