Abstract
One urban school district in Arizona sought to use an alternative achievement test, the Northwest Evaluation Association’s (NWEA) Measures of Academic Progress (MAP) for Primary Grades, to include more value-added-ineligible teachers in the district’s growth and merit pay system. The goal was to make the district’s K-2 teachers fairly and inclusively eligible for individual, teacher-level value-added scores and the differential merit pay bonuses tied to growth. At the request of district administrators, researchers examined whether the different tests, along with their growth estimates, yielded similar output (i.e., concurrent-related evidence of validity). Researchers found the results to be (disappointingly for the district) chaotic, with no underlying trend or order. Using the K-2 test to increase fairness and inclusivity was therefore deemed inappropriate. These findings might inform other districts’ examinations, particularly of this early childhood test.
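The validity check described above can be illustrated schematically: concurrent-related evidence amounts to correlating paired, teacher-level growth estimates produced by the two instruments. The sketch below is illustrative only; the variable names and score values are hypothetical and are not the study’s actual data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical teacher-level growth estimates from two tests (illustrative only).
map_growth = [2.1, 3.4, 1.8, 4.0, 2.9]
aims_growth = [1.9, 3.1, 2.2, 3.8, 2.5]

r = pearson_r(map_growth, aims_growth)
```

A high r across such pairs would support concurrent-related evidence of validity; the chaotic results the researchers report correspond to weak, unstable correlations.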
Notes
One option is as follows for interpreting r: 0.8 ≤ r ≤ 1.0 = a very strong correlation; 0.6 ≤ r ≤ 0.8 = a strong correlation; 0.4 ≤ r ≤ 0.6 = a moderate correlation; 0.2 ≤ r ≤ 0.4 = a weak correlation; and 0 ≤ r ≤ 0.2 = a very weak correlation, if any at all (Merrigan & Huston, 2004).
Ibid.
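The bands above can be expressed as a small helper function. This is a sketch of the Merrigan & Huston (2004) scale as quoted in the notes; because the stated ranges overlap at their endpoints, assigning boundary values to the stronger band is our assumption.

```python
def interpret_r(r: float) -> str:
    """Classify a Pearson correlation coefficient using the bands from
    Merrigan & Huston (2004), as quoted in the notes. Boundary values
    are assigned to the stronger band (an assumption; the published
    ranges overlap at their endpoints)."""
    magnitude = abs(r)
    if magnitude > 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak, if any at all"
```

For example, `interpret_r(0.45)` falls in the moderate band, while `interpret_r(-0.45)` does too, since the scale is applied to the magnitude of r.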
References
Adler, M. (2013). Findings vs. interpretation in “The Long-Term Impacts of Teachers” by Chetty et al. Education Policy Analysis Archives, 21(10). doi:10.14507/epaa.v21n10.2013. Retrieved from http://epaa.asu.edu/ojs/article/view/1264/1033
American Statistical Association (2014). ASA statement on using value-added models for educational assessment. Alexandria, VA. Retrieved from: http://vamboozled.com/wp-content/uploads/2014/03/ASA_VAM_Statement.pdf
Amrein-Beardsley, A. (2014). Rethinking value-added models in education: critical perspectives on tests and assessment-based accountability. New York, NY: Routledge.
Arizona Department of Education. (2012a). A-F Accountability. Phoenix, AZ. Retrieved from http://www.azed.gov/research-evaluation/a-f-accountability
Arizona Department of Education. (2012b). A-F Letter Grade Accountability System technical manual. Phoenix, AZ. Retrieved from http://www.azed.gov/research-evaluation/files/2011/09/final_a-f-tech-manual.pdf
Arizona Department of Education. (2012c). A parent’s guide to understanding AIMS 3–8. Phoenix, AZ. Retrieved from http://www.azed.gov/wp-content/uploads/PDF/AIMSDPAcolor.pdf
Arizona Department of Education. (2014a). Assessment. Phoenix, AZ. Retrieved from http://www.azed.gov/standards-development-assessment/
Arizona Department of Education. (2014b). Arizona Framework for Measuring Educator Effectiveness: Effective through the 2013–2014 school year. Phoenix, AZ. Retrieved from http://www.azed.gov/teacherprincipal-evaluation/files/2013/08/2013-14framework.pdf
Arizona Department of Education. (2014c). Arizona Framework for Measuring Educator Effectiveness: Effective beginning the 2014–2015 school year. Phoenix, AZ. Retrieved from http://www.azed.gov/teacherprincipal-evaluation/files/2013/08/2014-15-arizonaframeworkformeasuringeducatoreffectiveness.pdf
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., Ravitch, D., Rothstein, R., Shavelson, R. J., & Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers. Washington, D.C.: Economic Policy Institute. Retrieved from http://www.epi.org/publications/entry/bp278
Baker, B. D., Oluwole, J. O., & Green, P. C. (2013). The legal consequences of mandating high stakes decisions based on low quality information: teacher evaluation in the race-to-the-top era. Education Policy Analysis Archives, 21(5), 1–71. doi:10.14507/epaa.v21n5.2013 Retrieved from http://epaa.asu.edu/ojs/article/view/1298
Berliner, D. C. (2014). Exogenous variables and value-added assessments: a fatal flaw. Teachers College Record, 116(1). Retrieved from http://www.tcrecord.org/content.asp?contentid=17293.
Betebenner, D. W. (2009a). Growth, standards and accountability. Dover: The Center for Assessment. Retrieved from: http://www.nciea.org/publication_PDFs/growthandStandard_DB09.pdf.
Betebenner, D. W. (2009b). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4), 42–51. doi:10.1111/j.1745-3992.2009.00161.x.
Bill & Melinda Gates Foundation. (2010, December). Learning about teaching: initial findings from the Measures of Effective Teaching Project. Seattle, WA. Retrieved from http://www.gatesfoundation.org/college-ready-education/Documents/preliminary-findings-research-paper.pdf
Bill & Melinda Gates Foundation. (2013, January 8). Ensuring fair and reliable measures of effective teaching: culminating findings from the MET project’s three-year study. Seattle, WA. Retrieved from http://metproject.org/downloads/MET_Ensuring_Fair_and_Reliable_Measures_Practitioner_Brief.pdf
Brennan, R. L. (2006). Perspectives on the evolution and future of educational measurement. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 1–16). Westport, CT: American Council on Education/Praeger.
Brennan, R. L. (2013). Commentary on “Validating interpretations and uses of test scores”. Journal of Educational Measurement, 50(1), 74–83. doi:10.1111/jedm.12001.
Briggs, D. C., & Betebenner, D. (2009). Is growth in student achievement scale dependent? Paper presented at the annual meeting of the National Council for Measurement in Education (NCME), San Diego, CA.
Castellano, K.E. & Ho, A.D. (2013). A practitioner’s guide to growth models. Council of Chief State School Officers
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011, December). The long-term impacts of teachers: teacher value-added and student outcomes in adulthood. Retrieved from http://obs.rc.fas.harvard.edu/chetty/value_added.pdf
Chetty, R., Friedman, J. N., & Rockoff, J. (2014). Discussion of the American Statistical Association’s Statement (2014) on using value-added models for educational assessment. Retrieved from http://obs.rc.fas.harvard.edu/chetty/ASA_discussion.pdf
Collins, C. (2014). Houston, we have a problem: teachers find no value in the SAS Education Value-Added Assessment System (EVAAS®). Education Policy Analysis Archives, 22. doi:10.14507/epaa.v22.1594. Retrieved from http://epaa.asu.edu/ojs/article/view/1594
Collins, C., & Amrein-Beardsley, A. (2014). Putting growth and value-added models on the map: A national overview. Teachers College Record, 116(1). Retrieved from: http://www.tcrecord.org/Content.asp?ContentId=17291
Corcoran, S. P., Jennings, J. L., & Beveridge, A. A. (2011). Teacher effectiveness on high- and low-stakes tests. Retrieved from https://files.nyu.edu/sc129/public/papers/corcoran_jennings_beveridge_2011_wkg_teacher_effects.pdf
Di Carlo, M. (2013, January 17). A few points about the instability of value-added estimates. The Shanker Blog. Retrieved from http://shankerblog.org/?p=7446
Duncan, A. (2009, July 4). The race to the top begins: remarks by Secretary Arne Duncan. Retrieved from http://www.ed.gov/news/speeches/2009/07/07242009.html
Duncan, A. (2011, March 9). Winning the future with education: responsibility, reform and results. Testimony given to the U.S. Congress, Washington, D.C. Retrieved from http://www.ed.gov/news/speeches/winning-future-education-responsibility-reform-and-results
Duncan, A. (2014, August 21). A back-to-school conversation with teachers and school leaders. SmartBlog on Education. Retrieved from http://smartblogs.com/education/2014/08/21/listening-to-teachers-on-testing
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2012, August). Selecting growth measures for school and teacher evaluations. Washington, D.C.: National Center for Analysis of Longitudinal Data in Education Research (CALDER). Retrieved from www.caldercenter.org/publications/upload/WP-80.pdf
Gabriel, R., & Lester, J. N. (2013). Sentinels guarding the grail: value-added measurement and the quest for education reform. Education Policy Analysis Archives, 21(9), 1–30. doi:10.14507/epaa.v21n9.2013. Retrieved from http://epaa.asu.edu/ojs/article/view/1165.
Gill, B., English, B., Furgeson, J., & McCullough, M. (2014). Alternative student growth measures for teacher evaluation: profiles of early-adopting districts. (REL 2014–016). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Mid-Atlantic. Retrieved from http://ies.ed.gov/ncee/edlabs
Glazerman, S. M., & Potamites, L. (2011, December). False performance gains: a critique of successive cohort indicators. Mathematica Policy Research. Retrieved from www.mathematica-mpr.com/publications/pdfs/…/False_Perf.pdf
Goldhaber, D., Gabele, B., & Walch, J. (2012). Does the model matter? Exploring the relationship between different achievement-based teacher assessments. CEDR Working Paper No. 2012–6. Seattle, WA: University of Washington. Retrieved from http://www.tandfonline.com/doi/pdf/10.1080/2330443X.2013.856169
Goldhaber, D. & Theobald, R. (2012, October 15). Do different value-added models tell us the same things? Carnegie Knowledge Network. Retrieved from http://www.carnegieknowledgenetwork.org/briefs/value-added/different-growth-models/
Goldschmidt, P., Choi, K., & Beaudoin, J. B. (2012, February). Growth model comparison study: practical implications of alternative models for evaluating school performance. Technical Issues in Large-Scale Assessment State Collaborative on Assessment and Student Standards. Council of Chief State School Officers
Grossman, P., Cohen, J., Ronfeldt, M., & Brown, L. (2014). The test matters: the relationship between classroom observation scores and teacher value added on multiple types of assessment. Educational Researcher, 43(6), 293–303. doi:10.3102/0013189X14544542.
Guarino, C., Reckase, M., Stacy, B., & Wooldridge, J. (2015). A comparison of student growth percentile and value-added models of teacher performance. Statistics and Public Policy, 2(1), e1034820–1. doi:10.1080/2330443X.2015.1034820.
Haertel, E. H. (2013). Reliability and validity of inferences about teachers based on student test scores. Princeton: Education Testing Service. Retrieved from http://www.ets.org/Media/Research/pdf/PICANG14.pdf.
Harris, D. N. (2011). Value-added measures in education: what every educator needs to know. Cambridge: Harvard Education Press.
Hill, H. C., Kapitula, L., & Umland, K. (2011). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794–831. doi:10.3102/0002831210387916.
Ho, A. D., Lewis, D. M., & Farris, J. L. (2009). The dependence of growth-model results on proficiency cut scores. Educational Measurement: Issues and Practice, 28(4), 15–26. doi:10.1111/j.1745-3992.2009.00159.x.
Jacob, B. A., & Lefgren, L. (2005, June). Principals as agents: subjective performance measurement in education. Cambridge, MA: The National Bureau of Economic Research (NBER). Retrieved from www.nber.org/papers/w11463
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Washington, D.C.: The National Council on Measurement in Education & the American Council on Education.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. doi:10.1111/jedm.12000.
Kane, T., & Staiger, D. (2012). Gathering feedback for teaching: combining high-quality observations with student surveys and achievement gains. Seattle: Bill & Melinda Gates Foundation. Retrieved from http://www.metproject.org/downloads/MET_Gathering_Feedback_Practioner_Brief.pdf.
Kersting, N. B., Chen, M., & Stigler, J. W. (2013). Value-added teacher estimates as part of teacher evaluations: exploring the effects of data and model specifications on the stability of teacher value-added scores. Education Policy Analysis Archives, 21(7), 1–39. Retrieved from http://epaa.asu.edu/ojs/article/view/1167.
Koedel, C., & Betts, J. R. (2007, April). Re-examining the role of teacher quality in the educational production function. Working Paper No. 2007–03. Nashville, TN: National Center on Performance Initiatives.
Linn, R. L. (1980). Issues of validity for criterion-referenced measures. Applied Psychological Measurement, 4, 547–561. doi:10.1177/014662168000400407.
Lockwood, J. R., & McCaffrey, D. F. (2009). Exploring student-teacher interactions in longitudinal achievement data. Education Finance and Policy, 4(4), 439–467. doi:10.1162/edfp.2009.4.4.439.
Lockwood, J. R., McCaffrey, D. F., Hamilton, L. S., Stecher, B., Le, V., & Martinez, J. F. (2007). The sensitivity of value-added teacher effect estimates to different mathematics achievement measures. Journal of Educational Measurement, 44(1), 47–67. doi:10.1111/j.1745-3984.2007.00026.x.
McCaffrey, D. F., Sass, T., Lockwood, J., & Mihaly, K. (2009). The intertemporal variability of teacher effect estimates. Education Finance and Policy, 4(4), 572–606. doi:10.1162/edfp.2009.4.4.572.
Messick, S. (1975). The standard problem: meaning and values in measurement and evaluation. American Psychologist, 30(10), 955–966.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35(11), 1012–1027.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan.
Messick, S. (1995). Validity of psychological assessment: validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749.
National Council on Teacher Quality. (2013). State of the States 2013 [Connect the dots]: using evaluations of teacher effectiveness to inform policy and practice. Retrieved from http://www.nctq.org/dmsView/State_of_the_States_2013_Using_Teacher_Evaluations_NCTQ_Report
Newton, X., Darling-Hammond, L., Haertel, E., & Thomas, E. (2010). Value-added modeling of teacher effectiveness: an exploration of stability across models and contexts. Educational Policy Analysis Archives, 18(23), 1–27. Retrieved from http://epaa.asu.edu/ojs/article/view/810.
No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107–110, § 115 Stat. 1425. (2002). Retrieved from http://www.ed.gov/legislation/ESEA02/
Northwest Evaluation Association (NWEA). (2004). Reliability and validity estimates: NWEA Achievement Level Tests and Measures of Academic Progress. Lake Oswego, Oregon: Retrieved from http://images.pcmac.org/Uploads/Jacksonville117/Jacksonville117/Sites/DocumentsCategories/Documents/Reliability_and_Validity_Estimates.pdf
Northwest Evaluation Association (NWEA). (2011a). Arizona linking study: a study of the alignment of the NWEA RIT Scale with Arizona’s Instrument to Measure Standards (AIMS). Portland, OR: Retrieved from http://www.nwea.org/sites/www.nwea.org/files/resources/AZ_Linking%20Study.pdf
Northwest Evaluation Association (NWEA). (2011b). 2011 normative data. Portland, OR. Retrieved from http://www.nwea.org/sites/www.nwea.org/files/resources/2011_Normative_Data_Overview.pdf
Northwest Evaluation Association (NWEA). (2014a). RIT charts—MAP. Portland, OR: Retrieved from http://www.nwea.org/node/4863
Northwest Evaluation Association (NWEA). (2014b). Growth norms. Portland, OR: Retrieved from http://www.nwea.org/node/4347
Northwest Evaluation Association (NWEA). (2012). MAP® basics overview. Portland, OR. Retrieved from http://www.nwea.org/sites/www.nwea.org/files/resources/MAPBasicsOverview_0.pdf
Northwest Evaluation Association (NWEA). (2013). Common Core MAP® and MAP for Primary Grades (MPG). Portland, OR: Retrieved from http://www.nwea.org/support/article/common-core-map-and-map-primary-grades
Papay, J. P. (2010). Different tests, different answers: the stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163–193. doi:10.3102/0002831210362589.
Pearson Education, Inc. (2011). Stanford Achievement Test Series, Tenth Edition. Retrieved from http://www.pearsonassessments.com/HAIWEB/Cultures/en-us/Productdetail.htm?Pid=SAT10C
Pivovarova, M., Broatch, J., & Amrein-Beardsley, A. (2014). Chetty et al. on the American Statistical Association’s recent position statement on value-added models (VAMs): five points of contention [commentary]. Teachers College Record. Retrieved from http://www.tcrecord.org/content.asp?contentid=17633
Polikoff, M. S., & Porter, A. C. (2014, May 12). Instructional alignment as a measure of teaching quality. Educational Evaluation and Policy Analysis. doi:10.3102/0162373714531851
Popham, W. J. (1993). Educational testing in America: What’s right, what’s wrong? A criterion-referenced perspective. Educational Measurement, 2(1), 11–14. doi:10.1111/j.1745-3992.1993.tb00517.x.
Popham, W. J. (2011). Classroom assessment: what teachers need to know (6th ed.). Boston
Race to the Top Act of 2011, S. 844--112th Congress. (2011). Retrieved from http://www.govtrack.us/congress/bills/112/s844
Rothstein, J. (2009, January 11). Student sorting and bias in value-added estimation: selection on observables and unobservables. Cambridge, MA: The National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w14607
Sass, T. R. (2008). The stability of value-added measures of teacher quality and implications for teacher compensation policy. Washington, D.C.: National Center for Analysis of Longitudinal Data in Education Research (CALDER). Retrieved from www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf
Sass, T., Semykina, A., & Harris, D. (2014). Value-added models and the measurement of teacher productivity. Economics of Education Review, 38, 9–23.
Schochet, P. Z., & Chiang, H. S. (2013). What are error rates for classifying teacher and school performance using value-added models? Journal of Educational and Behavioral Statistics, 38, 142–171. doi:10.3102/1076998611432174.
Shaw, L. (2013, March 30). Educators debate validity of MAP testing. The Seattle Times. Retrieved from http://seattletimes.com/html/localnews/2020678255_maptestswebxml.html
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH. Retrieved from http://www.siop.org/_principles/principles.pdf
Strunk, K. O., Weinstein, T. L., & Makkonen, R. (2014). Sorting out the signal: do multiple measures of teachers’ effectiveness provide consistent information to teachers and principals? Education Policy Analysis Archives, 22(1), 100. doi:10.14507/epaa.v22.1590. Retrieved from http://epaa.asu.edu/ojs/article/view/1590
U.S. Department of Education. (2006, May 17). Secretary Spellings approves Tennessee and North Carolina growth model pilots for 2005–2006. Retrieved from http://votesmart.org/public-statement/174269/secretary-spellings-approves-tennessee-and-north-carolina-growth-model-pilots-for-2005-2006#.U2kVosf94a8
Walsh, E., & Isenberg, E. (2015). How does value-added compare to student growth percentiles? Statistics and Public Policy, 2(1), e1034390. doi:10.1080/2330443X.2015.1034390.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The Widget Effect. Education Digest, 75(2), 31–35.
Whitehurst, G. J. R., Chingos, M. M., & Lindquist, M. M. (2015). Getting classroom observations right. Education Next, 15(1). Retrieved from http://educationnext.org/getting-classroom-observations-right/.
Wright, S. P., White, J. T., Sanders, W. L., & Rivers, J. C. (2010). SAS® EVAAS® statistical models [SAS white paper]. Cary, NC: SAS Institute. Retrieved from http://www.sas.com/resources/asset/SAS-EVAAS-Statistical-Models.pdf.
Yeh, S. S. (2013). A re-analysis of the effects of teacher replacement using value-added modeling. Teachers College Record, 115(12), 1–35. Retrieved from http://www.tcrecord.org/Content.asp?ContentID=16934.
Ethics declarations
The authors submit the following regarding their compliance with ethical standards for the research reported in this manuscript.
Conflict of interest
The authors declare that they have no competing interests.
Research involving human participants and/or animals
This research involved human subjects but used only data already available at the district, collected and analyzed in line with Arizona State University’s Institutional Review Board (IRB) procedures (ruling: exempt).
Informed consent
None required.
About this article
Cite this article
Amrein-Beardsley, A., Polasky, S. & Holloway-Libell, J. Validating “value added” in the primary grades: one district’s attempts to increase fairness and inclusivity in its teacher evaluation system. Educ Asse Eval Acc 28, 139–159 (2016). https://doi.org/10.1007/s11092-015-9234-5