Reliability of Raters for Writing Assessment: Analytic-Holistic, Analytic-Analytic, Holistic-Holistic / Kompozisyon Değerlendirmesinde Değerlendiricilerin Güvenirliği: Analitik-Holistik, Analitik-Analitik, Holistik-Holistik

Yakup Çetin


One of the basic problems in evaluating students' compositions is deciding on a reliable and valid assessment criterion. To this end, analytic and holistic assessment criteria are widely used by specialists in foreign language programs. In this study, 31 foreign language teachers were randomly selected and assigned, according to their assessment criteria and preferences (analytic or holistic), to evaluate 344 student compositions. Because each student composition was evaluated by two teachers, the correlations between the differing scores arising from the different criteria used (holistic/holistic, holistic/analytic, analytic/analytic) were examined. Taking these three conditions into account, inter-rater reliability was determined by calculating the correlations between the scores the teachers gave to the compositions. According to the results, the highest correlation emerged between teachers who evaluated the same composition using the holistic criterion. The lowest correlation in this study was found between teachers who evaluated the same composition using different criteria (analytic versus holistic).


One of the main concerns in writing assessment is choosing reliable and valid rating criteria for deciding on students' writing proficiency levels. For this purpose, holistic and analytic rubrics are the essay scoring instruments most commonly employed by EFL/ESL programs and specialists. In the present study, 31 novice Turkish teachers of English who were responsible for rating 344 student essays were randomly appointed as either holistic or analytic raters, based on their actual use of rubric types. Because each essay was scored by two raters, three scoring conditions arose, and these were used for this correlational study: holistic versus holistic, holistic versus analytic, and analytic versus analytic. Inter-rater reliability was determined by calculating the correlation between raters' scores in each of these three conditions. The results showed that the correlation was highest between two holistic raters, followed by two analytic raters. The study also revealed that inter-rater reliability is rather low when two different rater types (holistic versus analytic) score the same student essay.
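As a rough illustration of the kind of analysis described above, the sketch below groups paired essay scores by scoring condition and computes a correlation for each group. It assumes a Pearson correlation, which the abstract does not specify. The function names `pearson_r` and `reliability_by_condition` are hypothetical, and the scores are made up for illustration; they are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def reliability_by_condition(records):
    """Group (condition, score_1, score_2) records and correlate each group."""
    grouped = {}
    for condition, first, second in records:
        xs, ys = grouped.setdefault(condition, ([], []))
        xs.append(first)
        ys.append(second)
    return {cond: pearson_r(xs, ys) for cond, (xs, ys) in grouped.items()}


# Made-up scores for illustration only; NOT data from the study.
records = [
    ("holistic/holistic", 70, 72),
    ("holistic/holistic", 55, 58),
    ("holistic/holistic", 85, 83),
    ("analytic/analytic", 64, 70),
    ("analytic/analytic", 78, 74),
    ("analytic/analytic", 50, 57),
    ("holistic/analytic", 60, 75),
    ("holistic/analytic", 80, 68),
    ("holistic/analytic", 45, 62),
]

for condition, r in reliability_by_condition(records).items():
    print(f"{condition}: r = {r:.2f}")
```

Each essay contributes one record, so the per-condition coefficient summarizes how closely the two raters of that type pairing agree across essays. A rank-based coefficient could be substituted if the scores are treated as ordinal.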



