This article defines educational assessment as the systematic process of gathering, interpreting, and using information about student learning to make decisions about instruction, grading, progression, and programme effectiveness. Evaluation refers to the broader judgement of educational programmes, schools, or systems. Core types: (1) formative assessment (ongoing, low-stakes, used to adjust teaching); (2) summative assessment (end-of-unit or end-of-year, typically higher-stakes, measures achievement); (3) diagnostic assessment (before instruction, to identify prior knowledge); (4) norm-referenced interpretation (comparing a student's performance with that of peers) versus criterion-referenced interpretation (comparing performance against fixed standards). The article addresses: the stated objectives of educational assessment; key concepts including reliability, validity, fairness, and washback; core mechanisms such as test design, item analysis, and standard setting; international comparisons and debated issues (pressure from standardised testing, grade inflation, authentic assessment); a summary and emerging trends (computerised adaptive testing, portfolio assessment, AI scoring); and a Q&A section.
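The same raw score can be read in a norm-referenced or a criterion-referenced way. A minimal Python sketch, assuming a small hypothetical cohort and a hypothetical cut score of 70; the function names are illustrative, and `percentile_rank` uses the mid-rank convention for ties, which is one of several conventions in use.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, cohort: list[float]) -> float:
    """Norm-referenced view: percentage of the cohort scoring below `score`,
    with tied scores counted as half (mid-rank convention)."""
    ordered = sorted(cohort)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def meets_standard(score: float, cut_score: float) -> bool:
    """Criterion-referenced view: compare with a fixed cut score,
    regardless of how other students performed."""
    return score >= cut_score


# Hypothetical cohort of raw scores out of 100.
cohort = [42, 55, 61, 61, 68, 74, 80, 85, 90, 97]

print(percentile_rank(68, cohort))       # 45.0 (norm-referenced)
print(meets_standard(68, cut_score=70))  # False (criterion-referenced)
```

Raising every student's score by 10 points would leave each percentile rank unchanged but could move many students above the cut score, which is the practical difference between the two interpretations.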
This article describes educational assessment without endorsing any particular testing regime. Commonly cited objectives are measuring student learning to inform instruction, certifying competence for progression or graduation, evaluating teacher and school effectiveness, and providing accountability data to stakeholders. Assessment practices vary widely and are contested because of their effects on student motivation, the risk of curriculum narrowing, and equity concerns.
Key terminology:
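Historical context: Standardised testing expanded alongside compulsory schooling in the early 20th century. The SAT was first administered in 1926. In the US, the No Child Left Behind Act of 2001 mandated annual testing in reading and mathematics in grades 3 to 8 and once in high school. International large-scale assessments include TIMSS (first administered 1995) and PISA (first administered 2000).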
Test design principles:
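Item analysis, named in the introduction as a core mechanism, is commonly run on pilot data during test development. A minimal classical-test-theory sketch in Python, assuming dichotomously scored (0/1) items and a hypothetical response matrix: difficulty is the proportion correct (so a higher value means an easier item), and discrimination is the corrected item-total correlation. The 0.2 flagging threshold is an illustrative rule of thumb, not a fixed standard.

```python
from statistics import mean, pstdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; NaN when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def item_analysis(responses: list[list[int]]) -> list[tuple[float, float]]:
    """responses[student][item] is 1 (correct) or 0 (incorrect).
    Returns (difficulty, discrimination) for each item."""
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Corrected total: exclude the item itself so it does not inflate r.
        rest = [sum(row) - row[j] for row in responses]
        results.append((mean(item), pearson(item, rest)))
    return results


# Hypothetical data: 5 students x 5 items. Item 5 is answered correctly
# mainly by low scorers, the pattern a mis-keyed item often shows.
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

for j, (p, r) in enumerate(item_analysis(responses), start=1):
    flag = "  <- review" if r < 0.2 else ""
    print(f"item {j}: difficulty={p:.2f} discrimination={r:+.2f}{flag}")
```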
Standard setting methods:
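Standard setting converts expert judgement into a cut score. As one illustration, the modified Angoff method asks each panellist to estimate, for every item, the probability that a minimally competent (borderline) candidate would answer it correctly; a panellist's implied cut score is the sum of those estimates, and the panel's recommendation is commonly the mean across panellists. The Python sketch covers only that final aggregation step, with hypothetical ratings; operational studies add panellist training, discussion rounds, and impact data.

```python
from statistics import mean, stdev


def angoff_cut_score(ratings: list[list[float]]) -> tuple[float, float]:
    """ratings[panellist][item]: judged probability (0 to 1) that a borderline
    candidate answers the item correctly.
    Returns (recommended raw cut score, SD of panellists' cut scores)."""
    panellist_cuts = [sum(row) for row in ratings]
    return mean(panellist_cuts), stdev(panellist_cuts)


# Hypothetical ratings: 3 panellists x 5 items.
ratings = [
    [0.9, 0.7, 0.6, 0.4, 0.5],
    [0.8, 0.6, 0.7, 0.5, 0.4],
    [0.9, 0.8, 0.5, 0.3, 0.5],
]

cut, spread = angoff_cut_score(ratings)
print(f"cut score = {cut:.2f} / {len(ratings[0])}, panel SD = {spread:.2f}")
```

Reporting the spread alongside the mean shows how much panellists disagreed, which is often reviewed before a cut score is adopted.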
Effectiveness evidence:
Major testing programmes:
| Programme | Administering body | Age/grade | Purpose | Frequency |
|---|---|---|---|---|
| PISA | OECD | 15 years | Cross-national comparison | Every 3 years |
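| TIMSS | IEA | Grades 4 and 8 | Maths/science trends | Every 4 years |
| NAEP | US Dept of Education (NCES) | Grades 4, 8, 12 | National assessment | Every 2 years (reading, maths); less often for other subjects |
| National College Entrance Examination (Gaokao) | Ministry of Education, China, with provincial examination authorities | Grade 12 | University admission | Annual |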
Debated issues:
Summary: Educational assessment includes formative (ongoing, lower-stakes) and summative (end-of-period, higher-stakes) methods. Reliability and validity are essential quality criteria. Standardised testing supports accountability but may narrow the curriculum. Authentic assessment aims to capture deeper, applied skills, usually at some cost to scoring reliability.
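For a multi-item test, reliability is often summarised as internal consistency using Cronbach's alpha: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. It is one of several reliability indices (test-retest and inter-rater coefficients are others) and says nothing by itself about validity. A minimal Python sketch with hypothetical scores:

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[student][item]; items may be 0/1 or polytomous.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))"""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 6 students x 4 items, each item scored 0 to 3.
scores = [
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 1, 0, 1],
    [3, 2, 3, 3],
    [0, 1, 1, 0],
    [2, 3, 2, 2],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```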
Emerging trends:
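Computerised adaptive testing, listed among emerging trends in the introduction, chooses each next item according to a running estimate of the test taker's ability. A minimal Python sketch, assuming a Rasch (one-parameter logistic) item bank with hypothetical difficulties, an expected a posteriori (EAP) ability estimate with a standard-normal prior computed on a grid, maximum-information item selection, and a fixed length of four items. Operational CATs typically add content balancing, exposure control, and a precision-based stopping rule.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability of a correct answer at ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item, P(1 - P); largest when b equals theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def eap_estimate(answers: list[tuple[float, int]]) -> float:
    """EAP ability estimate with a N(0, 1) prior, evaluated on a grid from -4 to 4.
    answers: (item difficulty, score) pairs, score 1 = correct, 0 = incorrect."""
    grid = [i / 10 for i in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised prior density
        for b, score in answers:
            p = p_correct(theta, b)
            w *= p if score == 1 else 1.0 - p  # multiply in the likelihood
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def next_item(theta: float, bank: list[float], used: set[int]) -> int:
    """Index of the unused item with maximum information at the current estimate."""
    unused = [i for i in range(len(bank)) if i not in used]
    return max(unused, key=lambda i: information(theta, bank[i]))


def simulated_answer(b: float) -> int:
    """Hypothetical test taker who answers items up to difficulty 0.5 correctly."""
    return 1 if b <= 0.5 else 0


bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # hypothetical item difficulties
used: set[int] = set()
answers: list[tuple[float, int]] = []
theta = 0.0  # start at the prior mean
for _ in range(4):
    i = next_item(theta, bank, used)
    used.add(i)
    answers.append((bank[i], simulated_answer(bank[i])))
    theta = eap_estimate(answers)
    print(f"item b={bank[i]:+.1f} score={answers[-1][1]} theta={theta:+.2f}")
```

EAP is used here rather than maximum likelihood because it still gives a finite estimate when every answer so far is correct (or every answer incorrect), which is common early in an adaptive test.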
Q1: Does frequent testing improve learning?
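A: Generally, yes. The testing effect (learning through retrieval practice) improves long-term retention compared with restudying the same material. Frequent low-stakes quizzes (e.g., weekly) tend to support retention better than a few high-stakes exams. Meta-analyses report moderate average effects (around d = 0.5), although effects vary with the test format, the feedback given, and the delay before the final test.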
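The d = 0.5 figure is a standardised mean difference (Cohen's d): the gap between group means divided by the pooled standard deviation. A minimal Python sketch with hypothetical final-test scores for a retrieval-practice group and a restudy group; the last line uses the normal CDF to translate d = 0.5 into a percentile, which assumes normally distributed scores with equal spread in both groups.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled


# Hypothetical final-test scores (out of 20).
retrieval = [14, 16, 13, 17, 15, 18, 12, 15]
restudy = [13, 15, 12, 16, 14, 17, 11, 14]
print(f"d = {cohens_d(retrieval, restudy):.2f}")  # d = 0.50

# With d = 0.5, the average student in the higher group sits at about the
# 69th percentile of the lower group (normality and equal SDs assumed).
print(f"{NormalDist().cdf(0.5):.2f}")  # 0.69
```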
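Q2: Is there an optimal class size for assessment reliability?
A: Class size does not directly determine assessment reliability; the number and training of readers matter more. For constructed responses, a single trained reader using a clear rubric can be adequate for lower-stakes grading. Adding a second independent reader improves reliability (for example, from about r = 0.70 to 0.85) but increases cost.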
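The benefit of adding readers can be estimated with the Spearman-Brown prophecy formula, r_n = n r / (1 + (n - 1) r), which assumes the readers are parallel (equally reliable and scoring the same construct). A minimal Python sketch; under that assumption a single-reader reliability of 0.70 predicts about 0.82 for the average of two readers, slightly below the 0.85 cited above, so observed gains depend on how closely the parallel-readers assumption holds. The helper `readers_needed` is a hypothetical name that simply inverts the formula.

```python
import math


def spearman_brown(r_single: float, n: float) -> float:
    """Predicted reliability of the average of n parallel readers (or forms),
    given single-reader reliability r_single."""
    return n * r_single / (1 + (n - 1) * r_single)


def readers_needed(r_single: float, r_target: float) -> int:
    """Smallest whole number of parallel readers predicted to reach r_target."""
    n = r_target * (1 - r_single) / (r_single * (1 - r_target))
    # Small tolerance so floating-point error does not round an exact
    # whole number up to the next integer.
    return math.ceil(n - 1e-9)


print(round(spearman_brown(0.70, 2), 2))  # 0.82
print(readers_needed(0.70, 0.85))         # 3
```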
Q3: Can parents opt children out of standardised tests?
A: In some US states, yes, through parental opt-out laws or policies. Consequences vary: the school may still be penalised for low participation, and the student may receive no score. Most other countries do not provide a formal opt-out.
Q4: How are tests adapted for students with disabilities?
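A: Accommodations include extended time, readers, scribes, a separate setting, and braille versions. Some research supports a "differential boost", in which accommodations raise scores for students with disabilities (reported effect sizes around d = 0.3 to 0.5) more than for other students. The evidence is mixed, however: extended time in particular often raises scores for students without disabilities as well.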