Basic Concepts in Assessment
CLASSROOM ASSESSMENT
One of the most basic and difficult tasks that teachers face in their work is the process of assessment. Classroom assessment includes all the processes involved in making decisions about students’ learning progress. It includes the observation of students’ written work, their answers to questions in class, and their performance on teacher-made and standardised tests.
According to Koyalik (2002, as cited in Eggen & Kauchak, 2004), assessment helps teachers make decisions about learning progress through systematic information gathering.
Besides that, assessment also accomplishes two other important goals: increasing learning and increasing motivation.
The relationship between learning and assessment is very strong. Students learn more in classes where assessment is an integral part of instruction than in those where it is not. Brief assessments that provide frequent feedback about learning progress are more effective than long, infrequent ones, such as once-a-term tests.
Basic Concepts in Assessment
This section introduces two basic concepts that apply to every assessment instrument: reliability and validity.
Reliability
Reliability refers to the extent to which
assessments are consistent. Just as we enjoy having reliable cars (cars that
start every time we need them), we strive to have reliable, consistent
instruments to measure students’ achievement.
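One common way to put a number on this kind of consistency is a test-retest check: the same students sit the same test twice and the two sets of scores are correlated. The sketch below is only an illustration of that idea, not a method prescribed by this module. The scores are invented, and the function name `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two sittings, and the spread within each sitting.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each sitting")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five students on two sittings.
first_sitting = [55, 62, 70, 81, 90]
second_sitting = [58, 60, 73, 79, 92]

r = pearson_r(first_sitting, second_sitting)
print(round(r, 2))
```

A value close to 1 suggests that students keep roughly the same rank order from one sitting to the next, which is what a reliable instrument should show. A value near 0 would suggest the scores are inconsistent.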
Validity
Validity refers to the accuracy of an assessment, that is, whether or not it measures what it is supposed to measure. Even if a test is reliable, it may not provide a valid measure.
Your school district is looking for an
assessment instrument to measure reading ability. They have narrowed the
selection to two possibilities. Test A provides data indicating that it has
high validity, but there is no information about its reliability. Test B
provides data indicating that it has high reliability, but there is no
information about its validity. Which test would you recommend? Why?
Types of Assessment in the Classroom
Almost everyone knows about the types of tests typically encountered in school. There are final exams, midterm exams, end-of-unit tests, pop quizzes and so on. All of those tests have one thing in common: they represent the teacher’s attempt to get a fix on how much the students have learned. More accurately, such tests are employed to determine a student’s status with respect to the knowledge and skills that the teacher is attempting to promote. If teachers are reasonably sure about what their students currently know, they can more accurately tailor instructional activities to what students need to know.
Assessments such as the quizzes and examinations that most of us took in school have historically been paper-and-pencil instruments. However, in recent years, educators have been urged to broaden their conception of testing so that students’ status is determined via a wider variety of measuring devices.
Formative and Summative
Formative and summative assessments are the two types of assessment that teachers use most regularly. Let us now look at what formative and summative assessments are.
a) Formative assessments are ongoing assessments, reviews, and observations in a classroom. Teachers use formative assessment to improve instructional methods and student feedback throughout the teaching and learning process. For example, if a teacher observes that some students do not grasp a concept, she or he can design a review activity or use a different instructional strategy. Likewise, students can monitor their progress with periodic quizzes and performance tasks.
The results of formative assessments are used to modify and validate instruction. Formative assessment is generally carried out throughout a course or project. Formative assessment, also referred to as ‘educative assessment’, is used to aid learning. In an educational setting, formative assessment could involve a teacher, a peer or the learner providing feedback on a student’s work, and it would not necessarily be used for grading purposes.
b) Summative assessments are typically used to evaluate the effectiveness of instructional programs and services at the end of an academic year or at a pre-determined time. The goal of summative assessments is to make a judgment of student competency after an instructional phase is complete. For example, in Malaysia, the final examination is administered once a year. It is a summative assessment that determines each student’s ability at pre-determined points in time.
Summative evaluations are used to determine if
students have mastered specific competencies and to identify instructional
areas that need additional attention. Summative assessment is generally carried
out at the end of a course or project. In an educational setting, summative
assessments are typically used to assign students a course grade.
Objective and Subjective
Assessment (either summative or formative) is often
categorized as either objective or subjective.
Objective assessment is a form of questioning which has a single correct answer. Its items are also known as selected-response items.
A scoring key for correct responses is created and can be applied by an examiner or by a computer. The scoring is easy, objective and reliable. The task is highly structured and clear, and it can measure both simple and complex learning outcomes. However, constructing good items is time-consuming, and the format is ineffective for measuring some types of problem solving.
Subjective assessment is a form of questioning which may have more than one correct answer (or more than one way of expressing the correct answer). Its items are known as constructed-response items.
It requires students to write out information rather than select a response from a menu. In scoring, many constructed-response items require judgment on the part of the examiner. It can measure the highest levels of learning outcomes, such as analysis, synthesis and evaluation, and the integration and application of ideas can be emphasised. In terms of preparation, essay-type questions can be prepared in less time than selection-type formats.
However, the disadvantage is that scoring is time-consuming, subjective and possibly unreliable.
There are various types of objective and subjective questions. Objective question types include true/false, multiple-choice, multiple-response and matching questions. Subjective questions include extended-response questions, restricted-response questions and essays.
Some have argued that the distinction between
objective and subjective assessments is neither useful nor accurate because, in
reality, there is no such thing as ‘objective’ assessment. In fact, all
assessments are created with inherent biases built into decisions about
relevant subject matter and content, as well as cultural (class, ethnic, and
gender) biases.
Bases of Comparison
Test results can be compared against an established criterion, against the performance of other students, or against the student’s own previous performance.
a) Criterion-referenced assessment
Criterion-referenced assessment, typically carried out using a criterion-referenced test, occurs, as the name implies, when candidates are measured against defined (and objective) criteria. It is often, but not always, used to establish a person’s competence (whether she or he can do something). The best-known example of criterion-referenced assessment is the driving test, in which learner drivers are measured against a range of explicit criteria (such as ‘Not endangering other road users’).
b) Norm-referenced assessment
Norm-referenced assessment, typically carried out using a norm-referenced test, does not measure candidates against defined criteria. This type of assessment is relative to the group of students undertaking the assessment. It is effectively a way of comparing students. The IQ test is the best-known example of norm-referenced assessment. Many entrance tests (to prestigious schools or universities) are norm-referenced, permitting a fixed proportion of students to pass (‘passing’ in this context means being accepted into the school or university rather than reaching an explicit level of ability). This means that standards may vary from year to year, depending on the quality of the cohort. Criterion-referenced assessment, on the other hand, does not vary from year to year (unless the criteria change).
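The contrast between the two bases of comparison can be sketched in a few lines of code. This is a simplified illustration under our own assumptions: the student names, scores, cutoff of 70 and two available places are all invented, and the function names `criterion_pass` and `norm_pass` are hypothetical.

```python
def criterion_pass(scores, cutoff):
    """Criterion-referenced: every student at or above a fixed cutoff passes."""
    return {name for name, score in scores.items() if score >= cutoff}

def norm_pass(scores, places):
    """Norm-referenced: only the top `places` students pass, whatever their scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:places])

# Two hypothetical cohorts sitting the same test in different years.
strong_year = {"Aina": 88, "Ben": 82, "Chong": 75, "Devi": 71}
weak_year = {"Emma": 68, "Farid": 61, "Gita": 54, "Hana": 49}

for cohort in (strong_year, weak_year):
    print(sorted(criterion_pass(cohort, cutoff=70)),
          sorted(norm_pass(cohort, places=2)))
```

With the fixed cutoff, all four students pass in the strong year and none pass in the weak year, so the standard stays the same while the pass rate moves. With two fixed places, exactly two students pass each year, so the pass rate stays the same while the standard needed to pass moves with the cohort.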
Informal and Formal
Assessment can be either formal or informal.
Internal and External Assessment
Internal assessment is set and marked by the school (i.e. teachers). Students get the mark and feedback regarding the assessment. External assessment is set by the governing body and is marked by non-biased personnel. With external assessment, students receive only a mark. Therefore, they have no idea how they actually performed (i.e. which parts they answered correctly).
Teacher-made Tests
Individuals interested in constructing a test are
confronted with challenges concerning what to assess, how to assess it, and
whether they measure it in a reliable and valid manner. These are fundamental
challenges to teachers who construct their own tests and to professionals who
design standardised tests. Consequently, these topics deserve our attention.
Teacher-made Paper-and-pencil Tests
Although teachers use many techniques in evaluating students, probably the most popular is the written paper-and-pencil test that they themselves construct. These usually consist of essay or multiple-choice items. The multiple-choice, paper-and-pencil test is probably the most frequently used, alongside other types such as true/false, essay and performance tests.
“Good multiple choice items are
difficult to prepare but can be scored easily and objectively. Essay tests, on
the other hand, are relatively easy to prepare but extremely difficult to
score.”
(Elliott, Kratochwill, Cook, & Travers, 2000)
Teacher-made Performance and Product Assessment
To assess some instructional outcomes or evaluate certain areas of students’ learning, using paper-and-pencil tests is inappropriate. For example, a teacher cannot determine how well a student can type a letter with a multiple-choice test. Sometimes a paper-and-pencil assessment will do, but it is not always the best method: there are instructional outcomes for which other assessment options work better.
For certain assessment situations, observation is clearly the most appropriate approach. For example, attitudes can be assessed by asking persons to respond in writing to a number of questions. But a person may say he is a good sport and then break the rules to win or walk away pouting. In such cases, more accurate information could be obtained by actually watching him compete.
Observation is used as a method of assessment when actions speak louder than words. In observation assessment, teachers gather data not by asking for information but by watching closely. The student being observed usually does not write anything, as she would on a paper-and-pencil test. Instead, the student performs some actions and her behaviour is observed and recorded by the teacher.
Performance assessments include what is commonly thought of as students’ actual performance, such as oral presentations, dance, music, art and physical education. In addition to student performance, student products may also be assessed. Book reports, papers, dioramas, science fair projects, artwork and portfolios are all examples of tangible output that can be assessed on a number of dimensions depending on the teacher’s objectives for the class. Even a student-composed answer to an essay question on a written test can be considered a product.
According to Gallagher:
“The teacher decides the important
dimensions and criteria for success the product accordingly.”
Teacher-made Rating Scales and Checklists
Rating scales and checklists are instruments teachers use to help with data gathering when assessing either a performance or a product. A rating scale is typically an instrument with a number of items related to a given variable, each item representing a continuum of categories between two extremes, usually with a number of points along the continuum highlighted in some way. Persons responding to the items place a mark to indicate their position on each item.
A checklist enumerates a number of behaviours or features that constitute a procedure or product. When a procedure is involved, the steps are typically listed in the desired order. According to Gallagher,
“The person completing the checklist indicates whether a given behaviour or feature occurred or is present.”
(Gallagher, 1998)
Teachers are not psychometricians. Thus, the assessment instruments teachers produce (such as tests and observation checklists) are not likely to be formally validated, field-tested and revised. Larger budgets than those teachers normally have at hand are required to produce assessments that meet such rigorous requirements. For these assessments, teachers usually turn to commercial test publishers.
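The difference between the two instruments described in this section can be made concrete by looking at the kind of data each one records. The sketch below is a minimal illustration for an oral presentation. The item wording, the 1 to 5 scale and the helper names `checklist_summary` and `rating_average` are our own assumptions, not taken from Gallagher.

```python
# Checklist: each behaviour is simply present (True) or absent (False).
checklist = {
    "States the topic": True,
    "Faces the audience": True,
    "Summarises main points": False,
}

# Rating scale: each item is placed on a continuum, here 1 (poor) to 5 (excellent).
rating_scale = {
    "Clarity of speech": 4,
    "Organisation of ideas": 3,
    "Use of visual aids": 5,
}

def checklist_summary(items):
    """Return (behaviours observed, behaviours listed)."""
    return sum(items.values()), len(items)

def rating_average(items, low=1, high=5):
    """Average rating, after checking that every mark lies on the scale."""
    for name, mark in items.items():
        if not low <= mark <= high:
            raise ValueError(f"{name}: {mark} is off the {low}-{high} scale")
    return sum(items.values()) / len(items)

print(checklist_summary(checklist))
print(rating_average(rating_scale))
```

The checklist can only say that two of the three listed behaviours occurred, while the rating scale records how well each aspect was performed. Which instrument is more useful depends on whether the teacher needs a yes/no record or a judgment of quality.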
1. Suppose that you have been told that, as part of your teacher certification process, there will be a performance assessment. What do you think would be the best performance assessment of your teaching skills?
2. What are some features of a rating scale? How does it differ from a checklist?
3. If you are a teacher, choose one suitable assessment that can be used to measure your students’ performance. Explain why you think that type of assessment is suitable for measuring your target area.
10.6 Standardised Tests
Standardised tests are no better as assessment tools than teacher-made tests. They are, however, better suited to large-scale data collection and to situations where uniform comparisons across students are crucial.
According to Airasian (1997):
“Standardised tests are intended to be administered, scored, and interpreted in the same way for all test takers, regardless of where or when they are assessed.”
(Airasian, 1997 as cited in Tan, Parsons, Hinson, & Sardo-Brown, 2003)
These tests can be administered individually or to groups and provide students with feedback. They often serve the bureaucratic needs of educational leaders. They are also used to inform admission and selection procedures, to sort and identify the special needs of students, and to provide accountability information about the efficacy of schooling at all levels. Many standardised tests, therefore, are norm-referenced by design, as they are created to make comparisons between students along a specified measure. Most standardised tests measure aptitude and achievement.
10.6.1 Standardised Achievement Test
Some characteristics of standardised achievement tests are discussed below.
The tests aim to measure attainment of objectives in school-based curricula. They explicitly try to gauge skills and knowledge developed as a result of specific instruction. This is the principle that differentiates them from ‘ability’ tests.
Most standardised tests use multiple-choice items, at least to a substantial degree. Some of the tests are entirely multiple-choice. Many use a mixture of multiple-choice and constructed-response items. Use of constructed-response items is increasing modestly. However, multiple-choice items still predominate in most areas.
Developers of standardised achievement tests typically pay considerable attention to the technical characteristics desired for tests. They usually have reliability data, item analysis and other such technical information for these tests. Such information usually appears in a technical manual or other formal report for the test. In addition, the pre-publication research typically includes professional reviews for cultural, racial, ethnic, and gender bias.
The interpretation of scores on standardised achievement tests relies on the use of large norm groups, and the quality of these norms has usually been quite good. When the norm is not national in scope, it is at least based on a reasonably large group considered relevant for the purposes of the test.
10.6.2 Standardised Ability Test
Human beings possess and display a wide range of abilities. Some are easy to identify, for example athletic ability, musical talent and artistic flair. Some abilities are more difficult to recognise. Traditionally, the type of ability of most interest in educational circles has been mental ability. Most people believe that mental ability has something to do with success in school. However, it is not the only factor that matters for success in school.
Effort, motivation, concentration and other variables are also important. But it seems that human beings have a level of mental ability that makes school learning more or less difficult. The mental ability tests used in schools fall into two general classifications: individually administered tests and group-administered tests.
(a) Individually administered tests
A teacher would not ordinarily administer such
tests. However, teachers will encounter reports of scores from these tests.
Thus, teachers need to be familiar with them. It is expensive and
time-consuming to give these tests. Usually, they are administered only to
students with special needs. These tests play a key role in the identification
of learning disabilities, mental retardation and other such conditions. They
may also be used in the selection of students for gifted programs. Most
students will complete school without ever taking one of these tests.
(b) Group-administered tests
The most obvious feature of the group test is that, as suggested by the name of this category, it can be administered to a group all at one time. The most typical arrangement would be administration to a classroom of students. However, with appropriate spacing and proctoring, these tests can be administered to hundreds of people at once.
Administration of these tests does not require the specialised training needed to administer the individual tests. Many teachers will administer one of these group tests to their students.
Items appearing in group tests are very similar in many ways to the items appearing in individually administered tests.
According to Hogan (2007):
“The group tests essentially tried
to duplicate, as far as possible, the individual tests but in format suitable
for group administration.”
(Hogan, 2007)
1. What are some of the features of standardised tests?
2. Identify at least three examples of abilities that can be measured. Which type of ability attracts the most interest in educational circles?
3. Most standardised tests measure achievement. List some of the characteristics of achievement tests.
10.6.3 Standardised Aptitude Test
Aptitude tests are used to predict what students can learn. Aptitude tests do not measure innate capacity or learning potential directly; rather, they measure performance based on learning abilities. Intelligence tests are perhaps the best example of aptitude tests commonly used in schools. It is interesting to note some differences between aptitude and achievement tests. According to Elliott, Kratochwill, Cook and Travers,
“An aptitude test predicts an individual’s performance in a certain task or in a particular job by sampling the cumulative effect on the individual of many experiences in daily living, including specific educational experience. Aptitude tests measure only innate capacity, while achievement tests measure only the effects of learning.”
(Elliott, Kratochwill, Cook, & Travers, 2000)
10.6.4 Standardised Personality Test
There are several types of personality measures. Many of these tests concentrate on clinical applications that fall outside the realm of ordinary educational assessment. Therefore, we provide just an overview of these categories, with special reference to the types of measures a teacher might encounter.
(a) Self-report inventories
Self-report inventories are the most widely used type of personality test. This type of test consists of simple statements to which a person responds True or False, Yes or No, or with similarly simple options. Both the statements and the responses are very simple. They are sometimes called objective inventories because they can be objectively scored.
(b) Projective techniques
Projective techniques present a person with an ambiguous or innocuous stimulus and a simple instruction about how to respond. The person constructs a response with near maximum freedom. The response reveals something about the individual’s personality, motivations and inner dynamics. The use of these techniques requires advanced training and supervised experience. Even then, their value is controversial. They are widely used by professionals such as school psychologists. Therefore, although teachers would never administer these instruments, they do need to have some familiarity with them. The classic, well-known projective techniques are the Rorschach Inkblot Test and the Thematic Apperception Test (TAT).
(c) Behaviour rating scales
Behaviour rating scales are now routinely used in schools for the determination of conditions such as attention disorder, hyperactivity, depression and assorted emotional problems. They have two essential features. First, someone other than the person being evaluated, such as a teacher, parent or other caregiver, completes the rating. Second, as suggested by the title, the scale lists specific behaviours. The person completing the form indicates the frequency of observing each behaviour.
Hogan (2007) stated that:
“The descriptor is usually short: 1 to 3 words. The ratings are made on a 3- to 5-point scale, typically ranging from ‘never’ to ‘always’ or ‘definitely not true’ to ‘definitely true’.”
(Hogan, 2007)
10.7 Authentic Assessment
Authentic assessment is a kind of assessment that directly measures students’ performance through real-life tasks or products. This alternative form of assessment includes tasks or products such as creating an original piece of artwork, writing a paper, delivering a speech and so on. Often, teachers who use authentic assessment are interested not only in the products of learning but also in the processes that students use to prepare such products. Thus, a portfolio of writing samples may be used to chart the development of students’ writing skills over time as they relate to the production of a final editorial. In some cases, teachers may videotape students practising their delivery of a speech on successive occasions to document their growth in the development of the final version.
Authentic assessment has several advantages over traditional assessment. These advantages are as follows.
Authentic assessments use real-world applications in which students are asked to be active participants in performing, creating or producing something.
They are more likely than traditional assessments to call upon higher-order thinking or problem-solving skills.
They ensure that the students are actively involved in constructing understanding.
The two most popular forms of authentic assessment are performance assessment and portfolio assessment.
10.7.1 Performance Assessment
Performance assessment of higher-level thinking often emphasises ‘doing’: open-ended activities for which there is no one correct answer. The tasks are sometimes realistic, and many, but not all, performance assessments are authentic. Evaluating performance often involves direct methods of evaluation, self-assessment, assessment of group performance as well as individual performance, and an extended period of time.
10.7.2 Portfolio Assessment
A portfolio is a systematic and organised collection of a student’s work, compiled by students and teachers, that is reviewed against preset criteria to judge a student or program. Four classes of evidence can be included: artefacts, reproductions, attestations, and productions. The collection consists of the products of learning, such as a videotape, a piece of artwork, a journal entry or an essay, and the portfolio items differ according to the content area.
A portfolio provides tangible evidence of accomplishments and skills, and it must be updated as the person grows and changes.
Portfolios serve two broad purposes: to document growth through a growth portfolio and to showcase the student’s most outstanding work through a best-work portfolio. The strengths of learning portfolios include capturing the complexity and completeness of the student’s work and accomplishments, as well as encouraging students’ decision-making and self-reflection. Their weaknesses include the time required to coordinate and evaluate them and the difficulty of evaluating them.
Principles of Authentic Assessment
There are six principles of authentic assessment. They are described as follows.
1. Authentic assessment is continuous, informing every aspect of instruction and curriculum building. As they engage in authentic assessment, teachers discover and learn what to teach as well as how and when to teach it.
2. Authentic assessment is an integral part of the curriculum. Children are assessed while they are involved with classroom learning experiences, not just before or after a unit through pre- or post-tests.
3. Authentic assessment is developmentally and culturally appropriate.
4. Authentic assessment focuses on students’ strengths. Teachers assess what students can do, what they know, and how they can use what they know to learn.
5. Authentic assessment recognises that the most important evaluation is self-evaluation. Students and teachers need to understand why they are doing what they are doing so that they may have some sense of their own success and growth.
6. Authentic assessment invites active collaboration: teachers, students and parents work together to reflect on and assess learning (Bridges, 1995).
1. What makes an assessment “authentic”?
2. How can portfolios be used in assessment?