Dear students, be ready for the final exam.

Basic Concepts in Assessment



CLASSROOM ASSESSMENT


One of the most basic and difficult tasks that teachers face in their work is assessment. Classroom assessment includes all the processes involved in making decisions about students' learning progress. It includes the observation of students' written work, their answers to questions in class, and their performance on teacher-made and standardized tests.
According to Koyalik (2002, as cited in Eggen & Kauchak, 2004), assessment helps teachers make decisions about learning progress through systematic information gathering.
Besides that, assessment also accomplishes two other important goals: increasing learning and increasing motivation.
The relationship between learning and assessment is very strong. Students learn more in classes where assessment is an integral part of instruction than in those where it isn't. Brief assessments that provide frequent feedback about learning progress are more effective than long, infrequent ones, such as once-a-term tests.


Basic Concepts in Assessment
This section introduces two basic concepts that underlie all assessment: reliability and validity.
 Reliability
Reliability refers to the extent to which assessments are consistent. Just as we enjoy having reliable cars (cars that start every time we need them), we strive to have reliable, consistent instruments to measure students’ achievement.
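One common way to put a number on consistency is test-retest reliability: give the same test to the same students twice and compute the correlation between the two sets of scores. This method is not described in the text above, so treat the sketch below as an illustrative assumption; the student scores are hypothetical. A value of r close to 1 means students kept roughly the same relative standing on both sittings.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances (the 1/n factors cancel in the ratio)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of five students on the same test, taken twice
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability r = {r:.2f}")
```

A high r only shows consistency; as the next subsection explains, a consistent test can still fail to measure what it is supposed to measure.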

Validity
Validity refers to the accuracy of an assessment: whether or not it measures what it is supposed to measure. Even if a test is reliable, it may not provide a valid measure.
Your school district is looking for an assessment instrument to measure reading ability. They have narrowed the selection to two possibilities. Test A provides data indicating that it has high validity, but there is no information about its reliability. Test B provides data indicating that it has high reliability, but there is no information about its validity. Which test would you recommend? Why?

Types of Assessment in the Classroom
Almost everyone knows about the types of tests typically encountered in school. There are final exams, midterm exams, end-of-unit tests, pop quizzes and so on. All of those tests have one thing in common: they represent teachers' attempts to get a fix on how much the students have learned. More accurately, such tests are employed to determine a student's status with respect to the knowledge and skills that the teacher is attempting to promote. If teachers are reasonably sure about what their students currently know, they can more accurately tailor instructional activities to what students need to know.
The quizzes and examinations that most of us took in school have historically been paper-and-pencil instruments. However, in recent years, educators have been urged to broaden their conception of testing so that students' status is determined via a wider variety of measuring devices.
 Formative and Summative
Formative and summative assessments are the two types of assessment most commonly used by teachers. Let us now look at what formative and summative assessments are.


a) Formative assessments are ongoing assessments, reviews, and observations in a classroom. Teachers use formative assessment to improve instructional methods and to give students feedback throughout the teaching and learning process. For example, if a teacher observes that some students do not grasp a concept, she or he can design a review activity or use a different instructional strategy. Likewise, students can monitor their progress with periodic quizzes and performance tasks.
The results of formative assessments are used to modify and validate instruction. Formative assessment is generally carried out throughout a course or project. Formative assessment, also referred to as 'educative assessment', is used to aid learning. In an educational setting, formative assessment might be a teacher (or peer) or the learner providing feedback on a student's work, and it would not necessarily be used for grading purposes.
b) Summative assessments are typically used to evaluate the effectiveness of instructional programs and services at the end of an academic year or at a pre-determined time. The goal of summative assessments is to make a judgment of student competency, after an instructional phase is complete. For example, in Malaysia, the final examination is administered once a year. It is a summative assessment to determine each student’s ability at pre-determined points in time.
Summative evaluations are used to determine if students have mastered specific competencies and to identify instructional areas that need additional attention. Summative assessment is generally carried out at the end of a course or project. In an educational setting, summative assessments are typically used to assign students a course grade.

Objective and Subjective                                                                            
Assessment (either summative or formative) is often categorized as either objective or subjective.
Objective assessment is a form of questioning in which each question has a single correct answer. Its items are also known as selected-response items.
A scoring key for correct responses is created and can be applied by an examiner or by a computer. The scoring is easy, objective and reliable. The tasks are highly structured and clear, and they can measure both simple and complex learning outcomes. However, constructing good items is time-consuming, and the format is ineffective for measuring some types of problem-solving skills.
Subjective assessment is a form of questioning in which a question may have more than one correct answer (or more than one way of expressing the correct answer). Its items are known as constructed-response items.
It requires students to write out information rather than select a response from a menu. In scoring, many constructed-response items require judgment on the part of the examiner. It can measure the highest levels of learning outcomes, such as analysis, synthesis and evaluation, and the integration and application of ideas can be emphasized. In terms of preparation, essay-type questions can be prepared in less time than selection-type items.
However, the disadvantage is that scoring is time-consuming, subjective and possibly unreliable. There are various types of objective and subjective questions. Objective question types include true/false, multiple-choice, multiple-response and matching questions. Subjective questions include extended-response questions, restricted-response questions and essays.
Some have argued that the distinction between objective and subjective assessments is neither useful nor accurate because, in reality, there is no such thing as 'objective' assessment. In fact, all assessments are created with inherent biases built into decisions about relevant subject matter and content, as well as cultural (class, ethnic, and gender) biases.
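Because each selected-response item has exactly one correct answer, applying the scoring key is a purely mechanical comparison, which is why a computer can do it. The sketch below assumes a hypothetical five-item multiple-choice quiz; the key, the student's responses and the function name `score` are illustrative only.

```python
# Hypothetical answer key for a five-item multiple-choice quiz
ANSWER_KEY = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}


def score(responses, key=ANSWER_KEY):
    """Count the items whose response matches the key.

    Missing items count as wrong; letter case and stray spaces are ignored,
    so two examiners (or a computer) always arrive at the same score.
    """
    return sum(
        1
        for item, correct in key.items()
        if responses.get(item, "").strip().upper() == correct
    )


student = {1: "B", 2: "C", 3: "A", 4: "C", 5: "b"}
print(score(student))  # 4
```

No comparable function could score a constructed-response item, because deciding whether an essay answer is acceptable requires the examiner's judgment.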
Bases of Comparison                     
Test results can be compared against an established criterion, against the performance of other students, or against the student's own previous performance.
a) Criterion-referenced assessment
A criterion-referenced test, as the name implies, measures candidates against defined (and objective) criteria. Criterion-referenced assessment is often, but not always, used to establish a person's competence (whether she or he can do something). The best known example of criterion-referenced assessment is the driving test, in which learner drivers are measured against a range of explicit criteria (such as 'Not endangering other road users').
b) Norm-referenced assessment
Norm-referenced assessment, which typically uses a norm-referenced test, is not measured against defined criteria. This type of assessment is relative to the student body undertaking the assessment; it is effectively a way of comparing students. The IQ test is the best known example of norm-referenced assessment. Many entrance tests (to prestigious schools or universities) are norm-referenced, permitting a fixed proportion of students to pass ('passing' in this context means being accepted into the school or university rather than reaching an explicit level of ability). This means that standards may vary from year to year, depending on the quality of the cohort. Criterion-referenced assessment, on the other hand, does not vary from year to year (unless the criteria change).
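The difference between the two bases of comparison can be made concrete with two small decision rules: a criterion-referenced rule passes everyone who reaches a fixed cutoff, while a norm-referenced rule passes a fixed proportion of the cohort. The cutoff of 70, the 50% pass proportion, the two cohorts and the student names below are all hypothetical; ties in the norm-referenced ranking are broken arbitrarily in this sketch.

```python
def criterion_referenced(scores, cutoff):
    """Pass everyone whose score reaches the fixed cutoff."""
    return {name for name, s in scores.items() if s >= cutoff}


def norm_referenced(scores, proportion):
    """Pass a fixed proportion of the cohort: the top-ranked students."""
    n_pass = int(len(scores) * proportion)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_pass])


# Two hypothetical cohorts sitting the same test in different years
cohort_a = {"Ali": 82, "Bala": 74, "Chen": 68, "Dina": 55}
cohort_b = {"Eve": 95, "Farid": 91, "Gita": 88, "Hana": 80}

for cohort in (cohort_a, cohort_b):
    print(sorted(criterion_referenced(cohort, cutoff=70)),
          sorted(norm_referenced(cohort, proportion=0.5)))
```

Under the norm-referenced rule, Hana fails with 80 marks in the stronger cohort, while Bala passes with 74 in the weaker one; under the criterion-referenced rule both pass, because the standard does not depend on who else sat the test.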
Informal and Formal                         
Assessment can be either formal or informal. 
Internal and External Assessment
Internal assessment is set and marked by the school (i.e. teachers). Students get the mark and feedback regarding the assessment. External assessment is set by the governing body and is marked by unbiased personnel. With external assessment, students only receive a mark, so they may not know how they actually performed (e.g. which parts they answered correctly).
Teacher-made Tests
Individuals interested in constructing a test are confronted with challenges concerning what to assess, how to assess it, and whether they are measuring it in a reliable and valid manner. These are fundamental challenges both for teachers who construct their own tests and for professionals who design standardised tests. Consequently, these topics deserve our attention.
Teacher-made Paper-and-pencil Tests
Although teachers use many techniques in evaluating students, probably the most popular is the written paper-and-pencil test that they construct themselves. These tests usually consist of essay or multiple-choice items. The multiple-choice, paper-and-pencil test is probably the most frequently used, alongside other types such as true/false, essay and performance tests.
“Good multiple choice items are difficult to prepare but can be scored easily and objectively. Essay tests, on the other hand, are relatively easy to prepare but extremely difficult to score.”
(Elliott, Kratochwill, Cook, & Travers, 2000)
 Teacher-made Performance and Product Assessment
To assess some instructional outcomes or evaluate certain areas of students' learning, paper-and-pencil tests are inappropriate. For example, a teacher cannot determine how well a student can type a letter with a multiple-choice test. Sometimes a paper-and-pencil assessment will do, but it is not always the best method; for some instructional outcomes, other assessment options work better.
For certain assessment situations, observation is clearly the most appropriate approach. For example, attitudes can be assessed by asking persons to respond in writing to a number of questions. But a person may say he is a good sport and then break the rules to win or walk away pouting. In such cases, more accurate information could be obtained by actually watching him compete.
Observation is used as a method of assessment when actions speak louder than words. In observation assessment, teachers gather data not by asking for information but by watching closely. The student being observed usually does not write anything, as she would on a paper-and-pencil test. Instead, the student performs some actions, and her behaviour is observed and recorded by the teacher.
Performance assessments include what is commonly thought of as students' actual performance, such as oral presentation, dance, music, art and physical education. In addition to student performance, student products may also be assessed. Book reports, papers, dioramas, science fair projects, artwork and portfolios are all examples of tangible output that can be assessed on a number of dimensions, depending on the teacher's objectives for the class. Even a student-composed answer to an essay question on a written test can be considered a product.
According to Gallagher:                                                                                         
“The teacher decides the important dimensions and criteria for success the product accordingly.”
Teacher-made Rating Scales and Checklists
Rating scales and checklists are instruments teachers use to help with data gathering when assessing either a performance or a product. A rating scale is typically an instrument with a number of items related to a given variable, each item representing a continuum of categories between two extremes, usually with a number of points along the continuum highlighted in some way. Persons responding to the items place a mark to indicate their position on each item.
A checklist enumerates a number of behaviours or features that constitute a procedure or product. When a procedure is involved, the steps are typically listed in the desired order. According to Gallagher (1998):
“The person completing the checklist indicates whether a given behaviour or feature occurred or is present.”
(Gallagher, 1998)
Teachers are not psychometricians. Thus, the assessment instruments they produce (such as tests and observation checklists) are unlikely to be formally validated, field-tested and revised. Producing assessments that meet such rigorous requirements takes larger budgets than teachers normally have at hand, so for these assessments teachers usually turn to commercial test publishers.
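To make the difference between a checklist and a rating scale concrete, the sketch below records both for the letter-typing task mentioned earlier: the checklist stores only whether each feature is present, while the rating scale places each item on a 1 to 5 continuum. The features, items, ratings and function names are hypothetical, and averaging ratings is a simplification a teacher may or may not want.

```python
# Checklist: each feature of the typed letter is simply present or absent,
# listed in the order it should appear.
letter_checklist = {
    "Date at top": True,
    "Greeting": True,
    "Body paragraphs": True,
    "Closing": False,
    "Signature": True,
}

# Rating scale: each item is placed on a continuum from 1 (poor) to 5 (excellent).
letter_rating = {
    "Organisation": 4,
    "Spelling and grammar": 3,
    "Tone": 5,
}


def checklist_summary(checklist):
    """Report how many listed features were observed."""
    present = sum(checklist.values())  # True counts as 1, False as 0
    return f"{present} of {len(checklist)} features present"


def rating_summary(ratings, low=1, high=5):
    """Check every rating is on the scale, then return the mean rating."""
    for item, value in ratings.items():
        if not low <= value <= high:
            raise ValueError(f"{item}: rating {value} outside {low}-{high}")
    return sum(ratings.values()) / len(ratings)


print(checklist_summary(letter_checklist))
print(rating_summary(letter_rating))
```

The checklist answers "did it happen?", while the rating scale answers "how well?", which is why the two are often used together when assessing a performance or product.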
1. Suppose that you have been told that, as part of your teacher certification process, there will be a performance assessment. What do you think is the best performance assessment of your teaching skills?
2. What are some features of a rating scale? How does it differ from a checklist?
3. If you were a teacher, which assessment would you choose to measure your students' performance? Explain why you think that type of assessment is suitable for measuring your target area.
10.6 Standardised Tests
Standardised tests are no better as assessment tools than teacher-made tests. Standardised tests are better suited to large-scale data collection and to situations where uniform comparisons across students are crucial.
According to Airasian (1997, as cited in Tan, Parsons, Hinson, & Sardo-Brown, 2003):
“Standardised tests are intended to be administered, scored, and interpreted in the same way for all test takers, regardless of where or when they are assessed.”
(Airasian, 1997 as cited in Tan, Parsons, Hinson, & Sardo-Brown, 2003)
These tests can be administered individually or to groups, and they provide students with feedback. They often serve the bureaucratic needs of educational leaders. They are also used to inform admission and selection procedures, to sort students and identify their special needs, and to provide accountability information about the efficacy of schooling at all levels. Many standardised tests are therefore norm-referenced by design, as they are created to make comparisons between students along a specified measure. Most standardised tests measure aptitude and achievement.
10.6.1 Standardised Achievement Test
Standardised achievement tests have the following characteristics.
The tests aim to measure attainment of objectives in school-based curricula. They explicitly try to gauge skills and knowledge developed as a result of specific instruction. This is the principle that differentiates them from ‘ability’ tests.
Most standardised tests use multiple-choice items, at least to a substantial degree. Some of the tests are entirely multiple-choice, and many use a mixture of multiple-choice and constructed-response items. Use of constructed-response items is increasing modestly; however, multiple-choice items still predominate in most areas.
Developers of standardized achievement tests typically pay considerable attention to the technical characteristics desired for tests. They usually have reliability data, item analysis and other such technical information for these tests. Such information usually appears in a technical manual or other formal report for the test. In addition, the pre-publication research typically includes professional reviews for cultural, racial, ethnic, and gender bias.
The interpretation of scores on standardised achievement tests relies on use of large-usually been quite good. When the norm is not national in scope, it is at least based on a reasonably large group considered relevant for the purposes of the test.
10.6.2 Standardised Ability Test
Human beings possess and display a wide range of abilities. Some are easy to identify, for example athletic ability, musical talent and artistic flair. Some abilities are more difficult to recognise. Traditionally, the type of ability of most interest in educational circles has been mental ability. Most people believe that mental ability has something to do with success in school. However, it is not the only important factor in succeeding in school.
Effort, motivation, concentration and other variables are also important. But it seems that human beings have a level of mental ability that makes school learning more or less difficult. The mental ability tests used in schools fall into two general classifications: individually administered tests and group-administered tests.
(a) Individually administered tests
(a) Individual administered tests
A teacher would not ordinarily administer such tests. However, teachers will encounter reports of scores from these tests. Thus, teachers need to be familiar with them. It is expensive and time-consuming to give these tests. Usually, they are administered only to students with special needs. These tests play a key role in the identification of learning disabilities, mental retardation and other such conditions. They may also be used in the selection of students for gifted programs. Most students will complete school without ever taking one of these tests.
(b) Group-administered tests
The most obvious feature of the group test is that, as suggested by the name of this category, it can be administered to a group all at one time. The most typical arrangement would be administration to a classroom of students. However, with appropriate spacing and proctoring, these tests can be administered to hundreds of people at once.
Administration of these tests does not require the specialised training needed to administer the individual tests. Many teachers will administer one of these group tests to their students.
Items appearing in group tests are in many ways very similar to the items appearing in individually administered tests.
According to Hogan (2007):
“The group tests essentially tried to duplicate, as far as possible, the individual tests but in a format suitable for group administration.”
(Hogan, 2007)
1. What are some of the features of standardised tests?
2. Identify at least three examples of abilities that can be measured. What type of ability is of most interest in educational circles?
3. Most standardised tests measure achievement. List some of the characteristics of standardised achievement tests.
10.6.3 Standardised Aptitude Test
Aptitude tests are used to predict what students can learn. Aptitude tests do not measure innate capacity or learning potential directly; rather, they measure performance based on learning abilities. Intelligence tests are perhaps the best example of aptitude tests commonly used in schools. It is interesting to note some differences between aptitude and achievement tests. According to Elliott, Kratochwill, Cook and Travers:
“An aptitude test predicts an individual’s performance in a certain task or in a particular job by sampling the cumulative effect on the individual of many experiences in daily living, including specific educational experience. Aptitude tests measure only innate capacity, while achievement tests measure only the effects of learning.”
(Elliott, Kratochwill, Cook, & Travers, 2000)
10.6.4 Standardised Personality Test
There are several types of personality measures. Many of these tests concentrate on clinical applications that fall outside the realm of ordinary educational assessment. Therefore, we provide just an overview of these categories, with special reference to the types of measures a teacher might encounter.
(a) Self report inventories
These are the most widely used type of personality test. This type of test consists of simple statements to which a person responds True or False, Yes or No, or similarly simple options. Both the statements and the responses are simple. They are sometimes called objective inventories because they can be objectively scored.
(b) Projective techniques
These techniques present a person with an ambiguous or innocuous stimulus and a simple instruction about how to respond. The person constructs a response with near-maximum freedom, and the response is taken to reveal something about the individual’s personality, motivations and inner dynamics. The use of these techniques requires advanced training and supervised experience, and even then their value is controversial. They are widely used by professionals such as school psychologists. Therefore, although teachers would not administer these instruments, they do need to have some familiarity with them. The classic, well-known projective techniques are the Rorschach Inkblot Test and the Thematic Apperception Test (TAT).

(c) Behaviour rating scales
These scales are now routinely used in schools to help determine conditions such as attention disorder, hyperactivity, depression and assorted emotional problems. They have two essential features. First, someone other than the person being evaluated, such as a teacher, parent or other caregiver, completes the rating. Second, as suggested by the name, the scale lists specific behaviours, and the person completing the form indicates how frequently each behaviour is observed.
According to Hogan (2007):
“The descriptors are usually short: 1 to 3 words. The ratings are made on a 3- to 5-point scale, typically ranging from “never” to “always” or “definitely not true” to “definitely true”.”
(Hogan, 2007)
10.7 Authentic Assessment
Authentic assessment is a kind of assessment that directly measures students’ performance through real-life tasks or products. Examples of such tasks or products include creating an original piece of artwork, writing a paper and delivering a speech. Often teachers who use authentic assessment are interested not only in the products of learning but also in the processes that students use to prepare such products. Thus, a portfolio of writing samples may be used to chart the development of students’ writing skills over time as they relate to the production of a final editorial. In some cases, teachers may videotape students practising the delivery of a speech on successive occasions to document their growth toward the final version.
Authentic assessment has several advantages over traditional assessment. These advantages are as follows.
They use real-world applications in which students are asked to be active participants in performing, creating or producing something.
They are more likely than traditional assessments to call upon higher-order thinking or problem solving skills.
They ensure that the students are actively involved in constructing understanding.
The two most popular forms of authentic assessment are performance assessment and portfolio assessment.
10.7.1 Performance Assessment
Performance assessment of higher-level thinking often emphasises ‘doing’: open-ended activities for which there is no one correct answer. The tasks are sometimes realistic, and many, but not all, performance assessments are authentic. Evaluating performance often involves direct methods of evaluation, self-assessment, assessment of group as well as individual performance, and an extended period of time.
10.7.2 Portfolio Assessment
A portfolio is a systematic and organised collection of a student’s work, compiled by the student and teacher, that is reviewed against preset criteria to judge a student or program. Four classes of evidence can be included: artefacts, reproductions, attestations and productions. The collection consists of products of learning such as a videotape, a piece of artwork, a journal entry or an essay, and the portfolio items differ according to the content area.
A portfolio provides tangible evidence of accomplishments and skills, and it must be updated as the person grows and changes.
Portfolios serve two broad purposes: to document growth through a growth portfolio and to showcase the student’s most outstanding work through a best-work portfolio. The strengths of learning portfolios include capturing the complexity and completeness of students’ work and accomplishments, as well as encouraging students’ decision-making and self-reflection. Their weaknesses include the time required to coordinate and evaluate them and the difficulty of evaluating them.
10.7.3 Principles of Authentic Assessment
There are six principles of authentic assessment. They are described as follows.
1. Authentic assessment is continuous, informing every aspect of instruction and curriculum building. As they engage in authentic assessment, teachers discover what to teach as well as how and when to teach it.
2. Authentic assessment is an integral part of the curriculum. Children are assessed while they are involved in classroom learning experiences, not just before or after a unit through pre- or post-tests.
3. Authentic assessment is developmentally and culturally appropriate.
4. Authentic assessment focuses on students’ strengths. Teachers assess what students can do, what they know, and how they can use what they know to learn.
5. Authentic assessment recognises that the most important evaluation is self-evaluation. Students and teachers need to understand why they are doing what they are doing so that they may have some sense of their own success and growth.
6. Authentic assessment invites active collaboration: teachers, students and parents work together to reflect on and assess learning (Bridges, 1995).
1. What makes an assessment “authentic”?
2. How can portfolios be used in assessment?