|
Techniques for evaluating assessment
measures that involve rubrics suffer from the reliability/validity
problem. Reliability
refers to whether you get the same result across multiple
occurrences and multiple judges. Does the same result occur each time
you measure?
To measure Reliability, there must be
multiple occurrences:
1. Two or more judges evaluating the
same set of measures (inter-rater reliability)
2. One judge evaluating the same measures more
than once (test-retest reliability; for a single judge, also called
intra-rater reliability)
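As a rough illustration of the first case, agreement between two judges can be summarized numerically. The sketch below is not from the original text. It assumes each judge assigns one categorical rubric level per piece of work, and it uses two common statistics: raw percent agreement and Cohen's kappa, which discounts the agreement expected by chance. The function names and the scores are made up for the example.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two judges gave the same rubric level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both judges must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both judges pick the same level
    # if each judge assigned levels independently at their own base rates.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        # Both judges used one identical level for everything; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical rubric levels (1-4) from two judges on the same eight essays.
judge_1 = [4, 3, 2, 4, 1, 3, 2, 4]
judge_2 = [4, 3, 3, 4, 1, 2, 2, 4]

print(percent_agreement(judge_1, judge_2))  # 0.75
print(round(cohens_kappa(judge_1, judge_2), 3))  # 0.652
```

The same functions could be applied to the second case by passing one judge's first and second passes over the same work as the two lists.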
To improve Reliability:
1. Better training for coders (if
students)
2. Discussion for agreement (if
faculty)
3. If the score results from averaging
across coders, the reliability of that averaged score increases with more coders.
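One standard way to see why averaging across coders helps is the Spearman-Brown prediction formula. The source does not name it, so treat this as an illustrative sketch. It assumes the coders are interchangeable (each has the same single-coder reliability), which may not hold in practice. The function name and the starting value of 0.5 are hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Predicted reliability of the average of n_raters interchangeable coders."""
    r = single_rater_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("single-rater reliability must be between 0 and 1")
    if n_raters < 1:
        raise ValueError("need at least one rater")
    return n_raters * r / (1 + (n_raters - 1) * r)


# Assumed single-coder reliability of 0.5, averaged over 1, 2, and 4 coders.
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.5, k), 3))
# 1 0.5
# 2 0.667
# 4 0.8
```

Under these assumptions the gain from each additional coder shrinks, so better training or discussion may be cheaper than adding many coders.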
Validity
refers to whether you are measuring what you are intending to measure. If
you want to measure writing ability and measure that using a multiple
choice test, your measure is probably NOT valid. A more valid measure
would probably involve an essay of some sort.
Validity definitions (What are the
criteria?):
Is there an objectively defined response to the
problem? (construct validity)
Does it look like it measures what it
was intended to measure? (face validity)
How does it compare to other measures
of the same variable? (concurrent validity)
To measure Validity:
Agreement within the department on the rubric
should yield face validity.
Construct validity
is inherently difficult to measure in disciplines without objective
answers. Performance on essays or projects is difficult to compare to some
objective criterion.
Correlations between different measures
(or different experts) of the same skill can establish a degree of
concurrent validity. When the measures being compared are different
experts' ratings of the same work, concurrent validity and inter-rater
reliability are assessed in the same way.
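A correlation of this kind can be computed directly. The sketch below assumes numeric scores from two measures of the same skill for the same students and uses the Pearson correlation coefficient. Both the choice of Pearson and the scores are assumptions made for the example, not something the original text specifies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numeric scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one measure never varies")
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical data: rubric-based essay scores and scores on a second
# writing measure for the same six students.
essay_rubric_scores = [12, 15, 9, 18, 14, 11]
second_measure_scores = [70, 82, 61, 90, 75, 68]

print(round(pearson_r(essay_rubric_scores, second_measure_scores), 2))
```

A value near 1 would suggest the two measures rank students similarly. How high is high enough is a judgment call for the department.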
Issues related to reliability and
validity are at the heart of many disagreements and difficulties in the
assessment process. They are not always easy to measure or fix.
|