Tests given in psychology

greenspun.com : LUSENET : History & Theory of Psychology : One Thread

Can a written test that is given in psychology be reliable but not valid, can it be valid but not reliable, or does it have to be both reliable and valid?

-- Chrystal Wells (cw143143@aol.com), September 09, 2002

Answers

A test can be very reliable, yet not valid. However, if it is not reliable, it follows that it is not valid. Any good psychometric instrument must be both reliable and valid.

-- Daniel J. Denis (dand@yorku.ca), September 09, 2002.

I think I would differ from Daniel here. You can measure the right thing, but do it badly -- that would be valid but not reliable. You can also measure the wrong thing, but do it very well -- that would be reliable but not valid.

-- Christopher Green (cgreen@chass.utoronto.ca), September 09, 2002.

Both qualities are needed, especially when test results are used to guide a decision (for example, which track a student should follow, or which applicant to choose for a promotion).
On reliability, see: Knapp, T. R. (2001). The Reliability of Measuring Instruments. Vancouver, B.C.: Edgeworth Laboratory for Quantitative Educational and Behavioral Science Series. It may be available at: http://www.educ.ubc.ca/faculty/zumbo/series/knapp/knapp.pdf
On validity, see: Messick, S. (1989). Validity. In R. Linn (Ed.), Educational Measurement. New York: Macmillan.

-- Rock Faulkner (rock.faulkner@umontreal.car), September 11, 2002.

Technically speaking, validity can never be higher than reliability.
If you measure the right thing but do it very badly, the validity of your result will be poor, no matter how well you have chosen what to measure.
Example: suppose you want to measure people's body temperature, and you have an old thermometer with very low reliability.
Even if the "validity" of your measurement is OK, would you trust the data you gather? I doubt you would feel confident claiming that "mean body temperature is 36.8°".
P.

-- Peter Doomen (peter.doomen@advalvas.be), February 17, 2003.

I agree with Christopher. An essay test is the classic example. It can be very valid (high content validity) but have poor reliability because equally competent teachers often disagree regarding the scores to assign to written essays.

-- tom knapp (tknapp5@juno.com), February 19, 2003.

Don’t take it personally, but in my opinion the essay test is one of the worst examples to refer to. How do you explain that two “equally competent” teachers disagree about the score to assign to a paper? Obviously, the two are not applying the same criteria. Inter-rater agreement is itself a form of reliability to be evaluated.
A lot of recent psychological and educational measurement textbooks specify that content validity has more to do with reliability than validity. Why? Because validity is not a test (or measurement instrument) quality but an attribute of the scores’ interpretation.

-- Rock Faulkner (rock.faulkner@umontreal.ca), February 20, 2003.

Rock--I obviously disagree. Methinks you've been influenced in your thinking by my nemeses, Bruce Thompson and the late Samuel Messick. Cheers, Tom

-- tom knapp (tknapp5@juno.com), February 21, 2003.

Tom, you are right. And I think everyone has been influenced by someone; my influences are Messick, Cronbach, Ghiselli, Anastasi, etc. Which are yours? Up to now no one has presented arguments that would lead me to revise the background I learned. It is one thing to recognize problems with the classical approach, but where is the solution (and the rationale behind that new approach)? This is where History and Philosophy are IMPORTANT.

-- Rock Faulkner (rock.faulkner@umontreal.ca), February 24, 2003.

Rock is technically correct here, I think, if one restricts oneself to *measurements* of validity that are based on the same statistical foundations as the standard reliability measures (i.e., covariance). When I said, way back, that something could be valid without being terribly reliable, I was speaking of validity more conceptually, including things such as "face validity" among the various kinds of validity one might consider. In effect, what I was thinking was that a test could have the "right" mean (i.e., be valid, loosely speaking) but have a fairly large spread around that mean. Naturally this would make it a not-terribly-appealing measure of whatever it was one was attempting to study.

-- Christopher Green (cgreen@chass.utoronto.ca), March 08, 2003.
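
The "right mean, large spread" picture in the post above can be made concrete with a small Python sketch. It is only an illustration of the loose, conceptual sense of validity described there, not a psychometric procedure. The readings, the assumed true value of 36.8, and the names TRUE_TEMP, noisy_but_centered, steady_but_biased and summarize are all made up for the example.

```python
import statistics

# Hypothetical "true" body temperature the instrument is supposed to recover.
TRUE_TEMP = 36.8

# Made-up readings: the first instrument is centered on the true value but
# scatters widely; the second is very consistent but reads too high.
noisy_but_centered = [35.3, 38.3, 36.0, 37.6, 36.8]
steady_but_biased = [37.9, 38.0, 38.1, 38.0, 38.0]


def summarize(readings):
    """Return (bias, spread): mean error against TRUE_TEMP and sample SD."""
    bias = statistics.mean(readings) - TRUE_TEMP
    spread = statistics.stdev(readings)
    return bias, spread


for label, readings in [
    ("noisy but centered", noisy_but_centered),
    ("steady but biased", steady_but_biased),
]:
    bias, spread = summarize(readings)
    print(f"{label}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

In this toy setup the first instrument is "valid, loosely speaking" but unreliable, and the second is reliable but not valid. Neither would be an appealing measure on its own.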

Christopher, Rock, and I have corresponded with one another about this and I think we're converging. I too was talking about validity in general, not just statistical validity. I'm convinced that the mantra "you can't have validity without reliability" can be directly attributed to the cross-multiplied version of the formula for the correction for attenuation in which zero reliability for either X or Y wipes out the possibility of non-zero statistical validity.

-- tom knapp (tknapp5@juno.com), March 12, 2003.
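
For readers who have not seen the formula mentioned in the post above, here is a minimal Python sketch of the cross-multiplied attenuation relation as it usually appears in classical test theory: observed correlation = true-score correlation x sqrt(reliability of X x reliability of Y). This is the standard textbook form and is assumed to be the one meant. The function names attenuated_validity and max_validity are hypothetical, chosen only for this illustration.

```python
import math


def attenuated_validity(true_corr, rel_x, rel_y):
    """Observed X-Y correlation implied by the attenuation relation.

    true_corr is the correlation between true scores; rel_x and rel_y
    are the reliabilities of the two measures, each between 0 and 1.
    """
    for name, value in (("rel_x", rel_x), ("rel_y", rel_y)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return true_corr * math.sqrt(rel_x * rel_y)


def max_validity(rel_x, rel_y=1.0):
    """Ceiling on the observed correlation (true-score correlation of 1)."""
    return attenuated_validity(1.0, rel_x, rel_y)


# Zero reliability on either side forces the observed correlation to zero,
# however strongly the true scores are related.
print(attenuated_validity(0.8, 0.0, 0.9))

# With a perfectly reliable criterion, the ceiling is the square root of
# the test's reliability.
print(max_validity(0.49))
```

Under this relation, zero reliability for either X or Y does wipe out any non-zero statistical validity. The ceiling on the observed validity coefficient is the square root of the product of the two reliabilities, so with a perfectly reliable criterion it is the square root of the test's own reliability.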
