Validity - defined & described:
- The degree to which a test measures what it is supposed to measure
- The most important issue in psychological measurement
- More formally, the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of a test
- Thus, a test itself is neither valid nor invalid
o Validity concerns the interpretations and uses of a measure's scores
- Is related to the proposed uses of the scores
- E.g. final exam scores are used principally to order students from most knowledgeable/competent to least knowledgeable/competent
- There is some level of interpretation of degree of knowledge/competence
o E.g. someone who scored 90 on the final would be expected to have knowledge/competence of most of the material. In contrast, someone who scored 20 on the final would be expected to have knowledge/competence of less than 50% of the material
- People do tend to refer to a test as valid
- This is incorrect:
o Naivety
o Laziness
- The test itself isn’t valid or invalid; rather, it is the interpretations of the test scores that may be valid
Validity is a matter of degree:
- Validity is not an all-or-none issue
- The validity of test score interpretations should be conceived in terms of strong versus weak rather than valid versus invalid
- When you choose a psychological test, you should choose the test that will support the interpretations that you want to make from the test scores
- Typically there are several tests on the market from which to choose; validity should be one of the primary considerations
- Validity is based on empirical evidence and theory
- It’s not good enough to hear someone say that the test (or scores) are valid in someone’s experience
- There are many popular tests out there that have little or no validity
o E.g. writing analysis as an indicator of someone’s personality (no evidence)
o E.g. colour quiz (no evidence)
How is validity determined empirically?
- Unlike internal consistency reliability, there is no single analysis that can be used to represent the degree to which the interpretations of test scores are valid
- Instead, several different types of analyses are conducted
- Some validity analyses are quantitative and do involve statistical analyses
- The pursuit of establishing the validity of the interpretation of test scores revolves around the concept of construct validity
- Construct validity refers to the degree to which test scores can be interpreted as reflecting a particular psychological construct
Test content:
- Most fundamental type of validity
- Represents the match between the actual content of the test and the content that should be included in the test
- If test scores are to be interpreted as indicators of a particular construct of interest, then the items included in the test should reflect the important facets of the construct
- The description of the nature of the construct should help define the appropriate content of the test
- Two types of validity related to test content:
o Content validity
o Face validity
Content validity:
- A test may be said to have good content validity when its items cover the entire breadth of the construct
- However, the items should not exceed the boundaries of the construct
- E.g. for a final exam, there should be items from all lectures in the semester, but no items from a different unit
Face validity:
- The degree to which the items associated with a measure appear to be related to the construct of interest
- This appearance is in the judgement of ‘non-experts’
- Isn’t crucial from a fundamental psychometric perspective; it is more a practical consideration for respondents
- Respondents need to be made to feel that they are responding to items that are relevant to the task at hand
- E.g. when trying to hire introverts for a traffic controller job, candidates may be asked whether they are ‘the life of the party’. Some applicants may say this question is unrelated to the job and not want to answer it, even though their responses would be useful
- Disadvantages:
o People can respond in a way that they think is most advantageous for them
Factorial validity:
- When a test is designed, the number of dimensions and facets is typically specified
- A technique known as factor analysis is used to evaluate the factorial validity of the scores derived from a test (a small sketch follows below)
- There are two types of factor analysis:
o Unrestricted factor analysis
o Restricted factor analysis
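A minimal sketch of what an unrestricted (exploratory) factor analysis might look like, using simulated item responses and scikit-learn's FactorAnalysis. The six items, the two-factor structure, and all numbers are illustrative assumptions, not part of the lecture example.

```python
# Minimal sketch of an unrestricted (exploratory) factor analysis.
# The item data and the choice of two factors are illustrative assumptions.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate responses to 6 items: items 1-3 are driven by one latent factor,
# items 4-6 by another (plus noise).
n = 500
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
items = np.column_stack([
    f1 + rng.normal(scale=0.5, size=n),   # item 1
    f1 + rng.normal(scale=0.5, size=n),   # item 2
    f1 + rng.normal(scale=0.5, size=n),   # item 3
    f2 + rng.normal(scale=0.5, size=n),   # item 4
    f2 + rng.normal(scale=0.5, size=n),   # item 5
    f2 + rng.normal(scale=0.5, size=n),   # item 6
])

fa = FactorAnalysis(n_components=2)
fa.fit(items)

# Loadings: each row is a factor, each column an item. Items 1-3 should
# load mainly on one factor and items 4-6 on the other, supporting the
# intended two-dimensional structure.
print(np.round(fa.components_, 2))
```

In an unrestricted (exploratory) analysis no loadings are fixed in advance; a restricted (confirmatory) analysis would instead test a pre-specified loading pattern.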
Response processes:
- There should be a close match between psychological processes that the respondents actually use when completing a measure and the process that they should use
- You can’t just assume that people are going to do what you expect them to do
- E.g. responding favourably to questions because you want the job, not because you possess the attribute
Association with other variables:
- Another type of validity involves the association between test scores and other variables
- Our understanding of scores from a test will be, in part, shaped by the association between those scores and other measures or variables
- We would expect a particular pattern of associations
Emotional intelligence:
- Several researchers have created psychological inventories designed to measure emotional intelligence
- To establish the validity of the scores derived from the inventories, they specified that the EI scores should show a particular pattern of correlations (sketched below):
o Correlate positively with intellectual intelligence
o Correlate negatively with the neuroticism personality dimension
o Correlate positively with age
o Show no correlation with a measure of morningness/eveningness
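A minimal sketch of how such a pattern of associations might be checked with simple correlations. All data below are simulated for illustration, and the variable names are assumptions rather than actual EI research measures.

```python
# Minimal sketch of checking an expected pattern of associations.
# All data are simulated; real EI research would use actual inventory scores.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

iq = rng.normal(100, 15, n)
neuroticism = rng.normal(0, 1, n)
age = rng.uniform(18, 65, n)
morningness = rng.normal(0, 1, n)

# Simulate EI so that it follows the hypothesised pattern:
# + with IQ, - with neuroticism, + with age, ~0 with morningness.
ei = (0.3 * (iq - 100) / 15
      - 0.3 * neuroticism
      + 0.02 * (age - 40)
      + rng.normal(0, 1, n))

df = pd.DataFrame({
    "EI": ei, "IQ": iq, "Neuroticism": neuroticism,
    "Age": age, "Morningness": morningness,
})

# The EI column of the correlation matrix shows whether the observed
# correlations match the predicted signs.
print(df.corr().round(2)["EI"])
```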
Convergence evidence:
- Usually described as convergent validity
- The degree to which test scores are correlated with tests of related constructs
- Emotional intelligence should correlate positively with intellectual intelligence
- There should be a positive relationship between your self-reported scores and the rater-reported scores. This evidence is known as consensual validity
Discriminant evidence:
- Also known as discriminant validity
- The degree to which test scores are uncorrelated with tests of unrelated constructs
- It often helps to know what a construct is not in the process of its validation
- Constructs should not correlate with everything under the sun; if they do, their boundaries are overly expansive
- Researchers do not hypothesise the correlation to be zero, but just generally low
- When the correlation is too large, there is no evidence of discriminant validity
Concurrent validity:
- Observed when the scores from one measure correlate in a theoretically meaningful way with the scores of another measure (obtained at around the same time) that is considered the ‘gold standard’
- E.g. correlating scores from a new IQ test with the WAIS
- Least compelling evidence; in some cases there is no gold-standard test
Predictive validity:
- The degree to which test scores are correlated with relevant variables that are measured at a future point in time
- E.g. correlate university grades with future annual earnings
- Most impressive evidence, but relatively rare because of the time and resources required to keep track of people over time
Consequential validity:
- The social/personal consequences associated with using a particular test
- E.g. if two tests were equally predictive of a criterion of interest, but one of the tests tended to yield scores that were biased against women, then we would consider the non-biased test to be associated with greater consequential validity
Criterion validity:
- Non-theoretical approach to validation
- E.g. an SPSS lab exam in a psychology unit: administer the test to undergraduate students, postgraduate students, and staff
- The SPSS lab exam scores would be associated with validity evidence if there was a linear trend in the means across the three groups (with staff scoring highest), as in the sketch below
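A minimal sketch of this kind of known-groups comparison, using simulated exam scores. The group sizes, means, and the use of a one-way ANOVA are illustrative assumptions rather than details from the lecture.

```python
# Minimal sketch of a known-groups comparison for the lab-exam example.
# Scores are simulated; the expectation is a rising trend in means from
# undergraduate students to postgraduates to staff.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

undergrads = rng.normal(55, 10, 80)   # expected lowest mean
postgrads  = rng.normal(65, 10, 40)
staff      = rng.normal(75, 10, 20)   # expected highest mean

for name, scores in [("undergrads", undergrads),
                     ("postgrads", postgrads),
                     ("staff", staff)]:
    print(f"{name}: mean = {scores.mean():.1f}")

# A one-way ANOVA tests whether the group means differ; the ordering of
# the means carries the validity interpretation.
f, p = stats.f_oneway(undergrads, postgrads, staff)
print(f"F = {f:.2f}, p = {p:.4f}")
```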
Induction-construct development interplay:
- There are occasions where a measure is developed solely from an inductive perspective
- E.g. create a measure of personality by including all of the ‘person-descriptive’ adjectives in the dictionary (e.g. moody, unpredictable)
- People rate the degree to which all of the adjectives describe them
- Then the researcher would factor analyse all of the responses to help uncover the common dimensions
Contrasting reliability and validity:
- Very related, but very distinct
- Reliability is pertinent to consistency in measurement
- Differences in test scores, from the perspective of reliability, reflect differences among people in their levels of the trait that affects test scores, whatever that trait may be
- Validity, by contrast, is directly related to the nature of the trait supposedly being assessed by the measure
o Reliability is a property of test scores
o Validity is a property of test score interpretations
o Validity is closely tied to psychological theory (reliability is not)
- Reliability is a necessary but not sufficient condition for validity (you need consistency in measurement before there is any hope of valid interpretation)