- Is pertinent to the consistency of measurement
- There are various types of reliability, all of which are estimated quantitatively
- Most basic way to understand the concept is through repeated measurements
- Reliability and measurement error are essentially antonymous
- We calculate various reliability estimates to understand the nature and magnitude of the measurement error associated with scores obtained from instruments and tests (tools and tests don’t have reliability, only scores)
Importance of reliability:
- A very large percentage of psychology is concerned with estimating associations between variables, broadly defined (regression or ANOVA).
- If the variables we use to test our hypotheses carry a large amount of measurement error, then the whole exercise is pointless.
Classical test theory (CTT):
- Is a measurement theory that defines the conceptual basis of reliability
- Also specifies procedures for estimating the reliability of scores derived from a psychological test or instrument
- A person’s observed score on a test is a function of that person’s true score, plus error:
XO = XT + XE
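The equation can be illustrated with a minimal simulation (hypothetical numbers; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical true scores for n people on some scale
true_scores = rng.normal(loc=50, scale=10, size=n)
# Measurement error: mean zero, independent of the true scores
error_scores = rng.normal(loc=0, scale=5, size=n)
# CTT: each observed score is the true score plus the error score
observed_scores = true_scores + error_scores
```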
True scores:
- A hypothetical score devoid of measurement error; we are never able to actually measure true scores
- Conceived in the context of a particular test or instrument, not a construct
- Note that true scores are not ‘construct scores’, there is no such thing
- True scores may be perfect but this is only true in the context of measurement error associated with data derived from a particular test or instrument
- True scores can be perfect from error standpoint but absolutely terrible from a valid representation of a construct standpoint
- E.g. a person steps on the scales 10 times and each time the weight is the same. This score is perfectly reliable, a true score. However, how do you know that the weight the scale displays is actually the person’s weight? The weight could be consistently wrong
Observed scores:
- We obtain observed scores from tests or instruments; they are the actual measurements
- We want our observed scores to be as close to their corresponding true scores as possible
- The discrepancy between observed scores and true scores is the error score, i.e. it is considered to be due to measurement error
Observed, true and error scores:
- All other things equal, you want there to be a large positive correlation between observed scores and true scores
  o This correlation exists only in theory
- By contrast, you want observed scores and error scores to be uncorrelated
- If the observed scores and the error scores are correlated highly, it means they are measuring the same process: error.
Error scores:
- Should have a mean of zero
  o This is because there should be just as many people whose observed score is too large as too small
- Should be a random process
  o As they are random, they should not correlate with anything (except possibly their corresponding observed scores)
- Error scores should be uncorrelated with true scores
  o Whether you are genuinely high or low on self-esteem, the ‘extraneous’ error-related factors should affect people equally
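These properties can be checked numerically on simulated data (a sketch with hypothetical numbers; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

true_scores = rng.normal(50, 10, n)   # hypothetical true scores
error_scores = rng.normal(0, 5, n)    # random error, mean zero
observed_scores = true_scores + error_scores

# Mean of the error scores: close to zero
print(round(error_scores.mean(), 1))
# Correlation between true and error scores: close to zero
print(round(np.corrcoef(true_scores, error_scores)[0, 1], 2))
# Correlation between observed and error scores: positive,
# because errors are a component of the observed scores
print(round(np.corrcoef(observed_scores, error_scores)[0, 1], 2))
```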
R2 between observed and true scores:
- When you square the correlation between observed and true scores (the reliability index, rot), you get the reliability coefficient
- Rxx = r2ot
- E.g. Rxx = .48 means that 48% of the variance in observed scores is shared with true scores
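A quick numerical check of Rxx = r2ot on simulated data (hypothetical values; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

true_scores = rng.normal(50, 10, n)                 # s2t = 100
observed_scores = true_scores + rng.normal(0, 5, n)  # s2e = 25

r_ot = np.corrcoef(observed_scores, true_scores)[0, 1]  # reliability index
rxx = r_ot ** 2                                          # reliability coefficient
# With these hypothetical variances Rxx should be about 100 / 125 = .80,
# i.e. ~80% of observed-score variance is shared with true scores
print(round(rxx, 2))
```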
Interpretation guidelines for reliability:
- .60 – too low for any purpose
- .70 – bare minimum acceptable for beginning-stage research
- .80 – good level for research purposes
- .90+ – necessary in applied contexts where important decisions are made about individuals
Ratio of true score variance to observed variance:
This conceptualisation is similar to eta squared: the ratio of SSEFFECT to SSTOTAL
Conceptually, in the reliability case it is the ratio of SSTRUE to SSOBSERVED
- Formulated as: Rxx = s2t / s2o
Lack of correlation between O and E:
- If reliability reflects the correlation between true scores and observed scores, then it necessarily also reflects the relative absence of a correlation between observed and error scores
- Rxx = 1 – r2oe
Relative lack of error variance:
- Instead of the ratio of true score variance to observed variance, in this case we speak of the ratio of error variance to observed variance
- We subtract this ratio by 1 to place in the same context of reliability (rather than error)
- Rxx = 1 – s2e / s2o
Conceptualisations review:
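The four conceptualisations above can be shown to agree numerically on the same simulated data (a sketch with hypothetical numbers; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

true_scores = rng.normal(50, 10, n)   # s2t = 100
error_scores = rng.normal(0, 5, n)    # s2e = 25
observed_scores = true_scores + error_scores

rxx_1 = np.corrcoef(observed_scores, true_scores)[0, 1] ** 2      # r2ot
rxx_2 = true_scores.var() / observed_scores.var()                 # s2t / s2o
rxx_3 = 1 - np.corrcoef(observed_scores, error_scores)[0, 1] ** 2 # 1 - r2oe
rxx_4 = 1 - error_scores.var() / observed_scores.var()            # 1 - s2e / s2o

# All four estimates converge on the same value (about .80 here)
print([round(v, 2) for v in (rxx_1, rxx_2, rxx_3, rxx_4)])
```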
Parallel tests:
- Two tests are considered parallel if they are identical to each other psychometrically, but differ in the actual items that make up each test
- Tau-equivalence assumptions:
  o Implies that the true scores associated with each test represent the same construct
  o Thus a person’s true score on one test would be expected to be identical on the other test
- Assumes equal error variance between the two tests as well
Parallel tests and reliability:
- According to CTT, the correlation between the composite scores on test 1 and the composite scores on test 2 represents the reliability associated with the scores
- The closer the correlation is to 1.0, the more reliable we consider the scores
- Note that scores can be reliable yet still invalid as a representation of the attribute or construct of interest
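The parallel-tests idea can be sketched the same way: two hypothetical tests that share the same true scores but have independent, equal-variance errors (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000

true_scores = rng.normal(50, 10, n)        # same true scores drive both tests
test1 = true_scores + rng.normal(0, 5, n)  # test 1: its own random error
test2 = true_scores + rng.normal(0, 5, n)  # test 2: independent, equal-variance error

# Per CTT, the correlation between two parallel tests estimates the reliability
r12 = np.corrcoef(test1, test2)[0, 1]
print(round(r12, 2))  # matches s2t / s2o = 100 / 125, about .80
```

Note that this correlation only tells us the scores are consistent; it says nothing about whether either test validly represents the construct of interest.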