- Test: measuring device or a procedure
- Psychological test: involves a sample of behaviour, e.g. responses to a questionnaire, performance on a task, oral responses.
- Even two psychological tests designed to measure personality may differ in their content – because two test developers may have different ideas about what is important in measuring ‘personality’
- Tests may have different theoretical orientations, e.g. items on a psychoanalytically oriented personality test may bear little resemblance to those on a behaviourally oriented personality test, yet both are personality tests.
- Format: the form, plan, structure, arrangement and layout of test items, as well as related considerations such as time limits.
- Format also used to refer to the form in which the test is administered: computerised, pen and paper, or some other form.
- Tests differ in administration procedures – some tests may require an active and knowledgeable administrator (on a one-on-one basis); others may not require one to be present.
- Tests differ in their scoring and interpretation procedures.
- Score: code or summary statement
- Scoring: process of assigning such evaluative codes or statements to performance on tests, tasks, interviews, or other behaviour samples.
- Scores themselves can be described and categorised in many ways
- e.g. a cut score: a reference point, usually numerical, derived by judgment and used to divide a set of data into two or more classifications.
- Tests differ with respect to their psychometric soundness or technical quality.
- Psychometrics: the science of psychological measurement.
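The idea of a cut score above can be sketched in a few lines of code. The cut score value and the raw scores here are hypothetical, chosen purely for illustration.

```python
# Hypothetical example: using a cut score of 65 (an assumed value,
# derived by judgment) to divide raw scores into two classifications.
CUT_SCORE = 65

def classify(raw_score, cut=CUT_SCORE):
    """Return a classification label based on a single cut score."""
    return "pass" if raw_score >= cut else "fail"

scores = [58, 65, 72, 40, 91]
print([classify(s) for s in scores])  # ['fail', 'pass', 'pass', 'fail', 'pass']
```

A real cut score would be set by expert judgment or a standard-setting study, not picked arbitrarily as here.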
The interview
- Interview: method of gathering information through direct communication involving reciprocal exchange.
- The interview as a tool of psychological assessment involves more than just talking – if the interview is face to face, the interviewer is probably taking note not only of the content of what is said but also of the way it is said: verbal and non-verbal behaviour, e.g. body language, movements, facial expressions, extent of eye contact, willingness to cooperate, and responses generally to the demands of the interview.
- Ideally conducted face to face
- Panel interview: more than one interviewer; reduces bias
The portfolio
- Portfolio: a sample of one's work products
Case history data
- Case history data: refers to records, transcripts, and other accounts in written, pictorial or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee.
- May include files from institutions such as schools, hospitals, employers, religious institutions, and criminal justice agencies.
- Useful tool in a wide variety of assessment contexts – in a clinical evaluation, for example, case history data can shed light on an individual's past and current adjustment as well as on the events and circumstances that may have contributed to any changes in adjustment.
- Critical in neuropsychological evaluations – provide info about neuropsychological functioning prior to the occurrence of a trauma or other event that results in a deficit.
- Assembly of case history data = case study: a report or illustrative account concerning a person or event that was compiled on the basis of case history data.
Behavioural observation
- Behavioural observation: monitoring the actions of others or oneself by visual or electronic means while recording qualitative and/or quantitative information regarding those actions.
- Used as diagnostic aid e.g. in inpatient facilities, behavioural research labs, classrooms, tool for selection in organisational settings
- Naturalistic observation
- Aid to designing therapeutic intervention
- Used by researchers in classrooms, clinics and prisons – not practical or economically feasible for private practitioners to spend time out of the consulting room observing clients as they go about their day-to-day lives.
Role-play tests
- Role-play test: tool of assessment wherein assessees are directed to act as if they were in a particular situation.
- Useful in evaluating various skills – e.g. grocery shopping skills
Computers as tools
- Can serve as test administrators or efficient scorers.
- Scoring may be done on site (local processing), or at a central location (central processing)
- Simple scoring report: listing of a score or scores
- Extended scoring report: statistical analyses of the test taker's performance.
- Interpretive report: inclusion of numerical or narrative interpretive statements in the report.
- Consultative report: language appropriate for communication between assessment professionals, may provide expert opinion
- Integrative report: employ previously collected data e.g. medication records or behavioural observation data into the report
- CAPA (computer assisted psychological assessment)
- CAT (computer adaptive testing) – 'adaptive' refers to the computer's ability to tailor the test to the testtaker's ability or test-taking pattern.
- Internet testing has disadvantages —> ‘test-client integrity’ – varying interests of the tester versus the test administrator.
- Test takers may have unrestricted access to notes, other internet resources, other aids in test taking
Other tools
- Video
- Medical tools – e.g. a thermometer to measure body temperature (BT), a cuff to measure blood pressure (BP)
- Biofeedback equipment
WHO, WHAT, WHY, HOW AND WHERE?
Who are the parties?
- Test developer
- Test user
- Test taker
- Society at large
- Other parties
In what types of settings are assessments conducted and why?
Educational settings
- Achievement test
- Diagnostic tests of reading, mathematics and other subjects may be administered to assess the need for educational intervention and to establish or rule out eligibility for special education programs.
- Informal evaluation: a typically nonsystematic assessment that leads to the formation of an opinion or attitude, e.g. a school report
Clinical settings
- Help screen for or diagnose behaviour problems
- intelligence tests, personality tests, neuropsychological tests
- Typically ONLY ONE individual is assessed at a time
Counseling settings
- May occur in environments such as schools, prisons and governmental or privately owned institutions.
- Measures of social and academic skills and measures of personality, interest, attitudes and values
- e.g. How can this child better focus on tasks? For what career is this client best suited? What activities are recommended for retirement?
Geriatric settings
- evaluate cognitive, psychological, adaptive, or other functioning.
- Extent to which assessees are enjoying as good a quality of life as possible.
- Screening for cognitive decline and dementia
- Severe depression can lead to cognitive functioning that mimics dementia – pseudodementia
Business and military settings
- Assessment used for decision making about the careers of personnel
- Achievement, aptitude, interest, motivational
- Hiring employees, promotions, job satisfaction
- Marketing – taking the interests of consumers, diagnosing what is wrong about brands, products and campaigns.
Governmental and organisational credentialing
- e.g. law students must pass the bar exam to become lawyers
Academic research settings
- Other settings
How are assessments conducted?
- Test users have obligations before, during and after a test is administered
- The test administrator must be familiar with the test materials and procedures and must have at the test site all the materials needed to properly administer the test.
- Protocol: the form/sheet/booklet on which a test taker's responses are entered
- Building rapport important
- Following test administration – test users must safeguard the test protocols, convey the results in a clearly understandable fashion.
- Accommodation: the adaptation of a test, procedure or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.
A HISTORICAL PERSPECTIVE
SOME ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING AND ASSESSMENT
Assumption 1: Psychological Traits and States Exist
- Trait: any distinguishable, relatively enduring way in which one individual varies from another
- State: distinguish one person from another but are relatively less enduring, temporary.
- A psychological trait exists only as a construct – a constructed way to describe or explain behaviour – its existence is inferred from overt behaviour.
- Trait not expected to manifest 100% of the time – yet there seems to be rank order stability in personality traits.
- Nature of situation and context
Assumption 2: Psychological traits and states can be measured
- Once it is acknowledged that psychological traits and states do exist, the specific traits and states to be measured and quantified need to be carefully defined.
E.g. the term aggressive can be used in many ways (an aggressive salesperson, an aggressive killer) – if a personality test yields a score purporting to provide info about how aggressive a testtaker is, a first step in understanding the meaning of that score is understanding how aggressive was defined by the test developer: what types of behaviours are presumed to be indicative of someone who is aggressive as defined by the test?
- Once having defined the trait or construct to be measured, a test developer considers the types of behaviours presumed to be indicative of the targeted trait e.g. if the test developer considers knowledge of American history to be one component of intelligence in US adults, then the item ‘Who was the second president of the United States?’ may appear on the test. Or social judgment ‘Why should guns in the home always be inaccessible to children?’
- Should the correct response to the knowledge of American History have more weight than social judgment? or vice versa? – weighting the comparative value of a test’s items comes about as the result of a complex interplay of factors
e.g. technical considerations, the way a construct has been defined for purposes of the test, and the value society (and the test developer) attaches to the behaviours evaluated.
- Measuring traits and states also involves finding appropriate ways to score the test and interpret the results
- Test score is presumed to represent the strength of the target ability or trait or state and is frequently based on cumulative scoring: the more the test taker responds in a particular direction as keyed by the test manual as correct or consistent with a particular trait, the higher that test taker is presumed to be on that targeted ability or trait.
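Cumulative scoring, as described above, can be sketched as a simple count of responses that match the scoring key. The items, the key, and the responses below are entirely hypothetical.

```python
# Sketch of cumulative scoring: the test taker's total score is the
# number of responses that match the key (answers keyed as correct or
# as consistent with the targeted trait). All values are made up.
key       = ["T", "F", "T", "T", "F"]   # keyed (trait-consistent) answers
responses = ["T", "F", "F", "T", "F"]   # one test taker's answers

score = sum(r == k for r, k in zip(responses, key))
print(score)  # 4 of 5 responses match the key
```

The higher this count, the higher the test taker is presumed to be on the targeted ability or trait.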
Assumption 3: Test-Related Behaviour predicts Non-Test-Related behaviour
- e.g. patterns of answers to true-false questions on one widely used test of personality are used in decision making regarding mental disorders.
- Tasks in some tests mimic actual behaviours that the test user is attempting to understand —> however such tests yield only a sample of behaviour of what can be expected under non-test conditions.
- Obtained sample of behaviour used to make predictions about future behaviour e.g. work performance of a job applicant
- In some forensic legal matters, psychological tests may be used to postdict behaviour – aid in understanding of behaviour that has already occurred.
Assumption 4: Tests and other Measurement Techniques Have Strengths and Weaknesses
Assumption 5: Various Sources of Error are Part of the Assessment Process
Error: something that is more than expected – factors other than what a test attempts to measure will influence performance on the test.
Error variance: the component of a test score attributable to sources other than the trait or ability measured E.g. whether an assessee has the flu when taking a test is a source of error variance
Classical Test Theory (CTT): the assumption is made that each testtaker has a true score on a test that would be obtained but for the action of measurement error.
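The CTT idea that an observed score is a true score plus measurement error (X = T + E) can be illustrated with a small simulation. The true score and the error distribution here are assumed values for demonstration only.

```python
# Illustration of the classical test theory decomposition X = T + E:
# each observed score is a hypothetical true score plus random
# measurement error. Averaging many (simulated) administrations shows
# the mean observed score converging toward the true score.
import random

random.seed(0)
true_score = 50                      # assumed true score
observed = [true_score + random.gauss(0, 3) for _ in range(10_000)]

mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 1))  # close to 50
```

In practice a person cannot be retested thousands of times, which is why the true score remains a theoretical quantity.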
Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
- One source of fairness-related problems is a test user who attempts to use a particular test with people whose background and experience are different from the background and experience of people for whom the test was intended.
- Some potential problems more political than psychometric
Assumption 7: Testing and Assessment Benefit Society
In a world without tests, people could present themselves as surgeons, airline pilots etc. regardless of their background, ability or professional credentials.
- No way to diagnose educational difficulties and remedy problems
- No instruments to diagnose neuropsychological impairments etc.
WHAT IS A GOOD TEST?
Reliability
- Reliability: Consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements.
- Reliability is a necessary but not sufficient element of a good test – a good test must also be valid (accurate)
Validity
- Validity: a test measures what it is supposed to measure.
NORMS
- Norm-referenced testing and assessment: evaluating an individual test-taker's score by comparing it with the scores of a group of test-takers – standing or ranking against some comparison group of test takers.
- Norms: test performance data of a particular group of test-takers that are designed for use as a reference when evaluating or interpreting individual test scores.
- Normative sample: the group of people whose performance on a particular test is analysed for reference in evaluating the performance of individual test-takers.
- People in the normative sample will be typical with respect to some characteristic(s) of the people for whom the test was designed.
Sampling to develop norms
- A test developer can obtain a distribution of test responses by administering the test to a sample of the population.
- Stratified sampling
- Stratified random sampling
- Purposive sampling: when we arbitrarily select some sample because we believe it to be representative of the population. Manufacturers of products often use purposive sampling when they test the appeal of a new product in one city or market and make assumptions about how that product would sell nationally.
- Incidental/Convenience sampling: a sample that is convenient or available to use e.g. first year psychology students at Monash Uni.
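The contrast between stratified random sampling and convenience sampling can be sketched as follows. The population, strata and proportions are invented for illustration.

```python
# Sketch contrasting stratified random sampling with convenience
# sampling. The population (70% urban, 30% rural) is hypothetical.
import random

random.seed(1)
population = (
    [("urban", i) for i in range(700)] +   # 70% of population
    [("rural", i) for i in range(300)]     # 30% of population
)

# Stratified random sampling: randomly sample each stratum in
# proportion to its share of the population.
urban = [p for p in population if p[0] == "urban"]
rural = [p for p in population if p[0] == "rural"]
stratified = random.sample(urban, 70) + random.sample(rural, 30)

# Convenience sampling: simply take whoever is easiest to reach –
# here the first 100 people listed, who all happen to be urban.
convenience = population[:100]

print(len(stratified), len(convenience))  # 100 100
```

The stratified sample mirrors the population's urban/rural split, while the convenience sample does not, which is exactly the risk the notes describe.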
Developing norms for a standardised test
Having obtained a sample, the test developer administers the test according to the standard set of instructions and conditions that will be used with the test.
e.g. if the normative group did the test under noisy conditions and the test takers complete the test under quiet conditions, the test takers would most likely do better than the normative group, resulting in a higher standard score.
The normative group must therefore be similar in characteristics to the test takers.
Types of norms
Percentiles
- Percentile norms: the raw data from a test's standardisation sample converted to percentile form.
- Percentile: an expression of the percentage of people whose score on a test or measure falls below a particular raw score
- Percentage correct: refers to the distribution of raw scores – the number of items that were answered correctly, multiplied by 100 and divided by the total number of items.
- A problem with using percentiles with normally distributed scores is that real differences between raw scores may be minimised near the ends of the distribution and exaggerated in the middle of the distribution; this may be worse with highly skewed data.
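The distinction between a percentile and percentage correct can be made concrete with two small functions. The sample of scores and the item counts below are made up.

```python
# Percentile rank vs percentage correct, with hypothetical numbers.
def percentile_rank(raw, sample_scores):
    """Percentage of scores in the sample that fall below `raw`."""
    below = sum(s < raw for s in sample_scores)
    return 100 * below / len(sample_scores)

def percentage_correct(n_correct, n_items):
    """Items answered correctly, multiplied by 100, divided by total items."""
    return 100 * n_correct / n_items

sample = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(70, sample))   # 60.0 -> 60th percentile
print(percentage_correct(70, 100))   # 70.0 -> 70% of items correct
```

Note that the same number (70) means two different things in the two calls: a raw score compared against other people, versus a count of correct items.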
Age norms
- Indicate the average performance of different samples of test takers who were at various ages at the time the test was administered.
- e.g. if the measurement under consideration is height in inches, we know that scores (heights) for children will gradually increase at various rates as a function of age up to the middle to late teens.
- Performance on psychological tests also varies as a function of advancing age
Grade norms
- Designed to indicate the average test performance of test takers in a given school grade
- Grade norms developed by administering the test to representative samples of children over a range of consecutive grade levels.
- Next the mean or median score for children at each grade is calculated
- Drawback: useful only with respect to years and months of schooling completed
National norms
- National norms are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.
- e.g. national norms may be obtained by testing large numbers of people representative of different variables of interest such as age, gender, race, SES, geographical location, and different types of communities within various parts of the country (e.g. rural, urban, suburban)
National anchor norms
- National anchor norms provide some stability to test scores by anchoring them to other test scores
- Usually begins with the computation of percentile norms for each of the tests to be compared.
- Using the equipercentile method, the equivalency of scores on different tests is calculated with reference to corresponding percentile scores.
- e.g. if the 96th percentile corresponds to a score of 69 on the BRT and to a score of 14 on the XYZ test, then we can say that a BRT score of 69 is equivalent to an XYZ score of 14.
- National anchor scores are obtained from the same sample.
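The equipercentile matching in the BRT/XYZ example above can be sketched as a lookup across two percentile tables. Both tables are hypothetical, built only to mirror the example in the notes.

```python
# Sketch of the equipercentile method: a score on one test is deemed
# equivalent to the score on another test sitting at the same
# percentile. Both percentile tables are hypothetical.
brt_percentiles = {60: 80, 65: 90, 69: 96}   # BRT score -> percentile
xyz_percentiles = {10: 80, 12: 90, 14: 96}   # XYZ score -> percentile

def equivalent_xyz(brt_score):
    """Find the XYZ score at the same percentile as a given BRT score."""
    pct = brt_percentiles[brt_score]
    for xyz_score, xyz_pct in xyz_percentiles.items():
        if xyz_pct == pct:
            return xyz_score
    return None

print(equivalent_xyz(69))  # 14 -> a BRT score of 69 anchors to XYZ 14
```

Real equating would interpolate between percentile points rather than require exact matches, but the principle of anchoring via corresponding percentiles is the same.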
Subgroup norms
- A normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample.
- e.g. suppose the criteria in selecting children for the XYZ Reading Test normative sample were age, education level, SES, geographic region, community type and handedness – the test manual may report normative information for each of these subgroups.
Local norms
- Typically developed by test users
- Local norms provide normative information with respect to the local population’s performance on some test.
- e.g. individual high schools may wish to develop their own school norms (local norms) for student scores on an examination that is administered statewide.
Norms provide a context for interpreting the meaning of a test score.
Fixed reference group scoring system: the distribution of scores obtained on the test from one group of test takers (the fixed reference group) is used as the basis for the calculation of test scores for future administrations of the test.
E.g. SAT
Norm-Referenced versus Criterion-Referenced Evaluation
- Norm-referenced: derive meaning from a test score by evaluating the test score in relation to other scores on the same test.
- Criterion-referenced: derive meaning from a test score by evaluating it on the basis of whether or not some criterion has been met
- Criterion: a standard on which a judgment or decision may be based.
- Criterion referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard.
- e.g. to be eligible for a diploma, students must demonstrate at least a sixth-grade reading level
- the criterion in criterion-referenced assessments typically derives from the values or standards of an individual or organisation.
- in norm-referenced interpretations of test data, a usual area of focus is how an individual performed relative to other people who took the test
- in criterion-referenced interpretations of test data, a usual area of focus is the testtaker’s performance: what the testtaker can or cannot do; what the testtaker has or has not learned, whether the test taker does or does not meet specified criteria for inclusion in some group, access certain privileges etc.
- e.g. if a criterion (standard) for passing a hypothetical Airline Pilot Test (APT) has been set at 85% correct, a trainee who scores 84% or less will not pass
- Disadvantages:
- Potentially important info about an individual's performance relative to other test takers is lost
- Although this approach may have value with respect to the assessment of mastery of basic knowledge, skills or both, it has little or no meaningful application at the upper end of the knowledge/skill continuum – identifying stand alone originality or brilliant analytic ability is not the stuff of which criterion-oriented tests are made
- In contrast, brilliance and superior abilities are recognisable in norm-referenced tests.
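The two modes of interpretation can be applied to the same set of scores, as in this sketch. The APT scores and the 85% criterion follow the hypothetical example in the notes; the names are invented.

```python
# Sketch contrasting criterion-referenced and norm-referenced
# interpretation of the same hypothetical APT scores.
scores = {"A": 92, "B": 84, "C": 85, "D": 77}

# Criterion-referenced: pass/fail against the fixed 85% standard,
# regardless of how anyone else performed.
criterion = {name: s >= 85 for name, s in scores.items()}

# Norm-referenced: each trainee's standing relative to the group.
ranked = sorted(scores, key=scores.get, reverse=True)

print(criterion)  # {'A': True, 'B': False, 'C': True, 'D': False}
print(ranked)     # ['A', 'C', 'B', 'D']
```

Note that trainee B fails the criterion despite outperforming trainee D by seven points – the relative information the norm-referenced ranking preserves is exactly what the criterion-referenced pass/fail decision discards.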