jeromedelisle.org

School of Education
University of the West Indies, St. Augustine
St. Augustine

ph: 868-477-1500

delislejerome@gmail.com


Validity Theory

BASIC TENETS

1. Validity is NOT a property of a test.

2. A test cannot be valid or invalid.

3. What we seek to validate are the inferences drawn from test scores and the uses to which those scores are put.

4. Validity is not all or none; it is a matter of degree.

5. Test validity must be evaluated with respect to a specific testing purpose.

6. Evaluating the validity of inferences derived from test scores requires multiple lines of evidence - different types of evidence for validity.

7. Test validation never ends—it is an ongoing process.

8. Construct validity is the all-encompassing form of validity evidence: it concerns whether the construct of interest is measured precisely and accurately and can be interpreted meaningfully within the population of interest. All other forms of validity evidence fall under the umbrella of construct validity, so as researchers acquire more types of validity evidence, they generate more evidence of construct validity.

 


 

Key figures in the development of validity theory: Kelley, Meehl, Cronbach, Loevinger, Messick, and Kane.

Ideas on validity have evolved over time. 

  1. Kelley (1927) explicitly stated that validity was the extent to which a test really measures what it purports to measure and is appropriate for a specifically noted purpose.

  2. Between 1920 and 1950, criterion-related validity came to be the gold standard, with some attention also paid to the content validity model. The ability of interest was simply taken as a given rather than treated as a construct.

  3. Cureton (1951) argued that validity was not simply a property of the test but depended upon the intended interpretation and use of scores. Traits were seen as dispositions to behave in certain ways.

  4. The criterion-based methodology was by then robust, but criterion measures were not readily available in all cases.

  5. Cronbach and Meehl (1955) wrote "Construct Validity in Psychological Tests." The basic ideas were incorporated into the Technical Recommendations (APA, 1954).

  6. Construct interpretations were based upon scientific theory, but there was difficulty applying this approach in the social sciences.

  7. In 1957, Jane Loevinger suggested that construct validity is the whole of the subject from a systematic, scientific point of view.

  8. Campbell and Fiske's (1959) multitrait-multimethod approach provided a methodology for gathering construct validity evidence.

  9. Cronbach (1971) argued for an overall evaluation of validity drawing on many kinds of evidence. He softened his position on theories and nomological networks, focusing instead upon a reasonably definite statement of the proposed interpretation.

  10. Although a softer version of construct validity was increasingly dominant, it was also agreed that effective validation required testable hypotheses derived from the proposed interpretation and use, that the proposed interpretation must be evaluated against alternative interpretations, and that a validation programme requires examination of claims and counterclaims.

  11. Messick (1975) agreed that construct validity was the whole of the subject and that hypothesis testing was required. Although he did not require that the construct be embedded in a theory, he stressed the need to be clear about construct meanings and associated values.

  12. Messick was pragmatic about test validity and consequences but was strongly attuned to values.

  13. The 1985 Standards marked a victory for the unified view, which was largely adopted. However, many lessons were also carried over from the strong programme of construct validity.

  14. By 1989, Messick had presented a faceted, unified view of validity. Despite the elegance of the theory, there was continued conflict over its application to validation; indeed, most testing programmes focused on practical solutions rather than theory-based use.

  15. By the 1970s, debate had arisen over the role of adverse impact as a negative consequence, and by the 1990s the consequences debate was at its peak. To some extent, Messick's position was misunderstood.

  16. A general argument-based framework was developed, but its utility for validation has been questioned. Opposition has come from traditions in the UK, where testing systems have been built on public examinations.

 

We might examine these changes through the eyes of the Standards for Educational and Psychological Testing.

Five separate editions of the Standards have been issued to date, beginning with the Technical Recommendations for Psychological Tests and Diagnostic Techniques (APA, 1954), which were issued by APA alone.

 

• American Psychological Association (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin, 51(2), 1-38.

• Prepared by a joint committee of the American Psychological Association, American Educational Research Association, and National Council on Measurements Used in Education.

– “Validity information indicates to the test user the degree to which the test is capable of achieving certain aims. … Thus, a vocabulary test might be used simply as a measure of present vocabulary, as a predictor of college success, as a means of discriminating schizophrenics from organics, or as a means of making inferences about 'intellectual capacity.'”



Shown below is a summary of the 1966 to 2014 editions of the Standards.

 

 Stuart Shaw

 

Shaw, S., & Crisp, V. (2011). Tracing the evolution of validity in educational measurement: Past issues and contemporary challenges. Research Matters, 11, 14-19.

 

Stephen Sireci maps the changes in validity:

1966:  AERA, APA, NCME

Standards for Educational and Psychological Tests and Manuals

Three “aspects” of validity:

–Criterion-related (concurrent + predictive)

–Construct

–Content



1974:  AERA, APA, NCME

Standards for Educational and Psychological Tests

Validity descriptions borrowed heavily from Cronbach (1971)

–Validity chapter in 2nd edition of “Educational Measurement” (edited by R.L. Thorndike)

 

1985:  AERA, APA, NCME

Standards for Educational and Psychological Testing

Described validity as a unitary concept

Notion of validating score-based inferences

Very Messick-influenced



Samuel Messick's contribution was invaluable

“Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores and other modes of assessment.” (p. 13)

Messick's model: interconnected facets of validity as a unitary concept

1999: AERA, APA, NCME

Incorporated the “argument-based approach to validity”

Five “Sources of Validity Evidence”

1. Test content

2. Response processes

3. Internal structure

4. Relations to other variables

5. Testing consequences

 

“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.” (p. 9)

 

“Validation can be viewed as developing a scientifically sound validity argument to support the intended interpretation of test scores and their relevance to the proposed use.” (AERA et al., 1999, p. 9)

 

2014: AERA, APA, NCME

  • Standards (2014): Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests.

Validity requires different kinds of evidence, and the intended uses of the test need to be justified from different perspectives.

The three major sources of validity evidence are:

  1. Content-related: Test content

  2. Criterion-related: Relations to other variables

  3. Construct-related: Internal structure

 

 

Content-Related Validity

  • refers to an assessment of whether a test contains appropriate content and requires that appropriate cognitive processes be applied to that content. We need a specific, explicit statement of what the test is intended to measure (a test blueprint), whether we are assessing the content validity of an existing test or constructing a test that measures a particular body of content.

  • A test blueprint (also called a table of specifications for the test) is an explicit plan that guides test construction; for an English literacy test, for example, it would include the elements listed below (a small illustrative sketch follows the list).

➢ Description of content to be covered by the test.
➢ Specifications of cognitive processes in each content area.
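
The sketch below, in Python, shows one way such a blueprint might be laid out for a hypothetical 40-item English literacy test; the content areas, cognitive processes, and item counts are purely illustrative assumptions, not a prescribed plan.

# A test blueprint (table of specifications) for a hypothetical
# 40-item English literacy test. All entries are illustrative only.
blueprint = {
    "Vocabulary":            {"Recall": 6, "Comprehension": 4, "Application": 2},
    "Reading comprehension": {"Recall": 2, "Comprehension": 8, "Application": 6},
    "Grammar and usage":     {"Recall": 4, "Comprehension": 4, "Application": 4},
}

# The blueprint makes the intended coverage explicit, so item writers and
# reviewers can check that the assembled test matches the plan.
for content_area, processes in blueprint.items():
    print(f"{content_area:24s} {processes}  ({sum(processes.values())} items)")

total_items = sum(sum(p.values()) for p in blueprint.values())
print(f"Total items planned: {total_items}")  # 40
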

 

Criterion-Related Validity:

Criterion-related validity focuses on the degree to which test scores correlate with some chosen criterion measure of the same construct (relations to other variables). There are two broad classes of this form of validity, illustrated in the examples and the short computational sketch below.

  • Predictive validity: the test information is used to forecast future criterion performance.

Examples: using spelling test scores to predict reading test scores; the validity of SAT scores for predicting first-year grades given high-school GPA.

  • Concurrent validity: whether scores on the test correlate highly with scores on another criterion measure obtained at about the same time.

Example: A new test vs. an old test measuring the same construct. Usually the scores on both tests are obtained at essentially the same time.
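
As a minimal computational sketch of a criterion-related validity coefficient, assume (purely for illustration) a set of invented admissions-test scores and later first-year GPAs for the same ten students; the predictive validity coefficient is simply the Pearson correlation between test and criterion.

import numpy as np

# Hypothetical data: admissions test scores and later first-year GPA
# for the same ten students (all values invented for illustration).
test_scores = np.array([45, 52, 38, 60, 55, 41, 49, 58, 35, 62])
first_year_gpa = np.array([2.8, 3.1, 2.5, 3.6, 3.2, 2.6, 3.0, 3.4, 2.3, 3.7])

# Predictive validity coefficient: correlation between test scores
# and the later criterion measure.
r = np.corrcoef(test_scores, first_year_gpa)[0, 1]
print(f"Predictive validity coefficient: r = {r:.2f}")

A concurrent validity coefficient would be computed the same way, except that the criterion scores are collected at essentially the same time as the test scores.
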

Construct-Related Validity:

requires collecting multiple types of evidence.

Commonly used approaches include:

  1. Provide correlational evidence showing that a construct has a strong relationship with certain variables and a weak relationship with other variables.

Valid measures of a construct should be strongly related to certain measures (convergent validity) and weakly related to others (discriminant validity).

Convergent validity is the degree to which concepts that should be related theoretically are interrelated in reality. Discriminant validity is the degree to which concepts that should not be related theoretically are, in fact, not interrelated in reality.

 

An explicit method for studying the patterns of high and low correlations among a set of measures is the analysis of the multitrait-multimethod (MTMM) matrix of correlations (see the sketch after this list).

  2. Show that certain groups obtain higher scores than other groups, with the high- and low-scoring groups being determined on logical grounds prior to the test administration. If a theory suggests that certain groups should possess an especially high or low level of a trait and, consequently, should score exceptionally high or low on a test measuring that trait, construct validity can be assessed based on predictions about group differences.

  3. Study the constructs that underlie performance (i.e., scores) on a test using factor analysis.

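
The Python sketch below illustrates the MTMM idea with simulated data for two hypothetical traits (anxiety and self-esteem), each measured by two methods (self-report and teacher rating); the traits, methods, and numbers are assumptions made purely for illustration.

import numpy as np
import pandas as pd

# Simulated scores for two traits, each measured by two methods.
rng = np.random.default_rng(0)
n = 200
anxiety = rng.normal(size=n)
esteem = rng.normal(size=n)

data = pd.DataFrame({
    "anxiety_self":    anxiety + rng.normal(scale=0.5, size=n),
    "anxiety_teacher": anxiety + rng.normal(scale=0.7, size=n),
    "esteem_self":     esteem + rng.normal(scale=0.5, size=n),
    "esteem_teacher":  esteem + rng.normal(scale=0.7, size=n),
})

# The MTMM matrix is the correlation matrix of all trait-method combinations.
# Convergent evidence: same trait, different method (e.g. anxiety_self vs
# anxiety_teacher) should correlate highly. Discriminant evidence: different
# traits (e.g. anxiety_self vs esteem_self) should correlate weakly.
mtmm = data.corr().round(2)
print(mtmm)

If convergent and discriminant validity hold, the same-trait/different-method correlations in the resulting matrix should be noticeably larger than the different-trait correlations.
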

 

FACTOR ANALYSIS

Factor analysis investigates construct validity from the perspective of internal structure: it examines whether the items “hang together” to measure the construct.

The two primary classes of factor analytic methods are exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).

Exploratory factor analysis (EFA):

EFA explores factor structures without reference to the researcher's theoretical expectations, even when such expectations are available. It can serve as (a minimal sketch follows the list below):

  1. An exploratory tool for understanding the underlying structure of a construct.

  2. A way of exploring how many dimensions/factors underlie performance (i.e., scores).

  3. A way of exploring which sets of items “hang together” to measure each dimension.
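
A minimal EFA sketch in Python appears below. It assumes the third-party factor_analyzer package (pip install factor-analyzer) and uses simulated scores for eight hypothetical items; the item names and the two-dimension structure are illustrative assumptions.

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Simulated item scores: two underlying dimensions, four items each.
rng = np.random.default_rng(1)
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)
items = pd.DataFrame(
    {f"reading_{i}": f1 + rng.normal(scale=0.8, size=n) for i in range(1, 5)}
    | {f"writing_{i}": f2 + rng.normal(scale=0.8, size=n) for i in range(1, 5)}
)

# Extract two factors with a varimax rotation and inspect which items
# load on ("hang together" with) each factor.
efa = FactorAnalyzer(n_factors=2, rotation="varimax")
efa.fit(items)
loadings = pd.DataFrame(efa.loadings_, index=items.columns,
                        columns=["Factor 1", "Factor 2"])
print(loadings.round(2))
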

Confirmatory factor analysis (CFA)

CFA is used to validate a pre-specified structure and to quantify how well each model fits the data. Whereas EFA fits a single model, CFA can readily be used to test several competing models and compare fit across them.

Researchers are strongly encouraged to test all plausible models using CFA and to report which model fits best on the basis of fit indices (a minimal sketch follows).
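
The CFA sketch below assumes the third-party semopy package (pip install semopy) and re-simulates the same kind of hypothetical eight-item data; the lavaan-style model string, variable names, and data are illustrative assumptions rather than a prescribed analysis.

import numpy as np
import pandas as pd
from semopy import Model, calc_stats

# Simulated scores for eight items written to measure two hypothesised factors.
rng = np.random.default_rng(2)
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)
items = pd.DataFrame(
    {f"reading_{i}": f1 + rng.normal(scale=0.8, size=n) for i in range(1, 5)}
    | {f"writing_{i}": f2 + rng.normal(scale=0.8, size=n) for i in range(1, 5)}
)

# Pre-specified measurement model: four reading items load on a Reading
# factor and four writing items on a Writing factor.
model_desc = """
Reading =~ reading_1 + reading_2 + reading_3 + reading_4
Writing =~ writing_1 + writing_2 + writing_3 + writing_4
"""

cfa = Model(model_desc)
cfa.fit(items)
print(cfa.inspect())       # factor loadings and other parameter estimates
print(calc_stats(cfa).T)   # fit indices (e.g. CFI, RMSEA) for comparing models

Competing models (for example, a single-factor model) can be specified and fitted in the same way, and their fit indices compared.
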







 

  • Xiaoming Xi on Language Assessment and Validation
  • Robert Mislevy
  • No test is neutral
  • Admissions test-Cambridge Assessment
  • John Kunnan-Language Assessments-Fairness
