Construct validity

Construct validity is "the degree to which a test measures what it claims, or purports, to be measuring."[1][2][3] In the classical model of test validity, construct validity is one of three main types of validity evidence, alongside content validity and criterion validity.[4][5] Modern validity theory defines construct validity as the overarching concern of validity research, subsuming all other types of validity evidence.[6][7]

Construct validity is the appropriateness of inferences made on the basis of observations or measurements (often test scores), specifically whether a test measures the intended construct. Constructs are abstractions that are deliberately created by researchers in order to conceptualize the latent variable, which is correlated with scores on a given measure (although it is not directly observable). Construct validity examines the question: Does the measure behave like the theory says a measure of that construct should behave?

Construct validity is essential to the perceived overall validity of a test, and it is particularly important in the social sciences, psychology, psychometrics and language studies.

Psychologists such as Samuel Messick (1998) have pushed for a unified view of construct validity "...as an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores..."[8] Key to construct validity are the theoretical ideas behind the trait under consideration, i.e. the concepts that organize how aspects of personality, intelligence, etc. are viewed.[9] Paul Meehl states that, "The best construct is the one around which we can build the greatest number of inferences, in the most direct fashion."[2]

Scale purification, i.e. "the process of eliminating items from multi-item scales" (Wieland et al., 2017), can influence construct validity. A framework presented by Wieland et al. (2017) highlights that both statistical and judgmental criteria need to be taken into consideration when making scale purification decisions.[10]
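As a rough illustration of what a statistical purification criterion can look like, the sketch below computes Cronbach's alpha for a small scale and the alpha that would result from deleting each item in turn. The choice of Cronbach's alpha is an assumption made for this example and is not taken from the framework of Wieland et al.; the function names and the item scores are hypothetical and invented for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of items, each a list of scores with one entry
    per respondent (all items must cover the same respondents).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha of the scale recomputed with each item left out in turn."""
    return [
        cronbach_alpha(items[:i] + items[i + 1:])
        for i in range(len(items))
    ]


# Hypothetical scores: 4 items answered by 5 respondents (invented data).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 5],
    [1, 3, 3, 5, 5],
    [3, 5, 1, 4, 2],  # an item that does not track the other three
]

alpha_all = cronbach_alpha(items)
alpha_deleted = alpha_if_item_deleted(items)

print(f"alpha with all items: {alpha_all:.3f}")
for index, value in enumerate(alpha_deleted, start=1):
    print(f"alpha without item {index}: {value:.3f}")
```

In this invented data set, alpha rises when the fourth item is removed, which would flag it as a deletion candidate on statistical grounds. Whether removing it would narrow the construct being measured is a judgmental question that the statistic alone cannot answer.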

History

Throughout the 1940s, scientists had been trying to come up with ways to validate experiments prior to publishing them. The result was a myriad of different validities (intrinsic validity, face validity, logical validity, empirical validity, etc.), which made it difficult to tell which ones were actually the same and which ones were not useful at all. Until the middle of the 1950s, there were very few universally accepted methods to validate psychological experiments. The main reason was that no one had figured out exactly which qualities of the experiments should be examined before publishing. Between 1950 and 1954, the APA Committee on Psychological Tests met and discussed the issues surrounding the validation of psychological experiments.[2]

Around this time, the term construct validity was first coined by Paul Meehl and Lee Cronbach in their seminal article "Construct Validity in Psychological Tests". They noted that the idea of construct validity was not new at that point. Rather, it was a combination of many different types of validity dealing with theoretical concepts. They proposed the following three steps to evaluate construct validity:

  1. articulating a set of theoretical concepts and their interrelations
  2. developing ways to measure the hypothetical constructs proposed by the theory
  3. empirically testing the hypothesized relations[2]

Many psychologists note that an important role of construct validation in psychometrics was that it placed more emphasis on theory as opposed to validation. The core issue with validation was that a test could be validated, but that did not necessarily show that it measured the theoretical construct it purported to measure. Construct validity has three aspects or components: the substantive component, the structural component, and the external component.[11] They are closely related to three stages in the test construction process: constitution of the pool of items, analysis and selection of the internal structure of the pool of items, and correlation of test scores with criteria and other variables.

In the 1970s, there was growing debate between theorists who began to see construct validity as the dominant model, pushing towards a more unified theory of validity, and those who continued to work from multiple validity frameworks.[12] Many psychologists and education researchers saw "predictive, concurrent, and content validities as essentially ad hoc, construct validity was the whole of validity from a scientific point of view".[11] In the 1974 version of The Standards for Educational and Psychological Testing, the inter-relatedness of the three different aspects of validity was recognized: "These aspects of validity can be discussed independently, but only for convenience. They are interrelated operationally and logically; only rarely is one of them alone important in a particular situation".

In 1989, Messick presented a new conceptualization of construct validity as a unified and multi-faceted concept.[13] Under this framework, all forms of validity are connected to and dependent on the quality of the construct. He noted that a unified theory was not his own idea, but rather the culmination of debate and discussion within the scientific community over the preceding decades. Messick's unified theory identifies six aspects of construct validity,[14] each posing a question about the quality of a test's construct validity:

  1. Consequential – What are the potential risks if the scores are, in actuality, invalid or inappropriately interpreted? Is the test still worthwhile given the risks?
  2. Content – Do test items appear to be measuring the construct of interest?
  3. Substantive – Is the theoretical foundation underlying the construct of interest sound?
  4. Structural – Do the interrelationships of dimensions measured by the test correlate with the construct of interest and test scores?
  5. External – Does the test have convergent, discriminant, and predictive qualities?
  6. Generalizability – Does the test generalize across different groups, settings and tasks?
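The convergent and discriminant evidence mentioned under the external aspect is often examined through correlations. The following is a minimal sketch, not a prescribed procedure: a new measure should correlate strongly with an established measure of the same construct and only weakly with a measure of an unrelated construct. The scale names and all scores are hypothetical and invented for illustration, and what counts as "strong" or "weak" depends on the theory and field.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for eight respondents (invented data).
new_anxiety_scale = [12, 18, 9, 22, 15, 7, 20, 11]
established_anxiety_scale = [14, 19, 10, 24, 15, 8, 21, 13]  # same construct
vocabulary_test = [31, 28, 30, 29, 33, 27, 32, 30]  # unrelated construct

convergent_r = pearson(new_anxiety_scale, established_anxiety_scale)
discriminant_r = pearson(new_anxiety_scale, vocabulary_test)

print(f"convergent correlation:   {convergent_r:.2f}")
print(f"discriminant correlation: {discriminant_r:.2f}")
```

With these invented scores, the convergent correlation is much higher than the discriminant one, which is the pattern the theory would predict for a measure of anxiety. Real studies use far larger samples and typically several comparison measures.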

How construct validity should properly be viewed is still a subject of debate among validity theorists. The core of the disagreement lies in an epistemological difference between positivist and postpositivist theorists.