Psychological assessment is an important part of both experimental research and clinical treatment. One of the greatest concerns when creating a psychological test is whether it actually measures what we think it is measuring. For example, a test might be designed to measure a stable personality trait but instead measure transitory emotions generated by situational or environmental conditions. A valid test ensures that the results are an accurate reflection of the dimension undergoing assessment. Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behavior it is intended to measure. There are four types of validity: content, criterion-related, construct, and face validity.

Content Validity

When a test has content validity, the items on the test represent the entire range of possible items the test should cover. Individual test questions may be drawn from a large pool of items that cover a broad range of topics. In some instances where a test measures a trait that is difficult to define, expert judges may rate each item’s relevance. Because each judge bases their rating on opinion, two independent judges rate the items separately. Only items that both judges rate as strongly relevant are included in the final test.
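The two-judge selection rule can be sketched in a few lines of Python. This is only an illustration: the item names, the 1-to-4 rating scale, and the cutoff of 4 for "strongly relevant" are assumptions made for the example, not part of any standard procedure.

```python
# Hypothetical ratings: each judge independently scores every item
# from 1 (not relevant) to 4 (strongly relevant). Scale and data are assumed.
judge_a = {"item1": 4, "item2": 2, "item3": 4, "item4": 3}
judge_b = {"item1": 4, "item2": 4, "item3": 3, "item4": 3}

STRONGLY_RELEVANT = 4  # assumed cutoff for "strongly relevant"


def select_items(ratings_a, ratings_b, cutoff=STRONGLY_RELEVANT):
    """Return the items that BOTH judges rated at or above the cutoff."""
    return sorted(
        item
        for item in ratings_a
        if item in ratings_b
        and ratings_a[item] >= cutoff
        and ratings_b[item] >= cutoff
    )


selected = select_items(judge_a, judge_b)
print(selected)
```

With these made-up ratings, an item that only one judge considers strongly relevant (such as item2 or item3) is dropped, which is the point of requiring agreement between independent judges.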

Criterion-Related Validity

A test is said to have criterion-related validity when it has demonstrated its effectiveness in predicting criteria, or indicators, of a construct. For example, when an employer hires new employees, they will examine different criteria that could predict whether a prospective hire will be a good fit for a job. People who do well on a test may be more likely to do well at a job, while people with a low score on the test may be more likely to do poorly at that job. There are two different types of criterion validity: concurrent and predictive.

Concurrent Validity

Concurrent validity applies when criterion measures are obtained at the same time as test scores, indicating the ability of test scores to estimate an individual’s current state. For example, a test that measures levels of depression would be said to have concurrent validity if its scores matched the levels of depression the test taker was experiencing at the time of testing.

Predictive Validity

Predictive validity applies when the criterion measures are obtained at some time after the test is given. Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations.
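One common way to put a number on criterion-related validity is to correlate test scores with the criterion measure. The sketch below computes a Pearson correlation by hand. The aptitude scores and later job-performance ratings are invented for illustration, and the choice of Pearson's r is an assumption of this example rather than something the discussion above prescribes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: aptitude test scores at hiring, and supervisor
# performance ratings collected later for the same six people.
aptitude_scores = [52, 61, 70, 78, 85, 90]
performance_ratings = [2.1, 2.8, 3.0, 3.9, 4.2, 4.6]

validity_coefficient = pearson_r(aptitude_scores, performance_ratings)
print(round(validity_coefficient, 2))
```

If the criterion were collected at the same time as the test, the same calculation would describe concurrent rather than predictive validity. In both cases a single coefficient is only one piece of evidence, not proof of validity.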

Construct Validity

A test has construct validity if its scores are associated with the theoretical trait, or construct, that the test is meant to capture. Intelligence tests are one example of measurement instruments that should have construct validity. A valid intelligence test should accurately measure the construct of intelligence rather than other characteristics, such as memory or education level. Essentially, construct validity looks at whether a test covers the full range of behaviors that make up the construct being measured. In employee selection, for example, the procedure is to identify the tasks necessary to perform a job, such as typing, design, or physical ability. To demonstrate the construct validity of a selection procedure, the behaviors demonstrated during selection should be a representative sample of the behaviors required on the job.

Face Validity

Face validity is one of the most basic measures of validity. Essentially, researchers are simply taking the validity of the test at face value by looking at whether it appears to measure the target variable. On a measure of happiness, for example, the test would be said to have face validity if it appeared to actually measure levels of happiness. Face validity only means that the test looks like it works. It does not mean that the test has been proven to work. However, if the measure seems to be valid at this point, researchers may investigate further in order to determine whether the test is valid and should be used in the future. A survey asking people which political candidate they plan to vote for would be said to have high face validity. A complex test used as part of a psychological experiment that looks at a variety of values, characteristics, and behaviors might be said to have low face validity, because the exact purpose of the test is not immediately clear, particularly to the participants.

Reliability vs. Validity

While validity examines how well a test measures what it is intended to measure, reliability refers to how consistent the results are. There are four ways to assess reliability:

Internal consistency: This examines the consistency of different items within the same test.

Inter-rater: In this method, multiple independent judges score the same test, and their scores are compared to see how closely they agree.

Parallel or alternate forms: This approach uses different forms of the same test and compares the results.

Test-retest: This measures the reliability of results by administering the same test at different points in time.
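As one illustration, internal consistency is often summarized with Cronbach's alpha, which compares the variance of the individual items with the variance of respondents' total scores. The sketch below is a minimal version under stated assumptions: the three-item questionnaire and the five respondents are invented, and alpha is only one of several statistics used for this purpose.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 people answering 3 items on a 1-to-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

In this made-up data the items rise and fall together across respondents, so alpha comes out high. The other three approaches (inter-rater, alternate forms, test-retest) are typically checked by comparing two sets of scores for agreement or correlation instead.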

It’s important to remember that a test can be reliable without being valid. Consistent results do not always indicate that a test is measuring what researchers designed it to.
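A small numeric illustration may help here. In the sketch below, every number is invented: the same hypothetical test is given twice and produces nearly identical scores (high test-retest reliability), yet those scores show no linear relationship with the criterion the test was supposed to track. Pearson's r is used for both comparisons as an assumption of the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for six people on the same test, taken twice.
time_1 = [10, 20, 30, 40, 50, 60]
time_2 = [11, 19, 31, 39, 51, 59]

# Hypothetical criterion: the outcome the test was designed to reflect.
criterion = [3, 1, 2, 2, 1, 3]

retest_r = pearson_r(time_1, time_2)       # consistency across administrations
validity_r = pearson_r(time_1, criterion)  # relationship with the criterion

print(f"test-retest r = {retest_r:.3f}")
print(f"test-criterion r = {validity_r:.3f}")
```

The test-retest correlation is close to 1 while the test-criterion correlation is zero for this data, so the scores are consistent without telling us anything about the outcome of interest.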