What is meant by a test or instrument that accurately measures what it is supposed to measure in research?
Reliability and validity are important aspects of selecting a survey instrument. Reliability refers to the extent to which the instrument yields the same results over multiple trials. Validity refers to the extent to which the instrument measures what it was designed to measure. In research, there are three ways to approach validity: content validity, construct validity, and criterion-related validity.
Content validity measures the extent to which the items that make up the scale accurately represent or measure the information being assessed. Are the questions that are asked representative of the possible questions that could be asked?
Construct validity concerns what the calculated scores mean and whether they can be generalized. Construct validity uses statistical analyses, such as correlations, to verify the relevance of the questions. Questions from an existing, similar instrument that has been found reliable can be correlated with questions from the instrument under examination to determine whether construct validity is present. If the scores are highly correlated, this is called convergent validity; if convergent validity exists, construct validity is supported.

Criterion-related validity has to do with how well the scores from the instrument predict a known outcome they are expected to predict. Statistical analyses, such as correlations, are used to determine whether criterion-related validity exists. Scores from the instrument in question should be correlated with an item they are known to predict; if a correlation of > .60 exists, criterion-related validity exists as well.

Reliability can be assessed with the test-retest method, the alternative-form method, the internal-consistency method, the split-halves method, and inter-rater reliability. Test-retest administers the same instrument to the same sample at two different points in time, perhaps at one-year intervals; if the scores at the two time periods are highly correlated (> .60), they can be considered reliable. The alternative-form method requires two different instruments consisting of similar content. The same sample must take both instruments, and the scores from the two instruments must be correlated; if the correlations are high, the instrument is considered reliable. Internal consistency uses one instrument administered only once. The coefficient alpha (Cronbach's alpha) is used to assess the internal consistency of the items; if the alpha value is .70 or higher, the instrument is considered reliable. The split-halves method also requires one test administered once. The items in the scale are divided into halves, and a correlation between the halves estimates the reliability of each half of the test; to estimate the reliability of the entire instrument, the Spearman-Brown correction must be applied. Inter-rater reliability involves comparing the observations of two or more individuals and assessing the agreement of those observations. Kappa values can be calculated in this instance.
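As a concrete illustration of the internal-consistency and split-halves calculations described above, here is a minimal Python/NumPy sketch. The item-response matrix, the function names, and the sample scores are illustrative assumptions rather than anything from the original text; the alpha of .70 or higher benchmark follows the guideline above.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half_reliability(items: np.ndarray) -> float:
    """Correlate odd- and even-numbered halves, then apply the Spearman-Brown correction."""
    half_a = items[:, 0::2].sum(axis=1)         # 1st, 3rd, 5th, ... items
    half_b = items[:, 1::2].sum(axis=1)         # 2nd, 4th, 6th, ... items
    r_halves = np.corrcoef(half_a, half_b)[0, 1]
    return (2 * r_halves) / (1 + r_halves)      # Spearman-Brown correction for full length

# Hypothetical data: 8 respondents answering a 6-item Likert-type scale.
scores = np.array([
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 3],
    [4, 4, 4, 5, 4, 4],
    [1, 2, 1, 2, 1, 2],
    [3, 4, 3, 3, 4, 3],
    [5, 4, 5, 5, 4, 5],
])

print(f"Cronbach's alpha:       {cronbach_alpha(scores):.2f}")   # .70 or higher suggests reliability
print(f"Split-half (corrected): {split_half_reliability(scores):.2f}")
```

Statistical packages (for example, R's psych package or Python's pingouin) provide these statistics directly; the sketch only shows how the formulas described in the text translate into a calculation.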
Chapter 3: Understanding Test Quality - Concepts of Reliability and Validity

Test reliability and validity are two technical properties of a test that indicate the quality and usefulness of the test. These are the two most important features of a test. You should examine these features when evaluating the suitability of the test for your use. This chapter provides a simplified explanation of these two complex ideas. These explanations will help you understand the reliability and validity information reported in test manuals and reviews and use that information to evaluate the suitability of a test for your use.

Chapter Highlights

Principles of Assessment Discussed:
- Use only reliable assessment instruments and procedures.
- Use only assessment procedures and instruments that have been demonstrated to be valid for the specific purpose for which they are being used.
- Use assessment tools that are appropriate for the target population.

What makes a good test?

An employment test is considered "good" if it measures what it is supposed to measure and does so dependably and consistently.
The degree to which a test has these qualities is indicated by two technical properties: reliability and validity.

Test reliability

Reliability refers to how dependably or consistently a test measures a characteristic. If a person takes the test again, will he or she get a similar test score, or a much different score? A test that yields similar scores for a person who repeats the test is said to measure a characteristic reliably. How do we account for an individual who does not get exactly the same test score every time he or she takes the test? Some possible reasons are the following:
- the test taker's temporary psychological or physical state (for example, anxiety, fatigue, or illness)
- environmental factors, such as distractions or differences in testing conditions
- differences between alternate forms of the test
- differences between the individuals scoring or rating the test
These factors are sources of chance or random measurement error in the assessment process. If there were no random errors of measurement, the individual would get the same test score, the individual's "true" score, each time. The degree to which test scores are unaffected by measurement errors is an indication of the reliability of the test. Reliable assessment tools produce dependable, repeatable, and consistent information about people. In order to meaningfully interpret test scores and make useful employment or career-related decisions, you need reliable tools. This brings us to the next principle of assessment.

Principle of Assessment: Use only reliable assessment instruments and procedures. In other words, use only assessment tools that provide dependable and consistent information.

Interpretation of reliability information from test manuals and reviews

Test manuals and independent reviews of tests provide information on test reliability. The following discussion will help you interpret the reliability information about any test. The reliability of a test is indicated by the reliability coefficient. It is denoted by the letter "r" and is expressed as a number ranging between 0 and 1.00, with r = 0 indicating no reliability and r = 1.00 indicating perfect reliability. Do not expect to find a test with perfect reliability. Generally, you will see the reliability of a test expressed as a decimal, for example, r = .80 or r = .93. The larger the reliability coefficient, the more repeatable or reliable the test scores. Table 1 serves as a general guideline for interpreting test reliability. However, do not select or reject a test solely based on the size of its reliability coefficient. To evaluate a test's reliability, you should consider the type of test, the type of reliability estimate reported, and the context in which the test will be used.

Table 1. General Guidelines for Interpreting Reliability Coefficients
Types of reliability estimates

There are several types of reliability estimates, each influenced by different sources of measurement error. Test developers have the responsibility of reporting the reliability estimates that are relevant for a particular test. Before deciding to use a test, read the test manual and any independent reviews to determine whether its reliability is acceptable. The acceptable level of reliability will differ depending on the type of test and the reliability estimate used. The discussion in Table 2 should help you develop some familiarity with the different kinds of reliability estimates reported in test manuals and reviews.
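The sketch below shows how two of the reliability estimates named above might be computed: test-retest reliability as a simple Pearson correlation, and inter-rater reliability as Cohen's kappa. All scores, ratings, and function names are made up for illustration; nothing here comes from the source document.

```python
import numpy as np

# Test-retest: correlate scores from the same sample at two points in time.
time1 = np.array([78, 85, 62, 90, 71, 88, 67, 80])
time2 = np.array([75, 88, 65, 92, 70, 85, 70, 78])
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: r = {r_test_retest:.2f}")

# Inter-rater: Cohen's kappa for two raters classifying the same people.
def cohens_kappa(rater_a: np.ndarray, rater_b: np.ndarray) -> float:
    """Agreement between two raters, corrected for agreement expected by chance."""
    categories = np.union1d(rater_a, rater_b)
    p_observed = np.mean(rater_a == rater_b)      # observed proportion of agreement
    p_expected = sum(                             # chance agreement, category by category
        np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

rater_a = np.array([1, 1, 0, 1, 0, 1, 1, 0])      # e.g., 1 = acceptable, 0 = unacceptable
rater_b = np.array([1, 1, 0, 1, 1, 1, 0, 0])
print(f"Inter-rater reliability: kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```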
Table 3. General Guidelines for Interpreting Validity Coefficients

| Validity coefficient value | Interpretation |
|---|---|
| above .35 | very beneficial |
| .21 - .35 | likely to be useful |
| .11 - .20 | depends on circumstances |
| below .11 | unlikely to be useful |
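For illustration only, the interpretation bands in the table can be expressed as a small helper function; the function name and cutoffs below simply restate Table 3 and are not part of the original document.

```python
def interpret_validity(r: float) -> str:
    """Map a single-test validity coefficient onto the bands in Table 3."""
    if r > 0.35:
        return "very beneficial"
    elif r >= 0.21:
        return "likely to be useful"
    elif r >= 0.11:
        return "depends on circumstances"
    else:
        return "unlikely to be useful"

for coefficient in (0.40, 0.28, 0.15, 0.05):
    print(f"r = {coefficient:.2f}: {interpret_validity(coefficient)}")
```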
As a general rule, the higher the validity coefficient, the more beneficial it is to use the test. Validity coefficients of r = .21 to r = .35 are typical for a single test. Validities for selection systems that use multiple tests will probably be higher because you are using different tools to measure or predict different aspects of performance, whereas a single test is more likely to measure or predict fewer aspects of total performance. Table 3 serves as a general guideline for interpreting test validity for a single test. Evaluating test validity is a sophisticated task, and you might require the services of a testing expert. In addition to the magnitude of the validity coefficient, you should also consider at a minimum the following factors:
- level of adverse impact associated with your assessment tool
- selection ratio (number of applicants versus the number of openings)
- cost of a hiring error
- cost of the selection tool
- probability of hiring a qualified applicant based on chance alone.
Here are three scenarios illustrating why you should consider these factors, individually and in combination with one another, when evaluating validity coefficients:
Scenario One
You are in the process of hiring applicants where you have a high selection ratio and are filling positions that do not require a great deal of skill. In this situation, you might be willing
to accept a selection tool that has validity considered "likely to be useful" or even "depends on circumstances" because you need to fill the positions, you do not have many applicants to choose from, and the level of skill required is not that high.
Now, let's change the situation.
Scenario Two
You are recruiting for jobs that require a high level of accuracy, and a mistake made by a worker could be dangerous and costly. With these additional factors, a slightly lower
validity coefficient would probably not be acceptable to you because hiring an unqualified worker would be too much of a risk. In this case you would probably want to use a selection tool that reported validities considered to be "very beneficial" because a hiring error would be too costly to your company.
Here is another scenario that shows why you need to consider multiple factors when evaluating the validity of assessment tools.
Scenario Three
A company you are working
for is considering using a very costly selection system that results in fairly high levels of adverse impact. You decide to implement the selection tool because the assessment tools you found with lower adverse impact had substantially lower validity, were just as costly, and making mistakes in hiring decisions would be too much of a risk for your company. Your company decided to implement the assessment given the difficulty in hiring for the particular positions, the "very beneficial" validity
of the assessment, and your failed attempts to find alternative instruments with less adverse impact. However, your company will continue efforts to find ways of reducing the adverse impact of the system.
Again, these examples demonstrate the complexity of evaluating the validity of assessments. Multiple factors need to be considered in most situations. You might want to seek the assistance of a testing expert (for example, an industrial/organizational psychologist) to evaluate the appropriateness of particular assessments for your employment situation.
When properly applied, the use of valid and reliable assessment instruments will help you make better decisions. Additionally, by using a variety of assessment tools as part of an assessment program, you can more fully assess the skills and capabilities of people, while reducing the effects of errors associated with any one tool on your decision making.
A document by the:
U.S. Department of Labor
Employment and Training Administration
1999