Classical test theory

A test is a scientific instrument to the extent that it measures what it intends to measure, that is, it is valid, and that it measures well, that is, it is precise or reliable. If the measurements an instrument provides cannot be trusted, because they vary from one occasion to another when we measure the same object, we say that the instrument is not reliable. To measure something correctly, an instrument has to be precise; otherwise, whatever it measures, it will measure it badly. Precision is therefore a necessary but not a sufficient condition. The instrument must also be valid: what it measures precisely must be what it is intended to measure, and nothing else.


Reliability:

Absolute and relative reliability: The reliability of a test can be approached in two different ways, although they are essentially equivalent.

Reliability and the imprecision of measurements: When a subject responds to a test, the subject obtains an empirical score, which is affected by error. If there were no error, the subject would obtain his or her true score. The test is imprecise because the empirical score does not match the true score. The difference between the two scores is the measurement error. The standard error of measurement is the standard deviation of the measurement errors. It indicates the absolute precision of the test, since it allows us to estimate how far the score obtained is likely to be from the one that would be obtained if there were no error.
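The decomposition just described can be illustrated with a minimal Python sketch. It assumes the classical model in which the empirical score X is the true score V plus a random error E; the five true scores and errors are invented for illustration, and `standard_error_of_measurement` is a hypothetical helper name. In practice V and E are never observed, so this only shows the definition, not how the error is estimated from real data.

```python
from statistics import pstdev

# Hypothetical true scores (V) and random errors (E) for five subjects.
true_scores = [20, 25, 30, 35, 40]
errors = [2, -1, 0, 1, -2]

# Classical model: each empirical score is the true score plus its error, X = V + E.
empirical = [v + e for v, e in zip(true_scores, errors)]


def standard_error_of_measurement(errs):
    """Standard deviation of the measurement errors (population form)."""
    return pstdev(errs)


sem = standard_error_of_measurement(errors)
print(empirical)      # [22, 24, 30, 36, 38]
print(round(sem, 3))  # 1.414
```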

Reliability and stability of measurements: A test is more reliable the more constant or stable its results remain when it is repeated. The more stable the results are on two occasions, the higher the correlation between them. This correlation is called the reliability coefficient. It tells us not the amount of error but the consistency of the test with itself, that is, the consistency of the information it provides. The reliability coefficient expresses the relative reliability of the test.
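As a sketch of this idea, the reliability coefficient is simply the Pearson correlation between the two series of scores. The `pearson` helper and the six subjects' scores below are hypothetical, written with the standard library only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of the same six subjects on two administrations of one test.
first = [12, 15, 9, 20, 17, 11]
second = [13, 14, 10, 19, 18, 10]

# Reliability coefficient: correlation between the two series of scores.
r_xx = pearson(first, second)
print(round(r_xx, 3))
```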

The reliability coefficient and the reliability index:
- The reliability coefficient of a test is the correlation of the test with itself, obtained, for example, from two parallel forms: $r_{xx'}$.
- The reliability index is the correlation between the empirical scores of a test and its true scores: $r_{xV}$.
The reliability index is always greater than or equal to the reliability coefficient. Three classic methods for obtaining the reliability coefficient are worth highlighting:

The test-retest (repetition) method, or correlation between the test and its repetition: The same test is applied to the same group on two occasions, and the correlation between the two series of scores is calculated. This correlation is the reliability coefficient. This method usually gives a higher reliability coefficient than the other procedures, and it may be contaminated by disturbing factors.
The parallel-forms method, or correlation between two parallel forms of the test: Two parallel forms of the same test are prepared, that is, two equivalent forms that provide the same information, and both are applied to the same group of subjects. The correlation between the two forms is the reliability coefficient. Because the same test is not repeated, this method avoids the sources of disturbance that affect test-retest reliability.
The split-half method, or correlation between two parallel halves of the test: The test is divided into two equivalent halves and the correlation between them is calculated. It is the preferred method, as it is simple and avoids the limitations of the previous procedures. One half can be made up of the odd-numbered items and the other of the even-numbered items. Because this correlation refers to a test only half as long, it is usually corrected with the Spearman-Brown equation to estimate the reliability of the complete test.
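A minimal sketch of the odd-even split, assuming a subjects-by-items matrix of item scores and correcting the half-test correlation with the Spearman-Brown equation for a doubled length (2r / (1 + r)). The response matrix and function names are invented for illustration; other ways of splitting the test are equally possible.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_matrix):
    """Odd-even split-half reliability; returns (half-test r, corrected r).

    item_matrix: one row per subject, one column per item (item 1 first).
    """
    # Items 1, 3, 5, ... (indices 0, 2, 4, ...) form one half;
    # items 2, 4, 6, ... (indices 1, 3, 5, ...) form the other.
    odd_half = [sum(row[0::2]) for row in item_matrix]
    even_half = [sum(row[1::2]) for row in item_matrix]
    r_halves = pearson(odd_half, even_half)
    # Spearman-Brown with n = 2: reliability of the full-length test.
    return r_halves, 2 * r_halves / (1 + r_halves)


# Hypothetical 0/1 responses of five subjects to six items.
responses = [
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
r_half, r_full = split_half_reliability(responses)
print(round(r_half, 3), round(r_full, 3))
```

The corrected value is always higher than the half-test correlation (for positive correlations), reflecting the gain from doubling the length.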

The reliability coefficient and the correlation between parallel tests

The reliability coefficient of a test is the proportion of the empirical variance that is true variance:

$$r_{xx'} = \frac{S_V^2}{S_X^2}$$

The reliability coefficient of a test varies between 0 and 1. For example, if the correlation between two parallel tests is $r_{xx'} = 0.80$, then 80% of the variance of the test is due to the true measure and the remaining 20% is due to error. The reliability index of a test is the correlation between its empirical scores and its true scores, and it is equal to the square root of the reliability coefficient:

$$r_{xV} = \sqrt{r_{xx'}}$$
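These relationships can be checked with a small simulation. It assumes, for illustration only, normally distributed true scores (SD 10) and independent errors (SD 5), so the theoretical reliability is 100 / (100 + 25) = 0.80, matching the example in the text; the parallel form reuses the same true scores with fresh errors. With a large simulated sample, the variance ratio, the correlation between parallel forms and the squared correlation with the true scores should all come out close to 0.80.

```python
import random
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(1)
n_subjects = 20000

# Assumed model: true scores V ~ N(50, 10) and independent errors E ~ N(0, 5).
true_scores = [random.gauss(50, 10) for _ in range(n_subjects)]
form_a = [v + random.gauss(0, 5) for v in true_scores]  # empirical scores X = V + E
form_b = [v + random.gauss(0, 5) for v in true_scores]  # parallel form: same V, new E

ratio = pvariance(true_scores) / pvariance(form_a)  # S_V^2 / S_X^2
r_parallel = pearson(form_a, form_b)                # reliability coefficient r_xx'
r_index = pearson(form_a, true_scores)              # reliability index r_xV

print(round(ratio, 3), round(r_parallel, 3), round(r_index ** 2, 3))
```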

Once two parallel forms of a test have been developed, analysis of variance is applied to check the homogeneity of their variances and the difference between their means. If the variances are homogeneous, the difference between the means is not significant, and the two forms are built with the same number of items of the same type and psychological content, they can be considered parallel. If not, they must be revised until they are. A complete lack of reliability corresponds to $r_{xx'} = 0$.

The standard error of measurement: The difference between the empirical score and the true score is the random error, called the measurement error. The standard deviation of the measurement errors is called the standard error of measurement. It allows us to estimate the absolute reliability of the test, that is, how much measurement error affects a score.
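Since true scores are not observed, the standard error of measurement is usually estimated from the reliability coefficient. From $r_{xx'} = S_V^2 / S_X^2$ and $S_X^2 = S_V^2 + S_E^2$ (errors uncorrelated with true scores), it follows that $S_E = S_X \sqrt{1 - r_{xx'}}$. The sketch below applies this and builds a simple band of ±1.96 standard errors around an observed score, assuming normally distributed errors; the test values and function names are hypothetical, and centering the band on the observed score is only the simplest of several conventions.

```python
from math import sqrt


def sem_from_reliability(sd_x, r_xx):
    """Standard error of measurement: S_E = S_X * sqrt(1 - r_xx')."""
    if not 0 <= r_xx <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sd_x * sqrt(1 - r_xx)


def score_band(score, sd_x, r_xx, z=1.96):
    """Band score +/- z * S_E around an observed score.

    Assumes normally distributed errors; z = 1.96 gives roughly 95% coverage.
    """
    se = sem_from_reliability(sd_x, r_xx)
    return score - z * se, score + z * se


# Hypothetical test: SD of 15 points, reliability 0.91, observed score 100.
se = sem_from_reliability(15, 0.91)  # about 15 * 0.3 = 4.5
low, high = score_band(100, 15, 0.91)
print(round(se, 2), round(low, 2), round(high, 2))
```

A perfectly reliable test ($r_{xx'} = 1$) has no measurement error, while a test with $r_{xx'} = 0$ has an error as large as the whole standard deviation of its scores.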

Reliability and length: The length of a test is the number of its items, and its reliability depends on this length. If a test consists of three items, a subject may obtain a score of 1 on one occasion and a score of 2 on another.

From one occasion to the other, the score has varied by one point; one point out of three is a 33% variation, which is large. If subjects show random variations of this kind, the correlation of the test with itself, or between its two parallel forms, will be greatly lowered and cannot be high. If the test is much longer, for example 100 items, a subject may obtain 70 points on one occasion and 67 on a parallel form. The score has varied by 3 points, a relatively small variation with respect to the whole test, specifically 3%. Accidental changes of this size in subjects' scores, when passing from one form to the parallel one, are relatively unimportant and will not lower the correlation between the two forms as much as before.

The reliability coefficient will therefore be much higher than in the previous case. The Spearman-Brown equation expresses the relationship between reliability and length. The precision of a test is zero when its length is 0 and increases as the length increases, although the gain becomes relatively smaller as the test gets longer. That is, precision grows a lot at first and relatively less afterwards. As the length tends to infinity, the reliability coefficient tends to 1.
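The Spearman-Brown equation gives the reliability of a test lengthened n times (n = new length / original length) as $r_{nn} = \frac{n\,r_{xx'}}{1 + (n-1)\,r_{xx'}}$. A minimal sketch, with an arbitrary starting coefficient of 0.50, shows the pattern described above.

```python
def spearman_brown(r_xx, n):
    """Reliability of a test lengthened n times (n = new length / original length).

    Assumes the added parts are parallel to the original test.
    """
    return n * r_xx / (1 + (n - 1) * r_xx)


r_original = 0.50
for n in (0, 1, 2, 4, 8, 100):
    print(n, round(spearman_brown(r_original, n), 3))
```

With these values the coefficient is 0 at n = 0, 0.5 at n = 1, about 0.667 at n = 2, 0.8 at n = 4 and about 0.99 at n = 100: large gains at first, then ever smaller ones as it approaches 1.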

As the length of a test increases, its precision increases because the true variance grows faster than the error variance: when a test is lengthened n times with parallel parts, the true variance is multiplied by n² and the error variance only by n. The proportion of variance due to error therefore decreases. The Rulon formula and the Flanagan and Guttman formula are used to calculate the reliability coefficient and are especially applicable to the split-half method.
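The Rulon and Flanagan-Guttman formulas are usually written as $r_{xx'} = 1 - S_d^2 / S_X^2$, where d is the difference between the two half scores, and $r_{xx'} = 2\left(1 - \frac{S_a^2 + S_b^2}{S_X^2}\right)$, where a and b are the half scores and X is their sum. Below is a minimal sketch with hypothetical half scores; both functions use population variances, and with the same variance convention the two formulas are algebraically identical, which the printed values illustrate.

```python
from statistics import pvariance


def rulon(half_a, half_b):
    """Rulon: 1 - var(differences between halves) / var(total scores)."""
    diffs = [a - b for a, b in zip(half_a, half_b)]
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 1 - pvariance(diffs) / pvariance(totals)


def flanagan_guttman(half_a, half_b):
    """Flanagan-Guttman: 2 * (1 - (var(a) + var(b)) / var(total scores))."""
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 2 * (1 - (pvariance(half_a) + pvariance(half_b)) / pvariance(totals))


# Hypothetical half-test scores (odd items vs even items) for six subjects.
odd = [8, 6, 9, 4, 7, 5]
even = [7, 6, 10, 3, 6, 6]

print(round(rulon(odd, even), 3), round(flanagan_guttman(odd, even), 3))
```

Unlike the Spearman-Brown correction of the half correlation, these formulas do not require the two halves to have equal variances; when they do, all three give the same result.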

Reliability and consistency: The reliability coefficient can also be obtained in another way, through the so-called alpha coefficient (Cronbach), also called the coefficient of generalizability or representativeness. The alpha coefficient indicates how precisely a set of items measures an aspect of personality or behavior. It can be interpreted as:
- An estimate of the mean correlation among all the possible items measuring a given aspect.
- A measure of the precision of the test as a function of its coherence or internal consistency (the interrelation between its items, that is, the extent to which all the items of the test measure the same thing) and of its length.
- An indication of the representativeness of the test, that is, the extent to which the sample of items that makes it up is representative of the population of possible items of the same type and psychological content.
The alpha coefficient mainly reflects two basic aspects of the precision of a test:
1. The interrelation between its items: the extent to which they all measure the same thing well.

2. The length of the test: just as increasing the number of cases in a sample, provided systematic errors are eliminated, makes the sample a better representation of the population from which it is drawn and makes chance error less likely to intervene, adding items makes the test more representative.

If the test items are dichotomous (yes or no, 1 or 0, agree or disagree, etc.), the equation for the alpha coefficient simplifies, giving rise to the Kuder-Richardson equations (KR20 and KR21). For a given number of items, the more homogeneous a test is, the more reliable it is. The alpha coefficient indicates reliability insofar as it reflects the homogeneity and the coherence or internal consistency of the items of a test.
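A minimal sketch of the alpha coefficient, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum S_i^2}{S_X^2}\right)$ for k items, and of the two Kuder-Richardson simplifications for 0/1 items: KR20 replaces each item variance with p(1 - p), where p is the proportion of subjects answering 1, and KR21 further assumes that all items have the same difficulty. The response matrix and function names are invented; population variances are used throughout, which makes KR20 coincide with alpha on dichotomous items.

```python
from statistics import mean, pvariance


def cronbach_alpha(items):
    """items: one row per subject, one column per item."""
    k = len(items[0])
    item_var_sum = sum(pvariance(col) for col in zip(*items))
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(items):
    """KR20 for 0/1 items: the sum of p*q replaces the sum of item variances."""
    k, n = len(items[0]), len(items)
    pq = 0.0
    for col in zip(*items):
        p = sum(col) / n  # proportion of subjects scoring 1 on this item
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - pq / total_var)


def kr21(items):
    """KR21: like KR20 but assumes all items have the same difficulty."""
    k = len(items[0])
    totals = [sum(row) for row in items]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


# Hypothetical 0/1 responses of five subjects to six items.
responses = [
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(cronbach_alpha(responses), 3), round(kr20(responses), 3), round(kr21(responses), 3))
```

Because its items differ in difficulty, KR21 comes out lower than KR20 here; the two coincide only when all items have the same proportion of correct answers.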

Standards and reliability criteria

According to the item sample space model, the objective of a test is to estimate the score that would be obtained if all the items in the sample space were used. This score would be the true score, which actual measurements approximate more or less closely. The test is more or less reliable depending on how strongly a sample of items correlates with the true scores. Central to this model is the matrix of correlations between all the items in the sample space. This model emphasizes internal consistency directly, and to the extent that it achieves it, it indirectly guarantees stability.
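One way to see how the item correlation matrix determines reliability is the standardized form of alpha, which depends only on the number of items k and the mean inter-item correlation $\bar r$: $\alpha_{std} = \frac{k\,\bar r}{1 + (k-1)\,\bar r}$, the Spearman-Brown equation applied to $\bar r$. The sketch assumes items on comparable scales; the four-item matrix and function names are hypothetical, and this is an illustration rather than necessarily the procedure prescribed by the model.

```python
def mean_inter_item_correlation(corr):
    """Average of the off-diagonal entries of a k x k item correlation matrix."""
    k = len(corr)
    off_diag = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    return sum(off_diag) / len(off_diag)


def standardized_alpha(corr):
    """Standardized alpha: k * r_bar / (1 + (k - 1) * r_bar)."""
    k = len(corr)
    r_bar = mean_inter_item_correlation(corr)
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical correlation matrix for four items.
corr = [
    [1.0, 0.4, 0.3, 0.5],
    [0.4, 1.0, 0.35, 0.45],
    [0.3, 0.35, 1.0, 0.4],
    [0.5, 0.45, 0.4, 1.0],
]
print(round(mean_inter_item_correlation(corr), 3), round(standardized_alpha(corr), 3))
```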

The linear model of parallel tests emphasizes the stability of the scores, and to the extent that it achieves stability, it indirectly favors internal consistency. If a test is used to establish individual diagnoses and prognoses, its reliability coefficient should be 0.90 or higher. For group predictions and classifications the requirement is less strict, although it is not advisable to fall much below the range of 0.80 to 0.90.
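To judge how far a test is from these criteria, the Spearman-Brown equation can be solved for the lengthening factor, $n = \frac{r_{target}(1 - r_{xx'})}{r_{xx'}(1 - r_{target})}$. The sketch assumes the added items are parallel to the existing ones; the 20-item test with a coefficient of 0.70 and the function names are hypothetical.

```python
import math


def spearman_brown(r_xx, n):
    """Reliability of a test lengthened n times."""
    return n * r_xx / (1 + (n - 1) * r_xx)


def lengthening_factor(r_current, r_target):
    """Spearman-Brown solved for n: how many times longer the test must be."""
    if not (0 < r_current < 1 and 0 < r_target < 1):
        raise ValueError("coefficients must lie strictly between 0 and 1")
    return r_target * (1 - r_current) / (r_current * (1 - r_target))


# Hypothetical 20-item test with r_xx' = 0.70 and a target of 0.90.
n = lengthening_factor(0.70, 0.90)
items_needed = math.ceil(20 * n)  # round up to a whole number of items
print(round(n, 2), items_needed)
```

Under these assumptions the test would need about 3.86 times its length, that is, 78 items, to reach 0.90.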

In certain kinds of tests, such as personality tests, it is sometimes difficult to achieve coefficients above 0.70. If parallel forms, or parallel halves, are applied with a more or less long interval between them, chance errors may be more numerous than those affecting the alpha coefficient. This is because the correlation is lowered not only by the random errors intrinsic to the test on a single occasion, which are the ones the alpha coefficient takes into account, but also by all the errors arising from the two different situations, which may differ in many details. Therefore, the alpha coefficient is usually higher than the other coefficients.

The exception is the coefficient obtained by repeating the same test: the random errors of the first application are likely to be repeated in the second, and instead of decreasing the correlation between the two, they increase it. Care must be taken that the second application is completely independent of the first. If this is achieved, test-retest is the easiest and cheapest method, and it is advisable when assessing the stability of scores, especially over long periods of time and with complex tests.
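The effect of repeated errors on the test-retest coefficient can be illustrated with a simulation. It assumes standardized true scores and errors with equal variance (so the true reliability is 0.50) and, for the contaminated retest, a hypothetical carry-over of half of each subject's first error. The error variance on the second occasion is kept at 1 in both cases, so only the dependence between errors changes: the independent retest should correlate around 0.50, the contaminated one around 0.75.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(7)
n_subjects = 20000
true_scores = [random.gauss(0, 1) for _ in range(n_subjects)]
first_errors = [random.gauss(0, 1) for _ in range(n_subjects)]
first = [v + e for v, e in zip(true_scores, first_errors)]

# Independent retest: a completely new error on the second occasion.
second_indep = [v + random.gauss(0, 1) for v in true_scores]

# Contaminated retest (hypothetical): half of the first error carries over,
# plus fresh error scaled so the total error variance is still 1.
second_carry = [v + 0.5 * e + random.gauss(0, sqrt(0.75))
                for v, e in zip(true_scores, first_errors)]

r_indep = pearson(first, second_indep)  # close to the true reliability, 0.50
r_carry = pearson(first, second_carry)  # inflated by the repeated errors
print(round(r_indep, 3), round(r_carry, 3))
```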

This article is for information only; at Psychology-Online we are not able to make diagnoses or recommend treatments. We invite you to see a psychologist about your particular case.
