Inverse relationship in Spearman correlation. Correlation analysis using the Spearman method (Spearman ranks)

Pearson correlation coefficient

Pearson's coefficient r is used to study the relationship between two metric variables measured on the same sample. There are many situations in which it is appropriate. Does intelligence affect academic performance in the senior years of university? Is the size of an employee's salary related to his friendliness towards colleagues? Does a student's mood affect how successfully he solves a complex arithmetic problem? To answer such questions, the researcher must measure both indicators of interest for each member of the sample.

The value of the correlation coefficient does not depend on the units in which the characteristics are measured. Consequently, linear transformations of the variables (multiplying by a positive constant, adding a constant) do not change the value of the correlation coefficient. The exception is multiplying one of the variables by a negative constant: the correlation coefficient then keeps its magnitude but changes its sign.
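The invariance described above is easy to check numerically. The following is a minimal Python sketch: `pearson_r` is a hypothetical helper that computes r from deviations about the means, and the data are made up for illustration only.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r_shifted = pearson_r([10 * a + 3 for a in x], y)  # positive scale plus shift: r unchanged
r_negated = pearson_r([-2 * a for a in x], y)      # negative scale: only the sign flips

print(round(r, 6), round(r_shifted, 6), round(r_negated, 6))
```

The first two printed values coincide, and the third has the same magnitude with the opposite sign.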

Application of Spearman and Pearson correlation.

Pearson correlation is a measure of the linear relationship between two variables. It shows to what extent the variability of two variables is proportional. If the variables are strictly linearly related, the relationship between them can be represented graphically as a straight line with a positive (direct proportion) or negative (inverse proportion) slope.

In practice, the relationship between two variables, if there is one, is probabilistic and graphically looks like an elliptical scatter cloud. This ellipse can, however, be represented (approximated) by a straight line, the regression line. The regression line is constructed by the least squares method: the sum of the squared distances (measured along the Y axis) from each point of the scatter plot to the line is minimal.
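The least squares line can be sketched in a few lines of Python. The helper names and the five data points are hypothetical; the slope is Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and the intercept makes the line pass through the point of means.

```python
from statistics import mean


def least_squares_line(x, y):
    """Slope and intercept minimising the sum of squared vertical distances."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept


def sum_sq_residuals(x, y, slope, intercept):
    """Sum of squared distances along the Y axis from the points to the line."""
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(x, y)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 0.60x + 2.20
```

Shifting the slope or the intercept away from these values increases `sum_sq_residuals`, which is exactly the minimisation property described in the text.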

Of particular importance for assessing the accuracy of prediction is the variance of the estimates (predicted values) of the dependent variable. Essentially, the variance of the estimates of a dependent variable Y is the portion of its total variance that is due to the influence of the independent variable X. In other words, the ratio of the variance of the estimates of the dependent variable to its total variance equals the square of the correlation coefficient.

The square of the correlation coefficient between the dependent and independent variables represents the proportion of variance in the dependent variable that is due to the influence of the independent variable and is called the coefficient of determination. The coefficient of determination thus shows the extent to which the variability of one variable is caused (determined) by the influence of another variable.
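The claim that r² equals the ratio of the variance of the estimates to the total variance of Y can be verified numerically. This sketch uses made-up data and computes both sides directly; population variances are used in the numerator and the denominator, so the choice of divisor cancels.

```python
import math
from statistics import mean, pvariance

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)               # Pearson correlation
slope = sxy / sxx                            # least squares regression line
intercept = my - slope * mx
y_hat = [slope * a + intercept for a in x]   # estimates of the dependent variable

determination = r ** 2
variance_ratio = pvariance(y_hat) / pvariance(y)
print(round(determination, 4), round(variance_ratio, 4))
```

For these points both quantities come out at 0.6, up to floating-point rounding.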

The coefficient of determination has an important advantage over the correlation coefficient. The correlation coefficient is not a linear function of the strength of the relationship between two variables. Therefore, the arithmetic mean of the correlation coefficients for several samples does not coincide with the correlation calculated directly for all subjects of these samples pooled together (i.e., the correlation coefficient is not additive). The coefficient of determination, by contrast, reflects the strength of the relationship linearly and is therefore additive: it can be averaged over several samples.

Additional information about the strength of the relationship is provided by the squared correlation coefficient, the coefficient of determination: the part of the variance of one variable that can be explained by the influence of another variable. Unlike the correlation coefficient, the coefficient of determination increases linearly as the strength of the relationship increases.

Spearman's and Kendall's τ correlation coefficients (rank correlations)

If both variables whose relationship is being studied are measured on an ordinal scale, or one is ordinal and the other metric, rank correlation coefficients are used: Spearman's or Kendall's τ. Both coefficients require that both variables first be ranked.

Spearman's rank correlation coefficient is a nonparametric method used to study the relationship between phenomena statistically. It determines the actual degree of parallelism between the two quantitative series of the characteristics studied and expresses the closeness of the established relationship as a numerical coefficient.

If the members of a group of size n are ranked first on the variable x and then on the variable y, the correlation between x and y can be obtained simply by calculating the Pearson coefficient for the two series of ranks. Provided there are no tied ranks (i.e., no repeated ranks) in either variable, the Pearson formula can be greatly simplified computationally, yielding what is known as the Spearman formula.
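The equivalence between Pearson's r on ranks and the simplified Spearman formula can be demonstrated directly. In this minimal sketch the helpers `ranks`, `pearson_r` and `spearman_formula` are hypothetical names, the measurements are invented, and `ranks` assumes there are no tied values, matching the condition stated above.

```python
import math
from statistics import mean


def ranks(values):
    """Ranks 1..n (1 = smallest); assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_formula(rank_x, rank_y):
    """Simplified Spearman formula: 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical tie-free measurements, for illustration only
x = [12.0, 7.5, 9.1, 15.3, 11.0, 8.2]
y = [30, 22, 27, 35, 24, 25]
rx, ry = ranks(x), ranks(y)
print(pearson_r(rx, ry), spearman_formula(rx, ry))
```

Both printed numbers agree (about 0.8286 here); once ties appear, the two approaches diverge, which is why tied ranks need separate treatment.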

The power of the Spearman rank correlation coefficient is somewhat lower than that of the parametric (Pearson) correlation coefficient.

The rank correlation coefficient is advisable when the number of observations is small. The method can be applied not only to quantitative data but also when the recorded values are descriptive features of varying intensity.

When one or both of the compared variables contain many identical (tied) ranks, Spearman's rank correlation coefficient gives only approximate values. Ideally, each of the correlated series should be a sequence of distinct values.

An alternative to the Spearman rank correlation is Kendall's τ correlation. The correlation proposed by M. Kendall is based on the idea that the direction of the relationship can be judged by comparing subjects in pairs: if, for a pair of subjects, the change in x coincides in direction with the change in y, this indicates a positive relationship; if it does not, it indicates a negative relationship.
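Kendall's idea of comparing subjects in pairs translates directly into code. This sketch computes the simplest variant, τ-a, which assumes there are no tied values; the function name and the ranks are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no tied values in x or in y.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:      # x and y change in the same direction
            concordant += 1
        elif direction < 0:    # x and y change in opposite directions
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical ranks of five subjects, for illustration only
print(kendall_tau_a([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]))  # 0.4
```

Of the 10 pairs, 7 change in the same direction and 3 in opposite directions, giving (7 - 3) / 10 = 0.4.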

Correlation coefficients were specifically designed to quantify the strength and direction of the relationship between two properties measured on numerical scales (metric or rank). As already mentioned, the maximum strength of the relationship corresponds to correlation values of +1 (strict direct, or directly proportional, relationship) and -1 (strict inverse, or inversely proportional, relationship); the absence of a relationship corresponds to a correlation of zero. Additional information about the strength of the relationship is provided by the coefficient of determination: the portion of the variance in one variable that can be explained by the influence of another variable.

9. Parametric methods for data comparison


Parametric comparison methods are used if your variables were measured on a metric scale.

Comparison of the variances of two samples using Fisher's F-test


This method tests the hypothesis that the variances of the two general populations from which the compared samples are drawn differ from each other. Limitation of the method: the distribution of the characteristic in both samples should not differ from normal.
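A minimal sketch of the F statistic, assuming the common convention of dividing the larger sample variance by the smaller one. The function name and samples are hypothetical; the resulting F is then compared with a critical value from an F table for the returned degrees of freedom (no table values are assumed here).

```python
from statistics import variance


def fisher_f(sample_a, sample_b):
    """Fisher's F: larger sample variance divided by the smaller one.

    Returns (F, df_numerator, df_denominator); variances use n - 1.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical samples, for illustration only
a = [12, 15, 11, 18, 14]
b = [13, 14, 13, 15, 14]
f, df1, df2 = fisher_f(a, b)
print(round(f, 2), df1, df2)
```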

An alternative way to compare variances is Levene's test, which does not require a preliminary check of normality. It can be used to check the assumption of equal (homogeneous) variances before testing the significance of differences in means with Student's t-test for independent samples of different sizes.
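Levene's statistic is a one-way analysis of variance applied to the absolute deviations of each value from its group mean. The sketch below implements this original mean-centred form (a median-centred variant, the Brown-Forsythe test, is also widely used); the function name and data are hypothetical, and W would be compared with an F table at the returned degrees of freedom.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W using absolute deviations from each group mean.

    Returns (W, df_between, df_within). Assumes at least two groups and
    that the deviations are not all identical within every group
    (otherwise the within-group sum is zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from its own group mean
    z = [[abs(v - mean(g)) for v in g] for g in groups]
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (m - z_grand) ** 2 for zg, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, z_means) for v in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical samples of different sizes, for illustration only
a = [12, 15, 11, 18, 14, 16]
b = [13, 14, 13, 15]
print(levene_w(a, b))
```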

The rank correlation coefficient proposed by C. Spearman is a nonparametric measure of the relationship between variables measured on a rank scale. Calculating it requires no assumptions about the form of the distributions of the characteristics in the population. The coefficient measures the closeness of the relationship between ordinal characteristics, which in this case are the ranks of the compared quantities.

Like the Pearson coefficient, the Spearman correlation coefficient lies in the range from -1 to +1. It can be positive or negative, characterizing the direction of the relationship between two characteristics measured on a rank scale.

In principle, any number of features (qualities, traits, etc.) can be ranked, but ranking more than 20 features is difficult. Possibly this is why the table of critical values of the rank correlation coefficient was calculated only for up to forty ranked features (n ≤ 40; Table 20 of Appendix 6).

Spearman's rank correlation coefficient is calculated using the formula:

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

where n is the number of ranked features (indicators, subjects);

D is the difference between the ranks of the two variables for each subject;

ΣD² is the sum of squared rank differences.

Using the rank correlation coefficient, consider the following example.

Example: A psychologist is studying how individual indicators of school readiness, obtained from 11 first-graders before the start of school, are related to each other and to their average academic performance at the end of the school year.

To solve this problem, we ranked, first, the school-readiness indicators obtained on admission to school and, second, the average final academic performance of the same students at the end of the year. The results are presented in Table 13.

Table 13

Student no.

Ranks of school readiness indicators

Average annual performance ranks

We substitute the obtained data into the formula and perform the calculation. We get: r_s = 0.76.

To determine the significance level, refer to Table 20 of Appendix 6, which gives the critical values of the rank correlation coefficient.

Note that in Table 20 of Appendix 6, as in the table for the linear Pearson correlation, all correlation coefficients are given as absolute values. The sign of the correlation coefficient is therefore taken into account only when interpreting it.

Critical values in this table are looked up by n, the number of subjects. In our case n = 11, for which we find:

0.61 for P ≤ 0.05

0.76 for P ≤ 0.01

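We construct the corresponding "significance axis": coefficients below 0.61 fall in the zone of insignificance, those from 0.61 up to 0.76 in the zone of uncertainty, and those of 0.76 or more in the zone of significance.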

The resulting correlation coefficient coincides with the critical value for the 1% significance level. It can therefore be argued that the school-readiness indicators and the final grades of the first-graders are positively correlated; in other words, the higher the school-readiness indicator, the better the first-grader tends to study. In terms of statistical hypotheses, the psychologist must reject the null hypothesis that the correlation is zero and accept the alternative hypothesis that the relationship between the school-readiness indicators and average academic performance differs from zero.
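The "significance axis" logic can be expressed as a small classification function. This is a sketch under the assumption that the axis is divided at the two tabulated critical values and that |r| is compared against them, as the text prescribes; the function name and zone labels are illustrative.

```python
def significance_zone(r, crit_05, crit_01):
    """Place |r| on the significance axis defined by two tabulated critical values."""
    r_abs = abs(r)  # tables list absolute values; the sign matters only for interpretation
    if r_abs >= crit_01:
        return "zone of significance (P <= 0.01)"
    if r_abs >= crit_05:
        return "zone of uncertainty (P <= 0.05 but not P <= 0.01)"
    return "zone of insignificance"


# Critical values for n = 11 quoted in the text (Table 20 of Appendix 6)
print(significance_zone(0.76, crit_05=0.61, crit_01=0.76))
print(significance_zone(-0.65, crit_05=0.61, crit_01=0.76))
```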

The case of identical (equal) ranks

If there are identical ranks, the formula for the Spearman rank correlation coefficient is slightly different. Two new terms that take the identical ranks into account, called equal-rank corrections, are added to the numerator of the formula (here N denotes the number of subjects):

$$r_s = 1 - \frac{6\left(\sum D^2 + D_1 + D_2\right)}{N(N^2 - 1)}, \qquad D_1 = \frac{n^3 - n}{12}, \qquad D_2 = \frac{k^3 - k}{12}$$

where n is the number of identical ranks in the first column,

k is the number of identical ranks in the second column.

If a column contains two groups of identical ranks, the correction formula becomes somewhat more complicated:

$$D = \frac{(n^3 - n) + (k^3 - k)}{12}$$

where n is the number of identical ranks in the first group of the ranked column,

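k is the number of identical ranks in the second group of the ranked column. In the general case, the formula is modified as follows:

$$D = \frac{\sum (t^3 - t)}{12}$$

where the sum runs over all groups of identical ranks in the column and t is the number of identical ranks in each group.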
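The corrections can be computed mechanically by counting each group of identical ranks. This sketch assumes the correction (t³ - t)/12 per group of t tied ranks and follows the text's approach of adding the corrections to the numerator; this is an approximation, and computing Pearson's r directly on the average ranks is an exact alternative. Function names and rank columns are hypothetical.

```python
from collections import Counter


def tie_correction(ranks):
    """Sum of (t**3 - t) / 12 over every group of t identical ranks."""
    return sum((t ** 3 - t) / 12 for t in Counter(ranks).values() if t > 1)


def spearman_with_ties(rank_x, rank_y):
    """Spearman coefficient with the tie corrections added to the numerator:
    1 - 6 * (sum(D^2) + Tx + Ty) / (N * (N^2 - 1)), N = number of subjects."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    corrections = tie_correction(rank_x) + tie_correction(rank_y)
    return 1 - 6 * (d_squared + corrections) / (n * (n ** 2 - 1))


# Hypothetical rank columns illustrating the tie patterns discussed in the text
print(tie_correction([1, 2, 3.5, 3.5, 5]))   # two identical ranks: 0.5
print(tie_correction([1, 3, 3, 3, 5]))       # three identical ranks: 2.0
print(tie_correction([2, 2, 2, 5, 5, 5]))    # two groups of three: 4.0
```

Without ties, `tie_correction` returns 0 and `spearman_with_ties` reduces to the ordinary Spearman formula.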

Example: Using the School Test of Mental Development (SHTUR), a psychologist studies the intelligence of 12 ninth-grade students. At the same time, he asks the literature and mathematics teachers to rank the same students by their mental development. The task is to determine how the objective indicators of mental development (SHTUR data) and the teachers' expert assessments are related to each other.

The experimental data and the additional columns needed to calculate the Spearman correlation coefficient are presented in Table 14.

Table 14

Student no.

Ranks on the SHTUR test

Expert assessments by the mathematics teacher

Expert assessments by the literature teacher

D (second and third columns)

D (second and fourth columns)

D² (second and third columns)

D² (second and fourth columns)

Since tied ranks were used in the ranking, the correctness of the ranking in the second, third and fourth columns of the table must be checked. Each of these columns sums to the same total, 78.

We check this with the calculation formula for the sum of ranks. The check gives:

$$\sum R = \frac{n(n + 1)}{2} = \frac{12 \cdot 13}{2} = 78$$

The fifth and sixth columns of the table show the differences between each student's rank on the SHTUR test and the teachers' expert rankings in mathematics and literature, respectively. The sum of the rank differences must equal zero. Summing the D values in the fifth and sixth columns gave the required result, so the ranks were subtracted correctly. A similar check should be made every time complex types of ranking are performed.
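Both checks are easy to automate. In this sketch the function names and the rank column are hypothetical; note that if two columns each pass the rank-sum check, their differences automatically sum to zero, so the second check mainly catches slips in the subtraction itself.

```python
def rank_sum_ok(ranks):
    """Ranks 1..n, with ties replaced by average ranks, must sum to n(n + 1)/2."""
    n = len(ranks)
    return sum(ranks) == n * (n + 1) / 2


def differences_ok(rank_x, rank_y):
    """The rank differences D of two correct rankings must sum to zero."""
    return sum(a - b for a, b in zip(rank_x, rank_y)) == 0


# Hypothetical column of 12 ranks with one pair of ties (4.5, 4.5)
column = [1, 2, 3, 4.5, 4.5, 6, 7, 8, 9, 10, 11, 12]
print(rank_sum_ok(column))                          # True: 12 * 13 / 2 = 78
print(differences_ok(column, list(range(1, 13))))   # True
```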

Before calculating with the formula, the corrections for tied ranks must be computed for the second, third and fourth columns of the table.

In our case, the second column of the table contains two identical ranks, so, according to the formula, the correction D1 is:

$$D_1 = \frac{2^3 - 2}{12} = 0.5$$

The third column contains three identical ranks, so, according to the formula, the correction D2 is:

$$D_2 = \frac{3^3 - 3}{12} = 2$$

The fourth column contains two groups of three identical ranks, so, according to the formula, the correction D3 is:

$$D_3 = \frac{(3^3 - 3) + (3^3 - 3)}{12} = 4$$

Before solving the problem, recall that the psychologist is examining two questions: how the SHTUR test ranks are related to the expert assessments in mathematics and in literature. That is why the calculation is carried out twice.

We calculate the first rank correlation coefficient with the tie corrections (D1 and D2), using the formula. We get:

Now we calculate it without the corrections:

As we can see, the difference between the two values of the correlation coefficient is very small.

We calculate the second rank correlation coefficient with the tie corrections (D1 and D3), using the formula. We get:

Now we calculate it without the corrections:

Again, the differences are very small. Since the number of students is the same in both cases, we find the critical values for n = 12 in Table 20 of Appendix 6 for both correlation coefficients at once:

0.58 for P ≤ 0.05

0.73 for P ≤ 0.01

We plot the first value on the "significance axis":

In the first case, the obtained rank correlation coefficient falls in the zone of significance. The psychologist must therefore reject the null hypothesis that the correlation coefficient equals zero and accept the alternative hypothesis that it differs significantly from zero. In other words, the result suggests that the better a student performs on the SHTUR test, the higher the mathematics teacher rates that student.

We plot the second value on the "significance axis":

In the second case, the rank correlation coefficient falls in the zone of uncertainty, that is, it is significant at the 5% level but not at the 1% level. If the stricter 1% level is required, the psychologist retains the null hypothesis that the correlation coefficient does not differ from zero and rejects the alternative hypothesis. In that case, the result suggests that no reliable relationship has been shown between the students' SHTUR test ranks and the literature teacher's assessments.

To apply the Spearman correlation coefficient, the following conditions must be met:

1. The compared variables must be measured on an ordinal (rank) scale, although they can also be measured on an interval or ratio scale.

2. The form of the distribution of the correlated quantities does not matter.

3. The compared variables X and Y must contain the same number of values (each subject must have a value on both).

The tables of critical values of the Spearman correlation coefficient (Table 20, Appendix 6) cover n = 5 to n = 40; for a larger number of subjects, the table for the Pearson correlation coefficient (Table 19, Appendix 6) should be used. Critical values are looked up at k = n.


1. History of the development of the rank correlation coefficient

This criterion was developed and proposed for correlation analysis in 1904 by Charles Edward Spearman, an English psychologist and professor at University College London.

2. What is the Spearman coefficient used for?

Spearman's rank correlation coefficient is used to detect and evaluate the closeness of the relationship between two series of compared quantitative indicators. If the ranks of the indicators, ordered by increasing or decreasing value, mostly coincide (a greater value of one indicator corresponds to a greater value of the other, for example, when comparing a patient's height and body weight), a direct correlation is concluded. If the ranks of the indicators run in opposite directions (a higher value of one indicator corresponds to a lower value of the other, for example, when comparing age and heart rate), the relationship between the indicators is said to be inverse.

    The Spearman correlation coefficient has the following properties:
  1. The correlation coefficient takes values from minus one to one; rs = 1 indicates a strictly direct relationship, and rs = -1 a strictly inverse relationship.
  2. If the correlation coefficient is negative, the relationship is inverse; if it is positive, the relationship is direct.
  3. If the correlation coefficient is zero, there is practically no relationship between the quantities.
  4. The closer the absolute value of the correlation coefficient is to one, the stronger the relationship between the measured quantities.

3. In what cases can the Spearman coefficient be used?

Because the coefficient is a nonparametric method of analysis, no test for normality of the distribution is required.

The compared indicators can be measured on a continuous scale (for example, the number of red blood cells in 1 μl of blood) or on an ordinal scale (for example, expert assessment points from 1 to 5).

The effectiveness and quality of the Spearman estimate decrease if the differences between the distinct values of either measured quantity are large enough. The Spearman coefficient is not recommended when the values of the measured quantity are distributed unevenly.

4. How to calculate the Spearman coefficient?

Calculation of the Spearman rank correlation coefficient includes the following steps:

5. How to interpret the Spearman coefficient value?

When the rank correlation coefficient is used, the closeness of the relationship between characteristics is assessed conventionally: coefficient values of 0.3 or less indicate a weak relationship, values greater than 0.4 but less than 0.7 indicate a moderate relationship, and values of 0.7 or more indicate a strong relationship.

The statistical significance of the obtained coefficient is assessed with Student's t-test. If the calculated t value is less than the tabulated value for the given number of degrees of freedom (n - 2), the observed relationship is not statistically significant; if it is greater, the correlation is considered statistically significant.
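The t value referred to above is usually computed as t = r_s √((n - 2) / (1 - r_s²)) with n - 2 degrees of freedom. This sketch assumes that formula; the function name is hypothetical, and the inputs r_s = 0.76 and n = 11 are borrowed from the school-readiness example purely for illustration. The result is then compared with a tabulated t value.

```python
import math


def spearman_t(r_s, n):
    """Student's t for testing r_s against zero; returns (t, degrees of freedom).

    Undefined for |r_s| = 1 (division by zero).
    """
    t = r_s * math.sqrt((n - 2) / (1 - r_s ** 2))
    return t, n - 2


# Illustration with r_s = 0.76 and n = 11 from the school-readiness example
t, df = spearman_t(0.76, 11)
print(round(t, 2), df)
```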

A psychology student (or a sociologist, manager, etc.) is often interested in how two or more variables are related to each other in one or more groups being studied.

In mathematics, to describe the relationships between variable quantities, the concept of a function F is used, which associates each specific value of the independent variable X with a specific value of the dependent variable Y. The resulting dependence is denoted as Y=F(X).

The correlations between the measured characteristics can be of different types: linear or nonlinear, positive or negative. A correlation is linear if, as one variable X increases or decreases, the second variable Y on average also increases or decreases proportionally. It is nonlinear if, as one quantity increases, the change in the other is not linear but follows some other law.

The correlation is positive if, as the variable X increases, the variable Y on average also increases; if, as X increases, Y tends to decrease on average, we speak of a negative correlation. It may also turn out that no relationship between the variables can be established; in that case, we say there is no correlation.

The task of correlation analysis comes down to establishing the direction (positive or negative) and form (linear or nonlinear) of the relationship between varying characteristics, measuring its closeness and, finally, checking the significance of the obtained correlation coefficients.


Spearman's rank correlation coefficient is calculated using the formula:

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

where n is the number of ranked features (indicators, subjects);
D is the difference between the ranks of the two variables for each subject;
ΣD² is the sum of squared rank differences.

The critical values of the Spearman rank correlation coefficient are presented below:

The value of Spearman's rank correlation coefficient lies in the range from -1 to +1. It can be positive or negative, characterizing the direction of the relationship between two characteristics measured on a rank scale.

If the absolute value of the correlation coefficient is close to 1, the relationship between the variables is strong. In particular, when a variable is correlated with itself, the correlation coefficient equals +1; such a relationship characterizes a directly proportional dependence. If the values of variable X are arranged in ascending order and the same values (now designated as variable Y) in descending order, the correlation between X and Y is exactly -1; this value characterizes an inversely proportional relationship.

The sign of the correlation coefficient is very important for interpreting the relationship. If the sign is plus, a larger value of one feature (variable) corresponds to a larger value of the other; in other words, as one indicator increases, the other increases accordingly. This dependence is called directly proportional.

If the sign is minus, a larger value of one characteristic corresponds to a smaller value of the other; in other words, an increase in one variable corresponds to a decrease in the other. This dependence is called inversely proportional. The choice of which variable is taken to increase is arbitrary: it can be either X or Y. However, if X is considered to increase, Y will correspondingly decrease, and vice versa.
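The two limiting cases can be reproduced with the Spearman formula. This minimal sketch uses a hypothetical helper `spearman_no_ties` that ranks values by sorting, so it assumes there are no repeated values; the data are made up.

```python
def spearman_no_ties(x, y):
    """Spearman coefficient via 1 - 6 * sum(D^2) / (n * (n^2 - 1)); no ties assumed."""
    n = len(x)
    rank_x = {v: i for i, v in enumerate(sorted(x), start=1)}
    rank_y = {v: i for i, v in enumerate(sorted(y), start=1)}
    d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [3, 8, 1, 9, 4]  # hypothetical values, no repeats
print(spearman_no_ties(x, x))                                  # 1.0
print(spearman_no_ties(sorted(x), sorted(x, reverse=True)))    # -1.0
```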

Let's look at an example of Spearman correlation.

A psychologist is studying how individual indicators of school readiness, obtained from 11 first-graders before the start of school, are related to each other and to their average academic performance at the end of the school year.

To solve this problem, we ranked, first, the school-readiness indicators obtained on admission to school and, second, the average final academic performance of the same students at the end of the year. The results are presented in the table:

We substitute the obtained data into the above formula and perform the calculation. We get:

To determine the significance level, we refer to the table “Critical values of the Spearman rank correlation coefficient,” which gives the critical values of the rank correlation coefficient.

We construct the corresponding “axis of significance”:

The resulting correlation coefficient coincides with the critical value for the 1% significance level. It can therefore be argued that the school-readiness indicators and the final grades of the first-graders are positively correlated; in other words, the higher the school-readiness indicator, the better the first-grader tends to study. In terms of statistical hypotheses, the psychologist must reject the null hypothesis (H0) that the correlation is zero and accept the alternative hypothesis (H1) that the relationship between the school-readiness indicators and average academic performance differs from zero.


Correlation analysis is a method for detecting dependencies among a number of random variables. Its purpose is to estimate the strength of the relationships between such random variables, or features, that characterize certain real processes.

Today we propose to consider how Spearman correlation analysis is used in practical trading to display the forms of relationship visually.

Spearman correlation, or the basics of correlation analysis

In order to understand what correlation analysis is, you first need to understand the concept of correlation.

At the same time, if the price starts moving in the desired direction, you must unlock your positions in time.


For this strategy, which is based on correlation analysis, trading instruments with a high degree of correlation are best suited (EUR/USD and GBP/USD, EUR/AUD and EUR/NZD, AUD/USD and NZD/USD, CFD contracts and the like).



