What a positive correlation shows: the concept of correlation in simple words


Canonical correlation

A generalization of pairwise correlation used to determine the relationship between two groups of traits. Canonical analysis, i.e., the method of finding canonical correlations, is based on constructing linear combinations of the characteristics of each group such that the usual pairwise correlation coefficient between these combinations reaches its greatest value. This maximum coefficient is called the first canonical correlation coefficient, and the corresponding linear combinations of the two groups of characteristics are called the first canonical variables. See: Kendall M.G., Stuart A. Multivariate statistical analysis and time series. M., 1976; Vold G. Path models with latent variables // Mathematics in sociology: modeling and information processing. M., 1977; Bolch B., Huan K.J. Multivariate statistical methods for economics. M., 1979; Dubrovsky S.A. Applied Multivariate Statistical Analysis. 1982; Lipovetsky S.S. Some models of Gunner's analysis as extremals of quadratic and bilinear forms // Comprehensive application of mathematical methods in sociological research. M., 1983; Van den Wollenberg A.L. Redundancy: An alternative for canonical correlation analysis // Psychometrika. 1977. Vol. 42, No. 2. S.S. Lipovetsky, L.G. Badalyan.

Date of publication: 09/03/2017 13:01

The term “correlation” is actively used in the humanities and medicine, and it often appears in the media. Correlations play a key role in psychology. In particular, calculating correlations is an important stage of empirical research when writing a thesis in psychology.

Materials on correlations on the Internet are too scientific. It is difficult for a non-specialist to understand the formulas. At the same time, understanding the meaning of correlations is necessary for a marketer, sociologist, physician, psychologist - anyone who conducts research on people.

In this article, we will explain in simple language the essence of correlation, types of correlations, methods of calculation, features of the use of correlation in psychological research, as well as when writing dissertations in psychology.


What is correlation

Correlation is connection. But not just any one. What is its peculiarity? Let's look at an example.

Imagine that you are driving a car. You press the gas pedal and the car goes faster. You ease off the pedal and the car slows down. Even a person not familiar with the structure of a car will say: “There is a direct connection between the gas pedal and the speed of the car: the harder the pedal is pressed, the higher the speed.”

This is a functional relationship - speed is a direct function of the gas pedal. The specialist will explain that the pedal controls the supply of fuel to the cylinders, where the mixture is burned, which leads to an increase in power to the shaft, etc. This connection is rigid, deterministic, and does not allow exceptions (provided that the machine is working properly).

Now imagine that you are the director of a company whose employees sell products. You decide to increase sales by raising employee salaries. You raise salaries by 10%, and average sales across the company increase. After a while, you raise them by another 10%, and again there is growth. Then by another 5%, and again there is an effect. The conclusion suggests itself: there is a direct relationship between the company's sales and the salaries of its employees - the higher the salaries, the higher the organization's sales. Is this the same connection as between the gas pedal and the speed of the car? What is the key difference?

That's right, the relationship between salary and sales is not strict. This means that some of the employees’ sales could even decrease, despite the salary increase. Some will remain unchanged. But on average, sales for the company have increased, and we say that there is a connection between sales and employee salaries, and it is correlational.

The functional connection (gas pedal - speed) is based on a physical law. The basis of the correlation relationship (sales - salary) is the simple consistency of changes in two indicators. There is no law (in the physical sense of the word) behind correlation. There is only a probabilistic (stochastic) pattern.

Numerical expression of the correlation dependence

So, the correlation relationship reflects the dependence between phenomena. If these phenomena can be measured, then it receives a numerical expression.

For example, the role of reading in people's lives is being studied. The researchers took a group of 40 people and measured two indicators for each subject: 1) how much time the person spends reading per week; 2) to what extent the person considers themselves prosperous (on a scale from 1 to 10). The scientists entered this data into two columns and used a statistical program to calculate the correlation between reading and well-being. Let's say they got the following result: -0.76. But what does this number mean? How should it be interpreted? Let's figure it out.

The resulting number is called the correlation coefficient. To interpret it correctly, it is important to consider the following:

  1. The “+” or “-” sign reflects the direction of the dependence.
  2. The value of the coefficient reflects the strength of the dependence.

Direct and reverse

The plus sign in front of the coefficient indicates that the relationship between phenomena or indicators is direct. That is, the greater one indicator, the greater the other. Higher salary means higher sales. This correlation is called direct, or positive.

If the coefficient has a minus sign, it means the correlation is inverse, or negative. In this case, the higher one indicator, the lower the other. In the reading and well-being example, we found -0.76, which means that the more people read, the lower their level of well-being.

Strong and weak

A correlation in numerical terms is a number in the range from -1 to +1, denoted by the letter "r". The higher the number (ignoring the sign), the stronger the correlation.

The lower the absolute value of the coefficient, the weaker the relationship between the phenomena or indicators.

The maximum possible strength of dependence is 1 or -1. How can this be understood and pictured?

Let's look at an example. Researchers took 10 students and measured their intelligence level (IQ) and their academic performance for the semester, then arranged the data in two columns.

Subject | IQ | Academic performance (points)

Look carefully at the data in the table. From subject 1 to subject 10, IQ increases, and so does academic performance. Of any two students, the one with the higher IQ performs better, and there are no exceptions to this rule.

Here is an example of a complete, 100% consistent change in two indicators in a group. And this is an example of the greatest possible positive relationship. That is, the correlation between intelligence and academic performance is equal to 1.

Let's look at another example. The same 10 students were assessed using a survey to what extent they feel successful in communicating with the opposite sex (on a scale from 1 to 10).

Subject | IQ | Success in communicating with the opposite sex (points)

Let's look carefully at the data in the table. From subject 1 to subject 10, IQ increases. At the same time, in the last column the level of success in communicating with the opposite sex consistently decreases. Of any two students, the one with the lower IQ is more successful in communicating with the opposite sex, and there are no exceptions to this rule.

This is an example of complete consistency in changes in two indicators in a group - the maximum possible negative relationship. The correlation between IQ and success in communicating with the opposite sex is -1.

How can we understand the meaning of a correlation equal to zero (0)? It means there is no connection between the indicators. Let's return to our students once again and consider another indicator measured for them: the length of their standing jump.

Subject | IQ | Standing jump length (m)

There is no consistency observed between person-to-person variation in IQ and jump length. This indicates the absence of correlation. The correlation coefficient between IQ and standing jump length among students is 0.
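The three edge cases can be reproduced in a few lines of Python. This is only a sketch: the numbers are invented for illustration and are not the data from the study described above, and `pearson` is a helper written for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical values for five students (illustration only).
iq = [95, 100, 105, 110, 115]
grades = [3.0, 3.5, 4.0, 4.5, 5.0]   # rises in step with IQ
social = [9, 8, 7, 6, 5]             # falls in step with IQ
jump = [2.0, 1.0, 3.0, 1.0, 2.0]     # no consistent pattern

r_grades = pearson(iq, grades)   # maximum positive relationship
r_social = pearson(iq, social)   # maximum negative relationship
r_jump = pearson(iq, jump)       # no linear relationship
print(round(r_grades, 3), round(r_social, 3), round(r_jump, 3))
```

The first two series change in perfect step with IQ, so the coefficient reaches the ends of the range; the third was chosen so that deviations above and below the mean cancel out.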

We've looked at edge cases. In real measurements, the coefficients are rarely exactly equal to 1 or 0. The following scale is adopted:

  • more than 0.70 - the relationship between the indicators is strong;
  • from 0.30 to 0.70 - the relationship is moderate;
  • less than 0.30 - the relationship is weak.
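As a rough sketch, the sign rule and the scale above can be combined into one small function. The name `describe` and the returned labels are choices made for this example; the thresholds are the ones listed above, applied to the absolute value of the coefficient.

```python
def describe(r):
    """Return (direction, strength) for a correlation coefficient r.

    Thresholds follow the rough scale in the text: above 0.70 strong,
    from 0.30 to 0.70 moderate, below 0.30 weak (sign ignored).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and +1")

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "zero"

    size = abs(r)
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    return direction, strength

reading_vs_wellbeing = describe(-0.76)
print(reading_vs_wellbeing)
```

For the reading and well-being coefficient of -0.76, the function reports a negative direction and a strong relationship.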

If we evaluate the correlation between reading and well-being obtained above (-0.76) on this scale, the relationship turns out to be strong and negative. That is, there is a strong negative relationship between being well-read and well-being, which once again confirms the biblical wisdom about the link between wisdom and sorrow.

The given gradation gives very rough estimates and is rarely used in research in this form.

Gradations of coefficients according to significance levels are more often used. In this case, the actual coefficient obtained may or may not be significant. This can be determined by comparing its value with the critical value of the correlation coefficient taken from a special table. These critical values depend on the size of the sample (the larger the sample, the lower the critical value).

Correlation analysis in psychology

The correlation method is one of the main ones in psychological research. This is no coincidence, because psychology strives to be an exact science. Does it succeed?

What are the peculiarities of laws in the exact sciences? For example, the law of gravity in physics operates without exception: the greater the mass of a body, the stronger it attracts other bodies. This physical law reflects the relationship between body mass and gravity.

In psychology the situation is different. For example, psychologists publish data on the connection between warm relationships with parents in childhood and the level of creativity in adulthood. Does this mean that every subject who had a very warm relationship with their parents in childhood will have very high creative abilities? The answer is clear: no. There is no law like the physical one. There is no known mechanism for the influence of childhood experience on adult creativity; any such mechanism is our own speculation. The data are consistent (relationships - creativity), but there is no law behind them. There is only a correlation. Psychologists often call the identified relationships psychological patterns, emphasizing that they are probabilistic rather than rigid.

The student study example from the previous section illustrates well the use of correlations in psychology:

  1. Analysis of the relationship between psychological indicators. In our example, IQ and success in communicating with the opposite sex are psychological parameters. Identifying the correlation between them expands the understanding of the mental organization of a person, the relationships between various aspects of his personality - in this case, between the intellect and the sphere of communication.
  2. Analysis of the relationship between IQ and academic performance, and between IQ and jumping, is an example of the connection between a psychological parameter and non-psychological ones. The results obtained reveal the features of the influence of intelligence on educational and sports activities.

Here's what a summary of this made-up student study might look like:

  1. A significant positive relationship between students' intelligence and their academic performance was revealed.
  2. There is a negative significant relationship between IQ and success in communicating with the opposite sex.
  3. There was no connection between students' IQ and their ability to jump.

Thus, the level of intelligence of students acts as a positive factor in their academic performance, while at the same time negatively affecting relationships with the opposite sex and not having a significant impact on sports success, in particular, the ability to jump.

As we see, intelligence helps students learn, but hinders them from building relationships with the opposite sex. However, it does not affect their sporting success.

The ambiguous influence of intelligence on the personality and activity of students reflects the complexity of this phenomenon in the structure of personal characteristics and the importance of continuing research in this direction. In particular, it seems important to analyze the relationship between intelligence and the psychological characteristics and activities of students, taking into account their gender.

Pearson and Spearman coefficients

Let's consider two calculation methods.

The Pearson coefficient is a method for calculating the relationship between the numerical values of two indicators measured in one group. Very simply, it boils down to the following:

  1. The values of two parameters in a group of subjects are taken (for example, aggression and perfectionism).
  2. The average value of each parameter in the group is found.
  3. The differences between each subject's values and the average values are found.
  4. These differences are substituted into a special formula to calculate the Pearson coefficient.
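The four steps above can be sketched in Python as follows. The scores are hypothetical, and the formula used in step 4 is the standard one for the Pearson coefficient (the sum of products of deviations divided by the square root of the product of the sums of squared deviations).

```python
from math import sqrt

# Step 1: values of two parameters in a group (hypothetical scores).
aggression = [12, 15, 9, 20, 17]
perfectionism = [30, 34, 28, 41, 35]

# Step 2: average value of each parameter.
mean_a = sum(aggression) / len(aggression)
mean_p = sum(perfectionism) / len(perfectionism)

# Step 3: deviation of each subject's value from the average.
dev_a = [value - mean_a for value in aggression]
dev_p = [value - mean_p for value in perfectionism]

# Step 4: substitute the deviations into the Pearson formula.
sum_of_products = sum(a * p for a, p in zip(dev_a, dev_p))
sum_sq_a = sum(a * a for a in dev_a)
sum_sq_p = sum(p * p for p in dev_p)
r = sum_of_products / sqrt(sum_sq_a * sum_sq_p)

print(round(r, 3))
```

Keeping the deviations in separate lists mirrors the manual calculation; a statistical package would normally do all four steps in one call.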

Spearman's rank correlation coefficient is calculated in a similar way:

  1. The values of two indicators in the group of subjects are taken.
  2. The rank of each value in the group is found, that is, its place in the list in ascending order.
  3. The rank differences are found, squared and summed.
  4. Next, the sum of squared rank differences is substituted into a special formula to calculate the Spearman coefficient.
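A minimal sketch of these steps, assuming there are no tied values (ties need averaged ranks and a corrected formula, which this example does not handle). The scores are hypothetical; the formula in step 4 is the classic rank-difference form, 1 - 6·Σd² / (n·(n² - 1)).

```python
def ranks(values):
    """Place of each value in ascending order (1 = smallest); no ties allowed."""
    ordered = sorted(values)
    if len(set(ordered)) != len(ordered):
        raise ValueError("this sketch does not handle tied values")
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman rank correlation via squared rank differences."""
    n = len(x)
    # Steps 2 and 3: rank both series, then square and sum the rank differences.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    # Step 4: substitute the sum into the formula.
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

# Step 1: values of two indicators (hypothetical scores).
anxiety = [41, 35, 50, 38, 46]
aggression = [18, 12, 25, 20, 19]

rho = spearman(anxiety, aggression)
print(round(rho, 3))
```

Only the order of the values enters the calculation, which is why the raw magnitudes never appear after the ranking step.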

In Pearson's case, the calculation was carried out using the average value. Consequently, random outliers in the data (significant differences from the average), for example due to processing errors or unreliable responses, can significantly distort the result.

In Spearman's case, the absolute values of the data do not play a role, since only their relative positions in relation to each other (ranks) are taken into account. That is, data outliers or other inaccuracies will not have a serious impact on the final result.
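The difference in sensitivity can be seen on a small made-up data set. In this sketch one value of an otherwise perfectly linear series is replaced by an extreme one; both helper functions are written for the example, and the Spearman version again assumes no tied values.

```python
from math import sqrt

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Rank-difference formula; assumes no tied values."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

x = list(range(1, 11))            # 1, 2, ..., 10
y_clean = [2 * v for v in x]      # perfectly linear in x
y_outlier = y_clean[:-1] + [200]  # last value replaced by an outlier

print("Pearson: ", round(pearson(x, y_clean), 3), round(pearson(x, y_outlier), 3))
print("Spearman:", round(spearman(x, y_clean), 3), round(spearman(x, y_outlier), 3))
```

The outlier keeps its place in the ordering, so the ranks and the Spearman coefficient are unchanged, while the Pearson coefficient drops noticeably.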

If the test results are reliable and free of outliers, the differences between the Pearson and Spearman coefficients are insignificant, and the Pearson coefficient gives a more accurate value of the relationship between the data.

How to calculate the correlation coefficient

Pearson and Spearman coefficients can be calculated manually. This may be necessary for in-depth study of statistical methods.

However, in most cases, when solving applied problems, including in psychology, it is possible to carry out calculations using special programs.

Calculation using Microsoft Excel spreadsheets

Let's return again to the example with students and consider data on their level of intelligence and the length of their standing jump. Let's enter this data (two columns) into an Excel table.

Moving the cursor to an empty cell, click the “Insert Function” option and select “CORREL” from the “Statistical” section.

The format of this function involves the selection of two data arrays: CORREL(array1; array2). We select the IQ column and the jump-length column, respectively.

Excel spreadsheets have a formula for calculating only the Pearson coefficient.

Calculation using STATISTICA program

We enter data on intelligence and jump length into the initial data field. Next, select the option “Nonparametric tests”, “Spearman”. We select the parameters for calculation and get the following result.


As you can see, the calculation gave a result of 0.024, which differs from the Pearson result of 0.038 obtained above using Excel. However, the difference is minor.

Using correlation analysis in psychology dissertations (example)

Most topics of final qualifying papers in psychology (diplomas, coursework, master's) involve conducting correlation research (the rest are related to identifying differences in psychological indicators in different groups).

The term “correlation” itself is rarely heard in the names of topics - it is hidden behind the following formulations:

  • “The relationship between the subjective feeling of loneliness and self-actualization in women of mature age”;
  • “Features of the influence of managers’ resilience on the success of their interaction with clients in conflict situations”;
  • “Personal factors of stress resistance of employees of the Ministry of Emergency Situations.”

Thus, the words “relationship”, “influence” and “factors” are sure signs that the method of data analysis in an empirical study should be correlation analysis.

Let us briefly consider the stages of its implementation when writing a thesis in psychology on the topic: “The relationship between personal anxiety and aggressiveness in adolescents.”

1. The calculation requires raw data, usually the subjects' test results. They are entered into a summary table, which is placed in the appendix. This table is organized as follows:

  • each line contains data for one subject;
  • each column contains indicators on one scale for all subjects.

Subject No. | Personality anxiety | Aggressiveness
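If the calculation is done in code rather than in a statistical program, the same layout (one row per subject, one column per scale) can be turned into the two series that a correlation function expects. The scores below are hypothetical and only illustrate the reshaping.

```python
# Hypothetical raw data: (subject number, personality anxiety, aggressiveness).
rows = [
    (1, 41, 18),
    (2, 35, 12),
    (3, 50, 25),
    (4, 38, 20),
    (5, 46, 19),
]

# zip(*rows) transposes the table: rows per subject become columns per scale.
subject_ids, anxiety, aggressiveness = (list(column) for column in zip(*rows))

print(anxiety)
print(aggressiveness)
```

Each resulting list keeps the subjects in the same order, so the values at the same position belong to the same person.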

2. Decide which of the two coefficients, Pearson or Spearman, will be used. Recall that Pearson gives a more accurate result but is sensitive to outliers in the data. The Spearman coefficient can be used with any data (except data on a nominal scale), which is why it is most often used in psychology theses.

3. Enter the table of raw data into the statistical program.

4. Calculate the coefficient.



5. The next step is to determine whether the relationship is significant. The statistical program highlighted the results in red, which means the correlation is statistically significant at the 0.05 significance level (stated above).

However, it is useful to know how to determine significance manually. To do this, you will need a table of Spearman's critical values.

Table of critical values of Spearman coefficients

Level of statistical significance

Number of subjects | p=0.05 | p=0.01 | p=0.001
… | 0.88 | 0.96 | 0.99
… | 0.81 | 0.92 | 0.97
… | 0.75 | 0.88 | 0.95
… | 0.71 | 0.83 | 0.93
… | 0.67 | … | …
10 | 0.63 | 0.77 | 0.87
… | … | 0.74 | 0.85
… | 0.58 | 0.71 | 0.82
… | 0.55 | 0.68 | …
… | 0.53 | 0.66 | 0.78
… | 0.51 | 0.64 | 0.76

We are interested in a significance level of 0.05 and our sample size is 10 people. At the intersection of these data we find the Spearman critical value: Rcr=0.63.

The rule is this: if the resulting empirical Spearman value is greater than or equal to the critical value, then it is statistically significant. In our case: Remp (0.66) > Rcr (0.63), therefore, the relationship between aggressiveness and anxiety in the group of adolescents is statistically significant.
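The decision rule is simple enough to write as a one-line check. In this sketch the function name is hypothetical, and comparing the absolute value (so that negative coefficients are handled too) is an assumption added here; the rule in the text is stated for a positive coefficient.

```python
def is_significant(r_empirical, r_critical):
    """True if the empirical coefficient reaches the critical value.

    The absolute value is compared so that negative correlations are
    treated the same way as positive ones (an assumption of this sketch).
    """
    return abs(r_empirical) >= r_critical

# Values from the example: Remp = 0.66, Rcr = 0.63 for 10 subjects at p = 0.05.
result = is_significant(0.66, 0.63)
print(result)
```

The critical value itself still has to be looked up in the table for the chosen significance level and sample size.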

6. In the text of the thesis, insert the data as a table in Word format rather than as a table copied from the statistical program. Below the table, describe the result obtained and interpret it.

Table 1

Spearman coefficients of aggression and anxiety in a group of adolescents

 | Aggressiveness
Personality anxiety | 0.665*

* statistically significant (p ≤ 0.05)

Analysis of the data presented in Table 1 shows that there is a statistically significant positive relationship between aggression and anxiety in adolescents. This means that the higher the personal anxiety of adolescents, the higher the level of their aggressiveness. This result suggests that aggression is, for adolescents, one of the ways to relieve anxiety. Experiencing self-doubt and anxiety because of threats to self-esteem, which is especially sensitive in adolescence, a teenager often resorts to aggressive behavior, reducing anxiety in this unproductive way.

7. Is it possible to talk about influence when interpreting connections? Can we say that anxiety affects aggressiveness? Strictly speaking, no. We showed above that the correlation between phenomena is probabilistic in nature and reflects only the consistency of changes in characteristics in the group. We cannot say that this consistency arises because one of the phenomena is the cause of the other and influences it. That is, the presence of a correlation between psychological parameters does not give grounds to speak of a cause-and-effect relationship between them. However, practice shows that the term “influence” is often used when analyzing the results of correlation analysis.

06.06.2018 Igor


Everything in the world is interconnected. Each person, at the level of intuition, tries to find relationships between phenomena in order to be able to influence and control them. The concept that reflects this relationship is called correlation. What does it mean in simple words?


Concept of correlation

Correlation (from the Latin “correlatio” - ratio, relationship) is a mathematical term that means a measure of statistical, probabilistic dependence between random quantities (variables).



Example: Let's take two types of relationships:

  1. First: a pen in a person’s hand. Whichever direction the hand moves, the pen follows. If the hand is at rest, the pen does not write. If the person presses a little harder, the mark on the paper is darker. This type of relationship reflects a strict dependence and is not correlational. It is functional.
  2. Second: the relationship between a person’s level of education and the amount of literature they read. It is not known in advance which people read more: those with or without higher education. This connection is random, or stochastic; it is studied by statistics, which deals exclusively with mass phenomena. If a statistical calculation demonstrates a correlation between level of education and reading, this makes it possible to make forecasts and predict the probable occurrence of events. In this example, it can be argued with a high degree of probability that people with higher education read more books. But since the connection between these parameters is not functional, we may be mistaken. The probability of such an error, which should be small, can always be calculated and is called the level of statistical significance (p).

Examples of relationships between natural phenomena are: the food chain in nature, the human body, which consists of organ systems that are interconnected and function as a single whole.

Every day we encounter correlations in everyday life: between the weather and a good mood, the correct formulation of goals and their achievement, a positive attitude and luck, a feeling of happiness and financial well-being. But we are looking for connections, relying not on mathematical calculations, but on myths, intuition, superstitions, and idle speculation. These phenomena are very difficult to translate into mathematical language, express in numbers, and measure. It’s another matter when we analyze phenomena that can be calculated and presented in the form of numbers. In this case, we can define correlation using the correlation coefficient (r), which reflects the strength, degree, closeness and direction of the correlation between random variables.

A strong correlation between random variables is evidence of some statistical connection specifically between these phenomena, but this connection cannot be carried over to the same phenomena in a different situation. Often researchers, having obtained a significant correlation between two variables, are misled by the simplicity of correlation analysis into false intuitive assumptions about cause-and-effect relationships between the characteristics, forgetting that the correlation coefficient is probabilistic in nature.

Example: the number of people injured in icy conditions and the number of road accidents involving motor vehicles. These quantities will correlate with each other, although neither causes the other; both are linked only to the common cause of these random events, black ice. Conversely, if the analysis does not reveal a correlation between phenomena, this is not yet evidence that there is no dependence between them: the dependence may be complex and nonlinear and thus not revealed by correlation calculations.
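The last point can be illustrated with a small sketch. Here y is completely determined by x (y = x squared), yet the Pearson coefficient is zero because the dependence is not linear. The data and the `pearson` helper are made up for this illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]   # strict functional dependence, but not a linear one

r = pearson(x, y)
print(r)
```

Because x is symmetric around zero, positive and negative deviations cancel in the numerator, and the coefficient reports no linear relationship at all.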




The first to introduce the concept of correlation into scientific use was the French paleontologist Georges Cuvier. In the 18th century, he derived the law of correlation of parts and organs of living organisms, which made it possible to reconstruct the appearance of an entire fossil animal from the body parts (remains) found. In statistics, the term correlation was first used in 1886 by the English scientist Francis Galton. He was unable to derive an exact formula for calculating the correlation coefficient, but his student, the famous mathematician and biologist Karl Pearson, did.

Types of correlation

By significance - highly significant, significant and insignificant:

  • highly significant: r corresponds to the level of statistical significance p ≤ 0.01;
  • significant: r corresponds to p ≤ 0.05;
  • insignificant: r does not reach statistical significance (p > 0.1).

Negative (a decrease in the value of one variable goes with an increase in the level of another: the more phobias a person has, the less likely he is to occupy a leadership position) and positive (an increase in one variable goes with an increase in the level of another: the more nervous you are, the more likely you are to get sick). If there is no connection between the variables, the correlation is called zero.

Linear (when one value increases or decreases, the second also increases or decreases) and nonlinear (when one value changes, the nature of the change in the second cannot be described by a linear relationship, so other mathematical laws are applied, such as polynomial or hyperbolic relationships).

By strength.





Depending on which scale the variables under study belong to, different types of correlation coefficients are calculated:

  1. The Pearson correlation coefficient (pairwise linear correlation coefficient, or product-moment correlation) is calculated for variables measured on interval and quantitative scales.
  2. The Spearman or Kendall rank correlation coefficient is used when at least one of the quantities is measured on an ordinal scale or is not normally distributed.
  3. The point-biserial correlation coefficient (Fechner sign correlation coefficient) is used if one of the two quantities is dichotomous.
  4. The four-field correlation coefficient (multiple rank correlation (concordance) coefficient) is used if both variables are dichotomous.

The Pearson coefficient refers to parametric correlation indicators, all others are non-parametric.

The correlation coefficient value ranges from -1 to +1. With a complete positive correlation, r = +1, with a complete negative correlation, r = -1.

Formula and calculation





Examples

It is necessary to determine the relationship between two variables among schoolchildren: the level of intellectual development (according to test results) and the number of late arrivals per month (according to entries in the class register).

The initial data is presented in the table:

IQ data (x) | Data on the number of late arrivals (y)

Sum: 1122

Arithmetic mean: 112.2


To interpret the obtained indicator correctly, it is necessary to analyze the sign of the correlation coefficient (+ or -) and its absolute value (modulus).

In accordance with the table classifying the correlation coefficient by strength, we conclude that rxy = -0.827 is a strong negative correlation. Thus, the number of times schoolchildren are late depends strongly on their level of intellectual development. It can be said that students with a high IQ are late for classes less often than students with a low IQ.



The correlation coefficient can be used both by scientists to confirm or refute the assumption of the dependence of two quantities or phenomena and measure its strength and significance, and by students to conduct empirical and statistical research in various subjects. It must be remembered that this indicator is not an ideal tool; it is calculated only to measure the strength of a linear relationship and will always be a probabilistic value that has a certain error.

Correlation analysis is used in the following areas:

  • economic science;
  • astrophysics;
  • social sciences (sociology, psychology, pedagogy);
  • agrochemistry;
  • metallurgy;
  • industry (for quality control);
  • hydrobiology;
  • biometrics, etc.

Reasons for the popularity of the correlation analysis method:

  1. Correlation coefficients are relatively simple to calculate; no special mathematical education is required.
  2. The method makes it possible to calculate relationships between mass random variables, which are the subject of analysis in statistical science. For this reason it has become widespread in statistical research.

I hope that now you will be able to distinguish a functional relationship from a correlational relationship and will know that when you hear on television or read in the press about correlation, it means a positive and fairly significant interdependence between two phenomena.

Scientific concepts are always popular. The verb “correlate” is widely used by journalists and politicians, sometimes inappropriately. Typically, the term “correlation” refers to any relationship.

People have long noticed that all the phenomena occurring on our planet influence each other to some extent. The connections between them cannot always be easily detected, but, nevertheless, they exist. When talking about the interdependence of events, the word “correlation” is often used. It is most often used by economists and analysts.

Let's figure out what this concept actually means.

Correlation: Definition

Perhaps the first in the scientific world to talk about correlation was paleontologist Georges Cuvier. At the turn of the 18th and 19th centuries, he made a number of discoveries in the field of comparative anatomy. As a result of these discoveries, Cuvier formulated the law of the relationship of parts, according to which changes in the structure of one of the organs of an animal lead to changes in the structure of other organs. Based on this knowledge, Cuvier learned to restore the appearance of fossil animals from individual surviving fragments.

As for statistics, the concept of correlation was established in this science later - at the end of the 19th century, thanks to the English biologist Francis Galton.

Correlation is not just a connection (relation), but rather a relationship or interdependence (co-relation).

The formula for obtaining the correlation coefficient was derived by Galton's student, mathematician and biologist K. Pearson.

Correlation coefficient

Correlation is a statistical relationship between quantities that are not bound to each other by a strict functional dependence. It is assumed that when the value of one of the parameters changes, the value of the other changes systematically as well. If a change in one quantity alters only some other statistical characteristic of the second, the relationship is considered statistical but not correlational.

The correlation coefficient is used to express the degree of interdependence. The range of coefficient values is from -1 to +1.

  • If the correlation is absolute and positive (+1), then when the price of one security rises, the price of the other rises in proportion.
  • If the correlation is absolute and negative (-1), then when the value of one security rises, the value of the other falls to the same extent.
  • If the correlation coefficient is zero, then there is no interdependence between the movements of securities: they are random.

The higher the absolute value of the coefficient, the greater the interdependence. If the absolute value is greater than 0.5, then the relationship is clearly expressed.

It is necessary to clarify that absolute correlation of securities exists only in an ideal world. In reality, stocks are only somewhat correlated.

Pairwise correlation

This term is used to denote the relationship between two specific quantities. It is known that advertising expenditures in the United States significantly influence the volume of the country's GDP. The correlation coefficient between these values, based on 20 years of observations, is 0.9699.

A more “down-to-earth” example is the relationship between traffic to an online store’s page and its sales volume.

And, of course, hardly anyone would deny the existence of a relationship between air temperature and sales of beer or ice cream.

Correlation is the interdependence of two quantities; the correlation coefficient is an objective indicator that determines the degree of this interdependence. The correlation coefficient can be either positive or negative. As for securities, they are extremely rarely completely correlated.

A correlation between two quantities is a statistical relationship in which a change in one of the quantities leads to a systematic change in the other. A quantitative measure of correlation is the linear correlation coefficient (also called the Pearson correlation coefficient), calculated by the formula:

$$ r_{xy} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} $$

where:

  • r xy – correlation coefficient of the values of quantities x and y;
  • d x – deviation of a given value of the series x from the average value of this series;
  • d y – deviation of a given value of the series y from the average value of this series.
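The calculation from the deviations d x and d y can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name `pearson_r` and the temperature and sales figures are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # d_x and d_y: deviations of each value from the mean of its own series
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Numerator: sum of products of deviations (can be positive or negative)
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: built from squared deviations, so it is never negative
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator

# Made-up illustrative data, not real observations
temperature = [18, 21, 24, 27, 30]
ice_cream_sales = [120, 135, 160, 170, 200]
print(round(pearson_r(temperature, ice_cream_sales), 3))
```

The check for a zero denominator matters in practice: if one of the series does not change at all, the coefficient is simply undefined.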

The range of possible values for the correlation coefficient is from -1 to +1. The following options are possible:

  • +1 – perfect direct relationship between quantities;
  • |r xy| > 0.7 – pronounced dependence between values;
  • 0.4 < |r xy| < 0.7 – moderately pronounced dependence between values;
  • |r xy| < 0.4 – weakly pronounced dependence between values;
  • -1 – perfect inverse relationship between quantities.
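The scale above can be turned into a small helper that labels a coefficient. This is only a sketch: the function name is hypothetical, and the handling of the exact boundary values 0.4 and 0.7 (assigned here to "moderate") is an assumption, since the list does not say where they belong.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the verbal scale from the list above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no linear relationship"
    strength = abs(r)
    if strength == 1.0:
        label = "perfect"
    elif strength > 0.7:
        label = "pronounced"
    elif strength >= 0.4:  # assumption: the boundaries 0.4 and 0.7 count as moderate
        label = "moderate"
    else:
        label = "weak"
    direction = "direct" if r > 0 else "inverse"
    return f"{label} {direction} relationship"

for value in (0.9699, 0.43, -0.5, 0.1):
    print(value, "->", describe_correlation(value))
```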

It is important to note that the larger the sample of values, the smaller the absolute value of the correlation coefficient at which we can still speak of a relationship between x and y.

Unfortunately, the formula contains a trap that, when applied to financial instruments, can play a cruel joke on the investor. In the numerator, the deviations can have the same or different signs, so their product can be either positive or negative. In the denominator, the deviations are squared, which guarantees a positive denominator. For now we will just take note of this, and later we will return to what can come of it.
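The remark about sample size can be made concrete with a rough estimate of how large the coefficient must be before it is distinguishable from zero. The sketch below uses the Fisher z-transformation approximation and a two-sided 5% level (z = 1.96); both are assumptions brought in for illustration and are not part of the article's method.

```python
import math

def approx_critical_r(n, z=1.96):
    """Approximate smallest |r| distinguishable from zero for n observations.

    Relies on the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). z=1.96 corresponds to a two-sided 5% level.
    """
    if n <= 3:
        raise ValueError("the approximation needs more than 3 observations")
    return math.tanh(z / math.sqrt(n - 3))

for n in (10, 30, 100, 1000):
    print(n, round(approx_critical_r(n), 3))
```

Under these assumptions the threshold shrinks as the sample grows: with a few dozen observations only fairly large coefficients stand out, while with a thousand observations even a small coefficient can indicate a real, if weak, relationship.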

The practical purpose of calculating the correlation between financial instruments is to obtain important fundamental data needed for making trading decisions. Markets react to the release of important economic news as follows: first the prices of major assets (gold, oil, futures on industrial indices), and sometimes yields, begin to move, and as a result exchange rates and stock prices change. By tracking the relationships between individual instruments, as well as the cause-and-effect relationships between price changes, you can quickly revise your trading and investment plans. In addition, correlation analysis is a mandatory part of management.

You can visualize the correlation of two quantities as a graph in time-amplitude coordinates. For example, with a negative correlation we get a similar picture:

Knowing asset correlation reduces portfolio risks


Suppose, for example, there are two assets. For simplicity, let's imagine that their prices follow a sinusoid over time. Then, with a correlation of +1, the waves overlap completely, and opening positions in both assets is equivalent to doubling the position in one of them. A correlation of -1, on the contrary, means that the assets' gains and losses cancel each other out. Of course, well-chosen assets generally do not fluctuate around the same level, but tend to rise over time. In addition, when some assets fall, growth in others allows us to minimize the total risk of the portfolio:

A process called portfolio rebalancing allows you to generate income by changing the proportions of assets in your portfolio. This is most easily achieved with a pronounced negative correlation. Let's assume that the portfolio initially contains assets A and B with an inverse correlation in a 1:1 ratio, for a total of 1 million rubles. Within six months, asset A falls in price by 20%, and its value drops from the original 500 thousand rubles to 400 thousand rubles. Asset B, on the contrary, rises by 20%, and its value grows to 600 thousand rubles. The total value of the portfolio has not changed and remains 1 million rubles. Now we transfer 50% of asset B (300 thousand) to A, so that asset A is now worth 700 thousand and asset B 300 thousand.

Over the next six months, the opposite process occurs: the assets return to their original prices. For asset A this means a rise of 25% (from 80% back to 100% of its initial price), and for asset B a fall of about 16.7% (from 120% back to 100%). Now asset A is worth 875 thousand instead of 700 thousand, and asset B is worth 250 thousand instead of 300 thousand. The total value of the portfolio thus amounts to 1 million 125 thousand rubles, i.e. its return due to rebalancing is 12.5% per annum. Without rebalancing, the portfolio's return would be 0%. Real situations are much more complicated, because most instruments' correlations are between +0.5 and -0.5. If we consider the risk-return graph for different ratios of two instruments with different correlation values, we get the following picture:
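The result of the rebalancing example depends on what exactly happens in the second half of the year, so it is worth replaying the arithmetic. The sketch below does this for two readings: prices returning exactly to their starting levels, and asset A simply gaining 20% while asset B loses 20%. The function name and parameters are hypothetical; amounts are in thousands of rubles.

```python
def rebalance_example(change_a_h2, change_b_h2, start=500.0):
    """Replay the two-asset example; amounts are in thousands of rubles.

    change_a_h2 / change_b_h2 are second-half price changes as fractions.
    Returns (value with rebalancing, value without rebalancing).
    """
    # First half-year: A falls by 20%, B rises by 20%
    a, b = start * 0.8, start * 1.2              # 400 and 600
    # Rebalancing: half of B is moved into A
    moved = b / 2
    a_reb, b_reb = a + moved, b - moved          # 700 and 300
    with_rebalancing = a_reb * (1 + change_a_h2) + b_reb * (1 + change_b_h2)
    without_rebalancing = a * (1 + change_a_h2) + b * (1 + change_b_h2)
    return with_rebalancing, without_rebalancing

# Reading 1: prices return exactly to their starting levels
# (0.8 -> 1.0 is +25%, 1.2 -> 1.0 is about -16.7%)
back_to_start = rebalance_example(1 / 0.8 - 1, 1 / 1.2 - 1)
# Reading 2: A gains 20% and B loses 20%
plus_minus_20 = rebalance_example(0.20, -0.20)

print([round(v, 2) for v in back_to_start])
print([round(v, 2) for v in plus_minus_20])
```

In the first reading the rebalanced portfolio ends at 1,125 thousand against 1,000 thousand without rebalancing; in the second it ends at 1,080 thousand against 960 thousand. In both cases rebalancing adds value, but the size of the gain differs.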

As you can see, the lower the value of the instrument correlation coefficient, the greater the possible portfolio return for the same risk value, or the lower the risk for the same return value.
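The dependence of portfolio risk on correlation can be illustrated with the standard formula for the variance of a two-asset portfolio. This is a sketch under stated assumptions: risk is measured as standard deviation, and the weights and volatilities are hypothetical numbers, not data from the graph discussed above.

```python
import math

def portfolio_volatility(w_a, vol_a, vol_b, rho):
    """Standard deviation of a two-asset portfolio.

    w_a is the weight of asset A (asset B gets 1 - w_a), vol_a and vol_b are
    the assets' standard deviations, rho is their correlation coefficient.
    """
    w_b = 1.0 - w_a
    variance = (
        (w_a * vol_a) ** 2
        + (w_b * vol_b) ** 2
        + 2 * w_a * w_b * rho * vol_a * vol_b
    )
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors

# Hypothetical inputs: both assets have 20% volatility, split 50/50
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(rho, round(portfolio_volatility(0.5, 0.20, 0.20, rho), 4))
```

With identical assets and equal weights, the portfolio's volatility falls from 20% at a correlation of +1 to practically zero at -1, while the expected return (a weighted average of the assets' returns) does not depend on the correlation at all.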

Correlation in Forex

A common strategy based on the correlation of currency pairs is as follows: when the correlation coefficient deviates sharply from its usual value, trades are opened in the direction of restoring that value. For example, if the EURUSD and GBPUSD pairs have moved in the same direction for a long time and then diverge strongly, one can expect them to converge again, provided the divergence is not caused by long-term factors (for example, a change in the discount rate).

In addition, the correlation of currency pairs is used in comprehensive market assessment. For example, on the eve of the mortgage crisis of 2008-2009, when the Australian and New Zealand dollars, as well as the British pound, had high key rates, a trading strategy called the carry trade became widespread. When events were favorable for stock markets, pairs of these currencies against the yen, which traditionally has a very low rate, rose especially actively, and they fell just as actively during unfavorable events.

Although no correlation holds over absolutely all time intervals and currencies can move in different directions, a pronounced unidirectional movement, as a rule, indicates the presence of a common fundamental “driver”. This makes it easier to plan deals. In particular, there is no point in looking for pullbacks and working intraday if all clearly correlated pairs are moving in the same direction.

You can view the real-time correlation table for currency pairs and some other instruments at myfxbook.com/forex-market/correlation. From this table it can be seen that the EURUSD and AUDCAD pairs practically do not correlate with each other. If you open trades on these pairs at the same time, you don’t have to worry about the summation of losses or the overlap of profits on one pair with losses on another.

This chart shows how the Australian and New Zealand dollars, which are inversely correlated with the safe-haven currencies the yen and the Swiss franc, rose strongly during the period of greatest differential in key rates. This trend reversed after a period of rate cuts began as the mortgage crisis deepened.

There are no effects without a cause

Asset price correlation is somewhat similar to trends: the longer the time interval for its calculation, the slower it changes. But there is also something that distinguishes correlation from many other methods. It can be calculated for pairs of assets that are not traded on any exchange (oil-gas, oil-gold), which allows you to supplement the analyst’s arsenal with valuable information that allows you to “read the market between charts.”

A stable correlation of two or more quantities usually has a cause-and-effect relationship behind it. One of the quantities is the determining one, and the other (or others) depend on it. Correlation in the stock market is no exception. For example, in the oil-gas pair, oil quotes were the determining factor for a long time. In the graph below you can see that the widening of the spread between oil and gas due to the sharp relative growth of gas was followed by an equally sharp return to relative equilibrium:

At the same time, in another pair of assets, gold-oil, the determining factor is gold. When the spread widens significantly (a sharp rise or fall in oil while gold is more stable), it is oil that restores the disturbed balance:

By monitoring this behavior of “following” assets, you can open trades in the direction of restoring the balance. By the way, correlation is often based on the link between certain currencies and commodity assets. Such currencies are called “commodity currencies”. For example, the Canadian dollar and the ruble are highly dependent on oil. In both cases, the correlation is direct: the more expensive the oil, the higher the exchange rate of these currencies against the US dollar.

In the case of the ruble, the correlation of the charts is so clear that it can be used in a trading strategy. Let's look at the beginning of 2014. Oil is trading around $110 per barrel and then rises slightly higher for a while. At this time the ruble, on the contrary, briefly weakens from 33 to 36 per US dollar. At some point the correlation becomes almost inverse, but the balance is quickly restored and the ruble returns to 33 per dollar, obediently following oil. We see an even more striking example at the end of 2014, when the ruble weakened sharply against the backdrop of a much more gradual decline in the oil price. This time, too, the disturbed balance was soon restored thanks to the strengthening of the ruble.

Over time, a correlation can change strongly and even turn from direct to inverse. This was especially evident in the correlation between the Dow Jones Industrial Average and RTS indices.

At the end of 2007, when the first signs of the mortgage crisis in the United States began to appear, the DJ index turned downward, but the RTS index, thanks to the active growth of oil prices, was still approaching a historical maximum. However, later the sharp collapse of all stock indices in the world also affected oil. This led to the fact that the RTS index fell almost twice as fast as the DJ. In addition to oil, the rate of decline of the RTS index was also affected by the general outflow of capital from emerging markets.

However, the crisis was short-lived and already at the beginning of 2009 it gave way to economic growth. A high correlation between DJ and RTS was observed until April 2012, which marked the exhaustion of the raw-material model of development of the Russian economy. From that year on, even expensive oil no longer ensured economic growth. Subsequently, the economic recession in Russia only deepened against the backdrop of falling oil prices, while the American economy received an additional stimulus for growth. The correlation between DJ and RTS became inverse.

The mere presence of a correlation between assets does not mean that you can build a trading or investment strategy on it. Let's say we are interested in which securities correlated with IBM shares over the last 12 months (see impactopia.com/correlation). There, in 4th place in terms of correlation, is Banco Santander (about 0.43). Most likely, this is just a random coincidence or a systemic flaw in the method of calculating correlations itself.

Math trap

As I mentioned above, the formula for calculating the correlation coefficient is very sensitive to the signs of the deviations of the values from their averages. If these deviations more often have the same sign, a high value of the correlation coefficient is obtained. But will this value make sense? The answer is not at all obvious. Let's look at a practical example. Let's assume that a price gap appears on the charts of two quantities at the same time:

Then the new values of these quantities will systematically lie on one side of their average values, which results in a high positive correlation. Unfortunately, this information is of no use, because apart from the presence of the gap, the charts have nothing in common. This is a clear example of why only stationary series of values, i.e. series without a trend component, should be used when calculating correlation. It means that calculating correlations in the world of financial assets inevitably leads to an overestimation of the significance of factors that are actually random in nature. Don't get me wrong: the point is not to look for these factors and introduce special corrections for them, but to show the very essence of the phenomenon and not to look for the next Grail where it does not exist.

However, it's not all bad. There is a way to bypass the influence of trends: calculate the correlation not of the prices themselves, but of their increments. Then the gap mentioned above turns out to be a statistical outlier that has virtually no effect on the result. It remains only to wait for this approach to become widespread.

It is not always possible to find up-to-date asset correlation data. In such cases, the correlation can be calculated in Microsoft Excel. To do this, the quotes are entered as two ranges of cells, and a function of the following form is written in a free cell: =CORREL(array1; array2), where an array might look like this, for example: A1:A100. Depending on regional settings, the argument separator may be a comma instead of a semicolon. For calculating correlation based on price increments, Excel is doubly useful, because the increments themselves must first be calculated from the closing prices.

Summary

Correlation between asset prices is an important tool for both data analysis and risk management in portfolio investments. But, like all statistical approaches, it is not without serious drawbacks:

  • the presence of a pronounced correlation between data in the past cannot guarantee it in the future;
  • the mathematical model used has large errors during trend periods.

The correlation approach brings maximum benefit when it is combined with other methods of analysis and money management. In the comments, I propose to discuss how you can make money using the correlation of specific assets. I gave my examples in the article, and I look forward to yours for discussion.

Profit to everyone!


