Constructing a correlation field from the data in the table. Correlation and regression analysis in Excel: execution instructions

Graphically, the relationship between two characteristics is depicted using the correlation field. In the coordinate system, the values of the factor characteristic are plotted on the abscissa axis, and the values of the resultant characteristic are plotted on the ordinate axis. Each observation is marked by a dot at the intersection of its two coordinate values. In the absence of close connections, the points are arranged randomly on the graph (Fig. 11.1).


Let us depict the resulting dependence graphically with points on the coordinate plane (Fig. 3.1). Such an image of a statistical dependence is called a correlation field.  

Construct a correlation field and formulate a hypothesis about the form of the connection.  

When studying the relationship between two characteristics, the graphical method for selecting the type of regression equation is quite clear. It is based on the correlation field. The main types of curves used in the quantitative assessment of connections are presented in Fig. 2.1.  

Since not all points of the correlation field lie on the regression line, there is always scatter, due both to the influence of the factor x (the regression of y on x) and to other causes (unexplained variation). The suitability of a regression line for prediction depends on how much of the total variation in the trait y is accounted for by the explained variation. If the sum of squared deviations due to regression is much greater than the residual sum of squares, then the regression equation is statistically significant and the factor x has a significant impact on the result. Equivalently, the coefficient of determination r² approaches unity.
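The split of the total variation into explained and residual parts can be illustrated with a minimal Python sketch. The data values and the function name `variance_decomposition` are hypothetical, chosen only for illustration, and the line is fitted by ordinary least squares.

```python
def variance_decomposition(x, y):
    """Fit y = a + b*x by least squares and split the variation of y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    fitted = [a + b * xi for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y)                 # total variation
    explained = sum((fi - mean_y) ** 2 for fi in fitted)        # due to regression
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # unexplained
    return {"total": total, "explained": explained,
            "residual": residual, "r2": explained / total}

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
parts = variance_decomposition(x, y)
print(parts)
```

For a least-squares line the explained and residual sums add up to the total, so r² is simply the explained share of the total variation.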

Accordingly, for the dependence shown in the correlation fields of Fig. 3.5 b) and c), the heteroscedasticity of the residuals is presented in Fig. 3.9 and 3.10.  

If the quantities are independent, then the “correlation field” or pa-  

If the correlation field can be approximated by a straight line, called a regression line, proceed to calculate the pair correlation coefficient r. Its values lie in the interval [-1, 1]. If r equals 1 or -1, there is a direct or inverse functional relationship, respectively. When r is close to zero, there is no linear connection between the phenomena, and when the absolute value of r is 0.7 or more, the connection is considered significant. The correlation coefficient is calculated using the formula
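The formula itself did not survive in this excerpt. One widely used computational form of the pair (Pearson) correlation coefficient is given below as an assumption about what was intended; the bars denote sample means and the sigmas are standard deviations computed with division by n.

```latex
r_{xy} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \, \sigma_y},
\qquad
\sigma_x = \sqrt{\overline{x^2} - \bar{x}^{\,2}},
\qquad
\sigma_y = \sqrt{\overline{y^2} - \bar{y}^{\,2}}
```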

After identifying the above-mentioned groups of railway farms, another approximate method of preliminary analysis of the homogeneity of the population for each group of railway farms was used - constructing correlation fields for each of the factors included in the study with the cost of transportation. The main sign of homogeneity or heterogeneity of the selected populations was the absence or presence of breaks and jumps in the location of points in the correlation fields.  

For study, all possible factors were pre-selected through professional logical analysis, data on changes in which for enterprises are available in the ministry’s reports. Such factors should be considered the total volume of transportation, the average productivity of cars and locomotives of the working fleet, freight intensity, capital intensity of a transportation unit and labor productivity, etc. (11 factors in total). Thus, 44 correlation fields were constructed for four groups of enterprises.  

After determining the indicated quantities, an equation of pair dependence is obtained, the graphical representation of which in coordinate axes is called a theoretical regression line. If we plot all the measurements on such a field, and not just the theoretical regression line, then we get a correlation field.  

We systematize the source material in the correlation field and in the correlation table. In our example, the factor is the cost of machines Cm, and the function is the average annual number of workers R.  

As a result of the breakdown into intervals, the entire plane on which measurements are plotted for both characteristics x and y, called the correlation field, is divided into cells, and each measurement is characterized not by the exact values of its coordinates, but only by the interval to which it is assigned.

Fig. 16 shows a correlation field on which the x-axis shows the intervals for the values of the argument Cm, and the y-axis shows the intervals for the values of the function R. A correlation field constructed in this way is called secondary.

A primary correlation field can also be constructed to select intervals. All points on this field are marked taking into account the values ​​of their coordinates. The intervals are outlined based on the density of the points.  

Along with constructing the correlation field, as indicated above, a correlation table is compiled in which all calculations related to determining averages, constructing an empirical regression line and initial data for determining parameters in a system of normal equations are made.  

In table 36, all the material is distributed into intervals. Using it, we build a secondary correlation field on which we plot all the values of the variables, and determine the average values of the function over the intervals. By connecting the average values in each interval with straight line segments, we obtain an empirical regression line (see Fig. 16).
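The interval-averaging procedure can be sketched in Python as follows. The data, the interval boundaries and the function name `empirical_regression_line` are hypothetical and are not taken from table 36; the sketch only shows how the points of the broken line are obtained.

```python
def empirical_regression_line(x, y, edges):
    """Return (interval midpoint, mean of y) for every non-empty interval of x."""
    points = []
    for left, right in zip(edges[:-1], edges[1:]):
        is_last = right == edges[-1]   # the last interval includes its right edge
        group = [yi for xi, yi in zip(x, y)
                 if left <= xi < right or (is_last and xi == right)]
        if group:
            points.append(((left + right) / 2, sum(group) / len(group)))
    return points

# Hypothetical data: cost of machines (million rubles) and number of workers
cost = [0.2, 0.4, 0.7, 0.9, 1.3, 1.6, 2.2, 2.8]
workers = [900, 1200, 1700, 1900, 2600, 3000, 4100, 5200]
edges = [0.0, 1.0, 2.0, 3.0]          # interval boundaries on the x-axis

line = empirical_regression_line(cost, workers, edges)
print(line)
```

Connecting the returned points in order with straight segments gives the empirical regression line.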

Erecting a perpendicular to the x-axis from the center of each interval, we plot on each of them the corresponding interval average: 1081 for the first interval, 1774 for the second, etc. We connect the resulting points with straight segments. The resulting broken line is an empirical regression line for the relationship between the cost of machines Cm and the number of workers R. By analogy with the calculations performed, we can build correlation tables and correlation fields to identify the relationship between the number of workers R, the volume of work O, and the volume of prefabricated concrete and reinforced concrete structures.
Fig. 18. Correlation table and secondary correlation field between the number of workers and the volume of use of prefabricated reinforced concrete structures

The equations of paired regression and the multiple regression derived later are applicable if the variables change within the following limits: number of workers, from 850 to 7850 people; cost of machines, from 0.15 to 3.15 million rubles; volume of prefabricated structures, from 10 to 230 thousand m.

... is plotted along the vertical axis, and the independent values are plotted along the horizontal axis. The correlation field is used to determine the form of the relationship between variables. The graph gives the researcher the first.

The third premise of OLS requires that the residuals be homoscedastic. This means that for each value of the factor x_j, the residuals e_i have the same variance. If this condition for using OLS is not met, then heteroscedasticity occurs. The presence of heteroscedasticity can be clearly seen from the correlation field (Fig. 3.5).

Another typical research problem - assessing the relationship between phenomena - is solved using the correlation theory apparatus, well developed in mathematical statistics. To do this, it is necessary to have samples of compared phenomena shown on maps of different subjects (for example, D and C). The values ​​of a and b are taken at the same i points, i.e. strictly coordinated, and then plot the correlation field.  

A visual representation of a correlation table is the correlation field. It is a graph where X values ​​are plotted on the abscissa axis, Y values ​​are plotted on the ordinate axis, and combinations of X and Y are shown by dots. By the location of the dots, one can judge the presence of a connection.

Using the graphical method.

This method is used to visually depict the form of connection between the studied economic indicators. To do this, a graph is drawn in a rectangular coordinate system, the individual values ​​of the resultant characteristic Y are plotted along the ordinate axis, and the individual values ​​of the factor characteristic X are plotted along the abscissa axis.

The set of points of the resultant and factor characteristics is called the correlation field.

Based on the correlation field, we can hypothesize (for the population) that the relationship between all possible values ​​of X and Y is linear.

The linear regression equation is y = bx + a + ε

Here ε is a random error (deviation, disturbance).
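As an illustration, the parameters a and b can be estimated by ordinary least squares, and the residuals then serve as estimates of the random error ε. This is a minimal sketch with hypothetical data; the function name `fit_line` is an assumption made for the example.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the model y = b*x + a + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical observations of the factor X and the result Y
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
# Residuals: the part of each y not described by the fitted line
residuals = [yi - (b * xi + a) for xi, yi in zip(x, y)]
print(a, b, residuals)
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on the fit.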

Reasons for the existence of a random error:

1. Failure to include significant explanatory variables in the regression model;

2. Aggregation of variables. For example, the total consumption function is an attempt to express generally the aggregate of individual spending decisions. This is only an approximation of individual relations that have different parameters.

3. Incorrect description of the model structure;

4. Incorrect functional specification;

21. Correlation and regression analysis.

Correlation-regression analysis as a general concept includes measuring the closeness and direction of a connection and establishing an analytical expression (form) of the connection (regression analysis).

The purpose of regression analysis is to assess the functional dependence of the conditional average value of the resultant characteristic (Y) on the factor characteristics (x1, x2, ..., xk).

The regression equation, or statistical model of the relationship between socio-economic phenomena, is expressed by the function:

Yx = f(x1, x2, …, xn),

where “n” is the number of factors included in the model;

xi are the factors influencing the result Y.

Stages of correlation and regression analysis:

1. Preliminary (a priori) analysis. It gives good results if carried out by a sufficiently qualified researcher.

2. Collection of information and its primary processing.

3. Building a model (regression equations). As a rule, this procedure is performed on a PC using standard programs.

4. Assessing the closeness of relationships between features, estimating the regression equation and analyzing the model.

5. Forecasting the development of the analyzed system using the regression equation.

At the first stage, the research problem is formulated, the methodology for measuring indicators or collecting information is determined, the number of factors is determined, and duplicate factors or those linked into a rigidly determined system are eliminated.

At the second stage, the volume of units is analyzed: the population must be large enough in terms of the number of units and observations (N>>50), the number of factors “n” must correspond to the number of observations “N”. Data must be quantitatively and qualitatively homogeneous.

At the third stage, the form of the connection and the type of analytical function (parabola, hyperbola, straight line) are determined and its parameters are found.

At the fourth stage, the reliability of all characteristics of the correlation relationship and the regression equation is assessed using the Fisher or Student reliability criterion, and an economic and technological analysis of the parameters is performed.

At the fifth stage, the possible values ​​of the result are predicted based on the best values ​​of the factor characteristics included in the model. Here the best and worst values ​​of the factors and the result are selected.

22. Types of regression equations.

To quantitatively describe the relationships between economic variables, statistics use regression and correlation methods.

Regression is a quantity that expresses the dependence of the average value of a random variable y on the values ​​of a random variable x.

The regression equation expresses the average value of one characteristic as a function of another.

The regression function is a model of the form y = f(x), where y is the dependent variable (resultative attribute); x is an independent, or explanatory, variable (feature-factor).

Regression line - graph of the function y = f (x).

2 types of relationships between x and y:

1) it may be unknown which of the two variables is independent and which is dependent, the variables are equal, this is a correlation type relationship;

2) if x and y are unequal and one of them is considered as an explanatory (independent) variable, and the other as a dependent variable, then this is a regression type relationship.

Types of regressions:

1) hyperbolic - regression of an equilateral hyperbola: y = a + b/x + E;

2) linear - regression widely used in statistics because its parameters have a clear economic interpretation: y = a + b*x + E;

3) logarithmically linear - regression of the form: ln y = ln a + b*ln x + ln E;

4) multiple - regression between variables y and x1, x2, ..., xm, i.e. a model of the form: y = f(x1, x2, ..., xm) + E, where y is the dependent variable (resultative attribute), x1, x2, ..., xm are independent explanatory variables (features-factors), E is a disturbance or stochastic variable, including the influence of factors not accounted for in the model;

5) nonlinear - regression that is nonlinear with respect to the explanatory variables included in the analysis, but linear with respect to the estimated parameters; or regression that is nonlinear in the parameters being estimated;

6) inverse - regression reduced to linear form, implemented in standard application packages, of the form: y = 1/(a + b*x + E);

7) paired - regression between two variables y and x, i.e., a model of the form: y = f(x) + E, where y is the dependent variable (resultative attribute), x is the independent, explanatory variable (attribute-factor), E is a disturbance, or stochastic variable, including the influence of factors not accounted for in the model.
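Several of the listed forms are estimated by reducing them to a straight line through a change of variables. The sketch below shows this for the hyperbolic model (regressing y on 1/x) and the logarithmically linear model (regressing ln y on ln x). The data are generated from known parameters purely for illustration, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(u, v):
    """Least-squares estimates (a, b) for v = a + b*u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    b = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
         / sum((ui - mean_u) ** 2 for ui in u))
    return mean_v - b * mean_u, b

x = [1, 2, 4, 5, 10]

# Hyperbolic model y = a + b/x: substitute z = 1/x and fit y = a + b*z
y_hyp = [3 + 2 / xi for xi in x]              # noise-free data with a=3, b=2
a_hyp, b_hyp = fit_line([1 / xi for xi in x], y_hyp)

# Log-linear model ln y = ln a + b*ln x: fit ln y against ln x
y_pow = [1.5 * xi ** 0.8 for xi in x]         # noise-free data with a=1.5, b=0.8
ln_a, b_pow = fit_line([math.log(xi) for xi in x],
                       [math.log(yi) for yi in y_pow])
a_pow = math.exp(ln_a)                        # back-transform the intercept

print(a_hyp, b_hyp, a_pow, b_pow)
```

Because the data here contain no disturbance E, the fit recovers the generating parameters up to rounding; with real data the estimates would only approximate them.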

    Dynamic series and their types

A time series always consists of 2 elements: 1) a point in time or time period in relation to which statistical data are provided, 2) a statistical indicator called the level of the time series.

Depending on the content of the time indicator, series of dynamics can be moment or interval series.

Depending on the type of statistical indicator, time series are divided into series of absolute, relative and average values.

Series of absolute values show the actual levels of the indicator.

Series of relative values show changes in the share of the indicator in the total population.

Series of average values show the change over time of an indicator that is the average level of the phenomenon.

    Indicators of a series of dynamics. Average level of the dynamics series.

Indicators: 1) average level of the dynamic series; 2) absolute growth (chain and base) and average absolute growth; 3) growth rates and rates of increase (chain and base) and their averages; 4) absolute value of a 1% increase.

Average indicators of dynamics

These are generalized characteristics of a series of dynamics. With their help, the intensity of development of a phenomenon is compared across different objects, for example, by country, industry, or enterprise.

Average level of the series. The method for calculating the average level depends on the type of series (moment or interval) and on whether its intervals are equal or unequal. If an interval series of dynamics of absolute or average values with equal time intervals is given, the simple arithmetic mean is used to calculate the average level. If the time intervals of the interval series are unequal, the average level is found using the weighted arithmetic mean: y_avg = Σ(y_i * t_i) / Σ t_i, where t_i is the length of the interval to which level y_i refers.
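Both cases of the average level of an interval series can be written in a few lines of Python. The levels and interval lengths below are hypothetical, and the function names are assumptions made for this sketch.

```python
def average_level_equal(levels):
    """Simple arithmetic mean, for equal time intervals."""
    return sum(levels) / len(levels)

def average_level_weighted(levels, durations):
    """Weighted arithmetic mean, for unequal time intervals."""
    return sum(y * t for y, t in zip(levels, durations)) / sum(durations)

# Hypothetical interval series: levels and the number of months each one covers
levels = [100, 120, 90]
durations = [3, 1, 2]

print(average_level_equal([10, 12, 14]))          # equal intervals
print(average_level_weighted(levels, durations))  # unequal intervals
```

When all durations are equal, the weighted mean reduces to the simple mean.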

25. Absolute increase (Δy) is the difference between two levels of a dynamic series; it shows by how much a given level of the series exceeds the level taken as the basis of comparison. With a constant base (base increase): Δy = y_i - y_0

With a variable base (chain increase): Δy = y_i - y_(i-1)

Absolute acceleration is the difference between the absolute growth for a given period and the absolute growth for the previous period of the same duration: Δ' = Δy_i - Δy_(i-1). Absolute acceleration shows by how much the rate of change of an indicator has increased (decreased). The acceleration indicator is used for chain absolute increments. A negative acceleration value indicates a slowdown in growth or an acceleration in the decline of the series levels.
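The base and chain absolute increases and the absolute acceleration can be computed as in the following sketch; the series values are hypothetical.

```python
# Hypothetical levels of a dynamic series; levels[0] is the base level y_0
levels = [50, 54, 62, 61, 68]

# Base absolute increases: each level minus the base level
base_increase = [y - levels[0] for y in levels[1:]]

# Chain absolute increases: each level minus the previous level
chain_increase = [levels[i] - levels[i - 1] for i in range(1, len(levels))]

# Absolute acceleration: difference between consecutive chain increases
acceleration = [chain_increase[i] - chain_increase[i - 1]
                for i in range(1, len(chain_increase))]

print(base_increase)
print(chain_increase)
print(acceleration)
```

The chain increases always add up to the last base increase, which is a convenient check on the calculation.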

    Indicators of relative changes in the levels of a series of dynamics.

Growth coefficient (growth rate) is the ratio of two compared levels, which shows how many times a given level exceeds the level of the base period. It reflects the intensity of changes in the levels of a series of dynamics and shows how many times the level has increased compared to the base level or, in the case of a decrease, what part of the base level the compared level represents.

Formula for calculating the growth coefficient: when compared with a constant base: K_i = y_i / y_0; when compared with a variable base: K_i = y_i / y_(i-1).

The growth rate in percent is the growth coefficient expressed as a percentage:

T_r = K * 100%.

Growth rates for any time series are interval indicators, i.e. they characterize a particular period (interval) of time.

Rate of increase is the relative amount of growth, i.e. the ratio of absolute growth to the previous or base level. It shows by what percentage the level of a given period is greater (or less) than the level taken as the basis of comparison.

With a constant base, the rate of increase is the ratio of absolute growth to the base level:

T_pr = (y_i - y_0) / y_0 * 100%

Equivalently, the rate of increase is the difference between the growth rate (in percent) and 100: T_pr = T_r - 100%.
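The relative indicators can be checked numerically with a short sketch. The series is hypothetical, and the variable names are chosen only for this example.

```python
# Hypothetical levels of a dynamic series; levels[0] is the base level y_0
levels = [50, 60, 75, 60]

# Growth coefficients with a constant base and with a variable (chain) base
base_k = [y / levels[0] for y in levels[1:]]
chain_k = [levels[i] / levels[i - 1] for i in range(1, len(levels))]

# Growth rate in percent and rate of increase in percent (constant base)
growth_rate_pct = [k * 100 for k in base_k]
increase_rate_pct = [(y - levels[0]) / levels[0] * 100 for y in levels[1:]]

print(base_k, chain_k)
print(growth_rate_pct, increase_rate_pct)
```

Two relations follow from the definitions: the product of the chain coefficients equals the last base coefficient, and each rate of increase equals the corresponding growth rate minus 100.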

Systematic problem solving Lapygin Yuri Nikolaevich

7.3. Correlation field

Logic is a straitjacket of fantasy.

Helmar Nahr

Graphs are usually used to establish relationships between two variables.

If both variables change synchronously, this may mean that there are connections between them and they influence each other. An example is the dynamics of growth in the share of wages in the structure of product costs and the dynamics of labor productivity. Observations show that as the first variable increases, the second also increases.

Although it should be borne in mind that even if there is a certain degree of synchronicity in the changes in variables, this does not mean the unconditional presence of a cause-and-effect relationship between them (perhaps there is a third variable that causes such an effect).

Examples of correlation fields are shown in Fig. 7.2.

A description of the plotting is presented below.

1. Two variables are selected for analysis: one independent, the other dependent.

2. For each value of the independent variable, measure the corresponding value of the dependent variable. These two values ​​form a data pair that is plotted as a dot on the graph. Typically, you should take at least 30 points, but to create a meaningful graph, the number of points must be at least 100.

3. The value of the independent variable characterizing the expected cause is plotted along the X axis, and the value of the dependent variable characterizing the problem is plotted along the Y axis.

4. The resulting data pairs are plotted as dots on the graph and the result is analyzed. If the correlation does not appear in the diagram, then you can try to construct a graph on a logarithmic scale.
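The suggestion to try a logarithmic scale can be illustrated numerically: for data that grow exponentially, the linear correlation is noticeably weaker on the original scale than after taking logarithms of the dependent variable. The data below are hypothetical, and `pearson_r` is a helper written for this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data pairs: y grows exponentially with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 ** xi for xi in x]

r_raw = pearson_r(x, y)                          # original scale
r_log = pearson_r(x, [math.log(yi) for yi in y]) # logarithmic scale for y

print(r_raw, r_log)
```

On the logarithmic scale the points lie on a straight line, so the coefficient is essentially 1, while on the original scale the curvature lowers it.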

Theoretical part

To differentiate the direction of influence of one characteristic on another, the concepts of positive and negative connections were introduced.

If with an increase (decrease) in one attribute, the values ​​of another generally increase (decrease), then such a correlation is called direct or positive.

If, with an increase (decrease) in one characteristic, the values ​​of another generally decrease (increase), then such a correlation is called inverse or negative.

Correlation fields and their use in preliminary correlation analysis

When raising the question of the correlation between two statistical characteristics X and Y, an experiment is carried out with parallel recording of their values.

Example -
We will call the scatter zone of the points obtained in this way on the graph the correlation field. Visually analyzing the correlation field in Figure 8, you can see that it appears to be elongated along some straight line. This picture is typical of the so-called linear correlation between characteristics. In this case, it can be assumed in general that as the final take-off speed increases, the length of the jump also increases, and vice versa. That is, there is a direct (positive) relationship between the characteristics under consideration.

Along with this example, from the many other possible correlation fields, the following can be distinguished (Fig. 9-11):

Figure 9 also shows a linear relationship, but as the values of one attribute increase, the values of the other decrease, and vice versa, i.e. the relationship is inverse (negative). It can be assumed that in Figure 11 the points of the correlation field are scattered around some curved line. In this case, there is said to be a curvilinear correlation between the characteristics.

With regard to the correlation field shown in Figure 10, it cannot be said that the points are located along some straight or curved line; it has a spherical shape. In this case, they say that characteristics X and Y do not depend on each other.



In addition, the correlation field can be used to judge approximately the closeness of the correlation, if one exists. The rule is: the less the points are scattered around the imaginary average line, the closer the correlation between the characteristics under consideration.

Visual analysis of correlation fields helps to understand the essence of the correlation relationship and allows us to make assumptions about the presence, direction and closeness of the connection. But it is impossible to say for sure whether there is a connection between the signs or not, a linear connection or a curvilinear one, a close connection (reliable) or a weak one (unreliable), using this method. The most accurate method for identifying and assessing the linear relationship between characteristics is the method of determining various correlation indicators from statistical data.

3. Correlation coefficients and their properties

Often, to determine the reliability of the relationship between two characteristics (X, Y), the nonparametric (rank) Spearman correlation coefficient and the parametric Pearson correlation coefficient are used. The values of these correlation indicators are determined by the following formulas:

$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} (d_{x_i} - d_{y_i})^2}{n (n^2 - 1)} \qquad (1) $$

where d_x are the ranks of the statistical data of characteristic x; d_y are the ranks of the statistical data of characteristic y; n is the number of paired observations.

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (2) $$

where x_i are the statistical data of characteristic x, y_i are the statistical data of characteristic y, and x̄ and ȳ are their mean values.

These coefficients have the following important properties:

1. Based on correlation coefficients, one can judge only a linear correlation between characteristics. Nothing can be said about a curvilinear connection with their help.
2. The values of the correlation coefficients are dimensionless quantities that cannot be less than -1 or more than +1, i.e. -1 ≤ r_s ≤ +1 and -1 ≤ r ≤ +1.
3. If the values of the correlation coefficients are zero, i.e. r_s = 0 or r = 0, then there is no linear connection between the characteristics X and Y.
4. If the values of the correlation coefficients are negative, i.e. r_s < 0 or r < 0, then the relationship between the characteristics X and Y is inverse (negative).
5. If the values of the correlation coefficients are positive, i.e. r_s > 0 or r > 0, then the relationship between the characteristics X and Y is direct (positive).
6. If the correlation coefficients take the values +1 or -1, i.e. r_s = ±1 or r = ±1, then the relationship between the characteristics X and Y is linear (functional).
7. The reliability of the correlation between characteristics cannot be judged only by the magnitude of the correlation coefficients. This reliability also depends on the number of degrees of freedom.
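Both coefficients can be computed with a short Python sketch. The body temperature and pulse values are hypothetical illustrative numbers, not measurements from the source; the rank formula used here is the standard Spearman formula based on rank differences and assumes there are no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson (parametric) correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_r(x, y):
    """Spearman (rank) correlation via squared rank differences; no ties."""
    n = len(x)
    d2 = sum((dx - dy) ** 2 for dx, dy in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical paired observations
temperature = [36.4, 36.6, 36.9, 37.2, 37.8, 38.3]   # degrees Celsius
pulse = [64, 70, 68, 78, 88, 96]                     # beats per minute

r = pearson_r(temperature, pulse)
r_s = spearman_r(temperature, pulse)
print(r, r_s)
```

Both values fall between -1 and +1, and for these made-up data both are positive, which would indicate a direct relationship; with so few observations the reliability of such a result would still need to be checked against the number of degrees of freedom.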

Practical part.

Determine the correlation coefficient between body temperature and pulse rate and evaluate the identified relationship.

You will need

  • paired series of values of the independent and dependent variables;
  • paper, pencil;
  • computer and spreadsheet program.

Instructions

Choose two variables that you believe are related, usually ones that change over time. Note that one of the variables must be independent; it will act as the cause. The second should change with it: decrease, increase or change randomly.

Measure the value of the dependent variable for each value of the independent variable. Record your results in a table, in two rows or two columns. To detect the presence of a connection, at least 30 readings are needed, but for a more accurate result, make sure there are at least 100 points.

Construct a coordinate plane, and plot the values ​​of the dependent variable on the ordinate axis, and the independent variable on the abscissa axis. Label the axes and indicate the units of measurement for each indicator.

Mark the points of the correlation field on the graph. On the x-axis, find the first value of the independent variable, and on the y-axis, find the corresponding value of the dependent variable. Construct perpendiculars to these projections and find the first point. Mark it, circle it with a soft pencil or pen. Construct all other points in the same way.

The resulting set of points is called the correlation field. Analyze the resulting graph and draw conclusions about the presence of a strong or weak relationship, or its absence.

Pay attention to occasional deviations from the general pattern of the graph. If, on the whole, a linear or other relationship can be traced, but the whole “picture” is spoiled by one or two points that lie apart from the rest, these may be caused by random errors and can be disregarded when interpreting the graph.

If you need to build and analyze a correlation field for large amounts of data, use spreadsheet programs, such as Excel, or purchase specialized programs.

The relationship of several quantities, in which a change in one leads to changes in the others, is called correlation. It can be simple, multiple or partial. This concept is used not only in mathematics, but also in biology.

The word correlation comes from the Latin correlatio, meaning relationship. All phenomena, events and objects, as well as the quantities characterizing them, are interconnected. Correlation dependence differs from functional dependence in that it can be measured only on average, approximately: a change in the independent quantity corresponds to a change in the dependent quantity only with a certain degree of probability. The degree of dependence is expressed by the correlation coefficient.

In biology, correlation is the relationship between the structure and functions of individual parts of the body. The concept of correlation is used quite often by statisticians. In statistics, it is the relationship between statistical quantities, series and groups. A special method, the correlation method, is used to determine whether a correlation exists. It establishes whether the numbers in the compared series change in the same or in opposite directions and, when such parallelism is found, measures its degree. Internal cause-and-effect factors, however, are not found in this way. The main task of statistics as a science is to detect such dependencies for other sciences.

In form, a correlation can be linear or nonlinear, positive or negative. When, as one of the variables increases or decreases, the other changes in proportion, the relationship is linear. If, when one quantity changes, the nature of the changes in the other is nonlinear, then the correlation is nonlinear.

A correlation is considered positive when an increase in the level of one value is accompanied by an increase in the level of another. For example, an increase in a sound is accompanied by a sensation of an increase in its pitch. A correlation in which an increase in the level of one variable is accompanied by a decrease in the level of another is called negative. For example, in communities, an increased level of anxiety in an individual reduces the probability that this individual will occupy a dominant niche among its fellows. When there is no connection between variables, the correlation is called zero.

Correlation is the mutual dependence of two random variables (usually two groups of values), in which a change in one of them is accompanied by a change in the other. The correlation coefficient shows how closely the second value changes when the values of the first change, i.e. the degree of its dependence. The easiest way to calculate this value is to use the corresponding function built into the Microsoft Office Excel spreadsheet editor.

You will need

  • Microsoft Office Excel spreadsheet editor.

Instructions

Launch Excel and open a document containing the groups of data between which you want to calculate the correlation coefficient. If such a document has not yet been created, enter the data in a blank workbook, which the spreadsheet editor creates automatically when you start the program. Enter each of the groups of values whose correlation you are interested in in a separate column. These do not have to be adjacent columns; you are free to design the table in the most convenient way: add additional columns with explanations of the data, column headings, summary cells with total or average values, etc. You can even arrange the data not vertically (in columns), but horizontally (in rows). The only requirement is that the cells with the data of each group must be located sequentially one after the other, so that they form a continuous array.

Go to the cell that should contain the correlation value of the data of the two arrays, and click on the “Formulas” tab in the Excel menu. In the "Function Library" group of commands, click on the most recent icon - "More Functions". A drop-down list will open in which you should go to the “Statistical” section and select the CORREL function. As a result, the Function Wizard window will open with a form for you to fill out. The same window can be called up without the “Formulas” tab by simply clicking on the insert function icon located to the left of the formula bar.

Specify the first group of correlating data in the “Array1” field of the Function Wizard. To enter a range of cells manually, type the addresses of the first and last cells, separating them with a colon and no spaces (for example, A2:A31). Another option is simply to select the desired range with the mouse, and Excel will place the required entry in this form field on its own. The same operation must be done with the second group of data in the “Array2” field.

Click OK. The spreadsheet editor will calculate and display the correlation value in the cell with the formula. If necessary, you can save this document for future use (keyboard shortcut Ctrl + S).


