The concept of statistical reliability. The concepts of statistical significance and statistical tests

The level of significance in statistics is an important indicator that reflects the degree of confidence in the accuracy and truth of the obtained (predicted) data. The concept is widely used in various fields: from conducting sociological research to statistical testing of scientific hypotheses.

Definition

The level of statistical significance (or statistically significant result) shows how likely it is that the indicators being studied occur by chance. The overall statistical significance of a phenomenon is expressed by the p-value coefficient (p-level). In any experiment or observation, there is a possibility that the data obtained were due to sampling errors. This is especially true for sociology.

That is, a statistically significant value is one whose probability of arising by chance is extremely small, or in other words, one that is extreme. "Extreme" in this context means the degree to which a statistic deviates from the null hypothesis (the hypothesis that is tested for consistency with the obtained sample data). In scientific practice, the significance level is selected before data collection and, as a rule, is set at 0.05 (5%). For systems where precise values are critical, this figure may be 0.01 (1%) or less.

Background

The concept of a significance level was introduced by the British statistician and geneticist Ronald Fisher in 1925, when he was developing a technique for testing statistical hypotheses. When analyzing any process, every outcome has some probability of occurring, and difficulties arise when working with small (or non-obvious) probabilities that fall within the range of "measurement error."

When working with statistical data that is not specific enough to test, scientists faced the problem of the null hypothesis, which "prevents" operating with small quantities. Fisher proposed fixing the probability of events for such systems at 5% (0.05), a convenient cutoff that allows one to reject the null hypothesis in calculations.

Introduction of fixed odds

In 1933, Jerzy Neyman and Egon Pearson recommended in their works that a certain level of significance be established in advance (before data collection). The use of these rules is clearly visible during elections. Say there are two candidates, one of whom is very popular and the other little known. It is obvious that the first candidate will win the election, and the chances of the second tend to zero. They tend toward zero, but are not equal to it: there is always the possibility of force majeure, sensational information, or unexpected decisions that can change the predicted election results.

Neyman and Pearson agreed that Fisher's significance level of 0.05 (denoted by α) was the most appropriate. However, Fisher himself opposed fixing this value in 1956. He believed that the level of α should be set according to the specific circumstances. For example, in particle physics it is 0.01.

p-level value

The term p-value was first used by Brownlee in 1960. The p-level (p-value) is an indicator inversely related to the reliability of the results: a higher p-value corresponds to a lower level of confidence in the relationship between variables found in the sample.

This value reflects the likelihood of error in interpreting the results. Suppose p-level = 0.05 (1/20). That means there is a five percent probability that the relationship between variables found in the sample is just a random feature of the sample. In other words, if no such relationship exists in the population, then in repeated similar experiments, on average, one study in twenty would show the same or a greater relationship between the variables. The p-level is often viewed as a "margin" for the error rate.

Note that the p-value does not by itself reflect the real relationship between variables; it only characterizes the data within the stated assumptions. In particular, the final analysis also depends on the chosen threshold: at a significance level of 0.05 some results will count as significant, while at 0.01 the set of significant results will be different.

Testing statistical hypotheses

The level of statistical significance is especially important when testing hypotheses. For example, in a two-sided test, the rejection region is split equally between the two tails of the sampling distribution (around the null value), and the truth of the resulting data is assessed against both tails.

Suppose, when monitoring a certain process (phenomenon), it turns out that new statistical information indicates small changes relative to previous values. At the same time, the discrepancies in the results are small, not obvious, but important for the study. The specialist is faced with a dilemma: are changes really occurring or are these sampling errors (measurement inaccuracy)?

In this case, the null hypothesis is either retained or rejected (everything is attributed to error, or the change in the system is recognized as a fait accompli). The decision rests on the relationship between the overall statistical significance (p-value) and the significance level (α). If p-level < α, the null hypothesis is rejected; the smaller the p-value, the more significant the test statistic.
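To make the decision rule concrete, here is a minimal sketch in Python (an illustrative addition: SciPy is assumed to be available, and the data are made up):

```python
# A minimal sketch of the p-value vs. alpha decision rule using a
# two-sample t-test; the groups below are hypothetical measurements.
from scipy import stats

alpha = 0.05  # significance level, chosen before collecting the data
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.8, 5.5, 5.9, 5.3]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```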

Values used

The level of significance depends on the material being analyzed. In practice, the following fixed values are used:

  • α = 0.1 (or 10%);
  • α = 0.05 (or 5%);
  • α = 0.01 (or 1%);
  • α = 0.001 (or 0.1%).

The more precision the calculations require, the lower the α that is used. Naturally, statistical forecasts in physics, chemistry, pharmaceuticals, and genetics require greater accuracy than in political science and sociology.

Significance thresholds in specific areas

In high-precision fields such as particle physics and manufacturing, statistical significance is often expressed in standard deviations (denoted by the coefficient σ) of a normal probability distribution (Gaussian distribution). σ is a statistical indicator of the dispersion of a quantity's values around its mathematical expectation, and it is used in plotting the probabilities of events.

Depending on the field of knowledge, the required σ varies greatly. For example, in testing for the existence of the Higgs boson, σ was set at five (σ = 5), which corresponds to a p-value of about 1/3.5 million. In genome studies, a significance level of 5 × 10⁻⁸ is common for the field.
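The correspondence between σ thresholds and p-values can be computed from the normal distribution; the sketch below (an illustrative addition, assuming SciPy) reproduces the figures mentioned above:

```python
# Converting sigma thresholds into tail probabilities of the normal
# distribution; norm.sf(x) is the upper-tail area beyond x.
from scipy.stats import norm

for sigma in (1.96, 3, 5):
    print(f"{sigma} sigma -> two-sided p = {2 * norm.sf(sigma):.2e}")

# One-sided 5-sigma tail, the convention used in particle physics:
print(f"5 sigma (one-sided) -> p = {norm.sf(5):.2e}  (about 1 in 3.5 million)")
```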

Efficiency

It must be kept in mind that the coefficients α and p-value are not exact characteristics. Whatever the significance level of the phenomenon under study, it is not an unconditional basis for accepting a hypothesis. For example, the smaller the value of α, the greater the confidence that an established effect is real; at the same time, the stricter threshold increases the risk of missing a real effect, which reduces the statistical power of the study.

Researchers who focus solely on statistically significant results may reach erroneous conclusions, and their work is difficult to double-check, since it relies on assumptions (which, in fact, is what the α and p-values are). Therefore, along with calculating statistical significance, it is always recommended to determine another indicator, the effect size. Effect size is a quantitative measure of the strength of an effect.

PAID FEATURE. The statistical significance feature is only available on select plans. Check whether your plan includes it.

You can find out whether there are statistically significant differences in the responses received from different groups of respondents to questions in a survey. To use the statistical significance feature in SurveyMonkey, you must:

  • Enable the statistical significance feature when adding a comparison rule to a question in your survey. Select groups of respondents to compare to sort survey results into groups for visual comparison.
  • Examine the data tables for your survey questions to identify any statistically significant differences in the responses received from different groups of respondents.

View statistical significance

By following the steps below, you can create a survey that displays statistical significance.

1. Add closed-ended questions to your survey

In order to show statistical significance when analyzing results, you will need to apply a comparison rule to any question in your survey.

You can apply the comparison rule and calculate statistical significance in responses if you use one of the following types of questions in your survey design:

Make sure that the proposed answer options can be divided into complete groups. The answer options you select for comparison when you create a comparison rule will be used to organize the data into crosstabs throughout the survey.

2. Collect answers

Once you've completed your survey, create a collector to send it out. There are several ways to do this.

You must receive at least 30 responses for each response option you plan to use in your comparison rule to activate and view statistical significance.

Survey example

You want to find out whether men are significantly more satisfied with your products than women.

  1. Add two multiple choice questions to your survey:
    What is your gender? (male, female)
    Are you satisfied or dissatisfied with our product? (satisfied, dissatisfied)
  2. Make sure that at least 30 respondents select “male” for the gender question AND at least 30 respondents select “female” as their gender.
  3. Add a comparison rule to the question "What is your gender?" and select both answer options as your groups.
  4. Use the data table below the question chart "Are you satisfied or dissatisfied with our product?" to see if any response options show a statistically significant difference.

What is a statistically significant difference?

A statistically significant difference means that statistical analysis has determined that there are significant differences between the responses of one group of respondents and the responses of another group. Statistical significance means that the numbers obtained are significantly different. Such knowledge will greatly help you in data analysis. However, you determine the importance of the results obtained. It is you who decide how to interpret the survey results and what actions should be taken based on them.

For example, you receive more complaints from female customers than from male customers. How can we determine whether such a difference is real and whether action needs to be taken regarding it? One great way to test your observations is to conduct a survey that will show you whether male customers are significantly more satisfied with your product. Using a statistical formula, our statistical significance function will give you the ability to determine whether your product is actually much more appealing to men than to women. This will allow you to take action based on facts rather than guesswork.

Statistically significant difference

If your results are highlighted in the data table, it means that the two groups of respondents are significantly different from each other. The term “significant” does not mean that the resulting numbers have any particular importance or significance, only that there is a statistical difference between them.

No statistically significant difference

If your results are not highlighted in the corresponding data table, this means that although there may be a difference in the two figures being compared, there is no statistical difference between them.

Responses without statistically significant differences demonstrate that there is no significant difference between the two items being compared given the sample size you use, but this does not necessarily mean that they are not significant. Perhaps by increasing the sample size, you will be able to identify a statistically significant difference.

Sample size

If you have a very small sample size, only very large differences between the two groups will be significant. If you have a very large sample size, even small differences will register as significant.

However, if two numbers are statistically different, this does not mean that the difference between the results has any practical meaning to you. You will have to decide for yourself which differences are meaningful for your survey.

Calculating Statistical Significance

We calculate statistical significance using a standard 95% confidence level. If an answer option is shown as statistically significant, it means there is less than a 5% probability that the difference between the two groups occurred by chance alone or due to sampling error (often written as p < 0.05).

To calculate statistically significant differences between groups, we use formulas built from the following parameters:

  • a1 - the percentage of participants from the first group who answered the question in a certain way, multiplied by the sample size of that group;
  • b1 - the percentage of participants from the second group who answered the question in a certain way, multiplied by the sample size of that group;
  • Pooled sample proportion (p) - the combination of the two proportions from both groups;
  • Standard error (SE) - an indicator of how far the sample proportion is from the actual proportion. A lower value means the sample proportion is close to the actual proportion; a higher value means it differs from it considerably;
  • Test statistic (t) - the number of standard deviations by which a given value differs from the mean;
  • Statistical significance - if the absolute value of the test statistic is more than 1.96* standard deviations from the mean, the difference is considered statistically significant.

*1.96 is the value used for the 95% confidence level, because 95% of the area under a normal distribution lies within 1.96 standard deviations of the mean.
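The formula images themselves did not survive in this text. A standard two-proportion z-test that matches the parameter descriptions above looks like this (a reconstruction, not necessarily the exact implementation used by SurveyMonkey):

```latex
% Pooled proportion, standard error, and test statistic for comparing
% two sample proportions (reconstructed from the parameter list above).
p  = \frac{a_1 + b_1}{n_1 + n_2}, \qquad
SE = \sqrt{p\,(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad
t  = \frac{p_1 - p_2}{SE}
```

Here a1 = p1 × n1 and b1 = p2 × n2 are the numbers of respondents in each group who chose the answer, n1 and n2 are the group sizes, and p1 and p2 are the corresponding proportions.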

Calculation example

Continuing with the example used above, let's find out whether the percentage of men who say they are satisfied with your product is significantly higher than the percentage of women.

Let's say 1,000 men and 1,000 women took part in your survey, and the result of the survey was that 70% of men and 65% of women say that they are satisfied with your product. Is the 70% level significantly higher than the 65% level?

Substitute the following data from the survey into the given formulas:

  • p1 (% of men satisfied with the product) = 0.7
  • p2 (% of women satisfied with the product) = 0.65
  • n1 (number of men surveyed) = 1000
  • n2 (number of women surveyed) = 1000

Since the absolute value of the test statistic is greater than 1.96, it means that the difference between men and women is significant. Compared to women, men are more likely to be satisfied with your product.
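The arithmetic behind that conclusion can be checked with a short Python sketch (the numbers come from the example above):

```python
# Two-proportion z-test for the survey example: 70% of 1000 men vs.
# 65% of 1000 women reporting satisfaction.
import math

p1, p2 = 0.70, 0.65   # satisfied proportions: men, women
n1, n2 = 1000, 1000   # sample sizes

p = (p1 * n1 + p2 * n2) / (n1 + n2)              # pooled proportion = 0.675
se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error ~ 0.0209
t = (p1 - p2) / se                               # test statistic ~ 2.39

print(f"pooled p = {p:.3f}, SE = {se:.4f}, t = {t:.2f}")
print("significant at 95%" if abs(t) > 1.96 else "not significant at 95%")
```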

Hiding statistical significance

How to hide statistical significance for all questions

  1. Click the down arrow to the right of the comparison rule in the left sidebar.
  2. Select Edit rule.
  3. Turn off the Show statistical significance toggle.
  4. Click Apply.

To hide statistical significance for one question, you need to:

  1. Click the Customize button above the chart for that question.
  2. Open the Display Options tab.
  3. Uncheck the box next to Statistical significance.
  4. Click Save.

This display option is enabled automatically when the statistical significance display is turned on. If you clear this display option, the statistical significance display will be turned off as well.

Turn on the statistical significance feature when adding a comparison rule to a question in your survey. Examine the data tables for your survey questions to determine if there are statistically significant differences in the responses received from different groups of respondents.

What do you think makes your "other half" special and meaningful? Is it their personality, or the feelings you have for this person? Or perhaps the simple fact that the hypothesis that your crush is random has, as studies show, a probability of less than 5%? If that last statement were considered reliable, successful dating sites could not exist in principle.

When you conduct split testing or any other analysis of your website, misunderstanding “statistical significance” can lead to misinterpretation of the results and, therefore, incorrect actions in the conversion optimization process. This is true for the thousands of other statistical tests performed every day in every existing industry.

To understand what “statistical significance” is, you need to dive into the history of the term, learn its true meaning, and understand how this “new” old understanding will help you correctly interpret the results of your research.

A little history

Although humanity has been using statistics to solve various problems for many centuries, the modern understanding of statistical significance, hypothesis testing, randomization, and even Design of Experiments (DOE) began to take shape only at the beginning of the 20th century and is inextricably linked with the name of Sir Ronald Fisher (1890-1962).

Ronald Fisher was an evolutionary biologist and statistician who had a special passion for the study of evolution and natural selection in the animal and plant kingdoms. During his illustrious career, he developed and popularized many useful statistical tools that we still use today.

Fisher used the techniques he developed to explain processes in biology such as dominance, mutations and genetic deviations. We can use the same tools today to optimize and improve the content of web resources. The fact that these analysis tools can be used to work with objects that did not even exist at the time of their creation seems quite surprising. It is equally surprising that people used to perform complex calculations without calculators or computers.

To describe the results of a statistical experiment as having a high probability of being true, Fisher used the word “significance.”

One of Fisher's most curious contributions is the "sexy son" hypothesis. According to this theory, women prefer promiscuous men, because sons born to such men will tend to have the same predisposition and produce more offspring (note that this is just a theory).

But no one, even brilliant scientists, is immune from making mistakes. Fisher's flaws still plague specialists to this day. But remember the words of Albert Einstein: “Whoever has never made a mistake has never created anything new.”

Before moving on, remember: statistical significance means that the difference in test results is so large that it cannot plausibly be explained by random factors.

What is your hypothesis?

To understand what “statistical significance” means, you first need to understand what “hypothesis testing” is, since the two terms are closely intertwined.
A hypothesis is just a theory. Once you have developed a theory, you will need to establish a process for collecting enough evidence and actually collecting that evidence. There are two types of hypotheses.

Apples or oranges - which is better?

Null hypothesis

As a rule, this is where many people run into difficulties. Keep in mind that a null hypothesis is not something you set out to prove, the way you would prove that a certain change on a website will lead to an increase in conversions; it is the reverse. The null hypothesis is the theory that if you make any changes to the site, nothing will happen. The goal of the researcher is to refute this theory, not to prove it.

If we look at the experience of solving crimes, where investigators also form hypotheses as to who the criminal is, the null hypothesis takes the form of the so-called presumption of innocence, the concept according to which the accused is presumed innocent until proven guilty in a court of law.

If the null hypothesis is that two objects are equal in their properties, and you are trying to prove that one is better (for example, A is better than B), you need to reject the null hypothesis in favor of the alternative. For example, you are comparing one or another conversion optimization tool. In the null hypothesis, they both have the same effect (or no effect) on the target. In the alternative, the effect of one of them is better.

Your alternative hypothesis may contain a numerical value, such as B - A > 20%. In this case, the null hypothesis and the alternative can take the following form:
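The formulas shown at this point in the original were part of an image that was not preserved; for the example B - A > 20%, the pair of hypotheses would plausibly read:

```latex
% Null and alternative hypotheses for the example B - A > 20%
% (a reconstruction of the missing figure).
H_0\colon\; B - A \le 20\% \qquad\qquad H_1\colon\; B - A > 20\%
```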

Another name for an alternative hypothesis is a research hypothesis because the researcher is always interested in proving this particular hypothesis.

Statistical significance and p value

Let's return again to Ronald Fisher and his concept of statistical significance.

Now that you have a null hypothesis and an alternative, how can you prove one and disprove the other?

Since statistics, by their very nature, involve the study of a specific sample drawn from a population, you can never be 100% sure of the results obtained. A good example: election results often differ from the results of preliminary polls and even exit polls.

Dr. Fisher wanted to create a dividing line that would let you know whether your experiment was a success or not. This is how the significance threshold appeared: the level we use to decide what counts as "significant" and what does not. If p, the significance index, is 0.05 or less, the results are considered reliable.

Don't worry, it's actually not as confusing as it seems.

Gaussian probability distribution. At the edges are the less probable values of the variable; in the center, the most probable. The p-value (green shaded area) is the probability of the observed outcome occurring by chance.

The normal probability distribution (Gaussian distribution) is a representation of all possible values of a certain variable on a graph (as in the figure above) together with their frequencies. If you do your research correctly and then plot all your answers on a graph, you will get exactly this distribution. According to the normal distribution, you will receive a large percentage of similar answers, and the remaining options will lie at the edges of the graph (the so-called "tails"). This distribution of values is often found in nature, which is why it is called "normal."

Using an equation based on your sample and test results, you can calculate what is called a “test statistic,” which will indicate how much your results deviate. It will also tell you how close you are to the null hypothesis being true.

To help you get your head around it, use online calculators to calculate statistical significance:

One example of such calculators

The letter p stands for the probability of obtaining the observed result if the null hypothesis is true. A small value indicates a difference between the test groups, whereas the null hypothesis holds that they are the same. Graphically, a small p-value means your test statistic lands closer to one of the tails of the bell-shaped distribution.

Dr. Fisher decided to set the significance threshold at p ≤ 0.05. However, this statement is controversial, since it leads to two difficulties:

1. First, the fact that you have shown the null hypothesis to be false does not mean that you have proven the alternative hypothesis. Significance alone does not prove either A or B.

2. Second, a p-value of 0.049 still leaves a 4.9% chance of obtaining such a result when the null hypothesis is true. Your conclusion could therefore be right, but it could also be wrong.

You can choose not to rely on the p-value at all, but then you will need to assess the probability of the null hypothesis on a case-by-case basis and decide whether it is large enough to keep you from making the changes you planned and tested.

The most common scenario for conducting a statistical test today is to set a significance threshold of p ≤ 0.05 before running the test itself. Just be sure to look closely at the p-value when checking your results.

Type 1 and Type 2 errors

So much time has passed that errors that can occur when using the statistical significance metric have even been given their own names.

Type 1 Errors

As mentioned above, a significance threshold of 0.05 leaves a 5% chance that the null hypothesis is actually true even though you rejected it. Rejecting a true null hypothesis is a Type 1 error: the results say your new website increased your conversion rates, but there is a 5% chance that it didn't.

Type 2 Errors

This error is the opposite of a Type 1 error: you accept the null hypothesis when it is false. For example, the test results tell you that the changes made to the site did not bring any improvement when in fact they did. As a result, you miss an opportunity to improve your performance.

This error is common in tests with an insufficient sample size, so remember: the larger the sample, the more reliable the result.
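This is exactly what power calculations are for. As an illustrative addition (assuming the statsmodels library), the sketch below estimates the sample size per group needed to detect a 70% vs. 65% difference in proportions with 80% power at α = 0.05:

```python
# Sample size needed to detect a 5-point difference in proportions
# (0.70 vs. 0.65) with alpha = 0.05 and 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.70, 0.65)  # Cohen's h for the two proportions
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative='two-sided'
)
print(f"required sample size per group: {n_per_group:.0f}")
```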

Conclusion

Perhaps no term is as popular among researchers as statistical significance. Whether or not test results turn out to be statistically significant can have consequences ranging from a change in conversion rates to the collapse of a company.

And since marketers use this term when optimizing their resources, you need to know what it really means. Test conditions may vary, but sample size and success criteria are always important. Remember this.

Statistical significance, or the p-level of significance, is the main result of testing a statistical hypothesis. Technically, it is the probability of obtaining a given result in a sample study, provided that the null statistical hypothesis is in fact true for the population, that is, there is no relationship. In other words, it is the probability that the detected relationship is random and is not a property of the population. It is statistical significance, the p-level, that gives a quantitative assessment of the reliability of a relationship: the lower this probability, the more reliable the relationship.

Suppose that comparing two sample means yielded a statistical significance level of p = 0.05. This means that testing the statistical hypothesis of equal means in the population showed that, if it is true, the probability of the detected differences arising at random is no more than 5%. In other words, if two samples were repeatedly drawn from the same population, then in 1 case out of 20 the same or a greater difference between the means of these samples would be found. That is, there is a 5% chance that the differences found are random in character and are not a property of the population.

In relation to a scientific hypothesis, the level of statistical significance is a quantitative indicator of the degree of distrust in the conclusion that a relationship exists, calculated from the results of a sample-based, empirical test of this hypothesis. The lower the p-level, the higher the statistical significance of a research result that confirms the scientific hypothesis.

It is useful to know what affects the significance level. Other things being equal, the significance level is higher (the p-level is lower) if:

  • the magnitude of the relationship (difference) is greater;
  • the variability of the trait(s) is lower;
  • the sample size(s) is larger.
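A small simulation (an illustrative addition with made-up data, assuming NumPy and SciPy) shows the last point in action: the same true difference becomes easier to detect, and the p-value tends to shrink, as the sample grows:

```python
# Same true shift between groups, increasing sample sizes: the p-value
# of a two-sample t-test tends to drop as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (20, 200, 2000):
    a = rng.normal(loc=0.0, scale=1.0, size=n)  # control group
    b = rng.normal(loc=0.2, scale=1.0, size=n)  # shifted by a fixed amount
    t, p = stats.ttest_ind(a, b)
    print(f"n = {n:>4}: p = {p:.4f}")
```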

One-sided and two-sided significance tests

If the purpose of the study is to identify differences between the parameters of two populations that correspond to different natural conditions (living conditions, age of the subjects, etc.), it is often unknown which of these parameters will be greater and which smaller.

For example, if you are interested in the variability of results in the control and experimental groups, there is, as a rule, no confidence in the sign of the difference between the variances or standard deviations by which variability is assessed. In this case, the null hypothesis is that the variances are equal, and the purpose of the study is to prove the opposite, i.e. the presence of differences between the variances. The difference is allowed to have either sign. Such hypotheses are called two-sided.

But sometimes the task is to prove an increase or decrease in a parameter; for example, that the average result in the experimental group is higher than in the control group. In that case, a difference of the opposite sign is no longer admitted. Such hypotheses are called one-sided.

Significance tests used to test two-sided hypotheses are called two-sided, and those for one-sided hypotheses are called one-sided.

The question arises as to which test should be chosen in a given case. The answer lies beyond the scope of formal statistical methods and depends entirely on the goals of the study. Under no circumstances should you choose a test after conducting the experiment, based on an analysis of the experimental data, as this can lead to incorrect conclusions. If, before the experiment, it is assumed that the difference between the compared parameters may be either positive or negative, you should use a two-sided test.
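In code, the choice looks like this (an illustrative sketch with made-up data, assuming SciPy); note that the alternative is fixed before the results are examined:

```python
# Two-sided vs. one-sided two-sample t-tests on the same data.
from scipy import stats

control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
experimental = [12.6, 12.3, 12.8, 12.5, 12.9, 12.4]

# Two-sided: the difference may have either sign.
t2, p2 = stats.ttest_ind(experimental, control, alternative='two-sided')
# One-sided: we commit in advance to "experimental mean is greater".
t1, p1 = stats.ttest_ind(experimental, control, alternative='greater')

print(f"two-sided p = {p2:.4f}, one-sided p = {p1:.4f}")
```

For the same data with the effect in the expected direction, the one-sided p-value is half the two-sided one, which is why the choice must be justified by the study design rather than by the data.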

The main features of any relationship between variables.

We can note the two simplest properties of the relationship between variables: (a) the magnitude of the relationship and (b) the reliability of the relationship.

- Magnitude. The magnitude of a relationship is easier to understand and measure than its reliability. For example, if every man in the sample had a white blood cell count (WCC) higher than that of every woman, you could say that the relationship between the two variables (gender and WCC) is very strong. In other words, you could predict the values of one variable from the values of the other.

- Reliability ("truth"). The reliability of a relationship is a less intuitive concept than its magnitude, but it is extremely important. It is directly related to the representativeness of the sample on which conclusions are based. In other words, reliability refers to how likely it is that the relationship would be rediscovered (in other words, confirmed) using data from another sample drawn from the same population.

It should be remembered that the ultimate goal is almost never to study this particular sample of values; a sample is of interest only insofar as it provides information about the entire population. If the study satisfies certain specific criteria, then the reliability of the found relationships between sample variables can be quantified and presented using a standard statistical measure.

The magnitude and reliability of a relationship are two different characteristics of relationships between variables. However, it cannot be said that they are completely independent: in a sample of typical size, the greater the magnitude of the relationship between variables, the more reliable it is (see the next section).

The statistical significance of a result (the p-level) is an estimated measure of confidence in its "truth" (in the sense of "representativeness of the sample"). More technically, the p-level decreases as the reliability of the result increases: a higher p-level corresponds to a lower level of confidence in the relationship between variables found in the sample. Specifically, the p-level represents the probability of error involved in extending the observed result to the entire population.

For example, p-level = 0.05 (i.e. 1/20) indicates that there is a 5% chance that the relationship between variables found in the sample is just a random feature of that sample. In many studies, a p-level of 0.05 is considered an "acceptable margin" for the error rate.

There is no way to avoid arbitrariness in deciding what level of significance should truly be considered "significant". The choice of a certain significance level above which results are rejected as false is quite arbitrary.



In practice, the final decision usually depends on whether the result was predicted a priori (i.e., before the experiment was carried out) or discovered a posteriori as a result of many analyzes and comparisons performed on a variety of data, as well as on the tradition of the field of study.

Generally, in many fields, a result of p ≤ .05 is an acceptable cutoff for statistical significance, but keep in mind that this level still admits a fairly large margin of error (5%).

Results significant at the p ≤ .01 level are generally considered statistically significant, while results at the p ≤ .005 or p ≤ .001 level are often considered highly significant. However, it should be understood that this classification of significance levels is quite arbitrary and is just an informal convention adopted on the basis of practical experience in a particular field of study.

It is clear that the greater the number of analyzes that are carried out on the totality of the collected data, the greater the number of significant (at the selected level) results will be discovered purely by chance.

Some statistical methods that involve many comparisons, and thus have a considerable chance of producing this kind of error, make a special adjustment or correction for the total number of comparisons. Many statistical methods (especially simple exploratory data analysis methods), however, offer no way of addressing this problem.
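For methods that do provide one, the adjustment can be as simple as the Bonferroni correction sketched below (an illustrative addition, assuming statsmodels; the raw p-values are hypothetical):

```python
# Bonferroni adjustment of several raw p-values so that the
# family-wise error rate stays at 0.05.
from statsmodels.stats.multitest import multipletests

p_values = [0.01, 0.04, 0.03, 0.20, 0.002]  # hypothetical raw p-values
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                         method='bonferroni')
for raw, adj, rej in zip(p_values, p_adjusted, reject):
    print(f"raw p = {raw:.3f} -> adjusted p = {adj:.3f}, reject: {rej}")
```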

If the relationship between variables is “objectively” weak, then there is no other way to test such a relationship other than to study a large sample. Even if the sample is perfectly representative, the effect will not be statistically significant if the sample is small. Likewise, if a relationship is “objectively” very strong, then it can be detected with a high degree of significance even in a very small sample.

The weaker the relationship between variables, the larger the sample size required to detect it meaningfully.

There are many different measures of the relationship between variables. The choice of a particular measure in a particular study depends on the number of variables, the measurement scales used, the nature of the relationships, and so on.

Most of these measures, however, follow a general principle: they attempt to estimate an observed relationship by comparing it with the "maximum conceivable relationship" between the variables in question. Technically, the usual way to make such estimates is to look at how the values of the variables vary and then calculate how much of the total variation present can be explained by the presence of "common" ("joint") variation in two (or more) variables.
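A tiny sketch of this principle (an illustrative addition with made-up data): for two numeric variables, the squared correlation coefficient estimates the share of total variation explained by their common variation:

```python
# r**2 as the fraction of variation in y explained by x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])  # roughly linear in x

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")
```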

Significance depends mainly on the sample size. As already explained, in very large samples even very weak relationships between variables will be significant, while in small samples even very strong relationships are not reliable.

Thus, in order to determine the level of statistical significance, a function is needed that represents the relationship between the “magnitude” and “significance” of the relationship between variables for each sample size.

Such a function would indicate exactly "how likely it is to obtain a relationship of a given magnitude (or greater) in a sample of a given size, assuming that no such relationship exists in the population." In other words, this function would give the significance level (p-level), and therefore the probability of erroneously rejecting the assumption that the relationship is absent in the population.

This "alternative" hypothesis (that there is no relationship in the population) is usually called null hypothesis.

It would be ideal if the function that computes the probability of error were linear and merely had different slopes for different sample sizes. Unfortunately, this function is much more complex and is not always exactly the same. In most cases, however, its form is known and can be used to determine significance levels in studies of samples of a given size. Most of these functions are associated with a class of distributions called normal.


