Fundamentals of the theory of tests in physical education. Characteristics of control testing in physical education

What is testing

According to IEEE Std 829-1983, testing is the process of analyzing software in order to detect differences between its actual and its required properties (defects) and to evaluate the properties of the software.

GOST R ISO/IEC 12207-99 defines, among the supporting processes of the software life cycle, the processes of verification, validation, joint review and audit. Verification is the process of determining that software products function in full accordance with the requirements or conditions established in previous work; it may include analysis, inspection and testing. Validation is the process of determining how completely the created system or software product meets the established requirements and its functional purpose. Joint review is the process of assessing the state and, where necessary, the results of project work (products). Audit is the process of determining compliance with requirements, plans and contract terms. Together, these processes make up what is usually called testing.

Testing is based on test procedures with specific input data, initial conditions and an expected result, designed for a specific purpose, such as exercising a particular program or verifying conformance to a specific requirement. Test procedures can check various aspects of a program's functioning, from the correct operation of an individual function to the adequate fulfilment of business requirements.
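As a minimal sketch (the function and values are hypothetical, not from the source), a test procedure fixes input data, initial conditions and the expected result in advance:

```python
# Hypothetical example: a test procedure for a discount function. The input
# data, initial conditions and expected results are fixed in advance.
def apply_discount(price: float, percent: float) -> float:
    """Function under test: reduce price by the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Input data -> expected result
assert apply_discount(200.0, 10) == 180.0    # normal case
assert apply_discount(200.0, 0) == 200.0     # boundary: no discount
assert apply_discount(200.0, 100) == 0.0     # boundary: full discount
print("test procedure passed")
```

Each assertion here is one test procedure: a fixed input paired with the result the requirement demands.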

When running a project, it is necessary to decide in advance against which standards and requirements the product will be tested, and which tools (if any) will be used to find and document the defects discovered. If testing is kept in mind from the very beginning of the project, testing the product under development will bring no unpleasant surprises, and the quality of the product will most likely be quite high.

Product life cycle and testing

Iterative software development processes, in particular the Rational Unified Process (RUP) technology (Fig. 1), are used increasingly often nowadays. With this approach, testing ceases to be an "off-the-cuff" activity that happens after programmers have written all the necessary code. Work on tests begins at the very first stage, when requirements for the future product are identified, and is closely integrated with the current tasks. This places new demands on testers: their role is no longer limited to detecting errors as fully and as early as possible. They must participate in the overall process of identifying and eliminating the most significant project risks. To this end, a testing goal and the methods of achieving it are defined for each iteration, and at the end of each iteration it is determined to what extent that goal has been achieved, whether additional tests are needed, and whether the principles and tools of testing need to be changed. In turn, every detected defect must pass through its own life cycle.

Fig. 1. Product life cycle according to RUP

Testing is usually carried out in cycles, each of which has a specific list of tasks and goals. The testing cycle may coincide with an iteration or correspond to a specific part of it. Typically, a testing cycle is carried out for a specific system build.

The life cycle of a software product consists of a series of relatively short iterations (Figure 2). An iteration is a complete development cycle leading to the release of a final product or some shortened version of it, which expands from iteration to iteration to eventually become a complete system.

Each iteration typically includes tasks of work planning, analysis, design, implementation, testing, and evaluation of the results achieved. However, the ratio between these tasks can change significantly. According to the ratio of the various tasks within them, iterations are grouped into phases. The first phase, Inception, focuses on analysis tasks. Iterations of the second phase, Elaboration, focus on the design and testing of key design decisions. In the third phase, Construction, development and testing tasks account for the largest share. And in the last phase, Transition, the tasks of testing and handing the system over to the customer are addressed to the greatest extent.

Fig. 2. Iterations of the software product life cycle

Each phase has its own specific goals in the product life cycle and is considered complete when those goals are achieved. All iterations, except perhaps those of the Inception phase, end with the creation of a working version of the system being developed.

Test categories

Tests vary significantly in the problems they solve and the technology they use.

Current testing: a set of tests performed to determine the operability of newly added system features. Types of testing:
  • load testing;
  • business cycle testing;
  • stress testing.

Regression testing: its purpose is to verify that additions to the system do not reduce its existing capabilities, i.e. testing is carried out against requirements that were already met before the new features were added. Types of testing:
  • load testing;
  • business cycle testing;
  • stress testing.

Testing subcategories

Load testing: used to test all application functions without exception; the order in which the functions are tested does not matter. Subtypes:
  • functional testing;
  • interface testing;
  • database testing.

Business cycle testing: used to test application functions in the sequence in which the user invokes them, for example simulating all the actions of an accountant during the first quarter. Subtypes:
  • unit testing;
  • functional testing;
  • interface testing;
  • database testing.

Stress testing: used to test application performance. The purpose of this testing is to determine the limits of stable operation of the application. During this testing all available functions are called. Subtypes:
  • unit testing;
  • functional testing;
  • interface testing;
  • database testing.

Types of testing

Unit testing - testing of individual application modules. For maximum benefit, it is carried out simultaneously with the development of the modules.

Functional testing - its purpose is to make sure that the test item functions properly. Correct navigation through the object is checked, as well as data input, processing and output.

Database testing - checking the functionality of the database during normal application operation, during overloads and in multi-user mode.

Unit testing

In OOP, the usual way to organize unit testing is to test the methods of each class, then the classes of each package, and so on, gradually moving up to testing the project as a whole; the earlier tests then serve as regression tests.
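A minimal sketch of this organization with Python's unittest, for a hypothetical Stack class (both the class and the test values are illustrative assumptions):

```python
import unittest

class Stack:
    """Hypothetical class under test."""
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()
    def size(self):
        return len(self._items)

class TestStack(unittest.TestCase):
    # Each method of the class gets its own test; the same tests are later
    # re-run as regression tests when surrounding code changes.
    def test_pop_returns_last_pushed_item(self):
        s = Stack()
        s.push(1)
        s.push(2)
        self.assertEqual(s.pop(), 2)
        self.assertEqual(s.size(), 1)
    def test_pop_on_empty_stack_raises(self):
        with self.assertRaises(IndexError):
            Stack().pop()

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStack)
unittest.TextTestRunner(verbosity=0).run(suite)
```

Class-level suites like this are then grouped per package, so the whole pyramid can be replayed after every change.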

The output documentation of these tests includes the test procedures, the input data, the code executing the test, and the output data. An example of such output documentation is given below.

Functional testing

Functional testing of the test item is planned and conducted based on the test requirements specified during the requirements definition phase. The requirements include business rules, use-case diagrams, business functions, and, if available, activity diagrams. The purpose of functional tests is to verify that the developed graphical components meet the specified requirements.

This type of testing cannot be fully automated. Therefore, it is divided into:

  • Automated testing (will be used in the case where it is possible to check the output information).

Purpose: to test data input, processing and output;

  • Manual testing (in other cases).

Purpose: Tests whether user requirements are met correctly.

It is necessary to execute (play through) each of the use cases, using both correct values and deliberately erroneous ones, to confirm correct functioning, according to the following criteria:

  • the product responds adequately to all input data (expected results are output in response to correctly entered data);
  • the product responds adequately to incorrectly entered data (corresponding error messages appear).
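Both criteria can be expressed as one small data-driven check. The validation routine below is a hypothetical stand-in, not from the source:

```python
# Hypothetical input-validation routine: correct input yields the expected
# output, incorrect input yields an error message.
def parse_age(text: str) -> int:
    """Return an integer age, or raise ValueError with a message."""
    if not text.strip().isdigit():
        raise ValueError(f"invalid age: {text!r}")
    age = int(text)
    if not 0 < age < 150:
        raise ValueError(f"age out of range: {age}")
    return age

# Criterion 1: correctly entered data produces the expected results
for raw, expected in [("7", 7), ("42", 42)]:
    assert parse_age(raw) == expected

# Criterion 2: incorrectly entered data produces an error response
for raw in ["abc", "-5", "999"]:
    try:
        parse_age(raw)
        raise AssertionError(f"{raw!r} should have been rejected")
    except ValueError:
        pass

print("functional checks passed")
```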

Database testing

The purpose of this testing is to make sure that database access methods are reliable and execute correctly, without violating data integrity.

It is necessary to issue, one after another, as many database calls as possible. The test is designed so as to "load" the database with a sequence of both correct and deliberately erroneous values. The database's reaction to the input is determined, and the time intervals for processing it are estimated.
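A sketch of this approach with an in-memory SQLite database (the schema and row counts are assumptions made for illustration):

```python
import sqlite3
import time

# Hypothetical schema: "load" the database with a sequence of correct and
# deliberately erroneous calls, and time their processing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athletes (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

correct = [(i, f"athlete{i}") for i in range(1000)]
erroneous = [(0, None), (1, None)]  # duplicate keys and NULL names

start = time.perf_counter()
for row in correct:
    conn.execute("INSERT INTO athletes VALUES (?, ?)", row)
elapsed = time.perf_counter() - start

rejected = 0
for row in erroneous:
    try:
        conn.execute("INSERT INTO athletes VALUES (?, ?)", row)
    except sqlite3.IntegrityError:  # the database must reject invalid input
        rejected += 1

print(f"{len(correct)} inserts in {elapsed:.4f} s, {rejected} bad rows rejected")
```

The timing loop estimates processing intervals under load, while the second loop confirms that integrity is preserved when erroneous values arrive.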

REPORT

student 137 gr. Ivanova I.

on testing the effectiveness of training methods
using the methods of mathematical statistics

Sections of the report are drawn up in accordance with the samples given in this manual at the end of each stage of the game. Completed reports are kept at the Department of Biomechanics until the pre-exam consultation. Students who have not reported on the work done and have not submitted a notebook with the report to the teacher are not allowed to take the sports metrology exam.


Stage I of the business game
Control and measurement in sports

Target:

1. Familiarize yourself with the theoretical foundations of control and measurement in sports and physical education.

2. Acquire skills in measuring speed performance indicators in athletes.

1. Control in physical education and sports

Physical education and sports training is not a spontaneous, but a controlled process. At each moment of time, a person is in a certain physical state, which is determined mainly by health (compliance of vital signs with the norm, the degree of resistance of the body to adverse sudden influences), physique and the state of physical functions.

It is advisable to manage a person's physical condition by changing it in the desired direction. This management is carried out by means of physical education and sports, which include, in particular, physical exercise.

It only seems that the teacher (or coach) controls the physical condition by influencing the athlete's behaviour, i.e. by prescribing certain physical exercises and monitoring the correctness of their execution and the results obtained. In reality, the athlete's behaviour is controlled not by the coach but by the athlete himself: during sports training, an influence is exerted on a self-governing system (the human body). Because of individual differences in athletes' conditions, there is no certainty that the same influence will evoke the same response. This is why the question of feedback is so relevant: the information about the athlete's condition that the coach receives while monitoring the training process.

Control in physical education and sports is based on measuring indicators, selecting the most significant ones and their mathematical processing.

Management of the educational and training process includes three stages:

1) collection of information;

2) its analysis;

3) decision making (planning).

Information collection is usually carried out during comprehensive control, the objects of which are:

1) competitive activity;

2) training loads;

3) the athlete’s condition.



Three types of athlete's state are distinguished (V.A. Zaporozhanov), depending on the length of the interval needed for the transition from one state to another.

1. Staged (permanent) state. It persists for a relatively long time: weeks or months. The comprehensive characteristic of an athlete's staged state that reflects his ability to demonstrate sporting achievements is called preparedness, and the state of optimal (best for the given training cycle) preparedness is called athletic form. Obviously, athletic form cannot be attained or lost within one or a few days.

2. Current state. It changes under the influence of one or several training sessions. The after-effects of participating in a competition, or of the training work performed in a session, often last for several days. In this case the athlete notes both unfavourable events (for example, muscle pain) and positive ones (for example, a state of increased performance). Such changes are called the delayed training effect.

The athlete's current state determines the character of the next training sessions and the magnitude of the loads in them. The special case of the current state characterized by readiness to perform a competitive exercise in the coming days with a result close to the maximum is called current readiness.

3. Operational state. It changes under the influence of a single execution of a physical exercise and is transient (for example, fatigue caused by running a distance once, or a temporary increase in performance after a warm-up). The athlete's operational state changes in the course of a training session and should be taken into account when planning rest intervals between attempts and repeated runs, and when deciding whether an additional warm-up is advisable. The special case of the operational state characterized by immediate readiness to perform a competitive exercise with a result close to the maximum is called operational readiness.

In accordance with the above classification, there are three main types of monitoring the athlete’s condition:

1) staged control, whose purpose is to assess the athlete's staged state (preparedness);

2) current control, whose main task is to determine the day-to-day (current) fluctuations in the athlete's state;

3) operational control, whose purpose is an express assessment of the athlete's state at the given moment.

A measurement or trial performed to determine an athlete's condition or ability is called a test. The procedure of measurement or trial is called testing.

Any test involves measurement, but not every measurement serves as a test. Only measurements that satisfy the following metrological requirements can be used as tests:

1) standardization;

2) the presence of a rating system;

3) reliability and informativeness (quality) of the tests;

4) a defined type of control (staged, current or operational).

A test based on motor tasks is called a motor test. There are three groups of motor tests:

1. Test exercises, in which the athlete is given the task of showing a maximum result. The test result is a motor achievement, for example the time in which an athlete runs 100 m.

2. Standard functional tests, in which the task, the same for everyone, is dosed either according to the amount of work performed or according to the magnitude of physiological shifts. The test result is physiological or biochemical indicators during standard work, or motor achievements at a standard magnitude of physiological shifts. For example, the percentage increase in heart rate after 20 squats, or the running speed at a fixed heart rate of 160 beats per minute.

3. Maximum functional tests, during which the athlete must show maximum results. The test result is physiological or biochemical indicators at maximum work. For example, maximum oxygen consumption or maximum oxygen debt.

High quality testing requires knowledge of measurement theory.


1. BASIC CONCEPTS

A test is a measurement or trial conducted to determine an athlete's condition or ability. The procedure of measurement or trial is called testing; the numeric value obtained from the measurement is the test result. For example, the 100 m run is a test, the procedure of conducting the races and timing them is testing, and the running time is the test result.

Tests based on motor tasks are called motor tests. Their results can be either motor achievements (the time to cover a distance, the number of repetitions, the distance covered, etc.) or physiological and biochemical indicators. Depending on this, and on the task facing the subject, three groups of motor tests are distinguished (Table A).

Table A. Types of motor tests

1. Test exercises. Task for the athlete: show a maximum result. Test result: a motor achievement. Example: 1500 m run, running time.

2. Standard functional tests. Task: the same for everyone, dosed either (a) according to the amount of work performed or (b) according to the magnitude of physiological shifts. Test result: physiological or biochemical indicators during standard work, or motor indicators at a standard magnitude of physiological shifts. Examples: registration of heart rate during standard work of 1000 kgm/min; running speed at a heart rate of 160 beats/min; the PWC170 test.

3. Maximum functional tests. Task: show a maximum result. Test result: physiological or biochemical indicators. Example: determination of maximum oxygen debt or maximum oxygen consumption.

Sometimes not one but several tests with a common final goal are used (for example, assessing the athlete's condition during the competitive period of training). Such a group of tests is called a complex, or battery, of tests. Not every measurement can be used as a test: to do so, it must satisfy special requirements. These are: 1) test reliability; 2) test informativeness; 3) the presence of a rating system (see the next chapter); 4) standardization, meaning that the procedure and conditions of testing must be the same in all cases where the test is applied. Tests that meet the requirements of reliability and informativeness are called sound, or authentic, tests.

2. TEST RELIABILITY

2.1 Concept of test reliability


Test reliability is the degree of agreement between results when the same people (or other objects) are tested repeatedly under the same conditions. Ideally, the same test applied to the same subjects under the same conditions should produce the same results. However, even with the strictest standardization of testing and precise equipment, test results always vary somewhat. For example, an athlete who has just squeezed 55 kg on a hand dynamometer may show only 50 kg a few minutes later. Such variation is called intra-individual or (using the more general terminology of mathematical statistics) intraclass variation. It has four main causes:

change in the state of the subjects (fatigue, training, learning, change in motivation, concentration, etc.);

uncontrolled changes in external conditions and equipment (temperature and humidity, supply voltage, the presence of unauthorized persons, wind, etc.);

change in the state of the person conducting or evaluating the test, replacement of one experimenter or judge with another;

imperfection of the test (there are tests that are obviously unreliable, for example, free throws into a basketball basket before the first miss; even an athlete with a high percentage of hits can accidentally make a mistake on the first throws).

The following simplified example will help clarify the idea behind the methods used to judge test reliability. Suppose we want to compare the standing long jump results of two athletes on the basis of two attempts each. If accurate conclusions are desired, we cannot limit ourselves to recording only the best results. Suppose the results of each athlete vary within ±10 cm of his average and equal 220±10 cm (i.e. 210 and 230 cm) and 320±10 cm (i.e. 310 and 330 cm) respectively. In this case the conclusion is entirely unambiguous: the second athlete is superior to the first. The difference between the results (320 cm - 220 cm = 100 cm) is clearly greater than the random fluctuations (±10 cm). The conclusion will be much less certain

Fig. 1. The ratio of inter- and intraclass variation with high (top) and low (bottom) reliability.

Short vertical strokes show the data of individual attempts; X̄1, X̄2, X̄3 are the average results of three subjects

if, with the same intraclass variation (±10 cm), the difference between the subjects (the interclass variation) is small. Suppose the average values are 220 cm (210 cm in one attempt, 230 cm in the other) and 222 cm (212 and 232 cm). Then it may happen, for example, that in the first attempt the first athlete jumps 230 cm and the second only 212 cm, creating the impression that the first is considerably stronger than the second.

The example shows that what matters is not intraclass variation in itself but its relationship to interclass differences. The same intraclass variation yields different reliability for different between-class differences (between subjects, in this particular case; Fig. 1).

The theory of test reliability is based on the fact that the result of any measurement carried out on a person, X_t, is the sum of two quantities:

X_t = X_∞ + X_e,   (1)

where X_∞ is the so-called true result that one wants to record;

X_e is an error caused by uncontrolled variation in the subject's state, introduced by the measuring device, and so on.

By definition, the true result is the mean value of X_t over an infinitely large number of observations under identical conditions (which is why the infinity sign ∞ appears in X_∞).

If the errors are random (their sum is zero, and in different attempts they do not depend on one another), then it follows from mathematical statistics that

σ_t² = σ_∞² + σ_e²,

i.e. the variance of the results recorded in the experiment (σ_t²) equals the sum of the variance of the true results (σ_∞²) and the variance of the errors (σ_e²).

Here σ_∞² characterizes the idealized (i.e. error-free) interclass variation, and σ_e² the intraclass variation. The influence of σ_e² changes the distribution of the test results (Fig. 2).

By definition, the reliability coefficient r_tt equals the ratio of the true variance to the variance recorded in the experiment:

r_tt = σ_∞² / σ_t².

In other words, r_tt is simply the proportion of true variation in the variation recorded in the experiment.

In addition to the reliability coefficient, the reliability index is used:

r_t∞ = √r_tt,

which is regarded as the theoretical correlation coefficient between the recorded test values and the true ones. The concept of the standard error of reliability is also used, meaning the standard deviation of the recorded test results X_t from the regression line connecting X_t with the true results X_∞ (Fig. 3).
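The decomposition of recorded variance into true and error variance can be illustrated with a small simulation (the values σ_∞ = 10 and σ_e = 5 are assumptions chosen for the illustration):

```python
import random
import statistics

# Simulation under assumed values: each recorded result X_t is a true score
# X_inf plus a random error X_e; the reliability coefficient is the ratio
# Var(true) / Var(recorded).
random.seed(1)
true_scores = [random.gauss(220, 10) for _ in range(10000)]   # sigma_inf = 10
recorded = [t + random.gauss(0, 5) for t in true_scores]      # sigma_e = 5

r_tt = statistics.pvariance(true_scores) / statistics.pvariance(recorded)
# Analytically this should be close to 100 / (100 + 25) = 0.8
print(f"estimated reliability coefficient: {r_tt:.2f}")
```

With independent errors the recorded variance is close to 100 + 25, so the simulated r_tt lands near 0.8, as the formula predicts.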

2.2 Reliability assessment based on experimental data

The concept of a true test result is an abstraction: X_∞ cannot be measured experimentally (it is impossible in reality to carry out an infinitely large number of observations under identical conditions). Therefore indirect methods have to be used.

The most preferable method for assessing reliability is analysis of variance followed by calculation of the so-called intraclass correlation coefficients.

Analysis of variance, as is known, makes it possible to decompose the experimentally recorded variation in test results into components due to the influence of individual factors. For example, if we record subjects' results in some test, repeating the test on different days and making several attempts each day with the experimenters changing periodically, variation will occur:

a) from subject to subject (interindividual variation),

b) from day to day,

c) from experimenter to experimenter,

d) from attempt to attempt.

Analysis of variance makes it possible to isolate and evaluate the variations caused by these factors.

A simplified example shows how this is done. Suppose the results of two attempts were measured in 5 subjects (k = 5, n = 2).

The results of the analysis of variance (see a course in mathematical statistics, as well as Appendix 1 to the first part of the book) are given in the traditional form in Table 2.

Table 2

Reliability is assessed using the so-called intraclass correlation coefficient:

r′ = (MS_S − MS_e) / (MS_S + ((n − n′)/n′) · MS_e),

where r′ is the intraclass correlation coefficient (the reliability coefficient, denoted with a prime to distinguish it from the ordinary correlation coefficient r);

MS_S and MS_e are the between-subject and error mean squares from the analysis of variance;

n is the number of attempts used in the test;

n′ is the number of attempts for which the reliability estimate is made.

For example, if they want to estimate the reliability of the average of two attempts based on the data given in the example, then

If we limit ourselves to only one attempt, then the reliability will be equal to:

and if you increase the number of attempts to four, the reliability coefficient will also increase slightly:

Thus, in order to assess reliability, it is necessary, firstly, to perform an analysis of variance and, secondly, to calculate the intraclass correlation coefficient (reliability coefficient).
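These two steps can be sketched in code. The jump results below are made up for illustration, and the one-way ANOVA form of the intraclass coefficient (with MS_S and MS_e as the between-subject and error mean squares) is an assumption consistent with the behaviour described above:

```python
# Sketch: one-way ANOVA on k subjects x n attempts, then the intraclass
# correlation coefficient (reliability of the average of n_prime attempts).
def intraclass_r(data, n_prime):
    """data: one list of attempt results per subject."""
    k = len(data)                     # number of subjects
    n = len(data[0])                  # attempts per subject
    grand = sum(sum(row) for row in data) / (k * n)
    ss_between = n * sum((sum(row) / n - grand) ** 2 for row in data)
    ss_within = sum((x - sum(row) / n) ** 2 for row in data for x in row)
    ms_s = ss_between / (k - 1)       # between-subject mean square
    ms_e = ss_within / (k * (n - 1))  # error (within-subject) mean square
    return (ms_s - ms_e) / (ms_s + ((n - n_prime) / n_prime) * ms_e)

# Five subjects, two attempts each (made-up long-jump results, cm)
results = [[210, 230], [305, 315], [260, 250], [280, 290], [225, 215]]
print("one attempt:", round(intraclass_r(results, 1), 2))   # lower
print("mean of two:", round(intraclass_r(results, 2), 2))   # higher
```

With n′ = n the expression reduces to (MS_S − MS_e)/MS_S, and with n′ = 1 it gives the single-attempt coefficient, which is why reliability grows when more attempts are averaged.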

Some difficulties arise when there is a so-called trend, i.e. a systematic increase or decrease in results from attempt to attempt (Fig. 4). In this case, more complex methods for assessing reliability are used (they are not described in this book).

For the case of two attempts with no trend, the values of the intraclass correlation coefficient practically coincide with the ordinary correlation coefficient between the results of the first and second attempts. In such situations the ordinary correlation coefficient can therefore be used to assess reliability (it estimates the reliability of one attempt rather than two). However, if the number of repeated attempts in a test is greater than two, and especially if complex test designs are used

Fig. 4. A series of six attempts, of which the first three (left) or the last three (right) are subject to a trend

(for example, 2 attempts per day for two days), calculation of the intraclass coefficient is necessary.

The reliability coefficient is not an absolute indicator characterizing the test. This coefficient may vary depending on the population of subjects (for example, it may be different for beginners and skilled athletes), testing conditions (whether repeated attempts are carried out one after another or, say, at intervals of one week) and other reasons. Therefore, it is always necessary to describe how and on whom the test was carried out.

2.3 Reliability in test practice

The unreliability of experimental data lowers the estimates of correlation coefficients. Since no test can correlate with another test more strongly than with itself, the upper limit of the correlation coefficient estimate here is not ±1.00 but the reliability index

r_t∞ = √r_tt.

To pass from estimates of the correlation between empirical data to an estimate of the correlation between the true values, one can use the expression

r_∞ = r_xy / √(r_xx · r_yy),

where r_∞ is the correlation between the true values of X and Y;

r_xy is the correlation between the empirical data;

r_xx and r_yy are the reliability estimates of X and Y.

For example, if r_xy = 0.60, r_xx = 0.80 and r_yy = 0.90, the correlation between the true values is 0.60/√0.72 ≈ 0.707.

This formula (6) is known as the correction for attenuation (Spearman's correction); it is constantly used in practice.
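The worked example above can be reproduced directly:

```python
import math

# The correction for attenuation: estimate the correlation between true
# values from the empirical correlation and the two reliability coefficients.
def true_correlation(r_xy, r_xx, r_yy):
    return r_xy / math.sqrt(r_xx * r_yy)

# The numbers used in the text: r_xy = 0.60, r_xx = 0.80, r_yy = 0.90
print(round(true_correlation(0.60, 0.80, 0.90), 3))  # -> 0.707
```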

There is no fixed reliability value above which a test is considered acceptable: everything depends on the importance of the conclusions drawn from its application. Still, in most cases in sport the following approximate guidelines can be used: 0.95–0.99, excellent reliability; 0.90–0.94, good; 0.80–0.89, acceptable; 0.70–0.79, poor; 0.60–0.69, doubtful for individual assessments (the test is suitable only for characterizing a group of subjects).

Test reliability can be improved somewhat by increasing the number of repeated attempts. Here, for example, is how the reliability of a test (throwing a 350 g grenade with a run-up) grew in an experiment as the number of attempts increased: 1 attempt, 0.53; 2 attempts, 0.72; 3 attempts, 0.78; 4 attempts, 0.80; 5 attempts, 0.82; 6 attempts, 0.84. The example shows that while reliability at first rises quickly, after 3 or 4 attempts the increase slows markedly.
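The figures above are empirical; as a comparison (an assumption, since the source does not invoke it for this example), the Spearman-Brown formula gives a theoretical prediction of the same pattern:

```python
# The Spearman-Brown formula predicts how reliability grows when a test is
# lengthened from one to n attempts: r_n = n * r1 / (1 + (n - 1) * r1).
def spearman_brown(r1, n):
    return n * r1 / (1 + (n - 1) * r1)

# Starting from the single-attempt reliability of the grenade throw, r1 = 0.53
for n in range(1, 7):
    print(n, "attempts:", round(spearman_brown(0.53, n), 2))
```

The predicted values (0.53, 0.69, 0.77, 0.82, 0.85, 0.87) show the same rapid early growth and later slowdown as the measured ones.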

When several repeated attempts are made, the result can be determined in different ways: (a) by the best attempt; (b) by the arithmetic mean; (c) by the median; (d) by the mean of the two or three best attempts, etc. Research has shown that in most cases the arithmetic mean is the most reliable, the median somewhat less so, and the best attempt less reliable still.

When talking about the reliability of tests, a distinction is made between their stability (reproducibility), consistency, and equivalence.

2.4 Test stability

Test stability is the reproducibility of results when the test is repeated after a certain time under the same conditions. Repeated testing is usually called a retest. The scheme for assessing test stability is as follows:

Two cases are distinguished here. In one, the retest is carried out in order to obtain reliable data about the subject's condition over the whole time interval between test and retest (for example, to obtain reliable data on the functional capabilities of skiers in June, they are measured twice with a one-week interval). In this case accurate test results are important, and reliability should be assessed by analysis of variance.

In the other case, it may only be important to preserve the order of the subjects in the group (does the first remain first, does the last remain among the last). In this case stability is assessed by the correlation coefficient between test and retest.
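In this second case the computation is simply a Pearson correlation between the two sessions; a sketch with hypothetical scores:

```python
# Stability as test-retest reproducibility: the Pearson correlation between
# two testing sessions. The scores below are made up for illustration.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

test_scores = [13.2, 14.1, 12.8, 15.0, 13.7]    # e.g. 100 m times, s
retest_scores = [13.4, 14.0, 13.0, 14.8, 13.9]  # one week later
print("test-retest correlation:", round(pearson_r(test_scores, retest_scores), 2))
```

A coefficient near 1 means the ranking of the subjects is preserved between the two sessions, which is exactly what this form of stability demands.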

The stability of a test depends on:

the type of test,

the contingent of subjects,

the time interval between test and retest. For example, morphological characteristics are highly stable over short time intervals, while tests of movement accuracy (for example, throwing at a target) have the lowest stability.

In adults, test results are more stable than in children, and in athletes more stable than in people not engaged in sports.

As the time interval between test and retest increases, test stability decreases (Table 3).

2.5 Test consistency

The consistency of a test is characterized by the independence of the test results from the personal qualities of the person conducting or evaluating it.¹ Consistency is determined by the degree of agreement between results obtained on the same subjects by different experimenters, judges or experts. Two situations are possible:

The person administering the test only evaluates its results without influencing its performance. For example, different examiners may grade the same written work differently; judges' scores in gymnastics, figure skating or boxing, manual timing readings, and evaluations of the same electrocardiogram or radiograph by different doctors often differ.

The person performing the test influences the results. For example, some experimenters are more persistent and demanding than others and are better at motivating subjects. This affects results (which themselves can be measured quite objectively).

Test consistency is essentially the reliability of the test when it is administered by different people.

¹ Instead of the term "consistency", the term "objectivity" is often used. This usage is unfortunate, since agreement between the results of different experimenters or judges (experts) does not at all prove their objectivity: together they may, consciously or unconsciously, be mistaken in the same way, distorting the objective truth.

2.6 Test equivalence

Often a test is the result of a selection from a certain number of similar tests.

For example, shots at a basketball hoop can be taken from different points, a sprint can be run over, say, 50, 60 or 100 m, and pull-ups can be performed on rings or a bar, with an overhand or underhand grip, and so on.

In such cases the so-called parallel forms method can be used: subjects perform two versions of the same test, and the degree of agreement between the results is then assessed.

The correlation coefficient calculated between test results is called the equivalence coefficient. The attitude towards test equivalence depends on the specific situation. On the one hand, if two or more tests are equivalent, their combined use increases the reliability of the estimates; on the other hand, it may be useful to leave only one equivalent test in the battery - this will simplify testing and only slightly reduce the information content of the test set. The solution to this issue depends on reasons such as the complexity and cumbersomeness of the tests, the degree of required testing accuracy, etc.

If all the tests included in a test suite are highly equivalent, it is called homogeneous. This entire complex measures one property of human motor skills. Let's say a complex consisting of standing long, vertical and triple jumps is likely to be homogeneous. On the contrary, if there are no equivalent tests in the complex, then all the tests included in it measure different properties. Such a complex is called heterogeneous. Example of a heterogeneous battery of tests: pull-ups on the bar, bending forward (to test flexibility), 1500 m run.

2.7 Ways to improve test reliability

The reliability of tests can be increased to a certain extent by:

a) more stringent standardization of testing,

b) increasing the number of attempts,

c) increasing the number of appraisers (judges, experts) and increasing the consistency of their opinions,

d) increasing the number of equivalent tests,

e) better motivation of the subjects.
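Point (b), increasing the number of attempts, can be quantified with the Spearman-Brown formula of classical test theory, which predicts reliability after a test is lengthened k-fold. The formula is standard; the single-attempt reliability of 0.60 below is an assumed, illustrative value.

```python
# Sketch: Spearman-Brown prediction of reliability growth when the
# number of attempts (or equivalent tests, or raters) is increased.
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability after lengthening the test k times."""
    return k * r / (1 + (k - 1) * r)

r1 = 0.60  # assumed reliability of a single attempt
for k in (2, 3, 4):
    print(f"{k} attempts -> reliability {spearman_brown(r1, k):.2f}")
```

Note the diminishing returns: doubling the attempts raises 0.60 to 0.75, but further doubling gains much less.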

3. INFORMATIVE TESTS

3.1 Basic concepts

The informativeness of a test is the degree of accuracy with which it measures the property (quality, ability, characteristic, etc.) that it is used to evaluate. Informativeness is often also called validity (from the English validity - soundness, legitimacy). Suppose that, to determine the level of special strength preparedness of sprinters (runners and swimmers), the following indicators are proposed: 1) hand-grip dynamometry, 2) plantar flexion strength of the foot, 3) strength of the extensors of the shoulder joint (these muscles bear a large load in crawl swimming), 4) strength of the neck extensor muscles. It is proposed to manage the training process on the basis of these tests, in particular to find weak links in the motor system and strengthen them purposefully. Are the chosen tests good? Are they informative? Even without special experiments one can guess that the second test is probably informative for sprint runners, the third for swimmers, and the first and fourth are unlikely to show anything interesting for either runners or swimmers (although they may be very useful in other sports, such as wrestling). In different cases, then, the same tests can have different informativeness.

The question of the informativeness of a test breaks down into two specific questions:

What does this test measure?

How accurately does it measure it?

For example, is it possible to judge the fitness of long-distance runners from such an indicator as maximum oxygen consumption (MOC), and if so, with what degree of accuracy? In other words, what is the informativeness of MOC among stayers? Can this test be used in monitoring training?

If the test is used to determine (diagnose) the athlete's condition at the time of examination, one speaks of diagnostic informativeness. If, on the basis of the test results, a conclusion is to be drawn about the athlete's possible future performance, the test must have prognostic informativeness. A test can be diagnostically informative but not prognostically, and vice versa.

The degree of information content can be characterized quantitatively - on the basis of experimental data (the so-called empirical information content) and qualitatively - on the basis of a meaningful analysis of the situation (substantive, or logical, information content).

3.2 Empirical information content (case one - there is a measurable criterion)

The idea of determining empirical informativeness is that the test results are compared with some criterion. To do this, the correlation coefficient between the criterion and the test is calculated (this coefficient is called the informativeness coefficient and is denoted r_tk, where t stands for "test" and k for "criterion").

The criterion is taken to be an indicator that obviously and indisputably reflects the property that is going to be measured using the test.

It often happens that there is a well-defined criterion with which the proposed test can be compared. For example, when assessing the special preparedness of athletes in sports with objectively measured results, the result itself usually serves as such a criterion: the test whose correlation with the sports result is higher is more informative. In the case of determining prognostic information content, the criterion is the indicator whose forecast must be carried out (for example, if the length of a child’s body is predicted, the criterion is the length of his body in adulthood).

The most common criteria in sports metrology are:

Sports result.

Any quantitative characteristic of a basic sports exercise (for example, stride length in running, push-off force in jumping, success of fighting under the backboard in basketball, serving in tennis or volleyball, percentage of accurate long passes in football).

The results of another test, the information content of which has been proven (this is done if conducting a criterion test is cumbersome and difficult and you can select another test that is equally informative, but simpler. For example, instead of gas exchange, determine the heart rate). This special case, when the criterion is another test, is called competitive information content.

Belonging to a specific group. For example, you can compare members of the national team, masters of sports and first-class athletes; belonging to one of these groups is a criterion. In this case, special types of correlation analysis are used.

The so-called composite criterion, for example the sum of points in the all-around. In this case, all-around types and points tables can be either generally accepted or newly compiled by the experimenter (for how the tables are compiled, see the next chapter). A composite criterion is resorted to when there is no single criterion (for example, if the task is to assess general physical fitness, a player’s skill in sports games, etc., not a single indicator taken by itself can serve as a criterion).

An example of determining the information content of the same test - running speed of 30 m on the move for men - with different criteria is given in Table 4.

The question of choosing a criterion is essentially the most important in determining the real meaning and informativeness of a test. For example, if the task is to determine the informativeness of a test such as the standing long jump for sprinters, different criteria can be chosen: the 100 m result, stride length, the ratio of stride length to leg length or to height, etc. The informativeness of the test changes accordingly (in the example given it rose from 0.558 with the 100 m result as the criterion to 0.781 with the ratio "stride length/leg length").

In sports where it is impossible to objectively measure sportsmanship, they try to get around this difficulty by introducing artificial criteria. For example, in team sports games, experts rank all the players according to their skill in a certain order (i.e., they make lists of the 20, 50, or, say, 100 strongest players). The place occupied by the athlete (as they say, his rank) is considered as a criterion with which the test results are compared in order to determine their informativeness.

The question arises: why use tests if the criterion is known? For example, isn’t it easier to organize control competitions and determine sports results than to determine achievements in control exercises? The use of tests has the following advantages:

a sports result is not always possible or advisable to determine (for example, marathon running competitions cannot be held often; in winter it is usually impossible to register a result in javelin throwing, and in summer in cross-country skiing);

a sports result depends on many factors, such as the athlete's strength, endurance, technique, etc. The use of tests makes it possible to identify an athlete's strengths and weaknesses and to evaluate each of these factors separately.

3.3 Empirical informativeness (case two - there is no single criterion; factorial informativeness)

It often happens that there is no single criterion with which the results of proposed tests can be compared. Let’s say they want to find the most informative tests to assess the strength readiness of young people. What to prefer: pull-ups on the bar or push-ups, squats with a barbell, barbell rows, or going into a squat from a supine position? What could be the criterion for choosing the right test here?

You can offer subjects a large battery of various strength tests, and then select among them those that give the greatest correlation with the results of the entire complex (after all, you cannot systematically use the entire complex - it is too cumbersome and inconvenient). These tests will be the most informative: they will provide information about the possible results of the subjects for the entire initial set of tests. But the results in a set of tests are not expressed in one number. It is possible, of course, to form some kind of composite criterion (for example, to determine the amount of points scored on some scale). However, another way, based on the ideas of factor analysis, is much more effective.

Factor analysis is one of the methods of multivariate statistics (the word “multidimensional” indicates that many different indicators are studied simultaneously, for example, the results of subjects in many tests). This is a rather complex method, so here it is advisable to limit ourselves to presenting only its main idea.

Factor analysis proceeds from the fact that the result of any test is a consequence of the simultaneous action of a number of directly unobservable (otherwise known as latent) factors. For example, results in running 100, 800 and 5000 m depend on the athlete’s speed, strength, endurance, etc. The significance of these factors for each distance is not equally important. If you choose two tests that are influenced approximately equally by the same factors, then the results in these tests will be highly correlated with each other (say, in running at distances of 800 and 1000 m). If tests have no common factors or they have little influence on the results, the correlation between these tests will be low (for example, the correlation between performance in the 100 m and 5000 m). When a large number of different tests are taken and correlation coefficients between them are calculated, then using factor analysis it is possible to determine how many factors act together on these tests and what is the degree of their contribution to each test. And then it is easy to select tests (or combinations thereof) that most accurately assess the level of individual factors. This is the idea of ​​factorial information content of tests. The following example of a specific experiment shows how this is done.
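The idea of extracting latent factors from inter-test correlations can be illustrated with a simplified sketch. True factor analysis differs in detail; here a principal-component decomposition of an invented correlation matrix is used only to show how common factors reveal themselves (tests 1-2 are "speed" tests, tests 3-4 "endurance" tests):

```python
# Simplified sketch of the factor-analytic idea: eigen-decomposition
# of a correlation matrix between four tests. The matrix is invented;
# the high 0.8 blocks encode two groups of strongly related tests.
import numpy as np

R = np.array([
    [1.0, 0.8, 0.1, 0.1],   # 100 m run
    [0.8, 1.0, 0.1, 0.1],   # 60 m run
    [0.1, 0.1, 1.0, 0.8],   # 5000 m run
    [0.1, 0.1, 0.8, 1.0],   # 1500 m run
])

eigenvalues, eigenvectors = np.linalg.eigh(R)  # ascending order
order = np.argsort(eigenvalues)[::-1]
# Large eigenvalues correspond to strong common factors.
print("factor strengths:", np.round(eigenvalues[order], 2))
```

Two eigenvalues dominate, reflecting the two latent factors (speed and endurance) built into the matrix; the loadings in `eigenvectors` show which tests each factor acts on.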

The task was to find the most informative tests for assessing the general strength preparedness of third- and first-class student-athletes involved in different sports. For this purpose, 108 people were examined using 15 tests (N.V. Averkovich, V.M. Zatsiorsky, 1966). Factor analysis identified three factors: 1) strength of the upper limbs, 2) strength of the lower limbs, 3) strength of the abdominal muscles and hip flexors. The most informative of the tests tried were: for the first factor, push-ups; for the second, the standing long jump; for the third, raising straight legs while hanging and the maximum number of transitions from a supine position to a squat in 1 minute. If only a single test were to be kept, the most informative was the muscle-up (force-flip) on the bar (the number of repetitions was counted).

3.4 Empirical informativeness in practical work

When using empirical informativeness indicators in practice, it should be borne in mind that they are valid only in relation to those subjects and the conditions for which they are calculated. A test that is informative in a group of beginners may turn out to be completely uninformative if you try to use it in a group of masters of sports.

The informativeness of a test is not the same in different groups. In particular, in groups of more homogeneous composition the test is usually less informative. If the informativeness of a test is determined in some group, and the strongest members of that group are then selected for a national team, the informativeness of the same test within the team will be considerably lower. The reasons are clear from Fig. 5: selection reduces the overall variance of results in the group and thereby reduces the magnitude of the correlation coefficient. For example, if we determine the informativeness of a test such as MOC among 400 m swimmers with sharply different results (say, from 3:55 to 6:30), the informativeness coefficient will be very high (r_tk > 0.90); if we carry out the same measurements in a group of swimmers with results from 3:55 to 4:30, r_tk will not exceed 0.4-0.6 in absolute value; and if we determine the same indicator among the strongest swimmers in the world (roughly 3:53.5 to 4:00), the informativeness coefficient may well equal zero: with this test alone it will be impossible to distinguish swimmers swimming, say, 3:55 and 3:59, since both will have high and approximately equal MOC values.

Informativeness coefficients depend very strongly on the reliability of the test and of the criterion. A test with low reliability is always poorly informative, so it makes no sense to check low-reliability tests for informativeness. Insufficient reliability of the criterion also lowers informativeness coefficients. In this case, however, it would be wrong to dismiss the test as uninformative: the upper limit of a test's possible correlation with the criterion is not ±1 but the criterion's reliability index. The informativeness coefficient should therefore be compared with that index. The actual informativeness (adjusted for the unreliability of the criterion) is calculated by the formula: r(actual) = r_tk / √r_kk, where r_kk is the reliability coefficient of the criterion.

Thus, in one study the rank of a water polo player (rank was taken as the criterion of skill) was established from the assessments of 4 experts. The reliability (consistency) of the criterion, determined by the intraclass correlation coefficient, was 0.64; the informativeness coefficient was 0.56. The actual informativeness coefficient (adjusted for the unreliability of the criterion) is therefore: 0.56 / √0.64 = 0.56 / 0.8 = 0.70.
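The water polo computation above can be sketched directly from the correction formula:

```python
# Sketch: attenuation correction for an unreliable criterion,
# r(actual) = r_tk / sqrt(r_kk), applied to the water-polo example
# (r_tk = 0.56, criterion reliability r_kk = 0.64).
import math

def corrected_informativeness(r_tk: float, r_kk: float) -> float:
    """Informativeness coefficient adjusted for criterion unreliability."""
    return r_tk / math.sqrt(r_kk)

print(round(corrected_informativeness(0.56, 0.64), 2))  # prints 0.7
```

The correction can only raise the coefficient (since r_kk ≤ 1), so it never excuses a test whose raw correlation is negligible.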

Closely related to the informativeness and reliability of the test is the concept of its discriminative ability, which is understood as the minimal difference between subjects that is diagnosed using the test (this concept is similar in meaning to the concept of the sensitivity of the device). The discriminative ability of the test depends on:

Interindividual variation in results. For example, a test such as "maximum number of throws of a basketball against a wall from a distance of 4 m in 10 seconds" is good for beginners but unsuitable for skilled basketball players, since they all show approximately the same result and become indistinguishable. In many cases interindividual (interclass) variation can be increased by increasing the difficulty of the test. For example, if athletes of different qualifications are given a functional test that is easy for them (say, 20 squats, or work on a bicycle ergometer at 200 kgm/min), the magnitude of the physiological shifts will be about the same in all of them and it will be impossible to assess their preparedness. If they are given a difficult task, the differences between the athletes become large, and the test results can be used to judge their preparedness.

Reliability (i.e., the relationship between inter- and intra-individual variation) of the test and criterion. If the results of the same subject in the standing long jump vary within, say, ±10 cm, then, although jump length can be measured to within ±1 cm, it is impossible to distinguish with confidence subjects whose "true" results are 315 and 316 cm.

There is no fixed value for the information content of a test, after which the test can be considered suitable. Much depends on the specific situation: the desired accuracy of the prediction, the need to obtain at least some additional information about the athlete, etc. In practice, tests are used for diagnostics, the information content of which is not less than 0.3. For a forecast, as a rule, a higher information content is needed - at least 0.6.

The information content of a battery of tests is naturally higher than the information content of one test. It often happens that the information content of one individual test is too low to use this test. The information content of a battery of tests may be quite sufficient.

The informativeness of a test cannot always be determined by an experiment and mathematical processing of its results. For example, if the task is to compile examination questions or dissertation topics (this too is a kind of testing), it is necessary to select the most informative questions, those by which the knowledge of graduates and their preparedness for practical work can be assessed most accurately. So far, in such cases, one relies only on a logical, meaningful analysis of the situation.

Sometimes it happens that the information content of a test is clear without any experiments, especially when the test is simply part of the actions that an athlete performs in competitions. Experiments are hardly needed to prove the informativeness of such indicators as the time it takes to perform turns in swimming, the speed in the last steps of the run-up in the long jump, the percentage of free throws in basketball, the quality of the serve in tennis or volleyball.

However, not all such tests are equally informative. For example, a throw-in in football, although an element of the game, can hardly be considered one of the most important indicators of the skill of football players. If there are many such tests and you need to select the most informative ones, you cannot do without mathematical methods of test theory.

The content analysis of a test's informativeness and its experimental-mathematical justification should complement each other; neither approach by itself is sufficient. In particular, if an experiment yields a high informativeness coefficient for a test, it must be checked whether this is a consequence of so-called spurious correlation. Spurious correlations are known to arise when the results of both correlated characteristics are influenced by some third indicator that is in itself of no interest. For example, among schoolchildren one can find a significant correlation between the 100 m result and knowledge of geometry, since high school students, compared with elementary school students, will on average show better results both in running and in geometry. The third, extraneous characteristic that produced the correlation is the age of the subjects. Of course, a researcher who failed to notice this and recommended a geometry exam as a test for 100 m runners would be making a mistake. To avoid such mistakes, the cause-and-effect relationships behind the correlation between criterion and test must be analyzed. It is useful, in particular, to imagine what would happen if the test scores improved: would the criterion results improve too? In the example above: if a student knows geometry better, will he run the 100 m faster? The obvious negative answer leads to the natural conclusion that knowledge of geometry cannot serve as a test for sprinters; the correlation found is spurious. Real-life situations are, of course, far more complex than this deliberately absurd example.

A special case of meaningful informativeness of tests is informativeness by definition. In this case, they simply agree on what meaning should be put into this or that word (term). For example, they say: “a standing high jump is characterized by jumping ability.” It would be more accurate to say this: “let’s agree to call jumping ability what is measured by the result of jumping up from a place.” Such mutual agreement is necessary, since it prevents unnecessary misunderstandings (after all, someone may understand by jumping ability the results in a ten-fold jump on one leg, and consider a standing high jump, say, a test of “explosive” leg strength).

56.0 Standardization of tests

Standardization of physical fitness tests to assess human aerobic performance is achieved by adhering to the following principles.

The testing methodology must allow direct measurement or indirect calculation of the body's maximum oxygen consumption (aerobic capacity), since this physiological indicator of human physical fitness is the most important. It will be designated max VO2 and expressed in milliliters per kilogram of the subject's weight per minute (ml/kg·min).

In general, the test methodology should be the same for both laboratory and field measurements, however:

1. In laboratory conditions (in stationary and mobile laboratories), a person’s aerobic performance can be directly determined using fairly complex equipment and a large number of measurements.

2. In the field, aerobic performance is assessed indirectly based on a limited number of physiological measurements.

The test methodology should allow the results of laboratory and field measurements to be compared.

Testing should be carried out in one day and preferably without interruptions. This makes it possible to use time, equipment, and effort efficiently during initial and repeat testing.

The testing methodology must be flexible enough to allow testing of groups of people with different physical abilities, different ages, genders, different activity levels, etc.

57.0. Equipment selection

All of the above principles of physiological testing can be observed, first of all, subject to the correct selection of the following technical means:

treadmill,

bicycle ergometer,

stepergometer,

necessary auxiliary equipment that can be used in any type of test.

57.1. The treadmill can be used in a wide variety of studies. However, it is the most expensive of these devices, and even the smallest version is too bulky for wide use in the field. The treadmill should allow speeds from 3 to (at least) 8 km/h (2-5 mph) and inclines from 0 to 30%. The incline of a treadmill is defined as the ratio, in percent, of the vertical rise to the horizontal distance traveled.

Distance and vertical elevation must be expressed in meters, speed in meters per second (m/sec) or kilometers per hour (km/h).

57.2. Bicycle ergometer. This device is easy to use both in laboratory and field conditions. It is quite versatile; it can be used to perform work of varying intensity - from minimal to maximum level.

The bicycle ergometer has a mechanical or electrical braking system. The electric braking system can be powered either from an external source or from a generator located on the ergometer.

The adjustable mechanical resistance is expressed in kilogram-meters per minute (kgm/min) and in watts. Kilogram-meters per minute are converted to watts using the relation: 1 watt = 6 kgm/min.
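The conversion used above can be sketched as two helper functions. Note the 6 kgm/min figure is the rounded value customary in ergometry (more precisely, 1 W ≈ 6.12 kgm/min); the rounded value from the text is used here.

```python
# Sketch: ergometer load conversion per the rounded relation in the
# text, 1 watt = 6 kgm/min.
KGM_PER_MIN_PER_WATT = 6.0  # rounded convention; ~6.12 exactly

def kgm_per_min_to_watts(kgm_min: float) -> float:
    return kgm_min / KGM_PER_MIN_PER_WATT

def watts_to_kgm_per_min(watts: float) -> float:
    return watts * KGM_PER_MIN_PER_WATT

print(kgm_per_min_to_watts(900))  # prints 150.0
```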

The bicycle ergometer must have an adjustable seat so that its height can be set for each individual. For testing, the seat is positioned so that the person sitting on it reaches the lower pedal with an almost fully straightened leg. On average, the distance between the seat and the pedal in its lowest position should be 109% of the length of the subject's leg.

There are various designs of bicycle ergometers. However, the type of ergometer does not affect the results of the experiment, provided the set resistance in watts or kilogram-meters per minute corresponds exactly to the total external load.

57.3. Stepergometer. This is a relatively inexpensive device with step heights adjustable from 0 to 50 cm. Like the bicycle ergometer, it is easy to use both in the laboratory and in the field.

57.4. Comparison of the three testing options. Each of these devices has its own advantages and disadvantages (depending on whether it is used in the laboratory or in the field). Usually, max VO2 measured during treadmill work is slightly higher than during bicycle-ergometer work; in turn, bicycle-ergometer readings exceed stepergometer readings.

The energy expenditure of subjects at rest, or when performing work against gravity, is directly proportional to their weight. Exercises on the treadmill and stepergometer therefore create the same relative load for all subjects, that of lifting their own body to a given height: at a given treadmill speed and incline, or a given step frequency and step height on the stepergometer, the height to which the body is lifted is the same (though the absolute work performed differs. - Ed.). On the other hand, a bicycle ergometer at a fixed load setting requires almost the same energy expenditure regardless of the weight, gender, and age of the subject.

58.0. General Notes on Test Procedures

To apply tests to large groups of people, simple and time-efficient testing methods are needed. However, for a more detailed study of the physiological characteristics of the subject, more in-depth and labor-intensive tests are needed. To get more value from tests and use them more flexibly, it is necessary to find the optimal compromise between these two requirements.

58.1. Work intensity. Testing must begin with small loads that the weakest of the subjects can handle. The adaptive capacity of the cardiovascular and respiratory systems should be assessed during work with gradually increasing loads, so that functional limits can be established with sufficient precision. Practical considerations suggest taking the basal metabolic rate (i.e., the resting metabolic rate) as the unit for the amount of energy required to perform a given activity. The initial load and its subsequent stages are expressed in Mets, multiples of the metabolic rate of a person at complete rest. The physiological indicator underlying the Met is the amount of oxygen (in milliliters per minute) consumed by a person at rest, or its caloric equivalent (in kilocalories per minute).
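A load expressed in Mets can be translated into an oxygen requirement. The text defines the Met only as a multiple of the resting metabolic rate; the conventional resting value of 3.5 ml O2 per kg per minute is an assumption introduced here for the sketch.

```python
# Sketch: converting a load in Mets to oxygen consumption, assuming
# the conventional resting value of 3.5 ml O2/(kg*min). This constant
# is an assumption, not taken from the text.
RESTING_VO2 = 3.5  # ml O2 / (kg * min)

def mets_to_vo2(mets: float) -> float:
    """Oxygen consumption (ml/kg/min) implied by a load in Mets."""
    return mets * RESTING_VO2

# The accommodation load of about 4 Mets would then correspond to:
print(mets_to_vo2(4))  # prints 14.0
```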

To monitor loads in Met units or equivalent oxygen consumption values ​​directly during testing, complex electronic computing equipment is required, which is currently still relatively inaccessible. Therefore, when determining the amount of oxygen required by the body to perform loads of a certain type and intensity, it is practically convenient to use empirical formulas. The predicted (based on empirical formulas. - Ed.) values ​​of oxygen consumption when working on a treadmill - by speed and inclination, during a step test - by height and frequency of steps are in good agreement with the results of direct measurements and can be used as the physiological equivalent of physical effort, with which all physiological indicators obtained during testing are correlated.

58.2. Duration of tests. The desire to shorten the testing process should not be to the detriment of the goals and objectives of the test. Tests that are too short will not produce sufficiently distinguishable results and their discriminative capabilities will be small; tests that are too long activate thermoregulatory mechanisms to a greater extent, which interferes with the establishment of maximum aerobic performance. In the recommended testing procedure, each load level is maintained for 2 minutes. The average test time is from 10 to 16 minutes.

58.3. Indications for stopping the test. Testing should be stopped if:

pulse pressure drops steadily despite increased workload;

systolic blood pressure exceeds 240-250 mm Hg;

diastolic blood pressure rises above 125 mm Hg;

symptoms of malaise appear, such as increasing chest pain, severe shortness of breath, intermittent claudication;

clinical signs of anoxia appear: pallor or cyanosis of the face, dizziness, psychotic phenomena, failure to respond to stimuli;

the electrocardiogram shows paroxysmal supraventricular or ventricular arrhythmia, ventricular extrasystolic complexes arising before the end of the T wave, conduction disturbances other than mild A-V block, or a horizontal or descending S-T segment depression of more than 0.3 mV.
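The two purely numeric thresholds in the list above (blood pressure) can be expressed as a simple check; the lower bound of the 240-250 range is used here as the cutoff, which is an assumption. The clinical signs obviously require a physician, not code.

```python
# Sketch: the blood-pressure stopping criteria from 58.3. Only the
# numeric thresholds are encoded; 240 (the lower bound of the stated
# 240-250 range) is an assumed cutoff.
def must_stop(systolic_mmhg: float, diastolic_mmhg: float) -> bool:
    """True if either blood-pressure stopping criterion is met."""
    return systolic_mmhg > 240 or diastolic_mmhg > 125

print(must_stop(230, 120))  # prints False: within limits
print(must_stop(255, 120))  # prints True: systolic too high
```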

58.4. Precautions.

Health of the subject. Before being examined, the subject must undergo a medical examination and receive a certificate stating that he is healthy. It is highly advisable to do an electrocardiogram (at least one chest lead). For men over 40 years of age, an electrocardiogram is mandatory. Regularly repeated blood pressure measurements should be an integral part of the entire testing procedure. At the end of testing, subjects should be informed about measures to prevent dangerous accumulation of blood in the lower extremities.

Contraindications. The subject is not allowed to take tests in the following cases:

lack of permission from a doctor to take part in tests with maximum loads;

oral temperature exceeds 37.5°C;

heart rate after a long rest is above 100 beats/min;

obvious decline in cardiac activity;

a case of myocardial infarction or myocarditis in the last 3 months; symptoms and electrocardiogram readings indicating the presence of these diseases; signs of angina pectoris;

infectious diseases, including colds.

Menstruation is not a contraindication to taking the tests, though in some cases it is advisable to reschedule them.

B. STANDARD TESTS

59.0. Description of the basic methodology for conducting standard tests

In all three types of exercise, and regardless of whether the test is performed at a maximal or submaximal load, the basic testing procedure is the same.

The subject comes to the laboratory in light sportswear and soft shoes. For 2 hours before the start of the test he should not eat, drink coffee, or smoke.

Rest. The test is preceded by a 15-minute rest period. During this time, while the physiological measuring instruments are being attached, the subject sits comfortably in a chair.

Accommodation period. The very first test of any subject, like all repeated tests, will give reasonably reliable results if the main test is preceded by a short period of low-load exercise, the accommodation period. It lasts 3 minutes and serves the following purposes:

familiarize the subject with the equipment and type of work that he must perform;

preliminary study of the physiological response of the subject to a load of approximately 4 Mets, which corresponds to a heart rate of approximately 100 beats/min;

speed up the body’s adaptation to the actual test itself.

Rest. The accommodation period is followed by a short (2 min.) rest period; the subject sits comfortably in a chair while the experimenter makes the necessary technical preparations.

Test. At the beginning of the test, a load equal to that of the accommodation period is set, and the subject exercises without interruption until the test is completed. Every 2 min the workload is increased by 1 MET.

Testing stops when one of the following conditions occurs:

the subject is unable to continue performing the task;

there are signs of physiological decompensation (see 58.3);

data obtained at the last stage of the load allow the extrapolation of maximum aerobic performance based on sequential physiological measurements (performed during testing. - Editor's note).

59.5. Measurements. Maximum oxygen consumption in milliliters per kilogram per minute is measured directly or calculated. Methods for determining oxygen consumption are very diverse, as are the additional techniques used to analyze the physiological capabilities of each individual. This will be discussed in more detail later.

59.6. Recovery. At the end of the experiment, physiological observation continues for at least 3 minutes. The subject again rests in a chair, slightly raising his legs.

Note. The described testing technique provides comparable physiological data when the same sequence of increasing loads is used on the treadmill, bicycle ergometer, and step ergometer. Below, the testing methodology is described separately for each of the three devices.

60.0. Treadmill test

Equipment. Treadmill and necessary auxiliary equipment.

Description. The basic testing procedures described in 59.0 are carefully followed.

The speed of the treadmill with the subject walking on it is 80 m/min (4.8 km/h, or 3 mph). At this speed, the energy required to move horizontally is approximately 3 METs; each 2.5% increase in slope adds one unit of resting metabolic rate, i.e. 1 MET, to the energy expenditure. At the end of the first 2 min the inclination of the treadmill is quickly increased to 5%, at the end of the next 2 min to 7.5%, then to 10%, 12.5%, and so on. The complete scheme is given in Table 1.
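The incline arithmetic above can be sketched as a small calculation. This is a minimal, illustrative sketch: the function name is invented, and the formula simply restates the stated rule of 3 METs for level walking at 80 m/min plus 1 MET per 2.5% of slope.

```python
# Approximate MET cost of each treadmill stage, per the rule above:
# walking at 80 m/min costs about 3 METs, and each 2.5% of incline
# adds roughly 1 MET.

def stage_mets(grade_percent: float, base_mets: float = 3.0) -> float:
    """Approximate MET cost of walking at 80 m/min on a given incline."""
    return base_mets + grade_percent / 2.5

# The incline changes every 2 min: 5%, 7.5%, 10%, 12.5%, ...
for grade in (0.0, 5.0, 7.5, 10.0, 12.5):
    print(f"grade {grade:4.1f}% -> ~{stage_mets(grade):.1f} METs")
```

Each 2.5% step of incline thus corresponds to the 1-MET-per-2-min load increase described in the basic methodology.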


Fundamentals of test theory

1. Basic concepts of test theory
2. Test reliability and ways to determine it

Test questions

1. What is called a test?
2. What requirements must a test meet?
3. What tests are called authentic?
4. What is the reliability of a test?
5. List the causes of variation in results during repeated testing.
6. How does intraclass variation differ from interclass variation?
7. How can the reliability of a test be determined in practice?
8. What is the difference between test consistency and test stability?
9. What is the equivalence of tests?
10. What is a homogeneous set of tests?
11. What is a heterogeneous set of tests?
12. List ways to improve the reliability of tests.

A test is a measurement or trial carried out to determine a person's condition or abilities. Not every measurement can be used as a test, only those that meet special requirements:

1. standardization (the procedure and conditions of testing must be the same in all cases of using the test);
2. reliability;
3. informativeness;
4. availability of a rating system.

Test requirements:

  • Informativeness: the degree of accuracy with which the test measures the property (quality, ability, characteristic) it is used to evaluate.
  • Reliability: the degree of agreement of results when the same people are tested repeatedly under the same conditions.
  • Consistency: agreement of results when the test is administered by different people with the same instruments under the same conditions.
  • Standardized conditions: the same conditions for repeated measurements.
  • Availability of a rating system: translation of results into a grading scale (as at school: 5, 4, 3, ...).

Tests that meet the requirements of reliability and informativeness are called sound, or authentic (from Greek authentikos, genuine).

The process of conducting a test is called testing; the numerical value obtained from the measurement is the test result. For example, the 100 m run is a test, the procedure for conducting the races and timing them is testing, and the time of the run is the test result.

Tests based on motor tasks are called motor tests. Their results can be either motor achievements (time to cover a distance, number of repetitions, distance covered, etc.) or physiological and biochemical indicators.

Sometimes not one, but several tests are used that have a single final goal (for example, assessing the athlete’s condition during the competitive training period). Such a group of tests is called a set or battery of tests.

The same test, applied to the same subjects, should give identical results under the same conditions (unless the subjects themselves have changed). However, even with the most rigorous standardization and precise equipment, test results always vary somewhat. For example, a subject who has just shown a result of 215 kgf in a deadlift dynamometry test shows only 190 kgf on repetition.

Reliability of tests and ways to determine it

The reliability of a test is the degree of agreement of results when the same people (or other objects) are tested repeatedly under the same conditions.

Variation in test-retest results is called within-individual, or within-group, or within-class variation. Four main causes give rise to this variation:

1. Changes in the state of the subjects (fatigue, training effects, "learning", changes in motivation, concentration, etc.).
2. Uncontrolled changes in external conditions and equipment (temperature, wind, humidity, mains voltage, presence of unauthorized persons, etc.), i.e., everything covered by the term "random measurement error".

3. Changes in the state of the person administering or scoring the test (and, of course, the replacement of one experimenter or judge by another).
4. Imperfection of the test itself: some tests are inherently unreliable. For example, in free throws at a basketball hoop, even a player with a high shooting percentage may miss the first throws by chance.

The concept of a true test result is an abstraction: it cannot be measured experimentally, so indirect methods have to be used. The most preferable method of assessing reliability is analysis of variance followed by calculation of intraclass correlation coefficients. Analysis of variance makes it possible to decompose the experimentally recorded variation in test results into components determined by the influence of individual factors.

If we record the subjects' results in some test, repeating it on different days and making several attempts each day, periodically changing experimenters, then variation will arise:

a) from subject to subject;
b) from day to day;
c) from experimenter to experimenter;
d) from attempt to attempt.

Analysis of variance makes it possible to isolate and evaluate these sources of variation.

Thus, to assess the practical reliability of a test, it is necessary, first, to perform an analysis of variance and, second, to calculate the intraclass correlation coefficient (the reliability coefficient).
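A minimal sketch of this procedure, assuming the simplest one-way layout (subjects in rows, repeated attempts in columns). The data are invented for illustration, and real analyses distinguish several ICC variants; the one computed here is the basic one-way ICC(1,1).

```python
# One-way ANOVA on repeated test results, then the intraclass
# correlation coefficient ICC(1,1) as the reliability coefficient.

def icc_oneway(data):
    """ICC(1,1) for rows = subjects, columns = repeated attempts."""
    n = len(data)          # number of subjects
    k = len(data[0])       # attempts per subject
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    # Between-subjects and within-subjects mean squares
    ms_between = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(data, subj_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

scores = [          # hypothetical dynamometry results, kgf
    [215.0, 190.0],
    [180.0, 178.0],
    [240.0, 235.0],
]
print(f"reliability (ICC) = {icc_oneway(scores):.2f}")
```

The closer the coefficient is to 1, the smaller the within-class (test-retest) variation relative to the between-subject variation.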

When speaking about the reliability of tests, one must distinguish their stability (reproducibility), consistency, and equivalence. Test stability refers to the reproducibility of results when the test is repeated after a certain time under the same conditions; such repeated testing is usually called a retest. Test consistency is characterized by the independence of the results from the personal qualities of the person administering or scoring the test.

If all the tests included in a set are highly equivalent, the set is called homogeneous: the whole complex measures one property of human motor abilities (for example, a complex consisting of the standing long jump, vertical jump, and triple jump assesses the level of development of speed-strength qualities). If the complex contains no equivalent tests, that is, the tests included in it measure different properties, it is called heterogeneous (for example, a complex consisting of deadlift dynamometry, the Abalakov jump, and the 100 m run).

Test reliability can be improved to a certain extent by:

a) stricter standardization of testing;
b) increasing the number of attempts;
c) increasing the number of evaluators (judges, experimenters) and the consistency of their opinions;
d) increasing the number of equivalent tests;
e) better motivation of the subjects.
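The effect of point (b), increasing the number of attempts, can be estimated with the standard Spearman-Brown formula from test theory: if a single attempt has reliability r, the average of k attempts has reliability k*r / (1 + (k - 1)*r). The numbers below are illustrative.

```python
# Spearman-Brown prophecy formula: predicted reliability when a test
# is lengthened from 1 attempt to k attempts.

def spearman_brown(r: float, k: int) -> float:
    """Reliability of the mean of k attempts, given single-attempt reliability r."""
    return k * r / (1 + (k - 1) * r)

r_single = 0.70  # illustrative single-attempt reliability
for k in (1, 2, 4):
    print(f"{k} attempt(s): reliability ~ {spearman_brown(r_single, k):.2f}")
```

Doubling the number of attempts raises an initial reliability of 0.70 to roughly 0.82, with diminishing returns for further attempts.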

The applications, goals, and objectives of software testing are varied, so testing is evaluated and explained in different ways. Sometimes even testers themselves find it difficult to explain what software testing is "as such". Confusion ensues.

To untangle this confusion, Alexey Barantsev (a practitioner, trainer, and consultant in software testing, formerly of the Institute for System Programming of the Russian Academy of Sciences) opens his testing trainings with an introductory talk on the main principles of testing.

It seems to me that in this talk the lecturer managed to explain, most adequately and in a balanced way, what testing is from the point of view of a scientist and a programmer. It is strange that this text has not yet appeared on Habr.

I give here a condensed retelling of this report. At the end of the text there are links to the full version, as well as to the mentioned video.

Testing Basics

Dear colleagues,

First, let's try to understand what testing is NOT.

Testing is not development,

Even if testers know how to program, including programming their tests (test automation = programming), and can develop some auxiliary programs for themselves.

However, testing is not a software development activity.

Testing is not analysis,

And not the activity of collecting and analyzing requirements.

Although in the course of testing one sometimes has to clarify the requirements, and sometimes to analyze them, this activity is not the main one; it is done simply out of necessity.

Testing is not management,

Despite the fact that in many organizations there is such a role as “test manager”. Of course, testers need to be managed. But testing in itself is not management.

Testing is not technical writing,

However, testers have to document their tests and their work.

Testing cannot be considered any of these activities simply because in the course of development (or requirements analysis, or writing documentation for their tests) testers do all this work for themselves, not for someone else.

An activity is significant only when it is in demand, that is, testers must produce something “for export”. What do they do “for export”?

Defects, defect descriptions, or test reports? This is partly true.

But this is not the whole truth.

The main activity of testers

is to provide the participants in a software development project with negative feedback about the quality of the software product.

"Negative feedback" does not carry any negative connotation here; it does not mean that testers do something bad, or that they do something badly. It is just a technical term denoting a fairly simple thing.

But this thing is very significant, and probably the single most significant component of the activities of testers.

There is a science - “systems theory”. It defines the concept of “feedback”.

"Feedback" is data (or some part of the data) that is returned from the output of a system back to its input. Feedback can be positive or negative.

Both types of feedback are equally important.

In software development, positive feedback is, of course, information we receive from end users: requests for new functionality, growth in sales (if we release a quality product).

Negative feedback can also come from end users in the form of some negative reviews. Or it can come from testers.

The earlier negative feedback is provided, the less energy is needed to act on that signal. That is why testing needs to start as early as possible, at the earliest stages of the project, providing this feedback at the design stage and perhaps even earlier, at the stage of collecting and analyzing requirements.

By the way, this is where the understanding grows that testers are not responsible for quality. They help those who are responsible for it.

Synonyms for the term "testing"

From the point of view that testing is the provision of negative feedback, the world-famous abbreviation QA (Quality Assurance) is definitely NOT synonymous with the term “testing”.

Merely providing negative feedback cannot be considered quality assurance, because assurance implies positive measures: it presupposes that we ensure quality, taking timely steps so that the quality of software development improves.

But quality control (Quality Control) can be considered, in a broad sense, a synonym for the term "testing", because quality control is the provision of feedback in its most varied forms, at the various stages of a software project.

Sometimes testing is understood as just one particular form of quality control.

The confusion comes from the history of testing development. At different times, the term “testing” meant various actions that can be divided into 2 large classes: external and internal.

External definitions

The definitions that Myers, Beizer, and Kaner gave at different times describe testing precisely from the point of view of its EXTERNAL significance. That is, from their point of view, testing is an activity intended FOR something, rather than one that consists of something. All three of these definitions can be summarized as providing negative feedback.

Internal Definitions

These are definitions contained in standards on terminology used in software engineering, such as the de facto standard called SWEBOK.

Such definitions constructively explain WHAT the testing activity is, but give no idea of WHY testing is needed, or how the results of checking the correspondence between the program's actual and expected behavior will then be used.

testing is

  • checking program compliance with requirements,
  • carried out by observing its work
  • in special, artificially created situations, chosen in a certain way.
From here on we will consider this to be the working definition of “testing”.

The general testing scheme is approximately as follows:

  1. The tester receives the program and/or the requirements as input.
  2. He does something with them: observes the operation of the program in certain situations he has artificially created.
  3. As output, he obtains information about matches and mismatches.
  4. This information is then used to improve the existing program, or to change the requirements for a program still under development.

What is a test

  • This is a special, artificially created situation, chosen in a certain way,
  • and a description of what observations to make about the program's operation
  • to check whether it meets some requirement.
There is no need to assume that the situation is something momentary. The test can be quite long, for example, when testing performance, this artificially created situation can be a load on the system that continues for quite a long time. And the observations that need to be made are a set of different graphs or metrics that we measure during the execution of this test.

The test developer is engaged in selecting a limited set from a huge, potentially infinite set of tests.

Thus, we can conclude that the tester does two things during testing.

1. First, he controls the execution of the program, creating the artificial situations in which we are going to check the program's behavior.

2. Second, he observes the program's behavior and compares what he sees with what is expected.

If a tester automates the tests, he no longer observes the program's behavior himself: he delegates this task to a special tool or a special program that he himself wrote. The tool observes the behavior, compares it with the expected behavior, and gives the tester only the final result: whether the observed behavior matches the expected behavior or not.
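A minimal sketch of such an automated check. The program under test and its expected behavior are invented for illustration; the point is only the shape of the check: create a situation, observe, compare.

```python
# The "tool" runs the program in an artificially created situation
# (a fixed input), observes the output, and reports only whether the
# observed behavior matches the expected behavior.

def program_under_test(x: int) -> int:
    """Hypothetical program: is supposed to double its input."""
    return x * 2

def run_test(situation: int, expected: int) -> bool:
    observed = program_under_test(situation)  # observe the behavior
    return observed == expected               # compare with expectation

print(run_test(21, 42))  # the expected behavior occurs
print(run_test(0, 1))    # a deliberate mismatch
```

The tester sees only the final True/False verdict; the observation and comparison happen inside the tool.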

Any program is a mechanism for processing information: information in one form comes in, and information in some other form comes out. A program can have many inputs and outputs, and they can differ; that is, a program can have several different interfaces, and these interfaces can be of different types:

  • User Interface (UI)
  • Application Programming Interface (API)
  • Network protocol
  • File system
  • Environment state
  • Events
The most common interfaces are
  • user,
  • graphical,
  • text,
  • console,
  • and speech.
Using all these interfaces, the tester:
  • somehow creates artificial situations,
  • and checks how the program behaves in these situations.

This is testing.

Other classifications of testing types

The most commonly used division into three levels is
  1. unit testing,
  2. integration testing,
  3. system testing.
Unit testing usually means testing at a fairly low level, that is, testing individual operations, methods, and functions.
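A minimal illustration of this unit level: one individual function is checked in isolation, directly through its programming interface, with no user interface involved. The function under test is invented for illustration.

```python
# Unit testing: a single, isolated function is exercised on chosen
# inputs and its outputs are compared with the expected values.

def celsius_to_fahrenheit(c: float) -> float:
    """Unit under test: one individual function."""
    return c * 9 / 5 + 32

# Unit tests: check the function alone, at its own interface.
assert celsius_to_fahrenheit(0) == 32
assert celsius_to_fahrenheit(100) == 212
print("unit tests passed")
```

System-level tests of the same application would instead drive it through its user interface and look only at externally visible behavior.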

System testing refers to testing at the user interface level.

Some other terms are sometimes used, such as "component testing", but I prefer to single out these three levels. The technological division between unit and system testing does not make much sense: the same tools and the same techniques can be used at different levels. The division is conditional.

Practice shows that tools that are positioned by the manufacturer as unit testing tools can be used with equal success at the level of testing the entire application as a whole.

And tools that test the entire application at the user interface level sometimes need to look, for example, into the database, or to call some individual stored procedure there.

That is, from a technical point of view, the division into system and unit testing is, generally speaking, purely conditional.

The same tools are used, and this is normal, the same techniques are used, at each level we can talk about testing of a different type.

We combine:

That is, we can talk about unit testing of functionality.

We can talk about system testing of functionality.

We can talk about unit testing, for example, efficiency.

We can talk about system effectiveness testing.

Either we consider the effectiveness of a single algorithm, or we consider the effectiveness of the entire system as a whole. That is, the technological division into unit and system testing does not make much sense. Because the same tools, the same techniques can be used at different levels.

Finally, during integration testing we check whether the modules within a system interact with each other correctly. In fact, we perform the same tests as during system testing, only we additionally pay attention to how exactly the modules interact, performing some additional checks. That is the only difference.
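A minimal sketch of that additional attention to interaction. The two modules and their behavior are invented for illustration: each could pass its own unit tests, and the integration test checks that one module's output is actually usable as the other's input.

```python
# Integration testing: check that modules interact correctly, i.e.
# that module A's output format matches what module B expects.

def parse_record(line: str) -> dict:
    """Module A: parse a 'name:score' line into a record."""
    name, score = line.split(":")
    return {"name": name, "score": int(score)}

def best_record(records: list) -> dict:
    """Module B: pick the record with the highest score."""
    return max(records, key=lambda r: r["score"])

# Integration test: feed module A's output into module B and check
# the end-to-end result of their interaction.
lines = ["ann:7", "bob:9", "eve:5"]
result = best_record([parse_record(line) for line in lines])
assert result == {"name": "bob", "score": 9}
print("integration test passed")
```

A pure unit test would check `parse_record` and `best_record` separately; the integration test adds the check on the hand-off between them.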

Let us once again try to understand the difference between system and unit testing. Since this division occurs quite often, this difference should exist.

And this difference manifests itself when we perform not a technological classification, but a classification of testing by purpose.

Classification by purpose can conveniently be done using the "magic square", which was originally invented by Brian Marick and later improved by Ari Tennen.

In this magic square, all types of testing are located in four quadrants, depending on what the tests pay more attention to.

Vertically: the higher a type of testing sits in the square, the more attention is paid to the external manifestations of the program's behavior; the lower it sits, the more attention is paid to the program's internal, technological structure.

Horizontally: the further to the left our tests are, the more attention we pay to programming them; the further to the right, the more attention we pay to manual testing and human exploration of the program.

In particular, such terms as acceptance testing (Acceptance Testing) and unit testing can easily be placed in this square. Unit testing, in the sense in which the term is most often used in the literature, is low-level testing with a large, overwhelming share of programming: all the tests are programmed and executed completely automatically, and attention is paid primarily to the internal structure of the program, to its technological features.

In the upper right corner will be manual tests aimed at the external behavior of the program, in particular usability testing; and in the lower right corner, most likely, tests of various non-functional properties: performance, security, and so on.

So, based on the classification by purpose, unit testing is in the lower left quadrant, and all other quadrants are system testing.

Thank you for your attention.


