A summary of comparative analysis methods of univariate data differences
Eleven forms of contingency table
Summary of correlation analysis methods of binary (multivariate) data
Second, common statistical problems in measurement data analysis
2.1 Ignoring the preconditions of the t-test
Title: Clinical study of severe acute pancreatitis complicated with hepatic insufficiency. The experimental data are shown in Table 5. The original author analyzed these data with the t-test. Question: Is this correct?
Analysis:
1. A homogeneity-of-variance test on the data in Table 5 shows that serum amylase and creatinine in the two groups do not meet the requirement of homogeneity of variance, so the t-test cannot be applied directly.
The correct approach: apply a variable transformation so that the data follow a normal distribution with homogeneous variances, and then carry out the t-test; if no transformation achieves this, use a nonparametric test.
2. Report the exact test statistics and P values.
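The effect of a variable transformation on variance homogeneity can be illustrated with a minimal sketch. The two groups below are hypothetical values, not the data of Table 5, and the variance ratio is only a descriptive check; a formal homogeneity test would still be needed before choosing between the t-test and a nonparametric test.

```python
from math import log
from statistics import variance

# Hypothetical, right-skewed measurements for two groups (NOT the Table 5 data).
group_a = [10.0, 20.0, 40.0, 80.0]
group_b = [100.0, 200.0, 400.0, 800.0]


def variance_ratio(x, y):
    """Larger sample variance divided by the smaller one (descriptive only)."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)


# Ratio on the raw scale, then after a logarithmic transformation.
ratio_raw = variance_ratio(group_a, group_b)
ratio_log = variance_ratio([log(v) for v in group_a], [log(v) for v in group_b])

print(f"variance ratio, raw scale: {ratio_raw:.2f}")
print(f"variance ratio, log scale: {ratio_log:.2f}")
```

In this constructed example the raw variances differ greatly while the log-transformed variances are essentially equal, which is the situation in which the transformed data could reasonably go on to a t-test.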
2.2 Misusing the group-design t-test to analyze paired-design data
Analysis:
1. Considering a data transformation was correct. However, the group t-test requires the two population variances to be homogeneous. Moreover, this study uses a paired design, and analyzing it with the group t-test reduces the power of the test. The paired t-test should be used instead. Note its precondition: the differences (d) within each pair must be tested for normality. If they are not normally distributed, use the Wilcoxon signed rank test.
2. Exact test statistics and P values should be reported.
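Why the group t-test loses power on paired data can be seen in a small sketch. The before and after values are hypothetical, and only the two t statistics are computed; P values and the normality check on the differences are left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical before/after values for the same five subjects (illustration only).
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 13.5, 15.0, 17.5, 19.0]


def paired_t(x, y):
    """Paired t statistic: mean of the differences over its standard error."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def group_t(x, y):
    """Pooled-variance t statistic that treats x and y as independent groups."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / sqrt(pooled * (1 / nx + 1 / ny))


t_paired = paired_t(before, after)
t_group = group_t(before, after)
print(f"paired t = {t_paired:.2f}, group t = {t_group:.2f}")
```

Because the subjects differ widely from one another but change consistently, the paired statistic is large while the group statistic is small: the between-subject variation that pairing removes is left in the error term of the group t-test.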
2.3 Ignoring the preconditions of ANOVA
Topic: Signal transduction mechanism of curcumin inhibiting the proliferation of lens epithelial cells.
Question: The author digested healthy calf lenses with a mixed digestion solution, collected the cells, subcultured them, and used third-generation cells for the experiments.
The experiment was divided into three groups: blank control group, model group and curcumin group with 6 samples in each group.
Question: Is the analysis of variance correct?
Analysis: The experiment has three groups, so the data are quantitative data from a single-factor, three-level design. First test independence, normality and homogeneity of variance. If all three preconditions of analysis of variance are met, carry out the analysis of variance. If not, use a variable transformation or the rank sum test. If P
Report the exact test statistics and P values.
2.4 Misusing the t-test to analyze graded (ordinal) data
Title: Retention enema with Zhitong Rushen Decoction for CNUP: A double-blind randomized placebo-controlled trial.
Table 4 Main clinical symptoms and colonoscopy score of intestinal mucosal lesions in two groups
Compared with the control group (G2), there is no significant difference between the two groups (P > 0.05). ※: △: Compared with before treatment, P
Analysis:
The t-test is not suitable for comparing abdominal pain, diarrhea, purulent bloody stool, bearing-down sensation, congestion, edema, mucosal erosion and mucosal ulcer between groups. Each score takes discrete values such as 1, 2 and 3, so the data do not follow a normal distribution. The data should be arranged as ordinal data in which the grouping is unordered and the outcome is ordered, and the rank sum test should be used.
When analyzing the "total score", test the data for normality and homogeneity of variance first, and then choose the t-test or the rank sum test accordingly.
Compared with the control group (G2), P > 0.05, so the difference between the two groups is not statistically significant and there is no need to mark it in the table notes.
Exact test statistics and P values should be reported.
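For graded scores such as these, the rank sum comparison depends only on the ordering of the values. A minimal sketch of the Mann-Whitney U statistic (the two-sample form of the rank sum test) is given here with hypothetical symptom scores; the P value is not computed and would come from a statistics package.

```python
def mann_whitney_u(x, y):
    """U statistic for x: pairs with x > y count 1, tied pairs count 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


# Hypothetical graded scores (1, 2, 3) for two groups; not the data of Table 4.
treatment = [1, 1, 2, 2, 2, 3]
control = [2, 2, 3, 3, 3, 3]

u_treatment = mann_whitney_u(treatment, control)
u_control = mann_whitney_u(control, treatment)
print(f"U(treatment) = {u_treatment}, U(control) = {u_control}")
```

The two U values always add up to the number of pairs (here 6 × 6 = 36), and a strongly unbalanced split suggests that one group tends to have higher grades.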
2.5 Misusing the t-test to analyze data from a two-factor design with repeated measures
Topic: Yiqi Huoxue method to prevent deep venous thrombosis of lower limbs after hip surgery in the elderly
Statistical processing: SPSS 10.0 statistical software was used, and t test was used to compare the measurement data between the two groups.
Analysis: The design of this study is a two-factor design (treatment and time) with repeated measures.
Provided the data meet the requirements of independence, normality and homogeneity of variance, the repeated-measures analysis of variance for a two-factor design should be chosen, together with a test of sphericity; the t-test should not be used to analyze these data.
Objective: The effect of compound Sophora flavescens injection on T lymphocyte subsets in patients with malignant tumor after gamma knife radiotherapy.
DESIGN: Sixty patients with malignant tumors were randomly divided into two groups. During gamma knife radiotherapy, the experimental group also received compound Sophora flavescens injection 20 ml with physiological saline 500 ml by intravenous drip once a day, with 10 days as one course of treatment; the control group received gamma knife radiotherapy only. The test results are shown in Table 7.
Question: Is it correct to choose the t-test for statistical analysis?
Analysis:
1. Error in statistical analysis
In this experiment the same index was measured on each subject at two time points, before and after the treatment, so the before and after data are not independent of each other. This is a repeated-measures design, and time is the factor associated with the repeated measurements. By making separate t-test comparisons, the original author split up the overall design and could not accurately estimate and control the error, so no reliable conclusion can be drawn.
Correct practice: Table 7 should be recast in the standard layout for a repeated-measures design, and the corresponding analysis of variance should be used to process the data.
2.6 Misusing the paired-design t-test to analyze data from a single-factor, k-level (k > 3) design
Original title: "Study on the effect of Rhizoma Curcumae on the myoelectric activity of uterus in rats in vivo and its mechanism". To observe the effect of zedoary turmeric decoction on the uterine myoelectric activity of pregnant rats, 40 rats were randomly divided into 4 groups: control group: saline 65438 00 ml/kg, zedoary turmeric group: 25%, 50% and 65438 00 ml/kg respectively. The peak area, duration and number of EMG bursts in the uterus of each rat were recorded. The original author processed the data with the t-test for paired-design quantitative data; the data are shown in Table 4.
Effect of Biaozu Decoction on Myoelectric Activity of Uterus in Rats (mean ± standard deviation)
Analysis:
The paper does not state that the rats were matched on important non-experimental factors such as body weight before being randomized, so there is no basis for treating the data as a paired design.
The data involve four dose levels, so they are quantitative data from a single-factor, four-level design, and neither the group-design t-test nor the paired-design t-test is applicable.
Correct approach: provided the data meet normality and homogeneity of variance, use the analysis of variance for quantitative data from a single-factor, four-level design; if the result is statistically significant, follow up with the Dunnett test or the LSD test.
If the subject matter requires the three indicators to be examined simultaneously, the three-variable (multivariate) analysis of variance for this design should be chosen instead.
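The F statistic of a single-factor, four-level analysis of variance can be computed by hand, as in this minimal sketch. The four groups are hypothetical values chosen for easy arithmetic; the precondition checks, the P value and the Dunnett or LSD follow-up are not covered.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a single-factor design: between-group MS / within-group MS."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical measurements for four dose levels (not the data of Table 4).
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [4.0, 5.0, 6.0],
]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with df = (3, 8)")
```

A single F test covers all four levels at once, which is what repeated pairwise t-tests fail to do.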
Third, common problems in the analysis of count data
3.1 The denominator is too small when calculating relative numbers
Title: Experimental study on the prevention and treatment of cholesterol gallstone formation with traditional Chinese medicine for soothing liver and benefiting gallbladder. The experimental data are shown in Table 4. Question: What is wrong with how the data are presented?
Analysis:
When the denominator is too small, a relative number is unstable and easily distorted; it fails to reflect the real situation and often gives a misleading impression.
In Table 4 each group contains fewer than 20 samples, which is too few for calculating rates. The counts should simply be reported directly.
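One way to see this instability is to compare confidence intervals for the same observed rate at different sample sizes. The sketch below uses the Wilson score interval, which is an illustrative choice rather than a method named in the original, and the counts are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# The same observed rate of 30% with a small and a large denominator (hypothetical).
small = wilson_interval(3, 10)
large = wilson_interval(60, 200)

print(f"3/10:   {small[0]:.2f} to {small[1]:.2f}")
print(f"60/200: {large[0]:.2f} to {large[1]:.2f}")
```

With 10 subjects the interval is several times wider than with 200, and a single additional case would shift the observed rate by 10 percentage points.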
3.2 Misusing the χ2 test to analyze data whose outcome variable is ordered
A doctor treated 240 cases of a disease with drugs A and B, and the curative effect was divided into four grades: cured, markedly effective, improved and ineffective, as shown in Table 4. The χ2 test for the R×C table gave χ2 = 53.33, P
Analysis:
These data form an ordered R×C table in which clinical efficacy is graded. For graded data, Ridit analysis or the rank sum test can be used. The χ2 test for an R×C table is not suitable here: it can only test whether the composition of the two groups, that is, their frequency distribution, is the same, and it cannot test whether the curative effect differs. This is easy to verify: if any two columns of figures in Table 4 are exchanged, the χ2 value is still 53.33.
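The invariance of χ2 to the order of the columns can be checked directly. The sketch below computes the Pearson χ2 statistic by hand for a hypothetical 2×4 table of 240 cases (these counts are invented for illustration and are not those of Table 4) and again after exchanging the "cured" and "ineffective" columns.

```python
from math import isclose


def chi_square(table):
    """Pearson chi-square statistic for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Columns: cured, markedly effective, improved, ineffective (hypothetical counts).
table = [
    [50, 40, 20, 10],  # drug A
    [20, 30, 40, 30],  # drug B
]
# Exchange the first and last columns, which destroys the ordering of the grades.
swapped = [[row[3], row[1], row[2], row[0]] for row in table]

chi_original = chi_square(table)
chi_swapped = chi_square(swapped)
print(f"original: {chi_original:.2f}, columns exchanged: {chi_swapped:.2f}")
print("same statistic:", isclose(chi_original, chi_swapped))
```

Since the statistic ignores which grade is better than which, it cannot answer a question about the direction of the curative effect.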
3.3 Misusing the χ2 test to answer questions of correlation
Table 1 shows the distribution of coronary atherosclerosis at different ages.
The above data is tested by χ2: χ 2 = 163.05438+0, P < 0.005. The conclusion is that the degree of coronary atherosclerosis is related to age. Combined with this data, it can be seen that the degree of coronary atherosclerosis tends to increase with age.
Q: What's wrong with the statistical analysis methods and conclusions used to process these data?
Analysis 1:
These data form a two-way ordered two-dimensional contingency table whose two variables have different attributes. Such data can be analyzed for three different purposes, and each purpose has its own statistical method.
To test whether the degree of coronary atherosclerosis differs among age groups: treat the table as one-way ordered data and use the rank sum test.
To test whether age and the grade of coronary atherosclerosis are correlated: use rank correlation analysis.
To test whether the relationship between them follows a linear trend: use the linear trend test.
The author wants to investigate whether the two ordered variables are correlated, and the result of the χ2 test is P.
In fact, exchanging the frequencies of any two rows or any two columns of the table leaves the χ2 statistic unchanged, which shows that the χ2 test is not suitable for a two-dimensional contingency table formed by ordered variables.
Analysis 2:
To investigate whether two ordered variables are correlated, a correlation method suited to qualitative data should be chosen, such as Spearman rank correlation analysis, Kendall rank correlation analysis or canonical correlation analysis.
Here Spearman rank correlation analysis gives rs = 0.53215, P.
The conclusion is that the two ordered variables in the table are positively correlated, that is, the degree of coronary atherosclerosis increases with age, and the correlation is statistically significant.
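A Spearman rank correlation can be computed from a two-way ordered contingency table by expanding each cell into paired observations and correlating their average ranks. The sketch below assumes a hypothetical 4×4 table (rows as ordered age groups, columns as ordered grades); the counts are invented and do not reproduce Table 1 or the value rs = 0.53215.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_from_table(table):
    """Expand cell counts into (row, column) pairs and correlate their ranks."""
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    return pearson(average_ranks(rows), average_ranks(cols))


# Hypothetical counts: rows are ordered age groups, columns are ordered grades.
table = [
    [30, 10, 5, 2],
    [15, 20, 10, 5],
    [5, 10, 20, 15],
    [2, 5, 10, 30],
]
rs = spearman_from_table(table)
# Exchanging the first and last columns changes rs, unlike the chi-square statistic.
rs_swapped = spearman_from_table([[r[3], r[1], r[2], r[0]] for r in table])
print(f"rs = {rs:.3f}, after exchanging two columns rs = {rs_swapped:.3f}")
```

Unlike χ2, the coefficient changes sign when the column order is disturbed, because it uses the ordering of both variables.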
3.4 High-dimensional contingency table data of multi-valued ordered variables
3.5 Failing to apply the continuity correction when the conditions for the uncorrected χ2 test are not met
3.6 Eleven forms of contingency table
3.7 Misusing the χ2 test in place of Fisher's exact test
3.8 Making pairwise comparisons by directly splitting the R×C table
Fourth, how statistical analysis methods are reported
(1) The statistical methods section states that "SPSS software is used for statistical processing". Is this statement correct?
Analysis:
This statement tells us only which statistical software the original author used to process the data; it does not give the version or serial number of the software, nor does it indicate the type of experimental design or the specific statistical analysis methods used in the paper.
(2) The statistical methods section states that "analysis of variance is used for measurement data". Is this statement correct?
Analysis:
From this statement we learn only that the author used analysis of variance to process the quantitative data; there is no way to judge whether the method was chosen correctly.
Generally speaking, three kinds of t-test and ten kinds of analysis of variance can be used to compare differences between means. What essentially distinguishes them is the type of experimental design that produced the quantitative data.
When describing statistical methods, the method should be named in full, that is, the name of the statistical method should be qualified by the type of experimental design, for example the t-test for quantitative data from a paired design, the t-test for quantitative data from a group (single-factor, two-level) design, or the analysis of variance for quantitative data from a two-factor factorial design.
(3) "All qualitative data are tested by the χ2 test." Is this correct?
Analysis:
In fact, qualitative data can usually be arranged into one of 11 forms of contingency table. The statistical method should be chosen according to the form of the contingency table, the purpose of the analysis and the preconditions the data actually satisfy. It should not be chosen blindly, let alone by treating the χ2 test as a general-purpose tool for qualitative data.
(4) In many papers, when hypothesis tests are reported, the P value is given only as P > 0.05 or P
The correct way to report the result of a hypothesis test is:
Report the descriptive statistics (such as the sample mean, rate, correlation coefficient, regression coefficient, relative risk or median effect) with their confidence intervals, the test statistic (such as χ2, t, u or F) and the P value; then draw the statistical inference from the size of the P value and state the corresponding medical conclusion.
For example:
Pairwise comparison of rates among multiple groups using SPSS
Pearson chi-square test
When the data are entered in SPSS as a frequency table rather than as raw records, the cases must first be weighted by the frequency variable (so that the software reads each row as that many cases) before the chi-square test is run.
Conditions: (1) the Pearson chi-square test requires the total number of cases to be greater than 40; (2) no cell should have an expected count below 5. In this example the output reports that 0 cells (0.0%) have an expected count less than 5 and that the minimum expected count is 15.25.
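The two conditions stated above can be checked from the table itself before the Pearson chi-square test is run. The following is a minimal sketch with hypothetical tables; the function names are illustrative and are not part of SPSS.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_conditions(table):
    """Check the two conditions stated above: n > 40 and no expected count < 5."""
    cells = [e for row in expected_counts(table) for e in row]
    n = sum(sum(row) for row in table)
    below_5 = sum(1 for e in cells if e < 5)
    return {
        "n": n,
        "min_expected": min(cells),
        "cells_below_5": below_5,
        "ok": n > 40 and below_5 == 0,
    }


# Hypothetical tables: three groups by (effective, ineffective), and a sparse 2 x 2 table.
adequate = pearson_conditions([[20, 30], [25, 25], [35, 15]])
sparse = pearson_conditions([[3, 1], [2, 4]])
print(adequate)
print(sparse)
```

When the check fails, as in the sparse table, the uncorrected Pearson statistic should not be relied on, and a corrected or exact method is the usual alternative.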
Data collection comes from Baidu Library.