Did You Know?
The psychology journal Basic and Applied Social Psychology (BASP) has banned p-values, along with the rest of null hypothesis significance testing, from the papers it publishes, on the grounds that they make it too easy to publish low-quality research.
The field of statistics infers the characteristics of a population by collecting and studying data about it. Because it is often impossible to study every member of a large population individually, the data are collected from a sample, i.e., a group of individuals who are assumed to represent the population.
Statistical techniques are used in a variety of fields, such as sociology, criminology, and epidemiology. One of the most important of these techniques is hypothesis testing, in which a claim about a population is tested against data from a sample.
Such tests are crucial in deciding which research papers get published in scientific journals and which drugs go on to earn billions of dollars in sales. Let us look at the p-value, a statistical measure of great significance in hypothesis testing.
Null and Alternative Hypotheses
In statistics, a claim about a population is examined using a representative sample through a process called 'hypothesis testing'.
The claim, called the 'hypothesis', is stated before the study is carried out, and the collected data are then used to decide whether there is enough evidence to reject it. Each question is studied using two hypotheses: the null hypothesis and the alternative hypothesis.
The null hypothesis states that there is no effect or relationship: the factor being studied makes no difference to the outcome. The alternative hypothesis states the opposite, that the factor does make a difference.
For instance, let us assume that the question to be studied is 'Does wearing glasses while flipping a coin affect the outcome (heads or tails)?' The null hypothesis for this would be 'wearing glasses does not affect the outcome of flipping a coin.'
The alternative hypothesis would be 'wearing glasses affects the outcome of flipping a coin.' A study tests the null hypothesis, and its aim is usually to find enough evidence to reject, or 'nullify', it.
In statistics, the p-value measures the evidence against the null hypothesis. It is the probability of obtaining results at least as extreme as those actually observed, assuming that the null hypothesis is true. It is not the probability that the results are due to chance; it is calculated on the assumption that chance alone is at work.
Put another way, the p-value tells us how surprising the observed data would be if the null hypothesis were true. A small p-value means that results this extreme would rarely occur by chance alone, which counts as evidence against the null hypothesis. Since it is a probability ('p'), its value always lies between 0 and 1.
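The definition can be made concrete with the coin-flipping example. The following is a minimal Python sketch of an exact two-sided binomial test, one standard way of computing a p-value when the data are counts of heads. It assumes hypothetical data (60 heads in 100 flips by someone wearing glasses), and the function names are illustrative rather than taken from any library. Under the null hypothesis the coin is fair, so the p-value adds up the probabilities of every head count at least as far from 50 as the one observed.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def coin_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (P(heads) = 0.5).

    Sums the null probabilities of every outcome at least as far from
    the expected count (flips / 2) as the observed count.
    """
    expected = flips / 2
    observed_distance = abs(heads - expected)
    return sum(
        binom_pmf(k, flips, 0.5)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )


if __name__ == "__main__":
    # Hypothetical data: 60 heads in 100 flips while wearing glasses.
    p = coin_p_value(60, 100)
    print(f"p-value = {p:.4f}")
```

For these made-up numbers the p-value is about 0.057: a fair coin would produce a result at least this lopsided in a little under 6% of such experiments.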
As mentioned before, the p-value is used to weigh the evidence against the null hypothesis, i.e., the claim that there is no effect. Before any study or experiment is performed, a particular 'significance level' (often written α) is fixed in advance, which is why this approach is sometimes called 'fixed-level testing'.
By convention, the most commonly used significance level is 5%, or 0.05. The data are then collected, and the p-value is calculated using statistical or spreadsheet software, or looked up in p-value tables.
P-values that are less than the significance level are said to be 'statistically significant', because they provide evidence against the null hypothesis. With a significance level of 0.05, this cut-off means that if the null hypothesis is true, a p-value this small would turn up in no more than 1 in 20 (5%) of studies.
Therefore, the test rejects the null hypothesis in this case: either a rare chance event has occurred, or, more plausibly, the null hypothesis is wrong. Rejecting it still carries a risk of error, and that risk is exactly what the 5% significance level allows for.
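The '1 in 20' guarantee can be checked exactly for the coin example. This minimal Python sketch makes illustrative assumptions (a fair-coin null hypothesis, 100 flips, hypothetical function names): it computes an exact two-sided binomial p-value for every possible number of heads, applies the fixed-level rule of rejecting the null hypothesis when p is below the significance level, and adds up how likely a genuinely fair coin is to be wrongly rejected.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def coin_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair."""
    expected = flips / 2
    distance = abs(heads - expected)
    return sum(binom_pmf(k, flips, 0.5) for k in range(flips + 1)
               if abs(k - expected) >= distance)


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Fixed-level decision rule: reject H0 when p < alpha."""
    return p_value < alpha


def false_rejection_rate(flips: int, alpha: float = 0.05) -> float:
    """Probability of rejecting H0 when the coin really is fair.

    Adds up the null probability of every head count whose p-value
    falls below alpha.
    """
    return sum(binom_pmf(k, flips, 0.5) for k in range(flips + 1)
               if reject_null(coin_p_value(k, flips), alpha))


if __name__ == "__main__":
    # Illustrative assumption: each study flips the coin 100 times.
    rate = false_rejection_rate(100)
    print(f"P(reject H0 | coin is fair) = {rate:.4f}")
```

Because the number of heads can only take whole-number values, the exact rate for 100 flips comes out somewhat below 5%; the significance level is an upper bound on how often a true null hypothesis is rejected, not an exact frequency.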
P-values that are larger than the significance level are said to be 'not statistically significant', and values above 0.1 (10%) are generally taken to provide little or no evidence against the null hypothesis. A large p-value means that the data are consistent with the null hypothesis, not that the null hypothesis is true: the study may simply have been too small to detect a real effect.
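Why a large p-value is not proof that the null hypothesis is true can be seen in a short calculation. The minimal Python sketch below assumes a hypothetical coin that truly lands heads 55% of the time, so the null hypothesis of a fair coin is false, and computes the exact probability that a study of a given number of flips still produces a p-value of 0.05 or more. The coin's bias, the sample sizes, and the function names are all illustrative assumptions.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fair_coin_p_values(flips: int) -> list[float]:
    """Exact two-sided p-value under H0 (fair coin) for every head count 0..flips."""
    null = [binom_pmf(k, flips, 0.5) for k in range(flips + 1)]
    expected = flips / 2
    return [
        sum(null[j] for j in range(flips + 1)
            if abs(j - expected) >= abs(k - expected))
        for k in range(flips + 1)
    ]


def prob_not_significant(true_p: float, flips: int, alpha: float = 0.05) -> float:
    """Chance that a study of `flips` tosses gives p >= alpha when the coin's
    real probability of heads is `true_p` (so H0 may well be false)."""
    p_values = fair_coin_p_values(flips)
    return sum(binom_pmf(k, flips, true_p)
               for k in range(flips + 1) if p_values[k] >= alpha)


if __name__ == "__main__":
    # Hypothetical biased coin: heads 55% of the time, so H0 is false.
    for flips in (20, 100, 500):
        miss = prob_not_significant(0.55, flips)
        print(f"{flips:4d} flips: P(not significant) = {miss:.3f}")
```

With only 20 flips, most studies of this biased coin come out 'not significant', and that chance shrinks as the sample grows. A non-significant result can therefore reflect too little data rather than a true null hypothesis.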
Consider an example in which earlier studies have led researchers to believe that a drug is ineffective in treating a disease. The null hypothesis is that the drug has no effect. Now suppose the drug is tested on a new sample from the population, the data are collected, and the p-value comes to 0.04.
Because 0.04 is below the 0.05 significance level, the result is statistically significant and the null hypothesis (that the drug is ineffective) is rejected: if the drug truly had no effect, results at least this extreme would be observed in only about 4% of such studies. This suggests that the earlier studies, which led researchers to believe the drug was ineffective, may have been affected by sampling error or mere chance, although the new result alone does not prove that the drug works.
Despite its importance in statistical analysis, the p-value has had its share of criticism. Researchers can choose their own significance level, such as 0.05 or 0.001, so a study that gives a p-value of 0.027 is statistically significant at the first level but not at the second.
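The dependence on the chosen threshold is simple arithmetic, but it can be spelled out in a few lines of Python. This sketch applies the fixed-level rule (significant when p is below the significance level) to the p-value of 0.027 from the text at both significance levels; the helper name `is_significant` is hypothetical.

```python
def is_significant(p_value: float, alpha: float) -> bool:
    """Fixed-level rule: a result is 'statistically significant' when p < alpha."""
    return p_value < alpha


if __name__ == "__main__":
    p = 0.027  # p-value taken from the example in the text
    for alpha in (0.05, 0.001):
        verdict = "significant" if is_significant(p, alpha) else "not significant"
        print(f"alpha = {alpha}: p = {p} is {verdict}")
```

The same data lead to opposite verdicts depending only on the threshold chosen, which is why reporting the p-value itself is more informative than reporting the verdict alone.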
The p-value is also commonly misinterpreted: a large p-value is taken to mean that the null hypothesis is true, when it only indicates that there is not enough evidence to reject it. Concerns such as these led the psychology journal BASP to ban p-values altogether.