Bioinformatics and biological research rely on statistical analysis to extract insight from large datasets. Central to statistical inference in association studies is the p-value. As studies have grown more complex and multiple hypothesis testing has become routine, stricter adjustments have become necessary. This need led to the q-value, commonly described as an FDR-adjusted p-value.
A p-value is the probability of obtaining the observed results, or more extreme ones, under the assumption that the null hypothesis is true. In biological studies, researchers often compare groups or conditions to detect significant differences in gene expression, genotype frequencies, or other variables. The p-value quantifies the strength of evidence against the null hypothesis: the smaller it is, the less compatible the data are with the null.
For instance, in a differential gene expression analysis, a p-value of 0.01 means there is a 1% chance of observing a difference in expression at least as large as the one measured if no real difference exists between the groups. It is not the probability that the null hypothesis is true. Traditionally, a threshold of 0.05 is used to declare statistical significance, which means accepting a 5% chance of a false positive in a single test when the null hypothesis holds.
As multiple hypothesis testing has become pervasive, false positives have increased with it. Running many statistical tests at once inflates the probability of encountering at least one false positive; this is the multiple testing problem. With 100 independent tests at a threshold of 0.05, for example, about five false positives are expected even if no true effect exists.
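The inflation described above can be made concrete with a short calculation. The sketch below assumes the tests are independent and that every null hypothesis is true, in which case the probability of at least one false positive across m tests at level α is 1 - (1 - α)^m. The function name `family_wise_error_rate` is chosen here for illustration only.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m tests.

    Assumes the m tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    # Show how quickly the chance of a false positive grows with m.
    for m in (1, 10, 100):
        print(m, round(family_wise_error_rate(0.05, m), 3))
```

Under these assumptions, the chance of at least one false positive is already about 40% with 10 tests and exceeds 99% with 100 tests.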
To address this problem, researchers developed q-values, also known as FDR-adjusted p-values. The False Discovery Rate (FDR) approach controls the expected proportion of false discoveries among all the discoveries made, rather than the probability of making any false positive at all. A q-value threshold of 0.05, for example, means that about 5% of the results called significant are expected to be false positives.
Two widely used methods for adjusting p-values are the Bonferroni correction and the Benjamini and Hochberg method (FDR control). The Bonferroni correction is conservative: each raw p-value is multiplied by the number of tests performed (m) and capped at 1. The adjusted p-values (p') are then compared against the desired significance level α, which is equivalent to comparing the raw p-values against a threshold of α/m.
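A minimal Python sketch of the Bonferroni adjustment follows. Each raw p-value is multiplied by m and capped at 1, and the adjusted values are compared against α itself, which gives the same decisions as comparing the raw p-values against α/m. The function name `bonferroni_adjust` and the example p-values are illustrative assumptions, not taken from any dataset.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: min(1, p * m) for each raw p."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    alpha = 0.05
    raw = [0.001, 0.01, 0.02, 0.04]  # hypothetical raw p-values
    adjusted = bonferroni_adjust(raw)
    # Compare adjusted p-values against alpha (not alpha / m).
    for p, p_adj in zip(raw, adjusted):
        print(p, p_adj, p_adj <= alpha)
```

In this hypothetical example all four raw p-values fall below 0.05, but only the two smallest remain significant after adjustment.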
In contrast, the Benjamini and Hochberg method produces q-values by sorting the raw p-values in ascending order. Each ranked p-value is multiplied by m/k, where k is the position of that p-value in the sorted vector. The results are then made monotone, so that no q-value exceeds the q-value of the next higher rank, and capped at 1.
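The Benjamini and Hochberg adjustment can be sketched in plain Python as below. This is an illustrative implementation under the usual definition: the scaled value p * m / k is computed for each rank k, and a running minimum taken from the largest rank downward keeps the q-values monotone and at most 1. The function name `benjamini_hochberg` and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        scaled = p_values[idx] * m / rank
        running_min = min(running_min, scaled)
        q_values[idx] = running_min
    return q_values


if __name__ == "__main__":
    raw = [0.03, 0.001, 0.5, 0.031]  # hypothetical raw p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(p, round(q, 4))
```

In this example the p-value 0.03 has rank 2, so its scaled value is 0.06, but the monotonicity step lowers its q-value to that of rank 3 (about 0.0413). For real analyses, an established implementation such as R's `p.adjust(p, method = "BH")` is usually preferable to hand-written code.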
In contemporary biological studies, particularly high-throughput omics analyses such as genomics, transcriptomics, and proteomics, researchers often run thousands or even millions of statistical tests at once. In these settings, controlling the false discovery rate is essential to the reliability and reproducibility of findings.
The Bonferroni correction, though robust, can be overly conservative, raising the risk of false negatives and of overlooking genuine findings. The Benjamini and Hochberg method has greater statistical power and balances the trade-off between false positives and false negatives. By controlling the FDR, researchers can detect more true associations while keeping the expected share of false positives among reported findings at a chosen level.
Fig. 1. Comparison of the two multiple testing adjustment methods in a matrix plot. (Jafari M, et al., 2019)
In summary, p-values and q-values are indispensable statistical tools in biological research, particularly in association studies and omics analyses. P-values assess the evidence against the null hypothesis, but when many tests are run together they must be adjusted to account for the multiple testing problem. Q-values, or FDR-adjusted p-values, provide that adjustment for high-throughput data and allow researchers to report significant associations with greater reliability. As bioinformatics continues to advance, a sound understanding and correct implementation of q-value adjustments remain essential for meaningful discoveries about complex biological processes.
Reference