PCA (Principal Component Analysis) is a technique for simplifying the analysis of data sets. It decomposes the variance of multi-dimensional data to reduce its dimensionality, remove noise and redundancy, and reveal the most important elements and structures hidden in complex data. In the life sciences, proteomics and metabolomics studies typically involve observing many variables at once and collecting large amounts of data in order to find underlying patterns. Similarly, microbial community structure is affected by many factors, such as light, temperature, and humidity, and researchers often need to know whether the sample grouping of interest is related to a particular factor. In such cases, PCA ordination is commonly used to analyze the data visually.
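To make the idea of variance decomposition concrete, here is a minimal sketch of PCA in Python with NumPy. It assumes samples are rows and variables are columns, and it computes the components from a singular value decomposition (SVD) of the mean-centered data, which is equivalent to an eigendecomposition of the covariance matrix. The function name `pca` and its return values are illustrative choices, not the interface of any particular PCA tool.

```python
import numpy as np


def pca(X, n_components=2):
    """Minimal PCA sketch: project samples (rows) onto the top principal components.

    Returns (scores, components, explained_variance) for the retained components.
    """
    X = np.asarray(X, dtype=float)
    # Center each variable (column) so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix: Xc = U @ diag(S) @ Vt.
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance along each axis (sample variance, n - 1 denominator);
    # these equal the eigenvalues of the covariance matrix.
    explained_variance = S**2 / (X.shape[0] - 1)
    components = Vt[:n_components]
    # Scores: coordinates of each sample on the retained axes (the points in a PCA plot).
    scores = Xc @ components.T
    return scores, components, explained_variance[:n_components]


# Usage with random toy data: 20 samples, 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
scores, components, variance = pca(X, n_components=2)
print(scores.shape)  # (20, 2)
```

Each row of `scores` is one sample's position on PC1 and PC2; coloring those points by sample group produces the kind of ordination plot described in this article.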
Commonly used tools for PCA include the PCA module in GCTA, the Canoco software, smartpca in EIGENSOFT, and many R packages. After the analysis, the results are usually visualized with the ggplot2 package.
Fig 1. Principal component analysis (PCA) of phylum abundance data using Canoco 4.5. (Wang Y, et al. 2012)
Points of different colors or shapes in a PCA plot represent sample groups from different environments or conditions. The scales on the horizontal and vertical axes show relative distances between samples and have no direct physical meaning. Principal component 1 (PC1) and principal component 2 (PC2) represent the putative factors driving the differences in composition among the three sample groups; what these factors actually are must be interpreted in combination with the sample metadata. The contribution rates (proportion of total variance explained) of PC1 and PC2 are 57.7% and 15.5%, respectively.
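The contribution rate of a principal component is its share of the total variance. The sketch below shows that calculation with NumPy on a tiny hypothetical data set chosen so the answer can be checked by hand; the data behind Fig 1 are not available here, so it does not reproduce the 57.7% and 15.5% values. The function name `contribution_rates` is illustrative.

```python
import numpy as np


def contribution_rates(X):
    """Percentage of total variance explained by each principal component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Only singular values are needed; their squares are proportional to component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    variance = s**2
    return 100 * variance / variance.sum()


# Hypothetical toy data: the first variable spreads more than the second.
X = [[2, 0], [-2, 0], [0, 1], [0, -1]]
print(np.round(contribution_rates(X), 1))  # [80. 20.]
```

Here the centered data have squared singular values 8 and 2, so PC1 carries 8 / 10 = 80% of the variance and PC2 the remaining 20%.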
PCA helps solve three major problems in big data analysis:
It addresses the high-dimensionality problem by reducing the number of dimensions.
Dimensionality reduction removes redundant data while keeping the loss of feature information to a minimum.
The reduced data can be displayed visually, making it easier to interpret the useful information in big data.
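In practice, reducing dimensionality means deciding how many components to keep. One common rule of thumb, assumed here rather than taken from this article, is to keep the fewest components whose cumulative explained variance reaches a chosen threshold. The sketch below applies that rule with NumPy; the function name `reduce_dimensions` and the 90% default threshold are hypothetical choices.

```python
import numpy as np


def reduce_dimensions(X, threshold=0.90):
    """Keep the fewest principal components whose cumulative explained variance
    reaches `threshold` (a fraction between 0 and 1).

    Returns (scores, retained_fraction).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First position where the cumulative share meets the threshold, as a count;
    # capped at the number of components to guard against rounding just below 1.0.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    scores = Xc @ Vt[:k].T
    return scores, float(cumulative[k - 1])


# Hypothetical toy data: 4 samples, 2 variables.
X = [[2, 0], [-2, 0], [0, 1], [0, -1]]
scores, kept = reduce_dimensions(X, threshold=0.75)
print(scores.shape, round(kept, 2))  # (4, 1) 0.8
```

With a 75% threshold a single component suffices; the discarded component holds the remaining 20% of the variance, which is the information given up in exchange for the simpler representation.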
Microbial principal component analysis.
PCA is a very common method in bioinformatics analysis. If you need this kind of analysis but do not have the time to carry it out yourself, CD Genomics, as a bioinformatics service provider, can provide PCA services. You only need to provide the raw data, and we will deliver accurate analysis results along with high-quality figures. If you have any questions or other analysis needs, please feel free to contact us.