Pie plots, also known as pie charts, are widely used in data visualization to represent proportional relationships within a dataset. Their intuitive structure makes them effective for comparing relative sizes of categories, particularly in market analysis, financial distributions, and scientific research. This article explores the advantages of pie plots, their interpretational aspects, and methods for creating them in R. It covers two approaches: base R's pie() function and the ggpie package for enhanced customization. While pie plots excel in simplicity, they are most effective for datasets with limited categories.
A pie plot is a circular statistical graphic that is divided into slices to illustrate numerical proportions. Each sector (slice) represents a category's contribution to the whole, and the slices together always add up to 100%. Pie plots are widely used in data visualization across various industries, including business intelligence, market research, and scientific data analysis. Their intuitive structure makes them a preferred choice for representing percentage distributions in an easily comprehensible manner.
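The relationship between a category's value, its percentage, and the angle of its slice can be illustrated with a small calculation. The sketch below uses made-up counts (the vector counts and its names are assumptions for illustration only) and base R's prop.table() function:

```r
# Hypothetical category counts, used only for illustration
counts <- c(A = 50, B = 30, C = 20)

# Share of each category in the whole (fractions that sum to 1)
shares <- prop.table(counts)

# Percentage and central angle (in degrees) of each slice
percent <- shares * 100
angle_deg <- shares * 360

print(data.frame(count = counts, percent = percent, angle_deg = angle_deg))
```

A category that makes up half of the total therefore occupies half of the circle, that is, a slice of 180 degrees.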
Pie plots are particularly useful when comparing the relative size of components within a dataset. Common applications include market share analysis, financial budget allocations, survey responses, and biological data distribution. However, their effectiveness diminishes when dealing with an excessive number of categories or when precise comparisons between slices are required.
In bioinformatics, a pie plot is a circular chart used to visualize the relative proportions of categorical biological data. Each slice represents a category (such as gene biotypes, genomic regions, or microbial taxa), and the size of each slice reflects its proportion of the total dataset (always summing to 100%).
Pie plots are frequently employed in genomics, transcriptomics, and metagenomics to provide a quick visual summary of data distributions.
While simple, pie plots offer an effective way to communicate categorical distributions in biological data, particularly when dealing with a small number of categories.
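As a rough illustration of such a summary, the sketch below tabulates a vector of gene biotype annotations and plots the result with base R. The annotations are invented toy data, not taken from any real genome, and the variable names are assumptions:

```r
# Hypothetical gene biotype annotations, invented for illustration
biotypes <- c(rep("protein_coding", 12), rep("lncRNA", 5), rep("pseudogene", 3))

# Count the genes in each category
biotype_counts <- table(biotypes)

# Label each slice with its category and percentage share
pct <- round(100 * prop.table(biotype_counts), 1)
slice_labels <- paste0(names(biotype_counts), " (", pct, "%)")

pie(biotype_counts, labels = slice_labels, main = "Gene biotypes (toy data)")
```

With real data, the biotypes vector would typically come from an annotation column rather than being typed in by hand.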
Despite their simplicity, pie plots offer numerous advantages when used appropriately. Key benefits include:
(1) Intuitive Visualization – The clear and straightforward design of pie plots allows viewers to quickly grasp proportional differences.
(2) Effective for Small Data Categories – When dealing with a limited number of groups (ideally fewer than six), pie plots provide an immediate snapshot of category proportions.
(3) Enhanced Communication of Percentage-Based Data – Pie plots are excellent for displaying percentage distributions, making them valuable in business reports and presentations.
(4) Customizability – Modern tools, including R, allow for advanced formatting, such as 3D effects, color gradients, and interactive features, to enhance data presentation.
(5) Widely Recognized Format – Pie plots are universally understood, reducing the need for extensive explanations when presenting data.
However, it is crucial to note the limitations of pie plots, such as difficulties in accurately comparing similar-sized segments and inefficiency in representing large datasets. Alternative visualization methods like bar plots or stacked charts may be preferable in such cases.
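The difficulty with similar-sized segments is easiest to see side by side. The sketch below draws the same made-up values (assumed for illustration) once as a pie plot and once as a sorted bar plot, using only base R graphics:

```r
# Hypothetical shares that are close in size (illustrative values only)
shares <- c(A = 21, B = 19, C = 22, D = 18, E = 20)

# Draw both charts next to each other for comparison
old_par <- par(mfrow = c(1, 2))
pie(shares, main = "Pie plot")
barplot(sort(shares, decreasing = TRUE), main = "Bar plot", ylab = "Share (%)")
par(old_par)  # restore the previous layout
```

In the bar plot the ranking of the categories can be read from the bar heights, whereas the pie slices look almost identical.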
Interpreting a pie plot correctly is essential for extracting meaningful insights. The following aspects should be considered:
Sector Size Proportionality – Each slice's size is directly proportional to its value relative to the whole dataset.
Labeling and Legends – Categories are often labeled directly on the chart or referenced through a color-coded legend.
Color Coding – Distinct colors or patterns help differentiate between categories, improving clarity.
Ordering of Sectors – Segments are usually arranged from largest to smallest in a clockwise or counterclockwise manner to facilitate quick comparison.
Annotations and Data Labels – Percentages or absolute values are often displayed to provide additional context.
A well-designed pie plot should be easy to interpret without requiring extensive explanation. Ambiguous segment differentiation, excessive categories, or inconsistent color schemes can reduce readability.
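One common way to deal with excessive categories is to keep the largest ones and merge the remainder into a single "Other" slice. The following is a minimal sketch of that idea with made-up counts; the cutoff n_keep is an arbitrary assumption, not a fixed rule:

```r
# Hypothetical counts with several small categories (illustrative values only)
counts <- c(A = 40, B = 25, C = 15, D = 8, E = 5, F = 4, G = 3)

# Keep the largest categories and merge the rest into "Other"
n_keep <- 4  # assumed cutoff
sorted <- sort(counts, decreasing = TRUE)
lumped <- c(sorted[seq_len(n_keep)], Other = sum(sorted[-seq_len(n_keep)]))

pie(lumped, main = "Top categories plus 'Other'")
```

The total is unchanged, so the remaining slices keep their true proportions.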
The distribution of exon numbers of exonic circular RNAs (Wang, C., et al., 2022).
So, how do we draw a pie plot in R? Here are a few examples. We will start with the diamonds dataset that ships with ggplot2.
The simplest way to create a pie plot in R is by using the built-in pie() function. This function allows for basic customization, such as defining colors and labels. However, it has limitations in terms of aesthetic flexibility and interactive elements.
First we need to install and load the ggplot2 package, which provides the diamonds dataset:
install.packages("ggplot2")
library(ggplot2)
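Next, we randomly sample 10 rows and visualize their prices using the pie() function.
# Fix the random seed so the same rows are drawn every time
set.seed(123)
data <- diamonds
# Randomly pick 10 rows of the dataset
data <- data[sample(rownames(data), 10), ]
# One slice per sampled diamond, sized by its price
pie(data$price)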
Pie plot of the 10 sampled diamond prices, drawn with pie().
Additionally, we can sort the data and set the slice labels and colors through the labels and col arguments of pie().
# Sort the sampled rows by price
data_sort <- data[order(data$price), ]
# Percentage of the total price for each slice, rounded to two decimals
label <- paste0(round(prop.table(data_sort$price) * 100, 2), "%")
# Base colors, interpolated below to one color per slice
colors <- c('#b4b0ef', '#fbfbfb', '#dda6dc', '#fcfbf8', '#f9efc1', '#f2f7f8', '#6dc8cc')
pie(data_sort$price,
    labels = label,
    col = colorRampPalette(colors)(length(data_sort$price)))
Pie plot of the sorted sample with percentage labels and a custom color palette.
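By default, pie() starts at the 3 o'clock position and draws slices counterclockwise. To arrange slices from largest to smallest in clockwise order, the clockwise argument of pie() can be used. The sketch below uses made-up values (an assumption for illustration) rather than the diamonds sample:

```r
# Hypothetical values, sorted from largest to smallest
values <- sort(c(A = 10, B = 45, C = 25, D = 20), decreasing = TRUE)

# clockwise = TRUE draws the slices clockwise; the starting angle then
# defaults to 90 degrees, i.e. the 12 o'clock position
pie(values, clockwise = TRUE, main = "Largest slice first, clockwise")
```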
For greater customization and a professional appearance, the ggplot2 package, combined with ggpie, is a more versatile option. The ggpie function offers enhanced aesthetics, better label placement, and increased customizability, making it suitable for publication-quality figures.
install.packages("ggpie")
library(ggpie)

# One group per sampled row, with its price as the count
data_use <- data.frame(group = rownames(data), count = data$price)
colors <- c('#b4b0ef', '#fbfbfb', '#dda6dc', '#fcfbf8', '#f9efc1', '#f2f7f8', '#6dc8cc')
data_use <- data_use[order(data_use$count), ]

# Note: with the CRAN ggpie package, pre-summarized data (columns 'group' and 'count')
# may additionally need count_type = 'full'; check ?ggpie for your installed version.
ggpie(
  data = data_use,
  fill_color = colorRampPalette(colors)(length(data_use$count)),
  label_info = 'ratio',     # label content
  label_color = '#2a3441',  # label text color
  label_type = 'horizon',   # label text style
  label_pos = 'in',         # label position
  label_threshold = 10,     # labels of sectors below this threshold are placed outside
  label_size = 3,           # label font size
  border_color = 'black',   # border color
  border_size = 0.7,        # border line thickness
  nudge_x = 0.5,            # horizontal offset of the outer labels
  nudge_y = 0.5             # vertical offset of the outer labels
) +
  ggtitle('ggpie plot')
Pie plot of the same sample drawn with ggpie.
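A pie plot can also be built with ggplot2 alone, by drawing a single stacked bar and wrapping it around a circle with coord_polar(). The following is a minimal sketch with a made-up data frame (df and its values are assumptions for illustration); the plotting part is guarded with requireNamespace() so that the snippet still runs where ggplot2 is not installed:

```r
# Hypothetical summarized data, used only for illustration
df <- data.frame(group = c("A", "B", "C"), count = c(50, 30, 20))

# Percentage label for each slice
df$label <- paste0(round(100 * df$count / sum(df$count)), "%")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = "", y = count, fill = group)) +
    geom_col(width = 1, color = "white") +   # one stacked bar
    coord_polar(theta = "y") +               # map bar height to angle
    geom_text(aes(label = label),
              position = position_stack(vjust = 0.5)) +  # center labels in slices
    theme_void() +                           # drop axes and grid
    ggtitle("Pie plot with ggplot2")
  print(p)
}
```

This route takes more code than pie() or ggpie, but every layer can be adjusted with the usual ggplot2 tools.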
Pie plots serve as an essential tool for proportional data visualization, offering a quick and intuitive method to interpret percentage distributions. While they excel in simplicity and clarity, their effectiveness is maximized when applied to datasets with limited categories.
The choice of visualization tool should depend on the complexity of the dataset and the intended audience. When precise comparisons are required, alternatives such as bar charts may be more appropriate. However, when presenting an overview of proportional relationships, pie plots remain a valuable asset in data-driven decision-making.
R provides multiple methods for constructing pie plots, from simple base R functions to sophisticated ggplot2 approaches. Selecting the appropriate method depends on the level of customization, interactivity, and presentation quality required. By leveraging these powerful tools, professionals across various disciplines can effectively communicate data insights through visually compelling pie plots.
Reference
Wang, C., Liu, WR., Tan, S. et al. Characterization of distinct circular RNA signatures in solid tumors. Mol Cancer 21, 63 (2022). https://doi.org/10.1186/s12943-022-01546-4