How to Create Publication-Quality UpSet Plots (R & Python Guide)
Master the art of UpSet plots. Complete guide with R (ComplexHeatmap) and Python (upsetplot) code, color palettes, and interpretation guides for complex dataset intersections.
Handling complex set intersections is a common challenge in data science and bioinformatics. When you have more than three sets, Venn diagrams transform from helpful visualizations into "hairballs" of illegible overlapping shapes.
The solution used by top researchers in Nature and Cell? The UpSet plot.
In this guide, we'll deconstruct publication-quality UpSet plots to understand why they work, and then provide the exact R and Python code you need to create them for your own research.
The Problem: The "Hairball" Effect
Venn diagrams represent sets as overlapping shapes, usually circles. This works well for A vs. B, and it's manageable for A vs. B vs. C. But add a fourth set, and the geometry breaks down: four circles cannot show all 15 possible intersections, so you need ellipses or irregular shapes, and the regions quickly become hard to read.
Researchers often try to force it, resulting in figures where:
- Intersection sizes are impossible to judge visually.
- Labels become cluttered and unreadable.
- The most important biological patterns (e.g., "genes shared by all treatment conditions") get lost in the center.
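A quick count shows why the fourth set is the tipping point. The sketch below uses a hypothetical helper, `count_regions`, which is not part of any plotting library. It enumerates every non-empty combination of n sets, which is the number of regions a Venn diagram would have to draw:

```python
from itertools import combinations


def count_regions(n_sets):
    """Count the non-empty membership combinations of n sets (equals 2**n - 1)."""
    return sum(
        1
        for k in range(1, n_sets + 1)
        for _ in combinations(range(n_sets), k)
    )


for n in (2, 3, 4, 5, 7):
    print(f"{n} sets -> {count_regions(n)} possible intersections")
```

For 2, 3, 4, 5, and 7 sets this gives 3, 7, 15, 31, and 127 regions, so the number of shapes to label roughly doubles with every set you add.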
The Solution: The Matrix Layout
UpSet plots (Lex et al., 2014) solve this by treating intersections as a matrix.
- Columns represent intersections.
- Rows represent the sets.
- Dots and Lines indicate which sets are part of an intersection.
- Bar Charts on top show the size of that specific intersection.
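This approach scales gracefully. The matrix grows by one row per set, and only the intersections that actually occur need a column, so the visualization stays crisp and readable well beyond the three or four sets a Venn diagram can handle.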
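To make the matrix layout concrete, here is a minimal pure-Python sketch that builds the same structure by hand. The toy sets and the helper names (`exclusive_intersections`, `render_matrix`) are made up for illustration; real plotting libraries do this step for you.

```python
from collections import Counter


def exclusive_intersections(sets):
    """Count elements per exact membership pattern (a frozenset of set names)."""
    membership = {}
    for name, members in sets.items():
        for item in members:
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


def render_matrix(names, counts):
    """Text version of an UpSet plot: sizes on top, one row per set, one column per intersection."""
    # Largest intersections first; ties broken by set names for a stable order
    columns = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    header = "size " + " ".join(str(size) for _, size in columns)
    rows = [
        f"{name:<4} " + " ".join("x" if name in combo else "." for combo, _ in columns)
        for name in names
    ]
    return "\n".join([header, *rows])


# Hypothetical toy data, chosen only to keep the output small
toy_sets = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 5, 6, 7}}
toy_counts = exclusive_intersections(toy_sets)
print(render_matrix(list(toy_sets), toy_counts))
```

Each element is counted in exactly one column (its exact membership pattern), which is why the column heights add up to the total number of distinct elements.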
Real-World Examples from Top Journals
Let's look at how top journals use UpSet plots to communicate complex multi-omics and clinical data.
1. Multi-Omics Integration (Nature Medicine)
Nature Medicine: Feasibility of multiomics tumor profiling
Why it works:
- Sorting: The intersections are sorted by size (descending), immediately highlighting the most common data availability scenarios.
- Clarity: It visualizes the overlap between 7 different technology platforms (scDNA, scRNA, etc.). A 7-way Venn diagram would be impossible.
- Context: The side bars show the total count for each technology, providing essential context for the intersection sizes.
2. Spatial Transcriptomics Benchmarking (Genome Biology)
Genome Biology: Benchmarking spatial transcriptomics technologies
Why it works:
- Comparison: Effectively compares gene detection across different sample preparations (FFPE vs. OCT).
- Sparsity: Shows that many genes are unique to specific conditions (the single-dot columns), a finding that might be obscured in a crowded Venn diagram.
3. Chromatin Loop Dynamics (BMC Biology)
BMC Biology: CTCF-anchored chromatin loop dynamics
Why it works:
- Cell State Analysis: Tracks chromatin loops across 5 distinct cell populations in spermatogenesis.
- Intersection Logic: Clearly separates "constitutive" loops (shared by all) from "stage-specific" loops (unique to one).
4. Pathway Enrichment Analysis (Cell Reports)
Cell Reports: Insulin hypersecretion pathway analysis
Why it works:
- Integration: Combines UpSet-style logic with enrichment statistics, showing how different gene modules (WGCNA) overlap with defined clusters.
- Color Coding: Uses color to map statistical significance or other metadata onto the bars, adding another layer of information.
How to Create UpSet Plots: Code Guide
Here is how you can recreate these publication-quality figures using industry-standard tools.
R: The Gold Standard (ComplexHeatmap)
In the R ecosystem, while UpSetR is popular, ComplexHeatmap is the superior choice for publication figures because it allows for extensive annotation and integration with other heatmaps.
```r
library(ComplexHeatmap)

# 1. Prepare your data (list of sets)
# Example: Genes found significantly mutated in different cancer types
genes_list = list(
  Breast = c("TP53", "PIK3CA", "GATA3", "MAP3K1", "KMT2C"),
  Lung   = c("TP53", "KRAS", "EGFR", "STK11", "KEAP1"),
  Colon  = c("APC", "TP53", "KRAS", "PIK3CA", "BRAF"),
  Kidney = c("VHL", "PBRM1", "SETD2", "BAP1", "KDM5C"),
  Ovary  = c("TP53", "BRCA1", "BRCA2", "NF1", "CDK12")
)

# 2. Convert to a combination matrix
m = make_comb_mat(genes_list)

# 3. Create the plot with publication styling
UpSet(m,
  # Sort by intersection size to emphasize main patterns
  comb_order = order(comb_size(m), decreasing = TRUE),
  # Customize standard aesthetics
  pt_size = unit(3, "mm"),
  lwd = 2,
  # Add annotations and styling
  top_annotation = HeatmapAnnotation(
    "Intersection Size" = anno_barplot(
      comb_size(m),
      gp = gpar(fill = "#2C3E50"), # Dark blue-grey (Nature style)
      height = unit(3, "cm")
    ),
    annotation_name_side = "left"
  ),
  right_annotation = rowAnnotation(
    "Set Size" = anno_barplot(
      set_size(m),
      gp = gpar(fill = "#E74C3C"), # Muted red
      width = unit(3, "cm")
    )
  )
)
```
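Pro Tip: Use comb_col to color the dots and lines of specific combinations in the matrix (e.g., to highlight the "all shared" intersection). The bars themselves are colored through the gp argument of anno_barplot.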
Python: The Versatile Option (upsetplot)
Python's upsetplot library allows for easy integration with pandas and matplotlib.
```python
import matplotlib.pyplot as plt
from upsetplot import UpSet, from_contents

# 1. Prepare data (dictionary of sets)
# Example: Shared users across different platforms
data_contents = {
    'Platform A': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'Platform B': ['u1', 'u2', 'u7', 'u8', 'u9', 'u10'],
    'Platform C': ['u1', 'u3', 'u7', 'u11', 'u12', 'u13'],
    'Platform D': ['u1', 'u4', 'u8', 'u12', 'u14', 'u15'],
}

# 2. Transform to a multi-indexed structure (one boolean index level per set)
data = from_contents(data_contents)

# 3. Create the plot
fig = plt.figure(figsize=(10, 6))
upset = UpSet(
    data,
    subset_size='count',
    show_counts=True,
    sort_by='cardinality',
    # Aesthetic tweaks
    facecolor='#2C3E50',  # Default color for bars and dots
    element_size=40,
    intersection_plot_elements=3,
)

# Custom color scheme: highlight the intersection shared by all four platforms
upset.style_subsets(
    present=['Platform A', 'Platform B', 'Platform C', 'Platform D'],
    facecolor='#E74C3C',
    label="Shared by All",
)

upset.plot(fig=fig)
# Figure-level title; plt.title would attach to whichever axes was drawn last
fig.suptitle("User Overlap Analysis", fontsize=16)
plt.show()
```
Styling & Color Palettes
For a professional look, avoid the default bright colors.
Nature / Science Style:
- Primary Bars: Dark Grey (#333333) or Navy Blue (#00468B).
- Sets: Deep Red (#ED0000) or Forest Green (#009900) for headers and measures.
- Background: Clean white; remove grid lines from the bar charts if they add clutter.
Colorblind Safe:
- Palette: Okabe-Ito or Viridis.
- Contrast: Ensure high contrast between the dots and the background matrix lines.
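As a starting point, the sketch below lists the eight Okabe-Ito colors and checks contrast against a white background. The contrast check uses the WCAG relative-luminance formula, which is my choice of heuristic here rather than something the journals prescribe. The helper names are hypothetical.

```python
# Okabe-Ito colorblind-safe palette
OKABE_ITO = {
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
    "black": "#000000",
}


def relative_luminance(hex_color):
    """WCAG relative luminance of an sRGB hex color such as '#2C3E50'."""
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


for name, hex_color in OKABE_ITO.items():
    print(f"{name:<15} {hex_color}  contrast on white: {contrast_ratio(hex_color, '#FFFFFF'):.1f}")
```

Colors with a low ratio against white, such as the palette's yellow, are better kept for fills with a dark outline than for matrix dots or thin connecting lines.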
Decision Framework: When to Use UpSet?
| Feature | Venn Diagram | UpSet Plot |
|---|---|---|
| Number of Sets | 2-3 (Perfect) | 4+ (Essential) |
| Quantitative Precision | Low (Area is hard to judge) | High (Bar charts are precise) |
| Empty Intersections | Hard to show "zero" overlap | Explicitly shows emptiness |
| Space Efficiency | Compact | Requires more width |
| Audience | General Public | Technical / Scientific |
Common Mistakes to Avoid
- Sorting by Degree: Don't just sort by the number of sets (1-way, 2-way...). Usually, sorting by intersection size (cardinality) tells the most interesting data story.
- Too Many Intersections: If you have 50 sparsely overlapping sets, you might get hundreds of columns. Use min_size or min_degree (in upsetplot, min_subset_size and min_degree) to filter out negligible intersections (e.g., n < 5).
- Ignoring Empty Intersections: Sometimes the fact that no intersection exists between Set A and Set B is the most important finding. Ensure your plot settings don't hide these if they are relevant.
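The three pitfalls above come down to how columns are filtered and ordered. The following sketch shows that logic in plain Python, assuming you already have a mapping from each intersection to its size. The function name, its parameters, and the counts are hypothetical and only mirror the kind of options plotting libraries expose.

```python
def select_intersections(counts, min_size=1, min_degree=1, sort_by="cardinality"):
    """Filter intersections by size and degree, then order the remaining columns."""
    kept = [
        (combo, size)
        for combo, size in counts.items()
        if size >= min_size and len(combo) >= min_degree
    ]
    if sort_by == "cardinality":
        # Largest intersections first
        return sorted(kept, key=lambda cs: (-cs[1], len(cs[0]), sorted(cs[0])))
    if sort_by == "degree":
        # 1-way, then 2-way, then 3-way intersections
        return sorted(kept, key=lambda cs: (len(cs[0]), -cs[1], sorted(cs[0])))
    raise ValueError(f"unknown sort_by: {sort_by!r}")


# Hypothetical intersection sizes, including one empty intersection (B and C only)
example_counts = {
    frozenset({"A"}): 120,
    frozenset({"B"}): 4,
    frozenset({"C"}): 2,
    frozenset({"A", "B"}): 45,
    frozenset({"B", "C"}): 0,
    frozenset({"A", "B", "C"}): 60,
}

for combo, size in select_intersections(example_counts, min_size=5):
    print(sorted(combo), size)
```

With min_size=5 the two tiny intersections and the empty one disappear. Setting min_size=0 keeps the empty B-and-C column, which is the setting to reach for when the absence of overlap is itself the finding.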
Ready to upgrade your figures? Start by exploring our curated collection of UpSet Plots to find design inspiration for your next publication.