Data Upload Overview

Last updated: June 24, 2026

Written by Andrew Goodspeed, PhD

A guide that summarizes the data formats accepted by Pluto for each experiment type

Every Pluto experiment requires two uploads: assay data (the measurements) and sample data (the metadata describing each sample). This article covers the following experiment types:

(bulk) RNA-seq
CUT&RUN / ChIP-seq
ATAC-seq
Proteomics
scRNA-seq
Metabolomics
Spatial transcriptomics

RNA-seq

RNA-seq experiments accept either FASTQ files (raw sequencing reads) or a processed data table (CSV, comma-separated). If uploading FASTQ files, the sample data must include r1_fastq and r2_fastq columns. If uploading processed data, choose from the accepted units below.

Accepted processed data units:

Raw counts (integer)
Expected counts (continuous)
Transcripts per million (TPM)
Reads per kilobase per million (RPKM)
Fragments per kilobase per million (FPKM)

NOTE: if uploading processed data, only raw counts and expected counts support differential expression analysis.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data file is a gene-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be gene_symbol. Because the pathway analysis modules require common gene names (e.g., ACTB, TP53), upload common gene names where possible. However, if at least 90% of the values in the first column are Ensembl IDs, Pluto automatically converts them to common gene names. If the conversion produces duplicates, only the most highly expressed entry is retained. IDs that cannot be matched or that have no gene symbol mapping are kept as-is.
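The 90% rule above can be estimated before upload. The following is a minimal sketch, not Pluto's implementation: the function name is hypothetical, and the pattern assumes Ensembl gene IDs of the form ENSG00000075624 or ENSMUSG00000029580, optionally followed by a version suffix such as .12.

```python
import re

# Assumed shape of an Ensembl gene ID: "ENS", optional species letters, "G",
# 11 digits, and an optional ".version" suffix. This is an assumption, not Pluto's rule.
ENSEMBL_GENE_ID = re.compile(r"ENS[A-Z]*G\d{11}(\.\d+)?")


def ensembl_fraction(first_column):
    """Return the fraction of non-empty values that look like Ensembl gene IDs."""
    values = [value.strip() for value in first_column if value.strip()]
    if not values:
        return 0.0
    hits = sum(1 for value in values if ENSEMBL_GENE_ID.fullmatch(value))
    return hits / len(values)


# Two Ensembl IDs and one common gene name in the first column.
ids = ["ENSG00000075624", "ENSG00000141510", "ACTB"]
fraction = ensembl_fraction(ids)
print(f"{fraction:.0%} look like Ensembl IDs; meets the 90% threshold: {fraction >= 0.9}")
```

A file that mixes identifier types and falls below the threshold would be left unconverted, so it is usually simpler to upload a first column that is entirely common gene names or entirely Ensembl IDs.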

Each subsequent column is named with the exact sample ID listed in the sample_id column of the sample data file. The two examples below show how the values in the sample data’s sample_id column relate to the column names in the assay data.

Column	Type	Requirement
`gene_symbol`	string	Required — first column; use common gene names or Ensembl IDs
`<sample_id>`	numeric	One column per sample; must match `sample_id` values in sample data exactly

Example:

gene_symbol	ctrl_fem_1	ctrl_fem_3	cyp_fem_6	cyp_male_5
Acsbg3	106	319	123	85
Acsf2	1267	1586	1247	1194
Acsl1	1497	1931	1662	1343
Acta1	8757	23898	13231	14879

SAMPLE DATA

The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column. Additionally, sample_id must start with a letter and contain only alphanumeric characters and underscores.

Column	Requirement
`sample_id`	Required — first column; alphanumeric, starts with a letter, underscores only
`r1_fastq`	Required if uploading FASTQ files — filename of the R1 FASTQ
`r2_fastq`	Required if uploading paired-end FASTQ files — filename of the R2 FASTQ

Additional columns (e.g., sex, tissue, treatment, group) are encouraged. Column names must start with a letter and contain no special characters except underscores. For samples run on multiple lanes, add one row per lane: the rows share the same sample_id and metadata and differ only in the r1_fastq and r2_fastq columns. Pluto will automatically merge them during processing.

Replace any NA values with blanks before uploading.

Example:

sample_id	r1_fastq	r2_fastq	sex	tissue	treatment
ctrl_fem_1	SRX18852859_1.fastq.gz	SRX18852859_2.fastq.gz	female	urinary bladder	Control
ctrl_fem_3	SRX18852860_1.fastq.gz	SRX18852860_2.fastq.gz	female	urinary bladder	Control
cyp_fem_6	SRX18852865_1.fastq.gz	SRX18852865_2.fastq.gz	female	urinary bladder	CYP
cyp_male_5	SRX18852868_1.fastq.gz	SRX18852868_2.fastq.gz	male	urinary bladder	CYP
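The naming and matching rules for RNA-seq uploads can be checked locally before uploading. The sketch below is only an illustration under stated assumptions: check_tables is a hypothetical helper, both files are passed in as comma-separated text, and the checks cover only the rules described in this section, not everything Pluto may validate.

```python
import csv
import io
import re

# Naming rule from the text: starts with a letter, then letters, digits, or underscores.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_tables(assay_csv, sample_csv, first_assay_column="gene_symbol"):
    """Return a list of problems found; an empty list means these checks passed."""
    assay_header = next(csv.reader(io.StringIO(assay_csv)), [])
    reader = csv.DictReader(io.StringIO(sample_csv))
    sample_rows = list(reader)
    sample_header = reader.fieldnames or []
    problems = []

    if not assay_header or assay_header[0] != first_assay_column:
        problems.append(f"assay data: first column must be {first_assay_column}")
    if not sample_header or sample_header[0] != "sample_id":
        problems.append("sample data: first column must be sample_id")
        return problems

    for name in sample_header:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"sample data: invalid column name {name!r}")

    # Multi-lane samples repeat their sample_id, so compare unique values.
    sample_ids = {row["sample_id"] for row in sample_rows}
    for sample_id in sorted(sample_ids):
        if not VALID_NAME.fullmatch(sample_id):
            problems.append(f"sample data: invalid sample_id {sample_id!r}")

    assay_samples = set(assay_header[1:])
    for missing in sorted(sample_ids - assay_samples):
        problems.append(f"assay data: no column for sample_id {missing!r}")
    for extra in sorted(assay_samples - sample_ids):
        problems.append(f"sample data: no row for assay column {extra!r}")
    return problems


assay = "gene_symbol,ctrl_fem_1,cyp_male_5\nActa1,8757,14879\n"
samples = "sample_id,sex,treatment\nctrl_fem_1,female,Control\ncyp_male_5,male,CYP\n"
print(check_tables(assay, samples))
```

The comparison is case-sensitive on purpose, because the assay column names must match the sample_id values exactly.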

CUT&RUN / ChIP-seq

CUT&RUN and ChIP-seq experiments share the same data format in Pluto. Both accept FASTQ files or a processed data table (consensus peak counts, CSV). Each experiment should contain only one antibody, not counting the controls. If an experiment has multiple antibodies, it should be broken up into multiple experiments. BigWig files are optional and only applicable when uploading processed data — see note below.

BigWig files: BigWig files show how sequencing reads are distributed across the genome and are used for signal track visualization and genome-wide coverage heatmaps. If you choose to upload BigWig files, you must provide one for every sample, and every sample with a BigWig file must appear in both the sample data and the assay data counts table. If your pipeline does not return some samples (e.g., input controls) in the counts table, give those samples a mock count of 1 for every peak.

BED file (optional): After uploading BigWig files, you will have the option to upload a BED file containing the peak coordinates. This file is used to define the genomic regions displayed in signal track views.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data is a peak-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be peak_id; gene_symbol is optional but recommended and must use common gene names for pathway analysis. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the values in the sample data’s sample_id column relate to the column names in the assay data.

Column	Type	Requirement
`peak_id`	string	Required — first column; alphanumeric with underscores only; can be a locus, gene symbol, or other identifier
`gene_symbol`	string	Optional but recommended — use common gene names for pathway analysis
`<sample_id>`	numeric	One column per sample; must match `sample_id` values in sample data exactly

Example:

peak_id	gene_symbol	group1_rep1	group1_rep2	CD34_NT_IgG	group2_rep1
C12orf45_chr12_1050326_1050334	C12orf45	0	2	1	1
MIR7843_chr14_725993_726001	MIR7843	4	22	1	4
SRRM1_chr1_246351_246357	SRRM1	83	226	1	95
RAI2_chrX_179211_179227	RAI2	6	21	1	24

SAMPLE DATA

The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column and follow the same naming rules as above. antibody and input_control are required for all CUT&RUN and ChIP-seq experiments.

Column	Requirement
`sample_id`	Required — first column; alphanumeric, starts with a letter, underscores only
`antibody`	Required — antibody used (e.g., `IgG`, `H3K27me3`, `GATA1`, `input`)
`input_control`	Required — the `sample_id` of the matching control; leave blank for control samples themselves
`r1_fastq`	Required if uploading FASTQ files — filename of the R1 FASTQ
`r2_fastq`	Required if uploading paired-end FASTQ files — filename of the R2 FASTQ

Additional columns (e.g., genotype, cell_line, condition, replicate) are encouraged.

Replace any NA values with blanks before uploading.

Example:

sample_id	antibody	input_control	genotype	cell_line	r1_fastq	r2_fastq
CD34_NT_IgG	IgG		NT	CD34+ HSPCs	S11_R1.fastq.gz	S11_R2.fastq.gz
group1_rep1	GATA1	CD34_NT_IgG	NT	CD34+ HSPCs	S12_R1.fastq.gz	S12_R2.fastq.gz
group1_rep2	GATA1	CD34_NT_IgG	NT	CD34+ HSPCs	S13_R1.fastq.gz	S13_R2.fastq.gz
CD34_VHLKO_IgG	IgG		VHLKO	CD34+ HSPCs	S14_R1.fastq.gz	S14_R2.fastq.gz
group2_rep1	GATA1	CD34_VHLKO_IgG	VHLKO	CD34+ HSPCs	S15_R1.fastq.gz	S15_R2.fastq.gz
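The antibody and input_control rules can be checked with a few lines of code. This is a hedged sketch rather than Pluto's own validation: check_input_controls is a hypothetical helper, and it assumes that rows with a blank input_control are the control samples, as the table above describes.

```python
import csv
import io


def check_input_controls(sample_csv):
    """Return a list of problems with the antibody and input_control columns."""
    rows = list(csv.DictReader(io.StringIO(sample_csv)))
    sample_ids = {row["sample_id"] for row in rows}
    problems = []
    target_antibodies = set()

    for row in rows:
        sample_id = row["sample_id"]
        antibody = (row.get("antibody") or "").strip()
        control = (row.get("input_control") or "").strip()

        if not antibody:
            problems.append(f"{sample_id}: antibody is blank")
        if not control:
            continue  # assumption: a blank input_control marks a control sample
        if antibody:
            target_antibodies.add(antibody)
        if control == sample_id:
            problems.append(f"{sample_id}: input_control points at the sample itself")
        elif control not in sample_ids:
            problems.append(f"{sample_id}: input_control {control!r} is not a sample_id in this file")

    # The text asks for one antibody per experiment, not counting the controls.
    if len(target_antibodies) > 1:
        found = ", ".join(sorted(target_antibodies))
        problems.append(f"more than one non-control antibody: {found}")
    return problems


samples = (
    "sample_id,antibody,input_control,genotype\n"
    "CD34_NT_IgG,IgG,,NT\n"
    "group1_rep1,GATA1,CD34_NT_IgG,NT\n"
    "group1_rep2,GATA1,CD34_NT_IgG,NT\n"
)
print(check_input_controls(samples))
```

If the last check reports more than one antibody, the samples likely belong in separate experiments, one per antibody.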

ATAC-seq

ATAC-seq experiments use the same assay data format as CUT&RUN / ChIP-seq (a peak-by-sample count matrix), but because there is no antibody targeting, the sample data follows the simpler RNA-seq structure — i.e., no antibody or input_control columns are required. Both FASTQ files and processed data tables (consensus peak counts, CSV) are accepted. BigWig files are optional and only applicable when uploading processed data.

BigWig files: BigWig files show how sequencing reads are distributed across the genome and are used for signal track visualization and genome-wide coverage heatmaps. If you choose to upload BigWig files, you must provide one for every sample.

BED file (optional): After uploading BigWig files, you will have the option to upload a BED file containing the peak coordinates. This file is used to define the genomic regions displayed in signal track views.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data file is a peak-by-sample matrix. The first column must be peak_id; gene_symbol is optional but recommended and must use common gene names (e.g., ACTB, TP53) for pathway analysis. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the values in the sample data’s sample_id column relate to the column names in the assay data.

Column	Type	Requirement
`peak_id`	string	Required — first column; alphanumeric with underscores only; can be a locus, gene symbol, or other identifier
`gene_symbol`	string	Optional but recommended — use common gene names for pathway analysis
`<sample_id>`	numeric	One column per sample; must match `sample_id` values in sample data exactly

Example:

peak_id	gene_symbol	ctrl_rep1	ctrl_rep2	treated_rep1	treated_rep2
C12orf45_chr12_105032601_105033450	C12orf45	0	2	1	1
MIR7843_chr14_72599301_72600150	MIR7843	4	22	1	4
SRRM1_chr1_24635101_24635700	SRRM1	83	226	1	95
RAI2_chrX_17921151_17922750	RAI2	6	21	1	24

SAMPLE DATA

The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.

Column	Requirement
`sample_id`	Required — first column; alphanumeric, starts with a letter, underscores only
`r1_fastq`	Required if uploading FASTQ files — filename of the R1 FASTQ
`r2_fastq`	Required if uploading paired-end FASTQ files — filename of the R2 FASTQ

Additional columns (e.g., condition, cell_type, tissue, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.

Replace any NA values with blanks before uploading.

Example:

sample_id	r1_fastq	r2_fastq	condition	cell_type	replicate
ctrl_rep1	ctrl_rep1_R1.fastq.gz	ctrl_rep1_R2.fastq.gz	control	CD34+ HSPCs	1
ctrl_rep2	ctrl_rep2_R1.fastq.gz	ctrl_rep2_R2.fastq.gz	control	CD34+ HSPCs	2
treated_rep1	treated_rep1_R1.fastq.gz	treated_rep1_R2.fastq.gz	treated	CD34+ HSPCs	1
treated_rep2	treated_rep2_R1.fastq.gz	treated_rep2_R2.fastq.gz	treated	CD34+ HSPCs	2
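Replacing NA values with blanks, as requested for every sample data file, is easy to script. The sketch below assumes a particular set of missing-value spellings (NA_MARKERS) and a hypothetical helper name; adjust the set to whatever your export actually produces, and check that none of the markers is a legitimate value in your metadata.

```python
import csv
import io

# Assumed spellings of missing values; this list is illustrative, not Pluto's.
NA_MARKERS = {"NA", "N/A", "NaN", "nan", "null", "NULL"}


def blank_out_na(csv_text):
    """Return the CSV text with every NA-like cell replaced by an empty cell."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow(["" if cell.strip() in NA_MARKERS else cell for cell in row])
    return out.getvalue()


samples = "sample_id,condition,cell_type\nctrl_rep1,control,NA\ntreated_rep1,treated,CD34+ HSPCs\n"
print(blank_out_na(samples))
```

Only whole cells are replaced, so a value that merely contains the letters NA (for example a sample named RNA_ctrl) is left alone.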

Proteomics

Proteomics experiments accept only a processed data table (CSV, comma-separated) — raw data upload is not available. The uploaded data must be fully normalized, log-transformed, contain only numeric values, and be ready for comparisons. Character entries indicating limits of detection (e.g., <LOD) are not permitted.

For pathway analysis to be available, common gene names must be present in either the protein_id or gene_symbol column. Alternatively, UniProt IDs can be provided in protein_id or a uniprot_id column, and Pluto will automatically map them to gene symbols.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data is a protein-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be protein_id, which can be populated with common protein names, UniProt IDs, or gene symbols. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the values in the sample data’s sample_id column relate to the column names in the assay data.

Column	Type	Requirement
`protein_id`	string	Required — first column; common protein name, UniProt ID, or gene symbol
`gene_symbol`	string	Optional — use common gene names; required for pathway analysis if not provided in `protein_id`
`uniprot_id`	string	Optional — UniProt ID; Pluto can use this to map gene symbols for pathway analysis
`<sample_id>`	numeric	One column per sample; must match `sample_id` values in sample data exactly

Example:

protein_id	gene_symbol	ctrl_rep1	ctrl_rep2	treated_rep1	treated_rep2
ACTB_HUMAN	ACTB	12.3	12.1	12.4	12.2
P53_HUMAN	TP53	8.7	8.9	10.2	10.5
MYC_HUMAN	MYC	9.1	9.3	11.8	11.6
GAPDH_HUMAN	GAPDH	14.2	14.0	14.1	14.3
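The numeric-only requirement can be verified before upload. The following sketch reports every sample cell that is not a finite number, which catches entries such as <LOD. The helper name and the id_columns parameter are hypothetical, and treating blank cells as problems is an assumption based on the numeric-only requirement above.

```python
import csv
import io
import math


def non_numeric_cells(assay_csv, id_columns=("protein_id", "gene_symbol", "uniprot_id")):
    """Return (line_number, column, value) for each sample cell that is not a finite number."""
    bad = []
    reader = csv.DictReader(io.StringIO(assay_csv))
    for row in reader:
        for column, value in row.items():
            if column in id_columns:
                continue  # identifier columns hold text by design
            try:
                is_number = math.isfinite(float(value))
            except (TypeError, ValueError):
                is_number = False
            if not is_number:
                bad.append((reader.line_num, column, value))
    return bad


assay = (
    "protein_id,gene_symbol,ctrl_rep1,treated_rep1\n"
    "ACTB_HUMAN,ACTB,12.3,12.4\n"
    "P53_HUMAN,TP53,8.7,<LOD\n"
)
for line_number, column, value in non_numeric_cells(assay):
    print(f"line {line_number}, column {column}: {value!r} is not numeric")
```

The same check applies to metabolomics tables by passing id_columns=("metabolite_id",).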

SAMPLE DATA

The sample data file is also uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.

Column	Requirement
`sample_id`	Required — first column; alphanumeric, starts with a letter, underscores only

Additional columns (e.g., condition, cell_type, tissue, treatment, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.

Replace any NA values with blanks before uploading.

Example:

sample_id	condition	cell_type	replicate
ctrl_rep1	control	HeLa	1
ctrl_rep2	control	HeLa	2
treated_rep1	treated	HeLa	1
treated_rep2	treated	HeLa	2

scRNA-seq

scRNA-seq experiments accept either FASTQ files or a processed Seurat object (RDS). FASTQ upload is supported for 10x Genomics 3’ and 10x Genomics FLEX experiments only, processed via the nf-core/scrnaseq pipeline. For all other scRNA-seq data, a processed Seurat object must be uploaded instead.

Processed Seurat object: The object must be uploaded in .rds format and must already be filtered, normalized, and clustered prior to upload. Raw, unprocessed Seurat objects are not supported. Only the RNA assay (located in so@assays$RNA@data) is used for differential expression and gene plotting, so the data in that assay must be normalized. Cluster annotation is optional and can be performed within Pluto after upload.

NOTE: the 10x Genomics FLEX protocol option is only available in Pluto after a supported genome is selected (e.g., GRCh38).

SAMPLE DATA

The sample data file follows the same structure as bulk RNA-seq. It is uploaded as a CSV (comma-separated) file and describes each sample's metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores. If uploading FASTQ files, include r1_fastq and r2_fastq columns.

NOTE: the 10x Genomics FLEX assay requires two additional columns: pool and probe_barcode_ids.

Column	Requirement
`sample_id`	Required — first column; alphanumeric, starts with a letter, underscores only
`r1_fastq`	Required if uploading FASTQ files — filename of the R1 FASTQ
`r2_fastq`	Required if uploading paired-end FASTQ files — filename of the R2 FASTQ
`pool`	Required for FLEX assays only — pool identifier for the sample
`probe_barcode_ids`	Required for FLEX assays only — probe barcode ID(s) associated with the sample

Additional columns (e.g., condition, cell_type, tissue, replicate) are encouraged.

Replace any NA values with blanks before uploading.

Example (standard 10x 3’):

sample_id	r1_fastq	r2_fastq	condition	replicate
ctrl_rep1	ctrl_rep1_R1.fastq.gz	ctrl_rep1_R2.fastq.gz	control	1
ctrl_rep2	ctrl_rep2_R1.fastq.gz	ctrl_rep2_R2.fastq.gz	control	2
treated_rep1	treated_rep1_R1.fastq.gz	treated_rep1_R2.fastq.gz	treated	1
treated_rep2	treated_rep2_R1.fastq.gz	treated_rep2_R2.fastq.gz	treated	2

Example (10x Genomics FLEX) with one pool, two lanes, and four biological samples:

sample_id	r1_fastq	r2_fastq	pool	probe_barcode_ids	condition	replicate
ctrl_rep1	pool1_L001_R1.fastq.gz	pool1_L001_R2.fastq.gz	pool1	BC001	control	1
ctrl_rep2	pool1_L001_R1.fastq.gz	pool1_L001_R2.fastq.gz	pool1	BC002	control	2
treated_rep1	pool1_L001_R1.fastq.gz	pool1_L001_R2.fastq.gz	pool1	BC003	treated	1
treated_rep2	pool1_L001_R1.fastq.gz	pool1_L001_R2.fastq.gz	pool1	BC004	treated	2
ctrl_rep1	pool1_L002_R1.fastq.gz	pool1_L002_R2.fastq.gz	pool1	BC001	control	1
ctrl_rep2	pool1_L002_R1.fastq.gz	pool1_L002_R2.fastq.gz	pool1	BC002	control	2
treated_rep1	pool1_L002_R1.fastq.gz	pool1_L002_R2.fastq.gz	pool1	BC003	treated	1
treated_rep2	pool1_L002_R1.fastq.gz	pool1_L002_R2.fastq.gz	pool1	BC004	treated	2
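A FLEX sample sheet like the one above can be sanity-checked with a short script. This sketch is illustrative only: check_flex_rows is a hypothetical helper, and the rule that each pool and probe barcode combination belongs to exactly one sample_id is an assumption drawn from the example, not a documented Pluto check.

```python
import csv
import io


def check_flex_rows(sample_csv):
    """Return a list of problems with the pool and probe_barcode_ids columns."""
    problems = []
    owner = {}  # (pool, probe_barcode_ids) -> first sample_id seen with that pair
    for row in csv.DictReader(io.StringIO(sample_csv)):
        sample_id = row["sample_id"]
        pool = (row.get("pool") or "").strip()
        barcode = (row.get("probe_barcode_ids") or "").strip()
        if not pool or not barcode:
            problems.append(f"{sample_id}: pool and probe_barcode_ids are both required")
            continue
        # Rows for additional lanes repeat the same pair and sample_id, which is fine.
        previous = owner.setdefault((pool, barcode), sample_id)
        if previous != sample_id:
            problems.append(f"{pool}/{barcode} is assigned to both {previous} and {sample_id}")
    return problems


samples = (
    "sample_id,r1_fastq,r2_fastq,pool,probe_barcode_ids\n"
    "ctrl_rep1,pool1_L001_R1.fastq.gz,pool1_L001_R2.fastq.gz,pool1,BC001\n"
    "ctrl_rep2,pool1_L001_R1.fastq.gz,pool1_L001_R2.fastq.gz,pool1,BC002\n"
    "ctrl_rep1,pool1_L002_R1.fastq.gz,pool1_L002_R2.fastq.gz,pool1,BC001\n"
    "ctrl_rep2,pool1_L002_R1.fastq.gz,pool1_L002_R2.fastq.gz,pool1,BC002\n"
)
print(check_flex_rows(samples))
```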

Metabolomics

Metabolomics experiments accept only a processed data table (CSV, comma-separated) — raw data upload is not available. As with proteomics, the uploaded data must be fully normalized, ready for comparisons, log-transformed, and contain only numeric values. Character entries indicating limits of detection (e.g., <LOD) are not permitted.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data is a metabolite-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be metabolite_id. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the values in the sample data’s sample_id column relate to the column names in the assay data.

Column	Type	Requirement
`metabolite_id`	string	Required — first column; name or identifier for the metabolite
`<sample_id>`	numeric	One column per sample; must match `sample_id` values in sample data exactly

Example:

metabolite_id	ctrl_rep1	ctrl_rep2	treated_rep1	treated_rep2
Glucose	8.2	8.4	6.1	6.3
Lactate	6.5	6.7	9.2	9.0
Glutamine	7.1	7.3	5.8	5.6
ATP	9.4	9.2	7.8	7.9

SAMPLE DATA

The sample data file describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.

Column	Requirement
`sample_id`	Required — first column; alphanumeric, starts with a letter, underscores only

Additional columns (e.g., condition, cell_type, tissue, treatment, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.

Replace any NA values with blanks before uploading.

Example:

sample_id	condition	cell_type	replicate
ctrl_rep1	control	HeLa	1
ctrl_rep2	control	HeLa	2
treated_rep1	treated	HeLa	1
treated_rep2	treated	HeLa	2

Spatial Transcriptomics

Spatial transcriptomics data from 10x Genomics Xenium and Visium experiments can be uploaded as a processed Spatial object in H5AD (.h5ad) format. Raw data and FASTQ uploads are not supported.

Each experiment supports one sample only; multi-sample datasets must be split into separate experiments.

The object must already be filtered, normalized, and clustered prior to upload, and must include spatial coordinates. Cluster annotation is optional and can be performed within Pluto after upload.

SPATIAL OBJECT

Upload a processed Spatial object in .h5ad format. After upload, Pluto will extract key metadata — including sample information, cluster labels, and cell embeddings — for use in downstream analysis. You will be prompted to map which metadata columns correspond to samples, clusters, and other variables.

Keep the upload window open while the file transfers; you may leave the page once the object has been received.

IMAGE FILES

After uploading the Spatial object, image files are required to proceed. Pluto will alert you if the required image files are missing from the object.

NOTE: if an image is contained within the file, it cannot be edited or flipped within the platform.

Supported image formats:

OME-ZARR (.zarr)
OME-TIFF (.ome.tif / .ome.tiff)
SpatialData ZARR (.spatialdata.zarr)
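A file name can be matched against the supported formats above by its suffix. The sketch below is a convenience check only: it looks at the name, not the file contents, and both the helper name and the case-insensitive matching are assumptions.

```python
# Suffixes listed in the text. Longer suffixes come first so that
# ".spatialdata.zarr" is not mistaken for plain ".zarr".
SUPPORTED_IMAGE_SUFFIXES = [
    (".spatialdata.zarr", "SpatialData ZARR"),
    (".ome.tiff", "OME-TIFF"),
    (".ome.tif", "OME-TIFF"),
    (".zarr", "OME-ZARR"),
]


def image_format(filename):
    """Return the matching format label, or None if the suffix is not listed."""
    lower = filename.lower().rstrip("/")  # Zarr stores are directories; drop a trailing slash
    for suffix, label in SUPPORTED_IMAGE_SUFFIXES:
        if lower.endswith(suffix):
            return label
    return None


for name in ["tissue.ome.tif", "tissue.spatialdata.zarr", "tissue.png"]:
    print(name, "->", image_format(name))
```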

FAQs

How do I format the sample data file if a sample has been sequenced in multiple lanes?

If a sample has been sequenced in multiple lanes, create separate rows for each lane in the sample data file. Each row should have the same sample_id but different r1_fastq and r2_fastq values corresponding to the different lanes. Pluto will automatically merge these rows during processing, so the processed experiment contains one entry per unique sample_id.

See below for a bulk RNA-seq example where 8 pairs of FASTQ files will be merged into 4 samples:

sample_id	r1_fastq	r2_fastq	condition	replicate
ctrl_rep1	c_rep1_lane1_R1.fastq.gz	c_rep1_lane1_R2.fastq.gz	control	1
ctrl_rep1	c_rep1_lane2_R1.fastq.gz	c_rep1_lane2_R2.fastq.gz	control	1
ctrl_rep2	c_rep2_lane1_R1.fastq.gz	c_rep2_lane1_R2.fastq.gz	control	2
ctrl_rep2	c_rep2_lane2_R1.fastq.gz	c_rep2_lane2_R2.fastq.gz	control	2
treated_rep1	t_rep1_lane1_R1.fastq.gz	t_rep1_lane1_R2.fastq.gz	treated	1
treated_rep1	t_rep1_lane2_R1.fastq.gz	t_rep1_lane2_R2.fastq.gz	treated	1
treated_rep2	t_rep2_lane1_R1.fastq.gz	t_rep2_lane1_R2.fastq.gz	treated	2
treated_rep2	t_rep2_lane2_R1.fastq.gz	t_rep2_lane2_R2.fastq.gz	treated	2
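To preview how lane rows will collapse into samples, the rows can be grouped by sample_id. This is a sketch of the grouping idea only, not Pluto's merge step; group_lanes is a hypothetical helper, and the metadata check assumes that lane rows for the same sample should agree in every column other than r1_fastq and r2_fastq.

```python
import csv
import io
from collections import defaultdict

FASTQ_COLUMNS = ("r1_fastq", "r2_fastq")


def group_lanes(sample_csv):
    """Return ({sample_id: [(r1_fastq, r2_fastq), ...]}, problems)."""
    lanes = defaultdict(list)
    metadata = {}
    problems = []
    for row in csv.DictReader(io.StringIO(sample_csv)):
        sample_id = row["sample_id"]
        lanes[sample_id].append((row["r1_fastq"], row["r2_fastq"]))
        # Everything except sample_id and the FASTQ columns should match across lanes.
        rest = {k: v for k, v in row.items() if k != "sample_id" and k not in FASTQ_COLUMNS}
        if metadata.setdefault(sample_id, rest) != rest:
            problems.append(f"{sample_id}: metadata differs between lane rows")
    return dict(lanes), problems


samples = (
    "sample_id,r1_fastq,r2_fastq,condition,replicate\n"
    "ctrl_rep1,c_rep1_lane1_R1.fastq.gz,c_rep1_lane1_R2.fastq.gz,control,1\n"
    "ctrl_rep1,c_rep1_lane2_R1.fastq.gz,c_rep1_lane2_R2.fastq.gz,control,1\n"
    "treated_rep1,t_rep1_lane1_R1.fastq.gz,t_rep1_lane1_R2.fastq.gz,treated,1\n"
    "treated_rep1,t_rep1_lane2_R1.fastq.gz,t_rep1_lane2_R2.fastq.gz,treated,1\n"
)
lanes, problems = group_lanes(samples)
for sample_id, pairs in lanes.items():
    print(f"{sample_id}: {len(pairs)} FASTQ pair(s)")
```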

What should I do when uploading processed ChIP-seq or CUT&RUN data when I have more BigWig samples than samples in my assay data counts table?

Some epigenetics pipelines exclude input controls from the final counts table but still produce BigWig files for those samples, which are useful for viewing coverage. If you have BigWig files for samples that are not included in your assay data counts table, add those samples to both the assay and sample data files. In the assay data, give each of those samples a mock count of 1 for every peak.

Here is an example of an assay data upload with an input control added:

peak_id	gene_symbol	EXP_1	EXP_2	input_control_1
C12orf45_chr12_1050326_1050334	C12orf45	0	2	1
MIR7843_chr14_725993_726001	MIR7843	4	22	1
SRRM1_chr1_246351_246357	SRRM1	83	226	1
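Adding the mock column by hand is error-prone for large peak tables, so it can be scripted. The sketch below appends a column of 1s for a control sample that is missing from the counts table; add_mock_control is a hypothetical helper, and it assumes the assay data is available as comma-separated text.

```python
import csv
import io


def add_mock_control(assay_csv, control_id, mock_count=1):
    """Append a column named control_id that holds mock_count for every peak."""
    rows = [row for row in csv.reader(io.StringIO(assay_csv)) if row]  # skip blank lines
    if not rows:
        return assay_csv
    header, body = rows[0], rows[1:]
    if control_id in header:
        return assay_csv  # the sample already has a column; leave the table untouched

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + [control_id])
    for row in body:
        writer.writerow(row + [str(mock_count)])
    return out.getvalue()


assay = (
    "peak_id,gene_symbol,EXP_1,EXP_2\n"
    "C12orf45_chr12_1050326_1050334,C12orf45,0,2\n"
    "SRRM1_chr1_246351_246357,SRRM1,83,226\n"
)
print(add_mock_control(assay, "input_control_1"))
```

The same sample_id also needs a row in the sample data file so that the two uploads stay in sync.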

🚀 Ready to create new experiments?

We hope that the summary above helps you feel empowered to create new experiments and dive into analyses! For more resources, we encourage you to take a look at our Blog and Knowledge Base.

Please reach out to support@pluto.bio if you have additional questions.

As always, our scientific support team is here to help! 🧬