Data Upload Overview

Last updated: June 24, 2026

Written by Andrew Goodspeed, PhD

A guide that summarizes the data formats accepted by Pluto for each experiment type

Every Pluto experiment requires two uploads: assay data (the measurements) and sample data (the metadata describing each sample). This article will cover the following experiments:

  • (bulk) RNA-seq

  • CUT&RUN / ChIP-seq

  • ATAC-seq

  • Proteomics

  • scRNA-seq

  • Metabolomics

  • Spatial transcriptomics


RNA-seq

RNA-seq experiments accept either FASTQ files (raw sequencing reads) or a processed data table (CSV, comma-separated). If uploading FASTQ files, the sample data must include an r1_fastq column, plus an r2_fastq column for paired-end reads. If uploading processed data, choose from the accepted units below.

Accepted processed data units:

  • Raw counts (integer)

  • Expected counts (continuous)

  • Transcripts per million (TPM)

  • Reads per kilobase per million (RPKM)

  • Fragments per kilobase per million (FPKM)

NOTE: if uploading processed data, only raw counts and expected counts support differential expression analysis.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data file is a gene-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be gene_symbol. The pathway analysis modules require common gene names, so upload common gene names (e.g., ACTB, TP53) where possible. If at least 90% of the values in the first column are Ensembl IDs, Pluto automatically converts them to common gene names. If the conversion produces duplicates, only the most highly expressed entry is retained. IDs that cannot be matched, or that have no gene symbol mapping, are kept as-is.
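To predict whether a file is likely to trigger the automatic conversion, the 90% rule can be approximated locally. The sketch below is not Pluto's implementation: the regular expression for Ensembl gene IDs, the use of the row sum as the measure of "most highly expressed", and all function names are assumptions made for illustration.

```python
import re

# Assumed shape of an Ensembl gene ID, e.g. ENSG00000075624 or ENSMUSG00000029580.14
ENSEMBL_GENE_ID = re.compile(r"ENS[A-Z]*G\d{11}(\.\d+)?")


def ensembl_fraction(ids):
    """Fraction of non-empty identifiers that look like Ensembl gene IDs."""
    ids = [i.strip() for i in ids if i and i.strip()]
    if not ids:
        return 0.0
    hits = sum(1 for i in ids if ENSEMBL_GENE_ID.fullmatch(i))
    return hits / len(ids)


def likely_auto_converted(ids, threshold=0.9):
    """True when at least `threshold` of the first-column values look like Ensembl IDs."""
    return ensembl_fraction(ids) >= threshold


def keep_most_expressed(rows):
    """Collapse duplicate gene symbols, keeping the row with the largest total.

    `rows` is a list of (gene_symbol, [values]) pairs. Ranking rows by their
    sum is an assumption; the article does not define "most highly expressed".
    """
    best = {}
    for symbol, values in rows:
        if symbol not in best or sum(values) > sum(best[symbol]):
            best[symbol] = values
    return list(best.items())
```

Running likely_auto_converted on the first column of the assay file before uploading gives a rough idea of whether the identifiers will be treated as Ensembl IDs or as gene names.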

Each subsequent column is named with the exact sample ID listed in the sample_id column of the sample data file. The two examples below (assay data and sample data) show how the values in the sample data’s sample_id column correspond to the column names in the assay data.

Column | Type | Requirement
gene_symbol | string | Required. First column; use common gene names or Ensembl IDs
<sample_id> | numeric | One column per sample; must match sample_id values in sample data exactly

Example:

gene_symbol | ctrl_fem_1 | ctrl_fem_3 | cyp_fem_6 | cyp_male_5
Acsbg3 | 106 | 319 | 123 | 85
Acsf2 | 1267 | 1586 | 1247 | 1194
Acsl1 | 1497 | 1931 | 1662 | 1343
Acta1 | 8757 | 23898 | 13231 | 14879
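Because the assay column names must match the sample_id values exactly, it can help to compare the two files before uploading. The following is a minimal sketch using only the Python standard library; compare_sample_ids is a hypothetical helper, not a Pluto function, and it assumes both files are available as CSV text.

```python
import csv
import io


def compare_sample_ids(assay_csv, sample_csv, id_columns=("gene_symbol",)):
    """Compare assay column names against the sample_id values.

    Returns (missing_from_assay, missing_from_sample_data) as sorted lists.
    `id_columns` lists the non-sample columns of the assay file.
    """
    assay_header = next(csv.reader(io.StringIO(assay_csv)))
    assay_ids = {name for name in assay_header if name not in id_columns}
    sample_rows = csv.DictReader(io.StringIO(sample_csv))
    sample_ids = {row["sample_id"] for row in sample_rows}
    return sorted(sample_ids - assay_ids), sorted(assay_ids - sample_ids)
```

Two empty lists mean every sample has a matching assay column and vice versa. For peak-based or proteomics files, pass the relevant identifier columns instead, for example id_columns=("peak_id", "gene_symbol").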

SAMPLE DATA

The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column. Additionally, sample_id must start with a letter and contain only alphanumeric characters and underscores.

Column | Requirement
sample_id | Required. First column; starts with a letter; letters, digits, and underscores only
r1_fastq | Required if uploading FASTQ files. Filename of the R1 FASTQ
r2_fastq | Required if uploading paired-end FASTQ files. Filename of the R2 FASTQ

Additional columns (e.g., sex, tissue, treatment, group) are encouraged. Column names must start with a letter and contain no special characters except underscores. A sample run on multiple lanes should have one row per lane: the rows share the same sample_id and metadata and differ only in the r1_fastq and r2_fastq columns. Pluto will automatically merge them during processing.

Replace any NA values with blanks before uploading.
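The naming rules and the NA replacement can be checked in one pass before uploading. This is a rough sketch rather than Pluto's own validation: clean_sample_data is a hypothetical helper, "alphanumeric" is interpreted as ASCII letters and digits, and the list of NA spellings is an assumption.

```python
import csv
import io
import re

# Starts with a letter, then letters, digits, or underscores (ASCII assumed)
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Assumed spellings of missing values that should become blanks
NA_TOKENS = {"NA", "N/A", "NaN", "nan", "null"}


def clean_sample_data(csv_text):
    """Return (cleaned_csv_text, problems) for a sample data CSV."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, body = rows[0], rows[1:]
    problems = []
    if header[0] != "sample_id":
        problems.append("first column must be sample_id")
    problems += [f"bad column name: {c}" for c in header if not NAME_RULE.fullmatch(c)]
    problems += [f"bad sample_id: {r[0]}" for r in body if not NAME_RULE.fullmatch(r[0])]
    # Blank out NA-like cells, leaving every other value untouched
    cleaned = [header] + [["" if v.strip() in NA_TOKENS else v for v in r] for r in body]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cleaned)
    return out.getvalue(), problems
```

The function only reports naming problems; it does not rename anything, since changing a sample_id would also require renaming the matching assay data column.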

Example:

sample_id | r1_fastq | r2_fastq | sex | tissue | treatment
ctrl_fem_1 | SRX18852859_1.fastq.gz | SRX18852859_2.fastq.gz | female | urinary bladder | Control
ctrl_fem_3 | SRX18852860_1.fastq.gz | SRX18852860_2.fastq.gz | female | urinary bladder | Control
cyp_fem_6 | SRX18852865_1.fastq.gz | SRX18852865_2.fastq.gz | female | urinary bladder | CYP
cyp_male_5 | SRX18852868_1.fastq.gz | SRX18852868_2.fastq.gz | male | urinary bladder | CYP


CUT&RUN / ChIP-seq

CUT&RUN and ChIP-seq experiments share the same data format in Pluto. Both accept FASTQ files or a processed data table (consensus peak counts, CSV). Each experiment should contain only one antibody, not counting the controls. If an experiment has multiple antibodies, it should be broken up into multiple experiments. BigWig files are optional and only applicable when uploading processed data — see note below.

BigWig files: BigWig files show how sequencing reads are distributed across the genome and are used for signal track visualization and genome-wide coverage heatmaps. If you choose to upload BigWig files, you must provide one for every sample, and every sample must appear in both the sample data and the assay data counts table. If your pipeline does not normally return some samples in the counts table (input controls, for example), give those samples a mock count of 1 for every peak.

BED file (optional): After uploading BigWig files, you will have the option to upload a BED file containing the peak coordinates. This file is used to define the genomic regions displayed in signal track views.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data is a peak-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be peak_id. A gene_symbol column is optional but recommended and must use common gene names for pathway analysis. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below (assay data and sample data) show how the values in the sample data’s sample_id column correspond to the column names in the assay data.

Column | Type | Requirement
peak_id | string | Required. First column; alphanumeric with underscores only; can be a locus, gene symbol, or other identifier
gene_symbol | string | Optional but recommended. Use common gene names for pathway analysis
<sample_id> | numeric | One column per sample; must match sample_id values in sample data exactly

Example:

peak_id | gene_symbol | group1_rep1 | group1_rep2 | CD34_NT_IgG | group2_rep1
C12orf45_chr12_1050326_1050334 | C12orf45 | 0 | 2 | 1 | 1
MIR7843_chr14_725993_726001 | MIR7843 | 4 | 22 | 1 | 4
SRRM1_chr1_246351_246357 | SRRM1 | 83 | 226 | 1 | 95
RAI2_chrX_179211_179227 | RAI2 | 6 | 21 | 1 | 24

SAMPLE DATA

The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column and follow the same naming rules as above. antibody and input_control are required for all CUT&RUN and ChIP-seq experiments.

Column | Requirement
sample_id | Required. First column; starts with a letter; letters, digits, and underscores only
antibody | Required. Antibody used (e.g., IgG, H3K27me3, GATA1, input)
input_control | Required. The sample_id of the matching control; leave blank for control samples themselves
r1_fastq | Required if uploading FASTQ files. Filename of the R1 FASTQ
r2_fastq | Required if uploading paired-end FASTQ files. Filename of the R2 FASTQ

Additional columns (e.g., genotype, cell_line, condition, replicate) are encouraged.

Replace any NA values with blanks before uploading.

Example:

sample_id | antibody | input_control | genotype | cell_line | r1_fastq | r2_fastq
CD34_NT_IgG | IgG |  | NT | CD34+ HSPCs | S11_R1.fastq.gz | S11_R2.fastq.gz
group1_rep1 | GATA1 | CD34_NT_IgG | NT | CD34+ HSPCs | S12_R1.fastq.gz | S12_R2.fastq.gz
group1_rep2 | GATA1 | CD34_NT_IgG | NT | CD34+ HSPCs | S13_R1.fastq.gz | S13_R2.fastq.gz
CD34_VHLKO_IgG | IgG |  | VHLKO | CD34+ HSPCs | S14_R1.fastq.gz | S14_R2.fastq.gz
group2_rep1 | GATA1 | CD34_VHLKO_IgG | VHLKO | CD34+ HSPCs | S15_R1.fastq.gz | S15_R2.fastq.gz
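The antibody and input_control rules lend themselves to a quick consistency check. The sketch below is illustrative only: check_controls is a hypothetical helper, and treating "IgG" and "input" as the labels that mark control samples is an assumption that should be adjusted to match your own antibody values.

```python
import csv
import io


def check_controls(sample_csv, control_antibodies=("IgG", "input")):
    """Return a list of problems found in the antibody / input_control columns."""
    rows = list(csv.DictReader(io.StringIO(sample_csv)))
    ids = {r["sample_id"] for r in rows}
    problems = []
    targets = set()
    for r in rows:
        control = (r["input_control"] or "").strip()
        if r["antibody"] in control_antibodies:
            # Control samples leave input_control blank
            if control:
                problems.append(f"{r['sample_id']}: control rows should leave input_control blank")
        else:
            targets.add(r["antibody"])
            # Non-control samples must point at an existing sample_id
            if control not in ids:
                problems.append(f"{r['sample_id']}: input_control '{control}' is not a sample_id")
    if len(targets) > 1:
        problems.append(f"more than one antibody in one experiment: {sorted(targets)}")
    return problems
```

An empty list means each non-control sample points at a control that exists in the file and that only one non-control antibody is present, which matches the one-antibody-per-experiment guidance above.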


ATAC-seq

ATAC-seq experiments use the same assay data format as CUT&RUN / ChIP-seq (a peak-by-sample count matrix), but because there is no antibody targeting, the sample data follows the simpler RNA-seq structure — i.e., no antibody or input_control columns are required. Both FASTQ files and processed data tables (consensus peak counts, CSV) are accepted. BigWig files are optional and only applicable when uploading processed data.

BigWig files: BigWig files show how sequencing reads are distributed across the genome and are used for signal track visualization and genome-wide coverage heatmaps. If you choose to upload BigWig files, you must provide one for every sample.

BED file (optional): After uploading BigWig files, you will have the option to upload a BED file containing the peak coordinates. This file is used to define the genomic regions displayed in signal track views.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data file is a peak-by-sample matrix. The first column must be peak_id. A gene_symbol column is optional but recommended and must use common gene names (e.g., ACTB, TP53) for pathway analysis. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below (assay data and sample data) show how the values in the sample data’s sample_id column correspond to the column names in the assay data.

Column | Type | Requirement
peak_id | string | Required. First column; alphanumeric with underscores only; can be a locus, gene symbol, or other identifier
gene_symbol | string | Optional but recommended. Use common gene names for pathway analysis
<sample_id> | numeric | One column per sample; must match sample_id values in sample data exactly

Example:

peak_id | gene_symbol | ctrl_rep1 | ctrl_rep2 | treated_rep1 | treated_rep2
C12orf45_chr12_105032601_105033450 | C12orf45 | 0 | 2 | 1 | 1
MIR7843_chr14_72599301_72600150 | MIR7843 | 4 | 22 | 1 | 4
SRRM1_chr1_24635101_24635700 | SRRM1 | 83 | 226 | 1 | 95
RAI2_chrX_17921151_17922750 | RAI2 | 6 | 21 | 1 | 24
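Peak identifiers written as chr1:100-200 contain characters that the peak_id rule does not allow. One possible approach, sketched below, is to build identifiers in the gene_chromosome_start_end style used in the example and replace any other character with an underscore. Both helpers are hypothetical, and the naming pattern is a convention taken from the example rather than a Pluto requirement.

```python
import re

# Letters, digits, and underscores only (ASCII assumed)
PEAK_ID_RULE = re.compile(r"[A-Za-z0-9_]+")


def make_peak_id(gene_symbol, chrom, start, end):
    """Build an identifier such as C12orf45_chr12_105032601_105033450."""
    raw = f"{gene_symbol}_{chrom}_{start}_{end}"
    # Replace anything outside the allowed character set (e.g. the hyphen in HLA-A)
    return re.sub(r"[^A-Za-z0-9_]", "_", raw)


def invalid_peak_ids(peak_ids):
    """Return the identifiers that break the alphanumeric-plus-underscore rule."""
    return [p for p in peak_ids if not PEAK_ID_RULE.fullmatch(p)]
```

Only the peak_id is sanitized here; the gene_symbol column should keep the original common gene name so that pathway analysis can still match it.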

SAMPLE DATA

The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.

Column | Requirement
sample_id | Required. First column; starts with a letter; letters, digits, and underscores only
r1_fastq | Required if uploading FASTQ files. Filename of the R1 FASTQ
r2_fastq | Required if uploading paired-end FASTQ files. Filename of the R2 FASTQ

Additional columns (e.g., condition, cell_type, tissue, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.

Replace any NA values with blanks before uploading.

Example:

sample_id | r1_fastq | r2_fastq | condition | cell_type | replicate
ctrl_rep1 | ctrl_rep1_R1.fastq.gz | ctrl_rep1_R2.fastq.gz | control | CD34+ HSPCs | 1
ctrl_rep2 | ctrl_rep2_R1.fastq.gz | ctrl_rep2_R2.fastq.gz | control | CD34+ HSPCs | 2
treated_rep1 | treated_rep1_R1.fastq.gz | treated_rep1_R2.fastq.gz | treated | CD34+ HSPCs | 1
treated_rep2 | treated_rep2_R1.fastq.gz | treated_rep2_R2.fastq.gz | treated | CD34+ HSPCs | 2


Proteomics

Proteomics experiments accept only a processed data table (CSV, comma-separated) — raw data upload is not available. The uploaded data must be fully normalized, log-transformed, contain only numeric values, and be ready for comparisons. Character entries indicating limits of detection (e.g., <LOD) are not permitted.
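Entries such as <LOD are easy to miss in a large matrix, so a scan for non-numeric cells before uploading can save a failed upload. The following is a minimal sketch with hypothetical helper names. It also reports blank, NaN, and infinite cells, which is a stricter reading of "only numeric values" than the article spells out.

```python
import csv
import io
import math


def is_number(text):
    """True when the cell parses as a finite number."""
    try:
        return math.isfinite(float(text))
    except (TypeError, ValueError):
        return False


def non_numeric_cells(assay_csv, id_columns=("protein_id", "gene_symbol", "uniprot_id")):
    """Return (row_id, column, value) for every sample cell that is not a finite number."""
    bad = []
    for row in csv.DictReader(io.StringIO(assay_csv)):
        row_id = next(iter(row.values()))  # value of the first column
        for column, value in row.items():
            if column not in id_columns and not is_number(value):
                bad.append((row_id, column, value))
    return bad
```

The same check applies to other numeric-only tables if id_columns is changed to match the file, for example ("metabolite_id",).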

For pathway analysis to be available, common gene names must be present in either the protein_id or gene_symbol column. Alternatively, UniProt IDs can be provided in protein_id or a uniprot_id column, and Pluto will automatically map them to gene symbols.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data is a protein-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be protein_id, which can be populated with common protein names, UniProt IDs, or gene symbols. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below (assay data and sample data) show how the values in the sample data’s sample_id column correspond to the column names in the assay data.

Column | Type | Requirement
protein_id | string | Required. First column; common protein name, UniProt ID, or gene symbol
gene_symbol | string | Optional. Use common gene names; required for pathway analysis if not provided in protein_id
uniprot_id | string | Optional. UniProt ID; Pluto can use this to map gene symbols for pathway analysis
<sample_id> | numeric | One column per sample; must match sample_id values in sample data exactly

Example:

protein_id | gene_symbol | ctrl_rep1 | ctrl_rep2 | treated_rep1 | treated_rep2
ACTB_HUMAN | ACTB | 12.3 | 12.1 | 12.4 | 12.2
P53_HUMAN | TP53 | 8.7 | 8.9 | 10.2 | 10.5
MYC_HUMAN | MYC | 9.1 | 9.3 | 11.8 | 11.6
GAPDH_HUMAN | GAPDH | 14.2 | 14.0 | 14.1 | 14.3

SAMPLE DATA

The sample data file is also uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.

Column | Requirement
sample_id | Required. First column; starts with a letter; letters, digits, and underscores only

Additional columns (e.g., condition, cell_type, tissue, treatment, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.

Replace any NA values with blanks before uploading.

Example:

sample_id | condition | cell_type | replicate
ctrl_rep1 | control | HeLa | 1
ctrl_rep2 | control | HeLa | 2
treated_rep1 | treated | HeLa | 1
treated_rep2 | treated | HeLa | 2


scRNA-seq

scRNA-seq experiments accept either FASTQ files or a processed Seurat object (RDS). FASTQ upload is supported for 10x Genomics 3’ and 10x Genomics FLEX experiments only, processed via the nf-core/scrnaseq pipeline. For all other scRNA-seq data, a processed Seurat object must be uploaded instead.

Processed Seurat object: The object must be uploaded in .rds format and must already be filtered, normalized, and clustered prior to upload. Raw, unprocessed Seurat objects are not supported. Only the RNA assay (located in so@assays$RNA@data) is used for differential expression and gene plotting, so the data in that assay must be normalized. Cluster annotation is optional and can be performed within Pluto after upload.

NOTE: the 10x Genomics FLEX protocol option is only available in Pluto after a supported genome is selected (e.g., GRCh38).

SAMPLE DATA

The sample data file follows the same structure as bulk RNA-seq. It is uploaded as a CSV (comma-separated) file and describes each sample's metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores. If uploading FASTQ files, include r1_fastq and r2_fastq columns.

NOTE: the 10x Genomics FLEX assay requires two additional columns: pool and probe_barcode_ids.

Column | Requirement
sample_id | Required. First column; starts with a letter; letters, digits, and underscores only
r1_fastq | Required if uploading FASTQ files. Filename of the R1 FASTQ
r2_fastq | Required if uploading paired-end FASTQ files. Filename of the R2 FASTQ
pool | Required for FLEX assays only. Pool identifier for the sample
probe_barcode_ids | Required for FLEX assays only. Probe barcode ID(s) associated with the sample

Additional columns (e.g., condition, cell_type, tissue, replicate) are encouraged.

Replace any NA values with blanks before uploading.

Example (standard 10x 3’):

sample_id | r1_fastq | r2_fastq | condition | replicate
ctrl_rep1 | ctrl_rep1_R1.fastq.gz | ctrl_rep1_R2.fastq.gz | control | 1
ctrl_rep2 | ctrl_rep2_R1.fastq.gz | ctrl_rep2_R2.fastq.gz | control | 2
treated_rep1 | treated_rep1_R1.fastq.gz | treated_rep1_R2.fastq.gz | treated | 1
treated_rep2 | treated_rep2_R1.fastq.gz | treated_rep2_R2.fastq.gz | treated | 2

Example (10x Genomics FLEX) with one pool, two lanes, and four biological samples:

sample_id | r1_fastq | r2_fastq | pool | probe_barcode_ids | condition | replicate
ctrl_rep1 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC001 | control | 1
ctrl_rep2 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC002 | control | 2
treated_rep1 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC003 | treated | 1
treated_rep2 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC004 | treated | 2
ctrl_rep1 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC001 | control | 1
ctrl_rep2 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC002 | control | 2
treated_rep1 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC003 | treated | 1
treated_rep2 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC004 | treated | 2

Metabolomics

Metabolomics experiments accept only a processed data table (CSV, comma-separated) — raw data upload is not available. As with proteomics, the uploaded data must be fully normalized, ready for comparisons, log-transformed, and contain only numeric values. Character entries indicating limits of detection (e.g., <LOD) are not permitted.

ASSAY DATA (PROCESSED DATA TABLE)

The assay data is a metabolite-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be metabolite_id. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below (assay data and sample data) show how the values in the sample data’s sample_id column correspond to the column names in the assay data.

Column | Type | Requirement
metabolite_id | string | Required. First column; name or identifier for the metabolite
<sample_id> | numeric | One column per sample; must match sample_id values in sample data exactly

Example:

metabolite_id | ctrl_rep1 | ctrl_rep2 | treated_rep1 | treated_rep2
Glucose | 8.2 | 8.4 | 6.1 | 6.3
Lactate | 6.5 | 6.7 | 9.2 | 9.0
Glutamine | 7.1 | 7.3 | 5.8 | 5.6
ATP | 9.4 | 9.2 | 7.8 | 7.9

SAMPLE DATA

The sample data file describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.

Column | Requirement
sample_id | Required. First column; starts with a letter; letters, digits, and underscores only

Additional columns (e.g., condition, cell_type, tissue, treatment, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.

Replace any NA values with blanks before uploading.

Example:

sample_id | condition | cell_type | replicate
ctrl_rep1 | control | HeLa | 1
ctrl_rep2 | control | HeLa | 2
treated_rep1 | treated | HeLa | 1
treated_rep2 | treated | HeLa | 2


Spatial Transcriptomics

Spatial transcriptomics data from 10x Genomics Xenium and Visium experiments can be uploaded as a processed Spatial object in H5AD (.h5ad) format. Raw data and FASTQ uploads are not supported.

Each experiment supports one sample only; multi-sample datasets must be split into separate experiments.

The object must already be filtered, normalized, and clustered prior to upload, and must include spatial coordinates. Cluster annotation is optional and can be performed within Pluto after upload.

SPATIAL OBJECT

Upload a processed Spatial object in .h5ad format. After upload, Pluto will extract key metadata — including sample information, cluster labels, and cell embeddings — for use in downstream analysis. You will be prompted to map which metadata columns correspond to samples, clusters, and other variables.

Keep the upload window open while the file transfers; you may leave the page once the object has been received.

IMAGE FILES

After uploading the Spatial object, image files are required to proceed. Pluto will alert you if the required image files are missing from the object.

NOTE: if an image is contained within the file, it cannot be edited or flipped within the platform.

Supported image formats:

  • OME-ZARR (.zarr)

  • OME-TIFF (.ome.tif / .ome.tiff)

  • SpatialData ZARR (.spatialdata.zarr)
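A file name can be checked against the supported formats with a simple suffix lookup. This sketch is illustrative only: image_format is a hypothetical helper, and matching case-insensitively and ignoring a trailing slash on directory-style ZARR stores are assumptions.

```python
# Longer suffixes come first so ".spatialdata.zarr" is not reported as plain ".zarr"
IMAGE_SUFFIXES = (
    (".spatialdata.zarr", "SpatialData ZARR"),
    (".zarr", "OME-ZARR"),
    (".ome.tiff", "OME-TIFF"),
    (".ome.tif", "OME-TIFF"),
)


def image_format(filename):
    """Return the supported format label for a file name, or None if unsupported."""
    name = filename.lower().rstrip("/")
    for suffix, label in IMAGE_SUFFIXES:
        if name.endswith(suffix):
            return label
    return None
```

A plain .tif or .tiff file returns None, since only the OME-TIFF variants appear in the supported list.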


FAQs

How do I format the sample data file if a sample has been sequenced in multiple lanes?

If a sample has been sequenced in multiple lanes, create separate rows for each lane in the sample data file. Each row should have the same sample_id but different r1_fastq and r2_fastq values corresponding to the different lanes. Pluto will automatically merge these during processing, resulting in only the unique sample_id values.

See below for a bulk RNA-seq example where 8 pairs of FASTQ files will be merged into 4 samples:

sample_id | r1_fastq | r2_fastq | condition | replicate
ctrl_rep1 | c_rep1_lane1_R1.fastq.gz | c_rep1_lane1_R2.fastq.gz | control | 1
ctrl_rep1 | c_rep1_lane2_R1.fastq.gz | c_rep1_lane2_R2.fastq.gz | control | 1
ctrl_rep2 | c_rep2_lane1_R1.fastq.gz | c_rep2_lane1_R2.fastq.gz | control | 2
ctrl_rep2 | c_rep2_lane2_R1.fastq.gz | c_rep2_lane2_R2.fastq.gz | control | 2
treated_rep1 | t_rep1_lane1_R1.fastq.gz | t_rep1_lane1_R2.fastq.gz | treated | 1
treated_rep1 | t_rep1_lane2_R1.fastq.gz | t_rep1_lane2_R2.fastq.gz | treated | 1
treated_rep2 | t_rep2_lane1_R1.fastq.gz | t_rep2_lane1_R2.fastq.gz | treated | 2
treated_rep2 | t_rep2_lane2_R1.fastq.gz | t_rep2_lane2_R2.fastq.gz | treated | 2
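To preview how lane rows will collapse into samples, the rows can be grouped by sample_id locally. This is a sketch of the grouping idea only, not Pluto's merge step: preview_lane_merge is a hypothetical helper, and flagging rows whose metadata differs between lanes is an added sanity check rather than a documented requirement.

```python
import csv
import io
from collections import defaultdict


def preview_lane_merge(sample_csv):
    """Group lane rows by sample_id.

    Returns (lanes, conflicts). `lanes` maps each sample_id to its list of
    (r1_fastq, r2_fastq) pairs; `conflicts` lists sample_ids whose other
    columns differ between rows.
    """
    lanes = defaultdict(list)
    metadata = {}
    conflicts = []
    for row in csv.DictReader(io.StringIO(sample_csv)):
        sid = row.pop("sample_id")
        lanes[sid].append((row.pop("r1_fastq", ""), row.pop("r2_fastq", "")))
        # Whatever is left in `row` is metadata and should match across lanes
        if metadata.setdefault(sid, row) != row and sid not in conflicts:
            conflicts.append(sid)
    return dict(lanes), conflicts
```

For the table above, lanes would hold 4 sample IDs with 2 FASTQ pairs each, and conflicts would be empty.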

What should I do when uploading processed ChIP-seq or CUT&RUN data when I have more BigWig samples than samples in my assay data counts table?

It is common for epigenetics pipelines to exclude input controls from the final counts table but still produce BigWig files for those samples, which are useful for viewing coverage. If you have BigWig files for samples that are not included in your assay data counts table, add those samples to both the assay data and sample data files. In the assay data, give those samples a mock count of 1 for each peak.

Here is an example of an assay data upload with an input control added:

peak_id | gene_symbol | EXP_1 | EXP_2 | input_control_1
C12orf45_chr12_1050326_1050334 | C12orf45 | 0 | 2 | 1
MIR7843_chr14_725993_726001 | MIR7843 | 4 | 22 | 1
SRRM1_chr1_246351_246357 | SRRM1 | 83 | 226 | 1
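Adding the mock columns by hand is error-prone for large peak tables, so a small script may help. The sketch below appends a column of 1s for each sample that is missing from the counts table. add_mock_columns is a hypothetical helper that assumes the assay data is available as CSV text; the matching rows still need to be added to the sample data file separately.

```python
import csv
import io


def add_mock_columns(assay_csv, extra_sample_ids, mock_value=1):
    """Append one column per missing sample, filled with `mock_value` for every peak."""
    rows = [r for r in csv.reader(io.StringIO(assay_csv)) if r]
    header, body = rows[0], rows[1:]
    # Skip samples that already have a column in the counts table
    new_ids = [s for s in extra_sample_ids if s not in header]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + new_ids)
    for row in body:
        writer.writerow(row + [mock_value] * len(new_ids))
    return out.getvalue()
```

Calling add_mock_columns(text, ["input_control_1"]) on a table with only EXP_1 and EXP_2 produces the layout shown in the example above.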


🚀 Ready to create new experiments?

We hope that the summary above helps you feel empowered to create new experiments and dive into analyses! For more resources, we encourage you to take a look at our Blog and Knowledge Base.

Please reach out to support@pluto.bio if you have additional questions.

As always, our scientific support team is here to help! 🧬