Data Upload Overview
Last updated: June 24, 2026
Written by Andrew Goodspeed, PhD
A guide that summarizes the data formats accepted by Pluto for each experiment type
Every Pluto experiment requires two uploads: assay data (the measurements) and sample data (the metadata describing each sample). This article will cover the following experiments:
(bulk) RNA-seq
CUT&RUN / ChIP-seq
ATAC-seq
Proteomics
scRNA-seq
Metabolomics
Spatial transcriptomics
RNA-seq
RNA-seq experiments accept either FASTQ files (raw sequencing reads) or a processed data table (CSV, comma-separated). If uploading FASTQ files, the sample data must include an r1_fastq column, plus an r2_fastq column for paired-end reads. If uploading processed data, choose from the accepted units below.
Accepted processed data units:
Raw counts (integer)
Expected counts (continuous)
Transcripts per million (TPM)
Reads per kilobase per million (RPKM)
Fragments per kilobase per million (FPKM)
NOTE: if uploading processed data, only raw counts and expected counts support differential expression analysis.
ASSAY DATA (PROCESSED DATA TABLE)
The assay data file is a gene-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be gene_symbol. Because the pathway analysis modules require common gene names (e.g., ACTB, TP53), upload common gene names where possible. If at least 90% of the values in the first column are Ensembl IDs, Pluto automatically converts them to common gene names. If the conversion produces duplicates, only the most highly expressed entry is retained. Ensembl IDs that are unmatched or have no gene symbol mapping are kept as-is.
Each subsequent column is named with the exact sample ID listed in the sample_id column of the sample data file. The two examples below show how the contents of the sample data’s sample_id column relate to the column names in the assay data.
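The 90% rule and the duplicate handling described above can be sketched in a few lines of Python. This is only an illustration, not Pluto's implementation: the regular expression for Ensembl gene IDs, the function names, and the choice to rank duplicates by total counts across samples are all assumptions made for the example.

```python
import re

# Assumption: Ensembl gene IDs look like ENSG00000141510 or ENSMUSG00000029580,
# optionally followed by a version suffix such as ".12".
ENSEMBL_GENE = re.compile(r"ENS[A-Z]*G\d{11}(\.\d+)?")


def mostly_ensembl(ids, threshold=0.9):
    """Return True if at least `threshold` of the IDs look like Ensembl gene IDs."""
    if not ids:
        return False
    hits = sum(1 for i in ids if ENSEMBL_GENE.fullmatch(i))
    return hits / len(ids) >= threshold


def convert_and_dedupe(rows, id_to_symbol):
    """rows: list of (gene_id, [counts]) tuples. Returns {name: counts}.

    IDs without a mapping are kept as-is. When several IDs map to the same
    symbol, the row with the largest total count is retained (assumed ranking).
    """
    best = {}
    for gene_id, counts in rows:
        name = id_to_symbol.get(gene_id, gene_id)
        if name not in best or sum(counts) > sum(best[name]):
            best[name] = counts
    return best
```

Running a check like mostly_ensembl on the first column before upload gives a rough idea of whether a file is likely to be treated as Ensembl IDs or as gene symbols.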
Column | Type | Requirement |
gene_symbol | string | Required: first column; use common gene names or Ensembl IDs |
<sample_id> | numeric | One column per sample; the column name must match a sample_id in the sample data file |
Example:
gene_symbol | ctrl_fem_1 | ctrl_fem_3 | cyp_fem_6 | cyp_male_5 |
Acsbg3 | 106 | 319 | 123 | 85 |
Acsf2 | 1267 | 1586 | 1247 | 1194 |
Acsl1 | 1497 | 1931 | 1662 | 1343 |
Acta1 | 8757 | 23898 | 13231 | 14879 |
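The requirement that assay column names match the sample_id values exactly can be checked before upload. The following is a minimal sketch using only the Python standard library; the function name and the inline CSV strings are hypothetical and stand in for your own files.

```python
import csv
import io


def compare_ids(assay_text, sample_text, id_columns=1):
    """Return (missing_from_assay, missing_from_sample_data) as sorted lists.

    id_columns is the number of leading non-sample columns in the assay file
    (1 for gene_symbol alone).
    """
    assay_header = next(csv.reader(io.StringIO(assay_text)))
    assay_cols = set(assay_header[id_columns:])
    sample_ids = {row["sample_id"] for row in csv.DictReader(io.StringIO(sample_text))}
    return sorted(sample_ids - assay_cols), sorted(assay_cols - sample_ids)


# Hypothetical inline data mirroring the RNA-seq example.
assay_csv = "gene_symbol,ctrl_fem_1,ctrl_fem_3\nAcsbg3,106,319\n"
sample_csv = "sample_id,sex\nctrl_fem_1,female\nctrl_fem_3,female\n"
print(compare_ids(assay_csv, sample_csv))
```

Two empty lists mean every sample has a matching assay column and vice versa. For assay files with a second identifier column (such as peak_id followed by gene_symbol), pass id_columns=2.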
SAMPLE DATA
The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column. Additionally, sample_id must start with a letter and contain only alphanumeric characters and underscores.
Column | Requirement |
sample_id | Required: first column; starts with a letter; letters, numbers, and underscores only |
r1_fastq | Required if uploading FASTQ files: filename of the R1 FASTQ |
r2_fastq | Required if uploading paired-end FASTQ files: filename of the R2 FASTQ |
Additional columns (e.g., sex, tissue, treatment, group) are encouraged. Column names must start with a letter and contain no special characters except underscores. For a sample run on multiple lanes, add one row per lane: the rows share the same sample_id and metadata and differ only in the r1_fastq and r2_fastq columns. Pluto will automatically merge them during processing.
Replace any NA values with blanks before uploading.
Example:
sample_id | r1_fastq | r2_fastq | sex | tissue | treatment |
ctrl_fem_1 | SRX18852859_1.fastq.gz | SRX18852859_2.fastq.gz | female | urinary bladder | Control |
ctrl_fem_3 | SRX18852860_1.fastq.gz | SRX18852860_2.fastq.gz | female | urinary bladder | Control |
cyp_fem_6 | SRX18852865_1.fastq.gz | SRX18852865_2.fastq.gz | female | urinary bladder | CYP |
cyp_male_5 | SRX18852868_1.fastq.gz | SRX18852868_2.fastq.gz | male | urinary bladder | CYP |
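The naming rules and the NA rule for sample data lend themselves to a quick pre-upload check. The sketch below is an assumption-laden illustration: the regular expression encodes "starts with a letter, then only letters, digits, and underscores", and only the literal string NA is treated as a missing value unless you pass other tokens. The function names are hypothetical.

```python
import re

# Starts with a letter; only letters, digits, and underscores afterwards.
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def invalid_names(names):
    """Return the sample IDs or column names that break the naming rule."""
    return [n for n in names if not NAME_RULE.fullmatch(n)]


def blank_na(rows, na_tokens=("NA",)):
    """Return a copy of the rows (list of dicts) with NA-like values blanked."""
    return [
        {key: ("" if value in na_tokens else value) for key, value in row.items()}
        for row in rows
    ]


print(invalid_names(["ctrl_fem_1", "1ctrl", "cyp-fem_6"]))
```

Note that the rule applies to sample_id values and column names, not to ordinary metadata values such as "urinary bladder".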
CUT&RUN / ChIP-seq
CUT&RUN and ChIP-seq experiments share the same data format in Pluto. Both accept FASTQ files or a processed data table (consensus peak counts, CSV). Each experiment should contain only one antibody, not counting the controls. If an experiment has multiple antibodies, it should be broken up into multiple experiments. BigWig files are optional and only applicable when uploading processed data — see note below.
BigWig files: BigWig files show how sequencing reads are distributed across the genome and are used for signal track visualization and genome-wide coverage heatmaps. If you choose to upload BigWig files, you must provide one for every sample, and every sample must appear in both the sample data and the assay data counts table. If your pipeline omits some samples (for example, input controls) from the counts table, add them with a mock count of 1 for every peak.
BED file (optional): After uploading BigWig files, you will have the option to upload a BED file containing the peak coordinates. This file is used to define the genomic regions displayed in signal track views.
ASSAY DATA (PROCESSED DATA TABLE)
The assay data is a peak-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be peak_id; gene_symbol is optional but recommended and must use common gene names for pathway analysis. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the contents of the sample data’s sample_id column relate to the column names in the assay data.
Column | Type | Requirement |
peak_id | string | Required: first column; alphanumeric with underscores only; can be a locus, gene symbol, or other identifier |
gene_symbol | string | Optional but recommended: use common gene names for pathway analysis |
<sample_id> | numeric | One column per sample; the column name must match a sample_id in the sample data file |
Example:
peak_id | gene_symbol | group1_rep1 | group1_rep2 | CD34_NT_IgG | group2_rep1 |
C12orf45_chr12_1050326_1050334 | C12orf45 | 0 | 2 | 1 | 1 |
MIR7843_chr14_725993_726001 | MIR7843 | 4 | 22 | 1 | 4 |
SRRM1_chr1_246351_246357 | SRRM1 | 83 | 226 | 1 | 95 |
RAI2_chrX_179211_179227 | RAI2 | 6 | 21 | 1 | 24 |
SAMPLE DATA
The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column and follow the same naming rules as above. antibody and input_control are required for all CUT&RUN and ChIP-seq experiments.
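Column | Requirement |
sample_id | Required: first column; starts with a letter; letters, numbers, and underscores only |
antibody | Required: antibody used (e.g., GATA1, or IgG for a control, as in the example below) |
input_control | Required: the sample_id of the control sample for this row, as in the example below; blank for the control samples themselves |
r1_fastq | Required if uploading FASTQ files: filename of the R1 FASTQ |
r2_fastq | Required if uploading paired-end FASTQ files: filename of the R2 FASTQ |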
Additional columns (e.g., genotype, cell_line, condition, replicate) are encouraged.
Replace any NA values with blanks before uploading.
Example:
sample_id | antibody | input_control | genotype | cell_line | r1_fastq | r2_fastq |
CD34_NT_IgG | IgG | | NT | CD34+ HSPCs | S11_R1.fastq.gz | S11_R2.fastq.gz |
group1_rep1 | GATA1 | CD34_NT_IgG | NT | CD34+ HSPCs | S12_R1.fastq.gz | S12_R2.fastq.gz |
group1_rep2 | GATA1 | CD34_NT_IgG | NT | CD34+ HSPCs | S13_R1.fastq.gz | S13_R2.fastq.gz |
CD34_VHLKO_IgG | IgG | | VHLKO | CD34+ HSPCs | S14_R1.fastq.gz | S14_R2.fastq.gz |
group2_rep1 | GATA1 | CD34_VHLKO_IgG | VHLKO | CD34+ HSPCs | S15_R1.fastq.gz | S15_R2.fastq.gz |
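The relationships in this example can be checked programmatically. The sketch below rests on two assumptions read off the example rather than stated rules: that each non-blank input_control value names another row's sample_id, and that control rows are the ones with a blank input_control. It also applies the one-antibody-per-experiment guideline to the non-control rows. The function name is hypothetical.

```python
def check_controls(rows):
    """rows: list of dicts with sample_id, antibody, input_control. Returns messages."""
    ids = {row["sample_id"] for row in rows}
    problems = []
    for row in rows:
        control = row.get("input_control", "").strip()
        if control and control not in ids:
            problems.append(
                f"{row['sample_id']}: input_control '{control}' is not a sample_id"
            )
    # Assumption: rows with a blank input_control are the controls themselves.
    antibodies = {
        row["antibody"] for row in rows if row.get("input_control", "").strip()
    }
    if len(antibodies) > 1:
        problems.append(
            "more than one non-control antibody: " + ", ".join(sorted(antibodies))
        )
    return problems


rows = [
    {"sample_id": "CD34_NT_IgG", "antibody": "IgG", "input_control": ""},
    {"sample_id": "group1_rep1", "antibody": "GATA1", "input_control": "CD34_NT_IgG"},
]
print(check_controls(rows))
```

An empty list means every referenced control exists and only one antibody is used outside the controls.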
ATAC-seq
ATAC-seq experiments use the same assay data format as CUT&RUN / ChIP-seq (a peak-by-sample count matrix), but because there is no antibody targeting, the sample data follows the simpler RNA-seq structure — i.e., no antibody or input_control columns are required. Both FASTQ files and processed data tables (consensus peak counts, CSV) are accepted. BigWig files are optional and only applicable when uploading processed data.
BigWig files: BigWig files show how sequencing reads are distributed across the genome and are used for signal track visualization and genome-wide coverage heatmaps. If you choose to upload BigWig files, you must provide one for every sample.
BED file (optional): After uploading BigWig files, you will have the option to upload a BED file containing the peak coordinates. This file is used to define the genomic regions displayed in signal track views.
ASSAY DATA (PROCESSED DATA TABLE)
The assay data file is a peak-by-sample matrix. The first column must be peak_id; gene_symbol is optional but recommended and must use common gene names (e.g., ACTB, TP53) for pathway analysis. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the contents of the sample data’s sample_id column relate to the column names in the assay data.
Column | Type | Requirement |
peak_id | string | Required: first column; alphanumeric with underscores only; can be a locus, gene symbol, or other identifier |
gene_symbol | string | Optional but recommended: use common gene names for pathway analysis |
<sample_id> | numeric | One column per sample; the column name must match a sample_id in the sample data file |
Example:
peak_id | gene_symbol | ctrl_rep1 | ctrl_rep2 | treated_rep1 | treated_rep2 |
C12orf45_chr12_105032601_105033450 | C12orf45 | 0 | 2 | 1 | 1 |
MIR7843_chr14_72599301_72600150 | MIR7843 | 4 | 22 | 1 | 4 |
SRRM1_chr1_24635101_24635700 | SRRM1 | 83 | 226 | 1 | 95 |
RAI2_chrX_17921151_17922750 | RAI2 | 6 | 21 | 1 | 24 |
SAMPLE DATA
The sample data file is uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.
Column | Requirement |
sample_id | Required: first column; starts with a letter; letters, numbers, and underscores only |
r1_fastq | Required if uploading FASTQ files: filename of the R1 FASTQ |
r2_fastq | Required if uploading paired-end FASTQ files: filename of the R2 FASTQ |
Additional columns (e.g., condition, cell_type, tissue, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.
Replace any NA values with blanks before uploading.
Example:
sample_id | r1_fastq | r2_fastq | condition | cell_type | replicate |
ctrl_rep1 | ctrl_rep1_R1.fastq.gz | ctrl_rep1_R2.fastq.gz | control | CD34+ HSPCs | 1 |
ctrl_rep2 | ctrl_rep2_R1.fastq.gz | ctrl_rep2_R2.fastq.gz | control | CD34+ HSPCs | 2 |
treated_rep1 | treated_rep1_R1.fastq.gz | treated_rep1_R2.fastq.gz | treated | CD34+ HSPCs | 1 |
treated_rep2 | treated_rep2_R1.fastq.gz | treated_rep2_R2.fastq.gz | treated | CD34+ HSPCs | 2 |
Proteomics
Proteomics experiments accept only a processed data table (CSV, comma-separated) — raw data upload is not available. The uploaded data must be fully normalized, log-transformed, contain only numeric values, and be ready for comparisons. Character entries indicating limits of detection (e.g., <LOD) are not permitted.
For pathway analysis to be available, common gene names must be present in either the protein_id or gene_symbol column. Alternatively, UniProt IDs can be provided in protein_id or a uniprot_id column, and Pluto will automatically map them to gene symbols.
ASSAY DATA (PROCESSED DATA TABLE)
The assay data is a protein-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be protein_id, which can be populated with common protein names, UniProt IDs, or gene symbols. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the contents of the sample data’s sample_id column relate to the column names in the assay data.
Column | Type | Requirement |
protein_id | string | Required: first column; common protein name, UniProt ID, or gene symbol |
gene_symbol | string | Optional: use common gene names; required for pathway analysis if not provided in protein_id |
uniprot_id | string | Optional: UniProt ID; Pluto can use this to map gene symbols for pathway analysis |
<sample_id> | numeric | One column per sample; the column name must match a sample_id in the sample data file |
Example:
protein_id | gene_symbol | ctrl_rep1 | ctrl_rep2 | treated_rep1 | treated_rep2 |
ACTB_HUMAN | ACTB | 12.3 | 12.1 | 12.4 | 12.2 |
P53_HUMAN | TP53 | 8.7 | 8.9 | 10.2 | 10.5 |
MYC_HUMAN | MYC | 9.1 | 9.3 | 11.8 | 11.6 |
GAPDH_HUMAN | GAPDH | 14.2 | 14.0 | 14.1 | 14.3 |
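Because character entries such as <LOD are not permitted, it can help to scan the sample columns for anything that is not a finite number before uploading. The following is a minimal sketch with the standard library; the function name is hypothetical, and treating blank cells, NaN, and infinity as problems is an assumption rather than a documented Pluto rule.

```python
import csv
import io
import math


def non_numeric_cells(csv_text, id_columns=("protein_id", "gene_symbol", "uniprot_id")):
    """Return (line_number, column, value) for each cell that is not a finite number."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, value in row.items():
            if column in id_columns:
                continue
            try:
                ok = math.isfinite(float(value))
            except (TypeError, ValueError):
                ok = False
            if not ok:
                problems.append((line_no, column, value))
    return problems


example = "protein_id,gene_symbol,ctrl_rep1,ctrl_rep2\nACTB_HUMAN,ACTB,12.3,12.1\nP53_HUMAN,TP53,8.7,<LOD\n"
print(non_numeric_cells(example))
```

The same check applies to other processed tables with numeric-only requirements if id_columns is adjusted, for example to ("metabolite_id",).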
SAMPLE DATA
The sample data file is also uploaded as a CSV (comma-separated) file, which describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.
Column | Requirement |
sample_id | Required: first column; starts with a letter; letters, numbers, and underscores only |
Additional columns (e.g., condition, cell_type, tissue, treatment, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.
Replace any NA values with blanks before uploading.
Example:
sample_id | condition | cell_type | replicate |
ctrl_rep1 | control | HeLa | 1 |
ctrl_rep2 | control | HeLa | 2 |
treated_rep1 | treated | HeLa | 1 |
treated_rep2 | treated | HeLa | 2 |
scRNA-seq
scRNA-seq experiments accept either FASTQ files or a processed Seurat object (RDS). FASTQ upload is supported for 10x Genomics 3’ and 10x Genomics FLEX experiments only, processed via the nf-core/scrnaseq pipeline. For all other scRNA-seq data, a processed Seurat object must be uploaded instead.
Processed Seurat object: The object must be uploaded in .rds format and must already be filtered, normalized, and clustered prior to upload. Raw, unprocessed Seurat objects are not supported. Only the RNA assay (located in so@assays$RNA@data) will be used for differential expression and gene plotting, so the data in that assay must be normalized. Cluster annotation is optional and can be performed within Pluto after upload.
NOTE: the 10x Genomics FLEX protocol option is only available in Pluto after a supported genome is selected (e.g., GRCh38).
SAMPLE DATA
The sample data file follows the same structure as bulk RNA-seq. It is uploaded as a CSV (comma-separated) file and describes each sample's metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores. If uploading FASTQ files, include r1_fastq and r2_fastq columns.
NOTE: the 10x Genomics FLEX assay requires two additional columns: pool and probe_barcode_ids.
Column | Requirement |
sample_id | Required: first column; starts with a letter; letters, numbers, and underscores only |
r1_fastq | Required if uploading FASTQ files: filename of the R1 FASTQ |
r2_fastq | Required if uploading paired-end FASTQ files: filename of the R2 FASTQ |
pool | Required for FLEX assays only: pool identifier for the sample |
probe_barcode_ids | Required for FLEX assays only: probe barcode ID(s) associated with the sample |
Additional columns (e.g., condition, cell_type, tissue, replicate) are encouraged.
Replace any NA values with blanks before uploading.
Example (standard 10x 3’):
sample_id | r1_fastq | r2_fastq | condition | replicate |
ctrl_rep1 | ctrl_rep1_R1.fastq.gz | ctrl_rep1_R2.fastq.gz | control | 1 |
ctrl_rep2 | ctrl_rep2_R1.fastq.gz | ctrl_rep2_R2.fastq.gz | control | 2 |
treated_rep1 | treated_rep1_R1.fastq.gz | treated_rep1_R2.fastq.gz | treated | 1 |
treated_rep2 | treated_rep2_R1.fastq.gz | treated_rep2_R2.fastq.gz | treated | 2 |
Example (10x Genomics FLEX) with one pool, two lanes, and four biological samples:
sample_id | r1_fastq | r2_fastq | pool | probe_barcode_ids | condition | replicate |
ctrl_rep1 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC001 | control | 1 |
ctrl_rep2 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC002 | control | 2 |
treated_rep1 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC003 | treated | 1 |
treated_rep2 | pool1_L001_R1.fastq.gz | pool1_L001_R2.fastq.gz | pool1 | BC004 | treated | 2 |
ctrl_rep1 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC001 | control | 1 |
ctrl_rep2 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC002 | control | 2 |
treated_rep1 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC003 | treated | 1 |
treated_rep2 | pool1_L002_R1.fastq.gz | pool1_L002_R2.fastq.gz | pool1 | BC004 | treated | 2 |
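For FLEX uploads, a quick check that every row carries both extra columns can catch omissions early. This is a small sketch only; the function name is hypothetical, and rows are assumed to be dictionaries such as those produced by csv.DictReader.

```python
FLEX_COLUMNS = ("pool", "probe_barcode_ids")


def flex_problems(rows):
    """rows: list of dicts, one per sample data row. Returns a list of messages."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column in FLEX_COLUMNS:
            # A missing key and an empty cell are both reported.
            if not (row.get(column) or "").strip():
                sample = row.get("sample_id", "?")
                problems.append(f"row {number} ({sample}): missing {column}")
    return problems


rows = [
    {"sample_id": "ctrl_rep1", "pool": "pool1", "probe_barcode_ids": "BC001"},
    {"sample_id": "ctrl_rep2", "pool": "pool1", "probe_barcode_ids": ""},
]
print(flex_problems(rows))
```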
Metabolomics
Metabolomics experiments accept only a processed data table (CSV, comma-separated) — raw data upload is not available. As with proteomics, the uploaded data must be fully normalized, ready for comparisons, log-transformed, and contain only numeric values. Character entries indicating limits of detection (e.g., <LOD) are not permitted.
ASSAY DATA (PROCESSED DATA TABLE)
The assay data is a metabolite-by-sample matrix uploaded as a CSV (comma-separated) file. The first column must be metabolite_id. Each subsequent column is named with the exact sample_id from the sample data file. The two examples below show how the contents of the sample data’s sample_id column relate to the column names in the assay data.
Column | Type | Requirement |
metabolite_id | string | Required: first column; name or identifier for the metabolite |
<sample_id> | numeric | One column per sample; the column name must match a sample_id in the sample data file |
Example:
metabolite_id | ctrl_rep1 | ctrl_rep2 | treated_rep1 | treated_rep2 |
Glucose | 8.2 | 8.4 | 6.1 | 6.3 |
Lactate | 6.5 | 6.7 | 9.2 | 9.0 |
Glutamine | 7.1 | 7.3 | 5.8 | 5.6 |
ATP | 9.4 | 9.2 | 7.8 | 7.9 |
SAMPLE DATA
The sample data file describes each sample’s metadata. sample_id must be the first column, start with a letter, and contain only alphanumeric characters and underscores.
Column | Requirement |
sample_id | Required: first column; starts with a letter; letters, numbers, and underscores only |
Additional columns (e.g., condition, cell_type, tissue, treatment, replicate) are encouraged. Column names must start with a letter and contain no special characters except underscores.
Replace any NA values with blanks before uploading.
Example:
sample_id | condition | cell_type | replicate |
ctrl_rep1 | control | HeLa | 1 |
ctrl_rep2 | control | HeLa | 2 |
treated_rep1 | treated | HeLa | 1 |
treated_rep2 | treated | HeLa | 2 |
Spatial Transcriptomics
Spatial transcriptomics data from 10x Genomics Xenium and Visium experiments can be uploaded as a processed Spatial object in H5AD (.h5ad) format; raw data and FASTQ upload are not supported.
Each experiment supports one sample only; multi-sample datasets must be split into separate experiments.
The object must already be filtered, normalized, and clustered prior to upload, and must include spatial coordinates. Cluster annotation is optional and can be performed within Pluto after upload.
SPATIAL OBJECT
Upload a processed Spatial object in .h5ad format. After upload, Pluto will extract key metadata — including sample information, cluster labels, and cell embeddings — for use in downstream analysis. You will be prompted to map which metadata columns correspond to samples, clusters, and other variables.
Keep the upload window open while the file transfers; you may leave the page once the object has been received.
IMAGE FILES
After uploading the Spatial object, image files are required to proceed. Pluto will alert you if the required image files are missing from the object.
NOTE: if an image is contained within the file, it cannot be edited or flipped within the platform.
Supported image formats:
OME-ZARR (.zarr)
OME-TIFF (.ome.tif / .ome.tiff)
SpatialData ZARR (.spatialdata.zarr)
FAQs
How do I format the sample data file if a sample has been sequenced in multiple lanes?
If a sample has been sequenced in multiple lanes, create a separate row for each lane in the sample data file. Each row should have the same sample_id but different r1_fastq and r2_fastq values corresponding to that lane. Pluto will automatically merge these rows during processing, so the processed data contains one entry per unique sample_id.
See below for a bulk RNA-seq example where 8 pairs of FASTQ files will be merged into 4 samples:
sample_id | r1_fastq | r2_fastq | condition | replicate |
ctrl_rep1 | c_rep1_lane1_R1.fastq.gz | c_rep1_lane1_R2.fastq.gz | control | 1 |
ctrl_rep1 | c_rep1_lane2_R1.fastq.gz | c_rep1_lane2_R2.fastq.gz | control | 1 |
ctrl_rep2 | c_rep2_lane1_R1.fastq.gz | c_rep2_lane1_R2.fastq.gz | control | 2 |
ctrl_rep2 | c_rep2_lane2_R1.fastq.gz | c_rep2_lane2_R2.fastq.gz | control | 2 |
treated_rep1 | t_rep1_lane1_R1.fastq.gz | t_rep1_lane1_R2.fastq.gz | treated | 1 |
treated_rep1 | t_rep1_lane2_R1.fastq.gz | t_rep1_lane2_R2.fastq.gz | treated | 1 |
treated_rep2 | t_rep2_lane1_R1.fastq.gz | t_rep2_lane1_R2.fastq.gz | treated | 2 |
treated_rep2 | t_rep2_lane2_R1.fastq.gz | t_rep2_lane2_R2.fastq.gz | treated | 2 |
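To preview how multi-lane rows collapse, the rows can be grouped by sample_id. The sketch below only illustrates the grouping and a consistency check on the remaining metadata; it does not reproduce how Pluto merges the reads, and the function names are hypothetical.

```python
FASTQ_COLUMNS = ("r1_fastq", "r2_fastq")


def group_lanes(rows):
    """Map each sample_id to the list of (r1, r2) FASTQ pairs listed for it."""
    lanes = {}
    for row in rows:
        pair = (row["r1_fastq"], row["r2_fastq"])
        lanes.setdefault(row["sample_id"], []).append(pair)
    return lanes


def inconsistent_samples(rows):
    """Return sample_ids whose non-FASTQ metadata differs between their rows."""
    first_seen, bad = {}, []
    for row in rows:
        meta = {k: v for k, v in row.items() if k not in FASTQ_COLUMNS}
        reference = first_seen.setdefault(row["sample_id"], meta)
        if reference != meta and row["sample_id"] not in bad:
            bad.append(row["sample_id"])
    return bad
```

For the table above, group_lanes would return four sample IDs with two FASTQ pairs each, and inconsistent_samples would return an empty list because condition and replicate agree across lanes.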
What should I do when uploading processed ChIP-seq or CUT&RUN data when I have more BigWig samples than samples in my assay data counts table?
Some epigenetics pipelines exclude input controls from the final counts table but still produce BigWig files for those samples, which are useful for viewing coverage. If you have BigWig files for samples that are not included in your assay data counts table, add those samples to both the assay data and sample data files, using a mock count of 1 for every peak in the assay data.
Here is an example of an assay data upload with an input control added:
peak_id | gene_symbol | EXP_1 | EXP_2 | input_control_1 |
C12orf45_chr12_1050326_1050334 | C12orf45 | 0 | 2 | 1 |
MIR7843_chr14_725993_726001 | MIR7843 | 4 | 22 | 1 |
SRRM1_chr1_246351_246357 | SRRM1 | 83 | 226 | 1 |
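Adding the mock columns by hand is error-prone for large peak tables, so a small script can help. This is a sketch under stated assumptions: the table is already loaded as a header list plus row lists (for example with the csv module), and the function name is hypothetical.

```python
def add_mock_columns(header, rows, missing_samples, mock=1):
    """Append one column per missing sample, filled with the mock count."""
    extra = list(missing_samples)
    new_header = list(header) + extra
    new_rows = [list(row) + [mock] * len(extra) for row in rows]
    return new_header, new_rows


header = ["peak_id", "gene_symbol", "EXP_1", "EXP_2"]
rows = [
    ["C12orf45_chr12_1050326_1050334", "C12orf45", 0, 2],
    ["MIR7843_chr14_725993_726001", "MIR7843", 4, 22],
]
new_header, new_rows = add_mock_columns(header, rows, ["input_control_1"])
print(new_header)
print(new_rows[0])
```

Remember to add a matching row for each added sample to the sample data file as well, so that the assay columns and sample_id values still agree.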
🚀 Ready to create new experiments?
We hope that the summary above helps you feel empowered to create new experiments and dive into analyses! For more resources, we encourage you to take a look at our Blog and Knowledge Base.
Please reach out to support@pluto.bio if you have additional questions.
As always, our scientific support team is here to help! 🧬