Data and Sample Retention Policy

Sequencing Data Retention

Due to the large size and high volume of sequencing data, we have very limited capacity for data retention.

BCL file retention

Illumina sequencers generate raw data in binary base call (BCL) format, which must be converted to FASTQ files before further data analysis. Our center uses the bcl2fastq Conversion Software, a standalone tool from Illumina, to demultiplex the data and convert BCL files to standard FASTQ files.
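As a rough sketch of what the conversion step involves, the Python snippet below assembles a bcl2fastq invocation using its --runfolder-dir, --output-dir and --sample-sheet options. The helper names and all paths are hypothetical placeholders, not our center's actual directory layout or pipeline, and the snippet only prints the command rather than running it.

```python
import shlex
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_bcl2fastq_command(run_folder: Path, output_dir: Path, sample_sheet: Path) -> List[str]:
    """Assemble a bcl2fastq argument list (hypothetical helper)."""
    return [
        "bcl2fastq",
        "--runfolder-dir", str(run_folder),   # run folder that contains the BCL files
        "--output-dir", str(output_dir),      # destination for demultiplexed FASTQ files
        "--sample-sheet", str(sample_sheet),  # sample sheet mapping index sequences to samples
    ]


def run_if_available(cmd: List[str]) -> None:
    """Run the command only if the executable is found on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_bcl2fastq_command(
        Path("/data/runs/RUN_FOLDER"),                  # hypothetical path
        Path("/data/fastq/RUN_FOLDER"),                 # hypothetical path
        Path("/data/runs/RUN_FOLDER/SampleSheet.csv"),  # hypothetical path
    )
    print(shlex.join(cmd))  # show the command instead of executing it
```

Building the argument list separately from running it makes the command easy to log or review before a long conversion job is launched.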

BCL files are retained only briefly for troubleshooting, usually for less than one month. Users who would like a copy of the BCL files must request it when submitting samples.

FASTQ file retention

FASTQ is a text-based sequencing data file format that stores both raw sequence data and the corresponding quality scores. FASTQ has become the standard format for storing NGS data from most sequencing systems and can be used as input for a wide variety of secondary data analysis tools.
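To illustrate how a FASTQ record carries both the sequence and its quality scores, here is a minimal Python reader. It assumes the common unwrapped four-line layout (@name, sequence, +, quality) and Phred+33 quality encoding, as written by current Illumina software; read_fastq and FastqRecord are illustrative names, not part of any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str

    def phred_scores(self, offset: int = 33) -> List[int]:
        # Assumes Phred+33 encoding: quality score = ASCII code - 33
        return [ord(c) - offset for c in self.quality]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return                      # end of input
        if not header.strip():
            continue                    # tolerate blank lines between records
        sequence = handle.readline().rstrip()
        plus = handle.readline()
        quality = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip(), sequence, quality)


if __name__ == "__main__":
    example = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\n!!5I\n"
    for rec in read_fastq(io.StringIO(example)):
        print(rec.name, rec.sequence, rec.phred_scores())
```

With Phred+33, the character I decodes to a quality score of 40 and ! to 0, so read2 above yields the scores [0, 0, 20, 40].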

FASTQ is usually the format in which data are delivered to users, and FASTQ files are stored at our center for three months. Once the data have been delivered, it is the user’s responsibility to keep the data files.

Other Sequence File Formats

FASTQ files are the starting point for data analysis. During secondary or tertiary analysis of NGS data, FASTQ files may be processed into other formats, such as .bam and .vcf. Our center works with the Bioinformatics Core to help you obtain data in the format you need.

For single-cell data, we use cellranger mkfastq to demultiplex the Illumina sequencer’s base call files (BCLs) into FASTQ files. This pipeline wraps Illumina’s bcl2fastq and is recommended by 10x Genomics for its platform.
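For reference, a cellranger mkfastq run is typically driven by an Illumina run folder and a sample sheet. The Python sketch below renders a 10x "simple" sample sheet (Lane, Sample, Index columns) and assembles the command with the --id, --run and --csv options. The helper names, sample name, index set and paths are hypothetical examples, and the options available depend on the installed Cell Ranger version.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, List, Tuple


def simple_sample_sheet(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render (lane, sample, index) rows as a simple Lane,Sample,Index CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Lane", "Sample", "Index"])
    writer.writerows(rows)
    return buf.getvalue()


def build_mkfastq_command(run_id: str, run_folder: Path, csv_path: Path) -> List[str]:
    """Assemble a cellranger mkfastq argument list (hypothetical helper)."""
    return [
        "cellranger", "mkfastq",
        f"--id={run_id}",        # name of the output directory cellranger creates
        f"--run={run_folder}",   # Illumina run folder containing the BCL files
        f"--csv={csv_path}",     # simple sample sheet, e.g. the CSV rendered above
    ]


if __name__ == "__main__":
    # Hypothetical sample name and 10x index set
    print(simple_sample_sheet([("1", "sample1", "SI-GA-A3")]), end="")
    cmd = build_mkfastq_command("run1_fastq", Path("/data/runs/RUN_FOLDER"), Path("samples.csv"))
    print(" ".join(cmd))  # show the command instead of executing it
```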

Further Cell Ranger analysis pipelines, such as cellranger count, aggr and vdj, can also be run if the prerequisites are met.

Sample/Library Retention

If you would like to keep the samples or libraries that have already been sequenced, you need to collect them from us and store them in your own freezers. We will keep them in our freezers for one month after the sequencing data are delivered. After one month, leftover samples and libraries will be discarded.