- Raw data quality control
Check raw data to ensure that the expected quantity and quality of sequencing data are met;
Contact the sequencing service institution if an additional sequencing run is necessary.
- Raw data filter
Low-quality sequencing data are removed to ensure the quality of downstream data analysis. For certain tasks, special treatment is performed (e.g., circular adaptors).
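As an illustration of the filtering step, the sketch below drops reads whose mean Phred quality falls below a threshold. It is a minimal example, not the pipeline actually used: the function names, the Phred+33 encoding, the strict 4-line FASTQ layout, and the threshold of 20 are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality string); a hypothetical record type for this sketch
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly 4 lines per read."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        header, sequence, _separator, quality = stripped[i:i + 4]
        yield header, sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(
    records: Iterable[FastqRecord], min_mean_quality: float = 20.0
) -> Iterator[FastqRecord]:
    """Keep only reads whose mean quality reaches the (assumed) threshold."""
    for record in records:
        if mean_phred(record[2]) >= min_mean_quality:
            yield record
```

A call such as `filter_reads(parse_fastq(open("reads.fastq")))` (file name hypothetical) would stream the retained reads. Adaptor handling and per-base trimming are outside the scope of this sketch.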
- Data analysis
Sequence assembly and annotation: Genome or transcriptome assembly and annotation; results are provided in .fasta format for further analysis;
Sequence variation analysis: Report sequence variation (SNP/SNV, mutation, etc.) in standard .bcf or .vcf format, or in a customized format for pipeline integration;
Genotyping: Recalibrate base call quality; report genotypes as genotype likelihoods;
Gene expression quantification: Quantify gene expression as raw counts, reads per kilobase per million reads (RPKM), or counts per million (CPM); report gene expression in .txt, .csv, .tsv, or .xlsx format.
Statistical test for differential gene expression: Provide raw statistical test results and results filtered by standard cutoffs.
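The CPM and RPKM units named under gene expression quantification can be sketched as follows. This is a minimal example under stated assumptions: the function names are hypothetical, and the library size is taken to be the sum of the counts in the table, whereas a real pipeline may use the total number of mapped reads instead.

```python
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: count / library size * 1e6."""
    total = sum(counts.values())  # assumed library size
    if total == 0:
        raise ValueError("library size is zero")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Reads per kilobase per million reads.

    RPKM = count / (gene length in kb * library size in millions)
         = count * 1e9 / (gene length in bp * library size)
    """
    total = sum(counts.values())  # assumed library size
    if total == 0:
        raise ValueError("library size is zero")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }
```

With counts of 500 and 1500 for two genes of 1000 bp and 3000 bp, the CPM values differ (250000 and 750000) while the RPKM values coincide, because RPKM also normalizes for gene length.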
- Post data analysis
Genome browser configuration: Configure a genome browser to host custom genomes and gene models and to compare genome sequences with other genome model systems; launch the genome browser at a user-provided URL;
Genome hosting/Data storage: Host users’ custom genomes and gene models; configure a data server to store users’ data;
Co-expression: Discover regulation for mechanism and function interpretation from large datasets;
Correlation/Gene Set Enrichment Analysis: Report variation clusters in custom format;
In silico upstream regulator/drug prediction: Discover potential upstream effector genes (e.g., transcription factors, protein modifications and interactions); report results in tabular format and as a network;
Machine/Statistical learning: Identify gene expression signatures and apply them to sample classification/diagnosis; report results in tabular format and as ROC curves;
Data mining: Mine public data as requested by users. Data sources include TCGA, ENCODE, Ensembl, NCBI, the 1000 Genomes Project, etc.
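For the ROC reporting mentioned under machine/statistical learning, the sketch below shows how ROC points and the area under the curve could be derived from classifier scores. It is an illustrative example only: the function names are hypothetical, labels are assumed to be 1 (positive) or 0 (negative), and higher scores are assumed to indicate the positive class.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Sweep the decision threshold from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

The point list can be written out as the tabular result, and the same points can be plotted to produce the ROC figure.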
- Publication support
Upload datasets to a data repository (e.g., GEO, SRA);
Acquire accession numbers and set up the data release date;
Draft the relevant methodology for the manuscript.