bambu is an R package for multi-sample transcript discovery and quantification using long-read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can be used directly for visualisation and downstream analysis such as differential gene expression or transcript usage.
You can install bambu from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("bambu")
By default, bambu takes a set of aligned reads (bam files), reference genome annotations (gtf file, TxDb object, or bambuAnnotation object), and a reference genome sequence (fasta file or BSgenome). bambu returns a SummarizedExperiment object containing the genomic coordinates of annotated and new transcripts and the transcript expression estimates.
We highly recommend using the same annotations that were used for genome alignment. If you have a gtf file and a fasta file, you can run bambu with the following options:
test.bam <- system.file("extdata", "SGNex_A549_directRNA_replicate5_run1_chr9_1_1000000.bam", package = "bambu")
fa.file <- system.file("extdata", "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa", package = "bambu")
gtf.file <- system.file("extdata", "Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf", package = "bambu")
bambuAnnotations <- prepareAnnotations(gtf.file)
se <- bambu(reads = test.bam, annotations = bambuAnnotations, genome = fa.file)
Transcript discovery only (no quantification)
bambu(reads = test.bam, annotations = txdb, genome = fa.file, quant = FALSE)
Quantification of annotated transcripts and genes only (no transcript/gene discovery)
bambu(reads = test.bam, annotations = txdb, genome = fa.file, discovery = FALSE)
Large sample number / limited memory
For larger sample numbers we recommend writing the processed data to a file:
bambu(reads = test.bam, rcOutDir = "./bambu/", annotations = bambuAnnotations, genome = fa.file)
For very large samples (>100 million reads) where memory is limiting, we recommend running bambu in lowMemory mode:
bambu(reads = test.bam, annotations = bambuAnnotations, genome = fa.file, lowMemory = TRUE)
You can also use precalculated annotations.
If you plan to run bambu more frequently, we recommend saving the bambuAnnotations object.
The bambuAnnotation object can be calculated from a .gtf file:
annotations <- prepareAnnotations(gtf.file)
or from a TxDb object:
annotations <- prepareAnnotations(txdb)
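One way to save and reuse the annotations object is base R's saveRDS and readRDS. The sketch below is only an illustration: cacheRds is a hypothetical helper that is not part of bambu, and the file name annotations.rds is an assumption.

```r
# Hypothetical helper (not part of bambu): build an object once,
# store it as an .rds file, and reload it on later calls.
cacheRds <- function(path, build) {
    if (file.exists(path)) {
        return(readRDS(path))   # reuse the previously saved object
    }
    obj <- build()              # compute the object the first time
    saveRDS(obj, file = path)   # store it for later sessions
    obj
}

# Intended use with bambu (requires the package and a gtf file);
# "annotations.rds" is an assumed file name:
# library(bambu)
# annotations <- cacheRds("annotations.rds",
#                         function() prepareAnnotations(gtf.file))
```

Delete the .rds file whenever the underlying gtf file changes, since the helper does not detect stale caches.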
More stringent filtering thresholds imposed on potential novel transcripts
bambu(reads, annotations, genome, opt.discovery = list(min.readCount = 5))
bambu(reads, annotations, genome, opt.discovery = list(min.sampleNumber = 5))
bambu(reads, annotations, genome, opt.discovery = list(min.readFractionByGene = 0.1))
bambu(reads, annotations, genome, NDR = 0.5)
Quantification without bias correction
By default, bambu applies bias correction to the expression estimates. However, you can choose to perform the quantification without bias correction.
bambu(reads, annotations, genome, opt.em = list(bias = FALSE))
Parallel computation
bambu allows parallel computation.
bambu(reads, annotations, genome, ncore = 8)
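If you would rather not hard-code the number of cores, one option is to derive it from the machine. This is a minimal sketch using the base parallel package; leaving one core free is a convention assumed here, not a bambu recommendation.

```r
# Choose a worker count from the machine, leaving one core free.
available <- parallel::detectCores()   # may return NA on some platforms
ncore <- if (is.na(available)) 1L else max(1L, available - 1L)

# Then, with bambu loaded and reads, annotations, and genome defined:
# se <- bambu(reads, annotations, genome, ncore = ncore)
```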
See our page for a complete step-by-step workflow and manual on how to customize other conditions.
bambu will output different results depending on whether quant mode is on.
By default, quant is set to TRUE, so bambu will generate a SummarizedExperiment object that contains the transcript expression estimates.
When quant is set to FALSE, i.e., only transcript discovery is performed, bambu returns a GRangesList of the extended annotations.
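The returned object can be inspected with the standard SummarizedExperiment accessors. The sketch below reruns the bundled example with the default quant = TRUE. It assumes the assay holding the expression estimates is named counts, so check assayNames(se) on your own object first.

```r
library(bambu)
library(SummarizedExperiment)

# Bundled example data shipped with the package
test.bam <- system.file("extdata", "SGNex_A549_directRNA_replicate5_run1_chr9_1_1000000.bam", package = "bambu")
fa.file <- system.file("extdata", "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa", package = "bambu")
gtf.file <- system.file("extdata", "Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf", package = "bambu")

se <- bambu(reads = test.bam, annotations = prepareAnnotations(gtf.file), genome = fa.file)

assayNames(se)            # names of the available assays
head(assays(se)$counts)   # assumes an assay named "counts"
rowData(se)               # per-transcript metadata
rowRanges(se)             # genomic coordinates of the transcripts
```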
Transcript expression to gene expression
transcriptToGeneExpression(se)
Visualization
You can visualize the novel genes/transcripts using the plotBambu function:
plotBambu(se, type = "annotation", gene_id)
plotBambu(se, type = "annotation", transcript_id)
plotBambu(se, type = "heatmap") # heatmap
plotBambu(se, type = "pca") # PCA visualization
plotBambu(se, type = "heatmap", group.var) # heatmap
plotBambu(se, type = "pca", group.var) # PCA visualization
Write bambu outputs to files
writeBambuOutput(se, path = "./bambu/")
bambu version 1.99.0
Release date: 2021-10-18
Major Changes:
Minor fixes:
bambu version 1.0.2
Release date: 2020-11-10
bambu version 1.0.0
Release date: 2020-11-06
bambu version 0.99.4
Release date: 2020-08-18
bambu version 0.3.0
Release date: 2020-07-27
bambu version 0.2.0
Release date: 2020-06-18
bambu version 0.1.0
Release date: 2020-05-29
A manuscript describing bambu is currently in preparation. If you use bambu for your research, please cite using the following doi: 10.18129/B9.bioc.bambu. Please specify that you are using a pre-publication release.
This package is developed and maintained by Ying Chen, Andre Sim, Yuk Kei Wan, and Jonathan Goeke at the Genome Institute of Singapore. If you want to contribute, please open an issue. Thank you.