Prepare reference annotations for long-read RNA-Seq analysis with Bambu

prepareAnnotations(x)

Arguments

x	A path to a GTF file or a `TxDb` object.

Value

A GRangesList object with the additional exon- and transcript-level details that Bambu requires. Within each transcript, the exon_rank column gives each exon's rank in the direction of transcription (from first to last exon). In addition to the exon rank, the gene id, the transcript id, and the minimum transcript equivalence class are stored. The transcript equivalence class of a transcript x is the set of transcripts whose exon junctions contain the exon junctions of x as a contiguous run. The object is designed to be used by Bambu; accessing its metadata directly is not recommended.
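
The equivalence class definition above can be illustrated with a small base R sketch. It is not Bambu's implementation: the toy junction data and the helpers `containsContiguously` and `equivalenceClass` are hypothetical names introduced here, and each transcript is assumed to be described only by its ordered splice junctions.

```r
# Hypothetical toy data: each transcript is described by its ordered
# splice junctions, written as "donor-acceptor" strings.
junctions <- list(
    txA = c("100-200", "300-400"),
    txB = c("50-80", "100-200", "300-400", "500-600"),
    txC = c("100-200", "500-600"),
    txD = c("300-400")
)

# TRUE if `query` occurs as a contiguous run inside `target`.
# Transcripts without junctions (single exon) never match in this sketch.
containsContiguously <- function(query, target) {
    n <- length(query)
    m <- length(target)
    if (n == 0 || n > m) return(FALSE)
    for (start in seq_len(m - n + 1)) {
        if (all(target[start:(start + n - 1)] == query)) return(TRUE)
    }
    FALSE
}

# Illustrative helper (not a bambu function): names of all transcripts
# whose junction chain contains that of `tx` as a contiguous run.
equivalenceClass <- function(tx, junctions) {
    hits <- vapply(junctions, containsContiguously,
        logical(1), query = junctions[[tx]])
    names(junctions)[hits]
}

eqA <- equivalenceClass("txA", junctions)
eqD <- equivalenceClass("txD", junctions)
```

In this toy data, txC shares both of its junctions with txB but not as a contiguous run, so txB does not fall into the class of txC.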

Details

This function creates a reference annotation object that Bambu uses for transcript discovery and quantification. prepareAnnotations accepts either a path to a GTF file or a TxDb object and returns an annotation object storing the additional transcript information that Bambu needs. For each transcript, exons are ranked from first to last in the direction of transcription.
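
As a rough illustration of ranking exons in the direction of transcription, the base R sketch below numbers the exons of a single toy transcript. The data frame and the `rankExons` helper are hypothetical and are not how Bambu computes these columns internally; the column names `exon_rank` and `exon_endRank` simply mirror those shown in the example output, where `exon_endRank` counts from the last exon backwards.

```r
# Hypothetical exons of one minus-strand transcript, listed by
# increasing genomic start position.
exons <- data.frame(
    start = c(100, 300, 500),
    end = c(150, 350, 550),
    strand = "-"
)

# Illustrative helper (not part of bambu): number exons in the
# direction of transcription for a single transcript.
rankExons <- function(exons) {
    n <- nrow(exons)
    # Position of each exon when sorted by genomic start.
    position <- integer(n)
    position[order(exons$start)] <- seq_len(n)
    # On the minus strand transcription runs from high to low
    # coordinates, so the genomic order is reversed.
    minus <- exons$strand[1] == "-"
    exons$exon_rank <- if (minus) n + 1L - position else position
    # Rank counted from the last exon of the transcript.
    exons$exon_endRank <- n + 1L - exons$exon_rank
    exons
}

ranked <- rankExons(exons)
```

For a plus-strand transcript the same helper leaves the genomic order unchanged, so the leftmost exon receives `exon_rank` 1.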

Examples

gtf.file <- system.file("extdata",
    "Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf",
    package = "bambu"
)
annotations <- prepareAnnotations(x = gtf.file)

# run bambu
test.bam <- system.file("extdata",
    "SGNex_A549_directRNA_replicate5_run1_chr9_1_1000000.bam", 
    package = "bambu")
fa.file <- system.file("extdata", 
    "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa", 
    package = "bambu")
se <- bambu(reads = test.bam, annotations = annotations, 
    genome = fa.file, discovery = TRUE, quant = TRUE)
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> '?repositories' for details
#> 
#> replacement repositories:
#>     CRAN: https://cloud.r-project.org
#> Start generating read class files
#> iteration: 
#> not all chromosomes present in reference annotations,
#>             annotations might be incomplete. Please compare objects
#>             on the same reference
#> not all chromosomes from reads present in reference genome 
#>             sequence, reads without reference chromosome sequence are dropped
#> Junction correction with not enough data,
#>             precalculated model is used
#> Not enough data points
#> Not enough TRUE/FALSE labels
#> Transcript model not trained. Using pre-trained models
#> 1
#> 
#> Finished generating read classes from genomic alignments.
#> iteration: 
#> 1
#> 
#> iteration: 
#> 1
#> 
#> iteration: 
#> 1
#> 
#> iteration: 
#> 1
#> 
#> Less than 50 TRUE or FALSE read classes for precision stabilization. 
#>           Filtering by prediction score instead
#> -- Predicting annotation completeness to determine NDR threshold --
#> Calculated NDR: 0.005
#> all detect novel transcripts are already present in the annotations, try a higher NDR
#> GRangesList object of length 105:
#> $ENST00000190165
#> GRanges object with 2 ranges and 2 metadata columns:
#>       seqnames        ranges strand | exon_rank exon_endRank
#>          <Rle>     <IRanges>  <Rle> | <integer>    <integer>
#>   [1]        9 976964-977455      + |         1            2
#>   [2]        9 990041-991731      + |         2            1
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $ENST00000305248
#> GRanges object with 2 ranges and 2 metadata columns:
#>       seqnames      ranges strand | exon_rank exon_endRank
#>          <Rle>   <IRanges>  <Rle> | <integer>    <integer>
#>   [1]        9 34965-35264      - |         2            1
#>   [2]        9 35504-35871      - |         1            2
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $ENST00000314367
#> GRanges object with 16 ranges and 2 metadata columns:
#>        seqnames        ranges strand | exon_rank exon_endRank
#>           <Rle>     <IRanges>  <Rle> | <integer>    <integer>
#>    [1]        9 121060-121573      - |        16            1
#>    [2]        9 121961-122090      - |        15            2
#>    [3]        9 123217-123282      - |        14            3
#>    [4]        9 123386-123454      - |        13            4
#>    [5]        9 134979-135030      - |        12            5
#>    ...      ...           ...    ... .       ...          ...
#>   [12]        9 172081-172172      - |         5           12
#>   [13]        9 173270-173366      - |         4           13
#>   [14]        9 175698-175784      - |         3           14
#>   [15]        9 177723-177820      - |         2           15
#>   [16]        9 178816-179047      - |         1           16
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> ...
#> <102 more elements>
#> DataFrame with 105 rows and 10 columns
#>                          TXNAME          GENEID      txid
#>                     <character>     <character> <logical>
#> ENST00000190165 ENST00000190165 ENSG00000064218        NA
#> ENST00000305248 ENST00000305248 ENSG00000218839        NA
#> ENST00000314367 ENST00000314367 ENSG00000172785        NA
#> ENST00000354485 ENST00000354485 ENSG00000107104        NA
#> ENST00000356521 ENST00000356521 ENSG00000172785        NA
#> ...                         ...             ...       ...
#> ENST00000620292 ENST00000620292 ENSG00000172785        NA
#> ENST00000620326 ENST00000620326 ENSG00000277631        NA
#> ENST00000621255 ENST00000621255 ENSG00000277631        NA
#> ENST00000622412 ENST00000622412 ENSG00000277631        NA
#> ENST00000628764 ENST00000628764 ENSG00000227155        NA
#>                                eqClass eqClassById readCount  newTxClass
#>                            <character>   <logical> <integer> <character>
#> ENST00000190165        ENST00000190165          NA        NA  annotation
#> ENST00000305248 ENST00000305248.ENST..          NA        NA  annotation
#> ENST00000314367        ENST00000314367          NA        NA  annotation
#> ENST00000354485 ENST00000354485.ENST..          NA        NA  annotation
#> ENST00000356521 ENST00000356521.ENST..          NA        22  annotation
#> ...                                ...         ...       ...         ...
#> ENST00000620292 ENST00000498044.ENST..          NA        NA  annotation
#> ENST00000620326        ENST00000620326          NA        NA  annotation
#> ENST00000621255        ENST00000621255          NA        NA  annotation
#> ENST00000622412        ENST00000622412          NA        NA  annotation
#> ENST00000628764        ENST00000628764          NA        NA  annotation
#>                     txNDR relReadCount relSubsetCount
#>                 <numeric>    <numeric>      <numeric>
#> ENST00000190165        NA           NA             NA
#> ENST00000305248        NA           NA             NA
#> ENST00000314367        NA           NA             NA
#> ENST00000354485        NA           NA             NA
#> ENST00000356521 0.0654418     0.468085              1
#> ...                   ...          ...            ...
#> ENST00000620292        NA           NA             NA
#> ENST00000620326        NA           NA             NA
#> ENST00000621255        NA           NA             NA
#> ENST00000622412        NA           NA             NA
#> ENST00000628764        NA           NA             NA
#> Finished extending annotations.
#> Start isoform quantification
#> iteration: 
#> 1
#> 
#> Finished isoform quantification.
