BamFile {Rsamtools} | R Documentation |
Maintain and use BAM files
Description
Use BamFile() to create a reference to a BAM file (and optionally its index). The reference remains open across calls to methods, avoiding costly index re-loading.
BamFileList() provides a convenient way of managing a list of BamFile instances.
Usage
## Constructors
BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE,
        asMates=FALSE, qnamePrefixEnd=NA, qnameSuffixStart=NA)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE, asMates=FALSE,
            qnamePrefixEnd=NA, qnameSuffixStart=NA)
## Opening / closing
## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)
## accessors; also path(), index(), yieldSize()
## S4 method for signature 'BamFile'
isOpen(con, rw="")
## S4 method for signature 'BamFile'
isIncomplete(con)
## S4 method for signature 'BamFile'
obeyQname(object, ...)
obeyQname(object, ...) <- value
## S4 method for signature 'BamFile'
asMates(object, ...)
asMates(object, ...) <- value
## S4 method for signature 'BamFile'
qnamePrefixEnd(object, ...)
qnamePrefixEnd(object, ...) <- value
## S4 method for signature 'BamFile'
qnameSuffixStart(object, ...)
qnameSuffixStart(object, ...) <- value
## actions
## S4 method for signature 'BamFile'
scanBamHeader(files, ..., what=c("targets", "text"))
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFileList'
seqinfo(x)
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ...,
          filter=FilterRules(), indexDestination=TRUE,
          param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512, byTag=NULL, nThreads=1L)
## S4 method for signature 'BamFileList'
mergeBam(files, destination, ...)
## reading
## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## counting
## S4 method for signature 'BamFile'
idxstatsBam(file, index=file, ...)
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
quickBamFlagSummary(file, ..., param=ScanBamParam(), main.groups.only=FALSE)
Arguments
... |
Additional arguments. For |
con |
An instance of |
x , object , file , files |
A character vector of BAM file paths
(for |
index |
character(1); the BAM index file path (for
|
yieldSize |
Number of records to yield each time the file
is read from with |
asMates |
Logical indicating if records should be paired as mates. See ‘Fields’ section for details. |
qnamePrefixEnd |
Single character (or NA) marking the
end of the qname prefix. When specified, all characters prior to
and including the |
qnameSuffixStart |
Single character (or NA) marking the
start of the qname suffix. When specified, all characters following
and including the |
obeyQname |
Logical indicating if the BAM file is sorted
by |
value |
Logical value for setting |
what |
For |
filter |
A |
destination |
character(1) file path to write filtered reads to. |
indexDestination |
logical(1) indicating whether the destination file should also be indexed. |
byQname , maxMemory , byTag , nThreads |
See |
param |
An optional |
rw |
Mode of file; ignored. |
main.groups.only |
See |
Objects from the Class
Objects are created by calls of the form BamFile().
Fields
The BamFile class inherits fields from the RsamtoolsFile class and has fields:
- yieldSize:
Number of records to yield each time the file is read from using scanBam or, when length(bamWhich()) != 0, a threshold which yields records in complete ranges whose sum first exceeds yieldSize. Setting yieldSize on a BamFileList does not alter existing yield sizes set on the individual BamFile instances.
- asMates:
A logical indicating if the records should be returned as mated pairs. When TRUE, scanBam attempts to mate (pair) the records and returns two additional fields, groupid and mate_status. groupid is an integer vector of unique group ids; mate_status is a factor with level mated for records successfully paired by the algorithm, ambiguous for records that are possibly mates but cannot be assigned unambiguously, or unmated for reads that did not have valid mates.
Mate criteria:
Bit 0x40 and 0x80: Segments are a pair of first/last OR neither segment is marked first/last
Bit 0x100: Both segments are secondary OR both not secondary
Bit 0x10 and 0x20: Segments are on opposite strands
mpos match: segment1 mpos matches segment2 pos AND segment2 mpos matches segment1 pos
tid match
Flags, tags and ranges may be specified in the ScanBamParam for fine tuning of results.
- obeyQname:
A logical(1) indicating if the file was sorted by qname. In Bioconductor > 2.12 paired-end files do not need to be sorted by qname. Instead set asMates=TRUE in the BamFile when using the readGAlignmentsList function from the GenomicAlignments package.
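The asMates field can be illustrated with a short sketch. It assumes the ex1.bam file shipped with Rsamtools and requests only the qname and flag fields; the variable names (bf, param, res) are chosen here for illustration.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

## Request mate pairing when the reference is created.
bf <- BamFile(fl, asMates=TRUE)

## Only 'qname' and 'flag' are requested; pairing adds two more fields.
param <- ScanBamParam(what=c("qname", "flag"))
res <- scanBam(bf, param=param)[[1]]

names(res)               # requested fields plus 'groupid' and 'mate_status'
table(res$mate_status)   # number of records per pairing status
```

Records that share a groupid value belong to the same candidate group; mate_status reports how the pairing was resolved for each record.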
Functions and methods
BamFileList inherits additional methods from RsamtoolsFileList and SimpleList.
Opening / closing:
- open.BamFile
Opens the (local or remote) path and index files (the index only if bamIndex is not character(0)). Returns a BamFile instance.
- close.BamFile
Closes the BamFile con, returning (invisibly) the updated BamFile. The instance may be re-opened with open.BamFile.
- isOpen
Tests whether the BamFile con has been opened for reading.
- isIncomplete
Tests whether the BamFile con is neither closed nor at the end of the file.
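A minimal sketch of the open / close life cycle, assuming the ex1.bam file shipped with Rsamtools. The names was_open, now_open and after_close are illustrative only.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

bf <- BamFile(fl)          # a reference only; nothing is opened yet
was_open <- isOpen(bf)

bf <- open(bf)             # opens the BAM file and, if present, its index
now_open <- isOpen(bf)

bf <- close(bf)            # close() returns the updated BamFile invisibly
after_close <- isOpen(bf)

c(before=was_open, opened=now_open, closed=after_close)
```

Keeping the file open between calls is what lets repeated scanBam() or countBam() calls on the same BamFile avoid re-loading the index.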
Accessors:
- path
Returns a character(1) vector of BAM path names.
- index
Returns a character(0) or character(1) vector of BAM index path names.
- yieldSize, yieldSize<-
Return or set an integer(1) vector indicating yield size.
- obeyQname, obeyQname<-
Return or set a logical(1) indicating if the file was sorted by qname.
- asMates, asMates<-
Return or set a logical(1) indicating if the records should be returned as mated pairs.
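The accessors can be tried on a freshly created reference. This is a sketch assuming the ex1.bam file shipped with Rsamtools, whose index sits next to it in the package's extdata directory.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

path(bf)                 # path to the BAM file
index(bf)                # path to the index file

yieldSize(bf)            # NA_integer_ unless set in the constructor
yieldSize(bf) <- 500     # later scanBam() calls on 'bf' read in chunks
yieldSize(bf)

asMates(bf)              # FALSE by default
asMates(bf) <- TRUE
asMates(bf)
```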
Methods:
- scanBamHeader
Visit the path in path(file), returning the information contained in the file header; see scanBamHeader.
- seqinfo, seqnames, seqlengths
Visit the path in path(file), returning a Seqinfo, character, or named integer vector containing information on the names and / or lengths of each sequence. Seqnames are ordered as they appear in the file.
- scanBam
Visit the path in path(file), returning the result of scanBam applied to the specified path.
- countBam
Visit the path(s) in path(file), returning the result of countBam applied to the specified path.
- idxstatsBam
Visit the index in index(file), quickly returning a data.frame with columns seqnames, seqlength, mapped (number of mapped reads on seqnames) and unmapped (number of unmapped reads).
- filterBam
Visit the path in path(file), returning the result of filterBam applied to the specified path. A single file can be filtered to one or several destinations, as described in filterBam.
- indexBam
Visit the path in path(file), returning the result of indexBam applied to the specified path.
- sortBam
Visit the path in path(file), returning the result of sortBam applied to the specified path.
- mergeBam
Merge several BAM files into a single BAM file. See mergeBam for details; additional arguments supported by mergeBam,character-method are also available for BamFileList.
- show
Compactly display the object.
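A few of the methods above can be sketched on a single BamFile. The sketch assumes the ex1.bam file shipped with Rsamtools; hdr, cnt and stats are illustrative names.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

## Header information: reference sequence lengths and header text.
hdr <- scanBamHeader(bf)
names(hdr)
hdr$targets

## Record and nucleotide counts for the whole file.
cnt <- countBam(bf)
cnt[, c("records", "nucleotides")]

## Per-sequence summary read from the index rather than the BAM file.
stats <- idxstatsBam(bf)
stats
```

Because idxstatsBam() consults only the index, it is usually the cheaper choice when only per-sequence read counts are needed.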
Author(s)
Martin Morgan and Marc Carlson
See Also
- The readGAlignments, readGAlignmentPairs, and readGAlignmentsList functions defined in the GenomicAlignments package.
- summarizeOverlaps and findSpliceOverlaps-methods in the GenomicAlignments package for methods that work on BamFile and BamFileList objects.
Examples
##
## BamFile options.
##
library(Rsamtools)
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bf <- BamFile(fl)
bf
## When 'asMates=TRUE' scanBam() reads the data in as
## pairs. See 'asMates' above for details of the pairing
## algorithm.
asMates(bf) <- TRUE
## When 'yieldSize' is set, scanBam() will iterate
## through the file in chunks.
yieldSize(bf) <- 500
## Some applications append a filename (e.g., NCBI Sequence Read
## Archive (SRA) toolkit) or allele identifier to the sequence qname.
## This may result in a unique qname for each record which presents a
## problem when mating paired-end reads (identical qnames are one
## criterion for paired-end mating). 'qnamePrefixEnd' and
## 'qnameSuffixStart' can be used to trim an unwanted prefix or suffix.
qnamePrefixEnd(bf) <- "/"
qnameSuffixStart(bf) <- "."
##
## Reading Bam files.
##
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
(bf <- BamFile(fl))
head(seqlengths(bf)) # sequences and lengths in BAM file
if (require(RNAseqData.HNRNPC.bam.chr14)) {
    bfl <- BamFileList(RNAseqData.HNRNPC.bam.chr14_BAMFILES)
    bfl
    bfl[1:2] # subset
    bfl[[1]] # select first element -- BamFile
    ## merged across BAM files
    seqinfo(bfl)
    head(seqlengths(bfl))
}
length(scanBam(fl)[[1]][[1]]) # all records
bf <- open(BamFile(fl)) # implicit index
bf
identical(scanBam(bf), scanBam(fl))
close(bf)
## Use 'yieldSize' to iterate through a file in chunks.
bf <- open(BamFile(fl, yieldSize=1000))
while (nrec <- length(scanBam(bf)[[1]][[1]]))
    cat("records:", nrec, "\n")
close(bf)
## Repeatedly visit multiple ranges in the BamFile.
rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
bf <- open(BamFile(fl))
sapply(seq_along(rng), function(i, bamFile, rng) {
    param <- ScanBamParam(which=rng[i], what="seq")
    bam <- scanBam(bamFile, param=param)[[1]]
    alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)
close(bf)