@gmod/cram

GMOD | 4.1k | MIT | 5.0.5

Read CRAM files with pure JavaScript

cram, genomics, bionode, biojs

Read CRAM files (indexed or unindexed) with pure JS. Works in Node.js or in the browser.

  • Reads CRAM 3.x and 2.x (3.1 added in v1.6.0)
  • Does not read CRAM 1.x
  • Can use .crai indexes out of the box for efficient sequence fetching, and also has an index API that allows use with other index types
  • Has preliminary support for bzip2 and lzma codecs. lzma support requires a recent @gmod/cram version (it was added in v2.0.0) and uses WebAssembly. If your toolchain cannot compile or bundle it, try downgrading to an earlier version

Install

$ npm install --save @gmod/cram
# or
$ yarn add @gmod/cram

Usage

const { IndexedCramFile, CramFile, CraiIndex } = require('@gmod/cram')

// Use indexedfasta library for seqFetch, if using local file (see below)
const { IndexedFasta, BgzipIndexedFasta } = require('@gmod/indexedfasta')

// this uses local file paths for node.js for IndexedFasta, for usages using
// remote URLs see indexedfasta docs for filehandles and
// https://github.com/gmod/generic-filehandle2
const t = new IndexedFasta({
  path: '/filesystem/yourfile.fa',
  faiPath: '/filesystem/yourfile.fa.fai',
})

// example of fetching records from an indexed CRAM file.
// NOTE: only numeric IDs for the reference sequence are accepted.
// For indexedfasta the numeric ID is the order in which the sequence names
// appear in the header

// Wrap in an async function and then run
const run = async () => {
  const idToName = []
  const nameToId = {}

  // example opening local files on node.js
  // can also pass `cramUrl` (for the IndexedCramFile class), and `url` (for
  // the CraiIndex) params to open remote URLs
  //
  // alternatively, `cramFilehandle` (for the IndexedCramFile class) and
  // `filehandle` (for the CraiIndex) can be used; for examples, see
  // https://github.com/gmod/generic-filehandle2

  const indexedFile = new IndexedCramFile({
    cramPath: '/filesystem/yourfile.cram',
    //or
    //cramUrl: 'url/to/file.cram'
    //cramFilehandle: a generic-filehandle2 or similar filehandle
    index: new CraiIndex({
      path: '/filesystem/yourfile.cram.crai',
      // or
      // url: 'url/to/file.cram.crai'
      // filehandle: a generic-filehandle2 or similar filehandle
    }),
    seqFetch: async (seqId, start, end) => {
      // note:
      // * seqFetch should return a promise for a string, in this instance retrieved from IndexedFasta
      // * we use start-1 because cram-js uses 1-based but IndexedFasta uses 0-based coordinates
      // * the seqId is a numeric identifier, so we convert it back to a name with idToName
      // * you can return an empty string from this function for testing if you want, but you may not get proper interpretation of record.readFeatures
      return t.getSequence(idToName[seqId], start - 1, end)
    },
    checkSequenceMD5: false,
  })
  const samHeader = await indexedFile.cram.getSamHeader()

  // use the @SQ lines in the header to figure out the
  // mapping between ref ID numbers and names

  const sqLines = samHeader.filter(l => l.tag === 'SQ')
  sqLines.forEach((sqLine, refId) => {
    sqLine.data.forEach(item => {
      if (item.tag === 'SN') {
        // this is the ref name
        const refName = item.value
        nameToId[refName] = refId
        idToName[refId] = refName
      }
    })
  })

  const records = await indexedFile.getRecordsForRange(
    nameToId['chr1'],
    10000,
    20000,
  )
  records.forEach(record => {
    console.log(`got a record named ${record.readName}`)
    if (record.readFeatures != undefined) {
      record.readFeatures.forEach(({ code, pos, refPos, ref, sub }) => {
        // process the read features. this can be used similar to
        // CIGAR/MD strings in SAM. see CRAM specs for more details.
        if (code === 'X') {
          console.log(
            `${record.readName} shows a base substitution of ${ref}->${sub} at ${refPos}`,
          )
        }
      })
    }
  })
}

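// report failures instead of leaving an unhandled promise rejection
run().catch(err => console.error(err))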

You can also use cram-js without NPM via cram-bundle.js. See the example directory for usage with a script tag.
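
The mapping between numeric reference IDs and reference names, built inline in the usage example, can be pulled into a small helper. The following is a minimal sketch, assuming the SAM header has the shape used in that example (an array of `{ tag, data: [{ tag, value }] }` objects). `buildRefNameMaps` and `exampleHeader` are hypothetical names introduced here and are not part of @gmod/cram.

```javascript
// Build two-way lookups between numeric reference IDs and reference names.
// The numeric ID is the position of the @SQ line among all @SQ lines.
// NOTE: buildRefNameMaps is a hypothetical helper, not a library function.
function buildRefNameMaps(samHeader) {
  const idToName = []
  const nameToId = {}
  samHeader
    .filter(line => line.tag === 'SQ')
    .forEach((sqLine, refId) => {
      // the SN item of an @SQ line carries the reference name
      const sn = sqLine.data.find(item => item.tag === 'SN')
      if (sn) {
        idToName[refId] = sn.value
        nameToId[sn.value] = refId
      }
    })
  return { idToName, nameToId }
}

// Hand-written header for illustration only (assumed shape, made-up values)
const exampleHeader = [
  { tag: 'HD', data: [{ tag: 'VN', value: '1.6' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr1' }, { tag: 'LN', value: '1000' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr2' }, { tag: 'LN', value: '2000' }] },
]

const maps = buildRefNameMaps(exampleHeader)
console.log(maps.nameToId.chr2) // 1
console.log(maps.idToName[0]) // chr1
```

In real use, the header would come from `await indexedFile.cram.getSamHeader()` as in the example above, and `nameToId['chr1']` would be passed to `getRecordsForRange`.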

API (auto-generated)

CramRecord

Class of each CRAM record returned by this API.

Parameters
  • $0 any

    • $0.flags
    • $0.cramFlags
    • $0.readLength
    • $0.mappingQuality
    • $0.lengthOnRef
    • $0.qualityScores
    • $0.mateRecordNumber
    • $0.readBases
    • $0.readFeatures
    • $0.mateToUse
    • $0.readGroupId
    • $0.readName
    • $0.sequenceId
    • $0.uniqueId
    • $0.templateSize
    • $0.alignmentStart
    • $0.tags

isPaired

Returns boolean true if the read is paired, regardless of whether both segments are mapped

isProperlyPaired

Returns boolean true if the read is paired, and both segments are mapped

isSegmentUnmapped

Returns boolean true if the read itself is unmapped; mutually exclusive with isProperlyPaired

isMateUnmapped

Returns boolean true if the read's mate is unmapped; mutually exclusive with isProperlyPaired

isReverseComplemented

Returns boolean true if the read is mapped to the reverse strand

isMateReverseComplemented

Returns boolean true if the mate is mapped to the reverse strand

isRead1

Returns boolean true if this is read number 1 in a pair

isRead2

Returns boolean true if this is read number 2 in a pair

isSecondary

Returns boolean true if this is a secondary alignment

isFailedQc

Returns boolean true if this read has failed QC checks

isDuplicate

Returns boolean true if the read is an optical or PCR duplicate

isSupplementary

Returns boolean true if this is a supplementary alignment

isDetached

Returns boolean true if the read is detached

hasMateDownStream

Returns boolean true if the read has a mate in this same CRAM segment

isPreservingQualityScores

Returns boolean true if the read contains quality scores

isUnknownBases

Returns boolean true if the read has no sequence bases
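
The flag predicates above combine naturally into a record filter. The sketch below assumes they are called as methods on each record (for example `record.isDuplicate()`). `isUsablePrimary` and `fakeRecord` are hypothetical helpers, and the records are stand-in objects rather than real CramRecord instances.

```javascript
// Keep mapped, primary, QC-passing, non-duplicate reads.
// Assumes each record exposes the predicate methods documented above.
function isUsablePrimary(record) {
  return (
    !record.isSegmentUnmapped() &&
    !record.isSecondary() &&
    !record.isSupplementary() &&
    !record.isFailedQc() &&
    !record.isDuplicate()
  )
}

// Stand-in record for illustration: every predicate returns false unless
// overridden. Real records would come from getRecordsForRange.
function fakeRecord(readName, overrides = {}) {
  const no = () => false
  return {
    readName,
    isSegmentUnmapped: no,
    isSecondary: no,
    isSupplementary: no,
    isFailedQc: no,
    isDuplicate: no,
    ...overrides,
  }
}

const demoRecords = [
  fakeRecord('r1'),
  fakeRecord('r2', { isDuplicate: () => true }),
  fakeRecord('r3', { isSecondary: () => true }),
  fakeRecord('r4'),
]

const kept = demoRecords.filter(isUsablePrimary).map(r => r.readName)
console.log(kept) // [ 'r1', 'r4' ]
```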

getReadBases

Get the original sequence of this read.

Returns String sequence basepairs

getPairOrientation

Get the pair orientation of a paired read. Adapted from igv.js

Returns String of paired orientation

addReferenceSequence

Annotates this feature with the given reference sequence basepair information. This adds a sub and a ref item to base substitution read features, giving the actual substituted and reference base pairs, and makes the getReadBases() method work.

Parameters
  • refRegion object

  • compressionScheme CramContainerCompressionScheme

Returns undefined nothing

ReadFeatures

The feature objects appearing in the readFeatures member of CramRecord objects that show insertions, deletions, substitutions, etc.

Static fields

  • code (character): One of "bqBXIDiQNSPH". See page 15 of the CRAM v3 spec for their meanings.
  • data (any): the data associated with the feature. The format of this varies depending on the feature code.
  • pos (number): location relative to the read (1-based)
  • refPos (number): location relative to the reference (1-based)
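
As an illustration of working with these fields, the sketch below tallies read features by `code` and collects base substitutions (code `X`), using the `ref` and `sub` items that appear in the usage example. `summarizeReadFeatures` is a hypothetical helper, and the feature objects are made up for the demonstration, including the `data` value on the deletion.

```javascript
// Count read features per code and collect base substitutions.
// NOTE: hypothetical helper, not part of @gmod/cram.
function summarizeReadFeatures(readFeatures) {
  const countsByCode = {}
  const substitutions = []
  // readFeatures may be undefined on a record, so fall back to an empty list
  for (const feature of readFeatures || []) {
    countsByCode[feature.code] = (countsByCode[feature.code] || 0) + 1
    if (feature.code === 'X') {
      substitutions.push({
        refPos: feature.refPos,
        ref: feature.ref,
        sub: feature.sub,
      })
    }
  }
  return { countsByCode, substitutions }
}

// Made-up features for illustration; positions are 1-based as documented
const demoFeatures = [
  { code: 'X', pos: 5, refPos: 10004, ref: 'A', sub: 'G' },
  { code: 'D', pos: 9, refPos: 10008, data: 2 },
  { code: 'X', pos: 20, refPos: 10021, ref: 'C', sub: 'T' },
]

const summary = summarizeReadFeatures(demoFeatures)
console.log(summary.countsByCode) // { X: 2, D: 1 }
console.log(summary.substitutions.length) // 2
```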

IndexedCramFile

constructor

Parameters
  • args object

    • args.cram CramFile
    • args.index Index-like object that supports getEntriesForRange(seqId,start,end) -> Promise[Array[index entries]]
    • args.cacheSize number? optional maximum number of CRAM records to cache. default 20,000
    • args.checkSequenceMD5 boolean? default true. if false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequences to be fetched.
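
Because `args.index` only needs to be index-like, a custom index can in principle be supplied. The following is a minimal in-memory sketch, assuming entries use the `{start, span, containerStart, sliceStart, sliceBytes}` form returned by CraiIndex and that an entry covers reference positions `start` through `start + span - 1`. `InMemoryIndex` is a hypothetical class and the entry values are made up. Whether IndexedCramFile needs methods beyond the two shown is not confirmed here.

```javascript
// Hypothetical index-like object backed by a plain object of entries,
// keyed by numeric reference sequence ID.
class InMemoryIndex {
  constructor(entriesBySeqId) {
    this.entriesBySeqId = entriesBySeqId
  }

  // resolves to true if any entries exist for the reference sequence ID
  async hasDataForReferenceSequence(seqId) {
    return (this.entriesBySeqId[seqId] || []).length > 0
  }

  // resolves to the entries overlapping the 1-based closed query range
  async getEntriesForRange(seqId, queryStart, queryEnd) {
    const entries = this.entriesBySeqId[seqId] || []
    return entries.filter(
      entry =>
        entry.start <= queryEnd && entry.start + entry.span > queryStart,
    )
  }
}

// Made-up entries for illustration only
const demoIndex = new InMemoryIndex({
  0: [
    { start: 1, span: 1000, containerStart: 0, sliceStart: 100, sliceBytes: 500 },
    { start: 1001, span: 1000, containerStart: 600, sliceStart: 100, sliceBytes: 480 },
  ],
})

demoIndex
  .getEntriesForRange(0, 900, 1100)
  .then(entries => console.log(entries.length)) // 2
```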

getRecordsForRange

Parameters
  • seq number numeric ID of the reference sequence
  • start number start of the range of interest. 1-based closed coordinates.
  • end number end of the range of interest. 1-based closed coordinates.
  • opts {viewAsPairs: boolean?, pairAcrossChr: boolean?, maxInsertSize: number?} (optional, default {})
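
Since these coordinates are 1-based closed, a `seqFetch` callback backed by a 0-based half-open sequence source has to shift the start by one, as the usage example does with `start - 1`. The sketch below wraps that conversion in a small factory. `toZeroBasedHalfOpen`, `makeSeqFetch`, and `stubFasta` are hypothetical names; the stub only mimics the `getSequence(name, start, end)` call used in the example.

```javascript
// Convert a 1-based closed range to a 0-based half-open range.
// Only the start shifts: [start, end] closed covers the same bases
// as [start - 1, end) half-open.
function toZeroBasedHalfOpen(start, end) {
  return { start: start - 1, end }
}

// Build a seqFetch callback from a fasta-like object and an ID-to-name table.
// NOTE: hypothetical helper, not part of @gmod/cram.
function makeSeqFetch(fasta, idToName) {
  return async (seqId, start, end) => {
    const range = toZeroBasedHalfOpen(start, end)
    return fasta.getSequence(idToName[seqId], range.start, range.end)
  }
}

// Stub standing in for an IndexedFasta instance, for illustration only
const stubFasta = {
  sequences: { chr1: 'ACGTACGTAC' },
  async getSequence(name, start, end) {
    return this.sequences[name].slice(start, end)
  },
}

const seqFetch = makeSeqFetch(stubFasta, ['chr1'])
// bases 2 through 4 (1-based, closed) of chr1
seqFetch(0, 2, 4).then(seq => console.log(seq)) // CGT
```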

hasDataForReferenceSequence

Parameters

Returns Promise true if the CRAM file contains data for the given reference sequence numerical ID

CramFile

containerCount

Returns Promise<(number | undefined)>

CraiIndex

constructor

Parameters

hasDataForReferenceSequence

Parameters

Returns Promise true if the index contains entries for the given reference sequence ID, false otherwise

getEntriesForRange

fetch index entries for the given range

Parameters

Returns Promise promise for an array of objects of the form {start, span, containerStart, sliceStart, sliceBytes }

CramUnimplementedError

Extends Error

Error caused by encountering a part of the CRAM spec that has not yet been implemented

CramMalformedError

Extends CramError

An error caused by malformed data.

CramBufferOverrunError

Extends CramMalformedError

An error caused by attempting to read beyond the end of the defined data.
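
Given the inheritance chain described above, `instanceof` checks should test the most specific class first. The sketch below uses local stand-in classes that mirror the documented hierarchy. They are not imported from @gmod/cram, since this README does not state how (or whether) the error classes are exported, and `describeCramError` is a hypothetical helper.

```javascript
// Local stand-ins mirroring the documented hierarchy (assumption: these
// are NOT the library's classes, only same-named illustrations).
class CramError extends Error {}
class CramMalformedError extends CramError {}
class CramBufferOverrunError extends CramMalformedError {}
class CramUnimplementedError extends Error {}

// Classify an error, checking subclasses before their parents so that a
// buffer overrun is not reported as generic malformed data.
function describeCramError(err) {
  if (err instanceof CramBufferOverrunError) {
    return 'read past the end of the defined data'
  }
  if (err instanceof CramMalformedError) {
    return 'malformed data'
  }
  if (err instanceof CramUnimplementedError) {
    return 'unimplemented part of the CRAM spec'
  }
  return 'unrecognized error'
}

console.log(describeCramError(new CramBufferOverrunError('eof')))
// read past the end of the defined data
```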

Academic Use

This package was written with funding from the NHGRI as part of the JBrowse project. If you use it in an academic project that you publish, please cite the most recent JBrowse paper, which will be linked from jbrowse.org.

License

MIT © Robert Buels

Changelog

v3.0.0

  • Remove @gmod/binary-parser to avoid CSP violation for use of 'eval'/'new Function'

v2.0.4

  • Remove fetchSizeLimit
  • Remove usage of url module

v2.0.3

  • Update sam header parsing to avoid breaking 'type contract'

v2.0.2

  • Update buffer-crc32
  • Update typescript-eslint config and related fixes

v2.0.1

  • Fix issue parsing header tags with : character

v2.0.0

  • Add lzma support via xz-decompress. This uses webassembly, so it is a major version bump

v1.7.4

  • Fix import of bzip2 module

v1.7.3

  • Fix usage of the 'b' tag under situations in CRA4 where a Uint8Array is received instead of Buffer

v1.7.2

  • Update README.md with docs

v1.7.1

  • Re-export CramRecord class for typescript

v1.7.0

  • Typescript entire codebase, big thanks to @0xorial for taking on this effort!
  • Update to use webpack 5 for UMD build

v1.6.4

  • Fix off-by-one error in returning features from getRecordsForRange

v1.6.3

  • Optimize CRAM parsing slightly (15% improvement on many short reads). This removes support for big endian machines
  • Publish src directory for sourceMap

v1.6.2

  • Publish src directory for better source maps

v1.6.1

  • Explicitly use pako for browser bundle to help avoid buggy zlib polyfills

v1.6.0

  • Support CRAMv3.1 (thanks to @jkbonfield for contributing!)
  • Support bzip codec
  • Remove localFile from the browser bundle using "browser" package.json field
  • Add esm module field in package.json

v1.5.9

  • Fix CRAM not downloading proper records for long reads (pt2, PR #84)

v1.5.8

  • Fix CRAM not downloading proper records for long reads (pt1, PR #85)

v1.5.7

  • Add getHeaderText to CRAM to get SAM header

v1.5.6

  • Remove unnecessary rethrow in tinyMemoize error handler
  • Avoid uncaught promise from constructor

v1.5.5

  • Fix ability to reload CRAM file after failure
  • Check if BAI file incorrectly submitted as index for CRAM

v1.5.4

  • Fix handling of hard clipping

v1.5.3

  • Improved README
  • Upgrade to babel 7
  • Upgrade @gmod/binary-parser
  • Add fix for 'b', 'q', and 'Q' readFeatures

v1.5.2

  • Fix off-by-one error in range query
  • Add webpack cram-bundle.js

v1.5.1

  • Add fix for when mate is unmapped

v1.5.0

  • Add lossy-names support
  • Fix for mate strand

v1.4.3

  • Make sure mate exists for unmated pair; this situation can arise when coordinate slices of a CRAM file are made via samtools view

v1.4.2

  • Switch to es6-promisify for ie11
  • Switch to quick-lru instead of lru-cache for ie11

v1.4.1

  • Add maxInsertSize for viewAsPairs

v1.4.0

  • Add viewAsPairs implementation

v1.3.0

  • Fix tests in node 6
  • Make cram record unique IDs start at 1 instead of 0 to always be truthy
  • Implement gamma and subexp codecs

v1.2.0

  • Add getReadBases docs
  • Rewrite seq calculation to be much faster
  • Implement ref fetching for multi-ref slices