

Welcome to the Bionode documentation! Here we document only some of the currently more stable modules. If the module you are looking for is not here, please check its GitHub repository in the list below and read the documentation provided there.

Bionode modules list

Bionode modules can be used as command line tools or JavaScript libraries! You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.


Make sure you have the latest Node.js installed. You can then install each module with npm (see the shell examples on the right).

# Install module as command line tool available in PATH
# using npm -g option
npm install bionode-ncbi -g

# Install locally in the node_modules folder of a project
# for usage as a library. See the JavaScript tab.
npm install bionode-ncbi


General usage

bionode-ncbi <command> [arguments] --limit [num] --pretty

## Options
--stdin, -s         Read STDIN  
--limit, -l         Limit number of results  
--throughput, -t    Number of items per API request  
--pretty, -p        Print human readable output instead of NDJSON  
export DEBUG='*'    Debug mode

# Display CLI help
bionode-ncbi --help
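
Unless --pretty is given, the output is NDJSON: one JSON object per line. A minimal sketch of reading such output in JavaScript follows; parseNdjson is a hypothetical helper (not part of bionode-ncbi) and the sample records are illustrative only.

```javascript
// Hypothetical helper: parse NDJSON text (one JSON document per line).
function parseNdjson(text) {
  return text
    .split('\n')
    .map(function (line) { return line.trim() })
    .filter(function (line) { return line.length > 0 }) // skip blank lines
    .map(function (line) { return JSON.parse(line) })
}

// Illustrative input shaped like two streamed records.
const sample = '{"uid":"1"}\n{"uid":"2"}\n'
const records = parseNdjson(sample)
console.log(records.length) // 2
```

For large result sets a streaming line splitter is a better fit than reading everything into one string.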
var ncbi = require('bionode-ncbi')

// Callback pattern
ncbi.search('genome', 'human', callback)

// Event pattern
ncbi.search('genome', 'human').on('data', console.log)

// Pipe pattern
var JSONStream = require('JSONStream')
ncbi.search('genome', 'human')
.pipe(JSONStream.stringify())
.pipe(process.stdout)

Takes a database name and a query term. Returns the metadata.

For a list of the NCBI databases that can be used, see the appendix of this documentation (bionode-ncbi databases).


search <db> [term]

Parameter    Default    Description
db           none       One of the NCBI databases listed in the appendix
term         none       Species, dataset ID, etc.
bionode-ncbi search taxonomy 'solenopsis invicta' --limit 1 --pretty
ncbi.search('taxonomy', 'solenopsis invicta').on('data', console.log)
{
  "uid": "13686",
  "status": "active",
  "rank": "species",
  "division": "ants",
  "scientificname": "Solenopsis invicta",
  "commonname": "red fire ant",
  "taxid": 13686,
  "akataxid": "",
  "genus": "Solenopsis",
  "species": "invicta",
  "subsp": "",
  "modificationdate": "2015/09/16 00:00",
  "genbankdivision": "Invertebrates"
}
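// Arguments can be passed as an object instead:
ncbi.search({ db: 'sra', term: 'solenopsis' })
.on('data', console.log)
// Advanced options can be passed using the previous syntax:
var options = {
  db: 'assembly', // database to search
  term: 'human',  // optional term for search
  limit: 500,     // optional limit of NCBI results
  throughput: 100 // optional number of items per request
}
ncbi.search(options).on('data', console.log)
// The search term can also be passed with write:
var search = ncbi.search('sra').on('data', console.log)
search.write('solenopsis')
// Or piped, for example, from a file
// ('searchTerms.txt' is a placeholder name for a file with one term per line):
var fs = require('fs')
var split = require('split')
fs.createReadStream('searchTerms.txt')
.pipe(split())
.pipe(search)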



Takes a database name and a query term. Returns the data.


fetch <db> [term]

Parameter    Default    Description
db           none       One of the NCBI databases listed in the appendix
term         none       Species, dataset ID, etc.
bionode-ncbi fetch nucest p53 -l 1 --pretty
ncbi.fetch('nucest', 'p53').on('data', console.log)
  "id": "JZ923713.1 clone 186 Pelteobagrus fulvidraco spleen cDNA library Tachysurus fulvidraco cDNA similar to p53, mRNA sequence",
// With advanced parameters for sequence databases (all are optional):
var opts = {
   db: 'nucest',
   term: 'guillardia_theta',
   strand: 1,
   complexity: 4
}
ncbi.fetch(opts).on('data', console.log)

For some databases there are multiple return types. A default one is chosen automatically; however, it is possible to specify the return type via the rettype option. The NCBI website provides a list of the databases supported by efetch.


Takes either the sra or the assembly database name and a query term. Returns URLs of datasets.


urls <dlsource> [term]

bionode-ncbi urls assembly human -l 1 -p
ncbi.urls('assembly', 'human').on('data', console.log)
// URL values are not shown in this listing
{
  "uid": "1075781",
  "assembly_report": {
    "txt": ""
  },
  "assembly_stats": {
    "txt": ""
  },
  "cds_from_genomic": {
    "fna": ""
  },
  "feature_table": {
    "txt": ""
  },
  "genomic": {
    "fna": "",
    "gbff": "",
    "gff": ""
  },
  "protein": {
    "faa": "",
    "gpff": ""
  },
  "rna_from_genomic": {
    "fna": ""
  },
  "wgsmaster": {
    "gbff": ""
  },
  "README": {
    "txt": ""
  },
  "annotation_hashes": {
    "txt": ""
  },
  "assembly_status": {
    "txt": ""
  },
  "md5checksums": {
    "txt": ""
  }
}
# The following example requires a JSON parser (here, the json command)
bionode-ncbi urls assembly human -l 1 -p | json genomic.fna
# Returns the URL of the genomic.fna file


Takes either the sra or the assembly database name and a query term. Downloads the corresponding SRA or assembly (genomic.fna) file into a folder named after the unique ID (UID).


download <dlsource> [term]

bionode-ncbi download assembly 'solenopsis invicta'
ncbi.download('assembly', 'solenopsis invicta').on('data', console.log)
.on('end', function(path) { console.log('File saved at ' + path) })
bionode-ncbi download assembly 'solenopsis invicta' --pretty
# Downloading GCF_000188075.1_Si_gnG_genomic.fna.gz
# [>                                            ] 0.7% of 116.65 MB (192.51 kB/s)

Returns a unique ID (UID) from a destination database linked to another UID from a source database.


link <srcDB> <destDB> [srcUID]

bionode-ncbi link assembly bioproject 244018 --pretty
ncbi.link('assembly', 'bioproject', 244018).on('data', console.log)
  "srcDB": "assembly",
  "destDB": "bioproject",
  "srcUID": "244018",
  "destUIDs": [


Takes a property (e.g. biosample) and an optional destination property (e.g. sample), and looks for a field named property+id (e.g. biosampleid) in the streamed object. It then runs a search for that ID and saves the result in the streamed object, under the destination property if one is given and under the property itself otherwise.
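
The property+id lookup described above can be sketched as a plain function. This is an illustration of the described behaviour only, not the module's implementation: expandProperty and fakeLookup are hypothetical names, and fakeLookup stands in for the remote NCBI query.

```javascript
// Hypothetical stand-in for the remote lookup; a real pipeline would query NCBI.
function fakeLookup(id) {
  return { uid: String(id), note: 'placeholder record' }
}

// Read obj[property + 'id'], look it up, and store the result under
// destProperty (falling back to property when no destination is given).
function expandProperty(obj, property, destProperty, lookup) {
  const id = obj[property + 'id']
  if (id === undefined) return obj // nothing to expand
  const result = Object.assign({}, obj) // leave the input object untouched
  result[destProperty || property] = lookup(id)
  return result
}

const expanded = expandProperty({ uid: '2938', taxid: 13686 }, 'tax', undefined, fakeLookup)
console.log(expanded.tax.uid) // "13686"
```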


expand <property> [destProperty]

bionode-ncbi search genome 'solenopsis invicta' -l 1 | \
bionode-ncbi expand tax -s --pretty
ncbi.search('genome', 'solenopsis invicta').pipe(ncbi.expand('tax'))
{
  "uid": "2938",
  "organism_name": "Solenopsis invicta",
  "organism_kingdom": "Eukaryota",
  "organism_group": "",
  "organism_subgroup": "Insects",
  "defline": "Solenopsis invicta overview",
  "projectid": 49663,
  "project_accession": "PRJNA49663",
  "status": "Draft",
  "number_of_chromosomes": "0",
  "number_of_plasmids": "0",
  "number_of_organelles": "1",
  "assembly_name": "Si_gnG",
  "assembly_accession": "GCA_000188075.1",
  "assemblyid": 244018,
  "create_date": "2011/02/03 00:00",
  "options": "",
  "weight": "",
  "chromosome_assemblies": "0",
  "scaffold_assemblies": "1",
  "sra_genomes": "0",
  "taxid": 13686,
  "tax": {
    "uid": "13686",
    "status": "active",
    "rank": "species",
    "division": "ants",
    "scientificname": "Solenopsis invicta",
    "commonname": "red fire ant",
    "taxid": 13686,
    "akataxid": "",
    "genus": "Solenopsis",
    "species": "invicta",
    "subsp": "",
    "modificationdate": "2015/09/16 00:00",
    "genbankdivision": "Invertebrates"
  }
}

Similar to Link but takes the srcUID from a property of the Streamed object and attaches the result to a property with the name of the destination DB.


bionode-ncbi plink <property> <destDB>

bionode-ncbi search genome 'solenopsis invicta' -l 1 | \
bionode-ncbi expand tax -s | \
bionode-ncbi plink tax sra -s --pretty
ncbi.search('genome', 'solenopsis invicta')
.pipe(ncbi.expand('tax'))
.pipe(ncbi.plink('tax', 'sra'))
{ "uid":"2938",
  "organism_name":"Solenopsis invicta",
  "defline":"Solenopsis invicta overview",
  "create_date":"2011/02/03 00:00",
  "scientificname":"Solenopsis invicta",
  "commonname":"red fire ant",
  "modificationdate":"2015/09/16 00:00",


Streamable FASTA parser.


# bionode-fasta [options] [input file] [output file]
bionode-fasta input.fasta.gz output.json
# You can also use fasta files compressed with gzip
# If no output is provided, the result will be printed to stdout
# Options: -p, --path: Includes the path of the original file as a property of the output objects
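// Returns a Writable Stream that parses a FASTA content Buffer
// into a JSON Buffer
var fs = require('fs')
var fasta = require('bionode-fasta')

fs.createReadStream('./input.fasta')
.pipe(fasta())
.pipe(process.stdout)

// Can also parse content from filenames Strings
// streamed to it (filenames is a placeholder for a stream of paths)
filenames
.pipe(fasta({filenameMode: true}))
.pipe(process.stdout)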
{ "id": "contig1", "seq": "AGTCATGACTGACGTACGCATG" }
{ "id": "contig2", "seq": "ATGTACGTACTGCATGC" }
bionode-fasta input.fasta.gz output.json --path
// When filenames are Streamed like in the previous example,
// or passed directly to the parser Stream, they can be added
// to the output Objects
fasta({includePath: true}, './input.fasta')
{ "id": "contig1",
  "path": "./input.fasta" }
// The output from the parser can also be available
// as Objects instead of Buffers

fasta({objectMode: true}, './input.fasta')
.on('data', console.log)
// Shortcut version of the previous example
fasta.obj('./input.fasta').on('data', console.log)
// Callback style can also be used; however, it might
// not be the best choice for large files
fasta.obj('./input.fasta', function(data) {
  console.log(data)
})


Module for DNA, RNA and protein sequence manipulation.


This module currently only works as a JavaScript library and doesn’t provide a CLI (see issue #5).


Check sequence type

Takes a sequence string and checks whether it is DNA, RNA or protein. It follows IUPAC notation, which allows ambiguous symbols; a sequence that uses them is labelled as an ambiguous nucleotide sequence rather than an amino acid sequence.

// Returns: "dna"
// Returns: "rna"
// Returns: "protein"
// Returns: "ambiguousDna"
// Returns: "ambiguousRna"
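
The labelling rule can be illustrated with a simplified classifier. This is a sketch under stated assumptions, not the library's algorithm: classifySequence and onlyContains are hypothetical names, the check is a plain alphabet test, and an empty string falls through to "dna".

```javascript
// IUPAC ambiguity symbols for nucleotides.
const AMBIGUITY_CODES = 'RYSWKMBDHVN'

// True when every character of sequence appears in alphabet.
function onlyContains(sequence, alphabet) {
  return sequence.split('').every(function (ch) {
    return alphabet.indexOf(ch) !== -1
  })
}

// Try the strictest alphabet first, then widen; anything left is protein.
function classifySequence(sequence) {
  const s = sequence.toUpperCase()
  if (onlyContains(s, 'ACGT')) return 'dna'
  if (onlyContains(s, 'ACGU')) return 'rna'
  if (onlyContains(s, 'ACGT' + AMBIGUITY_CODES)) return 'ambiguousDna'
  if (onlyContains(s, 'ACGU' + AMBIGUITY_CODES)) return 'ambiguousRna'
  return 'protein'
}

console.log(classifySequence('ATGACCCTGAGAAGAGCACCG')) // dna
console.log(classifySequence('MTLKVND'))               // protein
```

Because nucleotide alphabets are tested first, a peptide made up only of letters that are also valid IUPAC nucleotide symbols is reported as a nucleotide sequence, which matches the behaviour described above.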

Takes a sequence type argument and returns a function to complement bases.


Reverse sequence

Takes a sequence string and returns the reverse sequence.



(reverse) complement sequence

Takes a sequence string and optional boolean for reverse, and returns its complement.

seq.complement("ATGACCCTGAAGGTGAA", true);

Takes a sequence string and returns the reverse complement (syntax sugar).
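
The three operations above (reverse, complement, reverse complement) can be sketched in a few lines. This is a standalone illustration for unambiguous uppercase DNA, not the library's code; other characters pass through unchanged.

```javascript
const DNA_COMPLEMENT = { A: 'T', T: 'A', C: 'G', G: 'C' }

function reverse(sequence) {
  return sequence.split('').reverse().join('')
}

// Complement each base; optionally reverse the result as well.
function complement(sequence, isReverse) {
  const result = sequence
    .split('')
    .map(function (base) { return DNA_COMPLEMENT[base] || base })
    .join('')
  return isReverse ? reverse(result) : result
}

// Shorthand for complement(sequence, true).
function reverseComplement(sequence) {
  return complement(sequence, true)
}

console.log(complement('ATGACCCTGAAGGTGAA'))        // TACTGGGACTTCCACTT
console.log(reverseComplement('ATGACCCTGAAGGTGAA')) // TTCACCTTCAGGGTCAT
```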


Transcribe base

Takes a base character and returns the transcript base.

// "U"
// "A"
// "a"
// "G"


Get codon amino acid

Takes an RNA codon and returns the translated amino acid.

// "M"
// "A"
// "L"
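
Codon lookup can be sketched with the standard genetic code packed into a 64-character string. The helper names codonToAminoAcid and translateCodons are hypothetical, and returning "X" for an unrecognised codon is an assumption of this sketch rather than documented behaviour.

```javascript
// Standard genetic code, indexed by 16 * first + 4 * second + third base,
// with bases ordered U, C, A, G. '*' marks a stop codon.
const BASES = 'UCAG'
const AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

function codonToAminoAcid(codon) {
  if (codon.length !== 3) return 'X'
  const rna = codon.toUpperCase().replace(/T/g, 'U') // accept DNA codons too
  const i = BASES.indexOf(rna[0])
  const j = BASES.indexOf(rna[1])
  const k = BASES.indexOf(rna[2])
  if (i < 0 || j < 0 || k < 0) return 'X' // ambiguous or invalid base
  return AMINO_ACIDS[16 * i + 4 * j + k]
}

// Translate consecutive codons; a trailing partial codon is ignored.
function translateCodons(sequence) {
  let protein = ''
  for (let pos = 0; pos + 3 <= sequence.length; pos += 3) {
    protein += codonToAminoAcid(sequence.slice(pos, pos + 3))
  }
  return protein
}

console.log(codonToAminoAcid('AUG'))                   // M
console.log(translateCodons('AUGACCCUGAAGGUGAAUGAC')) // MTLKVND
```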


Remove introns

Takes a sequence and an array of exon ranges and removes the introns, keeping only the exons.

seq.removeIntrons("ATGACCCTGAAGGTGAATGACAG", [[1, 8]]);
seq.removeIntrons("ATGACCCTGAAGGTGAATGACAG", [[2, 9], [12, 20]]);


Transcribe sequence

Takes a sequence string and returns the transcribed sequence (dna <-> rna). If an array of exons is given, the introns will be removed from the sequence.

seq.transcribe("AUGACCCUGAAGGUGAA"); //reverse
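
A minimal sketch of the dna <-> rna conversion, assuming the simple convention of swapping T and U (any sequence containing U is treated as RNA). The function name transcribeSketch is hypothetical, and exon handling is left out.

```javascript
function transcribeSketch(sequence) {
  // Any U marks the input as RNA, so convert it back to DNA.
  if (/u/i.test(sequence)) {
    return sequence.replace(/U/g, 'T').replace(/u/g, 't')
  }
  // Otherwise treat the input as DNA and convert it to RNA.
  return sequence.replace(/T/g, 'U').replace(/t/g, 'u')
}

console.log(transcribeSketch('ATGACCCTGAAGGTGAA')) // AUGACCCUGAAGGUGAA
console.log(transcribeSketch('AUGACCCUGAAGGUGAA')) // ATGACCCTGAAGGTGAA
```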


Translate sequence

Takes a DNA or RNA sequence and translates it to protein. If an array of exons is given, the introns will be removed from the sequence.

seq.translate("ATGACCCTGAAGGTGAATGACAGGAAGCC", [[3, 21]]);
// "LKVND"


Reverse exons

Takes an array of exons and the length of the reference and returns inverted coordinates.

seq.reverseExons([[2,8]], 20);
// [ [ 12, 18 ] ]
seq.reverseExons([[10,45], [65,105]], 180);
// [ [ 135, 170 ], [ 75, 115 ] ]
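
The two results above are consistent with mapping each [start, end] pair to [length - end, length - start] while keeping the exon order. A sketch of that arithmetic follows; reverseExonsSketch is a hypothetical name and this is not the library's code.

```javascript
// Mirror each exon range around the reference length.
function reverseExonsSketch(exons, referenceLength) {
  return exons.map(function (exon) {
    return [referenceLength - exon[1], referenceLength - exon[0]]
  })
}

console.log(reverseExonsSketch([[2, 8]], 20))                // [ [ 12, 18 ] ]
console.log(reverseExonsSketch([[10, 45], [65, 105]], 180)) // [ [ 135, 170 ], [ 75, 115 ] ]
```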


Find non-canonical translation start site

Takes a sequence and returns a boolean indicating whether it has a canonical translation start site.

// true
// false


Get reading frames

Takes a sequence and returns an array with the six possible Reading Frames (+1, +2, +3, -1, -2, -3).
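
The six frames can be sketched as three offsets on the forward strand followed by three offsets on the reverse complement. The function names are hypothetical and the input is assumed to be unambiguous uppercase DNA.

```javascript
const COMPLEMENT = { A: 'T', T: 'A', C: 'G', G: 'C' }

function reverseComplementDna(dna) {
  return dna
    .split('')
    .reverse()
    .map(function (base) { return COMPLEMENT[base] || base })
    .join('')
}

// Frames are returned in the order +1, +2, +3, -1, -2, -3.
function readingFramesSketch(dna) {
  const rc = reverseComplementDna(dna)
  return [dna, dna.slice(1), dna.slice(2), rc, rc.slice(1), rc.slice(2)]
}

console.log(readingFramesSketch('ATGACCCTGAAGGTGAA')[3]) // TTCACCTTCAGGGTCAT
```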



Get open reading frames

Takes a Reading Frame sequence and returns an array of Open Reading Frames.



Get all open reading frames

Takes a sequence and returns all Open Reading Frames in the six Reading Frames.

//  [ 'TGA', 'CCCTGA', 'AGGTGA', 'ATGACA' ],
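
The row shown above is consistent with cutting a reading frame after every stop codon and keeping the stop codon at the end of each piece. A sketch under that assumption follows; splitAtStopCodons is a hypothetical name, and keeping a trailing piece that has no stop codon is also an assumption.

```javascript
const STOP_CODONS = ['TAA', 'TAG', 'TGA', 'UAA', 'UAG', 'UGA']

// Walk the frame codon by codon and close a piece after each stop codon.
function splitAtStopCodons(frame) {
  const orfs = []
  let current = ''
  for (let pos = 0; pos < frame.length; pos += 3) {
    const codon = frame.slice(pos, pos + 3)
    current += codon
    if (STOP_CODONS.indexOf(codon.toUpperCase()) !== -1) {
      orfs.push(current)
      current = ''
    }
  }
  if (current.length > 0) orfs.push(current) // trailing piece without a stop codon
  return orfs
}

console.log(splitAtStopCodons('TGACCCTGAAGGTGAATGACA'))
// [ 'TGA', 'CCCTGA', 'AGGTGA', 'ATGACA' ]
```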


Find longest open reading frame

Takes a sequence and returns the longest ORF from all six reading frames, together with the corresponding frame symbol (+1, +2, +3, -1, -2, -3). If a frame symbol is specified, only that frame is searched for the longest ORF. If several ORFs tie for the longest, the one that starts with the methionine start codon is chosen. If there is still a tie, one of them is returned at random.

seq.findLongestOpenReadingFrame("ATGACCCTGAAGGTGAATGACA", "-1");


bionode-ncbi databases

Database Contains
gquery All Databases
assembly Genome assembly information
bioproject Data related to a single initiative
biosample Biological source materials used in experimental assays
biosystems Literature, small molecules, and sequence data grouped by biological relationships
books Books and documents in life science and healthcare
clinvar Genomic variation and its relationship to human health
clone Clones and libraries, including sequence data, map positions and distributor information
cdd Conserved Domains. Annotation of functional units in proteins.
gap Genotypes and Phenotypes (dbGaP). Interaction of genotype and phenotype in Humans
dbvar Genomic structural variation – insertions, deletions, duplications, inversions, mobile element insertions, translocations, and complex chromosomal rearrangements
nucest Expressed Sequence Tags (dbEST). Short single-read transcript sequences from GenBank
gene Genes, focusing on genomes that have been completely sequenced
genome Genomes including sequences, maps, chromosomes, assemblies, and annotations
gds GEO DataSets. Curated gene expression and molecular abundance DataSets assembled from the Gene Expression Omnibus (GEO)
geoprofiles GEO Profiles. Individual gene expression and molecular abundance Profiles assembled from the Gene Expression Omnibus (GEO)
nucgss Database of Genome Survey Sequences (dbGSS). A division of GenBank that contains short single-pass reads of genomic DNA.
gtr Genetic Testing Registry (GTR). A voluntary registry of genetic tests and laboratories
homologene A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs
medgen A portal to information about medical genetics
mesh MeSH (Medical Subject Headings) is the NLM controlled vocabulary thesaurus used for indexing articles for PubMed
ncbisearch NCBI Web Site
nlmcatalog NLM bibliographic data for journals, books, audiovisuals, computer software, electronic resources and other materials
nuccore Nucleotide. A collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB
omim Online Mendelian Inheritance in Man (OMIM). Human genes and genetic disorders
pmc PubMed Central (PMC). Full-text biomedical and life sciences journal literature, including clinical medicine and public health.
popset Related DNA sequences that originate from comparative studies
probe Nucleic acid reagents designed for use in a wide variety of biomedical research applications, together with information on reagent distributors, probe effectiveness, and computed sequence similarities
protein Protein sequence records from a variety of sources, including GenPept, RefSeq, Swiss-Prot, PIR, PRF, and PDB.
proteinclusters Protein sequences (clusters), consisting of Reference Sequence proteins encoded by complete prokaryotic and organelle plasmids and genomes
pcassay PubChem BioAssay. Bioactivity assays used to screen the chemical substances contained in the PubChem Substance database
pccompound PubChem Compound. Contains unique, validated chemical structures (small molecules)
pcsubstance PubChem Substance. Samples and links to biological screening results that are available in PubChem BioAssay
pubmed Citations and abstracts for biomedical literature from MEDLINE and additional life science journals
pubmedhealth Clinical effectiveness reviews and other resources to help consumers and clinicians use and understand clinical research results. These are drawn from the NCBI Bookshelf and PubMed, including published systematic reviews from organizations such as the Agency for Health Care Research and Quality, The Cochrane Collaboration, and others (see complete listing). Links to full text articles are provided when available.
snp A variety of tools are available for searching the SNP database, allowing search by genotype, method, population, submitter, markers and sequence similarity using BLAST. These are linked under "Search" on the left side bar of the dbSNP main page.
sparcle SPARCLE (Subfamily Protein Architecture Labeling Engine) is a resource for the functional characterization and labeling of protein sequences that have been grouped by their characteristic conserved domain architecture
sra Sequence Read Archive (SRA). Sequencing data from the next generation of sequencing platforms
structure Structure (Molecular Modeling Database). Macromolecular 3D structures derived from the Protein Data Bank
taxonomy Phylogenetic lineages of more than 160,000 organisms
toolkit The NCBI C++ Toolkit
toolkitall The NCBI C++ Toolkit
toolkitbook The NCBI C++ Toolkit
toolkitbookgh The NCBI C++ Toolkit
unigene Expressed Sequence Tags (ESTs) organized by organism, tissue type and developmental stage