NtSubtree

NtSubtree is a package based on Fastsubtrees which automatically downloads and installs the NCBI taxonomy tree during setup, and make it easier to work with taxonomy names.

Installation

It can be installed using pip install ntsubtree and automatically installs fastsubtrees on which it depends.

CLI

The CLI tool ntsubtree is provided by the package. The first time the tool is called from the command line, the package data (NCBI taxonomy tree) is downloaded from NCBI, the fastsubtrees tree data is constructed, as well as a table of taxonomy names.

If anything goes wrong during the automatic download and construction, use ntsubtree update --cleanup to repeat the process.

After that, it is possible to update the data to the newest NCBI taxonomy data, by running ntsubtree update. This only re-downloads the data and reconstruct the tree data, if newer data is available. Furthermore, it conserves any attribute data which have been added to the tree.

To add new attributes to the tree, ntsubtree attribute can be used. The usage is identical to fastsubtrees attribute, except that no tree filename is passed.

To query the tree, the ntsubtree query command is used. The usage is identical to fastsubtrees query, except that no tree filename is passed.

Taxonomic names are displayed automatically in the query results, unless the option --no-taxname is used. Furthermore, it is possible to query the tree by using a taxon name instead of a taxon ID as a subtree root, using the option -n.

Example usage

ntsubtree query 562               # taxonomic names displayed alongside the IDs
ntsubtree query -n "Escherichia"  # Query by taxonomic name

ntsubtree attribute myattr values.tsv
ntsubtree query 562 myattr

ntsubtree update

API

The first time that ntsubtree is imported, the package data (NCBI taxonomy tree) is downloaded from NCBI, the fastsubtrees tree data is constructed, as well as a table of taxonomy names. This can be triggered by python -m ntsubtree.

The ntsubtree.update() function can be used to check if new taxonomy data is avalaible at NCBI and, if so, download it and update the tree, without loosing existing attribute data.

Working with the tree is done using the fastsubtrees package API. The Tree object is obtained using get_tree().

Besides the IDs, the taxname attribute is automatically available. Furthermore, the ntsubtree.search_name(query) function can be used to retrieve a taxon ID to pass to the fastsubtrees tree query methods.

Example usage

import ntsubtree

ids_in_subtree = ntsubtree.get_tree().subtree_ids(562)

taxid = ntsubtree.search_name("Escherichia")
subtree_info = tree.subtree_info(taxid, ["taxname"])

tree.create_attribute_from_tabular("myattr", "attr-tsv")
results = tree.subtree_info(562, ["taxname", "myattr"])

ntsubtree.update() # check for updates