Webscraper io

cazy_webscraper version 1 is deprecated due to updates of the CAZy website. In addition, several new features and performance improvements have been added in version 2.

Configuring calculating CAZy coverage of GenBank.

Configuring cazy_webscraper using a YAML file.

The cazy_webscraper API, or interrogating the local CAZyme database.

Configuring retrieving genomic assembly data.

Retrieving genomic assembly data from NCBI.

Configuring PDB protein structure file retrieval.

Retrieving protein structure files from PDB.

Configuring extracting sequences from a local CAZyme db.

Extracting protein sequences from the local CAZyme database and building a BLAST database.

Configuring GenBank protein sequence data retrieval.

Retrieving protein sequences from GenBank.

WEBSCRAPER IO INSTALL

For now, please install via PyPI or from source. The Bioconda installation method is not currently supported, but we are working on getting this fixed soon.

WEBSCRAPER IO FULL

Please see the full documentation at ReadTheDocs.

The cazy_webscraper API facilitates interrogating the local CAZyme database.

Data can be retrieved for user-defined datasets of interest. cazy_webscraper can recover specified CAZy classes and/or CAZy families. These queries can be filtered by taxonomy at kingdom, genus, species or strain level. Successive CAZy queries can be collated into a single local database. A log of each query is recorded in the database for transparency, reproducibility and shareability. Holding the data in a local database enables users to integrate the dataset into analytical pipelines, and interrogate the data in a manner unachievable through the CAZy website.

Using the expand subcommand, a user can expand the core dataset. Specifically, cazy_webscraper can be used to retrieve data from the following external databases for CAZymes in the local CAZyme database that meet user-specified criteria, and add the downloaded data to the local CAZyme database:

Latest taxonomic classification, including complete lineage (including phylum, class, order and family) (version >= 2.1.2).

Latest genomic assembly data (GenBank and, when available, RefSeq version accession and ID numbers) (version >= 2.1.3).

Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB): structure files are written to disk, not stored in the local CAZyme database.

Retrieve the latest archaeal and bacterial taxonomic classifications (including complete lineages from kingdom to species) (version >= 2.2.0).

cazy_webscraper facilitates extracting information from the local CAZyme database. Protein sequences (retrieved from GenBank and/or UniProt) can be extracted from the local CAZyme database for CAZymes matching the user-specified criteria, and written to:

A FASTA file per extracted protein sequence.
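To illustrate what writing one FASTA file per extracted protein sequence involves, here is a minimal sketch in Python. It is not cazy_webscraper's own code: the function name `write_fasta_per_sequence`, the `(accession, sequence)` input format and the `.fasta` file naming are all assumptions made for this example.

```python
from pathlib import Path


def write_fasta_per_sequence(records, out_dir, width=60):
    """Write each (accession, sequence) pair to its own FASTA file.

    `records` is a hypothetical iterable of (accession, sequence) tuples,
    standing in for sequences extracted from a local CAZyme database.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for accession, sequence in records:
        path = out_dir / f"{accession}.fasta"  # assumed naming scheme
        # FASTA layout: a '>' header line, then the sequence split into
        # lines of at most `width` characters.
        chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        path.write_text(f">{accession}\n" + "\n".join(chunks) + "\n")
        written.append(path)
    return written
```

Calling `write_fasta_per_sequence([("ABC123.1", "MKT...")], "fasta_out")` would create `fasta_out/ABC123.1.fasta`; the accession and directory name here are placeholders.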

WEBSCRAPER IO CODE

cazy_webscraper is an application and Python3 package for the automated retrieval of protein data from the CAZy database. cazy_webscraper retrieves protein data from the CAZy database and stores the data in a local SQLite3 database. The code is distributed under the MIT license.

Please ensure you are using cazy_webscraper version 2 or newer. Bioconda installation is fixed for cazy_webscraper >= v2.1.3.1.
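Because the data is stored in a local SQLite3 database, it can be inspected with Python's standard `sqlite3` module. The sketch below makes no assumptions about cazy_webscraper's schema: it only lists the tables that are present. The helper name `list_tables` and the database path are hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    # Open read-only via a file: URI, so that inspecting the database
    # can never modify it or create a new empty file by mistake.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("path/to/local_cazyme.db")` (a placeholder path) returns the table names, which can then be queried with ordinary SQL.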