How to download local BLAST for Windows from the NCBI FTP site

This document describes the "BLAST" databases available on the NCBI

FTP site under the /blast/db directory. The direct URL is:

ftp://ftp.ncbi.nih.gov/blast/db (download location for the local BLAST databases)

1. General Introduction

NCBI BLAST home pages (http://www.ncbi.nih.gov/BLAST/) use a standard

set of BLAST databases for Nucleotide, Protein, and Translated BLAST

searches. These databases are made available in the /blast/db directory as

compressed archives (ftp://ftp.ncbi.nih.gov/blast/db/) in pre-formatted

form. (Note: these databases have already been built with makeblastdb, so they can be used directly after download.)

The FASTA databases reside under the /blast/db/FASTA directory.

The pre-formatted databases offer the following advantages:

* The pre-formatted databases are smaller in size and therefore are

faster to download

* Sequences in FASTA format can be generated from the pre-formatted

databases by the fastacmd utility (FASTA files can be exported from these database files)

* A convenient script (update_blastdb.pl) is available to download

the pre-formatted databases from the NCBI FTP site (the same script can be used to update them)

* Pre-formatting removes the need to run formatdb (no database-building command has to be run)

* Taxonomy ids are available for each database entry.

Pre-formatted databases must be downloaded using the update_blastdb.pl

script or via FTP in binary mode. Documentation for the update_blastdb.pl

script can be obtained by running the script without any arguments (perl is

required). (Note: downloading the databases requires either the Perl script update_blastdb.pl or an FTP client.)
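For readers who prefer a scripted FTP download, the binary-mode requirement can be sketched in Python as follows. This is a minimal sketch, not the update_blastdb.pl script: it assumes anonymous FTP login is accepted, the function name `download_archive` and its `ftp_factory` parameter are hypothetical, and the archive name in the usage note is only an illustration.

```python
from ftplib import FTP
from pathlib import Path

HOST = "ftp.ncbi.nih.gov"   # host named in this README
REMOTE_DIR = "/blast/db"    # directory named in this README


def download_archive(name, dest_dir=".", ftp_factory=FTP):
    """Fetch one archive from /blast/db in binary mode and return its local path.

    `ftp_factory` is a hypothetical hook so a stand-in class can be used in tests.
    """
    dest = Path(dest_dir) / name
    with ftp_factory(HOST) as ftp:
        ftp.login()                     # anonymous login (assumed to be allowed)
        ftp.cwd(REMOTE_DIR)
        with open(dest, "wb") as out:   # write raw bytes, never text
            # retrbinary switches the transfer to binary (TYPE I) before RETR
            ftp.retrbinary(f"RETR {name}", out.write)
    return dest
```

A call such as `download_archive("est.00.tar.gz", "C:/blastdb")` would then place the archive in the given folder; both the archive name and the folder are placeholders.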

The downloaded compressed files must be inflated with gzip or another decompression

tool. The BLAST database files can then be extracted from the resulting

tar file using the tar program on Unix/Linux, or WinZip and StuffIt Expander

on Windows and Macintosh platforms, respectively. (Note: the downloads are compressed archives and must be unpacked.)
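Where none of those tools is at hand, the two steps (gunzip, then untar) can also be done with Python's standard tarfile module. The following is a minimal sketch under that assumption; the function name `extract_archive` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest_dir="."):
    """Unpack one .tar.gz archive into dest_dir and return the member names."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" handles the gzip layer and the tar layer in a single pass
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names
```

Each downloaded archive is passed through the function once; the returned list shows which database files the archive contained.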

Large databases are formatted in multiple 1-gigabyte volumes, which

are named using the database.##.tar.gz convention. All relevant volumes

are required. An alias file is provided so that the database can be called

using the alias name without the extension (.nal or .pal). For example,

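to call the est database, simply use the "-d est" option on the command line

(without the quotes). (Note: large databases are usually split into several archives; for example, the nr database has 11. All

of the related archives must be downloaded and extracted. Extraction produces the database files along with an nr.pal alias file. To search nr, simply pass -d nr.)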
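Because every volume is required, it can help to check a download folder for gaps before searching. The sketch below builds names from the database.##.tar.gz convention described above. It assumes that volume numbering starts at 00 and that the caller already knows how many volumes to expect; the function names are hypothetical.

```python
from pathlib import Path


def volume_name(db, index):
    """Build an archive name following the database.##.tar.gz convention."""
    return f"{db}.{index:02d}.tar.gz"


def missing_volumes(db, expected_count, directory="."):
    """List the expected volume archives that are not present in `directory`.

    Assumes volumes are numbered 00, 01, ... up to expected_count - 1.
    """
    folder = Path(directory)
    wanted = [volume_name(db, i) for i in range(expected_count)]
    return [name for name in wanted if not (folder / name).is_file()]
```

An empty result means all expected volumes are in place; any names returned still need to be downloaded.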

Certain databases are subsets of a larger parental database. For those

databases, alias and mask files, rather than actual databases, are provided.

The mask file needs the parent database to function properly. The parent

databases should be generated on the same day as the mask file. For

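example, to use the pre-formatted swissprot database, swissprot.tar.gz, one

will need to get the nr.tar.gz with the same date stamp. (Note: some databases are subsets of a larger

database; to use such a subset, the parent database with the same date must be downloaded as well.)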
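One way to compare date stamps before downloading is to ask the FTP server for each file's modification time with the MDTM command and compare the dates. The sketch below only parses and compares two MDTM replies. It assumes the server supports MDTM and answers in the standard "213 YYYYMMDDHHMMSS" form; the function names are hypothetical.

```python
from datetime import datetime


def parse_mdtm(response):
    """Parse an FTP MDTM reply such as '213 20230429101530' into a datetime."""
    code, stamp = response.split(maxsplit=1)
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {response!r}")
    # Ignore any fractional seconds after the 14-digit timestamp
    return datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S")


def same_date_stamp(reply_a, reply_b):
    """Return True when both MDTM replies fall on the same calendar day."""
    return parse_mdtm(reply_a).date() == parse_mdtm(reply_b).date()
```

With a live ftplib connection, the replies could come from calls such as `ftp.sendcmd("MDTM swissprot.tar.gz")` and `ftp.sendcmd("MDTM nr.tar.gz")`, using the file names from the example above.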

Additional BLAST databases that are not provided in pre-formatted

form are available in the FASTA subdirectory. (Note: databases without pre-built

files can be downloaded from the FASTA directory.) For genomic BLAST

databases, please check the genomes FTP directory at:

ftp://ftp.ncbi.nih.gov/genomes/ (download location for genomic BLAST databases)

2. Contents of the /blast/db/ directory

The pre-formatted BLAST databases are archived in this directory. The

names of these databases and their contents are listed below.

+----------------------+-----------------------------------------------+
|File Name             | Content Description                           |
+----------------------+-----------------------------------------------+
/FASTA                 | subdirectory for FASTA formatted sequences
README                 | README for this subdirectory (this file)
env_nr.*tar.gz         | Environmental protein sequences
env_nt.*tar.gz         | Environmental nucleotide sequences
est.*tar.gz            | volumes of the formatted est database
                       | from the EST division of GenBank, EMBL,
                       | and DDBJ.

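The principle is simple: the latter is a subset of the former. The chromosome database contains only data from genomes that have already been sequenced. Your sequence is probably conserved only in higher organisms, which would explain why the number of similar hits drops so sharply when the chromosome database is selected. When running BLAST, the database usually has to be chosen according to the purpose of the search.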


