Issues and Problems with Blast workflows
BLASTN
(Note: all these tests were done using the TavernaScuflWorkbench version win32-0.1beta6.)
Test_XLRHODOP
XScufl file
emblAccNumberXLRHODOPseqretBlastn.xml
This blastn_ncbi example uses the embl entry XLRHODOP as its test input.
- For the full sequence of 1684 letters (trim_from set to 1), a complete blast result was obtained in approx 5 mins.
- For a shortened sequence of 1084 letters (trim_from set to 601), an empty result set was returned after about 4 mins.
- This shortened sequence was accepted by the web interface for blastn (NCBI blastn).
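The two sequence lengths above are consistent with trim_from being a 1-based start position (1684 - 601 + 1 = 1084). A minimal sketch of that arithmetic follows. The function name trim_sequence and the placeholder sequence are assumptions made for illustration; they are not part of the workflow.

```python
def trim_sequence(sequence: str, trim_from: int) -> str:
    """Return the sequence from 1-based position trim_from to the end.

    Hypothetical helper: it mirrors how trim_from appears to behave,
    judging only from the lengths reported in these tests.
    """
    if trim_from < 1:
        raise ValueError("trim_from must be 1 or greater")
    return sequence[trim_from - 1:]


# Placeholder of the same length as XLRHODOP (not the real sequence).
full_sequence = "A" * 1684

print(len(trim_sequence(full_sequence, 1)))    # full sequence
print(len(trim_sequence(full_sequence, 601)))  # shortened sequence
```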
Test_X07024
XScufl file
emblAccNumberX07024seqretBlastn.xml
This blastn_ncbi example uses the embl entry X07024 as its test input.
- For the full sequence of 5257 letters, an empty result set was returned after about 7 mins.
- A shortened sequence of 2650 letters did give a result from the NCBI web interface (NCBI blastn).
Test_User_Sequence
XScufl file
sequenceBlastn.xml
This blastn_ncbi example allows the user to paste in a sequence. Its results were the same as for the examples that used seqret to fetch a test sequence: XLRHODOP is the only sequence for which results have been obtained.
Empty results
In several examples the blastn processor appears to complete, but the results are incomplete. For example, note that there is no "done" at the end of the "Searching...." line in the example below.
BLASTN 2.2.2 [Dec-14-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query=
(5257 letters)
Database: embl
3,078,285 sequences; 555,853,030 total letters
Searching.....................................................
(From blastn_res_x07024.txt: example empty result)
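A workflow step could detect this kind of truncated output automatically. The sketch below relies only on the observation above that an incomplete report has no "done" at the end of its "Searching" line. The function name search_completed is hypothetical, and other BLAST versions or output formats may need a different test.

```python
def search_completed(report: str) -> bool:
    """Return True if the 'Searching...' line of a BLAST text report ends with 'done'.

    Assumption: a complete report prints 'done' at the end of that line,
    as described in these notes. A report with no 'Searching' line at all
    is treated as incomplete.
    """
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Searching"):
            return stripped.endswith("done")
    return False


truncated = "Query=\n(5257 letters)\nDatabase: embl\nSearching.............\n"
print(search_completed(truncated))
```

A check like this could sit after the blastn processor so that an empty result is reported as a failure rather than passed downstream.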
Comparison Soaplab and Web Blastn
Versions and databases differ
The NCBI blastn web interface gives (05 Nov 2003), e.g. NCBI_BLAST_example.htm:
- BLASTN 2.2.6 [Apr-09-2003]
- Database: All GenBank+EMBL+DDBJ+PDB sequences
- Posted date: Nov 4, 2003 10:45 PM, Number of letters in database: -24,371,552, Number of sequences in database: 1,698,194
The Soaplab NCBI blastn web service gives (05 Nov 2003), e.g. blastn_res_XLRHODOP.txt:
- BLASTN 2.2.2 [Dec-14-2001]
- Database: embl
- Posted date: Sep 4, 2003 3:04 PM, Number of letters in database: 555,853,030, Number of sequences in database: 3,078,285
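When comparing the two services, the version and database can be read from the header of a plain-text report. The sketch below assumes the header layout seen in the Soaplab output quoted on this page (a "BLASTN version [date]" line and a "Database:" line). The function name blast_header is hypothetical, and the HTML output of the web interface would need to be converted to text first.

```python
import re


def blast_header(report: str) -> dict:
    """Extract program, version, build date and database from a BLAST text report.

    Only fields that are found are included in the returned dict.
    """
    info = {}
    program = re.search(
        r"^(T?BLAST[NPX])[ \t]+(\S+)[ \t]+\[([^\]]+)\]", report, re.MULTILINE
    )
    if program:
        info["program"] = program.group(1)
        info["version"] = program.group(2)
        info["build_date"] = program.group(3)
    database = re.search(r"^Database:[ \t]*(.+)$", report, re.MULTILINE)
    if database:
        info["database"] = database.group(1).strip()
    return info


soaplab_report = "BLASTN 2.2.2 [Dec-14-2001]\nQuery=\n(1684 letters)\nDatabase: embl\n"
print(blast_header(soaplab_report))
```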
Scope of results
An example result for blastn using the NCBI web interface, with the default parameters, includes some identifiers that do not appear to be valid for seqret, for example XM_352701 and BC047240 in the example below.
Utilities
An XScufl model to test retrieving a sequence using seqret and trimming it.
--
MarkGreenwood - 05 Nov 2003