Chapter 4. Analysis

Table of Contents

4.1. Blast
4.1.1. Running Blast
4.1.2. Parsing Blast
4.1.3. Bio::Tools::BPlite family parsers
4.1.4. PSI-BLAST (Position Specific Iterative Blast)
4.1.5. bl2seq: Blast 2 sequences
4.1.6. Blast Internal classes structure
4.2. Genscan

4.1. Blast

4.1.1. Running Blast

4.1.1.1. Running local Blast with Bio::Tools::Run::StandAloneBlast

Example 4.1. StandAloneBlast run

use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], -format => 'fasta');
my $query = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
 						 'database' => 'swissprot',
						 _READMETHOD => "Blast"
						 );
my $blast_report = $factory->blastall($query);
my $result = $blast_report->next_result;

while( my $hit = $result->next_hit()) { 
   print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n";}

	    
You can try it with this file.

Exercise 4.1. Running Blast on a Swissprot entry

Run the above example with the TAUD_ECOLI entry from Swissprot (see Section 2.2.2) ).

Solution A.10

Exercise 4.2. Running Blast: Setting parameters

Change the Expectation value (E) Blast parameter to 1.0 (instead of default value, 10.0) (query).

Solution A.11

Exercise 4.3. Running Blast: Saving output

Save the blast report in a local file (e.g blast.out).

Solution A.12

Exercise 4.4. Running a Remote Blast

Find out from the bioperl Web site documentation how to run a Blast at NCBI.

Solution A.13

4.1.2. Parsing Blast

4.1.2.1. Parsing from a Blast run result.

pseudocode:

Example 4.2. StandAloneBlast parsing

use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0],
                               -format => 'fasta');
my $query = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new(	'program'  => 'blastp',
				'database' => 'swissprot',
				_READMETHOD => "Blast" );
my $blast_report = $factory->blastall($query);
my $result = $blast_report->next_result;

while( my $hit = $result->next_hit()) {  
  print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n"; 
  while( my $hsp = $hit->next_hsp()) {
    print "E: ", $hsp->evalue(), "frac_identical: ", $hsp->frac_identical(), "\n";    }}

	    

$result belongs to the Bio::Search::Result::GenericResult class, where you can find the documentation about available methods on results.

Exercise 4.5. Display Blast hits

Display hit accession and hsp length.

Solution A.14

Exercise 4.6. Class of a Blast report

What is the class of $blast_report? (hint: use the ref() perl function).

Solution A.15

4.1.2.2. Parsing from a file.

The $blast_report that is created by the Bio::Tools::Run::StandAloneBlast new method actually belongs to the Bio::SearchIO class (see HOWTO). So, you can directly create such an object from a file.

Example 4.3. Parsing from a Blast file

use Bio::SearchIO;

my $blast_report = new Bio::SearchIO ('-format' => 'blast',
			              '-file'   => $ARGV[0]);
my $result = $blast_report->next_result;

while( my $hit = $result->next_hit()) {    
  print "\thit name: ", $hit->name(), "\n";    
  while( my $hsp = $hit->next_hsp()) { 	
    print "E: ", $hsp->evalue(), "frac_identical: ",	$hsp->frac_identical(), "\n";    }}

	    

Exercise 4.7. Parse a Blast output file

Run Example 4.3 with the file you have just created in Exercise 4.3.

Exercise 4.8. Parse Blast results on standard input

Run Example 4.3 with the blast report given on standard input.

Solution A.16

4.1.2.3. Blast Classes diagram

Figure 4.1. Blast Classes diagram

4.1.2.4. Filtering Blast output

Exercise 4.9. Filtering hits by length

Only display hits of length > to a given value (create a blast report from this file).

Solution A.17

Exercise 4.10. Filtering hits by position

Only display hits between two positions.

Solution A.18

Exercise 4.11. Display the best hit by databank

Display the best hit for each database in a Blast NCBI report (example).

  • If necessary, you can find here a dbname(hit->name) function to extract database name from the hit name.
  • You may use a perl hash to mark a database as already done:
    $done{$database} = 1;

Solution A.19

Exercise 4.12. Multiple queries

Submit two sequences and display the best hit for each. (2nd query)

Solution A.20

4.1.2.5. Extracting data from a Blast report

Exercise 4.13. Extracting the subject sequence

Extract as a string the subject sequence from HSP with identity above a given level.

Solution A.21

Exercise 4.14. Extracting alignments

Extract HSP alignment from hits which name contains a given pattern, and print this alignment in Clustalw format.

Solution A.22

Exercise 4.15. Locate EST in a genome

Locate EST in a genome:

  • displays the occurrences of this EST in this contig of a genome;

  • You may first display this as a Clustalw alignment (on sequence by HSP);

  • Blast report ;

Solution A.23

Exercise 4.16. Record Blast hits as sequence features

Iterate a local Blast on databases given on the command line and record each corresponding best hit as a feature for the query sequence.

Solution A.24

4.1.3. Bio::Tools::BPlite family parsers

The old Bio::Tools::BPlite family of parsers (BPlite, Bio::Tools::BPpsilite, Bio::Tools::BPbl2seq) is still maintained in the 1.0 release, although is will be deprecated one day. It is actually still necessary for blasting two sequences and to perform a psiblast (Position Specific Iterative Blast). It is also still the default parser returned by Bio::Tools::Run::StandAloneBlast, as shown below (although this might change soon).

Example 4.4. Parsing with BPLite

In the following example (notice that we have removed the _READMETHOD parameter), the $blast_report does not belong to the Bio::SearchIO class, but to the Bio::Tools::BPlite class. That is why the method to use in order to get informations are different, namely nextSbjct and nextHSP of class Bio::Tools::BPlite::Sbjct instead of next_hit and next_hsp. There is no next_result method, since these parsers were not able to deal with multiple reports.


use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], 
			      -format => 'fasta');
my $query = $Seq_in->next_seq();; 

my $factory = Bio::Tools::Run::StandAloneBlast->new(
						    'program'  => 'blastp',
						    'database' => 'swissprot'
						    );
my $blast_report = $factory->blastall($query);
while (my $subject = $blast_report->nextSbjct()) {
    print $subject->name(), "\n";
    while (my $hsp = $subject->nextHSP()) {
	print join("\t",
		   $hsp->P,
		   $hsp->percent,
		   $hsp->score), "\n";
    }
}

		  

Exercise 4.17. Print informations from a BPLite report

Take Example 4.4 and print hit names, and for all HSP, its P value, percent identity and score (see Bio::Tools::BPlite::HSP).

Solution A.25

Exercise 4.18. Create a Bio::Tools::BPlite from a file

Create a Bio::Tools::BPlite from a file.

Solution A.26

Exercise 4.19. Create a Bio::Tools::BPlite from standard input

Create a Bio::Tools::BPlite from standard input.

Solution A.27

Figure 4.2. BPLite Classes diagram

4.1.4. PSI-BLAST (Position Specific Iterative Blast)

See documentation or tutorial at NCBI.

Example 4.5. Running PSI-blast

The following example run a PSI-blast:

use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $query_in  = Bio::SeqIO->newFh ( -file   => $ARGV[0],
				    -format => 'fasta' );

my $query = <$query_in>; 
$factory = Bio::Tools::Run::StandAloneBlast->new('database' => 'swissprot');
$factory->j(2);
$blast_report = $factory->blastpgp($query);

for my $iteration (1 .. $blast_report->number_of_iterations()) {
  print "iteration: $iteration\n";
  my $result = $blast_report->round($iteration);

  while( my $hit = $result->nextSbjct()) {
    print "\thit name: ", $hit->name(), "\n";
    foreach my $hsp ($hit->nextHSP) {
      print "\tstart: ", $hsp->hit->start(), " end: ",$hsp->hit->end(),"\n";
      print "\tscore: ", $hsp->score(), "\n";	}    }}

	    

You can also load a report from a file, or from standard input:

        
my $blast_report = Bio::Tools::BPpsilite->new(-fh=>\*FH);
          

Example 4.6. PSI-Blast SearchIO class

There is also a SearchIO class for PSI-blast report, but it does not give access to the specific iterations:

use Bio::SearchIO;

my $in = Bio::SearchIO->new( -format  => 'psiblast',
				       -file    => $ARGV[0] );

while ( my $result = $in->next_result() ) { 
  foreach my $hit ( $result->hits ) {
    print "Hit: $hit\n"; }}

	  

Exercise 4.20. Parse a PSI-blast report

Load a PSI-blast report (example ) and at each iteration, display:

  • the hits that were not present at the previous iteration
  • the hits that have disappeared;

Solution A.28

4.1.5. bl2seq: Blast 2 sequences

Example 4.7. Running bl2seq

use Bio::SeqIO;use Bio::Tools::Run::StandAloneBlast;

my $query1_in  = Bio::SeqIO->newFh ( -file   => $ARGV[0],
				    -format => 'fasta' );
my $query1 = <$query1_in>; 
my $query2_in  = Bio::SeqIO->newFh ( -file   => $ARGV[1],
				    -format => 'fasta' );
my $query2 = <$query2_in>; 

$factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp');
$report = $factory->bl2seq($query1, $query2);
print "ref: ", ref($blast_report), "\n";

while(my $hsp = $report->next_feature) {  
  print "homology seq :\n",  $hsp->homologySeq, "\n";
  print "sbjctSeq :\n",  $hsp->sbjctSeq, "\n";}

	    

The call to bl2seq will return a Bio::Tools::BPbl2seq object.

Exercise 4.21. Build a Bio::SimpleAlign object

Build a Bio::SimpleAlign object from the subject and query sequences, and display this alignment in Clustalw format.

Solution A.29

4.1.6. Blast Internal classes structure

The following diagram shows the internal classes structure. It is provided for information, and it has not to be understood to use bioperl Blast classes.

Figure 4.3. Blast internal classes diagram