How can I account for numbers in scientific notation and in decimal form in Perl regex?


I'm new to Perl regex and would appreciate some help. I am parsing BLAST outputs. Right now, I can only account for hits whose e-value contains integers and decimals. How can I also include hits whose e-value is in scientific notation?

blastoutput.txt

                                                                   score     e
sequences producing significant alignments:                       (bits)  value

ref|wp_001577367.1|  hypothetical protein [escherichia coli] >...  75.9    4e-15
ref|wp_001533923.1|  cytotoxic necrotizing factor 1 [escherich...  75.9    7e-15
ref|wp_001682680.1|  cytotoxic necrotizing factor 1 [escherich...  75.9    7e-15
ref|zp_15044188.1|  cytotoxic necrotizing factor 1 domain prot...  40.0    0.002
ref|yp_650655.1|  hypothetical protein ypa_0742 [yersinia pest...  40.0    0.002

alignments
>ref|wp_001577367.1| hypothetical protein [escherichia coli]

parse.pl

open (file, './blastoutput.txt');
$marker = 0;
@one;
@acc;
@desc;
@score;
@evalue;
$counter=0;
while(<file>){
    chomp;
    if($marker==1){
        if(/^(\d+)\|(.+?)\|\s(.*?)\s(\d+)(\.\d+)? +(\d+)([\.\d+]?) *$/) {
        #if(/^(\d+)\|(.+?)\|\s(.*?)\s(\d+)(\.\d+)? +(\d+)((\.\d+)?(e.*?)?) *$/)
            $one[$counter] = $1;
            $acc[$counter] = $2;
            $desc[$counter] = $3;
            $score[$counter] = $4+$5;
            if(! $7){
                $evalue[$counter] = $6;
            }else{
                $evalue[$counter] = $6+$7;
            }
            $counter++;
        }
    }
    if(/sequences producing significant alignments/){
        $marker = 1;
    }elsif(/alignments/){
        $marker = 0;
    }elsif(/no significant similarity found/){
        last;
    }
}
for(my $i=0; $i < scalar(@one); $i++){
    print "$one[$i] | $acc[$i] | $desc[$i] | $score[$i] | $evalue[$i]\n";
}
close file;

You can match a number, whether or not it is in scientific notation, with this pattern:

\d+(?:\.\d+)?+(?:e[+-]?\d+)?+ 
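As a quick check, the pattern can be anchored and tried against a few values taken from the sample output in the question. This is only a minimal sketch: `$NUMBER` and `is_number()` are names made up for the illustration, and the last two test values are invented to show what gets rejected.

```perl
use strict;
use warnings;

# The number pattern from above, stored as a compiled regex.
# $NUMBER and is_number() are hypothetical names used only in this sketch.
my $NUMBER = qr/\d+(?:\.\d+)?+(?:e[+-]?\d+)?+/;

sub is_number {
    my ($text) = @_;
    # \A and \z anchor the pattern so the whole string has to match.
    return $text =~ /\A$NUMBER\z/ ? 1 : 0;
}

# Values from the sample output, plus two strings that should be rejected.
for my $value ('4e-15', '7e-15', '0.002', '75.9', '40', 'e-15', '1.') {
    printf "%-6s %s\n", $value, is_number($value) ? 'matches' : 'no match';
}
```

The pattern only accepts a lowercase `e`, which is what the sample output uses; if the exponent marker could be uppercase, the `/i` modifier would be needed.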

In your code, that gives:

if (/^([^|]+)\|([^|]+)\|\s++(.*?)\s(\d+(?:\.\d+)?+)\s+(\d+(?:\.\d+)?+(?:e[+-]?\d+)?+)\s*$/) {
    $one[$counter] = $1;
    $acc[$counter] = $2;
    $desc[$counter] = $3;
    $score[$counter] = $4;
    $evalue[$counter] = $5;
    $counter++;
}

(I have added the possessive quantifiers ++ and ?+ to reduce the number of backtracking steps as much as possible, except for the third group, which uses a lazy quantifier. It would be better to use a more precise pattern for the description part, if possible.)
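To see the whole-line regex at work outside the parsing loop, here is a small self-contained sketch that applies it to two lines copied from the sample output. `parse_hit()` and the hash keys are hypothetical names chosen for the example, not part of the original script.

```perl
use strict;
use warnings;

# parse_hit() is a hypothetical helper wrapping the regex shown above.
# It returns a hash reference on success and nothing on failure.
sub parse_hit {
    my ($line) = @_;
    if ($line =~ /^([^|]+)\|([^|]+)\|\s++(.*?)\s(\d+(?:\.\d+)?+)\s+(\d+(?:\.\d+)?+(?:e[+-]?\d+)?+)\s*$/) {
        return { one => $1, acc => $2, desc => $3, score => $4, evalue => $5 };
    }
    return;
}

# Two hit lines copied from blastoutput.txt in the question.
my @lines = (
    'ref|wp_001577367.1|  hypothetical protein [escherichia coli] >...  75.9    4e-15',
    'ref|zp_15044188.1|  cytotoxic necrotizing factor 1 domain prot...  40.0    0.002',
);

for my $line (@lines) {
    my $hit = parse_hit($line) or next;
    # Adding 0 forces numeric context; Perl reads the string "4e-15" as a float.
    printf "%s | %s | %g\n", $hit->{acc}, $hit->{score}, $hit->{evalue} + 0;
}
```

Since the e-value is captured as a single string, there is no need to add the integer and fractional parts back together as in the original script: Perl converts a string such as "4e-15" to a number whenever it is used in numeric context. Note that with this regex the captured description can keep a trailing space when the columns are separated by more than one space.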

