Category: Science and Technology

Date Submitted: 11/28/2012

Bioinformatics is the application of informatics (IT, computer science, and statistics) to molecular biology. It studies methods for storing, retrieving, and analysing biological data, e.g. nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways, and genetic interactions.

Data mining is the use of computer programs to find useful information by filtering or sifting through the data.

Uses of Bioinformatics:

•Sequence alignment/comparison

•Constructing genomes

•Finding and understanding genes

•Understanding large datasets

•Protein structure/interaction prediction

•Drug design

•Genome-wide association studies

FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. Search tools often accept a query as a sequence in FASTA format, an accession number, or a gene identifier (GI). A FASTA record has a single description line starting with ">", followed by the sequence as continuous text. The word immediately following the ">" symbol is the identifier of the sequence.
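The FASTA layout described above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the function name parse_fasta and the example records are hypothetical, and it assumes that a sequence may be wrapped over several lines (common in practice), which it joins back into continuous text.

```python
def parse_fasta(text):
    """Yield (identifier, description, sequence) for each record in FASTA text."""
    identifier, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            parts = line[1:].strip().split(None, 1)
            identifier = parts[0] if parts else ""           # first word after ">"
            description = parts[1] if len(parts) > 1 else ""  # rest of the header
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if identifier is not None:
        yield identifier, description, "".join(chunks)


# Hypothetical example data: one nucleotide record and one peptide record.
example = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2
MKVLAA
"""

records = list(parse_fasta(example))
for ident, desc, seq in records:
    print(ident, len(seq), seq)
```

Splitting the header on the first run of whitespace mirrors the rule that the word right after ">" is the identifier and everything else is free-text description.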

BLAST (Basic Local Alignment Search Tool) searches DNA and protein sequence databases for sequences that are similar to your query sequence. It searches the DNA and/or protein data banks (GenBank, EMBL, Swiss-Prot, etc.) for sequences that are a near match to yours. You can give it a DNA sequence or a protein sequence and ask it to search either DNA or protein databases, or both.
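To make "local alignment" concrete, the sketch below scores the best locally matching region between two sequences using Smith-Waterman dynamic programming. This is not how BLAST itself works: BLAST uses a faster seed-and-extend heuristic, while Smith-Waterman compares every position exhaustively. The function name and the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a against b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical query and a toy "database" of two sequences.
query = "ACGT"
database = {"hit": "TTACGTTT", "unrelated": "GGGGGGGG"}
scores = {name: local_alignment_score(query, seq) for name, seq in database.items()}
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

Ranking database sequences by this score gives a small-scale picture of what a similarity search reports: the entries whose best local region most resembles the query come first.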

Explanation of the output from BLAST:

The E-value is the number of sequences of equal or higher similarity to the probe (query) sequence that would be expected to be found in the database just by chance. It is therefore an indicator of confidence: the closer the E-value is to zero, the more confident you can be that the sequences really are homologous.
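The relationship between alignment score, search-space size, and E-value can be sketched with the standard Karlin-Altschul approximation, E = m × n × 2^(-S'), where S' is the normalised bit score, m is the query length, and n is the total database length. The function name and input numbers below are hypothetical, and real BLAST output applies effective-length corrections, so its reported values will differ somewhat.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value: chance hits expected at or above the given bit score."""
    search_space = query_length * database_length
    return search_space * 2.0 ** (-bit_score)


# Hypothetical numbers: a 300-residue query against a database of 10 million residues.
for score in (20, 40, 60):
    print(score, expected_hits(score, 300, 10_000_000))
```

Each extra bit of score halves the E-value, while a larger database raises it, which is why the same alignment is less convincing when found in a bigger search space.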

Query coverage is the fraction of the query sequence that is aligned. Computed as the length of the aligned subsequence of the...