What is FASTA format?
FASTA format is a text-based format for representing
either nucleotide sequences or peptide sequences,
in which nucleotides or amino acids are represented using
single-letter codes.
A sequence in FASTA format begins with a single-line description, followed
by lines of sequence data. The description line is distinguished from the
sequence data by a greater-than (">") symbol in the first column. It is
recommended that all lines of text be shorter than 80 characters in length.
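As an illustration of the layout just described, the following minimal Python sketch writes one record with a ">" description line followed by wrapped sequence lines. The function name format_fasta and the default width of 70 are assumptions made for this example, not part of the format; any width under 80 satisfies the recommendation. The description line is not wrapped, since it must stay on a single line.

```python
def format_fasta(description, sequence, width=70):
    """Return one FASTA record as text (hypothetical helper, not a standard API).

    The description goes on a single line starting with ">" in the first
    column; the sequence is split into lines of at most `width` characters.
    """
    if not 0 < width < 80:
        raise ValueError("width should be between 1 and 79")
    lines = [">" + description]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_fasta("example_1 hypothetical record", "ACGT" * 40), end="")
```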
An example sequence in FASTA format is:
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
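A record like the one above can be read back with a few lines of code. The sketch below is one possible approach, assuming the whole input fits in memory as a string; the name parse_fasta is hypothetical. It treats any line with ">" in the first column as the start of a new record and joins the following lines into a single sequence string.

```python
def parse_fasta(text):
    """Yield (description, sequence) pairs from FASTA-formatted text.

    Hypothetical helper for illustration. Lines before the first ">" line
    are ignored, and blank lines are skipped.
    """
    description = None
    chunks = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            chunks.append(line.strip())
    if description is not None:
        yield description, "".join(chunks)


if __name__ == "__main__":
    sample = ">seq1 first record\nACGT\nTTAA\n>seq2\nMNSE\n"
    for desc, seq in parse_fasta(sample):
        print(desc, len(seq))
```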
Sequences are expected to be represented in the standard IUB/IUPAC
amino acid and nucleic acid codes, with these exceptions:
- lower-case letters are accepted and are mapped into upper-case;
- a single hyphen or dash can be used to represent a gap of indeterminate
length;
- in amino acid sequences, U and * are acceptable letters (see below);
- any numerical digits in the query sequence should either be removed or
replaced by appropriate letter codes (e.g., N for unknown nucleic acid
residue or X for unknown amino acid residue).
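The case and digit rules above can be applied before a sequence is submitted. The following is a minimal sketch, not the behavior of any particular tool: the name normalize_sequence and its replacement parameter are assumptions for this example. By default digits are removed; passing "N" (nucleic acid) or "X" (amino acid) replaces each digit instead. Whitespace is also dropped, which is an extra assumption made here for convenience.

```python
def normalize_sequence(sequence, replacement=""):
    """Upper-case a sequence and remove or replace numerical digits.

    Hypothetical helper. `replacement` is substituted for every digit:
    "" removes digits, "N" or "X" marks them as unknown residues.
    """
    out = []
    for ch in sequence:
        if ch.isspace():
            continue  # assumption: whitespace carries no sequence data
        if ch.isdigit():
            out.append(replacement)
        else:
            out.append(ch.upper())  # lower-case maps to upper-case
    return "".join(out)


if __name__ == "__main__":
    print(normalize_sequence("acgt 12 ac-gt"))
    print(normalize_sequence("mk3l", replacement="X"))
```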
The nucleic acid codes are:
A --> adenosine           M --> A C (amino)
C --> cytidine            S --> G C (strong)
G --> guanine             W --> A T (weak)
T --> thymidine           B --> G T C
U --> uridine             D --> G A T
R --> G A (purine)        H --> A C T
Y --> T C (pyrimidine)    V --> G C A
K --> G T (keto)          N --> A G C T (any)
                          -     gap of indeterminate length
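The nucleic acid codes can be held in a lookup table for checking input. The sketch below transcribes the list above into a Python dictionary; the names IUPAC_NUCLEOTIDES, bases_for, and unknown_codes are hypothetical. Mapping U to itself rather than to T is a choice made for this example.

```python
# Each code maps to the bases it can stand for, as listed above.
# The gap character "-" is handled separately in unknown_codes().
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}


def bases_for(code):
    """Return the set of bases that a single code can represent."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])


def unknown_codes(sequence):
    """Return the sorted characters that are neither a listed code nor a gap."""
    allowed = set(IUPAC_NUCLEOTIDES) | {"-"}
    return sorted({ch for ch in sequence.upper() if ch not in allowed})


if __name__ == "__main__":
    print(sorted(bases_for("R")))
    print(unknown_codes("ACGTXZ"))
```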
The accepted amino acid codes are:
A  alanine                    P  proline
B  aspartate or asparagine    Q  glutamine
C  cysteine                   R  arginine
D  aspartate                  S  serine
E  glutamate                  T  threonine
F  phenylalanine              U  selenocysteine
G  glycine                    V  valine
H  histidine                  W  tryptophan
I  isoleucine                 Y  tyrosine
K  lysine                     Z  glutamate or glutamine
L  leucine                    X  any
M  methionine                 *  translation stop
N  asparagine                 -  gap of indeterminate length
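A protein sequence can be checked against the accepted codes in the same way. This is a minimal sketch, assuming only the characters listed above are valid; the names AMINO_ACID_CODES and invalid_residues are hypothetical. Input is upper-cased first, since lower-case letters are accepted.

```python
# Accepted amino acid codes as listed above, plus "*" (stop) and "-" (gap).
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(sequence):
    """Return the sorted characters in `sequence` that are not accepted codes."""
    return sorted({ch for ch in sequence.upper() if ch not in AMINO_ACID_CODES})


if __name__ == "__main__":
    print(invalid_residues("MNSERSDVTLYQ"))
    print(invalid_residues("MKJ1O"))
```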