What is FASTA format?

FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters.
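
The record structure described above can be read with a few lines of code. What follows is a minimal Python sketch, not a reference parser: the function name `parse_fasta` is hypothetical, and ignoring blank lines and stripping whitespace around sequence lines are assumptions rather than rules stated on this page.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split FASTA text into (description, sequence) pairs.

    A record starts at a line with '>' in the first column; every line after
    it, up to the next '>' line, is sequence data and is concatenated.
    """
    records: List[Tuple[str, str]] = []
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence line: surrounding whitespace is not part of the sequence.
            chunks.append(line.strip())
        elif line.strip():
            # Non-blank text before any '>' line is not valid FASTA.
            raise ValueError("sequence data found before the first '>' line")
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = """>seq1 first example
ACGT
ACGT
>seq2
MNSE
"""
print(parse_fasta(example.splitlines()))
# [('seq1 first example', 'ACGTACGT'), ('seq2', 'MNSE')]
```

The function accepts any iterable of lines, so an open file object can be passed directly instead of a list.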

An example sequence in FASTA format is:

    >gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
    MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
    AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
    QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
    LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
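
The example wraps its sequence at 70 residues per line, which stays under the recommended 80-character limit. A minimal Python sketch of writing a record the same way follows; `format_fasta` is a hypothetical name, and its default `width=70` is simply copied from the example, not a required value.

```python
def format_fasta(description: str, sequence: str, width: int = 70) -> str:
    """Render one FASTA record with sequence lines of at most `width` characters.

    The description stays on one line, since the format calls for a
    single-line description; only the sequence is wrapped.
    """
    if not 0 < width < 80:
        raise ValueError("width should be 1..79 to keep lines under 80 characters")
    lines = [">" + description]
    # Slice the sequence into consecutive chunks of `width` residues.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(format_fasta("demo", "ACGT" * 20, width=30), end="")
```

Slicing past the end of a string is safe in Python, so the final, shorter line needs no special case.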
    
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions:
  • lower-case letters are accepted and are mapped to upper case;
  • a single hyphen or dash can be used to represent a gap of indeterminate length;
  • in amino acid sequences, U and * are acceptable letters (see below);
  • any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for an unknown nucleic acid residue or X for an unknown amino acid residue).
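
These exceptions amount to a small clean-up pass before a sequence is checked against the code tables. Below is a minimal Python sketch, assuming digits are removed rather than replaced and that whitespace should also be dropped; `normalize_sequence` is a hypothetical name.

```python
def normalize_sequence(raw: str) -> str:
    """Apply the exceptions listed above to raw sequence text.

    - lower-case letters are mapped to upper case;
    - numerical digits are removed (replacing them with N or X is the
      other option the list allows; removal is the choice made here);
    - whitespace is dropped as well, an assumption for text pasted with
      spaces or line breaks between blocks of residues.
    Hyphens ('-') and '*' are left untouched because they are valid codes.
    """
    return "".join(
        ch.upper() for ch in raw if not ch.isdigit() and not ch.isspace()
    )


print(normalize_sequence("1 acgtn ac-gt 10"))  # ACGTNAC-GT
```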

The nucleic acid codes are:

        A --> adenosine           M --> A C (amino)
        C --> cytidine            S --> G C (strong)
        G --> guanosine           W --> A T (weak)
        T --> thymidine           B --> G T C
        U --> uridine             D --> G A T
        R --> G A (purine)        H --> A C T
        Y --> T C (pyrimidine)    V --> G C A
        K --> G T (keto)          N --> A G C T (any)
                                  -  gap of indeterminate length
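
Each ambiguity code stands for a set of bases, so deciding whether two codes can describe the same position is a set intersection. The Python sketch below transcribes the table into a dictionary; the names `IUPAC_BASES`, `expand`, and `compatible` are hypothetical, and mapping U to T is an assumption made so that DNA and RNA letters compare as equal.

```python
from typing import Set

# Bases denoted by each IUPAC nucleic acid code, transcribed from the table above.
# Assumption: U is mapped to T so that RNA and DNA letters compare as equal.
# The gap symbol '-' is deliberately left out, because it denotes no base.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT", "B": "GTC", "D": "GAT",
    "H": "ACT", "V": "GCA", "N": "AGCT",
}


def expand(code: str) -> Set[str]:
    """Return the set of bases a single code may stand for (KeyError if unknown)."""
    return set(IUPAC_BASES[code.upper()])


def compatible(a: str, b: str) -> bool:
    """True if two codes share at least one possible base, e.g. R and A."""
    return bool(expand(a) & expand(b))


print(sorted(expand("B")))                         # ['C', 'G', 'T']
print(compatible("R", "A"), compatible("R", "C"))  # True False
```

To compare two aligned sequences of equal length position by position, one could use `all(compatible(x, y) for x, y in zip(s1, s2))`; gaps would need separate handling, since `-` is not in the dictionary.
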

The accepted amino acid codes are:

        A  alanine                         P  proline
        B  aspartate or asparagine         Q  glutamine
        C  cysteine                        R  arginine
        D  aspartate                       S  serine
        E  glutamate                       T  threonine
        F  phenylalanine                   U  selenocysteine
        G  glycine                         V  valine
        H  histidine                       W  tryptophan
        I  isoleucine                      Y  tyrosine
        K  lysine                          Z  glutamate or glutamine
        L  leucine                         X  any
        M  methionine                      *  translation stop
        N  asparagine                      -  gap of indeterminate length
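
The table also works as a whitelist for checking protein input. A minimal Python sketch follows; `AMINO_ACID_CODES` and `invalid_residues` are hypothetical names, and reporting 0-based positions is an arbitrary choice.

```python
from typing import List, Tuple

# Every letter in the amino acid table above, plus '*' (stop) and '-' (gap).
# J and O do not appear in the table, so they are reported as invalid here.
AMINO_ACID_CODES = frozenset("ABCDEFGHIKLMNPQRSTUVWXYZ*-")


def invalid_residues(seq: str) -> List[Tuple[int, str]]:
    """List (0-based position, character) for every character not in the table.

    Lower-case input is upper-cased first, following the exceptions list.
    """
    return [(i, ch) for i, ch in enumerate(seq.upper()) if ch not in AMINO_ACID_CODES]


print(invalid_residues("MNSERSDVTL"))  # []
print(invalid_residues("mnJe1"))       # [(2, 'J'), (4, '1')]
```

Digits are reported as invalid, which matches the exceptions list asking for them to be removed or replaced before submission.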
    
 


yangzhanglab@umich.edu | (734) 647-1549 | 100 Washtenaw Avenue Ann Arbor, MI 48109-2218