Standards


Nucleotides (DNA / RNA)

At the DNA and RNA level, HGVS nomenclature follows the Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences (see IUBMB (NC-IUB)), which specifies the description of nucleotides (see list), and the NCBI standards for sequence files and database searches (e.g. BLAST).


DNA

Symbol Meaning Description
A A Adenine
C C Cytosine
G G Guanine
T T Thymine
B C, G or T not-A (B follows A in alphabet)
D A, G or T not-C (D follows C in alphabet)
H A, C or T not-G (H follows G in alphabet)
K G or T Keto
M A or C aMino
N A, C, G or T aNy
R A or G puRine
S G or C Strong interaction (3 H-bonds)
V A, C or G not-T / not-U (V follows U in alphabet)
W A or T Weak interaction (2 H-bonds)
Y C or T pYrimidine
     
X* A, C, G or T masked nucleotide
-* none gap of indeterminate length

*used in alignment only
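The symbol table above can be read as a lookup from each code to the set of bases it stands for. The following is a minimal sketch of that reading, not part of the HGVS recommendations: the names IUPAC_DNA, expand and matches are hypothetical, and the alignment-only symbols X and - are deliberately left out.

```python
# Each symbol mapped to the bases it may represent, transcribed from the DNA table.
# The alignment-only symbols "X" and "-" are intentionally omitted (assumption).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT",
    "M": "AC", "N": "ACGT", "R": "AG", "S": "CG",
    "V": "ACG", "W": "AT", "Y": "CT",
}


def expand(symbol: str) -> set:
    """Return the set of concrete bases a single symbol can stand for."""
    return set(IUPAC_DNA[symbol.upper()])


def matches(pattern: str, sequence: str) -> bool:
    """True if each base of `sequence` is allowed by the symbol at the
    same position of `pattern` (both strings must have equal length)."""
    if len(pattern) != len(sequence):
        return False
    return all(base.upper() in expand(sym) for sym, base in zip(pattern, sequence))
```

For example, expand("R") gives the purines A and G, and matches("GRY", "GAC") is true because A is a purine and C is a pyrimidine.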


RNA

Symbol Meaning Description
a A Adenosine
c C Cytidine
g G Guanosine
u U Uridine
b c, g or u not-a (b follows a in alphabet)
d a, g or u not-c (d follows c in alphabet)
h a, c or u not-g (h follows g in alphabet)
k g or u keto
m a or c amino
n a, c, g or u any
r a or g purine
s g or c strong interaction (3 H-bonds)
v a, c or g not-u (v follows u in alphabet)
w a or u weak interaction (2 H-bonds)
y c or u pyrimidine
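Comparing the two tables, the RNA symbols are the DNA symbols written in lower case, with u in place of T. A small sketch of that conversion follows; the function name dna_to_rna_notation is hypothetical, and the sketch only rewrites symbols without checking that the input is a valid sequence.

```python
def dna_to_rna_notation(dna: str) -> str:
    """Rewrite DNA symbols (upper case, T) as RNA symbols (lower case, u).

    Ambiguity codes keep their letter: for example "Y" (C or T) becomes
    "y" (c or u), matching the RNA table.
    """
    return dna.upper().replace("T", "U").lower()
```

With this sketch, dna_to_rna_notation("ATGY") returns "augy".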

Genetic Code

At the protein level, HGVS nomenclature follows the Nomenclature and Symbolism for Amino Acids and Peptides (see IUPAC-IUB), which specifies the description of amino acids (see list). In addition, HGVS nomenclature uses “Ter” (three-letter amino acid code) and “*” (three- and one-letter amino acid code) to indicate a translation termination (stop) codon. NOTE: in older versions “X” was used instead. In the table below, to support translation from a DNA sequence, the codons are written with “T”, although in nature RNA is translated, so the codons contain U’s.

Nucleotide position in codon
First  Second                                                  Third
       T             C             A             G
T      TTT - Phe     TCT - Ser     TAT - Tyr     TGT - Cys     T
       TTC - Phe     TCC - Ser     TAC - Tyr     TGC - Cys     C
       TTA - Leu     TCA - Ser     TAA - */Ter   TGA - */Ter   A
       TTG - Leu     TCG - Ser     TAG - */Ter   TGG - Trp     G
C      CTT - Leu     CCT - Pro     CAT - His     CGT - Arg     T
       CTC - Leu     CCC - Pro     CAC - His     CGC - Arg     C
       CTA - Leu     CCA - Pro     CAA - Gln     CGA - Arg     A
       CTG - Leu     CCG - Pro     CAG - Gln     CGG - Arg     G
A      ATT - Ile     ACT - Thr     AAT - Asn     AGT - Ser     T
       ATC - Ile     ACC - Thr     AAC - Asn     AGC - Ser     C
       ATA - Ile     ACA - Thr     AAA - Lys     AGA - Arg     A
       ATG - Met     ACG - Thr     AAG - Lys     AGG - Arg     G
G      GTT - Val     GCT - Ala     GAT - Asp     GGT - Gly     T
       GTC - Val     GCC - Ala     GAC - Asp     GGC - Gly     C
       GTA - Val     GCA - Ala     GAA - Glu     GGA - Gly     A
       GTG - Val     GCG - Ala     GAG - Glu     GGG - Gly     G
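As an illustration of how the table supports translation from a DNA sequence, here is a minimal sketch that builds the codon lookup in the same T, C, A, G order and translates a coding sequence into one-letter amino acid codes. The names BASES, AMINO_ACIDS, CODON_TABLE and translate are hypothetical. Two choices are assumptions of this sketch rather than HGVS rules: translation stops at the first stop codon, and any codon that is not in the table (for example one containing N) is reported as "X".

```python
from itertools import product

# Bases in the row/column order used by the table: T, C, A, G.
BASES = "TCAG"

# One-letter amino acid codes for all 64 codons, ordered by first, second,
# then third base (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, codon by codon, from position 1.

    Stops after the first stop codon (assumption of this sketch); codons
    not found in the table are reported as "X". Trailing bases that do not
    form a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)
```

For example, translate("ATGGCCTGA") reads ATG, GCC and TGA and returns "MA*".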

Amino Acid Descriptions

The amino acid descriptions follow the Nomenclature and Symbolism for Amino Acids and Peptides (see IUPAC-IUB). In addition, HGVS nomenclature uses “Ter” (three-letter amino acid code) and “*” (three- and one-letter amino acid code) to indicate a translation termination (stop) codon (NOTE: in older versions “X” was used instead). In the table below, to support translation from a DNA sequence, the codons are written with “T”, although in nature RNA is translated, so the codons contain U’s.

One Letter Code Three Letter Code Amino Acid Possible Codons Systematic Name Formula
A Ala Alanine GCA, GCC, GCG, GCT 2-Aminopropanoic acid CH3-CH(NH2)-COOH
B Asx Aspartic acid or Asparagine AAC, AAT, GAC, GAT    
C Cys Cysteine TGC, TGT 2-Amino-3-mercaptopropanoic acid HS-CH2-CH(NH2)-COOH
D Asp Aspartic acid GAC, GAT 2-Aminobutanedioic acid HOOC-CH2-CH(NH2)-COOH
E Glu Glutamic acid GAA, GAG 2-Aminopentanedioic acid HOOC-[CH2]2-CH(NH2)-COOH
F Phe Phenylalanine TTC, TTT 2-Amino-3-phenylpropanoic acid C6H5-CH2-CH(NH2)-COOH
G Gly Glycine GGA, GGC, GGG, GGT Aminoethanoic acid CH2(NH2)-COOH
H His Histidine CAC, CAT 2-Amino-3-(1H-imidazol-4-yl)-propanoic acid Histidine
I Ile Isoleucine ATA, ATC, ATT 2-Amino-3-methylpentanoic acid C2H5-CH(CH3)-CH(NH2)-COOH
K Lys Lysine AAA, AAG 2,6-Diaminohexanoic acid H2N-[CH2]4-CH(NH2)-COOH
L Leu Leucine CTA, CTC, CTG, CTT, TTA, TTG 2-Amino-4-methylpentanoic acid (CH3)2CH-CH2-CH(NH2)-COOH
M Met Methionine ATG (translation initiation) 2-Amino-4-(methylthio)butanoic acid CH3-S-[CH2]2-CH(NH2)-COOH
N Asn Asparagine AAC, AAT 2-Amino-3-carbamoylpropanoic acid H2N-CO-CH2-CH(NH2)-COOH
P Pro Proline CCA, CCC, CCG, CCT Pyrrolidine-2-carboxylic acid Proline
Q Gln Glutamine CAA, CAG 2-Amino-4-carbamoylbutanoic acid H2N-CO-[CH2]2-CH(NH2)-COOH
R Arg Arginine AGA, AGG, CGA, CGC, CGG, CGT 2-Amino-5-guanidinopentanoic acid H2N-C(=NH)-NH-[CH2]3-CH(NH2)-COOH
S Ser Serine AGC, AGT, TCA, TCC, TCG, TCT 2-Amino-3-hydroxypropanoic acid HO-CH2-CH(NH2)-COOH
T Thr Threonine ACA, ACC, ACG, ACT 2-Amino-3-hydroxybutanoic acid CH3-CH(OH)-CH(NH2)-COOH
U Sec Selenocysteine TGA   H2N-CH(COOH)-CH2-SeH
V Val Valine GTA, GTC, GTG, GTT 2-Amino-3-methylbutanoic acid (CH3)2CH-CH(NH2)-COOH
W Trp Tryptophan TGG 2-Amino-3-(1H-indol-3-yl)-propanoic acid Tryptophan
X Xaa unknown or ‘other’ NNN    
Y Tyr Tyrosine TAC, TAT 2-Amino-3-(4-hydroxyphenyl)-propanoic acid Tyrosine
Z Glx Glutamic acid or Glutamine      
* * Termination TAA, TAG, TGA (translation termination) HGVS addition (V2.0)
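The one-letter and three-letter columns of the table lend themselves to a simple two-way lookup. The sketch below is illustrative only: the names ONE_TO_THREE, THREE_TO_ONE, to_three_letter and to_one_letter are hypothetical, and writing the stop codon as "Ter" in three-letter output is a choice of this sketch (the text above allows "*" there as well).

```python
# One-letter to three-letter codes, transcribed from the table.
# The stop codon is written "Ter" in three-letter output (assumption).
ONE_TO_THREE = {
    "A": "Ala", "B": "Asx", "C": "Cys", "D": "Asp", "E": "Glu",
    "F": "Phe", "G": "Gly", "H": "His", "I": "Ile", "K": "Lys",
    "L": "Leu", "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln",
    "R": "Arg", "S": "Ser", "T": "Thr", "U": "Sec", "V": "Val",
    "W": "Trp", "X": "Xaa", "Y": "Tyr", "Z": "Glx", "*": "Ter",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter amino acid string to three-letter codes."""
    return "".join(ONE_TO_THREE[aa] for aa in seq.upper())


def to_one_letter(seq: str) -> str:
    """Convert concatenated three-letter codes to one-letter codes.

    A literal "*" is accepted as a stop codon alongside "Ter".
    """
    result = []
    i = 0
    while i < len(seq):
        if seq[i] == "*":
            result.append("*")
            i += 1
        else:
            # Normalise case so "MET" and "met" are both read as "Met".
            result.append(THREE_TO_ONE[seq[i:i + 3].capitalize()])
            i += 3
    return "".join(result)
```

For example, to_three_letter("MA*") returns "MetAlaTer", and to_one_letter("MetAlaTer") returns "MA*".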