genomic reference sequences

coding DNA reference sequences

nucleotide numbering is based on the annotated protein isoform, the major translation product.

Initial recommendations (Antonarakis 1998; Den Dunnen & Antonarakis 2000) suggested two alternative formats for describing intronic variants: c.88+2T>G / c.89-1G>T and c.IVS2+2T>G / c.IVS2-1G>T. The format c.IVS2+2T>G / c.IVS2-1G>T has been retracted and should not be used.
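As an illustration of the difference between the two formats, the following Python sketch accepts simple intronic substitutions such as c.88+2T>G and rejects the retracted IVS form. The function name `parse_intronic_substitution` and both regular expressions are hypothetical and written only for this example; they cover just the two patterns quoted above, not the full nomenclature.

```python
import re

# Hypothetical patterns for this example only. They recognize simple intronic
# substitutions: c.<exon nucleotide><+/- offset><reference base>><new base>.
INTRONIC_SUB = re.compile(r"^c\.(\d+)([+-]\d+)([ACGT])>([ACGT])$")
# The retracted format names the intron (IVS2) instead of an exon nucleotide.
RETRACTED_IVS = re.compile(r"^c\.IVS\d+[+-]\d+")


def parse_intronic_substitution(description):
    """Split a description like 'c.88+2T>G' into its components."""
    if RETRACTED_IVS.match(description):
        raise ValueError("retracted IVS format: " + description)
    match = INTRONIC_SUB.match(description)
    if match is None:
        raise ValueError("not a simple intronic substitution: " + description)
    anchor, offset, ref, alt = match.groups()
    return {
        "anchor": int(anchor),   # nearest exon nucleotide, e.g. 88
        "offset": int(offset),   # signed distance into the intron, e.g. +2
        "ref": ref,
        "alt": alt,
    }
```

Here c.88+2T>G is split into the exon nucleotide 88 and the offset +2, while c.89-1G>T yields 89 and -1; passing c.IVS2+2T>G raises a `ValueError`.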

non-coding DNA reference sequences

RNA reference sequences

nucleotide numbering for an RNA reference sequence follows that of the associated coding or non-coding DNA reference sequence; for example, nucleotide r.123 corresponds to c.123 or n.123.
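The correspondence between RNA and DNA positions can be sketched in a few lines of Python. The helper `rna_to_dna_position` and its `coding` flag are assumptions made for this example; the sketch only swaps the prefix and leaves the position itself unchanged, which is all the rule above states.

```python
def rna_to_dna_position(rna_position, coding=True):
    """Map 'r.123' to 'c.123' (coding DNA) or 'n.123' (non-coding DNA).

    Hypothetical helper: the position is copied unchanged, only the
    prefix is replaced.
    """
    if not rna_position.startswith("r."):
        raise ValueError("expected an r. position: " + rna_position)
    prefix = "c." if coding else "n."
    return prefix + rna_position[2:]
```

With a coding DNA reference, `rna_to_dna_position("r.123")` gives `c.123`; with `coding=False` it gives `n.123`.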

protein reference sequences

amino acid numbering is p.1, p.2, p.3, …, from the first to the last amino acid of the reference sequence.



Reference Sequence Figure


The basic recommendation is that the reference sequence used represents the major and largest transcript of the gene. Variants present in alternative transcripts, not covered by the selected reference transcript, can be described based on annotated alternative transcript variants (e.g. LRG_199t3) or protein isoforms (e.g. LRG_199p3). However, alternatively spliced exons (5’-first, internal or 3’-terminal) derived from within the gene can also be numbered as for intronic sequences, and variants in transcripts initiating or terminating outside this region can be described as upstream or downstream sequences.
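The identifiers quoted above combine a locus number with a suffix: "t" plus a number for a transcript variant, "p" plus a number for a protein isoform. A minimal Python sketch of that pattern follows. The function `parse_lrg_id` and its regular expression are hypothetical, inferred only from the two examples LRG_199t3 and LRG_199p3, and are not an official parser.

```python
import re

# Hypothetical pattern inferred from the examples LRG_199t3 and LRG_199p3.
LRG_ID = re.compile(r"^LRG_(\d+)([tp])(\d+)$")


def parse_lrg_id(identifier):
    """Split an identifier like 'LRG_199t3' into locus, kind and index."""
    match = LRG_ID.match(identifier)
    if match is None:
        raise ValueError("unrecognized identifier: " + identifier)
    locus, kind, index = match.groups()
    return {
        "locus": int(locus),
        "kind": "transcript" if kind == "t" else "protein",
        "index": int(index),
    }
```

For LRG_199t3 this returns locus 199, kind "transcript" and index 3; LRG_199p3 returns the same locus and index with kind "protein".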