Numbering


genomic reference sequences

nucleotide numbering is g.1, g.2, g.3, … from the first to the last nucleotide of the reference sequence. Nucleotide numbers based on a genomic reference sequence do not include “+”, “-”, “*” or other prefixes.
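
The rule above can be sketched as a small check that accepts only plain positive integers as genomic positions. The function name `is_valid_genomic_position` is hypothetical and not part of any HGVS tool. It looks only at the position (the part after "g."), not at a full variant description, and rejecting leading zeros is an assumption made here for illustration.

```python
import re

# Hypothetical helper, for illustration only.
# A genomic position is a plain positive integer: numbering starts at 1 and
# carries no "+", "-", "*" or other prefix.
# Assumption: leading zeros (e.g. "012") are also rejected.
_GENOMIC_POSITION = re.compile(r"[1-9][0-9]*")


def is_valid_genomic_position(position: str) -> bool:
    """Return True if `position` is a plain positive integer string."""
    return _GENOMIC_POSITION.fullmatch(position) is not None
```

With this sketch, `is_valid_genomic_position("123")` is accepted, while "-5", "+2", "*7" and "0" are all rejected.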


mitochondrial DNA reference sequences

nucleotide numbering is g.1, g.2, g.3, … from the first to the last nucleotide of the reference sequence. Nucleotide numbers based on a mitochondrial reference sequence do not include “+”, “-”, “*” or other prefixes. NOTE: the earlier suggestion to use an “m.” prefix has been retracted (see Reference Sequences).


coding DNA reference sequences

nucleotide numbering is based on the annotated protein isoform, the major translation product.

Initial recommendations (Antonarakis (1998) and Den Dunnen & Antonarakis (2000)) suggested two alternative descriptions for intronic variants: c.88+2T>G / c.89-1G>T and c.IVS2+2T>G / c.IVS2-1G>T. The format c.IVS2+2T>G / c.IVS2-1G>T has been retracted and should not be used.
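
As a rough illustration of the retraction above, the sketch below flags descriptions written in the retired "IVS" form so they can be rewritten in the c.88+2T>G style. The function name `uses_retracted_ivs_format` is hypothetical, and the pattern only covers the shape of the two examples quoted here; it is not a full parser for variant descriptions.

```python
import re

# Hypothetical helper, for illustration only.
# Matches the retracted intron-based form seen in the examples,
# e.g. "c.IVS2+2T>G" or "c.IVS2-1G>T": "c.IVS", an intron number,
# a "+" or "-" sign, then an offset.
_RETRACTED_IVS = re.compile(r"^c\.IVS[0-9]+[+-][0-9]+")


def uses_retracted_ivs_format(description: str) -> bool:
    """Return True if `description` starts with the retracted c.IVS form."""
    return _RETRACTED_IVS.match(description) is not None
```

Under these assumptions, "c.IVS2+2T>G" and "c.IVS2-1G>T" are flagged, while "c.88+2T>G" and "c.89-1G>T" pass through unflagged.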


non-coding DNA reference sequences


RNA reference sequences

nucleotide numbering for an RNA reference sequence follows that of the associated coding or non-coding DNA reference sequence; nucleotide r.123 relates to c.123 or n.123.
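
Because the RNA position is the same as the position on the associated DNA reference sequence, the correspondence can be sketched as a simple prefix swap. The function name `rna_to_dna_position` and its `dna_prefix` parameter are hypothetical; the caller is assumed to know whether the associated reference sequence is coding ("c") or non-coding ("n").

```python
def rna_to_dna_position(rna_position: str, dna_prefix: str = "c") -> str:
    """Map an RNA position such as "r.123" to "c.123" or "n.123".

    Hypothetical helper, for illustration only: the numeric part is kept
    unchanged and only the prefix is replaced.
    """
    if not rna_position.startswith("r."):
        raise ValueError(f"expected an 'r.' position, got {rna_position!r}")
    if dna_prefix not in ("c", "n"):
        raise ValueError(f"dna_prefix must be 'c' or 'n', got {dna_prefix!r}")
    # Drop the leading "r." and attach the DNA prefix to the same position.
    return f"{dna_prefix}.{rna_position[2:]}"
```

For example, `rna_to_dna_position("r.123")` gives "c.123", and `rna_to_dna_position("r.123", dna_prefix="n")` gives "n.123".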


protein reference sequences

amino acid numbering is p.1, p.2, p.3, … from the first to the last amino acid of the reference sequence. Amino acid numbers based on a protein reference sequence do not include “+”, “-”, “*” or other prefixes.


Q&A


Figure

Reference Sequence Figure


Examples

The basic recommendation is that the reference sequence used represents the major and largest transcript of the gene. Variants present in alternative transcripts, not covered by the selected reference transcript, can be described based on annotated alternative transcript variants (e.g. LRG_199t3) or protein isoforms (e.g. LRG_199p3). However, alternatively spliced exons (5’-first, internal or 3’-terminal) derived from within the gene can also be numbered as for intronic sequences, and variants in transcripts initiating or terminating outside this region can be described as upstream or downstream sequences.
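
The two identifiers quoted above differ only in the letter before the final number: "t" for a transcript variant and "p" for a protein isoform. A minimal sketch of splitting such an identifier into its parts follows. The names `LrgReference` and `parse_lrg_reference` are hypothetical, and the pattern is inferred only from the two examples given here (LRG_199t3 and LRG_199p3).

```python
import re
from typing import NamedTuple


class LrgReference(NamedTuple):
    """Hypothetical container for the parts of an identifier like LRG_199t3."""
    locus: int   # the number after "LRG_", e.g. 199
    kind: str    # "transcript" for "t", "protein" for "p"
    index: int   # the trailing number, e.g. 3


# Pattern inferred from the examples LRG_199t3 and LRG_199p3.
_LRG = re.compile(r"LRG_([0-9]+)([tp])([0-9]+)")


def parse_lrg_reference(identifier: str) -> LrgReference:
    """Split an identifier such as "LRG_199t3" into locus, kind and index."""
    match = _LRG.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a recognised identifier: {identifier!r}")
    kind = "transcript" if match.group(2) == "t" else "protein"
    return LrgReference(int(match.group(1)), kind, int(match.group(3)))
```

Here `parse_lrg_reference("LRG_199t3")` yields locus 199, kind "transcript", index 3, and "LRG_199p3" yields the same locus and index with kind "protein".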