Open Issues

Community Consultation

Proposal SVD-WG006 (circular DNA), which suggests extending the HGVS recommendations to allow an “o.” prefix for circular genomic reference sequences, is currently open for Community Consultation (closing: Oct. 30, 2018).

Ongoing discussions

For closed topics see below.

Protein Extensions

Would you consider a small change in the suggested nomenclature for the description of extensions from p.*110Glnext*17 to p.*110Glnext17? (Yael Shinar, Tel Hashomer, Israel)

The description of extensions can indeed probably be simplified. For extensions we currently give the position of the new translation initiation (start) codon as “-5”, or of the new termination (stop) codon as “*17”, but strictly speaking this is not necessary. By definition the extension goes upstream for an N-terminal change and downstream for a C-terminal change. Using p.Met1ext5 (now p.Met1ext-5) and p.*110Glnext17 (now p.*110Glnext*17) therefore seems sufficient.
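The simplification discussed above can be illustrated with a small sketch. The function name `simplify_extension` and the regular expression are hypothetical, not part of any HGVS tool; the sketch only assumes that the description follows the current format, in which “ext” is followed by “-” or “*” and a number.

```python
import re

# Matches "ext" followed by the redundant direction marker ("-" for an
# N-terminal extension, "*" for a C-terminal extension) and the length.
# This pattern is an assumption made for illustration only.
_EXT_MARKER = re.compile(r"ext[-*](\d+)")


def simplify_extension(description: str) -> str:
    """Drop the '-' or '*' that follows 'ext' in a protein extension description."""
    return _EXT_MARKER.sub(r"ext\1", description)


print(simplify_extension("p.Met1ext-5"))      # p.Met1ext5
print(simplify_extension("p.*110Glnext*17"))  # p.*110Glnext17
```

Only the marker directly after “ext” is removed; the “*” that denotes the original stop codon (as in p.*110) is left untouched, because the direction of the extension is already implied by the position it starts from.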

Circular molecules

July 19 (2016) - JT den Dunnen: HGVS nomenclature assumes reference sequences are linear. However, the mitochondrial genome and many other DNA molecules (plasmids, viral genomes) are circular. How should one describe a variant involving the “first” and “last” nucleotides of the circular molecule? For now the suggestion is to describe the variant as m.[1del;16569del]. The question is whether m.16569_1del should be allowed (NOTE: this would be an exception to the rule that, in a range X_Y, position X should be smaller than position Y). Do you have a suggestion?
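To make the two options concrete, here is a minimal sketch that produces either description for a deletion on a circular molecule. The function names, the `allow_wrap` switch and the way a longer origin-spanning deletion is split into two parts are assumptions made for illustration; only the two single-nucleotide descriptions come from the discussion above. The default length of 16569 is the mitochondrial reference length implied by the example.

```python
def _deletion(start: int, end: int) -> str:
    """Format a deletion of positions start..end (inclusive), without prefix."""
    return f"{start}del" if start == end else f"{start}_{end}del"


def describe_circular_deletion(start: int, end: int, length: int = 16569,
                               allow_wrap: bool = False) -> str:
    """Describe a deletion running from start to end on a circular molecule.

    start > end means the deleted stretch crosses the origin (last -> first).
    """
    if not (1 <= start <= length and 1 <= end <= length):
        raise ValueError("positions must lie between 1 and the sequence length")
    if start <= end:
        # Ordinary case: the deletion does not cross the origin.
        return "m." + _deletion(start, end)
    if allow_wrap:
        # Hypothetical wrapped form, breaking the usual ordering rule.
        return f"m.{start}_{end}del"
    # Currently suggested form: two changes combined in one allele.
    return f"m.[{_deletion(1, end)};{_deletion(start, length)}]"


print(describe_circular_deletion(16569, 1))                   # m.[1del;16569del]
print(describe_circular_deletion(16569, 1, allow_wrap=True))  # m.16569_1del
```

The wrapped form is shorter and keeps the deletion as a single event, but any software that validates ranges by checking that the first position is smaller than the second would need to know the molecule is circular.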


Aug.24 (2011) - JT den Dunnen: HGVS nomenclature does not currently have recommendations for describing modifications of DNA, RNA or protein molecules. The most pressing need is for recommendations on DNA methylation and RNA editing. Proposal SVD-WG005 (gom/lom) makes a start on recommendations for this topic. SVD-WG005 introduces the use of the “|” character (“pipe”) to indicate that what is described is not a direct change of the sequence but a modification (change of state).

RNA editing

Adding RNA editing data to a DNA variant database seems a sensible thing to do. An RNA-based sequencing study might reveal an interesting variant which, when checked in the database, is listed (…RNA editing data is not recorded). This will trigger a DNA sequencing experiment to confirm the variant, which will fail because the variant is not present at the DNA level, and valuable resources are wasted.

The suggestion is to describe RNA editing using “@”.

The use of the “@” character versus other characters (&, $, ~, #) is of course debatable. Another option is to use a three-letter abbreviation like “del” and “ins”, e.g. “edt” (g.1287C|edt, c.143C|edt), but this seems less attractive (longer and potentially confusing). The “@” should serve as a simple mark, indicating ‘note this site, something is happening at (“@”) this position’.

Using the description r.143c>u at the RNA level suggests a substitution. There are several types of RNA editing, and “r.143c” probably does not really change to a “u”. All we can say is that the polymerases used to make a copy inserted an “a”. At some point we probably need to suggest ways to describe exactly the chemical modification made by the RNA editing enzyme, but we can do that later. Such recommendations can then be combined with those for DNA modifications (like methylation with methyl or hydroxy-methyl groups), making sure they follow the same rules.

The question is whether we need a specific description at the DNA level indicating that the nucleotide is known to be modified at RNA level. The main purpose of this mark would be to facilitate easy database retrieval of such sites. Approval of proposal SVD-WG001 more or less opened the option for such marks.

Exon Numbering

HGVS nomenclature does not give specific recommendations for the numbering of exons. For variant descriptions exon numbers are not required; nucleotide positions are sufficient. For many genes there is no consensus on exon/intron numbering, and several old numbering schemes may exist that had to be revised to include newly discovered exons (internal as well as 5’ and/or 3’ of the gene). This led to all kinds of numbering schemes with no clear structure, making it very difficult for non-experts in the specific gene to keep track of all details (see also Dalgleish 2010 and NCBI RefSeqGene). To prevent confusion, and with the increasing use of genome browsers, numbering exons simply as 1, 2, 3, etc., from the start to the end is the only logical option. Although this is probably difficult for the experts to accept, we cannot keep confusing newcomers by using legacy numbering systems forever. We should realize that, at some point, wrong assumptions will be made, with the consequence that a patient receives an erroneous diagnosis. This is of course unacceptable.
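As an illustration of plain sequential numbering, the sketch below assigns 1, 2, 3, etc. to the exons of one transcript. The function name `number_exons`, the representation of exons as (start, end) genomic coordinates and the handling of the minus strand are assumptions made for this example, not an HGVS recommendation.

```python
def number_exons(exons, strand="+"):
    """Return {exon_number: (start, end)}, numbered 1, 2, 3, ... in transcript order.

    exons  -- iterable of (start, end) genomic coordinates for one transcript
    strand -- "+" or "-"; on the minus strand the transcript runs from high
              to low genomic coordinates, so the order is reversed
    """
    ordered = sorted(exons, reverse=(strand == "-"))
    return {number: exon for number, exon in enumerate(ordered, start=1)}


# Hypothetical coordinates, listed in arbitrary order.
exons = [(500, 650), (100, 200), (300, 380)]
print(number_exons(exons))
# {1: (100, 200), 2: (300, 380), 3: (500, 650)}
print(number_exons(exons, strand="-"))
# {1: (500, 650), 2: (300, 380), 3: (100, 200)}
```

Because the numbers are derived purely from the order of the exons in the chosen reference transcript, a newly discovered exon changes the numbers of all exons downstream of it, which is one more reason to report nucleotide positions rather than exon numbers in variant descriptions.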

Closed topics

Imperfect copies

The proposal has been REJECTED

Accepting the proposal, without a whole range of specifications, would add too many options to describe specific variants.

HGVS nomenclature has excellent possibilities to describe large duplications, inversions, conversions and insertions. However, no clear recommendations are available on what to do when the nucleotides involved are not a perfect copy of the original sequence. The suggestion has been made (Taschner PEM, Den Dunnen JT (2011). Hum.Mutat. 32:507-511) to use “{ }” (curly braces) as a kind of “sub-alleles” to describe the variants in the altered region.

Numbering gene flanking nucleotides

The proposal has been REJECTED

The current recommendation to describe variants based on a coding DNA reference sequence is to use “c.-” numbers for nucleotides 5’ of the ATG translation initiation codon and “c.*” numbers for nucleotides 3’ of the translation termination codon (see Numbering). However, such descriptions do not show whether the nucleotides are inside or outside the transcribed region. The request has been filed (PEM Taschner, Leiden, Nederland) to make a discrimination between transcribed and un-transcribed nucleotides using the format;

This proposal has been rejected because: (i) genes often have several transcription initiation sites as well as polyA-addition sites; (ii) the transcription initiation site (cap site) is often ill-defined; (iii) variants that lie outside a transcript cannot be described based on a coding DNA reference sequence (c.), since it does not contain the reference nucleotide, and should be described based on a gene or chromosome reference sequence. Use NC_000023.10:g.33229820A>G or LRG_199t1:c.-391T>C and not NC_000023.10(NM_004006.2):c.-244-u147T>C, LRG_199t1:c.-244-147T>C or similar descriptions.