DNA Recommendations

Alleles Variant


Definitions

Allele
a series of variants on one chromosome.

Description

Format (one allele): “prefix”[“change1”;”change2”], e.g. g.[123G>A;345del]

  • “prefix” = reference sequence used = g.
  • [ = opening symbol for allele = [
  • “change1” = description first variant = 123G>A
  • ; = separator symbol two changes = ;
  • “change2” = description second variant = 345del
  • ] = closing symbol for allele = ]

Format (two alleles): “prefix”[“change”];[“change”], e.g. g.[123G>A];[345del]

  • “prefix” = reference sequence used = g.
  • [ = opening symbol for allele-1 = [
  • “change” = description variant = 123G>A
  • ];[ = closing symbol for allele-1, separator symbol two alleles, opening symbol for allele-2 = ];[
  • “change” = description variant = 345del
  • ] = closing symbol for allele-2 = ]

Note

  • humans are diploid organisms and have two alleles at each genetic locus, with one allele inherited from each parent
  • when two variants are identified in a gene that are on one chromosome (in cis) this should be described as “g.[variant1;variant2]”.
  • when two variants are identified in a gene that are on different chromosomes (in trans) this should be described as “g.[variant1];[variant2]”.
  • when two variants are identified in a gene, but when it is not known whether these are on one chromosome (in cis) or on different chromosomes (in trans), this should be described as “variant1(;)variant”, i.e. without using “[ ]”. NOTE: it is recommended to determine whether the changes are on the same chromosome or not.
  • descriptions combining variants based on different reference sequence types (e.g. c.[76A>C];g.[10091C>G]) should not be used.

Examples

  • LRG_199t1:c.[2376G>C;3103del]
    one allele (chromosome) of a gene contains two different changes, g.2376G>C and c.3103del. The variants are found in cis.
  • LRG_199t1:c.[2376G>C];[3103del]
    the two alleles (chromosomes) of a gene each contain a different change, c.2376G>C and c.3103del. A heterozygous case (compound heterozygote, e.g. in a recessive disease). The variants are found in trans.
  • LRG_199t1:c.2376[G>C];[G>C]
    both alleles (chromosomes) of a gene contain the same variant, c.2376G>C. A homozygous case (e.g. in a recessive disease).
  • LRG_199t1:c.2376G>C(;)3103del
    two variants in a gene, c.2376G>C and c.3103del, but it is not known whether they are on the same or on different alleles (chromosomes).
    NOTE: when it is not known on which allele a variant is, allele brackets should not be used
  • LRG_199t1:c.2376[G>C];[(G>C)]
    analysis detects one variant (c.2376G>C), suggesting both alleles (chromosomes) contain this variants but it can not be excluded the other allele is deleted.
  • LRG_199t1:c.2376[G>C];[=]
    one allele (chromosome) of a gene contains a variant, c.2376G>C, the other allele (chromosome) contains the reference sequence, c.2376= (is wild-type).
    NOTE: the description c.[2376G>C];[=], containing c.2376G>C and c.=, is different since it indicates the entire coding DNA reference sequence was analysed and the only variant identified was c.2376G>C (on one allele).
    NOTE: for other variant types the format is c.2376[del];[=], c.2376_2399[dup];[=], c.2376_2377[insGT];[=], etc.
  • LRG_199t1:c.[2376G>C];[?]
    one allele (chromosomes) of a gene contains a variant, c.2376G>C, while a variant for the other allele is expected but not yet identified (g.?) (e.g. in individuals affected by a recessive disease).
  • LRG_199t1:c.[296T>G;476C>T;1083A>C];[296T>G;1083A>C]
    a sample contains variants c.296T>G and c.1083A>C on both alleles (chromosomes) and variant c.476C>T on only one allele.
  • NM_004006.2:c.[296T>G;476C>T];[476C>T](;)1083A>C
    a sample contains a homozygous variant (c.476C>T) and two heterozygous variants (c.296T>G and c.1083G>C) for which it is not known on which allele (chromosome) they are.
  • LRG_199t1:c.[296T>G];[476C>T](;)1083G>C(;)1406del
    a sample contains heterozygous variants on different alleles (c.296T>G and c.476C>T) and two additional heterozygous variants (c.1083G>C and c.1406del) for which it is not known on which allele (chromosome) they are.
  • NC_000023.10:g.[30683643A>G;33038273T>G]
    one allele (X-chromosome) contains two different changes in two different genes, g.30683643A>G in the GK gene and g.33038273T>G in the DMD gene.
  • c.[NM_000167.5:94A>G;NM_004006.2:76A>C]
    one allele contains two different changes in two different genes, NM_000167.5:c.94A>G in the GK gene and NM_004006.2:c.76A>C in the DMD gene.

Q&A

Was originally the recommendation to use the format [c.76A>C+c.83G>C]?

Indeed, originally den Dunnen and Antonarakis, 2000 the suggestion was to describe two changes in a gene on one chromosome as [c.76>C+c.83G>C], i.e. using a "+"-character to separate the two changes, while an earlier publication suggested to use a ";" ([c.76A>C;c.83G>C] (Antonarakis and the Nomenclature Working Group, 1998). To prevent confusion with older publications, to improve overall consistency and to keep descriptions as short as possible, the 2000 proposal was retracted. The recommended format is c.[76A>C;83G>C].

In recessive diseases, is it important I show which variants were found in which combination?

When in one individual you find more then one variant it is essential that you clearly indicate which variant(s) were found and on which allele(s);
  • disease severity will depend on the combination of variants found,
  • in recessive disease, when two variants are on one allele an individual is a carrier or you might not have found the variant on the 2nd allele.

I find the notation c.[76A>C] without describing the second allele misleading; not enough researchers know this refers to only one of the two alleles present. Would using c.[76A>C];[] be OK?

No, the recommended description is c.[76A>C];[76A=], i.e. c.76A= for "no change" on the second allele.

How should I describe the variants detected in males and females for a gene on the X-chromosome?

In females the description is straightforward, like c.[76A>C];[76A=]. In males there is no second allele (X-chromosome) which can be described as c.[76A>C];[0], i.e. using "c.0" to indicate the absence of a second X-chromosome.

I have a patient with hearing loss and variants in the GJB2 (c.35delG) and GJB6 (c.689_690insT) genes, how should I describe this? (Nancy Carson, Ottawa, Canada)

The recommendation is to use the format GJB2:c.[35delG] GJB6:c.[689_690insT]. This uses standard HGVS descriptions and prevents confusion regarding which variant was found in which gene. Note it is essential that you also define the coding DNA reference sequence used. Another format, coping with this directly, is to describe the variants as NM_004004.2:c.[35delG] NM_006783.1:c.[689_690insT], i.e. using the Genbank reference sequences in stead of the HGNC approved Gene Symbol.

When for a haplotype I have a range of changes to report, is there a suggested short format to use?

When it is once clearly specified (e.g. in the Materials & Methods) what variants are listed and based on which reference sequence(s), alleles may be reported in a simplified format like below. Variants should be listed in genomic order and using "[ ]" for variants on the same chromosome.
  • haplotype with all variants in relation to several different reference sequences, both genomic and coding DNA
    • description of the reference haplotype; [M59228.1:g.250G>C; AF209160.1:g.572CA[11_21]; Z11861.1:g.61T>C;Z16803.1:g.114A[18_22]
    • short haplotype description; [C;13;T;21]
  • haplotype with all variants in relation to one coding DNA reference sequence
    • description of the reference haplotype; NM_004006.1:c.[837G>A; 1704+51T>C; 3734C>T;6438+2669T(16_23); 6571C>T; 7098+13212GT(15_19)]
    • short haplotype description; [G;C;C;18;T;17]

Suggestion (Peter Taschner, Leiden)

Nomenclature recommendations mainly apply to genotype descriptions in tables. Unfortunately, these are not very useful in the general text of a paper. For instance, OPRM1:c.118A>G or rs1799971:A>G can be used to describe the variant, but in a paper you might like to discuss the phenotypic consequences of different genotypes. In fact the current recommendation is to use OPRM1:c.[118A>G];[118A=] to describe a heterozygote and OPRM1:c.[118A=];[118A=] and OPRM1:c.[118>G];[118>G] for the homozygotes. I would like to suggest to describe the genotypes in the text like;
  • OPRM1:c.118AA homozygotes
  • OPRM1:c.118GA heterozygotes
  • OPRM1:c.118GG homozygotes
The different alleles could then be designated as the OPRM1:c.118A allele and the OPRM1:c.118G allele. In combination with variants of other genes, the genotype descriptions could be OPRM1:c.118AA, GJB2:c.76AC double heterozygotes, etc.