DNA Recommendations

Deletion Variant


Definitions

Deletion
a sequence change where, compared to a reference sequence, one or more nucleotides are not present (deleted).

Description

Format: “prefix”“position(s)_deleted”“del”, e.g. g.123_127del

“prefix” = reference sequence used = g.
“position(s)_deleted” = position of the nucleotide, or range of nucleotides, deleted = 123_127
“del” = type of change is a deletion = del
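
As an illustration of the format, the sketch below splits a simple description into its three parts. It is a minimal example, not an official parser: the names `SimpleDeletion` and `parse_simple_deletion` are hypothetical, and it only accepts plain integer positions. Intronic offsets (c.123+48), positions such as c.-30 or c.*1, uncertain ranges and the optional trailing bases (g.7delT) are deliberately out of scope. The prefix and ordering checks follow the notes listed under Note.

```python
import re
from typing import NamedTuple


class SimpleDeletion(NamedTuple):
    prefix: str  # reference sequence type: g, m, c or n
    start: int   # first deleted position
    end: int     # last deleted position (equals start for one nucleotide)


# Assumption: only plain positive integer positions are supported.
_PATTERN = re.compile(r"([gmcn])\.([1-9]\d*)(?:_([1-9]\d*))?del")


def parse_simple_deletion(description: str) -> SimpleDeletion:
    """Parse descriptions such as g.7del or g.123_127del."""
    match = _PATTERN.fullmatch(description)
    if match is None:
        raise ValueError(f"not a simple deletion description: {description!r}")
    prefix, start_text, end_text = match.groups()
    start = int(start_text)
    if end_text is None:
        return SimpleDeletion(prefix, start, start)
    end = int(end_text)
    # A range needs two different positions, listed from 5' to 3'.
    if end <= start:
        raise ValueError(f"range must run 5' to 3' with two positions: {description!r}")
    return SimpleDeletion(prefix, start, end)


print(parse_simple_deletion("g.123_127del"))
```

Descriptions such as g.123_123del, g.126_123del or g.123del6 raise a `ValueError` in this sketch.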


Note

  • prefix reference sequences accepted are g., m., c. and n. (genomic, mitochondrial, coding DNA and non-coding DNA).
  • “positions_deleted” should contain two different positions, e.g. 123_126 not 123_123.
  • the “position(s)_deleted” should be listed from 5’ to 3’, e.g. 123_126 not 126_123.
  • for all descriptions the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed (3’rule)
    • the 3’rule also applies for changes in single residue stretches and tandem repeats (nucleotide or amino acid)
    • the 3’rule applies to ALL descriptions (genome, gene, transcript and protein) of a given variant
    • exception
      deletions around exon/intron and intron/exon borders when identical nucleotides flank these borders (see Numbering)
      c.546+1del describes a deletion of a “G” at the exon/intron border ..CAGgtg.. (positions c.546/c.546+1). When RNA analysis shows a G deletion (r.546del), i.e. no effect on splicing, the change is described as c.546del.
      NOTE: when in the above example the next exon starts with GGT.. the deletion is still described as c.546del (not c.548del).
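
The 3’rule can be sketched as a small shifting loop. This is an illustrative example under stated assumptions, not a reference implementation: the function name `shift_deletion_3prime` is hypothetical, positions are 1-based and inclusive on a plain sequence string, and the exon/intron border exception described above is not handled.

```python
from typing import Tuple


def shift_deletion_3prime(sequence: str, start: int, end: int) -> Tuple[int, int]:
    """Shift a deletion (1-based, inclusive) as far 3' as possible."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("deletion lies outside the sequence")
    # Moving the deleted window one step 3' leaves the same result exactly
    # when the nucleotide just after the window equals the first one in it.
    while end < len(sequence) and sequence[start - 1] == sequence[end]:
        start += 1
        end += 1
    return start, end


# Deleting the first T of the TT stretch in ACTTACTGCC is reported at position 4.
print(shift_deletion_3prime("ACTTACTGCC", 3, 3))  # (4, 4)
# Without the border exception, deleting an A from ..CAAACG.. ends up at position 4.
print(shift_deletion_3prime("CAAACG", 2, 2))      # (4, 4)
```

The second call shows why the exception exists: applied blindly, the shift moves the deleted A across the exon 3/exon 4 junction used in the c.3del example.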

Examples

  • g.7del (one nucleotide)
    a deletion of the T at position g.7 in the sequence ACTTACTGCC to ACTTAC_GCC
    NOTE: it is allowed to describe the variant as g.7delT
  • g.6_8del (several nucleotides)
    a deletion of nucleotides g.6 to g.8 in the sequence ACAATTGCC to ACAAT___C
    NOTE: it is allowed to describe the variant as g.6_8delTGC
  • c.120_123+48del
    a deletion of nucleotides c.120 to c.123+48 (coding DNA reference sequence), crossing an exon/intron border
  • exon/intron border
    • c.3del
      when exon 3 ends with ..CAA and exon 4 starts with ACG.. and the genomic DNA sequence shows that the last A-nucleotide of exon 3 is deleted (and not the first A in exon 4), the deletion changing ..CAAACG.. to ..CAACG.. is described as c.3delA (not c.4delA, see exception in Numbering)
      NOTE: it is allowed to describe the variant as c.3delA
    • c.8del
      the deletion of the G nucleotide at the intron/exon border in the sequence ATGCTGgt…/..agGGA to ATGCTGgt…/..agG_A
      NOTE: it is allowed to describe the variant as c.8delG
    • c.6+1del
      the deletion of the G nucleotide at the exon/intron border in the sequence ATGCTGgt…/..agGGA to ATGCTG_t…/..agGGA (not c.8del, see Q&A)
      NOTE: it is allowed to describe the variant as c.6+1delG
  • c.4072-1234_5155-246del
    a deletion of nucleotides c.4072-1234 to c.5155-246 removing exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the DMD-gene.
    NOTE: the size of the deletion should not be added to the description (not c.4072-1234_5155-246delXXXXX, where XXXXX is the size)
  • c.(4071+1_4072-1)_(5154+1_5155-1)del
    a deletion of exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the DMD-gene. The deletion breakpoint has not been sequenced. Exons 29 (ending at c.4071) and 37 (starting at nucleotide c.5155) have been tested and shown to be not deleted. The deletion therefore starts in intron 29 (position c.4071+1 to c.4072-1) and ends in intron 36 (position c.5154+1 to c.5155-1).
    NOTE: as mentioned (see Uncertain), the description can also be probe-based. For a deletion of exons 30 to 36, detected using MLPA, the description would be c.(3996_4196)_(5090_5284)del, i.e. following the suggestion to use the central position (3’ nucleotide) of the probe ligation site. E.g. the MLPA exon 29 probes hybridize from position c.3963 to c.4030, giving c.3996 as the position to use in the description.
    NOTE: this description is part of proposal SVD-WG003 (undecided).
    NOTE: previously, the suggestion was made to describe such deletions using the format c.4072-?_5154+?del. However, since c.4072-? indicates “to an unknown position 5’ of c.4072” and c.5154+? “to an unknown position 3’ of c.5154”, this description is not correct when it is known that exons 29 and 37 are present.
  • c.(?_-30)_(12+1_13-1)del
    a deletion starting somewhere upstream of a gene (last position tested positive: c.-29) and ending in intron 1, between nucleotides c.12+1 and c.13-1.
  • c.(?_-1)_(*1_?)del
    a deletion of the entire protein coding region of a gene (based on a coding DNA reference sequence).
    NOTE: when more details are available regarding the deletion, based on the probes tested to determine its location, the description can be specified like c.(?_-189)_(*884_?)del, meaning the deletion starts 5’ of c.-189 and extends 3’ of c.*884.
  • g.186_188=/del
    a mosaic case where, from position 186 to 188, chromosomes containing a deletion of this sequence are found besides the normal sequence
  • g.186_188=//del
    a chimeric case, i.e. the sample is a mix of cells containing g.186_188= and g.186_188del
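
The two genomic examples at the top of this list (g.7del and g.6_8del) can be checked mechanically. The sketch below is a minimal illustration with hypothetical helper names (`show_deletion`, `deleted_bases`); it assumes 1-based, inclusive positions on a plain sequence string and marks each deleted nucleotide with an underscore, as the examples do.

```python
def show_deletion(sequence: str, start: int, end: int) -> str:
    """Replace the deleted positions (1-based, inclusive) with underscores."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("deletion lies outside the sequence")
    return sequence[:start - 1] + "_" * (end - start + 1) + sequence[end:]


def deleted_bases(sequence: str, start: int, end: int) -> str:
    """Return the nucleotides removed, e.g. for the optional g.7delT form."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("deletion lies outside the sequence")
    return sequence[start - 1:end]


print(show_deletion("ACTTACTGCC", 7, 7))  # ACTTAC_GCC
print(deleted_bases("ACTTACTGCC", 7, 7))  # T
print(show_deletion("ACAATTGCC", 6, 8))   # ACAAT___C
print(deleted_bases("ACAATTGCC", 6, 8))   # TGC
```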

Q&A

Can I use g.123del6 to describe a 6 nucleotide deletion?

No, a deletion of more than one residue should mention the first and last residue deleted, separated using the range symbol ("_", underscore), e.g. g.123_128del and not g.123del6.
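
The conversion from a start position plus a length to the recommended range form is simple arithmetic: the last deleted position is the first position plus the length, minus one. A minimal sketch, assuming plain integer positions (the function name `range_description` is hypothetical):

```python
def range_description(prefix: str, first: int, length: int) -> str:
    """Build a deletion description from a first position and a length."""
    if first < 1 or length < 1:
        raise ValueError("first position and length must be positive")
    if length == 1:
        # A single nucleotide is described with one position only.
        return f"{prefix}.{first}del"
    last = first + length - 1
    return f"{prefix}.{first}_{last}del"


print(range_description("g", 123, 6))  # g.123_128del
```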

In the example above, c.3delA, should the description based on a coding DNA reference sequence not be c.4delA?

Strictly speaking you are right. However, for cases like this an exception was made: translating c.4delA back to a genomic position would otherwise point to the wrong nucleotide, even in the wrong exon (see exception in Numbering).

Is the description of a deletion of exon 17 as c.EX17del still allowed?

A description like c.EX17del has never been allowed. Descriptions should be specific and indicate the nucleotides affected by the change.

Deletions in the BRCA1 gene are usually mediated by Alu sequences having a very high homology, reaching 100% in the breakpoint region. In such cases, what nucleotide should be used to describe the deletion breakpoint?

In cases like this the 3'rule applies (see Recommendations General), i.e. the deletion breakpoint is determined by the first nucleotide that differs after shifting the alignment as far 3' as possible. The first nucleotide differing is the first nucleotide deleted.

PCR analysis of a gene on the X-chromosome shows products for exons 1_3, no product is detected for exons 4_14 (exon 14 is the last exon of the gene). Since PCR already fails when a single primer does not hybridise, we are not sure whether exons 4 and 14 are completely absent, or only partially. To describe the deletion I would therefore like to use the last base of exon 3 with "+?" and the last base of exon 13 with a "+?". What are your recommendations? (Erik-Jan Kamsteeg, Nijmegen, Nederland)

Literally speaking you are right, and it is best to set the borders as precisely as possible. When exon 3 is present, the location of the reverse primer can be used to set the most 5' border (something like c.987+123). However, for the 3' end your reasoning does not make a difference. You do not know how far the deletion extends, because there is no positive PCR limiting the deletion at the 3' end. Using the location of exon 13, on the grounds that exon 14 might be present, would therefore give the wrong impression. Consequently, the precise description can only be something like c.(987+123_?)del. Is this really more informative than c.(987+1_?)del, which uses the exon 3 exon/intron border?

In literature I often see the description "deltaF508" for a variant in the CFTR gene in patients with Cystic Fibrosis. Is the variant detected in these patients c.1522_1524delTTT?

No. The sequence surrounding amino acid Phe508 in the CFTR gene is ..-ATC-TTT-GGT-.. (c.1519 to c.1527). Three different deletions (TC-T, C-TT and -TTT-) would give the reported protein variant "Phe508del". Applying the 3' rule (see Recommendations General) yields two different changes at DNA level, c.1521_1523del and c.1522_1524del. Assuming that the change at DNA level is c.1522_1524delTTT, a deletion of exactly the Phe508-encoding triplet, is therefore wrong. The change found in patients is mostly c.1521_1523delCTT. So, without a proper description in the manuscript one cannot be certain.
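
The reasoning above can be reproduced by brute force on the nine nucleotides quoted in the answer. The sketch below is illustrative only: the helper names are hypothetical, and it looks no further than the quoted region, so windows touching the region's 3' edge might shift further in the full reference sequence. It groups every three-nucleotide deletion by the sequence it leaves behind and reports the most 3' window of each group.

```python
from collections import defaultdict

REGION = "ATCTTTGGT"  # c.1519 to c.1527, as quoted in the answer above
REGION_START = 1519   # coding DNA position of the first nucleotide of REGION


def three_nt_deletions_by_result(region: str, offset: int) -> dict:
    """Group all 3-nucleotide deletions by the sequence they leave behind."""
    groups = defaultdict(list)
    for i in range(len(region) - 2):
        remaining = region[:i] + region[i + 3:]
        groups[remaining].append((offset + i, offset + i + 2))
    return dict(groups)


def describe_3prime(groups: dict) -> dict:
    """Describe each resulting sequence by its most 3' deletion window."""
    return {
        remaining: "c.{}_{}del".format(*max(windows))
        for remaining, windows in groups.items()
    }


descriptions = describe_3prime(three_nt_deletions_by_result(REGION, REGION_START))
print(descriptions["ATTGGT"])  # left by deleting TCT or CTT: c.1521_1523del
print(descriptions["ATCGGT"])  # left by deleting TTT: c.1522_1524del
```

Both remaining sequences still encode Ile-Gly around the lost Phe508, which is why the protein-level description alone cannot distinguish the two DNA changes.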

Suggest to use "los" for a loss from a mononucleotide stretch

Pat O'Neill (Burlington, USA) writes: I especially like the use of "dup" in place of "ins" when the insertion creates a run of two or more nucleotides. I feel that there should be a parallel term for the loss of a nucleotide from a run of two or more instead of just "del". This is because of the mechanistic implications of both an ins and a del of a nucleotide in a run. Has this been discussed? My thought for a term in place of "del" is "los" for loss.
Shuji Ogino (Boston, USA) agrees but suggests using "dec" for a decrease in length.
Reply (JdD): The "dup" nomenclature was introduced because it is simpler, shorter and less confusing (see above). The potential mechanistic relation is nice but was not decisive. Basically, a description should be clear and unequivocal; it is not intended to contain other information.