RNA Recommendations

Duplication Variant


Definitions

Duplication
a sequence change where, compared to a reference sequence, a copy of one or more nucleotides is inserted directly 3' of the original copy of that sequence.

Description

Format: “prefix”“position(s)_duplicated”“dup”, e.g. r.123_345dup

“prefix” = reference sequence used = r.
“position(s)_duplicated” = position of the nucleotide or range of nucleotides duplicated = 123_345
“dup” = type of change is a duplication = dup
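The format above can be illustrated with a small parser. This is only a sketch: the pattern `DUP_PATTERN` and the function `parse_rna_dup` are hypothetical names, and the pattern covers just the simple forms shown here (r.<position>dup and r.<start>_<end>dup). It ignores the optional trailing sequence, uncertain ranges and other position types.

```python
import re

# Hypothetical pattern for the simple forms only: r.<pos>dup or r.<start>_<end>dup.
DUP_PATTERN = re.compile(r"(?P<prefix>r)\.(?P<start>\d+)(?:_(?P<end>\d+))?dup")


def parse_rna_dup(description):
    """Split a simple RNA duplication description into its three parts."""
    match = DUP_PATTERN.fullmatch(description)
    if match is None:
        raise ValueError(f"not a simple RNA duplication: {description!r}")
    start = int(match.group("start"))
    # A single-position duplication has no explicit end; reuse the start.
    end = int(match.group("end") or start)
    if end < start:
        raise ValueError("end position must not precede start position")
    return {
        "prefix": match.group("prefix") + ".",
        "start": start,
        "end": end,
        "type": "dup",
    }


print(parse_rna_dup("r.123_345dup"))
```

For r.123_345dup the sketch returns the prefix "r.", the range 123 to 345, and the type "dup", mirroring the three components listed above.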


Note

  • the only reference sequence prefix accepted is r. (coding and non-coding RNA).
  • by definition, duplication may only be used when the additional copy is directly 3’-flanking the original copy (a “tandem duplication”).
  • when there is no evidence that the extra copy of a sequence detected is in tandem with (directly 3’-flanking) the original copy, the change cannot be described as a duplication; it should be described as an insertion (see Insertion).
  • for all descriptions the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed (3’rule)
    • the 3’rule also applies for changes in single residue stretches and tandem repeats (nucleotide or amino acid)
    • the 3’rule applies to ALL descriptions (genome, gene, transcript and protein) of a given variant
  • under discussion, see Proposal for complex variants
    { } (curly braces) can be used to list any change in the duplicated sequence (“positions_duplicated”) which is different when compared to the source, e.g. r.123_345dup{234a>g}
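
The 3’rule in the notes above can be sketched in code. The helper `shift_dup_3prime` is a hypothetical name, and the sketch assumes a plain sequence string whose first character is position 1. Moving a duplicated range one position 3' gives the same variant sequence exactly when the nucleotide following the range equals the first nucleotide of the range, so the helper keeps shifting while that holds.

```python
def shift_dup_3prime(reference, start, end):
    """Shift a duplicated range as far 3' as possible.

    Positions are 1-based and inclusive; `reference` is a plain sequence string
    (assumed here to start at position 1).
    """
    # reference[end] is the nucleotide at 1-based position end + 1;
    # reference[start - 1] is the first nucleotide of the duplicated range.
    while end < len(reference) and reference[end] == reference[start - 1]:
        start += 1
        end += 1
    return start, end


# In "acuuacugcc" the "u" at position 3 is followed by another "u",
# so a duplication reported at position 3 is moved to position 4.
print(shift_dup_3prime("acuuacugcc", 3, 3))
```

The same loop covers tandem repeats: in "gaugaugcc" a duplication of positions 1 to 3 is shifted to positions 5 to 7, which yields the identical variant sequence.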

Examples

  • r.7dup (one nucleotide)
    the duplication of a “u” at position r.7 in the sequence ..acuuacugcc.. to ..acuuacuugcc..
    NOTE: it is allowed to describe the variant as r.7dupu
    NOTE: it is not allowed to describe the variant as r.7_8insu (see prioritisation)
  • r.6_8dup (several nucleotides)
    a duplication from position r.6 to r.8 in the sequence ..acaauugcc.. to ..acaauugcugcc..
    NOTE: it is allowed to describe the variant as r.6_8dupugc
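
The two examples above can be checked mechanically. The following is a minimal sketch, assuming the sequence fragments shown are numbered from position 1; `apply_dup` is a hypothetical helper, not part of any existing tool.

```python
def apply_dup(reference, start, end=None):
    """Return the sequence produced by duplicating positions start..end.

    Positions are 1-based and inclusive; the copy is placed directly 3' of
    the original, as the definition of a duplication requires.
    """
    end = start if end is None else end
    if not 1 <= start <= end <= len(reference):
        raise ValueError("positions fall outside the reference sequence")
    duplicated = reference[start - 1:end]
    return reference[:end] + duplicated + reference[end:]


print(apply_dup("acuuacugcc", 7))    # r.7dup    -> acuuacuugcc
print(apply_dup("acaauugcc", 6, 8))  # r.6_8dup  -> acaauugcugcc
```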

Q&A

Why do we not describe a duplication as an insertion?

Although duplications are basically a special type of insertion, there are several reasons why the recommendation is to describe duplications separately:
  • the description is simpler and shorter,
  • it is unambiguous and prevents the confusion about the position that arises when an insertion is incorrectly reported as, for example, "22insg".

How should I describe the change "aucgaucgaucgaucaggguccc" to "aucgaucgaucgaucaaucgaucgaucggguccc"? The fact that the inserted sequence (aucgaucgauc) is present in the original sequence suggests it derives from a duplicative event.

The variant should be described as an insertion: r.16_17ins5_15. A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking the original copy (in tandem). Note that the description given still makes it clear that the sequence inserted between r.16 and r.17 is probably derived from nearby, i.e. position r.5 to r.15, and thus likely derived from a duplicative event.
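
The distinction drawn in this answer can be expressed as a simple check: an inserted sequence qualifies as "dup" only when it is identical to the reference sequence directly 5' of the insertion point. The sketch below uses a hypothetical helper, `describe_insertion`, and assumes the insertion point has already been placed according to the 3’rule and that the reference string starts at position 1. For non-tandem copies it writes out the inserted sequence rather than a position range.

```python
def describe_insertion(reference, after, inserted):
    """Describe `inserted` placed between positions `after` and `after + 1`.

    Positions are 1-based. Returns a "dup" description only for a tandem copy,
    otherwise an "ins" description with the inserted sequence written out.
    """
    length = len(inserted)
    start = after - length + 1
    # Tandem copy: the inserted sequence equals the stretch ending at `after`.
    if start >= 1 and reference[start - 1:after] == inserted:
        return f"r.{after}dup" if length == 1 else f"r.{start}_{after}dup"
    return f"r.{after}_{after + 1}ins{inserted}"


print(describe_insertion("acaauugcc", 8, "ugc"))  # tandem copy of positions 6-8
print(describe_insertion("acaauugcc", 8, "aca"))  # "aca" occurs at 1-3, but not in tandem
```

In the second call the inserted "aca" is present elsewhere in the reference, yet the sketch still reports an insertion because the copy does not directly follow the original.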