a sequence change where, compared to a reference sequence, a copy of one or more nucleotides is inserted directly 3' of the original copy of that sequence.
Format: “prefix”“position(s)_duplicated”“dup”, e.g. g.123_345dup
“prefix” = reference sequence used = g.
“position(s)_duplicated” = position of the nucleotide, or range of nucleotides, duplicated = 123_345
“dup” = type of change is a duplication = dup
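A minimal Python sketch of how these parts combine into a description string, also enforcing the prefix and 5'-to-3' ordering rules given in this section. The function name `format_dup` is hypothetical, and the sketch assumes plain integer positions only (intronic offsets such as c.123+48 and UTR positions are not handled).

```python
from typing import Optional

# Accepted reference sequence prefixes: genomic, mitochondrial, coding DNA, non-coding DNA
VALID_PREFIXES = {"g", "m", "c", "n"}


def format_dup(prefix: str, start: int, end: Optional[int] = None) -> str:
    """Build a duplication description such as 'g.123_345dup'.

    A single duplicated nucleotide is written with one position ('g.123dup');
    a range must use two different positions listed from 5' to 3'.
    """
    if prefix not in VALID_PREFIXES:
        raise ValueError(f"unsupported reference sequence prefix: {prefix!r}")
    if end is None or end == start:
        # 123_123 is not allowed; a one-nucleotide duplication uses one position
        return f"{prefix}.{start}dup"
    if end < start:
        raise ValueError("positions must be listed 5' to 3' (123_126, not 126_123)")
    return f"{prefix}.{start}_{end}dup"


print(format_dup("g", 123, 345))  # g.123_345dup
print(format_dup("g", 123, 123))  # g.123dup
```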
accepted reference sequence prefixes are g., m., c. and n. (genomic, mitochondrial, coding DNA and non-coding DNA).
“positions_duplicated” should contain two different positions, e.g. 123_126 not 123_123.
the “positions_duplicated” should be listed from 5’ to 3’, e.g. 123_126 not 126_123.
by definition, duplication may only be used when the additional copy is directly 3’-flanking of the original copy (a “tandem duplication”).
when there is no evidence that the extra copy of a sequence detected is in tandem (directly 3’-flanking the original copy), the change cannot be described as a duplication; it should be described as an insertion (see Insertion and proposal SVD-WG003).
inverted duplications are described as an insertion (g.234_235ins123_234inv), not as a duplication (see Inversion).
when more than one additional copy is inserted directly 3’ of the original copy, the change is indicated using the format for Repeated sequences, e.g. for a triplication, a quadruplication, etc.
for all descriptions the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed (3’rule)
the 3’rule also applies for changes in single residue stretches and tandem repeats (nucleotide or amino acid)
the 3’rule applies to ALL descriptions (genome, gene, transcript and protein) of a given variant
exception to the 3’rule: deletions/duplications around exon/intron and intron/exon borders when identical nucleotides flank these borders (see Question below)
c.546+1dup describes a duplication of a “G” at the exon/intron border ..CAGgtg.. (positions c.546/c.546+1). When RNA analysis shows a G duplication (r.546dup), i.e. no effect on splicing, the change is described as c.546dup.
NOTE: when in the above example the next exon starts with GGT.. the duplication is still described as c.546dup (not c.548dup) but based on a coding RNA sequence as r.548dup.
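The 3’rule can be sketched as a loop that slides the duplicated range one position 3' for as long as the resulting sequence stays the same, i.e. while the first duplicated nucleotide equals the nucleotide directly 3' of the range. This is a minimal sketch assuming a plain sequence string with 1-based positions; the name `shift_dup_3prime` is hypothetical, and it does not implement the exon/intron exception (it shifts across whatever sequence it is given). On the genomic window ..CAGgtg.. it therefore yields c.546+1, the case the exception above overrides, while on the coding RNA window CAG|GGT.. it yields c.548, matching r.548dup.

```python
def shift_dup_3prime(ref: str, start: int, end: int) -> tuple[int, int]:
    """Shift a duplication of ref[start..end] (1-based, inclusive) as far 3'
    as possible without changing the resulting sequence (3'rule)."""
    seq = ref.upper()  # intronic lower case and exonic upper case compare equal
    # Sliding by one gives the same sequence when the first duplicated
    # nucleotide equals the nucleotide directly 3' of the duplicated range.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


# Genomic window ..CAGgtg..: local position 1 = c.544, so local 3 = c.546
print(shift_dup_3prime("CAGgtg", 3, 3))  # (4, 4), i.e. c.546+1
# Coding RNA window CAG|GGT (next exon starts GGT..): local 3 = c.546
print(shift_dup_3prime("CAGGGT", 3, 3))  # (5, 5), i.e. c.548 / r.548dup
```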
a duplication from position g.6 to g.8 in the sequence ACAATTGCC to ACAATTGCTGCC
NOTE: it is allowed to describe the variant as g.6_8dupTGC
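An example like this one can be checked by applying the duplication to the reference and comparing the result with the observed sequence. A minimal sketch with hypothetical helper names, assuming 1-based inclusive positions; the second helper reproduces the allowed g.6_8dupTGC form.

```python
def apply_dup(ref: str, start: int, end: int) -> str:
    """Insert a copy of ref[start..end] (1-based, inclusive) directly 3' of the original."""
    copy = ref[start - 1:end]
    return ref[:end] + copy + ref[end:]


def dup_with_sequence(ref: str, start: int, end: int) -> str:
    """Describe the duplication with the duplicated nucleotides appended (e.g. g.6_8dupTGC)."""
    return f"g.{start}_{end}dup{ref[start - 1:end]}"


ref = "ACAATTGCC"
print(apply_dup(ref, 6, 8))          # ACAATTGCTGCC
print(dup_with_sequence(ref, 6, 8))  # g.6_8dupTGC
```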
a duplication of nucleotides c.120 to c.123+48 (coding DNA reference sequence), crossing an exon/intron border
based on the sequence of a genomic DNA sample, a duplication of the A nucleotide c.123 in the sequence CAAgt…/..agAAG to CAAAgt…/..agAAG, i.e. the duplication of the last nucleotide of an exon (see Question below)
NOTE: when RNA is sequenced and the variant does not alter splicing the description at the RNA level based on a coding RNA reference sequence is r.125dup (the 3’rule needs to be applied)
a duplication of nucleotides c.4072-1234 to c.5146-246, duplicating exons 30 (starting at position c.4072) to 36 (ending at position c.5145) of the DMD gene.
NOTE: not c.4072-1234_5146-246dupXXXXX; the size of the duplication (XXXXX) should not be included in the description.
a duplication of exons 30 (starting at position c.4072) to 36 (ending at position c.5145) of the DMD gene. The duplication break points have not been sequenced. Exons 29 (ending at c.4071) and 37 (starting at nucleotide c.5146) have been tested and shown not to be duplicated. The duplication therefore starts in intron 29 (position c.4071+1 to c.4072-1) and ends in intron 36 (position c.5145+1 to c.5146-1).
NOTE: previously, the suggestion was made to describe such duplications using the format c.4072-?_5145+?dup. However, since c.4072-? indicates “to an unknown position 5’ of c.4072” and c.5145+? “to an unknown position 3’ of c.5145”, this description is not correct when exons 29 and 37 have been tested and shown not to be duplicated.
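When the break points have not been sequenced, each end of the duplication is only known to lie within an interval. A minimal sketch, assuming the parenthesised uncertain-range notation that appears elsewhere in this section, i.e. (4071+1_4072-1)_(5145+1_5146-1) and c.(?_-189)_(*884_?)dup; the function names are hypothetical, and positions are passed as strings so that offsets like 4071+1 and "?" can be used.

```python
def uncertain_pos(lower: str, upper: str) -> str:
    """An uncertain position lying somewhere between lower and upper."""
    return f"({lower}_{upper})"


def dup_between_exons(last_normal: int, first_dup: int,
                      last_dup: int, first_normal_after: int) -> str:
    """Duplication whose ends lie in the introns flanking the duplicated exons.

    last_normal: last nucleotide of the tested, non-duplicated exon 5' of the duplication
    first_dup: first nucleotide of the first duplicated exon
    last_dup: last nucleotide of the last duplicated exon
    first_normal_after: first nucleotide of the tested, non-duplicated exon 3' of it
    """
    start = uncertain_pos(f"{last_normal}+1", f"{first_dup}-1")
    end = uncertain_pos(f"{last_dup}+1", f"{first_normal_after}-1")
    return f"c.{start}_{end}dup"


# DMD example: exon 29 ends at c.4071, exon 30 starts at c.4072,
# exon 36 ends at c.5145, exon 37 starts at c.5146
print(dup_between_exons(4071, 4072, 5145, 5146))  # c.(4071+1_4072-1)_(5145+1_5146-1)dup
# Ends known only from probes, as in the c.(?_-189)_(*884_?)dup example
print(f"c.{uncertain_pos('?', '-189')}_{uncertain_pos('*884', '?')}dup")
```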
a triplication of exons 30 (starting at position c.4072) to 36 (ending at position c.5145) of the DMD gene (break points not sequenced).
NOTE: this description should only be used when the two additional copies are in tandem with the original copy. When they are not, there is no specific recommendation yet on how to describe the change, but following current recommendations the format would be something like c.?ins(4071+1_4072-1)_(5145+1_5146-1) (since two additional copies have been inserted somewhere in the genome).
a duplication starting somewhere upstream of a gene (the most upstream position tested and found duplicated is c.-29) and ending in intron 1, between nucleotides c.12+1 and c.13-1.
a duplication of the entire protein coding region of a gene (based on a coding DNA reference sequence).
NOTE: when more details are available regarding the duplication, based on the probes tested to determine its location, the description can be specified like c.(?_-189)_(*884_?)dup, meaning the duplication starts 5’ of c.-189 and extends 3’ of c.*884.
Why do we not describe a duplication as an insertion?
Although duplications are basically a special type of insertion, there are several reasons why the recommendation is to describe duplications separately:
the description is simpler and shorter,
it is clear and prevents confusion regarding the position when an insertion is incorrectly reported as "22insG",
it prevents hypothetical discussions regarding the site of the insertion; in the case of a duplication including an intron/exon border (e.g. c.123-8_137dup) is the "insertion" in the intron or in the exon?
insertion more or less means "coming from elsewhere". Mechanistically, a duplication is most likely caused by a local event (e.g. DNA polymerase slippage) that duplicates a local sequence.
Can I use g.123dup6 to describe a 6 nucleotide duplication?
No, a duplication of more than one nucleotide should give the positions of the first and last nucleotides duplicated, separated by the range symbol ("_", underscore), e.g. g.123_128dup. Note also that from the description "g.123dup6" it is not clear whether the duplication starts at position g.123 (so g.123_128dup) or after position g.123 (so g.124_129dup).
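The ambiguity of a length-only description becomes obvious when both readings are computed. A small sketch; `dup_from_length` and its `starts_after` flag are hypothetical names used only for illustration.

```python
def dup_from_length(position: int, length: int, starts_after: bool = False) -> str:
    """Convert a position plus a length (as in 'g.123dup6') into an explicit range."""
    first = position + 1 if starts_after else position
    last = first + length - 1
    return f"g.{first}_{last}dup"


print(dup_from_length(123, 6))                     # g.123_128dup
print(dup_from_length(123, 6, starts_after=True))  # g.124_129dup
```

Both calls receive the same input (123, 6), which is exactly why the explicit first_last range is required.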
In the example above, c.123dup, should the description based on a coding DNA reference sequence not be c.125dup?
Strictly speaking you are right. However, an exception was made for cases like this because, when c.125dup is translated back to a genomic position, one would end up at the wrong nucleotide, even in the wrong exon.
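The problem can be illustrated by mapping coding DNA positions back to the genome. The exon coordinates below are invented for illustration only (a hypothetical forward-strand gene whose exon 1 ends at c.123 and whose exon 2 starts at c.124), and `c_to_g` is a hypothetical helper that handles exonic positions only.

```python
# Hypothetical exon layout: (exon number, first c. position, last c. position, genomic start)
EXONS = [
    (1, 1, 123, 1001),    # c.1..c.123   -> g.1001..g.1123
    (2, 124, 300, 2001),  # c.124..c.300 -> g.2001..g.2177
]


def c_to_g(c_pos: int) -> tuple[int, int]:
    """Return (exon number, genomic position) for an exonic coding DNA position."""
    for exon, c_first, c_last, g_start in EXONS:
        if c_first <= c_pos <= c_last:
            return exon, g_start + (c_pos - c_first)
    raise ValueError(f"c.{c_pos} is not in a listed exon")


print(c_to_g(123))  # (1, 1123): the A that is duplicated in the genomic DNA
print(c_to_g(125))  # (2, 2002): a different nucleotide, in the next exon
```

In genomic DNA the intron (gt…ag) separates c.123 from c.124, so the duplicated A cannot be shifted into exon 2; only in the spliced RNA do the As form one stretch, which is why r.125dup is used at the RNA level.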
How should I describe the change ATCGATCGATCGATCGAGGGTCCC to ATCGATCGATCGATCGAATCGATCGATCGGGGTCCC? The fact that the inserted sequence (ATCGATCGATCG) is present in the original sequence suggests it derives from a duplicative event
The variant should be described as an insertion: g.17_18ins5_16. A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking of the original copy (in tandem); here the inserted copy is separated from the original (g.5 to g.16) by the A at g.17. Note that the description given still makes it clear that the sequence inserted between g.17 and g.18 probably derives from nearby sequence, i.e. positions g.5 to g.16, and thus likely from a duplicative event.
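The distinction can be checked mechanically: try every tandem duplication of the right length and see whether any reproduces the variant, then confirm that the insertion description does. A minimal sketch with hypothetical helper names, assuming 1-based inclusive positions and a variant that only gains sequence.

```python
def tandem_dups(ref: str, variant: str) -> list[tuple[int, int]]:
    """All ranges start..end (1-based) whose tandem duplication turns ref into variant."""
    size = len(variant) - len(ref)
    hits: list[tuple[int, int]] = []
    if size <= 0:
        return hits
    for end in range(size, len(ref) + 1):
        start = end - size + 1
        # copy of ref[start..end] placed directly 3' of the original
        if ref[:end] + ref[start - 1:end] + ref[end:] == variant:
            hits.append((start, end))
    return hits


def insert_copy(ref: str, after: int, src_start: int, src_end: int) -> str:
    """Insert a copy of ref[src_start..src_end] between positions after and after+1."""
    return ref[:after] + ref[src_start - 1:src_end] + ref[after:]


ref = "ATCGATCGATCGATCGAGGGTCCC"
variant = "ATCGATCGATCGATCGAATCGATCGATCGGGGTCCC"
print(tandem_dups(ref, variant))               # []: no tandem duplication fits
print(insert_copy(ref, 17, 5, 16) == variant)  # True: g.17_18ins5_16
```

For comparison, tandem_dups("ACAATTGCC", "ACAATTGCTGCC") finds [(6, 8)], the g.6_8dup example given earlier.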