a sequence change where, compared to a reference sequence, a copy of one or more nucleotides is inserted directly 3' of the original copy of that sequence.
Format: “prefix”“position(s)_duplicated”“dup”, e.g. g.123_345dup
“prefix” = reference sequence used = g.
“position(s)_duplicated” = position of the nucleotide or range of nucleotides duplicated = 123_345
“dup” = type of change is a duplication = dup (1)
prefix reference sequences accepted are g., m., c. and n. (genomic, mitochondrial, coding DNA and non-coding DNA).
“positions_duplicated” should contain two different positions, e.g. 123_126 not 123_123.
the “positions_duplicated” should be listed from 5’ to 3’, e.g. 123_126 not 126_123.
by definition, duplication may only be used when the additional copy is directly 3’-flanking of the original copy (a “tandem duplication”).
when a variant can be described as a duplication it must be described as a duplication and not as e.g. an insertion (see [Prioritization](/recommendations/general/)).
when there is no evidence that the extra copy of a sequence detected is in tandem (directly 3’-flanking the original copy), the change cannot be described as a duplication; it should be described as an insertion (see Insertion and proposal SVD-WG003).
inverted duplications are described as an insertion (g.234_235ins123_234inv), not as a duplication (see Inversion).
when more than one additional copy is inserted directly 3’ of the original copy, the change is indicated using the format for Repeated sequences (triplication, quadruplication, etc.).
for all descriptions the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed (3’ rule)
exception: duplications around exon/exon junctions when identical nucleotides flank the junction (see Numbering);
when ..GAT gta..//..cag TCA.. changes to ..GATT gta..//..cag TCA.., based on a coding DNA reference sequence the variant is described as LRG_199t1:c.3921dup (NC_000023.10:g.32459297dup) and not as c.3922dup (which would translate to g.32456507dup)
1 = see Uncertain; when the position and/or the sequence of a duplication has not been defined
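The basic format rules above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `check_dup` and the pattern are illustrative assumptions, and it only handles the part of a description after the reference sequence accession, with plain integer positions (no intronic offsets such as c.264+48, no UTR positions, no uncertain ranges).

```python
import re

# Simplified pattern: prefix, one position or a range, then "dup".
# Assumption: positions are plain positive integers only.
DUP_PATTERN = re.compile(r"(?P<prefix>[gmcn])\.(?P<start>\d+)(?:_(?P<end>\d+))?dup")


def check_dup(description: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    match = DUP_PATTERN.fullmatch(description)
    if match is None:
        return ["not of the form prefix.position(s)dup"]
    problems = []
    start = int(match["start"])
    if match["end"] is not None:
        end = int(match["end"])
        if end == start:
            problems.append("range must contain two different positions")
        elif end < start:
            problems.append("positions must be listed from 5' to 3'")
    return problems


for example in ["g.123_345dup", "g.123_123dup", "g.126_123dup", "c.20dupT"]:
    print(example, check_dup(example))
```

Descriptions that append the duplicated sequence or a length (c.20dupT, g.123dup6) are rejected by the pattern itself, in line with the recommendations on this page.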
one nucleotide - NM_004006.2:c.20dup (NC_000023.10:g.33229410dup)
the duplication of a T at position c.20 in the sequence AGAAGTAGAGG to AGAAGTTAGAGG
NOTE: the recommendation is not to describe the variant as NM_004006.2:c.20dupT, i.e. not to include the duplicated nucleotide sequence. That description is longer, contains redundant information, and increases the chance of making an error (e.g. NM_004006.2:c.20dupG).
a duplication from position c.20 to c.23 in the sequence AGAAGTAGAGG to AGAAGTAGATAGAGG
NOTE: the recommendation is not to describe the variant as c.20_23dupTAGA, i.e. not to include the duplicated nucleotide sequence. That description is longer, contains redundant information, and increases the chance of making an error (e.g. c.20_23dupTGGA).
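The effect of the two descriptions above on the example sequence can be reproduced with a short sketch. The function `apply_dup` and the `OFFSET` constant are illustrative assumptions: the fragment AGAAGTAGAGG is taken to start at c.15, so that c.20 is local position 6.

```python
def apply_dup(sequence: str, start: int, end: int) -> str:
    """Insert a copy of sequence[start..end] (1-based, inclusive) directly 3' of it."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions must lie within the sequence, listed 5' to 3'")
    copy = sequence[start - 1:end]
    return sequence[:end] + copy + sequence[end:]


OFFSET = 14  # assumption: the fragment starts at c.15, so c.20 is local position 6
fragment = "AGAAGTAGAGG"

print(apply_dup(fragment, 20 - OFFSET, 20 - OFFSET))  # c.20dup
print(apply_dup(fragment, 20 - OFFSET, 23 - OFFSET))  # c.20_23dup
```

Because the positions alone determine the copied sequence, appending that sequence to the description (c.20dupT) adds nothing and only creates room for a mismatch.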
a duplication of nucleotides c.160 to c.264+48 (coding DNA reference sequence), crossing an exon/intron border
the duplication of the T nucleotide at the exon/exon border in the sequence ..GAT gta..//..cag TCA.. changing to ..GATT gta..//..cag TCA..
NOTE: according to an exception to the 3’ rule, the variant (NC_000023.10:g.32459297dup) is not described as c.3922dup since this would shift the position of the variant to the next exon (c.3922 linking to g.32456507) (see exception in Numbering and see Q&A)
the duplication of the G nucleotide at the exon/intron border in the sequence GAACAGgt…/..agTGCCTT changing to GAACAGggt…/..agTGCCTT (not c.1704dup)
NOTE: this description does not depend on the effect observed on RNA level, giving either altered splicing or r.1704dup
the duplication of the G nucleotide at the intron/exon border in the sequence CTGGCCgt…/..agGTTTTA changing to CTGGCCgt…/..agGGTTTTA (not c.1813-1dup)
NOTE: this description does not depend on the effect observed on RNA level, giving either altered splicing or r.1813dup
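The 3’ shifting applied in the two border examples above can be sketched on a contiguous genomic sequence. This is a simplified illustration: the function name `shift_dup_3prime` and the local coordinates are assumptions, the exon and intron letters are joined and upper-cased, and the exon/exon junction exception is deliberately not handled.

```python
def shift_dup_3prime(sequence: str, start: int, end: int) -> tuple:
    """Move a duplicated range (1-based, inclusive) to its most 3' equivalent position."""
    sequence = sequence.upper()
    # Shifting by one keeps the variant identical when the first duplicated
    # nucleotide equals the nucleotide directly 3' of the duplicated range.
    while end < len(sequence) and sequence[start - 1] == sequence[end]:
        start += 1
        end += 1
    return start, end


# Exon/intron border: GAACAGgt, duplicating the last exonic G (local position 6)
print(shift_dup_3prime("GAACAGgt", 6, 6))
# Intron/exon border: agGTTTTA, duplicating the last intronic g (local position 2)
print(shift_dup_3prime("agGTTTTA", 2, 2))
```

In the first case the duplication moves from the last exonic G into the intron, and in the second from the last intronic g into the exon, which matches the descriptions preferred over c.1704dup and c.1813-1dup.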
a duplication of nucleotides c.4072-1234 to c.5155-246 duplicating exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the DMD-gene.
NOTE : the format c.4072-1234_5155-246dupXXXXX, with XXXXX indicating the size of the duplication, should not be used
NOTE : the description NM_004006.2:c.4072-1234_5155-246dup is not correct, the reference sequence NM_004006.2 is a coding DNA reference sequence which does not include the intron sequences involved
a duplication of nucleotides c.720 to c.991 starting in exon 8 (position c.720) and ending in exon 10 (position c.991) of the DMD-gene.
a duplication of exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the human DMD-gene. The duplication break point has not been sequenced. Exons 29 (ending at c.4071) and 37 (starting at nucleotide c.5155) have been tested and shown not to be duplicated. The duplication therefore starts in intron 29 (position c.4071+1 to c.4072-1) and ends in intron 36 (position c.5154+1 to c.5155-1).
NOTE: previously, the suggestion was made to describe such duplications using the format c.4072-?_5154+?dup. However, since c.4072-? indicates “to an unknown position 5’ of c.4072” and c.5154+? “to an unknown position 3’ of c.5154”, this description is not correct when it is known that exons 29 and 37 are not involved.
a triplication of the sequence from exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the DMD-gene (break points not sequenced).
NOTE: this description should only be used when the two additional copies are in tandem with the original copy. When they are not, there is no specific recommendation yet on how to describe the change, but following current recommendations the format would be something like g.?_?ins(32381076_32382698)_(32430031_32456357) (since 2 additional copies have been inserted somewhere in the genome).
duplications extending beyond the transcribed region
following current recommendations (see Numbering) it is not allowed to describe variants in nucleotides beyond the boundaries of a reference sequence. Consequently, duplications extending 5’ of a transcript cannot be described like NC_000023.11(NM_004006.2):c.(?_-244)_(31+1_32-1)dup (c.-244 is the first nucleotide of NM_004006.2). Duplications extending 3’ of a transcript cannot be described like NC_000023.11(NM_004006.2):c.(10086+1_10087-1)_(*2691_?)dup (c.*2691 is the last nucleotide of NM_004006.2). Such duplications can only be described using genomic coordinates. The HGVS nomenclature committee (SVD-WG) is discussing whether a c. based format should be proposed.
a duplication of the entire DMD gene based on a SNP-array analysis where the maximum size of the duplication lies between SNPs rs396303 and rs7887548 (nucleotides 31060227 and 33417151) and the minimum size between SNPs rs808178 and rs7887103 (nucleotides 31100351 and 33274278). Describing the duplication based on a coding DNA reference sequence using NC_000023.11(NM_004006.2):c.(-205839_-62966)_(*21568_*61692)dup makes no sense.
NOTE: the array analysis detects an extra copy of the sequences and it has to be determined whether it is a duplication. When it is not certain that the variant is a duplication, it should be described as an insertion, g.?_?insNC_000023.11:(31060227_31100351)_(33274278_33417151)
a duplication of the entire DMD gene based on an MLPA assay where nucleotides g.31120496 and g.33339477 are the centers of the probes for the last and first (brain promoter) exons, respectively.
NOTE: the MLPA analysis detects an extra copy of the sequences and it has to be determined whether it is a duplication. When it is not certain that the variant is a duplication, it should be described as an insertion, g.?_?insNC_000023.11:(?_31120496)_(33339477_?)
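The range notation used for break points that have not been sequenced can be assembled from the four boundary positions. The sketch below is illustrative only: the function `uncertain_dup` and its parameter names are assumptions, `None` stands for an unknown boundary ("?"), and it presumes that the extra copy has already been shown to be in tandem.

```python
def _fmt(position) -> str:
    """Render a position, using '?' when it is unknown (None)."""
    return "?" if position is None else str(position)


def uncertain_dup(outer_start, inner_start, inner_end, outer_end, prefix="g") -> str:
    """Format prefix.(outer_start_inner_start)_(inner_end_outer_end)dup."""
    return (
        f"{prefix}.({_fmt(outer_start)}_{_fmt(inner_start)})"
        f"_({_fmt(inner_end)}_{_fmt(outer_end)})dup"
    )


# SNP-array case: maximum and minimum extent are both known
print(uncertain_dup(31060227, 31100351, 33274278, 33417151))
# MLPA case: only the minimum extent (probe centers) is known
print(uncertain_dup(None, 31120496, 33339477, None))
```

The two bracketed ranges mirror those in the insertion descriptions given in the notes above; only the variant type differs.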
a mosaic case where, besides the normal sequence from position g.19 to g.21, chromosomes are also found containing a duplication of this sequence
a chimeric case, i.e. the sample is a mix of cells containing g.19_21= and g.19_21dup
Why do we not describe a duplication as an insertion?
Although duplications are basically a special type of insertion, there are several reasons why the recommendation is to describe duplications separately:
the description is simpler and shorter,
it is clear and prevents confusion regarding the position when an insertion is incorrectly reported like "22insG",
it prevents hypothetical discussions regarding the site of the insertion; in the case of a duplication including an intron/exon border (e.g. c.123-8_137dup) is the "insertion" in the intron or in the exon?
insertion more or less means "coming from elsewhere". Mechanistically, a duplication is most likely caused by a local event, DNA polymerase slippage, duplicating a local sequence.
Can I use g.123dup6 to describe a 6 nucleotide duplication?
No, a duplication of more than one nucleotide should give the position of the first and last nucleotide duplicated, separated using the range symbol ("_", underscore), e.g. g.123_128dup. Note also that from the description "g.123dup6" it is not clear whether the duplication starts at position g.123 (so g.123_128dup) or after position 123 (so g.124_129dup).
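When a tool reports a start position and a length, the range can be derived before writing the description. This is a minimal sketch; the helper name `dup_from_length` is an assumption, and it only covers plain integer positions.

```python
def dup_from_length(prefix: str, first: int, length: int) -> str:
    """Build a dup description from the first duplicated position and the length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return f"{prefix}.{first}dup"
    last = first + length - 1  # the range is inclusive at both ends
    return f"{prefix}.{first}_{last}dup"


print(dup_from_length("g", 123, 6))  # duplication starting at position 123
print(dup_from_length("g", 124, 6))  # duplication starting after position 123
```

The two calls give different descriptions, which is exactly the ambiguity that "g.123dup6" leaves open.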
In the example above, c.3921dup, should the description based on a coding DNA reference sequence not be c.3922dup?
Strictly speaking you are right. However, for cases like this an exception was made: otherwise, translating c.3922dup back to a genomic position would end up at the wrong nucleotide, in the wrong exon (NC_000023.10:g.32456507dup instead of NC_000023.10:g.32459297dup).
How should I describe the change ATCGATCGATCGATCGAGGGTCCC to ATCGATCGATCGATCGAATCGATCGATCGGGGTCCC? The fact that the inserted sequence (ATCGATCGATCG) is present in the original sequence suggests it derives from a duplicative event.
The variant should be described as an insertion; g.17_18ins5_16. A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking of the original copy (in tandem). Note that the description given still makes it clear that the sequence inserted between g.17 and g.18 is probably derived from nearby, i.e. position g.5 to g.16, and thus likely derived from a duplicative event.
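The distinction between a tandem duplication and an insertion of a nearby sequence can be tested programmatically. The sketch below is an illustration under stated assumptions: the function `describe_insertion` is hypothetical, it works on a plain genomic sequence with 1-based positions, it writes the inserted sequence literally rather than as a position range such as ins5_16, and it does not handle edge cases at the ends of the sequence.

```python
def describe_insertion(ref: str, pos: int, inserted: str) -> str:
    """Describe `inserted` placed between 1-based positions pos and pos+1 of ref."""
    if not inserted or not 1 <= pos < len(ref):
        raise ValueError("need a non-empty insert placed inside the sequence")
    # 3' rule: roll the insertion 3' while the result stays identical.
    while pos < len(ref) and inserted[0] == ref[pos]:
        inserted = inserted[1:] + inserted[0]
        pos += 1
    n = len(inserted)
    # Tandem duplication: the inserted copy directly follows an identical original.
    if pos >= n and ref[pos - n:pos] == inserted:
        return f"g.{pos}dup" if n == 1 else f"g.{pos - n + 1}_{pos}dup"
    return f"g.{pos}_{pos + 1}ins{inserted}"


ref = "ATCGATCGATCGATCGAGGGTCCC"
print(describe_insertion(ref, 17, "ATCGATCGATCG"))   # copy of g.5_16, not in tandem
print(describe_insertion("AGAAGTAGAGG", 5, "T"))     # in tandem with the T at position 6
```

For the sequence in the question the check fails because the 12 nucleotides directly 5’ of the insertion site (positions 6 to 17) differ from the inserted sequence, so the variant remains an insertion.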