DNA Recommendations

Insertion Variant


a sequence change where, compared to the reference sequence, one or more nucleotides are inserted and where the insertion is not a copy of a sequence immediately 5'


Format: “prefix”“positions_flanking”“ins”“inserted_sequence”, e.g. g.123_124insAGC

“prefix” = reference sequence used = g.
“positions_flanking” = positions of the two nucleotides flanking the insertion site = 123_124
“ins” = type of change is an insertion = ins
“inserted_sequence” = inserted sequence = AGC


  • prefix reference sequences accepted are g., m., c. and n. (genomic, mitochondrial, coding DNA and non-coding DNA).
  • the “positions_flanking” description should contain two adjacent flanking nucleotides, e.g. 123 and 124, not 123 and 125.
  • an insertion cannot be described using a single nucleotide position, e.g. g.123insG
  • for all descriptions the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed (3’rule)
    • the 3’rule applies to ALL descriptions (genome, gene, transcript and protein) of a given variant
  • tandem duplications are described as a duplication (g.123_456dup), not an insertion (g.456_457ins123_456)
    • inverted duplications are described as an insertion (g.234_235ins123_234inv), not as a duplication (see Inversion)
  • when the inserted sequence is very long it can best be submitted to a database (e.g. GenBank); the accession.version number obtained can then be used to describe the variant like g.123_124insL37425.1:23_361
  • under discussion, see Proposal for complex variants
    { } (curly braces) can be used to list any change in the inserted sequence (“inserted_sequence”) which is different when compared to the source, e.g. g.123_124ins100_120{111A>G}
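
The format and the flanking-position rules above can be checked mechanically. The following is a minimal sketch, not part of the recommendations: the pattern `SIMPLE_INS` and the function `parse_simple_insertion` are hypothetical names, and the pattern covers only the simple form with plain numeric positions and a literal inserted sequence (no intronic or UTR positions, ranges, inversions or accession-based sequences).

```python
import re

# Hypothetical pattern: only the simple "prefix.N_MinsSEQ" form is covered.
SIMPLE_INS = re.compile(r"^([gmcn])\.(\d+)_(\d+)ins([ACGTN]+)$")


def parse_simple_insertion(description):
    """Split a simple insertion description into its four parts."""
    match = SIMPLE_INS.match(description)
    if match is None:
        # Also rejects single-position forms such as g.123insG.
        raise ValueError(f"not a simple insertion description: {description!r}")
    prefix, start, end, inserted = match.groups()
    start, end = int(start), int(end)
    if end != start + 1:
        # The two positions must be directly adjacent (123_124, not 123_125).
        raise ValueError(f"positions {start} and {end} do not flank one site")
    return {"prefix": prefix + ".", "start": start, "end": end, "inserted": inserted}


print(parse_simple_insertion("g.123_124insAGC"))
```

Under these assumptions, g.123insG and g.123_125insA both raise a `ValueError`, mirroring the two notes on flanking positions.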


  • g.4426_4427insA
    the insertion of an A nucleotide between nucleotides g.4426 and g.4427
  • g.5756_5757insAGG
    the insertion of nucleotides AGG between nucleotides g.5756 and g.5757
  • g.123_124insL37425.1:23_361
    the insertion of nucleotides 23 to 361 as described in GenBank file L37425.1 between nucleotides g.123 and g.124
  • insertion of inverted duplicated copies
    • g.122_123ins123_234inv
      a copy of nucleotides g.123 to g.234 is inserted, in inverted orientation, 5’ of the original sequence, between nucleotide g.122 and g.123
    • g.234_235ins123_234inv
      a copy of nucleotides g.123 to g.234 is inserted, in inverted orientation, 3’ of the original sequence, between nucleotide g.234 and g.235
    • g.122_123ins213_234invinsAins123_211inv
      an inverted copy of nucleotides g.123 to g.234, with a G>A substitution of nucleotide g.212, is inserted directly 5’ of the original sequence
    • g.122_123ins212_234inv123_199inv
      an inverted copy of nucleotides g.123 to g.234, with a deletion from nucleotides g.200 to g.211, is inserted directly 5’ of the original sequence
  • incomplete descriptions, preferably use exact descriptions only
    • c.(67_70)insG (p.Gly23fs)
      the insertion of a G at an unknown position in the sequence encoding amino acid 23
    • g.549_550insN
      the insertion of one unspecified nucleotide (N) between positions g.549 and g.550
    • g.15431_15432ins(5) (alternatively g.15431_15432insNNNNN)
      the insertion of 5 unspecified nucleotides (NNNNN) between positions g.15431 and g.15432
    • g.1134_1135ins(100)
      the insertion of 100 unspecified nucleotides between positions g.1134 and g.1135
  • g.?_?insNC_000023.10:(12345_23456)_(34567_45678)
    the insertion of a sequence from the X-chromosome (NC_000023.10), maximally involving nucleotides 12345_45678 but certainly nucleotides 23456_34567, at an unknown position (g.?_?) in the genome (see Uncertain)
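
To make the examples above concrete, here is a small sketch of how an insertion, including an inverted copy, changes a sequence. It assumes 1-based positions as used in the descriptions and treats "inv" as the reverse complement of the copied range. The function names and the toy reference sequence are hypothetical and chosen only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def insert_between(reference, left, inserted):
    """Insert `inserted` between 1-based positions `left` and `left + 1`."""
    if not 1 <= left < len(reference):
        raise ValueError("both flanking positions must lie inside the reference")
    return reference[:left] + inserted + reference[left:]


def inverted_copy(reference, start, end):
    """Reverse complement of the 1-based inclusive range start..end."""
    return reverse_complement(reference[start - 1:end])


toy = "GATTACAGGC"  # hypothetical 10-nucleotide reference

# g.4_5insA on the toy reference
print(insert_between(toy, 4, "A"))                       # GATTAACAGGC

# g.4_5ins2_4inv on the toy reference: inverted copy of ATT is AAT
print(insert_between(toy, 4, inverted_copy(toy, 2, 4)))  # GATTAATACAGGC
```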


Can I describe a variant as g.123insG?

No. The description is ambiguous and therefore not allowed: does it mean the insertion of a G at position 123, or the insertion of a G after position 123?
The situation becomes even more complex with a coding DNA reference sequence, where positions can contain a "-" character, e.g. c.-14insG or c.456-13insG. In the description c.456-13insG, if the insertion is after intronic nucleotide c.456-13, is the other flanking position c.456-12 or c.456-14?

Can I use the "^" character to describe an insertion?

No, insertions cannot be described using the format g.123^124insG or g.123^124G. The recommendations try to keep the number of different characters used to a minimum. Since the underscore was already used to indicate a range, a new character was not required.

How should I describe the change ATCGATCGATCGATCGAGGGTCCC to ATCGATCGATCGATCGAATCGATCGATCGGGGTCCC? The fact that the inserted sequence (ATCGATCGATCG) is present in the original sequence suggests it derives from a duplicative event.

The variant should be described as an insertion: g.17_18ins5_16. A description using "dup" is not correct since, by definition, a duplication must directly flank the original copy on its 3' side (in tandem). Note that the description given still makes it clear that the sequence inserted between g.17 and g.18 probably originates from nearby, i.e. positions g.5 to g.16, and thus likely from a duplicative event.
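
The distinction between "ins" and "dup" in this answer can be sketched as a simple check: an inserted sequence only counts as a tandem duplication if it copies the reference sequence directly next to the insertion site. The function name `is_tandem_duplication` is hypothetical. The sketch checks both flanks, on the assumption that a copy of the sequence immediately 3' of the site yields the same result as a duplication of that sequence.

```python
def is_tandem_duplication(reference, left, inserted):
    """Check whether `inserted`, placed between 1-based positions `left`
    and `left + 1`, is a copy of the directly adjacent reference sequence."""
    size = len(inserted)
    preceding = reference[max(0, left - size):left]  # sequence immediately 5'
    following = reference[left:left + size]          # sequence immediately 3'
    return inserted in (preceding, following)


reference = "ATCGATCGATCGATCGAGGGTCCC"
inserted = reference[4:16]  # nucleotides g.5 to g.16: ATCGATCGATCG

# The 12 nucleotides immediately 5' of the site are g.6 to g.17
# (TCGATCGATCGA), so this is an insertion, not a duplication.
print(is_tandem_duplication(reference, 17, inserted))  # False
```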

A variant in the CDKN2A gene, duplicating the first 24 nucleotides of the coding DNA reference sequence, has been described as c.23ins24. My interpretation is it should be described as c.1_24dup, is this correct?

Since the sequence in that region is cagcATGGAGCCGGCGGCGGGGAGCAGCATGGAGCCTTCG.., the correct description is c.9_32dup (p.(Ala4_Pro11dup)). c.1_24dup seems correct but neglects the 3'rule: because c.25_32 (ATGGAGCC) repeats c.1_8, the duplication can be shifted 8 nucleotides in the 3' direction. c.23ins24 is not correct since the position of the insertion is not described properly and because ins"24" does not define the sequence inserted.
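
The 3' shift in this answer can be reproduced with a short loop. This is a sketch under stated assumptions: the function name `shift_duplication_3prime` is hypothetical, positions are 1-based and inclusive, and only the coding part of the sequence quoted above (starting at the ATG) is used as reference.

```python
def shift_duplication_3prime(reference, start, end):
    """Shift a duplicated 1-based inclusive range as far 3' as possible.

    Duplicating start..end gives the same result as duplicating
    start+1..end+1 whenever the nucleotide directly after the range
    equals the first nucleotide of the range.
    """
    while end < len(reference) and reference[end] == reference[start - 1]:
        start += 1
        end += 1
    return start, end


# Coding sequence from the answer above, c.1 = A of the ATG
coding = "ATGGAGCCGGCGGCGGGGAGCAGCATGGAGCCTTCG"

print(shift_duplication_3prime(coding, 1, 24))  # (9, 32)
```

Both c.1_24dup and c.9_32dup produce the same variant sequence; the loop only selects the most 3' description, as the 3'rule requires.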