a sequence change where, compared to the reference sequence, one or more nucleotides are inserted and where the insertion is not a copy of a sequence immediately 5'
Format: “prefix”“positions_flanking”“ins”“inserted_sequence”, e.g. r.123_124insauc
“prefix” = reference sequence used = r.
“positions_flanking” = positions of the two nucleotides flanking the insertion site = 123_124
“ins” = type of change is an insertion = ins
“inserted_sequence” = inserted sequence = auc
the only reference sequence prefix accepted is r. (coding and non-coding RNA).
the “position” description should contain the two nucleotides directly flanking the insertion site, e.g. 123 and 124 but not 123 and 125.
an insertion cannot be described using one nucleotide position, like r.123insg
for all descriptions the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed (3’rule)
the 3’rule applies to ALL descriptions (genome, gene, transcript and protein) of a given variant
when the inserted sequence is very long it can best be submitted to a database (e.g. GenBank); the accession.version number obtained can then be used to describe the variant like r.123_124insL37425.1:23_361
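The format rules above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `parse_rna_insertion` is hypothetical, and it only handles plain integer positions with a literal inserted sequence (no intronic offsets, no accession-based or bracketed inserted sequences).

```python
import re

# Simplified pattern (an assumption of this sketch): r.<left>_<right>ins<seq>
# with integer positions (optionally negative) and lowercase a/c/g/u only.
_SIMPLE_INS = re.compile(r"^r\.(-?\d+)_(-?\d+)ins([acgu]+)$")


def parse_rna_insertion(description):
    """Return (left, right, inserted_sequence) or raise ValueError."""
    match = _SIMPLE_INS.match(description)
    if match is None:
        # Also rejects single-position forms such as r.123insg.
        raise ValueError(f"not a simple RNA insertion: {description!r}")
    left, right = int(match.group(1)), int(match.group(2))
    if left == 0 or right == 0:
        raise ValueError("there is no nucleotide position 0")
    # The two positions must be direct neighbours; because position 0
    # does not exist, -1 and 1 also count as neighbours.
    adjacent = (right - left == 1) or (left, right) == (-1, 1)
    if not adjacent:
        raise ValueError(f"positions {left} and {right} are not adjacent")
    return left, right, match.group(3)


print(parse_rna_insertion("r.123_124insauc"))
```

Descriptions such as r.123insg or r.123_125insg raise a `ValueError` in this sketch, mirroring the two rules on flanking positions.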
r.2949_2950ins[2950-30_2950-12;2950-4_2950-1]: the insertion of intronic nucleotides r.2950-30 to r.2950-12 and r.2950-4 to r.2950-1 between nucleotides r.2949 and r.2950 (caused by the deletion NG_012232.1(NM_004006.2):c.2950-11_2950-5del)
alternative description: r.2949_2950ins[2950-30_2950-12;uuag]
Can I describe a variant as r.123insg?
No, the description is not unequivocal and is therefore not allowed. What does it mean: the insertion of a "g" at position 123, or the insertion of a "g" after position 123? The situation becomes even more complex when, with a coding RNA reference sequence, the position contains a "-" character, e.g. r.-14insg; when the insertion is after nucleotide r.-14, is the other flanking position r.-13 or r.-15?
Can I use the "^" character to describe an insertion?
No, insertions cannot be described using the format r.123^124insu or r.123^124u. The recommendations try to restrict the number of different characters used to a minimum. Since a character was already used to indicate a range (the underscore), a new character was not required.
How should I describe the change "aucgaucgaucgaucaggguccc" to "aucgaucgaucgaucaaucgaucgaucggguccc"? The fact that the inserted sequence (aucgaucgauc) is present in the original sequence suggests it derives from a duplicative event.
The variant should be described as an insertion: r.16_17ins5_15. The inserted sequence (aucgaucgauc, 11 nucleotides) lies between nucleotides r.16 and r.17 of the original sequence and is identical to nucleotides r.5 to r.15. A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking of the original copy (in tandem). Note that the description given still makes it clear that the sequence inserted between r.16 and r.17 is probably derived from nearby, i.e. position r.5 to r.15, and thus likely derived from a duplicative event.
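The reasoning in this answer, together with the 3’rule, can be reproduced with a short script. This is a sketch under stated assumptions, not a reference implementation: the function name `describe_insertion` is hypothetical, positions are counted from 1 on a non-coding style reference, and only a single contiguous insertion is handled. Taking the longest common prefix of the two sequences places the change at the most 3’ position possible; a copy sitting directly 5’ of the insertion site is reported as "dup", and any other copy found in the reference is reported as a position range.

```python
def describe_insertion(ref, var):
    """Describe var relative to ref as one insertion or tandem duplication."""
    extra = len(var) - len(ref)
    if extra <= 0:
        raise ValueError("variant is not longer than the reference")
    # Longest common prefix: shifts the change as far 3' as possible.
    left = 0
    while left < len(ref) and ref[left] == var[left]:
        left += 1
    inserted = var[left:left + extra]
    if ref[:left] + inserted + ref[left:] != var:
        raise ValueError("not a single contiguous insertion")
    if ref[:left].endswith(inserted):
        # Copy of the sequence immediately 5': a duplication, not an insertion.
        start = left - extra + 1
        return f"r.{left}dup" if extra == 1 else f"r.{start}_{left}dup"
    if left == 0 or left == len(ref):
        raise ValueError("no two flanking nucleotides in the reference")
    # Prefer the nearest copy 5' of the site, then any copy 3' of it.
    hit = ref.rfind(inserted, 0, left)
    if hit < 0:
        hit = ref.find(inserted, left)
    source = inserted if hit < 0 else f"{hit + 1}_{hit + extra}"
    return f"r.{left}_{left + 1}ins{source}"


reference = "aucgaucgaucgaucaggguccc"
variant = "aucgaucgaucgaucaaucgaucgaucggguccc"
print(describe_insertion(reference, variant))
```

Running it on the two sequences from the question prints the flanking positions of the insertion site and the range of the original sequence that the inserted copy matches, which makes it easy to confirm that the copy is not in tandem with its source.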