Protein Recommendations

Insertion Variant


Definitions

Insertion
a sequence change where, compared to the reference sequence, one or more amino acids are inserted and where the insertion is not a copy of a sequence immediately N-terminal (5')

Description

Format: “prefix”“amino_acids+positions_flanking”“ins”“inserted_sequence”, e.g. p.Lys23_Leu24insArgSerGln

“prefix” = reference sequence used = p.
“amino_acids+positions_flanking” = amino acids with positions flanking the insertion site = Lys23_Leu24
“ins” = type of change is an insertion = ins
“inserted_sequence” = inserted sequence = ArgSerGln
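
The format above can be split into its four parts mechanically. The following is a minimal sketch, not an official parser: the names `parse_insertion`, `BODY` and `AMINO_ACIDS` are hypothetical, only the 20 standard three-letter codes plus Ter are assumed to be valid, and the sketch does not handle the `*` stop symbol, inserted ranges such as Gly4_Ser6, or the curly-brace proposal.

```python
import re

# Three-letter codes for the 20 standard amino acids plus "Ter" (stop).
# Assumption of this sketch: no other residue codes are accepted.
AMINO_ACIDS = frozenset(
    "Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met "
    "Phe Pro Ser Thr Trp Tyr Val Ter".split()
)

# flanking residue, position, "_", flanking residue, position, "ins",
# then either a run of three-letter codes or a length given as digits.
BODY = re.compile(
    r"([A-Z][a-z]{2})(\d+)_([A-Z][a-z]{2})(\d+)ins((?:[A-Z][a-z]{2})+|\d+)"
)


def parse_insertion(description):
    """Split a protein insertion description into its components."""
    if not description.startswith("p."):
        raise ValueError("expected the 'p.' prefix")
    body = description[2:]
    # Parentheses mark a predicted consequence, e.g. p.(Arg727_Ser728insTrpCys).
    predicted = body.startswith("(") and body.endswith(")")
    if predicted:
        body = body[1:-1]
    match = BODY.fullmatch(body)
    if match is None:
        raise ValueError(f"not a recognised insertion: {description}")
    left_aa, left_pos, right_aa, right_pos, inserted = match.groups()
    if inserted.isdigit():
        inserted_value = int(inserted)  # length-only form, e.g. ins34
        residues = [left_aa, right_aa]
    else:
        inserted_value = [inserted[i:i + 3] for i in range(0, len(inserted), 3)]
        residues = [left_aa, right_aa] + inserted_value
    unknown = [code for code in residues if code not in AMINO_ACIDS]
    if unknown:
        raise ValueError(f"unknown amino acid code(s): {unknown}")
    return {
        "predicted": predicted,
        "left": (left_aa, int(left_pos)),
        "right": (right_aa, int(right_pos)),
        "inserted": inserted_value,
    }
```

For example, `parse_insertion("p.Lys23_Leu24insArgSerGln")` yields the flanking residues Lys23 and Leu24 and the inserted list Arg, Ser, Gln.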


Note

  • prefix reference sequence accepted is “p.” (protein).
  • predicted consequences, i.e. without experimental evidence (no RNA or protein analysed), should be given in parentheses, e.g. p.(Arg727_Ser728insTrpCys).
  • an insertion cannot be described using a single amino acid position, e.g. p.Lys23insAsp
  • the “amino_acids+positions_flanking” should contain two flanking residues, e.g. Lys23 and Leu24, not two non-flanking residues (Lys23 and Asn25)
  • duplicating insertions should be described as duplications (see Duplication), not as insertions
  • in-frame insertions containing a translation stop codon in the inserted sequence are described as an insertion, not as a deletion-insertion removing the entire C-terminal amino acid sequence.
  • out-of-frame insertions are described as a frame shift.
  • for all descriptions the most C-terminal position possible of the reference sequence is arbitrarily assigned to have been changed (3' rule)
    • the 3' rule also applies for changes in single amino acid stretches and tandem repeats
  • variants should be described on the protein level and not incorporate knowledge regarding the change at the DNA level
  • when the inserted protein sequence is large and it is possible to derive the inserted amino acid sequence from the description given at DNA or RNA level, the insertion may be described by its length only (e.g. p.Lys2_Leu3ins34).
  • under discussion, see Proposal for complex variants
    { } (curly braces) can be used to list any change in the inserted sequence (“inserted_sequence”) which is different when compared to the source, e.g. p.Lys23_Leu24insArg100_Asp120{Gly111Glu}
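
Two of the notes above (no single-position descriptions, and the two residues must be directly flanking) lend themselves to an automatic check. A minimal sketch, assuming a hypothetical helper `flank_problems` that looks only at the flanking part of the description and ignores the inserted sequence:

```python
import re

# Matches "p.", an optional "(", one residue with its position, an optional
# second residue with its position, and the "ins" keyword.
FLANKS = re.compile(
    r"^p\.\(?([A-Z][a-z]{2})(\d+)(?:_([A-Z][a-z]{2})(\d+))?ins"
)


def flank_problems(description):
    """Return a list of problems with the flanking positions (empty if none)."""
    match = FLANKS.match(description)
    if match is None:
        return ["not recognised as a protein insertion"]
    if match.group(3) is None:
        # e.g. p.Lys23insAsp: ambiguous, so not allowed
        return ["only one flanking position given"]
    first, second = int(match.group(2)), int(match.group(4))
    if second != first + 1:
        # e.g. Lys23 and Asn25 do not flank a single insertion site
        return [f"positions {first} and {second} are not adjacent"]
    return []
```

With this sketch, p.Lys23_Leu24insArgSerGln passes, while p.Lys23insAsp and p.Lys23_Asn25insAsp are each reported with one problem.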

Examples

  • p.His4_Gln5insAla
    the insertion of amino acid Ala between amino acids His4 and Gln5 changing MetLysGlyHisGlnGlnCys to MetLysGlyHisAlaGlnGlnCys
  • p.Lys2_Gly3insGlnSerLys
    the insertion of amino acids GlnSerLys between amino acids Lys2 and Gly3 changing MetLysGlyHisGlnGlnCys to MetLysGlnSerLysGlyHisGlnGlnCys
  • p.(Met3_His4insGlyTer)
    the predicted consequence at the protein level of an insertion at the DNA level (c.9_10insGGGTAG) is the insertion of GlyTer (alternatively Gly*)
    NOTE: this is not described as p.(Met3_Ile3418delinsGly), a deletion-insertion replacing the entire C-terminal protein coding sequence downstream of Met3 with a Gly
  • p.Arg78_Gly79ins23
    the in-frame insertion of a 23 amino acid sequence between amino acids Arg78 and Gly79
    NOTE: it must be possible to deduce the 23 inserted amino acids from the description given at DNA or RNA level
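
The first two examples can be reproduced by applying the described insertion to the reference sequence. A minimal sketch, assuming sequences written as concatenated three-letter codes; `split_codes` and `apply_insertion` are hypothetical helper names:

```python
def split_codes(sequence):
    """Split a string of three-letter codes into a list of residues."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def apply_insertion(reference, left_aa, left_pos, right_aa, inserted):
    """Insert `inserted` between residue left_pos and left_pos + 1 (1-based)."""
    residues = split_codes(reference)
    if not 1 <= left_pos < len(residues):
        raise ValueError("insertion site lies outside the reference sequence")
    # The named flanking residues must agree with the reference sequence.
    if residues[left_pos - 1] != left_aa or residues[left_pos] != right_aa:
        raise ValueError("flanking residues do not match the reference")
    changed = residues[:left_pos] + split_codes(inserted) + residues[left_pos:]
    return "".join(changed)


reference = "MetLysGlyHisGlnGlnCys"
# p.His4_Gln5insAla
print(apply_insertion(reference, "His", 4, "Gln", "Ala"))
# p.Lys2_Gly3insGlnSerLys
print(apply_insertion(reference, "Lys", 2, "Gly", "GlnSerLys"))
```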

Q&A

Can I describe a variant as p.His4insAla?

No, the description is not allowed because it is ambiguous. Does it mean the insertion of an Ala at position 4, or the insertion of an Ala after position 4?

Can I use the "^" character to describe an insertion?

No, insertions cannot be described using the format p.His4Gln5insAla or p.123^124Ala. The recommendations try to restrict the number of different characters used to a minimum. Since a character was already used to indicate a range (the underscore) a new character was not required.

How should I describe the change "MetArgThrGlySerSerHisGlnTrpPhe" to "MetArgThrGlySerSerHisGlySerSerGlnTrpPhe"? The fact that the inserted sequence (GlySerSer) is present in the original sequence suggests it derives from a duplicative event.

The variant should be described as an insertion: p.His7_Gln8insGly4_Ser6. A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking of the original copy (in tandem). Note that the description given still makes it clear that the sequence inserted between p.His7 and p.Gln8 is probably derived from nearby, i.e. position p.Gly4 to p.Ser6, and thus likely derived from a duplicative event.
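
The reasoning in this answer can be sketched in code: locate the insertion at the most C-terminal position (3' rule), report a duplication only when the inserted residues directly follow their original copy, and otherwise describe an insertion, naming the source range when the inserted sequence occurs elsewhere in the reference. The function name `describe_insertion` is hypothetical, and using the range form only for insertions of two or more residues is an arbitrary choice of this sketch, not part of the recommendations.

```python
def split_codes(sequence):
    """Split a string of three-letter codes into a list of residues."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def describe_insertion(reference, variant):
    """Describe a pure insertion (or tandem duplication) at protein level."""
    ref, var = split_codes(reference), split_codes(variant)
    n = len(var) - len(ref)
    if n <= 0:
        raise ValueError("variant is not longer than the reference")
    # 3' rule: extend the matching N-terminal part as far as possible.
    start = 0
    while start < len(ref) and ref[start] == var[start]:
        start += 1
    if start == 0 or start == len(ref):
        raise ValueError("no flanking residue on one side (not an insertion)")
    if var[start + n:] != ref[start:]:
        raise ValueError("change is not a simple insertion")
    inserted = var[start:start + n]
    # Tandem copy of the residues directly N-terminal: a duplication.
    if start >= n and ref[start - n:start] == inserted:
        if n == 1:
            return f"p.{ref[start - 1]}{start}dup"
        return f"p.{ref[start - n]}{start - n + 1}_{ref[start - 1]}{start}dup"
    flanks = f"p.{ref[start - 1]}{start}_{ref[start]}{start + 1}ins"
    # Non-tandem copy found elsewhere in the reference: name its range.
    if n >= 2:  # assumption: single residues are always written out
        for j in range(len(ref) - n + 1):
            if ref[j:j + n] == inserted:
                return f"{flanks}{ref[j]}{j + 1}_{ref[j + n - 1]}{j + n}"
    return flanks + "".join(inserted)


print(describe_insertion(
    "MetArgThrGlySerSerHisGlnTrpPhe",
    "MetArgThrGlySerSerHisGlySerSerGlnTrpPhe",
))  # p.His7_Gln8insGly4_Ser6
```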