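Transposable elements (TEs), also called transposons or mobile elements, are DNA segments that can move or copy themselves from one position in the genome to another through transposition. TEs range in length from less than 100 base pairs to more than 20,000 base pairs. After transposition, many types of TEs are flanked by short direct repeats (about 1-20 base pairs). These repeats are target site duplications (TSDs), derived from the target sequence during transposition. However, some TE types, such as Helitrons, several Harbinger families and CR1 retrotransposons, do not generate TSDs. TSD length is usually characteristic of a group of TEs and its related species, but it can vary among families and superfamilies. In most eukaryotic genomes, TEs are the major component of repetitive sequences. Other repetitive sequences include tandem repeats (satellites or microsatellites), sporadic genomic duplications, and some multicopy host genes (such as rRNA, tRNA and histone genes). In fact, TEs can be regarded as parasitic elements within the genome. Similarly, viruses that spread between cells can also be regarded as TEs, because they can integrate into the host genome, as LTR retroviruses do. TEs have diverse evolutionary effects on host genomes.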
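The flanking-repeat signature described above can be checked directly on a sequence. The following is a minimal sketch, assuming 0-based half-open coordinates for the TE and exact, mismatch-free repeats; `find_tsd` and its parameters are illustrative names and do not come from any library.

```python
def find_tsd(seq: str, te_start: int, te_end: int, max_len: int = 20) -> str:
    """Return the longest exact direct repeat flanking seq[te_start:te_end].

    The left candidate ends right before the TE and the right candidate
    starts right after it. An empty string means no repeat was found.
    """
    for k in range(max_len, 0, -1):
        # Skip lengths that would run past either end of the sequence.
        if te_start - k < 0 or te_end + k > len(seq):
            continue
        left = seq[te_start - k:te_start]
        right = seq[te_end:te_end + k]
        if left == right:
            return left
    return ""


# Toy example: flank ACGT, TSD TTAA, TE GGGCCCGGG, TSD TTAA, flank CATG.
toy = "ACGT" + "TTAA" + "GGGCCCGGG" + "TTAA" + "CATG"
print(find_tsd(toy, 8, 17))  # TTAA
```

Real TSDs can carry mismatches and TE boundaries are rarely known to the base, so a practical search would tolerate a few differences and scan a small window around each boundary. Elements that do not generate TSDs, such as Helitrons, would simply return an empty string here.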
Fig. 2. An unrooted consensus tree of the transposase superfamilies inferred from the presence or absence of the highly conserved residues in the signature strings. Bootstrap values are at the nodes. The arrows with labels indicate superfamily clusters merged in our revised classification. Shown on the right is a schematic representation of the DDE/D domain and the signature string for each superfamily. Conserved blocks are highlighted in blue, variable regions are in gray. White gaps are regions not drawn to scale. The DDE triads are highlighted in red. Alternative residues are marked by slashes; lowercase indicates that a residue occurs in <10% of the sequences in the alignment profile. The C/DH motif is highlighted in orange; the C(2)C, [M/L]H, and H(3-4)H motifs are highlighted in green.
List of mobile elements whose transposases have been examined by secondary structure prediction programs

| Family | Element (or protein) analyzed | Active or # copies in genome¹ | From secondary structure, type of DDE/D motif² | Relevant references³ |
|---|---|---|---|---|
| IS1 | IS1N; ISSto9 | >40*; 5 | DD(24)E; DD(20)E | *Nyman et al., 1981; Ohta et al., 2002, 2004; Siguier et al., 2009 |
| IS1595 | 1. ISPna2 | ? | DD(36)N | Siguier et al., 2009 |
| | 2. ISH4 | ? | DD(36)E | Siguier et al., 2009 |
| | 3. IS1016C | ? | DD(34)E | Siguier et al., 2009 |
| | 4. IS1595 | ? | DD(35)N | Siguier et al., 2009 |
| | 5. ISSod11 | 13 | DD(34)H | Siguier et al., 2009 |
| | 6. ISNWi1 | ? | DD(35)E | Siguier et al., 2009 |
| | 7. ISNha5 | ? | DD(33)E | Siguier et al., 2009 |
| | Merlin: MERLIN1_SM | Consensus | DD(36)E | Feschotte, 2004 |
| IS3 | IS911 | Active | DD(35)E | Polard and Chandler, 1995; Rousseau et al., 2002 |
| IS481 | IS481 | ?00* | DD(35)E | *Glare et al., 1990; Chandler and Mahillon, 2002 |
| IS4 | IS50R | Active | PDB ID: 1muh; DD(β-strand)E | Rezsöhazy et al., 1993; Davies et al., 2000 |
| IS701 | IS701; ISRso17 | Active (15*); 7 | DD(β-strand)E | *Mazel et al., 1991 |
| ISH3 | ISC1359; ISC1439A | 513 | DD(β-strand)E | |
| IS1634 | IS1634; ISMac5; ISPlu4 | Active (?0*); 7; 7 | DD(β-strand)E | *Vilei et al., 1999 |
| IS5 | IS903 | Active | DD(65)E | Derbyshire et al., 1987; Rezsöhazy et al., 1993; Tavakoli et al., 1997 |
| | PIF/Harbinger: PIFa (Z. mays) | Active | DD(59)E | Zhang et al., 2001; Kapitonov and Jurka, 2004; Sinzelle et al., 2008 |
| IS1182 | IS660; ISPsy6 | 314 | DD(β-strand)E | Takami et al., 2001 |
| IS6 | IS6100 | Active | DD(34)E | Martin et al., 1990; Mahillon and Chandler, 1998 |
| IS21 | IS21 | Active | DD(45)E | Mahillon and Chandler, 1998; Berger and Haas, 2001 |
| IS30 | IS30 | Active | DD(33)E | Caspers et al., 1984; Mahillon and Chandler, 1998 |
| IS66 | IS679; ISPsy5; ISMac8 | Active333 | DD(α-helical?)E | Han et al., 2001 |
| IS110 | IS492; IS1111 | Active; 20 | DEDD; DEDD | Perkins-Balding et al., 1999; Buchner et al., 2005 |
| IS256 | IS256 | Active | DD(α-helical)E | Mahillon and Chandler, 1998; Prudhomme et al., 2002 |
| | MuDr/Foldback (Mutator) | Active | DD(α-helical)E | Eisen et al., 1994; Babu et al., 2006; Hua-Van and Capy, 2008 |
| IS630 | ISY100 | Active | DD(34)E | Doak et al., 1994; Feng and Colloms, 2007 |
| | Tc1/mariner: Mos1 (D. mauritiana) | Active | PDB ID: 2f7t; DD(34)D | Plasterk et al., 1999; Richardson et al., 2006 |
| | Zator: Zator-1_HM | 36* | DD(43)E | *Bao et al., 2009 |
| IS982 | ISPfu3 | 5 | DD(47)E | Mahillon and Chandler, 1998 |
| IS1380 | IS1380A | ?00* | DD(β-strand)E | *Takemura et al., 1991; Chandler and Mahillon, 2002 |
| | piggyBac (T. ni) | Active | DD(β-strand)D | Cary et al., 1989; Sarkar et al., 2003; Mitra et al., 2008 |
| ISAs1 | ISAzo3 | 7 | DD(β-strand)E/D? | |
| ISL3 | IS31831; IS651 | Active; 22 | DD(α-helical)E | Suzuki et al., 2006 |
| Tn3 | Tn3 (E. coli) | Active | DD(α-helical?)E | Grindley, 2002 |
| hAT | Hermes (M. domestica) | Active | PDB ID: 2bw3; DD(α-helical)E insertion | Warren et al., 1994; Rubin et al., 2001; Hickman et al., 2005 |
| CACTA | CACTA1 (A. thaliana); En/Spm ZM | Active | DD(α-helical?)E/D? | Miura et al., 2001; DeMarco et al., 2006 |
| P | Drosophila | Active | ? | Rio, 2002 |
| Transib | Transib1_AG | Consensus | DD(α-helical)E | Kapitonov and Jurka, 2005; Chen and Li, 2008 |
| | RAG1 (M. musculus) | Active | DD(α-helical)E | Kim et al., 1999; Landree et al., 1999; Lu et al., 2006 |
| Sola | Sola3-3_HM | Multiple copies* | DD(40)E | *Bao et al., 2009 |
Hickman AB, Chandler M, Dyda F. Integrating prokaryotes and eukaryotes: DNA transposases in light of structure. Crit Rev Biochem Mol Biol. 2010 Feb;45(1):50-69. doi: 10.3109/10409230903505596.
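Motif strings in the table such as DD(35)E are conventionally read as a catalytic triad in which the given number of residues separates the second aspartate from the third catalytic residue. As a small illustration, the sketch below parses that notation with a regular expression. The `Motif` type and `parse_motif` function are hypothetical helpers introduced here, and entries without a numeric spacer (for example DEDD, or motifs annotated with an insertion-domain type) are deliberately left unparsed.

```python
import re
from typing import NamedTuple, Optional


class Motif(NamedTuple):
    residues: str  # catalytic residues in order, e.g. "DDE"
    spacer: int    # residues between the second D and the third residue


# Matches only the numeric form, e.g. "DD(35)E" or "DD(34)D".
_NUMERIC_MOTIF = re.compile(r"^DD\((\d+)\)([A-Z])$")


def parse_motif(text: str) -> Optional[Motif]:
    """Parse a 'DD(n)X' string; return None for any other notation."""
    match = _NUMERIC_MOTIF.match(text.strip())
    if match is None:
        return None
    return Motif(residues="DD" + match.group(2), spacer=int(match.group(1)))


for entry in ["DD(35)E", "DD(34)D", "DD(36)N", "DEDD"]:
    print(entry, "->", parse_motif(entry))
```

Parsed this way, the numeric entries can be sorted or grouped by spacer length, which makes it easy to see, for instance, that IS630 and Tc1/mariner share a 34-residue spacer while differing in the third residue.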
Classification, distribution and the number of entries of LTR retrotransposons in Repbase

| Superfamily | Total |
|---|---|
| Copia | 10,595 |
| Gypsy | 6,694 |
| BEL | 1,855 |
| ERV | |
| ERV1 | 1,967 |
| ERV2 | 1,266 |
| ERV3 | 657 |
| ERV4 | 187 |
| Lentivirus | 4 |
| Unclassified ERV | 325 |
| Unclassified LTR | 719 |
| DIRS | 418 |
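Classification, and the number of entries of non-LTR retrotransposons in Repbase

| Group | Clade | Total |
|---|---|---|
| CRE | CRE | 43 |
| R2 | R4 | 46 |
| | Hero | 23 |
| | NeSL | 106 |
| | R2 | 159 |
| Dualen | RandI/Dualen | 13 |
| L1 | Proto1 | 6 |
| | L1 | 1,690 |
| | Tx1 | 273 |
| RTE | RTETP | 1 |
| | Proto2 | 47 |
| | RTEX | 138 |
| | RTE | 487 |
| I | Outcast | 23 |
| | Ingi | 17 |
| | Vingi | 141 |
| | I | 195 |
| | Nimb | 108 |
| | Tad1 | 141 |
| | Loa | 74 |
| | R1 | 237 |
| | Jockey | 243 |
| CR1 | Rex1 | 95 |
| | CR1 | 803 |
| | Kiri | 91 |
| | L2 | 285 |
| | L2A | 5 |
| | L2B | 27 |
| | Crack | 140 |
| | Daphne | 227 |
| Ambal | Ambal | 8 |
| Penelope | Penelope | 477 |
| SINE | SINE1/7SL | 95 |
| | SINE2/tRNA | 539 |
| | SINE3/5S | 30 |
| | SINEU | 17 |
| | Unclassified SINE | 112 |
| Unclassified non-LTR retrotransposon | | 179 |
| Total | | 7,341 |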
Reference: Kenji K. Kojima, Structural and sequence diversity of eukaryotic transposable elements, Genes & Genetic Systems, 2019, Volume 94, Issue 6, Pages 233-252
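Because the non-LTR table reports a grand total, the per-clade counts can be cross-checked against it. The sketch below transcribes the counts listed above into a nested dictionary (the grouping of clades is my reading of the table layout, and the variable names are illustrative) and sums them.

```python
# Entry counts per clade, grouped as in the non-LTR table above.
counts = {
    "CRE": {"CRE": 43},
    "R2": {"R4": 46, "Hero": 23, "NeSL": 106, "R2": 159},
    "Dualen": {"RandI/Dualen": 13},
    "L1": {"Proto1": 6, "L1": 1690, "Tx1": 273},
    "RTE": {"RTETP": 1, "Proto2": 47, "RTEX": 138, "RTE": 487},
    "I": {"Outcast": 23, "Ingi": 17, "Vingi": 141, "I": 195, "Nimb": 108,
          "Tad1": 141, "Loa": 74, "R1": 237, "Jockey": 243},
    "CR1": {"Rex1": 95, "CR1": 803, "Kiri": 91, "L2": 285, "L2A": 5,
            "L2B": 27, "Crack": 140, "Daphne": 227},
    "Ambal": {"Ambal": 8},
    "Penelope": {"Penelope": 477},
    "SINE": {"SINE1/7SL": 95, "SINE2/tRNA": 539, "SINE3/5S": 30,
             "SINEU": 17, "Unclassified SINE": 112},
}
unclassified_non_ltr = 179   # listed without a clade
reported_total = 7341        # "Total" row of the table

# Subtotal per group, then the grand total including unclassified entries.
group_totals = {group: sum(clades.values()) for group, clades in counts.items()}
computed_total = sum(group_totals.values()) + unclassified_non_ltr

for group, subtotal in sorted(group_totals.items(), key=lambda kv: -kv[1]):
    print(f"{group:10s} {subtotal:5d}")
print("computed total:", computed_total)  # 7341
print("matches reported total:", computed_total == reported_total)
```

The computed sum agrees with the reported 7,341, which also confirms that no row was lost when the table was flattened. The L1, CR1 and I groups together account for well over half of the entries.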
Proposal of TE classes with some members having a DNA transposon phenotype
Reference: Benoît Piégu, Solenne Bire, Peter Arensburger, Yves Bigot, A survey of transposable element classification systems – A call for a fundamental update to meet the challenge of their diversity and complexity, Molecular Phylogenetics and Evolution, Volume 86, 2015, Pages 90-109, ISSN 1055-7903, https://doi.org/10.1016/j.ympev.2015.03.009.
Major features of prokaryotic IS families (ISfinder)
Distribution of TEs across the eukaryote phylogeny
Reference genome size (sea green circles) varies dramatically across eukaryotes and is loosely correlated with transposable element content. Here, the honey bee TE content is likely an underestimate, as approximately 3% of the genome derives from unusual “large retrotransposon derivatives” (LARDs) (39). For ease of visualisation, DIRS elements have been included with LTRs and all Class II elements included under “DNA”. Data was acquired from genome RepeatMasker output files. Credit to Matt Crook for Volvox carteri silhouette and to Huang et al. for the figure inspiration (71).