\documentclass{bioinfo}
\copyrightyear{2015}
\pubyear{2015}
% \usepackage[english]{babel}
% %\usepackage[utf8]{inputenc} %unicode support
% \usepackage{indentfirst}
% \usepackage{amsmath,amsfonts,amssymb}
% % \usepackage[pdftex, bookmarks, colorlinks]{hyperref}
% \usepackage[bookmarks]{hyperref}
% % Load packages
% % \usepackage{cite} % Make references as [1-4], not [1,2,3,4]
% \usepackage{ifthen} % Conditional
% % \usepackage{multicol} %Columns
% \urlstyle{rm}
% \setlength{\doublerulesep}{\arrayrulewidth}
\usepackage{url} % Formatting web addresses
\begin{document}
\firstpage{1}
\title[Transposable Elements forward simulator]{An individual-based
model forward simulator for transposable element dynamics and
evolution}
\author[Figueiredo \& Struchiner]{Felipe Figueiredo\,$^{1,2}$\footnote{to whom correspondence
should be addressed}, Claudio Struchiner\,$^{2}$}
\address{$^{1}$Programa de Biologia Computacional e Sistemas, Instituto
Oswaldo Cruz (BCS/IOC/Fiocruz)\\ $^{2}$Programa de Computação
Científica, PROCC/Fiocruz}
% Programa de P{\'o}s-gradua{\c c}{\~a}o em Biologia Computacional e
% Sistemas, Instituto Oswaldo Cruz, BCS/IOC/Fiocruz
% Programa
% de Computa{\c c}{\~a}o Cient{\'\i}fica
\history{Received on XXXXX; revised on XXXXX; accepted on XXXXX}
\editor{Associate Editor: XXXXXXX}
\maketitle
\begin{abstract}
\section{Motivation:}
Transposable Elements (TEs) are small genomic elements that occur in
multiple copies within an individual's genome and are present in almost
all genomes sequenced so far. Their origin is still debated, but
some mechanisms for the invasion of a host individual and the later
spread within the population are known from both \emph{in silico} and
wet laboratory experiments.
In order to assess both the evolutionary forces that promote variation
in TEs and their impact on host fitness, three independent levels
of biological organization must be observed simultaneously: (a) the
host abundance should be represented by a proper ecological
model, (b) the TEs' abundance should be described by a
population genetics model and (c) the TE sequences must evolve according to
a given evolutionary model. % Previous efforts in modelling TE dynamics
% focused mostly on the net quantity of TEs, either per individual or in
% the whole
% population. %% Models dealing with the molecular evolution of TEs are
%% scarce. Most mathematical or computational models deal with either one
%% or two of the levels of organization above.
\section{Results:}
%% We present TRepid, an individual-based model that can be used to
%% simulate Population Genetics of a host population focusing in
%% Transposable Elements’ dynamics. This system can be used for
%% simulating TE invasions, fixation and competition between TE families.
We present TRepid, an individual-based model that can be used to
simulate TE invasion and fixation in an age-structured host
population.
Host population dynamics features include random mating, %LM: is random mating a feature? I mean, it's less realistic than structured mating...
various population growth models % (including constant population, linear and
% exponential growth or decay and logistic)
and recombination by crossing over of gametes. The TE spread dynamics
features include transposition by either ``copy and paste'' or ``cut and
paste'', models that determine the transposition rate, nucleotide
substitution according to a selectable molecular evolutionary model,
active and inactive TEs, inactivation of TEs over generations,
selective pressure on host individuals by accumulation of TEs and
deleterious transposition events, among others.
\section{Availability:}
The software is available under the GNU General Public License (GPL) version 3 from
\url{https://launchpad.net/trepid}.
\section{Contact:}
\href{mailto:philsf79@gmail.com}{philsf79@gmail.com}
\end{abstract}
\section{Introduction}
%% \subsection{Dynamics of TEs}
Several approaches have been proposed to understand and predict how
the amount of TEs varies in host genomes
\citep{rouzic2005,HCX+05,SKR05}. Such models try to
assess the invasion capability, accumulation and fixation of new
copies, and are usually based on mathematical and computational
techniques.
Previous % attempts in the modelling community
studies focused on reproducing
the dynamics predicted by differential or difference equations that
express how the total number of TE copies varies through the acquisition
of new copies and the excision of old ones
\citep{QA97,QA98,DCB05,DLC+06}.%% , although
%% some
%% explore less popular techniques such as genetic algorithms
%% \citep{QA01}, or even combination of more than one
%% technique.
These types of models can typically be placed on a continuum
between two extreme paradigms: the \emph{master gene} model, in which
only one TE is active and generates inactive copies, and the
\emph{transposon} model, in which every TE actively produces active
copies, leading to exponential growth of the TE copy number in the
absence of some regulation mechanism \citep{KRP05}. Phylogenetic
analysis can then provide the means to assess where in this spectrum a
given TE family resides \citep{BJ06,JB06}.
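
As a minimal illustration of these two extremes (a sketch under
simplifying assumptions, not the formulation of any of the cited
models), let $n_t$ be the expected copy number of a TE family at
generation $t$ and let $u$ be a hypothetical per-generation
transposition rate of each active copy, with no excision and no
selection. Under the master gene model only the single master copy
transposes, so that
\[
n_{t+1} = n_t + u \quad\Rightarrow\quad n_t = n_0 + u\,t ,
\]
whereas under the transposon model every copy transposes, so that
\[
n_{t+1} = (1+u)\,n_t \quad\Rightarrow\quad n_t = n_0\,(1+u)^t .
\]
The copy number therefore grows linearly in the first case and
exponentially in the second.
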
Most of the literature on models describing TE dynamics suffers from
a lack of empirical data against which these models could be %% FIXME: need to be more humble here
tested. This means that model diagnosis, an important step in model
development, is missing. By looking at the sequences of families of
transposable elements from the same genome, one can make inferences
about their phylogenetic tree. This tree is influenced by, among other
things, the changes in the number of elements of a given family over
evolutionary time. Thus, in principle, one can make inferences about
changes in the number of mobile elements in the genome from the
topology of this tree. Examining
the population dynamics of TEs through coalescent approaches is
therefore likely to be highly informative \citep{Bro05,BJ06,SMT+09}. The main
contribution of our work is a simulation framework that
generates sequence data mimicking empirical TE data, but under
known dynamics. By exploring this simulation framework, we hope to validate
the use of coalescent models to estimate the parameters that describe TE
invasion, together with the uncertainty associated with the
estimation process. In doing so, we hope to contribute an important tool to the
debate about the introduction of transposable elements as part of
genetic drive systems moving genes through disease vector populations.
%% \section{Approach}
\begin{methods}
\section{Methods}
%% Throughout the computational model description, we will refer to
%% transposable elements as \emph{TEs}, and the individuals that carry
%% them as \emph{Hosts}. This is also the nomenclature of the objects
%% involved in the software.
%% \subsection{System description}
The simulator takes into consideration distinct and independent
modeling techniques for each level of biological organization: a
population model, a population genetics model and an evolutionary model of
the DNA sequences. Each of these components is based on either
differential or difference equations, or probabilistic models.
At the ecological level, the population model produces a trend for the
population dynamics. Currently there are models for constant and
exponential growth, and for growth with saturation (logistic and Hassell
equations; see SOM for details). At the start of each generation, the
necessary number of sexually mature individuals is sampled and
paired to generate the number of offspring required to follow the
ecological model as closely as possible, notwithstanding fitness
effects from TEs. The modular framework provides the means to implement
any other ecological model. An age structure with discrete age classes
is also optionally available.
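
For reference, commonly used discrete-time forms of these growth
models are sketched below. The symbols are assumptions of this
illustration, and the exact parametrization used by the simulator is
the one given in the SOM. With $N_t$ the host population size at
generation $t$, $\lambda$ the finite growth rate, $r$ the intrinsic
growth rate, $K$ the carrying capacity, and $a$ and $b$ the
density-dependence parameters of the Hassell equation:
\[ N_{t+1} = N_t \quad \mbox{(constant)}, \]
\[ N_{t+1} = \lambda N_t \quad \mbox{(exponential)}, \]
\[ N_{t+1} = N_t + r N_t \left(1 - \frac{N_t}{K}\right) \quad \mbox{(logistic)}, \]
\[ N_{t+1} = \frac{\lambda N_t}{(1 + a N_t)^{b}} \quad \mbox{(Hassell)}. \]
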
At the population genetics level, the population genetics model plays the
same role for the number of TEs in each new individual, based on the
number of TEs in the parental gametes. Population genetics models are
available with \citep{SKR05} and without
\citep{rouzic2005} a selective impact of TEs on the host.
% FIXME: dump this mess and start over
At the sequence level, the evolutionary model determines how the TE
sequences change over time after successive transposition events, and
an aging structure is kept for TEs that escape excision from the host
chromosomes.
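
As an illustration only, the simplest evolutionary model of this kind
is that of Jukes and Cantor; whether it is among the models offered by
the simulator is not stated here, and the symbols below are
assumptions of this sketch. If every site of a TE copy is substituted
at rate $\mu$ per generation, with all nucleotide changes equally
likely, the probability that a site differs from its ancestral state
after $t$ generations is
\[
p(t) = \frac{3}{4}\left(1 - e^{-4\mu t/3}\right),
\]
so that older copies are expected to have diverged further from the
element that produced them, up to a saturation value of $3/4$.
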
\subsection{The generation algorithm}
The algorithm executed at each generation models the basic
life cycle of a diploid sexual species subject to transposition
events during gametogenesis.
\begin{enumerate}
\item Host couples are chosen randomly from available mature hosts at
the beginning of each reproductive season. Males and females are
chosen with and without replacement, respectively.
\item Each adult bears new gametes after transposition and
recombination.
\item Transposition draws the number of newly recruited TE copies and
  the number of excised copies from the transposition
  model. This changes the TE content of the gametes.
\item Mutations are sampled from the evolutionary model for newly
  created TE copies. If ``cut and paste'' transposition is used,
  mutations are also sampled for the original copy; this is not done
  for ``copy and paste''.
\item Recombination provides additional shuffling of gamete contents.
\item Mutations are sampled from the evolutionary model for all
existing TE copies.
\item Each couple gives birth to a number of offspring defined by the
user as a parameter.
%% \item Each newborn individual is composed of one chromosome from each
%% of its parents, and new individuals are introduced to the population
%% pool.
\item The fitness cost from TEs is calculated for each offspring,
  and any offspring whose cost exceeds a given threshold dies before
  birth and is removed from the population.
\item The age of every surviving individual is incremented at the end
of the generation.
%% \item Newborn individuals inherit one chromosome from each parent,
%% after recombination, transposition and excision of active TEs.
\end{enumerate}
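
The copy-number bookkeeping implied by the transposition and selection
steps above can be summarized as follows. The distributions and
symbols are assumptions of this sketch, since the actual draws come
from the selected transposition model. For a gamete carrying $n$
active TE copies, the copy number after transposition is
\[ n' = n + R - D, \]
where $R$ is the number of recruited copies and $D$ is the number of
excised copies. If, for instance, $R \sim \mathrm{Binomial}(n,u)$ and
$D \sim \mathrm{Binomial}(n,v)$ for hypothetical per-copy
transposition and excision probabilities $u$ and $v$, then
\[ \mathrm{E}[n' \mid n] = n\,(1 + u - v), \]
so the expected copy number grows when $u > v$. Selection then acts by
truncation: an offspring with fitness cost $c$ survives only if
$c \le \theta$, where $\theta$ denotes the given threshold.
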
% FIXME: diagram in xfig.
%% \begin{table}[!t]
%% \processtable{This is table caption\label{Tab:01}}
%% {\begin{tabular}{llll}\toprule
%% head1 & head2 & head3 & head4\\\midrule
%% row1 & row1 & row1 & row1\\
%% row2 & row2 & row2 & row2\\
%% row3 & row3 & row3 & row3\\
%% row4 & row4 & row4 & row4\\\botrule
%% \end{tabular}}{This is a footnote}
%% \end{table}
\end{methods}
%% \begin{figure}[!tpb]%figure1
%% %\centerline{\includegraphics{fig01.eps}}
%% \caption{Caption, caption.}\label{fig:01}
%% \end{figure}
%% \begin{figure}[!tpb]%figure2
%% %\centerline{\includegraphics{fig02.eps}}
%% \caption{Caption, caption.}\label{fig:02}
%% \end{figure}
%\section{Discussion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% please remove the " % " symbol from \centerline{\includegraphics{fig01.eps}}
% as it may ignore the figures.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusion}
In this article we describe a forward-time, individual-based
computational model for the population genetics and
molecular evolution of transposable elements.
%% Similar software exist for each of the techniques we
%% combine.
Population genetics software exists for both forward-time
\citep{Car08,GR06,PK05,PMW+08,Her08} and backward-time simulations
\citep{Hud02,TI09}, although most of these programs simply track the
distribution of a set of alleles at a
given {\em locus} or {\em loci}. Similarly, there are simulators for
transposition phenomena in host populations \citep{DLC+06}, but they
assume that TEs do not change over time. %% Software that consider
%% evolutionary models \citep{HG04} are scarce, and
As far as we are aware, no simulator deals with all three
levels of biological organization concurrently while also considering
how the levels affect one another.
Additionally, at the end of each simulation an individual is
optionally sampled from the population, so that an additional level of
ecological modeling is implicitly considered. This provides the
means to take into account a sampling distribution in a statistical
ecology framework.
\section*{Acknowledgement}
\paragraph{Funding\textcolon}
This work was partially supported by a Bill \& Melinda Gates
Foundation grant (FIXME), and a PhD scholarship by CAPES (FIXME).
\bibliographystyle{natbib}
%\bibliographystyle{achemnat}
%\bibliographystyle{plainnat}
%\bibliographystyle{abbrv}
%\bibliographystyle{bioinformatics}
%
%\bibliographystyle{plain}
%
\bibliography{ffigueiredo-cjs_trepid_bibtex}
%% \begin{thebibliography}{}
%% \bibitem[Bofelli {\it et~al}., 2000]{Boffelli03} Bofelli,F., Name2, Name3 (2003) Article title, {\it Journal Name}, {\bf 199}, 133-154.
%% \bibitem[Bag {\it et~al}., 2001]{Bag01} Bag,M., Name2, Name3 (2001) Article title, {\it Journal Name}, {\bf 99}, 33-54.
%% \bibitem[Yoo \textit{et~al}., 2003]{Yoo03}
%% Yoo,M.S. \textit{et~al}. (2003) Oxidative stress regulated genes
%% in nigral dopaminergic neurnol cell: correlation with the known
%% pathology in Parkinson's disease. \textit{Brain Res. Mol. Brain
%% Res.}, \textbf{110}(Suppl. 1), 76--84.
%% \bibitem[Lehmann, 1986]{Leh86}
%% Lehmann,E.L. (1986) Chapter title. \textit{Book Title}. Vol.~1, 2nd edn. Springer-Verlag, New York.
%% \bibitem[Crenshaw and Jones, 2003]{Cre03}
%% Crenshaw, B.,III, and Jones, W.B.,Jr (2003) The future of clinical
%% cancer management: one tumor, one chip. \textit{Bioinformatics},
%% doi:10.1093/bioinformatics/btn000.
%% \bibitem[Auhtor \textit{et~al}. (2000)]{Aut00}
%% Auhtor,A.B. \textit{et~al}. (2000) Chapter title. In Smith, A.C.
%% (ed.), \textit{Book Title}, 2nd edn. Publisher, Location, Vol. 1, pp.
%% ???--???.
%% \bibitem[Bardet, 1920]{Bar20}
%% Bardet, G. (1920) Sur un syndrome d'obesite infantile avec
%% polydactylie et retinite pigmentaire (contribution a l'etude des
%% formes cliniques de l'obesite hypophysaire). PhD Thesis, name of
%% institution, Paris, France.
%% \end{thebibliography}
\end{document}