In Evolve & Resequence (E&R) studies, a population is grown for a period of time under a given condition and sampled at several time points. The frequencies of genetic variants or phenotypes are measured at each time point, and we’d like to infer the fitness associated with each specific variant (or phenotype). An example of such frequency data, sampled over 15 generations, is described here:
The size of the population is estimated to be 100,000. Therefore the parameter
N 100000 was set.
The mutation rate is estimated at 1:100,000 (or 10^-5). Therefore the parameter
mutation_rate0_1 1e-05 was set.
We expect the fitness values to lie between 0 (the minimum possible fitness value) and 2 (a highly adaptive fitness). Therefore the parameter
min_fitness_allele1 0.0 was set, to indicate a minimal expected fitness of zero, and
max_fitness_allele1 2.0 was set as well, to indicate a maximal possible fitness value of two.
The prior we chose for this analysis was smoothed_composite, a prior that is built to reflect typical fitness landscapes. Therefore the parameter
fitness_prior smoothed_composite was set.
We want the ABC framework to perform 100,000 simulations, and accept the fitness values from the best 1,000 simulations. Therefore the parameter
num_samples_from_prior 100000 was set, to indicate 100,000 simulations, and the parameter
acceptance_rate 0.01 was set, to indicate that the top 1% of simulations will be used to decide on the fitness value of this allele.
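The sampling-and-acceptance step described above can be sketched in Python. This is a minimal illustration of ABC rejection sampling, not the FITS implementation: the deterministic two-allele simulator, the uniform prior (used here in place of smoothed_composite), the absolute-difference distance, and all function names are assumptions, and the observed data are synthetic. The number of simulations is reduced to 2,000 to keep the sketch fast.

```python
import random
import statistics


def simulate_mutant_freq(fitness, mutation_rate, generations, init_freq=0.0):
    """Toy deterministic model (assumption): selection, then symmetric mutation.

    The wildtype fitness is fixed at 1; `fitness` is the mutant's fitness.
    Returns the mutant frequency at generations 0..generations.
    """
    freqs = [init_freq]
    p = init_freq
    for _ in range(generations):
        mean_fitness = (1.0 - p) + p * fitness
        p_sel = p * fitness / mean_fitness if mean_fitness > 0 else 0.0
        p = p_sel * (1.0 - mutation_rate) + (1.0 - p_sel) * mutation_rate
        freqs.append(p)
    return freqs


def abc_rejection(observed, num_samples, acceptance_rate,
                  min_fitness, max_fitness, mutation_rate, rng):
    """Draw fitness values from a uniform prior, simulate each one, and keep
    the fraction `acceptance_rate` whose trajectories are closest to `observed`."""
    scored = []
    for _ in range(num_samples):
        w = rng.uniform(min_fitness, max_fitness)
        sim = simulate_mutant_freq(w, mutation_rate, len(observed) - 1, observed[0])
        distance = sum(abs(s - o) for s, o in zip(sim, observed))
        scored.append((distance, w))
    scored.sort()  # smallest distance first
    n_accept = max(1, round(num_samples * acceptance_rate))
    return [w for _, w in scored[:n_accept]]


rng = random.Random(0)
# Synthetic "observed" data: 15 generations of a neutral mutant (fitness 1).
observed = simulate_mutant_freq(1.0, 1e-5, 15)
accepted = abc_rejection(observed, num_samples=2000, acceptance_rate=0.01,
                         min_fitness=0.0, max_fitness=2.0,
                         mutation_rate=1e-5, rng=rng)
estimate = statistics.median(accepted)
print(len(accepted), round(estimate, 3))
```

With an acceptance rate of 0.01, 2,000 simulations leave 20 accepted values; with the 100,000 simulations used in the example, the same rate leaves 1,000.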
The fitness value inferred by FITS was practically 1:
Mutation rate inference
A common problem is the inference of the rates of mutation between two (or more) alleles. FITS supports such inference by harnessing prior knowledge about the fitness of the mutant allele(s) (say, from a competition assay) and the size of the population. A case where such inference can be highly accurate is the use of frequencies from multiple positions with equal fitness. In many biological entities synonymous mutations approach neutrality and may therefore be used for mutation rate inference. In this example we’ll highlight how this can be done. We simulated 10 independent loci using a fitness value of 1, a population size of 100,000 and a mutation rate of 10^-5, and measured their frequencies for 15 generations.
The minimum and maximum mutation rates to consider should be provided in the parameters file as log10 values. For this example, we consider mutation rates between 10^-7 and 10^-3, defined both from the wildtype allele (0) to the mutant allele (1) and vice versa. To provide the minimal log mutation rate from the wildtype allele to the mutant we set
min_log_mutation_rate0_1 -7 and its reciprocal
min_log_mutation_rate1_0 -7. To provide the maximal log mutation rate from the wildtype allele to the mutant we set
max_log_mutation_rate0_1 -3 and its reciprocal
We used neutral alleles and therefore set the wildtype and mutant alleles’ fitness to be one:
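Reading the mutation rate bounds as base-10 exponents, candidate mutation rates can be drawn as in the following Python sketch. It assumes that candidates are drawn uniformly on the log scale; the text does not state how FITS samples within the bounds, and the function name is hypothetical.

```python
import random


def sample_mutation_rate(min_log, max_log, rng):
    """Draw an exponent uniformly in [min_log, max_log] and return 10**exponent.

    Assumption: uniform sampling on the log10 scale, so each order of
    magnitude between the bounds is equally likely.
    """
    exponent = rng.uniform(min_log, max_log)
    return 10.0 ** exponent


rng = random.Random(1)
# Bounds from the example: log10 rates between -7 and -3.
rates = [sample_mutation_rate(-7, -3, rng) for _ in range(5)]
for rate in rates:
    print(f"{rate:.2e}")
```

Specifying the bounds as exponents keeps a range that spans four orders of magnitude compact: -7 and -3 correspond to rates of 10^-7 and 10^-3.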
Population size inference
If the mutation rates and the fitness of the measured allele are known, then the population size parameter may be inferred. As with mutation rate inference, this can be done by using frequency data from several loci that have the same fitness value. Here we simulated 10 neutral positions for 15 generations, using a population size of 100,000 and a mutation rate of 10^-5.
Our prior knowledge suggests that the population size may be in the range between 10^4 and 10^7. We therefore set the parameter
Nlog_min 4 to indicate a minimum population size of 10^4. We also set the parameter
Nlog_max 7 to indicate a maximum population size of 10^7.
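To see why frequency data can carry information about the population size, the following Python sketch compares the candidate sizes in this range. It assumes that each generation is formed by binomial resampling of N individuals (a Wright-Fisher-style assumption that is not stated in this text); the function names and the reference frequency of 0.01 are illustrative.

```python
import math


def expected_new_mutants(pop_size, mutation_rate):
    """Expected number of newly mutated individuals per generation."""
    return pop_size * mutation_rate


def drift_sd(freq, pop_size):
    """Standard deviation of an allele frequency after one generation of
    binomial resampling of pop_size individuals (assumed drift model)."""
    return math.sqrt(freq * (1.0 - freq) / pop_size)


mutation_rate = 1e-5  # value used in the example
summary = {}
for log_n in range(4, 8):  # exponents between Nlog_min 4 and Nlog_max 7
    pop_size = 10 ** log_n
    summary[log_n] = (expected_new_mutants(pop_size, mutation_rate),
                      drift_sd(0.01, pop_size))
    print(log_n, summary[log_n])
```

Under this assumption, smaller populations produce fewer new mutants per generation and noisier trajectories, which is what distinguishes one candidate population size from another.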
Our prior knowledge also suggests that the alleles we measured are neutral, and therefore we set the wildtype and mutant alleles to have a fitness of 1:
Sometimes, we wish to have frequency data generated. Since simulations are a cornerstone of FITS, it is possible to ask the framework to simulate frequencies for given mutation rates, population size and fitness value.
To do so, we need to provide these three parameters as described in previous examples. For this example, we’ll use a mutation rate of 10^-3, a fitness of 1.02 and a population size of 10^5.
We wish to simulate two alleles only. We therefore set
num_alleles 2 to indicate two alleles.
We set for the two alleles the fitness values of 1 for the wildtype and 1.02 for the mutant:
We set the corresponding (equal) mutation rates:
The population size is set by defining
The last two things to consider are the frequencies of the alleles at the beginning of the simulation, and the number of generations to simulate. Here we will assume that the wildtype allele is fixed at the beginning of the simulation. We’ll therefore set
init_freq_allele1 0. To control the number of generations (100 in our example) we set
There’s no need to load a Data file in order to perform the simulations.
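For intuition about what such a simulation produces, here is a minimal Python sketch of a two-allele trajectory with the values from this example. It is not the FITS simulator: the order of steps (selection, mutation, then drift), the Gaussian approximation to binomial drift, and the function name are all assumptions made for illustration.

```python
import math
import random


def simulate_two_alleles(pop_size, mutation_rate, mutant_fitness,
                         generations, init_mutant_freq, rng):
    """Return the mutant allele frequency at generations 0..generations.

    Assumed model: wildtype fitness 1, symmetric mutation, and drift
    approximated by a Gaussian with the binomial mean and variance.
    """
    freqs = [init_mutant_freq]
    p = init_mutant_freq
    for _ in range(generations):
        # Selection: reweight the mutant by its relative fitness.
        mean_fitness = (1.0 - p) + p * mutant_fitness
        p_sel = p * mutant_fitness / mean_fitness if mean_fitness > 0 else 0.0
        # Mutation in both directions at the same rate.
        p_exp = p_sel * (1.0 - mutation_rate) + (1.0 - p_sel) * mutation_rate
        # Drift: approximate Binomial(pop_size, p_exp) by a Gaussian draw.
        sd = math.sqrt(pop_size * p_exp * (1.0 - p_exp))
        count = round(rng.gauss(pop_size * p_exp, sd))
        count = min(pop_size, max(0, count))
        p = count / pop_size
        freqs.append(p)
    return freqs


rng = random.Random(42)
freqs = simulate_two_alleles(pop_size=10**5, mutation_rate=1e-3,
                             mutant_fitness=1.02, generations=100,
                             init_mutant_freq=0.0, rng=rng)
print(freqs[0], freqs[1], freqs[-1])
```

Starting from a mutant frequency of 0, the mutant is introduced by mutation alone and then rises slowly because of its 2% fitness advantage.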
Considering Sample Effect
Population bottlenecks are common in any evolutionary scenario, whether in vivo or in vitro in experimental populations. In particular, during serial passaging only a fraction of the progeny is carried on to the next passage. An additional layer of sampling also exists: the number of genomes that are sampled for sequencing is often much smaller than the population size itself. FITS can account for both of these types of sampling effects by using the following three parameters:
sample_size, as illustrated below:
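The two layers of sampling can be pictured with the following Python sketch. It is an illustration under assumptions and not FITS code: both the bottleneck and the sequencing sample are modelled as random draws of individuals, the name bottleneck_size and the function name are hypothetical, and the numeric values are chosen only for the example.

```python
import random


def resample_frequency(freq, n, rng):
    """Draw n individuals at random from a population in which the mutant
    has frequency `freq`, and return the mutant frequency among the draws."""
    mutants = sum(1 for _ in range(n) if rng.random() < freq)
    return mutants / n


rng = random.Random(7)
population_freq = 0.01    # true mutant frequency before passaging
bottleneck_size = 10_000  # hypothetical number of individuals carried over
sample_size = 1_000       # number of genomes sampled for sequencing

# Layer 1: only a fraction of the progeny is passaged.
passaged_freq = resample_frequency(population_freq, bottleneck_size, rng)
# Layer 2: only a sample of the passaged population is sequenced.
observed_freq = resample_frequency(passaged_freq, sample_size, rng)
print(passaged_freq, observed_freq)
```

Each layer adds its own noise, so the measured frequency can differ from the true population frequency even when no selection is acting.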