Number of rate shifts
In this vignette we go a little more in-depth and explain how the number of rate shifts ($\hat{N}$) is estimated.
Tree file
First, we load the necessary modules and read in the tree file. We assume that we know the total number of extant species, and using that we can calculate the sampling fraction.
using Pesto
using CairoMakie
phy = readtree(Pesto.path("primates.tre"))
sampling_fraction = 0.62
primates = SSEdata(phy, sampling_fraction)
Model setup
In this vignette, we pick the rate values by hand, and we don't use so many, in order to illustrate how the calculations work.
tree_length = sum(primates.branch_lengths)
λ = [0.1, 0.2, 0.3, 0.4, 0.20]
μ = [0.05, 0.15, 0.05, 0.15, 0.25]
η = 1 / tree_length
model = BhDhModel(λ, μ, η)
Number of total rate shifts
If we want to see the total number of rate shifts per branch, we can use the function birth_death_shift
for the standard inference:
rates = birth_death_shift(model, primates)
rates[1:5,:]
Row | delta_lambda | delta_mu | delta_netdiv | delta_relext | mean_lambda | mean_mu | mean_netdiv | mean_relext | node | edge | nshift | shift_bf |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 | Float64 | Float64 | |
1 | -5.91957e-6 | -5.94332e-6 | 2.3745e-8 | -1.49756e-5 | 0.19998 | 0.14998 | 0.05 | 0.749949 | 235 | 1 | 8.86353e-5 | 0.0755798 |
2 | 4.37341e-6 | -0.000114465 | 0.000118839 | -0.000485073 | 0.199954 | 0.149935 | 0.0500193 | 0.749807 | 236 | 2 | 0.00136835 | 0.0749235 |
3 | 0.00314284 | -0.00186221 | 0.00500506 | -0.0132617 | 0.20108 | 0.1492 | 0.0518801 | 0.744774 | 237 | 3 | 0.0254853 | 3.65958 |
4 | 0.000306745 | -0.00016171 | 0.000468455 | -0.0012008 | 0.203261 | 0.147923 | 0.055338 | 0.735638 | 238 | 4 | 0.00340504 | 1.22171 |
5 | 7.32749e-5 | -6.80136e-5 | 0.000141289 | -0.000385965 | 0.203463 | 0.147801 | 0.0556619 | 0.734795 | 1 | 5 | 0.00210083 | 1.00766 |
We can use Makie
to plot the number of accumulated diversification rate shifts on the branches of the phylogeny
using Makie, CairoMakie
cmap = Makie.cgrad([:black, :red])
treeplot(primates, rates, "nshift"; cmap = cmap)
Number of rate shifts
The total number of rate shifts, as shown above, is a reduction or simplification of the number of rate shifts that can be inferred by Pesto
. The number of rate shifts from state j
to state i
accumulated over the branch length (from old to young) is described by the following differential equation
\[\frac{d\hat{N}_{M,ij}}{dt} = S_{M,j}(t) \frac{-\eta}{K-1} \frac{D_{M,i}(t)}{D_{M,j}(t)} \text{ if } j \neq i\]
with initial condition $\hat{N}_{ij}(t_0) = 0$. In Pesto, we would compute this using
nshift = state_shifts(model, primates)
The object returned nshift
is a three-dimensional array. The first dimension corresponds to the branch index (what was M
). The second dimension represents the arrival state (i
), and the third dimension represents the departure state (j
). The order of the first index (M
) is the same order as the edge indices in a standard ape
-style tree in R.
If we sum over second and third dimension, we get the number of rate shifts per branch:
sum(nshift, dims = 2:3)[:,1,1]
464-element Vector{Float64}:
0.0005798600047737718
0.0013698039577871505
0.025473245263337572
0.0034049508789278124
0.0021008711095765993
0.0014620958874536666
0.0008341220494348285
0.0008341220494348285
0.0022831017166727516
0.0021950257106371387
⋮
0.000327269450228598
0.00043826506782476573
0.00043826506782476573
0.00048337752537685
0.000558719869446034
0.000558719869446034
0.0006336185423692295
0.0007282139401176412
0.0007282139401176412
If instead we sum over the first dimension, we get a breakdown over which rate transitions were more frequent:
Nmatrix = sum(nshift, dims = 1)[1,:,:]
5×5 Matrix{Float64}:
0.0 0.235917 0.00980872 0.0329929 0.000360913
0.00068558 0.0 0.00856915 0.0280922 0.00125319
0.00159569 0.5526 0.0 0.0388905 0.000743638
0.00217758 0.952673 0.0135705 0.0 0.00133833
0.000368773 0.106413 0.00695386 0.0223903 0.0
In this case, the most frequent rate shift was from state 2
to state 4
, with $\hat{N} = 0.95$ number of rate shifts. Going from state 2
to state 4
under this model means an increase of $0.4-0.2=0.2$ in speciation rate units. This can for example be visualized using a histogram:
mids, bins = makebins(Nmatrix, model, -0.35, 0.35; nbins = 7)
f = Makie.Figure(size = (500, 300))
ax = Makie.Axis(f[1,1],
ylabel = "Number of rate shifts",
xlabel = "Change in speciation rate (λi - λj)",
xticks = round.(mids; digits = 2),
xgridvisible = false,
ygridvisible = false)
Makie.barplot!(ax, mids, bins[:,1], color = :maroon)
f
Most of the rate shift events represent a shift from a smaller to a larger speciation rate (i.e. $\lambda_i - \lambda_j > 0$), however some rate shifts are in the other direction ($\lambda_i - \lambda_j < 0$). There are also a few rate shift events where the speciation rate does not change ($\lambda_i - \lambda_j = 0$). In these events, it is the extinction rate that changes, and not the speciation rate. If we are interested in the question, "what is the overall magnitude of shifts?", we can calculate the mean shift magnitude (weighted by their frequencies):
\[\frac{1}{\sum_{i,j}\hat{N}_{ij}} \sum_{i,j} \hat{N}_{ij} (\lambda_i - \lambda_j).\]
In Julia we can calculate it like so
shifts_weighted = Nmatrix .* (λ .- λ')
mean_magnitude = sum(shifts_weighted) / sum(Nmatrix)
0.09792144275319149
meaning that the overall shift magnitude for the primates tree under this model was an increase of 0.098 speciation rate units.