Number of rate shifts

In this vignette we go a little more in-depth and explain how the number of rate shifts ($\hat{N}$) is estimated.

Tree file

First, we load the necessary modules and read in the tree file. We assume that we know the total number of extant species, and using that we can calculate the sampling fraction.

using Pesto
using CairoMakie

phy = readtree(Pesto.path("primates.tre"))
sampling_fraction = 0.62
primates = SSEdata(phy, sampling_fraction)

Model setup

In this vignette, we pick the rate values by hand, and we don't use so many, in order to illustrate how the calculations work.

tree_length = sum(primates.branch_lengths)
λ = [0.1, 0.2, 0.3, 0.4, 0.20]
μ = [0.05, 0.15, 0.05, 0.15, 0.25]
η = 1 / tree_length
model = BhDhModel(λ, μ, η)

Number of total rate shifts

If we want to see the total number of rate shifts per branch, we can use the function birth_death_shift for the standard inference:

rates = birth_death_shift(model, primates)
rates[1:5,:]
5×12 DataFrame
Rowdelta_lambdadelta_mudelta_netdivdelta_relextmean_lambdamean_mumean_netdivmean_relextnodeedgenshiftshift_bf
Float64Float64Float64Float64Float64Float64Float64Float64Int64Int64Float64Float64
1-5.91957e-6-5.94332e-62.3745e-8-1.49756e-50.199980.149980.050.74994923518.86353e-50.0755798
24.37341e-6-0.0001144650.000118839-0.0004850730.1999540.1499350.05001930.74980723620.001368350.0749235
30.00314284-0.001862210.00500506-0.01326170.201080.14920.05188010.74477423730.02548533.65958
40.000306745-0.000161710.000468455-0.00120080.2032610.1479230.0553380.73563823840.003405041.22171
57.32749e-5-6.80136e-50.000141289-0.0003859650.2034630.1478010.05566190.734795150.002100831.00766

We can use Makie to plot the number of accumulated diversification rate shifts on the branches of the phylogeny

using Makie, CairoMakie

cmap = Makie.cgrad([:black, :red])
treeplot(primates, rates, "nshift"; cmap = cmap)
Example block output

Number of rate shifts

The total number of rate shifts, as shown above, is a reduction or simplification of the number of rate shifts that can be inferred by Pesto. The number of rate shifts from state j to state i accumulated over the branch length (from old to young) is described by the following differential equation

\[\frac{d\hat{N}_{M,ij}}{dt} = S_{M,j}(t) \frac{-\eta}{K-1} \frac{D_{M,i}(t)}{D_{M,j}(t)} \text{ if } j \neq i\]

with initial condition $\hat{N}_{ij}(t_0) = 0$. In Pesto, we would compute this using

nshift = state_shifts(model, primates)

The object returned nshift is a three-dimensional array. The first dimension corresponds to the branch index (what was M). The second dimension represents the arrival state (i), and the third dimension represents the departure state (j). The order of the first index (M) is the same order as the edge indices in a standard ape-style tree in R.

If we sum over second and third dimension, we get the number of rate shifts per branch:

sum(nshift, dims = 2:3)[:,1,1]
464-element Vector{Float64}:
 0.0005798600047737718
 0.0013698039577871505
 0.025473245263337572
 0.0034049508789278124
 0.0021008711095765993
 0.0014620958874536666
 0.0008341220494348285
 0.0008341220494348285
 0.0022831017166727516
 0.0021950257106371387
 ⋮
 0.000327269450228598
 0.00043826506782476573
 0.00043826506782476573
 0.00048337752537685
 0.000558719869446034
 0.000558719869446034
 0.0006336185423692295
 0.0007282139401176412
 0.0007282139401176412

If instead we sum over the first dimension, we get a breakdown over which rate transitions were more frequent:

Nmatrix = sum(nshift, dims = 1)[1,:,:]
5×5 Matrix{Float64}:
 0.0          0.235917  0.00980872  0.0329929  0.000360913
 0.00068558   0.0       0.00856915  0.0280922  0.00125319
 0.00159569   0.5526    0.0         0.0388905  0.000743638
 0.00217758   0.952673  0.0135705   0.0        0.00133833
 0.000368773  0.106413  0.00695386  0.0223903  0.0

In this case, the most frequent rate shift was from state 2 to state 4, with $\hat{N} = 0.95$ number of rate shifts. Going from state 2 to state 4 under this model means an increase of $0.4-0.2=0.2$ in speciation rate units. This can for example be visualized using a histogram:

mids, bins = makebins(Nmatrix, model, -0.35, 0.35; nbins = 7)

f = Makie.Figure(size = (500, 300))
ax = Makie.Axis(f[1,1],
        ylabel = "Number of rate shifts",
        xlabel = "Change in speciation rate (λi - λj)",
        xticks = round.(mids; digits = 2),
        xgridvisible = false,
        ygridvisible = false)
Makie.barplot!(ax, mids, bins[:,1], color = :maroon)
f
Example block output

Most of the rate shift events represent a shift from a smaller to a larger speciation rate (i.e. $\lambda_i - \lambda_j > 0$), however some rate shifts are in the other direction ($\lambda_i - \lambda_j < 0$). There are also a few rate shift events where the speciation rate does not change ($\lambda_i - \lambda_j = 0$). In these events, it is the extinction rate that changes, and not the speciation rate. If we are interested in the question, "what is the overall magnitude of shifts?", we can calculate the mean shift magnitude (weighted by their frequencies):

\[\frac{1}{\sum_{i,j}\hat{N}_{ij}} \sum_{i,j} \hat{N}_{ij} (\lambda_i - \lambda_j).\]

In Julia we can calculate it like so

shifts_weighted = Nmatrix .* (λ .- λ')
mean_magnitude = sum(shifts_weighted) / sum(Nmatrix)
0.09792144275319149

meaning that the overall shift magnitude for the primates tree under this model was an increase of 0.098 speciation rate units.