Two-step rate analysis

In this vignette, we will set up the analysis that is shown in the (Pesto manuscript, submitted), and we will explain how the model is set up in more detail.

Tree file

First, we load the necessary modules and read in the tree file. We assume that we know the total number of extant species, and using that we can calculate the sampling fraction.

using Pesto

phy = readtree(Pesto.path("primates.tre"))
sampling_fraction = 0.62
primates = SSEdata(phy, sampling_fraction)

Model choice

Next, we set up the SSE model, including its dimensionality and hyperparameters. For this model, we will draw the speciation rate (λ) and extinction rate (µ) from LogNormal distributions. We pick the median of the LogNormal distributions such that they correspond to the maximum-likelihood estimates of the constant-rate birth-death model. This is also called the "empirical Bayes" approach, where we use the data twice. We pick the log-sd as H = 0.587, which corresponds to a LogNormal distribution whose 2.5%-97.5% quantile spans one order of magnitude.

λml, μml = estimate_constant_bdp(primates)

H = 0.587
n = 6

using Distributions
dλ = LogNormal(log(λml), H)
dμ = LogNormal(log(µml), H)

λquantiles = make_quantiles(dλ, n)
µquantiles = make_quantiles(dμ, n)
λ, μ = allpairwise(λquantiles, µquantiles)

The scatter plot of λ on the x-axis, and µ on the y-axis looks like the figure below (blue dots), with the quantiles of the LogNormal distributions on the margin.

primatestree

Next, we estimate the rate shift parameter η under the SSE model, conditional on λ and µ.

η = optimize_eta(λ, µ, primates)
0.0033108944752150236

The units of $\eta$ are number of rate shift events per lineage per time. The product of the tree length (the sum of all branch lengths) times $\eta$ will give us the number of expected rate shifts under the prior:

sum(primates.branch_lengths) * η
5.788207554752277

This allows us to set up the SSE model object:

model = BhDhModel(λ, μ, η)

With the model and data objects we can for example calculate the log likelihood

logL_root(model, primates)
-687.9243636513248

Branch rates and shifts

Or we can compute both the postorder and preorder pass, and get the expected speciation and extinction rates per branch. The result is a data frame object, and we print the first five rows:

rates = birth_death_shift(model, primates)
rates[1:5,:]
5×12 DataFrame
Rowdelta_lambdadelta_mudelta_netdivdelta_relextmean_lambdamean_mumean_netdivmean_relextnodeedgenshiftshift_bf
Float64Float64Float64Float64Float64Float64Float64Float64Int64Int64Float64Float64
1-0.000160736-0.0002566899.59532e-5-0.0006076970.2157940.1326330.08316120.60616523510.005924871.20347
20.0007276470.0002575150.000470132-0.0008477520.2160040.1326690.08333480.60578123620.02138820.335757
30.003369610.0003332060.00303641-0.005632540.2177340.1317470.08598710.59735123730.02938521.10842
40.001550580.0006482740.0009023116.25767e-50.220440.13220.08823950.59390423840.01067390.736863
50.001023530.0009817694.17627e-50.003207120.2217750.1329820.08879320.595314150.009741830.841873

Tree plots

As before, we can use Makie to make some quick tree plots. Here we are plotting the average net-diversification rate per branch, with a two-color scheme going from black to green.

using Makie, CairoMakie

cmap = Makie.cgrad([:black, :green])
treeplot(primates, rates, "mean_netdiv"; cmap = cmap)
Example block output