# Detecting rogue taxa in Bayesian posterior tree sets

#### 2022-01-13

Detecting “rogue” taxa and removing them from summary trees can produce consensus trees with a higher resolution, and can reveal strong support for groupings that would otherwise be masked by the uncertain position of rogues.
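
To make this concrete, here is a minimal sketch using toy data rather than real results. It assumes a hypothetical eight-taxon backbone tree to which a tip called "ROGUE" is attached at every possible position, and it uses a strict consensus (p = 1) to make the effect as stark as possible. The tree shape and the tip name are purely illustrative.

```r
library("TreeTools")

# Hypothetical toy data: one fixed backbone, with an extra tip attached
# to each of its edges in turn
backbone <- BalancedTree(8)
toyTrees <- structure(AddTipEverywhere(backbone, "ROGUE"),
                      class = "multiPhylo")

# Strict consensus with and without the wandering tip
plenary <- Consensus(toyTrees, p = 1)
reduced <- ConsensusWithout(toyTrees, "ROGUE", p = 1)

# Count the groupings (non-trivial splits) resolved in each summary tree
NSplits(plenary)
NSplits(reduced)
```

Because the rogue falls on a different side of every grouping in at least one tree, no grouping should survive in the strict consensus of all taxa, whereas the trees agree on the backbone once the rogue is left out.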

The raw output of Bayesian analysis requires a little processing before rogue taxa can be identified and explored using the “Rogue” R package.

The workflow presented here should be reasonably easy to adapt for the output of any Bayesian phylogenetic analysis, but if you hit snags or get stuck please let me know by filing a GitHub issue or by e-mail.

## Set up

library("TreeTools") # Read and plot trees
library("Rogue") # Find rogue taxa

We’ll work with some example data generated from a morphological analysis of early brachiopods (Sun et al., 2018) using MrBayes (Huelsenbeck & Ronquist, 2001). Our data files are stored on GitHub. Let’s load the results of run 1:

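if (online) { # `online` is assumed to be TRUE when the data can be downloaded
  dataFolder <- "https://raw.githubusercontent.com/ms609/hyoliths/master/MrBayes/"
  run1.t <- paste0(dataFolder, "hyo.nex.run1.t")
  # Reading 10k trees takes a second or two...
  run1Trees <- ape::read.nexus(run1.t)
  if (packageVersion('ape') <= "5.6.1") {
    # Workaround for a bug in ape, hopefully fixed in v5.6.2
    # (assumed completion: copy the shared tip labels onto each tree)
    run1Trees <- structure(lapply(run1Trees, function (tr) {
      tr$tip.label <- attr(run1Trees, 'TipLabel')
      tr
    }), class = 'multiPhylo')
  }
}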
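
Trees sampled early in an MCMC run typically pre-date convergence, so processing raw Bayesian output usually involves discarding an initial burn-in fraction, and often thinning what remains. The sketch below illustrates the idea on random stand-in trees. The 25% burn-in, the thinning interval and the variable names are assumptions for illustration, not values taken from this analysis.

```r
library("ape")

# Hypothetical stand-in for the trees read from a single run
set.seed(1)
runTrees <- rmtree(1000, 10)

burninFrac <- 0.25 # Assumed proportion of samples to discard as burn-in
thinning <- 10     # Assumed: retain every tenth tree thereafter

# Indices of the trees to keep: those after burn-in, thinned
keep <- seq(from = floor(burninFrac * length(runTrees)) + 1,
            to = length(runTrees),
            by = thinning)
keptTrees <- runTrees[keep]
length(keptTrees)
```

Thinning is optional, but it reduces the number of trees that later steps must process.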

## Visualize results

Let’s see how the rogue taxa influence the majority rule consensus of our results. Removing rogues may reveal information by producing reduced consensus trees with a higher resolution, or with higher split support values.
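
The plotting code in this section expects two objects: a set of trees (called trees) and a character vector naming the rogues (called rogueTaxa). As a rough sketch of how these might be obtained, the example below assumes the QuickRogue() function from the Rogue package, applied to a toy tree set that stands in for a real posterior sample. The toy data and the tip name "ROGUE" are hypothetical.

```r
library("TreeTools")
library("Rogue")

# Hypothetical toy data standing in for a processed posterior sample
trees <- structure(AddTipEverywhere(BalancedTree(8), "ROGUE"),
                   class = "multiPhylo")

# Heuristic search for taxa whose removal improves the consensus
rogues <- QuickRogue(trees)
print(rogues)

# Assumption: the first row describes the consensus of all taxa,
# so the taxa proposed for removal start at the second row
rogueTaxa <- rogues$taxon[-1]
```

With a real analysis, trees would instead hold the sampled trees that remain after any burn-in has been discarded.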

par(mar = rep(0, 4)) # Remove plot margin
par(mfrow = c(1, 2)) # Multi-panel plot
par(cex = 0.85) # Smaller labels

plenary <- Consensus(trees, p = 0.5)
reduced <- ConsensusWithout(trees, rogueTaxa, p = 0.5)

plot(plenary,
     tip.color = ifelse(plenary$tip.label %in% rogueTaxa, 2, 1))
LabelSplits(plenary, SplitFrequency(plenary, trees))
plot(reduced)
LabelSplits(reduced, SplitFrequency(reduced, trees))

We can also visualize the locations where our rogue taxa would plot on the reduced consensus tree: the rogue occurs more frequently at the brighter locations.

par(mar = rep(0, 4), cex = 0.8)
whichTaxon <- length(rogueTaxa) # Select an illuminating taxon
positions <- RoguePlot(trees, rogueTaxa[whichTaxon], p = 0.5)
# Plot a legend for the edge colours
SpectrumLegend(spectrum = colorRampPalette(c(par("fg"), "#009E73"), space = "Lab")(100),
               labels = paste(range(positions$onEdge, positions$atNode), 'trees'))
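
The counts behind the edge colours can also be inspected directly. This is a minimal sketch on hypothetical toy data, assuming (as the legend code does) that the value returned by RoguePlot() contains onEdge and atNode elements giving the number of trees in which the rogue attaches at each position on the reduced consensus.

```r
library("TreeTools")

# Hypothetical toy data: a tip attached to every edge of a fixed backbone
toyTrees <- structure(AddTipEverywhere(BalancedTree(8), "ROGUE"),
                      class = "multiPhylo")

# Draw the plot and keep the returned position counts
toyPositions <- RoguePlot(toyTrees, "ROGUE", p = 0.5)

# Number of trees in which the rogue attaches to each edge, or at each node,
# of the reduced consensus tree
toyPositions$onEdge
toyPositions$atNode

# Smallest and largest counts, as used for the legend labels
range(toyPositions$onEdge, toyPositions$atNode)
```

A rogue whose counts are spread thinly over many edges has a very uncertain position, whereas one concentrated on a few neighbouring edges is better constrained.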

## References

Aberer, A. J., Krompass, D., & Stamatakis, A. (2013). Pruning rogue taxa improves phylogenetic accuracy: An efficient algorithm and webservice. Systematic Biology, 62(1), 162–166. doi:10.1093/sysbio/sys078
Huelsenbeck, J. P., & Ronquist, F. (2001). MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8), 754–755. doi:10.1093/bioinformatics/17.8.754
Smith, M. R. (2022). Using information theory to detect rogue taxa and improve consensus trees. Systematic Biology, syab099. doi:10.1093/sysbio/syab099
Sun, H., Smith, M. R., Zeng, H., Zhao, F., Li, G., & Zhu, M. (2018). Hyoliths with pedicles illuminate the origin of the brachiopod body plan. Proceedings of the Royal Society B: Biological Sciences, 285(1887), 20181780. doi:10.1098/rspb.2018.1780