This package is built with a clear set of design principles based on current best practices. These design principles have been established before writing any code to make sure obstacles along the way would not lower our expectations.
fundiversity aims to provide a reliable tool to compute functional diversity indices. As some of the most used functional diversity indices are defined in Villéger et al. (2008), fundiversity adopts this framework. Rao’s Quadratic Entropy is quite popular as an additional functional diversity index, which makes it a good addition to the panel of indices computable with fundiversity.
We aim at having as few as dependencies as possible, unless they remain relatively lightweight and provide a large speed boost.
As CRAN packages are now automatically archived at the same time of any of their strong dependencies, dependencies for fundiversity should be well established, have a good track record at remaining on CRAN, and ideally already have a large number of reverse-dependencies.
Based on these guidelines, some acceptable dependencies are:
Additionally, special care is taken with packages that rely on external libraries, as their installation might be an issue on shared computing platforms where users don’t have super-user privileges.
fundiversity does however depend on vegan which brings quite a number of other dependencies. vegan is still heavy developed and as such shouldn’t be archived by CRAN without notice. fundiversity users probably already have vegan installed as they are probably also interested in other community ecology analyses provided by vegan. Thus depending on vegan is not a dependency nor an installation burden.
Putting each index in its own function improves maintainability by having shorter, more easily readable functions, with less control-flow.
Additionally, it speeds up computation (versus the case where all indices would be computed each time) and keep the number of columns in the output constant (as opposed to the case where an argument would control which index is returned by a single function with all indices).
Some packages in functional ecology transform the data before processing it. One common such transformation is the use of dimensionality reduction techniques.
Such transformations should only be done by the user if they wish to do so but never by the package itself. Even when documented in the function help, it is easily overlooked by users and may lead to misinterpreted results.
One acceptable exception is when function need dissimilarity matrix as input, such as for the computation of FEve or Rao’s Quadratic Entropy. Because other functions in fundiversity accept raw trait data, for the sake of coherence the functions that need dissimilarity matrix as input should also accept raw trait data and compute dissimilarity internally. To do so, they should still make minimal assumptions regarding the input dataset that all traits are quantitative. If it’s not the case, then it’s the user’s responsibility to choose the adapted dissimilarity metric she wants to use.
Two inputs are acceptable in fundiversity functions:
matrix, or a
tibble, but with a preference for
data.frames) with individuals as rows and characteristics as columns.
The rationale is that providing
characteristics is more user-friendly, as they are a common format in
functional ecology, and
data.frames are a familiar object
However, only allowing
data.frames doesn’t provide
enough flexibility. In particular, advanced users may want to compute
distances with a custom function (instead of Euclidean distances).
The functions should return
possible for several reasons:
data.frames are one of the most common object in the R ecosystem meaning they are familiar to beginners and there exist many S3 methods for them.
data.frames enable a pipe-able workflow, which is important it is already prevalent in the tidyverse and will be part of base R 4.1
To avoid further steps of data wrangling after running analyses, all functions should output similar structured data. When computing functional diversity indices most users want to compute site-level metric. To make our functions easy to use and flexible, they should only output the absolutely necessary data:
sitecontaining the site names as given by the user (guessed from the row names in the input site-species matrix),
FRic, etc.) that contains the values of the indices.
That way our functions only outputs unambiguous data that can easily be reused and merged with other data at the site-level.