The purpose of **dvir** is to implement state-of-the-art
algorithms for DNA-based disaster victim identification (DVI). In
particular, **dvir** performs *joint* identification
of multiple victims.

The methodology and algorithms of **dvir** are described
in Vigeland
& Egeland (2021): DNA-based Disaster Victim Identification.

The **dvir** package is part of the **ped
suite**, a collection of R packages for relatedness pedigree
analysis. Much of the machinery behind **dvir** is imported
from other **ped suite** packages, especially pedtools for handling
pedigrees and marker data, and forrel for the calculation
of likelihood ratios. A comprehensive presentation of these packages,
and much more, can be found in the recently published book Pedigree
Analysis in R.

To get the current official version of **dvir**, install
from CRAN as follows:

`install.packages("dvir")`

Alternatively, the latest development version may be obtained from GitHub:

```
# First install devtools if needed
if(!require(devtools)) install.packages("devtools")
# Install dvir from GitHub
::install_github("thoree/dvir") devtools
```

In the following we will use a toy DVI example from the paper
(see above) to illustrate how to use **dvir**.

To get started, we load the **dvir** package.

```
library(dvir)
#> Loading required package: pedtools
```

We consider the DVI problem shown below, in which three victim samples (V1, V2, V3) are to be matched against three missing persons (M1, M2, M3) belonging to two different families.

```
#> The argument `shaded` has been renamed to `hatched`; please use this instead.
#> The argument `shaded` has been renamed to `hatched`; please use this instead.
```

The hatched symbols indicate genotyped individuals. In this simple example we consider only a single marker, with 10 equifrequent alleles denoted 1, 2,…, 10. The available genotypes are shown in the figure.

DNA profiles from victims are generally referred to as *post
mortem* (PM) data, while the *ante mortem* (AM) data contains
profiles from the reference individuals R1 and R2.

A possible solution to the DVI problem is called an
*assignment*. In our toy example, there are *a priori* 14
possible assignments, which can be listed as follows:

```
#> V1 V2 V3
#> 1 M1 M2 M3
#> 2 M1 M2 *
#> 3 * M2 M3
#> 4 M1 * M3
#> 5 * M1 M3
#> 6 * M2 *
#> 7 * * M3
#> 8 M1 * *
#> 9 * M1 *
#> 10 * * *
#> 11 M2 M1 M3
#> 12 M2 M1 *
#> 13 M2 * M3
#> 14 M2 * *
```

Each row indicates the missing persons corresponding to V1, V2 and V3
(in that order) with `*`

meaning *not identified*. For
example, the first line gives the assignment where
`(V1, V2, V3) = (M1, M2, M3)`

, while line 10 shows the
*null model* corresponding to none of the victims identified. For
each assignment `a`

we can calculate the likelihood, denoted
`L(a)`

. The null likelihood is denoted `L0`

.

We consider the following to be the two main goals of the DVI analysis:

- Rank the assignments according to how likely they are. We measure
this by calculating the LR comparing each assignment
`a`

to the null model:`LR = L(a)/L0`

. - Find the
*posterior pairing probabilities*`P(Vi = Mj | data)`

for all combinations of`i,j = 1,2,3`

, and the*posterior non-pairing probabilities*`P(Vi = '*' | data)`

for all`i = 1, 2, 3`

.

The genotypes for this toy example are available within
**dvir** as a built-in dataset, under the name
`example2`

. This has the structure of a list, with elements
`pm`

(the victim data), `am`

(the reference data)
and `missing`

(a vector naming the missing persons). For easy
reference we store them in separate variables.

```
= example2$pm
pm = example2$am
am = example2$missing missing
```

We can inspect the data by printing each object. For instance,
`am`

is a list of two pedigrees:

```
am#> [[1]]
#> id fid mid sex L1
#> M1 * * 1 -/-
#> R1 * * 2 2/2
#> M2 M1 R1 1 -/-
#>
#> [[2]]
#> id fid mid sex L1
#> R2 * * 1 3/3
#> MO2 * * 2 -/-
#> M3 R2 MO2 2 -/-
```

Note that the two pedigrees are printed in so-called *ped
format*, with columns `id`

(ID label), `fid`

(father), `mid`

(mother), `sex`

(sex coded as 1 =
male; 2 = female) and `L1`

(genotypes at locus
`L1`

).

The appendix contains code for generating this dataset from scratch.

The `jointDVI()`

function performs joint identification of
all three victims, given the data. It returns a data frame ranking all
assignments with nonzero likelihood:

```
= jointDVI(pm, am, missing, verbose = FALSE)
jointRes
# Print the result
jointRes#> V1 V2 V3 loglik LR posterior
#> 1 M1 M2 M3 -16.11810 250 0.718390805
#> 2 M1 M2 * -17.72753 50 0.143678161
#> 3 * M2 M3 -18.42068 25 0.071839080
#> 4 M1 * M3 -20.03012 5 0.014367816
#> 5 * M1 M3 -20.03012 5 0.014367816
#> 6 * M2 * -20.03012 5 0.014367816
#> 7 * * M3 -20.03012 5 0.014367816
#> 8 M1 * * -21.63956 1 0.002873563
#> 9 * M1 * -21.63956 1 0.002873563
#> 10 * * * -21.63956 1 0.002873563
```

The output shows that the most likely joint solution is (V1, V2, V3) = (M1, M2, M3), with an LR of 250 compared to the null model.

Next, we compute the posterior pairing (and non-pairing)
probabilities. This is done by feeding the output from
`jointDVI()`

into the function `Bmarginal()`

.

```
Bmarginal(jointRes, missing, prior = NULL)
#> M1 M2 M3 *
#> V1 0.87931034 0.0000000 0.0000000 0.12068966
#> V2 0.01724138 0.9482759 0.0000000 0.03448276
#> V3 0.00000000 0.0000000 0.8333333 0.16666667
```

Here we used a default flat prior for simplicity, assigning equal prior probabilities to all assignments.

we see that the posterior pairing probabilities for the most likely solution are

*P*(V1 = M1 | data) = 0.88,*P*(V2 = M2 | data) = 0.95,*P*(V3 = M2 | data) = 0.83.

For completeness, here is one way of generating the
`example2`

dataset from scratch within R.

```
# Load pedtools for pedigree manipulation
library(pedtools)
# Attributes of the marker locus
= list(name = "L1",
loc alleles = 1:10,
afreq = rep(1/10, 10))
### 1. PM data
# PM as data frame
= data.frame(famid = c("V1", "V2", "V3"),
pm.df id = c("V1", "V2", "V3"),
fid = c(0, 0, 0),
mid = c(0, 0, 0),
sex = c(1, 1, 2),
L1 = c("1/1", "1/2", "3/4"))
# Convert to list of singletons
= as.ped(pm.df, locusAttributes = loc)
pm
### 2. AM data
# List of two pedigrees
= list(nuclearPed(father = "M1", mother = "R1", child = "M2"),
am nuclearPed(father = "R2", mother = "MO2", child = "M3", sex = 2))
# Attach marker and set genotypes
= setMarkers(am, locusAttributes = loc)
am genotype(am[[1]], marker = "L1", id = "R1") = "2/2"
genotype(am[[2]], marker = "L1", id = "R2") = "3/3"
### 3. Missing persons
= c("M1", "M2", "M3") missing
```

The complete dataset now looks as follows.

```
list(pm = pm, am = am, missing = missing)
#> $pm
#> $pm$V1
#> id fid mid sex L1
#> V1 * * 1 1/1
#>
#> $pm$V2
#> id fid mid sex L1
#> V2 * * 1 1/2
#>
#> $pm$V3
#> id fid mid sex L1
#> V3 * * 2 3/4
#>
#>
#> $am
#> $am[[1]]
#> id fid mid sex L1
#> M1 * * 1 -/-
#> R1 * * 2 2/2
#> M2 M1 R1 1 -/-
#>
#> $am[[2]]
#> id fid mid sex L1
#> R2 * * 1 3/3
#> MO2 * * 2 -/-
#> M3 R2 MO2 2 -/-
#>
#>
#> $missing
#> [1] "M1" "M2" "M3"
```