symDMatrix is an R package that provides symmetric matrices partitioned into file-backed blocks.

A symmetric matrix `G`

is partitioned into blocks as
follows:

```
+ --- + --- + --- +
| G11 | G12 | G13 |
+ --- + --- + --- +
| G21 | G22 | G23 |
+ --- + --- + --- +
| G31 | G32 | G33 |
+ --- + --- + --- +
```

Because the matrix is assumed to be symmetric (i.e., `Gij`

equals `Gji`

), only the diagonal and upper-triangular blocks
are stored and the other blocks are virtual transposes of the
corresponding diagonal blocks. Each block is a file-backed matrix of
type `ff_matrix`

of the ff package.

The package defines the class and multiple methods that allow treating this file-backed matrix as a standard RAM matrix.

Before we start, let’s create a symmetric matrix in RAM.

```
library(BGLR)
# Load genotypes from a mice data set
data(mice)
<- mice.X
X rownames(X) <- paste0("ID_", 1:nrow(X))
# Compute a symmetric genetic relationship matrix (G matrix) in RAM
<- tcrossprod(scale(X))
G1 <- G1 / mean(diag(G1)) G1
```

In practice, if we can hold a matrix in RAM, there is not much of a
point to convert it to a `symDMatrix`

object; however, this
will help us to get started.

```
library(symDMatrix)
<- as.symDMatrix(G1, blockSize = 400, vmode = "double", folderOut = "mice") G2
```

Now that we have a `symDMatrix`

object, let’s illustrate
some operators.

```
# Basic operators applied to a matrix in RAM and to a symDMatrix
# Dimension operators
all.equal(dim(G1), dim(G2))
nrow(G1) == nrow(G2)
ncol(G1) == ncol(G2)
all.equal(diag(G1), diag(G2))
# Names operators
all.equal(dimnames(G1), dimnames(G2))
all(rownames(G1) == rownames(G2))
all(colnames(G1) == rownames(G2))
# Block operators
nBlocks(G2)
blockSize(G2)
# Indexing (can use booleans, integers or labels)
1:2, 1:2]
G2[c("ID_1", "ID_2"), c("ID_1", "ID_2")]
G2[<- c(TRUE, TRUE, rep(FALSE, nrow(G2) - 2))
tmp
G2[tmp, tmp]head(G2[tmp, ])
# Exhaustive check of indexing
for (i in 1:100) {
<- sample(1:50, size = 1)
n1 <- sample(1:50, size = 1)
n2 <- sample(1:nrow(X), size = n1)
i1 <- sample(1:nrow(X), size = n2)
i2 <- G1[i1, i2, drop = FALSE]
TMP1 <- G2[i1, i2, drop = FALSE]
TMP2 stopifnot(all.equal(TMP1, TMP2))
}
```

The function `getG_symDMatrix`

of the BGData package
computes G=XX’ (with options for centering and scaling) without ever
loading G in RAM. It creates the `symDMatrix`

object
directly, block by block. In this example, `X`

is a matrix in
RAM. For large genotype data sets, `X`

could be a file-backed
matrix, e.g., a `BEDMatrix`

or `ff`

object.

```
library(BGData)
<- getG_symDMatrix(X, blockSize = 400, vmode = "double", folderOut = "mice2")
G3 class(G3)
all.equal(diag(G1), diag(G3))
for(i in 1:10){
<- sample(1:25, size = 1)
n1 <- sample(1:25, size = n1)
i1 for(j in 1:10){
<- sample(1:nrow(G1), size = 1)
n2 <- sample(1:nrow(G1), size = n2)
i2 <- G1[i1, i2]
tmp1 <- G3[i1, i2]
tmp2 stopifnot(all.equal(tmp1, tmp2))
} }
```

`ff`

files containing the
blocksThe function `symDMatrix`

allows creating a
`symDMatrix`

object from a list of `.RData`

files
containing `ff_matrix`

objects. The list is assumed to
provide, in order, files for
`G11, G12, ..., G1q, G22, G23, ..., G2q, ..., Gqq`

. This
approach is useful for very large G matrices. If `n`

is large
it may make sense to compute the blocks of the `symDMatrix`

object in parallel jobs (e.g., in an HPC). The function
`getG`

of the BGData package is
similar to `getG_symDMatrix`

but accepts arguments
`i1`

and `i2`

which define a block of G (i.e.,
rows of `X`

).

```
library(BGLR)
library(BGData)
library(ff)
# Load genotypes from a wheat data set
data(wheat)
<- wheat.X
X rownames(X) <- paste0("ID_", 1:nrow(X))
# Compute G matrix in RAM
<- colMeans(X)
centers <- apply(X = X, MARGIN = 2, FUN = sd)
scales <- tcrossprod(scale(X, center = centers, scale = scales))
G1 <- G1 / mean(diag(G1))
G1
# Compute G matrix block by block (each block computation can be distributed)
<- 3
nBlocks <- ceiling(nrow(X) / nBlocks)
blockSize <- 1:nrow(X)
i <- split(i, ceiling(i / blockSize))
blockIndices for (r in 1:nBlocks) {
for (s in r:nBlocks) {
<- paste0("wheat_", r, "_", s - r + 1)
blockName <- getG(X, center = centers, scale = scales, scaleG = TRUE,
block i = blockIndices[[r]], i2 = blockIndices[[s]])
<- ff::as.ff(block, filename = paste0(blockName, ".bin"), vmode = "double")
block save(block, file = paste0(blockName, ".RData"))
}
}<- as.symDMatrix(list.files(pattern = "^wheat.*RData$"))
G2 attr(G2, "centers") <- centers
attr(G2, "scales") <- scales
all.equal(diag(G1), diag(G2)) # there will be a slight numerical penalty
```

Install the stable version from CRAN:

`install.packages("symDMatrix")`

Alternatively, install the development version from GitHub:

```
# install.packages("remotes")
::install_github("QuantGen/symDMatrix") remotes
```

Further documentation can be found on RDocumentation.

- Issue Tracker: https://github.com/QuantGen/symDMatrix/issues
- Source Code: https://github.com/QuantGen/symDMatrix