The goal of measr is to make it easy to estimate and evaluate diagnostic classification models (DCMs). DCMs are primarily useful for assessment or survey data where responses are recorded dichotomously (e.g., right/wrong, yes/no) or polytomously (e.g., strongly agree, agree, disagree, strongly disagree). When using DCMs, the measured skills, or attributes, are categorical. Thus, these models are particularly useful when you are measuring multiple attributes that are best represented as discrete states (e.g., proficient or not proficient) rather than as points on a continuous scale. For example, an educational assessment may be designed to report whether students are proficient on a set of academic standards. Similarly, we might explore the presence or absence of attributes before and after an intervention.
There are two main classes of functions we need to get started. Estimation functions build the DCM with the Stan probabilistic programming language and produce estimates of respondent proficiency. Evaluation functions can then be applied to the fitted model to assess how well those estimates represent the observed data.
To illustrate, we’ll fit a loglinear cognitive diagnostic model (LCDM) to an assessment of English language proficiency (see Templin & Hoffman, 2013). There are many different subtypes of DCMs that make different assumptions about how the attributes relate to each other. The LCDM is a general model that makes very few assumptions about the compensatory nature of the relationships between attributes. For details on the LCDM, see Henson & Templin (2019).
The data set we’re using contains 28 items that together measure three attributes: morphosyntactic rules, cohesive rules, and lexical rules. The Q-matrix defines which attributes are measured by each item. For example, item E1 measures morphosyntactic and cohesive rules. The data are further described in ?ecpe.
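The examples below assume measr has been loaded, which should make the ecpe_data and ecpe_qmatrix objects used throughout this article available.

# Load measr (also provides the ECPE data and Q-matrix used below)
library(measr)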
ecpe_data
#> # A tibble: 2,922 × 29
#> resp_id E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11
#> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 1 1 1 0 1 1 1 1 1 1 1
#> 2 2 1 1 1 1 1 1 1 1 1 1 1
#> 3 3 1 1 1 1 1 1 0 1 1 1 1
#> 4 4 1 1 1 1 1 1 1 1 1 1 1
#> 5 5 1 1 1 1 1 1 1 1 1 1 1
#> 6 6 1 1 1 1 1 1 1 1 1 1 1
#> 7 7 1 1 1 1 1 1 1 1 1 1 1
#> 8 8 0 1 1 1 1 1 0 1 1 1 0
#> 9 9 1 1 1 1 1 1 1 1 1 1 1
#> 10 10 1 1 1 1 0 0 1 1 1 1 1
#> # ℹ 2,912 more rows
#> # ℹ 17 more variables: E12 <int>, E13 <int>, E14 <int>, E15 <int>, E16 <int>,
#> # E17 <int>, E18 <int>, E19 <int>, E20 <int>, E21 <int>, E22 <int>,
#> # E23 <int>, E24 <int>, E25 <int>, E26 <int>, E27 <int>, E28 <int>
ecpe_qmatrix
#> # A tibble: 28 × 4
#> item_id morphosyntactic cohesive lexical
#> <chr> <int> <int> <int>
#> 1 E1 1 1 0
#> 2 E2 0 1 0
#> 3 E3 1 0 1
#> 4 E4 0 0 1
#> 5 E5 0 0 1
#> 6 E6 0 0 1
#> 7 E7 1 0 1
#> 8 E8 0 1 0
#> 9 E9 0 0 1
#> 10 E10 1 0 0
#> # ℹ 18 more rows
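As a quick sanity check on how to read the Q-matrix, we can pull out a single item’s row. This is just one way to do it: the Q-matrix is an ordinary tibble, so standard data manipulation tools apply (this sketch assumes dplyr is available).

library(dplyr)

# Item E1 measures the attributes flagged with a 1
# (here, morphosyntactic and cohesive rules)
ecpe_qmatrix %>%
  filter(item_id == "E1")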
We can estimate the LCDM using the measr_dcm() function. We specify the data set, the Q-matrix, and the column names of the respondent and item identifiers in each (if they exist). We then add two additional arguments. The method argument defines how the model should be estimated. For computational efficiency, we’ve selected "optim", which uses Stan’s optimizer to estimate the model. For a fully Bayesian estimation, you can change this to method = "mcmc". Finally, the type argument specifies the type of DCM to estimate. As previously discussed, we’re estimating an LCDM in this example. For more details and options for customizing the model specification and estimation, see the model estimation article on the measr website.
ecpe_lcdm <- measr_dcm(data = ecpe_data, qmatrix = ecpe_qmatrix,
resp_id = "resp_id", item_id = "item_id",
method = "optim", type = "lcdm")
Once the model has been estimated, we can add respondent estimates with add_respondent_estimates() and then use measr_extract() to pull out the probability that each respondent is proficient on each of the attributes. For example, the first respondent has probabilities near 1 for all attributes, indicating a high degree of confidence that they are proficient in all attributes. On the other hand, respondent 8 has relatively low probabilities for the morphosyntactic and cohesive attributes and is likely proficient only in lexical rules.
ecpe_lcdm <- add_respondent_estimates(ecpe_lcdm)
measr_extract(ecpe_lcdm, "attribute_prob")
#> # A tibble: 2,922 × 4
#> resp_id morphosyntactic cohesive lexical
#> <fct> <dbl> <dbl> <dbl>
#> 1 1 0.997 0.962 1.00
#> 2 2 0.995 0.900 1.00
#> 3 3 0.985 0.990 1.00
#> 4 4 0.998 0.991 1.00
#> 5 5 0.989 0.985 0.965
#> 6 6 0.993 0.991 1.00
#> 7 7 0.993 0.991 1.00
#> 8 8 0.00411 0.471 0.964
#> 9 9 0.949 0.986 0.999
#> 10 10 0.552 0.142 0.111
#> # ℹ 2,912 more rows
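If you need hard classifications rather than probabilities, one simple option is to dichotomize these probabilities yourself. The 0.5 cutoff below is purely illustrative, not a measr convention; choose a threshold appropriate for your context.

library(dplyr)

# Convert proficiency probabilities to 0/1 classifications
# using an illustrative 0.5 cutoff
measr_extract(ecpe_lcdm, "attribute_prob") %>%
  mutate(across(-resp_id, ~ as.integer(.x > 0.5)))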
There are many ways to evaluate our estimated model, including model fit, model comparisons, and reliability. For a complete listing of available options, see ?model_evaluation. To illustrate how these functions work, we’ll look at the classification accuracy and consistency metrics described by Johnson & Sinharay (2018). We start by adding the reliability information to our estimated model using add_reliability(). We can then extract that information, again using measr_extract(). For these indices, values close to 1 indicate a high level of classification accuracy or consistency. These numbers are not amazing, but overall they look pretty good. For guidance on cutoff values for “good,” “fair,” and other levels of reliability, see Johnson & Sinharay (2018).
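A minimal sketch of those two steps follows. The add_reliability() call comes straight from the text above; the "classification_reliability" keyword passed to measr_extract() is our assumption here, so check ?measr_extract for the exact name supported by your version of measr.

# Add reliability information to the fitted model, then extract
# the classification accuracy and consistency indices
ecpe_lcdm <- add_reliability(ecpe_lcdm)
measr_extract(ecpe_lcdm, "classification_reliability")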