For the identity link, the underlying model is \[Y = \beta_2X_2 + \beta_1X_1 + \beta_0\] Note that there is no need to rearrange for Y because the link is the identity function.
Using the inverse link function, the underlying model is \[ 1/Y = \beta_2X_2 + \beta_1X_1 + \beta_0\]
Rearranging for Y, we get \[
Y = 1 / (\beta_2X_2 + \beta_1X_1 + \beta_0)
\] The relationship between Y and X differs between the two models. This is the beauty of the GLM framework: it can handle many different relationships between Y and X.
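To make the difference concrete, here is a small base R sketch that evaluates both models on the same inputs. The coefficient values (an intercept of 0 and weights of 1 and 3) and the input grid are illustrative assumptions chosen for this example, not output from a fitted model.

```r
# Illustrative coefficients (assumed for this sketch)
beta0 <- 0
beta1 <- 1
beta2 <- 3

# Hold X1 fixed so that only X2 varies
X1 <- 1
X2 <- c(1, 1.5, 2)

# Linear predictor shared by both models
eta <- beta2 * X2 + beta1 * X1 + beta0

# Identity link: Y equals the linear predictor
mu_identity <- eta

# Inverse link: Y is the reciprocal of the linear predictor
mu_inverse <- 1 / eta

data.frame(X2, eta, mu_identity, mu_inverse)
```

Under the identity link, Y rises linearly with X2. Under the inverse link, Y falls as X2 grows, and the decline flattens out for larger values.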
First, let's generate some data.
library(GlmSimulatoR)
library(ggplot2)
simdata <- simulate_gaussian(N = 1000, weights = c(1, 3), link = "inverse",
                             unrelated = 1, ancillary = .005)
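For readers who want to see what such a simulation involves, the following is a minimal base R sketch of data generated from a Gaussian model with an inverse link. It is not the package's implementation. The uniform predictors on [1, 2], the intercept of 0, and the reading of the ancillary value as the standard deviation are all assumptions made for illustration, and `sketch_data` is a hypothetical name.

```r
set.seed(1)  # make the sketch reproducible

N <- 1000

# Assumed predictor distribution: uniform on [1, 2]
X1 <- runif(N, min = 1, max = 2)
X2 <- runif(N, min = 1, max = 2)
Unrelated1 <- runif(N, min = 1, max = 2)  # plays no part in Y

# Linear predictor with weights 1 and 3 and an assumed intercept of 0
eta <- 1 * X1 + 3 * X2

# Inverse link: the mean of Y is the reciprocal of the linear predictor
mu <- 1 / eta

# Gaussian noise; using 0.005 as the standard deviation is an assumption
Y <- rnorm(N, mean = mu, sd = 0.005)

sketch_data <- data.frame(Y, X1, X2, Unrelated1)
head(sketch_data)
```

Because the linear predictor stays well above zero here, the mean of Y is always positive and decreases as either predictor increases.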
Next, let's do some basic data exploration. The histogram shows the response is roughly Gaussian.
ggplot(simdata, aes(x=Y)) +
geom_histogram(bins = 30)
The connection between Y and X1 is not obvious. There is only a slight downward trend, and one might take X1 to be unrelated to Y.
ggplot(simdata, aes(x=X1, y=Y)) +
geom_point()
There is a clear connection between Y and X2. This is no surprise, as the true weight is three.
ggplot(simdata, aes(x=X2, y=Y)) +
geom_point()
The scatter plot of the unrelated variable against Y looks like random noise. Interestingly, the scatter plot for X1 looks more like this one than like the scatter plot for X2, even though X1 is in the true model.
ggplot(simdata, aes(x=Unrelated1, y=Y)) +
geom_point()
The correlation between Y and X2 is very strong, which is no surprise considering the above graph. The correlation between Y and X1 is somewhat larger in absolute value than that between Y and the unrelated variable. However, I would not see this as particularly good news for predicting Y if I did not know the correct model.
cor(x=simdata$X1, y = simdata$Y)
#> [1] -0.2566767
cor(x=simdata$X2, y = simdata$Y)
#> [1] -0.8667127
cor(x=simdata$Unrelated1, y = simdata$Y)
#> [1] 0.04608871
Pretending the correct model is unknown, let's try to find it. We will try three models: one with just X2, one with X1 and X2, and one with everything. Will the correct model stand out?
glmInverseX2 <- glm(Y ~ X2, data = simdata, family = gaussian(link = "inverse"))
glmInverseX1X2 <- glm(Y ~ X1 + X2, data = simdata, family = gaussian(link = "inverse"))
glmInverseX1X2U1 <- glm(Y ~ X1 + X2 + Unrelated1, data = simdata, family = gaussian(link = "inverse"))
AIC(glmInverseX2)
#> [1] -7245.43
AIC(glmInverseX1X2)
#> [1] -7647.507
AIC(glmInverseX1X2U1)
#> [1] -7645.924
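The correct model, with X1 and X2, has the lowest AIC (-7647.507). Adding the unrelated variable raises the AIC slightly to -7645.924, while dropping X1 raises it substantially to -7245.43.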
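The same comparison can be tried end to end in base R without the simulation package. The sketch below is a hedged illustration, not a reproduction of the numbers above: the data-generating step (uniform predictors on [1, 2], intercept 0, noise standard deviation 0.005) is an assumption, and `toy`, `fam`, `fits`, and `aic_values` are hypothetical names.

```r
set.seed(42)  # make the sketch reproducible

# Assumed data-generating process: Gaussian response, inverse link
N <- 1000
X1 <- runif(N, 1, 2)
X2 <- runif(N, 1, 2)
Unrelated1 <- runif(N, 1, 2)
Y <- rnorm(N, mean = 1 / (X1 + 3 * X2), sd = 0.005)
toy <- data.frame(Y, X1, X2, Unrelated1)

# Fit the three candidate models with the same family and link
fam <- gaussian(link = "inverse")
fits <- list(
  X2     = glm(Y ~ X2, data = toy, family = fam),
  X1X2   = glm(Y ~ X1 + X2, data = toy, family = fam),
  X1X2U1 = glm(Y ~ X1 + X2 + Unrelated1, data = toy, family = fam)
)

# AIC for each candidate, lowest (best) first
aic_values <- sort(sapply(fits, AIC))
aic_values

# Coefficients are on the link scale, so they estimate the weights 1 and 3
coef(fits$X1X2)
```

Omitting X1 should cost a great deal of AIC, while adding an unrelated variable typically changes it only slightly. Since each extra parameter adds 2 to the AIC, the full model can occasionally edge out the correct one by chance in a single simulated data set.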