---
title: "Modmarg Usage"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Modmarg Usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Why Margins?

For many types of regression techniques, the coefficients in the
model may not be sufficient to adequately figure out the marginal relationship
between a covariate and the outcome of a regression (or the error in your 
estimate). In the simplest case, say you run the following formula in glm: 
`wages ~ age + age^2`. 

Because the output will include coefficients for both age and age squared, 
it's not immediately apparent what the total marginal relationship is between a 
change in age and wages. Moreover, computing the error in that estimate is a 
non-trivial problem.

This non-obviousness of marginal relationships is also a problem for even 
very simple regressions with functional forms that mean that coefficients are
not in the base units of the regression. For example, the coefficients of 
logistic regression are odds ratios, so even simple regressions are not
immediately interpretable. 

This package reproduces the `margins` command from Stata, which allows for easy
and quick estimation of marginal relationships and the associated error. The
error is computed using the delta method, which we detail
in a separate [vignette](delta-method.html).

In this vignette, we detail a few possible use cases of the `modmarg` package.^[Modmarg is short for *model margins*.]

# Use Case 1: Treatment Effects and Subgroup Effects

We want to ascertain the marginal effect of `treatment` on `y` while controlling
for age. In this first example we'll use a binned age variable.

```{r}
library(modmarg)
data(margex)

g <- glm(y ~ as.factor(agegroup)*as.factor(treatment) , data = margex)
summary(g)
```

It's not at all obvious from these coefficients what the total marginal
relationship is between `treatment` and `y`.

We can get the predicted margin of `y` at the various levels of `treatment`.

```{r}
modmarg::marg(mod = g, var_interest = "treatment", type = 'levels')
```

Or the effect of a unit change in `treatment`.

```{r}
modmarg::marg(mod = g, var_interest = "treatment", type = "effects")
```

Or maybe we want to get treatment effect at several different levels of 
age. Let's re-run the regression with continuous age cubed (maybe we're looking
at severity of a disease that's most prevalent among the young and the old).

Note that you have to use `raw = T` when using `poly()`. Otherwise
`marg` will try to create multiple orthogonal vectors from a constant, 
which doesn't work so well.

```{r}
g <- glm(y ~ poly(age, 3, raw = T) * as.factor(treatment) , data = margex)
summary(g)

modmarg::marg(mod = g, var_interest = "treatment", type = "effects",
              at = list("age" = c(20, 40, 60)))
```

# Use Case 2: Logistic Regression

Let's say we want to figure out how much `treatment` increased the likelihood of
a binary `outcome`.

```{r}
g <- glm(outcome ~ as.factor(treatment), data = margex, family = binomial)
summary(g)
```

Those coefficients are odds ratios. It's really unclear what the marginal
relationship is.

```{r}
marg(mod = g, var_interest = "treatment", type = 'levels')
marg(mod = g, var_interest = "treatment", type = "effects")
```

Aha! It's an 18 percentage point increase in the likelihood of a positive outcome 
from treatment and the effect is highly statistically significant. Much more 
interpretable.

# Use Case 3: Getting Margins At Specific Values of Variable of Interest

Let's say we want to know the marginal predicted level at only a couple of 
age groups while controlling for distance. `marg` makes that simple.

Note that unlike the `at` option, which takes a named list of values, 
`at_var_interest` takes just an unnamed vector.

```{r}
g <- glm(y ~ poly(distance, 2, raw = T) * as.factor(agegroup) , data = margex)
summary(g)

unique(margex$agegroup)
marg(mod = g, var_interest = "agegroup", type = 'levels',
          at_var_interest = c("20-29"))
```

# Use Case 4: Getting Margins with Different Variance-Covariance Matrices

Normally we assume that the amount of variation in our outcome of interest 
(conditional on covariates) is constant across our sample. Sometimes, this assumption 
is violated, and we will use a different variance-covariance matrix to represent this 
heterogeneity in variance (heteroskedasticity). Creating these variance-covariance 
matrices is beyond the scope of this package. However, they can be used with `marg`
to correct standard errors and p values in predicted levels or effects.

Let's say we want to get the predicted levels of an outcome for different treatments,
but we want to cluster our standard errors by the `arm` variable. We estimate the model,
and then create the "clustered" variance-covariance matrix separately 
(see [this script](https://github.com/anniejw6/modmarg/blob/master/data-raw/make_cluster_vcov.R) 
for one method to do this). This code and example replicate the `vce(cluster arm)` option in Stata.

We can use the `vcov_mat` option to pass a custom variance-covariance matrix to modmarg.
Because this is an OLS model, the degrees of freedom for the T test must also be corrected.
Here we're using Stata's default correction of `ngroups - 1`, where `ngroups` is the number
of unique values in the clustering variable. Notice the standard errors and p values increase
substantially.

```{r}
data(cvcov)
g <- glm(outcome ~ treatment + distance, data = margex, family = 'gaussian')
summary(g)
v <- cvcov$ols$clust
print(v)
d <- cvcov$ols$stata_dof
print(d)

# Without clustering
marg(mod = g, var_interest = "treatment", type = "levels")

# With clustering
marg(mod = g, var_interest = "treatment", type = "levels",
          vcov_mat = v, dof = d)
```