For our Poisson regression example, we use the well-known roaches dataset (Gelman & Hill, 2007), which records the efficacy of a pest management system at reducing the number of roaches in urban apartments. It has 262 observations and the following variables:

  • y – number of roaches caught.

  • roach1 – pretreatment number of roaches.

  • treatment – binary indicator (0 or 1) for treatment.

  • senior – binary indicator (0 or 1) for buildings with only elderly residents.

  • exposure2 – number of days for which the roach traps were used.

using CSV
using DataFrames
url = "https://github.com/TuringLang/TuringGLM.jl/raw/main/data/roaches.csv";
roaches = CSV.read(download(url), DataFrame)
 Row   y     roach1   treatment   senior   exposure2
   1   153   308.0    1           0        0.8
   2   127   331.25   1           0        0.6
   3   7     1.67     1           0        1.0
   4   7     3.0      1           0        1.0
   5   0     2.0      1           0        1.14286
   6   0     0.0      1           0        1.0
   7   73    70.0     1           0        0.8
   8   24    64.56    1           0        1.14286
   9   2     1.0      0           0        1.0
  10   2     14.0     0           0        1.14286
 ...
 262   8     0.0      0           1        1.0
using TuringGLM

We use y as the dependent variable and roach1, treatment, and senior as independent variables:

fm = @formula(y ~ roach1 + treatment + senior)
FormulaTerm
Response:
  y(unknown)
Predictors:
  roach1(unknown)
  treatment(unknown)
  senior(unknown)

We instantiate our model with turing_model, passing the keyword argument model=Poisson to indicate that this is a Poisson regression:

model = turing_model(fm, roaches; model=Poisson);
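
For reference, the likelihood behind this call can be written out as below. This is a sketch that assumes the canonical log link for Poisson regression. The symbols α and β are notation introduced here for illustration, and the priors that TuringGLM places on them are left out.

```latex
\begin{aligned}
y_i &\sim \operatorname{Poisson}(\lambda_i) \\
\log \lambda_i &= \alpha + \beta_1\,\text{roach1}_i + \beta_2\,\text{treatment}_i + \beta_3\,\text{senior}_i
\end{aligned}
```

Because the linear predictor sits on the log scale, exponentiating a coefficient gives its multiplicative effect on the expected count λ.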

We sample from the model with the NUTS sampler, drawing 2,000 samples:

chn = sample(model, NUTS(), 2_000);
plot_chains(chn)
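
Assuming the usual log link, the coefficients of a Poisson regression are on the log scale, so a common way to read them is to exponentiate the posterior draws into rate ratios. The sketch below is not part of TuringGLM: rate_ratio_summary is a hypothetical helper, and the draws vector holds made-up placeholder numbers, not results from the roaches model.

```julia
using Statistics  # standard library: mean, quantile

# Hypothetical helper (not part of TuringGLM): summarise posterior draws of a
# log-scale coefficient as a multiplicative effect on the expected count.
function rate_ratio_summary(log_draws::AbstractVector{<:Real}; level::Real=0.9)
    ratios = exp.(log_draws)   # back-transform from the log scale
    tail = (1 - level) / 2     # probability mass left in each tail
    return (center=mean(ratios),
            lower=quantile(ratios, tail),
            upper=quantile(ratios, 1 - tail))
end

# Placeholder draws for illustration only; NOT output from the roaches model.
draws = [-0.50, -0.45, -0.55, -0.48, -0.52]
rr = rate_ratio_summary(draws)
println(rr)
```

A rate ratio below 1 means the predictor is associated with fewer expected roaches, and above 1 with more. To apply the helper to the fitted model, pass it the vector of draws for the coefficient of interest extracted from chn. The parameter names depend on how TuringGLM labels them, so check the chain summary first.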

References

Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.