A comprehensive simulation study comparing Bayesian and frequentist approaches for analyzing 2³ factorial cluster randomized trials, with applications to emergency department interventions.
This repository contains code for conducting Monte Carlo simulation studies to evaluate statistical methods for factorial cluster randomized trials. The framework is designed for healthcare settings where multiple interventions are tested simultaneously across clustered units (e.g., emergency departments).
- 2³ Factorial Design: Analyzes trials with three binary interventions (A, B, C) and all interactions
- Cluster Randomization: Accounts for clustering within healthcare sites using random effects
- Bayesian Methods: Implements hierarchical models with different prior specifications
- Frequentist Comparison: Includes hierarchical model selection via likelihood ratio tests
- Power Analysis: Determines required sample sizes based on posterior precision criteria
- Prior Sensitivity: Evaluates robustness to different prior assumptions
| File | Description |
|---|---|
01_data_generation.R |
Core functions for simulating factorial cluster randomized trial data |
02_prior_predictive.R |
Prior predictive checks to validate prior specifications |
03_bayesian_power.R |
Monte Carlo simulation for Bayesian power analysis and sample size determination |
04_comparison_study.R |
Comprehensive comparison of Bayesian and frequentist approaches |
| File | Description |
|---|---|
model_HEX.stan |
Hierarchical Exchangeable (HEx) Model: Main Bayesian model with hierarchical priors for treatment effects |
model_nonX.stan |
Non-Exchangeable Model: Alternative specification without hierarchical structure |
model_HEX_ig.stan |
HEx with Inverse Gamma Priors: Variant using inverse gamma priors for scale parameters |
prior_predictive_model.stan |
Prior Predictive Model: Generates samples from prior distributions for prior checking |
- Control: No interventions
- Single Interventions: A only, B only, C only
- Two-way Combinations: AB, AC, BC
- Three-way Combination: ABC
- Cluster Level: Random effects and group assignments (4 size categories)
- Time Structure: Multiple quarters per cluster
- Individual Level: Binary outcomes with logistic regression
- Effect Structure: Main effects, two-way interactions, three-way interaction
The Bayesian models use the following hierarchical structure:
logit(P(y_i = 1)) = α_j[i] + τ_a*A_i + τ_b*B_i + τ_c*C_i +
τ_ab*A_i*B_i + τ_ac*A_i*C_i + τ_bc*B_i*C_i + τ_abc*A_i*B_i*C_i
Where:
α_j[i]: Random effect for cluster jτ_a, τ_b, τ_c: Main effects for treatments A, B, Cτ_ab, τ_ac, τ_bc: Two-way interaction effectsτ_abc: Three-way interaction effect
# Required R packages
library(simstudy) # Data simulation
library(data.table) # Data manipulation
library(cmdstanr) # Stan interface
library(posterior) # Posterior analysis
library(glmmTMB) # Frequentist mixed models
library(ggplot2) # Visualization
library(ggpubr) # Plot arrangements
library(parallel) # Parallel computing- Clone this repository:
git clone https://github.com/yourusername/factorial-cluster-trials.git
cd factorial-cluster-trials- Install required R packages:
install.packages(c("simstudy", "data.table", "glmmTMB", "ggplot2", "ggpubr", "parallel"))- Install cmdstanr and Stan:
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
cmdstanr::install_cmdstan()source("01_data_generation.R")
# Define simulation parameters
list_of_defs <- s_define()
argsvec <- c(n_ed = 48, icc = 0.015, n_quarters = 2, ...)
# Generate simulated dataset
generated_data <- s_generate(list_of_defs, argsvec)source("02_prior_predictive.R")
# Compile prior predictive model
mod_prior <- cmdstan_model("prior_predictive_model.stan")
# Generate prior predictive plots
plots <- lapply(scenarios, function(x) replicate(x))source("03_bayesian_power.R")
# Compile main model
mod <- cmdstan_model("model_HEX.stan")
# Run power analysis
results <- mclapply(scenarios, function(a) s_replicate(a, mod), mc.cores = 4)- Number of clusters: 48-88 emergency departments
- Intracluster correlation: 0.015
- Time periods: 2 quarters per cluster
- Cluster sizes: Variable (60-300 patients per quarter)
- Baseline log-odds: -0.40 (≈40% control rate)
- Single intervention OR: 0.80-0.82
- Interaction effects: Small positive or null
Three scenarios representing different levels of informativeness:
- Informative: Small prior SDs (σ = 0.3-0.4)
- Moderate: Medium prior SDs (σ = 0.6-0.8)
- Weakly informative: Large prior SDs (σ = 2.5)
- Validates prior assumptions by examining implied outcome distributions
- Ensures priors allow for clinically meaningful effect sizes
- Identifies potential prior-data conflicts
- Determines required sample size for adequate posterior precision
- Target: Posterior SD ≤ 0.13 for interaction effects
- Monte Carlo estimation with 1000 replications per scenario
- Bayesian hierarchical models vs. frequentist hierarchical model selection
- Type I error rates and power comparison
- Robustness to prior specifications
- Memory: 8+ GB RAM recommended
- CPU: Multi-core processor for parallel processing
- Time: Full simulation study requires several hours/days on HPC
- Dependencies: Stan installation required
For large-scale simulations, high-performance computing (HPC) is recommended. Contact the authors for HPC setup guidance.
Based on posterior precision targets:
- Interaction effects: ~72-80 clusters needed for SD ≤ 0.13
- Main effects: Smaller sample sizes sufficient
- Diminishing returns: Limited benefit beyond 80 clusters
- Bayesian methods provide more nuanced uncertainty quantification
- Hierarchical priors improve estimation of interaction effects
- Frequentist model selection tends to be conservative
Keith Goldfeld
NYU Grossman School of Medicine
Email: [email protected]
For questions about implementation or methodology, please open an issue or contact the corresponding author.