🩺 4.7 Analysing a Simulated Clinical Trial

In this notebook, we’ll analyse a simulated randomised clinical trial—a common design in nutrition research to estimate causal effects.

You will: - Simulate a trial with randomisation, baseline covariates, adherence, and missingness - Produce a CONSORT-like flow and a Table 1 (with standardised mean differences, SMD) - Visualise distributions and balance (hist/KDE, love plot) - Estimate treatment effects: Welch t-test, Mann–Whitney, permutation test, Cohen’s d / Hedges’ g, bootstrap CIs - Adjust via ANCOVA (post ~ group + baseline + covariates) with robust SEs - Fit a Bayesian ANCOVA (PyMC) with posterior, HDIs, posterior predictive checks - Contrast Intention-to-Treat (ITT) vs Per-Protocol (PP)

Why these pieces?
  • SMDs give scale-free balance checks after randomisation.
  • ANCOVA increases precision by adjusting for baseline.
  • Permutation & bootstrap add robustness when assumptions are shaky.
  • Bayesian analysis communicates uncertainty with posteriors & probabilities.
%pip install -q pandas numpy scipy statsmodels seaborn matplotlib pymc arviz scikit-learn
import numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.utils import resample
import pymc as pm, arviz as az
sns.set_theme()
pd.set_option('display.max_columns', 60)
rng = np.random.default_rng(42)
print('Environment ready.')

🧪 Step 1: Simulate the Trial

We create a parallel-group RCT (1:1) with baseline age, BMI, baseline outcome y0, and post outcome y1. Treatment improves y1 by ~1 unit on average; outcomes also depend on age/BMI. We add adherence and missingness to make it realistic.

n = 200
ids = np.arange(1, n+1)
group = rng.integers(0, 2, size=n)  # 0=Control, 1=Intervention

# Baseline covariates
age = rng.normal(45, 11, size=n)
bmi = rng.normal(27, 4, size=n)

# Baseline outcome (y0)
y0 = rng.normal(0, 2, size=n)

# Adherence (only matters in treatment); ~80% adherent if treated, 98% if control (placebo adherence)
adherent = np.where(group==1, rng.random(n) < 0.80, rng.random(n) < 0.98).astype(int)

# Treatment effect (on adherers), plus covariate drift and noise
true_effect = 1.0
mu = (0
      + group * adherent * true_effect
      + 0.15*y0  # regression to mean / persistence
      + 0.01*(age-45)
      + 0.03*(bmi-27))
y1 = mu + rng.normal(0, 2.0, size=n)

# Missingness: MCAR 5% on y1; slightly higher missing if non-adherent
miss_base = 0.05
miss_extra = np.where(adherent==0, 0.05, 0.0)
mask_miss = rng.random(n) < (miss_base + miss_extra)
y1_obs = y1.copy()
y1_obs[mask_miss] = np.nan

df = pd.DataFrame({
    'participant_id': ids,
    'group': group,                      # 0=Control, 1=Intervention
    'adherent': adherent,                # 1=yes
    'age': age,
    'bmi': bmi,
    'y0': y0,                            # baseline
    'y1': y1_obs                         # post (observed)
})

df['group_label'] = df['group'].map({0:'Control', 1:'Intervention'})
df.head()

🧾 Step 1b: CONSORT-like Flow

Counts at each stage help detect imbalance or data issues.

consort = {
    'Randomised (N)': [len(df)],
    'Allocated Control (N)': [(df['group']==0).sum()],
    'Allocated Intervention (N)': [(df['group']==1).sum()],
    'Adherent Control (N)': [((df['group']==0)&(df['adherent']==1)).sum()],
    'Adherent Intervention (N)': [((df['group']==1)&(df['adherent']==1)).sum()],
    'Observed y1 Control (N)': [((df['group']==0)&(df['y1'].notna())).sum()],
    'Observed y1 Intervention (N)': [((df['group']==1)&(df['y1'].notna())).sum()],
}
pd.DataFrame(consort)

📋 Step 2: Baseline Table (Table 1) with SMD

SMD (standardised mean difference) is a scale-free balance metric (|SMD| < 0.1 is often considered negligible).

def smd_cont(x0, x1):
    # Pooled SD
    s2 = (np.nanvar(x0, ddof=1) + np.nanvar(x1, ddof=1))/2
    return (np.nanmean(x1) - np.nanmean(x0)) / np.sqrt(s2) if s2>0 else np.nan

vars_cont = ['age','bmi','y0']
tbl = []
for v in vars_cont:
    x0 = df.loc[df['group']==0, v]
    x1 = df.loc[df['group']==1, v]
    row = {
        'Variable': v,
        'Control_mean': np.nanmean(x0),
        'Control_sd': np.nanstd(x0, ddof=1),
        'Intervention_mean': np.nanmean(x1),
        'Intervention_sd': np.nanstd(x1, ddof=1),
        'SMD': smd_cont(x0, x1)
    }
    tbl.append(row)
table1 = pd.DataFrame(tbl)
table1.round(2)

Love plot (SMDs)

Visual balance check across baseline covariates.

plt.figure(figsize=(6, 3))
order = table1.sort_values('SMD', key=lambda s: s.abs())
plt.hlines(order['Variable'], xmin=0, xmax=order['SMD'])
plt.plot(order['SMD'], order['Variable'], 'o')
plt.axvline(0, ls='--', alpha=.5)
plt.axvline(0.1, ls=':', alpha=.5, color='grey'); plt.axvline(-0.1, ls=':', alpha=.5, color='grey')
plt.title('Standardised Mean Differences (baseline)')
plt.xlabel('SMD'); plt.ylabel('')
plt.tight_layout(); plt.show()

📈 Step 3: Visualise Outcomes

We inspect post outcome (y1) and change (y1 - y0).

df['change'] = df['y1'] - df['y0']
fig, axes = plt.subplots(1,2, figsize=(12,4))
sns.histplot(data=df, x='y1', hue='group_label', kde=True, element='step', stat='density', common_norm=False, ax=axes[0])
axes[0].set_title('Post outcome (y1)'); axes[0].set_xlabel('y1')
sns.histplot(data=df, x='change', hue='group_label', kde=True, element='step', stat='density', common_norm=False, ax=axes[1])
axes[1].set_title('Change (y1 - y0)'); axes[1].set_xlabel('change')
plt.tight_layout(); plt.show()

📏 Step 4: Frequentist Estimators (Unadjusted)

Primary ITT analysis uses everyone with observed y1 in their assigned group.

  • Welch t-test for mean difference (robust to unequal variances)
  • Cohen’s d / Hedges’ g for effect size
  • Mann–Whitney U and Cliff’s delta (non-parametric)
  • Permutation test (exact/randomisation-based inference)
  • Bootstrap 95% CI for mean difference
itt = df.dropna(subset=['y1']).copy()
y0_ = itt.loc[itt['group']==0, 'y1']
y1_ = itt.loc[itt['group']==1, 'y1']
mean_diff = y1_.mean() - y0_.mean()

# Welch t-test
t_w, p_w = stats.ttest_ind(y1_, y0_, equal_var=False)

# Cohen's d and Hedges' g
sd_pooled = np.sqrt(((y0_.var(ddof=1) + y1_.var(ddof=1))/2))
cohen_d = (y1_.mean() - y0_.mean()) / sd_pooled
J = 1 - 3/(4*(len(itt))-9)  # small sample correction
hedges_g = J * cohen_d

# Mann–Whitney + Cliff's delta
u, p_u = stats.mannwhitneyu(y1_, y0_, alternative='two-sided')
def cliffs_delta(a, b):
    a, b = np.asarray(a), np.asarray(b)
    diffs = a.reshape(-1,1) - b.reshape(1,-1)
    return (np.sum(diffs>0) - np.sum(diffs<0)) / (a.size*b.size)
delta = cliffs_delta(y1_, y0_)

# Permutation (difference in means)
def perm_test(a, b, reps=2000, rng=rng):
    obs = a.mean() - b.mean()
    comb = np.concatenate([a,b])
    n_a = len(a)
    cnt = 0
    for _ in range(reps):
        rng.shuffle(comb)
        da, db = comb[:n_a], comb[n_a:]
        if abs(da.mean()-db.mean()) >= abs(obs):
            cnt += 1
    return obs, (cnt/reps)
perm_diff, p_perm = perm_test(y1_.values.copy(), y0_.values.copy())

# Bootstrap CI for mean difference
def bootstrap_mean_diff(a, b, reps=5000, rng=rng):
    diffs = np.empty(reps)
    for i in range(reps):
        da = resample(a, replace=True, random_state=rng.integers(0, 1e9))
        db = resample(b, replace=True, random_state=rng.integers(0, 1e9))
        diffs[i] = da.mean() - db.mean()
    return np.quantile(diffs, [0.025, 0.975])
ci_boot = bootstrap_mean_diff(y1_.values, y0_.values)

print(f"Mean difference (y1): {mean_diff:.2f}")
print(f"Welch t: t={t_w:.2f}, p={p_w:.3f}")
print(f"Cohen's d: {cohen_d:.2f}, Hedges' g: {hedges_g:.2f}")
print(f"Mann–Whitney U p={p_u:.3f}, Cliff's delta={delta:.2f}")
print(f"Permutation p={p_perm:.3f}")
print(f"Bootstrap 95% CI for mean diff: [{ci_boot[0]:.2f}, {ci_boot[1]:.2f}]")

🧮 Step 5: ANCOVA (Adjusted Analysis)

Model post outcome with baseline and covariates: y1 ~ group + y0 + age + bmi.

Why ANCOVA? Adjusting for baseline often yields tighter CIs (more power) under correct specification and preserves unbiasedness in RCTs.
ancova_df = itt.copy()
ancova_df['group'] = ancova_df['group'].astype(int)
fit = smf.ols('y1 ~ group + y0 + age + bmi', data=ancova_df).fit(cov_type='HC3')
print(fit.summary())
coef = fit.params['group']
ci = fit.conf_int().loc['group'].values
print(f"\nAdjusted effect (group=Intervention vs Control): {coef:.2f}  CI95% [{ci[0]:.2f}, {ci[1]:.2f}] (HC3)")

🔁 Step 6: ITT vs Per-Protocol

  • ITT: analyse by assigned group regardless of adherence (primary)
  • PP: restrict to adherent participants (sensitivity)
    Caution PP can be biased if adherence relates to prognosis; use as a sensitivity check, not the main analysis.
pp = itt.query('adherent==1')
y0_pp = pp.loc[pp['group']==0,'y1']; y1_pp = pp.loc[pp['group']==1,'y1']
mean_diff_pp = y1_pp.mean() - y0_pp.mean()
print(f"ITT mean diff (y1): {mean_diff:.2f}")
print(f"PP  mean diff (y1): {mean_diff_pp:.2f}")

🧠 Step 7: Bayesian ANCOVA

Model:
y1 ~ Normal(mu, sigma) with
mu = α + β_group*group + β0*y0 + β_age*age + β_bmi*bmi.

We report the posterior for β_group (treatment effect), 95% HDI, and posterior probability P(β_group > 0).

Notes
  • Weakly-informative priors keep estimates stable while remaining non-committal.
  • Posterior predictive checks assess fit.
bdf = ancova_df.dropna(subset=['y1','y0','age','bmi']).copy()
with pm.Model() as m:
    alpha = pm.Normal('alpha', 0, 10)
    b_group = pm.Normal('b_group', 0, 5)
    b_y0 = pm.Normal('b_y0', 0, 2)
    b_age = pm.Normal('b_age', 0, 0.1)
    b_bmi = pm.Normal('b_bmi', 0, 0.2)
    sigma = pm.HalfNormal('sigma', 2)
    mu = (alpha
          + b_group * bdf['group'].values
          + b_y0 * bdf['y0'].values
          + b_age * (bdf['age'].values-45)
          + b_bmi * (bdf['bmi'].values-27))
    y = pm.Normal('y', mu=mu, sigma=sigma, observed=bdf['y1'].values)
    trace = pm.sample(1500, tune=1500, target_accept=0.9, random_seed=42, chains=2, cores=2, progressbar=True, return_inferencedata=True)
    ppc = pm.sample_posterior_predictive(trace, model=m, random_seed=42)

az.plot_posterior(trace, var_names=['b_group'], ref_val=0);
plt.title('Posterior: treatment effect (β_group)'); plt.show()

bg = trace.posterior['b_group'].values.flatten()
hdi = az.hdi(bg, hdi_prob=0.95)
print(f"Posterior mean (β_group): {bg.mean():.2f} | 95% HDI [{hdi[0]:.2f}, {hdi[1]:.2f}]")
print(f"P(β_group>0): {(bg>0).mean():.3f}")

# Posterior predictive check
idata_ppc = az.from_pymc(posterior_predictive=ppc, model=m)
az.plot_ppc(idata_ppc, num_pp_samples=200);
plt.title('Posterior predictive check'); plt.show()

🔁 Optional Exercises

  1. Stratified effects: Add an interaction group × (age>50) in ANCOVA—does the effect differ by age strata?
  2. Equivalence: Define a ROPE (e.g., ±0.25 units) and compute P(|β_group| < 0.25) from the posterior.
  3. Imputation: Mean-impute missing y1 within group and re-run ITT Welch test—how do results shift?
  4. Power (bonus): Loop over n to find sample size achieving ~80% power for Welch t-test given the simulated SD and effect.
Hints
  • Interaction in OLS: y1 ~ group*I(age>50) + y0 + age + bmi
  • ROPE prob: np.mean(np.abs(bg) < 0.25)
  • Simple impute: df.groupby('group')['y1'].transform(lambda s: s.fillna(s.mean()))

✅ Summary

You: - Simulated an RCT with realistic wrinkles (adherence, missingness) - Assessed baseline balance (SMDs) and visualised distributions - Estimated treatment effects with robust frequentist tools (Welch, MWU, permutation, bootstrap) and ANCOVA - Ran a Bayesian ANCOVA, reporting posteriors and predictive checks - Compared ITT and PP

➡️ This workflow mirrors real trial analysis: pre-specify ITT primary, show adjusted analysis, and include sensitivity checks.

Further reading
  • CONSORT: Randomised trials reporting
  • Senn S. Statistical Issues in Drug Development (ANCOVA in RCTs)
  • Gelman et al. Bayesian Data Analysis (posterior predictive checks)