1 FB2NEP Python Cheat‑Sheet (Colab/Jupyter)
This one‑pager covers the most common things you’ll do in FB2NEP notebooks.
1.1 0) Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Stats & modelling (install if missing)
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
If a library is missing in Colab:
!pip -q install statsmodels
# then: Runtime → Restart runtime
1.2 1) Load / save data
# CSV from local upload or Drive
df = pd.read_csv("my_data.csv")
# CSV from GitHub (raw)
url = "https://raw.githubusercontent.com/ggkuhnke/fb2nep-eoi/main/data/synthetic/fb2nep.csv"
df = pd.read_csv(url)
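You can try `pd.read_csv` options without a file or network access by reading from an in-memory string. Below is a minimal sketch. The columns and the `-99` missing-value code are hypothetical and are not taken from the FB2NEP dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text; -99 is an assumed missing-value code, empty fields and "NA" are also missing
csv_text = """id,sex,age,bmi
1,F,54,27.1
2,M,-99,
3,F,NA,22.4
"""

# read_csv accepts any file-like object, so io.StringIO behaves like a file on disk
df_demo = pd.read_csv(io.StringIO(csv_text), na_values=["-99"])

print(df_demo.dtypes)
print(df_demo.isna().sum())  # missing values per column
```

By default, empty fields and `NA` are read as missing. `na_values` adds further codes on top of those defaults.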
# Save
df.to_csv("output.csv", index=False)
1.3 2) Quick look
df.head()
df.tail()
df.shape
df.info()
df.describe(include="all")
df["sex"].value_counts(dropna=False)
df.isna().mean()  # fraction missing per column
1.4 3) Select / filter / transform
# Columns
df[["age", "bmi"]]
# Rows
df[df["age"] >= 50]
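One filter can combine several conditions. The sketch below uses a hypothetical toy table. Use `&` (and), `|` (or) and `~` (not) rather than Python's `and`/`or`, which raise an error on a pandas Series. Wrap each comparison in parentheses, because `&` binds more tightly than `>=` or `==`.

```python
import pandas as pd

# Hypothetical toy data (not FB2NEP)
df_demo = pd.DataFrame({
    "age": [45, 52, 67, 38, 71],
    "sex": ["F", "M", "F", "M", "F"],
    "bmi": [24.0, 31.5, 28.2, 22.9, 33.0],
})

# AND: women aged 50 or over
older_women = df_demo[(df_demo["age"] >= 50) & (df_demo["sex"] == "F")]

# OR: BMI of 30 or more, or aged 65 or over
obese_or_old = df_demo[(df_demo["bmi"] >= 30) | (df_demo["age"] >= 65)]

# NOT: everyone except BMI of 30 or more
not_obese = df_demo[~(df_demo["bmi"] >= 30)]

# .query() expresses the AND filter as a string
older_women_q = df_demo.query("age >= 50 and sex == 'F'")
print(older_women)
```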
# New columns
df["bmi_sq"] = df["bmi"] ** 2
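New columns are often derived categories or binary flags. One example is a 0/1 outcome such as `high_sbp` for the logistic model further down. The sketch below uses toy data. The 140 mmHg threshold and the BMI cut-points (the usual WHO adult categories) are illustrative assumptions, so use the definitions in your own analysis plan.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not FB2NEP)
df_demo = pd.DataFrame({
    "bmi": [17.9, 22.0, 25.0, 29.9, 35.2],
    "sbp": [118, 135, 142, 128, 151],
})

# Binary 0/1 flag; the 140 mmHg cut-off is an illustrative assumption
df_demo["high_sbp"] = (df_demo["sbp"] >= 140).astype(int)

# Categorise BMI; right=False gives intervals [a, b), so 25.0 falls in "overweight"
df_demo["bmi_cat"] = pd.cut(
    df_demo["bmi"],
    bins=[0, 18.5, 25, 30, np.inf],
    labels=["underweight", "normal", "overweight", "obese"],
    right=False,
)
print(df_demo)
```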
# Rename
df = df.rename(columns={"cholesterol": "chol"})
# Sort
df = df.sort_values(["age", "bmi"], ascending=[True, False])
1.5 4) Grouping & summaries
df.groupby("group")["bmi"].mean()
df.groupby(["group", "sex"])["sbp"].agg(["mean", "std", "count"])
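`agg` also accepts named aggregations, which give the output columns readable names. The sketch below uses hypothetical data. Note that `count` ignores missing values while `size` counts all rows. Comparing the two is a quick way to see how many observations each group actually contributes.

```python
import pandas as pd

# Hypothetical toy data (not FB2NEP); None becomes NaN (a missing SBP value)
df_demo = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "sbp": [120.0, 130.0, 140.0, 150.0, None],
})

# Named aggregation: new_column=(input_column, function)
summary = (
    df_demo.groupby("group")
    .agg(mean_sbp=("sbp", "mean"), n=("sbp", "count"), n_rows=("sbp", "size"))
    .reset_index()
)
print(summary)
```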
# Crosstab
pd.crosstab(df["group"], df["sex"], margins=True, normalize="index")
1.6 5) Plotting (quick)
df["bmi"].hist(bins=20)
plt.title("BMI distribution")
plt.xlabel("BMI"); plt.ylabel("Count")
plt.show()
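The `plt.*` calls above draw on the current figure. Matplotlib's object-oriented interface (`fig, ax = plt.subplots()`) is an alternative that scales better to multi-panel figures and makes saving explicit. The sketch below uses simulated BMI values, and the file name `bmi_hist.png` is a placeholder.

```python
import matplotlib.pyplot as plt
import numpy as np

# Simulated values for illustration only (not FB2NEP data)
rng = np.random.default_rng(1)
bmi = rng.normal(loc=26, scale=4, size=200)

fig, ax = plt.subplots(figsize=(5, 3))
counts, edges, patches = ax.hist(bmi, bins=20)  # counts per bin and the bin edges
ax.set_title("BMI distribution (simulated)")
ax.set_xlabel("BMI")
ax.set_ylabel("Count")
fig.tight_layout()
fig.savefig("bmi_hist.png", dpi=150)  # placeholder file name
plt.show()
```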
# Boxplot by group
df.boxplot(column="sbp", by="group")
plt.suptitle(""); plt.title("SBP by group"); plt.xlabel("group"); plt.ylabel("SBP")
plt.show()
# Scatter
plt.scatter(df["bmi"], df["sbp"])
plt.xlabel("BMI"); plt.ylabel("SBP"); plt.show()
1.7 6) Basic stats
# Two-sample t-test (equal_var=False gives Welch's test; nan_policy="omit" drops missing values)
a = df.loc[df["group"] == "A", "sbp"]
b = df.loc[df["group"] == "B", "sbp"]
stats.ttest_ind(a, b, equal_var=False, nan_policy="omit")
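`ttest_ind` returns a result object with `.statistic` and `.pvalue`. With `equal_var=False` it runs Welch's t-test. The sketch below uses made-up SBP values and reproduces the statistic by hand: the difference in means divided by the Welch standard error `sqrt(var_a/n_a + var_b/n_b)`.

```python
import numpy as np
from scipy import stats

# Hypothetical SBP values (mmHg); NaN marks a missing measurement
a = np.array([120.0, 125.0, 131.0, 128.0, np.nan])
b = np.array([135.0, 140.0, 129.0, 146.0, 138.0])

res = stats.ttest_ind(a, b, equal_var=False, nan_policy="omit")
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")

# Same statistic by hand: drop the NaN, then use sample variances (ddof=1)
a_obs = a[~np.isnan(a)]
se = np.sqrt(a_obs.var(ddof=1) / a_obs.size + b.var(ddof=1) / b.size)
t_manual = (a_obs.mean() - b.mean()) / se
```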
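# Chi-square test of independence (returns statistic, p-value, dof, expected counts; Yates' correction is applied by default to a 2x2 table)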
tab = pd.crosstab(df["group"], df["sex"])
stats.chi2_contingency(tab)
1.8 7) Simple models (statsmodels)
# OLS regression
model = smf.ols("sbp ~ age + bmi + C(sex) + C(group)", data=df).fit()
print(model.summary())
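Instead of reading everything off `summary()`, you can collect coefficients, 95% confidence intervals and p-values from the fitted results into a DataFrame. The sketch below uses simulated data, and the variables and true coefficients are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (not FB2NEP): sbp depends on age and bmi plus noise
rng = np.random.default_rng(42)
n = 200
sim = pd.DataFrame({
    "age": rng.uniform(40, 80, n),
    "bmi": rng.normal(27, 4, n),
    "sex": rng.choice(["F", "M"], n),
})
sim["sbp"] = 90 + 0.5 * sim["age"] + 1.0 * sim["bmi"] + rng.normal(0, 8, n)

fit = smf.ols("sbp ~ age + bmi + C(sex)", data=sim).fit()

# Coefficients, 95% CIs and p-values side by side
table = pd.concat([fit.params, fit.conf_int(), fit.pvalues], axis=1)
table.columns = ["coef", "ci_low", "ci_high", "p"]
print(table.round(3))
print("n used:", int(fit.nobs))  # rows with missing values are dropped by default
```

For the logistic model, `np.exp(logit.params)` and `np.exp(logit.conf_int())` give odds ratios and their confidence intervals.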
# Logistic regression (binary outcome)
# e.g., 'high_sbp' is 0/1
logit = smf.logit("high_sbp ~ age + bmi + C(sex) + C(group)", data=df).fit()
print(logit.summary())
1.9 8) Jupyter basics
- Run cell: Shift + Enter
- Insert cell above/below: A / B in command mode (press Esc first); in Colab: Ctrl + M, then A / B
- Interrupt: stop button (■) or Kernel/Runtime → Interrupt
- Restart: Kernel/Runtime → Restart
- Markdown cell: text with # headings, **bold**, lists, etc.
1.10 9) Reproducibility
SEED = 11088
np.random.seed(SEED)
- Record: dataset version, random seed, and exact code you ran.
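`np.random.seed(SEED)` seeds NumPy's legacy global generator. Newer NumPy code usually creates a `Generator` with `np.random.default_rng(SEED)`, which `np.random.seed` does not affect. Pandas functions such as `DataFrame.sample` take their own `random_state`. The short sketch below checks that the same seed reproduces the same draws.

```python
import numpy as np
import pandas as pd

SEED = 11088

# A Generator carries its own state; the same seed gives the same draws
rng = np.random.default_rng(SEED)
first = rng.normal(size=3)

rng_again = np.random.default_rng(SEED)
second = rng_again.normal(size=3)
print(np.array_equal(first, second))  # True

# pandas sampling uses its own random_state argument
toy = pd.DataFrame({"x": range(10)})  # hypothetical toy table
s1 = toy.sample(n=3, random_state=SEED)
s2 = toy.sample(n=3, random_state=SEED)
print(s1.equals(s2))  # True
```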