1 FB2NEP Python Cheat‑Sheet (Colab/Jupyter)

This one‑pager covers the most common things you’ll do in FB2NEP notebooks.


1.1 0) Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stats & modelling (install if missing)
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula_api as smf

If a library is missing in Colab:

!pip -q install statsmodels
# then: Runtime → Restart runtime

1.2 1) Load / save data

# CSV from local upload or Drive
df = pd.read_csv("my_data.csv")

# CSV from GitHub (raw)
url = "https://raw.githubusercontent.com/USER/REPO/BRANCH/path/to/file.csv"
df = pd.read_csv(url)

# Save
df.to_csv("output.csv", index=False)

1.3 2) Quick look

df.head()
df.tail()
df.shape
df.info()
df.describe(include="all")
df["sex"].value_counts(dropna=False)
df.isna().mean()  # fraction missing per column

1.4 3) Select / filter / transform

# Columns
df[["age", "bmi"]]

# Rows
df[df["age"] >= 50]

# New columns
df["bmi_sq"] = df["bmi"] ** 2

# Rename
df = df.rename(columns={"cholesterol": "chol"})

# Sort
df = df.sort_values(["age", "bmi"], ascending=[True, False])

1.5 4) Grouping & summaries

df.groupby("group")["bmi"].mean()
df.groupby(["group", "sex"])["sbp"].agg(["mean", "std", "count"])

# Crosstab
pd.crosstab(df["group"], df["sex"], margins=True, normalize="index")

1.6 5) Plotting (quick)

df["bmi"].hist(bins=20)
plt.title("BMI distribution")
plt.xlabel("BMI"); plt.ylabel("Count")
plt.show()

# Boxplot by group
df.boxplot(column="sbp", by="group")
plt.suptitle(""); plt.title("SBP by group"); plt.xlabel("group"); plt.ylabel("SBP")
plt.show()

# Scatter
plt.scatter(df["bmi"], df["sbp"])
plt.xlabel("BMI"); plt.ylabel("SBP"); plt.show()

1.7 6) Basic stats

# Two-sample t-test
a = df.loc[df["group"] == "A", "sbp"]
b = df.loc[df["group"] == "B", "sbp"]
stats.ttest_ind(a, b, equal_var=False, nan_policy="omit")

# Chi-square test on a 2x2
tab = pd.crosstab(df["group"], df["sex"])
stats.chi2_contingency(tab)

1.8 7) Simple models (statsmodels)

# OLS regression
model = smf.ols("sbp ~ age + bmi + C(sex) + C(group)", data=df).fit()
print(model.summary())

# Logistic regression (binary outcome)
# e.g., 'high_sbp' is 0/1
logit = smf.logit("high_sbp ~ age + bmi + C(sex) + C(group)", data=df).fit()
print(logit.summary())

1.9 8) Jupyter basics

  • Run cell: Shift + Enter
  • Insert cell above/below: A / B
  • Interrupt: stop button (■) or Kernel/Runtime → Interrupt
  • Restart: Kernel/Runtime → Restart
  • Markdown cell: text with # headings, **bold**, lists, etc.

1.10 9) Reproducibility

SEED = 11088
np.random.seed(SEED)
  • Record: dataset version, random seed, and exact code you ran.