1 FB2NEP Python Cheat‑Sheet (Colab/Jupyter)
This one‑pager covers the most common things you’ll do in FB2NEP notebooks.
1.1 0) Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Stats & modelling (install if missing)
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula_api as smf
If a library is missing in Colab:
!pip -q install statsmodels
# then: Runtime → Restart runtime
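To see which of the libraries above are actually absent before reaching for pip, a check along these lines may help. It is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration and is not part of FB2NEP.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

needed = ["numpy", "pandas", "matplotlib", "scipy", "statsmodels"]
print("Missing:", missing_packages(needed))
```

Anything listed as missing can then be installed with the `!pip` line above.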
1.2 1) Load / save data
# CSV from local upload or Drive
df = pd.read_csv("my_data.csv")
# CSV from GitHub (raw)
url = "https://raw.githubusercontent.com/USER/REPO/BRANCH/path/to/file.csv"
df = pd.read_csv(url)
# Save
df.to_csv("output.csv", index=False)
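A quick way to convince yourself that saving and loading behave as expected is a round trip on a tiny table. This is only a sketch: the column names and values are invented, and the file goes to a temporary folder so nothing in your project is overwritten.

```python
import os
import tempfile
import pandas as pd

# Toy data; column names and values are made up for illustration
df = pd.DataFrame({"id": [1, 2, 3], "bmi": [22.5, 27.1, 31.0]})

# Write to a temporary folder rather than the working directory
path = os.path.join(tempfile.mkdtemp(), "output.csv")
df.to_csv(path, index=False)  # index=False: do not write the row index as a column

# Read the file back and inspect what came out
df_back = pd.read_csv(path)
print(df_back.shape)
print(list(df_back.columns))
```

Without `index=False`, the saved file gains an extra unnamed column holding the old row index, which then shows up when you read the file back.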
1.3 2) Quick look
df.head()
df.tail()
df.shape
df.info()
df.describe(include="all")
df["sex"].value_counts(dropna=False)
# fraction missing per column
df.isna().mean()
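The two missing-data checks above are easiest to read on a small table with known gaps. The following is a sketch on invented data, not an FB2NEP dataset.

```python
import numpy as np
import pandas as pd

# Toy data with deliberate gaps; values are invented for illustration
df = pd.DataFrame({
    "sex": ["F", "M", None, "F"],
    "bmi": [22.0, np.nan, 27.5, 30.1],
})

# Fraction of missing values per column (mean of True/False flags)
missing_frac = df.isna().mean()

# dropna=False gives missing values their own row in the counts
sex_counts = df["sex"].value_counts(dropna=False)

print(missing_frac)
print(sex_counts)
```

Each column here has one gap out of four rows, so both fractions are 0.25.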
1.4 3) Select / filter / transform
# Columns
df[["age", "bmi"]]
# Rows
df[df["age"] >= 50]
# New columns
df["bmi_sq"] = df["bmi"] ** 2
# Rename
df = df.rename(columns={"cholesterol": "chol"})
# Sort
df = df.sort_values(["age", "bmi"], ascending=[True, False])
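Row filters are often combined. The sketch below, on invented data, shows three common patterns: several conditions at once, `.loc` for rows and columns together, and `.isin` for membership.

```python
import pandas as pd

# Toy data; values are invented for illustration
df = pd.DataFrame({
    "age": [34, 52, 61, 47, 70],
    "bmi": [21.0, 31.2, 28.4, 33.0, 24.9],
    "sex": ["F", "M", "F", "M", "F"],
})

# Each condition needs its own parentheses; use & (and), | (or), ~ (not)
older_obese = df[(df["age"] >= 50) & (df["bmi"] >= 30)]

# .loc selects rows and columns in one step
women_bmi = df.loc[df["sex"] == "F", ["age", "bmi"]]

# .isin tests membership in a list of values
selected = df[df["age"].isin([34, 70])]

print(older_obese)
print(women_bmi)
print(selected)
```

Python's plain `and` / `or` do not work element-wise on columns, which is why the symbols `&` and `|` are needed here.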
1.5 4) Grouping & summaries
df.groupby("group")["bmi"].mean()
df.groupby(["group", "sex"])["sbp"].agg(["mean", "std", "count"])
# Crosstab
pd.crosstab(df["group"], df["sex"], margins=True, normalize="index")
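To see what these summaries return, here is a small worked sketch on invented data. It also uses named aggregation, an optional pandas feature that gives the output columns readable names.

```python
import pandas as pd

# Toy data; values are invented for illustration
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "sex":   ["F", "M", "F", "F", "M"],
    "sbp":   [120, 130, 140, 150, 160],
})

# Mean SBP per group
mean_sbp = df.groupby("group")["sbp"].mean()

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("group").agg(mean_sbp=("sbp", "mean"), n=("sbp", "count"))

# normalize="index" turns counts into row proportions (each row sums to 1)
row_props = pd.crosstab(df["group"], df["sex"], normalize="index")

print(mean_sbp)
print(summary)
print(row_props)
```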
1.6 5) Plotting (quick)
df["bmi"].hist(bins=20)
plt.title("BMI distribution")
plt.xlabel("BMI"); plt.ylabel("Count")
plt.show()
# Boxplot by group
df.boxplot(column="sbp", by="group")
plt.suptitle(""); plt.title("SBP by group"); plt.xlabel("group"); plt.ylabel("SBP")
plt.show()
# Scatter
plt.scatter(df["bmi"], df["sbp"])
plt.xlabel("BMI"); plt.ylabel("SBP"); plt.show()
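When a figure needs to be saved rather than just displayed, the figure/axes style of Matplotlib is often convenient. The sketch below uses simulated data (the numbers are assumptions chosen only to produce a plausible scatter) and saves to an in-memory buffer; pass a file name such as `"scatter.png"` to `savefig` instead to write to disk.

```python
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulated data for illustration only
rng = np.random.default_rng(0)
df = pd.DataFrame({"bmi": rng.normal(27, 4, 100)})
df["sbp"] = 100 + 1.2 * df["bmi"] + rng.normal(0, 8, 100)

# Create the figure and axes explicitly, then draw on the axes
fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(df["bmi"], df["sbp"], alpha=0.6)
ax.set_xlabel("BMI")
ax.set_ylabel("SBP")
ax.set_title("SBP vs BMI (simulated)")
fig.tight_layout()

# Save as PNG into a buffer, then release the figure's memory
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)
plt.close(fig)
```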
1.7 6) Basic stats
# Two-sample t-test
a = df.loc[df["group"] == "A", "sbp"]
b = df.loc[df["group"] == "B", "sbp"]
stats.ttest_ind(a, b, equal_var=False, nan_policy="omit")
# Chi-square test on a 2x2
tab = pd.crosstab(df["group"], df["sex"])
stats.chi2_contingency(tab)
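Both tests return several values, and it helps to pull them out by name. The sketch below runs them on simulated numbers (group means, spreads, and the 2x2 counts are all invented) to show how the results can be unpacked.

```python
import numpy as np
import scipy.stats as stats

# Simulated SBP values for two groups; parameters are invented
rng = np.random.default_rng(11088)
a = rng.normal(120, 10, 50)
b = rng.normal(135, 15, 50)

# equal_var=False gives Welch's t-test, which does not assume equal variances
res = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")

# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
table = np.array([[30, 20], [15, 35]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
print(expected)  # counts expected if rows and columns were independent
```

Looking at `expected` is a useful habit: very small expected counts are a sign that the chi-square approximation may be poor.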
1.8 7) Simple models (statsmodels)
# OLS regression
model = smf.ols("sbp ~ age + bmi + C(sex) + C(group)", data=df).fit()
print(model.summary())
# Logistic regression (binary outcome)
# e.g., 'high_sbp' is 0/1
logit = smf.logit("high_sbp ~ age + bmi + C(sex) + C(group)", data=df).fit()
print(logit.summary())
1.9 8) Jupyter basics
- Run cell: Shift + Enter
- Insert cell above/below: A / B in Jupyter command mode (press Esc first); Ctrl+M A / Ctrl+M B in Colab
- Interrupt: stop button (■) or Kernel/Runtime → Interrupt
- Restart: Kernel/Runtime → Restart
- Markdown cell: text with # headings, **bold**, lists, etc.
1.10 9) Reproducibility
SEED = 11088
np.random.seed(SEED)
- Record: dataset version, random seed, and exact code you ran.
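As an alternative to seeding the global state, NumPy also offers `np.random.default_rng`, which creates a generator with its own state. The sketch below shows that two generators built from the same seed give identical draws, and prints the version information worth recording next to the seed.

```python
import sys
import numpy as np
import pandas as pd

SEED = 11088

# Each Generator carries its own state, so results do not depend on
# other code that may have used the global np.random functions
rng1 = np.random.default_rng(SEED)
rng2 = np.random.default_rng(SEED)
x1 = rng1.normal(size=5)
x2 = rng2.normal(size=5)
print(np.array_equal(x1, x2))  # same seed, same draws

# Versions worth recording alongside the seed
print("python", sys.version.split()[0])
print("numpy", np.__version__)
print("pandas", pd.__version__)
```

Note that `np.random.seed(SEED)` and `np.random.default_rng(SEED)` produce different number streams, so stick to one approach within a notebook.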