import os
from google.colab import files

# Names and locations used by the setup below
MODULE = '06_qualitative'
DATASET = 'food_preferences.txt'
BASE_PATH = '/content/data-analysis-projects'
MODULE_PATH = os.path.join(BASE_PATH, 'notebooks', MODULE)
DATASET_PATH = os.path.join(MODULE_PATH, 'data', DATASET)

try:
    # Clone the course repository only if it is not already present
    if not os.path.exists(BASE_PATH):
        print('Cloning repository...')
        !git clone https://github.com/ggkuhnle/data-analysis-projects.git
    os.chdir(MODULE_PATH)
    if not os.path.exists(DATASET_PATH):
        raise FileNotFoundError('Dataset missing after clone.')
    print('Dataset ready ✅')
except Exception as e:
    # Fallback: ask for a manual upload and save it under ./data
    print('Setup fallback: upload file...')
    os.makedirs('data', exist_ok=True)
    uploaded = files.upload()
    if DATASET in uploaded:
        with open(os.path.join('data', DATASET), 'wb') as f:
            f.write(uploaded[DATASET])
        print('Uploaded dataset ✅')
    else:
        raise FileNotFoundError('Upload food_preferences.txt to continue.')
🧬 6.1 Introduction to Qualitative Research
Qualitative research explores the why and how of experiences, behaviours, and meanings. Instead of measuring quantities (grams, mmol/L), it examines language, stories, and context—for example, why a family chooses certain foods, how participants experience a diet, or what barriers hippos (ok, stakeholders 🦛) face when changing routines.
Unlike quantitative methods that test hypotheses with numbers, qualitative approaches seek rich descriptions and interpretations. You’ll often work with interviews, focus groups, open-ended survey responses, diaries, field notes, or observations.
🎯 Objectives
- Understand what qualitative research is (and isn’t), and when to use it.
- Compare qualitative vs. quantitative approaches and how they complement each other.
- Get hands-on: load and lightly explore open-ended text (food preference notes).
- Prepare for rigorous analysis (sampling, reflexivity, ethics, coding frameworks).
Key concepts
- Data types: interviews, focus groups, observation notes, diaries, open-ended survey text.
- Designs: phenomenology (lived experience), grounded theory (theory building), ethnography (culture/setting), case study, narrative analysis, reflexive thematic analysis.
- Sampling: purposive, maximum variation, snowball, theoretical sampling (until saturation: no new themes emerge).
- Trustworthiness: credibility (member checks, triangulation), dependability (audit trail), confirmability (reflexivity), transferability (thick description).
- Ethics: consent, confidentiality, anonymisation, data minimisation, secure storage.
🧭 When to use qualitative methods?
- To understand experiences (e.g., why participants prefer certain foods).
- To explore contexts and systems (e.g., food access, cultural norms).
- To generate hypotheses and inform interventions or surveys.
- To explain unexpected quantitative results (mixed methods).
🔧 Setup (Colab)
Clone the repository or upload food_preferences.txt (open-ended responses). The setup cell at the top of this notebook handles both paths.
📥 Load the qualitative data
We treat each line as one response (e.g., short interview note or survey comment).
import pandas as pd
from pathlib import Path

txt = Path('data')/'food_preferences.txt'
# One response per non-empty line, with surrounding whitespace removed
responses = [r.strip() for r in txt.read_text(encoding='utf-8').splitlines() if r.strip()]
df = pd.DataFrame({'response_id': range(1, len(responses)+1), 'text': responses})
print('N responses:', len(df))
df.head(5)
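Once the responses are in a DataFrame, a couple of quick checks can catch problems before any reading begins. The sketch below is only a suggestion: it flags exact duplicate lines and counts words per response. The four responses are invented for illustration and stand in for the notebook's `df`; the column names `is_duplicate` and `n_words` are assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; these responses are invented.
df = pd.DataFrame({
    'response_id': [1, 2, 3, 4],
    'text': [
        'I love sweet carrots in the morning.',
        'Grass is fine, but fruit is better.',
        'I love sweet carrots in the morning.',
        'Bitter leaves? No thanks.',
    ],
})

# Flag exact repeats (case-insensitive) so they are noticed rather than double-counted.
df['is_duplicate'] = df['text'].str.lower().duplicated(keep='first')

# Whitespace-separated word count: a rough indication of how much detail a response carries.
df['n_words'] = df['text'].str.split().str.len()

print(df[['response_id', 'n_words', 'is_duplicate']])
print('Duplicates:', int(df['is_duplicate'].sum()))
```

Whether a duplicate is an error or two participants saying the same thing is a judgement call, so the sketch flags rows instead of dropping them.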
🔍 First look (light-touch)
Before coding/themes, a quick familiarisation pass helps: skim, note recurring words, surprising phrases, and tone.
# Print the first eight responses with their IDs for a quick read-through
for i, row in df.head(8).iterrows():
    print(f"{row['response_id']:>2}: {row['text']}")
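During familiarisation it can help to see a recurring word in its surrounding text instead of on its own. The following is a minimal keyword-in-context sketch. The helper name `keyword_in_context`, its `width` parameter, and the sample responses are all hypothetical and chosen only for this example.

```python
import re

def keyword_in_context(texts, keyword, width=20):
    """Return (response number, snippet) pairs showing `keyword` as a whole word
    with up to `width` characters of context on each side."""
    pattern = re.compile(rf'\b{re.escape(keyword)}\b', flags=re.IGNORECASE)
    hits = []
    for i, text in enumerate(texts, start=1):
        for m in pattern.finditer(text):
            start = max(0, m.start() - width)
            end = min(len(text), m.end() + width)
            hits.append((i, text[start:end]))
    return hits

# Invented responses, for illustration only.
sample = [
    'I love sweet carrots in the morning.',
    'Too sweet for me; Sweet things hurt my teeth.',
    'Grass is fine.',
]

for rid, snippet in keyword_in_context(sample, 'sweet'):
    print(f'{rid}: ...{snippet}...')
```

With the notebook's data, `df['text']` could be passed in place of `sample`.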
📏 Rigour in qualitative work
- Reflexivity: keep a short reflexive memo about your assumptions, role, and decisions.
- Audit trail: version a codebook, note inclusion/exclusion criteria, justify transformations.
- Ethics: anonymise identifiers, store raw audio/text securely, manage consent and withdrawal.
- Triangulation: compare data sources (e.g., interview + observation + logs).
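As one illustration of the ethics point, an anonymisation rule can be written down as code so that it is applied consistently and can be recorded in the audit trail. This is a sketch under strong assumptions: the names, participant codes, and the e-mail pattern are hypothetical, and a rule like this will not catch every identifier (nicknames, places, indirect descriptions), so manual review is still needed.

```python
import re

# Hypothetical mapping from real names to participant codes.
# In practice this key would be stored securely and separately from the data.
PSEUDONYMS = {'Hilda': 'P01', 'Bruno': 'P02'}

# Simple pattern for e-mail addresses; an assumption, not an exhaustive matcher.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def anonymise(text, pseudonyms=PSEUDONYMS):
    """Mask e-mail addresses, then replace known names with participant codes."""
    text = EMAIL_RE.sub('[EMAIL]', text)
    for name, code in pseudonyms.items():
        text = re.sub(rf'\b{re.escape(name)}\b', f'[{code}]', text)
    return text

print(anonymise('Hilda said Bruno prefers fruit; contact hilda@example.org.'))
```

E-mail addresses are masked first so that a name inside an address is not partially replaced before the address is recognised.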
🧪 Tiny, safe quantifications
Counting words isn’t the analysis, but it can help orient you. We’ll keep this light and non-dominant.
targets = ['carrot', 'carrots', 'fruit', 'grass', 'sweet', 'bitter']
word_counts = {t: 0 for t in targets}
for t in df['text'].str.lower():
    for w in targets:
        # str.count matches substrings, so 'carrot' also counts the 'carrot' inside 'carrots'
        word_counts[w] += t.count(w)
word_counts
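The loop above counts substrings, so a response containing "carrots" also adds to the tally for "carrot". If whole-word counts are preferred, one possible variant tokenises each response first and looks the targets up in a `Counter`. The three responses below are invented for illustration, and the tokenising pattern (lowercase letters and apostrophes) is an assumption that suits simple English text.

```python
import re
from collections import Counter

# Invented responses, for illustration only.
texts = [
    'I love sweet carrots.',
    'One carrot a day, and some grass.',
    'Fruit is sweet; carrots are sweeter.',
]
targets = ['carrot', 'carrots', 'fruit', 'grass', 'sweet', 'bitter']

# Split each lowercased response into word tokens and tally them.
tokens = Counter()
for t in texts:
    tokens.update(re.findall(r"[a-z']+", t.lower()))

# A Counter returns 0 for words it has never seen, so absent targets are reported as 0.
word_counts = {w: tokens[w] for w in targets}
print(word_counts)
```

Here "sweeter" is not counted as "sweet", and "carrot" and "carrots" are tallied separately; merging such variants is a separate decision worth noting in the audit trail.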
🧩 Next steps: from familiarisation → coding → themes
- Generate initial codes (labels on meaningful segments).
- Group codes into candidate themes.
- Review & refine themes against the data.
- Define & name themes; select vivid excerpts.
- Report with thick description and transparent decisions.
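To make the path from codes to themes concrete, here is a small sketch of one way to hold a codebook, coded segments, and candidate themes in plain Python and pandas. Every code name, definition, theme, and segment below is hypothetical and invented for this example; real codes come from reading the data, and the counts are only an aid to reviewing themes, not the analysis itself.

```python
import pandas as pd

# Hypothetical codebook: code label -> working definition.
codebook = {
    'TASTE': 'Mentions of flavour (sweet, bitter, and so on)',
    'HABIT': 'Routine or timing of eating',
    'ACCESS': 'Availability of food',
}

# Hypothetical candidate themes, each grouping one or more codes.
themes = {
    'Sensory appeal': ['TASTE'],
    'Everyday constraints': ['HABIT', 'ACCESS'],
}

# Coded segments: one row per labelled excerpt (invented examples).
coded = pd.DataFrame([
    {'response_id': 1, 'segment': 'sweet carrots', 'code': 'TASTE'},
    {'response_id': 1, 'segment': 'in the morning', 'code': 'HABIT'},
    {'response_id': 2, 'segment': 'only grass near the river', 'code': 'ACCESS'},
    {'response_id': 3, 'segment': 'too bitter', 'code': 'TASTE'},
])

# Guard against codes that were applied but never defined in the codebook.
unknown = set(coded['code']) - set(codebook)
if unknown:
    raise ValueError(f'Codes missing from codebook: {sorted(unknown)}')

# Attach each segment's theme, then count distinct responses contributing to each theme.
code_to_theme = {code: theme for theme, codes in themes.items() for code in codes}
coded['theme'] = coded['code'].map(code_to_theme)
summary = coded.groupby('theme')['response_id'].nunique()
print(summary)
```

Keeping the codebook and theme grouping as versioned data makes each revision of the themes traceable.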
👉 You’ll practise coding in 6.2 and 6.3 (with reliability checks).
🧩 Exercises
- Reflexive memo (5–8 lines): what prior beliefs might shape your interpretations?
- Context probe: list three non-text sources you’d triangulate (e.g., observation, diet logs, environmental notes)—why?
- Ethics: identify any direct identifiers in these responses; propose an anonymisation rule.
✅ Conclusion
You’ve set a solid foundation for qualitative work—design choices, rigour, ethics, and a first “feel” for the text. Next, we’ll preprocess and move toward codes and themes.