estimation: Questions and suggestions for revision


Open questions

1. The "previous BRFSS lab." The devnote says "Make it a continuation of the BRFSS lab." I don't see a previous BRFSS lab in the dp course directory. I've written the estimation lab as if BRFSS is introduced here for the first time. Is there a prior BRFSS lab I should reference and build on? If so, what did students do in it?

2. Dataset source. The notebook currently loads BRFSS data from a GitHub URL (a cleaned Kaggle version). This is fragile: if that file is moved or the repository is deleted, the notebook breaks. Should I bundle a local copy in the repo? The file is ~25 MB, which is large for a git repo. Alternatives: (a) use the kaggle CLI to download it, (b) use a different, smaller health dataset, or (c) host it on makingwithcode.org.

3. A4.3.5 (association rules) and the crime analysis example. The IB standard explicitly mentions crime analysis as an example. Should I include a secondary example using crime data (alongside the BRFSS data) to make the direct connection to the standard clearer for teachers reviewing alignment?

4. Should students train a classification model here? The teaching note raises this as an open question. Having students predict diabetes (binary) from health behaviors using logistic regression or a decision tree would unify the regression and classification threads. But it might make the lab feel too long or unfocused. What do you prefer?

5. Jupyter notebooks. This is the first lab in the sequence to use notebooks. The devnote says "Back to jupyter notebooks," implying students have used them before. Should the lab include a brief refresher on notebook usage, or assume competence from a previous course?

6. US-centric data. BRFSS covers only US respondents. For an IB course with international students, this may feel less relevant. Should I include an alternative non-US dataset (e.g., WHO data) or frame the US context explicitly?


Suggestions for improvement