Using R Coding and Then ChatGPT to Figure Out an Outbreak

Ultimately, you’re the human with the lived experience, and GPTs are tools. Use them (your experience and the tools) wisely.

8 min read · Mar 17, 2024

A future epidemiologist? Not quite… (Image via DALL-E artificial intelligence from OpenAI.)

In my last post, I wrote about investigating an outbreak of foodborne disease and how the p-value from the statistical analysis I ran could be changed. The p-value was initially not significant, but increasing the sample size made it significant without changing the measure of association. Chasing that result is a form of p-hacking if you’re a little too obsessed with p-values.

P-Hacking for Beginners

That famous p-value in studies should not have the final say in your conclusions.

epiren.medium.com
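
To make the sample-size point concrete, here is a minimal sketch in R. The counts are made up for illustration and are not the data from either post. It builds one 2x2 table, multiplies every cell by ten, and compares the risk ratio and the chi-square p-value for both tables.

```r
# Hypothetical 2x2 counts for illustration only (not the data from the post).
# Rows: exposure status; columns: illness status.
small <- matrix(c(12, 28,
                   6, 34),
                nrow = 2, byrow = TRUE,
                dimnames = list(exposure = c("ate", "did_not_eat"),
                                outcome  = c("ill", "not_ill")))
large <- small * 10  # same proportions, ten times as many people

# Risk ratio: risk of illness among those who ate / risk among those who did not
risk_ratio <- function(tab) {
  risk <- tab[, "ill"] / rowSums(tab)
  unname(risk["ate"] / risk["did_not_eat"])
}

rr_small <- risk_ratio(small)
rr_large <- risk_ratio(large)

# Pearson chi-square test without continuity correction
p_small <- chisq.test(small, correct = FALSE)$p.value
p_large <- chisq.test(large, correct = FALSE)$p.value

print(c(rr_small = rr_small, rr_large = rr_large))
print(c(p_small = p_small, p_large = p_large))
```

In this toy example the risk ratio is the same in both tables, while the p-value falls from above 0.05 to below it purely because of the larger sample.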

For this post, I will use a sample of 1,000 fake people who all ate at a fictional big event, after which some of them became sick. I will then show you how we can determine the most likely food source of the outbreak. And I will finish with a discussion of the statistical analysis I did using ChatGPT’s Advanced Data Analysis tool.

I will link to the data and code at the end of the post.

First, Look At Your Data

One of the first things you should do with your data is to look at it. Open it up in your software of choice and look around. Are there any glaring errors, like weights for humans in the thousands or blood glucose measurements in the decimals? Yes, that’s right… You need to know the ins and outs of what you’re looking at.
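
One way to look for glaring errors is a simple range check. The sketch below is only an illustration: the toy data frame, the columns weight_kg and glucose_mg_dl, and the plausible ranges are all assumptions of mine, since the outbreak data in this post contains only yes/no answers.

```r
# Toy data for illustration; weight_kg and glucose_mg_dl are hypothetical columns
toy <- data.frame(
  personID      = c("p001", "p002", "p003", "p004"),
  weight_kg     = c(70.5, 82.0, 6400, 55.3),  # 6400 is a planted error
  glucose_mg_dl = c(95, 0.9, 110, 102),       # 0.9 is a planted error
  stringsAsFactors = FALSE
)

# Plausible ranges (lower, upper) are assumptions for this sketch; set your own
limits <- list(weight_kg = c(2, 400), glucose_mg_dl = c(20, 1000))

# For each checked column, list the IDs whose value falls outside the range
flagged <- lapply(names(limits), function(col) {
  out_of_range <- toy[[col]] < limits[[col]][1] | toy[[col]] > limits[[col]][2]
  toy$personID[out_of_range]
})
names(flagged) <- names(limits)

print(flagged)
```

The flagged IDs are a list of records to go back and verify, not a list of records to delete.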

Then do some exploratory data analysis. Do you have missing data? Is there a pattern to the missing data, or something that could explain it? For example, are you missing answers for drinking bottled water based on drinking agua fresca? Or were people who ate the fish less likely to tell you if they ate the salad?
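
The missing-data questions above can be sketched in a few lines of R. The toy data frame here is made up; it borrows column names from the outbreak data, but the values and the missingness pattern are planted for illustration.

```r
# Toy data for illustration; values and missingness are made up
toy <- data.frame(
  agua_fresca  = c("Y", "Y", "Y", "N", "N", "N", "Y", "N"),
  bottle_water = c(NA,  NA,  "Y", "Y", "N", "Y", NA,  "Y"),
  fish         = c("Y", "N", "Y", "N", "Y", "N", "N", "Y"),
  salad        = c("Y", "N", NA,  "Y", NA,  "N", "Y", "Y"),
  stringsAsFactors = FALSE
)

# How many answers are missing in each column?
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Is a missing bottled-water answer related to drinking agua fresca?
water_missing_by_agua <- table(agua_fresca   = toy$agua_fresca,
                               water_missing = is.na(toy$bottle_water))
print(water_missing_by_agua)

# Proportion of missing bottled-water answers within each agua fresca group
prop_missing <- tapply(is.na(toy$bottle_water), toy$agua_fresca, mean)
print(prop_missing)
```

If the proportion of missing answers differs sharply between groups, as it does in this planted example, the data are probably not missing at random and that deserves an explanation before any analysis.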

Then just do a summary of the responses.

> summary(df)
   personID         illness baked_potato salad   soda    bottle_water agua_fresca dessert beef    chicken fish   
 Length:1000        N:681   N:650        N:599   N:485   Y:1000       N:516       N:101   N:727   N:683   N:528…

Written by René F. Najera, MPH, DrPH
