What do people do online?
Using data donation to understand digital behavior.
a workshop at the SPP Junior Researcher Meeting
October 22, 2025
frodew.github.io/workshop_spp_annual_meeting/
1️⃣ What is digital trace data?
2️⃣ What is data donation? - The participant’s perspective.
3️⃣ What is data donation? - The researcher’s perspective.
Please raise your hand 🤚 if you …
🎓 PhD, University of Mannheim & Institute for Employment Research
🧐 Research interests: "I study what people do online."
More info: github.com/frodew & frieder-rodewald.de
🎓 Postdoc at Institute for Employment Research & LMU
🧐 Research interests: Social inequalities (in the labor market), potentials of digital trace data for labor market research
👉 part of the SPP project Integrating Data Donations in Survey Infrastructure
Our Team
A huge thanks to Valerie Hase for contributing to the conceptualization of a previous data donation workshop at CompText in Vienna.
🤔 Which examples of digital trace data do you know?
💡 Definition: The recording and storing of activities on digital platforms to draw conclusions about digital and analog phenomena.
This might include:
⚠️ Be careful: These "advantages" are often claimed, but not empirically proven.
👉 Digital traces are not necessarily less biased, cheaper, or larger.
🤔 Questions?
The participant’s perspective.
👉 Solution: Platforms offer data download packages (DDPs), which users can request and download to inspect data.
👉 Consequence: Researchers use DDPs as part of user-centric data donation studies.
🤔 Please raise your hand ✋
(Before last week…) Who has ever tried to request their data from an online platform?
💡 Definition: Data donation studies are a user-centric method for collecting digital traces.
For platforms like YouTube, Instagram, or LinkedIn, for example… (Hase et al. 2024)
Compared to APIs (Ohme et al. 2024)…
👉 but can be burdensome for participants!
Survey | Request & Download Data | Extract Data | Inspect Data | Consent
Survey start page
Different degrees of standardization for data requests (Hase et al. 2024)…
Request manual for LinkedIn on computer
Request manual for Instagram on computer
Data overview on data donation platform
🤔 You might have already requested and downloaded your data in preparation for today. Did you encounter any difficulties in requesting and downloading your data?
We will dive into the content of your data a bit later.
🤔 Questions?
The researcher’s perspective.
🤔 Which methodological decisions do researchers have to make in data donation studies?
Survey | Request & Download Data | Extract Data | Inspect Data | Consent
📢 Task: Try it yourself.
Take a look at your downloaded data. What do you see? Did anything catch your eye?
Feel free to work in groups of 2-3 people for 5 minutes.
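A first look at a download package can also be scripted. The following is a minimal sketch, not part of the workshop material: it counts the files in a zip archive by file extension, using only the Python standard library. The function name `summarize_zip` and the demo archive with its file names are made up for illustration; a real DDP will look different depending on the platform.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip(zip_source):
    """Count the files in a zip archive by file extension."""
    counts = Counter()
    with zipfile.ZipFile(zip_source, "r") as zip_ref:
        for info in zip_ref.infolist():
            if info.is_dir():
                continue  # skip directory entries
            # Paths inside a zip always use forward slashes
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[suffix] += 1
    return counts


# Made-up archive standing in for a real download package (assumption)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zip_ref:
    zip_ref.writestr("export/subscriptions.csv", "Channel Title\nDemo Channel\n")
    zip_ref.writestr("export/history/watch-history.json", "[]")
    zip_ref.writestr("export/history/search-history.json", "[]")
demo.seek(0)

summary = summarize_zip(demo)
for suffix, count in sorted(summary.items()):
    print(suffix, count)
```

To try it on your own data, pass the path of your downloaded zip file to `summarize_zip` instead of the demo archive.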
Different degrees of standardization for DDP content (Hase et al. 2024)…
subscriptions.csv (before processing)
import json
import zipfile

import pandas as pd


def extract_youtube_content_from_zip_folder(zip_file_path, possible_filenames):
    """Extract content from YouTube data export zip file using filenames"""
    try:
        with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
            # Get the list of file names in the zip file
            filenames = zip_ref.namelist()
            # Look for matching files
            for possible_filename in possible_filenames:
                for filename in filenames:
                    if possible_filename in filename:
                        try:
                            # Process based on file extension
                            if filename.endswith(".json"):
                                with zip_ref.open(filename) as json_file:
                                    json_content = json.loads(json_file.read())
                                    return json_content
                            elif filename.endswith(".csv"):
                                with zip_ref.open(filename) as csv_file:
                                    csv_content = pd.read_csv(csv_file)
                                    return csv_content
                        # Assumption: the slide excerpt is cut off here;
                        # unreadable files are skipped and the search continues
                        except (json.JSONDecodeError, UnicodeDecodeError, pd.errors.ParserError):
                            continue
    # Assumption: an invalid zip archive yields None
    except zipfile.BadZipFile:
        return None
    # No matching file was found
    return None


def extract_subscriptions(subscriptions_csv):
    """Extract YouTube channel subscriptions"""
    # Define column name (German or English export)
    if "Kanaltitel" in subscriptions_csv.columns:
        channel_column = "Kanaltitel"
    else:
        channel_column = "Channel Title"
    # Define description
    channel_name = "Subscribed Channel"
    # Create DataFrame with just the channel names
    subscriptions_df = pd.DataFrame({channel_name: subscriptions_csv[channel_column]})
    return subscriptions_df
subscriptions.csv (after processing)
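The before/after step can be tried without a real export. The sketch below is an illustration under stated assumptions, not the workshop's extraction script: it uses the standard library's `csv` module instead of pandas, and the helper names (`read_csv_rows_from_zip`, `find_first_column`), the archive path, and the channel data are all made up. Only the two header variants "Channel Title" and "Kanaltitel" come from the code above.

```python
import csv
import io
import zipfile

# Header variants taken from the slides; other languages would need more entries
CHANNEL_COLUMN_CANDIDATES = ["Channel Title", "Kanaltitel"]


def read_csv_rows_from_zip(zip_source, name_fragment):
    """Return the rows of the first CSV whose path contains name_fragment."""
    with zipfile.ZipFile(zip_source, "r") as zip_ref:
        for filename in zip_ref.namelist():
            if name_fragment in filename and filename.endswith(".csv"):
                with zip_ref.open(filename) as raw_file:
                    # utf-8-sig also strips a byte order mark if one is present
                    text_file = io.TextIOWrapper(raw_file, encoding="utf-8-sig", newline="")
                    return list(csv.DictReader(text_file))
    return None


def find_first_column(rows, candidates):
    """Return the first candidate header present in the rows, or None."""
    if not rows:
        return None
    for name in candidates:
        if name in rows[0]:
            return name
    return None


# Made-up German-language export (hypothetical path and data)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zip_ref:
    zip_ref.writestr(
        "export/subscriptions.csv",
        "Kanal-ID,Kanaltitel\nid1,Demo Channel A\nid2,Demo Channel B\n",
    )
demo.seek(0)

rows = read_csv_rows_from_zip(demo, "subscriptions")
column = find_first_column(rows, CHANNEL_COLUMN_CANDIDATES)
# Keep only the channel names and drop every other column
subscriptions = [{"Subscribed Channel": row[column]} for row in rows]
print(subscriptions)
```

Keeping the header variants in one list makes it easier to add further export languages later than a chain of if/else branches would.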
🙃 Thank you for participating! We are happy to talk with you about data donation (and anything else) over the next few days.
For example …
🤔 What do you think: Which participant characteristics may correlate with non-response or non-compliance?
🤔 Questions?
Data Donation Workshop - Frieder & Sebastian