Bellabeat marketing case study

About the company

Founded by Urška Sršen and Sando Mur, Bellabeat is a high-tech company that manufactures health-focused smart products. Since its founding in 2013, Bellabeat has grown rapidly and positioned itself as a tech-driven wellness company for women. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits.

Business Task

Sršen knows that an analysis of Bellabeat's available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on one Bellabeat product and analyze smart device usage data to gain insight into how people already use their smart devices. Using this information, she would then like high-level recommendations for how these trends can inform Bellabeat's marketing strategy.

Ask

Sršen asks for an analysis of smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and for one Bellabeat product to be selected to apply these insights to.
Using this information, make high-level recommendations for how these trends can inform Bellabeat's marketing strategy.

Questions to consider:

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Prepare

Data source and integrity:

  • For this project I will use the FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius). According to its description, this Kaggle dataset contains personal fitness tracker data from thirty eligible Fitbit users who consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users' habits. The daily activity table itself contains 33 distinct user IDs, as the counts further down show.
  • Since this is a high-level analysis, I will focus mainly on the daily tables rather than the minute-level or hourly ones, using hourly intensity only to look at activity by time of day. I will then examine any trends and correlations that may help inform marketing strategy.

Process

  • Clean and prepare the data for analysis

Key tasks:

  1. Check the data for errors.
  2. Choose your tools.
  3. Transform the data so you can work with it effectively.
  4. Document the cleaning process.
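The "check the data for errors" task can be made concrete with a small helper that counts exact duplicate rows and missing values per column. This is a minimal sketch in base R. The function name `check_table` and the toy data frame are hypothetical. With the real data you would call it on tables such as `activity` or `sleep` after importing them.

```r
# Hypothetical helper: basic error checks for one imported table (base R only)
check_table <- function(df) {
  list(
    rows = nrow(df),
    duplicate_rows = sum(duplicated(df)),   # rows that exactly repeat an earlier row
    missing_by_column = colSums(is.na(df))  # number of NA values in each column
  )
}

# Hypothetical toy rows shaped loosely like sleepDay_merged.csv
toy <- data.frame(
  Id = c(1, 1, 2, 2),
  SleepDay = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(420, 420, NA, 390)
)

result <- check_table(toy)
print(result$duplicate_rows)     # 1 (row 2 repeats row 1)
print(result$missing_by_column)  # one NA, in TotalMinutesAsleep
```

Duplicates found this way could then be removed with `distinct()` or `unique()` before merging.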

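# Loading libraries
# tidyverse already attaches dplyr, ggplot2 and tidyr (and lubridate in tidyverse 2.0+),
# so separate library() calls for those packages are not needed

library(tidyverse)
library(lubridate)
library(plotly)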

# Importing the data

data_dir <- "../input/fitbit/Fitabase Data 4.12.16-5.12.16"

activity <- read.csv(file.path(data_dir, "dailyActivity_merged.csv"))
calories <- read.csv(file.path(data_dir, "dailyCalories_merged.csv"))
intensities <- read.csv(file.path(data_dir, "dailyIntensities_merged.csv"))
steps <- read.csv(file.path(data_dir, "dailySteps_merged.csv"))
sleep <- read.csv(file.path(data_dir, "sleepDay_merged.csv"))
weight <- read.csv(file.path(data_dir, "weightLogInfo_merged.csv"))
hourly_intensities <- read.csv(file.path(data_dir, "hourlyIntensities_merged.csv"))

# previewing the data

head(activity)

Fixing the date formatting

  • The date columns (for example, ActivityDate in the activity table) are stored as character strings. Converting them to date or date-time types makes them easier to sort, filter, and join on.

# The date variables are in character/string format, here I am formatting them to date/datetime and separating date and time.

# activity
activity$ActivityDate = as.Date(activity$ActivityDate, format="%m/%d/%Y")
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")
activity$weekday <- ordered(weekdays(activity$ActivityDate), levels=c("Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Saturday", "Sunday"))

# steps
steps$ActivityDay = as.Date(steps$ActivityDay, format="%m/%d/%Y")
steps$date <- format(steps$ActivityDay, format = "%m/%d/%y")

# intensities
intensities$ActivityDay = as.POSIXct(intensities$ActivityDay, format="%m/%d/%Y", tz=Sys.timezone())
intensities$date <- format(intensities$ActivityDay, format = "%m/%d/%y")

# hourly intensities
# %I (12-hour clock) is the hour directive meant to pair with the AM/PM marker %p
hourly_intensities$ActivityHour = as.POSIXct(hourly_intensities$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
hourly_intensities$hour_of_day <- hour(hourly_intensities$ActivityHour)
hourly_intensities$time <- format(hourly_intensities$ActivityHour, format = "%H:%M")
hourly_intensities$date <- format(hourly_intensities$ActivityHour, format = "%m/%d/%y")

# daily calories
calories$ActivityDay = as.POSIXct(calories$ActivityDay, format="%m/%d/%Y")
calories$date <- format(calories$ActivityDay, format = "%m/%d/%y")

# sleep
sleep$SleepDay = as.POSIXct(sleep$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")

# weight
weight$Date = as.POSIXct(weight$Date, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
weight$date <- format(weight$Date, format = "%m/%d/%y")
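The timestamps in the hourly and weight files look like `4/12/2016 7:00:00 PM`. In R's `strptime()` formats, the AM/PM marker `%p` is meant to be used with the 12-hour directive `%I`, not the 24-hour `%H`. Below is a quick base-R check on a few hand-written stamps in that style. These are assumed examples, not values read from the files. The check temporarily switches `LC_TIME` to the C locale, on the assumption that the AM/PM markers are English.

```r
# Check that %I + %p maps midnight, morning, noon and evening hours correctly
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # assumption: English "AM"/"PM" markers

stamps <- c("4/12/2016 12:00:00 AM", "4/12/2016 7:00:00 AM",
            "4/12/2016 12:00:00 PM", "4/12/2016 7:00:00 PM")
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
hours <- as.POSIXlt(parsed)$hour  # 0-23 hour of day
print(hours)  # 0 7 12 19

invisible(Sys.setlocale("LC_TIME", old_locale))  # restore the previous locale
```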

Exploring the datasets

# checking first and last dates, 1 month of data

min(activity$ActivityDate)  # 2016-04-12
max(activity$ActivityDate)  # 2016-05-12

# counting unique ids in tables will tell us how many users we have in the datasets

n_distinct(activity$Id)            # 33
n_distinct(steps$Id)               # 33
n_distinct(intensities$Id)         # 33
n_distinct(hourly_intensities$Id)  # 33
n_distinct(calories$Id)            # 33
n_distinct(sleep$Id)               # 24
n_distinct(weight$Id)              # 8
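The distinct-ID counts above show 33 users with activity data, 24 with sleep records and 8 with weight logs. These counts hint at how representative each merged table will be. The sketch below checks the overlap directly with base set operations. The ID vectors are hypothetical stand-ins for `unique(activity$Id)`, `unique(sleep$Id)` and `unique(weight$Id)`.

```r
# Hypothetical toy ID sets standing in for the real unique Ids
activity_ids <- c(101, 102, 103, 104, 105)
sleep_ids    <- c(101, 102, 104)
weight_ids   <- c(102)

overlap <- data.frame(
  table = c("sleep", "weight"),
  # number of activity users that also appear in each table
  users = c(length(intersect(activity_ids, sleep_ids)),
            length(intersect(activity_ids, weight_ids))),
  # same thing as a share of all activity users
  share_of_activity_users = c(mean(activity_ids %in% sleep_ids),
                              mean(activity_ids %in% weight_ids))
)
print(overlap)

# Users with activity data but no sleep records
no_sleep <- setdiff(activity_ids, sleep_ids)
print(no_sleep)  # 103 105
```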

# summarizing user frequency, one user logged only 4 days

user_days_logged <- activity %>% group_by(Id) %>% summarise(frequency = n()) %>% arrange(frequency)

head(user_days_logged, 10)

# performing some initial table summaries

# activity
activity %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes, Calories) %>%
  summary()

# explore num of active minutes per category
activity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()

# sleep
sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()

# weight
weight %>%
  select(WeightKg, BMI, Fat) %>%
  summary()

activity %>%
  summarise(
      average_sedentary_hours = mean(SedentaryMinutes)/60,
      average_calories = mean(Calories),
      average_total_steps = mean(TotalSteps)
  )

sleep %>%  
  summarise(
      average_sleep_hours = mean(TotalMinutesAsleep)/60,
      average_time_in_bed = mean(TotalTimeInBed)/60
  )

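From the summary statistics we can quickly get some helpful information about app users:

  • Users spent an average of 16.5 hours a day sedentary.
  • Users slept an average of about 7 hours on days with a sleep record. A single day can include more than one sleep record, so this figure is per day rather than per sleep session.
  • Users took an average of 7,637 steps and burned 2,303 calories a day.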
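These averages are per logged day, and the minute columns become hours after dividing by 60. For example, about 990 sedentary minutes is 16.5 hours. The sketch below repeats the summary on hypothetical toy rows and also reports the share of logged days that reach a step goal. The 10,000-step threshold is an assumption borrowed from the recommendations section.

```r
# Hypothetical toy rows shaped like dailyActivity_merged.csv
toy_days <- data.frame(
  TotalSteps = c(4000, 12000, 7500, 10000),
  SedentaryMinutes = c(1200, 600, 900, 1260)
)

step_goal <- 10000  # assumption: daily goal taken from the recommendations
daily_summary <- data.frame(
  average_sedentary_hours = mean(toy_days$SedentaryMinutes) / 60,  # minutes -> hours
  average_total_steps = mean(toy_days$TotalSteps),
  share_days_at_goal = mean(toy_days$TotalSteps >= step_goal)      # TRUE/FALSE mean = share
)
print(daily_summary)  # 16.5 hours, 8375 steps, 0.5 of days at goal
```

With the real data, the same three expressions can be applied to the `activity` table.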

Merging tables

# merging activity and calories tables

daily_activity_cals <- merge(activity, calories, by=c('Id', 'Calories', 'date'), all=TRUE) %>% drop_na() %>% select(-ActivityDay)

head(daily_activity_cals)

# merging sleep and activity tables

sleep_activity <- merge(sleep, activity, by=c('Id', 'date')) %>% select(-SleepDay)

# merging activity and weight tables

weight_activity <- merge(activity, weight, by= c('Id', 'date')) %>% select(-ActivityDate, -Date)
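The merges above join on `Id` plus the formatted `date` string. By default, `merge()` keeps only rows whose key pair appears in both tables (an inner join). With `all = TRUE` it keeps every row from both sides and fills unmatched columns with `NA`, which is why the activity and calories merge is followed by `drop_na()`. The toy sketch below uses hypothetical values to show the row counts in each case.

```r
# Hypothetical toy tables keyed by Id and formatted date
a <- data.frame(Id = c(1, 1, 2), date = c("04/12/16", "04/13/16", "04/12/16"),
                TotalSteps = c(5000, 8000, 3000))
b <- data.frame(Id = c(1, 2, 3), date = c("04/12/16", "04/12/16", "04/12/16"),
                TotalMinutesAsleep = c(420, 390, 450))

inner <- merge(a, b, by = c("Id", "date"))              # only key pairs found in both
outer <- merge(a, b, by = c("Id", "date"), all = TRUE)  # all rows, NA where unmatched
outer_complete <- outer[complete.cases(outer), ]         # base-R analogue of drop_na()

print(nrow(inner))           # 2
print(nrow(outer))           # 4
print(nrow(outer_complete))  # 2
```

Here, dropping incomplete rows after the full join leaves the same rows as the inner join. That only holds because the toy tables have no other missing values, since `drop_na()` also removes rows with `NA` in non-key columns.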

Analyses & Plotting

Key tasks:

  1. Aggregate your data so it’s useful and accessible.
  2. Organize and format your data.
  3. Perform calculations.
  4. Identify trends and relationships.

# summing steps by weekday (the table holds steps, not sedentary time)

steps_by_weekday <- activity %>% group_by(weekday) %>% summarise(sum_steps = sum(TotalSteps))
steps_by_weekday

# Plotting the sum of steps by weekday

ggplot(steps_by_weekday, aes(weekday, sum_steps)) + 
    geom_col() +
    labs(title="Total steps taken by days of the week",
         x="",
         y="Sum of total steps")
weekdays_count <- activity %>% group_by(weekday) %>% summarise(count = n())
weekdays_count
# Plotting the weekday counts

ggplot(weekdays_count, aes(weekday, count)) + 
    geom_col() + 
    labs(title="Frequency of activity by days of the week",
         x="")

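Users were most active in the middle of the week, with a peak on Tuesday. Both charts show sums and counts, however. The study window runs from Tuesday, April 12 to Thursday, May 12, 2016, so Tuesday, Wednesday, and Thursday each occur five times, while every other weekday occurs four times. Part of the midweek peak therefore reflects those extra days, and per-day averages give a fairer comparison.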
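The 31-day window contains five Tuesdays, Wednesdays, and Thursdays but only four of every other weekday, so sums and counts by weekday lean toward midweek. The base-R sketch below first counts the calendar days and then averages per logged day. The toy rows at the end are hypothetical. With the real data, the last call would be `aggregate(TotalSteps ~ weekday, data = activity, FUN = mean)`.

```r
# Count how often each weekday occurs between 2016-04-12 and 2016-05-12
study_days <- seq(as.Date("2016-04-12"), as.Date("2016-05-12"), by = "day")
weekday_num <- as.integer(format(study_days, "%u"))  # ISO weekday: 1 = Monday ... 7 = Sunday
days_per_weekday <- table(factor(weekday_num, levels = 1:7,
                                 labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")))
print(days_per_weekday)  # Tue, Wed, Thu: 5; all other weekdays: 4

# Average per logged day instead of summing (hypothetical toy rows)
toy <- data.frame(weekday = c("Tue", "Tue", "Sat"), TotalSteps = c(8000, 6000, 9000))
avg <- aggregate(TotalSteps ~ weekday, data = toy, FUN = mean)
print(avg)  # Sat 9000, Tue 7000
```

In the toy rows, Tuesday has the larger sum (14,000 vs 9,000) but the smaller average, which is exactly the effect the extra midweek days can create.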

# creating groups of categories will help summarize and visualize based on how active the user was
# each row of `activity` is one user-day, so each day (not each user) gets a category;
# days that match none of the four patterns get NA from case_when() and are removed by drop_na()
# (mutate() + select() replaces a multi-row summarise(), which is deprecated in dplyr 1.1.0)

group_by_usertype <- 
    activity %>%
    mutate(
        user_type = factor(case_when(
        SedentaryMinutes > mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Sedentary",
        SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes > mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Lightly Active",
        SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes > mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Fairly Active",
        SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes > mean(VeryActiveMinutes) ~ "Very Active"
        ), levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))
    ) %>%
    select(user_type, Calories, Id) %>%
    drop_na()

head(group_by_usertype)
# plotting user categories as a share of all classified days

group_by_usertype %>%
    count(user_type) %>%
    mutate(total_percent = n / sum(n)) %>%
    ggplot(aes(user_type, y = total_percent, fill = user_type)) +
    geom_col() +
    scale_y_continuous(labels = scales::percent) +
    labs(title = "User type distribution", x = NULL) +
    theme(legend.position = "none", text = element_text(size = 20), plot.title = element_text(hjust = 0.5))
# Box plot of calories burned by user type

ggplot(group_by_usertype, aes(user_type, Calories, fill = user_type)) +
    geom_boxplot() +
    labs(title = "Calories burned by User type", x = NULL) +
    theme(legend.position = "none", text = element_text(size = 20), plot.title = element_text(hjust = 0.5))

Sedentary is the largest category, and, as expected, the most calories were burned by those who were 'Very Active'.
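The user-type rule is applied row by row, so it labels user-days rather than users. A day gets a label only when exactly one of the four patterns holds against the column means, and any other mix becomes `NA` and is dropped. The sketch below re-expresses the same rule in base R on hypothetical toy days, using nested `ifelse()` in place of dplyr's `case_when()`, to show which days get a label.

```r
# Same four patterns as group_by_usertype, written with base R (hypothetical helper)
classify_day <- function(sed, light, fair, very) {
  above <- function(x) x > mean(x)
  below <- function(x) x < mean(x)
  label <- ifelse(above(sed) & below(light) & below(fair) & below(very), "Sedentary",
           ifelse(below(sed) & above(light) & below(fair) & below(very), "Lightly Active",
           ifelse(below(sed) & below(light) & above(fair) & below(very), "Fairly Active",
           ifelse(below(sed) & below(light) & below(fair) & above(very), "Very Active", NA))))
  factor(label, levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))
}

# Five hypothetical days; the last one is above average in both sedentary and light minutes
types <- classify_day(
  sed   = c(1300, 700, 800, 900, 1000),
  light = c(100, 400, 150, 150, 250),
  fair  = c(5, 10, 60, 10, 30),
  very  = c(0, 5, 5, 90, 40)
)
print(types)  # Sedentary, Lightly Active, Fairly Active, Very Active, NA
```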

# plotting hourly intensity to see what time of day users were most active

hourly_intensities %>%
  group_by(time) %>%
  drop_na() %>%
  summarise(average_total_intensity = mean(TotalIntensity)) %>%
    ggplot(aes(time, y=average_total_intensity)) + 
        geom_col() +
        theme(axis.text.x = element_text(angle = 30)) +
        labs(
            title="Average total intensity over time of day",
            x="Time of day",
            y="Average intensity")

Most users were active during the evening hours between 5PM and 8PM with a peak around 7PM.

# looking at relationship between BMI and weight

ggplot(weight, aes(x=BMI, y=WeightPounds)) + 
  geom_point()+ labs(title="BMI vs Weight")

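The `Fat` column is mostly missing in the weight summary above, so these data cannot show how BMI relates to body fat. BMI is calculated from weight and height, so BMI and weight are tied together by construction. The plot does show one user with a very high weight and BMI.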
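BMI is weight in kilograms divided by height in metres squared, so for a single user with a fixed height, BMI rises in direct proportion to weight. The sketch below backs out the implied height from `WeightKg` and `BMI`, using hypothetical toy rows rather than the real weight log, to make that dependence visible.

```r
# BMI = weight_kg / height_m^2, so height_m = sqrt(weight_kg / BMI)
toy_weight <- data.frame(Id = c(1, 1, 2),
                         WeightKg = c(60, 66, 90),
                         BMI = c(22.0, 24.2, 28.1))
toy_weight$implied_height_m <- sqrt(toy_weight$WeightKg / toy_weight$BMI)
print(round(toy_weight$implied_height_m, 2))  # 1.65 1.65 1.79
```

User 1's weight and BMI both rise by 10% between the two logs while the implied height stays the same.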

# plotting the relationship between steps taken and calories

ggplot(activity, aes(x=TotalSteps, y=Calories)) + 
  geom_point()+ labs(title="Total steps taken vs number of calories burned")

As expected, there is a positive relationship between total steps and calories burned.
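"Positive relationship" can be quantified with a correlation coefficient and a simple linear fit. The sketch below uses hypothetical toy values. With the real data the same calls would be `cor(activity$TotalSteps, activity$Calories)` and `lm(Calories ~ TotalSteps, data = activity)`. Correlation describes association only.

```r
# Hypothetical toy days with steps and calories rising together
toy <- data.frame(TotalSteps = c(2000, 5000, 8000, 11000, 14000),
                  Calories   = c(1600, 1900, 2300, 2600, 3000))

r <- cor(toy$TotalSteps, toy$Calories)         # Pearson correlation
fit <- lm(Calories ~ TotalSteps, data = toy)   # calories per extra step = slope
print(round(r, 3))  # 0.999
print(coef(fit))
```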

# plotting weight in pounds and calories burned

ggplot(weight_activity, aes(x=WeightPounds, y=Calories)) + 
  geom_point()+
  labs(title="Weight in pounds and how many calories are burned")

Only 8 users logged their weight, but I plotted weight against calories burned to see where those users fall in the range. In this small sample, users around 200 pounds burned the most calories.

# Pie chart of user type distribution using plotly.
# Count classified days per type first so the slices match the bar chart above;
# passing values = ~Calories would instead show each type's share of calories burned.

pie_data <- group_by_usertype %>% count(user_type)

colors <- c('rgb(211,94,96)', 'rgb(128,133,133)', 'rgb(144,103,167)', 'rgb(171,104,87)')

fig <- plot_ly(pie_data, labels = ~user_type, values = ~n, type = 'pie',
        textposition = 'inside',
        textinfo = 'label+percent',
        insidetextfont = list(color = '#FFFFFF'),
        hoverinfo = 'text',
        text = ~paste(n, "days"),
        marker = list(colors = colors,
                      line = list(color = '#FFFFFF', width = 1)),
                      # The 'pull' attribute can also be used to create space between the sectors
        showlegend = FALSE)
fig <- fig %>% layout(title = 'User type distribution',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

fig

The bar chart of user types wasn't very clear to me, so I used Plotly to show the same distribution as a labeled pie chart for another view.

Summary of analysis:

  • Users spent an average of 16.5 hours a day sedentary.
  • Users slept an average of about 7 hours on days with a sleep record.
  • Users took an average of 7,637 steps and burned 2,303 calories a day.
  • Users were most active in the middle of the week, with a peak on Tuesday (the study window contains five Tuesdays, Wednesdays, and Thursdays but only four of every other weekday), and between 5PM and 8PM, with a peak around 7PM.
  • Body fat was rarely recorded, so the data cannot show how BMI relates to body fat; one user had a very high weight and BMI.
  • As expected, there is a positive relationship between total steps and calories burned.

Recommendations:

  1. Since 52% of users were classified as Sedentary and users averaged 16.5 sedentary hours a day, I would assume many of them work full-time sedentary jobs. They seem to exercise after work, between 5PM and 8PM in the middle of the week, though not as much as recommended. On average, users take about 2,400 fewer steps than the recommended 10,000 steps per day (10,000 - 7,637 = 2,363).
  2. Body fat was rarely recorded, and BMI alone does not describe body composition, so the app should encourage users to measure and log their body fat percentage rather than relying on BMI alone.
  3. Since both total steps and 'Very Active' days are associated with more calories burned, the marketing strategy should focus on motivating users to be more active throughout the week. The U.S. Department of Health and Human Services recommends 150 minutes a week of moderate-intensity exercise, which works out to 30 minutes a day on five days. I would suggest sending users a variety of reminders and strategies to meet that goal, such as spreading those 30 minutes of activity throughout the day and the week, and standing up for 30 minutes for every hour of sitting.
  4. I would also recommend providing easy-to-prepare, low-calorie meal recipes so that users can lower their calorie intake.
  5. Marketing strategy should focus on motivating users to increase the intensity of their activity and to eat healthy meals.
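The two targets in the recommendations reduce to simple arithmetic. One is the gap between the observed average and a 10,000-step day. The other is 150 weekly minutes split into daily sessions. The sketch below uses the averages reported above. The five-day split is an assumption that matches the 30-minute figure in recommendation 3.

```r
average_steps <- 7637       # average daily steps reported in the summary
recommended_steps <- 10000  # daily step target cited in recommendation 1
step_gap <- recommended_steps - average_steps
print(step_gap)  # 2363

weekly_minutes <- 150  # weekly moderate-intensity recommendation cited in recommendation 3
active_days <- 5       # assumption: activity spread over five days a week
daily_minutes <- weekly_minutes / active_days
print(daily_minutes)  # 30
```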

This case study was part of the Google Data Analytics Certificate and was my first project using R. Thanks for reading!