14 minute read

Note: This project displays the following SQL & Tableau skills:

  • Filtering
  • Joins
  • Grouping and Aggregation
  • CTEs
  • Derived tables
  • Subqueries
  • Views
  • LODS (Level of Detail Expressions)

A dashboard can be found here

Introduction

This analysis is conducted on a Mexican E-commerce store called Maven Toys and uses SQL and Tableau for querying the data and visualisation, respectively.

The purpose of this is to extract insights that can shed light on current business performance and better inform future business decisions. These may range from learning about things such as which store to promote, which products tend to perform better in some months compared to others, whether or not money tied up in inventory could be managed better and more.

The queries are in 3 parts:

  • Products and Profitability
  • Seasonality
  • City-level and Stock Analysis

A short summary of this project, including information on the data tables, columns, data types and data source can be found in a GitHub repository README file.

Part 1 - Data Cleaning

Here I’m going to remove the dollar sign ‘$’ in the ‘product_cost’ and ‘product_price’ columns, then change type to ‘double’. This is done so that we can perform calculations with the data in these columns.


update products
set product_cost = replace(product_cost, '$','')

update products
set product_price = replace(product_price, '$','')

alter table products
modify product_cost double

alter table products
modify product_price double

Part 2 - Products and Profitability

Q1) Which 5 products have generated the most profit?


select
      a.product_name,
      round(sum((a.product_price-a.product_cost)*b.units), 2) as total_profit
from
      products a
inner join
      sales_toys b using(Product_ID)
group by
      a.product_name
order by
      total_profit desc
limit
      5

None

Colorbuds, which are a type of earphones, were more profitable than the next two most profitable products combined.

Interpretation & Takeaway

Among the 5 most profitable products, Colorbuds account for 43% of all profit, a significant amount. If combined with Action Figures, the second highest profitable product, nearly 62% of profit is accounted for.

If compared to total profit across all products, Colorbuds are responsible for just over 20% of this figure.

It may be tempting to recommend that Maven Toys should consider focusing on these two products in marketing campaigns, for example, at the expense of other products. However, without further information, we cannot make this recommendation.

For example, it may be the case that one or more of the other products serves as a loss leader. Loss leaders, as we know, are products that are intentionally sold at a lower price point, sometimes even below cost price, in order to attract customers to the company. The company can sell cross-sell other, higher-priced/more profitable products to the same customers as a result.

Therefore, the presence of the loss leader may actually be critical to overall profits because without them, the company may not attract as many customers in the first place. It may be the case that one of Lego Bricks, Deck of Cards & Glass Marbles serves as a loss leader. Completely neglecting one of these in favour of the 2 most profitable products may result in lower overall profits.

In conclusion, if the company has knowingly placed one of these products as a loss leader then ceasing its promotion is not advisable. Conversely, if this is not the case, then the company may run an A/B test online to verify whether further promoting Colorbuds or Action Figures leads to an increase in overall profit

Q2) What product category is the most profitable overall?


select
      a.product_category,
      round(sum((a.product_price-a.product_cost)*b.units), 2) as total_profit
from
      products a
inner join
      sales_toys b using(Product_ID)
group by
      a.product_category
order by
      total_profit desc

None

Interpretation & Takeaway

Toys and Electronics are the best-performing categories when it comes to total profit, accounting for nearly 52% of all profit. The two products (not product categories) that generated the greatest amount of profit belong to these 2 product categories (Colorbuds = Electronics & Action Figure = Toys)

Sports & Outdoors generated just over half a million dollars in revenue, being the least profitable category. Combined with the findings in Query 5, we see that there could be an opportunity to focus on manufacturing new Sports & Outdoors products that take sell better in Winter. This could be something as simple as skis & snowboards for children.

Q3.5) Do Toys and Electronics generate the most profit across all stores, or do certain categories generate more profit depending on the store?

To visualise this, we need to write a Level of Detail Expression (LOD) because the data is at a per-sale level while the visualisation we require is at a per-store and per-category level. I need to find the proportion of profit each product category is responsible for per-store, not across all stores

{fixed [Store Name], [Product Category]: sum([Profit])} / { fixed [Store Name] : sum([Profit])}

None

We see that Electronics and Toys are consistently responsible for generating the largest proportion of profit per store. There are only a few exceptions to this, such as in the store named “Maven Toys Morelia 1”, where Games dominate and generate almost 40% of the profit for the store.

For Maven Toys it may be exploring adding more items under the Electronics and Toys categories to give customers even more choice and take full advantage of the popularity of these categories

Q3) What products are high margin and which ones are low margin?

select
    product_name, round(product_price-product_cost, 2) as margin,
    case when (product_price-product_cost) > 5 then 'High Margin' else 'Low Margin' end as Status
from
    products
order by
    margin desc
    

None

Interpretation & Takeaway

Products are split into high and low margins nearly evenly.

The criteria of a high margin product is that the selling price is at least $6 greater than the cost price. I chose this figure based on the distribution of margins across all products, but it is still fairly arbitrary.

Interestingly, only 2 products from the 5 most profitable products list (Colorbuds & Action Figures) are high-margin, so let’s explore the relationship (if any exists) between the size of product margin and profitability

Q4) Are high margin products more profitable? Or are low margin products more profitable?


-- find the total profit and margin for each product

with cte1 as(

    select
          a.product_name,
          round(sum((a.product_price-a.product_cost)*b.units), 2) as total_profit,
          max(round(a.product_price-a.product_cost, 2)) as margin
    from
          products a
    inner join
          sales_toys b using(Product_ID)
    group by
          a.product_name
    order by
          total_profit desc)

select
    cte1.product_name,
    cte1.total_profit,
    cte1.margin, # get the results from above CTE and label each product as 'high' or 'low' margin
    case when cte1.margin > 5 then 'High Margin' else 'Low Margin' end as Status
from
    cte1
order by
    total_profit DESC, margin desc

None

Interpretation & Takeaway

We see that top 2 most profitable products, Colorbuds and Action Figures, are considered to be high-margin products while the next 3 most profitable products are low margin.

The graph shows us that beyond these first 2 products, there seems to be little to no correlation on a per-product basis between profit and whether or not the product is low- or high-margin. It may be the case that there is an additional factor, beyond the fact that they are high-margin products, that may be influencing why Colorbuds and Action Figures are so profitable.

What I want to do now is compare the total, average profit and average units sold per sale across high-margin products and compare those with the same calculations for low-margin products. Just to make it a little more objective, I am going to exclude “Colorbuds” in this visualisation, because it is a clear outlier:

None

While low-margin products make significantly more total profit overall, high-margin products generate more profit on average, on a per sale basis, even when we exclude ‘Colorbuds’ from the calculation

This may seem like an obvious conclusion; if high-margin products have a greater difference between sale and cost price (as is their very definition), then of course they generate more profit per sale on average. However, price is inversely correlated with quantity demanded from the point of view of customers. And of course, profit equals (price-cost)*quantity, so quantity sold per sale matters.

According to economic theory, customers should be buying proportionally more low-margin goods per sale, but we see that this isn’t the case. Lower-margin products sell only 0.16 units more per sale than high-margin products on average.

Overall, this suggests that higher-margin products are inherently more attractive to customers and they are inherently more profitable on average. This suggests that for Maven Toys, it is worth exploring whether they can sell additional high-margin products, provided that customer demand and market research also back this decision up.

Part 3 - Seasonality

Q5) Is there a seasonality aspect to sales?

The date range of the data is 2017-01-01 to 2018-09-30. This means that all months apart from October, November and December have two months worth of data (once for 2017 and once for 2018).


select DATE_FORMAT(a.date, '%Y-%m'),
     round(sum(a.Units*b.product_price), 2) as total_monthly_revenue
from
     sales a 
inner join 
     products b on a.Product_ID = b.Product_ID
group by 
     DATE_FORMAT(a.date, '%Y-%m')

None

Interpretation & Takeaway

As we can see, there is a general uptick in sales revenue for Maven Toys during the Spring months. Apart from a sharp rise in revenue in December, which can be attributed to Christmas and New Year’s holidays, revenue is generally lower in the Winter and Autumn months than in Spring.

As before, one possible recommendation here for Maven Toys is to introduce new products that are likely to sell better in Spring & Winter (specifically to take advantage of the peak during the last week of the year). For example, specialised headphones that can be worn during Winter sports (such as skiiing) could prove to be popular during Winter.

Overall, these observations do seem to indicate there is an element of seasonality involved. However, we still need to be careful in making generalisations because we have less than 2 years of data. Our conclusions would be more valid if we had, say, 5 years’ worth of data. But for now it seems that the spring months are correlated with an uptick in sales for Maven Toys

Q6) Is there a difference in product category popularity between Summer and Winter months?


-- get revenue of winter months

with w as(

    select
          a.product_category,
          round(sum(a.product_price*b.units), 2) as total_revenue
    from
          products a inner join sales_toys b using(Product_ID)
    where
          month(b.date) in(1, 2, 12)
    group by
          a.product_category),

-- get revenue of summer months

s as(

    select
          a.product_category,
          round(sum(a.product_price*b.units), 2) as total_revenue
    from
          products a inner join sales_toys b using(Product_ID)
    where
          month(b.date) in(6, 7, 8)
    group by
          a.product_category)

-- collect the results of the above two CTEs, joining on product category

select
    w.product_category,
    w.total_revenue as winter_revenue,
    s.total_revenue as summer_revenue
from
    w
inner join
    s using(product_category)

None

Interpretation & Takeaway

Here we see some interesting phenomena.

All categories bar Games sell better in Summer than in Winter. This makes sense as people are less likely to pursue outdoor activities in the Winter than in Summer, and games are primarily an indoor-oriented hobby/pastime.

Toys and Sports & Outdoors have the biggest differences in sales between Summer and Winter. Again, this is to be expected given most people’s inclination towards outdoor activities when the whether is warmer. In Summer, Toys sales are 24% greater than in Winter. This figure is about 54% for Sports & Outdoors

Electronics sales are nearly the same in Winter as in Summer, with Summer sales slightly higher while Games sales are nearly 12% higher in Winter than in Summer

Considering all of this, perhaps it is worthwhile considering promoting categories such as Toys and Sports & Outdoors more in the Summer in marketing campaigns, while doing the same for Games during Winter months. This may allow Maven Toys to take advantage of the higher demand for these categories during the appropriate months

Part 4 - City-level and Stock analysis

Q7) What cities generate the most revenue?


select
      s.Store_city,
      round(sum(p.product_price*t.units), 2) as City_Revenue
from
      stores s
inner join
      sales_toys t on s.Store_ID = t.Store_ID
inner join
      products p on p.Product_ID = t.Product_ID
group by
      s.Store_City
order by
      City_Revenue desc
      

None

Interpretation & Takeaway

Maven Toys have their highest sales on a per-city basis in Mexico City (Ciudad de Mexico), Guadalajara and Monterrey. After cross-referencing the list of cities in which Maven Toys has branches with a list of the largest cities in Mexico by population, we can see that the largest cities in population terms are associated with the greatest sales.

Again, this is not a surprising finding as a large population means more potential customers. Larger cities also tend to be more urban and richer on average, so this may also have an influence.

What’s more interesting is that all cities other than Mexico City, Guadalajara and Monterrey in which Maven Toys has a presence are not counted among the 30 largest cities by population. What this could imply is that Maven Toys may consider opening a branch in a city with among the largest populations in Mexico, such as Tijuana or Leon, since population size and sales seem to be correlated for them.

For example, Tijuana has approximately 1.8 million people and Leon 1.6 million people. For comparison, Hermosillo, where Maven Toys sold $903,389 worth of goods, has approximately 900,000 people. So there could be a case made to expand to a larger city with potentially a larger customer base.

Q8) Are older (legacy) stores able to incorporate ecommerce technology well?

What we are asking is: are legacy stores providing the same level of success as newer stores, as measured by profit? The earliest store opening is in 1992 and the latest is in 2016. Let’s consider 2004 which is halfway between the two dates as the cutoff point which decides whether a store is considered ‘New’ or ‘Legacy’


with cte2 as(

      select
            s.Store_ID,
            s.Store_Name,
            round(sum(p.product_price*t.units), 2) as Store_Revenue
      from
            stores s
      inner join
            sales_toys t on s.Store_ID = t.Store_ID
      inner join
            products p on p.Product_ID = t.Product_ID
      group by
            s.Store_ID, s.Store_Name)

select
      cte2.Store_name,
      cte2.Store_Revenue,
      case when year(s.store_open_date) <= 2004 then 'Legacy Store' else 'New Store' end as Age_Status
from
      cte2
inner join
      stores s using(Store_ID)
order by
      Store_Revenue DESC

None

Interpretation & Takeaway

At first glance it seems New Stores overwhelmingly outperform Legacy Stores. However, this conclusion is incorrect because when we account for the total number of stores belonging to each type (there are far fewer Legacy stores compared to New stores), we find that Legacy stores actually outperform their newer counterparts on average. However, this difference is quite small, only about $11,000.

One interpretation of this is that Legacy stores have incorporated e-commerce technology quite well. Moreover, these stores are more established because they are older. This may contribute to the reasons why they perform better on average. Perhaps customers have a long-standing relationship with these older stores and more people are aware of their existence compared to newer stores. This in turn promotes more online orders and footfall.

Regardless, it seems as though Maven Toys is doing a good job of updating its older stores.

Q9) How much money is tied up in each store in the form of stock? Top 5 vs Bottom 5

Top 5:


select
      i.store_id,
      s.store_name,
      round(sum(i.Stock_On_Hand*p.product_price), 2) as total_inventory_value
from
      inventory i
inner join
      stores s on i.store_id = s.store_id
inner join
      products p on i.product_id = p.product_id
group by
      i.store_id, s.store_name
order by
      total_inventory_value desc
limit 5


Bottom 5:


select
      i.store_id,
      s.store_name,
      round(sum(i.Stock_On_Hand*p.product_price), 2) as total_inventory_value
from
      inventory i
inner join
      stores s on i.store_id = s.store_id
inner join
      products p on i.product_id = p.product_id
group by
      i.store_id, s.store_name
order by
      total_inventory_value asc
limit 5

Q10) Is there any relationship between value of stock tied up and revenue by store?

Keeping a large amount of inventory is generally considered to be undesirable for a few different factors. For example, more inventory requires more storage, which increases costs. If customer tastes change, existing inventory is in danger of becoming obsolete.

Keeping this in mind, I wanted to see if the value tied up in inventory has any relationship with store revenue. Financial theory tells us that a high revenue-to-inventory ratio suggests the company is selling its inventory quickly and efficiently, while a low ratio suggests that the company is struggling to sell its inventory.

Here I compare two ranked measures for each store. The first is the revenue-to-inventory ratio, where the 1st ranked store has the highest, and therefore ‘best’, revenue-to-inventory ratio. The second is revenue itself. Again, the store ranked 1st here has the highest revenue.


-- get inventory value per store

with inventory_value as(

      select
            i.store_id,
            s.store_name,
            round(sum(i.Stock_On_Hand*p.product_price), 2) as total_inventory_value
      from
            inventory i
      inner join
            stores s on i.store_id = s.store_id
      inner join
            products p on i.product_id = p.product_id
      group by
            i.store_id, s.store_name
      order by
            total_inventory_value),[2022-03-27-SQL-Coronavirus.md](2022-03-27-SQL-Coronavirus.md)

-- get total revenue per store

store_revenue as(

      select
            s.store_name,
            sum(p.product_price*t.units) as revenue
      from
            products p
      inner join
            sales_toys t on p.product_id = t.product_id
      inner join
            stores s on t.store_id = s.store_id
      group by
            s.store_name)

-- collect and return relevant results of above CTEs, joining on store_name

select
      inventory_value.store_name,
      inventory_value.total_inventory_value,
      store_revenue.revenue
from
      inventory_value
inner join
      store_revenue using(store_name)
order by
      store_revenue.revenue desc

None

Interpretation & Takeaway

Having a ‘good’ revenue-to-sales ratio does not imply that a store has high revenue. The store with the highest revenue, Maven Toys Ciudad de Mexico 2, is ranked 44 out of 50 when it comes to revenue-to-inventory ratio, a clear mismatch. The story is similar for the top 20 stores ranked by revenue. 14th-placed (by revenue) Maven Toys Guanajuato 1 is ranked the best (#1) for revenue-to-inventory ratio.

Therefore, this seems to suggest that just because a store has a large amount of inventory tied up, does not mean that its sales are suffering