Relational Databases & SQL
The Language of Structured Logic
To analyze data at a global scale, you must first master the language of **Relational Databases (RDBMS)**. Whether it's **Visa** storing billions of transaction records or **Walmart** managing a decentralized inventory, SQL is the universal tool for extracting, grouping, and transforming raw tables into meaningful insights.
In this chapter, we move beyond simple spreadsheets to explore **Primary Keys**, **Normalization**, and the powerful **Window Functions** that allow us to calculate running totals and rankings in real-time across massive datasets.
Relational Concepts & Normalization
For **Visa**, data integrity is non-negotiable. They use **Normalization** to ensure that a merchant's name is stored in exactly one place, linked via a **Foreign Key** to millions of individual transactions. This prevents "Update Anomalies"—where changing a merchant's address in one record doesn't reflect in others.
Basic Querying: SELECT to HAVING
At **Walmart**, analysts use SQL to filter out the noise. The **WHERE** clause filters data before it is grouped, while the **HAVING** clause filters the results *after* grouping. This allows a logistics manager to ask: *"Show me only the warehouses that have an average inventory level GREATER than 10,000 units."*
Advanced: Window Functions & CTEs
Modern SQL uses **Window Functions** (like `PARTITION BY`) to perform calculations across a set of rows related to the current row. For **Visa**, this is essential for "Running Totals"—calculating a user's total spending *as each transaction arrives* without collapsing the rows into a single sum.
The 20 Essential SQL Commands (Part 1: DDL & DML)
Every data analyst must know these 20 SQL commands by heart. Part 1 covers **Data Definition Language (DDL)** commands like CREATE, ALTER, DROP, and TRUNCATE, plus **Data Manipulation Language (DML)** commands like INSERT, UPDATE, DELETE, and MERGE. For **Walmart**, these are used to build and maintain the schema of a global inventory system.
The 20 Essential SQL Commands (Part 2: Querying & Analytics)
Part 2 covers the analytical powerhouses: **WHERE**, **GROUP BY**, **ORDER BY**, **HAVING**, **JOIN** variants, **UNION**, **CASE WHEN**, **COALESCE**, **RANK()**, and **PARTITION BY**. These are the commands that turn raw tables into executive-level insights at **Visa** and **Netflix**.
Practice Questions
Question 1
What is the primary difference between the WHERE and HAVING clauses in SQL?
Question 2
In the Visa running total example, why is a Window Function (OVER) superior to a simple SUM()?