Chapter 15: Database Fundamentals with SQLite

Small, Fast, and Everywhere: The Perfect Place to Start

1. Introduction

Welcome to Database Fundamentals, where projects without persistent storage turn into real applications. SQLite is the fastest way to learn the fundamentals without getting buried in infrastructure, because it runs directly inside your Python code with zero setup. That lets you focus on what matters most: how relational databases store data, enforce correctness, and answer questions reliably.

When your projects grow into production web services handling concurrent users and distributed systems, you'll move to PostgreSQL (a production database server). This book walks you through that transition in Part V (Chapters 24-25). The important point is that nothing you learn here gets thrown away. The SQL, schema design habits, and database instincts you build with SQLite carry forward directly.

In this chapter, you’ll build the mental model that every backend developer needs: how data is stored, how tables relate, how to query confidently, and how to design schemas (table structures) that won’t collapse the moment your app grows. You’ll go beyond copy-paste SQL by learning the practical rules that make databases reliable: primary keys, constraints, indexes, and transactions. Then you’ll apply them through Python so your API can create, read, update, and delete data safely.

Chapter Roadmap

This chapter takes you from zero database knowledge to confidently designing schemas, writing SQL, and integrating SQLite into Python applications. Here's the path we'll follow:

1. Why Databases Matter (Section 1 • Foundation)

   Understand why persistent storage transforms scripts into real applications. See how adding just a few lines of database code gives your programs memory, enabling historical queries, trend analysis, and features that depend on past data.

   Key topics: persistence, state vs. stateless, SQLite overview

2. SQL Fundamentals (Section 2 • Core Skills)

   Learn the SQL you actually need: creating tables with CREATE TABLE, inserting rows with INSERT, querying with SELECT and WHERE, sorting and aggregating results, updating and deleting data, speeding up queries with indexes, and connecting related tables with JOIN.

   Key topics: CREATE/INSERT, SELECT/WHERE, JOIN, indexes

3. Python's sqlite3 Module (Section 3 • Integration)

   Bridge SQL and Python using the built-in sqlite3 library. Master connections, cursors, parameterized queries, transaction management with commits and rollbacks, context managers, and robust error-handling patterns for production code.

   Key topics: connections, cursors, transactions, error handling

4. Hands-On Project: Weather API Cache (Section 4 • Applied Build)

   Put everything together by building a weather API cache that stores responses in SQLite, checks cache freshness before making network requests, and provides analytics queries over historical weather data.

   Key topics: schema design, cache logic, analytics queries

5. Schema Design Best Practices (Section 5 • Professional Patterns)

   Level up with production-grade techniques: choosing the right data types, indexing strategies that balance speed and storage, normalization rules for splitting tables, schema migrations for evolving databases, and common performance pitfalls to avoid.

   Key topics: data types, normalization, migrations, performance

Why APIs Need Databases

Through Chapters 1-14, you've built increasingly sophisticated API integrations: fetching data, handling errors, managing authentication, and processing complex JSON responses. Every one of those programs shares a fundamental limitation: the moment they finish running, they forget everything.

This limitation blocks you from building anything that depends on historical context. You can't track how your music taste evolved over months. You can't detect spending patterns in your transactions. You can't cache expensive API responses to avoid rate limits. You can't build dashboards that show trends, or tools that learn from past behavior, or applications that remember user preferences between runs.

"Scripts are stateless. Applications need state."

Here's the transformation that changes everything. Below is a weather script you could have written after Chapter 8. It works perfectly: it fetches current conditions, displays them, then forgets everything:

Weather Script (No Memory)
Python
import requests

API_KEY = "YOUR_OPENWEATHERMAP_API_KEY"

def get_weather(location):
    response = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": location, "appid": API_KEY, "units": "metric"},
        timeout=10
    )
    return response.json()

weather = get_weather("Dublin")
print(f"Temperature: {weather['main']['temp']}°C")
print(f"Conditions: {weather['weather'][0]['description']}")

# Script ends. Data disappears.

Now watch what happens when you add eight lines of database code:

Weather Script (With Memory)
Python
import requests
import sqlite3
from datetime import datetime, timezone

API_KEY = "YOUR_OPENWEATHERMAP_API_KEY"

def get_weather(location):
    response = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": location, "appid": API_KEY, "units": "metric"},
        timeout=10
    )
    data = response.json()
    
    # Eight new lines - save to database
    with sqlite3.connect('weather.db') as conn:
        conn.execute("""
            INSERT INTO weather_history (location, temperature, conditions, timestamp)
            VALUES (?, ?, ?, ?)
        """, (
            location,
            data['main']['temp'],
            data['weather'][0]['description'],
            datetime.now(timezone.utc)
        ))
    
    return data

# Today's weather (from API)
weather = get_weather("Dublin")
print(f"Today: {weather['main']['temp']}°C - {weather['weather'][0]['description']}")

# Yesterday's weather (from database)
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT temperature, conditions 
        FROM weather_history 
        WHERE location = ? AND date(timestamp) = date('now', '-1 day')
    """, ("Dublin",))
    yesterday = cursor.fetchone()
    if yesterday:
        print(f"Yesterday: {yesterday[0]}°C - {yesterday[1]}")

# Weekly temperature trend (from database)
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT date(timestamp), AVG(temperature)
        FROM weather_history 
        WHERE location = ? AND timestamp >= date('now', '-7 days')
        GROUP BY date(timestamp)
        ORDER BY date(timestamp)
    """, ("Dublin",))
    
    print("\nLast 7 days:")
    for day, avg_temp in cursor.fetchall():
        print(f"  {day}: {avg_temp:.1f}°C")
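One caveat before running the version above: the INSERT assumes the weather_history table already exists. A minimal one-time setup sketch (the column names match the INSERT statement; IF NOT EXISTS makes it safe to re-run):

```python
import sqlite3

# One-time setup for the script above. Run once before the first fetch;
# IF NOT EXISTS means re-running it is harmless.
with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL,
            conditions TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
```

CREATE TABLE and its options are covered in detail in the next section.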

Same API. Same city. Same weather data. But now your application can answer questions about the past, calculate trends over time, and build context from historical patterns. This is the fundamental difference between a script and an application.

Why This Matters for Your Music Project

In Chapter 16, you'll build the Spotify Music Time Machine. Every feature depends on persistence:

  • "This Week Last Year" playlists require storing what you listened to last year
  • Musical evolution tracking requires comparing your taste across months or years
  • Hidden gems discovery requires remembering songs you played heavily six months ago
  • Listening statistics require aggregating play counts over time

Without a database, none of these features are possible. The Spotify API only gives you "recently played tracks" (last 50 songs). To answer "What was I listening to last March?" you need to have been saving that data all along.

This chapter teaches you how to save it.

SQLite: Perfect for Your Projects

This book uses SQLite because it removes every barrier to learning databases. There's no server to install, no configuration files to edit, no ports to manage. Python includes the sqlite3 library by default. If you have Python, you have SQLite.

Your entire database is a single file (music.db, weather.db) that lives alongside your Python scripts. To back it up, copy the file. To share it with a friend, email the file. There is no complex export process.

Feature        SQLite                                        PostgreSQL / MySQL
------------   -------------------------------------------   ---------------------------------------------------
Architecture   Embedded (runs inside your app)               Client-server (separate process)
Data storage   Single file on your disk                      Complex file structure managed by the server
Setup time     Instant (import sqlite3)                      Minutes to hours (install, configure, create users)
Concurrency    One writer at a time (multiple readers OK)    Thousands of simultaneous connections
Best for       Single-user apps, prototypes, mobile apps,    Multi-user web services, enterprise apps,
               < 100K records                                millions of records
The Production Mindset

A common misconception is that SQLite is a "toy" database for beginners. It is not. SQLite is production-grade software that powers the contacts on your phone, the bookmarks in your browser, and the guidance systems of airplanes. It is the most deployed database engine in the world.

The difference isn't quality, it's use case. If you're building a website that gets 100,000 hits per second, you need PostgreSQL. If you're building a tool to analyze a 2GB CSV file on your laptop, or a portfolio project like the Music Time Machine, SQLite is arguably the superior choice. It's simpler, faster to develop with, and requires zero infrastructure.

Professional developers use SQLite for the right jobs. You're not learning a "beginner" technology. You're learning when and how to use an embedded database effectively.

Learning Objectives

By the end of this chapter, you'll be able to:

  • Explain when API-powered applications need persistent storage instead of in-memory scripts and understand why databases solve this problem
  • Design SQLite schemas for API data by choosing sensible data types, primary keys, and indexes
  • Write core SQL statements for CRUD operations (CREATE, INSERT, SELECT, UPDATE, DELETE) plus basic JOINs
  • Use Python's sqlite3 module safely with parameterized queries, context managers, commits, and rollbacks
  • Handle database errors and schema changes without losing data by applying basic migration strategies
  • Decide when SQLite is appropriate versus when server databases like PostgreSQL are more suitable

You've seen the problem (ephemeral scripts) and the solution (persistent databases). You understand what SQLite is and why it's perfect for your projects. Now let's learn SQL, the language for talking to databases.

2. SQL Basics You Actually Need

You've seen database code in action. Now let's break down the SQL language that makes it work. SQL (Structured Query Language) is how you talk to databases: creating tables, inserting data, querying results, and everything in between.

This section teaches you the SQL you need for the weather cache and music project. You won't learn every SQL feature (there are dozens you'll never use). You'll learn the essential operations that power real applications: creating schemas, storing API responses, querying historical data, and analyzing trends.

Every SQL command you see here appears in the weather cache project. This isn't abstract theory. It's the exact SQL that makes your applications remember.

Tables, Rows, and Columns

A database stores data in tables. Each table is like a spreadsheet: rows represent individual records, and columns represent fields of information.

Here's what the weather cache table looks like after storing a few API responses:

Weather Cache Table Structure
weather_history table
┌────┬──────────┬─────────────┬────────────────┬─────────────────────┐
│ id │ location │ temperature │ conditions     │ timestamp           │
├────┼──────────┼─────────────┼────────────────┼─────────────────────┤
│ 1  │ Dublin   │ 12.5        │ Cloudy         │ 2024-03-15 14:30:22 │
│ 2  │ London   │ 15.2        │ Rain           │ 2024-03-15 14:35:10 │
│ 3  │ Dublin   │ 13.1        │ Partly Cloudy  │ 2024-03-16 09:15:44 │
│ 4  │ Paris    │ 18.0        │ Sunny          │ 2024-03-16 10:22:33 │
└────┴──────────┴─────────────┴────────────────┴─────────────────────┘

Each row = one weather reading from the API
Each column = one piece of information about that reading

The id column uniquely identifies each row. The location, temperature, and conditions columns store data from the API response. The timestamp column records when you cached the data.

Every database operation works with this structure: inserting new rows, querying specific columns, filtering by values, sorting results. The table is the fundamental unit of database storage.

CREATE TABLE: Designing Your Schema

Before you can store data, you need to create a table. The CREATE TABLE statement defines your schema: what columns exist, what data types they hold, and what rules they follow.

Start with your use case. For the weather cache, you need to store location, temperature, conditions, and when you fetched the data. Here's how you translate that into SQL:

Python
import sqlite3

with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL,
            conditions TEXT,
            humidity INTEGER,
            wind_speed REAL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
What This SQL Does

CREATE TABLE IF NOT EXISTS: Creates the table only if it doesn't already exist. Safe to run multiple times without errors.

id INTEGER PRIMARY KEY AUTOINCREMENT: Every row gets a unique ID number automatically. The database handles this. You never assign IDs manually.

location TEXT NOT NULL: Location is stored as text and is required (NOT NULL means you can't insert a row without a location).

temperature REAL: REAL stores decimal numbers (12.5, 15.2). Use this for temperatures, prices, percentages, or anything with decimals.

humidity INTEGER: INTEGER stores whole numbers (75, 80, 85). Use this for counts, IDs, percentages without decimals.

timestamp DATETIME DEFAULT CURRENT_TIMESTAMP: Automatically records the current date and time when you insert a row. You don't need to pass this value. The database fills it in.

The data types matter. If you store temperature as TEXT, you can't reliably calculate averages or compare values numerically ('9' sorts after '15' when compared as text). SQLite's typing is flexible, so inserting 12.5 into an INTEGER column actually stores it as a REAL rather than truncating it, but declaring the type that matches your data documents your intent and keeps sorting, comparisons, and aggregation predictable. Choose types based on what operations you need to perform.
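If you want to confirm what a CREATE TABLE actually produced, SQLite's PRAGMA table_info lists each column with its declared type and constraints. A quick sketch, run here against a throwaway in-memory database with the same schema:

```python
import sqlite3

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk)
with sqlite3.connect(':memory:') as conn:
    conn.execute("""
        CREATE TABLE weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL,
            conditions TEXT,
            humidity INTEGER,
            wind_speed REAL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
    for cid, name, col_type, notnull, default, pk in conn.execute(
        "PRAGMA table_info(weather_history)"
    ):
        print(f"{name:12} {col_type:9} {'NOT NULL' if notnull else ''}")
```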

Data Types You'll Use Most

INTEGER: Whole numbers (counts, IDs, years). Example: play_count INTEGER

REAL: Decimal numbers (prices, temperatures, percentages). Example: temperature REAL

TEXT: Strings of any length (names, descriptions, JSON). Example: artist_name TEXT

DATETIME: Timestamps (when things happened). Example: played_at DATETIME

These four types handle 95% of API data storage. SQLite also offers BLOB for raw binary data, plus NULL (strictly a storage class rather than a column type) for missing values, but you'll rarely need to think about either for API projects.
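These storage types map straight back to Python types when you read rows out again: INTEGER becomes int, REAL becomes float, TEXT becomes str, and NULL becomes None. A minimal round trip, using an in-memory database and a hypothetical demo table:

```python
import sqlite3

# Round-trip each storage type and check what Python gets back.
with sqlite3.connect(':memory:') as conn:
    conn.execute(
        "CREATE TABLE demo (plays INTEGER, temp REAL, artist TEXT, note TEXT)"
    )
    conn.execute(
        "INSERT INTO demo VALUES (?, ?, ?, ?)",
        (42, 12.5, "Radiohead", None)  # None is stored as NULL
    )
    row = conn.execute("SELECT plays, temp, artist, note FROM demo").fetchone()
    print([type(v).__name__ for v in row])  # ['int', 'float', 'str', 'NoneType']
```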

INSERT: Adding Data

Once your table exists, you need to fill it with data. The INSERT statement adds new rows to your table.

After fetching weather from the API, you store it in the database. Here's the pattern:

Python
import sqlite3
from datetime import datetime

# Fetch weather data from API (simplified)
weather_data = {
    "location": "Dublin",
    "temperature": 12.5,
    "conditions": "Cloudy",
    "humidity": 75,
    "wind_speed": 4.5
}

# Store in database
with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        INSERT INTO weather_history (location, temperature, conditions, humidity, wind_speed, timestamp)
        VALUES (?, ?, ?, ?, ?, ?)
    """, (
        weather_data["location"],
        weather_data["temperature"],
        weather_data["conditions"],
        weather_data["humidity"],
        weather_data["wind_speed"],
        datetime.now()
    ))
The Question Mark Pattern

The ? placeholders are critical for security. Never use f-strings or string concatenation with SQL. Always use parameterized queries.

Why this matters: If you use f-strings, malicious input could execute arbitrary SQL commands (SQL injection attacks). The ? placeholders ensure user input is treated as data, not executable code.

The values in the second argument (the tuple) replace the ? placeholders in order. First ? gets weather_data["location"], second ? gets weather_data["temperature"], and so on.

Notice you didn't specify the id column in the INSERT statement. That's because AUTOINCREMENT handles it automatically. The database assigns IDs 1, 2, 3, 4... as you insert rows. You never manage IDs manually.
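If your code does need the ID the database just assigned (to reference the new row later, for example), the cursor returned by execute() exposes it as lastrowid. A small sketch against an in-memory database:

```python
import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.execute("""
        CREATE TABLE weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL
        )
    """)
    cursor = conn.execute(
        "INSERT INTO weather_history (location, temperature) VALUES (?, ?)",
        ("Dublin", 12.5)
    )
    # lastrowid holds the id AUTOINCREMENT just assigned to this row
    print(cursor.lastrowid)  # 1 for the first row in a fresh table
```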

If you need to insert multiple rows at once, use executemany():

Python
weather_readings = [
    ("Dublin", 12.5, "Cloudy", 75, 4.5, datetime.now()),
    ("London", 15.2, "Rain", 85, 6.2, datetime.now()),
    ("Paris", 18.0, "Sunny", 60, 3.1, datetime.now())
]

with sqlite3.connect('weather.db') as conn:
    conn.executemany("""
        INSERT INTO weather_history (location, temperature, conditions, humidity, wind_speed, timestamp)
        VALUES (?, ?, ?, ?, ?, ?)
    """, weather_readings)
    print(f"Inserted {len(weather_readings)} weather readings")

This inserts all three rows in a single database operation, which is much faster than three separate INSERT statements. Use executemany() when you have multiple records to store.

UPSERT: Insert or Update in One Statement

For caching scenarios, you often want to update a row if it already exists, or insert it if it doesn't. Doing this as a SELECT followed by a separate INSERT or UPDATE is clumsy and, once multiple writers are involved, creates a race condition. SQLite can do both in one statement, but only when a UNIQUE constraint defines what counts as a duplicate. The weather_history table deliberately allows many rows per location, so the upsert pattern belongs on a cache-style table where each location appears exactly once:

Python
with sqlite3.connect('weather.db') as conn:
    # A cache table: one row per location. The UNIQUE constraint is what
    # lets INSERT OR REPLACE and ON CONFLICT detect a duplicate.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_cache (
            location TEXT UNIQUE NOT NULL,
            temperature REAL,
            conditions TEXT,
            timestamp DATETIME
        )
    """)

    # INSERT OR REPLACE: inserts a new row, or replaces the entire
    # existing row when the UNIQUE constraint on location is violated.
    conn.execute("""
        INSERT OR REPLACE INTO weather_cache
            (location, temperature, conditions, timestamp)
        VALUES (?, ?, ?, ?)
    """, ("Dublin", 13.2, "Overcast", datetime.now()))

    # Alternatively, ON CONFLICT updates only the columns you name,
    # leaving any other columns on the existing row untouched:
    conn.execute("""
        INSERT INTO weather_cache (location, temperature, conditions, timestamp)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(location) DO UPDATE SET
            temperature = excluded.temperature,
            conditions  = excluded.conditions,
            timestamp   = excluded.timestamp
    """, ("Dublin", 13.2, "Overcast", datetime.now()))

INSERT OR REPLACE is the simpler option and works well when you want to fully overwrite a row. Note that it truly replaces the whole row: any columns you don't supply are reset to their defaults. ON CONFLICT DO UPDATE is more surgical, refreshing only the columns you name while preserving the rest of the existing row. You'll use both patterns heavily in the weather cache project.

SELECT: Querying Data

Storing data is only half the story. You need to retrieve it. The SELECT statement queries your database and returns matching rows.

The simplest query retrieves everything:

Python
import sqlite3

with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("SELECT * FROM weather_history")
    all_weather = cursor.fetchall()
    
    for row in all_weather:
        print(row)
Terminal Output
(1, 'Dublin', 12.5, 'Cloudy', 75, 4.5, '2024-03-15 14:30:22')
(2, 'London', 15.2, 'Rain', 85, 6.2, '2024-03-15 14:35:10')
(3, 'Paris', 18.0, 'Sunny', 60, 3.1, '2024-03-16 10:22:33')

SELECT * means "get all columns." The fetchall() method returns every matching row as a list of tuples. Each tuple represents one row, with values in the same order as your table's columns.

Usually you don't need all columns. Select only what you need:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, temperature, conditions 
        FROM weather_history
    """)
    
    for location, temp, conditions in cursor:
        print(f"{location}: {temp}°C, {conditions}")
Terminal Output
Dublin: 12.5°C, Cloudy
London: 15.2°C, Rain
Paris: 18.0°C, Sunny

Notice you can iterate directly over the cursor instead of calling fetchall(). This is more memory-efficient for large result sets because it processes one row at a time instead of loading everything into memory.

You can also unpack the tuple directly in the for loop (for location, temp, conditions in cursor). This makes your code more readable than accessing values by index (row[0], row[1]).
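Tuple unpacking still ties your code to column order, though. For name-based access, the sqlite3 module provides a row factory: set conn.row_factory = sqlite3.Row and each row supports dictionary-style lookup by column name. A short sketch:

```python
import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.row_factory = sqlite3.Row  # rows now support row['column_name']
    conn.execute("CREATE TABLE weather_history (location TEXT, temperature REAL)")
    conn.execute("INSERT INTO weather_history VALUES (?, ?)", ("Dublin", 12.5))
    row = conn.execute(
        "SELECT location, temperature FROM weather_history"
    ).fetchone()
    print(f"{row['location']}: {row['temperature']}°C")  # Dublin: 12.5°C
```

This costs almost nothing and makes queries with many columns far easier to read and refactor.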

WHERE: Filtering Results

Most queries need filtering. You don't want all weather data. You want Dublin's weather, or readings above 15°C, or data from the last 24 hours. The WHERE clause filters results based on conditions.

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT temperature, conditions, timestamp
        FROM weather_history
        WHERE location = ?
    """, ("Dublin",))
    
    print("Dublin weather history:")
    for temp, conditions, timestamp in cursor:
        print(f"  {timestamp}: {temp}°C, {conditions}")
Terminal Output
Dublin weather history:
  2024-03-15 14:30:22: 12.5°C, Cloudy
  2024-03-16 09:15:44: 13.1°C, Partly Cloudy

The WHERE location = ? clause filters rows where the location matches "Dublin." Only Dublin's weather readings are returned.

You can combine multiple conditions with AND and OR:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, temperature, conditions
        FROM weather_history
        WHERE temperature > 15 AND temperature < 20
    """)
    
    print("Moderate temperatures (15-20°C):")
    for location, temp, conditions in cursor:
        print(f"  {location}: {temp}°C")

This returns only rows where temperature is strictly between 15 and 20; the endpoints themselves are excluded (use >= and <= if you want them included). The comparison operators are: = (equal), != (not equal), > (greater than), >= (greater than or equal), < (less than), <= (less than or equal).

For time-based filtering (critical for the weather cache), use SQLite's date functions:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, temperature, conditions, timestamp
        FROM weather_history
        WHERE timestamp > datetime('now', '-24 hours')
    """)
    
    print("Weather from last 24 hours:")
    for location, temp, conditions, timestamp in cursor:
        print(f"  {timestamp} | {location}: {temp}°C")
SQLite Date Functions

SQLite includes built-in functions for working with dates and times:

datetime('now'): Current timestamp

datetime('now', '-24 hours'): 24 hours ago

datetime('now', '-7 days'): 7 days ago

date(timestamp): Extract just the date from a timestamp (2024-03-15)

These functions let you filter by time ranges, which is essential for cache freshness checks ("give me data from the last hour") and historical analysis ("show me last week's temperatures").

This is how the weather cache determines if cached data is fresh or stale. If a reading exists with timestamp > datetime('now', '-1 hour'), the cache is fresh. Otherwise, fetch new data from the API.

Here's a quick reference for the date functions you'll use most often:

Function                     Returns                        Example result
--------------------------   ----------------------------   -------------------
datetime('now')              Current timestamp              2024-03-15 14:30:22
datetime('now', '-1 hour')   1 hour ago                     2024-03-15 13:30:22
datetime('now', '-7 days')   7 days ago                     2024-03-08 14:30:22
date('now')                  Current date only              2024-03-15
date(timestamp)              Extract date from timestamp    2024-03-15
datetime('now', '+7 days')   7 days in the future           2024-03-22 14:30:22

ORDER BY and LIMIT: Sorting and Limiting Results

Query results come back in unpredictable order unless you specify otherwise. The ORDER BY clause sorts results, and LIMIT restricts how many rows you get.

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, temperature, conditions
        FROM weather_history
        ORDER BY temperature DESC
    """)
    
    print("Hottest to coldest:")
    for location, temp, conditions in cursor:
        print(f"  {location}: {temp}°C")
Terminal Output
Hottest to coldest:
  Paris: 18.0°C
  London: 15.2°C
  Dublin: 13.1°C
  Dublin: 12.5°C

ORDER BY temperature DESC sorts by temperature in descending order (highest to lowest). Use ASC for ascending order (lowest to highest), or omit it entirely since ascending is the default.

For the weather cache, you often want the most recent reading. Sort by timestamp in descending order and take the first result:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT temperature, conditions, timestamp
        FROM weather_history
        WHERE location = ?
        ORDER BY timestamp DESC
        LIMIT 1
    """, ("Dublin",))
    
    result = cursor.fetchone()
    if result:
        temp, conditions, timestamp = result
        print(f"Latest Dublin weather: {temp}°C, {conditions} (at {timestamp})")
    else:
        print("No weather data found for Dublin")

LIMIT 1 returns only the first row. Combined with ORDER BY timestamp DESC, you get the most recent weather reading. Use fetchone() instead of fetchall() when you expect a single result.

LIMIT is also useful for pagination or getting top results:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, temperature, timestamp
        FROM weather_history
        ORDER BY temperature DESC
        LIMIT 5
    """)
    
    print("Top 5 hottest readings:")
    for location, temp, timestamp in cursor:
        print(f"  {location}: {temp}°C")

This pattern (ORDER BY + LIMIT) appears constantly in real applications: "show me the 10 most played songs," "get the 20 most recent API responses," "find the top 5 users by activity."
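These pieces (WHERE, ORDER BY timestamp DESC, LIMIT 1, and the date functions) are exactly what a cache freshness check combines. Here's one possible sketch; get_cached_weather is a hypothetical helper, and the table is a trimmed-down weather_history:

```python
import sqlite3

def get_cached_weather(conn, location, max_age_hours=1):
    """Return the newest reading fresher than max_age_hours, else None."""
    cursor = conn.execute("""
        SELECT temperature, conditions, timestamp
        FROM weather_history
        WHERE location = ?
          AND timestamp > datetime('now', ?)
        ORDER BY timestamp DESC
        LIMIT 1
    """, (location, f'-{max_age_hours} hours'))
    return cursor.fetchone()  # None means cache miss: call the API

with sqlite3.connect(':memory:') as conn:
    conn.execute("""
        CREATE TABLE weather_history (
            location TEXT, temperature REAL, conditions TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute(
        "INSERT INTO weather_history (location, temperature, conditions) "
        "VALUES (?, ?, ?)",
        ("Dublin", 12.5, "Cloudy")
    )
    print(get_cached_weather(conn, "Dublin"))  # just inserted, so fresh
    print(get_cached_weather(conn, "London"))  # no data: None
```

Note that the modifier string ('-1 hours') is passed as a bound parameter, so max_age_hours stays configurable without string formatting inside the SQL.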

Aggregation: Calculating Statistics

Databases excel at calculating statistics across many rows. Aggregation functions let you compute averages, counts, minimums, maximums, and sums without loading all the data into Python.

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT AVG(temperature), MIN(temperature), MAX(temperature)
        FROM weather_history
        WHERE location = ?
    """, ("Dublin",))
    
    avg_temp, min_temp, max_temp = cursor.fetchone()
    print(f"Dublin temperature statistics:")
    print(f"  Average: {avg_temp:.1f}°C")
    print(f"  Range: {min_temp}°C to {max_temp}°C")
Terminal Output
Dublin temperature statistics:
  Average: 12.8°C
  Range: 12.5°C to 13.1°C

The aggregation functions (AVG, MIN, MAX) compute statistics directly in the database. You get a single result row with the calculated values.

COUNT tells you how many rows match your query:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, COUNT(*) as reading_count
        FROM weather_history
        GROUP BY location
        ORDER BY reading_count DESC
    """)
    
    print("Weather readings per city:")
    for location, count in cursor:
        print(f"  {location}: {count} readings")
Terminal Output
Weather readings per city:
  Dublin: 2 readings
  London: 1 reading
  Paris: 1 reading
Understanding GROUP BY

GROUP BY location groups all rows with the same location together, then applies the aggregation function (COUNT(*)) to each group separately.

Without GROUP BY, COUNT(*) would return the total number of rows in the entire table. With GROUP BY location, you get one count per unique location.

The pattern is: SELECT [column], [aggregation] FROM [table] GROUP BY [column]. You group by the same column you're selecting (unless you're only selecting the aggregation result).

This is how you build analytics for the music project: "how many times did I play each artist?" becomes SELECT artist, COUNT(*) FROM listening_history GROUP BY artist.

You can also group by date to see trends over time:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT date(timestamp) as day, AVG(temperature) as avg_temp
        FROM weather_history
        WHERE location = ?
        GROUP BY date(timestamp)
        ORDER BY day
    """, ("Dublin",))
    
    print("Dublin daily average temperatures:")
    for day, avg_temp in cursor:
        print(f"  {day}: {avg_temp:.1f}°C")

This calculates the average temperature for each day. The date(timestamp) function extracts just the date portion (2024-03-15), and GROUP BY date(timestamp) groups all readings from the same day together.

UPDATE and DELETE: Modifying Data

You won't often update or delete individual rows in API cache projects (you typically just insert new data). But you'll need these operations for database maintenance and corrections.

UPDATE modifies existing rows:

Python
with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        UPDATE weather_history
        SET temperature = ?, conditions = ?
        WHERE id = ?
    """, (14.0, "Partly Cloudy", 1))
    
    print("Updated weather reading #1")

This changes the temperature and conditions for the row with id = 1. The WHERE clause is critical. Without it, you'd update every row in the table.

DELETE removes rows entirely:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        DELETE FROM weather_history
        WHERE timestamp < datetime('now', '-30 days')
    """)
    
    deleted_count = cursor.rowcount
    print(f"Deleted {deleted_count} old weather readings")

This removes all weather data older than 30 days. The rowcount attribute tells you how many rows were deleted.

Always Use WHERE with UPDATE and DELETE

UPDATE weather_history SET temperature = 20 (no WHERE clause) would set every temperature in your entire table to 20. DELETE FROM weather_history (no WHERE clause) would delete every row in your table.

Always include a WHERE clause unless you genuinely want to modify or delete every row. When in doubt, test your WHERE clause with a SELECT first: SELECT * FROM weather_history WHERE timestamp < datetime('now', '-30 days') shows you what would be deleted.
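The "SELECT first" habit can be made concrete by counting the rows your WHERE clause matches before running the DELETE. A sketch with two readings, one old and one deliberately far in the future:

```python
import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.execute("CREATE TABLE weather_history (location TEXT, timestamp DATETIME)")
    conn.executemany(
        "INSERT INTO weather_history VALUES (?, ?)",
        [("Dublin", "2020-01-01 00:00:00"),   # well past the 30-day cutoff
         ("Dublin", "2099-01-01 00:00:00")]   # far future, should survive
    )
    # Preview with the same WHERE clause the DELETE will use
    (would_delete,) = conn.execute(
        "SELECT COUNT(*) FROM weather_history "
        "WHERE timestamp < datetime('now', '-30 days')"
    ).fetchone()
    print(f"Would delete {would_delete} row(s)")  # 1

    cursor = conn.execute(
        "DELETE FROM weather_history WHERE timestamp < datetime('now', '-30 days')"
    )
    print(cursor.rowcount)  # 1, matching the preview
```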

You now know the core CRUD operations: Create (INSERT), Read (SELECT), Update (UPDATE), and Delete (DELETE). These four operations handle most database interactions. Next, you'll learn how to make those operations fast through indexes, and how to connect related data through JOINs.

Indexes: Making Queries Fast

As your database grows, queries slow down. An index speeds up lookups by creating a sorted reference structure. Think of it like a book's index: instead of reading every page to find "SQLite," you check the index and jump directly to the right pages.

Without an index, WHERE location = 'Dublin' scans every row in the table. With an index on location, the database jumps directly to Dublin's rows.

Python
with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_location 
        ON weather_history(location)
    """)
    print("Created index on location column")
Figure 15.2: Without an index, the database reads every row. With an index, it jumps directly to Dublin's rows.

Now queries filtering by location run much faster. The performance difference is negligible with 100 rows but dramatic with 100,000 rows.

For the weather cache, you frequently query by location and timestamp together. Create a composite index:

Python
with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_location_timestamp 
        ON weather_history(location, timestamp DESC)
    """)
    print("Created composite index on location and timestamp")

This index optimizes queries like WHERE location = ? ORDER BY timestamp DESC, which is exactly what the cache freshness check does. The DESC in the index definition means it's optimized for descending order.

When to Add Indexes

Add indexes for columns you frequently filter on (WHERE location = ?) or sort by (ORDER BY timestamp). Don't index every column. Indexes speed up reads but slow down writes.

For the weather cache: index location and timestamp. For the music project: index artist_name, played_at, and track_id. These are the columns you'll query most often.

If a query feels slow (takes more than a second with thousands of rows), check if the WHERE and ORDER BY columns are indexed.
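You can check whether a query actually uses an index by asking SQLite for its query plan with EXPLAIN QUERY PLAN, a built-in SQLite feature. Here's a minimal sketch (it creates the table and index first so it runs on a fresh database):

Python
```python
import sqlite3

with sqlite3.connect('weather.db') as conn:
    # Ensure the table and index from the earlier examples exist
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL,
            conditions TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_location ON weather_history(location)")

    # EXPLAIN QUERY PLAN describes how SQLite will execute the query
    cursor = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM weather_history WHERE location = ?",
        ("Dublin",),
    )
    for row in cursor:
        # The last column is a human-readable description; it mentions
        # idx_location when the index is being used
        print(row[-1])
```

If the plan says SCAN instead of SEARCH ... USING INDEX, the query is reading every row and the relevant column probably isn't indexed.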

JOIN: Connecting Related Tables

Sometimes you need data from multiple tables. A JOIN combines rows from two or more tables based on a related column.

For the weather cache, you might store locations in a separate table to avoid repeating city information:

Python
with sqlite3.connect('weather.db') as conn:
    # Locations table
    conn.execute("""
        CREATE TABLE IF NOT EXISTS locations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            city TEXT NOT NULL,
            country TEXT NOT NULL,
            timezone TEXT
        )
    """)
    
    # Weather readings reference locations by ID
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_readings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location_id INTEGER NOT NULL,
            temperature REAL,
            conditions TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (location_id) REFERENCES locations(id)
        )
    """)

Now each weather reading stores a location_id that references the locations table. To get the full picture (weather data with city names), you need a JOIN:

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT locations.city, locations.country, 
               weather_readings.temperature, weather_readings.conditions,
               weather_readings.timestamp
        FROM weather_readings
        JOIN locations ON weather_readings.location_id = locations.id
        ORDER BY weather_readings.timestamp DESC
        LIMIT 5
    """)
    
    print("Recent weather readings:")
    for city, country, temp, conditions, timestamp in cursor:
        print(f"  {city}, {country}: {temp}°C, {conditions} ({timestamp})")
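One practical detail when populating related tables: after an INSERT, the cursor's lastrowid attribute holds the id SQLite just assigned, so you can use it as the foreign key for child rows. A self-contained sketch against the two tables above (it creates them if needed):

Python
```python
import sqlite3

with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS locations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            city TEXT NOT NULL,
            country TEXT NOT NULL,
            timezone TEXT
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_readings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location_id INTEGER NOT NULL,
            temperature REAL,
            conditions TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (location_id) REFERENCES locations(id)
        )
    """)

    # Insert the parent row first and capture its generated id
    cur = conn.execute(
        "INSERT INTO locations (city, country) VALUES (?, ?)",
        ("Dublin", "Ireland"),
    )
    dublin_id = cur.lastrowid

    # Use that id as the foreign key in the child row
    conn.execute(
        "INSERT INTO weather_readings (location_id, temperature, conditions) VALUES (?, ?, ?)",
        (dublin_id, 12.5, "Cloudy"),
    )
```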
How JOIN Works

Imagine you have two spreadsheets: one with customer names and IDs, and another with order lists and customer IDs.

A JOIN is like asking Excel to "VLOOKUP" the customer name for every order based on the matching ID, creating a new, combined view where every order row now also has the correct customer name attached to it.

Figure 15.3: An INNER JOIN connects rows from two tables wherever the primary key and foreign key match, creating a combined result set.

For the weather cache project, you don't need separate tables. A single table works fine. But for the music project, you might separate artists, albums, and tracks into different tables, then JOIN them to get complete information.

You'll see JOINs again in later chapters when building more complex database schemas. For now, understand the concept: JOINs let you split data across multiple tables (avoiding repetition) while still querying it as if it were in one table.

You now know the SQL commands that power database applications: CREATE TABLE, INSERT, SELECT with WHERE/ORDER BY/LIMIT, aggregation functions, UPDATE, DELETE, indexes, and JOINs. These operations handle 95% of what you'll need for API projects.

Next, you'll learn Python's sqlite3 module in depth: how to execute these SQL commands safely, handle errors, manage connections, and integrate databases into your Python code.

3. Python's sqlite3 Module

You know SQL. Now you need to execute it from Python. The sqlite3 module bridges Python and SQLite: it handles connections, executes queries, returns results, and manages transactions.

This section teaches you the Python patterns that make database code safe and professional. You'll learn when to commit changes, how to handle errors, why context managers matter, and how to avoid the mistakes that corrupt databases or leak resources.

Every pattern here appears in production code. These aren't academic examples. They're the techniques that separate scripts from reliable applications.

Connections: The Gateway to Your Database

A connection represents your link to the database file. You open a connection, perform operations, then close it. The connection is your session. It tracks changes, manages transactions, and releases resources when you're done.

Here's the basic pattern:

Python
import sqlite3

# Open connection
conn = sqlite3.connect('weather.db')

# Use the connection
cursor = conn.execute("SELECT * FROM weather_history LIMIT 5")
for row in cursor:
    print(row)

# Close connection
conn.close()

sqlite3.connect('weather.db') opens (or creates) the database file. If weather.db doesn't exist, SQLite creates it. If it does exist, SQLite opens it. The connection object lets you execute SQL and manage transactions.

conn.close() releases the file handle. Note that it does not save your work: any changes you haven't committed are rolled back when the connection closes. Always close connections when you're done. Leaving connections open can keep the database file locked and block other processes from accessing it.

The Manual Pattern Has Problems

This manual open-close pattern works, but it's fragile. If your code crashes or raises an exception between connect() and close(), the connection never closes. This leaks resources and can leave the database file locked for other processes.

Professional code uses context managers instead. We'll cover that pattern in a moment. It's the right way to handle connections.

In-Memory Databases: Testing Without Files

SQLite offers a special connection type that exists entirely in memory. No files are created or modified. This is perfect for testing: you get a real database that disappears when your program ends, leaving no cleanup required.

To create an in-memory database, use the special connection string ':memory:' instead of a filename:

In-Memory Database Example
Python
import sqlite3

def test_weather_cache():
    """Test the weather cache logic without creating files."""
    # Use :memory: instead of a filename
    with sqlite3.connect(':memory:') as conn:
        # Create the schema
        conn.execute("""
            CREATE TABLE weather_history (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                location TEXT NOT NULL,
                temperature REAL,
                conditions TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        """)
        
        # Insert test data
        conn.execute("""
            INSERT INTO weather_history (location, temperature, conditions)
            VALUES (?, ?, ?)
        """, ("Dublin", 12.5, "Cloudy"))
        
        # Query it back
        cursor = conn.execute("""
            SELECT temperature, conditions 
            FROM weather_history 
            WHERE location = ?
        """, ("Dublin",))
        
        result = cursor.fetchone()
        assert result[0] == 12.5
        assert result[1] == "Cloudy"
        
        print("✓ Test passed!")
    
    # Database is gone now - no cleanup needed

test_weather_cache()

When the connection closes, the entire database disappears. No weather.db file is created. This makes in-memory databases ideal for unit tests: they're fast (no disk I/O), isolated (each test gets a fresh database), and clean (no leftover files).

When to Use In-Memory Databases

Use :memory: for:

  • Unit tests that need database operations
  • Experimenting with SQL queries without affecting real data
  • Temporary data processing that doesn't need persistence
  • Learning SQLite without cluttering your file system

Use file-based databases for:

  • Production applications (you need persistence!)
  • Data that must survive program restarts
  • Debugging (you want to inspect the database manually)
  • Sharing data between multiple processes

Here's a more complete testing example that validates cache freshness logic:

Testing Cache Freshness
Python
import sqlite3
from datetime import datetime, timedelta, timezone

def test_cache_freshness():
    """Test that cache correctly identifies fresh vs stale data."""
    with sqlite3.connect(':memory:') as conn:
        conn.execute("""
            CREATE TABLE weather_history (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                location TEXT NOT NULL,
                temperature REAL,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        """)
        
        # Insert fresh data (recent)
        conn.execute("""
            INSERT INTO weather_history (location, temperature, timestamp)
            VALUES (?, ?, ?)
        """, ("Dublin", 12.5, datetime.now(timezone.utc)))
        
        # Insert stale data (2 hours ago)
        stale_timestamp = datetime.now(timezone.utc) - timedelta(hours=2)
        conn.execute("""
            INSERT INTO weather_history (location, temperature, timestamp)
            VALUES (?, ?, ?)
        """, ("London", 15.0, stale_timestamp))
        
        # Check Dublin cache (should be fresh)
        cursor = conn.execute("""
            SELECT temperature 
            FROM weather_history 
            WHERE location = ? 
              AND timestamp > datetime('now', '-1 hour')
            ORDER BY timestamp DESC 
            LIMIT 1
        """, ("Dublin",))
        
        dublin_result = cursor.fetchone()
        assert dublin_result is not None, "Dublin cache should be fresh"
        print(f"✓ Dublin cache is fresh: {dublin_result[0]}°C")
        
        # Check London cache (should be stale/missing)
        cursor = conn.execute("""
            SELECT temperature 
            FROM weather_history 
            WHERE location = ? 
              AND timestamp > datetime('now', '-1 hour')
            ORDER BY timestamp DESC 
            LIMIT 1
        """, ("London",))
        
        london_result = cursor.fetchone()
        assert london_result is None, "London cache should be stale"
        print("✓ London cache correctly identified as stale")

test_cache_freshness()

This test validates the cache freshness logic you'll use in production. The database exists only for the duration of the test. No files are created, no cleanup is needed, and the next test gets a completely fresh database.

In-Memory Databases vs Temporary Files

For most tests, ':memory:' is the right choice. But occasionally you need an actual file, for example, when testing backup/restore functionality or when you want to inspect the database after a test fails.

For file-based testing, use Python's tempfile module to create databases in a temporary directory that gets cleaned up automatically:

Python
import sqlite3
import tempfile
import os

def test_with_temp_file():
    """Test using a temporary file instead of in-memory."""
    # Create a temporary database file; mkstemp is the safe replacement
    # for the deprecated, race-prone tempfile.mktemp
    fd, temp_db_path = tempfile.mkstemp(suffix='.db')
    os.close(fd)  # sqlite3 will reopen the file by path
    
    try:
        conn = sqlite3.connect(temp_db_path)
        
        # Create schema and run tests
        conn.execute("""
            CREATE TABLE test_table (id INTEGER PRIMARY KEY, value TEXT)
        """)
        conn.execute("INSERT INTO test_table (value) VALUES (?)", ("test",))
        
        cursor = conn.execute("SELECT value FROM test_table")
        assert cursor.fetchone()[0] == "test"
        
        print("✓ Temporary file test passed")
        
        conn.close()
    finally:
        # Clean up: delete the temporary database file
        if os.path.exists(temp_db_path):
            os.remove(temp_db_path)

test_with_temp_file()

Temporary files are useful when you need to test features that require actual file operations (like database backups or exports) or when you want to inspect the database manually after tests fail.

Testing Best Practices
  • Use :memory: for fast unit tests
  • Use temporary files when testing file operations
  • Create fixtures (helper functions that set up common test data)
  • Test both success cases and error cases (what happens with invalid data?)
  • Never run tests against your production database
  • Reset the database between tests to ensure isolation
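A fixture can be as simple as a helper that returns a fresh in-memory connection with the schema already created, so every test starts from the same clean state. A sketch, assuming the weather schema from earlier (make_test_db is an illustrative name, not part of the project code):

Python
```python
import sqlite3

def make_test_db():
    """Fixture helper: a fresh in-memory database with the weather schema."""
    conn = sqlite3.connect(':memory:')
    conn.execute("""
        CREATE TABLE weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
    return conn

def test_insert_and_count():
    conn = make_test_db()  # each test gets its own isolated database
    conn.execute(
        "INSERT INTO weather_history (location, temperature) VALUES (?, ?)",
        ("Dublin", 12.5),
    )
    count = conn.execute("SELECT COUNT(*) FROM weather_history").fetchone()[0]
    assert count == 1
    conn.close()

test_insert_and_count()
print("✓ Fixture-based test passed")
```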

Chapter 19 covers testing in depth, including pytest fixtures for database setup and how to mock external dependencies. For now, know that testing database code is possible and important. :memory: databases make it easy.

Cursors: Executing Queries and Fetching Results

A cursor executes SQL statements and retrieves results. You get a cursor from a connection, use it to run queries, then fetch the rows it returns.

You've already seen the shortcut pattern: conn.execute() creates a cursor automatically. But you can also create cursors explicitly:

Python
import sqlite3

conn = sqlite3.connect('weather.db')

# Create cursor explicitly
cursor = conn.cursor()

# Execute query
cursor.execute("SELECT location, temperature FROM weather_history WHERE temperature > ?", (15,))

# Fetch results
results = cursor.fetchall()
for location, temp in results:
    print(f"{location}: {temp}°C")

cursor.close()
conn.close()

Both approaches work. conn.execute() is more concise for simple queries. conn.cursor() gives you more control when you need to execute multiple statements or manage result sets carefully.

Cursors Are Iterators

A cursor is an iterator over result rows. You can iterate directly (for row in cursor) or fetch rows explicitly (cursor.fetchall(), cursor.fetchone()). Both patterns work. Choose based on your needs.

Iterating directly is more memory-efficient for large result sets. Fetching all results upfront is simpler when you know the result set is small.

Fetching Results: Three Ways to Retrieve Rows

After executing a SELECT query, you need to retrieve the results. SQLite gives you three methods, each suited to different use cases.

1.

fetchone(): Get One Row

Returns the next row, or None if no rows remain. Use this when you expect a single result or want to process rows one at a time.

Python
import sqlite3

with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT temperature, conditions, timestamp
        FROM weather_history
        WHERE location = ?
        ORDER BY timestamp DESC
        LIMIT 1
    """, ("Dublin",))
    
    result = cursor.fetchone()
    if result:
        temp, conditions, timestamp = result
        print(f"Latest Dublin weather: {temp}°C, {conditions}")
    else:
        print("No weather data found for Dublin")

fetchone() is perfect for queries that return a single row: "get the most recent reading," "check if a record exists," "find a specific user by ID."

2.

fetchall(): Get All Rows at Once

Returns a list of all remaining rows. Use this when you need all results and the result set is small (hundreds of rows, not millions).

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, AVG(temperature) as avg_temp
        FROM weather_history
        GROUP BY location
        ORDER BY avg_temp DESC
    """)
    
    all_results = cursor.fetchall()
    
    print("Average temperatures by city:")
    for location, avg_temp in all_results:
        print(f"  {location}: {avg_temp:.1f}°C")

fetchall() loads the entire result set into memory. This is fine for analytics queries that return a few dozen rows. Avoid it for queries that might return thousands of rows. Use iteration instead.

3.

fetchmany(): Get Rows in Batches

Returns the next N rows. Use this for processing large result sets in chunks, balancing memory usage and database calls.

Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("SELECT * FROM weather_history")
    
    batch_size = 100
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        
        # Process this batch
        for row in rows:
            # Do something with each row (export to CSV, send to API, etc.)
            process_row(row)
        
        print(f"Processed {len(rows)} rows")

fetchmany(100) retrieves 100 rows at a time. When those rows are exhausted, call it again to get the next 100. This pattern handles arbitrarily large result sets without loading everything into memory.

Choosing the Right Fetch Method

Use fetchone() when: You expect exactly one result, or you're checking if a row exists

Use fetchall() when: Result set is small (< 1000 rows) and you need all results at once

Use iteration when: Result set might be large and you can process rows one at a time

Use fetchmany() when: Result set is large but you need to process in batches (exporting, sending to API)

For the weather cache and music project, fetchone() and fetchall() handle most use cases. You'll rarely have more than a few thousand rows.

Commits and Rollbacks: Managing Transactions

When you INSERT, UPDATE, or DELETE data, the changes aren't written to disk immediately. They're held in a transaction. You must commit the transaction to make changes permanent, or rollback to discard them.

SELECT queries don't need commits. They just read data. But any query that modifies the database requires a commit:

Python
import sqlite3

conn = sqlite3.connect('weather.db')

# Insert data
conn.execute("""
    INSERT INTO weather_history (location, temperature, conditions, timestamp)
    VALUES (?, ?, ?, ?)
""", ("Dublin", 14.2, "Sunny", "2024-03-17 10:00:00"))

# Changes are pending - not yet written to disk
# You must commit to make them permanent
conn.commit()

conn.close()

Without conn.commit(), your INSERT never hits the database. When the connection closes, pending changes are rolled back and discarded.

Rollbacks let you discard changes if something goes wrong:

Python
import sqlite3

conn = sqlite3.connect('weather.db')

try:
    # Insert multiple records as a transaction
    conn.execute("INSERT INTO weather_history (location, temperature) VALUES (?, ?)", 
                 ("Dublin", 14.2))
    conn.execute("INSERT INTO weather_history (location, temperature) VALUES (?, ?)", 
                 ("London", 16.5))
    conn.execute("INSERT INTO weather_history (location, temperature) VALUES (?, ?)", 
                 ("Paris", 18.0))
    
    # Commit all changes together
    conn.commit()
    print("Successfully inserted 3 weather records")
    
except sqlite3.Error as e:
    # Something went wrong - discard all changes
    conn.rollback()
    print(f"Error occurred: {e}. Changes rolled back.")
    
finally:
    conn.close()
Transactions Ensure Consistency

Transactions let you group multiple operations into an atomic unit. Either all operations succeed (commit), or none of them do (rollback). You never end up in a partial state where some changes saved but others didn't.

This matters when multiple INSERTs or UPDATEs depend on each other. If inserting weather data for five cities and the fourth INSERT fails, you probably want to rollback all five to maintain consistency.

Transaction lifecycle: SQL changes stay pending until you explicitly commit or rollback. Context managers handle this automatically.

By default, SQLite uses autocommit mode: each statement commits immediately unless you explicitly begin a transaction. The sqlite3 module disables autocommit by default, so you must call commit() explicitly. This is safer. You control when changes become permanent.
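For batches like the three-city insert above, executemany runs one statement over a sequence of parameter tuples, all inside the same transaction. A self-contained sketch (it also creates the table if needed so it runs on a fresh database):

Python
```python
import sqlite3

readings = [
    ("Dublin", 14.2),
    ("London", 16.5),
    ("Paris", 18.0),
]

conn = sqlite3.connect('weather.db')
try:
    # Make sure the table exists so the sketch is self-contained
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL
        )
    """)
    # One statement, a sequence of parameter tuples, one transaction
    conn.executemany(
        "INSERT INTO weather_history (location, temperature) VALUES (?, ?)",
        readings,
    )
    conn.commit()
    print(f"Inserted {len(readings)} weather records")
except sqlite3.Error:
    conn.rollback()  # discard the whole batch if any insert fails
    raise
finally:
    conn.close()
```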

Context Managers: The Professional Pattern

You've seen the manual pattern: connect(), do work, commit(), close(). It works, but it's fragile. If an exception occurs between opening and closing, the connection leaks and changes might not commit.

Professional code uses context managers, the with statement. This pattern guarantees proper cleanup even when errors occur:

Python
import sqlite3

with sqlite3.connect('weather.db') as conn:
    conn.execute("""
        INSERT INTO weather_history (location, temperature, conditions)
        VALUES (?, ?, ?)
    """, ("Dublin", 14.2, "Sunny"))
    
    # Commit happens automatically if no exception occurs;
    # on an exception, the changes are rolled back instead

The with statement handles transaction cleanup automatically. If the code inside the block succeeds, the changes commit. If an exception occurs, the changes roll back. Note one subtlety: sqlite3's connection context manager manages the transaction, not the connection's lifetime, so close the connection explicitly when you're finished with it.

What the Context Manager Does

On entry (with sqlite3.connect...): Opens the database connection

On success (block completes normally): Commits pending changes

On exception (error occurs in block): Rolls back pending changes and re-raises the exception

On exit (either way): The connection itself stays open; call conn.close(), or wrap the connection in contextlib.closing, when you're done with it

You never forget to commit. You never leave the database mid-transaction in an inconsistent state. This is why professional code always uses context managers.
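If you want the connection's close() guaranteed as well (the sqlite3 context manager is documented to manage transactions, not the connection's lifetime), the standard-library contextlib.closing helper provides it. A sketch that creates the table if needed so it runs on a fresh database:

Python
```python
import sqlite3
from contextlib import closing

# closing() guarantees conn.close() on exit; the inner `with conn`
# block commits on success and rolls back on an exception
with closing(sqlite3.connect('weather.db')) as conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL,
            conditions TEXT
        )
    """)
    with conn:
        conn.execute(
            "INSERT INTO weather_history (location, temperature, conditions) VALUES (?, ?, ?)",
            ("Dublin", 14.2, "Sunny"),
        )
```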

The context manager replaces the error-prone manual commit and rollback steps. Seven lines of fragile boilerplate reduce to two lines that are always safe.

Compare the manual pattern to the context manager pattern:

Python - Manual (Fragile)
conn = sqlite3.connect('weather.db')
try:
    conn.execute("INSERT INTO weather_history (...) VALUES (...)", (...))
    conn.commit()
except Exception as e:
    conn.rollback()
    raise
finally:
    conn.close()
Python - Context Manager (Professional)
with sqlite3.connect('weather.db') as conn:
    conn.execute("INSERT INTO weather_history (...) VALUES (...)", (...))

The context manager version handles the commit and rollback steps automatically and safely; only the final close() remains your responsibility. Always use context managers for database connections. This isn't optional style advice, it's how you prevent forgotten commits and half-applied transactions.

Error Handling: When Databases Fight Back

Databases fail in predictable ways. Files get locked. Constraints get violated. Queries reference nonexistent columns. Professional code anticipates these failures and handles them gracefully.

The sqlite3 module raises specific exceptions for different error types. Catch them to provide useful error messages:

Python
import sqlite3

def save_weather(location, temperature, conditions):
    try:
        with sqlite3.connect('weather.db') as conn:
            conn.execute("""
                INSERT INTO weather_history (location, temperature, conditions, timestamp)
                VALUES (?, ?, ?, datetime('now'))
            """, (location, temperature, conditions))
            print(f"Saved weather for {location}")
            
    except sqlite3.IntegrityError as e:
        # Constraint violation (duplicate key, foreign key, NOT NULL, etc.)
        print(f"Cannot save weather: {e}")
        print("Check that all required fields are provided and constraints are satisfied")
        
    except sqlite3.OperationalError as e:
        # Database locked, table doesn't exist, syntax error
        print(f"Database operation failed: {e}")
        print("Ensure the database file is accessible and not locked by another process")
        
    except sqlite3.DatabaseError as e:
        # Generic database error (catches all sqlite3 errors)
        print(f"Database error: {e}")
        
    except Exception as e:
        # Unexpected error (network issues, disk full, etc.)
        print(f"Unexpected error: {e}")
Common SQLite Errors

IntegrityError: Constraint violation. You tried to insert a duplicate primary key, violate a foreign key constraint, or insert NULL into a NOT NULL column.

OperationalError: Database locked (another process is writing), table doesn't exist, or SQL syntax error.

ProgrammingError: You called a method on a closed connection or cursor, or used the API incorrectly.

Most errors you'll encounter are IntegrityError (bad data) or OperationalError (database locked). Handle these explicitly and let other exceptions propagate for debugging.

The most common error is OperationalError: database is locked. This happens when one process is writing while another tries to write simultaneously. SQLite allows multiple readers but only one writer at a time.

For single-user applications (like the weather cache or music project), database locking shouldn't occur. If it does, you probably have unclosed connections or are accessing the database from multiple threads without proper locking.
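For transient locks, sqlite3.connect accepts a timeout parameter: the number of seconds to wait for a competing lock to clear before raising the error (the default is 5.0). A minimal sketch:

Python
```python
import sqlite3

# Wait up to 15 seconds for a competing writer's lock to clear before
# raising OperationalError: database is locked
conn = sqlite3.connect('weather.db', timeout=15.0)
try:
    conn.execute("SELECT 1")  # any work done here gets the longer lock wait
finally:
    conn.close()
```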

Defensive Error Handling

Don't let database errors crash your application silently. Catch specific exceptions, log them with context, and either retry (for transient errors like locks) or fail gracefully with a user-friendly message.

For the weather cache: if saving to the database fails, you can still return the weather data from the API. The cache is an optimization, not a requirement. Don't let a locked database prevent users from seeing weather data.
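That advice can be captured in a small wrapper that logs the failure and carries on, so the caller still gets its weather data. A sketch (save_weather_best_effort and the db_path parameter are illustrative, not part of the project code):

Python
```python
import logging
import sqlite3

def save_weather_best_effort(location, temperature, conditions, db_path='weather.db'):
    """Try to cache a reading; never let a cache failure break the caller."""
    try:
        with sqlite3.connect(db_path) as conn:
            conn.execute("""
                INSERT INTO weather_history (location, temperature, conditions)
                VALUES (?, ?, ?)
            """, (location, temperature, conditions))
    except sqlite3.Error:
        # The cache is an optimization: record the failure and move on
        logging.warning("Weather cache write failed; continuing without cache",
                        exc_info=True)

save_weather_best_effort("Dublin", 12.5, "Cloudy")  # never raises
```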

Best Practices: Patterns for Production Code

You've learned the mechanics. Now here are the patterns that separate hobby scripts from production-ready code.

1.

Always Use Parameterized Queries

Never use f-strings or string concatenation to build SQL. Always use ? placeholders and pass values as a tuple. This prevents SQL injection attacks and handles quoting automatically.

Python - NEVER Do This
# DANGEROUS - vulnerable to SQL injection
location = user_input  # Could be: "Dublin'; DROP TABLE weather_history; --"
conn.execute(f"SELECT * FROM weather_history WHERE location = '{location}'")
Python - ALWAYS Do This
# SAFE - parameterized query prevents SQL injection
location = user_input
conn.execute("SELECT * FROM weather_history WHERE location = ?", (location,))
2.

Always Use Context Managers

Use with sqlite3.connect(...) as conn: for every database operation. This guarantees your changes commit on success and roll back on errors, even when exceptions occur. Pair it with an explicit close (or contextlib.closing) so connections don't linger.

3.

Close Connections Quickly

Don't keep connections open longer than necessary. Open the connection, do your work, let the context manager close it. Holding connections open blocks other processes from writing.

4.

Validate Data Before INSERT

Check that data is valid before inserting it. Verify types, check for None where NOT NULL is required, validate ranges. Catching errors early (in Python) gives you better error messages than letting the database reject the data.

Python
def save_weather(location, temperature, conditions):
    # Validate data before database operation
    if not location:
        raise ValueError("Location is required")
    
    if not isinstance(temperature, (int, float)):
        raise ValueError(f"Temperature must be numeric, got {type(temperature)}")
    
    if temperature < -100 or temperature > 100:
        raise ValueError(f"Temperature {temperature}°C is outside valid range")
    
    # Now insert with confidence
    with sqlite3.connect('weather.db') as conn:
        conn.execute("""
            INSERT INTO weather_history (location, temperature, conditions, timestamp)
            VALUES (?, ?, ?, datetime('now'))
        """, (location, temperature, conditions))
5.

Use Row Factories for Readable Results

By default, SQLite returns query results as tuples. You access columns by numeric index, which works but is fragile and hard to read:

Tuple-Based Access (Default)
Python
with sqlite3.connect('weather.db') as conn:
    cursor = conn.execute("""
        SELECT location, temperature, conditions, timestamp 
        FROM weather_history 
        WHERE location = ?
        ORDER BY timestamp DESC 
        LIMIT 1
    """, ("Dublin",))

    row = cursor.fetchone()
    if row:
        # Which index is which? You have to count columns in the SELECT
        print(f"{row[0]}: {row[1]}°C, {row[2]}")
        # What if you change the SELECT order? All indices break

This approach has problems: changing the SELECT order breaks all your index references, reading row[1] doesn't tell you what the value represents, and passing tuples to templates forces you to remember what each position means.

SQLite provides a better way: Row factories convert query results into dict-like objects that support both index and name-based access.

Dictionary-Like Access (Recommended)
Python
with sqlite3.connect('weather.db') as conn:
    # Enable Row factory for dictionary-like access
    conn.row_factory = sqlite3.Row

    cursor = conn.execute("""
        SELECT location, temperature, conditions, timestamp 
        FROM weather_history 
        WHERE location = ?
        ORDER BY timestamp DESC 
        LIMIT 1
    """, ("Dublin",))

    row = cursor.fetchone()
    if row:
        # Column names are explicit - much clearer!
        print(f"{row['location']}: {row['temperature']}°C, {row['conditions']}")

        # You can still use indices if needed
        print(f"First column: {row[0]}")

        # Works with dict() conversion for templates
        weather_data = dict(row)
        # {'location': 'Dublin', 'temperature': 12.5, 'conditions': 'cloudy', ...}

With sqlite3.Row, your code becomes self-documenting. row['temperature'] is immediately clear. Reordering SELECT columns doesn't break anything. And when you pass these rows to Flask templates (Chapter 17), you can access them naturally: {{ weather.temperature }}.

When to Use Row Factories

Use sqlite3.Row when:

  • Building web applications (Flask, Django) where templates need named access
  • Working with queries that return many columns (10+ columns become unmanageable as tuples)
  • Refactoring queries frequently (adding/removing/reordering columns)
  • Writing code others will maintain (explicit names improve readability)

Tuples are fine when:

  • You're only selecting 1-2 columns and position is obvious
  • Performance is absolutely critical (Row has tiny overhead, but it exists)
  • You're immediately unpacking results: location, temp = row

For the Music Time Machine project, you'll use sqlite3.Row extensively. Your queries return tracks with 8+ columns (title, artist, album, play_date, energy, valence, danceability, tempo). Accessing by name keeps the code maintainable.
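If you want plain dictionaries rather than Row objects (for JSON serialization, say), you can install a small custom row factory; this pattern appears in the sqlite3 documentation. A sketch that creates the table if needed so it runs on a fresh database:

Python
```python
import sqlite3

def dict_factory(cursor, row):
    """Build a plain dict from a result row, keyed by column name."""
    return {col[0]: value for col, value in zip(cursor.description, row)}

with sqlite3.connect('weather.db') as conn:
    conn.row_factory = dict_factory
    conn.execute("""
        CREATE TABLE IF NOT EXISTS weather_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            location TEXT NOT NULL,
            temperature REAL
        )
    """)
    cursor = conn.execute(
        "SELECT location, temperature FROM weather_history LIMIT 1"
    )
    row = cursor.fetchone()  # a dict like {'location': ..., 'temperature': ...}, or None
    if row:
        print(row['location'], row['temperature'])
```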

You now understand Python's sqlite3 module: connections, cursors, fetching results, transactions, context managers, error handling, and professional patterns. You can execute SQL safely, handle failures gracefully, and manage database resources properly.

Next, you'll build the weather API cache: a complete project that combines everything from Sections 2 and 3. You'll design a schema, fetch data from an API, store it in SQLite, check cache freshness, and query historical trends. This is where SQL and Python come together to build an application with memory.

4. Hands-On Project: Weather API Cache

You know SQL. You know Python's sqlite3 module. Now you'll combine them with API calls to build something useful: a weather cache that remembers past data and avoids unnecessary API requests.

This project demonstrates why databases matter for API applications. Without a database, you'd fetch the same weather data repeatedly, burning through rate limits and wasting time on network calls. With a cache, you store responses and reuse them when they're still fresh. The API gives you current data. The database gives you history and speed.

By the end of this section, you'll have a complete weather cache that stores API responses, checks cache freshness, returns cached data when it's recent, fetches fresh data when it's stale, and analyzes temperature trends over time. These exact patterns power the music project in Chapter 16.

Designing the Schema

Before writing code, design your schema. What data do you need to store? What queries will you run? How will you check if data is fresh?

For the weather cache, you need:

  • Location (which city the weather data is for)
  • Temperature, conditions, humidity, wind speed (the actual weather data)
  • Timestamp (when you fetched this data from the API)
  • A unique ID for each reading

The queries you'll run most often:

  • "Is there cached data for Dublin from the last hour?" (cache freshness check)
  • "What was the temperature trend for Dublin over the last 7 days?" (historical analysis)
  • "What's the average temperature for each city I've tracked?" (aggregation)

These queries filter by location and timestamp frequently, so those columns need indexes. Here's the complete schema:

Python
import sqlite3

def create_tables():
    """Create the weather cache table and indexes."""
    with sqlite3.connect('weather_cache.db') as conn:
        # Main weather table
        conn.execute("""
            CREATE TABLE IF NOT EXISTS weather_history (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                location TEXT NOT NULL,
                temperature REAL NOT NULL,
                conditions TEXT,
                humidity INTEGER,
                wind_speed REAL,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
                fetched_at DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        """)
        
        # Index for cache lookups (location + recent timestamp)
        conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_location_timestamp 
            ON weather_history(location, timestamp DESC)
        """)
        
        # Index for location-only queries
        conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_location 
            ON weather_history(location)
        """)
        
        print("Database tables and indexes created successfully")

# Run this once to set up the database
create_tables()
Schema Design Decisions

timestamp: When the weather reading was reported (from API). This is the actual weather time.

fetched_at: When we retrieved this data from the API. Used for cache freshness checks.

NOT NULL on key fields: Location and temperature are required. If the API doesn't return them, we don't cache that response.

Composite index (location, timestamp DESC): Optimizes "get most recent weather for Dublin" queries. Exactly what cache freshness checks do.

IF NOT EXISTS: Safe to run multiple times. The table and indexes only get created if they don't already exist.

Fetching and Storing Weather Data

First, you need to fetch weather from the API. This uses the error handling patterns from Chapter 9: timeouts, status code checking, defensive JSON parsing.

Python
import requests
from datetime import datetime, timezone

API_KEY = "YOUR_OPENWEATHERMAP_API_KEY"  # Get free key at openweathermap.org

def fetch_weather_from_api(location):
    """Fetch current weather from OpenWeatherMap API."""
    try:
        response = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={
                "q": location,
                "appid": API_KEY,
                "units": "metric"
            },
            timeout=10
        )
        response.raise_for_status()
        
        data = response.json()
        
        # Extract weather data with defensive parsing
        weather = {
            "location": location,
            "temperature": data.get("main", {}).get("temp"),
            "conditions": data.get("weather", [{}])[0].get("description", "Unknown"),
            "humidity": data.get("main", {}).get("humidity"),
            "wind_speed": data.get("wind", {}).get("speed"),
            "timestamp": datetime.now(timezone.utc)
        }
        
        return weather
        
    except requests.exceptions.Timeout:
        print(f"Request timed out for {location}")
        return None
        
    except requests.exceptions.RequestException as e:
        print(f"Error fetching weather for {location}: {e}")
        return None

Now store the weather data in the database:

Python
def save_weather(weather):
    """Save weather data to the database."""
    # Check 'is None' rather than falsiness: a temperature of 0.0°C is valid data
    if not weather or weather.get("temperature") is None:
        print("Invalid weather data - not caching")
        return False
    
    try:
        with sqlite3.connect('weather_cache.db') as conn:
            conn.execute("""
                INSERT INTO weather_history 
                (location, temperature, conditions, humidity, wind_speed, timestamp, fetched_at)
                VALUES (?, ?, ?, ?, ?, ?, ?)
            """, (
                weather["location"],
                weather["temperature"],
                weather["conditions"],
                weather["humidity"],
                weather["wind_speed"],
                weather["timestamp"],
                # ISO text: Python 3.12+ deprecates the implicit datetime adapter
                datetime.now(timezone.utc).isoformat(sep=" ", timespec="seconds")
            ))
            
        print(f"Cached weather for {weather['location']}: {weather['temperature']}°C")
        return True
        
    except sqlite3.Error as e:
        print(f"Database error: {e}")
        return False
What Just Happened

fetch_weather_from_api() calls the OpenWeatherMap API with error handling. If the request fails, it returns None instead of crashing.

save_weather() validates the data before inserting. If temperature is missing (the most critical field), we don't cache the response. Bad data doesn't pollute the database.

Both functions use parameterized queries (? placeholders) to prevent SQL injection. Both use context managers to guarantee proper cleanup.

The pattern is: fetch from API → validate → store in database. This separates concerns and makes each function testable independently.
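One way to see that last point: if the validation rule lived in its own small function (a hypothetical refactor, not code from this chapter), you could exercise it without a network connection or a database file:

```python
def is_cacheable(weather):
    """Return True if a weather dict carries the fields required before caching."""
    return bool(weather) and weather.get("temperature") is not None

# Each case runs instantly, with no API call and no database
assert is_cacheable({"temperature": 12.5}) is True
assert is_cacheable({"temperature": None}) is False
assert is_cacheable(None) is False
assert is_cacheable({"temperature": 0.0}) is True   # 0.0°C is falsy but valid
```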

Cache Freshness: The Smart Part

Here's where the cache becomes useful. Before fetching from the API, check if you have recent data in the database. If the cached data is less than an hour old, return it. If it's stale or doesn't exist, fetch fresh data.

Python
def get_cached_weather(location, max_age_minutes=60):
    """
    Get weather from cache if fresh, otherwise return None.
    
    Args:
        location: City name
        max_age_minutes: Consider cache fresh if data is newer than this (default 60 minutes)
    
    Returns:
        Dictionary with weather data if cache is fresh, None otherwise
    """
    try:
        with sqlite3.connect('weather_cache.db') as conn:
            conn.row_factory = sqlite3.Row
            
            cursor = conn.execute("""
                SELECT location, temperature, conditions, humidity, wind_speed, 
                       timestamp, fetched_at
                FROM weather_history
                WHERE location = ?
                  AND fetched_at > datetime('now', ? || ' minutes')
                ORDER BY fetched_at DESC
                LIMIT 1
            """, (location, -max_age_minutes))
            
            row = cursor.fetchone()
            
            if row:
                print(f"Cache HIT for {location} (age: {max_age_minutes} min threshold)")
                return dict(row)
            else:
                print(f"Cache MISS for {location} (no recent data)")
                return None
                
    except sqlite3.Error as e:
        print(f"Database error: {e}")
        return None
How Cache Freshness Works

The query WHERE fetched_at > datetime('now', '-60 minutes') finds data fetched within the last hour. SQLite's datetime('now', '-60 minutes') calculates "60 minutes ago."

We parameterize the minutes value: ? || ' minutes' concatenates the parameter with the string " minutes", so passing -60 produces '-60 minutes'.

ORDER BY fetched_at DESC LIMIT 1 gets the most recent cached entry. If multiple readings exist within the freshness window, we want the newest.

conn.row_factory = sqlite3.Row lets us return dict(row), which converts the Row object to a dictionary with column names as keys.

Now combine cache checking with API fetching:

Python
def get_weather(location, max_age_minutes=60):
    """
    Get weather for a location, using cache if fresh or fetching from API if stale.
    
    This is the main function users call. It handles the cache-or-fetch logic
    automatically and always returns weather data (or None on total failure).
    """
    # Try cache first
    cached = get_cached_weather(location, max_age_minutes)
    if cached:
        return cached
    
    # Cache miss or stale - fetch from API
    print(f"Fetching fresh weather for {location} from API...")
    weather = fetch_weather_from_api(location)
    
    if weather:
        # Save to cache for next time
        save_weather(weather)
        return weather
    else:
        print(f"Failed to fetch weather for {location}")
        return None

This is the power of caching. The first call fetches from the API and stores the result. The second call returns instantly from the database. After an hour, the cache expires and you fetch fresh data automatically.

Let's test it:

Python
# First call - fetches from API
weather = get_weather("Dublin")
if weather:
    print(f"\n{weather['location']}: {weather['temperature']}°C, {weather['conditions']}")

# Second call (immediate) - returns from cache
weather = get_weather("Dublin")
if weather:
    print(f"\n{weather['location']}: {weather['temperature']}°C, {weather['conditions']}")
Terminal Output
Cache MISS for Dublin (no recent data)
Fetching fresh weather for Dublin from API...
Cached weather for Dublin: 12.5°C

Dublin: 12.5°C, light rain

Cache HIT for Dublin (age: 60 min threshold)

Dublin: 12.5°C, light rain

The first call took a second (network request). The second call returned instantly (database query). You just avoided an unnecessary API call while still getting the data you needed.

Cache flowchart: the first call for a city fetches from the API and stores the result. Every subsequent call within 60 minutes returns instantly from the database.

Analytics: Querying Historical Data

Now that you're caching weather data over time, you can analyze trends. This is where databases shine. Queries that would require loading thousands of API responses into memory happen instantly.

Python
def get_temperature_trend(location, days=7):
    """Get daily average temperature for the last N days."""
    with sqlite3.connect('weather_cache.db') as conn:
        cursor = conn.execute("""
            SELECT date(timestamp) as day, 
                   AVG(temperature) as avg_temp,
                   COUNT(*) as reading_count
            FROM weather_history
            WHERE location = ?
              AND timestamp >= date('now', ? || ' days')
            GROUP BY date(timestamp)
            ORDER BY day
        """, (location, -days))
        
        results = cursor.fetchall()
        
        if not results:
            print(f"No historical data for {location} in the last {days} days")
            return []
        
        return results
Python
def show_temperature_trend(location, days=7):
    """Display temperature trend for a location."""
    print(f"\n{location} - Temperature Trend (Last {days} Days)")
    print("=" * 50)
    
    trend = get_temperature_trend(location, days)
    
    if not trend:
        return
    
    for day, avg_temp, count in trend:
        print(f"{day}: {avg_temp:.1f}°C (from {count} readings)")
    
    # Calculate overall statistics
    temps = [avg_temp for _, avg_temp, _ in trend]
    if temps:
        print(f"\nOverall Average: {sum(temps) / len(temps):.1f}°C")
        print(f"Range: {min(temps):.1f}°C to {max(temps):.1f}°C")
Python
# Show trend after collecting data for a week
show_temperature_trend("Dublin", days=7)
Terminal Output
Dublin - Temperature Trend (Last 7 Days)
==================================================
2024-03-11: 11.2°C (from 24 readings)
2024-03-12: 12.8°C (from 24 readings)
2024-03-13: 13.5°C (from 24 readings)
2024-03-14: 12.1°C (from 24 readings)
2024-03-15: 14.3°C (from 24 readings)
2024-03-16: 13.9°C (from 24 readings)
2024-03-17: 12.6°C (from 18 readings)

Overall Average: 12.9°C
Range: 11.2°C to 14.3°C

This analytics query groups readings by day, calculates the average temperature for each day, and orders results chronologically. The database does all the heavy lifting (grouping, averaging, sorting) in milliseconds.

Here's another useful query: comparing multiple cities:

Python
def compare_cities():
    """Compare average temperatures across all cached cities."""
    with sqlite3.connect('weather_cache.db') as conn:
        cursor = conn.execute("""
            SELECT location,
                   AVG(temperature) as avg_temp,
                   MIN(temperature) as min_temp,
                   MAX(temperature) as max_temp,
                   COUNT(*) as total_readings
            FROM weather_history
            WHERE timestamp >= date('now', '-7 days')
            GROUP BY location
            ORDER BY avg_temp DESC
        """)
        
        print("\nCity Comparison (Last 7 Days)")
        print("=" * 70)
        print(f"{'City':<15} {'Avg Temp':<12} {'Min':<10} {'Max':<10} {'Readings':<10}")
        print("-" * 70)
        
        for location, avg_temp, min_temp, max_temp, count in cursor:
            print(f"{location:<15} {avg_temp:>8.1f}°C  {min_temp:>8.1f}°C  {max_temp:>8.1f}°C  {count:>8}")

compare_cities()
Terminal Output
City Comparison (Last 7 Days)
======================================================================
City            Avg Temp     Min        Max        Readings  
----------------------------------------------------------------------
Paris               16.8°C      14.2°C      19.1°C       168
London              14.3°C      11.5°C      16.8°C       168
Dublin              12.9°C      11.2°C      14.3°C       162

These analytics queries demonstrate the value of persistent storage. Without a database, you'd need to call the API repeatedly to gather historical data, or maintain complex in-memory data structures that disappear when your program ends. With SQLite, the data accumulates automatically and queries run instantly.

Putting It All Together

Here's the complete weather cache as a single runnable program. This combines everything from this section: schema creation, API fetching, cache logic, and analytics.

Python
"""
Weather API Cache - Complete Example
Demonstrates SQLite database integration with API calls
"""
import sqlite3
import requests
from datetime import datetime, timezone

API_KEY = "YOUR_OPENWEATHERMAP_API_KEY"

# [Include all the functions defined above:
#  - create_tables()
#  - fetch_weather_from_api()
#  - save_weather()
#  - get_cached_weather()
#  - get_weather()
#  - get_temperature_trend()
#  - show_temperature_trend()
#  - compare_cities()
# ]

def main():
    """Demonstrate the weather cache in action."""
    # Set up database
    print("Setting up weather cache database...")
    create_tables()
    
    # Track weather for multiple cities
    cities = ["Dublin", "London", "Paris"]
    
    print("\nFetching weather for cities...")
    for city in cities:
        weather = get_weather(city)
        if weather:
            print(f"  {city}: {weather['temperature']}°C, {weather['conditions']}")
    
    # Demonstrate cache hit
    print("\nFetching same cities again (should hit cache)...")
    for city in cities:
        weather = get_weather(city)
        if weather:
            print(f"  {city}: {weather['temperature']}°C, {weather['conditions']}")
    
    # Show analytics (after collecting data for a while)
    print("\n" + "=" * 70)
    compare_cities()

if __name__ == "__main__":
    main()
Running the Complete Example

To run this project:

1. Sign up for a free API key at openweathermap.org

2. Replace YOUR_OPENWEATHERMAP_API_KEY with your actual key

3. Install requests, ideally inside a virtual environment: python -m pip install requests

4. Run the script: python weather_cache.py

The first run creates the database. Subsequent runs add to it. Leave it running for a week (fetch weather every hour) and you'll have rich historical data for analytics.
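A minimal collection loop might look like the sketch below. The `fetch` parameter stands in for this section's get_weather, and the `rounds` parameter is an addition of this sketch so the loop can be bounded (and tested) instead of running forever:

```python
import time

def collect(cities, fetch, interval_seconds=3600, rounds=None):
    """Call fetch(city) for every city once per round, sleeping between rounds.

    rounds=None loops forever (stop with Ctrl+C); pass a number to bound it.
    """
    completed = 0
    while rounds is None or completed < rounds:
        for city in cities:
            fetch(city)
        completed += 1
        if rounds is None or completed < rounds:
            time.sleep(interval_seconds)

# Hourly, forever: collect(["Dublin", "London", "Paris"], get_weather)
```

In practice you'd more likely schedule the script with cron or a task scheduler, but the loop works fine for a week-long experiment.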

What You Built and Why It Matters

This weather cache demonstrates every database concept from Sections 2 and 3:

  • Schema design: Chose appropriate data types, added NOT NULL constraints, created indexes for common queries
  • CRUD operations: INSERT to cache responses, SELECT to retrieve data, no UPDATE/DELETE needed (append-only cache)
  • Defensive SQL: Parameterized queries prevent SQL injection, context managers guarantee cleanup
  • Time-based filtering: SQLite date functions determine cache freshness
  • Aggregation: GROUP BY and AVG() calculate daily temperature trends
  • Error handling: Caught specific exceptions, validated data before insertion

More importantly, you built something that solves a real problem. Without the cache, each weather request hits the API, burns rate limits, and takes a second. With the cache, requests return instantly and you have historical data for analysis. The API gives you current information. The database gives you memory and speed.

These exact patterns power the music project in Chapter 16. Instead of weather data, you'll cache Spotify listening history. Instead of temperature trends, you'll analyze musical evolution. But the fundamentals are identical: fetch from API, store in SQLite, check cache freshness, query historical patterns.

From Cache to Application

The weather cache is functional but basic. To make it production-ready, you'd add:

Cleanup logic: Delete data older than 30 days to prevent unlimited growth

Configurable freshness: Different cities might need different cache durations

Multiple API sources: Fall back to a secondary weather API if the primary fails

Web interface: Flask dashboard to visualize trends (covered in Chapter 17)

These enhancements use the same SQLite foundation you built here. You've learned the core patterns. Extensions are just more SQL queries and Python functions.
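The cleanup enhancement, for example, is one function using the same parameterized date arithmetic as the freshness check. The function name and the 30-day default below are illustrative choices, not part of the chapter's project:

```python
import sqlite3

def purge_old_readings(db_path="weather_cache.db", max_age_days=30):
    """Delete cached readings older than max_age_days; return how many rows were removed."""
    with sqlite3.connect(db_path) as conn:
        cursor = conn.execute(
            "DELETE FROM weather_history WHERE fetched_at < datetime('now', ? || ' days')",
            (-max_age_days,),   # -30 produces the modifier '-30 days'
        )
        return cursor.rowcount  # DELETE sets rowcount to the number of rows removed
```

Run it once a day (or at startup) and the table stays bounded.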

You've built a complete database-backed application. You can fetch data from APIs, store it in SQLite, query it intelligently, and analyze trends over time. Your applications now have memory.

The final section of this chapter covers schema design best practices: when to normalize data, how to handle schema changes, and patterns for designing databases that grow with your application. These concepts prepare you for the more complex schemas you'll design in the music project.

5. Schema Design Best Practices

You've built a working database application. Now let's discuss the design decisions that separate adequate schemas from excellent ones. These patterns help you build databases that perform well, handle growth gracefully, and adapt to changing requirements without requiring complete rewrites.

Schema design is about trade-offs. Every decision has consequences for performance, maintainability, and flexibility. This section teaches you when to optimize for speed versus simplicity, when to split tables versus keeping them combined, and how to evolve schemas as your application grows.

These aren't rigid rules. They're guidelines informed by production experience. You'll learn when to follow them and when breaking them makes sense for your specific use case.

Choosing the Right Data Types

Data type decisions affect storage size, query performance, and what operations you can perform. Choose types based on what you need to do with the data, not just what fits.

1. INTEGER for Counts and IDs

Use INTEGER for anything you'll count, compare, or do math on: row IDs, play counts, years, quantities. SQLite stores integers efficiently and comparison operations (>, <) are fast.

SQL
-- Good: INTEGER for countable values
play_count INTEGER,
track_duration_ms INTEGER,
release_year INTEGER,
user_age INTEGER

-- Bad: TEXT for numeric values
play_count TEXT,  -- Can't do: WHERE play_count > 10
release_year TEXT  -- Can't calculate age: 2024 - release_year
2. REAL for Decimal Precision

Use REAL for decimal numbers: temperatures, prices, ratings, percentages. Don't use INTEGER if precision matters. 12.5°C and 12°C are different temperatures.

SQL
-- Good: REAL preserves decimals
temperature REAL,        -- 12.5, 12.7, 13.2
price REAL,             -- 9.99, 10.50
track_popularity REAL   -- 0.75, 0.82

-- Bad: INTEGER loses precision
temperature INTEGER     -- 12.5 becomes 12, 12.7 becomes 12
3. TEXT for Strings of Any Length

Use TEXT for names, descriptions, URLs, JSON, or anything textual. SQLite's TEXT type has no length limit, so you don't need to guess "is 50 characters enough for artist names?"

SQL
-- TEXT handles strings of any length
artist_name TEXT,
track_title TEXT,
album_url TEXT,
spotify_uri TEXT,
raw_api_response TEXT  -- Can store entire JSON responses
Storing JSON in SQLite

API responses are often too complex to fully normalize into columns. A Spotify track response contains nested objects, arrays of artists, multiple image URLs, and market availability data across dozens of fields. Decomposing all of that into columns is over-engineering for most projects.

The pragmatic approach: store the fields you query most often as proper columns, and preserve the full response as a TEXT column using json.dumps():

Python
import json, sqlite3

# Schema: normalize the columns you filter/sort on,
# store the full response for everything else
conn.execute("""
    CREATE TABLE IF NOT EXISTS tracks (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        track_id    TEXT UNIQUE,
        artist_name TEXT,         -- queryable column
        track_title TEXT,         -- queryable column
        played_at   DATETIME,     -- queryable column
        raw_json    TEXT          -- full Spotify API response
    )
""")

# Store: serialize dict → string
track = response.json()
conn.execute("""
    INSERT INTO tracks (track_id, artist_name, track_title, played_at, raw_json)
    VALUES (?, ?, ?, ?, ?)
""", (
    track['id'],
    track['artists'][0]['name'],
    track['name'],
    played_at,
    json.dumps(track)          # full response preserved
))

# Retrieve: deserialize string → dict when you need nested fields
cursor = conn.execute("SELECT raw_json FROM tracks WHERE artist_name = ?", ("Radiohead",))
for row in cursor:
    full_track = json.loads(row[0])
    print(full_track['album']['images'][0]['url'])  # access any nested field

This pattern keeps your schema simple while ensuring you never lose data. If you later need a field you didn't normalize (say, track['popularity']), it's already in raw_json. No need to re-fetch from the API.

4. DATETIME for Timestamps

Use DATETIME (or TEXT with ISO format) for timestamps. SQLite stores dates as text in ISO 8601 format (2024-03-15 14:30:22), which sorts correctly and works with date functions.

SQL
-- Good: DATETIME with DEFAULT for automatic timestamps
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
played_at DATETIME,
last_updated DATETIME

-- Works with date functions
WHERE played_at > datetime('now', '-7 days')
WHERE date(created_at) = date('now')
The Common Mistake: TEXT for Everything

Beginning developers often use TEXT for all columns because it's simple and works initially. But this breaks sorting, filtering, and math:

play_count as TEXT: "10" sorts before "2" (alphabetically), and you can't do WHERE play_count > 5

temperature as TEXT: You can't calculate averages, ranges, or trends

timestamp as TEXT (non-ISO format): "03/15/2024" doesn't sort correctly or work with date functions

Choose data types based on operations, not convenience. Five minutes of planning saves hours of refactoring later.
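You can watch the sorting problem happen with a throwaway in-memory database: the same three values, one column typed TEXT and one typed INTEGER:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays_text (play_count TEXT)")
conn.execute("CREATE TABLE plays_int  (play_count INTEGER)")
for value in ("2", "10", "100"):
    conn.execute("INSERT INTO plays_text VALUES (?)", (value,))
    conn.execute("INSERT INTO plays_int  VALUES (?)", (int(value),))

text_order = [r[0] for r in conn.execute("SELECT play_count FROM plays_text ORDER BY play_count")]
int_order  = [r[0] for r in conn.execute("SELECT play_count FROM plays_int  ORDER BY play_count")]
print(text_order)   # ['10', '100', '2'] -- alphabetical, not numeric
print(int_order)    # [2, 10, 100]
```

The TEXT column sorts character by character ('1' comes before '2'), so 100 plays ranks below 2 plays. The INTEGER column sorts the way you'd expect.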

Indexing Strategies: Speed vs Storage

Indexes make queries fast but slow down writes and consume disk space. The art is knowing which columns to index and which to leave alone.

1. Index Columns Used in WHERE Clauses

If you frequently filter by a column, index it. Queries with WHERE location = 'Dublin' benefit from an index on location. Without an index, SQLite scans every row.

SQL
-- Frequent query: find tracks by artist
SELECT * FROM tracks WHERE artist_name = 'Radiohead'

-- Create index to speed it up
CREATE INDEX idx_artist_name ON tracks(artist_name)
2. Index Columns Used in ORDER BY

If you sort by a column frequently, index it. ORDER BY played_at DESC runs faster with an index on played_at. Include DESC in the index definition if you always sort descending.

SQL
-- Frequent query: get recent plays
SELECT * FROM listening_history ORDER BY played_at DESC LIMIT 20

-- Index optimized for descending order
CREATE INDEX idx_played_at ON listening_history(played_at DESC)
3. Composite Indexes for Multiple Columns

When queries filter by multiple columns together, create a composite index. The column order matters. Put the most selective column first (the one that narrows results the most).

SQL
-- Frequent query: get Dublin's recent weather
SELECT * FROM weather_history 
WHERE location = 'Dublin' 
ORDER BY timestamp DESC 
LIMIT 1

-- Composite index covering both columns
CREATE INDEX idx_location_timestamp 
ON weather_history(location, timestamp DESC)

-- This index helps queries filtering by location alone
-- But doesn't help queries filtering by timestamp alone
How Composite Indexes Work

A composite index on (location, timestamp) is like a phone book sorted by last name, then first name. You can find "Smith, John" quickly. You can find all Smiths quickly. But you can't find all Johns quickly. You'd need to scan every last name.

Similarly, an index on (location, timestamp) helps queries filtering by location, or location + timestamp. But it doesn't help queries filtering only by timestamp. Column order matters.
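You can verify the phone-book behavior from Python with EXPLAIN QUERY PLAN. The table and index mirror the weather cache; the `plan` helper is an illustrative sketch, not part of the chapter's project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_history (location TEXT, timestamp DATETIME, temperature REAL)")
conn.execute("CREATE INDEX idx_loc_ts ON weather_history(location, timestamp DESC)")

def plan(sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Filtering on the leading column: the composite index is used (SEARCH ... USING INDEX)
print(plan("SELECT * FROM weather_history WHERE location = 'Dublin'"))

# Filtering on the trailing column alone: full table scan (SCAN weather_history)
print(plan("SELECT * FROM weather_history WHERE timestamp > '2024-03-01'"))
```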

4. Don't Index Everything

Indexes aren't free. Each index makes INSERT, UPDATE, and DELETE slower because SQLite must update the index too. Only index columns you query frequently.

When to Add Indexes

Always index: Foreign keys, columns used in WHERE clauses frequently, columns used in ORDER BY

Consider indexing: Columns used in JOIN conditions, columns used in GROUP BY

Don't index: Columns you never filter or sort by, small tables (< 1000 rows), columns with very few unique values (boolean flags)

For the weather cache: index location and timestamp. For the music project: index artist_name, track_id, and played_at. These are the columns you'll query most often.

Advanced: Profiling Query Performance with EXPLAIN QUERY PLAN

When a query feels slow, use EXPLAIN QUERY PLAN to see how SQLite executes it. This shows whether indexes are being used and where bottlenecks occur.

SQL
-- Check how SQLite executes a query
EXPLAIN QUERY PLAN
SELECT * FROM weather_history 
WHERE location = 'Dublin' 
ORDER BY timestamp DESC 
LIMIT 10
Terminal Output - Without Index
QUERY PLAN
`--SCAN weather_history
   `--USE TEMP B-TREE FOR ORDER BY

SCAN weather_history means SQLite is reading every row in the table (a full table scan). USE TEMP B-TREE FOR ORDER BY means it's creating a temporary structure to sort results. Both operations are slow with large tables.

Now add an index and check again:

SQL
-- Create the index
CREATE INDEX idx_location_timestamp 
ON weather_history(location, timestamp DESC);

-- Check the query plan again
EXPLAIN QUERY PLAN
SELECT * FROM weather_history 
WHERE location = 'Dublin' 
ORDER BY timestamp DESC 
LIMIT 10
Terminal Output - With Index
QUERY PLAN
`--SEARCH weather_history USING INDEX idx_location_timestamp (location=?)

SEARCH... USING INDEX idx_location_timestamp means SQLite is using the index to find matching rows directly. No table scan, no temporary sorting structure. This query now executes in milliseconds instead of seconds.

Reading Query Plans

SCAN table: Full table scan, reading every row. Slow with large tables.

SEARCH table USING INDEX: Using an index to find rows. Fast.

USE TEMP B-TREE FOR ORDER BY: Creating a temporary structure to sort. Can be slow. Add an index on the ORDER BY column.

COVERING INDEX: The index contains all columns needed for the query, so SQLite doesn't need to read the table at all. Fastest possible query.

Use EXPLAIN QUERY PLAN when a query takes more than a few hundred milliseconds. If you see SCAN on a large table or TEMP B-TREE for sorting, add an index on the appropriate columns. Profiling removes guesswork from optimization.

Normalization: When to Split Tables

Normalization is the process of splitting data across multiple tables to eliminate redundancy. Denormalization is deliberately keeping redundant data in a single table for simplicity or performance. Both have their place.

Here's the weather cache in a single table (denormalized):

SQL - Denormalized (Single Table)
CREATE TABLE weather_history (
    id INTEGER PRIMARY KEY,
    location TEXT,           -- "Dublin" repeated 1000 times
    country TEXT,            -- "Ireland" repeated 1000 times
    latitude REAL,           -- 53.3498 repeated 1000 times
    longitude REAL,          -- -6.2603 repeated 1000 times
    timezone TEXT,           -- "Europe/Dublin" repeated 1000 times
    temperature REAL,
    conditions TEXT,
    timestamp DATETIME
)

Every Dublin weather reading stores "Dublin", "Ireland", 53.3498, -6.2603, "Europe/Dublin". This data repeats 1000 times. It wastes space and makes updates difficult (if Dublin's timezone changes, you must update 1000 rows).

Here's the normalized version with two tables:

SQL - Normalized (Two Tables)
-- Locations table (3 rows for Dublin, London, Paris)
CREATE TABLE locations (
    id INTEGER PRIMARY KEY,
    city TEXT NOT NULL,
    country TEXT NOT NULL,
    latitude REAL,
    longitude REAL,
    timezone TEXT
)

-- Weather readings reference locations by ID
CREATE TABLE weather_readings (
    id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL,     -- Just stores 1, 2, or 3
    temperature REAL,
    conditions TEXT,
    timestamp DATETIME,
    FOREIGN KEY (location_id) REFERENCES locations(id)
)
Figure 15.4: An Entity Relationship Diagram (ERD). One location links to many weather readings. The crow's foot symbol on the right means "one or many."

Now location data appears once. Dublin's information is stored in one row in the locations table, and weather readings reference it by ID. To get complete data, use a JOIN:

SQL
SELECT locations.city, locations.country,
       weather_readings.temperature, weather_readings.conditions
FROM weather_readings
JOIN locations ON weather_readings.location_id = locations.id
WHERE locations.city = 'Dublin'
ORDER BY weather_readings.timestamp DESC
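Writing rows into the normalized schema needs one extra step: translating a city name into a location_id. A common pattern is a get-or-create helper; the function below is an illustrative sketch, not code from this chapter:

```python
import sqlite3

def get_location_id(conn, city, country):
    """Look up a city's id, inserting the locations row the first time it appears."""
    row = conn.execute("SELECT id FROM locations WHERE city = ?", (city,)).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO locations (city, country) VALUES (?, ?)", (city, country)
    ).lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FOREIGN KEY only when this is on
conn.execute("CREATE TABLE locations (id INTEGER PRIMARY KEY, city TEXT NOT NULL, country TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE weather_readings (
        id INTEGER PRIMARY KEY,
        location_id INTEGER NOT NULL,
        temperature REAL,
        FOREIGN KEY (location_id) REFERENCES locations(id)
    )
""")

for temp in (12.5, 13.1):
    loc_id = get_location_id(conn, "Dublin", "Ireland")
    conn.execute("INSERT INTO weather_readings (location_id, temperature) VALUES (?, ?)", (loc_id, temp))

print(conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0])          # 1 location row
print(conn.execute("SELECT COUNT(*) FROM weather_readings").fetchone()[0])   # 2 readings
```

Both readings share one locations row, which is exactly the deduplication the normalized design promises.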
Trade-offs: Normalized vs Denormalized

Normalized (multiple tables):

  • Less storage (location data stored once)
  • Easier updates (change Dublin's timezone in one place)
  • Requires JOINs (slightly more complex queries)
  • Better for data that changes or has many attributes

Denormalized (single table):

  • More storage (repeated data)
  • Harder updates (must update all rows with that location)
  • Simpler queries (no JOINs needed)
  • Better for read-heavy workloads or simple schemas
When to Normalize

Normalize when: The same data repeats many times (artist names, album titles, location details), you need to update that data independently, you're tracking relationships (tracks belong to albums, albums belong to artists)

Keep denormalized when: The data is small and simple (weather cache with just city names), you never update the repeated data, JOIN queries feel too complex for your use case, performance measurements show denormalized is faster

For the weather cache: denormalized works fine. For the music project: normalize artists and albums into separate tables. You'll have thousands of tracks by the same artist, and you want to track artist-level statistics.
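
Here is one way that normalized layout could look for the music data, sketched with executescript() so the whole schema runs as one batch. The table and column names (artists, albums, tracks) are illustrative, not the exact schema Chapter 16 will use:

Python
```python
import sqlite3

# One possible normalized layout for the music data.
# Table and column names here are illustrative.
MUSIC_SCHEMA = """
CREATE TABLE artists (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE           -- each artist stored exactly once
);
CREATE TABLE albums (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist_id INTEGER NOT NULL,         -- albums belong to artists
    FOREIGN KEY (artist_id) REFERENCES artists(id)
);
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    album_id INTEGER NOT NULL,          -- tracks belong to albums
    FOREIGN KEY (album_id) REFERENCES albums(id)
);
"""

def create_music_schema(conn):
    """Create the illustrative artists/albums/tracks tables in one batch."""
    conn.executescript(MUSIC_SCHEMA)
```

With this layout, renaming an artist touches one row, and artist-level statistics become a GROUP BY over JOINs instead of string matching on names repeated across thousands of track rows.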

Schema Migrations: Evolving Your Database

Requirements change. You realize you need a new column, or you need to split a table, or you need to change a data type. Schema migrations let you evolve your database without losing data or starting over.

SQLite makes simple migrations easy:

1.

Adding a Column

The simplest migration. Use ALTER TABLE ADD COLUMN to add new fields without affecting existing data.

Python
def migrate_add_feels_like_column():
    """Add 'feels_like' temperature column to weather_history."""
    with sqlite3.connect('weather_cache.db') as conn:
        # Check if column already exists
        cursor = conn.execute("PRAGMA table_info(weather_history)")
        columns = [row[1] for row in cursor.fetchall()]
        
        if 'feels_like' not in columns:
            conn.execute("""
                ALTER TABLE weather_history 
                ADD COLUMN feels_like REAL
            """)
            print("Added 'feels_like' column")
        else:
            print("'feels_like' column already exists")

migrate_add_feels_like_column()

Existing rows get NULL for the new column. Future INSERTs can provide a value. This migration is safe and reversible (you can ignore the column if you change your mind).

2.

Renaming a Column

SQLite 3.25+ supports ALTER TABLE RENAME COLUMN. For older versions, you need to recreate the table.

Python
def migrate_rename_conditions_to_description():
    """Rename 'conditions' column to 'description'."""
    with sqlite3.connect('weather_cache.db') as conn:
        try:
            # Modern SQLite (3.25+)
            conn.execute("""
                ALTER TABLE weather_history 
                RENAME COLUMN conditions TO description
            """)
            print("Renamed 'conditions' to 'description'")
        except sqlite3.OperationalError:
            print("Column rename not supported - update SQLite or use table recreation")
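
If you'd rather decide up front than catch the exception, you can check the library version directly. sqlite3.sqlite_version_info reports the version of the SQLite library Python is linked against (not the Python version); the helper name below is illustrative:

Python
```python
import sqlite3

def rename_strategy():
    """Pick a rename approach based on the linked SQLite library version."""
    if sqlite3.sqlite_version_info >= (3, 25, 0):
        return "rename-column"     # ALTER TABLE ... RENAME COLUMN is available
    return "recreate-table"        # fall back to the recreation pattern below
```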
3.

Complex Changes: The Recreation Pattern

For changes SQLite doesn't support directly (changing data types, adding NOT NULL to existing columns, removing columns), recreate the table with the new schema and copy data over.

Python
def migrate_change_temperature_precision():
    """
    Change temperature from REAL to INTEGER tenths of a degree (celsius * 10).
    This avoids floating-point rounding and saves space.
    """
    with sqlite3.connect('weather_cache.db') as conn:
        # Step 1: Create new table with updated schema
        conn.execute("""
            CREATE TABLE weather_history_new (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                location TEXT NOT NULL,
                temperature_x10 INTEGER NOT NULL,  -- Store celsius * 10
                conditions TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        """)
        
        # Step 2: Copy data with transformation
        conn.execute("""
            INSERT INTO weather_history_new (id, location, temperature_x10, conditions, timestamp)
            SELECT id, location, CAST(temperature * 10 AS INTEGER), conditions, timestamp
            FROM weather_history
        """)
        
        # Step 3: Drop old table
        conn.execute("DROP TABLE weather_history")
        
        # Step 4: Rename new table to original name
        conn.execute("ALTER TABLE weather_history_new RENAME TO weather_history")
        
        # Step 5: Recreate indexes
        conn.execute("""
            CREATE INDEX idx_location_timestamp 
            ON weather_history(location, timestamp DESC)
        """)
        
        print("Migration complete: temperature now stored as INTEGER * 10")
The Table Recreation Pattern

This five-step pattern handles complex migrations safely:

1. Create new table with desired schema
2. Copy data from old table (with transformations if needed)
3. Drop old table
4. Rename new table to original name
5. Recreate indexes

The transaction ensures all-or-nothing: either the entire migration succeeds, or it rolls back and your original table remains unchanged. This pattern handles schema changes SQLite doesn't support directly.
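
If you script migrations yourself, note that Python's sqlite3 opens its implicit transaction only before data-changing statements. One option, sketched below, is to take manual control: isolation_level=None disables the module's implicit handling, so the explicit BEGIN/COMMIT/ROLLBACK are the only transaction control in play (run_migration is an illustrative helper; SQLite's DDL is transactional, so even DROP TABLE rolls back):

Python
```python
import sqlite3

def run_migration(db_path, statements):
    """Run migration statements as one explicit all-or-nothing transaction."""
    conn = sqlite3.connect(db_path, isolation_level=None)  # manual control
    try:
        conn.execute("BEGIN")
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")   # undo every step, including DDL
        raise
    finally:
        conn.close()
```

If any statement fails, the ROLLBACK leaves the database exactly as it was before the migration started.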

Migration Best Practices

Version your migrations: Name them migration_001_add_feels_like.py, migration_002_rename_conditions.py so you know what order to run them

Test on a copy first: Copy your database file, run the migration on the copy, verify the results before touching production data

Keep migrations small: One logical change per migration. Easier to debug and safer to run

Back up before big changes: Complex migrations that drop tables are risky. Back up your .db file first

For the music project, you'll add columns as you discover new Spotify API fields worth tracking. These simple migrations (ADD COLUMN) are low-risk and don't require table recreation.
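
To keep migrations running in order across machines, one lightweight option is SQLite's built-in PRAGMA user_version, an integer stored in the database file header. The helper below is an illustrative sketch, not a full migration framework:

Python
```python
import sqlite3

def apply_pending(conn, migrations):
    """Apply, in order, any migrations this database hasn't seen yet.

    `migrations` is an ordered list of SQL strings. PRAGMA user_version
    records how many have already run, so re-running the script is safe.
    """
    applied = conn.execute("PRAGMA user_version").fetchone()[0]
    for i, sql in enumerate(migrations):
        if i < applied:
            continue                       # ran on a previous invocation
        conn.execute(sql)
        # PRAGMA arguments can't be parameterized; i + 1 is a trusted int
        conn.execute(f"PRAGMA user_version = {i + 1}")
    conn.commit()
    return len(migrations) - applied       # number applied this run
```

New migrations get appended to the list; databases that already ran the earlier ones pick up only the new entries.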

Production Considerations

Finally, some guidelines for databases that need to run reliably over time:

1.

Plan for Growth

Design schemas assuming data will grow. Will your queries still be fast with 100,000 rows? Are your indexes covering the queries you'll run most often? Can you delete old data without corrupting relationships?

2.

Use Constraints

NOT NULL, UNIQUE, and FOREIGN KEY constraints prevent bad data from entering your database. They're documentation that enforces itself. Better to get an IntegrityError than to silently store invalid data.

SQL
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY,
    spotify_id TEXT NOT NULL UNIQUE,      -- Must exist, must be unique
    title TEXT NOT NULL,                  -- Must exist
    artist_id INTEGER NOT NULL,           -- Must exist
    duration_ms INTEGER CHECK(duration_ms > 0),  -- Must be positive
    FOREIGN KEY (artist_id) REFERENCES artists(id)  -- Must reference real artist
)
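
From Python, each of those violations surfaces as sqlite3.IntegrityError. A sketch against the tracks schema above (insert_track is an illustrative helper; remember that foreign key enforcement must be switched on with PRAGMA foreign_keys = ON):

Python
```python
import sqlite3

def insert_track(conn, spotify_id, title, artist_id, duration_ms):
    """Try to insert a track; report whether the constraints accepted it."""
    try:
        with conn:   # commit on success, roll back on error
            conn.execute(
                "INSERT INTO tracks (spotify_id, title, artist_id, duration_ms) "
                "VALUES (?, ?, ?, ?)",
                (spotify_id, title, artist_id, duration_ms),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by UNIQUE, NOT NULL, CHECK, and FOREIGN KEY violations
        return False
```

A duplicate spotify_id, a non-positive duration, or an artist_id with no matching artist all come back as False instead of silently corrupting the table.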
3.

Document Your Schema

Future you (and anyone else working on the code) needs to understand your design decisions. Add comments explaining non-obvious choices, relationships between tables, and why you normalized or denormalized certain data.

SQL
-- Listening history tracks when user played each song
-- One row per play event (not aggregated - we calculate stats in queries)
-- Kept denormalized (track title, artist name stored here) for simpler queries
-- and because track metadata rarely changes after initial insert
CREATE TABLE listening_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    track_id TEXT NOT NULL,           -- Spotify track ID
    track_name TEXT NOT NULL,         -- Denormalized for query simplicity
    artist_name TEXT NOT NULL,        -- Denormalized for query simplicity
    played_at DATETIME NOT NULL,      -- When user played this track
    duration_ms INTEGER               -- Track length (for skip detection)
)
4.

Test Your Queries

Before relying on complex queries in production, test them with realistic data volumes. A query that works instantly with 100 rows might take minutes with 100,000 rows if it's missing an index.
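
A quick way to rehearse that scenario is to generate synthetic rows and compare the same query before and after adding an index. The sketch below is illustrative; absolute timings vary by machine, so the more reliable check is EXPLAIN QUERY PLAN, which reports whether SQLite actually uses the index:

Python
```python
import sqlite3
import time

def timed(conn, sql, params=()):
    """Return how long one query takes, in seconds."""
    start = time.perf_counter()
    conn.execute(sql, params).fetchall()
    return time.perf_counter() - start

# Fill a throwaway table with 100,000 synthetic rows
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, artist_name TEXT)")
conn.executemany(
    "INSERT INTO tracks (artist_name) VALUES (?)",
    ((f"artist_{i % 500}",) for i in range(100_000)),
)

query = "SELECT COUNT(*) FROM tracks WHERE artist_name = ?"
before = timed(conn, query, ("artist_42",))          # full table scan
conn.execute("CREATE INDEX idx_artist_name ON tracks(artist_name)")
after = timed(conn, query, ("artist_42",))           # index lookup

# EXPLAIN QUERY PLAN shows whether SQLite uses the index for this query
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("artist_42",)).fetchall()
print(f"scan: {before:.4f}s  indexed: {after:.4f}s")
```

If the plan output still says SCAN instead of mentioning your index, the index isn't helping that query and you've learned it before production did.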

Common Performance Pitfalls

Even with good schema design and proper indexes, certain query patterns kill performance. These anti-patterns are easy to write accidentally but devastating at scale. Recognizing them early saves hours of debugging when your application slows down.

1.

The N+1 Query Problem

This happens when you query the database inside a loop. You make one query to get a list of items, then make N additional queries (one per item) to get related data. With 100 items, that's 101 database queries when one would suffice.

Anti-Pattern: N+1 Queries
Python - SLOW (Don't Do This)
import sqlite3

with sqlite3.connect('music.db') as conn:
    # Query 1: Get all artists
    cursor = conn.execute("SELECT id, name FROM artists")
    artists = cursor.fetchall()
    
    # Queries 2-101: Get track count for each artist (100 separate queries!)
    for artist_id, artist_name in artists:
        cursor = conn.execute("""
            SELECT COUNT(*) FROM tracks WHERE artist_id = ?
        """, (artist_id,))
        track_count = cursor.fetchone()[0]
        print(f"{artist_name}: {track_count} tracks")

# Total: 101 database queries for 100 artists
# With 1000 artists? 1001 queries. This does not scale.
Solution: Single Query with JOIN
Python - FAST (Do This Instead)
import sqlite3

with sqlite3.connect('music.db') as conn:
    # Single query gets everything at once
    cursor = conn.execute("""
        SELECT artists.name, COUNT(tracks.id) as track_count
        FROM artists
        LEFT JOIN tracks ON artists.id = tracks.artist_id
        GROUP BY artists.id, artists.name
        ORDER BY track_count DESC
    """)
    
    for artist_name, track_count in cursor:
        print(f"{artist_name}: {track_count} tracks")

# Total: 1 database query for any number of artists
# 10x-100x faster than the N+1 approach

The optimized version uses a JOIN with GROUP BY to fetch all data in one query. Instead of 101 round trips to the database, you make one efficient query that returns everything you need. This pattern becomes critical when working with hundreds or thousands of records.

Figure: database queries as the artist count grows. The N+1 loop approach issues 11 queries for 10 artists, 51 for 50 artists, and 101 for 100 artists; the JOIN approach always issues exactly one query, regardless of how many artists exist.
How to Spot N+1 Queries

If you see conn.execute() or cursor.execute() inside a for loop that iterates over database results, you likely have an N+1 problem.

Red flag pattern:

for item in query_results:
    cursor.execute("SELECT ... WHERE id = ?", (item.id,))

Fix: Rewrite as a single query using JOIN, IN clause, or subquery to fetch all related data at once.
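
The IN-clause version of that fix builds one placeholder per ID while keeping the values parameterized, collapsing N lookups into one query (track_counts is an illustrative helper):

Python
```python
import sqlite3

def track_counts(conn, artist_ids):
    """Fetch track counts for many artists with one IN query, not a loop."""
    if not artist_ids:
        return {}
    # Generate one '?' per ID; the values themselves stay parameterized,
    # so this is still safe from SQL injection.
    placeholders = ", ".join("?" * len(artist_ids))
    cursor = conn.execute(
        f"SELECT artist_id, COUNT(*) FROM tracks "
        f"WHERE artist_id IN ({placeholders}) GROUP BY artist_id",
        list(artist_ids),
    )
    return dict(cursor.fetchall())
```

Note that only the placeholders are built with an f-string; every actual value still travels through the parameter tuple.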

2.

Transactions Inside Loops

Opening and closing database connections (or committing transactions) inside a loop kills performance. Each commit forces SQLite to flush data to disk. When you commit 1000 times in a loop, you're doing 1000 disk writes instead of one.

Anti-Pattern: Transaction Per Insert
Python - SLOW (Don't Do This)
import sqlite3

tracks = [
    ("Song 1", "Artist 1", 180000),
    ("Song 2", "Artist 2", 210000),
    # ... 1000 more tracks
]

# Opening connection inside loop - creates 1000 connections!
for track_name, artist_name, duration in tracks:
    with sqlite3.connect('music.db') as conn:
        conn.execute("""
            INSERT INTO tracks (track_name, artist_name, duration_ms)
            VALUES (?, ?, ?)
        """, (track_name, artist_name, duration))
        # Commits after EVERY insert (1000 disk writes)

# This takes seconds instead of milliseconds
Solution: Batch in Single Transaction
Python - FAST (Do This Instead)
import sqlite3

tracks = [
    ("Song 1", "Artist 1", 180000),
    ("Song 2", "Artist 2", 210000),
    # ... 1000 more tracks
]

# Single connection, single transaction for all inserts
with sqlite3.connect('music.db') as conn:
    for track_name, artist_name, duration in tracks:
        conn.execute("""
            INSERT INTO tracks (track_name, artist_name, duration_ms)
            VALUES (?, ?, ?)
        """, (track_name, artist_name, duration))
    
    # Commits once after all inserts complete (1 disk write)

# 50x-100x faster than committing per insert

By wrapping all inserts in a single transaction, you reduce 1000 disk writes to one. The context manager (with sqlite3.connect()) commits automatically when the block exits. All inserts succeed or fail together, which also gives you better error handling.

Even Better: Use executemany()

For bulk inserts, executemany() is even faster:

with sqlite3.connect('music.db') as conn:
    conn.executemany("""
        INSERT INTO tracks (track_name, artist_name, duration_ms)
        VALUES (?, ?, ?)
    """, tracks)

This sends all rows to SQLite in one batch, letting the database optimize the operation internally. Use this for inserting dozens or hundreds of rows at once.

Performance Anti-Pattern Checklist

🚫 N+1 Queries: conn.execute() inside a loop over query results → Use JOIN or IN clause instead

🚫 Transaction Per Operation: Opening connections inside loops → Open connection once, batch operations

🚫 Missing Indexes: Slow queries with WHERE or ORDER BY on unindexed columns → Add indexes

🚫 SELECT *: Fetching all columns when you only need a few → Select specific columns

These patterns are easy to write but expensive at scale. Spotting them during code review saves debugging time later when your application slows down with real data volumes.

For the Music Time Machine project, you'll insert hundreds of listening history records at once (when syncing Spotify data). Use executemany() to batch those inserts in a single transaction. When displaying artist statistics, use GROUP BY queries instead of looping over artists. These patterns keep your application fast even with thousands of tracks.

You now understand the principles that guide schema design: choosing appropriate data types, indexing strategically, deciding when to normalize, and evolving schemas safely through migrations. These aren't rigid rules. They're trade-offs you evaluate based on your specific use case.

The weather cache taught you the mechanics of SQLite and Python integration. This section taught you the judgment to design schemas that work well at scale. Both skills combine in Chapter 16's music project, where you'll design a more complex schema with multiple related tables and sophisticated queries.

Next, we'll review everything you've learned in this chapter and prepare you for the portfolio project ahead.

6. Chapter Summary

You've transformed your API scripts into applications with memory. The weather cache you built demonstrates everything databases enable: storing API responses, checking data freshness, avoiding unnecessary network calls, and analyzing trends over time. These patterns form the foundation of every data-driven application.

More importantly, you learned when and why to use databases. Not every script needs SQLite. But when you need persistence, historical tracking, or complex queries across accumulated data, databases are the right tool. You now know how to choose that tool, design schemas that work at scale, and write Python code that interacts with databases safely and professionally.

Key Skills Mastered

1.

Understanding When Databases Matter

You can distinguish between scripts (ephemeral, forget everything) and applications (persistent, remember state). You know when SQLite is the right choice (single-user apps, prototypes, portfolio projects) versus when you need PostgreSQL (multi-user services, production scale). You understand that databases aren't overhead. They're the difference between a demo and a tool people actually use.

2.

Writing Essential SQL Operations

You can write CREATE TABLE statements with appropriate data types and constraints. You can INSERT data with parameterized queries that prevent SQL injection. You can SELECT data with WHERE clauses for filtering, ORDER BY for sorting, LIMIT for pagination, and aggregation functions (AVG, COUNT, MIN, MAX) with GROUP BY for analytics. You understand indexes and when they speed up queries versus when they're unnecessary overhead.

3.

Using Python's sqlite3 Module Professionally

You use context managers (with sqlite3.connect()) so changes commit automatically on success and roll back on exceptions. You understand the difference between fetchone(), fetchall(), and iteration for retrieving results. You know when to commit transactions and when to rollback. You catch specific exceptions (IntegrityError, OperationalError) and handle them appropriately. You validate data before insertion rather than letting the database reject it.

4.

Building Cache Logic That Actually Works

You can implement cache freshness checks using SQLite's date functions (datetime('now', '-1 hour')). You understand the cache-or-fetch pattern: check database first, return cached data if fresh, fetch from API if stale or missing, then save the response for next time. You know how to combine WHERE clauses with ORDER BY and LIMIT to get the most recent cached entry efficiently.

5.

Designing Schemas That Scale

You choose data types based on operations (INTEGER for math, REAL for decimals, TEXT for strings, DATETIME for timestamps). You create indexes for columns used in WHERE clauses and ORDER BY statements. You understand the normalization trade-off: split tables to eliminate redundancy versus keep denormalized for simpler queries. You can write migrations to add columns, rename fields, or recreate tables with updated schemas without losing data.

6.

Querying Historical Data for Analytics

You can write GROUP BY queries to calculate daily averages, trends over time, and statistics by category. You use SQLite's date functions to group readings by day, week, or month. You understand how to combine WHERE filters with GROUP BY aggregations to answer questions like "What was Dublin's average temperature each day last week?" These patterns transform accumulated API data into insights.

7.

Writing Production-Ready Database Code

You always use parameterized queries to prevent SQL injection. You always use context managers to prevent resource leaks. You validate data before insertion. You catch specific exceptions and provide useful error messages. You document your schema decisions. You test queries with realistic data volumes before relying on them in production. These aren't optional polish. They're the difference between code that works in demos and code that runs reliably for months.

Chapter Review Quiz

Test your understanding of database fundamentals with these questions. Each question reinforces a key concept from the chapter.

Select question to reveal the answer:
Why would you use a database for an API-powered application instead of just storing data in Python variables or files?

Databases provide persistence, efficient querying, and built-in aggregation: Variables disappear when scripts end. Files persist but require manual parsing. Databases store data in queryable structures with indexes for instant lookups, aggregation functions for analytics (AVG, COUNT, GROUP BY), and persistence across runs. The weather cache demonstrates this: without SQLite, you'd either waste API calls re-fetching the same data or implement complex file handling with no query capabilities. Databases handle storage, retrieval, expiration checks, and analytics in SQL instead of requiring you to write all that logic in Python.

What's wrong with this query: conn.execute(f"SELECT * FROM weather WHERE location = '{city}'")?

SQL injection vulnerability from f-string interpolation: If city contains malicious input like "Dublin'; DROP TABLE weather; --", the query text becomes SELECT * FROM weather WHERE location = 'Dublin'; DROP TABLE weather; --'. Against drivers that allow multiple statements, that executes the SELECT and then DROP TABLE (deleting your entire table), with -- commenting out the rest. Python's sqlite3.execute() happens to reject multi-statement strings, but injection can still rewrite your WHERE logic or leak data, so the rule is unconditional. Always use parameterized queries: conn.execute("SELECT * FROM weather WHERE location = ?", (city,)). The ? placeholder treats input as data, not executable code. This isn't optional. It's a fundamental security requirement.

You run a query that filters by artist_name thousands of times, and it's slow. You have 50,000 rows. What's the likely problem and how do you fix it?

Missing index causes full table scans: Without an index on artist_name, SQLite checks every row to find matches: 50,000 comparisons per query. Create an index: CREATE INDEX idx_artist_name ON tracks(artist_name). Now SQLite can jump directly to matching rows, like using a book's index instead of reading every page. The query goes from seconds to milliseconds. General rule: index columns used in WHERE clauses, ORDER BY statements, and JOIN conditions if the table has more than a few hundred rows. Don't index every column because indexes speed up reads but slow down writes.

What happens if you forget to call conn.commit() after an INSERT statement?

Uncommitted transactions are lost when connections close: The INSERT exists only in memory as a pending transaction. When the connection closes, SQLite automatically rolls back uncommitted changes. Use context managers to commit automatically: with sqlite3.connect('data.db') as conn: commits on success, rolls back on exceptions. Without context managers, you must call conn.commit() explicitly or your data disappears. SELECT queries don't need commits (they don't modify data), but INSERT, UPDATE, and DELETE require commits to persist changes.

You're caching weather data and want to return cached results if they're less than 1 hour old. How do you check cache freshness in SQL?

Use SQLite's datetime functions with relative time modifiers: SELECT * FROM weather_history WHERE location = ? AND fetched_at > datetime('now', '-1 hour') ORDER BY fetched_at DESC LIMIT 1. The function datetime('now', '-1 hour') calculates "one hour ago from right now." The WHERE clause filters to rows fetched after that time (within the last hour). ORDER BY fetched_at DESC LIMIT 1 gets the most recent entry. If this returns a row, use cached data. If it returns None, the cache is stale: fetch fresh data from the API. This pattern is why timestamps are critical in cache tables.

When should you normalize your database (split data across multiple tables) versus keeping everything in one table?

Normalize when data repeats and updates independently, denormalize for simplicity: Normalize when the same data repeats many times (artist names in thousands of track rows), you need to update it independently (changing an artist's country shouldn't require updating 1000 rows), or you're tracking relationships (tracks belong to albums, albums belong to artists). Keep denormalized when data is simple (just city names), you never update it (weather readings are historical), JOIN queries feel too complex, or the table is small (under 10,000 rows). The weather cache works fine denormalized. The music project should normalize artists and albums to avoid redundant data and enable artist-level statistics.

You need to add a new column to an existing table with 10,000 rows of production data. What's the safest way to do this?

Use ALTER TABLE ADD COLUMN with proper precautions: ALTER TABLE weather_history ADD COLUMN feels_like REAL is safe because existing rows get NULL (or the DEFAULT value if specified), the operation is fast (updates metadata only, doesn't rewrite data), and it's reversible. Before running in production: back up the database file, test on a copy first, check if the column already exists (prevents errors if migration runs twice), and update your code to handle both NULL and populated values. For complex changes (changing data types, adding NOT NULL constraints, removing columns), use the table recreation pattern from Section 5.

What's the difference between fetchone(), fetchall(), and iterating over the cursor? When would you use each?

Choose based on result size and processing needs: fetchone() returns the next row or None. Use it for single results (LIMIT 1 queries, existence checks, user lookups). fetchall() returns all rows as a list. Use it for small result sets you need all at once (analytics with limited results, top 10 lists, comparing cities). Iteration (for row in cursor:) processes rows one at a time without loading everything into memory. Use it for large result sets (10,000+ rows), streaming to files/APIs, or when memory efficiency matters. General rule: fetchone() for singles, fetchall() for small batches, iteration for large volumes.

Looking Forward

You've mastered two of the three pillars for the music project: OAuth authentication (Chapter 14) and database persistence (this chapter). The third pillar, building a web interface with Flask, comes in Chapter 17. But first, you need to combine OAuth and databases in a real project.

Chapter 16: Building the Spotify Music Time Machine brings everything together. You'll authenticate with Spotify using OAuth, fetch your listening history from their API, store it in SQLite using the patterns from this chapter, and build features that require historical data: "This Week Last Year" playlists, musical evolution tracking, hidden gems discovery, and listening statistics.

The database skills you practiced here transfer directly. Instead of caching weather by location and timestamp, you'll cache tracks by artist and play date. Instead of temperature trends, you'll analyze genre evolution. Instead of comparing cities, you'll compare time periods. The SQL patterns are identical. Only the domain changes.

This is where your portfolio project begins. Chapter 16 gives you a functional music analytics tool that tracks data over time. Chapter 17 adds a web dashboard with interactive charts. Chapter 18 deploys it to production with a live URL. By the end of Part III, you'll have a complete application worth showing to recruiters.

The weather cache was practice. The music project is the real thing. Everything you learned here (schema design, cache freshness, historical queries, migrations) powers that project. You're ready.

When to Graduate from SQLite

You've learned how to use SQLite effectively for the projects in this book: weather caches, music analytics, personal dashboards. But part of professional database knowledge is understanding the tool's boundaries. SQLite is production-grade software that powers billions of devices, but it's not the right choice for every project.

Here are the signals that indicate it's time to graduate from SQLite to a client-server database like PostgreSQL or MySQL:

1.

Multiple Writers

SQLite allows only one writer at a time. If your application needs multiple processes or servers writing to the database simultaneously (e.g., a web app with 10 workers handling concurrent requests), you need a client-server database. PostgreSQL and MySQL handle concurrent writes gracefully.

Pro Tip: WAL Mode for Better Concurrency

If you hit the "one writer at a time" limit but aren't ready to migrate to PostgreSQL, enable Write-Ahead Logging (WAL) mode. WAL lets readers and writers operate simultaneously without blocking each other — a significant upgrade for read-heavy applications like API caches:

Python
with sqlite3.connect('weather.db') as conn:
    # Enable WAL mode — readers never block writers, writers never block readers.
    # Persists for the lifetime of the database file (set it once).
    conn.execute("PRAGMA journal_mode=WAL")

    # Also consider enabling foreign key enforcement (off by default):
    conn.execute("PRAGMA foreign_keys=ON")

WAL mode is a drop-in improvement with no code changes required beyond the one-line PRAGMA. It won't turn SQLite into PostgreSQL (there is still only one writer at a time), but it eliminates most "database is locked" errors you'd encounter serving a small Flask or FastAPI app with a handful of concurrent users.

2.

Network Access

SQLite databases are files. They can't be accessed over a network. If your frontend runs on one server and your database needs to be on another (common in production), you need a database server that accepts network connections. SQLite requires the database file to be on the same machine as the application.

3.

Database Size Over 100GB

SQLite can technically handle terabyte-sized databases, but performance degrades. Client-server databases are optimized for large datasets with better caching, query optimization, and memory management. If your database approaches 100GB, consider PostgreSQL.

4.

Team Size Grows

When multiple developers need database access for debugging, analytics, or migrations, a centralized database server simplifies access control. SQLite's file-based nature makes collaboration harder. Who has the current version? Did someone's migration script run? A centralized database solves these coordination problems.

5.

Advanced Features Required

PostgreSQL and MySQL offer features SQLite doesn't: full-text search (PostgreSQL's tsvector), JSON querying (PostgreSQL's JSONB), geographic data (PostGIS), stored procedures, triggers with more power, and user-level permissions. If your project needs these, graduate to a feature-rich database.

Real-World Decision Framework

Stick with SQLite when:

  • Single-user applications (desktop apps, CLI tools, personal projects)
  • Read-heavy workloads with infrequent writes (caching, analytics)
  • Embedded applications (mobile apps, IoT devices)
  • Prototypes and MVPs (iterate fast, migrate later if needed)
  • Development/testing environments (fast setup, zero configuration)

Graduate to PostgreSQL/MySQL when:

  • Multi-user web applications with concurrent writes
  • Production services with separate database servers
  • Applications requiring high write throughput (over 100 writes/second sustained)
  • Projects where multiple developers need direct database access
  • Applications using advanced database features (full-text search, JSON queries, GIS)

The good news: Migrating from SQLite to PostgreSQL is straightforward. Your SQL queries remain largely the same because both support standard SQL. The main changes are connection strings (from sqlite3.connect('db.db') to psycopg2.connect('postgresql://...')) and minor syntax differences (PostgreSQL uses SERIAL instead of AUTOINCREMENT, TRUE instead of 1).

For interviews, understanding when to choose SQLite vs PostgreSQL demonstrates system design knowledge. You're not just "using databases." You're making informed architectural decisions based on requirements.

Common Interview Question

"Why did you choose SQLite for your music analytics project?"

Good answer: "It's a single-user application with read-heavy workloads, which is perfect for SQLite. The database caches Spotify API data and serves analytics queries, with writes happening only when I fetch new listening history (a few times per day). SQLite handles this elegantly with zero configuration. If I were building a multi-user web service with concurrent writes, I'd use PostgreSQL, but for a personal analytics tool, SQLite is the right choice."

The Music Time Machine project (Chapter 16) uses SQLite because it's single-user and read-heavy. But when you deploy it as a web app (Chapter 18), you'll see exactly where the boundaries are, and you'll understand when to graduate.