Chapter 16: Building the Spotify Music Time Machine

Your First Portfolio-Ready Application

1. The Music Time Machine Vision

Your first end-to-end portfolio app starts here: the Spotify Music Time Machine.

In the next 15 minutes, you will authenticate with Spotify and generate your first playlist. Then you will turn that quick win into a portfolio-ready application that combines OAuth, SQLite, and real historical analysis. You will design a database schema optimized for time-based snapshots, build four production features, implement robust error handling, and write automated tests with mocks, bringing together techniques you learned in previous chapters. If you have jumped directly to this chapter, you should be comfortable with basic Python and have seen OAuth before. The database fundamentals are taught inside the chapter.

Spotify Music Time Machine

What were you listening to this week last year? What about three years ago? If you're like most people, you have no idea. Those songs are gone, not from Spotify's servers, but from your memory. The playlists you obsessed over in 2021 sit forgotten while your current favorites dominate your queue.

Spotify gives you short-term recaps and endless recommendations, but it is surprisingly hard to revisit the music that defined earlier chapters of your life. You can remember an era, but not the exact tracks that were on repeat.

This chapter fixes that problem. You will build the Music Time Machine, a tool that tracks listening history over time, surfaces forgotten gems, captures monthly snapshots of your musical identity, and creates playlists using audio analysis. By the end, you will have an application that feels valuable to use, not just impressive to demonstrate.

What You'll Build

The Music Time Machine consists of four core features:

1.

Forgotten Gems

Rediscover songs you loved months ago but haven't heard recently. The system compares your long-term favorites against your current listening and surfaces tracks that fell off your rotation.

2.

Currently Obsessed

Capture monthly snapshots of your musical identity. The system saves your top tracks into dated playlists: "November 2024 Obsession," "December 2024 Obsession." A year later, you have a perfect record of what you were listening to during specific moments.

3.

Mood-Based Playlists

Generate playlists using Spotify's audio features (energy, valence, danceability, tempo). Want a workout playlist? The system finds high-energy, high-tempo tracks. Need focus music? Low energy, moderate tempo. The algorithm scores your listening history and creates perfectly curated playlists.

4.

Musical Evolution Analytics

Track how your taste changes over time. The system calculates track turnover rates, identifies genre shifts, and reveals statistical patterns. You can see yourself evolving from indie rock to electronic, or notice that summer playlists are more upbeat.
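To make the mood-scoring idea from feature 3 concrete, here is a minimal sketch. The weights and tempo normalization are illustrative assumptions, not the chapter's final algorithm (that gets built in the feature implementation section); the feature names match Spotify's audio feature fields.

```python
# Sketch: rank tracks for a "workout" mood using Spotify-style audio features.
# The 0.6/0.4 weighting and the 60-200 BPM normalization are illustrative choices.

def workout_score(features):
    """Higher is better: favor high energy and fast tempo."""
    # Squash tempo roughly into 0-1, assuming most tracks fall between 60 and 200 BPM
    tempo_norm = min(max((features["tempo"] - 60) / 140, 0.0), 1.0)
    return 0.6 * features["energy"] + 0.4 * tempo_norm

tracks = [
    {"name": "Calm Piano", "energy": 0.15, "tempo": 80},
    {"name": "Gym Anthem", "energy": 0.92, "tempo": 168},
    {"name": "Mid Rock",   "energy": 0.60, "tempo": 120},
]

ranked = sorted(tracks, key=workout_score, reverse=True)
print([t["name"] for t in ranked])  # ['Gym Anthem', 'Mid Rock', 'Calm Piano']
```

Swap the weights (favor low energy, moderate tempo) and the same scoring shape produces a focus playlist instead.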

Why This Matters for Your Portfolio

These aren't toy features. Forgotten Gems demonstrates set operations and historical data analysis. Monthly snapshots show time-series database design. Mood playlists prove you can build algorithmic curation systems. Analytics showcase your ability to query and visualize accumulated data.

More importantly, this is an application people actually want to use. When you show this to a recruiter, they immediately understand the value.

Why Spotify Doesn't Provide This

Spotify is designed around discovery and staying engaged in the present. The product nudges you toward new listening through features like Discover Weekly, Release Radar, and Daily Mixes.

A deep "time machine" experience is a different kind of value. Looking back is meaningful, but it does not drive day-to-day discovery in the same way. Spotify does provide the building blocks through its API, such as top tracks, recently played, and audio features, but it does not package them into a consumer tool focused on long-term rediscovery.

That gap is the opportunity. The data is available through the API, and you can combine it with your own saved snapshots to build the tools you actually want. That is what this chapter is about: using the Spotify API to create features that serve your goals, not just the default product loop.

How the Pieces Fit Together

The Music Time Machine combines everything you've learned in previous chapters into one cohesive application:

1.

OAuth Authentication (Chapter 14)

You'll use Spotify's OAuth 2.0 flow to get permission to read listening history and create playlists. The authorization concepts from Chapter 14 apply directly here.

2.

API Integration (Chapters 1-9)

You'll fetch data from Spotify's API, handle rate limits, parse JSON responses, and implement retry logic. The error handling patterns from Chapter 9 prevent crashes when Spotify's servers are slow or return unexpected data.

3.

Database Persistence (Chapter 15)

SQLite stores everything: monthly snapshots, track metadata, audio features, generated playlists. The schema design decisions you learned in Chapter 15 apply directly here.

4.

Production Patterns (Chapters 9, 14, 15)

This is an application that runs repeatedly over months, accumulates data, handles errors gracefully, and provides consistent results. The production mindset you've been developing throughout the book culminates here.

In Chapter 17, you'll add a Flask web interface with interactive charts to visualize your musical evolution. In Chapter 20, you'll deploy the entire application to production with a live URL. But this chapter focuses on the core: OAuth, API integration, database design, and the features that make the Time Machine useful.

2. Building Responsibly

Privacy and Data Ethics

Before you build tools that access personal data, you need to understand your responsibilities. The Music Time Machine stores listening snapshots, audio preferences, and playlist metadata. This kind of data can reveal patterns about mood, routines, and identity, so it deserves careful handling.

Professional developers do not just build features. They think about privacy implications, minimize data collection, and give users control over their information. These principles matter whether you're building a personal project or a production application.

1.

Data Minimization: Collect Only What You Need

The Spotify API returns large JSON responses. Each track object includes many fields you will never use. Storing everything wastes space and increases privacy risk. Professional practice is to extract and store only the fields your features require. The Music Time Machine stores track IDs, names, artists, audio features (energy, valence, tempo), and timestamps. It ignores artwork URLs, preview URLs, and other metadata you do not use.

When designing your schema, ask: "Could I justify why I stored this field?" If not, exclude it. Data minimization also aligns with GDPR principles.
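A minimization step can be a small function that sits between the API response and your database. This is a sketch: the raw fields shown match Spotify's track object format, but the exact set of fields you keep should follow your schema.

```python
# Sketch: keep only the fields the Time Machine needs from a raw Spotify track object.
# Real responses carry many more fields (album art, market codes, external IDs);
# everything not listed here is simply dropped before storage.

def minimize_track(raw_track):
    return {
        "id": raw_track["id"],
        "name": raw_track["name"],
        "artist": raw_track["artists"][0]["name"],  # primary artist only
    }

# A trimmed-down example of what the API returns
raw = {
    "id": "6rqhFgbbKwnb9MLmUQDhG6",
    "name": "Example Song",
    "artists": [{"name": "Example Artist", "id": "abc"}],
    "available_markets": ["US", "GB"],   # dropped
    "preview_url": None,                 # dropped
}

print(minimize_track(raw))
```

Because only `minimize_track`'s output ever reaches the database, answering "why did I store this field?" becomes trivial: every stored field appears in one function.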

2.

Explicit Consent Through OAuth Scopes

Spotify's OAuth flow uses permission scopes so users can see what an app is requesting and decide whether to grant it. The Music Time Machine requests only the scopes it needs: user-top-read, user-read-recently-played, playlist-modify-public, and playlist-modify-private. It does not request unrelated scopes because those features do not exist in the application.

Never request permissions "just in case." Users notice, and recruiters reviewing your portfolio will too. Precise scope selection shows least-privilege thinking.

3.

Local Storage and User Control

The Music Time Machine stores data in a local SQLite database on your machine. It builds long-term history by saving your own snapshots over time, rather than trying to pull an unlimited archive from the API. For a personal project, local-first storage gives clear user control. Want to delete your history? Delete the database file. Want to inspect what is stored? Open the file in any SQLite browser, such as DB Browser for SQLite. Want to stop syncing? Stop running the script.

If you later host this as a web application, you will need explicit deletion features, export tools, and clear privacy documentation.
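What those deletion and export features might look like can be sketched now. The table and column names below (`snapshots`, `user_id`) are placeholder assumptions; the chapter's real schema is designed later.

```python
# Sketch: user-control helpers a hosted version would need.
# "snapshots" and "user_id" are hypothetical names, not the chapter's real schema.
import json
import sqlite3

def export_user_data(db_path, user_id):
    """Return all of a user's stored snapshots as JSON (GDPR-style export)."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT * FROM snapshots WHERE user_id = ?", (user_id,)
    ).fetchall()
    conn.close()
    return json.dumps([dict(r) for r in rows], default=str)

def delete_user_data(db_path, user_id):
    """Erase a user's snapshots entirely (GDPR-style deletion)."""
    conn = sqlite3.connect(db_path)
    conn.execute("DELETE FROM snapshots WHERE user_id = ?", (user_id,))
    conn.commit()
    conn.close()
```

In the local-first version, both operations collapse into "delete the database file," which is exactly why local storage is a reasonable default for a personal project.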

4.

Security Considerations

OAuth tokens grant access to your Spotify account. Treat them like credentials. Spotipy stores tokens in a cache file in the project folder by default. Never commit token caches or secrets to version control. Your .gitignore should include .cache and any secrets files you use (for example, .env). Consider locking down token-cache file permissions on your machine as well.

The database contains personal listening data. Do not share it publicly or upload it to repositories. If you're demonstrating the project, use anonymized sample data or blur sensitive information in screenshots.
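Locking down the token cache's file permissions can be done with the standard library. This is a sketch with a hypothetical helper name; `.cache` is Spotipy's default cache path, so adjust it if you configured a custom location.

```python
# Sketch: restrict the OAuth token cache so only your user account can read it.
# Spotipy writes its token cache to ".cache" in the project folder by default.
import os
import stat

def restrict_cache_permissions(cache_path=".cache"):
    """Set owner read/write only (0o600) on the token cache, if it exists."""
    if os.path.exists(cache_path):
        os.chmod(cache_path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to 0o600
        return oct(stat.S_IMODE(os.stat(cache_path).st_mode))
    return None  # no cache yet: run the OAuth flow first

print(restrict_cache_permissions())  # "0o600" once a .cache file exists, else None
```

Run this once after your first OAuth flow; the permission sticks until the file is recreated.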

Why Building Responsibly Matters for Your Portfolio

When you present this project to recruiters, privacy awareness differentiates you. You can say:

"I designed this following GDPR principles: data minimization by storing only necessary fields, explicit consent through precise OAuth scopes, and user control through local-first storage. I also kept credentials out of version control and treated OAuth tokens as sensitive data."

This demonstrates professional judgment. Companies hiring API developers need people who understand that access to user data comes with responsibility.

3. Chapter Overview

Chapter Roadmap

This chapter builds your first portfolio-ready project through six progressive stages: rapid prototype, conceptual foundation, schema design, feature implementation, production hardening, and automated testing. Here's the journey:

1

Your First Playlist in 15 Minutes

Section 4 • Quick Start

Get OAuth authentication working and create your first Spotify playlist. You'll see the entire flow work end-to-end before diving into details, establishing confidence that the API integration functions correctly.

OAuth Setup • Quick Win • API Authentication
2

Understanding Spotify's Data Model

Section 5 • API Concepts

Master the concepts that power the Music Time Machine: track URIs for playback, time ranges for historical windows, audio features for algorithmic curation, and pagination for large datasets. These fundamentals inform every design decision that follows.

Track URIs • Time Ranges • Audio Features
3

Database Schema Design

Section 6 • Schema Architecture

Design a three-table schema optimized for time-series queries. You'll separate entities (tracks) from events (snapshots), choose strategic denormalization, and create indexes that make historical analysis fast. Good schema design makes complex features simple.

Schema Design • Time-Series Data • Strategic Indexing
4

Building Core Features

Section 7 • Feature Implementation

Implement the four features that define the Music Time Machine: Forgotten Gems uses set operations on historical data, Monthly Snapshots automate archiving, Mood Playlists leverage audio feature scoring, and Analytics aggregate accumulated insights. Each feature demonstrates different integration patterns.

Forgotten Gems • Mood Playlists • Historical Analysis
5

Production Error Handling

Section 8 • Error Strategies

Transform prototype code into production-ready systems. You'll categorize errors (transient, authorization, rate limit, data), implement retry logic with exponential backoff and jitter, handle database locks gracefully, and enable partial success when full completion fails.

Retry Logic • Error Recovery • Graceful Degradation
6

Testing with Mocks

Section 9 • Automated Testing

Write fast, reliable tests that verify functionality without hitting Spotify's servers. You'll use mocks to control what the API returns, test edge cases (empty results, missing data, boundary conditions), and measure coverage to identify untested code. Professional developers test before deploying.

Unit Testing • Mocks • Test Coverage

Key strategy: You'll build incrementally. Section 4 proves the concept works. Section 5 establishes the mental model. Section 6 designs data persistence. Section 7 implements features. Section 8 hardens for production. Section 9 adds testing. Each stage builds on the previous, teaching you how professional projects evolve from prototype to production.

Learning Objectives

By the end of this chapter, you'll be able to:

  • Integrate OAuth authentication
  • Design database schemas for time-series data
  • Build multi-feature applications
  • Implement production error handling
  • Write automated tests using mocks
  • Use third-party libraries professionally
  • Create algorithmic curation systems

These competencies combine into portfolio work that demonstrates full-stack thinking.

4. Quick Start: Your First Playlist in 15 Minutes

Before building the complete Music Time Machine, you need a quick win. This section gets you from zero to a working Spotify integration in 15 minutes. You'll authorize with OAuth, fetch your top tracks, and create an actual playlist in your Spotify account.

This quick start establishes three things: (1) OAuth works and you can authenticate, (2) the Spotify API returns real data, and (3) you can create playlists programmatically. Once you have this foundation, you'll understand how all the features build on top of it.

Don't worry about understanding every line of code yet. The goal is to see it work, feel the satisfaction of seeing your playlist appear in Spotify, then dive into the details with confidence.

Step 1: Get Spotify Developer Credentials

Every application that uses Spotify's API needs credentials: a Client ID and Client Secret. These identify your application to Spotify and enable OAuth authentication.

Here's how to get them:

1.

Go to the Spotify Developer Dashboard

Visit https://developer.spotify.com/dashboard and log in with your Spotify account (either a free or a Premium account works).

2.

Create an App

Click "Create app" and fill in the form:

  • App name: "Music Time Machine" (or whatever you prefer)
  • App description: "Personal music analytics and playlist generator"
  • Redirect URI: http://localhost:8888/callback (exactly this; it's where Spotify redirects your browser after authorization)
  • APIs used: Check "Web API"
3.

Get Your Credentials

After creating the app, click "Settings." You'll see your Client ID displayed. Click "View client secret" to reveal your Client Secret. Copy both. You'll need them in a moment.

Why localhost:8888/callback?

Remember OAuth's redirect URI from Chapter 14? After you authorize the application, Spotify redirects your browser to this URI with an authorization code. The URI doesn't need to be a real web server. It just needs to match what you configured. Spotipy opens a local web server temporarily on port 8888 to catch the redirect and extract the code. This is the same OAuth dance from Chapter 14, but Spotipy handles the details.

Keep Your Client Secret Secret

Your Client Secret is a password. Don't commit it to GitHub, don't share it in screenshots, don't post it in Discord. We'll use environment variables to keep it safe. This is the same pattern you learned in Chapter 7 for API keys.

Step 2: Install Dependencies

You need two Python packages: spotipy (Spotify API wrapper) and python-dotenv (environment variable management).

Terminal
pip install spotipy python-dotenv

If your system Python is externally managed and pip refuses to install, create and activate a virtual environment first rather than forcing the install with --break-system-packages.

Spotipy abstracts away the HTTP requests, token management, and error handling. You could interact with Spotify's API directly using requests (and you've learned enough to do that), but Spotipy saves you from writing hundreds of lines of boilerplate. You understand what it's doing behind the scenes. Now let it handle the tedious parts.

Step 3: Store Credentials Securely

Create a file called .env in your project directory. This file stores secrets and never gets committed to version control.

.env
SPOTIPY_CLIENT_ID=your_client_id_here
SPOTIPY_CLIENT_SECRET=your_client_secret_here
SPOTIPY_REDIRECT_URI=http://localhost:8888/callback

Replace your_client_id_here and your_client_secret_here with the actual values from the Spotify Developer Dashboard. The redirect URI should match exactly what you configured in Step 1.

Now add .env to your .gitignore file so Git never tracks it:

.gitignore
.env
__pycache__/
*.pyc
.cache
*.db

The .cache file will appear when you run the OAuth flow. It stores your access token locally. You don't want that in version control either.

Step 4: Create Your First Playlist

Now for the moment of truth. This script authenticates with Spotify, fetches your top tracks, and creates a playlist. It's 30 lines of code that demonstrate the entire flow.

Python
"""
Quick Start: Create your first Spotify playlist
This script demonstrates OAuth authentication and basic API usage
"""
import os
from dotenv import load_dotenv
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Load credentials from .env file
load_dotenv()

# Define the permissions we need
scope = "user-top-read playlist-modify-public playlist-modify-private"

# Create Spotify client with OAuth
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))

# Get current user info
user = sp.current_user()
print(f"Authenticated as: {user['display_name']}")

# Fetch top tracks from last 4 weeks
print("\nFetching your top tracks...")
top_tracks = sp.current_user_top_tracks(limit=20, time_range='short_term')

# Extract track URIs (Spotify's unique identifiers)
track_uris = [track['uri'] for track in top_tracks['items']]
track_names = [track['name'] for track in top_tracks['items']]

# Create a new playlist
print("\nCreating playlist...")
playlist = sp.user_playlist_create(
    user=user['id'],
    name="My Top Tracks - Quick Start",
    public=False,
    description="Created by Music Time Machine - My current favorites"
)

# Add tracks to the playlist
sp.playlist_add_items(playlist['id'], track_uris)

print(f"\n✓ Success! Created playlist with {len(track_uris)} tracks")
print(f"✓ Playlist URL: {playlist['external_urls']['spotify']}")
print("\nYour top tracks right now:")
for i, name in enumerate(track_names, 1):
    print(f"  {i}. {name}")

Save this as quick_start.py and run it:

Terminal
python quick_start.py

The OAuth Flow: What Just Happened

When you run the script, here's what happens:

1.

Browser Opens Automatically

Spotipy detects you don't have an access token yet and opens your browser to Spotify's authorization page. You see a page asking "Music Time Machine wants to access your Spotify account" with a list of permissions (read your top tracks, modify playlists).

2.

You Grant Permission

Click "Agree" or "Accept." This is you (the resource owner) authorizing the application to access your data. Remember this from Chapter 14? This is the OAuth authorization step.

3.

Spotify Redirects to localhost

Spotify redirects your browser to http://localhost:8888/callback?code=.... The URL contains an authorization code. Spotipy has started a tiny web server on port 8888 that catches this redirect and extracts the code.

4.

Token Exchange Happens Behind the Scenes

Spotipy exchanges the authorization code for an access token by making a POST request to Spotify's token endpoint. This is the OAuth token exchange from Chapter 14. You don't see it happen, but Spotipy is doing exactly what you learned: sending the code plus client credentials to get tokens.

5.

Token Saved Locally

Spotipy saves the access token in a .cache file. The next time you run the script, it reads the token from cache instead of requiring you to authorize again. The token is valid for 1 hour. When it expires, Spotipy automatically uses the refresh token to get a new one.

6.

API Calls Succeed

With a valid access token, the script fetches your top tracks, creates a playlist, and adds the tracks. All of this uses the access token in the Authorization header of HTTP requests to Spotify's API.

What Spotipy Abstracts

Behind the scenes, Spotipy is doing what you learned in Chapter 14:

Authorization URL: https://accounts.spotify.com/authorize?client_id=...&response_type=code&redirect_uri=...&scope=...

Token Exchange: POST https://accounts.spotify.com/api/token with authorization code and client credentials

API Requests: GET https://api.spotify.com/v1/me/top/tracks with Authorization: Bearer [access_token]

Token Refresh: Automatic refresh when the access token expires, using the refresh token

You understand OAuth because you built it from scratch in Chapter 14. Spotipy saves you from implementing all that plumbing again. This is good engineering: use libraries for solved problems, understand what they do so you can debug when things break.
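To make the abstraction concrete, here is a sketch of the token-exchange request Spotipy builds for you. It only constructs the headers and form data (sending it would require a fresh authorization code); the endpoint URL and OAuth parameters match what Spotify's flow expects, while the helper name and sample values are illustrative.

```python
# Sketch of the token exchange Spotipy performs behind the scenes (Chapter 14's flow).
# This builds the request; a real call needs an authorization code from the redirect.
import base64

TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_token_request(client_id, client_secret, code, redirect_uri):
    """Return (headers, form_data) for the authorization-code token exchange."""
    # Client credentials travel in an HTTP Basic Authorization header
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    data = {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }
    return headers, data

headers, data = build_token_request(
    "my_id", "my_secret", "auth_code_123", "http://localhost:8888/callback"
)
# A real exchange would then be: requests.post(TOKEN_URL, headers=headers, data=data)
```

The JSON response to that POST contains the access token and refresh token Spotipy caches for you.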

Terminal Output
Authenticated as: John Smith

Fetching your top tracks...

Creating playlist...

✓ Success! Created playlist with 20 tracks
✓ Playlist URL: https://open.spotify.com/playlist/5a8d2f3...

Your top tracks right now:
  1. Karma Police
  2. Everlong
  3. Mr. Brightside
  4. Dreams
  5. Wonderwall
  6. Smells Like Teen Spirit
  7. Don't Look Back In Anger
  8. Champagne Supernova
  9. Under the Bridge
  10. Black Hole Sun
  11. Jeremy
  12. Interstate Love Song
  13. Glycerine
  14. No Rain
  15. Today
  16. Lightning Crashes
  17. What I Got
  18. Semi-Charmed Life
  19. Closing Time
  20. The Impression That I Get

Check your Spotify app. The playlist is there. It actually worked. You just integrated with a real-world OAuth API, fetched personalized data, and created content in your account. All in 30 lines of Python.

Understanding Scopes: What Permissions Mean

Look at the scope variable in the quick start script:

Python
scope = "user-top-read playlist-modify-public playlist-modify-private"

Scopes define what your application can do. Each scope grants permission for specific operations:

1.

user-top-read

Allows reading your top tracks and artists. Required for the Forgotten Gems and Currently Obsessed features. Without this scope, sp.current_user_top_tracks() returns a 403 Forbidden error.

2.

playlist-modify-public

Allows creating and modifying public playlists. Required for creating playlists that appear in your profile. Without this, sp.user_playlist_create() fails if you try to make a public playlist.

3.

playlist-modify-private

Allows creating and modifying private playlists. Most users prefer private playlists for generated content, so we request this scope too. The quick start script creates private playlists (public=False).

For the complete Music Time Machine, you'll need additional scopes:

Python - Complete Scopes
scope = (
    "user-top-read "                   # Read top tracks and artists
    "user-read-recently-played "       # Read listening history
    "playlist-modify-public "          # Create/modify public playlists
    "playlist-modify-private "         # Create/modify private playlists
    "user-library-read"                # Read saved tracks for mood analysis
)
Principle of Least Privilege

Only request the scopes you actually need. Users see the permission list during OAuth authorization, and excessive permissions make them nervous. If you only read data, don't request write permissions. If you only work with playlists, don't request access to saved albums.

The Music Time Machine legitimately needs all five scopes listed above. Each one enables a specific feature. If you later decide you don't need a scope (maybe you remove the mood playlist feature), update the scope list and users won't be asked for that permission on next authorization.

Troubleshooting: Common First-Run Issues

If something goes wrong, here are the most common problems and their solutions:

Error: "Invalid redirect URI" during authorization

The redirect URI in your .env file doesn't match what you configured in the Spotify Developer Dashboard.

Fix: Go to your app settings in the dashboard and verify the redirect URI is exactly http://localhost:8888/callback. No trailing slash, no https, port 8888 specifically. Then make sure your .env file has: SPOTIPY_REDIRECT_URI=http://localhost:8888/callback

The redirect URI must match character-for-character. Even http://127.0.0.1:8888/callback won't work if you configured localhost.

Error: "Can't connect to localhost:8888" or browser shows "Unable to connect"

After you authorize in the browser, Spotify tries to redirect to localhost:8888 but can't connect. This usually means Spotipy's temporary web server didn't start properly.

Fix: Check if another application is using port 8888. Try closing other programs and running the script again. If the problem persists, change the port in both the .env file and the Spotify Developer Dashboard (try 8080 or 9000).

Alternatively, the browser might show the "Unable to connect" page but the script works anyway. Check your terminal. If it says "Authenticated as: [your name]", the OAuth flow succeeded even though the browser couldn't load the callback page.

Error: "Invalid client" or "Client secret invalid"

Your Client ID or Client Secret is wrong. Double-check that you copied them correctly from the Spotify Developer Dashboard.

Fix: Go to your app in the dashboard, click "Settings," and copy the Client ID and Client Secret again. Make sure there are no extra spaces, no quotes, and no newlines in your .env file. The format should be:

SPOTIPY_CLIENT_ID=abc123...
SPOTIPY_CLIENT_SECRET=xyz789...

If you accidentally committed your Client Secret to GitHub, regenerate it in the dashboard and update your .env file with the new secret.

Script runs but creates an empty playlist

The script authenticated successfully but sp.current_user_top_tracks() returned no tracks. This happens if your Spotify account is brand new with no listening history, or if you're using a different account than you expected.

Fix: Check that you're logged into the correct Spotify account. Listen to a few songs on Spotify (at least 10 to 15 tracks over a day or two) to build up enough history for the "top tracks" endpoint to return results.

Alternatively, change time_range='short_term' to time_range='medium_term' or time_range='long_term' to access longer listening history.

Error: "Insufficient client scope" when creating playlist

You authorized the app but didn't grant the necessary permissions. This happens if you modified the scope variable after first authorization.

Fix: Delete the .cache file in your project directory and run the script again. This forces a new OAuth authorization with the updated scopes. You'll need to click "Agree" in the browser again, and this time the new scopes will be included.

Whenever you change the scope variable, delete .cache so Spotipy re-authorizes with the new permissions.

What You Just Accomplished

In 15 minutes, you:

  • Created a Spotify Developer app and obtained OAuth credentials
  • Configured environment variables to store secrets securely
  • Completed the OAuth authorization flow (even though Spotipy handled the details)
  • Fetched personalized data from Spotify's API using an access token
  • Created a real playlist in your Spotify account with your current top tracks

This quick start established the foundation. You now know OAuth works, the API returns real data, and you can create playlists programmatically. Everything else in this chapter builds on these capabilities.

The next section explains what Spotipy is doing behind the scenes, introduces Spotify's data model (audio features, time ranges, track URIs), and prepares you to understand the database schema design that enables the Time Machine's features. But first, take a moment to appreciate what you built. That playlist is real. The OAuth flow worked. You're ready for the complete project.

The quick start proved OAuth works and the API returns real data. Section 5 dives deeper into Spotify's data model to understand what the API provides: track URIs, time ranges, audio features, and pagination. These concepts are essential for designing the database schema that comes in Section 6.

5. Understanding Spotify's Data Model

Before you can design a database to store Spotify data, you need to understand what Spotify gives you. The API returns complex nested JSON structures with dozens of fields per track. Some fields you'll use constantly (track name, artist, audio features). Others you'll ignore (market availability codes, external IDs for other platforms).

This section focuses on the data that matters for the Music Time Machine. You'll learn Spotify's identifier system (URIs and IDs), time ranges that categorize your listening history, audio features that enable algorithmic playlist generation, and the structure of track objects returned by the API. Understanding these concepts now prevents confusion when you're designing database tables.

The goal isn't to memorize every field Spotify returns. It's to understand the data model well enough that you can make informed decisions about what to store, how to structure your tables, and which fields enable which features.

Spotify's Identifier System

Spotify uses two types of identifiers for every resource (tracks, albums, artists, playlists): Spotify IDs and Spotify URIs. You'll work with both, so you need to understand the difference.

1.

Spotify ID (22-character string)

A unique identifier for a resource. Example: 6rqhFgbbKwnb9MLmUQDhG6. This is a base62-encoded string that uniquely identifies a track. Think of it like a database primary key. Spotify IDs appear in API responses under the id field.

2.

Spotify URI (includes resource type)

A formatted identifier that includes the resource type. Example: spotify:track:6rqhFgbbKwnb9MLmUQDhG6. The URI explicitly states this is a track, followed by its ID. URIs appear in API responses under the uri field. Many API methods require URIs rather than IDs.

Python
# Example: Working with Spotify identifiers
track = sp.current_user_top_tracks(limit=1)['items'][0]

# Both identifiers point to the same track
track_id = track['id']          # "6rqhFgbbKwnb9MLmUQDhG6"
track_uri = track['uri']        # "spotify:track:6rqhFgbbKwnb9MLmUQDhG6"

# Convert between them
uri_from_id = f"spotify:track:{track_id}"        # Build URI from ID
id_from_uri = track_uri.split(':')[-1]           # Extract ID from URI

# API methods use both formats
audio_features = sp.audio_features([track_id])   # Accepts IDs
sp.playlist_add_items(playlist_id, [track_uri])  # Requires URIs
Storage Decision

In your database, store Spotify IDs as your primary keys. They're shorter (22 characters vs 36 for a full track URI) and you can always reconstruct the URI by prepending spotify:track:. The ID is the canonical identifier.

When calling API methods that require URIs, build them from your stored IDs. When parsing API responses, extract the ID from the URI if needed. This keeps your database lean while maintaining compatibility with Spotify's API.

Time Ranges: How Spotify Categorizes Your History

Spotify's current_user_top_tracks() and current_user_top_artists() methods accept a time_range parameter that determines which window of your listening history to analyze. This concept is fundamental to the Music Time Machine because different time ranges serve different purposes.

Time Range    Approximate Period   Use Case
short_term    Last ~4 weeks        Current obsessions, recent discoveries, monthly snapshots
medium_term   Last ~6 months       Seasonal preferences, sustained favorites, trend analysis
long_term     Several years        All-time favorites, lifetime patterns, forgotten gems discovery

The exact duration of each time range isn't publicly documented by Spotify, and it can vary based on your listening activity. What matters is the relative relationship: short_term tracks recent obsessions, medium_term captures sustained preferences, and long_term reveals your all-time patterns.

Python
# Fetch top tracks for each time range
short_term = sp.current_user_top_tracks(limit=50, time_range='short_term')
medium_term = sp.current_user_top_tracks(limit=50, time_range='medium_term')
long_term = sp.current_user_top_tracks(limit=50, time_range='long_term')

# Extract track IDs from each range
short_ids = {track['id'] for track in short_term['items']}
medium_ids = {track['id'] for track in medium_term['items']}
long_ids = {track['id'] for track in long_term['items']}

# Find forgotten gems: in long-term but not in short-term
forgotten = long_ids - short_ids
print(f"You loved {len(forgotten)} tracks long-term but haven't heard them recently")

# Find new discoveries: in short-term but not in medium-term
discoveries = short_ids - medium_ids
print(f"You discovered {len(discoveries)} new tracks in the last month")
How This Enables Forgotten Gems

The Forgotten Gems feature compares time ranges using set operations. Tracks in long_term but not short_term are songs you loved historically but haven't heard recently. This is pure Python set subtraction, made possible by Spotify's time range categorization.

Your database doesn't need to store which time range a track came from. Instead, you'll store timestamps when you fetch tracks, then use SQL queries to implement similar "forgotten" logic based on dates. The time ranges help you understand Spotify's data model, but your application will track time more precisely using actual dates.

Audio Features: The Secret to Algorithmic Curation

Spotify analyzes every track in their catalog and assigns numerical scores for various audio characteristics. These features power their recommendation algorithms, and they enable your mood-based playlist generation. The audio_features() method returns 13 attributes per track, but you only need to care about 7 of them.

1.

Energy (0.0 to 1.0)

Perceptual measure of intensity and activity. High energy tracks feel fast, loud, and noisy (metal, punk, EDM). Low energy tracks feel calm and subdued (classical, ambient, acoustic). Example: "Thunderstruck" by AC/DC has energy 0.92, while "Clair de Lune" has energy 0.11.

2.

Valence (0.0 to 1.0)

Musical positiveness or happiness. High valence sounds cheerful and euphoric (pop, dance, happy songs). Low valence sounds sad or angry (melancholic indie, heavy metal, blues). Example: "Happy" by Pharrell Williams has valence 0.96, while "Hurt" by Johnny Cash has valence 0.14.

3.

Danceability (0.0 to 1.0)

How suitable a track is for dancing based on tempo, rhythm stability, beat strength, and overall regularity. High danceability tracks have strong, regular beats (funk, disco, pop). Low danceability tracks have irregular rhythms or lack beat emphasis (free jazz, ambient, classical).

4.

Tempo (BPM, typically 50-200)

Beats per minute. The speed of the track. Fast tracks work for running or workouts (150+ BPM), moderate tracks for walking (100-130 BPM), slow tracks for relaxation (60-90 BPM). Example: Most house music sits at 120-130 BPM, while ballads average 60-80 BPM.

5.

Acousticness (0.0 to 1.0)

Confidence measure of whether the track is acoustic (no electric instruments). High acousticness indicates unplugged or live recordings with acoustic guitars, pianos, and natural instruments. Low acousticness indicates heavy use of synthesizers, electric guitars, and electronic production.

6.

Instrumentalness (0.0 to 1.0)

Predicts whether a track contains no vocals. Tracks with high instrumentalness (>0.5) are likely instrumental tracks (classical, jazz, ambient, lo-fi beats). Low instrumentalness indicates vocal presence (pop, rock, hip-hop).

7.

Speechiness (0.0 to 1.0)

Detects the presence of spoken words. High speechiness (>0.66) indicates talk shows or audiobooks. Medium speechiness (0.33-0.66) suggests rap or spoken word. Low speechiness (<0.33) is music with minimal talking (most genres).

Python
# Fetch audio features for a track
track_id = "6rqhFgbbKwnb9MLmUQDhG6"  # "Karma Police" by Radiohead
features = sp.audio_features([track_id])[0]

print(f"Energy: {features['energy']:.2f}")
print(f"Valence: {features['valence']:.2f}")
print(f"Danceability: {features['danceability']:.2f}")
print(f"Tempo: {features['tempo']:.0f} BPM")
print(f"Acousticness: {features['acousticness']:.2f}")
print(f"Instrumentalness: {features['instrumentalness']:.2f}")
print(f"Speechiness: {features['speechiness']:.2f}")
Output
Energy: 0.55
Valence: 0.31
Danceability: 0.49
Tempo: 126 BPM
Acousticness: 0.18
Instrumentalness: 0.00
Speechiness: 0.03
What Just Happened

"Karma Police" scores medium on energy (0.55), low on happiness/valence (0.31), and moderate on danceability (0.49). These numbers match the listening experience: it's a melancholic alternative rock song with moderate intensity. Not a pump-up track, not a sad ballad, somewhere in between.

Low instrumentalness (0.00) confirms vocals are present throughout. Low speechiness (0.03) confirms it's sung music, not spoken word. The tempo of 126 BPM feels moderate, not too fast or slow.

These features enable mood-based playlist generation. Want a workout playlist? Filter for tracks where energy > 0.8 AND tempo > 140. Want focus music? Filter for energy < 0.4 AND instrumentalness > 0.5. The possibilities are endless once you have the features stored in your database.

Building Mood Profiles

The real power of audio features emerges when you combine them into mood profiles. Instead of filtering on single attributes, you create multi-dimensional criteria that capture complex moods.

Mood Profile   Energy   Valence   Tempo         Other Criteria
Workout        >0.75    >0.50     >140 BPM      Danceability >0.6
Focus          <0.40    Any       90-110 BPM    Instrumentalness >0.5
Melancholic    <0.50    <0.30     Any           Acousticness >0.5
Party          >0.70    >0.70     118-135 BPM   Danceability >0.7
Chill          <0.40    0.4-0.7   80-100 BPM    Acousticness >0.4
Python
# Example: Score a track against a workout profile
def matches_workout_profile(features):
    """Returns True if track matches workout criteria"""
    return (
        features['energy'] > 0.75 and
        features['valence'] > 0.50 and
        features['tempo'] > 140 and
        features['danceability'] > 0.6
    )

# Fetch your top 50 tracks and their audio features
tracks = sp.current_user_top_tracks(limit=50)['items']
track_ids = [track['id'] for track in tracks]
features_list = sp.audio_features(track_ids)

# Filter for workout tracks
workout_tracks = [
    (tracks[i], features_list[i])
    for i in range(len(tracks))
    if features_list[i] and matches_workout_profile(features_list[i])
]

print(f"Found {len(workout_tracks)} workout-suitable tracks from your top 50")
for track, features in workout_tracks[:5]:
    print(f"  {track['name']} - {track['artists'][0]['name']}")
    print(f"    Energy: {features['energy']:.2f}, Tempo: {features['tempo']:.0f} BPM")

This approach scales beautifully. You can define 20 different mood profiles, score every track in your library against each one, and generate perfectly curated playlists on demand. The database stores the audio features once, and you can create unlimited playlists without making additional API calls.
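One way to scale to many profiles without writing one function per mood is to express each profile as data. This is a sketch (the profile names and ranges mirror the table above; each feature maps to an inclusive min/max window, and every track is assumed to have the named features):

```python
# Each profile maps a feature name to an inclusive (min, max) range.
MOOD_PROFILES = {
    "workout": {"energy": (0.75, 1.0), "valence": (0.50, 1.0),
                "tempo": (140, 250), "danceability": (0.6, 1.0)},
    "focus": {"energy": (0.0, 0.40), "tempo": (90, 110),
              "instrumentalness": (0.5, 1.0)},
    "party": {"energy": (0.70, 1.0), "valence": (0.70, 1.0),
              "tempo": (118, 135), "danceability": (0.7, 1.0)},
}


def matches_profile(features, profile):
    """True if every feature named by the profile falls inside its range."""
    return all(lo <= features[name] <= hi for name, (lo, hi) in profile.items())


# A high-energy track matches the workout profile but not focus
pump_up = {"energy": 0.9, "valence": 0.8, "tempo": 150, "danceability": 0.8}
assert matches_profile(pump_up, MOOD_PROFILES["workout"])
```

Adding a twentieth mood is then one new dictionary entry, not a new function.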

What a Track Object Contains

When you fetch tracks from Spotify's API, you get complex nested JSON objects. Here's what matters for the Music Time Machine:

JSON
{
  "id": "6rqhFgbbKwnb9MLmUQDhG6",
  "uri": "spotify:track:6rqhFgbbKwnb9MLmUQDhG6",
  "name": "Karma Police",
  "artists": [
    {
      "id": "4Z8W4fKeB5YxbusRsdQVPb",
      "name": "Radiohead"
    }
  ],
  "album": {
    "id": "6400dnyeDyD2mIFHfkwHXN",
    "name": "OK Computer",
    "release_date": "1997-06-16",
    "images": [
      {
        "url": "https://i.scdn.co/image/ab67616d0000b273c8b444df094279e70d0ed856",
        "height": 640,
        "width": 640
      }
    ]
  },
  "duration_ms": 263066,
  "popularity": 77,
  "explicit": false,
  "external_urls": {
    "spotify": "https://open.spotify.com/track/6rqhFgbbKwnb9MLmUQDhG6"
  }
}
1.

Essential Fields (Store These)

id: The track's Spotify ID (your database primary key). name: Track title. artists: Array of artist objects (usually just extract the first artist's name for display). album.name: Album title. duration_ms: Track length in milliseconds. popularity: Spotify's popularity score (0-100, updates frequently).

2.

Optional Fields (Useful for Display)

album.images: Array of album cover URLs in different sizes (use the first one for thumbnails). external_urls.spotify: Direct link to open the track in Spotify. album.release_date: When the track was released (useful for sorting by era).

3.

Fields You Can Ignore

available_markets: List of countries where the track is available (you don't need this). disc_number, track_number: Position on album (irrelevant for playlists). is_local, is_playable: Technical flags that don't affect your features. preview_url: 30-second preview clip (you're not building a player).
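The field breakdown above translates into a small extraction helper that flattens the nested JSON into exactly the fields worth storing (a sketch; flatten_track is my own name, not a spotipy method):

```python
def flatten_track(track):
    """Reduce a raw Spotify track object to the flat fields the app stores."""
    images = track["album"].get("images", [])
    return {
        "track_id": track["id"],
        "name": track["name"],
        "artist_name": track["artists"][0]["name"],  # primary artist only
        "album_name": track["album"]["name"],
        "duration_ms": track["duration_ms"],
        "popularity": track.get("popularity"),
        "album_image_url": images[0]["url"] if images else None,
        "spotify_url": track["external_urls"]["spotify"],
    }


# Works on a minimal track object shaped like the JSON above
track = {
    "id": "6rqhFgbbKwnb9MLmUQDhG6", "name": "Karma Police",
    "artists": [{"name": "Radiohead"}],
    "album": {"name": "OK Computer", "images": []},
    "duration_ms": 263066, "popularity": 77,
    "external_urls": {"spotify": "https://open.spotify.com/track/6rqhFgbbKwnb9MLmUQDhG6"},
}
row = flatten_track(track)
```

Keeping the flattening in one function isolates the rest of the code from Spotify's nested response shape.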

Database Design Hint

Notice that artist and album are nested objects. You'll need to decide whether to normalize these into separate tables (proper database design) or denormalize by storing just the names as text in the tracks table (simpler, slightly redundant).

For the Music Time Machine, denormalization is acceptable. You'll store artist_name and album_name as text fields in the tracks table. This avoids JOIN operations and simplifies queries. The tradeoff is storing "Radiohead" multiple times if you have multiple Radiohead tracks. The database file grows slightly larger, but queries run faster and the code is simpler.

Professional applications with millions of tracks would normalize. For a personal music tracker with thousands of tracks, denormalization is the pragmatic choice. Section 4 explains this decision in detail.

How This Enables the Features

Now that you understand Spotify's data model, you can see how each concept enables specific Music Time Machine features:

1.

Forgotten Gems

Compare tracks from different time ranges using set operations. Tracks in long_term but not short_term are forgotten favorites. Your database will implement this using timestamps instead of time ranges, but the concept is the same.

2.

Currently Obsessed (Monthly Snapshots)

Fetch short_term top tracks once per month and save them with a timestamp. Store the track ID, name, artist, and the date you fetched it. Over time, you accumulate monthly snapshots that show how your taste evolves.

3.

Mood-Based Playlists

Fetch audio features for every track you've listened to and store them in the database. When generating a playlist, query the database for tracks matching mood criteria (e.g., WHERE energy > 0.8 AND tempo > 140). Return track URIs and pass them to Spotify's playlist creation API.

4.

Musical Evolution Analytics

Query your database for tracks grouped by month. Calculate statistics: average energy per month, genre diversity, track turnover rate. These analytics only work because you stored historical data with timestamps.
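Track turnover, for instance, reduces to set arithmetic over consecutive snapshots. A sketch with hypothetical data (turnover_rate is my own name; real code would load the two ID sets from the database):

```python
def turnover_rate(previous_ids, current_ids):
    """Fraction of the current snapshot's tracks that were absent last time."""
    if not current_ids:
        return 0.0
    return len(current_ids - previous_ids) / len(current_ids)


# Hypothetical data: track IDs from two consecutive monthly snapshots
january = {"t1", "t2", "t3", "t4"}
february = {"t3", "t4", "t5", "t6"}
rate = turnover_rate(january, february)  # half of February's top tracks are new
```

A consistently high rate suggests restless, exploratory listening; a low rate suggests stable favorites.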

Everything you just learned translates directly into database tables. Track IDs become primary keys. Audio features become columns. Time ranges inform your timestamp-based queries. The next section designs the schema that brings these concepts together.

Takeaways & Next Step

Understanding Spotify's Data Model:

  • Track URIs: Spotify's canonical identifiers for playback and API operations
  • Time Ranges: short_term (~4 weeks), medium_term (~6 months), long_term (several years) listening windows
  • Audio Features: Seven key dimensions (energy, valence, danceability, tempo, acousticness, instrumentalness, speechiness) enable algorithmic curation
  • Pagination: Offset-based pagination with limits up to 50 items per request

With Spotify's data model understood, Section 4 designs the database schema that stores this data efficiently. You'll see how to separate entities (tracks) from events (snapshots), why audio features deserve their own table, and how composite primary keys enable time-series queries. This schema will migrate to PostgreSQL in Chapter 24 when you build multi-user production systems.

4. Database Schema Design

A bad database schema makes simple features complicated. A good schema makes complex features simple. The difference shows up in your code: awkward queries with multiple joins and subqueries versus straightforward SELECT statements that return exactly what you need.

This section designs the database that powers the Music Time Machine. You'll see how each table supports specific features, why certain fields need indexes, and what happens when you get the design wrong. The schema looks obvious in retrospect, but the design decisions matter.

We'll start with the wrong approach (storing everything in one massive table), identify the problems it creates, then build the correct schema that separates concerns and enables efficient queries.

The Naive Approach (Don't Do This)

Here's what seems obvious at first: create one table that stores everything about every track you fetch from Spotify.

SQL (Bad Design)
CREATE TABLE everything (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    track_id TEXT,
    track_name TEXT,
    artist_name TEXT,
    album_name TEXT,
    duration_ms INTEGER,
    snapshot_date TEXT,
    time_range TEXT,
    energy REAL,
    valence REAL,
    tempo REAL,
    danceability REAL
    -- ... every field for every track for every snapshot
);

This design has serious problems:

1.

Massive Data Duplication

If "Karma Police" appears in 12 monthly snapshots, you store the track name, artist, album, duration, and audio features 12 times. The database grows unnecessarily large and updates become complex (change one field, update 12 rows).

2.

No Clear Primary Key

What uniquely identifies a row? It's not the track_id (same track appears multiple times). It's not the snapshot_date (multiple tracks per snapshot). You end up with an auto-increment ID that doesn't represent anything meaningful.

3.

Awkward Queries

Finding forgotten gems requires complex GROUP BY logic to identify tracks that appeared in old snapshots but not recent ones. Generating mood playlists requires scanning duplicate audio features. Every query fights the schema.

4.

Update Anomalies

Spotify updates track popularity scores frequently. If you want to refresh popularity for a track that appears in 10 snapshots, you either update 10 rows or accept inconsistent data across snapshots.

The fundamental mistake: mixing two concerns. Tracks are entities (things that exist independently). Snapshots are events (points in time when you captured data). Storing them in one table conflates entity with event.

The Correct Approach: Three Tables

The Music Time Machine needs three tables, each with a clear purpose:

Table            Purpose                                                    Primary Key
tracks           Stores track metadata (name, artist, album, duration)      track_id (Spotify ID)
audio_features   Stores Spotify's audio analysis (energy, valence, tempo)   track_id (references tracks)
snapshots        Records when tracks appeared in your top tracks            (track_id, snapshot_date, time_range) composite key

This design stores each track once in the tracks table. Audio features go in a separate table because they're optional: not every track needs them immediately. The snapshots table creates the many-to-many relationship between tracks and dates: the same track can appear in multiple snapshots, and each snapshot contains multiple tracks.

Visual Schema Relationship

  tracks table                audio_features table         snapshots table
┌──────────────────┐        ┌───────────────────┐        ┌────────────────────┐
│ track_id    (PK) │────┬──→│ track_id     (PK) │        │ track_id   (PK,FK) │
│ name             │    │   │ energy            │        │ snapshot_date (PK) │
│ artist_name      │    │   │ valence           │        │ time_range    (PK) │
│ album_name       │    │   │ danceability      │        │ rank               │
│ duration_ms      │    │   │ tempo             │        └────────────────────┘
│ popularity       │    │   │ acousticness      │                  │
│ spotify_url      │    │   │ instrumentalness  │                  │
│ added_at         │    │   │ speechiness       │                  │
└──────────────────┘    │   │ fetched_at        │                  │
                        │   └───────────────────┘                  │
                        └──────────────────────────────────────────┘

KEY:  PK = Primary Key   FK = Foreign Key

This diagram shows how the three tables connect. Each track appears once in the tracks table. Audio features (optional) link via track_id. Snapshots create the many-to-many relationship: the same track can appear in multiple snapshots (different months), and each snapshot contains multiple tracks (your top 50).

Why Separate Audio Features

Audio features are optional because not all tracks need them immediately. When you create a monthly snapshot, you save 50 track IDs. But you only fetch audio features when generating mood playlists (maybe weeks later).

Separating them into their own table means you can add audio features on demand. The tracks table stays lean. Queries that don't need audio features (like listing recent snapshots) run faster because they don't scan unnecessary columns.

If a track doesn't have audio features yet, the audio_features table simply won't have a row for that track_id. Your code checks: if features exist, use them. If not, fetch them from Spotify and store them.
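That check-then-fetch logic is a classic cache-aside pattern. Here is a runnable sketch, using a trimmed three-column version of the audio_features table for brevity and a stub client standing in for spotipy (get_features and StubSpotify are my own names; the real client's audio_features call is the only spotipy method assumed):

```python
import sqlite3


def get_features(conn, sp, track_id):
    """Return a track's audio features, hitting Spotify only on a cache miss."""
    row = conn.execute(
        "SELECT energy, valence, tempo FROM audio_features WHERE track_id = ?",
        (track_id,),
    ).fetchone()
    if row is not None:  # cache hit: no API call needed
        return {"energy": row[0], "valence": row[1], "tempo": row[2]}
    features = sp.audio_features([track_id])[0]  # cache miss: one API call
    if features is None:
        return None
    conn.execute(
        "INSERT OR IGNORE INTO audio_features (track_id, energy, valence, tempo)"
        " VALUES (?, ?, ?, ?)",
        (track_id, features["energy"], features["valence"], features["tempo"]),
    )
    return {k: features[k] for k in ("energy", "valence", "tempo")}


# Demonstration with an in-memory table and a stub client
class StubSpotify:
    calls = 0

    def audio_features(self, ids):
        StubSpotify.calls += 1
        return [{"energy": 0.55, "valence": 0.31, "tempo": 126.0}]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio_features "
             "(track_id TEXT PRIMARY KEY, energy REAL, valence REAL, tempo REAL)")
sp = StubSpotify()
first = get_features(conn, sp, "6rqhFgbbKwnb9MLmUQDhG6")
second = get_features(conn, sp, "6rqhFgbbKwnb9MLmUQDhG6")  # served from SQLite
```

The stub also previews the mocking technique this chapter uses later: your tests can exercise the caching logic without ever touching Spotify's API.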

The Tracks Table

The tracks table stores metadata for every track you've encountered. Each track appears exactly once, identified by its Spotify ID.

SQL
CREATE TABLE tracks (
    track_id TEXT PRIMARY KEY,      -- Spotify's track ID (unique identifier)
    name TEXT NOT NULL,              -- Track title
    artist_name TEXT NOT NULL,       -- Primary artist (denormalized for simplicity)
    album_name TEXT NOT NULL,        -- Album title (denormalized)
    duration_ms INTEGER NOT NULL,    -- Track length in milliseconds
    popularity INTEGER,              -- Spotify popularity score (0-100)
    album_image_url TEXT,            -- Album cover URL (for display)
    spotify_url TEXT,                -- Direct link to track in Spotify
    added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- When we first encountered this track
);
1.

track_id as Primary Key

Uses Spotify's ID directly. This prevents duplicate tracks and makes lookups instant. When you fetch top tracks from Spotify and get the same song again, you don't create a new row. You just reference the existing track_id in your snapshots table.

2.

Denormalized Artist and Album

Stores artist_name and album_name as text rather than foreign keys to separate artist and album tables. This is deliberate denormalization. You sacrifice a bit of storage space (storing "Radiohead" multiple times) for simpler queries (no joins needed to display track information).

3.

Optional Display Fields

The album_image_url and spotify_url fields aren't strictly necessary but make the application more polished. You can show album art in the web interface and link directly to tracks in Spotify.

4.

added_at Timestamp

Records when you first encountered this track. This isn't the same as when you listened to it (that's in snapshots), but it's useful for analytics ("I discovered 47 new artists this year").

Python (Inserting Tracks)
import sqlite3

def save_track(conn, track):
    """Save a track to the database (insert or ignore if exists)"""
    conn.execute("""
        INSERT OR IGNORE INTO tracks (
            track_id, name, artist_name, album_name, 
            duration_ms, popularity, album_image_url, spotify_url
        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        track['id'],
        track['name'],
        track['artists'][0]['name'],      # First artist only
        track['album']['name'],
        track['duration_ms'],
        track['popularity'],
        track['album']['images'][0]['url'] if track['album']['images'] else None,
        track['external_urls']['spotify']
    ))

# Usage: fetch tracks from Spotify and save them
top_tracks = sp.current_user_top_tracks(limit=50)['items']

with sqlite3.connect('music_time_machine.db') as conn:
    for track in top_tracks:
        save_track(conn, track)
    conn.commit()

print(f"Saved {len(top_tracks)} tracks to database")
INSERT OR IGNORE Pattern

The INSERT OR IGNORE statement attempts to insert a track. If the track_id already exists (because you encountered this track in a previous snapshot), SQLite ignores the insert without raising an error. This is safer than checking "does this track exist?" before every insert.

This pattern has a subtle tradeoff: it won't update existing tracks. If Spotify changes a track's popularity score between snapshots, your database keeps the old value. For most fields (name, artist, album), this is fine because they rarely change. For popularity, resist the temptation to switch to INSERT OR REPLACE: it deletes the existing row before inserting, which resets added_at and, with ON DELETE CASCADE enabled, silently wipes that track's audio features and snapshot history. Use a targeted UPDATE, or SQLite's UPSERT clause, to refresh popularity instead.
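If you want to refresh popularity without touching the rest of the row, SQLite's UPSERT clause (ON CONFLICT ... DO UPDATE, available in SQLite 3.24+ as bundled with modern Python) does exactly that. A sketch against a trimmed three-column tracks table (upsert_track_popularity is my own name):

```python
import sqlite3


def upsert_track_popularity(conn, track_id, name, popularity):
    """Insert a new track, or refresh only its popularity if it already exists."""
    conn.execute(
        """
        INSERT INTO tracks (track_id, name, popularity)
        VALUES (?, ?, ?)
        ON CONFLICT(track_id) DO UPDATE SET popularity = excluded.popularity
        """,
        (track_id, name, popularity),
    )


# Demonstration against a trimmed in-memory tracks table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks "
             "(track_id TEXT PRIMARY KEY, name TEXT, popularity INTEGER)")
upsert_track_popularity(conn, "6rqhFgbbKwnb9MLmUQDhG6", "Karma Police", 77)
upsert_track_popularity(conn, "6rqhFgbbKwnb9MLmUQDhG6", "Karma Police", 81)
row_count = conn.execute("SELECT COUNT(*) FROM tracks").fetchone()[0]
popularity = conn.execute("SELECT popularity FROM tracks").fetchone()[0]
```

Unlike INSERT OR REPLACE, the conflicting row is updated in place, so foreign keys and the original added_at timestamp survive.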

The Audio Features Table

The audio_features table stores Spotify's audio analysis data. This table has a one-to-one relationship with tracks: each track can have at most one row of audio features.

SQL
CREATE TABLE audio_features (
    track_id TEXT PRIMARY KEY,       -- References tracks(track_id)
    energy REAL NOT NULL,             -- 0.0 to 1.0
    valence REAL NOT NULL,            -- 0.0 to 1.0 (happiness)
    danceability REAL NOT NULL,       -- 0.0 to 1.0
    tempo REAL NOT NULL,              -- BPM (typically 50-200)
    acousticness REAL NOT NULL,       -- 0.0 to 1.0
    instrumentalness REAL NOT NULL,   -- 0.0 to 1.0
    speechiness REAL NOT NULL,        -- 0.0 to 1.0
    loudness REAL,                    -- dB (typically -60 to 0)
    key INTEGER,                      -- Musical key (0-11, optional)
    mode INTEGER,                     -- Major (1) or Minor (0)
    time_signature INTEGER,           -- Beats per bar (3, 4, 5, 7)
    fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (track_id) REFERENCES tracks(track_id) ON DELETE CASCADE
);

The FOREIGN KEY constraint ensures referential integrity. You can't insert audio features for a track_id that doesn't exist in the tracks table. If you delete a track from the tracks table, its audio features are automatically deleted (ON DELETE CASCADE).

Python (Fetching and Storing Audio Features)
def save_audio_features(conn, track_id, features):
    """Save audio features for a track"""
    conn.execute("""
        INSERT OR REPLACE INTO audio_features (
            track_id, energy, valence, danceability, tempo,
            acousticness, instrumentalness, speechiness,
            loudness, key, mode, time_signature
        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        track_id,
        features['energy'],
        features['valence'],
        features['danceability'],
        features['tempo'],
        features['acousticness'],
        features['instrumentalness'],
        features['speechiness'],
        features['loudness'],
        features['key'],
        features['mode'],
        features['time_signature']
    ))

# Fetch audio features for all tracks that don't have them yet
with sqlite3.connect('music_time_machine.db') as conn:
    # Find tracks without audio features
    cursor = conn.execute("""
        SELECT track_id FROM tracks 
        WHERE track_id NOT IN (SELECT track_id FROM audio_features)
    """)
    track_ids = [row[0] for row in cursor.fetchall()]
    
    if track_ids:
        print(f"Fetching audio features for {len(track_ids)} tracks...")
        
        # Spotify allows 100 track IDs per request
        for i in range(0, len(track_ids), 100):
            batch = track_ids[i:i+100]
            features_list = sp.audio_features(batch)
            
            for features in features_list:
                if features:  # Some tracks might not have features
                    save_audio_features(conn, features['id'], features)
        
        conn.commit()
        print("Audio features saved")
Lazy Loading Pattern

This code demonstrates lazy loading: fetch audio features only when needed, not when you first save tracks. When you create a monthly snapshot, you save 50 tracks. That's 50 database inserts.

Later, when generating a mood playlist, you check which tracks lack audio features and batch-fetch them from Spotify. This might be 500 tracks if you've accumulated several months of snapshots. But you only fetch each track's features once, then reuse them forever.

This pattern minimizes API calls (Spotify rate limits you) and speeds up snapshot creation (no need to wait for 50 audio feature requests every time you take a snapshot).

The Snapshots Table

The snapshots table records when tracks appeared in your top tracks. This is the historical record that enables forgotten gems discovery, trend analysis, and musical evolution tracking.

SQL
CREATE TABLE snapshots (
    track_id TEXT NOT NULL,          -- References tracks(track_id)
    snapshot_date DATE NOT NULL,     -- When this snapshot was taken
    time_range TEXT NOT NULL,        -- 'short_term', 'medium_term', or 'long_term'
    rank INTEGER NOT NULL,           -- Position in top tracks (1-50)
    PRIMARY KEY (track_id, snapshot_date, time_range),
    FOREIGN KEY (track_id) REFERENCES tracks(track_id) ON DELETE CASCADE
);

-- Index for time-based queries (critical for performance)
CREATE INDEX idx_snapshots_date ON snapshots(snapshot_date);
CREATE INDEX idx_snapshots_track_date ON snapshots(track_id, snapshot_date);
1.

Composite Primary Key

The combination of (track_id, snapshot_date, time_range) uniquely identifies a row. The same track can appear in multiple snapshots, and you can take multiple snapshots per day (one per time range). But each specific combination appears only once.

2.

Rank Field

Stores the track's position in your top tracks (1 for your most-played track, 50 for the least-played in that snapshot). This enables analytics like "which tracks consistently rank high?" and "how quickly do songs rise and fall?"

3.

Indexes for Performance

The two indexes dramatically speed up time-based queries. Finding tracks from the last 30 days becomes instant instead of scanning every row. The second index optimizes queries that filter by both track_id and date (like "when did I first hear this track?").
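You can verify that SQLite actually uses an index with EXPLAIN QUERY PLAN. A sketch against an in-memory copy of the snapshots table (the plan's detail text should name idx_snapshots_date, though the exact wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (
        track_id TEXT NOT NULL,
        snapshot_date DATE NOT NULL,
        time_range TEXT NOT NULL,
        rank INTEGER NOT NULL,
        PRIMARY KEY (track_id, snapshot_date, time_range)
    );
    CREATE INDEX idx_snapshots_date ON snapshots(snapshot_date);
""")

# Ask SQLite how it would execute a date-filtered query
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM snapshots WHERE snapshot_date >= date('now', '-30 days')"
).fetchall()
plan_text = " ".join(str(row) for row in plan)
print(plan_text)
```

A plan that says SEARCH with the index confirms the date filter avoids a full table scan; a plan that says SCAN means the index isn't helping and the query or schema needs a second look.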

Python (Creating a Snapshot)
from datetime import date

def create_snapshot(conn, sp, time_range='short_term'):
    """
    Fetch current top tracks and save as a snapshot
    Returns number of tracks saved
    """
    # Fetch top tracks from Spotify
    top_tracks = sp.current_user_top_tracks(limit=50, time_range=time_range)['items']
    today = date.today().isoformat()
    
    # Save tracks to tracks table first
    for track in top_tracks:
        save_track(conn, track)  # INSERT OR IGNORE
    
    # Save snapshot records
    for rank, track in enumerate(top_tracks, start=1):
        conn.execute("""
            INSERT OR IGNORE INTO snapshots (track_id, snapshot_date, time_range, rank)
            VALUES (?, ?, ?, ?)
        """, (track['id'], today, time_range, rank))
    
    conn.commit()
    return len(top_tracks)

# Usage: create monthly snapshot
with sqlite3.connect('music_time_machine.db') as conn:
    count = create_snapshot(conn, sp, time_range='short_term')
    print(f"Created snapshot with {count} tracks")
What Just Happened

The create_snapshot() function demonstrates the schema design in action. First, it saves all tracks to the tracks table (INSERT OR IGNORE means existing tracks are skipped). Then it records which tracks appeared in this snapshot.

Notice the separation of concerns: track metadata lives in tracks, while the historical record of "this track was in my top 50 on this date" lives in snapshots. This separation prevents data duplication and enables efficient queries.

If you run this function monthly for a year, you accumulate 12 snapshots totaling 600 snapshot records (50 tracks × 12 months). But because many tracks repeat across months, the tracks table holds far fewer than 600 rows: each unique track is stored only once.

Querying the Schema

A good schema makes common queries simple. Here are the queries that power Music Time Machine features:

Find Forgotten Gems
SQL
-- Tracks that appeared 90-365 days ago but not in last 30 days
SELECT t.track_id, t.name, t.artist_name,
       MAX(s.snapshot_date) AS last_seen
FROM tracks t
JOIN snapshots s ON t.track_id = s.track_id
WHERE s.snapshot_date BETWEEN date('now', '-365 days') AND date('now', '-90 days')
  AND t.track_id NOT IN (
      SELECT track_id FROM snapshots 
      WHERE snapshot_date >= date('now', '-30 days')
  )
GROUP BY t.track_id
ORDER BY last_seen DESC
LIMIT 25;

This query finds tracks you loved 3-12 months ago but haven't heard recently. The index on snapshot_date makes both the date filter and the subquery fast.

Generate Workout Playlist
SQL
-- High energy, high tempo tracks from your library
SELECT t.track_id, t.name, t.artist_name, 
       af.energy, af.tempo, af.valence
FROM tracks t
JOIN audio_features af ON t.track_id = af.track_id
WHERE af.energy > 0.75
  AND af.tempo > 140
  AND af.valence > 0.5
  AND af.danceability > 0.6
ORDER BY af.energy DESC, af.tempo DESC
LIMIT 25;

This query scores tracks against workout criteria and returns the 25 best matches. Because audio features are in a separate table, this query only scans tracks that have features (not every track in your database).

Track Your Musical Evolution
SQL
-- Average audio features by month
SELECT 
    strftime('%Y-%m', s.snapshot_date) AS month,
    AVG(af.energy) AS avg_energy,
    AVG(af.valence) AS avg_valence,
    AVG(af.tempo) AS avg_tempo,
    COUNT(DISTINCT t.artist_name) AS unique_artists
FROM snapshots s
JOIN tracks t ON s.track_id = t.track_id
JOIN audio_features af ON t.track_id = af.track_id
WHERE s.time_range = 'short_term'
GROUP BY month
ORDER BY month;

This aggregation query shows how your musical taste evolves over time. The index on snapshot_date makes the date filtering efficient, even with thousands of snapshot records.

Schema Enables Simple Queries

Notice how straightforward these queries are. Finding forgotten gems is a simple NOT IN subquery. Mood playlists are basic WHERE clauses with JOINs. Evolution tracking is standard GROUP BY aggregation.

The naive single-table design would require complex DISTINCT clauses, multiple nested subqueries, and GROUP BY logic to deduplicate tracks. The three-table design eliminates that complexity. Each query maps directly to a feature concept.

The Complete Schema

Here's the complete schema creation script. Save this as schema.sql and run it to initialize your database:

SQL
-- Music Time Machine Database Schema
-- SQLite 3 compatible

-- Enable foreign key constraints (disabled by default in SQLite)
PRAGMA foreign_keys = ON;

-- Track metadata
CREATE TABLE IF NOT EXISTS tracks (
    track_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    artist_name TEXT NOT NULL,
    album_name TEXT NOT NULL,
    duration_ms INTEGER NOT NULL,
    popularity INTEGER,
    album_image_url TEXT,
    spotify_url TEXT,
    added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Audio features for algorithmic curation
CREATE TABLE IF NOT EXISTS audio_features (
    track_id TEXT PRIMARY KEY,
    energy REAL NOT NULL,
    valence REAL NOT NULL,
    danceability REAL NOT NULL,
    tempo REAL NOT NULL,
    acousticness REAL NOT NULL,
    instrumentalness REAL NOT NULL,
    speechiness REAL NOT NULL,
    loudness REAL,
    key INTEGER,
    mode INTEGER,
    time_signature INTEGER,
    fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (track_id) REFERENCES tracks(track_id) ON DELETE CASCADE
);

-- Historical snapshots of top tracks
CREATE TABLE IF NOT EXISTS snapshots (
    track_id TEXT NOT NULL,
    snapshot_date DATE NOT NULL,
    time_range TEXT NOT NULL CHECK(time_range IN ('short_term', 'medium_term', 'long_term')),
    rank INTEGER NOT NULL CHECK(rank >= 1 AND rank <= 50),
    PRIMARY KEY (track_id, snapshot_date, time_range),
    FOREIGN KEY (track_id) REFERENCES tracks(track_id) ON DELETE CASCADE
);

-- Indexes for performance
CREATE INDEX IF NOT EXISTS idx_snapshots_date ON snapshots(snapshot_date);
CREATE INDEX IF NOT EXISTS idx_snapshots_track_date ON snapshots(track_id, snapshot_date);
CREATE INDEX IF NOT EXISTS idx_tracks_artist ON tracks(artist_name);

-- Schema version tracking (for future migrations)
CREATE TABLE IF NOT EXISTS schema_version (
    version INTEGER PRIMARY KEY,
    applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Record the initial version (OR IGNORE makes re-running this script safe)
INSERT OR IGNORE INTO schema_version (version) VALUES (1);
Python (Initialize Database)
import sqlite3

def initialize_database(db_path='music_time_machine.db'):
    """Create database and tables from schema"""
    conn = sqlite3.connect(db_path)
    
    # Read and execute schema
    with open('schema.sql', 'r') as f:
        schema_sql = f.read()
        conn.executescript(schema_sql)
    
    conn.commit()
    conn.close()
    print(f"Database initialized at {db_path}")

# Initialize on first run
if __name__ == '__main__':
    initialize_database()

Run this once to create your database. The schema includes CHECK constraints to validate data (time_range must be one of three values, rank must be 1-50), foreign key constraints to maintain relationships, and a schema_version table for future migrations. One caveat: PRAGMA foreign_keys applies per connection, so any later script that relies on cascade deletes must re-enable it after calling sqlite3.connect().

Schema Evolution and Migrations

Databases evolve. Three months from now, you might realize you need a new field or table. Instead of deleting your database and losing accumulated data, you write a migration that transforms the existing schema.

The schema_version table tracks which migrations have been applied. Each migration increments the version number and records when it ran.

SQL (Example Migration: Add Genre Field)
-- Migration 002: Add genre tracking
-- Check current version first
SELECT MAX(version) FROM schema_version;

-- If version is 1, run this migration
ALTER TABLE tracks ADD COLUMN primary_genre TEXT;

-- Record that migration 2 was applied
INSERT INTO schema_version (version) VALUES (2);

You won't need migrations immediately, but knowing the pattern exists prevents panic when you realize your schema needs to change. Chapter 15 covered migrations in detail. The key point: migrations let schemas evolve without data loss.
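The same pattern can be automated in Python. The sketch below is illustrative, not part of the chapter's application: the MIGRATIONS dictionary and function names are assumptions, but the mechanism (read the current version from schema_version, apply anything newer, record each step) is exactly the pattern described above.

```python
import sqlite3

# Hypothetical migration registry: each entry maps a target version
# number to the SQL that upgrades the schema to that version.
MIGRATIONS = {
    2: "ALTER TABLE tracks ADD COLUMN primary_genre TEXT;",
}

def current_version(conn):
    """Return the highest applied schema version (0 if none recorded)."""
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def apply_pending_migrations(conn):
    """Apply, in order, every migration newer than the current version."""
    for version in sorted(MIGRATIONS):
        if version > current_version(conn):
            conn.executescript(MIGRATIONS[version])
            # Record that this migration was applied
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
    conn.commit()
```

Because the runner checks the recorded version before each step, it is safe to run repeatedly: already-applied migrations are skipped.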

Why These Design Decisions Matter

Every schema decision involved tradeoffs. Here's the reasoning behind each choice:

1.

Three Tables Instead of One

Separating tracks, audio_features, and snapshots prevents data duplication and enables efficient queries. The cost is JOINs, but SQLite handles these joins efficiently with proper indexes. The benefit is storage efficiency and query simplicity.

2.

Denormalized Artist and Album Names

Storing artist_name and album_name as text (rather than foreign keys to separate tables) trades storage space for query simplicity. For a personal music tracker with thousands of tracks, the redundancy is negligible and queries become much simpler (no need to JOIN artists and albums just to display track information).

3.

Separate Audio Features Table

Audio features are optional data fetched on demand. Separating them means the tracks table stays lean (smaller, faster queries) and you can fetch audio features lazily. The alternative (storing 12 audio feature columns in tracks with mostly NULL values) wastes space and slows down queries that don't need features.

4.

Composite Primary Key for Snapshots

The combination (track_id, snapshot_date, time_range) uniquely identifies a snapshot record. This design naturally prevents duplicate snapshots and makes time-based queries efficient. An auto-increment ID would work but wouldn't add value.

5.

Indexes on Date Fields

The Music Time Machine runs time-based queries constantly (tracks from last 30 days, tracks from 90-365 days ago, monthly aggregations). Without indexes, these queries scan every row. With indexes, they jump directly to relevant date ranges. This is the difference between queries that take 5ms and queries that take 500ms.

Professional database design is about understanding your access patterns and optimizing for them. The Music Time Machine queries snapshots by date constantly, so date indexes are critical. It rarely queries by artist name, so that index is optional (but included for completeness).
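Both the duplicate-prevention and the index claims are easy to verify directly. The standalone snippet below (illustrative, using a trimmed-down copy of the snapshots table) shows the composite primary key making a repeated insert a no-op, and EXPLAIN QUERY PLAN confirming that a date-range query uses the index instead of scanning every row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (
        track_id TEXT NOT NULL,
        snapshot_date DATE NOT NULL,
        time_range TEXT NOT NULL,
        rank INTEGER NOT NULL,
        PRIMARY KEY (track_id, snapshot_date, time_range)
    );
    CREATE INDEX idx_snapshots_date ON snapshots(snapshot_date);
""")

# The composite key makes re-inserting the same snapshot a no-op
conn.execute("INSERT OR IGNORE INTO snapshots VALUES ('abc', '2024-12-01', 'short_term', 1)")
conn.execute("INSERT OR IGNORE INTO snapshots VALUES ('abc', '2024-12-01', 'short_term', 1)")
count = conn.execute("SELECT COUNT(*) FROM snapshots").fetchone()[0]
print(count)  # 1, not 2

# The date index turns a range scan into an index search
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM snapshots
    WHERE snapshot_date >= date('now', '-90 days')
""").fetchall()
print(plan[0][3])  # e.g. "SEARCH snapshots USING INDEX idx_snapshots_date ..."
```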

Takeaways & Next Step

Database Schema Design:

  • Three-table separation: Tracks (entities), audio_features (optional attributes), snapshots (time-series events)
  • Denormalization tradeoff: Store artist/album names as text for query simplicity despite slight storage redundancy
  • Composite keys: (track_id, snapshot_date, time_range) uniquely identifies snapshot records
  • Index strategy: snapshot_date and time_range indexed for fast historical queries

The schema is designed for SQLite's strengths: simple queries, fast reads, single-user access. When you migrate to PostgreSQL in Chapter 24 for multi-user production deployment, you'll add connection pooling, concurrent write handling, and potentially further normalization. The core schema patterns remain the same.

Section 5 implements the four core features using this schema. You'll see how good schema design makes complex features simple, and how SQL queries replace algorithmic complexity.

5. Building Core Features

You have OAuth authentication working. You have a database schema that stores tracks, audio features, and historical snapshots. Now you build the features that make the Music Time Machine useful: discovering forgotten gems, capturing monthly snapshots, generating mood-based playlists, and tracking musical evolution.

Each feature follows the same pattern: fetch data from Spotify's API, query your local database for historical context, apply logic that combines both sources, and create playlists or generate insights. The database provides memory (what you listened to months ago), Spotify provides current state (what you're listening to now), and your code bridges the two.

This section builds one feature at a time, showing complete working code for each. You'll see how the schema design from Section 4 makes these features straightforward to implement. By the end, you'll have a command-line application with four distinct capabilities, each demonstrating different aspects of API and database integration.

Feature 1: Forgotten Gems

Forgotten Gems answers a simple question: which songs did you love months ago but haven't heard recently? This feature demonstrates set operations, time-based database queries, and the power of accumulated historical data.

The algorithm compares two sets of tracks. Old favorites are tracks that appeared in snapshots 90-365 days ago (roughly 3 months to 1 year back). Recent listens are tracks from the last 30 days. Forgotten gems are the difference: old favorites minus recent listens.

Python
import sqlite3
from datetime import date

def find_forgotten_gems(conn, limit=25):
    """
    Find tracks you loved months ago but haven't heard recently
    
    Args:
        conn: SQLite database connection
        limit: Maximum number of tracks to return
    
    Returns:
        List of (track_id, name, artist_name, album_name) tuples
    """
    cursor = conn.execute("""
        SELECT DISTINCT t.track_id, t.name, t.artist_name, t.album_name
        FROM tracks t
        JOIN snapshots s ON t.track_id = s.track_id
        WHERE s.snapshot_date BETWEEN date('now', '-365 days') AND date('now', '-90 days')
          AND t.track_id NOT IN (
              SELECT track_id FROM snapshots 
              WHERE snapshot_date >= date('now', '-30 days')
          )
        GROUP BY t.track_id
        ORDER BY MAX(s.snapshot_date) DESC
        LIMIT ?
    """, (limit,))
    
    return cursor.fetchall()

def create_forgotten_gems_playlist(sp, conn):
    """
    Create a playlist of forgotten gems in user's Spotify account
    
    Args:
        sp: Authenticated Spotipy client
        conn: SQLite database connection
    
    Returns:
        Playlist URL if successful, None otherwise
    """
    # Find forgotten tracks
    forgotten = find_forgotten_gems(conn, limit=25)
    
    if not forgotten:
        print("No forgotten gems found. This feature needs at least 3 months of history.")
        print("Keep taking monthly snapshots and check back later!")
        return None
    
    # Get user info
    user = sp.current_user()
    
    # Create playlist
    playlist_name = f"Forgotten Gems - {date.today().strftime('%B %Y')}"
    playlist = sp.user_playlist_create(
        user=user['id'],
        name=playlist_name,
        public=False,
        description="Tracks I loved months ago but haven't heard recently - rediscovered by Music Time Machine"
    )
    
    # Build track URIs
    track_uris = [f"spotify:track:{track[0]}" for track in forgotten]
    
    # Add tracks to playlist
    sp.playlist_add_items(playlist['id'], track_uris)
    
    # Display results
    print(f"\n✓ Created playlist: {playlist_name}")
    print(f"✓ Added {len(forgotten)} forgotten gems")
    print(f"✓ Playlist URL: {playlist['external_urls']['spotify']}\n")
    
    print("Your forgotten gems:")
    for i, (track_id, name, artist, album) in enumerate(forgotten, 1):
        print(f"  {i}. {name} - {artist}")
    
    return playlist['external_urls']['spotify']

# Usage
with sqlite3.connect('music_time_machine.db') as conn:
    url = create_forgotten_gems_playlist(sp, conn)
    if url:
        print(f"\nOpen in Spotify: {url}")
What Just Happened: The Query Logic

The SQL query does the heavy lifting. It finds tracks that appeared in snapshots between 90 and 365 days ago, then excludes any tracks that also appeared in the last 30 days. The NOT IN subquery performs the set subtraction.

DISTINCT: Guards against duplicate tracks when the same track appeared in multiple old snapshots. It is technically redundant here because the GROUP BY already collapses those duplicates, but it makes the intent explicit and is harmless.

GROUP BY with MAX(s.snapshot_date): Orders results by when you last heard each track. This surfaces tracks you stopped listening to more recently (maybe 3 months ago) before tracks from 10 months ago.

The date functions (date('now', '-90 days')) make SQLite calculate relative dates. This query works today, tomorrow, and a year from now without modification.
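If you want to see the relative-date arithmetic in isolation, this throwaway snippet compares SQLite's calculation against Python's own date math. One detail worth knowing: SQLite's 'now' is in UTC, so the Python side must use a UTC date too:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")

# SQLite evaluates 'now' in UTC at query time
sqlite_result = conn.execute("SELECT date('now', '-90 days')").fetchone()[0]

# Python's equivalent calculation, also anchored to UTC
python_result = (datetime.now(timezone.utc).date() - timedelta(days=90)).isoformat()

print(sqlite_result == python_result)  # True
```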

Example Output
✓ Created playlist: Forgotten Gems - December 2024
✓ Added 25 forgotten gems
✓ Playlist URL: https://open.spotify.com/playlist/3cEYpjA8bZ0Iex...

Your forgotten gems:
  1. Fake Plastic Trees - Radiohead
  2. Dissolve - Absofacto
  3. The Less I Know The Better - Tame Impala
  4. Midnight City - M83
  5. Electric Feel - MGMT
  6. Such Great Heights - The Postal Service
  7. Little Talks - Of Monsters and Men
  ...
  25. Float On - Modest Mouse

Open in Spotify: https://open.spotify.com/playlist/3cEYpjA8bZ0Iex...

This feature gets more valuable over time. With 3 months of history, you find a few forgotten tracks. With a year of history, you rediscover dozens. With 2+ years, you surface songs you completely forgot you loved.

Why This Works Better Than Spotify's Recommendations

Spotify's algorithm recommends new music based on what you listen to now. It pushes you toward discovery. Forgotten Gems does the opposite: it pulls from your own historical preferences, not algorithmic predictions.

These aren't songs Spotify thinks you'll like. They're songs you provably loved (they appeared in your top tracks for weeks or months), but fell out of rotation. The database remembers what Spotify's algorithm ignores: your past preferences.

Feature 2: Currently Obsessed (Monthly Snapshots)

Currently Obsessed captures monthly snapshots of your musical identity. Every time you run this feature, it fetches your current top 50 tracks and saves them with today's date. Over time, these snapshots become a musical diary: "What was I listening to in March 2024?" becomes a question your database can answer.

This feature is simpler than Forgotten Gems because it doesn't require complex queries. It just fetches, stores, and creates a dated playlist. But it's foundational because every other feature depends on having historical snapshots accumulated over time.

Python
from datetime import date

def save_track(conn, track):
    """Save a track to the database (insert or ignore if exists)"""
    conn.execute("""
        INSERT OR REPLACE INTO tracks (
            track_id, name, artist_name, album_name, 
            duration_ms, popularity, album_image_url, spotify_url
        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        track['id'],
        track['name'],
        track['artists'][0]['name'],
        track['album']['name'],
        track['duration_ms'],
        track.get('popularity', 0),
        track['album']['images'][0]['url'] if track['album']['images'] else None,
        track['external_urls']['spotify']
    ))

def create_monthly_snapshot(sp, conn, time_range='short_term'):
    """
    Create a monthly snapshot of current top tracks
    
    Args:
        sp: Authenticated Spotipy client
        conn: SQLite database connection
        time_range: 'short_term' (last 4 weeks), 'medium_term' (6 months), or 'long_term' (several years)
    
    Returns:
        Tuple of (playlist_url, track_count)
    """
    # Check if we already took a snapshot today
    today = date.today().isoformat()
    cursor = conn.execute("""
        SELECT COUNT(*) FROM snapshots 
        WHERE snapshot_date = ? AND time_range = ?
    """, (today, time_range))
    
    if cursor.fetchone()[0] > 0:
        print(f"Already took a {time_range} snapshot today.")
        print("Monthly snapshots are designed to run once per month.")
        return None, 0
    
    # Fetch top tracks from Spotify
    print(f"Fetching your top {time_range} tracks...")
    top_tracks = sp.current_user_top_tracks(limit=50, time_range=time_range)['items']
    
    # Save tracks to database
    for track in top_tracks:
        save_track(conn, track)
    
    # Save snapshot records
    for rank, track in enumerate(top_tracks, start=1):
        conn.execute("""
            INSERT OR IGNORE INTO snapshots (track_id, snapshot_date, time_range, rank)
            VALUES (?, ?, ?, ?)
        """, (track['id'], today, time_range, rank))
    
    conn.commit()
    
    # Create dated playlist in Spotify
    user = sp.current_user()
    month_year = date.today().strftime('%B %Y')
    playlist_name = f"Currently Obsessed - {month_year}"
    
    playlist = sp.user_playlist_create(
        user=user['id'],
        name=playlist_name,
        public=False,
        description=f"My top tracks from {month_year} - captured by Music Time Machine"
    )
    
    # Add tracks to playlist
    track_uris = [f"spotify:track:{track['id']}" for track in top_tracks]
    sp.playlist_add_items(playlist['id'], track_uris)
    
    # Display results
    print(f"\n✓ Snapshot saved: {len(top_tracks)} tracks from {today}")
    print(f"✓ Created playlist: {playlist_name}")
    print(f"✓ Playlist URL: {playlist['external_urls']['spotify']}\n")
    
    print("Your current top tracks:")
    for i, track in enumerate(top_tracks[:10], 1):
        print(f"  {i}. {track['name']} - {track['artists'][0]['name']}")
    if len(top_tracks) > 10:
        print(f"  ... and {len(top_tracks) - 10} more")
    
    return playlist['external_urls']['spotify'], len(top_tracks)

# Usage: Run this once per month
with sqlite3.connect('music_time_machine.db') as conn:
    url, count = create_monthly_snapshot(sp, conn, time_range='short_term')
    if url:
        print(f"\nCome back next month for another snapshot!")
        print(f"After 3+ months, you'll have enough history for Forgotten Gems.")
What Just Happened: Snapshot Protection

The function checks if you already took a snapshot today before fetching from Spotify. This prevents accidental duplicate snapshots if you run the script twice in one day. Monthly snapshots work best when they're actually spaced out: monthly, or at most weekly.

INSERT OR REPLACE: Used for tracks instead of INSERT OR IGNORE. This updates popularity scores and other fields that might change over time. The previous snapshot data remains intact because snapshots reference tracks by ID.
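Here's a standalone sketch of that interaction, using trimmed-down tables rather than the full schema: replacing a track updates its popularity while the snapshot row referencing it survives. One caveat: this holds because Python's sqlite3 leaves foreign keys off by default on each connection; with PRAGMA foreign_keys = ON, the implicit delete inside OR REPLACE would cascade and remove the snapshot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # foreign_keys is OFF by default per connection
conn.executescript("""
    CREATE TABLE tracks (track_id TEXT PRIMARY KEY, name TEXT, popularity INTEGER);
    CREATE TABLE snapshots (
        track_id TEXT NOT NULL,
        snapshot_date DATE NOT NULL,
        time_range TEXT NOT NULL,
        rank INTEGER NOT NULL,
        PRIMARY KEY (track_id, snapshot_date, time_range),
        FOREIGN KEY (track_id) REFERENCES tracks(track_id) ON DELETE CASCADE
    );
""")

conn.execute("INSERT INTO tracks VALUES ('abc', 'Creep', 70)")
conn.execute("INSERT INTO snapshots VALUES ('abc', '2024-11-01', 'short_term', 1)")

# A month later the same track is saved again with a new popularity score.
# OR REPLACE deletes the old row and inserts the new one in its place.
conn.execute("INSERT OR REPLACE INTO tracks VALUES ('abc', 'Creep', 85)")

popularity = conn.execute("SELECT popularity FROM tracks WHERE track_id='abc'").fetchone()[0]
snapshot_count = conn.execute("SELECT COUNT(*) FROM snapshots").fetchone()[0]
print(popularity, snapshot_count)  # 85 1

# Caveat: with PRAGMA foreign_keys = ON, the implicit delete performed by
# OR REPLACE would CASCADE and remove the snapshot row as well.
```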

Rank field: Stores each track's position in your top 50. Track #1 is your most-played song this month. This enables analytics like "which tracks consistently rank in my top 10 across multiple months?"
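As one example of the analytics the rank field enables, a query along these lines (illustrative, not part of the chapter's code) finds tracks that ranked in your top 10 across multiple snapshot dates:

```python
import sqlite3

def consistent_favorites(conn, top_n=10, min_dates=2):
    """Tracks that ranked in the top N on at least `min_dates` snapshot dates."""
    return conn.execute("""
        SELECT track_id, COUNT(DISTINCT snapshot_date) AS appearances
        FROM snapshots
        WHERE rank <= ? AND time_range = 'short_term'
        GROUP BY track_id
        HAVING COUNT(DISTINCT snapshot_date) >= ?
        ORDER BY appearances DESC
    """, (top_n, min_dates)).fetchall()
```

The rank filter plus the per-track date count turns raw snapshot rows into a "consistency" metric, which is exactly the kind of question the composite key and rank column were designed to answer.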

Example Output
Fetching your top short_term tracks...

✓ Snapshot saved: 50 tracks from 2024-12-03
✓ Created playlist: Currently Obsessed - December 2024
✓ Playlist URL: https://open.spotify.com/playlist/7dGJo6EpROf...

Your current top tracks:
  1. Creep - Radiohead
  2. Karma Police - Radiohead
  3. No Surprises - Radiohead
  4. High and Dry - Radiohead
  5. Street Spirit (Fade Out) - Radiohead
  6. The Less I Know The Better - Tame Impala
  7. Elephant - Tame Impala
  8. Borderline - Tame Impala
  9. Let It Happen - Tame Impala
  10. New Person, Same Old Mistakes - Tame Impala
  ... and 40 more

Come back next month for another snapshot!
After 3+ months, you'll have enough history for Forgotten Gems.

Set a monthly reminder to run this script. The accumulation happens slowly, but after 6 months you have a rich historical dataset. After a year, you can see how your taste evolved across seasons. After 2 years, you have genuine long-term trend data that reveals patterns you didn't know existed.

Automation Tip

You can automate monthly snapshots with a cron job (Linux/Mac) or Task Scheduler (Windows). Schedule the script to run on the first day of each month. Add error handling (covered in Section 6) so failures get logged instead of breaking silently.

For now, manual execution works fine. Run it monthly, maybe when you pay rent or when you get your monthly Spotify Wrapped summary. The consistency matters more than the exact automation.

📦 Going Further: From Snapshots to Streaming History

The Currently Obsessed feature uses snapshots — calling sp.current_user_top_tracks() to capture what Spotify considers your favorites over a calculated time period (approximately 4 weeks for short_term, 6 months for medium_term, and several years for long_term). This approach is robust and beginner-friendly.

Why snapshots work: Run the script once a month, and it captures that month's favorites perfectly. Miss a month? No problem — the data is still there next time you run it. Spotify calculates your top tracks continuously, so the endpoint always returns current results regardless of when you last called it. This is ideal for manual execution and learning.

For production deployments (Chapter 20), you might want streaming history instead — using sp.current_user_recently_played() with cursor-based pagination. This captures every individual listening event in chronological order, giving you complete play-by-play history rather than calculated summaries.

Why streaming history is different: The recently_played endpoint only returns the last 50 tracks. If you listen to 60 songs and then run the script, the first 10 are lost forever. This approach requires automated, frequent syncing (hourly or more often) to avoid gaps. Miss a sync window, lose data permanently.

Snapshots (Current Approach):

  • Forgiving: Run monthly, works perfectly
  • Manual-friendly: No automation required
  • Summary data: Top tracks over a period
  • No data loss: Spotify maintains the calculation

Streaming History (Advanced):

  • Comprehensive: Captures every play
  • Automation-required: Must run hourly or more
  • Granular data: Individual listening events with timestamps
  • Data loss risk: Miss a sync = permanent gaps

When to graduate to streaming history: After deploying your application in Chapter 20 with automated background jobs (cron tasks or scheduled workers), you can switch to recently_played with cursor pagination. The automation ensures you never miss a sync window, and the granular data enables advanced analytics like "listening patterns by hour of day" or "skip rate analysis."

For now, snapshots provide everything you need: monthly playlists, forgotten gems detection, and historical comparisons. They work reliably without automation, which makes them perfect for learning. Production features can come later, after you understand the deployment infrastructure that makes them viable.

Think of it this way: snapshots are photographs you take monthly. Streaming history is a video camera running 24/7. The photograph approach works great until you need frame-by-frame analysis. Start with snapshots, graduate to streaming when you have the infrastructure to support it.
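For the curious, a cursor-based sync might be sketched like this. Everything here is an assumption about the Chapter 20 setup: the play_events and sync_state tables don't exist in this chapter's schema, and the details would be adapted to your own storage. The core idea is that Spotify's cursors let each run pick up exactly where the previous one stopped.

```python
def sync_recent_plays(sp, conn):
    """
    Illustrative sketch of cursor-based syncing with recently_played.
    Assumes hypothetical play_events and sync_state tables.
    """
    # Load the cursor from the last successful sync (None on the first run)
    row = conn.execute(
        "SELECT value FROM sync_state WHERE key = 'after_cursor'"
    ).fetchone()
    after = int(row[0]) if row else None

    # Only plays newer than the cursor are returned
    results = sp.current_user_recently_played(limit=50, after=after)

    # Store each individual listening event with its timestamp
    for item in results['items']:
        conn.execute(
            "INSERT OR IGNORE INTO play_events (track_id, played_at) VALUES (?, ?)",
            (item['track']['id'], item['played_at'])
        )

    # Persist the new cursor so the next run starts where this one stopped
    cursors = results.get('cursors') or {}
    if cursors.get('after'):
        conn.execute(
            "INSERT OR REPLACE INTO sync_state (key, value) VALUES ('after_cursor', ?)",
            (cursors['after'],)
        )
    conn.commit()
    return len(results['items'])
```

Run on a schedule (hourly or better), this captures every play; run sporadically, it develops the permanent gaps described above, which is why it belongs after the automation infrastructure exists.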

Feature 3: Mood-Based Playlists

Mood-based playlists demonstrate algorithmic curation: scoring tracks against criteria and selecting the best matches. You define mood profiles (combinations of energy, valence, tempo, and other audio features), then query your database for tracks that match those profiles.

This feature requires audio features to be stored in your database. If you haven't fetched them yet, the code includes a helper function that identifies tracks without features and batch-fetches them from Spotify.

Step 1: Define Mood Profiles
Python
# Mood profile definitions
MOOD_PROFILES = {
    'workout': {
        'description': 'High energy tracks for running or gym',
        'criteria': {
            'energy_min': 0.75,
            'tempo_min': 140,
            'valence_min': 0.50,
            'danceability_min': 0.60
        }
    },
    'focus': {
        'description': 'Low energy instrumental tracks for concentration',
        'criteria': {
            'energy_max': 0.40,
            'tempo_min': 90,
            'tempo_max': 110,
            'instrumentalness_min': 0.50,
            'speechiness_max': 0.10
        }
    },
    'chill': {
        'description': 'Relaxed, mellow tracks',
        'criteria': {
            'energy_max': 0.50,
            'valence_min': 0.40,
            'valence_max': 0.70,
            'tempo_min': 80,
            'tempo_max': 110,
            'acousticness_min': 0.30
        }
    },
    'party': {
        'description': 'High energy, upbeat dance tracks',
        'criteria': {
            'energy_min': 0.70,
            'valence_min': 0.70,
            'danceability_min': 0.70,
            'tempo_min': 118,
            'tempo_max': 135
        }
    },
    'melancholic': {
        'description': 'Sad, introspective tracks',
        'criteria': {
            'energy_max': 0.50,
            'valence_max': 0.30,
            'acousticness_min': 0.40
        }
    }
}

def display_available_moods():
    """Show available mood profiles"""
    print("Available mood profiles:")
    for mood_name, profile in MOOD_PROFILES.items():
        print(f"  • {mood_name}: {profile['description']}")
    print()
Step 2: Query Tracks Matching Mood
Python
def build_mood_query(criteria):
    """
    Build SQL WHERE clause from mood criteria
    
    Args:
        criteria: Dictionary of min/max constraints for audio features
    
    Returns:
        Tuple of (where_clause, params)
    """
    conditions = []
    params = []
    
    # Feature names come from MOOD_PROFILES (trusted code, never user
    # input), so interpolating them into the SQL string is safe here.
    # The threshold values still go through ? placeholders.
    for key, value in criteria.items():
        if key.endswith('_min'):
            feature = key[:-4]  # Remove '_min' suffix
            conditions.append(f"af.{feature} >= ?")
            params.append(value)
        elif key.endswith('_max'):
            feature = key[:-4]  # Remove '_max' suffix
            conditions.append(f"af.{feature} <= ?")
            params.append(value)
    
    where_clause = " AND ".join(conditions)
    return where_clause, params

def find_tracks_by_mood(conn, mood_name, limit=25):
    """
    Find tracks matching a mood profile
    
    Args:
        conn: SQLite database connection
        mood_name: Name of mood profile from MOOD_PROFILES
        limit: Maximum number of tracks to return
    
    Returns:
        List of (track_id, name, artist_name, album_name) tuples
    """
    if mood_name not in MOOD_PROFILES:
        print(f"Unknown mood: {mood_name}")
        display_available_moods()
        return []
    
    profile = MOOD_PROFILES[mood_name]
    where_clause, params = build_mood_query(profile['criteria'])
    
    query = f"""
        SELECT t.track_id, t.name, t.artist_name, t.album_name,
               af.energy, af.valence, af.tempo
        FROM tracks t
        JOIN audio_features af ON t.track_id = af.track_id
        WHERE {where_clause}
        ORDER BY af.energy DESC, af.tempo DESC
        LIMIT ?
    """
    
    params.append(limit)
    cursor = conn.execute(query, params)
    return cursor.fetchall()
Step 3: Fetch Audio Features with Rate Limiting
Python
import time

def fetch_missing_audio_features(sp, conn):
    """
    Fetch audio features for tracks that don't have them yet.
    Uses batching and rate limiting to stay under Spotify's limits.
    
    Args:
        sp: Authenticated Spotipy client
        conn: SQLite database connection
    
    Returns:
        Number of tracks for which features were fetched
    """
    # Find tracks without audio features
    cursor = conn.execute("""
        SELECT track_id FROM tracks 
        WHERE track_id NOT IN (SELECT track_id FROM audio_features)
    """)
    track_ids = [row[0] for row in cursor.fetchall()]
    
    if not track_ids:
        return 0
    
    print(f"Fetching audio features for {len(track_ids)} tracks...")
    
    # Spotify allows 100 track IDs per request
    fetched_count = 0
    batch_size = 100
    
    for i in range(0, len(track_ids), batch_size):
        batch = track_ids[i:i+batch_size]
        
        try:
            features_list = sp.audio_features(batch)
            
            for features in features_list:
                if features:  # Some tracks might not have features
                    conn.execute("""
                        INSERT OR REPLACE INTO audio_features (
                            track_id, energy, valence, danceability, tempo,
                            acousticness, instrumentalness, speechiness,
                            loudness, key, mode, time_signature
                        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                    """, (
                        features['id'],
                        features['energy'],
                        features['valence'],
                        features['danceability'],
                        features['tempo'],
                        features['acousticness'],
                        features['instrumentalness'],
                        features['speechiness'],
                        features['loudness'],
                        features['key'],
                        features['mode'],
                        features['time_signature']
                    ))
                    fetched_count += 1
            
            # Proactive rate limiting: sleep between batches
            # This keeps us comfortably under Spotify's rate limits
            if i + batch_size < len(track_ids):
                time.sleep(0.5)
                
        except Exception as e:
            print(f"Error fetching batch {i//batch_size + 1}: {e}")
            # Continue with next batch even if this one fails
            continue
    
    conn.commit()
    print(f"✓ Saved audio features for {fetched_count} tracks\n")
    return fetched_count
Why the 0.5 Second Delay?

Spotify's rate limits aren't publicly documented, but testing shows they allow approximately 180 requests per minute to most endpoints. That's 3 requests per second sustained.

With 500 missing tracks split into 5 batches (100 tracks each), you're making 5 requests. Without delays, all 5 requests fire in under 1 second. This works but uses up your rate limit budget quickly if you're running multiple features simultaneously.

Adding time.sleep(0.5) spreads those 5 requests across 2.5 seconds. This keeps you comfortably under the limit while only adding 2 seconds of total runtime. The trade-off is worth it: slightly slower execution prevents rate limit errors that would require 60+ second retry delays.

Professional API integration always includes proactive throttling. React to rate limits when you hit them (with retry logic), but avoid hitting them in the first place (with deliberate pauses).
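The reactive half (retrying after a rate-limit error) can be sketched with a generic backoff wrapper. This helper is illustrative: in real code you would catch the API's specific rate-limit exception (an HTTP 429, typically) rather than every Exception.

```python
import time

def call_with_backoff(func, *args, max_retries=3, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """
    Retry `func` with exponential backoff: wait 1s, 2s, 4s, ... between
    attempts, then re-raise if all retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries; let the caller handle it
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting. Combined with the proactive time.sleep(0.5) above, this covers both halves of the strategy: avoid the limit when you can, recover gracefully when you can't.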

Why Batch Size Matters

Each API call has overhead (TCP connection, TLS handshake, HTTP headers, network latency). Even if each call takes only 100ms, 500 individual calls take 50 seconds, while five batched calls take half a second. For 500 tracks, total processing time drops from roughly 50 seconds to under 3 seconds, including the throttling delays.

Spotify rate limits by requests per second, not by data transferred. Five batched requests consume far less of your rate limit budget than 500 individual requests. This matters when processing large libraries or running multiple features simultaneously.

If one batch fails (network timeout, server error), you lose at most 100 tracks worth of features. With individual requests, failures would happen more frequently (more requests = more opportunities for failure), and you'd lose features one at a time, making progress tracking difficult.

The code uses range(0, len(track_ids), 100) to process in chunks of 100, matching Spotify's documented batch limit. Professional API integration always uses batch endpoints when available and respects documented limits.
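The chunking arithmetic is worth sanity-checking on its own. This throwaway snippet shows how range(0, len(track_ids), 100) splits a list of 327 track IDs into Spotify-sized batches:

```python
# Split 327 fake track IDs into batches of at most 100, the documented
# limit for Spotify's audio-features endpoint
track_ids = [f"id{i}" for i in range(327)]
batch_size = 100

batches = [track_ids[i:i + batch_size]
           for i in range(0, len(track_ids), batch_size)]

print(len(batches))               # 4 batches
print([len(b) for b in batches])  # [100, 100, 100, 27]
```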

Step 4: Create Mood Playlist
Python
def create_mood_playlist(sp, conn, mood_name, limit=25):
    """
    Create a playlist based on mood profile
    
    Args:
        sp: Authenticated Spotipy client
        conn: SQLite database connection
        mood_name: Name of mood profile from MOOD_PROFILES
        limit: Number of tracks to include
    
    Returns:
        Playlist URL if successful, None otherwise
    """
    # Ensure we have audio features
    fetch_missing_audio_features(sp, conn)
    
    # Find matching tracks
    tracks = find_tracks_by_mood(conn, mood_name, limit)
    
    if not tracks:
        print(f"No tracks found matching '{mood_name}' mood profile.")
        print("This usually means:")
        print("  1. You don't have enough tracks in your database yet")
        print("  2. Your music taste doesn't match this mood profile")
        print("Try taking more monthly snapshots or try a different mood.")
        return None
    
    # Get user info
    user = sp.current_user()
    
    # Create playlist
    profile = MOOD_PROFILES[mood_name]
    playlist_name = f"{mood_name.title()} - {date.today().strftime('%B %Y')}"
    playlist = sp.user_playlist_create(
        user=user['id'],
        name=playlist_name,
        public=False,
        description=f"{profile['description']} - curated by Music Time Machine"
    )
    
    # Add tracks to playlist
    track_uris = [f"spotify:track:{track[0]}" for track in tracks]
    sp.playlist_add_items(playlist['id'], track_uris)
    
    # Display results
    print(f"\n✓ Created playlist: {playlist_name}")
    print(f"✓ Added {len(tracks)} tracks matching '{mood_name}' profile")
    print(f"✓ Playlist URL: {playlist['external_urls']['spotify']}\n")
    
    print(f"Tracks in your {mood_name} playlist:")
    for i, track in enumerate(tracks[:10], 1):
        track_id, name, artist, album, energy, valence, tempo = track
        print(f"  {i}. {name} - {artist}")
        print(f"     Energy: {energy:.2f}, Valence: {valence:.2f}, Tempo: {tempo:.0f} BPM")
    
    if len(tracks) > 10:
        print(f"  ... and {len(tracks) - 10} more")
    
    return playlist['external_urls']['spotify']

# Usage: Generate a workout playlist
with sqlite3.connect('music_time_machine.db') as conn:
    display_available_moods()
    url = create_mood_playlist(sp, conn, 'workout', limit=25)
    if url:
        print(f"\nOpen in Spotify: {url}")
What Just Happened: Dynamic Query Building

The build_mood_query() function converts mood criteria into SQL WHERE clauses. For the workout profile (energy_min: 0.75, tempo_min: 140), it generates WHERE af.energy >= 0.75 AND af.tempo >= 140.

This design lets you add new mood profiles without writing new queries. Just define the criteria in the MOOD_PROFILES dictionary and the code handles the rest. Want a "road trip" mood? Add it with your preferred energy, tempo, and valence thresholds.

The function fetches missing audio features before querying. This means the first time you generate a mood playlist might take 30 seconds (fetching features for 500 tracks), but subsequent playlists generate instantly because the features are cached in your database.

Example Output
Available mood profiles:
  • workout: High energy tracks for running or gym
  • focus: Low energy instrumental tracks for concentration
  • chill: Relaxed, mellow tracks
  • party: High energy, upbeat dance tracks
  • melancholic: Sad, introspective tracks

Fetching audio features for 327 tracks...
✓ Saved audio features for 327 tracks

✓ Created playlist: Workout - December 2024
✓ Added 25 tracks matching 'workout' profile
✓ Playlist URL: https://open.spotify.com/playlist/4kL9mN3pR...

Tracks in your workout playlist:
  1. Losing My Religion - R.E.M.
     Energy: 0.84, Valence: 0.52, Tempo: 163 BPM
  2. The Less I Know The Better - Tame Impala
     Energy: 0.83, Valence: 0.61, Tempo: 147 BPM
  3. Feels Like We Only Go Backwards - Tame Impala
     Energy: 0.79, Valence: 0.54, Tempo: 156 BPM
  ... and 22 more

Open in Spotify: https://open.spotify.com/playlist/4kL9mN3pR...

Mood playlists demonstrate why separating audio features into their own table was the right design choice. You fetch features once (slow), then query them repeatedly (fast). Creating 5 different mood playlists takes seconds after the initial feature fetch.

Feature 4: Musical Evolution Analytics

Musical Evolution Analytics transforms your accumulated snapshots into insights about how your taste changes over time. This feature demonstrates database aggregation, time-series analysis, and the value of historical data.

The analytics include track turnover rates (how quickly your favorites change), average audio feature trends by month, artist diversity metrics, and new discovery tracking. These aren't features you use daily. They're insights you check every few months to see patterns emerge.

Python
def analyze_musical_evolution(conn):
    """
    Generate analytics about musical taste evolution over time
    
    Args:
        conn: SQLite database connection
    
    Returns:
        Dictionary containing various analytics metrics
    """
    analytics = {}
    
    # Total unique tracks and artists
    cursor = conn.execute("SELECT COUNT(DISTINCT track_id) FROM tracks")
    analytics['total_tracks'] = cursor.fetchone()[0]
    
    cursor = conn.execute("SELECT COUNT(DISTINCT artist_name) FROM tracks")
    analytics['total_artists'] = cursor.fetchone()[0]
    
    # Number of snapshots taken
    cursor = conn.execute("""
        SELECT COUNT(DISTINCT snapshot_date) FROM snapshots 
        WHERE time_range = 'short_term'
    """)
    analytics['snapshot_count'] = cursor.fetchone()[0]
    
    # Date range of data
    cursor = conn.execute("""
        SELECT MIN(snapshot_date), MAX(snapshot_date) 
        FROM snapshots WHERE time_range = 'short_term'
    """)
    min_date, max_date = cursor.fetchone()
    analytics['date_range'] = (min_date, max_date)
    
    # Track turnover rate: how many tracks appear in one snapshot but not the next
    cursor = conn.execute("""
        WITH consecutive_snapshots AS (
            SELECT snapshot_date,
                   LAG(snapshot_date) OVER (ORDER BY snapshot_date) as prev_date
            FROM (SELECT DISTINCT snapshot_date FROM snapshots 
                  WHERE time_range = 'short_term')
        ),
        turnover_by_snapshot AS (
            SELECT cs.snapshot_date,
                   COUNT(DISTINCT s1.track_id) as current_tracks,
                   COUNT(DISTINCT s2.track_id) as previous_tracks,
                   COUNT(DISTINCT CASE WHEN s2.track_id IS NULL THEN s1.track_id END) as new_tracks
            FROM consecutive_snapshots cs
            JOIN snapshots s1 ON s1.snapshot_date = cs.snapshot_date 
                              AND s1.time_range = 'short_term'
            LEFT JOIN snapshots s2 ON s2.snapshot_date = cs.prev_date 
                                   AND s2.track_id = s1.track_id
                                   AND s2.time_range = 'short_term'
            WHERE cs.prev_date IS NOT NULL
            GROUP BY cs.snapshot_date
        )
        SELECT AVG(CAST(new_tracks AS FLOAT) / current_tracks * 100)
        FROM turnover_by_snapshot
    """)
    result = cursor.fetchone()[0]
    analytics['avg_turnover_pct'] = result if result else 0
    
    # Average audio features by month
    cursor = conn.execute("""
        SELECT 
            strftime('%Y-%m', s.snapshot_date) AS month,
            AVG(af.energy) AS avg_energy,
            AVG(af.valence) AS avg_valence,
            AVG(af.tempo) AS avg_tempo,
            COUNT(DISTINCT t.artist_name) AS unique_artists
        FROM snapshots s
        JOIN tracks t ON s.track_id = t.track_id
        LEFT JOIN audio_features af ON t.track_id = af.track_id
        WHERE s.time_range = 'short_term'
        GROUP BY month
        ORDER BY month
    """)
    analytics['monthly_trends'] = cursor.fetchall()
    
    # Top artists across all time
    cursor = conn.execute("""
        SELECT t.artist_name, COUNT(DISTINCT s.snapshot_date) as appearances
        FROM tracks t
        JOIN snapshots s ON t.track_id = s.track_id
        WHERE s.time_range = 'short_term'
        GROUP BY t.artist_name
        ORDER BY appearances DESC
        LIMIT 10
    """)
    analytics['top_artists'] = cursor.fetchall()
    
    return analytics

def display_analytics(analytics):
    """Pretty-print analytics results"""
    print("\n" + "="*60)
    print("MUSICAL EVOLUTION ANALYTICS")
    print("="*60 + "\n")
    
    # Overview
    print("📊 Overview")
    print(f"  Total unique tracks discovered: {analytics['total_tracks']}")
    print(f"  Total unique artists: {analytics['total_artists']}")
    print(f"  Snapshots taken: {analytics['snapshot_count']}")
    
    if analytics['date_range'][0]:
        print(f"  Data spans: {analytics['date_range'][0]} to {analytics['date_range'][1]}")
    print()
    
    # Turnover rate
    if analytics['avg_turnover_pct'] > 0:
        print("🔄 Track Turnover Rate")
        print(f"  Average: {analytics['avg_turnover_pct']:.1f}% of tracks change between snapshots")
        if analytics['avg_turnover_pct'] < 20:
            print("  → You stick with favorites for a long time")
        elif analytics['avg_turnover_pct'] > 40:
            print("  → You discover and rotate through music quickly")
        else:
            print("  → You balance discovering new music with replaying favorites")
        print()
    
    # Monthly trends
    if analytics['monthly_trends']:
        print("📈 Monthly Trends")
        print("  Month      Energy  Valence  Tempo   Artists")
        print("  " + "-"*52)
        for row in analytics['monthly_trends']:
            month, energy, valence, tempo, artists = row
            energy_str = f"{energy:.2f}" if energy else "N/A"
            valence_str = f"{valence:.2f}" if valence else "N/A"
            tempo_str = f"{tempo:.0f}" if tempo else "N/A"
            print(f"  {month}    {energy_str:>6}  {valence_str:>6}  {tempo_str:>6}  {artists:>4}")
        print()
    
    # Top artists
    if analytics['top_artists']:
        print("🎤 Most Consistent Artists")
        print("  (Artists who appeared in the most monthly snapshots)")
        for i, (artist, appearances) in enumerate(analytics['top_artists'], 1):
            print(f"  {i:2}. {artist:<40} {appearances} months")
        print()
    
    print("="*60 + "\n")

# Usage
with sqlite3.connect('music_time_machine.db') as conn:
    analytics = analyze_musical_evolution(conn)
    display_analytics(analytics)
Example Output
============================================================
MUSICAL EVOLUTION ANALYTICS
============================================================

📊 Overview
  Total unique tracks discovered: 487
  Total unique artists: 142
  Snapshots taken: 8
  Data spans: 2024-05-01 to 2024-12-03

🔄 Track Turnover Rate
  Average: 32.4% of tracks change between snapshots
  → You balance discovering new music with replaying favorites

📈 Monthly Trends
  Month      Energy  Valence  Tempo   Artists
  ----------------------------------------------------
  2024-05    0.61    0.48     124      38
  2024-06    0.65    0.52     128      41
  2024-07    0.68    0.58     132      39
  2024-08    0.64    0.55     129      42
  2024-09    0.59    0.46     121      44
  2024-10    0.62    0.49     125      40
  2024-11    0.66    0.53     130      43
  2024-12    0.63    0.50     126      45

🎤 Most Consistent Artists
  (Artists who appeared in the most monthly snapshots)
   1. Radiohead                                      8 months
   2. Tame Impala                                    7 months
   3. Arctic Monkeys                                 6 months
   4. The Strokes                                    6 months
   5. MGMT                                           5 months
   6. Foster The People                              5 months
   7. Phoenix                                        5 months
   8. Two Door Cinema Club                           4 months
   9. Vampire Weekend                                4 months
  10. alt-J                                          4 months

============================================================
What Just Happened: Complex Aggregation

The track turnover calculation uses a window function (LAG) to compare consecutive snapshots. For each snapshot, it counts how many tracks are new compared to the previous month, then averages those percentages across all snapshots.
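If LAG() is unfamiliar, this self-contained toy example shows how it pairs each distinct snapshot date with the previous one (requires the window-function support added in SQLite 3.25, which ships with modern Python builds):

Python
```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE snapshots (snapshot_date TEXT, track_id TEXT)")
conn.executemany("INSERT INTO snapshots VALUES (?, ?)", [
    ('2024-05-01', 'a'), ('2024-05-01', 'b'),
    ('2024-06-01', 'b'), ('2024-06-01', 'c'),  # 'c' is new in June
])

# LAG(snapshot_date) looks one row back in date order
rows = conn.execute("""
    SELECT snapshot_date,
           LAG(snapshot_date) OVER (ORDER BY snapshot_date) AS prev_date
    FROM (SELECT DISTINCT snapshot_date FROM snapshots)
    ORDER BY snapshot_date
""").fetchall()
print(rows)  # [('2024-05-01', None), ('2024-06-01', '2024-05-01')]
```

The first snapshot has no predecessor, so its prev_date is NULL. That is why the turnover query filters with WHERE cs.prev_date IS NOT NULL: you can only measure turnover once there are two snapshots to compare.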

A 30% turnover rate means about 15 of your top 50 tracks change each month. Low turnover (under 20%) suggests you replay favorites extensively. High turnover (over 40%) suggests you constantly discover and rotate through new music.

The monthly trends query shows how your average energy, valence, and tempo change over time. Some people's music gets more energetic in summer months. Others prefer mellower music in winter. The data reveals patterns you might not consciously notice.

Analytics features become more valuable as your dataset grows. With 3 months of data, the insights are interesting. With 12 months, they reveal seasonal patterns. With 24+ months, they show genuine long-term evolution in your musical taste.

Bringing It All Together

You now have four working features. Each demonstrates different technical concepts, but they all follow the same architecture: combine Spotify's API (current state) with your database (historical memory), apply logic that bridges the two, and produce useful outputs.

Here's a simple command-line interface that lets users choose which feature to run:

Python (music_time_machine.py)
"""
Music Time Machine - Command Line Interface
Your personal music history tracker and playlist generator
"""
import os
import sqlite3
from dotenv import load_dotenv
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Load environment variables
load_dotenv()

# Initialize Spotify client
scope = "user-top-read playlist-modify-public playlist-modify-private"
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))

# Database connection
DB_PATH = 'music_time_machine.db'

def display_menu():
    """Display main menu"""
    print("\n" + "="*60)
    print("MUSIC TIME MACHINE")
    print("="*60)
    print("\n1. Create Monthly Snapshot (Currently Obsessed)")
    print("2. Find Forgotten Gems")
    print("3. Generate Mood Playlist")
    print("4. View Analytics")
    print("5. Exit")
    print()

def main():
    """Main application loop"""
    # Ensure database exists
    if not os.path.exists(DB_PATH):
        print("Initializing database...")
        # Run schema.sql here (or call initialize_database())
    
    while True:
        display_menu()
        choice = input("Select an option (1-5): ").strip()
        
        with sqlite3.connect(DB_PATH) as conn:
            if choice == '1':
                create_monthly_snapshot(sp, conn)
            
            elif choice == '2':
                create_forgotten_gems_playlist(sp, conn)
            
            elif choice == '3':
                display_available_moods()
                mood = input("Enter mood name: ").strip().lower()
                create_mood_playlist(sp, conn, mood, limit=25)
            
            elif choice == '4':
                analytics = analyze_musical_evolution(conn)
                display_analytics(analytics)
            
            elif choice == '5':
                print("\nThanks for using Music Time Machine!")
                print("Keep taking monthly snapshots to build your musical history.\n")
                break
            
            else:
                print("Invalid choice. Please select 1-5.")
        
        if choice != '5':
            input("\nPress Enter to continue...")

if __name__ == '__main__':
    main()

This creates a simple menu-driven interface. Users select a feature by number, the application executes it, then returns to the menu. This pattern works well for command-line tools that perform discrete actions rather than running continuously.
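As the menu grows, some developers replace the if/elif chain with a dispatch dictionary that maps choices to functions. A sketch with placeholder handlers (in the real application, the feature functions would go in the dictionary instead):

Python
```python
# Placeholder handlers; in the real app these would call the feature functions
def take_snapshot():
    return "snapshot taken"

def find_gems():
    return "gems found"

# Map menu choices to handlers instead of branching on each choice
MENU_ACTIONS = {
    '1': take_snapshot,
    '2': find_gems,
}

def dispatch(choice):
    """Run the handler for a menu choice; return None for invalid input."""
    action = MENU_ACTIONS.get(choice)
    return action() if action else None

print(dispatch('1'))  # snapshot taken
print(dispatch('9'))  # None
```

Adding a menu option then means adding one dictionary entry rather than another elif branch.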

What's Next

These four features form the core of the Music Time Machine. In Chapter 17, you'll add a Flask web interface with interactive charts that visualize your monthly trends. In Chapter 18, you'll deploy the entire application to production with a live URL.

But the command-line version you've built here is already useful. You can run it monthly, accumulate data, and generate playlists. The database grows richer over time, and the features become more valuable. This is the foundation everything else builds on.

8. Production Error Handling

The features you built in Section 5 work perfectly when everything goes right. But production code runs in the real world, where networks time out, APIs return unexpected errors, OAuth tokens expire, and databases get locked. The difference between hobby code and professional software is how gracefully it handles these failures.

This section applies the error handling patterns from Chapter 9 to the Music Time Machine. You'll categorize errors, implement retry logic for transient failures, provide user-friendly messages, and ensure the application degrades gracefully when non-critical components fail.

The goal isn't to prevent failures (you can't). The goal is to make failures predictable and recoverable. When Spotify's API times out, your application should retry automatically. When OAuth authorization fails, it should explain what went wrong. When the database is temporarily locked, it should wait and try again.

Error Categories for the Music Time Machine

The Music Time Machine encounters four distinct categories of errors. Each category requires a different handling strategy.

Transient Failures
  Examples: Network timeout, Spotify 503 (service unavailable), database locked
  Strategy: Retry with exponential backoff

Authorization Failures
  Examples: Expired OAuth token, invalid credentials, insufficient scopes
  Strategy: Re-authenticate or prompt the user to update permissions

Rate Limiting
  Examples: Spotify 429 (too many requests)
  Strategy: Honor the Retry-After header, then exponential backoff

Data Errors
  Examples: Missing audio features, empty snapshots, corrupt database
  Strategy: Log the error, continue with partial data, or skip gracefully

Chapter 9 covered these patterns in detail. This section shows how to apply them to the specific failures the Music Time Machine encounters during OAuth flows, API calls, database operations, and playlist generation.

Implementing Retry Logic

Transient failures resolve themselves if you wait and retry. Network hiccups last seconds. Spotify's servers recover from overload within minutes. Database locks release when competing transactions complete. The solution is exponential backoff with jitter.

Python
import time
import random
from spotipy.exceptions import SpotifyException
import requests

class MusicTimeMachineError(Exception):
    """Base exception for Music Time Machine errors"""
    pass

class TransientError(MusicTimeMachineError):
    """Temporary error that might resolve with retry"""
    pass

class AuthorizationError(MusicTimeMachineError):
    """OAuth or permission error requiring user action"""
    pass

class RateLimitError(MusicTimeMachineError):
    """Hit Spotify's rate limit"""
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

def retry_with_backoff(func, max_attempts=3, base_delay=1.0):
    """
    Retry a function with exponential backoff and jitter
    
    Args:
        func: Function to retry (should take no arguments, use lambda if needed)
        max_attempts: Maximum number of retry attempts
        base_delay: Initial delay in seconds (doubles each retry)
    
    Returns:
        Function result if successful
    
    Raises:
        Last exception if all retries fail
    """
    last_exception = None
    
    for attempt in range(max_attempts):
        try:
            return func()
        
        except (requests.Timeout, requests.ConnectionError) as e:
            last_exception = TransientError(f"Network error: {e}")
            error_type = "network timeout"
        
        except SpotifyException as e:
            if e.http_status == 429:
                # Rate limit - check for Retry-After header
                retry_after = e.headers.get('Retry-After')
                if retry_after:
                    wait_time = int(retry_after)
                    print(f"Rate limited. Waiting {wait_time} seconds as requested...")
                    time.sleep(wait_time)
                    continue
                else:
                    last_exception = RateLimitError("Hit rate limit without Retry-After header")
                    error_type = "rate limit"
            
            elif e.http_status in (500, 502, 503, 504):
                # Server errors are transient
                last_exception = TransientError(f"Spotify server error: {e.http_status}")
                error_type = "server error"
            
            elif e.http_status in (401, 403):
                # Authorization errors don't benefit from retry
                raise AuthorizationError(f"Authorization failed: {e.msg}")
            
            else:
                # Other errors are not transient
                raise
        
        except Exception as e:
            # Unknown error - don't retry
            raise
        
        # Calculate backoff with jitter
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, delay * 0.1)  # 0-10% jitter
            wait_time = delay + jitter
            
            print(f"Attempt {attempt + 1} failed ({error_type}). Retrying in {wait_time:.1f}s...")
            time.sleep(wait_time)
    
    # All retries exhausted (last_exception can be None if every attempt
    # hit a 429 with a Retry-After header, so supply a fallback)
    raise last_exception or TransientError("All retry attempts failed")
What Just Happened: Categorizing Errors

The retry logic categorizes errors as they occur. Network timeouts and 5xx server errors are transient (retry makes sense). 429 rate limits get special handling (honor the Retry-After header). 401/403 authorization errors skip retry entirely because they require user action.

The exponential backoff formula doubles the wait time with each retry: 1s, 2s, 4s. Jitter adds 0-10% randomness to prevent thundering herd problems (thousands of clients retrying at exactly the same time and overwhelming the server again).
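You can check the schedule with a few lines. The deterministic part of each wait comes from base_delay * 2**attempt, and jitter adds at most 10% on top:

Python
```python
import random

base_delay = 1.0
delays = [base_delay * (2 ** attempt) for attempt in range(3)]
print(delays)  # [1.0, 2.0, 4.0]

# Each actual wait lands between the base delay and base delay + 10%
waits = [d + random.uniform(0, d * 0.1) for d in delays]
assert all(d <= w <= d * 1.1 for d, w in zip(delays, waits))
```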

Custom exception types (TransientError, AuthorizationError, RateLimitError) let calling code distinguish between failure modes and respond appropriately. A TransientError after 3 retries tells the user "Spotify is having problems, try again later." An AuthorizationError tells them "re-authorize the application."

Wrapping Spotify API Calls

Now that you have retry logic, wrap every Spotify API call with it. This transforms fragile direct API calls into robust operations that handle transient failures automatically.

Python (Before: Fragile)
# Fragile: crashes on network hiccup or server error
def create_monthly_snapshot_fragile(sp, conn):
    top_tracks = sp.current_user_top_tracks(limit=50, time_range='short_term')['items']
    
    for track in top_tracks:
        save_track(conn, track)
    
    # If this fails, you lose all the work above
    user = sp.current_user()
    playlist = sp.user_playlist_create(user['id'], name="Currently Obsessed", public=False)
    track_uris = [f"spotify:track:{track['id']}" for track in top_tracks]
    sp.playlist_add_items(playlist['id'], track_uris)
Python (After: Robust)
def create_monthly_snapshot_robust(sp, conn):
    """Create monthly snapshot with error handling"""
    try:
        # Fetch tracks with retry
        top_tracks = retry_with_backoff(
            lambda: sp.current_user_top_tracks(limit=50, time_range='short_term')['items']
        )
        
        # Save to database
        for track in top_tracks:
            save_track(conn, track)
        conn.commit()
        
        # Create playlist with retry
        month_year = datetime.now().strftime('%B %Y')  # requires: from datetime import datetime
        playlist_name = f"Currently Obsessed - {month_year}"
        user = retry_with_backoff(lambda: sp.current_user())
        
        playlist = retry_with_backoff(
            lambda: sp.user_playlist_create(
                user=user['id'],
                name=playlist_name,
                public=False,
                description=f"My top tracks from {month_year}"
            )
        )
        
        # Add tracks with retry
        track_uris = [f"spotify:track:{track['id']}" for track in top_tracks]
        retry_with_backoff(
            lambda: sp.playlist_add_items(playlist['id'], track_uris)
        )
        
        return playlist['external_urls']['spotify'], len(top_tracks)
    
    except AuthorizationError as e:
        print(f"\n❌ Authorization Error: {e}")
        print("Please re-run the application to re-authorize with Spotify.")
        print("Delete the .cache file if you need to reset OAuth completely.\n")
        return None, 0
    
    except TransientError as e:
        print(f"\n❌ Temporary Error: {e}")
        print("Spotify's servers might be experiencing issues.")
        print("Please try again in a few minutes.\n")
        return None, 0
    
    except RateLimitError as e:
        print(f"\n❌ Rate Limit Error: {e}")
        print("You've made too many requests to Spotify's API.")
        print("Wait a few minutes before trying again.\n")
        return None, 0
    
    except Exception as e:
        print(f"\n❌ Unexpected Error: {e}")
        print("Something went wrong. Please check your internet connection")
        print("and make sure Spotify's API is accessible.\n")
        return None, 0
What Just Happened: Three-Part Error Messages

Each error handler prints a three-part message: (1) what went wrong, (2) why it matters, (3) what to do about it. This is the pattern from Chapter 9. Users see helpful guidance instead of technical stack traces.

Authorization errors: Explain that OAuth tokens expired and tell users to re-run the application (Spotipy handles re-authorization automatically) or delete .cache for a full reset.

Transient errors: Acknowledge the problem is temporary and suggest waiting a few minutes. No user action needed beyond retry.

Rate limit errors: Explain they made too many requests and need to slow down. This is user behavior modification, not technical debugging.
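If you find yourself repeating the pattern, a small helper keeps the three parts consistent (hypothetical, not part of the chapter's code):

Python
```python
def print_error(what, why, action):
    """Print an error in three parts: what went wrong, why, what to do."""
    message = f"\n❌ {what}\n{why}\n{action}\n"
    print(message)
    return message

print_error(
    "Rate Limit Error",
    "You've made too many requests to Spotify's API.",
    "Wait a few minutes before trying again.",
)
```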

Handling Database Errors

SQLite databases can encounter errors: the database file gets locked when another process writes to it, disk space runs out, or the schema doesn't match expectations. Production code anticipates these failures.

Python
import sqlite3

def safe_database_operation(conn, operation_func, max_attempts=3):
    """
    Execute a database operation with retry on lock errors
    
    Args:
        conn: SQLite connection
        operation_func: Function that performs database operations
        max_attempts: Maximum retry attempts for lock errors
    
    Returns:
        Result of operation_func
    
    Raises:
        sqlite3.Error if operation fails after retries
    """
    for attempt in range(max_attempts):
        try:
            result = operation_func(conn)
            conn.commit()
            return result
        
        except sqlite3.OperationalError as e:
            if "database is locked" in str(e).lower():
                if attempt < max_attempts - 1:
                    wait_time = 0.5 * (2 ** attempt)  # 0.5s, 1s, 2s
                    print(f"Database locked. Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                    continue
                else:
                    raise MusicTimeMachineError(
                        "Database is locked by another process. "
                        "Close other applications using the database and try again."
                    )
            else:
                # Other operational errors (disk full, corrupted database)
                raise MusicTimeMachineError(f"Database error: {e}")
        
        except sqlite3.IntegrityError as e:
            # Constraint violations (foreign key, unique, etc.)
            raise MusicTimeMachineError(f"Data integrity error: {e}")
        
        except sqlite3.DatabaseError as e:
            # Malformed database, schema errors
            raise MusicTimeMachineError(
                f"Database structure error: {e}. "
                "You might need to rebuild the database from schema.sql"
            )

def save_track_safe(conn, track):
    """Save track with error handling"""
    def operation(conn):
        conn.execute("""
            INSERT OR REPLACE INTO tracks (
                track_id, name, artist_name, album_name, 
                duration_ms, popularity, album_image_url, spotify_url
            ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            track['id'],
            track['name'],
            track['artists'][0]['name'],
            track['album']['name'],
            track['duration_ms'],
            track.get('popularity', 0),
            track['album']['images'][0]['url'] if track['album']['images'] else None,
            track['external_urls']['spotify']
        ))
    
    safe_database_operation(conn, operation)
SQLite Lock Behavior

SQLite locks the entire database file during writes. If process A is writing and process B tries to write, process B gets "database is locked" and must wait. This is normal SQLite behavior, not an error.

The retry logic waits with exponential backoff (0.5s, 1s, 2s) for the lock to release. For a single-user application like the Music Time Machine, locks release quickly (milliseconds to seconds). If the lock persists after 3 retries, something else is wrong (another program opened the database file, or the application crashed mid-transaction).

SQLite's locking is why you separate read-heavy operations (analytics, queries) from write-heavy operations (taking snapshots). You can have multiple simultaneous readers, but writers block everyone.
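You can reproduce a lock deliberately with two connections to the same file. Note that sqlite3.connect() also accepts a timeout argument (5 seconds by default) that waits for locks before giving up; the demo sets it very low so the error appears immediately:

Python
```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.db')

# isolation_level=None means autocommit, so explicit BEGIN statements work
writer = sqlite3.connect(path, timeout=0.1, isolation_level=None)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("BEGIN IMMEDIATE")  # take the write lock and hold it

second = sqlite3.connect(path, timeout=0.1, isolation_level=None)
try:
    second.execute("BEGIN IMMEDIATE")  # second writer is blocked
    locked = False
except sqlite3.OperationalError as e:
    locked = "locked" in str(e).lower()

writer.execute("COMMIT")           # release the lock
second.execute("BEGIN IMMEDIATE")  # now it succeeds
second.execute("INSERT INTO t VALUES (1)")
second.execute("COMMIT")
print(locked)  # True
```

In practice, leaving the default 5-second timeout in place handles most single-user lock contention without any custom retry code at all.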

Graceful Degradation

Some features can partially succeed even when components fail. When fetching audio features for 100 tracks, maybe 5 fail because Spotify doesn't have feature data for those tracks. The application should save the 95 successful results rather than failing completely.

Python
def fetch_missing_audio_features_robust(sp, conn):
    """
    Fetch audio features with graceful degradation
    Continue even if some tracks fail
    
    Returns:
        Tuple of (successful_count, failed_count)
    """
    # Find tracks without features
    cursor = conn.execute("""
        SELECT track_id FROM tracks 
        WHERE track_id NOT IN (SELECT track_id FROM audio_features)
    """)
    track_ids = [row[0] for row in cursor.fetchall()]
    
    if not track_ids:
        return 0, 0
    
    print(f"Fetching audio features for {len(track_ids)} tracks...")
    
    successful = 0
    failed = 0
    failed_tracks = []
    
    # Process in batches of 100 (Spotify's limit)
    for i in range(0, len(track_ids), 100):
        batch = track_ids[i:i+100]
        
        try:
            # Fetch features with retry
            features_list = retry_with_backoff(
                lambda: sp.audio_features(batch),
                max_attempts=3
            )
            
            # Save successful features
            for features in features_list:
                if features is None:
                    # Spotify doesn't have features for this track
                    failed += 1
                    continue
                
                try:
                    conn.execute("""
                        INSERT OR REPLACE INTO audio_features (
                            track_id, energy, valence, danceability, tempo,
                            acousticness, instrumentalness, speechiness,
                            loudness, key, mode, time_signature
                        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                    """, (
                        features['id'],
                        features['energy'],
                        features['valence'],
                        features['danceability'],
                        features['tempo'],
                        features['acousticness'],
                        features['instrumentalness'],
                        features['speechiness'],
                        features['loudness'],
                        features['key'],
                        features['mode'],
                        features['time_signature']
                    ))
                    successful += 1
                
                except (sqlite3.Error, KeyError) as e:
                    # Database error or malformed feature data
                    failed += 1
                    failed_tracks.append(features.get('id', 'unknown'))
            
            conn.commit()
        
        except (TransientError, RateLimitError) as e:
            # Entire batch failed - count all tracks as failed
            print(f"Batch failed: {e}")
            failed += len(batch)
            failed_tracks.extend(batch)
        
        except Exception as e:
            # Unexpected error - log and continue with next batch
            print(f"Unexpected error in batch: {e}")
            failed += len(batch)
            failed_tracks.extend(batch)
    
    # Report results
    print(f"✓ Saved audio features for {successful} tracks")
    if failed > 0:
        print(f"⚠ Could not fetch features for {failed} tracks")
        print(f"  This is normal - some tracks don't have audio analysis available.")
    
    return successful, failed
What Just Happened: Partial Success Pattern

The function processes tracks in batches and continues even when individual batches fail. If batch 1 succeeds but batch 2 fails, the function saves batch 1's results and reports partial success.

Spotify's audio_features() method returns None for tracks that don't have audio analysis (very rare, but happens for some local files or very new releases). The code treats None as expected rather than an error. It counts these tracks as "failed" but doesn't stop processing.

The function returns both success and failure counts. Calling code can decide whether partial success is acceptable (it usually is for audio features) or whether it needs to retry failures later.

Graceful degradation means accepting that some operations will fail and continuing with partial results. For mood playlists, having audio features for 95% of tracks is fine. For monthly snapshots, saving 48 of 50 tracks is acceptable. Perfect results are ideal, but partial results are useful.

Logging for Debugging

Print statements work for development. Production applications need structured logging that records what happened, when it happened, and what context was relevant. Python's logging module provides this capability.

Python
import logging
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('music_time_machine.log'),
        logging.StreamHandler()  # Also print to console
    ]
)

logger = logging.getLogger('MusicTimeMachine')

def create_monthly_snapshot_with_logging(sp, conn):
    """Create monthly snapshot with comprehensive logging"""
    logger.info("Starting monthly snapshot creation")
    
    try:
        # Fetch tracks
        logger.info("Fetching top tracks from Spotify")
        top_tracks = retry_with_backoff(
            lambda: sp.current_user_top_tracks(limit=50, time_range='short_term')['items']
        )
        logger.info(f"Fetched {len(top_tracks)} tracks successfully")
        
        # Save to database
        logger.info("Saving tracks to database")
        for i, track in enumerate(top_tracks):
            save_track_safe(conn, track)
            if (i + 1) % 10 == 0:
                logger.debug(f"Saved {i + 1}/{len(top_tracks)} tracks")
        
        conn.commit()
        logger.info("Database commit successful")
        
        # Create playlist
        logger.info("Creating Spotify playlist")
        user = retry_with_backoff(lambda: sp.current_user())
        logger.debug(f"Authenticated as user: {user['id']}")
        
        month_year = datetime.now().strftime('%B %Y')
        playlist_name = f"Currently Obsessed - {month_year}"
        
        playlist = retry_with_backoff(
            lambda: sp.user_playlist_create(
                user=user['id'],
                name=playlist_name,
                public=False
            )
        )
        logger.info(f"Created playlist: {playlist['id']}")
        
        # Add tracks
        track_uris = [f"spotify:track:{track['id']}" for track in top_tracks]
        retry_with_backoff(
            lambda: sp.playlist_add_items(playlist['id'], track_uris)
        )
        logger.info(f"Added {len(track_uris)} tracks to playlist")
        
        logger.info("Monthly snapshot completed successfully")
        return playlist['external_urls']['spotify'], len(top_tracks)
    
    except AuthorizationError as e:
        logger.error(f"Authorization failed: {e}")
        return None, 0
    
    except TransientError as e:
        logger.warning(f"Transient error (after retries): {e}")
        return None, 0
    
    except Exception as e:
        logger.exception(f"Unexpected error during snapshot: {e}")
        return None, 0
Log Levels and When to Use Them

DEBUG: Detailed information for diagnosing problems (saved 10/50 tracks). Only visible when debugging is enabled.

INFO: General progress messages (started snapshot, fetched tracks, created playlist). Always visible in production.

WARNING: Something unexpected but recoverable happened (transient error after retries, partial data saved).

ERROR: Something failed and user action is required (authorization failed, database corrupted).

EXCEPTION: Not a separate level. logger.exception() records the message at ERROR severity and automatically appends the full stack trace. Call it inside exception handlers, where a traceback is available, for complete debugging context.
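
The levels above, and the difference between error() and exception(), fit in a few lines. This standalone demo routes a throwaway logger to a string so the output is easy to inspect (the logger name here is just for the demo):

```python
import io
import logging

# Route log output to a string so we can inspect it directly
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))

logger = logging.getLogger('LevelDemo')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.debug("Saved 10/50 tracks")        # diagnostic detail
logger.info("Starting monthly snapshot")  # normal progress
logger.warning("Partial data saved")      # unexpected but recoverable
logger.error("Authorization failed")      # failed, user action needed

try:
    1 / 0
except ZeroDivisionError:
    # Logs at ERROR severity and appends the full traceback
    logger.exception("Unexpected error during snapshot")

output = stream.getvalue()
```

After this runs, output contains one line per call, and the exception() entry is followed by the complete traceback ending in ZeroDivisionError.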

Example Log Output
2024-12-03 14:32:15 - MusicTimeMachine - INFO - Starting monthly snapshot creation
2024-12-03 14:32:15 - MusicTimeMachine - INFO - Fetching top tracks from Spotify
2024-12-03 14:32:17 - MusicTimeMachine - INFO - Fetched 50 tracks successfully
2024-12-03 14:32:17 - MusicTimeMachine - INFO - Saving tracks to database
2024-12-03 14:32:17 - MusicTimeMachine - DEBUG - Saved 10/50 tracks
2024-12-03 14:32:18 - MusicTimeMachine - DEBUG - Saved 20/50 tracks
2024-12-03 14:32:18 - MusicTimeMachine - DEBUG - Saved 30/50 tracks
2024-12-03 14:32:19 - MusicTimeMachine - DEBUG - Saved 40/50 tracks
2024-12-03 14:32:19 - MusicTimeMachine - DEBUG - Saved 50/50 tracks
2024-12-03 14:32:19 - MusicTimeMachine - INFO - Database commit successful
2024-12-03 14:32:19 - MusicTimeMachine - INFO - Creating Spotify playlist
2024-12-03 14:32:19 - MusicTimeMachine - DEBUG - Authenticated as user: john_smith
2024-12-03 14:32:20 - MusicTimeMachine - INFO - Created playlist: 3cEYpjA8bZ0Iex
2024-12-03 14:32:21 - MusicTimeMachine - INFO - Added 50 tracks to playlist
2024-12-03 14:32:21 - MusicTimeMachine - INFO - Monthly snapshot completed successfully

Logs get written to both music_time_machine.log and the console. When something goes wrong, check the log file for detailed context. The timestamps show exactly when each step completed and how long operations took.

Production-Ready Error Handling Pattern

Here's the complete pattern that combines retry logic, graceful degradation, user-friendly messages, and logging:

Python
def production_ready_feature(sp, conn, feature_name):
    """
    Template for production-ready feature implementation
    
    Demonstrates: retry logic, error categorization, logging,
    graceful degradation, user-friendly messages
    """
    logger.info(f"Starting {feature_name}")
    
    try:
        # Step 1: Fetch data with retry
        logger.info("Fetching data from Spotify")
        data = retry_with_backoff(
            lambda: sp.some_spotify_method(),
            max_attempts=3
        )
        logger.info(f"Fetched {len(data)} items")
        
        # Step 2: Process data with graceful degradation
        logger.info("Processing data")
        successful = 0
        failed = 0
        processed_data = []
        
        for item in data:
            try:
                # Process individual item and keep it for the output step
                processed = process_item(conn, item)
                processed_data.append(processed)
                successful += 1
            except Exception as e:
                # Log failure but continue processing
                logger.warning(f"Failed to process item: {e}")
                failed += 1
        
        conn.commit()
        logger.info(f"Processed {successful} items successfully, {failed} failed")
        
        # Step 3: Create output with retry
        logger.info("Creating output")
        result = retry_with_backoff(
            lambda: create_output(sp, processed_data),
            max_attempts=3
        )
        logger.info(f"{feature_name} completed successfully")
        
        # Step 4: User-friendly success message
        print(f"\n✓ {feature_name} completed successfully")
        print(f"✓ Processed {successful} items")
        if failed > 0:
            print(f"⚠ {failed} items could not be processed (see log for details)")
        
        return result
    
    except AuthorizationError as e:
        logger.error(f"Authorization failed: {e}")
        print(f"\n❌ Authorization Error")
        print("Your Spotify authentication has expired or is invalid.")
        print("Please re-run the application to re-authorize.")
        return None
    
    except TransientError as e:
        logger.warning(f"Transient error: {e}")
        print(f"\n❌ Temporary Error")
        print("Spotify's servers are experiencing issues.")
        print("Please try again in a few minutes.")
        return None
    
    except RateLimitError as e:
        logger.warning(f"Rate limit hit: {e}")
        print(f"\n❌ Rate Limit Exceeded")
        print("You've made too many requests to Spotify.")
        print("Please wait a few minutes before trying again.")
        return None
    
    except MusicTimeMachineError as e:
        logger.error(f"Application error: {e}")
        print(f"\n❌ Error: {e}")
        return None
    
    except Exception as e:
        logger.exception(f"Unexpected error: {e}")
        print(f"\n❌ Unexpected Error")
        print(f"Something went wrong: {e}")
        print("Please check music_time_machine.log for details.")
        return None
The Complete Pattern

This template demonstrates every production error handling technique: (1) wrap external calls with retry logic, (2) process items individually with graceful degradation, (3) log everything at appropriate levels, (4) categorize errors and handle each category appropriately, (5) provide three-part user messages: what happened, why, and what to do next.

Every feature in the Music Time Machine should follow this pattern. The pattern adds complexity (more lines of code, more exception handlers), but it transforms fragile scripts into reliable applications. The first time your application recovers automatically from a network hiccup, you'll appreciate the investment.
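The template leans on the retry_with_backoff helper from earlier in the chapter. As a reminder of its shape, here is a minimal sketch consistent with how the template calls it; your version may differ in which exception types it catches (TransientError here is a stand-in for the retryable errors):

```python
import time

class TransientError(Exception):
    """Stand-in for the retryable error type defined earlier in the chapter."""

def retry_with_backoff(operation, max_attempts=3, base_delay=1.0):
    """Call operation(); on a retryable failure, wait and try again.
    
    The delay doubles after each failed attempt: 1s, 2s, 4s, ...
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller handle it
            time.sleep(base_delay * (2 ** (attempt - 1)))
```

Wrapping each Spotify call in a lambda and passing it to this helper is what lets a one-off network hiccup recover invisibly instead of aborting the whole feature.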

Building a Diagnostic Toolkit

When your Music Time Machine fails, you need to know why. Is the OAuth token expired? Is the database schema wrong? Is Spotify's API down? Is your network connection broken? Integrated systems have multiple failure points. Professional developers build diagnostic tools to isolate problems quickly.

The diagnostic script tests each component independently: database connectivity, schema validation, OAuth token freshness, and API accessibility. Run this when something breaks and you'll know exactly where to look.

System Diagnostic Script
Python
"""
diagnose.py - System diagnostic tool for Music Time Machine

Run this script when something breaks. It tests each system component 
independently and reports exactly what's working and what's not.

Usage: python diagnose.py
"""

import sqlite3
import os
import sys
from pathlib import Path
import spotipy
from spotipy.oauth2 import SpotifyOAuth

def check_database():
    """Test database connectivity and schema."""
    print("\n" + "="*60)
    print("DATABASE DIAGNOSTICS")
    print("="*60)
    
    db_path = Path("music_time_machine.db")
    
    # Check if database file exists
    if not db_path.exists():
        print("❌ Database file not found")
        print(f"   Expected: {db_path.absolute()}")
        print("   Action: Run the database setup script first")
        return False
    
    print(f"✓ Database file exists: {db_path.absolute()}")
    
    # Test connection
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        print("✓ Database connection successful")
    except Exception as e:
        print(f"❌ Cannot connect to database: {e}")
        return False
    
    # Verify schema
    required_tables = ['listening_history', 'monthly_snapshots']
    
    cursor.execute("""
        SELECT name FROM sqlite_master 
        WHERE type='table'
    """)
    existing_tables = [row[0] for row in cursor.fetchall()]
    
    schema_valid = True
    for table in required_tables:
        if table in existing_tables:
            print(f"✓ Table exists: {table}")
            
            # Check row count
            cursor.execute(f"SELECT COUNT(*) FROM {table}")
            count = cursor.fetchone()[0]
            print(f"  → Contains {count} rows")
        else:
            print(f"❌ Missing table: {table}")
            schema_valid = False
    
    # Check for audio features columns
    cursor.execute("PRAGMA table_info(listening_history)")
    columns = [row[1] for row in cursor.fetchall()]
    
    required_columns = ['energy', 'valence', 'danceability', 'tempo']
    for col in required_columns:
        if col in columns:
            print(f"✓ Column exists: {col}")
        else:
            print(f"❌ Missing column: {col}")
            schema_valid = False
    
    conn.close()
    
    if schema_valid:
        print("\n✓ Database schema is valid")
    else:
        print("\n❌ Database schema has issues")
        print("   Action: Run database migration script")
    
    return schema_valid


def check_oauth():
    """Test OAuth configuration and token validity."""
    print("\n" + "="*60)
    print("OAUTH DIAGNOSTICS")
    print("="*60)
    
    # Check for credentials
    client_id = os.getenv('SPOTIPY_CLIENT_ID')
    client_secret = os.getenv('SPOTIPY_CLIENT_SECRET')
    redirect_uri = os.getenv('SPOTIPY_REDIRECT_URI')
    
    creds_valid = True
    
    if client_id:
        print(f"✓ Client ID found: {client_id[:8]}...")
    else:
        print("❌ SPOTIPY_CLIENT_ID not set")
        creds_valid = False
    
    if client_secret:
        print(f"✓ Client Secret found: {client_secret[:8]}...")
    else:
        print("❌ SPOTIPY_CLIENT_SECRET not set")
        creds_valid = False
    
    if redirect_uri:
        print(f"✓ Redirect URI: {redirect_uri}")
    else:
        print("❌ SPOTIPY_REDIRECT_URI not set")
        creds_valid = False
    
    if not creds_valid:
        print("\n❌ OAuth credentials incomplete")
        print("   Action: Set environment variables or create .env file")
        return False
    
    # Check for token cache
    cache_path = Path(".cache")
    if cache_path.exists():
        print(f"✓ Token cache exists: {cache_path}")
    else:
        print("⚠ No token cache found (will require login)")
    
    # Test token validity
    try:
        scope = "user-top-read user-read-recently-played"
        auth_manager = SpotifyOAuth(
            client_id=client_id,
            client_secret=client_secret,
            redirect_uri=redirect_uri,
            scope=scope
        )
        
        token_info = auth_manager.get_cached_token()
        
        if token_info:
            print("✓ Valid token found in cache")
            
            # Check if token is expired
            import time
            expires_at = token_info.get('expires_at', 0)
            now = int(time.time())
            
            if expires_at > now:
                remaining = expires_at - now
                print(f"  → Token valid for {remaining // 60} minutes")
            else:
                print("⚠ Token expired (will auto-refresh)")
        else:
            print("⚠ No cached token (will require login)")
            
    except Exception as e:
        print(f"❌ OAuth error: {e}")
        return False
    
    print("\n✓ OAuth configuration is valid")
    return True


def check_api_access():
    """Test actual API connectivity."""
    print("\n" + "="*60)
    print("API DIAGNOSTICS")
    print("="*60)
    
    try:
        scope = "user-top-read user-read-recently-played"
        auth_manager = SpotifyOAuth(scope=scope)
        sp = spotipy.Spotify(auth_manager=auth_manager)
        
        # Test with simple API call
        print("Testing API connection...")
        user = sp.current_user()
        
        print(f"✓ Successfully connected to Spotify API")
        print(f"  → Authenticated as: {user['display_name']}")
        print(f"  → User ID: {user['id']}")
        
        # Test top tracks endpoint
        print("\nTesting top tracks endpoint...")
        top_tracks = sp.current_user_top_tracks(limit=5, time_range='short_term')
        track_count = len(top_tracks['items'])
        print(f"✓ Retrieved {track_count} top tracks")
        
        # Test audio features endpoint
        if track_count > 0:
            print("\nTesting audio features endpoint...")
            track_id = top_tracks['items'][0]['id']
            features = sp.audio_features([track_id])
            
            if features and features[0]:
                print(f"✓ Audio features working")
                print(f"  → Sample: energy={features[0]['energy']:.2f}, "
                      f"valence={features[0]['valence']:.2f}")
            else:
                print("⚠ Audio features returned None")
        
        print("\n✓ All API endpoints accessible")
        return True
        
    except spotipy.exceptions.SpotifyException as e:
        print(f"❌ Spotify API error: {e}")
        
        if e.http_status == 401:
            print("   → Authentication failed")
            print("   Action: Delete .cache and re-authenticate")
        elif e.http_status == 429:
            print("   → Rate limit exceeded")
            print("   Action: Wait 60 seconds and try again")
        else:
            print(f"   → HTTP {e.http_status}")
        
        return False
        
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        print("   Action: Check network connection")
        return False


def run_full_diagnostic():
    """Run all diagnostic checks."""
    print("\n" + "="*60)
    print("MUSIC TIME MACHINE - SYSTEM DIAGNOSTICS")
    print("="*60)
    
    results = {
        'database': check_database(),
        'oauth': check_oauth(),
        'api': check_api_access()
    }
    
    print("\n" + "="*60)
    print("DIAGNOSTIC SUMMARY")
    print("="*60)
    
    for component, status in results.items():
        status_icon = "✓" if status else "❌"
        print(f"{status_icon} {component.upper()}: {'PASS' if status else 'FAIL'}")
    
    all_pass = all(results.values())
    
    if all_pass:
        print("\n✓ All systems operational")
        print("  Your Music Time Machine should work correctly")
    else:
        print("\n❌ System has issues")
        print("  Review the diagnostics above for specific problems")
    
    print("="*60 + "\n")
    
    return all_pass


if __name__ == "__main__":
    success = run_full_diagnostic()
    sys.exit(0 if success else 1)
How to Use This Diagnostic

Save this as diagnose.py in your project directory. When something breaks, run it:

python diagnose.py

The script tests each component independently and tells you exactly what's wrong. Common scenarios:

  • Database fails, OAuth passes: Your database schema is wrong or corrupted. Re-run the setup script.
  • OAuth fails, database passes: Your credentials are missing or wrong. Check your .env file.
  • OAuth passes, API fails: Your token is invalid or Spotify's API is down. Delete .cache and re-authenticate.
  • Everything fails: Check your network connection or verify Spotify's API status.
Professional Debugging Practice

This diagnostic script demonstrates a professional debugging approach: isolate each system component and test it independently. When debugging integrated systems, developers often waste hours checking the wrong component.

"My app doesn't work" becomes "Database connection successful, OAuth valid, but API returns 429" — which immediately tells you the problem is rate limiting, not credentials or schema issues.

Building diagnostic tools early saves debugging time later. When you deploy in Chapter 20, you'll add health check endpoints that run similar diagnostics automatically. Start building these habits now.

When Production Error Handling Isn't Worth It

Not every project needs production-grade error handling. The Music Time Machine does because it runs repeatedly over months, depends on external services, and stores valuable accumulated data. But for single-use scripts or internal developer tools, simpler error handling is fine.

Use Production Error Handling When:
  • Application runs unattended or on a schedule
  • Users are non-technical (can't debug stack traces)
  • External APIs are unreliable or rate-limited
  • Data loss would be costly (months of accumulated snapshots)
  • Application has 100+ users who can't file detailed bug reports
Skip Production Error Handling When:
  • Script runs once then exits (migration scripts, data exports)
  • Users are developers who understand stack traces
  • Failures are acceptable (prototype, proof of concept)
  • Development speed matters more than reliability
  • You're the only user and can debug problems immediately

The Music Time Machine crosses the threshold where production error handling becomes worthwhile. You'll run it monthly for years. Network issues and API hiccups are inevitable. Accumulated data is valuable. Users deserve helpful messages instead of cryptic exceptions. The investment in robust error handling pays off.

9. Testing with Mocks

The Music Time Machine works. You've tested it manually by running features and verifying playlists appear in Spotify. But manual testing doesn't scale. Every time you change code, you need to re-test every feature. Every test requires hitting Spotify's API (slow, rate-limited, requires internet connection). Manual testing can't verify edge cases like network timeouts or missing audio features.

Automated tests solve these problems. You write tests once, run them hundreds of times, and catch bugs instantly when changes break existing functionality. The challenge with API-dependent code is that tests shouldn't make real API calls. Real API calls are slow, consume rate limits, require authentication, and produce non-deterministic results (your top tracks change over time).

The solution is mocking: replacing real API calls with fake implementations that return controlled data. Your tests run in isolation without touching Spotify's servers, without needing OAuth tokens, and with complete control over what data the "API" returns. This section shows you how to write tests for the Music Time Machine using Python's unittest.mock library.

Why Testing Matters for This Project

The Music Time Machine accumulates valuable data over months. A bug that corrupts your database or deletes snapshots loses irreplaceable history. Automated tests prevent these disasters by catching bugs before they reach production.

1.

Regression Prevention

When you add a new feature (like a "summer vibes" mood profile), tests ensure existing features still work. Without tests, you might accidentally break Forgotten Gems while adding mood profiles, and not discover the bug until months later when you try to generate a forgotten gems playlist.

2.

Edge Case Coverage

Manual testing hits the happy path (everything works). Tests cover edge cases: what happens when Spotify returns empty results? When audio features are missing? When the database is locked? Tests verify your error handling actually works.

3.

Refactoring Confidence

Want to optimize database queries or restructure code? Tests let you refactor fearlessly. If tests pass after changes, behavior is preserved. If tests fail, you know exactly what broke.

4.

Documentation Through Examples

Tests document how code should behave. Reading test_forgotten_gems_with_no_recent_plays() shows exactly what the forgotten gems feature should return when you have old snapshots but no recent ones.

Professional developers write tests not because they enjoy testing, but because untested code breaks in production and causes real problems. The time investment in writing tests pays off the first time tests catch a bug before users encounter it.

Test Setup and Structure

Python's unittest framework provides everything you need for testing. Create a tests/ directory alongside your main code and organize tests by feature.

Project Structure
music_time_machine/
├── music_time_machine.py      # Main application
├── features.py                # Feature implementations
├── database.py                # Database operations
├── schema.sql                 # Database schema
├── .env                       # Credentials (gitignored)
├── tests/
│   ├── __init__.py
│   ├── test_forgotten_gems.py
│   ├── test_monthly_snapshots.py
│   ├── test_mood_playlists.py
│   └── test_database.py
└── requirements.txt
Python (tests/test_forgotten_gems.py)
import unittest
import sqlite3
from datetime import date, timedelta
from unittest.mock import Mock, patch
import sys
import os

# Add parent directory to path so we can import our modules
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from features import find_forgotten_gems, create_forgotten_gems_playlist

class TestForgottenGems(unittest.TestCase):
    """Test suite for Forgotten Gems feature"""
    
    def setUp(self):
        """Run before each test - create in-memory database"""
        # Use in-memory database for test isolation
        self.conn = sqlite3.connect(':memory:')
        
        # Create schema
        self.conn.executescript("""
            CREATE TABLE tracks (
                track_id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                artist_name TEXT NOT NULL,
                album_name TEXT NOT NULL,
                duration_ms INTEGER NOT NULL
            );
            
            CREATE TABLE snapshots (
                track_id TEXT NOT NULL,
                snapshot_date DATE NOT NULL,
                time_range TEXT NOT NULL,
                rank INTEGER NOT NULL,
                PRIMARY KEY (track_id, snapshot_date, time_range)
            );
        """)
    
    def tearDown(self):
        """Run after each test - cleanup"""
        self.conn.close()
    
    def test_finds_tracks_from_old_snapshots_not_in_recent(self):
        """Should return tracks from 90-365 days ago that aren't in last 30 days"""
        # Arrange: Insert test data
        old_date = (date.today() - timedelta(days=180)).isoformat()
        recent_date = (date.today() - timedelta(days=10)).isoformat()
        
        # Track that appears in old snapshot only
        self.conn.execute(
            "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
            ('track1', 'Old Favorite', 'Artist A', 'Album A', 240000)
        )
        self.conn.execute(
            "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
            ('track1', old_date, 'short_term', 1)
        )
        
        # Track that appears in both (should be excluded)
        self.conn.execute(
            "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
            ('track2', 'Still Playing', 'Artist B', 'Album B', 200000)
        )
        self.conn.execute(
            "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
            ('track2', old_date, 'short_term', 2)
        )
        self.conn.execute(
            "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
            ('track2', recent_date, 'short_term', 1)
        )
        
        self.conn.commit()
        
        # Act: Call function
        forgotten = find_forgotten_gems(self.conn, limit=10)
        
        # Assert: Check results
        self.assertEqual(len(forgotten), 1)
        self.assertEqual(forgotten[0][1], 'Old Favorite')  # Check track name
        self.assertEqual(forgotten[0][2], 'Artist A')      # Check artist
    
    def test_returns_empty_list_when_no_old_snapshots(self):
        """Should return empty list when no snapshots exist from 90-365 days ago"""
        # Arrange: Only recent snapshot
        recent_date = (date.today() - timedelta(days=5)).isoformat()
        
        self.conn.execute(
            "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
            ('track1', 'Recent Track', 'Artist A', 'Album A', 240000)
        )
        self.conn.execute(
            "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
            ('track1', recent_date, 'short_term', 1)
        )
        self.conn.commit()
        
        # Act
        forgotten = find_forgotten_gems(self.conn, limit=10)
        
        # Assert
        self.assertEqual(len(forgotten), 0)
    
    def test_respects_limit_parameter(self):
        """Should return at most 'limit' tracks even if more qualify"""
        # Arrange: Insert 10 forgotten tracks
        old_date = (date.today() - timedelta(days=180)).isoformat()
        
        for i in range(10):
            track_id = f'track{i}'
            self.conn.execute(
                "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
                (track_id, f'Track {i}', 'Artist', 'Album', 240000)
            )
            self.conn.execute(
                "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
                (track_id, old_date, 'short_term', i+1)
            )
        
        self.conn.commit()
        
        # Act: Request only 5
        forgotten = find_forgotten_gems(self.conn, limit=5)
        
        # Assert
        self.assertEqual(len(forgotten), 5)

if __name__ == '__main__':
    unittest.main()
What Just Happened: Test Structure

Each test follows the Arrange-Act-Assert pattern. Arrange: Set up test data (insert tracks and snapshots with specific dates). Act: Call the function you're testing. Assert: Verify results match expectations.

The setUp() method runs before each test and creates a fresh in-memory database. This ensures tests don't interfere with each other. Each test gets its own clean database with the schema created but no data. The tearDown() method cleans up after each test.

Test names describe what they verify: test_finds_tracks_from_old_snapshots_not_in_recent documents the expected behavior. Someone reading tests understands what forgotten gems should do without reading implementation code.

Running Tests
# Run all tests
python -m unittest discover tests/

# Run specific test file
python -m unittest tests.test_forgotten_gems

# Run specific test
python -m unittest tests.test_forgotten_gems.TestForgottenGems.test_respects_limit_parameter

# Run with verbose output
python -m unittest discover tests/ -v
Test Output
...
----------------------------------------------------------------------
Ran 3 tests in 0.012s

OK

Three dots mean three tests passed. If a test fails, you see detailed output showing which assertion failed and what the actual vs expected values were.

Mocking Spotify API Calls

Testing functions that call Spotify's API requires mocking. You replace the real Spotipy client with a mock object that returns fake data you control. This lets tests run without network calls, authentication, or rate limits.

Python (tests/test_monthly_snapshots.py)
import unittest
from unittest.mock import Mock, patch
import sqlite3
from datetime import date

from features import create_monthly_snapshot

class TestMonthlySnapshots(unittest.TestCase):
    
    def setUp(self):
        """Create in-memory database for tests"""
        self.conn = sqlite3.connect(':memory:')
        self.conn.executescript("""
            CREATE TABLE tracks (
                track_id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                artist_name TEXT NOT NULL,
                album_name TEXT NOT NULL,
                duration_ms INTEGER NOT NULL,
                popularity INTEGER,
                album_image_url TEXT,
                spotify_url TEXT
            );
            
            CREATE TABLE snapshots (
                track_id TEXT NOT NULL,
                snapshot_date DATE NOT NULL,
                time_range TEXT NOT NULL,
                rank INTEGER NOT NULL,
                PRIMARY KEY (track_id, snapshot_date, time_range)
            );
        """)
    
    def tearDown(self):
        self.conn.close()
    
    def test_creates_snapshot_with_mocked_spotify(self):
        """Should fetch tracks from Spotify, save to database, and create playlist"""
        # Arrange: Create mock Spotify client
        mock_sp = Mock()
        
        # Mock the API responses
        mock_sp.current_user_top_tracks.return_value = {
            'items': [
                {
                    'id': 'track1',
                    'name': 'Test Track 1',
                    'artists': [{'name': 'Artist A'}],
                    'album': {
                        'name': 'Album A',
                        'images': [{'url': 'http://example.com/image1.jpg'}]
                    },
                    'duration_ms': 240000,
                    'popularity': 85,
                    'external_urls': {'spotify': 'https://open.spotify.com/track/track1'}
                },
                {
                    'id': 'track2',
                    'name': 'Test Track 2',
                    'artists': [{'name': 'Artist B'}],
                    'album': {
                        'name': 'Album B',
                        'images': [{'url': 'http://example.com/image2.jpg'}]
                    },
                    'duration_ms': 200000,
                    'popularity': 90,
                    'external_urls': {'spotify': 'https://open.spotify.com/track/track2'}
                }
            ]
        }
        
        mock_sp.current_user.return_value = {
            'id': 'test_user',
            'display_name': 'Test User'
        }
        
        mock_sp.user_playlist_create.return_value = {
            'id': 'playlist123',
            'external_urls': {'spotify': 'https://open.spotify.com/playlist/playlist123'}
        }
        
        # Act: Call function with mock
        playlist_url, track_count = create_monthly_snapshot(
            mock_sp, 
            self.conn, 
            time_range='short_term'
        )
        
        # Assert: Verify tracks were saved to database
        cursor = self.conn.execute("SELECT COUNT(*) FROM tracks")
        self.assertEqual(cursor.fetchone()[0], 2)
        
        cursor = self.conn.execute("SELECT COUNT(*) FROM snapshots")
        self.assertEqual(cursor.fetchone()[0], 2)
        
        # Assert: Verify Spotify methods were called correctly
        mock_sp.current_user_top_tracks.assert_called_once_with(
            limit=50, 
            time_range='short_term'
        )
        mock_sp.user_playlist_create.assert_called_once()
        mock_sp.playlist_add_items.assert_called_once()
        
        # Assert: Verify return values
        self.assertIsNotNone(playlist_url)
        self.assertEqual(track_count, 2)
    
    def test_prevents_duplicate_snapshots_same_day(self):
        """Should refuse to create second snapshot on same day"""
        # Arrange: Create existing snapshot
        today = date.today().isoformat()
        self.conn.execute(
            "INSERT INTO tracks VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            ('track1', 'Track', 'Artist', 'Album', 240000, 80, None, 'url')
        )
        self.conn.execute(
            "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
            ('track1', today, 'short_term', 1)
        )
        self.conn.commit()
        
        mock_sp = Mock()
        
        # Act: Try to create another snapshot
        playlist_url, track_count = create_monthly_snapshot(
            mock_sp,
            self.conn,
            time_range='short_term'
        )
        
        # Assert: Should return None (refused to create duplicate)
        self.assertIsNone(playlist_url)
        self.assertEqual(track_count, 0)
        
        # Assert: Spotify API should not have been called
        mock_sp.current_user_top_tracks.assert_not_called()
    
    def test_handles_spotify_api_error(self):
        """Should surface the error when the Spotify API fails"""
        # Arrange: Mock that raises exception
        mock_sp = Mock()
        mock_sp.current_user_top_tracks.side_effect = Exception("Spotify API error")
        
        # Act & Assert: The exception should propagate to the caller
        # rather than being swallowed silently or corrupting the database
        with self.assertRaises(Exception):
            create_monthly_snapshot(mock_sp, self.conn)

if __name__ == '__main__':
    unittest.main()
What Just Happened: Creating Mocks

The Mock() object from unittest.mock creates a fake object that records how it's called. When you write mock_sp.current_user_top_tracks.return_value = {...}, you're telling the mock "when this method is called, return this data."

The mock acts like a real Spotipy client. Your function calls sp.current_user_top_tracks(), the mock returns your fake data, and your function processes it normally. The test verifies your function saved the data correctly and called Spotify methods with the right parameters.

assert_called_once verifies the method was called exactly once. assert_called_once_with verifies it was called exactly once and with specific arguments. assert_not_called verifies the method was never called. Together, these assertions ensure your function interacts with the API correctly.
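Stripped of the application code, the mock mechanics look like this. This is a standalone sketch, not part of the test suite:

```python
from unittest.mock import Mock

# Create a mock and script its return value
mock_sp = Mock()
mock_sp.current_user_top_tracks.return_value = {
    'items': [{'id': 'track1', 'name': 'Test Track 1'}]
}

# Code under test calls the mock exactly like a real Spotipy client
result = mock_sp.current_user_top_tracks(limit=50, time_range='short_term')
print(result['items'][0]['name'])  # Test Track 1

# The mock records every call, so tests can verify the interaction
mock_sp.current_user_top_tracks.assert_called_once_with(
    limit=50, time_range='short_term'
)
mock_sp.playlist_add_items.assert_not_called()
```

Any method you access on a Mock is created on demand, which is why `playlist_add_items` exists even though it was never configured.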

Testing Mood Playlist Generation

Mood playlist tests verify that the audio feature filtering logic works correctly. You need to mock both Spotify API calls (for fetching features) and test the SQL queries that filter tracks by mood criteria.

Python (tests/test_mood_playlists.py)
import unittest
import sqlite3
from unittest.mock import Mock

from features import find_tracks_by_mood, create_mood_playlist

class TestMoodPlaylists(unittest.TestCase):
    
    def setUp(self):
        """Create database with tracks and audio features"""
        self.conn = sqlite3.connect(':memory:')
        self.conn.executescript("""
            CREATE TABLE tracks (
                track_id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                artist_name TEXT NOT NULL,
                album_name TEXT NOT NULL,
                duration_ms INTEGER NOT NULL
            );
            
            CREATE TABLE audio_features (
                track_id TEXT PRIMARY KEY,
                energy REAL NOT NULL,
                valence REAL NOT NULL,
                danceability REAL NOT NULL,
                tempo REAL NOT NULL,
                acousticness REAL NOT NULL,
                instrumentalness REAL NOT NULL,
                speechiness REAL NOT NULL
            );
        """)
        
        # Insert test tracks with various audio features
        test_data = [
            # High energy workout tracks
            ('track1', 'Workout Track 1', 'Artist A', 'Album A', 0.85, 0.70, 0.75, 150),
            ('track2', 'Workout Track 2', 'Artist B', 'Album B', 0.90, 0.65, 0.80, 160),
            
            # Low energy focus tracks
            ('track3', 'Focus Track 1', 'Artist C', 'Album C', 0.25, 0.40, 0.30, 95),
            ('track4', 'Focus Track 2', 'Artist D', 'Album D', 0.30, 0.35, 0.25, 100),
            
            # Medium energy chill tracks
            ('track5', 'Chill Track 1', 'Artist E', 'Album E', 0.40, 0.55, 0.40, 90),
        ]
        
        for track_id, name, artist, album, energy, valence, dance, tempo in test_data:
            self.conn.execute(
                "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
                (track_id, name, artist, album, 240000)
            )
            self.conn.execute(
                "INSERT INTO audio_features VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (track_id, energy, valence, dance, tempo, 0.5, 0.5, 0.05)
            )
        
        self.conn.commit()
    
    def tearDown(self):
        self.conn.close()
    
    def test_workout_mood_filters_correctly(self):
        """Should return only high energy, high tempo tracks for workout mood"""
        # Act: Find workout tracks
        tracks = find_tracks_by_mood(self.conn, 'workout', limit=10)
        
        # Assert: Should only return the 2 workout tracks
        self.assertEqual(len(tracks), 2)
        
        # Verify both returned tracks meet workout criteria
        for track in tracks:
            track_id, name, artist, album, energy, valence, tempo = track
            self.assertGreater(energy, 0.75, f"Track {name} energy too low")
            self.assertGreater(tempo, 140, f"Track {name} tempo too low")
    
    def test_focus_mood_filters_correctly(self):
        """Should return only low energy tracks for focus mood"""
        # Act
        tracks = find_tracks_by_mood(self.conn, 'focus', limit=10)
        
        # Assert
        self.assertEqual(len(tracks), 2)
        
        for track in tracks:
            energy = track[4]
            self.assertLess(energy, 0.40, "Focus track energy too high")
    
    def test_chill_mood_filters_correctly(self):
        """Should return medium energy, moderate tempo tracks for chill mood"""
        # Act
        tracks = find_tracks_by_mood(self.conn, 'chill', limit=10)
        
        # Assert: Should return the 1 chill track
        self.assertEqual(len(tracks), 1)
        self.assertEqual(tracks[0][1], 'Chill Track 1')
    
    def test_returns_empty_when_no_tracks_match_mood(self):
        """Should return empty list when no tracks match mood criteria"""
        # Arrange: Clear database and insert only low-energy tracks
        self.conn.execute("DELETE FROM audio_features")
        self.conn.execute(
            "INSERT INTO audio_features VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            ('track1', 0.20, 0.30, 0.25, 80, 0.5, 0.5, 0.05)
        )
        self.conn.commit()
        
        # Act: Try to find workout tracks (should find none)
        tracks = find_tracks_by_mood(self.conn, 'workout', limit=10)
        
        # Assert
        self.assertEqual(len(tracks), 0)
    
    def test_respects_limit_parameter(self):
        """Should return at most 'limit' tracks"""
        # Act: Request only 1 workout track (even though 2 exist)
        tracks = find_tracks_by_mood(self.conn, 'workout', limit=1)
        
        # Assert
        self.assertEqual(len(tracks), 1)
    
    def test_create_mood_playlist_with_mocked_spotify(self):
        """Should create playlist with tracks matching mood"""
        # Arrange: Mock Spotify client
        mock_sp = Mock()
        mock_sp.current_user.return_value = {'id': 'test_user'}
        mock_sp.user_playlist_create.return_value = {
            'id': 'playlist123',
            'external_urls': {'spotify': 'https://open.spotify.com/playlist/123'}
        }
        
        # Act: Create workout playlist
        playlist_url = create_mood_playlist(mock_sp, self.conn, 'workout', limit=25)
        
        # Assert: Verify playlist was created with correct tracks
        self.assertIsNotNone(playlist_url)
        mock_sp.user_playlist_create.assert_called_once()
        
        # Verify playlist_add_items was called with workout track URIs
        call_args = mock_sp.playlist_add_items.call_args
        added_uris = call_args[0][1]  # Second argument to playlist_add_items
        
        # Should have 2 workout tracks
        self.assertEqual(len(added_uris), 2)
        self.assertTrue(all(uri.startswith('spotify:track:') for uri in added_uris))

if __name__ == '__main__':
    unittest.main()
What Just Happened: Testing SQL Logic

These tests verify the SQL queries that filter tracks by mood criteria. You insert tracks with known audio features (high energy tracks, low energy tracks, medium energy tracks), then verify the mood filtering returns the correct subset.

The test data is carefully chosen to test boundary conditions. A track with energy 0.85 should match the workout profile (energy > 0.75), but a track with energy 0.40 should not. Tests verify these boundaries work correctly.

The final test combines SQL filtering with API mocking. It verifies the entire flow: query database for matching tracks, format as URIs, call Spotify API to create playlist. The mock verifies the right number of tracks were added with correct URIs.

Testing Time-Dependent Logic

The forgotten gems feature depends on dates: it finds tracks from 90-365 days ago that aren't in the last 30 days. Testing date logic is tricky because dates change every day. A test that passes today might fail tomorrow when dates shift.

The solution is controlling time in tests. You explicitly set snapshot dates to known values (180 days ago, 10 days ago) rather than using relative dates. This makes tests deterministic regardless of when they run.

Python (Time-Based Testing)
from datetime import date, timedelta
import sqlite3
import unittest

class TestTimeDependentLogic(unittest.TestCase):
    
    def test_forgotten_gems_date_ranges(self):
        """Test that date calculations are correct"""
        # Calculate expected date boundaries
        today = date.today()
        days_90_ago = today - timedelta(days=90)
        days_365_ago = today - timedelta(days=365)
        days_30_ago = today - timedelta(days=30)
        
        # Verify dates fall in expected windows
        self.assertLess(days_365_ago, days_90_ago)
        self.assertLess(days_30_ago, today)
        
        # These date calculations should match your SQL queries
        # SELECT ... WHERE snapshot_date BETWEEN date('now', '-365 days') AND date('now', '-90 days')
    
    def test_boundary_conditions_for_dates(self):
        """Test edge cases at date boundaries"""
        conn = sqlite3.connect(':memory:')
        conn.executescript("""
            CREATE TABLE tracks (track_id TEXT PRIMARY KEY, name TEXT, 
                               artist_name TEXT, album_name TEXT, duration_ms INTEGER);
            CREATE TABLE snapshots (track_id TEXT, snapshot_date DATE, 
                                  time_range TEXT, rank INTEGER);
        """)
        
        # Test track exactly 90 days ago (should be included)
        exactly_90_days = (date.today() - timedelta(days=90)).isoformat()
        conn.execute("INSERT INTO tracks VALUES ('track1', 'Name', 'Artist', 'Album', 240000)")
        conn.execute("INSERT INTO snapshots VALUES ('track1', ?, 'short_term', 1)", (exactly_90_days,))
        
        # Test track exactly 365 days ago (should be included)
        exactly_365_days = (date.today() - timedelta(days=365)).isoformat()
        conn.execute("INSERT INTO tracks VALUES ('track2', 'Name2', 'Artist', 'Album', 240000)")
        conn.execute("INSERT INTO snapshots VALUES ('track2', ?, 'short_term', 1)", (exactly_365_days,))
        
        # Test track 89 days ago (too recent, should be excluded)
        days_89_ago = (date.today() - timedelta(days=89)).isoformat()
        conn.execute("INSERT INTO tracks VALUES ('track3', 'Name3', 'Artist', 'Album', 240000)")
        conn.execute("INSERT INTO snapshots VALUES ('track3', ?, 'short_term', 1)", (days_89_ago,))
        
        # Test track 366 days ago (too old, should be excluded)
        days_366_ago = (date.today() - timedelta(days=366)).isoformat()
        conn.execute("INSERT INTO tracks VALUES ('track4', 'Name4', 'Artist', 'Album', 240000)")
        conn.execute("INSERT INTO snapshots VALUES ('track4', ?, 'short_term', 1)", (days_366_ago,))
        
        conn.commit()
        
        # Act: Find forgotten gems
        from features import find_forgotten_gems
        forgotten = find_forgotten_gems(conn, limit=10)
        
        # Assert: Should include tracks at exactly 90 and 365 days, exclude 89 and 366
        self.assertEqual(len(forgotten), 2)
        track_names = [track[1] for track in forgotten]
        self.assertIn('Name', track_names)   # 90 days
        self.assertIn('Name2', track_names)  # 365 days
        self.assertNotIn('Name3', track_names)  # 89 days (too recent)
        self.assertNotIn('Name4', track_names)  # 366 days (too old)
        
        conn.close()
Why Boundary Testing Matters

Date boundaries are easy to get wrong. Is "90 days ago" inclusive or exclusive? What about "365 days ago"? Boundary tests verify that tracks at exactly 90 days and exactly 365 days are handled correctly.

SQLite's BETWEEN operator is inclusive on both ends. BETWEEN date('now', '-365 days') AND date('now', '-90 days') includes both boundaries. The test verifies this by inserting tracks at exact boundaries and checking they appear in results.
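You can confirm the inclusivity yourself in an in-memory database. This sketch is independent of the application schema and inserts one row exactly on each side of each boundary:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE snapshots (track_id TEXT, snapshot_date DATE)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, date('now', ?))",
    [('at_365', '-365 days'), ('at_90', '-90 days'),    # on the boundaries
     ('at_89', '-89 days'), ('at_366', '-366 days')]    # just outside them
)

rows = conn.execute("""
    SELECT track_id FROM snapshots
    WHERE snapshot_date BETWEEN date('now', '-365 days')
                            AND date('now', '-90 days')
""").fetchall()

print(sorted(r[0] for r in rows))  # ['at_365', 'at_90']
```

Both boundary rows are returned; the rows one day outside the window are not.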

These tests are deterministic. They run identically today, tomorrow, and next year because they use explicit dates calculated from date.today() rather than hardcoded strings.

Measuring Test Coverage

Test coverage measures what percentage of your code is executed during tests. High coverage doesn't guarantee good tests (you can execute code without verifying it works correctly), but low coverage guarantees untested code. Use Python's coverage tool to measure coverage.

Terminal
# Install coverage tool
pip install coverage

# Run tests with coverage tracking
coverage run -m unittest discover tests/

# Generate coverage report
coverage report

# Generate detailed HTML report
coverage html
# Open htmlcov/index.html in browser to see line-by-line coverage
Example Coverage Report
Name                    Stmts   Miss  Cover
-------------------------------------------
features.py               145     12    92%
database.py                67      8    88%
music_time_machine.py      89     25    72%
-------------------------------------------
TOTAL                     301     45    85%

This report shows features.py has 92% test coverage (145 statements, only 12 untested). The 12 untested lines might be error handling edge cases or rarely-used code paths. Aim for 80-90% coverage on core logic. Don't obsess over 100% coverage.
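One way to keep the report honest without chasing 100% is coverage.py's exclusion pragma: defensive lines you deliberately don't test can be marked so they don't count as misses. A sketch, where `mood_threshold` is an illustrative helper rather than the chapter's actual code:

```python
def mood_threshold(mood):
    """Map a mood name to its minimum energy (moods are validated upstream)."""
    if mood == 'workout':
        return 0.75
    if mood == 'focus':
        return 0.0
    raise ValueError(mood)  # pragma: no cover  (unreachable after validation)

print(mood_threshold('workout'))  # 0.75
```

With the pragma, the defensive `raise` no longer drags the coverage percentage down, and the report reflects only code you actually intend to test.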

What to Test vs What to Skip

Test thoroughly: Core logic (date calculations, mood filtering, data transformations), database queries (CRUD operations, complex queries), edge cases (empty results, missing data, boundary conditions).

Test lightly: Error messages (verify they exist, don't verify exact wording), UI code (print statements, menu displays), trivial getters/setters.

Don't test: External libraries (Spotipy, SQLite), Python standard library functions, configuration files, generated code.

Running Tests Automatically

Professional projects run tests automatically on every code change. When you push code to GitHub, automated systems (GitHub Actions, GitLab CI, CircleCI) run your test suite and report failures immediately. This catches bugs before they reach production.

.github/workflows/tests.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.11'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install coverage
    
    - name: Run tests
      run: |
        coverage run -m unittest discover tests/
        coverage report
        coverage xml
    
    - name: Upload coverage
      uses: codecov/codecov-action@v4
      with:
        files: ./coverage.xml

Save this file in your repository. Every time you push code, GitHub Actions runs your tests automatically and reports results. Failed tests block pull requests from merging, preventing broken code from reaching production.

10. Chapter Summary

You built the Music Time Machine from scratch. Starting with OAuth authentication, you integrated Spotify's API, designed a database schema optimized for time-series queries, implemented four complete features that combine API calls with historical data, added production error handling that makes failures recoverable, and wrote automated tests that verify everything works without manual testing.

This isn't a tutorial project you'll abandon. It's a tool you'll actually use. Every month you take a snapshot, your database grows richer. After six months, forgotten gems starts surfacing tracks you genuinely forgot about. After a year, the analytics reveal patterns in your musical taste you never consciously noticed. The longer you use it, the more valuable it becomes.

More importantly, you built something portfolio-ready. When recruiters ask "show me a project," you have a live application with OAuth, database persistence, algorithmic curation, error handling, and automated tests. You can explain design decisions, demonstrate features, and discuss tradeoffs you made. This is the kind of project that gets interviews.

Key Skills Mastered

This chapter covered substantial ground. Here are the professional skills you now have:

1.

OAuth Integration with Real-World APIs

You applied Chapter 14's OAuth concepts to Spotify's authorization flow, managed access tokens through Spotipy's abstractions, understood what happens behind the scenes (authorization URLs, token exchange, refresh logic), and handled authorization errors gracefully with helpful user messages. You can now integrate OAuth with any provider that follows the standard flow.

2.

Database Schema Design for Time-Series Data

You designed a three-table schema that separates entities (tracks, audio features) from events (snapshots), chose between normalization and denormalization strategically, created indexes that optimize time-based queries, and understood why separating audio features into their own table enables lazy loading and faster queries. These design patterns apply to any application tracking data over time.

3.

Multi-Feature Application Architecture

You built four distinct features that share a database and work together coherently. Forgotten gems depends on monthly snapshots providing historical data. Mood playlists depend on audio features being fetched and stored. Analytics aggregates all accumulated data. You saw how features build on shared infrastructure rather than operating in isolation.

4.

Algorithmic Curation Systems

You built mood-based playlist generation that scores tracks against multi-dimensional criteria. The system defines mood profiles (combinations of energy, valence, tempo), queries the database for matching tracks, and creates playlists without manual curation. This demonstrates how audio feature data enables algorithmic content discovery.

5.

Production Error Handling Patterns

You categorized errors (transient, authorization, rate limit, data errors), implemented retry logic with exponential backoff and jitter, handled SQLite database locks gracefully, enabled graceful degradation for partial failures, and provided three-part user messages that explain what went wrong and what to do. Your application recovers from failures automatically instead of crashing.

6.

Testing API-Dependent Code with Mocks

You wrote automated tests that verify functionality without making real API calls, used mocks to control what "Spotify" returns during tests, tested edge cases (empty results, missing data, boundary conditions), created in-memory databases for test isolation, and measured test coverage to identify untested code. Professional developers test their code before it reaches production.

7.

Working with Third-Party API Libraries

You used Spotipy professionally while understanding what it abstracts (OAuth flows, token management, API endpoints), knew when to trust the library and when to work around its limitations, and combined library convenience with custom logic for application-specific features. This skill transfers to any third-party library integration.

These skills combine into something greater than their sum. You didn't just learn OAuth or databases or testing in isolation. You integrated them into a complete application that demonstrates full-stack development capability. That integration is what makes this portfolio-ready.

Chapter Review Quiz

Test your understanding of the Music Time Machine concepts:

Select question to reveal the answer:
Why does the Music Time Machine use three separate tables (tracks, audio_features, snapshots) instead of storing everything in one table?

Three tables prevent data duplication and enable efficient queries. Tracks are entities that exist independently. Audio features are optional attributes fetched on demand. Snapshots are events that record when tracks appeared in your top 50. Mixing these concerns into one table causes massive duplication (storing "Karma Police" metadata 12 times if it appears in 12 monthly snapshots).

When generating mood playlists, you only need tracks and audio features. Separating snapshots means you don't scan irrelevant historical records. When analyzing trends, you need snapshots and tracks but not audio features. Each table contains only the data needed for specific query patterns.

Audio features are fetched on demand (when generating mood playlists), not when taking snapshots. Separating them enables this lazy loading pattern. You take monthly snapshots quickly (no audio feature fetching), then batch-fetch features later when needed.

The forgotten gems feature finds tracks from 90-365 days ago that don't appear in the last 30 days. Why these specific time windows?

The time windows balance recency and history for genuine rediscovery. Tracks from 90-365 days ago (3-12 months) represent songs you loved for sustained periods, not brief obsessions. Less than 90 days is too recent (you probably remember these songs). More than 365 days risks surfacing tracks from completely different musical phases that no longer resonate.

The last 30 days excludes tracks still in rotation. You want to rediscover forgotten favorites, not get recommendations for songs you're already hearing. Thirty days captures your current listening without being too restrictive.

The gap between 30-90 days is a buffer zone that prevents tracks that just recently dropped out of rotation from appearing as "forgotten." A track you stopped listening to 40 days ago isn't genuinely forgotten yet.

These windows are configurable. You could adjust to 60-180 days and 14 days for different discovery patterns. The defaults balance "genuinely forgotten" with "still relevant to current taste."
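A query implementing these windows might look like the following. It is a sketch against the chapter's schema, not the exact implementation, with two seeded tracks to show both outcomes:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE tracks (track_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE snapshots (track_id TEXT, snapshot_date DATE,
                            time_range TEXT, rank INTEGER);
""")
conn.executemany("INSERT INTO tracks VALUES (?, ?)",
                 [('t1', 'Forgotten'), ('t2', 'Still Playing')])
conn.executemany(
    "INSERT INTO snapshots VALUES (?, date('now', ?), 'short_term', 1)",
    [('t1', '-180 days'),                       # loved six months ago, then dropped
     ('t2', '-180 days'), ('t2', '-10 days')]   # still in current rotation
)

# Loved 90-365 days ago, absent from the last 30 days
rows = conn.execute("""
    SELECT DISTINCT t.name
    FROM tracks t JOIN snapshots s ON t.track_id = s.track_id
    WHERE s.snapshot_date BETWEEN date('now', '-365 days')
                              AND date('now', '-90 days')
      AND t.track_id NOT IN (
          SELECT track_id FROM snapshots
          WHERE snapshot_date >= date('now', '-30 days'))
""").fetchall()
print([r[0] for r in rows])  # ['Forgotten']
```

Adjusting the windows means changing only the date modifiers, which is why the defaults are easy to tune.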

Why does the retry logic use exponential backoff with jitter instead of retrying immediately or waiting fixed intervals?

Exponential backoff with jitter prevents overwhelming recovering services while avoiding synchronized retry waves. If Spotify's servers are overloaded (causing 503 errors), immediately retrying adds more load and makes the problem worse. The service needs time to recover before handling new requests.

Fixed intervals create problems too. If 1,000 clients all retry every 5 seconds, they create synchronized waves of traffic that repeatedly overwhelm the server just as it's recovering. This extends outages.

Exponential backoff (1s, 2s, 4s) gives services progressively more breathing room to recover. If the problem persists after 3 retries (7 seconds total wait), it's likely not transient. Jitter (adding 0-10% randomness) prevents thundering herd. If 1,000 clients all fail at the same moment, jitter ensures they don't all retry at exactly 1 second, 2 seconds, and 4 seconds later. Requests arrive in a steady stream rather than synchronized waves.

This pattern is used by AWS, Google Cloud, and every major platform because it minimizes both client wait times and server load during recovery.
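The schedule described above fits in a few lines. The delays and 0-10% jitter match the chapter's defaults, while the function names and the flaky demo call are illustrative:

```python
import random
import time

def with_retries(call, max_retries=3, base_delay=1.0):
    """Retry a callable with exponential backoff plus 0-10% jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error
            delay = base_delay * (2 ** attempt)       # 1s, 2s, 4s with defaults
            delay += delay * random.uniform(0, 0.10)  # desynchronize clients
            time.sleep(delay)

# Demo: a call that fails twice with a transient error, then succeeds
attempts = {'n': 0}
def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise ConnectionError("503 Service Unavailable")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```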

When testing the mood playlist feature, why do you mock Spotify's API instead of making real API calls during tests?

Mocking enables fast, reliable, isolated tests without external dependencies. Real API calls take 100-500ms each. With mocks, tests run in milliseconds. A test suite with 50 tests completes in under 1 second with mocks vs 30+ seconds with real API calls. Fast tests mean you run them frequently.

Real API calls fail when your internet is down, when Spotify's servers have issues, when rate limits are hit, or when OAuth tokens expire. Tests should pass or fail based on code correctness, not external service availability. Mocks keep tests isolated so they can run in any order without affecting each other.

You can't control what Spotify returns. Your top tracks change daily. Mocks let you test specific scenarios: what happens with empty results? What if audio features are missing? What if a track has extreme energy values? You define the test data exactly. Tests run in continuous integration systems without your personal Spotify credentials. Mocks eliminate the authentication dependency entirely.

The audio features table stores energy, valence, tempo, and other attributes. Why store these in the database instead of fetching them from Spotify's API every time you generate a mood playlist?

Caching audio features in the database eliminates redundant API calls and enables offline functionality. Audio features for a track never change. Once fetched, they're valid forever. Fetching them repeatedly wastes API calls and consumes rate limits. With 500 tracks in your database, generating a workout playlist requires zero API calls (just a SQL query) instead of 500 API calls.

SQL queries against a local database complete in milliseconds. Fetching audio features for 500 tracks from Spotify takes 5-10 seconds (even in batches of 100). Cached features make mood playlist generation instant.

Once features are cached, you can generate mood playlists without an internet connection. The database contains all the data needed for filtering. You only need network access to create the playlist in Spotify, not to determine which tracks match the mood. You fetch features on demand (when first generating mood playlists), not when taking snapshots. This keeps monthly snapshots fast while ensuring features are available when needed.

The tradeoff is storage space (12 float values per track), but for a personal application with thousands of tracks, this is negligible (kilobytes, not megabytes).

The create_monthly_snapshot function checks if a snapshot already exists for today's date before fetching tracks. Why prevent multiple snapshots per day?

Preventing duplicate snapshots maintains data integrity and reflects the feature's intended purpose. The database schema uses (track_id, snapshot_date, time_range) as a composite primary key. Taking multiple snapshots on the same day would either fail (primary key constraint violation) or require adding timestamps, which complicates time-based queries that currently use simple date comparisons.

Monthly snapshots capture your musical identity at monthly intervals. Taking multiple snapshots per day doesn't make semantic sense. Your top 50 tracks don't change meaningfully in hours. The feature is designed for monthly rhythm (first day of month, end of month, whenever you remember).

Without this check, running the script twice in one day (accidentally double-clicking, running from different terminals) would attempt to create duplicate snapshots. The check provides idempotency: running the same command multiple times has the same effect as running it once. The error message explains "Monthly snapshots are designed to run once per month" to educate users about intended usage rather than silently creating duplicates or crashing with a database error.

If you wanted intraday granularity (snapshots every 6 hours), you'd add time component to snapshot_date and change the primary key accordingly. The current design matches the monthly snapshot use case.
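The guard itself can be a single query run before any API work. A sketch, assuming the chapter's snapshots schema:

```python
import sqlite3
from datetime import date

def snapshot_exists_today(conn, time_range='short_term'):
    """True if a snapshot was already taken today for this time range."""
    row = conn.execute(
        "SELECT 1 FROM snapshots "
        "WHERE snapshot_date = ? AND time_range = ? LIMIT 1",
        (date.today().isoformat(), time_range)
    ).fetchone()
    return row is not None

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE snapshots (track_id TEXT, snapshot_date DATE,
                time_range TEXT, rank INTEGER,
                PRIMARY KEY (track_id, snapshot_date, time_range))""")
print(snapshot_exists_today(conn))  # False

conn.execute("INSERT INTO snapshots VALUES ('t1', ?, 'short_term', 1)",
             (date.today().isoformat(),))
print(snapshot_exists_today(conn))  # True
```

Running the check before calling Spotify is what makes the no-op path free: a duplicate run costs one local query and zero API calls.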

Why does the database schema store artist_name and album_name as text in the tracks table instead of creating separate artists and albums tables with foreign key relationships?

Denormalization trades storage space for query simplicity and performance. A normalized approach would create artists table, albums table, and use foreign keys in tracks. This eliminates redundancy (store "Radiohead" once, not 20 times for 20 tracks). This is "proper" database design taught in courses.

But for the Music Time Machine, denormalization works better. It's a single-user application with thousands of tracks, not millions. Storing "Radiohead" 20 times uses a few hundred extra bytes. Modern computers don't notice this. The storage penalty is negligible.

The benefit is query simplicity. Displaying track information requires zero joins. SELECT name, artist_name, album_name FROM tracks gets everything. With normalization, you'd need JOIN artists ON tracks.artist_id = artists.id JOIN albums ON tracks.album_id = albums.id for every query. Joins add complexity and slight performance overhead.

SQLite is optimized for simple queries, not complex joins. Denormalized queries run faster because SQLite scans fewer tables and doesn't build join indexes. You'd normalize for multi-user applications, millions of records, or when you need to update artist or album info independently. The Music Time Machine doesn't have these requirements. The schema includes this design decision deliberately. It's not "wrong" or "lazy." It's pragmatic for the application's scale and access patterns.

The mood playlist feature processes tracks in batches of 100 when fetching audio features. Why batch instead of fetching features one track at a time?

Batching dramatically reduces API calls and total time by leveraging Spotify's batch endpoint. Spotify's audio_features() endpoint accepts up to 100 track IDs in a single request. Fetching 500 tracks individually requires 500 API calls. Fetching in batches of 100 requires 5 API calls. This is a 100x reduction in network requests.

Each API call has overhead (TCP connection, TLS handshake, HTTP headers, network latency). Even if each call takes only 100ms, 500 calls take 50 seconds. Five batched calls take 0.5 seconds. Total processing time for 500 tracks drops from nearly a minute to under one second.

Spotify rate limits by requests per second, not by data transferred. Five batched requests consume far less of your rate limit budget than 500 individual requests. This matters when processing large libraries or running multiple features simultaneously.

If one batch fails (network timeout, server error), you lose at most 100 tracks worth of features. With individual requests, failures would happen more frequently (more requests = more opportunities for failure), and you'd lose features one at a time, making progress tracking difficult. The code uses range(0, len(track_ids), 100) to process in chunks of 100, matching Spotify's documented batch limit. Professional API integration always uses batch endpoints when available.
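The chunking itself is plain Python. In this sketch `fetch_batch` is a stand-in for Spotify's batch endpoint so the example runs offline; a real implementation would pass each chunk to the API instead:

```python
def batched(items, size=100):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

track_ids = [f'track{n}' for n in range(250)]

# Stand-in for the API call; returns one feature dict per track ID
def fetch_batch(batch):
    return [{'id': tid, 'energy': 0.5} for tid in batch]

calls = 0
features = []
for batch in batched(track_ids, 100):
    calls += 1
    features.extend(fetch_batch(batch))

print(calls, len(features))  # 3 250
```

Two hundred fifty tracks become three requests (100 + 100 + 50), and a failed request loses at most one chunk of features.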

Looking Forward

The Music Time Machine works as a command-line application. You can run it monthly, generate playlists, and view analytics. But showing this to recruiters requires screenshots and demo videos. The application would be more compelling with a web interface.

Chapter 17 introduces Flask fundamentals and builds your first web dashboard. You'll learn Flask's routing system, master Jinja2 templates, integrate Chart.js visualizations, and create a responsive home page that displays your musical insights. The chapter uses a "backend-first" strategy with a provided CSS starter kit, so you focus on connecting your data rather than wrestling with frontend minutiae.

Chapter 18 completes the dashboard by building three additional pages: an Analytics page with multiple chart types and date filtering, a Playlist Manager with form handling and AJAX, and a Settings page with data management controls. You'll apply the patterns from Chapter 17 repeatedly until they become second nature, turning concepts into muscle memory.

The testing patterns you learn in Section 7 preview Chapter 19, which adds comprehensive automated testing to your application. For now, focus on the mock-based unit testing that verifies your features work without hitting Spotify's servers. In Chapter 19, you'll expand this test suite using pytest: unit tests for algorithms, integration tests for database operations, and end-to-end tests for complete workflows, with mocks standing in for external dependencies like Spotify's API. The full suite will verify that 43+ tests pass in under 3 seconds with 95% code coverage, giving you confidence that your application works correctly and won't break when you make changes.

Chapter 20 covers deployment, taking your application from localhost to a live public URL. You'll deploy the complete application (Flask web server, SQLite database, OAuth configuration) to a cloud platform like Railway or Render. The deployment process includes environment variable configuration, database migrations, HTTPS setup, and production security considerations. By the end, you'll have a deployed application at yourname-music-time-machine.railway.app that you can share with anyone.

The foundation you built in this chapter (OAuth, database, features, error handling) supports everything that comes next. The web interface, additional features, testing, and deployment are additions, not rewrites. This is how professional development works: build solid foundations, then add layers of functionality on top.