Chapter 12: API Data Validation

Building Reliable Applications with External Data

1. Introduction

Defensive programming prevents crashes. Validation prevents bad data.

Your News Aggregator from Chapter 11 works. Articles display, searches run, and your defensive patterns prevent crashes when APIs return missing or malformed fields.

But "working" is not the same as "reliable." Use the aggregator daily and quality problems show up. Articles have empty titles. Dates display as "Invalid Date." Placeholder text slips through. A story dated "2099" jumps to the top of the feed.

This is the limit of defensive programming. Fallback values and safe checks keep your app running, but they can also allow bad data to pass quietly. Defensive programming protects your code. Validation protects your output.

The Key Distinction

Defensive programming (Chapters 9 to 11) keeps your application stable when data is missing, malformed, or unexpected.

Validation (this chapter) enforces rules and rejects or flags data that does not meet your requirements.

You need both. Defensive programming for resilience, validation for quality.
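A minimal sketch of the distinction, using a hypothetical article record (the field names are illustrative, not taken from the News Aggregator code):

```python
# Hypothetical article from an API: the title is present but empty.
article = {"title": "", "published_at": "not-a-date"}

# Defensive programming: never crashes, but bad data flows through silently.
title = article.get("title") or "Untitled"   # empty title becomes "Untitled"
date = article.get("published_at", "N/A")    # garbage string passes untouched

# Validation: enforce a requirement and reject data that fails it.
def validate_article(article):
    if not article.get("title", "").strip():
        return False, "Article title must be non-empty"
    return True, None

valid, error = validate_article(article)
print(valid, error)  # False Article title must be non-empty
```

The defensive version keeps the app running; the validating version tells you exactly what is wrong so you can drop or flag the record instead of displaying it.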

In this chapter, you'll learn the difference between data that is "safe" (won't crash your code) and data that is "valid" (meets your requirements). You'll build a three-layer validation approach that checks structure, content, and business rules, and you'll learn where validation belongs in your architecture so you enforce quality without scattering checks everywhere.

Chapter Roadmap

This chapter takes you from understanding why validation matters to building a production-ready validation system. You'll progress from manual checks through automated schemas to a hybrid approach used in real-world applications.

1. The Three-Layer Validation Pattern

Section 2 • Foundation

Learn the conceptual framework that organizes validation into three distinct layers: structural checks, content checks, and business rules. You'll see why scattered single-field checks don't scale and how a layered approach catches different categories of bad data.

Topics: Structural · Content · Business Rules

2. Manual Validation Implementation

Section 3 • Core Skills

Build each validation layer by hand using pure Python functions. You'll implement structural validators that check for required keys, content validators that enforce types and ranges, and business rule validators that catch domain-specific issues like -999 sensor error placeholders.

Topics: Validation Functions · Fail-Fast Patterns · Weather Dashboard

3. JSON Schema and the Hybrid Approach

Sections 4–5 • Automation

Discover how JSON Schema automates structural and content validation with declarative rules instead of manual code. Then combine schemas with manual functions in a hybrid approach: let schemas handle repetitive checks while keeping hand-written validators for complex business logic.

Topics: JSON Schema · jsonschema Library · Hybrid Validation · Error Messages

4. Validation Architecture

Section 6 • Design Patterns

Learn where validation belongs in your application architecture. You'll explore the three validation boundaries, understand when to fail fast versus fail gracefully, and build systematic tests that verify your validators catch every category of bad data.

Topics: Boundaries · Fail-Fast vs Graceful · Testing Validators

5. Practical Exercises

Section 7 • Hands-On Practice

Apply everything you've learned across four exercises: build a user profile validator from scratch, convert it to JSON Schema, fix a validation bug in the News Aggregator, and validate multi-day forecast data with cross-field business rules.

Topics: Profile Validator · Schema Conversion · News Aggregator · Forecast Validation

Learning Objectives

What You'll Master in This Chapter

By the end of this chapter, you'll be able to:

  • Explain the difference between error handling, defensive programming, and validation, and know when each one belongs
  • Implement three-layer validation (structural, content, business rules) to catch different types of data quality issues
  • Build manual validation functions with clear error messages and fail-fast patterns that keep downstream code simple
  • Use JSON Schema to automate structural and content validation while keeping manual functions for complex business logic
  • Apply the hybrid approach that production systems use: schemas for automation, manual code for domain rules
  • Place validation at the right architectural boundaries and test it systematically

Why Validation Matters

You might wonder: if defensive programming prevents crashes, why add validation? Three reasons make validation essential as your applications grow:

[Figure: Two views of the same request. Developer view: status 200 OK, request succeeded, no exception raised, error handling worked, app running, no crash detected. User view: a live Weather App for Paris, France showing -999°C, "Partly Cloudy", humidity "N/A", wind "null", and a feels-like temperature of 150°C.]

No errors. No crash. But your users are staring at -999°C.

1. Silent Corruption

Defensive programming with fallback values can hide quality issues. A missing author name becomes an empty string. A malformed timestamp becomes "N/A". Users see corrupted data but no errors, making problems harder to detect and debug.

2. Business Requirements

Some data isn't just "nice to have"; it's required for your application to function. A payment amount must be positive. An email must contain "@". A temperature must be physically possible. Defensive programming can't enforce these rules.
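As a sketch, the kinds of rules described above might look like this (a hypothetical payment record, not code the chapter builds on):

```python
def validate_payment(payment):
    """Enforce business requirements that a fallback value can't express."""
    amount = payment.get("amount")
    email = payment.get("email", "")

    # A payment amount must be a positive number
    if not isinstance(amount, (int, float)) or amount <= 0:
        return False, f"Payment amount must be positive, got: {amount}"

    # An email must at least contain "@"
    if "@" not in email:
        return False, f"Invalid email address: {email}"

    return True, None

print(validate_payment({"amount": -5, "email": "a@example.com"}))
# (False, 'Payment amount must be positive, got: -5')
```

No default value can rescue a negative payment; the only correct behavior is to reject it with a clear message.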

3. Clear Failures

When data doesn't meet requirements, validation fails fast with specific error messages. This beats silent corruption or mysterious behavior downstream. You know exactly what's wrong and where.

This chapter shows you how to add validation strategically, not replacing defensive programming but complementing it. You'll learn to validate at boundaries where quality matters while keeping defensive patterns throughout your codebase. The result is an application that is both resilient (handles unexpected situations) and reliable (produces quality output).

2. The Three-Layer Validation Pattern

Before building validators, you need to understand what you're validating against. The Weather Dashboard from Chapter 8 calls the Open-Meteo API and expects responses that look like this:

JSON - Expected Weather Response
{
  "current": {
    "temperature_2m": 22.5,
    "relative_humidity_2m": 65,
    "wind_speed_10m": 12.3,
    "weather_code": 0,
    "apparent_temperature": 21.8
  },
  "current_units": {
    "temperature_2m": "°C",
    "relative_humidity_2m": "%"
  }
}

Your code assumes this structure and uses the values directly. When the API sends exactly what you expect, everything works. But production APIs are messier than that.

When APIs Send Bad Data

External APIs don't guarantee data quality. Here's what actually shows up in production:

JSON - What Production APIs Actually Return
{
  "current": {
    "temperature_2m": -999,         // Sensor error placeholder
    "relative_humidity_2m": "N/A",  // String instead of number
    "wind_speed_10m": null,         // Missing data
    "weather_code": 71,             // Snow code
    "apparent_temperature": 150     // Impossible value
  }
}

Weather sensors malfunction and return placeholder values like -999. Format changes turn numbers into strings. Fields go missing entirely. The API returns technically valid JSON with a 200 status code, but the data is garbage.

Here's what happens when your code processes this without validation:

Python - Code Without Validation
def display_current_weather(weather_data, location_name):
    """Display current weather - crashes on bad data."""
    current = weather_data["current"]
    
    # Crashes if "current" missing (KeyError)
    temp = current["temperature_2m"]
    print(f"Temperature: {temp}°C")
    
    # Crashes if temperature is string (TypeError)
    if temp > 30:
        print("It's hot today!")
    
    # Shows nonsense if temp is -999 or 150
    # Users see: "Temperature: -999°C" or "Temperature: 150°C"

Without validation, you have three outcomes: crashes from type errors, crashes from missing fields, or silent corruption where impossible values flow through your application and confuse users.

Why Single-Field Checks Don't Scale

You might think: "I'll just add a check for each field." Here's what that looks like:

Python - The Repetition Problem
def validate_temperature(value):
    """Validate temperature field."""
    if value is None:
        return False, "Temperature missing"
    try:
        temp = float(value)
        if temp < -100 or temp > 60:
            return False, f"Temperature out of range: {temp}"
    except (ValueError, TypeError):
        return False, f"Temperature not numeric: {value}"
    return True, None

def validate_humidity(value):
    """Validate humidity field."""
    if value is None:
        return False, "Humidity missing"
    try:
        humidity = float(value)
        if humidity < 0 or humidity > 100:
            return False, f"Humidity out of range: {humidity}"
    except (ValueError, TypeError):
        return False, f"Humidity not numeric: {value}"
    return True, None

def validate_wind_speed(value):
    """Validate wind speed field."""
    if value is None:
        return False, "Wind speed missing"
    try:
        wind = float(value)
        if wind < 0 or wind > 200:
            return False, f"Wind speed out of range: {wind}"
    except (ValueError, TypeError):
        return False, f"Wind speed not numeric: {value}"
    return True, None

# You need similar functions for:
# - pressure, visibility, precipitation
# - cloud_cover, uv_index, wind_direction
# - and 10 more fields...
#
# That's roughly a dozen lines per field × 15 fields = 150+ lines
# Every API change means updating multiple validators
# New developers must learn your pattern for every field type

The pattern is identical for every field: check existence, verify type, validate range. Writing this manually is tedious, error-prone, and doesn't scale. Real APIs have dozens of fields spread across nested structures. You need a systematic approach.
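One common way to tame the repetition, before reaching for a library, is a validator factory that generates the existence/type/range pattern from a few parameters. This is a sketch, not code the chapter builds on:

```python
def make_range_validator(field_name, min_value, max_value):
    """Build a (is_valid, error_message) validator for one numeric field."""
    def validator(value):
        if value is None:
            return False, f"{field_name} missing"
        try:
            number = float(value)
        except (ValueError, TypeError):
            return False, f"{field_name} not numeric: {value}"
        if number < min_value or number > max_value:
            return False, f"{field_name} out of range: {number}"
        return True, None
    return validator

validate_temperature = make_range_validator("Temperature", -100, 60)
validate_humidity = make_range_validator("Humidity", 0, 100)

print(validate_temperature(-999))  # (False, 'Temperature out of range: -999.0')
print(validate_humidity(65))       # (True, None)
```

A factory removes the per-field boilerplate, but you are still left deciding which checks run, in what order, and where cross-field rules live. That is exactly what the layered pattern below organizes.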

Three-Layer Validation

Professional validation separates concerns into three distinct layers. Each layer catches different problems and answers a different question about your data:

Layer 1: Structure (shape right?) → Layer 2: Content (values possible?) → Layer 3: Business Rules (makes sense?)

The three-layer validation pattern — each layer catches different types of data quality issues

Layer 1: Structural Validation

Question: Is the shape right?

Checks that required fields exist and have correct types. Prevents crashes from accessing missing keys or wrong data types.

Example - Structural Checks
if not isinstance(data, dict):
    return False, "Weather data must be a dictionary"

if "current" not in data:
    return False, "Missing required 'current' section"

if not isinstance(data["current"], dict):
    return False, "'current' must be a dictionary"

What it catches: Missing sections, wrong types, structural changes

Layer 2: Content Validation

Question: Are values sensible?

Validates that field values are realistic and well-formed. Prevents logic errors from unrealistic or malformed data.

Example - Content Checks
if temp < -100 or temp > 60:
    return False, f"Unrealistic temperature: {temp}°C"

if humidity < 0 or humidity > 100:
    return False, f"Invalid humidity: {humidity}%"

if wind_speed < 0:
    return False, "Wind speed cannot be negative"

What it catches: Placeholder values, sensor errors, format corruption

Layer 3: Business Rules Validation

Question: Does it make sense together?

Checks cross-field logic and domain-specific constraints. Prevents acting on technically valid but operationally nonsensical data.

Example - Business Rule Checks
# Snow at warm temperatures doesn't make sense
if weather_code in [71, 73, 75] and temp > 5:
    return False, f"Snow at {temp}°C is unlikely"

# "Feels like" should be near actual temperature
if abs(apparent_temp - temp) > 20:
    return False, f"'Feels like' {apparent_temp}°C too far from {temp}°C"

What it catches: Logical inconsistencies, domain violations, impossible combinations

These three layers work together sequentially. Structural validation ensures safe access to fields. Content validation ensures individual values make sense. Business rules ensure the data makes sense as a whole. Each layer builds on the previous one.

Why Three Layers?

Separating concerns makes validation easier to build, test, and maintain. Structural checks are generic and can be automated. Content validation is field-specific but follows patterns. Business rules are unique to your domain and require custom logic.

This separation also determines when each validation runs. Structure first (prevent crashes), content second (prevent corruption), business rules last (prevent nonsense).

Takeaways & Next Step

Understanding the Pattern
  • Production APIs send bad data: Placeholder values, wrong types, missing fields, impossible values
  • Single-field checks don't scale: Real APIs need systematic validation across dozens of fields
  • Three layers, three concerns: Structure (shape), Content (values), Business Rules (logic)
  • Sequential validation: Each layer builds on the previous, preventing different failure modes
  • Clear separation: Makes testing easier, maintenance simpler, and automation possible

With the pattern understood, Section 3 implements all three layers manually for the Weather Dashboard. You'll see exactly how to build validators that catch every type of data quality issue.

3. Manual Validation Implementation

Section 2 introduced the three-layer pattern conceptually. Now let's implement it. You'll build three validators that work together as a validation pipeline, then integrate that pipeline into the Weather Dashboard.

Each validator follows the same signature: it receives data and returns (is_valid, error_message). This consistent interface makes validators easy to chain and test.
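That shared signature means validators can be driven generically. A small sketch of why it pays off (the helper and the toy validators below are illustrative, not part of the dashboard code):

```python
def run_validators(data, validators):
    """Run (is_valid, error) validators in order; stop at the first failure."""
    for validator in validators:
        valid, error = validator(data)
        if not valid:
            return False, error
    return True, None

# Two toy validators sharing the (is_valid, error_message) interface
def is_dict(data):
    return (True, None) if isinstance(data, dict) else (False, "not a dictionary")

def has_current(data):
    return (True, None) if "current" in data else (False, "missing 'current'")

print(run_validators({}, [is_dict, has_current]))
# (False, "missing 'current'")
print(run_validators({"current": {}}, [is_dict, has_current]))
# (True, None)
```

Because every validator looks the same from the outside, adding a fourth check is just appending a function to a list.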

Layer 1: Structural Validation

Structural validation ensures the response has the right shape. Check this first so you can safely access nested fields in later validators.

Weather Structure Validator
Python
def validate_weather_structure(data):
    """
    Layer 1: Validate weather response structure.
    Returns (is_valid, error_message).
    """
    # Root must be a dictionary
    if not isinstance(data, dict):
        return False, "Weather data must be a dictionary"
    
    # Must contain 'current' section
    if "current" not in data:
        return False, "Weather data missing 'current' section"
    
    # 'current' must be a dictionary
    if not isinstance(data["current"], dict):
        return False, "'current' section must be a dictionary"
    
    # Check for current_units (helpful but not critical)
    if "current_units" not in data:
        print("Warning: Missing units information")
    
    return True, None
What This Validates
  • Response is a dictionary (not string, list, or None)
  • Contains required 'current' section for current weather
  • 'current' is itself a dictionary (not array or primitive)
  • Warns about missing units (degraded but usable)

After structural validation passes, you know data["current"] exists and is a dictionary. You can safely check its contents in the next layer.

Layer 2: Content Validation

Content validation checks that individual field values are realistic. This catches placeholder values, sensor errors, and data corruption.

Weather Content Validator
Python
def validate_weather_content(current_data):
    """
    Layer 2: Validate weather content for realistic values.
    Returns (is_valid, error_message).
    """
    # Temperature is required
    if "temperature_2m" not in current_data:
        return False, "Missing required temperature data"
    
    temp = current_data["temperature_2m"]
    if temp is None:
        return False, "Temperature cannot be null"
    
    # Validate temperature is numeric and realistic
    try:
        temp_float = float(temp)
        if temp_float < -100 or temp_float > 60:
            return False, f"Unrealistic temperature: {temp_float}°C"
    except (ValueError, TypeError):
        return False, f"Temperature must be numeric, got: {temp}"
    
    # Optional field: humidity (if present, must be valid)
    if "relative_humidity_2m" in current_data:
        humidity = current_data["relative_humidity_2m"]
        if humidity is not None:
            try:
                humidity_float = float(humidity)
                if humidity_float < 0 or humidity_float > 100:
                    return False, f"Invalid humidity: {humidity_float}%"
            except (ValueError, TypeError):
                return False, f"Humidity must be numeric, got: {humidity}"
    
    # Optional field: wind speed (if present, must be valid)
    if "wind_speed_10m" in current_data:
        wind = current_data["wind_speed_10m"]
        if wind is not None:
            try:
                wind_float = float(wind)
                if wind_float < 0:
                    return False, f"Wind speed cannot be negative: {wind_float}"
                if wind_float > 200:
                    return False, f"Unrealistic wind speed: {wind_float} km/h"
            except (ValueError, TypeError):
                return False, f"Wind speed must be numeric, got: {wind}"
    
    return True, None
Required vs Optional Fields

Temperature is required—if it's missing or invalid, validation fails immediately. Humidity and wind speed are optional—they're validated only if present. This distinction lets you enforce critical requirements while gracefully handling incomplete data for non-essential fields.
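The optional-field pattern in validate_weather_content repeats for humidity and wind speed; if more optional fields appear, it can be factored into a helper. A sketch with assumed parameter names:

```python
def check_optional_numeric(data, key, min_value, max_value, label):
    """Validate an optional numeric field: absent or null passes,
    but a present value must be numeric and within range."""
    if key not in data or data[key] is None:
        return True, None
    try:
        number = float(data[key])
    except (ValueError, TypeError):
        return False, f"{label} must be numeric, got: {data[key]}"
    if number < min_value or number > max_value:
        return False, f"{label} out of range: {number}"
    return True, None

print(check_optional_numeric({}, "wind_speed_10m", 0, 200, "Wind speed"))
# (True, None)
print(check_optional_numeric({"wind_speed_10m": "fast"},
                             "wind_speed_10m", 0, 200, "Wind speed"))
# (False, 'Wind speed must be numeric, got: fast')
```

With this helper, the humidity and wind speed branches of the content validator each collapse to a single call.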

Layer 3: Business Rules Validation

Business rules validate cross-field logic and domain-specific constraints. These catch data that's individually valid but collectively nonsensical.

Weather Business Rules Validator
Python
def validate_weather_business_rules(current_data):
    """
    Layer 3: Validate weather business rules and logical consistency.
    Returns (is_valid, error_message).
    """
    temp = current_data.get("temperature_2m")
    weather_code = current_data.get("weather_code")
    
    # Business rule: Snow conditions should match temperature
    if temp is not None and weather_code is not None:
        try:
            temp_float = float(temp)
            code_int = int(weather_code)
            
            # Weather codes 71, 73, 75 indicate snow
            if code_int in [71, 73, 75] and temp_float > 5:
                return False, f"Snow at {temp_float}°C is unlikely"
                
        except (ValueError, TypeError):
            # Type errors already caught in content validation
            pass
    
    # Business rule: "Feels like" should be near actual temperature
    apparent_temp = current_data.get("apparent_temperature")
    if temp is not None and apparent_temp is not None:
        try:
            temp_float = float(temp)
            apparent_float = float(apparent_temp)
            
            # "Feels like" shouldn't differ by more than 20°C
            if abs(apparent_float - temp_float) > 20:
                return False, (
                    f"'Feels like' {apparent_float}°C too different "
                    f"from actual {temp_float}°C"
                )
        except (ValueError, TypeError):
            # Type errors already caught in content validation
            pass
    
    return True, None
Why Business Rules Come Last

Business rules assume fields exist and have correct types. That's why they come after structural and content validation. By the time business rules run, you know fields are present and numeric, so you can focus on logical relationships without worrying about type errors.

The Complete Validation Pipeline

Now chain all three validators into a pipeline. Each layer runs sequentially, stopping at the first failure. This "fail-fast" approach gives specific error messages and avoids redundant checks.

Validation Pipeline
Python
def validate_weather_data(data):
    """
    Complete validation pipeline: structure → content → business rules.
    Returns (is_valid, error_message).
    """
    # Layer 1: Structural validation
    valid, error = validate_weather_structure(data)
    if not valid:
        return False, f"Structure validation failed: {error}"
    
    # Layer 2: Content validation
    current = data["current"]  # Safe because structure validated
    valid, error = validate_weather_content(current)
    if not valid:
        return False, f"Content validation failed: {error}"
    
    # Layer 3: Business rules validation
    valid, error = validate_weather_business_rules(current)
    if not valid:
        return False, f"Business rule validation failed: {error}"
    
    # All three layers passed
    return True, None


# Test with valid data
good_data = {
    "current": {
        "temperature_2m": 22.5,
        "relative_humidity_2m": 65,
        "wind_speed_10m": 12.3,
        "weather_code": 0,
        "apparent_temperature": 21.8
    }
}

valid, error = validate_weather_data(good_data)
print(f"Valid: {valid}")  # Valid: True


# Test with bad data
bad_data = {
    "current": {
        "temperature_2m": -999,  # Placeholder value
        "relative_humidity_2m": 65
    }
}

valid, error = validate_weather_data(bad_data)
print(f"Valid: {valid}")
print(f"Error: {error}")
# Valid: False
# Error: Content validation failed: Unrealistic temperature: -999°C
Validation Pipeline — Fail-Fast Output
Output (example)
# Running validate_weather_data(bad_data)

Layer 1 — Structural validation  → PASS
Layer 2 — Content validation     → FAIL
Layer 3 — Business rules         → SKIPPED

# Valid: False
# Error: Content validation failed: Unrealistic temperature: -999°C
Fail-Fast Behavior

The pipeline stops at the first failure. If structure validation fails, content and business rules never run. If content validation fails, business rules don't run. This prevents cascading errors and gives precise feedback about which layer caught the problem.

Integrating Validation into Weather Dashboard

Now integrate the validation pipeline into the Weather Dashboard. Validate data immediately after fetching it, before using it anywhere in your application.

Weather Dashboard with Validation
Python
import requests

class ValidatedWeatherDashboard:
    """Weather dashboard with three-layer validation."""
    
    def __init__(self):
        self.weather_url = "https://api.open-meteo.com/v1/forecast"
    
    def get_weather_data(self, latitude, longitude):
        """Fetch and validate weather data."""
        print("Fetching weather data...")
        
        params = {
            "latitude": latitude,
            "longitude": longitude,
            "current": ["temperature_2m", "relative_humidity_2m", 
                       "wind_speed_10m", "weather_code", "apparent_temperature"],
            "timezone": "auto"
        }
        
        try:
            # Fetch data (error handling from Chapter 9)
            response = requests.get(self.weather_url, params=params, timeout=15)
            response.raise_for_status()
            data = response.json()
            
            # Validate data (new validation pipeline)
            valid, error = validate_weather_data(data)
            if not valid:
                print(f"\n❌ Data validation failed: {error}")
                return None
            
            print("✓ Data validated successfully")
            return data
            
        except requests.exceptions.RequestException as e:
            print(f"\n❌ Request failed: {e}")
            return None
    
    def display_weather(self, data):
        """Display validated weather data."""
        if not data:
            return
        
        current = data["current"]
        
        # After validation passes, we can trust this data
        temp = current["temperature_2m"]
        humidity = current.get("relative_humidity_2m", "N/A")
        wind = current.get("wind_speed_10m", "N/A")
        
        print(f"\n🌡️  Temperature: {temp}°C")
        print(f"💧 Humidity: {humidity}%")
        print(f"💨 Wind Speed: {wind} km/h")


# Usage
dashboard = ValidatedWeatherDashboard()

# Dublin coordinates
weather_data = dashboard.get_weather_data(53.3498, -6.2603)
dashboard.display_weather(weather_data)

Validation happens at the boundary where external data enters your system. After validation passes, the rest of your code can trust the data. No more defensive checks scattered throughout display logic, formatting functions, or business calculations.

Validation at the Boundary

Validate once at the entry point, then trust the data everywhere else. This keeps validation logic centralized and makes downstream code simpler. The display_weather function needs no type or range checks; the only defaults left are for optional fields that validation allows to be absent.

Takeaways & Next Step

Manual Validation System
  • Three validators, one pipeline: Structure → Content → Business rules, each with clear responsibility
  • Consistent interface: All validators return (is_valid, error_message) for easy chaining
  • Fail-fast pattern: Stop at first failure with specific error message
  • Validate at boundaries: Check data where it enters your system, trust it everywhere else
  • Required vs optional: Enforce critical fields, gracefully handle optional ones

You just wrote 80+ lines of validation code that works perfectly. Section 4 shows you how to replace most of that repetitive code with declarative schemas while keeping the manual business rules where they add value.

4. Automating Validation with JSON Schema

You just built a complete validation system with 80+ lines of Python. Most of that code follows mechanical patterns: check types, verify ranges, ensure fields exist. This repetition is exactly what computers excel at—and exactly what you shouldn't write manually.

JSON Schema lets you describe validation rules declaratively in JSON format. A validation library reads your schema and checks data automatically. You replace dozens of lines of Python with a concise schema definition.

What is JSON Schema?

JSON Schema is a standard for describing JSON data structure and constraints. Instead of writing Python functions that check types and ranges, you define rules in JSON format. Libraries like jsonschema read your schema and validate data against it automatically.

Here's the structural and content validation from Section 3 expressed as a JSON Schema:

Weather Data Schema
JSON Schema
{
  "type": "object",
  "required": ["current"],
  "properties": {
    "current": {
      "type": "object",
      "required": ["temperature_2m"],
      "properties": {
        "temperature_2m": {
          "type": "number",
          "minimum": -100,
          "maximum": 60
        },
        "relative_humidity_2m": {
          "type": "number",
          "minimum": 0,
          "maximum": 100
        },
        "wind_speed_10m": {
          "type": "number",
          "minimum": 0,
          "maximum": 200
        },
        "weather_code": {
          "type": "integer"
        },
        "apparent_temperature": {
          "type": "number"
        }
      }
    },
    "current_units": {
      "type": "object"
    }
  }
}
What This Schema Defines
  • Structure: Root must be object with required "current" field
  • Types: temperature_2m must be number, weather_code must be integer
  • Constraints: temperature between -100 and 60, humidity 0-100
  • Required vs optional: temperature_2m required, humidity optional

Compare this schema to the 60+ lines of manual structural and content validation you wrote in Section 3. The schema is shorter, clearer, and self-documenting. Anyone can read it and understand what valid data looks like.

Using JSON Schema in Python

The jsonschema library validates data against your schema automatically. Install it with pip install jsonschema, then use it like this:

Python
from jsonschema import validate, ValidationError

# Define schema (in practice, load from a JSON file)
weather_schema = {
    "type": "object",
    "required": ["current"],
    "properties": {
        "current": {
            "type": "object",
            "required": ["temperature_2m"],
            "properties": {
                "temperature_2m": {
                    "type": "number",
                    "minimum": -100,
                    "maximum": 60
                },
                "relative_humidity_2m": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 100
                },
                "wind_speed_10m": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 200
                }
            }
        }
    }
}

def validate_weather_with_schema(data):
    """Validate weather data using JSON Schema."""
    try:
        validate(instance=data, schema=weather_schema)
        return True, None
    except ValidationError as e:
        return False, e.message


# Test with valid data
good_data = {
    "current": {
        "temperature_2m": 22.5,
        "relative_humidity_2m": 65,
        "wind_speed_10m": 12.3
    }
}

valid, error = validate_weather_with_schema(good_data)
print(f"Valid: {valid}")  # Valid: True


# Test with invalid data
bad_data = {
    "current": {
        "temperature_2m": 150  # Too hot!
    }
}

valid, error = validate_weather_with_schema(bad_data)
print(f"Valid: {valid}")
print(f"Error: {error}")
# Valid: False
# Error: 150 is greater than the maximum of 60
What Just Happened

You replaced approximately 60 lines of manual validation code with a 20-line schema definition. The jsonschema library handles type checking, range validation, required fields, and nested structure automatically. The schema is also self-documenting—anyone can read it and understand what valid data looks like.

When Schemas Shine, When They Struggle

JSON Schema excels at structural and content validation but can't handle complex business logic. Here's what schemas do well and where they fall short:

Validation Type          Schema Capability    Example
Type checking            ✓ Excellent          temperature must be number
Range validation         ✓ Excellent          humidity between 0 and 100
Required fields          ✓ Excellent          temperature_2m is required
String patterns          ✓ Good               Email must match regex pattern
Cross-field logic        ✗ Limited            Snow at warm temperatures
Complex business rules   ✗ Cannot express     "Feels like" vs actual temperature

Schemas automate mechanical validation beautifully but can't replace domain logic. That's where the hybrid approach comes in: use schemas for structure and content, use manual functions for business rules.

Takeaways & Next Step

Schema-Based Validation
  • Declarative over imperative: Describe what's valid, not how to check it
  • Massive code reduction: 60+ lines of manual validation becomes 20-line schema
  • Self-documenting: Schema defines both validation rules and data structure documentation
  • Schemas excel at mechanics: Type checking, ranges, required fields, patterns
  • Schemas struggle with logic: Cross-field rules, domain constraints, complex conditions

Section 5 combines schemas with manual validation to get the best of both approaches. You'll see the pattern that most production systems use: automated validation for mechanical checks, custom code for business logic.

5. The Hybrid Approach

Here's what production systems actually do: they use schemas to automate structural and content validation, then add manual functions for business rules that schemas can't express. This hybrid approach gives you schema automation where it helps most and manual flexibility where you need it.

Let schemas handle the tedious work: type checking, range validation, required fields. Write manual code only for logic schemas can't express: cross-field rules, domain constraints, complex conditions. You get less code to maintain with clearer separation of concerns.

Combining Schema and Manual Validation

The hybrid validator runs schema validation first (structure and content), then manual validation second (business rules). Each handles what it does best.

Hybrid Validation Pipeline
Python
from jsonschema import validate, ValidationError

# Schema handles structure and content
weather_schema = {
    "type": "object",
    "required": ["current"],
    "properties": {
        "current": {
            "type": "object",
            "required": ["temperature_2m"],
            "properties": {
                "temperature_2m": {
                    "type": "number",
                    "minimum": -100,
                    "maximum": 60
                },
                "relative_humidity_2m": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 100
                },
                "wind_speed_10m": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 200
                },
                "weather_code": {"type": "integer"},
                "apparent_temperature": {"type": "number"}
            }
        }
    }
}

def validate_business_rules(current_data):
    """Manual validation for business rules only."""
    temp = current_data.get("temperature_2m")
    weather_code = current_data.get("weather_code")
    apparent_temp = current_data.get("apparent_temperature")
    
    # Business rule: Snow at warm temperatures
    if temp is not None and weather_code is not None:
        if weather_code in [71, 73, 75] and temp > 5:
            return False, f"Snow at {temp}°C is unlikely"
    
    # Business rule: "Feels like" vs actual temperature
    if temp is not None and apparent_temp is not None:
        if abs(apparent_temp - temp) > 20:
            return False, (
                f"'Feels like' {apparent_temp}°C too different "
                f"from actual {temp}°C"
            )
    
    return True, None


def validate_weather_hybrid(data):
    """
    Hybrid approach: Schema for structure/content, manual for business rules.
    Returns (is_valid, error_message).
    """
    # Step 1: Schema validation (automated)
    try:
        validate(instance=data, schema=weather_schema)
    except ValidationError as e:
        return False, f"Schema validation failed: {e.message}"
    
    # Step 2: Business rules (manual)
    current = data["current"]
    valid, error = validate_business_rules(current)
    if not valid:
        return False, f"Business rule validation failed: {error}"
    
    return True, None


# Test with valid data
good_data = {
    "current": {
        "temperature_2m": 22.5,
        "relative_humidity_2m": 65,
        "wind_speed_10m": 12.3,
        "weather_code": 0,
        "apparent_temperature": 21.8
    }
}

valid, error = validate_weather_hybrid(good_data)
print(f"Valid: {valid}")  # Valid: True


# Test with business rule violation
bad_data = {
    "current": {
        "temperature_2m": 15.0,  # Warm temperature
        "weather_code": 71  # Snow code - doesn't make sense!
    }
}

valid, error = validate_weather_hybrid(bad_data)
print(f"Valid: {valid}")
print(f"Error: {error}")
# Valid: False
# Error: Business rule validation failed: Snow at 15.0°C is unlikely
Best of Both Worlds
  • Schema handles: Type checking, range validation, required fields, structure
  • Manual functions handle: Cross-field logic, domain rules, complex conditions
  • Result: Less code to maintain, clearer validation rules, easier updates

This is the approach most production systems converge on. Schemas eliminate repetitive code. Manual functions provide flexibility for complex rules. You get automation and control in the right proportions.

When to Use Which Approach

The right validation approach depends on your API complexity, team size, and maintenance requirements. Here's how to decide:

Scenario                              Best Approach             Rationale
Single API, simple structure          Manual validation         Schema overhead isn't worth it at small scale
Multiple APIs with similar structure  JSON Schema               Reuse schemas across APIs, reduce duplication
Complex business rules                Hybrid (schema + manual)  Schema for structure/content, manual for domain logic
Rapidly changing API                  JSON Schema               Update schemas without changing code
Team with junior developers           JSON Schema               Declarative schemas are easier to read than validation code
Performance-critical path             Manual validation         Avoid schema-parsing overhead in hot paths
Building an SDK/library               Hybrid (schema + manual)  Users can extend schemas for their own needs

Most applications eventually need the hybrid approach. Start with manual validation for simple cases. Add schemas when repetition becomes painful. Keep manual functions for business logic that schemas can't express.

Writing Good Error Messages

Validation error messages serve two audiences: developers debugging issues and operations teams monitoring production. Good messages specify what failed, why it matters, and what to do about it.

1.

Be Specific

Don't say "validation failed." Say what failed and why.

Example
# Bad
return False, "Invalid temperature"

# Good
return False, f"Temperature {temp}°C outside valid range (-100 to 60)"
2.

Include Context

Show the actual value that failed, not just that something failed.

Example
# Bad
return False, "Unrealistic humidity"

# Good
return False, f"Humidity {humidity}% exceeds maximum of 100%"
3.

Distinguish Layers

Prefix messages with which layer failed for easier debugging.

Example
# Each layer identifies itself
return False, "Structure validation failed: Missing 'current' section"
return False, "Content validation failed: Temperature must be numeric"
return False, "Business rule violated: Snow at 15°C is unlikely"
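These principles apply to schema errors too. The jsonschema library's validator classes expose an iter_errors method that yields every violation instead of stopping at the first, and each error carries the path to the failing field. A sketch of turning that into layer-prefixed, context-rich messages:

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "required": ["temperature_2m"],
    "properties": {
        "temperature_2m": {"type": "number", "minimum": -100, "maximum": 60},
        "relative_humidity_2m": {"type": "number", "minimum": 0, "maximum": 100},
    },
}

bad = {"temperature_2m": 150, "relative_humidity_2m": "high"}

messages = []
for err in Draft7Validator(schema).iter_errors(bad):
    # err.absolute_path locates the failing field; err.message says why
    path = ".".join(str(p) for p in err.absolute_path) or "<root>"
    messages.append(f"Content validation failed at '{path}': {err.message}")

for msg in sorted(messages):
    print(msg)
```

Reporting all failures at once spares the caller a fix-one-resubmit loop, and the field path plus the actual offending value satisfy the "be specific" and "include context" rules above.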

Performance Considerations

Validation adds overhead. For most applications, this cost is negligible compared to network I/O. But in performance-critical paths, validation can matter.

When Validation Cost Matters

Schema validation is slower than manual: Parsing schemas and running generic checks costs more than targeted Python code. In hot paths processing thousands of requests per second, this difference matters.

Solutions: Cache compiled schemas, validate once at boundaries and pass validated data downstream, or use manual validation in performance-critical code paths.

When it doesn't matter: API requests typically take 100-500ms. Validation adds 1-5ms. In normal API integration code, this overhead is insignificant.
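The schema-caching advice can be sketched in a few lines: build the validator object once at module load, then reuse it on every request. With jsonschema, constructing a Draft7Validator up front avoids re-processing the schema dict the way repeated validate() calls do.

```python
from jsonschema import Draft7Validator

weather_schema = {
    "type": "object",
    "required": ["current"],
    "properties": {"current": {"type": "object"}},
}

# Compiled once at import time, reused on the hot path
_WEATHER_VALIDATOR = Draft7Validator(weather_schema)

def is_valid_response(data):
    """Fast boolean check against the pre-built validator."""
    return _WEATHER_VALIDATOR.is_valid(data)

print(is_valid_response({"current": {}}))      # True
print(is_valid_response({"current": "oops"}))  # False
```

is_valid returns a plain boolean for hot paths that only need pass/fail; keep iter_errors or validate for the boundary where you produce error messages.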

Takeaways & Next Step

The Production Pattern
  • Hybrid approach dominates: Schemas for automation, manual code for domain logic
  • Clear division: Schemas handle mechanics (types, ranges, structure), manual handles business rules
  • Decision framework: Start manual, add schemas when repetition hurts, keep both for complex systems
  • Good error messages: Specific, include context, identify which layer failed
  • Performance awareness: Validation cost matters in hot paths, negligible for normal API calls

With validation implemented, Section 6 covers where to put it in your architecture, how to test it systematically, and how to integrate it with the error handling patterns from Chapter 9.

6. Where Validation Lives

Validation belongs at architectural boundaries—places where data crosses from one system or layer to another. Validate once at the boundary, then trust the data everywhere downstream. This keeps validation centralized and downstream code simple.

The key question isn't "should I validate" but "where should I validate." Put validation in the wrong place and you'll either miss problems (validate too late) or create redundant checks (validate too often).

The Three Validation Boundaries

Most applications have three natural validation boundaries. Each serves a different purpose and catches different problems:

1.

API Client Layer

When: Immediately after receiving API response

What to validate: Structure and content (Layers 1 and 2)

Why here: Catch API provider issues before they spread through your system

Python - API Client Validation
import requests
from jsonschema import ValidationError

class WeatherAPIClient:
    """Client for Open-Meteo API with validation."""
    
    def __init__(self):
        self.url = "https://api.open-meteo.com/v1/forecast"
    
    def fetch_weather(self, latitude, longitude):
        """Fetch and validate weather data."""
        params = {
            "latitude": latitude,
            "longitude": longitude,
            "current": "temperature_2m,relative_humidity_2m"
        }
        response = requests.get(self.url, params=params)
        response.raise_for_status()
        data = response.json()
        
        # Validate at the API boundary
        valid, error = validate_weather_structure_and_content(data)
        if not valid:
            raise ValidationError(f"Invalid API response: {error}")
        
        return data  # Downstream code can trust this structure
2.

Service Layer

When: Before processing data for business logic

What to validate: Business rules (Layer 3)

Why here: Enforce domain constraints before data affects business operations

Python - Service Layer Validation
import logging

logger = logging.getLogger(__name__)

class WeatherService:
    """Service layer with business rule validation."""
    
    def __init__(self, api_client):
        self.api_client = api_client
    
    def get_current_conditions(self, latitude, longitude):
        """Get weather with business rule enforcement."""
        # Data already structurally validated by the API client
        weather_data = self.api_client.fetch_weather(latitude, longitude)
        
        # Validate business rules
        current = weather_data["current"]
        valid, error = validate_weather_business_rules(current)
        if not valid:
            logger.warning(f"Business rule violation: {error}")
            # Decide: return partial data or raise an error
        
        return self.format_weather(weather_data)
3.

Application Layer

When: Before displaying or storing data

What to validate: Application-specific requirements

Why here: Ensure data meets UI or storage constraints

Python - Application Layer Validation
class WeatherDashboard:
    """Application with display-specific validation."""
    
    def display_weather(self, weather_data):
        """Display weather with UI-specific checks."""
        current = weather_data["current"]
        temp = current["temperature_2m"]
        
        # Application-specific validation
        if temp < -50:
            # Too cold to display outdoor activities
            return self.show_extreme_cold_warning()
        
        # Normal display
        return self.render_current_weather(current)
Validation Layering

Each boundary validates different concerns. API client ensures structural validity. Service layer enforces business rules. Application layer handles display/storage requirements. Data flows through all three, gaining more trust at each boundary.
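That flow can be condensed into one sketch (helper names are hypothetical): each boundary checks its own concern exactly once, and data that survives a boundary is trusted by everything after it.

```python
def check_structure(data):
    """Boundary 1 concern (API client): the response has the expected shape."""
    return isinstance(data, dict) and isinstance(data.get("current"), dict)

def check_business_rules(current):
    """Boundary 2 concern (service layer): the values make domain sense."""
    code = current.get("weather_code")
    temp = current.get("temperature_2m")
    return not (code in (71, 73, 75) and temp is not None and temp > 5)

def render_weather(raw):
    """Data gains trust as it passes each boundary once."""
    if not check_structure(raw):
        raise ValueError("Structure validation failed: missing 'current'")
    current = raw["current"]
    if not check_business_rules(current):
        raise ValueError("Business rule violated: snow at warm temperature")
    # Boundary 3 concern (application layer): display constraints
    temp = current.get("temperature_2m")
    if temp is not None and temp < -50:
        return "extreme cold warning"
    return "normal display"

print(render_weather({"current": {"temperature_2m": 22.5, "weather_code": 0}}))
# normal display
```

Note what's absent: no layer re-checks what an earlier layer already verified, which is exactly the "validate once at the boundary" payoff.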

Fail-Fast vs Fail-Graceful

When validation fails, you have two choices: fail-fast (raise error immediately) or fail-graceful (log warning and continue with partial data). Choose based on how critical the data is.

Data Criticality             Strategy                         Example
Essential for operation      Fail-fast (raise error)          Payment amount, user authentication
Important but not critical   Fail-graceful (log + fallback)   Weather "feels like" temperature
Enhancement only             Fail-graceful (log + skip)       Weather icons, background images
Python - Fail-Fast vs Fail-Graceful
def get_weather_with_validation(latitude, longitude):
    """Fetch weather with appropriate failure handling."""
    data = fetch_weather_data(latitude, longitude)
    
    # Fail-fast for critical data
    valid, error = validate_temperature(data["current"]["temperature_2m"])
    if not valid:
        raise ValidationError(f"Critical field invalid: {error}")
    
    # Fail-graceful for enhancements
    valid, error = validate_weather_icon(data["current"].get("weather_code"))
    if not valid:
        logger.warning(f"Optional field invalid: {error}")
        data["current"]["weather_code"] = None  # Use fallback icon
    
    return data

Testing Your Validators

Validators are code like any other—they need tests. Good validator tests cover three scenarios: valid data passes, invalid data fails with correct errors, and edge cases are handled properly.

Testing the Weather Validator
Python - Validator Tests
def test_valid_data_passes():
    """Valid weather data should pass all validation layers."""
    data = {
        "current": {
            "temperature_2m": 22.5,
            "relative_humidity_2m": 65,
            "wind_speed_10m": 12.3,
            "weather_code": 0
        }
    }
    
    valid, error = validate_weather_data(data)
    assert valid is True
    assert error is None


def test_missing_section_fails():
    """Missing 'current' section should fail structural validation."""
    data = {"other_section": {}}
    
    valid, error = validate_weather_data(data)
    assert valid is False
    assert "current" in error.lower()


def test_invalid_temperature_fails():
    """Out-of-range temperature should fail content validation."""
    data = {
        "current": {
            "temperature_2m": 150  # Too hot
        }
    }
    
    valid, error = validate_weather_data(data)
    assert valid is False
    assert "temperature" in error.lower()


def test_business_rule_violation_fails():
    """Snow at warm temperature should fail business rules."""
    data = {
        "current": {
            "temperature_2m": 15,
            "weather_code": 71  # Snow code
        }
    }
    
    valid, error = validate_weather_data(data)
    assert valid is False
    assert "snow" in error.lower()


def test_edge_cases():
    """Edge case values should be handled correctly."""
    # Exact boundary values
    data = {"current": {"temperature_2m": -100}}  # Minimum valid
    valid, _ = validate_weather_data(data)
    assert valid is True
    
    data = {"current": {"temperature_2m": 60}}  # Maximum valid
    valid, _ = validate_weather_data(data)
    assert valid is True
    
    # Just outside boundaries
    data = {"current": {"temperature_2m": -100.1}}
    valid, error = validate_weather_data(data)
    assert valid is False
Test Coverage for Validators
  • Valid data: Confirm validators don't reject good data
  • Each layer: Test structural, content, and business rule failures separately
  • Edge cases: Boundary values, None, empty strings, extreme ranges
  • Error messages: Verify errors are specific and helpful

Takeaways & Next Step

Validation Architecture
  • Validate at boundaries: API client (structure/content), service layer (business rules), application layer (display/storage)
  • Choose failure strategy: Fail-fast for critical data, fail-graceful for enhancements
  • Test systematically: Valid data, invalid data, edge cases, error messages
  • Centralize validation: Validate once at boundaries, trust data downstream
  • Layer appropriately: Each boundary validates different concerns, data gains trust as it flows through

Section 7 provides practical exercises to reinforce these patterns. You'll validate real-world data, fix bugs in the News Aggregator, and apply validation to your own API integrations.

7. Practical Application

Time to apply what you've learned. These exercises build on each other, starting with simple validators and progressing to complete validation systems. Work through them in order—each reinforces patterns from earlier sections.

Exercise 1: Build a User Profile Validator

Create a manual three-layer validator for user profile data. This exercise reinforces the structural → content → business rules pattern.

User Profile Data
JSON - Expected Structure
{
  "user": {
    "id": 12345,
    "email": "user@example.com",
    "age": 28,
    "username": "john_doe",
    "preferences": {
      "newsletter": true,
      "theme": "dark"
    }
  }
}

Requirements:

  • Structure: Root must have "user" object with id, email, age, username
  • Content: Email must contain "@", age between 13 and 120, username 3 to 20 characters
  • Business rules: If newsletter is true, email cannot be from disposable domains (tempmail, guerrillamail)

Task: Build validate_user_profile(data) following the three-layer pattern.

Show Solution
Python
def validate_user_profile_structure(data):
    """Layer 1: Structural validation."""
    if not isinstance(data, dict):
        return False, "Data must be a dictionary"
    
    if "user" not in data:
        return False, "Missing 'user' object"
    
    user = data["user"]
    if not isinstance(user, dict):
        return False, "'user' must be a dictionary"
    
    required = ["id", "email", "age", "username"]
    for field in required:
        if field not in user:
            return False, f"Missing required field: {field}"
    
    return True, None


def validate_user_profile_content(user):
    """Layer 2: Content validation."""
    # Email format
    email = user["email"]
    if "@" not in email:
        return False, "Email must contain '@'"
    
    # Age range
    try:
        age = int(user["age"])
        if age < 13 or age > 120:
            return False, f"Age {age} outside valid range (13-120)"
    except (ValueError, TypeError):
        return False, f"Age must be integer, got: {user['age']}"
    
    # Username length
    username = user["username"]
    if not isinstance(username, str):
        return False, "Username must be string"
    if len(username) < 3 or len(username) > 20:
        return False, f"Username length {len(username)} outside valid range (3-20)"
    
    return True, None


def validate_user_profile_business_rules(user):
    """Layer 3: Business rules."""
    disposable_domains = ["tempmail.com", "guerrillamail.com"]
    
    preferences = user.get("preferences", {})
    if preferences.get("newsletter"):
        email = user["email"]
        domain = email.split("@")[-1]
        if domain in disposable_domains:
            return False, f"Newsletter requires non-disposable email (got {domain})"
    
    return True, None


def validate_user_profile(data):
    """Complete validation pipeline."""
    valid, error = validate_user_profile_structure(data)
    if not valid:
        return False, f"Structure: {error}"
    
    user = data["user"]
    
    valid, error = validate_user_profile_content(user)
    if not valid:
        return False, f"Content: {error}"
    
    valid, error = validate_user_profile_business_rules(user)
    if not valid:
        return False, f"Business rule: {error}"
    
    return True, None


# Test
test_data = {
    "user": {
        "id": 12345,
        "email": "user@example.com",
        "age": 28,
        "username": "john_doe",
        "preferences": {"newsletter": True, "theme": "dark"}
    }
}

valid, error = validate_user_profile(test_data)
print(f"Valid: {valid}, Error: {error}")

Exercise 2: Convert to JSON Schema

Take the user profile validator from Exercise 1 and convert structural/content validation to JSON Schema. Keep business rules as manual validation.

Task: Create a schema that validates structure and content, then combine it with the business rules validator from Exercise 1.

Show Solution
Python
from jsonschema import validate, ValidationError

user_schema = {
    "type": "object",
    "required": ["user"],
    "properties": {
        "user": {
            "type": "object",
            "required": ["id", "email", "age", "username"],
            "properties": {
                "id": {"type": "integer"},
                "email": {
                    "type": "string",
                    "pattern": "^.*@.*$"
                },
                "age": {
                    "type": "integer",
                    "minimum": 13,
                    "maximum": 120
                },
                "username": {
                    "type": "string",
                    "minLength": 3,
                    "maxLength": 20
                },
                "preferences": {
                    "type": "object",
                    "properties": {
                        "newsletter": {"type": "boolean"},
                        "theme": {
                            "type": "string",
                            "enum": ["light", "dark"]
                        }
                    }
                }
            }
        }
    }
}


def validate_user_hybrid(data):
    """Hybrid validation: schema + manual business rules."""
    # Schema handles structure and content
    try:
        validate(instance=data, schema=user_schema)
    except ValidationError as e:
        return False, f"Schema validation failed: {e.message}"
    
    # Manual business rules
    user = data["user"]
    valid, error = validate_user_profile_business_rules(user)
    if not valid:
        return False, error
    
    return True, None


# Test
valid, error = validate_user_hybrid(test_data)
print(f"Valid: {valid}, Error: {error}")

Exercise 3: Fix the News Aggregator Bug

The News Aggregator from Chapter 11 has a validation bug where empty article titles pass through. Fix the normalizer to reject articles with empty titles.

Python - Buggy Normalizer
def normalize_newsapi(response):
    """Transform NewsAPI response - has a validation bug."""
    articles = []
    
    for item in response.get("articles", []):
        title = item.get("title", "").strip()
        url = item.get("url", "").strip()
        
        # BUG: This only checks URL!
        if not url:
            continue
        
        articles.append({
            "title": title,
            "url": url,
            "published_at": item.get("publishedAt", ""),
            "source": "NewsAPI"
        })
    
    return articles

Task: Write a test that catches this bug, then fix the normalizer to validate both title and URL.

Show Solution
Python - Test & Fix
# Test that catches the bug
def test_empty_title_rejected():
    """Articles with empty titles should be rejected."""
    response = {
        "articles": [
            {
                "title": "",  # Empty title
                "url": "https://example.com/article",
                "publishedAt": "2025-01-15T10:00:00Z"
            }
        ]
    }
    
    articles = normalize_newsapi(response)
    assert len(articles) == 0, "Empty title should be rejected"


# Fixed normalizer
def normalize_newsapi(response):
    """Transform NewsAPI response - fixed validation."""
    articles = []
    
    for item in response.get("articles", []):
        title = item.get("title", "").strip()
        url = item.get("url", "").strip()
        
        # FIX: Check BOTH title and url
        if not title or not url:
            continue
        
        articles.append({
            "title": title,
            "url": url,
            "published_at": item.get("publishedAt", ""),
            "source": "NewsAPI"
        })
    
    return articles


# Test now passes
test_empty_title_rejected()
print("✓ Test passed: Empty titles are rejected")

Exercise 4: Validate Forecast Data

Build a validator for weather forecast data that enforces cross-field business rules.

JSON - Forecast Structure
{
  "daily": {
    "time": ["2025-01-15", "2025-01-16", "2025-01-17"],
    "temperature_2m_max": [12.5, 14.2, 13.8],
    "temperature_2m_min": [6.1, 7.3, 8.0]
  }
}

Business Rules:

  • All arrays must have same length
  • Dates must be sequential (no gaps)
  • temperature_2m_max >= temperature_2m_min for each day
  • No temperature changes > 30°C between consecutive days

Task: Implement validate_forecast_data(data) that checks these rules.

Show Solution
Python
from datetime import datetime

def validate_forecast_data(data):
    """Validate forecast data with cross-field business rules."""
    
    # Structure
    if "daily" not in data:
        return False, "Missing 'daily' section"
    
    daily = data["daily"]
    required = ["time", "temperature_2m_max", "temperature_2m_min"]
    for field in required:
        if field not in daily:
            return False, f"Missing required field: {field}"
    
    times = daily["time"]
    temp_max = daily["temperature_2m_max"]
    temp_min = daily["temperature_2m_min"]
    
    # Business Rule 1: Same length
    if not (len(times) == len(temp_max) == len(temp_min)):
        return False, (
            f"Array length mismatch: times={len(times)}, "
            f"max={len(temp_max)}, min={len(temp_min)}"
        )
    
    if len(times) == 0:
        return False, "Empty forecast arrays"
    
    # Business Rule 2: Sequential dates
    prev_date = None
    for date_str in times:
        try:
            date = datetime.strptime(date_str, "%Y-%m-%d")
            if prev_date and (date - prev_date).days != 1:
                return False, f"Date gap: {prev_date.date()} to {date.date()}"
            prev_date = date
        except ValueError:
            return False, f"Invalid date format: {date_str}"
    
    # Business Rule 3: Max >= Min
    for i, (tmax, tmin) in enumerate(zip(temp_max, temp_min)):
        if tmax < tmin:
            return False, f"Day {i}: max {tmax}°C < min {tmin}°C"
    
    # Business Rule 4: No extreme jumps
    for i in range(1, len(temp_max)):
        prev_avg = (temp_max[i-1] + temp_min[i-1]) / 2
        curr_avg = (temp_max[i] + temp_min[i]) / 2
        change = abs(curr_avg - prev_avg)
        
        if change > 30:
            return False, (
                f"Extreme temp change between day {i-1} and {i}: "
                f"{change:.1f}°C"
            )
    
    return True, None


# Test
valid_forecast = {
    "daily": {
        "time": ["2025-01-15", "2025-01-16", "2025-01-17"],
        "temperature_2m_max": [12.5, 14.2, 13.8],
        "temperature_2m_min": [6.1, 7.3, 8.0]
    }
}

valid, error = validate_forecast_data(valid_forecast)
print(f"Valid: {valid}, Error: {error}")

Key Takeaways

What You've Mastered
  • Three-layer pattern: Structural, content, business rules—each catches different problems
  • Manual vs schema: Manual for flexibility, schemas for automation, hybrid for production
  • Architectural boundaries: Validate at API client, service layer, application layer
  • Failure strategies: Fail-fast for critical data, fail-graceful for enhancements
  • Testing validators: Valid data, invalid data, edge cases, error messages
  • Production patterns: You can now build validation systems that scale

Looking Forward

In Chapter 13 (Production-Ready Weather Dashboard Capstone), you'll apply both error handling from Chapter 9 and validation from this chapter across multiple APIs simultaneously. You'll see how these patterns work together when coordinating data from different sources, each with its own failure modes and data quality issues.

You now understand that reliable API integration requires two complementary approaches: error handling ensures requests complete successfully, and validation ensures the responses contain usable data. Together, they create applications that users can depend on.