Defensive programming prevents crashes. Validation prevents bad data.
Your News Aggregator from Chapter 11 works. Articles display, searches run, and your defensive patterns prevent crashes when APIs return missing or malformed fields.
But "working" is not the same as "reliable." Use the aggregator daily and quality problems show up. Articles have empty titles. Dates display as "Invalid Date." Placeholder text slips through. A story dated "2099" jumps to the top of the feed.
This is the limit of defensive programming. Fallback values and safe checks keep your app running, but they can also allow bad data to pass quietly. Defensive programming protects your code. Validation protects your output.
Defensive programming (Chapters 9 to 11) keeps your application stable when data is missing, malformed, or unexpected.
Validation (this chapter) enforces rules and rejects or flags data that does not meet your requirements.
You need both. Defensive programming for resilience, validation for quality.
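The distinction shows up in just a few lines of code. A minimal sketch (the field names are illustrative): the defensive version substitutes a fallback and keeps going, while the validating version rejects the record outright.

```python
def display_title_defensive(article):
    """Defensive: never crashes, but an empty title slips through silently."""
    title = article.get("title", "")  # fallback hides the problem
    print(title)

def validate_title(article):
    """Validation: rejects articles whose title is missing or empty."""
    title = article.get("title")
    if not isinstance(title, str) or not title.strip():
        return False, "Article title is missing or empty"
    return True, None

article = {"title": ""}
display_title_defensive(article)   # runs fine, prints a blank line
valid, error = validate_title(article)
print(valid, error)                # False Article title is missing or empty
```

Both behaviors are correct; they answer different questions. The defensive version asks "can I keep running?", the validating version asks "is this data good enough to show?"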
In this chapter, you'll learn the difference between data that is "safe" (won't crash your code) and data that is "valid" (meets your requirements). You'll build a three-layer validation approach that checks structure, content, and business rules, and you'll learn where validation belongs in your architecture so you enforce quality without scattering checks everywhere.
Chapter Roadmap
This chapter takes you from understanding why validation matters to building a production-ready validation system. You'll progress from manual checks through automated schemas to a hybrid approach used in real-world applications.
The Three-Layer Validation Pattern
Learn the conceptual framework that organises validation into three distinct layers: structural checks, content checks, and business rules. You'll see why scattered single-field checks don't scale and how a layered approach catches different categories of bad data.
Manual Validation Implementation
Build each validation layer by hand using pure Python functions. You'll implement structural validators that check for required keys, content validators that enforce types and ranges, and business rule validators that catch domain-specific issues like -999 sensor error placeholders.
JSON Schema and the Hybrid Approach
Discover how JSON Schema automates structural and content validation with declarative rules instead of manual code. Then combine schemas with manual functions in a hybrid approach: let schemas handle repetitive checks while keeping hand-written validators for complex business logic.
Validation Architecture
Learn where validation belongs in your application architecture. You'll explore the three validation boundaries, understand when to fail fast versus fail gracefully, and build systematic tests that verify your validators catch every category of bad data.
Practical Exercises
Apply everything you've learned across four exercises: build a user profile validator from scratch, convert it to JSON Schema, fix a validation bug in the News Aggregator, and validate multi-day forecast data with cross-field business rules.
Learning Objectives
What You'll Master in This Chapter
By the end of this chapter, you'll be able to:
- Explain the difference between error handling, defensive programming, and validation, and know when each one belongs
- Implement three-layer validation (structural, content, business rules) to catch different types of data quality issues
- Build manual validation functions with clear error messages and fail-fast patterns that keep downstream code simple
- Use JSON Schema to automate structural and content validation while keeping manual functions for complex business logic
- Apply the hybrid approach that production systems use: schemas for automation, manual code for domain rules
- Place validation at the right architectural boundaries and test it systematically
Why Validation Matters
You might wonder: if defensive programming prevents crashes, why add validation? Three reasons make validation essential as your applications grow:
Silent Corruption
Defensive programming with fallback values can hide quality issues. A missing author name becomes an empty string. A malformed timestamp becomes "N/A". A sensor placeholder sails straight to the screen: no errors, no crash, but your users are staring at -999°C. Users see corrupted data without any exceptions, which makes problems harder to detect and debug.
Business Requirements
Some data isn't just "nice to have"; it's required for your application to function. A payment amount must be positive. An email must contain "@". A temperature must be physically possible. Defensive programming can't enforce these rules.
Clear Failures
When data doesn't meet requirements, validation fails fast with specific error messages. This beats silent corruption or mysterious behavior downstream. You know exactly what's wrong and where.
This chapter shows you how to add validation strategically, not replacing defensive programming but complementing it. You'll learn to validate at boundaries where quality matters while keeping defensive patterns throughout your codebase. The result is an application that is both resilient (handles unexpected situations) and reliable (produces quality output).
2. The Three-Layer Validation Pattern
Before building validators, you need to understand what you're validating against. The Weather Dashboard from Chapter 8 calls the Open-Meteo API and expects responses that look like this:
{
"current": {
"temperature_2m": 22.5,
"relative_humidity_2m": 65,
"wind_speed_10m": 12.3,
"weather_code": 0,
"apparent_temperature": 21.8
},
"current_units": {
"temperature_2m": "°C",
"relative_humidity_2m": "%"
}
}
Your code assumes this structure and uses the values directly. When the API sends exactly what you expect, everything works. But production APIs are messier than that.
When APIs Send Bad Data
External APIs don't guarantee data quality. Here's what actually shows up in production:
{
"current": {
"temperature_2m": -999, // Sensor error placeholder
"relative_humidity_2m": "N/A", // String instead of number
"wind_speed_10m": null, // Missing data
"weather_code": 71, // Snow code
"apparent_temperature": 150 // Impossible value
}
}
Weather sensors malfunction and return placeholder values like -999. Format changes turn numbers into strings. Fields go missing entirely. The API returns technically valid JSON with a 200 status code, but the data is garbage.
Here's what happens when your code processes this without validation:
def display_current_weather(weather_data, location_name):
"""Display current weather - crashes on bad data."""
current = weather_data["current"]
# Crashes if "current" missing (KeyError)
temp = current["temperature_2m"]
print(f"Temperature: {temp}°C")
# Crashes if temperature is string (TypeError)
if temp > 30:
print("It's hot today!")
# Shows nonsense if temp is -999 or 150
# Users see: "Temperature: -999°C" or "Temperature: 150°C"
Without validation, you have three outcomes: crashes from type errors, crashes from missing fields, or silent corruption where impossible values flow through your application and confuse users.
Why Single-Field Checks Don't Scale
You might think: "I'll just add a check for each field." Here's what that looks like:
def validate_temperature(value):
"""Validate temperature field."""
if value is None:
return False, "Temperature missing"
try:
temp = float(value)
if temp < -100 or temp > 60:
return False, f"Temperature out of range: {temp}"
except (ValueError, TypeError):
return False, f"Temperature not numeric: {value}"
return True, None
def validate_humidity(value):
"""Validate humidity field."""
if value is None:
return False, "Humidity missing"
try:
humidity = float(value)
if humidity < 0 or humidity > 100:
return False, f"Humidity out of range: {humidity}"
except (ValueError, TypeError):
return False, f"Humidity not numeric: {value}"
return True, None
def validate_wind_speed(value):
"""Validate wind speed field."""
if value is None:
return False, "Wind speed missing"
try:
wind = float(value)
if wind < 0 or wind > 200:
return False, f"Wind speed out of range: {wind}"
except (ValueError, TypeError):
return False, f"Wind speed not numeric: {value}"
return True, None
# You need similar functions for:
# - pressure, visibility, precipitation
# - cloud_cover, uv_index, wind_direction
# - and 10 more fields...
#
# That's 10+ lines per field × 15 fields = 150+ lines of near-identical code
# Every API change means updating multiple validators
# New developers must learn your pattern for every field type
The pattern is identical for every field: check existence, verify type, validate range. Writing this manually is tedious, error-prone, and doesn't scale. Real APIs have dozens of fields spread across nested structures. You need a systematic approach.
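Before reaching for a library, you could tame the repetition yourself with a factory that builds validators from a name and a range. This is a sketch of that idea, not the approach the rest of the chapter takes:

```python
def make_range_validator(field_name, min_value, max_value):
    """Return a validator enforcing presence, numeric type, and range."""
    def validator(value):
        if value is None:
            return False, f"{field_name} missing"
        try:
            number = float(value)
        except (ValueError, TypeError):
            return False, f"{field_name} not numeric: {value}"
        if number < min_value or number > max_value:
            return False, f"{field_name} out of range: {number}"
        return True, None
    return validator

# The three hand-written functions collapse into three one-liners
validate_temperature = make_range_validator("Temperature", -100, 60)
validate_humidity = make_range_validator("Humidity", 0, 100)
validate_wind_speed = make_range_validator("Wind speed", 0, 200)

print(validate_temperature(-999))  # (False, 'Temperature out of range: -999.0')
print(validate_humidity("N/A"))    # (False, 'Humidity not numeric: N/A')
```

This removes the duplication, but it still leaves types, nesting, and required-field checks for you to wire up by hand; that remaining gap is what JSON Schema fills later in the chapter.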
Three-Layer Validation
Professional validation separates concerns into three distinct layers. Each layer catches different problems and answers a different question about your data:
The three-layer validation pattern — each layer catches different types of data quality issues
Layer 1: Structural Validation
Question: Is the shape right?
Checks that required fields exist and have correct types. Prevents crashes from accessing missing keys or wrong data types.
if not isinstance(data, dict):
return False, "Weather data must be a dictionary"
if "current" not in data:
return False, "Missing required 'current' section"
if not isinstance(data["current"], dict):
return False, "'current' must be a dictionary"
What it catches: Missing sections, wrong types, structural changes
Layer 2: Content Validation
Question: Are values sensible?
Validates that field values are realistic and well-formed. Prevents logic errors from unrealistic or malformed data.
if temp < -100 or temp > 60:
return False, f"Unrealistic temperature: {temp}°C"
if humidity < 0 or humidity > 100:
return False, f"Invalid humidity: {humidity}%"
if wind_speed < 0:
return False, "Wind speed cannot be negative"
What it catches: Placeholder values, sensor errors, format corruption
Layer 3: Business Rules Validation
Question: Does it make sense together?
Checks cross-field logic and domain-specific constraints. Prevents acting on technically valid but operationally nonsensical data.
# Snow at warm temperatures doesn't make sense
if weather_code in [71, 73, 75] and temp > 5:
return False, f"Snow at {temp}°C is unlikely"
# "Feels like" should be near actual temperature
if abs(apparent_temp - temp) > 20:
return False, f"'Feels like' {apparent_temp}°C too far from {temp}°C"
What it catches: Logical inconsistencies, domain violations, impossible combinations
These three layers work together sequentially. Structural validation ensures safe access to fields. Content validation ensures individual values make sense. Business rules ensure the data makes sense as a whole. Each layer builds on the previous one.
Separating concerns makes validation easier to build, test, and maintain. Structural checks are generic and can be automated. Content validation is field-specific but follows patterns. Business rules are unique to your domain and require custom logic.
This separation also determines when each validation runs. Structure first (prevent crashes), content second (prevent corruption), business rules last (prevent nonsense).
Takeaways & Next Step
- Production APIs send bad data: Placeholder values, wrong types, missing fields, impossible values
- Single-field checks don't scale: Real APIs need systematic validation across dozens of fields
- Three layers, three concerns: Structure (shape), Content (values), Business Rules (logic)
- Sequential validation: Each layer builds on the previous, preventing different failure modes
- Clear separation: Makes testing easier, maintenance simpler, and automation possible
With the pattern understood, Section 3 implements all three layers manually for the Weather Dashboard. You'll see exactly how to build validators that catch every type of data quality issue.
3. Manual Validation Implementation
Section 2 introduced the three-layer pattern conceptually. Now let's implement it. You'll build three validators that work together as a validation pipeline, then integrate that pipeline into the Weather Dashboard.
Each validator follows the same signature: it receives data and returns (is_valid, error_message). This consistent interface makes validators easy to chain and test.
Layer 1: Structural Validation
Structural validation ensures the response has the right shape. Check this first so you can safely access nested fields in later validators.
def validate_weather_structure(data):
"""
Layer 1: Validate weather response structure.
Returns (is_valid, error_message).
"""
# Root must be a dictionary
if not isinstance(data, dict):
return False, "Weather data must be a dictionary"
# Must contain 'current' section
if "current" not in data:
return False, "Weather data missing 'current' section"
# 'current' must be a dictionary
if not isinstance(data["current"], dict):
return False, "'current' section must be a dictionary"
# Check for current_units (helpful but not critical)
if "current_units" not in data:
print("Warning: Missing units information")
return True, None
What this validator checks:
- Response is a dictionary (not string, list, or None)
- Contains required 'current' section for current weather
- 'current' is itself a dictionary (not array or primitive)
- Warns about missing units (degraded but usable)
After structural validation passes, you know data["current"] exists and is a dictionary. You can safely check its contents in the next layer.
Layer 2: Content Validation
Content validation checks that individual field values are realistic. This catches placeholder values, sensor errors, and data corruption.
def validate_weather_content(current_data):
"""
Layer 2: Validate weather content for realistic values.
Returns (is_valid, error_message).
"""
# Temperature is required
if "temperature_2m" not in current_data:
return False, "Missing required temperature data"
temp = current_data["temperature_2m"]
if temp is None:
return False, "Temperature cannot be null"
# Validate temperature is numeric and realistic
try:
temp_float = float(temp)
if temp_float < -100 or temp_float > 60:
return False, f"Unrealistic temperature: {temp_float}°C"
except (ValueError, TypeError):
return False, f"Temperature must be numeric, got: {temp}"
# Optional field: humidity (if present, must be valid)
if "relative_humidity_2m" in current_data:
humidity = current_data["relative_humidity_2m"]
if humidity is not None:
try:
humidity_float = float(humidity)
if humidity_float < 0 or humidity_float > 100:
return False, f"Invalid humidity: {humidity_float}%"
except (ValueError, TypeError):
return False, f"Humidity must be numeric, got: {humidity}"
# Optional field: wind speed (if present, must be valid)
if "wind_speed_10m" in current_data:
wind = current_data["wind_speed_10m"]
if wind is not None:
try:
wind_float = float(wind)
if wind_float < 0:
return False, f"Wind speed cannot be negative: {wind_float}"
if wind_float > 200:
return False, f"Unrealistic wind speed: {wind_float} km/h"
except (ValueError, TypeError):
return False, f"Wind speed must be numeric, got: {wind}"
return True, None
Temperature is required—if it's missing or invalid, validation fails immediately. Humidity and wind speed are optional—they're validated only if present. This distinction lets you enforce critical requirements while gracefully handling incomplete data for non-essential fields.
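The present-then-check pattern for optional fields repeats verbatim for humidity and wind speed. It can be factored into a helper; the function below is illustrative and not part of the chapter's pipeline:

```python
def check_optional_numeric(data, key, min_value, max_value, label):
    """Validate an optional field only if it is present and non-null."""
    if key not in data or data[key] is None:
        return True, None  # absent or null is acceptable for optional fields
    try:
        value = float(data[key])
    except (ValueError, TypeError):
        return False, f"{label} must be numeric, got: {data[key]}"
    if value < min_value or value > max_value:
        return False, f"Invalid {label.lower()}: {value}"
    return True, None

print(check_optional_numeric({}, "relative_humidity_2m", 0, 100, "Humidity"))
# (True, None) — field absent, so nothing to check
print(check_optional_numeric({"relative_humidity_2m": 150},
                             "relative_humidity_2m", 0, 100, "Humidity"))
# (False, 'Invalid humidity: 150.0')
```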
Layer 3: Business Rules Validation
Business rules validate cross-field logic and domain-specific constraints. These catch data that's individually valid but collectively nonsensical.
def validate_weather_business_rules(current_data):
"""
Layer 3: Validate weather business rules and logical consistency.
Returns (is_valid, error_message).
"""
temp = current_data.get("temperature_2m")
weather_code = current_data.get("weather_code")
# Business rule: Snow conditions should match temperature
if temp is not None and weather_code is not None:
try:
temp_float = float(temp)
code_int = int(weather_code)
# Weather codes 71, 73, 75 indicate snow
if code_int in [71, 73, 75] and temp_float > 5:
return False, f"Snow at {temp_float}°C is unlikely"
except (ValueError, TypeError):
# Type errors already caught in content validation
pass
# Business rule: "Feels like" should be near actual temperature
apparent_temp = current_data.get("apparent_temperature")
if temp is not None and apparent_temp is not None:
try:
temp_float = float(temp)
apparent_float = float(apparent_temp)
# "Feels like" shouldn't differ by more than 20°C
if abs(apparent_float - temp_float) > 20:
return False, (
f"'Feels like' {apparent_float}°C too different "
f"from actual {temp_float}°C"
)
except (ValueError, TypeError):
# Type errors already caught in content validation
pass
return True, None
Business rules assume fields exist and have correct types. That's why they come after structural and content validation. By the time business rules run, you know fields are present and numeric, so you can focus on logical relationships without worrying about type errors.
The Complete Validation Pipeline
Now chain all three validators into a pipeline. Each layer runs sequentially, stopping at the first failure. This "fail-fast" approach gives specific error messages and avoids redundant checks.
def validate_weather_data(data):
"""
Complete validation pipeline: structure → content → business rules.
Returns (is_valid, error_message).
"""
# Layer 1: Structural validation
valid, error = validate_weather_structure(data)
if not valid:
return False, f"Structure validation failed: {error}"
# Layer 2: Content validation
current = data["current"] # Safe because structure validated
valid, error = validate_weather_content(current)
if not valid:
return False, f"Content validation failed: {error}"
# Layer 3: Business rules validation
valid, error = validate_weather_business_rules(current)
if not valid:
return False, f"Business rule validation failed: {error}"
# All three layers passed
return True, None
# Test with valid data
good_data = {
"current": {
"temperature_2m": 22.5,
"relative_humidity_2m": 65,
"wind_speed_10m": 12.3,
"weather_code": 0,
"apparent_temperature": 21.8
}
}
valid, error = validate_weather_data(good_data)
print(f"Valid: {valid}") # Valid: True
# Test with bad data
bad_data = {
"current": {
"temperature_2m": -999, # Placeholder value
"relative_humidity_2m": 65
}
}
valid, error = validate_weather_data(bad_data)
print(f"Valid: {valid}")
print(f"Error: {error}")
# Valid: False
# Error: Content validation failed: Unrealistic temperature: -999.0°C
# Running validate_weather_data(bad_data):
#   Layer 1 — Structural validation → PASS
#   Layer 2 — Content validation → FAIL
#   Layer 3 — Business rules → SKIPPED
The pipeline stops at the first failure. If structure validation fails, content and business rules never run. If content validation fails, business rules don't run. This prevents cascading errors and gives precise feedback about which layer caught the problem.
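Fail-fast is the right default at runtime, but when debugging or testing you sometimes want every problem reported at once. A variant can run a list of validators and collect all failures; note that each validator must then tolerate problems an earlier layer would normally have screened out. A self-contained sketch with two toy validators standing in for the real layers:

```python
def run_all_validators(data, validators):
    """Run every (name, validator) pair and collect all failures."""
    errors = []
    for name, validator in validators:
        valid, error = validator(data)
        if not valid:
            errors.append(f"{name}: {error}")
    return (len(errors) == 0), errors

# Toy validators standing in for the real three layers
def has_current(data):
    if "current" not in data:
        return False, "missing 'current' section"
    return True, None

def has_temperature(data):
    # Must tolerate a missing 'current' section itself in this mode
    if "temperature_2m" not in data.get("current", {}):
        return False, "missing temperature"
    return True, None

ok, errors = run_all_validators({}, [("structure", has_current),
                                     ("content", has_temperature)])
print(ok)      # False
print(errors)  # two messages, one per failed layer
```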
Integrating Validation into Weather Dashboard
Now integrate the validation pipeline into the Weather Dashboard. Validate data immediately after fetching it, before using it anywhere in your application.
import requests
class ValidatedWeatherDashboard:
"""Weather dashboard with three-layer validation."""
def __init__(self):
self.weather_url = "https://api.open-meteo.com/v1/forecast"
def get_weather_data(self, latitude, longitude):
"""Fetch and validate weather data."""
print("Fetching weather data...")
params = {
"latitude": latitude,
"longitude": longitude,
"current": ["temperature_2m", "relative_humidity_2m",
"wind_speed_10m", "weather_code", "apparent_temperature"],
"timezone": "auto"
}
try:
# Fetch data (error handling from Chapter 9)
response = requests.get(self.weather_url, params=params, timeout=15)
response.raise_for_status()
data = response.json()
# Validate data (new validation pipeline)
valid, error = validate_weather_data(data)
if not valid:
print(f"\n❌ Data validation failed: {error}")
return None
print("✓ Data validated successfully")
return data
except requests.exceptions.RequestException as e:
print(f"\n❌ Request failed: {e}")
return None
def display_weather(self, data):
"""Display validated weather data."""
if not data:
return
current = data["current"]
# After validation passes, we can trust this data
temp = current["temperature_2m"]
humidity = current.get("relative_humidity_2m", "N/A")
wind = current.get("wind_speed_10m", "N/A")
print(f"\n🌡️ Temperature: {temp}°C")
print(f"💧 Humidity: {humidity}%")
print(f"💨 Wind Speed: {wind} km/h")
# Usage
dashboard = ValidatedWeatherDashboard()
# Dublin coordinates
weather_data = dashboard.get_weather_data(53.3498, -6.2603)
dashboard.display_weather(weather_data)
Validation happens at the boundary where external data enters your system. After validation passes, the rest of your code can trust the data. No more defensive checks scattered throughout display logic, formatting functions, or business calculations.
Validate once at the entry point, then trust the data everywhere else. This keeps validation logic centralized and makes downstream code simpler. The display_weather function doesn't need defensive checks because validation already guaranteed the data is good.
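If you want the boundary to be impossible to forget, one option (a sketch, not something the chapter requires) is a decorator that validates whatever a fetch function returns before the data escapes the fetch layer:

```python
def validated(validator):
    """Decorator: run validator on the wrapped function's return value."""
    def wrap(fetch_func):
        def inner(*args, **kwargs):
            data = fetch_func(*args, **kwargs)
            if data is None:
                return None  # fetch already failed; nothing to validate
            valid, error = validator(data)
            if not valid:
                print(f"Validation failed: {error}")
                return None
            return data
        return inner
    return wrap

def check_has_current(data):
    if "current" not in data:
        return False, "missing 'current' section"
    return True, None

@validated(check_has_current)
def fetch_weather():
    # stand-in for the real requests call
    return {"current": {"temperature_2m": 22.5}}

print(fetch_weather())  # {'current': {'temperature_2m': 22.5}}
```

Any fetch function wrapped this way returns either validated data or None, so downstream code keeps the same simple `if not data: return` guard that display_weather already uses.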
Takeaways & Next Step
- Three validators, one pipeline: Structure → Content → Business rules, each with clear responsibility
- Consistent interface: All validators return (is_valid, error_message) for easy chaining
- Fail-fast pattern: Stop at first failure with specific error message
- Validate at boundaries: Check data where it enters your system, trust it everywhere else
- Required vs optional: Enforce critical fields, gracefully handle optional ones
You just wrote 80+ lines of validation code that works perfectly. Section 4 shows you how to replace most of that repetitive code with declarative schemas while keeping the manual business rules where they add value.
4. Automating Validation with JSON Schema
You just built a complete validation system with 80+ lines of Python. Most of that code follows mechanical patterns: check types, verify ranges, ensure fields exist. This repetition is exactly what computers excel at—and exactly what you shouldn't write manually.
JSON Schema lets you describe validation rules declaratively in JSON format. A validation library reads your schema and checks data automatically. You replace dozens of lines of Python with a concise schema definition.
What is JSON Schema?
JSON Schema is a standard for describing JSON data structure and constraints. Instead of writing Python functions that check types and ranges, you define rules in JSON format. Libraries like jsonschema read your schema and validate data against it automatically.
Here's the structural and content validation from Section 3 expressed as a JSON Schema:
{
"type": "object",
"required": ["current"],
"properties": {
"current": {
"type": "object",
"required": ["temperature_2m"],
"properties": {
"temperature_2m": {
"type": "number",
"minimum": -100,
"maximum": 60
},
"relative_humidity_2m": {
"type": "number",
"minimum": 0,
"maximum": 100
},
"wind_speed_10m": {
"type": "number",
"minimum": 0,
"maximum": 200
},
"weather_code": {
"type": "integer"
},
"apparent_temperature": {
"type": "number"
}
}
},
"current_units": {
"type": "object"
}
}
}
What this schema enforces:
- Structure: Root must be object with required "current" field
- Types: temperature_2m must be number, weather_code must be integer
- Constraints: temperature between -100 and 60, humidity 0-100
- Required vs optional: temperature_2m required, humidity optional
Compare this schema to the 60+ lines of manual structural and content validation you wrote in Section 3. The schema is shorter, clearer, and self-documenting. Anyone can read it and understand what valid data looks like.
Using JSON Schema in Python
The jsonschema library validates data against your schema automatically. Install it with pip install jsonschema, then use it like this:
from jsonschema import validate, ValidationError
# Define schema (in practice, load from a JSON file)
weather_schema = {
"type": "object",
"required": ["current"],
"properties": {
"current": {
"type": "object",
"required": ["temperature_2m"],
"properties": {
"temperature_2m": {
"type": "number",
"minimum": -100,
"maximum": 60
},
"relative_humidity_2m": {
"type": "number",
"minimum": 0,
"maximum": 100
},
"wind_speed_10m": {
"type": "number",
"minimum": 0,
"maximum": 200
}
}
}
}
}
def validate_weather_with_schema(data):
"""Validate weather data using JSON Schema."""
try:
validate(instance=data, schema=weather_schema)
return True, None
except ValidationError as e:
return False, e.message
# Test with valid data
good_data = {
"current": {
"temperature_2m": 22.5,
"relative_humidity_2m": 65,
"wind_speed_10m": 12.3
}
}
valid, error = validate_weather_with_schema(good_data)
print(f"Valid: {valid}") # Valid: True
# Test with invalid data
bad_data = {
"current": {
"temperature_2m": 150 # Too hot!
}
}
valid, error = validate_weather_with_schema(bad_data)
print(f"Valid: {valid}")
print(f"Error: {error}")
# Valid: False
# Error: 150 is greater than the maximum of 60
You replaced roughly 60 lines of manual validation code with a schema definition less than half that length. The jsonschema library handles type checking, range validation, required fields, and nested structure automatically. The schema is also self-documenting—anyone can read it and understand what valid data looks like.
When Schemas Shine, When They Struggle
JSON Schema excels at structural and content validation but can't handle complex business logic. Here's what schemas do well and where they fall short:
| Validation Type | Schema Capability | Example |
|---|---|---|
| Type checking | ✓ Excellent | temperature must be number |
| Range validation | ✓ Excellent | humidity between 0 and 100 |
| Required fields | ✓ Excellent | temperature_2m is required |
| String patterns | ✓ Good | Email must match regex pattern |
| Cross-field logic | ✗ Limited | Snow at warm temperatures |
| Complex business rules | ✗ Cannot express | "Feels like" vs actual temperature |
Schemas automate mechanical validation beautifully but can't replace domain logic. That's where the hybrid approach comes in: use schemas for structure and content, use manual functions for business rules.
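As an example of the "string patterns" row in the table, a schema can constrain a string with the pattern keyword, which takes a regular expression. The regex below is a deliberately loose illustration, not a complete email validator:

```python
from jsonschema import validate, ValidationError

user_schema = {
    "type": "object",
    "required": ["email"],
    "properties": {
        "email": {
            "type": "string",
            # Loose shape check: something@something.something
            "pattern": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
        }
    }
}

def email_is_valid(data):
    try:
        validate(instance=data, schema=user_schema)
        return True
    except ValidationError:
        return False

print(email_is_valid({"email": "ada@example.com"}))  # True
print(email_is_valid({"email": "not-an-email"}))     # False
```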
Takeaways & Next Step
- Declarative over imperative: Describe what's valid, not how to check it
- Massive code reduction: 60+ lines of manual validation becomes 20-line schema
- Self-documenting: Schema defines both validation rules and data structure documentation
- Schemas excel at mechanics: Type checking, ranges, required fields, patterns
- Schemas struggle with logic: Cross-field rules, domain constraints, complex conditions
Section 5 combines schemas with manual validation to get the best of both approaches. You'll see the pattern that most production systems use: automated validation for mechanical checks, custom code for business logic.
5. The Hybrid Approach
Here's what production systems actually do: they use schemas to automate structural and content validation, then add manual functions for business rules that schemas can't express. This hybrid approach gives you schema automation where it helps most and manual flexibility where you need it.
Let schemas handle the tedious work: type checking, range validation, required fields. Write manual code only for logic schemas can't express: cross-field rules, domain constraints, complex conditions. You get less code to maintain with clearer separation of concerns.
Combining Schema and Manual Validation
The hybrid validator runs schema validation first (structure and content), then manual validation second (business rules). Each handles what it does best.
from jsonschema import validate, ValidationError
# Schema handles structure and content
weather_schema = {
"type": "object",
"required": ["current"],
"properties": {
"current": {
"type": "object",
"required": ["temperature_2m"],
"properties": {
"temperature_2m": {
"type": "number",
"minimum": -100,
"maximum": 60
},
"relative_humidity_2m": {
"type": "number",
"minimum": 0,
"maximum": 100
},
"wind_speed_10m": {
"type": "number",
"minimum": 0,
"maximum": 200
},
"weather_code": {"type": "integer"},
"apparent_temperature": {"type": "number"}
}
}
}
}
def validate_business_rules(current_data):
"""Manual validation for business rules only."""
temp = current_data.get("temperature_2m")
weather_code = current_data.get("weather_code")
apparent_temp = current_data.get("apparent_temperature")
# Business rule: Snow at warm temperatures
if temp is not None and weather_code is not None:
if weather_code in [71, 73, 75] and temp > 5:
return False, f"Snow at {temp}°C is unlikely"
# Business rule: "Feels like" vs actual temperature
if temp is not None and apparent_temp is not None:
if abs(apparent_temp - temp) > 20:
return False, (
f"'Feels like' {apparent_temp}°C too different "
f"from actual {temp}°C"
)
return True, None
def validate_weather_hybrid(data):
"""
Hybrid approach: Schema for structure/content, manual for business rules.
Returns (is_valid, error_message).
"""
# Step 1: Schema validation (automated)
try:
validate(instance=data, schema=weather_schema)
except ValidationError as e:
return False, f"Schema validation failed: {e.message}"
# Step 2: Business rules (manual)
current = data["current"]
valid, error = validate_business_rules(current)
if not valid:
return False, f"Business rule validation failed: {error}"
return True, None
# Test with valid data
good_data = {
"current": {
"temperature_2m": 22.5,
"relative_humidity_2m": 65,
"wind_speed_10m": 12.3,
"weather_code": 0,
"apparent_temperature": 21.8
}
}
valid, error = validate_weather_hybrid(good_data)
print(f"Valid: {valid}") # Valid: True
# Test with business rule violation
bad_data = {
"current": {
"temperature_2m": 15.0, # Warm temperature
"weather_code": 71 # Snow code - doesn't make sense!
}
}
valid, error = validate_weather_hybrid(bad_data)
print(f"Valid: {valid}")
print(f"Error: {error}")
# Valid: False
# Error: Business rule validation failed: Snow at 15.0°C is unlikely
- Schema handles: Type checking, range validation, required fields, structure
- Manual functions handle: Cross-field logic, domain rules, complex conditions
- Result: Less code to maintain, clearer validation rules, easier updates
This is the approach most production systems converge on. Schemas eliminate repetitive code. Manual functions provide flexibility for complex rules. You get automation and control in the right proportions.
When to Use Which Approach
The right validation approach depends on your API complexity, team size, and maintenance requirements. Here's how to decide:
| Scenario | Best Approach | Rationale |
|---|---|---|
| Single API, simple structure | Manual validation | Overhead of schemas not worth it for small scale |
| Multiple APIs with similar structure | JSON Schema | Reuse schemas across APIs, reduce duplication |
| Complex business rules | Hybrid (Schema + Manual) | Schema for structure/content, manual for domain logic |
| Rapidly changing API | JSON Schema | Update schemas without changing code |
| Team with junior developers | JSON Schema | Declarative schemas easier to understand than validation code |
| Performance-critical path | Manual validation | Avoid schema parsing overhead in hot paths |
| Building an SDK/library | Hybrid (Schema + Manual) | Users can extend schemas for their needs |
Most applications eventually need the hybrid approach. Start with manual validation for simple cases. Add schemas when repetition becomes painful. Keep manual functions for business logic that schemas can't express.
Writing Good Error Messages
Validation error messages serve two audiences: developers debugging issues and operations teams monitoring production. Good messages specify what failed, why it matters, and what to do about it.
Be Specific
Don't say "validation failed." Say what failed and why.
# Bad
return False, "Invalid temperature"
# Good
return False, f"Temperature {temp}°C outside valid range (-100 to 60)"
Include Context
Show the actual value that failed, not just that something failed.
# Bad
return False, "Unrealistic humidity"
# Good
return False, f"Humidity {humidity}% exceeds maximum of 100%"
Distinguish Layers
Prefix messages with which layer failed for easier debugging.
# Each layer identifies itself
return False, "Structure validation failed: Missing 'current' section"
return False, "Content validation failed: Temperature must be numeric"
return False, "Business rule violated: Snow at 15°C is unlikely"
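A small pipeline runner makes these prefixes automatic: it tries each layer in order and tags the first failure with the layer's name. The two validators below are simplified stand-ins for the chapter's full functions:

```python
def run_layers(data, layers):
    """Run (layer_name, validator) pairs in order; report the first failure."""
    for name, validator in layers:
        valid, error = validator(data)
        if not valid:
            return False, f"{name} validation failed: {error}"
    return True, None

def check_structure(data):
    """Simplified structural check (stand-in)."""
    if "current" not in data:
        return False, "Missing 'current' section"
    return True, None

def check_content(data):
    """Simplified content check (stand-in)."""
    if not isinstance(data["current"].get("temperature_2m"), (int, float)):
        return False, "Temperature must be numeric"
    return True, None

layers = [("Structure", check_structure), ("Content", check_content)]
print(run_layers({"other": {}}, layers))
# (False, "Structure validation failed: Missing 'current' section")
print(run_layers({"current": {"temperature_2m": 22.5}}, layers))
# (True, None)
```

Because the prefix is applied in one place, individual validators stay focused on their own checks and the error format stays consistent.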
Performance Considerations
Validation adds overhead. For most applications, this cost is negligible compared to network I/O. But in performance-critical paths, validation can matter.
Schema validation is slower than manual: Parsing schemas and running generic checks costs more than targeted Python code. In hot paths processing thousands of requests per second, this difference matters.
Solutions: Cache compiled schemas, validate once at boundaries and pass validated data downstream, or use manual validation in performance-critical code paths.
When it doesn't matter: API requests typically take 100-500ms. Validation adds 1-5ms. In normal API integration code, this overhead is insignificant.
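As a sketch of the first solution, assuming the `jsonschema` package used elsewhere in this chapter: compile the validator once at module load, then reuse it in the hot path instead of re-parsing the schema on every request. The schema here is a minimal stand-in:

```python
from jsonschema import Draft7Validator

# Minimal stand-in schema; compiled once, reused for every request
temperature_schema = {
    "type": "object",
    "required": ["temperature_2m"],
    "properties": {
        "temperature_2m": {"type": "number", "minimum": -100, "maximum": 60},
    },
}
_compiled = Draft7Validator(temperature_schema)  # pay the parsing cost once

def validate_fast(current):
    """Reuse the compiled validator in the hot path."""
    error = next(_compiled.iter_errors(current), None)
    if error is not None:
        return False, error.message
    return True, None

print(validate_fast({"temperature_2m": 22.5}))  # (True, None)
print(validate_fast({"temperature_2m": 150}))
```

The module-level `Draft7Validator` is the cache: every call to `validate_fast` skips schema parsing and runs only the compiled checks.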
Takeaways & Next Step
- Hybrid approach dominates: Schemas for automation, manual code for domain logic
- Clear division: Schemas handle mechanics (types, ranges, structure), manual handles business rules
- Decision framework: Start manual, add schemas when repetition hurts, keep both for complex systems
- Good error messages: Specific, include context, identify which layer failed
- Performance awareness: Validation cost matters in hot paths, negligible for normal API calls
With validation implemented, Section 6 covers where to put it in your architecture, how to test it systematically, and how to integrate it with the error handling patterns from Chapter 9.
6. Where Validation Lives
Validation belongs at architectural boundaries—places where data crosses from one system or layer to another. Validate once at the boundary, then trust the data everywhere downstream. This keeps validation centralised and downstream code simple.
The key question isn't "should I validate" but "where should I validate." Put validation in the wrong place and you'll either miss problems (validate too late) or create redundant checks (validate too often).
The Three Validation Boundaries
Most applications have three natural validation boundaries. Each serves a different purpose and catches different problems:
API Client Layer
When: Immediately after receiving API response
What to validate: Structure and content (Layers 1 and 2)
Why here: Catch API provider issues before they spread through your system
import requests

class ValidationError(Exception):
    """Raised when an API response fails validation."""

class WeatherAPIClient:
    """Client for Open-Meteo API with validation."""
    def __init__(self):
        self.url = "https://api.open-meteo.com/v1/forecast"
    def fetch_weather(self, latitude, longitude):
        """Fetch and validate weather data."""
        params = {
            "latitude": latitude,
            "longitude": longitude,
            "current": "temperature_2m,relative_humidity_2m,weather_code",
        }
        response = requests.get(self.url, params=params)
        response.raise_for_status()
        data = response.json()
        # Validate at API boundary
        valid, error = validate_weather_structure_and_content(data)
        if not valid:
            raise ValidationError(f"Invalid API response: {error}")
        return data  # Downstream code can trust this structure
Service Layer
When: Before processing data for business logic
What to validate: Business rules (Layer 3)
Why here: Enforce domain constraints before data affects business operations
import logging

logger = logging.getLogger(__name__)

class WeatherService:
    """Service layer with business rule validation."""
    def __init__(self, api_client):
        self.api_client = api_client
    def get_current_conditions(self, latitude, longitude):
        """Get weather with business rule enforcement."""
        # Data already structurally validated by API client
        weather_data = self.api_client.fetch_weather(latitude, longitude)
        # Validate business rules
        current = weather_data["current"]
        valid, error = validate_weather_business_rules(current)
        if not valid:
            logger.warning(f"Business rule violation: {error}")
            # Decide: return partial data or raise error
        return self.format_weather(weather_data)
Application Layer
When: Before displaying or storing data
What to validate: Application-specific requirements
Why here: Ensure data meets UI or storage constraints
class WeatherDashboard:
    """Application with display-specific validation."""
    def display_weather(self, weather_data):
        """Display weather with UI-specific checks."""
        current = weather_data["current"]
        temp = current["temperature_2m"]
        # Application-specific validation
        if temp < -50:
            # Too cold to display outdoor activities
            return self.show_extreme_cold_warning()
        # Normal display
        return self.render_current_weather(current)
Each boundary validates different concerns. API client ensures structural validity. Service layer enforces business rules. Application layer handles display/storage requirements. Data flows through all three, gaining more trust at each boundary.
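The flow can be sketched end to end. Each function below is a simplified stand-in for the corresponding class above, showing data gaining trust as it crosses each boundary:

```python
def fetch_validated(raw_response):
    """API client boundary: structure and content."""
    current = raw_response.get("current")
    if not isinstance(current, dict) or "temperature_2m" not in current:
        raise ValueError("Invalid API response: missing 'current.temperature_2m'")
    return raw_response

def enforce_business_rules(data):
    """Service boundary: domain constraints."""
    temp = data["current"]["temperature_2m"]
    if not -100 <= temp <= 60:
        raise ValueError(f"Business rule violated: temperature {temp}°C implausible")
    return data

def render(data):
    """Application boundary: display-specific checks."""
    temp = data["current"]["temperature_2m"]
    if temp < -50:
        return "extreme cold warning"
    return f"{temp}°C"

raw = {"current": {"temperature_2m": 22.5}}
print(render(enforce_business_rules(fetch_validated(raw))))  # 22.5°C
```

Note that `render` performs no structural checks at all: by the time data reaches it, the earlier boundaries have already guaranteed the shape and the ranges.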
Fail-Fast vs Fail-Graceful
When validation fails, you have two choices: fail-fast (raise error immediately) or fail-graceful (log warning and continue with partial data). Choose based on how critical the data is.
| Data Criticality | Strategy | Example |
|---|---|---|
| Essential for operation | Fail-fast (raise error) | Payment amount, user authentication |
| Important but not critical | Fail-graceful (log + fallback) | Weather "feels like" temperature |
| Enhancement only | Fail-graceful (log + skip) | Weather icon, background images |
def get_weather_with_validation(latitude, longitude):
    """Fetch weather with appropriate failure handling."""
    data = fetch_weather_data(latitude, longitude)
    # Fail-fast for critical data
    valid, error = validate_temperature(data["current"]["temperature_2m"])
    if not valid:
        raise ValidationError(f"Critical field invalid: {error}")
    # Fail-graceful for enhancements
    valid, error = validate_weather_icon(data["current"].get("weather_code"))
    if not valid:
        logger.warning(f"Optional field invalid: {error}")
        data["current"]["weather_code"] = None  # Use fallback icon
    return data
Testing Your Validators
Validators are code like any other—they need tests. Good validator tests cover three scenarios: valid data passes, invalid data fails with correct errors, and edge cases are handled properly.
def test_valid_data_passes():
    """Valid weather data should pass all validation layers."""
    data = {
        "current": {
            "temperature_2m": 22.5,
            "relative_humidity_2m": 65,
            "wind_speed_10m": 12.3,
            "weather_code": 0
        }
    }
    valid, error = validate_weather_data(data)
    assert valid is True
    assert error is None

def test_missing_section_fails():
    """Missing 'current' section should fail structural validation."""
    data = {"other_section": {}}
    valid, error = validate_weather_data(data)
    assert valid is False
    assert "current" in error.lower()

def test_invalid_temperature_fails():
    """Out-of-range temperature should fail content validation."""
    data = {
        "current": {
            "temperature_2m": 150  # Too hot
        }
    }
    valid, error = validate_weather_data(data)
    assert valid is False
    assert "temperature" in error.lower()

def test_business_rule_violation_fails():
    """Snow at warm temperature should fail business rules."""
    data = {
        "current": {
            "temperature_2m": 15,
            "weather_code": 71  # Snow code
        }
    }
    valid, error = validate_weather_data(data)
    assert valid is False
    assert "snow" in error.lower()

def test_edge_cases():
    """Edge case values should be handled correctly."""
    # Exact boundary values
    data = {"current": {"temperature_2m": -100}}  # Minimum valid
    valid, _ = validate_weather_data(data)
    assert valid is True
    data = {"current": {"temperature_2m": 60}}  # Maximum valid
    valid, _ = validate_weather_data(data)
    assert valid is True
    # Just outside boundaries
    data = {"current": {"temperature_2m": -100.1}}
    valid, error = validate_weather_data(data)
    assert valid is False
- Valid data: Confirm validators don't reject good data
- Each layer: Test structural, content, and business rule failures separately
- Edge cases: Boundary values, None, empty strings, extreme ranges
- Error messages: Verify errors are specific and helpful
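Boundary checks like these collapse naturally into a table-driven test. The validator below is a simplified stand-in that mirrors the chapter's temperature range so the example runs on its own:

```python
def validate_weather_data_stub(data):
    """Simplified stand-in for the chapter's pipeline (temperature range only)."""
    temp = data.get("current", {}).get("temperature_2m")
    if not isinstance(temp, (int, float)):
        return False, "Content validation failed: Temperature must be numeric"
    if temp < -100 or temp > 60:
        return False, f"Temperature {temp}°C outside valid range (-100 to 60)"
    return True, None

def test_temperature_boundaries():
    """Table-driven boundary test: (value, expected validity) pairs."""
    cases = [
        (-100, True),     # minimum valid boundary
        (60, True),       # maximum valid boundary
        (-100.1, False),  # just below minimum
        (60.1, False),    # just above maximum
        (None, False),    # missing value
    ]
    for temp, expected in cases:
        valid, _ = validate_weather_data_stub({"current": {"temperature_2m": temp}})
        assert valid is expected, f"temp={temp}: expected {expected}"

test_temperature_boundaries()
print("All boundary cases passed")
```

Adding a new edge case becomes a one-line change to the table rather than a new test function.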
Takeaways & Next Step
- Validate at boundaries: API client (structure/content), service layer (business rules), application layer (display/storage)
- Choose failure strategy: Fail-fast for critical data, fail-graceful for enhancements
- Test systematically: Valid data, invalid data, edge cases, error messages
- Centralise validation: Validate once at boundaries, trust data downstream
- Layer appropriately: Each boundary validates different concerns, data gains trust as it flows through
Section 7 provides practical exercises to reinforce these patterns. You'll validate real-world data, fix bugs in the News Aggregator, and apply validation to your own API integrations.
7. Practical Application
Time to apply what you've learned. These exercises build on each other, starting with simple validators and progressing to complete validation systems. Work through them in order—each reinforces patterns from earlier sections.
Exercise 1: Build a User Profile Validator
Create a manual three-layer validator for user profile data. This exercise reinforces the structural → content → business rules pattern.
{
    "user": {
        "id": 12345,
        "email": "user@example.com",
        "age": 28,
        "username": "john_doe",
        "preferences": {
            "newsletter": true,
            "theme": "dark"
        }
    }
}
Requirements:
- Structure: Root must have "user" object with id, email, age, username
- Content: Email must contain "@", age between 13-120, username 3-20 characters
- Business rules: If newsletter is true, email cannot be from disposable domains (tempmail, guerrillamail)
Task: Build validate_user_profile(data) following the three-layer pattern.
Show Solution
def validate_user_profile_structure(data):
    """Layer 1: Structural validation."""
    if not isinstance(data, dict):
        return False, "Data must be a dictionary"
    if "user" not in data:
        return False, "Missing 'user' object"
    user = data["user"]
    if not isinstance(user, dict):
        return False, "'user' must be a dictionary"
    required = ["id", "email", "age", "username"]
    for field in required:
        if field not in user:
            return False, f"Missing required field: {field}"
    return True, None

def validate_user_profile_content(user):
    """Layer 2: Content validation."""
    # Email format
    email = user["email"]
    if "@" not in email:
        return False, "Email must contain '@'"
    # Age range
    try:
        age = int(user["age"])
        if age < 13 or age > 120:
            return False, f"Age {age} outside valid range (13-120)"
    except (ValueError, TypeError):
        return False, f"Age must be integer, got: {user['age']}"
    # Username length
    username = user["username"]
    if not isinstance(username, str):
        return False, "Username must be string"
    if len(username) < 3 or len(username) > 20:
        return False, f"Username length {len(username)} outside valid range (3-20)"
    return True, None

def validate_user_profile_business_rules(user):
    """Layer 3: Business rules."""
    disposable_domains = ["tempmail.com", "guerrillamail.com"]
    preferences = user.get("preferences", {})
    if preferences.get("newsletter"):
        email = user["email"]
        domain = email.split("@")[-1]
        if domain in disposable_domains:
            return False, f"Newsletter requires non-disposable email (got {domain})"
    return True, None

def validate_user_profile(data):
    """Complete validation pipeline."""
    valid, error = validate_user_profile_structure(data)
    if not valid:
        return False, f"Structure: {error}"
    user = data["user"]
    valid, error = validate_user_profile_content(user)
    if not valid:
        return False, f"Content: {error}"
    valid, error = validate_user_profile_business_rules(user)
    if not valid:
        return False, f"Business rule: {error}"
    return True, None

# Test
test_data = {
    "user": {
        "id": 12345,
        "email": "user@example.com",
        "age": 28,
        "username": "john_doe",
        "preferences": {"newsletter": True, "theme": "dark"}
    }
}
valid, error = validate_user_profile(test_data)
print(f"Valid: {valid}, Error: {error}")
Exercise 2: Convert to JSON Schema
Take the user profile validator from Exercise 1 and convert structural/content validation to JSON Schema. Keep business rules as manual validation.
Task: Create a schema that validates structure and content, then combine it with the business rules validator from Exercise 1.
Show Solution
from jsonschema import validate, ValidationError

user_schema = {
    "type": "object",
    "required": ["user"],
    "properties": {
        "user": {
            "type": "object",
            "required": ["id", "email", "age", "username"],
            "properties": {
                "id": {"type": "integer"},
                "email": {
                    "type": "string",
                    "pattern": "^.*@.*$"
                },
                "age": {
                    "type": "integer",
                    "minimum": 13,
                    "maximum": 120
                },
                "username": {
                    "type": "string",
                    "minLength": 3,
                    "maxLength": 20
                },
                "preferences": {
                    "type": "object",
                    "properties": {
                        "newsletter": {"type": "boolean"},
                        "theme": {
                            "type": "string",
                            "enum": ["light", "dark"]
                        }
                    }
                }
            }
        }
    }
}

def validate_user_hybrid(data):
    """Hybrid validation: schema + manual business rules."""
    # Schema handles structure and content
    try:
        validate(instance=data, schema=user_schema)
    except ValidationError as e:
        return False, f"Schema validation failed: {e.message}"
    # Manual business rules
    user = data["user"]
    valid, error = validate_user_profile_business_rules(user)
    if not valid:
        return False, f"Business rule validation failed: {error}"
    return True, None

# Test (reuses test_data from Exercise 1)
valid, error = validate_user_hybrid(test_data)
print(f"Valid: {valid}, Error: {error}")
Exercise 3: Fix the News Aggregator Bug
The News Aggregator from Chapter 11 has a validation bug where empty article titles pass through. Fix the normalizer to reject articles with empty titles.
def normalize_newsapi(response):
    """Transform NewsAPI response - has a validation bug."""
    articles = []
    for item in response.get("articles", []):
        title = item.get("title", "").strip()
        url = item.get("url", "").strip()
        # BUG: This only checks URL!
        if not url:
            continue
        articles.append({
            "title": title,
            "url": url,
            "published_at": item.get("publishedAt", ""),
            "source": "NewsAPI"
        })
    return articles
Task: Write a test that catches this bug, then fix the normalizer to validate both title and URL.
Show Solution
# Test that catches the bug
def test_empty_title_rejected():
    """Articles with empty titles should be rejected."""
    response = {
        "articles": [
            {
                "title": "",  # Empty title
                "url": "https://example.com/article",
                "publishedAt": "2025-01-15T10:00:00Z"
            }
        ]
    }
    articles = normalize_newsapi(response)
    assert len(articles) == 0, "Empty title should be rejected"

# Fixed normalizer
def normalize_newsapi(response):
    """Transform NewsAPI response - fixed validation."""
    articles = []
    for item in response.get("articles", []):
        title = item.get("title", "").strip()
        url = item.get("url", "").strip()
        # FIX: Check BOTH title and url
        if not title or not url:
            continue
        articles.append({
            "title": title,
            "url": url,
            "published_at": item.get("publishedAt", ""),
            "source": "NewsAPI"
        })
    return articles

# Test now passes
test_empty_title_rejected()
print("✓ Test passed: Empty titles are rejected")
Exercise 4: Validate Forecast Data
Build a validator for weather forecast data that enforces cross-field business rules.
{
"daily": {
"time": ["2025-01-15", "2025-01-16", "2025-01-17"],
"temperature_2m_max": [12.5, 14.2, 13.8],
"temperature_2m_min": [6.1, 7.3, 8.0]
}
}
Business Rules:
- All arrays must have same length
- Dates must be sequential (no gaps)
- temperature_2m_max >= temperature_2m_min for each day
- No temperature changes > 30°C between consecutive days
Task: Implement validate_forecast_data(data) that checks these rules.
Show Solution
from datetime import datetime

def validate_forecast_data(data):
    """Validate forecast data with cross-field business rules."""
    # Structure
    if "daily" not in data:
        return False, "Missing 'daily' section"
    daily = data["daily"]
    required = ["time", "temperature_2m_max", "temperature_2m_min"]
    for field in required:
        if field not in daily:
            return False, f"Missing required field: {field}"
    times = daily["time"]
    temp_max = daily["temperature_2m_max"]
    temp_min = daily["temperature_2m_min"]
    # Business Rule 1: Same length
    if not (len(times) == len(temp_max) == len(temp_min)):
        return False, (
            f"Array length mismatch: times={len(times)}, "
            f"max={len(temp_max)}, min={len(temp_min)}"
        )
    if len(times) == 0:
        return False, "Empty forecast arrays"
    # Business Rule 2: Sequential dates
    prev_date = None
    for date_str in times:
        try:
            date = datetime.strptime(date_str, "%Y-%m-%d")
            if prev_date and (date - prev_date).days != 1:
                return False, f"Date gap: {prev_date.date()} to {date.date()}"
            prev_date = date
        except ValueError:
            return False, f"Invalid date format: {date_str}"
    # Business Rule 3: Max >= Min
    for i, (tmax, tmin) in enumerate(zip(temp_max, temp_min)):
        if tmax < tmin:
            return False, f"Day {i}: max {tmax}°C < min {tmin}°C"
    # Business Rule 4: No extreme jumps
    for i in range(1, len(temp_max)):
        prev_avg = (temp_max[i-1] + temp_min[i-1]) / 2
        curr_avg = (temp_max[i] + temp_min[i]) / 2
        change = abs(curr_avg - prev_avg)
        if change > 30:
            return False, (
                f"Extreme temp change between day {i-1} and {i}: "
                f"{change:.1f}°C"
            )
    return True, None

# Test
valid_forecast = {
    "daily": {
        "time": ["2025-01-15", "2025-01-16", "2025-01-17"],
        "temperature_2m_max": [12.5, 14.2, 13.8],
        "temperature_2m_min": [6.1, 7.3, 8.0]
    }
}
valid, error = validate_forecast_data(valid_forecast)
print(f"Valid: {valid}, Error: {error}")
Key Takeaways
- Three-layer pattern: Structural, content, business rules—each catches different problems
- Manual vs schema: Manual for flexibility, schemas for automation, hybrid for production
- Architectural boundaries: Validate at API client, service layer, application layer
- Failure strategies: Fail-fast for critical data, fail-graceful for enhancements
- Testing validators: Valid data, invalid data, edge cases, error messages
- Production patterns: You can now build validation systems that scale
Looking Forward
In Chapter 13 (Production-Ready Weather Dashboard Capstone), you'll apply both error handling from Chapter 9 and validation from this chapter across multiple APIs simultaneously. You'll see how these patterns work together when coordinating data from different sources, each with their own failure modes and data quality issues.
You now understand that reliable API integration requires two complementary approaches: error handling ensures requests complete successfully, and validation ensures the responses contain usable data. Together, they create applications that users can depend on.