Advanced Error Handling and Debugging

📚 Lesson 20 of 20 ⏱️ 50 min

Error handling is crucial for building robust Python applications that gracefully handle failures and provide meaningful feedback. Python's exception system allows you to catch, handle, and propagate errors in a structured way. Understanding different error types (syntax errors, runtime errors, logical errors) and how to handle them appropriately is essential for writing production-quality code. Good error handling improves user experience, makes debugging easier, and prevents applications from crashing unexpectedly.

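Python's exception hierarchy is rooted at `BaseException`, from which every exception inherits. `Exception` derives from it and is the base class for ordinary errors, so user-defined exceptions should inherit from `Exception`. Built-in exceptions include `ValueError` (an argument of the right type but with an inappropriate value), `TypeError` (an operation applied to an object of the wrong type), `KeyError`/`IndexError` (a missing dictionary key or an out-of-range index), `FileNotFoundError` (the file doesn't exist), and many more. You can catch one specific exception, several at once with a tuple, or use `except Exception` to catch nearly everything (it does not catch system-exiting exceptions such as `SystemExit` and `KeyboardInterrupt`, and it should still be used carefully). The `else` clause runs only when the `try` block raises no exception, and `finally` runs whether or not an exception occurred, which makes it the place for cleanup.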
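As a minimal sketch of how these clauses fit together, the hypothetical `parse_age` helper below (not part of the lesson's exercise) catches two specific exceptions, uses `else` for the success path, and uses `finally` for work that must run on every path.

```python
def parse_age(raw, log):
    """Convert raw input to an int, recording each step in `log` (hypothetical helper)."""
    try:
        age = int(raw)            # may raise ValueError or TypeError
    except ValueError:
        log.append("bad value")   # a string, but not a valid number
        return None
    except TypeError:
        log.append("bad type")    # e.g. None, which int() cannot convert
        return None
    else:
        log.append("parsed")      # runs only if the try block raised nothing
        return age
    finally:
        log.append("done")        # runs on every path, even after a return


log = []
print(parse_age("42", log), log)
print(parse_age("abc", log))
print(parse_age(None, log))
```

Listing `ValueError` and `TypeError` separately keeps each failure's handling explicit; a tuple such as `except (ValueError, TypeError):` would be the shorter choice if both were handled identically.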

Custom exceptions allow you to create domain-specific error types that clearly communicate what went wrong. Create custom exceptions by inheriting from `Exception` (or more specific exceptions) and adding relevant attributes. Exception hierarchies enable catching related errors together (catch the base class to catch all subclasses). Custom exceptions should be descriptive, include relevant context, and follow naming conventions (end with 'Error'). They make error handling more precise and code more maintainable.
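The idea of a small exception hierarchy can be sketched as follows. The payment domain and every class and function name here are assumptions made up for illustration.

```python
class PaymentError(Exception):
    """Base class for all payment failures (hypothetical domain)."""


class InsufficientFundsError(PaymentError):
    """Carries the numbers a caller needs to react to the failure."""

    def __init__(self, balance, amount):
        super().__init__(f"Balance {balance} is too low to pay {amount}")
        self.balance = balance
        self.amount = amount


class CardExpiredError(PaymentError):
    pass


def pay(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount


try:
    pay(10, 25)
except PaymentError as exc:  # catching the base class also catches subclasses
    print(type(exc).__name__, "-", exc)
```

Callers that only care whether a payment failed catch `PaymentError`; callers that need the details catch `InsufficientFundsError` and read its attributes.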

Python's debugging tools include `pdb` (the interactive Python debugger), `traceback` (for extracting and formatting stack traces), `logging` (for recording application events and errors), `assert` statements (for checking assumptions during development; they are skipped when Python runs with the `-O` flag, so they must not replace real validation), and IDE debuggers. `pdb` allows you to set breakpoints, step through code, inspect variables, and evaluate expressions; calling the built-in `breakpoint()` drops into it at that line. The `logging` module provides flexible logging with different levels (DEBUG, INFO, WARNING, ERROR, CRITICAL) and multiple handlers (console, file, network). Proper logging is essential for debugging production issues.
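A short sketch of `logging` and `traceback` working together is shown below. The logger name `lesson.demo` and the `safe_divide` helper are assumptions for illustration, and the log is written to an in-memory buffer so the example leaves no files behind.

```python
import io
import logging
import traceback

# Log to an in-memory buffer so the example has no side effects on disk.
buffer = io.StringIO()
demo_logger = logging.getLogger("lesson.demo")   # hypothetical logger name
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False                    # keep output out of the root logger
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
demo_logger.addHandler(handler)


def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and appends the stack trace.
        demo_logger.exception("Division failed for a=%r, b=%r", a, b)
        return None


safe_divide(1, 0)
print(buffer.getvalue())

# traceback.format_exc() returns the active exception's trace as a string.
try:
    int("not a number")
except ValueError:
    trace_text = traceback.format_exc()
print(trace_text.splitlines()[-1])
```

`logger.exception` is only meaningful inside an `except` block, because it reads the exception currently being handled.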

Error handling strategies include fail-fast (raise an exception as soon as an error is detected), fail-safe (handle the error gracefully and continue), and defensive programming (validate inputs and handle edge cases). The choice depends on context: critical errors should fail fast, while recoverable errors can be handled gracefully. Always log errors with sufficient context (what happened, where, when, why) for debugging. Use specific exception types rather than catching everything, and let exceptions propagate when you can't handle them meaningfully.
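The contrast between fail-fast and fail-safe can be sketched with two hypothetical functions. Treating a bad port as fatal and a malformed score line as skippable is an assumption chosen for illustration, not a rule.

```python
def load_port(config):
    """Fail fast: a missing or invalid port is a configuration bug, so raise at once."""
    port = config["port"]                # KeyError propagates if the key is absent
    if not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError(f"Invalid port: {port!r}")
    return port


def parse_scores(lines):
    """Fail safe: skip malformed lines, keep going, and report what was skipped."""
    scores, skipped = [], []
    for line in lines:
        try:
            scores.append(float(line))
        except ValueError:
            skipped.append(line)         # recoverable: record it and continue
    return scores, skipped


print(parse_scores(["1.5", "oops", "3"]))
try:
    load_port({"port": 70000})
except ValueError as exc:
    print(exc)
```

Returning the skipped lines alongside the results keeps the fail-safe path from silently hiding bad data.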

Best practices include using specific exception types, providing meaningful error messages, logging errors with context, using try-except-finally for resource cleanup, creating custom exceptions for domain-specific errors, and testing error handling paths. Avoid bare `except:` clauses (catch `Exception` explicitly), don't suppress exceptions silently, and don't use exceptions for routine control flow where a plain conditional would be clearer. Good error handling makes applications more robust, debuggable, and user-friendly.
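Testing error handling paths can be as simple as asserting that the right exception is raised. The sketch below uses the standard `unittest` module; the `withdraw` function is a hypothetical stand-in for real application code.

```python
import unittest


def withdraw(balance, amount):
    """Hypothetical function under test."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


class WithdrawErrorTests(unittest.TestCase):
    def test_rejects_non_positive_amount(self):
        with self.assertRaises(ValueError):
            withdraw(100, 0)

    def test_message_names_the_problem(self):
        # assertRaisesRegex also checks that the message is meaningful.
        with self.assertRaisesRegex(ValueError, "insufficient funds"):
            withdraw(10, 50)

    def test_success_path(self):
        self.assertEqual(withdraw(100, 30), 70)


# Run the tests programmatically so the example works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WithdrawErrorTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Checking the message as well as the exception type guards against an error being raised for the wrong reason.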

Key Concepts

  • Python's exception system handles errors in a structured way.
  • Custom exceptions enable domain-specific error types.
  • try-except-finally blocks handle errors and ensure cleanup.
  • pdb, traceback, and logging are essential debugging tools.
  • Good error handling improves robustness and debuggability.

Learning Objectives

Master

  • Creating custom exceptions for domain-specific errors
  • Using try-except-finally for proper error handling and cleanup
  • Working with Python debugging tools (pdb, traceback, logging)
  • Implementing effective error handling strategies

Develop

  • Defensive programming and error handling thinking
  • Understanding exception propagation and handling
  • Designing robust, fault-tolerant applications

Tips

  • Use specific exception types rather than catching everything.
  • Always log errors with sufficient context for debugging.
  • Use try-except-finally for resource cleanup.
  • Create custom exceptions for domain-specific error types.

Common Pitfalls

  • Using bare `except:` clauses, which also catch system-exiting exceptions such as `SystemExit` and `KeyboardInterrupt`.
  • Suppressing exceptions silently, making debugging impossible.
  • Using exceptions for routine control flow, which obscures intent when a plain conditional would do.
  • Not providing meaningful error messages, making debugging difficult.
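
The first pitfall can be demonstrated with a small sketch. The function names are hypothetical; the behavior shown follows from `SystemExit` inheriting from `BaseException` rather than `Exception`.

```python
import sys


def run_with_bare_except(task):
    try:
        task()
    except:                      # pitfall: also traps SystemExit and KeyboardInterrupt
        return "swallowed"
    return "finished"


def run_with_except_exception(task):
    try:
        task()
    except Exception:            # ordinary errors only
        return "handled"
    return "finished"


def request_exit():
    sys.exit(1)                  # raises SystemExit, a BaseException subclass


print(run_with_bare_except(request_exit))
try:
    run_with_except_exception(request_exit)
except SystemExit as exc:
    print("SystemExit propagated with code", exc.code)
```

With the bare clause the program's request to exit is silently lost; with `except Exception` it propagates as intended.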

Summary

  • Python's exception system provides structured error handling.
  • Custom exceptions enable domain-specific, meaningful error types.
  • try-except-finally ensures proper error handling and cleanup.
  • Debugging tools (pdb, traceback, logging) are essential for troubleshooting.
  • Good error handling makes applications robust and maintainable.

Exercise

Implement a comprehensive error handling and debugging system with custom exceptions, logging, debugging utilities, and error recovery mechanisms.

import traceback
import logging
import sys
from typing import Any, Dict, List, Optional, Callable
from datetime import datetime
import json

# Configure comprehensive logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('application.log'),
        logging.StreamHandler(sys.stdout)
    ]
)
logger = logging.getLogger(__name__)

class ApplicationError(Exception):
    """Base exception class for application-specific errors."""
    
    def __init__(self, message: str, error_code: Optional[str] = None, details: Optional[Dict[str, Any]] = None):
        super().__init__(message)
        self.message = message
        self.error_code = error_code or "UNKNOWN_ERROR"
        self.details = details or {}
        self.timestamp = datetime.now()
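        # format_exc() describes the exception currently being handled; when this
        # error is created outside an except block there is none, so store None.
        self.traceback = traceback.format_exc() if sys.exc_info()[0] is not None else None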
        
    def to_dict(self) -> Dict[str, Any]:
        """Convert exception to dictionary for logging or API responses."""
        return {
            "error_type": self.__class__.__name__,
            "message": self.message,
            "error_code": self.error_code,
            "details": self.details,
            "timestamp": self.timestamp.isoformat(),
            "traceback": self.traceback
        }
        
    def __str__(self):
        return f"{self.__class__.__name__}: {self.message} (Code: {self.error_code})"

class ValidationError(ApplicationError):
    """Exception raised when data validation fails."""
    
    def __init__(self, field: str, value: Any, rule: str, message: Optional[str] = None):
        super().__init__(
            message or f"Validation failed for field '{field}' with value '{value}' using rule '{rule}'",
            "VALIDATION_ERROR",
            {"field": field, "value": value, "rule": rule}
        )
        self.field = field
        self.value = value
        self.rule = rule

class DatabaseError(ApplicationError):
    """Exception raised when database operations fail."""
    
    def __init__(self, operation: str, table: str, message: str, original_error: Optional[Exception] = None):
        super().__init__(
            f"Database operation '{operation}' on table '{table}' failed: {message}",
            "DATABASE_ERROR",
            {"operation": operation, "table": table, "original_error": str(original_error) if original_error else None}
        )
        self.operation = operation
        self.table = table
        self.original_error = original_error

class NetworkError(ApplicationError):
    """Exception raised when network operations fail."""
    
    def __init__(self, url: str, method: str, status_code: Optional[int] = None, message: Optional[str] = None):
        super().__init__(
            message or f"Network request to '{url}' using {method} failed",
            "NETWORK_ERROR",
            {"url": url, "method": method, "status_code": status_code}
        )
        self.url = url
        self.method = method
        self.status_code = status_code

class ErrorHandler:
    """Centralized error handling and logging system."""
    
    def __init__(self):
        self.error_counts: Dict[str, int] = {}
        self.error_history: List[Dict[str, Any]] = []
        self.max_history_size = 1000
        self.recovery_strategies: Dict[str, Callable] = {}
        
    def handle_error(self, error: Exception, context: Dict[str, Any] = None) -> Dict[str, Any]:
        """Handle an error and return error information."""
        # Log the error
        self._log_error(error, context)
        
        # Update error counts
        error_type = error.__class__.__name__
        self.error_counts[error_type] = self.error_counts.get(error_type, 0) + 1
        
        # Add to history
        self._add_to_history(error, context)
        
        # Try to recover if possible
        recovery_result = self._attempt_recovery(error, context)
        
        # Return error information
        return {
            "error": error,
            "context": context or {},
            "recovery_attempted": recovery_result is not None,
            "recovery_result": recovery_result,
            "error_count": self.error_counts[error_type],
            "timestamp": datetime.now().isoformat()
        }
        
    def _log_error(self, error: Exception, context: Optional[Dict[str, Any]] = None):
        """Log error with context information."""
        if isinstance(error, ApplicationError):
            error_info = error.to_dict()
            logger.error(f"Application error: {error_info}")
        else:
            # Pass the exception itself so its traceback is logged even when
            # this method is called outside an except block.
            logger.error(f"System error: {error}", exc_info=error)
        
        if context:
            logger.error(f"Error context: {json.dumps(context, indent=2, default=str)}")
        
    def _add_to_history(self, error: Exception, context: Dict[str, Any] = None):
        """Add error to history, maintaining size limit."""
        error_record = {
            "timestamp": datetime.now().isoformat(),
            "error_type": error.__class__.__name__,
            "message": str(error),
            "context": context or {}
        }
        
        self.error_history.append(error_record)
        
        # Maintain history size
        if len(self.error_history) > self.max_history_size:
            self.error_history.pop(0)
        
    def _attempt_recovery(self, error: Exception, context: Dict[str, Any] = None) -> Any:
        """Attempt to recover from the error using registered strategies."""
        error_type = error.__class__.__name__
        
        if error_type in self.recovery_strategies:
            try:
                recovery_func = self.recovery_strategies[error_type]
                result = recovery_func(error, context)
                logger.info(f"Recovery strategy executed for {error_type}: {result}")
                return result
            except Exception as recovery_error:
                logger.error(f"Recovery strategy failed for {error_type}: {recovery_error}")
                return None
        
        return None
        
    def register_recovery_strategy(self, error_type: str, strategy: Callable):
        """Register a recovery strategy for a specific error type."""
        self.recovery_strategies[error_type] = strategy
        logger.info(f"Recovery strategy registered for {error_type}")
        
    def get_error_statistics(self) -> Dict[str, Any]:
        """Get statistics about errors."""
        total_errors = sum(self.error_counts.values())
        
        return {
            "total_errors": total_errors,
            "error_counts": self.error_counts.copy(),
            "history_size": len(self.error_history),
            "recovery_strategies": list(self.recovery_strategies.keys())
        }
        
    def clear_history(self):
        """Clear error history."""
        self.error_history.clear()
        logger.info("Error history cleared")

class DebugHelper:
    """Utility class for debugging and troubleshooting."""
    
    @staticmethod
    def inspect_object(obj: Any, max_depth: int = 3, current_depth: int = 0) -> Dict[str, Any]:
        """Inspect an object and return detailed information."""
        if current_depth >= max_depth:
            return {"type": type(obj).__name__, "value": str(obj), "max_depth_reached": True}
        
        try:
            if hasattr(obj, '__dict__'):
                # Object with attributes
                result = {
                    "type": type(obj).__name__,
                    "module": getattr(obj, '__module__', 'Unknown'),
                    "attributes": {}
                }
                
                for attr_name, attr_value in obj.__dict__.items():
                    if not attr_name.startswith('_'):
                        result["attributes"][attr_name] = DebugHelper.inspect_object(
                            attr_value, max_depth, current_depth + 1
                        )
                
                return result
            elif isinstance(obj, (list, tuple)):
                # Sequence types
                return {
                    "type": type(obj).__name__,
                    "length": len(obj),
                    "items": [DebugHelper.inspect_object(item, max_depth, current_depth + 1) 
                              for item in obj[:5]]  # Limit to first 5 items
                }
            elif isinstance(obj, dict):
                # Dictionary
                return {
                    "type": type(obj).__name__,
                    "length": len(obj),
                    "keys": list(obj.keys())[:10],  # Limit to first 10 keys
                    "sample_values": {k: DebugHelper.inspect_object(v, max_depth, current_depth + 1) 
                                      for k, v in list(obj.items())[:5]}
                }
            else:
                # Simple types
                return {
                    "type": type(obj).__name__,
                    "value": str(obj),
                    "repr": repr(obj)
                }
        except Exception as e:
            return {
                "type": type(obj).__name__,
                "error": f"Inspection failed: {str(e)}"
            }
    
    @staticmethod
    def get_call_stack() -> List[Dict[str, str]]:
        """Get the current call stack information."""
        stack = traceback.extract_stack()
        return [
            {
                "filename": frame.filename,
                "line_number": frame.lineno,
                "function_name": frame.name,
                "line_content": frame.line
            }
            for frame in stack[:-1]  # Exclude this function
        ]
    
    @staticmethod
    def measure_execution_time(func: Callable, *args, **kwargs) -> Dict[str, Any]:
        """Measure execution time and, when psutil is installed, memory usage of a function."""
        import time
        try:
            import psutil  # optional third-party dependency
        except ImportError:
            psutil = None

        def current_memory() -> int:
            # Resident set size in bytes, or 0 when psutil is unavailable
            return psutil.Process().memory_info().rss if psutil else 0

        start_memory = current_memory()
        start_time = time.perf_counter()  # monotonic clock suited to timing

        try:
            result = func(*args, **kwargs)
            execution_time = time.perf_counter() - start_time
            end_memory = current_memory()

            return {
                "success": True,
                "result": result,
                "execution_time": execution_time,
                "memory_delta": end_memory - start_memory,
                "start_memory": start_memory,
                "end_memory": end_memory
            }
        except Exception as e:
            execution_time = time.perf_counter() - start_time
            return {
                "success": False,
                "error": str(e),
                "execution_time": execution_time,
                "memory_delta": 0
            }

class DataValidator:
    """Data validation with comprehensive error reporting."""
    
    def __init__(self):
        self.validation_rules = {}
        self.custom_validators = {}
        
    def add_rule(self, field: str, rule: str, validator: Callable, message: str = None):
        """Add a validation rule for a field."""
        if field not in self.validation_rules:
            self.validation_rules[field] = []
        
        self.validation_rules[field].append({
            "rule": rule,
            "validator": validator,
            "message": message
        })
        
    def validate(self, data: Dict[str, Any]) -> List[ValidationError]:
        """Validate data against all registered rules."""
        errors = []
        
        for field, rules in self.validation_rules.items():
            if field in data:
                value = data[field]
                
                for rule_info in rules:
                    try:
                        if not rule_info["validator"](value):
                            message = rule_info["message"] or f"Validation failed for {field}"
                            errors.append(ValidationError(field, value, rule_info["rule"], message))
                    except Exception as e:
                        errors.append(ValidationError(field, value, rule_info["rule"], f"Validator error: {e}"))
            else:
                # Field is missing
                errors.append(ValidationError(field, None, "required", f"Field '{field}' is required"))
        
        return errors

# Example usage and demonstration
def demonstrate_error_handling():
    """Demonstrate the comprehensive error handling system."""
    print("=== Comprehensive Error Handling Demo ===\n")
    
    # Initialize error handler
    error_handler = ErrorHandler()
    
    # Register recovery strategies
    def recover_from_validation_error(error: ValidationError, context: Dict[str, Any]) -> str:
        """Recovery strategy for validation errors."""
        if error.field == "email" and error.rule == "format":
            return f"Attempting to fix email format: {error.value}"
        return "No recovery strategy available"
    
    error_handler.register_recovery_strategy("ValidationError", recover_from_validation_error)
    
    # Initialize data validator
    validator = DataValidator()
    
    # Add validation rules
    validator.add_rule("email", "format", lambda x: '@' in str(x), "Email must contain @ symbol")
    validator.add_rule("age", "range", lambda x: 0 <= int(x) <= 120, "Age must be between 0 and 120")
    validator.add_rule("name", "length", lambda x: len(str(x)) >= 2, "Name must be at least 2 characters")
    
    # Test data validation
    print("1. Data Validation with Error Handling:")
    test_data = {
        "email": "invalid-email",
        "age": 150,
        "name": "A"
    }
    
    try:
        validation_errors = validator.validate(test_data)
        if validation_errors:
            print(f"  Validation errors found: {len(validation_errors)}")
            for error in validation_errors:
                print(f"    - {error}")
                # Handle the error
                error_info = error_handler.handle_error(error, {"data": test_data})
                print(f"      Error handled: {error_info['recovery_result']}")
        else:
            print("  All data is valid!")
    except Exception as e:
        print(f"  Validation failed: {e}")
    
    # Test custom exceptions
    print("\n2. Custom Exception Handling:")
    
    try:
        # Simulate a database error
        raise DatabaseError("SELECT", "users", "Connection timeout", ConnectionError("Connection failed"))
    except DatabaseError as e:
        error_info = error_handler.handle_error(e, {"operation": "user_query"})
        print(f"  Database error handled: {error_info['error']}")
    
    # Test debugging utilities
    print("\n3. Debugging Utilities:")
    
    # Inspect an object
    sample_object = {
        "name": "John Doe",
        "age": 30,
        "skills": ["Python", "JavaScript", "SQL"]
    }
    
    inspection = DebugHelper.inspect_object(sample_object)
    print(f"  Object inspection: {json.dumps(inspection, indent=2)}")
    
    # Get call stack
    stack = DebugHelper.get_call_stack()
    print(f"  Call stack depth: {len(stack)}")
    
    # Measure execution time
    def sample_function():
        import time
        time.sleep(0.1)
        return "Function completed"
    
    timing = DebugHelper.measure_execution_time(sample_function)
    print(f"  Function timing: {timing}")
    
    # Get error statistics
    print("\n4. Error Statistics:")
    stats = error_handler.get_error_statistics()
    print(f"  Total errors: {stats['total_errors']}")
    print(f"  Error counts: {stats['error_counts']}")
    print(f"  Recovery strategies: {stats['recovery_strategies']}")

def demonstrate_error_recovery():
    """Demonstrate error recovery mechanisms."""
    print("\n=== Error Recovery Demonstration ===\n")
    
    error_handler = ErrorHandler()
    
    # Register a recovery strategy for network errors
    def retry_network_request(error: NetworkError, context: Dict[str, Any]) -> str:
        """Retry network request with exponential backoff."""
        import time
        
        max_retries = context.get('max_retries', 3)
        current_retry = context.get('current_retry', 0)
        
        if current_retry < max_retries:
            wait_time = 2 ** current_retry
            print(f"    Retrying in {wait_time} seconds... (attempt {current_retry + 1}/{max_retries})")
            time.sleep(wait_time)
            return f"Retry attempt {current_retry + 1} completed"
        else:
            return "Max retries exceeded"
    
    error_handler.register_recovery_strategy("NetworkError", retry_network_request)
    
    # Simulate network errors with recovery
    print("1. Network Error Recovery:")
    
    for attempt in range(3):
        try:
            # Simulate network failure
            raise NetworkError("https://api.example.com", "GET", 500, "Internal server error")
        except NetworkError as e:
            context = {"max_retries": 3, "current_retry": attempt}
            error_info = error_handler.handle_error(e, context)
            print(f"  Attempt {attempt + 1}: {error_info['recovery_result']}")
            
            # Compare with == so a None result (failed recovery) does not raise TypeError
            if error_info['recovery_result'] == "Max retries exceeded":
                break
    
    print("\n2. Error History Analysis:")
    
    # Show error history
    stats = error_handler.get_error_statistics()
    print(f"  Total errors in session: {stats['total_errors']}")
    print(f"  Error types encountered: {list(stats['error_counts'].keys())}")
    
    # Clear history
    error_handler.clear_history()
    print("  Error history cleared")

if __name__ == "__main__":
    demonstrate_error_handling()
    demonstrate_error_recovery()
