Amazon CloudWatch - Monitoring
65 min
Amazon CloudWatch is a monitoring and observability service for AWS resources, applications, and services. It collects monitoring and operational data in the form of logs, metrics, and events and presents it in a unified view. This visibility enables you to monitor application performance, troubleshoot issues, and optimize resource utilization.
A CloudWatch metric is a time-ordered set of data points about the performance of a resource or application. Many AWS services send metrics to CloudWatch automatically, including CPU utilization, network traffic, request counts, and error rates. You can also publish custom metrics from your applications. Metric data is retained for up to 15 months, with older data points available only at coarser resolution, which still supports long-term trend analysis. Understanding metrics helps you identify performance bottlenecks and optimize resource usage.
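Publishing a custom metric can be sketched with the AWS CLI as follows. This is a minimal example, assuming the AWS CLI is installed and credentials are configured. The function name publish_metric, the namespace MyApp, and the metric QueueDepth are hypothetical names chosen for illustration. Custom namespaces must not begin with "AWS/", which is reserved for AWS services.

```shell
# Publish a single data point for a custom metric.
# Arguments: namespace, metric name, value, unit (optional, defaults to Count).
# publish_metric is a hypothetical helper name, not part of the AWS CLI.
publish_metric() {
  aws cloudwatch put-metric-data \
    --namespace "$1" \
    --metric-name "$2" \
    --value "$3" \
    --unit "${4:-Count}"
}

# Example call (hypothetical namespace and metric name):
#   publish_metric "MyApp" "QueueDepth" 42
```

Wrapping the call in a function keeps the namespace and unit consistent wherever the application reports the metric.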
CloudWatch Logs lets you centralize logs from EC2 instances, Lambda functions, and other sources. Log groups organize related log streams, and each log stream contains a sequence of log events from a single source. CloudWatch Logs Insights provides interactive query capabilities to search and analyze log data. By default, log events are kept indefinitely; a retention policy on a log group automatically deletes events older than the configured period, which helps manage storage costs while still retaining logs for as long as compliance requires.
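The retention and query features described above might be used as in the sketch below. It assumes the AWS CLI is installed and credentials are configured. The function names and the log group name in the usage lines are hypothetical, and the 30-day default is only an example value.

```shell
# Set a retention policy on a log group.
# Arguments: log group name, retention in days (optional, defaults to 30).
set_log_retention() {
  aws logs put-retention-policy \
    --log-group-name "$1" \
    --retention-in-days "${2:-30}"
}

# Start a Logs Insights query for recent ERROR lines over the last hour.
# Prints the query ID; results are fetched separately once the query finishes.
start_error_query() {
  query_end=$(date +%s)
  query_start=$((query_end - 3600))
  aws logs start-query \
    --log-group-name "$1" \
    --start-time "$query_start" \
    --end-time "$query_end" \
    --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20' \
    --query 'queryId' \
    --output text
}

# Example calls (hypothetical log group name):
#   set_log_retention "/my-app/prod" 30
#   query_id=$(start_error_query "/my-app/prod")
#   aws logs get-query-results --query-id "$query_id"
```

Logs Insights queries run asynchronously, so get-query-results may report a status of Running until the query completes.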
CloudWatch Alarms watch metrics and perform actions when thresholds are breached. An alarm is always in one of three states: OK, ALARM, or INSUFFICIENT_DATA. Alarms can trigger Auto Scaling actions, send SNS notifications, or perform EC2 actions such as stopping or rebooting an instance. Each alarm evaluates a statistic (for example average, sum, minimum, or maximum) over a fixed period and changes state based on how many recent periods breach the threshold. Configuring appropriate alarms enables proactive issue detection and automated responses.
CloudWatch Dashboards provide customizable views of your metrics and alarms. Dashboards can display multiple metrics from different AWS services, giving you a unified view of your application's health and performance. Dashboards are useful for operations teams to monitor applications at a glance and can be shared across teams. Custom dashboards enable you to focus on metrics relevant to your specific use cases.
Several related features round out the picture. CloudWatch Events (now Amazon EventBridge) supports event-driven architectures, AWS X-Ray, a separate service that integrates with CloudWatch, provides distributed tracing, and ServiceLens ties traces, metrics, and logs together for end-to-end observability. These features complement metrics and logs, providing comprehensive observability for modern applications. Understanding CloudWatch's capabilities enables you to build observable, maintainable applications on AWS.
Key Concepts
- CloudWatch provides monitoring and observability for AWS resources.
- Metrics are data points about resource and application performance.
- CloudWatch Logs centralize and analyze log data from multiple sources.
- Alarms monitor metrics and trigger actions when thresholds are breached.
- Dashboards provide customizable views of metrics and alarms.
Learning Objectives
Master
- Understanding CloudWatch metrics and their use cases
- Configuring CloudWatch Logs for centralized logging
- Creating and configuring CloudWatch alarms
- Building CloudWatch dashboards for monitoring
Develop
- Understanding observability and monitoring best practices
- Designing comprehensive monitoring strategies
- Implementing proactive issue detection and response
Tips
- Set up alarms for critical metrics to detect issues early.
- Use CloudWatch Logs Insights to query and analyze log data efficiently.
- Create custom metrics for application-specific monitoring needs.
- Use CloudWatch dashboards to visualize key metrics at a glance.
Common Pitfalls
- Not setting up monitoring, missing performance issues and failures.
- Creating too many alarms, causing alert fatigue and missing important alerts.
- Not retaining logs appropriately, losing important debugging information.
- Not using custom metrics, missing application-specific insights.
Summary
- CloudWatch provides comprehensive monitoring and observability.
- Metrics, logs, and alarms enable proactive issue detection.
- Dashboards provide unified views of application health.
- Proper monitoring enables reliable, maintainable applications.
Exercise
Set up CloudWatch monitoring, alarms, and dashboards.
# Create a CloudWatch dashboard.
# "my-asg" and the load balancer value are placeholders. The LoadBalancer
# dimension takes the form app/<name>/<id> (the final part of the ALB's ARN),
# not just the load balancer's name.
cat > dashboard.json << 'EOF'
{
  "widgets": [
    {
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "AutoScalingGroupName", "my-asg"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "EC2 CPU Utilization"
      }
    },
    {
      "type": "metric",
      "x": 12,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", "app/my-alb/1234567890abcdef"]
        ],
        "period": 300,
        "stat": "Sum",
        "region": "us-east-1",
        "title": "ALB Request Count"
      }
    }
  ]
}
EOF
aws cloudwatch put-dashboard --dashboard-name "MyAppDashboard" --dashboard-body file://dashboard.json
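# Create a CloudWatch alarm that fires when average CPU for the Auto Scaling
# group stays above 80% for two consecutive 5-minute periods.
# "my-asg" and the SNS topic ARN are placeholders.
aws cloudwatch put-metric-alarm \
  --alarm-name "HighCPUAlarm" \
  --alarm-description "Alarm when CPU exceeds 80%" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --dimensions Name=AutoScalingGroupName,Value=my-asg \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-sns-topic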
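Once the alarm exists, it can be useful to check its state and confirm that its actions actually fire. The sketch below shows one way to do that, assuming the AWS CLI is installed and credentials are configured. The function names are hypothetical helpers, and HighCPUAlarm is the alarm name from the exercise.

```shell
# Print the current state of an alarm: OK, ALARM, or INSUFFICIENT_DATA.
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Temporarily force an alarm into the ALARM state to exercise its actions
# (for example, the SNS notification). CloudWatch returns the alarm to its
# real state at the next evaluation.
trigger_alarm_test() {
  aws cloudwatch set-alarm-state \
    --alarm-name "$1" \
    --state-value ALARM \
    --state-reason "Manual test of alarm actions"
}

# Example calls:
#   alarm_state "HighCPUAlarm"
#   trigger_alarm_test "HighCPUAlarm"
```

Forcing the state this way tests the notification path without having to generate real CPU load.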
Exercise Tips
- Use CloudWatch Logs Insights to query logs with its pipe-delimited query language (commands such as fields, filter, stats, and sort).
- Set up metric filters to extract custom metrics from log data.
- Use CloudWatch composite alarms to combine multiple alarm conditions.
- Enable detailed monitoring for EC2 instances to get metrics at 1-minute intervals instead of the default 5-minute intervals.
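The last three tips can be sketched with the AWS CLI as shown below. This assumes the AWS CLI is installed and credentials are configured. The function names, the ErrorCount metric, the MyApp namespace, and the alarm, log group, and instance identifiers in the usage lines are all hypothetical.

```shell
# Metric filter: count log events containing the term ERROR as a custom metric.
# Argument: log group name.
create_error_metric_filter() {
  aws logs put-metric-filter \
    --log-group-name "$1" \
    --filter-name "ErrorCount" \
    --filter-pattern "ERROR" \
    --metric-transformations \
      metricName=ErrorCount,metricNamespace=MyApp,metricValue=1
}

# Composite alarm: enter ALARM only when both child alarms are in ALARM.
# Arguments: composite alarm name, first child alarm, second child alarm.
create_composite_alarm() {
  aws cloudwatch put-composite-alarm \
    --alarm-name "$1" \
    --alarm-rule "ALARM(\"$2\") AND ALARM(\"$3\")"
}

# Detailed monitoring: switch an EC2 instance to 1-minute metrics.
# Argument: instance ID.
enable_detailed_monitoring() {
  aws ec2 monitor-instances --instance-ids "$1"
}

# Example calls (hypothetical names and IDs):
#   create_error_metric_filter "/my-app/prod"
#   create_composite_alarm "AppUnhealthy" "HighCPUAlarm" "HighErrorRate"
#   enable_detailed_monitoring "i-0123456789abcdef0"
```

Combining alarms with AND in a composite alarm is one way to reduce alert fatigue, since a single noisy child alarm no longer notifies on its own.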