Introduction
- Learn how to reduce AWS costs that occur alongside your Datadog subscription fees
- Reading time: about 5 minutes
- Technology stack: AWS, Datadog
Target Audience
- Teams using Datadog to monitor AWS environments
- Anyone looking to optimize AWS integration costs with Datadog
- Infrastructure/SRE professionals concerned about cloud monitoring costs
Technical Background
When monitoring AWS environments, many organizations rely on monitoring tools like Datadog. However, beyond Datadog’s monthly fees, there’s an often-overlooked cost: AWS CloudWatch API call charges. Datadog regularly calls the CloudWatch API to collect metrics, and AWS bills those calls to your account.
According to Datadog’s official documentation, the AWS integration uses crawlers that collect metrics every 10 minutes by default. When CloudWatch provides metrics at 5-minute granularity, the delay before they appear in Datadog is typically 15-20 minutes: 5-10 minutes for CloudWatch to make the data available, plus 10 minutes for Datadog’s default polling interval. Queuing and CloudWatch API limitations can add up to 5 more minutes on top of that.
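The delay figures can be sanity-checked with a few lines of arithmetic. The sketch below is only a rough model: it treats the CloudWatch latency and the polling interval as additive and counts queuing as an extra on top, which is one reading of the numbers quoted above. The constant and function names are illustrative, not part of any Datadog or AWS API.

```python
# Rough delay model for CloudWatch metrics arriving in Datadog (minutes).
# Assumption: the components simply add up; real delays vary.
CLOUDWATCH_LATENCY = (5, 10)   # CloudWatch makes data available after 5-10 min
DATADOG_POLL_INTERVAL = 10     # default crawler interval
QUEUE_OVERHEAD_MAX = 5         # queuing / API limits can add up to this much


def delay_range(poll_interval=DATADOG_POLL_INTERVAL):
    """Return (low, high, high_with_queuing) delay estimates in minutes."""
    low = CLOUDWATCH_LATENCY[0] + poll_interval
    high = CLOUDWATCH_LATENCY[1] + poll_interval
    return low, high, high + QUEUE_OVERHEAD_MAX


print(delay_range())    # (15, 20, 25)
print(delay_range(30))  # (35, 40, 45)
```

Under this model, lengthening the polling interval shifts every figure by the same amount, which is worth keeping in mind before changing it.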
Problem Details
- AWS CloudWatch API call costs are incurred separately from Datadog subscription fees
- Default settings collect data from all regions at 10-minute intervals
- Unnecessary costs are generated from unused regions and overly frequent polling
According to an AWS re:Post article, additional costs from the Datadog AWS integration are significantly influenced by the frequency and scope of CloudWatch API calls. With default settings, metrics are collected from all regions and services, resulting in costs for resources you may not even be using.
Solution Approach
I’ll introduce two cost-reduction approaches I’ve implemented:
- Adjusting Metric Collection Intervals: Extending the default 10-minute interval to reduce API call frequency
- Limiting Collection Regions: Restricting metric collection to only the regions you actually use
These configuration changes can be implemented without significantly reducing monitoring quality while achieving cost savings.
Implementation Steps
1. Adjusting Metric Collection Intervals
By default, Datadog calls the AWS CloudWatch API every 10 minutes to collect metrics. By extending this interval, you can reduce the number of API calls and associated costs.
# Before Change
Polling interval: 10 minutes (default setting)
API calls per day: 144 calls/day/metric
# After Change
Polling interval: 30 minutes (example)
API calls per day: 48 calls/day/metric
- Implementation purpose: Avoid unnecessarily frequent metric collection to reduce costs
- Operating principle: Modify the polling interval configured on the Datadog server side
- Note: This change requires a request to Datadog Support
In many cases, extending the interval from 10 minutes to 30 minutes won’t cause operational issues. However, consider the balance carefully for critical metrics or scenarios requiring rapid detection.
According to the Datadog glossary, collection interval refers to how frequently agents or integrations collect metrics. Extending this interval reduces resource usage and costs but comes with the trade-off of lower time-series data granularity.
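The before/after figures above follow directly from the number of minutes in a day. As a minimal sketch (the function names are illustrative, and it assumes one API call per metric per polling cycle, as the figures above do):

```python
MINUTES_PER_DAY = 24 * 60


def calls_per_day(interval_minutes):
    """Polling cycles per day for a given interval (one call per metric per cycle)."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return MINUTES_PER_DAY // interval_minutes


def reduction_percent(old_interval, new_interval):
    """Percentage drop in daily calls when moving from old to new interval."""
    before = calls_per_day(old_interval)
    after = calls_per_day(new_interval)
    return 100 * (before - after) / before


print(calls_per_day(10))                      # 144
print(calls_per_day(30))                      # 48
print(round(reduction_percent(10, 30), 1))    # 66.7
```

Going from 10 to 30 minutes therefore cuts the call count by about two thirds, at the cost of coarser data.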
2. Limiting Collection Regions
By default, Datadog collects metrics from all AWS regions. You can reduce costs by stopping collection from regions you aren’t using or that are of low importance.
# Datadog configuration example (illustrative; apply via the AWS integration settings, API, or Terraform)
integration:
  aws:
    regions:
      - ap-northeast-1 # Tokyo region
      - us-east-1      # Virginia region (used for CloudFront, etc.)
      # Other regions disabled
If you’re using Terraform, you can define your resources like this:
resource "datadog_integration_aws" "integration" {
  account_id = "123456789012" # AWS Account ID
  role_name  = "DatadogAWSIntegrationRole"

  # Note: host_tags only attaches tags to collected hosts and metrics.
  # It does not restrict which regions are polled, so it is not used here.

  # Exclude every region you do not use; Tokyo (ap-northeast-1) and
  # Virginia (us-east-1) are not listed, so they remain monitored.
  excluded_regions = [
    "ap-northeast-2", # Seoul
    "ap-northeast-3", # Osaka
    "ap-southeast-1", # Singapore
    "ap-southeast-2", # Sydney
    "eu-central-1",   # Frankfurt
    "eu-west-1",      # Ireland
    "eu-west-2",      # London
    "sa-east-1",      # Sao Paulo
    "us-east-2",      # Ohio
    "us-west-1",      # Northern California
    "us-west-2",      # Oregon
    # Add others as needed
  ]
}
- Implementation purpose: Reduce API calls to unnecessary regions
- Operating principle: Limit polling target regions in Datadog settings
- Note: Remember to update monitoring settings when you start using new regions
In many cases, you might only be actively using the Tokyo region (ap-northeast-1) and the Virginia region (us-east-1) for CloudFront. Stopping collection from other regions can significantly reduce costs.
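Maintaining the exclusion list by hand is error-prone, so it can help to derive it from the regions you actually use. The sketch below is one way to do that; `KNOWN_REGIONS` is a hypothetical, incomplete list built only from the regions named in the Terraform example above, and would need to be extended to match the regions enabled in your account.

```python
# Hypothetical region list: only the regions named in the example above.
# Extend it to cover every region enabled in your AWS account.
KNOWN_REGIONS = [
    "ap-northeast-1", "ap-northeast-2", "ap-northeast-3",
    "ap-southeast-1", "ap-southeast-2",
    "eu-central-1", "eu-west-1", "eu-west-2",
    "sa-east-1",
    "us-east-1", "us-east-2", "us-west-1", "us-west-2",
]


def excluded_regions(used, known=KNOWN_REGIONS):
    """Return the known regions that are not in use, preserving list order."""
    used_set = set(used)
    unknown = used_set - set(known)
    if unknown:
        # Catch typos such as "ap-northeast1" before they silently do nothing.
        raise ValueError(f"unknown region(s): {sorted(unknown)}")
    return [region for region in known if region not in used_set]


to_exclude = excluded_regions(["ap-northeast-1", "us-east-1"])
for region in to_exclude:
    print(f'"{region}",')  # lines ready to paste into excluded_regions
```

Raising on unknown region names is a deliberate choice here: a misspelled region would otherwise end up excluded without any warning.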
Results and Effects
Expected benefits from applying these changes:
- Reduced API call costs
- Maintained monitoring quality for important metrics
- Cost reduction proportional to the number of regions and collection frequency
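To get a feel for how the two changes combine, a back-of-the-envelope estimate can be sketched as follows. Everything here is an assumption for illustration: the per-region metric counts are made up, the model assumes every metric is requested once per polling cycle, and the price of USD 0.01 per 1,000 metrics requested should be checked against the current CloudWatch pricing page for your region.

```python
PRICE_PER_1000_METRICS = 0.01  # USD; assumed price, verify on the AWS pricing page
MINUTES_PER_DAY = 24 * 60


def monthly_cost(metrics_by_region, interval_minutes, days=30):
    """Estimated monthly API cost, assuming one request per metric per cycle."""
    cycles_per_day = MINUTES_PER_DAY // interval_minutes
    requested = sum(metrics_by_region.values()) * cycles_per_day * days
    return requested * PRICE_PER_1000_METRICS / 1000


# Hypothetical metric counts per region.
before = {"ap-northeast-1": 5000, "us-east-1": 500, "us-west-2": 200}
after = {"ap-northeast-1": 5000, "us-east-1": 500}  # us-west-2 excluded

print(f"{monthly_cost(before, 10):.2f}")  # 246.24
print(f"{monthly_cost(after, 30):.2f}")   # 79.20
```

In this made-up scenario most of the saving comes from the longer interval, because the excluded region held few metrics; the balance will differ in environments with many lightly used regions.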
Tips & Best Practices
- Vary polling intervals based on metric importance (shorter for critical metrics, longer for less important ones)
- Regularly check CloudWatch API request costs in AWS Cost Explorer (for example, filter by the CloudWatch service and group by usage type or API operation)
- Review monitoring settings when starting to use new regions
- According to Datadog’s official FAQ, if you need metrics with shorter latency, you can configure CloudWatch Metric Streams and Amazon Data Firehose to get metrics with a 2-3 minute delay
- To get system-level metrics more quickly, consider installing the Datadog Agent on your cloud hosts where possible
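The Cost Explorer check mentioned above can also be scripted. The sketch below only builds the request parameters; the helper name is hypothetical, and the service value "AmazonCloudWatch" is an assumption that should be confirmed against the service names shown in your own Cost Explorer.

```python
from datetime import date


def cloudwatch_cost_query(start, end):
    """Build Cost Explorer parameters for CloudWatch cost grouped by usage type."""
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        # Assumed service name; confirm it in your Cost Explorer console.
        "Filter": {"Dimensions": {"Key": "SERVICE", "Values": ["AmazonCloudWatch"]}},
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }


params = cloudwatch_cost_query(date(2024, 1, 1), date(2024, 2, 1))
print(params["TimePeriod"])  # {'Start': '2024-01-01', 'End': '2024-02-01'}
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("ce").get_cost_and_usage(**params)`. Comparing the result before and after a configuration change shows whether the saving actually materialized.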
Conclusion
When using Datadog to monitor AWS, it’s important to be aware of not just Datadog’s own fees but also the API call costs incurred on the AWS side. Two simple configuration changes, adjusting metric collection intervals and limiting regions, can lead to cost reductions. The effect is particularly noticeable in large environments or when monitoring multiple accounts.
Take a look at your settings to optimize costs while maintaining monitoring quality. These adjustments have minimal impact on Datadog’s performance while reducing your AWS bill.
References
- Datadog AWS Integration Documentation
- AWS CloudWatch API Pricing
- AWS Cost Optimization Best Practices
- Terraform Datadog Provider Documentation
- Cloud Metrics Delay – Explanation of Datadog’s AWS crawler and metric delays
- Collection Interval – Datadog Glossary – Explanation of metric collection intervals
- Optimizing Amazon CloudWatch Spend for Your Datadog AWS Integration – Optimization guide by AWS re:Post
- AWS Integration and CloudWatch FAQ – Datadog guide for reducing metric delays