Is Datadog Down? Current Status, Outage Reports & User Feedback
Status: Operational (last checked 3 minutes ago)
User Reports (Last 12 Hours)
Incident History
Resolved Incidents
We have detected a potential outage for Datadog.
Started:
Resolved:
Frequently Asked Questions
Why isn't my Datadog agent reporting data?
Agent reporting issues may be caused by configuration errors, connectivity problems, or service disruptions. Verify your API key is correct, check the agent logs for connection errors, and ensure your network allows outbound traffic to Datadog's intake endpoints.
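If the agent is silent, a quick first check is to separate a bad key from a network problem by validating the key directly against Datadog's API. A minimal sketch in Python, assuming the US1 site (api.datadoghq.com) and an API key in the DD_API_KEY environment variable; other Datadog sites use a different hostname:

```python
import os
import requests

# Validate the API key against Datadog's key-validation endpoint (US1 site assumed).
DD_SITE = "https://api.datadoghq.com"
api_key = os.environ["DD_API_KEY"]

resp = requests.get(
    f"{DD_SITE}/api/v1/validate",
    headers={"DD-API-KEY": api_key},
    timeout=10,
)

if resp.status_code == 200:
    print("API key is valid and the API endpoint is reachable.")
elif resp.status_code == 403:
    print("Endpoint reachable, but the API key was rejected.")
else:
    print(f"Unexpected response: {resp.status_code} {resp.text}")
```

If this succeeds but the agent still isn't reporting, the problem is more likely in the agent's own configuration or in egress rules for the agent's intake hostnames.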
Why can't I log in to Datadog?
Access issues can result from authentication problems, account status, or service disruptions. Try clearing your browser cache, check whether your SSO configuration is working properly, and verify your account hasn't exceeded user limits.
Why aren't my logs showing up in Datadog?
Log ingestion issues may be caused by collection configuration, indexing limits, or service disruptions. Check your log collection settings, verify you haven't exceeded your indexed log quota, and examine agent logs for any errors related to log transmission.
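One way to tell collection problems apart from indexing problems is to push a test log straight to the HTTP intake and then look for it in the Log Explorer. A minimal sketch, assuming the US1 intake hostname and a DD_API_KEY environment variable; the source and service tags here are arbitrary:

```python
import os
import requests

# Send a single test log event to the HTTP logs intake (US1 site assumed).
api_key = os.environ["DD_API_KEY"]

resp = requests.post(
    "https://http-intake.logs.datadoghq.com/api/v2/logs",
    headers={"DD-API-KEY": api_key, "Content-Type": "application/json"},
    json=[{
        "message": "datadog log ingestion smoke test",
        "ddsource": "manual-test",     # arbitrary source, useful for filtering
        "service": "ingestion-check",  # arbitrary service name
    }],
    timeout=10,
)

# A 202 means the intake accepted the event; if it still never appears in the
# Log Explorer, look at exclusion filters and indexed-log quotas instead.
print(resp.status_code, resp.text)
```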
Why didn't my Datadog monitor alert?
Alert issues can stem from monitor configuration, evaluation timing, or notification channel problems. Verify your monitor conditions and thresholds are correctly defined, check the monitor's evaluation history, and test notification channels to ensure they're functioning properly.
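Pulling the monitor's current state over the API can also show whether it is evaluating at all. A sketch assuming API and application keys in the environment and a hypothetical monitor ID:

```python
import os
import requests

# Fetch one monitor's definition and current state (US1 site assumed).
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}
monitor_id = 12345  # hypothetical monitor ID -- replace with your own

resp = requests.get(
    f"https://api.datadoghq.com/api/v1/monitor/{monitor_id}",
    headers=headers,
    timeout=10,
)
resp.raise_for_status()
monitor = resp.json()

# overall_state values such as "OK", "Alert", or "No Data" are telling:
# "No Data" usually points at a reporting gap rather than a broken notification.
print(monitor["name"], monitor["overall_state"])
print("query:", monitor["query"])
```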
Why are there gaps in my metric data?
Data gaps may be caused by agent connectivity issues, aggregation settings, or service disruptions. Check for agent restarts or outages during the gap periods, verify your data retention and aggregation settings, and examine whether the gaps affect all metrics or just specific ones.
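To check whether a gap exists in the data itself or only in a particular dashboard, the timeseries query API can be queried directly for the affected window. A sketch assuming API and application keys in the environment and a standard agent metric; swap in the metric that shows the gap:

```python
import os
import time
import requests

headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

# Query the last hour of a standard agent metric (US1 site assumed).
now = int(time.time())
resp = requests.get(
    "https://api.datadoghq.com/api/v1/query",
    headers=headers,
    params={
        "from": now - 3600,
        "to": now,
        "query": "avg:system.cpu.user{*}",  # replace with the metric showing gaps
    },
    timeout=10,
)
resp.raise_for_status()

series = resp.json().get("series", [])
points = sum(len(s["pointlist"]) for s in series)
# If the raw query returns points but a dashboard shows a gap, the issue is
# more likely aggregation or display settings than collection.
print(f"{points} data points across {len(series)} series")
```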
Why am I getting errors from the Datadog API?
API issues can result from authentication problems, rate limiting, or service disruptions. Verify your API and application keys are valid, check whether you're exceeding API rate limits, and implement retry logic for critical API operations.
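In practice, retry logic for Datadog API calls usually means backing off on HTTP 429 responses and transient 5xx errors, honoring a rate-limit reset header when one is returned. A minimal sketch of that pattern, not tied to any particular endpoint:

```python
import time
import requests

def call_with_retries(method, url, max_attempts=5, **kwargs):
    """Call an HTTP API, backing off on 429s and transient 5xx errors."""
    for attempt in range(1, max_attempts + 1):
        resp = requests.request(method, url, timeout=10, **kwargs)
        if resp.status_code not in (429, 500, 502, 503, 504):
            return resp
        # Prefer a server-provided reset hint if present, otherwise use
        # exponential backoff (2, 4, 8, ... seconds).
        wait = resp.headers.get("X-RateLimit-Reset") or (2 ** attempt)
        time.sleep(float(wait))
    return resp

# Hypothetical usage: validate an API key with retries.
# resp = call_with_retries("GET", "https://api.datadoghq.com/api/v1/validate",
#                          headers={"DD-API-KEY": "..."})
```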
Why are my Synthetic tests failing?
Synthetic monitoring failures may be caused by endpoint availability, test configuration, or service disruptions. Verify the monitored endpoints are accessible, check your test configuration for unrealistic expectations, and examine whether failures correlate with specific test locations or time periods.
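Reproducing the request outside Datadog helps separate an unreachable endpoint from an over-strict assertion. A rough sketch that checks the status code and response time for a hypothetical URL:

```python
import requests

# Reproduce a simple HTTP Synthetic check by hand: status code plus latency.
url = "https://example.com/health"  # hypothetical endpoint under test
timeout_seconds = 10

try:
    resp = requests.get(url, timeout=timeout_seconds)
    elapsed_ms = resp.elapsed.total_seconds() * 1000
    print(f"{resp.status_code} in {elapsed_ms:.0f} ms")
    # If this succeeds while the Synthetic test fails, revisit the test's
    # assertions (expected status, body content, response-time threshold)
    # and whether only specific test locations are affected.
except requests.RequestException as exc:
    print(f"Request failed: {exc}")
```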
Why am I not seeing APM traces?
APM tracing issues can stem from instrumentation problems, sampling configuration, or service disruptions. Verify your application is properly instrumented for your language and framework, check that your sampling rates aren't filtering out too much data, and ensure your application can reach Datadog's trace intake endpoints.
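For a Python service, one quick way to confirm instrumentation is active is to wrap a function with the ddtrace tracer and watch for the resulting trace in APM. A sketch assuming the ddtrace package is installed and a Datadog agent is reachable on its default trace port:

```python
from ddtrace import patch_all, tracer

# Auto-instrument supported libraries (requests, Flask, psycopg2, ...).
patch_all()

@tracer.wrap(service="trace-smoke-test", resource="ping")
def ping():
    # Work done here is recorded as a span and sent to the local Datadog
    # agent, which forwards it to the trace intake.
    return "pong"

if __name__ == "__main__":
    print(ping())
    # If no trace shows up in APM, check the agent's trace-agent logs and
    # that DD_AGENT_HOST / DD_TRACE_AGENT_PORT point at a reachable agent.
```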
About Datadog
Datadog is a monitoring and security platform for cloud applications that provides comprehensive visibility into infrastructure, application performance, logs, and user experience. The service collects, processes, and visualizes telemetry data from servers, containers, databases, and cloud services, enabling teams to detect anomalies, troubleshoot issues, and understand system behavior through correlated metrics, traces, and logs in a unified platform.
DevOps engineers use Datadog to monitor infrastructure health across hybrid and multi-cloud environments, tracking resource utilization and operational metrics while setting up automated alerts for potential issues. Application developers implement Datadog APM (Application Performance Monitoring) to identify bottlenecks, trace requests across microservices, and optimize code performance in production. Security teams leverage Datadog's security monitoring capabilities to detect threats, track compliance, and monitor sensitive data access patterns, while business stakeholders utilize dashboards and reports to understand service level indicators and technical performance in relation to business outcomes.
Users may run into several kinds of issues when using Datadog: delayed data visualization during ingestion of exceptionally high-cardinality metrics, occasional agent communication interruptions that require reconnection, or brief alerting delays during major incident storms. Custom dashboards can take longer to load when they display many widgets or highly complex visualizations. API rate limiting may affect automated operations that make frequent requests, and log indexing can fall behind during sudden, massive volume spikes. During scheduled platform updates, users might notice slightly higher latency for query operations, temporary limitations on certain visualization features, or brief delays in notification delivery through integrated third-party services.