Solution Document for Log Optimization & Long-Term Archival on Azure Kubernetes Service (AKS)
(Datadog Logs + Azure Blob via Terraform)
1. Purpose
This SOP defines the implementation of a scalable, secure, and cost-optimized logging architecture for applications running on Azure Kubernetes Service (AKS).
The solution integrates:
- Datadog for real-time monitoring and alerting
- Azure Blob Storage for long-term log retention
- Terraform for Infrastructure-as-Code automation
The objectives are:
- Reduce log indexing cost
- Preserve critical observability
- Enable long-term retention
- Ensure secure authentication
- Provide scalable enterprise architecture
2. Scope
This SOP applies to:
- AKS workloads
- Datadog log ingestion & indexing
- Log filtering & exclusion policies
- Azure Blob archival integration
- Terraform-based infrastructure provisioning
3. Architecture Overview
Phase 1 – Optimized Real-Time Monitoring
AKS → Datadog (Optimized 15-Day Indexed Logs)
Phase 2 – Long-Term Retention
AKS → Datadog (Real-Time Monitoring – 15 Days)
↓
Azure Blob (Long-Term Archive – 6–12+ Months)
4. Phase 1 – Log Ingestion Optimization
4.1 Objective
Reduce indexed log volume while retaining full visibility of critical logs required for monitoring and incident response.
4.2 Log Analysis Summary
Observed log distribution:
- High volume: info, debug
- Low volume but critical: error, warn
Optimization focused on index-level controls (index filters and exclusion policies), since the 15-day retention period is fixed under contract and cannot be shortened to save cost.
4.3 Index Configuration
Step 1 – Create Critical Logs Index

- Index Name: critical-logs
- Filter: status:error OR status:warn
- Retention: 15 days
Purpose:
- Retain 100% of actionable logs
- Maintain alerting and incident response integrity
Step 2 – Modify Main Index
- Index Name: main
- Filter: -status:error -status:warn

Purpose:
Keep non-critical logs in their own index so that exclusion policies can be applied to them without affecting critical logs.
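The index separation in Steps 1 and 2 can be sketched as a small routing function. This is an illustrative model only, not Datadog's implementation: it assumes indexes are evaluated in order and a log lands in the first index whose filter matches. The names `INDEXES` and `route_to_index` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

CRITICAL_STATUSES = {"error", "warn"}

# Ordered list of (index name, filter). Each filter mirrors the query
# configured in Steps 1 and 2; the Python predicates are an approximation.
INDEXES: List[Tuple[str, Callable[[dict], bool]]] = [
    # status:error OR status:warn
    ("critical-logs", lambda log: log.get("status") in CRITICAL_STATUSES),
    # -status:error -status:warn
    ("main", lambda log: log.get("status") not in CRITICAL_STATUSES),
]


def route_to_index(log: dict) -> Optional[str]:
    """Return the name of the first index whose filter matches the log."""
    for name, matches in INDEXES:
        if matches(log):
            return name
    return None
```

Because the two filters are exact complements, every log matches exactly one index, including logs with no status attribute, which fall through to main.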
4.4 Exclusion Policies
| Log Type | Policy |
|---|---|
| status:debug | 100% excluded |
| status:info | 80% excluded |
Effect:
- Debug logs fully removed from indexing
- Only 20% of info logs retained
- All critical logs preserved
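As a rough worked example of the effect of these policies, the sketch below applies the exclusion percentages from the table to a set of per-status log counts. The daily counts used in the usage note are hypothetical and chosen only for illustration; `EXCLUDED_PERCENT` and `indexed_counts` are names introduced here.

```python
from typing import Dict

# Exclusion percentages from the policy table. Statuses not listed
# (error, warn) have no exclusion and are fully indexed.
EXCLUDED_PERCENT: Dict[str, int] = {"debug": 100, "info": 80}


def indexed_counts(counts: Dict[str, int]) -> Dict[str, int]:
    """Return the expected number of logs indexed per status."""
    return {
        status: n * (100 - EXCLUDED_PERCENT.get(status, 0)) // 100
        for status, n in counts.items()
    }
```

With hypothetical daily counts of 300,000 debug, 600,000 info, 60,000 warn, and 40,000 error logs, this yields 0, 120,000, 60,000, and 40,000 indexed logs respectively: 220,000 of the original 1,000,000.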
4.5 Validation Procedure
Navigate to:
- Logs → Explorer (Confirm correct index separation)
- Logs → Usage → Indexed Logs Volume (Compare before and after optimization)
Expected Result:
~70–80% reduction in indexed log volume.
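The before/after comparison can be reduced to a simple percentage check. The following is a minimal sketch, assuming the two volumes are read manually from the usage page; the function names and the sample numbers in the usage note are hypothetical.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage drop in indexed log volume between two measurements."""
    if before <= 0:
        raise ValueError("'before' volume must be positive")
    return (before - after) * 100 / before


def meets_target(before: int, after: int,
                 low: float = 70.0, high: float = 80.0) -> bool:
    """True when the reduction falls inside the expected 70-80% band."""
    return low <= reduction_percent(before, after) <= high
```

For example, a drop from 1,000,000 to 220,000 indexed logs per day gives `reduction_percent(1_000_000, 220_000) == 78.0`, which `meets_target` accepts. A result outside the band suggests the log mix differs from the one observed in 4.2 and the exclusion percentages may need review.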
5. Phase 2 – Azure Blob Long-Term Archival
5.1 Objective
Enable cost-efficient long-term log retention beyond Datadog’s 15-day indexed retention.
5.2 Infrastructure Provisioning (Terraform)
All Azure resources for the archive are provisioned with Terraform.
Repository Reference:
airowireNetworks/logs.git
Terraform provisions:
- Resource Group
- Azure Storage Account
- Private Blob Container
- Lifecycle policies (optional)
- Secure authentication configuration
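For the optional lifecycle policies, the sketch below builds a policy document in the JSON shape used by Azure Storage lifecycle management. It is an illustration under assumptions, not the content of the repository: the container name, rule name, and day thresholds are hypothetical, and in Terraform the equivalent would typically be expressed with an `azurerm_storage_management_policy` resource.

```python
import json
from typing import Optional


def build_lifecycle_policy(container: str,
                           cool_after_days: int = 30,
                           delete_after_days: Optional[int] = None) -> dict:
    """Build a lifecycle policy that tiers archived logs to Cool storage.

    Deletion is opt-in so that retention is never shortened by default.
    """
    actions = {
        "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
    }
    if delete_after_days is not None:
        actions["delete"] = {
            "daysAfterModificationGreaterThan": delete_after_days
        }
    return {
        "rules": [{
            "enabled": True,
            "name": "log-archive-tiering",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "actions": {"baseBlob": actions},
                "filters": {
                    "blobTypes": ["blockBlob"],
                    # Prefixes start with the container name.
                    "prefixMatch": [container + "/"],
                },
            },
        }]
    }


if __name__ == "__main__":
    # "datadog-logs" is a placeholder container name.
    print(json.dumps(build_lifecycle_policy("datadog-logs", 30, 365), indent=2))
```

The sketch deliberately stops at the Cool tier. Moving blobs to the Archive tier is cheaper, but it can complicate reading the logs back later, so that choice should be checked against rehydration requirements before it is added.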
5.3 Secure Authentication Model
Authentication uses an Azure AD service principal.
The following credentials were created for it:
- Client ID
- Tenant ID
- Client Secret
Advantages:
- Enterprise-grade security
- Centralized identity management
- Easier secret rotation
- Least-privilege RBAC enforcement
5.4 RBAC Configuration
Control Plane Roles (Subscription Level)
- Reader
- Monitoring Reader
Data Plane Role (Container Level)
- Storage Blob Data Contributor
This role configuration resolved earlier authorization failures (e.g., 403 errors) and gives the service principal secure write access to the archive container.
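A least-privilege review of these assignments can be sketched as a comparison between the expected roles and the roles actually assigned. This is a minimal offline sketch: it assumes the actual assignments have already been exported as (role, scope level) pairs, and `EXPECTED_ROLES` and `audit_roles` are names introduced here.

```python
from typing import Iterable, Set, Tuple

RoleAssignment = Tuple[str, str]  # (role name, scope level)

# Roles listed in this section.
EXPECTED_ROLES: Set[RoleAssignment] = {
    ("Reader", "subscription"),
    ("Monitoring Reader", "subscription"),
    ("Storage Blob Data Contributor", "container"),
}


def audit_roles(actual: Iterable[RoleAssignment]
                ) -> Tuple[Set[RoleAssignment], Set[RoleAssignment]]:
    """Return (missing, unexpected) role assignments.

    Missing roles are a likely cause of 403 errors; unexpected roles
    indicate the service principal has more access than intended.
    """
    actual_set = set(actual)
    return EXPECTED_ROLES - actual_set, actual_set - EXPECTED_ROLES
```

An empty result on both sides means the service principal holds exactly the roles documented above. A broader role such as Contributor at subscription level would appear in the unexpected set.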
5.5 Datadog Azure Integration
Configured within Datadog:
- Installed the Azure integration
- Provided the Tenant ID, Client ID, Client Secret, and Subscription ID
5.6 Archive Configuration
Location:
Logs → Configuration → Archiving & Forwarding
Configured:
- Destination: Azure Storage
- Authentication: Azure AD
- Target container
- Archive status: Active
5.7 Archive Validation
Validation steps:
- Confirm archive status is Active in Datadog
- Verify log objects appear in Azure Blob container
- Monitor storage growth and access logs
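The second validation step can be partly automated by checking how recent the newest archive object is. The sketch below works on a list of blob names obtained by any means (portal export, CLI, or SDK). It assumes archive objects are written under a `dt=YYYYMMDD/hour=HH/` path and that those timestamps are UTC; both assumptions should be confirmed against the actual container before relying on the check. The function names and the two-hour threshold are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Assumed archive path layout: .../dt=20240515/hour=14/archive_....json.gz
ARCHIVE_PATH = re.compile(r"dt=(\d{8})/hour=(\d{2})/")


def latest_archive_hour(blob_names: Iterable[str]) -> Optional[datetime]:
    """Return the most recent dt/hour found in the blob names, if any."""
    newest = None
    for name in blob_names:
        match = ARCHIVE_PATH.search(name)
        if not match:
            continue  # not an archive object
        try:
            stamp = datetime.strptime(
                match.group(1) + match.group(2), "%Y%m%d%H"
            ).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # digits present but not a valid date or hour
        if newest is None or stamp > newest:
            newest = stamp
    return newest


def archive_is_fresh(blob_names: Iterable[str], now: datetime,
                     max_lag: timedelta = timedelta(hours=2)) -> bool:
    """True when the newest archive object is no older than max_lag."""
    newest = latest_archive_hour(blob_names)
    return newest is not None and now - newest <= max_lag
```

An empty container or a stale newest object both return False, which is the signal to recheck the archive status in Datadog and the RBAC assignments in 5.4.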

6. Security Controls
| Layer | Control |
|---|---|
| Storage | Private container |
| Authentication | Azure AD Service Principal |
| Access | RBAC (Least Privilege) |
| Transport | HTTPS enforced |
| Infrastructure | Terraform-managed |
7. Operational Responsibilities
| Role | Responsibility |
|---|---|
| DevOps | Maintain Terraform & Datadog configuration |
| Platform Team | Monitor ingestion & archive health |
| Security | Review RBAC & identity controls |
| Operations | Validate log availability |
8. Outcomes Achieved
- 70–80% reduction in indexed log volume
- 100% retention of critical logs
- Secure Azure AD authentication
- Long-term archival capability
- Compliance readiness
- Infrastructure-as-Code governance
- Scalable enterprise architecture
9. Future Enhancement (Phase 3 – Optional)
Vector-based log routing may be introduced if:
- Log volume increases significantly
- Further ingestion reduction is required
- Multi-destination routing becomes necessary
- Advanced transformation is needed
Phase 3 is optional and will be evaluated based on scaling needs.
10. Conclusion
The implemented two-phase logging strategy delivers:
- Cost optimization
- Preserved observability
- Secure long-term retention
- Infrastructure automation
- Enterprise-ready architecture
This approach balances operational simplicity with scalability and financial efficiency.
Contact
Patrick Schmidt — patrick@airowire.com
Piyush Choudhary — piyush@airowire.com
Dr. Shivanand Poojara — shivanand@airowire.com