Solution Document for Log Optimization & Long-Term Archival on Azure Kubernetes Service (AKS)

(Datadog Logs + Azure Blob via Terraform)

1. Purpose

This SOP defines the implementation of a scalable, secure, and cost-optimized logging architecture for applications running on Azure Kubernetes Service (AKS).

The solution integrates:

  • Datadog for real-time monitoring and alerting
  • Azure Blob Storage for long-term log retention
  • Terraform for Infrastructure-as-Code automation

The objectives are:

  • Reduce Datadog log indexing cost
  • Preserve critical observability
  • Enable long-term retention
  • Ensure secure authentication
  • Provide scalable enterprise architecture

2. Scope

This SOP applies to:

  • AKS workloads
  • Datadog log ingestion & indexing
  • Log filtering & exclusion policies
  • Azure Blob archival integration
  • Terraform-based infrastructure provisioning

3. Architecture Overview

Phase 1 – Optimized Real-Time Monitoring

AKS → Datadog (Optimized 15-Day Indexed Logs)

Phase 2 – Long-Term Retention

AKS → Datadog (Real-Time Monitoring – 15 Days)
         ↓
        Azure Blob (Long-Term Archive – 6–12+ Months)

4. Phase 1 – Log Ingestion Optimization

4.1 Objective

Reduce indexed log volume while retaining full visibility of critical logs required for monitoring and incident response.

4.2 Log Analysis Summary

Observed log distribution:

  • High volume: info, debug
  • Low volume but critical: error, warn

Because the 15-day retention period is fixed under contract, optimization focused on controlling which logs are indexed rather than how long they are kept.

4.3 Index Configuration

Step 1 – Create Critical Logs Index

  • Index Name: critical-logs
  • Filter: status:error OR status:warn
  • Retention: 15 days

Purpose:

  • Retain 100% of actionable logs
  • Maintain alerting and incident response integrity

Step 2 – Modify Main Index

  • Index Name: main
  • Filter: -status:error -status:warn

Purpose:

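Keep non-critical logs separate from critical logs. Datadog routes each log to the first index (in index order) whose filter it matches, so critical-logs should be ordered above main.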
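The routing rule can be sketched in Python as a minimal model. This is not Datadog code: each index filter is represented as a predicate on the log status, and the Datadog query strings are kept only for reference.

```python
# Minimal model of the Phase 1 index layout and first-match routing.
# Assumption: filters are modelled as Python predicates on the log status;
# the query strings mirror the ones configured in Datadog.
from dataclasses import dataclass
from typing import Callable, Optional

CRITICAL_STATUSES = {"error", "warn"}


@dataclass(frozen=True)
class Index:
    name: str
    query: str                      # filter query as entered in Datadog
    matches: Callable[[str], bool]  # predicate equivalent of `query`
    retention_days: int = 15


# Order matters: each log goes to the FIRST index whose filter matches.
INDEXES = [
    Index("critical-logs", "status:error OR status:warn",
          lambda status: status in CRITICAL_STATUSES),
    Index("main", "-status:error -status:warn",
          lambda status: status not in CRITICAL_STATUSES),
]


def route(status: str) -> Optional[str]:
    """Return the name of the index a log with this status lands in."""
    for index in INDEXES:
        if index.matches(status):
            return index.name
    return None  # matched no index, so the log is not indexed


if __name__ == "__main__":
    for status in ("error", "warn", "info", "debug"):
        print(f"{status:>5} -> {route(status)}")
```

Because the two filters are mutually exclusive, their order has no effect today; it becomes decisive if the main filter is later widened, for example to `*`.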

4.4 Exclusion Policies

Exclusion filters on the main index:

Log Type        Exclusion Policy
status:debug    100% excluded
status:info     80% excluded

Effect:

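  • Debug logs are no longer indexed
  • Only 20% of info logs are indexed
  • All critical logs remain indexed in critical-logs
  • Excluded logs are still ingested, so they remain available for archiving in Phase 2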
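The exclusion policy can be expressed as the request body for Datadog's v1 index update API (`PUT /api/v1/logs/config/indexes/main`). This is a sketch: the filter names are hypothetical, and the field names should be confirmed against the Datadog API reference or against the `exclusion_filter` blocks of the Terraform `datadog_logs_index` resource.

```python
# Sketch of the main index's exclusion filters as an index-update request body.
# Assumption: filter names ("exclude-debug", "exclude-info") are hypothetical.
import json

# In Datadog, sample_rate is the FRACTION OF MATCHING LOGS EXCLUDED
# (1.0 drops every matching log, 0.8 drops 80% of them).
EXCLUSION_POLICY = {"debug": 1.0, "info": 0.8}


def main_index_body(retention_days: int = 15) -> dict:
    """Build the request body for the main index with both exclusion filters."""
    return {
        "filter": {"query": "-status:error -status:warn"},
        "num_retention_days": retention_days,
        "exclusion_filters": [
            {
                "name": f"exclude-{status}",
                "is_enabled": True,
                "filter": {"query": f"status:{status}", "sample_rate": rate},
            }
            for status, rate in EXCLUSION_POLICY.items()
        ],
    }


if __name__ == "__main__":
    print(json.dumps(main_index_body(), indent=2))
```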

4.5 Validation Procedure

Navigate to:

  • Logs → Explorer (Confirm correct index separation)
  • Logs → Usage → Indexed Logs Volume (Compare before and after optimization)

Expected Result:

~70–80% reduction in indexed log volume; the exact figure depends on the share of debug and info logs in the original mix.
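The expected reduction follows from the policy table: for a log mix with debug share d and info share i (by event count), the fraction of events no longer indexed is d + 0.8·i. The sketch below evaluates this for a hypothetical mix; the shares are illustrative, not measured values.

```python
# Worked estimate of the indexed-volume reduction from the Phase 1 policy.
# Assumption: the log mix in __main__ is HYPOTHETICAL; substitute the shares
# measured in Logs -> Usage before the change.
import math

# Fraction of each status that is excluded from indexing.
EXCLUDED_FRACTION = {"debug": 1.0, "info": 0.8, "warn": 0.0, "error": 0.0}


def indexed_reduction(mix: dict) -> float:
    """Return the fraction of previously indexed logs that are no longer indexed.

    `mix` maps status -> share of the total log count; shares must sum to 1.
    """
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("log shares must sum to 1")
    return sum(share * EXCLUDED_FRACTION.get(status, 0.0)
               for status, share in mix.items())


if __name__ == "__main__":
    hypothetical_mix = {"debug": 0.40, "info": 0.50, "warn": 0.07, "error": 0.03}
    print(f"{indexed_reduction(hypothetical_mix):.0%}")  # prints 80%
```

The reduction can never exceed d + i, so the 70–80% target is only reachable when debug and info logs dominate the volume; the before-and-after usage comparison remains the authoritative check.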

5. Phase 2 – Azure Blob Long-Term Archival

5.1 Objective

Enable cost-efficient long-term log retention beyond Datadog’s 15-day indexed retention.

5.2 Infrastructure Provisioning (Terraform)

All Phase 2 infrastructure is provisioned with Terraform.

Repository Reference:
airowireNetworks/logs.git

Terraform provisions:

  • Resource Group
  • Azure Storage Account
  • Private Blob Container
  • Lifecycle policies (optional)
  • Secure authentication configuration
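For the optional lifecycle policy, the rule below is a sketch in the JSON schema Azure uses for storage lifecycle management policies; in Terraform the same rule would typically be expressed through the `azurerm_storage_management_policy` resource. The container name, rule name, and day thresholds are hypothetical and would need to match the agreed 6–12+ month retention.

```python
# Sketch of an Azure Storage lifecycle management policy for the archive container.
# Assumptions: container name, rule name, and day thresholds are HYPOTHETICAL.
import json


def lifecycle_policy(container: str, cool_after_days: int = 30,
                     delete_after_days: int = 365) -> dict:
    """Move archived logs to the Cool tier, then delete them after retention."""
    if not 0 < cool_after_days < delete_after_days:
        raise ValueError("cool_after_days must be positive and below delete_after_days")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "datadogArchiveRetention",
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Each prefix must start with the container name.
                        "prefixMatch": [f"{container}/"],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_policy("datadog-archive"), indent=2))
```

The sketch stops at the Cool tier; before adding a move to the Archive tier, check whether logs stored there can still be rehydrated into Datadog when needed.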

5.3 Secure Authentication Model

Authentication uses an Azure AD service principal, identified by:

  • Client ID
  • Tenant ID
  • Client Secret

Advantages:

  • Enterprise-grade security
  • Centralized identity management
  • Easier secret rotation
  • Least-privilege RBAC enforcement
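A quick pre-flight check of the service principal values, before they are entered in Datadog, can be sketched as follows. It assumes the values are supplied through the `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` environment variables (the names read by azure-identity's `EnvironmentCredential`); it validates format only and does not contact Azure.

```python
# Format-only pre-flight check for the service principal values.
# Assumption: values come from the environment variables named in REQUIRED.
import os
import uuid
from typing import List, Mapping

REQUIRED = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def check_service_principal(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = [f"{name} is missing" for name in REQUIRED if not env.get(name)]
    for name in ("AZURE_TENANT_ID", "AZURE_CLIENT_ID"):
        value = env.get(name)
        if value:
            try:
                uuid.UUID(value)  # tenant and client IDs are GUIDs
            except ValueError:
                problems.append(f"{name} is not a GUID")
    return problems  # the secret itself is never echoed


if __name__ == "__main__":
    issues = check_service_principal(os.environ)
    print("OK" if not issues else "\n".join(issues))
```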

5.4 RBAC Configuration

Control Plane Roles (Subscription Level)

  • Reader
  • Monitoring Reader

Data Plane Role (Container Level)

  • Storage Blob Data Contributor

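The data-plane role resolved earlier authorization failures (HTTP 403) and grants write access to the archive container. Control-plane roles such as Reader do not include access to blob data, which is why the data-plane role is required.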
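The role assignments above can be sketched as (role, scope) pairs built from standard Azure Resource Manager IDs. The subscription ID, resource group, storage account, and container names are hypothetical placeholders; in Terraform, each pair would map to one `azurerm_role_assignment` (`scope`, `role_definition_name`, `principal_id`).

```python
# Sketch of the 5.4 role assignments as (role, scope) pairs.
# Assumption: all resource names passed in are HYPOTHETICAL placeholders.
from typing import List, Tuple


def subscription_scope(subscription_id: str) -> str:
    return f"/subscriptions/{subscription_id}"


def container_scope(subscription_id: str, resource_group: str,
                    account: str, container: str) -> str:
    return (f"{subscription_scope(subscription_id)}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Storage/storageAccounts/{account}"
            f"/blobServices/default/containers/{container}")


def role_assignments(subscription_id: str, resource_group: str,
                     account: str, container: str) -> List[Tuple[str, str]]:
    sub = subscription_scope(subscription_id)
    return [
        ("Reader", sub),             # control plane
        ("Monitoring Reader", sub),  # control plane
        # Data plane: the only role in this list that can write blobs.
        ("Storage Blob Data Contributor",
         container_scope(subscription_id, resource_group, account, container)),
    ]


if __name__ == "__main__":
    for role, scope in role_assignments("<subscription-id>", "rg-logs",
                                        "stlogsarchive", "datadog-archive"):
        print(f"{role:<32} {scope}")
```

Scoping the data-plane role to the container, rather than the storage account, keeps write access limited to the archive target.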

5.5 Datadog Azure Integration

Configured within Datadog:

  • Installed the Azure integration
  • Entered the service principal's Tenant ID, Client ID, and Client Secret
  • Entered the Subscription ID

5.6 Archive Configuration

Location:

Logs → Configuration → Archiving & Forwarding

Configured:

  • Destination: Azure Storage
  • Authentication: Azure AD
  • Target container
  • Archive status: Active
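The same archive can be sketched as a request body for Datadog's v2 Logs Archives API (`POST /api/v2/logs/config/archives`); the Terraform `datadog_logs_archive` resource with an `azure_archive` block is the IaC equivalent. The archive name, storage account, container, and path are hypothetical, and the field names should be checked against the current Datadog API reference.

```python
# Sketch of the 5.6 archive as a Logs Archives API request body.
# Assumptions: archive name, storage account, container, and path are HYPOTHETICAL.
import json


def azure_archive_body(tenant_id: str, client_id: str, storage_account: str,
                       container: str, path: str = "/aks") -> dict:
    return {
        "data": {
            "type": "archives",
            "attributes": {
                "name": "aks-long-term-archive",
                "query": "*",  # match every ingested log, indexed or not
                "destination": {
                    "type": "azure",
                    "storage_account": storage_account,
                    "container": container,
                    "path": path,
                    # Refers to the service principal from 5.3; the client
                    # secret is held by the Azure integration, not sent here.
                    "integration": {"tenant_id": tenant_id, "client_id": client_id},
                },
                "include_tags": False,
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(azure_archive_body("<tenant-id>", "<client-id>",
                                        "stlogsarchive", "datadog-archive"), indent=2))
```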

5.7 Archive Validation

Validation steps:

  • Confirm archive status is Active in Datadog
  • Verify log objects appear in Azure Blob container
  • Monitor storage growth and access logs
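To check that blobs keep arriving, a listing of the container can be scanned for hours with no archive file. This sketch assumes blob names follow the date/hour layout Datadog documents for archives (`<path>/dt=YYYYMMDD/hour=HH/archive_...json.gz`) and that hours are UTC; obtaining the listing (for example with `az storage blob list`) is out of scope, and the sample names are hypothetical.

```python
# Sketch of an hourly completeness check over archive blob names.
# Assumption: names contain "dt=YYYYMMDD/hour=HH/" partitions in UTC.
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Set

# Matches the date/hour partition in an archive blob name.
HOUR_KEY = re.compile(r"dt=(\d{8})/hour=(\d{2})/")


def archived_hours(blob_names: Iterable[str]) -> Set[datetime]:
    """Collect the (naive, UTC) hours that have at least one archive blob."""
    hours = set()
    for name in blob_names:
        match = HOUR_KEY.search(name)
        if match:
            day, hour = match.group(1), match.group(2)
            hours.add(datetime(int(day[:4]), int(day[4:6]), int(day[6:]), int(hour)))
    return hours


def missing_hours(blob_names: Iterable[str], start: datetime, end: datetime) -> List[datetime]:
    """Return every hour in [start, end) for which no archive blob exists."""
    seen = archived_hours(blob_names)
    gaps = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        if hour not in seen:
            gaps.append(hour)
        hour += timedelta(hours=1)
    return gaps


if __name__ == "__main__":
    sample = [
        "aks/dt=20240501/hour=00/archive_000512.1234.example.json.gz",
        "aks/dt=20240501/hour=02/archive_020101.5678.example.json.gz",
    ]
    print(missing_hours(sample, datetime(2024, 5, 1, 0), datetime(2024, 5, 1, 3)))
```

An empty hour can be legitimate during quiet periods, so a gap is a prompt to confirm the archive status in Datadog rather than proof of a failure.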

6. Security Controls

Layer           Control
Storage         Private container
Authentication  Azure AD Service Principal
Access          RBAC (Least Privilege)
Transport       HTTPS enforced
Infrastructure  Terraform-managed

7. Operational Responsibilities

Role            Responsibility
DevOps          Maintain Terraform & Datadog configuration
Platform Team   Monitor ingestion & archive health
Security        Review RBAC & identity controls
Operations      Validate log availability

8. Outcomes Achieved

  • 70–80% reduction in indexed log volume
  • 100% retention of critical logs
  • Secure Azure AD authentication
  • Long-term archival capability
  • Compliance readiness
  • Infrastructure-as-Code governance
  • Scalable enterprise architecture

9. Future Enhancement (Phase 3 – Optional)

Vector-based log routing may be introduced if:

  • Log volume increases significantly
  • Further ingestion reduction is required
  • Multi-destination routing becomes necessary
  • Advanced transformation is needed

Phase 3 is optional and will be evaluated based on scaling needs.

10. Conclusion

The implemented two-phase logging strategy delivers:

  • Cost optimization
  • Preserved observability
  • Secure long-term retention
  • Infrastructure automation
  • Enterprise-ready architecture

This approach balances operational simplicity with scalability and financial efficiency.

Contact

Patrick Schmidt — patrick@airowire.com
Piyush Choudhary — piyush@airowire.com
Dr. Shivanand Poojara — shivanand@airowire.com