
Azure Monitor Modules

Terraform modules for Azure Monitor alerting and notifications.

Module Overview

| Module | Purpose | Key Parameters |
|---|---|---|
| action_group | Notification routing | email_addresses, display_name |
| metric_alert | Resource metric monitoring | scopes, criteria (including threshold) |
| log_alert | KQL query alerts | query, data_source_id, frequency |
| query_alert | Advanced App Insights queries | appinsights, name |
| smart_detector_alert | ML-based anomaly detection | appinsights, name |

Quick Examples

Action Group

module "ops_alerts" {
  source = "../../modules/azure_monitor/action_group"

  name         = "OpsTeamAlerts"
  display_name = "Ops"  # Max 12 characters

  rg = {
    name     = "MyResourceGroup"
    location = "westus2"
  }

  email_addresses = {
    "ops"    = "ops@example.com"
    "oncall" = "oncall@example.com"
  }

  tags = { Environment = "Production" }
}

Metric Alert

module "cpu_alert" {
  source = "../../modules/azure_monitor/metric_alert"

  name        = "HighCPU"
  description = "Alert when CPU exceeds 80%"

  rg       = { name = "MyResourceGroup", location = "westus2" }
  scopes   = [azurerm_service_plan.main.id]  # CpuPercentage is an App Service plan metric
  severity = 2  # 0=Critical, 1=Error, 2=Warning, 3=Info, 4=Verbose

  criteria = [{
    metric_namespace = "Microsoft.Web/serverfarms"
    metric_name      = "CpuPercentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }]

  frequency        = "PT5M"   # Evaluation frequency (ISO 8601)
  window_size      = "PT15M"  # Time window
  action_group_ids = [module.ops_alerts.id]
  auto_mitigate    = true
}

Common App Service Metrics: CpuPercentage and MemoryPercentage (App Service plan, Microsoft.Web/serverfarms); Http5xx and HttpResponseTime (web app, Microsoft.Web/sites)
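Count-style metrics such as Http5xx are emitted by the web app itself, not by its plan. Because of that, the scope and namespace differ from the CPU example. The following is a sketch that assumes a web app declared as `azurerm_linux_web_app.main` (hypothetical; a Windows app would use `azurerm_windows_web_app`) and reuses the `ops_alerts` action group:

```hcl
module "http5xx_alert" {
  source = "../../modules/azure_monitor/metric_alert"

  name        = "Http5xxErrors"
  description = "Alert when the web app returns more than 10 HTTP 5xx responses in 15 minutes"

  rg       = { name = "MyResourceGroup", location = "westus2" }
  scopes   = [azurerm_linux_web_app.main.id] # Hypothetical web app resource
  severity = 1                               # Error

  criteria = [{
    metric_namespace = "Microsoft.Web/sites"
    metric_name      = "Http5xx"
    aggregation      = "Total" # Sum the 5xx count over the window
    operator         = "GreaterThan"
    threshold        = 10
  }]

  frequency        = "PT5M"
  window_size      = "PT15M"
  action_group_ids = [module.ops_alerts.id]
}
```

`Total` is used here rather than `Average` because Http5xx counts responses. Summing over the window makes the threshold read as "more than 10 errors in 15 minutes".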

Common SQL Metrics (Microsoft.Sql/servers/databases): cpu_percent, dtu_consumption_percent (DTU-based tiers only), storage_percent, deadlock
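The same `metric_alert` module should work for SQL databases, given the right namespace. Here is a sketch that watches `storage_percent`, assuming a database declared as `azurerm_mssql_database.main` (hypothetical) and reusing the `ops_alerts` action group:

```hcl
module "sql_storage_alert" {
  source = "../../modules/azure_monitor/metric_alert"

  name        = "SqlStorageHigh"
  description = "Alert when database storage use exceeds 90%"

  rg       = { name = "MyResourceGroup", location = "westus2" }
  scopes   = [azurerm_mssql_database.main.id] # Hypothetical database resource
  severity = 2                                 # Warning

  criteria = [{
    metric_namespace = "Microsoft.Sql/servers/databases"
    metric_name      = "storage_percent"
    aggregation      = "Maximum"
    operator         = "GreaterThan"
    threshold        = 90
  }]

  # Storage grows slowly, so a less frequent check over a longer window
  # trades little responsiveness for fewer evaluations.
  frequency        = "PT15M"
  window_size      = "PT1H"
  action_group_ids = [module.ops_alerts.id]
}
```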

Log Alert

module "error_alert" {
  source = "../../modules/azure_monitor/log_alert"

  name           = "HighErrorRate"
  description    = "Alert on elevated error rate"
  rg             = { name = "MyResourceGroup", location = "westus2" }
  data_source_id = azurerm_log_analytics_workspace.main.id

  query = <<-QUERY
    AppTraces
    | where SeverityLevel >= 3
    | summarize ErrorCount = count() by bin(TimeGenerated, 5m)
    | where ErrorCount > 10
  QUERY

  frequency        = 5   # Minutes
  time_window      = 15  # Minutes
  severity         = 2
  threshold        = 0
  action_group_ids = [module.ops_alerts.id]
}
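The query above follows a pattern: filter, bucket with `bin()`, then keep only the buckets that breach a limit. With `threshold = 0`, any returned row means the alert fires. The same pattern should carry over to other Application Insights tables. The sketch below assumes requests land in the `AppRequests` table of the same workspace and that the module passes `query` and `threshold` through unchanged. The module name and the limit of 20 are illustrative.

```hcl
module "failed_requests_alert" {
  source = "../../modules/azure_monitor/log_alert"

  name           = "HighFailedRequests"
  description    = "Alert on bursts of failed requests"
  rg             = { name = "MyResourceGroup", location = "westus2" }
  data_source_id = azurerm_log_analytics_workspace.main.id

  # Keep only 5-minute buckets with more than 20 failed requests;
  # any returned row means at least one bucket crossed the limit.
  query = <<-QUERY
    AppRequests
    | where Success == false
    | summarize FailedCount = count() by bin(TimeGenerated, 5m)
    | where FailedCount > 20
  QUERY

  frequency        = 5  # Minutes
  time_window      = 15 # Minutes
  severity         = 1
  threshold        = 0
  action_group_ids = [module.ops_alerts.id]
}
```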

Smart Detector Alert

module "anomaly_detector" {
  source = "../../modules/azure_monitor/smart_detector_alert"

  name = "FailureAnomalies"
  rg   = { name = "MyResourceGroup", location = "westus2" }

  appinsights = {
    name                = azurerm_application_insights.main.name
    connection_string   = azurerm_application_insights.main.connection_string
    instrumentation_key = azurerm_application_insights.main.instrumentation_key
  }

  tags = { Environment = "Production" }
}

Built-in Detectors: Failure Anomalies, Performance Degradation, Memory Leak, Exception Anomalies

Note: ML models need 24-48 hours to establish baselines.
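The examples reference `azurerm_application_insights.main` and `azurerm_log_analytics_workspace.main` without declaring them. One possible declaration is sketched below; the names are hypothetical. It makes Application Insights workspace-based, so telemetry tables such as `AppTraces` are stored in the same workspace that the log alert queries.

```hcl
resource "azurerm_log_analytics_workspace" "main" {
  name                = "my-monitoring-workspace" # Hypothetical name
  location            = "westus2"
  resource_group_name = "MyResourceGroup"
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_application_insights" "main" {
  name                = "my-app-insights" # Hypothetical name
  location            = "westus2"
  resource_group_name = "MyResourceGroup"
  application_type    = "web"

  # Workspace-based: telemetry is stored in the workspace above, where
  # tables such as AppTraces are visible to log_alert queries.
  workspace_id = azurerm_log_analytics_workspace.main.id
}
```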

Reference

Resource Group Object (All Modules)

rg = {
  name     = "MyResourceGroup"
  location = "westus2"
}
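Rather than repeating string literals in every module call, the object can be built once from a managed resource group. The following is a minimal sketch; the resource label `monitoring` and the local name `rg` are illustrative.

```hcl
resource "azurerm_resource_group" "monitoring" {
  name     = "MyResourceGroup"
  location = "westus2"
}

locals {
  # Shape expected by every module's rg input.
  rg = {
    name     = azurerm_resource_group.monitoring.name
    location = azurerm_resource_group.monitoring.location
  }
}
```

Each module can then take `rg = local.rg`. Because this references resource attributes, Terraform also orders creation so that the group exists before the alert resources.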

Severity Levels

| Level | Description | Use Case |
|---|---|---|
| 0 | Critical | Service down, data loss |
| 1 | Error | Major functionality impaired |
| 2 | Warning | Degraded performance |
| 3 | Informational | Notable events |
| 4 | Verbose | Detailed information |
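Bare numbers like `severity = 2` are easy to misread. One option is a named lookup in the calling configuration; the local name `severity` is illustrative, and the values follow the severity table.

```hcl
locals {
  # Azure Monitor severity levels (0 is most severe).
  severity = {
    critical      = 0
    error         = 1
    warning       = 2
    informational = 3
    verbose       = 4
  }
}
```

A module call would then read `severity = local.severity.warning`.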

Evaluation Frequency (Metric Alerts)

ISO 8601 duration patterns:

- PT1M: every 1 minute (responsive, more noise)
- PT5M: every 5 minutes (recommended)
- PT15M: every 15 minutes (less noise, slower)

Note: Window size must be ≥ frequency.
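The window-versus-frequency rule can be caught at plan time instead of at apply time. The following is a sketch using a hypothetical root-module variable, `cpu_alert_timing`, which is not an input of the `metric_alert` module. It assumes the module passes both strings to `azurerm_monitor_metric_alert`, whose documented values are PT1M, PT5M, PT15M, PT30M and PT1H for frequency, plus PT6H, PT12H and P1D for window size.

```hcl
# Hypothetical root-module variable; not an input of the metric_alert module.
variable "cpu_alert_timing" {
  type = object({
    frequency   = string
    window_size = string
  })
  default = {
    frequency   = "PT5M"
    window_size = "PT15M"
  }

  validation {
    # Convert both durations to minutes. An unrecognized window maps to -1
    # and an unrecognized frequency to 100000, so a typo in either fails.
    condition = (
      lookup(
        { PT1M = 1, PT5M = 5, PT15M = 15, PT30M = 30, PT1H = 60, PT6H = 360, PT12H = 720, P1D = 1440 },
        var.cpu_alert_timing.window_size, -1
      ) >= lookup(
        { PT1M = 1, PT5M = 5, PT15M = 15, PT30M = 30, PT1H = 60 },
        var.cpu_alert_timing.frequency, 100000
      )
    )
    error_message = "The window_size must be a supported duration no shorter than frequency."
  }
}
```

The `cpu_alert` module would then take `frequency = var.cpu_alert_timing.frequency` and `window_size = var.cpu_alert_timing.window_size`. Both values sit in one object so that a single validation block can compare them.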

Module Inputs

action_group

| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Action group name |
| display_name | string | yes | - | Short name (max 12 chars) |
| rg | object | yes | - | Resource group |
| email_addresses | map(string) | yes | - | Email receivers |
| tags | map(string) | no | {} | Resource tags |

Outputs: id, this

metric_alert

| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Alert name |
| description | string | yes | - | Alert description |
| rg | object | yes | - | Resource group |
| scopes | list(string) | yes | - | Resource IDs to monitor |
| criteria | list(object) | yes | - | Metric criteria |
| action_group_ids | list(string) | yes | - | Action group IDs |
| frequency | string | no | PT5M | Evaluation frequency |
| window_size | string | no | PT15M | Time window |
| severity | number | no | 3 | Severity (0-4) |
| auto_mitigate | bool | no | true | Auto-resolve |

Outputs: id, this

log_alert

| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Alert name |
| rg | object | yes | - | Resource group |
| data_source_id | string | yes | - | Log Analytics workspace ID |
| query | string | yes | - | KQL query |
| frequency | number | no | 5 | Evaluation frequency (minutes) |
| time_window | number | no | 15 | Time window (minutes) |
| severity | number | no | 3 | Severity (0-4) |
| threshold | number | no | 0 | Alert threshold |
| action_group_ids | list(string) | no | [] | Action group IDs |

Outputs: id, this
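Unlike `metric_alert`, `log_alert` does not require `action_group_ids`. The following is a sketch of a call that passes only the required inputs; the name and query are illustrative. Everything else falls back to the defaults in the table. With the empty action group list, the alert presumably still fires in Azure Monitor but notifies no one.

```hcl
module "minimal_log_alert" {
  source = "../../modules/azure_monitor/log_alert"

  # Required inputs only.
  name           = "AnyCriticalTrace"
  rg             = { name = "MyResourceGroup", location = "westus2" }
  data_source_id = azurerm_log_analytics_workspace.main.id
  query          = "AppTraces | where SeverityLevel == 4" # 4 = Critical

  # Defaults from the table: frequency = 5, time_window = 15, severity = 3,
  # threshold = 0, action_group_ids = [] (no notifications are sent).
}
```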

query_alert & smart_detector_alert

| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Alert name |
| rg | object | yes | - | Resource group |
| appinsights | object | yes | - | App Insights config |
| tags | map(string) | no | {} | Resource tags |

Outputs: id, this
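The page shows no `query_alert` call, but the table gives it the same inputs as `smart_detector_alert`. The following is a sketch built on two assumptions: the module lives at `../../modules/azure_monitor/query_alert`, following the layout of the other modules, and its `appinsights` object has the same three attributes as in the smart detector example.

```hcl
module "appinsights_query_alert" {
  source = "../../modules/azure_monitor/query_alert" # Assumed path

  name = "AppInsightsQueryAlert"
  rg   = { name = "MyResourceGroup", location = "westus2" }

  # Assumed to match the appinsights shape used by smart_detector_alert.
  appinsights = {
    name                = azurerm_application_insights.main.name
    connection_string   = azurerm_application_insights.main.connection_string
    instrumentation_key = azurerm_application_insights.main.instrumentation_key
  }

  tags = { Environment = "Production" }
}
```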

Complete Example

See examples/monitoring-alerts for a complete implementation with all alert types.