Azure Monitor Modules
Terraform modules for Azure Monitor alerting and notifications.
Module Overview
| Module | Purpose | Key Parameters |
|---|---|---|
| action_group | Notification routing | email_addresses, display_name |
| metric_alert | Resource metric monitoring | scopes, criteria, threshold |
| log_alert | KQL query alerts | query, data_source_id, frequency |
| query_alert | Advanced App Insights queries | appinsights, name |
| smart_detector_alert | ML-based anomaly detection | appinsights, name |
Quick Examples
Action Group
module "ops_alerts" {
  source       = "../../modules/azure_monitor/action_group"
  name         = "OpsTeamAlerts"
  display_name = "Ops" # Max 12 characters
  rg = {
    name     = "MyResourceGroup"
    location = "westus2"
  }
  email_addresses = {
    "ops"    = "ops@example.com"
    "oncall" = "oncall@example.com"
  }
  tags = { Environment = "Production" }
}
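For orientation, a module with these inputs might wrap `azurerm_monitor_action_group` roughly as follows. This is a minimal sketch, not the module's actual source: the resource label `this` and the variable wiring are assumptions, and only the input names come from the example above.

```hcl
# Hypothetical module internals; the real implementation may differ.
resource "azurerm_monitor_action_group" "this" {
  name                = var.name
  resource_group_name = var.rg.name
  short_name          = var.display_name # Azure limits this to 12 characters
  tags                = var.tags

  # One email receiver per entry in the email_addresses map
  dynamic "email_receiver" {
    for_each = var.email_addresses
    content {
      name          = email_receiver.key
      email_address = email_receiver.value
    }
  }
}
```

Using a map for `email_addresses` gives each receiver a stable key, so adding or removing one address does not force the other receivers to be recreated.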
Metric Alert
module "cpu_alert" {
  source      = "../../modules/azure_monitor/metric_alert"
  name        = "HighCPU"
  description = "Alert when CPU exceeds 80%"
  rg          = { name = "MyResourceGroup", location = "westus2" }
  scopes      = [azurerm_app_service_plan.main.id] # CpuPercentage is emitted by the App Service plan, not the app
  severity    = 2 # 0=Critical, 1=Error, 2=Warning, 3=Info, 4=Verbose
  criteria = [{
    metric_namespace = "Microsoft.Web/serverfarms"
    metric_name      = "CpuPercentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }]
  frequency        = "PT5M"  # Evaluation frequency (ISO 8601)
  window_size      = "PT15M" # Time window
  action_group_ids = [module.ops_alerts.id]
  auto_mitigate    = true
}
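Common App Service plan metrics (Microsoft.Web/serverfarms): CpuPercentage, MemoryPercentage
Common App Service app metrics (Microsoft.Web/sites): Http5xx, HttpResponseTime
Common SQL Database metrics (Microsoft.Sql/servers/databases): cpu_percent, dtu_consumption_percent, storage_percent, deadlock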
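The same module can target a SQL metric by changing the scope, namespace, and metric name. The following is a sketch under assumptions: `azurerm_mssql_database.main` is a hypothetical resource standing in for your own database, and the 90% threshold is illustrative.

```hcl
module "sql_cpu_alert" {
  source      = "../../modules/azure_monitor/metric_alert"
  name        = "HighSqlCPU"
  description = "Alert when SQL Database CPU exceeds 90%"
  rg          = { name = "MyResourceGroup", location = "westus2" }

  # Hypothetical database resource; substitute your own resource ID
  scopes = [azurerm_mssql_database.main.id]

  criteria = [{
    metric_namespace = "Microsoft.Sql/servers/databases"
    metric_name      = "cpu_percent"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 90
  }]

  action_group_ids = [module.ops_alerts.id]
}
```

`frequency`, `window_size`, `severity`, and `auto_mitigate` are omitted here, so the defaults listed under Module Inputs apply.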
Log Alert
module "error_alert" {
  source         = "../../modules/azure_monitor/log_alert"
  name           = "HighErrorRate"
  description    = "Alert on elevated error rate"
  rg             = { name = "MyResourceGroup", location = "westus2" }
  data_source_id = azurerm_log_analytics_workspace.main.id
  query = <<-QUERY
    AppTraces
    | where SeverityLevel >= 3
    | summarize ErrorCount = count() by bin(TimeGenerated, 5m)
    | where ErrorCount > 10
  QUERY
  frequency        = 5  # Minutes
  time_window      = 15 # Minutes
  severity         = 2
  threshold        = 0
  action_group_ids = [module.ops_alerts.id]
}
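The query above already filters to 5-minute bins with more than 10 errors, so `threshold = 0` most likely means "alert when the query returns any rows". The inputs (`data_source_id`, and `frequency` and `time_window` in minutes) match the arguments of `azurerm_monitor_scheduled_query_rules_alert`, so the module may look something like the sketch below. The resource label, the `GreaterThan` operator, and the variable wiring are assumptions, not the module's confirmed source.

```hcl
# Hypothetical module internals; the real implementation may differ.
resource "azurerm_monitor_scheduled_query_rules_alert" "this" {
  name                = var.name
  description         = var.description
  location            = var.rg.location
  resource_group_name = var.rg.name

  data_source_id = var.data_source_id
  query          = var.query
  frequency      = var.frequency   # minutes
  time_window    = var.time_window # minutes
  severity       = var.severity

  action {
    action_group = var.action_group_ids
  }

  # Assumed operator: fire when the number of result rows exceeds the threshold
  trigger {
    operator  = "GreaterThan"
    threshold = var.threshold
  }
}
```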
Smart Detector Alert
module "anomaly_detector" {
  source = "../../modules/azure_monitor/smart_detector_alert"
  name   = "FailureAnomalies"
  rg     = { name = "MyResourceGroup", location = "westus2" }
  appinsights = {
    name                = azurerm_application_insights.main.name
    connection_string   = azurerm_application_insights.main.connection_string
    instrumentation_key = azurerm_application_insights.main.instrumentation_key
  }
  tags = { Environment = "Production" }
}
Built-in Detectors: Failure Anomalies, Performance Degradation, Memory Leak, Exception Anomalies
Note: ML models need 24-48 hours to establish baselines.
Reference
Resource Group Object (All Modules)
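Every example on this page passes `rg` as an object with `name` and `location`. A declaration along these lines would accept that shape. It is a sketch inferred from the examples, so the description text and the absence of further attributes are assumptions.

```hcl
# Assumed declaration; attribute names are taken from the examples on this page.
variable "rg" {
  description = "Resource group that will contain the monitoring resources"
  type = object({
    name     = string
    location = string
  })
}
```

If the resource group is managed in the same configuration, the object can be built from it, for example `rg = { name = azurerm_resource_group.main.name, location = azurerm_resource_group.main.location }`, where `azurerm_resource_group.main` is a hypothetical resource.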
Severity Levels
| Level | Description | Use Case |
|---|---|---|
| 0 | Critical | Service down, data loss |
| 1 | Error | Major functionality impaired |
| 2 | Warning | Degraded performance |
| 3 | Informational | Notable events |
| 4 | Verbose | Detailed information |
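To catch out-of-range values at plan time, a module could validate the severity input. This is an illustrative sketch; whether the modules here include such a validation is not stated in this documentation.

```hcl
# Hypothetical variable declaration with a range check
variable "severity" {
  description = "Alert severity: 0 (Critical) through 4 (Verbose)"
  type        = number
  default     = 3

  validation {
    condition     = contains([0, 1, 2, 3, 4], var.severity)
    error_message = "Severity must be a whole number from 0 to 4."
  }
}
```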
Evaluation Frequency (Metric Alerts)
ISO 8601 duration patterns:
- PT1M - Every 1 minute (responsive, more noise)
- PT5M - Every 5 minutes (recommended)
- PT15M - Every 15 minutes (less noise, slower)
Note: Window size must be ≥ frequency.
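The window-size rule can be checked in Terraform before Azure rejects the alert. The sketch below assumes Terraform 1.5 or later (for `check` blocks) and uses the duration values that, to my knowledge, `azurerm_monitor_metric_alert` accepts. The local name `duration_minutes` and the check name are hypothetical.

```hcl
locals {
  # Minutes represented by each supported ISO 8601 duration
  duration_minutes = {
    PT1M  = 1
    PT5M  = 5
    PT15M = 15
    PT30M = 30
    PT1H  = 60
    PT6H  = 360
    PT12H = 720
    P1D   = 1440
  }
}

variable "frequency" {
  type    = string
  default = "PT5M"

  validation {
    condition     = contains(["PT1M", "PT5M", "PT15M", "PT30M", "PT1H"], var.frequency)
    error_message = "The frequency must be one of PT1M, PT5M, PT15M, PT30M, or PT1H."
  }
}

variable "window_size" {
  type    = string
  default = "PT15M"

  validation {
    condition     = contains(["PT1M", "PT5M", "PT15M", "PT30M", "PT1H", "PT6H", "PT12H", "P1D"], var.window_size)
    error_message = "The window_size must be one of PT1M, PT5M, PT15M, PT30M, PT1H, PT6H, PT12H, or P1D."
  }
}

# Warn when the window is shorter than the evaluation frequency
check "window_covers_frequency" {
  assert {
    condition     = local.duration_minutes[var.window_size] >= local.duration_minutes[var.frequency]
    error_message = "The window_size must be greater than or equal to frequency."
  }
}
```

A `check` block reports a warning rather than failing the plan. It is used here because variable validations can only reference other variables in newer Terraform releases.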
Module Inputs
action_group
| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Action group name |
| display_name | string | yes | - | Short name (max 12 chars) |
| rg | object | yes | - | Resource group |
| email_addresses | map(string) | yes | - | Email receivers |
| tags | map(string) | no | {} | Resource tags |
Outputs: id, this
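The 12-character limit on `display_name` is a natural candidate for input validation. A minimal sketch, assuming the module declares the variable itself (the actual declaration is not shown in this documentation):

```hcl
# Hypothetical declaration enforcing the short-name limit
variable "display_name" {
  description = "Short name for the action group"
  type        = string

  validation {
    condition     = length(var.display_name) <= 12
    error_message = "The display_name must be 12 characters or fewer."
  }
}
```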
metric_alert
| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Alert name |
| description | string | yes | - | Alert description |
| rg | object | yes | - | Resource group |
| scopes | list(string) | yes | - | Resource IDs to monitor |
| criteria | list(object) | yes | - | Metric criteria |
| action_group_ids | list(string) | yes | - | Action group IDs |
| frequency | string | no | PT5M | Evaluation frequency |
| window_size | string | no | PT15M | Time window |
| severity | number | no | 3 | Severity (0-4) |
| auto_mitigate | bool | no | true | Auto-resolve |
Outputs: id, this
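The table lists `criteria` only as `list(object)`. Based on the fields used in the Metric Alert example, the full type might be declared as follows. This is an inferred sketch, and the real module may accept additional or optional attributes.

```hcl
# Assumed shape, derived from the Metric Alert example on this page
variable "criteria" {
  description = "Metric criteria to evaluate"
  type = list(object({
    metric_namespace = string
    metric_name      = string
    aggregation      = string # e.g. "Average"
    operator         = string # e.g. "GreaterThan"
    threshold        = number
  }))
}
```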
log_alert
| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Alert name |
| rg | object | yes | - | Resource group |
| data_source_id | string | yes | - | Log Analytics workspace ID |
| query | string | yes | - | KQL query |
| frequency | number | no | 5 | Evaluation frequency (minutes) |
| time_window | number | no | 15 | Time window (minutes) |
| severity | number | no | 3 | Severity (0-4) |
| threshold | number | no | 0 | Alert threshold |
| action_group_ids | list(string) | no | [] | Action group IDs |
Outputs: id, this
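Because only four inputs are required, a minimal call can rely on the defaults in the table above. The module name, alert name, and query below are illustrative assumptions.

```hcl
module "exception_alert" {
  source         = "../../modules/azure_monitor/log_alert"
  name           = "ExceptionSpike"
  rg             = { name = "MyResourceGroup", location = "westus2" }
  data_source_id = azurerm_log_analytics_workspace.main.id

  # Illustrative query: 5-minute bins with more than 50 exceptions
  query = <<-QUERY
    AppExceptions
    | summarize ExceptionCount = count() by bin(TimeGenerated, 5m)
    | where ExceptionCount > 50
  QUERY
}
```

With the defaults, this evaluates every 5 minutes over a 15-minute window at severity 3 with a threshold of 0. Since `action_group_ids` defaults to an empty list, nobody is notified until an action group is attached.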
query_alert & smart_detector_alert
| Input | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Alert name |
| rg | object | yes | - | Resource group |
| appinsights | object | yes | - | App Insights config |
| tags | map(string) | no | {} | Resource tags |
Outputs: id, this
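The Quick Examples do not cover `query_alert`. Since it shares the inputs in the table above, a call would presumably mirror the smart detector example. The source path follows the pattern of the other modules and is an assumption, as are the attributes inside `appinsights`.

```hcl
module "appinsights_query_alert" {
  # Assumed path, following the layout of the other modules
  source = "../../modules/azure_monitor/query_alert"
  name   = "AppInsightsQueryAlert"
  rg     = { name = "MyResourceGroup", location = "westus2" }

  # Attribute names copied from the smart_detector_alert example
  appinsights = {
    name                = azurerm_application_insights.main.name
    connection_string   = azurerm_application_insights.main.connection_string
    instrumentation_key = azurerm_application_insights.main.instrumentation_key
  }

  tags = { Environment = "Production" }
}
```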
Complete Example
See examples/monitoring-alerts for a complete implementation with all alert types.