Azure Monitor Modules

Terraform modules for Azure Monitor alerting and notifications.

Module Overview

Module	Purpose	Key Parameters
action_group	Notification routing	`email_addresses`, `display_name`
metric_alert	Resource metric monitoring	`scopes`, `criteria`, `threshold`
log_alert	KQL query alerts	`query`, `data_source_id`, `frequency`
query_alert	Advanced App Insights queries	`appinsights`, `name`
smart_detector_alert	ML-based anomaly detection	`appinsights`, `name`

Quick Examples

Action Group

module "ops_alerts" {
  source = "../../modules/azure_monitor/action_group"

  name         = "OpsTeamAlerts"
  display_name = "Ops"  # Max 12 characters

  rg = {
    name     = "MyResourceGroup"
    location = "westus2"
  }

  email_addresses = {
    "ops"    = "ops@example.com"
    "oncall" = "oncall@example.com"
  }

  tags = { Environment = "Production" }
}

Metric Alert

module "cpu_alert" {
  source = "../../modules/azure_monitor/metric_alert"

  name        = "HighCPU"
  description = "Alert when CPU exceeds 80%"

  rg       = { name = "MyResourceGroup", location = "westus2" }
  scopes   = [azurerm_app_service_plan.main.id]  # CpuPercentage is reported by the App Service plan
  severity = 2  # 0=Critical, 1=Error, 2=Warning, 3=Info, 4=Verbose

  criteria = [{
    metric_namespace = "Microsoft.Web/serverfarms"
    metric_name      = "CpuPercentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }]

  frequency        = "PT5M"   # Evaluation frequency (ISO 8601)
  window_size      = "PT15M"  # Time window
  action_group_ids = [module.ops_alerts.id]
  auto_mitigate    = true
}

Common App Service plan metrics (Microsoft.Web/serverfarms): CpuPercentage, MemoryPercentage

Common App Service metrics (Microsoft.Web/sites): Http5xx, HttpResponseTime

Common SQL Metrics: cpu_percent, dtu_consumption_percent, storage_percent, deadlock
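The SQL metric names above fit the same `criteria` shape as the CPU example. A minimal sketch, assuming a hypothetical `azurerm_mssql_database.main` resource (not defined in this document) and assuming the module passes the `Microsoft.Sql/servers/databases` namespace through unchanged; `frequency` and `window_size` are omitted so the documented defaults apply:

```hcl
module "dtu_alert" {
  source = "../../modules/azure_monitor/metric_alert"

  name        = "HighDTU"
  description = "Alert when DTU consumption exceeds 90%"

  rg       = { name = "MyResourceGroup", location = "westus2" }
  scopes   = [azurerm_mssql_database.main.id]  # hypothetical resource, for illustration only
  severity = 1

  criteria = [{
    metric_namespace = "Microsoft.Sql/servers/databases"
    metric_name      = "dtu_consumption_percent"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 90
  }]

  action_group_ids = [module.ops_alerts.id]
}
```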

Log Alert

module "error_alert" {
  source = "../../modules/azure_monitor/log_alert"

  name           = "HighErrorRate"
  description    = "Alert on elevated error rate"
  rg             = { name = "MyResourceGroup", location = "westus2" }
  data_source_id = azurerm_log_analytics_workspace.main.id

  query = <<-QUERY
    AppTraces
    | where SeverityLevel >= 3
    | summarize ErrorCount = count() by bin(TimeGenerated, 5m)
    | where ErrorCount > 10
  QUERY

  frequency        = 5   # Minutes
  time_window      = 15  # Minutes
  severity         = 2
  threshold        = 0
  action_group_ids = [module.ops_alerts.id]
}

Smart Detector Alert

module "anomaly_detector" {
  source = "../../modules/azure_monitor/smart_detector_alert"

  name = "FailureAnomalies"
  rg   = { name = "MyResourceGroup", location = "westus2" }

  appinsights = {
    name                = azurerm_application_insights.main.name
    connection_string   = azurerm_application_insights.main.connection_string
    instrumentation_key = azurerm_application_insights.main.instrumentation_key
  }

  tags = { Environment = "Production" }
}

Built-in Detectors: Failure Anomalies, Performance Degradation, Memory Leak, Exception Anomalies

Note: ML models need 24-48 hours to establish baselines.

Reference

Resource Group Object (All Modules)

rg = {
  name     = "MyResourceGroup"
  location = "westus2"
}
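When the resource group is managed in the same configuration, the object can be built from its attributes instead of being repeated as literals. A sketch, assuming a hypothetical `azurerm_resource_group.main` resource and a hypothetical `local.rg` value shared across module calls:

```hcl
resource "azurerm_resource_group" "main" {
  name     = "MyResourceGroup"
  location = "westus2"
}

locals {
  # One object reused by every monitoring module call
  rg = {
    name     = azurerm_resource_group.main.name
    location = azurerm_resource_group.main.location
  }
}

module "ops_alerts" {
  source = "../../modules/azure_monitor/action_group"

  name            = "OpsTeamAlerts"
  display_name    = "Ops"
  rg              = local.rg
  email_addresses = { "ops" = "ops@example.com" }
}
```

Referencing the resource attributes also gives Terraform an implicit dependency, so the resource group is created before the action group.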

Severity Levels

Level	Description	Use Case
0	Critical	Service down, data loss
1	Error	Major functionality impaired
2	Warning	Degraded performance
3	Informational	Notable events
4	Verbose	Detailed information
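To avoid bare numbers in module calls, the table can be captured once as a named map. This is only a sketch; `alert_severity` is a hypothetical local and not part of these modules:

```hcl
locals {
  # Names mirror the severity table above
  alert_severity = {
    critical      = 0
    error         = 1
    warning       = 2
    informational = 3
    verbose       = 4
  }
}

# Usage inside any alert module call:
#   severity = local.alert_severity.warning
```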

Evaluation Frequency (Metric Alerts)

ISO 8601 duration patterns:

- PT1M - Every 1 minute (responsive, more noise)
- PT5M - Every 5 minutes (recommended)
- PT15M - Every 15 minutes (less noise, slower)

Note: Window size must be ≥ frequency.
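The window-size rule can be checked at plan time rather than discovered on apply. The sketch below assumes Terraform 1.5 or later (for `check` blocks) and uses an ordered list of the durations the azurerm provider is understood to accept for metric alerts. The `local` names are hypothetical, and the modules themselves may not perform this validation:

```hcl
locals {
  # Durations in ascending order, so list position can stand in for length
  durations_ascending = ["PT1M", "PT5M", "PT15M", "PT30M", "PT1H", "PT6H", "PT12H", "P1D"]

  frequency   = "PT5M"
  window_size = "PT15M"
}

check "window_covers_frequency" {
  assert {
    # index() returns the position of the value in the list and errors if it is absent
    condition = (
      index(local.durations_ascending, local.window_size) >=
      index(local.durations_ascending, local.frequency)
    )
    error_message = "window_size must be greater than or equal to frequency."
  }
}
```

Comparing list positions sidesteps parsing ISO 8601 strings, at the cost of keeping the list in sync with the values the provider accepts.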

Module Inputs

action_group

Input	Type	Required	Default	Description
name	string	yes	-	Action group name
display_name	string	yes	-	Short name (max 12 chars)
rg	object	yes	-	Resource group
email_addresses	map(string)	yes	-	Email receivers
tags	map(string)	no	{}	Resource tags

Outputs: id, this

metric_alert

Input	Type	Required	Default	Description
name	string	yes	-	Alert name
description	string	yes	-	Alert description
rg	object	yes	-	Resource group
scopes	list(string)	yes	-	Resource IDs to monitor
criteria	list(object)	yes	-	Metric criteria
action_group_ids	list(string)	yes	-	Action group IDs
frequency	string	no	PT5M	Evaluation frequency
window_size	string	no	PT15M	Time window
severity	number	no	3	Severity (0-4)
auto_mitigate	bool	no	true	Auto-resolve

Outputs: id, this

log_alert

Input	Type	Required	Default	Description
name	string	yes	-	Alert name
rg	object	yes	-	Resource group
data_source_id	string	yes	-	Log Analytics workspace ID
query	string	yes	-	KQL query
frequency	number	no	5	Evaluation frequency (minutes)
time_window	number	no	15	Time window (minutes)
severity	number	no	3	Severity (0-4)
threshold	number	no	0	Alert threshold
action_group_ids	list(string)	no	[]	Action group IDs

Outputs: id, this

query_alert & smart_detector_alert

Input	Type	Required	Default	Description
name	string	yes	-	Alert name
rg	object	yes	-	Resource group
appinsights	object	yes	-	App Insights config
tags	map(string)	no	{}	Resource tags

Outputs: id, this
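The Quick Examples do not include a `query_alert` call. Going only by the inputs listed above, one might look like the following. The source path is assumed to follow the pattern of the other modules, the `appinsights` object shape is assumed to match the `smart_detector_alert` example, and the name `SlowRequests` is purely illustrative:

```hcl
module "slow_requests" {
  source = "../../modules/azure_monitor/query_alert"  # assumed path

  name = "SlowRequests"
  rg   = { name = "MyResourceGroup", location = "westus2" }

  # Assumed to take the same object shape as smart_detector_alert
  appinsights = {
    name                = azurerm_application_insights.main.name
    connection_string   = azurerm_application_insights.main.connection_string
    instrumentation_key = azurerm_application_insights.main.instrumentation_key
  }

  tags = { Environment = "Production" }
}
```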

Complete Example

See examples/monitoring-alerts for a complete implementation with all alert types.