Updated July 3, 2026

Monitoring & Observability

Tags: monitoring, observability, dynatrace, splunk, interlink, azure-monitor, servicenow, alerting, dashboards, oneview

This is the central landing page for all monitoring tooling used in Epic on Azure. Use the links below to jump directly to the tool or dashboard you need.

OneView Record: Epic on Azure (AIDE_0085665) — central catalog of all tooling, data sources, and application metadata.

Access Requirements

Before you can view dashboards or receive alerts, request membership in the following Microsoft Entra ID (formerly Azure AD) groups:

Group	Purpose	Required For
Monitoring_ReadOnly	Read access to Dynatrace dashboards and monitoring data	All team members
dtcloud_AIDE_0085665_config	Dynatrace configuration access (management zones, alerting profiles)	Platform engineers, on-call
cloud_splunk_east_epic_azure_nw_power	Splunk read/write for VM and network logs	All team members
sec_splunk_epic_azure_nw_power	Splunk security logs (firewall)	Security, network engineers

Request these groups through Secure (MyIT). For full onboarding steps, see Phase 2: Tools Setup.

Monitoring Architecture

All monitoring data flows through a layered architecture: agents collect data, tools analyze it, alerts route through Interlink, and notifications reach on-call engineers via ServiceNow Notify.

graph TD
    subgraph "Data Collection"
        W_OA[Windows — OneAgent]
        W_LOG[Windows — Event Logs]
        L_OA[Linux — OneAgent]
        L_FB[Linux — Fluent Bit]
        FW[Firewalls / Appliances — Syslog]
        AZ[Azure Platform Metrics]
    end

    subgraph "Analysis & Visualization"
        DT[(Dynatrace<br/>APM + Infrastructure)]
        SP[(Splunk<br/>Log Aggregation)]
        AZMON[Azure Monitor<br/>Metric Alerts]
        ESP[Epic System Pulse]
    end

    subgraph "Alert Routing"
        IL[Interlink<br/>Event Aggregator]
    end

    subgraph "Notification"
        SNOW[ServiceNow Notify<br/>Alerts to Devices]
        TCC[Command Center / TCC<br/>P1 & P2 Escalation]
    end

    W_OA -->|OneAgent| DT
    L_OA -->|OneAgent| DT
    W_LOG -->|Azure Log Aggregator| SP
    L_FB -->|Azure Event Hub| SP
    FW -->|Syslog → Event Hub| SP
    AZ --> AZMON
    AZ --> DT
    ESP -->|Events| DT

    DT -->|Problem Notifications| IL
    AZMON -->|Alert Rules| IL
    SP -->|Saved Searches / Alerts| IL

    IL --> SNOW
    IL --> TCC

Key data flows:

Host metrics and APM → Dynatrace OneAgent → Dynatrace SaaS → Interlink
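Logs → Fluent Bit (Linux), Event Logs (Windows), or Syslog (firewalls and appliances) → Azure Event Hub or Azure Log Aggregator → Splunk → Interlink
Azure platform metrics → Azure Monitor → Alert Rules → Interlink
All alerts → Interlink → ServiceNow Notify → on-call devices (P1 and P2 also escalate to Command Center / TCC)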
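
To check where data from a given source ends up, the flows above can be modelled as a small directed graph. This is a minimal sketch: the edges are copied from the architecture diagram, while the shortened node names and the `trace_paths` helper are illustrative and not part of any tool.

```python
# Edges copied from the architecture diagram; node names are shortened labels.
FLOWS = {
    "Windows OneAgent": ["Dynatrace"],
    "Linux OneAgent": ["Dynatrace"],
    "Windows Event Logs": ["Splunk"],
    "Linux Fluent Bit": ["Splunk"],
    "Firewall Syslog": ["Splunk"],
    "Azure Platform Metrics": ["Azure Monitor", "Dynatrace"],
    "Epic System Pulse": ["Dynatrace"],
    "Dynatrace": ["Interlink"],
    "Azure Monitor": ["Interlink"],
    "Splunk": ["Interlink"],
    "Interlink": ["ServiceNow Notify", "Command Center / TCC"],
}


def trace_paths(source, flows=FLOWS):
    """Return every path from source to a terminal node (depth-first)."""
    targets = flows.get(source, [])
    if not targets:
        # No outgoing edges: this node is where the path ends.
        return [[source]]
    paths = []
    for target in targets:
        for tail in trace_paths(target, flows):
            paths.append([source] + tail)
    return paths


for path in trace_paths("Linux Fluent Bit"):
    print(" -> ".join(path))
```

Every source reaches both notification targets through Interlink, which is why Interlink is the single place to apply maintenance suppression.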

Dashboards & Tools Quick Access

Dynatrace (APM & Infrastructure)

Dynatrace OneAgent is deployed on all hosts and automatically instruments supported application components (IIS, .NET, Java, and others). It provides full-stack observability, from infrastructure metrics to application transactions.

Dashboard	Description	Link
Infrastructure Insights	What needs attention — problems, resource issues	Open Dashboard
Infrastructure Health Overview	Epic National West health status	Open Dashboard
ODB Health Dashboard	Epic on Azure ODB performance and health	Open Dashboard
Interconnect Foreground	Interconnect Foreground monitoring	Open Dashboard
Interconnect Background	Interconnect Background monitoring	Open Dashboard
BCA	BCA monitoring	Open Dashboard
Epic Care Link	Epic Care Link monitoring	Open Dashboard
Care Everywhere	Care Everywhere monitoring	Open Dashboard
Welcome Web	Welcome Web monitoring	Open Dashboard
Epic Print Service	Epic Print Service monitoring	Open Dashboard
Welcome Client	Welcome Client monitoring	Open Dashboard
System Pulse	System Pulse monitoring	Open Dashboard
ODB	ODB monitoring	Open Dashboard

Dynatrace Tenants:

Environment	Tenant ID	URL
Production	`skx14060`	skx14060.apps.dynatrace.com
Non-Production	`dfr17824`	dfr17824.apps.dynatrace.com

Filter tags: Askid:AIDE_0085665, [Azure]aide-id: AIDE_0085665

Host groups: AIDE_0085665.{environment}.azu (e.g., AIDE_0085665.prod.azu)

Network zones: AIDE_0085665.{environment}.azu
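
Host groups and network zones share the same naming pattern, so scripts can derive both from the environment name. The sketch below assumes environment names are lowercase alphanumeric strings (hyphens allowed); only `prod` is confirmed by the example above, and the `host_group` helper is hypothetical.

```python
import re

APP_ID = "AIDE_0085665"


def host_group(environment: str) -> str:
    """Build the host group / network zone name for an environment."""
    env = environment.strip().lower()
    # Assumption: environment names contain only letters, digits, and hyphens.
    if not re.fullmatch(r"[a-z0-9-]+", env):
        raise ValueError(f"unexpected environment name: {environment!r}")
    return f"{APP_ID}.{env}.azu"


print(host_group("prod"))
```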

OneAgent deployment:

Linux install path: /monitor/oneagent
Windows install path: C:\monitor\oneagent
Monitoring mode: Full stack (application + infrastructure)
Deployed via Ansible: see ohemr-ansible-role-dynatrace
Configuration-as-Code: see ohemr-dynatrace-config (Terraform)

Documentation:

Dynatrace Problems API Guide — querying problems via API with cURL and jq
Monitoring Strategy — where Dynatrace fits in the overall monitoring stack
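
For scripted checks, open problems can also be pulled with Python instead of cURL and jq. This is a rough sketch under stated assumptions: it uses the Dynatrace Problems API v2 endpoint (`/api/v2/problems`) with an `Api-Token` header, and it assumes the tenant's API is reachable at `{tenant}.live.dynatrace.com`, which this page does not confirm. Check the Dynatrace Problems API Guide for the host and token scope actually in use.

```python
import json
import urllib.parse
import urllib.request


def problems_url(tenant_id: str, since: str = "now-2h") -> str:
    """Build a Problems API v2 URL that selects open problems."""
    query = urllib.parse.urlencode({
        "from": since,
        "problemSelector": 'status("open")',
    })
    # Assumption: classic SaaS API host pattern; verify for your tenant.
    return f"https://{tenant_id}.live.dynatrace.com/api/v2/problems?{query}"


def fetch_open_problems(tenant_id: str, api_token: str) -> list:
    """Call the API and return the list under the 'problems' key."""
    request = urllib.request.Request(
        problems_url(tenant_id),
        headers={"Authorization": f"Api-Token {api_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("problems", [])


print(problems_url("dfr17824"))
```

Try it against the Non-Production tenant first, and keep the token out of source control.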

Splunk (Log Aggregation & Analysis)

Splunk is the central log aggregation platform. Hosts, applications, and network appliances forward logs to Splunk, mostly through Azure Event Hub, with Fluent Bit as the collection agent on Linux.

Dashboard	Description	Link
Azure Metric Alert Dashboard	Tracked resource groups, Severity 0 (Critical) alerts	Open Dashboard
Azure Patching Schedule	Upcoming and recent patching maintenance windows	Open Dashboard

Splunk Instances:

Instance	Purpose	URL
Cloud Splunk East	VM, network, and infrastructure logs	est-sh.prod.cloud-splunk-optum.com
Security Splunk	Firewall and security event logs	sec-splunk.optum.com

Key indexes:

Index	Content
`cloud_epic_azure_nw`	VM and network infrastructure logs
`sec_n_paloalto_panos`	Palo Alto firewall logs

Quick test search: `index=cloud_epic_azure_nw | head 10`
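
When the same smoke test is needed for several indexes or time ranges, the search string can be generated rather than typed. A minimal sketch: `build_search` is a hypothetical helper, and the 15-minute default is an arbitrary choice, not a team standard. `earliest` is the standard SPL time modifier.

```python
def build_search(index: str, earliest: str = "-15m", limit: int = 10) -> str:
    """Compose an SPL smoke-test query for one index."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Bounding the time range keeps the test search cheap.
    return f"index={index} earliest={earliest} | head {limit}"


print(build_search("cloud_epic_azure_nw"))
```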

Log retention: 90 days for operational logs, 7 years for audit logs

Documentation:

Splunk Maintenance Windows — suppressing alerts during patching
Splunk Queries Guide — useful SPL queries and search patterns
Fluent Bit Configuration — Linux log collection agent setup

Azure Monitor (Infrastructure Metric Alerts)

Azure Monitor provides native metric alerting for all Azure resources. Alert rules are deployed via Terraform and route through Interlink.

Critical alert thresholds (Severity 0):

Metric	Threshold	Resource Type
CPU	>= 95%	Virtual Machines
Available Memory	<= 2%	Virtual Machines
Data Disk IOPS	>= 98%	Virtual Machines
OS Disk IOPS	>= 98%	Virtual Machines
VM Availability	< 1	Virtual Machines
Disk Free Space	< 10%	Windows VMs

Warning thresholds (Severity 2): CPU >= 90%, Available Memory <= 15%, Disk IOPS >= 95%, Disk Free Space < 15%
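
The critical and warning thresholds can be read as a two-step comparison per metric: check the critical bound first, then the warning bound. The sketch below encodes the percentage-based values from this section. The metric keys and the `classify` helper are illustrative; the real rules live in Terraform. VM Availability is left out because it has no warning level.

```python
import operator

# metric key -> (comparison, critical threshold, warning threshold), in percent
THRESHOLDS = {
    "cpu": (operator.ge, 95, 90),
    "available_memory": (operator.le, 2, 15),
    "disk_iops": (operator.ge, 98, 95),
    "disk_free_space": (operator.lt, 10, 15),
}


def classify(metric: str, value: float):
    """Return 'Sev 0', 'Sev 2', or None for a metric reading."""
    compare, critical, warning = THRESHOLDS[metric]
    if compare(value, critical):
        return "Sev 0"
    if compare(value, warning):
        return "Sev 2"
    return None


print(classify("cpu", 92))               # Sev 2
print(classify("available_memory", 1.5))  # Sev 0
```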

Activity log alerts: Service Health, SQL Firewall changes, NSG changes

Alert processing rules (suppression):

Maintenance windows: Saturday-Sunday 2:00-4:00 AM CST
Cloud test resources: daily suppression
Excluded resource groups: PCC Agentless Scan, PublicCloudManaged-ComputeScan, DIG Security, LP Central Logging
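
To check whether a timestamp falls inside the weekend maintenance window, a script can convert it to CST and compare the weekday and time. This sketch makes two assumptions that are not confirmed here: the window applies on both Saturday and Sunday from 2:00 to 4:00 AM, and CST means a fixed UTC-6 offset with no daylight-saving adjustment.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: fixed UTC-6 offset; no daylight-saving handling.
CST = timezone(timedelta(hours=-6), "CST")


def in_maintenance_window(moment: datetime) -> bool:
    """True if a timezone-aware datetime is inside the weekend window."""
    local = moment.astimezone(CST)
    is_weekend = local.weekday() in (5, 6)  # Saturday=5, Sunday=6
    return is_weekend and time(2, 0) <= local.time() < time(4, 0)


# Saturday, July 4, 2026 at 08:30 UTC is 02:30 CST.
print(in_maintenance_window(datetime(2026, 7, 4, 8, 30, tzinfo=timezone.utc)))
```

Pass timezone-aware datetimes only; `astimezone` treats a naive value as system local time.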

Infrastructure-as-Code:

Alert rules: ohemr-epic-private-registry-alert-processing-rule (Terraform)
Action groups route to Event Hub diagnostic-logs on namespace lp-cl-centralus-eventhub-6a9ba7a4

Documentation:

Metric Alert Configuration — thresholds and alert rule details
Metric Alert Code Explanations — alert logic documentation
EoA Monitoring Coverage Matrix — what is monitored and by whom

Alert runbooks:

Interlink (Event Aggregation & Alert Routing)

Interlink is the central event aggregator and alerting tool. All monitoring tools (Dynatrace, Azure Monitor, Splunk) route their alerts through Interlink, which then dispatches notifications via ServiceNow Notify.

Access: interlink.optum.com (Production) | interlink-test.optum.com (Test)

Alert flow:

1. A monitoring tool detects an issue and fires an alert.
2. The alert arrives in Interlink as an event.
3. Interlink applies correlation rules and deduplication.
4. Interlink dispatches a notification via ServiceNow Notify.
5. The on-call engineer receives the alert on their registered device.
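
The deduplication step can be pictured as dropping repeat events for the same source, resource, and check within a time window. The sketch below illustrates that general technique only. Interlink's actual correlation rules are not described on this page, and the event fields, the five-minute window, and the `deduplicate` helper are all hypothetical.

```python
from datetime import datetime, timedelta


def deduplicate(events, window=timedelta(minutes=5)):
    """Keep an event only if no event with the same key was kept within window."""
    last_kept = {}
    kept = []
    for event in sorted(events, key=lambda e: e["time"]):
        key = (event["source"], event["resource"], event["check"])
        previous = last_kept.get(key)
        if previous is None or event["time"] - previous >= window:
            kept.append(event)
            last_kept[key] = event["time"]
    return kept


t0 = datetime(2026, 7, 4, 2, 0)
sample = [
    {"source": "Dynatrace", "resource": "vm-a", "check": "cpu", "time": t0},
    {"source": "Dynatrace", "resource": "vm-a", "check": "cpu", "time": t0 + timedelta(minutes=2)},
    {"source": "Dynatrace", "resource": "vm-a", "check": "cpu", "time": t0 + timedelta(minutes=6)},
    {"source": "Dynatrace", "resource": "vm-b", "check": "cpu", "time": t0 + timedelta(minutes=1)},
]
print(len(deduplicate(sample)))  # 3
```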

Documentation:

Interlink Maintenance Windows — creating and managing maintenance suppression records via API and UI

ServiceNow Notify (Alert Delivery)

ServiceNow Notify delivers alerts to on-call engineers' registered devices (phone, SMS, email) based on their ServiceNow profile configuration.

Incident routing:

Severity	Routing	Target
P1, P2 (Critical/High)	Command Center / TCC	Immediate page to on-call + incident bridge
P3, P4 (Warning/Info)	Team / ServiceNow ticket	Team notification or auto-ticket creation
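
The routing table reduces to a lookup from priority to target. A minimal sketch follows, in which the `route` helper and the decision to reject unknown priorities are illustrative choices.

```python
ROUTING = {
    "P1": "Command Center / TCC",
    "P2": "Command Center / TCC",
    "P3": "Team / ServiceNow ticket",
    "P4": "Team / ServiceNow ticket",
}


def route(priority: str) -> str:
    """Return the routing target for an incident priority."""
    try:
        return ROUTING[priority.strip().upper()]
    except KeyError:
        # Fail loudly rather than silently misrouting an unknown priority.
        raise ValueError(f"unknown priority: {priority!r}") from None


print(route("p2"))
```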

Documentation:

OneView (Application Record)

OneView is the central catalog for finding all tooling, data sources, and metadata associated with Epic on Azure.

Epic on Azure record: AIDE_0085665

OneView provides:

Application metadata and ownership
Linked infrastructure and services
Monitoring tool references
Compliance and security posture

Epic System Pulse

Epic System Pulse is the native Epic monitoring tool for application-level health and performance.

Access: systempulse.uhc.com

Integration: Events flow into Dynatrace for correlation with infrastructure metrics. Manual review and classification by the Epic technical team is currently required.

Selector.AI (POC)

Selector.AI is an AIOps platform currently under evaluation as a proof-of-concept for intelligent alert correlation, root cause analysis, and automated insights across the monitoring stack.

Access: optum.selector.ai

Status: Active POC — not yet integrated into production alert routing. Contact the platform team for access and current scope.

Monitoring by Resource Type

Resource	Primary Monitor	Secondary Monitor	Log Destination	Alert Routing
Azure Services (Storage, ExpressRoute)	Azure Monitor	Dynatrace OneAgent	Splunk via Event Hub	Interlink → SNOW/TCC
Windows VMs	Azure Monitor + Guest OS Logs	Dynatrace OneAgent	Splunk via Event Hub	Interlink → SNOW/TCC
Linux VMs	Azure Monitor (AMA + DCR)	Dynatrace OneAgent	Splunk via Fluent Bit	Interlink → SNOW/TCC
NetApp Volumes	Azure Monitor	—	Splunk via Event Hub	Interlink → SNOW/TCC
Firewalls / Appliances	Appliance Syslog (Palo Alto)	—	Splunk via Event Hub	Interlink → SNOW/TCC
Citrix	UberAgent Dashboards	—	Splunk via Kafka	Interlink → SNOW/TCC
Epic Application	Epic System Pulse	Dynatrace APM	—	Manual review (future: automated)

For the full coverage matrix with team ownership and operational details, see EoA Monitoring Standards.

Performance Targets

Metric	Target
Epic Hyperspace response time	< 2 seconds
Database query average	< 100ms
API endpoint response	< 500ms
File system operations	< 50ms
Production availability	99.95% uptime
Critical services availability	99.99% uptime
Planned maintenance	< 4 hours/month

Capacity thresholds: Warning at 75%, Critical at 85%, Emergency at 95%
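
An availability target is easier to reason about as a downtime budget. The sketch below converts the percentages above into minutes per month, assuming a 30-day month; the `downtime_budget_minutes` helper is illustrative.

```python
def downtime_budget_minutes(availability_percent: float, days: int = 30) -> float:
    """Minutes of downtime allowed over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


print(round(downtime_budget_minutes(99.95), 1))  # 21.6
print(round(downtime_budget_minutes(99.99), 2))  # 4.32
```

The 99.95% target allows about 21.6 minutes per month, far less than the 4-hour planned maintenance allowance. The availability figures therefore presumably exclude planned maintenance, although the table does not say so.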

Monitoring Contacts

Domain	Contact
Azure Monitoring	Clint / Indhu
Dynatrace	Paris
Splunk Logging	Clint / Indhu / Paris
Appliance Monitoring	Dwayne B Jones
Citrix	Jason
SQL Monitoring	Laura / Clint / John Brownlee
Epic System Pulse	Matt / Jordan

Key Repositories

Repository	Purpose	IaC Tool
ohemr-dynatrace-config	Dynatrace configuration-as-code (management zones, auto-tags, alerting profiles)	Terraform
ohemr-ansible-role-dynatrace	OneAgent and ActiveGate deployment	Ansible
ohemr-epic-private-registry-alert-processing-rule	Azure Monitor alert rules and processing	Terraform
ohemr-epic-megadoc	This documentation (monitoring section)	MkDocs
