Updated July 3, 2026

Monitoring & Observability

Tags: monitoring, observability, dynatrace, splunk, interlink, azure-monitor, servicenow, alerting, dashboards, oneview

This is the central landing page for all monitoring tooling used in Epic on Azure. Use the links below to jump directly to the tool or dashboard you need.

OneView Record: Epic on Azure (AIDE_0085665) — central catalog of all tooling, data sources, and application metadata.


Access Requirements

Before you can view dashboards or receive alerts, request membership in these Microsoft Entra ID (formerly Azure AD) groups:

| Group | Purpose | Required For |
| --- | --- | --- |
| Monitoring_ReadOnly | Read access to Dynatrace dashboards and monitoring data | All team members |
| dtcloud_AIDE_0085665_config | Dynatrace configuration access (management zones, alerting profiles) | Platform engineers, on-call |
| cloud_splunk_east_epic_azure_nw_power | Splunk read/write for VM and network logs | All team members |
| sec_splunk_epic_azure_nw_power | Splunk security logs (firewall) | Security, network engineers |

Request these groups through Secure (MyIT). For full onboarding steps, see Phase 2: Tools Setup.


Monitoring Architecture

All monitoring data flows through a layered architecture: agents collect data, tools analyze it, alerts route through Interlink, and notifications reach on-call engineers via ServiceNow Notify.

graph TD
    subgraph "Data Collection"
        W_OA[Windows — OneAgent]
        W_LOG[Windows — Event Logs]
        L_OA[Linux — OneAgent]
        L_FB[Linux — Fluent Bit]
        FW[Firewalls / Appliances — Syslog]
        AZ[Azure Platform Metrics]
    end

    subgraph "Analysis & Visualization"
        DT[(Dynatrace<br/>APM + Infrastructure)]
        SP[(Splunk<br/>Log Aggregation)]
        AZMON[Azure Monitor<br/>Metric Alerts]
        ESP[Epic System Pulse]
    end

    subgraph "Alert Routing"
        IL[Interlink<br/>Event Aggregator]
    end

    subgraph "Notification"
        SNOW[ServiceNow Notify<br/>Alerts to Devices]
        TCC[Command Center / TCC<br/>P1 & P2 Escalation]
    end

    W_OA -->|OneAgent| DT
    L_OA -->|OneAgent| DT
    W_LOG -->|Azure Log Aggregator| SP
    L_FB -->|Azure Event Hub| SP
    FW -->|Syslog → Event Hub| SP
    AZ --> AZMON
    AZ --> DT
    ESP -->|Events| DT

    DT -->|Problem Notifications| IL
    AZMON -->|Alert Rules| IL
    SP -->|Saved Searches / Alerts| IL

    IL --> SNOW
    IL --> TCC

Key data flows:

  • Host metrics and APM → Dynatrace OneAgent → Dynatrace SaaS → Interlink
  • Logs (all hosts) → Fluent Bit / Event Logs → Azure Event Hub → Splunk → Interlink
  • Azure platform metrics → Azure Monitor → Alert Rules → Interlink
  • All alerts → Interlink → ServiceNow Notify → on-call devices
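
The flows above can be traced programmatically. The following is a minimal Python sketch that transcribes the diagram's edges into a dictionary and lists every path from a data source to a notification target. The shortened node names are illustrative labels, not identifiers used by any of the tools.

```python
from collections import deque

# Edges transcribed from the architecture diagram; names are shortened labels.
FLOWS = {
    "Windows OneAgent": ["Dynatrace"],
    "Linux OneAgent": ["Dynatrace"],
    "Windows Event Logs": ["Splunk"],
    "Linux Fluent Bit": ["Splunk"],
    "Firewall Syslog": ["Splunk"],
    "Azure Platform Metrics": ["Azure Monitor", "Dynatrace"],
    "Epic System Pulse": ["Dynatrace"],
    "Dynatrace": ["Interlink"],
    "Azure Monitor": ["Interlink"],
    "Splunk": ["Interlink"],
    "Interlink": ["ServiceNow Notify", "Command Center / TCC"],
}


def paths_from(source: str) -> list[list[str]]:
    """Return every path from source to a node with no outgoing edges."""
    complete = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        next_hops = FLOWS.get(path[-1], [])
        if not next_hops:
            complete.append(path)
        for hop in next_hops:
            queue.append(path + [hop])
    return complete


for path in paths_from("Azure Platform Metrics"):
    print(" -> ".join(path))
```

Azure platform metrics reach Interlink through both Azure Monitor and Dynatrace, so the sketch reports two routes to each notification target for that source.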

Dashboards & Tools Quick Access

Dynatrace (APM & Infrastructure)

Dynatrace OneAgent is deployed on all hosts and automatically injects into supported application components (IIS, .NET, Java, etc.). It provides full-stack observability from infrastructure to application transactions.

| Dashboard | Description | Link |
| --- | --- | --- |
| Infrastructure Insights | What needs attention: problems, resource issues | Open Dashboard |
| Infrastructure Health Overview | Epic National West health status | Open Dashboard |
| ODB Health Dashboard | Epic on Azure ODB performance and health | Open Dashboard |
| Interconnect Foreground | Interconnect Foreground monitoring | Open Dashboard |
| Interconnect Background | Interconnect Background monitoring | Open Dashboard |
| BCA | BCA monitoring | Open Dashboard |
| Epic Care Link | Epic Care Link monitoring | Open Dashboard |
| Care Everywhere | Care Everywhere monitoring | Open Dashboard |
| Welcome Web | Welcome Web monitoring | Open Dashboard |
| Epic Print Service | Epic Print Service monitoring | Open Dashboard |
| Welcome Client | Welcome Client monitoring | Open Dashboard |
| System Pulse | System Pulse monitoring | Open Dashboard |
| ODB | ODB monitoring | Open Dashboard |

Dynatrace Tenants:

| Environment | Tenant ID | URL |
| --- | --- | --- |
| Production | skx14060 | skx14060.apps.dynatrace.com |
| Non-Production | dfr17824 | dfr17824.apps.dynatrace.com |

Filter tags: Askid:AIDE_0085665, [Azure]aide-id: AIDE_0085665

Host groups: AIDE_0085665.{environment}.azu (e.g., AIDE_0085665.prod.azu)

Network zones: AIDE_0085665.{environment}.azu
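
As a small illustration of the naming pattern above, this Python sketch builds the host group or network zone name for an environment. Only `prod` is confirmed by this page; the validation rule (lowercase letters, digits, hyphens) is an assumption, as is the function name.

```python
import re

APP_ID = "AIDE_0085665"  # OneView record ID from this page


def host_group(environment: str) -> str:
    """Build a name following the AIDE_0085665.{environment}.azu pattern."""
    env = environment.strip().lower()
    # ASSUMPTION: environment names are short lowercase tokens such as "prod".
    if not re.fullmatch(r"[a-z0-9-]+", env):
        raise ValueError(f"unexpected environment name: {environment!r}")
    return f"{APP_ID}.{env}.azu"


print(host_group("prod"))
```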

OneAgent deployment:

  • Linux install path: /monitor/oneagent
  • Windows install path: C:\monitor\oneagent
  • Monitoring mode: Full stack (application + infrastructure)
  • Deployed via Ansible: see ohemr-ansible-role-dynatrace
  • Configuration-as-Code: see ohemr-dynatrace-config (Terraform)


Splunk (Log Aggregation & Analysis)

Splunk is the central log aggregation platform. All hosts, applications, and network appliances forward logs to Splunk via Azure Event Hub and Fluent Bit.

| Dashboard | Description | Link |
| --- | --- | --- |
| Azure Metric Alert Dashboard | Tracked resource groups, Severity 0 (Critical) alerts | Open Dashboard |
| Azure Patching Schedule | Upcoming and recent patching maintenance windows | Open Dashboard |

Splunk Instances:

| Instance | Purpose | URL |
| --- | --- | --- |
| Cloud Splunk East | VM, network, and infrastructure logs | est-sh.prod.cloud-splunk-optum.com |
| Security Splunk | Firewall and security event logs | sec-splunk.optum.com |

Key indexes:

| Index | Content |
| --- | --- |
| cloud_epic_azure_nw | VM and network infrastructure logs |
| sec_n_paloalto_panos | Palo Alto firewall logs |

Quick test search: index=cloud_epic_azure_nw | head 10
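
The same test search could in principle be submitted through Splunk's REST export endpoint (`/services/search/jobs/export`). The Python sketch below only builds the request and does not send it. It assumes the search head exposes the management API on Splunk's default port 8089 and accepts token authentication; neither is confirmed by this page, and the token value is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPLUNK_HOST = "est-sh.prod.cloud-splunk-optum.com"  # from the instances table
MGMT_PORT = 8089  # ASSUMPTION: default Splunk management port; may not be reachable


def build_export_request(spl: str, token: str) -> Request:
    """Build (but do not send) a POST request for the search export endpoint."""
    # The endpoint expects a query that starts with a generating command,
    # so prefix bare "index=..." queries with "search".
    query = spl if spl.lstrip().startswith(("search ", "|")) else f"search {spl}"
    body = urlencode({"search": query, "output_mode": "json"}).encode()
    return Request(
        f"https://{SPLUNK_HOST}:{MGMT_PORT}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


req = build_export_request("index=cloud_epic_azure_nw | head 10", token="<your-token>")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Passing the request to `urllib.request.urlopen` would execute the search; check with the Splunk contacts first whether API access is permitted for your group.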

Log retention: 90 days operational, 7 years audit


Azure Monitor (Infrastructure Metric Alerts)

Azure Monitor provides native metric alerting for all Azure resources. Alert rules are deployed via Terraform and route through Interlink.

Critical alert thresholds (Severity 0):

| Metric | Threshold | Resource Type |
| --- | --- | --- |
| CPU | >= 95% | Virtual Machines |
| Available Memory | <= 2% | Virtual Machines |
| Data Disk IOPS | >= 98% | Virtual Machines |
| OS Disk IOPS | >= 98% | Virtual Machines |
| VM Availability | < 1 | Virtual Machines |
| Disk Free Space | < 10% | Windows VMs |

Warning thresholds (Severity 2): CPU >= 90%, Available Memory <= 15%, Disk IOPS >= 95%, Disk Free Space < 15%
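
To make the two threshold tiers concrete, here is a minimal Python sketch that maps a metric reading to a severity. The metric keys are hypothetical names chosen for this example, not Azure Monitor metric identifiers, and VM Availability is left out because it has no warning tier listed.

```python
import operator
from typing import Optional

# metric key -> (comparison that signals a breach, critical threshold, warning threshold)
# Values are percentages taken from the thresholds listed above.
RULES = {
    "cpu_percent": (operator.ge, 95.0, 90.0),
    "available_memory_percent": (operator.le, 2.0, 15.0),
    "disk_iops_percent": (operator.ge, 98.0, 95.0),
    "disk_free_percent": (operator.lt, 10.0, 15.0),
}


def severity(metric: str, value: float) -> Optional[int]:
    """Return 0 (critical), 2 (warning), or None when no threshold is breached."""
    breached, critical, warning = RULES[metric]
    if breached(value, critical):
        return 0
    if breached(value, warning):
        return 2
    return None


print(severity("cpu_percent", 92.0))
print(severity("disk_free_percent", 8.5))
```

The comparison operator is stored per metric because the direction differs: CPU and IOPS alert when the value is high, while available memory and free disk space alert when it is low.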

Activity log alerts: Service Health, SQL Firewall changes, NSG changes

Alert processing rules (suppression):

  • Maintenance windows: Saturday-Sunday 2:00-4:00 AM CST
  • Cloud test resources: daily suppression
  • Excluded resource groups: PCC Agentless Scan, PublicCloudManaged-ComputeScan, DIG Security, LP Central Logging
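
The suppression rules above can be sketched as a single check in Python. This example treats CST as a fixed UTC-6 offset with no daylight-saving adjustment, which is an assumption, and it uses the excluded resource group names exactly as written on this page, which may differ from the real Azure resource group names. The daily suppression for cloud test resources is omitted because its schedule is not given here.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")  # ASSUMPTION: fixed UTC-6, no DST

# Names copied from this page; the actual resource group names may differ.
EXCLUDED_RESOURCE_GROUPS = {
    "PCC Agentless Scan",
    "PublicCloudManaged-ComputeScan",
    "DIG Security",
    "LP Central Logging",
}


def is_suppressed(fired_at: datetime, resource_group: str) -> bool:
    """Return True if an alert falls under one of the suppression rules.

    fired_at must be timezone-aware; it is converted to CST before checking.
    """
    if resource_group in EXCLUDED_RESOURCE_GROUPS:
        return True
    local = fired_at.astimezone(CST)
    is_weekend = local.weekday() in (5, 6)  # Saturday, Sunday
    return is_weekend and 2 <= local.hour < 4


# 09:00 UTC on Saturday, July 4, 2026 is 03:00 CST, inside the window.
print(is_suppressed(datetime(2026, 7, 4, 9, 0, tzinfo=timezone.utc), "example-rg"))
```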

Infrastructure-as-Code: see ohemr-epic-private-registry-alert-processing-rule (Terraform)


Interlink (Event Aggregation & Alert Routing)

Interlink is the central event aggregator and alerting tool. All monitoring tools (Dynatrace, Azure Monitor, Splunk) route their alerts through Interlink, which then dispatches notifications via ServiceNow Notify.

Access: interlink.optum.com (Production) | interlink-test.optum.com (Test)

Alert flow:

  1. Monitoring tool detects issue and fires alert
  2. Alert arrives in Interlink as an event
  3. Interlink applies correlation rules and deduplication
  4. Interlink dispatches notification via ServiceNow Notify
  5. On-call engineer receives alert on their registered device
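
Step 3 of the flow above (deduplication) can be illustrated with a small Python sketch. Interlink's actual correlation rules are not documented on this page, so the event fields, the `(resource, check)` key, and the five-minute window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical: "dynatrace", "azure-monitor", "splunk"
    resource: str     # affected host or Azure resource
    check: str        # what fired, e.g. "cpu"
    timestamp: float  # seconds since epoch


def deduplicate(events, window_seconds=300.0):
    """Keep the first event per (resource, check); drop repeats within the window.

    The window slides: every repeat resets the timer, so a continuously
    firing alert stays suppressed until it has been quiet for window_seconds.
    """
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.resource, event.check)
        previous = last_seen.get(key)
        if previous is None or event.timestamp - previous > window_seconds:
            kept.append(event)
        last_seen[key] = event.timestamp
    return kept


incoming = [
    Event("dynatrace", "vm1", "cpu", 0.0),
    Event("azure-monitor", "vm1", "cpu", 60.0),   # same issue from a second tool
    Event("dynatrace", "vm2", "cpu", 30.0),
    Event("dynatrace", "vm1", "cpu", 1000.0),     # recurrence after the window
]
for event in deduplicate(incoming):
    print(event.resource, event.check, event.timestamp)
```

Keying on resource and check rather than on source means the same CPU problem reported by both Dynatrace and Azure Monitor produces one notification instead of two.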


ServiceNow Notify (Alert Delivery)

ServiceNow Notify delivers alerts to on-call engineers' registered devices (phone, SMS, email) based on their ServiceNow profile configuration.

Incident routing:

| Severity | Routing | Target |
| --- | --- | --- |
| P1, P2 (Critical/High) | Command Center / TCC | Immediate page to on-call + incident bridge |
| P3, P4 (Warning/Info) | Team / ServiceNow ticket | Team notification or auto-ticket creation |
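
The routing table above amounts to a simple lookup. A minimal Python sketch, with a hypothetical function name and plain strings standing in for whatever identifiers ServiceNow actually uses:

```python
# Priority -> routing destination, transcribed from the table above.
ROUTES = {
    "P1": "Command Center / TCC",
    "P2": "Command Center / TCC",
    "P3": "Team / ServiceNow ticket",
    "P4": "Team / ServiceNow ticket",
}


def route(priority: str) -> str:
    """Return the routing destination for a priority such as 'P1'."""
    key = priority.strip().upper()
    if key not in ROUTES:
        raise ValueError(f"unknown priority: {priority!r}")
    return ROUTES[key]


print(route("P2"))
```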


OneView (Application Record)

OneView is the central catalog for finding all tooling, data sources, and metadata associated with Epic on Azure.

Epic on Azure record: AIDE_0085665

OneView provides:

  • Application metadata and ownership
  • Linked infrastructure and services
  • Monitoring tool references
  • Compliance and security posture

Epic System Pulse

Epic System Pulse is the native Epic monitoring tool for application-level health and performance.

Access: systempulse.uhc.com

Integration: Events flow into Dynatrace for correlation with infrastructure metrics. Manual review and classification by the Epic technical team is currently required.


Selector.AI (POC)

Selector.AI is an AIOps platform currently under evaluation as a proof-of-concept for intelligent alert correlation, root cause analysis, and automated insights across the monitoring stack.

Access: optum.selector.ai

Status: Active POC — not yet integrated into production alert routing. Contact the platform team for access and current scope.


Monitoring by Resource Type

| Resource | Primary Monitor | Secondary Monitor | Log Destination | Alert Routing |
| --- | --- | --- | --- | --- |
| Azure Services (Storage, ExpressRoute) | Azure Monitor | Dynatrace OneAgent | Splunk via Event Hub | Interlink → SNOW/TCC |
| Windows VMs | Azure Monitor + Guest OS Logs | Dynatrace OneAgent | Splunk via Event Hub | Interlink → SNOW/TCC |
| Linux VMs | Azure Monitor (AMA + DCR) | Dynatrace OneAgent | Splunk via Fluent Bit | Interlink → SNOW/TCC |
| NetApp Volumes | Azure Monitor | | Splunk via Event Hub | Interlink → SNOW/TCC |
| Firewalls / Appliances | Appliance Syslog (Palo Alto) | | Splunk via Event Hub | Interlink → SNOW/TCC |
| Citrix | UberAgent Dashboards | | Splunk via Kafka | Interlink → SNOW/TCC |
| Epic Application | Epic System Pulse | Dynatrace APM | | Manual review (future: automated) |

For the full coverage matrix with team ownership and operational details, see EoA Monitoring Standards.


Performance Targets

| Metric | Target |
| --- | --- |
| Epic Hyperspace response time | < 2 seconds |
| Database query average | < 100ms |
| API endpoint response | < 500ms |
| File system operations | < 50ms |
| Production availability | 99.95% uptime |
| Critical services availability | 99.99% uptime |
| Planned maintenance | < 4 hours/month |
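
The availability targets are easier to reason about as a downtime budget. This short Python sketch converts an uptime percentage into allowed minutes of downtime; the 30-day period is an assumption, since the page does not say over what window availability is measured.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime, in minutes, allowed by an availability target."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_percent / 100)


for target in (99.95, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} minutes per 30 days")
```

For a 30-day month this gives about 21.6 minutes at 99.95% and about 4.3 minutes at 99.99%. Both are far below the planned maintenance allowance of up to 4 hours per month, which suggests the availability targets exclude planned maintenance, although the page does not state this.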

Capacity thresholds: Warning at 75%, Critical at 85%, Emergency at 95%


Monitoring Contacts

| Domain | Contact |
| --- | --- |
| Azure Monitoring | Clint / Indhu |
| Dynatrace | Paris |
| Splunk Logging | Clint / Indhu / Paris |
| Appliance Monitoring | Dwayne B Jones |
| Citrix | Jason |
| SQL Monitoring | Laura / Clint / John Brownlee |
| Epic System Pulse | Matt / Jordan |


Key Repositories

| Repository | Purpose | IaC Tool |
| --- | --- | --- |
| ohemr-dynatrace-config | Dynatrace configuration-as-code (management zones, auto-tags, alerting profiles) | Terraform |
| ohemr-ansible-role-dynatrace | OneAgent and ActiveGate deployment | Ansible |
| ohemr-epic-private-registry-alert-processing-rule | Azure Monitor alert rules and processing | Terraform |
| ohemr-epic-megadoc | This documentation (monitoring section) | MkDocs |