Support | Updated July 3, 2026

Incident Management

1. Introduction

1.1 Purpose

This document defines a structured approach to identifying, documenting, and resolving incidents promptly while maintaining clear communication with all stakeholders.

1.2 Scope

Applies to both Production and Non-Production (testing, development, etc.) environments.
Training environments are treated with the same standards and processes as Production.
Covers all teams involved in operational and technical incident resolution for the Epic implementation on Azure.
Encompasses detection, triage, escalation, communication, resolution, and post-incident review.

1.3 Key Principles

Rapid Response: Swiftly address incidents to minimize impact on patient care and business operations.
Consistent Communication: Provide clear, timely updates to stakeholders.
Continuous Improvement: Leverage lessons from each incident to drive process enhancements.

2. Roles & Responsibilities

The table below lists each team and its ServiceNow assignment group.

Team	Assignment Group
Business Operations (Epic App DBA)	Epic - Azure (National West)
Citrix	USS_Virtual_Workspace
Azure Platform Ops (Prod)	Epic_Azure_Infrastructure_Ops (Prod)
Azure Platform Ops (Non-Prod)	Epic_Azure_Infrastructure_Ops_NonProd
Network NSIS (Topology & Connectivity)	NSIS FIREWALL ANALYST
Core DNS (Infoblox)	ISO - IPAM
Network Security (Palo Alto & FW)	ISO_Cyber_Defense_Support_CDS
Active Directory	Directory Services Infrastructure (DSI)
Cloudflare	Load Balancer Web Application Firewall
Terraform Enterprise & Hashi Vault	E2M TEK
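
For scripted routing, the table above can be held as a simple lookup. The following is a minimal sketch, not an existing tool: the names `ASSIGNMENT_GROUPS` and `assignment_group_for` are hypothetical, and the firewall team is keyed as "Network Security", the name used in the flow diagram and in Section 2.1.

```python
from typing import Dict

# Team -> ServiceNow assignment group, taken from the table in Section 2.
# The dictionary and function names are illustrative assumptions.
ASSIGNMENT_GROUPS: Dict[str, str] = {
    "Business Operations (Epic App DBA)": "Epic - Azure (National West)",
    "Citrix": "USS_Virtual_Workspace",
    "Azure Platform Ops (Prod)": "Epic_Azure_Infrastructure_Ops (Prod)",
    "Azure Platform Ops (Non-Prod)": "Epic_Azure_Infrastructure_Ops_NonProd",
    "Network NSIS (Topology & Connectivity)": "NSIS FIREWALL ANALYST",
    "Core DNS (Infoblox)": "ISO - IPAM",
    "Network Security (Palo Alto & FW)": "ISO_Cyber_Defense_Support_CDS",
    "Active Directory": "Directory Services Infrastructure (DSI)",
    "Cloudflare": "Load Balancer Web Application Firewall",
    "Terraform Enterprise & Hashi Vault": "E2M TEK",
}


def assignment_group_for(team: str) -> str:
    """Return the assignment group for a team, failing loudly on unknown names."""
    try:
        return ASSIGNMENT_GROUPS[team]
    except KeyError:
        raise KeyError(f"No assignment group recorded for team {team!r}") from None
```

Raising on an unknown team, rather than returning a default, keeps a misspelled team name from silently routing a ticket to the wrong group.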

Visual: Team Escalation/Support Flow

flowchart TD
    BO["Business Operations (Epic App DBA)"]
    CIT["Citrix"]
    subgraph AZG ["Azure Platform Ops"]
      APROD["Prod"]
      ANONPROD["Non-Prod"]
    end
    NSIS["Network NSIS (Topology & Connectivity)"]
    DNS["Core DNS (Infoblox)"]
    SEC["Network Security (Palo Alto & FW)"]
    AD["Active Directory"]
    CF["Cloudflare"]
    TEK["Terraform Enterprise & Hashi Vault"]

    BO --> CIT
    BO --> AZG

    AZG --> NSIS
    AZG --> DNS
    AZG --> SEC
    AZG --> AD
    AZG --> CF
    AZG --> TEK
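
The same flow can be expressed as an adjacency map for tooling that needs to know which teams sit downstream of another. This is a sketch under the assumption that each arrow means "engages or escalates to"; the names `ESCALATION_FLOW` and `downstream_teams` are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Edges copied from the flowchart above. "Azure Platform Ops" stands for the
# subgraph that contains both the Prod and Non-Prod teams.
ESCALATION_FLOW: Dict[str, List[str]] = {
    "Business Operations (Epic App DBA)": ["Citrix", "Azure Platform Ops"],
    "Azure Platform Ops": [
        "Network NSIS (Topology & Connectivity)",
        "Core DNS (Infoblox)",
        "Network Security (Palo Alto & FW)",
        "Active Directory",
        "Cloudflare",
        "Terraform Enterprise & Hashi Vault",
    ],
}


def downstream_teams(team: str) -> List[str]:
    """Return every team reachable from `team`, in breadth-first order."""
    seen: List[str] = []
    queue = deque(ESCALATION_FLOW.get(team, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guard against revisiting a team if edges are added later
        seen.append(current)
        queue.extend(ESCALATION_FLOW.get(current, []))
    return seen
```

For example, `downstream_teams("Citrix")` returns an empty list, since the diagram shows no outgoing arrows from Citrix.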

2.1 Team Responsibilities

Business Operations (Epic - Azure National West)

Oversee the Epic application and ensure it meets operational and patient-care needs.
Perform Epic database administration (DBA), performance monitoring, and routine maintenance.
Coordinate with other teams for Epic upgrades or critical infrastructure changes.
Work with Azure Platform Ops and Citrix to ensure infrastructure changes follow Terraform/IaC principles.

Citrix (CITRIX IMS (UHT) - OSW)

Manage remote application and desktop virtualization services for secure Epic access.
Maintain Citrix server farms, load balancing, and user access policies.
Collaborate with Azure Platform Ops for infrastructure provisioning and updates via IaC.

Azure Platform Ops (Prod) (Epic_Azure_Infrastructure_Ops (Prod))

Manage the production Azure environment hosting Epic (including training, which is treated as production).
Monitor workloads, manage capacity, and ensure regulatory/performance compliance.
Implement changes per Terraform and IaC best practices.

Azure Platform Ops (Non-Prod) (Epic_Azure_Infrastructure_Ops_NonProd)

Maintain non-production Azure environments (e.g., development, testing).
Provision, patch, and decommission resources using IaC standards.

Network NSIS (Topology & Connectivity) (NSIS FIREWALL ANALYST)

Ensure secure, efficient network communication within Azure environments.
Troubleshoot network issues impacting Epic performance.
Collaborate with Azure Platform Ops for infrastructure changes via Terraform.

Core DNS (Infoblox) (ISO - IPAM)

Maintain domain name resolution services for Epic and manage DNS records.
Coordinate DNS updates for new/decommissioned systems.
Work with Azure Platform Ops to push DNS-related changes through IaC workflows.

Network Security (Palo Alto & Firewalls) (ISO - CYBER DEFENSE SUPPORT)

Manage advanced firewall/security configurations (IPS/IDS, threat prevention, logging, rules).
Enforce network segmentation policies.
Implement firewall changes with Azure Platform Ops using Terraform/IaC.

Active Directory (Directory Services Infrastructure (DSI))

Administer domain controllers, group policies, and authentication for Epic.
Coordinate account provisioning, deprovisioning, and security policy changes with Azure Platform Ops.

Cloudflare (Load Balancer Web Application Firewall)

Provide external DNS management, Content Delivery Network (CDN), and DDoS protection for public Epic services.
Monitor edge performance/availability.
Implement changes through IaC tools with Azure Platform Ops.

Terraform Enterprise & Hashi Vault (E2M TEK)

Maintain and support the Terraform Enterprise and HashiCorp Vault platforms.
Address incidents or outages related to the availability or performance of these tools.
Collaborate with Infra Ops and other teams if platform issues impact environment provisioning or secret management.
Responsibility for IaC templates, modules, and secrets remains with the Infra Ops teams.

3. Incident Detection & Triage

3.1 Corporate Priority Grid (ServiceNow SLA Group)

Incident priorities are assigned in ServiceNow based on the corporate SLA group. The table below describes each priority, with response and resolution goals:

Priority	Definition	Response Goal	Restoration/Fulfillment Goal
1	Major outages (business-critical app down). TCC & SMEs assess VBFs.	15 min	1 hr
2	Outages/service degradations. TCC & SMEs assess impact.	15 min	4 hrs
3	Single-user “hard down” or multi-user (not P1/P2).	4 business hrs	1 business day
4	Single-user, workaround exists.	1 business day	2 business days
5	Low-impact (password resets, RFIs, etc.).	5 business days	5 business days
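
The grid above mixes clock time (P1 and P2) with business time (P3 to P5), which matters when computing deadlines. The sketch below encodes the grid and computes deadlines only for clock-time goals; it returns `None` for business-time goals because this document does not define a business calendar. All class and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class Goal:
    amount: int
    unit: str  # "minutes", "hours", "business_hours", or "business_days"


@dataclass(frozen=True)
class PriorityGoals:
    response: Goal
    restoration: Goal


# Values copied from the Corporate Priority Grid in Section 3.1.
PRIORITY_GRID: Dict[int, PriorityGoals] = {
    1: PriorityGoals(Goal(15, "minutes"), Goal(1, "hours")),
    2: PriorityGoals(Goal(15, "minutes"), Goal(4, "hours")),
    3: PriorityGoals(Goal(4, "business_hours"), Goal(1, "business_days")),
    4: PriorityGoals(Goal(1, "business_days"), Goal(2, "business_days")),
    5: PriorityGoals(Goal(5, "business_days"), Goal(5, "business_days")),
}

# Only these units can be added directly to a timestamp.
_CLOCK_UNITS = {"minutes": timedelta(minutes=1), "hours": timedelta(hours=1)}


def clock_deadline(
    priority: int, opened_at: datetime, which: str = "restoration"
) -> Optional[datetime]:
    """Return the deadline for a clock-time goal, or None for business-time goals."""
    goal: Goal = getattr(PRIORITY_GRID[priority], which)  # "response" or "restoration"
    step = _CLOCK_UNITS.get(goal.unit)
    if step is None:
        return None  # needs a business calendar, which is outside this sketch
    return opened_at + goal.amount * step
```

A P1 incident opened at 09:00 therefore has a response deadline of 09:15 and a restoration deadline of 10:00 on the same day.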

Monitoring & Alerts

Automated monitoring tools (e.g., logging platforms, system alerts) should be tuned for timely detection.
All incident alerts route to the respective assignment and/or paging groups.

Triage

Validate incident severity and assign the appropriate priority (1–5).
Assign incidents to the appropriate team based on environment, service ownership, and business impact.
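
One triage rule that is easy to get wrong is the environment split for Azure Platform Ops, because training environments are handled as Production. The following sketch illustrates that rule only; the environment labels it accepts and the function names are assumptions, not values defined by ServiceNow or this document.

```python
PROD_GROUP = "Epic_Azure_Infrastructure_Ops (Prod)"
NONPROD_GROUP = "Epic_Azure_Infrastructure_Ops_NonProd"

# Hypothetical environment labels; training is deliberately listed with production.
_PROD_LIKE = {"production", "prod", "training"}
_NON_PROD = {"non-production", "non-prod", "development", "testing"}


def validate_priority(priority: int) -> int:
    """Ensure the priority is one of the five levels in the corporate grid."""
    if priority not in (1, 2, 3, 4, 5):
        raise ValueError(f"Priority must be 1 to 5, got {priority!r}")
    return priority


def azure_ops_group(environment: str) -> str:
    """Pick the Azure Platform Ops assignment group for an environment."""
    env = environment.strip().lower()
    if env in _PROD_LIKE:
        return PROD_GROUP
    if env in _NON_PROD:
        return NONPROD_GROUP
    raise ValueError(f"Unrecognized environment {environment!r}")
```

Rejecting unknown environment labels, instead of defaulting to Non-Prod, avoids under-prioritizing an incident whose environment was mislabeled.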

4. Escalation & Communication Plan

4.1 Escalation Triggers

If an incident cannot be resolved within the restoration goal for its priority (see Section 3.1) or requires specialized expertise, escalate to the next tier (DevOps, Infrastructure, Security, etc.).
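
The trigger above reduces to a two-condition check. This sketch assumes the relevant timeframe is the restoration goal from the priority grid, which the caller passes in; the function name is hypothetical.

```python
from datetime import timedelta


def should_escalate(
    elapsed: timedelta,
    restoration_goal: timedelta,
    needs_specialist: bool = False,
) -> bool:
    """Escalate when the goal is exhausted or specialized expertise is required."""
    return needs_specialist or elapsed >= restoration_goal
```

For a P1 incident with a 1-hour restoration goal, `should_escalate(timedelta(minutes=70), timedelta(hours=1))` evaluates to `True`.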

4.2 Communication Channels

Primary: Email distribution lists, instant messaging channels, or ticketing system notifications (e.g., ServiceNow).
Executive Updates: High-priority incidents warrant direct communication to executive and business leadership. These communications are typically coordinated and delivered by the Business Operations (Epic - Azure National West) team.

4.3 Timeframes

Acknowledge incidents within the response goal for their assigned priority (see Section 3.1).
Update stakeholders at defined intervals (e.g., every 30 minutes for Priority 1 incidents).
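
As an illustration of the update cadence, the sketch below lists when stakeholder updates fall due while an incident is open. Only the 30-minute Priority 1 interval comes from this document; intervals for other priorities are not stated here, so the function refuses to guess them. The names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Only the Priority 1 interval is given in Section 4.3.
UPDATE_INTERVALS: Dict[int, timedelta] = {1: timedelta(minutes=30)}


def update_times(priority: int, opened_at: datetime, resolved_at: datetime) -> List[datetime]:
    """Return the times at which stakeholder updates are due before resolution."""
    interval = UPDATE_INTERVALS.get(priority)
    if interval is None:
        raise ValueError(f"No update interval defined for priority {priority}")
    due_times: List[datetime] = []
    due = opened_at + interval
    while due < resolved_at:
        due_times.append(due)
        due += interval
    return due_times
```

A P1 incident opened at 09:00 and resolved at 10:45 yields updates due at 09:30, 10:00, and 10:30; the resolution notice itself is sent separately.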

5. Resolution & Post-Incident Review

Resolution

The assigned team mitigates or resolves the issue.
Confirm all systems are stable and notify impacted users/stakeholders of the resolution.

Post-Incident Review

Conduct a retrospective for critical (P1/P2) incidents.
Document root cause, lessons learned, and action items to prevent recurrence.
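
The review requirements above can be captured in a small record so that nothing is left out. This is a sketch only: the field names and the notion of "complete" (all three items filled in) are assumptions, not a ServiceNow schema.

```python
from dataclasses import dataclass, field
from typing import List


def requires_retrospective(priority: int) -> bool:
    """Retrospectives are required for critical (P1/P2) incidents."""
    return priority in (1, 2)


@dataclass
class PostIncidentReview:
    incident_id: str  # hypothetical identifier format
    root_cause: str = ""
    lessons_learned: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True once root cause, lessons learned, and action items are all recorded."""
        return bool(self.root_cause.strip() and self.lessons_learned and self.action_items)
```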

6. Additional Resources

Refer to the full Corporate Incident Management Policy for detailed escalation procedures.
For ServiceNow access instructions, visit: https://uhgazure.sharepoint.com/sites/Optum/SitePages/Secure-Access-for-ServiceNow.aspx
For additional knowledge items and resources related to Epic on Azure, visit the Epic on Azure Knowledge Portal.
Contact the respective assignment/paging group for guidance on team ownership or IaC processes.