Support · Updated July 3, 2026
Incident Management
Tags: support, incident-management, servicenow, itsm, escalation, communication, operations, epic, azure, sla
1. Introduction
1.1 Purpose
This document defines a structured approach to identifying, documenting, and resolving incidents promptly while maintaining clear communication with all stakeholders.
1.2 Scope
- Applies to both Production and Non-Production (testing, development, etc.) environments.
- Training environments are treated with the same standards and processes as Production.
- Covers all teams involved in operational and technical incident resolution for the Epic implementation on Azure.
- Encompasses detection, triaging, escalation, communication, resolution, and post-incident review.
1.3 Key Principles
- Rapid Response: Swiftly address incidents to minimize impact on patient care and business operations.
- Consistent Communication: Provide clear, timely updates to stakeholders.
- Continuous Improvement: Leverage lessons from each incident to drive process enhancements.
2. Roles & Responsibilities
The table below maps each team to the assignment group that incidents for that team are assigned or escalated to.
| Team | Assignment Group |
|---|---|
| Business Operations (Epic App DBA) | Epic - Azure (National West) |
| Citrix | USS_Virtual_Workspace |
| Azure Platform Ops (Prod) | Epic_Azure_Infrastructure_Ops (Prod) |
| Azure Platform Ops (Non-Prod) | Epic_Azure_Infrastructure_Ops_NonProd |
| Network NSIS (Topology & Connectivity) | NSIS FIREWALL ANALYST |
| Core DNS (Infoblox) | ISO - IPAM |
| Network Security (Palo Alto & FW) | ISO_Cyber_Defense_Support_CDS |
| Active Directory | Directory Services Infrastructure (DSI) |
| Cloudflare | Load Balancer Web Application Firewall |
| Terraform Enterprise & Hashi Vault | E2M TEK |
Visual: Team Escalation/Support Flow
```mermaid
flowchart TD
    BO["Business Operations (Epic App DBA)"]
    CIT["Citrix"]
    subgraph AZG ["Azure Platform Ops"]
        APROD["Prod"]
        ANONPROD["Non-Prod"]
    end
    NSIS["Network NSIS (Topology & Connectivity)"]
    DNS["Core DNS (Infoblox)"]
    SEC["Network Security (Palo Alto & FW)"]
    AD["Active Directory"]
    CF["Cloudflare"]
    TEK["Terraform Enterprise & Hashi Vault"]
    BO --> CIT
    BO --> AZG
    AZG --> NSIS
    AZG --> DNS
    AZG --> SEC
    AZG --> AD
    AZG --> CF
    AZG --> TEK
```
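The flow above can also be expressed as a small lookup structure, which may be handy when scripting hand-offs. The sketch below is illustrative only: it assumes each arrow means "may hand off to", which the diagram itself does not state, and the names `FLOW` and `downstream` are hypothetical.

```python
from typing import Dict, List

# Edges copied from the flowchart. Reading an arrow as "may hand off to"
# is an assumption; the diagram does not define the arrow semantics.
FLOW: Dict[str, List[str]] = {
    "Business Operations (Epic App DBA)": ["Citrix", "Azure Platform Ops"],
    "Azure Platform Ops": [
        "Network NSIS (Topology & Connectivity)",
        "Core DNS (Infoblox)",
        "Network Security (Palo Alto & FW)",
        "Active Directory",
        "Cloudflare",
        "Terraform Enterprise & Hashi Vault",
    ],
}


def downstream(team: str) -> List[str]:
    """Return every team reachable from `team`, breadth-first, in diagram order."""
    seen: List[str] = []
    queue: List[str] = list(FLOW.get(team, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(FLOW.get(current, []))
    return seen
```

Teams with no outgoing arrows in the diagram (for example Citrix) return an empty list.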
2.1 Team Responsibilities
Business Operations (Epic - Azure National West)
- Oversee the Epic application and ensure it meets operational and patient-care needs.
- Perform Epic database administration (DBA), performance monitoring, and routine maintenance.
- Coordinate with other teams for Epic upgrades or critical infrastructure changes.
- Work with Azure Platform Ops and Citrix to ensure infrastructure changes follow Terraform/IaC principles.
Citrix (CITRIX IMS (UHT) - OSW)
- Manage remote application and desktop virtualization services for secure Epic access.
- Maintain Citrix server farms, load balancing, and user access policies.
- Collaborate with Azure Platform Ops for infrastructure provisioning and updates via IaC.
Azure Platform Ops (Prod) (Epic_Azure_Infrastructure_Ops (Prod))
- Manage the production Azure environment hosting Epic (including training, which is treated as production).
- Monitor workloads, manage capacity, and ensure regulatory/performance compliance.
- Implement changes per Terraform and IaC best practices.
Azure Platform Ops (Non-Prod) (Epic_Azure_Infrastructure_Ops_NonProd)
- Maintain non-production Azure environments (e.g., development, testing).
- Provision, patch, and decommission resources using IaC standards.
Network NSIS (Topology & Connectivity) (NSIS FIREWALL ANALYST)
- Ensure secure, efficient network communication within Azure environments.
- Troubleshoot network issues impacting Epic performance.
- Collaborate with Azure Platform Ops for infrastructure changes via Terraform.
Core DNS (Infoblox) (ISO - IPAM)
- Maintain domain name resolution services for Epic and manage DNS records.
- Coordinate DNS updates for new/decommissioned systems.
- Work with Azure Platform Ops to push DNS-related changes through IaC workflows.
Network Security (Palo Alto & Firewalls) (ISO - CYBER DEFENSE SUPPORT)
- Manage advanced firewall/security configurations (IPS/IDS, threat prevention, logging, rules).
- Enforce network segmentation policies.
- Implement firewall changes with Azure Platform Ops using Terraform/IaC.
Active Directory (Directory Services Infrastructure (DSI))
- Administer domain controllers, group policies, and authentication for Epic.
- Coordinate account provisioning, deprovisioning, and security policy changes with Azure Platform Ops.
Cloudflare (Load Balancer Web Application Firewall)
- Provide external DNS management, Content Delivery Network (CDN), and DDoS protection for public Epic services.
- Monitor edge performance/availability.
- Implement changes through IaC tools with Azure Platform Ops.
Terraform Enterprise & Hashi Vault (E2M TEK)
- Maintain and support the Terraform Enterprise and HashiCorp Vault platforms.
- Address incidents or outages related to the availability or performance of these tools.
- Collaborate with Azure Platform Ops and other teams if platform issues impact environment provisioning or secret management.
- Responsibility for IaC templates, modules, and secrets remains with the Azure Platform Ops teams.
3. Incident Detection & Triage
3.1 Corporate Priority Grid (ServiceNow SLA Group)
Incident priorities are assigned in ServiceNow based on the corporate SLA group. The table below describes each priority, with response and resolution goals:
| Priority | Definition | Response Goal | Restoration/Fulfillment Goal |
|---|---|---|---|
| 1 | Major outages (business-critical app down). TCC & SMEs assess VBFs. | 15 min | 1 hr |
| 2 | Outages/service degradations. TCC & SMEs assess impact. | 15 min | 4 hrs |
| 3 | Single-user “hard down” or multi-user (not P1/P2). | 4 business hrs | 1 business day |
| 4 | Single-user, workaround exists. | 1 business day | 2 business days |
| 5 | Low-impact (password resets, RFIs, etc.). | 5 business days | 5 business days |
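For tooling that needs to look up these goals, the grid can be captured as a small table in code. This is a minimal sketch, not ServiceNow's data model: `SlaGoal`, `PRIORITY_GRID`, and `goals_for` are hypothetical names, and the goals are kept as the grid's own text because converting business hours and days to clock time depends on a business calendar this document does not define.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SlaGoal:
    """Goals for one priority, stored as written in the priority grid."""
    response: str
    restoration: str
    business_time: bool  # True when the goals are measured in business hours/days


# Values copied from the priority grid above.
PRIORITY_GRID: Dict[int, SlaGoal] = {
    1: SlaGoal("15 min", "1 hr", business_time=False),
    2: SlaGoal("15 min", "4 hrs", business_time=False),
    3: SlaGoal("4 business hrs", "1 business day", business_time=True),
    4: SlaGoal("1 business day", "2 business days", business_time=True),
    5: SlaGoal("5 business days", "5 business days", business_time=True),
}


def goals_for(priority: int) -> SlaGoal:
    """Return the goals for a priority, rejecting values outside 1-5."""
    if priority not in PRIORITY_GRID:
        raise ValueError(f"priority must be between 1 and 5, got {priority}")
    return PRIORITY_GRID[priority]
```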
Monitoring & Alerts
- Automated monitoring tools (e.g., logging platforms, system alerts) should be tuned for timely detection.
- All incident alerts route to the respective assignment and/or paging groups.
Triage
- Validate incident severity and assign the appropriate priority (1–5).
- Assign incidents to the appropriate team based on environment, service ownership, and business impact.
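As one illustration of environment-based assignment, the sketch below picks the Azure Platform Ops assignment group for an incident. The group names come from the table in Section 2, and training is routed to the production group because Section 1.2 treats it as Production. The environment labels and the function name `azure_ops_group` are assumptions made for the example.

```python
PROD_GROUP = "Epic_Azure_Infrastructure_Ops (Prod)"
NONPROD_GROUP = "Epic_Azure_Infrastructure_Ops_NonProd"

# Hypothetical environment labels; adjust to whatever the ticket actually carries.
PROD_LIKE = {"production", "training"}  # training is handled as Production
NONPROD = {"development", "testing"}


def azure_ops_group(environment: str) -> str:
    """Return the Azure Platform Ops assignment group for an environment label."""
    env = environment.strip().lower()
    if env in PROD_LIKE:
        return PROD_GROUP
    if env in NONPROD:
        return NONPROD_GROUP
    # Fail loudly rather than guess a group for an unrecognized environment.
    raise ValueError(f"unknown environment: {environment!r}")
```

Raising on an unknown label, rather than defaulting to Non-Prod, keeps a mislabeled production incident from being routed to the wrong group silently.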
4. Escalation & Communication Plan
4.1 Escalation Triggers
- Escalate to the next tier (DevOps, Infrastructure, Security, etc.) when an incident cannot be resolved within the timeframe for its priority (see Section 3.1) or requires specialized expertise.
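The trigger can be sketched as a single check. This assumes the "specified timeframe" is the restoration goal for the incident's priority, which is an interpretation rather than something this document states, and it expects the caller to supply elapsed time in the same units as the goal (clock time for P1/P2, business time for P3 to P5). The name `should_escalate` is hypothetical.

```python
from datetime import timedelta


def should_escalate(
    elapsed: timedelta,
    restoration_goal: timedelta,
    needs_specialist: bool = False,
) -> bool:
    """Return True when either escalation trigger applies.

    Assumption: the timeframe being checked is the restoration goal for the
    incident's priority, with `elapsed` measured in matching units.
    """
    return needs_specialist or elapsed > restoration_goal
```

For example, a Priority 1 incident open for 61 minutes against a 1-hour goal would be escalated, while one open for 30 minutes would not be unless it needs specialized expertise.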
4.2 Communication Channels
- Primary: Email distribution lists, instant messaging channels, or ticketing system notifications (e.g., ServiceNow).
- Executive Updates: High-priority incidents warrant direct communication to executive and business leadership. These communications are typically coordinated and delivered by the Business Operations (Epic - Azure National West) team.
4.3 Timeframes
- Acknowledge incidents within the timeframe defined by their assigned priority.
- Update stakeholders at defined intervals (e.g., every 30 minutes for Priority 1 incidents).
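A fixed update cadence can be turned into a list of planned update times. The sketch below defaults to the 30-minute Priority 1 interval given above; intervals for other priorities are not defined in this document, so the caller must supply them. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def update_times(
    opened_at: datetime,
    until: datetime,
    interval: timedelta = timedelta(minutes=30),  # Priority 1 example interval
) -> List[datetime]:
    """List the stakeholder update times between `opened_at` and `until`.

    The first update falls one interval after the incident is opened, and
    `until` itself is included when it lands exactly on the cadence.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    times: List[datetime] = []
    next_update = opened_at + interval
    while next_update <= until:
        times.append(next_update)
        next_update += interval
    return times
```

For an incident opened at 09:00 with a planning horizon of 10:30, this yields updates at 09:30, 10:00, and 10:30.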
5. Resolution & Post-Incident Review
Resolution
- The assigned team mitigates or resolves the issue.
- Confirm all systems are stable and notify impacted users/stakeholders of the resolution.
Post-Incident Review
- Conduct a retrospective for critical (P1/P2) incidents.
- Document root cause, lessons learned, and action items to prevent recurrence.
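The review requirements above can be captured in a simple record, for instance to check that a retrospective has been filled in before it is closed. This is an illustrative sketch only: the class, field, and function names are hypothetical and do not reflect any ServiceNow schema.

```python
from dataclasses import dataclass, field
from typing import List


def requires_review(priority: int) -> bool:
    """A retrospective is conducted for critical (P1/P2) incidents."""
    return priority in (1, 2)


@dataclass
class PostIncidentReview:
    """Minimal record of the items a retrospective should document."""
    incident_id: str  # hypothetical identifier format
    priority: int
    root_cause: str = ""
    lessons_learned: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True once root cause, lessons learned, and action items are all present."""
        return (
            bool(self.root_cause.strip())
            and bool(self.lessons_learned)
            and bool(self.action_items)
        )
```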
6. Additional Resources
- Refer to the full Corporate Incident Management Policy for detailed escalation procedures.
- For ServiceNow access instructions, visit: https://uhgazure.sharepoint.com/sites/Optum/SitePages/Secure-Access-for-ServiceNow.aspx
- For additional knowledge items and resources related to Epic on Azure, visit the Epic on Azure Knowledge Portal.
- Contact the respective assignment/paging group for guidance on team ownership or IaC processes.