Support · Updated July 3, 2026

Epic on Azure Operations Response Framework

Tags: support, operations-framework, incident-response, war-room, collaboration, troubleshooting, epic, azure, p1-p2

Why We're Taking This Approach

This framework establishes a clear structure and a faster, more tightly coordinated response to high-severity or potentially recurring Epic-on-Azure incidents.

Instead of narrowing the investigation to a single fault domain up front, all teams collaborate from the outset, each checking its own area of responsibility.

Goals & Principles

  • Support collaborative problem-solving focused on prompt diagnosis and resolution.
  • Ensure every domain is checked quickly and systematically, reducing the risk of missing early signs in non-network layers.
  • Shorten time to root cause by removing sequential, siloed troubleshooting.
  • Create a repeatable, transparent process that works regardless of which layer ultimately owns the fix.
  • Maintain patient-care focus by keeping resolution speed and accuracy at the center of the response.

This is about discipline in execution: a shared playbook, a common communication space, and defined team roles, so that the right people are looking at the right data right away.


1. Teams Structure for Incidents

Main Channel: Epic on Azure – for announcements, updates, and general discussions.

Incidents Subchannel: Incident Collaboration – private, dedicated to P1/P2 incident war rooms and urgent P3 triage.

Usage Rule: Any time the business raises an issue, or any team sees a potential high-impact problem, post it in the Incidents subchannel immediately.

Role-Based Tags:

  • @eoa_BusOps – Business Operations (Technical Stakeholders / Lead)
  • @eoa_EpicTech – Epic Technical Support (Liaison to Epic App and Third Party Integration Teams)
  • @eoa_Citrix – Citrix
  • @eoa_InfraOps – Azure Infrastructure Ops (Liaison to infrastructure supply chain support resources)
  • @eoa_CloudOps – Cloud Ops (IaC-first standards; secure Azure operations with Terraform, CI/CD, AWX, and Key Vault)

Team Responsibilities

Business Operations

  • Oversee the Epic application and ensure operational and patient-care needs are met.
  • Perform Epic database administration (DBA), performance monitoring, and routine maintenance.
  • Coordinate with other teams for Epic upgrades or critical infrastructure changes.
  • Work with Azure Platform Ops and Citrix to ensure infrastructure changes follow Terraform/IaC principles.

Epic Application

  • Primarily database administration (DBA); acts as liaison to the Epic vendor.
  • Review Epic application logs and monitor application status.
  • Assess application queues and dependencies for issues.

Citrix

  • Manage remote application and desktop virtualization services for secure Epic access.
  • Maintain Citrix server farms, load balancing, and user access policies.
  • Collaborate with Azure Platform Ops for infrastructure provisioning and updates via IaC.

Azure Platform Ops

  • Manage the production Azure environment hosting Epic (including training, which is treated as production).
  • Maintain non-production Azure environments (e.g., development, testing).
  • Monitor workloads, manage capacity, and ensure regulatory/performance compliance.
  • Provision, patch, decommission resources, and implement changes using Terraform and Infrastructure as Code (IaC) best practices.

CloudOps

  • Terraform modules + Git/PRs; consistent Azure IaC; remediate drift.
  • Infra CI/CD (Azure DevOps/GitHub Actions): lint, scan, compliance; enforce Policy/RBAC/tagging.
  • AWX: platform admin/governance; define standards (RBAC, projects, creds, inventories, job/workflows); L2/L3 rollout guardrails.
  • Key Vault: manage certs/secrets/keys; enforce encryption and rotation.

| Team | Assignment Group |
| --- | --- |
| Business Ops (Epic App DBA) | Epic - Azure (National West) |
| Citrix | USS_Virtual_Workspace |
| Azure Platform Ops | Epic_Azure_Infrastructure_Ops (Prod/NonProd) |
| Cloud Ops | GitHub Issues |
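
The team-to-assignment-group mapping above can also be kept as a small lookup, so that ticket routing during an incident does not depend on memory. This is a minimal sketch only: the function name `assignment_group_for` is hypothetical, and the mapping simply mirrors the values listed above.

```python
# Team -> assignment group, copied from the mapping above.
ASSIGNMENT_GROUPS = {
    "Business Ops (Epic App DBA)": "Epic - Azure (National West)",
    "Citrix": "USS_Virtual_Workspace",
    "Azure Platform Ops": "Epic_Azure_Infrastructure_Ops (Prod/NonProd)",
    "Cloud Ops": "GitHub Issues",
}


def assignment_group_for(team: str) -> str:
    """Return the assignment group for a team (hypothetical helper).

    Matching ignores case and surrounding whitespace; unknown teams raise
    KeyError so a mistyped name is noticed instead of silently misrouted.
    """
    normalized = team.strip().casefold()
    for name, group in ASSIGNMENT_GROUPS.items():
        if name.casefold() == normalized:
            return group
    raise KeyError(f"No assignment group recorded for team: {team!r}")


print(assignment_group_for("Citrix"))
```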

Full details of incident management: Epic Incident Management Roles & Responsibilities


2. P1/P2 Incident Activation

Trigger the framework when:

  • Patient care workflows are disrupted (downtime, login failures, critical slowness).
  • Multiple users/regions/sites are affected.
  • Any Epic production outage or degraded performance in Azure.
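
The activation triggers above can be expressed as a simple check. The sketch below assumes that any single trigger is enough to activate the framework (the list does not state "any" versus "all", so treat that as an assumption); the `IssueReport` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IssueReport:
    """Hypothetical summary of a reported issue, one flag per trigger."""
    patient_care_disrupted: bool = False    # downtime, login failures, critical slowness
    multiple_users_or_sites: bool = False   # multiple users, regions, or sites affected
    prod_outage_or_degraded: bool = False   # Epic production outage or degraded performance in Azure


def should_activate(report: IssueReport) -> bool:
    """Return True if at least one trigger applies (assumed OR semantics)."""
    return any((
        report.patient_care_disrupted,
        report.multiple_users_or_sites,
        report.prod_outage_or_degraded,
    ))


print(should_activate(IssueReport(multiple_users_or_sites=True)))
```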

3. First 5 Minutes – Stabilization

Actions:

  1. Post Incident Template in Incidents subchannel:

    • 🚨 P1/P2 Incident Alert
      • Date/Time:
      • Reported by:
      • Scope:
      • Symptoms:
      • Users Impacted:
      • Initial Evidence:
      • Suspected Layer:
  2. Tag all relevant groups at once.

  3. The first available lead from any group takes the Incident Commander role.

  4. Create Teams meeting/bridge from the channel.
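
Steps 1 and 2 above can be prepared ahead of time so that the first post is consistent under pressure. The following is a minimal sketch that renders the incident template and appends every role tag in one message. The `IncidentAlert` class, its field names, and the example values are hypothetical; the tags are the ones listed in the Teams structure section. It only builds the text and does not call any Teams API.

```python
from dataclasses import dataclass
from datetime import datetime

# Role-based tags from the Teams structure section.
ROLE_TAGS = ("@eoa_BusOps", "@eoa_EpicTech", "@eoa_Citrix", "@eoa_InfraOps", "@eoa_CloudOps")


@dataclass
class IncidentAlert:
    """Hypothetical container for the incident template fields."""
    severity: str              # "P1" or "P2"
    reported_at: datetime
    reported_by: str
    scope: str
    symptoms: str
    users_impacted: str
    initial_evidence: str
    suspected_layer: str = "Unknown"

    def render(self, tags=ROLE_TAGS) -> str:
        """Build the post text: template fields first, then the group tags."""
        lines = [
            f"🚨 {self.severity} Incident Alert",
            f"Date/Time: {self.reported_at:%Y-%m-%d %H:%M}",
            f"Reported by: {self.reported_by}",
            f"Scope: {self.scope}",
            f"Symptoms: {self.symptoms}",
            f"Users Impacted: {self.users_impacted}",
            f"Initial Evidence: {self.initial_evidence}",
            f"Suspected Layer: {self.suspected_layer}",
            " ".join(tags),
        ]
        return "\n".join(lines)


# Illustrative values only.
alert = IncidentAlert(
    severity="P1",
    reported_at=datetime(2026, 1, 1, 9, 5),
    reported_by="Help Desk",
    scope="Multiple sites",
    symptoms="Login failures",
    users_impacted="Unknown, still being confirmed",
    initial_evidence="Session launch errors reported by users",
)
print(alert.render())
```

Passing a shorter `tags` tuple to `render` covers the case where only some groups are relevant.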


4. Minutes 5–15 – Parallel Quick Checks

Each team responds in-thread to the incident post:

  • BusOps: Confirm impact scope, user errors, patient safety risk.
  • InfraOps: Check Azure service health, VMs, DBs, IaC pipeline status.
  • Citrix: Session health, brokering, app availability, latency.
  • EpicTech: Logs, application status, queues, app dependencies.
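
Where teams have scripted versions of these quick checks, the "parallel, not sequential" idea can be sketched with a thread pool. This is an illustration under assumptions, not an existing tool: `run_quick_checks` is a hypothetical helper, and the lambdas stand in for whatever real probes each team owns. A failing check is recorded as a finding rather than stopping the others.

```python
from concurrent.futures import ThreadPoolExecutor


def run_quick_checks(checks: dict) -> dict:
    """Run every team's check concurrently and collect findings by team.

    `checks` maps a team name to a zero-argument callable returning a
    short finding string.
    """
    findings = {}
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        # Submit everything first so no check waits on another team's check.
        futures = {team: pool.submit(check) for team, check in checks.items()}
        for team, future in futures.items():
            try:
                findings[team] = future.result()
            except Exception as exc:  # keep going; report the failure in-thread
                findings[team] = f"check failed: {exc}"
    return findings


# Placeholder checks; real ones would query the team's own tooling.
placeholder_checks = {
    "BusOps": lambda: "impact scope confirmed",
    "InfraOps": lambda: "no Azure service health events found",
    "Citrix": lambda: "session brokering normal",
    "EpicTech": lambda: "application queues nominal",
}

for team, finding in run_quick_checks(placeholder_checks).items():
    print(f"{team}: {finding}")
```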

5. Minutes 15–30 – Hypothesis & Assignment

  • Incident Commander summarizes findings, declares working hypothesis.
  • Assigns primary fix owner; others go to standby but stay in channel.
  • Update incident ticket with timeline and actions so far.

6. Communication Cadence

  • War Room: Continuous updates in-thread.
  • Leadership: Every 15–20 min for P1; every 30 min for P2.
  • End Users: Business Ops handles once scope is validated.
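
The leadership cadence above lends itself to a small scheduling helper for the Incident Commander. The sketch below assumes the tighter end of the P1 window (15 minutes); the interval table and the function name `leadership_update_times` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: use 15 minutes for P1 (the short end of the 15–20 minute window).
LEADERSHIP_INTERVAL = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(minutes=30),
}


def leadership_update_times(start: datetime, severity: str, count: int) -> list:
    """Return the next `count` leadership update times after `start`."""
    interval = LEADERSHIP_INTERVAL[severity]
    return [start + interval * i for i in range(1, count + 1)]


for due in leadership_update_times(datetime(2026, 1, 1, 9, 0), "P1", 3):
    print(due.strftime("%H:%M"))
```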

7. Resolution & Closure

  • Primary owner fixes issue; all teams verify stability.
  • Incident Commander posts resolution summary in channel:
    • Resolved – P1/P2
      • Root Cause:
      • Fix Applied:
      • Mean Time to Restore/Resolve (MTTR):
      • Post Incident Review Date:
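
The time-to-restore figure in the resolution summary is just the gap between detection and restoration, so it can be computed rather than estimated. The following is a minimal sketch; the helper names, parameters, and example values are hypothetical, and the review date is passed in rather than derived.

```python
from datetime import date, datetime, timedelta


def format_duration(delta: timedelta) -> str:
    """Format a duration as hours and minutes, dropping leftover seconds."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


def resolution_summary(severity: str, detected_at: datetime, restored_at: datetime,
                       root_cause: str, fix_applied: str, review_date: date) -> str:
    """Render the resolution summary with the restore time filled in."""
    if restored_at < detected_at:
        raise ValueError("restored_at must not be earlier than detected_at")
    return "\n".join([
        f"Resolved – {severity}",
        f"Root Cause: {root_cause}",
        f"Fix Applied: {fix_applied}",
        f"Mean Time to Restore/Resolve (MTTR): {format_duration(restored_at - detected_at)}",
        f"Post Incident Review Date: {review_date.isoformat()}",
    ])


# Illustrative values only.
print(resolution_summary(
    severity="P1",
    detected_at=datetime(2026, 1, 1, 9, 0),
    restored_at=datetime(2026, 1, 1, 10, 25),
    root_cause="(to be filled in by the fix owner)",
    fix_applied="(to be filled in by the fix owner)",
    review_date=date(2026, 1, 2),
))
```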

8. Post-Incident Review

Within 48 hours:

  • Walk through timeline, comms, and hand-offs.
  • Identify monitoring/alerting gaps.
  • Update playbook/theme-specific runbooks as needed.

Reference: Epic Incident Management


Technical Support Structure Diagram

Guiding Principles:

  • All teams—including the Business Operations Layer and all Epic Command Center teams—collaborate and communicate within the Epic on Azure Incident Collaboration Teams channel.
  • Epic Command Center teams also monitor proactively through their own channels, to detect issues in the environment before business teams report them.

The following diagram illustrates the technical support structure and escalation paths:

flowchart TB

%% 1) Points of Observation
subgraph POO["<b>Points of Observation</b>"]
  direction TB
  HD["Help Desk<br/>(Incidents)"]
  OCC["OCC<br/>Monitoring"]
  NOC["Clinical NOC<br/>Monitoring"]
end

%% 2) Business Operations Layer
subgraph BOL["<b>Business Operations Layer</b>"]
  direction TB
  BOT["Business Ops<br/>Team"]
end

%% 3) Epic Command Center
subgraph ECC["<b>Epic Command Center</b>"]
  direction TB

  %% Technical Teams
  subgraph APP["<b>Technical Teams</b>"]
    direction TB
    ETT["Epic Tech Team<br/>(DB Admins)"]
    CIT["Citrix Team<br/>(Virtualization)"]
  end

  %% Third-Party Integrations
  subgraph TPI["<b>Third-Party Integrations</b>"]
    direction TB
    TPV["Third-Party Vendors<br/>& Partners"]
  end

  %% Infrastructure Engineering and Shared Support Services
  subgraph IESS["<b>Infrastructure & Support Services</b>"]
    direction TB
    IE["Infrastructure Engineering<br/>(CI/CD Pipeline & Support)"]

    %% Shared Support Services
    subgraph SSS["<b>Shared Support Services</b>"]
      direction TB
      FW["Firewall"]
      DNS["DNS"]
      AD["Active<br/>Directory"]
    end
  end
end

%% Flows from points of observation to Business Ops
HD -->|Report| BOT
OCC -->|Alert| BOT
NOC -->|Notify| BOT

%% Escalation/triage paths from Business Ops
BOT -->|Issues| ETT
BOT -->|Issues| CIT
BOT -->|Issues| TPV
BOT -->|Issues| IE

%% Direct dotted lines from Help Desk
HD -.->|Report| CIT
HD -.->|Report| IE

%% Coordination within Technical Teams
ETT -.->|Coordinate| CIT

%% Interactions with Infrastructure Engineering
ETT -.->|Coordinate| IE
CIT -.->|Coordinate| IE
TPV -.->|Support| IE

%% Infrastructure Engineering to Shared Services
IE -.->|Coordinate| FW
IE -.->|Coordinate| DNS
IE -.->|Coordinate| AD

%% Third-Party coordination
ETT -.->|Coordinate| TPV

%% Custom styling for more professional appearance
style POO fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style BOL fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#000
style ECC fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
style APP fill:#ffebee,stroke:#c62828,stroke-width:2px,color:#000
style TPI fill:#fff8e1,stroke:#f57f17,stroke-width:2px,color:#000
style IESS fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000
style SSS fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000

%% Node styling (fill and stroke colors per group)
style HD fill:#bbdefb,stroke:#1565c0,stroke-width:2px
style OCC fill:#bbdefb,stroke:#1565c0,stroke-width:2px
style NOC fill:#bbdefb,stroke:#1565c0,stroke-width:2px
style BOT fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
style ETT fill:#ffcdd2,stroke:#c62828,stroke-width:2px
style CIT fill:#ffcdd2,stroke:#c62828,stroke-width:2px
style TPV fill:#fff9c4,stroke:#f57f17,stroke-width:2px
style IE fill:#b3e5fc,stroke:#0277bd,stroke-width:2px
style FW fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
style DNS fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
style AD fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px

Diagram Description

This flowchart illustrates the support structure with the following key components:

  1. Points of Observation - Where issues are first detected (customer-side impact)
    • Individual Command Center teams also maintain their own early-detection monitoring
  2. Business Operations Layer - Central triage and escalation point
  3. Epic Command Center - Technical teams handling application and infrastructure issues
    • Technical Teams - Epic Tech Team and Citrix Team
    • Third-Party Integrations - External vendors and partners
    • Infrastructure & Support Services - Contains Infrastructure Engineering, which manages the Shared Support Services (Firewall, DNS, Active Directory)

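Solid arrows show the primary reporting and escalation paths: from the Points of Observation into Business Ops, and from Business Ops out to the Epic Command Center teams. Dotted arrows show direct Help Desk reports and coordination between teams. Infrastructure Engineering manages and interacts with the Shared Support Services. The Epic Tech Team has a direct coordination relationship with the Third-Party Vendors.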
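
For anyone who wants to reason about the diagram programmatically (for example, to confirm how an alert can reach a given team), the arrows can be treated as a directed graph. This is a sketch under assumptions: the edge list is transcribed from the flowchart, solid and dotted arrows are treated alike, and `shortest_path` is a hypothetical helper using a plain breadth-first search.

```python
from collections import deque

# Directed edges transcribed from the flowchart (solid and dotted combined).
EDGES = {
    "Help Desk": ["Business Ops", "Citrix Team", "Infrastructure Engineering"],
    "OCC": ["Business Ops"],
    "Clinical NOC": ["Business Ops"],
    "Business Ops": ["Epic Tech Team", "Citrix Team", "Third-Party Vendors",
                     "Infrastructure Engineering"],
    "Epic Tech Team": ["Citrix Team", "Infrastructure Engineering", "Third-Party Vendors"],
    "Citrix Team": ["Infrastructure Engineering"],
    "Third-Party Vendors": ["Infrastructure Engineering"],
    "Infrastructure Engineering": ["Firewall", "DNS", "Active Directory"],
}


def shortest_path(start: str, target: str):
    """Return the shortest chain of teams from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(" -> ".join(shortest_path("OCC", "DNS")))
```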