Building a HIPAA-Compliant Telehealth Platform: Architecture & Scale

Key Takeaways

We chose an SFU (Selective Forwarding Unit) over an MCU topology because avoiding server-side transcoding preserves end-to-end encryption and eliminates a latency bottleneck. The trade-off is higher upstream bandwidth on each sending client, because simulcast encodes and uploads several quality layers at once.
HIPAA compliance is better treated as a security engineering discipline than a checklist exercise. We implemented Security Rule specifications as technical controls wherever possible, which made the HITRUST certification process significantly smoother.
Adaptive bitrate alone is insufficient for clinical video. Different specialties have different quality requirements (spatial resolution for dermatology, temporal resolution for neurology), so we built specialty-aware encoding profiles.
End-to-end encryption using the Insertable Streams API means our SFU forwards packets it cannot decrypt. This zero-knowledge architecture simplified our threat model and compliance posture substantially.
Predictive auto-scaling based on appointment schedules eliminated cold-start quality degradation. Pre-warming infrastructure 30 minutes before predicted demand spikes is more effective than reactive scaling for scheduled healthcare workloads.

The Telehealth Engineering Challenge

Most telehealth platforms in production today started as consumer video tools wrapped in business associate agreements (BAAs) during the 2020 surge. They work, but they were not designed for clinical use. The gap shows up in three places: inconsistent video quality on patient home networks (patients are not on corporate fiber), insufficient integration with clinical workflows (physicians alt-tab between the video call and the EHR), and security architectures that satisfy the letter of HIPAA but leave metadata exposed.

We were asked to build a telehealth platform for a multi-site health system after their existing vendor had a security incident that exposed session metadata. The clinical content was not breached, but session metadata (who talked to whom, when, for how long) is still PHI under HIPAA. The incident triggered an investigation by the HHS Office for Civil Rights (OCR) and eroded patient trust. The mandate was to rebuild with security as the foundational layer, not a compliance add-on.

The technical requirements were: support tens of thousands of concurrent video sessions across specialties from primary care to telestroke, integrate with existing EHR systems, support remote patient monitoring device streaming, enable e-prescribing during visits, and maintain sub-200ms video latency. This post covers the architecture decisions we made and why.

WebRTC Architecture for Clinical Video

WebRTC was the obvious foundation: peer-to-peer capable, broad browser support, built-in DTLS-SRTP encryption. But raw WebRTC has limitations for clinical video. Consumer applications tolerate dropped frames and quality fluctuations. Clinical video cannot, at least not uniformly. A dermatologist examining a lesion needs consistent color accuracy and spatial resolution. A telestroke neurologist needs sub-second latency and high frame rates to assess facial droop. A behavioral health session needs stable facial expression continuity at 30fps, but can tolerate lower resolution.

Why SFU Over MCU or Mesh P2P

The topology decision was between three options: mesh P2P (each participant connects directly to every other), MCU (a server decodes, composites, and re-encodes all streams), or SFU (a server routes packets without decoding). Mesh P2P does not scale beyond 3-4 participants and does not give us server-side control over bandwidth allocation. MCU introduces transcoding latency (typically 100-300ms) and, critically, breaks end-to-end encryption because the server must decode the video. SFU was the right choice: it routes encrypted media without decoding, adds minimal latency, and gives us control over which simulcast layer each participant receives.

We built the SFU in Rust for predictable performance and memory safety. Each participant sends three simulcast layers (high, medium, low resolution). The SFU selects which layer to forward to each recipient based on their available bandwidth, display size, and the clinical context of the session. The clinical context part is what differentiates this from a generic SFU. In a dermatology consult, the patient camera stream gets bandwidth priority because it is the primary clinical input. In a telestroke session, the patient stream gets maximum resolution and frame rate. These specialty-specific media policies are configured per session type.
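Here is a minimal sketch of that per-recipient decision. It is written in Python for readability, even though the production SFU is Rust. The three-layer ladder, the bitrates, the 80% share for a clinically primary stream, and the function names are all hypothetical. What the sketch shows is the shape of the decision: first divide the recipient's downlink by clinical priority, then pick the highest layer that fits both the resulting budget and the display.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    height: int          # encoded frame height in pixels
    bitrate_kbps: int    # approximate bitrate of this simulcast layer

# Hypothetical simulcast ladder, ordered low to high; real values depend on encoder settings.
LAYERS = (
    Layer("low", 180, 150),
    Layer("medium", 360, 500),
    Layer("high", 720, 1500),
)

def allocate_shares(stream_ids, primary_id=None, primary_share=0.8):
    """Split a recipient's downlink across the streams it receives.

    If the session marks one stream as clinically primary (for example the patient
    camera in a dermatology consult), it gets `primary_share`; the rest is split evenly.
    """
    if not stream_ids:
        return {}
    if primary_id not in stream_ids or len(stream_ids) == 1:
        return {s: 1.0 / len(stream_ids) for s in stream_ids}
    rest = (1.0 - primary_share) / (len(stream_ids) - 1)
    return {s: (primary_share if s == primary_id else rest) for s in stream_ids}

def select_layer(budget_kbps, display_height):
    """Pick the highest layer that fits the bandwidth budget and the recipient's display.

    Layers taller than the display are skipped (no point forwarding 720p into a
    360-pixel tile). If nothing fits the budget, fall back to the lowest layer.
    """
    fitting = [
        layer for layer in LAYERS
        if layer.bitrate_kbps <= budget_kbps
        and layer.height <= max(display_height, LAYERS[0].height)
    ]
    return fitting[-1] if fitting else LAYERS[0]
```

For example, take a clinician with roughly 2,000 kbps of downlink who receives the patient stream (marked primary) and a hypothetical interpreter stream. The patient stream gets a 1,600 kbps budget and the high layer. The interpreter stream gets about 400 kbps and the low layer.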

End-to-End Encryption Architecture

WebRTC provides DTLS-SRTP by default, which encrypts media in transit on each hop. But in an SFU topology, the SFU is itself a hop: DTLS-SRTP terminates there, so the SFU sees cleartext media. For a HIPAA-compliant system handling PHI, we did not want our infrastructure to have access to the video content at all. We used the Insertable Streams API (now standardized as WebRTC Encoded Transform) to add application-layer encryption before media reaches the SFU. Each session uses ephemeral keys derived via ECDH key exchange, with AES-256-GCM as the cipher. The result is a zero-knowledge architecture: our SFU forwards encrypted packets it cannot decrypt.

TLS 1.3 and DTLS 1.3: Transport-layer encryption with perfect forward secrecy, TLS for signaling and DTLS for the media channels.
Insertable Streams E2EE: Application-layer encryption applied before media reaches the SFU. The server never sees plaintext video or audio.
Key rotation: Session keys rotate every 30 minutes with seamless re-keying. We chose 30 minutes as a balance between security and the overhead of key exchange.
Certificate pinning: TURN server certificates are pinned in the client to prevent MITM attacks on media relay paths.
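The rotation schedule can be sketched as deriving one AES-256-GCM key per 30-minute epoch with HKDF-SHA256 (RFC 5869). The sketch implements HKDF with Python's standard `hmac` module. It assumes a raw ECDH shared secret is already available from the key exchange. The exchange itself and the per-frame AES-GCM encryption need a crypto library (in the browser, WebCrypto) and are not shown. The salt and `info` labels are hypothetical choices.

```python
import hashlib
import hmac

ROTATION_SECONDS = 30 * 60  # 30-minute rotation interval described in the text

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step: PRK = HMAC-SHA256(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i)."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("requested length too large for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def epoch_for(elapsed_seconds: float) -> int:
    """Rotation epoch that a point in the session falls into (0 for the first 30 minutes)."""
    return int(elapsed_seconds // ROTATION_SECONDS)

def epoch_key(shared_secret: bytes, session_id: str, epoch: int) -> bytes:
    """Derive a 32-byte (AES-256) key for one session and rotation epoch.

    `shared_secret` is assumed to be the raw ECDH output; the salt and info
    labels are hypothetical and only need to be agreed on by all participants.
    """
    prk = hkdf_extract(salt=session_id.encode("utf-8"), ikm=shared_secret)
    info = b"telehealth-e2ee-media|" + epoch.to_bytes(4, "big")
    return hkdf_expand(prk, info, 32)
```

Binding the epoch number into `info` gives each rotation period a distinct key. Suppose each rotation also runs a fresh ECDH exchange, as the 30-minute key-exchange trade-off implies. Then a leaked key exposes only its own epoch. Deriving every epoch from one long-lived secret would not give that property.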

HIPAA Security Layer: Encryption, Audit, and Access Control

The HIPAA Security Rule has 42 specifications: 20 required and 22 addressable. Many organizations treat these as a policy exercise: write a document, check a box, move on. We implemented them as technical controls wherever possible. Policies that depend on humans following procedures will eventually be violated, while technical controls that enforce behavior are much harder to bypass.

Tamper-Evident Audit Logging

We implemented an append-only audit log where each entry contains a cryptographic hash of the previous entry, creating a hash chain. Any modification or deletion of a log entry breaks the chain and triggers an alert. This is the same mechanism most "blockchain-based logging" products rely on, without the buzzword, and it is simpler to operate and verify. Audit logs capture every PHI access, every configuration change, every failed authentication attempt, and every data export. The logs live in a separate AWS account with write-only access from production, so even a full production compromise cannot alter audit history retroactively.
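A hash chain like this fits in a few lines. The sketch below uses only the Python standard library and is an illustration under assumptions, not the production logger. The canonical-JSON encoding, the all-zero genesis hash, and the class name are hypothetical. Verification walks the chain and reports the first entry whose link or digest does not match.

```python
import copy
import hashlib
import json
from typing import Optional

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous entry's hash plus a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

class HashChainedLog:
    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        digest = entry_hash(prev, event)
        # Store a copy so later changes to the caller's dict cannot alter the log.
        self.entries.append({"event": copy.deepcopy(event), "prev_hash": prev, "hash": digest})
        return digest

    def first_broken_index(self) -> Optional[int]:
        """Return the index of the first entry whose link or hash does not verify, or None."""
        prev = GENESIS_HASH
        for i, entry in enumerate(self.entries):
            if entry["prev_hash"] != prev or entry_hash(prev, entry["event"]) != entry["hash"]:
                return i
            prev = entry["hash"]
        return None
```

Deleting a middle entry shows up at the following entry, whose `prev_hash` no longer matches. Dropping entries from the tail is detectable only if the latest hash is also recorded somewhere the attacker cannot reach. A separate, write-only log account like the one described above can serve that purpose.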

Attribute-Based Access Control

Simple role-based access control (RBAC) is insufficient for clinical systems. A physician should not be able to join any telehealth session just because they have the "physician" role. We implemented attribute-based access control (ABAC) that evaluates access requests against multiple dimensions: the user's role, whether they are on the patient's active care team, the device security posture (encrypted, current OS, no jailbreak), the network location, and whether the access falls within the scheduled session window. This contextual approach catches both genuine policy violations and the more common case of a clinician accidentally clicking the wrong patient.
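The ABAC check can be sketched as a default-deny function that evaluates every attribute and records why access was refused. Those reasons can also feed the audit log. The attribute names, the allowed roles and network zones, and the 15-minute early-join grace period are hypothetical placeholders, not the health system's actual policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class AccessRequest:
    user_id: str
    role: str
    patient_id: str
    device_encrypted: bool
    os_current: bool
    jailbroken: bool
    network_zone: str        # hypothetical labels such as "clinic", "vpn", "unknown"
    requested_at: datetime

@dataclass(frozen=True)
class SessionContext:
    patient_id: str
    care_team: frozenset     # user IDs on the patient's active care team
    window_start: datetime
    window_end: datetime

ALLOWED_ROLES = {"physician", "nurse_practitioner", "registered_nurse"}  # hypothetical
ALLOWED_ZONES = {"clinic", "vpn"}                                         # hypothetical
EARLY_JOIN_GRACE = timedelta(minutes=15)                                  # hypothetical

def evaluate(req: AccessRequest, ctx: SessionContext) -> tuple:
    """Default-deny ABAC check: every attribute must pass. Returns (allowed, denial_reasons)."""
    reasons = []
    if req.role not in ALLOWED_ROLES:
        reasons.append("role_not_permitted")
    if req.patient_id != ctx.patient_id:
        reasons.append("patient_mismatch")          # e.g. clicked the wrong patient
    if req.user_id not in ctx.care_team:
        reasons.append("not_on_care_team")
    if not (req.device_encrypted and req.os_current and not req.jailbroken):
        reasons.append("device_posture_failed")
    if req.network_zone not in ALLOWED_ZONES:
        reasons.append("network_location_denied")
    if not (ctx.window_start - EARLY_JOIN_GRACE <= req.requested_at <= ctx.window_end):
        reasons.append("outside_session_window")
    return (len(reasons) == 0, reasons)
```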

Device attestation: Mobile devices must pass security checks (encryption enabled, OS version current, no jailbreak) before accessing any PHI.
Session recording consent: Dual-party consent enforcement with state-specific rules for all 50 states, automatically applied based on participant geoIP. State consent laws vary significantly and this was more complex to implement than we expected.
Automatic session termination: Idle sessions terminate after 5 minutes. This prevents the scenario where a clinician walks away and the session remains open on an unattended screen.
Data residency: PHI is restricted to US-based AWS GovCloud regions with automated enforcement preventing cross-border data transfer.
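The idle timeout reduces to a watchdog over the last-activity timestamp. In this hypothetical sketch, the clock is injectable so the rule can be tested without waiting five minutes. What counts as activity (input events, detected speech, and so on) is a policy decision left to the caller.

```python
import time
from typing import Callable

IDLE_TIMEOUT_SECONDS = 5 * 60  # idle sessions terminate after 5 minutes, per the text

class IdleWatchdog:
    """Reports when no activity has been recorded for the full timeout."""

    def __init__(self, timeout: float = IDLE_TIMEOUT_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_activity = clock()

    def record_activity(self) -> None:
        # Call on any event the policy treats as activity.
        self.last_activity = self.clock()

    def should_terminate(self) -> bool:
        return self.clock() - self.last_activity >= self.timeout
```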

Session Management and Clinical Workflow Integration

A telehealth visit is a clinical encounter, not a video call. It needs to support the same workflows as an in-person visit: patient intake, vitals capture, clinical assessment, ordering, prescribing, and documentation. If clinicians have to switch between the video platform and three other applications to complete a visit, the platform has failed at its core job. Our session management layer orchestrates these workflows within the video session.

Pre-Visit Workflow

Patients receive a link 24 hours before their appointment with instructions to complete intake forms. The intake module collects chief complaint, validated symptom questionnaires (PHQ-9, HIT-6, etc.), medication reconciliation, and insurance verification. This data flows into the EHR via FHIR before the encounter starts, so the physician sees a pre-populated visit note. Patients also run a network quality test that predicts whether their connection supports high-definition video. If not, the system provides specific troubleshooting steps rather than a generic "check your internet" message.
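The sketch below illustrates the intake-to-EHR step in two parts. First, it scores a PHQ-9 using the standard severity cutoffs (0-4 minimal, 5-9 mild, 10-14 moderate, 15-19 moderately severe, 20-27 severe). Then it packages the answers as a minimal FHIR R4 `QuestionnaireResponse`. The `linkId` values and the questionnaire URL are hypothetical. Real deployments usually encode answers as coded values defined by the target `Questionnaire` rather than as bare integers.

```python
from datetime import datetime, timezone

# Standard PHQ-9 severity bands: (inclusive upper bound of total score, label).
PHQ9_BANDS = ((4, "minimal"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe"))

def score_phq9(answers: list) -> tuple:
    """Sum nine item scores (each 0-3) and map the total to a severity band."""
    if len(answers) != 9 or any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("PHQ-9 requires nine answers scored 0-3")
    total = sum(answers)
    severity = next(label for upper, label in PHQ9_BANDS if total <= upper)
    return total, severity

def to_questionnaire_response(patient_id: str, answers: list, questionnaire_url: str) -> dict:
    """Build a minimal FHIR R4 QuestionnaireResponse; linkIds are hypothetical."""
    total, _ = score_phq9(answers)
    items = [{"linkId": f"phq9-q{i}", "answer": [{"valueInteger": a}]}
             for i, a in enumerate(answers, 1)]
    items.append({"linkId": "phq9-total", "answer": [{"valueInteger": total}]})
    return {
        "resourceType": "QuestionnaireResponse",
        "status": "completed",
        "questionnaire": questionnaire_url,
        "subject": {"reference": f"Patient/{patient_id}"},
        "authored": datetime.now(timezone.utc).isoformat(),
        "item": items,
    }
```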

In-Visit Clinical Tools

During the session, clinicians have access to e-prescribing (connected to Surescripts for real-time formulary checks), lab ordering (integrated with Quest and Labcorp), and RPM device streaming. The RPM integration receives real-time data from patient Bluetooth devices (blood pressure cuffs, pulse oximeters, glucometers) through the patient's mobile device and displays live vitals alongside the video feed. This was technically straightforward using the Web Bluetooth API but required significant UX work to guide non-technical patients through device pairing.
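Most BLE blood pressure cuffs expose the Bluetooth SIG Blood Pressure Measurement characteristic (0x2A35), whose readings are IEEE 11073 16-bit SFLOAT values. The parser below is written in Python for consistency with the other examples. It shows the decoding a Web Bluetooth client would apply to the bytes it receives in a `characteristicvaluechanged` event. It handles only the unit, timestamp, and pulse-rate flags and ignores the user ID and measurement-status fields. Whether a particular cuff follows the standard profile is an assumption.

```python
import math
import struct

# Reserved IEEE 11073 SFLOAT values.
_SFLOAT_SPECIAL = {
    0x07FF: math.nan,    # NaN
    0x0800: math.nan,    # NRes (not at this resolution)
    0x07FE: math.inf,    # +INFINITY
    0x0802: -math.inf,   # -INFINITY
    0x0801: math.nan,    # reserved for future use
}

def sfloat(raw: int) -> float:
    """Decode a 16-bit SFLOAT: 4-bit signed exponent (high bits), 12-bit signed mantissa."""
    if raw in _SFLOAT_SPECIAL:
        return _SFLOAT_SPECIAL[raw]
    mantissa = raw & 0x0FFF
    exponent = raw >> 12
    if mantissa >= 0x0800:
        mantissa -= 0x1000
    if exponent >= 0x8:
        exponent -= 0x10
    return float(mantissa) * (10.0 ** exponent)

def parse_bp_measurement(data: bytes) -> dict:
    """Parse a Blood Pressure Measurement value (little-endian fields after a flags byte)."""
    flags = data[0]
    systolic, diastolic, mean_arterial = struct.unpack_from("<HHH", data, 1)
    result = {
        "unit": "kPa" if flags & 0x01 else "mmHg",
        "systolic": sfloat(systolic),
        "diastolic": sfloat(diastolic),
        "mean_arterial": sfloat(mean_arterial),
    }
    offset = 7
    if flags & 0x02:  # time stamp present: year (uint16), month, day, hours, minutes, seconds
        result["timestamp"] = struct.unpack_from("<HBBBBB", data, offset)
        offset += 7
    if flags & 0x04:  # pulse rate present, in beats per minute
        (pulse,) = struct.unpack_from("<H", data, offset)
        result["pulse_bpm"] = sfloat(pulse)
    # User ID and measurement-status fields (flags 0x08, 0x10) would follow; not parsed here.
    return result
```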

We also built clinical screen-sharing that maintains the video feed in picture-in-picture mode during sharing. This sounds minor, but it matters clinically: when a physician shares imaging results or a difficult diagnosis, observing the patient's reaction is part of the clinical assessment. Generic screen sharing that replaces the video feed breaks this.

Video Quality Optimization Under Constrained Networks

The single biggest complaint about telehealth is video quality, and it correlates strongly with patient network conditions. Many patients, especially in rural areas and older demographics, are on connections below 2 Mbps downstream. Telling patients to upgrade their internet is not a solution. We needed to deliver clinically acceptable video on connections as low as 500 Kbps.

Bandwidth Prediction

Reactive bitrate adaptation (the standard WebRTC approach) responds to congestion after it happens: the patient sees a freeze, the encoder ratchets down quality, and it takes several seconds to recover. We trained a lightweight model on historical bandwidth measurement samples that predicts network conditions a few seconds into the future based on recent throughput, jitter, and packet loss trends. This lets us proactively adjust encoding parameters before quality degrades. The model runs in the browser using TensorFlow.js and adds negligible overhead per prediction cycle. It is not perfect, but it catches the common patterns: the bandwidth dip when someone else in the house starts streaming, the periodic congestion on saturated last-mile connections.
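The model described above is a trained network running in TensorFlow.js. As a deliberately simpler stand-in, the sketch below fits a least-squares linear trend to a sliding window of throughput samples and extrapolates it a few samples ahead. It then converts the forecast into a target bitrate. The window size, horizon, 85% headroom, and loss/jitter penalties are hypothetical. The sketch illustrates proactive adaptation, not the production model.

```python
from collections import deque

class TrendPredictor:
    """Least-squares linear trend over a sliding window of throughput samples (kbps)."""

    def __init__(self, window: int = 10):
        self.samples = deque(maxlen=window)

    def add(self, throughput_kbps: float) -> None:
        self.samples.append(float(throughput_kbps))

    def predict(self, horizon: int = 3) -> float:
        """Extrapolate the fitted line `horizon` sample intervals past the newest sample."""
        n = len(self.samples)
        if n == 0:
            raise ValueError("no samples yet")
        if n == 1:
            return self.samples[0]
        mean_x = (n - 1) / 2
        mean_y = sum(self.samples) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(self.samples))
        var = sum((x - mean_x) ** 2 for x in range(n))
        slope = cov / var
        forecast = mean_y + slope * ((n - 1 + horizon) - mean_x)
        return max(forecast, 0.0)

def target_bitrate(predicted_kbps: float, loss_rate: float, jitter_ms: float,
                   headroom: float = 0.85) -> float:
    """Hypothetical policy: stay below the forecast, back off further on loss or jitter."""
    penalty = 1.0
    if loss_rate > 0.02:
        penalty *= 0.8
    if jitter_ms > 30:
        penalty *= 0.9
    return predicted_kbps * headroom * penalty
```

With samples of 1,000, 900, 800, and 700 kbps, the trend forecast three samples ahead is 400 kbps. The encoder can therefore step down before the link actually gets there.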

Specialty-Aware Encoding Profiles

Standard video codecs optimize for overall perceptual quality, which is the wrong objective for clinical video. Different specialties have different quality requirements, and the codec's rate-distortion trade-off should reflect that. We implemented per-specialty encoding profiles.

Dermatology profile: Prioritizes color accuracy (BT.709 color space enforcement) and spatial resolution. Uses a real-time skin segmentation model to reduce compression in skin-tone regions. Spatial detail matters more than frame rate here.
Behavioral health profile: Maintains consistent 30fps for facial expression continuity. Allows lower spatial resolution. Uses ROI-based encoding that prioritizes face region quality. Temporal consistency matters more than spatial detail.
Telestroke profile: Ultra-low latency mode targeting sub-100ms glass-to-glass. 60fps encoding because neurological assessment of facial droop requires high temporal resolution. Generates alerts if network conditions threaten clinical assessment capability.
General consult profile: Balanced quality with adaptive resolution scaling from 1080p down to 360p based on available bandwidth. Designed to maintain diagnostic adequacy across the quality range.

These profiles are selected automatically based on the visit type in the scheduling system. Clinicians do not need to configure anything. The profiles made a noticeable difference in reducing "switch to phone call" fallbacks, particularly for dermatology and telestroke where the previous generic encoding was inadequate.
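The profiles can be expressed as plain configuration keyed by visit type. In this hypothetical sketch, the 30fps and 60fps targets and the general profile's 1080p-to-360p range come from the descriptions above. The other resolution bounds, the visit-type keys, and the function names are assumptions. The `degradation_preference` strings mirror the values defined for WebRTC's `degradationPreference` send parameter ("maintain-framerate", "maintain-resolution", "balanced").

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EncodingProfile:
    name: str
    max_height: int                 # top of the resolution ladder (pixels)
    min_height: int                 # lowest resolution the profile may drop to
    target_fps: Optional[int]       # None lets the encoder adapt frame rate
    degradation_preference: str     # mirrors WebRTC degradationPreference values

PROFILES = {
    # Spatial detail over frame rate; heights are hypothetical.
    "dermatology": EncodingProfile("dermatology", 1080, 720, None, "maintain-resolution"),
    # Steady 30fps, lower resolution acceptable; heights are hypothetical.
    "behavioral_health": EncodingProfile("behavioral_health", 720, 360, 30, "maintain-framerate"),
    # 60fps for facial-droop assessment; heights are hypothetical.
    "telestroke": EncodingProfile("telestroke", 720, 480, 60, "maintain-framerate"),
    # 1080p down to 360p, as described in the text.
    "general": EncodingProfile("general", 1080, 360, None, "balanced"),
}

def profile_for_visit(visit_type: str) -> EncodingProfile:
    """Map a scheduling-system visit type to a profile, defaulting to general consult."""
    return PROFILES.get(visit_type, PROFILES["general"])

def clamp_height(profile: EncodingProfile, bandwidth_height: int) -> int:
    """Clamp the height suggested by bandwidth estimation to the profile's allowed range."""
    return max(profile.min_height, min(bandwidth_height, profile.max_height))
```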

Scaling to 50,000 Concurrent Sessions

Healthcare workloads have predictable demand curves: Monday morning is peak, weekends are low, and the appointment schedule tells you exactly when demand will spike. This predictability is an advantage over consumer video platforms that face unpredictable viral load events. We exploited it.

The infrastructure runs across three AWS GovCloud availability zones. Each SFU instance handles around 800 concurrent sessions. Auto-scaling adds instances when any server exceeds 70% capacity, with new instances ready in under a minute. But the more important optimization is predictive scaling: we pre-warm capacity based on the appointment schedule, spinning up additional infrastructure 30 minutes before predicted demand spikes. After the first month of tuning the prediction model, cold-start quality degradation dropped to effectively zero.
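The pre-warming rule can be sketched as capacity arithmetic over the schedule. Using the figures above (about 800 sessions per instance, a 70% scale-out threshold, and a 30-minute lead), each instance is planned for at most 560 sessions. The sketch assumes the schedule has already been aggregated into expected concurrent sessions per time slot. That forecasting step, and the function names, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

SESSIONS_PER_INSTANCE = 800           # from the text: ~800 concurrent sessions per SFU instance
SCALE_OUT_PERCENT = 70                # from the text: scale out above 70% capacity
PREWARM_LEAD = timedelta(minutes=30)  # from the text: pre-warm 30 minutes ahead

def instances_needed(expected_sessions: int) -> int:
    """Smallest instance count that keeps every instance at or below the 70% threshold."""
    usable = SESSIONS_PER_INSTANCE * SCALE_OUT_PERCENT // 100   # 560 sessions per instance
    return max(1, -(-expected_sessions // usable))              # integer ceiling division

def prewarm_plan(expected_by_slot: Dict[datetime, int]) -> List[Tuple[datetime, int]]:
    """Turn {slot_start: expected concurrent sessions} into [(prewarm_at, instances)]."""
    return [
        (slot_start - PREWARM_LEAD, instances_needed(sessions))
        for slot_start, sessions in sorted(expected_by_slot.items())
    ]
```

At the 47,000-session Monday peak mentioned in this post, this rule gives 84 instances (47,000 / 560, rounded up), requested 30 minutes before the slot starts.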

Peak load tested: 47,000+ concurrent sessions during Monday morning clinic hours with consistent quality and latency.
Latency: Around 130ms glass-to-glass (camera capture to remote display) for sessions within the same region. Cross-region adds 30-50ms.
Availability: 99.97% over a 12-month measurement period. The downtime was concentrated in two incidents, both related to AWS dependency issues rather than application failures.
Failover: Automatic region failover completes in under 10 seconds with session migration that participants experience as a brief quality dip, not a disconnect.

On cost: a custom SFU is more work to build and operate than using a CPaaS provider, but the per-minute cost is significantly lower at this scale. The break-even point depends on volume, but for a health system running tens of thousands of daily sessions, the economics favor a custom build. Below a few thousand daily sessions, a managed service like Twilio or Vonage is probably the right call.

Compliance Lessons and Certification Path

HITRUST CSF r2 certification took 14 months from scoping to final assessment. Our scoped assessment covered 485 control requirements across HITRUST's 19 domains. Our approach was to bake compliance checks into the CI/CD pipeline rather than treating certification as a separate annual exercise. Every pull request runs automated checks that verify encryption configurations, access control policies, and audit logging completeness. This means compliance drift is caught at code review time, not during the annual audit.
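A pipeline compliance gate can be as simple as a list of check functions run against the deployment configuration. Everything here is hypothetical: the config keys, the three sample checks, and the convention of returning failure messages. The sketch only illustrates the pattern of failing a build on compliance drift.

```python
import json

def check_encryption_at_rest(cfg: dict) -> list:
    return [
        f"bucket '{name}' is not encrypted at rest"
        for name, bucket in cfg.get("buckets", {}).items()
        if not bucket.get("encrypted", False)
    ]

def check_signaling_tls(cfg: dict) -> list:
    version = cfg.get("signaling", {}).get("min_tls_version")
    return [] if version == "1.3" else [f"signaling min TLS version is {version!r}, expected '1.3'"]

def check_audit_logging(cfg: dict) -> list:
    return [
        f"service '{name}' has audit logging disabled"
        for name, svc in cfg.get("services", {}).items()
        if not svc.get("audit_log_enabled", False)
    ]

CHECKS = (check_encryption_at_rest, check_signaling_tls, check_audit_logging)

def run_checks(cfg: dict) -> list:
    """Run every check and collect all failure messages (empty list means compliant)."""
    failures = []
    for check in CHECKS:
        failures.extend(check(cfg))
    return failures

def main(path: str) -> int:
    """Load a JSON config export, print failures, and return a CI exit code."""
    with open(path, encoding="utf-8") as fh:
        failures = run_checks(json.load(fh))
    for failure in failures:
        print(f"FAIL: {failure}")
    return 1 if failures else 0  # non-zero exit fails the CI job
```

In CI, `main` would be pointed at a JSON export of the deployment configuration, and its return value used as the process exit code.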

The OCR investigation that precipitated this project ultimately closed with zero findings against the new platform. The auditors noted the zero-knowledge architecture and tamper-evident logging favorably. More practically, the fact that we could answer every audit question with a query against the audit log rather than a manual investigation saved significant time during the review.

Automate compliance checks: We run over a thousand automated compliance checks on every deployment. Manual compliance reviews cannot keep pace with continuous deployment cadences. Automation catches issues before they reach production.
Design for auditability: Every system action generates an audit event. The goal is that when an auditor asks "who accessed what, when, and why," the answer is a database query, not a week-long research project.
Automate BAA tracking: We built a system that tracks vendor agreements, monitors compliance obligations, and flags expiring or non-compliant agreements. The previous tracking mechanism was a spreadsheet, which had contributed to the original security incident through a lapsed vendor agreement.
Test incident response regularly: Quarterly tabletop exercises for security incidents and quarterly penetration tests. The pen tests found a couple of medium-severity issues in year one, both remediated within 48 hours. The value of regular testing is that it keeps the incident response process practiced, not theoretical.
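The core of BAA tracking is a scheduled check over vendor agreements. This minimal sketch flags PHI-handling vendors in three cases: no signed BAA, an expired BAA, or a BAA expiring within a warning window. The record fields and the 60-day window are assumptions, not details of the system described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass(frozen=True)
class VendorAgreement:
    vendor: str
    handles_phi: bool
    baa_signed: bool
    baa_expires_on: Optional[date]   # None means no fixed expiry

def flag_agreements(agreements, today: date, warn_days: int = 60) -> list:
    """Return (vendor, issue) pairs for PHI-handling vendors that need attention."""
    warn_until = today + timedelta(days=warn_days)
    flags = []
    for a in agreements:
        if not a.handles_phi:
            continue                      # assumption: vendors without PHI access are out of scope
        if not a.baa_signed:
            flags.append((a.vendor, "missing_baa"))
        elif a.baa_expires_on is not None and a.baa_expires_on < today:
            flags.append((a.vendor, "expired"))
        elif a.baa_expires_on is not None and a.baa_expires_on <= warn_until:
            flags.append((a.vendor, "expiring_soon"))
    return flags
```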

The most useful insight from this project: HIPAA compliance and good user experience are not in tension. Our most secure features, like automated session termination and device attestation, also improved UX by eliminating confusing error states and ensuring sessions start from a known-good posture. Security controls that create friction for users are usually poorly designed security controls.
