Cloud Architecture for Healthcare in 2026: HIPAA, Multi-Cloud Strategy, and AI Infrastructure

Healthcare cloud spending is estimated to have surpassed $100 billion globally in 2025 and is projected to exceed $200 billion by 2030. The growth is driven by EHR migrations to managed cloud infrastructure, the data infrastructure requirements of clinical AI, and the interoperability mandates that require health systems to expose FHIR APIs — a task far easier in cloud-native architecture than in on-premises systems.

But healthcare cloud adoption produces two very different outcomes depending on how it is architected. Done correctly, it enables faster AI development, better disaster recovery, lower total infrastructure cost, and interoperability with the broader health data ecosystem. Done poorly, it introduces compliance risk, creates data silos that are harder to query than the on-premises systems they replaced, and generates infrastructure costs that exceed the on-premises baseline because the cloud environment was not designed with healthcare-specific constraints in mind.

This article covers the architectural decisions that determine which outcome you get.

The Compliance Foundation: What HIPAA Requires from Cloud Infrastructure

HIPAA does not prohibit cloud storage or processing of Protected Health Information. It requires that any cloud environment handling PHI meets specific technical, administrative, and physical safeguard requirements — and that every cloud vendor processing PHI on your behalf signs a Business Associate Agreement (BAA).

The three major cloud providers all offer HIPAA-eligible services:

Provider	HIPAA BAA	Healthcare-specific services	Key notes
AWS	Available; covers 150+ services	HealthLake, Comprehend Medical, Transcribe Medical	BAA must be explicitly accepted; not all services are BAA-covered
Azure	Available; included in most enterprise agreements	Azure Health Data Services, Health Bot	Check service list — coverage varies by region
Google Cloud	Available; covers core infrastructure and healthcare APIs	Cloud Healthcare API, Healthcare Natural Language API	Requires specific product selection; not global by default

What a HIPAA-compliant cloud configuration requires:

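Encryption at rest: All PHI storage (object storage, databases, backups) should use strong encryption, with AES-256 as the practical baseline, and keys held in a dedicated key management service (AWS KMS, Azure Key Vault, Google Cloud KMS) rather than platform-managed default keys you cannot audit. The HIPAA Security Rule does not name a specific algorithm; AES-256 is the widely used standard.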
Encryption in transit: TLS 1.2 minimum for all data transmission; TLS 1.3 preferred. Unencrypted connections must be disabled at the network policy level.
Access controls: Role-based access following the principle of least privilege. No shared service accounts. MFA required for all administrative access. PHI access requires explicit IAM policy grants.
Audit logging: CloudTrail (AWS), Azure Monitor (Azure), or Cloud Audit Logs (GCP) must be enabled for all PHI-touching services. Logs must be immutable, retained per your HIPAA retention schedule (minimum 6 years), and queryable for compliance investigations.
Breach detection: GuardDuty (AWS), Microsoft Defender (Azure), or Security Command Center (GCP) for automated threat detection. HIPAA’s 60-day breach notification window requires that you can detect incidents quickly enough to investigate and report within that window.
Network segmentation: PHI environments must be in dedicated VPCs/VNets with strict security group rules. Public internet access to PHI systems must be explicitly prohibited at the network layer, not just at the application layer.

The BAA covers the vendor’s liability for their infrastructure. It does not cover misconfigurations in your environment — those are your responsibility.
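Because misconfigurations stay on your side of the shared responsibility line, it can help to encode the checklist above as an automated check. The following is a minimal sketch, not a compliance tool: `StoreConfig` and `find_violations` are hypothetical names, and the fields are a simplified stand-in for whatever your cloud inventory or infrastructure-as-code tooling actually reports.

```python
from dataclasses import dataclass


@dataclass
class StoreConfig:
    """Hypothetical, simplified description of one PHI data store."""
    name: str
    encrypted_at_rest: bool
    customer_managed_key: bool   # key lives in KMS / Key Vault / Cloud KMS
    min_tls_version: tuple       # e.g. (1, 2) for TLS 1.2
    public_access: bool
    audit_logging: bool
    log_retention_years: int


def find_violations(cfg: StoreConfig) -> list[str]:
    """Return human-readable findings for one store; empty list means none found."""
    violations = []
    if not cfg.encrypted_at_rest:
        violations.append("encryption at rest is disabled")
    elif not cfg.customer_managed_key:
        violations.append("uses a platform default key instead of a managed KMS key")
    if cfg.min_tls_version < (1, 2):
        violations.append("accepts TLS versions below 1.2")
    if cfg.public_access:
        violations.append("reachable from the public internet")
    if not cfg.audit_logging:
        violations.append("audit logging is disabled")
    elif cfg.log_retention_years < 6:
        violations.append("audit logs retained for less than 6 years")
    return violations
```

A check like this would typically run in CI against every PHI-tagged resource, failing the pipeline when the returned list is non-empty.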

Healthcare-Specific Cloud Architecture Patterns

Pattern 1: The Data Lake for Clinical AI

Clinical AI models require training data at scale — longitudinal patient records, imaging data, lab results, clinical notes. A healthcare data lake consolidates this data from multiple source systems (EHR, lab, imaging, wearables) into a queryable store for ML training and population health analytics.

Architecture:

Source systems (EHR, LIS, PACS, RPM devices)
→ FHIR extraction (SMART on FHIR API / ETL)
→ Landing zone (S3 / ADLS Gen2 / GCS — raw format)
→ Curation layer (de-identification / normalisation)
→ Analytics layer (columnar format: Parquet / Delta Lake)
→ ML training environment (SageMaker / Azure ML / Vertex AI)
→ Feature store (real-time feature serving for inference)

Critical design decisions:

De-identification before analytics access. Raw PHI in the landing zone requires full HIPAA controls. The curated analytics layer should use de-identified or synthetic data wherever possible, reducing the PHI surface and enabling broader researcher and data scientist access.
FHIR as the normalisation standard. Extracting data from EHRs into FHIR resources (Patient, Observation, Condition, Encounter) standardises the schema across source systems. AWS HealthLake, Azure Health Data Services, and Google Cloud Healthcare API all provide managed FHIR stores with search and analytics capabilities.
Data versioning for model reproducibility. Clinical AI models must be retrained as new data arrives and patient populations change. Delta Lake or Apache Iceberg table formats enable time-travel queries — you can reconstruct the exact training dataset used for a model version, which is essential for model audit and regulatory review.
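The de-identification step in the curation layer can be sketched as follows. This is an illustrative fragment under stated assumptions, not a complete HIPAA Safe Harbor or Expert Determination implementation: the list of fields to drop, the keyed-hash pseudonym, and the per-patient date shift are all choices made for the example, and the function names are hypothetical. It also assumes `birthDate` is a full `YYYY-MM-DD` date, which FHIR does not guarantee.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Top-level FHIR Patient fields treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = ("name", "telecom", "address", "identifier", "photo", "contact")


def pseudonymise_id(patient_id: str, secret: bytes) -> str:
    """Keyed hash: stable for the same patient, not reversible without the secret."""
    return hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def date_shift_days(patient_id: str, secret: bytes, max_days: int = 30) -> int:
    """Deterministic per-patient offset in the range [-max_days, +max_days]."""
    digest = hmac.new(secret, b"shift:" + patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def deidentify_patient(resource: dict, secret: bytes) -> dict:
    """Drop direct identifiers, replace the id, and shift the birth date."""
    out = {k: v for k, v in resource.items() if k not in DIRECT_IDENTIFIERS}
    original_id = resource["id"]
    out["id"] = pseudonymise_id(original_id, secret)
    if "birthDate" in resource:
        # Assumes a full ISO date; partial dates (YYYY or YYYY-MM) need separate handling.
        shifted = date.fromisoformat(resource["birthDate"]) + timedelta(
            days=date_shift_days(original_id, secret)
        )
        out["birthDate"] = shifted.isoformat()
    return out
```

Using the same offset for every date belonging to one patient keeps intervals between events intact, which matters for longitudinal models. The secret should live in the key management service alongside the other PHI keys.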

Pattern 2: Multi-Cloud for Resilience and Best-of-Breed AI

Healthcare organisations with stringent uptime requirements (clinical systems, emergency department tools) increasingly run multi-cloud architectures — primary workloads on one cloud provider, DR on a second, with workload-specific services selected across providers based on capability.

Where multi-cloud makes sense in healthcare:

Primary EHR and clinical operations on one provider (often AWS, given Epic’s AWS partnership)
Medical imaging (DICOM) processing on a provider with the strongest GPU instance availability and medical imaging AI services
Population health analytics on whichever provider has the best partnership with your analytics tool of choice
DR on a second cloud provider with automated failover

Where multi-cloud adds complexity without value:

Small clinical operations where the management overhead exceeds the resilience benefit
Organisations without the cloud engineering team to maintain consistent security posture across multiple cloud environments
Latency-sensitive clinical applications where cross-cloud latency introduces unacceptable lag

Multi-cloud is not a best practice universally — it is the right choice for specific resilience and capability requirements in large, well-resourced health systems.

Pattern 3: Cloud-Native FHIR API Platform

The ONC rules implementing the 21st Century Cures Act require certified EHR technology to expose patient data via standardised FHIR APIs, and the Act’s information blocking provisions prohibit practices that interfere with that access. Building and operating a FHIR API platform on cloud infrastructure rather than on-premises provides the scalability, uptime, and managed service options that make this requirement operationally sustainable.

All three major cloud providers offer managed FHIR stores:

AWS HealthLake: FHIR R4 store with built-in search, analytics export, and NLP services for clinical note processing
Azure Health Data Services: FHIR R4, DICOM, and MedTech (IoT device data) services in an integrated platform
Google Cloud Healthcare API: FHIR R4, DICOM, and HL7v2 stores with BigQuery integration for analytics

Using a managed FHIR store eliminates the infrastructure management of running and scaling a FHIR server yourself. The trade-off is vendor dependency and the cost of managed services at scale.
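Whichever managed store you choose, clients talk to it through the standard FHIR REST search interface, which limits how much of your application code is tied to one vendor. As a small sketch, the function below builds an Observation search URL using standard FHIR search parameters (`patient`, `code`, `date`, `_count`). The base URL is a placeholder and `observation_search_url` is a hypothetical helper; authentication, which every managed store requires, is left out.

```python
from urllib.parse import urlencode


def observation_search_url(
    base_url: str, patient_id: str, loinc_code: str, since: str, count: int = 50
) -> str:
    """Build a FHIR search URL for one patient's observations of a given LOINC code."""
    params = [
        ("patient", patient_id),
        # Token search syntax: system|code
        ("code", f"http://loinc.org|{loinc_code}"),
        # "ge" prefix means on or after the given date
        ("date", f"ge{since}"),
        ("_count", str(count)),
    ]
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"


# Example: heart rate observations (LOINC 8867-4) since the start of 2025.
url = observation_search_url("https://fhir.example.org/r4", "abc", "8867-4", "2025-01-01")
```

Keeping query construction in one place like this makes it easier to point the same client at a different provider's FHIR endpoint later.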

AI Infrastructure for Healthcare Cloud

Healthcare AI workloads have specific infrastructure requirements that differ from general ML workloads.

Medical imaging (DICOM): Training imaging AI models requires GPU clusters with access to large DICOM datasets. Managed training infrastructure (SageMaker, Azure ML, Vertex AI) with GPU instance types and integration with DICOM stores reduces the infrastructure management burden. Inference for imaging AI in production requires GPU instances for latency-sensitive inference or CPU-optimised models for batch processing.

Clinical NLP: Processing clinical notes — extracting diagnoses, medications, and procedures from unstructured text — uses specialised medical NLP models. AWS Comprehend Medical, Azure Text Analytics for Health, and Google Cloud Healthcare Natural Language API provide managed clinical NLP without requiring custom model development. For organisations needing custom models (rare disease terminology, specific clinical specialties), fine-tuning on proprietary clinical notes requires PHI-compliant training environments with full audit trails.

Real-time inference for clinical decision support: Clinical decision support tools that alert clinicians in real time (sepsis detection, medication interaction checking) require sub-second inference latency. This requires inference infrastructure close to the data — either in the same cloud region as the EHR data, or deployed at the edge within the hospital network. Cloud-hosted inference with round-trip latency over 500ms is not suitable for time-critical clinical alerts.

Cost Architecture: Where Healthcare Cloud Deployments Go Wrong

The most common failure mode in healthcare cloud migration is moving on-premises costs to the cloud without redesigning the architecture to take advantage of cloud economics. The result: cloud costs exceed on-premises costs while the organisation is also still operating on-premises infrastructure during a prolonged migration.

The cost levers that matter:

Storage tiering: Medical imaging (DICOM) generates massive data volumes. Active images should be in standard object storage; archived images should automatically tier to lower-cost storage classes (S3 Glacier, Azure Archive, GCS Archive) based on age and access patterns. A hospital that doesn’t implement storage tiering will see object storage costs compound every year as imaging volumes grow.
Compute right-sizing: Over-provisioned instances, sized for peak load but running at 20% utilisation, are the most common source of cloud cost waste. Use auto-scaling for variable workloads and reserved or savings plan pricing for constant baseline workloads.
Data egress costs: Cloud providers charge for data leaving their network. Applications that generate high cross-region or cross-cloud data transfers incur egress costs that are invisible during architecture design and significant at scale. Design data flows to minimise egress.
Managed service vs. self-managed trade-off: Managed FHIR stores, managed databases, and managed ML platforms cost more per compute unit than self-managed equivalents, but the price includes the operational work of updates, patching, and scaling that your team would otherwise carry. For small engineering teams, managed services usually reduce total cost once engineering time is counted.
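To make these levers visible before migration rather than after, a rough cost model can be kept alongside the architecture design. The sketch below is illustrative only: the per-GB prices are placeholder values chosen to make the arithmetic easy to follow, not any provider's published rates, and the function names are hypothetical.

```python
# Placeholder prices in USD, chosen only to make the arithmetic visible.
# They are NOT any provider's published rates; substitute your own.
STORAGE_PRICE_PER_GB_MONTH = {"standard": 0.020, "infrequent": 0.010, "archive": 0.002}
EGRESS_PRICE_PER_GB = 0.09


def monthly_storage_cost(gb_by_tier: dict[str, float]) -> float:
    """Sum storage cost across tiers for one month."""
    return sum(gb * STORAGE_PRICE_PER_GB_MONTH[tier] for tier, gb in gb_by_tier.items())


def monthly_egress_cost(gb_transferred_out: float) -> float:
    """Cost of data leaving the provider's network in one month."""
    return gb_transferred_out * EGRESS_PRICE_PER_GB


def monthly_estimate(gb_by_tier: dict[str, float], gb_transferred_out: float) -> dict[str, float]:
    """Combine storage and egress into one estimate."""
    storage = monthly_storage_cost(gb_by_tier)
    egress = monthly_egress_cost(gb_transferred_out)
    return {"storage": storage, "egress": egress, "total": storage + egress}


# Same 100 TB of imaging data, with and without tiering, same egress volume.
untiered = monthly_estimate({"standard": 100_000}, gb_transferred_out=5_000)
tiered = monthly_estimate(
    {"standard": 10_000, "infrequent": 30_000, "archive": 60_000}, gb_transferred_out=5_000
)
```

Comparing the two estimates shows how tiering changes the storage line while egress stays fixed, which is the kind of comparison worth running for each proposed data flow.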

How we approach this at Insoftex

Healthcare cloud architecture is one of the areas where we most consistently encounter the failure mode described in the cost section: organisations migrating on-premises infrastructure to cloud without redesigning the architecture to take advantage of cloud economics. The migration produces a higher bill and a more complex operational environment than the on-premises predecessor. The root cause is almost always that cost modelling was not part of the architecture design — storage tiering, compute right-sizing, and data egress were discovered as cost surprises after the system was running, not designed as cost controls before the migration was complete.

For our AI-powered healthcare platform engagement, HIPAA compliance was the first architectural constraint — before cloud service selection, before data model design, before API surface definition. The shared responsibility model means that the HIPAA obligation sits with the healthcare organisation and the software team, not with the cloud provider. A BAA with AWS does not make a misconfigured S3 bucket HIPAA-compliant. We review the specific cloud service configurations and their PHI handling implications — storage encryption, access logging, cross-region replication policies — as part of the architecture design, not as a post-deployment compliance review.

On FHIR integration timing, our direct experience on healthcare platform builds is that semantic mapping between a client’s existing clinical data structures and the FHIR resource model should happen in the scoping phase rather than during build. The FHIR data model looks deceptively simple at the resource level; the complexity is in the semantic mapping, where a client’s diagnosis code system, medication identifiers, or provider hierarchy does not map cleanly to FHIR’s assumed structure. Discovering this during implementation forces data model changes that are significantly more expensive than the same discovery made before the architecture is set.


Frequently Asked Questions

Does moving to the cloud make HIPAA compliance easier or harder?

It depends on the cloud provider and how the environment is configured. The major cloud providers (AWS, Azure, GCP) offer HIPAA-eligible infrastructure with BAAs, encryption, access controls, and audit logging that meet HIPAA technical safeguard requirements — and in many cases, these controls are easier to configure and verify than equivalent on-premises controls. The risk is misconfiguration: cloud environments expose more configuration surface than on-premises environments, and a misconfigured S3 bucket, overly permissive IAM policy, or missing encryption setting can create HIPAA exposure that would not exist in a correctly managed on-premises system. The answer is that cloud can make compliance easier, but it requires deliberate compliance-first architecture — not default configurations.

What is the difference between AWS HealthLake, Azure Health Data Services, and Google Cloud Healthcare API?

All three are managed FHIR R4 data stores with search, analytics integration, and clinical NLP capabilities. The differences: AWS HealthLake integrates with SageMaker for ML and with Amazon Comprehend Medical for clinical NLP; it is most commonly chosen by organisations already on AWS or working with Epic (which has a deep AWS partnership). Azure Health Data Services includes FHIR, DICOM, and MedTech (IoT medical device) services in an integrated platform — useful for organisations that need imaging and device data alongside clinical records. Google Cloud Healthcare API integrates with BigQuery for population health analytics, which is particularly powerful for organisations doing large-scale data analysis on clinical cohorts. The practical choice depends on which cloud provider your organisation uses for other workloads and which analytics and ML tools you use — the FHIR capabilities are broadly comparable across all three.

How do you handle medical imaging (DICOM) storage cost in the cloud?

Medical imaging is the largest contributor to healthcare storage costs: a single CT scan can be 500 MB to several GB, and a large health system generates petabytes of imaging data annually. Cloud cost management for DICOM: (1) Implement automated storage tiering, with images accessed in the last 90 days in standard storage, images 90 days to 3 years old in infrequent-access tiers, and images older than 3 years in archive tiers. This can cut storage cost by 60–80% compared with keeping all images in standard storage. (2) Use DICOM-aware compression where clinically appropriate; lossless JPEG 2000 reduces file sizes without compromising diagnostic quality. (3) Implement lifecycle policies that enforce tiering automatically at the object storage level, not as a manual process. (4) Track DICOM storage separately in your cloud cost allocation tags so you can see imaging cost trends independently of application infrastructure cost.
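As a sketch of point (3), the function below builds a lifecycle configuration in the shape Amazon S3 expects, using the 90-day and 3-year thresholds from the answer above. The `dicom/` prefix, the rule ID, and the choice of storage classes are assumptions for illustration. Note that S3 lifecycle transitions are driven by object age, not by last access, so age is used here as a proxy for access recency.

```python
def dicom_lifecycle_configuration(prefix: str = "dicom/") -> dict:
    """Build an S3 lifecycle configuration that tiers imaging objects by age."""
    return {
        "Rules": [
            {
                "ID": "dicom-age-based-tiering",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    # After 90 days: move to an infrequent-access class.
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    # After 3 years (1095 days): move to an archive class.
                    {"Days": 1095, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }
```

With boto3, a dictionary like this can be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call. Azure Blob Storage and Google Cloud Storage offer comparable lifecycle rules with their own schemas. Before archiving, confirm that retrieval times for the archive class are acceptable for prior-study comparisons.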

What are the latency requirements for clinical AI inference in cloud environments?

Latency requirements vary by clinical use case. Passive decision support (risk scores surfaced in a dashboard that clinicians review when they choose to) can tolerate 2–5 second inference latency — the result is displayed when the page loads, not in the middle of a workflow. Active alerts (sepsis detection, deterioration warnings, medication interaction checks that interrupt a clinician's workflow) must complete in under 500ms to be acceptable in clinical use; under 200ms is preferred. Real-time applications (AI-assisted interpretation during a procedure, live transcription for ambient documentation) require sub-100ms latency and typically cannot tolerate cloud round-trip overhead — they require edge deployment within or near the clinical environment. Design your inference infrastructure based on the specific latency requirement of the use case, not a generic 'cloud inference' assumption.
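The thresholds above can be turned into a simple acceptance check for a candidate deployment. This is a minimal sketch: the budgets come from the figures in the answer, but judging them against the 95th percentile of measured latencies is an assumption made for this example, and the names are hypothetical.

```python
from enum import Enum


class UseCase(Enum):
    PASSIVE = "passive decision support"
    ACTIVE_ALERT = "active alert"
    REAL_TIME = "real-time application"


# Upper latency bounds in milliseconds, taken from the figures quoted above.
LATENCY_BUDGET_MS = {
    UseCase.PASSIVE: 5000,
    UseCase.ACTIVE_ALERT: 500,
    UseCase.REAL_TIME: 100,
}


def p95(samples_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method (integer arithmetic)."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def meets_budget(use_case: UseCase, samples_ms: list[float]) -> bool:
    """True if the measured p95 latency is under the budget for this use case."""
    return p95(samples_ms) < LATENCY_BUDGET_MS[use_case]
```

Measure the samples end to end, from the clinical application to the inference endpoint and back, since network round-trip time is usually what pushes cloud-hosted inference over the budget.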