Hybrid SQL–Synapse Data Mesh
This blueprint enables a domain-driven data mesh spanning on-prem SQL Server, Azure SQL, and Azure Synapse Analytics, with governed data products that publish read-optimized interfaces to consumers. It balances regulatory controls, performance, and cost using serverless/elastic compute, caching, and layered storage.
When to Use
- Regulated enterprises modernizing large SQL estates with incremental migration to Azure.
- Federated domains (Payments, Risk, Client 360) needing productized data with SLOs.
- Hybrid footprints where some sources remain on-prem for latency, data gravity, or compliance.
Logical Architecture
- Source Plane: On-Prem SQL Server, Oracle (optional), Azure SQL DB/MI, Event Hub/Kafka, SaaS APIs.
- Ingestion & Processing: ADF/Synapse Pipelines for batch; CDC via Azure DMS/SQL MI Link; streaming via Event Hub & Spark Structured Streaming.
- Storage Layers (ADLS Gen2): Raw (immutable), Curated (conformed), Product (domain contracts).
- Compute: Synapse serverless SQL for ad-hoc & virtualization; dedicated SQL pool for MPP warehouse; Spark for transformation/feature engineering.
- Serving: Azure SQL (read replicas), Synapse external tables/views, Power BI semantic models, APIs.
- Governance: Microsoft Purview (catalog, lineage, DQ), policy-as-code, Key Vault, private endpoints.
- Observability & FinOps: Log Analytics + Azure Monitor, SLO dashboards, workload isolation, cost guardrails.
Core Components
| Component | Purpose | Notes |
|---|---|---|
| Azure Data Factory / Synapse Pipelines | Batch ingestion & orchestration | IR for hybrid connectivity; trigger on CDC windows; parameterized pipelines. |
| SQL MI Link / DMS + CDC | Low-latency incremental replication | Use for near-real time ingestion from on-prem SQL to ADLS/Synapse. |
| ADLS Gen2 (Raw→Curated→Product) | Lake zoning & lifecycle | Immutable raw, curated conformed, product for domain contracts & SLAs. |
| Synapse (Serverless + Dedicated) | Virtualization + MPP warehousing | Serverless for cheap, elastic access; dedicated for heavy, repeatable workloads. |
| Azure SQL (read replicas) | Operational serving & APIs | Publish productized tables to consumer apps; optionally read replicas for scale. |
| Microsoft Purview | Catalog, lineage, DQ policies | Register domains, enforce PII tags, publish product contracts. |
| Entra ID + Private Endpoints | Zero-trust access control | MSI, RBAC, row-level & column-level security; VNet injection. |
Domain Data Products
Each domain publishes a contract (schema + semantics + SLOs) and exposes governed interfaces:Synapse views/external tables, Azure SQL tables, or Power BI semantic models.
- Schema Evolution: versioned contracts; backward-compatible changes via views.
- Quality: DQ rules (completeness, uniqueness, timeliness) validated in Curated before Product.
- Access: consumers onboarded via Purview collections + Entra security groups.
Security & Compliance
- Identity-based access (MSI) for pipelines and compute; secrets in Key Vault.
- Private endpoints for ADLS, Synapse, Azure SQL; no public exposure.
- Row-Level Security and Dynamic Masking on serving endpoints for least privilege.
- PII/PCI tagging in Purview; policy-driven sharing with audit trails.
Resilience & DR
- RPO: 15–30 mins (CDC to lake), RTO: 1–2 hrs (warm serverless + staged metadata).
- Geo-redundant storage (RA-GRS) for lake; recover metadata (Purview) via backup/export.
- Stateless pipelines; re-playable CDC offsets; checkpointed Spark streaming.
Performance & Cost (FinOps)
- Use serverless SQL for ad-hoc/interactive; auto-pause dedicated pools out of hours.
- Partitioning & file sizing: prefer 100–500 MB parquet files; pushdown predicates.
- Tiered storage & lifecycle policies; archive Curated after SLA windows.
Reference Diagram

Example: Serverless External Table on ADLS
-- In Synapse Serverless
CREATE EXTERNAL DATA SOURCE ds_adls
WITH ( TYPE = HADOOP, LOCATION = 'abfss://curated@<storageaccount>.dfs.core.windows.net' );
CREATE EXTERNAL FILE FORMAT parquet_format
WITH ( FORMAT_TYPE = PARQUET );
CREATE EXTERNAL TABLE product_payments
WITH (
LOCATION = '/payments/v1/',
DATA_SOURCE = ds_adls,
FILE_FORMAT = parquet_format
)
AS SELECT * FROM OPENROWSET(
BULK 'abfss://curated@<storageaccount>.dfs.core.windows.net/payments/v1/*.parquet',
FORMAT='PARQUET'
) AS src;Example: Contracted Consumer View
CREATE VIEW product_payments_v1 AS
SELECT
transaction_id,
merchant_id,
amount,
currency,
txn_ts AT TIME ZONE 'UTC' AS txn_utc,
customer_id -- masked / RLS applied downstream
FROM product_payments
WHERE txn_ts >= DATEADD(day, -90, SYSUTCDATETIME());Operational Runbook (Brief)
- Provision Purview, workspaces, storage (raw/curated/product), Key Vault, private endpoints.
- Stand up ingestion (CDC + batch) with parameterized pipelines per domain.
- Implement DQ checks → promote to Product only on pass.
- Expose product contracts (views/tables/models) and register in Purview.
- Set SLOs, alerts (latency, freshness, failure), and weekly cost/usage reviews.
Summary
This hybrid SQL–Synapse mesh lets domains publish governed, reliable data products with predictable SLOs while controlling cost via serverless access and tiered storage. It’s a pragmatic path for regulated institutions to modernize without a risky big-bang migration.