Hybrid SQL–Synapse Data Mesh

This blueprint enables a domain-driven data mesh spanning on-prem SQL Server, Azure SQL, and Azure Synapse Analytics, with governed data products that publish read-optimized interfaces to consumers. It balances regulatory controls, performance, and cost using serverless/elastic compute, caching, and layered storage.

When to Use

Regulated enterprises modernizing large SQL estates with incremental migration to Azure.
Federated domains (Payments, Risk, Client 360) needing productized data with SLOs.
Hybrid footprints where some sources remain on-prem for latency, data gravity, or compliance.

Logical Architecture

Source Plane: On-Prem SQL Server, Oracle (optional), Azure SQL DB/MI, Event Hub/Kafka, SaaS APIs.
Ingestion & Processing: ADF/Synapse Pipelines for batch; CDC via Azure DMS/SQL MI Link; streaming via Event Hub & Spark Structured Streaming.
Storage Layers (ADLS Gen2): Raw (immutable), Curated (conformed), Product (domain contracts).
Compute: Synapse serverless SQL for ad-hoc & virtualization; dedicated SQL pool for MPP warehouse; Spark for transformation/feature engineering.
Serving: Azure SQL (read replicas), Synapse external tables/views, Power BI semantic models, APIs.
Governance: Microsoft Purview (catalog, lineage, DQ), policy-as-code, Key Vault, private endpoints.
Observability & FinOps: Log Analytics + Azure Monitor, SLO dashboards, workload isolation, cost guardrails.

Core Components

Component	Purpose	Notes
Azure Data Factory / Synapse Pipelines	Batch ingestion & orchestration	IR for hybrid connectivity; trigger on CDC windows; parameterized pipelines.
SQL MI Link / DMS + CDC	Low-latency incremental replication	Use for near-real time ingestion from on-prem SQL to ADLS/Synapse.
ADLS Gen2 (Raw→Curated→Product)	Lake zoning & lifecycle	Immutable raw, curated conformed, product for domain contracts & SLAs.
Synapse (Serverless + Dedicated)	Virtualization + MPP warehousing	Serverless for cheap, elastic access; dedicated for heavy, repeatable workloads.
Azure SQL (read replicas)	Operational serving & APIs	Publish productized tables to consumer apps; optionally read replicas for scale.
Microsoft Purview	Catalog, lineage, DQ policies	Register domains, enforce PII tags, publish product contracts.
Entra ID + Private Endpoints	Zero-trust access control	MSI, RBAC, row-level & column-level security; VNet injection.

Domain Data Products

Each domain publishes a contract (schema + semantics + SLOs) and exposes governed interfaces:Synapse views/external tables, Azure SQL tables, or Power BI semantic models.

Schema Evolution: versioned contracts; backward-compatible changes via views.
Quality: DQ rules (completeness, uniqueness, timeliness) validated in Curated before Product.
Access: consumers onboarded via Purview collections + Entra security groups.

Security & Compliance

Identity-based access (MSI) for pipelines and compute; secrets in Key Vault.
Private endpoints for ADLS, Synapse, Azure SQL; no public exposure.
Row-Level Security and Dynamic Masking on serving endpoints for least privilege.
PII/PCI tagging in Purview; policy-driven sharing with audit trails.

Resilience & DR

RPO: 15–30 mins (CDC to lake), RTO: 1–2 hrs (warm serverless + staged metadata).
Geo-redundant storage (RA-GRS) for lake; recover metadata (Purview) via backup/export.
Stateless pipelines; re-playable CDC offsets; checkpointed Spark streaming.

Performance & Cost (FinOps)

Use serverless SQL for ad-hoc/interactive; auto-pause dedicated pools out of hours.
Partitioning & file sizing: prefer 100–500 MB parquet files; pushdown predicates.
Tiered storage & lifecycle policies; archive Curated after SLA windows.

Reference Diagram

Example: Serverless External Table on ADLS

-- In Synapse Serverless
CREATE EXTERNAL DATA SOURCE ds_adls
WITH ( TYPE = HADOOP, LOCATION = 'abfss://curated@<storageaccount>.dfs.core.windows.net' );

CREATE EXTERNAL FILE FORMAT parquet_format
WITH ( FORMAT_TYPE = PARQUET );

CREATE EXTERNAL TABLE product_payments
WITH (
  LOCATION = '/payments/v1/',
  DATA_SOURCE = ds_adls,
  FILE_FORMAT = parquet_format
)
AS SELECT * FROM OPENROWSET(
  BULK 'abfss://curated@<storageaccount>.dfs.core.windows.net/payments/v1/*.parquet',
  FORMAT='PARQUET'
) AS src;

Example: Contracted Consumer View

CREATE VIEW product_payments_v1 AS
SELECT
  transaction_id,
  merchant_id,
  amount,
  currency,
  txn_ts AT TIME ZONE 'UTC' AS txn_utc,
  customer_id  -- masked / RLS applied downstream
FROM product_payments
WHERE txn_ts &gt;= DATEADD(day, -90, SYSUTCDATETIME());

Operational Runbook (Brief)

Provision Purview, workspaces, storage (raw/curated/product), Key Vault, private endpoints.
Stand up ingestion (CDC + batch) with parameterized pipelines per domain.
Implement DQ checks → promote to Product only on pass.
Expose product contracts (views/tables/models) and register in Purview.
Set SLOs, alerts (latency, freshness, failure), and weekly cost/usage reviews.

Summary

This hybrid SQL–Synapse mesh lets domains publish governed, reliable data products with predictable SLOs while controlling cost via serverless access and tiered storage. It’s a pragmatic path for regulated institutions to modernize without a risky big-bang migration.