Two perimeters, one trust anchor

TL;DR

At some point, every Microsoft cloud environment hits the same wall: you stop knowing what’s set, where the gaps are, and whether the configuration that exists today matches what was intended. Manual discovery doesn’t scale. Neither does institutional memory.

The only durable answer is to declare tenant configuration in code — and then design the identity architecture so that the pipelines delivering that configuration are as tightly governed as the human access they sit alongside.

This blog is about building Microsoft cloud environments you can prove. The two layers that make this possible — the identity architecture that governs who can change the tenant, and the IaC pipeline that actually makes those changes — are the same design problem at different scopes.

The problem: tenant state you can’t account for

Spend enough time inside Microsoft cloud environments and a pattern emerges. Conditional Access exceptions that made sense when they were created. PIM role assignments that outlived the project that created them. Defender settings configured correctly on day one and never reviewed since. M365 self-service licence settings that opened a surface nobody tracked. Tenant security defaults with known gaps that were documented and never closed because the fix felt too risky to apply manually.

The frustration isn’t that those changes happened. The frustration is that there is no systematic way to discover what’s set, what’s missing, and what has drifted — unless you already know where to look. In a tenant that spans Conditional Access, Defender XDR, tenant security defaults, M365 service configurations, Azure control plane resources, and automation coverage, there is no single place to look. You find things one at a time, usually after something breaks or a question you can’t answer comes up in a review.

That raises an honest question: if you don’t have a complete declaration of what your tenant is supposed to look like, what does “fix the gap” actually mean?

Everything-as-code changes the question

The shift from portal administration to everything-as-code changes the question you’re able to ask. Instead of “what is set?” you can ask “does the tenant match the declaration?” That is a substantially more useful question.

This is the everything-as-code (EaC) commitment: Conditional Access as code, PIM as code, Defender baselines as code, Sentinel rules as code, M365 tenant settings as code, Azure infrastructure as code. Every layer of tenant state that can drift can instead be declared.

When those policies, assignments, baselines, rules, and settings are declared in version-controlled repos and delivered by pipeline, gap analysis becomes mechanical. The source of truth is the repository. The audit trail is the pipeline run log. Replicating a known-good configuration to a new tenant — a new customer, a new environment, a migration target — means running the pipeline against a new scope.
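To make "gap analysis becomes mechanical" concrete, here is a minimal sketch of comparing a declared state against an exported one. The policy names, the nested layout, and the `flatten` and `diff_state` helpers are all illustrative assumptions, not the tooling used on the projects described here.

```python
from typing import Any


def flatten(state: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested settings into dotted paths so they compare key by key."""
    flat: dict[str, Any] = {}
    for key, value in state.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_state(declared: dict[str, Any], observed: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return settings that are missing, undeclared, or drifted."""
    want, have = flatten(declared), flatten(observed)
    return {
        # Declared in the repository but absent from the tenant.
        "missing": {k: want[k] for k in want.keys() - have.keys()},
        # Present in the tenant but never declared.
        "undeclared": {k: have[k] for k in have.keys() - want.keys()},
        # Present on both sides with different values.
        "drifted": {
            k: {"declared": want[k], "observed": have[k]}
            for k in want.keys() & have.keys()
            if want[k] != have[k]
        },
    }


# Hypothetical declaration (from the repository) and export (from the tenant).
declared = {
    "ca-require-mfa-admins": {"state": "enabled", "grantControls": {"operator": "OR"}},
    "ca-block-legacy-auth": {"state": "enabled"},
}
observed = {
    "ca-require-mfa-admins": {
        "state": "enabledForReportingButNotEnforced",
        "grantControls": {"operator": "OR"},
    },
    "ca-temp-exception": {"state": "enabled"},
}

report = diff_state(declared, observed)
```

The three buckets map onto the three questions from earlier: what is missing, what exists that nobody declared, and what has drifted.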

This is the architecture that makes tenant governance tractable, and it’s what I’ve built across the projects currently on my plate:

  • An MSSP platform managing Conditional Access, PIM, Entra ID settings, and Landing Zone configurations for multiple customers from three Azure DevOps organisations. Three Workload Identity Federation service principals are each scoped to one domain, with no PATs and no client secrets anywhere in the chain.
  • Customer onboarding on that platform is JSON-driven: a single config file is the source of truth for all deployment flags, and the pipeline composes it into Terraform calls, Graph API writes, and RBAC assignments.
  • PIM run as one system across three activation tiers, with Conditional Access authentication contexts enforced at activation. The policy lives in Git, the enforcement is in the Entra primitive, and the pipeline run is the change record.
  • AKS-hosted banking applications where every identity component (frontend, IDP, backend, data plane, Key Vault, Event Hub) is provisioned through GitLab CI with Terraform and Managed Identity, with no service account passwords anywhere in the chain.
  • M365 tenant migrations where the destination configuration is fully declared before the first user moves.
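As an illustration of the JSON-driven onboarding described above, the sketch below turns one customer config file into an ordered list of delivery steps. The flag names, step names, and the `plan_steps` helper are hypothetical; the real config schema is not shown in this post.

```python
import json

# Hypothetical flag names; each maps to the delivery step it switches on.
STEP_FOR_FLAG = {
    "deployLandingZone": "terraform-apply",
    "deployConditionalAccess": "graph-write-conditional-access",
    "deployPim": "graph-write-pim",
    "assignRbac": "rbac-assignments",
}


def plan_steps(config_text: str) -> list[str]:
    """Turn one customer's config file into an ordered list of pipeline steps."""
    config = json.loads(config_text)
    flags = config.get("flags", {})
    unknown = sorted(set(flags) - set(STEP_FOR_FLAG))
    if unknown:
        # Fail closed: a typo in a flag name must not silently skip a deployment.
        raise ValueError(f"unknown flags: {unknown}")
    # Only flags explicitly set to true enable a step; order follows STEP_FOR_FLAG.
    return [step for flag, step in STEP_FOR_FLAG.items() if flags.get(flag) is True]


customer_config = """
{
  "customer": "contoso",
  "flags": {"deployLandingZone": true, "deployConditionalAccess": true, "deployPim": false}
}
"""
steps = plan_steps(customer_config)
```

Rejecting unknown flags is a design choice in this sketch: if the config file is the single source of truth, a misspelled flag should stop the run, not quietly deploy less than was declared.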

In each case, the practical approach is the same: Azure DevOps or GitLab pipelines using Workload Identity Federation, service principals that authenticate via OIDC token exchange, Terraform or Bicep declaring the desired state. The pipeline runs are the change record. The repository is the configuration snapshot. Assessment is a diff.
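A minimal sketch of the request shape behind the OIDC token exchange mentioned above, assuming the Microsoft identity platform v2.0 token endpoint and the client-credentials grant with a federated client assertion. The tenant ID, client ID, and token value are placeholders, and `build_token_request` is an illustrative helper. In practice the pipeline task or Terraform provider performs this exchange for you. The point is only that the request carries a short-lived assertion where a client secret would otherwise go.

```python
from urllib.parse import urlencode

# Standard assertion type for presenting a JWT as client authentication (RFC 7523).
JWT_BEARER = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def build_token_request(
    tenant_id: str,
    client_id: str,
    oidc_token: str,
    scope: str = "https://graph.microsoft.com/.default",
) -> tuple[str, str]:
    """Return (url, form_body) for a secretless client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "scope": scope,
            "client_assertion_type": JWT_BEARER,
            # Issued by the pipeline platform for this run; nothing is stored.
            "client_assertion": oidc_token,
        }
    )
    return url, body


# Placeholder identifiers, not real tenants or applications.
url, body = build_token_request(
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    oidc_token="<pipeline-issued-oidc-token>",
)
```

The sketch only builds the request and never sends it, so it can be read and run without any credentials.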

The tools that exist — and the gap they leave

The ecosystem has not ignored this problem. The assessment side has matured significantly, and four tools between them cover much of the right-hand side of a governance loop — detect, snapshot, diff, and report.

Unified Tenant Configuration Management (UTCM) is Microsoft’s strategic direction. In staged public preview, it snapshots over three hundred resource types across Entra, Exchange, Intune, Teams, and Defender on a six-hour cycle, surfaces drift against a declared baseline, and integrates into automated reporting pipelines. If Microsoft’s direction holds, this is the platform-layer monitoring story for M365 configuration governance.

Microsoft365DSC is the community answer that predates UTCM — the closest existing tool to a full governance loop: declarative DSC configuration, apply operations, fifteen-minute drift detection, optional auto-remediation across all M365 workloads. Community-maintained, not Microsoft-supported; Microsoft’s stated direction is for UTCM to supersede it. Its DSC execution model does not fit cleanly into API-first pipelines, and it carries significant operational overhead, but it has more production history than anything else in this space.

Maester (v2.0, February 2026) is a read-only security test framework: 285-plus tests drawn from EIDSCA and CIS baselines, covering Entra, Conditional Access, Intune, and Azure, producing HTML and Markdown reports. It does not remediate anything — it is a post-apply validation gate, the step that confirms a pipeline run against a known-good security baseline.

The Zero Trust Assessment is Microsoft’s GA posture tool: point-in-time across all Zero Trust pillars, useful for a periodic review, not for CI/CD integration. Large tenants take over twenty-four hours to assess; there is no machine-readable diff output.

I run all four — but as the double-check layer after delivery, not as the delivery mechanism. Maester runs as the final stage of every pipeline apply. UTCM monitors run continuously and alert on drift. ZTA and an M365DSC full-tenant export run periodically as compliance snapshots. What these tools catch is exactly what the pipeline can’t reach: portal-only toggles, undocumented endpoints, settings that exist in the tenant but have no writable API surface. The gap the tools reveal is the same gap the pipeline is built to close where an API exists, and to document as a permanent exception where it doesn’t.

What none of them own is the left side of the loop: declaring desired state in a versioned artifact, delivering it through a governed pipeline, and producing an auditable diff as the authoritative change record. That gap is what the everything-as-code approach fills — and it is what this blog covers.

The gaps the tools can’t close — and where the work goes next

Closing the loop on every layer of Microsoft tenant state is harder than the architecture describes. The Microsoft API surface has real limits, and understanding those limits is as important as understanding the pattern.

The UTCM write gap. UTCM is detection-only. The strategic Microsoft tool for drift detection cannot remediate — it can tell you something changed, but it cannot write the change back. Until Microsoft ships write operations against the UTCM API, the left side of the loop — declare, deliver, apply — requires direct Graph API calls and Exchange Online PowerShell. The custom pipeline approach exists because of this gap, not despite it.

Defender for Endpoint Advanced Features. Approximately thirty toggle switches under the Defender portal’s Advanced Features page have no public Graph API. The community-discovered XDRInternals module can read and write these settings using the same session tokens the portal uses internally. That is an acknowledged risk: the endpoints are undocumented, carry no Microsoft support guarantee, and can break without notice. MDCA App Connectors each require their own OAuth consent flow with no bulk API. App Governance has a portal-only enable toggle. For initial tenant deployment, the choice is either XDRInternals or a manual runbook; UTCM then monitors ongoing drift.

M365 Admin Center tenant-level settings. Several high-value security settings have no Graph API for write operations: Customer Lockbox, Release Preferences, Idle Session Timeout for unmanaged devices, Sway and Forms external sharing controls. Self-service purchase control requires the separate MSCommerce PowerShell module iterated per-product — no single toggle, no Graph API path. These are not obscure settings; Customer Lockbox and self-service purchase control are meaningful security hygiene items, and they cannot be delivered by any pipeline that relies on documented public APIs.

The exception registry as a first-class artifact. Portal-only settings are not failures of the EaC approach — they are a category the approach handles explicitly. Each gets a runbook (screenshot-based, operator-executed at initial setup), an entry in an exception registry recording what cannot be automated and why, and a UTCM monitor where the API supports it. The exception registry matters as much as the pipeline configuration: it makes the boundary between what is pipeline-managed and what is manually configured explicit, auditable, and intentional. A documented gap is not a governance failure; an undocumented one is.
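One way to treat the exception registry as an artifact the pipeline can check is sketched below. The entry fields, setting names, runbook paths, and the `unaccounted` helper are assumptions made for illustration; the actual registry format is not described in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionEntry:
    setting: str
    reason: str      # why no pipeline can deliver it
    runbook: str     # path to the operator-executed runbook
    monitored: bool  # True where a drift monitor can read the setting


def unaccounted(
    all_settings: set[str],
    pipeline_managed: set[str],
    registry: list[ExceptionEntry],
) -> set[str]:
    """Settings that are neither pipeline-managed nor documented exceptions."""
    # An entry only counts as documented if it has both a reason and a runbook.
    documented = {e.setting for e in registry if e.reason and e.runbook}
    return all_settings - pipeline_managed - documented


# Hypothetical inventory for one tenant.
registry = [
    ExceptionEntry(
        setting="customer-lockbox",
        reason="no Graph API for write operations",
        runbook="runbooks/customer-lockbox.md",
        monitored=True,
    ),
    ExceptionEntry(
        setting="app-governance-enable",
        reason="portal-only toggle",
        runbook="",  # runbook not written yet, so this is still a gap
        monitored=False,
    ),
]
gaps = unaccounted(
    all_settings={"ca-policies", "customer-lockbox", "app-governance-enable"},
    pipeline_managed={"ca-policies"},
    registry=registry,
)
```

A check like this could fail a pipeline run when `gaps` is non-empty, which is the mechanical version of the distinction between a documented gap and an undocumented one.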

What this means for the series. The Secure Delivery category is a workload-by-workload account of how this loop closes in practice — Conditional Access, PIM, and Entra ID settings first, then Defender for Office 365, Exchange Online, Intune, Purview, Teams, and SharePoint. Each workload follows the same shape — export desired state to JSON, review in a pull request, deliver via Graph API or PowerShell — but each has its own API surface, its own portal-only corner cases, and its own set of things that do not go in code. The pattern scales; the edge cases do not collapse.

Where identity and pipeline are the same surface

Here is the constraint that sharpens the design problem: the pipeline delivering tenant configuration is itself an identity actor in the tenant. The service principal that writes Conditional Access policies, assigns PIM roles, and configures Defender settings holds significant scope over the tenant’s security posture. Designing the tenant’s identity controls without designing the pipeline’s identity controls leaves a back door that is easier to walk through than the front.

The “two perimeters” in the title describes this exactly. The identity architecture that governs human access (PIM, Conditional Access, JIT, authentication contexts) and the pipeline identity architecture that governs machine access (Workload Identity Federation, service principal scoping, role-assignable group boundaries, no-secret delivery) are two separate constructs that answer the same question: who is allowed to make what change to this tenant, under what conditions, and can you prove it?

Both have to be designed against the same threat model. A tenant strict about human access but casual about pipeline identity has a gap that is often structurally simpler to exploit. A pipeline scoped carefully but running under an identity that bypasses the same Conditional Access policies governing human admins does not have a demonstrably better security posture.

One principle that recurs across every project I run: the security boundary often sits at a deeper layer than the surface permission suggests. A pipeline service principal holding tenant-wide RoleManagementPolicy.ReadWrite.AzureADGroup looks alarming in isolation. Read it in the context of the isAssignableToRole boolean on the target groups, and the actual boundary turns out to be in the Entra primitive, not in the Graph grant. The safety of the MSSP platform’s customer-onboarding pipeline depends entirely on that boolean being set correctly at group-creation time. The entire Graph permission is defensible because of a single attribute.
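Because so much rests on one attribute, a pre-flight check that compares each group's isAssignableToRole value with what the declaration expects is a natural guard. The sketch below assumes group objects shaped like the Microsoft Graph group resource (displayName, isAssignableToRole). The group names and the `mismatched_groups` helper are hypothetical, and which value is "correct" for a given group is a design decision this sketch does not make.

```python
def mismatched_groups(declared: dict[str, bool], observed: list[dict]) -> list[str]:
    """Names of groups whose isAssignableToRole differs from the declaration."""
    # Graph can return null for groups that are not role-assignable; treat it as False.
    actual = {g["displayName"]: g.get("isAssignableToRole") is True for g in observed}
    # A declared group that is absent from the tenant is also reported.
    return sorted(
        name for name, expected in declared.items() if actual.get(name) != expected
    )


# Hypothetical declaration: group name -> expected isAssignableToRole value.
declared = {"grp-a": True, "grp-b": True, "grp-c": False}

# Hypothetical export of the same groups from the tenant.
observed = [
    {"displayName": "grp-a", "isAssignableToRole": True},
    {"displayName": "grp-b", "isAssignableToRole": None},
    {"displayName": "grp-c", "isAssignableToRole": None},
]

mismatches = mismatched_groups(declared, observed)
```

Running a check like this before any policy write, and stopping the run on a non-empty result, keeps the boundary from depending on someone remembering how a group was created.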

That pattern — defensible boundaries sitting at a layer below where the surface tooling draws the line — is why both perimeters have to be understood together, not handed off to separate teams.

What the four categories cover

The blog is organised around four categories that carry this thesis across different parts of the Microsoft security stack.

Identity & Privileged Access — PIM design at scale across roles, groups, and Azure resources; Conditional Access authentication contexts in practice; RBAC design for tenant isolation in multi-customer deployments; JIT patterns for high-value workloads such as Azure DevOps; the security boundary that role-assignable groups draw around tenant-wide Graph permissions; CA for the new AI-agent identity type.

Secure Delivery — Workload Identity Federation patterns for ADO pipelines; self-hosted agents without PATs or stored secrets; three-domain Terraform separation of duties for tenants that write to Entra, Azure ARM, and Microsoft Graph in the same run; supply-chain isolation for security-sensitive IaC libraries; M365 configuration as code ahead of UTCM write operations.

Security Operations — Sentinel hunting queries grounded in real privileged-access audit data; Defender for Cloud at management-group scope; KQL parsers for ADO audit logs; assessment pipelines for Zero Trust posture; Sentinel cost optimisation against the March 2027 Azure-portal retirement deadline; Workspace Manager distribution across customer-isolated Sentinel workspaces.

Landing Zone Architecture — management-group hierarchy design for multi-customer tenants; subscription-vending automation; sovereign controls and Managed HSM; hub-and-spoke patterns at multi-customer scale; formula-driven customer numbering; three-tier logging architecture across application, security analytics, and archive tiers.

The first two categories carry the thesis directly. The third and fourth ground it in the operational reality of real platforms — because governance architecture only proves itself when it runs.

What this blog won’t cover

The blog is deliberately narrow against the surface of the Microsoft cloud:

  • Generic Azure cost management beyond what intersects with subscription vending.
  • M365 collaboration tooling — Teams admin, SharePoint settings, Loop, Viva.
  • Non-identity Defender content such as email policies or attack simulator.
  • Terraform style and idiom — HashiCorp’s own documentation handles this well.

What comes next

The first technical post takes up the identity half of the thesis: PIM designed as one system across roles, groups, and Azure resources — not as three separate workloads administered in three separate places.

The pipeline-side posts follow. Posts in Security Operations and Landing Zone Architecture interleave with the lead categories rather than running as a separate stream, because the operational evidence for the thesis lives there.

If you arrived from a CFP submission, a LinkedIn post, or a search result that landed on a specific technical page — this is the editorial position the rest of the content is written against.
