Azure Local – Design the infrastructure


I have seen many Azure Local stack designs that ran into issues that could have been prevented by following a few best practices. I want to share some of the things I have come across that should never have been built the way they were. I have also included some drawings of minimal designs based on my experience.

Active Directory

At some point we will likely not need Active Directory anymore. Currently in public preview, we can start using Azure Key Vault instead. But for now, let's keep Active Directory in our deployment scope. Source: https://learn.microsoft.com/en-us/azure/azure-local/deploy/deployment-local-identity-with-key-vault?view=azloc-2506

Joining Azure Local to the production domain

I see it all the time: Azure Local has been joined to the same production domain that all kinds of member servers and clients are also joined to (client devices should not be in any domain; they should be joined to Entra ID and only connect to domain member servers via Cloud Kerberos Trust). This is not a recommended approach. Yes, if your domain is fully hardened with LAPS, tiering, Kerberos armoring, disabling of older protocols, etc., you would have a “safe” domain that you could join your Azure Local nodes to. However, I would still not do it. I will always prefer a dedicated management Active Directory for Azure Local, because I can then segment this management domain completely from all other areas of my environment. In some industries this is referred to as “out-of-band”. This management Active Directory should only be reachable from the Azure Local nodes' management network and from a secured instance of Azure Bastion.

Active Directory for Azure Local running on top of the stack

I believe this speaks for itself: having the Active Directory that Azure Local is joined to running on top of the same stack is prone to failure at some point. That is why I like running the management Active Directory in Azure. No extra hardware is needed for the management domain, just a few small VMs running in Azure.

Active Directory DNS Servers

When deploying Azure Local 23H2 and newer, we need to be aware that the DNS servers we specify for the MOC IP range are very hard to change afterwards (changing them requires a redeployment of the MOC). If possible, deploy a private DNS resolver in Azure, use Azure Firewall with DNS proxy, or use another DNS proxy method. In some cases I just use the IP addresses of the Active Directory DNS servers directly, but I always make sure the customer understands this deployment option and that they cannot just swap the IP addresses of those DNS servers at a later time without a huge MOC redeployment job.
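
Before deployment, it can be worth validating that the planned DNS servers actually resolve the management Active Directory domain. A minimal PowerShell sketch, where the DNS server IPs and the domain name are placeholders you would swap for your own values:

  # Placeholder values – replace with your own AD DNS servers and domain
  $dnsServers = '10.10.0.4', '10.10.0.5'
  $adDomain   = 'mgmt.contoso.local'

  foreach ($server in $dnsServers) {
      # Each planned DNS server must be able to resolve the management domain
      Resolve-DnsName -Name $adDomain -Server $server |
          Select-Object Name, Type, IPAddress
  }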

Networking

RDMA (Remote Direct Memory Access) support on network adapter cards is essential when working with Storage Spaces Direct (S2D) in Azure Stack HCI or Azure Local environments because it enables low-latency, high-throughput communication between nodes with minimal CPU overhead. RDMA allows data to be transferred directly between the memory of different servers without involving the CPU, significantly improving performance for storage traffic. This is critical for S2D, which relies heavily on east-west network traffic to mirror and distribute storage across cluster nodes. Without RDMA, performance suffers due to increased latency and CPU load, reducing the efficiency and scalability of the system.
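
A quick way to verify RDMA support before deployment is to check the adapters from PowerShell. A minimal sketch, where the adapter names are just placeholders for the NICs you plan to use for the storage intent:

  # List RDMA capability/state for all adapters
  Get-NetAdapterRdma | Select-Object Name, Enabled

  # Check link speed and driver details for the intended storage NICs (placeholder names)
  Get-NetAdapter -Name 'SMB1', 'SMB2' | Select-Object Name, InterfaceDescription, LinkSpeed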

Lack of RSS support (Mandatory)

RSS (Receive Side Scaling) support on network adapter cards is important when using Storage Spaces Direct (S2D) in Azure Local because it ensures efficient processing of high-volume network traffic across multiple CPU cores. S2D generates substantial east-west traffic between cluster nodes, and without RSS, all incoming traffic would be handled by a single core, creating a bottleneck. RSS distributes network processing across multiple cores, improving throughput and reducing latency. This parallel processing is essential for maintaining consistent performance and maximizing the benefits of S2D in high-performance, hyper-converged infrastructure environments.
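
RSS can be verified the same way. A minimal sketch:

  # Confirm RSS is enabled and see how many queues/processors each adapter can spread traffic across
  Get-NetAdapterRss | Select-Object Name, Enabled, NumberOfReceiveQueues, MaxProcessors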

My go-to network adapter card

When possible, I go for Mellanox. Mellanox has good support for Azure Local. I have often used ConnectX-2 LX and ConnectX-4 LX.

Switchless networking for storage

In some setups I run the storage intent without a switch. This is referred to as switchless and is supported for up to 4 nodes (please check the requirements if aiming for 4-node support, because it requires 6 storage NICs per node).

However, as of writing this article, if we start with a 2-node switchless setup, Microsoft does not support expanding it to 3 or 4 nodes. This can be a key reason to include a supported storage switch from the start, because we can then go for 2 or 4 storage NICs per node and keep the ability to expand to more nodes in the future.

Compute intent with DHCP IP addresses

I often see DHCP enabled on the compute network VLANs. That can be enabled for a valid reason for workloads running on top of the Azure Local stack. However, there is no need to assign any static or DHCP IP addresses on the untagged VLAN going to the NICs in the compute intent (you could create a closed untagged VLAN on the compute intent with static IP addresses on each node just to make the failover cluster recognize the network as compute, but it is not required). I like to configure only tagged VLANs on the compute intent, because it is only used for workloads running on top of the Azure Local stack.
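
As a sketch of what I mean, here is how a compute-only intent could look with Network ATC, followed by a tagged VLAN on a workload VM. The adapter names, VM name, and VLAN ID are placeholders:

  # Compute-only intent on two dedicated NICs (no host IP configuration needed)
  Add-NetIntent -Name 'Compute' -Compute -AdapterName 'NIC3', 'NIC4'

  # Workload VLANs are tagged on the VM network adapters instead
  Set-VMNetworkAdapterVlan -VMName 'app-vm01' -Access -VlanId 201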

Storage

Cluster Shared Volumes

Microsoft recommends configuring 1 CSV per node in the cluster and dividing workloads across these CSVs. This is also the default when deploying a new stack. So if our stack consists of 4 nodes, we should have 4 CSVs created, and in production we should ensure that each node owns 1 CSV, spreading CSV ownership across the nodes.
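
A quick sketch of how to check and adjust CSV ownership from PowerShell (the CSV and node names are placeholders):

  # List CSVs and their current owner nodes
  Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

  # Move a CSV to a specific node so each node owns one CSV
  Move-ClusterSharedVolume -Name 'UserStorage_1' -Node 'node01'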

Cache and capacity storage

Some customers go for all-flash SSD, but we can gain more speed and a better price-to-capacity ratio if we use a tiered model.

We can use a combination of NVMe, SSD, and HDD in Azure Local (Storage Spaces Direct)—this is actually a common and recommended approach to build a hybrid storage configuration that balances performance and capacity.

Here’s how it typically works:

  • NVMe drives are used as the cache tier (also called performance tier).
  • SSD and/or HDD drives are used as the capacity tier (also called capacity devices).

Benefits:

  • NVMe Cache Tier: Accelerates read/write performance by buffering data before it’s written to slower disks.
  • HDD Capacity Tier: Offers large storage capacity at lower cost per TB.
  • SSD Capacity Tier (optional): Used when you want faster capacity storage but without the high cost of all-NVMe.

Key Considerations:

  • All nodes in the cluster should have a consistent drive layout.
  • NVMe cache is especially beneficial in write-intensive or latency-sensitive workloads.
  • S2D automatically manages tiering between cache and capacity layers.

Typical Example:

A 2-tier configuration might look like:

  • 2 × NVMe drives (cache)
  • 4 × HDDs or SATA SSDs (capacity)

This setup is fully supported and optimized in Azure Local for performance and efficiency.
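
To see how Storage Spaces Direct has classified the drives in such a setup, a minimal sketch like this can help (cache drives typically show Usage = Journal, capacity drives Usage = Auto-Select):

  # Show each physical disk with its media type, usage and size
  Get-PhysicalDisk | Sort-Object MediaType |
      Select-Object FriendlyName, MediaType, Usage,
          @{ Name = 'SizeGB'; Expression = { [math]::Round($_.Size / 1GB) } }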

Designs

Design 1 – 2 node setup with storage switch

Design 2 – 2 node setup with switchless storage