Azure Local - Troubleshooting: Clear the DoNotDelete lock after deployment validation fails

Intro
Problem
Root cause
Solution
Recommendation

Intro

When Azure Local deployment validation fails, stale VM switch data can be left behind, and the next validation attempt may then fail in a way that looks unrelated to the original issue. One thing I check early is whether the first machine is still protected by the automatic DoNotDelete lock.

In this post I walk through the cleanup flow I use when validation has to be retried. I will show the symptoms, explain why the lock matters, and then go through the steps to clear the stale state and rerun the deployment.

The guidance in this article is based on the current Azure Local troubleshooting documentation: Troubleshoot deployment validation issues in Azure Local

Problem

After I retry a failed validation, the portal can show errors that do not match the actual network configuration.

Typical symptoms include:

The selected physical network adapter is not binded to the management virtual switch.
deploymentdata.physicalnodes[0].ipv4address: The specified ... is not a valid IPv4 address
The deployment wizard keeps returning to the same validation failure even after I correct the obvious issue

The exact message can vary, but the pattern is the same: the portal is reading stale state.

Root cause

During validation, Azure Local creates a temporary VM switch on the device. If validation fails and I retry, the DeviceManagementExtension can fail to clean up that switch.

That leaves two things out of sync:

the local machine state
the cloud-side edgeDevices/default resource

If the automatic DoNotDelete lock is still in place on the first machine, cleanup of that stale state can fail or stay incomplete.

Solution

In the Azure portal, go to the first machine or the resource group that contains it.
Open Settings > Locks.
Delete the DoNotDelete lock.

If I skip this step, the cleanup can fail with a scope locked error.
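The same lock removal can also be sketched with Azure CLI instead of the portal. This is a minimal bash sketch, not the documented procedure. It assumes the lock is named DoNotDelete and sits on the Arc machine resource (type Microsoft.HybridCompute/machines), and that az is already logged in to the right subscription. The function names list_machine_locks and remove_machine_lock are my own.

```shell
#!/usr/bin/env bash
# Sketch: remove the DoNotDelete lock from the first machine with Azure CLI.
# Assumption: the lock is scoped to the Arc machine resource, not the resource group.

# Print the names of all locks on the machine, one per line.
list_machine_locks() {
  local resource_group="$1" machine_name="$2"
  az lock list \
    --resource-group "$resource_group" \
    --resource "$machine_name" \
    --resource-type "Microsoft.HybridCompute/machines" \
    --query "[].name" --output tsv
}

# Delete the lock only if it is actually present; return 1 otherwise.
remove_machine_lock() {
  local resource_group="$1" machine_name="$2" lock_name="${3:-DoNotDelete}"
  if list_machine_locks "$resource_group" "$machine_name" | grep -Fxq "$lock_name"; then
    az lock delete --name "$lock_name" \
      --resource-group "$resource_group" \
      --resource "$machine_name" \
      --resource-type "Microsoft.HybridCompute/machines"
  else
    echo "No lock named $lock_name found on $machine_name" >&2
    return 1
  fi
}
```

I would call it as remove_machine_lock "<Resource Group Name>" "<Machine Name>". If the lock turns out to be on the resource group instead, the --resource and --resource-type arguments would have to be dropped.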

On the first machine, I check for an unexpected validation switch:

Get-VMSwitch

If I see a switch I did not create intentionally, I remove it:

Remove-VMSwitch -Name "<VM Switch Name>" -Force

Then I clean up the stale edge device resource with Azure CLI. I show the resource first to confirm that the ID is correct, and only then delete it:

az login --tenant "<Tenant ID>" --use-device-code
az account set --subscription "<Subscription ID>"
az resource show --ids "/subscriptions/<Subscription ID>/resourceGroups/<Resource Group Name>/providers/Microsoft.HybridCompute/machines/<Machine Name>/providers/Microsoft.AzureStackHCI/edgeDevices/default"
az resource delete --ids "/subscriptions/<Subscription ID>/resourceGroups/<Resource Group Name>/providers/Microsoft.HybridCompute/machines/<Machine Name>/providers/Microsoft.AzureStackHCI/edgeDevices/default"
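Because the same long resource ID appears in both commands, a small bash wrapper can keep it in one place. This is a sketch with hypothetical function names (edge_device_id, remove_stale_edge_device). It assumes az is already logged in and the subscription is set as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: build the edgeDevices/default resource ID once and reuse it.

# Print the resource ID for a given subscription, resource group and machine.
edge_device_id() {
  local subscription_id="$1" resource_group="$2" machine_name="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.HybridCompute/machines/%s/providers/Microsoft.AzureStackHCI/edgeDevices/default\n' \
    "$subscription_id" "$resource_group" "$machine_name"
}

# Show the resource first; delete it only if the show call succeeds.
remove_stale_edge_device() {
  local id
  id="$(edge_device_id "$1" "$2" "$3")"
  if az resource show --ids "$id" --output none; then
    az resource delete --ids "$id"
  else
    echo "edgeDevices/default not found, nothing to delete" >&2
    return 1
  fi
}
```

Usage would be remove_stale_edge_device "<Subscription ID>" "<Resource Group Name>" "<Machine Name>". Checking with show before delete means a typo in any of the three values fails early instead of producing a confusing delete error.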

After that, I restart the DeviceManagementService on the first machine:

Restart-Service DeviceManagementService

Once the cloud data has refreshed, I go back to the Azure portal and rerun the deployment validation. If the stale VM switch was the issue, the portal should stop complaining about the old validation state.
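The troubleshooting steps do not say how to tell that the cloud data has refreshed. One possible check, and this is purely my assumption, is to poll until the edgeDevices/default resource is visible again. The bash sketch below uses a generic retry helper; wait_until and edge_device_exists are hypothetical names, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll a command until it succeeds or the attempts run out.

# wait_until <attempts> <delay_seconds> <command...>
wait_until() {
  local attempts="$1" delay_seconds="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay_seconds"
    fi
    i=$((i + 1))
  done
  return 1
}

# Assumption: a successful show call means the resource has been recreated.
edge_device_exists() {
  az resource show --ids "$1" --output none
}
```

For example, wait_until 20 30 edge_device_exists "<edge device resource ID>" would check every 30 seconds for up to about 10 minutes. If the resource never reappears, I would fall back to simply waiting a while and rerunning validation from the portal.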

Recommendation

After the mitigation is complete, I recreate the DoNotDelete lock on the first machine.
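Recreating the lock can also be sketched with Azure CLI. This bash sketch assumes the automatic lock was a delete lock (lock type CanNotDelete) named DoNotDelete on the Arc machine resource. I have not confirmed the original lock's exact properties, so I would compare against a healthy machine before relying on it. The function name recreate_machine_lock is my own.

```shell
#!/usr/bin/env bash
# Sketch: put a delete lock back on the first machine after the mitigation.
# Assumptions: lock name DoNotDelete, lock type CanNotDelete, scope = Arc machine.

recreate_machine_lock() {
  local resource_group="$1" machine_name="$2" lock_name="${3:-DoNotDelete}"
  az lock create --name "$lock_name" --lock-type CanNotDelete \
    --resource-group "$resource_group" \
    --resource "$machine_name" \
    --resource-type "Microsoft.HybridCompute/machines"
}
```

I would call it as recreate_machine_lock "<Resource Group Name>" "<Machine Name>" once validation has passed.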

HINT: If az resource delete fails with ScopeLocked, the lock is still in place. I go back to the first machine in the portal, delete the DoNotDelete lock, and run the delete again.

Final remark: when deployment validation behaves oddly after a retry, I check for stale VM switch data and the DoNotDelete lock before I spend time looking for a network issue that is not really there.
