Azure Local - Troubleshooting: Clear the DoNotDelete lock after deployment validation fails
Intro
When Azure Local deployment validation fails, the failed run can leave behind stale VM switch data, and the next validation attempt may then fail in a way that looks unrelated to the original issue. One thing I check early is whether the first machine is still protected by the automatic DoNotDelete lock.
In this post I walk through the cleanup flow I use when validation has to be retried. I will show the symptoms, explain why the lock matters, and then go through the steps to clear the stale state and rerun the deployment.
The guidance in this article is based on the current Azure Local troubleshooting documentation: Troubleshoot deployment validation issues in Azure Local
Problem
After a failed validation retry, the portal can show errors that do not match the actual network configuration.
Typical symptoms include:
- The selected physical network adapter is not binded to the management virtual switch.
- deploymentdata.physicalnodes[0].ipv4address: The specified ... is not a valid IPv4 address
- The deployment wizard keeps returning to the same validation failure even after I correct the obvious issue
The exact message can vary, but the pattern is the same: the portal is reading stale state.
Root cause
During validation, Azure Local creates a temporary VM switch on the device. If validation fails and I retry, the DeviceManagementExtension can miss the cleanup.
That leaves two things out of sync:
- the local machine state
- the cloud-side edgeDevices/default resource
If the automatic DoNotDelete lock is still in place on the first machine, cleanup of that stale state can fail or stay incomplete.
Solution
- In the Azure portal, go to the first machine or the resource group that contains it.
- Open Settings > Locks.
- Delete the DoNotDelete lock.
If I skip this step, the cleanup can fail with a scope locked error.
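The same lock removal can also be done from Azure CLI instead of the portal. The following is a minimal sketch, assuming the lock sits on the Arc machine resource itself and is literally named DoNotDelete. Both are assumptions on my side, so I list the locks first to confirm the name and scope. The two function names are hypothetical helpers, not part of any tool.

```shell
# Hypothetical helpers. The lock name and its scope are assumptions:
# confirm both with list_machine_locks before deleting anything.
ARC_MACHINE_TYPE="Microsoft.HybridCompute/machines"

# List the locks on the machine.
# Args: <resource group> <machine name>
list_machine_locks() {
  az lock list \
    --resource-group "$1" \
    --resource "$2" \
    --resource-type "$ARC_MACHINE_TYPE" \
    --output table
}

# Delete one named lock from the machine.
# Args: <resource group> <machine name> <lock name>
delete_machine_lock() {
  az lock delete \
    --name "$3" \
    --resource-group "$1" \
    --resource "$2" \
    --resource-type "$ARC_MACHINE_TYPE"
}

# Example calls (placeholders as used elsewhere in this post):
# list_machine_locks "<Resource Group Name>" "<Machine Name>"
# delete_machine_lock "<Resource Group Name>" "<Machine Name>" "DoNotDelete"
```

If the list comes back empty, the lock may be set on the resource group rather than on the machine, in which case the machine-scoped delete above does not apply.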
On the first machine, I check for an unexpected validation switch:
Get-VMSwitch
If I see a switch I did not create intentionally, I remove it:
Remove-VMSwitch -Name "<VM Switch Name>" -Force
Then I clean up the stale edge device resource from Azure CLI:
az login --tenant <tenant ID> --use-device-code
az account set --subscription "<Subscription ID>"
az resource show --ids "/subscriptions/<Subscription ID>/resourceGroups/<Resource Group Name>/providers/Microsoft.HybridCompute/machines/<Machine Name>/providers/Microsoft.AzureStackHCI/edgeDevices/default"
az resource delete --ids "/subscriptions/<Subscription ID>/resourceGroups/<Resource Group Name>/providers/Microsoft.HybridCompute/machines/<Machine Name>/providers/Microsoft.AzureStackHCI/edgeDevices/default"
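Because the resource ID is long and appears in both commands, I find it less error-prone to build it once. This is a small sketch of that idea: edge_device_id and delete_edge_device are hypothetical helper names of mine, and the only Azure CLI calls used are the same az resource show and az resource delete shown above.

```shell
# Build the edgeDevices/default resource ID from its three variable parts.
# Args: <subscription ID> <resource group> <machine name>
edge_device_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.HybridCompute/machines/%s/providers/Microsoft.AzureStackHCI/edgeDevices/default\n' \
    "$1" "$2" "$3"
}

# Show the resource first and delete it only if the show succeeded.
# Args: <subscription ID> <resource group> <machine name>
delete_edge_device() {
  edge_id="$(edge_device_id "$1" "$2" "$3")"
  if az resource show --ids "$edge_id" --output none; then
    az resource delete --ids "$edge_id"
  else
    echo "edgeDevices/default not found or not readable; nothing deleted" >&2
    return 1
  fi
}

# Example call (placeholders as used in this post):
# delete_edge_device "<Subscription ID>" "<Resource Group Name>" "<Machine Name>"
```

Running the show before the delete is a cheap check that the subscription, resource group, and machine name are spelled correctly before anything is removed.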
After that, I restart the DeviceManagementService on the first machine:
Restart-Service DeviceManagementService
Once the cloud data has refreshed, I go back to the Azure portal and rerun the deployment validation. If the stale VM switch was the issue, the portal should stop complaining about the old validation state.
Recommendation
After the mitigation is complete, I recreate the DoNotDelete lock on the first machine.
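Recreating the lock can be scripted as well. The sketch below assumes a delete lock (lock type CanNotDelete) named DoNotDelete on the machine resource. The name and the machine-level scope are my assumptions, so I compare against what was there before I removed it. The function name recreate_machine_lock is a hypothetical helper.

```shell
# Recreate a delete lock on the Arc machine.
# Args: <resource group> <machine name>
# Assumptions: the lock is named DoNotDelete and is scoped to the machine.
recreate_machine_lock() {
  az lock create \
    --name "DoNotDelete" \
    --lock-type CanNotDelete \
    --resource-group "$1" \
    --resource "$2" \
    --resource-type "Microsoft.HybridCompute/machines"
}

# Example call (placeholders as used in this post):
# recreate_machine_lock "<Resource Group Name>" "<Machine Name>"
```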
HINT: If az resource delete fails with ScopeLocked, I know I missed the lock removal step and need to go back to the first machine in the portal.
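To make that failure easier to spot in a script, the delete can be wrapped so that a ScopeLocked error produces a pointer back to the lock step. This is a rough sketch, assuming the Azure CLI error text contains the string ScopeLocked. The function name delete_with_lock_hint and the return codes are hypothetical choices of mine.

```shell
# Run az resource delete and point at the lock when the failure is ScopeLocked.
# Arg: <full resource ID of edgeDevices/default>
# Returns 0 on success, 2 on ScopeLocked, 1 on any other failure.
delete_with_lock_hint() {
  # Capture only stderr; stdout of the delete is discarded.
  if err="$(az resource delete --ids "$1" 2>&1 >/dev/null)"; then
    echo "deleted"
    return 0
  fi
  case "$err" in
    *ScopeLocked*)
      echo "ScopeLocked: remove the lock under Settings > Locks, then retry" >&2
      return 2
      ;;
    *)
      # Any other failure: pass the original error through unchanged.
      printf '%s\n' "$err" >&2
      return 1
      ;;
  esac
}
```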
Final remark: when deployment validation behaves oddly after a retry, I check for stale VM switch data and the DoNotDelete lock before I spend time looking for a network issue that is not really there.