Workaround for network bug in Azure Local 23H2

I wanted to share my recent work with a customer that has multiple Azure Local 23H2 stacks, all upgraded from 22H2. We see an issue as soon as we start installing the Network ATC feature. Sadly, uninstalling Network ATC does not remove the issue.

The issue is that every time a node is rebooted (and sometimes at random while the node is running), all VMSwitches on the node switch from External to Internal. This happens because the bindings to the physical NICs are lost. The event logs report “Physical network adapter not found” and “Physical network adapter disconnected”.
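
A quick way to check whether a node is currently affected is to list the switches that have flipped to Internal. This is only a minimal sketch: it assumes the Hyper-V PowerShell module is available on the node and that no switch on the node is supposed to be Internal.

```powershell
# List every virtual switch on this node that is currently of type Internal.
# Assumption: all switches on the node are expected to be External.
$AffectedSwitches = Get-VMSwitch | Where-Object { $_.SwitchType -eq 'Internal' }

if ($AffectedSwitches) {
    # Show which switches have lost their binding to the physical adapters
    $AffectedSwitches | Select-Object Name, SwitchType
} else {
    Write-Output "No Internal switches found on $env:COMPUTERNAME"
}
```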

We see this issue with both Intel-based network adapter cards and Mellanox. Firmware and drivers are updated. The operating system on the stack nodes is also updated to the newest version.

We have had a few support cases open with Microsoft on this subject, and so far we are still waiting for a solution. In the meantime, the customer needed a way to remediate the issue every time they rebooted a node or when it happened while a node was running. I decided it was better to write a complete script than to have the customer call me every time they needed it fixed.

I wanted to share this script because I consider the issue production critical: it can potentially cause downtime for workloads running on an affected stack. The customer deployed this script as a startup script via a Scheduled Task. We could also choose to deploy it as a scheduled task that runs as often as possible.
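
As a rough illustration of the startup deployment, the task could be registered along these lines. The script path and task name are hypothetical placeholders, and running as SYSTEM is my assumption here, not necessarily how the customer set it up.

```powershell
# Hypothetical path and task name; adjust to where the script is stored on each node.
$ScriptPath = "C:\Scripts\Repair-VMSwitches.ps1"
$TaskName   = "Repair-VMSwitches"

# Run the script with Windows PowerShell at every system startup
$Action    = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$ScriptPath`""
$Trigger   = New-ScheduledTaskTrigger -AtStartup

# Assumption: the task runs as SYSTEM with highest privileges
$Principal = New-ScheduledTaskPrincipal -UserId "SYSTEM" -LogonType ServiceAccount -RunLevel Highest

Register-ScheduledTask -TaskName $TaskName -Action $Action -Trigger $Trigger -Principal $Principal
```

For the recurring variant, a trigger such as `New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 5)` could be used instead; the five-minute interval is an arbitrary example.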

Let's hope Microsoft soon finds a permanent fix for the issue, fingers crossed 🙂

# This script is used to recreate Hyper-V virtual switches for a converged network setup.
# The script is built to address a current bug in Azure Local 23H2 stacks that have been upgraded from 22H2.
# The bug causes the Hyper-V virtual switches to lose their binding to the physical adapters after a reboot.
# It is designed to use VMSwitches for all network types: storage, compute, and management.
# The storage VMSwitch uses multiple adapters running without VLANs; this would change once we migrate to Network ATC.
# It checks for existing switches and removes any that are of type "Internal"; switches that are not found are simply created.
# It then creates new virtual switches with specific configurations for storage, compute, and management networks.

# Storage Switch - change parameter values as needed
$VMSwitch = $null
$VMSwitchName = "ConvergedSwitch(Storage)"
# Must be a string array; a single quoted string would be treated as one adapter name
$AdapterNames = "storage01","storage02"
$IPAddress = "10.71.1.11"
$IPPrefixLength = 24

$VMSwitch = Get-VMSwitch -Name $VMSwitchName -ErrorAction SilentlyContinue

if($null -eq $VMSwitch -or $VMSwitch.SwitchType -eq "Internal") 
{
    # Remove existing switch if it exists
    if ($VMSwitch.SwitchType -eq "Internal")
    {
        Remove-VMSwitch -Name $VMSwitchName -Force
        Start-Sleep -Seconds 5
    }

    $VMSwitch = New-VMSwitch -Name $VMSwitchName -NetAdapterName $AdapterNames -EnableEmbeddedTeaming $true
    Set-VMSwitchTeam -Name $VMSwitch.Name -LoadBalancingAlgorithm HyperVPort
    Set-VMSwitchTeam -Name $VMSwitch.Name -TeamingMode SwitchIndependent
    # Set Jumbo Packet size for the adapters - assumes the adapters support it.
    # The display name and value are driver specific (some drivers expect e.g. "9014 Bytes").
    Set-NetAdapterAdvancedProperty -Name $AdapterNames -DisplayName "Jumbo Packet" -DisplayValue "9014"

    $NetAdapter = Get-NetAdapter -Name "vEthernet ($VMSwitchName)"

    New-NetIPAddress -InterfaceIndex $NetAdapter.ifIndex -IPAddress $IPAddress -PrefixLength $IPPrefixLength
}

# Compute Switch - change parameter values as needed
$VMSwitch = $null
$VMSwitchName = "ConvergedSwitch(Compute)"
# Must be a string array; a single quoted string would be treated as one adapter name
$AdapterNames = "compute01","compute02"

$VMSwitch = Get-VMSwitch -Name $VMSwitchName -ErrorAction SilentlyContinue

if($null -eq $VMSwitch -or $VMSwitch.SwitchType -eq "Internal") {
    
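    # Need to disconnect all VMs from the old switch before creating a new one.
    # Note: this touches every network adapter of every VM on the node, so it assumes all VMs use the compute switch.
    $VMs = Get-VM
    $VMs | ForEach-Object { Get-VMNetworkAdapter -VMName $_.Name | Disconnect-VMNetworkAdapter }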

    # Remove existing switch if it exists
    if ($VMSwitch.SwitchType -eq "Internal")
    {
        Remove-VMSwitch -Name $VMSwitchName -Force
        Start-Sleep -Seconds 5
    }

    $VMSwitch = New-VMSwitch -Name $VMSwitchName -NetAdapterName $AdapterNames -EnableEmbeddedTeaming $true
    Set-VMSwitchTeam -Name $VMSwitch.Name -LoadBalancingAlgorithm HyperVPort
    Set-VMSwitchTeam -Name $VMSwitch.Name -TeamingMode SwitchIndependent

    # Connect all VMs to the new switch
    $VMSW = Get-VMSwitch -Name $VMSwitchName
    $VMs | ForEach-Object{Get-VMNetworkAdapter -VMName $_.name | Connect-VMNetworkAdapter -SwitchName $VMSW.name}
}

# Management Switch - change parameter values as needed
$VMSwitch = $null
$VMSwitchName = "ConvergedSwitch(Management)"
# Must be a string array; a single quoted string would be treated as one adapter name
$AdapterNames = "mgmt01","mgmt02"
$IPAddress = "10.10.0.11"
$IPDefaultGateway = "10.10.0.1"
$IPPrefixLength = 24
# Must be a string array so that each DNS server is passed as a separate address
$IPDNSServers = "10.20.0.11","10.20.0.12"

$VMSwitch = Get-VMSwitch -Name $VMSwitchName -ErrorAction SilentlyContinue

if($null -eq $VMSwitch -or $VMSwitch.SwitchType -eq "Internal") {
    
    # Remove existing switch if it exists
    if ($VMSwitch.SwitchType -eq "Internal")
    {
        Remove-VMSwitch -Name $VMSwitchName -Force
        Start-Sleep -Seconds 5
    }

    $VMSwitch = New-VMSwitch -Name $VMSwitchName -NetAdapterName $AdapterNames -EnableEmbeddedTeaming $true
    Set-VMSwitchTeam -Name $VMSwitch.Name -LoadBalancingAlgorithm HyperVPort
    Set-VMSwitchTeam -Name $VMSwitch.Name -TeamingMode SwitchIndependent

    $NetAdapter = Get-NetAdapter -Name "vEthernet ($VMSwitchName)"

    New-NetIPAddress -InterfaceIndex $NetAdapter.ifIndex -IPAddress $IPAddress -PrefixLength $IPPrefixLength -DefaultGateway $IPDefaultGateway
    Set-DnsClientServerAddress -InterfaceIndex $NetAdapter.ifIndex -ServerAddresses ($IPDNSServers)
}
