Cisco HyperFlex and Nexus 3548 ping / ARP drop Layer-2 connectivity issue
I recently deployed a Cisco HyperFlex system behind some old Cisco Catalyst 2960-S switches. I now had this shiny new piece of tin capable of 10Gb connectivity, but the upstream switches were limiting it to 1Gb. It is also recommended to run HyperFlex with jumbo frames enabled, and the Catalyst switches required a reload to enable jumbo frames. High time to upgrade to 10Gb switching!!
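For reference, the Nexus side does not need a reload for jumbo frames — on the Nexus 3500 platform, jumbo MTU is applied through a network-qos policy. A sketch (the policy name is arbitrary, and exact syntax can vary by NX-OS release):

```
policy-map type network-qos jumbo-mtu
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo-mtu
```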
I called my trusty Cisco SE and asked what the most cost-effective 10Gb switch was, and was recommended the Nexus N3K-C3524P-10GX, which is basically a Nexus 3548 with 24 of its 48 ports licensed. After receiving the new switches, I decided to upgrade the NX-OS from the 6.x version they shipped with to the latest version. I don’t run any advanced L3 features and only require a basic 10Gb L2 switch, so I didn’t think much of going to the latest version, 7.0(3)I7(3), as I have other Nexus 9300 switches running similar 7.x NX-OS.
After configuring the new Nexus 3548 switches, I started migrating from the Catalyst to the new switches. Instead of a “big bang” approach, I moved one uplink from one of the HyperFlex UCS Fabric Interconnects (FIs) from the Catalyst to the new Nexus 3548, then moved a test VLAN / portgroup on the ESXi hosts through the new Nexus by changing over the vSphere vSwitch uplinks. I then tried pinging the test VMs and found that the pings stopped getting through as soon as the links were changed.
I then tried to think of all the potential root causes… configuration error?… MAC address bug on the Nexus?… MAC aging issues? I created an SVI for the test VLAN on the switches upstream of the Nexus 3548, and also SVIs on the Nexus 3548 itself, and surprisingly the SVIs on the upstream switches and the Nexus 3548 could ping each other. However, the SVIs on the upstream switches were unable to ping the VMs south of the Nexus 3548, behind the HyperFlex UCS FIs. To add to the strange behavior, the Nexus 3548 SVIs could ping the test VMs successfully. In a nutshell, the problem statement was that anything northbound (upstream) of the new Nexus 3548 failed to ping the VMs sitting southbound of it.
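For anyone wanting to reproduce the SVI test, the configuration on the Nexus 3548 was along these lines (the VLAN number and addressing are illustrative, not the actual values I used):

```
feature interface-vlan
vlan 100
  name TEST-VLAN
interface Vlan100
  ip address 192.168.100.2/24
  no shutdown
```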
I logged a case with Cisco TAC and did more testing. TAC recommended that I connect another switch with an SVI south of the Nexus 3548 to test, and just like the previous test, pings worked between the SVIs. It ONLY seemed to fail when going through the Cisco HyperFlex FIs. However, I refused to accept that the HyperFlex FIs were the issue, as everything had worked via the Catalyst switches. I spent approximately two weeks with TAC and they even brought in the HyperFlex TAC team to assist. We did packet captures on the Nexus 3548, the UCS FIs and the ESXi hosts, and even connected a laptop running Wireshark, but TAC could not find the root cause. They identified that ARP requests were being sent towards the VMs, and the ESXi hosts and UCS FIs were returning the ARP responses, but the responses were not coming back through the Nexus 3548.
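The laptop capture was fed by a SPAN session on the Nexus 3548 mirroring the FI-facing port. A sketch, with illustrative port numbers (Ethernet1/1 as the source facing the FI, Ethernet1/2 connected to the laptop):

```
monitor session 1
  source interface Ethernet1/1 both
  destination interface Ethernet1/2
  no shut
interface Ethernet1/2
  switchport monitor
```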
Finally, after exhausting other possible causes, I decided to downgrade the NX-OS on the Nexus 3548. I selected the latest v6.x NX-OS, which was 6.0(2)A8(8), and got TAC to guide me through the downgrade. As v7.x uses different configuration syntax, I needed to remove some of the switch configuration for the downgrade to succeed. After downgrading, everything worked perfectly. Issue resolved!!
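For the curious, moving from the single-image 7.x train back to the kickstart/system image pair of 6.x is done with the install all command. A sketch — the image file names below are illustrative, so check the exact names of the files you downloaded:

```
copy running-config bootflash:backup-config
install all kickstart bootflash:n3500-uk9-kickstart.6.0.2.A8.8.bin system bootflash:n3500-uk9.6.0.2.A8.8.bin
```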
Cisco TAC then tried replicating the issue in the lab and after a few weeks finally found the root cause. Cisco is now calling it a defect on the Nexus 3548 when running NX-OS 7.0(3)I7(3) with bug ID CSCvj86260. It ONLY affects the Nexus 3548 and not other Nexus 3K switches and ONLY on the new NX-OS 7.0(3)I7(3).
The explanation is that the defect causes the switch to drop all Layer-2 frames carrying a non-zero CoS value. Under standard operation, and while testing with other devices and switches, no CoS value is set and everything works fine. HyperFlex, on the other hand, sets CoS values to classify and prioritize different types of traffic. This makes sense for HyperFlex, as it needs to prioritize storage replication traffic over standard traffic.
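To see where that CoS value actually lives on the wire: it is the 3-bit Priority Code Point (PCP) at the top of the 16-bit Tag Control Information field in the 802.1Q VLAN tag. A minimal sketch below packs and unpacks that field; the CoS value 5 is purely illustrative of the kind of non-zero marking the buggy release keyed on, not a value confirmed in the bug report:

```python
import struct

def build_tci(pcp: int, dei: int, vlan_id: int) -> bytes:
    """Pack the 16-bit 802.1Q Tag Control Information field:
    3-bit PCP (CoS), 1-bit DEI, 12-bit VLAN ID."""
    assert 0 <= pcp <= 7 and dei in (0, 1) and 0 <= vlan_id <= 4095
    return struct.pack("!H", (pcp << 13) | (dei << 12) | vlan_id)

def cos_of(tci: bytes) -> int:
    """Extract the CoS (PCP) value from a packed TCI field."""
    return struct.unpack("!H", tci)[0] >> 13

# A frame marked with CoS 5 on VLAN 100 (illustrative values).
# The defective NX-OS release dropped any L2 frame where this
# 3-bit field was non-zero; untagged/CoS 0 traffic passed.
tci = build_tci(pcp=5, dei=0, vlan_id=100)
print(cos_of(tci))  # → 5
```

This also explains why every test with plain switches and SVIs passed: ordinary hosts send frames with CoS 0 (or untagged), so only the CoS-marked HyperFlex traffic tripped the bug.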
So it seems that I was probably the first person in the world to have such a setup of HyperFlex with Nexus 3548 upstream switches running NX-OS 7.0(3)I7(3).