XS1930-12HP: losing connectivity to devices in mgmt VLAN (except switch itself) after 5 minutes

Hi!
I have recently been experiencing a very strange problem in my network, and I increasingly get the impression that it is related to the switch. I have an XS1930-12HP on which several VLANs are defined. One of these VLANs is the management VLAN, to which all the APs, the AP controller, and the switches of the network are connected. It looks like after 5 minutes of inactivity I lose connectivity to all the devices in the management VLAN except the XS1930 itself, which I can still ping, log into, etc. Once I do that (e.g. ping the switch), the rest of the network becomes reachable again.
Here is a Wireshark capture. It shows the packets as they appear at the bond.99 interface, the router's port into the subnet. I should mention that the router is a "router on a stick" setup: all routed subnets are delivered from the switch over a trunk connection that aggregates two physical ports into a LAG.
In the capture you can see that my first ping from my machine (192.168.11.50) to another device in the management network (192.168.99.13) gets no reply. The ARP request from the router (e0:63:da:cc:eb:46) for the target IP (192.168.99.13) is not answered. After that there is some more ARP traffic. The router sees the broadcast requests and answers them, but of course I don't know whether the requesters actually receive the router's replies.
Later you see a second ping, this time to the switch, and this works as expected: ICMP echo request from my machine, ARP request from the switch to discover the router, ARP reply from the router, echo reply from the switch. Once the switch has "learned" the router's MAC address, I am suddenly also able to ping 192.168.99.13, which did not work before. I then have a proper connection to 192.168.99.13 for five minutes, after which it stops again, and I have to ping the switch again to restore connectivity.

I am experiencing this problem only with the management VLAN. All other VLANs seem to remain functional.
To me this really looks like an L2 problem, and I would be extremely grateful if someone could help me with this issue. If you need more details on the switch configuration, please let me know.
Best, Benedikt
All Replies
Welcome to Zyxel Community!
It would be helpful if you could draw a simple picture of your current topology and share it here.
I will also PM you later to ask for the running-config of your Switch. Please check your inbox at the top right of the forum page for more details.
Thanks.
Thanks for sharing!
After checking your configuration, we think you may need to set the ARP learning mode to ARP-Request on the uplink port of your XS1930-12HP, because it is an L3 (routing) capable Switch.
The default is ARP-Reply.
The setting is located at IP Application > ARP Setup > ARP Learning.
Hope it helps.
From the screenshot of your packet capture, it seems that the router (192.168.99.1) sent an ARP request for the MAC address of 192.168.99.13 but did not get an ARP reply from .99.13.
The same thing happens when .99.13 tries to find .99.1.
We think the reason might be that the MAC address is being learned on the wrong port. ARP-Request mode helps by updating the ARP table immediately when the Switch receives an ARP request.
That would also explain why the symptom temporarily disappears after you ping the Switch.
If it is convenient for you, you could unplug one cable of the LAG (leaving only one cable between your router and the XS1930) and change the ARP learning mode to ARP-Reply to see whether the symptom still occurs.
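To make the difference between the two learning modes concrete, here is a minimal Python sketch of an ARP table that learns either only from replies to its own requests (ARP-Reply) or also from incoming requests (ARP-Request). It is a simplified model for illustration, not the Switch firmware: the class and method names are hypothetical, and the 300-second lifetime is only an assumption chosen to match the 5-minute symptom, since the actual timers are not stated in this thread.

```python
from dataclasses import dataclass, field
from enum import Enum


class LearningMode(Enum):
    ARP_REPLY = "arp-reply"      # learn only from replies to our own requests
    ARP_REQUEST = "arp-request"  # additionally learn from ARP requests we receive


@dataclass
class ArpTable:
    """Toy ARP table (hypothetical model, not Zyxel code)."""
    mode: LearningMode
    aging_seconds: float = 300.0  # ASSUMPTION: 5 minutes, to mirror the symptom
    entries: dict = field(default_factory=dict)  # ip -> (mac, learned_at)
    pending: set = field(default_factory=set)    # IPs we have asked about

    def send_request(self, target_ip):
        # Remember that we are waiting for a reply from this IP.
        self.pending.add(target_ip)

    def receive_reply(self, sender_ip, sender_mac, now):
        # In this model, both modes accept a reply only if it answers
        # one of our own outstanding requests.
        if sender_ip in self.pending:
            self.pending.discard(sender_ip)
            self.entries[sender_ip] = (sender_mac, now)

    def receive_request(self, sender_ip, sender_mac, now):
        # Only ARP-Request mode learns the sender of an incoming request.
        if self.mode is LearningMode.ARP_REQUEST:
            self.entries[sender_ip] = (sender_mac, now)

    def lookup(self, ip, now):
        # Return the MAC, or None if the entry is unknown or has aged out.
        entry = self.entries.get(ip)
        if entry is None or now - entry[1] >= self.aging_seconds:
            self.entries.pop(ip, None)
            return None
        return entry[0]


if __name__ == "__main__":
    ROUTER_IP, ROUTER_MAC = "192.168.99.1", "e0:63:da:cc:eb:46"
    for mode in LearningMode:
        table = ArpTable(mode)
        # The router broadcasts an ARP request; does the table learn from it?
        table.receive_request(ROUTER_IP, ROUTER_MAC, now=0.0)
        print(mode.value, table.lookup(ROUTER_IP, now=1.0))
```

In this model, a table in ARP-Reply mode ignores the router's request and only gets an entry once it has sent its own request and received the answer, which resembles what happens when you ping the Switch. In ARP-Request mode the entry is refreshed as soon as the router's request arrives.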