XS1930-12HP: losing connectivity to devices in mgmt VLAN (except switch itself) after 5 minutes

antesilvam Posts: 3  Zyxel Employee
edited August 2022 in Switch
Hi!

I have recently been seeing a very strange problem in my network, and I increasingly suspect it is related to the switch. I have an XS1930-12HP on which several VLANs are defined. One of these is the management VLAN, to which all the APs, the AP controller, and the switches of the network are connected. After about 5 minutes of inactivity I lose connectivity to all devices in the management VLAN except the XS1930 itself, which I can still ping, log into, etc. Once I have done that (e.g. pinged the switch), the rest of the management VLAN becomes reachable again.

Here is a Wireshark capture. It shows the packets as they appear on the router's bond.99 interface, which faces the management subnet. I should mention that the router is a "router on a stick" setup: the switch delivers all routed subnets to the router over a trunk that aggregates two physical ports into a LAG.

In the capture below, my first ping from my machine (192.168.11.50) to another device in the management network (192.168.99.13) gets no reply: the ARP request from the router (e0:63:da:cc:eb:46) for the target IP (192.168.99.13) goes unanswered. After that there is some more ARP traffic. The router sees the broadcast requests and answers them, but I cannot tell, of course, whether the requesters actually receive the router's replies.

Later you see a second ping, this time to the switch, which works as expected: ICMP echo request from my machine, ARP request from the switch to discover the router, ARP reply from the router, echo reply from the switch. Once the switch has "learned" the router's MAC address, I am suddenly able to ping 192.168.99.13 as well, which did not work before. I then have a proper connection to 192.168.99.13 for five minutes; after that it stops again, and I have to ping the switch once more to restore connectivity.
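
The check I did by eye in the capture (looking for ARP requests that never get a reply) could also be scripted. Below is a minimal Python sketch, assuming the capture has already been exported into simple records. The `ArpEvent` type, its field names, the 1-second reply window, the timestamps, and the MAC address used for 192.168.99.13 are all hypothetical; only the IP addresses and the router MAC are taken from the capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArpEvent:
    """Hypothetical, simplified record of one ARP packet from a capture."""
    t: float          # seconds since capture start
    op: str           # "request" or "reply"
    sender_ip: str
    sender_mac: str
    target_ip: str


def unanswered_requests(events: List[ArpEvent], window: float = 1.0) -> List[ArpEvent]:
    """Return ARP requests with no matching reply within `window` seconds.

    A reply matches when it comes from the requested IP and is addressed
    to the IP that sent the request.
    """
    missing = []
    for req in events:
        if req.op != "request":
            continue
        answered = any(
            e.op == "reply"
            and e.sender_ip == req.target_ip
            and e.target_ip == req.sender_ip
            and req.t <= e.t <= req.t + window
            for e in events
        )
        if not answered:
            missing.append(req)
    return missing


ROUTER_MAC = "e0:63:da:cc:eb:46"      # from the capture
DEVICE_MAC = "02:00:00:00:00:0d"      # hypothetical placeholder

# Illustrative data only: one request that goes unanswered, one that is answered.
events = [
    ArpEvent(0.0, "request", "192.168.99.1", ROUTER_MAC, "192.168.99.13"),
    ArpEvent(30.0, "request", "192.168.99.1", ROUTER_MAC, "192.168.99.13"),
    ArpEvent(30.001, "reply", "192.168.99.13", DEVICE_MAC, "192.168.99.1"),
]

for req in unanswered_requests(events):
    print(f"t={req.t:.3f}s: no reply to who-has {req.target_ip} from {req.sender_ip}")
```

Running this over a longer capture would show whether the unanswered requests cluster at the start of each five-minute outage.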



I am experiencing this problem only with the management VLAN. All other VLANs seem to remain functional.

To me this really looks like an L2 problem, and I would be extremely grateful if someone could help me with it. If you need more details on the switch configuration, please let me know.

Best, Benedikt

All Replies

  • Zyxel_Jason
    Zyxel_Jason Posts: 394  Master Member
    Hi @antesilvam,

    Welcome to Zyxel Community!

    It would be helpful if you could draw a simple diagram of your current topology and share it here.
    I will also PM you later to ask for the running-config of your switch; please check your inbox at the top right of the forum page for details.

    Thanks.
    Jason
  • antesilvam
    antesilvam Posts: 3  Zyxel Employee
    Thanks a lot for your reply and your PM. I drew a small sketch of the current network configuration. It does not contain all the details, but the relevant connections to the switch should be there.


    Please let me know if anything is unclear.

    Best, Benedikt
  • Zyxel_Jason
    Zyxel_Jason Posts: 394  Master Member
    Hi @antesilvam,

    Thanks for sharing!

    Having checked your configuration, I think you may need to set the ARP learning mode to ARP-Request on the uplink port of your XS1930-12HP, because it is an L3 (routing) capable switch.
    The default is ARP-Reply.
    The setting is located at IP Application > ARP Setup > ARP Learning.
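
    The difference between the two learning modes can be pictured with a toy model. This is a minimal Python sketch, not the switch firmware's logic: it assumes that in ARP-Reply mode the table learns only from replies to requests the switch itself sent, while in ARP-Request mode it also learns from ARP requests it receives. The class and method names and the switch IP (192.168.99.2) are hypothetical, and the per-port nature of the real setting is ignored.

```python
from enum import Enum
from typing import Dict, Set


class LearningMode(Enum):
    ARP_REPLY = "ARP-Reply"
    ARP_REQUEST = "ARP-Request"


class ToyArpTable:
    """Toy ARP table that models only the learning rule (no aging, no ports)."""

    def __init__(self, mode: LearningMode, own_ip: str) -> None:
        self.mode = mode
        self.own_ip = own_ip
        self.entries: Dict[str, str] = {}   # IP -> MAC
        self.pending: Set[str] = set()      # IPs this device has asked about

    def send_request(self, target_ip: str) -> None:
        """Record that this device sent an ARP request for target_ip."""
        self.pending.add(target_ip)

    def on_arp(self, op: str, sender_ip: str, sender_mac: str, target_ip: str) -> None:
        """Process a received ARP packet according to the learning mode."""
        if op == "reply":
            solicited = sender_ip in self.pending and target_ip == self.own_ip
            # Assumption: ARP-Request mode accepts any reply, ARP-Reply mode
            # accepts only replies to this device's own requests.
            if solicited or self.mode is LearningMode.ARP_REQUEST:
                self.entries[sender_ip] = sender_mac
                self.pending.discard(sender_ip)
        elif op == "request" and self.mode is LearningMode.ARP_REQUEST:
            # Learn the sender of any ARP request that is seen.
            self.entries[sender_ip] = sender_mac


SWITCH_IP = "192.168.99.2"            # hypothetical
ROUTER_IP = "192.168.99.1"
ROUTER_MAC = "e0:63:da:cc:eb:46"


def learns_router_from_broadcast(mode: LearningMode) -> bool:
    """Does the table learn the router after seeing its request for .99.13?"""
    table = ToyArpTable(mode, SWITCH_IP)
    table.on_arp("request", ROUTER_IP, ROUTER_MAC, "192.168.99.13")
    return ROUTER_IP in table.entries


for mode in LearningMode:
    print(mode.value, "learns router from its broadcast:", learns_router_from_broadcast(mode))
```

    In this model, a table in ARP-Reply mode learns the router only after sending its own request, which matches the capture where the switch asks for the router when it is pinged.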


    Hope it helps.
    Jason
  • antesilvam
    antesilvam Posts: 3  Zyxel Employee
    Thanks for your reply and the support. I am a bit confused. I know that the switch is L3 capable, but I did not turn this functionality on anywhere, so it seems strange to me that ARP needs to be configured on the switch. However, making the proposed change does appear to solve the problem. It all starts to make sense if I assume that the XS1930 is routing the management subnet, which I explicitly don't want, as my router also applies firewall rules to this subnet!
    Best, Benedikt

  • Zyxel_Jason
    Zyxel_Jason Posts: 394  Master Member
    Hi @antesilvam,

    From the screenshot of your packet capture, it looks like the router (192.168.99.1) sent an ARP request for the MAC address of 192.168.99.13 but did not get an ARP reply from .99.13.
    The same thing happens when .99.13 tries to find .99.1.

    We think the cause may be that the MAC address is being learned on the wrong port. ARP-Request mode helps because the switch then updates its ARP table immediately whenever it receives an ARP request.
    That is also why the symptom temporarily disappears after you ping the switch.
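
    The five-minute pattern reported in this thread is consistent with a table entry that is refreshed when the switch is pinged and then ages out. The following is a toy Python model of such an entry, not a description of the switch's actual timers: the 300-second age limit is an assumption taken from the observed five minutes, and the class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple

AGE_LIMIT_S = 300.0  # assumption: mirrors the 5 minutes observed in this thread


class AgingTable:
    """Toy IP -> MAC table whose entries expire after `age_limit` seconds."""

    def __init__(self, age_limit: float = AGE_LIMIT_S) -> None:
        self.age_limit = age_limit
        self._entries: Dict[str, Tuple[str, float]] = {}  # IP -> (MAC, learned_at)

    def learn(self, ip: str, mac: str, now: float) -> None:
        """Store or refresh an entry at time `now` (seconds)."""
        self._entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        """Return the MAC for `ip`, or None if unknown or aged out."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at > self.age_limit:
            del self._entries[ip]   # entry has aged out
            return None
        return mac


ROUTER_IP = "192.168.99.1"
ROUTER_MAC = "e0:63:da:cc:eb:46"

table = AgingTable()
table.learn(ROUTER_IP, ROUTER_MAC, now=0.0)        # e.g. triggered by pinging the switch
print(table.lookup(ROUTER_IP, now=120.0))          # still present after 2 minutes
print(table.lookup(ROUTER_IP, now=301.0))          # gone after more than 5 minutes
table.learn(ROUTER_IP, ROUTER_MAC, now=310.0)      # the next ping refreshes it
print(table.lookup(ROUTER_IP, now=320.0))
```

    If the real behaviour follows this pattern, anything that relearns the router's address before the entry expires would keep the management VLAN reachable.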

    If it is convenient for you, you could unplug one cable of the LAG (leaving only one cable between your router and the XS1930) and change the ARP learning mode back to ARP-Reply to see whether the symptom goes away.
    Jason