Intermittent loss of internet access and cannot ping firewall

I'm starting this thread as I have seen this problem on more than one client site and I can't see other posts relating to this issue.
Our setup is Leased Line (static IP) > USG310 > ZyXEL GS Managed Switch > ZyXEL Smaller GS Managed Switch > Desktop PCs and Macs
About once a day a few random PCs will lose internet access and we can't ping the firewall. Unplugging the network cable and plugging it back in resolves the issue. Oddly, it seems "established" services like Slack still work, the PC still has a valid IP and the firewall can still ping the desktop. I thought it might be Avast on the desktops, but the Macs sometimes have the same problem.
Our USG and switches have a management VLAN (Untagged / PVID=1), a Data VLAN (tagged everywhere up to the switch port that the PC plugs into) and a Phone VLAN. The phones go through a different gateway (ISP router with its own DHCP server) and never have a problem, hence I wonder if the issue is at the USG. The USG acts as a DHCP server for the management and Data VLANs.
Because it happens on user PCs we have limited time to fault find; next up we are going to see what the desktops can ping, e.g. the printers. We wouldn't be able to ping the switches as they are on a different VLAN.
We are also using Spanning Tree Protocol across our switches, but I suspect that if that were the problem, all desktops would stop, rather than random PCs at random times of day.
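For the "see what the desktops can ping" step, a small script on an affected PC could save time while the user is waiting. This is only a rough sketch: the addresses in the usage note are placeholders, not our real ones, and `ping` and `classify` are names made up for illustration. It shells out to the system `ping` command once per target and then labels the result.

```python
import platform
import subprocess


def ping(host: str, timeout_s: int = 3) -> bool:
    """Send one echo request with the system ping; True if the command succeeded."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def classify(results: dict, gateway: str) -> str:
    """Label a set of ping results (host -> reachable) relative to the gateway."""
    if results[gateway]:
        return "gateway reachable"
    # Gateway is down from this PC's point of view; check the other targets.
    others = [ok for host, ok in results.items() if host != gateway]
    if any(others):
        return "gateway-only failure"
    return "no reachability"
```

Usage would be something like `classify({h: ping(h) for h in targets}, gateway)`, where `targets` holds the firewall's VLAN address plus a printer or another PC. A "gateway-only failure" would point at the USG (or the path to it) rather than the desktop's own link.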

All Replies

  • Alfonso Member Posts: 245  Master Member
    Hi @Dudley_Winchester

    It does look very strange.
    I have some questions about your scenario.

    Are the desktops connected to the IP phones, and the IP phones connected to the switch?
    Or are both the phones and the desktops physically connected to the switch?

    I assume it is the second scenario.

    As with many network issues, there is no magic solution, but here is how I would proceed:

    - monitor the network interfaces of all network devices (USG, switches, ...) via SNMP, focusing on bandwidth, errors, ... If you want a free solution, I would use Cacti
    - deploy a network sniffer; I would start by sniffing traffic to the USG
    - review all network device logs (the best way is to send all network logs to a centralized syslog server).
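As a rough illustration of the SNMP monitoring point: once you are polling counters such as `ifInErrors` and `ifInUcastPkts` (standard IF-MIB objects), the useful number is the change between two polls, not the raw value. The sketch below assumes 32-bit counters that wrap around at 2^32; the function names are only for illustration, and the polling itself (Cacti or any SNMP tool) is left out.

```python
COUNTER32_MODULUS = 2 ** 32  # assumed: 32-bit SNMP counters


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter samples, allowing for one wrap-around."""
    return (curr - prev) % modulus


def error_rate(prev_errors: int, curr_errors: int,
               prev_pkts: int, curr_pkts: int) -> float:
    """Input errors per delivered packet between two polls of one interface."""
    errors = counter_delta(prev_errors, curr_errors)
    packets = counter_delta(prev_pkts, curr_pkts)
    if packets == 0:
        return 0.0  # no traffic in the interval, nothing to report
    return errors / packets
```

A port whose error rate climbs around the time a PC drops off would be a good place to start sniffing.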

    I hope these ideas help.

    If you have further information to share, please let us know.

    Regards
  • Hi Alfonso,
    Thanks so much for your reply. I should clarify.
    The phones and computers are each connected to switch ports (i.e. PCs don't piggyback off phones).
    What we do see is that the firewall has a LAN1 with no VLAN (although it's all tagged PVID = 1 in the switches for the purpose of managing those devices). In the firewall the port is called ge3.
    Then, in the firewall we have a VLAN10 on top of ge3 with its own subnet. Using tagging in the switches and untagging at the switch port, the PCs get the correct IP address and are members of VLAN10... usually.
    What we have observed - when a PC goes offline, it still has a valid IP from the correct subnet and can ping other devices in the same VLAN - except for the firewall itself. Renewing the lease on the PC (or unplugging / plugging in the network cable) means the firewall pings again and we are back online.
    What we have seen in the firewall DHCP table is very odd - the desktop PCs all get the correct IP address in the range, but the entries show the DHCP server as either VLAN10 (correct) or ge3 (wrong) - ge3 should only issue IP addresses from a different subnet for switch management.
    I'm thinking for now that we turn off the DHCP server linked to ge3 to see if that helps fix things - I'm also looking to see how I can roll back the USG from v4.3x firmware to v4.25.
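To spot the odd DHCP table entries described above without reading the table by eye, something like the following could work. It is a sketch only: the subnets are invented placeholders (ours are different), the lease list would have to be copied out of the firewall by hand, and `mismatched_leases` is just an illustrative name.

```python
import ipaddress

# Placeholder subnets, assumed for illustration: ge3 = switch management,
# vlan10 = desktop data VLAN.
EXPECTED_SUBNETS = {
    "ge3": ipaddress.ip_network("192.168.1.0/24"),
    "vlan10": ipaddress.ip_network("192.168.10.0/24"),
}


def mismatched_leases(leases, expected=EXPECTED_SUBNETS):
    """Return (ip, interface) pairs whose IP is outside that interface's subnet."""
    bad = []
    for ip, interface in leases:
        network = expected.get(interface)
        if network is None or ipaddress.ip_address(ip) not in network:
            bad.append((ip, interface))
    return bad
```

Any lease that shows a data VLAN address against ge3 would come back in the list, which makes it easier to compare against the PCs that dropped offline that day.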
  • danyedinak Member Posts: 25  Freshman Member
    Hi Dudley,
    I don't know if you're still having this issue or not, but I have had this issue on a multitude of sites, primarily on the wireless connected clients. In each case, the resolution has been to increase the session limit from 1000 to at least 4000, sometimes (though rarely) higher. Keep an eye on it, though, as there appear to be at least some instances of that session limit being reset back to 1000 between firmware updates. There was also a pretty consistent problem with this on an older firmware version.
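If you want to check whether a host is actually running into the session limit before raising it, a quick tally of sessions per source address can help. This is a minimal sketch under assumptions: it takes a plain list of source IPs (one entry per active session) that you would have to export from the firewall yourself, and `hosts_near_limit` and the 90% threshold are my own illustrative choices.

```python
from collections import Counter


def hosts_near_limit(session_sources, limit=1000, threshold=0.9):
    """Return {source_ip: session_count} for hosts at or above threshold * limit."""
    counts = Counter(session_sources)
    return {ip: n for ip, n in counts.items() if n >= limit * threshold}
```

Hosts that show up here at the time of a dropout would support the session limit theory; an empty result would suggest looking elsewhere.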