Ethernet backhaul breaks DHCP


  • I have three Alien routers and a single Alien meshpoint. Current setup in the app shown here:

    [image: Screen Shot 2021-11-30 at 1.17.36 PM.png]

    The physical layout looks like this:

    [image: working.png]

    If I switch any of the RAMPs or the mesh nodes to ethernet backhaul, the network stops working. The physical layout, for example:

    [image: broken.png]

    I've reported the bug to customer service and got some escalations but no actual traction on what the problem is or why it might be broken. They suggested "ethernet loops," but I don't see how that's possible unless the loop is being created by putting the mesh point in ethernet backhaul mode. Is there maybe something about running the main node in bridge/AP mode that is causing trouble?

    The actual failure mode seems to mostly happen on iOS devices, but I think that might be a red herring. When I look at the DHCP log on the router while the problem is happening, I can see the failing device repeatedly requesting a DHCP address and the router replying, but it looks like the device never gets the response:

    Nov 30 11:20:42 home-router dhcpd3: DHCPDISCOVER from 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:20:42 home-router dhcpd3: DHCPOFFER on 192.168.2.145 to 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:20:43 home-router dhcpd3: DHCPDISCOVER from 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:20:43 home-router dhcpd3: DHCPOFFER on 192.168.2.145 to 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:20:46 home-router dhcpd3: DHCPDISCOVER from 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:20:46 home-router dhcpd3: DHCPOFFER on 192.168.2.145 to 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:20:50 home-router dhcpd3: DHCPDISCOVER from 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:20:50 home-router dhcpd3: DHCPOFFER on 192.168.2.145 to 4e:7d:60:01:d1:41 via eth1
    

    vs when DHCP is working:

    Nov 30 11:35:02 home-router dhcpd3: DHCPDISCOVER from 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:35:03 home-router dhcpd3: DHCPOFFER on 192.168.2.145 to 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:35:04 home-router dhcpd3: DHCPREQUEST for 192.168.2.145 (192.168.2.1) from 4e:7d:60:01:d1:41 via eth1
    Nov 30 11:35:04 home-router dhcpd3: DHCPACK on 192.168.2.145 to 4e:7d:60:01:d1:41 via eth1
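
    The difference between the two excerpts (OFFERs that are never followed by a REQUEST) can be checked mechanically. Below is a minimal sketch, assuming log lines shaped like the ones above; the function name `stalled_clients` and the `min_offers` threshold are made up for illustration and are not part of dhcpd or any AmpliFi tooling.

```python
import re
from collections import defaultdict

# Matches the DHCP message type and the client MAC in lines shaped like
# "... dhcpd3: DHCPOFFER on 192.168.2.145 to 4e:7d:60:01:d1:41 via eth1".
# Assumption: the MAC is the first colon-separated hex sextet after the type.
LINE_RE = re.compile(
    r"(DHCPDISCOVER|DHCPOFFER|DHCPREQUEST|DHCPACK)\b.*?"
    r"\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)


def stalled_clients(lines, min_offers=3):
    """Return {mac: longest run of OFFERs with no REQUEST/ACK in between}
    for every client whose longest run is at least min_offers."""
    current = defaultdict(int)  # OFFERs since the last REQUEST/ACK, per MAC
    longest = defaultdict(int)  # longest such run seen so far, per MAC
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a DHCP line we recognise
        kind = match.group(1).upper()
        mac = match.group(2).lower()
        if kind == "DHCPOFFER":
            current[mac] += 1
            longest[mac] = max(longest[mac], current[mac])
        elif kind in ("DHCPREQUEST", "DHCPACK"):
            current[mac] = 0  # the client heard the offer and moved on
        # DHCPDISCOVER lines neither extend nor reset a run
    return {mac: n for mac, n in longest.items() if n >= min_offers}
```

    Fed the failing excerpt, this reports `4e:7d:60:01:d1:41` with a run of 4 unanswered offers; fed the working excerpt, it reports nothing. It only confirms that replies are being lost on the way back to the client, not where they are lost.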
    

    The support folks said that an ethernet loop would be detectable by the router's CPU usage spiking, which definitely does happen - so what's going on here, why does ethernet backhaul cause it, and how can I fix it?

    My ultimate goal is for all the RAMPs and the mesh node to use wired backhaul.


  • Service ticket ID #310038 in case any Ubiquiti people are here reading this...


  • Hi @carlnorum - officially AmpliFi says that wired backhauls have to be connected to the LAN side of the primary Alien...
    https://help.amplifi.com/hc/en-us/articles/115006826048-How-to-Enable-Ethernet-Wired-Backhaul

    If you start with only the original Alien 'Entryway' MeshPoint wired directly to 'Network Closet' Alien router does everything work fine?

    People have wired mesh points through the WAN port to Bridge mode AmpliFi routers successfully in the past, but there may be something buggy in the current Alien firmware, or it may have been broken for some other reason (e.g. virtual MAC addressing?)

    If you eventually do have to run everything through the 24-port switch, you might need to run a second VLAN line from the Network Closet Alien LAN side to the switch and connect the wired backhaul mesh points back through the VLAN if a direct connection is required

    Just throwing out ideas...


  • Yeah - I just tried a direct wire of the entryway mesh node, and things seem good. I wonder why a switch can't be in there? I can probably get away with direct wiring everything... it's only going to be weird at the TV stand, where I'll have to insert the RAMP ahead of that 16-port switch. Ew.


  • Hi @carlnorum - you don't have to direct wire everything; they just want the backhauls logically wired directly to the LAN side of the primary Alien router

    The backhauls can still go through switches and other Aliens, etc., as long as the connection ends up on the LAN side instead of traversing the WAN port of the Alien router (which has worked in the past, but was never officially supported)


  • OK - so if the backhauls are on the LAN side of the primary, it still has to have its WAN side hooked up to something, right?

    I guess that means interposing the primary node between my router and my entire home network.... I guess I can do that, but it seems undesirable. I'd rather hardwire these few connections I guess? Hm.


  • Nope - direct wiring a second node has broken things. Same behaviour with broken DHCP, same CPU spike on the router.

    I've direct wired Entryway and TV Stand; so my network can't possibly be involved now, right?


  • Here's the current (direct wired) setup, for reference.

    [image: Screen Shot 2021-11-30 at 4.08.16 PM.png]


  • Hi @carlnorum - that topology should work, so creating a Support Info file and updating your ticket should help them to diagnose the problem(s)


  • I wonder if you are running into spanning tree issues due to too many switching devices in the network topology?


  • Yeah, I have a support ticket open already. Grump grump grump.

    And only two switches, and in the new arrangement they aren't even directly connected; what could break?


  • @carlnorum I think, depending on how the LAN ports on the AmpliFi units are used, that they may count in the path total for switches. Perhaps I'm speculating wildly; I'll be interested in the results of the support ticket.


  • Zero updates from them so far. I'm going to hit up live chat again right now and see if they have anything to say...


  • They've got me trying the 3.6.3rc3 beta now. Let's see what happens! 🤞


  • No go; exactly the same behaviour with 3.6.3rc3. I'll try the direct wiring version later this evening.


  • Hi @carlnorum - if it isn't too painful you might want to factory reset everything and go step-by-step confirming what works along the way and pulling Support Info files at each step until it breaks...

    1. Router Only
    2. Wireless MeshPoint
    3. Direct Wired MeshPoint
    4. Wireless RAMP #1
    5. Direct Wired RAMP #1
      etc.

  • I got another lengthy reply from support that didn't explain anything and tried to blame my other networking equipment (also made by Ubiquiti, and not in any kind of weird configuration). I'm going to give up and return the router/meshpoint combo and just run the routers as separate/independent APs.


  • If it's not too much hassle you should really try Derek's suggestion of starting from scratch and testing as you go. It's probably the only way you're going to find out where and what the issue is.


  • Yup, I'm going to do that for sure. Unfortunately I don't hold out much hope. The other internet users in my house don't particularly care for my breaking everything, as you might imagine. 😉


  • Well, I set everything up. And then I added wired backhaul on one node. And it worked. And so I added wired backhaul on another node. And it worked. (Both mesh points on the LAN side ports of the main node).

    And then 3 days later (tonight, in fact), I came home from an outing and my phone is out to lunch just like before. Router showing high CPU usage just like before. It's exactly the same and for no reason.

    45 minutes later and my wife's phone starts doing it too.

    I'm dying over here. Support is useless: "It seems you are case is escalated to the tier3 team.", " I request you to wait for the tier3 team's reply. They will get back to you by email.", "I am sorry the tier3 team is available by email only. They will get back to you by email."

    I'm not sure how I expect this to be resolved without some live logging/debugging on their part.

    Has anybody ever tried wired backhaul besides me? Has anybody had it work?