DHCP issues with IoT devices?  How the misbehavior of few IoT devices made me appreciate OpenWiFi more

DHCP issues with IoT devices?  How the misbehavior of few IoT devices made me appreciate OpenWiFi more

As a new fan for home automation, I’ve recently started a journey to install several IoT devices that improve the convenience and effectiveness of my daily life. 95% of those devices are 2.4GHz only, which has a unique set of challenges to begin with. A few of the gadgets I have so far installed are a Govee water leak detector, a Ring Amazon Camera, and TP-Link light switches and plugs. Unfortunately, when Proxy-ARP was enabled on my personal OpenWiFi APs setup, I had an issue where some IoT devices were unable to receive IP addresses.

The Govee Wi-Fi gateway and the Amazon Ring Camera were the main devices that displayed the IP address acquisition problem in the presence of active Proxy-ARP functionality on the AP. Understanding how devices behave during the IP acquiring process is crucial for identifying the root cause of this issue.

When an IoT device (or just any device for that matter) connects to a network, it requests an IP address from the network’s DHCP server. To ensure that the IP address it receives is available for use, the device sends a special type of ARP (Address Resolution Protocol) called the DHCP ARP Probe to check if any other devices on the network are using the same IP address. The expectation is that if the IP address is free, no response will be received, indicating its availability. The device may, however, think the IP address is already in use if it receives a response to its ARP Probe and declines the DHCP offer.

The IoT devices’s misbehavior manifests when Proxy-ARP is enabled on the AP. In this scenario, when an IoT device sends an ARP probe to validate IP address availability, the access point responds on behalf of the device. However, the IoT device fails to recognize that the MAC address in the response from the Proxy-ARP is its own.

As a result, the IoT device mistakenly believes that the IP address it acquired from the DHCP server is already in use since it receives a response to its ARP probe. Consequently, the device declines the DHCP offer, causing it to fail in acquiring an IP address and establishing connectivity.

As it can be seen in both packet captures for the Amazon Ring Camera and the Govee WiFi Gateway, both devices decline the DHCP offer after receiving an ARP response to the ARP probe of the offered IP address by the DHCP server.

On the other hand, the remaining devices (about 45 devices from different vendors) on both the 5GHz and 2.4GHz bands recognize the MAC addresses in the responses as being their own MAC addresses and accept the DHCP IP offers. When the Proxy-ARP capability is enabled, an iPhone 13 in the example below behaves as predicted.

Disabling the Proxy-ARP feature on the 2.4GHz SSID (as 90% of IoT devices can only operate on 2.4GHz) was the simplest and most direct solution to the issue. it worked… in my relatively quiet 2.4GHz home environment but, do I really want to compromise a topology design because of couple of misbehaving clients? and I can only picture the AirTime utilization nightmare that would arise in a larger, industrial, and much noisy environment.

I wanted to conduct one more test to further my conviction that this is a device-specific issue, which I was 99% certain of at this time. To my amazement, the Ring Camera functioned flawlessly with Proxy-ARP enabled on the SSID when I tested it against a Ruckus R730 AP that I had lying around. This prompted me to return to the drawing board and think about additional key factors that may have influenced the issue on the AP side.

Reading about how Proxy-ARP is implemented in OpenWiFi (or, to be more precise, the Linux kernel) and discussing the issue with the community. We uncovered that the ARP bounce-back capability is enabled by default in the Linux Kernel. This meant that the Kernel was replying back to the source of the ARP Probe and telling them they are the owner of that MAC address. Now all I had to is to disable that feature in the Kernel and Ring Camera should work with Proxy-ARP enabled.

Disabling the a feature in Kernel’s and building a new functional image may sounds like a tremendous task to take on that could weeks or even months. Imagine all the support nightmare you will have to go through with a vendor to just get such a fix included in their next release. The beauty of OpenWiFi being open-source is that you can easily reconfigure any fundamental functionality and create a new image to mitigate against any issues, like misbehaving clients for example. Utilizing the CICD pipeline of OpenWiFi to disable the ARP bounce-back feature in the Kernel, a new dev image with the Kernel ARP bounce-back feature set to disable from one of the program maintainers took only a few hours.

The fresh AP NOS image had the capacity to correct the misbehavior of a few IoT devices while utilizing less AirTime since I was able to enable the Proxy-ARP capabilities. While it fixed the issue, I wanted to also make sure there are no unintended consequences.  Running the new image through out nightly open-source 5000 test cases ensued that I had a stable image that is ready for deployment.

My objective when begin to write the article was to present an interesting technical case that could potentially save someone from losing a few days of their life attempting to troubleshoot it. By the time the draft was finished, I could clearly see how the message had evolved to emphasize how this case demonstrates the strength and potential of OpenWiFi and its transformative open and collaborative lifecycle, which gives everyone a voice, the power to directly affect and prioritize roadmaps, and the agility to ensure the stability of the ecosystem.

Screenshots of the packet capture with Proxy-ARP functionality is disabled on the 2.4GHz SSID. The IoT devices tend to send around 3-4 ARP Probes before it declares to the network over an ARP announcement that it has updated it is IP address.

Leave a Reply