OpenWiFi Deployments with Multi-PSK Solutions

OpenWiFi Deployments with Multi-PSK Solutions

Recently, solutions offering MPSK-RADIUS services have gained traction in various deployment environments, including MDUs, parks, and venues. These systems provide a versatile solution, enabling user authentication, accounting, and attribute assignments such as VLANs, rate limiting, and quotas, all without requiring EAP support.

Some solutions, like those built on Passpoint 2.0, offer a smoother user experience by enabling seamless connectivity with zero user intervention. However, these solutions require EAP support, which is often lacking in low-end devices such as IoT devices. As a result, these devices cannot take advantage of the benefits provided by Passpoint 2.0-based solutions.

On the other hand, MPSK-RADIUS systems are based on the most common and basic security protocol in Wi-Fi, PSK (Personal), which is supported by virtually every device. In MPSK-RADIUS deployments, from the client’s perspective, there is no visible RADIUS server. Clients perform simple WPA or WPA2 Personal authentication using passphrases entered by the users.

In a traditional MPSK-RADIUS system, the AP does not have knowledge of the passphrase. Instead, a RADIUS server is configured with a list of MAC addresses and their corresponding passphrases. When a new client tries to connect to the AP, it goes through the open system authentication and association processes. Afterward, the AP initiates the 4-way handshake:

  • The AP sends EAPOL-Key message 1, which contains the ANonce.
  • The client generates the SNonce and, using it alongside the passphrase (which is expanded based on the SSID to become the PMK, where PMK = f(PSK, SSID)), generates the PTK and its derivatives (KEK, KCK, and TK).
    • PTK = f(PMK, Client MAC, BSSID, ANonce, SNonce)
  • The client installs the PTK, constructs EAPOL-Key message 2 by adding the SNonce, and uses the KCK to calculate a MIC for the message payload.
  • In a typical WPA2 Personal scenario, the AP would use the same inputs to calculate the PTK and verify the MIC based on the KCK. However, since the passphrase is stored on the RADIUS server, the AP forwards the entire content of EAPOL-Key message 2 to the RADIUS server in an Access-Request message.
  • The RADIUS server searches for a matching MAC address in its list. Once a match is found, it uses the passphrase and the information from EAPOL-Key message 2 to calculate the PMK. The RADIUS server then sends the PMK to the AP in an Access-Accept message.
  • With the PMK, the AP can either use the existing ANonce and SNonce to calculate the PTK and its derivatives or restart the 4-way handshake procedure to obtain new values. This decision typically depends on the configured timeout between EAPOL-Key messages and the time it takes for the RADIUS server to respond with the Access-Accept message.
  • The remainder of the process follows the standard 4-way handshake. The AP installs the PTK, generates the GTK (if this is its first client), encrypts it with the PTK, and sends it in EAPOL-Key message 3.
  • The final message, EAPOL-Key 4, is sent by the client to acknowledge the receipt and installation of the GTK.

This setup works well in controlled environments where the network administrator has a pre-populated list of all client MAC addresses. This scenario is common in enterprise networks where EAP is the primary authentication method, and MPSK-RADIUS is used to onboard IoT devices that do not support 802.1X.

However, a traditional MPSK-RADIUS setup is impractical in environments where the administrator has no control over the devices users bring to the network. A prime example is MDU networks, where the admin needs to manage a pool of PSKs that are not tied to a one-to-one mapping with specific MAC addresses. The system must allow users to authenticate with their assigned PSK, regardless of the MAC address of the device they are using to connect to the network.

MPSK with RADIUS systems offers a solution to the problem mentioned above. These solutions typically consist of multiple components that provide various services. Some of these components include:

  • RADIUS server for authentication and accounting.
  • Databases to manage pools of PSKs and PMKs.
  • Captive portal services for user interaction and onboarding.
  • SMS gateways to assist with new customer onboarding.
  • User portal for customers to manage their subscriptions and change their passwords without requiring network admin intervention.

The workflow for an MPSK-RADIUS solution is similar to the traditional MPSK-RADIUS setup described above, with one key difference: instead of matching the MAC address of the authenticating client to a list of one-to-one MAC-to-PSK pairs, the system calculates the PTK and MIC for all PSKs in the pool and tests them against the MIC sent by the AP in EAPOL-Key message 2. When a match is found, the system forwards the corresponding PMK to the AP. Once the AP receives the PMK, it has two options:

  • Create the GTK and build EAPOL-Key message 3: The AP generates the Group Temporal Key (GTK) if this is the first client or if a new GTK is needed. It then builds EAPOL-Key message 3, encrypts the GTK with the PTK, and sends it to the client.
  • Restart the 4-way handshake: If the time spent waiting for the Access-Accept message with the PMK exceeds the timeout for the 4-way handshake, the AP will restart the handshake process. This ensures that fresh ANonce and SNonce values are used, allowing the process to complete successfully.
MPSK-Radius solution authentication process

One major downside to the process described in the flowchart above is that every time a device roams to a new AP, it must undergo a full RADIUS authentication, which usually takes about one second. A roaming time of one second is unacceptable, as it can disrupt most Layer 3 (L3) connections. To address this, OpenWiFi enables PMKSA key caching for when the client re-roams back to the original AP and supports Fast Transition (FT) with MPSK-RADIUS or PSK-RADIUS when roaming to a new AP. The FT process can be summarized as follows:

  • Client initiates the roaming process by scanning for the next best candidate AP.
  • Client confirms FT support: The client checks whether the target AP supports Over-The-Air (OTA) or Distribution System (DS) FT transition roaming.
  • For Over-The-Air (OTA) FT:
  • Authentication exchange: The client and the target AP exchange FT Authentication frames. The client shares its SNonce, PMK ID, PMK-R0 ID, and Mobility Domain ID with the target AP. In return, the target AP provides its ANonce, PMK ID, PMK-R0 ID, PMK-R1 ID, and Mobility Domain ID.
  • Association exchange: The client and the target AP exchange Association and Reassociation frames. In the Association frames, the client includes the same information as above, along with a MIC generated on the message payload. In the Reassociation frames, the target AP includes the same information plus the GTK.
  • For Distribution System (DS) FT:
  • Authentication exchange: The same information is exchanged in the Authentication frames as in OTA FT, but instead of being sent over the air, the data is embedded in an Action Frame sent to the current AP. The current AP forwards this frame over the distribution system to the target AP. The target AP replies to the client via the distribution system through the current AP.
  • Association and Reassociation exchange: The Association and Reassociation frames remain unchanged, and the procedure follows the same steps as in the OTA case.
Flowchart of the OTA FT process from the IEEE802.11 2020 document
Flowchart of over DS FT process from the IEEE802.11 2020 document

The packet capture screenshots below illustrate how PMKSA key caching and Over-The-Air (OTA) Fast Transition (FT) with MPSK-RADIUS reduce authentication time from approximately 1 second to around 50 milliseconds.

OpenWiFi time with PMKSA Key caching when roaming back to the original AP
OpenWiFi Roaming time with MPSK-Radius with OTA FT enabled.

Multi-Vender MPSK deployment with Dynamic VLAN Assignment on FreeRadius 

Multi-Vender MPSK deployment with Dynamic VLAN Assignment on FreeRadius 

In today’s diverse networking environments, there’s a growing trend towards combining proprietary vendor solutions with open-source alternatives to achieve a balance of reliable performance and cost-effectiveness. This article explores this idea by demonstrating a Multi-Pre-Shared Key (MPSK) Wi-Fi solution using both Juniper APs and OpenWiFi APs, where authentication and accounting is managed through FreeRadius server.

The Network Setup

At the heart of the network, we have the 3.0 version of the FreeRadius server—an open-source solution capable of handling authentication for a variety of devices which I am running on my home NUC Ubuntu box, alongside a ISC-DHCP-Server to handle DHCP requests, and some iptables NAT rules to nat clients to the outside world . In this scenario, we’re looking at the Juniper AP45 and the TIP OpenWiFi CIG WF-196, both are configured through their cloud controller and added as clients within the FreeRadius’s clients.conf file.

# Juniper AP45
client a83a79a8367e {
    secret = secret
    ipaddr = 10.0.122.18
}

# OpenWiFi WF-196
client d4babaa1484 {
    secret = secret
    ipaddr = 10.0.122.23
}

For continuous roaming tests between these two APs, the Candela LANforge WiFi Roam test is employed. The test basically allows the user to construct a script that can run for a set amount of time or indefinitely. The operator can determine where, when, and how long the client should roam.

Configuring MPSK

The MPSK configuration requires minimal adjustments on the FreeRadius server. For each device, the MAC address and the PSK are entered into the users file:

# LANforge AX210
e8f4080324ba Cleartext-Password := "e8f4080324ba"
        Tunnel-Type = 13,
        Tunnel-Medium-Type = 6,
        Tunnel-Private-Group-Id = 1000,
        # Expected PSK for OpenWiFi using the tunneled password attribute
        Tunnel-Password = 12345678,
        # Expected PSK for Juniper in a Cisco AVPair attributes
        Cisco-AVPair = "psk-mode=ascii",
        Cisco-AVPair += "psk=12345678"

Notably, TIP OpenWiFi APs require the Tunnel-Password attribute, while the Juniper APs utilize the Cisco-AVPair attributes. The 2 different approaches show case how an homogenous ecosystem can be built by integrating proprietary and open source solutions with a universal authentication system like FreeRadius.

Mixed Ecosystem Benefits

The merger of TIP OpenWiFi and proprietary solutions like Juniper’s APs provides numerous benefits:

  • Cost Efficiency: TIP OpenWiFi APs like the WF-196 are significantly more affordable than many proprietary solutions and can sometimes be the only realistic solution in industries with limited budget. A simple comparison of the two APs demonstrates why the open source solution might be a game changer for industries where budgets are limited.
  • Flexibility: TIP OpenWiFi’s open-source nature allows for extensive customization and integration. The open source code allows for customizations that might not be available with the vendor locked solution.
  • Innovation: Proprietary solutions often come with advanced features and robust support, while open-source projects bring community-driven innovation. As a member of the OpenWiFi community you have the opportunity to lead the development of new features on corner cases that the vendor might not be interested in developing.

Configuraing The OpenWiFi Controller: An SDK with a UI

The OpenWiFi controller stands out as an SDK (Software Development Kit) with a user interface, designed for developers and organizations to create their own user-friendly controllers.below we show a screenshot of the configs in a JSON file format. Not all fields have been expanded to not overwhelm the reader. Only the important parts that highlight the settings for the SSID and the RADIUS server.

The Juniper Controllers Configuration

Configuring the Juniper Controller was straightforward and uncomplicated. As someone who had never used it before, I felt as if I had been using it for years. Every configuration was where you expected it to be, and the transitions between steps were effortless.

Running the Test.

The brief movie below shows a wireless STA managed by the Canddela LANforge roaming between the AP45 (channel 48) and the WF-196 (channel 44). Two additional radios were utilized as sniffers to catch the traffic. An EAPOL display filter was employed to separate the 4-Way Handshake from the rest of the air traffic.

I did not get into the specifics of how to set up a FreeRadius server or the more in-depth aspects of the Juniper Mist MPSK setups because that was not the purpose of this article. If you need assistance with the issues listed above, please see my wonderful friend Mohammad Al‘s post at https://artofrf.com/2024/04/02/mist-mpsk-with-freeradius/.

DHCP Option 82 and the endless possibilties

DHCP Option 82 and the endless possibilties

Recently, the development team at OpenWiFi has added a DHCP-Relay feature to the OpenWiFi firmware. I worked with the Q&A team to come up with a test plan for the feature. At first it sounded straight forward. All we have to do is capture the DHCP request packets leaving the AP and verify whether they are sent out as broadcast or unicast. The other aspect we needed to verify was that under option 82 the development team allowed for 3 values to be forwarded to the DHCP server under the 1st and 2nd sub-options, (Agent Circuit ID and Agent Remote ID) the SSID, the VLAN, or the AP MAC address but, this could also be verified with the same simple Wireshark capture.

This is all good but, is it enough? Does that mean that the DHCP-Relay will work as a part of an ecosystem? Will the DHCP server recognize the information sent under option 82 and filter clients accordingly ? Those questions made me realize that a simple capture is insufficient to guarantee whether or not this new feature actually works. We needed to come up with a scenario where this feature is tested under real world conditions. A test plan that mimics how this feature will be used in the field.

But, first let’s take a step back and refresh our memory on what does a DHCP Relay do and why is it important ? For a client to obtain an IP from a DHCP server it has to go through the DORA frame work.

The process starts with broadcasting a Discover packet with a destination address 255.255.255.255. If there is an active DHCP sever on the same VLAN or subnet it will reply with an Offer message and the client and server go through the DORA framework. However, sometimes the DHCP server is on a different VLAN or sets in the cloud serving many locations at once. We all know that broadcasts are dropped by the routers so how can a client reach a DHCP server that doesn’t live on the same subnet. This is where the DHCP-Relay comes into play. The DHCP-Relay intercepts the Discover packet and transform it on behave of the client from a broadcast to a unicast that targets a specific preconfigured DHCP server.

Considering all of the above, it was time to get to the whiteboard. I decided to use my home network as the ground for testing. My home router which is just an Ubuntu NUC running linux will act as the NAT router for 2 new VLANs 100, and 200 (along side my native VLAN subnet 10.0.0.0/16 that I use for my home devices and AP and switches management interfaces). My Ubuntu server will provide DHCP pools for the 2 new VLANs through ISC-DHCP-Server service. I needed a switch to tie it all together so I bought a managed L3 TP-Link switch (TL-SG2008P). I created VLANs on it and connected everything together.

I configured the OpenWiFi AP to act as the DHCP-Relay for any clients connected on the 2 SSIDs I created, WAN100 and WAN200 (WAN100 -> VLAN100, WAN200-> VLAN200). I also configured the DHCP-Relay to include the SSID name under the 1st sub-option of option 82. The reason for that is to use this piece of information as the basis for filtering. The DHCP server will decide on what pool should the clients gets its IP from.

The real challenge laid with understanding and configuring the ISC-DHCP-Server to distinguish between the different request and assign the correct IP address.

I created 2 classes to filter sub-option1 under DHCP option 82. I used the allow and deny statement to filter which pool should serve each VLAN. I deep dove into the packet capture to find out the offset and length of the information included in sub-option 1 to input the correct values for the match if condition statement. This required a capture of the unicast DHCP Discover packet and finding out the number of bytes included in the sub-option.

It can be seen in the screenshot above the total number of highlighted bytes are 8 and the offset is 0 since we started counting from the first received byte.

All was left now if to create 2 STAs and connect each one to a different SSID on the AP and hope they get the correct IP addresses. I took me around 3 days to perfect the setup but at the end the 2 virtual STAs I created on my Candela LANforge box acquired the correct IP addresses with respect to the SSIDs they were connected to.

The 2 highlighted STAs wlan0 and wlan1 are shown to be connected to their respected SSIDs and the IP addresses match the pools assigned to those SSIDs.

The DHCP-Relay feature combined with the ability to filter based on the information included in option 82 on the DHCP server side opens up countless opportunities for different deployment scenarios. Clients can identified and assigned to specific pools based on many different aspects. For example, one can include the AP MAC to differentiate clients based on their physical location, or assign variable lease times based on whether this is a guest or employee VLAN.

DHCP issues with IoT devices?  How the misbehavior of few IoT devices made me appreciate OpenWiFi more

DHCP issues with IoT devices?  How the misbehavior of few IoT devices made me appreciate OpenWiFi more

As a new fan for home automation, I’ve recently started a journey to install several IoT devices that improve the convenience and effectiveness of my daily life. 95% of those devices are 2.4GHz only, which has a unique set of challenges to begin with. A few of the gadgets I have so far installed are a Govee water leak detector, a Ring Amazon Camera, and TP-Link light switches and plugs. Unfortunately, when Proxy-ARP was enabled on my personal OpenWiFi APs setup, I had an issue where some IoT devices were unable to receive IP addresses.

The Govee Wi-Fi gateway and the Amazon Ring Camera were the main devices that displayed the IP address acquisition problem in the presence of active Proxy-ARP functionality on the AP. Understanding how devices behave during the IP acquiring process is crucial for identifying the root cause of this issue.

When an IoT device (or just any device for that matter) connects to a network, it requests an IP address from the network’s DHCP server. To ensure that the IP address it receives is available for use, the device sends a special type of ARP (Address Resolution Protocol) called the DHCP ARP Probe to check if any other devices on the network are using the same IP address. The expectation is that if the IP address is free, no response will be received, indicating its availability. The device may, however, think the IP address is already in use if it receives a response to its ARP Probe and declines the DHCP offer.

The IoT devices’s misbehavior manifests when Proxy-ARP is enabled on the AP. In this scenario, when an IoT device sends an ARP probe to validate IP address availability, the access point responds on behalf of the device. However, the IoT device fails to recognize that the MAC address in the response from the Proxy-ARP is its own.

As a result, the IoT device mistakenly believes that the IP address it acquired from the DHCP server is already in use since it receives a response to its ARP probe. Consequently, the device declines the DHCP offer, causing it to fail in acquiring an IP address and establishing connectivity.

As it can be seen in both packet captures for the Amazon Ring Camera and the Govee WiFi Gateway, both devices decline the DHCP offer after receiving an ARP response to the ARP probe of the offered IP address by the DHCP server.

On the other hand, the remaining devices (about 45 devices from different vendors) on both the 5GHz and 2.4GHz bands recognize the MAC addresses in the responses as being their own MAC addresses and accept the DHCP IP offers. When the Proxy-ARP capability is enabled, an iPhone 13 in the example below behaves as predicted.

Disabling the Proxy-ARP feature on the 2.4GHz SSID (as 90% of IoT devices can only operate on 2.4GHz) was the simplest and most direct solution to the issue. it worked… in my relatively quiet 2.4GHz home environment but, do I really want to compromise a topology design because of couple of misbehaving clients? and I can only picture the AirTime utilization nightmare that would arise in a larger, industrial, and much noisy environment.

I wanted to conduct one more test to further my conviction that this is a device-specific issue, which I was 99% certain of at this time. To my amazement, the Ring Camera functioned flawlessly with Proxy-ARP enabled on the SSID when I tested it against a Ruckus R730 AP that I had lying around. This prompted me to return to the drawing board and think about additional key factors that may have influenced the issue on the AP side.

Reading about how Proxy-ARP is implemented in OpenWiFi (or, to be more precise, the Linux kernel) and discussing the issue with the community. We uncovered that the ARP bounce-back capability is enabled by default in the Linux Kernel. This meant that the Kernel was replying back to the source of the ARP Probe and telling them they are the owner of that MAC address. Now all I had to is to disable that feature in the Kernel and Ring Camera should work with Proxy-ARP enabled.

Disabling the a feature in Kernel’s and building a new functional image may sounds like a tremendous task to take on that could weeks or even months. Imagine all the support nightmare you will have to go through with a vendor to just get such a fix included in their next release. The beauty of OpenWiFi being open-source is that you can easily reconfigure any fundamental functionality and create a new image to mitigate against any issues, like misbehaving clients for example. Utilizing the CICD pipeline of OpenWiFi to disable the ARP bounce-back feature in the Kernel, a new dev image with the Kernel ARP bounce-back feature set to disable from one of the program maintainers took only a few hours.

The fresh AP NOS image had the capacity to correct the misbehavior of a few IoT devices while utilizing less AirTime since I was able to enable the Proxy-ARP capabilities. While it fixed the issue, I wanted to also make sure there are no unintended consequences.  Running the new image through out nightly open-source 5000 test cases ensued that I had a stable image that is ready for deployment.

My objective when begin to write the article was to present an interesting technical case that could potentially save someone from losing a few days of their life attempting to troubleshoot it. By the time the draft was finished, I could clearly see how the message had evolved to emphasize how this case demonstrates the strength and potential of OpenWiFi and its transformative open and collaborative lifecycle, which gives everyone a voice, the power to directly affect and prioritize roadmaps, and the agility to ensure the stability of the ecosystem.

Screenshots of the packet capture with Proxy-ARP functionality is disabled on the 2.4GHz SSID. The IoT devices tend to send around 3-4 ARP Probes before it declares to the network over an ARP announcement that it has updated it is IP address.