Part 2 of the mini-series. Today we will review the results and some details of the experiment.
The University has an enterprise-grade WLAN from a Tier 1 vendor. The total number of APs under test is 100 (plus up to 90 neighbouring APs not belonging to that network). They represent a portion of the larger campus network that serves the education, service and research facilities at the university. Each AP had visibility of ~13 other managed APs and ~6 neighbours on average. There were groups (‘cliques’) as large as 16 APs, where each AP could see the other 15. This means that at least 16 non-overlapping channels would be required to run the network smoothly using a traditional planning approach.
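The clique observation above is really a graph-colouring argument: treat APs as vertices, mutual visibility as edges, and channels as colours, and a clique of size k forces at least k distinct channels. A minimal sketch (my own illustration, not the paper's code) with a greedy colouring:

```python
# Why a 16-AP clique forces 16 channels under traditional planning:
# model the interference graph and colour it so that no two APs that
# can hear each other share a channel. A k-clique needs >= k colours.

def greedy_colour(neighbours):
    """Assign each AP the smallest channel index unused by its neighbours."""
    channel = {}
    for ap in sorted(neighbours, key=lambda a: -len(neighbours[a])):
        used = {channel[n] for n in neighbours[ap] if n in channel}
        channel[ap] = next(c for c in range(len(neighbours)) if c not in used)
    return channel

# A 16-AP clique: every AP hears every other AP.
aps = [f"AP{i}" for i in range(16)]
clique = {ap: {other for other in aps if other != ap} for ap in aps}

plan = greedy_colour(clique)
print(len(set(plan.values())))  # 16 distinct channels needed
```

With only 3 (or even 4) non-overlapping channels available in 2.4 GHz, such a clique simply cannot be coloured conflict-free, which is the whole motivation for the partially-overlapping approach.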
By default the WLAN runs the vendor’s automated channel-selection algorithm, which set the channels on the APs under test like this:
The researchers ran the network with this setup for two weeks, collecting statistical data as a baseline. The data was collected four times a day – just before and after the two main network peak times (hence the two spikes on each day’s graph).
Afterwards, they connected their own software module, implementing the advanced channel-planning algorithm, to the controller. The module was:
- Getting the stats from the controller via SNMP (the same four times a day);
- Optimizing the channel plan;
- Pushing the manual channel assignments back (also over SNMP) at the end of the day (to minimize downtime) – to be tested the next day.
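The daily cycle above can be sketched as a simple poll-optimise-push loop. Everything below is a placeholder of my own (function names, OIDs and return values are invented; the actual module spoke SNMP to the vendor controller and ran the paper's algorithm):

```python
# Sketch of the experiment's daily cycle. All functions are stubs --
# none of these names or data shapes come from the paper.

POLL_TIMES = ("before_am_peak", "after_am_peak",
              "before_pm_peak", "after_pm_peak")

def fetch_stats_via_snmp(when):
    # Placeholder: would SNMP-walk FCSErrorCount / FailedCount /
    # MultipleRetryCount for every AP at the given time of day.
    return {"ap1": {"fcs_errors": 10}, "ap2": {"fcs_errors": 4}}

def optimise_channel_plan(samples):
    # Placeholder: the interference-minimising algorithm goes here.
    return {"ap1": 1, "ap2": 5}

def push_channels_via_snmp(plan):
    # Placeholder: one SNMP SET per AP, done at the end of the day
    # so that channel changes do not disrupt users at peak times.
    return len(plan)

samples = [fetch_stats_via_snmp(t) for t in POLL_TIMES]  # 4x daily
plan = optimise_channel_plan(samples)                    # optimise
pushed = push_channels_via_snmp(plan)                    # apply overnight
print(pushed)  # number of APs reconfigured
```

The design point worth noting is the batching: stats are sampled around the peaks (where congestion shows), but the disruptive channel changes are deferred to the quiet end of the day.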
This has continued for two weeks and finally resulted in the following setup:
Looks like the good old 1-5-9-13, doesn’t it? Note that this time the APs are using every single available channel. Of course, the majority still stick to the minimally overlapping channels, albeit with a reduced channel separation of 4.
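Why a separation of 4 still counts as "minimally overlapping": 2.4 GHz channel numbers are 5 MHz apart while each channel occupies roughly 20 MHz, so four channel numbers of separation pushes the blocks completely apart. A back-of-the-envelope model (my own simplification, not the paper's interference metric):

```python
# Rectangular-spectrum approximation of 2.4 GHz channel overlap:
# channel centres are 5 MHz apart, each channel ~20 MHz wide.

def overlap_fraction(ch_a, ch_b, width_mhz=20, spacing_mhz=5):
    """Fraction of channel bandwidth shared between two channel numbers."""
    separation = abs(ch_a - ch_b) * spacing_mhz
    return max(0.0, (width_mhz - separation) / width_mhz)

for sep in range(6):
    print(sep, overlap_fraction(1, 1 + sep))
# separation 0 -> 1.0, 1 -> 0.75, 2 -> 0.5, 3 -> 0.25, 4+ -> 0.0
```

In this crude model the 1-5-9-13 set (separation 4) has zero mutual overlap, which is exactly why it is the classic European four-channel plan; real spectral masks leak a little beyond 20 MHz, hence "minimally" rather than "non"-overlapping.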
But did this actually improve the network performance?
To verify this, the researchers analysed three network performance counters on the APs:
- FCSErrorCount – frames that failed the FCS check. These were successfully transmitted but not successfully received (on the AP, so that’s the uplink). In a congested network this mostly happens due to interference or a late collision. Each of these errors may cause a failure at a higher layer and trigger even more retransmissions (TCP segment retransmission, for example), greatly increasing network overhead.
- FailedCount – the number of failed transmissions (on the AP, so that’s the downlink), including those that failed even after multiple retransmits. It correlates well with how congested a network is, as multiple devices contend for the medium and ACKs get lost. Again, these failures may lead to errors/retransmits at the higher layers of the networking stack.
- MultipleRetryCount – frames that were successfully transmitted after several attempts. It is a good complement to FailedCount: since the frame was sent eventually, error escalation to the higher layers of the networking stack was avoided.
Each day the researchers compared the data with the baseline (assuming that user behaviour and traffic patterns differed insignificantly between the baselining and testing periods); here are the final averaged result and the relative per-day graphs.
As you can see, the improvement is quite impressive:
74% fewer Tx errors on average (ranging between -41% and -94% on different days), including a 26% increase in successes after multiple retransmits. Even the lowest improvement (-41%) means that the default algorithm causes about 1.7 times more packet loss (and 3.8 times more loss compared to the average).
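The "times more loss" figures follow directly from the relative-error numbers: if the new plan removes a fraction r of the errors, the default plan produces 1 / (1 - r) times as many. A quick check:

```python
# Convert a relative error reduction into a "baseline has X times
# more errors" ratio: baseline / optimised = 1 / (1 - reduction).

def loss_ratio(reduction):
    """How many times more errors the default plan has vs the optimised one."""
    return 1.0 / (1.0 - reduction)

print(round(loss_ratio(0.41), 1))  # worst day: ~1.7x more loss
print(round(loss_ratio(0.74), 1))  # average:   ~3.8x more loss
```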
The successes after multiple retries show how many packets did not trigger errors at the upper layers, reducing the number of application errors and the network overhead. Eduard provided a good example of this:
I like to explain the relationship between MultipleRetryCount and FailedCount with a story from WWI: after the introduction of the first steel combat helmets (e.g. the British Brodie helmet or the French Adrian helmet), physicians observed that the number of soldiers being treated for head wounds had increased.
Surprise? No: without the helmet, many of them would have added to the number of deaths instead.
So, is the increase in MultipleRetryCount (packets that didn’t succeed on the first transmission) a surprise? No: many of the packets that needed several transmissions would have increased the FailedCount without our interference-minimizing channel selection.
Overall, clients will see this as increased download speeds and lower network/application latency.
41% fewer Rx errors on average (ranging from -64% to +16% on different days). This increases the effective uplink capacity and benefits downlink and latencies as well (ACKs travel on the uplink). Note that even on the days when Rx errors increased compared to the baseline, the number of Tx errors was still reduced.
The experiment proved that a network with cleverly overlapping channels can work significantly better than a network with non-overlapping channels, and can tolerate neighbour interference significantly better as well.
Also, for once, AutoRF did not suck! The experiment proves that it can be done right (I’m not advocating abandoning RF design). And it doesn’t take a 4-CPU, 10500TB-RAM monster to run it: for this experiment the researchers used an old Pentium 4 box, which took less than a minute to calculate a new RF plan for the entire network. Furthermore, they claim that an older (though less functional) version of the algorithm ran fine on the “good old” WRT54G! Given that most modern WLAN controllers are multi-CPU x86-based servers (the NX9500, for example, has 2x 6-core Xeons and 36GB of RAM), running something like this should not be a problem.
You can read the full paper here.
There’s just one unanswered question left:
What does all this have to do with Formula 1? Well, there is a commonality. Can you guess, based on the picture below? I’ll reveal the answer in Part 3 (it’s not small either).
For now – post your guesses in the comments!
Ester Mengual, Eduard Garcia-Villegas, Rafael Vidal, “Channel management in a campus-wide WLAN with partially overlapping channels”. In 2013 IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC): Mobile and Wireless Networks.
Ester Mengual, Eduard Garcia-Villegas, “Frequency management in a campus-wide Wi-Fi deployment”. http://upcommons.upc.edu/pfc/handle/2099.1/17298, 2013.
Garcia-Villegas, E.; Vidal-Ferré, R.; Paradells-Aspas, J., “Implementation of a Distributed Dynamic Channel Assignment Mechanism for IEEE 802.11 Networks”. In 2005 IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), vol. 3, pp. 1458–1462, 11–14 Sept. 2005.