WLAN Channel Management F1 Style: Part 2 of 3

Part 2 of the mini-series. Today we will review the results and some details of the experiment.

The university has an enterprise-grade WLAN from a Tier 1 vendor. The total number of APs under test is 100 (plus up to 90 neighbouring APs not belonging to that network). They represent a portion of the larger campus that serves the education, service and research facilities of the university. Each AP had visibility of ~13 other managed APs and ~6 neighbours on average. There were groups (‘cliques’) as large as 16 APs, where each AP could see the other 15. This means that at least 16 non-overlapping channels would be required to run the network smoothly using the traditional planning approach.
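The clique constraint above is really a graph-colouring lower bound: if k APs can all hear each other, any interference-free plan needs at least k distinct channels. A minimal Python sketch (the topology and function names are hypothetical, not the researchers’ code) illustrates this:

```python
# Sketch: the clique-size lower bound on the number of channels.
# `greedy_channels` is a hypothetical helper, not the paper's algorithm.

def greedy_channels(graph):
    """Give each AP the smallest channel index unused by its visible neighbours."""
    assignment = {}
    for ap in sorted(graph, key=lambda a: -len(graph[a])):  # densest APs first
        taken = {assignment[n] for n in graph[ap] if n in assignment}
        ch = 0
        while ch in taken:
            ch += 1
        assignment[ap] = ch
    return assignment

# A 4-AP clique: every AP sees the other three.
clique = {a: [b for b in "ABCD" if b != a] for a in "ABCD"}
plan = greedy_channels(clique)
print(len(set(plan.values())))  # 4 -- a clique of 4 forces 4 distinct channels
```

Scale the same toy up to a 16-AP clique and you need 16 channels, which 2.4 GHz simply does not have without overlap.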

By default the WLAN runs the vendor’s automated channel selection algorithm, which set the channels on the APs under test like this:

[Figure: initial channel assignment produced by the vendor’s algorithm]

The researchers ran the network with this setup for two weeks, collecting statistical data as a baseline. The data was collected 4 times a day – just before and after the two main daily traffic peaks (visible as the two spikes on each day’s graph).

[Figure: uplink/downlink traffic during the baseline period, showing the two daily peaks]

Afterwards, they connected their own software module, which implemented the advanced channel planning algorithm, to the controller. The module was:

  • Getting the stats from the controller via SNMP (at the same four times a day);
  • Optimizing the channel plan;
  • Pushing the manual channel assignments back (over SNMP as well) at the end of the day (in order to minimize downtimes) – to be tested next day.
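The three steps above amount to a simple daily cycle. Here is a hypothetical skeleton of it, not the researchers’ actual module: `poll_stats`, `optimize_plan` and `push_plan` are stand-ins for the real SNMP GET polling, the optimizer, and the SNMP SET push:

```python
# A minimal sketch of the module's daily cycle (stand-in names, not real code).

POLL_TIMES = ["morning", "pre-peak", "post-peak", "evening"]  # 4x per day

def run_day(poll_stats, optimize_plan, push_plan):
    """Collect stats all day, compute a new plan once, push it at day's end."""
    samples = [poll_stats(t) for t in POLL_TIMES]   # SNMP GETs, 4 times a day
    new_plan = optimize_plan(samples)               # channel optimization
    push_plan(new_plan)                             # SNMP SETs, end of day
    return new_plan

# Toy stand-ins to exercise the loop:
log = []
plan = run_day(
    poll_stats=lambda t: {"time": t, "fcs_errors": 10},
    optimize_plan=lambda s: {"AP1": 1, "AP2": 5, "AP3": 9, "AP4": 13},
    push_plan=log.append,
)
print(plan)  # {'AP1': 1, 'AP2': 5, 'AP3': 9, 'AP4': 13}
```

Pushing once at the end of the day, as the researchers did, keeps the disruptive channel switches out of working hours.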

This has continued for two weeks and finally resulted in the following setup:

[Figure: final channel assignment after two weeks of the advanced algorithm]

Looks familiar – the good old 1-5-9-13, doesn’t it? Note that this time the APs are using every single available channel. Of course, the majority still sticks to the minimally overlapping channels, albeit with a reduced channel separation of 4.
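A quick back-of-the-envelope check shows why a separation of 4 channels is the densest “minimally overlapping” spacing in 2.4 GHz (assuming the nominal 20 MHz channel width; real spectral masks are wider):

```python
# Sketch: main-lobe overlap between 2.4 GHz channels of nominal 20 MHz width.
# Channel centers are 5 MHz apart (ch 1 = 2412 MHz ... ch 13 = 2472 MHz).

def center_mhz(ch):
    return 2407 + 5 * ch

def overlap_mhz(ch_a, ch_b, width=20):
    """Spectral overlap (MHz) of two nominally `width`-MHz-wide channels."""
    return max(0, width - abs(center_mhz(ch_a) - center_mhz(ch_b)))

print(overlap_mhz(1, 6))   # 0  -> 1/6/11: 25 MHz apart, fully clear
print(overlap_mhz(1, 5))   # 0  -> 1/5/9/13: 20 MHz apart, edges just touch
print(overlap_mhz(1, 3))   # 10 -> 2 channels apart: heavy overlap
```

So 1-5-9-13 buys a fourth “clean” channel at the price of touching channel edges, which is exactly the trade-off the optimized plan exploits.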

But did this actually improve the network performance?

To verify this, the researchers analysed three network performance counters on the APs:

  • FCSErrorCount – frames that failed the FCS check. These were successfully transmitted but not successfully received (measured on the AP, so this is uplink). In a congested network this mostly happens due to interference or a late collision. Each of these errors may lead to a failure at a higher layer and trigger even more retransmission (of a TCP segment, for example), greatly increasing the network overhead.
  • FailedCount – the number of failed transmissions (on the AP, so this is downlink), including frames that failed even after multiple retransmits. It correlates well with how congested a network is when multiple devices contend for the medium and ACKs are lost. Again, these failures may lead to errors/retransmits at higher layers of the networking stack.
  • MultipleRetryCount – frames that were successfully transmitted after several attempts. It is a good complement to FailedCount: the frame was sent eventually, so error escalation to the higher layers of the networking stack was avoided.
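The daily comparison against the baseline boils down to a per-counter relative change. A sketch with invented counter values (the numbers here are made up purely for illustration; the real averaged results follow below):

```python
# Sketch: per-counter relative change vs. the baseline period.
# Counter names follow the 802.11 MIB; the values are invented.

def relative_change(baseline, testing):
    """Percentage change of each counter relative to the baseline period."""
    return {k: round(100.0 * (testing[k] - baseline[k]) / baseline[k], 1)
            for k in baseline}

baseline = {"FCSErrorCount": 1000, "FailedCount": 500, "MultipleRetryCount": 200}
testing  = {"FCSErrorCount": 590,  "FailedCount": 130, "MultipleRetryCount": 252}

print(relative_change(baseline, testing))
# {'FCSErrorCount': -41.0, 'FailedCount': -74.0, 'MultipleRetryCount': 26.0}
```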

Each day the researchers compared the data with the baseline (assuming that user behaviour and traffic patterns differed insignificantly between the baselining and testing periods). Here is the final averaged result, along with the relative per-day graphs.

[Table: averaged change in the MAC counters vs. the baseline]

[Figure: relative per-day change in the MAC counters]

As you can see, the improvement is quite impressive:

74% fewer Tx errors on average (ranging between -41% and -94% on different days), including a 26% increase in successes after multiple retransmits. Even the lowest improvement (-41%) means that the default algorithm causes 1.6 times more packet loss (3.8 times more compared to the average).
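The “times more loss” figures follow directly from the percentage reductions: if the new plan eliminates a fraction r of the errors, the default plan produces 1/(1-r) times as many. A quick check:

```python
# How the percentage reductions translate into "times more loss".

def loss_ratio(reduction):
    """How many times more errors the baseline produces vs. the new plan."""
    return 1.0 / (1.0 - reduction)

print(round(loss_ratio(0.74), 1))  # 3.8 -- the average day
print(round(loss_ratio(0.41), 1))  # 1.7 -- the worst day (close to the 1.6 quoted)
```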

The successes after multiple retries show how many packets did not trigger errors at the upper layers, reducing application errors and network overhead. Eduard provided a good example of this:

I like to explain the relationship between MultipleRetryCount and FailedCount with a story from WWI: after the first steel combat helmets were introduced (e.g. the British Brodie helmet or the French Adrian helmet), physicians observed that the number of soldiers being treated for head wounds had increased.

Surprise? No: without the helmet, many of those soldiers would have added to the number of deaths instead.

So, is the increase in MultipleRetryCount (packets that didn’t succeed on the first transmission) a surprise? No: many of the packets that needed several transmissions would have increased the FailedCount without our interference-minimizing channel selection.

Overall, clients will see this as increased download speeds and lower network/application latency.

41% fewer Rx errors on average (ranging between -64% and +16% on different days). This increases the effective uplink capacity, which also benefits the downlink and latency (since ACKs travel uplink). Note that even on the days when Rx errors increased compared to the baseline, Tx errors were still reduced.

Conclusions:

The experiment proved that a network with cleverly overlapping channels can work significantly better than a network with non-overlapping channels, and can tolerate neighbour interference significantly better as well.

Also, for once, AutoRF did not suck! The experiment proves that it can be done right (I’m not advocating abandoning RF design). And it doesn’t take a 4-CPU, 10500TB-RAM monster to run it: for this experiment the researchers used an old Pentium 4 box, which took less than a minute to calculate the new RF plan for the entire network. Furthermore, they claim that an older (albeit less functional) version of the algorithm ran fine on the “good old” WRT54G [3]! Given that most modern WLAN controllers are multi-CPU x86-based servers (the NX9500, for example, has 2x 6-core Xeons and 36GB of RAM), running something like this should not be a problem.

You can read the full paper here [2].

There’s just one unanswered question left:

What does it have to do with Formula 1? Well, there is a commonality. Can you guess based on the picture below? I’ll reveal the answer in Part 3 (it’s not small either).

For now – post your guesses in the comments!

[1] E. Mengual, E. Garcia-Villegas, R. Vidal, “Channel management in a campus-wide WLAN with partially overlapping channels,” in Proc. IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC): Mobile and Wireless Networks, 2013.

[2] E. Mengual, E. Garcia-Villegas, “Frequency management in a campus-wide Wi-Fi deployment,” 2013. http://upcommons.upc.edu/pfc/handle/2099.1/17298

[3] E. Garcia-Villegas, R. Vidal-Ferré, J. Paradells-Aspas, “Implementation of a Distributed Dynamic Channel Assignment Mechanism for IEEE 802.11 Networks,” in Proc. IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), vol. 3, pp. 1458–1462, 11–14 Sept. 2005.


8 thoughts on “WLAN Channel Management F1 Style: Part 2 of 3”

  1. I was going to say “In both WiFi and F1 you are trying to get the most speed possible while working within a fixed set of limits.” But I don’t see what that has to do with the picture. From the picture I think it must have something to do with steering [users|frequency] to where it will do the most good.


  2. The photo shows the front wing of an F1 car. It ‘directs’ the airflow to generate downforce for the optimal grip needed for the ‘fastest’ lap time. Maybe you are drawing an analogy to TX beamforming.


  3. I’m curious as to where you are going with this. I have been having similar thoughts recently since the old “1,6&11” statement is 15 years old and possibly could be revisited since we have better equipment on both ends of the link.

    I won’t be making any changes to the WLAN controllers just yet, though. 🙂


  4. Did the study measure the actual user performance throughput? If a user cannot process RTS/CTS it will not result in an error, but he will still have to wait for available airtime. In the scenario described above with an abundance of noisy channels, that situation would present itself often and actual performance would suffer, wouldn’t it?


    1. Andre,
      No, the study didn’t include throughput measurements for two reasons:

      1 – One condition for getting permission to conduct the experiments was not to disrupt the normal operation of the network. User-level throughput measurements would have required flooding the network with probe traffic to assess the potential throughput, so throughput was dismissed as a performance metric.

      2 – It is impractical even in a not-so-large venue. To obtain RELIABLE (statistically meaningful) results, a large number of throughput samples would have been required, coming from a large number of locations at different times of the day, repeated over multiple days.

      The alternative is (almost) passive measurements, for example, using delay-related measurements or relative error ratio. Both metrics are good performance indicators and provide a clear picture of the level of contention and interference (if you know the influence of those phenomena on time and error measurements).


  5. Very interesting research, although I think there’s a fly in the ointment. The original approach used channels 1 to 11; the improved approach used channels 1 to 13. As we all know, enlarging the available spectrum has a positive impact on throughput in any case.
    It would have been great to see the results of the improved channel assignment while keeping the constraint of using only channels 1 to 11.
    Apart from that, there are also situations where you don’t want to use channels 12 and 13 – not for regulatory reasons, but to stay compatible with all WiFi devices, like those belonging to guests from countries with limited 2.4GHz spectrum.


  6. I’d be interested to see what happens if you tell the channel planning algorithm that it can use 1, 5, 9 and 13 only. Do we get any benefit from utilising the channels in the middle, over and above the simple 4-channel plan?


  7. Jon and Franz made a good point. The original experiment was somewhat limited for two reasons: one, the network administrators were a little bit hesitant; and two, the firmware of the WLC had known bugs affecting some statistics. That’s why we just proved to them that we could do better than their default configuration (i.e. channels 1, 6, 11 set automatically by the WLC).

    Now that those two issues are solved (we didn’t destroy their network, as they could have feared, and we have their full trust and, also, firmware has been recently updated), we’re in position to repeat and extend our previous study, and one of the tests we already had in mind was to allow the default channel assignment algorithm to use the fantastic four: 1, 5, 9 and 13.

    Should you have any other suggestion, don’t hesitate to contact me directly (eduardg at entel.upc.edu or via LinkedIn) or, with Arsen’s permission, through this blog post.

