I was recently asked about a product that requires 90Mbps (megabits per second) of throughput. With a Gig link, no problem. If the link is 100Mbps, technically it could work, but that is beyond my engineering safety factor of 20%: the most I would want to see on a link is 80% of the line rate, or 80Mbps here. But if we put in all Gig links, it should be fine.
Then they broke the news that what they really want is to be able to go wireless, too. That's a lot of throughput for wireless, even with new devices that claim super speeds. Try to find the asterisk note at the bottom of the page (not singling out Linksys for any particular reason; all the commodity product vendors do similar things):
*The standard transmission rates – 1000 Mbps or 2166 Mbps (for 5 GHz), 1000 Mbps (for 2.4 GHz), 54 Mbps, and 11 Mbps – are the physical data rates. Actual data throughput will be lower and may depend on the mix of wireless products used and external factors.
So you don't really get 5.3Gbps… these are devices that have three radios:

Tri-Band (5 GHz + 5 GHz + 2.4 GHz)

And the speeds listed are datarates; the headline number simply sums the maximum datarate of each radio (1000 + 2166 + 2166 = 5332, marketed as 5400):

Wi-Fi Speed: AC5400 (N1000 + AC2166 + AC2166)
I ask: what type of WiFi system do you have available? It turns out the product will have an 802.11abgn 1x1:1 radio that supports 20MHz channels and SGI (short guard interval). So now we know some capabilities. From http://mcsindex.com/, we know the maximum datarate this device can support is 72.2Mbps.
The description tells us:
- 802.11a means the device can utilize the 5GHz band. Exactly how many channels can be used depends on a number of factors: regulatory domain (i.e. the US uses FCC rules, Europe has ETSI, etc.), whether DFS channels are supported, and so on. So this is some information, but not the complete picture. See https://en.wikipedia.org/wiki/List_of_WLAN_channels for more details.
- 802.11bg means the device works on 2.4GHz. There is less room for interpretation here, but it still depends on the regulatory domain. For example, the FCC provides for channels 1-11, of which three are practically usable simultaneously in an infrastructure deployment (the so-called three-channel plan: channels 1/6/11). Other regions get additional 2.4GHz channels.
- 802.11n means the device uses enhancements to both the 2.4 and 5GHz bands for performance improvement. Note that n is not a band, but a series of performance-enhancing features. For practical reasons, those enhancements do more for 5GHz channels than 2.4GHz, but even 2.4GHz operation benefits. To see the specific performance capabilities of n, look at the MCS Index table for the HT (high throughput) Index values. There are many options and different levels of support, so just saying "n support" does not say much about the maximum datarate capability of a device. In this case, the device supports 1 spatial stream, 20MHz channel bandwidth, SGI=400ns (so short guard interval), and a maximum MCS Index of 7, so the maximum datarate is 72.2Mbps. One popular device used in the WiFi diagnostic industry claims "802.11 a/b/g/n packet injection at all rates" yet does not support SGI, so how could it support all rates?
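As a sanity check on that 72.2Mbps figure: the MCS Index values fall out of simple OFDM arithmetic. A minimal Python sketch, assuming the standard HT 20MHz parameters (52 data subcarriers; a symbol lasts 3.6µs with SGI, 4.0µs with the long guard interval):

# Datarate arithmetic behind the HT rows of the MCS Index table.
def ht20_rate_mbps(bits_per_subcarrier, coding_rate, n_streams, sgi=True):
    symbol_us = 3.6 if sgi else 4.0          # SGI=400ns shortens the symbol
    # 52 data subcarriers per stream; the coding rate strips FEC overhead
    return 52 * bits_per_subcarrier * coding_rate * n_streams / symbol_us

# MCS 7 is 64-QAM (6 bits per subcarrier) with rate-5/6 coding:
print(ht20_rate_mbps(6, 5 / 6, 1))   # 72.2  -> the 1x1:1 device above
print(ht20_rate_mbps(6, 5 / 6, 2))   # 144.4 -> two spatial streams (MCS 15)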
So inevitably, after showing the MCS Index table, the gap to close is 90Mbps required vs. 72.2Mbps capability. But not so fast… who actually gets 72.2Mbps?

"But you just showed us the table and told us we can do 72.2!" they say.
Not really. There are liars, damn liars, and 802.11 WiFi engineers. No one gets real throughput that matches the datarate. Let's be clear on the choice of words: datarate is the rate at which data can be transferred. Throughput is the actual amount of data that is transferred. Make no mistake, users care about throughput:

"How fast can I download the file?"
"The Internet seems slow today."
"Why is my network game so slow?"
"This Youtube video keeps stopping!"
Of course there is some relation: datarate is the upper bound of throughput, and the confusion stems from this fact; in the wired world, throughput generally equals datarate.
Let's try iperf between two hosts, wired, with Gig links, on an all-switched network:
user@host:~$ iperf3 -c 192.168.20.21 -f m -i 1 -t 10
Connecting to host 192.168.20.21, port 5201
[  4] local 192.168.20.14 port 41333 connected to 192.168.20.21 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   113 MBytes   944 Mbits/sec
...
[  4]   9.00-10.00  sec   113 MBytes   949 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.10 GBytes   949 Mbits/sec  sender
[  4]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  receiver
So throughput here is about 950Mbps on a link speed of 1000Mbps, about 95% of line rate, where line rate is the datarate. In this case, throughput is essentially the same as datarate. There is still some overhead when using TCP, so we never reach 100% for actual data throughput: the link may be saturated, but due to TCP/IP overhead, not all of it is available for data. For full-size frames with an assumed MTU of 1500 bytes (so a max frame size of 1514 bytes, which is MTU + Ethernet header):
TCP/IP Frame Component    Size [bytes]
Ethernet                  14
IP (no options)           20
TCP (no options)          20
Data (TCP MSS)            1460
We can calculate the efficiency from these numbers. However, it's not that important for this topic: 95% of line rate is good enough for our needs, so at this point we assume throughput = datarate.
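For the curious, here is that efficiency calculation as a short Python sketch; it adds the fixed on-the-wire Ethernet costs (preamble, FCS, inter-frame gap) that the table above omits:

# Best-case TCP-over-GigE efficiency for full-size frames.
mss = 1460                       # TCP payload per frame (from the table)
headers = 14 + 20 + 20           # Ethernet + IP + TCP headers
preamble, fcs, ifg = 8, 4, 12    # fixed per-frame costs on the wire
wire_bytes = mss + headers + preamble + fcs + ifg   # 1538 bytes per frame
print(f"efficiency = {mss / wire_bytes:.1%}")       # ~94.9%

About 94.9% of 1Gbps is roughly 949Mbps, which lines up nicely with the iperf results above.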
However, this is not always the case in the wired world. It's very common, but sometimes we run into products where the bottleneck is upstream of the network interface, so the maximum throughput can be less than the datarate. For example, take an iMX6 maker board (there are several manufacturers). The errata for the CPU (https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf) includes a note regarding maximum throughput:
ERR004512
Description: The theoretical maximum performance of 1 Gbps ENET is limited to 470 Mbps (total for Tx and Rx). The actual measured performance in an optimized environment is up to 400 Mbps.
And indeed, with flow control enabled on the infrastructure switch (needed to even attain these speeds), we can measure the forward and reverse directions:
user@host:~$ iperf -c 192.168.20.21 -f m -i 1 -t 10
------------------------------------------------------------
Client connecting to 192.168.20.21, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.20.224 port 35568 connected with 192.168.20.21 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  46.8 MBytes  392 Mbits/sec
...
[  3]  0.0-10.0 sec   501 MBytes  420 Mbits/sec

user@host:~$ iperf -c 192.168.20.21 -f m -i 1 -t 10 -R
------------------------------------------------------------
Client connecting to 192.168.20.21, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.20.224 port 35569 connected with 192.168.20.21 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  47.8 MBytes  401 Mbits/sec
...
[  3]  0.0-10.0 sec   504 MBytes  423 Mbits/sec
So we see in this case that we have a link speed of 1Gig, so the datarate is 1Gbps, but the maximum throughput is considerably less. We know we do not have a 100Mbps link, or the maximum would be much lower than what is observed. This validates the errata from the chip manufacturer, as expected, and we match the advertised performance. However, this is an edge case. It does come up, probably most notably in the consumer space when adding an Ethernet interface via USB. With USB 1.1 or 2.0, the bus speed can be much less than 1Gbps, so we run into the same problem: the data-flow limitation comes from the USB bus, not the network interface datarate.
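As a quick sanity check on a Linux host, the negotiated USB speed is exposed in sysfs; a small sketch (paths vary by distribution and kernel, so treat this as illustrative):

# Print the negotiated speed of each attached USB device.
import glob

for path in glob.glob("/sys/bus/usb/devices/*/speed"):
    try:
        with open(path) as f:
            speed = f.read().strip()   # Mbps: 12 = USB 1.1, 480 = 2.0, 5000 = 3.0
        print(f"{path}: {speed} Mbps")
    except OSError:
        pass                           # device may disappear between glob and read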
So where does that leave us with WiFi and 802.11? It turns out actual throughput will be MUCH less than sticker (i.e. datarate) for three reasons:
- Sticker performance is always the maximum datarate that is physically possible from the system, under best-case conditions. Best case in this context means a fantastic signal-to-noise ratio (SNR). So basically, if you are in a sealed chamber sitting on top of the access point (AP), this will be the datarate. It's not really that bad, but the maximum requires a very healthy SNR, and the datarate falls as one moves away from the AP. 5GHz signals do not travel as far as 2.4GHz signals, so the datarate drops faster for a given distance change than it would at 2.4GHz.
- Protocol overhead is significant in the 802.11 world, to improve robustness; one of the biggest issues with wireless is pervasive packet loss. In the wired world, especially on LANs, packet loss from the network is minimal (not always true, but usually on a good network). Wireless networks, however, are transient: things come and go that affect the RF environment, so we have to manage this packet loss as well as provide for other signaling capabilities. For example, in general, every unicast frame is ACKed per the 802.11 protocol, which takes time and bandwidth. APs also send control and management frames either on schedule or as needed to maintain the functioning of the network (e.g. beacons, probe responses, RTS/CTS, etc.). All of this extra traffic takes bandwidth away from clients trying to send data.
- Finally, the RF channel used by WiFi is shared. It's inherently half duplex: if one station is communicating, all others must defer; only one gets the network at a time. Obviously, the more stations trying to communicate, the less time available for any one station to transmit data. This is evaluated as channel utilization, which can be a somewhat elusive number. In some ways it is useful to think of channel capacity as time: there is only so much time available to access the channel, and when it is gone, it is gone. So we want to optimize what is done during that time; the more we can transfer in a given period, the more we get out of the limited resource.
So what can we get out of a given WiFi radio in a particular environment? What types of throughput are possible? What is the limiting resource we have to manage?

In evaluating maximum throughput, we can look at our three reasons why we don't get sticker performance and see if we can fix some of them. For those that are left, we can design some experiments to observe and evaluate the behavior.
For SNR, we can fix the test devices in known locations, ensuring a healthy SNR so that communications will typically be at the highest possible datarates. For protocol overhead, there is not much to do: the protocol behaves in a particular way, per the standard, and that is it. We will just try not to make design choices that would exacerbate this effect, and use typical best-practice settings. For instance, beacon intervals are typically right around 100ms; we could manipulate the amount of protocol overhead in a number of ways:
- Decrease the beacon interval, so more beacons are present in a given time period. This also affects the channel utilization, as more frames means higher utilization.
- Decrease the datarate of the control and management frames: for a fixed-size frame, a lower datarate will take more time to transmit. This takes time away from other stations that want access to the shared RF resource.
- Add additional SSIDs, where each SSID sends its own beacons at the given datarate. There are spreadsheets available on the Internet which calculate the overhead due to beacon datarate and number of SSIDs (http://revolutionwifi.blogspot.com/p/ssid-overhead-calculator.html); a rough version of that arithmetic is sketched after this list.
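A rough Python sketch of that beacon-overhead arithmetic (the 300-byte beacon size is an assumed ballpark, and the PHY preamble is ignored, so this understates the true cost):

# Fraction of channel time consumed by beacons alone.
beacon_bytes = 300                 # assumed typical beacon size
beacon_interval_ms = 102.4         # the usual default of 100 TU
for n_ssid, rate_mbps in ((1, 1), (4, 1), (4, 24)):
    airtime_s = beacon_bytes * 8 / (rate_mbps * 1e6)
    beacons_per_s = 1000 / beacon_interval_ms
    overhead = n_ssid * beacons_per_s * airtime_s
    print(f"{n_ssid} SSID(s) at {rate_mbps:>2} Mbps: {overhead:.1%} of channel time")

Four SSIDs beaconing at 1Mbps eat nearly 10% of the channel before any client sends a byte; moving the beacon rate to 24Mbps drops that below half a percent.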
However, if we fix these items to best practice, we can eliminate some factors from our evaluation. So we plan to:
- Leave beacon timing at default
- Use a channel with few other interfering devices
- Use a datarate for management and control frames that is consistent with a high-density design, meaning a high datarate (the physics shows that higher-datarate frames do not travel as far, and for high-density designs we want small, high-performance WiFi cells). On 5GHz, let's choose 24Mbps by setting the lowest mandatory rate to 24. All other 802.11a rates are supported.
So the last factor to evaluate is channel utilization. This is in fact the precious resource that needs to be managed; there is only so much of it, and we really want to optimize the time we have when accessing the resource to maximize throughput.
Thought experiment: in a vacuum, compare the following scenarios:
- Sending 100Mb at a 1Mbps datarate: expect it to take 100sec to transfer the 100Mb (megabits, just so the numbers are easy to calculate). What would channel utilization be?
- Sending 100Mb at a 100Mbps datarate: expect it to take 1sec to transfer the 100Mb. This is obviously better than the 1Mbps case, as it goes much faster. So what would channel utilization be in this case?
For case 1, since the station is transmitting for 100sec, the network is blocked the whole time (ignoring other network effects for simplicity). So the channel utilization will be 100% for this 100sec; there is no time left for other devices to do anything.

For case 2, the utilization only counts while the station is transmitting: for that 1sec it is 100%, then for the other 99sec it is 0%. A weighted average over the 100sec window gives (1 x 100% + 99 x 0%) / 100 = 1%, much lower than case 1.
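The same arithmetic in a few lines of Python:

# Same 100 Mb payload at two datarates, observed over a 100 s window.
payload_mbit = 100
window_s = 100
for rate_mbps in (1, 100):
    busy_s = payload_mbit / rate_mbps   # time the channel is occupied
    avg_util = busy_s / window_s        # weighted average over the window
    print(f"{rate_mbps:>3} Mbps: busy {busy_s:.0f} s, average utilization {avg_util:.0%}")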
For low throughput requirements, I often hear: "we don't need 802.11n or 802.11ac; we don't move a lot of traffic". It's not really true; the higher-datarate modulations available in 802.11n/ac make more efficient use of the channel. Though the throughput requirement may be low, a high datarate lets a station access the channel, transmit, and release the channel faster, leaving more time for others to access the channel and do useful work.
A couple of points:
- It's tough to get 100% channel utilization; the maximum usable appears to be about 85% under heavy load. Higher values usually indicate some type of problem, and utilization this high will starve out stations, so operating even close to here is problematic.
- It would be nice to trend channel utilization over time on all channels to see what is happening. In the event of an issue, we could look here first to see if heavy utilization correlates with the problem.
For a simple test, we can use the beacons themselves to report channel utilization, if the QBSS Load element is available.
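As an illustrative sketch (assuming scapy and a monitor-mode interface named wlan0mon, both assumptions on my part), the QBSS Load element (ID 11) carries a station count and a channel utilization byte scaled 0-255:

# Sniff beacons and print channel utilization from the QBSS Load element.
from scapy.all import sniff, Dot11Beacon, Dot11Elt

def show_qbss(pkt):
    if not pkt.haslayer(Dot11Beacon):
        return
    elt = pkt.getlayer(Dot11Elt)
    while elt is not None:
        if elt.ID == 11 and len(elt.info) >= 3:   # QBSS Load element
            stations = int.from_bytes(elt.info[0:2], "little")
            util = elt.info[2] / 255               # fraction of busy channel time
            print(f"{pkt.addr2}: stations={stations}, utilization={util:.0%}")
        elt = elt.payload.getlayer(Dot11Elt)

sniff(iface="wlan0mon", prn=show_qbss, timeout=30)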
Experiment: how do actual throughput and channel utilization change as we vary datarate? We can vary the datarate by changing the supported MCS Index values at the AP, since WiFi client and AP communication is a sort of negotiation (both sides announce what they support, and the highest common parameters are usually chosen). We can use a test laptop with an Intel 7265 WiFi radio; this is an 802.11abgn-ac 2x2:2 adapter, and we will work on a 5GHz/20MHz channel. The chipset will actually do 40 and 80MHz bandwidth, but for testing we limit it to 20MHz so we can easily see the effect on channel utilization on a single channel. Per the MCS Index table, 144.4Mbps should be the maximum for this setup with 802.11ac capability disabled (two spatial streams at MCS 15: 2 x 72.2Mbps).
The wireless client is the iperf3 test client (so the TCP client, sending data), and we report the receiver values (sender and receiver do differ slightly, perhaps due to packet loss):
admin@kali:~$ iperf3 -c 192.168.30.21 -f m -i 1 -t 10
Connecting to host 192.168.30.21, port 5201
[  5] local 192.168.30.157 port 38148 connected to 192.168.30.21 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  5.63 MBytes  47.2 Mbits/sec    1    269 KBytes
[  5]   1.00-2.00   sec  13.7 MBytes   115 Mbits/sec    0    570 KBytes
[  5]   2.00-3.00   sec  13.5 MBytes   114 Mbits/sec    0    706 KBytes
[  5]   3.00-4.00   sec  13.8 MBytes   115 Mbits/sec    0    706 KBytes
[  5]   4.00-5.00   sec  12.5 MBytes   105 Mbits/sec    0    748 KBytes
[  5]   5.00-6.00   sec  12.5 MBytes   105 Mbits/sec    0    783 KBytes
[  5]   6.00-7.00   sec  5.00 MBytes  41.9 Mbits/sec   41    256 KBytes
[  5]   7.00-8.00   sec  12.5 MBytes   105 Mbits/sec    0    554 KBytes
[  5]   8.00-9.00   sec  12.5 MBytes   105 Mbits/sec    0    621 KBytes
[  5]   9.00-10.00  sec  13.8 MBytes   115 Mbits/sec    0    738 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   115 MBytes  96.8 Mbits/sec   42   sender
[  5]   0.00-10.04  sec   112 MBytes  93.3 Mbits/sec        receiver
To get at RSSI, we want to look at both sides, since frames must go back and forth. If the AP power is much higher than the wireless client's, it's entirely possible for the client to continue to hear the AP while the AP cannot hear the client, which causes roaming problems, as roaming is typically a client decision. Buying a more powerful AP for better coverage is not always the answer; turning down the AP power to balance with the clients is often the better design decision. From the client, we can get the RSSI signal strength of the AP:
admin@kali:~$ iwconfig
wlan0     IEEE 802.11  ESSID:"LWAPV6"
          Mode:Managed  Frequency:5.825 GHz  Access Point: A0:E0:AF:4E:B0:6F
          Bit Rate=144.4 Mb/s   Tx-Power=22 dBm
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Power Management:on
          Link Quality=66/70  Signal level=-44 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:1   Missed beacon:0
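To watch that value over time, a tiny polling sketch (assuming the Linux wireless extensions shown above and an interface named wlan0):

# Poll the client-side RSSI once per second by scraping iwconfig output.
import re, subprocess, time

for _ in range(10):
    out = subprocess.run(["iwconfig", "wlan0"], capture_output=True, text=True).stdout
    m = re.search(r"Signal level=(-?\d+) dBm", out)
    if m:
        print(f"RSSI: {m.group(1)} dBm")
    time.sleep(1)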
The Cisco AP will give us the signal strength of the wireless client from the AP's point of view. It's also common for AP power to be greater than wireless client power, so the client RSSI as measured at the AP will likely be the lower of the two values; however, there is no guarantee of this.
The Intel client will use the highest datarate available, legacy or HT, not necessarily the rate of the highest MCS Index. Therefore, to configure the test, let's set the following:
For a typical high density deployment, I would likely choose different values here.
Results
To interpret: as the MCS Index increases, so does the actual throughput (red squares). Indices 8-11 show some abnormality, explained below. Notice, though, that the channel utilization is not at all sensitive to the actual throughput or the datarate in use; it hovers just over 90% for all tests, regardless of the datarates in use.
To explain the issue with indices 8-11: the maximum projected datarate actually falls. Reference the MCS Index table; indices 0-7 for HT (i.e. 802.11n) are for a single stream, while indices 8-15 are two spatial streams. Note that Index 8 has a maximum datarate of 14.4Mbps, well below the Index 7 value of 72.2Mbps. What is observed via an OTA capture is that the client (in this case an Intel 7265) seems to choose the highest datarate available, not necessarily the highest MCS Index. So until the two-stream datarate exceeds the best single-stream datarate, the maximum single-stream rate of 72.2Mbps is selected for transmission. Note that this behavior could be very chipset- and version-dependent; other systems could behave very differently. It appears that the Cisco AP utilizes the maximum MCS Index as part of its rate selection algorithm. Since the bulk data transfer is upstream, from wireless client to wired server, the maximum throughput is heavily dependent on the rate selection of the client, in this case the Intel chipset. Perhaps this explains the slight increase in throughput as the MCS Index increases from 8 to 12: the client is fixed at its Tx datarate (the MCS 7 single-stream datarate of 72.2Mbps) while the AP increases its datarate.

The last datapoint is with 802.11ac rates enabled; in this case, the rate is 2SS (spatial stream) VHT Index 8, for a datarate of 173.3Mbps.
Observe also that actual iperf throughput (in TCP mode) is always below the datarate, due to the extensive amount of control and management traffic, which is sent at relatively low basic rates; this makes it difficult to match actual throughput to datarate. Note also that this implies a few things when trying to capture traffic: it's very unlikely that you will miss a client entirely. If you see nothing from a client at all, it's probably not a modulation mismatch, but more likely that the capture is on the wrong channel, or the adapter does not support promiscuous mode, or SGI, etc. The ACKs/Block ACKs are sent at low datarates, so they will be picked up even if the data frames (usually QoS-Data) are missed due to too high a modulation; evidence of the MAC address will still exist in the form of control traffic.
So we can get more real throughput with higher datarates, but the channel utilization stays the same: we consume channel time (the precious resource to conserve) when transmitting, so the higher the datarates in use, the more real throughput is available for a given amount of channel utilization.
Some ideas going forward:
- Datarate is usually not artificially limited by configuration as it was in this test. What really happens is that SNR is reduced by distance or obstructions in the RF path, and this causes a reduction in throughput as well.
- It would be nice to get a rollup of channel utilization on all channels as part of a comprehensive data collection initiative, the kind we might get from a typical network manager. Graphs for all channels would be useful.