If the stored configurations for an SSID have changed, we can no longer
trust the current blacklist state of that SSID, since the updated
configs could change the behavior of the network. E.g., the BSS could be
blacklisted due to a bad password, and the config could be updated to
store the correct password. In this case, keeping the BSS in the
blacklist will prevent the user from connecting to the BSS after the
correct password has been updated.
Add the value was_changed_recently to the wpa_ssid struct. Update this
value every time a config is changed through wpa_set_config(). Check
this value in wpa_blacklist_get() to clear the blacklist whenever the
configs of current_ssid have changed.
This solution was chosen over simply clearing the blacklist whenever
configs change because the user should be able to change configs on an
inactive SSID without affecting the blacklist for the currently active
SSID. This way, the blacklist won't be cleared until the user attempts
to connect to the inactive network again. Furthermore, the blacklist is
stored per-BSSID while configs are stored per-SSID, so we don't have the
option to just clear out certain blacklist entries that would be
affected by the configs.
Finally, the function wpa_supplicant_reload_configuration() causes the
configs to be reloaded from scratch, so after a call to this function
all bets are off as to the relevance of our current blacklist state.
Thus, we clear the entire blacklist within this function.
Signed-off-by: Kevin Lund <kglund@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
This change adds the function wpa_blacklist_update(), which goes through
all blacklist entries and deletes them if their blacklist expired over
an hour ago. The purpose of this is to remove stale entries from the
blacklist which likely do not reflect the current condition of device's
network surroundings. This function is called whenever the blacklist is
polled, meaning that the caller always gets an up-to-date reflection of
the blacklist.
Another solution to clearing the blacklist that was considered was
to slowly reduce the counts of blacklist entries over time, and delete
them if the counts dropped below 0. We decided to go with the current
solution instead because an AP's "problematic" status is really a binary
thing: either the AP is no longer problematic, or it's still causing us
problems. So if we see any more problems within a reasonable amount of
time, it makes sense to just keep the blacklist where it was since the
AP is likely still undergoing the same issue. If we go a significant
amount of time (semi-arbitrarily chosen as 1 hour) without any issues
with an AP, it's reasonable to behave as if the AP is no longer
undergoing the same issue. If we see more problems at a later time, we
can start the blacklisting process fresh again, treating this as a brand
new issue.
Signed-off-by: Kevin Lund <kglund@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
wpa_supplicant keeps a blacklist of BSSs in order to prevent repeated
associations to problematic APs*. Currently, this blacklist is
completely cleared whenever we successfully connect to any AP. This
causes problematic behavior when in the presence of both a bad AP and
a good AP. The device can repeatedly attempt to roam to the bad AP
because it is clearing the blacklist every time it connects to the good
AP. This results in the connection constantly ping-ponging between the
APs, leaving the user stuck without connection.
Instead of clearing the blacklist, implement timeout functionality which
allows association attempts to blacklisted APs after some time has
passed. Each time a BSS would be added to the blacklist, increase the
duration of this timeout exponentially, up to a cap of 1800 seconds.
This means that the device will no longer be able to immediately attempt
to roam back to a bad AP whenever it successfully connects to any other
AP.
Other details:
The algorithm for building up the blacklist count and timeout duration
on a given AP has been designed to be minimally obtrusive. Starting with
a fresh blacklist, the device may attempt to connect to a problematic AP
no more than 6 times in any ~45 minute period. Once an AP has reached a
blacklist count >= 6, the device may attempt to connect to it no more
than once every 30 minutes. The goal of these limits is to find an
ideal balance between minimizing connection attempts to bad APs while
still trying them out occasionally to see if the problems have stopped.
The only exception to the above limits is that the blacklist is still
completely cleared whenever there are no APs available in a scan. This
means that if all nearby APs have been blacklisted, all APs will be
completely exonerated regardless of their blacklist counts or how close
their blacklist entries are to expiring. When all nearby APs have been
blacklisted we know that every nearby AP is in some way problematic.
Once we know that every AP is causing problems, it doesn't really make
sense to sort them beyond that because the blacklist count and timeout
duration don't necessarily reflect the degree to which an AP is
problematic (i.e. they can be manipulated by external factors such as
the user physically moving around). Instead, its best to restart the
blacklist and let the normal roaming algorithm take over to maximize
our chance of getting the best possible connection quality.
As stated above, the time-based blacklisting algorithm is designed to
be minimally obtrusive to user experience, so occasionally restarting
the process is not too impactful on the user.
*problematic AP: rejects new clients, frequently de-auths clients, very
poor connection quality, etc.
Signed-off-by: Kevin Lund <kglund@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Within wpas_connection_failed(), the 'count' value of wpa_blacklist is
erroneously used as a tally of the number times the device has failed
to associate to a given BSSID without making a successful connection.
This is not accurate because there are a variety of ways a BSS can be
added to the blacklist beyond failed association such as interference
or deauthentication. This 'count' is lost whenever the blacklist is
cleared, so the wpa_supplicant stores an additional value
'extra_blacklist_count' which helps persist the 'count' through clears.
These count values are used to determine how long to wait to rescan
after a failed connection attempt.
While this logic was already slightly wrong, it would have been
completely broken by the upcoming change which adds time-based
blacklisting functionality. With the upcoming change, 'count' values
are not cleared on association, and thus do not necessarily even
approximate the "consecutive connection failures" which they were being
used for.
This change seeks to remove this unnecessary overloading of the
blacklist 'count' by directly tracking consecutive connection failures
within the wpa_supplicant struct, independent of the blacklist. This new
'consecutive_conn_failures' is iterated with every connection failure
and cleared when any successful connection is made. This change also
removes the now unused 'extra_blacklist_count' value.
Signed-off-by: Kevin Lund <kglund@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
wpas_connection_failed() uses the blacklist count to figure out a
suitable time to wait for the next scan. This mechanism did not work
properly in cases where the temporary blacklist gets cleared due to no
other BSSes being available. Address this by maintaining an additional
count of blacklisting values over wpa_blacklist_clear() calls. In
addition, add one more step in the count to timeout mapping to go to 10
second interval if more than four failures are seen.
Signed-hostap: Jouni Malinen <j@w1.fi>
There were various issues in how the SME (i.e., nl80211-based driver
interface) handled various authentication and association timeouts and
failures. Authentication failure was not handled at all (wpa_supplicant
just stopped trying to connect completely), authentication timeout
resulted in blacklisting not working in the expected way (i.e., the same
BSS could be selected continuously), and association cases had similar
problems.
Use a common function to handle all these cases and fix the blacklist
operation. Use smaller delay before trying to scan again during the
initial cycle through the available APs to speed up connection. Add
a special case for another-BSS-in-the-same-ESS being present to
speed up recovery from networks with multiple APs doing load balancing
in various odd ways that are deployed out there.