from a management frame transmission.
This bug is a bit loopy, so here goes.
The underlying cause is pretty easy to understand - the node isn't
referenced before passing into the callout, so if the node is deleted
before the callout fires, it'll dereference free'd memory.
The code path however is slightly more convoluted.
The functions _say_ mgt_tx - ie management transmit - which is partially
true. Yes, that callback is attached to the mbuf for some management
frames. However, it's only for frames relating to scanning and
authentication attempts. It helpfully drives the VAP state back to
"SCAN" if the transmission fails _OR_ (as I subsequently found out!)
if the transmission succeeds but the state machine doesn't make progress
towards being authenticated and active.
Now, the code itself isn't terribly clear about this.
It _looks_ like it's just handling the transmit failure case.
However, when you look at what goes on in the transmit success case, it's
moving the VAP state back to SCAN if it hasn't changed state since
the time the callback was scheduled. Ie, if it's in ASSOC or AUTH still,
it'll go back to SCAN. But if it has transitioned to the RUN state,
the comparison will fail and it'll not transition things back to the
SCAN state.
So, to fix this, I decided to leave everything the way it is and merely
fix the locking and remove the node reference.
The _better_ fix would be to turn this callout into a "assoc/auth request"
timeout callback and make the callout locked, thus eliminating all races.
However, until all the drivers have been fixed so that transmit completions
occur outside of any locking that's going on, it's going to be impossible
to do this without introducing LORs. So, I leave some of the evilness
in there.
Tested:
* AR5212, ath(4), STA mode
* 5100 and 4965 wifi, iwn(4), STA mode