There is a complex race in in_pcblookup_hash() and in_pcblookup_group().

Both functions need to obtain lock on the found PCB, and they can't do
classic inter-lock with the PCB hash lock, due to lock order reversal.
To keep the PCB stable, these functions put a reference on it and after PCB
lock is acquired drop it. If the reference was the last one, this means
we've raced with in_pcbfree() and the PCB is no longer valid.

  This approach works okay only if we are acquiring writer-lock on the PCB.
In case of reader-lock, the following scenario can happen:

  - 2 threads locate pcb, and do in_pcbref() on it.
  - These 2 threads drop the inp hash lock.
  - Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock,
    does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which
    doesn't free the pcb due to two references on it. Then it unlocks the pcb.
  - 2 aforementioned threads acquire reader lock on the pcb and run
    in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues,
    second gets 0 and considers pcb freed, returns.
  - The thread that got 1 continutes working with detached pcb, which later
    leads to panic in the underlying protocol level.

  To plumb that problem an additional INPCB flag introduced - INP_FREED. We
check for that flag in the in_pcbrele_rlocked() and if it is set, we pretend
that that was the last reference.

Discussed with:		rwatson, jhb
Reported by:		Vladimir Medvedkin <medved rambler-co.ru>
This commit is contained in:
Gleb Smirnoff 2012-10-02 12:03:02 +00:00
parent 6320afb5c5
commit df4e91d386
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=241129
2 changed files with 12 additions and 1 deletions

View File

@ -1105,8 +1105,17 @@ in_pcbrele_rlocked(struct inpcb *inp)
INP_RLOCK_ASSERT(inp);
if (refcount_release(&inp->inp_refcount) == 0)
if (refcount_release(&inp->inp_refcount) == 0) {
/*
* If the inpcb has been freed, let the caller know, even if
* this isn't the last reference.
*/
if (inp->inp_flags2 & INP_FREED) {
INP_RUNLOCK(inp);
return (1);
}
return (0);
}
KASSERT(inp->inp_socket == NULL, ("%s: inp_socket != NULL", __func__));
@ -1186,6 +1195,7 @@ in_pcbfree(struct inpcb *inp)
inp_freemoptions(inp->inp_moptions);
#endif
inp->inp_vflag = 0;
inp->inp_flags2 |= INP_FREED;
crfree(inp->inp_cred);
#ifdef MAC
mac_inpcb_destroy(inp);

View File

@ -542,6 +542,7 @@ void inp_4tuple_get(struct inpcb *inp, uint32_t *laddr, uint16_t *lp,
#define INP_RT_VALID 0x00000002 /* cached rtentry is valid */
#define INP_PCBGROUPWILD 0x00000004 /* in pcbgroup wildcard list */
#define INP_REUSEPORT 0x00000008 /* SO_REUSEPORT option is set */
#define INP_FREED 0x00000010 /* inp itself is not valid */
/*
* Flags passed to in_pcblookup*() functions.