Handle a rare edge case with nearly full TCP receive buffers.  If a TCP
buffer fills up causing the remote sender to enter into persist mode, but
there is still room available in the receive buffer when a window probe
arrives (either due to window scaling, or due to the local application
very slowly draining data from the receive buffer), then the single byte
of data in the window probe is accepted.  However, this can cause rcv_nxt
to be greater than rcv_adv.  This condition will only last until the next
ACK packet is pushed out via tcp_output(), and since the previous ACK
advertised a zero window, the ACK should be pushed out while the TCP
pcb is write-locked.

During the window while rcv_nxt is greater than rcv_adv, a few places
would compute the remaining receive window via rcv_adv - rcv_nxt.
However, this value was then (uint32_t)-1.  On a 64-bit machine this
could expand to a positive 2^32 - 1 when cast to a long.  In particular,
when calculating the receive window in tcp_output(), the result would be
that the receive window was computed as 2^32 - 1 resulting in advertising
a far larger window to the remote peer than actually existed.
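
The widening problem described above can be reproduced in isolation.  The
following is a minimal sketch and is not part of this change: the helper
names window_widened() and window_signed() are hypothetical, and int64_t
stands in for a 64-bit long so that the result does not depend on the
platform.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;	/* TCP sequence numbers are 32-bit unsigned */

/*
 * Pre-fix pattern: the unsigned difference is widened directly.
 * When rcv_nxt is one past rcv_adv, the 32-bit subtraction wraps to
 * 0xffffffff, and widening keeps that large positive value.
 */
static int64_t
window_widened(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return ((int64_t)(tcp_seq)(rcv_adv - rcv_nxt));
}

/*
 * Narrowing to a signed 32-bit value first preserves the sign, which
 * is what an (int) cast at a call site achieves.
 */
static int64_t
window_signed(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return ((int64_t)(int32_t)(rcv_adv - rcv_nxt));
}
```

With rcv_adv = 1000 and rcv_nxt = 1001, window_widened() yields 2^32 - 1
while window_signed() yields -1.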

Fix various places that compute the remaining receive window to either
assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
window as full if rcv_nxt is greater than rcv_adv.
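
The "treat the window as full" variant can be sketched as a standalone
helper.  This is an illustration under stated assumptions rather than the
committed code: remaining_window() is a hypothetical name, SEQ_GT() is
redefined locally in the style of the kernel macro, and int64_t again stands
in for a 64-bit long.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Local stand-in for the kernel's wraparound-safe "a > b" comparison. */
#define	SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

/*
 * Return the part of the advertised window that is still unused.
 * If rcv_nxt has moved past rcv_adv, report zero instead of letting
 * the unsigned subtraction wrap.
 */
static int64_t
remaining_window(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	if (SEQ_GT(rcv_adv, rcv_nxt))
		return ((int64_t)(tcp_seq)(rcv_adv - rcv_nxt));
	return (0);
}
```

The signed comparison keeps the check correct across sequence number
wraparound: with rcv_adv = 5 and rcv_nxt = 0xfffffff0 the helper returns 21,
and with rcv_adv = 1000 and rcv_nxt = 1001 it returns 0.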

Reviewed by:	bz
MFC after:	1 month
John Baldwin 2011-05-02 21:05:52 +00:00
parent 3b0f406639
commit f701e30d7f
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=221346
3 changed files with 22 additions and 6 deletions

@@ -1831,6 +1831,9 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 	win = sbspace(&so->so_rcv);
 	if (win < 0)
 		win = 0;
+	KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
+	    ("tcp_input negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
+	    tp->rcv_adv, tp->rcv_nxt));
 	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
 	/* Reset receive buffer auto scaling when not in bulk receive mode. */
@@ -2868,7 +2871,10 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
 	 * buffer size.
 	 * XXX: Unused.
 	 */
-	len = so->so_rcv.sb_hiwat - (tp->rcv_adv - tp->rcv_nxt);
+	if (SEQ_GT(tp->rcv_adv, tp->rcv_nxt))
+		len = so->so_rcv.sb_hiwat - (tp->rcv_adv - tp->rcv_nxt);
+	else
+		len = so->so_rcv.sb_hiwat;
 #endif
 } else {
 	m_freem(m);

@@ -561,15 +561,21 @@ tcp_output(struct tcpcb *tp)
 	 * taking into account that we are limited by
 	 * TCP_MAXWIN << tp->rcv_scale.
 	 */
-	long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
-	    (tp->rcv_adv - tp->rcv_nxt);
+	long adv;
+	int oldwin;
+	adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale);
+	if (SEQ_GT(tp->rcv_adv, tp->rcv_nxt)) {
+		oldwin = (tp->rcv_adv - tp->rcv_nxt);
+		adv -= oldwin;
+	} else
+		oldwin = 0;
 	/*
 	 * If the new window size ends up being the same as the old
 	 * size when it is scaled, then don't force a window update.
 	 */
-	if ((tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale ==
-	    (adv + tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale)
+	if (oldwin >> tp->rcv_scale == (adv + oldwin) >> tp->rcv_scale)
 		goto dontupdate;
 	if (adv >= (long) (2 * tp->t_maxseg))
 		goto send;
@@ -1008,7 +1014,8 @@ tcp_output(struct tcpcb *tp)
 	if (recwin < (long)(so->so_rcv.sb_hiwat / 4) &&
 	    recwin < (long)tp->t_maxseg)
 		recwin = 0;
-	if (recwin < (long)(tp->rcv_adv - tp->rcv_nxt))
+	if (SEQ_GT(tp->rcv_adv, tp->rcv_nxt) &&
+	    recwin < (long)(tp->rcv_adv - tp->rcv_nxt))
 		recwin = (long)(tp->rcv_adv - tp->rcv_nxt);
 	if (recwin > (long)TCP_MAXWIN << tp->rcv_scale)
 		recwin = (long)TCP_MAXWIN << tp->rcv_scale;

@@ -242,6 +242,9 @@ tcp_twstart(struct tcpcb *tp)
 	/*
 	 * Recover last window size sent.
 	 */
+	KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
+	    ("tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
+	    tp->rcv_adv, tp->rcv_nxt));
 	tw->last_win = (tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale;
 	/*