FreeBSD-SA-14:19.tcp raised attention to the state of our stack

towards blind SYN/RST spoofed attack.

Originally our stack used in-window checks for incoming SYN/RST
as proposed by RFC793. Later, circa 2003 the RST attack was
mitigated using the technique described in P. Watson
"Slipping in the window" paper [1].

After that, the checks were only relaxed for the sake of
compatibility with some buggy TCP stacks. First, r192912
introduced the vulnerability, just fixed by aforementioned SA.
Second, r167310 had slightly relaxed the default RST checks,
instead of utilizing net.inet.tcp.insecure_rst sysctl.

In 2010 a new technique for mitigation of these attacks was
proposed in RFC5961 [2]. The idea is to send a "challenge ACK"
packet to the peer, to verify that packet arrived isn't spoofed.
If peer receives challenge ACK it should regenerate its RST or
SYN with correct sequence number. This should not only protect
against attacks, but also improve communication with broken
stacks, so authors of reverted r167310 and r192912 won't be
disappointed.

[1] http://bandwidthco.com/whitepapers/netforensics/tcpip/TCP Reset Attacks.pdf
[2] http://www.rfc-editor.org/rfc/rfc5961.txt

Changes made:

o Revert r167310.
o Implement "challenge ACK" protection as specificed in RFC5961
  against RST attack. On by default.
  - Carefully preserve r138098, which handles empty window edge
    case, not described by the RFC.
  - Update net.inet.tcp.insecure_rst description.
o Implement "challenge ACK" protection as specificed in RFC5961
  against SYN attack. On by default.
  - Provide net.inet.tcp.insecure_syn sysctl, to turn off
    RFC5961 protection.

The changes were tested at Netflix. The tested box didn't show
any anomalies compared to control box, except slightly increased
number of TCP connection in LAST_ACK state.

Reviewed by:	rrs
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
This commit is contained in:
Gleb Smirnoff 2014-09-16 11:07:25 +00:00
parent 8a0834ec28
commit 3220a2121c

View File

@ -186,11 +186,17 @@ SYSCTL_VNET_INT(_net_inet_tcp_ecn, OID_AUTO, maxretries, CTLFLAG_RW,
&VNET_NAME(tcp_ecn_maxretries), 0,
"Max retries before giving up on ECN");
VNET_DEFINE(int, tcp_insecure_syn) = 0;
#define V_tcp_insecure_syn VNET(tcp_insecure_syn)
SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, insecure_syn, CTLFLAG_RW,
&VNET_NAME(tcp_insecure_syn), 0,
"Follow RFC793 instead of RFC5961 criteria for accepting SYN packets");
VNET_DEFINE(int, tcp_insecure_rst) = 0;
#define V_tcp_insecure_rst VNET(tcp_insecure_rst)
SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, insecure_rst, CTLFLAG_RW,
&VNET_NAME(tcp_insecure_rst), 0,
"Follow the old (insecure) criteria for accepting RST packets");
"Follow RFC793 instead of RFC5961 criteria for accepting RST packets");
VNET_DEFINE(int, tcp_recvspace) = 1024*64;
#define V_tcp_recvspace VNET(tcp_recvspace)
@ -2044,98 +2050,83 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
* Then check that at least some bytes of segment are within
* receive window. If segment begins before rcv_nxt,
* drop leading data (and SYN); if nothing left, just ack.
*
*
* If the RST bit is set, check the sequence number to see
* if this is a valid reset segment.
* RFC 793 page 37:
* In all states except SYN-SENT, all reset (RST) segments
* are validated by checking their SEQ-fields. A reset is
* valid if its sequence number is in the window.
* Note: this does not take into account delayed ACKs, so
* we should test against last_ack_sent instead of rcv_nxt.
* The sequence number in the reset segment is normally an
* echo of our outgoing acknowlegement numbers, but some hosts
* send a reset with the sequence number at the rightmost edge
* of our receive window, and we have to handle this case.
* Note 2: Paul Watson's paper "Slipping in the Window" has shown
* that brute force RST attacks are possible. To combat this,
* we use a much stricter check while in the ESTABLISHED state,
* only accepting RSTs where the sequence number is equal to
* last_ack_sent. In all other states (the states in which a
* RST is more likely), the more permissive check is used.
* If we have multiple segments in flight, the initial reset
* segment sequence numbers will be to the left of last_ack_sent,
* but they will eventually catch up.
* In any case, it never made sense to trim reset segments to
* fit the receive window since RFC 1122 says:
* 4.2.2.12 RST Segment: RFC-793 Section 3.4
*
* A TCP SHOULD allow a received RST segment to include data.
*
* DISCUSSION
* It has been suggested that a RST segment could contain
* ASCII text that encoded and explained the cause of the
* RST. No standard has yet been established for such
* data.
*
* If the reset segment passes the sequence number test examine
* the state:
* SYN_RECEIVED STATE:
* If passive open, return to LISTEN state.
* If active open, inform user that connection was refused.
* ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT STATES:
* Inform user that connection was reset, and close tcb.
* CLOSING, LAST_ACK STATES:
* Close the tcb.
* TIME_WAIT STATE:
* Drop the segment - see Stevens, vol. 2, p. 964 and
* RFC 1337.
*/
if (thflags & TH_RST) {
if (SEQ_GEQ(th->th_seq, tp->last_ack_sent - 1) &&
SEQ_LEQ(th->th_seq, tp->last_ack_sent + tp->rcv_wnd)) {
switch (tp->t_state) {
/*
* RFC5961 Section 3.2
*
* - RST drops connection only if SEG.SEQ == RCV.NXT.
* - If RST is in window, we send challenge ACK.
*
* Note: to take into account delayed ACKs, we should
* test against last_ack_sent instead of rcv_nxt.
* Note 2: we handle special case of closed window, not
* covered by the RFC.
*/
if ((SEQ_GEQ(th->th_seq, tp->last_ack_sent) &&
SEQ_LT(th->th_seq, tp->last_ack_sent + tp->rcv_wnd)) ||
(tp->rcv_wnd == 0 && tp->last_ack_sent == th->th_seq)) {
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
KASSERT(ti_locked == TI_WLOCKED,
("%s: TH_RST ti_locked %d, th %p tp %p",
__func__, ti_locked, th, tp));
KASSERT(tp->t_state != TCPS_SYN_SENT,
("%s: TH_RST for TCPS_SYN_SENT th %p tp %p",
__func__, th, tp));
if (V_tcp_insecure_rst ||
tp->last_ack_sent == th->th_seq) {
TCPSTAT_INC(tcps_drops);
/* Drop the connection. */
switch (tp->t_state) {
case TCPS_SYN_RECEIVED:
so->so_error = ECONNREFUSED;
goto close;
case TCPS_ESTABLISHED:
if (V_tcp_insecure_rst == 0 &&
!(SEQ_GEQ(th->th_seq, tp->rcv_nxt - 1) &&
SEQ_LEQ(th->th_seq, tp->rcv_nxt + 1)) &&
!(SEQ_GEQ(th->th_seq, tp->last_ack_sent - 1) &&
SEQ_LEQ(th->th_seq, tp->last_ack_sent + 1))) {
TCPSTAT_INC(tcps_badrst);
goto drop;
}
/* FALLTHROUGH */
case TCPS_FIN_WAIT_1:
case TCPS_FIN_WAIT_2:
case TCPS_CLOSE_WAIT:
so->so_error = ECONNRESET;
close:
KASSERT(ti_locked == TI_WLOCKED,
("tcp_do_segment: TH_RST 1 ti_locked %d",
ti_locked));
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
tcp_state_change(tp, TCPS_CLOSED);
TCPSTAT_INC(tcps_drops);
/* FALLTHROUGH */
default:
tp = tcp_close(tp);
break;
}
} else {
TCPSTAT_INC(tcps_badrst);
/* Send challenge ACK. */
tcp_respond(tp, mtod(m, void *), th, m,
tp->rcv_nxt, tp->snd_nxt, TH_ACK);
tp->last_ack_sent = tp->rcv_nxt;
m = NULL;
}
}
goto drop;
}
case TCPS_CLOSING:
case TCPS_LAST_ACK:
/*
* RFC5961 Section 4.2
* Send challenge ACK for any SYN in synchronized state.
*/
if ((thflags & TH_SYN) && tp->t_state != TCPS_SYN_SENT) {
KASSERT(ti_locked == TI_WLOCKED,
("tcp_do_segment: TH_RST 2 ti_locked %d",
ti_locked));
("tcp_do_segment: TH_SYN ti_locked %d", ti_locked));
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
tp = tcp_close(tp);
break;
}
TCPSTAT_INC(tcps_badsyn);
if (V_tcp_insecure_syn &&
SEQ_GEQ(th->th_seq, tp->last_ack_sent) &&
SEQ_LT(th->th_seq, tp->last_ack_sent + tp->rcv_wnd)) {
tp = tcp_drop(tp, ECONNRESET);
rstreason = BANDLIM_UNLIMITED;
} else {
/* Send challenge ACK. */
tcp_respond(tp, mtod(m, void *), th, m, tp->rcv_nxt,
tp->snd_nxt, TH_ACK);
tp->last_ack_sent = tp->rcv_nxt;
m = NULL;
}
goto drop;
}
@ -2306,20 +2297,6 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
tp->ts_recent = to.to_tsval;
}
/*
* If a SYN is in the window, then this is an
* error and we send an RST and drop the connection.
*/
if (thflags & TH_SYN) {
KASSERT(ti_locked == TI_WLOCKED,
("tcp_do_segment: TH_SYN ti_locked %d", ti_locked));
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
tp = tcp_drop(tp, ECONNRESET);
rstreason = BANDLIM_UNLIMITED;
goto drop;
}
/*
* If the ACK bit is off: if in SYN-RECEIVED state or SENDSYN
* flag is on (half-synchronized state), then queue data for