Commit Graph

4604 Commits

Author SHA1 Message Date
tuexen
cd30dfafe0 Initialize the fibnum for outgoing packets to 0. This avoids
crashing due to the usage of uninitialized fibnum.
This bugs became visiable after
http://svnweb.freebsd.org/changeset/base/250700

MFC after: 2 weeks
2013-05-19 16:06:43 +00:00
tuexen
7d22df2a4c Set errno to ETIMEDOUT if an SCTP association times out during
setup.

MFC after: 1 week
2013-05-17 22:26:05 +00:00
tuexen
d482c3688d Don't send an ABORT chunk with verification 0.
MFC after: 1 week
2013-05-17 21:45:52 +00:00
jimharris
7c58f64221 Fix typo in net.inet.tcp.minmss sysctl description.
MFC after:	3 days
2013-05-13 19:55:27 +00:00
hrs
960e7e9fd4 Add IFF_MONITOR support to gre(4).
Tested by:	Chip Marshall
MFC after:	1 week
2013-05-11 19:05:38 +00:00
glebius
f43ee707dd Rate limit the number of remotely triggered ARP log messages
to 1 log message per second.
2013-05-11 10:51:32 +00:00
tuexen
6ea39edf93 Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY
socket option for SCTP sockets in the same way as for UDP or TCP
sockets.

MFC after: 2 weeks
2013-05-10 18:09:38 +00:00
andre
cc8c6e4d01 Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.
2013-05-06 16:42:18 +00:00
hrs
d9d71436d9 Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group
Address.  Although KAME implementation used FF02:0:0:0:0:2::/96 based on
older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed
in RFC 4620.

The kernel always joins the /104-prefixed address, and additionally does
/96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1.
The default value of the sysctl is 1.

ping6(8) -N flag now uses /104-prefixed one.  When this flag is specified
twice, it uses /96-prefixed one instead.

Reviewed by:		ume
Based on work by:	Thomas Scheffler
PR:			conf/174957
MFC after:		2 weeks
2013-05-04 19:16:26 +00:00
cperciva
fcb6743484 Move IPPROTO_IPV6 from #ifdef __BSD_VISIBLE to #if __POSIX_VISIBLE >= 201112
since POSIX 2001 states that it shall be defined.

Reported by:	sbruno
Reviewed by:	jilles
MFC after:	1 week
2013-04-27 23:36:01 +00:00
glebius
b4bc270e8f Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
glebius
8b3c8983bc Fix couple of mbuf leaks in incoming ARP processing. 2013-04-25 17:38:04 +00:00
glebius
dd30d87e8d Introduce a pointer to const variable gw, which points either at the
same place as dst, or to the sockaddr in the routing table.

The const constraint of gw makes us safe from modifing routing table
accidentially. And "onstantness" of dst allows us to remove several
bandaids, when we switched it back at &ro->ro_dst, now it always
points there.

Reviewed by:	rrs
2013-04-25 12:42:09 +00:00
rrs
d90354b019 This fixes the issue with the "randomly changing" default
route. What it was is there are two places in ip_output.c
where we do a goto again. One place was fine, it
copies out the new address and then resets dst = ro->rt_dst;
But the other place does *not* do that, which means earlier
when we found the gateway, we have dst pointing there
aka dst = ro->rt_gateway is done.. then we do a
goto again.. bam now we clobber the default route.

The fix is just to move the again so we are always
doing dst = &ro->rt_dst; in the again loop.

PR:	 174749,157796
MFC after:	1 week
2013-04-24 18:30:32 +00:00
andre
1f1bd98319 When doing RFC3042 limited transmit on the first on second
duplicate ACK make sure we actually have new data to send.
This prevents us from sending unneccessary pure ACKs.

Reported by:	Matt Miller <matt@matthewjmiller.net>
Tested by:	Matt Miller <matt@matthewjmiller.net>
MFC after:	2 weeks
2013-04-23 14:06:32 +00:00
oleg
9917da6df0 Plug static llentry leak (ipv4 & ipv6 were affected).
PR:		kern/172985
MFC after:	1 month
2013-04-21 21:28:38 +00:00
gabor
188c638b60 - Corrrect mispellings of word useful
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:45:15 +00:00
delphij
8809aa4451 Fix incomplete printf.
PR:		kern/177889
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:32:12 +00:00
delphij
54d1fdc9a1 Don't leak lock when returning.
PR:		kern/177888
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:25:41 +00:00
ae
d657004b7f Reflect removing of the counter_u64_subtract() function in the macro. 2013-04-12 16:29:15 +00:00
glebius
d07f9bea4e Fix tcp_output() so that tcpcb is updated in the same manner when an
mbuf allocation fails, as in a case when ip_output() returns error.

To achieve that, move large block of code that updates tcpcb below
the out: label.

This fixes a panic, that requires the following sequence to happen:

1) The SYN was sent to the network, tp->snd_nxt = iss + 1, tp->snd_una = iss
2) The retransmit timeout happened for the SYN we had sent,
   tcp_timer_rexmt() sets tp->snd_nxt = tp->snd_una, and calls tcp_output().
   In tcp_output m_get() fails.
3) Later on the SYN|ACK for the SYN sent in step 1) came,
   tcp_input sets tp->snd_una += 1, which leads to
   tp->snd_una > tp->snd_nxt inconsistency, that later panics in
   socket buffer code.

For reference, this bug fixed in DragonflyBSD repo:

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/1ff9b7d322dc5a26f7173aa8c38ecb79da80e419

Reviewed by:	andre
Tested by:	pho
Sponsored by:	Nginx, Inc.
PR:		kern/177456
Submitted by:	HouYeFei&XiBoLiu <lglion718 163.com>
2013-04-11 18:23:56 +00:00
glebius
e79bb9704b Fix build. 2013-04-10 08:09:25 +00:00
andre
306fddaf78 Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.
2013-04-09 21:02:20 +00:00
andre
f70f4c314a Fix a race condition on tcp listen socket teardown with pending
connections in the accept queue and contiguous new incoming SYNs.

Compared to the original submitters patch I've moved the test
next to the SYN handling to have it together in a logical unit
and reworded the comment explaining the issue.

Submitted by:	Matt Miller <matt@matthewjmiller.net>
Submitted by:	Juan Mojica <jmojica@gmail.com>
Reviewed by:	Matt Miller (changes)
Tested by:	pho
MFC after:	1 week
2013-04-09 20:52:26 +00:00
glebius
99469b7e8a Fix VIMAGE build. 2013-04-09 09:15:26 +00:00
ae
844d612b2a Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.
MFC after:	1 week
2013-04-09 07:11:22 +00:00
glebius
2a0fbb38ed Merge from projects/counters: TCP/IP stats.
Convert 'struct ipstat' and 'struct tcpstat' to counter(9).

  This speeds up IP forwarding at extreme packet rates, and
makes accounting more precise.

Sponsored by:	Nginx, Inc.
2013-04-08 19:57:21 +00:00
tuexen
c19a395776 Add a macro for checking for IPv4 link local addresses.
MFC after: 1 week
2013-03-31 18:27:46 +00:00
emaste
35fe5219ab Keep fwd_tag around for subsequent pcb lookups
For TIMEWAIT handling tcp_input may have to jump back for an additional
pass through pcblookup.  Prior to this change the fwd_tag had been
discarded after the first lookup, so a new connection attempt delivered
locally via 'ipfw fwd' would fail to find a match.

As of r248886 the tag will be detached and freed when passed to the
socket buffer.
2013-03-29 20:51:44 +00:00
melifaro
31a6358fff Add ipfw support for setting/matching DiffServ codepoints (DSCP).
Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR:		kern/102471, kern/121122
MFC after:	2 weeks
2013-03-20 10:35:33 +00:00
glebius
09c3ea7c84 In m_megapullup() instead of reserving some space at the end of packet,
m_align() it, reserving space to prepend data.

Reviewed by:	mav
2013-03-17 07:37:10 +00:00
glebius
f4837a2e63 - Replace compat macros with function calls. 2013-03-16 08:58:28 +00:00
glebius
fcc62f8147 We can, and should use M_WAITOK here.
Sponsored by:	Nginx, Inc.
2013-03-15 13:10:06 +00:00
glebius
b37af62b9e Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
glebius
d9a8b8e5e6 - Use m_getcl() instead of hand allocating.
Sponsored by:	Nginx, Inc.
2013-03-15 12:53:53 +00:00
glebius
37a43650ed Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
glebius
6feb84d64e Remove LIBALIAS_LOCK_ASSERT(), including a couple with an uninitialzed
argument, in code that isn't compiled in kernel.

PR:		kern/176667
Sponsored by:	Nginx, Inc.
2013-03-11 12:22:44 +00:00
lstewart
0d18b7b93e The hashmask returned by hashinit() is a valid index in the returned hash array.
Fix a siftr(4) potential memory leak and INVARIANTS triggered kernel panic in
hashdestroy() by ensuring the last array index in the flow counter hash table is
flushed of entries.

MFC after:	3 days
2013-03-07 04:42:20 +00:00
davide
431035cf16 - Make callout(9) tickless, relying on eventtimers(4) as backend for
precise time event generation. This greatly improves granularity of
callouts which are not anymore constrained to wait next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, as long as interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather informations about callwheel, introducing
a new sysctl to obtain stats.

This change breaks the KBI. struct callout fields has been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.

Together with:	mav
Reviewed by:	attilio, bde, luigi, phk
Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo (amd64, sparc64), marius (sparc64), ian (arm),
		markj (amd64), mav, Fabian Keil
2013-03-04 11:09:56 +00:00
tuexen
8c2c9fac8b Fix a potential race in returning setting errno when an
association goes down.
Reported by Mozilla in
https://bugzilla.mozilla.org/show_bug.cgi?id=845513

MFC after: 3 days
2013-02-27 19:51:47 +00:00
gallatin
8189336983 Fix tcp_lro_rx_ipv4() for drivers that do not set CSUM_IP_CHECKED.
Specifcially, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4
checksum is correct. Without this fix, the tcp_lro code will reject
good IPv4 traffic from drivers that do not implement IPv4 header
harder csum offload.

Sponsored by: Myricom Inc.

MFC after:	7 days
2013-02-21 17:00:35 +00:00
pluknet
5be772ca10 ip_savecontrol() style fixes. No functional changes.
- fix indentation
- put the operator at the end of the line for long statements
- remove spaces between the type and the variable in a cast
- remove excessive parentheses

Tested by:	md5
2013-02-20 15:44:40 +00:00
tuexen
3017e3b84b Send the adaptation layer indication only if set by the user.
MFC after: 3 days
Discussed with: rrs
2013-02-11 21:02:49 +00:00
tuexen
ccc66b91fb Don't send kernel provided information in the User Initiated
ABORT cause, since the user can also provide this kind of
information. So the receiver doesn't know who provided the
information.
While there: Fix a bug where the stack would send a malformed
ABORT chunk when using a send() call with SCTP_ABORT|SCT_SENDALL
flags.

MFC after: 3 days
2013-02-11 13:57:03 +00:00
glebius
a47c0295c5 Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master()   - boolean function which is true if an address
		    is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
		    and is aware of CARP.

  Utilize ifa_preferred() in ifa_ifwithnet().

  The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-11 10:58:22 +00:00
tuexen
6e3d604447 Make sure that received packets for removed addresses are handled
consistently. While there, make variable names consistent.

MFC after: 3 days
2013-02-10 19:57:19 +00:00
tuexen
026c1e8b1a Cleanup the handling of address scopes. Announce in the INIT/INIT-ACK
only the supported address types. While there, do some whitespace
cleanups.

MFC after: 1 week
2013-02-09 17:26:14 +00:00
tuexen
6b769b5afe Fix a bug where HEARTBEATs were still sent in SHUTDOWN_SENT or
SHUTDOWN_ACK_SENT state. While there, make the corresponding
code consistent.

MFC after: 1 week
2013-02-09 08:27:08 +00:00
jhb
b55183a894 Add placeholder constants to reserve a portion of the socket option
name space for use by downstream vendors to add custom options.

MFC after:	2 weeks
2013-02-01 15:32:20 +00:00
andre
922fff4953 uma_zone_set_max() directly returns the rounded effective zone
limit.  Use the return value directly instead of doing a second
uma_zone_set_max() step.

MFC after:	1 week
2013-02-01 14:21:09 +00:00