Commit Graph

4573 Commits

Author SHA1 Message Date
Alexander V. Chernikov
ae01d73c04 Add ipfw support for setting/matching DiffServ codepoints (DSCP).
Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR:		kern/102471, kern/121122
MFC after:	2 weeks
2013-03-20 10:35:33 +00:00
Gleb Smirnoff
7525c48111 In m_megapullup() instead of reserving some space at the end of packet,
m_align() it, reserving space to prepend data.

Reviewed by:	mav
2013-03-17 07:37:10 +00:00
Gleb Smirnoff
aa8bd99d99 - Replace compat macros with function calls. 2013-03-16 08:58:28 +00:00
Gleb Smirnoff
3c26f4a9bc We can, and should use M_WAITOK here.
Sponsored by:	Nginx, Inc.
2013-03-15 13:10:06 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Gleb Smirnoff
39f6074e2e - Use m_getcl() instead of hand allocating.
Sponsored by:	Nginx, Inc.
2013-03-15 12:53:53 +00:00
Gleb Smirnoff
41a7572b26 Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
Gleb Smirnoff
f4562a299c Remove LIBALIAS_LOCK_ASSERT(), including a couple with an uninitialzed
argument, in code that isn't compiled in kernel.

PR:		kern/176667
Sponsored by:	Nginx, Inc.
2013-03-11 12:22:44 +00:00
Lawrence Stewart
1e0e83d760 The hashmask returned by hashinit() is a valid index in the returned hash array.
Fix a siftr(4) potential memory leak and INVARIANTS triggered kernel panic in
hashdestroy() by ensuring the last array index in the flow counter hash table is
flushed of entries.

MFC after:	3 days
2013-03-07 04:42:20 +00:00
Davide Italiano
5b999a6be0 - Make callout(9) tickless, relying on eventtimers(4) as backend for
precise time event generation. This greatly improves granularity of
callouts which are not anymore constrained to wait next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, as long as interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather informations about callwheel, introducing
a new sysctl to obtain stats.

This change breaks the KBI. struct callout fields has been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.

Together with:	mav
Reviewed by:	attilio, bde, luigi, phk
Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo (amd64, sparc64), marius (sparc64), ian (arm),
		markj (amd64), mav, Fabian Keil
2013-03-04 11:09:56 +00:00
Michael Tuexen
e045904fdc Fix a potential race in returning setting errno when an
association goes down.
Reported by Mozilla in
https://bugzilla.mozilla.org/show_bug.cgi?id=845513

MFC after: 3 days
2013-02-27 19:51:47 +00:00
Andrew Gallatin
e5ca1ffab5 Fix tcp_lro_rx_ipv4() for drivers that do not set CSUM_IP_CHECKED.
Specifcially, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4
checksum is correct. Without this fix, the tcp_lro code will reject
good IPv4 traffic from drivers that do not implement IPv4 header
harder csum offload.

Sponsored by: Myricom Inc.

MFC after:	7 days
2013-02-21 17:00:35 +00:00
Sergey Kandaurov
46f2df9c13 ip_savecontrol() style fixes. No functional changes.
- fix indentation
- put the operator at the end of the line for long statements
- remove spaces between the type and the variable in a cast
- remove excessive parentheses

Tested by:	md5
2013-02-20 15:44:40 +00:00
Michael Tuexen
2416af26a0 Send the adaptation layer indication only if set by the user.
MFC after: 3 days
Discussed with: rrs
2013-02-11 21:02:49 +00:00
Michael Tuexen
c53f854a17 Don't send kernel provided information in the User Initiated
ABORT cause, since the user can also provide this kind of
information. So the receiver doesn't know who provided the
information.
While there: Fix a bug where the stack would send a malformed
ABORT chunk when using a send() call with SCTP_ABORT|SCT_SENDALL
flags.

MFC after: 3 days
2013-02-11 13:57:03 +00:00
Gleb Smirnoff
24421c1c32 Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master()   - boolean function which is true if an address
		    is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
		    and is aware of CARP.

  Utilize ifa_preferred() in ifa_ifwithnet().

  The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-11 10:58:22 +00:00
Michael Tuexen
f0d44a49a0 Make sure that received packets for removed addresses are handled
consistently. While there, make variable names consistent.

MFC after: 3 days
2013-02-10 19:57:19 +00:00
Michael Tuexen
a1cb341b5d Cleanup the handling of address scopes. Announce in the INIT/INIT-ACK
only the supported address types. While there, do some whitespace
cleanups.

MFC after: 1 week
2013-02-09 17:26:14 +00:00
Michael Tuexen
c39cfa1f7e Fix a bug where HEARTBEATs were still sent in SHUTDOWN_SENT or
SHUTDOWN_ACK_SENT state. While there, make the corresponding
code consistent.

MFC after: 1 week
2013-02-09 08:27:08 +00:00
John Baldwin
0d25fab44d Add placeholder constants to reserve a portion of the socket option
name space for use by downstream vendors to add custom options.

MFC after:	2 weeks
2013-02-01 15:32:20 +00:00
Andre Oppermann
cda3447bb0 uma_zone_set_max() directly returns the rounded effective zone
limit.  Use the return value directly instead of doing a second
uma_zone_set_max() step.

MFC after:	1 week
2013-02-01 14:21:09 +00:00
Gleb Smirnoff
498944374f - Move AUTHORS and ACKNOWLEDGEMENTS to the end of the page.
- Add myself to list of authors.
2013-01-31 10:29:22 +00:00
Gleb Smirnoff
9711a168b9 Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by:	ru, andre, net@
2013-01-31 08:55:21 +00:00
Gleb Smirnoff
ea26ed7eea Utilize m_get2() to get mbuf of appropriate size. 2013-01-30 18:40:19 +00:00
Navdeep Parhar
adfaf8f6ad Add checks for SO_NO_OFFLOAD in a couple of places that I missed earlier
in r245915.
2013-01-26 01:41:42 +00:00
Navdeep Parhar
20be068c8a Teach toe_l2_resolve to resolve IPv6 destinations too.
Reviewed by:	bz@
2013-01-26 00:57:29 +00:00
Navdeep Parhar
4364ec0852 Move lle_event to if_llatbl.h
lle_event replaced arp_update_event after the ARP rewrite and ended up
in if_ether.h simply because arp_update_event used to be there too.
IPv6 neighbor discovery is going to grow lle_event support and this is a
good time to move it to if_llatbl.h.

The two in-tree consumers of this event - OFED and toecore - are not
affected.

Reviewed by:	bz@
2013-01-25 23:58:21 +00:00
Navdeep Parhar
460cf046c2 There is no need to call into the TOE driver twice in pru_rcvd (tod_rcvd
and then tod_output right after that).

Reviewed by:	bz@
2013-01-25 22:50:52 +00:00
Navdeep Parhar
464dfeb43f Add TCP_OFFLOAD hook in syncache_respond for IPv6 too, just like the one
that exists for IPv4.

Reviewed by:	bz@
2013-01-25 22:16:35 +00:00
Navdeep Parhar
b218348bc3 Teach toe_4tuple_check() to deal with IPv6 4-tuples too.
Reviewed by:	bz@
2013-01-25 20:45:24 +00:00
Navdeep Parhar
37cc0ecb1b Heed SO_NO_OFFLOAD.
MFC after:	1 week
2013-01-25 20:23:33 +00:00
Navdeep Parhar
5cd3dcaa25 Remove redundant test, we know inp_lport is 0.
MFC after:	1 week
2013-01-25 20:14:27 +00:00
John Baldwin
1d77fa5a26 Use decimal values for UDP and TCP socket options rather than hex to avoid
implying that these constants should be treated as bit masks.

Reviewed by:	net
MFC after:	1 week
2013-01-22 19:45:04 +00:00
Lawrence Stewart
5b648e797b Simplify and fix a bug in cc_ack_received()'s "are we congestion window limited"
logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are
unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could
therefore potentially corrupt the result (although under normal operation,
neither variable should legitmately exceed 32 bits).

[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html

Submitted by:	jhb
MFC after:	1 week
2013-01-22 09:44:21 +00:00
John Baldwin
6c0ef8957f Don't drop options from the third retransmitted SYN by default. If the
SYNs (or SYN/ACK replies) are dropped due to network congestion, then the
remote end of the connection may act as if options such as window scaling
are enabled but the local end will think they are not.  This can result in
very slow data transfers in the case of window scaling disagreements.

The old behavior can be obtained by setting the
net.inet.tcp.rexmit_drop_options sysctl to a non-zero value.

Reviewed by:	net@
MFC after:	2 weeks
2013-01-09 20:27:06 +00:00
Peter Wemm
8a1163e82f Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.
2013-01-03 10:21:28 +00:00
Michael Tuexen
11e03b3200 Some cleanups.
MFC after: 3 days
2012-12-27 08:10:58 +00:00
Michael Tuexen
72c123a8b4 Minor cleanups of debug messages.
MFC after: 3 days
2012-12-27 08:06:58 +00:00
Michael Tuexen
2c2e3218cb Fix a copy and paste error.
MFC after: 3 days
2012-12-27 08:02:58 +00:00
Gleb Smirnoff
c4d0697685 Garbage collect carp_cksum(). 2012-12-25 14:29:38 +00:00
Gleb Smirnoff
7951008b47 Change net.inet.carp.demotion sysctl to add the supplied value
to the current demotion factor instead of assigning it.

  This allows external scripts to control demotion factor together
with kernel in a raceless manner.
2012-12-25 14:08:13 +00:00
Gleb Smirnoff
e8db9937f3 Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied,
and arg2 doesn't pass size of arg1.
2012-12-25 13:55:21 +00:00
Gleb Smirnoff
468e45f3bd The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case if interface flag IFF_UP has changed.

  However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

  This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.
2012-12-25 13:01:58 +00:00
Gleb Smirnoff
3e6c8b5366 Minor style(9) changes:
- Remove declaration in initializer.
- Add empty line between logical blocks.
2012-12-24 21:35:48 +00:00
Gleb Smirnoff
b8056fae06 Fix !INET6 build after r244365. 2012-12-18 08:14:16 +00:00
Gleb Smirnoff
dd029d52fa Clear correct flag in INET6 case. 2012-12-18 08:09:44 +00:00
Andrey V. Elsukov
f491274582 Since we use different flags to detect tcp forwarding, and we share the
same code for IPv4 and IPv6 in tcp_input, we should check both
M_IP_NEXTHOP and M_IP6_NEXTHOP flags.

MFC after:	3 days
2012-12-17 20:55:33 +00:00
Gleb Smirnoff
b1ec2940af Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by:	Ian FREISLICH <ianf clue.co.za>
2012-12-13 11:11:15 +00:00
Gleb Smirnoff
78a7880f64 Fix a crash in tcp_input(), that happens when mbuf has a fwd_tag on it,
but later after processing and freeing the tag, we need to jump back again
to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to
process and free the tag for second time.

Reported & tested by:	Pawel Tyll <ptyll nitronet.pl>
MFC after:		3 days
2012-12-12 17:41:21 +00:00
Michael Tuexen
cca6f4a8f3 Get it compiling without INET and INET6 support (mainly userland stack).
MFC after: 2 weeks
2012-12-08 15:11:09 +00:00