Commit Graph

4810 Commits

Author SHA1 Message Date
tuexen
bde098f08a Use ENOBUFS instead of ENOMEM in error situations related to m_uiotombuf().
This was suggested by kevlo@.

MFC after: 3 days
2014-06-05 12:51:12 +00:00
kevlo
22c7fedfe4 Fix build UDP-Lite with VIMAGE enabled when building with gcc.
Reported and tested by: Jason Hellenthal
2014-06-03 01:30:32 +00:00
hiren
cc47b6d947 ECN marking implenetation for dummynet.
Changes include both DCTCP and RFC 3168 ECN marking methodology.

DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Submitted by:	Midori Kato (aoimidori27@gmail.com)
Worked with:	Lars Eggert (lars@netapp.com)
Reviewed by:	luigi, hiren
2014-06-01 07:28:24 +00:00
bz
df1284e013 While PAWS is disabled, there are no consumers for the tcp options
argument to tcp_twcheck();  thus mark it __unused.

MFC after:	2 weeks
2014-05-30 22:34:06 +00:00
asomers
7ca8bf0f2c Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr()  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
	Add legacy-compatible functions as described above.  Ensure legacy
	behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
	Call with _fib() functions if we must use a specific fib, or the
	legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
	Improve the udp_dontroute test.  The bug that this test exercises is
	that ifa_ifwithnet() will return the wrong address, if multiple
	interfaces have addresses on the same subnet but with different
	fibs.  The previous version of the test only considered one possible
	failure mode: that ifa_ifwithnet_fib() might fail to find any
	suitable address at all.  The new version also checks whether
	ifa_ifwithnet_fib() finds the correct address by checking where the
	ARP request goes.

Reported by:	bz, hrs
Reviewed by:	hrs
MFC after:	1 week
X-MFC-with:	264905
Sponsored by:	Spectra Logic
2014-05-29 21:03:49 +00:00
jilles
1effdc7fef netinet/in.h: Expose htonl(), htons(), ntohl() and ntohs() in strict POSIX
mode.

Put the htonl(), htons(), ntohl() and ntohs() declarations under
__POSIX_VISIBLE >= 200112. POSIX.1-2001 and newer require these to be
exposed from <netinet/in.h> (as well as <arpa/inet.h>).

Note that it may be unnecessary to check __POSIX_VISIBLE >= 200112 because
older versions of POSIX and the C standard do not define this header.
However, other places in the same file already perform the check.

PR:		188316
Submitted by:	Christian Neukirchen
2014-05-29 15:23:37 +00:00
adrian
9cac8fd2f5 The users of RSS shouldn't be directly concerned about hash -> CPU ID
mappings.  Instead, they should be first mapping to an RSS bucket and
then querying the RSS bucket -> CPU ID mapping to figure out the target
CPU.

When (if?) RSS rebalancing is implemented or some other (non round-robin)
distribution of work from buckets to CPU IDs, various bits of code - both
userland and kernel - will need to know how this mapping works.

So, to support this:

* Add a new function rss_m2bucket() - this maps an mbuf to a given bucket.
  Anything which is currently doing hash -> CPU work may instead wish to
  do hash -> bucket, and then query the bucket->cpuid map for which
  CPU it belongs on.  Or, map it to a bucket, then re-pin that bucket ->
  CPU during a rebalance operation.

* For userland applications which wish to exploit affinity to RSS buckets,
  the bucket -> CPU ID mapping is now available via a sysctl.
  net.inet.rss.bucket_mapping lists the bucket to CPU ID mapping via
  a list of bucket:cpu pairs.
2014-05-27 08:06:20 +00:00
bz
606b94139a Remove the prototpye for the static inline function
tcp_signature_verify_input().
The function is defined before first use already.

MFC after:	2 weeks
2014-05-24 15:31:40 +00:00
bz
cb0049082b syncache_lookup() is a file local function. Make it static and
take it out of the public KPI; seems it was never used elsewhere.

MFC after:	2 weeks
2014-05-24 15:03:36 +00:00
bz
618ca8446a Make tcp_twrespond() file local private; this removes it from the
public KPI; it is not used anywhere else and seems it never was.

MFC after:	2 weeks
2014-05-24 14:01:18 +00:00
bz
b513252b3a Remove the prototypes for things that are no longer file local but were
moved to the header file.

Pointy hat to:	clang || bz
MFC after:	2 weeks
X-MFC with:	r266596
Reported by:	gcc build of sparc64
2014-05-23 21:12:33 +00:00
bz
c4e312930b Move the tcp_fields_to_host() and tcp_fields_to_net() (inline)
functions to the tcp_var.h header file in order to avoid further
duplication with upcoming commits.

Reviewed by:	np
MFC after:	2 weeks
2014-05-23 20:15:01 +00:00
adrian
22b3cc8e05 Use CPU_FIRST() / CPU_NEXT() to iterate over the valid CPU IDs. 2014-05-22 07:25:36 +00:00
adrian
7bdec225ef When RSS is enabled and per cpu TCP timers are enabled, do an RSS
lookup for the inp flowid/flowtype to destination CPU.

This only modifies the case where RSS is enabled and the per-cpu tcp
timer option is enabled.  Otherwise the behaviour should be the same
as before.
2014-05-18 22:39:01 +00:00
adrian
429607ed6b * When copying the flowid from inp -> outbound mbuf, also assign the
hashtype to to the outbound mbuf as well as the flowid.

* Add in socket options to fetch the hashid, the hashtype and RSS CPU
  ID for a given socket.
2014-05-18 22:37:31 +00:00
adrian
ce4c5bcf33 Ensure that the flowid hashtype is assigned to the inp if the flowid
is also assigned.
2014-05-18 22:34:06 +00:00
adrian
6cb1f60ec9 Add a new function to do a CPU ID lookup based on RSS hash information.
This is intended to be used by various places that wish to hash some
information about a TCP/UDP/IP flow but don't necessarily have a
live mbuf to do it with.

Refactor rss_m2cpuid() to use the refactored function.
2014-05-18 22:32:04 +00:00
adrian
f91e4baca7 Add the flowtype to the inpcb.
The flowid isn't enough to use as part of any RSS related CPU affinity
lookups - the RSS code would like to know what kind of hash it is.
2014-05-18 22:30:12 +00:00
melifaro
f4783a05e9 Fix wrong formatting of 0.0.0.0/X table records in ipfw(8).
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().

Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.

PR:		bin/189471
Submitted by:	Dennis Yusupoff <dyr@smartspb.net>
MFC after:	2 weeks
2014-05-17 13:45:03 +00:00
glebius
e21a2ccdb6 Provide compatibility #define after r265408.
Suggested by:	truckman
2014-05-17 12:33:27 +00:00
adrian
784a0cfd01 Reserve IP_FLOWID, IP_FLOWTYPE, IP_RSSCPUID socket option IDs for
near-term future use.

These are intended to fetch the current flow id, flow hash type
(M_HASHTYPE_* from the sys/mbuf.h) and if RSS is enabled, the
RSS destined CPU ID for the receive path.
2014-05-17 00:09:12 +00:00
silby
278abd3220 Remove the function tcp_twrecycleable; it has been #if 0'd for
eight years.  The original concept was to improve the
corner case where you run out of ephemeral ports, but it
was causing performance problems and the mechanism
of limiting the number of time_wait sockets serves
the same purpose in the end.

Reviewed by:	bz
2014-05-16 01:38:38 +00:00
yongari
40228299cd Fix checksum computation. Previously it didn't include carry.
Reviewed by:	tuexen
2014-05-13 05:07:03 +00:00
tuexen
eff58a975e Disable TX checksum offload for UDP-Lite completely. It wasn't used for
partial checksum coverage, but even for full checksum coverage it doesn't
work.
This was discussed with Kevin Lo (kevlo@).
2014-05-12 09:46:48 +00:00
tuexen
285d10d30f Whitespace change. 2014-05-10 08:48:04 +00:00
tuexen
7bb91ee942 Fix a logic bug which prevented the sending of UDP packet with 0 checksum.
This bug was introduced in r264212 and should be X-MFCed with that
revision, if UDP-Lite support if MFCed.
2014-05-09 14:15:48 +00:00
tuexen
b6b80cf495 Use KASSERTs as suggested by glebius@
MFC after: 3 days
X-MFC with: 265691
2014-05-08 20:47:54 +00:00
tuexen
c296f5d2af For some UDP packets (for example with 200 byte payload) and IP options,
the IP header and the UDP header are not in the same mbuf.
Add code to in_delayed_cksum() to deal with this case.

MFC after: 3 days
2014-05-08 17:27:46 +00:00
tuexen
ebf0ea80e4 Remove unused code. This is triggered by the bugreport of Sylvestre Ledru
which deal with useless code in the user land stack:
https://bugzilla.mozilla.org/show_bug.cgi?id=1003929

MFC after: 3 days
2014-05-06 16:51:07 +00:00
glebius
68ca190323 - Remove net.inet.tcp.reass.overflows sysctl. It counts exactly
same events that tcpstat's tcps_rcvmemdrop counter counts.
- Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
  description in netstat(1) output.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-05-06 00:00:07 +00:00
glebius
9f8ba75695 The tcp_log_addrs() uses th pointer, which points into the mbuf, thus we
can not free the mbuf before tcp_log_addrs().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-05-05 21:33:20 +00:00
glebius
d2bbb6646b The FreeBSD-SA-14:08.tcp was a lesson on not doing acrobatics with
mixing on stack memory and UMA memory in one linked list.

Thus, rewrite TCP reassembly code in terms of memory usage. The
algorithm remains unchanged.

We actually do not need extra memory to build a reassembly queue.
Arriving mbufs are always packet header mbufs. So we got the length
of data as pkthdr.len. We got m_nextpkt for linkage. And we need
only one pointer to point at the tcphdr, use PH_loc for that.

In tcpcb the t_segq fields becomes mbuf pointer. The t_segqlen
field now counts not packets, but bytes in the queue. This gives
us more precision when comparing to socket buffer limits.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-05-04 23:25:32 +00:00
melifaro
876586a2b1 Fix panic on IPv4 address removal introduced in r265279.
Reported by:	Trond Endrestøl
MFC with:	r265279
2014-05-03 20:22:13 +00:00
melifaro
a4407f98c0 Pass radix head ptr along with rte to rtexpunge().
Rename rtexpunge to rt_expunge().
2014-05-03 16:28:54 +00:00
delphij
4007fe7eae Fix TCP reassembly vulnerability.
Patch done by:	glebius
Security:	FreeBSD-SA-14:08.tcp
Security:	CVE-2014-3000
2014-04-30 04:02:57 +00:00
asomers
130691029e Fix a panic when removing an IP address from an interface, if the same address
exists on another interface.  The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS.  The solution
is to use the interface fib in that call.  For the majority of users, that will
be equivalent to the legacy behavior.

PR:		kern/189089
Reported by:	neel
Reviewed by:	neel
MFC after:	3 weeks
X-MFC with:	264887
Sponsored by:	Spectra Logic
2014-04-29 14:46:45 +00:00
asomers
f8a34b6f49 Fix subnet and default routes on different FIBs on the same subnet.
These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
	Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
	functions will only return an address whose interface fib equals the
	argument.

sys/net/route.c
	Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
	arguments.

sys/netinet/in.c
	Update in_addprefix to consider the interface fib when adding
	prefixes.  This will prevent it from not adding a subnet route when
	one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
	Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
	In some cases it there wasn't a clear specific fib number to use.
	In others, I was unable to test those functions so I chose
	RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
	fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
	Revert r263738.  The udp_dontroute test was right all along.
	However, bugs kern/187550 and kern/187553 cancelled each other out
	when it came to this test.  Because of kern/187553, ifa_ifwithnet
	searched the default fib instead of the requested one, but because
	of kern/187550, there was an applicable subnet route on the default
	fib.  The new test added in r263738 doesn't work right, however.  I
	can verify with dtrace that ifa_ifwithnet returned the wrong address
	before I applied this commit, but route(8) miraculously found the
	correct interface to use anyway.  I don't know how.

	Clear expected failure messages for kern/187550 and kern/187552.

PR:		kern/187550
PR:		kern/187552
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-04-24 23:56:56 +00:00
asomers
6e7494c7e1 Fix host and network routes for new interfaces when net.add_addr_allfibs=0
sys/net/route.c
	In rtinit1, use the interface fib instead of the process fib.  The
	latter wasn't very useful because ifconfig(8) is usually invoked
	with the default process fib.  Changing ifconfig(8) to use setfib(2)
	would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
	Clear the expected ATF failure

sys/net/if.c
	Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
	Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
	in_scrubprefix.  Pass it the interface fib.

PR:		kern/187549
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-04-24 17:23:16 +00:00
smh
977eea5c3a Fix jailed raw sockets not setting the correct source address by
calling in_pcbladdr instead of prison_get_ip4

MFC after:	1 month
2014-04-24 12:52:31 +00:00
tuexen
17495be0b7 Don't free an mbuf twice. This only happens in very rare error
cases where the peer sends illegal sequencing information in
DATA chunks for an existing association.

MFC after: 3 days.
2014-04-23 21:20:55 +00:00
rmacklem
87348af494 Add {} braces so that the code conforms to the indentation.
Fortunately, I don't think doing the assignment of cap->tsomax
unconditionally causes any problem.

Reviewed by:	glebius
MFC after:	2 weeks
2014-04-21 19:17:19 +00:00
tuexen
0d6ccd58e0 Add consistency checks to ensure that fragments of a user message
have the same U-bit.

MFC after: 3 days
2014-04-20 21:11:39 +00:00
tuexen
eaf7d5a955 Send also a packet containing an ABORT chunk in response to an OOTB packet
containing a COOKIE-ECHO chunk.

MFC after: 3 days
2014-04-20 18:15:23 +00:00
tuexen
03865d88ca Use consistently debug output instead of an unconditional printf.
MFC after: 3 days
2014-04-19 20:55:51 +00:00
tuexen
156da197a9 Send the correct error cause, when a DATA chunk with no user data
is received. This bug was reported by Irene Ruengeler.

MFC after: 3 days
2014-04-19 19:21:06 +00:00
jhb
4a20a393db Some whitespace and style fixes.
Submitted by:	bde
2014-04-11 21:00:59 +00:00
jhb
3f6d278d38 The tw_pcbrele() function does not need the global timewait lock.
Submitted by:	Julien Charbon
Suggested by:	glebius
2014-04-11 19:17:45 +00:00
jhb
3bdbc6b29a Don't leak the TCP pcbinfo lock if a time wait connection is closed
in between grabbing a reference on the connection structure and obtaining
the pcbinfo lock.

Reviewed by:	Julien Charbon
2014-04-11 13:11:43 +00:00
jhb
7d16e10f89 Currently, the TCP slow timer can starve TCP input processing while it
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.

To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT.  Only acquire the TCP pcbinfo lock when
closing an expired connection.  This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.

Submitted by:	Julien Charbon <jcharbon@verisign.com>
Reviewed by:	rwatson, rrs, adrian
MFC after:	2 months
2014-04-10 18:15:35 +00:00
kevlo
2302f326b9 Remove a bogus re-assignment. 2014-04-08 01:54:50 +00:00