Commit Graph

3188 Commits

Author SHA1 Message Date
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Alexander Motin
2d222cb761 Improve locking of multicast addresses in VLAN and LAGG interfaces.
This fixes several scenarios of reproducible panics, cause by races
between multicast address changes and interface destruction.

MFC after:	2 weeks
2014-08-04 00:58:12 +00:00
Gleb Smirnoff
9753faf553 Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
2014-07-29 15:01:29 +00:00
Kevin Lo
c29a33213b Deprecate m_act. Use m_nextpkt always. 2014-07-17 05:21:16 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Attilio Rao
3ae10f7477 - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
Alexander V. Chernikov
402000ffa3 Improve logic besides net.bpf.optimize_writers.
Direct bpf(4) consumers should now work fine with this tunable turned on.
In fact, the only case when optimized_writers can change program
behavior is direct bpf(4) consumer setting its read filter to
catch-all one.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-06-11 11:27:44 +00:00
Luigi Rizzo
9225c8085b misc bugfixes:
- stdio.h is needed for fprint()
- make memsize uint32_t to avoid errors due to overflow
- honor the *XPOLL flagg in NIOCREGIF requests
- mmap fails wit MAP_FAILED, not NULL.

MFC after:	3 days
2014-06-06 15:17:19 +00:00
Luigi Rizzo
5c8c100428 whitespace change: fix one comment, remove a stale one. 2014-06-06 15:15:27 +00:00
Luigi Rizzo
43ed1d3c76 whitespace change: remove trailing whitespace 2014-06-05 21:12:41 +00:00
Marcel Moolenaar
62d76917b8 Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers.  This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fix to not need
an explicit cast.

The immediate benefactors of this change are:
1.  Juniper Networks - The network stack implementation in Junos
    is entirely different from FreeBSD's one and this change
    allows Juniper to build "stock" NIC drivers that can be used
    in combination with both the FreeBSD and Junos stacks.
2.  FreeBSD - This change opens the door towards changing ifnet
    and implementing new features and optimizations in the network
    stack without it requiring a change in the many NIC drivers
    FreeBSD has.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Reviewed by:	glebius@
Obtained from:	Juniper Networks, Inc.
2014-06-02 17:54:39 +00:00
Alan Somers
2f308a343f Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr()  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
	Add legacy-compatible functions as described above.  Ensure legacy
	behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
	Call with _fib() functions if we must use a specific fib, or the
	legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
	Improve the udp_dontroute test.  The bug that this test exercises is
	that ifa_ifwithnet() will return the wrong address, if multiple
	interfaces have addresses on the same subnet but with different
	fibs.  The previous version of the test only considered one possible
	failure mode: that ifa_ifwithnet_fib() might fail to find any
	suitable address at all.  The new version also checks whether
	ifa_ifwithnet_fib() finds the correct address by checking where the
	ARP request goes.

Reported by:	bz, hrs
Reviewed by:	hrs
MFC after:	1 week
X-MFC-with:	264905
Sponsored by:	Spectra Logic
2014-05-29 21:03:49 +00:00
Peter Grehan
6902364468 Bump bhyve allocation up to 20 bits to avoid
birthday-paradox style address collisions when
bhyve VMs are connected to the same broadcoast
domain and are using pseudo-random allocations.

Reviewed by:	gnn
MFC after:	1 week
2014-05-20 02:59:13 +00:00
Alexander V. Chernikov
6db47af467 Rename rt_msg1() to more handy rtsock_msg_mbuf().
(Just for history purposes: rt_msg2() was renamed
 to rtsock_msg_buffer() in r265019).

Sponsored by:	Yandex LLC
MFC after:	1 month
2014-05-08 13:54:57 +00:00
Alexander V. Chernikov
3deb3649d5 Fix incorrect netmasks being passed via rtsock.
Since radix has been ignoring sa_family in passed sockaddrs,
no one ever has bothered filling valid sa_family in netmasks.
Additionally, radix adjusts sa_len field in every netmask not to
compare zero bytes at all.

This leads us to rt_mask with sa_family of AF_UNSPEC (-1) and
arbitrary sa_len field (0 for default route, for example).

However, rtsock have been passing that rt_mask intact for ages,
requiring all rtsock consumers to make ther own local hacks.
We even have unfixed on in base:

do `route -n monitor` in one window and issue `route -n get addr`
for some directly-connected address. You will probably see the following:

got message of size 304 on Thu May  8 15:06:06 2014
RTM_GET: Report Metrics: len 304, pid: 30493, seq 1, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
 10.0.0.0 link#1 (255) ffff ffff ff em0:8.0.27.c5.29.d4 10.0.0.92
_________________^^^^^^^^^^^^^^^^^^

after the change:

got message of size 312 on Thu May  8 15:44:07 2014
RTM_GET: Report Metrics: len 312, pid: 2895, seq 1, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
 10.0.0.0 link#1 255.255.255.0 em0:8.0.27.c5.29.d4 10.0.0.92
_________________^^^^^^^^^^^^^^^^^^

Sponsored by:	Yandex LLC
MFC after:	1 month
2014-05-08 11:56:06 +00:00
Alexander V. Chernikov
c9f98940b9 Fix sysctl_ifmalist() broken in r265019.
Reported by:	Olivier Cochard-Labbé
MFC with:	r265019
2014-05-03 17:57:06 +00:00
Alexander V. Chernikov
972ed56a33 Remove additional fib checks from rtalloc1_fib.
It looks like current consumers are either unaware
of MRT (and uses RT_DEFAULT_FIB implicitly) or
know what thay are doing, In latter case they
will be either hit by KASSERT or ESCRH will be returned
due to NULL rnh.
2014-05-03 16:38:05 +00:00
Alexander V. Chernikov
b980262e63 Pass radix head ptr along with rte to rtexpunge().
Rename rtexpunge to rt_expunge().
2014-05-03 16:28:54 +00:00
Alan Somers
f544a74870 Fix a panic caused by doing "ifconfig -am" while a lagg is being destroyed.
The thread that is destroying the lagg has already set sc->sc_psc=NULL when
the "ifconfig -am" thread gets to lacp_req().  It tries to dereference
sc->sc_psc and panics.  The solution is for lacp_req() to check the value of
sc->sc_psc.  If NULL, harmlessly return an lacp_opreq structure full of
zeros.  Full details in GNATS.

PR:		kern/189003
Reviewed by:	timeout on freebsd-net@
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-05-02 16:24:09 +00:00
Alexander V. Chernikov
32fb15e802 Fix rnh_walktree_from() function (patch from kern/174959).
Require valid netmask to be passed since host route is always a leaf.

PR:		kern/174959
Submitted by:	Keith Sklower
MFC after:	2 weeks
2014-05-01 15:04:32 +00:00
Alexander V. Chernikov
d9437c0f46 Partially revert r265019 - allocating 512 bytes on stack
can be too much for architectures like ARM. Always use rounded
malloc instead.

Discussed with:	jmallett
MFC after:	4 weeks
2014-04-29 19:48:11 +00:00
Alexander V. Chernikov
0fb9298db9 Move rt_setmetrics() from rtsock.c to route.c.
All rtsock-initiated rte creation/modification are now
performed in route.c holding radix tree write lock.
This reduces the need for per-rte mutex.

Sponsored by:	Yandex LLC
MFC after:	1 month
2014-04-29 19:14:42 +00:00
Alexander V. Chernikov
a713ee5cf7 Do not use senderr() in rtrequest1_fib_change().
Suggested by:	glebius
MFC after:	4 weeks
2014-04-29 12:52:36 +00:00
Alexander V. Chernikov
de46b2c650 Fix build
Found by:	ian
Pointyhat to:	me
2014-04-27 21:17:54 +00:00
Alexander V. Chernikov
f2e5eb368a Improve memory allocation model for rt_msg2() rtsock messages:
* memory is now allocated as early as possible, without holding locks.
 * sysctl users are now guaranteed to get a response (M_WAITOK buffer prealloc).
 * socket users are more likely to use on-stack buffer for replies.
 * standard kernel malloc/free functions are now used instead of radix wrappers.
rt_msg2() has been renamed to rtsock_msg_buffer().

MFC after:	1 month
2014-04-27 17:41:18 +00:00
Alexander V. Chernikov
f1fcb55271 Remove useless zeroing of RTAX_DST on error.
Cleanup a bit.

MFC after:	1 month
2014-04-27 10:43:48 +00:00
Alexander V. Chernikov
92c227af54 Cleanup route_output() a bit.
MFC after:	1 month
2014-04-27 10:20:37 +00:00
Alexander V. Chernikov
2277c5e5e2 Do not delay freeing rtm. Bandaid added in r227061 is not needed since r227061,
MFC after:	1 month
2014-04-27 09:49:35 +00:00
Alexander V. Chernikov
f5d9a6964d Move up fibnum to ensure it is always defined.
Found by:	ian
MFC with:	r264987
2014-04-27 02:20:09 +00:00
Alexander V. Chernikov
f59c6cb0fc Remove useless `register' declarations.
MFC after:	1 month
2014-04-26 22:42:21 +00:00
Alexander V. Chernikov
773aa0533b Determine fibnum once in the beginning of route_output().
MFC after:	1 month
2014-04-26 22:32:04 +00:00
Alexander V. Chernikov
c77462dd64 Decouple RTM_CHANGE from RTM_GET handling in rtsock.c:route_output().
RTM_CHANGE is now handled inside route.c:rtrequest1_fib() as it should be.
Note change change handler is a separate function rtrequest1_fib_change().

MFC after:	1 month
2014-04-26 21:03:41 +00:00
Alexander V. Chernikov
36d55f0f9d Unify sa_equal() macro usage.
MFC after:	2 weeks
2014-04-26 14:52:03 +00:00
Alan Somers
0cfee0c223 Fix subnet and default routes on different FIBs on the same subnet.
These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
	Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
	functions will only return an address whose interface fib equals the
	argument.

sys/net/route.c
	Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
	arguments.

sys/netinet/in.c
	Update in_addprefix to consider the interface fib when adding
	prefixes.  This will prevent it from not adding a subnet route when
	one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
	Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
	In some cases it there wasn't a clear specific fib number to use.
	In others, I was unable to test those functions so I chose
	RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
	fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
	Revert r263738.  The udp_dontroute test was right all along.
	However, bugs kern/187550 and kern/187553 cancelled each other out
	when it came to this test.  Because of kern/187553, ifa_ifwithnet
	searched the default fib instead of the requested one, but because
	of kern/187550, there was an applicable subnet route on the default
	fib.  The new test added in r263738 doesn't work right, however.  I
	can verify with dtrace that ifa_ifwithnet returned the wrong address
	before I applied this commit, but route(8) miraculously found the
	correct interface to use anyway.  I don't know how.

	Clear expected failure messages for kern/187550 and kern/187552.

PR:		kern/187550
PR:		kern/187552
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-04-24 23:56:56 +00:00
Alan Somers
0489b8916e Fix host and network routes for new interfaces when net.add_addr_allfibs=0
sys/net/route.c
	In rtinit1, use the interface fib instead of the process fib.  The
	latter wasn't very useful because ifconfig(8) is usually invoked
	with the default process fib.  Changing ifconfig(8) to use setfib(2)
	would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
	Clear the expected ATF failure

sys/net/if.c
	Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
	Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
	in_scrubprefix.  Pass it the interface fib.

PR:		kern/187549
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-04-24 17:23:16 +00:00
Martin Matuska
ecb47cf9c5 Backport from projects/pf r263908:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

MFC after:	1 week
2014-04-20 09:17:48 +00:00
John-Mark Gurney
de51fbd695 garbage collect something that hasn't been triggered in almost 5 years...
the last consumer was removed a couple years ago...
2014-04-19 19:08:08 +00:00
Rick Macklem
209579aeac For NFS mounts using rsize,wsize=65536 over TSO enabled
network interfaces limited to 32 transmit segments, there
are two known issues.
The more serious one is that for an I/O of slightly less than 64K,
the net device driver prepends an ethernet header, resulting in a
TSO segment slightly larger than 64K. Since m_defrag() copies this
into 33 mbuf clusters, the transmit fails with EFBIG.
A tester indicated observing a similar failure using iSCSI.

The second less critical problem is that the network
device driver must copy the mbuf chain via m_defrag()
(m_collapse() is not sufficient), resulting in measurable overhead.

This patch reduces the default size of if_hw_tsomax
slightly, so that the first issue is avoided.
Fixing the second issue will require a way for the
network device driver to inform tcp_output() that it
is limited to 32 transmit segments.

Reported and tested by:	csforgeron@gmail.com, markus.gebert@hostpoint.ch
MFC after:	2 weeks
2014-04-17 23:31:50 +00:00
Rick Macklem
bd9fd667c4 Vlan did not set the value of if_hw_tsomax, so when vlan
was stacked on top of a network interface that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface. This patch modifies vlan so that
it sets if_hw_tsomax to the value of the parent interface.

Reviewed by:	glebius
MFC after:	2 weeks
2014-04-15 21:48:35 +00:00
Rick Macklem
d092e11c6a Fix build for non-INET that was broken by r264469.
MFC after:	2 weeks
2014-04-15 13:28:54 +00:00
Rick Macklem
9387570f8a Lagg did not set the value of if_hw_tsomax, so when lagg
was stacked on top of network interfaces that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface(s). This patch modifies lagg so that
it sets if_hw_tsomax to the minimum of the value(s) for the
underlying network interfaces.

Reviewed by:	glebius
MFC after:	2 weeks
2014-04-14 20:34:48 +00:00
Bruce M Simpson
d565e5f206 In if_freemulti(), relax the paranoid KASSERT() on ifma->ifma_protospec.
This KASSERT() existed as a sanity check that upper layers in the network
stack (e.g. inet, inet6) had released their reference to the underlying
driver's multicast memberships (ifmultiaddr{}). However it assumes the
lifecycle of the driver membership corresponds to the lifecycle of the
network layer membership.

In the submitter's case, ieee80211_ioctl_updatemulti() attempts to
reprogram the (parent, physical) ifnet{} memberships in response
to a change in membership on the (child, virtual) VAP ifnet, using
a batched update mechanism. These updates happen independently from
the network layer, causing a "false negative" assertion failure.

There are possibly other use cases where this KASSERT() may be triggered
by other networking stack activity (e.g. where a nesting relationship
exists between multiple ifnet{} instances). This suggests that further
review of FreeBSD's approach to nested ifnet relationships is needed.

MFC after:	6 weeks
Submitted by:	adrian@
2014-04-10 18:43:02 +00:00
Michael Tuexen
7f946da063 Call sctp_addr_change() from rt_addrmsg() instead of rt_newaddrmsg_fib(),
since rt_addrmsg() gets also called from other functions.

MFC after: 3 days
2014-04-07 21:28:21 +00:00
Martin Matuska
7e92ce7380 De-virtualize UMA zone pf_mtag_z and move to global initialization part.
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

Reviewed by:	Nikos Vassiliadis, trociny@
2014-03-29 09:05:25 +00:00
Martin Matuska
1709ccf9d3 Merge head up to r263906. 2014-03-29 08:39:53 +00:00
Martin Matuska
d318d97fb5 Merge from projects/pf r251993 (glebius@):
De-vnet hash sizes and hash masks.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny

MFC after:	1 month
2014-03-25 06:55:53 +00:00
Navdeep Parhar
61b9f217c6 Add a shorter alias for if_data.ifi_oqdrops. 2014-03-20 02:23:52 +00:00
Julio Merino
b6d88aa39b Include strings.h so that bpf_filter.c can be built in userland.
This is to bring in a definition for bzero(3), which in turn allows the
tests in tools/regression/bpf/ to build again.
2014-03-19 13:10:25 +00:00