3721 Commits

Author SHA1 Message Date
bz
7a1c0b1ad1 Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
kp
b06d3a64e7 pf: Filter on and set vlan PCP values
Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.

Reviewed by:    allanjude, araujo
Approved by:	re (gjb)
Obtained from:  OpenBSD (mostly)
Differential Revision:  https://reviews.freebsd.org/D6786
2016-06-17 18:21:55 +00:00
cem
2b749a1b92 iflib: Improve cleanup on iflib_queues_alloc error path
Fix some memory leaks.  Some may remain.

Reported by:	Coverity
Discussed with:	mmacy
CIDs:		1356036, 1356037, 1356038
Sponsored by:	EMC / Isilon Storage Division
2016-06-07 20:26:00 +00:00
cem
3b6ce85def iflib: Fix potential leak in iflib_if_transmit
Due to an accidental mismatch between allocation and release in the slow path
of iflib_if_transmit, if a caller passed 9-16 mbufs to the routine, the mbuf
array would be leaked.

Fix the mismatch by removing the magic numbers in favor of nitems() on the
stack array.  According to mmacy, this leak is unlikely.

Reported by:	Coverity
Discussed with:	mmacy
CID:		1356040
Sponsored by:	EMC / Isilon Storage Division
2016-06-07 19:49:08 +00:00
pfg
4da121b5b1 ng_mppc(4): Bring netgraph(3) MPPC compression support.
Support for compression has been available from July 2007 but it
was never imported due to concerns with patents once held by
STAC/HiFn. The issues have clearly been resolved so bring it
in now.

Special thanks to Brett Glass for preserving the code and
pointing documentation for the expiration case.

Obtained from:	mav (through Brett Glass)
Relnotes:	yes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6739
2016-06-07 15:07:00 +00:00
sephe
7acd138965 net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties
Reviewed by:	hps, erj, tuexen
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6688
2016-06-07 04:51:50 +00:00
bz
5365a749e6 After tearing down the interface per-"domain" bits, set the data area
to NULL to avoid it being mis-treated on a possible re-attach but also
to get a clean NULL pointer derefence in case of errors due to
unexpected race conditions elsewhere in the code, e.g., callouts.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 22:59:58 +00:00
bz
08c2ef1fa5 Similarly to r301505 protect the removal of the ifa from the if_addrhead
by a lock (as well as the check that the list is not empty).

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 16:23:02 +00:00
bz
28309d8186 In if_purgeaddrs() we cannot hold the lock over the entire loop
due to called functions (as in other parts of the stack, leave a comment).
Put around a lock the removal of the ifa from the list however to
reduce the possible race with other places.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 13:17:25 +00:00
bz
dc7aa8b621 SYSINIT functions do not return a value; switch to void, remove
the return value, and mark the unused argument __unused.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 13:01:57 +00:00
bz
8757b6342b Provide a public interface to rt_flushifroutes which takes the address
family as an argument as well.
This will be used to cleanup individual protocols during VNET teardown.

Obtained from:	projects/vnet
Sponsored by:	The FreeBSD Foundation
2016-06-06 12:49:47 +00:00
bz
4948c572e7 Make the KASSERT message more helpful by also printing the ifp information
which we are asserting.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 10:13:48 +00:00
araujo
af4ff984e7 Add support to priority code point (PCP) that is an 3-bit field
which refers to IEEE 802.1p class of service and maps to the frame
priority level.

Values in order of priority are: 1 (Background (lowest)),
0 (Best effort (default)), 2 (Excellent effort),
3 (Critical applications), 4 (Video, < 100ms latency),
5 (Video, < 10ms latency), 6 (Internetwork control) and
7 (Network control (highest)).

Example of usage:
root# ifconfig em0.1 create
root# ifconfig em0.1 vlanpcp 3

Note:
The review D801 includes the pf(4) part, but as discussed with kristof,
we won't commit the pf(4) bits for now.
The credits of the original code is from rwatson.

Differential Revision:	https://reviews.freebsd.org/D801
Reviewed by:	gnn, adrian, loos
Discussed with: rwatson, glebius, kristof
Tested by:	many including Matthew Grooms <mgrooms__shrew.net>
Obtained from:	pfSense
Relnotes:	Yes
2016-06-06 09:51:58 +00:00
bz
69cdb2137c Introduce a per-VNET flag to enable/disable netisr prcessing on that VNET.
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET drain the netisr queue from
packets for that VNET.

Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.

The change should be transparent for non-VIMAGE kernels.

Reviewed by:	gnn (, hiren)
Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6691
2016-06-03 13:57:10 +00:00
gnn
d75e0c471e This change re-adds L2 caching for TCP and UDP, as originally added in D4306
but removed due to other changes in the system. Restore the llentry pointer
to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as
appropriate.

Submitted by:	Mike Karels
Differential Revision:	https://reviews.freebsd.org/D6262
2016-06-02 17:51:29 +00:00
bz
fbe84e555e In if_attachdomain1() there does not seem to be any reason
to use TRYLOCK rather than just acquire the lock, so just do that.

Reviewed by:		markj
Obtained from:		projects/vnet
MFC after:		2 weeks
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6578
2016-05-28 08:32:15 +00:00
n_hibma
6a72802152 Change net.link.log_promisc_mode_change to a read-only tunable
PR:		166255
Submitted by:	eugen.grosbein.net
Obtained from:	hselasky
MFC after:	3 days
2016-05-25 09:00:05 +00:00
tuexen
d1b4674834 Allow an MTU of 65535 bytes to be set via TUN[SG]IFINFO. This requires
changing the type on the mtu field in struct tuninfo from short to
unsigned short.
This is used, for example, by packetdrill to test with MTUs up to the
maximum value.

Differential Revision:	6452
2016-05-24 11:47:14 +00:00
pfg
40226afa3d sys/net: more spelling. 2016-05-19 16:28:05 +00:00
tuexen
513d614cdf Allow writing IP packets of length TUNMRU no matter if TUNSIFHEAD is set
or not.
2016-05-19 13:52:12 +00:00
bz
47d341d05f Rather than having the if_vmove() code intermixed in the vnet_destroy()
function in vnet.c move it to if.c where it logically belongs and put
it under a VNET_SYSUNINIT() call.
To not change the current behaviour make sure it runs first thing
during teardown. In the future this will allow us more flexibility
on changing the order on when we want to get rid of interfaces.

Stop exporting if_vmove() and make it file static.

Reviewed by:		gnn
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6438
2016-05-18 20:06:45 +00:00
bz
cd56bdcd66 Add a "vnet_state" field to struct vnet.
This is set to the SI_SUB_* value before executing any VNET_SYSINIT
or VNET_SYSUNINT.  While good for debugging especially VNET teardown
problems having a chance to know at which level during teardown we are,
it will also be used to identify to detcted a "stable state"
(as in fully up and running) later on.

Obtained from:	projects/vnet
Sponsored by:	The FreeBSD Foundation
2016-05-18 15:50:52 +00:00
scottl
eb3b190ade Activate the NO_64BIT_ATOMICS code for mips and powerpc 2016-05-18 15:45:12 +00:00
scottl
4f6cb0738c Remove assertions that don't make sense for the data type. 2016-05-18 15:44:45 +00:00
bz
6a7d3963bc Add a dummy VNET_SYSINIT that will make sure all VNETs started will
always end on SI_SUB_VNET_DONE.

Obtained from:	projects/vnet
Sponsored by:	The FreeBSD Foundation
2016-05-18 15:25:19 +00:00
bz
bfac6710a1 Split 'show vnets' into 'show vnet' and 'show all vnets'.
While here adjust some db_printf format string.

Document the two show commands in ddb.4.

Sponsored by:	The FreeBSD Foundation
2016-05-18 14:43:17 +00:00
bz
74d8a16a48 Make compile without INET or without IP support in the kernel by hiding
variables and lro function calls behind approriate #ifdefs.

Also move the #includes for "opt_*" to the place where they should be.
2016-05-18 14:18:03 +00:00
scottl
596f2a146d Import the 'iflib' API library for network drivers. From the author:
"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional.  The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation.  ixl and ixgbe driver conversions will be committed
shortly.  We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by:   mmacy@nextbsd.org
Reviewed by:    gallatin
Differential Revision:  D5211
2016-05-18 04:35:58 +00:00
eadler
156fd4834a Don't repeat the the word 'the'
(one manual change to fix grammar)

Confirmed With: db
Approved by: secteam (not really, but this is a comment typo fix)
2016-05-17 12:52:31 +00:00
bz
393a88e32e Mark the unused arguments of various SYSINIT functions __unused.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-05-17 00:32:36 +00:00
truckman
4d98e41134 When handling SIOCSIFNAME ensure that the new interface name is NUL
terminated.  Reject the rename attempt if the name is too long.

MFC after:	1 week
2016-05-15 21:37:36 +00:00
jhb
bcc5b0c55d Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
n_hibma
34530d661f Allow silencing of 'promiscuous mode enabled/disabled' messages.
PR:		166255
Submitted by:	eugen.grosbein.net
Obtained from:	eugen.grosbein.net
MFC after:	1 week
2016-05-12 19:42:13 +00:00
asomers
09b44517ca Improve performance and functionality of the bitstring(3) api
Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow
for efficient searching of set or cleared bits starting from any bit offset
within the bit string.

Performance is improved by operating on longs instead of bytes and using
ffsl() for searches within a long. ffsl() is a compiler builtin in both
clang and gcc for most architectures, converting what was a brute force
while loop search into a couple of instructions.

All of the bitstring(3) API continues to be contained in the header file.
Some of the functions are large enough that perhaps they should be uninlined
and moved to a library, but that is beyond the scope of this commit.

sys/sys/bitstring.h:
        Convert the majority of the existing bit string implementation from
        macros to inline functions.

        Properly protect the implementation from inadvertant macro expansion
        when included in a user's program by prefixing all private
        macros/functions and local variables with '_'.

        Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and
        bit_ffc() in terms of their "at" counterparts.

        Provide a kernel implementation of bit_alloc(), making the full API
        usable in the kernel.

        Improve code documenation.

share/man/man3/bitstring.3:
        Add pre-exisiting API bit_ffc() to the synopsis.

        Document new APIs.

        Document the initialization state of the bit strings
        allocated/declared by bit_alloc() and bit_decl().

        Correct documentation for bitstr_size(). The original code comments
        indicate the size is in bytes, not "elements of bitstr_t". The new
        implementation follows this lead. Only hastd assumed "elements"
        rather than bytes and it has been corrected.

etc/mtree/BSD.tests.dist:
tests/sys/Makefile:
tests/sys/sys/Makefile:
tests/sys/sys/bitstring.c:
        Add tests for all existing and new functionality.

include/bitstring.h
	Include all headers needed by sys/bitstring.h

lib/libbluetooth/bluetooth.h:
usr.sbin/bluetooth/hccontrol/le.c:
        Include bitstring.h instead of sys/bitstring.h.

sbin/hastd/activemap.c:
        Correct usage of bitstr_size().

sys/dev/xen/blkback/blkback.c
        Use new bit_alloc.

sys/kern/subr_unit.c:
        Remove hard-coded assumption that sizeof(bitstr_t) is 1.  Get rid of
        unrb.busy, which caches the number of bits set in unrb.map.  When
        INVARIANTS are disabled, nothing needs to know that information.
        callapse_unr can be adapted to use bit_ffs and bit_ffc instead.
        Eliminating unrb.busy saves memory, simplifies the code, and
        provides a slight speedup when INVARIANTS are disabled.

sys/net/flowtable.c:
        Use the new kernel implementation of bit-alloc, instead of hacking
        the old libc-dependent macro.

sys/sys/param.h
        Update __FreeBSD_version to indicate availability of new API

Submitted by:   gibbs, asomers
Reviewed by:    gibbs, ngie
MFC after:      4 weeks
Sponsored by:   Spectra Logic Corp
Differential Revision:  https://reviews.freebsd.org/D6004
2016-05-04 22:34:11 +00:00
pfg
d9c9113377 sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
bz
a0e5ea962e Remove the most useful INET || INET6 check leftover from whenever,
doing nothing.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2016-05-03 16:01:53 +00:00
rrs
64e463c093 Complete the UDP tunneling of ICMP msgs to those protocols
interested in having tunneled UDP and finding out about the
ICMP (tested by Michael Tuexen with SCTP.. soon to be using
this feature).

Differential Revision:	http://reviews.freebsd.org/D5875
2016-04-28 15:53:10 +00:00
cem
5abde16df3 radix_mpath: Don't derefence a NULL pointer in for loop iteration
It seems rn_dupedkey may be NULL, because of the NULL check inside the loop.
(Also, the rt gets assigned from rn_dupedkey and NULL checked at top of loop.)
However, the for-loop update condition happens before the top-of-loop check and
dereferences 'rt' unconditionally.

Instead, NULL-check before dereferencing.

If rn_dupedkey cannot in fact be NULL, or something else protects this, feel
free to revert this and add an ASSERT of some kind instead.

This was introduced in r191080 (2009) and moved around slightly in r293657.

Reported by:	Coverity
CID:		1348482
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 20:27:17 +00:00
pfg
fc01419148 sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
pfg
729533413f sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
pfg
fc65edc1cd Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
bz
d0fd815b72 Add more fields from struct ifnet needed during debugging a kernel panic.
Move if_fib into the right place.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-04-20 21:04:39 +00:00
cem
1fdda1ad1e radix rn_inithead: Fix minor leak in low memory conditions
R_Zalloc is essentially a malloc(M_NOWAIT) wrapper.  It is possible that 'rnh'
failed to allocate, but 'rmh' succeeds.  In that case, we bail out of
rn_inithead() but previously did not free 'rmh'.

Introduced in r287073 (projects/routing) / MFP r294706.

Reported by:	Coverity
CID:		1350258
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 02:01:45 +00:00
cem
9904129a9f bpf_getdltlist: Don't overrun 'lst'
'lst' is allocated with 'n1' members.  'n' indexes 'lst'.  So 'n == n1' is an
invalid 'lst' index.  This is a follow-up to r296009.

Reported by:	Coverity
CID:		1352743
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 01:39:31 +00:00
pfg
a7d40a88c9 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
pfg
12232f8463 sys/net* : for pointers replace 0 with NULL.
Mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 17:30:33 +00:00
bz
146f54f44d During if_vmove() we call if_detach_internal() which in turn calls the event
handler notifying about interface departure and one of the consumers will
detach if_bpf.
There is no way for us to re-attach this easily as the DLT and hdrlen are
only given on interface creation.
Add a function to allow us to query the DLT and hdrlen from a current
BPF attachment and after if_attach_internal() manually re-add the if_bpf
attachment using these values.

Found by panics triggered by nd6 packets running past BPF_MTAP() with no
proper if_bpf pointer on the interface.

Also add a basic DDB show function to investigate the if_bpf attachment
of an interface.

Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5896
2016-04-11 10:00:38 +00:00
pfg
b63211eed5 Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
rpokala
b4cec75b57 Revert accidental submit of WIP as part of r297609
Pointyhat to:	rpokala
2016-04-06 04:58:20 +00:00
rpokala
1f8d2c0640 Storage Controller Interface driver - typo in unimplemented macro in
scic_sds_controller_registers.h

s/contoller/controller/

PR:		207336
Submitted by:	Tony Narlock <tony @ git-pull.com>
2016-04-06 04:50:28 +00:00