Commit Graph

2376 Commits

Author SHA1 Message Date
Kip Macy
de4ab55e43 add an llentry to struct route{_in6} to allow it to be passed around with
the rtentry
2009-04-15 20:34:19 +00:00
Marko Zec
4d79e3d5e8 In the !VIMAGE_GLOBALS case, make sure not to call vnet_net_iattach()
both via the vnet_mod_register() framework and then directly, but only
once.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-15 18:15:29 +00:00
Kip Macy
427ac07f05 Extend route command:
- add show as alias for get
	- add weights to allow mpath to do more than equal cost
	- add sticky / nostick to disable / re-enable per-connection load balancing

This adds a field to rt_metrics_lite so network bits of world will need to be re-built.

Reviewed by:	jeli & qingli
2009-04-14 23:05:36 +00:00
Kip Macy
aee3056f64 call default if_qflush on ifq if default method isn't used by the driver 2009-04-14 03:17:44 +00:00
Kip Macy
3efea33724 Adapt buf_ring abstraction interface to allow consumers to interoperate with ALTQ 2009-04-14 00:27:59 +00:00
Robert Watson
86425c62a0 Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel.  This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 23:35:20 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
Max Laier
8623f9fd7a Follow up for r190895 It's not only the "all" group that is affected, but
all groups on the given interface.

PR:		kern/130977, kern/131310
MFC after:	3 days (%vnet)
2009-04-10 19:16:14 +00:00
Max Laier
876d5c038f Remove interfaces from IFG_ALL on detach. This cures a couple of pf panics
when using the "self" keyword in tables or as ()-style host address and
fixes "ifconfig -g all" output.

PR:		kern/130977, kern/131310
Submitted by:	Mikolaj Golub
MFC after:	3 days
2009-04-10 14:41:51 +00:00
Ed Schouten
156cfba72c Add parentheses to under-parenthesized macro.
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-07 19:35:20 +00:00
Marko Zec
1ed81b739e First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et. al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
Ed Schouten
d2a0bb0803 Remove if_ppp(4) and if_sl(4).
Not only did these two drivers depend on IFF_NEEDSGIANT, they were
broken 7 months ago during the MPSAFE TTY import. if_ppp(4) has been
replaced by ppp(8). There is no replacement for if_sl(4).

If we see regressions in for example the ports tree, we should just use
__FreeBSD_version 800045 to check whether if_ppp(4) and if_sl(4) are
present. Version 800045 is used to denote the import of MPSAFE TTY.

Discussed with: rwatson, but also rwatson's IFF_NEEDSGIANT emails on the
                lists.
2009-04-05 22:08:18 +00:00
Rui Paulo
dcf377edf1 Sync DLTs with latest libpcap version. 2009-04-02 13:02:12 +00:00
Sam Leffler
a51f44a7a6 enable setting the mac address of 802.11 devices 2009-03-28 17:36:56 +00:00
Jamie Gritton
bc3977f122 Call the interface's if_ioctl from ifioctl(), if the protocol didn't
handle the ioctl.  There are other paths that already call it, but this
allows for a non-interface socket (like AF_LOCAL which ifconfig now
uses) to use a broader class of interface ioctls.

Approved by:	bz (mentor), rwatson
2009-03-20 13:41:23 +00:00
Sean Farley
a5b1a8553c Remove the splimp()/splx() calls around the setting of the MTU. They are
no-op's that I inadvertently added.  Even if locking is needed in general
for the ioctl's, setting a single long will not need it due to the operation
being atomic.

Reported by:	rwatson
2009-03-17 02:32:36 +00:00
Robert Watson
5bb89930e4 Define and use two macros for loopback checksum offload:
LO_CSUM_FEATURES - a bitmask of supported transmit offload features, which
  will be stored in if_hwassist if IFCAP_TXCSUM is enabled, and be cleared
  from mbuf packet header csum flags on transmit. (1)

LO_CSUM_SET - a bitmask of supported receive offload features, which will
  be set on the mbuf packet header csum flags on transmit if IFCAP_RXCSUM
  is enabled.

While here, fix SCTP offload for loopback: offer generation on the
transmit side, don't just skip validation on the receive side.

Obtained from:  DragonflyBSD (1)
MFC after:      1 week
2009-03-16 10:56:50 +00:00
Robert Watson
fe89fae166 if_hwassist should be initialized with CSUM, rather than IFCAP, flags.
Submitted by:	yongari
MFC after:	1 week
2009-03-16 09:22:34 +00:00
Sean Farley
c035ab1c6f Add the SIOCSIFMTU ioctl handling directly to tap(4) permitting it to
have its MTU set higher than 1500 (ETHERMTU).  Its new limit is now
65535 as enforced by ifhwioctl() in if.c

This allows a tap(4) device to be added to a bridge, which requires all
interface members to have the same MTU, with an interface configured for
jumbo frames.  QEMU may now connect to a network via tap(4) without
requiring the real interface to have its MTU set to 1500 or lower.

Reviewed by:	rpaulo, bms
MFC after:	1 week
2009-03-16 03:11:02 +00:00
Robert Watson
3cb73e3d8b Teach the loopback interface about checksum generation and validation
avoidance:

- Enable setting the RXCSUM and TXCSUM flags for loopback interfaces;
  set both by default.
- When RXCSUM is set, flag packets sent over the loopback interface as
  having checked and valid IP, UDP, TCP checksums so that higher
  protocol layers won't check them.
- Always clear CSUM_{IP,UDP_TCP} checksum required flags on transmit,
  as they will have gotten there as a result of TXCSUM being set.

This is done only for packets explicitly sent over the loopback, not
simulated loopback via if_simloop() due to !SIMPLEX interfaces, etc.

Note that enabling TXCSUM but not RXCSUM will lead to unhappiness, as
checksums won't be generated but will be validated.

Kris reports that this leads to significant performance improvements
in loopback benchmarking with TCP and UDP for throughput:

	RXCSUM 	RXCSUM+TXCSUM
TCP	15%	37%
UDP	10%	74%

Update man page.

Reviewed by:	sam
Tested by:	kris
MFC after:	1 week
2009-03-15 20:17:44 +00:00
Robert Watson
e5adda3d51 Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free.  This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile.  They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

        if_ar
        if_axe
        if_aue
        if_cdce
        if_cue
        if_kue
        if_ray
        if_rue
        if_rum
        if_sr
        if_udav
        if_ural
        if_zyd

Drivers that were already disabled because of tty changes:

        if_ppp
        if_sl

Discussed on:	arch@
2009-03-15 14:21:05 +00:00
Sam Leffler
c0c9ea90a8 remove stray ; 2009-03-14 17:54:58 +00:00
Christian S.J. Peron
ffeeb9241a Disable zerocopy by default for now. It's causing some problems in pcap
consumers which fork after the shared pages have been setup.  pflogd(8)
is an example.  The problem is understood and there is a fix coming in
shortly.

Folks who want to continue using it can do so by setting

net.bpf.zerocopy_enable

to 1.

Discussed with:	rwatson
2009-03-10 14:28:19 +00:00
Robert Watson
e82669d99b When resetting a BPF descriptor, properly check that zero-copy buffers
are not currently owned by userspace before clearing or rotating them.

Otherwise we may not play by the rules of the shared memory protocol,
potentially corrupting packet data or causing userspace applications
that are playing by the rules to spin due to being notified that a
buffer is complete but the shared memory header not reflecting that.

This behavior was seen with pflogd by a number of reporters; note that
this fix is not sufficient to get pflogd properly working with
zero-copy BPF, due to pflogd opening the BPF device before forking,
leading to the shared memory buffer not being propery inherited in the
privilege-separated child.  We're still deciding how to fix that
problem.

This change exposes buffer-model specific strategy information in
reset_d(), which will be fixed at a later date once we've decided how
best to improve the BPF buffer abstraction.

Reviewed by:	csjp
Reported by:	keramida
2009-03-07 22:17:44 +00:00
Marius Strobl
c89c8a1029 On architectures with strict alignment requirements compensate
the misalignment of the IP header that prepending the EtherIP
header might have caused.

PR:		131921
MFC after:	1 week
2009-03-07 19:08:58 +00:00
Christian S.J. Peron
927094113e Mark the bpf stats sysctl as being mpsafe. We do not require
Giant here.
2009-03-07 17:07:29 +00:00
Robert Watson
784cd896fc Clarify some comments, fix some types, and rename ZBUF_FLAG_IMMUTABLE to
ZBUF_FLAG_ASSIGNED to make it clear why the buffer can't be written to:
it is assigned to userspace.
2009-03-07 10:21:37 +00:00
Bruce M Simpson
1e98b429fe Reserve a netisr slot for the IGMPv3 output queue. 2009-03-04 02:54:11 +00:00
Christian S.J. Peron
f32b457b6e Switch the default buffer mode in bpf(4) to zero-copy buffers.
Discussed with:	rwatson
2009-03-02 19:42:01 +00:00
Robert Watson
3055e123d7 Do a bit of struct ifnet cleanup in preparation for 8.0: group function
pointers together, move padding to the bottom of the structure, and add
two new integer spares due to attrition over time.  Remove unused spare
"flags" field, we can use one of the spare ints if we need it later.

This change requires a rebuild of device driver modules that depend on
the layout of ifnet for binary compatibility reasons.

Discussed with:	kmacy
2009-03-01 12:42:54 +00:00
Bjoern A. Zeeb
2bebb49117 Add size-guards evaluated at compile-time to the main struct vnet_*
which are not in a module of their own like gif.

Single kernel compiles and universe will fail if the size of the struct
changes. Th expected values are given in sys/vimage.h.
See the comments where how to handle this.

Requested by:	peter
2009-03-01 11:01:00 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
Luigi Rizzo
b27b816fc5 we need if_var.h not if.h 2009-02-16 15:10:03 +00:00
Luigi Rizzo
a1d4f19c0d remove unnecessary forward declaration 2009-02-16 15:09:37 +00:00
Robert Watson
d7bf9f090d IFF_NEEDSGIANT will no longer be supported, so remove compatibility code
from if_sppp framework for interfaces requiring Giant.
2009-02-16 10:29:03 +00:00
Luigi Rizzo
ada55ca0b7 remove unnecessary #include from vnet.h and vinet.h
Approved by:	Marko Zec
2009-02-15 00:28:28 +00:00
Andrew Thompson
66c84010b4 bridge_delete_member is called via the event handler from if_detach
after the LLADDR is reclaimed which causes a null pointer deref with
inherit_mac enabled. Record the ifnet pointer of the interface and then compare
that to find when to re-assign the bridge address.

Submitted by:	sam
2009-02-13 19:20:25 +00:00
Maxim Konovalov
96c8ef3a18 o In case of the error do not forget to deallocate a cloned device unit.
PR:		kern/131642
Submitted by:	Dmitrij Tejblum
MFC after:	1 week
2009-02-13 12:59:54 +00:00
Robert Watson
86c2bc49cc Remove unused ifaddr local variable in ioctl routine.
MFC after:	3 days
2009-02-13 00:01:11 +00:00
Jamie Gritton
9c79d2432e Call prison_if from rtm_get_jailed, instead of splitting it out into
prison_check_ip4 and prison_check_ip6.  As prison_if includes a jailed()
check, remove that check before calling rtm_get_jailed.

Approved by:	bz (mentor)
2009-02-05 14:58:16 +00:00
Jamie Gritton
b89e82dd87 Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise.  The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family.  For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes).  Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by:	bz (mentor)
2009-02-05 14:06:09 +00:00
Randall Stewart
2f4afd2125 Adds support for SCTP checksum offload. This means
we, like TCP and UDP, move the checksum calculation
into the IP routines when there is no hardware support
we call into the normal SCTP checksum routine.

The next round of SCTP updates will use
this functionality. Of course the IGB driver needs
a few updates to support the new intel controller set
that actually does SCTP csum offload too.

Reviewed by:	gnn, rwatson, kmacy
2009-02-03 11:00:43 +00:00
Bjoern A. Zeeb
2e730bea0a Like with r185713 make sure to not leak a lock as rtalloc1(9) returns
a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked
and rtfree(9)d rather than just rtfree(9)d.

Since the PR was filed, new places with the same problem were added
with new code.  Also check that the rt is valid before freeing it
either way there.

PR:		kern/129793
Submitted by:	Dheeraj Reddy <dheeraj@ece.gatech.edu>
MFC after:	2 weeks
Committed from:	Bugathon #6
2009-01-31 10:48:02 +00:00
Bjoern A. Zeeb
1cecba0fcd For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by:	jamie (as part of a larger patch)
MFC after:	1 week
2009-01-25 10:11:58 +00:00
John Baldwin
eb322a6f77 Only start the if_slowtimo timer (which drives the if_watchdog methods of
network interfaces) if we have at least one interface with an if_watchdog
routine.

MFC after:	2 weeks
2009-01-23 20:53:01 +00:00
Qing Li
82b334e80d The RTF_LLINFO was revived unconditionally, but within the kernel the
check on the sysctl argument value being RTF_LLINFO is conditioned on
the COMPAT_ROUTE_FLAGS kernel option. This mismatch caused the L2
table retrieval failure, and the arp/ndp -an command displays empty L2
tables.

Reviewed by:   pjd
2009-01-16 09:01:45 +00:00
Qing Li
14981d8057 Revive the RTF_LLINFO flag in route.h. The kernel code is guarded
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS
is defined.
2009-01-12 11:24:32 +00:00
Robert Watson
3dc85f8d63 Do invoke mac_ifnet_check_transmit() and mac_ifnet_create_mbuf()
in the loopback and synthetic loopback code so that packets are
access control checked and relabeled.  Previously, the MAC
Framework enforced that packets sent over the loopback weren't
relabeled, but this will allow policies to make explicit choices
about how and whether to relabel packets on the loopback.  Also,
for SIMPLEX devices, this produces more consistent behavior for
looped back packets to the local MAC address by labeling those
packets as coming from the interface.

Discussed with:	csjp
Obtained from:	TrustedBSD Project
2009-01-10 23:50:23 +00:00
Bjoern A. Zeeb
c2ded8aefb Rather than using the cred from curthread, take it from the thread
referenced in the sysctl req argument.

Reviewed by:	rwatson
MFC after:	2 weeks
2009-01-09 23:57:59 +00:00
Bjoern A. Zeeb
813dd6ae5e Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR:		kern/68189
Submitted by:	Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with:	rwatson [2]
Reviewed by:	rwatson
MFC after:	4 weeks
2009-01-09 21:57:49 +00:00