Introduce a separate sx lock for protecting lists of vnet sysinit
and sysuninit handlers.
Previously, sx_vnet, which is a lock designated for protecting
the vnet list, was (ab)used for protecting vnet sysinit / sysuninit
handler lists as well. Holding exclusively the sx_vnet lock while
invoking sysinit and / or sysuninit handlers turned out to be
problematic, since some of the handlers may attempt to wake up
another thread and wait for it to walk over the vnet list, hence
acquire a shared lock on sx_vnet, which in turn leads to a deadlock.
Protecting vnet sysinit / sysuninit lists with a separate lock
mitigates this issue, which was first observed with
flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c.
Reviewed by: rwatson, jhb
MFC after: 3 days
Approved by: re (rwatson)
Prefix on-link verification is being performed on statically
configured prefixes. Since these statically configured prefixes
do not have any associated advertising routers, these prefixes
are treated as unreachable and those prefix routes are deleted
from the routing table. Therefore bypass prefixes that are not
learned from router advertisements during prefix on-link check.
Reviewed by: hrs
Approved by: re
In ip_output(), the flow-table module must not try to cache L2/L3
information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type.
Since the L2 information (rt_lle) is invalid for these interface
types, accidental caching attempt will trigger panic when the invalid
rt_lle reference is accessed.
When installing a new route, or when updating an existing route, the
user supplied gateway address may be an interface address (this is
particularly true for point-to-point interface related modules such
as ppp, if_tun, if_gif). Currently the routing command handler always
set the RTF_GATEWAY flag if the gateway address is given as part of the
command paramters. Therefore the gateway address must be verified against
interface addresses or else the route would be treated as an indirect
route, thus making that route unusable.
Reviewed by: kmacy, julian, rwatson
Approved by: re
Do not try to free the rt_lle entry of the cached route in
ip_output() if the cached route was not initialized from the
flow-table. The rt_lle entry is invalid unless it has been
initialized through the flow-table.
Reviewed by: kmacy, rwatson
Approved by: re
When multiple interfaces exist in the system, with each interface having
an IPv6 address assigned to it, and if an incoming packet received on
one interface has a packet destination address that belongs to another
interface, the routing table is consulted to determine how to reach this
packet destination. Since the packet destination is an interface address,
the route table will return a host route with the loopback interface as
rt_ifp. The input code must recognize this fact, instead of using the
loopback interface, the input code performs a search to find the right
interface that owns the given IPv6 address.
Reviewed by: bz, gnn, kmacy
Approved by: re
Prior to the dire warning about values of network_interfaces other than
AUTO the biggest mistake users made was leaving lo0 off the list. Since
lo0 is effectively mandatory, check for it and add it to the list if
it's not there.
MFC 196523:
Improve the case test to detect the presence of lo0 in the list of
network_interfaces.
Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
Approved by: re (kib)
It is possible for all the kthreads to exit (hci modules unloaded) which in
turn ends our usb process. This means the proc pointer becomes invalid and will
panic if a new kthread is added. Count the number of threads and clear the proc
pointer on the last one.
Approved by: re (kib)
Merge DTLS fixes from vendor-crypto/openssl/dist:
- Fix memory consumption bug with "future epoch" DTLS records.
- Fix fragment handling memory leak.
- Do not access freed data structure.
- Fix DTLS fragment bug - out-of-sequence message handling which could
result in NULL pointer dereference in
dtls1_process_out_of_seq_message().
Note that this will not get FreeBSD Security Advisory as DTLS is
experimental in OpenSSL.
Security: CVE-2009-1377 CVE-2009-1378 CVE-2009-1379 CVE-2009-1387
Approved by: re (kib)
Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.
Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().
Reviewed by: bz
Approved by: re (kib)
Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.
Reviewed by: bz
Approved by: re (kib)
Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.
Reviewed by: bz, kmacy, qingli
Approved by: re (kib)
Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.
Approved by: re (kib)
Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.
Submitted by: pjd
Reviewed by: {krw,sthen,markus}@openbsd.org
Approved by: re (kib)
Rather than using IFNET_RLOCK() when iterating over (and modifying) the
ifnet list during if_ef load, directly acquire the ifnet_sxlock
exclusively. That way when if_alloc() recurses the lock, it's a write
recursion rather than a read->write recursion.
This code structure is arguably a bug, so add a comment indicating that
this is the case. Post-8.0, we should fix this, but this commit
resolves panic-on-load for if_ef.
Discussed with: bz, julian
Reported by: phk
Approved by: re (kib)
Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.
Reviewed by: bz, julian
Approved by: re (kib)
Consider flag == 0 as the same of flag == R_NEXT. This change will restore
a historical behavior that has been changed by revision 190491, and has seen
to break exim.
Approved by: re (kib)
When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.
This change affects only options VIMAGE builds.
Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When "jail -c vnet" request fails, the current code actually creates and
leaves behind an orphaned vnet. This change ensures that such vnets get
released.
This change affects only options VIMAGE builds.
Submitted by: jamie
Discussed with: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Introduce a div_destroy() function which takes over per-vnet cleanup tasks
from the existing modevent / MOD_UNLOAD handler, and register div_destroy()
in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds. In
nooptions VIMAGE builds, div_destroy() will be invoked from the modevent
handler, resulting in effectively identical operation as it was prior this
change. div_destroy() also tears down hashtables used by ipdivert, which
were previously left behind on ipdivert kldunloads.
For options VIMAGE builds only, temporarily disable kldunloading of ipdivert,
because without introducing additional locking logic it is impossible to
atomically check whether all ipdivert instances in all vnets are idle, and
proceed with cleanup without opening a race window for a vnet to open an
ipdivert socket while ipdivert tear-down is in progress.
While here, staticize div_init(), because it is not used outside of
ip_divert.c.
In cooperation with: julian
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When registering a protocol to an existing protocol domain via
pf_proto_register(), iterate over all existing vnets to call protosw_init()
and thus the appropriate .pr_init() handler in the context of each vnet.
NB in the future we probably want to separate pr_init() handlers into
two, i.e. per-vnet and global, functions.
This change has no impact on nooptions VIMAGE builds.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Don't try to power down PHY when alc(4) failed to map the device.
This fixes system crash when mapping alc(4) device failed in device
attach.
Reported by: Jim < stapleton.41 <> gmail DOT com >
Approved by: re (kib)
Add RTL8168DP/RTL8111DP device id. While I'm here append "8111D" to
the description of RTL8168D as RL_HWREV_8168D can be either
RTL8168D or RTL8111D.
PR: kern/137672
Approved by: re (kib)
Our implementation of granpt(3) could be valid in the future.
When I wrote the pseudo-terminal driver for the MPSAFE TTY code, Robert
Watson and I agreed the best way to implement this, would be to let
posix_openpt() create a pseudo-terminal with proper permissions in place
and let grantpt() and unlockpt() be no-ops.
This isn't valid behaviour when looking at the spec. Because I thought
it was an elegant solution, I filed a bug report at the Austin Group
about this. In their last teleconference, they agreed on this subject.
This means that future revisions of POSIX may allow grantpt() and
unlockpt() to be no-ops if an open() on /dev/ptmx (if the implementation
has such a device) and posix_openpt() already do the right thing.
I'd rather put this in the manpage, because simply mentioning we don't
comply to any standard makes it look worse than it is. Right now we
don't, but at least we took care of it.
Approved by: re (kib)
In the loop through the list of interfaces in network6_interface_setup()
rtsol_interface gets reset to "yes" each time through the loop, but
rtsol_available does not. If a user has lo0 first in their list of
interfaces rtsol_available will get set to "no" the first time through
the loop and subsequent interfaces will not get rtsol'ed when they should.
Therefore change the conditional for the is_wired() test to _interface.
Approved by: re (kib)
Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.
Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.
Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].
In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].
These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).
PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
Approved by: re (kensmith)
Fix a few issues with the lib32 dist so that it includes ldd32.
- Use a better find invocation to purge empty directories from all the dist
trees during a release build. The previous version did not purge
directories whose contents were all empty directories.
- Explicitly blacklist a few files from the lib32 dist instead of using a
whitelist. A better longterm solution is to fix the few offenders to not
install data files during a lib32 install.
Approved by: re (kib)
Tweak the way that the ACPI and ISA bus drivers match hint devices to
BIOS-enumerated devices:
- Assume a device is a match if the memory and I/O ports match even if the
IRQ or DRQ is wrong or missing. Some BIOSes don't include an IRQ for
the atrtc device for example.
- Add a hack to better match floppy controller devices. Many BIOSes do not
include the starting port of the floppy controller listed in the hints
(0x3f0) in the resources for the device. So far, however, all the BIOS
variations encountered do include the 'port + 2' resource (0x3f2), so
adjust the matching for "fdc" devices to look for 'port + 2'.
Approved by: re (kib)
create stdin and stdout, don't blindly try to use stdin as a bi-directional
channel. Instead, detect the pipe and set up a special exec handler
that indirects write() calls through stdout.
This fixes the problem where ``set device "!ssh -e none host ppp
-direct label"'' no longer works with an openssh-5.2 server side as
that version of openssh ignores the USE_PIPES config setting and
*always* uses pipes (rather than socketpair) for stdin/stdout channels.
Approved by: re (kib)
The svnversion string is only relevant when newvers.sh is called
during the kernel build process, the other places that call the
script do not make use of that information. So restrict execution
of the svnversion-related code to the kernel build context.
Approved by: re (kib)
Move is_wired_interface() from rc.d/wpa_supplicant into network.subr,
simplify it a bit, and make use of that method to determine if an
interface is a candidate for IPv6 rtsol rather than listing all of the
possible wireless interfaces that should _not_ get rtsol'ed.
This change is only relevant for 8.0+ unless the "wlan mandatory" code
gets ported back to RELENG_7.
Approved by: re (kib)
Add a script to create the /var/db/mergemaster.mtree file for new
releases so that when users subsequently update their source trees
they can make use of mergemaster's -U option.
Approved by: re (kib)
Enable _DIRENT_HAVE_D_TYPE so wpa_cli scans directories properly
for it's unix domain socket. Before this change wpa_cli would take
the first file in the directory that was not "." or "..".
Approved by: re (rwatson)
Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.
This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0
Reviewed by: zec@, rwatson@, bz@(earlier version)
Approved by: re (rwatson)
Bugfix: all requests for creating vnets via vimage -c were always
reported as failures, even if the actual library / system call
would succeed, because error message would be reported if the return
value from jail_setv() call was >= 0, and if not, then if that same
value was < 0, i.e. always. The correct behavior is to abort (only)
if jail_setv() returns < 0.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Don't allow access to the internals until it has all been set up.
Specifically, not until the per-vnet parts have been set up.
Submitted by: kmacy@
Reviewed by: julian@, zec@
Approved by: re(rwatson)
This patch fixes two bugs in sglist(9) and improves robustness of the API via
better semantics if a request to append an address range to an existing list
fails.
- When cloning an sglist, properly set the length in the new sglist instead of
leaving the new list empty.
- Properly compute the amount of data added to an sglist via
_sglist_append_buf(). This allows sglist_consume_uio() to properly update
uio_resid.
- When a request to append an address range to a scatter/gather list fails,
restore the sglist to the state it had at the start of the function call
instead of resetting it to an empty list.
Approved by: re (kib)
Fix a boot hang for hptrr(4) caused by changes introduced in r195534.
It is necessary to make sure cpi->transport is set for xpt_scan_bus() to
work properly.
Submitted by: Bernhard Schmidt (scb+freebsd-current <at> techwires
<dot> net)
Reviewed by: scottl
Approved by: re (kib)
Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.
Tested by: bz
Approved by: re (kib)
vimage(8) is a legacy CLI interface for managing jails associated with
network stack instances, which is provided for compatibility with
older applications. This change brings it back to life in a followup
to the initial conversion of vimage to use the new jail(4)
userland-kernel API:
- when creating vimages via "vimage -c", by default turn on a few
options expected by legacy applications, such as allow operations on
raw sockets, FS mounts etc, and allow jail-related parameters to be
optionally configured.
- introduce the "-m" modifier which allows for configuring jail
parameters of existing vimages / vnet-jails.
- make "vimage name command ..." actually work.
- when reassigning ifnets to vnets using "vimage -i", attempt to rename
the ifnet as "ethXXX" on arrival in the target vnet. Several legacy
applications are known to depend heavily on such behavior.
- vimage -l lists only jails associated with vnets. The output is
sorted using vimage / jail names as keys.
- vimage -l by default searches only the current level in the jail
hierarchy. Recursive listing can be requested via -r switch.
- vimage -l by default prints only jail names on each line, making
such output suitable for pipelining to other commands. More verbose
output can be obtained via -v switch, and even more jail specific
information will be displayed if -j switch is turned on.
- there's no need to build vimage as statically linked, so update the
Makefile accordingly.
- update the vimage.8 man page.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Rather than fix questionable ifnet list locking in the implementation of
the kern.polling.enable sysctl, remove the sysctl. It has been deprecated
since FreeBSD 6 in favour of per-ifnet polling flags.
Reviewed by: luigi
Approved by: re (kib)