Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.
Reviewed by: bz, julian
Approved by: re (kib)
Consider flag == 0 as the same of flag == R_NEXT. This change will restore
a historical behavior that has been changed by revision 190491, and has seen
to break exim.
Approved by: re (kib)
When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.
This change affects only options VIMAGE builds.
Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When "jail -c vnet" request fails, the current code actually creates and
leaves behind an orphaned vnet. This change ensures that such vnets get
released.
This change affects only options VIMAGE builds.
Submitted by: jamie
Discussed with: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Introduce a div_destroy() function which takes over per-vnet cleanup tasks
from the existing modevent / MOD_UNLOAD handler, and register div_destroy()
in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds. In
nooptions VIMAGE builds, div_destroy() will be invoked from the modevent
handler, resulting in effectively identical operation as it was prior this
change. div_destroy() also tears down hashtables used by ipdivert, which
were previously left behind on ipdivert kldunloads.
For options VIMAGE builds only, temporarily disable kldunloading of ipdivert,
because without introducing additional locking logic it is impossible to
atomically check whether all ipdivert instances in all vnets are idle, and
proceed with cleanup without opening a race window for a vnet to open an
ipdivert socket while ipdivert tear-down is in progress.
While here, staticize div_init(), because it is not used outside of
ip_divert.c.
In cooperation with: julian
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When registering a protocol to an existing protocol domain via
pf_proto_register(), iterate over all existing vnets to call protosw_init()
and thus the appropriate .pr_init() handler in the context of each vnet.
NB in the future we probably want to separate pr_init() handlers into
two, i.e. per-vnet and global, functions.
This change has no impact on nooptions VIMAGE builds.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Don't try to power down PHY when alc(4) failed to map the device.
This fixes system crash when mapping alc(4) device failed in device
attach.
Reported by: Jim < stapleton.41 <> gmail DOT com >
Approved by: re (kib)
Add RTL8168DP/RTL8111DP device id. While I'm here append "8111D" to
the description of RTL8168D as RL_HWREV_8168D can be either
RTL8168D or RTL8111D.
PR: kern/137672
Approved by: re (kib)
Our implementation of granpt(3) could be valid in the future.
When I wrote the pseudo-terminal driver for the MPSAFE TTY code, Robert
Watson and I agreed the best way to implement this, would be to let
posix_openpt() create a pseudo-terminal with proper permissions in place
and let grantpt() and unlockpt() be no-ops.
This isn't valid behaviour when looking at the spec. Because I thought
it was an elegant solution, I filed a bug report at the Austin Group
about this. In their last teleconference, they agreed on this subject.
This means that future revisions of POSIX may allow grantpt() and
unlockpt() to be no-ops if an open() on /dev/ptmx (if the implementation
has such a device) and posix_openpt() already do the right thing.
I'd rather put this in the manpage, because simply mentioning we don't
comply to any standard makes it look worse than it is. Right now we
don't, but at least we took care of it.
Approved by: re (kib)
In the loop through the list of interfaces in network6_interface_setup()
rtsol_interface gets reset to "yes" each time through the loop, but
rtsol_available does not. If a user has lo0 first in their list of
interfaces rtsol_available will get set to "no" the first time through
the loop and subsequent interfaces will not get rtsol'ed when they should.
Therefore change the conditional for the is_wired() test to _interface.
Approved by: re (kib)
Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.
Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.
Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].
In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].
These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).
PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
Approved by: re (kensmith)
Fix a few issues with the lib32 dist so that it includes ldd32.
- Use a better find invocation to purge empty directories from all the dist
trees during a release build. The previous version did not purge
directories whose contents were all empty directories.
- Explicitly blacklist a few files from the lib32 dist instead of using a
whitelist. A better longterm solution is to fix the few offenders to not
install data files during a lib32 install.
Approved by: re (kib)
Tweak the way that the ACPI and ISA bus drivers match hint devices to
BIOS-enumerated devices:
- Assume a device is a match if the memory and I/O ports match even if the
IRQ or DRQ is wrong or missing. Some BIOSes don't include an IRQ for
the atrtc device for example.
- Add a hack to better match floppy controller devices. Many BIOSes do not
include the starting port of the floppy controller listed in the hints
(0x3f0) in the resources for the device. So far, however, all the BIOS
variations encountered do include the 'port + 2' resource (0x3f2), so
adjust the matching for "fdc" devices to look for 'port + 2'.
Approved by: re (kib)
create stdin and stdout, don't blindly try to use stdin as a bi-directional
channel. Instead, detect the pipe and set up a special exec handler
that indirects write() calls through stdout.
This fixes the problem where ``set device "!ssh -e none host ppp
-direct label"'' no longer works with an openssh-5.2 server side as
that version of openssh ignores the USE_PIPES config setting and
*always* uses pipes (rather than socketpair) for stdin/stdout channels.
Approved by: re (kib)
The svnversion string is only relevant when newvers.sh is called
during the kernel build process, the other places that call the
script do not make use of that information. So restrict execution
of the svnversion-related code to the kernel build context.
Approved by: re (kib)
Move is_wired_interface() from rc.d/wpa_supplicant into network.subr,
simplify it a bit, and make use of that method to determine if an
interface is a candidate for IPv6 rtsol rather than listing all of the
possible wireless interfaces that should _not_ get rtsol'ed.
This change is only relevant for 8.0+ unless the "wlan mandatory" code
gets ported back to RELENG_7.
Approved by: re (kib)
Add a script to create the /var/db/mergemaster.mtree file for new
releases so that when users subsequently update their source trees
they can make use of mergemaster's -U option.
Approved by: re (kib)
Enable _DIRENT_HAVE_D_TYPE so wpa_cli scans directories properly
for it's unix domain socket. Before this change wpa_cli would take
the first file in the directory that was not "." or "..".
Approved by: re (rwatson)
Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.
This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0
Reviewed by: zec@, rwatson@, bz@(earlier version)
Approved by: re (rwatson)
Bugfix: all requests for creating vnets via vimage -c were always
reported as failures, even if the actual library / system call
would succeed, because error message would be reported if the return
value from jail_setv() call was >= 0, and if not, then if that same
value was < 0, i.e. always. The correct behavior is to abort (only)
if jail_setv() returns < 0.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Don't allow access to the internals until it has all been set up.
Specifically, not until the per-vnet parts have been set up.
Submitted by: kmacy@
Reviewed by: julian@, zec@
Approved by: re(rwatson)
This patch fixes two bugs in sglist(9) and improves robustness of the API via
better semantics if a request to append an address range to an existing list
fails.
- When cloning an sglist, properly set the length in the new sglist instead of
leaving the new list empty.
- Properly compute the amount of data added to an sglist via
_sglist_append_buf(). This allows sglist_consume_uio() to properly update
uio_resid.
- When a request to append an address range to a scatter/gather list fails,
restore the sglist to the state it had at the start of the function call
instead of resetting it to an empty list.
Approved by: re (kib)
Fix a boot hang for hptrr(4) caused by changes introduced in r195534.
It is necessary to make sure cpi->transport is set for xpt_scan_bus() to
work properly.
Submitted by: Bernhard Schmidt (scb+freebsd-current <at> techwires
<dot> net)
Reviewed by: scottl
Approved by: re (kib)
Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.
Tested by: bz
Approved by: re (kib)
vimage(8) is a legacy CLI interface for managing jails associated with
network stack instances, which is provided for compatibility with
older applications. This change brings it back to life in a followup
to the initial conversion of vimage to use the new jail(4)
userland-kernel API:
- when creating vimages via "vimage -c", by default turn on a few
options expected by legacy applications, such as allow operations on
raw sockets, FS mounts etc, and allow jail-related parameters to be
optionally configured.
- introduce the "-m" modifier which allows for configuring jail
parameters of existing vimages / vnet-jails.
- make "vimage name command ..." actually work.
- when reassigning ifnets to vnets using "vimage -i", attempt to rename
the ifnet as "ethXXX" on arrival in the target vnet. Several legacy
applications are known to depend heavily on such behavior.
- vimage -l lists only jails associated with vnets. The output is
sorted using vimage / jail names as keys.
- vimage -l by default searches only the current level in the jail
hierarchy. Recursive listing can be requested via -r switch.
- vimage -l by default prints only jail names on each line, making
such output suitable for pipelining to other commands. More verbose
output can be obtained via -v switch, and even more jail specific
information will be displayed if -j switch is turned on.
- there's no need to build vimage as statically linked, so update the
Makefile accordingly.
- update the vimage.8 man page.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Rather than fix questionable ifnet list locking in the implementation of
the kern.polling.enable sysctl, remove the sysctl. It has been deprecated
since FreeBSD 6 in favour of per-ifnet polling flags.
Reviewed by: luigi
Approved by: re (kib)
Change the 'resid' parameter to sglist_consume_uio() from an int to a
size_t to match the recent type change of the uio_resid member of struct
uio.
Approved by: re (kib)
This affects only fstat on zfs and devfs, only on 64-bit systems
and only when fsid is greater than 2^31 - 1.
When fstat examines a file via stat(2) it takes uint32_t st_dev
and assigns to (signed) (64-bit) long fsid, this results in
a positive value.
When fstat examines opened files it takes int32_t f_fsid.val[0]
and assigns to (signed) (64-bit) long fsid, this results in
a negative value.
So, while initially st_dev and f_fsid.val[0] have the same bit
values they get promoted to different 64-bit values because
of the signed-vs-unsigned difference.
A fix is to use "more natural" positive numbers by introducing
intermediate unsigned cast for f_fsid.val[0].
Reviewed by: jhb, lulf
Approved by: re (kib)
MFC after: 1 week (to stable/7)
Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when
CARP tries to free them using M_IFADDR after the last address for a virtual
host is removed and when detaching from the parent interface.
Approved by: re (kib), ken (mentor)
Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allow 'zpool import' to work again.
Reported by: Thomas Backman <serenity@exscape.org>
Reviewed by: kmacy
Approved by: re (kib)
Add support for backing up the old kernel when installing a new kernel
using freebsd-update. This applies to using freebsd-update in "upgrade
mode" and normal freebsd-update on a security branch.
The backup kernel will be written to /boot/kernel.old, if the directory
does not exist, or the directory was created by freebsd-update in a
previous backup. Otherwise freebsd-update will generate a new directory
name for use by the backup. By default symbol files are not backed up
to save diskspace and avoid filling up the root partition.
This feature is fully configurable in the freebsd-update config file,
but defaults to enabled.
Reviewed by: cperciva
Approved by: re (kib)
moving a frequently executed flowtable syslog statement from being
conditional on bootverbose to conditional on a per-vnet flowtable
sysctl.
Approved by: re@
Temporarily enhance em(4) and igb(4) hack to take account for IFF_NOARP.
Without this changeset there will be no way to prevent these NICs from
sending ARP, which is harmful in server farms that is configured as
"Direct Server Return" behind a load balancer.
A better fix would remove the whole hack completely but it would be
later than 8.0-RELEASE.
Reviewed by: jfv, yongari
Approved by: re (kib)
Explicitly line up the CPU state labels with the calculated starting column
that takes into account the width of the largest CPU ID. On systems with
> 10 CPUs the labels for the first 10 CPUs were not lined up properly
otherwise.
Approved by: re (kib)
Remove the dependency on the kernel -- in particular the gctl request to
the GEOM_BSD class -- to translate the absolute offsets in the label to
relative ones. This makes bslabel(8) work correctly with GEOM_PART and
also when the BSD label is nested under arbitrary partitioning schemes.
Inspired by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Approved by: re (kib)
Fix USB cache sync operations for platforms with non-coherent DMA.
- usb_pc_cpu_invalidate() is called between [consecutive] reads from a device,
so a sequence of BUS_DMASYNC_POSTREAD and _PREREAD should be used. Note we
cannot use or'ed shorthand ( _POSTREAD | _PREREAD) for BUS_DMASYNC flags, as
the low level bus dma sync operation is implementation dependent and we
cannot assume the required order of operations to be guaranteed.
- usb_pc_cpu_flush() is called before writing to a device, so
BUS_DMASYNC_PREWRITE should be used.
Submitted by: Grzegorz Bernacki
Reviewed by: HPS, arm@, usb@ ML
Tested by: HPS, Mike Tancsa
Approved by: re (kib)
Obtained from: Semihalf
Small changes to the warning message generated by pty(4):
- Only print the warning once, instead of filling up the screen.
- Use the word "legacy" for the pty_warningcnt description, to prevent
confusion.
- Use log() instead of printf().
Discussed with: rwatson, jhb
Approved by: re (kib)
- Make note of the update of tzcode from 2004a to 2009h
Add an extra alert that people who update via source or via
freebsd-update will have to run the tzsetup(8) utility.
Approved by: re (Kostik)
If we cannot immediately get the pf_consistency_lock in the purge thread,
restart the scan after acquiring the lock the hard way. Otherwise we
might end up with a dead reference.
Approved by: re (kib)