causing a panic if it is killed due to a unsolved stack overflow
seen very late during shutdown on sparc64 when the gmirror worker
process exists, which is a regression introduced in 8.0.
Reviewed by: kib
Approved by: re (rwatson)
Fix a LOR between allprison_lock and vnode locks by releasing
allprison_lock before releasing a prison's root vnode.
PR: kern/138004
Reviewed by: kib
Approved by: re (rwatson), bz (mentor)
Fix a few panics in linuxulator + VIMAGE due to curvnet not being set.
This change affects only options VIMAGE builds.
Reviewed by: julian
Approved by: re (rwatson)
Introduce a separate sx lock for protecting lists of vnet sysinit
and sysuninit handlers.
Previously, sx_vnet, which is a lock designated for protecting
the vnet list, was (ab)used for protecting vnet sysinit / sysuninit
handler lists as well. Holding exclusively the sx_vnet lock while
invoking sysinit and / or sysuninit handlers turned out to be
problematic, since some of the handlers may attempt to wake up
another thread and wait for it to walk over the vnet list, hence
acquire a shared lock on sx_vnet, which in turn leads to a deadlock.
Protecting vnet sysinit / sysuninit lists with a separate lock
mitigates this issue, which was first observed with
flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c.
Reviewed by: rwatson, jhb
MFC after: 3 days
Approved by: re (rwatson)
Prefix on-link verification is being performed on statically
configured prefixes. Since these statically configured prefixes
do not have any associated advertising routers, these prefixes
are treated as unreachable and those prefix routes are deleted
from the routing table. Therefore bypass prefixes that are not
learned from router advertisements during prefix on-link check.
Reviewed by: hrs
Approved by: re
In ip_output(), the flow-table module must not try to cache L2/L3
information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type.
Since the L2 information (rt_lle) is invalid for these interface
types, accidental caching attempt will trigger panic when the invalid
rt_lle reference is accessed.
When installing a new route, or when updating an existing route, the
user supplied gateway address may be an interface address (this is
particularly true for point-to-point interface related modules such
as ppp, if_tun, if_gif). Currently the routing command handler always
set the RTF_GATEWAY flag if the gateway address is given as part of the
command paramters. Therefore the gateway address must be verified against
interface addresses or else the route would be treated as an indirect
route, thus making that route unusable.
Reviewed by: kmacy, julian, rwatson
Approved by: re
Do not try to free the rt_lle entry of the cached route in
ip_output() if the cached route was not initialized from the
flow-table. The rt_lle entry is invalid unless it has been
initialized through the flow-table.
Reviewed by: kmacy, rwatson
Approved by: re
When multiple interfaces exist in the system, with each interface having
an IPv6 address assigned to it, and if an incoming packet received on
one interface has a packet destination address that belongs to another
interface, the routing table is consulted to determine how to reach this
packet destination. Since the packet destination is an interface address,
the route table will return a host route with the loopback interface as
rt_ifp. The input code must recognize this fact, instead of using the
loopback interface, the input code performs a search to find the right
interface that owns the given IPv6 address.
Reviewed by: bz, gnn, kmacy
Approved by: re
It is possible for all the kthreads to exit (hci modules unloaded) which in
turn ends our usb process. This means the proc pointer becomes invalid and will
panic if a new kthread is added. Count the number of threads and clear the proc
pointer on the last one.
Approved by: re (kib)
Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.
Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().
Reviewed by: bz
Approved by: re (kib)
Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.
Reviewed by: bz
Approved by: re (kib)
Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.
Reviewed by: bz, kmacy, qingli
Approved by: re (kib)
Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.
Approved by: re (kib)
Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.
Submitted by: pjd
Reviewed by: {krw,sthen,markus}@openbsd.org
Approved by: re (kib)
Rather than using IFNET_RLOCK() when iterating over (and modifying) the
ifnet list during if_ef load, directly acquire the ifnet_sxlock
exclusively. That way when if_alloc() recurses the lock, it's a write
recursion rather than a read->write recursion.
This code structure is arguably a bug, so add a comment indicating that
this is the case. Post-8.0, we should fix this, but this commit
resolves panic-on-load for if_ef.
Discussed with: bz, julian
Reported by: phk
Approved by: re (kib)
Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.
Reviewed by: bz, julian
Approved by: re (kib)
When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.
This change affects only options VIMAGE builds.
Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When "jail -c vnet" request fails, the current code actually creates and
leaves behind an orphaned vnet. This change ensures that such vnets get
released.
This change affects only options VIMAGE builds.
Submitted by: jamie
Discussed with: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Introduce a div_destroy() function which takes over per-vnet cleanup tasks
from the existing modevent / MOD_UNLOAD handler, and register div_destroy()
in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds. In
nooptions VIMAGE builds, div_destroy() will be invoked from the modevent
handler, resulting in effectively identical operation as it was prior this
change. div_destroy() also tears down hashtables used by ipdivert, which
were previously left behind on ipdivert kldunloads.
For options VIMAGE builds only, temporarily disable kldunloading of ipdivert,
because without introducing additional locking logic it is impossible to
atomically check whether all ipdivert instances in all vnets are idle, and
proceed with cleanup without opening a race window for a vnet to open an
ipdivert socket while ipdivert tear-down is in progress.
While here, staticize div_init(), because it is not used outside of
ip_divert.c.
In cooperation with: julian
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When registering a protocol to an existing protocol domain via
pf_proto_register(), iterate over all existing vnets to call protosw_init()
and thus the appropriate .pr_init() handler in the context of each vnet.
NB in the future we probably want to separate pr_init() handlers into
two, i.e. per-vnet and global, functions.
This change has no impact on nooptions VIMAGE builds.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Don't try to power down PHY when alc(4) failed to map the device.
This fixes system crash when mapping alc(4) device failed in device
attach.
Reported by: Jim < stapleton.41 <> gmail DOT com >
Approved by: re (kib)
Add RTL8168DP/RTL8111DP device id. While I'm here append "8111D" to
the description of RTL8168D as RL_HWREV_8168D can be either
RTL8168D or RTL8111D.
PR: kern/137672
Approved by: re (kib)
Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.
Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.
Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].
In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].
These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).
PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
Approved by: re (kensmith)
Tweak the way that the ACPI and ISA bus drivers match hint devices to
BIOS-enumerated devices:
- Assume a device is a match if the memory and I/O ports match even if the
IRQ or DRQ is wrong or missing. Some BIOSes don't include an IRQ for
the atrtc device for example.
- Add a hack to better match floppy controller devices. Many BIOSes do not
include the starting port of the floppy controller listed in the hints
(0x3f0) in the resources for the device. So far, however, all the BIOS
variations encountered do include the 'port + 2' resource (0x3f2), so
adjust the matching for "fdc" devices to look for 'port + 2'.
Approved by: re (kib)
The svnversion string is only relevant when newvers.sh is called
during the kernel build process, the other places that call the
script do not make use of that information. So restrict execution
of the svnversion-related code to the kernel build context.
Approved by: re (kib)
Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.
This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0
Reviewed by: zec@, rwatson@, bz@(earlier version)
Approved by: re (rwatson)
Don't allow access to the internals until it has all been set up.
Specifically, not until the per-vnet parts have been set up.
Submitted by: kmacy@
Reviewed by: julian@, zec@
Approved by: re(rwatson)
This patch fixes two bugs in sglist(9) and improves robustness of the API via
better semantics if a request to append an address range to an existing list
fails.
- When cloning an sglist, properly set the length in the new sglist instead of
leaving the new list empty.
- Properly compute the amount of data added to an sglist via
_sglist_append_buf(). This allows sglist_consume_uio() to properly update
uio_resid.
- When a request to append an address range to a scatter/gather list fails,
restore the sglist to the state it had at the start of the function call
instead of resetting it to an empty list.
Approved by: re (kib)
Fix a boot hang for hptrr(4) caused by changes introduced in r195534.
It is necessary to make sure cpi->transport is set for xpt_scan_bus() to
work properly.
Submitted by: Bernhard Schmidt (scb+freebsd-current <at> techwires
<dot> net)
Reviewed by: scottl
Approved by: re (kib)
Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.
Tested by: bz
Approved by: re (kib)
Rather than fix questionable ifnet list locking in the implementation of
the kern.polling.enable sysctl, remove the sysctl. It has been deprecated
since FreeBSD 6 in favour of per-ifnet polling flags.
Reviewed by: luigi
Approved by: re (kib)
Change the 'resid' parameter to sglist_consume_uio() from an int to a
size_t to match the recent type change of the uio_resid member of struct
uio.
Approved by: re (kib)
Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when
CARP tries to free them using M_IFADDR after the last address for a virtual
host is removed and when detaching from the parent interface.
Approved by: re (kib), ken (mentor)
Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allow 'zpool import' to work again.
Reported by: Thomas Backman <serenity@exscape.org>
Reviewed by: kmacy
Approved by: re (kib)
moving a frequently executed flowtable syslog statement from being
conditional on bootverbose to conditional on a per-vnet flowtable
sysctl.
Approved by: re@
Temporarily enhance em(4) and igb(4) hack to take account for IFF_NOARP.
Without this changeset there will be no way to prevent these NICs from
sending ARP, which is harmful in server farms that is configured as
"Direct Server Return" behind a load balancer.
A better fix would remove the whole hack completely but it would be
later than 8.0-RELEASE.
Reviewed by: jfv, yongari
Approved by: re (kib)
Fix USB cache sync operations for platforms with non-coherent DMA.
- usb_pc_cpu_invalidate() is called between [consecutive] reads from a device,
so a sequence of BUS_DMASYNC_POSTREAD and _PREREAD should be used. Note we
cannot use or'ed shorthand ( _POSTREAD | _PREREAD) for BUS_DMASYNC flags, as
the low level bus dma sync operation is implementation dependent and we
cannot assume the required order of operations to be guaranteed.
- usb_pc_cpu_flush() is called before writing to a device, so
BUS_DMASYNC_PREWRITE should be used.
Submitted by: Grzegorz Bernacki
Reviewed by: HPS, arm@, usb@ ML
Tested by: HPS, Mike Tancsa
Approved by: re (kib)
Obtained from: Semihalf
Small changes to the warning message generated by pty(4):
- Only print the warning once, instead of filling up the screen.
- Use the word "legacy" for the pty_warningcnt description, to prevent
confusion.
- Use log() instead of printf().
Discussed with: rwatson, jhb
Approved by: re (kib)