Also directly call swapper() at the end of mi_startup instead of
relying on swapper being the last thing in sysinits order.
Rationale:
- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be
invoked depending on its relative order with scheduler; this was not obvious
and the bug actually used to exist
Reviewed by: kib (ealier version)
MFC after: 14 days
addresses added to the UUID generator using uuid_ether_add(). The
UUID generator keeps an arbitrary number of MAC addresses, under
the assumption that they are rarely removed (= uuid_ether_del()).
This achieves the following:
1. It brings up closer to having the network stack as a loadable
module.
2. It allows the UUID generator to filter MAC addresses for best
results (= highest chance of uniqeness).
3. MAC addresses can come from anywhere, irrespactive of whether
it's used for an interface or not.
A side-effect of the change is that when no MAC addresses have been
added, a random multicast MAC address is created once and re-used if
needed. Previusly, when a random MAC address was needed, it was
created for every call. Thus, a change in behaviour is introduced
for when no MAC addresses exist.
Obtained from: Juniper Networks, Inc.
for consumption outside the vfs_aio.c.
For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods,
also copy in the sigev_value, since librt event pumping loop compares
note generation number with the value passed through sigev_value.
Tested by: Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
vm_map_find() that will try to alter the alignment of a mapping to match
any existing superpage mappings of the object being mapped. If no
suitable address range is found with the necessary alignment,
vm_map_find() will fall back to using the simple first-fit strategy
(VMFS_ANY_SPACE).
- Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to
use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.
Reviewed by: alc (earlier version)
MFC after: 2 weeks
Now that r253351 moved sendfile() stats to a separate struct, the
last field used in mbstat is m_mcfail, which is updated, but never
read or obtained from userland.
Submitted by: adrian, zec
Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.
(1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
device_attach(). This fixes multiple VIMAGE related kernel panics
when trying to attach Bluetooth or USB Ethernet devices because
curthread->td_vnet is NULL.
(2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking
interfaces, especially USB Ethernet devices.
(3) Use VNET_DOMAIN_SET() in ng_btsocket.c
(4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics
when detaching Netgraph nodes.
error if any user wired mappings exist. Doing the invalidation
destroys the user wiring.
The change is the temporal measure to close the bug, the more proper
fix is to delegate the invalidation of the page to upper layers
always.
Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.
This change differs from r253224 in the following way:
o The vfs_mounted handler is called before mountcheckdirs() and with
newdp locked. vp is unlocked.
o The event handlers are declared in <sys/eventhandler.h> and not in
<sys/mount.h>. The <sys/mount.h> header is used in user land code
that pretends to be kernel code and as such creates a very convoluted
environment. It's hard to untangle.
Submitted by: stevek@juniper.net
Discussed with: pjd@
Obtained from: Juniper Networks, Inc.
vfs_busy(mp);
vfs_write_suspend(mp);
which are problematic if other thread starts unmount between two
calls. The unmount starts a write, while vfs_write_suspend() drain
writers. On the other hand, unmount drains busy references, causing
the deadlock.
Add a flag argument to vfs_write_suspend and require the callers of it
to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the
mount path, i.e. the covered vnode is not locked. The suspension is
not attempted if VS_SKIP_UNMOUNT is specified and unmount is in
progress.
Reported and tested by: Andreas Longwitz <longwitz@incore.de>
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
The distance between ticks and td_swvoltick should be calculated as
an unsigned number. Previously we could end up comparing a negative
number with hogticks in which case should_yield() would give incorrect
answer.
We should probably ensure that td_swvoltick is properly initialized.
Sponsored by: HybridCluster
MFC after: 5 days
/dev/kmem and /dev/mem (in addition to traditional file permission checks).
PRIV_KMEM_READ is different from other PRIV_* checks in that it's allowed
by default.
Reviewed by: kib, mckusick
in the ithread code where we could lose ithread interrupts if
intr_event_schedule_thread() is called while the ithread is already
running. Effectively memory writes could be ordered incorrectly
such that the unatomic write of 0 to ithd->it_need (in ithread_loop)
could be delayed until after it was set to be triggered again by
intr_event_schedule_thread().
This was particularly a big problem for CAM because CAM optimizes
scheduling of ithreads such that it only signals camisr() when it
queues to an empty queue. This means that additional completion
events would not unstick CAM and quickly lead to a complete lockup
of the CAM stack.
To fix this use atomics in all places where ithd->it_need is accessed.
Submitted by: delphij, mav
Obtained from: TrueOS, iXsystems
MFC After: 1 week
If n fds were passed, it would receive the first one n times.
Reported by: Shawn Webb <lattera@gmail.com>, koobs, gleb
Tested by: koobs, gleb
Reviewed by: pjd
Issues were noted by Bruce Evans and are present on all architectures.
On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.
On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot. The effect would be a counter going off by
arbitrary value after zeroing. Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive. On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.
PowerPC64 is treated the same as amd64.
For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching. It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm). If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.
Noted by: bde
Sponsored by: The FreeBSD Foundation
instead of allocating new one each time
All limits are set to RLIM_INFINITY which sould be ok (even though we
care only about RLIMT_FSIZE in this case).
MFC after: 1 week
- Reconnect with some minor modifications, in particular now selsocket()
internals are adapted to use sbintime units after recent'ish calloutng
switch.
The filedesc lock may not be dropped unconditionally before exporting
fd to sbuf: fd might go away during execution. While it is ok for
DTYPE_VNODE and DTYPE_FIFO because the export is from a vrefed vnode
here, for other types it is unsafe.
Instead, drop the lock in export_fd_to_sb(), after preparing data in
memory and before writing to sbuf.
Spotted by: mjg
Suggested by: kib
Review by: kib
MFC after: 1 week
deadlkres was using a reversed test to check whether ticks had rolled over.
This meant that deadlkres could only fire after ticks had rolled over.
This test was actually unnecessary as deadlkres only ever took the
difference of ticks values which is safe even in the presence of ticks
rollover. Remove the tests entirely. Now deadlkres will properly fire
after a lock has been held after the timeout period.
MFC after: 1 month
originally inspired by the Solaris vmem detailed in the proceedings
of usenix 2001. The NetBSD version was heavily refactored for bugs
and simplicity.
- Use this resource allocator to allocate the buffer and transient maps.
Buffer cache defrags are reduced by 25% when used by filesystems with
mixed block sizes. Ultimately this may permit dynamic buffer cache
sizing on low KVA machines.
Discussed with: alc, kib, attilio
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
bug where a PCI device would be powered down if it failed to probe, but
not when its driver was detached (e.g. via kldunload).
- Add a new helper method resource_list_release_active() which forcefully
releases any active resources of a specified type from a resource list.
- Add a bus_child_detached method for the PCI bus driver which forces any
active resources to be released (and whines to the console if it finds
any) and then powers the device down.
- Call pci_child_detached() if we fail to probe a device when a driver
is kldloaded. This isn't perfect but can avoid leaking resources
from a probe() routine in the kldload case.
Reviewed by: imp, brooks
MFC after: 1 month
- Call lock_init() first before setting any lock_object fields in
lock init routines. This way if the machine panics due to a duplicate
init the lock's original state is preserved.
- Somewhat similarly, don't decrement td_locks and td_slocks until after
an unlock operation has completed successfully.
provided by Isilon.
- Add an rm_assert() supporting various lock assertions similar to other
locking primitives. Because rmlocks track readers the assertions are
always fully accurate unlike rw_assert() and sx_assert().
- Flesh out the lock class methods for rmlocks to support sleeping via
condvars and rm_sleep() (but only while holding write locks), rmlock
details in 'show lock' in DDB, and the lc_owner method used by
dtrace.
- Add an internal destroyed cookie so that API functions can assert
that an rmlock is not destroyed.
- Make use of rm_assert() to add various assertions to the API (e.g.
to assert locks are held when an unlock routine is called).
- Give RM_SLEEPABLE locks their own lock class and always use the
rmlock's own lock_object with WITNESS.
- Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping
while holding a read lock on an rmlock.
Submitted by: andre
Obtained from: EMC/Isilon
mbuf that was fully consumed by the previous call, the mbuf ptr returned by the
current call ends up being the previous mbuf in the sb chain to the one that
contains the data we want.
This does not cause any observable issues because the mbuf copy routines happily
walk the mbuf chain to get to the data at the moff offset, which in this case
means they effectively skip over the mbuf returned by sbsndptr().
We can't adjust sb->sb_sndptr during the previous call for this case because the
next mbuf in the chain may not exist yet. We therefore need to detect the
condition and make the adjustment during the current call.
Fix by detecting the special case of moff being at the start of the next mbuf in
the chain and adjust the required accounting variables accordingly.
Reviewed by: andre
MFC after: 2 weeks
star for me. EVENTHANDLER_DEREGISTER() attempts to acquire the lock which is
held by the event handler framework while executing event handler functions,
leading to deadlock.
Move EVENTHANDLER_DEREGISTER() to alq_load_handler() and thus deregister the ALQ
shutdown_pre_sync handler at module unload time, which takes care of the
originally reported panic and fixes the deadlock introduced in r250951.
Reported by: Luiz Otavio O Souza
MFC after: 3 days
X-MFC with: 250951