that don't support superpages. This keeps the number of spans and internal
fragmentation lower.
- When the user asks for alignment from vmem_xalloc adjust the imported size
by 2*align to be certain we can satisfy the allocation. This comes at
the expense of potential failures when the backend can't supply enough
memory but could supply the requested size and alignment.
Sponsored by: EMC / Isilon Storage Division
for a very long time, if ever.
Should such a functionality ever be needed again the appropriate and
much better way to do it is through a custom EXT_SOMETHING external mbuf
type together with a dedicated *ext_free function.
Discussed with: trociny, glebius
When an existing process is provided, the thread selected to use
to initialize the new thread could have exited and be reaped.
Acquire the proc lock earlier to ensure the thread remains valid.
Reviewed by: jhb, julian (previous version)
MFC after: 3 days
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver. The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't. The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot. The cdevsw
is shared among all driver instances.
So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.
sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it
with an SI_UNMAPPED cdev flag.
kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine
whether or not a particular driver can handle
unmapped I/O.
geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs.
Since GEOM will create a temporary mapping when
needed, setting SI_UNMAPPED unconditionally will
work.
Remove the D_UNMAPPED_IO flag.
nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here
if NVME_UNMAPPED_BIO_SUPPORT is enabled.
vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a
cdev instead of the D_UNMAPPED_IO flag on the cdevsw.
sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from
setting the D_UNMAPPED_IO flag in the cdevsw to setting
SI_UNMAPPED in the cdev.
Reviewed by: kib, jimharris
MFC after: 1 week
Sponsored by: Spectra Logic
the order that they arrive, to holding
(a) granted write lock requests, followed by
(b) granted read lock requests, followed by
(c) ungranted requests, in order of arrival.
This changes the stopping condition for iterating through granted locks to
see if a new request can be granted: When considering a read lock request,
we can stop iterating as soon as we see a read lock request, since anything
after that point is either a granted read lock request or a request which
has not yet been granted. (For write lock requests, we must still compare
against all granted lock requests.)
For workloads with R parallel reads and W parallel writes, this improves
the time spent from O((R+W)^2) to O(W*(R+W)); i.e., heavy parallel-read
workloads become significantly more scalable.
No statistically significant change in buildworld time has been measured,
but synthetic tests of parallel 'dd > /dev/null' and 'openssl enc >/dev/null'
with the input file cached yield dramatic (up to 10x) improvement with high
(up to 128 processes) levels of parallelism.
Reviewed by: kib
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.
There is no functional change.
MFC after: 2 weeks
- Set NOTE_TRACKERR before running filt_proc(). If the knote did not
have NOTE_FORK set in fflags when registered, then the TRACKERR event
could miss being posted.
- Don't pass the pid in to filt_proc() for NOTE_FORK events. The special
handling for pids is done knote_fork() directly and no longer in
filt_proc().
MFC after: 2 weeks
probes declared in a kernel module when that module is unloaded. In
particular,
* Unloading a module with active SDT probes will cause a panic. [1]
* A module's (FBT/SDT) probes aren't destroyed when the module is unloaded;
trying to use them after the fact will generally cause a panic.
This change fixes both problems by porting the DTrace module load/unload
handlers from illumos and registering them with the corresponding
EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all
probes defined in a module when that module is unloaded, and to prevent a
module unload from proceeding if some of its probes are active. The latter
problem has already been fixed for FBT probes by checking lf->nenabled in
kern_kldunload(), but moving the check into the DTrace framework generalizes
it to all kernel providers and also fixes a race in the current
implementation (since a probe may be activated between the check and the
call to linker_file_unload()).
Additionally, the SDT implementation has been reworked to define SDT
providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT
to create and destroy SDT probes when a module is loaded or unloaded. This
simplifies things quite a bit since it means that pretty much all of the SDT
code can live in sdt.ko, and since it becomes easier to integrate SDT with
the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible
in that SDT providers spanning multiple modules can be created on the fly
when a module is loaded; at the moment it looks like illumos' SDT
implementation requires all SDT probes to be statically defined in a single
kernel table.
PR: 166927, 166926, 166928
Reported by: davide [1]
Reviewed by: avg, trociny (earlier version)
MFC after: 1 month
called after the module has been loaded, and the unload handlers are called
before the module is unloaded. Moreover, the module unload handlers may
return an error to prevent the unload from proceeding.
Reviewed by: avg
MFC after: 2 weeks
is operational. init_sleepqueues() initializes 256 mutexes, which,
due to witness still being cold, started to overflow the pending_locks
array.
As stated in the reported panic message, increase WITNESS_PENDLIST
from 768 to 1024, which provides space for additional 256 locks.
Reported by: many
Tested by: rakuco, bdrewery
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid to pre-allocate
the KVA for such nodes.
In order to do so make the operations derived from vm_radix_insert()
to fail and handle all the deriving failure of those.
vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl
Now the MTX_RECURSE flag can be passed to the mtx_*_flag() calls.
This helps in cases we want to narrow down to specific calls the
possibility to recurse for some locks.
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff, alc
Tested by: pho
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.
Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag
The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.
Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl
match devices where the driver class was fixed but the unit number was
wildcarded. This better matches the documented behaviour in
DEVICE_PROBE(9).
Reviewed by: imp
if NOTE_EXIT is not being monitored. The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT. This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
knote without signalling it to userland and use this when a process
exits but the fflags in the knote is zero.
Reviewed by: jmg
MFC after: 1 month
transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.
Before this patch is reinserted we need to break this ordering.
Sponsored by: EMC / Isilon storage division
Reported by: kib
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues of few pages
Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).
After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page and the slow path
cannot however busy more than one page a time (to avoid deadlocks).
Fixing such primitive can bring to complete removal of the page hold
mechanism.
Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff
Tested by: pho
kern_sendfile() which is unnecessary.
The page is already wired so it will not be subjected to pagefault.
The content cannot be effectively protected as it is full of races
already.
Multiple accesses to the same indexes are serialized through vn_rdwr().
Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: pho
other than the one specified by the BOOTP server. This configures NFS
using the BOOTP protocol while also respecting other root-path options such
as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT
boot option. It allows you to override the root path provided by the
server, or to supply a root path when the server provides IP configuration
but no root path info.
This maintains the historical BOOTP_NFSROOT behavior of panicking on a
failure to mount the root path provided by the server, unless you've
provided an alternative via the ROOTDEVNAME kernel option or by setting
vfs.root.mountfrom. The behavior of panicking when given no other options
is preserved because it amounts to a bit of a retry loop that could
eventually recover from a transient network or server problem.
The user can now override the root path from loader(8) even if the
kernel is compiled with BOOTP_NFSROOT. If vfs.root.mountfrom is set in
the environment it is used unconditionally -- it always overrides the
BOOTP info. If it begins with [old]nfs: then the BOOTP code uses it
instead of the server-provided info. If it specifies some other
filesystem then the bootp code will not panic like it used to and the code
in vfs_mountroot.c will invoke the right filesystem to do the mount.
If the kernel is compiled with the ROOTDEVNAME option, then that name is
used by the BOOTP code if either
* The server doesn't provide a pathname.
* The boothowto flags include RB_DFLTROOT.
The latter allows the user to compile in alternate path in ROOTDEVNAME
such as ufs:/dev/da0s1a and boot from that path by setting
boot_dftlroot=1 in loader(8) or using the '-r' option in boot(8).
The one thing not provided here is automatic failover from a
server-provided path to a compiled-in one without the user manually
requesting that. The code just isn't currently structured in a way that
makes that possible with a lot of rewrite. I think the ability to set
vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server
doesn't provide a name covers the most common needs.
A set of patches submitted by Lars Eggert provided the part I couldn't
figure out by myself when I tried to do this last year; many thanks.
Reviewed by: rodrigc
must be destroyed, knlist_clear() and seldrain() calls could be
avoided, since vpollinfo was not used. More, the knlist_clear()
calling protocol requires the knlist locked, which is not true at the
call site.
Split the destruction into the helper destroy_vpollinfo_free(), and
call it when raced, instead of destroy_vpollinfo().
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
to those that are universally administered. While it is possible to
add locally administered MAC addresses, it's unclear whether those
are (expected) to be more unique than random multicast MAC addresses
or not.
With many U-Boot configurations assigning fixed and non-official MAC
addresses to ethernet ports and without setting the 'X' flag, this
change may have very little value in the embedded (development)
space. Uniqueness of the universally administered addresses is non-
existent on the (H/W) bench and questionable under the (S/W) desk.
In short: this change is aimed at production environments...
Also directly call swapper() at the end of mi_startup instead of
relying on swapper being the last thing in sysinits order.
Rationale:
- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be
invoked depending on its relative order with scheduler; this was not obvious
and the bug actually used to exist
Reviewed by: kib (ealier version)
MFC after: 14 days