1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, hence for example read, write, ioctl and
so on might be sleeping at the time of "d_close" being called
and then then freed private data can still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)
2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.
3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.
Discussed with: phk
MFC after: 2 weeks
a da(4) instance going away while GEOM is still probing it.
In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.
While that event is queued, the da(4) instance goes away. When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.
The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up. This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.
scsi_cd.c,
scsi_da.c: In the register routine for the cd(4) and da(4)
routines, acquire a reference to the CAM peripheral
instance just before we call disk_create().
Use the new GEOM disk d_gone() callback to register
a callback (dadiskgonecb()/cddiskgonecb()) that
decrements the peripheral reference count once GEOM
has finished cleaning up its resources.
In the cd(4) driver, clean up open and close
behavior slightly. GEOM makes sure we only get one
open() and one close call, so there is no need to
set an open flag and decrement the reference count
if we are not the first open.
In the cd(4) driver, use cam_periph_release_locked()
in a couple of error scenarios to avoid extra mutex
calls.
geom.h: Add a new, optional, providergone callback that
is called when a provider is about to be deleted.
geom_disk.h: Add a new d_gone() callback to the GEOM disk
interface.
Bump the DISK_VERSION to version 2. This probably
should have been done after a couple of previous
changes, especially the addition of the d_getattr()
callback.
geom_disk.c: Add a providergone callback for the disk class,
g_disk_providergone(), that calls the user's
d_gone() callback if it exists.
Bump the DISK_VERSION to 2.
geom_subr.c: In g_destroy_provider(), call the providergone
callback if it has been provided.
In g_new_geomf(), propagate the class's
providergone callback to the new geom instance.
blkfront.c: Callers of disk_create() are supposed to pass in
DISK_VERSION, not an explicit disk API version
number. Update the blkfront driver to do that.
disk.9: Update the disk(9) man page to include information
on the new d_gone() callback, as well as the
previously added d_getattr() callback, d_descr
field, and HBA PCI ID fields.
MFC after: 5 days
- Consistently refer to rmlocks as "read-mostly locks".
- Relate rmlocks to rwlocks rather than sx locks since they are closer to
rwlocks.
- Add a separate paragraph on sleepable read-mostly locks contrasting them
with "normal" read-mostly locks.
- The flag passed to rm_init_flags() to enable recursion for readers is
RM_RECURSE, not LO_RECURSABLE.
- Fix the description for RM_RECURSE (it allows readers to recurse, not
writers).
- Explicitly note that rm_try_rlock() honors RM_RECURSE.
- Fix some minor grammar nits.
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock
Add several forgotten assertions (thanks to adrian@)
Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.
Approved by: kib(mentor)
Tested by: gnn
MFC in: 4 weeks
Function acquired reader lock if needed.
Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED)
- While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues
Reviewed by: glebius
Approved by: ae(mentor)
MFC after: 2 weeks
3rd argument of ifa->ifa_rtrequest is now ``rt_addrinfo *'' instead of
``sockaddr *''. While here, un-document RTM_RESOLVE cmd argument for
ifa_rtrequest() that became a stub after separating L2 tables in r186119.
MFC after: 1 week
- Document the following routines: pci_alloc_msi(), pci_alloc_msix(),
pci_find_cap(), pci_get_max_read_req(), pci_get_vpd_ident(),
pci_get_vpd_readonly(), pci_msi_count(), pci_msix_count(),
pci_pending_msix(), pci_release_msi(), pci_remap_msix(), and
pci_set_max_read_req().
- Group the functions into five sub-sections: raw configuration access,
locating devices, device information, device configuration, and
message signaled interrupts.
- Discourage use of pci_disable_io() and pci_enable_io() in device drivers.
The PCI bus driver handles this automatically as resources are activated.
MFC after: 2 weeks
not ACPI-specific at all, but deal with PCI power states. Also,
pci_set_powerstate() fails with EOPNOTSUPP if a request is made that the
underlying device does not support rather than falling back to somehow
setting D0.
pci_cfg_save() and pci_cfg_restore() for device drivers to use when
saving and restoring state (e.g. to handle device-specific resets).
Reviewed by: imp
MFC after: 2 weeks
The reasoning behind this, is that if we are consistent in our
documentation about the uint*_t stuff, people will be less tempted to
write new code that uses the non-standard types.
I am not going to bump the man page dates, as these changes can be
considered style nits. The meaning of the man pages is unaffected.
MFC after: 1 month
These functions take a `struct cdev *' -- not a dev_t. Inside the
kernel, dev_t has the same use as in userspace, namely to store a device
identifier.
MFC after: 2 weeks
I was considering adding it to libc as well, but last minute I thought
it would be good enough to add it to libkern exclusively. I forgot to
rename the man page and hook it up.
It seems two of the file system drivers we have in the tree, namely ufs
and ext3, use a function called `skpc()'. The meaning of this function
does not seem to be documented in FreeBSD, but it turns out one needs to
be a VAX programmer to understand what it does.
SPKC is an instruction on the VAX that does the opposite of memchr(). It
searches for the non-equal character. Add a new function called
memcchr() to the tree that has the following advantages over skpc():
- It has a name that makes more sense than skpc(). Just like strcspn()
matches the complement of strspn(), memcchr() is the complement of
memchr().
- It is faster than skpc(). Similar to our strlen() in libc, it compares
entire words, instead of single bytes. It seems that for this routine
this yields a sixfold performance increase on amd64.
- It has a man page.
objects created by shm_open(2) into the kernel's address space. This
provides a convenient way for creating shared memory buffers between
userland and the kernel without requiring custom character devices.
and DEVMETHOD() we can fully hide the explicit mention of kobj(9) from
device drivers.
- Update the example in driver.9 to use DEVMETHOD_END.
Submitted by: jhb
MFC after: 3 days
This enables locking consumers to pass their own structures around as const and
be able to assert locks embedded into those structures.
Reviewed by: ed, kib, jhb
curthread-accessing part of mtx_{,un}lock(9) when using a r210623-style
curthread implementation on sparc64, crashing the kernel in its early
cycles as PCPU isn't set up, yet (and can't be set up as OFW is one of the
things we need for that, which leads to a chicken-and-egg problem). What
happens is that due to the fact that the idea of r210623 actually is to
allow the compiler to cache invocations of curthread, it factors out
obtaining curthread needed for both mtx_lock(9) and mtx_unlock(9) to
before the branch based on kobj_mutex_inited when compiling the kernel
without the debugging options. So change kobj_class_compile_static(9)
to just never acquire kobj_mtx, effectively restricting it to its
documented use, and add a kobj_init_static(9) for initializing objects
using a class compiled with the former and that also avoids using mutex(9)
(and malloc(9)). Also assert in both of these functions that they are
used in their intended way only.
While at it, inline kobj_register_method() and kobj_unregister_method()
as there wasn't much point for factoring them out in the first place
and so that a reader of the code has to figure out the locking for
fewer functions missing a KOBJ_ASSERT.
Tested on powerpc{,64} by andreast.
Reviewed by: nwhitehorn (earlier version), jhb
MFC after: 3 days
document knlist_delete, and better document what knlist_clear does... Note
that both of these functions may sleep, and also unlock/relock the list
lock...
document knlist_init_mtx (forgotten by kib)...
other minor improvements
Reviewed by: ru (previous rev)
MFC after: 1 week
mod_cc.4 and mod_cc.9 respectively to avoid any possible confusion with the cc.1
gcc man page. Update references to these man pages where required.
Requested by: Grenville Armitage
Approved by: re (kib)
MFC after: 3 days
If it overflows before the taskqueue can run, the task will be
re-added to the taskqueue and cause a loop in the task list.
Reported by: Arnaud Lacombe <lacombar@gmail.com>
Submitted by: Ryan Stone <rysto32@gmail.com>
Reviewed by: jhb
Approved by: re (kib)
MFC after: 1 day
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.
That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks
to be assigned to a non-default FIB instance.
You may need to recompile world or ports due to the change of struct ifnet.
Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet
wrapper around rman_adjust_resource(). Include a generic implementation,
bus_generic_adjust_resource() which passes the request up to the parent
bus. There is currently no default implementation. A
bus_adjust_resource() wrapper is provided for use in drivers.
Specifically, these changes allow a resource to back a relocatable and
resizable resource such as the I/O window decoders in PCI-PCI bridges.
- rman_adjust_resource() can adjust the start and end address of an
existing resource. It only succeeds if the newly requested address
space is already free. It also supports shrinking a resource in
which case the freed space will be marked unallocated in the rman.
- rman_first_free_region() and rman_last_free_region() return the
start and end addresses for the first or last unallocated region in
an rman, respectively. This can be used to determine by how much
the resource backing an rman must be adjusted to accomodate an
allocation request that does not fit into the existing rman.
While here, document the rm_start and rm_end fields in struct rman,
rman_is_region_manager(), the bound argument to
rman_reserve_resource_bound(), and rman_init_from_resource().
While there, fix the type of the func argument of INIT_TASK macro,
and use the modern name of the analogous facility from Linux kernel.
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
FreeBSD" FreeBSD Foundation funded project.
- Add new man pages for the modular congestion control, Khelp and Hhook
frameworks (cc.4, cc.9, khelp.9 and hhook.9).
- Add new man pages for each available congestion control algorithm (cc_chd.4,
cc_cubic.4, cc_hd.4, cc_htcp.4, cc_newreno.4 and cc_vegas.4).
- Add a new man page for the Enhanced Round Trip Time (ERTT) Khelp module
(h_ertt.4).
- Update the TCP (tcp.4) man page to mention the TCP_CONGESTION socket option,
cross reference to cc.4 and remove references to the retired
"net.inet.tcp.newreno" sysctl MIB variable.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
MFC after: 3 months
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.
Inspired by: rwatson
may still return a non-zero value... You are not guaranteed to get a one
to one mapping between wakeup_one and zero return values...
Reviewed by: kib
MFC after: 3 days
reading. (This was already done for writing to a sysctl). This
requires all SYSCTL setups to specify a type. Most of them are now
checked at compile-time.
Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).
Succested by: bde
Although not directly related the FreeBSD Foundation funded "Five New TCP
Congestion Control Algorithms for FreeBSD" project, the understanding and
inspiration required to write this documentation was significantly bolstered
by the Foundation's support.
Reviewed by: pjd
MFC after: 1 week
Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.
PR: kern/80980
Discussed with: jhb
only happen if VOP_INACTIVATE() drops the vnode lock, which is quite
unreasonable behaviour for filesystem, and should not be mentioned
in the description of VFS primitives.
MFC after: 1 week
svn r147332 (by jeff): "Don't make vgonel() globally visible".
While here, specify the vnode locking scheme for vgone().
Discussed on: freebsd-hackers@
Approved by: kib (mentor)
MFC after: 10 days
It's a bit more pedantic regarding .Bl list elements. This has an added
benefit of unbreaking the ipfw(8) manpage, where groff was silently
skipping one list element.
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
- Add uma_zone_get_cur which returns the current approximate occupancy of
a zone. This is useful for providing stats via sysctl amongst other things.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
MFC after: 2 weeks
A new function prep_devname() sanitizes a device name by removing
leading and redundant sequential slashes. The function returns an error
for names which already exist or are considered invalid.
A new flag MAKEDEV_CHECKNAME for make_dev_p(9) and make_dev_credf(9)
indicates that the caller is prepared to handle an error related to the
device name. An invalid name triggers a panic if the flag is not
specified.
Document the MAKEDEV_CHECKNAME flag in the make_dev(9) manual page.
Idea from: kib
Reviewed by: kib
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
called when the sbuf internal buffer is filled. For kernel sbufs with a
drain, the internal buffer will never be expanded. For userland sbufs
with a drain, the internal buffer may still be expanded by
sbuf_[v]printf(3).
Sbufs now have three basic uses:
1) static string manipulation. Overflow is marked.
2) dynamic string manipulation. Overflow triggers string growth.
3) drained string manipulation. Overflow triggers draining.
In all cases the manipulation is 'safe' in that overflow is detected and
managed.
Reviewed by: phk (the previous version)
- add rm_try_rlock().
- add RM_SLEEPABLE to use sx(9) as the back-end lock in order to sleep while
holding the write lock.
- change rm_noreadtoken to a cpu bitmask to indicate which CPUs need to go
through the lock/unlock in order to synchronize. As a side effect, this
also avoids IPI to CPUs without any readers during rm_wlock.
Discussed with: ups@, rwatson@ on arch@
Sponsored by: Isilon Systems, Inc.
- chooseproc() is long gone, MLINK choosethread instead
- Update NAME section for choosethread
- Mark chooseproc.9 for removal
PR: 149549
Submitted by: pluknet
MFC after: 1 week
use-after-free over a longer time. Also release the backing pages of
a guarded allocation at free(9) time to reduce the overhead of using
memguard(9). Allow setting and varying the malloc type at run-time.
Add knobs to allow:
- randomly guarding memory
- adding un-backed KVA guard pages to detect underflow and overflow
- a lower limit on the size of allocations that are guarded
Reviewed by: alc
Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page)
Silence from: -arch
Approved by: zml (mentor)
MFC after: 1 month
taskqueues, more than one task can be running simultaneously.
Also make taskqueue_run(9) static to the file, since there are no
consumers in the base kernel and the function signature needs to change
with this fix.
Remove mention of taskqueue_run(9) and taskqueue_run_fast(9) from the
taskqueue(9) man page.
Reviewed by: jhb
Approved by: zml (mentor)
numbers. This change adds a new function alloc_unr_specific() which
returns the requested unit number if it is free. If the number is
already allocated or out of the range, -1 is returned.
Update alloc_unr(9) manual page accordingly and add a MLINK for
alloc_unr_specific(9).
Discussed on: freebsd-hackers
document one of the optional flags; clarify which of the flags are
optional (and which are not), and remove mention of a restriction on
the reclamation of cached pages that no longer holds since version 7.
MFC after: 1 week
than camelCase or TitleCase.
According to grep and my checked-out source tree, we're currently at
3733379 internal_underscores, 93024 camelCases, and 80831 TitleCases;
so this commit is merely documenting existing practice.
bottom of the manpages and order them consistently.
GNU groff doesn't care about the ordering, and doesn't even mention
CAVEATS and SECURITY CONSIDERATIONS as common sections and where to put
them.
Found by: mdocml lint run
Reviewed by: ru
things allows variable length messages to be easily supported.
- Extend KPI with alq_writen() and alq_getn() to support variable length
messages, which is enabled at ALQ creation time depending on the
arguments passed to alq_open(). Also add variants of alq_open() and
alq_post() that accept a flags argument. The KPI is still fully
backwards compatible and shouldn't require any change in ALQ consumers
unless they wish to utilise the new features.
- Introduce the ALQ_NOACTIVATE and ALQ_ORDERED flags to allow ALQ consumers
to have more control over IO scheduling and resource acquisition
respectively.
- Strengthen invariants checking.
- Document ALQ changes in ALQ(9) man page.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jeff, rpaulo, rwatson
MFC after: 1 month
Although groff_mdoc(7) gives another impression, this is the ordering
most widely used and also required by mdocml/mandoc.
Reviewed by: ru
Approved by: philip, ed (mentors)
In the current code, the locking is completely broken and may lead
easilly to deadlocks. Fix it by using the proc_mtx, linked to the
suspending thread, as lock for the operation. Keep using the
thread_lock for setting and reading the flag even if it is not entirely
necessary (atomic ops may do it as well, but this way the code is more
readable).
- Fix a deadlock within kthread_suspend().
The suspender should not sleep on a different channel wrt the suspended
thread, or, otherwise, the awaker should wakeup both. Uniform the
interface to what the kproc_* counterparts do (sleeping on the same
channel).
- Change the kthread_suspend_check() prototype.
kthread_suspend_check() always assumes curthread and must only refer to
it, so skip the thread pointer as it may be easilly mistaken.
If curthread is not a kthread, the system will panic.
In collabouration with: jhb
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
MFC: 2 weeks
if_initname().
- Document if_drv_flags and replace references to IFF_(RUNNING|OACTIVE)
with references to IFF_DRV_(RUNNING|OACTIVE).
- Complete truncated sentence in the description of if_transmit by copying
from the description in if_qflush.
- Add missing line breaks for translators.
Reviewed by: brooks (1)
MFC after: 3 days
<sys/proc.h>.
- Add RETURN VALUES and ERROR sections for namei()'s error return values.
- Add a missing link to NDHASGIANT.9.
PR: docs/142815, docs/142816
Submitted by: Lachlan Kang (1, 2)
MFC after: 3 days
While the name is pretentious, a good explanation of its targets is
reported in this 17 months old presentation e-mail:
http://lists.freebsd.org/pipermail/freebsd-arch/2008-August/008452.html
In order to implement it, the sq_type in sleepqueues is mandatory and not
only compiled along with INVARIANTS option. Additively, a new sleepqueue
function, sleepq_type() is added, returning the type of the sleepqueue
linked to a wchan.
Three new sysctls are added in order to configure the thread:
debug.deadlkres.slptime_threshold
debug.deadlkres.blktime_threshold
debug.deadlkres.sleepfreq
rappresenting the thresholds for sleep and block time that will lead to
a deadlock matching (when exceeded), while the sleepfreq rappresents the
number of seconds between 2 consecutive thread runnings.
In order to enable the deadlock resolver thread recompile your kernel
with the option DEADLKRES.
Reviewed by: jeff
Tested by: pho, Giovanni Trematerra
Sponsored by: Nokia Incorporated, Sandvine Incorporated
MFC after: 2 weeks
sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will
leave the waiters flag on forcing the owner to do a wakeup even when if
the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" (based on the wakeup algorithm) queue while the other
queue has real waiters on it, because nobody is going to wakeup the 2nd
queue waiters and they will sleep indefinitively.
A similar bug, is present, for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue
is not empty, the waiters won't progress after being awake but they will
just fail, still not taking care of the 2nd queue waiters (as instead the
lock owned doing the wakeup would expect).
In order to fix this bug in a cheap way (without adding too much locking
and complicating too much the semantic) add a sleepqueue interface which
does report the actual number of waiters on a specified queue of a
waitchannel (sleepq_sleepcnt()) and use it in order to determine if the
exclusive waiters (or shared waiters) are actually present on the lockmgr
(or sx) before to give them precedence in the wakeup algorithm.
This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to
cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters
a lockmgr has and if all the waiters on the exclusive waiters queue are
LK_SLEEPFAIL just wake both queues.
The sleepq_sleepcnt() introduction and ABI breakage require
__FreeBSD_version bumping.
Reported by: avg, kib, pho
Reviewed by: kib
Tested by: pho
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
the nexus(4) driver supply a custom bus_describe_intr method that invokes
a new intr_describe() MD routine which in turn looks up the associated
interrupt event and invokes intr_event_describe_handler().
Requested by: many
Reviewed by: scottl
MFC after: 2 weeks
the kthread_create(9) man page to the kproc(9) page as it had migrated and
people looking for it may need a hand to find its new name.
MFC after: 1 week
better semantics if a request to append an address range to an existing list
fails.
- When cloning an sglist, properly set the length in the new sglist instead of
leaving the new list empty.
- Properly compute the amount of data added to an sglist via
_sglist_append_buf(). This allows sglist_consume_uio() to properly update
uio_resid.
- When a request to append an address range to a scatter/gather list fails,
restore the sglist to the state it had at the start of the function call
instead of resetting it to an empty list.
Requested by: np (3)
Approved by: re (kib)
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.
Approved by: re (kib)
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
from going away, under INVARIANTS as this is a general problem
of the stack and should be solved in if.c/netisr but still good
to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.
Hook epair(4) up to the build.
Approved by: re (kib)
- update for getrlimit(2) manpage;
- support for setting RLIMIT_SWAP in login class;
- addition to the limits(1) and sh and csh limit-setting builtins;
- tuning(7) documentation on the sysctls controlling overcommit.
In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
Actually, as it did receive few tuning, the support is disabled by
default, but it can opt-in with the option ADAPTIVE_LOCKMGRS.
Due to the nature of lockmgrs, adaptive spinning needs to be
selectively enabled for any interested lockmgr.
The support is bi-directional, or, in other ways, it will work in both
cases if the lock is held in read or write way. In particular, the
read path is passible of further tunning using the sysctls
debug.lockmgr.retries and debug.lockmgr.loops . Ideally, such sysctls
should be axed or compiled out before release.
Addictionally note that adaptive spinning doesn't cope well with
LK_SLEEPFAIL. The reason is that many (and probabilly all) consumers
of LK_SLEEPFAIL are mainly interested in knowing if the interlock was
dropped or not in order to reacquire it and re-test initial conditions.
This directly interacts with adaptive spinning because lockmgr needs
to drop the interlock while spinning in order to avoid a deadlock
(further details in the comments inside the patch).
Final note: finding someone willing to help on tuning this with
relevant workloads would be either very important and appreciated.
Tested by: jeff, pho
Requested by: many
queue was drained. It will never fire for a directly dispatched packet.
You will most likely never want to use this for any ordinary netisr usage
and you will never blame netisr in case you try to use it and it does
not work as expected.
Reviewed by: rwatson
probe. The current device order is unchanged. This commit just adds the
infrastructure and ABI changes so that it is easier to merge later changes
into 8.x.
- Driver attachments now have an associated pass level. Attachments are
not allowed to probe or attach to drivers until the system-wide pass level
is >= the attachment's pass level. By default driver attachments use the
"last" pass level (BUS_PASS_DEFAULT). Driver's that wish to probe during
an earlier pass use EARLY_DRIVER_MODULE() instead of DRIVER_MODULE() which
accepts the pass level as an additional parameter.
- A new method BUS_NEW_PASS has been added to the bus interface. This
method is invoked when the system-wide pass level is changed to kick off
a rescan of the device tree so that drivers that have just been made
"eligible" can probe and attach.
- The bus_generic_new_pass() function provides a default implementation of
BUS_NEW_PASS(). It first allows drivers that were just made eligible for
this pass to identify new child devices. Then it propogates the rescan to
child devices that already have an attached driver by invoking their
BUS_NEW_PASS() method. It also reprobes devices without a driver.
- BUS_PROBE_NOMATCH() is only invoked for devices that do not have
an attached driver after being scanned during the final pass.
- The bus_set_pass() function is used during boot to raise the pass level.
Currently it is only called once during root_bus_configure() to raise
the pass level to BUS_PASS_DEFAULT. This has the effect of probing all
devices in a single pass identical to previous behavior.
Reviewed by: imp
Approved by: re (kib)
Each list describes a logical memory object that is backed by one or more
physical address ranges. To minimize locking, the sglist objects
themselves are immutable once they are shared.
These objects may be used in the future to facilitate I/O requests using
physically-addressed buffers. For the immediate future I plan to use them
to implement a new type of VM object and pager.
Reviewed by: jeff, scottl
MFC after: 1 month
permissions, such as VWRITE_ACL. For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.
Reviewed by: rwatson@
- Add rm_init_flags() and accept extended options only for that variation.
- Add a flags space specifically for rm_init_flags(), rather than borrowing
the lock_init() flag space.
- Define flag RM_RECURSE to use instead of LO_RECURSABLE.
- Define flag RM_NOWITNESS to allow an rmlock to be exempt from WITNESS
checking; this wasn't possible previously as rm_init() always passed
LO_WITNESS when initializing an rmlock's struct lock.
- Add RM_SYSINIT_FLAGS().
- Rename embedded mutex in rmlocks to make it more obvious what it is.
- Update consumers.
- Update man page.
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.
Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.
This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).
KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.
Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho
Add support for kernel fault injection using KFAIL_POINT_* macros and
fail_point_* infrastructure. Add example fail point in vfs_bio.c to
simulate VM buf pressure.
Approved by: dfr (mentor)