In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.
Conducted and reviewed by: attilio
Tested by: pho
Function acquired reader lock if needed.
Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED)
- While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues
Reviewed by: glebius
Approved by: ae(mentor)
MFC after: 2 weeks
- Document the following routines: pci_alloc_msi(), pci_alloc_msix(),
pci_find_cap(), pci_get_max_read_req(), pci_get_vpd_ident(),
pci_get_vpd_readonly(), pci_msi_count(), pci_msix_count(),
pci_pending_msix(), pci_release_msi(), pci_remap_msix(), and
pci_set_max_read_req().
- Group the functions into five sub-sections: raw configuration access,
locating devices, device information, device configuration, and
message signaled interrupts.
- Discourage use of pci_disable_io() and pci_enable_io() in device drivers.
The PCI bus driver handles this automatically as resources are activated.
MFC after: 2 weeks
pci_cfg_save() and pci_cfg_restore() for device drivers to use when
saving and restoring state (e.g. to handle device-specific resets).
Reviewed by: imp
MFC after: 2 weeks
I was considering adding it to libc as well, but last minute I thought
it would be good enough to add it to libkern exclusively. I forgot to
rename the man page and hook it up.
objects created by shm_open(2) into the kernel's address space. This
provides a convenient way for creating shared memory buffers between
userland and the kernel without requiring custom character devices.
curthread-accessing part of mtx_{,un}lock(9) when using a r210623-style
curthread implementation on sparc64, crashing the kernel in its early
cycles as PCPU isn't set up, yet (and can't be set up as OFW is one of the
things we need for that, which leads to a chicken-and-egg problem). What
happens is that due to the fact that the idea of r210623 actually is to
allow the compiler to cache invocations of curthread, it factors out
obtaining curthread needed for both mtx_lock(9) and mtx_unlock(9) to
before the branch based on kobj_mutex_inited when compiling the kernel
without the debugging options. So change kobj_class_compile_static(9)
to just never acquire kobj_mtx, effectively restricting it to its
documented use, and add a kobj_init_static(9) for initializing objects
using a class compiled with the former and that also avoids using mutex(9)
(and malloc(9)). Also assert in both of these functions that they are
used in their intended way only.
While at it, inline kobj_register_method() and kobj_unregister_method()
as there wasn't much point for factoring them out in the first place
and so that a reader of the code has to figure out the locking for
fewer functions missing a KOBJ_ASSERT.
Tested on powerpc{,64} by andreast.
Reviewed by: nwhitehorn (earlier version), jhb
MFC after: 3 days
mod_cc.4 and mod_cc.9 respectively to avoid any possible confusion with the cc.1
gcc man page. Update references to these man pages where required.
Requested by: Grenville Armitage
Approved by: re (kib)
MFC after: 3 days
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.
That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks
wrapper around rman_adjust_resource(). Include a generic implementation,
bus_generic_adjust_resource() which passes the request up to the parent
bus. There is currently no default implementation. A
bus_adjust_resource() wrapper is provided for use in drivers.
Specifically, these changes allow a resource to back a relocatable and
resizable resource such as the I/O window decoders in PCI-PCI bridges.
- rman_adjust_resource() can adjust the start and end address of an
existing resource. It only succeeds if the newly requested address
space is already free. It also supports shrinking a resource in
which case the freed space will be marked unallocated in the rman.
- rman_first_free_region() and rman_last_free_region() return the
start and end addresses for the first or last unallocated region in
an rman, respectively. This can be used to determine by how much
the resource backing an rman must be adjusted to accomodate an
allocation request that does not fit into the existing rman.
While here, document the rm_start and rm_end fields in struct rman,
rman_is_region_manager(), the bound argument to
rman_reserve_resource_bound(), and rman_init_from_resource().
FreeBSD" FreeBSD Foundation funded project.
- Add new man pages for the modular congestion control, Khelp and Hhook
frameworks (cc.4, cc.9, khelp.9 and hhook.9).
- Add new man pages for each available congestion control algorithm (cc_chd.4,
cc_cubic.4, cc_hd.4, cc_htcp.4, cc_newreno.4 and cc_vegas.4).
- Add a new man page for the Enhanced Round Trip Time (ERTT) Khelp module
(h_ertt.4).
- Update the TCP (tcp.4) man page to mention the TCP_CONGESTION socket option,
cross reference to cc.4 and remove references to the retired
"net.inet.tcp.newreno" sysctl MIB variable.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
MFC after: 3 months
reading. (This was already done for writing to a sysctl). This
requires all SYSCTL setups to specify a type. Most of them are now
checked at compile-time.
Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).
Succested by: bde
Although not directly related the FreeBSD Foundation funded "Five New TCP
Congestion Control Algorithms for FreeBSD" project, the understanding and
inspiration required to write this documentation was significantly bolstered
by the Foundation's support.
Reviewed by: pjd
MFC after: 1 week
svn r147332 (by jeff): "Don't make vgonel() globally visible".
While here, specify the vnode locking scheme for vgone().
Discussed on: freebsd-hackers@
Approved by: kib (mentor)
MFC after: 10 days
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
called when the sbuf internal buffer is filled. For kernel sbufs with a
drain, the internal buffer will never be expanded. For userland sbufs
with a drain, the internal buffer may still be expanded by
sbuf_[v]printf(3).
Sbufs now have three basic uses:
1) static string manipulation. Overflow is marked.
2) dynamic string manipulation. Overflow triggers string growth.
3) drained string manipulation. Overflow triggers draining.
In all cases the manipulation is 'safe' in that overflow is detected and
managed.
Reviewed by: phk (the previous version)
- add rm_try_rlock().
- add RM_SLEEPABLE to use sx(9) as the back-end lock in order to sleep while
holding the write lock.
- change rm_noreadtoken to a cpu bitmask to indicate which CPUs need to go
through the lock/unlock in order to synchronize. As a side effect, this
also avoids IPI to CPUs without any readers during rm_wlock.
Discussed with: ups@, rwatson@ on arch@
Sponsored by: Isilon Systems, Inc.
- chooseproc() is long gone, MLINK choosethread instead
- Update NAME section for choosethread
- Mark chooseproc.9 for removal
PR: 149549
Submitted by: pluknet
MFC after: 1 week
numbers. This change adds a new function alloc_unr_specific() which
returns the requested unit number if it is free. If the number is
already allocated or out of the range, -1 is returned.
Update alloc_unr(9) manual page accordingly and add a MLINK for
alloc_unr_specific(9).
Discussed on: freebsd-hackers
<sys/proc.h>.
- Add RETURN VALUES and ERROR sections for namei()'s error return values.
- Add a missing link to NDHASGIANT.9.
PR: docs/142815, docs/142816
Submitted by: Lachlan Kang (1, 2)
MFC after: 3 days
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
the nexus(4) driver supply a custom bus_describe_intr method that invokes
a new intr_describe() MD routine which in turn looks up the associated
interrupt event and invokes intr_event_describe_handler().
Requested by: many
Reviewed by: scottl
MFC after: 2 weeks
the kthread_create(9) man page to the kproc(9) page as it had migrated and
people looking for it may need a hand to find its new name.
MFC after: 1 week
probe. The current device order is unchanged. This commit just adds the
infrastructure and ABI changes so that it is easier to merge later changes
into 8.x.
- Driver attachments now have an associated pass level. Attachments are
not allowed to probe or attach to drivers until the system-wide pass level
is >= the attachment's pass level. By default driver attachments use the
"last" pass level (BUS_PASS_DEFAULT). Driver's that wish to probe during
an earlier pass use EARLY_DRIVER_MODULE() instead of DRIVER_MODULE() which
accepts the pass level as an additional parameter.
- A new method BUS_NEW_PASS has been added to the bus interface. This
method is invoked when the system-wide pass level is changed to kick off
a rescan of the device tree so that drivers that have just been made
"eligible" can probe and attach.
- The bus_generic_new_pass() function provides a default implementation of
BUS_NEW_PASS(). It first allows drivers that were just made eligible for
this pass to identify new child devices. Then it propogates the rescan to
child devices that already have an attached driver by invoking their
BUS_NEW_PASS() method. It also reprobes devices without a driver.
- BUS_PROBE_NOMATCH() is only invoked for devices that do not have
an attached driver after being scanned during the final pass.
- The bus_set_pass() function is used during boot to raise the pass level.
Currently it is only called once during root_bus_configure() to raise
the pass level to BUS_PASS_DEFAULT. This has the effect of probing all
devices in a single pass identical to previous behavior.
Reviewed by: imp
Approved by: re (kib)
Each list describes a logical memory object that is backed by one or more
physical address ranges. To minimize locking, the sglist objects
themselves are immutable once they are shared.
These objects may be used in the future to facilitate I/O requests using
physically-addressed buffers. For the immediate future I plan to use them
to implement a new type of VM object and pager.
Reviewed by: jeff, scottl
MFC after: 1 month
permissions, such as VWRITE_ACL. For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.
Reviewed by: rwatson@
Add support for kernel fault injection using KFAIL_POINT_* macros and
fail_point_* infrastructure. Add example fail point in vfs_bio.c to
simulate VM buf pressure.
Approved by: dfr (mentor)
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.
Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.
While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.
Approved by: bz (mentor)
- Document the minor(3), major(3) and makedev(3) macro's. They also
apply to umajor() and uminor() in the kernel, but hopefully we'll sort
that out one day.
- Briefly dev2unit() inside the make_dev(9) manual page, since this is
now the preferred macro to obtain character device unit numbers inside
the kernel.
- Remove the device_ids(9) manual page. It contains highly inaccurate
information, such as a description of the nonexistent major().
years by the priv_check(9) interface and just very few places are left.
Note that compatibility stub with older FreeBSD version
(all above the 8 limit though) are left in order to reduce diffs against
old versions. It is responsibility of the maintainers for any module, if
they think it is the case, to axe out such cases.
This patch breaks KPI so __FreeBSD_version will be bumped into a later
commit.
This patch needs to be credited 50-50 with rwatson@ as he found time to
explain me how the priv_check() works in detail and to review patches.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reviewed by: rwatson
linker interfaces for looking up function names and offsets from
instruction pointers. Create two variants of each call: one that is
"DDB-safe" and avoids locking in the linker, and one that is safe for
use in live kernels, by virtue of observing locking, and in particular
safe when kernel modules are being loaded and unloaded simultaneous to
their use. This will allow them to be used outside of debugging
contexts.
Modify two of three current stack(9) consumers to use the DDB-safe
interfaces, as they run in low-level debugging contexts, such as inside
lockmgr(9) and the kernel memory allocator.
Update man page.
This commit includes the following core components:
* sample configuration file for sensorsd
* rc(8) script and glue code for sensorsd(8)
* sysctl(3) doc fixes for CTL_HW tree
* sysctl(3) documentation for hardware sensors
* sysctl(8) documentation for hardware sensors
* support for the sensor structure for sysctl(8)
* rc.conf(5) documentation for starting sensorsd(8)
* sensor_attach(9) et al documentation
* /sys/kern/kern_sensors.c
o sensor_attach(9) API for drivers to register ksensors
o sensor_task_register(9) API for the update task
o sysctl(3) glue code
o hw.sensors shadow tree for sysctl(8) internal magic
* <sys/sensors.h>
* HW_SENSORS definition for <sys/sysctl.h>
* sensors display for systat(1), including documentation
* sensorsd(8) and all applicable documentation
The userland part of the framework is entirely source-code
compatible with OpenBSD 4.1, 4.2 and -current as of today.
All sensor readings can be viewed with `sysctl hw.sensors`,
monitored in semi-realtime with `systat -sensors` and also
logged with `sensorsd`.
Submitted by: Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by: syrinx
Tested by: many
OKed by: kensmith
Obtained from: OpenBSD (parts)
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to that rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.
A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.
Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.
The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.
The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.
The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.
MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd
event. Locking primitives that support this (mtx, rw, and sx) now each
include their own foo_sleep() routine.
- Rename msleep() to _sleep() and change it's 'struct mtx' object to a
'struct lock_object' pointer. _sleep() uses the recently added
lc_unlock() and lc_lock() function pointers for the lock class of the
specified lock to release the lock while the thread is suspended.
- Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks
(rw_sleep()), and sx locks (sx_sleep()). msleep() still exists and
is now identical to mtx_sleep(), but it is deprecated.
- Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
- Rewrite much of sleep.9 to not be msleep(9) centric.
- Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS'
section.
- Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will
warn if you try to pass a NULL wait channel. The functions already have
a KASSERT to that effect.
- Don't claim that the mutex is atomically reacquired when a cv_wait
routine returns. There's nothing atomic or magical about the lock
reacquire. The only magic is that we atomically drop the lock by
placing the thread on the sleep queue before dropping the lock.
- Markup sx_unlock() as a function rather than saying it is a macro.
The macro part is an implementation detail, and all the other sx_*lock()
functions are actually macros, too.
- Use the same style as rwlock(9) and mutex(9) to markup sx_assert() and
SX_SYSINIT() with respect to headers and kernel options.
- Add a missing MLINK.
want an equivalent of DELAY(9) that sleeps instead of spins. It accepts
a wmesg and a timeout and is not interrupted by signals. It uses a private
wait channel that should never be woken up by wakeup(9) or wakeup_one(9).
Glanced at by: phk
Approved by: gnn
Add a new function hashinit_flags() which allows NOT-waiting
for memory (or waiting). The old hashinit() function now
calls hashinit_flags(..., HASH_WAITOK);
revision 1.199
date: 2004/09/24 08:30:57; author: phk; state: Exp; lines: +0 -1
Remove the cdevsw() function which is now unused.
(the log is wrong, it was really devsw that was removed).
# we really need to actually document the functions in sys/conf.h as well
# as things like d_open...
privilege for threads and credentials. Unlike the existing suser(9)
interface, priv(9) exposes a named privilege identifier to the privilege
checking code, allowing more complex policies regarding the granting of
privilege to be expressed. Two interfaces are provided, replacing the
existing suser(9) interface:
suser(td) -> priv_check(td, priv)
suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags)
A comprehensive list of currently available kernel privileges may be
found in priv.h. New privileges are easily added as required, but the
comments on adding privileges found in priv.h and priv(9) should be read
before doing so.
The new privilege interface exposed sufficient information to the
privilege checking routine that it will now be possible for jail to
determine whether a particular privilege is granted in the check routine,
rather than relying on hints from the calling context via the
SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail
check function, prison_priv_check(), is exposed from kern_jail.c and used
by the privilege check routine to determine if the privilege is permitted
in jail. As a result, a centralized list of privileges permitted in jail
is now present in kern_jail.c.
The MAC Framework is now also able to instrument privilege checks, both
to deny privileges otherwise granted (mac_priv_check()), and to grant
privileges otherwise denied (mac_priv_grant()), permitting MAC Policy
modules to implement privilege models, as well as control a much broader
range of system behavior in order to constrain processes running with
root privilege.
The suser() and suser_cred() functions remain implemented, now in terms
of priv_check() and the PRIV_ROOT privilege, for use during the transition
and possibly continuing use by third party kernel modules that have not
been updated. The PRIV_DRIVER privilege exists to allow device drivers to
check privilege without adopting a more specific privilege identifier.
This change does not modify the actual security policy, rather, it
modifies the interface for privilege checks so changes to the security
policy become more feasible.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
pages existed only for the dynamic sysctl interfaces. There's probably
more complete and accurate content, better advice, etc, that could be added
here.
Per scottl's suggest, add a small piece of moralizing text regarding the
fact that sysctl names quickly get embedded in system configuration files,
libraries, third party applications, and even books, so renaming and
removing names after they've been published is a tricky issue.
MFC after: 1 month