least significant cpuset_t word at the outmost right part of the string
(more far from the beginning of it). This follows the natural build of
bits rappresentation in the words.
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.
Reviewed by: kib
and destroy_devl() drops dev_mtx. The protection against the race
with dev_rel(), introduced in r163328, should be extended to cover
destroy_devl() calls for the children of the destroyed dev.
Reported and tested by: joerg
MFC after: 1 week
- Remove the following sysctl:
kern.sched.ipiwakeup.onecpu
kern.sched.ipiwakeup.htt2
Because they are absolutely obsolete. Probabilly the whole wakeup
forward mechanism should be revisited for a better fitting in modern
hw, in the future.
- As map2 variable is no longer used rename map3 to map2
- Fix a string by making more informative the msg and removing the
arguments passing.
Reviewed by: julian
Tested by: several
last CPU to to finish the rendezvous action may become visible to
different CPUs at different times. As a result, the CPU that initiated
the rendezvous may exit the rendezvous and drop the lock allowing another
rendezvous to be initiated on the same CPU or a different CPU. In that
case the exit sentinel may be cleared before all CPUs have noticed causing
those CPUs to hang forever.
Workaround this by using a generation count to notice when this race
occurs and to exit the rendezvous in that case.
The problem was independently diagnosted by mlaier@ and avg@ as well.
Submitted by: neel
Reviewed by: avg, mlaier
Obtained from: NetApp
MFC after: 1 week
is no relevant difference for sbufs, and it increases portability of
the source code.
Split the actual initialization of the sbuf into a separate local
function, so that certain static code checkers can understand
what sbuf_new() does, thus eliminating on silly annoyance of
MISRA compliance testing.
Contributed by: An anonymous company in the last business I
expected sbufs to invade.
choice of default size in the first place)
Reverse the order of arguments to the internal static sbuf_put_byte()
function to match everything else in this file.
Move sbuf_putc_func() inside the kernel version of sbuf_vprintf
where it belongs.
sbuf_putc() incorrectly used sbuf_putc_func() which supress NUL
characters, it should use sbuf_put_byte().
Make sbuf_finish() return -1 on error.
Minor stylistic nits fixed.
Now in the case when one-shot timers are used cyclic events should fire
closer to theier scheduled times. As the cyclic is currently used only
to drive DTrace profile provider, this is the area where the change
makes a difference.
Reviewed by: mav (earlier version, a while ago)
X-MFC after: clocksource/eventtimer subsystem
Xen timer and time counter to provide one-shot and periodic time events.
On my tests this reduces idle interruts rate down to about 30Hz, and accor-
ding to Xen VM Manager reduces host CPU load by three times comparing to
the previous periodic 100Hz clock. Also now, when needed, it is possible to
increase HZ rate without useless CPU burning during idle periods.
Now only ia64 and some ARMs left not migrated to the new event timers.
should not change. Fetch the td_user_pri under the thread lock. This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.
Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
file and processes information retrieval from the running kernel via sysctl
in the form of new library, libprocstat. The library also supports KVM backend
for analyzing memory crash dumps. Both procstat(1) and fstat(1) utilities have
been modified to take advantage of the library (as the bonus point the fstat(1)
utility no longer need superuser privileges to operate), and the procstat(1)
utility is now able to display information from memory dumps as well.
The newly introduced fuser(1) utility also uses this library and able to operate
via sysctl and kvm backends.
The library is by no means complete (e.g. KVM backend is missing vnode name
resolution routines, and there're no manpages for the library itself) so I
plan to improve it further. I'm commiting it so it will get wider exposure
and review.
We won't be able to MFC this work as it relies on changes in HEAD, which
was introduced some time ago, that break kernel ABI. OTOH we may be able
to merge the library with KVM backend if we really need it there.
Discussed with: rwatson
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).
Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.
The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN
while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.
Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primirally done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now
The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.
Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno
structure, which acts as a proxy between them. This makes jail rules
persistent, i.e. they can be added before jail gets created, and they
don't disappear when the jail gets destroyed.
kern.sched.ipiwakeup.onecpu
kern.sched.ipiwakeup.htt2
Because they are absolutely obsolete. Probabilly the whole wakeup
forward mechanism should be revisited for a better fitting in modern
hw.
- As map2 variable is no longer used rename map3 to map2
- Fix a string by making more informative the msg and removing the
arguments passing
Approved by: julian
wrapper around rman_adjust_resource(). Include a generic implementation,
bus_generic_adjust_resource() which passes the request up to the parent
bus. There is currently no default implementation. A
bus_adjust_resource() wrapper is provided for use in drivers.
Specifically, these changes allow a resource to back a relocatable and
resizable resource such as the I/O window decoders in PCI-PCI bridges.
- rman_adjust_resource() can adjust the start and end address of an
existing resource. It only succeeds if the newly requested address
space is already free. It also supports shrinking a resource in
which case the freed space will be marked unallocated in the rman.
- rman_first_free_region() and rman_last_free_region() return the
start and end addresses for the first or last unallocated region in
an rman, respectively. This can be used to determine by how much
the resource backing an rman must be adjusted to accomodate an
allocation request that does not fit into the existing rman.
While here, document the rm_start and rm_end fields in struct rman,
rman_is_region_manager(), the bound argument to
rman_reserve_resource_bound(), and rman_init_from_resource().
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
rm_end to allow managing the full range (0 to ~0ul) if they are not set by
the caller when rman_init() is called.
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.
I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).
Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks
bound to an AP before SMP has started, the system will panic when we try
to touch per-CPU state for that AP because that state has not been
initialized yet. Fix this in the same way as ULE: place all threads in
the global run queue before SMP has started.
Reviewed by: jhb
MFC after: 1 month
mechanism. The caller may specify a timeout in ticks after which the
task will be scheduled.
Sponsored by: The FreeBSD Foundation
Reviewed by: jeff, jhb
MFC after: 1 month
the mutexes in the wrong order for the case where the
MBF_MNTLSTLOCK is set. I believe this did have the
potential for deadlock. For example, if multiple nfsd threads
called vfs_busyfs(), which calls vfs_busy() with MBF_MNTLSTLOCK.
Thanks go to pho for catching this during his testing.
Tested by: pho
Submitted by: kib
MFC after: 2 weeks
vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is
no more need to add "noro" in vfs_donmount() to cancel "ro".
This also fixes a problem of canceling options beginning with "no".
For example, "noatime" didn't cancel "nonoatime". Thus it was possible
that both "noatime" and "nonoatime" were active at the same time.
Reviewed by: bde