Fix kern_select() and sys_poll() so that they can handle sub-tick
precision for timeouts (in the same fashion it was done for nanosleep()
in r247797).
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
Specify that precision of 0.5s is enough for resource limitation.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
Specify that wakeup rate of 7.5-10Hz is enough for yarrow harvesting
thread.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
Specify that syslog doesn't need exactly 5 wakeups per second.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
kern_nanosleep() is now converted to use tsleep_sbt(). With this change
nanosleep() and usleep() can handle sub-tick precision for timeouts.
Also, try to help coalesce of events passing as argument to tsleep_bt()
a precision value calculated as a percentage of the sleep time.
This percentage is default 5%, but it can tuned according to users
need via the sysctl interface.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
- Switch syscons from timeout() to callout_reset_flags() and specify that
precision is not important there -- anything from 20 to 30Hz will be fine.
- Reduce syscons "refresh" rate to 1-2Hz when console is in graphics mode
and there is nothing to do except some polling for keyboard. Text mode
refresh would also be nice to have adaptive, but this change at least
should help laptop users who running X.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
As vm objects are type-stable there is no need to initialize the
resident splay tree pointer and the cache splay tree pointer in
_vm_object_allocate() but this could be done in the init UMA zone
handler.
The destructor UMA zone handler, will further check if the condition is
retained at every destruction and catch for bugs.
Sponsored by: EMC / Isilon storage division
Submitted by: alc
Introduce sbt variants of msleep(), msleep_spin(), pause(), tsleep() in
the KPI, allowing to specify timeout in 'sbintime_t' rather than ticks.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
Extend condvar(9) KPI introducing sbt variant of cv_timedwait. This
rely on the previously committed sleepq_set_timeout_sbt().
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
Convert sleepqueue(9) bits to the new callout KPI. Take advantage of
the possibility to run callback directly from hw interrupt context.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
Make loadavg calculation callout direct. There are several reasons for it:
- it is very simple and doesn't worth context switch to SWI;
- since SWI is no longer used here, we can remove twelve years old hack,
excluding this SWI from from the loadavg statistics;
- it fixes problem when eventtimer (HPET) shares interrupt with some other
device, and that interrupt thread counted as permanent loadavg of 1; now
loadavg accounted before that interrupt thread is scheduled.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, Fabian Keil, markj
precise time event generation. This greatly improves granularity of
callouts which are not anymore constrained to wait next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, as long as interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather informations about callwheel, introducing
a new sysctl to obtain stats.
This change breaks the KBI. struct callout fields has been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.
Together with: mav
Reviewed by: attilio, bde, luigi, phk
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm),
markj (amd64), mav, Fabian Keil
pointers to the file structure receiving descriptors stopped to work when also
at least few kilobytes of data is being send. In the kernel the
soreceive_generic() function doesn't see control mbuf as the first mbuf and
unp_externalize() is never called, first 6(?) kilobytes of data is missing as
well on receiving end.
This breaks for example tmux.
I don't know yet why going from 8 bytes to sizeof(struct filedescent) per
descriptor (or even to 16 bytes per descriptor) breaks things, but to
work-around it for now use 8 bytes per file descriptor at the cost of memory
allocation.
Reported by: flo, Diane Bruce, Jan Beich <jbeich@tormail.org>
Simple testcase provided by: mjg
necessary because we already use parenthesis in zfs_cv_init().
This fixes a long standing bug where there would be an extra ")" at the
end of the string. This extra parenthesis would show up in the WCHAN of
the process (top, stty status, etc.).
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);
which allow to bind and connect respectively to a UNIX domain socket with a
path relative to the directory associated with the given file descriptor 'fd'.
- Add manual pages for the new syscalls.
- Make the new syscalls available for processes in capability mode sandbox.
- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
the directory descriptor for the syscalls to work.
- Update audit(4) to support those two new syscalls and to handle path
in sockaddr_un structure relative to the given directory descriptor.
- Update procstat(1) to recognize the new capability rights.
- Document the new capability rights in cap_rights_limit(2).
Sponsored by: The FreeBSD Foundation
Discussed with: rwatson, jilles, kib, des
valid if the flag OBJ_COLORED is set. Since _vm_object_allocate()
doesn't set this flag, it needn't initialize pg_color.
Sponsored by: EMC / Isilon Storage Division
devices. While at it, update the comment now that we know that MSI-X
doesn't work with ESXi 5.1 for Intel 82576 either and the underlying issue
is a bug in the MSI-X allocation code of the hypervisor.
Reported by: Harald Schmalzbauer
- Make the nomatch table const.
MFC after: 1 week
from the tree since few months (please note that the userland bits
were already disconnected since a long time, thus there is no need
to update the OLD* entries).
This is not targeted for MFC.
Rename the pv_entry_t iterator from pv_list to pv_next.
Besides being more correct technically (as the name seems to suggest
this is a list while it is an iterator), it will also be needed by
vm_radix work to avoid a nameclash on macro expansions.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: flo, pho, jhb, davide
interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep
lock, which filters shouldn't do. It also seems that kern_reboot(9) still
may require Giant to be hold.
- Correct an incorrect argument to shutdown_nice(9).
Submitted by: bde
interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep
lock, which filters shouldn't do. It also seems that kern_reboot(9) still
may require Giant to be hold.
Submitted by: bde
Include some flags of the nullfs mount itself:
MNT_RDONLY, MNT_NOEXEC, MNT_NOSUID, MNT_UNION, MNT_NOSYMFOLLOW.
This allows userland code calling statfs() or fstatfs() to see these flags.
In particular, this allows opendir() to detect that a -t nullfs -o union
mount needs deduplication (otherwise at least . and .. are returned twice)
and allows rtld to detect a -t nullfs -o noexec mount as noexec.
Turn off the MNT_ROOTFS flag from the underlying filesystem because the
nullfs mount is definitely not the root filesystem.
Reviewed by: kib
MFC after: 1 week
on the target directory descriptor, but only if this is renameat(2) and real
target directory descriptor is given (not AT_FDCWD). Without this fix regular
rename(2) fails if the target file already exists.
Reported by: Michael Butler <imb@protected-networks.net>
Reported by: Larry Rosenman <ler@lerctr.org>
Sponsored by: The FreeBSD Foundation
It unfortunately steals a fair chunk of RAM at startup even if it's not
actively used, which prevents FreeBSD VMs of 128MB from successfully
booting and running.
other architectures [1].
While here:
- Remove an unused and commented out include.
- Add a comment describing the file that other copies have.
- Fix the style of the defines and add a comment on what each one is.
Suggested by: [1] alc