to unconditionally set PG_REFERENCED on a page before sleeping. In many
cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
the page daemon, before the caller of vm_page_sleep() is reawakened.
Instead, we now explicitly set PG_REFERENCED in those cases where having
the page persist until the caller is awakened is clearly desirable. Note,
however, that setting PG_REFERENCED on the page is still only a hint,
and not a guarantee that the page should persist.
function ruxagg_tlock().
Convert the definition of kern_getrusage() to ANSI C.
Submitted by: Alexander Krizhanovsky <ak natsys-lab com>
MFC after: 1 week
taskqueue_drain(9) will not correctly detect whether a task is
currently running. The check is against a field in the taskqueue
struct, but for a threaded queue with more than one thread, multiple
threads can simultaneously be running a task, thus stomping over the
tq_running field.
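The stomping described above can be sketched in userland C. This is only
an illustration, not the committed fix: every sk_* name is hypothetical,
and the per-thread slot array is just one possible way to avoid the
problem. Thread interleaving is written out sequentially so the outcome
is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NTHREADS	2	/* hypothetical: worker threads on one queue */

struct sk_task { int id; };

/* Single shared slot, as in the description above: last writer wins. */
struct sk_queue_single { const struct sk_task *running; };

/* Alternative sketch (an assumption): one slot per worker thread. */
struct sk_queue_multi { const struct sk_task *running[SK_NTHREADS]; };

bool
sk_multi_is_running(const struct sk_queue_multi *q, const struct sk_task *t)
{
	for (int i = 0; i < SK_NTHREADS; i++)
		if (q->running[i] == t)
			return (true);
	return (false);
}

/* Returns true if a drain-style check on the single slot misses task a. */
bool
sk_single_slot_misses(void)
{
	struct sk_task a = { 1 }, b = { 2 };
	struct sk_queue_single q = { NULL };

	q.running = &a;		/* thread 0 starts task a */
	q.running = &b;		/* thread 1 starts task b, stomping the slot */
	q.running = NULL;	/* thread 1 finishes b; a is still running */
	return (q.running != &a);
}

/* Returns true if per-thread slots still report task a as running. */
bool
sk_per_thread_slots_see(void)
{
	struct sk_task a = { 1 }, b = { 2 };
	struct sk_queue_multi q = { { NULL, NULL } };

	q.running[0] = &a;	/* thread 0 starts task a */
	q.running[1] = &b;	/* thread 1 starts task b in its own slot */
	q.running[1] = NULL;	/* thread 1 finishes b */
	return (sk_multi_is_running(&q, &a));
}
```

Both helpers return true: the single slot loses track of the task that is
still running, while the per-thread slots do not.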
Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by: jhb
Approved by: dfr (mentor)
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
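A minimal sketch of the hashed-lock idea, assuming 4 KiB pages, a
64-entry lock array and a plain flag standing in for a mutex. All sk_*
names and both constants are hypothetical; the granularity and hash used
in the tree may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SK_LOCK_COUNT	64	/* assumption: size of the lock array */

struct sk_lock { int held; };	/* stand-in for a real mutex */
struct sk_page { uint64_t phys_addr; int hold_count; };

static struct sk_lock sk_page_locks[SK_LOCK_COUNT];

/* Map a physical address to the index of the lock that covers it. */
size_t
sk_lock_index(uint64_t pa)
{
	return ((size_t)((pa >> SK_PAGE_SHIFT) % SK_LOCK_COUNT));
}

/* Bump hold_count under the page's hashed lock, not a global one. */
void
sk_page_hold(struct sk_page *m)
{
	struct sk_lock *l = &sk_page_locks[sk_lock_index(m->phys_addr)];

	assert(!l->held);
	l->held = 1;		/* stands in for taking the lock */
	m->hold_count++;
	l->held = 0;		/* stands in for dropping the lock */
}
```

Pages that hash to different slots no longer contend with each other,
which is the point of moving hold_count out from under a single queue
mutex.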
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
This is done in kern_ntptime, perhaps not the best place.
This is done using resettodr().
Some features:
- make save period configurable via tunable and sysctl
- period of zero disables saving, setting a non-zero period re-enables
it or reschedules it
- do saving only if system clock is ntp-synchronized
- save on shutdown
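The feature list above can be sketched as a small state machine. This is
a userland illustration with hypothetical sk_* names and integer
"seconds"; a counter stands in for the call to resettodr(). Applying the
synchronization check at shutdown as well is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

struct sk_saver {
	int	period;		/* 0 means saving is disabled */
	bool	scheduled;
	int	next_due;
	int	saves;		/* counts stand-in resettodr() calls */
};

/* Tunable/sysctl handler: zero disables, non-zero (re)schedules. */
void
sk_saver_set_period(struct sk_saver *s, int now, int period)
{
	s->period = period;
	s->scheduled = (period != 0);
	if (s->scheduled)
		s->next_due = now + period;
}

/* Periodic callout: save only when due and the clock is synchronized. */
void
sk_saver_tick(struct sk_saver *s, int now, bool ntp_synced)
{
	if (!s->scheduled || now < s->next_due)
		return;
	if (ntp_synced)
		s->saves++;
	s->next_due = now + s->period;
}

/* Shutdown hook; the sync check here is an assumption. */
void
sk_saver_shutdown(struct sk_saver *s, bool ntp_synced)
{
	if (ntp_synced)
		s->saves++;
}
```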
Discussed with: des, Peter Jeremy <peterjeremy@acm.org>
X-Maybe: save time near seconds boundary for better precision
MFC after: 2 weeks
things allows variable length messages to be easily supported.
- Extend KPI with alq_writen() and alq_getn() to support variable length
messages, which is enabled at ALQ creation time depending on the
arguments passed to alq_open(). Also add variants of alq_open() and
alq_post() that accept a flags argument. The KPI is still fully
backwards compatible and shouldn't require any change in ALQ consumers
unless they wish to utilise the new features.
- Introduce the ALQ_NOACTIVATE and ALQ_ORDERED flags to allow ALQ consumers
to have more control over IO scheduling and resource acquisition
respectively.
- Strengthen invariants checking.
- Document ALQ changes in ALQ(9) man page.
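To illustrate what variable length messages mean for a circular buffer,
here is a minimal userland sketch. It is not the ALQ implementation:
sk_vbuf, sk_vbuf_writen() and sk_vbuf_flush() are hypothetical names, the
16-byte size is arbitrary, and "flushing" simply discards the data
instead of writing it to a vnode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_BUF_SIZE	16	/* arbitrary size for the sketch */

struct sk_vbuf {
	unsigned char	data[SK_BUF_SIZE];
	size_t		head;	/* next write position */
	size_t		used;	/* bytes not yet flushed */
};

/* Append len bytes, wrapping at the end; -1 if they do not fit. */
int
sk_vbuf_writen(struct sk_vbuf *b, const void *msg, size_t len)
{
	size_t first;

	if (len == 0 || len > SK_BUF_SIZE - b->used)
		return (-1);
	first = SK_BUF_SIZE - b->head;
	if (first > len)
		first = len;
	memcpy(b->data + b->head, msg, first);
	memcpy(b->data, (const unsigned char *)msg + first, len - first);
	b->head = (b->head + len) % SK_BUF_SIZE;
	b->used += len;
	return (0);
}

/* Pretend to write out everything pending; returns the byte count. */
size_t
sk_vbuf_flush(struct sk_vbuf *b)
{
	size_t n = b->used;

	b->used = 0;
	return (n);
}
```

Because messages are tracked by byte count and not by fixed-size slots, a
message may straddle the end of the buffer, which is the case the second
memcpy() handles.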
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jeff, rpaulo, rwatson
MFC after: 1 month
brings in support for an optional intent log which eliminates the need
for background fsck on unclean shutdown.
Sponsored by: iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
mostly work on 64bit host.
The work is based on an original patch submitted by emaste, obtained
from Sandvine's source tree.
Reviewed by: jhb
MFC after: 1 week
Keep the interrupts disabled in order to avoid preemption problems.
Reported by: tinderbox, b.f. <bf1783 at googlemail dot com>
MFC: 2 weeks
X-MFC: r206878
Assert this.
In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporarily unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.
Verify that dvp is not reclaimed before calling cache_enter().
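The check can be sketched as follows. This is a userland model with
hypothetical sk_* names: a boolean stands in for the vnode's reclaimed
state, a counter stands in for cache_enter(), and returning ENOENT for a
reclaimed directory is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct sk_vnode {
	bool	doomed;		/* set once the vnode is reclaimed */
	int	cache_entries;	/* stand-in for name cache entries */
};

/*
 * Model of a ".." lookup that must drop and retake the directory lock.
 * reclaimed_while_unlocked simulates another thread reclaiming dvp in
 * the unlocked window.
 */
int
sk_lookup_dotdot(struct sk_vnode *dvp, bool reclaimed_while_unlocked)
{
	/* ... dvp unlocked here, parent looked up, dvp relocked ... */
	if (reclaimed_while_unlocked)
		dvp->doomed = true;

	/* Recheck after relocking: never cache against a reclaimed dvp. */
	if (dvp->doomed)
		return (ENOENT);
	dvp->cache_entries++;	/* stand-in for cache_enter() */
	return (0);
}
```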
Reported and tested by: pho
Reviewed by: kan
MFC after: 2 weeks
When performing an smp_rendezvous() or, more likely on amd64 and i386,
an smp_tlb_shootdown(), the caller will end up with the smp_ipi_mtx
spinlock held, busy-waiting for other CPUs to acknowledge the operation.
As long as CPUs are suspended (via cpu_reset()) between the active mask
read and IPI sending there can be a deadlock where the caller will wait
forever for a dead CPU to acknowledge the operation.
Please note that on CPU0 that is going to be somewhat heavier because of
the spinlocks being disabled earlier than quitting the machine.
Fix this bug by calling cpu_reset() with the smp_ipi_mtx held.
Note that it is very likely that a saner offline/online CPUs mechanism
will help heavily in fixing similar cases as it is likely more bugs
of this type may arise in the future.
Reported by: rwatson
Discussed with: jhb
Tested by: rnoland, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
MFC: 2 weeks
Special dedication to: anyone who made it possible to have 16-way machines
in Netperf
disable alq, it acts as if alq had not been enabled in the build.
in other words, the rest of ktr is still available for use.
If you really don't want that as well, set the mask to 0.
MFC after: 3 weeks
DTYPE_VNODE.
Only acquire locks for O_EXLOCK/O_SHLOCK if file type is still vnode,
since we allow fcntl(2) to proceed with advisory locks for
DTYPE_VNODE only. Another reason is that all fo_close() routines need to
check and release locks otherwise.
For O_TRUNC, call fo_truncate() instead of truncating the vnode.
Discussed with: rwatson
MFC after: 2 weeks
Previously, one of these limits was initialized in two places to a
different value in each place. Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures. (Currently, this
error is masked by login.conf's default settings.)
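The truncation can be shown with a few lines of C. This is a sketch
under assumptions: 4 KiB pages, hypothetical sk_* names, and uint32_t
standing in for an unsigned int that is 32 bits wide, as it is on the
64-bit architectures in question.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */

/* Byte limit computed into a 32-bit quantity: high bits are lost. */
uint32_t
sk_limit_narrow(uint64_t pages)
{
	return ((uint32_t)(pages * SK_PAGE_SIZE));
}

/* Same computation kept in 64 bits. */
uint64_t
sk_limit_wide(uint64_t pages)
{
	return (pages * SK_PAGE_SIZE);
}
```

With 8 GiB of pageable memory (2097152 pages) the narrow version yields
0, and with 5 GiB (1310720 pages) it yields 1 GiB, while the wide version
returns the full byte count in both cases.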
Make vm_thread_swapin() and vm_thread_swapout() static.
Submitted by: bde (an earlier version)
Reviewed by: kib
killed by OOM. When a killed process waits for a page allocation, try to
satisfy the request as fast as possible.
This removes the often-encountered deadlock where OOM continuously
selects the same victim process, which sleeps uninterruptibly waiting
for a page. The killed process may still sleep if a page cannot be
obtained immediately, but testing has shown that the system has a much
higher chance of surviving an OOM situation with the patch.
In collaboration with: pho
Reviewed by: alc
MFC after: 4 weeks
As currently st_blksize is always PAGE_SIZE, it is playing safe to not
use any smaller value. For some cases this might not be optimal, but
at least nothing should get broken.
Generally I don't expect this commit to change much for the following
reasons (in case of VREG, VDIR):
- application I/O and physical I/O are sufficiently decoupled by
filesystem code, buffer cache code, cluster and read-ahead logic
- not all applications use st_blksize as a hint, some use f_iosize, some
use fixed block sizes
I expect writes to the middle of files on ZFS to benefit the most from
this change.
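For reference, this is how an application might consume st_blksize as an
I/O size hint. It is a minimal sketch: sk_preferred_io_size() and its
fallback parameter are hypothetical, and only stat(2) and the st_blksize
field come from the standard interface.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return the preferred I/O size for path, or fallback when the file
 * cannot be examined or reports no usable hint.
 */
size_t
sk_preferred_io_size(const char *path, size_t fallback)
{
	struct stat sb;

	if (stat(path, &sb) != 0 || sb.st_blksize <= 0)
		return (fallback);
	return ((size_t)sb.st_blksize);
}
```

A caller would typically size its read/write buffer to a multiple of the
returned value; applications that use fixed block sizes are unaffected.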
Silence from: fs@
MFC after: 2 weeks
In r205860 I missed the fact that there is code that strongly assumes
that devvp bo_bsize is equal to underlying provider's sectorsize.
In those places it is hard to obtain the sectorsize in an alternative
way if devvp bo_bsize is set to something else.
So, I am reverting bo_bsize assignment in g_vfs_open.
Instead, in getblk I use DEV_BSIZE block size for b_offset calculation
if vp is a disk vp as reported by vn_isdisk. This should coincide with
vp being a devvp.
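The offset rule can be sketched as below. The sk_* names are
hypothetical and a boolean stands in for the vn_isdisk() result; 512 is
the value of DEV_BSIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_DEV_BSIZE	512	/* value of DEV_BSIZE */

struct sk_bufobj {
	long	bo_bsize;	/* block size recorded for the object */
	bool	is_disk;	/* stand-in for the vn_isdisk() result */
};

/* Byte offset of logical block blkno, as a getblk-like routine needs. */
int64_t
sk_buf_offset(const struct sk_bufobj *bo, int64_t blkno)
{
	long bsize = bo->is_disk ? SK_DEV_BSIZE : bo->bo_bsize;

	return (blkno * bsize);
}
```

For a disk vnode, block 8 maps to byte 4096 regardless of what bo_bsize
holds; for a regular file with 16 KiB blocks, block 2 maps to byte 32768.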
Reported by: Mykola Dzham <i@levsha.me>
Tested by: Mykola Dzham <i@levsha.me>
Pointyhat to: avg
MFC after: 2 weeks
X-ToDo: convert bread(devvp) in all fs to use bo_bsize-d blocks
unlocks and unreferences for argument vnodes, as expected by
kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and
reference leaks when rename is attempted on fs that does not
implement rename.
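A userland model of such a default handler, for illustration only. The
rn_* names are hypothetical, and the entry state is an assumption of the
sketch: source directory and source vnode referenced, target directory
and target vnode (if any) locked and referenced, with the target
directory distinct from the target vnode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct rn_vnode { int refs; int locked; };

/* Drop a reference (models vrele()). */
static void
rn_vrele(struct rn_vnode *vp)
{
	vp->refs--;
}

/* Unlock and drop a reference (models vput()). */
static void
rn_vput(struct rn_vnode *vp)
{
	vp->locked = 0;
	vp->refs--;
}

/* Release everything the caller handed over, then refuse the rename. */
int
rn_default_rename(struct rn_vnode *fdvp, struct rn_vnode *fvp,
    struct rn_vnode *tdvp, struct rn_vnode *tvp)
{
	rn_vput(tdvp);
	if (tvp != NULL)
		rn_vput(tvp);
	rn_vrele(fdvp);
	rn_vrele(fvp);
	return (EOPNOTSUPP);
}
```

Without the release calls the same function would still return
EOPNOTSUPP, but every failed rename would leave four vnodes referenced
and two of them locked.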
PR: kern/107439
Based on submission by: Mikolaj Golub <to.my.trociny gmail com>
Tested by: Mikolaj Golub
MFC after: 1 week
- Use the new alq_destroy() to properly handle a failure case in alq_open().
Sponsored by: FreeBSD Foundation
Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by: kmacy (mentor)
MFC after: 1 month
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we have already had for a very long
time. Unfortunately POSIX uses different names.
This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all less
ambiguous.
I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
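A short userland sketch of reading a sub-second modification time
through the POSIX 2008 name. sk_file_mtime() is a hypothetical helper.
The guarded #define shows the general shape of a compatibility macro,
assuming the older BSD spelling st_mtimespec; it is not copied from the
header.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Shape of a compatibility macro: old BSD name maps to the POSIX one. */
#ifndef st_mtimespec
#define st_mtimespec	st_mtim
#endif

/* Store the modification time of path in *out; 0 on success, else -1. */
int
sk_file_mtime(const char *path, struct timespec *out)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (-1);
	*out = sb.st_mtim;	/* POSIX 2008 name, nanosecond resolution */
	return (0);
}
```

Code still written against the old name, such as sb.st_mtimespec.tv_nsec,
keeps building through the macro, while new code can use st_mtim
directly.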