Keep the interrupts disabled in order to avoid preemption problems.
Reported by: tinderbox, b.f. <bf1783 at googlemail dot com>
MFC: 2 weeks
X-MFC: r206878
Assert this.
In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporary unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.
Verify that dvp is not reclaimed before calling cache_enter().
Reported and tested by: pho
Reviewed by: kan
MFC after: 2 weeks
When performing a smp_rendezvous() or more likely, on amd64 and i386,
a smp_tlb_shootdown() the caller will end up with the smp_ipi_mtx
spinlock held, busy-waiting for other CPUs to acknowledge the operation.
As long as CPUs are suspended (via cpu_reset()) between the active mask
read and IPI sending there can be a deadlock where the caller will wait
forever for a dead CPU to acknowledge the operation.
Please note that on CPU0 that is going to be someway heavier because of
the spinlocks being disabled earlier than quitting the machine.
Fix this bug by calling cpu_reset() with the smp_ipi_mtx held.
Note that it is very likely that a saner offline/online CPUs mechanism
will help heavilly in fixing similar cases as it is likely more bugs
of this type may arise in the future.
Reported by: rwatson
Discussed with: jhb
Tested by: rnoland, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
MFC: 2 weeks
Special deciation to: anyone who made possible to have 16-ways machines
in Netperf
disable alq, it acts as if alq had not been enabled in the build.
in other words, the rest of ktr is still available for use.
If you really don't want that as well, set the mask to 0.
MFC after:3 weeks
DTYPE_VNODE.
Only acquire locks for O_EXLOCK/O_SHLOCK if file type is still vnode,
since we allow for fcntl(2) to process with advisory locks for
DTYPE_VNODE only. Another reason is that all fo_close() routines need to
check and release locks otherwise.
For O_TRUNC, call fo_truncate() instead of truncating the vnode.
Discussed with: rwatson
MFC after: 2 week
Previously, one of these limits was initialized in two places to a
different value in each place. Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures. (Currently, this
error is masked by login.conf's default settings.)
Make vm_thread_swapin() and vm_thread_swapout() static.
Submitted by: bde (an earlier version)
Reviewed by: kib
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.
This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.
In collaboration with: pho
Reviewed by: alc
MFC after: 4 weeks
As currently st_blksize is always PAGE_SIZE, it is playing safe to not
use any smaller value. For some cases this might not be optimal, but
at least nothing should get broken.
Generally I don't expect this commit to change much for the following
reasons (in case of VREG, VDIR):
- application I/O and physical I/O are sufficiently decoupled by
filesystem code, buffer cache code, cluster and read-ahead logic
- not all applications use st_blksize as a hint, some use f_iosize, some
use fixed block sizes
I expect writes to the middle of files on ZFS to benefit the most from
this change.
Silence from: fs@
MFC after: 2 weeks
In r205860 I missed the fact that there is code that strongly assumes
that devvp bo_bsize is equal to underlying provider's sectorsize.
In those places it is hard to obtain the sectorsize in an alternative
way if devvp bo_bsize is set to something else.
So, I am reverting bo_bsize assigment in g_vfs_open.
Instead, in getblk I use DEV_BSIZE block size for b_offset calculation
if vp is a disk vp as reported by vn_isdisk. This should coinside with
vp being a devvp.
Reported by: Mykola Dzham <i@levsha.me>
Tested by: Mykola Dzham <i@levsha.me>
Pointyhat to: avg
MFC after: 2 weeks
X-ToDo: convert bread(devvp) in all fs to use bo_bsize-d blocks
unlocks and unreferences for argument vnodes, as expected by
kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and
reference leaks when rename is attempted on fs that does not
implement rename.
PR: kern/107439
Based on submission by: Mikolaj Golub <to.my.trociny gmail com>
Tested by: Mikolaj Golub
MFC after: 1 week
- Use the new alq_destroy() to properly handle a failure case in alq_open().
Sponsored by: FreeBSD Foundation
Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by: kmacy (mentor)
MFC after: 1 month
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.
This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.
I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.
Reviewed by: jhb
executable status of segments instead of detecting the main text segment
by which segment contains the program entry point. This affects
obreak() and is required for correct operation of that function
on 64-bit PowerPC systems. The previous behavior was apparently
required only for the Alpha, which is no longer supported.
Reviewed by: jhb
Tested on: amd64, sparc64, powerpc
panic: rw lock not unlocked
was not really helpful for debugging. Now one can at least call
show lock <ptr>
form ddb to learn more about the lock.
MFC after: 3 days
so should be copied to userspace with suword32() instead of suword().
This alleviates problems on 64-bit big-endian architectures, and is a
no-op on all 32-bit architectures.
Tested on: amd64, sparc64, powerpc64
According to POSIX open() must return ENOTDIR when the path name does
not refer to a path name. Change vn_open() to respect this flag. This
also simplifies the Linuxolator a bit.
that provides the allocated and setup eventhandler entry.
Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.
Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.
Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.
Sponsored by: ISPsystem
Discussed with: jhb
Reviewed by: zec (earlier version), jhb
MFC after: 1 month
sysv_{msg,sem,shm}.c files.
Mark SysV IPC freebsd32 syscalls as NOSTD and add required
SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto
register/unregister on module load.
This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded
as modules.
Reviewed by: jhb
MFC after: 2 weeks
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.
Reviewed by: kib, jhb
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).
sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.
MFC after: 3 weeks