to be dropped, 2) attempting to lock acct_mtx while already holding it.
Sorry to those who experienced pain.
- Added two comments referring to two areas in which acct_mtx is held over
vnode operations that might sleep. Patch in the works for this.
request structure.
- Re-optimize the case of utrace being disabled by doing an explicit
KTRPOINT check instead of relying on the one in ktr_getrequest() so that
we don't waste time on a malloc in the non-tracing case.
- Change utrace() to return an error if the copyin() fails. Before it
would just ignore the request but still return success. This last is
a change in behavior and can be backed out if necessary.
transfer to a malloc'd buffer and use that bufer for the ktrace event.
This means that genio ktrace events no longer need to be synchronous.
- Now that ktr_buffer isn't overloaded to sometimes point to a cached uio
pointer for genio requests and always points to a malloc'd buffer if not
NULL, free the buffer in ktr_freerequest() instead of in
ktr_writerequest(). This closes a memory leak for ktrace events that
used a malloc'd buffer that had their vnode ripped out from under them
while they were on the todo list.
Suggested by: bde (1, in principle)
- Rename kern.ktrace_request_pool tunable/sysctl to
kern.ktrace.request_pool.
- Add a variable to control the max amount of data to log for genio events.
This variable is tunable via the tunable/sysctl kern.ktrace.genio_size
and defaults to one page.
Changed rename(2) to follow the letter of the POSIX spec. POSIX
requires rename() to have no effect if its args "resolve to the same
existing file". I think "file" can only reasonably be read as referring
to the inode, although the rationale and "resolve" seem to say that
sameness is at the level of (resolved) directory entries.
ext2fs_vnops.c, ufs_vnops.c:
Replaced code that gave the historical BSD behaviour of removing one
link name by checks that this code is now unreachable. This fixes
some races. All vnodes needed to be unlocked for the removal, and
locking at another level using something like IN_RENAME was not even
attempted, so it was possible for rename(x, y) to return with both x
and y removed even without any unlink(2) syscalls (one process can
remove x using rename(x, y) and another process can remove y using
rename(y, x)).
Prodded by: alfred
MFC after: 8 weeks
PR: 42617
- Get the initial mode from the prom settings and don't clobber the mode
on open.
- Copy output into an internal ring buffer instead of accessing the tty
outq directly in the interrupt handler. This fixes a problem where
garbage would show up in the output stream.
- Reset the console port completely and reprogram all the parameters
before enabling it. This fixes seemingly random hangs on startup
when using a fast interrupt handler.
- Add minimal locking in place of spls.
- Remove dead code and minor cleanups.
available at module compile time. Do not #include the bogus
opt_kstack_pages.h at this point and instead refer to the variables that
are also exported via sysctl.
PS_STRINGS and USRSTACK is. This is necessary in order to decode a.out
core dumps. kern_proc.c was already referring to both of these values
but was missing the #include "opt_kstack_pages.h". Make the sysctl
variables visible so that certain kld modules can see how their parent
kernel was configured.
The process allocator now caches and hands out complete process structures
*including substructures* .
i.e. it get's the process structure with the first thread (and soon KSE)
already allocated and attached, all in one hit.
For the average non threaded program (non KSE that is) the allocated thread and its stack remain attached to the process, even when the process is
unused and in the process cache. This saves having to allocate and attach it
later, effectively bringing us (hopefully) close to the efficiency
of pre-KSE systems where these were a single structure.
Reviewed by: davidxu@freebsd.org, peter@freebsd.org
Together these two implement a simple transcation style grouping for
modifications of extended attributes on a vnode.
VOP_CLOSEEXTATTR() takes a boolean "commit" argument, which determines
if the aggregate changes are attempted written or not. A commit will
fail if any of the VOP_SETEXTATTR() calls since the VOP_OPENEXTATTR()
have failed to meet their objective or if the flush to disk fails.
The default operations for these two VOP's is to return EOPNOTSUPP.
This API may still be subject to change.
Sponsored by: DARPA & NAI Labs
s/SNGL/SINGLE/
s/SNGLE/SINGLE/
Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.
Approved by: julian (mentor)
layers deep in <sys/proc.h> or <sys/vnode.h>.
Removed unused includes.
Fixed some printf format errors (1 fatal on i386's; 1 fatal on alphas;
1 not fatal on any supported machine).
functions which run for several milliseconds at a time and getting
in queue behind one or more of those makes us miss our rewind.
Instead call it from hardclock() like we used to do, but retain the
prescaler so we still cope with high HZ values.
out that there is no easy way to discern the difference between a text
segment and a data segment through the read-only OR execute attribute
in the elf segment header, so revert the algorithm to what it was before.
Neither can we account for multiple data load segments in the vmspace
structure (at least not without more work), due to assumptions obreak()
makes in regards to the data start and data size fields.
Retain RLIMIT_VMEM checking by using a local variable to track the
total bytes of data being loaded.
Reviewed by: peter
X-MFC after: ASAP
so that it works on the Alpha. This defines the segment that the entry
point exists in as 'text' and any others (usually one) as data.
Submitted by: tmm
Tested on: i386, alpha
it can do it w/o needing to hold the filelist_lock sx lock.
- fdalloc() doesn't need Giant to call free() anymore. It also doesn't
need to drop and reacquire the filedesc lock around free() now as a
result.
- Try to make the code that copies fd tables when extending the fd table in
fdalloc() a bit more readable by performing assignments in separate
statements. This is still a bit ugly though.
- Use max() instead of an if statement so to figure out the starting point
in the search-for-a-free-fd loop in fdalloc() so it reads better next to
the min() in the previous line.
- Don't grow nfiles in steps up to the size needed if we dup2() to some
really large number. Go ahead and double 'nfiles' in a loop prior
to doing the malloc().
- malloc() doesn't need Giant now.
- Use malloc() and free() instead of MALLOC() and FREE() in fdalloc().
- Check to see if the size we are going to grow to is too big, not if the
current size of the fd table is too big in the loop in fdalloc(). This
means if we are out of space or if dup2() requests too high of a fd,
then we will return an error before we go off and try to allocate some
huge table and copy the existing table into it.
- Move all of the logic for dup'ing a file descriptor into do_dup() instead
of putting some of it in do_dup() and duplicating other parts in four
different places. This makes dup(), dup2(), and fcntl(F_DUPFD) basically
wrappers of do_dup now. fcntl() still has an extra check since it uses
a different error return value in one case then the other functions.
- Add a KASSERT() for an assertion that may not always be true where the
fdcheckstd() function assumes that falloc() returns the fd requested and
not some other fd. I think that the assertion is always true because we
are always single-threaded when we get to this point, but if one was
using rfork() and another process sharing the fd table were playing with
the fd table, there might could be a problem.
- To handle the problem of a file descriptor we are dup()'ing being closed
out from under us in dup() in general, do_dup() now obtains a reference
on the file in question before calling fdalloc(). If after the call to
fdalloc() the file for the fd we are dup'ing is a different file, then
we drop our reference on the original file and return EBADF. This
race was only handled in the dup2() case before and would just retry
the operation. The error return allows the user to know they are being
stupid since they have a locking bug in their app instead of dup'ing
some other descriptor and returning it to them.
Tested on: i386, alpha, sparc64
sleep mutexes and vice versa. WITNESS normally should catch this but
not everyone uses WITNESS so this is a fallback to catch nasty but easy
to do bugs.
PCATCH means 'if we get a signal, interrupt me!" and tsleep returns
either EINTR or ERESTART depending on the circumstances. ERESTART is
"special" because it causes the system call to fail, but right as it
returns back to userland it tells the trap handler to move %eip back a
bit so that userland will immediately re-run the syscall.
This is a syscall restart. It only works for things like read() etc where
nothing has changed yet. Note that *userland* is tricked into restarting
the syscall by the kernel. The kernel doesn't actually do the restart. It
is deadly for things like select, poll, nanosleep etc where it might cause
the elapsed time to be reset and start again from scratch. So those
syscalls do this to prevent userland rerunning the syscall:
if (error == ERESTART) error = EINTR;
Fake "signals" like SIGTSTP from ^Z etc do not normally invoke userland
signal handlers. But, in -current, the PCATCH *is* being triggered and
tsleep is returning ERESTART, and the syscall is aborted even though no
userland signal handler was run.
That is the fault here. We're triggering the PCATCH in cases that we
shouldn't. ie: it is being triggered on *any* signal processing, rather
than the case where the signal is posted to userland.
--- Peter
The work of psignal() is a patchwork of special case required by the process
debugging and job-control facilities...
--- Kirk McKusick
"The design and impelementation of the 4.4BSD Operating system"
Page 105
in STABLE source, when psignal is posting a STOP signal to sleeping
process and the signal action of the process is SIG_DFL, system will
directly change the process state from SSLEEP to SSTOP, and when
SIGCONT is posted to the stopped process, if it finds that the process
is still on sleep queue, the process state will be restored to SSLEEP,
and won't wakeup the process.
this commit mimics the behaviour in STABLE source tree.
Reviewed by: Jon Mini, Tim Robbins, Peter Wemm
Approved by: julian@freebsd.org (mentor)
brand early in the process of loading an elf file, so that we can
identify the sysentvec, and so that we do not continue if we do not
have a brand (and thus a sysentvec). Use the values in the sysentvec
for the page size and vm ranges unconditionally, since they are all
filled in now.