if one of the new poll types is requested; hopefully this will not break
any existing code. (This is done so that programs have a dependable
way of determining whether a filesystem supports the extended poll types
or not.)
The new poll types added are:
POLLWRITE - file contents may have been modified
POLLNLINK - file was linked, unlinked, or renamed
POLLATTRIB - file's attributes may have been changed
POLLEXTEND - file was extended
Note that the internal operation of poll() means that it is impossible
for two processes to reliably poll for the same event (this could
be fixed but may not be worth it), so it is not possible to rewrite
`tail -f' to use poll at this time.
- A nonprofiling version of s_lock (called s_lock_np) is used
by mcount.
- When profiling is active, more registers are clobbered in
seemingly simple assembly routines. This means that some
callers needed to save/restore extra registers.
- The stack pointer must have space for a 'fake' return address
in idle, to avoid stack underflow.
... fix a bug with orecvfrom() or recvfrom() called with
the MSG_COMPAT flag on kernels compiled with the COMPAT_43 option.
The symptom is that the fromaddr is not correctly returned.
This affects the Linux emulator.
Submitted by: pb@fasterix.freenix.org (Pierre Beyssac)
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased much more than significantly with these
changes. Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.
Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.
quite a while, but forgot to do so. For now, this code supports
most daemons running as kernel threads in UP kernels, and as
full processes in SMP. We will soon be able to run them as
threads in SMP, but not yet.
Note that an unload facility should be used to call rm_at_exit() (if
procfs is being loaded as an LKM and is subsequently removed), but it
was non-obvious how to do this in the VFS framework.
Reviewed by: Julian Elischer
surprise, procfs actually is optional, and some people truly do generate
kernels without it. Wow. I built a kernel without 'options PROCFS' and
it compiled and linked.
1) Fix the initialization of malloc structure that changed
due to perf opt.
2) Remove unneeded include.
3) An initialization assert added to malloc.
Submitted by: John Hood <cgull@smoke.marlboro.vt.us>
workaround. Note that this currently eats up two pages extra in the system;
this could be alleviated by aligning idt correctly, and then only dealing with
that (as opposed to the current method of allocated two pages and copying the
IDT table to that, and then setting that to be the IDT table).
or aio_write can return the pid of the new thread. This is due to the
way that return values from system calls being passed by side-effect in
the proc structure now. This commit fixes the problem with aio_read and
aio_write.
remove alot of overly verbose debugging statements.
ioproclist {
int aioprocflags; /* AIO proc flags */
TAILQ_ENTRY(aioproclist) list; /* List of processes */
struct proc *aioproc; /* The AIO thread */
TAILQ_HEAD (,aiocblist) jobtorun; /* suggested job to run */
};
/*
* data-structure for lio signal management
*/
struct aio_liojob {
int lioj_flags;
int lioj_buffer_count;
int lioj_buffer_finished_count;
int lioj_queue_count;
int lioj_queue_finished_count;
struct sigevent lioj_signal; /* signal on all I/O done */
TAILQ_ENTRY (aio_liojob) lioj_list;
struct kaioinfo *lioj_ki;
};
#define LIOJ_SIGNAL 0x1 /* signal on all done (lio) */
#define LIOJ_SIGNAL_POSTED 0x2 /* signal has been posted */
/*
* per process aio data structure
*/
struct kaioinfo {
int kaio_flags; /* per process kaio flags */
int kaio_maxactive_count; /* maximum number of AIOs */
int kaio_active_count; /* number of currently used AIOs */
int kaio_qallowed_count; /* maxiumu size of AIO queue */
int kaio_queue_count; /* size of AIO queue */
int kaio_ballowed_count; /* maximum number of buffers */
int kaio_queue_finished_count; /* number of daemon jobs finished */
int kaio_buffer_count; /* number of physio buffers */
int kaio_buffer_finished_count; /* count of I/O done */
struct proc *kaio_p; /* process that uses this kaio block */
TAILQ_HEAD (,aio_liojob) kaio_liojoblist; /* list of lio jobs */
TAILQ_HEAD (,aiocblist) kaio_jobqueue; /* job queue for process */
TAILQ_HEAD (,aiocblist) kaio_jobdone; /* done queue for process */
TAILQ_HEAD (,aiocblist) kaio_bufqueue; /* buffer job queue for process */
TAILQ_HEAD (,aiocblist) kaio_bufdone; /* buffer done queue for process */
};
#define KAIO_RUNDOWN 0x1 /* process is being run down */
#define KAIO_WAKEUP 0x2 /* wakeup process when there is a significant
event */
TAILQ_HEAD (,aioproclist) aio_freeproc, aio_activeproc;
TAILQ_HEAD(,aiocblist) aio_jobs; /* Async job list */
TAILQ_HEAD(,aiocblist) aio_bufjobs; /* Phys I/O job list */
TAILQ_HEAD(,aiocblist) aio_freejobs; /* Pool of free jobs */
static void aio_init_aioinfo(struct proc *p) ;
static void aio_onceonly(void *) ;
static int aio_free_entry(struct aiocblist *aiocbe);
static void aio_process(struct aiocblist *aiocbe);
static int aio_newproc(void) ;
static int aio_aqueue(struct proc *p, struct aiocb *job, int type) ;
static void aio_physwakeup(struct buf *bp);
static int aio_fphysio(struct proc *p, struct aiocblist *aiocbe, int type);
static int aio_qphysio(struct proc *p, struct aiocblist *iocb);
static void aio_daemon(void *uproc);
SYSINIT(aio, SI_SUB_VFS, SI_ORDER_ANY, aio_onceonly, NULL);
static vm_zone_t kaio_zone=0, aiop_zone=0,
aiocb_zone=0, aiol_zone=0, aiolio_zone=0;
/*
* Single AIOD vmspace shared amongst all of them
*/
static struct vmspace *aiovmspace = NULL;
/*
* Startup initialization
*/
void
aio_onceonly(void *na)
{
TAILQ_INIT(&aio_freeproc);
TAILQ_INIT(&aio_activeproc);
TAILQ_INIT(&aio_jobs);
TAILQ_INIT(&aio_bufjobs);
TAILQ_INIT(&aio_freejobs);
kaio_zone = zinit("AIO", sizeof (struct kaioinfo), 0, 0, 1);
aiop_zone = zinit("AIOP", sizeof (struct aioproclist), 0, 0, 1);
aiocb_zone = zinit("AIOCB", sizeof (struct aiocblist), 0, 0, 1);
aiol_zone = zinit("AIOL", AIO_LISTIO_MAX * sizeof (int), 0, 0, 1);
aiolio_zone = zinit("AIOLIO",
AIO_LISTIO_MAX * sizeof (struct aio_liojob), 0, 0, 1);
aiod_timeout = AIOD_TIMEOUT_DEFAULT;
aiod_lifetime = AIOD_LIFETIME_DEFAULT;
jobrefid = 1;
}
/*
* Init the per-process aioinfo structure.
* The aioinfo limits are set per-process for user limit (resource) management.
*/
void
aio_init_aioinfo(struct proc *p)
{
struct kaioinfo *ki;
if (p->p_aioinfo == NULL) {
ki = zalloc(kaio_zone);
p->p_aioinfo = ki
support was missing in the previous version of the AIO code. More
tunables added, and very efficient support for VCHR files has been added.
Kernel threads are not used for VCHR files, all work for such files is
done for the requesting process directly. Some attempt has been made to
charge the requesting process for resource utilization, but more work
is needed. aio_fsync is still missing (but the original fsync system
call can be used for now.) aio_cancel is essentially a noop, but that
is okay per POSIX. More aio_cancel functionality can be added later,
if it is found to be needed.
The functions implemented include:
aio_read, aio_write, lio_listio, aio_error, aio_return,
aio_cancel, aio_suspend.
The code has been implemented to support the POSIX spec 1003.1b
(formerly known as POSIX 1003.4 spec) features of the above. The
async I/O features are truly async, with the VCHR mode of operation
being essentially the same as physio (for appropriate files) for
maximum efficiency. This code also supports the signal capability,
is highly tunable, allowing management of resource usage, and
has been written to allow a per process usage quota.
Both the O'Reilly POSIX.4 book and the actual POSIX 1003.1b document
were the reference specs used. Any filedescriptor can be used with
these new system calls. I know of no exceptions where these
system calls will not work. (TTY's will also probably work.)
this results in a few functions becoming static, and
the SYSINITs being close to the code they are related to.
setting up the dump device is with dumpsys() and
kicking off the scheduler is with the scheduler.
Mounting root is with the code that does it.
Reviewed by: phk
b_flags, and this patch removes unneeded modifications. Only the needed b_flags
bits are modified now. (Specifically, it is usually wrong to zero b_flags.)
Submitted by: bde@freebsd.org
Fixed overflow of FFLAGS() in fcntl(F_SETFL, ...). This was not
a security hole, but gave wrong results for silly flags values.
E.g., it make fcntl(F_SETFL, -1) equivalent to fcntl(F_SETFL, 0).
POSIX requires ignoring the open mode bits in fcntl() (even if
they would be invalid for open()).
Use OID_AUTO instead of a magic number for the debug.syncprt sysctl.
(This sysctl doesn't actually work. FreeBSD nuked it, but parts
of it were mismerged from Lite2. It is not very good, but better
than nothing.)
`mount -u'. This only matters for `mount -u' competing with unmounts.
If I understand the locking correctly: if mount() blocks, then unmount()
may run and set mp->kern_flag for the same mp. Then unmount() blocks
waiting for mount() to finish. When unmount() continues, its MNTK flags
(MNTK_UNMOUNT and MNTK_MWAIT) may have been clobbered.
Didn't fix old bugs:
- restoring mp->mnt_kern_flag is wrong for the same reasons in the error
case.
- the error case of unmount() seems to be broken too:
(a) MNTK_UNMOUNT gets clobbered, although another unmount() may have
set it. Perhaps it shouldn't be set until after the full lock is
aquired.
(b) MNTK_MWAIT isn't honoured.
Fixed a nearby style bug.
Disallow wait options that are not a combination of the standard POSIX
options WUNTRACED and WNOHANG, as is required by POSIX. BSD doesn't
have any extensions here, but the code was `#ifdef notyet' for some
reason.