- Fix the locking protocol to eliminate races between normal I/O and AENs.
- Various small improvements and usability tweaks.
Sponsored by: IronPort
Portions Submitted by: Doug Ambrisko
- Use bus_dmamap_load_mbuf_sg() to eliminate the need for the callback and
all of the extra bookkeeping associated with it.
- Eliminate the bce_dmamap_arg structure and streamline the memory allocation
routines to not need it. This does change some of the debugging messages.
- Refactor the loop that fills the buffer descriptor so that it can be done
with a single set of logic in a single loop instead of two sets of logic.
- Eliminate the need to cache and pass descriptor indexes between the start
loop and the encap function.
- Change the start loop to always check the ifnet sendq for more work.
This significantly helps the driver withstand large UDP workloads, though
it's still not perfect. I suspect the remaining work lies with handling
the OACTIVE flag, and also in possibly streamlining the interrupt handler
some. It is, however, nearly on par with the other popular gigabit drivers
in terms of stability now.
the probe packet we sent and the packet quoted by the ICMP response.
Can be useful for spotting hops that change the packet in-flight
or have problems generating correct ICMP responses.
MFC after: 3 weeks
Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled in to the kernel or as a module) at
the time the functions are used. If the AIO stuff is not available there
will be a ENOSYS.
From the submitter:
---snip---
DESIGN NOTES:
1. Linux permits a process to own multiple AIO queues (distinguished by
"context"), but FreeBSD creates only one single AIO queue per process.
My code maintains a request queue (STAILQ of queue(3)) per "context",
and throws all AIO requests of all contexts owned by a process into
the single FreeBSD per-process AIO queue.
When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
io_cancel(2), my code can pick out requests owned by the specified context
from the single FreeBSD per-process AIO queue according to the per-context
request queues maintained by my code.
2. The request queue maintained by my code stores contrast information between
Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
(struct aiocb). FreeBSD IO control block actually exists in userland memory
space, required by FreeBSD native aio_XXXXXX(2).
3. It is quite troubling that the function io_getevents() of libaio-0.3.105
needs to use Linux-specific "struct aio_ring", which is a partial mirror
of context in user space. I would rather take the address of context in
kernel as the context ID, but the io_getevents() of libaio forces me to
take the address of the "ring" in user space as the context ID.
To my surprise, one comment line in the file "io_getevents.c" of
libaio-0.3.105 reads:
Ben will hate me for this
REFERENCE:
1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/
(include/linux/aio_abi.h, fs/aio.c)
2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/
(io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))
3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html
The design notes: http://lse.sourceforge.net/io/aionotes.txt
4. The package libaio, both source and binary:
http://rpmfind.net/linux/rpm2html/search.php?query=libaio
Simple transparent interface to Linux AIO system calls.
5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/
POSIX AIO implementation based on Linux AIO system calls (depending on
libaio).
---snip---
Submitted by: Li, Xiao <intron@intron.ac>
it introduced a check after the call to file system's get pages method
that assumes that the get pages method does not change the array of pages
that is passed to it. In the case of vnode_pager_generic_getpages(),
this assumption has been incorrect. The contents of the array of pages
may be shifted by vnode_pager_generic_getpages(). Likely, the problem
has been hidden by vnode_pager_haspage() limiting the set of pages that
are passed to vnode_pager_generic_getpages() such that a shift never
occurs.
The fix implemented herein is to adjust the pointer to the array of pages
rather than shifting the pages within the array.
MFC after: 3 weeks
Fix suggested by: tegge
an error returned by VOP_BMAP() and a hole in the file.
Change the callers to vnode_pager_addr() such that they return
VM_PAGER_ERROR when VOP_BMAP fails instead of a zero-filled page.
Reviewed by: tegge
MFC after: 3 weeks
if backward copatibility options are present) from attempting
to free memory that wasn't allocated. This is an old bug, and
previously it would attempt to free a null pointer. I noticed
this bug when working on the previous revision, but forgot to
fix it.
Security: local DoS
Reported by: Peter Holm
MFC after: 3 days
: revision 1.13
: date: 2002/04/03 15:44:53; author: phk; state: Exp; lines: +0 -5
: Initial deorbit burn for the undocumented and unused d_boot[01]
: fields of struct disklabel.
:
: Sponsored by: DARPA and NAI Labs.
: ----------------------------
: revision 1.15
: date: 2002/05/12 20:49:33; author: phk; state: Exp; lines: +1 -3
: Retire the bogus uses of the disklabel field d_sbsize and begin to
: initialize it to zero so we don't have to have everbody and their
: aunt including FFS specific header files.
:
: Sponsored by: DARPA & NAI Labs.
VA_MARK_ATIME feature to fix POSIX conformance fore execve() and mmap(),
we thought that it was optimized well enough for the one file system
that supports it (ffs) and harmless for other file systems (except
layered ones which already get the layering for VOP_SETATTR() wrong).
However, nfs_setattr() doesn't do much parameter checking, so when
it gets a combination of parameters that it doesn't understand, it
always does a Setattr RPC. This RPC can't do anything good, and for
VA_MARK_ATIME it is null except for wasting a lot of time.
This is the smallest and easiest to fix of several bugs that have
increased the number of RPCs for kernel builds on nfs by more than
100% since 2004-11-05. The real-time increase depends on network
latency and parallelization and can also be very large (approaching
the same percentage for unparallelized operations like "make depend"
on systems with fast CPUs and high-latency networks).