these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.
The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()
the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.
Drop entirely timer2 on pc98, it is not used anywhere at all.
Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.
Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.
This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.
Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.
In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.
Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.
This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
the owner of a queue to block and unblock execution of the tasks in the
queue while allowing tasks to continue to be added queue. Combining this
with taskqueue_drain() allows a queue to be safely disabled. The unblock
function may run (or schedule to run) the queue when it is called, just as
calling taskqueue_enqueue() would.
Reviewed by: jhb, sam
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.
MFC after: 4 months
of pptpgre and ksocket nodes for all calls between two peers. This patch
modifies node's API by adding new "session_%04x" hook names support, while
keeping backward compatibility.
Together with appropriate user-level support (by latest mpd5) it gives
huge performance benefits for case of multiple active calls between
two peers because of avoiding data duplication and extra socket processing.
On my benchmarks I have got more then 10 times speedup for the 200
simultaneous PPTP calls between two peers.
In conclusion, it allows now to build effective "clients <=> PAC <=> PNS"
setups.
o sort mbuf flags together and extend values to 32 bits
o write M_COPYFLAGS in terms of M_PROTOFLAGS
o move M_COPYFLAGS and M_PROTOFLAGS up to be together with flag defs
Reviewed by: rwatson
MFC after: 3 weeks
- Take advantage of m_collapse(9).
- Sync with other NIC drivers and prepend a TX mbuf if the first attempt
to load it fails with an error other than EFBIG and stop trying instead
of freeing it and keeping on trying to enqueue more mbufs. Also ensure
the driver queue isn't empty before trying to enqueue mbufs in order to
reduce locking operations.
- In xl_ifmedia_upd() add a missing XL_UNLOCK(). [1]
- Const'ify the xl_devs array.
- Remove an outdated comment.
PR: 113406 [1]
MFC after: 1 month
- Correct the maxsize parameter when creating the mbufs busdma tag to
reflect the actual requirement of dc(4).
- Move the KASSERT in dc_newbuf() to the right spot.
- Also convert the TX side to take advantage of bus_dmamap_load_mbuf_sg(9).
- Move the comment regarding dc_start_locked() to the right spot.
MFC after: 2 weeks
- Resource allocation in aac_alloc (moved from from aac_init)
- Interrupt setup in aac_setup_intr (from aac_attach)
- Container probing in aac_get_container_info (from aac_startup and
aac_handle_aif)
- Firmware status check moved to aac_check_firmware from aac_init
In case of "new SA", we must check the hard lifetime of the old SA
to find out if it is not permanent and we can delete it.
Submitted by: sakane via gnn
MFC after: 3 days
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.
The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.
The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patchs to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.
These changes modify the base BPF implementation to (roughly) abstrac
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring hanges that break
the netstat monitoring ABI for BPF, will be MFC'd.
Zerocopy bpf buffers are still considered experimental are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.
Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.
Sponsored by: Seccuris Inc.
In collaboration with: rwatson
Tested by: pwood, gallatin
MFC after: 4 months [1]
[1] Certain portions will probably not be MFCed, specifically things
that can break the monitoring ABI.
references to a vnode with VI_OWEINACT set will force the vinactive()
call. The kernel makes no guarantees about which reference was the
last to close a file or when the actual inactive processing will
happen. The previous code was designed to preserve existing semantics
in the face of shared locks, however, this was unnecessary.
Discussed with: mckusick
is requested. Handle this case specially before the while loop.
- Use the held vnode lock to check for VI_DOOMED. The vnode lock and
interlock must both be held to set VI_DOOMED so either one held, even
shared, is sufficient to check it.
No objection by: kib
are mixed. Some pure context switch microbenchmarks show up to 29%
improvement. Pipe based context switch microbenchmarks show up to 7%
improvement. Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.
Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
non-threaded userland apps. These typically cost 120 clock cycles each
on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no
faster on this.
- The above change only helps unthreaded userland apps that tend to use
the same value for gsbase. Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
prefetching a better chance of working. Operations are now in increasing
memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths. Hopefully
allowing better code density in cache lines. This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
realistic static branch prediction hint. Both Intel and AMD cpus
default to predicting branches to lower memory addresses as being
taken, and to higher memory addresses as not being taken. This is
overridden by the limited dynamic branch prediction subsystem. A trip
through userland might overflow this.
- Futule attempt at spreading the use of the results of previous operations
in new operations. Hopefully this will allow the cpus to execute in
parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)
Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.
While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.
A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)
This is still work-in-progress.