keyboards to work if no PS/2 keyboard is attached. The position in the
menu was chosen to avoid moving option 6 (loader prompt). This should
be a no-op on non-i386/amd64 machines.
outside of the nice threshold due to a recently awoken thread with a
lower nice value. This further reduces the amount of time a positively
niced thread gets while running in conjunction with a workload that has
many short sleeps (ie buildworld).
under high load: only set function state to loop and continuing sending
if there is no data left to send.
RELENG_5_3 candidate.
Feet provided: Peter Losher <Peter underscore Losher at isc dot org>
Diagnosed by: Aniel Hartmeier <daniel at benzedrine dot cx>
Submitted by: mohan <mohans at yahoo-inc dot com>
in orden to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.
MFC after: 3 days
Discussed at: EuroBSDCon Developer's Summit
check for TD_ON_RUNQ() no longer means the thread is really on a run-
queue. I suspect this state should be re-evaluated as it must mean
something else now. This fixes ULE+KSE+PREEMPTION on UP x86.
This eliminates a bunch of vnode overhead (approx 1-2 % speed
improvement) and gives us more control over the access to the storage
device.
Access counts on the underlying device are not correctly tracked and
therefore it is possible to read-only mount the same disk device multiple
times:
syv# mount -p
/dev/md0 /var ufs rw 2 2
/dev/ad0 /mnt ufs ro 1 1
/dev/ad0 /mnt2 ufs ro 1 1
/dev/ad0 /mnt3 ufs ro 1 1
Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches
things in RAM) this is not possible with read-write mounts, and the system
will correctly reject this.
Details:
Add a geom consumer and a bufobj pointer to ufsmount.
Eliminate the vnode argument from softdep_disk_prewrite().
Pick the vnode out of bp->b_vp for now. Eventually we
should find it through bp->b_bufobj->b_private.
In the mountcode, use g_vfs_open() once we have used
VOP_ACCESS() to check permissions.
When upgrading and downgrading between r/o and r/w do the
right thing with GEOM access counts. Remove all the
workarounds for not being able to do this with VOP_OPEN().
If we are the root mount, drop the exclusive access count
until we upgrade to r/w. This allows fsck of the root
filesystem and the MNT_RELOAD to work correctly.
Set bo_private to the GEOM consumer on the device bufobj.
Change the ffs_ops->strategy function to call g_vfs_strategy()
In ufs_strategy() directly call the strategy on the disk
bufobj. Same in rawread.
In ffs_fsync() we will no longer see VCHR device nodes, so
remove code which synced the filesystem mounted on it, in
case we came there. I'm not sure this code made sense in
the first place since we would have taken the specfs route
on such a vnode.
Redo the highly bogus readblock() function in the snapshot
code to something slightly less bogus: Constructing an uio
and using physio was really quite a detour. Instead just
fill in a bio and ship it down.
At some point later the syncer will unlearn about vnodes and the filesystems
method called by the syncer will know enough about what's in bo_private to
do the right thing.
[1] Ok, I know, but I couldn't resist the pun.
buf->b-dev.
Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.
Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.
IF INVARIANTS is defined, and in the rare case that we have
allocated some objects from the slab and at least one initializer
on at least one of those objects failed, and we need to fail the
allocation and push the uninitialized items back into the slab
caches -- in that scenario, we would fail to [re]set the
bucket cache's ub_bucket item references to NULL, which would
eventually trigger a KASSERT.
an inordinate amount of synchronous console output that is fairly
undesirable on slower serial console. It's easily hit by accident
when frobbing other sysctls late at night.
the device is suspended or shutting down. This will need to be rethought
slightly if we implement suspend/resume support within vr(4).
This appears to fix the vr_shutdown() panic on SMP machines.
My theory here is there's a race somewhere during vr_detach() with
vr_intr() in the SMP case which was sometimes being triggered,
although quite why this was happening is unclear (vr_stop() also
explicitly disables interrupts by writing to the IMR register).
MFC-to-RELENG_5* candidate.
PR: kern/62889
Tested by: seb at struchtrup dot com
MFC after: 10 days
that was greater than 4G. I originally used the same values as i386 in
order to save opening a new PML4 page slot, but in the day of gigabytes
of memory, worrying about a 4K page seems futile. Moving from 8 to 32G
moves the page to a different index, it doesn't increase the number of
pages used.
Give ffs it's own bufobj->bo_ops vector and create a private strategy
routine, (currently misnamed for forwards compatibility), which is
just a copy of the generic bufstrategy routine except we call
softdep_disk_prewrite() directly instead of through the buf_prewrite()
indirection.
Teach UFS about the need for softdep_disk_prewrite() and call the
function directly in FFS.
Remove buf_prewrite() from the default bufstrategy() and from the
global bio_ops method vector.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open(). The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
is ffs_copyonwrite() and the only place it can be called from is FFS which
would never want to call another filesystems copyonwrite method, should one
exist, so there is no reason why anything generic should know about this.
on UltraSPARC workstations. The driver is based on OpenBSD's SBus
cs4231 driver and heavily modified to incorporate into sound(4)
infrastructure. Due to the lack of APCDMA documentation, the DMA
code of SBus cs4231 came from OpenBSD's driver.
The driver runs without Giant lock and supports both SBus and EBus
based CS4231 audio controller. Special thanks to marius for providing
feedbacks during the driver writing. His feedback made it possible
to write hiccup free playback code under high system loads.
Approved by: jake (mentor)
Reviewed by: marius (initial version)
Tested by: marius, kwm, Julian C. Dunn(jdunn AT opentrend DOT net)
and release of the global page queues lock required to make the call.
Remove GIANT_REQUIRED from vm_hold_free_pages(). All of its VM operations
are properly synchronized.
count to prevent sockets from being garbage collected during
socket-specific system calls. This is the same approach used in
most VFS-specific system calls, as well as generic file descriptor
system calls such as read() and write().
To do this, add a utility function getsock(), which is logically
identical to getvnode() used for the same purpose in VFS. Unlike
fgetsock(), it returns with the file reference count elevated, but
no bump of the socket reference count. Replace matching calls to
fputsock() with fdrop().
This change is made to all socket system calls other than
sendfile() and accept(), but the approach should be applicable to
those system calls also.
This shaves about four mutex operations off of each of these
system calls, including send() and recv() variants, adding about
1% to pps on minimal UDP packets for UP using netblast, and 4% on
SMP.
Reviewed by: pjd
Extend it with a strategy method.
Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.
Rename ibwrite to bufwrite().
Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.
Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().
Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
trigger a socket creation race some some kind). Checking for non-NULL socket
and credential is not a bad idea anyway. Unfortunatly too late for the
release.
Reported & tested by: Gilbert Cao
MFC after: 2 weeks
vm_page_sleep_if_busy(). (The motivation being to transition
synchronization of the vm_page's PG_BUSY flag from the global page queues
lock to the per-object lock.)
two loops in agp_generic_bind_memory(). As an intended side-effect, all
of the calls to vm_page_wakeup() are now performed with the containing
vm object lock held.
that indicates that the caller does not want a page with its busy flag set.
In many places, the global page queues lock is acquired and released just
to clear the busy flag on a just allocated page. Both the allocation of
the page and the clearing of the busy flag occur while the containing vm
object is locked. So, the busy flag might as well never be set.
This flag gets set whenever the thread posts an event on the GEOM
event queue, and if the flag is set when the thread is prepared
to return to userland from the kernel, g_waitidle() will be called
to make sure that the posted events have completed.
This can replace an insufficient number of g_waitidle() calls in
various other places, and has the advantage of being failsafe: Any
system call which does a VOP_OPEN()/VOP_CLOSE will now correctly
wait for any geom events it posted as part of spoils or tastes.
Assert that topology and Giant is not held in g_waitidle().
or pru_attach is NULL. With loadable protocols the SPACER dummy protocols
have valid function pointers for all methods to functions returning just
EOPNOTSUPP. Thus the early abort check would not detect immediately that
attach is not supported for this protocol. Instead it would correctly
get the EOPNOTSUPP error later on when it calls the protocol specific
attach function.
Add testing against the pru_attach_notsupp() function pointer to the
early abort check as well.
the final set of traces -- someone with more busdma background
will probably want to review and expand this, as well as port to
other platforms. This tracing is sufficient to identify key
busdma events on i386, and in particular to draw attention to
bounce buffering events that may have a substantial performance
impact.
o Instead of locking and unlocking all over the place, use
lock assertions to make certain that the bfe lock is held
where necessary.
o Create locked and unlocked versions of bfe_init and bfe_start. These
functions can be called from outside the module and by functions
within the bfe module. The calls from outside the module don't
hold the bfe lock so the unlocked versions called by these functions
simple obtain the bfe lock and call the locked version.
- Fix a typo (scp) in the locking macros that only worked because in all the
instances in which it was called the softc pointer happened to be named 'sc'.
- Mark the interrupt MPSAFE
Tested by: matusita, Dario Freni <saturnero@gufi.org>
Silence from: -net, wpaul
as well as document the properties of the mac_policy_conf structure.
Warn about the ABI risks in changing the structure without careful
consideration.
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR
without a mountpoint. In this scenario, there's no useful source for
a label on the vnode, since we can't query the mountpoint for the
labeling strategy or default label.
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it. Here were those hacks that I have curs'd I know not how
oft. Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?
Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
in the g_up and g_down threads. Each time a bio is propelled up and
down the stack, an event is generating showing the provider, offset,
and length, as well as thread wakeup and work status information.
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj. Bufobj_wdrop() replaces vwakeup().
Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.
Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
Initialize the bo_mtx when we allocate a vnode i getnewvnode() For
now we point to the vnodes interlock mutex, that retains the exact
same locking sematics.
Move v_numoutput from vnode to bufobj. Add renaming macro to
postpone code sweep.
modes on a tty structure. Both the ".init" and the current settings
are initialized allowing the function to be used both at attach and
open time.
The function takes an argument to decide if echoing should be enabled
by default. Echoing should not be enabled for regular physical
serial ports unless they are consoles, in which case they should
be configured by ttyconsolemode() instead.
Use the new function throughout.
right bits rather than piggy-backing on the V* rights defined in
vnode.h. The mac_bsdextended bits are given the same values as the V*
bits to make the new kernel module binary compatible with the old
version of libugidfw that uses V* bits. This avoids leaking kernel
API/ABI to user management tools, and in particular should remove the
need for libugidfw to include vnode.h.
Requested by: phk
is locked when vm_page_io_finish() is called on a page. This is to satisfy
a new, post-RELENG_5 assertion in vm_page_io_finish(). (I am in the
process of transitioning the responsibility for synchronizing access to
various fields/flags on the page from the global page queues lock to the
per-object lock.)
Tripped over by: obrien@
with a weak memory model or x86 + PAE (or more specifically, your
driver is using bounce pages) and you have had problems with em(4),
this may fix it. At least this is needed to have em(4) work properly
on FreeBSD/arm.
Original version by: cognet
Reviewed by: tackerman
Tested by: cognet
protocols: it is possible for sockets to be created and attached
to the divert protocol between the test for sockets present and
successful unload of the registration handler. We will need to
explore more mature APIs for unregistering the protocol and then
draining consumers, or an atomic test-and-unregister mechanism.
of protocols. The call to divert_packet() is done through a function pointer. All
semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to
install divert rules and natd will complain about 'protocol not supported'. Once
it is loaded both will work and accept rules and open the divert socket. The module
can only be unloaded if no divert sockets are open. It does not close any divert
sockets when an unload is requested but will return EBUSY instead.
Add constants for SPI protocol delays that are needed for
target mode.
aic7xxx.c:
Correct a target mode issue that caused an occassional
spurious REQ to be seen on the bus when performing manual
message processing (e.g. transfer rate negotiation).
Enforce phase change bus settle rules with explicit
delays when performing manual message processing in
target mode. The sequencer already did this for
"fast-path", target mode message processing.
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.
MFC after: 3 days
Bumped into by: jmg
protocols in inetsw[] and define initially eight spacer slots.
Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is
now declared and initialized in kern/uipc_domain.c.
With pr_proto_register() it has become possible to dynamically load protocols
within the PF_INET domain. However the PF_INET domain has a second important
structure called ip_protox[] that is derived from the 'struct protosw inetsw[]'
and takes care of the de-multiplexing of the various protocols that ride on
top of IP packets.
The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[]
array mux in a consistent and easy way. To register a protocol within
ip_protox[] the existence of a corresponding and matching protocol definition
in inetsw[] is required. The function does not allow to overwrite an already
registered protocol. The unregister function simply replaces the mux slot with
the default index pointer to IPPROTO_RAW as it was previously.
as the original logic did. This fixes a race with vr_intr() which was
masked on UP systems and manifested on SMP systems.
PR: kern/62889
MFC after: 1 day
families.
The protosw[] array of any particular protocol family ("domain") is of fixed size
defined at compile time. This made it impossible to dynamically add or remove any
protocols to or from it. We work around this by introducing so called SPACER's
which are embedded into the protosw[] array at compile time. The SPACER's have
a special protocol number (32767) to indicate the fact that they are SPACER's but
are otherwise NULL. Only as many protocols can be dynamically loaded as SPACER's
are provided in the protosw[] structure.
The pr_usrreqs structure is treated more special and contains pointers to dummy
functions only returning EOPNOTSUPP. This is needed because the use of those
functions pointers is usually not checked within the kernel because until now it
was assumed to be a valid function pointer. Instead of fixing all potential
callers we just return a proper error code.
Two new functions provide a clean API to register and unregister a protocol. The
register function expects a pointer to a valid and complete struct protosw including
a pointer to struct pru_usrreqs provided by the caller. Upon successful registration
the pr_init() function will be called to finish initialization of the protocol. The
unregister function restores the SPACER in place of the protocol again. It is the
responseability of the caller to ensure proper closing of all sockets and freeing
of memory allocation by the unloading protocol.
sys/protosw.h
o Define generic PROTO_SPACER to be 32767
o Prototypes for all pru_*_notsupp() functions
o Prototypes for pf_proto_[un]register() functions
kern/uipc_domain.c
o Global struct pr_usrreqs nousrreqs containing valid pointers to the
pru_*_notsupp() functions
o New functions pf_proto_[un]register()
kern/uipc_socket2.c
o New functions bodies for all pru_*_notsupp() functions
the ATA pccard locking function. This makes pccard devices like
Compact Flash cards work again.
PR: kern/72805
Submitted by: James E. Flemer <jflemer@alum.rpi.edu>
MFC in: 2 days
frames. BGE hardware with the rx alignment bug will still be handled by the
calls to m_adj() that already exist. m_adj() is probably better suited for
this task anyways. Just as with if_em, this saves a malloc + several locks
per packet and prevents unneeded data copying within busdma.
Since the e1000 DMA engines hava no constraints on the alignment of buffer
transfers, there is no reason to tell busdma that there is. This save a
minimum of 1 malloc call per packet, which translates to eliminating 4 locks.
It also means that buffers are not needlessly bounced when transfered. The
end result is a 38% improvement in pps in a 4 way bridging environment.
Obtained from: Sandvine, Inc.
(usually taking 20 seconds to transmit a packet).. no longer fall back
to only transmitting one packet (instead of the entire queue) after we
have processed the entire send queue... I have no idea why we didn't
start seeing this problem ~6 years ago when this code was introduced...
(sorele()/sotryfree()):
- This permits the caller to acquire the accept mutex before the socket
mutex, avoiding sofree() having to drop the socket mutex and re-order,
which could lead to races permitting more than one thread to enter
sofree() after a socket is ready to be free'd.
- This also covers clearing of the so_pcb weak socket reference from
the protocol to the socket, preventing races in clearing and
evaluation of the reference such that sofree() might be called more
than once on the same socket.
This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket. The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets. The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.
RELENG_5_3 candidate.
MFC after: 3 days
Reviewed by: dwhite
Discussed with: gnn, dwhite, green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>
modes on a tty structure.
Both the ".init" and the current settings are initialized allowing
the function to be used both at attach and open time.
The function takes an argument to decide if echoing should be enabled.
Echoing should not be enabled for regular physical serial ports
unless they are consoles, in which case they should be configured
by ttyconsolemode() instead.
Use the new function throughout.
List of functional changes:
- Make a single device per single node with a single hook.
This gives us parrallelizm, which can't be achieved on a single
node with many devices/hooks. This also gives us flexibility - we
can play with a particular device node, not affecting others.
- Remove read queue as it is. Use struct ifqueue instead. This change
removes a lot of extra memcpy()ing, m_devget()ting and m_copymem()ming.
In ng_device_receivedata() we enqueue an mbuf and wake readers.
In ngdread() we take one mbuf from qeueue and uiomove() it to
userspace. If no mbuf is present we optionally block. [1]
- In ngdwrite() we create an mbuf from uio using m_uiotombuf().
This is faster then uiomove() into buffer, and then m_copydata(),
and this is much better than huge m_pullup().
- Perform locking of device
- Perform locking of connection list.
- Clear out _rcvmsg method, since it does nothing good yet.
- Implement NGM_DEVICE_GET_DEVNAME message.
- #if 0 ioctl method, while nothing is done here yet.
- Return immediately from ngdwrite() if uio_resid == 0.
List of tidyness changes:
- Introduce device2priv(), to remove cut'n'paste.
- Use MALLOC/FREE, instead of malloc/free.
- Use unit2minor().
- Use UID_ROOT/GID_WHEEL instead of 0/0.
- Define NGD_DEVICE_DEVNAME, use it.
- Use more nice macros for debugging. [2]
- Return Exxx, not -1.
style(9) changes:
- No "#endif" after short block.
- Break long lines.
- Remove extra spaces, add needed spaces.
[1] Obtained from: if_tun.c
[2] Obtained from: ng_pppoe.c
Reviewed by: marks
Approved by: julian (mentor)
MFC after: 1 month