as well as document the properties of the mac_policy_conf structure.
Warn about the ABI risks in changing the structure without careful
consideration.
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR
without a mountpoint. In this scenario, there's no useful source for
a label on the vnode, since we can't query the mountpoint for the
labeling strategy or default label.
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it. Here were those hacks that I have curs'd I know not how
oft. Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?
Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
in the g_up and g_down threads. Each time a bio is propelled up and
down the stack, an event is generating showing the provider, offset,
and length, as well as thread wakeup and work status information.
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj. Bufobj_wdrop() replaces vwakeup().
Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.
Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
Initialize the bo_mtx when we allocate a vnode i getnewvnode() For
now we point to the vnodes interlock mutex, that retains the exact
same locking sematics.
Move v_numoutput from vnode to bufobj. Add renaming macro to
postpone code sweep.
modes on a tty structure. Both the ".init" and the current settings
are initialized allowing the function to be used both at attach and
open time.
The function takes an argument to decide if echoing should be enabled
by default. Echoing should not be enabled for regular physical
serial ports unless they are consoles, in which case they should
be configured by ttyconsolemode() instead.
Use the new function throughout.
right bits rather than piggy-backing on the V* rights defined in
vnode.h. The mac_bsdextended bits are given the same values as the V*
bits to make the new kernel module binary compatible with the old
version of libugidfw that uses V* bits. This avoids leaking kernel
API/ABI to user management tools, and in particular should remove the
need for libugidfw to include vnode.h.
Requested by: phk
is locked when vm_page_io_finish() is called on a page. This is to satisfy
a new, post-RELENG_5 assertion in vm_page_io_finish(). (I am in the
process of transitioning the responsibility for synchronizing access to
various fields/flags on the page from the global page queues lock to the
per-object lock.)
Tripped over by: obrien@
with a weak memory model or x86 + PAE (or more specifically, your
driver is using bounce pages) and you have had problems with em(4),
this may fix it. At least this is needed to have em(4) work properly
on FreeBSD/arm.
Original version by: cognet
Reviewed by: tackerman
Tested by: cognet
protocols: it is possible for sockets to be created and attached
to the divert protocol between the test for sockets present and
successful unload of the registration handler. We will need to
explore more mature APIs for unregistering the protocol and then
draining consumers, or an atomic test-and-unregister mechanism.
of protocols. The call to divert_packet() is done through a function pointer. All
semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to
install divert rules and natd will complain about 'protocol not supported'. Once
it is loaded both will work and accept rules and open the divert socket. The module
can only be unloaded if no divert sockets are open. It does not close any divert
sockets when an unload is requested but will return EBUSY instead.
Add constants for SPI protocol delays that are needed for
target mode.
aic7xxx.c:
Correct a target mode issue that caused an occassional
spurious REQ to be seen on the bus when performing manual
message processing (e.g. transfer rate negotiation).
Enforce phase change bus settle rules with explicit
delays when performing manual message processing in
target mode. The sequencer already did this for
"fast-path", target mode message processing.
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.
MFC after: 3 days
Bumped into by: jmg
protocols in inetsw[] and define initially eight spacer slots.
Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is
now declared and initialized in kern/uipc_domain.c.
With pr_proto_register() it has become possible to dynamically load protocols
within the PF_INET domain. However the PF_INET domain has a second important
structure called ip_protox[] that is derived from the 'struct protosw inetsw[]'
and takes care of the de-multiplexing of the various protocols that ride on
top of IP packets.
The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[]
array mux in a consistent and easy way. To register a protocol within
ip_protox[] the existence of a corresponding and matching protocol definition
in inetsw[] is required. The function does not allow to overwrite an already
registered protocol. The unregister function simply replaces the mux slot with
the default index pointer to IPPROTO_RAW as it was previously.
as the original logic did. This fixes a race with vr_intr() which was
masked on UP systems and manifested on SMP systems.
PR: kern/62889
MFC after: 1 day
families.
The protosw[] array of any particular protocol family ("domain") is of fixed size
defined at compile time. This made it impossible to dynamically add or remove any
protocols to or from it. We work around this by introducing so called SPACER's
which are embedded into the protosw[] array at compile time. The SPACER's have
a special protocol number (32767) to indicate the fact that they are SPACER's but
are otherwise NULL. Only as many protocols can be dynamically loaded as SPACER's
are provided in the protosw[] structure.
The pr_usrreqs structure is treated more special and contains pointers to dummy
functions only returning EOPNOTSUPP. This is needed because the use of those
functions pointers is usually not checked within the kernel because until now it
was assumed to be a valid function pointer. Instead of fixing all potential
callers we just return a proper error code.
Two new functions provide a clean API to register and unregister a protocol. The
register function expects a pointer to a valid and complete struct protosw including
a pointer to struct pru_usrreqs provided by the caller. Upon successful registration
the pr_init() function will be called to finish initialization of the protocol. The
unregister function restores the SPACER in place of the protocol again. It is the
responseability of the caller to ensure proper closing of all sockets and freeing
of memory allocation by the unloading protocol.
sys/protosw.h
o Define generic PROTO_SPACER to be 32767
o Prototypes for all pru_*_notsupp() functions
o Prototypes for pf_proto_[un]register() functions
kern/uipc_domain.c
o Global struct pr_usrreqs nousrreqs containing valid pointers to the
pru_*_notsupp() functions
o New functions pf_proto_[un]register()
kern/uipc_socket2.c
o New functions bodies for all pru_*_notsupp() functions
the ATA pccard locking function. This makes pccard devices like
Compact Flash cards work again.
PR: kern/72805
Submitted by: James E. Flemer <jflemer@alum.rpi.edu>
MFC in: 2 days
frames. BGE hardware with the rx alignment bug will still be handled by the
calls to m_adj() that already exist. m_adj() is probably better suited for
this task anyways. Just as with if_em, this saves a malloc + several locks
per packet and prevents unneeded data copying within busdma.
Since the e1000 DMA engines hava no constraints on the alignment of buffer
transfers, there is no reason to tell busdma that there is. This save a
minimum of 1 malloc call per packet, which translates to eliminating 4 locks.
It also means that buffers are not needlessly bounced when transfered. The
end result is a 38% improvement in pps in a 4 way bridging environment.
Obtained from: Sandvine, Inc.
(usually taking 20 seconds to transmit a packet).. no longer fall back
to only transmitting one packet (instead of the entire queue) after we
have processed the entire send queue... I have no idea why we didn't
start seeing this problem ~6 years ago when this code was introduced...
(sorele()/sotryfree()):
- This permits the caller to acquire the accept mutex before the socket
mutex, avoiding sofree() having to drop the socket mutex and re-order,
which could lead to races permitting more than one thread to enter
sofree() after a socket is ready to be free'd.
- This also covers clearing of the so_pcb weak socket reference from
the protocol to the socket, preventing races in clearing and
evaluation of the reference such that sofree() might be called more
than once on the same socket.
This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket. The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets. The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.
RELENG_5_3 candidate.
MFC after: 3 days
Reviewed by: dwhite
Discussed with: gnn, dwhite, green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>
modes on a tty structure.
Both the ".init" and the current settings are initialized allowing
the function to be used both at attach and open time.
The function takes an argument to decide if echoing should be enabled.
Echoing should not be enabled for regular physical serial ports
unless they are consoles, in which case they should be configured
by ttyconsolemode() instead.
Use the new function throughout.
List of functional changes:
- Make a single device per single node with a single hook.
This gives us parrallelizm, which can't be achieved on a single
node with many devices/hooks. This also gives us flexibility - we
can play with a particular device node, not affecting others.
- Remove read queue as it is. Use struct ifqueue instead. This change
removes a lot of extra memcpy()ing, m_devget()ting and m_copymem()ming.
In ng_device_receivedata() we enqueue an mbuf and wake readers.
In ngdread() we take one mbuf from qeueue and uiomove() it to
userspace. If no mbuf is present we optionally block. [1]
- In ngdwrite() we create an mbuf from uio using m_uiotombuf().
This is faster then uiomove() into buffer, and then m_copydata(),
and this is much better than huge m_pullup().
- Perform locking of device
- Perform locking of connection list.
- Clear out _rcvmsg method, since it does nothing good yet.
- Implement NGM_DEVICE_GET_DEVNAME message.
- #if 0 ioctl method, while nothing is done here yet.
- Return immediately from ngdwrite() if uio_resid == 0.
List of tidyness changes:
- Introduce device2priv(), to remove cut'n'paste.
- Use MALLOC/FREE, instead of malloc/free.
- Use unit2minor().
- Use UID_ROOT/GID_WHEEL instead of 0/0.
- Define NGD_DEVICE_DEVNAME, use it.
- Use more nice macros for debugging. [2]
- Return Exxx, not -1.
style(9) changes:
- No "#endif" after short block.
- Break long lines.
- Remove extra spaces, add needed spaces.
[1] Obtained from: if_tun.c
[2] Obtained from: ng_pppoe.c
Reviewed by: marks
Approved by: julian (mentor)
MFC after: 1 month
failure in the NFS server would result in a leaked instance of the NFS
server subsystem lock. Liberally sprinkle assertions in all target
labels for error unwinding to assert the desired locking state.
RELENG_5_3 candidate.
MFC after: 3 days
Reported by: Wilkinson, Alex <alex dot wilkinson at dsto dot defence dot gov dot au>
errors are in rarely executed paths.
1. Each time the retry_alloc path is taken, the PG_BUSY must be set again.
Otherwise vm_page_remove() panics.
2. There is no need to set PG_BUSY on the newly allocated page before
freeing it. The page already has PG_BUSY set by vm_page_alloc().
Setting it again could cause an assertion failure.
MFC after: 2 weeks
vm_page_io_finish(). The motivation being to transition synchronization of
the vm_page's busy field from the global page queues lock to the per-object
lock.
invalidate the TLB(s) if the old mapping wasn't used by the CPU. With
network interfaces that implement checksum off-loading, the old mapping is
almost never used by the CPU, only by the device driver for setting up the
DMA operation.
Reviewed by: tegge@
critical_exit as the process is getting scheduled to run. This is subotimal
but for now avoid the LOR between the scheduler and the sleepq systems.
This is a 5.3 candidate.
Submitted by: davidxu
MFC After: 3 days
constrained to a small number of sessions by the small on-card memories found
in newer devices. This is really a stopgap solution as having session state
in main memory incurs a (small but noticeable) performance penalty. The better
solution is to manage session state so that it's cached on chip.
Obtained from: openbsd
* Get flags first, in case there is no devclass.
* Reset flags after each probe in case the next driver has no hints so it
doesn't inherit the old ones.
* Set them again before the winning probe.
Tested ok both with and without ACPI for ISA device flags.
Reviewed by: imp
MFC after: 1 day
providers for tasting. Before this hack, race below is possible:
SI_SUB_RAID (no not-fully-configured geoms, so don't block)
GEOM tasting (now geoms are created)
SI_SUB_MOUNT_ROOT (if root file system is placed on a mirror, it is
possible that this mirror is not fully configured yet)
There is a lot of work to do to avoid such hacks and I need a working
solution before 5.3, sorry.
Reported by: John Hay <jhay@icomtek.csir.co.za>
We have to use our own destroy_geom method, because default one, which
is a part of geom_slice is broken.
MT5 candidate.
PR: kern/72467
Submitted by: Vladimir Novoseltsev
The changes in the next commit would make the code totally unreadable
if the #ifdef'ing were maintained.
It might make a lot of sense to split if_cx.c in a netgraph related
and in a tty related file but I will not attempt that without hardware.
handle DMA addresses located above 1GB. The LBA(loop begin address)
register which holds DMA base address is 32bits register. But the
MSB 2bits are used for other purposes. This effectivly limits the
DMA address space up to 1GB.
Approved by: jake (mentor)
Reviewed by: truckman, matk
assign DMA address to the wrong address. It can cause system lockup
or other mysterious errors. Since most sound cards requires low DMA
address(BUS_SPACE_MAXADDR_24BIT) sndbuf_alloc() would fail when the
audio driver is loaded after long running of operations.
Approved by: jake (mentor)
Reviewed by: truckman, matk
- Implement dcons_ischar() and dcons_load_buffer().
- If loader passed a dcons buffer address, keep using it.
(We still need a patch to cheat memory management system.)
asynchronous. I realize that this means the custom application will
not work as written, but it is not okay to break most users of ugen(4).
The major problem is that a bulk read transfer is not an interrupt
saying that X bytes are available -- it is a request to be able to
receive up to X bytes, with T timeout, and S short-transfer-okayness.
The timeout is a software mechanism that ugen(4) provides and cannot
be implemented using asynchronous reads -- the timeout must start at
the time a read is done.
The status of up to how many bytes can be received in this transfer
and whether a short transfer returns data or error is also encoded
at least in ohci(4)'s requests to the controller. Trying to detect
the "maximum width" results in using a single buffer of far too
small when an application requests a large read.
Even if you combat this by replacing all buffers again with the
maximal sized read buffer (1kb) that ugen(4) would allow you to
use before, you don't get the right semantics -- you have to
throw data away or make all the timeouts invalid or make the
short-transfer settings invalid.
There is no way to do this right without extending the ugen(4) API
much further -- it breaks the USB camera interfaces used because
they need a chain of many maximal-width transfers, for example, and
it makes cross-platform support for all the BSDs gratuitously hard.
Instead of trying to do select(2) on a bulk read pipe -- which has
neither the information on desired transfer length nor ability to
implement timeout -- an application can simply use a kernel thread
and pipe to turn that endpoint into something poll-able.
It is unfortunate that bulk endpoints cannot provide the same semantics
that interrupt and isochronous endpoints can, but it is possible to just
use ioctl(USB_GET_ENDPOINT_DESC) to find out when different semantics
must be used without preventing the normal users of the ugen(4) device
from working.
The original idea was to use it for firmware upgrading and similar
operations. In real life almost all Bluetooth USB devices do not
need firmware download. If device does require firmware download
then ugen(4) (or specialized driver like ubtbcmfw(8)) should be
used instead.
MFC after: 3 days
in udp_input(), since the udbinfo lock is used to prevent removal of
the inpcb while in use (i.e., as a form of reference count) in the
in-bound path.
RELENG_5 candidate.
- Add a new _lock() call to each API that locks the associated chain lock
for a lock_object pointer or wait channel. The _lookup() functions now
require that the chain lock be locked via _lock() when they are called.
- Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup
the associated queue structure internally via _lookup() rather than
accepting a pointer from the caller. For turnstiles, this means that
the actual lookup of the turnstile in the hash table is only done when
the thread actually blocks rather than being done on each loop iteration
in _mtx_lock_sleep(). For sleep queues, this means that sleepq_lookup()
is no longer used outside of the sleep queue code except to implement an
assertion in cv_destroy().
- Change sleepq_broadcast() and sleepq_signal() to require that the chain
lock is already required. For condition variables, this lets the
cv_broadcast() and cv_signal() functions lock the sleep queue chain lock
while testing the waiters count. This means that the waiters count
internal to condition variables is no longer protected by the interlock
mutex and cv_broadcast() and cv_signal() now no longer require that the
interlock be held when they are called. This lets consumers of condition
variables drop the lock before waking other threads which can result in
fewer context switches.
MFC after: 1 month
it isn't printed if the IP address in question is '0.0.0.0', which is
used by nodes performing DHCP lookup, and so constitute a false
positive as a report of misconfiguration.
processes in jail could create raw sockets, additional access control
checks were added to raw IP sockets to limit the ways in which those
sockets could be used. Specifically, only the socket option IP_HDRINCL
was permitted in rip_ctloutput(). Other socket options were protected
by a call to suser(). This change was required to prevent processes
in a Jail from modifying system properties such as multicast routing
and firewall rule sets.
However, it also introduced a regression: processes that create a raw
socket with root privilege, but then downgraded credential (i.e., a
daemon giving up root, or a setuid process switching back to the real
uid) could no longer issue other unprivileged generic IP socket option
operations, such as IP_TOS, IP_TTL, and the multicast group membership
options, which prevented multicast routing daemons (and some other
tools) from operating correctly.
This change pushes the access control decision down to the granularity
of individual socket options, rather than all socket options, on raw
IP sockets. When rip_ctloutput() doesn't implement an option, it will
now pass the request directly to in_control() without an access
control check. This should restore the functionality of the generic
IP socket options for raw sockets in the above-described scenarios,
which may be confirmed with the ipsockopt regression test.
RELENG_5 candidate.
Reviewed by: csjp
Implement preemption between threads in the same ksegp in out of slot
situations to prevent priority inversion.
Tested by: pho
Reviewed by: jhb, julian
Approved by: sam (mentor)
MFC: ASAP
followed by "make depend" shouldn't do anything. It doesn't
seem to be a problem anymore, and if someone finds it to break
again, please contact me so we can work on a real fix.
Reviewed by: bde
- Sort kmod.mk knobs in the documentation section.
- Fixed misuses of the word "KLD" which stands for
"kernel ld", or "kernel linker", where kernel
module is meant.
- Removed redundant uses of ${.OBJDIR}.
- Whitespace and indentation fixes.
- CLEANFILES cleanup.
- Target redefinition protection (install.debug).
Submitted by: bde, ru
Reviewed by: ru, bde
- push all bridge logic from if_ethersubr.c into bridge.c
make bridge_in() return mbuf pointer (or NULL).
- call only bridge_in() from ether_input(), after ng_ether_input()
was optinally called.
- call bridge_in() from ng_ether_rcv_upper().
Long description: http://lists.freebsd.org/mailman/htdig/freebsd-net/2004-May/003881.html
Reported by: Jian-Wei Wang <jwwang at FreeBSD.csie.NCTU.edu.tw>
Tested by: myself, Sergey Lyubka
Reviewed by: sam
Approved by: julian (mentor)
MFC after: 2 months
interface with the network stack but are not yet sufficiently
synchronized to run without the Giant lock. It migh be possible
to mark the interfaces as IFF_NEEDSGIANT, but I'm unable to test
that configuration and am unfamiliar with the architecture of
i4b.
New devicename is ttyy{unit}{port}
No callout devices created as there is no modemcontrol on these ports.
Add data structure to represent each port to avoid excessive array use.
from within umass_ufi_transform(). This includes the 12-byte commands
FORMAT_UNIT, WRITE_AND_VERIFY, VERIFY, and READ_FORMAT_CAPACITIES
(sorted in numerical order).
Reviewed by: ken, scottl
MFC after: 2 weeks
md(8). The former is generally not going to fail, but the latter can
fail when the underlying swap device returns an error.
There are still plenty of other places where vm_pager_get_pages() failing
will lead directly to crashes, so it's a good idea to put your swap on
RAID if you care enough to put any of your disks on RAID....
by the time that kldload(8) returns. Satisfy that by making the GEOM
module load event -- only when the kernel is !cold -- wait until the
GEOM module init function has finished instead of returning immediately.
This is the other half of fixing md(8) (actually, "mfs" in fstab(5))
that is similar to r1.128 of src/sys/dev/md/md.c. This bug would be
why RAM disks would often fail on boot and the first call to mdconfig(8)
would probably fail.
pjd has ideas for not requiring kldload(8) to work synchronously for
control devices that could make this obsolete.
Silence on: -arch
that conjures up the device node so it isn't true PNP. Noticed by jhb@.
* Add an attachment for esscontrol since it too uses ISA_PNP_PROBE.
* Move an attachment from snd_mss to snd_pnpmss. The latter is the real
PNP user.
sysctl routines and state. Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.
I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.
With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort. Things like netstat, ps, etc have a long way to go.
This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.
Extract the struct cdev pointer and the tty device from inside rather than
incorrectly casting the 'struct cdev *' pointer to a 'dev_t' int. Not
that this was particularly important since it was only used for reading
vmcore files.
that was fixed by this should not normally happen, and since I did not
record the traces of my failed build attempt that had been solved with
that change, it's not entirely clear whether it hadn't been a pilot
error on my end. In dubio pro reo. :-)
* Fix a bug where caches were flushed on non-C3 transitions.
* Be sure a working flush cache instruction is present before using it.
* Disable C3 completely if it isn't present.
well. This field is actually used by various netisr functions to determine
the availablility of the specified netisr. This uncomplete unregister leads
directly to a crash when the KLD unregistering the netisr is unloaded.
Submitted by: Sam <sah@softcardsystems.com>
MFC after: 3 days
remove previous entropy harvesting mutex names as they are no longer
present. Commit to this file was ommitted when randomdev_soft.c:1.5
was made.
Feet shot: Robert Huff <roberthuff at rcn dot com>
Sockets in the listen queues have reference counts of 0, so if the
protocol decides to disconnect the pcb and try to free the socket, this
triggered a race with accept() wherein accept() would bump the reference
count before sofree() had removed the socket from the listen queues,
resulting in a panic in sofree() when it discovered it was freeing a
referenced socket. This might happen if a RST came in prior to accept()
on a TCP connection.
The fix is two-fold: to expand the coverage of the accept mutex earlier
in sofree() to prevent accept() from grabbing the socket after the "is it
really safe to free" tests, and to expand the logic of the "is it really
safe to free" tests to check that the refcount is still 0 (i.e., we
didn't race).
RELENG_5 candidate.
Much discussion with and work by: green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>
may want to shut down here but the chance of BIOS vendors getting this
wrong is high. They're only supposed to announce this when all batteries
hit their critical level but past experience indicates we should be
conservative about this for now.
- Trade off granularity to reduce overhead, since the current model
doesn't appear to reduce contention substantially: move to a single
harvest mutex protecting harvesting queues, rather than one mutex
per source plus a mutex for the free list.
- Reduce mutex operations in a harvesting event to 2 from 4, and
maintain lockless read to avoid mutex operations if the queue is
full.
- When reaping harvested entries from the queue, move all entries from
the queue at once, and when done with them, insert them all into a
thread-local queue for processing; then insert them all into the
empty fifo at once. This reduces O(4n) mutex operations to O(2)
mutex operations per wakeup.
In the future, we may want to look at re-introducing granularity,
although perhaps at the granularity of the source rather than the
source class; both the new and old strategies would cause contention
between different instances of the same source (i.e., multiple
network interfaces).
Reviewed by: markm
reaching into the socket buffer. This prevents a number of potential
races, including dereferencing of sb_mb while unlocked leading to
a NULL pointer deref (how I found it). Potentially this might also
explain other "odd" TCP behavior on SMP boxes (although haven't
seen it reported).
RELENG_5 candidate.
and make it visible (same way as in OpenBSD). Describe usage in manpage.
This change is useful for creating custom free methods, which
call default free method at their end.
While here, make malloc declaration for mbuf tags more informative.
Approved by: julian (mentor), sam
MFC after: 1 month
when the spin lock in question isn't -- it's the critical_enter() that
KDB set. No more panic in DDB for console -> syscons -> tty -> knote
operations.
be used to announce various system activity.
The auxio device provides auxiliary I/O functions and is found on various
SBus/EBus UltraSPARC models. At present, only front panel LED is
controlled by this driver.
Approved by: jake (mentor)
Reviewed by: joerg
Tested by: joerg
state management corruption, mbuf leaks, general mbuf corruption,
and at least on i386 a first level splash damage radius that
encompasses up to about half a megabyte of the memory after
an mbuf cluster's allocation slab. In short, this has caused
instability nightmares anywhere the right kind of network traffic
is present.
When the polymorphic refcount slabs were added to UMA, the new types
were not used pervasively. In particular, the slab management
structure was turned into one for refcounts, and one for non-refcounts
(supposed to be mostly like the old slab management structure),
but the latter was almost always used through out. In general, every
access to zones with UMA_ZONE_REFCNT turned on corrupted the
"next free" slab offset offset and the refcount with each other and
with other allocations (on i386, 2 mbuf clusters per 4096 byte slab).
Fix things so that the right type is used to access refcounted zones
where it was not before. There are additional errors in gross
overestimation of padding, it seems, that would cause a large kegs
(nee zones) to be allocated when small ones would do. Unless I have
analyzed this incorrectly, it is not directly harmful.
with acpi but the timer runs twice as fast. Note that the main problem
(system doesn't work properly with acpi disabled) should be fixed separately.
Changes:
* Add a quirk to disable the timer
* Merge the P5A and P5A-B quirks since they appear to be based on the
same ASL.
PR: i386/72450
Tested by: Kevin Oberman <oberman es.net>
MFC after: 3 days
We return ENOBUF to indicate the problem, which is an errno that should be
handled well everywhere.
Requested & Submitted by: green
Silently okay'ed by: The rest of the firewall gang
MFC after: 3 days
Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)
Reviewed by: tegge@
PR: i386/61852
problem I had but it's happening in code that is messing around with
register windows - I'm willing to live with that piece being sensitive
to this and it looks like the other problems we had reported lately
are not fixed by using -O instead of -O2.
Sorry for the churn. Looks like I need a second pointy hat. Someone
tells me they stack well. :-))))
This also adds support for bigger disks on the controller I have access to,
and maybe others if I understood the adhoc methods used on those.
Those with more PC98 bigdrive controllers it is hereby invited to add/fix
support for those in geom_pc98.c and not using #ifdef PC98 all over the place.
callouts as non-CALLOUT_MPSAFE. Otherwise, they may trigger an
assertion regarding Giant if they enter other parts of the stack from
the callout.
MFC after: 3 days
Reported by: Dikshie < dikshie at ppk dot itb dot ac dot id >
-O2 on kernel compiles after all. While working on adding a KASSERT
to sparc64/sparc64/rwindow.c I found that it was "position sensitive",
putting it above a call to flushw() instead of below caused corruption
of processes on the system. jake and jhb have both confirmed there is
no obvious explanation for that. The exact same kernel code does not
have the process corruption problem if compiled with -O instead of -O2.
There have been signs of similar issues floated on the sparc64@ mailing
list, lets see if this helps make them go away.
Note this isn't an optimal fix as far as the file format goes, if this
disgusts too many people I'll fix it the right way. Since compiling
with something other than -O is a known problem this format would prevent
a change to the default causing grief. And this may also help motivate
finding out what the compiler is doing wrong so we can shift back to
using -O2. :-)
My turn for the pointy hat... One of the florescent ones...
MFC after: 2 days
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.
Discussed with: davidxu, deischen, marcel
MFC after: 3 days
allocate unallocated memory resources from the top 32MB of the address
space rather than the top 2GB. While the latter works on some
chipsets, it fails badly on others. 32MB is more conservative and
matches what cheap harware from this era is hardwired to pass.
RAM. Many older, legacy bridges only allow allocation from this
range. This only appies to devices who don't have their memory
assigned by the BIOS (since we allocate the ranges so assigned
exactly), so should have minimal impact.
Hoewver, for CardBus bridges (cbb), they rarely get the resources
allocated by the BIOS, and this patch helps them greatly. Typically
the 'bad Vcc' messages are caused by this problem.
(and panic). To try to finish making BPF safe, at the very least,
the BPF descriptor lock really needs to change into a reader/writer
lock that controls access to "settings," and a mutex that controls
access to the selinfo/knote/callout. Also, use of callout_drain()
instead of callout_stop() (which is really a much more widespread
issue).
Discussion: this panic (or waning) only occurs when the kernel is
compiled with INVARIANTS. Otherwise the problem (which means that
the vp->v_data field isn't NULL, and represents a coding error and
possibly a memory leak) is silently ignored by setting it to NULL
later on.
Panicking here isn't very helpful: by this time, we can only find
the symptoms. The panic occurs long after the reason for "not
cleaning" has been forgotten; in the case in point, it was the
result of severe file system corruption which left the v_type field
set to VBAD. That issue will be addressed by a separate commit.
all other threads to suicide, problem is execve() could be failed, and
a failed execve() would change threaded process to unthreaded, this side
effect is unexpected.
The new code introduces a new single threading mode SINGLE_BOUNDARY, in
the mode, all threads should suspend themself at user boundary except
the singler. we can not use SINGLE_NO_EXIT because we want to start from
a clean state if execve() is successful, suspending other threads at unknown
point and later resuming them from there and forcing them to exit at user
boundary may cause the process to start from a dirty state. If execve() is
successful, current thread upgrades to SINGLE_EXIT mode and forces other
threads to suicide at user boundary, otherwise, other threads will be resumed
and their interrupted syscall will be restarted.
Reviewed by: julian
table. acpidump(8) concatenates the body of the DSDT and SSDTs so an
edited ASL will contain all the necessary information. We can't use a
completely empty table since ACPI-CA reports this as a problem.
MFC after: 3 days
This really doesn't belong here but is preferred (for the moment) over
adding yet another mechanism for sending msgs from the kernel to user apps.
Reviewed by: imp
the raw values including for child process statistics and only compute the
system and user timevals on demand.
- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.
Submitted by: bde (mostly)
MFC after: 1 month
to control the packets injected while in sack recovery (for both
retransmissions and new data).
- Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c.
- Add a new sysctl (net.inet.tcp.sack.initburst) that controls the
number of sack retransmissions done upon initiation of sack recovery.
Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
turnstile chain lock until after making all the awakened threads
runnable. First, this fixes a priority inversion race. Second, this
attempts to finish waking up all of the threads waiting on a turnstile
before doing a preemption.
Reviewed by: Stephan Uphoff (who found the priority inversion race)
+ * 9: 0x3f0-0x3f3,0x3f4-0x3f5,0x3f7
This requires only one change to support. Rather than keying on the
size of the resource being 2, instead key off the end & 7 being 3.
This covers the same cases that the size of 2 would catch, but also
covers the new above case.
In addition, I think it is clearer to use the end in preference to the
size and start for case #8 as well. Turns two tests into one, and
catches no other cases.
Make minor commentary changes to deal with new case #9.
# This change is specifically minimal to allow easy MFC. A more
# extensive change will go into current once I've had a chance to test
# it on a lot of hardware...
with different file systems. This may cause ill things
with my previous fix. Now it translate fsid of direct child of
mount point directory only.
Pointed out by: Uwe Doering