* block packets that fail to create state table entries
* only allow non-fragmented packets to influence whether or not a logged
packet is the same as the one logged before.
* correct the ICMP packet checksum fix-up when processing ICMP errors for NAT
* implement a maximum for the number of entries in the NAT table (NAT_TABLE_MAX
and ipf_nattable_max)
* frsynclist() wasn't paying attention to all the places where interface
  names are stored, as it should.
* fix comparing ICMP packets with established TCP state where only 8 bytes
of header are returned in the ICMP error.
MFC after: 1 week
* Obtain/release schedlock around calls to calcru.
* Sort switch cases which do not cascade per style(9).
* Sort local variables per style(9).
* Remove "superfluous" whitespace.
* Cleanup handling of NULL uap->tp in clock_getres(). It would probably
be better to return EFAULT like clock_gettime() does by passing the
pointer to copyout(), but I presume it was written to not fail on
purpose in the original code. I'll defer to -standards on this one.
Reported by: bde
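An illustrative sketch of the copyout()-based alternative mentioned above
(resolution computation simplified, clock_id validation elided; not the
committed code):

    int
    clock_getres(struct thread *td, struct clock_getres_args *uap)
    {
            struct timespec ts;

            /* CLOCK_REALTIME resolution shown; validation elided. */
            ts.tv_sec = 0;
            ts.tv_nsec = 1000000000 / tc_getfrequency();
            if (uap->tp == NULL)
                    return (0);     /* current behaviour: silently succeed */
            /*
             * Handing the user pointer straight to copyout() would instead
             * return EFAULT for a bad (or NULL) tp, matching clock_gettime().
             */
            return (copyout(&ts, uap->tp, sizeof(ts)));
    }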
This is not really used by the process but it's confusing to some
status readers to see zombie processes with "running" threads.
Pointed out by: Don Lewis <truckman@FreeBSD.org>
the GEOM topology.
There are still issues with not detaching from cam correctly such that
upon a device detach there's an invalid pointer dereference from the
later call to cam_rescan().
that are on a CISS bus to be exported up to CAM and made available as normal
devices. This will typically add one or two buses to CAM, which will be
numbered starting at 32 to allow room for CISS proxy buses. Also, the CISS
firmware usually hides disk devices, but these can also be exposed as 'pass'
devices if you set the hw.ciss.expose_hidden_physical tunable.
Sponsored by: Tape Laboratories, Inc.
MFC After: 3 days
where it is known to detect a problem but the problem is not very easy
to fix. The warning became very common recently after a call to calcru()
was added to fill_kinfo_thread().
Another (much older) cause of "negative times" (actually non-monotonic
times) was fixed in rev.1.237 of kern_exit.c.
Print separate messages for non-monotonic and negative times.
from exit1(). sched_exit() must be called unconditionally from exit1().
It was called almost unconditionally anyway, since init only exits on
system shutdown, if at all.
(2) Removed the comment that presumed to know what sched_exit() does.
sched_exit() does different things for the ULE case. The call became
essential when it started doing load average stuff, but its caller
should not know that.
(3) Didn't fix bugs caused by bitrot in the condition. The condition was
last correct in rev.1.208 when it was in wait1(). There p was spelled
curthread->td_proc and was for the waiting parent; now p is for the
exiting child. The condition was to avoid lowering init's priority.
It should be in sched_exit() itself. Lowering of priorities is broken
in other ways in at least the 4BSD scheduler, and doing it for init
causes less noticeable problems than doing it for shells.
Noticed by: julian (1)
least the pci device unloadable
- Use ttymalloc() rather than a plain malloc to allocate the
rp->rp_tty ttys. This is now required due to the recent locking
changes to ttys and prevents a panic due to locking an uninitialized
t_mtx.
- Allow the pci driver to be unloaded. This involved moving
the call to rp_releaseresource() to the end of rp_pcireleaseresource(),
since rp_pcireleaseresource() uses ctlp->dev, which is freed
by rp_releaseresource().
- Allow the generic part of the driver to be unattached by providing
a hook to cancel timeouts.
Glanced at by: obrien
- sowakeup() now asserts the socket buffer lock on entry. Move
the call to KNOTE higher in sowakeup() so that it is made with
the socket buffer lock held for consistency with other calls.
Release the socket buffer lock prior to calling into pgsigio(),
so_upcall(), or aio_swake(). Locking for this event management
will need revisiting in the future, but this model avoids lock
order reversals when upcalls into other subsystems result in
socket/socket buffer operations. Assert that the socket buffer
lock is not held at the end of the function.
- Wrapper macros for sowakeup(), sorwakeup() and sowwakeup(), now
have _locked versions which assert the socket buffer lock on
entry. If a wakeup is required by sb_notify(), invoke
sowakeup(); otherwise, unconditionally release the socket buffer
lock. This results in the socket buffer lock being released
whether a wakeup is required or not.
- Break out socantsendmore() into socantsendmore_locked() that
asserts the socket buffer lock. socantsendmore()
unconditionally locks the socket buffer before calling
socantsendmore_locked(). Note that both functions return with
the socket buffer unlocked as socantsendmore_locked() calls
sowwakeup_locked() which has the same properties. Assert that
the socket buffer is unlocked on return.
- Break out socantrcvmore() into socantrcvmore_locked() that
asserts the socket buffer lock. socantrcvmore() unconditionally
locks the socket buffer before calling socantrcvmore_locked().
Note that both functions return with the socket buffer unlocked
as socantrcvmore_locked() calls sorwakeup_locked() which has
similar properties. Assert that the socket buffer is unlocked
on return.
- Break out sbrelease() into a sbrelease_locked() that asserts the
socket buffer lock. sbrelease() unconditionally locks the
socket buffer before calling sbrelease_locked().
sbrelease_locked() now invokes sbflush_locked() instead of
sbflush().
- Assert the socket buffer lock in socket buffer sanity check
functions sblastrecordchk(), sblastmbufchk().
- Assert the socket buffer lock in SBLINKRECORD().
- Break out various sbappend() functions into sbappend_locked()
(and variations on that name) that assert the socket buffer
lock. The !_locked() variations unconditionally lock the socket
buffer before calling their _locked counterparts. Internally,
make sure to call _locked() support routines, etc, if already
holding the socket buffer lock.
- Break out sbinsertoob() into sbinsertoob_locked() that asserts
the socket buffer lock. sbinsertoob() unconditionally locks the
socket buffer before calling sbinsertoob_locked().
- Break out sbflush() into sbflush_locked() that asserts the
socket buffer lock. sbflush() unconditionally locks the socket
buffer before calling sbflush_locked(). Update panic strings
for new function names.
- Break out sbdrop() into sbdrop_locked() that asserts the socket
buffer lock. sbdrop() unconditionally locks the socket buffer
before calling sbdrop_locked().
- Break out sbdroprecord() into sbdroprecord_locked() that asserts
the socket buffer lock. sbdroprecord() unconditionally locks
the socket buffer before calling sbdroprecord_locked().
- sofree() now calls socantsendmore_locked() and re-acquires the
socket buffer lock on return. It also now calls
sbrelease_locked().
- sorflush() now calls socantrcvmore_locked() and re-acquires the
socket buffer lock on return. Clean up/mess up other behavior
in sorflush() relating to the temporary stack copy of the socket
buffer used with dom_dispose by more properly initializing the
temporary copy, and selectively bzeroing/copying more carefully
to prevent WITNESS from getting confused by improperly
initialized mutexes. Annotate why that's necessary, or at
least, needed.
- soisconnected() now calls sbdrop_locked() before unlocking the
socket buffer to avoid locking overhead.
Some parts of this change were:
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
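For illustration, a minimal sketch of the locked/unlocked wrapper pattern
described above, using socantsendmore() as the example (not the literal
committed code):

    void
    socantsendmore_locked(struct socket *so)
    {

            SOCKBUF_LOCK_ASSERT(&so->so_snd);

            so->so_state |= SS_CANTSENDMORE;
            sowwakeup_locked(so);           /* note: drops the sockbuf lock */
            mtx_assert(SOCKBUF_MTX(&so->so_snd), MA_NOTOWNED);
    }

    void
    socantsendmore(struct socket *so)
    {

            SOCKBUF_LOCK(&so->so_snd);
            socantsendmore_locked(so);      /* returns with the lock released */
            mtx_assert(SOCKBUF_MTX(&so->so_snd), MA_NOTOWNED);
    }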
ld: locore.o: non-pic code with imm relocation against dynamic
symbol `__gp'
With binutils 2.15, ld(1) defines the implicit/automatic symbol __gp
as a dynamic symbol and thus will now complain when used in a non-PIC
fashion (the immediate relocation used to set the GP register). Resolve
this by defining __gp in the linker script. Make sure __gp is aligned
on a 16-byte boundary.
Note: the 0x200000 magic offset is due to having a 22-bit GP-relative
relocation. The GOT will be accessed with negative offsets from GP.
that it is a series of alphabetically-ordered #ifdef's, from Bruce Evans.
Define two new thread-related values in kinfo_proc, from Cyrille Lefevre.
Also remove a few values from kinfo_proc that were not needed, and change
around a few comments, from me. Changes are combined into a single commit
simply because it is a hassle to make sure that alignments and sizes are
not changed on any platform when modifying kinfo_proc.
socket from its accept queue when aborting it during a new inbound
connection. Update spx_input() to acquire the accept lock, assert
the condition of the socket on its parent queue, and appropriately
disconnect it from the queue before calling soabort() on it.
Tweak things so that ng_fec has a chance of working with things
other than ethernet. Use ifp->if_output of the underlying interfaces
and use IF_HANDOFF() rather than depending on ether_output() and
ether_output_frame() explicitly. Also, don't insist that underlying
devices be IFM_ETHER when checking their link states in the link
monitor code.
With these changes, I was able to create a two channel bundle
consisting of one ethernet interface and one 802.11 wireless
device (via ndis). Note that this only works because both devices
use the same if_output vector: ng_fec will not let you bundle
devices with different output vectors together (it really doesn't
make sense to do that).
underlying interfaces rather than using ac_netgraph in struct arpcom.
The latter is meant only for use by ng_ether, and using it breaks
interoperability with the rest of netgraph.
socket lock over pulling so_options and so_linger out of the socket
structure in order to retrieve a consistent snapshot. This may be
overkill if user space doesn't require a consistent snapshot.
resolved by socket locking: in particular, that we test the connection
state at the socket layer without locking, request that the protocol
begin listening, and then set the listen state on the socket
non-atomically, resulting in a non-atomic cross-layer test-and-set.
lots of errors. Blind substitution of "dev_t foo" by "struct cdev *foo"
in comments usually just created an English syntax error (e.g.,
"struct cdev *changes"), but here it did less than that since the dev_t
is a user dev_t.
in npxsetregs() too. npxsetregs() must overwrite the previous state, and
it is never paired with an npxgetregs() that would defuse the previous
state (since npxgetregs() would have fninit'ed the state, leaving nothing
to do).
PR: 68058 (this should complete the fix)
Tested by: Simon Barner <barner@in.tum.de>
to vm_map_find() that is less likely to be outside of addressable memory
for 32-bit processes: just past the end of the largest possible heap.
This is the same hint that mmap() uses.
ki_childutime, and ki_emul. Also uses the timevaladd() routine to
correct the calculation of ki_childtime. That will correct the value
returned when ki_childtime.tv_usec > 1,000,000.
This also implements a new KERN_PROC_GID option for kvm_getprocs().
(there will be a similar update to lib/libkvm/kvm_proc.c)
Submitted by: Cyrille Lefevre
which just mark areas which are empty due to issues with the alignment
of already-existing fields. This defines several unrelated variables
in one shot, because most of the work for updating kinfo_proc is making
sure the sizeof(struct kinfo_proc) remains the same across all hardware
platforms, and that no space is wasted on any platform due to alignment
issues with the new variables.
Submitted by: some by Cyrille Lefevre, some by me
unnecessary because cpu_setregs() and/or npxinit() always sets CR0_TS
during system initialization, and CR0_TS is set in the next statement
(fpstate_drop()) if necessary after system initialization. Setting
it unnecessarily was worse than a pessimization, since it broke the
invariant that the npx can be used without an npxdna() trap if
fpucurthread is non-null. The broken invariant became harmful when I
added an fnclex to npxdrop().
Removed setting of CR0_MP in exec_setregs(). This was similarly
unnecessary but was harmless.
Updated comments (mainly by removing them). Things are simpler now
that we have cpu_setregs() and don't support a math emulator or pretend
to support not having either a math emulator or an npx.
Removed the ifdef for avoiding setting CR0_NE in the !SMP case in
cpu_setregs(). npx_probe() should reverse the setting if it wants to
force IRQ13 exception handling for testing.
lock state. Convert tsleep() into msleep() with socket buffer mutex
as argument. Hold socket buffer lock over sbunlock() to protect sleep
lock state.
Assert socket buffer lock in sbwait() to protect the socket buffer
wait state. Convert tsleep() into msleep() with socket buffer mutex
as argument.
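A sketch of the tsleep()-to-msleep() conversion described above, using
sbwait() as the example (illustrative):

    int
    sbwait(struct sockbuf *sb)
    {

            SOCKBUF_LOCK_ASSERT(sb);

            sb->sb_flags |= SB_WAIT;
            /* msleep() atomically releases and re-takes sb_mtx around the
             * sleep, where the old tsleep() could miss a wakeup. */
            return (msleep(&sb->sb_cc, &sb->sb_mtx,
                (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait",
                sb->sb_timeo));
    }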
Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK()
in order to call into these functions with the lock, as well as to
start protecting other socket buffer use in their implementation. Drop
the socket buffer mutexes around calls into the protocol layer, around
potentially blocking operations, for copying to/from user space, and
VM operations relating to zero-copy. Assert the socket buffer mutex
strategically after code sections or at the beginning of loops. In
some cases, modify return code to ensure locks are properly dropped.
Convert the potentially blocking allocation of storage for the remote
address in soreceive() into a non-blocking allocation; we may wish to
move the allocation earlier so that it can block prior to acquisition
of the socket buffer lock.
Drop some spl use.
NOTE: Some races exist in the current structuring of sosend() and
soreceive(). This commit only merges basic socket locking in this
code; follow-up commits will close additional races. As merged,
these changes are not sufficient to run without Giant safely.
Reviewed by: juli, tjr
ip_ctloutput(), as it may need to perform blocking memory allocations.
This also improves consistency with locking relative to other points
that call into ip_ctloutput().
Bumped into by: Grover Lines <grover@ceribus.net>
output to permanently (not ephemerally) go to the console. It is also
sent to any other console specified by TIOCCONS as normal.
While I'm here, document the kern.log_console_output sysctl.
NULL ifc.ifc_buf pointer, to determine the expected buffer size.
The submitted fix only takes account of interfaces with an AF_INET
address configured. This could no doubt be improved.
PR: kern/45753
Submitted by: Jacques Garrigue (with cleanups)
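A userland sketch of the probe-for-size idiom this enables (hypothetical
helper; note the caveat above that only AF_INET addresses are counted):

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <stdlib.h>

    static int
    get_ifconf(int s, struct ifconf *ifc)
    {

            ifc->ifc_buf = NULL;    /* NULL buffer: kernel reports needed size */
            ifc->ifc_len = 0;
            if (ioctl(s, SIOCGIFCONF, ifc) == -1)
                    return (-1);
            ifc->ifc_buf = malloc(ifc->ifc_len);
            if (ifc->ifc_buf == NULL)
                    return (-1);
            return (ioctl(s, SIOCGIFCONF, ifc));    /* second call fills it in */
    }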
was received on a broadcast address on the input path. Under certain
circumstances this could result in a panic, notably for locally-generated
packets which do not have m_pkthdr.rcvif set.
This is a similar situation to that which is solved by
src/sys/netinet/ip_icmp.c rev 1.66.
PR: kern/52935
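The guard amounts to something like the following sketch (illustrative
helper, not the exact diff):

    static int
    input_is_broadcast(struct mbuf *m, struct ip *ip)
    {

            /*
             * Locally generated packets may have a NULL rcvif, so only
             * consult in_broadcast() when an input interface is present.
             */
            return (m->m_pkthdr.rcvif != NULL &&
                in_broadcast(ip->ip_dst, m->m_pkthdr.rcvif));
    }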
be suspended in thread_suspend_check(); after they are resumed, all
threads will call thread_single(), but only one can succeed; the
others should retry and will exit in thread_suspend_check().
wasn't actually clean; it was saving the xmm registers as left over by
the BIOS. fninit() doesn't clear those.
In fpudna(), instead of doing a fninit() and forgetting to load the initial
mxcsr, do a full fxrstor(&fpu_cleanstate). Otherwise we hand over whatever
random values are left in the xmm registers by the last user.
I'm not certain of whether this is excessive paranoia or not, but there was
an outright bug in neglecting to set the mxcsr value that caused awk to
SIGFPE in some case. Especially for Tim Robbins. :-)
i386 probably should do something about the mxcsr settings too.
Found by: tjr
encapsulated within an IPv6 datagram, do not abuse the 'ipov' pointer
when registering trace records. 'ipov' is specific to IPv4, and
will therefore be uninitialized.
[This fandango is only necessary in the first place because of our
host-byte-order IP field pessimization.]
PR: kern/60856
Submitted by: Galois Zheng
rwatson_netperf:
Introduce conditional locking of the socket buffer in fifofs kqueue
filters; KNOTE() will be called holding the socket buffer locks in
fifofs, but sometimes the kqueue() system call will poll using the
same entry point without holding the socket buffer lock.
Introduce conditional locking of the socket buffer in the socket
kqueue filters; KNOTE() will be called holding the socket buffer
locks in the socket code, but sometimes the kqueue() system call
will poll using the same entry points without holding the socket
buffer lock.
Simplify the logic in sodisconnect() since we no longer need spls.
NOTE: To remove conditional locking in the kqueue filters, it would
make sense to use a separate kqueue API entry into the socket/fifo
code when calling from the kqueue() system call.
unless the segment really contains the last of the data for the stream.
PR: kern/34619
Obtained from: OpenBSD (tcp_output.c rev 1.47)
Noticed by: Joseph Ishac
Reviewed by: George Neville-Neil
frstor can trap despite it being a control instruction, since it bogusly
checks for pending exceptions in the state that it is overwriting.
This used to be a non-problem because frstor was always paired with a
previous fnsave, and fnsave does an implicit fninit so any pending
exceptions only remain live in the saved state. Now frstor is sometimes
paired with npxdrop() and we must do a little more than just forget
that the npx was used in npxdrop() to avoid a trap later. This is a
non-problem in the FXSR case because fxrstor doesn't do the bogus check.
FXSR is part of SSE, and npxdrop() is only in FreeBSD-5.x, so this bug
only affected old machines running FreeBSD-5.x.
PR: 68058
- Lock down low hanging fruit use of sb_flags with socket buffer
lock.
- Lock down low hanging fruit use of so_state with socket lock.
- Lock down low hanging fruit use of so_options.
- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
socket buffer lock.
- Annotate situations in which we unlock the socket lock and then
grab the receive socket buffer lock, which are currently actually
the same lock. Depending on how we want to play our cards, we
may want to coalesce these lock uses to reduce overhead.
- Convert a if()->panic() into a KASSERT relating to so_state in
soaccept().
- Remove a number of splnet()/splx() references.
More complex merging of socket and socket buffer locking to
follow.
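The if()->panic() conversion mentioned above follows this pattern (sketch;
the flag shown is the one checked in soaccept()):

    /* Before: an unconditional runtime check. */
    if ((so->so_state & SS_NOFDREF) == 0)
            panic("soaccept: !NOFDREF");

    /* After: an assertion, compiled away in kernels without INVARIANTS. */
    KASSERT(so->so_state & SS_NOFDREF, ("soaccept: !NOFDREF"));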
devclass will be present even if the driver was disabled by a hint. Using
device_get_softc() provides the right info even if it's overkill.
Explained by: jhb
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
- prevent an endless loop with route-to lo0, fixes PR 3736 (dhartmei@)
- The rule_number parameter for pf_get_pool() needs to be 32 bits, not 8 -
this fixes corruption of the address pools with large rulesets.
(mcbride@, pb@)
Reviewed-by: dhartmei
timeout values in the CAM CCBs. Divide by 1000 to get values in seconds
which are what ata(4) timeouts internally use.
This does lose granularity, though, and small values can now round down
to zero. It's probably worth making all ata(4) timeouts in terms of
hz/ticks/milliseconds/something.
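The conversion is essentially the following (sketch; "request" stands for
an ata(4) request structure whose timeout field is in seconds):

    /*
     * CCB timeouts are in milliseconds; ata(4) wants seconds.  Integer
     * division loses granularity, so sub-second CCB timeouts become 0.
     */
    request->timeout = ccb->ccb_h.timeout / 1000;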
Version 3.5 brings:
- Atomic commits of ruleset changes (reduce the chance of ending up in an
inconsistent state).
- A 30% reduction in the size of state table entries.
- Source-tracking (limit number of clients and states per client).
- Sticky-address (the flexibility of round-robin with the benefits of
source-hash).
- Significant improvements to interface handling.
- and many more ...
- Remove pflog and pfsync modules. Things will change in such a fashion
that there will be one module with pf+pflog that can be loaded into
GENERIC without problems (which is what most people want). pfsync is no
longer possible as a module.
- Add multicast address for in-kernel multicast pfsync protocol. Protocol
glue will follow once the import is done.
- Add one more mbuf tag
placates gcc which seems to like to complain about -1 being assigned to
an unsigned value. It is well defined and intended, but since signedness bugs
are being hunted just change to 0xffffffff.
o Mask the lower 8 bits, not the lower 4 bits for the ai_capabilities word.
All 8 bits are defined and the 0xf was almost certainly a typo.
o Define APM_UNKNOWN to 0xff for emulation layer.
do not pick up the first local ip address for the source
ip address, return ENETUNREACH instead.
Submitted by: Gleb Smirnoff
Reviewed by: -current (silence)
mode tunnel, take the per-route MTU into account, *if* and *only if* it
is non-zero (as found in struct rt_metrics/rt_metrics_lite).
PR: kern/42727
Obtained from: NetBSD (ip_input.c rev 1.151)
fixes the problem of UDP sockets getting wedged in a connected state (and
bound to their destination) under heavy load.
Temporary bind/connect should probably be deleted in future
as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993].
Notes:
- INP_LOCK() is already held in udp_output(). The connection is in effect
happening at a layer lower than the socket layer, therefore in theory
socket locking should not be needed.
- Inlining the in_pcbdisconnect() operation buys us nothing (in the case
of the current state of the code), as laddr is not part of the
inpcb hash or the udbinfo hash. Therefore there should be no need
to rehash after restoring laddr in the error case (this was a
concern of the original author of the patch).
PR: kern/41765
Requested by: gnn
Submitted by: Jinmei Tatuya (with cleanups)
Tested by: spray(8)
so_timeo Used as a sleep/wakeup address, no locking.
sb_* Almost all socket buffer fields locked with
sockbuf lock for the socket buffer.
so_cred Static after socket creation.
using rawcb_mtx. Hold this mutex while modifying or iterating over
the control list; this means that the mutex is held over calls into
socket delivery code, which no longer causes a lock order reversal as
the routing socket code uses a netisr to avoid recursing socket ->
routing -> socket.
Note: Locking of IPsec consumers of rawcb_list is not included in this
commit.
ALTQ-enabled versions of IFQ_* macros by default, as requested by several
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.
other modules to explode. eg: snd_ich->snd_pcm and umass->usb.
The problem was that I was using the unified base address of the module
instead of finding the start address of the section in question.
1. Remove a race whereby contigmalloc() would deadlock against the
running processes in the system if they kept reinstantiating
the memory on the active and inactive page queues that it was
trying to flush out. The process doing the contigmalloc() would
sit in "swwrt" forever and the swap pager would be going at full
force, but never get anywhere. Instead of doing it until the
queues are empty, launder for as many iterations as there are
pages in the queue.
2. Do all laundering to swap synchronously; previously, the vnode
laundering was synchronous and the swap laundering not.
3. Increase the number of launder-or-allocate passes to three, from
two, while failing without bothering to do all the laundering on
the third pass if allocation was not possible. This effectively
gives exactly two chances to launder enough contiguous memory,
helpful with high memory churn where a lot of memory from one pass
to the next (and during a single laundering loop) becomes dirtied
again.
I can now reliably hot-plug hardware requiring a 256KB contigmalloc()
without having the kldload/cbb ithread sit around failing to make
progress, while running a busy X session. Previously, it took killing
X to get contigmalloc() to get further (that is, quiescing the system),
and even then contigmalloc() returned failure.
live in src/lib/libc/gmon/gmon.c. glibc puts these prototypes in the same
header, so put them here for the sake of consistency.
PR: bin/4459
Reviewed by: bde
success and a proper errno value on failure. This makes it
consistent with cv_timedwait(), and paves the way for the
introduction of functions such as sema_timedwait_sig() which can
fail in multiple ways.
Bump __FreeBSD_version and add a note to UPDATING.
Approved by: scottl (ips driver), arch
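A sketch of the new return convention from a caller's point of view
(foo_softc and sc_cmd_sema are hypothetical):

    static int
    foo_wait_cmd(struct foo_softc *sc)
    {
            int error;

            /* Now: 0 on success, an errno (e.g. EWOULDBLOCK on timeout)
             * on failure, matching cv_timedwait(). */
            error = sema_timedwait(&sc->sc_cmd_sema, 30 * hz);
            if (error != 0)
                    return (error);
            return (0);
    }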
flags relating to several aspects of socket functionality. This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties. The
following fields are moved to sb_state:
SS_CANTRCVMORE (so_state)
SS_CANTSENDMORE (so_state)
SS_RCVATMARK (so_state)
Rename respectively to:
SBS_CANTRCVMORE (so_rcv.sb_state)
SBS_CANTSENDMORE (so_snd.sb_state)
SBS_RCVATMARK (so_rcv.sb_state)
This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts). In the future, we may
wish to coalesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
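A sketch of what the rename means for consumers (illustrative helper):

    static int
    soreadable_example(struct socket *so)
    {
            int cantrcv;

            /*
             * Before the split:  if (so->so_state & SS_CANTRCVMORE) ...
             * After it, the bit lives in the receive buffer's sb_state and
             * is covered by the socket buffer lock:
             */
            SOCKBUF_LOCK(&so->so_rcv);
            cantrcv = (so->so_rcv.sb_state & SBS_CANTRCVMORE) != 0;
            SOCKBUF_UNLOCK(&so->so_rcv);
            return (cantrcv);
    }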
allocated resources should be ultimately freed in gv_destroy_geom()
(when unloading the module and not earlier), but I need to look at this
more closely.
I added bounds checking to the patch and cg improved
the formula.
Submitted by: Andriy Gapon <avg@icyb.net.ua>
PR: kern/65485
Approved by: cg
Reviewed by: imp, rwatson, le
I've had this sitting in my tree for a long time and I can't seem to
find who sent it to me in the first place, apologies to whoever is
missing out on a Contributed by: line here.
I believe it works as it should.
In the end drivers should be building with ALTQ checks by default, but for
now build them with the old macros for non-ALTQ kernels.
Note: Check new features w/ LINT *and* w/ LINT minus the new feature.
Found-by: rwatson
allocation was passed up to nexus. Now, we probe sysresource objects and
manage the resources they describe in a local rman pool. This helps
devices which attach/detach varying resources (like the _CST object) and
module loads/unloads. The allocation/release routines now check to see if
the resource is described in a child sysresource object and if so,
allocate from the local rman. Sysresource objects add their resources to
the pool and reserve them upon boot. This means sysresources need to be
probed before other ACPI devices.
Changes include:
* Add ordering to the child device probe. The current order is: system
resource objects, embedded controllers, then everything else.
* Make acpi_MatchHid take a handle instead of a device_t arg.
* Replace acpi_{get,set}_resource with the generic equivalents.
- Define type rlim_t and struct timeval. This makes autoconf
happier. (PR: 62388)
- Add RLIMIT_AS, which is an alias for our RLIMIT_VMEM.
- structs orlimit and loadavg, as well as macros CP*, should
only appear if __BSD_VISIBLE.
- Use underscored versions of int32_t and fixpt_t in case
<sys/types.h> is not included.
- Document areas of non-conformance.
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
separate commit. Same with the driver changes, which need case-by-case
evaluation.
__FreeBSD_version bump will follow.
Tested-by: (i386)LINT
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supersede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.
in a TAILQ. Re-arrange some of the ecb elements so that they can stay
stable through alloc/free cycles while the rest get bzero'd.
- Use the tag_id from the ecb rather than from the ccb. The latter is only
for target mode.
- Honor the ccb flags for tag_action when deciding whether to do a tagged
or untagged transaction.
- Re-arrange autosense completion so that it works correctly in failure
cases.
- Turn on the PI_TAG_ABLE flag so that CAM will send us tagged transactions.
This enables tagged queueing in the driver.
SOCK_LOCK(so):
- Hold socket lock over calls to MAC entry points reading or
manipulating socket labels.
- Assert socket lock in MAC entry point implementations.
- When externalizing the socket label, first make a thread-local
copy while holding the socket lock, then release the socket lock
to externalize to userspace.
order definition for witness. Send lock before receive lock, and
socket locks after accept but before select:
filedesc -> accept -> so_snd -> so_rcv -> sellck
All routing locks after send lock:
so_rcv -> radix node head
All protocol locks before socket locks:
unp -> so_snd
udp -> udpinp -> so_snd
tcp -> tcpinp -> so_snd
before calling sotryfree().
-- Body of earlier bulk commit this belonged with --
Log:
Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.
- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
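A minimal sketch of taking and dropping a socket reference under the
socket lock, as described above (hypothetical helper):

    static void
    socket_ref_example(struct socket *so)
    {

            SOCK_LOCK(so);
            soref(so);              /* asserts SOCK_LOCK(so) internally */
            SOCK_UNLOCK(so);

            /* ... use the socket ... */

            SOCK_LOCK(so);
            sorele(so);             /* drops our reference; whether or not it
                                       frees the socket, the lock is released */
    }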
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.
Reviewed by: tegge@
rig a PREPEND macro for ALTQ as the POLL/DEQUEUE semantic is very bad in
terms of locking. We make this a full functional queue to allow "bulk
dequeue" which will further reduce the locking overhead (for non-altq
enabled devices). Drivers will access this via the following macros, which
will show up in <net/if_var.h> once we expose ALTQ to the build:
IFQ_DRV_DEQUEUE(ifq, m) - takes a mbuf off the queue (driver queue first)
IFQ_DRV_PREPEND(ifq, m) - pushes a mbuf back to the driver queue
IFQ_DRV_PURGE(ifq) - drops all packets in both queues
IFQ_DRV_IS_EMPTY(ifq) - checks for pending mbufs in either queue
One has to make sure that the first three are protected by a driver mutex.
At the moment most network drivers still require Giant, so this is not an
issue. Even those that have their own mutex usually hold it in if_start and
the like, so this requirement is almost always satisfied.
This evolved from a discussion with Andrew Gallatin.
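A sketch of a driver if_start loop using the new macros (foo_*, FOO_LOCK
and foo_encap() are hypothetical driver-private names):

    static void
    foo_start(struct ifnet *ifp)
    {
            struct foo_softc *sc = ifp->if_softc;
            struct mbuf *m;

            FOO_LOCK(sc);           /* driver mutex covers the driver queue */
            while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
                    IFQ_DRV_DEQUEUE(&ifp->if_snd, m);
                    if (m == NULL)
                            break;
                    if (foo_encap(sc, m) != 0) {
                            /* No room in the tx ring: push the mbuf back. */
                            IFQ_DRV_PREPEND(&ifp->if_snd, m);
                            break;
                    }
            }
            FOO_UNLOCK(sc);
    }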
protect fields in the socket buffer. Add accessor macros to use the
mutex (SOCKBUF_*()). Initialize the mutex in soalloc(), and destroy
it in sodealloc(). Add addition, add SOCK_*() access macros which
will protect most remaining fields in the socket; for the time being,
use the receive socket buffer mutex to implement socket level locking
to reduce memory overhead.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
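A sketch of the mutex setup and teardown pairing described above
(hypothetical helpers wrapping what soalloc()/sodealloc() do):

    static void
    sockbuf_locks_init(struct socket *so)
    {

            SOCKBUF_LOCK_INIT(&so->so_snd, "so_snd");
            SOCKBUF_LOCK_INIT(&so->so_rcv, "so_rcv");
    }

    static void
    sockbuf_locks_destroy(struct socket *so)
    {

            SOCKBUF_LOCK_DESTROY(&so->so_snd);
            SOCKBUF_LOCK_DESTROY(&so->so_rcv);
    }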
that the command succeeded. Sheesh! This makes CDROMs no longer cause an
instant panic at boot. Thanks to Jake Burkholder for providing a remote
test setup.
Also make device resets work, thanks to another typo.
pass any traffic. Unfortunately this means no full-duplex link with auto-
negotiation on hme(4) using DP83840A PHYs again.
I really thought I had tested this also on a Netra t1 100...
- add locking
- disable ALTQ3_COMPAT by default (do not remove the code to keep the diff
towards KAME small)
- put some more code under ALTQ3 conditional compilation as it should be
- account for if_xname
- some more minor compile fixes
As people started wondering:
The strange path layout "altq/altq" is there to avoid "-Isys/contrib" and
make it "-Isys/contrib/altq" instead, as we will need at least <altq/altq.h>
and <altq/if_altq.h> for kernel compilation.
The "freebsd4_..." in the privious commit is just the best tag name in the
KAME tree I could find to classify this in order to track its history. It
does *not* mean that this will go to 4-STABLE or anything of that kind.
HEAD at this point). This will not exactly live in a vendor branch, but have
the vendor backing to make it easier to exchange diffs.
This will be followed by a diff which takes most of the .c files off the
vendor branch in order to:
- add locking
- disable ALTQ3_COMPAT code (which is outdated and "un-lockable")
There is work in progress to refine the configuration API. Import this "as
is" now to have more exposure time before 5-STABLE.
This is only the import, it will be some more days until you will actually
be able to compile ALTQ support into your kernel so don't hold your breath.
HEADUPs will be posted on current@ and net@ before this is actually enabled.
No-objection: re(scottl), core(rwatson)
ruleset, the pcb is looked up once per ipfw_chk() activation.
This is done by extracting the required information out of the PCB
and caching it to the ipfw_chk() stack. This should greatly reduce
PCB looking contention and speed up the processing of UID/GID based
firewall rules (especially with large UID/GID rulesets).
Some very basic benchmarks were taken which compares the number
of in_pcblookup_hash(9) activations to the number of firewall
rules containing UID/GID based constraints before and after this patch.
The results can be viewed here:
o http://people.freebsd.org/~csjp/ip_fw_pcb.png
Reviewed by: andre, luigi, rwatson
Approved by: bmilekic (mentor)
driver tries to submit the same request repeatedly, on finding the
controller cmd queue to be full.
Submitted by: ps, vkashyap
Reviewed by: re
Approved by: re
ifp->if_output() based on debug.mpsafenet. That way once bpfwrite()
can be called without Giant, it will acquire Giant (if desired) before
entering the network stack.
the need to synchronize access to the structure. I believe this
should fit into the stack under the necessary circumstances, but
if not we can either add synchronization or use a thread-local
malloc for the duration.
versions of various routers seen:
- Introduce igmp_mtx.
- Protect global variable 'router_info_head' and list fields
in struct router_info with this mutex, as well as
igmp_timers_are_running.
- find_rti() asserts that the caller acquires igmp_mtx.
- Annotate a failure to check the return value of
MALLOC(..., M_NOWAIT).
is the actual name here) on EBus and which are PCF8584 (on systems having
a boot-bus controller the i2c are said to not be a PCF8584). Similar to the
SUNW,envctrl devices, onboard slaves for monitoring fans, temperatures and
such hang off of these i2c devices. But there's also stuff like EEPROMs
housing the hostid of the system and the boards usually have a connector to
add custom slave devices (on CP1500 there's actually a second PCF8584 with
its own I2C bus for these).
This driver already works fine but I'm not yet sure if access to the slave
devices on CP1400/CP1500 marked as "reserved for factory use" in the docs
should be blocked (most likely these are the voltage controllers which aren't
meant to be controlled by software and not even by the firmware). Once the
issues with polled mode are fixed in the common pcf(4) part in pcf.c, this
front-end should probably honour the poll-mode property of the i2c devices.
Tested on Ultra AXe and CP1500 (Netra t1 100).
OK'ed by: joerg, nsouch
- Use "envctrl" as the name when registering this module rather than "pcf";
we can't have "pcf" as the name for all pcf(4) front-ends or we would get
conflicts.
OK'ed by: joerg
- s,pcf_,pcf_isa, to better reflect the purpose of this front-end and to
avoid conflicts.
- Don't use this front-end for attaching to EBus, declaring it as an EBus
driver was a cut&paste accident according to joerg.
OK'ed by: joerg, nsouch
global and allocated variables. This strategy is derived from work
originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam
Leffler:
- Add unp_mtx, a global mutex which will protect all UNIX domain socket
related variables, structures, etc.
- Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
- Acquire unp_mtx on entering most UNIX domain socket code,
drop/re-acquire around calls into VFS, and release it on return.
- Avoid performing sodupsockaddr() while holding the mutex, so in general
move to allocating storage before acquiring the mutex to copy the data.
- Make a stack copy of the xucred rather than copying out while holding
unp_mtx. Copy the peer credential out after releasing the mutex.
- Add additional assertions of vnode locks following VOP_CREATE().
A few notes:
- Use of an sx lock for the file list mutex may cause problems with regard
to unp_mtx when garbage collection passed file descriptors.
- The locking in unp_pcblist() for sysctl monitoring is correct subject to
the unpcb zone not returning memory for reuse by other subsystems
(consistent with similar existing concerns).
- Sam's version of this change, as with the BSD/OS version, made use of
both a global lock and per-unpcb locks. However, in practice, the
global lock covered all accesses, so I have simplified out the unpcb
locks in the interest of getting this merged faster (reducing the
overhead but not sacrificing granularity in most cases). We will want
to explore possibilities for improving lock granularity in this code in
the future.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS 5 snapshot provided by BSDi
present and thus that the PnPBIOS probe should be skipped instead of
having ACPI zero out the PnPBIOStable pointer.
- Make the PnPBIOStable pointer static to i386/i386/bios.c now that that is
the only place it is used.
its primary use is for the FEPS/FAS366 SCSI found in Sun Ultra 1e and 2
machines. Once the pci front-end is ported, this driver can replace the
amd(4) driver.
The code as-is is fairly stable. I've disabled tagged-queueing until I can
figure out a corruption bug related to it. I'm importing it now so that
people with these machines can (finally) stop netbooting and report bugs
before 5.3.
as otherwise the junk it contains may cause uhub_explore to give
up without ever trying to restart the port. This fixes the following
errors I was seeing with a VIA UHCI controller:
uhub0: port error, restarting port 1
uhub0: port error, giving up port 1
(time grows downward)
  thread 1       thread 2
------------|------------
dec ref_cnt |
            | dec ref_cnt <-- ref_cnt now zero
cmpset      |
free all    |
return      |
            |
alloc again,|
reuse prev  |
ref_cnt     |
            | cmpset, read
            | already freed
            | ref_cnt
------------|------------
This should fix that by performing only a single
atomic test-and-set that will serve to decrement
the ref_cnt, only if it hasn't changed since the
earlier read, otherwise it'll loop and re-read.
This forces ordering of decrements so that truly
the thread which did the LAST decrement is the
one that frees.
This is how atomic-instruction-based refcnting
should probably be handled.
Submitted by: Julian Elischer
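A sketch of the single test-and-set decrement described above (the names
follow the mbuf external-refcount code, but this is not the literal macro
and mb_free_ext() stands in for the actual free path):

    static __inline void
    mext_rem_ref(struct mbuf *m)
    {
            u_int old;

            do {
                    old = *m->m_ext.ref_cnt;
            } while (atomic_cmpset_int(m->m_ext.ref_cnt, old, old - 1) == 0);
            /*
             * Only the thread whose cmpset took the count from 1 to 0 may
             * free, so the decrement and the decision to free are one
             * atomic step.
             */
            if (old == 1)
                    mb_free_ext(m);         /* hypothetical free path */
    }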
internal reference counters, UMA_ZONE_NOFREE. This way, those slabs
(with their ref counts) will be effectively type-stable, then using
a trick like this on the refcount is no longer dangerous:
MEXT_REM_REF(m);
if (atomic_cmpset_int(m->m_ext.ref_cnt, 0, 1)) {
        if (m->m_ext.ext_type == EXT_PACKET) {
                uma_zfree(zone_pack, m);
                return;
        } else if (m->m_ext.ext_type == EXT_CLUSTER) {
                uma_zfree(zone_clust, m->m_ext.ext_buf);
                m->m_ext.ext_buf = NULL;
        } else {
                (*(m->m_ext.ext_free))(m->m_ext.ext_buf,
                    m->m_ext.ext_args);
                if (m->m_ext.ext_type != EXT_EXTREF)
                        free(m->m_ext.ref_cnt, M_MBUF);
        }
}
uma_zfree(zone_mbuf, m);
Previously, a second thread hitting the above cmpset might
actually read the refcnt AFTER it has already been freed. A very
rare occurrence. Now we'll know that it won't be freed, though.
Spotted by: julian, pjd
all of the interface between the driver and the bus. This will enable
us to stop special casing eisa bus attachments in modules and treat them
like we treat all other busses.
In the longer run, we need to eliminate much (all?) of these interfaces
and switch to using the standard bus_alloc_resource(), but that's not
done right now.
# I've not updated the modules to include eisa, etc, just yet
Tested on: Compaq Proliant 3000/333 purchased for eisa work
mode. The 5704 apparently has some s00p3r s33kr1t registers for setting
the advertisement of pause frame ability (i.e flow control) when in
autoneg mode. If we don't set these registers correctly, we may not
be able to negotiate a proper link with some switches. (Symptom is that
the NIC reports the link as up (PCS synched) but no traffic can be
exchanged.)
PR: kern/67598
Add two new functions: ttyref() and ttyrel(). ttymalloc() creates a struct
tty with a reference count of one. When ttyrel() sees the count go to zero,
struct tty is freed.
Hold references for open ttys and for ttys which are controlling terminal
for sessions.
Until drivers start using ttyrel(), this commit will make no difference.
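A sketch of how a driver would use the new reference counting (foo_softc
and sc_tty are hypothetical):

    static void
    foo_tty_example(struct foo_softc *sc)
    {

            sc->sc_tty = ttymalloc(NULL);   /* comes back with one reference */
            ttyref(sc->sc_tty);             /* take an extra reference */

            ttyrel(sc->sc_tty);             /* drop the extra reference */
            ttyrel(sc->sc_tty);             /* last reference gone: the tty
                                               is freed */
            sc->sc_tty = NULL;
    }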