are set when we attempt to remove a buffer from a queue we should panic.
Hopefully this will catch the source of the wrong bufobj panics.
Sponsored by: Isilon Systems, Inc.
needed only for implicit connect cases. Under load, especially on SMP,
this can greatly reduce contention on the tcbinfo lock.
NB: Ambiguities about the state of so_pcb need to be resolved so that
all use of the tcbinfo lock in non-implicit connection cases can be
eliminated.
Submitted by: Kazuaki Oda <kaakun at highway dot ne dot jp>
a new entry in the taskqueue struct each time it wakes up to see if it
should terminate
o adjust TASKQUEUE_DEFINE_THREAD & co. to record the thread/proc identity for
the shutdown rendezvous
o replace wakeup after adding a task to a queue with wakeup_one; this helps
queues where multiple threads are used to service tasks (e.g. acpi)
o remove NULL check of tq_enqueue method; it should never be NULL
Reviewed by: dfr, njl
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.
KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.
Submitted by: ups
vtryrecycle(). We could sometimes get into situations where two threads
could try to recycle the same vnode before this.
- vtryrecycle() is now responsible for returning the vnode to the free list
if it fails and someone else hasn't done it.
- Make a new function vfreehead() which moves a vnode to the head of the
free list and use it in vgone() to clean up that code a bit.
Sponsored by: Isilon Systems, Inc.
Reported by: pho, kkenn
due to a change made in revision 1.284 of sys/kern/kern_sig.c in August
2004 which made ptracestop() return with the process still locked.
Submitted by: Mauritz Sundell
MFC After: 3 days
mutexes, which offers lower overhead on both UP and SMP. When allocating
from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
no longer perform any mutex operations, which offers a 1%-3% performance
improvement in a variety of micro-benchmarks. We rely on critical
sections to prevent (a) preemption resulting in reentrant access to UMA on
a single CPU, and (b) migration of the thread during access. In the event
we need to go back to the zone for a new bucket, we release the critical
section to acquire the global zone mutex, and must re-acquire the critical
section and re-evaluate which cache we are accessing in case migration has
occurred, or circumstances have changed in the current cache.
Per-CPU cache statistics are now gathered lock-free by the sysctl, which
can result in small races in statistics reporting for caches.
Reviewed by: bmilekic, jeff (somewhat)
Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others
theoretically unload pci bridges or pci drivers. It will also allow
detach to work if one needed to detach a subtree.
This is inspired by looking at the p4 commits from bms to his 5.4
tree, but I didn't look at the final results.
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c: In function `fr_ipid_newfrag':
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c:397: warning: cast to pointer from integer of different size
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c: In function `fr_ipid_knownfrag':
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c:582: warning: cast from pointer to integer of different size
for the VGA I/O or memory ranges, when it's not within the default
ranges decoded by the bridge. When allocation for VGA addresses is
attempted, check that the bridge has the VGA Enable bit set before
allowing it.
As such, newbusified VGA drivers can allocate their resources when
the VGA adapter is behind a PCI-to-PCI bridge.
Reviewed by: imp@, jhb@
Only allow a process to use the x86 RDPMC instruction if it has
allocated and attached a PMC to itself.
Inform the MD layer of the "pseudo context switch out" that needs
to be done when the last thread of a process is exiting.
Move crc32() and crc32_raw() from the latter to the former. Move
the declaration of crc32_tab[] to <sys/libkern.h> as well.
Pointed out by: bde@
Tested on: ia64, sparc64
CRC logic to a new function: crc32_raw() that obtains the initial
CRC value as well as leaves any post-processing to the caller. As
such, it can be used when the initial CRC value is not ~0U or when
the final CRC value does not need to be inverted (bitwise). It also
means that crc32_raw() can be called repeatedly when the data is
not available as a single block, such as for scatter/gather lists
and the likes.
Avoid the additional call overhead incurred by the refactoring by
moving the implementation of crc32() to sys/systm.h and making it
inlinable. Since crc32_raw() is itself trivial and since it may
be used in loops that iterate over fragments, having it available
for inlining can be beneficial. Hence, move its implementation
to sys/systm.h as well.
Keep the original implementation of crc32() in libkern/crc32.c for
documentation purposes (as a comment of course).
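The split described above can be sketched in userland C as follows. This is a minimal illustration, not the kernel code: the ex_ names, the fragment structure and the run-time table construction are assumptions made for the example (the kernel uses a static crc32_tab[]). It shows the wrapper supplying the ~0U initial value and the final inversion, and the raw function being chained across fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's crc32_tab[], built at run time. */
static uint32_t ex_crc32_tab[256];
static int ex_crc32_tab_ready;

/* Build the lookup table for the reflected CRC-32 polynomial 0xEDB88320. */
static void
ex_crc32_init(void)
{
	uint32_t c;
	int i, k;

	for (i = 0; i < 256; i++) {
		c = (uint32_t)i;
		for (k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320U : c >> 1;
		ex_crc32_tab[i] = c;
	}
	ex_crc32_tab_ready = 1;
}

/*
 * Core loop only: the caller supplies the initial CRC value and is
 * responsible for any post-processing of the result.
 */
static uint32_t
ex_crc32_raw(const void *buf, size_t size, uint32_t crc)
{
	const uint8_t *p = buf;

	assert(buf != NULL || size == 0);
	if (!ex_crc32_tab_ready)
		ex_crc32_init();
	while (size--)
		crc = ex_crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
	return (crc);
}

/* Conventional CRC-32: initial value ~0U, result inverted bitwise. */
static uint32_t
ex_crc32(const void *buf, size_t size)
{
	return (ex_crc32_raw(buf, size, ~0U) ^ ~0U);
}

/* Illustrative fragment descriptor, e.g. one scatter/gather element. */
struct ex_frag {
	const void *base;
	size_t len;
};

/* Chain the raw CRC across fragments, inverting once at the end. */
static uint32_t
ex_crc32_frags(const struct ex_frag *frags, size_t nfrags)
{
	uint32_t crc = ~0U;
	size_t i;

	for (i = 0; i < nfrags; i++)
		crc = ex_crc32_raw(frags[i].base, frags[i].len, crc);
	return (crc ^ ~0U);
}
```

Feeding the same bytes through ex_crc32() in one block or through ex_crc32_frags() in several pieces yields the same value, which is the property that makes the raw variant usable for data that is not contiguous.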
Triggered by: Jose M Rodriguez (josemi at freebsd dot jazztel dot es)
Discussed on: current@
Tested on: amd64, ia64 (BVO having GPT partitions)
Jargon file candidate: BVO = By Virtue Of :-)
fact that access to RR0 does not need a prior write to the register
index because the index always reverts to 0 after the indexed register
has been accessed.
Typically when a RR or WR is to be accessed, one programs the index (which
is a write to the control register), followed by a read or write to the
actual indexed register (a read or write to the same control register).
When this non-atomic sequence is interrupted after having written the
index and low-level console I/O is done in that situation, the write to
program the index will actually write to the indexed register and nuke
state. This almost always yields a wedge.
By not programming the index register and instead just reading from RR0,
the worst case scenario is non-fatal. For if we don't actually read from
RR0 but some other register we get an invalid status, which may lead us
to conclude that the transmit data register is empty when it's not or that
the receive data register contains data when it doesn't. Hence, we may
lose an output character or get a sporadic input character, but given
the situation this is a non-issue.
Full serialization is not possible due to the fact that this code needs
to work from DDB and before mutex initialization has happened.
In collaboration with: kris@, marius@
Tested by: kris@
MFC after: 1 day
X-MFC: 5.4-RELEASE candidate
the MNT_RDONLY flag if the "ro" option was passed in from userland, and
clears it otherwise. In the diskless case, the MNT_RDONLY flag is already
set when this code is reached, but there are no mount options, so it was
incorrectly cleared. Change the logic so the MNT_RDONLY flag is set if the
"ro" option was specified, and left alone otherwise.
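The change in logic can be sketched as below. This is an illustration only: EX_MNT_RDONLY and both helper functions are hypothetical names, and the flag value is chosen for the example rather than taken from the mount code.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MNT_RDONLY 0x1u	/* illustrative flag bit, not the real constant */

/* Old behaviour: set the flag for "ro", clear it in every other case. */
static unsigned
ex_old_apply_ro(unsigned flags, bool has_ro_opt)
{
	if (has_ro_opt)
		flags |= EX_MNT_RDONLY;
	else
		flags &= ~EX_MNT_RDONLY;
	return (flags);
}

/* New behaviour: set the flag for "ro", otherwise leave it alone. */
static unsigned
ex_new_apply_ro(unsigned flags, bool has_ro_opt)
{
	if (has_ro_opt)
		flags |= EX_MNT_RDONLY;
	return (flags);
}

/* Diskless case: the flag is already set and no mount options exist. */
static void
ex_diskless_demo(void)
{
	/* The old logic loses the read-only flag... */
	assert((ex_old_apply_ro(EX_MNT_RDONLY, false) & EX_MNT_RDONLY) == 0);
	/* ...while the new logic preserves it. */
	assert((ex_new_apply_ro(EX_MNT_RDONLY, false) & EX_MNT_RDONLY) != 0);
}
```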
Note that the NFS code will still happily let you mount a filesystem RW
even if the server exports it RO. I'm not sure how to fix that.
caller will be interested in the actual data contents of a vnode after
a successful lookup. This is intended to help deal with lifetime issues
for device cloning and to alert autofs when filesystems need to be
mounted.
fields of an ICMP packet.
Use this to allow ipfw to pullup only these values since it does not use
the rest of the packet and it was failing on ICMP packets because they
were not long enough.
struct icmp should probably be modified to use these at some point, but
that will break a fair bit of code so it can wait for another day.
On the off chance that adding this struct breaks something in ports,
bump __FreeBSD_version.
Reported by: Randy Bush <randy at psg dot com>
Tested by: Randy Bush <randy at psg dot com>
during a data phase. Before, we would try to recover the autosense, but
the DMA engine would still be active with interrupted transfer, and we'd
quickly spiral out of control and cause massive data corruption. For now,
just reset the chip and cancel everything. The better solution is to
cancel the DMA operation, but there is no clear way to do that right now.
The data corruption problem is severe enough to warrant this fix in the
interim. Thanks to Kris Kenneway for sacrificing countless filesystems to
this bug.
MFC After: 3 days
the number of entries in exec_map (maximum number of simultaneous execs
that can be handled by the kernel). The default value of 16 is
insufficient on heavily loaded machines (particularly SMP machines), and
if it is exceeded then executing further processes will generate a SIGABRT.
This is a workaround until a better solution can be implemented.
Reviewed by: alc
MFC after: 3 days
on boards with VIA gigE controllers that are embedded in VIA chipsets.
Presumably, they don't have an external EEPROM and store the MAC
address somewhere else. To get around this, force an autoload and
read the station address from the RX filter registers instead.
This has been tested to work on both embedded and standalone
controllers.
While there also check for failed device_add_child calls.
Found by: Coventry Analysis tool[1].
Submitted by: sam[1]
Approved by: pjd (mentor)
MFC after: 1 week
here on in, if_ndis.ko will be pre-built as a module, and can be built
into a static kernel (though it's not part of GENERIC). Drivers are
created using the new ndisgen(8) script, which uses ndiscvt(8) under
the covers, along with a few other tools. The result is a driver module
that can be kldloaded into the kernel.
A driver with foo.inf and foo.sys files will be converted into
foo_sys.ko (and foo_sys.o, for those who want/need to make static
kernels). This module contains all of the necessary info from the
.INF file and the driver binary image, converted into an ELF module.
You can kldload this module (or add it to /boot/loader.conf) to have
it loaded automatically. Any required firmware files can be bundled
into the module as well (or converted/loaded separately).
Also, add a workaround for a problem in NdisMSleep(). During system
bootstrap (cold == 1), msleep() always returns 0 without actually
sleeping. The Intel 2200BG driver uses NdisMSleep() to wait for
the NIC's firmware to come to life, and fails to load if NdisMSleep()
doesn't actually delay. As a workaround, if msleep() (and hence
ndis_thsuspend()) returns 0, use a hard DELAY() to sleep instead.
This is not really the right thing to do, but we can't really do much
else. At the very least, this makes the Intel driver happy.
There are probably other drivers that fail in this way during bootstrap.
Unfortunately, the only workaround for those is to avoid pre-loading
them and kldload them once the system is running instead.
modify-after-free races when the task structure is malloc'd
o shrink task structure by removing ta_flags (no longer needed with
avoid fix) and combining ta_pending and ta_priority
Reviewed by: dwhite, dfr
MFC after: 4 days
inherit signal mask from parent thread, setup TLS and stack, and
user entry address.
Also support POSIX threads' PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM;
a sysctl is also provided to control the scheduler scope.
in other code. Add cpu_set_user_tls, use it to tweak the user register
and set up user TLS. I had wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads, which may
not need this feature, I wrote a separate cpu_set_user_tls.
- Introduce a global mutex, mac_bsdextended_mtx, to protect the rule
array and hold this mutex over use and modification of the rule array
and rules.
- Re-order and clean up sysctl_rule so that copyin/copyout/update happen
in the right order (suggested by: jhb done by rwatson).
destination windows were confused, one instead of other.
This error was masked, because first segment of just
established connection is usually smaller than initially
announced window, and it was successfully passed. First
window reannouncement corrected erroneous 'seqhi' value.
The error showed up when client connected to synproxy
with zero initial window, and reannounced it after
session establishment.
In collaboration with: dhartmei [we came to same patch independently]
Reviewed by: mlaier
Sponsored by: Rambler
MFC after: 3 days
instead of assuming fixed offsets within the GDT. The hard-coded
values here have been incorrect since Peter's GDT rearranging around
10 days ago, causing ACPI resume problems.
Reviewed by: peter
16C950. Adding it here doesn't unlock any of the cool 16C950 features
(like the 128 byte fifo, the different prescaler, etc), but it does
seem to get it working for me in light testing.
Card Provided by: Ihsan Dogan
o Remove the clock interface. Not only does it conflict with the MI
version when device genclock is added to the kernel, it was also
not possible to have more than 1 clock device. This of course would
have been a problem if we actually had more than 1 clock device.
In short: we don't need a clock interface and if we do eventually,
we should be using the MI one.
o Rewrite inittodr() and resettodr() to take into account that:
1) We use the EFI interface directly.
2) time_t is 64-bit and we do need to make sure we can determine
leap years from year 2100 and on. Add a nice explanation of
where leap years come from and why.
3) This rewrite happened in 2005 so any date prior to 1/1/2005
(either M/D/Y or D/M/Y) is bogus. Reprogram the EFI clock with
1/1/2005 in that case.
4) The EFI clock has a high probability of being correct, so
only (further) correct the EFI clock when the file system time
is larger. That should never happen in a time-synchronised world.
Complain when EFI lost 2 days or more.
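The leap year rule referred to in item 2 can be sketched as follows. This is a standalone illustration with hypothetical function names, not the code from the rewritten inittodr()/resettodr(). It applies the Gregorian rule, under which century years are leap years only when divisible by 400, so 2100 is not a leap year even though it is divisible by 4.

```c
#include <assert.h>

/* Gregorian rule: every 4th year, except centuries not divisible by 400. */
static int
ex_is_leap_year(int year)
{
	assert(year > 0);
	if (year % 400 == 0)
		return (1);
	if (year % 100 == 0)
		return (0);
	return (year % 4 == 0);
}

/* Number of days in the given year, as needed when converting to time_t. */
static int
ex_days_in_year(int year)
{
	return (ex_is_leap_year(year) ? 366 : 365);
}
```

A simple "year % 4 == 0" test gives the right answer for every year from 1901 through 2099, which is why the shortcut only becomes a problem once time_t can represent dates from 2100 on.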
Replace the copyright notice now that I (pretty much) rewrote all of
this file.
pumping data despite our scsi data counters being at 0, something has
gone massively wrong. The consequence of happily ignoring this is more
DMA phase errors and a disk full of spammed sectors. Instead, panic on
the first occurrence to hopefully limit the damage.
MFC After: 3 days
do not correctly deal with failures. This presently risks deadlock
problems if dependency processing is held up by failures to allocate
a vnode; however, this is better than the situation with the failures.
Sponsored by: Isilon Systems, Inc.
If TCP Signatures are enabled, the maximum allowed sack blocks aren't
going to fit. The fix is to compute how many sack blocks fit and tack
these on last. Also on SYNs, defer padding until after the SACK
PERMITTED option has been added.
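The "compute how many sack blocks fit" step can be sketched as below. This is a rough illustration rather than the code in tcp_output(): the EX_ constants and the function name are hypothetical. It relies only on the 40-byte limit on TCP option space and on the SACK option layout of a 2-byte header followed by 8 bytes per block.

```c
#include <assert.h>

#define EX_TCP_MAXOLEN		40	/* maximum TCP option space, bytes */
#define EX_TCPOLEN_SACKHDR	2	/* SACK option kind + length bytes */
#define EX_TCPOLEN_SACK		8	/* one block: left and right edge */

/*
 * Given the option bytes already consumed (including padding) and the
 * number of SACK blocks we would like to send, return how many fit.
 */
static int
ex_sack_blocks_that_fit(int optlen_used, int nblocks_wanted)
{
	int room, fit;

	assert(optlen_used >= 0 && optlen_used <= EX_TCP_MAXOLEN);
	assert(nblocks_wanted >= 0);

	room = EX_TCP_MAXOLEN - optlen_used - EX_TCPOLEN_SACKHDR;
	if (room < EX_TCPOLEN_SACK)
		return (0);
	fit = room / EX_TCPOLEN_SACK;
	return (fit < nblocks_wanted ? fit : nblocks_wanted);
}
```

Assuming Timestamps padded to 12 bytes and the 18-byte Signature option padded to 20, 32 bytes are already used and no SACK block fits. With only the Signature option present, two blocks fit instead of the usual four.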
Found by: Mohan Srinivasan.
Submitted by: Mohan Srinivasan, Noritoshi Demizu.
Reviewed by: Raja Mukerji.
code readability and facilitates some anticipated optimizations in
tcp_sack_option().
- Remove tcp_print_holes() and TCP_SACK_DEBUG.
Submitted by: Raja Mukerji.
Reviewed by: Mohan Srinivasan, Noritoshi Demizu.
- If the peer sends the Signature option in the SYN, use of Timestamps
and Window Scaling was disabled (even if the peer supports them).
- The sender must not disable signatures if the option is absent in
the received SYN. (See comment in syncache_add()).
Found, Submitted by: Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>.
Reviewed by: Mohan Srinivasan <mohans at yahoo-inc dot com>.
tcp_ctlinput() and subject it to active tcpcb and sequence
number checking. Previously any ICMP unreachable/needfrag
message would cause an update to the TCP hostcache. Now only
ICMP PMTU messages belonging to an active TCP session with
the correct src/dst/port and sequence number will update the
hostcache and complete the path MTU discovery process.
Note that we don't entirely implement the recommended
countermeasures of Section 7.2 of the paper. However we close down
the possible degradation vector from trivially easy to really
complex and resource intensive. In addition we have limited
the smallest acceptable MTU with net.inet.tcp.minmss sysctl
for some time already, further reducing the effect of any
degradation due to an attack.
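The kind of sequence number check described above can be sketched as follows. This is an assumption-laden illustration, not the test performed in tcp_ctlinput(): the structure, the macro names and the exact window used (between the oldest unacknowledged and the highest sent sequence number) are hypothetical. It shows why a spoofed ICMP message must carry a plausible sequence number to be accepted, using wraparound-safe comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe comparisons on 32-bit sequence numbers. */
#define EX_SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define EX_SEQ_GEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) >= 0)

/* Hypothetical subset of per-connection send state. */
struct ex_conn {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_max;	/* highest sequence number sent so far + 1 */
};

/*
 * Accept the sequence number quoted in an ICMP message only if it
 * refers to data that is currently in flight on this connection.
 */
static bool
ex_icmp_seq_plausible(const struct ex_conn *c, uint32_t icmp_seq)
{
	assert(c != NULL);
	return (EX_SEQ_GEQ(icmp_seq, c->snd_una) &&
	    EX_SEQ_LT(icmp_seq, c->snd_max));
}
```

An attacker who cannot observe the connection has to guess a value inside this window in addition to the address and port pair, which is what moves the attack from trivially easy to resource intensive.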
Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.2
MFC after: 3 days
latest 82550 and 82551 chipsets (revision IDs 0x0e, 0x0f and 0x10).
We were only enabling it for revisions 0x0c and 0x0d, now it's
enabled for any 8255x NIC with a revision ID bigger than 0x0c. It
should be safe, and this is what Intel does in their open source
driver.
MFC after: 2 weeks
Tested by: Pavel Lobach lobach_pavel at mail dot ru
ineffective, deprecated and can be abused to degrade the performance
of active TCP sessions if spoofed.
Replace a bogus call to tcp_quench() in tcp_output() with the direct
equivalent tcpcb variable assignment.
Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.1
MFC after: 3 days
This also removes the warning timeout on the taskqueues stalling as
I'm tired of getting ATA error reports for problems in other parts ;)
Misc cosmetic and comment cleanups now we are here.
number of task threads to start on boot. Go back to a default of 3
threads to work around lost battery state problems. Users that need
a setting of 1 can set this via the tunable. I am investigating the
underlying issues and this tunable can be removed once they are solved.
MFC after: 2 days
HWPMC_HOOKS is defined. The pmc_cpu_is_*() functions in this file
are referenced unconditionally by hwpmc(4).
This is mostly a stop-gap. The pmc_cpu_is_*() functions should
probably be declared inline in <sys/pmc.h> or <sys/pmckern.h> and
the function pointers with corresponding SX lock should probably
be moved to another file and compiled conditionally upon HWPMC_HOOKS.
Ok'd by: jkoshy@
includes the MD header for us. Do not include <machine/specialreg.h>
as it is not a header file that can be included from MI files. It
is included from <machine/pmc_mdep.h> if so needed and possible.
Ok'd: jkoshy@
inclusion of <sys/pmc.h> and depending on being included from
that header file.
o Include any MD specific header files that otherwise need to be
included from MI files.
Ok'd: jkoshy@
<machine/pmc_mdep.h> here.
o Remove the #error directive. There's no union md_pm referenced
on (as of yet) unsupported platforms and will not be if there
are no MD extensions for a particular platform.
Further cleanups can be expected.
Ok'd: jkoshy@