89660 Commits

Author SHA1 Message Date
adrian
17c1a9e4d5 Store away the RTS aggregate limit from the HAL.
This will be used by some upcoming code to ensure that aggregates
are enforced to be a certain size.  The AR5416 has a limitation on
RTS protected aggregates (8KiB).
2012-04-07 02:51:53 +00:00
adrian
8e4ce17ba2 Remove duplicate txflags field from ath_buf.
rename bf_state.bfs_flags to bf_state.bfs_txflags, as that is what
it effectively is.
2012-04-07 02:01:26 +00:00
nwhitehorn
a8563567c4 Execute an initial ptesync if and only if the PTE is actually being
invalidated, as opposed to a ref/changed bit update.
2012-04-06 22:33:13 +00:00
ken
f5030474fb Change the SCSI INQUIRY peripheral qualifier that CTL reports for LUNs
that don't exist.

Anecdotal evidence indicates that it is better to return 011b (bad LUN)
than 001b (LUN offline).  However, this change also gives the user a
sysctl/tunable, kern.cam.ctl.inquiry_pq_no_lun, to override the change
and return to the previous behavior.  (The previous behavior was to
return 001b, or LUN offline.)

ctl.c:		Change the default inquiry peripheral qualifier to 011b,
		and add a sysctl and tunable to allow the user to change
		it back to 001b if needed.

		Don't insert a Copan copyright statement in the inquiry
		data.  The copyright statements on the files are
		sufficient.

ctl_private.h:	Add sysctl variable context to the CTL softc.

ctl_cmd_table.c,
ctl_frontend_internal.c,
ctl_frontend.c,
ctl_backend.c,
ctl_error.c:	Include sys/sysctl.h.

MFC after:	3 days
2012-04-06 22:23:13 +00:00
gibbs
ea2a8fbda5 Fix interrupt load balancing regression, introduced in revision
222813, that left all un-pinned interrupts assigned to CPU 0.

sys/x86/x86/intr_machdep.c:
	In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized
	the "intr_cpus" cpuset to only contain CPU0.

	This initialization is too late and nullifies the results of calls
	the intr_add_cpu() that occur much earlier in the boot process.
	Since "intr_cpus" is statically initialized to the empty set, and
	all processors, including the BSP, already add themselves to
	"intr_cpus" no special initialization for the BSP is necessary.

MFC after:	3 days
2012-04-06 21:19:28 +00:00
attilio
4bd8f1d188 Staticize vm_page_cache_remove().
Reviewed by:	alc
2012-04-06 20:34:00 +00:00
nwhitehorn
6c37128bdd Substantially reduce the scope of the locks held in pmap_enter(), which
improves concurrency slightly.
2012-04-06 18:18:48 +00:00
alc
8466366587 Micro-optimize free_pv_entry() for the expected case. 2012-04-06 16:41:19 +00:00
nwhitehorn
9c79ae8eea Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction
caches, by invalidating kernel icaches only when needed and not flushing
user caches for shared pages.

Suggested by:	kib
MFC after:	2 weeks
2012-04-06 16:03:38 +00:00
nwhitehorn
e05a39a26d Give the kernel pmap lock a different name than user pmap locks. It has
(slightly) different semantics and renaming it prevents a (harmless)
WITNESS warning during bootup for 32-bit kernels on 64-bit CPUs.

MFC after:	5 days
2012-04-06 16:00:37 +00:00
melifaro
ff7ed324aa Fix build broken by r233938.
Pointed by:     David Wolfskill <david@catwhisker.org>
Approved by:    kib (mentor)
Pointy hat to:  melifaro
2012-04-06 13:34:19 +00:00
avg
3e6288731d retrofit Safe Mode loader menu item actions
The menu item is now made completely independent with the ACPI item - most
modern systems seem to require ACPI and become even more "unsafe"
without it.
Safe Mode no longer disables APIC for the same reason.
kbdmux is not disabled as this feature has proven itself stable.

New actions:
- SMP is disabled in the Safe Mode now
- eventtimers are forced to periodic mode (some real and virtual systems
  seem to have problems otherwise)
- geom extra vigorous integrity checking is disabled, this is to
  facilitate migration from previous versions

Possible short term to do:
- make SMP switch a separate menu item
- restore APIC switch as a separate menu item

Longer term to do:
- turn various tweaks into separate menu items in a Safe Mode sub-menu

Please consider adding a safety tweak to Safe Mode when introducing
new major features or changes that may cause instabilities.

Discussed with:	jhb, scottl, Devin Teske
MFC after:	3 weeks (stable/9 only)
2012-04-06 09:36:22 +00:00
tuexen
19af6e5d9d Remove duplicate condition in if statement.
Obtained from: brucec@
MFC after: 3 days
2012-04-06 09:03:02 +00:00
pluknet
9d2114a1e8 Free ballooned pages with the corresponding malloc type.
MFC after:	1 week
2012-04-06 08:13:29 +00:00
melifaro
85ccef88d3 - Improve performace for writer-only BPF users.
Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      4 weeks
2012-04-06 06:55:21 +00:00
melifaro
8b1d10268c - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
jhb
5829de48d9 Add new ktrace records for the start and end of VM faults. This gives
a pair of records similar to syscall entry and return that a user can
use to determine how long page faults take.  The new ktrace records are
enabled via the 'p' trace type, and are enabled in the default set of
trace points.

Reviewed by:	kib
MFC after:	2 weeks
2012-04-05 17:13:14 +00:00
avg
69c46e84cc zfs_ioctl: no need for ddi_copyin/out here because sys_ioctl handles that
On FreeBSD the direct ioctl argument is automatically copied in/out
as necesary by the kernel ioctl entry point.

PR:		kern/164445
Submitted by:	Luis Garces-Erice <lge@ieee.org>
Tested by:	Attila Nagy <bra@fsn.hu>
MFC after:	5 days
2012-04-05 07:59:59 +00:00
ae
333953a424 Fix VIMAGE build. 2012-04-05 04:41:06 +00:00
davidxu
cc55f4943b In sem_post, the field _has_waiters is no longer used, because some
application destroys semaphore after sem_wait returns. Just enter
kernel to wake up sleeping threads, only update _has_waiters if
it is safe. While here, check if the value exceed SEM_VALUE_MAX and
return EOVERFLOW if this is true.
2012-04-05 03:05:02 +00:00
davidxu
8c31e244f2 umtx operation UMTX_OP_MUTEX_WAKE has a side-effect that it accesses
a mutex after a thread has unlocked it, it event writes data to the mutex
memory to clear contention bit, there is a race that other threads
can lock it and unlock it, then destroy it, so it should not write
data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 try to fix the problem. It
requires thread library to clear the lock word entirely, then
call the WAKE2 operation to check if there is any waiter in kernel,
and try to wake up a thread, if necessary, the contention bit is set again
by the operation. This also mitgates the chance that other threads find
the contention bit and try to enter kernel to compete with each other
to wake up sleeping thread, this is unnecessary. With this change, the
mutex owner is no longer holding the mutex until it reaches a point
where kernel umtx queue is locked, it releases the mutex as soon as
possible.
Performance is improved when the mutex is contensted heavily.  On Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, it even is better than UMTX_OP_MUTEX_WAKE which is
deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c
2012-04-05 02:24:08 +00:00
adrian
4b2962c0bd Implement BAR TX.
A BAR frame must be transmitted when an frame in an A-MPDU session fails
to transmit - it's retried too often, or it can't be cloned for
re-transmission.  The BAR frame tells the remote side to advance the
left edge of the block-ack window (BAW) to a new value.

In order to do this:

* TX for that particular node/TID must be paused;
* The existing frames in the hardware queue needs to be completed, whether
  they're TXed successfully or otherwise;
* The new left edge of the BAW is then communicated to the remote side
  via a BAR frame;
* Once the BAR frame has been sucessfully TXed, aggregation can resume;
* If the BAR frame can't be successfully TXed, the aggregation session
  is torn down.

This is a first pass that implements the above.  What needs to be done/
tested:

* What happens during say, a channel reset / stuck beacon _and_ BAR
  TX.  It _should_ be correctly buffered and retried once the
  reset has completed.  But if a bgscan occurs (and they shouldn't,
  grr) the BAR frame will be forcibly failed and the aggregation session
  will be torn down.

  Yes, another reason to disable bgscan until I've figured this out.

* There's way too much locking going on here.  I'm going to do a couple
  of further passes of sanitising and refactoring so the (re) locking
  isn't so heavy.  Right now I'm going for correctness, not speed.

* The BAR TX can fail if the hardware TX queue is full.  Since there's
  no "free" space kept for management frames, a full TX queue (from eg
  an iperf test) can race with your ability to allocate ath_buf/mbufs
  and cause issues.  I'll knock this on the head with a subsequent
  commit.

* I need to do some _much_ more thorough testing in hostap mode to ensure
  that many concurrent traffic streams to different end nodes are correctly
  handled.  I'll find and squish whichever bugs show up here.

But, this is an important step to being able to flip on 802.11n by default.
The last issue (besides bug fixes, of course) is HT frame protection and
I'll address that in a subsequent commit.
2012-04-04 23:45:15 +00:00
adrian
7752b3142e Track and optionally log the actual sync interrupt cause.
These are involved in tracking host interface issues (ie, PCI/PCIe/AHB
interface.)
2012-04-04 22:51:50 +00:00
adrian
5f2a4bba47 Disable the HWQ contents upon a TX queue reset, rather than a TX queue flush.
This is designed to assist in figuring out what the hardware state is
when something like a queue hang has occured.
2012-04-04 22:24:11 +00:00
adrian
16c6304ced Now that I've fixed the BAW TX hangs, disable this verbose debugging
again.
2012-04-04 22:22:50 +00:00
jkim
10109cda1b Save and restore VGA display memory between suspend and resume. 2012-04-04 22:02:54 +00:00
adrian
866c0e9fc2 Correctly handle AR_MoreAggr when assembling multi-descriptor final frames.
Linux ath9k doesn't have this issue as it doesn't try queuing multi-
descriptor frames to the hardware.

Before, I was only setting the first and last descriptor in the final
frame correctly - and that was done by accident. The first descriptor in
the last sub-frame was being correctly updated by ath_tx_setds_11n();
the last descriptor in the last sub-frame was being correctly updated
by ath_buf_set_rate(). But both of those are "incorrect".

The correct behaviour is:

* AR_IsAggr is set for all descriptors for all subframes in an aggregate.
* AR_MoreAggr is set for all descriptors for all non-final sub-frames
  in an aggregate.

Ie, all descriptors in the last sub-frame of an aggregate must have this
field set to 0.

I still need to do a couple of extra passes to ensure the pad delimiter
field is being correctly handled in all descriptors in the last sub-frame.
2012-04-04 21:49:49 +00:00
jkim
1a4c884636 Do not copy VESA state buffer if the VBE call has failed for any reason.
Do not unnecessarily clear the state buffer before calling the function.
2012-04-04 21:38:26 +00:00
jhb
452f890f26 Disable INET6 support in modules when building the LINT-NOINET6 kernel.
Reviewed by:	bz
MFC after:	1 week
2012-04-04 21:31:20 +00:00
jkim
a390e7b468 Remove a useless warning. The mode information is unused for very long time
and this function may be used with VESA mode since r232069.
2012-04-04 21:19:55 +00:00
marius
cc107cbdf9 - Const'ify the device lookup-table.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
- Enable support for flow control.
  Tested by: yongari

MFC after:	1 week
2012-04-04 21:09:02 +00:00
adrian
a544afcdc9 Add a threadid to the ah_decode API.
This adds the current thread ID to each logged register and mark entry,
allowing for easier debugging of concurrent/overlapping NIC operations.
2012-04-04 20:46:20 +00:00
marius
3eaf24f2ea Refine r233827; as it turns out, controllers with a device ID of 0x0059
can be upgraded to MegaRAID mode, in which case mfi(4) should attach to
these based on the sub-vendor and -device ID instead (not currently done).
Therefore, let mpt_pci_probe() return BUS_PROBE_LOW_PRIORITY.
While it, let mpt_pci_probe() return BUS_PROBE_DEFAULT instead of 0 in
the default case.

MFC after:	3 days
2012-04-04 20:42:45 +00:00
adrian
c754ab5e30 Disable a specific Merlin hardware workaround which may cause hangs on some
PCIe controllers.

Obtained from:	Atheros / Linux
2012-04-04 20:42:32 +00:00
jkim
5c66ff4b78 - Do not include machine/atomic.h. It is no longer necessary since r233768.
- Remove bogus "atomic" macros and a read-only variable from softc.

Reviewed by:	ambrisko
2012-04-04 16:15:40 +00:00
jh
7494e2ac3f Add a check for unsupported file flags to ufs_setattr().
Discussed with:	bde
MFC after:	2 weeks
2012-04-04 14:50:21 +00:00
glebius
7676adf25d Merge from OpenBSD:
revision 1.173
  date: 2011/11/09 12:36:03;  author: camield;  state: Exp;  lines: +11 -12
  State expire time is a baseline time ("last active") for expiry
  calculations, and does _not_ denote the time when to expire.  So
  it should never be added to (set into the future).

  Try to reconstruct it with an educated guess on state import and
  just set it to the current time on state updates.

  This fixes a problem on pfsync listeners where the expiry time
  could be double the expected value and cause a lot more states
  to linger.
2012-04-04 14:47:59 +00:00
jhb
b45da04a8e Add descriptions after the 'device' line for several NICs to match the
existing style.
2012-04-04 13:49:22 +00:00
jhb
d2b8d8566c Fix build on i386. 2012-04-04 11:55:20 +00:00
np
307ef13f94 - Remove redundant call to pr_ctloutput from code that handles SO_SETFIB.
- Add a check for errors during copyin while here.

Reviewed by:	julian, bz
MFC after:	2 weeks
2012-04-03 18:38:00 +00:00
glebius
4a817f8ca2 Since pf 4.5 import pf(4) has a mechanism to defer
forwarding a packet, that creates state, until
pfsync(4) peer acks state addition (or 10 msec
timeout passes).

This is needed for active-active CARP configurations,
which are poorly supported in FreeBSD and arguably
a good idea at all.

Unfortunately by the time of import this feature in
OpenBSD was turned on, and did not have a switch to
turn it off. This leaked to FreeBSD.

This change make it possible to turn this feature
off via ioctl() and turns it off by default.

Obtained from:	OpenBSD
2012-04-03 18:09:20 +00:00
bschmidt
d54496d7dd Add basic HT channel setup to ieee80211_init_channels(), this will be
used by at least ral(4).

Reviewed by:	ray
2012-04-03 17:48:42 +00:00
marius
ae90e605cb Fix probing of SAS1068E with a device ID of 0x0059 after r232411.
Reported by:	infofarmer

MFC after:	3 days
2012-04-03 08:28:43 +00:00
mckusick
dc3cb66f4c A file cannot be deallocated until its last name has been removed
and it is no longer referenced by a user process. The inode for a
file whose name has been removed, but is still referenced at the
time of a crash will still be allocated in the filesystem, but will
have no references (e.g., they will have no names referencing them
from any directory).

With traditional soft updates these unreferenced inodes will be
found and reclaimed when the background fsck is run. When using
journaled soft updates, the kernel must keep track of these inodes
so that it can find and reclaim them during the cleanup process.
Their existence cannot be stored in the journal as the journal only
handles short-term events, and they may persist for days. So, they
are tracked by keeping them in a linked list whose head pointer is
stored in the superblock. The journal tracks them only until their
linked list pointers have been commited to disk. Part of the cleanup
process involves traversing the list of unreferenced inodes and
reclaiming them.

This bug was triggered when confusion arose in the commit steps
of keeping the unreferenced-inode linked list coherent on disk.
Notably, a race between the link() system call adding a link-count
to a file and the unlink() system call removing a link-count to
the file. Here if the unlink() ran after link() had looked up
the file but before link() had incremented the link-count of the
file, the file's link-count would drop to zero before the link()
incremented it back up to one. If the file was referenced by a
user process, the first transition through zero made it appear
that it should be added to the unreferenced-inode list when in
fact it should not have been added. If the new name created by
link() was deleted within a few seconds (with the file still
referenced by a user process) it would legitimately be a candidate
for addition to the unreferenced-inode list. The result was that
there were two attempts to add the same inode to the unreferenced-inode
list which scrambled the unreferenced-inode list's pointers leading
to a panic. The fix is to detect and avoid the false attempt at
adding it to the unreferenced-inode list by having the link()
system call check to see if the link count is zero before it
increments it. If it is, the link() fails with ENOENT (showing that
it has failed the link()/unlink() race).

While tracking down this bug, we have added additional assertions
to detect the problem sooner and also simplified some of the code.

Reported by:      Kirk Russell
Fix submitted by: Jeff Roberson
Tested by:        Peter Holm
PR:               kern/159971
MFC (to 9 only):  2 weeks
2012-04-02 21:58:37 +00:00
kib
ff6239a557 When process exists, not only the children shall be reparented to
init, but also the orphans shall be removed from the orphan list,
because the list header is destroyed.

Reported and tested by:	pho
MFC after:	3 days
2012-04-02 19:35:36 +00:00
kib
9ad701f91f Add helper function to remove the process from the orphans list and
use it instead of inlined code.

Tested by:	pho
MFC after:	3 days
2012-04-02 19:34:56 +00:00
ambrisko
d5677c25b7 Move struct megasas_sge from mfi_ioctl.h to mfivar.h so we can
remove including machine/bus.h.  Add some more mfi_ prefixes to
avoid name space pollution.

This should address the last tinderbox issues.
2012-04-02 19:13:02 +00:00
jhb
f021ea5434 Further tweak the changes made in r233709. The kernel doesn't permit
sleeping from a swi handler (even though in this case it would be ok), so
switch the refill and scanning SWI handlers to being tasks on a fast
taskqueue.  Also, only schedule the refill task for a CMCI as an MC# can
fire at any time, so it should do the minimal amount of work needed and
avoid opportunities to deadlock before it panics (such as scheduling a
task it won't ever need in practice).  To handle the case of an MC# only
finding recoverable errors (which should never happen), always try to
refill the event free list when the periodic scan executes.

MFC after:	2 weeks
2012-04-02 17:26:21 +00:00
jh
b0414ca221 - Use more natural ip->i_flags instead of vap->va_flags in the final
flags check.
- Add a comment for the immutable/append check done after handling of
  the flags.
- Style improvements.

No functional change intended.

Submitted by:	bde
MFC after:	2 weeks
2012-04-02 16:33:21 +00:00
jhb
e14923e42f Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible.  Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).

MFC after:	2 weeks
2012-04-02 15:07:22 +00:00