Commit Graph

88634 Commits

Author SHA1 Message Date
Andriy Gapon
d50aaa6d6d ata_da: set disk::d_ident from serial number
MFC after:	10 days
2012-10-06 21:42:07 +00:00
Andriy Gapon
38ba6e6ae5 add detection of serial console presence to btx and boot2-like blocks
Note that this commit slightly increases size of boot blocks.

Reviewed by:	jhb
Tested by:	Olivier Cochard-Labbe <olivier@cochard.me>
MFC after:	26 days
2012-10-06 20:08:29 +00:00
Andriy Gapon
fdb3d7b169 i386 comconsole: don't loop forever if hardware doesn't respond
- clear capability flags when hw timeouts
- retire comc_started status variable and directly use c_flags to see
  if comconsole is selected for use

Reviewed by:	jhb
Tested by:	Uffe Jakobsen <uffe@uffe.org>,
		Olivier Cochard-Labbe <olivier@cochard.me>
MFC after:	26 days
2012-10-06 20:04:51 +00:00
Andriy Gapon
4ecbcb6f49 boot/console: handle consoles that fail to probe
- clarify meaning of console flags
- perform i/o via a console only if both of the following conditions are met:
   o console is active (selected by user or config)
   o console flags that it can perform the operation
- warn if a chosen console can not work (the warning may go nowhere without
  working and active console, though)

Reviewed by:	jhb
Tested by:	Uffe Jakobsen <uffe@uffe.org>,
		Olivier Cochard-Labbe' <olivier@cochard.me>
MFC after:	26 days
2012-10-06 20:01:17 +00:00
Andriy Gapon
8bf749ef3a zvol: set mediasize in geom provider right upon its creation
... instead of deferring the action until first open.
Unlike upstream this has no benefit on FreeBSD.
We know that as soon as the provider is created it is going to be tasted
and thus opened.  Initial mediasize of zero causes tasting failure
and subsequent retasting because of the size change.

MFC after:	14 days
2012-10-06 19:57:27 +00:00
Andriy Gapon
a90c9dfeab g_part_taste: directly destroy consumer and geom here, no need for withering
Besides withered but still alive consumers may interfere with
re-tatsing.

MFC after:	16 days
2012-10-06 19:52:50 +00:00
Andriy Gapon
298fbd1605 cngetc: use cpu_spinwait to ease the cncheckc loop a tiny bit
Reviewed by:	julian
MFC after:	10 days
2012-10-06 19:50:23 +00:00
Andriy Gapon
9d200697d7 zfsboot: simplify probe_drive() a little bit
The first discovered pool, whether it covers the whole boot disk or not,
is going to be first in zfs_pools list.  So there is no need at all
for spapp parameter.

This commit also fixes a bug where NULL would be assigned to NULL
pointer when probe_drive was called  with the spapp parameter of NULL.

MFC after:	21 days
2012-10-06 19:48:15 +00:00
Andriy Gapon
aae0c9de03 zfs boot: export boot/primary pool and vdev guid all the way to kenv
This is work in progress to for znextboot and it also provides
some convenient infrastructure.

MFC after:	20 days
2012-10-06 19:47:24 +00:00
Andriy Gapon
d39075208e zfs loader: treat plain pool name as a name of its root dataset
... as opposed to the previous behavior of treating it as boot
dataset (specified by bootfs or default)

MFC after:	19 days
2012-10-06 19:42:50 +00:00
Andriy Gapon
edfd4fce8f zfs boot spa_status: print bootfs for each reported pool
MFC after:	9 days
2012-10-06 19:42:05 +00:00
Andriy Gapon
164efe4010 boot/zfs: a small whitespace cleanup
MFC after:	5 days
2012-10-06 19:41:11 +00:00
Andriy Gapon
62c725a9db boot/zfs: call zfs_spa_init for all found pools
... and drop those for which it fails.
Also, add more sanity checking to the function.

MFC after:	16 days
2012-10-06 19:40:12 +00:00
Andriy Gapon
f152e0b5be zfsboot: use the same zfs dataset naming format as loader
Also, pool name alone now names a root dataset of the pool regardless
of bootfs property value.

MFC after:	15 days
2012-10-06 19:38:33 +00:00
Alan Cox
4ed2e31f01 In general pmap implementations do not set the wired attribute on
the temporary mappings that are used to implement operations like
pmap_zero_page().  There is no reason for the MIPS pmap to deviate
from that practice.
2012-10-06 19:33:52 +00:00
Andriy Gapon
61e100ee3b zfs_mount: taste geom providers for root pool config
This should allow to mount a dataset as a root filesystem even if
it belongs to a pool that is not described in zpool.cache.
This adds some overhead to the boot process though.

If the root filesystem's pool is found in zpool.cache, the by default
its cached configuration will be used for import.
vfs.zfs.rootpool.prefer_cached_config could be set to zero to force
the config to be retasted.

Discussed with:	gibbs, pjd, des
MFC after:	25 days
2012-10-06 19:33:47 +00:00
Andriy Gapon
296e021066 zfs boot: add lszfs command to i386 loader
... to list child datasets of a specified dataset.
Dataset name should be provided in poolname/dsname format.

MFC after:	17 days
2012-10-06 19:27:54 +00:00
Andriy Gapon
74b3e265c7 zfs boot: add code for listing child datasets of a given dataset
- only filesystem datasets are supported
- children names are printed to stdout

To do: allow to iterate over the list and fetch names programatically

MFC after:	17 days
2012-10-06 19:27:04 +00:00
Andriy Gapon
84b339ac4c zfs boot: chose a "first" pool if none is explicitly requested
MFC after:	8 days
2012-10-06 19:25:40 +00:00
Andriy Gapon
c331c9703c ktrace/kern_exec: check p_tracecred instead of p_cred
.. when deciding whether to continue tracing across suid/sgid exec.
Otherwise if root ktrace-d an unprivileged process and the processed
exec-ed a suid program, then tracing didn't continue across exec.

Reviewed by:	bde, kib
MFC after:	22 days
2012-10-06 19:23:44 +00:00
Alan Cox
948aea4031 Correct two pessimizations in pmap_extract_and_hold(). Test the PTE for
having PTE_RO set instead of PTE_D.  This avoids some unnecessary failures
by pmap_extract_and_hold() that will have to be handled by a call to
vm_fault_hold().  Testing the PTE for both being non-zero and having PTE_V
set is redundant.  The latter suffices.
2012-10-06 19:05:50 +00:00
Gleb Smirnoff
21d172a3f1 A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order.
  - ip_input() is entered in net byte order and converts packet
    to host byte order right _after_ processing pfil(9) hooks.
  - ip_output() is entered in host byte order and converts packet
    to net byte order right _before_ processing pfil(9) hooks.
  - ip_fragment() accepts and emits packet in net byte order.
  - ip_forward(), ip_mloopback() use host byte order (untouched actually).
  - ip_fastforward() no longer modifies packet at all (except ip_ttl).
  - Swapping of byte order there and back removed from the following modules:
    pf(4), ipfw(4), enc(4), if_bridge(4).
  - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
  - __FreeBSD_version bumped.
  - pfil(9) manual page updated.

Reviewed by:	ray, luigi, eri, melifaro
Tested by:	glebius (LE), ray (BE)
2012-10-06 10:02:11 +00:00
Gleb Smirnoff
ea2951beed The pfil(9) layer guarantees us presence of the protocol header,
so remove extra check, that is always false.

P.S. Also, goto there lead to unlocking a not locked rwlock.
2012-10-06 07:06:57 +00:00
Xin LI
15752fa858 MFV: libpcap 1.3.0.
MFC after:	4 weeks
2012-10-05 18:42:50 +00:00
Adrian Chadd
943e37a120 Initialise an uninitialised variable. 2012-10-05 16:44:00 +00:00
John Baldwin
f8d6c20a0f Further adjust the workaround in r234501. Rounding all small requests up
to 32k swamped the controller causing firmware hangs.  Instead, round
requests smaller than 64k up to the next power of 2 as a general rule.
To handle the one known special case of a command that accepts a 12k
buffer returning a 24k-ish reply, round requests between 8k and 16k up
to 32k rather than 16k.  The result is that commands less than 8k should
now be rounded up to a smaller size (either 4k or 8k) rather than 32k.

PR:		kern/155658
Tested by:	Andreas Longwitz
MFC after:	1 week
2012-10-05 15:52:31 +00:00
Andriy Gapon
102548d143 mount.h: MNTK_VGONE_UPPER and MNTK_VGONE_WAITER were supposed to be different
... otherwise a waiter is never woken up.

Reported by:	swills
Discussed with:	jhb
Approved by:	kib
MFC after:	3 days
2012-10-05 14:42:38 +00:00
Pyun YongHyeon
a6e66cd28b Follow Broadcom datasheet:
Delay 100 microseconds after enabling transmit MAC.
 Delay 10 microseconds after enabling receive MAC.
2012-10-05 07:13:21 +00:00
Pyun YongHyeon
9b80ffe78f Add 40 microseconds delay after updating EMAC Mode register as
recommended by Broadcom data sheet.
2012-10-05 06:24:22 +00:00
Alan Cox
d68ca35a82 Eliminate a stale and a duplicated comment. 2012-10-05 04:35:20 +00:00
Pyun YongHyeon
a0a03d1e82 APE firmware touches EMAC Mode and TX/RX MAC Mode registers to keep
the MAC connected to the outside world.  So keep the accesses
atomic.
2012-10-05 03:46:25 +00:00
Pyun YongHyeon
e4146b9510 Don't touch EMAC Mode and TX/RX MAC Mode register when driver is
not running.
2012-10-05 03:35:38 +00:00
Adrian Chadd
e9472a9f88 Implement the quarter rate fractional channel programming for the
AR5416 and AR9280, but leave it disabled by default.

TL;DR: don't enable this code at all unless you go through the process
of getting the NIC re-certified.  This is purely to be used as a
reference and NOT a certified solution by any stretch of the imagination.

The background:

The AR5112 RF synth right up to the AR5133 RF synth (used on the AR5416,
derivative is used for the AR9130/AR9160) only implement down to 2.5MHz
channel spacing in 5GHz.  Ie, the RF synth is programmed in steps of 2.5MHz
(or 5, 10, 20MHz.) So they can't represent the quarter rate channels
in the 4.9GHz PSB (which end in xxx2MHz and xxx7MHz).  They support
fractional spacing in 2GHz (1MHz spacing) (or things wouldn't work,
right?)

So instead of doing this, the RF synth programming for the AR5112 and
later code will round to the nearest available frequency.

If all NICs were RF5112 or later, they'll inter-operate fine - they all
program the same. (And for reference, only the latest revision of the
RF5111 NICs do it, but the driver doesn't yet implement the programming.)

However:

* The AR5416 programming didn't at all implement the fractional synth
  work around as above;
* The AR9280 programming actually programmed the accurate centre frequency
  and thus wouldn't inter-operate with the legacy NICs.

So this patch:

* Implements the 4.9GHz PSB fractional synth workaround, exactly as the
  RF5112 and later code does;
* Adds a very dirty workaround from me to calculate the same channel
  centre "fudge" to the AR9280 code when operating on fractional frequencies
  in 5GHz.

HOWEVER however:

It is disabled by default.  Since the HAL didn't implement this feature,
it's highly unlikely that the AR5416 and AR928x has been tested in these
centre frequencies.  There's a lot of regulatory compliance testing required
before a NIC can have this enabled - checking for centre frequency,
for drift, for synth spurs, for distortion and spectral mask compliance.
There's likely a lot of other things that need testing so please don't
treat this as an exhaustive, authoritative list.  There's a perfectly good
process out there to get a NIC certified by your regulatory domain, please
go and engage someone to do that for you and pay the relevant fees.

If a company wishes to grab this work and certify existing 802.11n NICs
for work in these bands then please be my guest.  The AR9280 works fine
on the correct fractional synth channels (49x2 and 49x7Mhz) so you don't
need to get certification for that. But the 500KHz offset hack may have
the above issues (spur, distortion, accuracy, etc) so you will need to
get the NIC recertified.

Please note that it's also CARD dependent.  Just because the RF synth
will behave correctly doesn't at all mean that the card design will also
behave correctly.  So no, I won't enable this by default if someone
verifies a specific AR5416/AR9280 NIC works.  Please don't ask.

Tested:

I used the following NICs to do basic interoperability testing at
half and quarter rates.  However, I only did very minimal spectrum
analyser testing (mostly "am I about to blow things up" testing;
not "certification ready" testing):

* AR5212 + AR5112 synth
* AR5413 + AR5413 synth
* AR5416 + AR5113 synth
* AR9280
2012-10-04 15:42:45 +00:00
Tijl Coosemans
9cdf77375c Define clang feature test macro __has_extension. It's used in stdatomic.h. 2012-10-04 08:53:05 +00:00
Andrew Thompson
3e92ee8a53 Remove the M_NOWAIT from bridge_rtable_init as it isn't needed. The function
return value is not even checked and could lead to a panic on a null sc_rthash.

MFC after:	2 weeks
2012-10-04 07:40:55 +00:00
Pedro F. Giffuni
0d1040e5e1 rpc: convert all uid and gid variables to u_int.
After further discussion, instead of pretending to use
uid_t and gid_t as upstream Solaris and linux try to, we
are better using u_int, which is in fact what the code
can handle and best approaches the range of values used
by uid and gid.

Discussed with:	bde
Reviewed by:	bde
2012-10-04 04:15:18 +00:00
Adrian Chadd
0eb8162623 Pause and unpause the software queues for a given node based on the
net80211 node power save state.

* Add an ATH_NODE_UNLOCK_ASSERT() check
* Add a new node field - an_is_powersave
* Pause/unpause the queue based on the node state
* Attempt to handle net80211 concurrency issues so the queue
  doesn't get paused/unpaused more than once at a time from
  the net80211 power save code.

Whilst here (and breaking my usual rule), set CLRDMASK when a queue
is unpaused, regardless of whether the queue has some pending traffic.
This means the first frame from that TID (now or later) will hvae
CLRDMASK set.

Also whilst here, bump the swretrymax counters whenever the
filtered frames code expires a frame.  Again, breaking my rule, but
this is just a statistics thing rather than a functional change.

This doesn't fix ps-poll (but it doesn't break it too much worse
than it is at the present) or correcting the TID updates.
That's next on the list.

Tested:
	* AR9220 AP (Atheros AP96 reference design)
	* Macbook Pro and LG Optimus 1 Android phone, both setting
	  and clearing power save state (but not using PS-POLL.)
2012-10-03 23:23:45 +00:00
Ed Maste
104d9fc776 Cast through void * to silence compiler warning
The base netmap pointer and offsets involved are provided by the kernel
side of the netmap interface and will have appropriate alignment.

Sponsored by: ADARA Networks
MFC After: 2 weeks
2012-10-03 21:41:20 +00:00
Andrey V. Elsukov
45ac30d5f8 Replace all references to loader_callbacks_v1 with loader_callbacks.
Suggested by:	grehan@
2012-10-03 17:20:34 +00:00
Ed Schouten
6b1b791da6 Fix faulty error code handling in read(2) on TTYs.
When performing a non-blocking read(2), on a TTY while no data is
available, we should return EAGAIN. But if there's a modem disconnect,
we should return 0. Right now we only return 0 when doing a blocking
read, which is wrong.

MFC after:	1 month
2012-10-03 13:51:03 +00:00
Alexander Motin
6b67444bac Fix build without options ATA_CAM, broken by r241144. 2012-10-03 12:43:26 +00:00
Alan Cox
54f3305cca Reimplement pmap_qremove() using the new TLB invalidation function for
efficiently invalidating address ranges.
2012-10-03 05:42:15 +00:00
Alan Cox
4db2c4b8c7 Tidy up a bit:
Update some of the comments.  In particular, use "sleep" in preference to
"block" where appropriate.

Eliminate some unnecessary casts.

Make a few whitespace changes for consistency.

Reviewed by:	kib
MFC after:	3 days
2012-10-03 05:06:45 +00:00
Kenneth D. Merry
0e28d282b7 Add casts to unbreak the i386 PAE build for the mps(4) driver.
MFC after:	3 days
Prompted by:	Garrett Cooper
2012-10-02 23:04:12 +00:00
Alexander Motin
9c87d811eb Implement SATA revision (speed) control for legacy SATA controller for
both boot (via loader tunables) and run-time (via `camcontrol negotiate`).
Tested to work at least on NVIDIA MCP55 chipset.

H/w provided by:	glebius
2012-10-02 22:03:21 +00:00
Pedro F. Giffuni
0c2222baf4 libtirpc: be sure to free cl_netid and cl_tp
When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the same
way.

This change matches the reference (OpenSolaris) implementation.

Tested by:	David Wolfskill
Obtained from:	Bull GNU/Linux NFSv4 Project (libtirpc)
MFC after:	2 weeks
2012-10-02 19:10:19 +00:00
Pedro F. Giffuni
f3c3ef7b2a RPC: Convert all uid and gid variables of the type uid_t and gid_t.
This matches what upstream (OpenSolaris) does.

Tested by:	David Wolfskill
Obtained from:	Bull GNU/Linux NFSv4 project (libtirpc)
MFC after:	3 days
2012-10-02 19:00:56 +00:00
Garrett Wollman
48b5c7410f Fix spelling of the function name in two assertion messages. 2012-10-02 18:38:05 +00:00
Adrian Chadd
e7f0d7cf47 Migrate the power-save functions to be overridable VAP methods.
This turns ieee80211_node_pwrsave(), ieee80211_sta_pwrsave() and
ieee80211_recv_pspoll() into methods.

The intent is to let drivers override these and tie into the power save
management pathway.

For ath(4), this is the beginning of forcing a node software queue to
stop and start as needed, as well as supporting "leaking" single frames
from the software queue to the hardware.

Right now, ieee80211_recv_pspoll() will attempt to transmit a single frame
to the hardware (whether it be a data frame on the power-save queue or
a NULL data frame) but the driver may have hardware/software queued frames
queued up.  This initial work is an attempt at providing the hooks required
to implement correct behaviour.

Allowing ieee80211_node_pwrsave() to be overridden allows the ath(4)
driver to pause and unpause the entire software queue for a given node.
It doesn't make sense to transmit anything whilst the node is asleep.

Please note that there are other corner cases to correctly handle -
specifically, setting the MORE data bit correctly on frames to a station,
as well as keeping the TIM updated.  Those particular issues can be
addressed later.
2012-10-02 17:45:19 +00:00
Gleb Smirnoff
aa955cb5b8 To reduce volume of pfsync traffic:
- Scan request update queue to prevent doubles.
- Do not push undersized daragram in pfsync_update_request().
2012-10-02 12:44:46 +00:00
John Baldwin
b3aa419331 Rename the module for 'device enc' to "if_enc" to avoid conflicting with
the CAM "enc" peripheral (part of ses(4)).  Previously the two modules
used the same name, so only one was included in a linked kernel causing
enc0 to not be created if you added IPSEC to GENERIC.  The new module
name follows the pattern of other network interfaces (e.g. "if_loop").

MFC after:	1 week
2012-10-02 12:25:30 +00:00
Gleb Smirnoff
df4e91d386 There is a complex race in in_pcblookup_hash() and in_pcblookup_group().
Both functions need to obtain lock on the found PCB, and they can't do
classic inter-lock with the PCB hash lock, due to lock order reversal.
To keep the PCB stable, these functions put a reference on it and after PCB
lock is acquired drop it. If the reference was the last one, this means
we've raced with in_pcbfree() and the PCB is no longer valid.

  This approach works okay only if we are acquiring writer-lock on the PCB.
In case of reader-lock, the following scenario can happen:

  - 2 threads locate pcb, and do in_pcbref() on it.
  - These 2 threads drop the inp hash lock.
  - Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock,
    does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which
    doesn't free the pcb due to two references on it. Then it unlocks the pcb.
  - 2 aforementioned threads acquire reader lock on the pcb and run
    in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues,
    second gets 0 and considers pcb freed, returns.
  - The thread that got 1 continutes working with detached pcb, which later
    leads to panic in the underlying protocol level.

  To plumb that problem an additional INPCB flag introduced - INP_FREED. We
check for that flag in the in_pcbrele_rlocked() and if it is set, we pretend
that that was the last reference.

Discussed with:		rwatson, jhb
Reported by:		Vladimir Medvedkin <medved rambler-co.ru>
2012-10-02 12:03:02 +00:00
Hans Petter Selasky
6320afb5c5 Style.
MFC after:	1 week
2012-10-02 10:09:23 +00:00
Hans Petter Selasky
1d8fa9519f Remove unused field.
MFC after:	1 week
2012-10-02 10:05:39 +00:00
Alan Cox
9a974b9024 Introduce a new TLB invalidation function for efficiently invalidating
address ranges, and use this function in pmap_remove().

Tested by:	jchandra
2012-10-02 07:14:22 +00:00
Eitan Adler
8dbce2a343 Provide a generic way to disable devices at boot time
PR:		kern/119202
Requested by:	peterj
Reviewed by:	sbruno, jhb
Approved by:	cperciva
MFC after:	1 week
2012-10-02 03:33:41 +00:00
Kenneth D. Merry
25aae1bed3 Add the mps(4) driver to the i386 GENERIC config file. LSI has tested it
on i386 and verified that it works.

Submitted by:	Harald Schmalzbauer, John Baldwin, Kashyap Desai
MFC after:	3 days
2012-10-01 21:42:32 +00:00
Tim Kientzle
79823ad281 Support kernel options from ubldr. 2012-10-01 14:56:48 +00:00
Rick Macklem
05496254a6 Attila Bogar and Herbert Poeckl both reported similar problems
w.r.t. a Linux NFS client doing a krb5 NFS mount against the
FreeBSD server. We determined this was a Linux bug:
http://www.spinics.net/lists/linux-nfs/msg32466.html, however
the mount failed to work, because the Destroy operation with a
bogus encrypted checksum destroyed the authenticator handle.
This patch changes the rpcsec_gss code so that it doesn't
Destroy the authenticator handle for this case and, as such,
the Linux mount will work.

Tested by: Attila Bogar and Herbert Poeckl
MFC after:	2 weeks
2012-10-01 12:28:58 +00:00
Pawel Jakub Dawidek
55711729f3 - Enforce CAP_MKFIFO on mkfifoat(2), not on mknodat(2). Without this change
mkfifoat(2) was not restricted.
- Introduce CAP_MKNOD and enforce it on mknodat(2).

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-10-01 05:43:24 +00:00
Hans Petter Selasky
0324d54acb Inherit USB mode from RootHUB port where the USB device is connected.
Only RootHUB ports can be dual mode. Disallow OTG ports on external HUBs.
This simplifies some checks in the USB controller drivers.

MFC after:	1 week
2012-10-01 05:42:43 +00:00
Andrew Turner
5bd9e48117 Remove unused variables from the OMAP ehci code. 2012-10-01 05:15:13 +00:00
Andrew Turner
052e6d041f Fix the clobber list on the atomic operators that do comparisons. Without
this some compilers will place a cmp instruction before the atomic operation
and expect to be able to use the result afterwards. By adding "cc" to the
list of used registers we tell the compiler to not do this.
2012-10-01 05:12:17 +00:00
Hans Petter Selasky
12b16d85ae The USB Bluetooth driver should only grab its own interfaces. This allows the
USB bluetooth driver to co-exist with other USB device classes and drivers.

Reported by:	Geoffrey Levand
MFC after:	1 week
2012-09-30 19:31:20 +00:00
Kevin Lo
954c5baed9 Add missing header needed by free(9).
Spotted by:	David Wolfskill <david at catwhisker dot org>
2012-09-30 15:42:20 +00:00
Andrey V. Elsukov
04773e8b75 Fix the style. 2012-09-30 13:17:33 +00:00
Andrey V. Elsukov
b3651aad67 Remember the file format of the last loaded module and try to use it for
next files.
2012-09-30 13:14:37 +00:00
Andrey V. Elsukov
95b2c05cf0 Reduce the number of attempts to detect proper kld format for the amd64
loader.
2012-09-30 12:24:15 +00:00
Kevin Lo
93f01327ea Remove an unneeded NULL check after M_WAITOK. 2012-09-30 09:26:26 +00:00
Kevin Lo
b5db12bfb5 Free result of device_get_children(9). 2012-09-30 09:21:10 +00:00
Andrey V. Elsukov
089afddef4 Fix disk_cleanup() to work without DISK_DEBUG too. 2012-09-30 07:52:40 +00:00
Alan Cox
26e874e0d5 Stop calling pmap_remove_write() from pmap_remove_all(). Doing so is not
only inefficient but also leads to recursive lock acquisition.

Tested by:	ray
2012-09-30 03:54:57 +00:00
Alan Cox
a1685193bc Eliminate an unused declaration. 2012-09-29 22:28:00 +00:00
Gleb Smirnoff
7b6fbb7367 Clear and re-setup all function pointers that glue pf(4) and pfsync(4)
together whenever the pfsync0 is brought down or up respectively.
2012-09-29 20:11:00 +00:00
Gleb Smirnoff
0fa4aaa7e6 Simplify send out queue code:
- Write method of a queue now is void,length of item is taken
  as queue property.
- Write methods don't need to know about mbud, supply just buf
  to them.
- No need for safe queue iterator in pfsync_sendout().

Obtained from:	OpenBSD
2012-09-29 20:02:26 +00:00
Alan Cox
f0084308a0 Eliminate unused variables. 2012-09-29 19:09:11 +00:00
Alan Cox
e95f0abb09 Add support for mincore(). Specifically, this is an adaptation of the
pmap_mincore() implementation that was added to the original arm pmap
in r235717.
2012-09-29 17:20:16 +00:00
Andrey V. Elsukov
f9cd8b07a4 Almost each time when loader opens a file, this leads to calling
disk_open(). Very often this is called several times for one file.
This leads to reading partition table metadata for each call. To
reduce the number of disk I/O we have a simple block cache, but it
is very dumb and more than half of I/O operations related to reading
metadata, misses this cache.

Introduce new cache layer to resolve this problem. It is independent
and doesn't need initialization like bcache, and will work by default
for all loaders which use the new DISK API. A successful disk_open()
call to each new disk or partition produces new entry in the cache.
Even more, when disk was already open, now opening of any nested
partitions does not require reading top level partition table.
So, if without this cache, partition table metadata was read around
20-50 times during boot, now it reads only once. This affects the booting
from GPT and MBR from the UFS.
2012-09-29 16:47:56 +00:00
Kevin Lo
000811380d If devclass_get_devices(9) returns success but a count of 0,
free the pointer.
2012-09-29 16:27:13 +00:00
Kevin Lo
374c6ff93a Remove unused variables. 2012-09-29 16:15:27 +00:00
Andrey V. Elsukov
ab945379ed Disable splitfs support, since we aren't support floppies for a long
time. This slightly reduces an overhead, when loader tries to open
file that doesn't exist.
2012-09-29 15:08:55 +00:00
Alan Cox
208d06cea8 Update a comment to reflect recent locking changes. 2012-09-29 08:11:12 +00:00
Gleb Smirnoff
891122d180 carp_send_ad() should never return without rescheduling next run. 2012-09-29 05:52:19 +00:00
Gleb Smirnoff
e2cfe42430 Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().

Now pf purging daemon stores the current hash table index on stack
in pf_purge_thread(), and supplies it to next iteration of
pf_purge_expired_states(). The latter returns new index back.

The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
status of states expiry run more consistent. Prior to this change it
could happen that n-th run stopped on i-th entry, and returned (1) as
full run complete, then next (n+1) full run stopped on j-th entry, where
j < i, and that broke the mark-and-sweep algorythm that saves references
rules. A referenced rule was freed, and this later lead to a crash.
2012-09-28 20:43:03 +00:00
Gleb Smirnoff
063efed28c The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
  ALTQ queueing or buf_ring(9) queueing.
  - It doesn't touch the buf_ring stats any more.
  - It doesn't touch ifnet stats anymore.
  - drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
  - It handles br_drops itself.
  - br_prod_bytes stats are dropped. Rationale: no one ever
    reads them but update of a common counter on every packet
    negatively affects performance due to excessive cache
    invalidation.
  - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
    we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and  ixv(4) no longer
  use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
  drbr_stats_update(), thus it was the only driver accurate under
  moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
  - drop software stats updating.
  - take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
  from hardware.

Reviewed by:	jfv, gnn
2012-09-28 18:28:27 +00:00
Hans Petter Selasky
2196d98ea0 Make sure we don't leak a mbuf in a fail case. 2012-09-28 16:23:01 +00:00
Hans Petter Selasky
66249c7c82 Remove some trailing bytes which are not part of the ethernet packet.
Discussed with:		bgray @
2012-09-28 15:33:13 +00:00
Hans Petter Selasky
a3bfcf3e5d Correct NYET handling. Remove superfluous transfer complete interrupt mask. 2012-09-28 15:24:14 +00:00
Alexander Motin
d6e285946d Change queue overflow checks from DIAGNOSTIC+panic() to KASSERT() to make
them enabled on HEAD by default. It is probably better to do single compare
then hunt for unexpected memory corruption.
2012-09-28 12:13:34 +00:00
John Baldwin
960b5a7080 - Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
  defined by standards ($PIR table, SMAP entries, etc.) available to
  userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
  and smbios(4) in <machine/pc/bios.h> and make them available to
  userland.

MFC after:	2 weeks
2012-09-28 11:59:32 +00:00
Konstantin Belousov
877d24ac8a Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
Andrey V. Elsukov
88a0dd24bf Make the loader a bit smarter, when it tries to open disk and the slice
number is not exactly specified. When the disk has MBR, also try to read
BSD label after ptable_getpart() call. When the disk has GPT, also set
d_partition to 255.  Mostly, this is how it worked before.
2012-09-28 10:49:41 +00:00
Pawel Jakub Dawidek
5d8a6a1078 Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.

The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.

Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.

Discussed with:	ken
MFC after:	1 week
2012-09-28 08:22:51 +00:00
Alan Cox
e4b8a2fc5a Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
Matthew D Fleming
fc8fdae0df Fix up kernel sources to be ready for a 64-bit ino_t.
Original code by:	Gleb Kurtsou
2012-09-27 23:30:49 +00:00
Ryan Stone
3fabe28bdc Ensure that all cases that enqueue a netgraph item for delivery by a
ngthread properly set the item's depth to 1.  In particular, prior to this
change if ng_snd_item failed to acquire a lock on a node, the item's depth
would not be set at all.  This fix ensures that the error code from rcvmsg/
rcvdata is properly passed back to the apply callback.  For example, this
fixes a bug where an error from rcvmsg/rcvdata would not previously
propagate back to a libnetgraph consumer when the message was queued.

Reviewed by:	mav
MFC after:	1 month
Sponsored by:	Sandvine Incorporated
2012-09-27 20:12:51 +00:00
Pedro F. Giffuni
06f13fb3f4 Complete revert of r239963:
The attempt to merge changes from the linux libtirpc caused
rpc.lockd to exit after startup under unclear conditions.

After many hours of selective experiments and inconsistent results
the conclusion is that it's better to just revert everything and
restart in a future time with a much smaller subset of the
changes.
____

MFC after:	3 days
Reported by:	David Wolfskill
Tested by:	David Wolfskill
2012-09-27 19:10:25 +00:00
Max Khon
617643aaa6 Fix pseudo checksum calculation.
This fixes ipfilter w/ network controllers that implement only
partial rx csum offloading.

PR:			106438
Obtained from:		upstream
MFC after:		1 week
2012-09-27 18:15:01 +00:00
Pawel Jakub Dawidek
c8e781f6e0 Revert r240931, as the previous comment was actually in sync with POSIX.
I have to note that POSIX is simply stupid in how it describes O_EXEC/fexecve
and friends. Yes, not only inconsistent, but stupid.

In the open(2) description, O_RDONLY flag is described as:

	O_RDONLY	Open for reading only.

Taken from:

	http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

Note "for reading only". Not "for reading or executing"!

In the fexecve(2) description you can find:

	The fexecve() function shall fail if:

	[EBADF]
		The fd argument is not a valid file descriptor open for executing.

Taken from:

	http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html

As you can see the function shall fail if the file was not open with O_EXEC!

And yet, if you look closer you can find this mess in the exec.html:

	Since execute permission is checked by fexecve(), the file description
	fd need not have been opened with the O_EXEC flag.

Yes, O_EXEC flag doesn't have to be specified after all. You can open a file
with O_RDONLY and you still be able to fexecve(2) it.
2012-09-27 16:43:23 +00:00
Hans Petter Selasky
7a2275046d Make sure the "wMaxPacketSize" limitations are respected. 2012-09-27 15:45:24 +00:00
Hans Petter Selasky
19f9c619a2 Make sure we record NAK tokens in the TD structure for IN direction.
Improve host channel disabling. Wait two times 125us for channel to be
disabled. The DWC OTG doesn't like when channels are re-used too early.
2012-09-27 15:23:38 +00:00
Mikolaj Golub
47813f5d94 Kernel and modules have "set_vnet" linker set, where virtualized
global variables are placed. When a module is loaded by link_elf
linker its variables from "set_vnet" linker set are copied to the
kernel "set_vnet" ("modspace") and all references to these variables
inside the module are relocated accordingly.

The issue is when a module is loaded that has references to global
variables from another, previously loaded module: these references are
not relocated so an invalid address is used when the module tries to
access the variable. The example is V_layer3_chain, defined in ipfw
module and accessed from ipfw_nat.

The same issue is with DPCPU variables, which use "set_pcpu" linker
set.

Fix this making the link_elf linker on a module load recognize
"external" DPCPU/VNET variables defined in the previously loaded
modules and relocate them accordingly. For this set_pcpu_list and
set_vnet_list are used, where the addresses of modules' "set_pcpu" and
"set_vnet" linker sets are stored.

Note, archs that use link_elf_obj (amd64) were not affected by this
issue.

Reviewed by:	jhb, julian, zec (initial version)
MFC after:	1 month
2012-09-27 14:55:15 +00:00
Edward Tomasz Napierala
a0a6ff825b Remove useless NULL checks after M_WAITOK allocations. 2012-09-27 10:51:38 +00:00
Gleb Smirnoff
e5280830c4 Fix zillions of style(9) and spacing bugs introduced by r240981.
Pointy hat to:	sobomax
2012-09-27 10:46:22 +00:00
Gleb Smirnoff
904c39091c Fix several build failures for !COMPAT_FREEBSD32 and
!COMPAT_FREEBSD* kernels introduced by r240981.

Pointy hat to:	sobomax
2012-09-27 10:30:11 +00:00
Gleb Smirnoff
85c05144f1 Fix bug in TCP_KEEPCNT setting, which slipped in in the last round
of reviewing of r231025.

Unlike other options from this family TCP_KEEPCNT doesn't specify
time interval, but a count, thus parameter supplied doesn't need
to be multiplied by hz.

Reported & tested by:	amdmi3
2012-09-27 07:13:21 +00:00
Adrian Chadd
08977788d5 Track the last ANI TX/RX sample correctly.
This doesn't specifically fix the issue(s) i'm seeing in this 2GHz
environment (where setting/increasing spur immunity causes OFDM restart
errors to skyrocket through the roof; but leaving it at 0 would leave
the environment cleaner..)

Pointy-hat-to:	me, for committing this broken code in the first place.
2012-09-27 06:05:54 +00:00
Alan Cox
703205f3c6 Implementing pmap_kextract(va) as pmap_extract(kernel_pmap, va) is
problematic because some callers to pmap_kextract() expect its
implementation to be lock-less.  In particular, uma_dbg_alloc() implicitly
requires this.  Otherwise, lock-order reversals occur between pmap locks and
UMA zone locks.  So, this change introduces a lock-less implementation of
pmap_kextract().

Disable recursion on the pvh global lock in the new armv6 pmap.  While
recursion on this locks occurs in the old arm pmap, it thankfully doesn't
occur in the armv6 pmap.

Tested by:	jmg
2012-09-27 05:39:42 +00:00
Maxim Sobolev
b01bf72b6e Add 32-bit ABI compat shims. Those are necessary for i386 binary-only
tools like sysutils/hpacucli (HP P4xx RAID controller management
suite) working on amd64 systems.

PR:		139271
Submitted by:	Kazumi MORINAGA, Eugene Grosbein
MFC after:	1 week
2012-09-27 04:28:55 +00:00
Gleb Smirnoff
80cd7c7596 - In the bridge_enqueue() do success/error accounting for
each fragment, not only once.
- In the GRAB_OUR_PACKETS() macro do increase if_ibytes.
2012-09-26 20:09:48 +00:00
Hans Petter Selasky
55df160153 Make sure the DWC OTG host mode channels are given enough time to disable. 2012-09-26 18:59:20 +00:00
John Baldwin
aceb040376 Merge similar fixes from 223198 from igb to ixgbe:
- Use a dedicated task to handle deferred transmits from the if_transmit
  method instead of reusing the existing per-queue interrupt task.
  Reusing the per-queue interrupt task could result in both an interrupt
  thread and the taskqueue thread trying to handle received packets on a
  single queue resulting in out-of-order packet processing and lock
  contention.
- Don't define ixgbe_start() at all where if_transmit is used.

Tested by:	Vijay Singh
Reviewed by:	jfv
MFC after:	2 weeks
2012-09-26 18:11:43 +00:00
Jim Harris
37274fc04c Create led(4) device nodes mapped to isci(4) SGPIO locate LEDs.
Device nodes are in the format /dev/led/isci.busX.portY.locate.

Sponsored by:	Intel
Requested by:	Paul Maulberger <paul dot maulberger at gmx dot de>
MFC after:	1 week
2012-09-26 16:46:44 +00:00
John Baldwin
b04e4c122b Remove FreeBSD 4.x compat shims. Verified by md5. 2012-09-26 14:17:14 +00:00
John Baldwin
7f6194d6a6 Grab the mfi_config_lock while performing a MFI_DCMD_CFG_FOREIGN_IMPORT
request on behalf of a user utility.

Submitted by:	Steven Hartland  killing multiplay co uk
MFC after:	1 week
2012-09-26 14:14:06 +00:00
Andrew Turner
2193c1b48a Create the new initarm_ functions to reduce the diff to the other FDT
versions of initarm
2012-09-26 10:07:53 +00:00
Martin Matuska
8469b12c2e Merge recent vendor changes in ZFS.
Illumos issued covered:
2811 missing implementation: zfs send -r
3139 zdb dies when it tries to determine path of unlinked file
3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg
3208 moving zpool cross-endian results in incorrect user/group accounting

References:
  https://www.illumos.org/issues/ + [issue_id]

Obtained from:	illumos (vendor/illumos, vendor/illumos-sys)
MFC after:	2 weeks
2012-09-26 09:37:58 +00:00
Andrew Turner
a9111e46bc Use arm_dump_avail_init to build the dump_avail array 2012-09-26 09:27:38 +00:00
Andrew Turner
f902e2e2d3 Start to clean up the lpc initarm as it also uses FDT. 2012-09-26 09:25:31 +00:00
Konstantin Belousov
94cb35459d Make the updates of the tid ring buffer' head and tail pointers
explicit by moving them into separate statements from the buffer
element accesses.

Requested by:	jhb
MFC after:	3 days
2012-09-26 09:25:11 +00:00
Edward Tomasz Napierala
43f3d8e372 Fix panic in CTL caused by trying to free invalid pointers passed
by the userland process via the IOCTL interface.

Reviewed by:	ken@
2012-09-26 07:09:15 +00:00
Adrian Chadd
7403d1b9b2 Map the non-QoS TID to the voice queue, in order to ensure important
things like EAPOL frames make it out.

After a whole bunch of hacking/testing, I discovered that they weren't
being early-dropped by the stack (but I should look at ensuring that
later..) but were even making to the hardware transmit queue.
They were mostly even being received by the remote end.  However, the
remote end was completely ignoring them.

This didn't happen under 150-170MBit TCP tests as I'm guessing the TX
queue stayed very busy and the STA didn't do any scanning. However, when
doing 100Mbit/s of TCP traffic, the STA would do background scanning -
which involves it coming in and out of powersave mode with the AP.

Now, this is a total and utter hack around the real problems, which are:

* I need to implement proper power save handling and integrate it into
  the filtered frames support, so the driver/stack doesn't send frames
  whilst the station is actually in sleep;

* .. but frames were actually making it to the STA (macbook pro) and
  the AP did receive an ACK; but a tcpdump on the receiving side showed
  the EAPOL frame never made it. So the stack was dropping it for
  some reason;

* Importantly - the EAPOL frames are currently going into the non-QoS
  TID, which maps to the BE queue and is susceptible to that queue being
  busy doing other things, but;

* There's other traffic going on in the non-QoS TID from other contexts
  when scanning is going on and it's possible there's some races causing
  sequence number/IV issues, but;

* Importantly importantlly, I think the interaction with TID 16 multicast
  traffic in power save mode is causing issues - since I -believe- the
  sequence number space being used by the EAPOL frames on TID 16 overlaps
  with the multicast frames that have sequence numbers allocated and
  are then stuffed on the cabq.  Since with EAPOL frames being in TID 16
  and queued to the BE queue, it's going to be waiting to be serviced
  with all of the aggregate traffic going on - and if the CABQ gets
  emptied beforehand, those TID 16 multicast frames with sequence numbers
  will go out beforehand.

Now, there's quite likely a bunch of "stuff happening slightly out of
sequence" going on due to the nature of the TX path (read: lots of
overlapping and concurrent ath_start() and ath_raw_xmit() calls going
on, sigh) but I thought I had caught them all and stuffed each TID TX
behind a lock (that lasted as long as it needed to in order to get
the frame onto the relevant destination queue - thus keeping things
in order.)

Unfortunately the last problem is the big one and I'm going to stare at
it some more.  If it _is_

So this is a work around for now to ensure that EAPOL frames actually
make it out before any other stuff in the non-QoS TID and HOPEFULLY
before the CABQ gets active.

I'm now going to spend a little time in the TX path figuring out exactly
why the sender is rejecting things. There's two (well, three if you count
EAPOL contents invalid) possibilities:

* The sequence number is out of order (ie, something else like the multicast
  traffic on CABQ) is going out first on TID 16;
* The CCMP IV is out of order (similar to above - but less likely,  as the
  TX key for multicast traffic is different to unicast traffic);
* EAPOL contents strangely invalid.

AP: Ubiquiti RSPRO, AR9160/AR9220 NICs
STA: Macbook Pro, Broadcom 11n NIC
2012-09-26 03:45:42 +00:00
Ed Maste
66d3579a1e Correct misspelling in debug output. 2012-09-26 01:09:19 +00:00
Ed Maste
c11038e252 Revert part of an earlier patch attempt that snuck in with r240938. 2012-09-25 23:41:45 +00:00
Pawel Jakub Dawidek
28f865b0b1 Fix freebsd32_kmq_timedreceive() and freebsd32_kmq_timedsend() to use
getmq_read() and getmq_write() respectively, just like sys_kmq_timedreceive()
and sys_kmq_timedsend().

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 22:15:59 +00:00
Aleksandr Rybalko
869785b49c Add more SPI flash IDs.
Submitted by:	Luiz Otavio O Souza.
Submitted by:	ZRouter.org project.
Approved by:	adrian (menthor)
2012-09-25 22:12:07 +00:00
Ed Maste
5a71e42350 Avoid INVARIANTS panic destroying an in-use tap(4)
The requirement (implied by the KASSERT in tap_destroy) that the tap is
closed isn't valid; destroy_dev will block in devdrn while other threads
are in d_* functions.

Note: if_tun had the same issue, addressed in SVN revisions r186391,
r186483 and r186497.  The use of the condvar there appears to be
redundant with the functionality provided by destroy_dev.

Sponsored by:	ADARA Networks
Reviewed by:	dwhite
MFC after:	2 weeks
2012-09-25 22:10:14 +00:00
Pawel Jakub Dawidek
8c706ce0d0 vn_write() always expects FOF_OFFSET flag, which is asserted at the begining,
so there is no need to check for it.

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 21:31:17 +00:00
Ed Maste
cf8f32025f Remove an incorrect comment 2012-09-25 21:19:17 +00:00
Pawel Jakub Dawidek
3a038c4d68 We cannot open file for reading and executing (O_RDONLY | O_EXEC).
Well, in theory we can pass those two flags, because O_RDONLY is 0,
but we won't be able to read from a descriptor opened with O_EXEC.

Update the comment.

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 21:11:40 +00:00
Pawel Jakub Dawidek
5c3e5c7f03 Require CAP_DELETE on directory descriptor for unlinkat(2).
Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 21:00:36 +00:00
Pawel Jakub Dawidek
cffcbad2bf Require CAP_CREATE on directory descriptor for symlinkat(2).
Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 20:59:12 +00:00
Pawel Jakub Dawidek
d2e166e654 Require CAP_CREATE on directory descriptor for linkat(2).
Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 20:58:15 +00:00
Pawel Jakub Dawidek
1159429db8 O_EXEC flag is not part of the O_ACCMODE mask, check it separately.
If O_EXEC is provided don't require CAP_READ/CAP_WRITE, as O_EXEC
is mutually exclusive to O_RDONLY/O_WRONLY/O_RDWR.

Without this change CAP_FEXECVE capability right is not enforced.

Sponsored by:	FreeBSD Foundation
MFC after:	3 days
2012-09-25 20:48:49 +00:00
Adrian Chadd
0a54471901 Oops - don't do the clrdmask check in ath_tx_xmit_normal() - the wrong
lock may be held.

Kim reported that the TID lock wasn't held when ath_tx_update_clrdmask()
was called. Well, the underlying hardware TXQ for that TID.

I'm betting it's the cabq stuff. ath_tx_xmit_normal() can be called
for both real and software cabq.  For software cabq, the real destination
txq is different to the txq. So, the lock check will fail.

Reported by:	Kim Culhan <w8hdkim@gmail.com>
2012-09-25 20:41:43 +00:00
George V. Neville-Neil
0bf9cb917c Change the module name for the I/O provider to "kernel" from
"genunix"  This will requires us to modify externally created
DTrace scripts but makes logical sense for FreeBSD.

Requested by: rpaulo
MFC after:	2 weeks
2012-09-25 19:16:28 +00:00
Ryan Stone
a916893f23 Some aac(4) adapters will always report that a direct access device is
offline in response to a INQUIRY command that does not retreive vital
product data(I personally have observed the behaviour on an Adaptec 2405
and a 5805).  Force the peripheral qualifier to "connected" so that upper
layers correctly recognize that a disk is present.

This bug was uncovered by r216236.  Prior to that fix, aac(4) was
accidentally clearing the peripheral qualifier for all inquiry commands.

This fixes an issue where passthrough devices were not created for
disks behind aac(4) controllers suffering from the bug.  I have
verified that if a disk is not present that we still properly detect
that and not create the passthrough device.

Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2012-09-25 19:12:12 +00:00
John Baldwin
d95dca1d08 Add optional entropy harvesting for software interrupts in swi_sched()
as controlled by kern.random.sys.harvest.swi.  SWI harvesting feeds into
the interrupt FIFO and each event is estimated as providing a single bit of
entropy.

Reviewed by:	markm, obrien
MFC after:	2 weeks
2012-09-25 14:55:46 +00:00
Gleb Smirnoff
486e090204 Fix panic introduced by me in r240835, when zero weight
was passed to wtab_alloc().

Reported by:	Kim Culhan <w8hdkim gmail.com>
2012-09-25 12:45:41 +00:00
Alexander Motin
aa91d1a85c Reduce delays in several wait loops from 10ms to 10us, same is it is done
in Linux. This substantially increases graphics performance on Ivy Bridge.

Submitted by:	avg@
Reviewed by:	kib@
2012-09-25 10:52:49 +00:00
Adrian Chadd
23f44d2b30 Call ath_tx_tid_unsched() after the node has been flushed, so the
state can be printed correctly.
2012-09-25 05:56:59 +00:00
Alan Cox
c4fc9c14c2 Eliminate an unused declaration. 2012-09-25 03:59:10 +00:00
Jim Harris
6aa3468375 Use CAM_SEL_TIMEOUT and CAM_DEV_NOT_THERE to report missing targets or
LUNs respectively.  This removes a huge number of error messages
from CAM during bus scans.

Copied almost verbatim from mav's commit r237460.

Submitted by:	Mike Tancsa <mike at sentex dot net>
MFC after:	3 days
2012-09-24 21:45:41 +00:00
Jim Harris
95161fc323 Specify MTX_RECURSE for the controller's io_lock. Without it, tws(4)
immediately panics on boot with INVARIANTS enabled.  The driver already
clearly expects to be able to recurse on this mutex - the main I/O
is always recursing on this lock.

Reported and tested by:  Mike Tancsa <mike at sentex dot net>
MFC after: 1 week
2012-09-24 21:40:22 +00:00
Adrian Chadd
0368251456 Migrate the ath(4) KTR logging to use an ATH_KTR() macro.
This should eventually be unified with ATH_DEBUG() so I can get both
from one macro; that may take some time.

Add some new probes for TX and TX completion.
2012-09-24 20:35:56 +00:00
Adrian Chadd
6d24c7dbab Debugging output fixes:
* use the correct frame status - although the completion descriptor is
  the _last_ in the frame/aggregate, the status is currently stored in
  the _first_ buffer.

* Print out ath_buf specific fields once, not per descriptor in an ath_buf.
2012-09-24 19:48:41 +00:00
Hans Petter Selasky
391d3f18dc DWC OTG host mode improvements:
- Make HSIC selection dynamic.
 - Make LOW speed USB devices work through HIGH speed USB HUB.
2012-09-24 16:34:13 +00:00
Alexander Motin
076b76e871 Fix panic caused by wrong pointer dereference, left after pin sense rewrite
at r230551.

Also while there, make sense polling use reported for each node separately
instead of reporting accumulated total status.

Submitted by:	Barbara <barbara.freebsd@gmail.com> (1)
MFC after:	3 days
2012-09-24 08:23:05 +00:00
Adrian Chadd
0c54de88e6 Prepare for software retransmission of non-aggregate frames but ensure
it's disabled.

The previous commit to enable CLRDMASK setting didn't do it at all
correctly for non-aggregate sessions - so the CLRDMASK bit would be
cleared and never re-set.

* move ath_tx_update_clrdmask() to be called by functions that setup
  descriptors and queue frames to the hardware, rather than scattered
  everywhere.

* Force CLRDMASK to be set on all non-aggregate session frames being
  transmitted.

* Use ath_tx_normal_comp() now on non-aggregate sessoin frames
  that are queued via ath_tx_xmit_normal().  That way the TID hwq is
  updated and they can trigger (eventual) filter frame queue resets
  and software retransmits.

There's still a bit more work to do in this area to reverse the silly
short-sightedness on my part, however it's likely going to be better
to fix this now than just reverting the patch.

Thanks to people on the freebsd-wireless@ mailing list for promptly
pointing this out.
2012-09-24 06:42:20 +00:00
Adrian Chadd
94eefcf1dc In (eventual) preparation for supporting disabling the whole 11n/software
retry path - add some code to make it obvious (to me!) how to disable
the software tx path.
2012-09-24 06:00:51 +00:00
Pedro F. Giffuni
c148237d44 Partial revert of r239963:
The following change caused rpc.lockd to exit after startup:
____

libtirpc: be sure to free cl_netid and cl_tp

When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the
same way.
____

MFC after:	3 days
Reported by:	David Wolfskill
Tested by:	David Wolfskill
2012-09-24 03:14:17 +00:00
Sean Bruno
126a39ce60 This patch fixes a nit in the em, lem, and igb driver statistics. Increment
adapter->dropped_pkts instead of if_ierrors because if_ierrors is
overwritten by hw stats collection.

Submitted by:	Andrew Boyer <aboyer@averesystems.com>
Reviewed by:	Jack F Vogel <jfv@freebsd.org>
MFC after:	2 weeks
2012-09-23 22:53:39 +00:00
Pawel Jakub Dawidek
c622f88dd2 It is possible to recursively destroy snapshots even if the snapshot
doesn't exist on a dataset we are starting from. For example if we
have the following configuration:

	tank
	tank/foo
	tank/foo@snap
	tank/bar
	tank/bar@snap

We can execute:

	# zfs destroy -t tank@snap

eventhough tank@snap doesn't exit.

Unfortunately it is not possible to do the same with recursive rename:

	# zfs rename -r tank@snap tank@pans
	cannot open 'tank@snap': dataset does not exist

...until now. This change allows to recursively rename snapshots even if
snapshot doesn't exist on the starting dataset.

Sponsored by:	rsync.net
MFC after:	2 weeks
2012-09-23 20:12:10 +00:00
Andrew Turner
c2257f93ba Clean up the bcm2835 initarm. It is now identical to the other ARMv6 copies
Tested by:	Alexander Yerenkow
2012-09-23 19:48:29 +00:00
Pawel Jakub Dawidek
bcb77be2b7 Add TRIM support.
The code builds a map of regions that were freed. On every write the
code consults the map and eventually removes ranges that were freed
before, but are now overwritten.

Freed blocks are not TRIMed immediately. There is a tunable that defines
how many txg we should wait with TRIMming freed blocks (64 by default).

There is a low priority thread that TRIMs ranges when the time comes.
During TRIM we keep in-flight ranges on a list to detect colliding
writes - we have to delay writes that collide with in-flight TRIMs in
case something will be reordered and write will reached the disk before
the TRIM. We don't have to do the same for in-flight writes, as
colliding writes just remove ranges to TRIM.

Sponsored by:	multiplay.co.uk

This work includes some important fixes and some improvements obtained
from the zfsonlinux project, including TRIMming entire vdevs on pool
create/add/attach and on pool import for spare and cache vdevs.

Obtained from:	zfsonlinux
Submitted by:	Etienne Dechamps <etienne.dechamps@ovh.net>
2012-09-23 19:40:58 +00:00
Alan Cox
1f11f2bff4 Address a race condition that was introduced in r238212. Unless the page
queues lock is acquired before the page lock is released, there is no
guarantee that the page will still be in that same page queue when
vm_page_requeue() is called.

Reported by:		pho
In collaboration with:	kib
MFC after:	3 days
2012-09-23 17:42:39 +00:00
Nathan Whitehorn
2383d92ae8 Move the prototype for savectx from cpu.h to pcb.h, as it is on other
platforms, as well as putting it in an #ifdef KERNEL block.

MFC after:	2 weeks
2012-09-23 17:33:16 +00:00
Hans Petter Selasky
3eabad2587 DWC OTG host mode improvements. Add support for the 3-strikes and you are
gone rule. Optimise use of channels so that when a channel
is not ready another channel is used. Instead of using the SOF interrupt
use the system timer to drive the host statemachine. This might
give lower throughput and higher latency, but reduces the CPU usage
significantly. The DWC OTG host mode support should not be considered
for serious USB host controller applications. Some problems are still
seen with LOW speed USB devices.
2012-09-23 12:19:19 +00:00
Hans Petter Selasky
3c12706c5e Correct driver name.
MFC after:	1 weeks
2012-09-23 09:39:04 +00:00
Yoshihiro Takahashi
47c6c015ff MFi386: revision 237445
Commit changes missed from r237435.  Properly calculate the signal
  trampoline addresses after the shared page is enabled.  Handle FreeBSD
  ABIs without shared page support too.

MFi386: revision 238792

  Introduce curpcb magic variable.
2012-09-23 09:13:57 +00:00
Yoshihiro Takahashi
a112b2d0e4 MFi386: revision 240637
loader/i386: replace ugly inb/outb re-implementations with cpufunc.h
2012-09-23 08:50:54 +00:00
Andrew Turner
610ba83a97 Fix a typo in a Broadcom initarm debug printf 2012-09-23 08:49:41 +00:00
Yoshihiro Takahashi
4794983d3e Cosmetic changes. 2012-09-23 08:46:44 +00:00
Kevin Lo
cecaa4738c Remove unused variable ma. 2012-09-23 08:44:12 +00:00
Michael Tuexen
e06f3469e0 Whitespace change.
MFC after:	3 days
2012-09-23 07:43:10 +00:00
Michael Tuexen
a98809db78 Declare a static function as such.
MFC after:	3 days
2012-09-23 07:23:18 +00:00
Andrew Turner
1f008b99cc Pull out the SoC specific parts of initarm into separate functions 2012-09-23 03:46:03 +00:00
Andrew Turner
70203625fb Update different versions of physmap_init to be identical in preparation
for merging them.
2012-09-23 02:01:59 +00:00
Andrew Turner
d98d8a1e83 Reduce the diff between the FDT implementations of initarm.
This only touches whitespace and comments.
2012-09-22 22:41:38 +00:00
Michael Tuexen
efb0814c24 Fix a bug related to handling Re-config chunks. It is not true that
the association can be removed if the socket is gone.

MFC after:	3 days
2012-09-22 22:04:17 +00:00
Gleb Smirnoff
51e02a31d0 EBUSY is a better reply for refusing to unload pf(4) or pfsync(4).
Submitted by:	pluknet
2012-09-22 19:03:11 +00:00
Gleb Smirnoff
e9e4cb7345 Use M_NOWAIT in wtab_alloc(), too. Convert panic() to
a soft failure here. wtab_alloc() is used by red_alloc(),
which can fail.

Reported by:	Kim Culhan <w8hdkim gmail.com>
2012-09-22 18:47:14 +00:00
Pawel Jakub Dawidek
45a1f1e1ff Add rounddown2() macro similar to the roundup2() macro. 2012-09-22 17:49:25 +00:00
Andriy Gapon
81c4584e30 zfs: allow a zvol to be used as a pool vdev, again
Do this by checking if spa_namespace_lock is already held and not taking
it again in that case.
Add a comment explaining why that is done and why it is safe.

Reviewed by:	pjd
MFC after:	24 days
2012-09-22 17:42:53 +00:00
Pawel Jakub Dawidek
a6512306ce Fix an obvious typo. 2012-09-22 17:41:56 +00:00
Pawel Jakub Dawidek
3c5a057574 As in r226967, r226987 and r232401 changes to UFS and TMPFS remove cache
entries associated with the source and the target of rename().

MFC after:	1 week
2012-09-22 17:32:40 +00:00
Michael Tuexen
2089750009 Small cleanups. No functional change.
MFC after:	10 days
2012-09-22 14:39:20 +00:00
Gleb Smirnoff
03fb709e1f Convert more M_WAITOK malloc() to M_NOWAIT.
Reported by:	Kim Culhan <w8hdkim gmail.com>
2012-09-22 12:49:36 +00:00
Pawel Jakub Dawidek
171f6b3a34 Use the topology lock to protect list of providers while withering them.
It is possible that provider is destroyed while we are iterating over the
list.

Reported by:	Brian Parkison <parkison@panzura.com>
Discussed with:	phk
MFC after:	1 week
2012-09-22 12:41:49 +00:00
Konstantin Belousov
787a64ddd2 Do not skip two elements of the tid_buffer when reusing the buffer
slot. This eventually results in exhaustion of the tid space, causing
new threads get tid -1 as identifier.

The bad effect of having the thread id equal to -1 is that
UMTX_OP_UMUTEX_WAIT returns EFAULT for a lock owned by such thread,
because casuword cannot distinguish between literal value -1 read from
the address and -1 returned as an indication of faulted
access. _thr_umutex_lock() helper from libthr does not check for
errors from _umtx_op_err(2), causing an infinite loop in
mutex_lock_sleep().

We observed the JVM processes hanging and consuming enormous amount of
system time on machines with approximately 100 days uptime.

Reported by:	Mykola Dzham <freebsd levsha org ua>
MFC after:	1 week
2012-09-22 12:17:09 +00:00
Gleb Smirnoff
29bdd62c85 When connection rate hits and we overload a source to a table,
we are actually editing table, which means editing rules,
thus we need writer access to 'em.

Fix this by offloading the update of table to the same taskqueue,
we already use for flushing. Since taskqueues major task is now
overloading, and flushing is optional, do mechanical rename
s/flush/overload/ in the code related to the taskqueue.

Since overloading tasks do unsafe referencing of rules, provide
a bandaid in pf_purge_unlinked_rules(). If the latter sees any
queued tasks, then it skips purging for this run.

In table code:
- Assert any lock in pfr_lookup_addr().
- Assert writer lock in pfr_route_kentry().
2012-09-22 10:14:47 +00:00
Gleb Smirnoff
e706fd3a3a In pfr_insert_kentry() return ENOMEM if memory allocation failed. 2012-09-22 10:04:48 +00:00
Gleb Smirnoff
7348c5240d Fix fallout from r236397 in pfr_update_stats(), that was missed
later in r237155. We need to zero sockaddr before lookup. While
here, make pfr_update_stats() panic on unknown af.
2012-09-22 10:02:44 +00:00
Hans Petter Selasky
9b42038b8a Apply some more casting. 2012-09-22 08:02:42 +00:00
Rui Paulo
5ea861555c Improve the check for p4 opened files.
Now we only search for opened files in ${SYSDIR}, which makes it
possible to use multiple source trees.
2012-09-22 07:44:36 +00:00
Hans Petter Selasky
8692ca3647 Apply correct casting. 2012-09-22 07:27:24 +00:00
Alan Cox
6b7d314db2 Since UMA_ZONE_NOFREE is specified when l2zone and l2table_zone are created,
there is no need to release and reacquire the pmap and pvh global locks
around calls to uma_zfree().  Recursion into the pmap simply won't occur.

Eliminate the use of M_USE_RESERVE.  It is deprecated and, in fact, counter-
productive, meaning that it actually makes the memory allocation request
more likely to fail.

Eliminate the macros pmap_{alloc,free}_l2_dtable().  They are of limited
utility, and pmap_free_l2_dtable() was inconsistently used.

Tidy up pmap_init().  In particular, change the initialization of the PV
zone so that it doesn't span the initialization of the l2 and l2table zones.

Tested by:	jmg
2012-09-22 06:54:03 +00:00
Andrew Turner
1161298251 Create a common set_stackptrs in sys/arm/machdep.c.
On single core devices set_stackptrs is only ever called with cpu = 0 in
initarm and will be identical to the existing function. On SMP this needs
to be implemented for sys/arm/mp_machdep.c, but the implementations are
identical for each SoC.
2012-09-22 06:41:56 +00:00
Andreas Tobler
bf23276b69 Remove leftover from r215163. 2012-09-21 21:27:57 +00:00
Rui Paulo
b0bf1a1836 Remove #ident macro. 2012-09-21 19:18:39 +00:00
Andreas Tobler
777813c555 Implement elfN(reloc) for powerpc. With this change the kernel is now able to
resolve dependencies of modules at boot time and load additional modules when
needed.

MFC after:	1 week
2012-09-21 18:21:31 +00:00
Dimitry Andric
f99157cced After r205013, amd64 and i386 CPU family and model IDs were printed out
in hexadecimal, but without any 0x prefix, which can be very misleading.

MFC after:	3 days
2012-09-21 10:31:19 +00:00
Hans Petter Selasky
2a4e4c6772 Fix typo. 2012-09-20 15:11:59 +00:00
Kevin Lo
26f370d011 Fix typo: s/protocl/protocol 2012-09-20 10:07:31 +00:00
Gleb Smirnoff
3b7d677b8f Convert lagg(4) to use if_transmit instead of if_start.
In collaboration with:	thompsa, sbruno, fabient
2012-09-20 10:05:10 +00:00
Konstantin Belousov
5f9c767b19 Plug the accounting leak for the wired pages when msync(MS_INVALIDATE)
is performed on the vnode mapping which is wired in other address space.

While there, explicitely assert that the page is unwired and zero the
wire_count instead of substract. The condition is rechecked later in
vm_page_free(_toq) already.

Reported and tested by:	zont
Reviewed by:	alc (previous version)
MFC after:	1 week
2012-09-20 09:52:57 +00:00
Gavin Atkinson
3cdfd8d3b2 The correct generic term for PCIS_STORAGE_NVM is "NVM" not "NVM Express".
Submitted by:	jimharris
MFC after:	6 days
2012-09-20 08:30:17 +00:00
Gleb Smirnoff
b7340ded6e Reduce copy/paste when freeing an source node. 2012-09-20 07:04:08 +00:00
Gleb Smirnoff
22c914789e Utilize Jenkins hash with random seed for source nodes storage. 2012-09-20 06:52:05 +00:00
Kevin Lo
b7e1113e8f Fix typo: s/pakcet/packet 2012-09-20 03:29:43 +00:00
Adrian Chadd
4e81f27c59 Introduce the CLRDMASK gating based on tid->clrdmask, enabling filtered
frames to occur.

* Create a new function which will set the bf_flags CLRDMASK bit
  if required.
* For raw frames, always set CLRDMASK.
* For BAR, ADDBA frames, always set CLRDMASK.
* For everything else, check if CLRDMASK needs to be set before
  calling tx_setds() or tx_setds11n().
* When unpausing a queue or drain/resetting it, set tid->clrdmask=1
  just to ensure traffic starts flowing.

What I need to do:

* Modify that function to _clear_ the CLRDMASK if it's not required,
  or retried frames may have CLRDMASK set when they don't need to.
  (Which isn't a huge deal, but..)

Whilst I'm here:

* ath_tx_normal_xmit() should really act like the AMPDU session TX
  functions - any incomplete frames will end up being assigned
  ath_tx_normal_comp() which will decrement tid->hwq_depth - but that
  won't have been incremented.

  So whilst I'm here, add a comment to do that.

* Fix the debug print function to be slightly clearer about things;
  it's not a good sign when I can't interpret my own debugging output.

I've done some testing on AR9280/AR5416/AR9160 STA and AP modes.
2012-09-20 03:13:20 +00:00
Gleb Smirnoff
7b11548469 Add missing break.
Pointy hat to:	glebius
2012-09-20 03:09:58 +00:00
Adrian Chadd
d05b576d61 Place the comment where it should be. 2012-09-20 03:04:19 +00:00
Adrian Chadd
088d8b81f3 Add a work-around for some strange net80211 BAR races in the wireless
stack.

There are unfortunately quite a few odd cases in BAR TX and BAR TX
retransmission that I haven't yet fully diagnosed.  So for now, add
this work-around so the resume() function isn't called too often,
decrementing pause to -1 (and causing things to stay paused.)
2012-09-20 03:03:01 +00:00
Rick Macklem
c52005a31d Modify the NFSv4 client so that it can handle owner
and owner_group strings that consist entirely of
digits, interpreting them as the uid/gid number.
This change was needed since new (>= 3.3) Linux
servers reply with these strings by default.
This change is mandated by the rfc3530bis draft.
Reported on freebsd-stable@ under the Subject
heading "Problem with Linux >= 3.3 as NFSv4 server"
by Norbert Aschendorff on Aug. 20, 2012.

Tested by:	norbert.aschendorff at yahoo.de
Reviewed by:	jhb
MFC after:	2 weeks
2012-09-20 02:49:25 +00:00
Jung-uk Kim
042ff955b5 Merge ACPICA 20120913. 2012-09-19 23:25:24 +00:00
Tijl Coosemans
303d68bc4f Fix a panic when trying to play invalid audio tracks. 2012-09-19 18:42:31 +00:00
Jim Harris
7e2fd60604 In nvme(4), set device description for BUS_PROBE_GENERIC case.
Reported by:	jhb
2012-09-19 18:25:25 +00:00
Gavin Atkinson
a5c5eaae8c Recognise NVM Express devices and pretty-print their name.
MFC after:	1 week
2012-09-19 18:22:14 +00:00
Jim Harris
d891b199bf Report nvme(4) as a generic driver for NVMe devices if PCI class, subclass
and programming interface codes match.

Sponsored by:	Intel
2012-09-19 16:21:23 +00:00
Jim Harris
6483d5a592 Add constants for programming interfaces for NVM/solid state storage
controller sub-class code.

Reference:  PCI Code and ID Assignment Specification Rev 1.2

Sponsored by:	Intel
Inspired by:	gavin
MFC after: 	1 week
X-MFC-With:	r240694
2012-09-19 15:43:30 +00:00
Gavin Atkinson
536f8fdecf Add PCI subclass for NVM Express devices.
Reference:
http://www.nvmexpress.org/index.php/download_file/view/42/1/NVM_Express_1_0b.pdf
section 2.1.5.

MFC after:	1 week
2012-09-19 12:54:25 +00:00
Gavin Atkinson
e935190a33 Switch some PCI register reads from using magic numbers to using the names
defined in pcireg.h

MFC after:	1 week
2012-09-19 12:27:23 +00:00
John Baldwin
26e76e98ef As a followup to r234501, ensure that the native ioctl path always allocates
a 4kb buffer if a request uses a buffer size of 0.  (The Linux ioctl path
already did this.)

PR:		kern/155658
Submitted by:	Andreas Longwitz
MFC after:	1 week
2012-09-19 11:54:32 +00:00
Gavin Atkinson
d11d0374ab Add entries for two USB devices I have locally.
MFC after:	1 week
2012-09-18 22:25:49 +00:00
Gavin Atkinson
389c8bd51e Align the PCI Express #defines with the style used for the PCI-X
#defines.  This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.

This is a mostly mechanical rename:
  s/PCIR_EXPRESS_/PCIER_/g
  s/PCIM_EXP_/PCIEM_/g
  s/PCIM_LINK_/PCIEM_LINK_/g

When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.

Discussed with:	jhb
MFC after:	1 week
2012-09-18 22:04:59 +00:00
Adrian Chadd
0aa5c1bbf5 Oops - take a copy of ath_tx_status from the buffer before the TX processing
is done.

The aggregate path was definitely accessing 'ts' before it was actually
being assigned.

This had the side effect of over-filtering frames, since occasionally that
bit would be '1'.

Whilst here, do the same thing in the non-aggregate completion function -
as calling the filter function may also invalidate bf.

Pointy hat to: adrian, for not noticing this over many, many code reviews.
2012-09-18 20:33:04 +00:00
Gleb Smirnoff
2864dbbfc1 If caller specifies UMA_ZONE_OFFPAGE explicitly, then do not waste memory
in an allocation for a slab.

Reviewed by:	jeff
2012-09-18 20:28:55 +00:00
Jim Harris
8a382371f1 Add #if 0 around nvme_async_event_cb() until NVMe AER functionality
can be tested.

This fixes a build warning found only with clang.
2012-09-18 18:23:21 +00:00
Jim Harris
be4dcf1bfa Add __aligned(4) to NVMe defined data structures.
This fixes issue in nvmecontrol(8), where clang throws a cast-align
warning when casting a __packed structure pointer to a uint32_t
pointer as part of printing raw hex output.

Reported by: dhw
2012-09-18 18:16:52 +00:00
Alexander Motin
1123f298f3 Fix panics on attempt to dereference uninitizlized pointer, returned via
'path' argument of ofw_parsedev() if devspec refers raw device with no path.

For example, `ls /pci@1f,0/ide@d/disk@0,0:a/` works fine, while
`ls /pci@1f,0/ide@d/disk@0,0:a` panicked before this change.
2012-09-18 15:38:42 +00:00
Andriy Gapon
e67c0426bc hwpmc amd_pcpu_fini: fix a bug in code locked under DEBUG
MFC after:	16 days
2012-09-18 13:33:39 +00:00
Gleb Smirnoff
32726fe911 Do more than r236298 did in the projects/pf branch: use M_NOWAIT in
altq_add() and its descendants. Currently altq(4) in FreeBSD is configured
via pf(4) ioctls, which can't configure altq(4) w/o holding locks.
Fortunately, altq(4) code in spife of using M_WAITOK is ready to receive
NULL from malloc(9), so change is mostly mechanical. While here, utilize
M_ZERO instead of bzero().

A large redesign needed to achieve M_WAITOK usage when configuring altq(4).
Or an alternative (not pf(4)) configuration interface should be implemented.

Reported by:	pluknet
2012-09-18 12:34:35 +00:00
Gleb Smirnoff
9ed8bbbdbe Fix build, pass the pointy hat please. 2012-09-18 12:21:32 +00:00
Gleb Smirnoff
7f7ef494f1 Provide kernel compile time option to make pf(4) default rule to drop.
This is important to secure a small timeframe at boot time, when
network is already configured, but pf(4) is not yet.

PR:		kern/171622
Submitted by:	Olivier Cochard-LabbИ <olivier cochard.me>
2012-09-18 11:07:19 +00:00
Gleb Smirnoff
1d6139c0e4 Make ruleset anchors in pf(4) reentrant. We've got two problems here:
1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
  as small as possible, following measures taken:
  - Maximum stack size reduced from 64 to 32.
  - The struct pf_anchor_stackframe trimmed by one pointer - parent.
    We can always obtain the parent via the rule pointer.
  - When pf_test_rule() calls pf_get_translation(), the former lends
    its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with:		dhartmei
2012-09-18 10:54:56 +00:00
Gleb Smirnoff
9e8c4accee - Add $FreeBSD$ to allow modifications to this file.
- Move $OpenBSD$ to a more standard place.
2012-09-18 10:52:46 +00:00
Adrian Chadd
f1bc738ece Implement my first cut at filtered frames in aggregation sessions.
The hardware can optionally "filter" frames if successive transmissions
to a given node (ie, "entry in the keycache") fail.  That way the hardware
can implement a kind of early abort of all the other frames queued to
that destination, rather than simply trying to TX each frame to that
destination (and failing.)

The background:

* If a frame comes back as being filtered, the hardware didn't try to
  TX it (or it was outside the TX burst opportunity.) So, take it as a hint
  that some (but not all, see below) frames to the destination may be
  filtered.

* If the CLRDMASK bit is set in a TX descriptor, the "filter to this
  destination" bit in the keycache entry is cleared and TX to that host
  will be unconditionally retried.

* Right now everything has the CLRDMASK bit set, so filtered frames
  tend to be aggregates and frames that fall outside of the WME burst
  window. It was a bit worse in the past as I had messed up the TX
  flags and CLRDMASK wasn't being set on aggregate frames.

The annoying bits:

* It's easy (ish) to do for aggregate session frames - firstly, they
  can be retried in any order as long as they're within the BAW, and
  there's already a bunch of infrastructure tracking how many frames
  the TID has queued to the hardware (tid->hwq_depth.) However, for
  frames that bypassed the software queue, hwq_depth doesn't get
  incremented. I'll fix that in a subsequent commit.

* For non-aggregate session frames, the only retries that can occur
  are ones for sequence numbers that hvaen't successfully been TXed yet.
  Since there's no re-ordering going on in non-aggregate sessions, if any
  subsequent seqno frames make it out, any filtered frames before that
  seqno need to be dropped.

  Hence why this initially is just for aggregate session frames.

* Since there may be intermediary frames to the destination that
  have CLRDMASK set - for example, any directly dispatched management
  frames to that destination - it's possible that there will be some
  filtered frames followed up by some non filtered frames.  Thus,
  it can't be assumed that once you see a filtered frame for the given
  destination node, all subsequent frames for all TIDs will be filtered.

Ok, with that in mind:

* Create a per-TID filtered frame queue for frames that the hardware
  returns as filtered.

* Track filtered frames per-tid, rather than per-node.  It just makes
  the locking much easier.

* When a filtered frame appears in the completion function, the node
  transitions to "filtered", and all subsequent completed error frames
  (filtered or otherwise) are put on the filtered frame queue.  The TID
  is paused once (during the transition from non-filtered to filtered).

* If a filtered frame retry count exceeds SWMAX_RETRIES, a BAR should be
  sent.

* Once all the frames queued to the hardware for the given filtered frame
  TID, transition back from filtered frame to non-filtered frame, which
  means pre-pending all the filtered frames onto the head of the software
  queue, clearing the filtered frame state and unpausing the TID.

Things get quite hairy around handling completion (aggr, non-aggr, norm,
direct-dispatched frames to a hardware queue); whether it's an "error",
"cleanup" or "BAR" state as well as filtered, which order to do things
in (eg do filtered BEFORE checking for BAR, as the filter completion
may be needed to actually transmit a BAR frame.)

This work has definitely reminded me that I have to tidy up all the locking
and remove some of the ridiculous lock/unlock/lock/unlock going on in the
completion functions.

It's also reminded me that I should really split out TID versus hardware TXQ
locking, even if the underlying locking is still the destination hardware TXQ.

Finally, this is all pre-requisite for working on AP mode power save support
(PS-POLL, uAPSD) as well as improving performance to misbehaving nodes (as
they can transition into filter mode, stopping any TX until everything has
caught up.)

Finally (ish) - this should also be done for non-aggregate sessions as
there are still plenty of laptops and mobile devices that don't speak
802.11n but do wish for stable, useful power save AP support where packets
aren't simply dropped.  This requires software retransmission for
non-aggregate sessions to be implemented, which includes the caveats I've
mentioned above.

Finally finally - this doesn't yet do anything about the CLRDMASK bit in the
TX descriptor.  That's still unconditionally set to 1.  I'll debug the
current work (mostly ensuring I haven't busted up the hairy transitions
between BAR, filtered, error (all frames in an aggregate failing) and
cleanup (when transitioning from aggregation -> non-aggregation.))

Finally finally finally - this is all original work by yours truely, rather
than ported from the Atheros internal driver codebase or Linux ath9k.

Tested:
 * AR9280, AR5416 in STA mode
 * AR9280, AR9130 in hostap mode
 * Lots and lots of iperf testing in very marginal and non-marginal conditions,
   complete with inducing filtered frames + BAR TX conditions.
2012-09-18 10:14:17 +00:00
Gleb Smirnoff
effbcf3842 Fix DIOCNATLOOK: zero key padding before performing lookup. 2012-09-18 09:15:32 +00:00
Andriy Gapon
a80a10b13b loader/i386: replace ugly inb/outb re-implementations with cpufunc.h
Use of __builtin_constant_p in a function that is only called via
a pointer is a good example of how out-of-date it was.

Suggested by:	bde
MFC after:	1 week
2012-09-18 08:53:11 +00:00
Andriy Gapon
154fc7b6c7 acpi_cpu: explicitly notify userland about c-state changes
... after they are committed.
A notification is sent per CPU.

Reviewed by:	imp
MFC after:	3 weeks
2012-09-18 08:17:29 +00:00
Andriy Gapon
6ed9e9f32f zfs: correctly calculate dn_bonuslen for saving SAs to disk
Since all attribute values start at 8-byte aligned boundary, we would
previously incorrectly calculate dn_bonuslen if any attribute but the
last had a variable-length value with length not multiple of 8.

Reported by:	Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Tested by:	Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Reviewed by:	Matthew Ahrens <mahrens@delphix.com> (for upstream)
MFC after:	2 weeks
2012-09-18 08:02:54 +00:00
Andriy Gapon
ea559fb573 zfs: allow both DEBUG and ZFS_DEBUG to be defined on command line
Discussed with:	pjd
MFC after:	10 days
2012-09-18 08:00:56 +00:00
Kevin Lo
9f614af4cf Add missing break 2012-09-18 08:00:43 +00:00
Andriy Gapon
85f5b9aa70 g_disk_flushcache definitely should not be traced under G_T_TOPOLOGY
... use G_T_BIO instead

MFC after:	1 week
2012-09-18 07:57:34 +00:00
Kevin Lo
08466b02d4 Remove bogus break statements.
Obtained from:	DragonFly
2012-09-18 02:19:43 +00:00
Adrian Chadd
8122c3163f Add a couple of accessor inline functions for state that exists in net80211.
Obtained from:	Qualcomm Atheros
2012-09-18 01:27:24 +00:00
Attilio Rao
6a612df12c Remove namespace pollution in _rmlock.h by defining rm_queue structure
directly in _rmlock.h and then including it (and its dependencies)
in pcpu.h. This leads to few _*.h headers to be included in pcpu.h
but this is not considered a big deal.

Really pc_rm_queue should be implemented as a dynamic member with
DPCPU interface, but we really want to keep the read acquisition as
fast as possible, so even the further pc_dynamic indirection should be
avoided, and the pollution is dealt like this.

Discussed with:	jhb
MFC after:	1 week
2012-09-18 00:43:15 +00:00
Adrian Chadd
d94f2d7f34 Rename AH_MIMO_MAX_CHAINS to AH_MAX_CHAINS, for compatibility with
internal atheros HAL code.
2012-09-17 23:24:45 +00:00
Jim Harris
978b27047d Add nvme(4) and nvd(4) Makefiles to the tree.
Noticed by:	pluknet
Pointy-hat to:  jimharris
2012-09-17 19:58:02 +00:00
Jim Harris
eb85d44f06 Integrate nvme(4) and nvd(4) into the amd64 and i386 builds.
Sponsored by:	Intel
2012-09-17 19:26:33 +00:00
Jim Harris
bb0ec6b359 This is the first of several commits which will add NVM Express (NVMe)
support to FreeBSD.  A full description of the overall functionality
being added is below.  nvmexpress.org defines NVM Express as "an optimized
register interface, command set and feature set fo PCI Express (PCIe)-based
Solid-State Drives (SSDs)."

This commit adds nvme(4) and nvd(4) driver source code and Makefiles
to the tree.

Full NVMe functionality description:
Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe)
device support.

There will continue to be ongoing work on NVM Express support, but there
is more than enough to allow for evaluation of pre-production NVM Express
devices as well as soliciting feedback.  Questions and feedback are welcome.

nvme(4) implements NVMe hardware abstraction and is a provider of NVMe
namespaces.  The closest equivalent of an NVMe namespace is a SCSI LUN.
nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks.
nvmecontrol(8) is used for NVMe configuration and management.

The following are currently supported:
nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue
     dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeoutsee below)
- compilation switches for support back to stable-7

nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling

nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in
      human-readable or hex format
- perftest: quick and dirty performance test to measure raw
      performance of NVMe device without userspace/physio/GEOM
      overhead

The following are still work in progress and will be completed over the
next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval

This has been primarily tested on amd64, with light testing on i386.  I
would be happy to provide assistance to anyone interested in porting
this to other architectures, but am not currently planning to do this
work myself.  Big-endian and dmamap sync for command/completion queues
are the main areas that would need to be addressed.

The nvme(4) driver currently has references to Chatham, which is an
Intel-developed prototype board which is not fully spec compliant.
These references will all be removed over time.

Sponsored by:        Intel
Contributions from:  Joe Golio/EMC <joseph dot golio at emc dot com>
2012-09-17 19:23:01 +00:00
Hans Petter Selasky
d7dd13419e Add UQ_UMS_IGNORE quirk.
Wrap two long lines.
Some minor spelling correction.

PR:	usb/171721
2012-09-17 19:06:35 +00:00
Hans Petter Selasky
e2524b2ec9 Implement support for USB Audio v2.0. Remove some redundant
USB audio v1.0 debug data, hence userspace tools like lsusb
exist to show this information properly.
2012-09-17 15:43:57 +00:00
John Baldwin
0fca6f8bf5 Add locking to mlx(4) to make it MPSAFE along with some other fixes:
- Use callout(9) rather than timeout(9).
- Add a mutex as an I/O lock that protects the adapter and is used
  for the I/O path.
- Add an sx lock as a configuration lock that protects the relationship
  of configured volumes.
- Freeze the request queue when a DMA load is deferred with EINPROGRESS
  and unfreeze the queue when the DMA callback is invoked.
- Explicitly poll the hardware while waiting to submit a command to
  allow completed commands to free up slots in the command ring.
- Remove driver-wide 'initted' variable from mlx_*_fw_handshake() routines.
  That state should be per-controller instead.  Add it as an argument
  since the first caller knows when it is the first caller.
- Remove explicit bus_space tag/handle and use bus_*() rather than
  bus_space_*().
- Move duplicated PCI device ID probing into a  mlx_pci_match() routine.
- Don't check for PCIM_CMD_MEMEN (the PCI bus will enable that when
  allocating the resource) and use pci_enable_busmaster() rather than
  manipulating the register directly.

Tested by:	no one despite multiple requests (hope it works)
2012-09-17 15:27:30 +00:00
Gavin Atkinson
058ede33bf - Add #defines for the bits within the iPCI Express PCIR_EXPRESS_LINK_CTL
register
- Add missing register PCIR_EXPRESS_ROOT_CAP
- Correct a spelling mistake (SLAT -> SLOT) [1]

Reviewed by:	jhb [1]
2012-09-17 12:51:48 +00:00
Kevin Lo
4e4eb12038 Remove unused variable cd.
This variable is initialized but not used.
2012-09-17 09:32:11 +00:00
Andrew Turner
71f5a44d88 Add a kernel config for the Toshiba AC100. The AC100 is an ARM laptop with
an NVidia Tegra 2 CPU.

Tegra 2 needs an external patch to pmap for atomic operations to work. Even
with this the Kernel only gets to the mount root prompt. As such Tegra
support is considered experimental, however adding the kernel config will
help ensure the Tegra code builds.
2012-09-17 09:22:59 +00:00
Andrew Turner
a7dc3573ca Add the Tegra2 DTS files. Now our dtc supports including other files use
this support to pull out the SoC specific parts of the dts file.
2012-09-17 07:14:07 +00:00
Adrian Chadd
c6e9cee205 Take credit for the work I've done in this source file. 2012-09-17 03:17:42 +00:00