92969 Commits

Author SHA1 Message Date
Adrian Chadd
ff5b563430 Initialise the chainmask fields regardless of whether 11n support
is compiled in or not.

This fixes issues with people running -HEAD but who build modules
without doing a "make buildkernel KERNCONF=XXX", thus picking up
opt_*.h.  The resulting module wouldn't have 11n enabled and the
chainmask configuration would just be plain wrong.
2013-04-19 21:49:11 +00:00
Luigi Rizzo
2579e2d715 mostly whitespace changes:
- remove vestiges of the old memory allocator
- clean up some comments
2013-04-19 21:08:21 +00:00
Kenneth D. Merry
21b6ee96fc Update chio(1) and ch(4) to support reporting element designators.
This allows mapping a tape drive in a changer (as reported by
'chio status') to a sa(4) driver instance by comparing the
serial numbers.

The designators can be ASCII (which is printed out directly), binary
(which is printed in hex format) or UTF-8, which is printed in either
native UTF-8 format if the terminal can support it, or in %XX notation
for non-ASCII characters.  Thanks to Hiroki Sato <hrs@> for
explaining UTF-8 printing and for the example UTF-8 printing code.

chio.h:		Modify the changer_element_status structure to add new
		fields and definitions from the SMC3r16 spec.

		Rename the original CHIOGSTATUS ioctl to OCHIOGSTATUS and
		define a new CHIOGSTATUS ioctl.

		Clean up some tab/space issues.

chio.c: 	For the 'status' subcommand, print the designator field
		if it is supplied by a device.

scsi_ch.h:	Add new flags for DVCID and CURDATA to the READ
		ELEMENT STATUS command structure.

		Add a read_element_status_device_id structure
		for the data fields in the new standard. Add new
		unions, dt_or_obsolete and voltage_devid, to hold
		and address data from either SCSI-2 or newer devices.

scsi_ch.c:	Implement support for fetching device IDs with READ
		ELEMENT STATUS data.

		Add new arguments to scsi_read_element_status() to
		allow the user to request the DVCID and CURDATA bits.
		This isn't compiled into libcam (it's only an internal
		kernel interface), so we don't need any special
		handling for the API change.

		If the user issues the new CHIOGSTATUS ioctl, copy all of
		the available element status data out.  If he issues the
		OCHIOGSTATUS ioctl, we don't copy the new fields in the
		structure.

		Fix a bug in chopen() that would result in the peripheral
		never getting unheld if chgetparams() failed.

Sponsored by:	Spectra Logic
Submitted by:	Po-Li Soong
MFC After:	1 week
2013-04-19 20:03:51 +00:00
Adrian Chadd
7b796c4039 Implement a very basic multi-PHY aware switch device.
This is intended to be used as a stop-gap for switch devices
which expose multiple ethernet PHYs but we don't have a driver
for - here, etherswitchcfg and the general switch configuration
API can be used to interface to said PHYs.

Submitted by:	Luiz Otavio O Souza <loos.br@gmail.com>
2013-04-19 17:50:38 +00:00
Jaakko Heinonen
a208417c41 Include PID in the error message which is printed when the maxproc limit
is exceeded. Improve formatting of the message while here.

PR:		kern/60550
Submitted by:	Lowell Gilbert, bde
2013-04-19 15:19:29 +00:00
Gleb Smirnoff
14658a80fe Don't compare unsigned socklen_t against < 0.
Reviewed by:	jhb
2013-04-19 13:40:13 +00:00
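The point of 14658a80fe can be illustrated with a small userland sketch. It assumes a platform where socklen_t is an unsigned type (as the commit message states for FreeBSD); the helper name optlen_is_valid() and its max parameter are hypothetical and do not come from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: check a caller-supplied length against an upper
 * bound.  With an unsigned socklen_t, a "len < 0" test can never be
 * true, so only the upper bound is checked.  A negative value passed
 * by a careless caller wraps to a very large number and is rejected
 * by the same comparison.
 */
bool
optlen_is_valid(socklen_t len, size_t max)
{
	return (len <= max);
}
```

A call such as optlen_is_valid((socklen_t)-1, 16) is rejected by the upper-bound test alone, which is why the extra signed comparison adds nothing.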
Jilles Tjoelker
1e367efa8b sem: Restart the POSIX sem_* calls after signals with SA_RESTART set.
Programs often do not expect an [EINTR] return from sem_wait() and POSIX
only allows it if the signal was installed without SA_RESTART. The timeout
in sem_timedwait() is absolute so it can be restarted normally.

The umtx call can be invoked with a relative timeout and in that case
[ERESTART] must be changed to [EINTR]. However, libc does not do this.

The old POSIX semaphore implementation did this correctly (before r249566),
unlike the new umtx one.

It may be desirable to avoid [EINTR] completely, which matches the pthread
functions and is explicitly permitted by POSIX. However, the kernel must
return [EINTR] at least for signals with SA_RESTART clear, otherwise pthread
cancellation will not abort a semaphore wait. In this commit, only restore
the 8.x behaviour which is also permitted by POSIX.

Discussed with:	jhb
MFC after:	1 week
2013-04-19 10:16:00 +00:00
Adrian Chadd
7904f51655 Add a debug statement to log the currently chosen chainmask configuration. 2013-04-19 08:06:45 +00:00
Adrian Chadd
b0bf95ff15 .. don't know how this snuck into this commit. Sorry.
Fix compile build before anyone notices.
2013-04-19 08:01:34 +00:00
Adrian Chadd
b661bd2e52 Print out the chainmask configuration. 2013-04-19 07:56:22 +00:00
Adrian Chadd
6f4fb2d8e6 Use uint32_t for fields that are fetched via ath_hal_getcapability(). 2013-04-19 06:59:10 +00:00
Justin Hibbits
6ba2699056 Fix the uart(4) module build. Without uart_dev_lpc the module cannot be loaded. 2013-04-19 05:46:16 +00:00
Andrey A. Chernov
2b50ce65be Attempt to mitigate poor initialization of arc4 by one-shot
reinitialization from yarrow right after good entropy is harvested.

Approved by:    secteam (delphij)
MFC after:      1 week
2013-04-19 00:30:52 +00:00
Rick Macklem
64a0e848ab When an NFS unmount occurs, once vflush() writes the last dirty
buffer for the last vnode on the mount back to the server, it
returns. At that point, the code continues with the unmount,
including freeing up the nfs specific part of the mount structure.
It is possible that an nfsiod thread will try to check for an
empty I/O queue in the nfs specific part of the mount structure
after it has been free'd by the unmount. This patch avoids this problem by
setting the iodmount entries for the mount back to NULL while holding the
mutex in the unmount and checking the appropriate entry is non-NULL after
acquiring the mutex in the nfsiod thread.

Reported and tested by:	pho
Reviewed by:	kib
MFC after:	2 weeks
2013-04-18 23:20:16 +00:00
Navdeep Parhar
3cc7ae06fd cxgbe(4): Refuse to install T5 firmwares on a T4 card (and vice versa).
MFC after:	1 week
2013-04-18 22:54:41 +00:00
Oleg Bulyzhin
2c5b403e2d Recover missing arp_ifinit() call.
MFC after:	2 weeks
2013-04-18 20:13:33 +00:00
Navdeep Parhar
dd181b2652 cxgbe/tom: Update the CLIP table on the chip when there are changes
to the list of IPv6 addresses on the system.  The table is used for
TOE+IPv6 only.
2013-04-18 19:52:11 +00:00
Alexander Motin
b2c63698d4 Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and
the respective functionality, which synchronizes the TSC on APs to match the
BSP's during boot.  It may be unsafe in the general case due to a theoretical
chance of later drift if CPUs use different clock rates or sources, but it
allows the TSC to be used in some cases where the difference is caused by an
initialization bug while the TSCs are known to increment synchronously.

Reviewed by:	jimharris, kib
MFC after:	1 month
2013-04-18 17:07:04 +00:00
Rick Macklem
175b3f31d3 Both NFS clients can deadlock when using the "rdirplus" mount
option. This can occur when an nfsiod thread that already holds
a buffer lock attempts to acquire a vnode lock on an entry in
the directory (a LOR) when another thread holding the vnode lock
is waiting on an nfsiod thread. This patch avoids the deadlock by disabling
readahead for this case, so the nfsiod threads never do readdirplus.
Since readaheads for directories need the directory offset cookie
from the previous read, they cannot normally happen in parallel.
As such, testing by jhb@ and myself didn't find any performance
degradation when this patch is applied. If there is a case where
this results in a significant performance degradation, mounting
without the "rdirplus" option can be done to re-enable readahead
for directories.

Reported and tested by:	jhb
Reviewed by:	jhb
MFC after:	2 weeks
2013-04-18 13:09:04 +00:00
Alexander Motin
ca11419237 Make siis(4) and mvs(4) send bus_get_dma_tag() requests to parent buses
passing the real bus's child pointers instead of grandchildren.

Requested by:	kib
2013-04-18 12:43:06 +00:00
Rui Paulo
5dfae12246 Move the previously added CPUID7 macros to CPUID_STDEXT. 2013-04-18 07:09:27 +00:00
Alan Cox
880659fe81 When calculating the number of reserved nodes, discount the pages that will
be used to store the nodes.

Sponsored by:	EMC / Isilon Storage Division
2013-04-18 05:34:33 +00:00
Rui Paulo
ba5f77bf16 Add the most current CPUID7_* definitions. 2013-04-18 01:30:08 +00:00
Rui Paulo
068e8f74e4 Print RDSEED, ADX, and SMAP.
Pointed out by:	kib
2013-04-18 01:21:44 +00:00
Kenneth D. Merry
adb974068b Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to
sys/nfs, since it is now shared by the two NFS servers.

Suggested by:	rmacklem
Sponsored by:	Spectra Logic
MFC after:	2 weeks
2013-04-17 22:42:43 +00:00
Hiren Panchasara
76fe16bb78 Improving r249461 by providing a better way to handle the clang warning.
PR:		kern/177164
Reviewed by:	jhb
Approved by:	sbruno (mentor)
2013-04-17 21:21:27 +00:00
Kenneth D. Merry
d96b98a360 Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes).  It does not currently work for NFSv4 requests.  They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately.  Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads.  This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered.  Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS.  Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once.  ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking.  This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables.  The read offset window calculation has been slightly
modified as well.  Instead of having separate bins, each file
handle has a rolling window of bin_shift size.  This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
	Add nfs_fha_new.c and nfs_fha_old.c.  Compile nfs_fha.c
	when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
	Bring in changes from Rick Macklem to newnfs_realign that
	allow it to operate in blocking (M_WAITOK) or non-blocking
	(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
	Bring in a change from Rick Macklem to allow telling
	nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
	Bring in changes from Rick Macklem to create a new
	nfsm_dissect_nonblock() inline function and
	NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
	Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
	Setup the new NFS server's RPC thread pool so that it will
	call the FHA code.

	Add the malloc flag argument to newnfs_realign().

	Unstaticize newnfs_nfsv3_procid[] so that we can use it in
	the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
	In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
	that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
	In nfsd_fhtovp(), if we're starting a write, check to see
	whether the underlying filesystem supports shared writes.
	If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
	Remove all code that is specific to the NFS server
	implementation.  Anything that is server-specific is now
	accessed through a callback supplied by that server's FHA
	shim in the new softc.

	There are now separate sysctls and tunables for the FHA
	implementations for the old and new NFS servers.  The new
	NFS server has its tunables under vfs.nfsd.fha, the old
	NFS server's tunables are under vfs.nfsrv.fha as before.

	In fha_extract_info(), use callouts for all server-specific
	code.  Getting file handles and offsets is now done in the
	individual server's shim module.

	In fha_hash_entry_choose_thread(), change the way we decide
	whether two reads are in proximity to each other.
	Previously, the calculation was a simple shift operation to
	see whether the offsets were in the same power of 2 bucket.
	The issue was that there would be a bucket (and therefore
	thread) transition, even if the reads were in close
	proximity.  When there is a thread transition, reads wind
	up going somewhat out of order, and ZFS gets confused.

	The new calculation simply tries to see whether the offsets
	are within 1 << bin_shift of each other.  If they are, the
	reads will be sent to the same thread.

	The effect of this change is that for sequential reads, if
	the client doesn't exceed the max_reqs_per_nfsd parameter
	and the bin_shift is set to a reasonable value (22, or
	4MB works well in my tests), the reads in any sequential
	stream will largely be confined to a single thread.

	Change fha_assign() so that it takes a softc argument.  It
	is now called from the individual server's shim code, which
	will pass in the softc.

	Change fhe_stats_sysctl() so that it takes a softc
	parameter.  It is now called from the individual server's
	shim code.  Add the current offset to the list of things
	printed out about each active thread.

	Change the num_reads and num_writes counters in the
	fha_hash_entry structure to 32-bit values, and rename them
	num_rw and num_exclusive, respectively, to reflect their
	changed usage.

	Add an enable sysctl and tunable that allows the user to
	disable the FHA code (when vfs.XXX.fha.enable = 0).  This
	is useful for before/after performance comparisons.

nfs_fha.h:
	Move most structure definitions out of nfs_fha.c and into
	the header file, so that the individual server shims can
	see them.

	Change the default bin_shift to 22 (4MB) instead of 18
	(256K).  Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
	Add shims for the old and new NFS servers to interface with
	the FHA code, and callbacks for the

	The shims contain all of the code and definitions that are
	specific to the NFS servers.

	They setup the server-specific callbacks and set the server
	name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
	Configure the RPC code to call fhaold_assign() instead of
	fha_assign().

sys/modules/nfsd/Makefile:
	Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
	Add nfs_fha_old.c.

Reviewed by:	rmacklem
Sponsored by:	Spectra Logic
MFC after:	2 weeks
2013-04-17 21:00:22 +00:00
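The change to the read proximity test described for fha_hash_entry_choose_thread() in d96b98a360 can be sketched roughly as follows. This is an illustration, not the committed code: the function names same_bucket() and within_window() are hypothetical, and treating "within 1 << bin_shift" as a strict less-than comparison is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Old scheme as described in the message: two offsets are "close" only
 * if they fall into the same power-of-2 bucket.  Offsets on opposite
 * sides of a bucket boundary never match, however near they are.
 */
bool
same_bucket(uint64_t a, uint64_t b, unsigned bin_shift)
{
	assert(bin_shift < 64);
	return ((a >> bin_shift) == (b >> bin_shift));
}

/*
 * New scheme as described: two offsets are "close" if their distance
 * is less than 1 << bin_shift, regardless of bucket boundaries.
 */
bool
within_window(uint64_t a, uint64_t b, unsigned bin_shift)
{
	uint64_t dist;

	assert(bin_shift < 64);
	dist = (a > b) ? a - b : b - a;
	return (dist < (UINT64_C(1) << bin_shift));
}
```

With bin_shift set to 22 (a 4MB window), two reads 8K apart that straddle the 4MB boundary fail the bucket test but pass the window test, which is the thread transition the message says the new calculation avoids.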
Gleb Smirnoff
8f779cc541 On non-ACPI i386 mp_ncpus is initialized at SI_SUB_CPU, and this
prevents us from creating UMA_ZONE_PCPU zones earlier.

As a band-aid, shift initialization of the counter(9) zone to later.

Reviewed by:		kib
Reported & tested by:	Lytochkin Boris <lytboris gmail.com>
2013-04-17 18:43:33 +00:00
Adrian Chadd
d22e3b024e Add the static kernel boot environment, needed to actually boot this thing.
(Wasting 4k just as a temporary placeholder for a boot environment seems
a bit ridiculous, but hey.)

Tested: gxemul:

$ gxemul -e malta -d i:/home/adrian/work/freebsd/svn/mfsroot-rspro.img -C 4Kc /tftpboot/kernel.MALTA
2013-04-17 18:26:01 +00:00
Gabor Kovesdan
a8b5c2a0aa - Correct spelling in comments
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:56:11 +00:00
Gabor Kovesdan
84a17a97ce - Correct misspellings of the word and
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:48:46 +00:00
Gabor Kovesdan
b78540b1c7 - Correct misspellings of the word resource
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2013-04-17 11:47:32 +00:00
Gabor Kovesdan
8fb3bbe770 - Correct misspellings of the word useful
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:45:15 +00:00
Gabor Kovesdan
f0d0985ee9 - Correct misspellings of the word miscellaneous
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:43:46 +00:00
Gabor Kovesdan
a2098fea6d - Correct misspellings of the word necessary
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:42:40 +00:00
Gabor Kovesdan
ab3f6b347e - Correct misspellings of the word occurrence
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:40:10 +00:00
Ivan Voras
7ceaf939d6 Link g_label_disk_ident when building geom_label as a module 2013-04-17 09:19:29 +00:00
Adrian Chadd
91046e9c5f Setup needed tables for TPC on AR5416->AR9287 chips.
* Add ah_ratesArray[] to the ar5416 HAL state - this stores the maximum
  values permissible per rate.
* Since different chip EEPROM formats store this value in a different place,
  store the HT40 power detector increment value in the ar5416 HAL state.
* Modify the target power setup code to store the maximum values in the
  ar5416 HAL state rather than using a local variable.
* Add ar5416RateToRateTable() - to convert a hardware rate code to the
  ratesArray enum / index.
* Add ar5416GetTxRatePower() - which goes through the gymnastics required
  to correctly calculate the target TX power:
  + Add the power detector increment for ht40;
  + Take the power offset into account for AR9280 and later;
  + Offset the TX power correctly when doing open-loop TX power control;
  + Enforce the per-rate maximum value allowable.

Note - setting a TPC value of 0x0 in the TX descriptor on (at least)
the AR9160 resulted in the TX power being very high indeed.  This didn't
happen on the AR9220.  I'm guessing it's a chip bug that was fixed at
some point.  So for now, just assume the AR5416/AR5418 and AR9130 are
also suspect and clamp the minimum value here at 1.

Tested:

* AR5416, AR9160, AR9220 hostap, verified using (2GHz) spectrum analyser
* Looked at target TX power in TX descriptor (using athalq) as well as TX
  power on the spectrum analyser.

TODO:

* The TX descriptor code sets the target TX power to 0 for AR9285 chips.
  I'm not yet sure why.  Disable this for TPC and ensure that the TPC
  TX power is set.
* AR9280, AR9285, AR9227, AR9287 testing!
* 5GHz testing!

Quirks:

* The per-packet TPC code is only exercised when the tpc sysctl is set
  to 1. (dev.ath.X.tpc=1.) This needs to be done before you bring the
  interface up.
* When TPC is enabled, setting the TX power doesn't end up with a call
  through to the HAL to update the maximum TX power.  So ensure that
  you set the TPC sysctl before you bring the interface up and configure
  a lower TX power or the hardware will be clamped by the lower TX
  power (at least until the next channel change.)

Thanks to Qualcomm Atheros for all the hardware, and Sam Leffler for use
of his spectrum analyser to verify the TX channel power.
2013-04-17 07:31:53 +00:00
Adrian Chadd
8b470f6f71 Use the TPC bank by default for AR9160.
Tested:

* AR9160, hostap, verified TX power using (2GHz) spectrum analyser

TODO:

* 5GHz verification!
2013-04-17 07:22:23 +00:00
Adrian Chadd
de00e5cb54 Update the rate series setup code to use the decisions already made in
ath_tx_rate_fill_rcflags().  Include setting up the TX power cap in the
rate scenario setup code being passed to the HAL.

Other things:

* add a tx power cap field in ath_rc.
* Add a three-stream flag in ath_rc.
* Delete the LDPC flag from ath_rc - it's not a per-rate flag, it's a
  global flag for the transmission.
2013-04-17 07:21:30 +00:00
Rui Paulo
d1dcd93145 Print more bits from the standard extended features CPUID which will be
available in the Haswell architecture (c.f. Intel Document #319433-012A).
2013-04-17 06:51:17 +00:00
Neel Natu
5685b37212 Correct misleading bootverbose output: ahc_isa_probe -> ahc_isa_identify 2013-04-17 02:33:56 +00:00
Pedro F. Giffuni
03836978be DTrace: Revert r249367
The following change brought in from illumos caused DTrace to
pause in an interactive environment:

3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider

This was not detected during testing because it doesn't
affect scripts.

We shouldn't be changing the environment, especially since the
LD_NOLAZYLOAD option doesn't apply to our (GNU) ld.
Unfortunately the change from upstream was made in such a way
that it is very difficult to separate this change from the
others so, at least for now, it's better to just revert
everything.

Reference:
https://www.illumos.org/issues/3026

Reported by:	Navdeep Parhar and Mark Johnston
2013-04-17 02:20:17 +00:00
Ivan Voras
8e9405e8a7 Comment typo fix.
Is aware of the importance of comments: dim
2013-04-16 22:42:40 +00:00
Warner Losh
11c601447d r249408 and r249436 cause a NULL pointer dereference on the CUBIEBOARD
since it doesn't set the kernel environment at all. Work around this
by making sure kern_envp is non-NULL before dereferencing it.
2013-04-16 22:09:08 +00:00
Adrian Chadd
12087a0769 Use the new net80211 method to fetch the node TX power, rather than
directly referencing ni->ni_txpower.

This provides the hardware with a slightly more accurate idea of
the maximum TX power to be using.

This is part of a series to get per-packet TPC to work (better).

Tested:

* AR5416, hostap mode
2013-04-16 21:26:44 +00:00
Adrian Chadd
ebe15a7b25 Implement a utility function to return the current TX power cap for
the given node.

This takes into account the per-node cap, the ic cap and the
per-channel regulatory caps.

This is designed to replace references to ni_txpower in various net80211
drivers - ni_txpower doesn't necessarily reflect the actual cap for
the given node (eg if the node has the default value of 50dBm (100) and
the administrator has manually configured a lower TX power.)
2013-04-16 20:36:32 +00:00
John Baldwin
8916af883c - Document that sem_wait() can fail with EINTR if it is interrupted by a
signal.
- Fix the old ksem implementation for POSIX semaphores to not restart
  sem_wait() or sem_timedwait() if interrupted by a signal.

MFC after:	1 week
2013-04-16 20:26:31 +00:00
Adrian Chadd
5d4dedadb6 Use a per-RX-queue deferred list, rather than a single deferred list for
both queues.

Since ath_rx_pkt() does multi-mbuf frame recombining based on the RX queue,
this needs to occur.

Tested:

* AR9380 (XB112), hostap mode
2013-04-16 20:21:02 +00:00
Ivan Voras
9a796b22f6 Fix the buffer-overflow-fixing fixes.
Pointy-hat to: me, for not realizing snprintf() is available in kernel.
Thanks to: jh, for bringing me the good news of snprintf(), Pawel Worach, for
           noting that the panic can be provoked in i386 and not in amd64
2013-04-16 19:58:24 +00:00
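For readers unfamiliar with why snprintf() matters in 9a796b22f6, here is a minimal userland sketch of bounded formatting with truncation detection. The helper name format_label() and the "prefix/ident" layout are hypothetical and are not taken from the commit; the sketch only relies on the standard snprintf() contract.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "prefix/ident" in a fixed-size buffer.
 * snprintf() never writes more than dstlen bytes (including the NUL)
 * and returns the length the full string would have had, so a return
 * value >= dstlen signals truncation instead of an overflow.
 */
bool
format_label(char *dst, size_t dstlen, const char *prefix, const char *ident)
{
	int n;

	n = snprintf(dst, dstlen, "%s/%s", prefix, ident);
	return (n >= 0 && (size_t)n < dstlen);
}
```

When the result does not fit, the buffer still holds a NUL-terminated prefix of the string, and the caller can decide whether to reject the label or use the truncated form.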
Xin LI
f2297451fe Fix incomplete printf.
PR:		kern/177889
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:32:12 +00:00
Xin LI
c1031303f0 Don't leak lock when returning.
PR:		kern/177888
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:25:41 +00:00
Mikolaj Golub
f1fca82ed5 Add a new set of notes to a process core dump to store procstat data.
The notes format is a header of sizeof(int), which stores the size of
the corresponding data structure to provide some versioning, and data
in the format as it is returned by a related sysctl call.

The userland tools (procstat(1)) will be taught to extract this data,
providing additional info for postmortem analysis.

PR:		kern/173723
Suggested by:	jhb
Discussed with:	jhb, kib
Reviewed by:	jhb (initial version), kib
MFC after:	1 month
2013-04-16 19:19:14 +00:00
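The note layout described in f1fca82ed5 (an int-sized header holding the size of the data structure, followed by the data) might be pictured with the following sketch. It is an assumption-laden illustration: note_pack() and its parameters are hypothetical names, and the real kernel code writes through sbuf rather than a flat buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: lay out one note payload as
 *   [ int structsize ][ datalen bytes of records ]
 * Returns the number of bytes written, or 0 if the buffer is too small.
 * A reader can compare structsize with its own sizeof() to detect a
 * layout it does not understand, which is the versioning the message
 * mentions.
 */
size_t
note_pack(void *buf, size_t buflen, int structsize, const void *data,
    size_t datalen)
{
	if (buflen < sizeof(int) || datalen > buflen - sizeof(int))
		return (0);
	memcpy(buf, &structsize, sizeof(int));
	memcpy((char *)buf + sizeof(int), data, datalen);
	return (sizeof(int) + datalen);
}
```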
Brooks Davis
b7b63db789 Partial MFP4 of 222836:
Only look for FDT partitions if our potential parent is a DISK device.

Excluding direct recursion on the flashmap geoms was insufficient
because it did not prevent the underlying device from being retrieved if
flashmap geoms were further partitioned.

Reviewed by:	imp
Sponsored by:	DARPA, AFRL
2013-04-16 17:47:13 +00:00
Tijl Coosemans
6c81895dab Fix build after r249543. 2013-04-16 16:59:29 +00:00
Warner Losh
dd65664bbc Point to regdef.h. May need to dig up references to the N32 standard
that support this usage (which may be a bit rough, since different
parts of the standard say mutually contradictory things).
2013-04-16 16:54:37 +00:00
Rick Macklem
64fa8df6e0 Allow the vnode to be unlocked for the weird case of
LK_EXCLOTHER. LK_EXCLOTHER is only used to acquire a
usecount on a vnode during NFSv4 recovery from an
expired lease.

Reported and tested by:	pho
MFC after:	2 weeks
2013-04-16 14:22:16 +00:00
Andrey V. Elsukov
4ff7c740fe Fix accounting after r249528; also add several other counters to
the statistics.
2013-04-16 11:31:26 +00:00
Andrey V. Elsukov
eca4d72003 Use IP6S_M2MMAX macro. 2013-04-16 11:19:13 +00:00
Andrey V. Elsukov
43851aae9a Replace hardcoded numbers. 2013-04-16 11:12:58 +00:00
Konstantin Belousov
44d95698ba Some compilers issue a warning when a wider integer is cast to a
narrower pointer.  Supposedly silence the warning by casting through
uintptr_t.

Reported by:	ian
2013-04-16 07:11:52 +00:00
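The uintptr_t technique mentioned in 44d95698ba looks roughly like this in isolation. The helper names u64_to_ptr() and ptr_to_u64() are hypothetical; the commit does not show which expression was changed.

```c
#include <stdint.h>

/*
 * Casting a 64-bit integer directly to a pointer can warn on targets
 * where pointers are narrower than 64 bits.  Going through uintptr_t
 * makes the narrowing step an explicit integer conversion, and the
 * final cast is then between an integer and a pointer of equal width.
 */
void *
u64_to_ptr(uint64_t value)
{
	return ((void *)(uintptr_t)value);
}

/* The reverse direction widens through uintptr_t in the same way. */
uint64_t
ptr_to_u64(const void *p)
{
	return ((uint64_t)(uintptr_t)p);
}
```

Note that on a 32-bit target the upper bits of the integer are still discarded; the intermediate cast only documents that the truncation is intended.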
Andrey V. Elsukov
e7a87117d3 The source address selection algorithm tries to apply several rules
to the set of IPv6 addresses. Currently each attempt goes into the IPv6
statistics, even if the given rule did not win. Change this and take into
account only those rules that won. Also add accounting for cases when the
algorithm fails to select an address.
2013-04-15 21:02:40 +00:00
Warner Losh
e4be3d3ddd Fix N32/N64 register saving by ensuring that all registers resolve
to unique values.

There's some confusion about what the n32 assembler API really is
(since on page 9 of the spec they say that t0-t3 don't exist, then
turn around on page 22 and say that t4-t7 don't exist), and this
doesn't touch that.

NetBSD's version of this file follows the convention I used here, and
is likely to be correct.

This should fix gdb/ptrace.
2013-04-15 19:32:14 +00:00
Adrian Chadd
978c5ce568 Now that the register definitions are in -HEAD, enable this. 2013-04-15 17:59:06 +00:00
Adrian Chadd
a04110a3b6 Bring over some AR9271 register definitions from the QCA HAL.
Obtained from:	Qualcomm Atheros
2013-04-15 17:58:11 +00:00
George V. Neville-Neil
8f2ba63493 Point args[0] not at the thread that is ending but at the one that
is starting.  This is in line with practice in OpenSolaris.

Note that this change is only in ULE and not in the 4BSD scheduler.
Once this change settles in (MFC timeout has expired) we'll try it out
on 4BSD as well.

PR:		177706
Submitted by:	Tiwei Bie
MFC after:	1 month
2013-04-15 17:21:02 +00:00
Jack F Vogel
386c110e3c Corrections to the RX checksum code, make sure it's disabled as
well as enabled when necessary. And simplify the checksum routine
itself, adding UDP bit to the test. Thanks to Kevin Lo for pointing
out the problems and code suggestions.
2013-04-15 17:01:42 +00:00
Ivan Voras
c072011223 Introduce glabel labels based on GEOM ident attributes. In this initial
implementation, err on the side of conservatism and only create labels
for GEOMs of classes DISK and MULTIPATH.

Discussed with:	trasz
Approved by:	silence from freebsd-geom@
2013-04-15 16:09:24 +00:00
Ivan Voras
252c094e53 Introduce a symbol for the GEOM class name instead of using the ad-hoc string
constant.
2013-04-15 15:55:40 +00:00
Gleb Smirnoff
b64478a137 Switch lagg(4) statistics to counter(9).
The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.

Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by:	Nginx, Inc.
2013-04-15 13:00:42 +00:00
Luigi Rizzo
aa76317cfc fix a bug in the computation of the userspace offset for a given netmap buffer.
Submitted by: Hugh Nhan
2013-04-15 11:49:16 +00:00
Alan Cox
a08f2cf69e Although we perform path compression to reduce the height of the trie and
the number of interior nodes, we have previously created a level zero
interior node at the root of every non-empty trie, even when that node is
not strictly necessary, i.e., it has only one child.  This change is the
second (and final) step in eliminating those unnecessary level zero interior
nodes.  Specifically, it updates the deletion and insertion functions so
that they do not require a level zero interior node at the root of the trie.
For a "buildworld" workload, this change results in a 16.8% reduction in the
number of interior nodes allocated and a similar reduction in the average
execution time for lookup functions.  For example, the average execution
time for a call to vm_radix_lookup_ge() is reduced by 22.9%.

Reviewed by:	attilio, jeff (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-04-15 06:12:00 +00:00
Mikolaj Golub
5ea21e6904 Similarly to proc_getargv() and proc_getenvv(), export proc_getauxv()
to be able to reuse the code.

MFC after:	3 weeks
2013-04-14 20:03:48 +00:00
Mikolaj Golub
fe52cf5475 Re-factor the code to provide kern_proc_filedesc_out(), kern_proc_out(),
and kern_proc_vmmap_out() functions to output process kinfo structures
to sbuf, to make the code reusable.

The functions are going to be used in the coredump routine to store
procstat info in the core program header notes.

Reviewed by:	kib
MFC after:	3 weeks
2013-04-14 20:01:36 +00:00
Mikolaj Golub
bd3902134c Re-factor coredump routines. For each type of notes an output
function is provided, which is used either to calculate the note size
or output it to sbuf.  On the first pass the notes are registered in a
list and the resulting size is found, on the second pass the list is
traversed outputting notes to sbuf.  For the sbuf a drain routine is
provided that writes data to a core file.

The main goal of the change is to make coredump to write notes
directly to the core file, without preliminary preparing them all in a
memory buffer.  Storing notes in memory is not a problem for the
current, rather small, set of notes we write to the core, but it may
become an issue when we start to store procstat notes.

Reviewed by:	jhb (initial version), kib
Discussed with:	jhb, kib
MFC after:	3 weeks
2013-04-14 19:59:38 +00:00
Warner Losh
8fa3a54014 Print MB and GB instead of MiB and GiB mislabeled as MB and GB.
SD cards are sold in GB not GiB, this will result in less confusion.
Also, cache parent device pointer to save a few cycles for loops.
2013-04-14 19:21:43 +00:00
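The difference between the two units in 8fa3a54014 is plain arithmetic, sketched below. The function names are hypothetical, and the 16,000,000,000-byte capacity used in the usage note is a round illustrative figure rather than the size of any particular card.

```c
#include <stdint.h>

/* Decimal megabytes: 1 MB = 1000 * 1000 bytes (how cards are marketed). */
uint64_t
bytes_to_mb(uint64_t bytes)
{
	return (bytes / (UINT64_C(1000) * 1000));
}

/* Binary mebibytes: 1 MiB = 1024 * 1024 bytes. */
uint64_t
bytes_to_mib(uint64_t bytes)
{
	return (bytes / (UINT64_C(1024) * 1024));
}
```

For a capacity of 16,000,000,000 bytes, bytes_to_mb() gives 16000 while bytes_to_mib() gives 15258, so printing the second figure with an "MB" label makes the card look smaller than advertised.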
Alexander Motin
1268d4813e Remove some more pieces of multilevel freeze mechanism, missed in r249466. 2013-04-14 18:09:08 +00:00
Mateusz Guzik
db8f33fd32 Add fdallocn function and use it when passing fds over unix socket.
This gets rid of "unp_externalize fdalloc failed" panic.

Reviewed by:	pjd
MFC after:	1 week
2013-04-14 17:08:34 +00:00
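
The commit above does not show how fdallocn works. As a rough illustration of why reserving several descriptors in one step avoids a failure halfway through, here is a userland sketch that assumes all-or-nothing semantics. The table layout and the names fd_table and alloc_n are hypothetical and do not reflect the kernel implementation.

```c
#include <stddef.h>

#define TABLE_SIZE 8	/* hypothetical, deliberately tiny table */

struct fd_table {
	unsigned char used[TABLE_SIZE];
};

/*
 * Reserve n free slots and store their indices in out[], which must have
 * room for n entries.  Returns 0 on success.  Returns -1 if fewer than n
 * slots are free, in which case the table is left unchanged.
 */
static int
alloc_n(struct fd_table *t, int *out, size_t n)
{
	size_t found = 0;
	int i;

	/* Pass 1: collect candidates without modifying the table. */
	for (i = 0; i < TABLE_SIZE && found < n; i++)
		if (!t->used[i])
			out[found++] = i;
	if (found < n)
		return (-1);

	/* Pass 2: commit all reservations at once. */
	for (found = 0; found < n; found++)
		t->used[out[found]] = 1;
	return (0);
}
```

With this shape, a caller that needs several descriptors either gets all of them or gets an error before any state has changed, so there is no partially completed transfer to unwind.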
Konstantin Belousov
5c818b67a4 Ensure that the PCI bus BUS_GET_DMA_TAG() method sees the actual PCI
device which makes the request for dma tag, instead of some descendant
of the PCI device, by creating a pass-through trampoline for vga_pci
and ata_pci buses.

Sponsored by:	The FreeBSD Foundation
Suggested by:	jhb
Discussed with:	jhb, mav
MFC after:	1 week
2013-04-14 14:02:34 +00:00
Alexander Motin
d442caf633 Remove owner field from struct cam_ed, unused at least since FreeBSD 7. 2013-04-14 10:14:26 +00:00
Alexander Motin
e5dfa058da MFprojects/camlock r248982:
Stop abusing xpt_periph in random places that really have no periph related
to CCB, for example, bus scanning.  NULL value is fine in such cases and it
is correctly logged in debug messages as "noperiph".  If at some point we
need some real XPT periphs (like pmpX now), quite likely they will be
per-bus, and not a single global instance as xpt_periph now.
2013-04-14 09:55:48 +00:00
Alexander Motin
cccf422080 MFprojects/camlock r248890, r248897, r248898, r248900, r248903, r248905,
r248917, r248918, r248978, r249001, r249014, r249030:

Remove multilevel freezing mechanism, implemented to handle specifics of
the ATA/SATA error recovery, when post-reset recovery commands should be
allocated when queues are already full of payload requests.  Instead of
removing frozen CCBs with specified range of priorities from the queue
to provide free openings, use a simple hack, allowing explicit CCB
over-allocation for requests with priority higher (numerically lower) than
the CAM_PRIORITY_OOB threshold.

Simplify CCB allocation logic by removing SIM-level allocation queue.
After that SIM-level queue manages only CCBs execution, while allocation
logic is localized within each single device.

Suggested by:	gibbs
2013-04-14 09:28:14 +00:00
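
The over-allocation rule described in the commit above can be sketched as a single predicate. This is an illustration only: the struct, the function name, and the numeric threshold are hypothetical, and only the idea of a priority threshold below which requests may exceed the normal limit comes from the commit message.

```c
/* Hypothetical stand-in for the CAM_PRIORITY_OOB threshold. */
#define PRIORITY_OOB	4

/* Hypothetical per-device accounting. */
struct dev_queue {
	int openings;	/* normal limit on outstanding requests */
	int allocated;	/* requests currently allocated */
};

/*
 * Decide whether a request with the given priority may be allocated.
 * Numerically lower values mean higher priority.  Requests above the
 * threshold in urgency may be allocated even when the queue is full.
 */
static int
ccb_alloc_allowed(const struct dev_queue *q, int prio)
{
	return (q->allocated < q->openings || prio < PRIORITY_OOB);
}
```

Payload requests are refused once the queue is full, while recovery commands with a priority value below the threshold still go through, which is the behavior the multilevel freeze used to provide by evicting queued requests.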
Hiren Panchasara
0fcdd1b528 Fixing a clang warning indicating uninitialized variable usage.
PR:	kern/177164
Approved by:	sbruno (mentor)
2013-04-14 02:42:40 +00:00
Hiren Panchasara
c851725506 Improve/correct a comment. We now support a lot more cpu types.
PR:	kern/177496
Approved by:	sbruno (mentor)
2013-04-14 02:26:12 +00:00
Neel Natu
3565b59ec0 Create sysctl node 'hw.vmm.vmx' and populate it with oids that expose the VMX
hardware capabilities.

Obtained from:	NetApp
2013-04-13 21:41:51 +00:00
Dimitry Andric
27e644a80b Fix undefined behaviour in several gpio_pin_setflags() routines (under
sys/arm and sys/mips), squelching the clang 3.3 warnings about this.

Noticed by:	tinderbox and many irate spectators
Submitted by:	Luiz Otavio O Souza <loos.br@gmail.com>
PR:		kern/177759
MFC after:	3 days
2013-04-13 21:21:13 +00:00
John-Mark Gurney
d7078f3ba0 Move the error report to a lower log level.  Now you can see when it
returns an error without getting every single I/O that went through it.

MFC after:	1 week
2013-04-13 19:02:58 +00:00
Konstantin Belousov
fcb29b9210 Fix the name of the pcb member in the comments.
Submitted by:	Oliver Pinter <oliver.pntr@gmail.com>
MFC after:	3 days
2013-04-13 15:20:33 +00:00
Alexander Motin
a4f17f083f MFprojects/camlock r248894:
Use full freeze while PMP does hard reset. This is only a cosmetic change.
2013-04-13 14:03:44 +00:00
Jayachandran C.
f46206c270 Fix changes made in r249408.
In some cases, kern_envp is set by the architecture code and env_pos does
not contain the length of the static kernel environment. In these cases
r249408 causes the kernel to discard the environment.

Fix this by updating the check for empty static env to *kern_envp != '\0'

Reported by:	np@
2013-04-13 07:23:37 +00:00
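
The emptiness check named in the commit above, *kern_envp != '\0', can be illustrated in userland. The sketch assumes the static environment is a sequence of NUL-terminated "name=value" strings ended by an empty string; that layout and both function names are assumptions for illustration, not taken from the commit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumed layout: "a=1\0b=2\0\0", i.e. NUL-terminated strings followed
 * by an empty string.  Function names are hypothetical.
 */

/* An environment is empty when its first string is empty. */
static int
static_env_is_empty(const char *envp)
{
	return (envp == NULL || *envp == '\0');
}

/* Count entries by walking string by string until the empty terminator. */
static size_t
static_env_count(const char *envp)
{
	size_t n = 0;

	if (envp == NULL)
		return (0);
	while (*envp != '\0') {
		envp += strlen(envp) + 1;
		n++;
	}
	return (n);
}
```

Testing the first byte does not depend on a separately maintained length, which matters in the case the commit describes, where the architecture code sets the pointer but the length counter is not updated.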
Neel Natu
26d66b9d58 Use the MAKEDEV_CHECKNAME flag to check for an invalid device name and return
an error instead of panicking.

Obtained from:	NetApp
2013-04-13 05:11:21 +00:00
Jung-uk Kim
933c7bc907 Unbreak tinderbox build after r249420. 2013-04-12 23:10:56 +00:00
Ryan Stone
c83c365008 Cosmetic change: make a comment reference Sandy Bridge *Xeon*
Reviewed by:	sbruno
MFC after:	1 week
2013-04-12 20:43:14 +00:00
Alan Cox
6f9c0b15bb Although we perform path compression to reduce the height of the trie and
the number of interior nodes, we always create a level zero interior node at
the root of every non-empty trie, even when that node is not strictly
necessary, i.e., it has only one child.  This change is the first step in
eliminating those unnecessary level zero interior nodes.  Specifically, it
updates all of the lookup functions so that they do not require a level zero
interior node at the root.

Reviewed by:	attilio, jeff (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-04-12 20:21:28 +00:00
Jim Harris
5076698e19 Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace
them with the NVMe passthrough equivalent.

Sponsored by:	Intel
2013-04-12 17:56:47 +00:00
Jim Harris
7c3f19d7bb Add support for passthrough NVMe commands.
This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.

Sponsored by:	Intel
2013-04-12 17:52:17 +00:00
Jim Harris
ca269f32ef Move the busdma mapping functions to nvme_qpair.c.
This removes nvme_uio.c completely.

Sponsored by:	Intel
2013-04-12 17:48:45 +00:00
Jim Harris
611060cab5 Remove the NVMe-specific physio and associated routines.
These were added early on for benchmarking purposes to avoid the mapped I/O
penalties incurred in kern_physio.  Now that FreeBSD (including kern_physio)
supports unmapped I/O, the need for these NVMe-specific routines no longer exists.

Sponsored by:	Intel
2013-04-12 17:44:55 +00:00
Jim Harris
97fafe2580 Add a mutex to each namespace, for general locking operations on the namespace.
Sponsored by:	Intel
2013-04-12 17:41:24 +00:00
Jim Harris
a90b810492 Rename the controller's fail_req_lock, so that it can be used for other
locking operations on the controller.

Sponsored by:	Intel
2013-04-12 17:36:48 +00:00