Commit Graph

97139 Commits

Author SHA1 Message Date
smh
20b03102b8 Updated TRIM calculations in cam/ata to be based off ATA_DSM_* defines
Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 15:59:19 +00:00
smh
7cdb7369d5 Added the ability to send ATA identify and Data Set Management (DSM) TRIM
commands to an ATA device attached via a SCSI control.

sys/cam/scsi/scsi_all.c:
        - Added scsi_ata_identify, scsi_ata_trim
          Which use ATA Pass-Through to send commands to the attached disk.

sys/cam/scsi/scsi_all.h:
        - Added defines for all missing ATA Pass-Through commands values.

        - Added scsi_ata_identify, scsi_ata_trim methods used in ATA TRIM
          support.

        - Added scsi_vpd_logical_block_prov structure used when querying for
          the supported sizes UNMAP commands.

        - Added scsi_vpd_block_limits structure used when querying for the
          supported sizes of the UNMAP command.

Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 15:53:22 +00:00
smh
f3c66bac10 Added Dataset Management defines to be used by TRIM in cam ata and scsi to
calculate the size of blocks.

Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 15:46:09 +00:00
smh
5e3cfe527c Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximum
size of a delete request sent to the providing device performed by g_dev_ioctl.

This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA
deletes which siginificantly improves performance.

Previously this was hard coded to 65536 sectors, the new default is 262144
which doubles the throughput of deletes on commonly available SSD's.

In tests on a Intel 520 120GB FW: 400i disk it improved the delete throughput
from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via
newfs -E

For some SSD's where delete time is pretty much constant, no matter what
the request, setting this to 0 will provide significantly better throughput
e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s

Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 15:43:24 +00:00
smh
bb465a0ea2 Removed unneeded tests in dadeletemethodset changing it to return void
Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 15:31:52 +00:00
glebius
9efad16075 Add usie to LINT. 2013-04-26 13:03:22 +00:00
glebius
b4bc270e8f Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
smh
61d529d085 Changed ZFS TRIM sysctl from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled
Enabled ZFS TRIM by default

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-26 11:24:20 +00:00
imp
d75579076e Octeon 2 (6xxx) and newer CPUs don't use the clock CPU speed for its
I/O clock. Thankfully, the simple executive provies a way to querry
the proper clock that works on all models. Move to asking for the SCLK
via this interface.

This gets the serial console working after we start init and open the
console and set the divisor (which turned the output from good to
bad). I can login on the console now.
2013-04-26 05:42:35 +00:00
jhibbits
94794a6904 Remove a comment that shouldn't have gone in.
X-MFC-with:	r249864
2013-04-26 05:18:18 +00:00
sbruno
508d67d2f7 In the case where the controller supports an sg_list LESS than our predefined
and tuned value, we would advertise the unsupported value to CAM and it would
merrily destroy the controller with way too many IO operations.

This manifests itself in a Zero Memory RAID configuration for a P410 and
possibly other controllers.

Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-04-25 23:10:34 +00:00
glebius
8b3c8983bc Fix couple of mbuf leaks in incoming ARP processing. 2013-04-25 17:38:04 +00:00
imp
c1ef354a95 Minor whitespace nit 2013-04-25 17:27:13 +00:00
imp
a3c707ad59 Use the offsets from pcb.h rather than regnum.h to store the registers
in the pcb. setjmp/longjmp in the kernel also used these values, so
continue to use them although their use isn't technically the pcb
register array (matching is all that's important for setjmp/longjmp in
the kernel). Finally, eliminate the old register names from regnum.h.

This is a lexical change only. The non-debug .o files have the same md5.
2013-04-25 17:23:54 +00:00
smh
2f83e51ae7 Adds Host Protected Area (HPA) support for ATA disks to camcontrol
Reviewed by:	mav
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-04-25 14:11:38 +00:00
glebius
dd30d87e8d Introduce a pointer to const variable gw, which points either at the
same place as dst, or to the sockaddr in the routing table.

The const constraint of gw makes us safe from modifing routing table
accidentially. And "onstantness" of dst allows us to remove several
bandaids, when we switched it back at &ro->ro_dst, now it always
points there.

Reviewed by:	rrs
2013-04-25 12:42:09 +00:00
imp
26718d463f Make it possible to include this file in assembler .S sources. 2013-04-25 06:29:23 +00:00
imp
c75545d741 Use the defines from pcb.h over the ones from regnum.h for this 'C'
code. In theory, the ones from regnum.h should be used only for
assembler code.
2013-04-25 06:28:19 +00:00
grehan
d46f135643 Add RIP-relative addressing to the instruction decoder.
Rework the guest register fetch code to allow the RIP to
be extracted from the VMCS while the kernel decoder is
functioning.

Hit by the OpenBSD local-apic code.

Submitted by:	neel
Reviewed by:	grehan
Obtained from:	NetApp
2013-04-25 04:56:43 +00:00
jhibbits
58ae34c700 Introduce kernel coredumps to ppc32 AIM. Leeched from the booke code.
MFC after:	2 weeks
2013-04-25 00:39:43 +00:00
mm
e2d108deac MFV r249857:
Merge vendor bugfix for a possible deadlock related to async destroy
and improve write performance by introducing a new lock protecting
tx_open_txg.

Illumos ZFS issues:
  3642 dsl_scan_active() should not issue I/O to determine if async
       destroying is active
  3643 txg_delay should not hold the tc_lock

MFC after:	1 week
2013-04-24 21:21:03 +00:00
mav
df40a0e7ef Move hptmv and mpt drivers shutdown a bit later to the SHUTDOWN_PRI_LAST
stage of shutdown_post_sync.  That should allow CAM to do final cache flush
at the SHUTDOWN_PRI_DEFAULT without using polling magic.

MFC after:	3 days
2013-04-24 19:00:45 +00:00
rrs
d90354b019 This fixes the issue with the "randomly changing" default
route. What it was is there are two places in ip_output.c
where we do a goto again. One place was fine, it
copies out the new address and then resets dst = ro->rt_dst;
But the other place does *not* do that, which means earlier
when we found the gateway, we have dst pointing there
aka dst = ro->rt_gateway is done.. then we do a
goto again.. bam now we clobber the default route.

The fix is just to move the again so we are always
doing dst = &ro->rt_dst; in the again loop.

PR:	 174749,157796
MFC after:	1 week
2013-04-24 18:30:32 +00:00
imp
4094a2fb73 Fix N32/N64 ABIs to use proper registers after recent changes.
Pointy Hat to: imp
2013-04-24 18:00:28 +00:00
dim
b2f606714a When rebooting (exiting) from the BTX loader, make sure to restore the
GDT from the correct segment, otherwise a triple fault would be caused.
In some virtual environments (VMware, VirtualBox, etc) this could lead
to a unhandled error or hang in the guest emulation software.

Thanks to avg and jhb for a few hints in the right direction.

Noticed by:	Jeremy Chadwick <jdc@koitsu.org> (and many others)
MFC after:	1 week
2013-04-24 17:20:45 +00:00
hselasky
0d3ea3b043 Fix for duplicate sample rate detection after recent patches. 2013-04-24 16:52:03 +00:00
hselasky
956702e671 Fix the USB audio feedback endpoint algorithm. There should not
be any need to bias the returned value.

Reported by:	Craig Leres <leres@ee.lbl.gov>
2013-04-24 16:22:53 +00:00
andre
87ef00d48c Base the calculation of maxmbufmem in part on kmem_map size
instead of kernel_map size to prevent kernel memory exhaustion
by mbufs and a subsequent panic on physical page allocation
failure.

On architectures without a direct map all mbuf memory (except
for jumbo mbufs larger than PAGE_SIZE) comes from kmem_map.
It is the limiting factor hence.

For architectures with a direct map using the size of kmem_map
is a good proxy of available kernel memory as well.  If it is
much smaller the mbuf limit may be sub-optimal but remains
reasonable, while avoiding panics under exhaustion.

The overall mbuf memory limit calculation may be reconsidered
again later, however due to the many different mbuf sizes and
different backing KVM maps it is a tricky subject.

Found by:	pho's new network stress test
Pointed out by:	alc (kmem_map instead of kernel_map)
Tested by:	pho
2013-04-24 13:54:55 +00:00
ae
28d7b5b903 Remove unused variable.
MFC after:	1 week
2013-04-24 10:24:01 +00:00
hselasky
f1df77425c Fix playback for Focusrite Scarlett 2i2 USB recording interface.
Submitted by:	Ed Maste, emaste @
2013-04-24 06:05:33 +00:00
rpaulo
c1d3a15bba Handle the IRQ for the reset button. 2013-04-24 01:36:35 +00:00
rpaulo
b6adde6e03 wiigpio depends on options WII. 2013-04-24 01:20:10 +00:00
jkim
0f4836399d Fix white spaces. 2013-04-23 18:30:33 +00:00
sbruno
f92131578a Return a lun count of 1 and a lun id of 0 when CAM attempts a REPORT_LUNS
command on a disk device.  This quieseces some noise on the console that
recently appeared.

Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-04-23 18:29:51 +00:00
eadler
b95fa82671 Revert r249800 as
- it is incorrect:  In the 'back' case you want to reuse the previous
	 mbuf.
	- it was not reviewed by wireless@

Requested by:	jhb, adrian
2013-04-23 16:33:25 +00:00
andre
1f1bd98319 When doing RFC3042 limited transmit on the first on second
duplicate ACK make sure we actually have new data to send.
This prevents us from sending unneccessary pure ACKs.

Reported by:	Matt Miller <matt@matthewjmiller.net>
Tested by:	Matt Miller <matt@matthewjmiller.net>
MFC after:	2 weeks
2013-04-23 14:06:32 +00:00
eadler
84dd0c66c9 Add support for Intel C600/X79 Series Chipset KT Controller.
PR:		kern/177072
Submitted by:	Kurt Lidl <lidl@pix.net>
2013-04-23 13:03:08 +00:00
eadler
377a0c2118 Avoid warning about uninitalized variable
PR:		kern/176712
Submitted by:	Hiren Panchasara <hiren.panchasara@gmail.com> (earlier vesion)
Approved by:	cperciva (mentor)
2013-04-23 13:02:57 +00:00
eadler
508d56a8d7 Remove always-true conditions from if statement.
PR:		kern/176712
Submitted by:	Hiren Panchasara <hiren.panchasara@gmail.com>
Approved by:	cperciva (mentor)
2013-04-23 13:02:55 +00:00
eadler
5d0f1a63d8 Make temp, temp1 the same type that they will later be used for.
PR:		kern/176712
Submitted by:	Hiren Panchasara <hiren.panchasara@gmail.com>
Reviewed by:	jmg (earlier version)
Approved by:	cperciva (mentor)
2013-04-23 13:02:51 +00:00
eadler
ea8f9acbe0 Remove tautological compare.
PR:		kern/176712
Submitted by:	Hiren Panchasara <hiren.panchasara@gmail.com>
Approved by:	cperciva (mentor)
2013-04-23 13:02:48 +00:00
hselasky
75e877a897 Add support for runtime switching of sample rate for
USB audio devices. Previously the highest sample
rate was unconditionally selected.

Requested by:	Craig Leres <leres@ee.lbl.gov>
2013-04-23 10:48:14 +00:00
hselasky
17ea3ba57c Add convenience wrapper functions to run callbacks in the context of the
USB explore thread.
2013-04-23 10:42:15 +00:00
imp
59d26c37a6 Add an option for the GE FES based packet engines. Its board IDs
overlap with the standard ones, so kernels for this family of boards
need the option OCTEON_VENDOR_GEFES.
2013-04-23 09:40:42 +00:00
imp
b6d3001ad9 Update trapframe to be consistent with the changes made to regnum.h. This
should fix the booting problems people have been seeing.
2013-04-23 09:38:18 +00:00
mm
e05adac25b The zfs synctask code restructuring introduced a new bug that makes it
impossible to set quota and reservation on pools lower than version 22.
Problem has been reported and a solution discussed with vendor.

Illumos ZFS issues:
  3739 cannot set zfs quota or reservation on pool version < 22

Reviewed by:	Matthew Ahrens <mahrens@delphix.com>
Reported by:	Steve Wills <swills@FreeBSD.org>
MFC after:	3 days
2013-04-23 06:28:35 +00:00
hselasky
f9d1400e69 Add descriptive comment. 2013-04-23 06:26:54 +00:00
brooks
9ffb368a3b MFP4 223084, 227821:
Partially implement generic_bs_*_8() for MIPS platforms.

This is known to work with TARGET_ARCH=mips64 with FreeBSD/BERI.
Assuming that other definitions in cpufunc.h are correct it will
work on non-o64 ABI systems except sibyte. On sibyte and o32 systems
generic_bs_*_8() will remain panic() implementations.

Sponsored by:	DARPA, AFRL
Reviewed by:	imp, jmallett (older versions)
2013-04-22 19:02:37 +00:00
adrian
59d305acc5 Update arswitch to the new API. 2013-04-22 18:58:12 +00:00
gonzo
6460865378 Split BeagleBone DTS to generic AM335x part and Beagle-bone specific 2013-04-22 18:53:36 +00:00
jhb
d02e1b3646 - Some BIOSes use an Extended IRQ resource descriptor in _PRS for a link
that uses non-ISA IRQs but use a plain IRQ resource in _CRS.  However,
  a non-ISA IRQ can't fit into a plain IRQ resource.  If we encounter a
  link like this, build the resource buffer from _PRS instead of _CRS.
- Set the correct size of the end tag in a resource buffer.

Tested by:	Benjamin Lee <ben@b1c1l1.com>
MFC after:	2 weeks
2013-04-22 15:51:06 +00:00
nyan
3a6405f672 Build uart_dev_lpc.c on arm only. This fixes pc98 build. 2013-04-22 13:02:41 +00:00
glebius
18dd370b59 Panic if UMA_ZONE_PCPU is created at early stages of boot, when mp_ncpus
isn't yet initialized. Otherwise we will panic at first allocation later.

Sponsored by:	Nginx, Inc.
2013-04-22 09:02:23 +00:00
dmarion
3594d490ac Initialize GIC_PMRR register on ARM GIC.
Provided by: Thomas Skibo
2013-04-22 08:28:53 +00:00
adrian
38a1025029 Convert over the etherswitch framework to use VLAN IDs per port, rather
than VLAN groups.

Some chips (eg this rtl8366rb) has a VLAN group per port - you first
define a set of VLANs in a vlan group, then you assign a VLAN group
to a port.

Other chips (eg the AR8xxx switch chips) have a VLAN ID array per
port - there's no group per se, just a list of vlans that can be
configured.

So for now, the switch API will use the latter and rely on drivers
doing the heavy lifting if one wishes to use the VLAN group method.
Maybe later on both can be supported.

PR:		kern/177878
PR:		kern/177873
Submitted by:	Luiz Otavio O Souza <loos.br@gmail.com>
Reviewed by:	ray
2013-04-22 05:52:18 +00:00
alc
78339bf7f3 Simplify vm_radix_{add,dec}lev().
Sponsored by:	EMC / Isilon Storage Division
2013-04-22 01:26:13 +00:00
oleg
9917da6df0 Plug static llentry leak (ipv4 & ipv6 were affected).
PR:		kern/172985
MFC after:	1 month
2013-04-21 21:28:38 +00:00
hselasky
993a273ad6 Add OHCI controller ID.
MFC after:	2 weeks
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
2013-04-21 16:02:50 +00:00
tijl
1e5834db92 Remove redundant definitions of _ALIGN and _ALIGNBYTES. 2013-04-21 11:12:44 +00:00
ae
77bba67eef Since we didn't break the loop, we should set i to -1 to start from the
beginning.

Submitted by:	Steven Hartland
MFC after:	1 week
2013-04-21 09:10:35 +00:00
rpaulo
d3e64b0dcc Fix an off by one calculation in wiipic_dispatch(). 2013-04-21 08:35:38 +00:00
adrian
4501275813 When doing BAW tracking, don't dereference a NULL pointer if the BAW
slot is actually NULL.
2013-04-21 00:41:15 +00:00
adrian
d11d65f442 There's some races (likely in the BAR handling, sigh) which is causing
the pause/resume code to not be called completely symmetrically.

I'll chase down the root cause of that soon; this at least works around
the bug and tells me when it happens.
2013-04-20 22:46:31 +00:00
ken
125013a878 Fix compilation.
Pointy hat to:	ken
2013-04-20 14:33:55 +00:00
sbruno
0a05b03b3d Expose CAM_BOOT_DELAY as a kernel conf item now.
This allows users who boot without loader to adjust their environments
around slightly buggy or slow hardware.

PR:	kern/161809
Submitted by:	rozhuk.im@gmail.com
MFC after:	2 weeks
2013-04-20 00:33:37 +00:00
jkim
1d7102aa1b Merge ACPICA 20130418. 2013-04-19 23:49:34 +00:00
adrian
91a4366773 Initialise the chainmask fields regardless of whether 11n support
is compiled in or not.

This fixes issues with people running -HEAD but who build modules
without doing a "make buildkernel KERNCONF=XXX", thus picking up
opt_*.h.  The resulting module wouldn't have 11n enabled and the
chainmask configuration would just be plain wrong.
2013-04-19 21:49:11 +00:00
luigi
82a36fa8a5 mostly whitespace changes:
- remove vestiges of the old memory allocator
- clean up some comments
2013-04-19 21:08:21 +00:00
ken
9a543047eb Update chio(1) and ch(4) to support reporting element designators.
This allows mapping a tape drive in a changer (as reported by
'chio status') to a sa(4) driver instance by comparing the
serial numbers.

The designators can be ASCII (which is printed out directly), binary
(which is printed in hex format) or UTF-8, which is printed in either
native UTF-8 format if the terminal can support it, or in %XX notation
for non-ASCII characters.  Thanks to Hiroki Sato <hrs@> for the
explaining UTF-8 printing and example UTF-8 printing code.

chio.h:		Modify the changer_element_status structure to add new
		fields and definitions from the SMC3r16 spec.

		Rename the original CHIOGSTATUS ioctl to OCHIOGTATUS and
		define a new CHIOGSTATUS ioctl.

		Clean up some tab/space issues.

chio.c: 	For the 'status' subcommand, print the designator field
		if it is supplied by a device.

scsi_ch.h:	Add new flags for DVCID and CURDATA to the READ
		ELEMENT STATUS command structure.

		Add a read_element_status_device_id structure
		for the data fields in the new standard. Add new
		unions, dt_or_obsolete and voltage_devid, to hold
		and address data from either SCSI-2 or newer devices.

scsi_ch.c:	Implement support for fetching device IDs with READ
		ELEMENT STATUS data.

		Add new arguments to scsi_read_element_status() to
		allow the user to request the DVCID and CURDATA bits.
		This isn't compiled into libcam (it's only an internal
		kernel interface), so we don't need any special
		handling for the API change.

		If the user issues the new CHIOGSTATUS ioctl, copy all of
		the available element status data out.  If he issues the
		OCHIOGSTATUS ioctl, we don't copy the new fields in the
		structure.

		Fix a bug in chopen() that would result in the peripheral
		never getting unheld if chgetparams() failed.

Sponsored by:	Spectra Logic
Submitted by:	Po-Li Soong
MFC After:	1 week
2013-04-19 20:03:51 +00:00
adrian
d39e4f5f3a Implement a very basic multi-PHY aware switch device.
This is intended to be used as a stop-gap for switch devices
which expose multiple ethernet PHYs but we don't have a driver
for - here, etherswitchcfg and the general switch configuration
API can be used to interface to said PHYs.

Submitted by:	Luiz Otavio O Souza <loos.br@gmail.com>
2013-04-19 17:50:38 +00:00
jh
f9bfc0cc7f Include PID in the error message which is printed when the maxproc limit
is exceeded. Improve formatting of the message while here.

PR:		kern/60550
Submitted by:	Lowell Gilbert, bde
2013-04-19 15:19:29 +00:00
glebius
b991db2beb Don't compare unsigned socklen_t against < 0.
Reviewed by:	jhb
2013-04-19 13:40:13 +00:00
jilles
be7967ddcc sem: Restart the POSIX sem_* calls after signals with SA_RESTART set.
Programs often do not expect an [EINTR] return from sem_wait() and POSIX
only allows it if the signal was installed without SA_RESTART. The timeout
in sem_timedwait() is absolute so it can be restarted normally.

The umtx call can be invoked with a relative timeout and in that case
[ERESTART] must be changed to [EINTR]. However, libc does not do this.

The old POSIX semaphore implementation did this correctly (before r249566),
unlike the new umtx one.

It may be desirable to avoid [EINTR] completely, which matches the pthread
functions and is explicitly permitted by POSIX. However, the kernel must
return [EINTR] at least for signals with SA_RESTART clear, otherwise pthread
cancellation will not abort a semaphore wait. In this commit, only restore
the 8.x behaviour which is also permitted by POSIX.

Discussed with:	jhb
MFC after:	1 week
2013-04-19 10:16:00 +00:00
adrian
8a41f6ac5b Add a debug statement to log the currently chosen chainmask configuration. 2013-04-19 08:06:45 +00:00
adrian
d5899b9830 .. don't know how this snuck into this commit. Sorry.
Fix compile build before anyone notices.
2013-04-19 08:01:34 +00:00
adrian
1bfefbcd66 Print out the chainmask configuration. 2013-04-19 07:56:22 +00:00
adrian
420de95094 Use uint32_t for fields that are fetched via ath_hal_getcapability(). 2013-04-19 06:59:10 +00:00
jhibbits
4a3cd971e2 Fix the uart(4) module build. Without uart_dev_lpc the module cannot be loaded. 2013-04-19 05:46:16 +00:00
ache
60f7807df1 Attempt to mitigate poor initialization of arc4 by one-shot
reinitialization from yarrow right after good entropy is harvested.

Approved by:    secteam (delphij)
MFC after:      1 week
2013-04-19 00:30:52 +00:00
rmacklem
1110825468 When an NFS unmount occurs, once vflush() writes the last dirty
buffer for the last vnode on the mount back to the server, it
returns. At that point, the code continues with the unmount,
including freeing up the nfs specific part of the mount structure.
It is possible that an nfsiod thread will try to check for an
empty I/O queue in the nfs specific part of the mount structure
after it has been free'd by the unmount. This patch avoids this problem by
setting the iodmount entries for the mount back to NULL while holding the
mutex in the unmount and checking the appropriate entry is non-NULL after
acquiring the mutex in the nfsiod thread.

Reported and tested by:	pho
Reviewed by:	kib
MFC after:	2 weeks
2013-04-18 23:20:16 +00:00
np
dbe0865208 cxgbe(4): Refuse to install T5 firmwares on a T4 card (and vice versa).
MFC after:	1 week
2013-04-18 22:54:41 +00:00
oleg
6ff116b3ca Recover missing arp_ifinit() call.
MFC after:	2 weeks
2013-04-18 20:13:33 +00:00
np
e140f34c92 cxgbe/tom: Update the CLIP table on the chip when there are changes
to the list of IPv6 addresses on the system.  The table is used for
TOE+IPv6 only.
2013-04-18 19:52:11 +00:00
mav
5055fe41fa Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and
respective functionality, allowing to synchronize TSC on APs to match BSP's
during boot.  It may be unsafe in general case due to theoretical chance of
later drift if CPUs are using different clock rate or source, but it allows
to use TSC in some cases when difference caused by some initialization bug,
while TSCs are known to increment synchronously.

Reviewed by:	jimharris, kib
MFC after:	1 month
2013-04-18 17:07:04 +00:00
rmacklem
ef3a1169e5 Both NFS clients can deadlock when using the "rdirplus" mount
option. This can occur when an nfsiod thread that already holds
a buffer lock attempts to acquire a vnode lock on an entry in
the directory (a LOR) when another thread holding the vnode lock
is waiting on an nfsiod thread. This patch avoids the deadlock by disabling
readahead for this case, so the nfsiod threads never do readdirplus.
Since readaheads for directories need the directory offset cookie
from the previous read, they cannot normally happen in parallel.
As such, testing by jhb@ and myself didn't find any performance
degredation when this patch is applied. If there is a case where
this results in a significant performance degradation, mounting
without the "rdirplus" option can be done to re-enable readahead
for directories.

Reported and tested by:	jhb
Reviewed by:	jhb
MFC after:	2 weeks
2013-04-18 13:09:04 +00:00
mav
1142f0c37e Make siis(4) and mvs(4) send bus_get_dma_tag() requests to parent buses
passing real bus' child pointers instead of grandchilds.

Requested by:	kib
2013-04-18 12:43:06 +00:00
rpaulo
c04b376f02 Move the previously added CPUID7 macros to CPUID_STDEXT. 2013-04-18 07:09:27 +00:00
alc
aaf865752d When calculating the number of reserved nodes, discount the pages that will
be used to store the nodes.

Sponsored by:	EMC / Isilon Storage Division
2013-04-18 05:34:33 +00:00
rpaulo
7b5712a5b5 Add the most current CPUID7_* definitions. 2013-04-18 01:30:08 +00:00
rpaulo
ba95edba6f Print RDSEED, ADX, and SMAP.
Pointed out by:	kib
2013-04-18 01:21:44 +00:00
ken
1ab8c6ab7f Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to
sys/nfs, since it is now shared by the two NFS servers.

Suggested by:	rmacklem
Sponsored by:	Spectra Logic
MFC after:	2 weeks
2013-04-17 22:42:43 +00:00
hiren
64d828dacb Improving r249461 by providing a better way to handle the clang warning.
PR:		kern/177164
Reviewed by:	jhb
Approved by:	sbruno (mentor)
2013-04-17 21:21:27 +00:00
ken
fc3fc9c036 Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes).  It does not currently work for NFSv4 requests.  They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately.  Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads.  This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered.  Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS.  Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once.  ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking.  This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables.  The read offset window calculation has been slightly
modified as well.  Instead of having separate bins, each file
handle has a rolling window of bin_shift size.  This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
	Add nfs_fha_new.c and nfs_fha_old.c.  Compile nfs_fha.c
	when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
	Bring in changes from Rick Macklem to newnfs_realign that
	allow it to operate in blocking (M_WAITOK) or non-blocking
	(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
	Bring in a change from Rick Macklem to allow telling
	nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
	Bring in changes from Rick Macklem to create a new
	nfsm_dissect_nonblock() inline function and
	NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
	Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
	Setup the new NFS server's RPC thread pool so that it will
	call the FHA code.

	Add the malloc flag argument to newnfs_realign().

	Unstaticize newnfs_nfsv3_procid[] so that we can use it in
	the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
	In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
	that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
	In nfsd_fhtovp(), if we're starting a write, check to see
	whether the underlying filesystem supports shared writes.
	If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
	Remove all code that is specific to the NFS server
	implementation.  Anything that is server-specific is now
	accessed through a callback supplied by that server's FHA
	shim in the new softc.

	There are now separate sysctls and tunables for the FHA
	implementations for the old and new NFS servers.  The new
	NFS server has its tunables under vfs.nfsd.fha, the old
	NFS server's tunables are under vfs.nfsrv.fha as before.

	In fha_extract_info(), use callouts for all server-specific
	code.  Getting file handles and offsets is now done in the
	individual server's shim module.

	In fha_hash_entry_choose_thread(), change the way we decide
	whether two reads are in proximity to each other.
	Previously, the calculation was a simple shift operation to
	see whether the offsets were in the same power of 2 bucket.
	The issue was that there would be a bucket (and therefore
	thread) transition, even if the reads were in close
	proximity.  When there is a thread transition, reads wind
	up going somewhat out of order, and ZFS gets confused.

	The new calculation simply tries to see whether the offsets
	are within 1 << bin_shift of each other.  If they are, the
	reads will be sent to the same thread.

	The effect of this change is that for sequential reads, if
	the client doesn't exceed the max_reqs_per_nfsd parameter
	and the bin_shift is set to a reasonable value (22, or
	4MB works well in my tests), the reads in any sequential
	stream will largely be confined to a single thread.

	Change fha_assign() so that it takes a softc argument.  It
	is now called from the individual server's shim code, which
	will pass in the softc.

	Change fhe_stats_sysctl() so that it takes a softc
	parameter.  It is now called from the individual server's
	shim code.  Add the current offset to the list of things
	printed out about each active thread.

	Change the num_reads and num_writes counters in the
	fha_hash_entry structure to 32-bit values, and rename them
	num_rw and num_exclusive, respectively, to reflect their
	changed usage.

	Add an enable sysctl and tunable that allows the user to
	disable the FHA code (when vfs.XXX.fha.enable = 0).  This
	is useful for before/after performance comparisons.

nfs_fha.h:
	Move most structure definitions out of nfs_fha.c and into
	the header file, so that the individual server shims can
	see them.

	Change the default bin_shift to 22 (4MB) instead of 18
	(256K).  Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
	Add shims for the old and new NFS servers to interface with
	the FHA code, and callbacks for the

	The shims contain all of the code and definitions that are
	specific to the NFS servers.

	They setup the server-specific callbacks and set the server
	name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
	Configure the RPC code to call fhaold_assign() instead of
	fha_assign().

sys/modules/nfsd/Makefile:
	Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
	Add nfs_fha_old.c.

Reviewed by:	rmacklem
Sponsored by:	Spectra Logic
MFC after:	2 weeks
2013-04-17 21:00:22 +00:00
glebius
a4662e151c On non-ACPI i386 mp_ncpus is initialized at SI_SUB_CPU, and this
prevents us from creating UMA_ZONE_PCPU zones earlier.

As bandaid shift initialization of counter(9) zone later.

Reviewed by:		kib
Reported & tested by:	Lytochkin Boris <lytboris gmail.com>
2013-04-17 18:43:33 +00:00
adrian
c70adaa3e0 Add the static kernel boot environment, needed to actually boot this thing.
(Wasting 4k just as a temporary placeholder for a boot environment seems
a bit ridiculous, but hey.)

Tested: gxemul:

$ gxemul -e malta -d i:/home/adrian/work/freebsd/svn/mfsroot-rspro.img -C 4Kc /tftpboot/kernel.MALTA
2013-04-17 18:26:01 +00:00
gabor
5635c6550b - Correct spelling in comments
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:56:11 +00:00
gabor
a1c987c247 - Correct mispellings of word and
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:48:46 +00:00
gabor
301f6461b7 - Correct mispellings of word resource
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2013-04-17 11:47:32 +00:00
gabor
188c638b60 - Corrrect mispellings of word useful
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:45:15 +00:00
gabor
3d6a89082d - Correct mispellings of word miscellaneous
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:43:46 +00:00
gabor
b86fa940aa - Correct mispellings of the word necessary
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:42:40 +00:00
gabor
d3ee8e3ff6 - Correct mispellings of the word occurrence
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:40:10 +00:00
ivoras
534bc4664d Link g_label_disk_ident when building geom_label as a module 2013-04-17 09:19:29 +00:00
adrian
94252a362d Setup needed tables for TPC on AR5416->AR9287 chips.
* Add ah_ratesArray[] to the ar5416 HAL state - this stores the maximum
  values permissable per rate.
* Since different chip EEPROM formats store this value in a different place,
  store the HT40 power detector increment value in the ar5416 HAL state.
* Modify the target power setup code to store the maximum values in the
  ar5416 HAL state rather than using a local variable.
* Add ar5416RateToRateTable() - to convert a hardware rate code to the
  ratesArray enum / index.
* Add ar5416GetTxRatePower() - which goes through the gymnastics required
  to correctly calculate the target TX power:
  + Add the power detector increment for ht40;
  + Take the power offset into account for AR9280 and later;
  + Offset the TX power correctly when doing open-loop TX power control;
  + Enforce the per-rate maximum value allowable.

Note - setting a TPC value of 0x0 in the TX descriptor on (at least)
the AR9160 resulted in the TX power being very high indeed.  This didn't
happen on the AR9220.  I'm guessing it's a chip bug that was fixed at
some point.  So for now, just assume the AR5416/AR5418 and AR9130 are
also suspect and clamp the minimum value here at 1.

Tested:

* AR5416, AR9160, AR9220 hostap, verified using (2GHz) spectrum analyser
* Looked at target TX power in TX descriptor (using athalq) as well as TX
  power on the spectrum analyser.

TODO:

* The TX descriptor code sets the target TX power to 0 for AR9285 chips.
  I'm not yet sure why.  Disable this for TPC and ensure that the TPC
  TX power is set.
* AR9280, AR9285, AR9227, AR9287 testing!
* 5GHz testing!

Quirks:

* The per-packet TPC code is only exercised when the tpc sysctl is set
  to 1. (dev.ath.X.tpc=1.) This needs to be done before you bring the
  interface up.
* When TPC is enabled, setting the TX power doesn't end up with a call
  through to the HAL to update the maximum TX power.  So ensure that
  you set the TPC sysctl before you bring the interface up and configure
  a lower TX power or the hardware will be clamped by the lower TX
  power (at least until the next channel change.)

Thanks to Qualcomm Atheros for all the hardware, and Sam Leffler for use
of his spectrum analyser to verify the TX channel power.
2013-04-17 07:31:53 +00:00
adrian
004d764bab Use the TPC bank by default for AR9160.
Tested:

* AR9160, hostap, verified TX power using (2GHz) spectrum analyser

TODO:

* 5GHz verification!
2013-04-17 07:22:23 +00:00
adrian
1840d6b004 Update the rate series setup code to use the decisions already made in
ath_tx_rate_fill_rcflags().  Include setting up the TX power cap in the
rate scenario setup code being passed to the HAL.

Other things:

* add a tx power cap field in ath_rc.
* Add a three-stream flag in ath_rc.
* Delete the LDPC flag from ath_rc - it's not a per-rate flag, it's a
  global flag for the transmission.
2013-04-17 07:21:30 +00:00
rpaulo
1f4363857d Print more bits from the standard extended features CPUID which will be
available in the Haswell architecture (c.f. Intel Document #319433-012A).
2013-04-17 06:51:17 +00:00
neel
a9b96237dc Correct misleading bootverbose output: ahc_isa_probe -> ahc_isa_identify 2013-04-17 02:33:56 +00:00
pfg
f9796e10e2 DTrace: Revert r249367
The following change from illumos brought caused DTrace to
pause in an interactive environment:

3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider

This was not detected during testing because it doesn't
affect scripts.

We shouldn't be changing the environment, especially since the
LD_NOLAZYLOAD option doesn't apply to our (GNU) ld.
Unfortunately the change from upstream was made in such a way
that it is very difficult to separate this change from the
others so, at least for now, it's better to just revert
everything.

Reference:
https://www.illumos.org/issues/3026

Reported by:	Navdeep Parhar and Mark Johnston
2013-04-17 02:20:17 +00:00
ivoras
1028220198 Comment typo fix.
Is aware of the importance of comments: dim
2013-04-16 22:42:40 +00:00
imp
761f4524d4 r249408 and r249436 cause a NULL pointer dereference on the CUBIEBOARD
since it doesn't set the kernel envrionment at all. Work around this
by making sure kern_envp is non-NULL before dereferencing it.
2013-04-16 22:09:08 +00:00
adrian
6e01debb46 Use the new net80211 method to fetch the node TX power, rather than
directly referencing ni->ni_txpower.

This provides the hardware with a slightly more accurate idea of
the maximum TX power to be using.

This is part of a series to get per-packet TPC to work (better).

Tested:

* AR5416, hostap mode
2013-04-16 21:26:44 +00:00
adrian
0128a675f3 Implement a utility function to return the current TX power cap for
the given node.

This takes into account the per-node cap, the ic cap and the
per-channel regulatory caps.

This is designed to replace references to ni_txpower in various net80211
drivers - ni_txpower doesn't necessarily reflect the actual cap for
the given node (eg if the node has the default value of 50dBm (100) and
the administrator has manually configured a lower TX power.)
2013-04-16 20:36:32 +00:00
jhb
0ed1bc2e92 - Document that sem_wait() can fail with EINTR if it is interrupted by a
signal.
- Fix the old ksem implementation for POSIX semaphores to not restart
  sem_wait() or sem_timedwait() if interrupted by a signal.

MFC after:	1 week
2013-04-16 20:26:31 +00:00
adrian
2f4d81f093 Use a per-RX-queue deferred list, rather than a single deferred list for
both queues.

Since ath_rx_pkt() does multi-mbuf frame recombining based on the RX queue,
this needs to occur.

Tested:

* AR9380 (XB112), hostap mode
2013-04-16 20:21:02 +00:00
ivoras
b7d8771305 Fix the buffer-overflow-fixing fixes.
Pointy-hat to: me, for not realizing snprintf() is available in kernel.
Thanks to: jh, for bringing me the good news of snprintf(), Pawel Worach, for
           noting that the panic can be provoked in i386 and not in amd64
2013-04-16 19:58:24 +00:00
delphij
8809aa4451 Fix incomplete printf.
PR:		kern/177889
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:32:12 +00:00
delphij
54d1fdc9a1 Don't leak lock when returning.
PR:		kern/177888
Submitted by:	Sven-Thorsten Dietrich <sven vyatta com>
MFC after:	1 week
2013-04-16 19:25:41 +00:00
trociny
f172830e71 Add a new set of notes to a process core dump to store procstat data.
The notes format is a header of sizeof(int), which stores the size of
the corresponding data structure to provide some versioning, and data
in the format as it is returned by a related sysctl call.

The userland tools (procstat(1)) will be taught to extract this data,
providing additional info for postmortem analysis.

PR:		kern/173723
Suggested by:	jhb
Discussed with:	jhb, kib
Reviewed by:	jhb (initial version), kib
MFC after:	1 month
2013-04-16 19:19:14 +00:00
brooks
d7ad3cab59 Partial MFP4 of 222836:
Only look for FDT partitions if our potential parent is a DISK device.

Excluding direct recursion on the flashmap geoms was insufficient
because it did not prevent the underlying device from being retrieved if
flashmap geoms were further partitioned.

Reviewed by:	imp
Sponsored by:	DARPA, AFRL
2013-04-16 17:47:13 +00:00
tijl
40254de0a6 Fix build after r249543. 2013-04-16 16:59:29 +00:00
imp
59249e7e5b Point to regdef.h. May need to dig up references to the N32 standard
that support this usage (which may be a bit rough, since different
parts of the standard say mutually contradictory things).
2013-04-16 16:54:37 +00:00
rmacklem
90b89365fd Allow the vnode to be unlocked for the weird case of
LK_EXCLOTHER. LK_EXCLOTHER is only used to acquire a
usecount on a vnode during NFSv4 recovery from an
expired lease.

Reported and tested by:	pho
MFC after:	2 weeks
2013-04-16 14:22:16 +00:00
ae
586b63d9f3 Fix accounting after the r249528, also add several another counters to
the statistics.
2013-04-16 11:31:26 +00:00
ae
bb1dffc2b9 Use IP6S_M2MMAX macro. 2013-04-16 11:19:13 +00:00
ae
e7b578dd8b Replace hardcoded numbers. 2013-04-16 11:12:58 +00:00
kib
789ce3de65 Some compilers issue a warning when wider integer is casted to narrow
pointer.  Supposedly shut down the warning by casting through
uintptr_t.

Reported by:	ian
2013-04-16 07:11:52 +00:00
ae
dec8b563fa The source address selection algorithm tries to apply several rules
for the set of IPv6 addresses. Now each attempt goes into IPv6 statistics,
even if given rule did not won. Change this and take into account only
those rules, that won. Also add accounting for cases, when algorithm
fails to select an address.
2013-04-15 21:02:40 +00:00
imp
927d79346e Fix N32/N64 register saving by ensuring that all registers resolve
to unique values.

There's some confusion about what the n32 assembler API really is
(since on page 9 of the spec they say that t0-t3 don't exist, then
turn around on page 22 and say that t4-t7 don't exist), and this
doesn't touch that.

NetBSD's version of this file follows the convention I used here, and
is likely to be correct.

This should fix gdb/ptrace.
2013-04-15 19:32:14 +00:00
adrian
520a733473 Now that the register definitions are in -HEAD, enable this. 2013-04-15 17:59:06 +00:00
adrian
6f9077d752 Bring over some AR9271 register definitions from the QCA HAL.
Obtained from:	Qualcomm Atheros
2013-04-15 17:58:11 +00:00
gnn
59c782b807 Point args[0] not at the thread that is ending but at the one that
is starting.  This is in line with practice in OpenSolaris.

Note that this change is only in ULE and not in the 4BSD scheduler.
Once this change settles in (MFC timeout has expired) we'll try it out
on 4BSD as well.

PR:		177706
Submitted by:	Tiwei Bie
MFC after:	1 month
2013-04-15 17:21:02 +00:00
jfv
489cecec20 Corrections to the RX checksum code, make sure its disabled as
well as enabled when necessary. And simplify the checksum routine
itself, adding UDP bit to the test. Thanks to Kevin Lo for pointing
out the problems and code suggestions.
2013-04-15 17:01:42 +00:00
ivoras
255c9a6acb Introduce glabel labels based on GEOM ident attributes. In this initial
implementation, error on the side of conservatism and only create labels
for GEOMs of classes DISK and MULTIPATH.

Discussed with:	trasz
Approved by:	silence from freebsd-geom@
2013-04-15 16:09:24 +00:00
ivoras
4893ec0b76 Introduce a symbol for the GEOM class name instead of using the ad-hoc string
constant.
2013-04-15 15:55:40 +00:00
glebius
d9c22bdbc9 Switch lagg(4) statistics to counter(9).
The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.

Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by:	Nginx, Inc.
2013-04-15 13:00:42 +00:00
luigi
11d1f4dcaa fix a bug in the computation of the userspace offset for a give netmap buffer.
Submitted by: Hugh Nhan
2013-04-15 11:49:16 +00:00
alc
2ce0362e96 Although we perform path compression to reduce the height of the trie and
the number of interior nodes, we have previously created a level zero
interior node at the root of every non-empty trie, even when that node is
not strictly necessary, i.e., it has only one child.  This change is the
second (and final) step in eliminating those unnecessary level zero interior
nodes.  Specifically, it updates the deletion and insertion functions so
that they do not require a level zero interior node at the root of the trie.
For a "buildworld" workload, this change results in a 16.8% reduction in the
number of interior nodes allocated and a similar reduction in the average
execution time for lookup functions.  For example, the average execution
time for a call to vm_radix_lookup_ge() is reduced by 22.9%.

Reviewed by:	attilio, jeff (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-04-15 06:12:00 +00:00
trociny
4bb922ab39 Similarly to proc_getargv() and proc_getenvv(), export proc_getauxv()
to be able to reuse the code.

MFC after:	3 weeks
2013-04-14 20:03:48 +00:00
trociny
335f3dbd91 Re-factor the code to provide kern_proc_filedesc_out(), kern_proc_out(),
and kern_proc_vmmap_out() functions to output process kinfo structures
to sbuf, to make the code reusable.

The functions are going to be used in the coredump routine to store
procstat info in the core program header notes.

Reviewed by:	kib
MFC after:	3 weeks
2013-04-14 20:01:36 +00:00
trociny
99603a0f21 Re-factor coredump routines. For each type of notes an output
function is provided, which is used either to calculate the note size
or output it to sbuf.  On the first pass the notes are registered in a
list and the resulting size is found, on the second pass the list is
traversed outputing notes to sbuf.  For the sbuf a drain routine is
provided that writes data to a core file.

The main goal of the change is to make coredump to write notes
directly to the core file, without preliminary preparing them all in a
memory buffer.  Storing notes in memory is not a problem for the
current, rather small, set of notes we write to the core, but it may
becomes an issue when we start to store procstat notes.

Reviewed by:	jhb (initial version), kib
Discussed with:	jhb, kib
MFC after:	3 weeks
2013-04-14 19:59:38 +00:00
imp
0b437872bc Print MB and GB instead of MiB and GiB mislabeled as MB and GB.
SD cards are sold in GB not GiB, this will result in less confusion.
Also, cache parent device pointer to save a few cycles for loops.
2013-04-14 19:21:43 +00:00
mav
ce96561fb6 Remove some more pieces of multilevel freeze mechanism, missed in r249466. 2013-04-14 18:09:08 +00:00
mjg
1798a915c4 Add fdallocn function and use it when passing fds over unix socket.
This gets rid of "unp_externalize fdalloc failed" panic.

Reviewed by:	pjd
MFC after:	1 week
2013-04-14 17:08:34 +00:00
kib
0749739009 Usnure that PCI bus BIS_GET_DMA_TAG() method sees the actual PCI
device which makes the request for dma tag, instead of some descendant
of the PCI device, by creating a pass-through trampoline for vga_pci
and ata_pci buses.

Sponsored by:	The FreeBSD Foundation
Suggested by:	jhb
Discussed with:	jhb, mav
MFC after:	1 week
2013-04-14 14:02:34 +00:00
mav
dbf6d138be Remove owner field from struct cam_ed, unused at least since FreeBSD 7. 2013-04-14 10:14:26 +00:00
mav
3d32e6b10c MFprojects/camlock r248982:
Stop abusing xpt_periph in random plases that really have no periph related
to CCB, for example, bus scanning.  NULL value is fine in such cases and it
is correctly logged in debug messages as "noperiph".  If at some point we
need some real XPT periphs (alike to pmpX now), quite likely they will be
per-bus, and not a single global instance as xpt_periph now.
2013-04-14 09:55:48 +00:00
mav
f73c311ca3 MFprojects/camlock r248890, r248897, r248898, r248900, r248903, r248905,
r248917, r248918, r248978, r249001, r249014, r249030:

Remove multilevel freezing mechanism, implemented to handle specifics of
the ATA/SATA error recovery, when post-reset recovery commands should be
allocated when queues are already full of payload requests.  Instead of
removing frozen CCBs with specified range of priorities from the queue
to provide free openings, use simple hack, allowing explicit CCBs over-
allocation for requests with priority higher (numerically lower) then
CAM_PRIORITY_OOB threshold.

Simplify CCB allocation logic by removing SIM-level allocation queue.
After that SIM-level queue manages only CCBs execution, while allocation
logic is localized within each single device.

Suggested by:	gibbs
2013-04-14 09:28:14 +00:00
hiren
12c6ad60c2 Fixing a clang warning indicating uninitialized variable usage.
PR:	kern/177164
Approved by:	sbruno (mentor)
2013-04-14 02:42:40 +00:00
hiren
67955559b1 Improve/correct a comment. We now support a lot more cpu types.
PR:	kern/177496
Approved by:	sbruno (mentor)
2013-04-14 02:26:12 +00:00
neel
df29079506 Create sysctl node 'hw.vmm.vmx' and populate it with oids that expose the VMX
hardware capabilities.

Obtained from:	NetApp
2013-04-13 21:41:51 +00:00
dim
6a5be05861 Fix undefined behaviour in several gpio_pin_setflags() routines (under
sys/arm and sys/mips), squelching the clang 3.3 warnings about this.

Noticed by:	tinderbox and many irate spectators
Submitted by:	Luiz Otavio O Souza <loos.br@gmail.com>
PR:		kern/177759
MFC after:	3 days
2013-04-13 21:21:13 +00:00
jmg
b9f17d62db move the error report to a lower log level... Now you can see when it
returns an error without getting every single io that went through it..

MFC after:	1 week
2013-04-13 19:02:58 +00:00
kib
7117978a81 Fix the name of the pcb member in the comments.
Submitted by:	Oliver Pinter <oliver.pntr@gmail.com>
MFC after:	3 days
2013-04-13 15:20:33 +00:00
mav
7ae4e14b0e MFprojects/camlock r248894:
Use full freeze while PMP does hard reset. This is only cosmetical change.
2013-04-13 14:03:44 +00:00
jchandra
33ed81146c Fix changes made in r249408.
In some cases, kern_envp is set by the architecture code and env_pos does
not contain the length of the static kernel environment. In these cases
r249408 causes the kernel to discard the environment.

Fix this by updating the check for empty static env to *kern_envp != '\0'

Reported by:	np@
2013-04-13 07:23:37 +00:00
neel
f80ac5cabf Use the MAKEDEV_CHECKNAME flag to check for an invalid device name and return
an error instead of panicking.

Obtained from:	NetApp
2013-04-13 05:11:21 +00:00
jkim
8a52e20a1d Unbreak tinderbox build after r249420. 2013-04-12 23:10:56 +00:00
rstone
3343598ba8 Cosmetic change: make a comment reference Sandy Bridge *Xeon*
Reviewed by:	sbruno
MFC after:	1 week
2013-04-12 20:43:14 +00:00
alc
565184245d Although we perform path compression to reduce the height of the trie and
the number of interior nodes, we always create a level zero interior node at
the root of every non-empty trie, even when that node is not strictly
necessary, i.e., it has only one child.  This change is the first step in
eliminating those unnecessary level zero interior nodes.  Specifically, it
updates all of the lookup functions so that they do not require a level zero
interior node at the root.

Reviewed by:	attilio, jeff (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-04-12 20:21:28 +00:00
jimharris
c0e542217e Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace
them with the NVMe passthrough equivalent.

Sponsored by:	Intel
2013-04-12 17:56:47 +00:00
jimharris
eee11f2f3d Add support for passthrough NVMe commands.
This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.

Sponsored by:	Intel
2013-04-12 17:52:17 +00:00
jimharris
72eb2cf9e3 Move the busdma mapping functions to nvme_qpair.c.
This removes nvme_uio.c completely.

Sponsored by:	Intel
2013-04-12 17:48:45 +00:00
jimharris
3b35b1fc99 Remove the NVMe-specific physio and associated routines.
These were added early on for benchmarking purposes to avoid the mapped I/O
penalties incurred in kern_physio.  Now that FreeBSD (including kern_physio)
supports unmapped I/O, the need for these NVMe-specific routines no longer exists.

Sponsored by:	Intel
2013-04-12 17:44:55 +00:00
jimharris
f877ea431a Add a mutex to each namespace, for general locking operations on the namespace.
Sponsored by:	Intel
2013-04-12 17:41:24 +00:00
jimharris
d3af7eb2bf Rename the controller's fail_req_lock, so that it can be used for other
locking operations on the controller.

Sponsored by:	Intel
2013-04-12 17:36:48 +00:00
jimharris
92ebbf5a66 Do not panic when a busdma mapping operation fails.
Instead, print an error message and fail the associated command with
DATA_TRANSFER_ERROR NVMe completion status.

Sponsored by:	Intel
2013-04-12 17:34:49 +00:00
jchandra
ad2a2fb3b7 Move MIPS_MAX_TLB_ENTRIES definition from cpuregs.h to tlb.c
Having MIPS_MAX_TLB_ENTRIES defined to 128 is misleading, since it used
to be 64 in older releases of MIPS architecture (where it could be read
from Config1) and can be much more than 128 for the newer processors.

For now, move the definition to the only file using it (mips/mips/tlb.c)
and define MIPS_MAX_TLB_ENTRIES depending on the MIPS cpu defined. Also
add few checks so that we do not write beyond the end of the tlb_state
array.

This fixes a kernel data corruption seen in Netlogic XLP, which was casued
by tlb_save() writing beyond the end of tlb_state array when breaking into
debugger.
2013-04-12 17:22:12 +00:00
ae
d657004b7f Reflect removing of the counter_u64_subtract() function in the macro. 2013-04-12 16:29:15 +00:00
trasz
80b8b2f779 Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.

With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.

Reviewed by:	ken
Sponsored by:	FreeBSD Foundation
2013-04-12 16:25:03 +00:00
jchandra
06443169d2 Fix incorrect KASSERTs in xlpge
Fix for crash in Netlogic XLP network accelerator driver when invariants
are enabled - use correct the condition for KASSERT.
2013-04-12 16:03:22 +00:00
jchandra
b9f386539e Fix kenv behavior when there is no static environment
In case where there are no static kernel environment entries, the
function init_dynamic_kenv() adds an incorrect entry at position 0 of
the dynamic kernel environment. This in turn causes kenv(1) to print
and empty list even though there are dynamic entries added later.

Fix this by checking env_pos in init_dynamic_kenv() and adding dynamic
entries only if there are static entries.
2013-04-12 15:58:53 +00:00
bz
ffb752b726 isa_if.h is indirectly included.
Depend on it to unbreak pc98 builds.
2013-04-12 13:56:21 +00:00
glebius
06416daa71 Attempt to clean up spacing and long lines. 2013-04-12 08:52:19 +00:00
ae
cd45f7487f Free memory after deleting an address policy entry.
MFC after:	1 week
2013-04-12 07:59:54 +00:00
neel
cbd874121f If vmm.ko could not be initialized correctly then prevent the creation of
virtual machines subsequently.

Submitted by:	Chris Torek
2013-04-12 01:16:52 +00:00
np
255ff35d8e Add pciids of the T5 based cards. The ones that I haven't tested with
cxgbe(4) are disabled for now.  This will change.

MFC after:	2 weeks
2013-04-11 23:40:05 +00:00
np
bdeb98927c Cosmetic change (s/wrwc/wcwr/;s/WRWC/WCWR/).
MFC after:	3 days.
2013-04-11 22:49:29 +00:00
np
35e58ff9d5 Auto-reduce the holdoff timers that are greater than the maximum value
allowed by the hardware.

MFC after:	3 days
2013-04-11 22:46:39 +00:00
bz
862ce015c1 Generate a LINT for powerpc and for powerpc64.
Discussed with:	nwhitehorn
2013-04-11 22:18:20 +00:00
adrian
2de9f3a455 Always enable TXOK interrupts when setting up TX queues for EDMA NICs. 2013-04-11 22:02:35 +00:00
np
8cfaf1f711 cxgbe/tom: Slight simplification of code that calculates options2.
MFC after:	3 days
2013-04-11 21:36:01 +00:00
np
fb2d6a2d06 Get rid of a couple of stray \n's.
MFC after:	3 days.
2013-04-11 21:17:49 +00:00
np
ebbc873044 There is no need for elaborate queries and error checking when trying to
set FW4MSG_ENCAP.

MFC after:	3 days
2013-04-11 21:15:35 +00:00
trociny
156185aa6e Add sbuf_start_section() and sbuf_end_section() functions, which can
be used for automatic section alignment.

Discussed with:	kib
Reviewed by:	kib
MFC after:	1 month
2013-04-11 19:49:18 +00:00
np
e4426c1dc8 - Explain clearly why a different firmware is being installed (if/when
it is being installed).  Improve other error messages while here.

- Select special FPGA specific configuration profile when appropriate.

MFC after:	3 days
2013-04-11 19:39:40 +00:00
glebius
d07f9bea4e Fix tcp_output() so that tcpcb is updated in the same manner when an
mbuf allocation fails, as in a case when ip_output() returns error.

To achieve that, move large block of code that updates tcpcb below
the out: label.

This fixes a panic, that requires the following sequence to happen:

1) The SYN was sent to the network, tp->snd_nxt = iss + 1, tp->snd_una = iss
2) The retransmit timeout happened for the SYN we had sent,
   tcp_timer_rexmt() sets tp->snd_nxt = tp->snd_una, and calls tcp_output().
   In tcp_output m_get() fails.
3) Later on the SYN|ACK for the SYN sent in step 1) came,
   tcp_input sets tp->snd_una += 1, which leads to
   tp->snd_una > tp->snd_nxt inconsistency, that later panics in
   socket buffer code.

For reference, this bug fixed in DragonflyBSD repo:

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/1ff9b7d322dc5a26f7173aa8c38ecb79da80e419

Reviewed by:	andre
Tested by:	pho
Sponsored by:	Nginx, Inc.
PR:		kern/177456
Submitted by:	HouYeFei&XiBoLiu <lglion718 163.com>
2013-04-11 18:23:56 +00:00
np
f4aa2790e6 cxgbe(4): Ensure that the MOD_LOAD handler runs before either t4nex or
t5nex attach to their devices.

MFC after:	3 days
2013-04-11 17:50:50 +00:00
pfg
2554e81a37 DTrace: option for time-ordered output
Merge changes from illumos:

3021 option for time-ordered output from dtrace(1M)
3022 DTrace: keys should not affect the sort order when sorting by value
3023 it should be possible to dereference dynamic variables
3024 D integer narrowing needs some work
3025 register leak in D code generation
3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider

This brings yet another feature implemented in upstream DTrace.
A complete description is available here:
http://dtrace.org/blogs/ahl/2012/07/28/my-new-dtrace-favorite/

This change bumps the DT_VERS_* number to 1.9.1 in
accordance to what is done in illumos.

This change was somewhat complicated because upstream is mixed many
changes in an individual commit and some of the tests don't really
apply to us.

There are also appear to be differences in timestamping with Solaris
so we had to workaround some assertions making sure no regression
happened.

Special thanks to Fabian Keil for changes and testing.

Illumos Revisions:	13758:23432da34147

Reference:
https://www.illumos.org/issues/3021
https://www.illumos.org/issues/3022
https://www.illumos.org/issues/3023
https://www.illumos.org/issues/3024
https://www.illumos.org/issues/3025
https://www.illumos.org/issues/1694

Tested by:	Fabian Keil
Obtained from:	Illumos
MFC after:	1 months
2013-04-11 16:24:36 +00:00
mm
dc37eae42d MFV r249354:
Merge bugfixes accepted and integrated by vendor. Underlying problems
have been reported by us and fixed in r240942 and r249196.

Illumos ZFS issues:
  3645 dmu_send_impl: possibilty of pool hold leak
  3692 Panic on zfs receive of a recursive deduplicated stream

MFC after:	8 days
2013-04-11 07:40:30 +00:00
mav
4a4cddb38e Do not sent 120 TEST UNIT READY requests on generic NOT READY statuses.
Some failing disks tend to return vendor-specific ASC/ASCQ codes with
NOT READY sense key.  It caused extremely long recovery attempts, repeating
these 120 TURs (it takes at least 1 minute) for every I/O request.
Instead of that use default error handling, doing just few retries.

Reviewed by:	ken, gibbs
MFC after:	1 month
2013-04-11 06:34:41 +00:00
neel
ea145a8678 Make the code to check if VMX is enabled more readable by using macros
instead of magic numbers.

Discussed with:	Chris Torek
2013-04-11 04:29:45 +00:00
sbruno
981c785388 While investigating a p/r I noted that the camcontrol devlist output for
volumes behind a ciss(4) controller were being reported with malformeed
names and identifiers.

Repair that reporting by using the CAM values for the three SCSI indents
reported via camcontrol devlist

PR:	kern/171650
Reviewed by:	scottl
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-04-10 23:31:19 +00:00
sbruno
182d1a1c67 options DPT_HANDLE_TIMEOUTS hasn't worked since dpt(4) was converted to CAM
somewhere around svn r39402 to r39234.

I don't know of anyone who really wants to test these changes, but they
only remove the deprecated code in question.  This shreds the driver down a
bit and *removes* options from the kernel configs.

These don't appear to be referenced in the man page, so no need to check it
there.

PR:		kern/44587
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-04-10 23:20:09 +00:00
ken
2e36122e63 Add a callback to the ada(4) driver so that it knows when GEOM has released
references to it.

This is the functional equivalent to change r237518, which added this
functionality to the cd(4) and da(4) drivers.

This fix prevents a panic caused by GEOM calling adaopen() while the device
is going away.  We now keep the device around until GEOM has finished
cleaning up its state.

ata_da.c:	In adaregister(), add a d_gone callback to the GEOM disk
		structure registered for the ada driver.  Increment the
		peripheral reference count for GEOM.

		Add a new callback, adadiskgonecb(), that GEOM calls when
		it is done with its resources.  This callback releases the
		reference acquired in adaregister().

Submitted by:	Po-Li Soong
Sponsored by:	Spectra Logic
MFC After:	5 days
2013-04-10 22:12:21 +00:00
mav
695c50ff24 Create controller-level DMA tag, handling range of supported addresses.
That simplifies logic for channels and gives the bus information about what
device actually allocated the tag.

Submitted by:	jhb@
2013-04-10 20:38:15 +00:00
jfv
39a8419a58 Simplify allocate_legacy code, txr pointer was breaking LEGACY compile,
thanks to Nick Rogers for pointing this out.
2013-04-10 17:51:39 +00:00
mav
d03d8a4ecf Add ID for ASMedia ASM1042 USB 3.0 controller.
MFC after:	1 week
2013-04-10 17:43:20 +00:00
glebius
42e90ceb67 Since UMA_ZONE_PCPU zones put a constraint on sizeof(struct pcpu), declared
as CTASSERT in MI pcpu.h, stop including all possible mutually exclusive
PCPU_MD_FIELDS fields into LINT kernels, due to brekaing
aforementioned CTASSERT.
2013-04-10 16:09:45 +00:00
glebius
e79bb9704b Fix build. 2013-04-10 08:09:25 +00:00