Commit Graph

111592 Commits

Author SHA1 Message Date
markj
e38d62e90d Prevent cv_waiters wraparound.
r282971 attempted to fix this problem by decrementing cv_waiters after
waking up from sleeping on a condition variable, but this can result in
a use-after-free if the CV is freed before all woken threads have had a
chance to run. Instead, avoid incrementing cv_waiters past INT_MAX, and
have cv_signal() explicitly check for sleeping threads once cv_waiters has
reached this bound.

Reviewed by:	jhb
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4822
2016-01-09 01:56:46 +00:00
allanjude
eeaf13194e Only call init_zfs_bootenv() when the system was booted with ZFS
Add a few other safeguards to ensure things do not break when the
boot device cannot be determined

Reported by:	flo
MFC after:	3 days
Sponsored by:	ScaleEngine Inc.
2016-01-09 00:54:08 +00:00
bdrewery
a041c29bfd Update dependencies.
Sponsored by:	EMC / Isilon Storage Division
2016-01-09 00:43:11 +00:00
glebius
aaa09777e1 New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.

The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O, when I/O operations are launched, but not awaited to be complete. An
explanation of why such behavior is beneficial compared to old one is
going to be too long for a commit message, so we will skip it here.

Additional features of new syscall are extra flags, which provide an
application more control over data sent. The SF_NOCACHE flag tells
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows to specify readahead size in pages.

The new syscalls is a drop in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.

Celebrates:	Netflix global launch!
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
Relnotes:	yes
2016-01-08 20:34:57 +00:00
emaste
c6e5c9fcdd Reduce libstand Makefile duplication
Userboot's copy of the libstand Makefile had more extensive changes
compared to the one in sys/boot/libstand32, but it turns out these are
not intentional and we can just include lib/libstand/Makefile as done
for libstand32 in r293040.

Reviewed by:	imp, jhb
Tested by:	allanjude
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D4793
2016-01-08 19:12:26 +00:00
glebius
e25e77f91d Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like
sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM
sockets, due to sendfile(2) supporting only the latter, there is a corner
case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake
of control data, albeit being stream socket.

Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY,
similar to m_demote().
2016-01-08 19:03:20 +00:00
emaste
368544f672 Avoid unintended $FreeBSD$ expansion in generate-fat.sh 2016-01-08 17:33:34 +00:00
glebius
088235535d Revert r293405: it breaks socket buffer INVARIANTS when sending control
data over local sockets.
2016-01-08 17:27:23 +00:00
emaste
274aad9883 Add safety belt for boot1.efi file size
Reviewed by:	smh
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D4833
2016-01-08 16:37:22 +00:00
melifaro
108f5b92be Do more fine-grained locking in rtrequest1_fib().
Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643.
Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better
  locking for RTM_ADD / RTM_DELETE cases.
2016-01-08 16:25:11 +00:00
smh
72bd55ca76 Update generated efi boot image templates
r279533 increased the boot1 size from 64k to 128k but didn't regenerate the
fat templates, hence the change was never activated.

With recent and upcoming changes the efi boot1 binary is now > 64k.

To avoid fat corruption in the created boot images regenerate the
templates to activate the boot1 size increase.

MFC after:	2 weeks
X-MFC-With:	r293268
2016-01-08 13:58:36 +00:00
hselasky
d1f61053c3 LinuxKPI style changes:
- Properly prefix internal functions with "linux_" instead of only a
  single underscore to avoid future namespace collisions.
- Make some functions global instead of inline to ease debugging and
  to avoid unnecessary code duplication.
- Remove no longer existing kthread_create() function's prototype.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-08 10:04:19 +00:00
allanjude
3b77b5c8be Add support for ZFS Boot Environments to userboot (for bhyve and others)
While here, also fix a possible null pointer

Reported by:	lattera
MFC after:	3 days
Sponsored by:	ScaleEngine Inc.
2016-01-08 05:09:55 +00:00
glebius
a4cad9f2ef For SOCK_STREAM socket use sbappendstream() instead of sbappend(). 2016-01-08 01:16:03 +00:00
cem
66aa33e3b5 ioat(4): Add ioat_acquire_reserve() KPI
ioat_acquire_reserve() is an extended version of ioat_acquire().  It
allows users to reserve space in the channel for some number of
descriptors.  If this succeeds, it guarantees that at least submission
of N valid descriptors will succeed.

Sponsored by:	EMC / Isilon Storage Division
2016-01-07 23:02:15 +00:00
pfg
a32f535abc ext2fs: reading mmaped file in Ext4 causes panic
Always call brelse(path.ep_bp), fixing reading EXT4 files using mmap().

Patch by Damjan Jovanovic.

PR:		205938
MFC after:	1 week
2016-01-07 21:43:43 +00:00
jimharris
5fa22c55c2 ismt: fix ISMT_DESC_ADDR_RW macro
Submitted by:	Masanobu SAITOH <msaitoh@netbsd.org>
MFC after:	3 days
2016-01-07 21:16:44 +00:00
bdrewery
a3fad80f1c Allow libnv to be built externally using GCC.
GCC does not define _VA_LIST_DECLARED.  It defines _VA_LIST_ and others.
This was causing the prototype to not be defined and leading to an error
later due to using nvlist_add_stringv(3) without a prototype in
nvlist_add_stringf(3).

This uses the same check as other va_list prototypes in the original
change in r279438.
2016-01-07 20:52:35 +00:00
jimharris
d613b647e0 nvme: replace NVME_CEILING macro with howmany()
Suggested by:	rpokala
MFC after:	3 days
2016-01-07 20:35:26 +00:00
jimharris
94f3dfd067 nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.

This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue.  This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues.  Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.

While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.

Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 20:32:04 +00:00
kib
b96b9a6148 Convert sys/cam to use make_dev_s().
Reviewed by:	hps, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D4746
2016-01-07 20:22:55 +00:00
kib
eb437d36bf Convert tty common code to use make_dev_s().
Tty.c was untypical in that it handled the si_drv1 issue consistently
and correctly, by always checking for si_drv1 being non-NULL and
sleeping if NULL.  The removed code also illustrated unneeded
complications in drivers which are eliminated by the use of new KPI.

Reviewed by:	hps, jhb
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D4746
2016-01-07 20:15:09 +00:00
kib
3277da17a1 Provide yet another KPI for cdev creation, make_dev_s(9).
Immediate problem fixed by the new KPI is the long-standing race
between device creation and assignments to cdev->si_drv1 and
cdev->si_drv2, which allows the window where cdevsw methods might be
called with si_drv1,2 fields not yet set.  Devices typically checked
for NULL and returned spurious errors to usermode, and often left some
methods unchecked.

The new function interface is designed to be extensible, which should
allow to add more features to make_dev_s(9) without inventing yet
another name for function to create devices, while maintaining KPI and
even KBI backward-compatibility.

Reviewed by:	hps, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D4746
2016-01-07 20:08:02 +00:00
bdrewery
d6b33ca987 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	EMC / Isilon Storage Division
2016-01-07 19:58:23 +00:00
emaste
2029b75c0e Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
bdrewery
36d4fab170 Don't install /usr/include/stand.h twice after r293040.
Only install it from lib/libstand.

Sponsored by:	EMC / Isilon Storage Division
2016-01-07 19:19:23 +00:00
avos
252aa9ecdf net80211 drivers: fix ieee80211_init_channels() usage
Fix out-of-bounds read (all) / write (11n capable) for drivers
that are using ieee80211_init_channels() to initialize channel list.

Tested with:
 * RTL8188EU, STA mode.
 * RTL8188CUS, STA mode.
 * WUSB54GC, HOSTAP mode.

Approved by:	adrian (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4818
2016-01-07 18:41:03 +00:00
sbruno
f0fdf5da87 Fix VF handling of VLANs.
This helps immensily with our ability to operate in the Amazon Cloud.

Discussed on Intel Networking Community call this morning.

Submitted by:	Jarrod Petz(petz@nisshoko.net)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4788
2016-01-07 18:34:56 +00:00
sbruno
f7dfde9b82 Fixup SFP module insertion on the 82599 when insertion happens after
the system is booted and running.

Add PHY detection logic to ixgbe_handle_mod() and add locking to
ixgbe_handle_msf() as well.

PR:		150251
Submitted by:	aboyer@averesystems.com
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3188
2016-01-07 17:02:34 +00:00
sbruno
205ae998a7 Disable the reuse of checksum offload context descriptors in the case
of multiple queues in em(4).  Document errata in the code.

MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D3995
2016-01-07 16:48:47 +00:00
sbruno
a53ba5573c Switch em(4) to the extended RX descriptor format. This matches the
e1000/e1000e split in linux.

Split rxbuffer and txbuffer apart to support the new RX descriptor format
structures. Move rxbuffer manipulation to em_setup_rxdesc() to unify the
new behavior changes.

Add a RSSKEYLEN macro for help in generating the RSSKEY data structures
in the card.

Change em_receive_checksum() to process the new rxdescriptor format
status bit.

MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D3447
2016-01-07 16:42:48 +00:00
sbruno
1e5582b9f1 Wow, um ... sorry about that. The commit log for this code should have
read that it was for EM_MULTIQUEUE.  Revert this and try again.
2016-01-07 16:24:18 +00:00
sbruno
7d0f7e9a46 Switch em(4) to the extended RX descriptor format. This matches the
e1000/e1000e split in linux.

MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D3447
2016-01-07 16:20:55 +00:00
jimharris
fbca74bfe1 nvme: do not revert o single I/O queue when per-CPU queues not possible
Previously nvme(4) would revert to a signle I/O queue if it could not
allocate enought interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core.  This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:18:32 +00:00
jimharris
0c3ac2f3b1 nvme: break out interrupt setup code into a separate function
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:12:42 +00:00
jimharris
35417c6f67 nvme: do not pre-allocate MSI-X IRQ resources
The issue referenced here was resolved by other changes
in recent commits, so this code is no longer needed.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:11:31 +00:00
jimharris
6f7e8e5393 nvme: remove per_cpu_io_queues from struct nvme_controller
Instead just use num_io_queues to make this determination.

This prepares for some future changes enabling use of multiple
queues when we do not have enough queues or MSI-X vectors
for one queue per CPU.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:09:56 +00:00
jimharris
5fd9219620 nvme: simplify some of the nested ifs in interrupt setup code
This prepares for some follow-up commits which do more work in
this area.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:08:04 +00:00
jimharris
d038cc247e nvd: submit bios directly when BIO_ORDERED not set or in flight
This significantly improves parallelism in the most common case.
The taskqueue is still used whenever BIO_ORDERED bios are in flight.

This patch is based heavily on a patch from gallatin@.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:06:23 +00:00
jimharris
17732bdc52 nvd: break out submission logic into separate function
This enables a future patch using this same logic to submit
I/O directly bypassing the taskqueue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 15:59:51 +00:00
jimharris
e08430cb29 nvd: skip BIO_ORDERED logic when bio fails submission
This ensures the bio flags are not read after biodone().
The ordering will still be enforced, after the bio is
submitted successfully.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 15:58:44 +00:00
jimharris
ede43877c5 nvd: do not wait for previous bios before submitting ordered bio
Still wait until all in-flight bios (including the ordered bio)
complete before processing more bios from the queue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 15:57:17 +00:00
jimharris
73d6d4451b nvd: set DISKFLAG_DIRECT_COMPLETION
Submitted by:	gallatin
MFC after:	3 days
2016-01-07 15:55:41 +00:00
skra
d1805666fa Print curpmap in "show pcpu" command.
Approved by:	kib (mentor)
2016-01-07 12:31:49 +00:00
melifaro
eb981fd41f Do not use 'struct route_in6' inside hash6_insert().
rin6 was used only as sockaddr_in6 storage. Make rtalloc1_fib()
  use on-stack sin6 and return rtenry directly, instead of doing
  useless work with 'struct route_in6'.
2016-01-07 12:22:29 +00:00
jtl
a6b350eb35 Apply the changes from r293284 to one additional file.
Discussed with:	glebius
2016-01-07 11:54:20 +00:00
melifaro
76a48b4688 Convert pf(4) to the new routing API.
Differential Revision:	https://reviews.freebsd.org/D4763
2016-01-07 10:20:03 +00:00
hselasky
3d1ba99d7b Remove unused file. 2016-01-07 09:40:19 +00:00
melifaro
92f0683530 Convert cxgb/cxgbe to the new routing API.
Discussed with:		np
2016-01-07 08:07:17 +00:00
allanjude
19cde117cf Make additional parts of sys/geom/eli more usable in userspace
The upcoming GELI support in the loader reuses parts of this code
Some ifdefs are added, and some code is moved outside of existing ifdefs

The HMAC parts of GELI are broken out into their own file, to separate
them from the kernel crypto/openssl dependant parts that are replaced
in the boot code.

Passed the GELI regression suite (tools/regression/geom/eli)
 Files=20 Tests=14996
 Result: PASS

Reviewed by:	pjd, delphij
MFC after:	1 week
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D4699
2016-01-07 05:47:34 +00:00