Commit Graph

245112 Commits

Author SHA1 Message Date
kib
81faa225ff Revert r323722. A better fix will be committed shortly, as well as
some still useful bits of the reverted revision.

The problem with the committed fix is that there are still issues with
returning from NMI, when NMI interrupted kernel in a moment where the
kernel segments selectors were still not loaded into registers.  If
this happens, the NMI return would loose the userspace selectors
because r323722 does not reload segment registers on return to kernel
mode.

Fixing the problem is complicated.  Since an alternative approach to
handle the original bug exists, it makes sence to stop adding more
complexity.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-28 08:38:24 +00:00
sephe
11859da3f9 hyperv/hn: Unbreak i386 building.
Reported by:	cy
MFC after:	1 week
Sponsored by:	Microsoft
2017-09-28 07:02:56 +00:00
imp
ff7911a913 Tweak performance of nda completions
Use xpt_done_direct in preference to xpt_done when completing a
successful I/O. Continue to use xpt_done when there's an error, or for
completion of the submission of a CCB. This eliminates a context
switch to the cam_doneq thread.

Sponsored by: Netflix
Suggested by: scottl@
2017-09-28 01:27:00 +00:00
rmacklem
9dd547c597 Fix a memory leak that occurred in the pNFS client.
When a "pnfs" NFSv4.1 mount was unmounted, it didn't free up the layouts
and deviceinfo structures. This leak only affects "pnfs" mounts and only
when the mount is umounted.
Found while testing the pNFS Flexible File layout client code.

MFC after:	2 weeks
2017-09-27 23:23:41 +00:00
jhb
bb8b530dd0 Use UMA_ALIGNOF() for name cache UMA zones.
This fixes kernel crashes due to misaligned accesses to the 64-bit
time_t embedded in struct namecache_ts in MIPS n32 kernels.

MFC after:	1 week
Sponsored by:	DARPA / AFRL
2017-09-27 23:18:57 +00:00
jhb
13b1e2684d Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type.  This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.

Discussed on:	arch@
MFC after:	1 week
Sponsored by:	DARPA / AFRL
2017-09-27 23:15:33 +00:00
landonf
67cf61ca22 bhnd: Add support for supplying bus I/O callbacks when initializing an EROM
parser.

This allows us to use the EROM parser API in cases where the standard bus
space I/O APIs are unsuitable. In particular, this will allow us to parse
the device enumeration table directly from bhndb(4) drivers, prior to
full attach and configuration of the bridge.

Approved by:	adrian (mentor)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D12510
2017-09-27 19:48:34 +00:00
landonf
2e486532df bhnd: Implement bhnd(4) platform device registration.
Add bhnd(4) API for explicitly registering BHND platform devices (ChipCommon,
PMU, NVRAM, etc) with the bus, rather than walking the newbus hierarchy to
discover platform devices. These devices are now also refcounted; attempting
to deregister an actively used platform device will return EBUSY.

This resolves a lock ordering incompatibility with bwn(4)'s firmware loading
threads; previously it was necessary to acquire Giant to protect newbus access
when locating and querying the NVRAM device.

Approved by:	adrian (mentor)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D12392
2017-09-27 19:44:23 +00:00
imp
f0a60eb85f Since the human readable name is actually ignored, and not matching a
'human' pnp string, change it to #, the name reserved for fields that
are ignored.
2017-09-27 19:22:10 +00:00
imp
ee8813ba3b Improve description of the PNP string a bit. 2017-09-27 19:21:52 +00:00
cem
8a39a65595 Unrevert r324059
With a colon and bogus name ("#") added to appease the simplistic parser
used in kldxref.

Sponsored by:	Dell EMC Isilon
2017-09-27 19:14:00 +00:00
markj
ae879c6393 Use C99 initializers for DTrace provider methods.
This makes the definitions easier to read and more cscope-friendly.

MFC after:	1 week
2017-09-27 17:46:38 +00:00
davidcs
270b655971 Tx Ring Shadow Consumer Index Register needs to be cleared prior
to passing it's physical address to the FW during Tx Create Context.

MFC after:3 days
2017-09-27 17:46:11 +00:00
fsu
ac00ff77c6 Add check to avoid raw inode iblocks fields overflow in case of huge_file feature.
Use the Linux logic for now.

Reviewed by:    pfg (mentor)
Approved by:    pfg (mentor)
MFC after:      2 weeks
Differential Revision: https://reviews.freebsd.org/D12131
2017-09-27 16:12:13 +00:00
cem
df978a4bc8 Remove PNP metadata from drm2 drivers until kldxref problem is resolved
Reported by:	np
Sponsored by:	Dell EMC Isilon
2017-09-27 14:59:18 +00:00
tuexen
a1f4d252b8 Remove unused function.
MFC after:	1 week
2017-09-27 13:05:23 +00:00
manu
444f0ddd90 vfs_export: Simplify vfs_export_lookup
If the filesystem is not exported directly return NULL.
If no address is given and filesystem is exported using some default
one return it directly, if it doesn't have a default one directly
return NULL.

Reviewed by:	kib, bapt
MFC after:	1 week
Sponsored by:	Gandi.net
Differential Revision:	https://reviews.freebsd.org/D12505
2017-09-27 09:39:16 +00:00
sephe
04335d9414 kernel: Bump __FreeBSD_version for the removal of M_HASHTYPE_RSS_UDP_IPV4_EX
Sponsored by:	Microsoft
2017-09-27 06:33:55 +00:00
sephe
f72be2a2dc mbuf: Remove UDP_IPV4_EX, which was never defined.
Add comment to explain the IPV6_EX suffix.  The confusion about
these RSS hash type probably stems from the facts that they were
never widely implemented by hardwares.

Reviewed by:	rwatson
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12453
2017-09-27 06:31:35 +00:00
sephe
f1d4cb25ab ixl: Fix mbuf hash type settings.
IPV6_EXs in RSS never mean fragment.  They mean:
"- Home address from the home address option in the IPv6 destination
   options header.  If the extension header is not present, use the
   Source IPv6 Address.
 - IPv6 address that is contained in the Routing-Header-Type-2 from
   the associated extension header.  If the extension header is not
   present, use the Destination IPv6 Address."

UDP_IPV4_EX is an invalid RSS hash type, which will be removed.

Quoted from:
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ndishashipv6ex

Reviewed by:	erj
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12450
2017-09-27 05:59:54 +00:00
sephe
e1f9aeedee tcp: Don't "negotiate" MSS.
_NO_ OSes actually "negotiate" MSS.

RFC 879:
"... This Maximum Segment Size (MSS) announcement (often mistakenly
called a negotiation) ..."

This negotiation behaviour was introduced 11 years ago by r159955
without any explaination about why FreeBSD had to "negotiate" MSS:

    In syncache_respond() do not reply with a MSS that is larger than what
    the peer announced to us but make it at least tcp_minmss in size.

    Sponsored by:   TCP/IP Optimization Fundraise 2005

The tcp_minmss behaviour is still kept.

Syncookie fix was prodded by tuexen, who also helped to test this
patch w/ packetdrill.

Reviewed by:	tuexen, karels, bz (previous version)
MFC after:	2 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12430
2017-09-27 05:52:37 +00:00
sephe
b69ef721ad hyperv/hn: Fix UDP checksum offload issue in Azure.
UDP checksum offload does not work in Azure if following conditions are
met:
- sizeof(IP hdr + UDP hdr + payload) > 1420.
- IP_DF is not set in IP hdr

Use software checksum for UDP datagrams falling into this category.

Add two tunables to disable UDP/IPv4 and UDP/IPv6 checksum offload, in
case something unexpected happened.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12429
2017-09-27 05:44:50 +00:00
sephe
8fc41247cb hyperv/hn: Set tcp header offset for CSUM/LSO offloading.
No observable effect; better safe than sorry.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12417
2017-09-27 04:42:40 +00:00
mjg
92eb817a05 sysctl: remove target buffer read/write checks prior to calling the handler
Said checks were inherently racy anyway as jokers could unmap target areas
before the handler got around to accessing them.

This saves time by avoiding locking the address space.

MFC after:	1 week
2017-09-27 01:31:52 +00:00
mjg
84d28f19d5 Annotate sysctlmemlock with __exclusive_cache_line.
MFC after:	1 week
2017-09-27 01:27:43 +00:00
mjg
1b521971f3 Remove manpage entries about crshared(9)
The function itself was removed years ago in r272546

Submitted by:	Paulm <paulm tetrardus.net>
MFC after:	2 weeks
2017-09-27 01:12:47 +00:00
mjg
c8836df626 Whack procctl(8)
It was supposed to provide a recovery mechanism against bugs in procfs's
long deprecated tracing capabilities.

Remove the tool as a prerequisite to axing the kernel side.

The tracing facility to use is ptrace(2).

MFC after:	2 weeks
2017-09-27 01:03:00 +00:00
mjg
77579b97a2 mtx: drop the tid argument from _mtx_lock_sleep
tid must be equal to curthread and the target routine was already reading
it anyway, which is not a problem. Not passing it as a parameter allows for
a little bit shorter code in callers.

MFC after:	1 week
2017-09-27 00:57:05 +00:00
rmacklem
3ccccd3209 Add major and minor version arguments to nfscl_reqstart().
This patch adds "vers" and "minorvers" arguments to nfscl_reqstart().
The patch always passes them in as "0" and that implies no change
in semantics. These arguments will be used by a future commit that
adds support for the Flexible File Layout.
2017-09-26 23:42:44 +00:00
jhb
c991591342 Don't defer wakeup()s for completed journal workitems.
Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration.  However, this
resulted in the item being free'd before it's address was used with
wakeup().  As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock.  This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue.  Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.

The main reason I can think of for deferring a wakeup() would be to
avoid waking up a waiter while holding a lock that the waiter would
need.  However, no locks are dropped in between the wakeup() in
workitem_free() and the end of the loop in complete_jseg() as far as I
can tell.

In general I think it is not safe to do a wakeup() after free() as one
cannot control how other parts of the kernel that might reuse the
address for a different wait channel will handle spurious wakeups.

Reported by:	pho
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12494
2017-09-26 23:24:15 +00:00
cem
8a3fa60af6 Add PNP metadata to more drivers
GPUs: radeonkms, i915kms
NICs: if_em, if_igb, if_bnxt

This metadata isn't used yet, but it will be handy to have later to
implement automatic module loading.

Reviewed by:	imp, mmacy
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12488
2017-09-26 23:23:58 +00:00
cem
de7a7877e1 aesni(4): Add support for x86 SHA intrinsics
Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256.
Provide this functionality on CPUs that support it.

This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC.

Correctness: The cryptotest.py suite in tests/sys/opencrypto has been
enhanced to verify SHA1 and SHA256 HMAC using standard NIST test vectors.
The test passes on this driver.  Additionally, jhb's cryptocheck tool has
been used to compare various random inputs against OpenSSL.  This test also
passes.

Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni:      SHA1: ~8300 Mb/s    SHA256: ~8000 Mb/s
cryptosoft:       ~1800 Mb/s    SHA256: ~1800 Mb/s

So ~4.4-4.6x speedup depending on algorithm choice.  This is consistent with
the results the Linux folks saw for 4kB buffers.

The driver borrows SHA update code from sys/crypto sha1 and sha256.  The
intrinsic step function comes from Intel under a 3-clause BSDL.[0]  The
intel_sha_extensions_sha<foo>_intrinsic.c files were renamed and lightly
modified (added const, resolved a warning or two; included the sha_sse
header to declare the functions).

[0]: https://software.intel.com/en-us/articles/intel-sha-extensions-implementations

Reviewed by:	jhb
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12452
2017-09-26 23:12:32 +00:00
glebius
a9aff4d0cd Fix regression from r323855. The EXIT trap now isn't cleared, so upon
exit it tried to unmount already unmounted partition, resulting in failure.
2017-09-26 21:54:19 +00:00
davidcs
0a9934b337 Fix delete all multicast addresses
Submitted by:Anand.Khoje@cavium.com
MFC after:5 days
2017-09-26 20:53:25 +00:00
manu
64478df5b2 a10_gpio: Enable all needed clocks
Do not enable only the first clock, enable them all.
2017-09-26 20:23:09 +00:00
manu
177263997c a10_ehci: Enable all clocks and reset
a10_ehci can have multiple clocks and reset, enable them all instead of
only the first one.
2017-09-26 19:21:43 +00:00
manu
3d1583cba6 aw_usbphy: Only reroute OTG for phy0
We only need to route OTG port to host mode on phy0 and if no VBUS
is present on the port, otherwise leave the port in periperal mode.
2017-09-26 19:20:50 +00:00
manu
b1773e6916 aw_usbphy: Fix write of unknown register
Some SoC require a write to a unknown register to work corectly.
This write should be in the pmu region not in the phy ctrl one.

Reported by:	Mark Millard (markmi@dsl-only.net)
2017-09-26 19:19:44 +00:00
cem
14d71cb7ed opencrypto: Use C99 initializers for auth_hash instances
A misordering in the Via padlock driver really strongly suggested that these
should use C99 named initializers.

No functional change.

Sponsored by:	Dell EMC Isilon
2017-09-26 17:52:52 +00:00
cem
0c65f5254f opencrypto: Loosen restriction on HMAC key sizes
Theoretically, HMACs do not actually have any limit on key sizes.
Transforms should compact input keys larger than the HMAC block size by
using the transform (hash) on the input key.

(Short input keys are padded out with zeros to the HMAC block size.)

Still, not all FreeBSD crypto drivers that provide HMAC functionality
handle longer-than-blocksize keys appropriately, so enforce a "maximum" key
length in the crypto API for auth_hashes that previously expressed a
requirement.  (The "maximum" is the size of a single HMAC block for the
given transform.)  Unconstrained auth_hashes are left as-is.

I believe the previous hardcoded sizes were committed in the original
import of opencrypto from OpenBSD and are due to specific protocol
details of IPSec.  Note that none of the previous sizes actually matched
the appropriate HMAC block size.

The previous hardcoded sizes made the SHA tests in cryptotest.py
useless for testing FreeBSD crypto drivers; none of the NIST-KAT example
inputs had keys sized to the previous expectations.

The following drivers were audited to check that they handled keys up to
the block size of the HMAC safely:

  Software HMAC:
    * padlock(4)
    * cesa
    * glxsb
    * safe(4)
    * ubsec(4)

  Hardware accelerated HMAC:
    * ccr(4)
    * hifn(4)
    * sec(4) (Only supports up to 64 byte keys despite claiming to
      support SHA2 HMACs, but validates input key sizes)
    * cryptocteon (MIPS)
    * nlmsec (MIPS)
    * rmisec (MIPS) (Amusingly, does not appear to use key material at
      all -- presumed broken)

Reviewed by:	jhb (previous version), rlibby (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12437
2017-09-26 16:18:10 +00:00
avg
327f4a4dcc fix r324011, MFV of r323535, 8585 improve batching done in zil_commit()
I managed to commit an older version of the change.
Plus, even the latest version was not ready for userland compilation.

Reported by:	"O. Hartmann" <ohartmann@walstatt.org>,
		cy
MFC after:	1 week
X-MFC with:	r324011
2017-09-26 15:38:16 +00:00
manu
e41366bcde mountd: Avoid memory leak by freeing dp_dirp
Introduced in r324007, the data alloced by strdup was never free'ed.
While here, remove cast to caddr_t when freeing dp.

Reported by:	bde
MFC after:	1 week
X MFC With:	r324007
2017-09-26 12:15:13 +00:00
bapt
680f0ce8ec calendar: replace strcpy/strcat with asprintf 2017-09-26 11:16:33 +00:00
manu
08db33b929 mountd: Remove unneeded cast
Reported by:	kib
MFC after:	1 week
X MFC With:	r324007
2017-09-26 11:11:17 +00:00
avg
16b20104c4 MFV r323535: 8585 improve batching done in zil_commit()
FreeBSD notes:
- this MFV reverts FreeBSD commit r314549 to make the merge easier
- at present our emulation of cv_timedwait_hires is rather poor,
  so I elected to use cv_timedwait_sbt directly
Please see the differential revision for details.
Unfortunately, I did not get any positive reviews, so there could be
bugs in the FreeBSD-specific piece of the merge.
Hence, the long MFC timeout.

illumos/illumos-gate@1271e4b10d
1271e4b10d

https://www.illumos.org/issues/8585
  The current implementation of zil_commit() can introduce significant
  latency, beyond what is inherent due to the latency of the underlying
  storage. The additional latency comes from two main problems:
  1. When there's outstanding ZIL blocks being written (i.e. there's
      already a "writer thread" in progress), then any new calls to
      zil_commit() will block waiting for the currently oustanding ZIL
      blocks to complete. The blocks written for each "writer thread" is
      coined a "batch", and there can only ever be a single "batch" being
      written at a time. When a batch is being written, any new ZIL
      transactions will have to wait for the next batch to be written,
      which won't occur until the current batch finishes.
  As a result, the underlying storage may not be used as efficiently
      as possible. While "new" threads enter zil_commit() and are blocked
      waiting for the next batch, it's possible that the underlying
      storage isn't fully utilized by the current batch of ZIL blocks. In
      that case, it'd be better to allow these new threads to generate
      (and issue) a new ZIL block, such that it could be serviced by the
      underlying storage concurrently with the other ZIL blocks that are
      being serviced.
  2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
      to complete, prior to zil_commit() returning. The size of any given
      batch is proportional to the number of ZIL transaction in the queue
      at the time that the batch starts processing the queue; which
      doesn't occur until the previous batch completes. Thus, if there's a
      lot of transactions in the queue, the batch could be composed of
      many ZIL blocks, and each call to zil_commit() will have to wait for
      all of these writes to complete (even if the thread calling
      zil_commit() only cared about one of the transactions in the batch).

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12355
2017-09-26 11:04:08 +00:00
manu
b5f772e784 mountd: Replace malloc+strcpy to strdup
Reviewed by:	bapt
MFC after:	1 week
Sponsored by:	Gandi.net
Differential Revision:	https://reviews.freebsd.org/D12503
2017-09-26 09:18:18 +00:00
bapt
1c52c05e15 Remove empty lines for consistency with other entries 2017-09-26 05:47:33 +00:00
bapt
e8e554882c Do not actually install uneeded alias for man 2017-09-26 05:46:10 +00:00
bapt
6f72153bca Remove unneeded locales and alias man directories
In base, locales (and encoding) specific directories are not used
by any tool. Just remove them.

While here also remove the cat page directory for openssl
2017-09-26 05:43:55 +00:00
bapt
09cceb6148 Do not print error when running make delete-old on system
without catpages directories
2017-09-26 05:33:15 +00:00