pmap_remove_all(). If the object to which a page belongs has no
references, then that page cannot possibly be mapped.
Reviewed by: kib
MFC after: 1 week
Split the handlers for pop of invalid selectors from the trap frame
into usermode and kernel variants. Usermode handler is kept as is, it
restores the already loaded parts of the trap frame and jumps to set
up a signal delivery to the user process.
New kernel part of the handler emulates IRET treatment of the segments
which would violate access right. It loads NUL selector in the
segment register which load causes the fault, and then continues the
return to interrupted kernel code. Since invalid selectors in the
segment registers in the kernel mode can only exist while kernel still
enters or exits from userspace, we only zero invalid userspace
selectors. If userspace tries to use the segment register, it gets a
signal, as if the processor segment descriptor cache was reloaded.
Reported by: Maxime Villard <max@m00nbsd.net>
Suggested and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Do not return from interrupt using the POP_FRAME;iret instruction
sequence, always jump to doreti.
The user segments selectors saved on the stack might become invalid
because userspace manipulated LDT in a parallel thread. trap() is
aware of such issue, but it is only prepared to handle it at iret and
segment registers load operations in doreti path.
Also remove POP_FRAME macro because it is no longer used.
Reviewed by: bde, jhb (as part of r323722)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
some still useful bits of the reverted revision.
The problem with the committed fix is that there are still issues with
returning from NMI, when NMI interrupted kernel in a moment where the
kernel segments selectors were still not loaded into registers. If
this happens, the NMI return would loose the userspace selectors
because r323722 does not reload segment registers on return to kernel
mode.
Fixing the problem is complicated. Since an alternative approach to
handle the original bug exists, it makes sence to stop adding more
complexity.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Use xpt_done_direct in preference to xpt_done when completing a
successful I/O. Continue to use xpt_done when there's an error, or for
completion of the submission of a CCB. This eliminates a context
switch to the cam_doneq thread.
Sponsored by: Netflix
Suggested by: scottl@
When a "pnfs" NFSv4.1 mount was unmounted, it didn't free up the layouts
and deviceinfo structures. This leak only affects "pnfs" mounts and only
when the mount is umounted.
Found while testing the pNFS Flexible File layout client code.
MFC after: 2 weeks
This fixes kernel crashes due to misaligned accesses to the 64-bit
time_t embedded in struct namecache_ts in MIPS n32 kernels.
MFC after: 1 week
Sponsored by: DARPA / AFRL
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type. This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.
Discussed on: arch@
MFC after: 1 week
Sponsored by: DARPA / AFRL
parser.
This allows us to use the EROM parser API in cases where the standard bus
space I/O APIs are unsuitable. In particular, this will allow us to parse
the device enumeration table directly from bhndb(4) drivers, prior to
full attach and configuration of the bridge.
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12510
Add bhnd(4) API for explicitly registering BHND platform devices (ChipCommon,
PMU, NVRAM, etc) with the bus, rather than walking the newbus hierarchy to
discover platform devices. These devices are now also refcounted; attempting
to deregister an actively used platform device will return EBUSY.
This resolves a lock ordering incompatibility with bwn(4)'s firmware loading
threads; previously it was necessary to acquire Giant to protect newbus access
when locating and querying the NVRAM device.
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12392
If the filesystem is not exported directly return NULL.
If no address is given and filesystem is exported using some default
one return it directly, if it doesn't have a default one directly
return NULL.
Reviewed by: kib, bapt
MFC after: 1 week
Sponsored by: Gandi.net
Differential Revision: https://reviews.freebsd.org/D12505
Add comment to explain the IPV6_EX suffix. The confusion about
these RSS hash type probably stems from the facts that they were
never widely implemented by hardwares.
Reviewed by: rwatson
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12453
IPV6_EXs in RSS never mean fragment. They mean:
"- Home address from the home address option in the IPv6 destination
options header. If the extension header is not present, use the
Source IPv6 Address.
- IPv6 address that is contained in the Routing-Header-Type-2 from
the associated extension header. If the extension header is not
present, use the Destination IPv6 Address."
UDP_IPV4_EX is an invalid RSS hash type, which will be removed.
Quoted from:
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ndishashipv6ex
Reviewed by: erj
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12450
_NO_ OSes actually "negotiate" MSS.
RFC 879:
"... This Maximum Segment Size (MSS) announcement (often mistakenly
called a negotiation) ..."
This negotiation behaviour was introduced 11 years ago by r159955
without any explaination about why FreeBSD had to "negotiate" MSS:
In syncache_respond() do not reply with a MSS that is larger than what
the peer announced to us but make it at least tcp_minmss in size.
Sponsored by: TCP/IP Optimization Fundraise 2005
The tcp_minmss behaviour is still kept.
Syncookie fix was prodded by tuexen, who also helped to test this
patch w/ packetdrill.
Reviewed by: tuexen, karels, bz (previous version)
MFC after: 2 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12430
UDP checksum offload does not work in Azure if following conditions are
met:
- sizeof(IP hdr + UDP hdr + payload) > 1420.
- IP_DF is not set in IP hdr
Use software checksum for UDP datagrams falling into this category.
Add two tunables to disable UDP/IPv4 and UDP/IPv6 checksum offload, in
case something unexpected happened.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12429
Said checks were inherently racy anyway as jokers could unmap target areas
before the handler got around to accessing them.
This saves time by avoiding locking the address space.
MFC after: 1 week
It was supposed to provide a recovery mechanism against bugs in procfs's
long deprecated tracing capabilities.
Remove the tool as a prerequisite to axing the kernel side.
The tracing facility to use is ptrace(2).
MFC after: 2 weeks
tid must be equal to curthread and the target routine was already reading
it anyway, which is not a problem. Not passing it as a parameter allows for
a little bit shorter code in callers.
MFC after: 1 week
This patch adds "vers" and "minorvers" arguments to nfscl_reqstart().
The patch always passes them in as "0" and that implies no change
in semantics. These arguments will be used by a future commit that
adds support for the Flexible File Layout.
Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration. However, this
resulted in the item being free'd before it's address was used with
wakeup(). As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock. This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue. Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.
The main reason I can think of for deferring a wakeup() would be to
avoid waking up a waiter while holding a lock that the waiter would
need. However, no locks are dropped in between the wakeup() in
workitem_free() and the end of the loop in complete_jseg() as far as I
can tell.
In general I think it is not safe to do a wakeup() after free() as one
cannot control how other parts of the kernel that might reuse the
address for a different wait channel will handle spurious wakeups.
Reported by: pho
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12494
GPUs: radeonkms, i915kms
NICs: if_em, if_igb, if_bnxt
This metadata isn't used yet, but it will be handy to have later to
implement automatic module loading.
Reviewed by: imp, mmacy
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12488
Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256.
Provide this functionality on CPUs that support it.
This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC.
Correctness: The cryptotest.py suite in tests/sys/opencrypto has been
enhanced to verify SHA1 and SHA256 HMAC using standard NIST test vectors.
The test passes on this driver. Additionally, jhb's cryptocheck tool has
been used to compare various random inputs against OpenSSL. This test also
passes.
Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni: SHA1: ~8300 Mb/s SHA256: ~8000 Mb/s
cryptosoft: ~1800 Mb/s SHA256: ~1800 Mb/s
So ~4.4-4.6x speedup depending on algorithm choice. This is consistent with
the results the Linux folks saw for 4kB buffers.
The driver borrows SHA update code from sys/crypto sha1 and sha256. The
intrinsic step function comes from Intel under a 3-clause BSDL.[0] The
intel_sha_extensions_sha<foo>_intrinsic.c files were renamed and lightly
modified (added const, resolved a warning or two; included the sha_sse
header to declare the functions).
[0]: https://software.intel.com/en-us/articles/intel-sha-extensions-implementations
Reviewed by: jhb
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12452
Some SoC require a write to a unknown register to work corectly.
This write should be in the pmu region not in the phy ctrl one.
Reported by: Mark Millard (markmi@dsl-only.net)
A misordering in the Via padlock driver really strongly suggested that these
should use C99 named initializers.
No functional change.
Sponsored by: Dell EMC Isilon
Theoretically, HMACs do not actually have any limit on key sizes.
Transforms should compact input keys larger than the HMAC block size by
using the transform (hash) on the input key.
(Short input keys are padded out with zeros to the HMAC block size.)
Still, not all FreeBSD crypto drivers that provide HMAC functionality
handle longer-than-blocksize keys appropriately, so enforce a "maximum" key
length in the crypto API for auth_hashes that previously expressed a
requirement. (The "maximum" is the size of a single HMAC block for the
given transform.) Unconstrained auth_hashes are left as-is.
I believe the previous hardcoded sizes were committed in the original
import of opencrypto from OpenBSD and are due to specific protocol
details of IPSec. Note that none of the previous sizes actually matched
the appropriate HMAC block size.
The previous hardcoded sizes made the SHA tests in cryptotest.py
useless for testing FreeBSD crypto drivers; none of the NIST-KAT example
inputs had keys sized to the previous expectations.
The following drivers were audited to check that they handled keys up to
the block size of the HMAC safely:
Software HMAC:
* padlock(4)
* cesa
* glxsb
* safe(4)
* ubsec(4)
Hardware accelerated HMAC:
* ccr(4)
* hifn(4)
* sec(4) (Only supports up to 64 byte keys despite claiming to
support SHA2 HMACs, but validates input key sizes)
* cryptocteon (MIPS)
* nlmsec (MIPS)
* rmisec (MIPS) (Amusingly, does not appear to use key material at
all -- presumed broken)
Reviewed by: jhb (previous version), rlibby (previous version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12437
I managed to commit an older version of the change.
Plus, even the latest version was not ready for userland compilation.
Reported by: "O. Hartmann" <ohartmann@walstatt.org>,
cy
MFC after: 1 week
X-MFC with: r324011
Introduced in r324007, the data alloced by strdup was never free'ed.
While here, remove cast to caddr_t when freeing dp.
Reported by: bde
MFC after: 1 week
X MFC With: r324007