nvd: set d_delmaxsize to full capacity of NVMe namespace
The NVMe specification has no ability to specify a maximum delete size
that is less than the full capacity of the namespace - so just using the
namespace size is the correct value here.
This fixes reported issues where ZFS trim on init looked like it was
hanging the system - previously the default I/O max size (128KB on
Intel NVMe controllers) was used for delete operations which worked out
to only about 8MB/s. With this patch I can add an 800GB DC P3700
drive to a ZFS pool in about 15-20 seconds.
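As a rough illustration of the difference, the sketch below counts how many delete requests are needed to cover a namespace for a given maximum delete size. The function name is hypothetical and is not taken from the nvd driver; only the 128KB and 800GB figures come from the text above. At about 8MB/s, trimming 800GB would take on the order of 100,000 seconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of nvd): number of delete requests
 * needed to cover `capacity` bytes when a single request may cover at
 * most `delmaxsize` bytes.
 */
uint64_t
delete_requests(uint64_t capacity, uint64_t delmaxsize)
{
	assert(delmaxsize > 0);
	/* Round up so a partial trailing chunk still counts as a request. */
	return ((capacity + delmaxsize - 1) / delmaxsize);
}
```

With a 128KB limit an 800GB namespace needs roughly six million requests; with the limit set to the namespace size it needs one.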
Sponsored by: Intel
Alex Burlyga reported a POLA violation for the new NFS client as
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more than one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.
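The decision described above can be sketched as follows. This is a minimal user-space model, not the client code: the variable standing in for the sysctl and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sysctl; off by default (old behaviour). */
bool ignore_eexist = false;

/* Map the status of a symlink-create reply to what the caller sees. */
int
map_create_error(int rpc_error)
{
	assert(rpc_error >= 0);
	if (rpc_error == EEXIST && ignore_eexist)
		return (0);	/* new behaviour: assume a retried RPC */
	return (rpc_error);	/* old behaviour: report EEXIST as-is */
}
```

With the knob off, only the client whose create actually succeeded sees success; every other concurrent creator gets EEXIST.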
MFC (281874). It broke suspend and resume on several Thinkpads (though not
all) in 10 even though it works fine on the same laptops in HEAD.
PR: 201239
Reported by: Kevin Oberman and several others
- Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
because a link where looped back NS messages are permanently observed
does not work with either NDP or ARP for IPv4.
- draft-ietf-6man-enhanced-dad is now RFC 7527.
Approved by: re (gjb)
Fix group membership of cloned interfaces when one is moved by
if_vmove().
In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface. When
detaching, if_delgroup() was called and the interface left all of
its groups. Then, upon attachment, if_addgroup(ifp,
IFG_ALL) was called and the interface rejoined only the "all" group.
This was a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation. However, if_vmove() removed that membership and did
not restore it upon attachment.
Approved by: re (gjb)
sfxge: added fallbacks for pre 4.2.1 firmware support
Driver must be able to start against older firmware that is missing
recently added MCDI calls, otherwise firmware upgrade will not be
possible.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Approved by: re (gjb)
Avoid a situation where we do not set persist timer after a zero window
condition.
If you send a 0-length packet, but there is data in the socket buffer, and
neither the rexmt nor the persist timer is already set, then activate the persist
timer.
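The condition above can be written as a single predicate. The structure and field names below are simplified assumptions for illustration and do not mirror the real TCP control block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical view of the relevant connection state. */
struct conn_state {
	size_t	sent_len;	/* bytes in the segment just sent */
	size_t	sockbuf_bytes;	/* bytes still queued in the send buffer */
	bool	rexmt_active;	/* retransmit timer already running */
	bool	persist_active;	/* persist timer already running */
};

/* True when the persist timer should be armed after this send. */
bool
should_start_persist(const struct conn_state *cs)
{
	assert(cs != NULL);
	return (cs->sent_len == 0 && cs->sockbuf_bytes > 0 &&
	    !cs->rexmt_active && !cs->persist_active);
}
```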
PR: 192599
Approved by: re (delphij)
Expose full 32-bit RSS hash from card regardless of whether RSS is defined or
not. When doing multiqueue, we are all set up to have full 32-bit RSS hash from
the card. We do not need to hide that under "ifdef RSS" and should expose that
by default so others like lagg(4) can use that and avoid hashing the traffic by
themselves.
Approved by: re (gjb)
Sponsored by: Limelight Networks
Check TCP timestamp option flag so that the automatic receive buffer
scaling code does not use an uninitialized timestamp echo reply value
from the stack when timestamps are not enabled.
Approved by: re (gjb)
Ensure that lockstat_nsecs() has no effect when lockstat probes are not
enabled or when the profiled lock carries the LO_NOPROFILE flag.
PR: 201642, 201517
Approved by: re (gjb)
Tested by: Jason Unovitch
New partition flag for gpart that writes the 0xee partition in the second slot of the pmbr, rather than the first.
Works around a Lenovo legacy GPT boot issue.
PR: 184910
Approved by: re (gjb), marcel
Relnotes: yes
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3140
New function smbios_match to detect BIOS versions during boot
MFC: r277957:
Fix order of functions in smbios.c (corrects r277949)
MFC: r281138:
SMBIOS support for EFI
r281138 makes changes to the new unified EFI loader (r280950), which has not been merged to stable/10 (and likely won't be).
These changes were manually applied to the amd64 EFI loader (sys/boot/amd64/efi).
The changes to sys/boot/amd64/efi are a direct commit.
Reviewed by: stas
Approved by: re (gjb), marcel
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3129
Prevent inlining txg_quiesce
This allows dtrace to monitor the calls to txg_quiesce which can be
really helpful.
Also standardize __noinline order for arc_kmem_reap_now.
Sponsored by: Multiplay
Approved by: re
introduced by r280182. FreeBSD-head doesn't need TUNABLE_INT() now with
SYSCTL_INT() but stable/10 still does.
Note: This is a direct commit to stable/10.
PR: 201644
Reviewed by: erj
Approved by: re (gjb)
Sponsored by: Limelight Networks
Make the creation of the free lists dynamic, i.e., it is based on the
available physical memory at boot time. For amd64 systems with 64 GB
or more of physical memory, create free lists for managing pages with
physical addresses below 4 GB.
PR: 185727
Requested by: alc
Approved by: re (gjb)
Fill the port and protocol information in the SADB_ACQUIRE message
when the security policy has it, as required by RFC 2367.
PR: 192774
Approved by: re (delphij)
Use the monotonic (uptime) counter rather than time-of-day to measure
elapsed time between ntp_adjtime() clock offset adjustments. This
eliminates spurious frequency steering after a large clock step (such
as a 1970->2015 step on a system with no battery-backed clock hardware).
This problem was discovered after the import of ntpd 4.2.8, which does
things in a slightly different (but still correct) order than the 4.2.4
we had previously. In particular, 4.2.4 would step the clock then
immediately afterwards use ntp_adjtime() to set the frequency and offset to
zero, which captured the post-step time-of-day as a side effect. In
4.2.8, ntpd sets frequency and offset to zero before any initial clock
step, capturing the time as 1970-ish, then when it next calls
ntp_adjtime() it's with a non-zero offset measurement. This non-zero
value gets multiplied by the apparent 45-year interval, which blows up
into a completely bogus frequency steer. That gets clamped to 500ppm,
but that's still enough to make the clock drift so fast that ntpd has
to keep stepping it every few minutes to compensate.
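To make the difference concrete, the sketch below measures the interval between two adjustments both ways. The structure and function names are hypothetical and the kernel's actual frequency computation is not reproduced here; the point is only that a clock step inflates a time-of-day interval while leaving the uptime interval untouched.

```c
#include <assert.h>
#include <stdint.h>

/* One reading of both clocks, in whole seconds (illustrative only). */
struct clock_sample {
	int64_t	wall_sec;	/* time-of-day; jumps when the clock is stepped */
	int64_t	uptime_sec;	/* monotonic seconds since boot */
};

/* Interval as seen by time-of-day: includes any clock step in between. */
int64_t
elapsed_by_wall(struct clock_sample prev, struct clock_sample now)
{
	return (now.wall_sec - prev.wall_sec);
}

/* Interval as seen by the uptime counter: unaffected by clock steps. */
int64_t
elapsed_by_uptime(struct clock_sample prev, struct clock_sample now)
{
	assert(now.uptime_sec >= prev.uptime_sec);
	return (now.uptime_sec - prev.uptime_sec);
}
```

If the first sample is taken 30 seconds after boot with the clock still at 1970 and the second a minute later after a step to 2015, the time-of-day interval is about 45 years while the uptime interval is 60 seconds.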
Approved by: re (gjb)
Fix if_loop so bpfwrite() can use it regardless of the state of
bd_hdrcmplt. As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt. Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.
Approved by: re
Set the initial system time to a sane (as in: not end of 21st century)
value when booting on a PC with CMOS clock set to a year before 2000.
This uses 1980 (instead of 1970 as in the initial patch) as pivot year as
suggested by imp in the PR followup.
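A minimal sketch of the pivot rule, assuming the CMOS clock supplies a two-digit year; the function name is hypothetical and this is not the driver code.

```c
#include <assert.h>

/*
 * Expand a two-digit CMOS year using 1980 as the pivot:
 * 80..99 map to 1980..1999, 00..79 map to 2000..2079.
 */
int
cmos_full_year(int yy)
{
	assert(yy >= 0 && yy <= 99);
	return (yy + (yy < 80 ? 2000 : 1900));
}
```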
PR: 195703
Submitted by: cs@soi.spb.ru
Reviewed by: imp
Approved by: re (gjb)
Fix broken implementation of "kvasprintf()" function by adding missing
kmalloc() call. Make function global instead of static inline to fix
compiler warnings about passing variable argument lists to inline
functions.
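For illustration, a user-space analogue of the measure, allocate, then format pattern is sketched below. Here malloc() stands in for kmalloc(), and the my_ names are hypothetical; this is not the compatibility-layer source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated buffer; the caller frees the result. */
char *
my_vasprintf(const char *fmt, va_list ap)
{
	va_list aq;
	char *buf;
	int len;

	assert(fmt != NULL);
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);	/* first pass: measure only */
	va_end(aq);
	if (len < 0)
		return (NULL);
	buf = malloc((size_t)len + 1);		/* the allocation step */
	if (buf == NULL)
		return (NULL);
	vsnprintf(buf, (size_t)len + 1, fmt, ap);	/* second pass: format */
	return (buf);
}

/* Variadic front end; a regular (non-inline) function. */
char *
my_asprintf(const char *fmt, ...)
{
	va_list ap;
	char *p;

	va_start(ap, fmt);
	p = my_vasprintf(fmt, ap);
	va_end(ap);
	return (p);
}
```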
Sponsored by: Mellanox Technologies
Approved by: re, gjb
- Add the GEOM_PART_GPT option and enable MSDOSFS in the GUMSTIX
kernel. [1]
- Add GEOM_LABEL to the PANDABOARD kernel, which should have been
included in r285132. I confused the kernel configurations
used for the WANDBOARD and PANDABOARD; the former uses the
IMX6 kernel configuration, along with the CUBOX-HUMMINGBOARD.
This is a direct commit to stable/10, as was r285132.
[1] I do not actually have the GUMSTIX board, but I suspect it will
fail to boot in the same way as the others have been.
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
Install loader.rc with ARM u-boot loader (ubldr).
loader.rc is responsible for reading and processing loader.conf variables.
This fixes the issue of loader.conf being silently ignored.
Approved by: re (gjb)
Populate the GELI passphrase cache with the kern.geom.eli.passphrase
variable (if any) provided in the boot environment. Unset it from
the kernel environment after doing this, so that the passphrase is
no longer present in kernel memory once we enter userland.
This will make it possible to provide a GELI passphrase via the boot
loader.
Note: head and stable/10 differ as a result of r273174, which renamed
the getenv(), setenv(), and unsetenv() functions to kern_getenv(),
kern_setenv(), and kern_unsetenv(); that rename was reverted in the relevant
parts of this change in 10-STABLE.
PR: 200448
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
Populate the GELI passphrase cache with the kern.geom.eli.passphrase
variable (if any) provided in the boot environment. Unset it from
the kernel environment after doing this, so that the passphrase is
no longer present in kernel memory once we enter userland.
This will make it possible to provide a GELI passphrase via the boot
loader.
PR: 200448
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
illumos/illumos-gate@46e1baa6cf
https://www.illumos.org/issues/5911
Sometimes ZFS appears to hang while deleting a file. It is actually
making slow progress at the file deletion, but other operations
(administrative and writes via the data path) "hang" until the file
removal completes, which can take a long time if the file has many
blocks. The deletion (or most of it) happens in a single txg, and the
sync thread spends most of its time reading indirect blocks...
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
PR: 199775
Approved by: re (kib)
Don't enable RX and TX before their initial configuration is done, i.e.,
after setting up interrupt moderation but before turning interrupts on.
This matches what Realtek's r8168 Linux driver does as of version 8.039.00
and fixes problems with certain incarnations of certain MAC revisions
like the interface requiring an extra up/down-cycle after boot to start
working or DMA configuration not being adhered to.
PR: 193743, 197535
Approved by: re (kib)
kernel configuration files, resolving an issue where the UFS and
MSDOSFS partitions would not mount as set in fstab(5).
This is a direct commit to stable/10, as the GEOM_LABEL option
is handled differently in head for arm/armv6. The WANDBOARD
and PANDABOARD already have this kernel option entry via the IMX6
kernel configuration file, so do not need to be changed.
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
If the pathname is absolute or dirfd is AT_FDCWD we can
handle it exactly like open(2).
Otherwise we output an A record to indicate that the path of
an open directory needs to be used (earlier in the trace).
Also filemon_pid_check needs to hold proctree_lock
and use proc_realparent()
Differential Revision: D2810
Reviewed by: jhb
Approved by: re
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host). if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).
There are also two issues with the if_bridge error handling.
If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promisc() calls can actually fail, so this is safe.
A second issue is a double free of bif. It's already freed by
bridge_delete_member().
PR: 200210