pf: Always initialise pf_fragment.fr_flags
When we allocate the struct pf_fragment in pf_fillup_fragment() we
forgot to initialise the fr_flags field. As a result we sometimes
mistakenly thought the fragment to not be a buffered fragment.
This resulted in panics because we'd end up freeing the pf_fragment
but not removing it from V_pf_fragqueue (believing it to be part of
V_pf_cachequeue). The next time we iterated V_pf_fragqueue we'd use
a freed object and panic.
While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.
X-MFS-To: releng/10.2
Sponsored by: The FreeBSD Foundation
- Since r253161, uart_intr() abuses FILTER_SCHEDULE_THREAD for signaling
uart_bus_attach() during its test that 20 iterations weren't sufficient
for clearing all pending interrupts, assuming this means that hardware
is broken and doesn't deassert interrupts. However, under pressure, 20
iterations also can be insufficient for clearing all pending interrupts,
leading to a panic as intr_event_handle() tries to schedule an interrupt
handler not registered. Solve this by introducing a flag that is set in
test mode and otherwise restores pre-r253161 behavior of uart_intr(). The
approach of additionally registering uart_intr() as handler as suggested
in PR 194979 is not taken as that in turn would abuse special pccard and
pccbb handling code of intr_event_handle(). [1]
- Const'ify uart_driver_name.
- Fix some minor style bugs.
PR: 194979 [1]
Reviewed by: marcel (earlier version)
o Revert the other functional half of r239864, i. e. the merge of r134227
from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
but also for the MD cache invalidation, TLB demapping and remote register
reading IPIs due to the following reasons:
- The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on
sparc64. That's because on sparc64, spin locks don't disable interrupts
completely but only raise the processor interrupt level to PIL_TICK. This
means that IPIs still get delivered and direct dispatch IPIs such as the
cache invalidation etc. IPIs in question are still executed.
- In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
Consequently, smp_ipi_mtx may be locked for an extended amount of time as
queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
scheduled via a soft interrupt. Moreover, given that this soft interrupt
is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in
turn leading to the target in question trying to send an IPI by itself
while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
deadlock.
o As mentioned in the commit message of r245850, on least some sun4u platforms
concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
are started, when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
trying to be delivered to APs that in fact aren't up and running, yet.
While at it, move setting of the cpu_ipi_{selected,single}() pointers to
the appropriate delivery functions from mp_init() to cpu_mp_start() where
it's better suited and allows to get rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
the delays before completely disabling interrupts again in the CPU-specific
cross-trap delivery functions, previously giving other CPUs a window for
sending IPIs on their part. Actually, we now should be able to entirely get
rid of completely disabling interrupts in these functions. Such a change
needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
necessary for correctness, this avoids page faults when accessing the stack
of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
possible, avoiding jitter.
PR: 201245
Merge from NetBSD:
o rev. 1.10: Nuke trailing whitespace.
o rev. 1.15: Fix typo in comment.
o rev. 1.16: Add the following registers from IEEE 802.3-2009 Clause 22:
- PSE control register (0x0b)
- PSE status register (0x0c)
- MMD access control register (0x0d)
- MMD access address data register (0x0e)
o rev. 1.17 (comments only): The bit location of link ability is different
between 1000Base-X and others (see Annex 28B.2 and 28D).
o rev. 1.18: Nuke dupe word.
Obtained from: NetBSD
Sponsored by: genua mbh
If a signal is caught in pipelock, causing it to fail, pipe_direct_write
should not try to pipeunlock.
Approved by: markj (mentor)
Sponsored by: EMC / Isilon Storage Division
The UEFI loader on the 10.1 release install disk (disc1) modifies an
existing EFI_DEVICE_PATH_PROTOCOL instance in an apparent attempt to
truncate the device path. In doing so it creates an invalid device
path.
Perform the equivalent action without modification of structures
allocated by firmware.
PR: 197641
Submitted by: Chris Ruffin <chris.ruffin at intel.com>
During module unload unlock rules before destroying UMA zones, which
may sleep in uma_drain(). It is safe to unlock here, since we are already
dehooked from pfil(9) and all pf threads had quit.
Fix swapped copyin(9) arguments in cxgb's iwch_arm_cq() function.
Detected by clang 3.7.0 with the warning:
sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c:309:18: error: variable
'rptr' is uninitialized when used here [-Werror,-Wuninitialized]
chp->cq.rptr = rptr;
^~~~
- Provide a sleepable lock to protect against ioctl() vs ioctl() races.
- Use the new lock to protect against simultaneous DIOCSTART and/or
DIOCSTOP ioctls.
Hyper-V on Windows Server 2012 and earlier hosts.
Submitted by: whu
Reviewed by: royger
Approved by: royger
Relnotes: No
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D3217
nvd: set d_delmaxsize to full capacity of NVMe namespace
The NVMe specification has no ability to specify a maximum delete size
that is less than the full capacity of the namespace - so just using the
namespace size is the correct value here.
This fixes reported issues where ZFS trim on init looked like it was
hanging the system - previously the default I/O max size (128KB on
Intel NVMe controllers) was used for delete operations which worked out
to only about 8MB/s. With this patch I can add an 800GB DC P3700
drive to a ZFS pool in about 15-20 seconds.
Sponsored by: Intel
Alex Burlyga reported a POLA violation for the new NFS client as
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more that one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.
MFC (281874). It broke suspend and resume on several Thinkpads (though not
all) in 10 even though it works fine on the same laptops in HEAD.
PR: 201239
Reported by: Kevin Oberman and several others
- Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
because a link where looped back NS messages are permanently observed
does not work with either NDP or ARP for IPv4.
- draft-ietf-6man-enhanced-dad is now RFC 7527.
Approved by: re (gjb)
Fix group membership of cloned interfaces when one is moved by
if_vmove().
In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface. When
detaching, if_delgroup() was called and the interface leaves all of
the group membership. And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.
This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation. However, if_vmove() removed the membership and did
not restore upon attachment.
Approved by: re (gjb)
sfxge: added fallbacks for pre 4.2.1 firmware support
Driver must be able to start against older firmware that is missing
recently added MCDI calls, otherwise firmware upgrade will not be
possible.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Approved by: re (gjb)
Avoid a situation where we do not set persist timer after a zero window
condition.
If you send a 0-length packet, but there is data is the socket buffer, and
neither the rexmt or persist timer is already set, then activate the persist
timer.
PR: 192599
Approved by: re (delphij)
Expose full 32bit RSS hash from card regardless of whether RSS is defined or
not. When doing multiqueue, we are all setup to have full 32bit RSS hash from
the card. We do not need to hide that under "ifdef RSS" and should expose that
by default so others like lagg(4) can use that and avoid hashing the traffic by
themselves.
Approved by: re (gjb)
Sponsored by: Limelight Networks
Check TCP timestamp option flag so that the automatic receive buffer
scaling code does not use an uninitialized timestamp echo reply value
from the stack when timestamps are not enabled.
Approved by: re (gjb)
Ensure that locstat_nsecs() has no effect when lockstat probes are not
enabled or when the profiled lock carries the LO_NOPROFILE flag.
PR: 201642, 201517
Approved by: re (gjb)
Tested by: Jason Unovitch
New partition flag for gpart, writes the 0xee partition in the pmbr in the second slot, rather than the first.
Works around Lenovo legacy GPT boot issue
PR: 184910
Approved by: re (gjb), marcel
Relnotes: yes
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3140
New function smbios_match to detect BIOS versions during boot
MFC: r277957:
Fix order of functions in smbios.c (corrects r277949)
MFC: r281138:
SMBIOS support for EFI
r281138 makes changes to the new unified EFI loader (r280950), which has not been merged to stable/10 (and likely won't be).
These changes were manually applied to the amd64 EFI loader (sys/boot/amd64/efi).
The changes to sys/boot/amd64/efi are a direct commit.
Reviewed by: stas
Approved by: re (gjb), marcel
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3129
Prevent inlining txg_quiesce
This allows dtrace to monitor the calls to txg_quiesce which can be
really helpful.
Also standardize __noinline order for arc_kmem_reap_now.
Sponsored by: Multiplay
Approved by: re
introduced by r280182. FreeBSD-head doesn't need TUNABLE_INT() now with
SYSCTL_INT() but stable/10 still does.
Note: This is a direct commit to stable/10.
PR: 201644
Reviewed by: erj
Approved by: re (gjb)
Sponsored by: Limelight Networks
Make the creation of the free lists dynamic, i.e., it is based on the
available physical memory at boot time. For amd64 systems with 64 GB
or more of physical memory, create free lists for managing pages with
physical addresses below 4 GB.
PR: 185727
Requested by: alc
Approved by: re (gjb)
Fill the port and protocol information in the SADB_ACQUIRE message
in case when security policy has it as required by RFC 2367.
PR: 192774
Approved by: re (delphij)