If we start with console set to comconsole, the local
console (vidconsole, efi) is never initialized and attempt to
use the data can render the loader hung.
Reported by: Kamigishi Rei
MFC after: 3 days
Otherwise, on a powerpc64 NUMA system with hashed page tables, the
first-level superpage reservation size is large enough that the value of
the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is
negative and gets sign-extended when passed to vmem_set_import(). This
results in a boot-time hang on such platforms.
Reported by: bdragon
MFC after: 3 days
Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when
creating the unix domain socket for a given bhyve vm.
The path to the unix domain socket for a bhyve vm will now be
/var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname
Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared
to bhyvectl(8).
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28783
Add the PD_KILL flag that instructs prison_deref() to take steps
to actively kill a prison and its descendents, namely marking it
PRISON_STATE_DYING, clearing its PR_PERSIST flag, and killing any
attached processes.
This replaces a similar loop in sys_jail_remove(), bringing the
operation under the same single hold on allprison_lock that it already
has. It is also used to clean up failed jail (re-)creations in
kern_jail_set(), which didn't generally take all the proper steps.
Differential Revision: https://reviews.freebsd.org/D28473
Our uefi support has included environment variable support for several years
now. Remove the bogus blanket statement saying we don't support them.
MFC After: 3 days
After 0ee0dbfb0d where I imported a more
recent libcxxrt snapshot, the variables 'rtn' and 'has_ret' could in
some cases be used while still uninitialized. Most obviously this would
lead to a jemalloc complaint about a bad free(), aborting the program.
Fix this by initializing a bunch variables in their declarations. This
change has also been sent upstream, with some additional changes to be
used in their testing framework.
PR: 253226
MFC after: 3 days
LARGE_NAT is a C macro that increases
NAT_SIZE from 127 to 2047,
RDR_SIZE from 127 to 2047,
HOSTMAP_SIZE from 2047 to 8191,
NAT_TABLE_MAX from 30000 to 180000, and
NAT_TABLE_SZ from 2047 to 16383.
These values can be altered at runtime using the ipf -T command however
some adminstrators of large firewalls rebuild the kernel to enable
LARGE_NAT at boot. This revision adds the tunable net.inet.ipf.large_nat
which allows an administrator to set this option at boot instead of build
time. Setting the LARGE_NAT macro to 1 is unaffected allowing build-time
users to continue using the old way.
As of ipfilter 5.1.2 the IPv4 and IPv6 rules tables have been merged.
The ipf(8) -6 option has been a NOP since then. Currently the additional
ipf -6 load statement in rc.d/ipfilter simply added the second ipfilter
rules file to the table already populated by the previous ipf command.
Plenty of time has passed since ipfilter 5.1.2 was imported. It is time to
remove the option from rc.conf and the rc script.
Differential Revision: https://reviews.freebsd.org/D28615
The makefs msdosfs code includes fs/msdosfs/denode.h which directly uses
struct buf from <sys/buf.h> rather than the makefs struct m_buf.
To work around this problem provide a local denode.h that includes
ffs/buf.h and defines buf as an alias for m_buf.
Reviewed By: kib, emaste
Differential Revision: https://reviews.freebsd.org/D28835
Instead of 2-4 socket reads per PDU this can do as low as one read
per megabyte, dramatically reducing TCP overhead and lock contention.
With this on iSCSI target I can write more than 4GB/s through a
single connection.
MFC after: 1 month
I did this without a full vendor update since that would cause too many
conflicts. Since these files now almost match the NetBSD sources the
next git subtree merge should work just fine.
Reviewed By: lwhsu
Differential Revision: https://reviews.freebsd.org/D28797
terminating a TCP connection.
If a TCP packet must be retransmitted and the data length has changed in the
retransmitted packet, due to the internal workings of TCP, typically when ACK
packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut()
will find the correct length, which is the last length received.
This can be explained as follows:
If a "227 Entering Passive Mode" packet must be retransmittet and the length
changes from 51 to 50 bytes, for example, then we have three cases for the
list scan in GetDeltaSeqOut(), depending on how many prior packets were
received modulus N_LINK_TCP_DATA=3:
case 1: index 0: original packet 51
index 1: retransmitted packet 50
index 2: not relevant
case 2: index 0: not relevant
index 1: original packet 51
index 2: retransmitted packet 50
case 3: index 0: retransmitted packet 50
index 1: not relevant
index 2: original packet 51
This patch simply changes the searching order for TCP packets, always starting
at the last received packet instead of any received packet, in
GetDeltaAckIn() and GetDeltaSeqOut().
Else no functional changes.
Discussed with: rscheff@
Submitted by: Andreas Longwitz <longwitz@incore.de>
PR: 230755
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.
Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
This is part of XSA-361.
Sponsored by: Citrix Systems R&D
A number of projects use "Fixes: <hash>" to identify a commit that is
fixed by a given change. Adopt that convention.
Differential Revision: https://reviews.freebsd.org/D28693
The caller should not be passing M_ZERO in the first place, so PG_ZERO
will not be preserved by the page allocator and clearing it accomplishes
nothing.
Reviewed by: gallatin, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28808
Notable upstream changes:
778869fa1 Fix reporting of mount progress
e7adccf7f Disable use of hardware crypto offload drivers on FreeBSD
03e02e5b5 Fix checksum errors not being counted on repeated repair
64e0fe14f Restore FreeBSD resource usage accounting
11f2e9a49 Fix panic if scrubbing after removing a slog device
MFC after: 2 weeks
- Make frontends call unified CTL core method ctl_datamove_done()
to report move completion. It allows to reduce code duplication
in differerent backends by accounting DMA time in common code.
- Add to ctl_datamove_done() and be_move_done() callback samethr
argument, reporting whether the callback is called in the same
context as ctl_datamove(). It allows for some cases like iSCSI
write with immediate data or camsim frontend write save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.
MFC after: 1 month
Rather that using references (pr_ref and pr_uref) to deduce the state
of a prison, keep track of its state explicitly. A prison is either
"invalid" (pr_ref == 0), "alive" (pr_uref > 0) or "dying"
(pr_uref == 0).
State transitions are generally tied to the reference counts, but with
some flexibility: a new prison is "invalid" even though it now starts
with a reference, and jail_remove(2) sets the state to "dying" before
the user reference count drops to zero (which was prviously
accomplished via the PR_REMOVE flag).
pr_state is protected by both the prison mutex and allprison_lock, so
it has the same availablity guarantees as the reference counts do.
Differential Revision: https://reviews.freebsd.org/D27876
... by moving v_hash into a 4 byte hole.
Combined with several previous size reductions this makes the size small
enough to fit 9 vnodes per page as opposed to 8.
Add a compilation time assert so that this is not unknowingly worsened.
Note the structure still remains bigger than it should be.
Notable changes:
- fix reporting of mount progress (778869fa1)
- disable use of hardware crypto offload drivers on FreeBSD (e7adccf7f)
- fix checksum errors not being counted on repeated repair (03e02e5b5)
- restore FreeBSD resource usage accounting (64e0fe14f)
- fix panic if scrubbing after removing a slog device (11f2e9a49)
Require both the prison mutex and allprison_lock when pr_ref or
pr_uref go to/from zero. Adding a non-first or removing a non-last
reference remain lock-free. This means that a shared hold on
allprison_lock is sufficient for prison_isalive() to be useful, which
removes a number of cases of lock/check/unlock on the prison mutex.
Expand the locking in kern_jail_set() to keep allprison_lock held
exclusive until the new prison is valid, thus making invalid prisons
invisible to any thread holding allprison_lock (except of course the
one creating or destroying the prison). This renders prison_isvalid()
nearly redundant, now used only in asserts.
Differential Revision: https://reviews.freebsd.org/D28419
Differential Revision: https://reviews.freebsd.org/D28458
The data is only needed by filesystems that
1. use buffer cache
2. utilize clustering write support.
Requested by: mjg
Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28679
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.
Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it override struct buf with its own
definition. Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D28679
Tested with glibc test suite.
The C variant in libkern performs excessive branching to find the zero
byte instead of using the bsfq instruction. The same code patched to use
it is still slower than the routine implemented here as the compiler
keeps neglecting to perform certain optimizations (like using leaq).
On top of that the routine can be used as a starting point for copyinstr
which operates on words intead of bytes.
The previous attempt had an instance of swapped operands to andq when
dealing with fully aligned case, which had a side effect of breaking the
code for certain corner cases. Noted by jrtc27.
Sample results:
$(perl -e "print 'A' x 3"):
stock: 211198039
patched:338626619
asm: 465609618
$(perl -e "print 'A' x 100"):
stock: 83151997
patched: 98285919
asm: 120719888
Reviewed by: jhb, kib
Differential Revision: https://reviews.freebsd.org/D28779
Change the flow of prison_deref() so it doesn't let go of allprison_lock
until it's completely done using it (except for a possible drop as part
of an upgrade on its first try).
Differential Revision: https://reviews.freebsd.org/D28458
MFC after: 3 days
rtsock message validation changes committed in 2fe5a79425
did not take llinfo messages into account.
Add a special validation case for RTA_GATEWAY llinfo messages.
MFC after: 2 days
Add a BUGS section about using pwrite(2) when O_APPEND is set on the fd.
MFC after: 3 days
Submitted by: Ka Ho Ng <khng300@gmail.com>
Reviewed by: gbe, yuripv
Differential Revision: https://reviews.freebsd.org/D28372
After commit 8ba333e02e ("libdtrace: Stop relying on lex
compatibility"), there have been several reports of incremental
buildworlds failing since make does not know that dt_lex.c needs to be
regenerated, and I want to avoid this when merging to stable/13.
MFC with: 8ba333e02e
conical hat reduction: Make sure we also remove gotboot.efifat. It was created,
briefly, and shouldn't have existed in the first place. Kill it at the same
place we kill boot1.efifat.
Pointy Hat to: imp@
T5 and above have extra bits for the optional filter fields. This is a
correctness issue and not just a waste because a filter mode valid on a
T4 (36b) may not be valid on a T5+ (40b).
MFC after: 2 weeks
Sponsored by: Chelsio Communications