is still operational before doing any work; otherwise we might
run into, e.g., destroyed locks.
PR: 210724
Reported by: olevole olevole.ru
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Obtained from: projects/vnet
Approved by: re (gjb)
Split initializzation an teardown into module (global state) and VNET
(per virtual network stack) parts. Virtualise global state, which is
not "const".
Cleanup eventhandlers, so that we can make use of the passed in argument
to get the vnet state from the ifp; disable the "cloner" event as it is
too early, has no state, and can fire before initialisation (see comment
in the source).
Handle the dynamic sysctls specially. The problem is that "ipmain"
is the virtualized struct, but the fields used for the sysctls are
hanging off memory allocated and attached to the virtualized "ipmain"
thus standard VNET macros and sysctl handling do not work.
We still say it is VNET sysctls to get the proper protection checks
in the VIMAGE case; to solve the problem of accessing the right bit
of memory hanging of each per-VNET ipmain, we use a dedicated handler
function wrapping around sysctl_ipf_int() undoing the base calculation
from kern_sysctl.c and then adding the passed-in offset into the right
struct depending on handler. A bit of a mess exposing VNET-internals
this way but the only way to keep the code without having to massively
restructure ipf internals.
Approved by: re (hrs)
Sponsored by: The FreeBSD Foundation
Obtained from: projects/vnet
MFC after: 2 weeks
Reviewed by: cy
Differential Revision: https://reviews.freebsd.org/D7000
Those changes were found confusing FreeBSD libc ACL code, that doesn't
differentiate ACL for directories and files, and report ACLs for all
directories created after those patches as non-trivial. On the other
side these changes were considered wrong from POSIX and NFSv4 points of
view. Until further investigation done upstream, revert those changes
locally in preparation for FreeBSD 11.0 release.
Approved by: re (hrs)
Sync libarchive with vendor, bugfixes for tests:
- fix tests on filesystems without birthtime support, e.g. UFS1 (1)
- vendor issue #729: avoid use of C99 for-scope declarations in
test_write_format_gnutar_filenames.c
MFC after: 1 week
PR: 204157 (1)
Approved by: re (hrs)
for SCTP DATA and I-DATA chunks.
* For fragmented user messages, set the I-Bit only on the last
fragment.
* When using explicit EOR mode, set the I-Bit on the last
fragment, whenever SCTP_SACK_IMMEDIATELY was set in snd_flags
for any of the send() calls.
Approved by: re (hrs)
MFC after: 1 week
This patch offers a workaround to buf_ring reordering
visible on armv7 and armv8. This is supposed to be
removed once new buf_ring implementation is integrated
into the tree.
Obtained from: Semihalf
Reviewed by: alc,emaste
Differential Revision: https://reviews.freebsd.org/D6986
Approved by: re (gjb)
per-VNET initialisation and virtualise the interface cloning to
allow a dedicated ipfw log interface per VNET.
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
allocations from ipfilter in preparation for VNET support.
Suggested by: cy (see D7000)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
running on EC2. Due to improvements in EC2, the performance penalty which
was present on some EC2 instances no longer exists, and enabling this
feature now consistently yields ~20% higher throughput with equal or lower
latency.
Reverts: r286063
Approved by: re (gjb)
MFC after: 2 weeks
Relnotes: Improved disk throughput on EC2
Recource management functions in GT PCI controller driver
treated memory/IO resources as KSEG1 addresses, later during
activation these values would be increased by KSEG1 base again
rendering the address invalid and causing "bus error" trap.
Actual logic was converted to use real physical addresses,
so mapping takes place only during activation.
Submitted by: Aleksandr Rybalko <ray@FreeBSD.org>
Approved by: re (gjb)
Otherwise the output is buffered and it appears that make is stuck on something
long-running. This problem is not present with -j as it uses different
code that was already flushing.
Discussed with: sjg
Approved by: re (blanket, META_MODE)
Sponsored by: EMC / Isilon Storage Division
necessary because CLOOP format lacks explicit EOF or length, so that
in the presence of padding or when the CLOOP is put onto a larger
partition upper level provider size may be larger. Bound amount
of extra data that we might touch to the max length of the compressed
block and detect zero-padding in the last cluster, which when
sector is all-zero might cause us to emit bogus I/O error after
decompression of that fails. To not make code any more complicated
that it needs to be deal with it in lazy-manner, i.e. when we
first access that specific cluster.
This change also fixes stupid mistake in the LZMA code, inherited
from geom_lzma, which does not share length of the output buffer
buffer with the decompression routine, so that in the presence
of corrupted or purposedly tailored data may easily cause heap
overflow and kernel memory corruption.
Beef up validation of the CLOOP TOC by checking that lengths of
all but the last compressed clusters match upper limit set by
the decompressor and improve some error diagnostic output while
I am here.
2.Add kern.geom.uzip.attach_to tunable to artifically limit
attaching uzip to certain devices in the dev tree only.
For example the following only makes us attaching to the
GPT labels:
kern.geom.uzip.attach_to="gpt/*"
3.Add kern.geom.uzip.noattach_to, which does opposite to the (2)
above, i.e. prevents geom_uzip from tasting / attaching to
providers matching some pattern. By default we don't attach
to our own kind, i.e. kern.geom.uzip.noattach_to="*.uzip".
It saves us quite some CPU cycles, esp on low-end embedded
systems.
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7013
Add new lock for stageq (part of ieee80211_superg structure) and
ni_tx_superg (part of ieee80211_node structure);
drop com_lock protection where it is used to protect them.
While here, drop duplicate OPACKETS counter incrementation.
ni_tx_ampdu is not protected with it (however, it is also used without
locking in other places; probably, it requires some other solution
to be thread-safe).
Tested with RTL8188CUS (AP) and RTL8188EU (STA).
NOTE: Since this change breaks KBI, all wireless drivers need to be
recompiled.
Reviewed by: adrian
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D6958
late boot: enable it explicitly after installing the page tables. If booting
from an FDT, also make sure to escape the firmware's MMU context early
before overwriting firmware page tables.
Approved by: re (gjb)
r260553 added missing C++ typinfos to libcxxrt's version script.
It appears that a number of duplicate mangled symbols were added due to
a cut and paste error. Switch the second instances to _ZTS*,
typeinfo name for *.
Found by lld, which produces an error or warning for duplicate symbols.
Reviewed by: dim
Approved by: re (gjb)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7011
Prior to this change ZFS ARC min / max could only be changed using
boot time tunables, this allows the values to be tuned at runtime
using the sysctls:
* vfs.zfs.arc_max
* vfs.zfs.arc_min
When adjusting ZFS ARC minimum the memory used will only reduce
to the new minimum given memory pressure.
Reviewed by: allanjude
Approved by: re (gjb)
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D5907
The interface's queues are functional after VI_INIT_DONE (which is short
of interface-up) and that's all that's needed for t4_tom to communicate
with the chip.
Approved by: re@ (gjb@)
Sponsored by: Chelsio Communications
will cal if_free() in case of conflict, error, ..
if_free() however sets the VNET instance from the ifp->if_vnet which
was not yet initialized but would only in if_attach(). Fix this by
setting the curvnet from where we allocate the interface in if_alloc().
if_attach() will later overwrite this as needed. We do not set the home_vnet
early on as we only want to prevent the if_free() panic but not change any
of the other housekeeping, e.g., triggered through ifioctl()s.
Reviewed by: brooks
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7010
As the XXX notes, these should really be checking MK_GNUCXX since there is
already a version check in share/mk/src.opts.mk to disable it. Fixing that
here is more complex though. This could also be using X_COMPILER_FEATURES
but uses X_COMPILER_VERSION to keep in sync with the src.opts.mk logic.
Tested by: andreast
Sponsored by: EMC / Isilon Storage Division
Approved by: re (gjb)
associated) instance.
The result is that the packet is dropped without an indication
that smaller MTU is advisable, which is not optimal, but better
than a NULL pointer deref.
Approved by: re (glebius)
This fixes the build when DESTDIR may be blank or not yet populated.
It also fixes reproducibility.
Submitted by: brooks
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D6455
particular, the Giant is supposed to protect against parallel
ntp_adjtime(2) invocations. But, for instance, sys_ntp_adjtime() does
copyout(9) under Giant and then examines time_status to return syscall
result. Since copyout(9) could sleep, the syscall result might be
inconsistent.
Another and more important issue is that if PPS is configured,
hardpps(9) is executed without any protection against the parallel
top-level code invocation. Potentially, this may result in the
inconsistent state of the ntptime state variables, but I cannot say
how serious such distortion is. The non-functional splclock() call in
sys_ntp_adjtime() protected against clock interrupts calling hardpps()
in the pre-SMP era.
Modernize the locking. A mutex protects ntptime data. Due to the
hardpps() KPI legitimately serving from the interrupt filters (and
e.g. uart(4) does call it from filter), the lock cannot be sleepable
mutex if PPS_SYNC is defined. Otherwise, use normal sleepable mutex
to reduce interrupt latency.
Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6825
private mtx in resettodr(), no implementation of CLOCK_SETTIME() is
allowed to sleep.
Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
X-Differential revision: https://reviews.freebsd.org/D6825
TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as
if TDF_SBDRY is not set for STOP signal delivery. The only difference
is that sig_suspend_threads() should abort the sleep instead of doing
immediate suspension.
Reported by: ngie
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Approved by: re (gjb)
(WITH_SYSTEM_COMPILER: Enable by default) and it's prerequisite: r300354,
caused i386 builds to fail when cross-built on an amd64 host.
Reviewed by: bdrewery, delphij, gjb
Approved by: re (gjb)
structure, change it to int.
The real fix is to sanitize user-visible definitions in sys/event.h,
e.g. the affected struct knlist is of no use for userspace programs.
Reported and tested by: jkim
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
sleepable scan, iteration over the shadow chain looking for a page
could find an OBJ_DEAD object. Such state of the mapping is only
transient, the dead object will be terminated and removed from the
chain shortly. We must not return KERN_PROTECTION_FAILURE unless the
object type is changed to OBJT_DEAD in the chain, indicating that
paging on this address is really impossible. Returning
KERN_PROTECTION_FAILURE prematurely causes spurious SIGSEGV delivered
to processes, or kernel accesses to UVA spuriously failing with
EFAULT.
If the object with OBJ_DEAD flag is found, only return
KERN_PROTECTION_FAILURE when object type is already OBJT_DEAD.
Otherwise, sleep a tick and retry the fault handling.
Ideally, we would wait until the OBJ_DEAD flag is resolved, e.g. by
waiting until the paging on this object is finished. But to do so, we
need to reference the dead object, while vm_object_collapse() insists
on owning the final reference on the collapsed object. This could be
fixed by e.g. changing the assert to shared reference release between
vm_fault() and vm_object_collapse(), but it seems to be too much
complications for rare boundary condition.
PR: 204426
Tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
X-Differential revision: https://reviews.freebsd.org/D6085
MFC after: 2 weeks
Approved by: re (gjb)
exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
- knote kn_knlist pointer is reset
- INFLUX knote is removed from the process knlist.
And, there are two consequences:
- KN_LIST_UNLOCK() on such knote is nop
- there is nothing which would block exit1() from processing past the
knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
Both consequences result either in leaked process lock, or
dereferencing NULL function pointers for locking.
Handle this by stopping embedding the process knlist into struct proc.
Instead, the knlist is allocated together with struct proc, but marked
as autodestroy on the zombie reap, by knlist_detach() function. The
knlist is freed when last kevent is removed from the list, in
particular, at the zombie reap time if the list is empty. As result,
the knlist_remove_inevent() is no longer needed and removed.
Other changes:
In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
notifications. The flags leak resulted in excessive
NOTE_EXEC/NOTE_FORK reports.
Fix immediate note activation in filt_procattach(). Condition should
be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT
report for the exiting process.
In knote_fork(), do not perform racy check for KN_INFLUX before kq
lock is taken. Besides being racy, it did not accounted for notes
just added by scan (KN_SCAN).
Some minor and incomplete style fixes.
Analyzed and tested by: Eric Badger <eric@badgerio.us>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6859
interrupt sleeps with the ERESTART on the suspension attempts.
Otherwise, single-threading requests are deferred until the locks are
granted for NFS files, which causes hangs.
When retrying local registration of the remotely-granted adv lock,
allow full suspension and check for suspension, for usual reasons.
Reported by: markj, pho
Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.
Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Most of the effect of setting MSR[SF] is that the CPU will stop ignoring
the high 32 bits of registers containing addresses in load/store
instructions. As such, the kernel was setting it only when it began to
need access to high memory. MSR[SF] also affects the operation of some
conditional instructions, however, and so setting it at late times could
subtly break code at very early times. This fixes use of the FDT mode in
loader, and FDT boot more generally, on 64-bit PowerPC systems.
Hardware provided by: IBM LTC
Approved by: re (kib)
[1] Remove unneeded sockaddr conversion before kern_recvit() call as the from
argument is used to record result (the source address of the received message) only.
[2] In Linux the type of msg_namelen member of struct msghdr is signed but native
msg_namelen has a unsigned type (socklen_t). So use the proper storage to fetch fromlen
from userspace and than check the user supplied value and return EINVAL if it is less
than 0 as a Linux do.
Reported by: Thomas Mueller <tmueller at sysgo dot com> [1]
Reviewed by: kib@
Approved by: re (gjb, kib)
MFC after: 3 days
for messages which have been put on the send queue:
* Do not report any DATA or I-DATA chunk padding.
* Correctly deal with the I-DATA chunk header instead of the DATA
chunk header when the I-DATA extension is used.
Approved by: re (kib)
MFC after: 1 week
A couple of minor memory size option related nits:
- use common name 'memsize' (instead of 'max-size' or just 'size')
- bhyve: update usage with memsize unit suffix, drop legacy "MB"
unit
- bhyveload: update usage with memsize unit suffix
- bhyve(8): document default size
- bhyveload(8): use memsize formatting like it's done
in bhyve(8)
Reviewed by: wblock, grehan
Approved by: re (kib), wblock, grehan
Differential Revision: https://reviews.freebsd.org/D6952