without a commit message...
Use sbuf_new() + SYSCTL_OUT() instead of wiring the userland buffer and
using sbuf_new_for_sysctl(). The preallocated 256 byte buffer is always
going to be big enough to hold these results, and this should be more
efficient than wiring the old buffer.
INCLUDENUL is set and sbuf_finish() has been called, the length has been
incremented to count the nulterm byte, and in that case current length is
allowed to be equal to buffer size, otherwise it must be less than.
Add a predicate macro to test for SBUF_INCLUDENUL, and use it in tests, to
be consistant with the style in the rest of this file.
A comment in the code stated we PROC_LOCK and as a side effect guarantee
all writers released process lock. But at that point such lock was already
taken while we were removing the process from all lists, so it should be already
unreachable.
The goal here is to provide one place altering process credentials.
This eases debugging and opens up posibilities to do additional work when such
an action is performed.
strings returned to userland include the nulterm byte.
Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases. (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)
Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.
PR: 195668
The SBUF_INCLUDENUL flag causes the nulterm byte at the end of the string
to be counted in the length of the data. If copying the data using the
sbuf_data() and sbuf_len() functions, or if writing it automatically with
a drain function, the net effect is that the nulterm byte is copied along
with the rest of the data.
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.
Differential Revision: https://reviews.freebsd.org/D1987
Sponsored by: Mellanox Technologies
MFC after: 1 month
A late change to the SR-IOV infrastructure broke passthrough of
VFs. device_set_devclass() was being used to try to force the
ppt driver to attach to the device, but this didn't work because
the DF_FIXEDCLASS flag wasn't being set on the device, so the
ppt driver probe routine would not match when it returned
BUS_NOWILDCARD. Fix this by adding a new device function that
both sets the devclass and sets the DF_FIXEDCLASS flag, and use
that to force the ppt driver to attach to VFs.
Differential Revision: https://reviews.freebsd.org/D2041
Reviewed by: jhb
MFC after: 3 weeks
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.
This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.
Differential Revision: https://reviews.freebsd.org/D1832
Reviewed by: rpaulo
Discussed with: kib
executables. The goal here, not yet accomplished, is to let the e500 kernel
run under QEMU by setting KERNBASE to something that fits in low memory and
then having the kernel relocate itself at runtime.
When sendfile_getobj() is called on a DTYPE_SHM file, it never
initializes error, which is eventually returned to the caller.
Differential Revision: https://reviews.freebsd.org/D1989
Reviewed by: kib
Reported by: Brainy Code Scanner, by Maxime Villard.
currently a spin lock. Apparently, the only reason for this is that
umtx_thread_exit() is called under the process spinlock, which put the
requirement on the umtx_lock. Note that the witness static order list
is wrong for the umtx_lock, umtx_lock is explicitely before any thread
lock, so it is also before sleepq locks.
Change umtx_lock to be the sleepable mutex. For the reason above, the
calls to umtx_thread_exit() are moved from thread_exit() earlier in
each caller, when the process spin lock is not yet taken.
Discussed with: jhb
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
consistently. This also matches the per-cpu pointer declaration
anyway.
This changes the tweak we give to the load from -32..31 to be 0..31
which seems more inline with the rest of the code (- rnd and the -=
64). It should also provide the randomness we need, and may fix a
signedness bug in the old code (it isn't clear that the effect was
intentional as opposed to sloppy, and the right shift of a signed
value is undefined to boot).
This stores sched_balance() behavior when it used random().
Differential Revision: https://reviews.freebsd.org/D1981
prevent errors from yanking devices out from under filesystems. Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.
Sponsored by: EMC / Isilon Storage Division
Submitted by: Conrad Meyer
MFC after: 1 week
jail's creation parameters. This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.
The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.
The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).
There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design. The system
administrator is trusted to set sane values. Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.
Differential Revision: https://reviews.freebsd.org/D1948
Relnotes: yes
we need randomness in ULE. This removes random() call from the
rebalance interval code.
Submitted by: Harrison Grundy
Differential Revision: https://reviews.freebsd.org/D1968
to its previous, unowned state. This avoids compounding an existing
problem of inconsistent ownership.
Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
Obtained from: Dell Inc.
PR: 198914
MFC after: 1 week
is empty, look up the umtx_pi and disown it if the current thread owns it.
This can happen if a signal or timeout removed the last waiter from
the queue, but there is still a thread in do_lock_pi() holding a reference
on the umtx_pi. The unlocking thread might not own the umtx_pi in this case,
but if it does, it must disown it to keep the ownership consistent between
the umtx_pi and the umutex.
Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc.
Obtained from: Dell Inc.
PR: 198914
message. This can happen when application is sending packets too big
for the path MTU and recvmsg() will return zero (indicating no data)
but there will be a cmsghdr with cmsg_type set to IPV6_PATHMTU.
Remove KASSERT() which does NULL pointer dereference in such case.
Also call m_freem() only when m isn't NULL.
PR: 197882
MFC after: 1 week
Sponsored by: Yandex LLC
Introduce fget_fcntl which performs appropriate checks when needed.
This removes a branch from fget_unlocked.
Introduce fget_mmap dealing with cap_rights_to_vmprot conversion.
This removes a branch from _fget.
Modify fget_unlocked to pass sequence counter to interested callers so
that they can perform their own checks and make sure the result was
otained from stable & current state.
Reviewed by: silence on -hackers
instead of preprocessor macros.
This will make debugger output of 'print *m' exactly match the names
we use in code, making life of a kernel hacker way more pleasant. And
this also allows to rename struct_m_ext back to m_ext.
STAILQs and SLISTs using the same structure field as good old m_next
and m_nextpkt linkage occupy.
New code is encouraged to use queue(3) macros, instead of implementing
the wheel. However, better not to have a mixture of old style and
queue(3) in one file or subsystem.
Reviewed by: rwatson, rrs, rpaulo
Differential Revision: D1499
This is a more generic version of taskqueue_start_threads_pinned()
which only supports a single cpuid.
This originally came from John Baldwin <jhb@> who implemented it
as part of a push towards NUMA awareness in drivers. I started implementing
something similar for RSS and NUMA, then found he already did it.
I'd like to axe taskqueue_start_threads_pinned() so it doesn't become
part of a longer-term API. (Read: hps@ wants to MFC things, and
if I don't do this soon, he'll MFC what's here. :-)
I have a follow-up commit which converts the intel drivers over
to using the cpuset version of this function, so we can eventually
nuke the the pinned version.
Tested:
* igb, ixgbe
Obtained from: jhbbsd