16915 Commits

Author SHA1 Message Date
markj
b0130de08d Use KOBJMETHOD_END in the kernel linker.
MFC after:	1 week
2019-10-16 22:06:19 +00:00
markj
84cd531f96 Remove page locking from pmap_mincore().
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore().  Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21823
2019-10-16 22:03:27 +00:00
andrew
7052b2d00d Stop leaking information from the kernel through timespec
The timespec struct holds a seconds value in a time_t and a nanoseconds
value in a long. On most architectures these are the same size, however
on 32-bit architectures other than i386 time_t is 8 bytes and long is
4 bytes.

Most ABIs will then pad a struct holding an 8 byte and 4 byte value to
16 bytes with 4 bytes of padding. When copying one of these structs the
compiler is free to copy the padding if it wishes.

In this case the padding may contain kernel data that is then leaked to
userspace. Fix this by copying the timespec elements rather than the
entire struct.

This doesn't affect Tier-1 architectures so no SA is expected.

admbugs:	651
MFC after:	1 week
Sponsored by:	DARPA, AFRL
2019-10-16 13:21:01 +00:00
kp
038f82f772 Generalize ARM specific comments in devmap
The comments in devmap are very ARM specific, this generalizes them for other
architectures.

Submitted by:	Nicholas O'Brien <nickisobrien_gmail.com>
Reviewed by:	manu, philip
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D22035
2019-10-15 23:21:52 +00:00
glebius
9101a0b1d1 Missing from r353596. 2019-10-15 21:32:38 +00:00
glebius
072472d2fc When assertion for a thread not being in an epoch fails also print all
entered epochs. Works with EPOCH_TRACE only.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D22017
2019-10-15 21:24:25 +00:00
glebius
7361293b96 Remove pfctlinput2(). It came from KAME and had never ever been in use. 2019-10-15 15:40:03 +00:00
jeff
e249e932a5 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
jeff
51ed6c3ace (1/6) Replace busy checks with acquires where it is trival to do so.
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21548
2019-10-15 03:35:11 +00:00
luporl
57d28447c8 [PPC64] Initial kernel minidump implementation
Based on POWER9BSD implementation, with all POWER9 specific code removed and
addition of new methods in PPC64 MMU interface, to isolate platform specific
code. Currently, the new methods are implemented on pseries and PowerNV
(D21643).

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D21551
2019-10-14 13:04:04 +00:00
glebius
187eb910e1 Since EPOCH_TRACE had been moved to opt_global.h, we don't need to waste
extra space in struct thread.
2019-10-14 04:17:56 +00:00
mjg
7cb37ce311 vfs: add MNTK_NOMSYNC
On many filesystems the traversal is effectively a no-op. Add a way to avoid
the overhead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:40:34 +00:00
mjg
c576b0223d vfs: return free vnode batches in sync instead of vfs_msync
It is a more natural fit. vfs_msync only deals with active vnodes.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22008
2019-10-13 15:39:11 +00:00
mav
d514a0e81a Allocate device softc from the device domain.
Since we are trying to bind device interrupt threads to the device domain,
it should have sense to make memory often accessed by them local. If domain
is not known, fall back to round-robin.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-10-12 19:03:07 +00:00
kp
9f3c88db10 mountroot: run statfs after mounting devfs
The usual flow for mounting a file system is to VFS_MOUNT() and then
immediately VFS_STATFS().

That's not done in vfs_mountroot_devfs(), which means the
mp->mnt_stat.f_iosize field is not correctly populated, which in turn
causes us to mark valid aio operations as unsafe (because the io size is
set to 0), ultimately causing the aio_test:md_waitcomplete test to fail.

Reviewed by:	mckusick
MFC after:	1 week
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D21897
2019-10-11 17:04:38 +00:00
cem
43181b339c ddb: Add CSV option, sorting to 'show (malloc|uma)'
Add /i option for machine-parseable CSV output.  This allows ready copy/
pasting into more sophisticated tooling outside of DDB.

Add total zone size ("Memory Use") as a new column for UMA.

For both, sort the displayed list on size (print the largest zones/types
first).  This is handy for quickly diagnosing "where has my memory gone?" at
a high level.

Submitted by:	Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by:	Dell EMC Isilon
2019-10-11 01:31:31 +00:00
jhb
56a61b7cc2 Don't free the cursor boundary tag during vmem_destroy().
The cursor boundary tag is statically allocated in the vmem instead of
from the vmem_bt_zone.  Explicitly remove it from the vmem's segment
list in vmem_destroy before freeing all the segments from the vmem.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21953
2019-10-09 21:20:39 +00:00
glebius
9618aadcd8 Cleanup unneeded includes that crept in with r353292. 2019-10-09 16:59:42 +00:00
glebius
7299f8c33d Enter network epoch in domain callouts. 2019-10-09 16:21:05 +00:00
markj
bb579d181e Fix handling of empty SCM_RIGHTS messages.
As unp_internalize() processes the input control messages, it builds
an output mbuf chain containing the internalized representations of
those messages.  In one special case, that of an empty SCM_RIGHTS
message, the message is simply discarded.  However, the loop which
appends mbufs to the output chain assumed that each iteration would
produce an mbuf, resulting in a null pointer dereference if an empty
SCM_RIGHTS message was followed by a non-empty message.

Fix this by advancing the output mbuf chain tail pointer only if an
internalized control message was produced.

Reported by:	syzbot+1b5cced0f7fad26ae382@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-08 23:34:48 +00:00
jhb
02e5a4c53c Add a TOE KTLS mode and a TOE hook for allocating TLS sessions.
This adds the glue to allocate TLS sessions and invokes it from
the TLS enable socket option handler.  This also adds some counters
for active TOE sessions.

The TOE KTLS mode is returned by getsockopt(TLSTX_TLS_MODE) when
TOE KTLS is in use on a socket, but cannot be set via setsockopt().

To simplify various checks, a TLS session now includes an explicit
'mode' member set to the value returned by TLSTX_TLS_MODE.  Various
places that used to check 'sw_encrypt' against NULL to determine
software vs ifnet (NIC) TLS now check 'mode' instead.

Reviewed by:	np, gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21891
2019-10-08 21:34:06 +00:00
dougm
918670a5ed Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D21882
2019-10-08 07:14:21 +00:00
glebius
337378e04f Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
trasz
008d4a5775 Introduce stats(3), a flexible statistics gathering API.
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof).  Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs.  The stats(3) framework can be used in
both userspace and the kernel.

See the stats(3) manual page for details.

This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.

The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.

Reviewed by:	sef (earlier version)
Obtained from:	Netflix
Relnotes:	yes
Sponsored by:	Klara Inc, Netflix
Differential Revision:	https://reviews.freebsd.org/D20477
2019-10-07 19:05:05 +00:00
mjg
8ed5fe7c0d vfs: add optional root vnode caching
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.

Reviewed by:	jeff (previous version), kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:14:32 +00:00
kevans
d0d0375375 Remove the remnants of SI_CHEAPCLONE
SI_CHEAPCLONE was introduced in r66067 for use with cloned bpfs. It was
later also used in tty, tun, tap at points. The rough timeline for being
removed in each of these is as follows:

- r181690: bpf switched to use cdevpriv API by ed@
- r181905: ed@ rewrote the TTY later to be mpsafe
- r204464: kib@ removes it from tun/tap, declaring it unused

I've not yet been able to dig up any other consumers in the intervening 9
years. It is no longer set on any devices in the tree and leaves an
interesting situation in make_dev_sv where we're ok with the device already
being set SI_NAMED.
2019-10-05 21:52:06 +00:00
kevans
2f4d113ad6 kern_conf: fully initialize cloned devices with make_dev_args, too
Attempting to initialize si_drv{1,2} with mda_si_drv{1,2} does not work if
you are operating on cloned devices.

clone_create must be called prior to the make_dev* family to create/return
the device on the clonelist as needed. This device is later returned early
in newdev(), prior to si_drv{0,1,2} initialization.

This patch simply breaks out of the loop if we've found a device and
finishes init.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21904
2019-10-05 21:44:18 +00:00
mjg
28f9e44110 devfs: plug redundant bwillwrite avoidance
vn_write already checks for vnode type to see if bwillwrite should be called.

This effectively reverts r244643.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21905
2019-10-05 17:44:33 +00:00
vangyzen
580abaccd7 Add CTLFLAG_STATS to some vfs sysctl OIDs
Add CTLFLAG_STATS to the following OIDs:

vfs.altbufferflushes
vfs.recursiveflushes
vfs.barrierwrites
vfs.flushwithdeps
vfs.reassignbufcalls

Refer to r353111.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2019-10-04 21:43:43 +00:00
emaste
3913895c3d simplify path handling in sysctl_try_reclaim_vnode
MAXPATHLEN / PATH_MAX includes space for the terminating NUL, and namei
verifies the presence of the NUL.  Thus there is no need to increase the
buffer size here.

The sysctl passes the string excluding the NUL, so req->newlen equal to
PATH_MAX is too long.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21876
2019-10-02 21:01:23 +00:00
markj
2b0a2a713e Use OBJT_PHYS VM objects for kernel modules.
OBJT_DEFAULT incurs some unnecessary overhead given that kernel module
pages cannot be paged out.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21862
2019-10-02 16:34:42 +00:00
markj
20c5c1bf19 Disallow fcntl(F_READAHEAD) when the vnode is not a regular file.
The mountpoint may not have defined an iosize parameter, so an attempt
to configure readahead on a device file can lead to a divide-by-zero
crash.

The sequential heuristic is not applied to I/O to or from device files,
and posix_fadvise(2) returns an error when v_type != VREG, so perform
the same check here.

Reported by:	syzbot+e4b682208761aa5bc53a@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21864
2019-10-02 15:45:49 +00:00
kevans
e3f43eab06 shm_open2(2): completely unbreak
kern_shm_open2(), since conception, completely fails to pass the mode along
to kern_shm_open(). This breaks most uses of it.

Add tests alongside this that actually check the mode of the returned
files.

PR:		240934 [pulseaudio breakage]
Reported by:	ler, Andrew Gierth [postgres breakage]
Diagnosed by:	Andrew Gierth (great catch)
Tested by:	ler, tmunro
Pointy hat to:	kevans
2019-10-02 02:37:34 +00:00
emaste
a079beb4de sysalls.master: remove superfluous ellipsis in comment
A single period is sufficient in this comment, and making this change
lets us find references to varargs syscalls by searching for ...
2019-10-01 17:05:21 +00:00
brooks
1f0b5f9a65 Restore the ability to set capenabled directly in syscalls.conf.
This fixes generation of cloudabi syscall tables broken in r340424.

Reviewed by:	kevans, emaste
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D21821
2019-09-30 20:58:29 +00:00
kevans
0c6303d3e7 syscalls.master: consistency, move ); to newline (no functional change) 2019-09-30 13:26:16 +00:00
markj
d46f64003d Fix some problems with the SPARSE_MAPPING option in the kernel linker.
- Ensure that the end of the mapping passed to vm_page_wire() is
  page-aligned.  vm_page_wire() expects this.
- Wire pages before reading data into them.
- Apply protections specified in the segment descriptor using
  vm_map_protect() once relocation processing is done.
- On amd64, ensure that we load KLDs above KERNBASE, since they
  are compiled with the "kernel" memory model by default.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21756
2019-09-28 01:42:59 +00:00
gallatin
db240be427 kTLS: Fix a bug where we would not encrypt anon data inplace.
Software Kernel TLS needs to allocate a new destination crypto
buffer when encrypting data from the page cache, so as to avoid
overwriting shared clear-text file data with encrypted data
specific to a single socket. When the data is anonymous, eg, not
tied to a file, then we can encrypt in place and avoid allocating
a new page. This fixes a bug where the existing code always
assumes the data is private, and never encrypts in place. This
results in unneeded page allocations and potentially more memory
bandwidth consumption when doing socket writes.

When the code was written at Netflix, ktls_encrypt() looked at
private sendfile flags to determine if the pages being encrypted
where part of the page cache (coming from sendfile) or
anonymous (coming from sosend). This was broken internally at
Netflix when the sendfile flags were made private, and the
M_WRITABLE() check was added. Unfortunately, M_WRITABLE() will
always be false for M_NOMAP mbufs, since one cannot just mtod()
them.

This change introduces a new flags field to the mbuf_ext_pgs
struct by stealing a byte from the tls hdr. Note that the current
header is still 2 bytes larger than the largest header we
support: AES-CBC with explicit IV. We set MBUF_PEXT_FLAG_ANON
when creating an unmapped mbuf in m_uiotombuf_nomap() (which is
the path that socket writes take), and we check for that flag in
ktls_encrypt() when looking for anon pages.

Reviewed by:	jhb
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21796
2019-09-27 20:08:19 +00:00
gallatin
74ca068530 kTLS support for TLS 1.3
TLS 1.3 requires a few changes because 1.3 pretends to be 1.2
with a record type of application data. The "real" record type is
then included at the end of the user-supplied plaintext
data. This required adding a field to the mbuf_ext_pgs struct to
save the record type, and passing the real record type to the
sw_encrypt() ktls backend functions.

Reviewed by:	jhb, hselasky
Sponsored by:	Netflix
Differential Revision:	D21801
2019-09-27 19:17:40 +00:00
mjg
99543c9107 cache: decrease ncnegfactor to 5
The current mechanism is bogus in several ways:
- the limit is a percentage of total entries added, which means negative
entries get evicted all the time even if there are plenty of resources
- evicting code is almost not concurrent, which makes it unable to
remove entries fast enough when doing something as simple as -j 104
buildworld
- there is no support for performing mass removal if necessary

Vast majority of negative entries never get any hits. Only evicting
them when the filesystem demands it results in a significant growth of
the namecache with almost no improvement in the hit ratio.

Sample result about afer 90 minutes of poudriere -j 104:

           current    no evict   % of the original
numneg     219737     2013157    916
numneghits 266711906  263544562  98 [1]

[1] this may look funny but there is a certain dose of variation to the
build

The number was chosen as something which mostly eliminates spurious
evictions during lighter workloads but still keeps the total at bay.

Sponsored by:	The FreeBSD Foundation
2019-09-27 19:14:03 +00:00
mjg
306794b4a0 cache: stop requeuing negative entries on the hot list
Turns out it does not improve hit ratio, but it does come with a cost
induces stemming from dirtying hit entries.

Sample result: hit counts of evicted entries after 2 buildworlds

before:

           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@                180865
               1 |@@@@@@@                                  49150
               2 |@@@                                      19067
               4 |@                                        9825
               8 |@                                        7340
              16 |@                                        5952
              32 |@                                        5243
              64 |@                                        4446
             128 |                                         3556
             256 |                                         3035
             512 |                                         1705
            1024 |                                         1078
            2048 |                                         365
            4096 |                                         95
            8192 |                                         34
           16384 |                                         26
           32768 |                                         23
           65536 |                                         8
          131072 |                                         6
          262144 |                                         0

after:
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@                184004
               1 |@@@@@@                                   47577
               2 |@@@                                      19446
               4 |@                                        10093
               8 |@                                        7470
              16 |@                                        5544
              32 |@                                        5475
              64 |@                                        5011
             128 |                                         3451
             256 |                                         3002
             512 |                                         1729
            1024 |                                         1086
            2048 |                                         363
            4096 |                                         86
            8192 |                                         26
           16384 |                                         25
           32768 |                                         24
           65536 |                                         7
          131072 |                                         5
          262144 |                                         0

Sponsored by:	The FreeBSD Foundation
2019-09-27 19:13:22 +00:00
mjg
a67ed57908 cache: make negative list shrinking a little bit concurrent
Continue protecting demotion from the hotlist and selection of the
target list with the ncneg_shrink_lock lock, but drop it before
relocking to zap the node.

While here count how many times we skipped shrinking due to the lock
being already taken.

Sponsored by:	The FreeBSD Foundation
2019-09-27 19:12:43 +00:00
mjg
0c1eba6ea2 cache: stop recalculating upper limit each time a new entry is added
Sponsored by:	The FreeBSD Foundation
2019-09-27 19:12:20 +00:00
kib
957270782d Improve MD page fault handlers.
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap().  MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.

This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.

Change the delivered signal for accesses to mapped area after the
backing object was truncated.  According to POSIX description for
mmap(2):
   The system shall always zero-fill any partial page at the end of an
   object. Further, the system shall never write out any modified
   portions of the last page of an object which are beyond its
   end. References within the address range starting at pa and
   continuing for len bytes to whole pages following the end of an
   object shall result in delivery of a SIGBUS signal.

   An implementation may generate SIGBUS signals when a reference
   would cause an error in the mapped object, such as out-of-space
   condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.

For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered.  Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.

vm_fault_hold() is renamed to vm_fault().  Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos.  Removed
unneeded truncations of the fault addresses reported by hardware.

PR:	211924
Reviewed by:	alc
Discussed with:	jilles, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21566
2019-09-27 18:43:36 +00:00
andrew
fc91d16ffa Check the vfs option length is valid before accessing through
When a VFS option passed to nmount is present but NULL the kernel will
place an empty option in its internal list. This will have a NULL
pointer and a length of 0. When we come to read one of these the kernel
will try to load from the last address of virtual memory. This is
normally invalid so will fault resulting in a kernel panic.

Fix this by checking if the length is valid before dereferencing.

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2019-09-27 16:22:28 +00:00
dab
c7fb4709b2 sysent: regenerate after r352747.
Sponsored by:	Dell EMC Isilon
2019-09-26 15:41:10 +00:00
markj
e97c4bdc7f Fix handling of invalid pages in exec_map_first_page().
exec_map_first_page() would unconditionally free an unbacked, invalid
page from the executable image.  However, it is possible that the page
is wired, in which case it is incorrect to free the page, so check for
additional wirings first.

Reported by:	syzkaller
Tested by:	pho
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21767
2019-09-26 15:35:35 +00:00
dab
edad331b44 Add an shm_rename syscall
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.

truss support is included; audit support will come later.

This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	jilles (earlier revision)
Reviewed by:	brueffer (manpages, earlier revision)
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21423
2019-09-26 15:32:28 +00:00
tsoome
64ad58b605 kernel terminal should initialize fg and bg variables before calling TUNABLE_INT_FETCH
We have two ways to check if kenv variable exists - either we check return
value from TUNABLE_INT_FETCH, or we pre-initialize the variable and check
if this value did change. In terminal_init() it is more convinient to
use pre-initialized variables.

Problem was revealed by older loader.efi, which did not set teken.* variables.

Reported by:	tuexen
2019-09-26 07:19:26 +00:00
mav
19cf804564 Microoptimize sched_pickcpu() CPU affinity on SMT.
Use of CPU_FFS() to implement CPUSET_FOREACH() allows to save up to ~0.5%
of CPU time on 72-thread SMT system doing 80K IOPS to NVMe from one thread.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2019-09-26 00:35:06 +00:00