Commit Graph

18640 Commits

Author SHA1 Message Date
Konstantin Belousov
dc2d0899bb kern_sig.c: Remove unused SIGPROP_CANTMASK
Reviewed by:	imp, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32313
2021-10-08 03:21:42 +03:00
Mark Johnston
880b670c6f malloc: Unmark KASAN redzones if the full allocation size was requested
Consumers that want the full allocation size will typically access the
full buffer, so mark the entire allocation as valid to avoid useless
KASAN reports.

Sponsored by:	The FreeBSD Foundation
2021-10-06 16:09:41 -04:00
Konstantin Belousov
9b86d3e5de When queuing ignored signal, only abort target thread' sleep if it is inside sigwait()
Reported and tested by:	trasz
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32252
2021-10-06 17:05:22 +03:00
Konstantin Belousov
f17eb93d55 When sending ignored signal, arrange for zero return code from sleep
Otherwise consumers get unexpected EINTR errors without seeing
a properly discarded signal.

Reported and tested by:	trasz
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32252
2021-10-06 17:05:22 +03:00
Konstantin Belousov
b599982b65 Move td_pflags2 TDP2_SIGWAIT to td_flags TDF_SIGWAIT
The flag should be accessible from non-current threads.

Reviewed by:	markj
Tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32252
2021-10-06 17:05:22 +03:00
Alexander Motin
7835b2cb4a sbuf(9): Microoptimize sbuf_put_byte()
This function is actively used by sbuf_vprintf(), so this simple
inlining in half reduces time of kern.geom.confxml generation.

MFC after:	2 weeks
Sponsored by:	iXsystem, Inc.
2021-10-05 14:47:38 -04:00
Alexander Motin
6df1359e55 sleepqueue(9): Remove sbinuptime() from sleepq_timeout().
Callout c_time is always bigger or equal than the scheduled time.  It
is also smaller than sbinuptime() and can't change while the callback
is running.  So we reliably can use it instead of sbinuptime() here.
In case there was a race and the callout was rescheduled to the later
time, the callback will be called again.

According to profiles it saves ~5% of the timer interrupt time even
with fast TSC timecounter.

MFC after:	1 month
2021-10-02 21:08:41 -04:00
Alexander Motin
1c119e173d sched_ule(4): Fix possible significance loss.
Before this change kern.sched.interact sysctl setting above 32 gave
all interactive threads identical priority of PRI_MIN_INTERACT due to
((PRI_MAX_INTERACT - PRI_MIN_INTERACT + 1) / sched_interact) turning
zero.  Setting the sysctl lower reduced the range of used priority
levels up to half, that is not great either.

Change of the operations order should fix the issue, always using full
range of priorities, while overflow is impossible there since both
score and priority values are small.  While there, make the variables
unsigned as they really are.

MFC after:	1 month
2021-10-02 00:09:45 -04:00
Mateusz Guzik
c9536389d7 vfs: hoist cn_thread assert in namei
Making it condtional on whether ktrace happens to be enabled makes no
sense.
2021-10-01 21:56:29 +00:00
Gleb Smirnoff
a37e4fd1ea Re-style dfcef87714 to keep the code and variables related to
listening sockets separated from code for generic sockets.

No objection:	markj
2021-10-01 13:38:24 -07:00
Justin Hibbits
63cb9308a7 Fix segment size in compressing core dumps
A core segment is bounded in size only by memory size.  On 64-bit
architectures this means a segment can be much larger than 4GB.
However, compress_chunk() takes only a u_int, clamping segment size to
4GB-1, resulting in a truncated core.  Everything else, including the
compressor internally, uses size_t, so use size_t at the boundary here.

This dates back to the original refactor back in 2015 (r279801 /
aa14e9b7).

MFC after:	1 week
Sponsored by:	Juniper Networks, Inc.
2021-10-01 14:16:33 -05:00
Kyle Evans
2f4dbe279f kqueue: fix recent assertion
NOTE_ABSTIME may also have a zero timeout, which indicates that we
should still fire immediately as an absolute time in the past.  A test
has been added for this one as well.

Fixes:	9c999a259f ("kqueue: don't arbitrarily restrict long-past...")
Point hat:	kevans
Reported by:	syzbot+1c8d1154f560b3930042@syzkaller.appspotmail.com
2021-10-01 13:17:30 -05:00
Kyle Evans
9c999a259f kqueue: don't arbitrarily restrict long-past values for NOTE_ABSTIME
NOTE_ABSTIME values are converted to values relative to boottime in
filt_timervalidate(), and negative values are currently rejected.  We
don't reject times in the past in general, so clamp this up to 0 as
needed such that the timer fires immediately rather than imposing what
looks like an arbitrary restriction.

Another possible scenario is that the system clock had to be adjusted
by ~minutes or ~hours and we have less than that in terms of uptime,
making a reasonable short-timeout suddenly invalid. Firing it is still
a valid choice in this scenario so that applications can at least
expect a consistent behavior.

Reviewed by:	kib, markj
Discussed with:	allanjude
Differential Revision:	https://reviews.freebsd.org/D32230
2021-09-30 21:31:24 -05:00
Mateusz Guzik
85c855d31b fd: add pwd_hold_proc 2021-09-30 12:49:51 +02:00
Mitchell Horne
ab4ed843a3 minidump: De-duplicate the progress bar
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31885
2021-09-29 16:42:21 -03:00
Jamie Gritton
747a47261e Fix error return of kern.ipc.posix_shm_list, which caused it (and thus
"posixshmcontrol ls") to fail for all jails that didn't happen to own
the last shm object in the list.
2021-09-29 10:20:36 -07:00
Mitchell Horne
800e74955d boot(9): update to match reality
This function was renamed to kern_reboot() in 2010, but the man page has
failed to keep in sync. Bring it up to date on the rename, add the
shutdown hooks to the synopsis, and document the (obvious) fact that
kern_reboot() does not return.

Fix an outdated reference to the old name in kern_reboot(), and leave a
reference to the man page so future readers might find it before any
large changes.

Reviewed by:	imp, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32085
2021-09-28 11:36:09 -03:00
Kirk McKusick
1c8d670cb6 Bring the tags and links entries for amd64 up to date.
MFC after:    1 week
Sponsored by: Netflix
2021-09-27 20:04:51 -07:00
Elliott Mitchell
bcddaadbef rman: fix overflow in rman_reserve_resource_bound()
If the default range of [0, ~0] is given, then (~0 - 0) + 1 == 0. This
in turn will cause any allocation of non-zero size to fail. Zero-sized
allocations are prohibited, so add a KASSERT to this effect.

History indicates it is part of the original rman code.  This bug may in
fact be older than some contributors.

Reviewed by:	mhorne
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30280
2021-09-27 14:38:26 -03:00
Alexander Motin
08063e9f98 sched_ule(4): Fix hang with steal_thresh < 2.
e745d729be caused infinite loop with interrupts disabled in load
stealing code if steal_thresh set below 2.  Such configuration should
not generally be used, but appeared some people are using it to
workaround some problems.

To fix the problem explicitly pass to sched_highest() minimum number
of transferrable threads, supported by the caller, instead of guessing.

MFC after:	25 days
2021-09-26 12:03:05 -04:00
Gordon Bergling
8771ff7538 jail(9): Fix a typo in a comment
- s/erorr/error/

MFC after:	3 days
2021-09-26 15:17:41 +02:00
Yoshihiro Ota
cb17f4a6bd kern_ctf: Use zlib's uncompress function for simpler code.
Reviewed by:	markj, delphij
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D21531
2021-09-25 23:33:00 -07:00
Mateusz Guzik
d71e1a883c fifo: support flock
This evens it up with Linux.

Original patch by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D24255#565302
2021-09-25 14:58:31 +00:00
Konstantin Belousov
71d31f1cf6 malloc_aligned(9): allow zero size and alignment
For alignment we do not need to do anything to make it operational.
For size, upgrade zero sized request to one byte so that we do not
request insane amount of memory for placeholder.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32127
2021-09-25 15:58:12 +03:00
Gordon Bergling
2aad906266 ubsan: Fix a typo in an error message
- s/asumption/assumption/

Obtained from:	NetBSD
MFC after:	1 week
2021-09-25 11:47:24 +02:00
Alexander Motin
d3a8f98acb Make CPU children explicitly share parent unit numbers.
Before this device unit number match was coincidental and broke if I
disabled some CPU device(s).  Aside of cosmetics, for some drivers
(may be considered broken) it caused talking to wrong CPUs.
2021-09-24 23:31:51 -04:00
Alexander Motin
f73c2bbf81 bus: Cleanup device_probe_child()
When device driver probe method returns 0, i.e. absolute priority, do
not remove its class from the device just to set it back few lines
later, that may change the device unit number, etc. and after which
we'd better call the probe again.

If during search we found some driver with absolute priority, we do
not need to set device driver and class since we haven't removed them
before.

It should not happen, but if second probe method call failed, remove
the driver and possibly the class from the device as it was when we
started.

Reviewed by:	imp, jhb
Differential Revision:	https://reviews.freebsd.org/D32125
2021-09-24 20:34:56 -04:00
Warner Losh
67a9e76da6 bus: Fix LINT / BUS_DEBUG build
Fix 0389e9be63 for LINT built. Removed an arg only from code
under BUS_DEBUG w/o rebuilding LINT...

Sponsored by:		Netflix
Fixes: 0389e9be63
2021-09-24 14:04:39 -06:00
Warner Losh
0389e9be63 bus: retire DF_REBID
I did DF_REBID to allow for 'hoover' drivers that would attach to
otherwise unattached devices in the tree. This notion didn't catch on as
it was tricky to make work well and it was easier to just publish a /dev
node of some flavor by the parent device. It's been nothing but dead
weight for a long time.

Reviewed by:		mav
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D32056
2021-09-24 12:15:34 -06:00
Nathaniel Wesley Filardo
0321a7990b kqueue: Add EV_KEEPUDATA flag
When this flag is set, operations that update an existing kevent will
not change the udata field.  This can be used to NOTE_TRIGGER or
EV_{EN,DIS}ABLE events without overwriting the stashed pointer.

Reviewed by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
Obtained from:	CheriBSD
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D30286
2021-09-23 17:31:39 -07:00
Konstantin Belousov
45c2c7c484 aio_aqueue(): avoid ucred leak on failure path
PR:	258698
Submitted by:	sigsys@gmail.com
MFC after:	1 week
2021-09-24 03:18:34 +03:00
Alexander Motin
ef50d5fbc3 x86: Add NUMA nodes into CPU topology.
Depending on hardware, NUMA nodes may match last level caches, or
they may be above them (AMD Zen 2/3) or below (Intel Xeon w/ SNC).
This information is provided by ACPI instead of CPUID, and it is
provided for each CPU individually instead of mask widths, but
this code should be able to properly handle all the above cases.

This change should immediately allow idle stealing in sched_ule(4)
to prefer load from NUMA-local CPUs to remote ones when the node
does not match LLC.  Later we may think of how to better handle it
on sched_pickcpu() side.

MFC after:	1 month
2021-09-23 14:31:38 -04:00
Wojciech Macek
7bc13692a2 hwpmc: fix performance issues
Differential revision:	https://reviews.freebsd.org/D32025

Avoid using atomics as it_wait is guarded by td_lock.

Report threshold calculation is done only if at least one PMC hook
is installed

Fixes:
* avoid unnecessary branching (if frame != null ...)
  by having PMC_HOOK_INSTALLED_ANY
  condition on the top of them, which should hint
  the core not to execute speculatively anything
  which us underneath;
* access intr_hwpmc_waiting_report_threshold cacheline
  only if at least one hook is loaded;
2021-09-23 07:15:42 +02:00
Alexander Motin
884f38590c Fix false device_set_unit() error.
It should silently succeed if the current unit number is the same as
requested, not fail immediately.

MFC after:	1 week
2021-09-22 08:44:39 -04:00
Alexander Motin
8db1669959 Fix build without SMP.
MFC after:	1 month
2021-09-21 22:13:33 -04:00
Alexander Motin
e745d729be sched_ule(4): Improve long-term load balancer.
Before this change long-term load balancer was unable to migrate
running threads, only ones waiting on run queues.  But with growing
number of CPU cores it is quite typical now for system to not have
many waiting threads.  But same time if due to some coincidence two
long-running CPU-bound threads ended up sharing same physical CPU
core, they could suffer from the SMT penalty indefinitely, and the
load balancer couldn't help.

Improve that by teaching the load balancer to hint running threads
to migrate by marking them with TDF_NEEDRESCHED and new TDF_PICKCPU
flag, making sched_pickcpu() to search for better CPU later, when
it is convenient.

Fix CPU search logic when balancing to limit round-robin migrations
in case of almost equal load to the group of physical cores.  The
previous code bounced threads across all the system, that should be
pretty bad for caches and NUMA affinity, while additional fairness
was almost invisible, diminishing with number of cores in the group.

MFC after:	1 month
2021-09-21 18:19:20 -04:00
Konstantin Belousov
397f188936 Remove SV_CAPSICUM
It was only needed for cloudabi

Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31923
2021-09-22 00:18:44 +03:00
Konstantin Belousov
cf0ee8738e Drop cloudabi
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.

There is no reason to keep it in FreeBSD.

Approved by:	ed (private mail)
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31923
2021-09-22 00:18:44 +03:00
Alexander Motin
bd84094a51 sched_ule(4): Fix interactive threads stealing.
In scenarios when first thread in the queue can migrate to specified
CPU, but later ones can't runq_steal_from() incorrectly returned NULL.

MFC after:	2 weeks
2021-09-21 16:03:32 -04:00
Konstantin Belousov
bd9e0f5df6 amd64: eliminate td_md.md_fpu_scratch
For signal send, copyout from the user FPU save area directly.

For sigreturn, we are in sleepable context and can do temporal
allocation of the transient save area.  We cannot copying from userspace
directly to user save area because XSAVE state needs to be validated,
also partial copyins can corrupt it.

Requested by:	jhb
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
df8dd6025a amd64: stop using top of the thread' kernel stack for FPU user save area
Instead do one more allocation at the thread creation time.  This frees
a lot of space on the stack.

Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn().  This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.

A useful experiment now would be to reduce KSTACK_PAGES.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
2933a7ca03 aio_fsync_vnode: handle ERELOOKUP after VOP_FSYNC()
Reported by:	tmunro
Reviewed by:	jhb, tmunro
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32023
2021-09-20 21:40:17 +03:00
Konstantin Belousov
922bee44e4 aio_fsync_vnode: use for(;;) loop instead of label
Reviewed by:	jhb, tmunro
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32023
2021-09-20 21:39:46 +03:00
Bartlomiej Grzesik
3f9a00e3b5 device: add device_get_property and device_has_property
Generialize bus specific property accessors. Those functions allow driver code
to access device specific information.

Currently there is only support for FDT and ACPI buses.

Reviewed by: manu, mw
Sponsored by: Semihalf
Differential revision: https://reviews.freebsd.org/D31597
2021-09-20 17:17:57 +02:00
Mateusz Guzik
7b2ac8eb9b vfs: add missing VIRF_MOUNTPOINT in vfs_mountroot_shuffle
Reported by:	mav
2021-09-18 21:13:51 +02:00
Mateusz Guzik
0d9e99ce3b vfs: add the missing vnode interlock in vfs_mountroot_shuffle
Around v_mountedhere assignment.
2021-09-18 21:13:51 +02:00
Mark Johnston
50b07c1f71 unix: Fix a use-after-free in unp_drop()
We need to load the socket pointer after locking the PCB, otherwise
the socket may have been detached and freed by the time that unp_drop()
sets so_error.

This previously went unnoticed as the socket zone was _NOFREE.

Reported by:	pho
MFC after:	1 week
2021-09-18 10:38:39 -04:00
Mateusz Guzik
f902e4bb04 lockmgr: fix lock profiling of face adaptive spinning 2021-09-18 10:16:58 +00:00
Mateusz Guzik
a2cb65b8fe cache: count vnodes in cache_purgevfs 2021-09-18 10:16:50 +00:00
Mateusz Guzik
5d8e32a66c vfs: retire VNODE_REFCOUNT_FENCE_* macros
They are unused as of last year.
2021-09-18 10:16:00 +00:00