Commit Graph

18354 Commits

Author SHA1 Message Date
Mateusz Guzik
a269183875 vfs: elide vnode locking when it is only needed for audit if possible 2021-05-23 19:37:16 +00:00
Mark Johnston
6f6cd1e8e8 ktrace: Remove vrele() at the end of ktr_writerequest()
As of commit fc369a353 we no longer ref the vnode when writing a record.
Drop the corresponding vrele() call in the error case.

Fixes:	fc369a353 ("ktrace: fix a race between writes and close")
Reported by:	syzbot+9b96ea7a5ff8917d3fe4@syzkaller.appspotmail.com
Reported by:	syzbot+6120ebbb354cd52e5107@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	6 days
Differential Revision:	https://reviews.freebsd.org/D30404
2021-05-23 14:13:01 -04:00
Mateusz Guzik
e2ab16b1a6 lockprof: move panic check after inspecting the state 2021-05-23 17:55:27 +00:00
Mateusz Guzik
6a467cc5e1 lockprof: pass lock type as an argument instead of reading the spin flag 2021-05-23 17:55:27 +00:00
Hans Petter Selasky
ef0f7ae934 The old thread priority must be stored as part of the EPOCH(9) tracker.
Else recursive use of EPOCH(9) may cause the wrong priority to be restored.

Bump the __FreeBSD_version due to changing the thread and epoch tracker
structure.

Differential Revision:	https://reviews.freebsd.org/D30375
Reviewed by:	markj@
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-23 10:53:25 +02:00
Mateusz Guzik
138f78e94b umtx: convert umtxq_lock to a macro
Then LOCK_PROFILING starts reporting callers instead of the inline.
2021-05-22 21:01:05 +00:00
Mateusz Guzik
e71d5c7331 Fix limit testing after 1762f674cc ktrace commit.
The previous:

if ((uoff_t)uio->uio_offset + uio->uio_resid > lim)
	signal(....);

was replaced with:

if ((uoff_t)uio->uio_offset + uio->uio_resid < lim)
	return;
signal(....);

Making (uoff_t)uio->uio_offset + uio->uio_resid == lim trip over the
limit, when it did not previously.

Unbreaks running 13.0 buildworld.
2021-05-22 20:18:21 +00:00
Konstantin Belousov
fc369a353b ktrace: fix a race between writes and close
It was possible that termination of ktrace session occured during some
record write, in which case write occured after the close of the vnode.
Use ktr_io_params refcounting to avoid this situation, by taking the
reference on the structure instead of vnode.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30400
2021-05-22 23:14:13 +03:00
Mateusz Guzik
48235c377f Fix a braino in previous.
Instead of trying to partially ifdef out ktrace handling, define the
missing identifier to 0. Without this fix lack of ktrace in the kernel
also means there is no SIGXFSZ signal delivery.
2021-05-22 19:53:40 +00:00
Mateusz Guzik
154f0ecc10 Fix tinderbox build after 1762f674cc ktrace commit. 2021-05-22 19:41:19 +00:00
Mateusz Guzik
a0842e69aa lockprof: add contested-only profiling
This allows tracking all wait times with much smaller runtime impact.

For example when doing -j 104 buildkernel on tmpfs:

no profiling:	2921.70s user 282.72s system 6598% cpu 48.562 total
all acquires:	2926.87s user 350.53s system 6656% cpu 49.237 total
contested only:	2919.64s user 290.31s system 6583% cpu 48.756 total
2021-05-22 19:28:37 +00:00
Mateusz Guzik
fca5cfd584 lockprof: retire lock_prof_skipcount
The implementation uses a global variable for *ALL* calls, defeating the
point of sampling in the first place. Remove it as it clearly remains
unused.
2021-05-22 19:28:37 +00:00
Mateusz Guzik
cf74b2be53 vfs: retire the now unused vnlru_free routine 2021-05-22 18:42:30 +00:00
Mark Johnston
e4b16f2fb1 ktrace: Avoid recursion in namei()
sys_ktrace() calls namei(), which may call ktrnamei().  But sys_ktrace()
also calls ktrace_enter() first, so if the caller is itself being
traced, the assertion in ktrace_enter() is triggered.  And, ktrnamei()
does not check for recursion like most other ktrace ops do.

Fix the bug by simply deferring the ktrace_enter() call.

Also make the parameter to ktrnamei() const and convert to ANSI.

Reported by:	syzbot+d0a4de45e58d3c08af4b@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30340
2021-05-22 12:07:32 -04:00
Konstantin Belousov
f784da883f Move mnt_maxsymlinklen into appropriate fs mount data structures
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-MFC-Note:	struct mount layout
Differential revision:	https://reviews.freebsd.org/D30325
2021-05-22 15:16:09 +03:00
Konstantin Belousov
ea2b64c241 ktrace: add a kern.ktrace.filesize_limit_signal knob
When enabled, writes to ktrace.out that exceed the max file size limit
cause SIGXFSZ as it should be, but note that the limit is taken from
the process that initiated ktrace.   When disabled, write is blocked,
but signal is not send.

Note that in either case ktrace for the affected process is stopped.

Requested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:09 +03:00
Konstantin Belousov
02645b886b ktrace: use the limit of the trace initiator for file size limit on writes
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:09 +03:00
Konstantin Belousov
1762f674cc ktrace: pack all ktrace parameters into allocated structure ktr_io_params
Ref-count the ktr_io_params structure instead of vnode/cred.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
a6144f713c ktrace: do not stop tracing other processes if our cannot write to this vnode
Other processes might still be able to write, make the decision to stop
based on the per-process situation.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
9bb84c23e7 accounting: explicitly mark the exiting thread as doing accounting
and use the mark to stop applying file size limits on the write of
the accounting record.  This allows to remove hack to clear process
limits in acct_process(), and avoids the bug with the clearing being
ineffective because limits are also cached in the thread structure.

Reported and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
70c05850e2 kern_descrip.c: Style
Wrap too long lines.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
d713bf7927 vn_need_pageq_flush(): simplify
There is no need to own vnode interlock, since v_object is type stable
and can only change to/from NULL, and no other checks in the function
access fields protected by the interlock.  Remove the need variable, the
result of the test is directly usable as return value.

Tested by:	mav, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-05-22 12:29:44 +03:00
Edward Tomasz Napierala
33621dfc19 Refactor core dumping code a bit
This makes it possible to use core_write(), core_output(),
and sbuf_drain_core_output(), in Linux coredump code.  Moving
them out of imgact_elf.c is necessary because of the weird way
it's being built.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30369
2021-05-22 09:59:00 +01:00
Mark Johnston
916c61a5ed Fix handling of errors from pru_send(PRUS_NOTREADY)
PRUS_NOTREADY indicates that the caller has not yet populated the chain
with data, and so it is not ready for transmission.  This is used by
sendfile (for async I/O) and KTLS (for encryption).  In particular, if
pru_send returns an error, the caller is responsible for freeing the
chain since other implicit references to the data buffers exist.

For async sendfile, it happens that an error will only be returned if
the connection was dropped, in which case tcp_usr_ready() will handle
freeing the chain.  But since KTLS can be used in conjunction with the
regular socket I/O system calls, many more error cases - which do not
result in the connection being dropped - are reachable.  In these cases,
KTLS was effectively assuming success.

So:
- Change sosend_generic() to free the mbuf chain if
  pru_send(PRUS_NOTREADY) fails.  Nothing else owns a reference to the
  chain at that point.
- Similarly, in vn_sendfile() change the !async I/O && KTLS case to free
  the chain.
- If async I/O is still outstanding when pru_send fails in
  vn_sendfile(), set an error in the sfio structure so that the
  connection is aborted and the mbuf chain is freed.

Reviewed by:	gallatin, tuexen
Discussed with:	jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30349
2021-05-21 17:45:19 -04:00
Hans Petter Selasky
c82c200622 Accessing the epoch structure should happen after the INIT_CHECK().
Else the epoch pointer may be NULL.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 11:21:32 +02:00
Hans Petter Selasky
f33168351b Properly define EPOCH(9) function macro.
No functional change intended.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 11:21:32 +02:00
Hans Petter Selasky
cc9bb7a9b8 Rework for-loop in EPOCH(9) to reduce indentation level.
No functional change intended.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-05-21 11:21:32 +02:00
Lv Yunlong
b295c5ddce socket: Release cred reference later in sodealloc()
We dereference so->so_cred to update the per-uid socket buffer
accounting, so the crfree() call must be deferred until after that
point.

PR:		255869
MFC after:	1 week
2021-05-18 15:25:40 -04:00
Konstantin Belousov
8cf912b017 ttydev_write: prevent stops while terminal is busied
Since busy state is checked by all blocked writes, stopping a process
which waits in ttydisc_write() causes cascade.  Utilize sigdeferstop()
to avoid the issue.

Submitted by:	Jakub Piecuch <j.piecuch96@gmail.com>
PR:	255816
MFC after:	1 week
2021-05-18 20:52:03 +03:00
Mateusz Guzik
cc6f46ac2f vfs: refactor vdrop
In particular move vunlazy into its own routine.
2021-05-18 15:30:28 +00:00
Mateusz Guzik
715fcc0d34 vfs: change vn_freevnodes_* prefix to idiomatic vfs_freevnodes_* 2021-05-18 15:30:28 +00:00
Colin Percival
b6be9566d2 Fix buffer overflow in preloaded hostuuid cleaning
When a module of type "hostuuid" is provided by the loader,
prison0_init strips any trailing whitespace and ASCII control
characters by (a) adjusting the buffer length, and (b) zeroing out
the characters in question, before storing it as the system's
hostuuid.

The buffer length adjustment was correct, but the zeroing overwrote
one byte higher in memory than intended -- in the typical case,
zeroing one byte past the end of the hostuuid buffer.  Due to the
layout of buffers passed by the boot loader to the kernel, this will
be the first byte of a subsequent buffer.

This was *probably* harmless; prison0_init runs after preloaded kernel
modules have been linked and after the preloaded /boot/entropy cache
has been processed, so in both cases having the first byte overwritten
will not cause problems.  We cannot however rule out the possibility
that other objects which are preloaded by the loader could suffer from
having the first byte overwritten.

Since the zeroing does not in fact serve any purpose, remove it and
trim trailing whitespace and ASCII control characters by adjusting
the buffer length alone.

Fixes:		c3188289 Preload hostuuid for early-boot use
Reviewed by:	kevans, markj
MFC after:	3 days
2021-05-17 20:07:49 -07:00
Colin Percival
330f110bf1 Fix 'hostuuid: preload data malformed' warning
If the preloaded hostuuid value is invalid and verbose booting is
enabled, a warning is printed.  This printf had two bugs:

1. It was missing a trailing \n character.
2. The malformed UUID is printed with %s even though it is not known
to be NUL-terminated.

This commit adds the missing \n and uses %.*s with the (already known)
length of the preloaded UUID to ensure that we don't read past the end
of the buffer.

Reported by:	kevans
Fixes:		c3188289 Preload hostuuid for early-boot use
MFC after:	3 days
2021-05-17 20:07:49 -07:00
Kirk McKusick
9a2fac6ba6 Fix handling of embedded symbolic links (and history lesson).
The original filesystem release (4.2BSD) had no embedded sysmlinks.
Historically symbolic links were just a different type of file, so
the content of the symbolic link was contained in a single disk block
fragment. We observed that most symbolic links were short enough that
they could fit in the area of the inode that normally holds the block
pointers. So we created embedded symlinks where the content of the
link was held in the inode's pointer area thus avoiding the need to
seek and read a data fragment and reducing the pressure on the block
cache. At the time we had only UFS1 with 32-bit block pointers,
so the test for a fastlink was:

	di_size < (NDADDR + NIADDR) * sizeof(daddr_t)

(where daddr_t would be ufs1_daddr_t today).

When embedded symlinks were added, a spare field in the superblock
with a known zero value became fs_maxsymlinklen. New filesystems
set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded
symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus
filesystems that preceeded this change always read from blocks
(since fs->fs_maxsymlinklen == 0) and newer ones used embedded
symlinks if they fit. Similarly symlinks created on pre-embedded
symlink filesystems always spill into blocks while newer ones will
embed if they fit.

At the same time that the embedded symbolic links were added, the
on-disk directory structure was changed splitting the former
u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen.
Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can
be used to distinguish old directory formats. In retrospect that
should have just been an added flag, but we did not realize we
needed to know about that change until it was already in production.

Code was split into ufs/ffs so that the log structured filesystem could
use ufs functionality while doing its own disk layout. This meant
that no ffs superblock fields could be used in the ufs code. Thus
ffs superblock fields that were needed in ufs code had to be copied
to fields in the mount structure. Since ufs_readlink needed to know
if a link was embedded, fs_maxlinklen gets copied to mnt_maxsymlinklen.

The kernel panic that arose to making this fix was triggered when a
disk error created an inode of type symlink with no allocated data
blocks but a large size. When readlink was called the uiomove was
attempted which segment faulted.

static int
ufs_readlink(ap)
	struct vop_readlink_args /* {
		struct vnode *a_vp;
		struct uio *a_uio;
		struct ucred *a_cred;
	} */ *ap;
{
	struct vnode *vp = ap->a_vp;
	struct inode *ip = VTOI(vp);
	doff_t isize;

	isize = ip->i_size;
	if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
	    DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
		return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
	}
	return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
}

The second part of the "if" statement that adds

	DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */

is problematic. It never appeared in BSD released by Berkeley because
as noted above mnt_maxsymlinklen is 0 for old format filesystems, so
will always fall through to the VOP_READ as it should. I had to dig
back through `git blame' to find that Rodney Grimes added it as
part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.''
He must have brought it across from an earlier FreeBSD. Unfortunately
the source-control logs for FreeBSD up to the merger with the
AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the
agreement to let FreeBSD remain unencumbered, so I cannot pin-point
where that line got added on the FreeBSD side.

The one change needed here is that mnt_maxsymlinklen is declared as
an `int' and should be changed to be `u_int64_t'.

This discovery led us to check out the code that deletes symbolic
links. Specifically

	if (vp->v_type == VLNK &&
	    (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
	     datablocks == 0)) {
		if (length != 0)
			panic("ffs_truncate: partial truncate of symlink");
		bzero(SHORTLINK(ip), (u_int)ip->i_size);
		ip->i_size = 0;
		DIP_SET(ip, i_size, 0);
		UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE);
		if (needextclean)
			goto extclean;
		return (ffs_update(vp, waitforupdate));
	}

Here too our broken symlink inode with no data blocks allocated
and a large size will segment fault as we are incorrectly using the
test that we have no data blocks to decide that it is an embdedded
symbolic link and attempting to bzero past the end of the inode.
The test for datablocks == 0 is unnecessary as the test for
ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right
thing in all cases.

The test for datablocks == 0 was added by David Greenman in this commit:

Author: David Greenman <dg@FreeBSD.org>
Date:   Tue Aug 2 13:51:05 1994 +0000

    Completed (hopefully) the kernel support for old style "fastlinks".

    Notes:
	svn path=/head/; revision=1821

I am guessing that he likely earlier added the incorrect test in the
ufs_readlink code.

I asked David if he had any recollection of why he made this change.
Amazingly, he still had a recollection of why he had made a one-line
change more than twenty years ago. And unsurpisingly it was because
he had been stuck between a rock and a hard place.

FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code
base. Prior to that, there were three years of development in all
areas of the kernel, including the filesystem code, from the combined
set of people including Bill Jolitz, Patchkit contributors, and
FreeBSD Project members. The compatibility issue at hand was caused
by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite
changes David had to find a way to provide compatibility with both
the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite.
He felt that these changes would provide compatibility with both systems.

In his words:
``My recollection is that the 'FASTLINKS' symlinks support in
FreeBSD-1.x, as implemented by Curt Mayer, worked differently than
4.4BSD. He used a spare field in the inode to duplicately store the
length. When the 4.4BSD-Lite merge was done, the optimized symlinks
support for existing filesystems (those that were initialized in
FreeBSD-1.x) were broken due to the FFS on-disk structure of
4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to
restore the backward compatibility with FreeBSD-1.x filesystems.
I think it was the best that could be done in the somewhat urgent
circumstances of the post Berkeley-USL settlement. Also, regarding
Rod's massive commit with little explanation, some context: John
Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to
the 386 platform in just 10 days. It was by far the most intense
hacking effort of my life. In addition to the porting of tons of
FreeBSD-1 code, I think we wrote more than 30,000 lines of new code
in that time to deal with the missing pieces and architectural
changes of 4.4BSD-Lite. We didn't make many notes along the way.
There was a lot of pressure to get something out to the rest of the
developer community as fast as possible, so detailed discrete commits
didn't happen - it all came as a giant wad, which is why Rod's
commit message was worded the way it was.''

Reported by:  Chuck Silvers
Tested by:    Chuck Silvers
History by:   David Greenman Lawrence
MFC after:    1 week
Sponsored by: Netflix
2021-05-16 17:04:11 -07:00
Mateusz Guzik
852088f6af vfs: add missing atomic conversion to writecount adjustment
Fixes:	("vfs: lockless writecount adjustment in set/unset text")
2021-05-14 17:42:05 +02:00
Mateusz Guzik
ca1ce50b2b vfs: add more safety against concurrent forced unmount to vn_write
1. stop re-reading ->v_mount (can become NULL)
2. stop re-reading ->v_type (can change to VBAD)
2021-05-14 14:22:22 +00:00
Mateusz Guzik
b5fb9ae687 vfs: lockless writecount adjustment in set/unset text
... for cases where this is not the first/last exec.
2021-05-14 14:22:21 +00:00
Mark Johnston
2cca77ee01 kqueue timer: Remove detached knotes from the process stop queue
There are some scenarios where a timer event may be detached when it is
on the process' kqueue timer stop queue.  If kqtimer_proc_continue() is
called after that point, it will iterate over the queue and access freed
timer structures.

It is also possible, at least in a multithreaded program, for a stopped
timer event to be scheduled without removing it from the process' stop
queue.  Ensure that we do not doubly enqueue the event structure in this
case.

Reported by:	syzbot+cea0931bb4e34cd728bd@syzkaller.appspotmail.com
Reported by:	syzbot+9e1a2f3734652015998c@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30251
2021-05-14 10:08:14 -04:00
Ed Maste
2c9764f36b regen syscall files after d51198d63b63 2021-05-13 14:09:58 -04:00
Mark Johnston
8b3c4231ab posix timers: Check for overflow when converting to ns
Disallow a time or timer period value when the conversion to nanoseconds
would overflow.  Otherwise it is possible to trigger a divison by zero
in realtime_expire_l(), where we compute the number of overruns by
dividing by the timer interval.

Fixes:	7995dae9 ("posix timers: Improve the overrun calculation")
Reported by:	syzbot+5ab360bd3d3e3c5a6e0e@syzkaller.appspotmail.com
Reported by:	syzbot+157b74ff493140d86eac@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30233
2021-05-13 08:34:03 -04:00
Mark Johnston
9246b3090c fork: Suspend other threads if both RFPROC and RFMEM are not set
Otherwise, a multithreaded parent process may trigger races in
vm_forkproc() if one thread calls rfork() with RFMEM set and another
calls rfork() without RFMEM.

Also simplify vm_forkproc() a bit, vmspace_unshare() already checks to
see if the address space is shared.

Reported by:	syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com
Reported by:	syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30220
2021-05-13 08:33:23 -04:00
Mateusz Guzik
cef8a95acb vfs: fix vnode use count leak in O_EMPTY_PATH support
The vnode returned by namei_setup is already referenced.

Reported by:	pho
2021-05-13 09:39:27 +00:00
Konstantin Belousov
6de3cf14c4 vn_open_cred(): disallow O_CREAT | O_EMPTY_PATH
This combination does not make sense, and cannot be satisfied by lookup.
In particular, lookup cannot supply dvp, it only can directly return vp.

Reported and reviewed by:	markj using syzkaller
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-05-13 02:32:04 +03:00
Mark Johnston
d8acd2681b Fix mbuf leaks in various pru_send implementations
The various protocol implementations are not very consistent about
freeing mbufs in error paths.  In general, all protocols must free both
"m" and "control" upon an error, except if PRUS_NOTREADY is specified
(this is only implemented by TCP and unix(4) and requires further work
not handled in this diff), in which case "control" still must be freed.

This diff plugs various leaks in the pru_send implementations.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30151
2021-05-12 13:00:09 -04:00
Mateusz Guzik
12288bd999 cache: fix lockless absolute symlink traversal to non-fp mounts
Said lookups would incorrectly fail with EOPNOTSUP.

Reported by:	kib
2021-05-11 04:30:12 +00:00
Mark Johnston
c8bbb1272c vfs: Fix error handling in vn_fullpath_hardlink()
vn_fullpath_any_smr() will return a positive error number if the
caller-supplied buffer isn't big enough.  In this case the error must be
propagated up, otherwise we may copy out uninitialized bytes.

Reported by:	syzkaller+KMSAN
Reviewed by:	mjg, kib
MFC aftr:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30198
2021-05-10 20:22:27 -04:00
Konstantin Belousov
5e7cdf1817 openat(2): add O_EMPTY_PATH
It reopens the passed file descriptor, checking the file backing vnode'
current access rights against open mode. In particular, this flag allows
to convert file descriptor opened with O_PATH, into operable file
descriptor, assuming permissions allow that.

Reviewed by:	markj
Tested by:	Andrew Walker <awalker@ixsystems.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30148
2021-05-11 02:39:24 +03:00
Konstantin Belousov
d474440ab3 Constify vm_pager-related virtual tables.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Mark Johnston
9a7c2de364 realloc: Fix KASAN(9) shadow map updates
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA.  This value is "alloc".  Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map.  Do so using the correct size.

Reported by:	kp
Tested by:	kp
Sponsored by:	The FreeBSD Foundation
2021-05-05 17:12:51 -04:00
Warner Losh
a512d0ab00 kern: clarify boot time
In FreeBSD, the current time is computed from uptime + boottime. Uptime
is a continuous, smooth function that's monotonically increasing. To
effect changes to the current time, boottime is adjusted.  boottime is
mutable and shouldn't be cached against future need. Document the
current implementation, with the caveat that we may stop stepping
boottime on resume in the future and will step uptime instead (noted in
the commit message, but not in the code).

Sponsored by:		Netflix
Reviewed by:		phk, rpokala
Differential Revision:	https://reviews.freebsd.org/D30116
2021-05-05 12:32:13 -06:00