Commit Graph

17445 Commits

Author SHA1 Message Date
Konstantin Belousov
4bc5ce2c74 Use tdfind() in pget().
Reviewed by:	jhb, hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25532
2020-07-02 10:40:47 +00:00
Andrew Turner
ecc8ccb441 Simplify the flow when getting/setting an isrc
Rather than unlocking and returning we can just perform the needed action
only when the interrupt source is valid and reuse the unlock in both the
valid irq and invalid irq cases.

Sponsored by:	Innovate UK
2020-07-01 12:07:28 +00:00
Mateusz Guzik
5d1c042d32 cache: lockless forward lookup with smr
This eliminates the need to take bucket locks in the common case.

Concurrent lookup utilizng the same vnodes is still bottlenecked on referencing
and locking path components, this will be taken care of separately.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:59:08 +00:00
Mateusz Guzik
f8022be3e6 vfs: protect vnodes with smr
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.

See vhold_smr and vdropl for comments explaining caveats.

Reviewed by:	kib
Testec by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:56:29 +00:00
Andrew Gallatin
46cac10b3b Fix a panic when unloading firmware
LIST_FOREACH_SAFE() is not safe in the presence
of other threads removing list entries when a
mutex is released.

This is not in the critical path, so just restart
the scan each time we drop the lock, rather than
using a marker.

Reviewed by:	jhb, markj
Sponsored by:	Netflix
2020-06-29 21:35:50 +00:00
John Baldwin
4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Enji Cooper
d6701b6c8c Add kern.features.witness
Adding `kern.features.witness` helps expose whether or not the kernel has
`options WITNESS` enabled, so the `feature_present(3)` API can be used
to query whether or not witness(9) is built into the kernel.

This support is helpful with userspace applications (generally speaking,
tests), as it can be queried to determine whether or not tests related
to WITNESS should be run.

MFC after:	1 week
Reviewed by: cem, darrick.freebsd_gmail.com
Differential Revision: https://reviews.freebsd.org/D25302
Sponsored by:	DellEMC Isilon
2020-06-24 18:51:01 +00:00
Thomas Munro
f270658873 vfs: track sequential reads and writes separately
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by:	mjg, kib, tmunro
Approved by:	mjg (mentor)
Obtained from:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision:	https://reviews.freebsd.org/D25024
2020-06-21 08:51:24 +00:00
Jeff Roberson
03270b59ee Use zone nomenclature that is consistent with UMA. 2020-06-21 04:59:02 +00:00
Brandon Bergren
40b664f64b [PowerPC] More relocation fixes
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.

This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.

Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.

This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)

 * Remove the rest of the stoffs hack.
 * Remove my half-baked displace_symbol_table() function.
 * Extend ddb initialization to cope with having a relocation offset on the
   kernel symbol table.
 * Fix my kernel-as-initrd hack to work with booke64 by using a temporary
   mapping to access the data.
 * Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
 * Change the behavior or X_db_symbol_values to apply the relocation base
   when updating valp, to match link_elf_symbol_values() behavior.

Reviewed by:	jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D25223
2020-06-21 03:39:26 +00:00
Pawel Biernacki
049264c5cc hw.bus.info: rework handler
hw.bus.info was added in r68522 as a node, but there was never anything
connected "behind" it.  Its only purpose is to return a struct u_businfo.
The only in-base consumer are devinfo(3)/devinfo(8).
Rewrite the handler as SYSCTL_PROC and mark it as MPSAFE and read-only
as there never was a writable path.

Reviewed by:	kib
Approved by:	kib (mentor)
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25321
2020-06-18 21:42:54 +00:00
Mark Johnston
95033af923 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
Ryan Moeller
33b39b6615 Apply default security flavor in vfs_export
There may be some version of mountd out there that does not supply a default
security flavor when none is given for an export.

Set the default security flavor in vfs_export if none is given, and remove the
workaround for oexport compat.

Reported by:	npn
Reviewed by:	rmacklem
Approved by:	mav (mentor)
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25300
2020-06-16 21:30:30 +00:00
Simon J. Gerraty
73845fdbd3 Make KENV_MVALLEN tunable
When doing secure boot, loader wants to export loader.ve.hashed
the value of which typically exceeds KENV_MVALLEN.

Replace use of KENV_MVALLEN with tunable kenv_mvallen.

Add getenv_string_buffer() for the case where a stack buffer cannot be
created and use uma_zone_t kenv_zone for suitably sized buffers.

Reviewed by:	stevek, kevans
Obtained from:	Abhishek Kulkarni <abkulkarni@juniper.net>
MFC after:	1 week
Sponsored by:	Juniper Networks
Differential Revision: https://reviews.freebsd.org//D25259
2020-06-16 17:02:56 +00:00
Rick Macklem
1f7104d720 Fix export_args ex_flags field so that is 64bits, the same as mnt_flags.
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it hold a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size.
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by:	kib, freqlabs
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25088
2020-06-14 00:10:18 +00:00
Conrad Meyer
508a6e84e7 Flip kern.tty_info_kstacks on by default
It's a useful debug aid for anyone using Ctrl-T today, and doesn't seem to be
widely known.  So, enable it out of the box to help people find it.

It's a tunable and sysctl, so if you don't like it, it's easy to disable
locally.

If people really hate it, we can always flip it back.

Reported by:	Daniel O'Connor
2020-06-13 03:04:40 +00:00
Mark Johnston
4f8ad92f36 Remove the FIRMWARE_MAX limit.
The firmware module arbitrarily limits us to at most 50 images.  It is
possible to hit this limit on platforms that preload many firmware
images, or link all of the firmware images for a set of devices into the
kernel.

Convert the table into a linked list, removing the limit.

Reported by:	Steve Wheeler
Reviewed by:	rpokala
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D25161
2020-06-10 23:52:29 +00:00
Konstantin Belousov
4149c6a3ec Remove double-calls to tc_get_timecount() to warm timecounters.
It seems that second call does not add any useful state change for all
implemented timecounters.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2020-06-10 22:30:32 +00:00
Ed Maste
cff33fa8c8 Fix arm64 kernel build with DEBUG on
Submitted by:	Greg V <greg@unrelenting.technology>, andrew
Differential Revision:	https://reviews.freebsd.org/D24986
2020-06-10 16:00:43 +00:00
Rick Macklem
84d746de21 Add two functions that create M_EXTPG mbufs with anonymous pages.
These two functions are needed by nfs-over-tls, but could also be
useful for other purposes.
mb_alloc_ext_plus_pages() - Allocates a M_EXTPG mbuf and enough anonymous
      pages to store "len" data bytes.
mb_mapped_to_unmapped() - Copies the data from a list of mapped (non-M_EXTPG)
      mbufs into a list of M_EXTPG mbufs allocated with anonymous pages.
      This is roughly the inverse of mb_unmapped_to_ext().

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D25182
2020-06-10 02:51:39 +00:00
Mateusz Guzik
1724c563e6 cred: distribute reference count per thread
This avoids dirtying creds in the common case, see the comment in kern_prot.c
for details.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D24007
2020-06-09 23:03:48 +00:00
John Baldwin
58b552dcec Refactor ptrace() ABI compatibility.
Add a freebsd32_ptrace() and move as many freebsd32 shims as possible
to freebsd32_ptrace().  Aside from register sets, freebsd32 passes
pointers to native structures to kern_ptrace() and converts to/from
native/32-bit structure formats in freebsd32_ptrace() outside of
kern_ptrace().

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D25195
2020-06-09 16:43:23 +00:00
Mateusz Guzik
90a08d6cad Assert on pg_jobc state.
Stolen from NetBSD.
2020-06-09 15:17:23 +00:00
Chuck Silvers
c2ea3d44bf Fix hang due to missing unbusy in sendfile when an async data I/O fails.
r359473 removed the page unbusy logic from sendfile_iodone() because when
vm_pager_get_pages_async() would return an error after failing to start
the async I/O (eg. because VOP_BMAP failed), sendfile_swapin() would also
unbusy the pages, and it was wrong to unbusy twice.  However this breaks
the case where vm_pager_get_pages_async() succeeds in starting an async I/O
and the async I/O is what fails.  In this case, sendfile_iodone() must
unbusy the pages, and because sendfile_iodone() doesn't know which case
it is in, sendfile_iodone() must always unbusy pages and relookup pages
which have been substituted with bogus_page, which in turn means that
sendfile_swapin() must never do unbusy or relookup for pages which have
been given to vm_pager_get_pages_async(), even if there is an error.

Reviewed by:	kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D25136
2020-06-06 00:02:50 +00:00
Kyle Evans
63619b6dba vfs: add restrictions to read(2) of a directory [2/2]
This commit adds the priv(9) that waters down the sysctl to make it only
allow read(2) of a dirfd by the system root. Jailed root is not allowed, but
jail policy and superuser policy will abstain from allowing/denying it so
that a MAC module can fully control the policy.

Such a MAC module has been written, and can be found at:
https://people.freebsd.org/~kevans/mac_read_dir-0.1.0.tar.gz

It is expected that the MAC module won't be needed by many, as most only
need to do such diagnostics that require this behavior as system root
anyways. Interested parties are welcome to grab the MAC module above and
create a port or locally integrate it, and with enough support it could see
introduction to base. As noted in mac_read_dir.c, it is released under the
BSD 2 clause license and allows the restrictions to be lifted for only
jailed root or for all unprivileged users.

PR:		246412
Reviewed by:	mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by:	rgrimes (latest version)
Differential Revision:	https://reviews.freebsd.org/D24596
2020-06-04 18:17:25 +00:00
Kyle Evans
dcef4f65ae vfs: add restrictions to read(2) of a directory [1/2]
Historically, we've allowed read() of a directory and some filesystems will
accommodate (e.g. ufs/ffs, msdosfs). From the history department staffed by
Warner: <<EOF

pdp-7 unix seemed to allow reading directories, but they were weird, special
things there so I'm unsure (my pdp-7 assembler sucks).

1st Edition's sources are lost, mostly. The kernel allows it. The
reconstructed sources from 2nd or 3rd edition read it though.

V6 to V7 changed the filesystem format, and should have been a warning, but
reading directories weren't materially changed.

4.1b BSD introduced readdir because of UFS. UFS broke all directory reading
programs in 1983. ls, du, find, etc all had to be rewritten. readdir() and
friends were introduced here.

SysVr3 picked up readdir() in 1987 for the AT&T fork of Unix. SysVr4 updated
all the directory reading programs in 1988 because different filesystem
types were introduced.

In the 90s, these interfaces became completely ubiquitous as PDP-11s running
V7 faded from view and all the folks that initially started on V7 upgraded
to SysV. Linux never supported this (though I've not done the software
archeology to check) because it has always had a pathological diversity of
filesystems.
EOF

Disallowing read(2) on a directory has the side-effect of masking
application bugs from relying on other implementation's behavior
(e.g. Linux) of rejecting these with EISDIR across the board, but allowing
it has been a vector for at least one stack disclosure bug in the past[0].

By POSIX, this is implementation-defined whether read() handles directories
or not. Popular implementations have chosen to reject them, and this seems
sensible: the data you're reading from a directory is not structured in some
unified way across filesystem implementations like with readdir(2), so it is
impossible for applications to portably rely on this.

With this patch, we will reject most read(2) of a dirfd with EISDIR. Users
that know what they're doing can conscientiously set
bsd.security.allow_read_dir=1 to allow read(2) of directories, as it has
proven useful for debugging or recovery. A future commit will further limit
the sysctl to allow only the system root to read(2) directories, to make it
at least relatively safe to leave on for longer periods of time.

While we're adding logic pertaining to directory vnodes to vn_io_fault, an
additional assertion has also been added to ensure that we're not reaching
vn_io_fault with any write request on a directory vnode. Such request would
be a logical error in the kernel, and must be debugged rather than allowing
it to potentially silently error out.

Commented out shell aliases have been placed in root's chsrc/shrc to promote
awareness that grep may become noisy after this change, depending on your
usage.

A tentative MFC plan has been put together to try and make it as trivial as
possible to identify issues and collect reports; note that this will be
strongly re-evaluated. Tentatively, I will MFC this knob with the default as
it is in HEAD to improve our odds of actually getting reports. The future
priv(9) to further restrict the sysctl WILL NOT BE MERGED BACK, so the knob
will be a faithful reversion on stable/12. We will go into the merge
acknowledging that the sysctl default may be flipped back to restore
historical behavior at *any* point if it's warranted.

[0] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:10.ufs.asc

PR:		246412
Reviewed by:	mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by:	rgrimes (latest version)
MFC after:	1 month (note the MFC plan mentioned above)
Relnotes:	absolutely, but will amend previous RELNOTES entry
Differential Revision:	https://reviews.freebsd.org/D24596
2020-06-04 18:09:55 +00:00
Jason A. Harmening
ef1eabca5d vt(4): reset scrollback and cursor position after clearing history buffer
r361601 implemented basic support for cleaing the console history buffer.
But after clearing the history buffer, it's not especially useful to be
able to scroll back through that buffer, or for the cursor position to
remain at (very likely) the bottom of the screen.

PR:		224436
Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D25079
2020-06-02 01:21:48 +00:00
Rick Macklem
c13e414dc2 Fix build issue introduced by r361699.
Reported by:	cy (and others)
2020-06-02 00:03:26 +00:00
Ryan Moeller
1cfffed85d Assign default security flavor when converting old export args
vfs_export requires security flavors be explicitly listed when
exporting as of r360900.

Use the default AUTH_SYS flavor when converting old export args to
ensure compatibility with the legacy mount syscall.

Reported by:	rmacklem
Reviewed by:	rmacklem
Approved by:	mav (mentor)
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25045
2020-06-01 18:43:51 +00:00
Peter Wemm
694f3fc81c Clarify which hints file is the source of an error message.
PR:		246688
Submitted by:	Ashish Gupta <lrx337@gmail.com>
MFC after:	1 week
2020-06-01 03:37:58 +00:00
Andriy Gapon
e298466ee2 corefile_open_last: don't keep a locked vnode while locking other ones
Consider this scenario:
- kern.corefile=/var/coredumps/%N.%U.%I.core
- multiple processes with the same name crash at the same time

It's possible that one process selects existing file N as oldvp while it
keeps looking for an unused file number.  Another process scans through
files and stumbles upon N.  That process would be blocked on the vnode
lock while holding the directory vnode exclusively locked.  The first
process would, thus, get blocked on the directory's vnode lock.

More generally, holding a file's vnode lock (oldvp) while trying to lock
its directory (for the next lookup) is a violation of the vnode locking
order.

I have observed this deadlock in the wild.

So, the change to keep oldvp "opened" but unlocked and to lock it again
only if it's to be returned as the result.
As kib noted, an alternative would be to keep the directory locked and
to use VOP_LOOKUP directly for scanning through existing core files.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D25027
2020-05-29 07:44:02 +00:00
John Baldwin
2684603c5f Permit SO_NO_DDP and SO_NO_OFFLOAD to be read via getsockopt(2).
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24627
2020-05-29 00:09:12 +00:00
Rick Macklem
c01cd3f558 Update the files created from the new syscalls.master from r361599.
Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D24949
2020-05-28 21:23:02 +00:00
Rick Macklem
d9021e389a Add a syscall for the nfs-over-tls daemons to use.
The nfs-over-tls daemons need a system call to perform operations such as
associate a file descriptor with a krpc socket.
The daemons will not be in head for some time, but it will make it
easier for testers of nfs-over-tls to do testing if the system call
is in head (basically the stub for libc which will be commited soon).

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D24949
2020-05-28 21:06:10 +00:00
Rick Macklem
469f2e9e9a Fix sosend() for the case where mbufs are passed in while doing ktls.
For kernel tls, sosend() needs to call ktls_frame() on the mbuf list
to be sent.  Without this patch, this was only done when sosend()'s
arguments used a uio_iov and not when an mbuf list is passed in.
At this time, sosend() is never called with an mbuf list argument when
kernel tls is in use, but will be once nfs-over-tls has been incorporated
into head.

Reviewed by:	gallatin, glebius
Differential Revision:	https://reviews.freebsd.org/D24674
2020-05-27 23:20:35 +00:00
Hans Petter Selasky
4f3c0f3d1e Fix build issue after r360292 when using both RSS and KERN_TLS options.
Sponsored by:	Mellanox Technologies
2020-05-26 08:25:24 +00:00
Chuck Silvers
d79ff54b5c This commit enables a UFS filesystem to do a forcible unmount when
the underlying media fails or becomes inaccessible. For example
when a USB flash memory card hosting a UFS filesystem is unplugged.

The strategy for handling disk I/O errors when soft updates are
enabled is to stop writing to the disk of the affected file system
but continue to accept I/O requests and report that all future
writes by the file system to that disk actually succeed. Then
initiate an asynchronous forced unmount of the affected file system.

There are two cases for disk I/O errors:

   - ENXIO, which means that this disk is gone and the lower layers
     of the storage stack already guarantee that no future I/O to
     this disk will succeed.

   - EIO (or most other errors), which means that this particular
     I/O request has failed but subsequent I/O requests to this
     disk might still succeed.

For ENXIO, we can just clear the error and continue, because we
know that the file system cannot affect the on-disk state after we
see this error. For EIO or other errors, we arrange for the geom_vfs
layer to reject all future I/O requests with ENXIO just like is
done when the geom_vfs is orphaned. In both cases, the file system
code can just clear the error and proceed with the forcible unmount.

This new treatment of I/O errors is needed for writes of any buffer
that is involved in a dependency. Most dependencies are described
by a structure attached to the buffer's b_dep field. But some are
created and processed as a result of the completion of the dependencies
attached to the buffer.

Clearing of some dependencies require a read. For example if there
is a dependency that requires an inode to be written, the disk block
containing that inode must be read, the updated inode copied into
place in that buffer, and the buffer then written back to disk.

Often the needed buffer is already in memory and can be used. But
if it needs to be read from the disk, the read will fail, so we
fabricate a buffer full of zeroes and pretend that the read succeeded.
This zero'ed buffer can be updated and written back to disk.

The only case where a buffer full of zeros causes the code to do
the wrong thing is when reading an inode buffer containing an inode
that still has an inode dependency in memory that will reinitialize
the effective link count (i_effnlink) based on the actual link count
(i_nlink) that we read. To handle this case we now store the i_nlink
value that we wrote in the inode dependency so that it can be
restored into the zero'ed buffer thus keeping the tracking of the
inode link count consistent.

Because applications depend on knowing when an attempt to write
their data to stable storage has failed, the fsync(2) and msync(2)
system calls need to return errors if data fails to be written to
stable storage. So these operations return ENXIO for every call
made on files in a file system where we have otherwise been ignoring
I/O errors.

Coauthered by: mckusick
Reviewed by:   kib
Tested by:     Peter Holm
Approved by:   mckusick (mentor)
Sponsored by:  Netflix
Differential Revision:  https://reviews.freebsd.org/D24088
2020-05-25 23:47:31 +00:00
John Baldwin
9c0e3d3a53 Add support for optional separate output buffers to in-kernel crypto.
Some crypto consumers such as GELI and KTLS for file-backed sendfile
need to store their output in a separate buffer from the input.
Currently these consumers copy the contents of the input buffer into
the output buffer and queue an in-place crypto operation on the output
buffer.  Using a separate output buffer avoids this copy.

- Create a new 'struct crypto_buffer' describing a crypto buffer
  containing a type and type-specific fields.  crp_ilen is gone,
  instead buffers that use a flat kernel buffer have a cb_buf_len
  field for their length.  The length of other buffer types is
  inferred from the backing store (e.g. uio_resid for a uio).
  Requests now have two such structures: crp_buf for the input buffer,
  and crp_obuf for the output buffer.

- Consumers now use helper functions (crypto_use_*,
  e.g. crypto_use_mbuf()) to configure the input buffer.  If an output
  buffer is not configured, the request still modifies the input
  buffer in-place.  A consumer uses a second set of helper functions
  (crypto_use_output_*) to configure an output buffer.

- Consumers must request support for separate output buffers when
  creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag and are
  only permitted to queue a request with a separate output buffer on
  sessions with this flag set.  Existing drivers already reject
  sessions with unknown flags, so this permits drivers to be modified
  to support this extension without requiring all drivers to change.

- Several data-related functions now have matching versions that
  operate on an explicit buffer (e.g. crypto_apply_buf,
  crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf).

- Most of the existing data-related functions operate on the input
  buffer.  However crypto_copyback always writes to the output buffer
  if a request uses a separate output buffer.

- For the regions in input/output buffers, the following conventions
  are followed:
  - AAD and IV are always present in input only and their
    fields are offsets into the input buffer.
  - payload is always present in both buffers.  If a request uses a
    separate output buffer, it must set a new crp_payload_start_output
    field to the offset of the payload in the output buffer.
  - digest is in the input buffer for verify operations, and in the
    output buffer for compute operations.  crp_digest_start is relative
    to the appropriate buffer.

- Add a crypto buffer cursor abstraction.  This is a more general form
  of some bits in the cryptosoft driver that tried to always use uio's.
  However, compared to the original code, this avoids rewalking the uio
  iovec array for requests with multiple vectors.  It also avoids
  allocate an iovec array for mbufs and populating it by instead walking
  the mbuf chain directly.

- Update the cryptosoft(4) driver to support separate output buffers
  making use of the cursor abstraction.

Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24545
2020-05-25 22:12:04 +00:00
Conrad Meyer
852c303b61 copystr(9): Move to deprecate (attempt #2)
This reapplies logical r360944 and r360946 (reverting r360955), with fixed
copystr() stand-in replacement macro.  Eventually the goal is to convert
consumers and kill the macro, but for a first step it helps if the macro is
correct.

Prior commit message:

Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults.  It's just
an older incarnation of the now-more-common strlcpy().

Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.

Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy (with correction from brooks@ -- thanks).

Remove N redundant MI implementations of copystr.  For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.

Reviewed by:		jhb (earlier version)
Discussed with:		brooks (thanks!)
Differential Revision:	https://reviews.freebsd.org/D24672
2020-05-25 16:40:48 +00:00
Mateusz Guzik
5a90435ccf proc: refactor clearing credentials into proc_unset_cred 2020-05-25 12:41:44 +00:00
Mateusz Guzik
e3d16bb6a8 vfs: use atomic_{store,load}_long to manage f_offset
... instead of depending on the compiler not to mess them up
2020-05-25 04:57:57 +00:00
Mateusz Guzik
442e617fd2 vfs: restore mtx-protected foffset locking for 32 bit platforms
They depend on it to accurately read the offset.

The new code is not used as it would add an interrupt enable/disable
trip on top of the atomic.

This also fixes a bug where 32-bit nolock request would still lock the offset.

No changes for 64-bit.

Reported by:	emaste
2020-05-25 04:56:41 +00:00
Mateusz Guzik
3fc40153b2 vfs: scale foffset_lock by using atomics instead of serializing on mtx pool
Contending cases still serialize on sleepq (which would be taken anyway).

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21626
2020-05-24 03:50:49 +00:00
Ryan Moeller
245bfd34da Deduplicate fsid comparisons
Comparing fsid_t objects requires internal knowledge of the fsid structure
and yet this is duplicated across a number of places in the code.

Simplify by creating a fsidcmp function (macro).

Reviewed by:	mjg, rmacklem
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D24749
2020-05-21 01:55:35 +00:00
John Baldwin
6faabe5411 Remove copyinfrom() and copyinstrfrom().
These functions were added in 2001 and are currently unused.
copyinfrom() looks to have never been used.  copyinstrfrom() was used
for two weeks before the code was refactored to remove it's sole use.

Reviewed by:	brooks, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24928
2020-05-20 20:58:17 +00:00
Mark Johnston
615a9fb5fe Use the symbolic name for "modmetadata_set".
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-05-19 18:34:50 +00:00
Christian S.J. Peron
757a564248 Add BSM record conversion for a number of syscalls:
- thr_kill(2) and thr_exit(2) generally (no argument auditing here.
- A set of syscalls for the process descriptor family, specifically:
  pdfork(2), pdgetpid(2) and pdkill(2)

  For these syscalls, audit the file descriptor. In the case of pdfork(2)
  a pointer to an integer (file descriptor) is passed in as an argument.
  We audit the post initialized file descriptor (not the random garbage
  that would have been passed in). We will also audit the child process
  which was created from the fork operation (similar to what is done for
  the fork(2) syscall).

  pdkill(2) we audit the signal value and fd, and finally pdgetpid(2)
  just the file descriptor:

- Following is a sample of the produced audit trails:

  header,111,11,pdfork(2),0,Sat May 16 03:07:50 2020, + 394 msec
  argument,0,0x39d,child PID
  argument,2,0x2,flags
  argument,1,0x8,fd
  subject,root,root,0,root,0,924,0,0,0.0.0.0
  return,success,925

  header,79,11,pdgetpid(2),0,Sat May 16 03:07:50 2020, + 394 msec
  argument,1,0x8,fd
  subject,root,root,0,root,0,924,0,0,0.0.0.0
  return,success,0
  trailer,79

  header,135,11,pdkill(2),0,Sat May 16 03:07:50 2020, + 395 msec
  argument,1,0x8,fd
  argument,2,0xf,signal
  process_ex,root,root,0,root,0,925,0,0,0.0.0.0
  subject,root,root,0,root,0,924,0,0,0.0.0.0
  return,success,0
  trailer,135

MFC after:      1 week
2020-05-16 03:45:15 +00:00
Konstantin Belousov
a2b127ae7b Improve comment for compat32 handling of sysctl hw.pagesizes.
Explain why truncation works as intended.
Reformat.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-05-15 13:53:10 +00:00
Konstantin Belousov
6820cbed71 Revert r361077 to recommit with proper message. 2020-05-15 13:52:39 +00:00