Commit Graph

297 Commits

Author SHA1 Message Date
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Mike Karels
58a46cfd75 md driver compat32: fix structure padding for arm, powerpc
Because the 32-bit md_ioctl structure contains 64-bit members, arm
and powerpc add padding to a multiple of 8.  i386 doesn't do this.
The md_ioctl32 definition was correct for amd64/i386 without padding,
but wrong for arm64 and powerpc64.  Make __packed__ conditional on
__amd64__, and test for the expected size on non-amd64.  Note that
mdconfig is used in the ATF test suite.  Note, I verified the
structure size for powerpc, but was unable to test.

MFC after:	1 week
Reviewed by:	jrtc27
Differential Revision:	https://reviews.freebsd.org/D41339
Discussed with:	jhibbits
2023-08-08 09:09:03 -05:00
Jessica Clarke
8a6ab0f71f Pre-quote macros passed to .incbin to avoid unwanted substitution
Currently for the MFS, firmware and VDSO template assembly files we pass
the path to include with .incbin unquoted and use __XSTRING within the
assembly file to stringify it. However, __XSTRING doesn't just perform a
single level of expansion, it performs the normal full expansion of the
macro, and so if the path itself happens to tokenise to something that
includes a defined macro in it that will itself be substituted. For
example, with #define MACRO 1, a path like /path/containing/MACRO/in/it
will expand to /path/containing/1/in/it and then, when stringified, end
up as "/path/containing/1/in/it", not the intended string. Normally,
macros have names that start or end witih underscores and are unlikely
to appear in a tokenised path (even if technically they could), but now
that we've switched to GNU C as of commit ec41a96daa ("sys: Switch the
kernel's C standard from C99 to GNU99.") there are a few new macros
defined which don't start or end with underscores: unix, which is always
defined to 1, and i386, which is defined to 1 on i386. The former
probably doesn't appear in user paths in practice, but the latter has
been seen to and is likely quite common in the wild.

Fix this by defining the macro pre-quoted instead of using __XSTRING.
Note that technically we don't need to do this for vdso_wrap.S today as
all the paths passed to it are safe file names with no user-controlled
prefix but we should do it anyway for consistency and robustness against
future changes.

This allows make tinderbox to pass when built with source and object
directories inside ~/path-with-unix, which would otherwise expand to
~/path-with-1 and break.

PR:	272744
Fixes:	ec41a96daa ("sys: Switch the kernel's C standard from C99 to GNU99.")
2023-07-28 05:08:43 +01:00
Mark Johnston
30038a8b4e md: Get rid of the pbuf zone
The zone is used solely to provide KVA for mapping BIOs so that we can
pass mapped buffers to VOP_READ and VOP_WRITE.  Currently we preallocate
nswbuf/10 bufs for this purpose during boot.

The intent was to limit KVA usage on 32-bit systems, but the
preallocation means that we in fact consumed more KVA than needed unless
one has more than nswbuf/10 (typically 25) vnode-backed MD devices
in existence, which I would argue is the uncommon case.

Meanwhile, all I/O to an MD is handled by a dedicated thread, so we can
instead simply preallocate the KVA region at MD device creation time.

Event:		BSDCan 2023
Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40215
2023-05-23 10:27:10 -04:00
Konstantin Belousov
ad8feb1ed9 md.c: another style fix
Noted by:	jkim
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-01-20 04:39:36 +02:00
Konstantin Belousov
6189672e60 Handle ERELOOKUP from VOP_FSYNC() in several other places
We need to repeat the operation if the vnode was relocked.

Reported and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38114
2023-01-20 03:54:56 +02:00
Mateusz Guzik
bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Aleksandr Fedorov
cb28dfb27d md(4): Add dummy support of the BIO_FLUSH command for malloc and swap
backend.

PR:	260200
Reported by:	editor@callfortesting.org
Reviewed by:	vmaffione (mentor), markj
Approved by:	vmaffione (mentor), markj
Differential Revision:	https://reviews.freebsd.org/D34260
2022-02-17 22:21:56 +03:00
Kyle Evans
b9c92d631c Annotate geom_md with MODULE_VERSION
This was missed in 74d6c131cb where other geom modules were annotated
with MODULE_VERSION.  Again, the problem is the same: we can't detect
that geom_md is loaded into the kernel without it.

This was noticed in release builds on the cluster; mdconfig attempts to
load geom_md because it can't detect it in the kernel, but the cluster
config includes md(4) and does not build the kmod.  This problem would
have been masked on hosts with the kmod built, as the kmod attempts to
register the g_md module and fails.  With this commit, mdconfig would
not even try to load it again.

Reported by:	re (cperciva)
MFC after:	3 days
2022-02-10 00:16:19 -06:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Ka Ho Ng
3703c18883 md: Add MD_MUSTDEALLOC support
This adds an option to detect if hole-punching is implemented by the
underlying file system.  If this flag is set, and if the underlying file
system does not support hole-punching, md(4) fails BIO_DELETE requests
with EOPNOTSUPP.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D31883
2021-09-11 20:04:52 +08:00
Mark Johnston
47619b6044 md: Clamp to a multiple of the sector size when resizing
We do this when creating md(4) devices, in kern_mdattach_locked(), but
not when resizing the provider.  Apply the same policy when resizing, as
many GEOM classes do not expect to deal with providers for which
pp->mediasize % pp->sectorsize != 0.

Reported by:	syzkaller
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-08-31 15:50:04 -04:00
Ka Ho Ng
78267c2e70 md: Replace BIO_DELETE emulation with vn_deallocate(9)
Both zero-filling and/or deallocation can be done with vn_deallocate(9).

Sponsored by:	The FreeBSD Foundation
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D28899
2021-08-19 18:30:13 +08:00
Alex Richardson
69e18c9b7b sys/dev/md: Drop unncessary __GLOBL(mfs_root)
LLVM12 complains if you change the symbol binding:
error: mfs_root_end changed binding to STB_WEAK [-Werror,-Winline-asm]
error: mfs_root changed binding to STB_WEAK [-Werror,-Winline-asm]
2021-03-30 14:59:43 +01:00
Mark Johnston
c4cceb1d0d md: Fix a race in mdstart_swap()
Release a grabbed page's busy state only after marking it as referenced.
Otherwise there exists a narrow window where the page could be freed
before the update.  Before r356902 this was not a problem since the
object lock was held.

Discussed with:	kib
Sponsored by:	The FreeBSD Foundation
2021-01-04 08:26:14 -05:00
Mark Johnston
795a009b32 md: Set bio_completed properly in the face of errors
Account for any residual bytes.  This is only relevant for vnode-backed
md(4) devices.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27738
2020-12-27 16:49:35 -05:00
Mark Johnston
a7a7c306bf md: Fix a read-after-free in BIO_GETATTR handling
g_handleattr_int() consumes the bio if the attribute matches, so when we
check bp->bio_cmd bp may have been freed.

Move GETATTR handling to a separate function to avoid the problem.  We
do not need to set bio_completed for such bios, g_handleattr_int() will
handle it.  Also remove the setting of bio_resid before the
devstat_end_transaction_bio() call.  All of the md(4) bio handlers set
bio_resid already.

Reported by:	KASAN
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27724
2020-12-23 11:16:40 -05:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Mateusz Piotrowski
e2a03adb53 Fix a typo in a license comment
Approved by:	kaktus (src)
2020-11-12 15:50:18 +00:00
John Baldwin
f54c6ef100 Use a template assembly file to generate the embedded MFS.
This uses the .incbin directive to pull in the MFS image contents.
Using assembly directly ensures that symbols can be defined with the
name and properties (such as .size) desired without having to rename
symbols, etc. via a second objcopy invocation.  Since it is compiled
by the C compiler driver, it also avoids the need for all of the
EMBEDFS* make variables.

Suggested by:	jrtc27
Reviewed by:	kib, markj
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26781
2020-10-20 16:48:45 +00:00
Mateusz Guzik
1224a253d8 md: clean up empty lines in .c and .h files 2020-09-01 22:08:52 +00:00
Mark Johnston
3507b8d467 Remove some redundant assignments and computations.
Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25400
2020-06-28 21:34:38 +00:00
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Jeff Roberson
7aaf252c96 Convert a few triviail consumers to the new unlocked grab API.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23847
2020-02-28 20:34:30 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mark Johnston
2c14385aa2 Fix a page leak in the md(4) swap I/O path.
r356147 removed a vm_page_activate() call, but this is required to
ensure that pages end up in the page queues in the first place.

Restore the pre-r356157 logic.  Now, without the page lock, the
vm_page_active() check is racy, but this race is harmless.

Reviewed by:	alc, kib
Reported and tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23024
2020-01-03 18:48:53 +00:00
Alexander Motin
5a93d93e8b Avoid duplicate I/O statistics accounting.
Alike to geom_disk free the provider statistics structure and point GEOM
toward local statistics.  It allows to save some CPU time.

MFC after:	2 weeks
2020-01-03 04:37:47 +00:00
Alexander Motin
024932aae9 Use atomic for start_count in devstat_start_transaction().
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places.  It would be cool if we had
more SMP-friendly statistics, but this helps too.

Sponsored by:	iXsystems, Inc.
2019-12-30 03:13:38 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Jeff Roberson
a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Jeff Roberson
0f9e06e18b Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22611
2019-12-02 22:42:05 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Brooks Davis
dcb235ab9e md(4): remove the unused and unusable MDIOCLIST ioctl.
It is unused, the ABI was broken in r322969, and it is broken by design
(more than MDNPAD md devices can exist and there is no way to retreive
them with this interface).

mdconfig(8) was converted to use libgeom to obtain this information
in r157160 and any other consumers of MDIOCLIST should likewise be
converted.

Reviewed by:	emaste
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18936
2019-08-16 18:57:32 +00:00
Kirk McKusick
7e1a6d4777 When using the force option to shut down a memory-disk device,
I/O operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the device driver
needs to wait for the draining to happen. The waiting is done by
adding a g_md_providergone() function to wait for the I/O operations
to finish up.

It is likely that every GEOM provider that implements orphaning
attached GEOM consumers needs to use the "providergone" mechanism
for this same reason, but some of them do not do so. Apparently
Kenneth Merry (ken@) added the drain for just such races, but he
missed adding it to some of the device drivers that needed it.

Submitted by: Chuck Silvers
Reviewed by:  imp
Tested by:    Chuck Silvers
MFC after:    1 week
Sponsored by: Netflix
2019-03-31 21:34:58 +00:00
Gleb Smirnoff
756a541279 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
  pbufs are we going to have set.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed wrt to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00
Bruce Evans
c907940bf9 Fix devstat on md devices, second attempt. r341765 depends on
g_io_deliver() finishing initialization of the bio, but g_io_deliver()
actually destroys the bio.  INVARIANTS makes the bug obvious by
overwriting the bio with garbage.

Restore the old order for calling devstat (except don't restore not calling
it for the error case), and translate to the devstat KPI so that this order
works.

Reviewed by:	kib
2018-12-22 22:59:11 +00:00
Bruce Evans
9e5ed8593f Use VOP_ADVISE() with POSIX_FADV_DONTNEED instead of IO_DIRECT to
implement not double-caching for reads from vnode-backed md devices.
Use VOP_ADVISE() similarly instead of !IO_DIRECT unsimilarly for writes.
Add a "cache" option to mdconfig to allow changing the default of not
caching.

This depends on a recent commit to fix VOP_ADVISE().  A previous version
had optimizations for sequential i/o's (merge the i/o's and only uncache
for discontiguous i/o's and for full blocks), but optimizations and
knowledge of block boundaries belong in VOP_ADVISE().  Read-ahead should
also be handled better, by supporting it in md and discarding it in
VOP_ADVISE().

POSIX_FADV_DONTNEED is ignored by zfs, but so is IO_DIRECT.

POSIX_FADV_DONTNEED works better than IO_DIRECT if it is not ignored,
since it only discards from the buffer cache immediately, while
IO_DIRECT also discards from the page cache immediately.

IO_DIRECT was not used for writes since it was claimed to be too slow,
but most of the slowness for writes is from doing them synchronously by
default.  Non-synchronous writes still deadlock in many cases.

IO_DIRECT only has a special implementation for ffs reads with DIRECTIO
configured.  Otherwise, if it is not ignored than it uses the buffer and
page caches normally except for discarding everything after each i/o,
and then it has much the same overheads as POSIX_FADV_DONTNEED.  The
overheads for reading with ffs and DIRECTIO were similar in tests of md.

Reviewed by:	kib
2018-12-21 08:15:31 +00:00
Bruce Evans
dac6a0d559 Fix devstat on md devices.
devstat_end_transaction() was called before the i/o was actually ended
(by delivering it to GEOM), so at least the i/o length was messed up.
It was always recorded as 0, so the average transaction size and the
average transfer rate was always displayed as 0.

devstat_end_transaction() was not called at all for the error case, so
there were sometimes multiple starts per end.  I didn't observe this in
practice and don't know if it did much damage.  I think it extended the
length of the i/o to the next transaction.

Reviewed by:	kib
2018-12-09 15:34:20 +00:00
Breno Leitao
7b2c7b92be md: use prestaged mfs_root
On PowerNV systems, the rootfs is passed through kexec, which loads the rootfs
into memory and set two fdt entries to describe where the file is located in
the memory;

I need to pass this memory region to the md device as a mfs_root, but, current
md driver does not support two things:

 * Just getting a pointer from an external (bootloader) memory. If I need to
workaround it, I would need to declare a static array and memcopy from this
external memory to this static variable.

 * The size of the image. The usage of mfs_root_end, which is not a pointer,
seems to be not possible for this prestaged scenario.

This patch simply adds a new way to load mfs_root from memory.

Differential Revision: https://reviews.freebsd.org/D15625
Approved by: kib, jhibbits (mentor)
2018-06-07 13:57:34 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Brooks Davis
026cac8213 Move 32-bit compat for md(4) ioctls into the md code.
This is more correct in that ioctl commands have no meaning until they
hit the handler associated with the file descriptor.

Add support for MDIOCRESIZE_32 which was missed when it was added.

Reviewed by:	cem, kib, markj (various versions)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14714
2018-03-27 16:07:54 +00:00
Brooks Davis
34a77b9741 Move uio enums to sys/_uio.h.
Include _uio.h instead of uio.h in several headers to reduce header
polution.

Fix a few places that relied on header polution to get the uio.h header.

I have not moved struct uio as many more things that use it rely on
header polution to get other definitions from uio.h.

Reviewed by:	cem, kib, markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14811
2018-03-27 15:20:03 +00:00
Brooks Davis
064c9c3d42 Add a request structure and make the implementation use it.
This allows compatibility translation to take place on the stack
(md_ioctl is too big) and is more suitable as a public interface within
the kernel than the kern_ioctl interface.

Except for the initialization of the md_req from the md_ioctl
(including detection of kernel md_file pointers) and the updating
of the md_ioctl prior to return, this is a mechanical replacment
of md_ioctl and mdio with md_req and mdr.

Reviewed by:	markj, cem, kib (assorted versions)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14704
2018-03-15 21:42:49 +00:00
Brooks Davis
b65794ad84 Move implementation of ioctls into kern_*() functions.
Move locks from outside ioctl to the individual implementations.

This is the first step of changing the implementations to act on a
kernel-internal request struct rather than on struct md_ioctl and to
removing the use of kern_ioctl in mountroot.

Reviewed by:	cem, kib, markj (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14700
2018-03-15 18:12:55 +00:00
Brooks Davis
94598ac9f9 Restore the behavior of returning the total number of units by
unconditionally incrementing i in the loop;

Reported by:	cem
MFC with:	r330880
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-15 16:37:43 +00:00
Brooks Davis
8b9f77a14c Don't overflow the kernel struct mdio in the MDIOCLIST ioctl.
Always terminate the list with -1 and document the ioctl behavior.
This preserves existing behavior as seen from userspace with the
addition of the unconditional termination which will not be seen by
working consumers of MDIOCLIST.

Because this ioctl can only be performed by root (in default
configurations) and is not used in the base system this bug is not
deemed to warrant either a security advisory or an eratta notice.

Reviewed by:	kib
Obtained from:	CheriBSD
Discussed with:	security-officer (gordon)
MFC after:	3 days
Security:	kernel heap buffer overflow
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-13 20:39:06 +00:00
Jonathan T. Looney
f05c495660 Fix backwards MD_VERIFY logic for md devices.
If the MD_VERIFY flag is set, we should use O_VERIFY. If the MD_VERIFY flag
is not set, we should not.

Reviewed by:	stevek
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D13814
2018-01-10 00:08:57 +00:00