Commit Graph

279 Commits

Author SHA1 Message Date
Mateusz Piotrowski
e2a03adb53 Fix a typo in a license comment
Approved by:	kaktus (src)
2020-11-12 15:50:18 +00:00
John Baldwin
f54c6ef100 Use a template assembly file to generate the embedded MFS.
This uses the .incbin directive to pull in the MFS image contents.
Using assembly directly ensures that symbols can be defined with the
name and properties (such as .size) desired without having to rename
symbols, etc. via a second objcopy invocation.  Since it is compiled
by the C compiler driver, it also avoids the need for all of the
EMBEDFS* make variables.

Suggested by:	jrtc27
Reviewed by:	kib, markj
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26781
2020-10-20 16:48:45 +00:00
Mateusz Guzik
1224a253d8 md: clean up empty lines in .c and .h files 2020-09-01 22:08:52 +00:00
Mark Johnston
3507b8d467 Remove some redundant assignments and computations.
Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25400
2020-06-28 21:34:38 +00:00
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Jeff Roberson
7aaf252c96 Convert a few triviail consumers to the new unlocked grab API.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23847
2020-02-28 20:34:30 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mark Johnston
2c14385aa2 Fix a page leak in the md(4) swap I/O path.
r356147 removed a vm_page_activate() call, but this is required to
ensure that pages end up in the page queues in the first place.

Restore the pre-r356157 logic.  Now, without the page lock, the
vm_page_active() check is racy, but this race is harmless.

Reviewed by:	alc, kib
Reported and tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23024
2020-01-03 18:48:53 +00:00
Alexander Motin
5a93d93e8b Avoid duplicate I/O statistics accounting.
Alike to geom_disk free the provider statistics structure and point GEOM
toward local statistics.  It allows to save some CPU time.

MFC after:	2 weeks
2020-01-03 04:37:47 +00:00
Alexander Motin
024932aae9 Use atomic for start_count in devstat_start_transaction().
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places.  It would be cool if we had
more SMP-friendly statistics, but this helps too.

Sponsored by:	iXsystems, Inc.
2019-12-30 03:13:38 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Jeff Roberson
a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Jeff Roberson
0f9e06e18b Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22611
2019-12-02 22:42:05 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Brooks Davis
dcb235ab9e md(4): remove the unused and unusable MDIOCLIST ioctl.
It is unused, the ABI was broken in r322969, and it is broken by design
(more than MDNPAD md devices can exist and there is no way to retreive
them with this interface).

mdconfig(8) was converted to use libgeom to obtain this information
in r157160 and any other consumers of MDIOCLIST should likewise be
converted.

Reviewed by:	emaste
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18936
2019-08-16 18:57:32 +00:00
Kirk McKusick
7e1a6d4777 When using the force option to shut down a memory-disk device,
I/O operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the device driver
needs to wait for the draining to happen. The waiting is done by
adding a g_md_providergone() function to wait for the I/O operations
to finish up.

It is likely that every GEOM provider that implements orphaning
attached GEOM consumers needs to use the "providergone" mechanism
for this same reason, but some of them do not do so. Apparently
Kenneth Merry (ken@) added the drain for just such races, but he
missed adding it to some of the device drivers that needed it.

Submitted by: Chuck Silvers
Reviewed by:  imp
Tested by:    Chuck Silvers
MFC after:    1 week
Sponsored by: Netflix
2019-03-31 21:34:58 +00:00
Gleb Smirnoff
756a541279 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
  pbufs are we going to have set.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed wrt to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00
Bruce Evans
c907940bf9 Fix devstat on md devices, second attempt. r341765 depends on
g_io_deliver() finishing initialization of the bio, but g_io_deliver()
actually destroys the bio.  INVARIANTS makes the bug obvious by
overwriting the bio with garbage.

Restore the old order for calling devstat (except don't restore not calling
it for the error case), and translate to the devstat KPI so that this order
works.

Reviewed by:	kib
2018-12-22 22:59:11 +00:00
Bruce Evans
9e5ed8593f Use VOP_ADVISE() with POSIX_FADV_DONTNEED instead of IO_DIRECT to
implement not double-caching for reads from vnode-backed md devices.
Use VOP_ADVISE() similarly instead of !IO_DIRECT unsimilarly for writes.
Add a "cache" option to mdconfig to allow changing the default of not
caching.

This depends on a recent commit to fix VOP_ADVISE().  A previous version
had optimizations for sequential i/o's (merge the i/o's and only uncache
for discontiguous i/o's and for full blocks), but optimizations and
knowledge of block boundaries belong in VOP_ADVISE().  Read-ahead should
also be handled better, by supporting it in md and discarding it in
VOP_ADVISE().

POSIX_FADV_DONTNEED is ignored by zfs, but so is IO_DIRECT.

POSIX_FADV_DONTNEED works better than IO_DIRECT if it is not ignored,
since it only discards from the buffer cache immediately, while
IO_DIRECT also discards from the page cache immediately.

IO_DIRECT was not used for writes since it was claimed to be too slow,
but most of the slowness for writes is from doing them synchronously by
default.  Non-synchronous writes still deadlock in many cases.

IO_DIRECT only has a special implementation for ffs reads with DIRECTIO
configured.  Otherwise, if it is not ignored than it uses the buffer and
page caches normally except for discarding everything after each i/o,
and then it has much the same overheads as POSIX_FADV_DONTNEED.  The
overheads for reading with ffs and DIRECTIO were similar in tests of md.

Reviewed by:	kib
2018-12-21 08:15:31 +00:00
Bruce Evans
dac6a0d559 Fix devstat on md devices.
devstat_end_transaction() was called before the i/o was actually ended
(by delivering it to GEOM), so at least the i/o length was messed up.
It was always recorded as 0, so the average transaction size and the
average transfer rate was always displayed as 0.

devstat_end_transaction() was not called at all for the error case, so
there were sometimes multiple starts per end.  I didn't observe this in
practice and don't know if it did much damage.  I think it extended the
length of the i/o to the next transaction.

Reviewed by:	kib
2018-12-09 15:34:20 +00:00
Breno Leitao
7b2c7b92be md: use prestaged mfs_root
On PowerNV systems, the rootfs is passed through kexec, which loads the rootfs
into memory and set two fdt entries to describe where the file is located in
the memory;

I need to pass this memory region to the md device as a mfs_root, but, current
md driver does not support two things:

 * Just getting a pointer from an external (bootloader) memory. If I need to
workaround it, I would need to declare a static array and memcopy from this
external memory to this static variable.

 * The size of the image. The usage of mfs_root_end, which is not a pointer,
seems to be not possible for this prestaged scenario.

This patch simply adds a new way to load mfs_root from memory.

Differential Revision: https://reviews.freebsd.org/D15625
Approved by: kib, jhibbits (mentor)
2018-06-07 13:57:34 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Brooks Davis
026cac8213 Move 32-bit compat for md(4) ioctls into the md code.
This is more correct in that ioctl commands have no meaning until they
hit the handler associated with the file descriptor.

Add support for MDIOCRESIZE_32 which was missed when it was added.

Reviewed by:	cem, kib, markj (various versions)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14714
2018-03-27 16:07:54 +00:00
Brooks Davis
34a77b9741 Move uio enums to sys/_uio.h.
Include _uio.h instead of uio.h in several headers to reduce header
polution.

Fix a few places that relied on header polution to get the uio.h header.

I have not moved struct uio as many more things that use it rely on
header polution to get other definitions from uio.h.

Reviewed by:	cem, kib, markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14811
2018-03-27 15:20:03 +00:00
Brooks Davis
064c9c3d42 Add a request structure and make the implementation use it.
This allows compatibility translation to take place on the stack
(md_ioctl is too big) and is more suitable as a public interface within
the kernel than the kern_ioctl interface.

Except for the initialization of the md_req from the md_ioctl
(including detection of kernel md_file pointers) and the updating
of the md_ioctl prior to return, this is a mechanical replacment
of md_ioctl and mdio with md_req and mdr.

Reviewed by:	markj, cem, kib (assorted versions)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14704
2018-03-15 21:42:49 +00:00
Brooks Davis
b65794ad84 Move implementation of ioctls into kern_*() functions.
Move locks from outside ioctl to the individual implementations.

This is the first step of changing the implementations to act on a
kernel-internal request struct rather than on struct md_ioctl and to
removing the use of kern_ioctl in mountroot.

Reviewed by:	cem, kib, markj (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14700
2018-03-15 18:12:55 +00:00
Brooks Davis
94598ac9f9 Restore the behavior of returning the total number of units by
unconditionally incrementing i in the loop;

Reported by:	cem
MFC with:	r330880
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-15 16:37:43 +00:00
Brooks Davis
8b9f77a14c Don't overflow the kernel struct mdio in the MDIOCLIST ioctl.
Always terminate the list with -1 and document the ioctl behavior.
This preserves existing behavior as seen from userspace with the
addition of the unconditional termination which will not be seen by
working consumers of MDIOCLIST.

Because this ioctl can only be performed by root (in default
configurations) and is not used in the base system this bug is not
deemed to warrant either a security advisory or an eratta notice.

Reviewed by:	kib
Obtained from:	CheriBSD
Discussed with:	security-officer (gordon)
MFC after:	3 days
Security:	kernel heap buffer overflow
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14685
2018-03-13 20:39:06 +00:00
Jonathan T. Looney
f05c495660 Fix backwards MD_VERIFY logic for md devices.
If the MD_VERIFY flag is set, we should use O_VERIFY. If the MD_VERIFY flag
is not set, we should not.

Reviewed by:	stevek
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D13814
2018-01-10 00:08:57 +00:00
Ian Lepore
5cf10fb96a Add a new kernel config option, MD_ROOT_READONLY, which forces on the
MD_READONLY flag for the md device automatically instantiated during
kernel init for an mdroot filesystem.

Note that there is specifically and by design no tunable or sysctl
control over this feature.  Without this option, you already have control
over whether the mdroot fs is writeable using vfs.root.mountfrom.options
from loader(8), the root_rw_mount rcvar, and by using "mount -u[rw] /"
or equivelent on the fly.  This option is being added to provide a way
to make the mdroot fs truly immutable before userland code begins running.

Differential Revision:	https://reviews.freebsd.org/D13411
2017-12-20 18:23:22 +00:00
Pedro F. Giffuni
64de3fdd58 SPDX: use the Beerware identifier. 2017-11-30 20:33:45 +00:00
Pedro F. Giffuni
7282444b10 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:36:21 +00:00
Edward Tomasz Napierala
253f5a2edb Make md(4) support GEOM::ident for vnode-backed disks. It's based
on backing file device and inode numbers.

This is useful for gmountver(8) regression tests.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12230
2017-10-04 12:23:34 +00:00
Alan Cox
d9ccb9a89f When mdstart_swap() accesses a page that is already in the active queue,
mark the page as referenced rather than calling vm_page_activate().  This
allows the page's act_count to grow beyond ACT_INIT and better reflect
its usage.  (See also r324146, which modified a function used by tmpfs,
uiomove_object_page(), to behave in the same way.)

Reviewed by:	kib, markj
MFC after:	2 weeks
2017-10-02 07:14:32 +00:00
Maxim Sobolev
f7ca2bbe44 Add ability to label md(4) devices.
This feature comes from the fact that we rely memory-backed md(4)
in our build process heavily. However, if the build goes haywire
the allocated resources (i.e. swap and memory-backed md(4)'s) need
to be purged. It is extremely useful to have ability to attach
arbitrary labels to each of the virtual disks so that they can
be identified and GC'ed if neecessary.

MFC after:	4 weeks
Differential Revision:	https://reviews.freebsd.org/D10457
2017-08-28 15:54:07 +00:00
Mark Johnston
0d48e7e839 Don't call vm_pager_page_unswapped() when writing or deleting a dirty page.
The swap space backing a clean page is released when it is first dirtied,
so there's no need to attempt to release swap space when the page is
already dirty.

Reviewed by:	alc
MFC after:	1 week
2017-06-14 03:55:11 +00:00
Mark Johnston
01bc16bb0e Free the request page if an I/O error occurs while reading from swap.
After such a failure, the page is invalid, so there's point in keeping it
around. Moreover, such pages were not being inserted into the active queue,
making them unreclaimable until a subsequent write or delete made them
valid.

Reported by:	alc
Reviewed by:	alc (previous revision)
MFC after:	1 week
2017-06-14 03:50:02 +00:00
Mark Johnston
cc2fe2b0fb Fix handling of subpage BIO_WRITE and BIO_DELETE requests on swap MDs.
Such requests would previously mark the entire page as valid, which was
incorrect since nothing guaranteed that the page's contents had been
initialized. This change also modifies subpage BIO_DELETEs so that the
entire page is marked dirty, rather than only a subrange. There is no
benefit to creating partially dirty swap pages.

Reviewed by:	alc, kib (previous version)
MFC after:	3 days
2017-06-14 03:45:26 +00:00
Stephen J. Kiernan
9a81ba0f24 Add MD_VERIFY option to enable O_VERIFY in open for vnode type.
Add -o [no]verify option to mdconfig (and document in man page.)
Implement GEOM attribute MNT::verified to ask md if the backing vnode is
  verified.
Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if
  the underlying device has been verified.

Reviewed by:	rwatson
Approved by:	sjg (mentor)
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D2902
2017-05-31 21:18:11 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Pedro F. Giffuni
4d24901ac9 sys/dev: Replace zero with NULL for pointers.
Makes things easier to read, plus architectures may set NULL to something
different than zero.

Found with:	devel/coccinelle
MFC after:	3 weeks
2017-02-20 03:43:12 +00:00
Stephen J. Kiernan
e895e7fce7 Fix typo where opening brace was needed.
Reported by:	Michael Butler
Reviewed by:	sjg
Approved by:	sjg (mentor)
2017-02-13 18:52:26 +00:00
Stephen J. Kiernan
d2e6391342 For MD_PRELOAD type md(4) devices, if there is a file name in the preloaded
meta-data, copy it into the softc structure.

When returning md(4) device details to the caller, include the file name in
any MD_PRELOAD type devices if it is set (first character is not NUL.)

In mdconfig, for "preload" type md(4) devices, if there is file config
available, print it in the file column of the output.

Reviewed by:	brooks
Approved by:	sjg (mentor)
MFC after:	1 month
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D9529
2017-02-13 17:44:07 +00:00
Maxim Sobolev
c3d1c73fa9 For the MD_ROOT option don't inject /dev/md0 as root dev when ROOTDEVNAME
is defined explicitly. It's kinda pointless and results in extra step in
boot sequence which is not really needed, i.e.:

md0: Embedded image 1331200 bytes at 0x8038b7b4
Trying to mount root from ufs:/dev/md0 []...
Mounting from ufs:/dev/md0 failed with error 22.
Trying to mount root from ufs:md0.uzip []...
warning: no time-of-day clock registered, system time will not be set accurately
start_init: trying /sbin/init
2016-03-09 19:36:25 +00:00
Adrian Chadd
f4c1f0b9eb Fix MFS builds when both MD_ROOT_SIZE and MFS_IMAGE are specified
MD_ROOT_SIZE and embed_mfs.sh were basically retired as part of
https://reviews.freebsd.org/D2903 .
However, when building a kernel with 'options MD_ROOT_SIZE' specified, this
results in a non-working MFS, as within sys/dev/md/md.c we fall within the
wrong # ifdef.

This patch implements the following:

* Allow kernels to be built without the MD_ROOT_SIZE option, which results
  in a kernel built as per D2903.
* Allow kernels to be built with the MD_ROOT_SIZE option, which results
  in a kernel built similarly to the pre-D2903 way, with the following
  differences:
  * The MFS is now put in a separate section within the kernel (oldmfs,
    so it differs from the mfs section introduced by D2903).
  * embed_mfs.sh is changed, so it looks up the oldmfs section within the
    kernel, gets its size and offset, sees if the MFS will fit within the
    allocated oldmfs section and only if all is well does a dd of the MFS
    image into the kernel.

Submitted by:	Stanislav Galabov <sgalabov@gmail.com>
Reviewed by:	brooks, imp
Differential Revision:	https://reviews.freebsd.org/D5093
2016-02-02 07:02:51 +00:00
Gleb Smirnoff
b0cd20172d A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().
o With new KPI consumers can request contiguous ranges of pages, and
  unlike before, all pages will be kept busied on return, like it was
  done before with the 'reqpage' only. Now the reqpage goes away. With
  new interface it is easier to implement code protected from race
  conditions.

  Such arrayed requests for now should be preceeded by a call to
  vm_pager_haspage() to make sure that request is possible. This
  could be improved later, making vm_pager_haspage() obsolete.

  Strenghtening the promises on the business of the array of pages
  allows us to remove such hacks as swp_pager_free_nrpage() and
  vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
  values for read ahead and read behind, that a pager may do, if it
  can. These pages are completely owned by pager, and not controlled
  by the caller.

  This shifts the UFS-specific readahead logic from vm_fault.c, which
  should be file system agnostic, into vnode_pager.c. It also removes
  one VOP_BMAP() request per hard fault.

Discussed with:	kib, alc, jeff, scottl
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-12-16 21:30:45 +00:00
Konstantin Belousov
d5f998ba70 In md(4) over vnode, correct handling of the unaligned unmapped io
requests which page alignment + size is greater than MAXPHYS.  Right
now md(4) over vnode would use the physical buffer of the size MAXPHYS
to map a data of size MAXPHYS + page offset of the user buffer. This
typically corrupts next pbuf, or, if the pbuf used was the last pbuf
in the map, the next page after the pbuf's map.

Split request up to the size of io which fits into pbuf KVA with
alignment, and retry if a part of the bio is left unprocessed.

Reported by:	Fabian Keil <fk@fabiankeil.de>
Tested by:	Fabian Keil, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-12-12 14:08:29 +00:00