Commit Graph

95 Commits

Author SHA1 Message Date
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Edward Tomasz Napierala
d22ff249d9 Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26658
2020-10-18 16:24:08 +00:00
Mateusz Guzik
d40bc60752 geom: clean up empty lines in .c and .h files 2020-09-01 22:14:09 +00:00
Xin LI
8510f61acd sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25565
2020-07-09 02:52:39 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Warner Losh
8b522bdae6 Pass BIO_SPEEDUP through all the geom layers
While some geom layers pass unknown commands down, not all do. For the ones that
don't, pass BIO_SPEEDUP down to the providers that constittue the geom, as
applicable. No changes to vinum or virstor because I was unsure how to add this
support, and I'm also unsure how to test these. gvinum doesn't implement
BIO_FLUSH either, so it may just be poorly maintained. gvirstor is for testing
and not supportig BIO_SPEEDUP is fine.

Reviewed by: chs
Differential Revision: https://reviews.freebsd.org/D23183
2020-01-17 01:15:55 +00:00
Conrad Meyer
ac03832ef3 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
Alexander Motin
49ee0fcea5 Use sbuf_cat() in GEOM confxml generation.
When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-19 15:36:02 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Conrad Meyer
6b6e2954dd List-ify kernel dump device configuration
Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails.  E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.

This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now.  It also does not implement
IPv6 netdump.

savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.

This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device.  Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.

As part of revving the ABI, unify netdump and disk dump configuration ioctl
/ structure, and leave room for ipv6 netdump as a future possibility.
Backwards-compatibility ioctls are added to smooth ABI transition,
especially for developers who may not keep kernel and userspace perfectly
synced.

Reviewed by:	markj, scottl (earlier version)
Relnotes:	maybe
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19996
2019-05-06 18:24:07 +00:00
Mark Johnston
438622af06 Use g_handleattr() to reply to GEOM::candelete queries.
g_handleattr() fills out bp->bio_completed; otherwise, g_getattr()
returns an error in response to the query.  This caused BIO_DELETE
support to not be propagated through stacked configurations, e.g.,
a gconcat of gmirror volumes would not handle BIO_DELETE even when
the gmirrors do.  g_io_getattr() was not affected by the problem.

PR:		232676
Reported and tested by:	noah.bergbauer@tum.de
MFC after:	1 week
2019-01-02 15:52:16 +00:00
Eugene Grosbein
6d305ab0b2 Extend stripeoffset and stripesize of GEOMs from u_int to off_t
GEOM's stripeoffset overflows at 4 gigabyte margin (2^32)
because of its u_int type. This leads to incorrect data in the output
generated by "sysctl kern.geom.confxml" command, "graid list" etc.
when GEOM array has volumes larger than 4G, for example.

This change does not affect ABI but changes KBI. No MFC planned.

Differential Revision:	https://reviews.freebsd.org/D13426
2018-10-27 16:14:42 +00:00
Alexander Motin
cf4a52cf67 Fix use-after-free in RAID0 error reporting of GEOM_RAID.
PR:		231510
Submitted by:	yangx92@hotmail.com
Approved by:	re (gjb)
MFC after:	1 week
2018-09-24 16:58:55 +00:00
Sean Bruno
2c385d51ce Squash error from geom by sizing ident strings to DISK_IDENT_SIZE.
Display attribute in future error strings and differentiate g_handleattr()
error messages for ease of debugging in the future.

"g_handleattr: md1 bio_length 24 strlen 31 -> EFAULT"

Reported by:	swills
Reviewed by:	imp cem avg
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14962
2018-04-05 13:56:40 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Eugene Grosbein
1b8ea9beff geom_raid (RAID5): do not lose bp->bio_error, keep it in pbp->bio_error
and return it by passing to g_raid_iodone()

Approved by:	mav (mentor)
MFC after:	3 days
2017-12-07 20:09:17 +00:00
Eugene Grosbein
01a51aba38 Fix use-after-free that sometimes results in a garbage returned
instead of right error code after requests to SINGLE/CONCAT volumes, f.e:

# dd if=/dev/raid/r0 bs=512 of=/dev/null
dd: /dev/raid/r0: Unknown error: -559038242

Reviewed by:	avg (mentor), mav (mentor)
MFC after:	3 days
2017-12-07 05:55:18 +00:00
Pedro F. Giffuni
3728855a0f sys/geom: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:17:37 +00:00
Conrad Meyer
b28ea2c250 g_raid: Prevent tasters from attempting excessively large reads
Some g_raid tasters attempt metadata reads in multiples of the provider
sectorsize.  Reads larger than MAXPHYS are invalid, so detect and abort
in such situations.

Spiritually similar to r217305 / PR 147851.

PR:		214721
Sponsored by:	Dell EMC Isilon
2017-01-12 06:58:31 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
Konstantin Belousov
4e2732b550 Removal of Giant droping wrappers for GEOM classes.
Sponsored by:	The FreeBSD Foundation
2016-05-20 08:25:37 +00:00
Pedro F. Giffuni
e8d5712284 sys/geom: spelling fixes in comments.
No functional change.
2016-04-29 20:56:58 +00:00
Pedro F. Giffuni
310aef3257 sys/geom: spelling fixes.
These affect debugging messages.

MFC after:	2 weeks
2016-04-28 19:26:46 +00:00
Pedro F. Giffuni
b99bce73e2 geom: unsign some types to match their definitions and avoid overflows.
In struct:gctl_req, nargs is unsigned.

In mirror:
g_mirror_syncreqs is unsigned.

In raid:
in struct:g_raid_volume, v_disks_count is unsigned.

In virstor:
in struct:g_virstor_softc, n_components is unsigned.

MFC after:	2 weeks
2016-04-27 15:10:40 +00:00
Pedro F. Giffuni
55e0987aea sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Warner Losh
c55f57071a Create an API to reset a struct bio (g_reset_bio). This is mandatory
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
2016-02-17 17:16:02 +00:00
Alexander Motin
4a3760bae6 Remove compatibility shims for legacy ATA device names.
We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.
2015-10-11 13:01:51 +00:00
Pedro F. Giffuni
6bc3fe5f4e Clean out some externally visible "more then" grammar
MFC after:	3 days
2015-08-11 03:12:09 +00:00
Alexander Motin
3ab0187add Remove request sorting from GEOM_MIRROR and GEOM_RAID.
When CPU is not busy, those queues are typically empty.  When CPU is busy,
then one more extra sorting is the last thing it needs.  If specific device
(HDD) really needs sorting, then it will be done later by CAM.

This supposed to fix livelock reported for mirror of two SSDs, when UFS
fires zillion of BIO_DELETE requests, that totally blocks I/O subsystem by
pointless sorting of requests and responses under single mutex lock.

MFC after:	2 weeks
2015-03-27 12:44:28 +00:00
Alexander Motin
0b1b7c2cec Replace constant with proper sizeof().
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2015-02-25 10:18:11 +00:00
Alexander Motin
1e68fe9c33 Avoid unneeded malloc/memcpy/free if there is no metadata on disk.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2014-12-05 10:23:18 +00:00
Alexander Motin
26f0f92fa2 Decode some binary fields of Intel metadata.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2014-12-04 15:54:45 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Alexander Motin
dea1e22600 Reduce number of opens by REOM RAID during provider taste.
Instead opening/closing provider by each of metadata classes, do it only
once in core code.  Since for SCSI disks open/close means sending some
SCSI commands to the device, this change reduces taste time.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-04-28 15:03:52 +00:00
Alexander Motin
1229e83d2b Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields.
This caused incorrect behavior of arrays with big-endian DDF metadata.
Little-endian (like used by Adaptec controllers) should not be harmed.
Add workaround should be enough to manage compatibility.

MFC after:	2 weeks
2014-04-10 16:00:33 +00:00
Eitan Adler
7a22215c53 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
Alexander Motin
40ea77a036 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
 - caller should not hold any locks and should be reenterable;
 - callee should not depend on GEOM dual-threaded concurency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
Alexander Motin
b43560ab19 MFprojects/camlock r256445:
Add unmapped I/O support to GEOM RAID.
2013-10-16 09:33:23 +00:00
Alexander Motin
0f0b2fd889 Return error when opening read-only volumes (like RAID4/5/...) for writing.
Previously opens succeeded, but actual write operations returned errors.

Requested by:	peter
MFC after:	2 weeks
2013-08-13 07:56:40 +00:00
Alexander Motin
db8645f05e Oops, wrong constant at r254269. 2013-08-13 06:25:34 +00:00
Alexander Motin
e70b565ba4 Fix reasonable but safe Clang warnings. 2013-08-13 06:21:36 +00:00
Alexander Motin
8531bb3f0c Introduce 3 seconds timeout on graid stop command (mostly with -f flag).
Since completion waiting goes in g_event thread, it may cause GEOM deadlock
if consumer on top (for example, ZFS) uses g_event thread for closing.
2013-07-27 15:02:19 +00:00
Alexander Motin
57eed4a86f Fix vdc->Secondary_Element_Count metadata field access from 16 to 8 bit.
In some cases it could cause kernel panic during failed drive replacement.

Reported by:	trasz
MFC after:	1 week
2013-05-20 00:33:54 +00:00
Alexander Motin
bcb6ad36f2 Return "descr" field alike to "Intel RAID1 volume" for GEOM RAID to make
it look better in bsdinstall.
2013-04-27 06:57:39 +00:00
Alexander Motin
a93c0ed463 Remove extra bio_data and bio_length copying to child request after calling
g_clone_bio(), that already copied them.
2013-03-26 05:42:12 +00:00
Sean Bruno
bd9fba0cfe Add legacy support to geom raid to create a /dev/arX device for support
of upgrading older machines using ataraid(4) to newer releases.

This optional parameter is controlled via kern.geom.raid.legacy_aliases
and will create a /dev/ar0 device that will point at /dev/raid/r0 for
example.

Tested on Dell SC 1425 DDF-1 format software raid controllers installing from
stable/7 and upgrading to stable/9 without having to adjust /etc/fstab

Reviewed by:	mav
Obtained from:	Yahoo!
MFC after:	2 Weeks
2013-03-08 20:07:32 +00:00