Commit Graph

63 Commits

Author SHA1 Message Date
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Peter Jeremy
afcd121024
geom_gate: Distinguish between classes of errors
The geom_gate API provides 2 distinct paths for exchanging error
details between the kernel and the userland client: Including an error
code in the g_gate_ctl_io structure passed in the ioctl(2) call or
having the ioctl(2) call return -1 with an error code in errno. The
latter reflects errors in the ioctl(2) call itself whilst the former
reflects errors within the geom_gate instance.

The G_GATE_CMD_START ioctl blocks waiting for an I/O request to be
directed to the geom_gate instance and the wait can fail
(necessitating an error return) if the geom_gate instance is destroyed
or if the msleep(9) fails. The code previously treated both error
cases indentically: Returning ECANCELED as a geom_gate instance error
(which the ggatec treats as a fatal error).  Whilst this is the correct
behaviour if the geom_gate instance is destroyed, a msleep(9) failure
is unrelated to the geom_gate instance itself and should be reported
as an ioctl(2) "failure".  The distinction is important because
msleep(9) can return ERESTART, which means the system call should be
retried (and this will occur automatically as part of the generic
syscall return processing).

This change alters the msleep(9) handling to directly return the error
code from msleep(9), which ensures ERESTART is correctly handled,
rather than being treated as a fatal error.

Reviewed by:    Johannes Totz <jo@bruelltuete.com>
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D33996
2022-01-29 21:15:51 +11:00
Alan Somers
f284bed200 geom_gate: ensure readprov is null-terminated
With crafted input to the G_GATE_CMD_CREATE ioctl, geom_gate can be made
to print kernel memory to the system console, potentially revealing
sensitive data from whatever was previously in that memory page.

But but but: this is a case of the sys admin misconfiguring, and you'd
need root privileges to do this.

Submitted By:	Johannes Totz <jo@bruelltuete.com>
MFC after:	2 weeks
Reviewed By:	asomers
Differential Revision: https://reviews.freebsd.org/D31727
2022-01-02 18:01:23 -07:00
Mateusz Guzik
d40bc60752 geom: clean up empty lines in .c and .h files 2020-09-01 22:14:09 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Warner Losh
8b522bdae6 Pass BIO_SPEEDUP through all the geom layers
While some geom layers pass unknown commands down, not all do. For the ones that
don't, pass BIO_SPEEDUP down to the providers that constittue the geom, as
applicable. No changes to vinum or virstor because I was unsure how to add this
support, and I'm also unsure how to test these. gvinum doesn't implement
BIO_FLUSH either, so it may just be poorly maintained. gvirstor is for testing
and not supportig BIO_SPEEDUP is fine.

Reviewed by: chs
Differential Revision: https://reviews.freebsd.org/D23183
2020-01-17 01:15:55 +00:00
Alexander Motin
351d0fa6df Fix GEOM_GATE orphanization.
Previous code closed and destroyed direct read consumer even with I/O still
in progress.  This patch adds locking and request counting to postpone the
close till the last of running requests completes.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-28 17:52:53 +00:00
Conrad Meyer
ac03832ef3 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
Mikolaj Golub
874774c5d4 geom_gate: enable resize
Reviewed By:	pjd
Approved By:	pjd
Differential Revision:	https://reviews.freebsd.org/D11531
2018-07-13 07:08:06 +00:00
Kyle Evans
74d6c131cb Annotate geom modules with MODULE_VERSION
GEOM ELI may double ask the password during boot. Once at loader time, and
once at init time.

This happens due a module loading bug. By default GEOM ELI caches the
password in the kernel, but without the MODULE_VERSION annotation, the
kernel loads over the kernel module, even if the GEOM ELI was compiled into
the kernel. In this case, the newly loaded module
purges/invalidates/overwrites the GEOM ELI's password cache, which causes
the double asking.

MFC Note: There's a pc98 component to the original submission that is
omitted here due to pc98 removal in head. This part will need to be revived
upon MFC.

Reviewed by:	imp
Submitted by:	op
Obtained from:	opBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14992
2018-04-10 19:18:16 +00:00
Pedro F. Giffuni
3728855a0f sys/geom: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:17:37 +00:00
Alexander Motin
8b64f3ca6c Use g_wither_provider() where applicable.
It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().
2016-09-23 21:29:40 +00:00
Pedro F. Giffuni
01b5c6f73e g_gate: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 16:18:07 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Alexander Motin
40ea77a036 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
 - caller should not hold any locks and should be reenterable;
 - callee should not depend on GEOM dual-threaded concurency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
Alexander Motin
a93c0ed463 Remove extra bio_data and bio_length copying to child request after calling
g_clone_bio(), that already copied them.
2013-03-26 05:42:12 +00:00
Pawel Jakub Dawidek
c4d2d401f8 We don't need buffer to handle BIO_DELETE, so don't check buffer size for it.
This fixes handling BIO_DELETE larger than MAXPHYS.
2013-03-14 23:07:01 +00:00
Mikolaj Golub
1d9db37c77 In g_gate_dumpconf() always check the result of g_gate_hold().
This fixes "Negative sc_ref" panic possible when sysctl_kern_geom_confxml()
is run simultaneously with destroying GATE device.

Reviewed by:	pjd
MFC after:	3 days
2012-08-07 18:50:33 +00:00
Mikolaj Golub
a277f47bd2 Reorder things in g_gate_create() so at the moment when g_new_geomf()
is called name is properly initialized.

Discussed with:	pjd
MFC after:	2 weeks
2012-07-28 16:30:50 +00:00
Pawel Jakub Dawidek
e08ec03778 Extend GEOM Gate class to handle read I/O requests directly within the kernel.
This will allow HAST to read directly from the local component without
even communicating userland daemon.

Sponsored by:	Panzura, http://www.panzura.com
MFC after:	1 month
2012-07-04 20:16:28 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Andrey V. Elsukov
5d807a0e1a Include sys/sbuf.h directly.
Reviewed by:	pjd
2011-07-11 05:22:31 +00:00
Pawel Jakub Dawidek
204a4e196a Recognize BIO_FLUSH requests and pass them to userland.
MFC after:	1 week
2011-05-23 21:00:37 +00:00
Pawel Jakub Dawidek
63a6c5c12b GEOM has an internal mechanism to deal with ENOMEM errors returned via
g_io_deliver(). In such case it increases 'pace' counter on each ENOMEM and
reschedules the request. The 'pace' counter is decreased for each request going
down, but until 'pace' is greater than zero, GEOM will handle at most 10
requests per second. For GEOM GATE users that are proxy to local GEOM providers
(like ggatel(8) and HAST) we can end up with almost permanent slow down of GEOM
down queue. This is because once we reach GEOM GATE queue limit, we return
ENOMEM to the GEOM. This means that we have, eg. 1024 I/O requests in the GEOM
GATE queue. To make room in the queue and stop returning ENOMEM we need to
proceed the requests of course, but those requests are handled by userland
daemons that handle them by reading/writing also from/to local GEOM providers.
For example with HAST, a new requests comes to /dev/hast/data, which is GEOM
GATE provider. GEOM GATE passes the request to hastd(8) and hastd(8)
reads/writes from/to /dev/da0. Once we reach GEOM GATE queue limit, to free up
a slot in GEOM GATE queue, hastd(8) has to read/write from/to /dev/da0, but
this request will also be very slow, because GEOM now slows down all the
requests. We end up with full queue that we can unload at the speed of 10
requests per second. This simply looks like a deadlock.

Fix it by allowing userland daemons that work with both GEOM GATE and local
GEOM providers to specify unlimited queue size, so GEOM GATE will never return
ENOMEM to the GEOM.

MFC after:	1 week
2011-04-02 06:56:06 +00:00
Mikolaj Golub
bd119384c7 Increase debug level on g_gate device destruction and add message on
device creation.

Suggested by:	danger
Approved by:	pjd (mentor)
MFC after:	3 days
2011-03-30 21:40:14 +00:00
Mikolaj Golub
baf63f65ae In g_gate_create() there is a window between when g_gate_softc is
registered in g_gate_units array and when its sc_provider field is
filled. If during this period g_gate_units is accessed by another
thread that is checking for provider name collision the crash is
possible.

Fix this by adding sc_name field to struct g_gate_softc. In
g_gate_create() when g_gate_softc is created but sc_provider is still
not sc_name points to provider name stored in the local array.

Approved by:	pjd (mentor)
Reported by:	Freddie Cash <fjwcash@gmail.com>
MFC after:	1 week
2011-03-27 19:56:55 +00:00
Alexander Leidinger
cb08c2cc83 Add some FEATURE macros for various GEOM classes.
No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by:	Google Summer of Code 2010
Submitted by:	kibab
Reviewed by:	silence on geom@ during 2 weeks
X-MFC after:	to be determined in last commit with code from this project
2011-02-25 10:24:35 +00:00
Pawel Jakub Dawidek
2aa15ffdab 'unit' can be negative, so use signed type for it.
Found by:	Coverity Prevent
CID:		3731
MFC after:	3 days
2010-06-14 21:58:55 +00:00
Pawel Jakub Dawidek
15725379d0 BIO_DELETE contains range we want to delete and doesn't provide any useful
data, so there is no need to copy it to userland.

MFC after:	3 days
2010-06-14 21:56:24 +00:00
Pawel Jakub Dawidek
b0990a1dae Simplify loops. 2010-03-18 13:11:43 +00:00
Pawel Jakub Dawidek
32115b105a Please welcome HAST - Highly Avalable Storage.
HAST allows to transparently store data on two physically separated machines
connected over the TCP/IP network. HAST works in Primary-Secondary
(Master-Backup, Master-Slave) configuration, which means that only one of the
cluster nodes can be active at any given time. Only Primary node is able to
handle I/O requests to HAST-managed devices. Currently HAST is limited to two
cluster nodes in total.

HAST operates on block level - it provides disk-like devices in /dev/hast/
directory for use by file systems and/or applications. Working on block level
makes it transparent for file systems and applications. There in no difference
between using HAST-provided device and raw disk, partition, etc. All of them
are just regular GEOM providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and hast.conf(5)
manual pages, as well as http://wiki.FreeBSD.org/HAST.

Sponsored by:	FreeBSD Foundation
Sponsored by:	OMCnet Internet Service GmbH
Sponsored by:	TransIP BV
2010-02-18 23:16:19 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Pawel Jakub Dawidek
fc024f7a45 Bump copyright year. 2006-09-08 10:20:44 +00:00
Pawel Jakub Dawidek
c076790223 Use __FBSDID in .c files. 2006-09-08 10:19:24 +00:00
Pawel Jakub Dawidek
7ffb6e0f6a Fix problems with destroy and forcible destroy functionality:
- hold/release device in start/done routines, this will probably slow
  down things a bit, but previous code was racy;
- only release device if g_gate_destroy() failed - if it succeeded device
  is dead and there is nothing to release;
- various other changes which makes forcible destruction reliable.

MFC after:	3 days
2006-09-05 21:56:00 +00:00
Pawel Jakub Dawidek
38ea96ac99 Remove trailing spaces. 2006-02-01 12:06:01 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Pawel Jakub Dawidek
84436f14c4 Add CANCEL command which allows to remove one request from the queue or
all requests from the queue if request number is not given.

Bump version number.

Approved by:	re (scottl)
2005-07-08 21:08:53 +00:00
Pawel Jakub Dawidek
0218292cdf Update copyright in files changed this year. 2005-02-16 22:14:52 +00:00
Pawel Jakub Dawidek
ccbef85dd0 Remove mutex asserion from g_gate_find(). We don't want g_gate_list_mtx
mutex to be held here, because we want speed here.
2005-02-16 16:13:56 +00:00
Pawel Jakub Dawidek
f906581296 Remove TDP_GEOM flag from thread after ggate device creation.
This flag means "wait for all pending requests before returning to userland".
There are pending events for sure, because we just created new provider and
other classes want to taste it, but we cannot answer on I/O requests until
we're here.
2005-02-16 16:12:28 +00:00
Pawel Jakub Dawidek
35f855d9f9 Fix typo. We want to unlock mutex here.
Submitted by:	Andreas Kohn <andreas.kohn@gmail.com>
MFC after:	1 week
2005-02-12 16:19:03 +00:00
Pawel Jakub Dawidek
e35d3a7828 - Remove g_gate_hold()/g_gate_release() from start/done paths. It saves
4 mutex operations per I/O requests.
- Use only one mutex to protect both (incoming and outgoing) queue.
  As MUTEX_PROFILING(9) shows, there is no big contention for this lock.
- Protect sc_queue_count with queue mutex, instead of doing atomic
  operations on it.
- Remove DROP_GIANT()/PICKUP_GIANT() - ggate is marked as MPSAFE and no
  Giant there.
2005-02-09 08:29:39 +00:00
Pawel Jakub Dawidek
662a4e5878 - Use bioq_insert_tail()/bioq_insert_head() instead of bioq_disksort().
- Improve mediasize checking.

MFC after:	1 week
2005-02-05 00:30:08 +00:00
Pawel Jakub Dawidek
a17dd95f14 - Add missing Giant drop before acquiring the topology lock.
- Move DROP_GIANT()/PICKUP_GIANT() to g_gate_ioctl().
2004-11-23 11:18:26 +00:00