Commit Graph

132 Commits

Author SHA1 Message Date
kib
ca7c13e470 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
mjg
246370ba36 geom: clean up empty lines in .c and .h files 2020-09-01 22:14:09 +00:00
delphij
3249097b0a sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25565
2020-07-09 02:52:39 +00:00
kaktus
ad355b0a9d Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
mav
62c2efad53 Reimplement gvinum orphanization.
gvinum was the only GEOM class, using consumer nstart/nend fields. Making
it do its own accounting for orphanization purposes allows in perspective
to remove burden of that expensive for SMP accounting from GEOM.

Also the previous implementation spinned in a tight event loop, waiting
for all active BIOs to complete, while the new one knows exactly when it
is possible to close the consumer.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2019-12-27 01:36:53 +00:00
cem
10d53fcce8 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
kevans
dd6f2f2c8d Annotate geom modules with MODULE_VERSION
GEOM ELI may double ask the password during boot. Once at loader time, and
once at init time.

This happens due a module loading bug. By default GEOM ELI caches the
password in the kernel, but without the MODULE_VERSION annotation, the
kernel loads over the kernel module, even if the GEOM ELI was compiled into
the kernel. In this case, the newly loaded module
purges/invalidates/overwrites the GEOM ELI's password cache, which causes
the double asking.

MFC Note: There's a pc98 component to the original submission that is
omitted here due to pc98 removal in head. This part will need to be revived
upon MFC.

Reviewed by:	imp
Submitted by:	op
Obtained from:	opBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14992
2018-04-10 19:18:16 +00:00
pfg
a82e3a8b24 sys/geom: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:17:37 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
dim
f457850d31 Fix logic error in gvinum's gv_set_sd_state()
With clang 4.0.0, I'm getting the following warnings:

    sys/geom/vinum/geom_vinum_state.c:186:7: error: logical not is only
    applied to the left hand side of this bitwise operator
    [-Werror,-Wlogical-not-parentheses]
                    if (!flags & GV_SETSTATE_FORCE)
                        ^      ~

The logical not operator should obiously be called after masking.

Reviewed by:	mav, pfg
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D9093
2017-01-08 17:56:54 +00:00
mav
990a95381b Use g_wither_provider() where applicable.
It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().
2016-09-23 21:29:40 +00:00
pfg
fafa173c28 sys/geom: spelling fixes in comments.
No functional change.
2016-04-29 20:56:58 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
ae
972deb0b1f Include sys/sbuf.h directly.
Reviewed by:	pjd
2011-07-11 05:22:31 +00:00
ae
bd082ea669 Prevent non-aligned reading from provider while tasting. Reject
providers with unsupported sectorsize.

Reported by:	Joerg Wunsch
MFC after:	1 week
2011-05-25 11:14:26 +00:00
lulf
57b68bbf11 - Check flag with the bitwise operator, not the logical operator.
Submitted by:	arundel
MFC after:	1 week
2010-10-01 06:12:13 +00:00
jh
e34ccbdd57 - Don't return EAGAIN from gv_unload(). It was used to work around the
deadlock fixed in r207671.
- Wait for worker process to exit at class unload. The worker process
  was not guaranteed to exit before the linker unloaded the module.
- Use 0 as the worker process exit status instead of ENXIO and style
  the NOTREACHED comment.

Reviewed by:	lulf
X-MFC after:	r207671
2010-05-10 19:12:23 +00:00
lulf
b6c4219b41 - Remove obsolete flags.
MFC after:	1 week
2010-05-08 16:19:17 +00:00
lulf
db22efb244 - Set missing flag when initiating a plex rebuild with the rebuildparity
command.
- Check if plex is already syncing or rebuilding before initiating a parity
  rebuild or check.
2010-03-08 21:16:28 +00:00
trasz
ca36390aa1 Remove some pointless variable assignments.
Found with:	clang
2010-01-25 16:55:30 +00:00
lulf
b3a1193d86 - Improve error message consistency and wording. 2009-10-05 08:44:31 +00:00
lulf
08a0ba34aa - Fix the issue with read access count modification on RAID-5 plexes properly.
If the access counts were not increased and decreased in equal numbers by
  gvinum consumers, the read access count would be inconsistent with the write
  access count. Instead, modify the read access count with the write access
  count directly to prevent any inconsistencies.

Approved by:	re (kib)
2009-07-18 11:12:48 +00:00
jamie
572db1408a Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
lulf
0ece818a7b - Split up the BIO queue into a queue for new and one for completed requests.
This is necessary for two reasons:
  1) In order to avoid collisions with the use of a BIOs flags set by a consumer
     or a provider
  2) Because GV_BIO_DONE was used to mark a BIO as done, not enough flags was
     available, so the consumer flags of a BIO had to be misused in order to
     support enough flags. The new queue makes it possible to recycle the
     GV_BIO_DONE flag into GV_BIO_GROW.
  As a consequence, gvinum will now work with any other GEOM class under it or
  on top of it.

- Use bio_pflags for storing internal flags on downgoing BIOs, as the requests
  appear to come from a consumer of a gvinum volume. Use bio_cflags only for
  cloned BIOs.
- Move gv_post_bio to be used internally for maintenance requests.
- Remove some cases where flags where set without need.

PR:		kern/133604
2009-05-06 19:34:32 +00:00
lulf
4fbf65a82b - Fix a case where a RAID5 volume would think that it is supposed to grow a new
subdisk after a parity rebuild.
2009-05-06 19:18:19 +00:00
lulf
bb7cc8f64c - Check if any plexes are doing internal maintenance before removing them. 2009-05-06 19:06:28 +00:00
lulf
c2e31af67e - Add forgotten KASSERT. 2009-05-06 18:37:32 +00:00
lulf
9192c52290 - Fix a bug where the bio_data field of the wrong BIO is freed if an error
occurs when doing a RAID5 request.
2009-05-06 18:27:28 +00:00
lulf
5ef86e69f2 - GV_BIO_RETRY is not used, and it is actually impossible with more than 8
values for bio_cflags/bio_pflags.
2009-05-06 18:24:56 +00:00
lulf
2857e7c796 - Split the queue mutex into one for the event queue and one for the BIO queue,
as they do not really relate and to prepare for an additional queue to be
  covered by the BIO queue mutex.
- Implement wrappers for fetching the next element from the event queue as well
  as for putting a new element into the BIO queue.
2009-05-06 18:21:48 +00:00
lulf
5a4e3bbf09 - Make the gvinum softc invisible to userland, as it is not needed. 2009-05-04 17:30:20 +00:00
lulf
c18c28353d - Remove assertion of topology lock remaining from 7.x gvinum. It is not needed,
as the renaming only changes internal gvinum names and will not alter the geom
  topology.
- The topology lock was not held when calling g_wither_geom after renaming.
2009-04-18 16:36:27 +00:00
lulf
62687638df - Move out allocation part of different gvinum objects into its own routine and
make use of it in the gvinum userland code.
2009-04-10 08:50:14 +00:00
lulf
4b994f250d - Add files that should have been added in r190507. 2009-03-28 21:06:59 +00:00
lulf
14e3eb7296 Import the gvinum work that have been done during and after Summer of Code 2007.
The work have been under testing and fixing since then, and it is mature enough
to be put into HEAD for further testing.

A lot have changed in this time, and here are the most important:
- Gvinum now uses one single workerthread instead of one thread for each
  volume and each plex. The reason for this is that the previous scheme was
  very complex, and was the cause of many of the bugs discovered in gvinum.
  Instead, gvinum now uses one worker thread with an event queue, quite
  similar to what used in gmirror.
- The rebuild/grow/initialize/parity check routines no longer runs in
  separate threads, but are run as regular I/O requests with special flags.
  This made it easier to support mounted growing and parity rebuild.
- Support for growing striped and raid5-plexes, meaning that one can extend the
  volumes for these plex types in addition to the concat type. Also works while
  the volume is mounted.
- Implementation of many of the missing commands from the old vinum:
  attach/detach, start (was partially implemented), stop (was partially
  implemented), concat, mirror, stripe, raid5 (shortcuts for creating volumes
  with one plex of these organizations).
- The parity check and rebuild no longer goes between userland/kernel, meaning
  that the gvinum command will not stay and wait forever for the rebuild to
  finish. You can instead watch the status with the list command.
- Many problems with gvinum have been reported since 5.x, and some has been hard
  to fix due to the complicated architecture. Hopefully, it should be more
  stable and better handle edge cases that previously made gvinum crash.
- Failed drives no longer disappears entirely, but now leave behind a dummy
  drive that makes sure the original state is not forgotten in case the system
  is rebooted between drive failures/swaps.
- Update manpage to reflect new commands and extend it with some examples.

Sponsored by:   Google Summer of Code 2007
Mentored by:    le
Tested by:      Rick C. Petty <rick-freebsd2008 -at- kiwi-computer.com>
2009-03-28 17:20:08 +00:00
lulf
55553856e4 - Fix an issue with access permissions to underlying disks used by a gvinum
plex. If the plex is a raid5 plex, and is being written to, parity data might
  have to be read from the underlying disks, requiring them to be opened for
  reading as well as writing.

MFC after:	1 week
2008-12-27 14:32:39 +00:00
lulf
7e02db46fd - Fix a potential NULL pointer reference. Note that this cannot happen in
practice, but it is a good programming practice nontheless and it allows the
  kernel to not depend on userland correctness.

Found with:   Coverity Prevent(tm)
CID:          655-659, 664-667
2008-11-25 19:13:58 +00:00
lulf
7c44fbc5ad - Import macros used in gmirror for printing gvinum debug messages and making
the output more standardized.
- Add a sysctl to set the verbosity of the debug messages.
- While there, fixup typos and wording in the messages.
2008-10-26 17:20:37 +00:00
lulf
e37076e922 - Use the new gv_write_header function to write out the header when removing a
drive to make sure that the header is in the correct format.
2008-10-02 10:01:05 +00:00
lulf
f5703ce9b7 - Remove unneeded macro since the config_length field in the header was changed
to 64 bit in the new format.
2008-10-02 09:35:47 +00:00
lulf
badfc0cb43 - Make gvinum header on-disk structure consistent on all platforms by storing
the gvinum header in fields of fixed size and in a big endian byte order
  rather than the size and byte order of the actual platform.

Note that the change is backwards compatible with the old gvinum configuration
format, but will save the configuration in the new format when the 'saveconfig'
command is executed.

Submitted by:	Rick C. Petty <rick-freebsd -at- kiwi-computer.com>
2008-10-01 14:50:36 +00:00
bz
1021d43b56 Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
lulf
d5f479c4b3 - When renaming a drive, also set the drive name in the gvinum header.
PR:		kern/125632
Approved by:	pjd (mentor)
MFC after:	3 days
2008-07-19 13:53:11 +00:00
lulf
5cc7bcb02c - Fix a logic error when updating plex configuration.
Approved by:	pjd (mentor)
2008-07-11 16:46:29 +00:00
rwatson
051819b847 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
lulf
4cc0ad7f58 - Recognize the 'volume' parameter when creating a plex.
PR:		kern/75632
Approved by:	pjd (mentor)
MFC after:	1 day
2008-05-22 10:27:03 +00:00
lulf
8a5c25a52b - Fix a memory leak when re-discovering a gvinum configuration.
Approved by:	pjd (mentor)
MFC after:	1 week
2008-03-18 08:48:51 +00:00