Commit Graph

115 Commits

Author SHA1 Message Date
Alan Somers
d3f96f6610 Fix O(n^2) behavior in sysctl
Sysctl OIDs were internally stored in linked lists, triggering O(n^2)
behavior when userland iterates over many of them.  The slowdown is
noticeable for MIBs that have > 100 children (for example, vm.uma).  But
it's unignorable for kstat.zfs when a pool has > 1000 datasets.

Convert the linked lists into RB trees.  This produces a ~25x speedup
for listing kstat.zfs with 4100 datasets, and no measurable penalty for
small dataset counts.

Bump __FreeBSD_version for the KPI change.

Sponsored by:	Axcient
Reviewed by:	mjg
Differential Revision: https://reviews.freebsd.org/D36500
2022-09-26 18:03:34 -06:00
Allan Jude
b20ec58669 vfs.typenumhash: fix sysctl description
a string continuation was missing a space, resulting in two works
being smushed together.

Sponsored by:	Klara, Inc.
2022-09-10 22:47:51 +00:00
Konstantin Belousov
eca39864f7 Add sysctl KERN_LOCKF
reporting the shapshot of the active advisory locks.

A new VFS ops method vfs_report_lockf if provided in the mount point
op table.  If it is NULL, as it is currently for all existing
filesystems, vfs_report_lockf() function is used, which gathers
information from the standard implementation inside kern/kern_lockf.c.

Filesystems implementing its own locking (NFSv4 as example) can provide
a custom implementation.

Reviewed by:	markj, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34756
2022-04-10 00:43:53 +03:00
Marvin Ma
1517b8d5a7 vfs_unregister: fix error handling
Due to misplaced braces, an error from vfs_uninit() in the VFCF_SBDRY
case was ignored.

Reported by:	Anton Rang <rang@acm.org>
Reviewed by:	Anton Rang <rang@acm.org>, markj
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34375
2022-02-25 12:19:14 -06:00
Jason A. Harmening
a4b07a2701 VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Also, add stbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h.  Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30556
2021-05-30 14:53:47 -07:00
Jason A. Harmening
271fcf1c28 Revert commits 6d3e78ad6c and 54256e7954
Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason.  Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.
2021-05-29 17:48:02 -07:00
Jason A. Harmening
6d3e78ad6c VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30218
2021-05-29 14:05:39 -07:00
Mateusz Guzik
422f38d8ea vfs: fix trivial whitespace issues which don't interefere with blame
.. even without the -w switch
2020-07-10 09:01:36 +00:00
Rick Macklem
1f7104d720 Fix export_args ex_flags field so that is 64bits, the same as mnt_flags.
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it hold a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size.
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by:	kib, freqlabs
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25088
2020-06-14 00:10:18 +00:00
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Mateusz Guzik
dc20b834ca vfs: add optional root vnode caching
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.

Reviewed by:	jeff (previous version), kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:14:32 +00:00
Konstantin Belousov
8ff7fad1d7 Only call sigdeferstop() for NFS.
Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.

The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
wanted to create the layered fs only to wrap NFS VOPs.

VFS_OP()s wrap is straightforward.

Requested and reviewed by:	mjg (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17658
2018-10-23 21:43:41 +00:00
Jamie Gritton
0e5c6bd436 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Mateusz Guzik
7665e341ca sysctl: switch sysctllock to a sleepable rmlock, take 2
This restores r285125. Previous attempt was reverted due to a bug in rmlocks,
which is fixed since r287833.
2015-09-15 23:06:56 +00:00
Mateusz Guzik
4ae1e3c752 Revert r285125 until rmlocks get fixed.
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
2015-07-30 19:52:43 +00:00
Mateusz Guzik
b8633775a8 sysctl: switch sysctllock to a sleepable rmlock
The lock is almost never taken for writing.
2015-07-04 06:54:15 +00:00
Konstantin Belousov
69baeadc31 Remove several write-only variables, all reported by the gcc 4.9
buildkernel run.

Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros.  It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.

Review:	https://reviews.freebsd.org/D2665
Reviewed by:	rodrigc
Looked at by:	bjk
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-29 13:24:17 +00:00
Mateusz Guzik
580a011762 Rename sysctl_lock and _unlock to sysctl_xlock and _xunlock. 2014-10-21 19:02:26 +00:00
Konstantin Belousov
168f4ee0a8 Remove Giant acquisition from the mount and unmount pathes.
It could be claimed that two things were reasonable protected by
Giant.  One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx.  Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-03 03:27:54 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Peter Wemm
0a943e59c9 Revert accidental commit.
Pointy hat to:	 peter
2013-06-29 05:05:57 +00:00
Peter Wemm
76665df9bf Help out gcc. clang understands.
sys_generic.c:1510: warning: 'precision' may be used uninitialized
*** [sys_generic.o] Error code 1
2013-06-29 04:35:04 +00:00
Jamie Gritton
ffc72591b1 Don't worry if a module is already loaded when looking for a fstype to mount
(possible in a race condition).

Reviewed by:	kib
MFC after:	1 week
2013-02-21 02:41:37 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Rick Macklem
4d30adc494 Modify vfs_register() to use a hash calculation
on vfc_name to set vfc_typenum, so that vfc_typenum doesn't
change when file systems are loaded in different orders. This
keeps NFS file handles from changing, for file systems that
use vfc_typenum in their fsid. This change is controlled via
a loader.conf variable called vfs.typenumhash, since vfc_typenum
will change once when this is enabled. It defaults to 1 for
9.0, but will default to 0 when MFC'd to stable/8.

Tested by:	hrs
Reviewed by:	jhb, pjd (earlier version)
Approved by:	re (kib)
MFC after:	1 month
2011-09-13 21:01:26 +00:00
John Baldwin
875b66a05b Expand the scope of the sysctllock sx lock to protect the sysctl tree itself.
Back in 1.1 of kern_sysctl.c the sysctl() routine wired the "old" userland
buffer for most sysctls (everything except kern.vnode.*).  I think to prevent
issues with wiring too much memory it used a 'memlock' to serialize all
sysctl(2) invocations, meaning that only one user buffer could be wired at
a time.  In 5.0 the 'memlock' was converted to an sx lock and renamed to
'sysctl lock'.  However, it still only served the purpose of serializing
sysctls to avoid wiring too much memory and didn't actually protect the
sysctl tree as its name suggested.  These changes expand the lock to actually
protect the tree.

Later on in 5.0, sysctl was changed to not wire buffers for requests by
default (sysctl_handle_opaque() will still wire buffers larger than a single
page, however).  As a result, user buffers are no longer wired as often.
However, many sysctl handlers still wire user buffers, so it is still
desirable to serialize userland sysctl requests.  Kernel sysctl requests
are allowed to run in parallel, however.

- Expose sysctl_lock()/sysctl_unlock() routines to exclusively lock the
  sysctl tree for a few places outside of kern_sysctl.c that manipulate
  the sysctl tree directly including the kernel linker and vfs_register().
- sysctl_register() and sysctl_unregister() require the caller to lock
  the sysctl lock using sysctl_lock() and sysctl_unlock().  The rest of
  the public sysctl API manage the locking internally.
- Add a locked variant of sysctl_remove_oid() for internal use so that
  external uses of the API do not need to be aware of locking requirements.
- The kernel linker no longer needs Giant when manipulating the sysctl
  tree.
- Add a missing break to the loop in vfs_register() so that we stop looking
  at the sysctl MIB once we have changed it.

MFC after:	1 month
2009-02-06 14:51:32 +00:00
Pawel Jakub Dawidek
2c7b0f41ec Remove VFS_VPTOFH entirely. API is already broken and it is good time to
do it.

Suggested by:	rwatson
2007-02-16 17:32:41 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
John Baldwin
322fb40cbf Remove duplicate security checks already performed in kern_kldload(). 2006-06-26 18:33:32 +00:00
John Baldwin
edd32c2da2 Use kern_kldload() and kern_kldunload() to load and unload modules when
we intend for the user to be able to unload them later via kldunload(2)
instead of calling linker_load_module() and then directly adjusting the
ref count on the linker file structure.  This makes the resulting
consumer code simpler and cleaner and better hides the linker internals
making it possible to sanely lock the linker.
2006-06-13 21:36:23 +00:00
David Schultz
e8ed933099 Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with:	mckusick
Reviewed by:	phk
2005-02-20 23:02:20 +00:00
Poul-Henning Kamp
ebbfc2f82d Make various mountpoint related functions static. 2005-02-10 12:25:38 +00:00
Poul-Henning Kamp
63f89abf4a Change the generated VOP_ macro implementations to improve type checking
and KASSERT coverage.

After this check there is only one "nasty" cast in this code but there
is a KASSERT to protect against the wrong argument structure behind
that cast.

Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical
kernel with no change in performance.

We also now run the checking and tracing on VOP's which have been layered
by nullfs, umapfs, deadfs or unionfs.

    Add new (non-inline) VOP_FOO_AP() functions which take a "struct
    foo_args" argument and does everything the VOP_FOO() macros
    used to do with checks and debugging code.

    Add KASSERT to VOP_FOO_AP() check for argument type being
    correct.

    Slim down VOP_FOO() inline functions to just stuff arguments
    into the struct foo_args and call VOP_FOO_AP().

    Put function pointer to VOP_FOO_AP() into vop_foo_desc structure
    and make VCALL() use it instead of the current offsetoff() hack.

    Retire vcall() which implemented the offsetoff()

    Make deadfs and unionfs use VOP_FOO_AP() calls instead of
    VCALL(), we know which specific call we want already.

    Remove unneeded arguments to VCALL() in nullfs and umapfs bypass
    functions.

    Remove unused vdesc_offset and VOFFSET().

    Generally improve style/readability of the generated code.
2005-01-13 07:53:01 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Poul-Henning Kamp
8d8883caaf make "ffs" and alias for "ufs" when it comes to filesystem names. 2004-12-06 22:22:57 +00:00
Poul-Henning Kamp
32ba8e9390 Introduce vfs_byname_kld() which will try to load the filesystem
as a module if possible.

Use it so we don't have linker magic in the middle of the already
complex mount code.
2004-12-03 16:11:01 +00:00
Poul-Henning Kamp
aec0fb7b40 Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
Poul-Henning Kamp
273350ad0f Simplify initialization of va_null a little bit. 2004-09-15 21:42:03 +00:00
Poul-Henning Kamp
5e8c582ac2 Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
Poul-Henning Kamp
3dfe213e61 Convert the vfsconf list to a TAILQ.
Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.
2004-07-27 22:32:01 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Alfred Perlstein
81d16e2d64 do the vfsstd thing instead of messing up our VFS_SYSCTL macro. 2004-07-07 06:58:29 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Colin Percival
a20e9655b9 Remove opv_desc_vector from vfs_add_vnodeops, since it is defined
and given a value, but never used.  This has no effect on the
resulting binaries, since gcc optimizes the variable away anyway.

PR:		kern/62684
Approved by:	rwatson (mentor)
2004-02-15 17:27:33 +00:00
Mike Silbersack
184dcdc7c8 Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.
2003-10-21 18:28:36 +00:00