Commit Graph

272 Commits

Author SHA1 Message Date
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Hans Petter Selasky
64282a1274 Slightly bump the maximum OID path for loading tunable SYSCTLs.
Coming updates to the mlx5en(4) driver will require this.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-02 12:42:46 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Mateusz Guzik
8b9817a443 sysctl: try to avoid malloc in name2oid
name2oid is called all the time and passed names are almost always very short
(< 16 characters).
2017-11-11 21:50:36 +00:00
Mateusz Guzik
9b8de76beb sysctl: only take mem lock if oldlen is > 4 * PAGE_SIZE
The previous limit of just one page is hit by ps.

The entire mechanism should be reworked, if not whacked. It seems the intent
is to reduce kernel dos-ability - some handlers wire the amount of memory
passed here. Handlers should probably stop wiring in the first place or in
the worst case indicate they are doing so so that the check is done only if
necessary. It should also probably be a counter, not a lock.

MFC after:	1 week
2017-10-19 01:38:31 +00:00
Andriy Gapon
693593b6f0 sysctl-s in a module should be accessible only when the module is initialized
A sysctl can have a custom handler that may access data that is initialized
via SYSINIT(9) or via a module event handler (also invoked via SYSINIT).
Thus, it is not safe to allow access to the module's sysctl-s until
the initialization is performed.  Likewise, we should not allow access
to teh sysctl-s after the module is uninitialized.
The latter is easy to achieve by properly ordering linker_file_unregister_sysctls
and linker_file_sysuninit.
The former is not as easy for two reasons:
- the initialization may depend on tunables which get set when sysctl-s are
  registered, so we need to set the tunables before running sysinit-s
- the initialization may try to dynamically add more sysctl-s under statically
  defined sysctl nodes
So, this change splits the sysctl setup into two phases.  In the first phase
the sysctl-s are registered as before but they are disabled and hidden from
consumers.  In the second phase, done after sysinit-s, normal access to the
sysctl-s is enabled.

The change should affect only dynamic module loading and unloading after
the system boot-up.  Nothing changes for sysctl-s compiled into the kernel
and sysctl-s in preloaded modules.

Discussed with:	hselasky, ian, jhb
Reviewed by:	julian, kib
MFC after:	2 weeks
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D12545
2017-10-05 12:32:14 +00:00
Andriy Gapon
550374efe6 revert r324166, it has an unrelated change in it 2017-10-01 16:37:54 +00:00
Andriy Gapon
7d5c6491f0 MFV r323531: 8521 nvlist memory leak in get_clones_stat() and spa_load_best()
illumos/illumos-gate@7d3000f774
7d3000f774

https://www.illumos.org/issues/8521
  Yuri reported this to the mailing list:
  doing a `reboot -d` on current illumos-gate HEAD gives the following "::
  findleaks -dv" output:
  findleaks: maximum buffers => 301061
  findleaks: actual buffers => 297587
  findleaks:
  findleaks: potential pointers => 29289774
  findleaks: dismissals => 26242305 (89.5%)
  findleaks: misses => 331153 ( 1.1%)
  findleaks: dups => 2419681 ( 8.2%)
  findleaks: follows => 296635 ( 1.0%)
  findleaks:
  findleaks: peak memory usage => 7353 kB
  findleaks: elapsed CPU time => 1.5 seconds
  findleaks: elapsed wall time => 2.0 seconds
  findleaks:
  CACHE LEAKED BUFCTL CALLER
  ffffff03d222b008 120 ffffff03ef7ceb78 nv_alloc_sys+0x1f
  ffffff03d222a448 123 ffffff03f4150cc8 nv_alloc_sys+0x1f
  ffffff03d222b448 5 ffffff03f28bd598 nv_alloc_sys+0x1f
  ffffff03d222b888 87 ffffff03f28c10f0 nv_alloc_sys+0x1f
  ffffff03d222c008 21 ffffff03f4139310 nv_alloc_sys+0x1f
  ffffff03d222b888 43 ffffff040ef3f3e8 nv_alloc_sys+0x1f
  ffffff03d222c008 120 ffffff03f4591e58 nv_alloc_sys+0x1f
  ffffff03d222b008 121 ffffff03f352c068 nv_alloc_sys+0x1f
  ffffff03d222a448 112 ffffff03f414e5f8 nv_alloc_sys+0x1f
  ffffff03d222b008 119 ffffff03ee92fdc0 nv_alloc_sys+0x1f
  ffffff03d222b888 46 ffffff03f28c1378 nv_alloc_sys+0x1f
  ffffff03d222b448 4 ffffff03f28c7708 nv_alloc_sys+0x1f
  ffffff03d222c008 20 ffffff03f2a6e7e8 nv_alloc_sys+0x1f

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-01 16:34:16 +00:00
Mateusz Guzik
a79d52d739 sysctl: remove target buffer read/write checks prior to calling the handler
Said checks were inherently racy anyway as jokers could unmap target areas
before the handler got around to accessing them.

This saves time by avoiding locking the address space.

MFC after:	1 week
2017-09-27 01:31:52 +00:00
Mateusz Guzik
956713cb74 Annotate sysctlmemlock with __exclusive_cache_line.
MFC after:	1 week
2017-09-27 01:27:43 +00:00
Conrad Meyer
4ae2ade114 Enhance debugibility of sysctl leaf re-use warnings
Print the full conflicting oid path, and include the function name in the
warning so it is clear that the warnings are sysctl-related.

PR:		221853
Submitted by:	Fabian Keil <fk AT fabiankeil.de> (earlier version)
Sponsored by:	Dell EMC Isilon
2017-08-27 17:12:30 +00:00
Enji Cooper
8b69c3e79f Print out name of non-dynamic sysctl in sysctl_remove_oid_locked
This will provide a slightly better smoking gun than just stating
"can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9)
and sysctl_remove_{name,oid}(9) with a non-dynamic (likely
static) sysctl.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-03-22 05:27:20 +00:00
Ed Schouten
669a25b50d Document the existence of the {0, 6, ...} sysctl. 2016-12-15 15:45:11 +00:00
Ed Schouten
1e1f3941e4 Add support for attaching aggregation labels to sysctl objects.
I'm currently working on writing a metrics exporter for the Prometheus
monitoring system to provide access to sysctl metrics. Prometheus and
sysctl have some structural differences:

- sysctl is a tree of string component names.
- Prometheus uses a flat namespace for its metrics, but allows you to
  attach labels with values to them, so that you can do aggregation.

An initial version of my exporter simply translated

    hw.acpi.thermal.tz1.temperature

to

    sysctl_hw_acpi_thermal_tz1_temperature_celcius

while we should ideally have

    sysctl_hw_acpi_thermal_temperature_celcius{thermal_zone="tz1"}

allowing you to graph all thermal zones on a system in one go.

The change presented in this commit adds support for accomplishing this,
by providing the ability to attach labels to nodes. In the example I
gave above, the label "thermal_zone" would be attached to "tz1". As this
is a feature that will only be used very rarely, I decided to not change
the KPI too aggressively.

Discussed on:	hackers@
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D8775
2016-12-14 12:47:34 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Hans Petter Selasky
84e717c4cf Add support for boolean sysctl's.
Because the size of bool can be implementation defined, make a bool
sysctl handler which handle bools. Userspace sees the bools like
unsigned 8-bit integers. Values are filtered to either 1 or 0 upon
read and write, similar to what a compiler would do.

Requested by:	kmacy @
Sponsored by:	Mellanox Technologies
2016-05-26 08:41:55 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
Pedro F. Giffuni
b85f65af68 kern: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 16:10:11 +00:00
Mark Johnston
87b87bb853 Evaluate the sysctl_running fail point before taking the sysctl lock.
The fail point handler may sleep, but this is not permitted while holding a
rm read lock.

MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2016-01-26 01:15:18 +00:00
Conrad Meyer
e072f955ff Flesh out sysctl types further (follow-up of r290475)
Use the right intmax_t type instead of intptr_t in a few remaining
places.

Add support for CTLFLAG_TUN for the new fixed with types.  Bruce will be
upset that the new handlers silently truncate tuned quad-sized inputs,
but so do all of the existing handlers.

Add the new types to debug_dump_node, for whatever use that is.

Bump FreeBSD_version again, for good measure.  We are changing
SYSCTL_HANDLER_ARGS and a member of struct sysctl_oid to intmax_t.

Correct the sysctl typed NULL values for the fixed-width types.  (Hat
tip: hps@.)

Suggested by:	hps (partial)
Sponsored by:	EMC / Isilon Storage Division
2015-11-07 18:26:32 +00:00
Conrad Meyer
be87839e56 Round out SYSCTL macros to the full set of fixed-width types
Add S8, S16, S32, and U32 types;  add SYSCTL*() macros for them, as well
as for the existing 64-bit types.  (While SYSCTL*QUAD and UQUAD macros
already exist, they do not take the same sort of 'val' parameter that
the other macros do.)

Clean up the documented "types" in the sysctl.9 document.  (These are
macros and thus not real types, but the manual page documents intent.)

The sysctl_add_oid(9) arg2 has been bumped from intptr_t to intmax_t to
accommodate 64-bit types on 32-bit pointer architectures.

This is just the kernel support piece; the userspace sysctl(1) support
will follow in a later patch.

Submitted by:	Ravi Pokala <rpokala@panasas.com>
Reviewed by:	cem
Relnotes:	no
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D4091
2015-11-07 01:43:01 +00:00
Conrad Meyer
c3220d0b6d Sysctl: Add common support for U8, U16 types
Sponsored by:	EMC / Isilon Storage Division
2015-10-22 23:03:06 +00:00
Mateusz Guzik
7665e341ca sysctl: switch sysctllock to a sleepable rmlock, take 2
This restores r285125. Previous attempt was reverted due to a bug in rmlocks,
which is fixed since r287833.
2015-09-15 23:06:56 +00:00
Mateusz Guzik
4ae1e3c752 Revert r285125 until rmlocks get fixed.
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
2015-07-30 19:52:43 +00:00
Patrick Kelsey
6f99ea0520 Don't acquire sysctlmemlock in userland_sysctl() when the old value
pointer is NULL, as in that case there are no userland pages that
could potentially be wired.  It is common for old to be NULL and
oldlenp to be non-NULL in calls to userland_sysctl(), as this is used
to probe for the length of a variable-length sysctl entry before
retrieving a value.  Note that it is typical for such calls to be made
with an uninitialized value in *oldlenp, so sysctlmemlock was
essentially being acquired at random (depending on the uninitialized
value in *oldlenp being > PAGE_SIZE or not) for these calls prior to
this patch.

Differential Revision: https://reviews.freebsd.org/D2987
Reviewed by: mjg, kib
Approved by: jmallett (mentor)
MFC after: 1 month
2015-07-06 16:07:21 +00:00
Mateusz Guzik
ee5f66f820 sysctl: get rid of sysctl_lock/unlock
Inline their contents into the only consumer.
2015-07-04 14:44:39 +00:00
Mateusz Guzik
d5fc115a1a sysctl: remove a debugging printf which crept in with r285125 2015-07-04 07:01:43 +00:00
Mateusz Guzik
b8633775a8 sysctl: switch sysctllock to a sleepable rmlock
The lock is almost never taken for writing.
2015-07-04 06:54:15 +00:00
Hans Petter Selasky
38668c6044 Implement a simple OID number garbage collector. Given the increasing
number of dynamically created and destroyed SYSCTLs during runtime it
is very likely that the current new OID number limit of 0x7fffffff can
be reached. Especially if dynamic OID creation and destruction results
from automatic tests. Additional changes:

- Optimize the typical use case by decrementing the next automatic OID
sequence number instead of incrementing it. This saves searching time
when inserting new OIDs into a fresh parent OID node.

- Add simple check for duplicate non-automatic OID numbers.

MFC after:  1 week
2015-03-25 08:55:34 +00:00
Hans Petter Selasky
502702c644 Make sure tunable sysctls are only fetched once. The existing code can
re-register sysctls when destroying sysctl contexts or when moving
sysctls from one tree to another.
2015-03-24 17:42:53 +00:00
Hans Petter Selasky
ab91c9a743 Correct string pointer offset for error printout. 2015-03-24 16:37:19 +00:00
Ian Lepore
2834924513 In sbuf_new_for_sysctl(), default the buffer size to 64 bytes if the
passed-in pointer is NULL and the length is zero.
2015-03-17 20:56:24 +00:00
Ian Lepore
1eafc07856 Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases.  (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR:		195668
2015-03-14 17:08:28 +00:00
Mateusz Guzik
84267cacf4 sysctl: don't modify oid_running for static nodes
It is necessary to prevent nodes from being destroyed while used, but static
ones cannot be destroyed.
2014-12-28 19:24:01 +00:00
Mateusz Guzik
f84f8f9468 Now that sysctl_root is only called with sysctl lock in shared mode, update
its assertion to require that.

Update comment missed in r273400: sysctl_xlock/unlock -> sysctl_xlock/xunlock

Noted by: jhb
2014-10-26 01:47:55 +00:00
Dag-Erling Smørgrav
b0d69dfad9 In all cases except CTLTYPE_STRING, penv is NULL here, so passing it
indiscriminately to printf() and freeenv() is incorrect.  Add a NULL
check before freeenv(); as for printf(), we could use req.newptr
instead, but we'd have to select the correct format string based on
the type, and that's too much work for an error message, so just
remove it.
2014-10-23 22:42:56 +00:00
Mateusz Guzik
fca7732078 Mark some more sysctl stuff shared-locked and MPSAFE. 2014-10-21 21:08:45 +00:00
Mateusz Guzik
b564c5d6aa Make sysctl name2oid shared-locked as well.
This is a follow-up to r273401.
2014-10-21 19:45:08 +00:00
Mateusz Guzik
efe0abddf5 Implement shared locking for sysctl. 2014-10-21 19:05:44 +00:00
Mateusz Guzik
580a011762 Rename sysctl_lock and _unlock to sysctl_xlock and _xunlock. 2014-10-21 19:02:26 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Mateusz Guzik
30d58d6b39 Don't make a temporary copy of fixed sysctl strings. 2014-07-10 21:46:57 +00:00
Hans Petter Selasky
604bf9d37e When getting the initial value of numeric tunables use the
getenv_xxx() functions instead of strtoq(), because the getenv_xxx()
functions include wrappers for various postfixes like G/M/K, which
strtoq() doesn't do.
2014-07-05 06:12:48 +00:00
Hans Petter Selasky
6a3287f889 Fix regression issue after r267961. Handle special string case for
SYSCTLs like previously.

MFC after:	2 weeks
Reported by:	several people
2014-06-28 03:59:04 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Gleb Smirnoff
b5c32cf481 Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET
in the sysctl_root().

Note: SYSCTL_VNET_* macros can be removed as well. All is
  needed to virtualize a sysctl oid is set CTLFLAG_VNET on it.
  But for now keep macros in place to avoid large code churn.

Sponsored by:	Nginx, Inc.
2014-02-07 13:47:33 +00:00
Scott Long
f510415d84 Add a helpful message that can help point to why a sysctl tree removal failed
Obtained from:	Netflix
MFC after:	3 days
2013-08-09 01:04:44 +00:00