Commit Graph

253 Commits

Author SHA1 Message Date
Mark Johnston
9e575fadf4 link_elf_obj: Invoke fini callbacks
This is required for KASAN: when a module is unloaded, poisoned regions
(e.g., pad areas between global variables) are left as such, so if they
are reused as KLDs are loaded, false positives can arise.

Reported by:	pho, Jenkins
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31339
2021-07-29 09:46:25 -04:00
Konstantin Belousov
e266a0f7f0 kern linker: do not allow more than one kldload and kldunload syscalls simultaneously
kld_sx is dropped e.g. for executing sysinits, which allows user
to initiate kldunload while module is not yet fully initialized.

Reviewed by:	markj
Differential revision:	https://reviews.freebsd.org/D30456
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-05-31 18:09:22 +03:00
Warner Losh
f1f9870668 Minor style cleanup
We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.

MFC After:		3 days
Sponsored by:		Netflix
2021-04-18 11:14:17 -06:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Peter Wemm
694f3fc81c Clarify which hints file is the source of an error message.
PR:		246688
Submitted by:	Ashish Gupta <lrx337@gmail.com>
MFC after:	1 week
2020-06-01 03:37:58 +00:00
Mark Johnston
615a9fb5fe Use the symbolic name for "modmetadata_set".
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-05-19 18:34:50 +00:00
Mateusz Guzik
d2222aa0e9 fd: use smr for managing struct pwd
This has a side effect of eliminating filedesc slock/sunlock during path
lookup, which in turn removes contention vs concurrent modifications to the fd
table.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D23889
2020-03-08 00:23:36 +00:00
Mateusz Guzik
8d03b99b9d fd: move vnodes out of filedesc into a dedicated structure
The new structure is copy-on-write. With the assumption that path lookups are
significantly more frequent than chdirs and chrooting this is a win.

This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where jail vnode was never referenced,
meaning subsequent access on lookup could run into use-after-free.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23884
2020-03-01 21:53:46 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Ryan Libby
fe20aaec0a sys/kern: quiet -Wwrite-strings
Quiet a variety of Wwrite-strings warnings in sys/kern at low-impact
sites.  This patch avoids addressing certain others which would need to
plumb const through structure definitions.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23798
2020-02-23 03:32:16 +00:00
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Ian Lepore
b19c9dea3e Rewrite arm kernel stack unwind code to work when unwinding through modules.
The arm kernel stack unwinder has apparently never been able to unwind when
the path of execution leads through a kernel module. There was code that
tried to handle modules by looking for the unwind data in them, but it did
so by trying to find symbols which have never existed in arm kernel
modules. That caused the unwind code to panic, and because part of panic
handling calls into the unwind code, that just created a recursion loop.

Locating the unwind data in a loaded module requires accessing the Elf
section headers to find the SHT_ARM_EXIDX section. For preloaded modules
those headers are present in a metadata blob. For dynamically loaded
modules, the headers are present only while the loading is in progress; the
memory is freed once the module is ready to use. For that reason, there is
new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the
unwind info while the headers are loaded. The values are saved into new
fields in the linker_file structure which are also conditional on __arm__.

In arm/unwind.c there is new code to locally cache the per-module info
needed to find the unwind tables. The local cache is crafted for lockless
read access, because the unwind code often needs to run in context where
sleeping is not allowed.  A large comment block describes the local cache
list, so I won't repeat it all here.
2019-12-15 21:16:35 +00:00
Hans Petter Selasky
c2a8682ae8 Factor out check for mounted root file system.
Differential Revision:	https://reviews.freebsd.org/D22571
PR:		241639
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-11-28 08:47:36 +00:00
Hans Petter Selasky
aa4612d133 Fix panic when loading kernel modules before root file system is mounted.
Make sure the rootvnode is always NULL checked.

Differential Revision:	https://reviews.freebsd.org/D22545
PR:		241639
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-11-26 12:20:44 +00:00
Conrad Meyer
d158fa4ade Add flags variants to linker_files / stack(9) symbol resolution
Some best-effort consumers may find trylock behavior for stack(9) symbol
resolution acceptable.  Expose that behavior to such consumers.

This API is ugly.  If in the future the modules and linker file list locking
is cleaned up such that the linker_files list can be iterated safely without
acquiring a sleepable lock, this API should be removed.  However, most of
the time nothing will be holding the linker files lock exclusive and the
acquisition can proceed.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17620
2018-10-20 18:08:43 +00:00
Bjoern A. Zeeb
0455a92bcb The countp argument passed to linker_file_lookup_set() in
linker_load_dependencies() is unused, so no need to ask for the
value in first place.  Remove the unused "count" variable.

Approved by:	re (kib)
2018-10-17 10:31:08 +00:00
Ed Maste
891cf3ed44 Use NULL for SYSINIT's last arg, which is a pointer type
Sponsored by:	The FreeBSD Foundation
2018-05-18 17:58:09 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Gordon Tetlow
edb01d11f8 Properly bzero kldstat structure to prevent kernel information leak.
Submitted by:	kib
Reported by:	TJ Corley
Security:	CVE-2017-1088
2017-11-15 22:30:21 +00:00
Andriy Gapon
693593b6f0 sysctl-s in a module should be accessible only when the module is initialized
A sysctl can have a custom handler that may access data that is initialized
via SYSINIT(9) or via a module event handler (also invoked via SYSINIT).
Thus, it is not safe to allow access to the module's sysctl-s until
the initialization is performed.  Likewise, we should not allow access
to teh sysctl-s after the module is uninitialized.
The latter is easy to achieve by properly ordering linker_file_unregister_sysctls
and linker_file_sysuninit.
The former is not as easy for two reasons:
- the initialization may depend on tunables which get set when sysctl-s are
  registered, so we need to set the tunables before running sysinit-s
- the initialization may try to dynamically add more sysctl-s under statically
  defined sysctl nodes
So, this change splits the sysctl setup into two phases.  In the first phase
the sysctl-s are registered as before but they are disabled and hidden from
consumers.  In the second phase, done after sysinit-s, normal access to the
sysctl-s is enabled.

The change should affect only dynamic module loading and unloading after
the system boot-up.  Nothing changes for sysctl-s compiled into the kernel
and sysctl-s in preloaded modules.

Discussed with:	hselasky, ian, jhb
Reviewed by:	julian, kib
MFC after:	2 weeks
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D12545
2017-10-05 12:32:14 +00:00
Andriy Gapon
550374efe6 revert r324166, it has an unrelated change in it 2017-10-01 16:37:54 +00:00
Andriy Gapon
7d5c6491f0 MFV r323531: 8521 nvlist memory leak in get_clones_stat() and spa_load_best()
illumos/illumos-gate@7d3000f774
7d3000f774

https://www.illumos.org/issues/8521
  Yuri reported this to the mailing list:
  doing a `reboot -d` on current illumos-gate HEAD gives the following "::
  findleaks -dv" output:
  findleaks: maximum buffers => 301061
  findleaks: actual buffers => 297587
  findleaks:
  findleaks: potential pointers => 29289774
  findleaks: dismissals => 26242305 (89.5%)
  findleaks: misses => 331153 ( 1.1%)
  findleaks: dups => 2419681 ( 8.2%)
  findleaks: follows => 296635 ( 1.0%)
  findleaks:
  findleaks: peak memory usage => 7353 kB
  findleaks: elapsed CPU time => 1.5 seconds
  findleaks: elapsed wall time => 2.0 seconds
  findleaks:
  CACHE LEAKED BUFCTL CALLER
  ffffff03d222b008 120 ffffff03ef7ceb78 nv_alloc_sys+0x1f
  ffffff03d222a448 123 ffffff03f4150cc8 nv_alloc_sys+0x1f
  ffffff03d222b448 5 ffffff03f28bd598 nv_alloc_sys+0x1f
  ffffff03d222b888 87 ffffff03f28c10f0 nv_alloc_sys+0x1f
  ffffff03d222c008 21 ffffff03f4139310 nv_alloc_sys+0x1f
  ffffff03d222b888 43 ffffff040ef3f3e8 nv_alloc_sys+0x1f
  ffffff03d222c008 120 ffffff03f4591e58 nv_alloc_sys+0x1f
  ffffff03d222b008 121 ffffff03f352c068 nv_alloc_sys+0x1f
  ffffff03d222a448 112 ffffff03f414e5f8 nv_alloc_sys+0x1f
  ffffff03d222b008 119 ffffff03ee92fdc0 nv_alloc_sys+0x1f
  ffffff03d222b888 46 ffffff03f28c1378 nv_alloc_sys+0x1f
  ffffff03d222b448 4 ffffff03f28c7708 nv_alloc_sys+0x1f
  ffffff03d222c008 20 ffffff03f2a6e7e8 nv_alloc_sys+0x1f

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>

MFC after:	5 weeks
X-MFC after:	r324163
2017-10-01 16:34:16 +00:00
Conrad Meyer
ca3fec5042 kldstat: Use sizeof in place of named constants for sizing
No functional change.

This is handy for FreeBSD derivatives that want to modify the value of
MAXPATHLEN, but not the kld_file_stat ABI.

Submitted by:	Siddhant Agarwal <sagarwal AT isilon.com>
Sponsored by:	Dell EMC Isilon
2017-07-29 23:31:21 +00:00
Gleb Smirnoff
d0147e10ca In linker_load_file() print name of a file that failed to load.
Discussed with:	kib
2017-03-09 00:56:07 +00:00
Conrad Meyer
d9ce8a41ea kern_linker: Handle module-loading failures in preloaded .ko files
The runtime kernel loader, linker_load_file, unloads kernel files that
failed to load all of their modules. For consistency, treat preloaded
(loader.conf loaded) kernel files in the same way.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8200
2016-10-13 02:06:23 +00:00
Conrad Meyer
8a3aeac27b Add DDB command "kldstat"
It prints much the same information as kldstat(8) without any arguments.

Suggested by:	jhibbits
Sponsored by:	EMC / Isilon Storage Division
2016-06-09 18:27:41 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
Edward Tomasz Napierala
35030a5dd4 Remove some NULL checks for M_WAITOK allocations.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-03-29 13:56:59 +00:00
Warner Losh
da82615ae2 Create the MDT_PNP_INFO metadata record to communicate PNP info about
modules. External agents may use this data to automatically load those
modules.

Differential Review: https://reviews.freebsd.org/D3461
2015-12-11 05:27:53 +00:00
John Baldwin
645743ea99 Export various helper variables describing the layout and size of
certain kernel structures for use by debuggers. This mostly aids
in examining cores from a kernel without debug symbols as a debugger
can infer these values if debug symbols are available.

One set of variables describes the layout of 'struct linker_file' to
walk the list of loaded kernel modules.

A second set of variables describes the layout of 'struct proc' and
'struct thread' to walk the list of processes in the kernel and the
threads in each process.

The 'pcb_size' variable is used to index into the stoppcbs[] array.

The 'vm_maxuser_address' is used to distinguish kernel virtual addresses
from user addresses. This doesn't have to be perfect, and
'vm_maxuser_address' is a cheap and simple way to differentiate kernel
pointers from simple values like TIDs and PIDs.

While here, annotate the fields in struct pcb used by kgdb on amd64
and i386 to note that their ABI should be preserved.  Annotations for
other platforms will be added in the future.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3773
2015-11-12 22:00:59 +00:00
Mateusz Guzik
7665e341ca sysctl: switch sysctllock to a sleepable rmlock, take 2
This restores r285125. Previous attempt was reverted due to a bug in rmlocks,
which is fixed since r287833.
2015-09-15 23:06:56 +00:00
Mateusz Guzik
4ae1e3c752 Revert r285125 until rmlocks get fixed.
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
2015-07-30 19:52:43 +00:00
Mateusz Guzik
b8633775a8 sysctl: switch sysctllock to a sleepable rmlock
The lock is almost never taken for writing.
2015-07-04 06:54:15 +00:00
John Baldwin
15c2b30155 Revert r284153, as I believe it breaks the dtrace sdt module. I will
fix the original issue a different way.
2015-06-08 18:06:00 +00:00
John Baldwin
69c5c774fe Add an internal "locked" variant of linker_file_lookup_set() and change
the public function to acquire the global linker lock directly.  This
permits linker_file_lookup_set() to be safely used from other modules.
2015-06-08 14:06:47 +00:00
Warner Losh
d0b6da086f Const poison in a few places to ensure we don't modify things
through the module data pointer.
2014-12-03 22:14:13 +00:00
Warner Losh
fac92ae126 The current limit of 100k for the linker hints file is getting a bit
crowded as we now are at about 70k. Bump the limit to 1MB instead
which is still quite a reasonable limit and allows for future growth
of this file and possible future expansion to additional data.

MFC After: 2 weeks
2014-11-29 17:29:30 +00:00
Mateusz Guzik
2afec8edfc Take the lock shared in linker_search_symbol_name.
This helps sysctl kern.proc.stack.
2014-10-21 21:29:20 +00:00
Mateusz Guzik
580a011762 Rename sysctl_lock and _unlock to sysctl_xlock and _xunlock. 2014-10-21 19:02:26 +00:00
Marcel Moolenaar
0067051fe7 Fully support constructors for the purpose of code coverage analysis.
This involves:
1.  Have the loader pass the start and size of the .ctors section to the
    kernel in 2 new metadata elements.
2.  Have the linker backends look for and record the start and size of
    the .ctors section in dynamically loaded modules.
3.  Have the linker backends call the constructors as part of the final
    work of initializing preloaded or dynamically loaded modules.

Note that LLVM appends the priority of the constructors to the name of
the .ctors section. Not so when compiling with GCC. The code currently
works for GCC and not for LLVM.

Submitted by:	Dmitry Mikulin <dmitrym@juniper.net>
Obtained from:	Juniper Networks, Inc.
2014-10-20 17:04:03 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Konstantin Belousov
14fcb4b4f8 Use realloc(9) instead of doing the reallocation inline.
Submitted by:	bde
MFC after:	1 week
2014-04-05 20:44:52 +00:00
Konstantin Belousov
cee9542d51 Use correct types for sizeof() in the calculations for the malloc(9) sizes [1].
While there, remove unneeded checks for failed allocations with M_WAITOK flag.

Submitted by:	Conrad Meyer <cemeyer@uw.edu> [1]
MFC after:	1 week
2014-03-12 10:25:26 +00:00
Mark Johnston
8f7254629f Invoke the kld_* event handlers from linker_load_file() and
linker_unload_file() rather than kern_kldload() and kern_kldunload(). This
ensures that the handlers are invoked for files that are loaded/unloaded
automatically as dependencies. Previously, they were only invoked for files
loaded by a user.

As a side effect, the kld_load and kld_unload handlers are now invoked with
the kernel linker lock exclusively held.

Reported by:	avg
Reviewed by:	jhb
MFC after:	2 weeks
2013-12-19 03:48:36 +00:00
Mark Johnston
29f4e216f2 Rename the kld_unload event handler to kld_unload_try, and add a new
kld_unload event handler which gets invoked after a linker file has been
successfully unloaded. The kld_unload and kld_load event handlers are now
invoked with the shared linker lock held, while kld_unload_try is invoked
with the lock exclusively held.

Convert hwpmc(4) to use these event handlers instead of having
kern_kldload() and kern_kldunload() invoke hwpmc(4) hooks whenever files are
loaded or unloaded. This has no functional effect, but simplifes the linker
code somewhat.

Reviewed by:	jhb
2013-08-24 21:13:38 +00:00
Mark Johnston
0770f164d3 Set things up so that linker_file_lookup_set() is always called with the
linker lock held. This makes it possible to call it from a kld event handler
with the shared lock held.

Reviewed by:	jhb
2013-08-24 21:08:55 +00:00