Commit Graph

10410 Commits

Author SHA1 Message Date
Warner Losh
e4fc8cadca cdefs.h: remove intel_compiler support
The  age  of   the  intel  compiler  support  is  so   old  as  to  be
uninteresting. No recent recports of  intel compiler support have been
received.  Remove  all the  special  case  workarounds for  the  Intel
compiler. Should there be interest in supporting the compiler, contact
me and I'll work with people to make it happen, though I suspect these
instances are more likely to be in the way than to be helpful.

Reviewed by: cem, emaste, vangyzen, dim
Differential Revision: https://reviews.freebsd.org/D26817
2020-10-24 23:21:31 +00:00
Warner Losh
60b426f46c Remove obsolete check for GCC < 3 and support for Intel Compiler
We no longer support old versions of GCC. Remove this check by
assuming it's false. That will make the entire expression false.  Also
remove support for Intel compiler, it's badly bitrotted.  Technically,
this removes support for C89 and K&R from compilers that don't define
_Bool in those compilation environments as well. I'm unaware of any
working compiler today for which that would be relevant (pcc has it
and tcc sadly isn't working for other reasons), though if one
pops up in ports, I'll work to resolve the issue.
2020-10-24 23:21:06 +00:00
Alexander Motin
8b220f8915 Fix asymmetry in devstat(9) calls by GEOM.
Before this GEOM passed bio pointer to transaction start, but not end.
It was irrelevant until devstat(9) got DTrace hooks, that appeared to
provide bio pointer on I/O completion, but not on submission.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-10-24 21:07:10 +00:00
Mateusz Guzik
c7520caa4f vfs: prevent avoidable evictions on mkdir of existing directories
mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST.

The NOCACHE lookup will make the namecache unnecessarily evict the existing entry,
and then fallback to the fs lookup routine eventually leading namei to return an
error as the directory is already there.

For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers
fallbacks to the slowpath for concurrently executing lookups.

Tested by:	pho
Discussed with:	kib
2020-10-22 19:28:12 +00:00
Hans Petter Selasky
2ae634c6db Implement mbuf hashing routines for IP over infiniband, IPoIB.
No functional change intended.

Differential Revision:	https://reviews.freebsd.org/D26254
Reviewed by:		melifaro@
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-22 09:17:56 +00:00
Brooks Davis
44ca4575ea vmapbuf: don't smuggle address or length in buf
Instead, add arguments to vmapbuf.  Since this argument is
always a pointer use a type of void * and cast to vm_offset_t in
vmapbuf.  (In CheriBSD we've altered vm_fault_quick_hold_pages to
take a pointer and check its bounds.)

In no other situtation does b_data contain a user pointer and vmapbuf
replaces b_data with the actual mapping.

Suggested by:	jhb
Reviewed by:	imp, jhb
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26784
2020-10-21 16:00:15 +00:00
Mateusz Guzik
3fc7822de1 Bump __FreeBSD_version after VOP VPTOCNP and INACTIVE changes 2020-10-20 07:19:44 +00:00
Mateusz Guzik
8ecd87a3e7 vfs: drop spurious cred argument from VOP_VPTOCNP 2020-10-20 07:18:27 +00:00
Ruslan Bukin
e707c8be4e Manage MSI iommu pages.
This allows the interrupt controller driver only need a small change to
create a map for the page the device will write to raise an interrupt.

Submitted by:	andrew
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26705
2020-10-19 13:10:21 +00:00
Mateusz Guzik
ad89066af4 vfs: annotate mountlist_mtx with __exclusive_cache_line 2020-10-17 08:47:08 +00:00
Xin LI
5853735d5d Bump __FreeBSD_version after ptsname_r addition. 2020-10-17 04:14:46 +00:00
Mateusz Guzik
ad429c47ce Bump __FreeBSD_version after addition of VOP_EAGAIN 2020-10-15 05:11:16 +00:00
Mateusz Guzik
214eccf4b6 vfs: add VOP_EAGAIN
Can be used to stub fplookup for example.
2020-10-15 04:48:14 +00:00
Andrey V. Elsukov
6952c3e1ac Implement SIOCGIFALIAS.
It is lightweight way to check if an IPv4 address exists.

Submitted by:	Roy Marples
Reviewed by:	gnn, melifaro
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26636
2020-10-14 09:22:54 +00:00
Andrew Turner
ed50d40834 Bump __FreeBSD_version for the fix to arm64 write-only mappings
Sponsored by:	Innovate UK
2020-10-13 10:31:12 +00:00
Warner Losh
af928ad562 systm.h: forward declare ucred for _STANDALONE too
There's a number of types we forward declare for the kernel. We need
struct ucred for the ZSTD ZFS integration, so go ahead and forward
declare it here too.
2020-10-12 05:56:29 +00:00
Conrad Meyer
f8e8a06d23 random(4) FenestrasX: Push root seed version to arc4random(3)
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled.  Otherwise, there is no
functional change.  The mechanism can be disabled with
debug.fxrng_vdso_enable=0.

arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time.  Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed.  On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.

This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator.  That change is
left for future discussion/work.

Reviewed by:	kib (previous version)
Approved by:	csprng (me -- only touching FXRNG here)
Differential Revision:	https://reviews.freebsd.org/D22839
2020-10-10 21:52:00 +00:00
Conrad Meyer
10b1a17594 arc4random(9): Integrate with RANDOM_FENESTRASX push-reseed
There is no functional change for the existing Fortuna random(4)
implementation, which remains the default in GENERIC.

In the FenestrasX model, when the root CSPRNG is reseeded from pools due to
an (infrequent) timer, child CSPRNGs can cheaply detect this condition and
reseed.  To do so, they just need to track an additional 64-bit value in the
associated state, and compare it against the root seed version (generation)
on random reads.

This revision integrates arc4random(9) into that model without substantially
changing the design or implementation of arc4random(9).  The motivation is
that arc4random(9) is immediately reseeded when the backing random(4)
implementation has additional entropy.  This is arguably most important
during boot, when fenestrasX is reseeding at 1, 3, 9, 27, etc., second
intervals.  Today, arc4random(9) has a hardcoded 300 second reseed window.
Without this mechanism, if arc4random(9) gets weak entropy during initial
seed (and arc4random(9) is used early in boot, so this is quite possible),
it may continue to emit poorly seeded output for 5 minutes.  The FenestrasX
push-reseed scheme corrects consumers, like arc4random(9), as soon as
possible.

Reviewed by:	markm
Approved by:	csprng (markm)
Differential Revision:	https://reviews.freebsd.org/D22838
2020-10-10 21:48:06 +00:00
Mateusz Guzik
dd28b379cb vfs: support lockless dirfd lookups 2020-10-10 03:48:17 +00:00
Bryan Drewery
c2c6fb90e0 Use unlocked page lookup for inmem() to avoid object lock contention
Reviewed By:	kib, markj
Submitted by:	mlaier
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D26653
2020-10-09 23:49:42 +00:00
Mitchell Horne
22e6a67086 Add a routine to dump boot metadata
The boot metadata (also referred to as modinfo, or preload metadata)
provides information about the size and location of the kernel,
pre-loaded modules, and other metadata (e.g. the EFI framebuffer) to be
consumed during by the kernel during early boot. It is encoded as a
series of type-length-value entries and is usually constructed by
loader(8) and passed to the kernel. It is also faked on some
architectures when booted by other means.

Although much of the module information is available via kldstat(8),
there is no easy way to debug the metadata in its entirety. Add some
routines to parse this data and allow it to be printed to the console
during early boot or output via a sysctl.

Since the output can be lengthly, printing to the console is gated
behind the debug.dump_modinfo_at_boot kenv variable as well as the
BOOTVERBOSE flag. The sysctl to print the metadata is named
debug.dump_modinfo.

Reviewed by:	tsoome
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26687
2020-10-08 18:02:05 +00:00
Warner Losh
bc683a89a3 Move kernel env global variables, etc to sys/kenv.h
The kernel globals for kenv are confined to 2 files that need them and
a few that likely shouldn't (but as written the code does). Move them
from sys/systm.h to sys/kenv.h. This removed a XXX from systm.h and
cleans it up a little bit...
2020-10-07 06:16:37 +00:00
Mitchell Horne
6debfd4b13 Remove unused function cpu_boot()
The prototype was added with the creation of kern_shutdown.c in r17658,
but it appears to have never been implemented. Remove it now.

Reviewed by:	cem, kib
Differential Revision:	https://reviews.freebsd.org/D26702
2020-10-06 23:16:56 +00:00
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Ryan Moeller
92e17803cd Enable iterating all sysctls, even ones with CTLFLAG_SKIP
Add an "nextnoskip" sysctl that allows for listing of sysctls intended to be
normally skipped for cost reasons.

This makes it so the names/descriptions of those sysctls can be discovered with
sysctl -aN/sysctl -ad/sysctl -at.

It also makes it so children are visited when a node flagged with CTLFLAG_SKIP
is explicitly requested.

The intended use case is to mark the root "kstat" node with CTLFLAG_SKIP so that
the extensive and expensive stats are skipped by default but may still be easily
obtained without having to know them all (which may not even be possible) and
request each one-by-one.

Reviewed by:	jhb
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D26560
2020-10-05 20:13:22 +00:00
Mateusz Guzik
4e2266100d cache: fix pwd use-after-free in setting up fallback
Since the code exits smr section prior to calling pwd_hold, the used
pwd can be freed and a new one allocated with the same address, making
the comparison erroneously true.

Note it is very unlikely anyone ran into it.
2020-10-05 19:38:51 +00:00
Hans Petter Selasky
4c2dddd8a7 Populate the acquire context field of a ww_mutex in the LinuxKPI.
Bump the FreeBSD version to force recompilation of external kernel modules.

MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D26657
Submitted by:		greg_unrelenting.technology (Greg V)
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-04 17:23:39 +00:00
Konstantin Belousov
0400be45e9 Add sig_intr(9).
It gives the answer would the thread sleep according to current state
of signals and suspensions.  Of course the answer is racy and allows
for false-negatives (no sleep when signal is delivered after process
lock is dropped).  Also the answer might change due to signal
rescheduling among threads in multi-threaded process.

Still it is the best approximation I can provide, to answering the
question was the thread interrupted.

Reviewed by:	markj
Tested by:	pho, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26628
2020-10-04 16:33:42 +00:00
Konstantin Belousov
0c82fb267b Refactor sleepq_catch_signals().
- Extract suspension check into sig_ast_checksusp() helper.
- Extract signal check and calculation of the interruption errno into
  sig_ast_needsigchk() helper.
The helpers are moved to kern_sig.c which is the proper place for
signal-related code.

Improve control flow in sleepq_catch_signals(), to handle ret == 0
(can sleep) and ret != 0 (interrupted) only once, by separating
checking code into sleepq_check_ast_sq_locked(), which return value is
interpreted at single location.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26628
2020-10-04 16:30:05 +00:00
Alexander V. Chernikov
fedeb08b6a Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
 relative weights and a dataplane-optimized structure to enable
 efficient nexthop selection.

Simular to the nexthops, nexthop groups are immutable. Dataplane part
 gets compiled during group creation and is basically an array of
 nexthop pointers, compiled w.r.t their weights.

With this change, `rt_nhop` field of `struct rtentry` contains either
 nexthop or nexthop group. They are distinguished by the presense of
 NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
 should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now comes with weight, default weight is 1, maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
 This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
1         ------- ------- ------- --------------------------------------- ---------       1
              13      10       1                             2001:db8::2     vlan2
              14      20       2                             2001:db8::3     vlan2

Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default

Tested by:	olivier
Reviewed by:	glebius
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D26449
2020-10-03 10:47:17 +00:00
Emmanuel Vadot
1e145e73b8 Bump __FreeBSD_version after latest linuxkpi changes 2020-10-02 18:29:25 +00:00
Emmanuel Vadot
675aae732d Add backlight subsystem
This is a simple subsystem that allow drivers to register as a backlight.
Each backlight creates a device node under /dev/backlight/backlightX and
an alias based on the name provided.

Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26250
2020-10-02 18:18:01 +00:00
Warner Losh
dc761d84e2 Standalone SX shims
Create a do-nothing version of SX locks. OpenZFS needs them. However,
since the boot loader is single threaded, they can be nops.
2020-09-29 18:06:02 +00:00
Ed Maste
c1aedfcbd9 add SIOCGIFDATA ioctl
For interfaces that do not support SIOCGIFMEDIA (for which there are
quite a few) the only fallback is to query the interface for
if_data->ifi_link_state.  While it's possible to get at if_data for an
interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms.

SIOCGIFDATA is a simple ioctl to retrieve this fast with very little
resource use in comparison.  This implementation mirrors that of other
similar ioctls in FreeBSD.

Submitted by:	Roy Marples <roy@marples.name>
Reviewed by:	markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D26538
2020-09-28 16:54:39 +00:00
Edward Tomasz Napierala
4abea760e7 Shrink struct sysent from 48 to 32 bytes (on LP64; on ILP32 its probably
from 32 to 28) by shrinking some entries and reordering them.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26508
2020-09-27 18:14:01 +00:00
Warner Losh
728757f256 Adjustments to includes for openzfs in _STANDALONE
Allow the necessary parts of systm.h to be visible in the _STANDALONE
environnment. Limit the reset to only being visible for _KERNEL
builds.  Map KASSERT, etc to printf on failure in the bootloader until
we have more confidence things won't break and leave systems
unbootable. Eventually, this should map to a full panic in the
bootloader, but that also needs some enhancement to be more useful.

Reviewed by: tsoome, jhb
Differential Revision:  https://reviews.freebsd.org/D26543
2020-09-25 20:51:07 +00:00
Warner Losh
fcefa24551 Dont let kernel and standalone both be defined at the same time
_KERNEL and _STANDALONE are different things. They cannot both be true
at the same time. If things that are normally visible only to _KERNEL
are needed for the _STANDALONE environment, you need to also make them
visible to _STANDALONE. Often times, this will be just a subset of the
required things for _KERNEL (eg global variables are but one example).

sys/cdefs.h is included by pretty much everything in both the loader
and the kernel, so is the ideal choke point.
2020-09-25 19:02:49 +00:00
Edward Tomasz Napierala
0c5bd5f993 Regen after r366145.
Sponsored by:	DARPA
2020-09-25 10:05:38 +00:00
Warner Losh
0672da33f3 Create a standalone version of sys/malloc.h
The ZSTD support for the boot loader will need to include files that
use the kernel's malloc interface. Create a standalone stub version
that's functional enough to allow this to work. There's some
limitations in this interface, and it's not quite a perfect
match. Specifically, M_WAITOK allocations can fail because there's
nothing that can be done we no memory is available.
2020-09-24 06:40:35 +00:00
Konstantin Belousov
aaf78c16f5 Do not leak oldvmspace if image activation failed
and current address space is already destroyed, so kern_execve()
terminates the process.

While there, clean up some internals of post_execve() inlined in init_main.

Reported by:	Peter <pmc@citylink.dinoex.sub.org>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26525
2020-09-23 18:03:07 +00:00
Mateusz Guzik
254c54c65a Bump __FreeBSD_version after cache_purgevfs change 2020-09-23 11:02:23 +00:00
Mateusz Guzik
a3d9bf49b5 cache: drop the force flag from purgevfs
The optional scan is wasteful, thus it is removed altogether from unmount.

Callers which always want it anyway remain unaffected.
2020-09-23 10:46:07 +00:00
Brandon Bergren
af22c7e495 __FreeBSD_version bump for introduction of the powerpc64le arch.
Although this is technically not a breaking change, I believe it is best
to have a fresh version to use to define where the starting point was
here.
2020-09-23 03:19:20 +00:00
Konstantin Belousov
1317da4349 Add O_RESOLVE_BENEATH and AT_RESOLVE_BENEATH to mimic Linux' RESOLVE_BENEATH.
It is like O_BENEATH, but disables to walk out of the subtree rooted
in the starting directory. O_BENEATH does not care if path walks out
if it returned.

Requested by:	Dan Gohman <dev@sunfishcode.online>
PR:	248335
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:48:12 +00:00
Konstantin Belousov
c7de3d6f0b Add NIRES_STRICTREL.
Stop abusing internal namei flag NI_LCF_STRICTRELATIVE as indicator of
cap-restricted lookup.  Add designated returned flag NIRES_STRICTREL
to inform kern_openat() that lookup was restricted.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:06:20 +00:00
D Scott Phillips
26a3bf76c9 bitset: expand bit index type to long
An upcoming patch to use the bitset macros for tracking vm page
dump information could conceivably need more than INT_MAX bits.
Expand the bit type to long so that the extra range is available
on 64-bit platforms where it would most likely be needed.

CPUSET_COUNT and DOMAINSET_COUNT are also modified to remain of
type `int`.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26190
2020-09-21 22:19:12 +00:00
Mitchell Horne
45f6508149 Hide tunable definitions behind _KERNEL
Some userspace code include sys/kernel.h. Namely, some OpenZFS tests do
this, and it was causing breakage after r365945 due to a lack of bool
typedef. Userspace should not need the TUNABLE_** stuff, so hide it
behind an #ifdef _KERNEL.

Sorry for the breakage.

Reported by:	andrew, Michael Butler, Jenkins
Discussed with: kevans, allanjude
2020-09-21 17:28:41 +00:00
Mitchell Horne
cba446e2c2 Add getenv(9) boolean parsing functions
This adds the getenv_bool() function, to parse a boolean value from a
kernel environment variable or tunable. This works for traditional
boolean values like "0" and "1", and also "true" and "false"
(case-insensitive). These semantics do not yet apply to sysctls declared
using SYSCTL_BOOL with CTLFLAG_TUN (they still only parse 1 and 0).

Also added are two wrapper functions, getenv_is_true() and
getenv_is_false(). These are slightly simpler for callers wishing to
perform a single check of a configuration variable.

Reviewed by:	jhb (slightly earlier version)
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26270
2020-09-21 15:24:44 +00:00
Jessica Clarke
7d54cc9165 atomic_common.h: Fix the volatile qualifier placement in atomic_load_ptr
This was broken in r357940 which introduced the __typeof use. We need
the volatile qualifier to be on the pointee not the pointer otherwise it
does nothing. This was found by mhorne in D26498, noticing there was a
problem (a spin loop condition was hoisted for RISC-V boot code) but not
the root cause of it.

Reported by:	mhorne
Reviewed by:	mhorne, mjg
Approved by:	mhorne, mjg
Differential Revision:	https://reviews.freebsd.org/D26500
2020-09-20 23:20:18 +00:00
Michal Meloun
95a85c125d Add NetBSD compatible bus_space_peek_N() and bus_space_poke_N() functions.
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.

Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.

This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25371
2020-09-19 11:06:41 +00:00