17384 Commits

Author SHA1 Message Date
Brandon Bergren
9411e24df3 [PowerPC] kernel ifunc support for powerpc*, fix ppc64 relocation oddities.
This is a general cleanup of the relocatable kernel support on powerpc,
needed to enable kernel ifuncs.

 * Fix some relocatable issues in the kernel linker, and change to using
   a RELOCATABLE_KERNEL #define instead of #ifdef __powerpc__ for parts that
   other platforms can use in the future if they wish to have ET_DYN kernels.

 * Get rid of the DB_STOFFS hack now that the kernel is relocated to the DMAP
   properly across the board on powerpc64.

 * Add powerpc64 and powerpc32 ifunc functionality.

 * Allow AIM64 virtual mode OF kernels to run from the DMAP like other AIM64
   by implementing a virtual mode restart. This fixes the runtime address on
   PowerMac G5.

 * Fix symbol relocation problems on post-relocation kernels by relocating
   the symbol table.

 * Add an undocumented method for supplying kernel symbols on powernv and
   other powerpc machines using linux-style kernel/initrd loading -- If
   you pass the kernel in as the initrd as well, the copy resident in initrd
   will be used as a source for symbols when initializing the debugger.
   This method is subject to removal once we have a better way of doing this.

Approved by:	jhibbits
Relnotes:	yes
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D23156
2020-05-07 19:32:49 +00:00
Brandon Bergren
f57d153e2e [PowerPC] Fix powerpcspe build failure after r360569
On powerpcspe, vm_paddr_t is 64 bit despite it being a 32 bit platform.
Adjust compile time assertion to compensate.
2020-05-07 17:58:07 +00:00
Gleb Smirnoff
61664ee700 Step 4.2: start divorce of M_EXT and M_EXTPG
They have more differencies than similarities. For now there is lots
of code that would check for M_EXT only and work correctly on M_EXTPG
buffers, so still carry M_EXT bit together with M_EXTPG. However,
prepare some code for explicit check for M_EXTPG.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:37:16 +00:00
Gleb Smirnoff
365e8da44a Mechanically rename MBUF_EXT_PGS_ASSERT() to M_ASSERTEXTPG() to match
classical M_ASSERTPKTHDR.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:27:41 +00:00
Gleb Smirnoff
6edfd179c8 Step 4.1: mechanically rename M_NOMAP to M_EXTPG
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:21:11 +00:00
Gleb Smirnoff
7b6c99d08d Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

s/->m_ext_pgs.nrdy/->m_epg_nrdy/g
s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g
s/->m_ext_pgs.trail_len/->m_epg_trllen/g
s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g
s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g
s/->m_ext_pgs.flags/->m_epg_flags/g
s/->m_ext_pgs.record_type/->m_epg_record_type/g
s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g
s/->m_ext_pgs.tls/->m_epg_tls/g
s/->m_ext_pgs.so/->m_epg_so/g
s/->m_ext_pgs.seqno/->m_epg_seqno/g
s/->m_ext_pgs.stailq/->m_epg_stailq/g

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:12:56 +00:00
Gleb Smirnoff
bccf6e26e9 Step 2.5: Stop using 'struct mbuf_ext_pgs' in the kernel itself.
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:08:05 +00:00
Gleb Smirnoff
b363a438b1 Make MBUF_EXT_PGS_ASSERT_SANITY() a macro, so that it prints file:line.
While here, stop using struct mbuf_ext_pgs.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:03:39 +00:00
Gleb Smirnoff
c4ee38f8e8 Step 2.3: Rename mbuf_ext_pg_len() to m_epg_pagelen() that
uses mbuf argument.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:52:35 +00:00
Gleb Smirnoff
49b6b60e22 Step 2.2:
o Shrink sglist(9) functions to work with multipage mbufs down from
  four functions to two.
o Don't use 'struct mbuf_ext_pgs *' as argument, use struct mbuf.
o Rename to something matching _epg.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:46:29 +00:00
Gleb Smirnoff
d90fe9d0cd Step 2.1: Build TLS workqueue from mbufs, not struct mbuf_ext_pgs.
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:38:13 +00:00
Gleb Smirnoff
eeec834855 Get rid of the mbuf self-pointing pointer.
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 22:56:22 +00:00
Gleb Smirnoff
7433a5a966 Start moving into EPG_/epg_ namespace. There is only one flag, but
next commit brings in second flag, so let them already be in the
future namespace.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 22:49:14 +00:00
Gleb Smirnoff
4c9f0f982f In mb_unmapped_compress() we don't need mbuf structure to keep data,
but we need buffer of MLEN bytes.  This isn't just a simplification,
but important fixup, because previous commit shrinked sizeof(struct
mbuf) down below MSIZE, and instantiating an mbuf on stack no longer
provides enough data.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 22:44:23 +00:00
Gleb Smirnoff
0c1032665c Continuation of multi page mbuf redesign from r359919.
The following series of patches addresses three things:

Now that array of pages is embedded into mbuf, we no longer need
separate structure to pass around, so struct mbuf_ext_pgs is an
artifact of the first implementation. And struct mbuf_ext_pgs_data
is a crutch to accomodate the main idea r359919 with minimal churn.

Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP.

The namespace for the newfeature is somewhat inconsistent and
sometimes has a lengthy prefixes. In these patches we will
gradually bring the namespace to "m_epg" prefix for all mbuf
fields and most functions.

Step 1 of 4:

 o Anonymize mbuf_ext_pgs_data, embed in m_ext
 o Embed mbuf_ext_pgs
 o Start documenting all this entanglement

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 22:39:26 +00:00
Conrad Meyer
698dda6777 kern_exec.c: Produce valid code ifndef SYS_PROTO_H
Reported by: Coccinelle
2020-05-02 18:54:25 +00:00
Ed Maste
dd175b11d5 correct procctl(PROC_PROTMAX_STATUS _NOFORCE return
Previously procctl(PROC_PROTMAX_STATUS, ... used the PROC_ASLR_NOFORCE
macro for the "system-wide configured policy" status, instead of
PROC_PROTMAX_NOFORCE.

They both have a value of 3, so no functional change.

Sponsored by:	The FreeBSD Foundation
2020-05-01 14:30:59 +00:00
Michal Meloun
5ab94f48a2 Don't try to set frequendcy for enumerated but newer started CPUs.
Openfirmare enumerates and installs the driver for all processors,
regardless of whether they will be started later (because of power
constrains for example).

MFC after: 3 weeks
2020-04-29 14:14:15 +00:00
Mark Johnston
b7eae75807 Make sendfile(SF_SYNC)'s CV wait interruptible.
Otherwise, since the CV is not signalled until data is drained from the
socket, it is trivial to create an unkillable process using
sendfile(SF_SYNC) and a process-private PF_LOCAL socket pair.  In
particular, the cv_wait() in sendfile() does not get interrupted until
data is drained from the receiving socket buffer.

Reported by:	pho
Discussed with:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-04-28 15:02:44 +00:00
Rick Macklem
0306689367 Fix sosend_generic() so that it can handle a list of ext_pgs mbufs.
Without this patch, sosend_generic() will try to use top->m_pkthdr.len,
assuming that the first mbuf has a pkthdr.
When a list of ext_pgs mbufs is passed in, the first mbuf is not a
pkthdr and cannot be post-r359919.  As such, the value of top->m_pkthdr.len
is bogus (0 for my testing).
This patch fixes sosend_generic() to handle this case, calculating the
total length via m_length() for this case.

There is currently nothing that hands a list of ext_pgs mbufs to
sosend_generic(), but the nfs-over-tls kernel RPC code in
projects/nfs-over-tls will do that and was used to test this patch.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24568
2020-04-27 23:55:09 +00:00
John Baldwin
f1f9347546 Initial support for kernel offload of TLS receive.
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
  authentication algorithms and keys as well as the initial sequence
  number.

- When reading from a socket using KTLS receive, applications must use
  recvmsg().  Each successful call to recvmsg() will return a single
  TLS record.  A new TCP control message, TLS_GET_RECORD, will contain
  the TLS record header of the decrypted record.  The regular message
  buffer passed to recvmsg() will receive the decrypted payload.  This
  is similar to the interface used by Linux's KTLS RX except that
  Linux does not return the full TLS header in the control message.

- Add plumbing to the TOE KTLS interface to request either transmit
  or receive KTLS sessions.

- When a socket is using receive KTLS, redirect reads from
  soreceive_stream() into soreceive_generic().

- Note that this interface is currently only defined for TLS 1.1 and
  1.2, though I believe we will be able to reuse the same interface
  and structures for 1.3.
2020-04-27 23:17:19 +00:00
John Baldwin
ec1db6e13d Add the initial sequence number to the TLS enable socket option.
This will be needed for KTLS RX.

Reviewed by:	gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24451
2020-04-27 22:31:42 +00:00
John Baldwin
61bbe53c2d Improve MACHINE_ARCH handling for hard vs soft-float on RISC-V.
For userland, MACHINE_ARCH reflects the current ABI via preprocessor
directives.  For the kernel, the hw.machine_arch sysctl uses the ELF
header flags of the current process to select the correct MACHINE_ARCH
value.

Reviewed by:	imp, kp
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24543
2020-04-27 17:55:40 +00:00
John Baldwin
3da4d19be4 Extend support in sysctls for supporting multiple native ABIs.
This extends some of the changes in place to support reporting support
for 32-bit ABIs to permit reporting hard-float vs soft-float ABIs.

Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24542
2020-04-27 17:53:38 +00:00
Mark Johnston
569eb766c5 Fix handling of EV_EOF for named pipes.
Contrary to the kevent man page, EV_EOF on a fifo is not cleared by
EV_CLEAR.  Modify the read and write filters to clear EV_EOF when the
fifo's PIPE_EOF flag is clear, and update the man page to document the
new behaviour.

Modify the write filter to return the amount of buffer space available
even if no readers are present.  This matches the behaviour for sockets.

When reading from a pipe, only call pipeselwakeup() if some data was
actually read.  This prevents the continuous re-triggering of a
EVFILT_READ event on EOF when in edge-triggered mode.

PR:		203366, 224615
Submitted by:	Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24528
2020-04-27 15:59:19 +00:00
Mark Johnston
9b22722423 Call pipeselwakeup() after toggling PIPE_EOF.
This ensures that pipe_poll() and the pipe kqueue filters observe
PIPE_EOF and set EV_EOF accordingly.  As a result an extra call to
knote() after setting PIPE_EOF is unnecessary.

Submitted by:	Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24528
2020-04-27 15:59:07 +00:00
Mark Johnston
9ab4355732 Avoid returning POLLIN if the pipe descriptor is not open for reading.
Submitted by:	Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24528
2020-04-27 15:58:55 +00:00
Mateusz Guzik
21d3be9105 pwd: unbreak repeated calls to set_rootvnode
Prior to the change the once set pointer would never be updated.

Unbreaks reboot -r.

Reported by:	Ross Gohlke
2020-04-27 13:54:00 +00:00
Mark Johnston
4ee964d6b6 Fix up i386 thread structure layout assertions after r360354.
Reported by:	Jenkins
2020-04-26 22:04:43 +00:00
Mark Johnston
f13fa9df05 Use a single VM object for kernel stacks.
Previously we allocated a separate VM object for each kernel stack.
However, fully constructed kernel stacks are cached by UMA, so there is
no harm in using a single global object for all stacks.  This reduces
memory consumption and makes it easier to define a memory allocation
policy for kernel stack pages, with the aim of reducing physical memory
fragmentation.

Add a global kstack_object, and use the stack KVA address to index into
the object like we do with kernel_object.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24473
2020-04-26 20:08:57 +00:00
Eric van Gyzen
ba0ced82ea Fix handling of NMIs from unknown sources (BMC, hypervisor)
Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure.  This includes NMIs from
IPMI BMCs and hypervisors.

Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.

Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi.  Expand the meaning of panic_on_nmi by making
it a bitfield.  There are currently two bits: one for NMIs due to
hardware failure, and one for all others.  Leave room for more.

If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes: machdep.kdb_on_nmi is gone; machdep.panic_on_nmi changed
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D24558
2020-04-26 00:41:29 +00:00
Alexander V. Chernikov
454d389645 Fix LINT build #2 after r360292.
Pointyhat to: melifaro
2020-04-25 11:35:38 +00:00
Alexander V. Chernikov
983066f05b Convert route caching to nexthop caching.
This change is build on top of nexthop objects introduced in r359823.

Nexthops are separate datastructures, containing all necessary information
 to perform packet forwarding such as gateway interface and mtu. Nexthops
 are shared among the routes, providing more pre-computed cache-efficient
 data while requiring less memory. Splitting the LPM code and the attached
 data solves multiple long-standing problems in the routing layer,
 drastically reduces the coupling with outher parts of the stack and allows
 to transparently introduce faster lookup algorithms.

Route caching was (re)introduced to minimise (slow) routing lookups, allowing
 for notably better performance for large TCP senders. Caching works by
 acquiring rtentry reference, which is protected by per-rtentry mutex.
 If the routing table is changed (checked by comparing the rtable generation id)
 or link goes down, cache record gets withdrawn.

Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nextop as a
 cached object, which is mostly mechanical. Other moving parts like cache
 cleanup on rtable change remains the same.

Differential Revision:	https://reviews.freebsd.org/D24340
2020-04-25 09:06:11 +00:00
Kyle Evans
2c9c433e17 sysent: re-roll after 360236 (AUE_CLOSERANGE used) 2020-04-24 01:30:33 +00:00
Kyle Evans
3e6b82913d close_range(2): use newly assigned AUE_CLOSERANGE 2020-04-24 01:30:00 +00:00
Mark Johnston
304dcfb0d8 Handle PCATCH in blockcount_sleep() so it can be interrupted.
blockcount_wait() still unconditionally waits for the count to reach
zero before returning.

Tested by:	pho (a larger patch)
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24513
2020-04-21 17:13:06 +00:00
Kyle Evans
59dafcde62 kqueue: fix conversion of timer data to sbintime
This unbreaks the i386 kqueue timer tests after a recent change switched
NOTE_ABSTIME over to using microseconds. Notably, the data argument (which
holds useconds) is an int64_t, but we were passing it to timer2sbintime
which takes an intptr_t. Perhaps in a previous incarnation, intptr_t would
have made sense, but now it just leads to the timestamp getting truncated
and subsequently rejected when it no longer fits in an intptr_t.

PR:		245768
Reported by:	lwhsu / CI
MFC after:	1 week
2020-04-21 03:57:30 +00:00
Mitchell Horne
49439183ce Convert arm's physmem interface to MI code
The arm_physmem interface found in arm's MD code provides a convenient
set of routines for adding/excluding physical memory regions and
initializing important kernel globals such as Maxmem, realmem,
phys_avail[], and dump_avail[]. It is especially convenient for FDT
systems, since we can use FDT parsing functions and pass the result
directly to one of these physmem routines. This interface is already in
use on arm and arm64, and can be used to simplify this early
initialization on RISC-V as well.

This requires only a couple trivial changes:
  - Move arm_physmem_kernel_addr to arm/machdep.c. It is unused on arm64,
    and manipulated entirely in arm MD code.
  - Convert arm32_btop/arm64_btop to atop. This is equivalently defined
    on all architectures.
  - Drop the "arm" prefix.

Reviewed by:	manu, emaste ("looks reasonable")
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24153
2020-04-19 00:12:30 +00:00
Kristof Provost
3f8bc99c4c ethersubr: Make the mac address generation more robust
If we create two (vnet) jails and create a bridge interface in each we end up
with the same mac address on both bridge interfaces.
These very often conflicts, resulting in same mac address in both jails.

Mitigate this problem by including the jail name in the mac address.

Reviewed by:	kevans, melifaro
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24383
2020-04-18 07:50:30 +00:00
Konstantin Belousov
acf65eef23 sendfile: When all io finished, assert that sfio->pa[] is in expected state.
It must contain fully restored contigous run of the wired pages from
the object, except possible trimmed tail.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2020-04-18 03:09:25 +00:00
Konstantin Belousov
e795a04083 The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa.
It is true for zfs, but it is not for e.g. vnode or buffer pagers.
When fixing bogus pages, fix them in both places.  Rely on the fact
that pa[0] must have been invalid so it cannot be bogus.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2020-04-18 03:07:18 +00:00
Kyle Evans
23d5326823 tty: convert tty_lock_assert to tty_assert_locked to hide lock type
A later change, currently being iterated on in D24459, will in-fact change
the lock type to an sx so that TTY drivers can sleep on it if they need to.
Committing this ahead of time to make the review in question a little more
palatable.

tty_lock_assert() is unfortunately still needed for now in two places to
make sure that the tty lock has not been recursed upon, for those scenarios
where it's supplied by the TTY driver and possibly a mutex that is allowed
to recurse.

Suggested by:	markj
2020-04-17 18:34:49 +00:00
Brooks Davis
b24e6ac8b7 Convert canary, execpathp, and pagesizes to pointers.
Use AUXARGS_ENTRY_PTR to export these pointers.  This is a followup to
r359987 and r359988.

Reviewed by:	jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24446
2020-04-16 21:53:17 +00:00
Brooks Davis
19112958cf style(9): end continued line with operator. 2020-04-16 17:24:13 +00:00
Kyle Evans
c318828929 Preload hostuuid for early-boot use
prison0's hostuuid will get set by the hostid rc script, either after
generating it and saving it to /etc/hostid or by simply reading /etc/hostid.

Some things (e.g. arbitrary MAC address generation) may use the hostuuid as
a factor in early boot, so providing a way to read /etc/hostid (if it's
available) and using it before userland starts up is desirable. The code is
written such that the preload doesn't *have* to be /etc/hostid, thus not
assuming that there will be newline at the end of the buffer or even the
exact shape of the newline. White trailing whitespace/non-printables
trimmed, the result will be validated as a valid uuid before it's used for
early boot purposes.

The preload can be turned off with hostuuid_load="NO" in /boot/loader.conf,
just as other preloads; it's worth noting that this is a 37-byte file, the
overhead is believed to be generally minimal.

It doesn't seem necessary at this time to be concerned with kern.hostid.

One does wonder if we should consider validating hostuuids coming in
via jail_set(2); some bits seem to care about uuid form and we bother
validating format of smbios-provided uuid and in-fact whatever uuid comes
from /etc/hostid.

Reviewed by:	karels, delphij, jamie
MFC after:	1 week (don't preload by default, probably)
Differential Revision:	https://reviews.freebsd.org/D24288
2020-04-16 00:54:06 +00:00
Brooks Davis
9df1c38bbc Export argc, argv, envc, envv, and ps_strings in auxargs.
This simplifies discovery of these values, potentially with reducing the
number of syscalls we need to make at runtime.  Longer term, we wish to
convert the startup process to pass an auxargs pointer to _start() and
use that rather than walking off the end of envv.  This is cleaner,
more C-friendly, and for systems with strong bounds (e.g. CHERI)
necessary.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24407
2020-04-15 20:23:55 +00:00
Brooks Davis
397df744f9 Make ps_strings in struct image_params into a pointer.
This is a prepratory commit for D24407.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
2020-04-15 20:21:30 +00:00
Kyle Evans
3fb92d4cb1 validate_uuid: absorb the rest of parse_uuid with a flags arg
This makes the naming annoyance (validate_uuid vs. parse_uuid) less of an
issue and centralizes all of the functionality into the new KPI while still
making the extra validation optional. The end-result is all the same as far
as hostuuid validation-only goes.
2020-04-15 18:39:12 +00:00
Pawel Biernacki
f65eac0fc0 sysctl_handle_string: Put logical or in parentheses.
Reported by:	rdivacky
Approved by:	kib (mentor)
Pointy-hat to:	kaktus
2020-04-15 16:55:38 +00:00
Pawel Biernacki
1627b1fd9d sysctl(9): fix handling string tunables.
r357614 changed internals of handling string sysctls, and inadvertently
broke setting string tunables.  Take them into account.

PR:		245463
Reported by:	jhb, np
Reviewed by:	imp, jhb, kib
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D24429
2020-04-15 16:33:55 +00:00