Rather than repeatedly nesting loops, separate concerns with a single loop
per call stack level. Use a table to drive the recursive routine. Handle
missing topology layers more gracefully (infer a single unit).
Analyze some additional optional layers which may be present on e.g. AMD Zen
systems (groups, aka dies, per package; and cachegroups, aka CCXes, per
group).
Display that additional information in the boot-time topology information,
when it is relevent (non-one).
Reviewed by: markj@, mjoras@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12019
Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.
Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.
Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.
The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered a bug fixes on its own.
Reviewed by: jhb (previous version)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12023
This mode allows other clean buffers to arrive while we flush the buf
lists for the vnode, which is fine for the targeted use. We only need
that all buffers existed at the time of the function start were
flushed. In fact, only one assert has to be relaxed.
In collaboration with: pho
Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12083
There was already a per-vty defaults field, but it was useless since it was
only initialized when propagating the global settings and thus no different
from the current global settings and not per-vty. The global defaults field
was also invariant after boot time, but not quite so useless.
Fix this by adding a second selection bit the the control flags of the
relevant ioctl(). vidcontrol doesn't support this yet. Setting either
default propagates the change to the current setting for the same level
and then to all lower levels.
Improve the 3-way escape sequence used by termcap to control the cursor.
The "normal" (ve) case has always used reset, so the user could set
it to anything, but since the reset is to a global value this is not
very useful, especially since the "very visible" (vs) case doesn't
reset but inconsistently forces to a blinking block. Change vs to
first reset and then XOR the blinking bit so that it is predictably
different from ve.
attribute field is curs_attr. The base field holds user data translated
in a reversible way and is needed because current field holds this in
an irreversible way for efficiency.
Factor out some common code for the reversible translation. This is
slightly simpler now, and much easier to expand.
Translate the magic flags value -1 to a single control flag internally
up front so other flags can be trusted later. This can be used for the
relevant ioctl() too.
Remove CONS_CURSOR_FLAGS which contained all the control flags. It was
unused and not useful. After adding more flags, there will be tests on
a couple at a time but never on them all. This API should have used this
to disallow unknown flags.
Drop the EARLY_AP_STARTUP gtaskqueue code, as gtaskqueues are now
initialized before APs are started.
Reviewed by: hselasky@, jhb@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12054
This helps simplify the code in kern_shutdown.c and reduces the number
of globally visible functions.
No functional change intended.
Reviewed by: cem, def
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11603
dump_start() and dump_finish() are responsible for writing kernel dump
headers, optionally writing the key when encryption is enabled, and
initializing the initial offset into the dump device.
Also remove the unused dump_pad(), and make some functions static now that
they're only called from kern_shutdown.c.
No functional change intended.
Reviewed by: cem, def
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11584
during drain operations. When an sbuf is configured to use this feature by way
of the SBUF_DRAINTOEOR sbuf_new() flag, top-level sections started with
sbuf_start_section() create a record boundary marker that is used to avoid
flushing partial records.
Reviewed by: cem,imp,wblock
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D8536
This fixes a regression accidentally introduced in r322588, due to an
interaction with EARLY_AP_STARTUP.
Reviewed by: bdrewery@, jhb@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12053
but it was actually extended then and it is still used (just once) in
/usr/src by its primary user (vidcontrol), while its replacement is
still not used in /usr/src.
yokota became inactive soon after deprecating CONS_CURSORTYPE (this
was part of a large change to make cursor attributes per-vty).
vidcontrol has incomplete support even for the old ioctl. I will
update it soon. Then there are many broken escape sequences to fix.
This is just to prepare for setting cursor colors using vidcontrol.
it automatically after it runs.
The config_intrhook mechanism allows a driver to stall the boot process
until device(s) required for booting are available, by not allowing system
inits to proceed until all intrhook functions have been unregistered.
Virtually all existing code simply unregisters from within the hook function
when it gets called.
This new function makes that common usage more convenient. Instead of
allocating and filling in a struct, passing it to a function that might (in
theory) fail, and checking the return code, now a driver can simply call
this cannot-fail routine, passing just the intrhook function and its arg.
Differential Revision: https://reviews.freebsd.org/D11963
another parameter that identifies a starting point in the memory address
block. Radix is a power of two, blk is a multiple of radix, and the
starting point is in the range [blk, blk+radix), so that blk can always be
computed from the other two. This change drops the blk parameter from the
meta functions and computes it instead. (On amd64, for example, this
change reduces subr_blist.o's text size by 7%.)
It also makes the radix parameters unsigned to address concerns that the
calculation of '-radix' might overflow without the -fwrapv option. (See
https://reviews.freebsd.org/D11819.)
Submitted by: Doug Moore <dougm@rice.edu>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11964
The encrypted kernel dump code writes data in blocks of this size. A buffer
size of 4096 allows encrypted dumps to work with 4Kn drives.
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11870
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)
This is required to support new GCC 7.1 compiler.
This is compatible with current GCC 6.1 compiler.
RISC-V is extensible ISA and the idea here is to have built-in define
per each extension, so together with __riscv we will have some subset
of these as well (depending on -march string passed to compiler):
__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen
Reviewed by: ngie
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11901
They are defined by XSI or newer SUS.
This is a follow-up to r318780.
Reported by: jbeich
Obtained from: DragonflyBSD commit e08b3836c962
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- Store the symbol table contents in an anonymous swap-backed object. Have
mmap(/dev/ksyms) map that object, and stop mapping the symbol table into
the calling process in ksyms_open(). Previously we would cache a pointer
to the pmap of the opening process, and mmap(/dev/ksyms) would create a
mapping using the physical address found by a pmap lookup at the initial
mapping address. However, this assumes that the cached pmap is valid,
which may not be the case. [1]
- Remove the ksyms ioctl interface. It appears to have been added to work
around a limitation in libelf that no longer exists; see r321842.
Moreover, the interface is difficult to support and isn't present in
illumos. Since ksyms was added specifically to support lockstat(1), it
is expected that this removal won't have any real impact.
- Simplify ksyms_read() to avoid unnecessary copying.
- Don't call the device handle destructor if we fail to capture a snapshot
of the kernel's symbol table. devfs will do that for us.
Reported by: Ilja van Sprundel <ivansprundel@ioactive.com> [1]
Reviewed by: kib (previous revision)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11789
Since traditional types for the macros values are int, remove the
cookie trick and just split the dev_t at the word boundary.
Reported by: Victor Stinner <victor.stinner@gmail.com>
PR: 221048
Sponsored by: The FreeBSD Foundation
'skip', which denote, respectively, the largest number of blocks that can be
managed by a subtree of that height, and one less than the number of nodes
in a subtree of that height. This change removes the 'skip' argument from
those functions because 'skip' can be trivially computed from 'radius'.
This change also redefines 'skip' so that it denotes the number of nodes in
the subtree, and so changes loop upper bound tests from '<= skip' to '<
skip' to account for the change.
The 'skip' field is also removed from the blist struct.
The self-test program is changed so that the print command includes the
cursor value in the output.
Submitted by: Doug Moore <dougm@rice.edu>
MFC after: 1 week
Linux specific things to the native fdescfs file system.
Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.
Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.
For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.
For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:
mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd
Reviewed by: kib@
MFC after: 1 week
Relnotes: yes
request that their clock_settime() methods be called at a given offset
from top-of-second. This adds a timeout_task to the rtc_instance so that
each clock can be separately added to taskqueue_thread with the scheduling
it prefers, instead of looping through all the clocks at once with a
single task on taskqueue_thread. If a driver doesn't call clock_schedule()
the default is the old behavior: clock_settime() is queued immediately.
The motivation behind this is that I was on the path of adding identical
code to a third RTC driver to figure out a delta to top-of-second and
sleep for that amount of time because writing the the RTC registers resets
the hardware's concept of top-of-second. (Sometimes it's not top-of-second,
some RTC clocks tick over a half second after you set their time registers.)
Worst-case would be to sleep for almost a full second, which is a rude thing
to do on a shared task queue thread.
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
When an NFS mount is hung against an unresponsive NFS server, the "umount -f"
option can be used to dismount the mount. Unfortunately, "umount -f" gets
hung as well if a "umount" without "-f" has already been done. Usually,
this is because of a vnode lock being held by the "umount" for the mounted-on
vnode.
This patch adds kernel code so that a new "-N" option can be added to "umount",
allowing it to avoid getting hung for this case.
It adds two flags. One indicates that a forced dismount is about to happen
and the other is used, along with setting mnt_data == NULL, to handshake
with the nfs_unmount() VFS call.
It includes a slight change to the interface used between the client and
common NFS modules, so I bumped __FreeBSD_version to ensure both modules are
rebuilt.
Tested by: pho
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11735
Use them in some existing code that is vulnerable to roundoff errors.
The existing constant SBT_1NS is a honeypot, luring unsuspecting folks into
writing code such as long_timeout_ns*SBT_1NS to generate the argument for a
sleep call. The actual value of 1ns in sbt units is ~4.3, leading to a
large roundoff error giving a shorter sleep than expected when multiplying
by the trucated value of 4 in SBT_1NS. (The evil honeypot aspect becomes
clear after you waste a whole day figuring out why your sleeps return early.)
The PCTRIE macro will be shortly applied in a situation where
LOOKUP_LE is not needed.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-Differential revision: https://reviews.freebsd.org/D11435
on clock drivers.
This tracks multiple concurrent realtime clock drivers in a list sorted by
clock resolution. When system time changes (and periodically) the
clock_settime() methods of all registered clocks are invoked.
To initialize system time, each driver is tried in turn from best to worst
resolution, until one succesfully returns a valid time.
The code no longer holds a mutex while calling the clock_settime() and
clock_gettime() methods of the registered clocks. This allows clock drivers
to do whatever kind of locking or sleeping is necessary (this is especially
important for i2c clock chips since i2c drivers often need to sleep).
A new clock_register_flags() function allows the clock driver to pass
flags. The flags currently defined help support drivers that use their own
techniques to avoid roundoff errors (prevents the 4/5 rounding done by the
subr_rtc code). A driver which may need to wait for resources (such as bus
ownership) may pass a flag to indicate that it will obtain system time for
itself after waiting for resources; this is merely an optimization to avoid
the common code retrieving a timespec that will never get used.
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D11484
The iteration index is unsigned, so testing for larger than or equal
to zero makes little sense.
Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 3 days
have available to use in the future.
- Add kmem_access flag as a placeholder (reserve it), not used yet.
Differential Revision: D11451
Reviewed by: jamie
Sponsored by: Hackathon Essen 2017
The benefit of BIT_FLS() is that ffsl() can be implemented with a
count leading zeros instruction which is more widespread available.
Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 1 week
--Remove special-case handling of sparc64 bus_dmamap* functions.
Replace with a more generic mechanism that allows MD busdma
implementations to generate inline mapping functions by
defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This
is currently useful for sparc64, x86, and arm64, which all
implement non-load dmamap operations as simple wrappers
around map objects which may be bus- or device-specific.
--Remove NULL-checked bus_dmamap macros. Implement the
equivalent NULL checks in the inlined x86 implementation.
For non-x86 platforms, these checks are a minor pessimization
as those platforms do not currently allow NULL maps. NULL
maps were originally allowed on arm64, which appears to have
been the motivation behind adding arm[64]-specific barriers
to bus_dma.h, but that support was removed in r299463.
--Simplify the internal interface used by the bus_dmamap_load*
variants and move it to bus_dma_internal.h
--Fix some drivers that directly include sys/bus_dma.h
despite the recommendations of bus_dma(9)
Reviewed by: kib (previous revision), marius
Differential Revision: https://reviews.freebsd.org/D10729
The acq barrier in refcount_acquire() has no use, constructor must
ensure that the changes are visible before publication by other means.
Last release must sync/with the constructor and all updaters.
This is based on the refcount/shared_ptr analysis I heard at the Hans
Boehm and Herb Sutter talks about C++ atomics.
Reviewed by: alc, jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D11270
Process core notes for a 32-bit process running on a 64-bit host need to
use 32-bit structures so that the note layout matches the layout of notes
of a core dump of a 32-bit process under a 32-bit kernel.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D11407
AKA Make time_t 64 bits on powerpc(32).
PowerPC currently (until now) was one of two architectures with a 32-bit time_t
on 32-bit archs (the other being i386). This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.
Tested by: andreast, others
MFC after: Never
Relnotes: Yes
It distinguishes between data flow sockets and listening sockets, and
in case of the latter doesn't change resource limits, since listening
sockets don't hold any buffers, they only carry values to be inherited
by their children.
Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of
the allocated address space, but does not allow instantiation of the
pages in the range. It is useful for more explicit support for usual
two-stage reserve then commit allocators, since it prevents accidental
instantiation of the mapping, e.g. by mprotect(2).
Use guards to reimplement stack grow code. Explicitely track stack
grow area with the guard, including the stack guard page. On stack
grow, trivial shift of the guard map entry and stack map entry limits
makes the stack expansion. Move the code to detect stack grow and
call vm_map_growstack(), from vm_fault() into vm_map_lookup().
As result, it is impossible to get random mapping to occur in the
stack grow area, or to overlap the stack guard page.
Enable stack guard page by default.
Reviewed by: alc, markj
Man page update reviewed by: alc, bjk, emaste, markj, pho
Tested by: pho, Qualys
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11306 (man pages)
r320070 removed the definition of maxbcachebuf from sys/param.h to
fix the build for arm.
This patch adds the definition of maxbcachebuf to sys/buf.h, which
should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S.
Suggested by: kib
MFC after: 2 weeks
The code still doesn't use d_off. That will come in a future commit.
The code also removes the checks for servers returning a fileno that
doesn't fit in 32bits, since that should work ok now.
Bump __FreeBSD_version since this patch changes the interface between
the NFS kernel modules.
Reviewed by: kib
that disk writes are more likely to be sequential. This change is
beneficial on both the solid state and mechanical disks that I've
tested. (A similar change in allocation policy was made by DragonFly
BSD in 2013 to speed up Poudriere with "stressful memory parameters".)
Increase the width of blst_meta_alloc()'s parameter "skip" and the local
variables whose values are derived from it to 64 bits. (This matches the
width of the field "skip" that is stored in the structure "blist" and
passed to blst_meta_alloc().)
Eliminate a pointless check for a NULL blist_t.
Simplify blst_meta_alloc()'s handling of the ALL-FREE case.
Address nearby style errors.
Reviewed by: kib, markj
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D11247