The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.
By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap. The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.
There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.
copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline. If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.
The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done. The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging. I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.
Tested by: pho
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D14633
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
Normally, shutdown_nice() just signals init. However, sometimes it
calls kern_reboot directly. For that case, r331298 dropped the Giant
lock before calling it. This turns out to be incorrect for the more
common case where init exists and we just signal it. Restore the old
behavior. The direct call to kern_reboot() doesn't sync buffers to the
disk, so should work with Giant held, so we don't need to drop locks
here for that.
Noticed by: bde@
Sponsored by: Netflix
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
- map the hard-coded frame buffer address above KERNBASE. Using the
physical address only worked because of larger mapping bugs.
The hard-coded frame buffer address only works on x86. Use messy ifdefs
to try to avoid warnings about unused code for other arches.
- remove the sysctl for reading and writing the table kernel console
attributes. Writing only worked for emergency output since normal
output uses unalterd copies.
- fix the test for the emergency console being usable
- explain why a hard-coded attribute is used very early. Emergency output
works on x86 even before the pcpu pointer is initialized.
Advertise this by changing the defaults to mostly red. If you don't like
this, change them (almost) back using:
vidcontrol -c charcolors,base=7,height=0
vidcontrol -c mousecolors,base=0[,height=15]
The (graphics mode only) mouse cursor colors were hard-coded to a black
border and lightwhite interior. Black for the border is the worst
possible default, since it is the same as the default black background
and not good for any dark background. Reversing this gives the better
default of X Windows. Coloring everything works better still. Now
the coloring defaults to a lightwhite border and red interior.
Coloring for the character cursor is more complicated and mode
dependent. The new coloring doesn't apply for hardware cursors. For
non-block cursors, it only applies in graphics mode. In text mode,
the cursor color was usually a hard-coded (dull)white for the background
only, unless the foreground was white when it was a hard-coded black
for the background only, unless the foreground was white and the
background was black it was reverse video. In graphics mode, it was
always reverse video for the block cursor. Reverse video is worse,
especially over cutmarking regions, since cutmarking still uses simple
reverse video (nothing better is possible in text mode) and double
reverse video for the cursor gives normal video. Now, graphics mode
uses the same algorithm as the best case for text mode in all cases
for graphics mode. The hard-coded sequence { white, black, } for the
background is now { red, white, blue, } where the first 2 colors can
be configured. The blue color at the end is a sentinel which prevents
reverse video being used in most cases but breaks the compatibility
setting for white on black and black on white characters. This will
be fixed later. The compatibility setting is most needed for mono modes.
The previous commit to syscons.c changed sc_cnterm() to be more careful.
It followed null pointers in some cases. But sc_cnterm() has been
unreachable for 15+ years since changes for multiple consoles turned
off calls to the the cnterm destructor for all console drivers. Before
them, it was only called at boot time. So no driver with an attached
console has ever been unloadable and not even the non-console destructors
have been tested much.
terminal state for kernel console output.
r56043 in 2000 added many complications to support dynamic selection
of the terminal emulator using modules and the ioctl CONS_SETTERM.
This was never completed. There are still no modules, but it is easy
to restore the scterm and dumb emulators at compile time. Then
boot-time configuration for the preferred one doesn't work right, but
CONS_SETTERM almost works after fixing this bug. CONS_SETTERM only
switches the emulator for the user state, leaving the kernel state(s)
still using the boot-time emulator. The fix is especially important
when switching from sc to scteken, since the scteken state has pointers
in it.
Rename kernel_console_ts to sc_kts.
There was already a per-vty defaults field, but it was useless since it was
only initialized when propagating the global settings and thus no different
from the current global settings and not per-vty. The global defaults field
was also invariant after boot time, but not quite so useless.
Fix this by adding a second selection bit the the control flags of the
relevant ioctl(). vidcontrol doesn't support this yet. Setting either
default propagates the change to the current setting for the same level
and then to all lower levels.
Improve the 3-way escape sequence used by termcap to control the cursor.
The "normal" (ve) case has always used reset, so the user could set
it to anything, but since the reset is to a global value this is not
very useful, especially since the "very visible" (vs) case doesn't
reset but inconsistently forces to a blinking block. Change vs to
first reset and then XOR the blinking bit so that it is predictably
different from ve.
attribute field is curs_attr. The base field holds user data translated
in a reversible way and is needed because current field holds this in
an irreversible way for efficiency.
Factor out some common code for the reversible translation. This is
slightly simpler now, and much easier to expand.
Translate the magic flags value -1 to a single control flag internally
up front so other flags can be trusted later. This can be used for the
relevant ioctl() too.
Remove CONS_CURSOR_FLAGS which contained all the control flags. It was
unused and not useful. After adding more flags, there will be tests on
a couple at a time but never on them all. This API should have used this
to disallow unknown flags.
redundant initializations.
Hard-code base = 0, height = (approx. 1/8 of the boot-time font height)
in all cases, and remove the BIOS/MD support for setting these values.
This asks for an underline cursor sized for the boot-time font instead
of various less hard-coded but worse values. I used that think that
the x86 BIOS always gave the same values as the above hard-coding, but
on 1 of my systems it gives the wrong value of base = 1.
The remaining BIOS fields are shift_state and bell_pitch. These are now
consistently not explicitly reinitialized to 0. All sc_get_bios_value()
functions except x86's are now empty, and the only useful thing that x86
returns is shift_state. This really belongs in atkbdc, but heavier
use of the BIOS to read the more useful typematic rate has been removed
there. fb still makes much heavier use of the BIOS.
but it was actually extended then and it is still used (just once) in
/usr/src by its primary user (vidcontrol), while its replacement is
still not used in /usr/src.
yokota became inactive soon after deprecating CONS_CURSORTYPE (this
was part of a large change to make cursor attributes per-vty).
vidcontrol has incomplete support even for the old ioctl. I will
update it soon. Then there are many broken escape sequences to fix.
This is just to prepare for setting cursor colors using vidcontrol.
position. Especially the screen size, and potentially everything except
the input state and attributes. Do this by changing the cursor position
setting method to a general syncing method.
Use proper constructors instead of copying to create kernel terminal
contexts. We really want clones and not new instances, but there is
no method for cloning and there is nothing in the active instance that
needs to be cloned exactly.
Add proper destructors for kernel terminal contexts. I doubt that the
destructor code has every been reached, but if it was then it leaked the
memory of the clones.
Remove freeing of statically allocated memory for the non-kernel terminal
context for the same terminal as the kernel. This is in the nearly
unreachable code. This used to not happen because delicate context
swapping made the user context use the dynamic memory and kernel
context the static memory. I didn't restore this swapping since it
would have been unnatural to have all kernel contexts except 1 dynamic.
The constructor for terminal context has bad layering for reasons
related to the bug. It has to return static memory early before
malloc() works. Callers also can't allocate memory until after the
first constructor selects an emulator and tells upper layers the size
of its context. After that, the cloning hack required the cloning
code to allocate the memory, but for all other constructors it would
be better for the terminal layer to allocate and deallocate the
memory in all cases.
Zero the memory when allocating terminal contexts dynamically.
it to a separate state for each CPU.
Terminal "input" is user or kernel output. Its state includes the current
parser state for escape sequences and multi-byte characters, and some
results of previous parsing (mainly attributes), and in teken the cursor
position, but not completed output. This state must be switched for kernel
output since the kernel can preempt anything, including itself, and this
must not affect the preempted state more than necessary. Since vty0 is
shared, it is necessary to affect the frame buffer and cursor position and
history, but escape sequences must not be affected and attributes for
further output must not be affected.
This used to work. The syscons terminal state contained mainly the parser
state for escape sequences and attributes, but not the cursor position,
and was switched. This was first broken by SMP and/or preemptive kernels.
Then there should really be a separate state for each thread, and one more
for ddb, or locking to prevent preemption. Serialization of printf() helps.
But it is arcane that full syscons escape sequences mostly work in kernel
printf(), and I have never seen them used except by me to test this fix.
They worked perfectly except for the races, since "input" from the kernel
was not special in any way.
This was broken to use teken. The general switch was removed, and the
kernel normal attribute was switched specially. The kernel reverse
attribute (config option SC_CONS_REVERSE_ATTR) became unused, and is
still unusable because teken doesn't support default reverse attributes
(it used to only be used via the ANSI escape sequence to set reverse
video).
The only new difficulty for using teken seems to be that the cursor
position is in the "input" state, so it must be updated in the active
input state for each half of the switch. Do this to complete the
restoration.
The per-CPU state is mainly to make per-CPU coloring work cleanly, at
a cost of some space. Each CPU gets its own full set of attribute
(not just the current attribute) maintained in the usual way. This
also reduces races from unserialized printf()s. However, this gives
races for serialized printf()s that otherwise have none. Nothing
prevents the CPU doing the a printf() changing in the middle of an
escape sequence.
Fix this by using more dynamic initialization with simpler ifdefs for
the machine dependencies. Find a frame buffer address in a more
portable way that at least compiles on sparc64.
some cases of initialization and resetting of the teken cursor position.
(This bad name is consistent with others, but it is too easy to confuse
with scteken_cursor() which goes in the opposite direction.)
The following cases were broken:
- for booting without a syscons console, the teken and sc positions for
ttyv0 were (0, 0), but are supposed to be somewhere in the middle of
the screen (after carefully preserved BIOS and loader messages) (at
least if there is no mode switch that loses the messages).
- after mode switches, the screen is cleared and the cursor is supposed to
be moved to (0, 0), but it was only moved there for sc.
The following case was hacked to work:
- for booting with a syscons console, it was arranged that scteken_init()
for the console could see a nonzero cursor position and adjust, although
this broke the sc seeing it in the non-console case above.
This change just does cleanups missed in r56043 17 years ago. The
default attributes were still stored in structs for the purpose of
changing them and passing around pointers to the defaults, but r56043
added another layer that made the defaults invariant and only used for
initialization and reset. Just use the defaults directly. This was
already done for the kernel defaults. The defaults for reverse
attributes aren't actually used, but are ignored in layers that no
longer support them.
is unavailable on sparc64 only. This makes the new ec_putc() a non-op
on sparc64 but still calls it. On other non-x86 arches, it should
compile but might not work.
Reported by: gjb
it in emergency in sc_cnputc().
Locking fixes in sc_cnputc() previously turned off normal output in
near-deadlock conditions and added deferred output which might never
be completed. Emergency output goes to the frame buffer using
sufficiently atomic non-blocking writes if the console is in text
mode (in graphics mode, nothing is done, modulo races setting the
graphics mode bit). Screen updates overwrite the emergency output
if the emergency condition clears enough to reach them.
ec_putc() also works for "early" console output in normal x86 text
mode as soon as this mode is initialized (if ever). This uses a
hard-coded x86 frame buffer address before cninit() and a hopefully
MI address after cninit(). But non-x86 is more likely to not support
text mode, when ec_putc() will be null. ec_putc() has no dependencies
of syscons before cninit(), and only has them later to track syscons'
mode changes. This commit doesn't attach ec_putc() for early use.
To test emergency use, put a breakpoint in central syscons output code
like sc_puts() and do some user output. The system used to race or
deadlock in ddb output soon after entry to ddb. The locking fixes
deferred the output until after leaving ddb, so ddb was unusable and
you had to try typing c[ontinue] blindly until it exited, or better use
a serial console in parallel. Now the output goes to a window in the
middle 2/3 of the screen. Scrolling is circular and there is no cursor,
but otherwise ec_putc() provides full dumb terminal functionality and
very fast output that hides artificates from dumb overwrites.
by the CPU number.
This was originally for debugging near-deadlock conditions where
multiple CPUs either deadlock or scramble each other's output trying
to report the problem, but I found it interesting and sometimes
useful for ordinary kernel messages. Ordinary kernel messages
shouldn't be interleaved, but if they are then the colorization
makes them readable even if the interleaving is for every character
(provided the CPU printing each message doesn't change).
The default colors are 8-15 starting at 15 (bright white on black)
for CPU 0 and repeating every 8 CPUs. This works best with 8 CPUs.
Non-bright colors and nonzero background colors need special
configuration to avoid unreadable and ugly combinations so are not
configured by default. The next bright color after 15 is 8 (bright
black = dark gray) is not very readable but is the only other color
used with 2 CPUs. After that the next bright color is 9 (bright
blue) which is not much brighter than bright black, but is used with
3+ CPUs. Other bright colors are brighter.
Colorization is configured by default so that it gets tested. It can
only be turned off by configuring SC_KERNEL_CONS_ATTR to anything other
than FG_WHITE. After booting, all colors can be changed using the
syscons.kattr sysctl. This is a SYSCTL_OPAQUE, and no utility is
provided to change it (sysctl only displays it).
The default colors work in all VGA modes that I could test. In 2-color
graphics modes, all 8 bright colors are displayed as bright white, so
the colorization has no effect, but anything with a nonzero background
gives white on white unless the foreground is zero. I don't have an
mono or VGA grayscale hardware to test on. Support for mono mode seems
to have never worked right in syscons (I think bright white gives white
underline with either bold or bright), but VGA grayscale should work
better than 2-color graphics.
important detail that sc_cngetc() now opens and closes the keyboard
on every call again. This was moved from sc_cngetc() to scn_cngrab/
ungrab() in r228644, but the change wasn't quite complete. After
fixes for nesting in kbdd_poll() in ukbd and kbdmux, these opens
and closes should have no significant effect if done while grabbed.
They fix unusual cases when cngetc() is called while not grabbed.
This commit is the main fix for screen locking in sc_cnputc():
detect deadlock or likely-deadlock and handle it by buffering the
output atomically and printing it later if the deadlock condition
clears (and sc_cnputc() is called).
The most common deadlock is when the screen lock is held by ourself.
Then it would be safe to acquire the lock recursively if the console
driver is calling printf() in a safe context, but we don't know when
that is. It is not safe to ignore the lock even in kdb or panic mode.
But ignore it in panic mode. The only other known case of deadlock
is when another thread holds the lock but is running on a stopped CPU.
Detect that case approximately by using trylock and retrying for 1000
usec. On a 4 GHz CPU, 100 usec is almost long enough -- screen switches
take slightly longer than that. Not retrying at all is good enough
except for stress tests, and planned future versions will extend the
timeout so that the stress tests work better.
To see the behaviour when deadlock is detected, single step through
sctty_outwakeup() (or sc_puts() to start with deadlock). Another
(serial) console is needed to the buffered-only output, but the
keyboard works in this context to continue or step out of the
deadlocked region. The buffer is not large enough to hold all the
output for this.
Keyboard input needs Giant locking, and that is not possible to do
correctly here. Use mtx_trylock() and proceed unlocked as before if
we can't acquire Giant (non-recursively), except in kdb mode don't
even try to acquire Giant. Everything here is a hack, but it often
works. Even if mtx_trylock() succeeds, this might be a LOR.
Keyboard input also needs screen locking, to handle screen updates
and switches. Add this, using the same simplistic screen locking
as for sc_cnputc().
Giant must be acquired before the screen lock, and the screen lock
must be dropped when calling the keyboard driver (else it would get a
harmless LOR if it tries to acquire Giant). It was intended that sc
cn open/close hide the locking calls, and they do for i/o functions
functions except for this complication.
Non-console keyboard input is still only Giant-locked, with screen
locking in some called functions. This is correct for the keyboard
parts only.
When Giant cannot be acquired properly, atkbd and kbdmux tend to race
and work (they assume that the caller acquired Giant properly and don't
try to acquire it again or check that it has been acquired, and the
races rarely matter), while ukbd tends to deadlock or panic (since it
does the opposite, and has other usb threads to deadlock with).
The keyboard (Giant) locking here does very little, but the screen
locking completes screen locking for console mode except for not
detecting or handling deadlock.
Restore an splx() lost in r228644. We aren't nearly ready to remove
spl's. They give hints about missing locking. This lost one was
misplaced. Dropping it early for convenience gave race windows for
accesses to the fkey buffer. Giant locking accidentally fixed this
for non-console cases.
Put the spl's around the whole function. Since there are many returns
that would need splx() just before them for a direct fix, split the
function into a wrapper that does the spl's and a "locked" function
that does the work.
Return earlier when no keyboard is attached to match the ordering in a
planned version. This breaks the dubious feature of returning keys
from the fkey buffer after the keyboard has gone away. Losing the keys
wouldn't matter, but we keep them too long now.
just use the same mutex locking as sc cn putc so they have the same
defects.
The locking calls to acquire the lock are actually in sc cn open and close.
Ungrab has to unlock, although this opens a race window.
Change the direct mutex lock calls in sc cn putc to the new locking
functions via the open and close functions. Putc also has to unlock, but
doesn't keep the screen open like grab. Screen open and close reduce to
locking, except screen open for grab also attempts to switch the screen.
Keyboard locking is more difficult and still null, even when keyboard
input calls screen functions, except some of the functions have locks
too deep to work right.
This organization gives a single place to fix some of the locking.
syscons spinlock for the output routine alone. It is better to extend
the coverage of the first syscons spinlock added in r162285. 2 locks
might work with complicated juggling, but no juggling was done. What
the 2 locks actually did was to cover some of the missing locking in
each other and deadlock less often against each other than a single
lock with larger coverage would against itself. Races are preferable
to deadlocks here, but 2 locks are still worse since they are harder
to understand and fix.
Prefer deadlocks to races and merge the second lock into the first one.
Extend the scope of the spinlocking to all of sc_cnputc() instead of
just the sc_puts() part. This further prefers deadlocks to races.
Extend the kdb_active hack from sc_puts() internals for the second lock
to all spinlocking. This reduces deadlocks much more than the other
changes increases them. The s/p,10* test in ddb gets much further now.
Hide this detail in the SC_VIDEO_LOCK() macro. Add namespace pollution
in 1 nested #include and reduce namespace pollution in other nested
#includes to pay for this.
Move the first lock higher in the witness order. The second lock was
unnaturally low and the first lock was unnaturally high. The second
lock had to be above "sleepq chain" and/or "callout" to avoid spurious
LORs for visual bells in sc_puts(). Other console driver locks are
already even higher (but not adjacent like they should be) except when
they are missing from the table. Audio bells also benefit from the
syscons lock being high so that audio mutexes have chance of being
lower. Otherwise, console drviver locks should be as low as possible.
Non-spurious LORs now occur if the bell code calls printf() or is
interrupted (perhaps by an NMI) and the interrupt handler calls
printf(). Previous commits turned off many bells in console i/o but
missed ones done by the teken layer.
indicate (potentially partial) success of the open. Use these to
decide what to close in sccnclose(). Only grab/ungrab use open/close
so far.
Add a per-sc variable to count successful keyboard opens and use
this instead of the grab count to decide if the keyboad state has
been switched.
Start fixing the locking by using atomic ops for the most important
counter -- the grab level one. Other racy counting will eventually
be fixed by normal mutex or kdb locking in most cases.
Use a 2-entry per-sc stack of states for grabbing. 2 is just enough
to debug grabbing, e.g., for gets(). gets() grabs once and might not
be able to do a full (or any) state switch. ddb grabs again and has
a better chance of doing a full state switch and needs a place to
stack the previous state. For more than 3 levels, grabbing just
changes the count. Console drivers should try to switch on every i/o
in case lower levels of nesting failed to switch but the current level
succeeds, but then the switch (back) must be completed on every i/o
and this flaps the state unless the switch is null. The main point
of grabbing is to make it null quite often. Syscons grabbing also
does a carefully chosen screen focus that is not done on every i/o.
Add a large comment about grabbing.
Restore some small lost comments.
- in sccnopen(), open the keyboard before the screen. The keyboard
currently requires Giant (although it must be spinlocked to work
correctly as a console), so the previous order would be a LOR if
it has any semblance of locking.
- add a (currently dummy) state arg to scgetc().
close functions. Scattered calls to sc_cnputc() and sc_cngetc() were
broken by turning the semi-reentrant inline context-switching code in
these functions into the grabbing functions. cncheckc() calls for
panic dumps are the main broken case. The grabbing functions have
special behaviour (mainly screen switching in sc_cngrab()) which makes
them unsuitable as replacements for the inline code.
Simply change the mode to K_XLATE using a local variable and use the
grab level as a flag to tell screen switches not to change it again,
so that we don't need to switch scp->kbd_mode. We did the latter,
but didn't have the complications to update the keyboard mode switch
for every screen switch. sc->kbd_mode remains at its user setting
for all scp's and ungrabbing restores to it.
This bug corrupted the grab count on both vtys if the ungrabbed vty is
different from the console, and failed to restore the keyboard state
on the ungrabbed vty, but not restoring the latter usually left the
keyboard mode part of it uncorrupted at 1 (K_XLATE), while after this
fix the keyboard mode part is usually corrupted to 0 (K_RAW).
While here, rename the grab count from 'grabbed' to grab_level.
- never call up to the tty layer to restart output for keyboard input in
console mode. This was already disallowed in kdb mode. Other cases
are rarely reached.
- disable the reboot, halt and powerdown keys in console mode. The suspend,
standby and panic keys are still allowed, and aren't even conditonal
on excessive configuration options. Some of these actions are still
available in ddb mode as ddb commands which are equally unsafe. Some
are useful at input prompts and should be restored when the locking is
fixed.
- disallow bells in kdb mode (should be in console mode, but the flag for
that is not available). Visual bell gives very alarming behaviour by
trying to use callouts which don't work in kdb mode. Audio bell uses
timeouts and hardware resources with mutexes that can deadlock in
reasonable use of ddb.
Screen switches in kdb mode are not very safe, but they are important
functionality and there is a lot of code to make them sort of work.
restores avoidance of doing dangerous things like calling wakeup() and
callouts while in ddb.
Initialization of 'debugger' was broken by removing the cndbctl() console
method that was used mainly in this driver to initialize 'debugger' and
switch to the console screen on entry to ddb. The screen switch was
restored using the cngrab() method, but cngrab() is more general so it
should not initialize 'debugger' and never did. 'debugger' was just
an over-engineered alias for kdb_active anyway. It existed because
kdb_active (when it was named ddb_active) was considered as a private
kdb variable, and there are ordering problems initializing the variables
atomically with the state that they represent, but an extra variable and
method to set it increased these problems.
The bug caused LORs, but WITNESS is normally misconfigured with
WITNESS_SKIPSIN so it doesn't check the spinlocks used by wakeup() and
callouts.
virtual-device, but needs to be per-physical-device so that it protects
shared data. Usually, scp->sc->write_in_progress got corrupted first
and further corruption was limited when this variable was left at nonzero
with no write in progress.
Attempt to fix missing lock destruction in r162285. Put it with the
lock destruction for r172250 after moving the latter. Both might be
unreachable.
To demonstrate the bug, find a buggy syscall or sysctl that calls
printf(9) and run this often. Run hd /dev/zero >/dev/ttyvN for any
N != 0. The console spam goes to ttyv0 and the non-console spam goes
to ttyvN, so the lock provided no protection (but it helped for
N == 0).
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
random(4) device in the kernel, and the KERN_ARND sysctl will
return nothing. With RANDOM_DUMMY there will be a random(4) that
always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
(direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
when the dust settles.
- Use macros for locks.
- Fix comments.
* src/share/man/...
- Update the man pages.
* src/etc/...
- The startup/shutdown work is done in D2924.
* src/UPDATING
- Add UPDATING announcement.
* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.
* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.
* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.
* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
selection is the way to go.
* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.
* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.
* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
that instead. All that is needed here is N=0, N++, N==0, and some
localised trickery is used to manufacture a 128-bit 0ULLL.
* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
now the test will do a basic check of compressibility. Clairvoyant
talent is still a good idea.
- This is still a long way off a proper unit test.
* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.
Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
Also, split power_suspend into power_suspend and power_suspend_early.
power_suspend_early is called before the userland is frozen.
power_suspend is called after the userland is frozen.
Currently only VT switching is hooked to power_suspend_early.
This is needed because switching away from X server requires its
cooperation, so obviously X server must not be frozen when that happens.
Freezing userland during ACPI suspend is useful because not all drivers
correctly handle suspension concurrent with other activity. This is
especially applicable to drivers ported from other operating systems
that suspend all software activity between placing drivers and hardware
into suspended state.
In particular drm2/radeon (radeonkms) depends on the described
procedure. The driver does not have any internal synchronization
between suspension activities and processing of userland requests.
Many thanks to kib for the code that allows to freeze and thaw all
userland threads.
Note that ideally we also need to park / inhibit (non-special) kernel
threads as well to ensure that they do not call into drivers.
MFC after: 17 days
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
With this change and previous work from ray@ it will be possible to put
both in GENERIC, and have one enabled by default, but allow the other to
be selected via the loader.
(The previous implementation had separate kern.vt.disable and
hw.syscons.disable tunables, and would panic if both drivers were
compiled in and neither was explicitly disabled.)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
Contains:
* Refactor the hardware RNG CPU instruction sources to feed into
the software mixer. This is unfinished. The actual harvesting needs
to be sorted out. Modified by me (see below).
* Remove 'frac' parameter from random_harvest(). This was never
used and adds extra code for no good reason.
* Remove device write entropy harvesting. This provided a weak
attack vector, was not very good at bootstrapping the device. To
follow will be a replacement explicit reseed knob.
* Separate out all the RANDOM_PURE sources into separate harvest
entities. This adds some secuity in the case where more than one
is present.
* Review all the code and fix anything obviously messy or inconsistent.
Address som review concerns while I'm here, like rename the pseudo-rng
to 'dummy'.
Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)
- Switch syscons from timeout() to callout_reset_flags() and specify that
precision is not important there -- anything from 20 to 30Hz will be fine.
- Reduce syscons "refresh" rate to 1-2Hz when console is in graphics mode
and there is nothing to do except some polling for keyboard. Text mode
refresh would also be nice to have adaptive, but this change at least
should help laptop users who running X.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
using /dev/consolectl close. This fixes a problem where if
a USB mouse is detached while a button is pressed, that
button is never released.
MFC after: 1 week