If a thread waiting on sx dropped Giant it would not be properly
reacquired on exit from the routine, later resulting in panics
indicating Giant is not held (when it should be).
The bug was not present in the original patch sent to pho, I wittingly
added it just prior to the commit and only smoke-tested it.
Reported by: pho
On arm64 we will need to get the phys_avail array from before the kernel
is excluded to create teh DMAP region. In preperation for this pass in the
array length into regions_to_avail.
This feature is disabled by default and was removed when dynamic states
implementation changed to be lockless. Now it is reimplemented with small
differences - when dyn_keep_states sysctl variable is enabled,
dyn_match_ipv[46]_state() function doesn't match child states of deleted
rule. And thus they are keept alive until expired. ipfw_dyn_lookup_state()
function does check that state was not orphaned, and if so, it returns
pointer to default_rule and its position in the rules map. The main visible
difference is that orphaned states still have the same rule number that
they have before parent rule deleted, because now a state has many fields
related to rule and changing them all atomically to point to default_rule
seems hard enough.
Reported by: <lantw44 at gmail.com>
MFC after: 2 days
lists in the EFI memory map. As such we need to reduce the mappings to
restrict them to not be the full 1G block. For now reduce this to a 2M
block, however this may be further restricted to be 4k page aligned as
other SoCs may require.
This allows ThunderX2 to boot reliably to userspace without performing
any speculative memory accesses to invalid physical memory.
Sponsored by: DARPA, AFRL
On some arm64 boards we need to access memory in ACPI tables that is not
mapped in the DMAP region. To handle this create the needed mappings in
pmap_mapbios in the KVA space.
Submitted by: Michal Stanek (mst@semihalf.com)
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D15059
The main advantage of this is to allow us to exclude memory from being
used by the kernel. This may be from the memreserve property, or ranges
marked as no-map under the reserved-memory node.
More work is still needed to remove the physmap array. This is still used
for creating the DMAP region, however other patches need to be committed
before we can remove this.
Obtained from: ABT Systems Ltd
Sponsored by: Turing Robotic Industries
This will help simplify the arm64 code and allow us to properly exclude
memory that should never be mapped.
Obtained from: ABT Systems Ltd
Sponsored by: Turing Robotic Industries
This reduces the overhead when we have many small mappings, e.g. on some
EFI systems. This is to help use this code on arm64 where we may have a
large number of entries from the EFI firmware.
Obtained from: ABT Systems Ltd
Sponsored by: Turing Robotic Industries
Differential Revision: https://reviews.freebsd.org/D15477
Linux will not connect to a backend that's in state 3
(XenbusStateInitialised), it needs to be in state 2
(XenbusStateInitWait) for Linux to attempt to connect to the backend.
The protocol seems to suggest that the backend should indeed wait in
state 2 for the frontend to connect, which makes state 3 unusable for
disk backends.
Also make sure blkback will connect to the frontend if the frontend
reaches state 3 (XenbusStateInitialised) before blkback has processed
the results from the hotplug script (Submitted by Nathan Friess).
MFC after: 1 week
This corrects a warning issues by gcc9:
/srv/src/freebsd/head/usr.bin/top/machine.c:988:22: warning: '%5zu'
directive writing between 5 and 20 bytes into a
region of size 15 [-Wformat-overflow=]
sprintf(status, "?%5zu", state);
A constant stream of readers could completely starve writers and this is not
a hypothetical scenario.
The 'poll2_threads' test from the will-it-scale suite reliably starves writers
even with concurrency < 10 threads.
The problem was run into and diagnosed by dillon@backplane.com
There was next to no change in lock contention profile during -j 128 pkg build,
despite an sx lock being at the top.
Tested by: pho
Writers waiting on readers to finish can set the RW_LOCK_WRITE_SPINNER
bit. This prevents most new readers from coming on. However, the last
reader to unlock also clears the bit which means new readers can sneak
in and the cycle starts over.
Change the code to keep the bit after last unlock.
Note that starvation potential is still there: no matter how many write
spinners are there, there is one bit. After the writer unlocks, the lock
is free to get raided by readers again. It is good enough for the time
being.
The real fix would include counting writers.
This runs into a caveat: the writer which set the bit may now be preempted.
In order to get rid of the problem all attempts to set the bit are preceeded
with critical_enter.
The bit gets cleared when the thread which set it goes to sleep. This way
an invariant holds that if the bit is set, someone is actively spinning and
will grab the lock soon. In particular this means that readers which find
the lock in this transient state can safely spin until the lock finds itself
an owner (i.e. they don't need to block nor speculate how long to spin
speculatively).
Tested by: pho
r334005: add pc_ibpb_set as it is now referenced by common code
(although presumably not needed on i386 since it has been there
since the first spectre mitigation work on amd64)
r334009: there is no amd64 rflags -> i386 eflags
IPMI access on PowerNV systems is done through the OPAL firmware. This adds a
simple attachment for communicating with the FSP/BMC on these machines. This
has been tested on a Talos POWER9 workstation, only in the bootup phase, noting
the successful attachment messages:
...
ipmi0: IPMI device rev. 0, firmware rev. 2.00, version 2.0, device support mask 0
ipmi0: Number of channels 2
...
The ipmi device has not been added to GENERIC64, but may be after further
testing. It may also eventually be added to the ipmi module at that point.
Summary:
PowerNV architectures (in the test case POWER9) export sensors via the device
tree, which are accessed via OPAL calls. This adds sysctl nodes for each
device in a generic fashion. New sysctl nodes are:
dev.opal_sensor.N.sensor
dev.opal_sensor.N.sensor_min
dev.opal_sensor.N.sensor_max
dev.opal_sensor.N.type
dev.opal_sensor.N.label
These are rooted at a parent attachment under opal, called opalsens. This does
not add support for the "sensor groups" defined in the device tree.
Reviewed by: breno.leitao_gmail.com
Differential Revision: https://reviews.freebsd.org/D15362
- Add constants for fields in DR6 and the reserved fields in DR7. Use
these constants instead of magic numbers in most places that use DR6
and DR7.
- Refer to T_TRCTRAP as "debug exception" rather than a "trace trap"
as it is not just for trace exceptions.
- Always read DR6 for debug exceptions and only clear TF in the flags
register for user exceptions where DR6.BS is set.
- Clear DR6 before returning from a debug exception handler as
recommended by the SDM dating all the way back to the 386. This
allows debuggers to determine the cause of each exception. For
kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value
to other parts of the handler (namely, user_dbreg_trap()). For user
traps, wait until after trapsignal to clear DR6 so that userland
debuggers can read DR6 via PT_GETDBREGS while the thread is stopped
in trapsignal().
Reviewed by: kib, rgrimes
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15189
Always disable FIFO access as we don't use it.
Rename some register bits so they are in sync with the register name.
While here add my copyright as I've probably wrote 70% of the code here.
Speculative Store Bypass (SSB) is a speculative execution side channel
vulnerability identified by Jann Horn of Google Project Zero (GPZ) and
Ken Johnson of the Microsoft Security Response Center (MSRC)
https://bugs.chromium.org/p/project-zero/issues/detail?id=1528.
Updated Intel microcode introduces a MSR bit to disable SSB as a
mitigation for the vulnerability.
Introduce a sysctl hw.spec_store_bypass_disable to provide global
control over the SSBD bit, akin to the existing sysctl that controls
IBRS. The sysctl can be set to one of three values:
0: off
1: on
2: auto
Future work will enable applications to control SSBD on a per-process
basis (when it is not enabled globally).
SSBD bit detection and control was verified with prerelease microcode.
Security: CVE-2018-3639
Tested by: emaste (previous version, without updated microcode)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
This change adds support for a UBS<->RS232 adapter based on CH340 (or an
analogue) that I own. The device seems to have a newer internal version
(0x30) and the existing code incorrectly configures line control for it
resulting in garbled transmission. The changes are based on what I
learned in Linux drivers for the same hardware.
Additional changes:
- use UCHCOM_REG_LCR1 / UCHCOM_REG_LCR2 instead of explicit 0x18 and
0x25
- use NULL instead of 0 where a pointer is expected
Reviewed by: hselasky
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15498
Also, add definitions for more bits of UCHCOM_REG_LCR1 as seen in the
Linux driver. UCHCOM_LCR1_PARENB definition was different from that in
the Linux driver and clashed with newly added UCHCOM_LCR1_RX. I took a
liberty to change UCHCOM_LCR1_PARENB to the Linux definition as it was
unused in the driver anyway. This change should make
uchcom_cfg_set_break() easier to understand.
Approved by: hselasky
MFC after: 2 weeks
Product IDs are specified in vendor documents. The previously used
device ID is not. This is a cosmetic change. No functionality depends
on those IDs.
Reviewed by: hselasky
MFC after: 2 weeks
I have a system that is very unstable after resuming from suspend-to-RAM
but only if HPET is used as the event timer. The theory is that SMM
code / firmware could be enabling HPET for its own uses and unexpected
interrupts cause a trouble for it. Originally I wanted to solve the
problem in hpet_suspend() method, but that was insufficient as the event
timer could get reprogrammed again.
So, it's better, for my case and in general, to stop the event timer(s)
before entering the hardware suspend.
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D15413
When we issue shootdown IPIs, we first assign zero to pm_gens to
indicate the need to flush on the next context switch in case our IPI
misses the context, next we read pm_active. On context switch we set
our bit in pm_active, then we read pm_gen. It is crucial that both
threads see the memory in the program order, otherwise invalidation
thread might read pm_active bit as zero and the context switching
thread might read pm_gen as zero.
IA32 allows CPU for both reads to see zero. We must use the barriers
between write and read. The pm_active bit set is already locked, so
only the invalidation functions need it.
I never saw it in real life, or at least I do not have a good
reproduction case. I found this during code inspection when hunting
for the Xen TLB issue reported by cperciva.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15506