This is a primary boot loader intended to implement the gptboot
partition selection algorithm, just as we did for BIOS booting. While
the preferred method for UEFI is to use the UEFI Boot Manager protocol,
there are situations where that can't be done: some BIOS makers
interfere with the protocol in unhelpful ways, there's a new standard
for zero variable writes from the client OS, and USB drives with
multiple partitions that move between systems need a media-stable way
to select a partition.
Reviewed by: tsoome, bcran
Differential Revision: https://reviews.freebsd.org/D20547
Segregate the disk probing and selection protocol from the rest of the
boot loader.
Reviewed by: tsoome, bcran
Differential Revision: https://reviews.freebsd.org/D20547
that it becomes increasingly expensive to process a steady stream of
correctable errors. Additionally, the memory used by the MCA entries can
grow without bound.
Change the code to maintain two separate lists: a list of entries which
still need to be logged, and a list of entries which have already been
logged. Additionally, allow a user-configurable limit on the number of
entries which will be saved after they are logged. (The limit defaults
to -1 [unlimited], which is the current behavior.)
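The two-list scheme described above can be sketched as follows. This is
a minimal userland illustration, not the kernel code: struct rec,
struct fifo, and drain_pending() are hypothetical names, and the real
implementation's locking and record layout are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MCA record; not the kernel's structure. */
struct rec {
	int id;
	struct rec *next;
};

/* Minimal FIFO with a length counter. */
struct fifo {
	struct rec *head;
	struct rec *tail;
	int len;
};

static void
fifo_push(struct fifo *q, struct rec *r)
{
	assert(r != NULL);
	r->next = NULL;
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	q->len++;
}

static struct rec *
fifo_pop(struct fifo *q)
{
	struct rec *r = q->head;

	if (r == NULL)
		return (NULL);
	q->head = r->next;
	if (q->head == NULL)
		q->tail = NULL;
	q->len--;
	return (r);
}

/*
 * Log every record on 'pending', move it to 'logged', then trim
 * 'logged' to at most 'maxcount' records, oldest first. A negative
 * maxcount means unlimited. Returns the number of records logged.
 */
static int
drain_pending(struct fifo *pending, struct fifo *logged, int maxcount)
{
	struct rec *r;
	int n = 0;

	while ((r = fifo_pop(pending)) != NULL) {
		/* A real implementation would print the record here. */
		fifo_push(logged, r);
		n++;
	}
	while (maxcount >= 0 && logged->len > maxcount)
		free(fifo_pop(logged));
	return (n);
}
```

Because already-logged records live on their own list, each pass only
walks the pending records, and the limit bounds the memory kept for
history.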
Reviewed by: imp, jhb
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20482
Apply a linker script when linking i386 kernel modules to add padding
to a set_pcpu or set_vnet section. The padding value is kind-of random
and is used to catch modules not built with the linker script, which
may therefore still have the problems leading to kernel panics.
This is needed as the code generated on certain architectures for
non-simple types, e.g., an array, can generate an absolute relocation
on the edge (just outside) of the section, which will thus not be
properly relocated. Adding the padding to the end of the section
ensures that even absolute relocations of complex types will be inside
the section if they are the last object in there, and hence relocation
will work properly and avoid panics such as those observed with carp.ko
or ipsec.ko.
There is a rather lengthy discussion of various options to apply in
the mentioned PRs and their depends/blocks, and the review.
There seems to be no best solution working across multiple toolchains
and multiple versions of them, so I took the liberty of picking one,
as currently our users (and our CI system) are hitting this on
just i386 and we need some solution. I wish we had a proper
fix rather than another "hack".
Also back out r340009, which manually and temporarily fixed CARP before
12.0-R "by chance" after a lead-up of various other link-elf.c and
related fixes.
PR: 230857,238012
With suggestions from: arichardson (originally last year)
Tested by: lwhsu
Event: Waterloo Hackathon 2019
Reported by: lwhsu, olivier
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D17512
Add a CAM-Newbus SDIO support module. This work provides a newbus
infrastructure for device drivers wanting to use SDIO. On the lower end,
while it is connected by newbus to SDHCI, it talks CAM, using the MMCCAM
framework to get to it.
This also duplicates the usbdevs framework to equally create sdiodev
header files with #defines for "vendors" and "products".
Submitted by: kibab (initial work, see https://reviews.freebsd.org/D12467)
Reviewed by: kibab, imp (comments on earlier version)
MFC after: 6 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19749
In the DMA case, given we disable the data interrupts, we never seem
to get DATA_END. Given we are relying on DMA interrupts we are not
using the SDHCI state machine and hence only call into
sdhci_platform_will_handle() for the first check of data.
We do not call "will handle" for any following round trips of the same
transaction if block size * count > BCM_DMA_BLOCK_SIZE.
Manually check "left" in the DMA interrupt handler to see if we have at
least another full BCM_DMA_BLOCK_SIZE to handle.
Without this change we would DMA that and then even start a DMA with
left == 0 which would lead to a timeout and error.
Now we re-enable data interrupts and return and let the SDHCI generic
interrupt handler and state machine pick the SPACE_AVAIL up and then
find that it should punt to the pio_handler for the remaining bytes
or finish the data transaction.
With this change block mode seems to work beyond 7 * 64byte blocks,
which worked as it was below BCM_DMA_BLOCK_SIZE.
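The "left" check described above might look roughly like this. It is a
sketch only: dma_intr_next() and the enum are hypothetical names, and
the value 512 for the DMA block size is an assumption chosen to be
consistent with 7 * 64 bytes fitting below it.

```c
#include <stddef.h>

/* Assumed value for illustration; the driver defines the real one. */
#define SKETCH_DMA_BLOCK_SIZE	512

enum next_step {
	NEXT_DMA,	/* at least one more full DMA block remains */
	NEXT_GENERIC	/* re-enable data interrupts; SDHCI handles the rest */
};

/*
 * Decide what the DMA interrupt handler does once a DMA round has
 * completed and 'left' bytes of the transaction remain.
 */
static enum next_step
dma_intr_next(size_t left)
{
	if (left >= SKETCH_DMA_BLOCK_SIZE)
		return (NEXT_DMA);
	/*
	 * Less than a full block (possibly zero): never start a DMA
	 * here. The generic state machine either punts the tail to PIO
	 * or finishes the data transaction.
	 */
	return (NEXT_GENERIC);
}
```

With left == 0 the sketch returns NEXT_GENERIC, which is the case that
previously started an empty DMA and timed out.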
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20199
Extending what the initial revision, r273264, r276985, and r277346 have
started for the transfer mode and command registers, another pair of
16bit registers written in sequence are block size and block count,
which fall together onto the same 32bit line, and hence the same
register(s) would be written twice in sequence for those as well.
Use a similar approach to transfer mode and command: save the writes
to either of the block registers and then only execute a write once.
We can do this because, as with transfer mode, their values are
meaningless until a command is issued, so we can use the write to
command as a trigger to also write out the block registers.
Compared to transfer mode and command the value of block count can
change, so we need to keep state and actually read the block registers
back the first time after a write.
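A minimal sketch of the write coalescing described above, against a
mocked 32-bit register. The struct and function names are hypothetical;
the offsets follow the standard SD host controller register layout
(block size at 0x04, block count at 0x06, command at 0x0e). The
read-back of the block registers mentioned above is not modeled.

```c
#include <stdint.h>

#define REG_BLOCK_SIZE	0x04	/* 16 bit, low half of the word at 0x04 */
#define REG_BLOCK_COUNT	0x06	/* 16 bit, high half of the same word */
#define REG_COMMAND	0x0e	/* 16 bit, its write acts as the trigger */

struct shadow {
	uint16_t blksz;		/* saved block size */
	uint16_t blkcnt;	/* saved block count */
	int dirty;		/* a block register write is pending */
	uint32_t hw_word;	/* mock of the 32-bit hardware word at 0x04 */
	int hw_writes;		/* number of 32-bit writes issued to it */
};

static void
sketch_write_2(struct shadow *sc, unsigned off, uint16_t val)
{
	switch (off) {
	case REG_BLOCK_SIZE:
		sc->blksz = val;	/* save only, no hardware access */
		sc->dirty = 1;
		break;
	case REG_BLOCK_COUNT:
		sc->blkcnt = val;	/* save only, no hardware access */
		sc->dirty = 1;
		break;
	case REG_COMMAND:
		if (sc->dirty) {
			/* One combined write for both block registers. */
			sc->hw_word = ((uint32_t)sc->blkcnt << 16) | sc->blksz;
			sc->hw_writes++;
			sc->dirty = 0;
		}
		/* The command write itself would follow here. */
		break;
	default:
		break;
	}
}
```

Two 16-bit writes followed by a command write result in a single 32-bit
write to the shared word instead of two.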
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20197
Currently slot_printf() uses two printf() calls to print the
device-slot name and the actual message. When other printf()s are
ongoing in parallel this can lead to interleaved messages on the
console, which is especially unhelpful for debugging or error messages.
Take a hit on the stack and vsnprintf() the message to the buffer.
This way it can be printed along with the device-slot name in one go
avoiding console gibberish.
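The approach can be sketched in userland as below. slot_format() is a
hypothetical helper that writes into a caller-supplied buffer so the
result can be inspected; the 128-byte buffer size and the
"<name>-slot<N>: " prefix format are assumptions for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Format the message into an on-stack buffer first, then emit the
 * prefix and the message with a single formatting call. Messages
 * longer than the buffer are truncated by vsnprintf().
 */
static int
slot_format(char *out, size_t outlen, const char *name, int slot,
    const char *fmt, ...)
{
	char msg[128];	/* assumed size; costs stack space */
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);

	/* One call, so nothing can be interleaved between the parts. */
	return (snprintf(out, outlen, "%s-slot%d: %s", name, slot, msg));
}
```

In a driver the final snprintf() would be a single printf() to the
console instead.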
Reviewed by: marius
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19747
Add cam_sim_alloc_dev() as a wrapper to cam_sim_alloc() which takes
a device_t instead of the unit_number (which we can derive from the
dev again).
Add device_t sim_dev to struct cam_sim. It will be used to pass through
the bus for cases when both sides of CAM speak newbus already and we want
to link them (yet make the calls through CAM for now).
SDIO will be the first consumer of this. For that make use of
cam_sim_alloc_dev() in sdhci under MMCCAM.
This will also allow people to start iterating more on the idea
to newbus-ify CAM without changing 50+ device drivers from the start.
Also to be clear there are callers to cam_sim_alloc() which do not
have a device_t (e.g., XPT) or provide their own unit number so we cannot
simply switch the KPI entirely.
Submitted by: kibab (original idea, see https://reviews.freebsd.org/D12467)
Reviewed by: imp, chuck
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19746
Convert the array to use C99 initializers.
Make it constant.
Replace MAX_TRAP_MSG with nitems().
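For illustration, the three steps above applied to a made-up table.
The trap numbers and the trap_name() helper are hypothetical; nitems()
is defined the same way as in FreeBSD's sys/param.h.

```c
#include <stddef.h>

#ifndef nitems
#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical trap numbers, for illustration only. */
enum { T_SK_PRIVINFLT = 1, T_SK_BPTFLT = 3, T_SK_PAGEFLT = 12 };

/*
 * C99 designated initializers tie each string to its index, so the
 * table cannot silently drift out of sync with the constants. Unnamed
 * slots are NULL. Both the pointers and the strings are const.
 */
static const char *const trap_msg[] = {
	[0] =			"",
	[T_SK_PRIVINFLT] =	"privileged instruction fault",
	[T_SK_BPTFLT] =		"breakpoint instruction fault",
	[T_SK_PAGEFLT] =	"page fault",
};

static const char *
trap_name(unsigned type)
{
	/* nitems() replaces a hand-maintained maximum index constant. */
	if (type < nitems(trap_msg) && trap_msg[type] != NULL)
		return (trap_msg[type]);
	return ("unknown/reserved trap");
}
```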
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
configuration descriptor reads early on to avoid issues with devices
that don't check for a valid USB configuration read request.
Submitted by: takahiro.kurosawa@gmail.com
PR: 238412
MFC after: 3 days
When the ARC size is very small, aggsum_lower_bound(&arc_size) may
return negative values, which, due to an unsigned comparison, caused
delays while waiting for arc_adjust() to "fix" it by calling
aggsum_value(&arc_size). Using a signed comparison there fixes the
problem.
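The pitfall is easy to reproduce outside the kernel. In this sketch the
function names are hypothetical and the casts make explicit what C's
usual arithmetic conversions do implicitly when a signed and an
unsigned 64-bit value are compared. The limit is assumed to fit in
int64_t.

```c
#include <stdint.h>

/* Buggy form: a negative lower bound becomes a huge unsigned value. */
static int
over_limit_unsigned(int64_t lower_bound, uint64_t limit)
{
	return ((uint64_t)lower_bound >= limit);
}

/* Fixed form: compare as signed, so negative values stay small. */
static int
over_limit_signed(int64_t lower_bound, uint64_t limit)
{
	return (lower_bound >= (int64_t)limit);
}
```

For a lower bound of -4096 and a limit of 1 GB the unsigned form
reports "over the limit" while the signed form does not.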
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
While formally it is not necessary, the sooner it starts, the sooner it
finishes, and supposedly the less disturbing it will be for the workload.
MFC after: 2 weeks
Differentiate between PCI Express Endpoint devices and Root Complex
Integrated Endpoints in the nda driver. The Link Status and Capability
registers are not valid for Integrated Endpoints and should not be
displayed. The bhyve emulated NVMe device will advertise as being an
Integrated Endpoint.
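A sketch of the distinction, assuming the caller has already read the
16-bit PCI Express Capabilities register. The SK_ constants are local
copies of the Device/Port Type encoding from the PCI Express
specification (bits 7:4, 0000b for an Endpoint, 1001b for a Root
Complex Integrated Endpoint); link_regs_valid() is a hypothetical
helper, not the driver's function.

```c
#include <stdint.h>

/* Device/Port Type field of the PCI Express Capabilities register. */
#define SK_PCIEM_FLAGS_TYPE		0x00f0
#define SK_PCIEM_TYPE_ENDPOINT		0x0000
#define SK_PCIEM_TYPE_ROOT_INT_EP	0x0090

/*
 * Return nonzero if the Link Capability and Link Status registers are
 * meaningful for this device and may be displayed.
 */
static int
link_regs_valid(uint16_t pcie_flags)
{
	return ((pcie_flags & SK_PCIEM_FLAGS_TYPE) !=
	    SK_PCIEM_TYPE_ROOT_INT_EP);
}
```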
Reviewed by: imp
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D20282
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone. But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place. No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20470
This set of changes makes it possible to run FreeBSD for PowerPC64/pseries,
under QEMU/KVM, without requiring the host to make hugepages available to the
guest.
While there was already this possibility, by means of setting hw_direct_map to
0, on PowerPC64 there were a couple of issues/wrong assumptions that prevented
this from working, before this changelist.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20522
The NVMe CAM driver reports the PCIe Link Capability and Status for
devices. For emulated bhyve NVMe devices, this looks like:
nda0: nvme version 1.3 x63 (max x63) lanes PCIe Gen15 (max Gen15) link
The driver outputs this because the emulated device doesn't include the
PCIe Capability structure. The NVMe specification requires these
registers, so the fix is to add this set of capability registers to the
emulated device.
Note that PCI Express devices that are integrated into the Root Complex
(i.e. Bus 0x0) do not have to support the Link Capability or Status
registers. Windows will fail to start (i.e. Code 10) devices that appear
to be part of the Root Complex but report being a PCI Express Endpoint.
So also add a check to pci_emul_add_pciecap() to check if the device is
integrated and change the device type.
Reviewed by: imp, ken, araujo, jhb, rgrimes
Approved by: imp (mentor), ken (mentor), jhb (maintainer)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19904
This ensures that bhyve properly recognizes when decoding is disabled
for BARs on passthru devices. To properly handle writes to the
register, export a pci_emul_cmd_changed function from pci_emul.c that
the pass through device model invokes for config writes that change
PCIR_COMMAND.
Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20531
Rather than unconditionally setting the MEMEN and PORTEN bits in
PCIR_COMMAND for PCI devices, set the respective bit when the first
BAR of a given type is added to the device. This more closely matches
what firmware does on bare metal.
BUSMASTEREN is still set unconditionally. Eventually this bit should
move into the device models as not all device models need this set.
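Roughly, the new behavior amounts to the following. The types and
function names are hypothetical stand-ins for the emulation code; the
bit values are the standard PCI command register bits (I/O space 0x1,
memory space 0x2, bus master 0x4).

```c
#include <stdint.h>

#define SK_CMD_PORTEN		0x0001	/* I/O space decoding enable */
#define SK_CMD_MEMEN		0x0002	/* memory space decoding enable */
#define SK_CMD_BUSMASTEREN	0x0004	/* bus mastering enable */

enum sk_bar_type { SK_BAR_IO, SK_BAR_MEM32, SK_BAR_MEM64 };

struct sk_dev {
	uint16_t cmd;	/* emulated command register */
};

static void
sk_dev_init(struct sk_dev *d)
{
	/* Only bus mastering is still enabled unconditionally. */
	d->cmd = SK_CMD_BUSMASTEREN;
}

static void
sk_add_bar(struct sk_dev *d, enum sk_bar_type type)
{
	/* Setting the bit is idempotent, so the first BAR enables it. */
	switch (type) {
	case SK_BAR_IO:
		d->cmd |= SK_CMD_PORTEN;
		break;
	case SK_BAR_MEM32:
	case SK_BAR_MEM64:
		d->cmd |= SK_CMD_MEMEN;
		break;
	}
}
```

A device with only memory BARs ends up with MEMEN set and PORTEN clear.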
Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20530
This gets reasonably close to the existing format in sys/kern but will
probably require some changes to upstream clang-format before it can be
used as the default formatting tool.
I tried formatting a few files in sys/kern and the result is pretty close to
the existing code. However, this configuration file is not ready to be used
without manually checking the output.
Reviewed By: emaste
Differential Revision: https://reviews.freebsd.org/D20533
- Add constants for OpenBSD wxneeded, bootdata and randomize to the
FreeBSD elf_common.h file. This is the file that gets used by the
elftoolchain library.
- Update readelf and elfdump utilities to decode these program headers
if they are encountered.
Note: FreeBSD has its own version of elfdump(1), which will be updated
in a subsequent commit. I am adding it here anyway because this diff is
going to be submitted upstream.
Discussed with: emaste
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20548
M contrib/elftoolchain/elfdump/elfdump.c
M contrib/elftoolchain/readelf/readelf.c
M sys/sys/elf_common.h
Before r305323 (MFV r302991: 6950 ARC should cache compressed data)
arc_read() code did this for access to a ghost buffer:
arc_adapt() (from arc_get_data_buf())
arc_access(hdr, hash_lock)
I.e., we first checked access to the MFU ghost/MRU ghost buffer and
adapted the MFU/MRU sizes (in arc_adapt()), and next moved the buffer
from the ghost state to regular.
After r305323 the sequence is different:
arc_access(hdr, hash_lock);
arc_hdr_alloc_pabd(hdr);
I.e., we first move the buffer from the ghost state in arc_access() and
then we check access to the buffer in ghost state (in
arc_hdr_alloc_pabd() -> arc_get_data_abd() -> arc_get_data_impl() ->
arc_adapt()). This is incorrect: arc_adapt() never sees an access to the
ghost buffer because arc_access() has already migrated the buffer from
the ghost state to regular.
So, the fix is to restore a call to arc_adapt() before arc_access() and
to suppress the call to arc_adapt() after arc_access().
Submitted by: Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after: 2 weeks
Sponsored by: Integros [integros.com]
Differential Revision: https://reviews.freebsd.org/D19094
Simplify the code a bit and rework how we report the results
of the probing.
Reviewed by: tsoome@
Differential Revision: https://reviews.freebsd.org/D20537
ZFS ABD allocates tons of 4KB chunks via UMA, requiring huge hash tables.
With initial hash table size of only 32 elements it takes ~20 expansions
or ~400 seconds to adapt to handling 220GB ZFS ARC. During that time not
only is the hash table highly inefficient, but each of those expansions
also takes significant time with the lock held, blocking operation.
On my test system with 256GB of RAM and a ZFS pool of 28 HDDs this change
reduces the time needed to read 240GB for the first time from ~300-400s,
during which the system is quite busy and unresponsive, to only ~150s
with light CPU load and just 5 sub-second CPU spikes to expand the hash
table.
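As a back-of-the-envelope check of the expansion count quoted above,
assume the table simply doubles until it has one bucket per item. That
growth policy is an assumption made for this sketch and may differ from
what the allocator actually does; count_doublings() is a hypothetical
helper.

```c
#include <stdint.h>

/*
 * Number of doublings needed for a table of 'buckets' entries to reach
 * at least 'items' entries.
 */
static unsigned
count_doublings(uint64_t buckets, uint64_t items)
{
	unsigned n = 0;

	if (buckets == 0)
		return (0);	/* nothing to double */
	while (buckets < items) {
		buckets *= 2;
		n++;
	}
	return (n);
}
```

220GB of 4KB chunks is 220 * 2^18 = 57,671,680 items, and starting from
32 buckets the sketch needs 21 doublings to cover them, in line with
the "~20 expansions" figure.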
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Note llvm-ar is linked to llvm-ranlib since r311565. r348677 fixed
"make delete-old" issue with llvm-ar but missed it somehow.
Discussed with: emaste, jhb
BootServices AllocatePool/FreePool calls. They are simpler to use and
result in the same thing happening.
Reviewed by: tsoome@
Differential Revision: https://reviews.freebsd.org/D20540
This fixes a panic in Espressobin when gpioregulator fails to allocate the
GPIO pin (the GPIO controller is not there).
Sponsored by: Rubicon Communications, LLC (Netgate)
Provide the acpi handle path as the location string for the nvdimm
children of the nvdimm_root device.
Reviewed by: kib
Approved by: jhb (mentor)
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D20528
vm_reserv_break in r348484, and there was found to improve performance
minutely and reduce code size. This change applies a similar change to
vm_reserv_reclaim_config, expecting similar benefits. This change also
allows quick rejection of page ranges that are unsuitable on account
of alignment or boundary issues, where those issues are processed a
page at a time in the current implementation. For contrived test
cases, this can make finding a reservation satisfying a major
alignment requirement around 30 times faster.
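The kind of whole-range test that allows quick rejection can be
sketched as below. This is not the vm_reserv code; range_ok() is a
hypothetical helper using the common power-of-two idioms for alignment
and boundary checks.

```c
#include <stdint.h>

/*
 * Return 1 if [pa, pa + size) starts on a multiple of 'alignment' and
 * does not cross a multiple of 'boundary'. 'alignment' must be a power
 * of two, 'boundary' must be zero (no restriction) or a power of two,
 * and 'size' must be nonzero.
 */
static int
range_ok(uint64_t pa, uint64_t size, uint64_t alignment, uint64_t boundary)
{
	if ((pa & (alignment - 1)) != 0)
		return (0);	/* misaligned start */
	/* First and last byte differ above the boundary bits: crossing. */
	if (boundary != 0 &&
	    ((pa ^ (pa + size - 1)) & ~(boundary - 1)) != 0)
		return (0);
	return (1);
}
```

Both tests are constant time for the whole range, so an unsuitable
range is discarded without looking at its individual pages.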
Tested by: pho
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20274
D_PARTNONE is documented to make it possible to open a raw MBR
partition, but the current disk_open() does not really implement this.
The current code checks the partition against -1 (D_PARTNONE) but still
attempts to open the partition table when the MBR partition type is
FreeBSD. Instead, we should check against -2 (D_PARTWILD).
In the MBR + BSD label case, this code only works because, by default,
the first BSD partition is created starting at relative sector 0, so we
can still access the BSD table from that MBR slice.
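One way to read the intended behavior is sketched below. The constants
are redefined locally with the values given above; resolve_partition()
and its arguments are hypothetical, and the choice of partition 0 as
the default for a wildcard is an assumption of this sketch.

```c
#define D_PARTNONE	(-1)	/* caller wants the raw slice */
#define D_PARTWILD	(-2)	/* caller lets the loader pick */

/*
 * Decide which partition to open inside an MBR slice. Only a wildcard
 * request in a FreeBSD-type slice is turned into a BSD label lookup;
 * D_PARTNONE is passed through so the raw slice is opened.
 */
static int
resolve_partition(int slice_is_freebsd, int partition)
{
	if (slice_is_freebsd && partition == D_PARTWILD)
		return (0);	/* assumed default: first BSD partition */
	return (partition);
}
```

Comparing against D_PARTNONE at this point instead would send raw-slice
requests into the BSD label code, which is the behavior being fixed.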
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20501