Port illumos change: https://www.illumos.org/issues/11667
Move lz4.c out of zfs tree to opensolaris/common/lz4, adjust it to be
usable from kernel/stand/userland builds, so we can use just one single
source. Add lz4.h to declare lz4_compress() and lz4_decompress().
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D22037
In case of efi console having serial backend (video + serial or only serial),
we need to stick with old emulator till we can draw console.
Eventually we would need to get console terminal emulator to be removed
from serial console because the serial link already has the terminal.
However, we need to implement comconsole on all efi platforms first, then
we need the ability to draw console, so we do not have to use SimpleTextOutput
protocol (which will write both on video and serial in case of multiplexed
ComOut).
Differential Revision: https://reviews.freebsd.org/D22161
Actual modules get require()'d in, rather than try_include(). All instances
of try_include should be provided with proper hooks/API in the rest of
loader to do the work they need to do, since we can't rely on them to exist.
Convert this now to lfs + dofile since we won't really be treating them as
modules.
lfs is required because dofile will properly throw an error if the file
doesn't exist, which is not in the spirit of 'optionally included'.
Getting out of the pcall game allows us to provide a loader.exit() style
call that backs out to the common bits of loader (autoboot sequence unless
disabled with a loader.setenv("autoboot_delay", "NO")). The most ideal way
identified so far to implement loader.exit() is to throw a special
abort-style error that indicates to the caller in interp_lua that we've not
actually errored out, just continue execution. Otherwise, we have to hack in
logic to bubble up and return from loader.lua without continuing further,
which gets kind of ugly depending on the context in which we're aborting.
A compat shim is provided temporarily in case the executing loader doesn't
yet have loader.lua_path, which was just added in r354246.
As described previously, loader.lua_path is absolute path where scripts are
installed. A future commit will use this to build paths for dofile in
try_include, rather than the current pcall/require setup that makes it more
difficult to coordinate loader aborts from local.lua -- we do not need the
flexibility of require(), and local.lua is in-fact not a 'module-like' file
as we will not be referencing anything from it.
Multiple places coordinate to 'know' where lua scripts are installed. Knock
this down to being formally defined (and overridable) in exactly one spot,
defs.mk, and spread the knowledge to loaders and liblua alike. A future
commit will expose this to lua as loader.lua_path, so it can build absolute
paths to lua scripts as needed.
MFC after: 1 week
- Optimize enqueue for two task priority values by adding new tq_hint
field, pointing to the last task inserted into the middle of the list.
In case of more then two priority values it should halve average search.
- Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying few shared cache lines per task introduce different
mechanism to drain active tasks, based on task sequence number counter,
that uses only cache lines already present in cache. Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
- Move static and dynamic struct taskqueue fields into different cache
lines. Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
- While there, correct some TQ_SLEEP() wait messages.
This change fixes certain ZFS write workloads, causing huge congestion
on taskqueue lock. Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate deep queue of small tasks for taskqueue to sort.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Current implementation of ppcfbsd_pc_in_sigtramp() seems to take only 32-bit
PowerPC in account, as on 64-bit PowerPC most kernel instruction addresses will
be wrongly reported as in sigtramp.
This change adds proper sigtramp detection for PPC64.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22199
mdmfs(8) lacks the ability to populate throwaway memory filesystems from an
existing directory.
This features permits an interesting setup where /var for instance lives on
a device where wear-leveling is something you want to avoid as much as
possible and nonetheless you don't want to lose your logs, ports metadata,
etc. Here are the steps:
1. Copy /var to /var.bak;
2. Mount an mfs into /var using -k /var.bak at startup;
3. Synchronize /var to /var.bak weekly and on shutdown.
Note that this more or less mimics OpenBSD's mount_mfs(8) -P flag.
PR: 146254
Submitted by: jlh (many moons ago)
MFC after: 1 week
It's possible, with per-CPU mappings, for TLB1 indices to get out of sync.
This presents a problem when trying to insert an entry into TLB1 of all
CPUs. Currently that's done by assuming (hoping) that the TLBs are
perfectly synced, and inserting to the same index for all CPUs. However,
with aforementioned private mappings, this can result in overwriting
mappings on the other CPUs.
An example:
CPU0 CPU1
<setup all mappings> <idle>
3 private mappings
kick off CPU 1
initialize shared mappings (3 indices low)
Load kernel module, triggers 20 new mappings
Sync mappings at N-3
initialize 3 private mappings.
At this point, CPU 1 has all the correct mappings, while CPU 0 is missing 3
mappings that were shared across to CPU 1. When CPU 0 tries to access
memory in one of the overwritten mappings, it hangs while tripping through
the TLB miss handler. Device mappings are not stored in any page table.
This fixes by introducing a '-1' index for tlb1_write_entry_int(), so each
CPU searches for an available index private to itself.
MFC after: 3 weeks
In nearly all cases, the caller has a uintptr_t compatible argument so
this eliminates a large number of casts.
Add a print_pointer function to centralize printing pointers.
Reviewed by: jhb
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22212
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap
Reviewed by: hrs, bcr, lwhsu, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22146
bzero the entire thrmisc struct, not just the padding. Other core dump
notes are already done this way.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Replace explicit TARGET_* variables with COMPAT_* versions defined based
on where the file is being included.
Also, require that bsd.compat.mk be included directly. It's not going to
be widely used so always loading it in bsd.prog.mk doesn't make sense.
Instead users can include it directly.
Reviewed by: imp, bdrewery (prior revision)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22059
In this release the netmap support was introduced.
Moreover, it is also now possible to use the LLQ mode of the driver on
the arm64 AWS instances (A1 type).
Differential Revision: https://reviews.freebsd.org/D21938
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
In NETMAP mode not all queues need to be allocated to NETMAP. Some of
them could be left to the kernel. Configuration is managed by the flags
nr_mode and nr_pending_mode provided per each NETMAP kring.
ENA driver checks those flags and perform proper rings initialization.
Differential Revision: https://reviews.freebsd.org/D21937
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Two new tables are added to ena_tx_buffer structure:
* netmap_map_seg stores DMA mapping structures,
* netmap_buf_idx stores buff indexes taken from the slots.
When Tx resources are being set, the new mapping structures are created
and netmap Tx rings are being reset.
When Tx resources are being released, used netmap bufs are unmapped from
DMA and then mapping structures are destroyed.
When Tx interrupt occurrs, ena_netmap_tx_irq is called.
ena_netmap_txsync callback signalizes that there are new packets which
should be transmitted.
First, it fills ena_netmap_ctx. Then it performs two actions:
* ena_netmap_tx_frames moves packets from netmap ring to NIC,
* ena_netmap_tx_cleanup restores buffers from NIC and gives them back
to the userspace app.
0 is returned in case of Tx error that could be handled by the driver.
ena_netmap_tx_frames checks if there are packets ready for transmission.
Then, for each of them, ena_netmap_tx_frame is called. If error occurs,
transmitting is stopped, but if the error was cause due to HW ring being
full, information about that is not propagated to the userspace app.
When all packets are ready, doorbell is written to NIC and netmap ring
state is updated.
Parsing of one packet is done by the ena_netmap_tx_frame function.
First, it checks if number of slots does not exceed NIC limit. Invalid
packets are being dropped and the error is propagated to the upper
layer. As each netmap buffer has equal size, which is typically greater
then 2KiB, there shouldn't be any packets which contain too many slots.
Then, the ena_com_tx_ctx structure is being filled. As netmap does not
support any hardware offloads, ena_com_tx_meta structure is set to zero.
After that, ena_netmap_map_slots maps all memory slots for DMA.
If the device works in the LLQ mode, the push header is being determined
by checking if the header fits within the first socket.
If so, the portion of data is being copied directly from the slot.
In other case, the data is copied to the intermediate buffer.
First slots are treated the same as as the others, because DMA mapping
has no impact on LLQ mode. Index of each netmap buffer is taken from
slot and stored in netmap_buf_idx array. In case of mapping error,
memory is unmapped and packets are put back to the netmap ring.
ena_netmap_tx_cleanup performs out of order cleanup of sent buffers.
First, req_id is taken and is validated. As validate_tx_req_id from
ena.c is specific to kernels mbuf, another implementation is provided.
Each req_id is cleaned up by ena_netmap_tx_clean_one function. Buffers
are being unmaped from DMA and put back to netmap ring. In the end,
state of netmap and NIC rings are being updated.
Differential Revision: https://reviews.freebsd.org/D21936
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Most of code used for Rx ring initialization could be reused in NETMAP.
Reset of NETMAP ring and new alloc method was added. Driver decides if
use kernels mbufs or NETMAPs slots based on IFCAP_NETMAP flag. It
allows to reuse ena_refill_rx_bufs, which provides proper handling of
Rx out of order completion.
ena_netmap_alloc_rx_slot takes exactly the same arguments as
ena_alloc_rx_mbuf, but instead of allocating one mbuf it takes one slot
from NETMAP ring. Based on queue id proper netmap_ring is found. As
NETMAP provides the "partial opening" feature not all of the rings are
avaiable. Not used points to invalid ring. If there is available slot,
it is taken from the ring. Its buffer is mapped to DMA and its index is
stored in ena_rx_buffer field in ena_rx_buffer structure. Then ena_buf
is filled with addresses and ring state is updated.
Cleanup is handled by ena_netmap_free_rx_slot. It unmaps DMA and returns
buffer to ring. As we could not return more bufs than we have taken and
we should not override occupied slots, buf_index should be 0. It is
being checked by assertion.
ena_netmap_rxsync callback puts received packets back to NETMAP ring and
passes them to user space by updating ring pointers. First it fills
ena_netmap_ctx.
Then it performs two actions:
* ena_netmap_rx_frames moves received frames from NIC to NETMAP ring,
* ena_netmap_rx_cleanup fills NIC ring with slots released by userspace
app.
In case of Rx error that could be handled by NIC driver (for example by
performing reset) rx sync should return 0.
ena_netmap_rx_frames first checks if NETMAP ring is in consistent
state and then in the loop receives new frames. When all available
frames are taken nr_hwtail is updated.
Receiving one frame is handled by ena_netmap_rx_frame. If no error
occurrs, each Descriptor is loaded by ena_netmap_rx_load_desc function.
If packets take more than one segments NS_MOREFRAG flag must be set in
all, but not last slot. In case of wrong req_id packet is removed from
NETMAP ring. If packet is successful received counters are updated.
Refiling of NIC ring is performed by ena_netmap_rx_cleanup function.
It calculates number of available slots and call ena_refill_rx_bufs with
proper number.
Differential Revision: https://reviews.freebsd.org/D21935
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Mock implementation of NETMAP routines is located in ena_netmap.c/.h
files. All code is protected under the DEV_NETMAP macro. Makefile was
updated with files and flag.
As ENA driver provide own implementations of (un)likely it must be
undefined before including NETMAP headers.
ena_netmap_attach function is called on the end of NIC attach. It fills
structure with NIC configuration and callbacks. Then provides it to
netmap_attach. Similarly netmap_detach is called during ena_detach.
Three callbacks are used.
nm_register is implemented by ena_netmap_reg. It is called when user
space application open or close NIC in NETMAP mode. Current action is
recognized based on onoff parameter: true means on and false off. As
NICs rings need to be reconfigured ena_down and ena_up are reused.
When user space application wants to receive new packets from NIC
nm_rxsync is called, and when there are new packets ready for Tx
nm_txsync is called.
Differential Revision: https://reviews.freebsd.org/D21934
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Move Rx/Tx routines to separate file.
Some functions:
* ena_restore_device,
* ena_destroy_device,
* ena_up,
* ena_down,
* ena_refill_rx_bufs
could be reused in upcoming netmap code in the driver. To make it
possible, they were moved to ena.h header.
Differential Revision: https://reviews.freebsd.org/D21933
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
When the ENA_FLAG_DEVICE_RUNNING flag is disabled, the AENQ handlers
aren't executed. To fix that, the watchdog timestamp should be updated
just before enabling the watchdog.
Timer service was always being enabled, even if the device wasn't up
before the reset. That shouldn't happen, as the timer service is being
executed only for working interface.
Differential Revision: https://reviews.freebsd.org/D21932
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
As the pmamp_change_attr() is public on arm64 since r351131, it can be
used on the arm64 to map memory range as with the write combined
attribute.
It requires the driver to use generic VM_MEMATTR_WRITE_COMBINING flag
instead of the x86 specific PAT_WRITE_COMBINING.
Differential Revision: https://reviews.freebsd.org/D21931
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Altough in the comment above the pmap_change_attr() it was mentioned
that VA could be in KV or DMAP memory space. However,
pmap_change_attr_locked() was accepting only the values inside the DMAP
memory range.
To fix that, the condition check was changed so also the va inside the
KV memory range would be accepted.
The sample use case that wasn't supported is the PCI Device that has the
BAR which should me mapped with the Write Combine attribute - for
example BAR2 of the ENA network controller on the A1 instances on AWS.
Tested on A1 AWS instance and changed ENA BAR2 mapped resource to be
write-combined memory region.
Differential Revision: https://reviews.freebsd.org/D22055
MFC after: 2 weeks
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
There was a couple issues with GDB machdep code for PPC/PPC64, the main ones being:
- wrong register sizes being returned
- pcb_context index was wrong (this affects all PPC variants)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22201
In some scenarios, the 4K trapstk may overflow, corrupting tmpstk.
This was observed during remote debugging, with the following steps:
At remote host (R):
- enter kdb during boot
- switch to gdb backend
At local host (L):
- attach gdb to R
- try to read an invalid memory position
At R:
- a DSI trap occurs and kdb restarts (all this occurs on trapstk)
- while printing the stacktrace, trapstk overflows and corrupts tmpstk
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22200
First, SCL low timeout is set to 25 milliseconds by default as opposed
to 1 millisecond before. The new value is based on the SMBus
specification. The timeout can be changed on a per bus basis using
dev.iicbb.N.scl_low_timeout sysctl.
The driver uses DELAY to wait for high SCL up to 1 millisecond, then it
switches to pause_sbt(SBT_1MS) for the rest of the timeout.
While here I made a number of other changes. 'udelay' that's used for
timing clock and data signals is now calculated based on the requested
bus frequency (dev.iicbus.N.frequency) instead of being hardcoded to 10
microseconds. The calculations are done in such a fashion that the
default bus frequency of 100000 is converted to udelay of 10 us. This
is for backward compatibility. The actual frequency will be less than a
quarter (I think) of the requested frequency.
Also, I added detection of stuck low SCL in a few places. Previously,
the code would just carry on after the SCL low timeout and that might
potentially lead to misinterpreted bits.
Finally, I fixed several style issues near the code that I changed.
Many more are still remaining.
Tested by accessing HTU21 temperature and humidity sensor in this setup:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
gpio1: <Nuvoton GPIO controller> at GPIO ldn 0x07 on superio0
pcib0: allocated type 4 (0x220-0x226) for rid 0 of gpio1
gpiobus1: <GPIO bus> on gpio1
gpioiic0: <GPIO I2C bit-banging driver> at pins 14-15 on gpiobus1
gpioiic0: SCL pin: 14, SDA pin: 15
iicbb0: <I2C bit-banging driver> on gpioiic0
iicbus0: <Philips I2C bus> on iicbb0 master-only
iic0: <I2C generic I/O> on iicbus0
Discussed with: ian, imp
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D22109
From Jake:
The iflib stack failed to release all of the memory allocated under
M_IFLIB during device detach.
Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx
DMA info were not being released.
Release this memory so that iflib won't leak memory when a device
detaches.
Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we
need to call this only after iflib_dma_free in iflib_tx_structures_free.
Additionally, also ensure that we destroy the callout mutex associated
with each Tx queue when we free it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D22157
Using cam_sim_alloc_dev() allows to properly set sim_dev field so that
sdiob(4) can attach to the CAM device that represents SDIO card.
The same change for SDHCI driver happened in r348800.
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D22192
This is in preparation for adding the corresponding support to iwm(4).
Version 46 is the latest but contains unrecognized TLVs, so use version
43 for now.
Obtained from: linux-firmware
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Arm64 allows us to create execute only mappings. To make sure userspace is
unable to accidentally execute kernel code set the user execute never
bit in the kernel page tables.
MFC after: 1 week
Sponsored by: DARPA, AFRL
r351049 bogusly deleted these lines from files.amd64 but failed to add them to
files.x86. Since this works on i386, add them to files.x86 rather than just
adding them back to files.amd64.
PR: 240734
Reported by: Michael Pro
Summary:
ARM64 currently treats all data abort exceptions as page faults. This
can cause infinite loops on non-page fault faults, such as alignment faults.
Since kernel-side alignment faults should be avoided, this adds support directly
to the el0 fault handler, instead of the data_abort() handler.
Test Plan: Tested on rpi3, with a misaligned ldm test.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D22133
I limited potentially infinite timings by 960 us based on a footnote on
page 38 of Maxim Integrated Application Note 937, Book of iButton
Standards: "In order not to mask interrupt signalling by other devices
on the 1–Wire bus, tRSTL + tR should always be less than 960 us."
MFC after: 3 weeks