Implement the MMC/SD/SDIO protocol within a CAM framework. CAM's
flexible queueing will make it easier to write non-storage drivers
than the legacy stack. SDIO drivers from both the kernel and as
userland daemons are possible, though much of that functionality will
come later.
Some of the CAM integration isn't complete (there are sleeps in the
device probe state machine, for example), but those minor issues can
be improved in-tree more easily than out of tree and shouldn't gate
progress on other fronts. Appologies to reviews if specific items
have been overlooked.
Submitted by: Ilya Bakulin
Reviewed by: emaste, imp, mav, adrian, ian
Differential Review: https://reviews.freebsd.org/D4761
merge with first commit, various compile hacks.
Includes:
- Support for X550EM devices.
- Support for Bypass adapters.
- Flow Director code moved to separate files
- SR-IOV code moved to separate files
- Netmap code moved to separate files
Differential Revision: https://reviews.freebsd.org/D11232
Submitted by: Jeb Cramer <cramerj@intel.com>
Reviewed by: erj@
Tested by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Sponsored by: Intel Corporation
illumos/illumos-gate@770499e185770499e185https://www.illumos.org/issues/8021
The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
community) changes the way the ARC allocates `b_pdata` memory from using linear
`void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
improves ZFS's performance by helping to defragment the address space occupied
by the ARC, in particular for cases where compressed ARC is enabled. It could
also ease future work to allocate pages directly from `segkpm` for minimal-
overhead memory allocations, bypassing the `kmem` subsystem.
This is essentially the same change as the one which recently landed in ZFS on
Linux, although they made some platform-specific changes while adapting this
work to their codebase:
1. Implemented the equivalent of the `segkpm` suggestion for future work
mentioned above to bypass issues that they've had with the Linux kernel memory
allocator.
2. Changed the internal representation of the ABD's scatter/gather list so it
could be used to pass I/O directly into Linux block device drivers. (This
feature is not available in the illumos block device interface yet.)
FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
mapped into the KVA unless explicitly requested, especially on
platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
in the original ABD
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>
MFC after: 3 weeks
From the linux tune2fs(8) manpage:
"Allow the kernel to initialize bitmaps and inode tables and keep a high
watermark for the unused inodes in a filesystem, to reduce e2fsck(8) time.
This first e2fsck run after enabling this feature will take the full time,
but subsequent e2fsck runs will take only a fraction of the original time,
depending on how full the file system is."
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11211
system retrieve its config data from the fdt data.
The properties that are common to all phys are decoded and returned in a
structure. The fdt node handles for the mac and phy devices are also
returned in the config data struct, so a driver can easily obtain additional
hardware-specific config values from the fdt data.
In particular:
- Don't evaluate event conditions with a sleepqueue lock held, since such
code may attempt to acquire arbitrary locks.
- Fix the return value for wait_event_interruptible() in the case that the
wait is interrupted by a signal.
- Implement wait_on_bit_timeout() and wait_on_atomic_t().
- Implement some functions used to test for pending signals.
- Implement a number of wait_event_*() variants and unify the existing
implementations.
- Unify the mechanism used by wait_event_*() and schedule() to put the
calling thread to sleep.
This is required to support updated DRM drivers. Thanks to hselasky for
finding and fixing a number of bugs in the original revision.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10986
Its purpose was to translate the values for msdosfs inode numbers,
which is calculated from the msdosfs structures describing the file,
into the range representable by 32bit ino_t. The translation acted
for filesystems larger than 128Gb, it reserved the range 0xf0000000
(FILENO_FIRST_DYN) to UINT32_MAX and remembered some arbitrary
translation of ino >= FILENO_FIRST_DYN into this range. It consumed
memory that could be only freed by unmount, and the translation was
not stable across remounts.
With ino_t type extended to 64 bit, there is no such issue and values
can be returned without compaction to 32bit. That is, for the native
environments, the translation layer is not necessary and adds
significant undeserved code complexity. For compat ABIs which use
32bit ino_t, the vfs.ino64_trunc_error sysctl provides some measures
to soften the failure mode when inode numbers truncation is not safe.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
* This change also fixes a possible issue in the existing smart-fifo code,
which set the IWM_SF_CFG_DUMMY_NOTIF_OFF bit on AC8260 chipsets, although
that's only used in iwlwifi for Family 8000 chipsets connected via SDIO
interface.
Obtained from: Dragonflybsd.git cb650b01526b0aeef3c4307d926e7f1428997d50
This is closely tied to the Extended Attribute implementation.
Submitted by: Fedor Uporov
Reviewed by: kevlo, pfg
Differential Revision: https://reviews.freebsd.org/D10807
The latest firmware has a number of link related fixes, support for a
new custom card, and the fix for a bug that affected rate limiting on
FreeBSD.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
ENA is a networking interface designed to make good use of modern CPU
features and system architectures.
The ENA device exposes a lightweight management interface with a
minimal set of memory mapped registers and extendable command set
through an Admin Queue.
The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extendable feature set.
Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.
ENA devices enable high speed and low overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, and CPU cacheline optimized
data placement.
The ENA driver supports industry standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).
Receive-side scaling (RSS) is supported for multi-core scaling.
The ENA driver and its corresponding devices implement health
monitoring mechanisms such as watchdog, enabling the device and driver
to recover in a manner transparent to the application, as well as
debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds. This feature will
be implemented for driver in future releases.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Jakub Palider <jpa@semihalf.com>
Jan Medala <jan@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10427
The ccr(4) driver supports use of the crypto accelerator engine on
Chelsio T6 NICs in "lookaside" mode via the opencrypto framework.
Currently, the driver supports AES-CBC, AES-CTR, AES-GCM, and AES-XTS
cipher algorithms as well as the SHA1-HMAC, SHA2-256-HMAC, SHA2-384-HMAC,
and SHA2-512-HMAC authentication algorithms. The driver also supports
chaining one of AES-CBC, AES-CTR, or AES-XTS with an authentication
algorithm for encrypt-then-authenticate operations.
Note that this driver is still under active development and testing and
may not yet be ready for production use. It does pass the tests in
tests/sys/opencrypto with the exception that the AES-GCM implementation
in the driver does not yet support requests with a zero byte payload.
To use this driver currently, the "uwire" configuration must be used
along with explicitly enabling support for lookaside crypto capabilities
in the cxgbe(4) driver. These can be done by setting the following
tunables before loading the cxgbe(4) driver:
hw.cxgbe.config_file=uwire
hw.cxgbe.cryptocaps_allowed=-1
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D10763
* This adds iwm_mvm_rm_sta(), which will be used to tear down firmware
state for better/cleaner iwm_newstate() handling.
* Makes iwm_enable_txq() and iwm_mvm_flush_tx_path() non-static, add
the declarations to if_iwm_util.h for now.
Obtained from: dragonflybsd.git 85d1c6190c4c3564b1a347f253e823aa95c202b2
For GEN1 Hyper-V, vmbus is attached to pcib0, which contains the
resources for PCI passthrough and SR-IOV. There is no
acpi_syscontainer0 on GEN1 Hyper-V.
For GEN2 Hyper-V, vmbus is attached to acpi_syscontainer0, which
contains the resources for PCI passthrough and SR-IOV. There is
no pcib0 on GEN2 Hyper-V.
The ACPI VMBUS device now only holds its _CRS, which is empty as
of this commit; its existence is mainly for upward compatibility.
Device tree structure is suggested by jhb@.
Tested-by: dexuan@
Collabrated-wth: dexuan@
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D10565
- Create a new file, t4_sched.c, and move all of the code related to
traffic management from t4_main.c and t4_sge.c to this file.
- Track both Channel Rate Limiter (ch_rl) and Class Rate Limiter (cl_rl)
parameters in the PF driver.
- Initialize all the cl_rl limiters with somewhat arbitrary default
rates and provide routines to update them on the fly.
- Provide routines to reserve and release traffic classes.
MFC after: 1 month
Sponsored by: Chelsio Communications
patm(4) devices.
Maintaining an address family and framework has real costs when we make
infrastructure improvements. In the case of NATM we support no devices
manufactured in the last 20 years and some will not even work in modern
motherboards (some newer devices that patm(4) could be updated to
support apparently exist, but we do not currently have support).
With this change, support remains for some netgraph modules that don't
require NATM support code. It is unclear if all these should remain,
though ng_atmllc certainly stands alone.
Note well: FreeBSD 11 supports NATM and will continue to do so until at
least September 30, 2021. Improvements to the code in FreeBSD 11 are
certainly welcome.
Reviewed by: philip
Approved by: harti
numbers with Chacha20. Keep the API, though, as that is what the
other *BSD's have done.
Use the boot-time entropy stash (if present) to bootstrap the
in-kernel entropy source.
Reviewed by: delphij,rwatson
Approved by: so(delphij)
MFC after: 2 months
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D10048
retaining various utility functions used during BSM generation,
and a second (audit_bsm_db.c) that contains the various in-kernel
databases supporting various audit activities (the class and
event-name tables).
(No functional change is intended.)
Obtained from: TrustedBSD Project
MFC after: 3 weeks
Sponsored by: DARPA, AFRL
The module is designed for modification of a packets of any protocols.
For now it implements only TCP MSS modification. It adds the external
action handler for "tcp-setmss" action.
A rule with tcp-setmss action does additional check for protocol and
TCP flags. If SYN flag is present, it parses TCP options and modifies
MSS option if its value is greater than configured value in the rule.
Then it adjustes TCP checksum if needed. After handling the search
continues with the next rule.
Obtained from: Yandex LLC
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Yandex LLC
No objection from: #network
Differential Revision: https://reviews.freebsd.org/D10150
The goal of this work is to remove the explicit dependency for ctl(4)
on iscsi(4), so end-users without iscsi(4) support in the kernel can
use ctl(4) for its other functions.
This allows those without iscsi(4) support built into the kernel to use
ctl(4) as a test mechanism. As a sidenote, this was possible around the
10.0-RELEASE period, but made impossible for end-users without iscsi(4)
between 10.0-RELEASE and 11.0-RELEASE.
Automatically load cfiscsi(4) from ctladm(8) and ctld(8) for backwards
compatibility with previously releases. The automatic loading feature is
compiled into the beforementioned tools if MK_ISCSI == yes when building
world.
Add a manpage for cfiscsi(4) and refer to it in ctl(4).
Differential Revision: D10099
MFC after: 2 months
Relnotes: yes
Reviewed by: mav, trasz
Sponsored by: Dell EMC Isilon
instrument security event auditing rather than relying on conventional BSM
trail files or audit pipes:
- Add a set of per-event 'commit' probes, which provide access to
particular auditable events at the time of commit in system-call return.
These probes gain access to audit data via the in-kernel audit_record
data structure, providing convenient access to system-call arguments and
return values in a single probe.
- Add a set of per-event 'bsm' probes, which provide access to particular
auditable events at the time of BSM record generation in the audit
worker thread. These probes have access to the in-kernel audit_record
data structure and BSM representation as would be written to a trail
file or audit pipe -- i.e., asynchronously in the audit worker thread.
DTrace probe arguments consist of the name of the audit event (to support
future mechanisms of instrumenting multiple events via a single probe --
e.g., using classes), a pointer to the in-kernel audit record, and an
optional pointer to the BSM data and its length. For human convenience,
upper-case audit event names (AUE_...) are converted to lower case in
DTrace.
DTrace scripts can now cause additional audit-based data to be collected
on system calls, and inspect internal and BSM representations of the data.
They do not affect data captured in the audit trail or audit pipes
configured in the system. auditd(8) must be configured and running in
order to provide a database of event information, as well as other audit
configuration parameters (e.g., to capture command-line arguments or
environmental variables) for the provider to operate.
Reviewed by: gnn, jonathan, markj
Sponsored by: DARPA, AFRL
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D10149
the default partition, eMMC v4.41 and later devices can additionally
provide up to:
1 enhanced user data area partition
2 boot partitions
1 RPMB (Replay Protected Memory Block) partition
4 general purpose partitions (optionally with a enhanced or extended
attribute)
Of these "partitions", only the enhanced user data area one actually
slices the user data area partition and, thus, gets handled with the
help of geom_flashmap(4). The other types of partitions have address
space independent from the default partition and need to be switched
to via CMD6 (SWITCH), i. e. constitute a set of additional "disks".
The second kind of these "partitions" doesn't fit that well into the
design of mmc(4) and mmcsd(4). I've decided to let mmcsd(4) hook all
of these "partitions" up as disk(9)'s (except for the RPMB partition
as it didn't seem to make much sense to be able to put a file-system
there and may require authentication; therefore, RPMB partitions are
solely accessible via the newly added IOCTL interface currently; see
also below). This approach for one resulted in cleaner code. Second,
it retains the notion of mmcsd(4) children corresponding to a single
physical device each. With the addition of some layering violations,
it also would have been possible for mmc(4) to add separate mmcsd(4)
instances with one disk each for all of these "partitions", however.
Still, both mmc(4) and mmcsd(4) share some common code now e. g. for
issuing CMD6, which has been factored out into mmc_subr.c.
Besides simply subdividing eMMC devices, some Intel NUCs having UEFI
code in the boot partitions etc., another use case for the partition
support is the activation of pseudo-SLC mode, which manufacturers of
eMMC chips typically associate with the enhanced user data area and/
or the enhanced attribute of general purpose partitions.
CAVEAT EMPTOR: Partitioning eMMC devices is a one-time operation.
- Now that properly issuing CMD6 is crucial (so data isn't written to
the wrong partition for example), make a step into the direction of
correctly handling the timeout for these commands in the MMC layer.
Also, do a SEND_STATUS when CMD6 is invoked with an R1B response as
recommended by relevant specifications. However, quite some work is
left to be done in this regard; all other R1B-type commands done by
the MMC layer also should be followed by a SEND_STATUS (CMD13), the
erase timeout calculations/handling as documented in specifications
are entirely ignored so far, the MMC layer doesn't provide timeouts
applicable up to the bridge drivers and at least sdhci(4) currently
is hardcoding 1 s as timeout for all command types unconditionally.
Let alone already available return codes often not being checked in
the MMC layer ...
- Add an IOCTL interface to mmcsd(4); this is sufficiently compatible
with Linux so that the GNU mmc-utils can be ported to and used with
FreeBSD (note that due to the remaining deficiencies outlined above
SANITIZE operations issued by/with `mmc` currently most likely will
fail). These latter will be added to ports as sysutils/mmc-utils in
a bit. Among others, the `mmc` tool of the GNU mmc-utils allows for
partitioning eMMC devices (tested working).
- For devices following the eMMC specification v4.41 or later, year 0
is 2013 rather than 1997; so correct this for assembling the device
ID string properly.
- Let mmcsd.ko depend on mmc.ko. Additionally, bump MMC_VERSION as at
least for some of the above a matching pair is required.
- In the ACPI front-end of sdhci(4) describe the Intel eMMC and SDXC
controllers as such in order to match the PCI one.
Additionally, in the entry for the 80860F14 SDXC controller remove
the eMMC-only SDHCI_QUIRK_INTEL_POWER_UP_RESET.
OKed by: imp
Submitted by: ian (mmc_switch_status() implementation)
When locking a mutex and deadlock is detected the first mutex lock
call that sees the deadlock will return -EDEADLK .
MFC after: 1 week
Sponsored by: Mellanox Technologies
Put large functions into linux_slab.c instead of declaring them static
inline.
Add support for more memory allocation wrappers like kmalloc_array()
and __vmalloc().
Make sure either the M_WAITOK or the M_NOWAIT flag is set and mask
away unused memory allocation flags before calling FreeBSD's malloc()
routine.
Move kmalloc_node() definition to slab.h where it belongs.
Implement support for the SLAB_DESTROY_BY_RCU feature when creating a
kmem_cache which basically means kmem_cache memory is freed using
call_rcu().
MFC after: 1 week
Sponsored by: Mellanox Technologies
This change makes the workqueue implementation behave more like in
Linux, both functionality wise and structure wise.
All workqueue code has been moved to linux_work.c
Add an atomic based statemachine to the work_struct to ensure proper
operation. Prior to this change struct_work was directly mapped to a
FreeBSD task. When a taskqueue has multiple threads the same task may
end up being executed on more than one worker thread simultaneously.
This might cause problems with code coming from Linux, which expects
serial behaviour, similar to Linux tasklets.
Move all global workqueue function names into the linux_xxx domain to
avoid symbol name clashes in the future.
Implement a few more workqueue related functions and macros.
Create two multithreaded taskqueues for the LinuxKPI during module
load, one for time-consuming callbacks and one for non-time consuming
callbacks.
MFC after: 1 week
Sponsored by: Mellanox Technologies
This interface has no in-tree consumers and has been more or less
non-functional for several releases.
Remove manpage note that the procfs special file 'mem' is grouped to
kmem. This hasn't been true since r81107.
Remove procfs' README file. It is an out of date duplication of the manpage
(quoth the README: "since the bsd kernel is single-processor...").
Reviewed by: vangyzen, bcr (manpage)
Approved by: des (procfs maintainer), vangyzen (mentor)
Differential Revision: https://reviews.freebsd.org/D9802
* Uses the IWM_FW_PAGING_BLOCK_CMD firmware command to tell the firmware
what memory ranges to use for paging.
Obtained from: dragonflybsd.git 8a5b199964f8e7bdb00039f0b48817a01b402f18
When allocating unmapped pages, take advantage of the direct map on
AMD64 to get the virtual address corresponding to a page. Else all
pages allocated must be mapped because sometimes the virtual address
of a page is requested.
Move all page allocation and deallocation code into an own C-file.
Add support for GFP_DMA32, GFP_KERNEL, GFP_ATOMIC and __GFP_ZERO
allocation flags.
Make a clear separation between mapped and unmapped allocations.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
* This is more similar to how code/definitions are distributed in
Linux's iwlwifi.
* This should make recognizing new chipset variants, and adding additional
flags from the Linux iwlwifi code easier, without blowing up if_iwm.c
Obtained from: dragonflybsd.git 27d11320e707d2c41424efc1983762f6799941d6
Tasklets are implemented using a taskqueue and a small statemachine on
top. The additional statemachine is required to ensure all LinuxKPI
tasklets get serialized. FreeBSD taskqueues do not guarantee
serialisation of its tasks, except when there is only one worker
thread configured.
MFC after: 1 week
Sponsored by: Mellanox Technologies
A set of helper functions have been added to manage the life of the
LinuxKPI task struct. When an external system call or task is invoked,
a check is made to create the task struct by demand. A thread
destructor callback is registered to free the task struct when a
thread exits to avoid memory leaks.
This change lays the ground for emulating the Linux kernel more
closely which is a dependency by the code using the LinuxKPI APIs.
Add new dedicated td_lkpi_task field has been added to struct thread
instead of abusing td_retval[1].
Fix some header file inclusions to make LINT kernel build properly
after this change.
Bump the __FreeBSD_version to force a rebuild of all kernel modules.
MFC after: 1 week
Sponsored by: Mellanox Technologies
for USB OTG-capable hardware to implement device side of USB
Mass Storage, ie pretend it's a flash drive. It's configured
in the same way as other CTL frontends, using ctladm(8)
or ctld(8). Differently from usfs(4), all the configuration
can be done without rebuilding the kernel.
Testing and review is welcome. Right now I'm still moving,
and I don't have access to my test environment, so I'm somewhat
reluctant to making larger changes to this code; on the other
hand I don't want to let it sit on Phab until my testing setup
is back, because I want to get it into 11.1-RELEASE.
Reviewed by: emaste (cursory), wblock (man page)
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8787
compile options. Remove doxygen pointers to now deleted files. Remove
EISA and VME as examples in bus_space.9.
Retained EISA mode code for IO PIC and MPTABLES because that's not
EISA bus, per se, and some people have abused EISA to mean "EISA-like
behavior as opposed to ISA" rather than using it for EISA add-in
cards.
Relnotes: yes
VesaLocalBus or EISA. Internally, EISA and ISA are handled the same,
with VL being handled slightly differently. To avoid too much code
churn, retain the EISA name, despite it being used only for ISA
bus. When it is on the ISA bus, weird gymnastics are required with
EISA-space address accesses as well. Remove known models from the ahc
man page. Remove ahc_eisa module.