support to FreeBSD. A full description of the overall functionality
being added is below. nvmexpress.org defines NVM Express as "an optimized
register interface, command set and feature set fo PCI Express (PCIe)-based
Solid-State Drives (SSDs)."
This commit adds nvme(4) and nvd(4) driver source code and Makefiles
to the tree.
Full NVMe functionality description:
Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe)
device support.
There will continue to be ongoing work on NVM Express support, but there
is more than enough to allow for evaluation of pre-production NVM Express
devices as well as soliciting feedback. Questions and feedback are welcome.
nvme(4) implements NVMe hardware abstraction and is a provider of NVMe
namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN.
nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks.
nvmecontrol(8) is used for NVMe configuration and management.
The following are currently supported:
nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue
dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeoutsee below)
- compilation switches for support back to stable-7
nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling
nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in
human-readable or hex format
- perftest: quick and dirty performance test to measure raw
performance of NVMe device without userspace/physio/GEOM
overhead
The following are still work in progress and will be completed over the
next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval
This has been primarily tested on amd64, with light testing on i386. I
would be happy to provide assistance to anyone interested in porting
this to other architectures, but am not currently planning to do this
work myself. Big-endian and dmamap sync for command/completion queues
are the main areas that would need to be addressed.
The nvme(4) driver currently has references to Chatham, which is an
Intel-developed prototype board which is not fully spec compliant.
These references will all be removed over time.
Sponsored by: Intel
Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>
- Use callout(9) rather than timeout(9).
- Add a mutex as an I/O lock that protects the adapter and is used
for the I/O path.
- Add an sx lock as a configuration lock that protects the relationship
of configured volumes.
- Freeze the request queue when a DMA load is deferred with EINPROGRESS
and unfreeze the queue when the DMA callback is invoked.
- Explicitly poll the hardware while waiting to submit a command to
allow completed commands to free up slots in the command ring.
- Remove driver-wide 'initted' variable from mlx_*_fw_handshake() routines.
That state should be per-controller instead. Add it as an argument
since the first caller knows when it is the first caller.
- Remove explicit bus_space tag/handle and use bus_*() rather than
bus_space_*().
- Move duplicated PCI device ID probing into a mlx_pci_match() routine.
- Don't check for PCIM_CMD_MEMEN (the PCI bus will enable that when
allocating the resource) and use pci_enable_busmaster() rather than
manipulating the register directly.
Tested by: no one despite multiple requests (hope it works)
an NVidia Tegra 2 CPU.
Tegra 2 needs an external patch to pmap for atomic operations to work. Even
with this the Kernel only gets to the mount root prompt. As such Tegra
support is considered experimental, however adding the kernel config will
help ensure the Tegra code builds.
These are intended for software TX filtering support, where the NIC
decides there has been too many successive failues to a destination
and will filter it.
Although the filtering is done per-destination (via the keycache),
the state and queue is kept per-TID for now. It simplifies the overall
architecture design and locking.
Whilst here, add ATH_TID_UNLOCK_ASSERT().
* Don't treat high percentage failures as "sucessive failures" - high
MCS rates are very picky and will quite happily "fade" from low
to high failure % and back again within a few seconds. If they really
don't work, the aggregate will just plain fail.
* Only sample MCS rates +/- 3 from the current MCS. Sample will back off
quite quickly, so there's no need to sample _all_ MCS rates between
a high MCS rate and MCS0; there may be a lot of them.
* Modify the smoothing rate to be 75% rather than 95% - it's more adaptive
but it comes with a cost of being slightly less stable at times.
A per-node, hysterisis behaviour would be nicer.
such that when commenting/uncommentting lines, horizontal spacing is
maintained...
Also fix some minor comment formatting to line things up, etc...
Reviewed by: gnn, imp
MFC after: 1 week
(download microcode with offsets, save, and activate).
SATI translation layer was incorrectly using allocation length instead
of blocks, and was constructing the ATA command incorrectly.
Also change #define to specify that the 512 block size here is
specific for DOWNLOAD_MICROCODE, and does not relate to the device's
logical block size.
Submitted by: scottl (with small modifications)
MFC after: 3 days
reside, and move there ipfw(4) and pf(4).
o Move most modified parts of pf out of contrib.
Actual movements:
sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5
sys/netinet/ipfw -> sys/netpfil/ipfw
The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.
Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.
The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.
Discussed with: bz, luigi
MSI are implemented via Inbound Shared Doorbell 1 interrupts. Interrupts
are triggered by writing to Software Triggered Interrupt registeri (PCIe
card using physical address of this register in BAR0 space). There are 32
interrupts available. It can be increased by using Doorbell 2 and
Doorbell 3 registers to 96 interrupts.
Obtained from: Marvell, Semihalf
MSI are implemented via software interrupt. PCIe cards will write
into software interrupt register which will cause inbound shared
interrupt which will be interpreted as a MSI.
Obtained from: Marvell, Semihalf
- Add functions to calculate clocks instead using hardcoded values
- Update reset and timers functions
- Update number of interrupts
- Change name of platform from db88f78100 to db78460
- Correct DRAM size and PCI IRQ routing in dts file.
Obtained from: Semihalf
trap checks (eg. printtrap()).
Generally this check is not needed anymore, as there is not a legitimate
case where curthread != NULL, after pcpu 0 area has been properly
initialized.
Reviewed by: bde, jhb
MFC after: 1 week
- Add constants for the rest of the fields in the PCI-express device
capability and control registers.
- Tweak some of the recently added PCI-e capability constants (always
use hex for offsets in config space, and include a shortened
version of the relevant register in the name of field constants).
MFC after: 1 week
I'm not sure where in the deep, distant past I found the AR_PHY_MODE
registers for half/quarter rate mode, but unfortunately that doesn't
seem to work "right" for non-AR9280 chips.
Specifically:
* don't touch AR_PHY_MODE
* set the PLL bits when configuring half/quarter rate
I've verified this on the AR9280 (5ghz fast clock) and the AR5416.
The AR9280 works in both half/quarter rate; the AR5416 unfortunately
only currently works at half rate. It fails to calibrate on quarter rate.
type of compiler is being used (currently clang or gcc). COMPILER_TYPE
is set in the new bsd.compiler.mk file based on the value of the CC
variable or, should it prove informative, by running ${CC} --version
and examining the output.
To avoid negative performance impacts in the default case and correct
value for COMPILER_TYPE type is determined and passed in the environment
of submake instances while building world.
Replace adhoc attempts at determining the compiler type by examining
CC or MK_CLANG_IS_CC with checks of COMPILER_TYPE. This eliminates
bootstrapping complications when first setting WITH_CLANG_IS_CC.
Sponsored by: DARPA, AFRL
Reviewed by: Yamaya Takashi <yamayan@kbh.biglobe.ne.jp>, imp, linimon
(with some modifications post review)
MFC after: 2 weeks
set p_xstat to the signal that triggered the stop, but p_xstat is also
used to hold the exit status of an exiting process. Without this change,
a stop signal that arrived after a process was marked P_WEXIT but before
it was marked a zombie would overwrite the exit status with the stop signal
number.
Reviewed by: kib
MFC after: 1 week