In the conversion into a tunable, we converted the
size of the s/g list used by the driver to be based
off of a hardcoded size of 128k rather than maxphys,
this caused performance problems for us. Revert this
to use the maxphys tunable.
Note that this constant is used to size dynamically allocated
things, and not static data structs, so this is safe.
Reviewed By: imp, kib, mav
Tested By:i dhw
Differential Revision: https://reviews.freebsd.org/D28023
Sponsored by: Netflix
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
This device cannot cross a 4GB boundary with DMA. Removing the
boundary in r346386 resulted in low frequency memory corruption on
machines with isci(4) controllers.
Submitted by: gallatin@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20910
isci(4) uses deferred loading. Typically on amd64 and i386 non-PAE
the tag does not create any restrictions, but on i386 PAE-tables but
non-PAE configs callbacks might be used.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Also correct a typo in the comment for these values, noted by jimharris.
Reviewed by: jimharris
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3715
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
The sim_vid, hba_vid, and dev_name fields of struct ccb_pathinq are
fixed-length strings. AFAICT the only place they're read is in
sbin/camcontrol/camcontrol.c, which assumes they'll be null-terminated.
However, the kernel doesn't null-terminate them. A bunch of copy-pasted code
uses strncpy to write them, and doesn't guarantee null-termination. For at
least 4 drivers (mpr, mps, ciss, and hyperv), the hba_vid field actually
overflows. You can see the result by doing "camcontrol negotiate da0 -v".
This change null-terminates those fields everywhere they're set in the
kernel. It also shortens a few strings to ensure they'll fit within the
16-character field.
PR: 215474
Reported by: Coverity
CID: 1009997 1010000 1010001 1010002 1010003 1010004 1010005
CID: 1331519 1010006 1215097 1010007 1288967 1010008 1306000
CID: 1211924 1010009 1010010 1010011 1010012 1010013 1010014
CID: 1147190 1010017 1010016 1010018 1216435 1010020 1010021
CID: 1010022 1009666 1018185 1010023 1010025 1010026 1010027
CID: 1010028 1010029 1010030 1010031 1010033 1018186 1018187
CID: 1010035 1010036 1010042 1010041 1010040 1010039
Reviewed by: imp, sephe, slm
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9037
Differential Revision: https://reviews.freebsd.org/D9038
Certain VM guest types (VMware, Xen) do not support MSI, so pci_alloc_msix()
always fails. isci(4) was not properly detecting the allocation failure,
and would try to proceed with MSIx resource initialization rather than
reverting to INTx.
Reported and tested by: Bradley W. Dutton (brad-fbsd-stable@duttonbros.com)
MFC after: 3 days
Sponsored by: Intel
BIOS always enables PCI busmaster on the isci device, which effectively
worked around this omission. But when passing the isci device through
to a guest VM, the hypervisor will disable busmaster and isci will not
work without calling pci_enable_busmaster().
MFC after: 3 days
Sponsored by: Intel
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
Previously, any timeout value for which (timeout * hz) will overflow the
signed integer, will give weird results, since callout(9) routines will
convert negative values of ticks to '1'. For unsigned integer overflow we
will get sufficiently smaller timeout values than expected.
Switch from callout_reset, which requires conversion to int based ticks
to callout_reset_sbt to avoid this.
Also correct isci to correctly resolve ccb timeout.
This was based on the original work done by Eygene Ryabinkin
<rea@freebsd.org> back in 5 Aug 2011 which used a macro to help avoid
the overlow.
Differential Revision: https://reviews.freebsd.org/D1157
Reviewed by: mav, davide
MFC after: 1 month
Sponsored by: Multiplay
reset device task request from the driver. If the drive fails to respond
with a signature FIS, the driver would previously get into an endless retry
loop, stalling all I/O to the drive and keeping user processes stranded.
Instead, fail the i/o and invalidate the device if the task management
command times out. This is controllable with the sysctl and tunable
hw.isci.fail_on_task_timeout
dev.isci.0.fail_on_task_timeout
The default for these is 1.
Reviewed by: jimharris
Obtained from: Netflix, Inc.
MFC after: 2 days
by treating it as UDMA.
This fixes a problem introduced in r249933/r249939, where CAM sends
ATA_DSM_TRIM to SATA devices using ATA_PASSTHROUGH_16. scsi_ata_trim()
sets protocol as DMA (not UDMA) which is for multi-word DMA, even
though no such mode is selected for the device. isci(4) would fail
these commands which is the correct behavior but not consistent with
other HBAs, namely LSI's.
smh@ did some further testing on an LSI controller, which rejected
ATA_PASSTHROUGH_16 commands with mode=UDMA_OUT, even though only
a UDMA mode was selected on the device. So this precludes adding
any kind of mode detection in CAM to determine which mode to use on
a per-device basis.
Sponsored by: Intel
Discussed with: scottl, smh
Reported by: scottl
Tested by: scottl
MFC after: 3 days
Stop abusing xpt_periph in random plases that really have no periph related
to CCB, for example, bus scanning. NULL value is fine in such cases and it
is correctly logged in debug messages as "noperiph". If at some point we
need some real XPT periphs (alike to pmpX now), quite likely they will be
per-bus, and not a single global instance as xpt_periph now.
data buffer for a ccb that is unmapped.
This case is currently not possible, since the SCI framework only
requests these pointers for doing SCSI/ATA translation of non-
READ/WRITE commands. The panic is more to protect against the
unlikely future scenario where additional commands could be unmapped.
Sponsored by: Intel
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
to map, and technically this isn't allowed.
Functionally, it works OK (at least on x86) to call bus_dmamap_load with
a NULL data pointer and zero length, so this is primarily for correctness
and consistency with other drivers.
While here, remove check in isci_io_request_construct for nseg==0.
Previously, bus_dmamap_load would pass nseg==1, even for case where
buffer is NULL and length = 0, which allowed CAM_DIR_NONE CCBs
to get processed. This check is not correct though, and needed to be
removed both for the changes elsewhere in this patch, as well as jeff's
preliminary bus_dmamap_load_ccb patch (which uncovered all of this in
the first place).
MFC after: 3 days
While here, change ISCI_LED to ISCI_PHY since conceptually the hardware
ties the LEDs to a phy and the LEDs for a given phy cannot be controlled
independently.
Submitted by: Paul Maulberger <Paul.Maulberger at gmx.de> (with modifications)
Device nodes are in the format /dev/led/isci.busX.portY.locate.
Sponsored by: Intel
Requested by: Paul Maulberger <paul dot maulberger at gmx dot de>
MFC after: 1 week
(download microcode with offsets, save, and activate).
SATI translation layer was incorrectly using allocation length instead
of blocks, and was constructing the ATA command incorrectly.
Also change #define to specify that the 512 block size here is
specific for DOWNLOAD_MICROCODE, and does not relate to the device's
logical block size.
Submitted by: scottl (with small modifications)
MFC after: 3 days
This routine is intended only for commands such as INQUIRY where
the controller may fill out a smaller amount of data than allocated
by the host.
The end result of this bug was that isci(4) would report non-zero
resid for successful SCSI_UNMAP commands.
Sponsored by: Intel
MFC after: 3 days
This ensures that any ccbs which immediately start during the call to
xpt_release_devq see an accurate picture of the frozen_lun_mask.
Sponsored by: Intel
MFC after: 3 days