Return while there are any I/Os in a queue may result in them stuck
indefinitely, since there is only one taskqueue task for all of them.
I think I've reproduced this by switching ha_role to secondary under
heavy load.
MFC after: 3 days
There were two definitions for the SCSI VPD Block Device Characteristics (page
0xb1): struct scsi_vpd_block_characteristics and struct
scsi_vpd_block_device_characteristics. The latter is more complete and more
widely used. Convert uses of the former to the latter by tweaking the da driver
and removing sturct scsi_vpd_block_characteristics.
- Move ctl_get_cmd_entry() calls from every OOA traversal to when
the requests first inserted, storing seridx in struct ctl_scsiio.
- Move some checks out of the loop in ctl_check_ooa().
- Replace checks for errors that can not happen with asserts.
- Transpose ctl_serialize_table, so that any OOA traversal accessed
only one row (cache line). Compact it from enum to uint8_t.
- Optimize static branch predictions in hottest places.
Due to O(n) nature on deep LUN queues this can be the hottest code
path in CTL, and additional 20% of IOPS I see in some 4KB I/O tests
are good to have in reserve. About 50% of CPU time here according
to the profiles is now spent in two memory accesses per traversed
request in OOA.
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks
Add 04/25 Depopulation restoration in progress, 31/04 Depopulation failed, and
31/05 Depopulation restoration failed.
These are defined in SPC-6r2 (though 31/4 was added in an earlier draft). They
relate to different aspects of in-progress or failed depopulation removal and
restoration commands.
This makes random read benchmarks look better on a wide ZFS pools.
I am not sure where the original value goes from, but it is there
for too long now.
MFC after: 1 week
- Make frontends call unified CTL core method ctl_datamove_done()
to report move completion. It allows to reduce code duplication
in differerent backends by accounting DMA time in common code.
- Add to ctl_datamove_done() and be_move_done() callback samethr
argument, reporting whether the callback is called in the same
context as ctl_datamove(). It allows for some cases like iSCSI
write with immediate data or camsim frontend write save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.
MFC after: 1 month
Switch OOA queue from TAILQ to LIST and change its direction, so that
we traverse it forward, not backward. There is only one place where
we really need other direction, and it is not critical.
Use STAILQ_REMOVE_HEAD() instead of STAILQ_REMOVE() in backends.
Replace few impossible conditions with assertions.
MFC after: 1 month
Introduce new CTL core KPI ctl_run(), preprocessing I/Os in the caller
context instead of scheduling another thread just for that. This call
may sleep, that is not acceptable for some frontends like the original
CAM/FC one, but iSCSI already has separate sleepable per-connection RX
threads, and another thread scheduling is mostly just a waste of time.
IOCTL frontend actually waits for the I/O completion in the caller
thread, so the use of another thread for this has even less sense.
With this change I can measure ~5% IOPS improvement on 4KB iSCSI I/Os
to ZFS.
MFC after: 1 month
If a disk's SIM doesn't support polling, then it can't be used to
store crashdumps. Leave d_dump NULL in that case so that dumpon(8)
fails gracefully rather than having dumps fail at crash time.
Reviewed by: scottl, mav, imp
MFC after: 2 weeks
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D28454
Some CAM sim drivers do not support polling (notably iscsi(4)).
Rather than using a no-op poll routine that always times out requests,
permit a SIM to set a NULL poll callback. cam_periph_runccb() will
fail polled requests non-pollable sims immediately as if they had
timed out.
Reviewed by: scottl, mav (earlier version)
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D28453
RFC 7143 (11.7.4):
The Target Transfer Tag values are not specified by this protocol,
except that the value 0xffffffff is reserved and means that the
Target Transfer Tag is not supplied.
MFC after: 1 month
There are no non-MPSAFE SIM drivers left in the tree, verified with
coccinelle.
Reviewed by: scottl, imp
Differential Revision: https://reviews.freebsd.org/D27853
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
Before this CTL always allocated MAXPHYS-sized buffers, even for 4KB I/O,
that is even more overkill for MAXPHYS of 1MB. This change limits maximum
allocation to 512KB if MAXPHYS is bigger, plus if one is above 128KB, adds
new 128KB UMA zone for smaller I/Os. The patch factors out alloc/free,
so later we could make it use more zones or malloc() if we'd like.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
There are two ways to propagate the error in MMCCAM:
* Using cmd.error which is set by the peripheral driver;
* Using CCB status which is... also set by the driver.
The problem is that those two error conditions don't necessarily match.
This leads to the confusion when handling the MMC reply. So enforce the consistency
by panicking if request is marked as completed successfully but MMC-level error
is present (this hints to the programming error).
Reviewed by: manu
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D26925
Foundation copyrights, approved by emaste@. It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.
Reviewed by: emaste, imp, gbe (manpages)
Differential Revision: https://reviews.freebsd.org/D26980
SAM-3 specification introduced concept of Task Priority, that was renamed
to Command Priority in SAM-4, and supported by all modern SCSI transports.
It provides 15 levels of relative priorities: 1 - highest, 15 - lowest and
0 - default. SAT specification for SATA devices translates priorities 1-3
into NCQ high priority.
This change adds new "priority" field into empty spots of struct ccb_scsiio
and struct ccb_accept_tio of CAM and struct ctl_scsiio of CTL. Respective
support is added into iscsi(4), isp(4), mpr(4), mps(4) and ocs_fc(4) drivers
for both initiator and where applicable target roles. Minimal support was
added to CTL to receive the priority value from different frontends, pass it
between HA controllers and report in few places.
This patch does not add consumers of this functionality, so nothing should
really change yet, since the field is still set to 0 (default) on initiator
and not actively used on target. Those are to be implemented separately.
I've confirmed priority working on WD Red SATA disks connected via mpr(4)
and properly transferred to CTL target via iscsi(4), isp(4) and ocs_fc(4).
While there, added missing tag_action support to ocs_fc(4) initiator role.
MFC after: 1 month
Relnotes: yes
Sponsored by: iXsystems, Inc.
Instead, add arguments to vmapbuf. Since this argument is
always a pointer use a type of void * and cast to vm_offset_t in
vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to
take a pointer and check its bounds.)
In no other situtation does b_data contain a user pointer and vmapbuf
replaces b_data with the actual mapping.
Suggested by: jhb
Reviewed by: imp, jhb
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26784
Sometimes, this drive will be present in the system such that the the
firmware identification string doesn't start with ATA, such as when
it's behind a SATA-to-SAS interposer. Add another quirk for that.
Submitted by: github user mr44er
Github PR: 423
I had failed to notice that sgsendccb() was using cam_periph_mapmem()
and thus was not passing down user pointers directly to drivers. In
practice this broke requests submitted from userland.
PR: 249395
Reported by: Trenton Schulz <trueos@norwegianrockcat.com>
Reviewed by: scottl
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D26550
This is a fix to r334065.
Without this change I once got stuck I/O with endless partition switching:
(sdda0:aw_mmc_sim2:0:0:0): sddastart
(sdda0:aw_mmc_sim2:0:0:0): Partition 0 -> -525703168
(sdda0:aw_mmc_sim2:0:0:0): xpt_action: func 0x91d XPT_MMC_IO
(sdda0:aw_mmc_sim2:0:0:0): xpt_done: func= 0x91d XPT_MMC_IO status 0x1
(sdda0:aw_mmc_sim2:0:0:0): sddadone
(sdda0:aw_mmc_sim2:0:0:0): Card status: 00000000
(sdda0:aw_mmc_sim2:0:0:0): Current state: 4
(sdda0:aw_mmc_sim2:0:0:0): Compteting partition switch to 0
Note that -525703168 (an int) is 0xe0aa6800 in binary representation.
The partition indexes are actually stored as uint8_t, so that value
was converted / truncated to zero.
MFC after: 1 week
cam_sim_free(), cam_sim_release(), and cam_sim_hold() all assign
a mtx variable during declaration and then if NULL or the mtx is
held may re-asign the variable and/or acquire/release a lock.
Harmonize the code, avoiding double assignments and make it look
the same for all three function (with cam_sim_free() not needing
an extra case).
No functional changes intended.
Reviewed by: imp; no-objections by: mav
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D26286
It allows to report GEOM::lunid for nda(4) same as for nvd(4). Since
NVMe now allows multiple LUs (namespaces) with multiple paths unique
LU identification is important. The serial_num field is filled same
as before with the controller serial number, while device_id is based
on namespace GUID and/or EUI64 fields as recommended by "NVM Express:
SCSI Translation Reference" and matching nvd(4) at the end.
MFC after: 1 week
In case we are failing to allocate mmcdata, we are returning with
a softc allocated but not attached to anything and thus leak the
memory.
Reviewed by: scottl
MFC after: 2 weeks
X-MFC: only if we also mfc other mmccam changes?
Differential Revision: https://reviews.freebsd.org/D25987
It allows to report to initiator LU identifying information, preset via
"ident_info" and "text_ident_info" options.
Unfortunately it is impossible to implement SET IDENTIFYING INFORMATION,
since we have no persistent storage it requires, so the information is
read-only for initiator and has to be set out-of-band.
MFC after: 1 week
Sponsored by: iXsystems, Inc.