Commit Graph

290 Commits

Author SHA1 Message Date
Kenneth D. Merry
e26059ca18 Fix FC-Tape bugs caused in part by r345008.
The point of r345008 was to reset the Command Reference Number (CRN)
in some situations where a device stayed in the topology, but had
changed somehow.

This can include moving from a switch connection to a direct
connection or vice versa, or a device that temporarily goes away
and comes back.  (e.g. moving to a different switch port)

There were a couple of bugs in that change:
- We were reporting that a device had not changed whenever the
  Establish Image Pair bit was not set.  That is not quite correct.
  Instead, if the Establish Image Pair bit stays the same (set or
  not), the device hasn't changed in that way.

- We weren't setting PRLI Word0 in the port database when a new
  device arrived, so comparisons with the old value for the
  Establish Image Pair bit weren't really possible.  So, make sure
  PRLI Word0 is set in the port database for new devices.

- We were resetting the CRN whenever the Establish Image Pair bit
  was set for a device, even when the device had stayed the same
  and the value of the bit hadn't changed.  Now, only reset the
  CRN for devices that have changed, not devices that sayed the
  same.

The result of all of this was that if we had a single FC device on
an FC port and it went away and came back, we would wind up
correctly resetting the CRN.

But, if we had multiple devices connected via a switch, and there
was any change in one or more of those devices, all of the devices
that stayed the same would also have their CRN values reset.

The result, from a user standpoint, is that the tape drives, etc.
would all start to time out commands and the initiator would send
aborts.

sys/dev/isp/isp.c:
	In isp_pdb_add_update(), look at whether the Establish
	Image Pair bit has changed as part of the check to
	determine whether a device is still the same.   This was
	causing erroneous change notifications.  Also, when
	creating a new port database entry, initialize the
	PRLI Word 0 values.

sys/dev/isp/isp_freebsd.c:
	In isp_async(), in the changed/stayed case, instead of
	looking at the Establish Image Pair bit to determine
	whether to reset the CRN, look at the command value.
	(Changed vs. Stayed.)  Only reset the CRN for devices
	that have changed.

Sponsored by:	Spectra Logic
MFC after:	3 days
2019-05-24 17:58:29 +00:00
Kenneth D. Merry
6f9dbc0e6e Fix CRN resets in the isp(4) driver in certain situations.
The Command Reference Number (CRN) is part of the FC-Tape features
that we enable when talking to tape drives.  It starts at 1, and
goes to 255 and wraps around to 1.  There are a number of reset
type conditions that result in the CRN getting reset to 1.  These
are detailed in section 4.10 (table 8) of the FCP-4r02b specification.

One of the conditions is when a PRLI (Process Login) is sent by
the initiator, and the Establish Image Pair bit is set in Word 0
of the PRLI.

Previously, the isp(4) driver core sent a notification via
isp_async() that the target had changed or stayed in place, but
there was no indication of whether a PRLI was sent and whether the
Establish Image Pair bit was set.

The result of this was that in some situations, notably
switching back and forth between a direct connection and a switch
connection to a tape drive, the isp(4) driver would fail to reset
the CRN in situations that require it according to the spec.  When
the CRN isn't reset in a situation that requires it, the tape drive
then rejects every subsequent command that is sent to the drive.
It is assuming that the commands are being sent out of order.

So, modify the isp(4) driver to include Word 0 of the PRLI command
when it sends isp_async() notifications of target changes.  Look at
the Establish Image Pair bit, and reset the CRN if that bit is set.

With this change, I am able to switch a tape drive back and forth
between a direct connection and a switch connection, and the isp(4)
driver resets the CRN when it should.

sys/dev/isp_stds.h:
	Add bit definitions for PRLI Word 0.

sys/dev/ispmbox.h:
	Add PRLI Word 0 to the port database type, isp_pdb_t.

sys/dev/ispvar.h
	Add PRLI Word 0 to fcportdb_t.

sys/dev/isp.c:
	Populate the new prli_word0 parameter in the port database.

	In isp_pdb_add_update(), add a check to see if the
	Establish Image Pair bit is set in PRLI Word 0.  If it is,
	then that is an additional reason to create a change
	notification.

sys/dev/isp_freebsd.c:
	In isp_async(), if the device changed or stayed, look at
	PRLI Word 0 to see if the Establish Image Pair bit is set.
	If it is, reset the CRN if we haven't already.

MFC after:	1 week
Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D19472
2019-03-11 14:21:14 +00:00
Alexander Motin
db08ef4353 Increase ABOUT FIRMWARE command timeout to 5s.
It seems default timeout of 100ms is not enough for my 2694L card,
while it was perfectly fine for others, even for full-height 2694.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2018-03-15 01:07:21 +00:00
Alexander Motin
14e084ada5 Add support for Enhanced Gen 5 (16Gb) and Gen 6 (32Gb) QLogic FC HBAs.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-02-28 16:24:32 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Kenneth D. Merry
fefd924a01 Remove duplicate assignments from r321622.
Submitted by:	mav
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-07-27 15:51:56 +00:00
Kenneth D. Merry
a0acb3512c Fix probing FC targets with hard addressing turned on.
This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.

The intent of that change was to keep loop IDs persistent across
chip reinits.

The problem is that the change turned on the PREVLOOP /
PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
Qlogic chip to not participate in the loop if it can't get the
requested loop address.  It also turned off soft addressing on 2400
(4Gb) and newer controllers.

The isp(4) driver defaults to loop address 0, and the tape drives
I have tested default to loop address 0 if hard addressing is turned
on.  So when hard loop addressing is turned on on the drive, the isp(4)
driver just refuses to participate in the loop.

The solution is to largely revert that change.  I left some elements
in place that are related to virtual ports, since they were new.

This does work with IBM tape drives with hard and soft addressing
turned on.  I have tested it with 4Gb, 8Gb, and 16Gb controllers.

sys/dev/isp.c:
	Largely revert FreeBSD SVN change 289937.  I left the
	ispmbox.h changes in place.

	Don't use the PREV_ADDRESS bit on initialization.  It tells
	the chip to not participate if it can't get the requested
	loop ID.

	Do use soft addressing on 2400 and newer chips.

	Use hard addressing when the user has requested a specific
	initiator ID.  (hint.isp.X.iid=N in /boot/loader.conf)

	Leave some of the virtual port options from that change in
	place, but don't turn on the PREV_ADDRESS bit.

Reviewed by:	mav
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-07-27 15:33:57 +00:00
Alexander Motin
ae77193109 "Port Type not registered" is not a real error for GIT_PT. 2017-07-10 06:25:30 +00:00
Alexander Motin
a94fab67bb Switch fabric scans from GID_FT to GID_PT+GFF_ID/GFT_ID.
Instead of using GID_FT SNS request to get list of registered FCP ports,
use GID_PT to get list of all Nx_Ports, and then use GFF_ID and/or GFT_ID
requests to find whether they are FCP and target capable.

The problem with old approach is that GID_FT does not report ports without
FC-4 type registered.  In particular it was impossible to boot OS from
FreeBSD FC target using QLogic FC BIOS, since one does not register FC-4
type even on new cards and so ignored by old code as incompatible.

As a side bonus this allows initiator to skip pointless logins to other
initiators by fetching that information from SNS instead.

In case some switches do not implement GFF_ID/GFT_ID correctly, add sysctls
to disable that functionality.  I handled broken GFF_ID of my Brocade 200E,
but there may be other switches with different bugs.

Linux also uses GID_PT, but GFF_ID is disabled by default there, and GFT_ID
is not supported.

Sponsored by:	iXsystems, Inc.
2017-07-03 15:56:45 +00:00
Alexander Motin
9cf87855b1 Move comment respecting previous commit. 2017-07-02 15:08:32 +00:00
Alexander Motin
3d792e6080 Slightly unify SNS requests for post- and pre-24xx. 2017-07-02 14:59:41 +00:00
Kenneth D. Merry
57b6261f94 Correct loop mode CRN resets to adhere to FCP-4 section 4.10
Prior to this change, the CRN (Command Reference Number) is reset on any
firmware LIP, LOOP DOWN, or LOOP RESET event in violation of FCP-4 which
specifies that the CRN should only be reset in response to a LIP Reset
(LIPyx) primitive. FCP-4 also indicates PLOGI/LOGO and PRLI/PRLO ELS
actions as conditions for resetting the CRN for the associated initiator
port.

These violations manifest themselves when the HBA is removed from the
loop, or a target device is removed (especially during an outstanding
command) without power cycling. If the HBA and and the target device
determine upon re-establishing the loop that no PLOGI or PRLI is
required, and the target does not issue a LIPxy to the initiator, the
CRN for the target will have been improperly reset by the isp driver. As
a result, the target port will silently ignore all FCP commands issued
during the device probe (which will time out) preventing the device from
attaching.

This change corrects thie CRN reset behavior in response to loop state
changes, also introduces CRN resets for the above mentioned ELS actions
as encountered through async PDB change events.

This change also adds cleanup of outstanding commands in isp_loop_dead()
that was previously missing.

sys/dev/isp/isp.c
	Add the last login state to debug output when syncing the pdb

sys/dev/isp/isp_freebsd.c
	Replace binary statement setting aborted ccb status in
	isp_watchdog() with the XS_SETERR macro used elsewhere

	In isp_loop_dead(), abort or complete pending commands as done
	in isp_watchdog()

	In isp_async(), segregate the ISPASYNC_LOOP_RESET action from
	ISPASYNC_LIP, ISPASYNC_LOOP_DOWN, and ISPASYNC_LOOP_UP
	fallthroughs, and only reset the CRN in the RESET case. Also add
	checks to handle false LOOP RESET actions that do not have a
	proper associated LIP primitive, and log the primitive in the
	debug messages

	In isp_async(), remove the goto from ISP_ASYNC_DEV_STAYED, and
	only reset the CRN in the DEV_CHANGED action

	In isp_async(), when processing an ISPASYNC_CHANGE_PDB status,
	reset CRN(s) for the associated nphdl (or all ports) if the
	change reason is some form of ELS login/logout. Also remove
	assignment to fc since it is not used in the scope

sys/dev/isp/ispmbox.h
	Add macro definition for the global N-Port handle, and correct a
	macro typo 'PDB24XX_AE_PRLI_DONJE'

sys/dev/isp/ispvar.h
	Add macros FCP_AL_DA_ALL, FCP_AL_PA, and FCP_IS_DEST_ALPD for
	more legible code when determining if an AL_PD port matches the
	portid for a given struct fcparam* by value or by virtue of the
	AL_PD port being 0xFF

Submitted by:	Reid Linnemann
Sponsored by:	Spectra Logic
MFC after:	1 week
2017-05-03 13:17:01 +00:00
Alexander Motin
1c779b2849 Switch isp_reset to scratchpad not requiring ISP_MBOXDMASETUP.
MFC after:	1 week
2017-04-24 10:16:12 +00:00
Alexander Motin
e9da70a35e Fix few minor issues found by Clang Analyzer.
MFC after:	2 weeks
2017-04-09 07:53:31 +00:00
Alexander Motin
2d24b6af63 Cleanup response queue processing.
MFC after:	2 weeks
2017-03-22 08:56:03 +00:00
Alexander Motin
31c161a615 Improve command timeout handling.
Let firmware do its best first, and if it can't, try software recovery.
I would remove software timeout handler completely, but found bunch of
complains on command timeout on sparc64 mailing list few years ago, so
better be safe in case of interrupt loss.

MFC after:	2 weeks
2017-03-21 13:10:37 +00:00
Alexander Motin
01728721bf Remove questionable reqp->req_time access.
MFC after:	2 weeks
2017-03-21 11:26:31 +00:00
Alexander Motin
9abc1e2b0c Remove some useless code.
MFC after:	2 weeks
2017-03-19 21:25:13 +00:00
Alexander Motin
08826086fe Add initial support for multiple MSI-X vectors.
For 24xx and above use 2 vectors (default and response queue).
For 26xx and above use 3 vectors (default, response and ATIO queues).
Due to global lock interrupt hardlers never run simultaneously now, but
at least this allows to save one regitster read per interrupt.

MFC after:	2 weeks
2017-03-19 19:11:40 +00:00
Alexander Motin
9c81a61ee1 Remove hackish code delaying ATIOs to unknown virtual port.
Since we support RQSTYPE_RPT_ID_ACQ, that functionality is only useful
in loop mode, which probably doesn't worth having this hack in 2017.

MFC after:	2 weeks
2017-03-19 13:46:11 +00:00
Alexander Motin
98b08fbea5 Remove dead remnants of SPI target.
MFC after:	2 weeks
2017-03-18 15:42:22 +00:00
Alexander Motin
0e6bc811e4 Refactor interrupt handling.
Instead of single isp_intr() function doing all possible magic, introduce
four different functions to handle mailbox operation completions, async
events, response and ATIO queues.  The goal is to isolate different code
paths to make code more readable, and to make easier support for multiple
interrupt vectors.  Even oldest hardware in many cases can identify what
code path it should run on interrupt.  Contemporary hardware can assign
them to different interrupt vectors.

MFC after:	2 weeks
2017-03-15 14:58:29 +00:00
Alexander Motin
9c2e9bcfbe Remove some dead/broken code paths around async handling
MFC after:	2 weeks
2017-03-14 18:42:33 +00:00
Alexander Motin
6327b0d287 Remove tangled isp_mbox_continue() mechanism.
It was implemented to reduce context switches when uploading firmware to
card's RAM.  But this mechanism is not used last 10 years since all mbox
operations are now polled, and it was never used for cards produced in
last 15 years.  Newer cards can use DMA to upload firmware.

MFC after:	2 weeks
2017-03-14 17:34:44 +00:00
Alexander Motin
0cbfd9bb18 Remove dangerous and questionable isp_mboxcmd_qnw() call.
MFC after:	2 weeks
2017-03-14 08:45:33 +00:00
Alexander Motin
a1fa02673a Improvements around attach, reset and detach.
This change fixes DMA resource leak on driver unload.  Also it removes
DMA resources allocation for hardcoded number of requests before fetching
the real number from firmware.  Also it prepares ground for more flexible
IRQs allocation according to firmware capabilities.

MFC after:	2 weeks
2017-03-14 08:03:56 +00:00
Alexander Motin
ab23521a49 Try to slight untangle I/O and loop status handling.
MFC after:	2 weeks
2017-03-12 15:36:07 +00:00
Alexander Motin
229203af68 Remove code for unsupported FreeBSD versions.
MFC after:	2 weeks
2017-03-12 14:17:57 +00:00
Alexander Motin
3f072d6948 Send TERMINATE to firmware when aborting active ATIO.
MFC after:	2 weeks
2017-02-27 08:20:28 +00:00
Alexander Motin
2d4a5bcccb Return better error code in case of too long CDB.
Its more important for SPI HBAs, as they don't support CDBs above 12 bytes.
The new error code makes CAM to fall back to alternative commands.

MFC after:	2 weeks
2017-02-26 14:29:09 +00:00
Kenneth D. Merry
28ef82eb7b Change the isp(4) driver to not adjust the tag type for REQUEST SENSE.
The isp(4) driver was changing the tag type for REQUEST SENSE
commands to Head of Queue, when the CAM CCB flag
CAM_TAG_ACTION_VALID was NOT set.  CAM_TAG_ACTION_VALID is set
when the tag action in the XPT_SCSI_IO is not CAM_TAG_ACTION_NONE
and when the target has tagged queueing turned on.

In most cases when CAM_TAG_ACTION_VALID is not set, it is because
the target is not doing tagged queueing.  In those cases, trying to
send a Head of Queue tag may cause problems.  Instead, default to
sending a simple tag.

IBM tape drives claim to support tagged queueing in their standard
Inquiry data, but have the DQue bit set in the control mode page
(mode page 10).  CAM correctly detects that these drives do not
support tagged queueing, and clears the CAM_TAG_ACTION_VALID flag
on CCBs sent down to the drives.

This caused the isp(4) driver to go down the path of setting the
tag action to a default value, and for Request Sense commands only,
set the tag action to Head of Queue.

If an IBM tape drive does get a Head of Queue tag, it rejects it with
Invalid Message Error (0x49,0x00).  (The Qlogic firmware translates that
to a Transport Error, which the driver translates to an Unrecoverable
HBA Error, or CAM_UNREC_HBA_ERROR.) So, by default, it wasn't possible
to get a good response from a REQUEST SENSE to an FC-attached IBM
tape drive with the isp(4) driver.

IBM tape drives (tested on an LTO-5 with G9N1 firmware and a TS1150
with 4470 firmware) also have a bug in that sending a command with a
non-simple tag attribute breaks the tape drive's Command Reference
Number (CRN) accounting and causes it to ignore all subsequent
commands because it and the initiator disagree about the next
expected CRN.  The drives do reject the initial command with a head
of queue tag with an Invalid Message Error (0x49,0x00), but after that
they ignore any subsequent commands.  IBM confirmed that it is a bug,
and sent me test firmware that fixes the bug.  However tape drives in
the field will still exhibit the bug until they are upgraded.

Request Sense is not often sent to targets because most errors are
reported automatically through autosense in Fibre Channel and other
modern transports.  ("Modern" meaning post SCSI-2.)  So this is not
an error that would crop up frequently.  But Request Sense is useful on
tape devices to report status information, aside from error reporting.

This problem is less serious without FC-Tape features turned on,
specifically precise delivery of commands (which enables Command
Reference Numbers), enabled on the target and initiator.  Without
FC-Tape features turned on, the target would return an error and
things would continue on.

And it also does not cause problems for targets that do tagged
queueing, because in those cases the isp(4) driver just uses the
tag type that is specified in the CCB, assuming the
CAM_TAG_ACTION_VALID flag is set, and defaults to sending a Simple
tag action if it isn't an ordered or head of queue tag.

sys/dev/isp/isp.c:
	In isp_start(), don't try to send Request Sense commands
	with the Head of Queue tag attribute if the CCB doesn't
	have a valid tag action.  The tag action likely isn't valid
	because the target doesn't support tagged queueing.

Sponsored by:	Spectra Logic
MFC after:	3 days
2017-02-10 22:02:45 +00:00
Conrad Meyer
db4fcadf52 "Buses" is the preferred plural of "bus"
Replace archaic "busses" with modern form "buses."

Intentionally excluded:
* Old/random drivers I didn't recognize
  * Old hardware in general
* Use of "busses" in code as identifiers

No functional change.

http://grammarist.com/spelling/buses-busses/

PR:		216099
Reported by:	bltsrc at mail.ru
Sponsored by:	Dell EMC Isilon
2017-01-15 17:54:01 +00:00
Alexander Motin
514a71eba7 Fix delaying requests to unknown virtual ports 2s after init.
This code was originally implemented 7 years ago, but never really worked
due to trivial error.  I think this functionality may be not required.
Initiators supporting optional periodic command status checks detected
those terminated commands and retried them 3 seconds later.  But thinking
about less featured initiators and the fact that it is our race makes
virtual ports "unknown" it may be good to have this feature.
2016-05-19 17:48:56 +00:00
Alexander Motin
0bd83292ef Add IOCB debugging for ISPCTL_RESET_DEV and ISPCTL_ABORT_CMD. 2016-05-19 16:53:53 +00:00
Alexander Motin
3a82d79d30 Make RQCS_PORT_LOGGED_OUT for ZOMBIE ports retriable.
It is normal for ZOMBIE ports to be logged out.  This status is not really
an error until Gone Device Timeout expires, so make CAM retry after delay.

MFC after:	1 week
2016-05-17 15:12:57 +00:00
Alexander Motin
5fa351ed89 Completely remove broken now autologin port flag.
Firmware automatically logs in only to local loop ports, and those ports
can be easily identified without extra flag by zero domain and area IDs.

MFC after:	1 week
2016-05-17 13:18:57 +00:00
Alexander Motin
b2699a1bae No need to check login status for ZOMBIE ports.
ZOMBIE ports are always logged out, and so initiator may try to relogin.

MFC after:	1 weeks
2016-05-16 16:44:34 +00:00
Pedro F. Giffuni
453130d9bf sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
Alexander Motin
50f2c01d32 Simplify memory allocation for NS requests.
Since we no longer need additional buffers for request and response IOCBs,
we can increase receive space by 192 bytes, that is enough for fetching 48
more ports.  The new limit is 1020 fabric ports per virtual port.

MFC after:	1 month
2016-04-16 06:36:56 +00:00
Alexander Motin
212fad7469 Extract virtual port address from RQSTYPE_RPT_ID_ACQ.
This should close the race between request arriving on new target mode
virtual port and its scanner thread finally fetch its address for request
routing.
2016-04-14 20:49:01 +00:00
Alexander Motin
ace7039eaa Filter Port Database Changed notifications.
For some reason firmware sends Port Database Changed notifications in case
of explicit login requests from the driver when target port is unavailabe.
Those notifications don't give driver any new information, but only cause
infinite scan loop.
2016-04-13 10:35:17 +00:00
Alexander Motin
5f2638dabb Respect NVRAM topology settings on 24xx and above chips. 2016-04-13 07:04:04 +00:00
Alexander Motin
1521935737 Make all CT Pass-Through (name server requests) asynchronous.
Previously we had to do it synchronously because we could not drop the lock
due to potential scratch memory use conflicts.  Previous commits fixed that
collision, so here it goes -- slower and less reliable external requests
are executed asynchronously without spinning in tight loop and with more
safe timeout handling.
2016-04-12 18:50:37 +00:00
Alexander Motin
e3188c2f31 Switch isp_getpdb() to synchronous IOCB DMA area.
While technically it is not IOCB, it is synchronous and can be called from
different places, so calling FC_SCRATCH_ACQUIRE() here is inconvenient.
2016-04-12 14:43:17 +00:00
Alexander Motin
4ff970c462 Allocate separate DMA area for synchronous IOCB execution.
Usually IOCBs should be put on queue for asynchronous processing and should
not require additional DMA memory.  But there are some cases like aborts and
resets that for external reasons has to be synchronous.  Give those cases
separate 2*64 byte DMA area to decouple them from other DMA scratch area
users, using it for asynchronous requests.
2016-04-12 14:19:19 +00:00
Alexander Motin
003c82d713 Add couple missing memory barriers. 2016-04-12 11:48:50 +00:00
Alexander Motin
5e3e6a8241 Polish debugging IOCB dumping.
Add few more missing cases, unify byte order.

MFC after:	1 month
2016-04-11 10:48:26 +00:00
Alexander Motin
7e53e7accc Register symbolic port/node names in FC name server.
This is cosmetics that simplifies identification of new ports on FC switch.

It would be good to use target name from CTL here instead of hostname, but
it is not passed here through CAM now.

MFC after:	2 weeks
2016-04-09 14:50:47 +00:00
Alexander Motin
06c5618335 Reduce code duplication when executing Passthrough IOCB.
MFC after:	2 weeks
2016-04-09 11:54:09 +00:00
Alexander Motin
5d084976cb Allocate separate scratch space for scanner purposes.
This space does not require DMA syncing. It reduces lock scope of the DMA
scratch space.  It allows whole DMA scratch space to be used to I/O, so now
we can fetch up to ~1000 ports from SNS.

Due to the last fact, increase maximal number of ports from 256 to 1024.
2015-12-27 06:28:31 +00:00