Instead of using GID_FT SNS request to get list of registered FCP ports,
use GID_PT to get list of all Nx_Ports, and then use GFF_ID and/or GFT_ID
requests to find whether they are FCP and target capable.
The problem with old approach is that GID_FT does not report ports without
FC-4 type registered. In particular it was impossible to boot OS from
FreeBSD FC target using QLogic FC BIOS, since one does not register FC-4
type even on new cards and so ignored by old code as incompatible.
As a side bonus this allows initiator to skip pointless logins to other
initiators by fetching that information from SNS instead.
In case some switches do not implement GFF_ID/GFT_ID correctly, add sysctls
to disable that functionality. I handled broken GFF_ID of my Brocade 200E,
but there may be other switches with different bugs.
Linux also uses GID_PT, but GFF_ID is disabled by default there, and GFT_ID
is not supported.
Sponsored by: iXsystems, Inc.
For ATIOs it is pointless to report isp_loopid to CAM, since in other
places it operates with port database record IDs, not with loop IDs.
For INOTs target_id/target_lun seems were never set, so wildcard INOTs
probably were not working correctly when LUN IDs were important.
Prior to this change, the CRN (Command Reference Number) is reset on any
firmware LIP, LOOP DOWN, or LOOP RESET event in violation of FCP-4 which
specifies that the CRN should only be reset in response to a LIP Reset
(LIPyx) primitive. FCP-4 also indicates PLOGI/LOGO and PRLI/PRLO ELS
actions as conditions for resetting the CRN for the associated initiator
port.
These violations manifest themselves when the HBA is removed from the
loop, or a target device is removed (especially during an outstanding
command) without power cycling. If the HBA and and the target device
determine upon re-establishing the loop that no PLOGI or PRLI is
required, and the target does not issue a LIPxy to the initiator, the
CRN for the target will have been improperly reset by the isp driver. As
a result, the target port will silently ignore all FCP commands issued
during the device probe (which will time out) preventing the device from
attaching.
This change corrects thie CRN reset behavior in response to loop state
changes, also introduces CRN resets for the above mentioned ELS actions
as encountered through async PDB change events.
This change also adds cleanup of outstanding commands in isp_loop_dead()
that was previously missing.
sys/dev/isp/isp.c
Add the last login state to debug output when syncing the pdb
sys/dev/isp/isp_freebsd.c
Replace binary statement setting aborted ccb status in
isp_watchdog() with the XS_SETERR macro used elsewhere
In isp_loop_dead(), abort or complete pending commands as done
in isp_watchdog()
In isp_async(), segregate the ISPASYNC_LOOP_RESET action from
ISPASYNC_LIP, ISPASYNC_LOOP_DOWN, and ISPASYNC_LOOP_UP
fallthroughs, and only reset the CRN in the RESET case. Also add
checks to handle false LOOP RESET actions that do not have a
proper associated LIP primitive, and log the primitive in the
debug messages
In isp_async(), remove the goto from ISP_ASYNC_DEV_STAYED, and
only reset the CRN in the DEV_CHANGED action
In isp_async(), when processing an ISPASYNC_CHANGE_PDB status,
reset CRN(s) for the associated nphdl (or all ports) if the
change reason is some form of ELS login/logout. Also remove
assignment to fc since it is not used in the scope
sys/dev/isp/ispmbox.h
Add macro definition for the global N-Port handle, and correct a
macro typo 'PDB24XX_AE_PRLI_DONJE'
sys/dev/isp/ispvar.h
Add macros FCP_AL_DA_ALL, FCP_AL_PA, and FCP_IS_DEST_ALPD for
more legible code when determining if an AL_PD port matches the
portid for a given struct fcparam* by value or by virtue of the
AL_PD port being 0xFF
Submitted by: Reid Linnemann
Sponsored by: Spectra Logic
MFC after: 1 week
Let firmware do its best first, and if it can't, try software recovery.
I would remove software timeout handler completely, but found bunch of
complains on command timeout on sparc64 mailing list few years ago, so
better be safe in case of interrupt loss.
MFC after: 2 weeks
For 24xx and above use 2 vectors (default and response queue).
For 26xx and above use 3 vectors (default, response and ATIO queues).
Due to global lock interrupt hardlers never run simultaneously now, but
at least this allows to save one regitster read per interrupt.
MFC after: 2 weeks
Since we support RQSTYPE_RPT_ID_ACQ, that functionality is only useful
in loop mode, which probably doesn't worth having this hack in 2017.
MFC after: 2 weeks
There were two copies of the code: one in generic code was half-broken, and
another in platform code was never called. Leave only one in generic code
and working.
MFC after: 2 weeks
Instead of single isp_intr() function doing all possible magic, introduce
four different functions to handle mailbox operation completions, async
events, response and ATIO queues. The goal is to isolate different code
paths to make code more readable, and to make easier support for multiple
interrupt vectors. Even oldest hardware in many cases can identify what
code path it should run on interrupt. Contemporary hardware can assign
them to different interrupt vectors.
MFC after: 2 weeks
It was implemented to reduce context switches when uploading firmware to
card's RAM. But this mechanism is not used last 10 years since all mbox
operations are now polled, and it was never used for cards produced in
last 15 years. Newer cards can use DMA to upload firmware.
MFC after: 2 weeks
This change fixes DMA resource leak on driver unload. Also it removes
DMA resources allocation for hardcoded number of requests before fetching
the real number from firmware. Also it prepares ground for more flexible
IRQs allocation according to firmware capabilities.
MFC after: 2 weeks
Its more important for SPI HBAs, as they don't support CDBs above 12 bytes.
The new error code makes CAM to fall back to alternative commands.
MFC after: 2 weeks
This allows to properly handle cases when target wants to receive or send
more data then initiator wants to send or receive. Previously in such
cases isp(4) returned CAM_DATA_RUN_ERR, while now it returns resid > 0.
MFC after: 2 weeks