Add some somewhat vague documentation for this driver and a list
of Hardware that might, in fact, work.
parent 9fcab64afc
commit 94ad0dbb03
634 sys/dev/isp/DriverManual.txt  Normal file
/* $FreeBSD$ */

		Driver Theory of Operation Manual

1. Introduction

This is a short text document that will describe the background, goals
for, and current theory of operation for the joint Fibre Channel/SCSI
HBA driver for QLogic hardware.

Because this driver is an ongoing project, do not expect this manual
to remain entirely up to date. Like a lot of software engineering, the
ultimate documentation is the driver source. However, this manual should
serve as a solid basis for attempting to understand where the driver
started and what the current source is trying to accomplish.

The reader is expected to understand the basics of SCSI and Fibre Channel
and to be familiar with the range of platforms that Solaris, Linux and
the variant "BSD" Open Source systems are available on. A glossary and
a few references will be placed at the end of the document.

There will be references to functions and structures within the body of
this document. These can be easily found within the source using editor
tags or grep. There will be few code examples here as the code already
exists where the reader can easily find it.
2. A Brief History for this Driver

This driver originally started as part of work funded by NASA Ames
Research Center's Numerical Aerodynamic Simulation center ("NAS" for
short) for the QLogic PCI 1020 and 1040 SCSI Host Adapters as part of my
work at porting the NetBSD Operating System to the Alpha architecture
(specifically the AlphaServer 8200 and 8400 platforms). In short, it
started as a simple single SCSI HBA driver for just the purpose of
running off a SCSI disk. This work took place starting in January, 1997.

Because the first implementation was for NetBSD, which runs on a very
large number of platforms, and because NetBSD supported both systems with
SBus cards (e.g., Sun SPARC systems) as well as systems with PCI cards,
and because the QLogic SCSI cards came in both SBus and PCI versions, the
initial implementation followed the very thoughtful NetBSD design tenet
of splitting drivers into what are called MI (for Machine Independent)
and MD (Machine Dependent) portions. The original design therefore started
from the premise that the driver would drive both SBus and PCI card
variants. These busses are similar but have quite different constraints,
and while the QLogic SBus and PCI cards are very similar, there are some
significant differences.

After this initial goal had been met, there began to be some talk about
looking into implementing Fibre Channel mass storage at NAS. At this time
the QLogic 2100 FC-AL HBA was about to become available. After looking at
the way it was designed, I concluded that it was so darned close to being
just like the SCSI HBAs that it would be insane to *not* leverage off of
the existing driver. So, we ended up with a driver for NetBSD that drove
PCI and SBus SCSI cards, and now also drove the QLogic 2100 FC-AL HBA.

After this, ports to non-NetBSD platforms became interesting as well.
This took the driver beyond the sole interest of NAS and into interested
support from a number of other places. Since the original NetBSD
development, the driver has been ported to FreeBSD, OpenBSD, Linux,
Solaris, and two proprietary systems. Following from the original MI/MD
design of NetBSD, a rather successful attempt has been made to keep the
Operating System Platform differences segregated and to a minimum.

Along the way, support for the 2200 as well as full fabric and target
mode support has been added, and 2300 support as well as an FC-IP stack
are planned.
3. Driver Design Goals

This driver did not start out the way one normally would run such an
effort. Normally you design via top-down methodologies, set an initial
goal, and meet it. This driver has had a design goal that has changed
almost from the very first. This has been an extremely peculiar, if not
risque, experience. As a consequence, this section of this document
contains a bit of "reconstruction after the fact" in that the design
goals are as I perceive them to be now- not necessarily what they
started as.

The primary design goal now is to have a driver that can run both the
SCSI and Fibre Channel SCSI protocols on multiple OS platforms with
as little OS platform support code as possible.

The intended support targets for SCSI HBAs are the single and
dual channel PCI Ultra2 and PCI Ultra3 cards as well as the older PCI
Ultra single channel cards and SBus cards.

The intended support targets for Fibre Channel HBAs are the 2100, 2200
and 2300 PCI cards.

Fibre Channel support should include complete fabric and public loop
as well as private loop and private loop, direct-attach topologies.
FC-IP support is also a goal.

For both SCSI and Fibre Channel, simultaneous target/initiator mode support
is a goal.

Pure, raw performance is not a primary goal of this design. This design,
because it has a tremendous amount of code common across multiple
platforms, will undoubtedly never be able to beat the performance of a
driver that is specifically designed for a single platform and a single
card. However, it is a strong secondary goal to make the performance
penalties in this design as small as possible.

Another primary aim, which almost need not be stated, is that the
implementation of platform differences must not clutter up the common
code with platform specific defines. Instead, some reasonable layering
semantics are defined such that platform specifics can be kept in the
platform specific code.
4. QLogic Hardware Architecture

In order to make the design of this driver more intelligible, some
description of the QLogic hardware architecture is in order. This will
not be an exhaustive description of how this card works, but will
note enough of the important features so that the driver design is
hopefully clearer.
4.1 Basic QLogic hardware

The QLogic HBA cards all contain a tiny 16-bit RISC-like processor and
varying sizes of SRAM. Each card contains a Bus Interface Unit (BIU)
as appropriate for the host bus (SBus or PCI). The BIUs allow access
to a set of dual-ranked 16 bit incoming and outgoing mailbox registers
as well as access to control registers that control the RISC or access
other portions of the card (e.g., Flash BIOS). The term 'dual-ranked'
means that the same host visible address refers to two different
registers: a write at that address stores into an (incoming, to the HBA)
mailbox register, while a read at the same address fetches from a
separate (outgoing, from the HBA) mailbox register with completely
different data. Each HBA also then has core and auxiliary logic which
either is used to interface to a SCSI bus (or to external bus drivers
that connect to a SCSI bus), or to connect to a Fibre Channel bus.
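Despite the promise of few code examples, the dual-ranked semantics are
easier to see in a short sketch. Everything below is invented for
illustration (the real register definitions live in ispreg.h); the two
arrays merely model the two ranks that the hardware presents at a single
bus address:

```c
#include <stdint.h>

/*
 * Sketch of dual-ranked mailbox semantics. In hardware, both ranks sit
 * at one bus address; here two hypothetical arrays stand in for them
 * to show that a write and a read touch different registers.
 */
static uint16_t mbox_in[8];     /* host -> HBA ("incoming") rank */
static uint16_t mbox_out[8];    /* HBA -> host ("outgoing") rank */

/* A write at a mailbox address lands in the incoming rank... */
static void
isp_wr_mbox(int n, uint16_t val)
{
        mbox_in[n] = val;
}

/* ...while a read at the same address returns the outgoing rank. */
static uint16_t
isp_rd_mbox(int n)
{
        return (mbox_out[n]);
}
```

Note that a driver can therefore never read back what it just wrote to a
mailbox register; the outgoing rank changes only when the RISC writes it.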
4.2 Basic Control Interface

There are two principal I/O control mechanisms by which the driver
communicates with and controls the QLogic HBA. The first mechanism is to
use the incoming mailbox registers to interrupt and issue commands to
the RISC processor (with results usually, but not always, ending up in
the outgoing mailbox registers). The second mechanism is to establish,
via mailbox commands, circular request and response queues in system
memory that are then shared between the QLogic and the driver. The
request queue is used to queue requests (e.g., I/O requests) for the
QLogic HBA's RISC engine to copy into the HBA memory and process. The
response queue is used by the QLogic HBA's RISC engine to place results of
requests read from the request queue, as well as to place notification
of asynchronous events (e.g., incoming commands in target mode).

To give a bit more precise scale to the preceding description, the QLogic
HBA has 8 dual-ranked 16 bit mailbox registers, mostly for out-of-band
control purposes. The QLogic HBA then utilizes a circular request queue
of 64 byte fixed size Queue Entries to receive normal initiator mode
I/O commands (or continue target mode requests). The request queue may
be up to 256 elements for the QLogic 1020 and 1040 chipsets, but may
be considerably larger for the QLogic 12X0/12160 SCSI and QLogic 2X00
Fibre Channel chipsets.

In addition to synchronously initiated usage of mailbox commands by
the host system, the QLogic may also deliver asynchronous notifications
solely in outgoing mailbox registers. These asynchronous notifications in
mailboxes may be things like notification of SCSI Bus resets, or that the
Fabric Name server has sent a change notification, or even that a specific
I/O command completed without error (this is called 'Fast Posting'
and saves the QLogic HBA from having to write a response queue entry).

The QLogic HBA is an interrupting card, and when servicing an interrupt
you really only have to check for either a mailbox interrupt or an
interrupt notification that the response queue has an entry to
be dequeued.
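The circular queue bookkeeping can be sketched as follows. This is a
minimal model under assumed names: the real index macros live in
ispvar.h and the 64-byte queue entry layouts in ispmbox.h.

```c
#include <stdint.h>

#define RQUEST_QUEUE_LEN 256    /* e.g., the 1020/1040 maximum */

/* Advance a queue index with power-of-two wraparound. */
#define ISP_NXT_QENTRY(idx, qlen)  (((idx) + 1) & ((qlen) - 1))

/*
 * The host's side of producing one request queue entry: *inp is the
 * host's producer index, outp the HBA's consumer index (readable from
 * the chip). Returns 0 on success, -1 if the queue is full.
 */
static int
isp_put_request(uint16_t *inp, uint16_t outp)
{
        uint16_t nxt = ISP_NXT_QENTRY(*inp, RQUEST_QUEUE_LEN);

        if (nxt == outp)        /* full: HBA hasn't caught up yet */
                return (-1);
        /* ... copy the 64-byte queue entry into slot *inp here ... */
        *inp = nxt;             /* then write the new in-pointer to */
        return (0);             /* the chip to signal a new request */
}
```

The response queue works the same way with the producer and consumer
roles reversed: the HBA advances the in-pointer, the host the out-pointer.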
4.3 Fibre Channel SCSI out of SCSI

QLogic took the approach in introducing the 2X00 cards of just treating
FC-AL as a 'fat' SCSI bus (a SCSI bus with more than 15 targets). All
of the things that you really need to do with Fibre Channel with respect
to providing FC-4 services on top of a Class 3 connection are performed
by the RISC engine on the QLogic card itself. This means that from
an HBA driver point of view, very little needs to change that would
distinguish addressing a Fibre Channel disk from addressing a plain
old SCSI disk.

However, in the details it's not *quite* that simple. For example, in
order to manage Fabric Connections, the HBA driver has to do explicit
binding of entities it has queried from the name server to specific
'target' ids (targets, in this case, being a virtual entity).

Still, the HBA firmware really does nearly all of the tedious management
of Fibre Channel login state. The corollary to this is sometimes a
lack of ability to say why a particular login connection to a Fibre
Channel disk is not working well.

There are clear limits with the QLogic card in managing fabric devices.
The QLogic manages local loop devices (Loop ID or Target 0..126) itself,
but for the management of fabric devices, it has an absolute limit of
253 simultaneous connections (256 entries less 3 reserved entries).
5. Driver Architecture

5.1 Driver Assumptions

The first basic assumption for this driver is that the requirements for
a SCSI HBA driver on any system are those of a 2 or 3 layer model where
there are SCSI target device drivers (drivers which drive SCSI disks,
SCSI tapes, and so on), possibly a middle services layer, and a bottom
layer that manages the transport of SCSI CDBs out a SCSI bus (or across
Fibre Channel) to a SCSI device. It's assumed that each SCSI command is
a separate structure (or pointer to a structure) that contains the SCSI
CDB and a place to store SCSI Status and SCSI Sense Data.

This turns out to be a pretty good assumption. All of the Open Source
systems (*BSD and Linux) and most of the proprietary systems have this
kind of structure. This has been the way to manage SCSI subsystems for
at least ten years.

There are some additional basic assumptions that this driver makes,
primarily in the arena of basic simple services like memory zeroing,
memory copying, delay, sleep, and microtime functions. It doesn't assume
much more than this.
5.2 Overall Driver Architecture

The driver is split into a core (machine independent) module and platform
and bus specific outer modules (machine dependent).

The core code (in the files isp.c, isp_inline.h, ispvar.h, ispreg.h and
ispmbox.h) handles:

+ Chipset recognition and reset and firmware download (isp_reset)
+ Board Initialization (isp_init)
+ First level interrupt handling (response retrieval) (isp_intr)
+ A SCSI command queueing entry point (isp_start)
+ A set of control services accessed either via local requirements within
  the core module or via an externally visible control entry point
  (isp_control).

The platform/bus specific modules (and definitions) depend on each
platform, and they provide both definitions and functions for the core
module's use. Generally a platform module set is split into a bus
dependent module (where configuration is begun from and bus specific
support functions reside) and a relatively thin platform specific layer
which serves as the interconnect with the rest of this platform's SCSI
subsystem.

For ease of bus specific access issues, a centralized soft state
structure is maintained for each HBA instance (struct ispsoftc). This
soft state structure contains a machine/bus dependent vector (mdvec)
for functions that read and write hardware registers, set up DMA for the
request/response queues and fibre channel scratch area, set up and tear
down DMA mappings for a SCSI command, provide a pointer to firmware to
load, and other minor things.

The machine dependent outer module must provide functional entry points
for the core module:

+ A SCSI command completion handoff point (isp_done)
+ An asynchronous event handler (isp_async)
+ A logging/printing function (isp_prt)

The machine dependent outer module code must also provide a set of
abstracting definitions that the core module utilizes heavily
to do its job. These are discussed in detail in the comments in the
file ispvar.h, but to give a sense of the range of what is required,
let's illustrate two basic classes of these defines.

The first class is the "structure definition/access" class. Examples
of these would be:

 XS_T			Platform SCSI transaction type (i.e., command for HBA)
 ..
 XS_TGT(xs)		gets the target from an XS_T
 ..
 XS_TAG_TYPE(xs)	which type of tag to use
 ..

The second class is the 'functional' class. Some examples of
this class are:

 MEMZERO(dst, amt)	platform zeroing function
 ..
 MBOX_WAIT_COMPLETE(struct ispsoftc *)	wait for mailbox cmd to be done

Note that the former is likely to be a simple replacement with bzero or
memset on most systems, while the latter could be quite complex.

This soft state structure also contains different parameter information
based upon whether this is a SCSI HBA or a Fibre Channel HBA (which is
filled in by the core module).

In order to clear up what is undoubtedly a seeming confusion of
interconnects, a description of the typical flow of code that performs
board initialization and command transactions may help.
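As a hedged sketch of what a platform's definitions might look like: the
macro names follow the patterns listed above, but the bodies and the
'struct my_scsi_cmd' type are invented for a hypothetical platform, not
taken from any real port.

```c
#include <string.h>

/* Hypothetical platform command structure (invented for the sketch). */
struct my_scsi_cmd {
        int     target;         /* this platform's target id field */
        int     tag_type;       /* this platform's tag type field  */
};

/* "Structure definition/access" class: type and accessors. */
#define XS_T            struct my_scsi_cmd
#define XS_TGT(xs)      ((xs)->target)
#define XS_TAG_TYPE(xs) ((xs)->tag_type)

/* "Functional" class: often a one-line mapping to a libc/kernel call. */
#define MEMZERO(dst, amt)       memset((dst), 0, (amt))
```

Because the core module only ever uses the macro names, a port can be as
simple or as elaborate behind them as its SCSI subsystem requires.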
5.3 Initialization Code Flow

Typically a bus specific module for a platform (e.g., one that wants
to configure a PCI card) is entered via that platform's configuration
methods. If this module recognizes a card and can utilize or construct the
space for the HBA instance softc, it does so, and initializes the machine
dependent vector as well as any other platform specific information that
can be hidden in or associated with this structure.

Configuration at this point usually involves mapping in board registers
and registering an interrupt. It's quite possible that the core module's
isp_intr function is adequate to be the interrupt entry point, but often
it's more useful to have a bus specific wrapper module that calls isp_intr.

After mapping and interrupt registry is done, isp_reset is called.
Part of the isp_reset call may cause callbacks out to the bus dependent
module to perform allocation and/or mapping of Request and Response
queues (as well as a Fibre Channel scratch area if this is a Fibre
Channel HBA). The reason this is considered 'bus dependent' is that
only the bus dependent module may have the information that says how
I/O mapping of the Request and Response queues could be performed
(something that is, e.g., on a Solaris system, bus dependent). Another
callback can enable the *use* of interrupts should this platform be able
to finish configuration in interrupt driven mode.

If isp_reset is successful at resetting the QLogic chipset and downloading
new firmware (if available) and setting it running, isp_init is called. If
isp_init is successful in doing initial board setups (including reading
NVRAM from the QLogic card), then this bus specific module will call the
platform dependent module that takes the appropriate steps to 'register'
this HBA with this platform's SCSI subsystem. Examining either the
OpenBSD or the NetBSD isp_pci.c or isp_sbus.c files may assist the reader
here in clarifying some of this.
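The spine of that flow can be sketched as below. The function names
isp_reset and isp_init are the real core entry points; the reduced
struct ispsoftc, the stub bodies, and the return conventions are
stand-ins so the sketch is self-contained.

```c
#include <errno.h>

/*
 * Minimal stand-ins: the real struct ispsoftc, isp_reset() and
 * isp_init() live in the core module (ispvar.h / isp.c).
 */
enum isp_state { ISP_NILSTATE, ISP_RESETSTATE, ISP_INITSTATE };
struct ispsoftc { enum isp_state isp_state; };

static void isp_reset(struct ispsoftc *isp) { isp->isp_state = ISP_RESETSTATE; }
static void isp_init(struct ispsoftc *isp)  { isp->isp_state = ISP_INITSTATE; }

/* A bus front end's configuration path, reduced to its spine. */
static int
isp_sketch_attach(struct ispsoftc *isp)
{
        isp_reset(isp);                 /* reset chip, download firmware */
        if (isp->isp_state != ISP_RESETSTATE)
                return (ENXIO);
        isp_init(isp);                  /* board setup, read NVRAM       */
        if (isp->isp_state != ISP_INITSTATE)
                return (ENXIO);
        /* now 'register' this HBA with the platform SCSI subsystem */
        return (0);
}
```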
5.4 Initiator Mode Command Code Flow

A successful execution of isp_init will lead to the driver 'registering'
itself with this platform's SCSI subsystem. One assumed action for this
is the registry of a function that the SCSI subsystem for this platform
will call when it has a SCSI command to run.

The platform specific module function that receives this will do whatever
it needs to do to prepare this command for execution in the core
module. This sounds vague, but it's also very flexible. In principle,
this could be a complete marshalling/demarshalling of this platform's
SCSI command structure (should it be impossible to represent in an
XS_T). In addition, this function can also block commands from running
(if, e.g., Fibre Channel loop state would preclude successful starting
of the command).

When it's ready to do so, the function isp_start is called with this
command. This core module function tries to allocate request queue space
for this command. It also calls through the machine dependent vector
function to make sure any DMA mapping for this command is done.

Now, DMA mapping here is possibly a misnomer, as more than just
DMA mapping can be done in this bus dependent function. This is
also the place where any endian byte-swizzling will be done. At any
rate, this function is called last because the process of establishing
DMA addresses for any command may in fact consume more Request Queue
entries than are currently available. If the mapping and other
functions are successful, the QLogic request queue input pointer register
is updated to indicate to the QLogic that it has a new request to
read.

If this function is unsuccessful, policy as to what to do at this point is
left to the machine dependent platform function which called isp_start. In
some platforms, temporary resource shortages can be handled by the main
SCSI subsystem. In other platforms, the machine dependent code has to
handle this.

In order to keep track of commands that are in progress, the soft state
structure contains an array of 'handles' that are associated with each
active command. When you send a command to the QLogic firmware, a portion
of the Request Queue entry can contain a non-zero handle identifier so
that at a later point in time, in reading either a Response Queue entry
or a Fast Posting mailbox completion interrupt, you can use this
handle to find the command you were waiting on. It should be noted that
this is probably one of the most dangerous areas of this driver. Corrupted
handles will lead to system panics.

At some later point in time an interrupt will occur. Eventually,
isp_intr will be called. This core module function will determine the
cause of the interrupt, and whether it is for a completing command. If
so, it'll determine the handle and fetch the pointer to the command out
of storage within the soft state structure. Skipping over a lot of
details, the machine dependent code supplied function isp_done is called
with the pointer to the completing command. This would then be the glue
layer that informs the SCSI subsystem for this platform that a command
is complete.
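The handle scheme can be sketched as a toy model. The function names
isp_save_xs and isp_find_xs mirror the driver's naming pattern, but the
bodies below are simplified assumptions: the real driver packs more into
the handle and validates it defensively, precisely because a corrupted
handle is fatal.

```c
#include <stddef.h>

#define MAXCMDS 64

/*
 * xflist[i] holds the command whose queue-entry handle is i + 1;
 * handle 0 is reserved to mean "no command", so a zero handle in a
 * Request Queue entry can never match an active command.
 */
static void *xflist[MAXCMDS];

/* Bind a command to a free slot; returns a non-zero handle, 0 if full. */
static unsigned
isp_save_xs(void *cmd)
{
        unsigned i;

        for (i = 0; i < MAXCMDS; i++) {
                if (xflist[i] == NULL) {
                        xflist[i] = cmd;
                        return (i + 1);
                }
        }
        return (0);
}

/* Map a completion's handle back to its command; NULL if corrupt. */
static void *
isp_find_xs(unsigned handle)
{
        if (handle == 0 || handle > MAXCMDS)
                return (NULL);
        return (xflist[handle - 1]);
}
```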
5.5 Asynchronous Events

Interrupts occur for events other than the completion of commands
(mailbox or request queue started commands). These are called
Asynchronous Mailbox interrupts. When some external event causes the
SCSI bus to be reset, or when a Fibre Channel loop changes state (e.g.,
a LIP is observed), this generates such an asynchronous event.

Each platform module has to provide an isp_async entry point that will
handle a set of these. This isp_async entry point also handles things
which aren't properly async events but are simply natural outgrowths
of code flow for another core function (see the discussion on fabric
device management below).
5.6 Target Mode Code Flow

This section could use a lot of expansion, but this covers the basics.

The QLogic cards, when operating in target mode, follow a code flow that
is essentially the inverse of that for initiator mode described above. In
this scenario, an interrupt occurs, and present on the Response Queue is
a queue entry element defining a new command arriving from an initiator.

This is passed to a possibly external target mode handler. This driver
provides some handling for this in a core module, but also leaves
things open enough that a completely different target mode handler
may accept this incoming queue entry.

The external target mode handler then turns around and forms up a
response to this 'response' that just arrived, which is then placed on
the Request Queue and handled very much like an initiator mode command
(i.e., calling the bus dependent DMA mapping function). If this entry
completes the command, nothing more need occur. But often this handles
only part of the requested command, so the QLogic firmware will rewrite
the response to the initial 'response' again onto the Response Queue,
whereupon the target mode handler will respond to that, and so on until
the command is completely handled.

Because almost no platform provides basic SCSI Subsystem target mode
support, this design has been left extremely open ended, and as such
it's a bit hard to describe in more detail than this.
5.7 Locking Assumptions

The observant reader by now is likely to have asked the question, "but
what about locking? Or interrupt masking?"

The basic assumption is that the core module does not know
anything directly about locking or interrupt masking. It may assume
upon entry (e.g., via isp_start, isp_control, isp_intr) that appropriate
locking and interrupt masking has been done.

The platform dependent code may also therefore assume that if it is
called (e.g., isp_done or isp_async), any locking or masking that
was in place upon the entry to the core module is still there. It is up
to the platform dependent code to worry about avoiding any lock nesting
issues. As an example of this, the Linux implementation simply queues
up commands completed via the callout to isp_done, which it then pushes
out to the SCSI subsystem after its call to isp_intr returns (with locks
dropped appropriately, as well as avoidance of deep interrupt stacks).

Recent changes in the design have now eased what had been an original
requirement that, while in the core module, no locks or interrupt
masking could be dropped. It's now up to each platform to figure out how
to implement this. This is principally used in the execution of mailbox
commands (which are principally used for Loop and Fabric management via
the isp_control function).
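The Linux-style deferral mentioned above can be sketched generically.
The list type and helper names below are invented; the only point being
made is that isp_done, running with the core lock held, merely queues,
and the queue is drained after the lock is dropped.

```c
#include <stddef.h>

/* Hypothetical command carrying an intrusive completion-queue link. */
struct cmd {
        struct cmd *next;
        int         completed;
};

static struct cmd *doneq;       /* per-instance completion queue */

/* Called from the core with the lock held: defer, don't recurse. */
static void
isp_done(struct cmd *c)
{
        c->next = doneq;
        doneq = c;
}

/* Called after isp_intr returns and the lock is dropped. */
static void
drain_doneq(void)
{
        struct cmd *c;

        while ((c = doneq) != NULL) {
                doneq = c->next;
                c->completed = 1;  /* hand off to the SCSI subsystem */
        }
}
```

This avoids both lock nesting (the SCSI subsystem may want its own
locks) and deep interrupt stacks.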
5.8 SCSI Specifics

The driver core or platform dependent architecture issues that are
specific to SCSI are few. There is a basic assumption that the QLogic
firmware-supported Automatic Request Sense will work; there is no
particular provision for disabling its usage on a per-command basis.
5.9 Fibre Channel Specifics

Fibre Channel presents an interesting challenge here. The QLogic firmware
architecture for dealing with Fibre Channel as just a 'fat' SCSI bus
is fine on the face of it, but there are some subtle and not so subtle
problems here.

5.9.1 Firmware State

Part of the initialization (isp_init) for Fibre Channel HBAs involves
sending a command (Initialize Control Block) that establishes Node
and Port WWNs as well as topology preferences. After this occurs,
the QLogic firmware tries to traverse through several states:

	FW_CONFIG_WAIT
	FW_WAIT_AL_PA
	FW_WAIT_LOGIN
	FW_READY
	FW_LOSS_OF_SYNC
	FW_ERROR
	FW_REINIT
	FW_NON_PART

It starts with FW_CONFIG_WAIT, attempts to get an AL_PA (if on an FC-AL
loop instead of being connected as an N-port), waits to log into all
FC-AL loop entities and then hopefully transitions to FW_READY state.

Clearly, no command should be attempted before the FW_READY state is
achieved. This is checked by the core internal function isp_fclink_test
(reachable via isp_control with the ISPCTL_FCLINK_TEST function
code). This function also determines connection topology (i.e., whether
we're attached to a fabric or not).
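The readiness gate amounts to a bounded poll. In this sketch the state
names are the ones listed above, but their numeric values, the stand-in
query function, and the polling helper are all assumptions; the real
check issues a Get Firmware State mailbox command and delays between
polls.

```c
/* The firmware states above as a stand-in enum (values are assumed). */
enum fw_state {
        FW_CONFIG_WAIT, FW_WAIT_AL_PA, FW_WAIT_LOGIN, FW_READY,
        FW_LOSS_OF_SYNC, FW_ERROR, FW_REINIT, FW_NON_PART
};

/* Stand-in for issuing a Get Firmware State mailbox command. */
static enum fw_state current_fw_state = FW_WAIT_LOGIN;

static enum fw_state
isp_get_fw_state(void)
{
        return (current_fw_state);
}

/* No command should be started before this returns 0 (FW_READY). */
static int
isp_sketch_wait_ready(int maxtries)
{
        while (maxtries-- > 0) {
                if (isp_get_fw_state() == FW_READY)
                        return (0);
                /* the real code delays here between polls */
        }
        return (-1);
}
```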
5.9.2. Loop State Transitions- From Nil to Ready

Once the firmware has transitioned to a ready state, the state of the
connection to either arbitrated loop or to a fabric has to be
ascertained, and the identity of all loop members (and fabric members)
validated.

This can be very complicated, and it isn't made easy in that the QLogic
firmware manages PLOGI and PRLI to devices that are on a local loop, but
it is the driver that must manage PLOGI/PRLI with devices on the fabric.

In order to manage this, the driver maintains a staging of the current
"Loop" state (where "Loop" is taken to mean FC-AL or N- or F-port
connections), in the following ascending order:

	LOOP_NIL
	LOOP_LIP_RCVD
	LOOP_PDB_RCVD
	LOOP_SCANNING_FABRIC
	LOOP_FSCAN_DONE
	LOOP_SCANNING_LOOP
	LOOP_LSCAN_DONE
	LOOP_SYNCING_PDB
	LOOP_READY

When the core code initializes the QLogic firmware, it sets the loop
state to LOOP_NIL. The first 'LIP Received' asynchronous event sets the
state to LOOP_LIP_RCVD. This should be followed by a "Port Database
Changed" asynchronous event which will set the state to
LOOP_PDB_RCVD. Each of these states, when entered, causes an isp_async
event call to the machine dependent layers with the
ISPASYNC_CHANGE_NOTIFY code.

After the state of LOOP_PDB_RCVD is reached, the internal core function
isp_scan_fabric (reachable via isp_control(..ISPCTL_SCAN_FABRIC)) will,
if the connection is to a fabric, use Simple Name Server mailbox mediated
commands to dump the entire fabric contents. For each new entity, an
isp_async event will be generated that says a Fabric device has arrived
(ISPASYNC_FABRIC_DEV). The function that isp_async must perform in this
step is to insert (or possibly remove) devices that it wants to have the
QLogic firmware log into (at the LOOP_SYNCING_PDB state level).

After this has occurred, the state LOOP_FSCAN_DONE is set, and then the
internal function isp_scan_loop (isp_control(...ISPCTL_SCAN_LOOP)) can
be called, which will then scan for any local (FC-AL) entries by asking
the QLogic firmware for a Port Database entry for each possible local
loop id. It's at this level that some locally cached entries are purged
or shifting loop ids are managed (see section 5.9.4).

The final step after this is to call the internal function isp_pdb_sync
(isp_control(..ISPCTL_PDB_SYNC)). The purpose of this function is to
perform the PLOGI/PRLI functions for fabric devices. The next state
entered after this is LOOP_READY, which means that the driver is ready
to process commands to send to Fibre Channel devices.
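The sequence above might be driven as follows. The ISPCTL_* names are
the real isp_control function codes; the stub isp_control (the real one
also takes the softc and an argument pointer) and the wrapper function
are assumptions made so the sketch is self-contained.

```c
/* Stand-ins for the isp_control() function codes named above. */
enum ispctl {
        ISPCTL_FCLINK_TEST, ISPCTL_SCAN_FABRIC,
        ISPCTL_SCAN_LOOP, ISPCTL_PDB_SYNC
};

static int steps_run;

/* Stub: the real isp_control(isp, code, arg) returns zero on success. */
static int
isp_control(enum ispctl code)
{
        (void) code;
        steps_run++;
        return (0);
}

/* Drive the connection from "firmware ready" through to LOOP_READY. */
static int
isp_sketch_runstate(void)
{
        if (isp_control(ISPCTL_FCLINK_TEST))    /* FW_READY + topology */
                return (-1);
        if (isp_control(ISPCTL_SCAN_FABRIC))    /* SNS: fabric devices */
                return (-1);
        if (isp_control(ISPCTL_SCAN_LOOP))      /* local loop port DBs */
                return (-1);
        if (isp_control(ISPCTL_PDB_SYNC))       /* PLOGI/PRLI fabric   */
                return (-1);
        return (0);                             /* LOOP_READY          */
}
```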
5.9.3 Fibre Channel variants of Initiator Mode Code Flow

The code flow in isp_start for Fibre Channel devices is the same as it is
for SCSI devices, but with a notable exception.

Maintained within the fibre channel specific portion of the driver soft
state structure is a distillation of the existing population of both
local loop and fabric devices. Because Loop IDs can shift on a local
loop but we wish to retain a 'constant' Target ID (see 5.9.4), this
is indexed directly via the Target ID for the command (XS_TGT(xs)).

If there is a valid entry for this Target ID, the command is started
(with the stored 'Loop ID'). If not, the command is completed with
an error that looks just like a SCSI Selection Timeout error.

This code is currently somewhat in transition. Some platforms do
firmware and loop state management (as described above) at this
point. Other platforms manage this from the machine dependent layers. The
important function to watch in this respect is isp_fc_runstate (in
isp_inline.h).
5.9.4 "Target" in Fibre Channel is a fixed virtual construct

Very few systems can cope with the notion that the "Target" for a disk
device can change while you're using it. But one of the properties of
arbitrated loop is that the physical bus address for a loop member
(the AL_PA) can change depending on when and how things are inserted in
the loop.

To illustrate this, let's take an example. Let's say you start with a
loop that has 5 disks in it. At boot time, the system will likely find
them and see them in this order:

	disk#	Loop ID		Target ID
	disk0	0		0
	disk1	1		1
	disk2	2		2
	disk3	3		3
	disk4	4		4

The driver uses 'Loop ID' when it forms requests to send a command to
each disk. However, it reports to NetBSD that things exist as 'Target
ID'. As you can see here, there is perfect correspondence between disk,
Loop ID and Target ID.

Let's say you add a new disk between disk2 and disk3 while things are
running. You don't really often see this, but you *could* see this where
the loop has to renegotiate, and you end up with:

	disk#	Loop ID		Target ID
	disk0	0		0
	disk1	1		1
	disk2	2		2
	diskN	3		?
	disk3	4		?
	disk4	5		?

Clearly, you don't want disk3 and disk4's "Target ID" to change while
you're running since currently mounted filesystems will get trashed.

What the driver is supposed to do (this is the function of isp_scan_loop)
is regenerate things such that the following then occurs:

	disk#	Loop ID		Target ID
	disk0	0		0
	disk1	1		1
	disk2	2		2
	diskN	3		5
	disk3	4		3
	disk4	5		4

So, "Target" is a virtual entity that is maintained while you're running.
6. Glossary

HBA -	Host Bus Adapter

SCSI -	Small Computer System Interface
7. References

Various URLs of interest:

	http://www.netbsd.org	-	NetBSD's Web Page
	http://www.openbsd.org	-	OpenBSD's Web Page
	http://www.freebsd.org	-	FreeBSD's Web Page

	http://www.t10.org	-	ANSI SCSI Committee's Web Page
					(SCSI Specs)
	http://www.t11.org	-	NCITS Device Interface Web Page
					(Fibre Channel Specs)

------------------------------------------------------------------------
sys/dev/isp/Hardware.txt
------------------------------------------------------------------------

/* $FreeBSD$ */

	Hardware that is Known To or Should Work with This Driver


0. Intro

This is not an endorsement for hardware vendors (there will be no
"where to buy" URLs here, with a couple of exceptions). This is simply
a list of things I know work, or should work, plus maybe a couple of
notes as to what you should do to make it work. Corrections accepted.
Even better would be to send me hardware so I can test it.

I'll put a rough range of costs in US$ that I know about. No doubt
it'll differ from your expectations.
1. HBAs

Qlogic	2100, 2102
	2200, 2202, 2204

	There are various suffixes that indicate copper or optical
	connectors, or 33 vs. 66MHz PCI bus operation. None of these
	have a software impact.

	Approx cost: 1K$ for a 2200

Qlogic	2300, 2312

	These are the new 2-Gigabit cards. Optical only.

	Approx cost: ??????

Antares	P-0033, P-0034, P-0036

	There are many other vendors that use the Qlogic 2X00 chipset.
	Some older 2100 boards (not on this list) have a bug in the ROM
	that causes a failure to download newer firmware that is larger
	than 0x7fff words.

	Approx cost: 850$ for a P-0036

In general, the 2200 class chip is to be preferred.
2. Hubs

Vixel 1000
Vixel 2000

	Of the two, the 1000 (7 ports vs. 12 ports) has had fewer
	problems- it's an old workhorse.

	Approx cost: 1.5K$ for Vixel 1000, 2.5K$ for 2000

Gadzoox Cappellix 3000

	Don't forget to use telnet to configure the Cappellix ports for
	the role you're using them in- otherwise things don't work well
	at all.

	(cost: I have no idea... certainly less than a switch)
3. Switches

Brocade Silkworm II
Brocade 2400
	(other Brocades should be fine)

	Especially with revision 2 or higher f/w, this is now best of
	breed for fabrics or segmented loop (which Brocade calls
	"QuickLoop").

	For the Silkworm II, set the operating mode to "Tachyon"
	(mode 3).

	The web interface isn't good- but telnet is what I prefer
	anyhow.

	You can't connect a Silkworm II and the other Brocades together
	as E-ports to make a large fabric (at least with the f/w *I*
	had for the Silkworm II).

	Approx cost of a Brocade 2400 with no GBICs is about 8K$ when I
	recently checked the US Government SEWP price list- no doubt
	it'll be a bit more for others. I'd assume around 10K$.

ANCOR SA-8

	This also is a fine switch, but you have to use a browser with
	working Java to manage it- which is a bit of a pain. It also
	supports fabric and segmented loop.

	These switches don't form E-ports with each other for a larger
	fabric.

	(cost: no idea)

McData (model unknown)

	I tried one exactly once for 30 minutes. It seemed to work once
	I added the "register FC4 types" command to the driver.

	(cost: very, very expensive, 40K$ plus)
4. Cables/GBICs

Multimode optical is adequate for Fibre Channel- the same cable is
used for Gigabit Ethernet.

Copper DB-9 and copper HSSDC connectors are also fine. Copper and
optical are both rated to 1.0625 Gbit- copper is naturally shorter
(the longest I've used is a 15 meter cable, but it's supposed to go
longer).

The reason to use copper instead of optical is that if you step on one
of the really fat DB-9 cables, it'll survive. Optical usually dies
quickly if you step on it.

Approx cost: I don't know what optical costs- you can expect to pay
maybe 100$ for a 3m copper cable.

GBICs-

	I use Finisar copper and IBM optical GBICs.

	Approx Cost: Copper GBICs are 70$ each. Opticals are twice that
	or more.

Vendor: (this is the one exception I'll make, because it turns out to
be an incredible pain to find FC copper cabling and GBICs- the source
I use for GBICs and copper cables is http://www.scsi-cables.com)

Other:	There now is apparently a source for little connector boards
	to connect to bare drives: http://www.cinonic.com.
5. Storage JBODs/RAID

JMR 4-Bay

	Rinky-tink, but a solid 4-bay, loop-only entry model.

	I paid 1000$ for mine- overpriced, IMO.

JMR Fortra

	I rather like this box. The blue LEDs are a very nice touch-
	you can see them very clearly from 50 feet away.

	I paid 2000$ for one used.

Sun A5X00

	Very expensive (in my opinion) but well crafted. Has two SES
	instances, so you can use the ses driver (and the example code
	in /usr/share/examples) for power/thermal/slot monitoring.

	Approx Cost: The last I saw for a price list item on this was
	22K$ for an unpopulated (no disk drives) A5X00.

DataDirect E1000 RAID

	Don't connect both the SCSI and FC interfaces at the same
	time- a SCSI reset will cause the DataDirect to think you want
	to use the SCSI interface, and a LIP on the FC interface will
	cause it to think you want to use the FC interface. Use only
	one connector at a time so both you and the DataDirect are
	sure about what you want.

	Cost: I have no idea.

Veritas ServPoint

	This is a software storage virtualization engine that runs on
	Sparc/Solaris in target mode for the frontend, with other FC or
	SCSI devices as the backend storage. FreeBSD has been used
	extensively to test it.

	Cost: I have no idea.
6. Disk Drives

I have used lots of different Seagate and a few IBM drives and
typically have had few problems with them. These are the bare drives
with 40-pin SCA connectors in back. They go into the JBODs you
assemble.

Seagate does make, but I can no longer find, a little paddleboard
single-drive connector that goes from DB-9 FC to the 40-pin SCA
connector- primarily for you to try out and evaluate a single FC
drive.

All FC-AL disk drives are dual ported (i.e., have separate 'A' and 'B'
ports- which are completely separate loops). This seems to work
reasonably enough, but I haven't tested it much. It really depends on
the JBOD you put the drives in whether this dual porting is carried to
the outside world. The JMR boxes have it. For the Sun A5X00 you have
to pay for an extra IB card to carry it out.

Approx Cost: You'll find that FC drives are the same cost as, if not
slightly cheaper than, the equivalent Ultra3 SCSI drives.
7. Recommended Configurations

These are recommendations that are biased toward the cautious side.
They do not represent formal engineering commitments- just suggestions
as to what I would expect to work.

A. The simplest form of connection topology I can suggest for a small
SAN (i.e., a replacement for a SCSI JBOD/RAID):

	HOST
	2xxx <----------> Single Unit of Storage (JBOD, RAID)

This is called a PL_DA (Private Loop, Direct Attach) topology.

B. The next most simple form of connection topology I can suggest for
a medium local SAN (where you do not plan to do dynamic insertion and
removal of devices while I/Os are active):

	HOST
	2xxx <----------> +-------+
	                  | Vixel |
	                  | 1000  |
	                  |       +<---> Storage
	                  |       |
	                  |       +<---> Storage
	                  |       |
	                  |       +<---> Storage
	                  +-------+

This is a Private Loop topology. Remember that this can get very
unstable if you make it too long. A good practice is to try it in a
staged fashion.

It is possible with some units to "daisy chain", e.g.:

	HOST
	2xxx <----------> (JBOD, RAID) <--------> (JBOD, RAID)

In practice I have had poor results with these configurations. They
*should* work fine, but with both the JMR and the Sun A5X00 I tend to
get LIP storms, so the second unit just isn't seen and the loop isn't
stable.

Now, this could simply be my lack of clean, newer h/w (or, in general,
a lack of h/w), but I would recommend the use of a hub if you want to
stay with Private Loop and have more than one FC target.

C. You should also note this can begin to be the basis for a shared
SAN solution. For example, the above configuration can be extended to
be:

	HOST
	2xxx <----------> +-------+
	                  | Vixel |
	                  | 1000  |
	                  |       +<---> Storage
	                  |       |
	                  |       +<---> Storage
	                  |       |
	                  |       +<---> Storage
	HOST              |       |
	2xxx <----------> +-------+

However, note that there is nothing to mediate locking of devices, and
it is also conceivable that the reboot of one host can, by causing a
LIP storm, cause problems with the I/Os from the other host. (In other
words, this topology hasn't really been made safe yet for this
driver.)

D. You can repeat the topology in #B with a switch that is set to be
in segmented loop mode. This avoids LIPs propagating where you don't
want them to- and this makes for a much more reliable, if more
expensive, SAN.

E. The next level of complexity is a Switched Fabric. The following
topology is good when you begin to want more performance. Private and
Public Arbitrated Loop, while 100MB/s, is a shared medium. Direct
connections to a switch can run full-duplex at full speed.

	HOST
	2xxx <----------> +---------+
	                  | Brocade |
	                  | 2400    |
	                  |         +<---> Storage
	                  |         |
	                  |         +<---> Storage
	                  |         |
	                  |         +<---> Storage
	HOST              |         |
	2xxx <----------> +---------+

I would call this the best configuration available now. It can expand
substantially if you cascade switches.

There is a hard limit of about 253 devices for each Qlogic HBA- and
the fabric login policy is simplistic (log them in as you find them).
If somebody actually runs into a configuration that's larger, let me
know and I'll work on some tools that would allow you some policy
choices as to which devices would be the interesting ones to actually
connect to.