Add some somewhat vague documentation for this driver and a list
of Hardware that might, in fact, work.
mjacob 2001-10-07 18:26:47 +00:00
parent 9fcab64afc
commit 94ad0dbb03
2 changed files with 938 additions and 0 deletions

@@ -0,0 +1,634 @@
/* $FreeBSD$ */
Driver Theory of Operation Manual
1. Introduction
This is a short text document that will describe the background, goals
for, and current theory of operation for the joint Fibre Channel/SCSI
HBA driver for QLogic hardware.
Because this driver is an ongoing project, do not expect this manual
to remain entirely up to date. Like a lot of software engineering, the
ultimate documentation is the driver source. However, this manual should
serve as a solid basis for attempting to understand where the driver
started and what is trying to be accomplished with the current source.
The reader is expected to understand the basics of SCSI and Fibre Channel
and to be familiar with the range of platforms that Solaris, Linux and
the variant "BSD" Open Source systems are available on. A glossary and
a few references will be placed at the end of the document.
There will be references to functions and structures within the body of
this document. These can be easily found within the source using editor
tags or grep. There will be few code examples here as the code already
exists where the reader can easily find it.
2. A Brief History for this Driver
This driver originally started as part of work funded by NASA Ames
Research Center's Numerical Aerodynamic Simulation center ("NAS" for
short) for the QLogic PCI 1020 and 1040 SCSI Host Adapters as part of my
work at porting the NetBSD Operating System to the Alpha architectures
(specifically the AlphaServer 8200 and 8400 platforms). In short, it
started as just a simple single SCSI HBA driver for the purpose of
running off a SCSI disk. This work took place starting in January, 1997.
Because the first implementation was for NetBSD, which runs on a very
large number of platforms, and because NetBSD supported both systems with
SBus cards (e.g., Sun SPARC systems) as well as systems with PCI cards,
and because the QLogic SCSI cards came in both SBus and PCI versions, the
initial implementation followed the very thoughtful NetBSD design tenet
of splitting drivers into what are called MI (for Machine Independent)
and MD (Machine Dependent) portions. The original design therefore was
from the premise that the driver would drive both SBus and PCI card
variants. These busses are similar but have quite different constraints,
and while the QLogic SBus and PCI cards are very similar, there are some
significant differences.
After this initial goal had been met, there began to be some talk about
looking into implementing Fibre Channel mass storage at NAS. At this time
the QLogic 2100 FC/AL HBA was about to become available. After looking at
the way it was designed I concluded that it was so darned close to being
just like the SCSI HBAs that it would be insane to *not* leverage off of
the existing driver. So, we ended up with a driver for NetBSD that drove
PCI and SBus SCSI cards, and now also drove the QLogic 2100 FC-AL HBA.
After this, ports to non-NetBSD platforms became interesting as well.
This took the driver beyond NAS's interests and brought interested
support from a number of other places. Since the original NetBSD
development, the driver has been ported to FreeBSD, OpenBSD, Linux,
Solaris, and two proprietary systems. Following from the original MI/MD
design of NetBSD, a rather successful attempt has been made to keep the
Operating System Platform differences segregated and to a minimum.
Along the way, support for the 2200 as well as full fabric and target
mode support has been added, and 2300 support as well as an FC-IP stack
are planned.
3. Driver Design Goals
The driver did not start out as one normally would approach such an effort.
Normally you design via top-down methodologies, set an initial goal,
and meet it. This driver has had a design goal that has changed almost
from the very first. This has been an extremely peculiar, if not risque,
experience. As a consequence, this section of this document contains
a bit of "reconstruction after the fact" in that the design goals are
as I perceive them to be now- not necessarily what they started as.
The primary design goal now is to have a driver that can run both the
SCSI and Fibre Channel SCSI protocols on multiple OS platforms with
as little OS platform support code as possible.
The intended support targets for SCSI HBAs are the single and
dual channel PCI Ultra2 and PCI Ultra3 cards as well as the older PCI
Ultra single channel cards and SBus cards.
The intended support targets for Fibre Channel HBAs are the 2100, 2200
and 2300 PCI cards.
Fibre Channel support should include complete fabric and public loop
as well as private loop and private loop, direct-attach topologies.
FC-IP support is also a goal.
For both SCSI and Fibre Channel, simultaneous target/initiator mode support
is a goal.
Pure, raw, performance is not a primary goal of this design. This design,
because it has a tremendous amount of code common across multiple
platforms, will undoubtedly never be able to beat the performance of a
driver that is specifically designed for a single platform and a single
card. However, it is a good strong secondary goal to make the performance
penalties in this design as small as possible.
Another primary aim, which almost need not be stated, is that the
implementation of platform differences must not clutter up the common
code with platform specific defines. Instead, some reasonable layering
semantics are defined such that platform specifics can be kept in the
platform specific code.
4. QLogic Hardware Architecture
In order to make the design of this driver more intelligible, some
description of the Qlogic hardware architecture is in order. This will
not be an exhaustive description of how this card works, but will
note enough of the important features so that the driver design is
hopefully clearer.
4.1 Basic QLogic hardware
The QLogic HBA cards all contain a tiny 16-bit RISC-like processor and
varying sizes of SRAM. Each card contains a Bus Interface Unit (BIU)
as appropriate for the host bus (SBus or PCI). The BIUs allow access
to a set of dual-ranked 16 bit incoming and outgoing mailbox registers
as well as access to control registers that control the RISC or access
other portions of the card (e.g., Flash BIOS). The term 'dual-ranked'
means that, at the same host visible address, a write stores into one
mailbox register (incoming, to the HBA), while a read of that same
address returns a different mailbox register (outgoing, from the HBA)
with completely different data. Each HBA also then has core and
auxiliary logic which either is used to interface to a SCSI bus
(or to external bus drivers that connect to a SCSI bus), or to connect
to a Fibre Channel bus.
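
To make the 'dual-ranked' notion concrete, here is a minimal sketch. The
offset and helper names are invented for illustration (the real register
layout lives in ispreg.h): a write at a given offset lands in an incoming
mailbox, while a read at that same offset returns the corresponding
outgoing mailbox.

    /*
     * Illustrative sketch only: "dual-ranked" means a write and a read at
     * the same host address touch two different 16-bit registers.  The
     * offset below is hypothetical; the real offsets are in ispreg.h.
     */
    #include <stdint.h>

    #define MBOX0_OFFSET    0x70    /* hypothetical offset of mailbox 0 */

    static inline void
    mbox_write(volatile uint16_t *regs, int mbox, uint16_t val)
    {
        /* the write goes to the incoming (host -> HBA) mailbox */
        regs[(MBOX0_OFFSET / 2) + mbox] = val;
    }

    static inline uint16_t
    mbox_read(volatile uint16_t *regs, int mbox)
    {
        /* the read returns the outgoing (HBA -> host) mailbox instead */
        return (regs[(MBOX0_OFFSET / 2) + mbox]);
    }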
4.2 Basic Control Interface
There are two principal I/O control mechanisms by which the driver
communicates with and controls the QLogic HBA. The first mechanism is to
use the incoming mailbox registers to interrupt and issue commands to
the RISC processor (with results usually, but not always, ending up in
the outgoing mailbox registers). The second mechanism is to establish,
via mailbox commands, circular request and response queues in system
memory that are then shared between the QLogic and the driver. The
request queue is used to queue requests (e.g., I/O requests) for the
QLogic HBA's RISC engine to copy into the HBA memory and process. The
result queue is used by the QLogic HBA's RISC engine to place results of
requests read from the request queue, as well as to place notification
of asynchronous events (e.g., incoming commands in target mode).
To give a bit more precise scale to the preceding description, the QLogic
HBA has 8 dual-ranked 16 bit mailbox registers, mostly for out-of-band
control purposes. The QLogic HBA then utilizes a circular request queue
of 64 byte fixed size Queue Entries to receive normal initiator mode
I/O commands (or continue target mode requests). The request queue may
be up to 256 elements for the QLogic 1020 and 1040 chipsets, but may
be considerably larger for the QLogic 12X0/12160 SCSI and QLogic 2X00 Fibre
Channel chipsets.
In addition to synchronously initiated usage of mailbox commands by
the host system, the QLogic may also deliver asynchronous notifications
solely in outgoing mailbox registers. These asynchronous notifications in
mailboxes may be things like notification of SCSI Bus resets, or that the
Fabric Name server has sent a change notification, or even that a specific
I/O command completed without error (this is called 'Fast Posting'
and saves the QLogic HBA from having to write a response queue entry).
The QLogic HBA is an interrupting card, and when servicing an interrupt
you really only have to check for either a mailbox interrupt or an
interrupt notification that the response queue has an entry to
be dequeued.
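
A simplified sketch of that service loop follows. All of the structure and
helper names here are placeholders invented for illustration; the real
logic is isp_intr in isp.c.

    /*
     * Sketch of the two things an interrupt service routine checks:
     * a mailbox interrupt, or new entries on the response queue.
     * Names are placeholders; the real code is isp_intr() in isp.c.
     */
    #include <stdint.h>

    struct sketch_isp {
        uint16_t mbox_interrupt_pending; /* derived from BIU status */
        uint16_t resp_in;                /* index the HBA has written up to */
        uint16_t resp_out;               /* index the host has consumed up to */
        uint16_t resp_entries;           /* size of the response queue */
    };

    void handle_mailbox_event(struct sketch_isp *);
    void handle_response_entry(struct sketch_isp *, uint16_t idx);

    void
    sketch_isp_intr(struct sketch_isp *isp)
    {
        if (isp->mbox_interrupt_pending) {
            /* async event or mailbox command completion */
            handle_mailbox_event(isp);
        }
        /* drain the circular response queue */
        while (isp->resp_out != isp->resp_in) {
            handle_response_entry(isp, isp->resp_out);
            isp->resp_out = (isp->resp_out + 1) % isp->resp_entries;
        }
    }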
4.3 Fibre Channel SCSI out of SCSI
QLogic took the approach in introducing the 2X00 cards to just treat
FC-AL as a 'fat' SCSI bus (a SCSI bus with more than 15 targets). All
of the things that you really need to do with Fibre Channel with respect
to providing FC-4 services on top of a Class 3 connection are performed
by the RISC engine on the QLogic card itself. This means that from
an HBA driver point of view, very little needs to change that would
distinguish addressing a Fibre Channel disk from addressing a plain
old SCSI disk.
However, in the details it's not *quite* that simple. For example, in
order to manage Fabric Connections, the HBA driver has to do explicit
binding of entities it has queried from the name server to specific 'target'
ids (targets, in this case, being a virtual entity).
Still- the HBA firmware really does nearly all of the tedious management
of Fibre Channel login state. The corollary is that it is sometimes
difficult to say why a particular login connection to a Fibre Channel
disk is not working well.
There are clear limits with the QLogic card in managing fabric devices.
The QLogic manages local loop devices (LoopID or Target 0..126) itself,
but for the management of fabric devices, it has an absolute limit of
253 simultaneous connections (256 entries less 3 reserved entries).
5. Driver Architecture
5.1 Driver Assumptions
The first basic assumption for this driver is that the requirement for
a SCSI HBA driver on any system is that of a 2 or 3 layer model where
there are SCSI target device drivers (drivers which drive SCSI disks,
SCSI tapes, and so on), possibly a middle services layer, and a bottom
layer that manages the transport of SCSI CDB's out a SCSI bus (or across
Fibre Channel) to a SCSI device. It's assumed that each SCSI command is
a separate structure (or pointer to a structure) that contains the SCSI
CDB and a place to store SCSI Status and SCSI Sense Data.
This turns out to be a pretty good assumption. All of the Open Source
systems (*BSD and Linux) and most of the proprietary systems have this
kind of structure. This has been the way to manage SCSI subsystems for
at least ten years.
There are some additional basic assumptions that this driver makes- primarily
in the arena of basic simple services like memory zeroing, memory copying,
delay, sleep, microtime functions. It doesn't assume much more than this.
5.2 Overall Driver Architecture
The driver is split into a core (machine independent) module and platform
and bus specific outer modules (machine dependent).
The core code (in the files isp.c, isp_inline.h, ispvar.h, ispreg.h and
ispmbox.h) handles:
+ Chipset recognition and reset and firmware download (isp_reset)
+ Board Initialization (isp_init)
+ First level interrupt handling (response retrieval) (isp_intr)
+ A SCSI command queueing entry point (isp_start)
+ A set of control services accessed either via local requirements within
the core module or via an externally visible control entry point
(isp_control).
The platform/bus specific modules (and definitions) depend on each
platform, and they provide both definitions and functions for the core
module's use. Generally a platform module set is split into a bus
dependent module (where configuration is begun from and bus specific
support functions reside) and relatively thin platform specific layer
which serves as the interconnect with the rest of this platform's SCSI
subsystem.
For ease of bus specific access issues, a centralized soft state
structure is maintained for each HBA instance (struct ispsoftc). This
soft state structure contains a machine/bus dependent vector (mdvec)
for functions that read and write hardware registers, set up DMA for the
request/response queues and fibre channel scratch area, set up and tear
down DMA mappings for a SCSI command, provide a pointer to firmware to
load, and other minor things.
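
A stripped-down sketch of such a vector is shown below. The field names are
illustrative only; the real definition is struct ispmdvec in ispvar.h.

    /*
     * Sketch of a machine/bus dependent vector: a table of function
     * pointers the core calls instead of touching hardware directly.
     * Field names are illustrative; see struct ispmdvec in ispvar.h.
     */
    #include <stdint.h>

    struct sketch_ispsoftc;             /* per-instance soft state */

    struct sketch_mdvec {
        uint16_t (*rd_reg)(struct sketch_ispsoftc *, int regoff);
        void     (*wr_reg)(struct sketch_ispsoftc *, int regoff, uint16_t val);
        int      (*mbxdma_setup)(struct sketch_ispsoftc *);  /* queues, FC scratch */
        int      (*dma_setup)(struct sketch_ispsoftc *, void *cmd); /* per command */
        void     (*dma_teardown)(struct sketch_ispsoftc *, void *cmd);
        const uint16_t *firmware;       /* firmware image to download, if any */
    };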
The machine dependent outer module must provide functional entry points
for the core module:
+ A SCSI command completion handoff point (isp_done)
+ An asynchronous event handler (isp_async)
+ A logging/printing function (isp_prt)
The machine dependent outer module code must also provide a set of
abstracting definitions which is what the core module utilizes heavily
to do its job. These are discussed in detail in the comments in the
file ispvar.h, but to give a sense of the range of what is required,
let's illustrate two basic classes of these defines.
The first class are "structure definition/access" class. An
example of these would be:
XS_T Platform SCSI transaction type (i.e., command for HBA)
..
XS_TGT(xs) gets the target from an XS_T
..
XS_TAG_TYPE(xs) which type of tag to use
..
The second class are 'functional' class definitions. Some examples of
this class are:
MEMZERO(dst, len) platform zeroing function
..
MBOX_WAIT_COMPLETE(struct ispsoftc *) wait for mailbox cmd to be done
Note that the former is likely to be simple replacement with bzero or
memset on most systems, while the latter could be quite complex.
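
As a hedged illustration, a hypothetical platform whose native SCSI command
structure is struct my_scsi_cmd might supply definitions along these lines
(the struct and its fields are invented here; each port's real set lives in
its platform header, e.g. isp_freebsd.h):

    /*
     * Hypothetical platform glue: map the core module's abstract names
     * onto this platform's own SCSI command structure and services.
     * struct my_scsi_cmd and its fields are invented for illustration.
     */
    #include <string.h>

    struct my_scsi_cmd {
        int target;
        int tag_type;
        /* ... CDB, status, sense data, etc. ... */
    };

    #define XS_T            struct my_scsi_cmd
    #define XS_TGT(xs)      ((xs)->target)
    #define XS_TAG_TYPE(xs) ((xs)->tag_type)

    #define MEMZERO(dst, len)   memset((dst), 0, (len))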
This soft state structure also contains different parameter information
based upon whether this is a SCSI HBA or a Fibre Channel HBA (which is
filled in by the core module).
In order to clear up what is undoubtedly a seeming confusion of
interconnects, a description of the typical flow of code that performs
board initialization and command transactions may help.
5.3 Initialization Code Flow
Typically a bus specific module for a platform (e.g., one that wants
to configure a PCI card) is entered via that platform's configuration
methods. If this module recognizes a card and can utilize or construct the
space for the HBA instance softc, it does so, and initializes the machine
dependent vector as well as any other platform specific information that
can be hidden in or associated with this structure.
Configuration at this point usually involves mapping in board registers
and registering an interrupt. It's quite possible that the core module's
isp_intr function is adequate to be the interrupt entry point, but often
it's more useful to have a bus specific wrapper module that calls isp_intr.
After mapping and interrupt registry is done, isp_reset is called.
Part of the isp_reset call may cause callbacks out to the bus dependent
module to perform allocation and/or mapping of Request and Response
queues (as well as a Fibre Channel scratch area if this is a Fibre
Channel HBA). The reason this is considered 'bus dependent' is that
only the bus dependent module may have the information that says how
to perform I/O and DMA mapping (which can be quite platform dependent,
e.g., on a Solaris system) for the Request and Response queues. Another
callback can enable the *use*
of interrupts should this platform be able to finish configuration in
interrupt driven mode.
If isp_reset is successful at resetting the QLogic chipset and downloading
new firmware (if available) and setting it running, isp_init is called. If
isp_init is successful in doing initial board setups (including reading
NVRAM from the QLogic card), then this bus specific module will call the
platform dependent module that takes the appropriate steps to 'register'
this HBA with this platform's SCSI subsystem. Examining either the
OpenBSD or the NetBSD isp_pci.c or isp_sbus.c files may assist the reader
here in clarifying some of this.
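
In outline, and glossing over each platform's particulars, the attach path
looks roughly like this. The calls into the core (isp_reset, isp_init) and
the ISP_*STATE names follow ispvar.h, but treat the fragment as a sketch
under those assumptions, not as any port's actual attach routine.

    /*
     * Rough sketch of a bus specific attach path.  Platform glue steps
     * are reduced to comments; error handling and locking are omitted.
     */
    #include "isp_freebsd.h"   /* platform header; pulls in ispvar.h */

    void
    sketch_pci_attach(struct ispsoftc *isp)
    {
        /* 1. map registers, fill in the mdvec, register an interrupt */
        /* ... platform and bus specific ... */

        /* 2. reset the chip and download firmware if one is provided */
        isp_reset(isp);
        if (isp->isp_state != ISP_RESETSTATE)
            return;                     /* chip did not come up */

        /* 3. initial board setup, including reading NVRAM */
        isp_init(isp);
        if (isp->isp_state != ISP_INITSTATE)
            return;                     /* setup failed */

        /* 4. hand this instance to the platform's SCSI subsystem */
        /* ... platform specific registration ... */
    }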
5.4 Initiator Mode Command Code Flow
A successful execution of isp_init will lead to the driver 'registering'
itself with this platform's SCSI subsystem. One assumed action for this
is the registry of a function that the SCSI subsystem for this platform
will call when it has a SCSI command to run.
The platform specific module function that receives this will do whatever
it needs to do to prepare this command for execution in the core module. This
sounds vague, but it's also very flexible. In principle, this could be
a complete marshalling/demarshalling of this platform's SCSI command
structure (should it be impossible to represent in an XS_T). In addition,
this function can also block commands from running (if, e.g., Fibre
Channel loop state would preclude successful starting of the command).
When it's ready to do so, the function isp_start is called with this
command. This core module tries to allocate request queue space for
this command. It also calls through the machine dependent vector
function to make sure any DMA mapping for this command is done.
Now, DMA mapping here is possibly a misnomer, as more than just
DMA mapping can be done in this bus dependent function. This is
also the place where any endian byte-swizzling will be done. At any
rate, this function is called last because the process of establishing
DMA addresses for any command may in fact consume more Request Queue
entries than there are currently available. If the mapping and other
functions are successful, the QLogic mailbox inbox pointer register
is updated to indicate to the QLogic that it has a new request to
read.
If this function is unsuccessful, policy as to what to do at this point is
left to the machine dependent platform function which called isp_start. In
some platforms, temporary resource shortages can be handled by the main
SCSI subsystem. In other platforms, the machine dependent code has to
handle this.
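
A sketch of the platform side of this handoff, including that policy
decision, might look like the following. The CMD_* return codes come from
ispvar.h; the surrounding function and the reactions shown are illustrative
local policy only.

    /*
     * Sketch of the platform handoff to isp_start().  How a platform
     * reacts to each return code is a local policy decision.
     */
    #include "isp_freebsd.h"   /* platform header; defines XS_T, pulls in ispvar.h */

    void
    sketch_queue_command(XS_T *xs)
    {
        switch (isp_start(xs)) {
        case CMD_QUEUED:
            /* on the request queue; isp_done() will be called later */
            break;
        case CMD_EAGAIN:
            /* temporary resource shortage- retry or defer locally */
            break;
        case CMD_RQLATER:
            /* e.g., FC loop not ready yet- requeue and retry later */
            break;
        case CMD_COMPLETE:
            /* completed immediately, usually with an error status */
            break;
        }
    }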
In order to keep track of commands that are in progress, the soft state
structure contains an array of 'handles' that are associated with each
active command. When you send a command to the QLogic firmware, a portion
of the Request Queue entry can contain a non-zero handle identifier so
that at a later point in time in reading either a Response Queue entry
or from a Fast Posting mailbox completion interrupt, you can take this
handle to find the command you were waiting on. It should be noted that
this is probably one of the most dangerous areas of this driver. Corrupted
handles will lead to system panics.
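
Conceptually the handle scheme is just an array lookup with a reserved
'no handle' value of zero, along these lines. This is a simplified sketch;
the driver's real helpers live in isp_inline.h.

    /*
     * Sketch of the handle scheme: a handle of 0 means "no command",
     * so stored handles are (array index + 1).  A corrupted handle
     * coming back from the firmware is exactly the danger noted above.
     */
    #include <stddef.h>
    #include <stdint.h>

    #define SKETCH_MAXCMDS  256

    struct sketch_cmd;                          /* an active SCSI command */
    static struct sketch_cmd *active[SKETCH_MAXCMDS];

    static uint32_t
    handle_save(struct sketch_cmd *cmd)
    {
        for (uint32_t i = 0; i < SKETCH_MAXCMDS; i++) {
            if (active[i] == NULL) {
                active[i] = cmd;
                return (i + 1);                 /* nonzero handle */
            }
        }
        return (0);                             /* no free slot */
    }

    static struct sketch_cmd *
    handle_find(uint32_t handle)
    {
        if (handle == 0 || handle > SKETCH_MAXCMDS)
            return (NULL);                      /* empty or corrupt handle */
        return (active[handle - 1]);
    }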
At some later point in time an interrupt will occur. Eventually,
isp_intr will be called. This core module will determine what the cause
of the interrupt is, and whether it is for a completing command. If so,
it'll determine the handle and fetch the pointer to the command out of
storage within the soft state structure. Skipping over a lot of details,
the machine dependent code supplied function isp_done is called with the
pointer to the completing command. This would then be the glue layer that
informs the SCSI subsystem for this platform that a command is complete.
5.5 Asynchronous Events
Interrupts occur for events other than commands (mailbox or request queue
started commands) completing. These are called Asynchronous Mailbox
interrupts. When some external event causes the SCSI bus to be reset,
or when a Fibre Channel loop changes state (e.g., a LIP is observed),
this generates such an asynchronous event.
Each platform module has to provide an isp_async entry point that will
handle a set of these. This isp_async entry point also handles things
which aren't properly async events but are simply natural outgrowths
of code flow for another core function (see discussion on fabric device
management below).
5.6 Target Mode Code Flow
This section could use a lot of expansion, but this covers the basics.
The QLogic cards, when operating in target mode, follow a code flow that is
essentially the inverse of that for initiator mode described above. In this
scenario, an interrupt occurs, and present on the Response Queue is a
queue entry element defining a new command arriving from an initiator.
This is passed to a possibly external target mode handler. This driver
provides some handling for this in a core module, but also leaves
things open enough that a completely different target mode handler
may accept this incoming queue entry.
The external target mode handler then turns around and forms up a response
to this 'response' that just arrived which is then placed on the Request
Queue and handled very much like an initiator mode command (i.e., calling
the bus dependent DMA mapping function). If this entry completes the
command, no more need occur. But often this handles only part of the
requested command, so the QLogic firmware will rewrite the response
to the initial 'response' again onto the Response Queue, whereupon the
target mode handler will respond to that, and so on until the command
is completely handled.
Because almost no platform provides basic SCSI Subsystem target mode
support, this design has been left extremely open ended, and as such
it's a bit hard to describe in more detail than this.
5.7 Locking Assumptions
The observant reader by now is likely to have asked the question, "but
what about locking? Or interrupt masking?"
The basic assumption about this is that the core module does not know
anything directly about locking or interrupt masking. It may assume that
upon entry (e.g., via isp_start, isp_control, isp_intr) that appropriate
locking and interrupt masking has been done.
The platform dependent code may also therefore assume that if it is
called (e.g., isp_done or isp_async) that any locking or masking that
was in place upon the entry to the core module is still there. It is up
to the platform dependent code to worry about avoiding any lock nesting
issues. As an example of this, the Linux implementation simply queues
up commands completed via the callout to isp_done, which it then pushes
out to the SCSI subsystem after a return from its call to isp_intr is
executed (and locks dropped appropriately, as well as avoidance of deep
interrupt stacks).
Recent changes in the design have now eased what had been an original
requirement that, while in the core module, no locks or interrupt
masking could be dropped. It's now up to each platform to figure out how
to implement this. This is principally used in the execution of mailbox
commands (which are principally used for Loop and Fabric management via
the isp_control function).
5.8 SCSI Specifics
The driver core or platform dependent architecture issues that are specific
to SCSI are few. There is a basic assumption that the QLogic firmware
supported Automatic Request Sense will work- there is no particular provision
for disabling its usage on a per-command basis.
5.9 Fibre Channel Specifics
Fibre Channel presents an interesting challenge here. The QLogic firmware
architecture for dealing with Fibre Channel as just a 'fat' SCSI bus
is fine on the face of it, but there are some subtle and not so subtle
problems here.
5.9.1 Firmware State
Part of the initialization (isp_init) for Fibre Channel HBAs involves
sending a command (Initialize Control Block) that establishes Node
and Port WWNs as well as topology preferences. After this occurs,
the QLogic firmware tries to traverse through several states:
FW_CONFIG_WAIT
FW_WAIT_AL_PA
FW_WAIT_LOGIN
FW_READY
FW_LOSS_OF_SYNC
FW_ERROR
FW_REINIT
FW_NON_PART
It starts with FW_CONFIG_WAIT, attempts to get an AL_PA (if on an FC-AL
loop instead of being connected as an N-port), waits to log into all
FC-AL loop entities and then hopefully transitions to FW_READY state.
Clearly, no command should be attempted before the FW_READY state is
achieved. The core internal function isp_fclink_test (reachable via
isp_control with the ISPCTL_FCLINK_TEST function code) checks for and
waits on this state. This function
also determines connection topology (i.e., whether we're attached to a
fabric or not).
5.9.2. Loop State Transitions- From Nil to Ready
Once the firmware has transitioned to a ready state, then the state of the
connection to either arbitrated loop or to a fabric has to be ascertained,
and the identity of all loop members (and fabric members validated).
This can be very complicated, and it isn't made easy in that the QLogic
firmware manages PLOGI and PRLI to devices that are on a local loop, but
it is the driver that must manage PLOGI/PRLI with devices on the fabric.
In order to manage this state, a staging of the current "Loop" state
(where "Loop" is taken to mean FC-AL or N- or F-port connections) is
maintained, in the following ascending order:
LOOP_NIL
LOOP_LIP_RCVD
LOOP_PDB_RCVD
LOOP_SCANNING_FABRIC
LOOP_FSCAN_DONE
LOOP_SCANNING_LOOP
LOOP_LSCAN_DONE
LOOP_SYNCING_PDB
LOOP_READY
When the core code initializes the QLogic firmware, it sets the loop
state to LOOP_NIL. The first 'LIP Received' asynchronous event sets state
to LOOP_LIP_RCVD. This should be followed by a "Port Database Changed"
asynchronous event which will set the state to LOOP_PDB_RCVD. Each of
these states, when entered, causes an isp_async event call to the
machine dependent layers with the ISPASYNC_CHANGE_NOTIFY code.
After the state of LOOP_PDB_RCVD is reached, the internal core function
isp_scan_fabric (reachable via isp_control(..ISPCTL_SCAN_FABRIC)) will,
if the connection is to a fabric, use Simple Name Server mailbox mediated
commands to dump the entire fabric contents. For each new entity, an
isp_async event will be generated that says a Fabric device has arrived
(ISPASYNC_FABRIC_DEV). The function that isp_async must perform in this
step is to insert (or possibly remove) the devices that it wants to have
the QLogic firmware log into (at the LOOP_SYNCING_PDB state level).
After this has occurred, the state LOOP_FSCAN_DONE is set, and then the
internal function isp_scan_loop (isp_control(...ISPCTL_SCAN_LOOP)) can
be called, which will then scan for any local (FC-AL) entries by asking
the QLogic firmware for a Port Database entry for each possible local
Loop ID. It's at this level that locally cached entries are purged
or shifting Loop IDs are managed (see section 5.9.4).
The final step after this is to call the internal function isp_pdb_sync
(isp_control(..ISPCTL_PDB_SYNC)). The purpose of this function is to
then perform the PLOGI/PRLI functions for fabric devices. The next state
entered after this is LOOP_READY, which means that the driver is ready
to process commands to send to Fibre Channel devices.
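
The machine dependent layer sees these stages mostly through its isp_async
entry point; a skeletal handler might look roughly like the following. Only
the two event codes discussed above are shown, the prototype is assumed to
follow the declaration in ispvar.h, and the real bookkeeping is reduced to
comments.

    /*
     * Skeleton of the platform supplied isp_async entry point for the
     * loop/fabric events discussed above.  Real handlers do much more.
     */
    #include "isp_freebsd.h"   /* platform header; pulls in ispvar.h */

    int
    isp_async(struct ispsoftc *isp, ispasync_t cmd, void *arg)
    {
        switch (cmd) {
        case ISPASYNC_CHANGE_NOTIFY:
            /* loop or fabric state changed- arrange for a re-scan */
            break;
        case ISPASYNC_FABRIC_DEV:
            /*
             * isp_scan_fabric found a device; 'arg' describes it.
             * Decide here whether it should be kept so the firmware
             * logs into it at LOOP_SYNCING_PDB time.
             */
            break;
        default:
            break;
        }
        return (0);
    }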
5.9.3 Fibre Channel variants of Initiator Mode Code Flow
The code flow in isp_start for Fibre Channel devices is the same as it is
for SCSI devices, but with a notable exception.
Maintained within the fibre channel specific portion of the driver soft
state structure is a distillation of the existing population of both
local loop and fabric devices. Because Loop IDs can shift on a local
loop but we wish to retain a 'constant' Target ID (see 5.9.4), this
is indexed directly via the Target ID for the command (XS_TGT(xs)).
If there is a valid entry for this Target ID, the command is started
(with the stored 'Loop ID'). If not, the command is completed with
an error that is just like a SCSI Selection Timeout error.
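
In outline the lookup is no more than this (the structure and field names
below are invented for illustration, not the driver's actual ones):

    /*
     * Sketch of the Fibre Channel lookup done on the isp_start path:
     * the Target ID indexes a table that remembers the current Loop ID.
     */
    #include <stdint.h>

    struct sketch_fcdev {
        int     valid;      /* do we currently know this target? */
        uint8_t loopid;     /* current loop id to put in the request */
    };

    #define SKETCH_MAX_TARGETS  256
    static struct sketch_fcdev fcdevs[SKETCH_MAX_TARGETS];

    /* returns a loop id, or -1 to fail the command like a selection timeout */
    static int
    sketch_fc_lookup(int target)
    {
        if (target < 0 || target >= SKETCH_MAX_TARGETS)
            return (-1);
        if (!fcdevs[target].valid)
            return (-1);
        return (fcdevs[target].loopid);
    }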
This code is currently somewhat in transition. Some platforms do
firmware and loop state management (as described above) at this
point. Other platforms manage this from the machine dependent layers. The
important function to watch in this respect is isp_fc_runstate (in
isp_inline.h).
5.9.4 "Target" in Fibre Channel is a fixed virtual construct
Very few systems can cope with the notion that "Target" for a disk
device can change while you're using it. But one of the properties
of arbitrated loop is that the physical bus address for a loop member
(the AL_PA) can change depending on when and how things are inserted in
the loop.
To illustrate this, let's take an example. Let's say you start with a
loop that has 5 disks in it. At boot time, the system will likely find
them and see them in this order:
disk# Loop ID Target ID
disk0 0 0
disk1 1 1
disk2 2 2
disk3 3 3
disk4 4 4
The driver uses 'Loop ID' when it forms requests to send a command to
each disk. However, it reports to NetBSD that things exist as 'Target
ID'. As you can see here, there is perfect correspondence between disk,
Loop ID and Target ID.
Let's say you add a new disk between disk2 and disk3 while things are
running. You don't see this very often, but you *could* see a case where
the loop has to renegotiate, and you end up with:
disk# Loop ID Target ID
disk0 0 0
disk1 1 1
disk2 2 2
diskN 3 ?
disk3 4 ?
disk4 5 ?
Clearly, you don't want disk3 and disk4's "Target ID" to change while you're
running since currently mounted filesystems will get trashed.
What the driver is supposed to do (this is the function of isp_scan_loop),
is regenerate things such that the following then occurs:
disk# Loop ID Target ID
disk0 0 0
disk1 1 1
disk2 2 2
diskN 3 5
disk3 4 3
disk4 5 4
So, "Target" is a virtual entity that is maintained while you're running.
6. Glossary
HBA - Host Bus Adapter
SCSI - Small Computer System Interface
7. References
Various URLs of interest:
http://www.netbsd.org - NetBSD's Web Page
http://www.openbsd.org - OpenBSD's Web Page
http://www.freebsd.org - FreeBSD's Web Page
http://www.t10.org - ANSI SCSI Committee's Web Page
(SCSI Specs)
http://www.t11.org - NCITS Device Interface Web Page
(Fibre Channel Specs)

sys/dev/isp/Hardware.txt Normal file
@@ -0,0 +1,304 @@
/* $FreeBSD$ */
Hardware that is Known To or Should Work with This Driver
0. Intro
This is not an endorsement for hardware vendors (there will be
no "where to buy" URLs here with a couple of exception). This
is simply a list of things I know work, or should work, plus
maybe a couple of notes as to what you should do to make it
work. Corrections accepted. Even better would be to send me
hardware so I can test it.
I'll put a rough range of costs in US$ that I know about. No doubt
it'll differ from your expectations.
1. HBAs
Qlogic 2100, 2102
2200, 2202, 2204
There are various suffixes that indicate copper or optical
connectors, or 33 vs. 66MHz PCI bus operation. None of these
have a software impact.
Approx cost: 1K$ for a 2200
Qlogic 2300, 2312
These are the new 2-Gigabit cards. Optical only.
Approx cost: ??????
Antares P-0033, P-0034, P-0036
There are many other vendors that use the Qlogic 2X00 chipset. Some older
2100 boards (not on this list) have a bug in the ROM that causes a
failure to download newer firmware that is larger than 0x7fff words.
Approx cost: 850$ for a P-0036
In general, the 2200 class chip is to be preferred.
2. Hubs
Vixel 1000
Vixel 2000
Of the two, the 1000 (7 ports, vs. 12 ports) has had fewer problems-
it's an old workhorse.
Approx cost: 1.5K$ for Vixel 1000, 2.5K$ for 2000
Gadzoox Cappellix 3000
Don't forget to use telnet to configure the Cappellix ports
to the role you're using them for- otherwise things don't
work well at all.
(cost: I have no idea... certainly less than a switch)
3. Switches
Brocade Silkworm II
Brocade 2400
(other brocades should be fine)
Especially with revision 2 or higher f/w, this is now best
of breed for fabrics or segmented loop (which Brocade
calls "QuickLoop").
For the Silkworm II, set operating mode to "Tachyon" (mode 3).
The web interface isn't good- but telnet is what I prefer anyhow.
You can't connect a Silkworm II and the other Brocades together
as E-ports to make a large fabric (at least with the f/w *I*
had for the Silkworm II).
Approx cost of a Brocade 2400 with no GBICs is about 8K$ when
I recently checked the US Government SEWP price list- no doubt
it'll be a bit more for others. I'd assume around 10K$.
ANCOR SA-8
This also is a fine switch, but you have to use a browser
with working java to manage it- which is a bit of a pain.
This also supports fabric and segmented loop.
These switches don't form E-ports with each other for a larger
fabric.
(cost: no idea)
McData (model unknown)
I tried one exactly once for 30 minutes. Seemed to work once
I added the "register FC4 types" command to the driver.
(cost: very very expensive, 40K$ plus)
4. Cables/GBICs
Multimode optical is adequate for Fibre Channel- the same cable is
used for Gigabit Ethernet.
Copper DB-9 and Copper HSS-DC connectors are also fine. Copper and
optical both are rated to 1.0625Gbit- copper is naturally shorter
(the longest I've used is a 15 meter cable but it's supposed to go
longer).
The reason to use copper instead of optical is that if you step on one of
the really fat DB-9 cables you can get, it'll survive. Optical usually
dies quickly if you step on it.
Approx cost: I don't know what optical is- you can expect to pay maybe
100$ for a 3m copper cable.
GBICs-
I use Finisar copper and IBM Opticals.
Approx Cost: Copper GBICs are 70$ each. Opticals are twice that or more.
Vendor: (this is the one exception I'll make because it turns out to be
an incredible pain to find FC copper cabling and GBICs- the source I
use for GBICs and copper cables is http://www.scsi-cables.com)
Other:
There now is apparently a source for little connector boards
to connect to bare drives: http://www.cinonic.com.
5. Storage JBODs/RAID
JMR 4-Bay
Rinky-tink, but a solid 4 bay loop only entry model.
I paid 1000$ for mine- overpriced, IMO.
JMR Fortra
I rather like this box. The blue LEDs are a very nice touch- you
can see them very clearly from 50 feet away.
I paid 2000$ for one used.
Sun A5X00
Very expensive (in my opinion) but well crafted. Has two SES
instances, so you can use the ses driver (and the example
code in /usr/share/examples) for power/thermal/slot monitoring.
Approx Cost: The last I saw for a price list item on this was 22K$
for an unpopulated (no disk drive) A5X00.
DataDirect E1000 RAID
Don't connect both SCSI and FC interfaces at the same time- a SCSI
reset will cause the DataDirect to think you want to use the SCSI
interface and a LIP on the FC interface will cause it to think you
want to use the FC interface. Use only one connector at a time so
both you and the DataDirect are sure about what you want.
Cost: I have no idea.
Veritas ServPoint
This is a software storage virtualization engine that
runs on Sparc/Solaris in target mode for frontend
and with other FC or SCSI as the backend storage. FreeBSD
has been used extensively to test it.
Cost: I have no idea.
6. Disk Drives
I have used lots of different Seagate and a few IBM drives and
typically have had few problems with them. These are the bare
drives with 40-pin SCA connectors in back. They go into the JBODs
you assemble.
Seagate does make, but I can no longer find, a little paddleboard
single drive connector that goes from DB-9 FC to the 40-pin SCA
connector- primarily for you to try and evaluate a single FC drive.
All FC-AL disk drives are dual ported (i.e., have separate 'A' and
'B' ports- which are completely separate loops). This seems to work
reasonably enough, but I haven't tested it much. It really depends
on the JBOD you put them in to carry this dual port to the outside
world. The JMR boxes have it. For the Sun A5X00 you have to pay for
an extra IB card to carry it out.
Approx Cost: You'll find that FC drives are the same cost if not
slightly cheaper than the equivalent Ultra3 SCSI drives.
7. Recommended Configurations
These are recommendations that are biased toward the cautious side. They
do not represent formal engineering commitments- just suggestions as to
what I would expect to work.
A. The simplest form of a connection topology I can suggest for
a small SAN (i.e., replacement for SCSI JBOD/RAID):
HOST
2xxx <----------> Single Unit of Storage (JBOD, RAID)
This is called a PL_DA (Private Loop, Direct Attach) topology.
B. The next most simple form of a connection topology I can suggest for
a medium local SAN (where you do not plan to do dynamic insertion
and removal of devices while I/Os are active):
HOST
2xxx <----------> +--------
                  | Vixel |
                  | 1000  |
                  |       +<---> Storage
                  |       |
                  |       +<---> Storage
                  |       |
                  |       +<---> Storage
                  --------
This is a Private Loop topology. Remember that this can get very unstable
if you make it too long. A good practice is to try it in a staged fashion.
C. It is possible with some units to "daisy chain", e.g.:
HOST
2xxx <----------> (JBOD, RAID) <--------> (JBOD, RAID)
In practice I have had poor results with these configurations. They *should*
work fine, but for both the JMR and the Sun A5X00 I tend to get LIP storms
and so the second unit just isn't seen and the loop isn't stable.
Now, this could simply be my lack of clean, newer, h/w (or, in general,
a lack of h/w), but I would recommend the use of a hub if you want to
stay with Private Loop and have more than one FC target.
You should also note this can begin to be the basis for a shared SAN
solution. For example, the above configuration can be extended to be:
HOST
2xxx <----------> +--------
                  | Vixel |
                  | 1000  |
                  |       +<---> Storage
                  |       |
                  |       +<---> Storage
                  |       |
                  |       +<---> Storage
HOST              |       |
2xxx <----------> +--------
However, note that there is nothing to mediate locking of devices, and
it is also conceivable that the reboot of one host can, by causing
a LIP storm, cause problems with the I/Os from the other host.
(in other words, this topology hasn't really been made safe yet for
this driver).
D. You can repeat the topology in #B with a switch that is set to be
in segmented loop mode. This avoids LIPs propagating where you don't
want them to- and this makes for a much more reliable, if more expensive,
SAN.
E. The next level of complexity is a Switched Fabric. The following topology
is good when you begin to want more performance. Private
and Public Arbitrated Loop, while 100MB/s, is a shared medium. Direct
connections to a switch can run full-duplex at full speed.
HOST
2xxx <----------> +---------
                  | Brocade|
                  | 2400   |
                  |        +<---> Storage
                  |        |
                  |        +<---> Storage
                  |        |
                  |        +<---> Storage
HOST              |        |
2xxx <----------> +---------
I would call this the best configuration available now. It can expand
substantially if you cascade switches.
There is a hard limit of about 253 devices for each Qlogic HBA- and the
fabric login policy is simplistic (log them in as you find them). If
somebody actually runs into a configuration that's larger, let me know
and I'll work on some tools that would allow you some policy choices
as to which would be interesting devices to actually connect to.