2001-11-27 23:08:37 +00:00
|
|
|
/*-
|
|
|
|
* Copyright (c) 2001 Michael Smith
|
|
|
|
* All rights reserved.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
|
|
|
* $FreeBSD$
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* CISS adapter driver datastructures
|
|
|
|
*/
|
|
|
|
|
A number of significant enhancements to the ciss driver:
1. The FreeBSD driver was setting an interrupt coalesce delay of 1000us
for reasons that I can only speculate on. This was hurting everything
from lame sequential I/O "benchmarks" to legitimate filesystem metadata
operations that relied on serialized barrier writes. One of my
filesystem tests went from 35s to complete down to 6s.
2. Implemented the Performant transport method. Without the fix in
(1), I saw almost no difference. With it, my filesystem tests showed
another 5-10% improvement in speed. It was hard to measure CPU
utilization in any meaningful way, so it's not clear if there was a
benefit there, though there should have been since the interrupt handler
was reduced from 2 or more PCI reads down to 1.
3. Implemented MSI-X. Without any docs on this, I was just taking a
guess, and it appears to only work with the Performant method. This
could be a programming or understanding mistake on my part. While this
by itself made almost no difference to performance since the Performant
method already eliminated most of the synchronous reads over the PCI
bus, it did allow the CISS hardware to stop sharing its interrupt with
the USB hardware, which in turn allowed the driver to become decoupled
from the Giant-locked USB driver stack. This increased performance by
almost 20%. The MSI-X setup was done with 4 vectors allocated, but only
1 vector used since the performant method was told to only use 1 of 4
queues. Fiddling with this might make it work with the simpleq method,
not sure. I did not implement MSI since I have no MSI-specific hardware
in my test lab.
4. Improved the locking in the driver, trimmed some data structures.
This didn't improve test times in any measurable way, but it does look
like it gave a minor improvement to CPU usage when many
processes/threads were doing I/O in parallel. Again, this was hard to
accurately test.
2008-07-11 21:20:51 +00:00
|
|
|
typedef STAILQ_HEAD(, ciss_request) cr_qhead_t;
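The queue head type above comes from the `STAILQ` macros in `<sys/queue.h>`. As a rough illustration of how such a singly-linked tail queue behaves, here is a minimal userland sketch. The `demo_request` structure and the `demo_enqueue`/`demo_dequeue` helpers are hypothetical stand-ins, not part of the driver.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, trimmed-down request: only the queue linkage and a tag. */
struct demo_request {
	STAILQ_ENTRY(demo_request) link;
	int tag;
};

/* Same shape as cr_qhead_t, but for the hypothetical demo_request. */
typedef STAILQ_HEAD(, demo_request) demo_qhead_t;

/* Append a request to the tail of the queue. */
static void
demo_enqueue(demo_qhead_t *q, struct demo_request *r)
{
	STAILQ_INSERT_TAIL(q, r, link);
}

/* Remove and return the oldest request, or NULL if the queue is empty. */
static struct demo_request *
demo_dequeue(demo_qhead_t *q)
{
	struct demo_request *r = STAILQ_FIRST(q);

	if (r != NULL)
		STAILQ_REMOVE_HEAD(q, link);
	return (r);
}
```

A caller would run `STAILQ_INIT(&q)` once before the first enqueue. The real driver presumably also serializes access with its mutex, which this sketch leaves out.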
|
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
/************************************************************************
|
|
|
|
* Tunable parameters
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* There is no guaranteed upper bound on the number of concurrent
|
|
|
|
* commands an adapter may claim to support. Cap it at a reasonable
|
|
|
|
* value.
|
|
|
|
*/
|
2010-03-03 17:58:41 +00:00
|
|
|
#define CISS_MAX_REQUESTS 1024
|
2001-11-27 23:08:37 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Maximum number of logical drives we support.
|
2013-01-15 14:35:35 +00:00
|
|
|
* If the controller does not indicate a maximum
|
|
|
|
* value. This is a compatibility value to support
|
|
|
|
* older ciss controllers (e.g. model 6i)
|
2001-11-27 23:08:37 +00:00
|
|
|
*/
|
2013-01-15 14:35:35 +00:00
|
|
|
#define CISS_MAX_LOGICAL 16
|
2001-11-27 23:08:37 +00:00
|
|
|
|
2004-04-16 23:00:01 +00:00
|
|
|
/*
|
|
|
|
* Maximum number of physical devices we support.
|
|
|
|
*/
|
|
|
|
#define CISS_MAX_PHYSICAL 1024
|
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
/*
|
|
|
|
* Interrupt reduction can be controlled by tuning the interrupt
|
|
|
|
* coalesce delay and count parameters. The delay (in microseconds)
|
|
|
|
* defers delivery of interrupts to increase the chance of there being
|
|
|
|
* more than one completed command ready when the interrupt is
|
|
|
|
* delivered. The count expedites the delivery of the interrupt when
|
|
|
|
* the given number of commands are ready.
|
|
|
|
*
|
|
|
|
* If the delay is set to 0, interrupts are delivered immediately.
|
|
|
|
*/
|
2008-07-11 21:20:51 +00:00
|
|
|
#define CISS_INTERRUPT_COALESCE_DELAY 0
|
2001-11-27 23:08:37 +00:00
|
|
|
#define CISS_INTERRUPT_COALESCE_COUNT 16
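The interaction between the two coalescing parameters can be modelled in a few lines. This is only a sketch of the policy as the comment describes it, not the adapter's firmware logic. The function `demo_intr_due` and its arguments are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the coalescing policy described in the comment:
 * - with nothing completed, there is nothing to signal;
 * - a delay of 0 means interrupts are delivered immediately;
 * - otherwise delivery waits for the delay to expire, unless the
 *   number of completed commands reaches the count first.
 */
static bool
demo_intr_due(uint32_t pending, uint32_t waited_us,
    uint32_t delay_us, uint32_t count)
{
	if (pending == 0)
		return (false);
	if (delay_us == 0)
		return (true);
	if (pending >= count)
		return (true);
	return (waited_us >= delay_us);
}
```

Under this model, a delay of 0 means the count never comes into play, because every completion is signalled at once.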
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Heartbeat routine timeout in seconds. Note that since event
|
|
|
|
* handling is performed on a callback basis, we don't need this to
|
|
|
|
* run very often.
|
|
|
|
*/
|
|
|
|
#define CISS_HEARTBEAT_RATE 10
|
|
|
|
|
2001-12-02 06:17:16 +00:00
|
|
|
/************************************************************************
|
|
|
|
* Compatibility with older versions of FreeBSD
|
|
|
|
*/
|
|
|
|
#if __FreeBSD_version < 440001
|
|
|
|
#warning testing old-FreeBSD compat
|
|
|
|
typedef struct proc d_thread_t;
|
|
|
|
#endif
|
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
/************************************************************************
|
|
|
|
* Driver version. Only really significant to the ACU interface.
|
|
|
|
*/
|
2001-12-02 06:17:16 +00:00
|
|
|
#define CISS_DRIVER_VERSION 20011201
|
2001-11-27 23:08:37 +00:00
|
|
|
|
|
|
|
/************************************************************************
|
|
|
|
* Driver data structures
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Each command issued to the adapter is managed by a request
|
|
|
|
* structure.
|
|
|
|
*/
|
|
|
|
struct ciss_request
|
|
|
|
{
|
2008-07-11 21:20:51 +00:00
|
|
|
STAILQ_ENTRY(ciss_request) cr_link;
|
2001-11-27 23:08:37 +00:00
|
|
|
int cr_onq; /* which queue we are on */
|
|
|
|
|
|
|
|
struct ciss_softc *cr_sc; /* controller softc */
|
|
|
|
void *cr_data; /* data buffer */
|
|
|
|
u_int32_t cr_length; /* data length */
|
|
|
|
bus_dmamap_t cr_datamap; /* DMA map for data */
|
2009-09-16 23:27:14 +00:00
|
|
|
struct ciss_command *cr_cc;
|
|
|
|
uint32_t cr_ccphys;
|
2001-11-27 23:08:37 +00:00
|
|
|
int cr_tag;
|
|
|
|
int cr_flags;
|
|
|
|
#define CISS_REQ_MAPPED (1<<0) /* data mapped */
|
|
|
|
#define CISS_REQ_SLEEP (1<<1) /* submitter sleeping */
|
|
|
|
#define CISS_REQ_POLL (1<<2) /* submitter polling */
|
|
|
|
#define CISS_REQ_DATAOUT (1<<3) /* data host->adapter */
|
|
|
|
#define CISS_REQ_DATAIN (1<<4) /* data adapter->host */
|
2008-07-11 21:20:51 +00:00
|
|
|
#define CISS_REQ_BUSY (1<<5) /* controller has req */
|
2013-02-12 16:57:20 +00:00
|
|
|
#define CISS_REQ_CCB (1<<6) /* data is ccb */
|
2004-04-16 21:03:38 +00:00
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
void (* cr_complete)(struct ciss_request *);
|
|
|
|
void *cr_private;
|
2008-07-11 21:20:51 +00:00
|
|
|
int cr_sg_tag;
|
|
|
|
#define CISS_SG_MAX ((CISS_SG_FETCH_MAX << 1) | 0x1)
|
|
|
|
#define CISS_SG_1 ((CISS_SG_FETCH_1 << 1) | 0x01)
|
|
|
|
#define CISS_SG_2 ((CISS_SG_FETCH_2 << 1) | 0x01)
|
|
|
|
#define CISS_SG_4 ((CISS_SG_FETCH_4 << 1) | 0x01)
|
|
|
|
#define CISS_SG_8 ((CISS_SG_FETCH_8 << 1) | 0x01)
|
|
|
|
#define CISS_SG_16 ((CISS_SG_FETCH_16 << 1) | 0x01)
|
|
|
|
#define CISS_SG_32 ((CISS_SG_FETCH_32 << 1) | 0x01)
|
|
|
|
#define CISS_SG_NONE ((CISS_SG_FETCH_NONE << 1) | 0x01)
|
2001-11-27 23:08:37 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The adapter command structure is defined with a zero-length
|
|
|
|
* scatter/gather list size. In practise, we want space for a
|
|
|
|
* scatter-gather list, and we also want to avoid having commands
|
|
|
|
* cross page boundaries.
|
|
|
|
*
|
2009-09-16 23:27:14 +00:00
|
|
|
* The size of the ciss_command is 52 bytes. 65 s/g elements are reserved
|
|
|
|
* to allow a max i/o size of 256k. This gives a total command size of
|
|
|
|
* 1120 bytes, including the 32 byte alignment padding. Modern controllers
|
|
|
|
* seem to saturate nicely at this value.
|
2001-11-27 23:08:37 +00:00
|
|
|
*/
|
|
|
|
|
2009-09-16 23:27:14 +00:00
|
|
|
#define CISS_MAX_SG_ELEMENTS 65
|
|
|
|
#define CISS_COMMAND_ALIGN 32
|
|
|
|
#define CISS_COMMAND_SG_LENGTH (sizeof(struct ciss_sg_entry) * CISS_MAX_SG_ELEMENTS)
|
|
|
|
#define CISS_COMMAND_ALLOC_SIZE (roundup2(sizeof(struct ciss_command) + CISS_COMMAND_SG_LENGTH, CISS_COMMAND_ALIGN))
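The 1120-byte figure in the comment can be checked by hand: 52 + 65 × 16 = 1092, which rounds up to 1120 at 32-byte alignment. The sketch below repeats that arithmetic in userland. The 16-byte scatter/gather entry size is an assumption inferred from the numbers in the comment, since `struct ciss_sg_entry` is not defined in this header. All `DEMO_` names are hypothetical.

```c
#include <stddef.h>

#define DEMO_COMMAND_HEADER	52	/* sizeof(struct ciss_command), per the comment */
#define DEMO_SG_ENTRY		16	/* assumed size of one s/g entry (inferred) */
#define DEMO_MAX_SG_ELEMENTS	65
#define DEMO_COMMAND_ALIGN	32

/*
 * Round x up to a multiple of y, where y is a power of two.
 * Userland stand-in for the kernel's roundup2().
 */
#define DEMO_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & ~((size_t)(y) - 1))

/* Bytes reserved per command: header plus s/g list, padded to alignment. */
static size_t
demo_command_alloc_size(void)
{
	size_t raw;

	raw = DEMO_COMMAND_HEADER +
	    (size_t)DEMO_SG_ENTRY * DEMO_MAX_SG_ELEMENTS;
	return (DEMO_ROUNDUP2(raw, DEMO_COMMAND_ALIGN));
}
```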
|
Separate the parallel scsi knowledge out of the core of the XPT, and
modularize it so that new transports can be created.
Add a transport for SATA
Add a periph+protocol layer for ATA
Add a driver for AHCI-compliant hardware.
Add a maxio field to CAM so that drivers can advertise their max
I/O capability. Modify various drivers so that they are insulated
from the value of MAXPHYS.
The new ATA/SATA code supports AHCI-compliant hardware, and will override
the classic ATA driver if it is loaded as a module at boot time or compiled
into the kernel. The stack now supports NCQ (tagged queueing) for increased
performance on modern SATA drives. It also supports port multipliers.
ATA drives are accessed via 'ada' device nodes. ATAPI drives are
accessed via 'cd' device nodes. They can all be enumerated and manipulated
via camcontrol, just like SCSI drives. SCSI commands are not translated to
their ATA equivalents; ATA native commands are used throughout the entire
stack, including camcontrol. See the camcontrol manpage for further
details. Testing this code may require that you update your fstab, and
possibly modify your BIOS to enable AHCI functionality, if available.
This code is very experimental at the moment. The userland ABI/API has
changed, so applications will need to be recompiled. It may change
further in the near future. The 'ada' device name may also change as
more infrastructure is completed in this project. The goal is to
eventually put all CAM busses and devices under newbus, allowing for
interesting topology and management options.
Few functional changes will be seen with existing SCSI/SAS/FC drivers,
though the userland ABI has still changed. In the future, transport-specific
modules for SAS and FC may appear in order to better support
the topologies and capabilities of these technologies.
The modularization of CAM and the addition of the ATA/SATA modules is
meant to break CAM out of the mold of being specific to SCSI, letting it
grow to be a framework for arbitrary transports and protocols. It also
allows drivers to be written to support discrete hardware without
jeopardizing the stability of non-related hardware. While only an AHCI
driver is provided now, a Silicon Image driver is also in the works.
Drivers for ICH1-4, ICH5-6, PIIX, classic IDE, and any other hardware
are possible and encouraged. Help with new transports is also encouraged.
Submitted by: scottl, mav
Approved by: re
2009-07-10 08:18:08 +00:00
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
/*
|
|
|
|
* Per-logical-drive data.
|
|
|
|
*/
|
2004-04-16 21:03:38 +00:00
|
|
|
struct ciss_ldrive
|
2001-11-27 23:08:37 +00:00
|
|
|
{
|
|
|
|
union ciss_device_address cl_address;
|
2004-04-16 23:00:01 +00:00
|
|
|
union ciss_device_address *cl_controller;
|
2001-11-27 23:08:37 +00:00
|
|
|
int cl_status;
|
|
|
|
#define CISS_LD_NONEXISTENT 0
|
|
|
|
#define CISS_LD_ONLINE 1
|
|
|
|
#define CISS_LD_OFFLINE 2
|
|
|
|
|
2004-04-16 23:00:01 +00:00
|
|
|
int cl_update;
|
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
struct ciss_bmic_id_ldrive *cl_ldrive;
|
|
|
|
struct ciss_bmic_id_lstatus *cl_lstatus;
|
2003-02-05 08:43:46 +00:00
|
|
|
struct ciss_ldrive_geometry cl_geometry;
|
2001-11-27 23:08:37 +00:00
|
|
|
|
|
|
|
char cl_name[16]; /* device name */
|
|
|
|
};
|
|
|
|
|
2004-06-21 20:18:40 +00:00
|
|
|
/*
|
|
|
|
* Per-physical-drive data
|
|
|
|
*/
|
|
|
|
struct ciss_pdrive
|
|
|
|
{
|
|
|
|
union ciss_device_address cp_address;
|
|
|
|
int cp_online;
|
|
|
|
};
|
|
|
|
|
|
|
|
#define CISS_PHYSICAL_SHIFT 5
|
|
|
|
#define CISS_PHYSICAL_BASE (1 << CISS_PHYSICAL_SHIFT)
|
2009-09-16 22:52:20 +00:00
|
|
|
#define CISS_MAX_PHYSTGT 256
|
2004-06-21 20:18:40 +00:00
|
|
|
|
|
|
|
#define CISS_IS_PHYSICAL(bus) ((bus) >= CISS_PHYSICAL_BASE)
|
|
|
|
#define CISS_CAM_TO_PBUS(bus) ((bus) - CISS_PHYSICAL_BASE)
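With a shift of 5, CAM bus numbers 0 through 31 fall below `CISS_PHYSICAL_BASE`, and numbers from 32 upward map to physical buses 0, 1, 2 and so on. The sketch below copies the macros under `DEMO_` names so that it stands alone. The helper `demo_physical_index` is hypothetical.

```c
#define DEMO_PHYSICAL_SHIFT	5
#define DEMO_PHYSICAL_BASE	(1 << DEMO_PHYSICAL_SHIFT)

/* Arguments are parenthesized so expressions such as (a + b) expand safely. */
#define DEMO_IS_PHYSICAL(bus)	((bus) >= DEMO_PHYSICAL_BASE)
#define DEMO_CAM_TO_PBUS(bus)	((bus) - DEMO_PHYSICAL_BASE)

/*
 * Hypothetical helper: translate a CAM bus number into a physical bus
 * index, or return -1 when the number lies below the physical base.
 */
static int
demo_physical_index(int cam_bus)
{
	if (!DEMO_IS_PHYSICAL(cam_bus))
		return (-1);
	return (DEMO_CAM_TO_PBUS(cam_bus));
}
```

For example, `demo_physical_index(35)` yields 3, while `demo_physical_index(31)` yields -1.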
|
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
/*
|
|
|
|
* Per-adapter data
|
|
|
|
*/
|
2004-04-16 21:03:38 +00:00
|
|
|
struct ciss_softc
|
2001-11-27 23:08:37 +00:00
|
|
|
{
|
|
|
|
/* bus connections */
|
|
|
|
device_t ciss_dev; /* bus attachment */
|
2004-06-16 09:47:26 +00:00
|
|
|
struct cdev *ciss_dev_t; /* control device */
|
2001-11-27 23:08:37 +00:00
|
|
|
|
|
|
|
struct resource *ciss_regs_resource; /* register interface window */
|
|
|
|
int ciss_regs_rid; /* resource ID */
|
|
|
|
bus_space_handle_t ciss_regs_bhandle; /* bus space handle */
|
|
|
|
bus_space_tag_t ciss_regs_btag; /* bus space tag */
|
|
|
|
|
|
|
|
struct resource *ciss_cfg_resource; /* config struct interface window */
|
|
|
|
int ciss_cfg_rid; /* resource ID */
|
|
|
|
struct ciss_config_table *ciss_cfg; /* config table in adapter memory */
|
2008-07-11 21:20:51 +00:00
|
|
|
struct ciss_perf_config *ciss_perf; /* config table for the Performant transport */
|
2001-11-27 23:08:37 +00:00
|
|
|
struct ciss_bmic_id_table *ciss_id; /* ID table in host memory */
|
|
|
|
u_int32_t ciss_heartbeat; /* last heartbeat value */
|
|
|
|
int ciss_heart_attack; /* number of times we have seen this value */
|
|
|
|
|
2008-07-11 21:20:51 +00:00
|
|
|
int ciss_msi;
|
2001-11-27 23:08:37 +00:00
|
|
|
struct resource *ciss_irq_resource; /* interrupt */
|
2008-07-11 21:20:51 +00:00
|
|
|
int ciss_irq_rid[CISS_MSI_COUNT]; /* resource ID */
|
2001-11-27 23:08:37 +00:00
|
|
|
void *ciss_intr; /* interrupt handle */
|
|
|
|
|
|
|
|
bus_dma_tag_t ciss_parent_dmat; /* parent DMA tag */
|
|
|
|
bus_dma_tag_t ciss_buffer_dmat; /* data buffer/command DMA tag */
|
|
|
|
|
|
|
|
u_int32_t ciss_interrupt_mask; /* controller interrupt mask bits */
|
|
|
|
|
2008-07-11 21:20:51 +00:00
|
|
|
uint64_t *ciss_reply;
|
|
|
|
int ciss_cycle;
|
|
|
|
int ciss_rqidx;
|
|
|
|
bus_dma_tag_t ciss_reply_dmat;
|
|
|
|
bus_dmamap_t ciss_reply_map;
|
|
|
|
uint32_t ciss_reply_phys;
|
|
|
|
|
2001-11-27 23:08:37 +00:00
|
|
|
int ciss_max_requests;
|
|
|
|
struct ciss_request ciss_request[CISS_MAX_REQUESTS]; /* requests */
|
|
|
|
void *ciss_command; /* command structures */
|
|
|
|
bus_dma_tag_t ciss_command_dmat; /* command DMA tag */
|
|
|
|
bus_dmamap_t ciss_command_map; /* command DMA map */
|
|
|
|
u_int32_t ciss_command_phys; /* command array base address */
|
2008-07-11 21:20:51 +00:00
    cr_qhead_t			ciss_free;	/* requests available for reuse */
    cr_qhead_t			ciss_notify;	/* requests which are deferred for processing */
    struct proc			*ciss_notify_thread;

    struct callout		ciss_periodic;	/* periodic event handling */
    struct ciss_request		*ciss_periodic_notify;	/* notify callback request */

    struct mtx			ciss_mtx;
    struct ciss_ldrive		**ciss_logical;
    struct ciss_pdrive		**ciss_physical;
    union ciss_device_address	*ciss_controllers;	/* controller address */
    int				ciss_max_bus_number;	/* maximum bus number */
    int				ciss_max_logical_bus;
    int				ciss_max_physical_bus;

    struct cam_devq		*ciss_cam_devq;
    struct cam_sim		**ciss_cam_sim;

    int				ciss_soft_reset;

    int				ciss_flags;
#define CISS_FLAG_NOTIFY_OK	(1<<0)		/* notify command running OK */
#define CISS_FLAG_CONTROL_OPEN	(1<<1)		/* control device is open */
#define CISS_FLAG_ABORTING	(1<<2)		/* driver is going away */
#define CISS_FLAG_RUNNING	(1<<3)		/* driver is running (interrupts usable) */
#define CISS_FLAG_BUSY		(1<<4)		/* no free commands */

#define CISS_FLAG_FAKE_SYNCH	(1<<16)		/* needs SYNCHRONISE_CACHE faked */
#define CISS_FLAG_BMIC_ABORT	(1<<17)		/* use BMIC command to abort Notify on Event */
#define CISS_FLAG_THREAD_SHUT	(1<<20)		/* shutdown the kthread */

    struct ciss_qstat		ciss_qstat[CISSQ_COUNT];	/* queue statistics */
};

/************************************************************************
 * Debugging/diagnostic output.
 */

/*
 * Debugging levels:
 *  0 - quiet, only emit warnings
 *  1 - talkative, log major events, but nothing on the I/O path
 *  2 - noisy, log events on the I/O path
 *  3 - extremely noisy, log items in loops
 */
#ifdef CISS_DEBUG
# define debug(level, fmt, args...)	\
	do {	\
	    if (level <= CISS_DEBUG) printf("%s: " fmt "\n", __func__ , ##args);	\
	} while(0)
# define debug_called(level)	\
	do {	\
	    if (level <= CISS_DEBUG) printf("%s: called\n", __func__);	\
	} while(0)
# define debug_struct(s)	printf(" SIZE %s: %zu\n", #s, sizeof(struct s))
# define debug_union(s)		printf(" SIZE %s: %zu\n", #s, sizeof(union s))
# define debug_type(s)		printf(" SIZE %s: %zu\n", #s, sizeof(s))
# define debug_field(s, f)	printf(" OFFSET %s.%s: %d\n", #s, #f, ((int)&(((struct s *)0)->f)))
# define debug_const(c)		printf(" CONST %s %jd/0x%jx\n", #c, (intmax_t)c, (intmax_t)c);
#else
# define debug(level, fmt, args...)
# define debug_called(level)
# define debug_struct(s)
# define debug_union(s)
# define debug_type(s)
# define debug_field(s, f)
# define debug_const(c)
#endif

#define ciss_printf(sc, fmt, args...)	device_printf(sc->ciss_dev, fmt , ##args)

/************************************************************************
 * Queue primitives
 */

#define CISSQ_ADD(sc, qname)	\
	do {	\
	    struct ciss_qstat *qs = &(sc)->ciss_qstat[qname];	\
	\
	    qs->q_length++;	\
	    if (qs->q_length > qs->q_max)	\
		qs->q_max = qs->q_length;	\
	} while(0)

#define CISSQ_REMOVE(sc, qname)	(sc)->ciss_qstat[qname].q_length--
#define CISSQ_INIT(sc, qname)	\
	do {	\
	    sc->ciss_qstat[qname].q_length = 0;	\
	    sc->ciss_qstat[qname].q_max = 0;	\
	} while(0)


#define CISSQ_REQUEST_QUEUE(name, index)	\
static __inline void	\
ciss_initq_ ## name (struct ciss_softc *sc)	\
{	\
    STAILQ_INIT(&sc->ciss_ ## name);	\
    CISSQ_INIT(sc, index);	\
}	\
static __inline void	\
ciss_enqueue_ ## name (struct ciss_request *cr)	\
{	\
	\
    STAILQ_INSERT_TAIL(&cr->cr_sc->ciss_ ## name, cr, cr_link);	\
    CISSQ_ADD(cr->cr_sc, index);	\
    cr->cr_onq = index;	\
}	\
static __inline void	\
ciss_requeue_ ## name (struct ciss_request *cr)	\
{	\
	\
    STAILQ_INSERT_HEAD(&cr->cr_sc->ciss_ ## name, cr, cr_link);	\
    CISSQ_ADD(cr->cr_sc, index);	\
    cr->cr_onq = index;	\
}	\
static __inline struct ciss_request *	\
ciss_dequeue_ ## name (struct ciss_softc *sc)	\
{	\
    struct ciss_request *cr;	\
	\
    if ((cr = STAILQ_FIRST(&sc->ciss_ ## name)) != NULL) {	\
	STAILQ_REMOVE_HEAD(&sc->ciss_ ## name, cr_link);	\
	CISSQ_REMOVE(sc, index);	\
	cr->cr_onq = -1;	\
    }	\
    return(cr);	\
}	\
struct hack

CISSQ_REQUEST_QUEUE(free, CISSQ_FREE);
CISSQ_REQUEST_QUEUE(notify, CISSQ_NOTIFY);
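
/*
 * Illustrative sketch only, not part of the driver: a userland
 * approximation of what one CISSQ_REQUEST_QUEUE() instantiation provides,
 * namely a STAILQ of requests plus length and high-water-mark accounting.
 * Every demo_* name below is hypothetical, and unlike the driver (which
 * reaches the softc through cr->cr_sc) the softc is passed explicitly.
 * It assumes a <sys/queue.h> that supplies the STAILQ macros.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, cut-down stand-ins for struct ciss_request and friends. */
struct demo_request {
    int id;
    int onq;				/* queue index, or -1 when on no queue */
    STAILQ_ENTRY(demo_request) link;
};
STAILQ_HEAD(demo_qhead, demo_request);

struct demo_qstat {
    unsigned int q_length;		/* current queue depth */
    unsigned int q_max;			/* deepest the queue has ever been */
};

struct demo_softc {
    struct demo_qhead free;
    struct demo_qstat qstat;
};

/* Counterpart of ciss_initq_<name>(): empty queue, zeroed statistics. */
static void
demo_initq(struct demo_softc *sc)
{
    STAILQ_INIT(&sc->free);
    sc->qstat.q_length = 0;
    sc->qstat.q_max = 0;
}

/* Counterpart of ciss_enqueue_<name>(): append and update the statistics. */
static void
demo_enqueue(struct demo_softc *sc, struct demo_request *r)
{
    STAILQ_INSERT_TAIL(&sc->free, r, link);
    if (++sc->qstat.q_length > sc->qstat.q_max)
	sc->qstat.q_max = sc->qstat.q_length;
    r->onq = 0;				/* 0 plays the role of the queue index */
}

/* Counterpart of ciss_dequeue_<name>(): pop the head, or return NULL. */
static struct demo_request *
demo_dequeue(struct demo_softc *sc)
{
    struct demo_request *r;

    if ((r = STAILQ_FIRST(&sc->free)) != NULL) {
	STAILQ_REMOVE_HEAD(&sc->free, link);
	assert(sc->qstat.q_length > 0);
	sc->qstat.q_length--;
	r->onq = -1;
    }
    return (r);
}
```

/*
 * In this sketch q_max only ever grows, so it records the peak depth even
 * after the queue drains; that is the same bookkeeping CISSQ_ADD performs.
 */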

static __inline void
ciss_enqueue_complete(struct ciss_request *ac, cr_qhead_t *head)
{

    STAILQ_INSERT_TAIL(head, ac, cr_link);
}

static __inline struct ciss_request *
ciss_dequeue_complete(struct ciss_softc *sc, cr_qhead_t *head)
{
    struct ciss_request *ac;

    if ((ac = STAILQ_FIRST(head)) != NULL)
	STAILQ_REMOVE_HEAD(head, cr_link);
    return(ac);
}

/********************************************************************************
 * space-fill a character string
 */
static __inline void
padstr(char *targ, const char *src, int len)
{
    while (len-- > 0) {
	if (*src != 0) {
	    *targ++ = *src++;
	} else {
	    *targ++ = ' ';
	}
    }
}
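
/*
 * Illustrative sketch only: how padstr() behaves when filling a
 * fixed-width, space-padded field (for example, the 8-byte vendor field of
 * SCSI INQUIRY data). padstr() is repeated here so the example stands
 * alone; demo_fill_field() is a hypothetical helper, not a driver
 * function. Note that padstr() always writes exactly len bytes and never
 * adds a terminating NUL, so the helper adds one for printing.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy of the header's padstr(): copy src, then pad with spaces to len. */
static inline void
padstr(char *targ, const char *src, int len)
{
    while (len-- > 0) {
	if (*src != 0) {
	    *targ++ = *src++;
	} else {
	    *targ++ = ' ';
	}
    }
}

/*
 * Hypothetical helper: fill all but the last byte of the buffer with the
 * padded (or truncated) name, then terminate it so it can be used as a
 * C string.
 */
static void
demo_fill_field(char *field, size_t fieldlen, const char *name)
{
    assert(fieldlen > 1);
    padstr(field, name, (int)fieldlen - 1);
    field[fieldlen - 1] = '\0';
}
```

/*
 * With a 9-byte buffer, a short name such as "COMPAQ" is padded with two
 * trailing spaces, while a name longer than 8 characters is silently
 * truncated to its first 8.
 */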

#define ciss_report_request(a, b, c)	\
	_ciss_report_request(a, b, c, __FUNCTION__)