sfxge: Separate software Tx queue limit for non-TCP traffic
Add separate software Tx queue limit for non-TCP traffic to make total
limit higher and avoid local drops of TCP packets because of no
backpressure.
There is no point to make non-TCP limit high since without backpressure
UDP stream easily overflows any sensible limit.
Split early drops statistics since it is better to have separate counter
for each drop reason to make it unabmiguous.
Add software Tx queue high watermark. The information is very useful to
understand how big queues grow under traffic load.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge: Pass correct address to free allocated memory in the case of load error
Most likely is was just memory leak on the error handling path since
typically efsys_mem_t is filled in by zeros on allocation.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge: Remove unused esm_size member of the efsys_mem_t structure
esm_size is not even initialized properly when memory is allocated.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge: Do not bzero() DMA allocated memory once again
sfxge_dma_alloc() calls bus_dmamem_alloc() with BUS_DMA_ZERO flag, so
allocated memory is already filled in by zeros
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge: Add evq argument to sfxge_tx_qcomplete()
It removes necessity to get evq pointer by its index in soft context.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge: Remove extra cache-line alignment and reorder sfxge_evq_t
Remove the first member alignment to cacheline since it is nop.
Use __aligned() for the whole structure to make sure that the structure
size is cacheline aligned.
Remove lock alignment to make the structure smaller and fit all members
used on event queue processing into one cacheline (128 bytes) on x86-64.
The lock is obtained as well from different context when event queue
statistics are retrived from sysctl context, but it is infrequent.
Reorder members to avoid padding and go in usage order on event
processing.
As the result all structure members used on event queue processing fit
into exactly one cacheline (128 byte) now.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge: Move txq->next pointer to part writable on completion path
In fact the pointer is used only if more than one TXQ is processed in
one interrupt.
It is used (read-write) on completion path only.
Also it makes the first part of the structure smaller and it fits now
into one 128byte cache line. So, TXQ structure becomes 128 bytes
smaller.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
sfxge: Change sfxge_ev_qpoll() proto to avoid EVQ pointers array access
It was the only place on data path where sc->evq array is accessed.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
Some cleanup for sfxge.4
Use standard mdoc macros instead of pure roff, fix some other mdoc
usage,
make the style consistent, and fix some grammar issues.
Approved by: hrs (mentor)
Support tunable to control Tx deferred packet list limits
Also increase default for Tx queue get-list limit.
Too small limit results in TCP packets drops especiall when many
streams are running simultaneously.
Put list may be kept small enough since it is just a temporary
location if transmit function can't get Tx queue lock.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
The patch allows to check state of the software Tx queues at run time.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Update SolarFlare driver manual page with new tunables.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Make size of Tx and Rx rings configurable
Required size of event queue is calculated now.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
cleanup: code style fixes
Remove trailing whitespaces and tabs.
Enclose value in return statements in parentheses.
Use tabs after #define.
Do not skip comparison with 0/NULL in boolean expressions.
Submitted by: Andrew Rybchenko <arybchenko at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
sfxge: limit software Tx queue size.
Previous implementation limits put queue size only (when Tx lock can't
be acquired), but get queue may grow unboundedly which results in mbuf
pools exhaustion and latency growth.
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Add counter for Tx errors returned from if_transmit.
Submitted by: Boris Misenov <Boris.Misenov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Return error when packet is dropped because of link down.
Submitted by: Boris Misenov <Boris.Misenov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
------------------------------------------------------------------------
r279336 | ken | 2015-02-26 15:22:06 -0700 (Thu, 26 Feb 2015) | 12 lines
Add FreeBSD stable/10 version checks for the availability of the
CDAI_FLAG_NONE advanced information CCB flag.
Support for the flag was merged to stable/10 in r279329, and the
__FreeBSD_version in stable/10 was bumped to 1001510.
Check for that version in the mps(4) and mpr(4) drivers when determining
whether to use the flag.
Sponsored by: Spectra Logic
MFC after: 3 days
------------------------------------------------------------------------
------------------------------------------------------------------------
r279375 | ken | 2015-02-27 14:35:36 -0700 (Fri, 27 Feb 2015) | 26 lines
Fix I/O size calculation for pass(4) driver requests and add latency
tracking.
It is important to subtract the residual from the requested
transfer size to see how much data was actually transferred. With
tape drives in particular, it is common to request more data than is
returned.
Also, add I/O latency tracking for CAM requests issued by
cam_periph_runccb().
If the caller supplies a struct devstat, and the I/O is a SCSI or
ATA I/O, we will track the elapsed time to provide I/O latency
statistics for the request.
sys/cam/scsi/cam_periph.c:
In cam_periph_runccb(), subtract the residual when reporting I/O
totals to devstat(9) for SCSI and ATA passthrough requests.
In cam_periph_runccb(), grab the I/O start time and supply
the start time to devstat_end_transaction() so that it can
calculate the elapsed I/O time.
Sponsored by: Spectra Logic
MFC after: 1 week
------------------------------------------------------------------------
Sponsored by: Spectra Logic
This includes these changes: 279219, 279229, 279261, 279534, 279570,
280230, 280231.
In addition, bump __FreeBSD_version for the addition of the new
mtio(4) / sa(4) ioctls.
Thanks to Dan Langille, Harald Schmalzbauer and Rudolf Cejka for spending
a significant amount of time and effort testing these changes.
------------------------------------------------------------------------
r279219 | ken | 2015-02-23 14:59:30 -0700 (Mon, 23 Feb 2015) | 282 lines
Significant upgrades to sa(4) and mt(1).
The primary focus of these changes is to modernize FreeBSD's
tape infrastructure so that we can take advantage of some of the
features of modern tape drives and allow support for LTFS.
Significant changes and new features include:
o sa(4) driver status and parameter information is now exported via an
XML structure. This will allow for changes and improvements later
on that will not break userland applications. The old MTIOCGET
status ioctl remains, so applications using the existing interface
will not break.
o 'mt status' now reports drive-reported tape position information
as well as the previously available calculated tape position
information. These numbers will be different at times, because
the drive-reported block numbers are relative to BOP (Beginning
of Partition), but the block numbers calculated previously via
sa(4) (and still provided) are relative to the last filemark.
Both numbers are now provided. 'mt status' now also shows the
drive INQUIRY information, serial number and any position flags
(BOP, EOT, etc.) provided with the tape position information.
'mt status -v' adds information on the maximum possible I/O size,
and the underlying values used to calculate it.
o The extra sa(4) /dev entries (/dev/saN.[0-3]) have been removed.
The extra devices were originally added as place holders for
density-specific device nodes. Some OSes (NetBSD, NetApp's OnTap
and Solaris) have had device nodes that, when you write to them,
will automatically select a given density for particular tape drives.
This is a convenient way of switching densities, but it was never
implemented in FreeBSD. Only the device nodes were there, and that
sometimes confused users.
For modern tape devices, the density is generally not selectable
(e.g. with LTO) or defaults to the highest availble density when
the tape is rewritten from BOT (e.g. TS11X0). So, for most users,
density selection won't be necessary. If they do need to select
the density, it is easy enough to use 'mt density' to change it.
o Protection information is now supported. This is either a
Reed-Solomon CRC or CRC32 that is included at the end of each block
read and written. On write, the tape drive verifies the CRC, and
on read, the tape drive provides a CRC for the userland application
to verify.
o New, extensible tape driver parameter get/set interface.
o Density reporting information. For drives that support it,
'mt getdensity' will show detailed information on what formats the
tape drive supports, and what formats the tape drive supports.
o Some mt(1) functionality moved into a new mt(3) library so that
external applications can reuse the code.
o The new mt(3) library includes helper routines to aid in parsing
the XML output of the sa(4) driver, and build a tree of driver
metadata.
o Support for the MTLOAD (load a tape in the drive) and MTWEOFI
(write filemark immediate) ioctls needed by IBM's LTFS
implementation.
o Improve device departure behavior for the sa(4) driver. The previous
implementation led to hangs when the device was open.
o This has been tested on the following types of drives:
IBM TS1150
IBM TS1140
IBM LTO-6
IBM LTO-5
HP LTO-2
Seagate DDS-4
Quantum DLT-4000
Exabyte 8505
Sony DDS-2
contrib/groff/tmac/doc-syms,
share/mk/bsd.libnames.mk,
lib/Makefile,
Add libmt.
lib/libmt/Makefile,
lib/libmt/mt.3,
lib/libmt/mtlib.c,
lib/libmt/mtlib.h,
New mt(3) library that contains functions moved from mt(1) and
new functions needed to interact with the updated sa(4) driver.
This includes XML parser helper functions that application writers
can use when writing code to query tape parameters.
rescue/rescue/Makefile:
Add -lmt to CRUNCH_LIBS.
src/share/man/man4/mtio.4
Clarify this man page a bit, and since it contains what is
essentially the mtio.h header file, add new ioctls and structure
definitions from mtio.h.
src/share/man/man4/sa.4
Update BUGS and maintainer section.
sys/cam/scsi/scsi_all.c,
sys/cam/scsi/scsi_all.h:
Add SCSI SECURITY PROTOCOL IN/OUT CDB definitions and CDB building
functions.
sys/cam/scsi/scsi_sa.c
sys/cam/scsi/scsi_sa.h
Many tape driver changes, largely outlined above.
Increase the sa(4) driver read/write timeout from 4 to 32
minutes. This is based on the recommended values for IBM LTO
5/6 drives. This may also avoid timeouts for other tape
hardware that can take a long time to do retries and error
recovery. Longer term, a better way to handle this is to ask
the drive for recommended timeout values using the REPORT
SUPPORTED OPCODES command. Modern IBM and Oracle tape drives
at least support that command, and it would allow for more
accurate timeout values.
Add XML status generation. This is done with a series of
macros to eliminate as much duplicate code as possible. The
new XML-based status values are reported through the new
MTIOCEXTGET ioctl.
Add XML driver parameter reporting, using the new MTIOCPARAMGET
ioctl.
Add a new driver parameter setting interface, using the new
MTIOCPARAMSET and MTIOCSETLIST ioctls.
Add a new MTIOCRBLIM ioctl to get block limits information.
Add CCB/CDB building routines scsi_locate_16, scsi_locate_10,
and scsi_read_position_10().
scsi_locate_10 implements the LOCATE command, as does the
existing scsi_set_position() command. It just supports
additional arguments and features. If/when we figure out a
good way to provide backward compatibility for older
applications using the old function API, we can just revamp
scsi_set_position(). The same goes for
scsi_read_position_10() and the existing scsi_read_position()
function.
Revamp sasetpos() to take the new mtlocate structure as an
argument. It now will use either scsi_locate_10() or
scsi_locate_16(), depending upon the arguments the user
supplies. As before, once we change position we don't have a
clear idea of what the current logical position of the tape
drive is.
For tape drives that support long form position data, we
read the current position and store that for later reporting
after changing the position. This should help applications
like Bacula speed tape access under FreeBSD once they are
modified to support the new ioctls.
Add a new quirk, SA_QUIRK_NO_LONG_POS, that is set for all
drives that report SCSI-2 or older, as well as drives that
report an Illegal Request type error for READ POSITION with
the long format. So we should automatically detect drives
that don't support the long form and stop asking for it after
an initial try.
Add a partition number to the sa(4) softc.
Improve device departure handling. The previous implementation
led to hangs when the device was open.
If an application had the sa(4) driver open, and attempted to
close it after it went away, the cam_periph_release() call in
saclose() would cause the periph to get destroyed because that
was the last reference to it. Because destroy_dev() was
called from the sa(4) driver's cleanup routine (sacleanup()),
and would block waiting for the close to happen, a deadlock
would result.
So instead of calling destroy_dev() from the cleanup routine,
call destroy_dev_sched_cb() from saoninvalidate() and wait for
the callback.
Acquire a reference for devfs in saregister(), and release it
in the new sadevgonecb() routine when all devfs devices for
the particular sa(4) driver instance are gone.
Add a new function, sasetupdev(), to centralize setting
per-instance devfs device parameters instead of repeating the
code in saregister().
Add an open count to the softc, so we know how many
peripheral driver references are a result of open
sessions.
Add the D_TRACKCLOSE flag to the cdevsw flags so
that we get a 1:1 mapping of open to close calls
instead of a N:1 mapping.
This should be a no-op for everything except the
control device, since we don't allow more than one
open on non-control devices.
However, since we do allow multiple opens on the
control device, the combination of the open count
and the D_TRACKCLOSE flag should result in an
accurate peripheral driver reference count, and an
accurate open count.
The accurate open count allows us to release all
peripheral driver references that are the result
of open contexts once we get the callback from devfs.
sys/sys/mtio.h:
Add a number of new mt(4) ioctls and the requisite data
structures. None of the existing interfaces been removed
or changed.
This includes definitions for the following new ioctls:
MTIOCRBLIM /* get block limits */
MTIOCEXTLOCATE /* seek to position */
MTIOCEXTGET /* get tape status */
MTIOCPARAMGET /* get tape params */
MTIOCPARAMSET /* set tape params */
MTIOCSETLIST /* set N params */
usr.bin/mt/Makefile:
mt(1) now depends on libmt, libsbuf and libbsdxml.
usr.bin/mt/mt.1:
Document new mt(1) features and subcommands.
usr.bin/mt/mt.c:
Implement support for mt(1) subcommands that need to
use getopt(3) for their arguments.
Implement a new 'mt status' command to replace the old
'mt status' command. The old status command has been
renamed 'ostatus'.
The new status function uses the MTIOCEXTGET ioctl, and
therefore parses the XML data to determine drive status.
The -x argument to 'mt status' allows the user to dump out
the raw XML reported by the kernel.
The new status display is mostly the same as the old status
display, except that it doesn't print the redundant density
mode information, and it does print the current partition
number and position flags.
Add a new command, 'mt locate', that will supersede the
old 'mt setspos' and 'mt sethpos' commands. 'mt locate'
implements all of the functionality of the MTIOCEXTLOCATE
ioctl, and allows the user to change the logical position
of the tape drive in a number of ways. (Partition,
block number, file number, set mark number, end of data.)
The immediate bit and the explicit address bits are
implemented, but not documented in the man page.
Add a new 'mt weofi' command to use the new MTWEOFI ioctl.
This allows the user to ask the drive to write a filemark
without waiting around for the operation to complete.
Add a new 'mt getdensity' command that gets the XML-based
tape drive density report from the sa(4) driver and displays
it. This uses the SCSI REPORT DENSITY SUPPORT command
to get comprehensive information from the tape drive about
what formats it is able to read and write.
Add a new 'mt protect' command that allows getting and setting
tape drive protection information. The protection information
is a CRC tacked on to the end of every read/write from and to
the tape drive.
Sponsored by: Spectra Logic
MFC after: 1 month
------------------------------------------------------------------------
------------------------------------------------------------------------
r279229 | ken | 2015-02-23 22:43:16 -0700 (Mon, 23 Feb 2015) | 5 lines
Fix printf format warnings on sparc64 and mips.
Sponsored by: Spectra Logic
MFC after: 1 month
------------------------------------------------------------------------
------------------------------------------------------------------------
r279261 | ken | 2015-02-24 21:30:23 -0700 (Tue, 24 Feb 2015) | 23 lines
Fix several problems found by Coverity.
lib/libmt/mtlib.c:
In mt_start_element(), make sure we don't overflow the
cur_sb array. CID 1271325
usr.bin/mt/mt.c:
In main(), bzero the mt_com structure so that we aren't
using any uninitialized stack variables. CID 1271319
In mt_param(), only allow one -s and one -p argument. This
will prevent a memory leak caused by overwriting the
param_name and/or param_value variables. CID 1271320 and
CID 1271322
To make things simpler in mt_param(), make sure there
there is only one exit path for the function. Make sure
the arguments are explicitly freed.
Sponsored by: Spectra Logic
Pointed out by: emaste
MFC after: 1 month
------------------------------------------------------------------------
------------------------------------------------------------------------
r279534 | ken | 2015-03-02 11:09:49 -0700 (Mon, 02 Mar 2015) | 18 lines
Change the sa(4) driver to check for long position support on
SCSI-2 devices.
Some older tape devices claim to be SCSI-2, but actually do support
long position information. (Long position information includes
the current file mark.) For example, the COMPAQ SuperDLT1.
So we now only disable the check on SCSI-1 and older devices.
sys/cam/scsi/scsi_sa.c:
In saregister(), only disable fetching long position
information on SCSI-1 and older drives. Update the
comment to explain why.
Confirmed by: dvl
Sponsored by: Spectra Logic
MFC after: 3 weeks
------------------------------------------------------------------------
------------------------------------------------------------------------
r279570 | ken | 2015-03-03 15:49:07 -0700 (Tue, 03 Mar 2015) | 21 lines
Add density code for DAT-72, and notes on DAT-160.
As it turns out, the density code for DAT-160 (0x48) is the same
as for SDLT220. Since the SDLT values are already in the table,
we will leave them in place.
Thanks to Harald Schmalzbauer for confirming the DAT-72 density code.
lib/libmt/mtlib.c:
Add DAT-72 density code, and commented out DAT-160 density
code. Explain why DAT-160 is commented out. Add notes
explaining where the bpi values for these formats came from.
usr.bin/mt/mt.1:
Add DAT-72 density code, and add a note explaining that
the SDLTTapeI(110) density code (0x48) is the same as
DAT-160.
Sponsored by: Spectra Logic
MFC after: 3 weeks
------------------------------------------------------------------------
------------------------------------------------------------------------
r280230 | ken | 2015-03-18 14:52:34 -0600 (Wed, 18 Mar 2015) | 25 lines
Fix a couple of problems in the sa(4) media type reports.
The only drives I have discovered so far that support medium type
reports are newer HP LTO (LTO-5 and LTO-6) drives. IBM drives
only support the density reports.
sys/cam/scsi/scsi_sa.h:
The number of possible density codes in the medium type
report is 9, not 8. This caused problems parsing all of
the medium type report after this point in the structure.
usr.bin/mt/mt.c:
Run the density codes returned in the medium type report
through denstostring(), just like the primary and secondary
density codes in the density report. This will print the
density code in hex, and give a text description if it
is available.
Thanks to Rudolf Cejka for doing extensive testing with HP LTO drives
and Bacula and discovering these problems.
Tested by: Rudolf Cejka <cejkar at fit.vutbr.cz>
Sponsored by: Spectra Logic
MFC after: 4 days
------------------------------------------------------------------------
------------------------------------------------------------------------
r280231 | ken | 2015-03-18 14:54:54 -0600 (Wed, 18 Mar 2015) | 16 lines
Improve the mt(1) rblim display.
The granularity reported by READ BLOCK LIMITS is an exponent, not a
byte value. So a granularity of 0 means 2^0, or 1 byte. A
granularity of 1 means 2^1, or 2 bytes.
Print out the individual block limits on separate lines to improve
readability and avoid exceeding 80 columns.
usr.bin/mt/mt.c:
Fix and improve the 'mt rblim' output. Add a MT_PLURAL()
macro so we can print "byte" or "bytes" as appropriate.
Sponsored by: Spectra Logic
MFC after: 4 days
------------------------------------------------------------------------
Sponsored by: Spectra Logic
When inserting new entry into the address map, ensure that not only
next entry does not intersect with the tail of the new entry, but also
that previous entry is also before new entry start.
following MFC as well:
MFC 278251:
Honor the following flags for items that can be conditionalized out of the
build/install without disrupting other dependent services (see r278249, et
al):
- MK_LOCATE
- MK_MAN
- MK_NLS
- MK_OPENSSL
- MK_PKGBOOTSTRAP
- MK_SENDMAIL
Additional flags need to be handled in etc/Makefile, but it requires
refactoring the relevant scripts in etc/rc.d/*
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
r278135 (by amdmi3):
- Remove more files when MK_USB == no
Reviewed by: ngie
Approved by: ngie
Differential Revision: D1600
r278202:
Clean up more usb related files when MK_USB == no when dealing with
manpages, libraries, and binaries
Sponsored by: EMC / Isilon Storage Division
Release 2015b - 2015-03-19 23:28:11 -0700
Changes affecting future time stamps
Mongolia will start observing DST again this year, from the last
Saturday in March at 02:00 to the last Saturday in September at 00:00.
(Thanks to Ganbold Tsagaankhuu.)
Palestine will start DST on March 28, not March 27. Also,
correct the fall 2014 transition from September 26 to October 24.
Adjust future predictions accordingly. (Thanks to Steffen Thorsen.)
Changes affecting past time stamps
The 1982 zone shift in Pacific/Easter has been corrected, fixing a 2015a
regression. (Thanks to Stuart Bishop for reporting the problem.)
Some more zones have been turned into links, when they differed
from existing zones only for older time stamps. As usual,
these changes affect UTC offsets in pre-1970 time stamps only.
Their old contents have been moved to the 'backzone' file.
The affected zones are: America/Antigua, America/Cayman,
Pacific/Midway, and Pacific/Saipan.
Changes affecting time zone abbreviations
Correct the 1992-2010 DST abbreviation in Volgograd from "MSK" to "MSD".
(Thanks to Hank W.)
msun: use previously ignored "in" value.
This fixes evaluation of exceptional values in scalblnl().
While here, simplify the code as suggested by Bruce Evans.
Reported by: clang static analyzer
setmode(3): Make sure that setmode sets errno on failure.
Our man page already documented this partially but now
we provide more consistent behavior.
PR: 136669
Obtained from: NetBSD (CVS rev. 1.31, 1.33)
Relnotes: yes
It works only for virtual disks backed by ZVOLs and raw devices supporting
BIO_DELETE. Virtual disks backed by files won't report this capability.
Relnotes: yes
An update for the i915 GPU driver, which brings the code up to Linux
commit 4d93914ae3db4a897ead4b.
MFC r277959 (by adrian):
Fix backlight for ivybridge based laptops (and whatever else comes through
this codepath.)
MFC r278146:
Do not attach to the unsupported chipsets, unless magic tunable is
frobbed.
MFC r278147, r278148:
Fix sign for the error code returned from the driver-specific code.
MFC r278152:
Do not access gmbus_ports array past its end.
MFC r278159 (by emaste):
Remove duplicate intel_fbc_enabled prototype.