based on Solarflare SFC9000 family controllers. The driver supports jumbo
frames, transmit/receive checksum offload, TCP Segmentation Offload (TSO),
Large Receive Offload (LRO), VLAN checksum offload, VLAN TSO, and Receive Side
Scaling (RSS) using MSI-X interrupts.
This work was sponsored by Solarflare Communications, Inc.
My sincere thanks to Ben Hutchings for doing a lot of the hard work!
Sponsored by: Solarflare Communications, Inc.
MFC after: 3 weeks
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.
From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.
Reviewed by: kib
Discussed with: jhb
can be enabled via the hw.mfi.msi tunable. Many mfi(4) controllers also
support MSI-X, but in testing it seems that many adapters do not work with
MSI-X but do work with MSI.
MFC after: 2 weeks
is actually broken, or needs a BIOS upgrade for 64 bit loads, but this uncovered
a couple of misplaced opcode definitions and some missing continual mbox command
cases, so might as well update them here.
maximum IP datagram size (65535 bytes) +
Ethernet header size (14 bytes) +
2 * VLAN tag size (4 bytes) [1].
[1] We need to multiply by 2 to account for the double VLAN tag
provision added in IEEE 802.1ad.
Submitted by: David Somayajulu (david.somayajulu qlogic.com)
MFC after: 4 days
for regular files. Since other file types don't write into the
buffer cache, calling ncl_flush() is almost a no-op. However, it does
clear the NMODIFIED flag and this shouldn't be done by nfs_fsync() for
directories.
MFC after: 2 weeks
directly from g7, the pcpu pointer. This guarantees correct behavior
when the thread migrates to a different CPU.
Commit message stolen from r205431. Additional testing by Peter Jeremy.
MFC after: 3 days
emits calls for them, rather than expanding them inline. Older FreeBSD
versions compile for i386 by default and as such we end up with
unresolved symbols when we build LLVM's TableGen utility as a build
tool on them. Add the functions that GCC emits here, but don't bother
to make them atomic. Such is not needed.
Submitted by: marcel
MFC after: 1 week
curthread-accessing part of mtx_{,un}lock(9) when using a r210623-style
curthread implementation on sparc64, crashing the kernel in its early
cycles as PCPU isn't set up, yet (and can't be set up as OFW is one of the
things we need for that, which leads to a chicken-and-egg problem). What
happens is that due to the fact that the idea of r210623 actually is to
allow the compiler to cache invocations of curthread, it factors out
obtaining curthread needed for both mtx_lock(9) and mtx_unlock(9) to
before the branch based on kobj_mutex_inited when compiling the kernel
without the debugging options. So change kobj_class_compile_static(9)
to just never acquire kobj_mtx, effectively restricting it to its
documented use, and add a kobj_init_static(9) for initializing objects
using a class compiled with the former and that also avoids using mutex(9)
(and malloc(9)). Also assert in both of these functions that they are
used in their intended way only.
While at it, inline kobj_register_method() and kobj_unregister_method()
as there wasn't much point for factoring them out in the first place
and so that a reader of the code has to figure out the locking for
fewer functions missing a KOBJ_ASSERT.
Tested on powerpc{,64} by andreast.
Reviewed by: nwhitehorn (earlier version), jhb
MFC after: 3 days
layer for old KPI and KBI. New interface should be used together with
d_mmap_single cdevsw method.
Device pager can be allocated with the cdev_pager_allocate(9)
function, which takes struct cdev_pager_ops, containing
constructor/destructor and page fault handler methods supplied by
driver.
Constructor and destructor, called at the pager allocation and
deallocation time, allow the driver to handle per-object private data.
The pager handler is called to handle page fault on the vm map entry
backed by the driver pager. Driver shall return either the vm_page_t
which should be mapped, or error code (which does not cause kernel
panic anymore). The page handler interface has a placeholder to
specify the access mode causing the fault, but currently PROT_READ is
always passed there.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
change here is to ensure that when a process forks after arc4random
is seeded, the parent and child don't observe the same random sequence.
OpenBSD's fix introduces some additional overhead in the form of a
getpid() call. This could be improved upon, e.g., by setting a flag
in fork(), if it proves to be a problem.
This was discussed with secteam (simon, csjp, rwatson) in 2008, shortly
prior to my going out of town and forgetting all about it. The conclusion
was that the problem with forks is worrisome, but it doesn't appear to
have introduced an actual vulnerability for any known programs.
The only significant remaining difference between our arc4random and
OpenBSD's is in how we seed the generator in arc4_stir().
OpenBSD's version (r1.22). While some of our style changes were
indeed small improvements, being able to easily track functionality
changes in OpenBSD seems more useful.
Also fix style bugs in the FreeBSD-specific parts of this file.
No functional changes, as verified with md5.
before the nfs_decode_args() call in the new NFS client, so
that a specfied command line value won't be overwritten.
Also, modify the calculation for small values of desiredvnodes
to avoid an unusually large value or a divide by zero crash.
It seems that the default value for nm_wcommitsize is very
conservative and may need to change at some time.
PR: kern/159351
Submitted by: onwahe at gmail.com (earlier version)
Reviewed by: jhb
MFC after: 2 weeks
- Don't use a single big DMA block for all rings. Create separate
DMA area for each ring instead. Currently the following DMA
areas are created:
Event ring, standard RX ring, jumbo RX ring, RX return ring,
hardware MAC statistics and producer/consumer status area.
For Tigon II, mini RX ring and TX ring are additionally created.
- Added missing bus_dmamap_sync(9) in various TX/RX paths.
- TX ring is no longer created for Tigon 1 such that it saves more
resources on Tigon 1.
- Data sheet is not clear about alignment requirement of each ring
so use 32 bytes alignment for normal DMA area but use 64 bytes
alignment for jumbo RX ring where the extended RX descriptor
size is 64 bytes.
- For each TX/RX buffers use separate DMA tag(e.g. the size of a
DMA segment, total size of DMA segments etc).
- Tigon allows separate DMA area for event producer, RX return
producer and TX consumer which is really cool feature. This
means TX and RX path could be independently run in parallel.
However ti(4) uses a single driver lock so it's meaningless
to have separate DMA area for these producer/consumer such that
this change creates a single status DMA area.
- It seems Tigon has no limits on DMA address space and I also
don't see any problem with that but old comments in driver
indicates there could be issues on descriptors being located in
64bit region. Introduce a tunable, dev.ti.%d.dac, to disable
using 64bit DMA in driver. The default is 0 which means it would
use full 64bit DMA. If there are DMA issues, users can disable
it by setting the tunable to 0.
- Do not increase watchdog timer in ti_txeof(). Previously driver
increased the watchdog timer whenever there are queued TX frames.
- When stat ticks is set to 0, skip processing ti_stats_update(),
avoiding bus_dmamap_sync(9) and updating if_collisions counter.
- MTU does not include FCS bytes, replace it with
ETHER_VLAN_ENCAP_LEN.
With these changes, ti(4) should work on PAE environments.
Many thanks to Jay Borkenhagen for remote hardware access.
Before r215687, if some withered geom or provider could not be destroyed,
g_event thread went to sleep for 0.1s before retrying. After that change
it is just restarting immediately. r227009 made orphaned (withered) provider
to not detach immediately, but only after context switch. That made loop
inside g_event thread infinite on UP systems without PREEMPTION.
To address original problem with possible dead lock addressed by r227009
we have to fix r215687 change first, that needs some time to think and test.
have administrators control them. ti(4) provides a character
device to control various other features of driver via ioctls but
users had to write their own code to manipulate these parameters.
It seems some default values for these parameters are not optimal
on today's system but leave it as it was and let administrators
change them. The following parameters could be changed:
dev.ti.%d.rx_coal_ticks
dev.ti.%d.rx_max_coal_bds
dev.ti.%d.tx_coal_ticks
dev.ti.%d.tx_max_coal_bds
dev.ti.%d.tx_buf_ratio
dev.ti.%d.stat_ticks
The interface has to be brought down and up again before a change
takes effect.
ti(4) controller supports hardware MAC counters with additional
DMA statistics. So it's doable to export these counters via
sysctl interface. Unfortunately, these counters are cumulative
such that driver have to either send an explicit clear command to
controller after extracting them or have to maintain internal
counters to get actual changes. Neither look good to me so
counters were not exported via sysctl.
Pre-allocate the memory in device attach time. While I'm here
remove unnecessary reassignment of error variable as it was already
initialized. Also added a missing driver lock in TIIOCSETTRACE
handler.
it can be used by in-kernel consumers.
- Make kern_posix_fallocate() public.
- Use kern_posix_fadvise() and kern_posix_fallocate() to implement the
freebsd32 wrappers for the two system calls.
952 separate intent logs should be obvious in 'zpool iostat' output
1337 `zpool status -D' should tell if there are no DDT entries
References:
https://www.illumos.org/issues/952https://www.illumos.org/issues/1337
Obtained from: Illumos (issues 952, 1337; changesets 13384, 13432)
MFC after: 1 week