commands to an ATA device attached via a SCSI control.
sys/cam/scsi/scsi_all.c:
- Added scsi_ata_identify, scsi_ata_trim
Which use ATA Pass-Through to send commands to the attached disk.
sys/cam/scsi/scsi_all.h:
- Added defines for all missing ATA Pass-Through commands values.
- Added scsi_ata_identify, scsi_ata_trim methods used in ATA TRIM
support.
- Added scsi_vpd_logical_block_prov structure used when querying for
the supported sizes UNMAP commands.
- Added scsi_vpd_block_limits structure used when querying for the
supported sizes of the UNMAP command.
Reviewed by: mav
Approved by: pjd (mentor)
MFC after: 2 weeks
size of a delete request sent to the providing device performed by g_dev_ioctl.
This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA
deletes which siginificantly improves performance.
Previously this was hard coded to 65536 sectors, the new default is 262144
which doubles the throughput of deletes on commonly available SSD's.
In tests on a Intel 520 120GB FW: 400i disk it improved the delete throughput
from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via
newfs -E
For some SSD's where delete time is pretty much constant, no matter what
the request, setting this to 0 will provide significantly better throughput
e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s
Reviewed by: mav
Approved by: pjd (mentor)
MFC after: 2 weeks
I/O clock. Thankfully, the simple executive provies a way to querry
the proper clock that works on all models. Move to asking for the SCLK
via this interface.
This gets the serial console working after we start init and open the
console and set the divisor (which turned the output from good to
bad). I can login on the console now.
and tuned value, we would advertise the unsupported value to CAM and it would
merrily destroy the controller with way too many IO operations.
This manifests itself in a Zero Memory RAID configuration for a P410 and
possibly other controllers.
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
in the pcb. setjmp/longjmp in the kernel also used these values, so
continue to use them although their use isn't technically the pcb
register array (matching is all that's important for setjmp/longjmp in
the kernel). Finally, eliminate the old register names from regnum.h.
This is a lexical change only. The non-debug .o files have the same md5.
same place as dst, or to the sockaddr in the routing table.
The const constraint of gw makes us safe from modifing routing table
accidentially. And "onstantness" of dst allows us to remove several
bandaids, when we switched it back at &ro->ro_dst, now it always
points there.
Reviewed by: rrs
Rework the guest register fetch code to allow the RIP to
be extracted from the VMCS while the kernel decoder is
functioning.
Hit by the OpenBSD local-apic code.
Submitted by: neel
Reviewed by: grehan
Obtained from: NetApp
Merge vendor bugfix for a possible deadlock related to async destroy
and improve write performance by introducing a new lock protecting
tx_open_txg.
Illumos ZFS issues:
3642 dsl_scan_active() should not issue I/O to determine if async
destroying is active
3643 txg_delay should not hold the tc_lock
MFC after: 1 week
route. What it was is there are two places in ip_output.c
where we do a goto again. One place was fine, it
copies out the new address and then resets dst = ro->rt_dst;
But the other place does *not* do that, which means earlier
when we found the gateway, we have dst pointing there
aka dst = ro->rt_gateway is done.. then we do a
goto again.. bam now we clobber the default route.
The fix is just to move the again so we are always
doing dst = &ro->rt_dst; in the again loop.
PR: 174749,157796
MFC after: 1 week
GDT from the correct segment, otherwise a triple fault would be caused.
In some virtual environments (VMware, VirtualBox, etc) this could lead
to a unhandled error or hang in the guest emulation software.
Thanks to avg and jhb for a few hints in the right direction.
Noticed by: Jeremy Chadwick <jdc@koitsu.org> (and many others)
MFC after: 1 week
instead of kernel_map size to prevent kernel memory exhaustion
by mbufs and a subsequent panic on physical page allocation
failure.
On architectures without a direct map all mbuf memory (except
for jumbo mbufs larger than PAGE_SIZE) comes from kmem_map.
It is the limiting factor hence.
For architectures with a direct map using the size of kmem_map
is a good proxy of available kernel memory as well. If it is
much smaller the mbuf limit may be sub-optimal but remains
reasonable, while avoiding panics under exhaustion.
The overall mbuf memory limit calculation may be reconsidered
again later, however due to the many different mbuf sizes and
different backing KVM maps it is a tricky subject.
Found by: pho's new network stress test
Pointed out by: alc (kmem_map instead of kernel_map)
Tested by: pho
duplicate ACK make sure we actually have new data to send.
This prevents us from sending unneccessary pure ACKs.
Reported by: Matt Miller <matt@matthewjmiller.net>
Tested by: Matt Miller <matt@matthewjmiller.net>
MFC after: 2 weeks
impossible to set quota and reservation on pools lower than version 22.
Problem has been reported and a solution discussed with vendor.
Illumos ZFS issues:
3739 cannot set zfs quota or reservation on pool version < 22
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reported by: Steve Wills <swills@FreeBSD.org>
MFC after: 3 days
Partially implement generic_bs_*_8() for MIPS platforms.
This is known to work with TARGET_ARCH=mips64 with FreeBSD/BERI.
Assuming that other definitions in cpufunc.h are correct it will
work on non-o64 ABI systems except sibyte. On sibyte and o32 systems
generic_bs_*_8() will remain panic() implementations.
Sponsored by: DARPA, AFRL
Reviewed by: imp, jmallett (older versions)
that uses non-ISA IRQs but use a plain IRQ resource in _CRS. However,
a non-ISA IRQ can't fit into a plain IRQ resource. If we encounter a
link like this, build the resource buffer from _PRS instead of _CRS.
- Set the correct size of the end tag in a resource buffer.
Tested by: Benjamin Lee <ben@b1c1l1.com>
MFC after: 2 weeks
than VLAN groups.
Some chips (eg this rtl8366rb) has a VLAN group per port - you first
define a set of VLANs in a vlan group, then you assign a VLAN group
to a port.
Other chips (eg the AR8xxx switch chips) have a VLAN ID array per
port - there's no group per se, just a list of vlans that can be
configured.
So for now, the switch API will use the latter and rely on drivers
doing the heavy lifting if one wishes to use the VLAN group method.
Maybe later on both can be supported.
PR: kern/177878
PR: kern/177873
Submitted by: Luiz Otavio O Souza <loos.br@gmail.com>
Reviewed by: ray
the pause/resume code to not be called completely symmetrically.
I'll chase down the root cause of that soon; this at least works around
the bug and tells me when it happens.
This allows users who boot without loader to adjust their environments
around slightly buggy or slow hardware.
PR: kern/161809
Submitted by: rozhuk.im@gmail.com
MFC after: 2 weeks
is compiled in or not.
This fixes issues with people running -HEAD but who build modules
without doing a "make buildkernel KERNCONF=XXX", thus picking up
opt_*.h. The resulting module wouldn't have 11n enabled and the
chainmask configuration would just be plain wrong.
This allows mapping a tape drive in a changer (as reported by
'chio status') to a sa(4) driver instance by comparing the
serial numbers.
The designators can be ASCII (which is printed out directly), binary
(which is printed in hex format) or UTF-8, which is printed in either
native UTF-8 format if the terminal can support it, or in %XX notation
for non-ASCII characters. Thanks to Hiroki Sato <hrs@> for the
explaining UTF-8 printing and example UTF-8 printing code.
chio.h: Modify the changer_element_status structure to add new
fields and definitions from the SMC3r16 spec.
Rename the original CHIOGSTATUS ioctl to OCHIOGTATUS and
define a new CHIOGSTATUS ioctl.
Clean up some tab/space issues.
chio.c: For the 'status' subcommand, print the designator field
if it is supplied by a device.
scsi_ch.h: Add new flags for DVCID and CURDATA to the READ
ELEMENT STATUS command structure.
Add a read_element_status_device_id structure
for the data fields in the new standard. Add new
unions, dt_or_obsolete and voltage_devid, to hold
and address data from either SCSI-2 or newer devices.
scsi_ch.c: Implement support for fetching device IDs with READ
ELEMENT STATUS data.
Add new arguments to scsi_read_element_status() to
allow the user to request the DVCID and CURDATA bits.
This isn't compiled into libcam (it's only an internal
kernel interface), so we don't need any special
handling for the API change.
If the user issues the new CHIOGSTATUS ioctl, copy all of
the available element status data out. If he issues the
OCHIOGSTATUS ioctl, we don't copy the new fields in the
structure.
Fix a bug in chopen() that would result in the peripheral
never getting unheld if chgetparams() failed.
Sponsored by: Spectra Logic
Submitted by: Po-Li Soong
MFC After: 1 week
This is intended to be used as a stop-gap for switch devices
which expose multiple ethernet PHYs but we don't have a driver
for - here, etherswitchcfg and the general switch configuration
API can be used to interface to said PHYs.
Submitted by: Luiz Otavio O Souza <loos.br@gmail.com>
Programs often do not expect an [EINTR] return from sem_wait() and POSIX
only allows it if the signal was installed without SA_RESTART. The timeout
in sem_timedwait() is absolute so it can be restarted normally.
The umtx call can be invoked with a relative timeout and in that case
[ERESTART] must be changed to [EINTR]. However, libc does not do this.
The old POSIX semaphore implementation did this correctly (before r249566),
unlike the new umtx one.
It may be desirable to avoid [EINTR] completely, which matches the pthread
functions and is explicitly permitted by POSIX. However, the kernel must
return [EINTR] at least for signals with SA_RESTART clear, otherwise pthread
cancellation will not abort a semaphore wait. In this commit, only restore
the 8.x behaviour which is also permitted by POSIX.
Discussed with: jhb
MFC after: 1 week
buffer for the last vnode on the mount back to the server, it
returns. At that point, the code continues with the unmount,
including freeing up the nfs specific part of the mount structure.
It is possible that an nfsiod thread will try to check for an
empty I/O queue in the nfs specific part of the mount structure
after it has been free'd by the unmount. This patch avoids this problem by
setting the iodmount entries for the mount back to NULL while holding the
mutex in the unmount and checking the appropriate entry is non-NULL after
acquiring the mutex in the nfsiod thread.
Reported and tested by: pho
Reviewed by: kib
MFC after: 2 weeks
respective functionality, allowing to synchronize TSC on APs to match BSP's
during boot. It may be unsafe in general case due to theoretical chance of
later drift if CPUs are using different clock rate or source, but it allows
to use TSC in some cases when difference caused by some initialization bug,
while TSCs are known to increment synchronously.
Reviewed by: jimharris, kib
MFC after: 1 month
option. This can occur when an nfsiod thread that already holds
a buffer lock attempts to acquire a vnode lock on an entry in
the directory (a LOR) when another thread holding the vnode lock
is waiting on an nfsiod thread. This patch avoids the deadlock by disabling
readahead for this case, so the nfsiod threads never do readdirplus.
Since readaheads for directories need the directory offset cookie
from the previous read, they cannot normally happen in parallel.
As such, testing by jhb@ and myself didn't find any performance
degredation when this patch is applied. If there is a case where
this results in a significant performance degradation, mounting
without the "rdirplus" option can be done to re-enable readahead
for directories.
Reported and tested by: jhb
Reviewed by: jhb
MFC after: 2 weeks
it will work with either the old or new server.
The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes). It does not currently work for NFSv4 requests. They are
more complex, and will take more work to support.
This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately. Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads. This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.
The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.
This improves sequential write performance in ZFS, because writes
to a file are now more ordered. Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS. Sending them down the same thread increases
the odds of their being in order.
In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once. ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking. This change does not affect the
NFSv4 code.
This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.
I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.
The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables. The read offset window calculation has been slightly
modified as well. Instead of having separate bins, each file
handle has a rolling window of bin_shift size. This minimizes
glitches in throughput when shifting from one bin to another.
sys/conf/files:
Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c
when either the old or the new NFS server is built.
sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
Bring in changes from Rick Macklem to newnfs_realign that
allow it to operate in blocking (M_WAITOK) or non-blocking
(M_NOWAIT) mode.
sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
Bring in a change from Rick Macklem to allow telling
nfsm_dissect() whether or not to wait for mallocs.
sys/fs/nfs/nfsm_subs.h:
Bring in changes from Rick Macklem to create a new
nfsm_dissect_nonblock() inline function and
NFSM_DISSECT_NONBLOCK() macro.
sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
Add the malloc wait flag to a newnfs_realign() call.
sys/fs/nfsserver/nfs_nfsdkrpc.c:
Setup the new NFS server's RPC thread pool so that it will
call the FHA code.
Add the malloc flag argument to newnfs_realign().
Unstaticize newnfs_nfsv3_procid[] so that we can use it in
the FHA code.
sys/fs/nfsserver/nfs_nfsdsocket.c:
In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
that use the LK_SHARED lock type.
sys/fs/nfsserver/nfs_nfsdport.c:
In nfsd_fhtovp(), if we're starting a write, check to see
whether the underlying filesystem supports shared writes.
If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.
sys/nfsserver/nfs_fha.c:
Remove all code that is specific to the NFS server
implementation. Anything that is server-specific is now
accessed through a callback supplied by that server's FHA
shim in the new softc.
There are now separate sysctls and tunables for the FHA
implementations for the old and new NFS servers. The new
NFS server has its tunables under vfs.nfsd.fha, the old
NFS server's tunables are under vfs.nfsrv.fha as before.
In fha_extract_info(), use callouts for all server-specific
code. Getting file handles and offsets is now done in the
individual server's shim module.
In fha_hash_entry_choose_thread(), change the way we decide
whether two reads are in proximity to each other.
Previously, the calculation was a simple shift operation to
see whether the offsets were in the same power of 2 bucket.
The issue was that there would be a bucket (and therefore
thread) transition, even if the reads were in close
proximity. When there is a thread transition, reads wind
up going somewhat out of order, and ZFS gets confused.
The new calculation simply tries to see whether the offsets
are within 1 << bin_shift of each other. If they are, the
reads will be sent to the same thread.
The effect of this change is that for sequential reads, if
the client doesn't exceed the max_reqs_per_nfsd parameter
and the bin_shift is set to a reasonable value (22, or
4MB works well in my tests), the reads in any sequential
stream will largely be confined to a single thread.
Change fha_assign() so that it takes a softc argument. It
is now called from the individual server's shim code, which
will pass in the softc.
Change fhe_stats_sysctl() so that it takes a softc
parameter. It is now called from the individual server's
shim code. Add the current offset to the list of things
printed out about each active thread.
Change the num_reads and num_writes counters in the
fha_hash_entry structure to 32-bit values, and rename them
num_rw and num_exclusive, respectively, to reflect their
changed usage.
Add an enable sysctl and tunable that allows the user to
disable the FHA code (when vfs.XXX.fha.enable = 0). This
is useful for before/after performance comparisons.
nfs_fha.h:
Move most structure definitions out of nfs_fha.c and into
the header file, so that the individual server shims can
see them.
Change the default bin_shift to 22 (4MB) instead of 18
(256K). Allow unlimited commands per thread.
sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
Add shims for the old and new NFS servers to interface with
the FHA code, and callbacks for the
The shims contain all of the code and definitions that are
specific to the NFS servers.
They setup the server-specific callbacks and set the server
name for the sysctl and loader tunable variables.
sys/nfsserver/nfs_srvkrpc.c:
Configure the RPC code to call fhaold_assign() instead of
fha_assign().
sys/modules/nfsd/Makefile:
Add nfs_fha.c and nfs_fha_new.c.
sys/modules/nfsserver/Makefile:
Add nfs_fha_old.c.
Reviewed by: rmacklem
Sponsored by: Spectra Logic
MFC after: 2 weeks
prevents us from creating UMA_ZONE_PCPU zones earlier.
As bandaid shift initialization of counter(9) zone later.
Reviewed by: kib
Reported & tested by: Lytochkin Boris <lytboris gmail.com>
(Wasting 4k just as a temporary placeholder for a boot environment seems
a bit ridiculous, but hey.)
Tested: gxemul:
$ gxemul -e malta -d i:/home/adrian/work/freebsd/svn/mfsroot-rspro.img -C 4Kc /tftpboot/kernel.MALTA
* Add ah_ratesArray[] to the ar5416 HAL state - this stores the maximum
values permissable per rate.
* Since different chip EEPROM formats store this value in a different place,
store the HT40 power detector increment value in the ar5416 HAL state.
* Modify the target power setup code to store the maximum values in the
ar5416 HAL state rather than using a local variable.
* Add ar5416RateToRateTable() - to convert a hardware rate code to the
ratesArray enum / index.
* Add ar5416GetTxRatePower() - which goes through the gymnastics required
to correctly calculate the target TX power:
+ Add the power detector increment for ht40;
+ Take the power offset into account for AR9280 and later;
+ Offset the TX power correctly when doing open-loop TX power control;
+ Enforce the per-rate maximum value allowable.
Note - setting a TPC value of 0x0 in the TX descriptor on (at least)
the AR9160 resulted in the TX power being very high indeed. This didn't
happen on the AR9220. I'm guessing it's a chip bug that was fixed at
some point. So for now, just assume the AR5416/AR5418 and AR9130 are
also suspect and clamp the minimum value here at 1.
Tested:
* AR5416, AR9160, AR9220 hostap, verified using (2GHz) spectrum analyser
* Looked at target TX power in TX descriptor (using athalq) as well as TX
power on the spectrum analyser.
TODO:
* The TX descriptor code sets the target TX power to 0 for AR9285 chips.
I'm not yet sure why. Disable this for TPC and ensure that the TPC
TX power is set.
* AR9280, AR9285, AR9227, AR9287 testing!
* 5GHz testing!
Quirks:
* The per-packet TPC code is only exercised when the tpc sysctl is set
to 1. (dev.ath.X.tpc=1.) This needs to be done before you bring the
interface up.
* When TPC is enabled, setting the TX power doesn't end up with a call
through to the HAL to update the maximum TX power. So ensure that
you set the TPC sysctl before you bring the interface up and configure
a lower TX power or the hardware will be clamped by the lower TX
power (at least until the next channel change.)
Thanks to Qualcomm Atheros for all the hardware, and Sam Leffler for use
of his spectrum analyser to verify the TX channel power.
ath_tx_rate_fill_rcflags(). Include setting up the TX power cap in the
rate scenario setup code being passed to the HAL.
Other things:
* add a tx power cap field in ath_rc.
* Add a three-stream flag in ath_rc.
* Delete the LDPC flag from ath_rc - it's not a per-rate flag, it's a
global flag for the transmission.
The following change from illumos brought caused DTrace to
pause in an interactive environment:
3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider
This was not detected during testing because it doesn't
affect scripts.
We shouldn't be changing the environment, especially since the
LD_NOLAZYLOAD option doesn't apply to our (GNU) ld.
Unfortunately the change from upstream was made in such a way
that it is very difficult to separate this change from the
others so, at least for now, it's better to just revert
everything.
Reference:
https://www.illumos.org/issues/3026
Reported by: Navdeep Parhar and Mark Johnston
directly referencing ni->ni_txpower.
This provides the hardware with a slightly more accurate idea of
the maximum TX power to be using.
This is part of a series to get per-packet TPC to work (better).
Tested:
* AR5416, hostap mode
the given node.
This takes into account the per-node cap, the ic cap and the
per-channel regulatory caps.
This is designed to replace references to ni_txpower in various net80211
drivers - ni_txpower doesn't necessarily reflect the actual cap for
the given node (eg if the node has the default value of 50dBm (100) and
the administrator has manually configured a lower TX power.)
signal.
- Fix the old ksem implementation for POSIX semaphores to not restart
sem_wait() or sem_timedwait() if interrupted by a signal.
MFC after: 1 week
Pointy-hat to: me, for not realizing snprintf() is available in kernel.
Thanks to: jh, for bringing me the good news of snprintf(), Pawel Worach, for
noting that the panic can be provoked in i386 and not in amd64
The notes format is a header of sizeof(int), which stores the size of
the corresponding data structure to provide some versioning, and data
in the format as it is returned by a related sysctl call.
The userland tools (procstat(1)) will be taught to extract this data,
providing additional info for postmortem analysis.
PR: kern/173723
Suggested by: jhb
Discussed with: jhb, kib
Reviewed by: jhb (initial version), kib
MFC after: 1 month
Only look for FDT partitions if our potential parent is a DISK device.
Excluding direct recursion on the flashmap geoms was insufficient
because it did not prevent the underlying device from being retrieved if
flashmap geoms were further partitioned.
Reviewed by: imp
Sponsored by: DARPA, AFRL
LK_EXCLOTHER. LK_EXCLOTHER is only used to acquire a
usecount on a vnode during NFSv4 recovery from an
expired lease.
Reported and tested by: pho
MFC after: 2 weeks
for the set of IPv6 addresses. Now each attempt goes into IPv6 statistics,
even if given rule did not won. Change this and take into account only
those rules, that won. Also add accounting for cases, when algorithm
fails to select an address.
to unique values.
There's some confusion about what the n32 assembler API really is
(since on page 9 of the spec they say that t0-t3 don't exist, then
turn around on page 22 and say that t4-t7 don't exist), and this
doesn't touch that.
NetBSD's version of this file follows the convention I used here, and
is likely to be correct.
This should fix gdb/ptrace.
is starting. This is in line with practice in OpenSolaris.
Note that this change is only in ULE and not in the 4BSD scheduler.
Once this change settles in (MFC timeout has expired) we'll try it out
on 4BSD as well.
PR: 177706
Submitted by: Tiwei Bie
MFC after: 1 month
well as enabled when necessary. And simplify the checksum routine
itself, adding UDP bit to the test. Thanks to Kevin Lo for pointing
out the problems and code suggestions.
implementation, error on the side of conservatism and only create labels
for GEOMs of classes DISK and MULTIPATH.
Discussed with: trasz
Approved by: silence from freebsd-geom@
The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.
Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.
We store counters in the softc, and once per second push their values
to legacy ifnet counters.
Sponsored by: Nginx, Inc.
the number of interior nodes, we have previously created a level zero
interior node at the root of every non-empty trie, even when that node is
not strictly necessary, i.e., it has only one child. This change is the
second (and final) step in eliminating those unnecessary level zero interior
nodes. Specifically, it updates the deletion and insertion functions so
that they do not require a level zero interior node at the root of the trie.
For a "buildworld" workload, this change results in a 16.8% reduction in the
number of interior nodes allocated and a similar reduction in the average
execution time for lookup functions. For example, the average execution
time for a call to vm_radix_lookup_ge() is reduced by 22.9%.
Reviewed by: attilio, jeff (an earlier version)
Sponsored by: EMC / Isilon Storage Division
and kern_proc_vmmap_out() functions to output process kinfo structures
to sbuf, to make the code reusable.
The functions are going to be used in the coredump routine to store
procstat info in the core program header notes.
Reviewed by: kib
MFC after: 3 weeks
function is provided, which is used either to calculate the note size
or output it to sbuf. On the first pass the notes are registered in a
list and the resulting size is found, on the second pass the list is
traversed outputing notes to sbuf. For the sbuf a drain routine is
provided that writes data to a core file.
The main goal of the change is to make coredump to write notes
directly to the core file, without preliminary preparing them all in a
memory buffer. Storing notes in memory is not a problem for the
current, rather small, set of notes we write to the core, but it may
becomes an issue when we start to store procstat notes.
Reviewed by: jhb (initial version), kib
Discussed with: jhb, kib
MFC after: 3 weeks
device which makes the request for dma tag, instead of some descendant
of the PCI device, by creating a pass-through trampoline for vga_pci
and ata_pci buses.
Sponsored by: The FreeBSD Foundation
Suggested by: jhb
Discussed with: jhb, mav
MFC after: 1 week
Stop abusing xpt_periph in random plases that really have no periph related
to CCB, for example, bus scanning. NULL value is fine in such cases and it
is correctly logged in debug messages as "noperiph". If at some point we
need some real XPT periphs (alike to pmpX now), quite likely they will be
per-bus, and not a single global instance as xpt_periph now.
r248917, r248918, r248978, r249001, r249014, r249030:
Remove multilevel freezing mechanism, implemented to handle specifics of
the ATA/SATA error recovery, when post-reset recovery commands should be
allocated when queues are already full of payload requests. Instead of
removing frozen CCBs with specified range of priorities from the queue
to provide free openings, use simple hack, allowing explicit CCBs over-
allocation for requests with priority higher (numerically lower) then
CAM_PRIORITY_OOB threshold.
Simplify CCB allocation logic by removing SIM-level allocation queue.
After that SIM-level queue manages only CCBs execution, while allocation
logic is localized within each single device.
Suggested by: gibbs
sys/arm and sys/mips), squelching the clang 3.3 warnings about this.
Noticed by: tinderbox and many irate spectators
Submitted by: Luiz Otavio O Souza <loos.br@gmail.com>
PR: kern/177759
MFC after: 3 days
In some cases, kern_envp is set by the architecture code and env_pos does
not contain the length of the static kernel environment. In these cases
r249408 causes the kernel to discard the environment.
Fix this by updating the check for empty static env to *kern_envp != '\0'
Reported by: np@
the number of interior nodes, we always create a level zero interior node at
the root of every non-empty trie, even when that node is not strictly
necessary, i.e., it has only one child. This change is the first step in
eliminating those unnecessary level zero interior nodes. Specifically, it
updates all of the lookup functions so that they do not require a level zero
interior node at the root.
Reviewed by: attilio, jeff (an earlier version)
Sponsored by: EMC / Isilon Storage Division
This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.
Sponsored by: Intel
These were added early on for benchmarking purposes to avoid the mapped I/O
penalties incurred in kern_physio. Now that FreeBSD (including kern_physio)
supports unmapped I/O, the need for these NVMe-specific routines no longer exists.
Sponsored by: Intel
Having MIPS_MAX_TLB_ENTRIES defined to 128 is misleading, since it used
to be 64 in older releases of MIPS architecture (where it could be read
from Config1) and can be much more than 128 for the newer processors.
For now, move the definition to the only file using it (mips/mips/tlb.c)
and define MIPS_MAX_TLB_ENTRIES depending on the MIPS cpu defined. Also
add few checks so that we do not write beyond the end of the tlb_state
array.
This fixes a kernel data corruption seen in Netlogic XLP, which was casued
by tlb_save() writing beyond the end of tlb_state array when breaking into
debugger.
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.
With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.
Reviewed by: ken
Sponsored by: FreeBSD Foundation
In case where there are no static kernel environment entries, the
function init_dynamic_kenv() adds an incorrect entry at position 0 of
the dynamic kernel environment. This in turn causes kenv(1) to print
and empty list even though there are dynamic entries added later.
Fix this by checking env_pos in init_dynamic_kenv() and adding dynamic
entries only if there are static entries.
it is being installed). Improve other error messages while here.
- Select special FPGA specific configuration profile when appropriate.
MFC after: 3 days
mbuf allocation fails, as in a case when ip_output() returns error.
To achieve that, move large block of code that updates tcpcb below
the out: label.
This fixes a panic, that requires the following sequence to happen:
1) The SYN was sent to the network, tp->snd_nxt = iss + 1, tp->snd_una = iss
2) The retransmit timeout happened for the SYN we had sent,
tcp_timer_rexmt() sets tp->snd_nxt = tp->snd_una, and calls tcp_output().
In tcp_output m_get() fails.
3) Later on the SYN|ACK for the SYN sent in step 1) came,
tcp_input sets tp->snd_una += 1, which leads to
tp->snd_una > tp->snd_nxt inconsistency, that later panics in
socket buffer code.
For reference, this bug fixed in DragonflyBSD repo:
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/1ff9b7d322dc5a26f7173aa8c38ecb79da80e419
Reviewed by: andre
Tested by: pho
Sponsored by: Nginx, Inc.
PR: kern/177456
Submitted by: HouYeFei&XiBoLiu <lglion718 163.com>
Merge changes from illumos:
3021 option for time-ordered output from dtrace(1M)
3022 DTrace: keys should not affect the sort order when sorting by value
3023 it should be possible to dereference dynamic variables
3024 D integer narrowing needs some work
3025 register leak in D code generation
3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider
This brings yet another feature implemented in upstream DTrace.
A complete description is available here:
http://dtrace.org/blogs/ahl/2012/07/28/my-new-dtrace-favorite/
This change bumps the DT_VERS_* number to 1.9.1 in
accordance to what is done in illumos.
This change was somewhat complicated because upstream is mixed many
changes in an individual commit and some of the tests don't really
apply to us.
There are also appear to be differences in timestamping with Solaris
so we had to workaround some assertions making sure no regression
happened.
Special thanks to Fabian Keil for changes and testing.
Illumos Revisions: 13758:23432da34147
Reference:
https://www.illumos.org/issues/3021https://www.illumos.org/issues/3022https://www.illumos.org/issues/3023https://www.illumos.org/issues/3024https://www.illumos.org/issues/3025https://www.illumos.org/issues/1694
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 months
Merge bugfixes accepted and integrated by vendor. Underlying problems
have been reported by us and fixed in r240942 and r249196.
Illumos ZFS issues:
3645 dmu_send_impl: possibilty of pool hold leak
3692 Panic on zfs receive of a recursive deduplicated stream
MFC after: 8 days
Some failing disks tend to return vendor-specific ASC/ASCQ codes with
NOT READY sense key. It caused extremely long recovery attempts, repeating
these 120 TURs (it takes at least 1 minute) for every I/O request.
Instead of that use default error handling, doing just few retries.
Reviewed by: ken, gibbs
MFC after: 1 month
volumes behind a ciss(4) controller were being reported with malformeed
names and identifiers.
Repair that reporting by using the CAM values for the three SCSI indents
reported via camcontrol devlist
PR: kern/171650
Reviewed by: scottl
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
somewhere around svn r39402 to r39234.
I don't know of anyone who really wants to test these changes, but they
only remove the deprecated code in question. This shreds the driver down a
bit and *removes* options from the kernel configs.
These don't appear to be referenced in the man page, so no need to check it
there.
PR: kern/44587
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
references to it.
This is the functional equivalent to change r237518, which added this
functionality to the cd(4) and da(4) drivers.
This fix prevents a panic caused by GEOM calling adaopen() while the device
is going away. We now keep the device around until GEOM has finished
cleaning up its state.
ata_da.c: In adaregister(), add a d_gone callback to the GEOM disk
structure registered for the ada driver. Increment the
peripheral reference count for GEOM.
Add a new callback, adadiskgonecb(), that GEOM calls when
it is done with its resources. This callback releases the
reference acquired in adaregister().
Submitted by: Po-Li Soong
Sponsored by: Spectra Logic
MFC After: 5 days
as CTASSERT in MI pcpu.h, stop including all possible mutually exclusive
PCPU_MD_FIELDS fields into LINT kernels, due to brekaing
aforementioned CTASSERT.