During gmirror startup, if component mirrors are found to be dirty as is
typical after a system crash, the mirrors are synchronized to the mirror
with highest priority. However if a gmirror starts without all of its
mirrors present, for example because of some transient delays during
tasting, the remaining mirrors must be synchronized before they may become
active.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
- Create a new file, t4_sched.c, and move all of the code related to
traffic management from t4_main.c and t4_sge.c to this file.
- Track both Channel Rate Limiter (ch_rl) and Class Rate Limiter (cl_rl)
parameters in the PF driver.
- Initialize all the cl_rl limiters with somewhat arbitrary default
rates and provide routines to update them on the fly.
- Provide routines to reserve and release traffic classes.
MFC after: 1 month
Sponsored by: Chelsio Communications
Before this change if_lagg was using nonsleepable rmlocks to protect its
internal state. This patch introduces another sx lock to protect code
paths that require sleeping, while still uses old rmlock to protect hot
nonsleepable data paths.
This change allows to remove taskqueue decoupling used before to change
interface addresses without holding the lock. Instead it uses sx lock to
protect direct if_ioctl() calls.
As another bonus, the new code synchronizes enabled capabilities of member
interfaces, and allows to control them with ifconfig laggX, that was
impossible before. This part should fix interoperation with if_bridge,
that may need to disable some capabilities, such as TXCSUM or LRO, to allow
bridging with noncapable interfaces.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D10514
This restores 32bit-sized accesses to vmcnt sysctls, making old
binaries like top(1), systat(8) and reboot(8) mostly functional on
newer kernel.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
At least with Tx FIFO enabled it shows me ~10% reduction of verbose boot
time with serial console at 115200 baud.
Reviewed by: marcel
MFC after: 2 weeks
The current reboot command in efi/loader/main.c is passing extra data with
ResetSystem, however, UEFI spec 2.6, page 265 does state:
"ResetData is only valid if ResetStatus is something other than EFI_SUCCESS
unless the ResetType is EfiResetPlatformSpecific where a minimum amount of
ResetData is always required."
Therefore we should use DataSize 0 and ResetData NULL - those are two last
arguments for the call.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D10562
cannot access the GLOBAL2 register directly.
Despite the comment in code (which was misleading), the indirect access is
only used to read the switch CONFIG data from the scrap register and not
for the GLOBAL2 access.
Use the dsa data to define when the switch is in the Multi Chip Addressing
Mode (a even address different than zero).
While here fix a typo.
Sponsored by: Rubicon Communications, LLC (Netgate)
This patch allows the MTU stored in the hostcache to be used as an
initial value for SCTP paths. When an ICMP PTB message is received,
store the MTU in the hostcache.
MFC after: 1 week
determining the softc of the bridge in psycho_route_interrupt(). [1]
While at it, update the corresponding comment that the code in
question is also necessary for U30s in addition to E450s (a fact
that has been known for ages).
PR: 218478
Submitted by: Yoshihiko Iwama
The code specified the length of a layout as INT64_MAX instead of
UINT64_MAX. This could result in getting a layout for less than the
full file for extremely large files. Although having little practical
effect, this patch corrects this in the code.
Detected during recent testing of the pNFS server.
MFC after: 2 weeks
In exceptional circumstances, an MCA exception will trigger when the
freelist is exhausted. In such a case, no error will be logged on the list
and 'mca_count' will not be incremented.
Prior to this patch, all CPUs that received the exception would spin
forever.
With this change, the CPU that detects the error but finds the freelist
empty will proceed to panic the machine, ending the deadlock.
A follow-up to r260457.
Reported by: Ryan Libby <rlibby at gmail.com>
Reviewed by: jhb@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10536
It improves interoperability with if_bridge, which may need to disable
some capabilities not supported by other members. IMHO there is still
open question about LRO capability, which may need to be disabled on
physical interface.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
7740 fix for 6513 only works in hole punching case, not truncation
illumos/illumos-gate@7de35a3ed07de35a3ed0https://www.illumos.org/issues/7740
The problem is that dbuf_findbp will return ENOENT if the block it's
trying to find is beyond the end of the file. If that happens, we assume
there is no birth time, and so we lose that information when we write
out new blkptrs. We should teach dbuf_findbp to look for things that are
beyond the current end, but not beyond the absolute end of the file.
To verify, create a large file, truncate it to a short length, and then
write beyond the end. Check with zdb to make sure that there are no
holes with birth time zero (will appear as gaps).
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Paul Dagnelie <pcd@delphix.com>
7743 per-vdev-zaps have no initialize path on upgrade
illumos/illumos-gate@555da5111b555da5111bhttps://www.illumos.org/issues/7743
When loading a pool that had been created before the existance of
per-vdev zaps, on a system that knows about per-vdev zaps, the
per-vdev zaps will not be allocated and initialized.
This appears to be because the logic that would have done so, in
spa_sync_config_object(), is not reached under normal operation. It is
only reached if spa_config_dirty_list is non-empty.
The fix is to add another `AVZ_ACTION_` enum that will allow this code
to be reached when we detect that we're loading an old pool, even when
there are no dirty configs.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
7613 ms_freetree[4] is only used in syncing context
illumos/illumos-gate@5f145778015f14577801https://www.illumos.org/issues/7613
metaslab_t:ms_freetree[TXG_SIZE] is only used in syncing context. We should
replace it with two trees: the freeing tree (ranges that we are freeing this
syncing txg) and the freed tree (ranges which have been freed this txg).
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Some notes:
- Only i386 and amd64 layouts are checked, other Tier-1 (or close to
it) architectures would benefit from the same check.
- Unconditional enabling of the asserts depend on the stability of locks
memory layout. If locks are optimized to avoid bloat when some debugging
or profiling features turned off, it makes sense to only assert layout
for production configs.
Reviewed by: badger, emaste, jhb, vangyzen
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D10526
7586 remove #ifdef __lint hack from dmu.h
illumos/illumos-gate@4ba5b961634ba5b96163https://www.illumos.org/issues/7586
The #ifdef __lint in dmu.h is ugly, and it would be nice not to duplicate it if
we add other inline functions into header files in ZFS, especially since it is
difficult to make any other solution work across all compilation targets. We
should switch to disabling the lint flags that are failing instead.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
available and if that is true make use of them.
Thank you very much to Andrew Turner for providing help and review the patch!
Reviewed by: andrew
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10499
7580 ztest failure in dbuf_read_impl
illumos/illumos-gate@1a01181fdc1a01181fdchttps://www.illumos.org/issues/7580
We need to prevent any reader whenever we're about the zero out all the
blkptrs. To do this we need to grab the dn_struct_rwlock as writer in
dbuf_write_children_ready and free_children just prior to calling bzero.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
- Rename the default implementation of 'pcib_request_feature' and add
a pcib_request_feature() wrapper function (as is often done for
new-bus APIs implemented via kobj) that accepts a single function.
Previously the call to pcib_request_feature() ended up invoking the
method on the great-great-grandparent of the bridge device instead
of the grandparent. For a bridge that was a direct child of pci0 on
x86 this resulted in the method skipping over the Host-PCI bridge
driver and being invoked against nexus0
- When invoking _OSC from a Host-PCI bridge driver, invoke
device_get_softc() against the Host-PCI bridge device instead of the
child bridge that is requesting HotPlug. Using the wrong softc data
resulted in garbage being passed for the ACPI handle causing the
_OSC call to fail.
- While here, perform some other cleanups to _OSC handling in the ACPI
Host-PCI bridge driver:
- Don't invoke _OSC when requesting a control that has already been
granted by the firmware.
- Don't set the first word of the capability array before invoking
_OSC. This word is always set explicitly by acpi_EvaluateOSC()
since it is UUID-independent.
- Don't modify the set of granted controls unless _OSC doesn't exist
(which is treated as always successful), or the _OSC method
doesn't fail.
- Don't require an _OSC status of 0 for success. _OSC always
returns the updated control mask even if it returns a non-zero
status in the first word.
- Whine if _OSC ever tries to revoke a previously-granted control.
(It is not supposed to do that.)
- While here, add constants for the _OSC status word in acpivar.h
(though currently unused).
Reported by: adrian
Reviewed by: imp
MFC after: 1 week
Tested on: Lenovo x220
Differential Revision: https://reviews.freebsd.org/D10520
7606 dmu_objset_find_dp() takes a long time while importing pool
illumos/illumos-gate@7588687e6b7588687e6bhttps://www.illumos.org/issues/7606
When importing a pool with a large number of filesystems within the same
parent filesystem, we see that dmu_objset_find_dp() takes a long time.
It is called from 3 places: spa_check_logs(), spa_ld_claim_log_blocks(),
and spa_load_verify().
There are several ways to improve performance here:
1. We don't really need to do spa_check_logs() or
spa_ld_claim_log_blocks() if the pool was closed cleanly.
2. spa_load_verify() uses dmu_objset_find_dp() to check that no
datasets have too long of names.
3. dmu_objset_find_dp() is slow because it's doing
zap_value_search() (which is O(N sibling datasets)) to determine
the name of each dsl_dir when it's opened. In this case we
actually know the name when we are opening it, so we can provide
it and avoid the lookup.
This change implements fix#3 from the above list; i.e. make
dmu_objset_find_dp() provide the name of the dataset so that we don't
have to search for it.
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <prashksp@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Matthew Ahrens <mahrens@delphix.com>
linux_page_address() function in the LinuxKPI. This solves an issue
where the return value from linux_page_address() is passed to
kmem_free().
MFC after: 1 week
Sponsored by: Mellanox Technologies
The nfsv4_seqsession() call returns NFSERR_REPLYFROMCACHE when it has a
reply in the session, due to a requestor retry. The code erroneously
assumed a return of 0 for this case. This patch fixes this and adds
a KASSERT(). This would be an extremely rare occurrence. It was found
during code inspection during the pNFS server development.
MFC after: 2 weeks
busdma know, so that on architectures where dma isn't always coherent, we
know we don't have to write-back/invalidates cachelines on DMA operations.
Reviewed by: andrew, mav
validation of SEG.ACK as the first step. If the ACK is not acceptable,
a RST segment should be sent and the segment should be dropped.
Up to now, the segment was partially processed.
This patch moves the check for the SEG.ACK validation up to the front
as required.
Reviewed by: hiren, gnn
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D10424
PCB SP cache acquires extra reference, when SP is stored in the cache.
Release this reference when PCB is destroyed in ipsec_delete_pcbpolicy().
In ipsec_copy_pcbpolicy() release reference to SP in case if sp_in or
sp_out are not NULL.
Reported by: Slawa Olhovchenkov <slw at zxy spb ru>
MFC after: 1 week
before we increase irq again, or we'd end up choosing an irq, and then
really using the next one, even if it's not available.
Also in the inner loop, correct the end check so that we check every irq,
even the last one.
This makes the msk(4) adapter able to use MSI on Softiron Overdrive 1000.
This check has been redundant since it was introduced in r162554.
Reviewed by: emaste, glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10322
7252 7628 compressed zfs send / receive
illumos/illumos-gate@5602294fda5602294fdahttps://www.illumos.org/issues/7252
This feature includes code to allow a system with compressed ARC enabled to
send data in its compressed form straight out of the ARC, and receive data in
its compressed form directly into the ARC.
https://www.illumos.org/issues/7628
We should have longer, more readable versions of the ZFS send / recv options.
7628 create long versions of ZFS send / receive options
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
wait for the next enqueue from the driver.
Reviewed by: gnn@, hselasky@, gallatin@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D10432
patm(4) devices.
Maintaining an address family and framework has real costs when we make
infrastructure improvements. In the case of NATM we support no devices
manufactured in the last 20 years and some will not even work in modern
motherboards (some newer devices that patm(4) could be updated to
support apparently exist, but we do not currently have support).
With this change, support remains for some netgraph modules that don't
require NATM support code. It is unclear if all these should remain,
though ng_atmllc certainly stands alone.
Note well: FreeBSD 11 supports NATM and will continue to do so until at
least September 30, 2021. Improvements to the code in FreeBSD 11 are
certainly welcome.
Reviewed by: philip
Approved by: harti
The NFSv4 RFCs give a server the option of allowing the use of an open
stateid for write access to be used for a Read operation.
This patch enables this by default and adds a sysctl to disable it,
for anyone who does not want this capability.
Allowing this is particularily useful for a pNFS Data Server (DS), since
they are not permitted to allow the use of special stateids.
Discovered during recent testing of the pNFS server under development.
MFC after: 2 weeks
BHND_EROM_DUMP() method.
Dump the EROM tables to the coneole on mips/broadcom devices if bootverbose
is enabled; this functionality is primarily useful when debugging SoC EROM
parsing and device matching issues during early boot.
Reviewed by: mizhka
Approved by: adrian (mentor)
Sponsored by: Plausible Labs
Differential Revision: https://reviews.freebsd.org/D10122
kernel calls this directly so the event handler is not called, meaning
the computer fails to reboot.
Tested by: cognet
MFC after: 1 week
Sponsored by: DARPA, AFRL
Hyper-V hot channel effect:
Operation latency on hot channel is only _half_ of the operation
latency on cold channels.
This commit takes the advantage of the above Hyper-V host channel
effect, and can reduce more than 75% latency and more than 50%
latency stdev, i.e. lower and more stable/predictable latency,
for various types of web server workloads.
MFC after: 3 days
Sponsored by: Microsoft
An NFSv4 server has the option of allowing a Read to be done using a Write
Open. If this is not allowed, the server will return NFSERR_OPENMODE.
This patch attempts the read with a write open and then disables this
if the server replies NFSERR_OPENMODE.
This change will avoid some uses of the special stateids. This will be
useful for pNFS/DS Reads, since they cannot use special stateids.
It will also be useful for any NFSv4 server that does not support reading
via the special stateids. It has been tested against both types of NFSv4 server.
MFC after: 2 weeks
The NFSv4.1/pNFS client does not use/need a backchannel for the Data Server (DS)
sessions, so the flag should only be set for MetaData Server (MDS) sessions.
This patch should have been a part of r317275.
MFC after: 2 weeks
colors.
Colors are still hard-coded as 15 (normally lightwhite) for the interior
and 0 (normally black) for the border, but these are now values used in
2 expressions instead of built in to the algorithm. The algorithm used
a fancy and/or method, but this gives no control over the colors except
and'ing all color planes off gives black and or'ing all color planes on
gives lightwhite. Just draw the border and interior in separate colors
using the same method as for characters, including its complications to
optimize for VGA adaptors. Optimization is not really needed here, but
for the VGA case it avoids being slower than the and/or method. The
optimization is worth about 30%.
The "return layout on close" case in the pNFS client was badly broken.
Fortunately, extant pNFS servers that I have tested against do not
do this. This patch fixes it. It also changes the way the layout stateid.seqid
is set for LayoutReturn. I think this change is correct w.r.t. the RFC,
but I am not 100% sure.
This was found during recent testing of the pNFS server under development.
MFC after: 2 weeks
The NFSv4.1/pNFS client wasn't doing a newnfs_disconnect() call for the
connection to the Data Server (DS) under some circumstances. The main
effect of this was a leak of malloc'd structures in the krpc. This patch
adds the newnfs_disconnect() calls to fix this.
Detected during recent testing against the pNFS server under development.
MFC after: 2 weeks
Rename the mtu variable in ip6_fragment(), because mtu is misleading. The
variable actually holds the fragment length.
No functional change.
Suggested by: ae
For now there isn't any per-VAP WME state. The eventual aim is to migrate
the driver direct use of WME parameters over to use these methods as
appropriate (global for most devices, per-VAP for firmware NICs that support
it) in preparation for actual per-VAP WME (and other thing) state change
support.
In my eagerness to eliminate a branch which is taken once per 2^38
bytes of keystream, I forgot that the state words are in host order.
Thus, the counter increment code worked fine on little-endian
machines, but not on big-endian ones. Switch to a simpler (branchful)
solution.
The NFSv4 Setattr operation always has reply data even when it fails,
so don't set the ND_NOMOREDATA for it. This would only affect unusual
cases where Setattr fails and the RPC code wants to parse the rest of
the compound. Detected during recent development related to the pNFS server.
MFC after: 2 weeks
An NFSv4.1 client connection to a Data Server (DS) should not have a
backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel
for this case.
Found during recent testing with the pNFS server under development.
MFC after: 2 weeks
Implement FUSE open flag FOPEN_KEEP_CACHE. Without this flag, cached file
contents should be invalidated on open. Apparently, fusefs-encfs relies
upon this behavior.
PR: 218636
Submitted by: Ben RUBSON <ben.rubson at gmail.com>
The nfscl_mtofh() function didn't check for failed operations and, as such,
would have returned EBADRPC for these cases, due to parsing failure.
This patch adds checks, so that it returns with ND_NOMOREDATA set.
This is needed for future use in the pNFS server and acts as a safety
belt in the meantime.
MFC after: 2 weeks
The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. The default
values of 0 meant that setting a group to "wheel" would fail even when
done by root.
It also adds a definition of GID_NOGROUP to sys/conf.h.
Discussed on: freebsd-current@
MFC after: 2 weeks
7386 zfs get does not work properly with bookmarks
illumos/illumos-gate@edb901aab9edb901aab9https://www.illumos.org/issues/7386
The zfs get command does not work with the bookmark parameter while it works
properly with both filesystem and snapshot:
# zfs get -t all -r creation rpool/test
NAME PROPERTY VALUE SOURCE
rpool/test creation Fri Sep 16 15:00 2016 -
rpool/test@snap creation Fri Sep 16 15:00 2016 -
rpool/test#bkmark creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test@snap
NAME PROPERTY VALUE SOURCE
rpool/test@snap creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test#bkmark
cannot open 'rpool/test#bkmark': invalid dataset name
#
The zfs get command should be modified to work properly with bookmarks too.
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
remove the former.
All other EGA/VGA methods were already shared, with VGA-only features
mostly not used and no decisions in inner loops to optimize fof VGA,
but this method was split up because it is the only important one and
using VGA methods if possible is about twice as fast. The speed is
mostly not from splitting to reduce branches but from doing half as
many bus accesses, so make this easier to maintain by not splitting.
There is now 1 extra branch in an inner loop where it costs less than
1% of the bus access overhead on Haswell even if the compiler schedules
it poorly.
Before this change it was impossible to set number of PKCS#5v2 iterations,
required to set passphrase, if it has two keys and never had any passphrase.
Due to present metadata format limitations there are still cases when number
of iterations can not be changed, but now it works in cases when it can.
PR: 218512
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D10338
vga planar method (for testing that was supposed to be local that the
former still works). The ega method works on vga but is about twice
as slow. The vga method doesn't work on ega.
Optimize the main vga planar method a little. For changing the
background color (which was otherwise optimized better than most
things), don't switch the write mode from 3 to 0 just to select
the pixel mask of 0xff obscurely by writing 0. Just write 0xff
directly.