and IPv6. With IPv6 we may call if_addmulti() in context of processing
of an incoming packet. Usually this is interrupt context. While most
of the NIC drivers are able to reprogram multicast filters without
sleeping, some of them can't. An example is e1000 family of drivers.
With iflib conversion the problem was somewhat hidden. Iflib processes
packets in private taskqueue, so going to sleep doesn't trigger an
assertion. However, the sleep would block operation of the driver and
following incoming packets would fill the ring and eventually would
start being dropped. Enabling epoch for the full time of a packet
processing again started to trigger assertions for e1000.
Fix this problem once and for all using a general taskqueue to call
if_ioctl() method in all cases when if_addmulti() is called in a
non sleeping context. Note that nobody cares about returned value.
Reviewed by: hselasky, kib
Differential Revision: https://reviews.freebsd.org/D22154
Epoch itself doesn't rely on the counter and it is provided
merely for sleeping subsystems to check it.
- In functions that sleep use THREAD_CAN_SLEEP() to assert
correctness. With EPOCH_TRACE compiled print epoch info.
- _sleep() was a wrong place to put the assertion for epoch,
right place is sleepq_add(), as there ways to call the
latter bypassing _sleep().
- Do not increase td_no_sleeping in non-preemptible epochs.
The critical section would trigger all possible safeguards,
no sleeping counter is extraneous.
Reviewed by: kib
After r353292, netmap generic adapter on if_vlan interfaces panics on
asserting the NET_EPOCH. In more detail, this happens when
nm_os_generic_xmit_frame() is called, that is in the generic txsync
routine.
Fix the issue by entering the NET_EPOCH during the generic txsync.
We amortize the cost of entering/exiting over a whole batch of
transmissions.
PR: 241489
Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it
to take input parameters that alter the way looping through the list of
snapshots is performed. The idea here is to restrict functions that
throw away some of the snapshots returned by the ioctl to a range of
snapshots that these functions actually use. This improves efficiency
and execution speed for some rollback and send operations.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes#8077zfsonlinux/zfs@4c0883fb4a
MFC after: 2 weeks
The fill_elf_hwcap() function expects to find only cpu nodes under the
/cpus entry of the device tree. Newer versions of QEMU insert a cpu-map
node which describes the CPU topology, breaking this function. To fix
this, simply skip any non-cpu entries.
Reviewed by: markj, kp, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22151
If neither FULL_BUF_TRACKING nor BUF_TRACKING are defined, then the body of
buf_track() becomes empty. Mark the arguments with "__unused" so the
compiler doesn't complain about unused arguments in that case.
Reported by: Bruce Leverett (Panasas)
Reviewed by: cem (on IRC)
MFC after: 1 month
Sponsored by: Panasas
Use a separate make variable to specify the linker script so that it is
only applied at link time and not during intermediate generation of .fwo
files.
This ensures that the .text padding inserted by the amd64 linker script
is applied to the stub module load handlers embedded in firmware
modules.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22125
This saves 320 bytes of the precious stack space.
The only negative aspect of the change I can think of is that the
struct thread increased by 320 bytes obviously, and that 320 bytes are
not swapped out anymore. I believe the freed stack space is much more
important than that. Also, current struct thread size is 1392 bytes
on amd64, so UMA will allocate two thread structures per (4KB) slab,
which leaves a space for pcb without increasing zone memory use.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D22138
The RK805 regs array was being allocated before it's required size was
known, causing the driver to use memory it didn't own. That memory
was subsequently allocated and used elsewhere causing later fatal data
aborts in rk805_map().
Whilst I'm here, add a sanity check to catch unsupported PMICs (this
shouldn't ever get hit because the probe should have failed).
Reviewed by: manu
MFC after: 1 week
Sponsored by: Google
In theory the eventhandler invoke should be in the same VNET as
the the current interface. We however cannot guarantee that for
all cases in the future.
So before checking if the fragmentation handling for this VNET
is active, switch the VNET to the VNET of the interface to always
get the one we want.
Reviewed by: hselasky
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22153
This is a driver for the USB3 PHY present in the RK3399.
While the phy support DP (Display Port) the driver doesn't has we have
no driver to test this with for now.
All the lane and pll configuration is just magic values from rockchip.
While the manual have some info on those registers it's really hard to
understand how to calculate those values (if there is a way).
MFC after: 1 month
When allocating the IPv6 fragement packet queue entry we do checks
against counters and if we pass we increment one of the counters
to claim the spot. Right after that we have two cases (malloc and MAC)
which can both fail in which case we free the entry but never released
our claim on the counter. In theory this can lead to not accepting new
fragments after a long time, especially if it would be MAC "refusing"
them.
Rather than immediately subtracting the value in the error case, only
increment it after these two cases so we can no longer leak it.
MFC after: 3 weeks
Sponsored by: Netflix
Previously the code used sbttous() before microseconds comparison in one
place, sbttons() and nanoseconds in another, division by SBT_1US and
microseconds in yet another.
Now the code consistently uses multiplication by SBT_1US to convert
microseconds to sbintime_t before comparing them with periods between
calls to sbinuptime(). This is fast, this is precise enough (below
0.03%) and the periods defined by the protocol cannot overflow.
Reviewed by: imp (D22108)
MFC after: 2 weeks
The lock is used only for start / stop signaling.
It is used only for 'flags' field and the related condition variable.
This change is a follow-up to r354067, it was suggested by Warner in
D22107.
Suggested by: imp
MFC after: 1 week
This is similar to what is done around other calls that lead to
own_command_wait() that can sleep.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22107
Some controllers cannot preset future output value while the pin is in
input mode. This adds a fallback for those controllers. The new code
assumes that a controller reports an error in that case.
For example, all hardware supported by nctgpio behaves in that way.
This is a temporary measure. In the future we will use
GPIO_PIN_PRESET_LOW / GPIO_PIN_PRESET_HIGH to preset the output either
in hardware, if supported, or in software (e.g., in
gpiobus_pin_setflags).
While here, I extracted common functionality of gpioiic_set{sda,scl} and
gpioiic_get{sda,scl} to gpioiic_setpin and gpioiic_getpin respectively.
MFC after: 2 weeks
Clang trunk recently gained this new warning, and complains about the
sizeof(trim->data) / sizeof(struct nvme_dsm_range) expression, since
the left hand side's element type (char) does not match the right hand
side's type. The byte buffer is unnecessary so we can remove it to clean
up the code and fix the warning at the same time.
No functional change.
Submitted by: James Clarke <jrtc27@jrtc27.com>
Reviewed by: imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D21912
When we receive the packet with the first fragmented part (fragoff=0)
we remember the length of the unfragmentable part and the next header
(and should probably also remember ECN) as meta-data on the reassembly
queue.
Someone replying this packet so far could change these 2 (3) values.
While changing the next header seems more severe, for a full size
fragmented UDP packet, for example, adding an extension header to the
unfragmentable part would go unnoticed (as the framented part would be
considered an exact duplicate) but make reassembly fail.
So do not allow updating the meta-data after we have seen the first
fragmented part anymore.
The frag6_20 test case is added which failed before triggering an
ICMPv6 "param prob" due to the check for each queued fragment for
a max-size violation if a fragoff=0 packet was received.
MFC after: 3 weeks
Sponsored by: Netflix
stat() of one of the remaining names of the file does not show an
updated ctime (inode modification time) until several seconds after
the unlink() completes. The problem only occurs when the filesystem
is running with soft updates enabled. When running with soft updates,
the ctime is not updated until the soft updates background process
has settled all the needed I/O operations.
This commit causes the ctime to be updated immediately during the
unlink(). A side effect of this change is that the ctime is updated
again when soft updates has finished its processing because that
is the time that is correct from the perspective of programs that
look at the disk (like dump). This change does not cause any extra
I/O to be done, it just ensures that stat() updates the ctime before
handing it back.
PR: 241373
Reported by: Alan Somers
Tested by: Alan Somers
MFC after: 3 days
Sponsored by: Netflix
While the comment was updated in r350746, the code was not.
RFC8200 says that unless fragment overlaps are exact (same fragment
twice) not only the current fragment but the entire reassembly queue
for this packet must be silently discarded, which we now do if
fragment offset and fragment length do not match.
Obtained from: jtl
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16850
Similar to the system global counter also export the per-VNET counter
"frag6_nfragpackets" detailing the current number of fragment packets
in this VNET's reassembly queues.
The read-only counter is helpful for in-VNET statistical monitoring and
for test-cases.
MFC after: 3 weeks
Sponsored by: Netflix
In case the first fragmented part (off=0) arrives we check for the
maximum packet size for each fragmented part we already queued with the
addition of the unfragmentable part from the first one.
For one we do not have to enter the loop at all if this is the first
fragmented part to arrive, and we can skip the check.
Should we encounter an error case we send an ICMPv6 message for any
fragment exceeding the maximum length limit. While dequeueing the
original packet and freeing it, statistics were not updated and leaked
both the reassembly queue count for the fragment and the global
fragment count. Found by code inspection and confirmed by tightening
test cases checking more statistical and system counters.
While here properly wrap a line.
MFC after: 3 weeks
Sponsored by: Netflix
When we are checking for the maximum reassembled packet size of the
fragmentable part and run into the error case (packet too big),
we are leaking the packet queue enntry if this was a first fragment
to arrive.
Properly cleanup, removing the queue entry from the bucket, decrementing
counters, and freeing the memory.
MFC after: 3 weeks
Sponsored by: Netflix
have been unlinked, but are still referenced by open file descriptors.
These inodes cannot be freed until the final file descriptor reference
has been closed. If the system crashes while they are still being
referenced, these inodes and their referenced blocks need to be
freed by fsck. By having them on a linked list with the head pointer
in the superblock, fsck can quickly find and process them rather
than having to check every inode in the filesystem to see if it is
unreferenced.
When updating the head pointer of this list of unlinked inodes in
the superblock, the superblock check-hash was not getting updated.
If the system crashed with the incorrect superblock check-hash, the
superblock would appear to be corrupted. This patch ensures that
the superblock check-hash is updated when updating the head pointer
of the unlinked inodes list.
There is no need to MFC as superblock check hashes first appeared in
13.0.
Tested by: Peter Holm
Sponsored by: Netflix
When it is set to 0 (the default), a heavy Netflix-style web workload
suffers from heavy lock contention on the vm page free queue called from
vm_page_zone_{import,release}() as the buckets are frequently drained.
When setting the maxcache, this contention goes away.
We should eventually try to autotune this, as well as make this
zone eligable for uma_reclaim().
Reviewed by: alc, markj
Not Objected to by: jeff
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22112
This permits constructing the entire TLS header in ktls_frame() rather
than ktls_seq(). This also matches the approach used by OpenSSL which
uses an incrementing nonce as the explicit IV rather than the sequence
number.
Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22117
Per sepcification the upper layer header needs to be within the first
fragment. The check was not done so far and there is an open review for
related work, so just leave a note as to where to put it.
Move the extraction of frag offset up to this as it is needed to determine
whether this is a first fragment or not.
MFC after: 3 weeks
Sponsored by: Netflix