When iicbus is attached as child of Designware I2C controller it scans all
ACPI nodes for "I2C Serial Bus Connection Resource Descriptor" described
in section 19.6.57 of ACPI specs.
If such a descriptor is found, I2C child is added to iicbus, it's I2C
address, IRQ resource and ACPI handle are added to ivars. Existing
ACPI bus-hosted child is deleted afterwards.
The driver also installs so called "I2C address space handler" which is
disabled by default as nontested.
Set hw.iicbus.enable_acpi_space_handler loader tunable to 1 to enable it.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22901
Newbus device reference attached to ACPI handle is not cleared when newbus
device is deleted with devctl(8) delete command. Fix that with calling of
AcpiDetachData() from "child_deleted" bus method like acpi_pci driver does.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22902
Without this change, if an AIF interrupt comes at the same time a SYNC
command is finished, the SYNC interrupt will be lost. This happens because
all interrupt bits (bellbits) are cleared, but only one of them is handled.
Debugging shows that, (at least) when !sc->msi_enabled and (sc->flags &
AAC_FLAGS_SYNC_MODE) is true (sync mode), both bits may be set at the same
time.
PR: 237463
Reviewed by: scottl
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23859
When I did the use_numa support, I missed the fact that there is
a separate hash function for send tag nic selection. So when
use_numa is enabled, ktls offload does not work properly, as it
does not reliably allocate a send tag on the proper egress nic
since different egress nics are selected for send-tag allocation
and packet transmit. To fix this, this change:
- refectors lacp_select_tx_port_by_hash() and
lacp_select_tx_port() to make lacp_select_tx_port_by_hash()
always called by lacp_select_tx_port()
- pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash()
- adds a numa_domain field to if_snd_tag_alloc_params
- plumbs the numa domain into places where we allocate send tags
In testing with NIC TLS setup on a NUMA machine, I see thousands
of output errors before the change when enabling
kern.ipc.tls.ifnet.permitted=1. After the change, I see no
errors, and I see the NIC sysctl counters showing active TLS
offload sessions.
Reviewed by: rrs, hselasky, jhb
Sponsored by: Netflix
This silences an "unused label" warning as well as fixes the attach fail
path that wasn't releasing resources.
Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D24004
Before skipping the current cpu when trying to find the ones that
have the same opp, record that this one have this opp.
Reported by: mmel
MFC after: 2 weeks
X-MFC-With: r358555
We were reusing a structure for multiple operations, but failing to
reinitialize one member. The result is that a server that cares about FUSE
file handle IDs would see one correct FUSE_FSYNC operation, and one with the
FHID unset.
PR: 244431
Reported by: Agata <chogata@gmail.com>
MFC after: 2 weeks
because it clobbers the super speed link status when a device is in super
speed mode. Currently the power bit is not needed for anything in the USB
hub driver.
This fixes USB warm reset for super speed devices.
Tested by: Shichun.Ma@dell.com
MFC after: 3 days
Sponsored by: Mellanox Technologies
This has a side effect of eliminating filedesc slock/sunlock during path
lookup, which in turn removes contention vs concurrent modifications to the fd
table.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D23889
This makes it easier to write libkvm programs that access UMA data
structures.
- Remove a couple of unused slab functions and make others local to
uma_core.c. Similarly move SLAB_BITSETS, which affects the layout of
slab structures, to uma_core.c.
- Stop defining the slab structures under _KERNEL. There's no real
reason they can't be visible to userspace like the rest of UMA's
structures are.
- Group KEG_ASSERT_COLD with other keg macros.
- Convert an assertion about MAXMEMDOM to use _Static_assert.
No functional change intended.
Discussed with: jeff
Reviewed by: rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23980
The intent is to provide a header that can be included by other headers
without introducing too much pollution. smr.h depends on various
headers and will likely grow over time, but is less likely to be
required by system headers.
Rename SMR_TYPE_DECLARE() to SMR_POINTER():
- One might use SMR to protect more than just pointers; it
could be used for resizeable arrays, for example, so TYPE seems too
generic.
- It is useful to be able to define anonymous SMR-protected pointer
types and the _DECLARE suffix makes that look wrong.
Reviewed by: jeff, mjg, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23988
Replace hardcoded sizes by nitems and sizeof
Replace CTLFLAG_NEEDGIANT with CTLFLAG_MPSAFE, I run this driver since a few
years with CTLFLAG_MPSAFE w/o issues.
Replace hardcoded sizes by nitems and sizeof
Replace CTLFLAG_NEEDGIANT with CTLFLAG_MPSAFE, I run this driver since a few
years with CTLFLAG_MPSAFE w/o issues.
Add a HACK to handle a special case for a sensor location.
Since switch to the lockless grab, shared busy for ahead/behind pages
allows other threads to validate and map the pages readonly.
Reviewed by: avg, jeff, markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23986
32-bit Book-E doesn't set UMA_MD_SMALL_ALLOC, and 32-bit OEA platforms
have a 32-bit vm_paddr_t. Moreover, this code was wrong in that it
leaked the page if the check failed.
Reviewed by: jhibbits
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23991
The aim is to reduce the boilerplate needed today to define and
initialize global counters. Also add SI_SUB_COUNTER to the sysinit
ordering.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23977
file systems to safely access their disk devices, and adapt FFS to use it.
Also add a new BO_NOBUFS flag to allow enforcing that file systems using
mntfs vnodes do not accidentally use the original devfs vnode to create buffers.
Reviewed by: kib, mckusick
Approved by: imp (mentor)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D23787
- Add more registers needed by bhyve [1]
- Move EL2 registers from armreg.h to hypervisor.h
- Add the register name to hypervisor.h
Obtained from: https://github.com/FreeBSD-UPB/freebsd [1]
This fixes some errors on PPC64, during attach and when trying to assign an IP
to an interface. With this change, basic operation of X710 NICs is now
possible.
This also fixes builds with IXL_DEBUG enabled
Reviewed by: erj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23975
As written, the condition of (cdb[0] != 0x28 || cdb[0] != 0x2A) will always
be true, since if it's one, it's obviously not the other. Reading the code,
the intent appears to be that it should only perform the operation if it's
neither, otherwise the conditional can be elided.
Found by clang 10.
Port aacraid driver to big-endian (BE) hosts.
The immediate goal of this change is to make it possible to use the
aacraid driver on PowerPC64 machines that have Adaptec Series 8 SAS
controllers.
Adapters supported by this driver expect FIB contents in little-endian
(LE) byte order. All FIBs have a fixed header part as well as a data
part that depends on the command being issued to the controller.
In this way, on BE hosts, the FIB header and all FIB data structures
used in aacraid.c and aacraid_cam.c need to be converted to LE before
being sent to the adapter and converted to BE when coming from it.
The functions to convert each struct are on aacraid_endian.c.
For little-endian (LE) targets, they are macros that expand
to nothing.
In some cases, when only a few fields of a large structure are used,
the fields are converted inline, by the code using them.
PR: 237463
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23887
Ucred is passed to bread(9) so that non-local filesystems use proper
credentials. But, since clean buffer might be cached unless
buf_pager_relbuf is not enabled, it makes credentials to have extra
reference until buffer is reclaimed. Ucred reference would prevent
jail from destroying if creds are jailed.
Dereferencing the read credentials on the valid buffer avoid that, and
should be fine because the buffer is valid and does not need re-read.
PR: 238032
Reported by: bz
Reproduced and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23775
Fix panic "Freeing UMA block at 0xn with no associated page".
Also replaces pmap_remove call by pmap_kremove, for symmetry.
Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D23931
of xpt_done(). Add the missing XPT_ASYNC case to xpt_action_default. xpt_async
wants to use the side-effect of the xpt_done() routine to queue this to the
camisr thread so it can be done in that context. However, this breaks the
symmetry that you create a ccb and call xpt_action() for it to be
dispatched. Restore that symmetry by having it go through that path. As far as I
can tell, this is the only CCB that we create and call xpt_done() on directly.
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal. We take advantage of the mandatory
zeroing of fields not listed in the initializer.
Remove kern_mmap_fpcheck() and use kern_mmap_req().
The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations. In CheriBSD we have already added two more arguments.
Reviewed by: kib
Discussed with: kevans
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D23164
Each segment can be up to 4096 bytes in chain structure according to the
RK3399 TRM Part 2.
Set the buffers in full ring where the last one point to the first one.
Correctly reports the MMC_IVAR_MAX_DATA.
Use CACHE_LINE_SIZE for bus_dma alignment.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D23894
This will be used to tag binaries that require W+X mappings, in advance
of the ability to prevent W^X in mmap/mprotect.
There is still some discussion about the flag's name, but the ABI won't
change even if the name does (as kib pointed out in the review).
Reviewed by: csjp, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23909
Check copyin's error code (differ adding copyout checks at this time).
Don't directly access user memory in the switch statement.
Since bnxt_ioctl_data isn't all that big, use a stack allocation.
Reviewed by: jhb
MFC after: 3 days
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23933
appropriate actions when we are trying to detach an audio device,
but cannot because someone is using it.
This avoids applications having to wait for the DSP read data
timeout before they receive any error indication.
Tested with virtual_oss(8).
Remove some unused definitions while at it.
PR: 194727
MFC after: 1 week
Sponsored by: Mellanox Technologies
Implement counting of table entries linked on a per-table base
with an optional (if set > 0) limit of the maximum number of table
entries.
For that the public lltable_link_entry() and lltable_unlink_entry()
functions as well as the internal function pointers change from void
to having an int return type.
Given no consumer currently sets the new llt_maxentries this can be
committed on its own. The moment we make use of the table limits,
the callers of the link function must check the return value as
it can change and entries might not be added.
Adjustments for IPv6 (and possibly IPv4) will follow.
Sponsored by: Netflix (originally)
Reviewed by: melifaro
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22713
cookies, use the same flow label for the segments sent during the
handshake and after the handshake.
This fixes a bug by making sure that sc_flowlabel is always stored in
network byte order.
Reviewed by: bz@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23957
Add four new counters for ND6 related Anti-DoS measures.
We split these out into a separate upfront commit so that we only
change the struct size one time. Implementations using them will
follow.
PR: 157410
Reviewed by: melifaro
MFC after: 2 weeks
X-MFC: cannot really MFC this without breaking netstat
Sponsored by: Netflix (initially)
Differential Revision: https://reviews.freebsd.org/D22711
sending a TCP segment from the TCP SYN cache (like a SYN-ACK).
This fix initialises it to zero. This is correct for the ECN bits,
but is does not honor the DSCP what an application might have set via
the IPPROTO_IPV6 level socket options IPV6_TCLASS. That will be
fixed separately.
Reviewed by: Richard Scheffenegger
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23900
This issue was observed on a PowerPC64 machine with an Adaptec RAID Controller
with PCI device ID 0x028d. After several read/write operations, the kernel was
panic'ing in bus_dmamap_sync(). This was due to a missing aac_unmap_command()
in the SYNC path.
PR: 237463
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D23668
This fixes a regression issue after r357861.
Reported by: James Wright <james.wright@jigsawdezign.com>
PR: 224592
PR: 233884
MFC after: 3 days
Sponsored by: Mellanox Technologies
refcount that we took earlier that represents the I/O that ended up
not being started.
Reviewed by: glebius
Approved by: imp (mentor)
Sponsored by: Netflix
Correct the sense of the comment describing sigsetmasked() to match the
code. It was exactly backwards.
While here, convert the type/values of the predicate from pre-C99 int/1/0 to
bool/true/false. No functional change.
Consistently omit /* FALLTHROUGH */ when we have a case statement that does
nothing. Since compilers don't warn about stacked case statements, and we were
inconsistent, resolve by removing extras.
This allows us to call it on a per-CPU basis and to warn if the details
are different across CPUs.
While here read the L1 I-Cache type and store this for use later by pmap.
Sponsored by: Innovate UK
Our iSCSI benchmarks on a large 80-core system show that previous limit
of 8 threads can be a bottleneck. At some points this change increases
write IOPS by as much as 50%. I am still not sure that so many threads
is really required, but we tested lower amounts and got no significant
benefits, while latencies were a bit worse, so decided to not diverge.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Add proper #includes, and #ifdefs and some style fixes to make RSS
kernels compile again. There are still possible issues with uin16_t
vs. uint_t cpuid which I am not going near.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D23726
The results of ktls_get_cpu() are stored in u_int and NETISR_CPUID_NONE
requires u_int. Adjust uint16_t to uint_t in order to make RSS kernels
compile some more again.
HPTS still has to be fixed, which is a bit more complicated.
Reviewed by: jhb, gallatin, rrs
Differential Revision: https://reviews.freebsd.org/D23726
In r231852 I added in6_selectroute_fib() as a compat function with the
fibnum as an extra argument compared to in6_selectroute() to keep the
KPI stable.
Way too late retire this function again and add the fib to in6_selectroute()
which also only has a single consumer now and was an orphan function before.
Implement the equivalent of r347375 (IPv4) for the IPv6 output path.
In IPv6 we get passed a cached route (and inp) by udp6_output()
depending on whether we acquired a write lock on the INP.
In case we neither bind nor connect a first UDP packet would come in
with a cached route (wlocked) and all further packets would not.
In case we bind and do not connect we never write-lock the inp.
When we do not pass in a cached route, rather than providing the
storage for a route locally and pass it over the old lookup code
and down the stack, use the new route lookup KPI and acquire all
details we need to send the packet.
Compared to the IPv4 code the IPv6 code has a couple of possible
complications: given an option with a routing hdr/caching route there,
and path mtu (ro_pmtu) case which now equally has to deal with the
possibility of having a route which is NULL passed in, and the
fwd_tag in case a firewall changes the next hop (something to
factor out in the future).
Sponsored by: Netflix
Reviewed by: glebius
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23886
Like for IPv4 add nh_ia to the ext interface and return rt_ifa
in order to be used for, e.g., packet/octets accounting purposes.
Reviewed by: melifaro
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D23873
In fib6_rte_to_nh_* when returning a link-local gateway address
currently we do clear the scope. That could be recovered using
the ifp returned as well, but the code in general seems to
expect a link-local address with scope embeedded as otherwise
the "dst" (gw) passed to the output routines will not include
scope and not send the packet out (the right interface).
Do not clear the scope when returning a link-local address and
allow packets to go out (the right interface).
Remove the (now) extra scope recovery in the IPv6 fast-fwd code.
Sponsored by: Netflix
Reviewed by: melifaro, ae
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23872
A hypervisor, e.g. bhyve, will need to know what exception levelthe kernel
was in when it started booting. If it was EL2 we can then enable said
hypervisor.
Store the boot exception level and allow the kernel to later query it.
Obtained from: https://github.com/FreeBSD-UPB/freebsd (earlier version)
Sponsored by: Innovate UK
If NUMA is not enabled in the kernel config, or is disabled at boot, this
function should just return domain 0 regardless of what's in the device
tree.
Fixes a panic in iflib with NUMA disabled.
Reported by: luporl
to the path related tokens not being processed. Restore this behavior and
and move AUE_JAIL_SET in this block, as it may conditionally contain a
path token.
Discovered by: kevans
PR: 244537
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D23929
When looking for cpu with the same OPP starts from the root /cpus node
so each instance of cpufreq_dt will now each cpu with the same operating
point.
Also test that the node we are testing have the property "device_type" set
to be equal to "cpu".
While here add more debug printfs (off by defaults).
MFC after: 2 weeks
These support outdated or obsolete ISA WAN (T1/E1) sync serial cards,
and these drivers haven't really been touched (other than in tree-wide
sweeps to keep them building) for 15+ years.
Related PCI devices ce and cp are still in the tree, with deprecation
proposed in D23928.
MFC after: 1 week
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
This issue was observed on a PowerPC64 machine with an Adaptec RAID
Controller with PCI device ID 0x028d, where sense data was causing a
buffer overflow because of wrong max sense length logic.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D23667
This allows libinput to disable touchpads when the lid is closed and
various desktop environments can show power-off dialogs when the power
button is pressed. While the latter is doable with devd a
cross-platform solution is nicer.
Submitted by: Greg V <greg@unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D23863
MFC after: 1 week
Sponsored by: Mellanox Technologies
When generating an cloned interface instance in edsc_clone_create(),
generate a MAC address from the FF OUI with ether_gen_addr(). This allows us
to have unique local-link addresses. Previously, the MAC address was zero.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D23842
They are spurious since introduction of struct pwd, which provides them
implicitly.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23885
The new structure is copy-on-write. With the assumption that path lookups are
significantly more frequent than chdirs and chrooting this is a win.
This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where jail vnode was never referenced,
meaning subsequent access on lookup could run into use-after-free.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23884
Also, inline/remove now empty or trivial macros. CAM has evolved enough this
code couldn't work there anyway, and the API sweeep commits made since then were
made unconditional.
Eliminate code for old versions, inline pci_find_cap instead of relying on
compat ifdef.
This commit should have been combined with r358488 before pushing it in.
these look to be cut and pasted from other drivers since this driver was
committed to FreeBSD 7-current and MFC'd to FreeBSD 6. The ones for FreeBSD 4
and 5 likely never were working...
On Linux the valid range of priorities for the SCHED_FIFO and SCHED_RR
scheduling policies is [1,99]. For SCHED_OTHER the single valid priority is
0. On FreeBSD it is [0,31] for all policies. Programs are supposed to
query the valid range using sched_get_priority_(min|max), but of course some
programs assume the Linux values are valid.
This commit adds a tunable compat.linux.map_sched_prio. When enabled
sched_get_priority_(min|max) return the Linux values and sched_setscheduler
and sched_(get|set)param translate between FreeBSD and Linux values.
Because there are more Linux levels than FreeBSD levels, multiple Linux
levels map to a single FreeBSD level, which means pre-emption might not
happen as it does on Linux, so the tunable allows to disable this behaviour.
It is enabled by default because I think it is unlikely that anyone runs
real-time software under Linux emulation on FreeBSD that critically relies
on correct pre-emption.
This fixes FMOD, a commercial sound library used by several games.
PR: 240043
Tested by: Alex S <iwtcex@gmail.com>
Reviewed by: dchagin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23790
now and are incompatible with the correct ones in RFC 3168.
Submitted by: Richard Scheffenegger
Differential Revision: https://reviews.freebsd.org/D23903
jail_remove(2) and finally setloginclass(2) are not being converted and
committed into userspace. Add the cases for these syscalls and make sure
they are being converted properly.
Reviewed by: bz, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23882
Otherwise we can fail to handle translation faults on curthread, leading
to a panic.
Reviewed by: alc, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23895
sys/arm64/arm64/identcpu.c:1170:5: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
break;
^
sys/arm64/arm64/identcpu.c:1168:4: note: previous statement is here
if (fv[j].desc[0] != '\0')
^
The break should be after the if statement, indented one level less.
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D23871
The IVAR_MAX_DATA is supposed to have the number of descriptor X the mmc
block size and desc_count contain all this information + 1.
Reported by: phk
MFC after: 1 week
Remove the list of architectures and depend on COMPAT_FREEBSD32 which is
defined (if relevent) in opt_global.h and thus defined everywhere in
the kernel.
This is a minor change in behavior in that 32-bit compat for sysctls now
depends on COMPAT_FREEBSD32 rather than on the potential for 32-bit
compat support. The prior arrangement may have been part of an attempt
to allow 32-bit compat to be loadable, but such attempts are doomed to
failure (due to the fact that ioctls have no meaning without the
associated file descriptor) without vastly more refactoring and some
sort of COMPAT_FREEBSD32_SUPPORT option.
Reviewed by: jhb
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23748
Following previous revision, apply the same minor optimization to
hand-rolled atomic_fcmpset_128 in pmap.c.
Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23870
Previously the pattern to extract status flags from inline assembly
blocks was to use setcc in the block to write the flag to a register.
This was suboptimal in a few ways:
- It would lead to code like: sete %cl; test %cl; jne, i.e. a flag
would just be loaded into a register and then reloaded to a flag.
- The setcc would force the block to use an additional register.
- If the client code didn't care for the flag value then the setcc
would be entirely pointless but could not be eliminated by the
optimizer.
A more modern inline asm construct (since gcc 6 and clang 9) allows for
"flag output operands", where a C variable can be written directly from
a flag. The optimizer can then use this to produce direct code where
the flag does not take a trip through a register.
In practice this makes each affected operation sequence shorter by five
bytes of instructions. It's unlikely this has a measurable performance
impact.
Reviewed by: kib, markj, mjg
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23869
refcount(9) was recently extended to support waiting on a refcount to
drop to zero, as this was needed for a lockless VM object
paging-in-progress counter. However, this adds overhead to all uses of
refcount(9) and doesn't really match traditional refcounting semantics:
once a counter has dropped to zero, the protected object may be freed at
any point and it is not safe to dereference the counter.
This change removes that extension and instead adds a new set of KPIs,
blockcount_*, for use by VM object PIP and busy.
Reviewed by: jeff, kib, mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23723
In certain cases (probably not during normal operation but observed in
the lab during development) ip6_ouput() could return without error
and ifpp (&oifp) not updated.
Given oifp was never initialized we would take the later branch
as oifp was not NULL, and when calling icmp6_ifstat_inc() we would
panic dereferencing a garbage pointer.
For code stability initialize oifp to NULL before first use to always
have a deterministic value and not rely on a called function to behave
and always and for ever do the work for us as we hope for.
MFC after: 3 days
Sponsored by: Netflix
This more clearly differentiates TLS records encrypted and decrypted
in TOE connections from those encrypted via NIC TLS.
MFC after: 1 week
Sponsored by: Chelsio Communications
Fix the following -Werror warning from clang 10.0.0:
sys/arm/arm/identcpu-v6.c:227:5: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
if (val & CPUV7_CT_CTYPE_RA)
^
sys/arm/arm/identcpu-v6.c:225:4: note: previous statement is here
if (val & CPUV7_CT_CTYPE_WB)
^
This was due to an accidentally inserted tab before the if statement.
MFC after: 3 days
sys/arm/arm/identcpu-v6.c:227:5: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
if (val & CPUV7_CT_CTYPE_RA)
^
sys/arm/arm/identcpu-v6.c:225:4: note: previous statement is here
if (val & CPUV7_CT_CTYPE_WB)
^
This was due to an accidentally inserted tab before the if statement.
MFC after: 3 days
This provides the potential to force a lazy (tick based) SMR to advance
when there are blocking waiters by decoupling the wr_seq value from the
ticks value.
Add some missing compiler barriers.
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D23825
Use __riscv_flen instead of __riscv_float_abi_soft. While the latter works for
userland (and one could argue it's more correct), it fails for the kernel. We
compile the kernel with -mabi=lp64 (eg soft float abi) to avoid floating point
instructions in the kernel. We also compile the kernel -march=rv64imafdc for
hard float kernels (eg those with options FPE), but with -march=rv64imac for
softfloat kernels (eg those with FPE). Since we do this, in the kernel (as in
userland) __riscv_flen will be defined for 'riscv64' and not for 'riscv64sf'.
This also removes the -DMACHINE_ARCH hack now that it's no longer needed.
Longer term, we should return the ABI from the sysctl hw.machine_arch like on
amd64 for i386 binaries.
Suggested by: mhorne@
Differential Revision: https://reviews.freebsd.org/D23813
Mesa's drm_syncobj usage, in the LinuxKPI.
While at it optimise the jiffies conversion functions to avoid repeated
and constant calculations.
Submitted by: Greg V <greg@unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D23846
MFC after: 1 week
Sponsored by: Mellanox Technologies
The change affects only FreeBSD specific code as the common code already
mostly uses the more idiomatic and correct ZFS_MAX_DATASET_NAME_LEN.
MFC after: 1 week
It's very unlikely that zfsvfs_update_fromname() and
zvol_rename_minors() ever did anything during the promote operation as
the old name was not initialized.
MFC after: 1 week
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Reviewed by: cem
Approved by: csprng, kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23841
Swap buckets on free as well as alloc so that alloc is always the most
cache-hot data.
When selecting a zone domain for the round-robin bucket cache use the
local domain unless there is a severe imbalance. This does not affinitize
memory, only locks and queues.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D23824
lookup pages. These variants will fall back to their locked counterparts
if the page is not present.
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D23449
Per the documentation for dnode_next_offset in dnode.c, the "txg"
parameter specifies a lower bound on which transaction the dnode can
be found in. We are interested in all dnodes that are removed between
the first and last transaction in the snapshot. It doesn't need to be
created in that snapshot to correspond to a removed file.
In fact, the behavior of zfs diff in the test case exactly matches
this: the transaction that created the data that was deleted in snapshot
"2" was produced before, in snapshot "1", definitely predating the first
transaction in snapshot "2".
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <Tim Chase <tim@onlight.com>
Closes#2081zfsonlinux/zfs@7290cd3c4e
MFC after: 1 week
From POSIX,
[ENOTSUP]
The implementation does not support the combination of accesses
requested in the prot argument.
This fits the case that prot contains permissions which are not a subset
of prot_max.
Reviewed by: brooks, cem
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23843
Remove a number of workarounds for older versions of FreeBSD. FreeBSD stable/10
was branched over 6 years ago. All of these changes date from about that time or
earlier. These workarounds are extensive and get in the way of understanding
the current flow in the driver.
This patch addresses an issue found in ztest where resilver
write zios that were passed to an indirect vdev would end up
being handled as though they were resilver read zios. This
caused issues where the zio->io_abd would be both read to
and written from at the same time, causing asserts to fail.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#8193zfsonlinux/zfs@5aa95ba0d3
MFC after: 1 week
This patch fixes an issue discovered by ztest where
dsl_scan_ddt_entry() could add I/Os to the dsl scan queues
between when the scan had finished all required work and
when the scan was marked as complete. This caused the scan
to spin indefinitely without ending.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#8010zfsonlinux/zfs@5e0bd0ae05
MFC after: 1 week
This patch corrects 2 small bugs where scn->scn_phys_cached was
not properly updated to match the primary copy when it needed to
be. The first resulted in the pause state not being properly
updated and the second resulted in the cached version being
completely zeroed even if the primary was not.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#8010zfsonlinux/zfs@8cb119e3dc
MFC after: 1 week
When scn->scn_maxinflight_bytes has not been initialized it's
possible to hang on the condition variable in scan_exec_io().
This issue was uncovered by ztest and is only possible when
deduplication is enabled through the following call path.
txg_sync_thread()
spa_sync()
ddt_sync_table()
ddt_sync_entry()
dsl_scan_ddt_entry()
dsl_scan_scrub_cb()
dsl_scan_enqueuei()
scan_exec_io()
cv_wait()
Resolve the issue by always initializing scn_maxinflight_bytes
to a reasonable minimum value. This value will be recalculated
in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit
and the addition/removal of vdevs.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7098zfsonlinux/zfs@f90a30ad1b
MFC after: 1 week
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
like the mlx-c5 and c6 that require a "setup" routine before
the tcp_ratelimit code can declare and use a rate. I add the
setup routine to if_var as well as fix tcp_ratelimit to call it.
I also revisit the rates so that in the case of a mlx card
of type c5/6 we will use about 100 rates concentrated in the range
where the most gain can be had (1-200Mbps). Note that I have
tested these on a c5 and they work and perform well. In fact
in an unloaded system they pace right to the correct rate (great
job mlx!). There will be a further commit here from Hans that
will add the respective changes to the mlx driver to support this
work (which I was testing with).
Sponsored by: Netflix Inc.
Differential Revision: ttps://reviews.freebsd.org/D23647
Add support for non-ID registers when printing CPU information. This is
used with the cache type register to print details of the cache on boot.
Sponsored by: Innovate UK
The requirements of an Address Space ID allocator and a Virtual Machine ID
allocator are similar. Generalise the former code so it can be used with
the latter.
Reviewed by: alc (previous version)
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D23831
On arm64 the stage 1 and stage 2 pte formats are similar enough we can
reuse the pmap code for both. As they are only similar and not identical
we need to know if we are managing stage 1 or stage 2 tables.
Add an enum to store this information and a check to make sure it is
set to stage 1 when we manage stage 1 pte fields.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D23830
The locking defines for if_bridge used to live in if_bridgevar.h, but
they're only ever used by the bridge implementation itself (in
if_bridge.c). Moving them into the .c file.
Reported by: philip, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23808
Remove an old workaround that is no longer necessary since rS343824.
There used to be a problem with FMan interrupts firing on multiple CPUS
at the same time.
This ended up being due to multicast interrupts being unsupported in the
Freescale PIC (so instead of using a selection algorithm, it would do some
unspecified action, such as interrupting multiple cpus at random.)
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23829
Now we execute sendfile_iodone() in all possible cases, which
guarantees that vm_object_pip_wakeup() is called and sfio structure
is freed.
At the beginning of sendfile initialize sfio->m to NULL, that would
indicate that the mbuf chain either doesn't exist, or belongs to the
syscall (not to I/O completion). Fill sfio->m only at a point when
we are positive that there are I/Os ongoing and before releasing
syscall's reference on sfio.
In sendfile_iodone() perform vm_object_pip_wakeup() once last
reference is released, then check for sfio->m. NULL pointer
indicates that we need only to free the memory.
Reviewed by: jtl, gallatin
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE.
Reviewed by: royger
Approved by: kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23638
When turning IBRS mitigation using sysctl, as opposed to loader tunable,
send IPI to tweak MSR on all cores. Right now code only performed MSR write
onr the CPU where sysctl was run.
Properly report hw.ibrs_active for IBRS_ALL. Split hw_ibrs_ibpb_active out
from ibrs_active, to keep the current semantic of guiding kernel entry and
exit handlers.
Reported and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
When moving the calculations for the optlen into the if (opt) block
which deals with possible extension headers I failed to initialise
unfragpartlen to the ipv6 header length if there were no extension
headers present. Correct that mistake to make IPv6 fragment length
calculcations work again.
Reported by: hselasky, kp
OKed by: hselasky, kp
MFC after: 3 days
X-MFC with: r358167
PR: 244393
key-codes from the USB keyboard. Negative key-codes are currently skipped.
While at it use the bit size value provided by the HID location structure
instead of assuming a value of 8.
This fixes a regression issue after r357861.
Reported by: Minoru TANABE <kotanabe3@gmail.com>
PR: 224592
PR: 233884
MFC after: 3 days
Sponsored by: Mellanox Technologies