Non-NULL timeouts where copied in improperly and could produce failures
due to incompatible data structures.
Reviewed by: kib
MFC after: 3 days
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14587
Silence a Coverity warning about 'windowSize' being uninitialized.
(Yes, nothing that calls this routine actually uses the windowSize
value. Still, appeasing Coverity is pretty harmless in this case.)
Reported by: Coverity
Reviewed by: Yann Collet
Obtained from: zstd 606374269cf3485972c90b993fbb84dc20da032f
Sponsored by: Dell EMC Isilon
therefore, it should be safe to set the NX bit on the PML4E for the
recursive page table mappings. According to the Intel docs, the effect
of the NX bit should propogate to any page reached through a PML4E which
has the NX bit set.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14333
first available virtual address to a 2MB boundary. After r329071,
create_pagetables() rounds firstaddr up to a 2MB boundary. This ensures
the kernel is mapped in super-pages, which is the point of the logic
in pmap_kmem_choose(). Therefore, it is no longer necessary for
pmap_bootstrap() to round up to the 2MB boundary again.
As pmap_bootstrap() was the only user of pmap_kmem_choose(), we can
delete pmap_kmem_choose().
Reviewed by: kib
MFC after: 2 weeks
X-MFC-with: r329071
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14355
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.
It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.
Background:
On a per infiniband port basis, the GID identifier is not a unique identifier!
This assumption falls apart when VLAN ID, IPv6 scope ID and RoCE type,
as supported by RoCE v2, is taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.
Different VLANs are allowed to define the same IPv4 addresses and especially
for the default IPv6 link-local addresses or when using so-called containers
or jails, this is true.
The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.
Consequently old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid() are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When a network device is departing, the RoCE GID entries should be
cleared before the default L2 link layer address is freed. Else a NULL
pointer access may happen.
MFC after: 1 week
Sponsored by: Mellanox Technologies
VLANs in ibcore.
IPv6 link local addresses are usually derived from the netdev MAC
address. This is applicable to VLAN devices and its lower netdevice as
well. In such cases the IPv6 link local address is a duplicate of the
default GID.
Now that link local IPv6 addresses based GIDs are supported, allow
adding such GID entries in the GID table.
MFC after: 1 week
Sponsored by: Mellanox Technologies
only use the first driver, however this may change in the future and
hardware exists with multiple ITS devices.
Sponsored by: DARPA, AFRL
Sponsored by: Cavium (Hardware)
Pretty much any other device might need to manipulate a gpio pin during its
probe or attach routines, so these devices must be available as early as
possible.
The gpio device is an interrupt controller, but I didn't choose the
INTERRUPT pass for that reason (it works fine as an interrupt controller as
long as it attaches any time before interrupts are enabled). That just
looked like the right place in the passes to ensure that it attaches before
any type of device that might need gpio pin manipulations.
Rename ACPI_IVRS_HARDWARE_NEW to ACPI_IVRS_HARDWARE_EFRSUP, since new definitions add Extended Feature Register support. Use IvrsType to distinguish three types of IVHD - 0x10(legacy), 0x11 and 0x40(with EFR). IVHD 0x40 is also called mixed type since it supports HID device entries.
Fix 2 coverity bugs reported by cem.
Reported by:jkim, cem
Approved by:grehan
Differential Revision://reviews.freebsd.org/D14501
driver requires interrupts to do transfers, and the drivers for the SPI
devices on the bus quite reasonably expect to be able to do IO while probing
and attaching.
Normally after grabbing the lock it has to be verified we got the right one
to begin with. However, if we are recursing, it must not change thus the
check can be avoided. In particular this avoids a lock read for non-recursing
case which found out the lock was changed.
While here avoid an irq trip of this happens.
Tested by: pho (previous version)
of years since the century, so strip the century out when converting to or
from bcd_clocktime format (the conversion routines will infer century by
pivoting on 70).
The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.
This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.
If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.
The bug sometimes manifested itself in stalls during -j 128 package builds.
Refactor the code to fix the bug, while here remove some of the gratituous
differences between rw and sx locks.
The main routine takes 8 args, 3 of which are almost the same for most uses.
This in particular pushes it above the limit of 6 arguments passable through
registers on amd64 making it impossible to tail call.
This is a prerequisite for further cleanups.
Tested by: pho
the jtag port, so that a tty is not created for it.
This is based on information in the PR and from the vendor website. When
the PR was first opened we had no facility for flagging the jtag ports. I
stumbled across the still-open PR with the idea of closing it, and noticed
that this wee update was needed.
PR: 175893
Before executing a command in a ddb script ddb prints an information
line of the form:
db:1:my-script> command
where 1 is the script's depth level, "my-script" is the scipt's name,
and "command" is the current command in the script.
db_script_exec() uses its 'scriptname' parameter to produce that string.
In the case when db_script_exec() is called from db_run_cmd() the
argument points to db_tok_string that is a global variable used for
command parsing. So, its value changes with every command executed.
This commit changes the code to use the script's name stored in
ds_scriptname to print the line.
MFC after: 2 weeks
instead of pause() in the msleep() function to avoid rounding errors when
converting delay values forth and back. Add a guard for a delay value
of zero milliseconds which is undefined.
MFC after: 1 week
Requested by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
It would have been on an actual named pass before, but none were really
appropriate in name. Move it to the recently created SUPPORTDEV pass, which
perfectly describes it and keeps it in the right order.
Getting regulator is good, enabling them is better.
When the mmc stack decide to change the voltage for IO, don't
change the main vcc of the sd/mmc, only the io vcc.
AXP803 and AXP813/818 are very similar, only two regulators differs.
AXP803 is the companion chip for A64/R18
AXP813 is the companion chip for A83T
AXP818 is the companion chip for H8 (~A83T)
Add support for all regulators found in both of them.
Since that change the system call stack traces look like this:
...
sys___sysctl() at sys___sysctl+0x5f/frame 0xfffffe0028e13ac0
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe0028e13bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0028e13bf0
So, db_nextframe() stopped recognizing the system call frame.
This commit should fix that.
Reviewed by: kib
MFC after: 4 days
The reason for this new pass is :
The earlier pass names are really specific (interrupt, timer, scheduler etc ..)
and making a driver that other device driver (that attach at DEFAULT pass)
needs probe at earlier pass can be confiusing. We can live with GPIO driver
at INTERRUPT pass because they are often an interrupt controller too but having
a usb phy driver probed at RESOURCES (or SCHEDULER for example) is silly.
The number was choosen to have a lot of margin if we want to introduce other
pass in the futur.
Reviewed by: ian, imp, kevans
Differential Revision: https://reviews.freebsd.org/D14568
imcsmb(4) provides smbus(4) support for the SMBus controller functionality
in the integrated Memory Controllers (iMCs) embedded in Intel Sandybridge-
Xeon, Ivybridge-Xeon, Haswell-Xeon, and Broadwell-Xeon CPUs. Each CPU
implements one or more iMCs, depending on the number of cores; each iMC
implements two SMBus controllers (iMC-SMBs).
*** IMPORTANT NOTE ***
Because motherboard firmware or the BMC might try to use the iMC-SMBs for
monitoring DIMM temperatures and/or managing an NVDIMM, the driver might
need to temporarily disable those functions, or take a hardware interlock,
before using the iMC-SMBs. Details on how to do this may vary from board to
board, and the procedure may be proprietary. It is strongly suggested that
anyone wishing to use this driver contact their motherboard vendor, and
modify the driver as described in the manual page and in the driver itself.
(For what it's worth, the driver as-is has been tested on various SuperMicro
motherboards.)
Reviewed by: avg, jhb
MFC after: 1 week
Relnotes: yes
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D14447
Discussed with: avg, ian, jhb
Tested by: allanjude (previous version), Panasas
POSIX explicitly states that the application must declare union semun.
This makes no sense, but it is what it is. This brings us into line
with Linux, MacOS/Darwin, and NetBSD.
In a ports exp-run a moderate number of ports fail due to a lack of
approprate autotools-like discovery mechanisms or local patches. A
commit to address them will follow shortly.
PR: 224300, 224443 (exp-run)
Reviewed by: emaste, jhb, kib
Exp-run by: antoine
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14492
This deliberately breaks the API in preperation for future syscall
revisions which will remove these nonstandard members.
In an exp-run a single port (devel/qemu-user-static) was found to
use them which it did becuase it emulates system calls. This has
been fixed in the ports tree.
PR: 224443 (exp-run)
Reviewed by: kib, jhb (previous version)
Exp-run by: antoine
Sponsored by: DARPA, AFRP
Differential Revision: https://reviews.freebsd.org/D14490
copyout(9) family.
The addresses are user-controllable, and if the process ABI allows
mapping at zero, then the zero address is meaningful, contradicting
the definition of _Nonnull. In any case, it does not require any
special code to handle NULL udaddr.
It is not clear if __restrict makes sense as well, since kaddr and
udaddr point to different address spaces, so equal numeric values of
the pointers do not imply aliasing and a legitimate. But leave it for
later.
copyinstr(9) does not have its user address argument annotated.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
delayed_work in the LinuxKPI. This allows the timer_pending() function
macro to be used with delayed work structures.
No functional nor structural change.
MFC after: 1 week
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
Sponsored by: Limelight Networks
to fix the memory leak that I introduced in r328426. Instead of
trying to clear up the possible memory leak in all the clients, I
ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures
that memory is always freed if it returns an error).
The original change in r328426 was a bit sparse in its description.
So I am expanding on its description here (thanks cem@ and rgrimes@
for your encouragement for my longer commit messages).
In preparation for adding check hashing to superblocks, r328426 is
a refactoring of the code to get the reading/writing of the superblock
into one place. Unlike the cylinder group reading/writing which
ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel
and cgget/cgput in libufs), I have the core superblock functions
just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is
already imported into utilities like fsck_ffs as well as libufs to
implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions
take a function pointer to do the actual I/O for which there are
four variants:
ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem
g_use_g_read_data / g_use_g_write_data for kernel geom clients
ufs_use_sa_read for the standalone code (stand/libsa/ufs.c
but not stand/libsa/ufsread.c which is size constrained)
use_pread / use_pwrite for libufs
Uses of these interfaces are in the UFS filesystem, geoms journal &
label, libsa changes, and libufs. They also permeate out into the
filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck,
fsirand, fstyp, and quot. Some of these utilities should probably be
converted to directly use libufs (like dumpfs was for example), but
there does not seem to be much win in doing so.
Tested by: Peter Holm (pho@)
Add I2C OPAL driver and a set of dummy-ones to allow
all I2C things on Power8 to attach.
TODO: better async token management
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text. To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders. Additional files
still waiting on permission from others are listed in review D14210.
Approved by: dchagin, rdivacky, sos
MFC after: 1 week
MFC with: r329370
Sponsored by: The FreeBSD Foundation
LinuxKPI to comply more with Linux. This fixes an issue when these functions
are used in waiting loops.
MFC after: 1 week
Sponsored by: Mellanox Technologies
In the weird case where the user-provided buffer was zero bytes, we could break
out of PCIOCGETCONF and return without initializing error. In this case,
initialize error to zero -- we successfully did nothing, as requested.
Reported by: Coverity
Sponsored by: Dell EMC Isilon
'status' array passed to get_mouse_status() is usually uninitialized by
callers.
Fully populating it with values in get_mouse_status() can fail due to
read_aux_data().
Additionally, nothing in API constrains 'len' to be >= 3. In practice,
every caller passes three, so perhaps that argument should just be removed.
Refactoring is a larger change, though.
Remove use of potentially uninitialized values by:
1. Only printing 3 debug statuses if the passed array was at least
'len' >= 3;
2. Populating 'status' array up to first three elements, if read_aux_data()
failed.
No functional change intended.
Reported by: Coverity
Sponsored by: Dell EMC Isilon
Coverity cannot determine that handle_written_indirdep() does not access
uninitialized 'sbp' when flags argument is zero.
So, simply move the initialization slightly sooner to silence the warning.
No functional change.
Reported by: Coverity
Sponsored by: Dell EMC Isilon
6.0 spec 6.4.3.5 bit 0 is ignored on QWord, DWord, and Word Address Space
Descriptors, but not Extended Address Space Descriptors.
Reviewed by: jhb
Sponsored by: DARPA, AFRL
Sponsored by: Cavium (Hardware)
Differential Revision: https://reviews.freebsd.org/D14516
It calls OF_* functions to check if it needs to implement workarounds.
This may not be the case on arm64 where we support both FDT and ACPI.
Fix this by checking if we are booting on FDT before calling these checks.
Reviewed by: ian
Sponsored by: DARPA, AFRL
Sponsored by: Cavium (Hardware)
Differential Revision: https://reviews.freebsd.org/D14515
Retpoline is a compiler-based mitigation for CVE-2017-5715, also known
as Spectre V2, that protects against speculative execution branch target
injection attacks.
In this commit it is disabled by default, but will be changed in a
followup commit.
Reviewed by: bdrewery (previous version)
MFC after: 3 days
Security: CVE-2017-5715
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14242
for the minimum length.
This fixes a bug where cookies of length 2 bytes (which is smaller
than the minimum length of 4) is provided by the server.
Sponsored by: Netflix, Inc.
Page daemon threads for other domains show up in ps(1) output as
"pagedaemon/domN", so let that be the case for domain 0 as well.
Submitted by: Kevin Bowling <kevin.bowling@kev009.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14518
Those operations, zfsctl_snapdir_readdir and zfsctl_snapdir_getattr,
access the filesystem's objset and it can be unstable during operations
like receive and rollback.
MFC after: 2 weeks
A reservation granule on PowerPC is a cache line.
On e500mc and derivatives a cacheline size is 64 bytes, not 32. Allocate
the maximum size permitted, but only utilize the size that is needed. On
e500v1 and e500v2 the reservation granule will still be 32 bytes.
Chain frames required to satisfy all 2K of declared I/Os of 128KB each take
more then a megabyte of a physical memory, all of which existing code tries
allocate as physically contiguous. This patch removes that physical
contiguousness requirement, leaving only virtual contiguousness. I was
thinking about other ways of allocation, but the less granular allocation
becomes, the bigger is the overhead and/or complexity, reaching about 100%
overhead if allocate each frame separately.
The patch also bumps the chain frames hard limit from 2K to 16K. It is more
than enough for the case of default REQ_FRAMES and MAXPHYS (the drivers will
allocate less than that automatically), while in case of increased MAXPHYS
it will control maximal memory usage.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14420
to what other arches (all except riscv and armv4/5) do.
Submitted by: Hyun Hwang <hyun@caffeinated.codes>
Differential Revision: https://reviews.freebsd.org/D14465
After the auth key is copied into the ipad[] array, any remaining bytes
are cleared to zero (in case the key is shorter than one block size).
The full block size was used as the length of the zero rather than the
size of the remaining ipad[]. In practice this overflow was harmless as
it could only clear bytes in the following opad[] array which is
initialized with a copy of ipad[] in the next statement.
Sponsored by: Chelsio Communications
The parameters describe how much of the adapter's memory is reserved for
storing TLS keys. The 'meminfo' sysctl now lists this region of adapter
memory as 'TLS keys' if present.
Sponsored by: Chelsio Communications
* If compiled with EXT_RESOURCES look up the "biu" and "ciu" clocks in
the DT
* Don't use custom property "bus-frequency" but the standard one
"clock-frequency"
* Use the DT property max-frequency and fall back to 200Mhz if it don't exists
* Add more mmc caps suported by the controller
* Always ack all interrupts
* Subclassed driver can supply an update_ios so they can handle update
the clocks accordingly
* Take care of the DDR bit in update_ios (no functional change since we
do not support voltage change for now)
* Make use of the FDT bus-width property
* rk_cru is a cru driver that needs to be subclassed by
the real CRU driver
* rk_clk_pll handle the pll type clock on RockChip SoC, it's only read
only for now.
* rk_clk_composite handle the different composite clock types (with gate,
with mux etc ...)
* rk_clk_gate handle the RockChip gates
* rk_clk_mux handle the RockChip muxes (unused for now)
* Only clocks for supported devices are supported for now, the rest will be
added when driver support comes
* The assigned-clock* property are not handled for now so we rely a lot on the
bootloader to setup some initial values for some clocks.
Unlike the existing GLA2GPA ioctl, GLA2GPA_NOFAULT does not modify
the guest. In particular, it does not inject any faults or modify
PTEs in the guest when performing an address space translation.
This is used by bhyve's debug server to read and write memory for
the remote debugger.
Reviewed by: grehan
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D14075
`iconv_sysctl_add` from `sys/libkern/iconv.c` incorrectly limits the
size of user strings, such that several out of bounds reads could have
been possible.
static int
iconv_sysctl_add(SYSCTL_HANDLER_ARGS)
{
struct iconv_converter_class *dcp;
struct iconv_cspair *csp;
struct iconv_add_in din;
struct iconv_add_out dout;
int error;
error = SYSCTL_IN(req, &din, sizeof(din));
if (error)
return error;
if (din.ia_version != ICONV_ADD_VER)
return EINVAL;
if (din.ia_datalen > ICONV_CSMAXDATALEN)
return EINVAL;
if (strlen(din.ia_from) >= ICONV_CSNMAXLEN)
return EINVAL;
if (strlen(din.ia_to) >= ICONV_CSNMAXLEN)
return EINVAL;
if (strlen(din.ia_converter) >= ICONV_CNVNMAXLEN)
return EINVAL;
...
Since the `din` struct is directly copied from userland, there is no
guarantee that the strings supplied will be NULL terminated. The
`strlen` calls could continue reading past the designated buffer
sizes.
Declaration of `struct iconv_add_in` is found in `sys/sys/iconv.h`:
struct iconv_add_in {
int ia_version;
char ia_converter[ICONV_CNVNMAXLEN];
char ia_to[ICONV_CSNMAXLEN];
char ia_from[ICONV_CSNMAXLEN];
int ia_datalen;
const void *ia_data;
};
Our strings are followed by the `ia_datalen` member, which is checked
before the `strlen` calls:
if (din.ia_datalen > ICONV_CSMAXDATALEN)
Since `ICONV_CSMAXDATALEN` has value `0x41000` (and is `unsigned`),
this ensures that `din.ia_datalen` contains at least 1 byte of 0, so
it is not possible to trigger a read out of bounds of the `struct`
however, this code is fragile and could introduce subtle bugs in the
future if the `struct` is ever modified.
PR: 207302
Submitted by: CTurt <cturt@hardenedbsd.org>
Reported by: CTurt <cturt@hardenedbsd.org>
Reviewed by: jhb, vangyzen
MFC after: 1 week
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D14521
libfdt now provides methods to iterate through subnodes and properties in a
convenient fashion.
Replace our ofw_fdt_{peer,child} searches with calls to their corresponding
libfdt methods. Rework ofw_fdt_nextprop to use the
fdt_for_each_property_offset macro, making it even more obvious what it's
doing.
No functional change intended.
Reviewed by: nwhitehorn
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14225
We can reach that point with IRQs disabled, and calling ast() with IRQs
disabled can lead to a deadlock.
This should fix the freezes on arm64 under load.
Reviewed by: andrew
The conditional compilation support is now centralized in
tcp_fastopen.h and tcp_var.h. This doesn't provide the minimum
theoretical code/data footprint when TCP_RFC7413 is disabled, but
nearly all the TFO code should wind up being removed by the optimizer,
the additional footprint in the syncache entries is a single pointer,
and the additional overhead in the tcpcb is at the end of the
structure.
This enables the TCP_RFC7413 kernel option by default in amd64 and
arm64 GENERIC.
Reviewed by: hiren
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14048
[RFC7413]. It also includes a pre-shared key mode of operation in
which the server requires the client to be in possession of a shared
secret in order to successfully open TFO connections with that server.
The names of some existing fastopen sysctls have changed (e.g.,
net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable).
Reviewed by: tuexen
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14047
The keylist lock was not being acquired early enough. The only side
effect of this bug is that the effective add time of a new key could
be slightly later than it would have been otherwise, as seen by a TFO
client.
Reviewed by: tuexen
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14046
the Server Base System Architecture to be a subset of the pl011 r1p5. As
we don't use the removed features it is safe to just attach to the existing
driver as is.
Sponsored by: DARPA, AFRL
(using "boot -d" at the loader propmt or setting boot_ddb in loader.conf).
Submitted by: Thomas Skibo <thomasskibo@yahoo.com>
Differential Revision: https://reviews.freebsd.org/D14428
the iicbus module for FDT-based systems.
The primary motivation for this is that host controller drivers which
declare DRIVER_MODULE(ofw_iicbus, thisdriver, etc, etc) now only need a
single MODULE_DEPEND(thisdriver, ofw_iicbus) for runtime linking to resolve
all the symbols. With ofw_iicbus and iicbus in separate modules, drivers
would need to declare a MODULE_DEPEND() on both, because symbol lookup is
non-recursive through the dependency chain. Requiring a driver to have
MODULE_DEPENDS() on both amounts to requiring the drivers to understand the
kobj inheritence details of how ofw_iicbus is implemented, which seems like
something they shouldn't have to know (and could even change some day).
Also, this is somewhat analogous to how the drivers get built when compiled
into the kernel. You don't have to ask for ofw_iicbus separately, it just
gets built in along with iicbus when option FDT is in effect.
The first call is used to gauge how much spaces is needed. Just computing
the size instead of generating the output allows to not take the proctree
lock.
It gets called by dmu_buf_init_user, which is inline but not static. So it
needs global linkage itself.
Reported by: GCC-6
MFC after: 17 days
X-MFC-With: 329722
If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.
Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.
PR: 209475
Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D14367
With r329882, in the absence of a free page shortage we would only take
len(PQ_INACTIVE)+len(PQ_LAUNDRY) into account when deciding whether to
aggressively scan PQ_ACTIVE. Previously we would also include the
number of free pages in this computation, ensuring that we wouldn't scan
PQ_ACTIVE with plenty of free memory available. The change in behaviour
was most noticeable immediately after booting, when PQ_INACTIVE and
PQ_LAUNDRY are nearly empty.
Reviewed by: jeff
new arm64 hardware, e.g. ThunderX2, seems to use this page size so was
failing to attach as the register value read back was incorrect.
While here fix the spelling on shareability.
Sponsored by: DARPA, AFRL
There are no parts useful for usermode applications in
vm/vm_pageout.h. Even for the specific applications like fstat and
lsof.
In my opinion, this protection is redundant and instead userspace
should not include the header at all. Since there are apparently
broken third party codebases, give them a bit of slack by providing
transitional period.
Reported by: julian
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
network device pointer might be NULL. Wait for any pending tasks to
complete before calling axge_stop().
MFC after: 1 week
Sponsored by: Mellanox Technologies
These interfaces were put in place to let QorIQ SoCs dictate CPU idling
semantics, in order to support capabilities such as NAP mode and deep sleep.
However, this never stabilized, and the idling support reverted back to
CPU-level rather than SoC level. Move this code back to cpu.c instead. If
at a later date the lower power modes do come to fruition, it should be done
by overriding the cpu_idle_hook instead of this platform hook.
When I wrote the patch, I wanted to remove SYSINIT() usage from amd64 code.
There is no reason to keep the divergence any more because iwasaki merged
most amd64 suspend/resume code to i386 with r235622. Note this also fixed
an enge case reported by royger. [1]
Suggested by: jhb
Reviewed by: royger
Tested by: royger [1]
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14400 [1]
After r328977, a wired page m may have m->queue != PQ_NONE.
Reviewed by: kib
X-MFC with: r328977
Differential Revision: https://reviews.freebsd.org/D14485
use it to regulate page daemon output.
This provides much smoother and more responsive page daemon output, anticipating
demand and avoiding pageout stalls by increasing the number of pages to match
the workload. This is a reimplementation of work done by myself and mlaier at
Isilon.
Reviewed by: bsdimp
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14402
No implementation of fpu_kern_enter() can fail, and it was causing needless
error checking boilerplate and confusion. Change the return code to void to
match reality.
(This trivial change took nine days to land because of the commit hook on
sys/dev/random. Please consider removing the hook or otherwise lowering the
bar -- secteam never seems to have free time to review patches.)
Reported by: Lachlan McIlroy <Lachlan.McIlroy AT isilon.com>
Reviewed by: delphij
Approved by: secteam (delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14380
Also correct a typo in the comment for these values, noted by jimharris.
Reviewed by: jimharris
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3715
NVMe support is ready and should be compiled-in
to the ppc64 kernel.
Submitted by: Wojciech Macek <wma@semihalf.org>
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
We don't support float in the boot loaders, so don't include
interfaces for float or double in systems headers. In addition, take
the unusual step of spiking double and float to prevent any more
accidental seepage.
2. Sysctls to enable/disable driver_state_dump and error_recovery.
3. Sysctl to control the delay between hw/fw reinitialization and
restarting the fastpath.
4. Stop periodic stats retrieval if interface has IFF_DRV_RUNNING flag off.
5. Print contents of PEG_HALT_STATUS1 and PEG_HALT_STATUS2 on heartbeat
failure.
6. Speed up slowpath shutdown during error recovery.
7. link_state update using atomic_store.
8. Added timestamp information on driver state and minidump captures.
9. Added support for Slowpath event logging
10.Added additional failure injection types to simulate failures.
search for a thread to steal inside a critical section. Since this
allows the search to be preempted, restart the search if preemption
happens since the search results found earlier may no longer be
valid.
Decrease the latency of starting a thread that may be assigned to
this CPU during the search by polling for incoming threads during
the search and switching to that thread instead of continuing the
search.
Test for stale search results and restart the search before going
through the expense of calling tdq_lock_pair(). Retry some tests
after grabbing the locks since things may have changed while waiting
to get both locks.
Eliminate special case handling for stealing from an SMT peer that
uses 1 as the steal threshold. This can only succeed if a thread
has been assigned but our SMT peer has not yet started executing
it. This is quite rare and when it happens the other SMT thread
is generally waiting for the same tdq lock that we hold. Basically
both SMT threads are racing to grab the same spin lock.
Add the kern.sched.always_steal knob from a ULE patch by jeff@.
Incorporate another idea from Jeff's ULE patch. If the sched_switch()
detects that the CPU is about to go idle, try to steal a thread
before switching to the idle thread. Since the search for a thread
to steal has to be done inside a critical section in this context,
limit the impact on latency by adding the knob kern.sched.trysteal_limit
to limit the topological distance of the search and don't restart
the search if we detect stale results. If this search can't find
an stealable thread, the idle loop can do a more complete search.
Also poll for threads being assigned to this CPU during the search
and switch to them instead of continuing the search. This change
is responsibile for the majority of the improvement in parallel
buildworld times.
In sched_balance_group() change the minimum threshold from stealing
a thread from 1 to 2. Poaching a newly assigned thread from a CPU
that is waking up hasn't yet switched to that thread from idle is
likely very rare and is likely to have the same lock race as is
seen when stealing threads in the idle loop. Also use tdq_notify()
to kick the destintation CPU instead of always sending an IPI.
Update a stale comment, the number of transferable threads is not
calculated.
Reviewed by: kib (earlier version)
Comments by: avg, jeff, mav
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12130
A super-set of the functionality of jedec_ts(4). jedec_dimm(4) reports asset
information (Part Number, Serial Number) encoded in the "Serial Presence
Detect" (SPD) data on JEDEC DDR3 and DDR4 DIMMs. It also calculates and
reports the memory capacity of the DIMM, in megabytes. If the DIMM includes
a "Thermal Sensor On DIMM" (TSOD), the temperature is also reported.
Reviewed by: cem
MFC after: 1 week
Relnotes: yes
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D14392
Discussed with: avg, cem
Tested by: avg, cem (previous version, no semantic changes)
Add chvgpio(4) driver for Intel Z8xxx SoC family. This product
was formerly known as Cherry Trail but Linux and OpenBSD drivers
refer to it as Cherry View. This driver is derived from OpenBSD
one so the name is kept for alignment with another BSD system.
Submitted by: Tom Jones <tj@enoti.me>
Reviewed by: gonzo, wblock(man page)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D13086
signal in the LinuxKPI.
The read(), write() and mmap() system calls can return either EINTR or
ERESTART upon receiving a signal. Add code to figure out the correct
return value by temporarily storing the return code from the relevant
FreeBSD kernel APIs in the Linux task structure.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.
NVMe is now working on powerpc64 host.
Submitted by: Michal Stanek <mst@semihalf.com>
Obtained from: Semihalf
Reviewed by: imp, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
This change is designed to account for yet another difference between
illumos and FreeBSD VFS. In FreeBSD a filesystem driver is supposed to
clean up mnt_data in its VFS_UNMOUNT method because it's the last call
into the driver before a struct mount object is destroyed. The VFS
drains all references to the object before destroying it, but for the
driver it's already as good as gone.
In contrast, illumos VFS provides another method, VFS_FREEVFS, that is
called when all references are drained. So, the driver can keep its
data after VFS_UNMOUNT and clean it up in VFS_FREEVFS after all
references are gone. This is what ZFS does on illumos.
So there a reference to a filesystem is sufficient to guarantee that the
ZFS specific data, aka zfsvfs_t, stays around (even if the filesystem
gets unmounted). In FreeBSD we need to vfs_busy the filesystem to get
the same guarantee. vfs_ref guarantees only that the struct mount is
kept.
The following rules should be observed in getzfsvfs / getzfsvfs_impl on
FreeBSD:
- if we need access to zfsvfs_t then we must use vfs_busy
- if only we need to access struct mount (aka vfs_t), then vfs_ref is
enough
- when illumos code actually needs only the vfs_t, they still can pass
the zfsvfs_t and get the vfs_t from it; that can work in FreeBSD if
the filesystem is busied, but when it's just referenced then we have
to pass the vfs_t explicitly
- we cannot call vfs_busy while holding a dataset because that creates a
LOR with dp_config_rwlock
As a result:
- getzfsvfs_impl now only references the filesystem, same as in illumos,
but unlike illumos it has to return the vfs_t
- the consumers are updated to account for the change
- getzfsvfs busies the filesystem (and drops the reference from
getzfsvfs_impl)
Also, zfs_unmount_snap() now gets a busied a filesystem, references it
and then unbusies it essentially reverting actions done in getzfsvfs.
This is needed because the code may perform some checks that require the
zfsvfs_t. So, those are done before the unbusying.
MFC after: 2 weeks
vrele() acquires the vnode lock only if the hold count drops to zero.
In other scenarios it needs only the interlock. So,
zfsctl_snapdir_lookup() can race with vfs_mount_destroy() -> vrele()
such that the lookup adds a new reference and then vrele() drops the
mountpoint's reference and only then we check the reference count.
It would be just one in this case.
In fact, the assert should have been removed in r323483 when the code
learned how to deal with the uncovered vnode.
PR: 225795
MFC after: 4 days
X-MFC with: r329556