therefore, it should be safe to set the NX bit on the PML4E for the
recursive page table mappings. According to the Intel docs, the effect
of the NX bit should propogate to any page reached through a PML4E which
has the NX bit set.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14333
first available virtual address to a 2MB boundary. After r329071,
create_pagetables() rounds firstaddr up to a 2MB boundary. This ensures
the kernel is mapped in super-pages, which is the point of the logic
in pmap_kmem_choose(). Therefore, it is no longer necessary for
pmap_bootstrap() to round up to the 2MB boundary again.
As pmap_bootstrap() was the only user of pmap_kmem_choose(), we can
delete pmap_kmem_choose().
Reviewed by: kib
MFC after: 2 weeks
X-MFC-with: r329071
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14355
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.
It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.
Background:
On a per infiniband port basis, the GID identifier is not a unique identifier!
This assumption falls apart when VLAN ID, IPv6 scope ID and RoCE type,
as supported by RoCE v2, is taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.
Different VLANs are allowed to define the same IPv4 addresses and especially
for the default IPv6 link-local addresses or when using so-called containers
or jails, this is true.
The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.
Consequently old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid() are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When a network device is departing, the RoCE GID entries should be
cleared before the default L2 link layer address is freed. Else a NULL
pointer access may happen.
MFC after: 1 week
Sponsored by: Mellanox Technologies
VLANs in ibcore.
IPv6 link local addresses are usually derived from the netdev MAC
address. This is applicable to VLAN devices and its lower netdevice as
well. In such cases the IPv6 link local address is a duplicate of the
default GID.
Now that link local IPv6 addresses based GIDs are supported, allow
adding such GID entries in the GID table.
MFC after: 1 week
Sponsored by: Mellanox Technologies
only use the first driver, however this may change in the future and
hardware exists with multiple ITS devices.
Sponsored by: DARPA, AFRL
Sponsored by: Cavium (Hardware)
Pretty much any other device might need to manipulate a gpio pin during its
probe or attach routines, so these devices must be available as early as
possible.
The gpio device is an interrupt controller, but I didn't choose the
INTERRUPT pass for that reason (it works fine as an interrupt controller as
long as it attaches any time before interrupts are enabled). That just
looked like the right place in the passes to ensure that it attaches before
any type of device that might need gpio pin manipulations.
Rename ACPI_IVRS_HARDWARE_NEW to ACPI_IVRS_HARDWARE_EFRSUP, since new definitions add Extended Feature Register support. Use IvrsType to distinguish three types of IVHD - 0x10(legacy), 0x11 and 0x40(with EFR). IVHD 0x40 is also called mixed type since it supports HID device entries.
Fix 2 coverity bugs reported by cem.
Reported by:jkim, cem
Approved by:grehan
Differential Revision://reviews.freebsd.org/D14501
driver requires interrupts to do transfers, and the drivers for the SPI
devices on the bus quite reasonably expect to be able to do IO while probing
and attaching.
Normally after grabbing the lock it has to be verified we got the right one
to begin with. However, if we are recursing, it must not change thus the
check can be avoided. In particular this avoids a lock read for non-recursing
case which found out the lock was changed.
While here avoid an irq trip of this happens.
Tested by: pho (previous version)
of years since the century, so strip the century out when converting to or
from bcd_clocktime format (the conversion routines will infer century by
pivoting on 70).
The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.
This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.
If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.
The bug sometimes manifested itself in stalls during -j 128 package builds.
Refactor the code to fix the bug, while here remove some of the gratituous
differences between rw and sx locks.
POSIX defines no macros for these permissions.
Also remove unneeded headers from synopsis.
PR: 225905
Reviewed by: wblock
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14461
The main routine takes 8 args, 3 of which are almost the same for most uses.
This in particular pushes it above the limit of 6 arguments passable through
registers on amd64 making it impossible to tail call.
This is a prerequisite for further cleanups.
Tested by: pho