Either msync(MS_INVALIDATE) or the object unlock during vnode
truncation can expose invalid pages backing wired entries. Accept
them, but do not install them into destrination pmap. We must create
copied pages in the copy case, because e.g. vm_object_unwire() expects
that the entry is fully backed.
Reported by: syzkaller, via emaste
Reported by: syzbot+514d40ce757a3f8b15bc@syzkaller.appspotmail.com
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19615
From Jake:
The iflib core never modifies the isc_driver_version string. Allow
drivers to safely assign pointers to constant buffers by marking this
parameter const.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@, jhb@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19577
From Krzysztof:
The driver built as KLD cannot be unloaded, if this flag is not set.
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: shurd@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19402
From Jake:
iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based
on the isc_max_frame_size value that drivers setup. This calculation is
repeated by drivers when programming their hardware with the size of
each Rx buffer.
This can lead to a mismatch where the iflib mbuf size is different from
the expected size of the buffer as programmed by the hardware. This can
lead to unexpected results.
If iflib ever wants to support mbuf sizes larger than one page, every
driver must be updated to account for the new possible buffer sizes.
Fix this by calculating the mbuf size prior to calling IFDI_INIT, and
adding the iflib_get_rx_mbuf_sz function which will expose this value to
drivers, so that they do not repeat the same calculation.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19489
From Jake:
iflib_encap calls bus_dmamap_load_mbuf_sg. Upon it returning EFBIG, an
m_collapse and an m_defrag are attempted to shrink the mbuf cluster to
fit within the DMA segment limitations.
However, if we call m_defrag, and then bus_dmamap_load_mbuf_sg returns
EFBIG on the now defragmented mbuf, we will continuously re-call
bus_dmamap_load_mbuf_sg over and over.
This happens because m_head isn't NULL, and remap is >1, so we don't try
to m_collapse or m_defrag again. The only way we exit the loop is if
m_head is NULL. However, m_head can't be modified by the call to
bus_dmamap_load_mbuf_sg, because we don't pass it as a double pointer.
I believe this will be an incredibly rare occurrence, because it is
unlikely that bus_dmamap_load_mbuf_sg will actually fail on the second
defragment with an EFBIG error. However, it still seems like
a possibility that we should account for.
Fix the exit check to ensure that if remap is >1, we will also exit,
even if m_head is not NULL.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@, gallatin@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19468
Minimalistic PSCI implementation in U-Boot doesn't implement get_version()
method for some SoC. In this case, use PSCI version declared by 'psci' node
in DT as fallback.
MFC after: 2 weeks
- older DT can use 'cpu0-supply' property for power supply binding.
- don't expect that actual CPU frequency is contained in CPU
operational point table, but read current CPU voltage directly from
reguator. Typically, u-boot can set starting CPU frequency to any
value.
MFC after: 2 weeks
In current code, the delay argument in FDT_PLATFORM_DEF(2) improperly
initialize refs field from kobj_class structure instead of delay_count
field.
This causes not working DELAY() function (due to never initialized
delay_count) in earlier boot stages, until the first timer was attached.
MFC after: 2 weeks
Update NAT64LSN implementation:
o most of data structures and relations were modified to be able support
large number of translation states. Now each supported protocol can
use full ports range. Ports groups now are belongs to IPv4 alias
addresses, not hosts. Each ports group can keep several states chunks.
This is controlled with new `states_chunks` config option. States
chunks allow to have several translation states for single alias address
and port, but for different destination addresses.
o by default all hash tables now use jenkins hash.
o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path.
o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
special prefix "::" value should be used for this purpose when instance
is created.
o due to modified internal data structures relations, the socket opcode
that does states listing was changed.
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
and remove possible panic condition.
It is already allowed to sleep in bpfattach[2], since BPF_LOCK was
converted to SX lock in r332388. Also move KASSERT() to the top of
function and make full initialization before bpf_if will be linked
to BPF's list of interfaces.
MFC after: 2 weeks
It may happen on some machines, that even if SGX is disabled
in firmware, the driver would still attach despite EPC base and
size equal zero. Such behaviour causes a kernel panic when the
module is unloaded. Add a simple check to make sure we
only attach when these values are correctly set.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: br
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19595
OpenBSD and NetBSD provide macros to directly reference the underlying
struct timespec's tv_nsec member. While FreeBSD has such macros for
tv_sec, the others are missing. Add the following macros:
st->st_atimensec
st->st_mtimensec
st->st_ctimensec
st->st_birthtimensec
Adding these fields will provide programs which reference them better
portability to FreeBSD. An example of such a program is makefs(8),
which has unused support for subseconds that it has inherited from
NetBSD.
Submitted by: Mitchell Horne <mhorne063@gmail.com>
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D19626
o most of data structures and relations were modified to be able support
large number of translation states. Now each supported protocol can
use full ports range. Ports groups now are belongs to IPv4 alias
addresses, not hosts. Each ports group can keep several states chunks.
This is controlled with new `states_chunks` config option. States
chunks allow to have several translation states for single alias address
and port, but for different destination addresses.
o by default all hash tables now use jenkins hash.
o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path.
o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
special prefix "::" value should be used for this purpose when instance
is created.
o due to modified internal data structures relations, the socket opcode
that does states listing was changed.
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
The line was misedited to change tt to st instead of
changing ut to st.
The use of st as the denominator in mul64_by_fraction() will lead
to an integer divide fault in the intr proc (the process holding
ithreads) where st will be 0. This divide by 0 happens after
the total runtime for all ithreads exceeds 76 hours.
Submitted by: bde
Some applications forward from/to host rings most or all the
traffic received or sent on a physical interface. In this
cases it is desirable to have more than a pair of RX/TX host
rings, and use multiple threads to speed up forwarding.
This change adds support for multiple host rings. On registering
a netmap port, the user can specify the number of desired receive
and transmit host rings in the nr_host_tx_rings and nr_host_rx_rings
fields of the nmreq_register structure.
MFC after: 2 weeks
CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.
# ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
# ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
# ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in
Obtained from: Yandex LLC
Submitted by: Boris N. Lytochkin
MFC after: 1 month
Relnotes: yes
Sponsored by: Yandex LLC
Add second IPv6 prefix to generic config structure and rename another
fields to conform to RFC6877. Now it contains two prefixes and length:
PLAT is provider-side translator that translates N:1 global IPv6 addresses
to global IPv4 addresses. CLAT is customer-side translator (XLAT) that
algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses.
Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn)
translators.
Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept
prefix length and use plat_plen to specify prefix length.
Retire net.inet.ip.fw.nat64_allow_private sysctl variable.
Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to
configure this ability separately for each NAT64 instance.
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
Update node flags when driver supports SMPS, not when it is disabled or
in dynamic mode ((iv_htcaps & HTCAP_SMPS) != 0).
Checked with RTL8188EE (1T1R), STA mode - 'smps' word should disappear
from 'ifconfig wlan0' output.
MFC after: 2 weeks
The old implementation, at the VFS layer, would map the entire range of
logical blocks between the starting offset and the first data block
following that offset. With large sparse files this is very
inefficient. The VFS currently doesn't provide an interface to improve
upon the current implementation in a generic way.
Add ufs_bmap_seekdata(), which uses the obvious algorithm of scanning
indirect blocks to look for data blocks. Use it instead of
vn_bmap_seekhole() to implement SEEK_DATA.
Reviewed by: kib, mckusick
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19598
Without this dependency relationship, the linker doesn't find the
flash_register_slicer() function, so kldload fails to load fdt_slicer.ko.
Discussed with: ian@
* Set MK_OPENMP to yes by default only on amd64, for now.
* Bump __FreeBSD_version to signal this addition.
* Ensure gcc's conflicting omp.h is not installed if MK_OPENMP is yes.
* Update OptionalObsoleteFiles.inc to cope with the conflicting omp.h.
* Regenerate src.conf(5) with new WITH/WITHOUT fragments.
Relnotes: yes
PR: 236062
MFC after: 1 month
X-MFC-With: r344779
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting. PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.
Requested by: jhb
Reviewed by: jhb, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
PTI mode for the process pmap on exec is activated iff P_MD_PTI is set.
On exec, the existing vmspace can be reused only if pti mode of the
pmap matches the P_MD_PTI flag of the process. Add MD
cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can
vetoed reuse of the existing vmspace.
MFC note: md_flags change struct proc KBI.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
When the pmap with pti disabled (i.e. pm_ucr3 == PMAP_NO_CR3) is
activated, tss.rsp0 was not updated. Any interrupt that happen before
next context switch would use pti trampoline stack for hardware frame
but fault and interrupt handlers are not prepared to this. Correctly
update tss.rsp0 for both PMAP_NO_CR3 and pti pmaps.
Note that this case, pti = 1 but pmap->pm_ucr3 == PMAP_NO_CR3 is not
used at the moment.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
Previously the main pfsync lock and the bucket locks shared the same name.
This lead to spurious warnings from WITNESS like this:
acquiring duplicate lock of same type: "pfsync"
1st pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1402
2nd pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1429
It's perfectly okay to grab both the main pfsync lock and a bucket lock at the
same time.
We don't need different names for each bucket lock, because we should always
only acquire a single one of those at a time.
MFC after: 1 week
`arc_reclaim_thread()` calls `arc_adjust()` after calling
`arc_kmem_reap_now()`; `arc_adjust()` signals `arc_get_data_buf()` to
indicate that we may no longer be `arc_is_overflowing()`.
The problem is, `arc_kmem_reap_now()` can take several seconds to
complete, has no impact on `arc_is_overflowing()`, but due to how the
code is structured, can impact how long the ARC will remain in the
`arc_is_overflowing()` state.
The fix is to use seperate threads to:
1. keep `arc_size` under `arc_c`, by calling `arc_adjust()`, which
improves `arc_is_overflowing()`
2. keep enough free memory in the system, by calling
`arc_kmem_reap_now()` plus `arc_shrink()`, which improves
`arc_available_memory()`.
illumos/illumos-gate@de753e34f9
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Tim Kordas <tim.kordas@joyent.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Brad Lewis <brad.lewis@delphix.com>
use them without redefining the value names. New clang no longer
allows to redefine a enum value name to the same value.
Bump __FreeBSD_version, since ports depend on that.
Discussed with: jhb
At this point, all routes should've already been dropped by removing all
members from the bridge. This condition is in-fact KASSERT'd in the line
immediately above where this nop flush was added.
At this point, all routes should've already been dropped by removing all
members from the bridge. This condition is in-fact KASSERT'd in the line
immediately above where this nop flush was added.
and for %ecx after RDTSCP.
Initialize TSC_AUX MSR with CPUID. It allows for userspace to cheaply
identify CPU it was executed on some time ago, which is sometimes useful.
Note: The values returned might be changed in future.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
After r345180 we need to have the appropriate vnet context set to delete an
rtnode in bridge_rtnode_destroy().
That's usually the case, but not when it's called by the STP code (through
bstp_notify_rtage()).
We have to set the vnet context in bridge_rtable_expire() just as we do in the
other STP callback bridge_state_change().
Reviewed by: kevans
bridge_rtnode_zone still has outstanding allocations at the time of
destruction in the current model because all of the interface teardown
happens in a VNET_SYSUNINIT, -after- the MOD_UNLOAD has already been
processed. The SYSUNINIT triggers destruction of the interfaces, which then
attempts to free the memory from the zone that's already been destroyed, and
we hit a panic.
Solve this by virtualizing the uma_zone we allocate the rtnodes from to fix
the ordering. bridge_rtable_fini should also take care to flush any
remaining routes that weren't taken care of when dynamic routes were flushed
in bridge_stop.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D19578
The ext2_nodealloccg() function unlocks the mount point
in case of successful node allocation.
The additional unlocks are not required and should be removed.
PR: 236452
Reported by: pho
MFC after: 3 days