It has passed an exp run on amd64 and i386, and has testing on arm64. On
other architectures it is expected to run, however it can be disabled by
building world with -DWITHOUT_BSD_CRTBEGIN.
Sponsored by: DARPA, AFRL
For encapsulated packets, the firmware gives info about the inner frame
fields by default. When not using encapsulation offload, ask for info
about the outer frame instead.
On SFN8xxx with firmware version before v6.4.2.1007 driver reload is
needed after switching from full-feature to low-latency firmware
variant since the driver still thinks that firmware supports
encapsulation, but firmware does not tolerate request to provide info
about outer frame in Rx events.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18105
With MCDIv2, the reponse length can be to big to fit into the
CMDDONE_DATALEN field in the MCDI completion event. But rather that
the length being truncated, it can overflow into the CMDDONE_ERRNO
field (this is a longstanding firmware bug). Hence the CMDDONE_ERRNO
field may not be valid.
It isn't necessary to use the value in the CMDDONE_ERRNO field though,
so it can be ignored. The actual error code is already read from the
response header on MCDIv2 capable hardware and stored in emr_rc, so
that can be used instead.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18104
Use MCDIv2 for requests with a response size too long for MCDIv1.
Required for MC_CMD_MAC_STATS to reports the stats without using DMA.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18103
Inner checksum offloads may be used only if firmware supports
these tunnels.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18102
Remove the requirement that a device be a ACPI method battery to be supported
as a battery.
Require now that the device be in the battery devclass and implement the
get_status and get_info functions. This allows batteries which are not ACPI
method batteries to be supported.
Reviewed by: jtl
Approved by: jtl (mentor)
MFC after: 1 Month
Differential Revision: https://reviews.freebsd.org/D17434
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18101
In the case of packed stream real size of the buffer does not fit in
Rx descriptor byte count. Real size is specified on Rx queue setup.
Non-zero fake should be used to bypass hardware checks.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18100
Packed stream Rx datapath requires access to packed stream state
stored in event queue. Number of credits is upstead in event handler
on a new buffer, packets parsing on 64k boundary crossing and
Rx doorbell push to give credits back.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18099
RxQ type provides more information which may be useful to
setup event queue appropriately.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18098
Submitted by: Andrew Lee <alee at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18097
fix PreFAST issue, add missing annotation that function return value
should not be ignored. Fix alignment.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18096
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18095
The checking performed in the ->envo_type_to_partn
internal method make these assertions unnecessary.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18094
Read existing flash content before writing, so the flash write can be
avoided if the existing partition content matches the new image. This
avoids unnecessary write cycles for the flash device, and may also be
faster. If the flash does need to be updated, verify the content after
writing.
Note that reading the flash content after writing but before calling
efx_nvram-rw_finish() avoids firmware bug68170, which can lead to
signed image updates failing on Medford.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18093
If the firmware reports a non-zero write chunk size then nvram writes
may fail if a different granularity is used (e.g. for MUM firmware on
Sorrento).
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18091
Tracking which partition is locked avoids being overly conservative
when EFX_NVRAM_xxx maps to more than one partition (depnding on the
current port number).
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18090
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18089
The existing name confuses support for secure boot with
support for reporting a verify result after an NVRAM update.
As the capability only reports support for returning a verify
result, change the name to be less confusing.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18088
Extend efx_nvram_rw_finish() to return firmware verify result code.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18087
This makes the verify result visible to efx_nvram_rw_finish(), which
can be extended to report it in a later patch.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18086
Simplify verify result handling in NVRAM update finish
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18085
Submitted by: Andrew Jackson <ajackson at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18083
Submitted by: Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18082
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18081
Update efx_rx_scale_mode_set(), efx_rx_scale_key_set()
and efx_rx_scale_tbl_set().
Submitted by: Mark Spender <mspender at solarflare.com>
Submitted by: Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18080
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18079
Rename efx_rx_scale_support_get() to efx_rx_scale_default_support_get(),
and efx_rx_hash_support_get() to efx_rx_hash_default_support_get().
All these really report is whether an exclusive RSS context was
successfully acquired at efx_rx_init().
efx_rx_scale_support_get() sounds like it reports whether the device
supports RSS, and whether exclusive or shared contexts are supported,
but it doesn't do that. Renaming it to
efx_rx_scale_default_support_get() helps to reflect that it reports
what RSS support the client gets without trying to allocate RSS
contexts itself.
Also rename efx_rx_scale_support_t to efx_rx_scale_context_type_t, to
make the enum more suitable for specifying the type of an RSS context
to be allocated.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18078
The patch adds enc_rx_scale_max_exclusive_contexts member
to nic_cfg_t structure and sets the corresponding values
for Siena, Huntington and Medford
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18077
Default RSS context check is carried out during filter
insertion on Siena and it needs to be fixed
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18076
Make the existing filter-specific define more general.
This is the same as MC_CMD_RSS_CONTEXT_ALLOC_OUT_RSS_CONTEXT_ID_INVALID.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18075
On Medford, with full-featured firmware running, encapsulated
packets may not be delivered unless filters are inserted for
them, as ordinary filters are not applied to encapsulated
packets. So filters for encapsulated packets need to be
inserted for each class of encapsulated packet. For simplicity,
catch-all filters are always inserted. These may match more
packets than the OS has asked for, but trying to insert more
precise filters increases complexity for little gain.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18074
This supports filters which match all unicast or multicast
inner frames in VXLAN, GENEVE, or NVGRE packets.
(Additional fields to match on can be added easily.)
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18073
MC_CMD_FILTER_OP_IN_EXT is needed to set filters for encapsulated
packets.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18072
VXLAN/NVGRE (and Geneve) support is available on SFN8xxx with
full-feature firmware variant running.
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18071
Tx/Rx queue may be already flushed due to Tx/Rx error on the queue or
MC reboot. Caller needs to know that the queue is already flushed to
avoid waiting for flush done event.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18070
MCDI results returned in req.emr_rc have already been translated
from MC_CMD_ERR_* to errno names, so using an MC_CMD_ERR_* value
is incorrect.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18069
Improve error checking to avoid a caller overflowing the MCDI
request buffer if the requested TXQ size was excessively large.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18067
Some libefx-based drivers might need this functionality to
indicate DPCPU FW IDs as part of FW version info to assist
experienced users.
Submitted by: Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18065
If a libefx-based driver needs some way to clear port statistics,
then an MCDI agnostic method is required.
Submitted by: Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18064
This unreliably breaks libc handling of vfork where forking succeded,
but execve did not.
vfork code in libc performs waitpid with WNOHANG in case of failed exec.
With the fix exit codepath was waking up the parent before the child
fully transitioned to a zombie. Woken up parent would waitpid, which
could find a not-yet-zombie child and fail to reap it due to the WNOHANG
flag.
While removing the flag fixes the problem, it is not an option due to older
releases which would still suffer from the kernel change.
Revert the fix until a solution can be worked out.
Note that while use-after-free which gets back due to the revert is a real
bug, it's side-effects are limited due to the fact that struct proc memory
is never released by UMA.
The NFS client code (nfsrpc_readdir() and nfsrpc_readdirplus()) wasn't
filling in parts of the readdir reply, such as d_pad[01] and the bytes
at the end of d_name within d_reclen. As such, data left in a buffer cache
block could be leaked to userland in the readdir reply.
This patch makes sure all of the data is filled in.
Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: kib, markj
MFC after: 2 weeks
The pointer to the child is stored without any reference held. Then it is
blindly used to wait until P_PPWAIT is cleared. However, if the child is
autoreaped it could have exited and get freed before the parent started
waiting.
Use the existing hold mechanism to mitigate the problem. Most common case
of doing exec remains unchanged. The corner case of doing exit performs
wake up before waiting for holds to clear.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18295
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland. Fix a number of such handlers.
Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: tuexen
MFC after: 3 days
Security: kernel memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18301
segment in the SYN-SENT state as stated in Section 3.9 of RFC 793,
page 66. Ensure this is also done by the TCP RACK stack.
Reviewed by: rrs@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18034
the TCP connection was initiated using the RACK stack, but the
peer does not support the TCP RACK extension.
This ensures that the TCP behaviour on the wire is the same if
the TCP connection is initated using the RACK stack or the default
stack.
Reviewed by: rrs@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18032
zero. This was already done when sending them via tcp_respond().
Reviewed by: rrs@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D17949
Mirror the fix for the native i386 implementation from r218327. This
code is compiled only when the non-default COMPAT_43 option is
configured.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18298
C Turt reports that the driver is not thread safe and may have
exploitable races.
Note that the proto device is intended for prototyping and development,
and is not for use on production systems. From the man page:
SECURITY CONSIDERATIONS
Because programs have direct access to the hardware, the proto
driver is inherently insecure. It is not advisable to use this
driver on a production machine.
The proto device is not included in any of FreeBSD's kernel config files
(although the module is built).
The issues in the proto device still need to be fixed, and the device is
inherently (and intentionally) insecure, but it might as well be limited
to root only.
admbugs: 782
Reported by: C Turt <ecturt@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Queues with 4096 descriptors are not supported as the top bit is used for vfifo
stuffing.
Submitted by: Mark Spender <mspender at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D8948
Due to incorrect merge the piece of code was put in incorrect
place and diverge from libefx in other locations.
Sponsored by: Solarflare Communications, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18024
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.
Creation is protected with a global lock, move the assigment there.
Since inode numbers are 64-bit now drop the check for overflows.
Sponsored by: The FreeBSD Foundation
Processes stay in the hash until they get reaped.
This code does not unlink the child from the parent, so remove
the claim that it does.
Sponsored by: The FreeBSD Foundation
forks, exits and waits are frequently stalled during poudriere -j 128 runs
due to killpg and process list exports performed for each package.
Both uses take the allproc lock. The latter case can be modified to iterate
over the hash with finer grained locking instead.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17817
There are two locations where an always true comparison was made in
a KASSERT. Replace this by an appropriate check and use a consistent
panic message. Also use this code when checking a similar condition.
PR: 229664
Reviewed by: rrs@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18021
It was reported, and I easily reproduced it, that this change triggers panic
when receiving replication stream with enabled embedded blocks, when short
file compressing into one embedded block changes its block size. I am not
sure that the problem is in this particuler patch, not just triggered by it,
but since investigation and fix will take some time, I've decided to revert
this for now.
PR: 198457, 233277
kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held. For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.
Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter. KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.
Reported and tested by: Sylvain GALLIANO <sg@efficientip.com>
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18060
It is a write-only flag whose last use was removed in r302235.
No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18059
This is similar to taskqueue_drain_all(9) but will wait for the queue
to become idle before returning instead of only waiting for
already-enqueued tasks to finish. This will be used in the opensolaris
compat layer.
PR: 227784
Reviewed by: cem
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17975
The FBT fuction boundary prober was setting one return probe marker value,
but the dtrace handler was expecting another. This causes a hang when
tracing return probes.
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages. vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary. It's a
somewhat legacy component of the system and isn't required. You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.
Based on this, we want pageproc to emulate kswapd, not vmproc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D18061
These definitions will be used by a driver to implement Hardware
P-States (autonomous control of HWP, via Intel Speed Shift technology).
Reviewed by: kib
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D18050
These are used by kms-drm to determine various heuristics relate
memory conditions.
The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.
This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h
Reviewed by: mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D18052
Enable evdev on ppc32 as well, similar to what was done i386 and amd64 in
r340387 and ppc64 in r340632.
Evdev can be used by X and is used by wayland to handle input devices.
Approved by: jhibbits
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18049
Important users of unr like tmpfs or pipes can get away with just
ever-increasing counters, making the overhead of managing the state
for 32 bit counters a pessimization.
Change it to an atomic variable. This can be further sped up by making
the counts variable "allocate" ranges and store them per-cpu.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18054
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument. Stop using the macro as well as LINUX_CMSG_FIRSTHDR. Use
the size field of the kernel copy of the control message header to obtain
the next control message.
PR: 217901
MFC after: 2 days
X-MFC-With: r340631
Integer overflows and wrong constants limited the accuracy of these
functions and created situatiosn where sbttoXs(Xstosbt(Y)) != Y. This
was especailly true in the ns case where we had millions of values
that were wrong.
Instead, used fixed constants because there's no way to say ceil(X)
for integer math. Document what these crazy constants are.
Also, use a shift one fewer left to avoid integer overflow causing
incorrect results, and adjust the equasion accordingly. Document this.
Allow times >= 1s to be well defined for these conversion functions
(at least the Xstosbt). There's too many users in the tree that they
work for >= 1s.
This fixes a failure on boot to program firmware on the mlx4
NIC. There was a msleep(1000) in the code. Prior to my recent rounding
changes, msleep(1000) worked, but msleep(1001) did not because the old
code rounded to just below 2^64 and the new code rounds to just above
it (overflowing, causing the msleep(1000) to really sleep 1ms).
A test program to test all cases will be committed shortly. The test
exaustively tries every value (thanks to bde for the test).
Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D18051
NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code
checked for a zero argument, but did not check for a very large value.
This patch clips dircount at the server's maximum data size.
MFC after: 1 week
The code assumed that this would indicate a corrupted mbuf chain, but
it could simply be caused by bogus RPC message data.
This patch replaces the panic() with a printf() plus error return.
MFC after: 1 week
provide a _MEMMOVE extension of _MEMCPY that deals with overlap based on
the previous bcopy(9) implementation and use the former for bcopy(9) and
memmove(9). This addresses my D15374 review comment, avoiding extra MOVs
in case of memmove(9) and trashing the stack pointer.
The update of jemalloc to 5.1.0 exposed a cache syncing issue on a Freescale
e500 base system. There was already code in the FPU emulator to address
this, but it was limited to a single static variable, and did not attempt to
sync the cache. This pulls that out to the higher level program exception
handler, and syncs the cache.
If a SIGILL is hit a second time at the same address, it will be treated as
a real illegal instruction, and handled accordingly.
This patch utilizes the fixed_devclass attribute in order to make sure
other acpi devices with params don't get confused for an EC device.
The existing code assumes that acpi_ec_probe is only ever called with a
dereferencable acpi param. Aside from being incorrect because other
devices of ACPI_TYPE_DEVICE may be probed here which aren't ec devices,
(and they may have set acpi private data), it is even more nefarious if
another ACPI driver uses private data which is not dereferancable. This
will result in a pointer deref during boot and therefore boot failure.
On X86, as it stands today, no other devices actually do this (acpi_cpu
checks for PROCESSOR type devices) and so there is no issue. I ran into
this because I am adding such a device which gets probed before
acpi_ec_probe and sets private data. If ARM ever has an EC, I think
they'd run into this issue as well.
There have been several iterations of this patch. Earlier
iterations had ECDT enumerated ECs not call into the probe/attach
functions of this driver. This change was Suggested by: jhb@.
Reviewed by: jhb
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D16635
Enable evdev on ppc64 as well, similar to what was done for amd64 and i386
in r340387.
Evdev can be used by X and is used by wayland to handle input devices.
Approved by: mmacy
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18026
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin. For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly. One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).
PR: 217901
Reviewed by: kib
MFC after: 3 days
There may be cases where cpu_model[] may not be 32bit aligned, so it is
better to not try to access it as such in order to avoid unaligned access.
Sponsored by: Smartcom - Bulgaria AD
First pass of support for multiple GIC ITS blocks with ACPI.
Changes are to:
* register the correct subset of interrupts with pic_register
in case of ACPI.
* initialize just the cpu interface for the first ITS, when
domain information is not avialable. This has to be done
until we split the per-CPU init to do LPI setup just once.
* remove duplicate check for the GIC ITS domain, the sc_cpus
are setup from domain, so the check again in per-CPU init
seems unnecessary.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17841
Now that the ACPI and FDT implementations for activating and
deactivating resources are the same, we can move it to
pci_host_generic.c. No functional changes.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17793
Now that we are handling PCI resources in pci_host_generic_acpi.c, we
don't need these change (made by r336129)
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17792
This is a major update for pci_host_generic_acpi.c, the current
implementation has some gaps that are better fixed up in one go.
The changes are to:
* Follow x86 method of not adding PCI resources to PCI host bridge in
ACPI code. This has been moved to pci_host_generic_acpi.c, where we
walk thru its resources of the host bridge and add them.
* Fixup code in pci_host_generic_acpi.c to read all decoded ranges
and update the 'ranges' property. This allows us to share most of
the code with generic implementation (and the FDT one).
* Parse and setup IO ranges and bus ranges when walking the resources
above. Drop most of the changes related to this from acpica code.
* Add the ECAM memory area as mem resource 0. Implement the logic to
get the ECAM area from MCFG (using bus range which we now decode),
or from _CBA (using _BBN/bus range). Drop aarch64 ifdefs from acpica
code which did part of this.
* Switch resource activation to similar code as FDT implementation,
this can be moved into generic implementation in a later pass.
* Drop the mechanism of using the 7th bit of bus number as the domain,
this is not correct and will work only in very specific cases. Use
_SEG as PCI domain and use the bus ranges of the host bridge to
provide start bus number.
This commit should not make any functional change to dev/acpica/acpi.c
for other architectures, almost all the changes there are to revert
earlier additions in this file done for aarch64.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17791
On arm64 (where INTRNG is enabled), the interrupts have to be mapped
with ACPI_BUS_MAP_INTR() before adding them as resources to devices.
The earlier code did the mapping before calling acpi_set_resource(),
which bypassed code that checked for PCI link interrupts.
To fix this, move the call to map interrupts into acpi_set_resource()
and that requires additional work to lookup interrupt properties.
The changes here are to:
* extend acpi_lookup_irq_handler() to lookup an irq in the ACPI
resources
* create a helper function acpi_map_intr() which uses the updated
acpi_lookup_irq_handler() to look up an irq, and then map it
with ACPI_BUS_MAP_INTR()
* use acpi_map_intr() in acpi_pcib_route_interrupt() to map
pci link interrupts.
With these changes, we can drop the ifdefs in acpi_resource.c, and
we can also drop the call for mapping interrupts in generic_timer.c
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17790
Both ACPI and FDT support bus ranges for pci host bridges. Update
pci_host_generic*.[ch] with a default implementation to support this.
This will be used in the next set of changes for ACPI based host
bridge. No functional changes in this commit.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17657
Fix up pci_host_generic.c and pci_host_generic_fdt.c to allocate
resources against devices that requested them. Currently the
allocation happens against the pcib, which is incorrect.
This is needed for the upcoming changes for fixing up
pci_host_generic_acpi.c
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17656
The current quirk implementation writes a fixed address to the PCI BAR
to fix a firmware bug. The PCI BARs are allocated by firmware and will
change depending on PCI devices present. So using a fixed address here
is not correct.
This quirk worked around a firmware bug that programmed the MSI-X bar
of the SATA controller incorrectly. The newer firmware does not have
this issue, so it is better to drop this quirk altogether.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17655
As of r340465 all consumers use sbsndptr_adv and sbsndptr_noadv
Reviewed by: gallatin
Approved by: krion (mentor)
Differential Revision: https://reviews.freebsd.org/D17998
functions. Notably, reflow the text of some comments so that they
occupy fewer lines, and introduce an assertion in one of the new
helper functions so that it is not misused by a future caller.
In collaboration with: Doug Moore <dougm@rice.edu>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D17635
transfer mode only (lost with r321385). [1]
- Similarly, don't try to set the power class on MMC devices that comply
to version 4.0 of the system specification but are operated in default/
legacy transfer or 1-bit bus mode as no power class is specified for
these cases. Trying to set a power class nevertheless resulted in an -
albeit harmless - error message.
PR: 231713 [1]
Just allow MSI interrupts to always start at the end of the I/O APIC
pins. Since existing machines already have more than 255 I/O APIC
pins, IRQ 255 is no longer reliably invalid, so just remove the
minimum starting value for MSI.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D17991
SDM rev. 068 was released yesterday and it contains the description of
the MSR 0x10a IA32_ARCH_CAP. This change adds symbolic definitions for
all bits present in the document, and decode them in the CPU
identification lines printed on boot.
But also, the document defines SSB_NO as bit 4, while FreeBSD used but
2 to detect the need to work-around Speculative Store Bypass
issue. Change code to use the bit from SDM.
Similarly, the document describes bit 3 as an indicator that L1TF
issue is not present, in particular, no L1D flush is needed on
VMENTRY. We used RDCL_NO to avoid flushing, and again I changed the
code to follow new spec from SDM.
In fact my Apollo Lake machine with latest ucode shows this:
IA32_ARCH_CAPS=0x19<RDCL_NO,SKIP_L1DFL_VME,SSB_NO>
Reviewed by: bwidawsk
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D18006
Specifically, block 0-length fragments, even when the MF bit is clear.
Also, ensure that every fragment with the MF bit clear ends at the same
offset and that no subsequently-received fragments exceed that offset.
Reviewed by: glebius, markj
MFC after: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D17922
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.
Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17825
endpoints.
This can be used to configure several IPsec tunnels between two hosts
with different security associations.
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
These SoCs have CHIPID registers, which store the Chip model, according
to the manufacturer; make use of those in order to better identify
the chip we're actually running on.
If we're unable to read the CHIPID registers for some reason we will
use the string "unknown " as a value for hw.model.
Reported by: yamori813@yahoo.co.jp
Sponsored by: Smartcom - Bulgaria AD
The CBQ BORROW flag conflicts with the RMCF_CODEL flag; the
two sets of definitions actually define the same things. The symptom
is that a kernel with CBQ support and not CODEL fails to load a QoS
policy with the obscure error "pfctl: DIOCADDALTQ: Cannot allocate memory."
If ALTQ_DEBUG is enabled, the error becomes a little clearer:
"rmc_newclass: CODEL not configured for CBQ!" is printed by the kernel.
There really shouldn't be two sets of macros that have to be defined
consistently, but the include structure isn't right for exporting
CBQ flags to altq_rmclass.h. Re-align the definitions, and add
CTASSERTs in the kernel to ensure that the definitions are consistent.
PR: 215716
Reviewed by: pkelsey
MFC after: 2 weeks
Sponsored by: Forcepoint LLC
Differential Revision: https://reviews.freebsd.org/D17758
vmem's are not just used for TLS memory in TOM and the #include actually
predates the TLS code so should not have been removed when the TLS vmem
moved in r340466.
Pointy hat to: jhb
Sponsored by: Chelsio Communications
Instead of jumping to locations which store the exact number of bytes,
use displacement to move the destination.
In particular the following clears an area between 8-16 (inclusive)
branch-free:
movq %r10,(%rdi)
movq %r10,-8(%rdi,%rcx)
For instance for rcx of 10 the second line is rdi + 10 - 8 = rdi + 2.
Writing 8 bytes starting at that offset overlaps with 6 bytes written
previously and writes 2 new, giving 10 in total.
Provides a nice win for smaller stores. Other ones are erratic depending
on the microarchitecture.
General idea taken from NetBSD (restricted use of the trick) and bionic
string functions (use for various ranges like in this patch).
Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17660
The key context is always placed immediately after the work request
header. The total work request length has to be rounded up by 16
however.
MFC after: 1 month
Sponsored by: Chelsio Communications
The addresses passed when reading and writing keys are always shifted
right by 5 as the memory locations are addressed in 32-byte chunks, so
the quantum needs to be 32, not 8.
MFC after: 1 month
Sponsored by: Chelsio Communications
For some reason the proc UMA zone's ctor, dtor and init functions are
instrumented, but these functions are always available through FBT.
Moreover, the probes are not part of the original Solaris proc
provider, aren't documented, have no uses (e.g., in dwatch(8)) and
have no clear use to begin with. Therefore, remove them.
Reviewed by: rpaulo
Differential Revision: https://reviews.freebsd.org/D2169
For TOE TLS, we just want to advance the send pointer to skip over the
record just sent to the TOE. The recently added sbsndptr_adv() is
sufficient for that and is cheaper.
MFC after: 1 month
Sponsored by: Chelsio Communications
The number of MSI IRQs still defaults to 512, but it can now be
changed at boot time via the machdep.num_msi_irqs tunable.
Reviewed by: kib, royger (older version)
Reviewed by: markj
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D17977
recent changes in spibus and allow the use of different SPI modes on
the same bus.
Reported by: ian
Sponsored by: Rubicon Communications, LLC (Netgate)
It's often useful to have a callback when an I/O takes more than a
threshold amount of time. This adds the infrastructure for periph
devices to register one.
One use-case is as a debugging aide when you need a semi-realtime
indication of an I/O outlier so you can trigger bus capture gear for
vendor analysis.
Sponsored by: Netflix, Inc
rather than the floor(). Returning the floor means that
sbttoX(Xtosbt(y)) != y for almost all values of y. In practice, this
results in a difference of at most 1 in the lsb of the sbintime_t.
This difference is meaningless for all current users of these
functions, but is important for the newly introduced sysctl conversion
routines which implicitly rely on the transformation being idempotent.
Sponsored by: Netflix, Inc
iflib_stop() was not resetting the rxq completion queue state variables.
This meant that for any driver that has receive completion queues, after a
reinit, iflib would start asking what's available on the rx side starting at
whatever the completion queue index was prior to the stop, instead of at 0.
Submitted by: pkelsey
Reported by: pkelsey
MFC after: 3 days
Sponsored by: Limelight Networks
The off-by-one errors in 332735 weren't actual errors and were
preventing the last MSI interrupt source from being used. Instead,
the issue is that when all MSI interrupt sources were allocated, the
loop in msix_alloc() would terminate with 'msi' still set to non-null.
The only check for 'i' overflowing was in the 'msi' == NULL case, so
msix_alloc() would try to reuse the last MSI interrupt source instead
of failing.
Fix by moving the check for all sources being in use to just after the
loop.
Reviewed by: kib, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D17976
netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet in a functional state.
Changelist:
- handle errors returned by virtqueue_enqueue() properly (they were
previously ignored)
- make sure netmap XOR rest of the kernel access each virtqueue.
- compute the number of netmap slots for TX and RX separately, according to
whether indirect descriptors are used or not for a given virtqueue.
- make sure sglist are freed according to their type (mbufs or netmap
buffers)
- add support for mulitiqueue and netmap host (aka sw) rings.
- intercept VQ interrupts directly instead of intercepting them in txq_eof
and rxq_eof. This simplifies the code and makes it easier to make sure
taskqueues are not running for a VQ while it is in netmap mode.
- implement vntet_netmap_config() to cope with changes in the number of queues.
Reviewed by: bryanv
Approved by: gnn (mentor)
MFC after: 3 days
Sponsored by: Sunny Valley Networks
Differential Revision: https://reviews.freebsd.org/D17916
Ensure that any time CSUM_IP_TSO or CSUM_IP6_TSO is set that the corresponding
CSUM_IP6?_TCP / CSUM_IP flags are also set.
Rather than requireing drivers to bake-in an understanding that TSO implies
checksum offloads, make it explicit.
This change requires us to move the IFLIB_NEED_ZERO_CSUM implementation to
ensure it's zeroed for TSO.
Reported by: Jacob Keller <jacob.e.keller@intel.com>
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17801
r333502 removed initialization of ifc_nhwtxqs, and it's not clear
there's a need to copy it into the struct iflib_ctx at all. Use
ctx->ifc_sctx->isc_ntxqs instead.
Further, iflib_stop() did not clear the last ring in the case where
isc_nfl != isc_nrxqs (such as when IFLIB_HAS_RXCQ is set). Use
ctx->ifc_sctx->isc_nrxqs here instead of isc_nfl.
Reported by: pkelsey
Reviewed by: pkelsey
MFC after: 3 days
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17979