Cubic calculates the new cwnd based on absolute time
elapsed since the start of an epoch. A cubic epoch is
started on congestion events, or once the congestion
avoidance phase is started, after slow-start has
completed.
When a sender is application limited for an extended
amount of time and subsequently a larger volume of data
becomes ready for sending, Cubic recalculates cwnd
with a lingering cubic epoch. This recalculation
of the cwnd can induce a massive increase in cwnd,
causing a burst of data to be sent at line rate by
the sender.
This adds a flag to reset the cubic epoch once a
session transitions from app-limited to cwnd-limited
to prevent the above effect.
Reviewed by: chengc_netapp.com, tuexen (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25065
vmem quantum cache is only needed when doing a lot of concurrent allocations,
which doesn't happen when allocating MSIs. This wastes memory for the cache
zones. Avoid this waste and don't use the quantum cache.
Reported by: markj
These two functions are needed by nfs-over-tls, but could also be
useful for other purposes.
mb_alloc_ext_plus_pages() - Allocates a M_EXTPG mbuf and enough anonymous
pages to store "len" data bytes.
mb_mapped_to_unmapped() - Copies the data from a list of mapped (non-M_EXTPG)
mbufs into a list of M_EXTPG mbufs allocated with anonymous pages.
This is roughly the inverse of mb_unmapped_to_ext().
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D25182
Some environments in which execvPe may be called have a limited amount of
stack available. Currently, it avoidably allocates a segment on the stack
large enough to hold PATH so that it may be mutated and use strsep() for
easy parsing. This logic is now rewritten to just operate on the immutable
string passed in and do the necessary math to extract individual paths,
since it will be copying out those segments to another buffer anyways and
piecing them together with the name for a full path.
Additional size is also needed for the stack in posix_spawnp(), because it
may need to push all of argv to the stack and rebuild the command with sh in
front of it. We'll make sure it's properly aligned for the new thread, but
future work should likely make rfork_thread a little easier to use by
ensuring proper alignment.
Some trivial cleanup has been done with a couple of error writes, moving
strings into char arrays for use with the less fragile sizeof().
Reported by: Andrew Gierth <andrew_tao173.riddles.org.uk>
Reviewed by: jilles, kib, Andrew Gierth
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25038
If execve fails with ENOEXEC, execvp is expected to rebuild the command
with /bin/sh instead and try again.
The previous version did this, but overlooked two details:
argv[0] can conceivably be NULL, in which case memp would never get
terminated. We must allocate no less than three * sizeof(char *) so we can
properly terminate at all times. For the non-NULL argv standard case, we
count all the non-NULL elements and actually skip the first argument, so we
end up capturing the NULL terminator in our bcopy().
The second detail is that the spec is actually worded such that we should
have been preserving argv[0] as passed to execvp:
"[...] executed command shall be as if the process invoked the sh utility
using execl() as follows:
execl(<shell path>, arg0, file, arg1, ..., (char *)0);
where <shell path> is an unspecified pathname for the sh utility, file is
the process image file, and for execvp(), where arg0, arg1, and so on
correspond to the values passed to execvp() in argv[0], argv[1], and so on."
So we make this change at this time as well, while we're already touching
it. We decidedly can't preserve a NULL argv[0] as this would be incredibly,
incredibly fragile, so we retain our legacy behavior of using "sh" for
argv[] in this specific instance.
Some light tests are added to try and detect some components of handling the
ENOEXEC fallback; posix_spawnp_enoexec_fallback_null_argv0 is likely not
100% reliable, but it at least won't raise false-alarms and it did result in
useful failures with pre-change libc on my machine.
This is a secondary change in D25038.
Reported by: Andrew Gierth <andrew_tao173.riddles.org.uk>
Reviewed by: jilles, kib, Andrew Gierth
MFC after: 1 week
This avoids dirtying creds in the common case, see the comment in kern_prot.c
for details.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D24007
Update the iflib version of ixl driver based on the OOT version ixl-1.11.29.
Major changes:
- Extract iflib specific functions from ixl_pf_main.c to ixl_pf_iflib.c
to simplify code sharing between legacy and iflib version of driver
- Add support for most recent FW API version (1.10), which extends FW
LLDP Agent control by user to X722 devices
- Improve handling of device global reset
- Add support for the FW recovery mode
- Use virtchnl function to validate virtual channel messages instead of
using separate checks
- Fix MAC/VLAN filters accounting
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: erj@
Tested by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 1 week
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D24564
Use this in GELI to print out a different message when accelerated
software such as AESNI is used vs plain software crypto.
While here, simplify the logic in GELI a bit for determing which type
of crypto driver was chosen the first time by examining the
capabilities of the matched driver after a single call to
crypto_newsession rather than making separate calls with different
flags.
Reviewed by: delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25126
Properly handle reference counts in the 64-bit pmap page directories.
Otherwise all page table pages would leak due to over-referencing. This
would cause a quick enter to swap on a desktop system (AmigaOne X5000) when
quitting and rerunning applications, or just building world.
Add an INVARIANTS check to validate no leakage at pmap release time.
Introducing flags to track the initial Wmax dragging and exit
from slow-start in TCP Cubic. This prevents sudden jumps in the
caluclated cwnd by cubic, especially when the flow is application
limited during slow start (cwnd can not grow as fast as expected).
The downside is that cubic may remain slightly longer in the
concave region before starting the convex region beyond Wmax again.
Reviewed by: chengc_netapp.com, tuexen (mentor)
Approved by: tuexen (mentor), rgrimes (mentor, blanket)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D23655
In the recent dts sync the name of the aips-bus@ changed to bus@. Reflect
this change and add an additional OF_finddevice in fix_fdt_interrupt_data()
and in fix_fdt_iomuxc_data() with bus@ only. Iow, keep the old naming for
compatibility.
Discussed with: ian@
significant bit in the pointer to the node from its parent to indicate
that the node is red. Have the tree rotation macros leave the
old-parent/new-child node red and the new-parent/old-child node black.
This change makes RB_LEFT and RB_RIGHT no longer assignable, and
RB_COLOR no longer defined. Any code that modifies the tree or
examines a node color would have to be modified after this change.
Reviewed by: markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25105
In the receive interrupt routine, always call netmap_rx_irq().
The latter function will return != NM_IRQ_PASS if netmap is not
active on that specific receive queue, so that the driver can go
on with iflib_rxeof(). Note that netmap supports partial opening,
where only a subset of the RX or TX rings can be open in netmap mode.
Checking the IFCAP_NETMAP flag is not enough to make sure that the
queue is indeed in netmap mode.
Moreover, in case netmap_rx_irq() returns NM_IRQ_RESCHED, it means
that netmap expects the driver to call netmap_rx_irq() again as soon
as possible. Currently, this may happen when the device is attached
to a VALE switch.
Reviewed by: gallatin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25167
Specifically, add LIBUSB_CLASS_PHYSICAL and the libusb_has_capability API.
Descriptions and functionality for these derived from the
documentation at [0]. The current set of capabilities are all supported by
libusb.
These were detected as missing after updating net/freerdp to 2.1.1, which
attempted to use both.
[0] http://libusb.sourceforge.net/api-1.0/group__libusb__misc.html
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25194
Add a freebsd32_ptrace() and move as many freebsd32 shims as possible
to freebsd32_ptrace(). Aside from register sets, freebsd32 passes
pointers to native structures to kern_ptrace() and converts to/from
native/32-bit structure formats in freebsd32_ptrace() outside of
kern_ptrace().
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D25195
The lock-protected iteration is trivially avoidable.
This removes a serialisation point from Linux binaries (which end up calling
here from the sysinfo syscall).
DEBUG is a kernel configuration flag and if used cpufreq_dt.c will fail the
build of kernel.
PR: 246867
Submitted by: Oskar Holmund (oskar.holmlund@ohdata.se)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25080
Upstream commit message:
[PATCH 3/3] WPS UPnP: Handle HTTP initiation failures for events more
properly
While it is appropriate to try to retransmit the event to another
callback URL on a failure to initiate the HTTP client connection, there
is no point in trying the exact same operation multiple times in a row.
Replve the event_retry() calls with event_addr_failure() for these cases
to avoid busy loops trying to repeat the same failing operation.
These potential busy loops would go through eloop callbacks, so the
process is not completely stuck on handling them, but unnecessary CPU
would be used to process the continues retries that will keep failing
for the same reason.
Obtained from: https://w1.fi/security/2020-1/\
0003-WPS-UPnP-Handle-HTTP-initiation-failures-for-events-.patch
MFC after: 3 days
Security: VU#339275 and CVE-2020-12695
Upstream commit message:
[PATCH 2/3] WPS UPnP: Fix event message generation using a long URL path
More than about 700 character URL ended up overflowing the wpabuf used
for building the event notification and this resulted in the wpabuf
buffer overflow checks terminating the hostapd process. Fix this by
allocating the buffer to be large enough to contain the full URL path.
However, since that around 700 character limit has been the practical
limit for more than ten years, start explicitly enforcing that as the
limit or the callback URLs since any longer ones had not worked before
and there is no need to enable them now either.
Obtained from: https://w1.fi/security/2020-1/\
0002-WPS-UPnP-Fix-event-message-generation-using-a-long-U.patch
MFC after: 3 days
Security: VU#339275 and CVE-2020-12695
Upstream commit message:
[PATCH 1/3] WPS UPnP: Do not allow event subscriptions with URLs to
other networks
The UPnP Device Architecture 2.0 specification errata ("UDA errata
16-04-2020.docx") addresses a problem with notifications being allowed
to go out to other domains by disallowing such cases. Do such filtering
for the notification callback URLs to avoid undesired connections to
external networks based on subscriptions that any device in the local
network could request when WPS support for external registrars is
enabled (the upnp_iface parameter in hostapd configuration).
Obtained from: https://w1.fi/security/2020-1/\
0001-WPS-UPnP-Do-not-allow-event-subscriptions-with-URLs-.patch
MFC after: 3 days
Security: VU#339275 and CVE-2020-12695
r361780 fixed the code so that it would only remove the duplicate when
it actually existed. However, that might have resulted in XU_NGROUPS + 1
groups being copied, running off the end of the array. This patch fixes
the problem.
Spotted during code inspection for other mountd changes.
MFC after: 2 weeks
The previous code was computing an incorrect value in a very expensive
manner. "sharedram" is supposed to be the amount of memory used by
named swap objects, which on FreeBSD basically corresponds to memory
usage by shared memory objects (including, for example, GEM objects) and
tmpfs. We currently have no cheap way to count such pages. The
previous code tried to determine the number of copy-on-write pages
shared between processes.
Just replace the computed value with 0. illumos reportedly does the
same thing. Linux itself did not populate this field until a 2014
commit, "mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces".
Reported by: mjg
MFC after: 1 week
The non-legacy interface always defines num_buffers in the header,
regardless of whether VIRTIO_NET_F_MRG_RXBUF, just leaving it unused. We
also need to ensure our virtqueue doesn't filter out VIRTIO_F_VERSION_1
during negotiation, as it supports non-legacy transports just fine. This
fixes network packet transmission on TinyEMU.
Reviewed by: br, brooks (mentor), jhb (mentor)
Approved by: br, brooks (mentor), jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D25132
The feature bits are exposed as a 32-bit register with 2 banks, so we
should negotiate both halves. Notably, VIRTIO_F_VERSION_1 is in the
upper half, and will be used in an upcoming commit.
The PCI bus driver also has this bug, but the legacy BAR layout did not
include selector registers and is rather different from the modern
layout, so it remains solely as legacy.
Reviewed by: br, brooks (mentor), jhb (mentor)
Approved by: br, brooks (mentor), jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D25131
Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.
Extend the offload KPI with optional PDU queue method, allowing to specify
completion callback, called when all the data referenced by above has been
transferred and won't be accessed any more (the buffers can be freed).
Implement the above functionality in software iSCSI driver using mbufs
with external storage and reference counter. Note that some NICs (ixl(4))
may keep the mbuf in TX queue for a long time, so CTL has to be ready.
Add optional method to struct ctl_scsiio for buffer reference counting.
Implement it for CTL block backend, allowing to delay free of the struct
ctl_be_block_io and memory it references as needed. In first reincarnation
of the patch I tried to delay whole I/O as it is done for FibreChannel,
that was cleaner, but due to the above callback delays I had to rewrite
it this way to not leave LUN referenced potentially for hours or more.
All together on sequential read from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
On tests with 2x Xeon Silver 4114 this allows to reach full line rate of
100GigE NIC. Tests with Gold CPUs and two 100GigE NICs are stil TBD,
but expectations to saturate them are pretty high. ;)
Discussed with: Chelsio
Sponsored by: iXsystems, Inc.
[PATCH 3/3] WPS UPnP: Handle HTTP initiation failures for events more
properly
While it is appropriate to try to retransmit the event to another
callback URL on a failure to initiate the HTTP client connection, there
is no point in trying the exact same operation multiple times in a row.
Replve the event_retry() calls with event_addr_failure() for these cases
to avoid busy loops trying to repeat the same failing operation.
These potential busy loops would go through eloop callbacks, so the
process is not completely stuck on handling them, but unnecessary CPU
would be used to process the continues retries that will keep failing
for the same reason.
Obtained from: https://w1.fi/security/2020-1/\
0003-WPS-UPnP-Handle-HTTP-initiation-failures-for-events-.patch
Security: VU#339275 and CVE-2020-12695
[PATCH 2/3] WPS UPnP: Fix event message generation using a long URL path
More than about 700 character URL ended up overflowing the wpabuf used
for building the event notification and this resulted in the wpabuf
buffer overflow checks terminating the hostapd process. Fix this by
allocating the buffer to be large enough to contain the full URL path.
However, since that around 700 character limit has been the practical
limit for more than ten years, start explicitly enforcing that as the
limit or the callback URLs since any longer ones had not worked before
and there is no need to enable them now either.
Obtained from: https://w1.fi/security/2020-1/\
0002-WPS-UPnP-Fix-event-message-generation-using-a-long-U.patch
Security: VU#339275 and CVE-2020-12695
[PATCH 1/3] WPS UPnP: Do not allow event subscriptions with URLs to
other networks
The UPnP Device Architecture 2.0 specification errata ("UDA errata
16-04-2020.docx") addresses a problem with notifications being allowed
to go out to other domains by disallowing such cases. Do such filtering
for the notification callback URLs to avoid undesired connections to
external networks based on subscriptions that any device in the local
network could request when WPS support for external registrars is
enabled (the upnp_iface parameter in hostapd configuration).
Obtained from: https://w1.fi/security/2020-1/\
0001-WPS-UPnP-Do-not-allow-event-subscriptions-with-URLs-.patch
MFC after: 1 week
Security: VU#339275 and CVE-2020-12695
VHDX is the successor of Microsoft's VHD file format. It increases
maximum capacity of the virtual drive to 64TB and introduces features
to better handle power/system failures.
VHDX is the required format for 2nd generation Hyper-V VMs.
Reviewed by: marcel
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25184
Rework to simplify and impose sane url syntax.
That is we allow for file://[devname[:fstype]]/package
Reviewed by: stevek
MFC after: 1 week
Sponsored by: Juniper Networks
Differential Revision: https://reviews.freebsd.org//D25134