another one in udp_output(). This one is a race condition.
We do check on the laddr and lport without holding a lock in
order to determine whether we want a read or a write lock
(this is in the "sendto/sendmsg" cases where addr (sin) is given).
Instrumenting the kernel showed that after taking the lock, we
had bound to a local port from a parallel thread on the same socket.
If we find that case, unlock, and retry again. Taking the write
lock would not be a problem in first place (apart from killing some
parallelism). However the retry is needed as later on based on
similar condition checks we do acquire the pcbinfo lock and if the
conditions have changed, we might find ourselves with a lock
inconsistency, hence at the end of the function when trying to
unlock, hitting the KASSERT.
Reported by: syzbot+bdf4caa36f3ceeac198f@syzkaller.appspotmail.com
Reviewed by: markj
MFC after: 6 weeks
Event: Waterloo Hackathon 2019
When starting a command also print the opcode and flags.
More consitently print flags as hex.
Use slot_printf rather than printf in one case.
MFC after: 6 weeks
Reviewed by: marius, kibab, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19748
receive sockbuf's high water mark.
Calculate rx credits on the spot instead of tracking sbused/sb_cc and
rx_credits in the toepcb. The previous method worked when the high
water mark changed due to SB_AUTOSIZE but not when it was adjusted
directly (for example, by the soreserve in nfsrvd_addsock).
This fixes a connection hang while running iozone over an NFS mounted
share where nfsd's TCP sockets are being handled by t4_tom.
MFC after: 3 days
Sponsored by: Chelsio Communications
Summary:
moea64_insert_pteg_native()'s invalidation only works by happenstance.
The purpose of the shifts and XORs is to extract the VSID in order to
reverse-engineer the lower bits of the VPN. Currently a segment size is 256MB
(2**28), and ADDR_API_SHFT64 is 16, so ADDR_PIDX_SHIFT is equivalent. However,
it's semantically incorrect, in that we don't want to shift by the page shift
size, we want to shift to get to the VSID.
Tested by: bdragon
Differential Revision: https://reviews.freebsd.org/D20467
I introduced an obvious compiler error in r346282, so this change fixes
that.
Unfortunately, RANDOM_LOADABLE isn't covered by our existing tinderbox, and
it seems like there were existing latent linking problems. I believe these
were introduced on accident in r338324 during reduction of the boolean
expression(s) adjacent to randomdev.c and hash.c. It seems the
RANDOM_LOADABLE build breakage has gone unnoticed for nine months.
This change correctly annotates randomdev.c and hash.c with !random_loadable
to match the pre-r338324 logic; and additionally updates the HWRNG drivers
in MD 'files.*', which depend on random_device symbols, with
!random_loadable (it is invalid for the kernel to depend on symbols from a
module).
(The expression for both randomdev.c and hash.c was the same, prior to
r338324: "optional random random_yarrow | random !random_yarrow
!random_loadable". I.e., "random && (yarrow || !loadable)." When Yarrow
was removed ("yarrow := False"), the expression was incorrectly reduced to
"optional random" when it should have retained "random && !loadable".)
Additionally, I discovered that virtio_random was missing a MODULE_DEPEND on
random_device, which breaks kld load/link of the driver on RANDOM_LOADABLE
kernels. Address that issue as well.
PR: 238223
Reported by: Eir Nym <eirnym AT gmail.com>
Reviewed by: delphij, markm
Approved by: secteam(delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20466
With the reorg r348175, we now look at modified before it is
set. Rearrange things so that we can set include_metadata to either
yes, no or if-modified. This should fix the -R flag that was broken in
r348175, which broke WITH_REPRODUCIBLE_BUILD for kernels.
Feedback From: emaste@
Differential Revision: https://reviews.freebsd.org/D20480
power-of-two page block it frees, launching an unsuccessful search for
a buddy to pair up with each time. The only possible buddy-up mergers
are across the boundaries of the freed region, so change
vm_phys_free_contig simply to enqueue the freed interior blocks, via a
new function vm_phys_enqueue_contig, and then call vm_phys_free_pages
on the bounding blocks to create as big a cross-boundary block as
possible after buddy-merging.
The only callers of vm_phys_free_contig at the moment call it in
situations where merging blocks across the boundary is clearly
impossible, so just call vm_phys_enqueue_contig in those places and
avoid trying to buddy-up at all.
One beneficiary of this change is in breaking reservations. For the
case where memory is freed in breaking a reservation with only the
first and last pages allocated, the number of cycles consumed by the
operation drops about 11% with this change.
Suggested by: alc
Reviewed by: alc
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D16901
debugging checks.
In particular,
- Move the code to handle failure to allocate page table page into
a helper.
- After the previous item is done, it is possible to distinguish !PG_A
case and case of missed page, in the control flow.
- Make the variable to indicate that in-kernel mapping is demoted.
- Assert that missed page table page can only happen for in-kernel
mapping when demoting direct map.
- If DIAGNOSTIC is enabled, and the page table page should be already
filled, check all ptes instead of only first one.
Reviewed by: alc, markj
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20266
The fix to override the default python version when building
the sysutils/py-google-compute-engine did not work, and there
are still issues that need to be addressed in the port itself.
See bugzilla 238267 for additional details.
MFC after: 6 days
MFC with: r348438
MFC note: no-op to appease the merge tracker
Sponsored by: The FreeBSD Foundation
netdump waits for acknowledgement from the server for each write. When
dumping page table pages, we perform many small writes, limiting
throughput. Use the netdump client's buffer to buffer small contiguous
writes before calling netdump_send() to flush the MAXDUMPPGS-sized
buffer. This results in a significant reduction in the time taken to
complete a netdump.
Submitted by: Sam Gwydir <sam@samgwydir.com>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20317
The tests are failing because the return value and output have changed, but
before test code structure adjusted, removing these test cases help people
be able to focus on more important cases.
Discussed with: emaste
MFC with: r348206
Sponsored by: The FreeBSD Foundation
mountd.c uses a single linked list of "struct exportlist" structures,
where there is one of these for each exported file system on the NFS server.
This list gets long if there are a large number of file systems exported and
the list must be searched for each line in the exports file(s) when
SIGHUP causes the exports file(s) to be reloaded.
A simple benchmark that traverses SLIST() elements and compares two 32bit
fields in the structure for equal (which is what the search is)
appears to take a couple of nsec. So, for a server with 72000 exported file
systems, this can take about 5sec during reload of the exports file(s).
By replacing the single linked list with a hash table with a target of
10 elements per list, the time should be reduced to less than 1msec.
Peter Errikson (who has a server with 72000+ exported file systems) ran
a test program using 5 hashes to see how they worked.
fnv_32_buf(fsid,..., 0)
fnv_32_buf(fsid,..., FNV1_32_INIT)
hash32_buf(fsid,..., 0)
hash32_buf(fsid,..., HASHINIT)
- plus simply using the low order bits of fsid.val[0].
The first three behaved about equally well, with the first one being
slightly better than the others.
It has an average variation of about 4.5% about the target list length
and that is what this patch uses.
Peter Errikson also tested this hash table version and found that the
performance wasn't measurably improved by a larger hash table, so a
load factor of 10 appears adequate.
Tested by: pen@lysator.liu.se (with other patches)
PR: 237860
MFC after: 1 month
This silly code segment has existed in the sources since it was brought
into FreeBSD 10 years ago. I honestly have no idea why this was done.
It was possible that I thought that it might have been better to not
set B_ASYNC for the "else" case, but I can't remember.
Anyhow, this patch gets rid of the if/else that does the same thing
either way, since it looks silly and upsets a static analyser.
This will have no semantic effect on the NFS client.
PR: 238167
syscalls.conf is included using "." which per the Open Group:
If file does not contain a <slash>, the shell shall use the search
path specified by PATH to find the directory containing file.
POSIX shells don't fall back to the current working directory.
Submitted by: Nathaniel Wesley Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: bdrewery
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20476
misleading indentation. This is found by gcc -Wmisleading-indentation
Approved by: erj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20428
it hasn't been initialized.
This fixes a bug in r346570 that could cause a panic when servicing
TCP_INFO for offloaded connections.
MFC after: 3 days
Sponsored by: Chelsio Communications
size is too small to bootstrap the firstboot_pkgs list.
While here, add the growfs(8) startup script to /etc/rc.conf,
as Vagrant images can be resized by modifying the Vagrantfile.
Reported by: dbaio
PR: 238226
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The ports/head branch recently switched to python3 as the default,
which breaks the sysutils/py-google-compute-engine startup scripts,
as lang/python installs lang/python3{,.x} where lang/python2{,.x}
are needed.
Set DEFAULT_VERSIONS in release/tools/gce.conf to python=2.7, and
remove the lang/python3 inclusion in VM_EXTRA_PACKAGES.
Additionally, unset DEFAULT_VERSIONS in release/tools/vmimage.subr
to prevent persistence of DEFAULT_VERSIONS=python=2.7 in subsequent
VM/cloud image builds.
Note: at present, this affects only 13-CURRENT and 12-STABLE, as
the stable/11 branch had already switched to using the 2019Q2 branch
at the start of the 11.3-RELEASE cycle, so this does not immediately
affect 11.3-BETA, hence the 1-week merge timeout. This had been
manually tested on 13-CURRENT.
Reported by: ler (privately)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This is the common case when strip(1) is creating the output file.
The change provides a significant speedup when running on ELF files with
many sections.
PR: 234949
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20444
Probably due to historical reasons the driver uses In/Out arguments in
odd way. While this tool still never uses Out arguments to see that,
make the code to not trigger EINVAL in possible future uses.
MFC after: 2 weeks
The order is correct, it is nullfs vnode interlock -> lower vnode
interlock. vop_stdadd_writecount() is called from nullfs
VOP_ADD_WRITECOUNT() and both take interlocks.
Requested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.
PR: 215202
Reviewed by: tijl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20415
ENAv2 introduces many new features, bug fixes and improvements.
Main new features are LLQ (Low Latency Queues) and independent queues
reconfiguration using sysctl commands.
The year in copyright notice was updated to 2019.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The issues were pointed in community review:
https://reviews.freebsd.org/D10427#inline-67587
Also, fix other issues found by the igor tool.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
For easier debugging, the reset is being triggered and the reset reason is
being set only in case it is done for the first time. Such approach will
ensure that the first reset reason is not going to be overwritten and
will make it easier for debugging.
Also, add a reset trigger upon invalid Tx requested ID.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If the call to ena_up() in ena_restore_device() fails, next usage of
`ifconfig up` will cause NULL pointer dereference.
This patch adds additional checks to prevent that.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Some messages were missing new line character and traces were not having
unified behavior. To fix that, each trace and printout should add new
line character at the end of each string - that should improve
readability.
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If the headers of the packets are split into multiple segments of the
mbuf chain, the previous version of ena_tx_csum which was assuming,
that all segments will lay in the first mbuf, will eventually fail to
map the headers properties to meta descriptor.
That will cause Tx checksum offload to do not work and was leading to
memory corruption. It could even cause the crash of the system.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
For alignment with Linux driver and better handling ena_detach(), the
reset is now calling ena_device_restore() and ena_device_destroy().
The ena_device_destroy() is also being called on ena_detach(), so the
code will be more readable.
The watchdog is now being activated after reset only, if it was active
before.
There were added additional checks to ensure, that there is no race with
the link state change AENQ handler.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
As the ENA can have multiple states turned on/off, it is more convenient
to store them in single bitfield instead of multiple boolean variables.
The bitset FreeBSD API was used for the bitfield implementation, as it
provides flexible structure together with API which also supports atomic
bitfield operations.
For better readability basic macros from API were wrapped into custom
ENA_FLAG_* macros, which are filling up common parameters for all calls.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Before the patch, error handling was not releasing all resources and
was not issuing device reset if the reset task failed.
That could cause memory leak and fault of the device.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The host info bdf field is the abbreviation for the bus, device,
function of the PCI on which the device is being attached to.
Now the driver is filling information about that using FreeBSD RID
resource.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The new ENA HAL is introducing API, which can determine on Tx path if
the doorbell is needed.
That way, it can tell the driver, that it should call an doorbell.
The old threshold value wasn't removed, as not all HW is supporting this
feature - so it was reworked to also work with the new API.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The Rx ring size can be as high as 8k. Because of that we want to limit
the cleanup threshold by maximum value of 256.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
LLQ (Low Latency Queue) is the feature, that allows pushing header
directly to the device through PCI before even DMA is triggered.
It reduces latency, because device can start preparing packet before
payload is sent through DMA.
To speed up sending data through PCI, the Write Combining is enabled,
which allows hardware to buffer data before sending them on the PCI - it
allows to reduce number of PCI IO operations.
ENAv2 is using special descriptor for the negotiation of the LLQ.
Currently, only the default configuration is supported.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.