zfs lives in libsa. However, it depends on nvstore (and other things)
that are in common. Fix part of this layering violation by splitting
nvstore into a libsa piece (which is the base implementation) and
keeping a much smaller common piece (to implement the nvstore
command). This just leaves zfs' knowledge of device names that's
specific to common and its calling platform specific init code to
resolve. Add a nvstore.h file for these two parts to communicate private
things and move the public nvstore api from bootstrap.h to stand.h.
Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38043
The pfsync_defer_tmo() callout needs to set the correct vnet before it
can transmit packets. It used the rcvif in the mbuf to get this vnet,
but that doesn't work for locally originated traffic. In that case the
rcvif pointer is NULL, and the dereference leads to a panic.
Instead use the sc_sync_if, which is always set (if pfsync is enabled,
at least).
PR: 268246
MFC after: 2 weeks
1. Update ena.4 manual file to include amazon owner emails.
2. State that the driver is developed by amazon but leave
that it was originally written by Semihalf, similarly to other
drivers in the /share/man/ directory of the FreeBSD source code.
3. Advance year in copyright notice to 2022.
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
In case the reset sequence fails (ena_destroy_device() followed by
ena_restore_device() calls) during ena_restore_device(), the driver
resources are being freed. After the clean-up, the timer service is
re-armed in order to try and re-initialize the driver state.
But, such an attempt would fail given that the resources are freed.
Moreover, this would actually cause either the system to fail or a
panic.
When the driver fails in ena_restore_device() procedure, the only
recovery is either unloading and loading the driver or instance
reboot.
This change removes the timer service re-arm in case of failure
in ena_restore_device().
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Fixes: 78554d0c70 ("ena: start timer service on attach")
Commit [1] first added the ena_tx_buffer.print_once member,
so that a message about a missing tx completion is printed only
once per packet (and not every second when the watchdog runs).
In this commit print_once is initialized to true, and is set back
to false after detecting a missing tx completion and printing
a warning about it to dmesg.
Commit [2] incorrectly reverses the values assigned to print_once.
The variable is initialized to be true but is checked to be false
when a missing tx completion is detected. This is never true, and
therefore the warning print for each missing tx completion is never
printed since this commit.
Commit [3] added time passed since last TX cleanup to the missing
tx completions per-packet print. However, due to the issue in commit
[2], this time is never printed.
This commit reverses back the values assigned to ena_tx_buffer.print_once
erroneously by commit [2], bringing back to life the missing tx
completion per-packet print.
Also add a space after "." in the missing tx completion print.
[1] - 9b8d05b8ac ("Add support for Amazon Elastic Network Adapter (ENA) NIC")
[2] - 74dba3ad78 ("Split function checking for missing TX completion in ENA driver")
[3] - d8aba82b5c ("ena: Store ticks of last Tx cleanup")
Fixes: 74dba3ad78 ("Split function checking for missing TX completion in ENA driver")
Fixes: d8aba82b5c ("ena: Store ticks of last Tx cleanup")
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
To attach to the hypervisor, kvmclock needs to write a per-CPU MSR.
When EARLY_AP_STARTUP is not defined, device attach happens too early:
APs are not yet spun up, so smp_rendezvous only runs the callback on the
local CPU. As a result, the timecounter only gets initialized on the
BSP, and then timekeeping is broken on SMP systems.
Implement handling for !EARLY_AP_STARTUP kernels: keep track of the CPU
on which device attach ran, and then use a SI_SUB_SMP SYSINIT to
register the rest of the CPUs with the hypervisor.
Reported by: Shrikanth R Kamath <kshrikanth@juniper.net>
Reviewed by: kib, jhb (earlier versions)
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37705
Currently function prison_ip_restrict() returns true if the replacement
buffer was used, or no buffer provided and allocation fails and should
redo. The logic is confusing and cause possibly infinite loop from
eb8dcdeac2 .
Reviewed by: jamie, glebius
Approved by: kp (mentor)
Differential Revision: https://reviews.freebsd.org/D37918
And possibly infinite loop calling prison_ip_restrict() in
kern_jail_set() [2].
[1] It is possible that prisons do not have any IPv4 or IPv6 addresses.
[2] If prison_ip_restrict() is not provided with prison_ip, when it
allocates prison_ip successfully, then it should return false to
indicate not redo prison_ip_restrict() later.
Reviewed by: glebius
Approved by: kp (mentor)
Fixes: eb8dcdeac2 jail: network epoch protection for IP address lists
Differential Revision: https://reviews.freebsd.org/D37906
Netlink is a communication protocol defined in RFC 3549. It is async,
TLV-based protocol, providing 1-1 and 1-many communications between kernel
and userland. Netlink is currently used in Linux kernel to modify, read and
subscribe for nearly all networking states. Interface state, addresses, routes,
firewall, rules, fibs, etc, are controlled via Netlink.
Netlink support was added in D36002. It has got a number of improvements and
first customers since then:
* net/bird2 got netlink support, enabling route multipath in FreeBSD
* netlink-based devd notifications are being worked on ( D37574 ).
* linux(4) fully supports and depends on Netlink
Enabling Netlink in GENERIC targets two goals.
The first one is to provide stability for the third-party userland applications,
so they can rely on the fact that netlink always exists since 14.0 and potentially 13.2.
Loadable module makes life of the app delepers harder. For example, `net/bird2` can be
either build with netlink or rtsock support, but not both.
The second goal is to enable gradual conversion of the base userland tools
to use netlink(4) interfaces. Converting tools like netstat (D36529), route,
ifconfig one-by-one simplifies testing and addressing the feedback.
Othewise, switching all base to use netlink at once may be too big of a leap.
This change targets amd64, the other architectures will follow soon.
Differential Revision: https://reviews.freebsd.org/D37783
Make sure the size of the raw[] array in the lro_address union is
correctly set at compile time, so that static code analysis tools
do not report undefined behaviour.
MFC after: 1 week
Sponsored by: NVIDIA Networking
The cost of enabling syncookies in adaptive mode is very low (basically
a single atomic add when we create a new half-open state), and the
payoff when under SYN flood is huge.
So, enable adaptive mode by default.
Suggested by: Eirik Øverby
When a src/dst ip/port tuple is re-used before the pf state fully
expires we clean up the state and create a new one, unless syncookies
are enabled.
Test this, by running two back-to-back nc sessions, with a fixed source
port. Move the interface and IP to a different (vnet) jail, to trick the
network stack into letting us do this.
MFC after: 2 weeks
Event: Aberdeen hackathon 2022
Differential Revision: https://reviews.freebsd.org/D36886
Basic scenario: we have a closed connection (In TCPS_FIN_WAIT_2), and
get a new connection (i.e. SYN) re-using the tuple.
Without syncookies we look at the SYN, and completely unlink the old,
closed state on the SYN.
With syncookies we send a generated SYN|ACK back, and drop the SYN,
never looking at the state table.
So when the ACK (i.e. the third step in the three way handshake for
connection setup) turns up, we’ve not actually removed the old state, so
we find it, and don’t do the syncookie dance, or allow the new
connection to get set up.
Explicitly check for this in pf_test_state_tcp(). If we find a state in
TCPS_FIN_WAIT_2 and the syncookie is valid we delete the existing state
so we can set up the new state.
Note that when we verify the syncookie in pf_test_state_tcp() we don't
decrement the number of half-open connections to avoid an incorrect
double decrement.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37919
<net/if.h> should be a fully user-facing header, so these APIs don't
belong there. Revert and will find another approach.
This reverts commit fe33e0ab83.
Fixes: fe33e0ab83
Sponsored by: Juniper Networks, Inc.
Use NL80211_BAND_2GHZ instead of a hard coded 0 as array index for the
band. While LinuxKPI provides a KPI compatibility some of these values
may not necessarily be KBI compatible (in this case they shoule be so
this is a NOP) and after all it is better style.
No functional change.
MFC after: 3 days
iwl_trans_pcie_send_hcmd() does not seem to exist (anymore). Mark it
as __linux__ so we can submit the cleanup with the next upstream run.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Commit 65127e982b removed a check for ni_startdir != NULL.
This allowed the vrele(ndp->ni_dvp) to be called with
a NULL argument.
This patch adds a new boolean argument to nfsvno_open()
that can be checked instead of ni_startdir, since mjg@ requested
that ni_startdir not be used. (Discussed in PR#268828.)
PR: 268828
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D38032
Add irq_get_msi_desc() as a wrapper around a PCI function which will
allocate a single cached value (see comment on struct) for the
msi_desc requested if it doesn't exist yet and handle freeing it
when the PCI device goes away. We take the values from the ivars of
the native (FreeBSD) device.
While changing struct pci_dev also add the msi_cap field requested by
a wireless driver.
Bump __FreeBSD_version so these changes can be detected.
MFC after: 3 days
X-MFC: move fields to end of struct (alloc happens in linux_pci.c)
Reviewed by: hselasky (earlier version)
Differential Revision: https://reviews.freebsd.org/D37523
Add a version of pci_get_device() as linuxkpi_pci_get_device()
not (yet) supporting the last argument.
Due to conflicts we cannot redefine it as we would normally do
in LinuxKPI so drivers have to be adjusted.
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D37593
pci_alloc_irq_vectors() is given a min and max vector value.
pci_enable_msi() will always succeed independent of these arguments as
it does not know about them. Further it will only ever allocate
1 "vector" not supporting any other amount.
So upfront check that (a) the available pci_msi_count() can satisfy the
requested minv and (b) given the pci_enable_msi() hard coded limit check
that minv is not larger than 1.
If we cannot satisfy either requirement return an error.
This fixes problems with drivers which check that the returned value
of allocated "vectors" will match their requests and only otherwise try
to fall back to ask for 1 or deal otherwise.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: hselasky (earlier version)
Differential Revision: https://reviews.freebsd.org/D37522
Add more functions to netdevice.h (netif_napi_add_tx() being the only
one implemented) and add platform_device.h and netlink.h in order to
make driver code compile.
The skeleton functions are used only in very limited scope and not at
all in our usage so far but add (invasive) #ifdef if removed.
Add pr_debug() calls to each of them in order to log a TODO (if DEBUG
compiled in) and someone should hit them in the future.
MFC after: 3 days
Commented on by: hselasky (earlier version)
Differential Revision: https://reviews.freebsd.org/D37599
While here:
- fix an argument of kstrtouint_from_user() to correct signedness.
- make kstrtou32() call kstrtouint() to avoid duplication (keep inline
function)
Add kstrtou32_from_user() based on other examples in the file
making it a copy of the now fixed kstrtouint_from_user().
Also add a rudimentarily hacked up version of mac_pton() which is
leanient accepting non-well-formed input but so far only with ':'
separators. It does not seem to obviously belong to any networking
header file so add it here.
Both new functions are needed for debugfs support for iwlwifi hence
coming together in one commit.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Commented on by: emaste
Differential Revision: https://reviews.freebsd.org/D37088
If a type=dir entry exists and all contents are directories, files
added with contents=, or symlinks with link= attributes then it doesn't
need to exist. Just let openat fail in that case. It's conceivable
this will make debugging some cases weird, but it's sufficent to handle
the way we add /root/.ssh in CheriBSD VM images.
This is a recommit of 794154149f with
bugfixes.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D38029
If a type=dir entry exists and all contents are directories, files
added with contents=, or symlinks with link= attributes then it doesn't
need to exist. Just let openat fail in that case. It's conceivable
this will make debugging some cases weird, but it's sufficent to handle
the way we add /root/.ssh in CheriBSD VM images.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D38029
When a link target is specified use it rather than attempting to read
a potentially non-existant file.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D38028
Cast char's through unsigned char before storing as an integer in
xdr_char(), this ensures that the encoded form is consistently not
sign-extended following Open Solaris's example.
Prior to this change, platforms with signed chars would sign extend
values with the high bit set but ones with unsigned chars would not
so 0xff would be stored as 0x000000ff on unsigned char platforms and
0xffffffff on signed char platforms. Decoding has the same
result for either form so this is a largely cosmetic change, but it
seems best to produce consistent output.
For more discussion, see https://github.com/openzfs/zfs/issues/14173
Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D37992
Summary:
The "public" KPI for ifnet belongs in net/if.h, with net/if_var.h being
implementation details for the netstack. This is the next step in
enforcing that separation.
Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38030
Set the number of artificial frames to 5:
1. cpu_exception_handler_supervisor()
2. do_trap_supervisor()
3. dtrace_invop_start()
4. dtrace_invop()
5. fbt_invop()
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37663
Experimentation shows this is the correct value; the dtrace/interrupt
handler frames are omitted, while the backtrace of the active thread is
recorded in its entirety.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37662
Backtraces for fbt probes are missing the caller's frame. Despite what
the inherited comment claims, we do need to insert this manually on
riscv. In fbt_invop(), set cpu_dtrace_caller to be the return address,
not addr.
We should not increment aframes within this function, since we begin the
main loop by unwinding past the current frame.
Plus some very small comment/style tweaks.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37661
Specifically it is missing in kernel modules, meaning a proper backtrace
can't be constructed.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37657
This reverts commit c33509d49a.
It turns out that the long 27 second delay I saw in the
gss_acquire_cred() call was caused by a (mis)configured
DNS. Although I did not specify "dns" in /etc/nsswitch.conf,
I did have a /etc/resolv.conf file on the system (left
there by wpa_supplicant). As such, with no route, it was
somehow trying to contact the DNS server, although there was none.
Once I got rid of the /etc/resolv.conf file, it worked
as expected.
Since there is now a large 5 minute timeout on the
kernel to gssd(8) upcalls, the gssd(8) daemon will not
get terminated when this delay occurs and the only affect
is a 30 second delay during the mount.
Discussed with: bjk
Fix a possible NULL pointer deref in case alloc_pages() fails.
This is theoretical so far as up to now no code in the tree uses
linuxkpi_page_frag_alloc().
Reported by: Coverity via emaste
Coverity ID: 1502345
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
X-MFC-with: 55038a6306
Pull together the nearly identical copies of set_currdev in i386,
userboot and efi. Other boot loaders have variances that might be fine
to use the common routine, or not. Since they are harder to test for me,
and ofw and uboot do handle these setting differently, leave them be for
now.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38005
Since dev_cleanup() walks through all the devsw devices with dv_cleanup
rotuines, move it into libsa rather than having it in
'common'. Logically, it operates only on things that are in libsa, and
would never be different for different loaders: either people would call
it as is, or they'd do the loop themselves with 'special' things inline
between calls to cleanup (not that I think that will ever be needed
though).
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38004
Replace 4 identical copies of *_setcurrdev with gen_setcurrdev to avoid
having to create a 5th copy. uboot_setcurrdev is actually different and
needs to remain separate (even though it's quite similar).
Sponsored by: Netflix
Reviewed by: fuz@fuz.su, kevans
Differential Revision: https://reviews.freebsd.org/D38003