This used to trigger panics, so try to reproduce it.
Create an if_ovpn interface, set a new peer on it with a TCP fd (as
opposed to the expected UDP) and ensure that this is rejected.
Sponsored by: Rubicon Communications, LLC ("Netgate")
We must ensure that the fd provided by userspace is really for a UDP
socket. If it's not we'll panic in udp_set_kernel_tunneling().
Reported by: Gert Doering <gert@greenie.muc.de>
Sponsored by: Rubicon Communications, LLC ("Netgate")
The BIOS method of booting imposes an absolute limit of 640k for the
size of the program being run due to btx. In practice, this means that
programs larger than about 500kiB will fail in odd ways as the stack /
heap will overflow.
Pick 510,000 as the cutoff line semi-arbitrarily. loader_lua is now
almost too big and we want to break the build when it crosses this
threshold. In my experience, below 500,000 always works, above 520,000
always seems to fail with things getting bad somewhere between 512,000
to 515,000. 510,000 is as close to the line as I think we can go, though
experience may dictate we need to lower this in the future.
This is at-best a stop-breakage until we have a better way to subset the
boot loader for BIOS booting to allow better, more fined-tuned
/boot/loaders for the many different environments they have to run
in. This likely means we'll have a graphical loader than understands a
few filesystmes for installation, and a non-graphical loader that
understands the most filesystems possible for everything else in the
future. Our build infrastructure needs some work before we can do that,
however.
At this late date, it likely isn't worth the efforts to move parts of
the loader into high memory. There's a number of assumptions about where
the stack is, where buffers reside, etc that are fulfilled when it lives
in the first 640k that would need bounce buffers and/or other counter
measures if we were to split it up. All BIOS calls are done in 16-bit
mode with SEG:OFF addresses, requiring them to be in the first 640k of
RAM. And nearly all machines in the last decade can boot with UEFI
(though there's some exceptions, so it isn't worth killing outright
yet).
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D36129
The first level boot blocks have understood how to load ELF code since
1999. Switch /boot/loader and /boot/pxeldr over to being ELF format so
that in-tree tools can examine them more closely. In addition, one
could, in theory, now have a 'lo-mem' and a 'hi-mem' segment (though a
lot of work would need to be done with bounce buffers, btx, code segment
marking, etc for an arrangement like that to work).
As far as I can tell, this is the last a.out binary in the tree. There
are several raw binaries left, but everything else is ELF.
Reviewed by: emaste, kevans
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D36130
Fixes INVARIANTS build with Clang 15, which previously failed due to
set-but-not-used variable warnings.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36096
SDIO CMD53 (RW Extented) can either write to the same address (useful for FIFO)
or auto increment the destination address (to write to multiple registers).
It is more logical to have read/write_4 to use incremental mode and make other
helper function for writing to a FIFO destination especially since most FIFO
write/read will be 8bits based and not 32bits based.
Do not use b/l but _1/_4 also address comes first and then data.
This makes them closer to something like bus_space_{read,write}
We have no users in the tree.
Fixes INVARIANTS build with Clang 15, which previously failed due to
set-but-not-used variable warnings.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Currently, subr_bus.c shares logic for (a) maintaining all HW devices
(e.g. discovery/attach/detach logic) and (b) generic devctl notification
layer for devices/PMU/GEOM/interfaces/etc).
These two subsystems share really tiny interaction interface, composed of 3
notification functions. With that in mind, move devctl layer to a
separate file, establishing a clear notification interface between the
sub.c bus layer and the provider (devctl).
The primary driver of this change is netlink implementation (D36002).
The idea is to propagate device-level events to netlink as well, so all
netlink customers can subscribe to these changes.
The long-term goal is to deprecate devctl and to use netlink as the
kernel<> userland transport provided netlink gets enough traction.
Reviewed by: imp, markj
Differential Revision: https://reviews.freebsd.org/D36091
MFC after: 1 month
route_ctl.c size has grown considerably since initial introduction.
Factor out non-relevant parts:
* all rtentry logic, such as creation/destruction and accessors
goes to net/route/route_rtentry.c
* all rtable subscription logic goes to net/route/route_subscription.c
Differential Revision: https://reviews.freebsd.org/D36074
MFC after: 1 month
This change adds public KPI to work with routes using pre-created
nexthops, instead of using data from addrinfo structures. These
functions will be later used for adding/deleting kernel-originated
routes and upcoming netlink protocol.
As a part of providing this KPI, low-level route addition code has been
reworked to provide more control over route creation or change.
Specifically, a number of operation flags
(RTM_F_<CREATE|EXCL|REPLACE|APPEND>) have been added, defining the
desired behaviour the the route already exists (or not exists). This
change required some changes in the multipath addition code, resulting
in moving this code to route_ctl.c, rendering mpath_ctl.c empty.
Differential Revision: https://reviews.freebsd.org/D36073
MFC after: 1 month
This change is required for the upcoming introduction of the next
nexhop-based operations KPI, as it will create rtentry and nexthops
at different stages of route table modification.
Differential Revision: https://reviews.freebsd.org/D36072
MFC after: 2 weeks
* Use same filter func (rib_filter_f_t) for nexhtop groups to
simplify callbacks.
* simplify conditional route deletion & remove the need to pass
rt_addrinfo to the low-level deletion functions
* speedup rib_walk_del() by removing an additional per-prefix lookup
Differential Revision: https://reviews.freebsd.org/D36071
MFC after: 1 month
This and the follow-up routing-related changes target to remove or
reduce `struct rt_addrinfo` usage and use recently-landed nhop(9)
KPI instead.
Traditionally `rt_addrinfo` structure has been used to propagate all necessary
information between the protocol/rtsock and a routing layer. Many
functions inside routing subsystem uses it internally. However, using
this structure became somewhat complicated, as there are too many ways
of specifying a single state and verifying data consistency is hard.
For example, arerouting flgs consistent with mask/gateway sockaddr pointers?
Is mask really a host mask? Are sockaddr "valid" (e.g. properly zeroed, masked,
have proper length)? Are they mutable? Is the suggested interface specified
by the interface index embedded into the sockadd_dl gateway, or passed
as RTAX_IFP parameter, or directly provided by rti_ifp or it needs to
be derived from the ifa?
These (and other similar) questions have to be considered every time when
a function has `rt_addrinfo` pointer as an argument.
The new approach is to bring more control back to the protocols and
construct the desired routing objects themselves - in the end, it's the
protocol/subsystem who knows the desired outcome.
This specific diff changes the following:
* add explicit basic low-level radix operations:
add_route() (renamed from add_route_nhop())
delete_route() (factored from change_route_nhop())
change_route() (renamed from change_route_nhop)
* remove "info" parameter from change_route_conditional() as a part
of reducing rt_addrinfo usage in the internal KPIs
* add lookup_prefix_rt() wrapper for doing re-lookups after
RIB lock/unlock
Differential Revision: https://reviews.freebsd.org/D36070
MFC after: 2 weeks
This streamlines cloning of a socket from a listener. Now we do not
drop the inpcb lock during creation of a new socket, do not do useless
state transitions, and put a fully initialized socket+inpcb+tcpcb into
the listen queue.
Before this change, first we would allocate the socket and inpcb+tcpcb via
tcp_usr_attach() as TCPS_CLOSED, link them into global list of pcbs, unlock
pcb and put this onto incomplete queue (see 6f3caa6d81). Then, after
sonewconn() we would lock it again, transition into TCPS_SYN_RECEIVED,
insert into inpcb hash, finalize initialization of tcpcb. And then, in
call into tcp_do_segment() and upon transition to TCPS_ESTABLISHED call
soisconnected(). This call would lock the listening socket once again
with a LOR protection sequence and then we would relocate the socket onto
the complete queue and only now it is ready for accept(2).
Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D36064
as alternative KPI to sonewconn(). The latter has three stages:
- check the listening socket queue limits
- allocate a new socket
- call into protocol attach method
- link the new socket into the listen queue of the listening socket
The attach method, originally designed for a creation of socket by the
socket(2) syscall has slightly different semantics than attach of a socket
cloned by listener. Make it possible for protocols to call into the
first stage, then perform a different attach, and then call into the
final stage. The first stage, that checks limits and clones a socket
is called solisten_clone(), and the function that enqueues the socket
is solisten_enqueue().
Reviewed by: tuexen
Differential revision: https://reviews.freebsd.org/D36063
Changing mode on a pin (input/output/pullup/pulldown) is a bit slow.
Improve this by caching what we can.
We need to check if the pin is in gpio mode, do that the first time
that we have a request for this pin and cache the result. We can't do
that at attach as we are a child of rk_pinctrl and it didn't finished
its attach then.
Cache also the flags specific to the pinctrl (pullup or pulldown) if the
pin is in input mode.
Cache the registers that deals with input/output mode and output value. Also
remove some register reads when we change the direction of a pin or when we
change the output value since the bit changed in the registers only affect output
pins.
Define PAGE_SIZE and PAGE_MASK based on PAGE_SHIFT. With this we only
need to set one value to change one value to change the page size.
While here remove the unused PAGE_MASK_* macros.
Sponsored by: The FreeBSD Foundation
Fixes INVARIANTS build with Clang 15, which previously failed due to
set-but-not-used variable warnings.
Reviewed by: dim
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36097
Imagine we are in SYN-RCVD state and two ACKs arrive at the same time,
both valid, e.g. coming from the same host and with valid sequence.
First packet would locate the listening socket in the inpcb database,
write-lock it and start expanding the syncache entry into a socket.
Meanwhile second packet would wait on the write lock of the listening
socket. First packet will create a new ESTABLISHED socket, free the
syncache entry and unlock the listening socket. Second packet would
call into syncache_expand(), but this time it will fail as there
is no syncache entry. Second packet would generate RST, effectively
resetting the remote connection.
It seems to me, that it is impossible to solve this problem with
just rearranging locks, as the race happens at a wire level.
To solve the problem, for an ACK packet arrived on a listening socket,
that failed syncache lookup, perform a second non-wildcard lookup right
away. That lookup may find the new born socket. Otherwise, we indeed
send RST.
Tested by: kp
Reviewed by: tuexen, rrs
PR: 265154
Differential revision: https://reviews.freebsd.org/D36066
destinations.
The current assumption is that kernel-handled rtadv prefixes along with
the interface address prefixes are the only prefixes considered in
the ND neighbor eligibility code.
Change this by allowing any non-gatewaye routes to be eligible. This
will allow DHCPv6-controlled routes to be correctly handled by
the ND code.
Refactor nd6_is_new_addr_neighbor() to enable more deterministic
performance in "found" case and remove non-needed
V_rt_add_addr_allfibs handling logic.
Reviewed By: kbowling
Differential Revision: https://reviews.freebsd.org/D23695
MFC after: 1 month
Node names for gpio bank were made generic in Linux 5.16 so stop
using them to map the gpio controller to the pin controller bank unit.
Sponsored by: Beckhoff Automation GmbH & Co. KG
`zpool_expand_001_pos` was often failing due to not seeing autoexpand
commands in the `zpool history`. During testing, I found this to be
unreliable (sometimes the "online" wouldn't appear in `zpool history`)
and unnecessary, as we could simply check that the pool increased in
size.
This commit revamps the test to check for the expanded pool size
and corresponding new free space.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#13743
Thanks to George Wilson for clarifying this on Slack.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes#13698
The 6.0 kernel added a printf-style var-arg for args > 0 to the
register_shrinker function, in order to add names to shrinkers, in
commit e33c267ab70de4249d22d7eab1cc7d68a889bac2. This enables the
shrinkers to have friendly names exposed in /sys/kernel/debug/shrinker/.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#13748
Put the return value of ip_output()/ip6_output in the output event
instead of adding another one in case of an error. This improves
consistency with other similar places.
Reviewed by: rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D36085
The main change was v1.57 by djm@:
Randomise the rekey interval a little. Previously, the chacha20
instance would be rekeyed every 1.6MB. This makes it happen at a
random point somewhere in the 1-2MB range.
Reviewed by: csprng (markm, cem)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36088
The new tests exercise simulated COW that occurs when the protections on
a wired, copy-on-write mapping are changed from read-only to read-write.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35636