Commit Graph

57 Commits

Author SHA1 Message Date
Brandon Bergren
98727c6cd1 Fix panic when using BOOTP to resolve root path.
When loading a direct-boot kernel, a temporary route is being installed,
the NFS handle is acquired, and the temporary route is removed again.

This was being done inside a net epoch, but since the krpc code is written
using blocking APIs, we can't actually do that, because sleeping is not
allowed during a net epoch.

Exit and reenter the epoch so we are only in the epoch when doing the
routing table manipulation.

Fixes panic when booting my RB800 over NFS (where the kernel is loaded
using RouterBOOT directly.)

Reviewed by:	melifaro
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D29464
2021-03-28 14:02:40 -05:00
Mateusz Guzik
7c8e2a57dd nfs: clean up empty lines in .c and .h files 2020-09-01 21:25:39 +00:00
Alexander V. Chernikov
e1c05fd290 Transition from rtrequest1_fib() to rib_action().
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
 in6_rtrequest, rtrequest_fib> and their uses and switch to
 to rib_action(). This is part of the new routing KPI.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546
2020-07-21 19:56:13 +00:00
Alexander V. Chernikov
2bbab0af6d Use epoch(9) for rtentries to simplify control plane operations.
Currently the only reason of refcounting rtentries is the need to report
 the rtable operation details immediately after the execution.
Delaying rtentry reclamation allows to stop refcounting and simplify the code.
Additionally, this change allows to reimplement rib_lookup_info(), which
 is used by some of the customers to get the matching prefix along
 with nexthops, in more efficient way.

The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to
 nhop_priv to be able to reliably set curvnet even during vnet teardown.
Rest of the reference counting code will be removed in the D24867 .

Differential Revision:	https://reviews.freebsd.org/D24866
2020-05-23 10:21:02 +00:00
Alexander V. Chernikov
9c67d4a14a Remove rtable dumping code from bootp.
This debugging code printing routing table data was introduced in rS25723,
 22+ years ago. The last functional commit to this code was rS67534, 19 years ago.
The code has been turned off by default all this time.
Lastly, this code directly iterates radix tree and rtentries, which is not
 not a proper interaction with routing system.

Differential Revision:	https://reviews.freebsd.org/D24554
2020-04-28 07:23:41 +00:00
Andrey V. Elsukov
20efcfc602 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using of rwlock with multiqueue NICs for IP forwarding on high pps
produces high lock contention and inefficient. Rmlock fits better for
such workloads.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Brooks Davis
0437c8e3b1 Remove support for FDDI networks.
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by:	kib, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15017
2018-04-11 17:28:24 +00:00
Brooks Davis
69f0fecbd6 Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Ian Lepore
2596e59108 Do not try to install a default route for each interface found, because
only the first one will actually work and all the others just result in
errors (which would get printed but otherwise ignored).

Instead, wait until we make a choice of which interface will be used to
mount the rootfs, and install the default route associated with it (if any).
After doing the md_mount() call to obtain the needed info, remove the
default route again, and transcribe the route info into the nfs_diskless
structure.  If the system eventually chooses to mount the nfs rootfs, the
default route will be installed again when the nfs_diskless code
re-initializes the interface.

The theory here is that since we can only have one default route, the one
most likely to be correct for mounting the rootfs is the one that was
delivered along with the rootpath option.
2016-03-27 23:16:37 +00:00
Ian Lepore
25101b0b1c Stop setting the default route to the IP of the interface itself when the
bootp/dhcp server doesn't provide a router option.  Doing so prevents
setting defaultrouter=<ip> in rc.conf (it fails because there's already
a bogus default route installed by bootpc_init).

When an admin wants to use this style of proxy arp on an interface, the
proper mechanism is to set the "use-lease-addr-for-default-route" flag
in the dhcp server config.  That causes the lease address to be delivered
in the routers option, and the normal handling of the routers option will
then install the self-ip as the default route.

PR:		187094
2016-03-27 22:58:56 +00:00
Ian Lepore
d32d350eeb Switch bootpc_adjust_interface() from returning int to void. Its one caller
doesn't check for errors, and all the errors that can happen result in it
calling panic anyway, except for one that's really more of a warning (and
is going to disappear on an upcoming commit anyway).
2016-03-27 22:36:32 +00:00
Ian Lepore
eb2d4f022c Set ifctx->gotrootpath=1 only when the root path came from the dhcp/bootp
server (and not when it came from a fallback method such as the ROOTDEVNAME
option).  This makes the code in bootpc_init() choose the first interface
that provided a rootpath name.  Previously it was choosing the first
interface that got an IP address, which could be on a different and
potentially unreachable subnet than the server providing the rootfs.

If the rootpath name actually does come from a fallback source, then the
code continues to use the first interface in the list that got configured.

Note that this wasn't directly reported in the PR cited below, but was
discovered while working on that PR.

PR:		187094
2016-03-27 22:21:34 +00:00
Ian Lepore
a6c1490807 If the dhcp server provides an interface-mtu option, parse the value and
set that mtu on the interface.

These changes are based on the patch submitted by Robert Blayzor in the
PR, but I changed things around a bit, so the blame for any mistakes
belongs to me.

PR:		187094
2016-03-21 14:51:51 +00:00
Alexander V. Chernikov
61eee0e202 MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 does not use radix locking. On the other hand, routing
  structure do have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still uses the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
Rick Macklem
c15882f091 Remove the old NFS client and server from head,
which means that the NFSCLIENT and NFSSERVER
kernel options will no longer work. This commit
only removes the kernel components. Removal of
unused code in the user utilities will be done
later. This commit does not include an addition
to UPDATING, but that will be committed in a
few minutes.

Discussed on: freebsd-fs
2014-12-23 00:47:46 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Ian Lepore
6cbd933b37 Changes to allow using BOOTP_NFSROOT and mounting an nfs root filesystem
other than the one specified by the BOOTP server.  This configures NFS
using the BOOTP protocol while also respecting other root-path options such
as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT
boot option.  It allows you to override the root path provided by the
server, or to supply a root path when the server provides IP configuration
but no root path info.

This maintains the historical BOOTP_NFSROOT behavior of panicking on a
failure to mount the root path provided by the server, unless you've
provided an alternative via the ROOTDEVNAME kernel option or by setting
vfs.root.mountfrom.  The behavior of panicking when given no other options
is preserved because it amounts to a bit of a retry loop that could
eventually recover from a transient network or server problem.

The user can now override the root path from loader(8) even if the
kernel is compiled with BOOTP_NFSROOT.  If vfs.root.mountfrom is set in
the environment it is used unconditionally -- it always overrides the
BOOTP info.  If it begins with [old]nfs: then the BOOTP code uses it
instead of the server-provided info.  If it specifies some other
filesystem then the bootp code will not panic like it used to and the code
in vfs_mountroot.c will invoke the right filesystem to do the mount.

If the kernel is compiled with the ROOTDEVNAME option, then that name is
used by the BOOTP code if either
      * The server doesn't provide a pathname.
      * The boothowto flags include RB_DFLTROOT.
The latter allows the user to compile in alternate path in ROOTDEVNAME
such as ufs:/dev/da0s1a and boot from that path by setting
boot_dftlroot=1 in loader(8) or using the '-r' option in boot(8).

The one thing not provided here is automatic failover from a
server-provided path to a compiled-in one without the user manually
requesting that.  The code just isn't currently structured in a way that
makes that possible with a lot of rewrite.  I think the ability to set
vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server
doesn't provide a name covers the most common needs.

A set of patches submitted by Lars Eggert provided the part I couldn't
figure out by myself when I tried to do this last year; many thanks.

Reviewed by:	rodrigc
2013-07-31 19:14:00 +00:00
Oleksandr Tymoshenko
c6fd18e027 - Typo fix
- style(9) fix

Spotted by: kib@, Andrey Zonov
2012-08-16 19:22:34 +00:00
Oleksandr Tymoshenko
1d4c5e5116 Merge somewhat modified r230399 from projects/armv6:
Add timeout to wait for network controllers to appear when netbooting.

USB ethernet adapter initialization usually is delayed and
they're not available immidiately after autoconfiguration. So we need
to wait a bit before giving up

Reviewed by:	stas@
2012-08-16 00:51:50 +00:00
Bjoern A. Zeeb
81d5d46b3c Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
  the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
  where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
  prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
  ease MFCs.  Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
  currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
  are locked to the default FIB.  Individual IPv6 addresses will only
  appear in the default FIB, however redirect information and prefixes
  of connected subnets are automatically propagated to all FIBs by
  default (mimicking IPv4 behavior as closely as possible).

Sponsored by:	Cisco Systems, Inc.
2012-02-03 13:08:44 +00:00
Gleb Smirnoff
6bfaa83996 Some cleanup of BOOTP code. Initially I wanted to just change the ifioctl()
usage, but end up with more changes.

- Use SIOCAIFADDR instead of old rusty SIOCSIFADDR, SIOCSIFBRDADDR
  and SIOCSIFNETMASK.
- Use queue(9) instead of hand made stailq.
- Use one socket for all ifioctl() and send/receive operations.
- Use __func__ instead of cut-n-paste in logging and panics.
- Axe some dead or strange code.

Tested by:	gonzo, Stefan Bethke <stb lassitu.de>
2011-12-13 07:02:48 +00:00
Grzegorz Bernacki
acd73f5041 Set proper root device name when legacy NFS client is compiled into kernel.
Approved by:     cognet (mentor)
2011-06-29 15:17:29 +00:00
Rick Macklem
7c208ed659 Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by:	jhb
MFC after:	2 weeks
2011-04-25 22:22:51 +00:00
Peter Wemm
eb25edbda3 Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.
2001-09-18 23:32:09 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Tor Egge
7d1af7b215 Enable use of DHCP extensions.
Reviewed by:	Per Kristian Hove <Per.Hove@math.ntnu.no>
2001-02-02 02:35:40 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Tor Egge
e4e7a9a4e9 Reduce kernel stack usage by not having large packets on the stack.
Supply correct size parameter to dhcpd.
Replace some magic numbers with macro names.
Handle more than one interface.
2000-10-29 01:19:32 +00:00
Tor Egge
5b93d1da3f Eliminate some bitrot (nonexisting member variable names).
Don't use curproc when a proc pointer is available.
2000-10-24 23:33:01 +00:00
Tor Egge
6d7518c134 Style fixes. 2000-10-24 22:40:18 +00:00
Paul Saab
fb27899f3b Correctly set the Maximum DHCP Message Size. bootpd now works
again as well as ISC dhcpd.
2000-06-13 09:32:09 +00:00
Poul-Henning Kamp
831e32f863 Include a RFC 1533 "Maximum DHCP Message Size" option in our request.
ISC DHCP will limit the reply length to 64 bytes for bootp replies
unless we explicitly tell it we can do more.  We tell it that we can do
1200 bytes.
2000-05-07 14:29:19 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Matthew Dillon
fe08c21a53 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

    This commit includes significant work to proper handle const arguments
    for the DDB symbol routines.
1999-01-27 23:45:44 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Archie Cobbs
2127f26023 Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Mike Spengler <mks@networkcs.com>
1998-12-04 22:54:57 +00:00
Matthew Dillon
aeb728f0d5 Make bootp error message slightly more verbose 1998-12-03 20:28:23 +00:00
Garrett Wollman
cfe8b629f1 Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process.  Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
1998-08-23 03:07:17 +00:00
Bruce Evans
18df27bda2 Fixed printf format errors. 1998-08-18 00:32:50 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
Tor Egge
8f7030a7cc Add a BOOTP_WIRED_TO option, for use on machines with multiple network
cards where the first detected card should not be used for bootp.
Submitted by:	Doug Ambrisko <ambrisko@whistle.com>
1998-03-14 04:13:56 +00:00
Tor Egge
8bd965cce4 Update workaround for limitations in the arp code.
Adjust the RPC timeout message which occured when the old workaround
broke to show the correct IP address.
1998-03-14 03:25:18 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00