This is more compatible with formatting tools and looks more normal.
Reported by: jhb (on a different review)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18442
- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.
Discussed with: jtl, gallatin
IFNET_RLOCK_NOSLEEP() is epoch_enter_preempt() in FreeBSD 12+. Holding
it in sysctl_rtsock() doesn't protect us from ifnet unlinking, because
unlinking occurs with IFNET_WLOCK(), that is rw_wlock+sx_xlock, and it
doesn check that concurrent code is running in epoch section. But while
we are in epoch section, we should be able to do access to ifnet's
fields, even it was unlinked. Thus do not change if_addr and if_hw_addr
fields in ifnet_detach_internal() to NULL, since rtsock code can do
access to these fields and this is allowed while it is running in epoch
section.
This should fix the race, when ifnet_detach_internal() unlinks ifnet
after we checked it for IFF_DYING in sysctl_dumpentry.
Move free(ifp->if_hw_addr) into ifnet_free_internal(). Also remove the
NULL check for ifp->if_description, since free(9) can correctly handle
NULL pointer.
MFC after: 1 week
The panic can happen, when some application does dump of routing table
using sysctl interface. To prevent this, set IFF_DYING flag in
if_detach_internal() function, when ifnet under lock is removed from
the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent
ifnet detach during routes enumeration. In case, if some interface was
detached in the time before we take the lock, add the check, that ifnet
is not DYING. This prevents access to memory that could be freed after
ifnet is unlinked.
PR: 227720, 230498, 233306
Reviewed by: bz, eugen
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18338
mutexes but now are converted to epoch(9) use thread-private epoch_tracker.
Embedding tracker into ifnet(9) or ifnet derived structures creates a non
reentrable function, that will fail miserably if called simultaneously from
two different contexts.
A thread private tracker will provide a single tracker that would allow to
call these functions safely. It doesn't allow nested call, but this is not
expected from compatibility KPIs.
Reviewed by: markj
pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.
Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D17500
SX-locks, during if_purgeaddrs(), by not allowing to hold the epoch
read lock over typical network IOCTL code paths. This is a regression
issue after r334305.
Reviewed by: ae (network)
Differential revision: https://reviews.freebsd.org/D17647
MFC after: 1 week
Sponsored by: Mellanox Technologies
appearing and disappearing on the host system.
Such handling is need, because tunneling interfaces must use addresses,
that are configured on the host as ingress addresses for tunnels.
Otherwise the system can send spoofed packets with source address, that
belongs to foreign host.
The KPI uses ifaddr_event_ext event to implement addresses tracking.
Tunneling interfaces register event handlers and then they are
notified by the kernel, when an address disappears or appears.
ifaddr_event_compat() handler from if.c replaced by srcaddr_change_event()
in the ip_encap.c
MFC after: 1 month
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17134
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.
MFC after: 3 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17100
is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or
doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command.
Without this patch, for IPv6 the new MTU is not used when creating routes.
Especially, when initiating TCP connections after increasing the MTU,
the old MTU is still used to compute the MSS.
Thanks to ae@ and bz@ for helping to improve the patch.
Reviewed by: ae@, bz@
Approved by: re (kib@)
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D17180
toe_l2_resolve to fill up the complete vtag and not just the vid.
Reviewed by: kib@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D16752
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
- add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
goes to zero OR when the interface goes away
- decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
- add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
door wider still to races
- call inp_gcmoptions synchronously after dropping the the inpcb lock
Fundamentally multicast needs a rewrite - but keep applying band-aids for now.
Tested by: kp
Reported by: novel, kp, lwhsu
Add generic function if_tunnel_check_nesting() that does check for
allowed nesting level for tunneling interfaces and also does loop
detection. Use it in gif(4), gre(4) and me(4) interfaces.
Differential Revision: https://reviews.freebsd.org/D16162
ifioctl(). Move it inside the proper #ifdef. This was throwing a valid
"Assigned but unused" warning with gcc.
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16063
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
there's no longer any benefit to dropping the pcbinfo lock
and trying to do so just adds an error prone branchfest to
these functions
- Remove cases of same function recursion on the epoch as
recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
thread as the tracker field is now stack or heap allocated
as appropriate.
Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.
Reported by: markj
adds:
- epoch_enter_critical() - can be called inside a different epoch,
starts a section that will acquire any MTX_DEF mutexes or do
anything that might sleep.
- epoch_exit_critical() - corresponding exit call
- epoch_wait_critical() - wait variant that is guaranteed that any
threads in a section are running.
- epoch_global_critical - an epoch_wait_critical safe epoch instance
Requested by: markj
Approved by: sbruno
if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.
Reviewed by: sbruno@
Approved by: sbruno@
Differential Revision: https://reviews.freebsd.org/D15344
Make if_printf() use vlog() instead of vprintf(). This means it can no
longer return the number of characters printed, as it used to, but every
single call to if_printf() in the entire kernel ignores the return value
anyway; just return 0 so we don't have to change the prototype.
Consistently use if_printf() throughout sys/net/if.c, instead of a
mixture of if_printf() and log().
In ifa_maintain_loopback_route(), don't needlessly log an error if we
either failed to add a route because it already existed or failed to
remove one because it did not. We still return an error code, though.
MFC after: 1 week
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.
Synchronously remove all external references to a multicast address before enqueueing for delete.
Reported by: lwhsu
Approved by: sbruno
This is a component of a system which lets the kernel dump core to
a remote host after a panic, rather than to a local storage device.
The server component is available in the ports tree. netdump is
particularly useful on diskless systems.
The netdump(4) man page contains some details describing the protocol.
Support for configuring netdump will be added to dumpon(8) in a future
commit. To use netdump, the kernel must have been compiled with the
NETDUMP option.
The initial revision of netdump was written by Darrell Anderson and
was integrated into Sandvine's OS, from which this version was derived.
Reviewed by: bdrewery, cem (earlier versions), julian, sbruno
MFC after: 1 month
X-MFC note: use a spare field in struct ifnet
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15253
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.
Submitted by: mmacy <mmacy@mattmacy.io>
Reviewed by: shurd, sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14969
We use transformation rather than accessors as virtually ever driver
implements SIOCGIFMEDIA and all would have to be touched.
Keep the code readable by always performing copies and (possiably no-op)
transforms.
Reviewed by: jhb, kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14996
While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.
Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.
PR: 182297
Reviewed by: jhibbits, vangyzen
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15057
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).
Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017
Portable programs that use SIOCGIFCONF (e.g. traceroute) assume
that each pseudo ifreq is of length MAX(sizeof(struct ifreq),
sizeof(ifr_name) + ifr_addr.sa_len). For short sockaddrs we copied
too much from the source sockaddr resulting in a heap leak.
I believe only one such sockaddr exists (struct sockaddr_sco which
is 8 bytes) and it is unclear if such sockaddrs end up on interfaces
in practice. If it did, the result would be an 8 byte heap leak on
current architectures.
admbugs: 869
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 3 days
Security: kernel heap leak
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14981
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
Use an accessor to access ifgr_group and ifgr_groups.
Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids poluting the switch statements
with large numbers of #ifdefs.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14960
The previous split of zeroing ifr_name and ifr_addr seperately is safe
on current architectures, but would be unsafe if pointers were larger
than 8 bytes. Combining the zeroing adds no real cost (a few
instructions) and makes the security property easier to verify.
Reviewed by: kib, emaste
Obtained from: CheriBSD
MFC after: 3 days
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14912
- The two types must be type-punnable for shared members of ifr_ifru.
This allows compatibility accessors to be shared.
- There must be no padding gap between ifr_name and ifr_ifru. This is
assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
broadly portable. This is true for all current architectures, but very
large (256-bit) fat-pointers could violate this invariant.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14910
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14846
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful. See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.
Make it possible to specify PCP value for outgoing packets on an
ethernet interface. When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value. The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.
Drivers might have issues with filtering VID 0 packets on
receive. This bug should be fixed for each driver.
Reviewed by: ae (previous version), hselasky, melifaro
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D14702
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).
Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).
PR: 222647, 223129
Reviewed by: gnn
Approved by: avg (mentor), mav (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12747