Currently when net.inet.carp.allow=0 CARP state remains as MASTER, which is
not very useful (if there are other masters -- it can lead to split brain,
if there are none -- it makes no sense). Having it as INIT makes it clear
that carp packets are disabled.
Submitted by: wg
MFC after: 1 month
Relnotes: yes
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14477
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.
Submitted by: mmacy <mmacy@mattmacy.io>
Reviewed by: shurd, sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14969
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).
Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
In all cases where cif_vrs list is modified, two locks are held: per-ifnet
CIF_LOCK and global carp_sx. It means to read that list only one of them
is enough to be held, so we can skip CIF_LOCK when we already have carp_sx.
This fixes kernel panic, caused by attempts of copyout() to sleep while
holding non-sleepable CIF_LOCK mutex.
Discussed with: glebius
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).
This cause an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.
This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.
There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.
Reviewed by: glebius
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
structs under the INET6 #ifdef. Similarly (even though it doesn't seem
to affect the build), conditionalize all IPv4 structs under the INET
#ifdef
This also unbreaks the LINT-NOINET6 tinderbox target on amd64; I have not
verified other MACHINE/TARGET pairs (e.g. armv6/arm).
MFC after: 2 weeks
X-MFC with: r310847
Pointyhat to: jpaetzel
Reported by: O. Hartmann <o.hartmann@walstatt.org>
If there is a loop in the network a CARP that is in MASTER state will see it's
own broadcasts, which will then cause it to assume BACKUP state. When it
assumes BACKUP it will stop sending advertisements. In that state it will no
longer see advertisements and will assume MASTER...
We can't catch all the cases where we are seeing our own CARP broadcast, but
we can catch the obvious case.
Submitted by: torek
Obtained from: FreeNAS
MFC after: 2 weeks
Sponsored by: iXsystems
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.
This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.
We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.
Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce
This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.
The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link
Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.
PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111
Rename arp_ifinit2() into arp_announce_ifaddr().
Eliminate zeroing ifa_rtrequest: it was used for calling arp_rtrequest()
which was responsible for handling route cloning requests. It became
obsolete since r186119 (L2/L3 split).
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3573
- Use the carp_sx to serialize not only CARP ioctls, but also carp_attach()
and carp_detach().
- Use cif_mtx to lock only access to those the linked list.
- These locking changes allow us to do some memory allocations with M_WAITOK
and also properly call callout_drain() in carp_destroy().
- In carp_attach() assert that ifaddr isn't attached. We always come here
with a pristine address from in[6]_control().
Reviewed by: oleg
Sponsored by: Nginx, Inc.
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
datagrams to any value, to improve performance. The behaviour is
controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.
Differential Revision: https://reviews.freebsd.org/D2177
Reviewed by: adrian, cy, rpaulo
Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Relnotes: yes
sys/netinet/ip_carp.c:
Add a "reason" string parameter to carp_set_state() and
carp_master_down_locked() allowing more specific logging
information to be passed into these apis.
Refactor existing state transition logging into a single
log call in carp_set_state().
Update all calls to carp_set_state() and
carp_master_down_locked() to pass an appropriate reason
string. For state transitions that were previously logged,
the output should be unchanged.
Submitted by: gibbs (original), asomers (updated)
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1039697 on 2014/02/11 (original)
1049992 on 2014/03/21 (updated)
CARP devices are created with advskew set to '0' and once you set it to
any other value in the valid range (0..254) you can't set it back to zero.
The code in question is also used to prevent that zeroed values overwrite
the CARP defaults when a new CARP device is created. Since advskew already
defaults to '0' for newly created devices and the new value is guaranteed
to be within the valid range, it is safe to overwrite it here.
PR: 194672
Reported by: cmb@pfsense.org
In collaboration with: garga
Tested by: garga
MFC after: 2 weeks
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:
- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().
This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.
Differential Revision: https://reviews.freebsd.org/D1436
Reviewed by: glebius, trasz, gnn, bz
Sponsored by: EMC / Isilon Storage Division
* Set ia address/mask values BEFORE attaching to address lists.
Inet6 address assignment is not atomic, so the simplest way to
do this atomically is to fill in ia before attach.
* Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code).
* Do some renamings:
in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
in6_ifaddloop -> nd6_add_ifa_lle
in6_ifremloop -> nd6_rem_ifa_lle
* Split working with LLE and route announce code for last two.
Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
* Call device SIOCSIFADDR handler IFF we're adding first address.
In IPv4 we have to call it on every address change since ARP record
is installed by arp_ifinit() which is called by given handler.
IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so
there is no reason to call SIOCSIFADDR often.
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
of helper functions:
- carp_master() - boolean function which is true if an address
is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
and is aware of CARP.
Utilize ifa_preferred() in ifa_ifwithnet().
The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.
Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
Sponsored by: Nginx, Inc
to the current demotion factor instead of assigning it.
This allows external scripts to control demotion factor together
with kernel in a raceless manner.
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.
After this change a packet processed by the stack isn't
modified at all[2] except for TTL.
After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.
[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.
[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.
Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
but not for IPv6. The current checks in nd6_nbr.c along with the
old version will result in ifa being NULL and subsequently the
packet will be dropped. This prevented NS/NA, from working and
with that IPv6.
Now return the ifa from the carp lookup function in two cases:
1) if the address matches, is a carp address, and we are MASTER
(as before),
2) if the address matches but it is not a carp address at all (new).
Reported by: Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood)
Tested on: New Y! FreeBSD cluster machines
Reviewed by: glebius
we look at count of addresses on a particular vhid, we
should account number of addresses on cif.
To achieve this we need to run carp_attach() and
carp_detach() under appropriate cif lock.
this commit is not enough to enable CARP operation on
if_bridge(4), because the latter doesn't handle or even
initialize its ifp->if_link_state.
Reported by: Alexander Lunev <sol289 gmail.com>