1384 Commits

Author SHA1 Message Date
andre
10b033d327 For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits.
The upper 32bits are not occupied for now.

Sponsored by:	The FreeBSD Foundation
2013-08-25 09:49:00 +00:00
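A minimal userland sketch of the idea in the message above, assuming a 64-bit flags word whose upper half is unused: the value is truncated to its low 32 bits before being handed to a `%x` conversion. The helper name `format_flags_low32` is hypothetical and this is not the kernel change itself.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: print only the low 32 bits of a 64-bit flags word.
 * %x expects an unsigned int, so the value is masked and cast explicitly.
 */
static int
format_flags_low32(char *buf, size_t len, uint64_t csum_flags)
{
	return (snprintf(buf, len, "%x",
	    (unsigned int)(csum_flags & 0xffffffffu)));
}
```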
andre
e3737c33e7 Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features.  The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
  layer specific union PH_loc for local use.  Protocols can flexibly overlay
  their own 8 to 64 bit fields to store information while the packet is
  worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
  instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
  information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
  rsstype field.  Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
  with the packet.  It is not yet supported in any drivers but allows us to
  get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
  a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
  from the start of the packet.  This is important for various offload
  capabilities and to relieve the drivers from having to parse the packet
  and protocol headers to find out location of checksums and other
  information.  Header parsing in drivers is a lot of copy-paste and
  unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
  packet information, like ether_vtag, tso_segsz and csum fields.
  Depending on the csum_flags settings some fields may have different usage
  making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
  stack) and inbound (up the stack) use.  The CSUM flags used to be a bit
  chaotic and rather poorly documented leading to incorrect use in many
  places.  Bring clarity into their use through better naming.
  Compatibility mappings are provided to preserve the API.  The drivers
  can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by:	The FreeBSD Foundation
2013-08-24 19:51:18 +00:00
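The "64bit protocol/layer specific union" described above can be pictured with a small sketch. The type `ph_loc_sketch` and its member names are hypothetical and only illustrate how 8 to 64 bit fields can overlay the same 8 bytes; the real pkthdr layout is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit scratch area: every member aliases the same 8 bytes,
 * so a protocol can pick whichever width suits its temporary state.
 */
union ph_loc_sketch {
	uint8_t		eight[8];
	uint16_t	sixteen[4];
	uint32_t	thirtytwo[2];
	uint64_t	sixtyfour[1];
};

/* The overlay must not grow the enclosing header. */
static_assert(sizeof(union ph_loc_sketch) == 8,
    "scratch union must stay 64 bits wide");

/* Example use: stash a per-packet counter while the packet is worked on. */
static void
ph_loc_set_count(union ph_loc_sketch *loc, uint32_t count)
{
	loc->sixtyfour[0] = 0;		/* clear all overlays first */
	loc->thirtytwo[0] = count;
}
```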
delphij
d76e7522db Fix an integer overflow in computing the size of a temporary buffer
that can result in a buffer which is too small for the requested
operation.

Security:	CVE-2013-3077
Security:	FreeBSD-SA-13:09.ip_multicast
2013-08-22 00:51:37 +00:00
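The class of bug named above (a size computation that wraps around and yields an undersized buffer) is commonly avoided by checking the multiplication before allocating. The following is a generic sketch under that assumption, not the actual fix from the advisory; `checked_array_size` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store count * elem_size in *out and return 0, or return -1 if the
 * product would not fit in a size_t.
 */
static int
checked_array_size(size_t count, size_t elem_size, size_t *out)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return (-1);	/* multiplication would wrap around */
	*out = count * elem_size;
	return (0);
}
```

A caller would allocate only when the helper returns 0, and reject the request otherwise.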
andre
7cc6cc696c Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
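Clearing protocol specific flags at a layer crossing amounts to masking off a reserved range of bits. A rough sketch follows; the mask value `SK_PROTOFLAGS` and the function name are hypothetical and do not reflect the real mbuf flag layout.

```c
#include <stdint.h>

/* Hypothetical: bits reserved for per-protocol use in a flags word. */
#define	SK_PROTOFLAGS	0x000ff000u

/* Drop all protocol specific bits and keep the global ones. */
static uint32_t
sk_clrprotoflags(uint32_t flags)
{
	return (flags & ~SK_PROTOFLAGS);
}
```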
andre
fd76db4587 Move the global M_SKIP_FIREWALL mbuf flag to a protocol layer specific
flag instead.  The flag is only used within the IP and IPv6 layer 3
protocols.

Because some firewall packages treat IPv4 and IPv6 packets the same, the
flag should have the same value for both.

Discussed with:	trociny, glebius
2013-08-19 11:08:36 +00:00
hrs
9b92a60da0 Return 0 in nbi->expire when la_expire == 0. Conversion from time_uptime to
time_second should not be performed in this case.
2013-08-17 07:14:45 +00:00
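The conversion described above can be sketched as a pure function. It assumes that an expiry of 0 is a special value that must be passed through unchanged, and that the current uptime and wall clock readings are supplied by the caller; the function and parameter names are hypothetical.

```c
#include <time.h>

/*
 * Convert an expiry expressed in uptime seconds to wall clock seconds.
 * An expiry of 0 is returned as 0 without any conversion.
 */
static time_t
expire_uptime_to_wall(time_t expire_uptime, time_t now_uptime,
    time_t now_wall)
{
	if (expire_uptime == 0)
		return (0);
	/* Shift by the current offset between the two clocks. */
	return (expire_uptime + (now_wall - now_uptime));
}
```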
hrs
c5a14d7164 Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6,
SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl.  These userland interfaces
treat expiration times in time_second, not time_uptime.
2013-08-06 17:10:52 +00:00
hrs
13c1bcf2c1 - Use time_uptime instead of time_second in data structures for
  PF_INET6 in kernel.  This fixes various malfunctions when the wall clock
  time is changed.  Bump __FreeBSD_version to 1000041.

- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.

MFC after:	1 month
2013-08-05 20:13:02 +00:00
hrs
05101f7501 Fix a panic in tmpaddrtimer. 2013-08-05 00:36:12 +00:00
hrs
64e5ea0653 Allocate in6_ifextra (ifp->if_afdata[AF_INET6]) only for IPv6-capable
interfaces.  This eliminates unnecessary IPv6 processing for non-IPv6
interfaces.

MFC after:	3 days
2013-07-31 16:24:49 +00:00
ae
afd48faca0 Remove the large part of struct ipsecstat. Only a few fields of this
structure are used, and they already have equivalent fields in struct
newipsecstat, which was introduced with FAST_IPSEC and then merged
with the old ipsecstat structure.

This fixes a kernel stack overflow on some architectures after the migration
of ipsecstat to PCPU counters.

Reported by:	Taku YAMAMOTO, Maciej Milewski
2013-07-23 14:14:24 +00:00
trociny
69ab640b6b A complete duplication of binding should be allowed if on both new and
duplicated sockets a multicast address is bound and either
SO_REUSEPORT or SO_REUSEADDR is set.

But actually it works for the following combinations:

  * SO_REUSEPORT is set for the first socket and SO_REUSEPORT for the new;
  * SO_REUSEADDR is set for the first socket and SO_REUSEADDR for the new;
  * SO_REUSEPORT is set for the first socket and SO_REUSEADDR for the new;

and fails for this:

  * SO_REUSEADDR is set for the first socket and SO_REUSEPORT for the new.

Fix the last case.

PR:		179901
MFC after:	1 month
2013-07-12 19:08:33 +00:00
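The intended rule from the message above can be written as a small predicate. This is only a model of the stated policy, assuming both sockets are already bound to a multicast address; `struct sock_opts` and `dup_mcast_bind_allowed` are hypothetical names, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical view of the two socket options relevant to the rule. */
struct sock_opts {
	bool	reuseaddr;	/* SO_REUSEADDR set */
	bool	reuseport;	/* SO_REUSEPORT set */
};

/*
 * Duplicate binding is allowed when each of the two sockets has at least
 * one of the options set, in any combination.
 */
static bool
dup_mcast_bind_allowed(struct sock_opts first, struct sock_opts newer)
{
	return ((first.reuseaddr || first.reuseport) &&
	    (newer.reuseaddr || newer.reuseport));
}
```

Under this model all four combinations listed in the message are accepted, including the one that previously failed.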
ae
9bfe2ac5dd Correct the size of allocated memory to store array of counters. 2013-07-09 15:20:46 +00:00
ae
08c6719ac4 Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters. 2013-07-09 09:59:46 +00:00
ae
e5b002a3b8 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
ae
1a36dfcc87 Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with:	arch@
2013-07-09 09:32:06 +00:00
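As a rough illustration of why uniform `uint64_t` fields help here, a per-CPU counter can be modeled as one 64-bit slot per CPU that is summed on read. This is a single-threaded userland sketch with a hypothetical fixed CPU count; it is not the kernel's counter implementation.

```c
#include <stdint.h>

#define	SKETCH_NCPU	4	/* hypothetical CPU count */

/* One 64-bit slot per CPU; each CPU only ever updates its own slot. */
struct pcpu_counter_sketch {
	uint64_t	slot[SKETCH_NCPU];
};

static void
pcpu_counter_add(struct pcpu_counter_sketch *c, unsigned int cpu, uint64_t n)
{
	c->slot[cpu % SKETCH_NCPU] += n;
}

/* Reading the statistic means summing all per-CPU slots. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter_sketch *c)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < SKETCH_NCPU; i++)
		sum += c->slot[i];
	return (sum);
}
```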
trociny
9b554dcd02 In r227207, to fix the issue with possible NULL inp_socket pointer
dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR
for multicast), INP_REUSEPORT flag was introduced to cache the socket
option.  It was decided then that one flag would be enough to cache
both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR
setsockopt(2), it was checked if it was called for a multicast address
and INP_REUSEPORT was set accordingly.

Unfortunately that approach does not work when setsockopt(2) is called
before binding to a multicast address: the multicast check fails and
INP_REUSEPORT is not set.

Fix this by adding INP_REUSEADDR flag to unconditionally cache
SO_REUSEADDR.

PR:		179901
Submitted by:	Michael Gmelin freebsd grem.de (initial version)
Reviewed by:	rwatson
MFC after:	1 week
2013-07-04 18:38:00 +00:00
hrs
50e0add9e4 - Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
  is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
  regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
  To configure an autoconfigured link-local address (RFC 4862), the
  following rc.conf(5) configuration can be used:

   ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface that is
  being added when the parent interface or one of the existing member
  interfaces has an IPv6 address.  if_bridge(4) merges the link-local
  scope zones formed by the member interfaces, which would otherwise
  cause an address scope violation.  Removing the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on member interfaces
  unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by:	rpaulo [*]
MFC after:	1 week
2013-07-02 16:58:15 +00:00
qingli
2478a430f6 Delete the nd6 entries associated with an off-link prefix
if the same prefix cannot be found on an alternative
interface.

Reviewed by:	hrs
MFC after:	1 week
2013-06-24 05:01:13 +00:00
ae
1e4c88cc8b Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

MFC after:	2 weeks
2013-06-20 09:55:53 +00:00
ae
71fe01331d Use PIM6STAT_INC() and MRT6STAT_INC() macros for IPv6 multicast
statistics accounting.

MFC after:	2 weeks
2013-06-19 21:50:17 +00:00
ae
fc846316eb Use RIP6STAT_INC() macro for raw ip6 statistics accounting.
MFC after:	2 weeks
2013-06-19 20:48:34 +00:00
ae
6cec63ef0a Use ICMP6STAT_INC() macro for ICMPv6 errors accounting.
MFC after:	2 weeks
2013-06-19 15:59:21 +00:00
melifaro
40bb6f2505 Really fix netmask address family this time.
MFC with:	r250813
2013-05-19 19:42:46 +00:00
melifaro
9f42266f8d Finish r85740: Make IPv6 netmasks have the address family set.
This pleases routing daemons like bird.

MFC after:	2 weeks
2013-05-19 19:19:01 +00:00
julian
329247aec2 Finally change the mbuf to have its own fib field instead of stealing
4 flag bits. This was supposed to happen in 8.0, and again in 2012..

MFC after:	never
2013-05-16 16:20:17 +00:00
tuexen
6ea39edf93 Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY
socket option for SCTP sockets in the same way as for UDP or TCP
sockets.

MFC after: 2 weeks
2013-05-10 18:09:38 +00:00
hrs
d9d71436d9 Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group
Address.  Although KAME implementation used FF02:0:0:0:0:2::/96 based on
older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed
in RFC 4620.

The kernel always joins the /104-prefixed address, and additionally joins
the /96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1.
The default value of the sysctl is 1.

ping6(8) -N flag now uses /104-prefixed one.  When this flag is specified
twice, it uses /96-prefixed one instead.

Reviewed by:		ume
Based on work by:	Thomas Scheffler
PR:			conf/174957
MFC after:		2 weeks
2013-05-04 19:16:26 +00:00
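The difference between the two prefixes can be sketched as follows. The sketch assumes the caller already has the leading bytes of the name hash (RFC 4620 appends the first 24 bits of an MD5 hash to the /104 prefix) and assumes the older /96 form takes 32 bits; the function name `ni_group_addr` is hypothetical and the hashing step is left out.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a Node Information group address from the leading hash bytes.
 * old_prefix == 0: FF02:0:0:0:0:2:FF00::/104 plus 24 hash bits.
 * old_prefix != 0: FF02:0:0:0:0:2::/96 plus 32 hash bits (assumed).
 */
static void
ni_group_addr(struct in6_addr *out, const uint8_t hash[4], int old_prefix)
{
	static const uint8_t base[16] = {
		0xff, 0x02, 0, 0, 0, 0, 0, 0,
		0, 0, 0x00, 0x02, 0xff, 0, 0, 0
	};

	memcpy(out->s6_addr, base, sizeof(base));
	if (old_prefix)
		memcpy(&out->s6_addr[12], hash, 4);	/* overwrites the FF */
	else
		memcpy(&out->s6_addr[13], hash, 3);
}
```

With hash bytes 12 34 56 78, the /104 form yields ff02::2:ff12:3456 and the /96 form yields ff02::2:1234:5678.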
glebius
b4bc270e8f Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
ae
28d7b5b903 Remove unused variable.
MFC after:	1 week
2013-04-24 10:24:01 +00:00
oleg
9917da6df0 Plug static llentry leak (ipv4 & ipv6 were affected).
PR:		kern/172985
MFC after:	1 month
2013-04-21 21:28:38 +00:00
tijl
40254de0a6 Fix build after r249543. 2013-04-16 16:59:29 +00:00
ae
586b63d9f3 Fix accounting after r249528; also add several more counters to
the statistics.
2013-04-16 11:31:26 +00:00
ae
bb1dffc2b9 Use IP6S_M2MMAX macro. 2013-04-16 11:19:13 +00:00
ae
e7b578dd8b Replace hardcoded numbers. 2013-04-16 11:12:58 +00:00
ae
dec8b563fa The source address selection algorithm tries to apply several rules
to the set of IPv6 addresses. Currently each attempt is counted in the IPv6
statistics, even if the given rule did not win. Change this and count only
those rules that won. Also add accounting for cases when the algorithm
fails to select an address.
2013-04-15 21:02:40 +00:00
ae
cd45f7487f Free memory after deleting an address policy entry.
MFC after:	1 week
2013-04-12 07:59:54 +00:00
ae
844d612b2a Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.
MFC after:	1 week
2013-04-09 07:11:22 +00:00
kevlo
0cbbbb7d30 Clean up some unused leftover code.
Pointed out by:	ae
2013-03-22 01:45:54 +00:00
kevlo
b0b955ade2 Remove unused global variables.
Reviewed by:	ae, glebius
2013-03-22 01:40:17 +00:00
glebius
f07362f54e - Use m_getcl() instead of hand allocating.
- Do not calculate constant length values at run time,
  CTASSERT() their sanity.
- Remove superfluous cleaning of mbuf fields after allocation.
- Replace compat macros with function calls.

Sponsored by:	Nginx, Inc.
2013-03-15 13:48:53 +00:00
glebius
79cb402edb - Use m_getcl() instead of hand allocating.
- Use m_get()/m_gethdr() instead of macros.
- Remove superfluous cleaning of mbuf fields after allocation.

Sponsored by:	Nginx, Inc.
2013-03-15 12:50:29 +00:00
glebius
ace684a132 Use m_getcl() instead of hand made allocation.
Sponsored by:	Nginx, Inc.
2013-03-15 12:33:23 +00:00
ae
4e920d3af6 Take the inpcb rlock before calculating the checksum; it was
accidentally moved in r191672.

Obtained from:	Yandex LLC
MFC after:	1 week
2013-03-12 02:20:20 +00:00
np
e7cfe70efd Generate lle_event in the IPv6 neighbor discovery code too.
Reviewed by:	bz@
2013-01-26 00:05:22 +00:00
np
09b8766144 Avoid NULL dereference in nd6_storelladdr when no mbuf is provided. It
is called this way from a couple of places in the OFED code.  (toecore
calls it too but that's going to change shortly).

Reviewed by:	bz@
2013-01-25 23:11:13 +00:00
ae
0bad7195e9 Simplify in6_setscope() function to get better performance.
Currently we use interface indices as zone IDs for link-local and
interface-local scopes, and since we don't have any tool to configure
zone IDs, there is no need to acquire the afdata lock several times per
packet only to read if_index value.
So, now in6_setscope reads zone IDs for interface-local, link-local and
global scopes without a lock.

Sponsored by:	Yandex LLC
MFC after:	2 weeks
2013-01-10 00:10:24 +00:00
ae
6be782d67f Remove unneeded variable.
MFC after:	1 week
2013-01-09 18:54:58 +00:00
ume
e33acd92c3 Add no_prefer_iface option.
It stops the source address selection rule from treating the address on
the interface as special, even when the interface is the outgoing interface.
This is desired in some situations.

Requested by:	hrs
Reviewed by:	IHANet folks including hrs
MFC after:	1 week
2013-01-09 18:18:08 +00:00
ae
5f7fde904c The in6_setscope() function determines the scope zone id of an address
and embeds it into the address. Inside the kernel we keep addresses with
an embedded zone id only for two scopes: link-local and interface-local.

For other scopes this function is a nop in most cases. To reduce the
overhead of locking, first check that the address is capable of embedding.
Also, handle the loopback address before acquiring the lock.

Sponsored by:	Yandex LLC
MFC after:	1 week
2013-01-09 00:36:06 +00:00
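The "capable of embedding" check described above can be approximated in userland by looking at the address bytes. The sketch below treats link-local unicast (fe80::/10) and multicast with interface-local or link-local scope as the only candidates; `needs_embedded_zone` is a hypothetical name and the real function's handling of the loopback address and locking is not modeled.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>

/*
 * Return true only for addresses in the two scopes that carry an embedded
 * zone id; everything else can skip the locked slow path.
 */
static bool
needs_embedded_zone(const struct in6_addr *a)
{
	const unsigned char *b = a->s6_addr;

	/* Link-local unicast: fe80::/10. */
	if (b[0] == 0xfe && (b[1] & 0xc0) == 0x80)
		return (true);
	/* Multicast: the scope is the low nibble of the second byte. */
	if (b[0] == 0xff) {
		unsigned int scope = b[1] & 0x0f;

		return (scope == 0x1 || scope == 0x2);
	}
	return (false);
}
```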