Commit Graph

3084 Commits

Author SHA1 Message Date
Gleb Smirnoff
67420bda02 Remove ifa_mtx. It was used only in one place in the kernel, and ifnet's
ifaddr lock can substitute it there.

Discussed with:	melifaro, ae
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:41:22 +00:00
Gleb Smirnoff
4675896098 Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:31:42 +00:00
Gleb Smirnoff
6ed910fabe Hide 'struct ifaddr' definition from userland. Two tools left that use it,
namely ipftest(1) and ifmcstat(1). These sniff structure definition using
_WANT_IFADDR define.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:19:24 +00:00
Mark Murray
72acff0f07 MFC - tracking commit. 2013-10-09 21:03:34 +00:00
Gleb Smirnoff
4cdc1f5421 There are some high performance NICs that count statistics in hardware,
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line thrashing '+=' operation in ether_input().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Reviewed by:	melifaro, adrian
Approved by:	re (marius)
2013-10-09 19:04:40 +00:00
Mark Murray
ad1f331196 Debug run. This now works, except that the "live" sources haven't
been tested. With all sources turned on, this unlocks itself in
a couple of seconds! That is on my box, and there is no guarantee
that this will be the case everywhere.

* Cut debug prints.

* Use the same locks/mutexes all the way through.

* Be a tad more conservative about entropy estimates.
2013-10-06 12:40:32 +00:00
Mark Murray
f02e47dc1e Snapshot. This passes the build test, but has not yet been finished or debugged.
Contains:

* Refactor the hardware RNG CPU instruction sources to feed into
the software mixer. This is unfinished. The actual harvesting needs
to be sorted out. Modified by me (see below).

* Remove 'frac' parameter from random_harvest(). This was never
used and adds extra code for no good reason.

* Remove device write entropy harvesting. This provided a weak
attack vector and was not very good at bootstrapping the device. A
replacement explicit reseed knob will follow.

* Separate out all the RANDOM_PURE sources into separate harvest
entities. This adds some security in the case where more than one
is present.

* Review all the code and fix anything obviously messy or inconsistent.
Address some review concerns while I'm here, such as renaming the
pseudo-rng to 'dummy'.

Submitted by:	Arthur Mesh <arthurmesh@gmail.com> (the first item)
2013-10-04 06:55:06 +00:00
Gleb Smirnoff
c7063c15b0 Clear knlist before destroying it in tap(4) and tun(4). This fixes a later
crash, when a kqueue descriptor tries to dereference the corresponding knotes.

Approved by:	re (kib)
2013-10-02 20:44:36 +00:00
Gleb Smirnoff
bdad3190a2 Fix a fallout from r241610. One enc interface must be created on startup.
Pointy hat to:	glebius
Reported by:	gavin
Approved by:	re (gjb)
2013-09-28 14:14:23 +00:00
Gleb Smirnoff
540b1a7238 Clean up SIOCSIFDSTADDR usage from ifnet drivers. The ioctl itself is
extremely outdated, and I doubt that it was ever used for ifnet drivers.
It was used for AF_INET sockets in pre-FreeBSD time.

Approved by:	re (hrs)
Sponsored by:	Nginx, Inc.
2013-09-11 09:19:44 +00:00
Dag-Erling Smørgrav
1a05c762b9 Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory.  [SA-13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks.  [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem.  [SA-13:13]

Security:	CVE-2013-5666
Security:	FreeBSD-SA-13:11.sendfile
Security:	CVE-2013-5691
Security:	FreeBSD-SA-13:12.ifioctl
Security:	CVE-2013-5710
Security:	FreeBSD-SA-13:13.nullfs
Approved by:	re
2013-09-10 10:05:59 +00:00
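
The sendfile(2) item above concerns how many bytes of the final page get sent. The following is a minimal userland sketch of a safe calculation, not the kernel code: the helper name, the fixed page size and the parameter names are all assumptions made for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Hypothetical helper: number of bytes to send from the page that
 * contains file offset `off`, given `remaining` bytes still requested
 * and a file that is `file_size` bytes long.
 */
static size_t
sketch_block_len(size_t off, size_t remaining, size_t file_size)
{
	size_t pgoff = off % SKETCH_PAGE_SIZE;
	size_t len = SKETCH_PAGE_SIZE - pgoff;	/* rest of this page */

	if (off >= file_size)
		return (0);			/* nothing left in the file */
	if (len > remaining)
		len = remaining;		/* caller asked for less */
	if (len > file_size - off)
		len = file_size - off;		/* never read past EOF */
	return (len);
}
```

The point of the sketch is the last clamp: the length of the final block is bounded by the bytes that actually remain in the file, so it can never be rounded up to a whole page.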
Mark Murray
a40c2646a4 Bring in some behind-the-scenes development, mainly by Arthur Mesh,
the rest by me.

o Namespace cleanup; the Yarrow name is now restricted to where it
  really applies; this is in anticipation of being augmented or
  replaced by Fortuna in the future. Fortuna is mentioned, but behind
  #if logic, and is ignorable for now.

o The harvest queue is pulled out into its own modules.

o Entropy harvesting is improved, both by being made more conservative,
  and by separating (a bit!) the sources. Available entropy crumbs are
  marginally improved.

o Selection of sources is made clearer. With recent revelations,
  this will receive more work in the weeks and months to come.

Submitted by:	 Arthur Mesh (partly) <arthurmesh@gmail.com>
2013-09-07 14:15:13 +00:00
Davide Italiano
ab97ad0806 Don't clear the unused SI_CHEAPCLONE flag in tap_create()/tuncreate().
Reviewed by:	kib
2013-09-07 13:50:13 +00:00
Mark Murray
9d32fc31c7 MFC 2013-09-07 07:58:29 +00:00
Davide Italiano
933e681d93 Retire netisr.netisr_direct and netisr.netisr_direct_force sysctls.
These were used to control/export dispatch policy but they're not anymore.
This commit cannot be MFC'ed to 9 because old netstat(9) binary relies
on such sysctl to work. On the other hand, there's no real reason to
keep'em around in 10.
2013-09-06 21:02:43 +00:00
Mark Murray
f27c28dc6e MFC 2013-08-30 11:38:34 +00:00
Adrian Chadd
310915a45a Convert the if_lagg rwlock to an rmlock.
We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.

The write lock is only grabbed when configuration changes are made - which
are infrequent.

With this patch, the contention and cycles spent waiting for updates
disappear.

Sponsored by:	Netflix, Inc.
2013-08-29 19:35:14 +00:00
Alfred Perlstein
29c463d633 Remove include opt_ofed.h since OFED is unifdef'd.
Pointed out by: glebius
2013-08-27 16:45:00 +00:00
Mark Murray
c495c93567 Snapshot; Do some running repairs on entropy harvesting. More needs to follow. 2013-08-26 18:35:21 +00:00
John Baldwin
fd77bbb967 Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by:	bde
Tested by:	exp-run by bdrewery
2013-08-26 18:16:05 +00:00
Andre Oppermann
edb159e1ea Remove unnecessary setup of the m->pkthdr.header pointer.
Sponsored by:	The FreeBSD Foundation
2013-08-25 09:41:37 +00:00
Alfred Perlstein
250053bc41 Remove the #ifdef OFED from the 20 byte mac in struct llentry.
With this change it is now possible to build the entire infiniband
stack as modules and load it dynamically including IP over IB.
2013-08-25 01:55:14 +00:00
Andre Oppermann
1b4381afbb Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features.  The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
  layer specific union PH_loc for local use.  Protocols can flexibly overlay
  their own 8 to 64 bit fields to store information while the packet is
  worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
  instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
  information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
  rsstype field.  Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
  with the packet.  It is not yet supported in any drivers but allows us to
  get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
  a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
  from the start of the packet.  This is important for various offload
  capabilities and to relieve the drivers from having to parse the packet
  and protocol headers to find out location of checksums and other
  information.  Header parsing in drivers is a lot of copy-paste and
  unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
  packet information, like ether_vtag, tso_segsz and csum fields.
  Depending on the csum_flags settings some fields may have different usage
  making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
  stack) and inbound (up the stack) use.  The CSUM flags used to be a bit
  chaotic and rather poorly documented leading to incorrect use in many
  places.  Bring clarity into their use through better naming.
  Compatibility mappings are provided to preserve the API.  The drivers
  can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by:	The FreeBSD Foundation
2013-08-24 19:51:18 +00:00
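
The PH_loc idea described above, one 64-bit cell that each protocol may view as narrower fields, can be sketched in plain C. This is an illustration only: the type and function names below are hypothetical and the layout is not claimed to match the real struct pkthdr.

```c
#include <stdint.h>

/*
 * Hypothetical 64-bit scratch area in the spirit of PH_loc: one
 * storage cell that a protocol may view as 8, 16, 32 or 64 bit fields.
 */
union sketch_ph_loc {
	uint8_t		eight[8];
	uint16_t	sixteen[4];
	uint32_t	thirtytwo[2];
	uint64_t	sixtyfour[1];
};

/* Stand-in for a packet header that carries the scratch area. */
struct sketch_pkthdr {
	int			len;
	union sketch_ph_loc	loc;
};

/* Example protocol use: keep a reassembly offset in the first 32 bits. */
static void
sketch_set_frag_off(struct sketch_pkthdr *ph, uint32_t off)
{
	ph->loc.thirtytwo[0] = off;
}

static uint32_t
sketch_get_frag_off(const struct sketch_pkthdr *ph)
{
	return (ph->loc.thirtytwo[0]);
}
```

Because every view shares the same eight bytes, the scratch area costs nothing extra per protocol; the trade-off is that only one layer at a time may rely on its contents.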
Andre Oppermann
804c784c13 Whitespace, style cleanups, and improved comments. 2013-08-24 12:03:24 +00:00
Andre Oppermann
737003b366 Rename PFIL_LIST_[UN]LOCK() to PFIL_HEADLIST_[UN]LOCK() to avoid
confusion with the pfil_head chain locking macros.
2013-08-24 11:24:15 +00:00
Andre Oppermann
8da0139975 Resolve the confusion between the head_list and the hook list.
The linked list of pfil hooks is changed to "chain" and this term
is applied consistently.  The head_list remains with "list" term.

Add KASSERT to vnet_pfil_uninit().

Update and extend comments.

Reviewed by:	eri (previous version)
2013-08-24 11:17:25 +00:00
Andre Oppermann
887c60fc86 Internalize pfil_hook_get(). There are no outside consumers of
this API, it is only safe for internal use and even the pfil(9)
man page says so in the BUGS section.

Reviewed by:	eri
2013-08-24 10:36:33 +00:00
Andre Oppermann
f13e611f7c Convert one instance of pfil hook callback missed in r254769. 2013-08-24 10:30:20 +00:00
Andre Oppermann
25da5060a4 Introduce typedef for pfil hook callback function and replace all
spelled out occurrences with it.

Reviewed by:	eri
2013-08-24 10:13:59 +00:00
Bjoern A. Zeeb
413e45bf81 After r241616 properly export ifi_baudrate_pf in the 32bit compat case.
MFC after:	3 days
2013-08-20 14:35:17 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Mark Johnston
5bc4f6b3ab Add a missing module version declaration to if_tun(4).
PR:		181078
Submitted by:	Brandon Gooch <jamesbrandongooch@gmail.com>
MFC after:	1 week
2013-08-07 01:32:08 +00:00
Hiroki Sato
12bdf23a3a sin6 should be assigned before the loop. 2013-07-28 20:02:41 +00:00
Hiroki Sato
9fcd8e9ebd - Relax the restriction on the member interfaces with LLAs. Two or more
  LLAs on the member interfaces are actually harmless when the parent
  interface does not have an LLA.

- Add net.link.bridge.allow_llz_overlap.  This is a knob to allow LLAs on
  a bridge and the member interfaces at the same time.  The default is 0.

Pointed out by:	ume
MFC after:	3 days
2013-07-28 19:49:39 +00:00
Adrian Chadd
49de4f2214 Break out the static, global LACP debug options into a per-lagg unit
sysctl tree.

* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1

tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.

Sponsored by:	Netflix
2013-07-26 19:41:13 +00:00
Adrian Chadd
387e754ae5 Fix typo.
Sponsored by:	Netflix
2013-07-25 19:10:23 +00:00
Marcel Moolenaar
ef1f916971 Decouple the UUID generator from network interfaces by having MAC
addresses added to the UUID generator using uuid_ether_add(). The
UUID generator keeps an arbitrary number of MAC addresses, under
the assumption that they are rarely removed (= uuid_ether_del()).
This achieves the following:
1.  It brings us closer to having the network stack as a loadable
    module.
2.  It allows the UUID generator to filter MAC addresses for best
    results (= highest chance of uniqueness).
3.  MAC addresses can come from anywhere, irrespective of whether
    they are used for an interface or not.

A side-effect of the change is that when no MAC addresses have been
added, a random multicast MAC address is created once and re-used if
needed. Previously, when a random MAC address was needed, it was
created for every call. Thus, a change in behaviour is introduced
for when no MAC addresses exist.

Obtained from:	Juniper Networks, Inc.
2013-07-24 04:24:21 +00:00
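
The behaviour described above (registered MAC addresses are preferred, and a random multicast address is created once and then reused) can be sketched in userland C. All names are hypothetical, the fixed table size is an assumption (the commit speaks of an arbitrary number of addresses), and the multicast filter is only a guess at what "filter for best results" might mean.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ADDR_LEN		6
#define SKETCH_MAX_ADDRS	4	/* assumed limit, illustration only */

static uint8_t	sketch_addrs[SKETCH_MAX_ADDRS][SKETCH_ADDR_LEN];
static int	sketch_inuse[SKETCH_MAX_ADDRS];
static uint8_t	sketch_random[SKETCH_ADDR_LEN];
static int	sketch_have_random;

/* Register a MAC address; returns 0 on success, -1 if filtered or full. */
static int
sketch_ether_add(const uint8_t *addr)
{
	int i;

	if (addr[0] & 0x01)	/* assumed filter: skip multicast addresses */
		return (-1);
	for (i = 0; i < SKETCH_MAX_ADDRS; i++) {
		if (!sketch_inuse[i]) {
			memcpy(sketch_addrs[i], addr, SKETCH_ADDR_LEN);
			sketch_inuse[i] = 1;
			return (0);
		}
	}
	return (-1);
}

/* Forget a MAC address; returns 0 if it was registered, -1 otherwise. */
static int
sketch_ether_del(const uint8_t *addr)
{
	int i;

	for (i = 0; i < SKETCH_MAX_ADDRS; i++) {
		if (sketch_inuse[i] &&
		    memcmp(sketch_addrs[i], addr, SKETCH_ADDR_LEN) == 0) {
			sketch_inuse[i] = 0;
			return (0);
		}
	}
	return (-1);
}

/* Pick the node field: a registered address, else the cached random one. */
static void
sketch_node(uint8_t *out)
{
	int i;

	for (i = 0; i < SKETCH_MAX_ADDRS; i++) {
		if (sketch_inuse[i]) {
			memcpy(out, sketch_addrs[i], SKETCH_ADDR_LEN);
			return;
		}
	}
	if (!sketch_have_random) {
		for (i = 0; i < SKETCH_ADDR_LEN; i++)
			sketch_random[i] = (uint8_t)rand();
		sketch_random[0] |= 0x01;	/* mark as multicast */
		sketch_have_random = 1;		/* created once, then reused */
	}
	memcpy(out, sketch_random, SKETCH_ADDR_LEN);
}
```

With no addresses registered, repeated calls to sketch_node() return the same random value, which mirrors the change in behaviour the commit message points out.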
Craig Rodrigues
719fb72517 PR: 168520 170096
Submitted by: adrian, zec

Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.

(1)  Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
     device_attach().  This fixes multiple VIMAGE related kernel panics
     when trying to attach Bluetooth or USB Ethernet devices because
     curthread->td_vnet is NULL.

(2)  Set curthread->td_vnet in if_detach().  This fixes kernel panics when detaching networking
     interfaces, especially USB Ethernet devices.

(3)  Use VNET_DOMAIN_SET() in ng_btsocket.c

(4)  In ng_unref_node() set curthread->td_vnet.  This fixes kernel panics
     when detaching Netgraph nodes.
2013-07-15 01:32:55 +00:00
Adrian Chadd
31402c27b8 Bring over some link aggregation / LACP protocol improvements and debugging
additions.

* Add some new tracing events to aid in debugging.
* Add in a debugging mode to drop transmit and received frames, specifically
  to test whether seeing or hearing heartbeats correctly cause LACP to
  drop the port.
* Add in (and make default) a strict LACP mode, which requires the
  heartbeat on a port to be heard before it's used.  Sometimes vendor ports
  will hang but the link layer stays up, resulting in hung traffic.
* Add logging the number of link status flaps, again to aid in debugging
  badly behaving switch ports.
* Calculate the lagg interface port speed as the multiple of the
  configured ports, rather than the largest.

Obtained from:	Netflix
MFC after:	2 weeks
2013-07-13 04:25:03 +00:00
Hiroki Sato
4825b1e098 Add a leaf node CTL_NET.PF_ROUTE.0.AF.NET_RT_DUMP.0.FIB. This returns
routing table with the specified FIB number, not td->td_proc->p_fibnum.
2013-07-12 12:36:12 +00:00
Hiroki Sato
e9f947e27c - Drop GIF_ACCEPT_REVETHIP flag by default.
- Add IFF_MONITOR support.
2013-07-12 12:18:07 +00:00
Andrey V. Elsukov
9bea6fd6c6 Correct CTASSERT condition. 2013-07-09 15:10:27 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
Andrey V. Elsukov
7daad711df Add several macros to help migrate statistics structures to PCPU counters. 2013-07-09 09:37:21 +00:00
Andrey V. Elsukov
c80211e3cf Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with:	arch@
2013-07-09 09:32:06 +00:00
Colin Percival
d36ed80a7b Fix typo: minmum -> minimum.
Submitted by:	@z3ndrag0n
2013-07-05 23:40:08 +00:00
Hiroki Sato
6facd7a6b8 Fix a compiler warning.
MFC after:	1 week
2013-07-03 07:31:07 +00:00
Hiroki Sato
af8056441e - Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
  is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
  regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
  To configure an autoconfigured link-local address (RFC 4862), the
  following rc.conf(5) configuration can be used:

   ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
  added when the parent interface or one of the existing member
  interfaces has an IPv6 address.  if_bridge(4) merges each link-local
  scope zone which the member interfaces form respectively, so it causes
  address scope violation.  Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on member interfaces
  unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by:	rpaulo [*]
MFC after:	1 week
2013-07-02 16:58:15 +00:00
Qing Li
f672f56f21 Due to the routing related networking kernel redesign work
in FBSD 8.0, interface routes have been returned to the
applications without the RTF_GATEWAY bit. This incompatibility
has caused some issues with Zebra, Quagga and the like.
This patch provides the RTF_GATEWAY flag bit in returned interface
routes so as to behave similarly to pre-8.0 systems.

Reviewed by:	    hrs
Verified by:	    mackn at opendns dot com
2013-06-25 00:10:49 +00:00
Xin LI
eda6cf02b8 Return ENETDOWN instead of ENOENT when all lagg(4) links are
inactive when upper layer tries to transmit packet.  This
gives better feedback and meaningful errors for applications.

MFC after:	2 weeks
Reviewed by:	thompsa
2013-06-17 19:31:03 +00:00
Hiroki Sato
8b20f6cf8a Return ENETDOWN when the parent interface is down.
MFC after:	1 week
2013-06-16 04:40:02 +00:00
Mikolaj Golub
f8afe33795 Properly set curvnet context in lagg_port_setlladdr() task handler.
Reported by:	Nikos Vassiliadis <nvass gmx.com>
Submitted by:	zec
Tested by:	Nikos Vassiliadis <nvass gmx.com>
MFC after:	1 week
2013-06-07 10:27:50 +00:00
John Baldwin
8aa9937318 Fix build with both INET and INET6 disabled. 2013-06-04 20:40:16 +00:00
Andre Oppermann
3c914c547e Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8191 bytes) as anything
less wouldn't be very useful anymore.  The upper limit is still at
IP_MAXPACKET (65535 bytes).  Raising it requires further auditing of
the IPv4/v6 code paths as the length field in the IP header would
overflow, leading to confusion in firewalls and other packet handlers
about the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found.  When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by:	cperciva (earlier version)
Reviewed by:	cperciva
Tested by:	cperciva
MFC after:	1 week (using spare struct members to preserve ABI)
2013-06-03 12:55:13 +00:00
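
The limits described above can be sketched as a small range check. This is a userland illustration under stated assumptions: the struct and function are hypothetical stand-ins, treating 0 as "driver did not set a limit" is an assumption, and the constant is meant to mirror IP_MAXPACKET from <netinet/ip.h>.

```c
/* Assumed to mirror IP_MAXPACKET from <netinet/ip.h>. */
#define SKETCH_IP_MAXPACKET	65535u

/* Hypothetical stand-in for the relevant part of struct ifnet. */
struct sketch_ifnet {
	unsigned int	hw_tsomax;	/* 0: driver did not set a limit */
};

/*
 * Apply the default when the driver set nothing, then check the range.
 * Returns 0 if the limit is acceptable and -1 if it is out of range.
 */
static int
sketch_check_tsomax(struct sketch_ifnet *ifp)
{
	if (ifp->hw_tsomax == 0)
		ifp->hw_tsomax = SKETCH_IP_MAXPACKET;
	if (ifp->hw_tsomax < SKETCH_IP_MAXPACKET / 8 ||
	    ifp->hw_tsomax > SKETCH_IP_MAXPACKET)
		return (-1);
	return (0);
}
```

A driver limited to 32 KB per TSO burst would set the field to 32768 before attaching and pass the check; a value of 4096 would be rejected as too small to be useful.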
Luigi Rizzo
f18be5766f Bring in a number of new features, mostly implemented by Michio Honda:
- the VALE switch now supports up to 254 destinations per switch,
  unicast or broadcast (multicast goes to all ports).

- we can attach hw interfaces and the host stack to a VALE switch,
  which means we will be able to use it more or less as a native bridge
  (minor tweaks still necessary).
  A 'vale-ctl' program is supplied in tools/tools/netmap
  to attach/detach ports to/from the switch, and list the current configuration.

- the lookup function in the VALE switch can be reassigned to
  something else, similar to the pf hooks. This will enable
  attaching the firewall, or other processing functions (e.g. in-kernel
  openvswitch) directly on the netmap port.

The internal API used by device drivers does not change.

Userspace applications should be recompiled because we
bump NETMAP_API as we now use some fields in the struct nmreq
that were previously ignored -- otherwise, data structures
are the same.

Manpages will be committed separately.
2013-05-30 14:07:14 +00:00
Luigi Rizzo
27892e02fb clarify usage of NETMAP_BUF 2013-05-30 13:41:19 +00:00
Guy Helmer
d013d9022a While waiting for the bpf hold buffer to become idle, check
the return value from mtx_sleep() and exit bpfread() on
errors such as EINTR.

Reviewed by:	jhb
2013-05-23 21:33:10 +00:00
Ed Schouten
6ed0f50f78 Allow certain headers to be included more easily.
Spotted by:	http://hacks.owlfolio.org/header-survey/
2013-05-21 21:20:10 +00:00
Alexander V. Chernikov
22f8ce4335 Use separate function to update mbuf checksum flags instead of
duplicating the same code in different places.

MFC after:	2 weeks
2013-05-18 08:14:21 +00:00
Alexander V. Chernikov
d54455b0c9 Fix rte leak introduced in r248070.
MFC after:	2 weeks
2013-05-18 07:10:22 +00:00
Julian Elischer
4871fc4ab5 Finally change the mbuf to have its own fib field instead of stealing
4 flag bits. This was supposed to happen in 8.0, and again in 2012..

MFC after:	never
2013-05-16 16:20:17 +00:00
Hiroki Sato
b8992a6792 Add IFF_MONITOR support to gre(4).
Tested by:	Chip Marshall
MFC after:	1 week
2013-05-11 19:05:38 +00:00
Andre Oppermann
f89d4c3acf Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.
2013-05-06 16:42:18 +00:00
Eitan Adler
578acad37e Correct a few sizeof()s
Submitted by:	swildner@DragonFlyBSD.org
Reviewed by:	alfred
2013-05-01 04:37:34 +00:00
Luigi Rizzo
c10b5796c0 remove $Id$ (whitespace change) 2013-04-30 16:00:21 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Oleg Bulyzhin
2c5b403e2d Recover missing arp_ifinit() call.
MFC after:	2 weeks
2013-04-18 20:13:33 +00:00
Gleb Smirnoff
b64478a137 Switch lagg(4) statistics to counter(9).
The lagg(4) is often used to bond high speed links, so a basic per-packet +=
on statistics causes cache misses and statistics loss.

The perfect solution would be to convert ifnet(9) to counter(9), but this
requires much more work and, unfortunately, an ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by:	Nginx, Inc.
2013-04-15 13:00:42 +00:00
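
The scheme described above (per-CPU counters kept in the softc, folded into the legacy counters once per second) can be sketched in userland C. This is loosely modeled on the idea behind counter(9) and is not its implementation; every name and the fixed CPU count are assumptions.

```c
#include <stdint.h>

#define SKETCH_NCPU	4	/* assumed CPU count, illustration only */

/* One slot per CPU so that updates never share a cache line in spirit. */
struct sketch_counter {
	uint64_t	percpu[SKETCH_NCPU];
};

/* Hypothetical softc: fast counter plus the legacy, externally visible one. */
struct sketch_softc {
	struct sketch_counter	ipackets;
	uint64_t		legacy_ipackets;
};

/* Hot path: bump only the slot that belongs to the calling CPU. */
static void
sketch_counter_add(struct sketch_counter *c, unsigned int cpu, uint64_t n)
{
	c->percpu[cpu % SKETCH_NCPU] += n;
}

/* Slow path: sum all slots to get the total. */
static uint64_t
sketch_counter_fetch(const struct sketch_counter *c)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < SKETCH_NCPU; i++)
		sum += c->percpu[i];
	return (sum);
}

/* Called once per second: publish the total to the legacy counter. */
static void
sketch_tick(struct sketch_softc *sc)
{
	sc->legacy_ipackets = sketch_counter_fetch(&sc->ipackets);
}
```

The legacy value lags by up to one tick, which is the price paid for keeping the per-packet path free of shared writes.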
Gleb Smirnoff
18ba072a22 Fix build. 2013-04-10 08:09:25 +00:00
Andre Oppermann
e8b3186b6a Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.
2013-04-09 21:02:20 +00:00
Andrey V. Elsukov
9cb8d207af Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.
MFC after:	1 week
2013-04-09 07:11:22 +00:00
Mark Johnston
83a3ff21a8 Ignore interface renames instead of removing the interface from the bridge
group.

Reviewed by:	rstone
Approved by:	rstone (co-mentor)
Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2013-03-28 20:37:07 +00:00
Gleb Smirnoff
209dddb90e Remove __FreeBSD_version ifdefs. 2013-03-22 20:44:16 +00:00
Andrey V. Elsukov
5474386bd3 Fix style and comments. 2013-03-19 05:51:47 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Gleb Smirnoff
c69f77c339 - Use m_getcl() instead of hand allocating.
- Convert panic() to KASSERT.
- Remove superfluous cleaning of mbuf fields after allocation.
- Add comment on possible use of m_get2() here.

Sponsored by:	Nginx, Inc.
2013-03-15 12:52:59 +00:00
Gleb Smirnoff
41a7572b26 Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
Gleb Smirnoff
129004c56f Reinitialize eh after pfil(9) processing.
PR:		176764
Submitted by:	adri
2013-03-11 12:06:57 +00:00
Alexander V. Chernikov
3034f43f2f Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with:	andre, eri
MFC after:	3 weeks
2013-03-08 20:33:50 +00:00
Alexander V. Chernikov
14126522cf Write lock is not required for find&compare operation.
MFC after:	2 weeks
2013-03-05 13:38:45 +00:00
Gleb Smirnoff
e2a55a0021 Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-15 09:03:56 +00:00
Gleb Smirnoff
24421c1c32 Resolve source address selection in the presence of CARP. Add a couple
of helper functions:

- carp_master()   - boolean function which is true if an address
		    is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
		    and is aware of CARP.

  Utilize ifa_preferred() in ifa_ifwithnet().

  The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. Maybe we will approach this problem again later.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-11 10:58:22 +00:00
Randall Stewart
ded5ea6a25 This fixes an out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then, when sending with xmit(), encountering
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue(), only with drbr_peek(). Most
of the time, in putback, we would not need to copy it
back since most likely the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimal case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will need a test and atomic in the putback, possibly
a separate putback_mc() in the ring buf.

Reviewed by:	jhb@freebsd.org, jlv@freebsd.org
2013-02-07 15:20:54 +00:00
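
The peek/advance/putback pattern described above can be sketched as a single-consumer ring in userland C. This is not the drbr code: the ring, the function names, the xmit callback signature and the "room" counter that simulates a transmit ring filling up are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RING_SIZE	8

struct sketch_ring {
	void	*slot[SKETCH_RING_SIZE];
	size_t	 head;		/* consumer index */
	size_t	 count;		/* entries currently queued */
};

static int
sketch_enqueue(struct sketch_ring *r, void *m)
{
	if (r->count == SKETCH_RING_SIZE)
		return (-1);
	r->slot[(r->head + r->count) % SKETCH_RING_SIZE] = m;
	r->count++;
	return (0);
}

/* Look at the head entry without removing it. */
static void *
sketch_peek(struct sketch_ring *r)
{
	return (r->count > 0 ? r->slot[r->head] : NULL);
}

/* Drop the head entry: it was sent, or xmit freed it. */
static void
sketch_advance(struct sketch_ring *r)
{
	assert(r->count > 0);
	r->head = (r->head + 1) % SKETCH_RING_SIZE;
	r->count--;
}

/* Store the (possibly changed) pointer back at the head; order is kept. */
static void
sketch_putback(struct sketch_ring *r, void *m)
{
	assert(r->count > 0);
	r->slot[r->head] = m;
}

/* Returns 0 on success; on failure sets *mp to NULL if it freed the buffer. */
typedef int (*sketch_xmit_t)(void **mp);

static int sketch_hw_room;	/* simulated free transmit descriptors */

static int
sketch_xmit_budget(void **mp)
{
	(void)mp;
	if (sketch_hw_room == 0)
		return (-1);	/* ring full, buffer left untouched */
	sketch_hw_room--;
	return (0);
}

/* Send until the ring is empty or xmit fails; returns packets sent. */
static int
sketch_drain(struct sketch_ring *r, sketch_xmit_t xmit)
{
	void *m;
	int sent = 0;

	while ((m = sketch_peek(r)) != NULL) {
		if (xmit(&m) != 0) {
			if (m == NULL)
				sketch_advance(r);	/* xmit freed it */
			else
				sketch_putback(r, m);	/* retry it first later */
			break;
		}
		sketch_advance(r);
		sent++;
	}
	return (sent);
}
```

Since a failed packet never leaves the head of the ring, the next drain retries it before anything queued behind it, which is what preserves ordering.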
Gleb Smirnoff
9711a168b9 Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in-tree code no longer uses it.

Reviewed by:	ru, andre, net@
2013-01-31 08:55:21 +00:00
Gleb Smirnoff
1910bfcba2 route_output() always supplies info with RTAX_GATEWAY member that
points to a sockaddr of AF_LINK family. Assert this instead of
checking.
2013-01-29 21:44:22 +00:00
Navdeep Parhar
4364ec0852 Move lle_event to if_llatbl.h
lle_event replaced arp_update_event after the ARP rewrite and ended up
in if_ether.h simply because arp_update_event used to be there too.
IPv6 neighbor discovery is going to grow lle_event support and this is a
good time to move it to if_llatbl.h.

The two in-tree consumers of this event - OFED and toecore - are not
affected.

Reviewed by:	bz@
2013-01-25 23:58:21 +00:00
Gleb Smirnoff
ed63043b21 - Utilize m_get2(), accidentally fixing some signedness bugs.
- Return EMSGSIZE in both cases if uio_resid is oversized or undersized.
- No need to clear rcvif.
2013-01-24 14:29:31 +00:00
Luigi Rizzo
01c039a19c leftover from r245579... flags for semi transparent mode and direct
forwarding through a VALE switch
2013-01-23 03:49:48 +00:00
Gleb Smirnoff
1d9797f128 If lagg(4) can't forward a packet due to underlying port problems,
return much more meaningful ENETDOWN to the stack, instead of EBUSY.
2013-01-21 08:59:31 +00:00
Gleb Smirnoff
f6eef2c2d6 - Add dashes before copyright notices.
- Add $FreeBSD$.
- Remove unused define.
2013-01-07 19:36:11 +00:00
Peter Wemm
a116ec4b5e Juggle some internal symbols from our antique zlib (that originally came
in from kernel-pppd which is long gone) so that ZFS and DTRACE play nice.

This is a horrible hack to get freefall to compile, and is in dire need
of reconciliation.  This antique zlib-1.04 code needs to go away.
2013-01-06 14:59:59 +00:00
Andrey V. Elsukov
e37e7917f3 Add an ability to set net.link.stf.permit_rfc1918 from the loader.
MFC after:	2 weeks
2012-12-27 21:26:08 +00:00
Andrey V. Elsukov
51743c5f73 Add net.link.stf.permit_rfc1918 sysctl variable. It can be used to allow
the use of private IPv4 addresses with stf(4).

MFC after:	2 weeks
2012-12-27 20:59:22 +00:00
Kevin Lo
c7dada99bb Fix typo in comment.
Reviewed by:	thompsa
2012-12-18 06:37:23 +00:00
Gleb Smirnoff
b1ec2940af Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by:	Ian FREISLICH <ianf clue.co.za>
2012-12-13 11:11:15 +00:00
Guy Helmer
3b3b91e736 Changes to resolve races in bpfread() and catchpacket() that, at worst,
cause kernel panics.

Add a flag to the bpf descriptor to indicate whether the hold buffer
is in use. In bpfread(), set the "hold buffer in use" flag before
dropping the descriptor lock during the call to bpf_uiomove().
Everywhere else the hold buffer is used or changed, wait while
the hold buffer is in use by bpfread(). Add a KASSERT in bpfread()
after re-acquiring the descriptor lock to assist uncovering any
additional hold buffer races.
2012-12-10 16:14:44 +00:00
Hiroki Sato
0bebb5448b - Move definition of V_deembed_scopeid to scope6_var.h.
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.
2012-12-05 19:45:24 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Hiroki Sato
5c9fa630f6 - Fix LOR in sa6_recoverscope() in rt_msg2()[1].
- Check V_deembed_scopeid before checking if sa_family == AF_INET6.
- Fix scope id handling in route(8)[2] and ifconfig(8).

Reported by:	rpaulo[1], Mateusz Guzik[1], peter[2]
2012-12-04 17:12:23 +00:00
Alexander V. Chernikov
f079a0fa8c Fix bpf_if structure leak introduced in r235745.
Move all such structures to delayed-free lists and
delete all matching on interface departure event.

MFC after:	1 week
2012-12-02 21:43:37 +00:00