Commit Graph

2811 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
9dba179d5e IFC @231845
Sponsored by:	Cisco Systems, Inc.
2012-02-17 00:27:48 +00:00
Tijl Coosemans
265f940acc Change some headers such that lang/gcc* ports no longer patch them.
The lang/gcc* ports patch headers where they think something is
non-standard. These patched headers override the system headers which means
you have to rebuild these ports whenever you do installworld to make sure
they contain the latest changes.
2012-02-14 12:50:20 +00:00
Bjoern A. Zeeb
6d076ae8f7 Introduce a new NET_RT_IFLISTL API to query the address list. It works
on extended and extensible structs if_msghdrl and ifa_msghdrl.  This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.

Bump __FreeBSD_version to allow ports to more easily detect the new API.

Reviewed by:	glebius, brooks
MFC after:	3 days
2012-02-11 06:02:16 +00:00
Bjoern A. Zeeb
e82cf13bfb Backout changes from r228571. Remove if_data from struct ifa_msghdr again.
While this breaks carp on HEAD temporary, it restores the upgrade path from
stable, and head before 20111215.

Reviewed by:	glebius, brooks
2012-02-11 05:59:54 +00:00
Sergey Kandaurov
4ecf274be7 g/c last bit of old ipv6 prefix management.
Reviewed by:	bz
Obtained from:	NetBSD, net/if.h, rev 1.80
2012-02-08 22:05:26 +00:00
Luigi Rizzo
5819da83ce - change the buffer size from a constant to a
TUNABLE variable (hw.netmap.buf_size) so we can experiment
  with values different from 2048 which may give better cache performance.

- rearrange the memory allocation code so it will be easier
  to replace it with a different implementation. The current code
  relies on a single large contiguous chunk of memory obtained through
  contigmalloc.
  The new implementation (not committed yet) uses multiple
  smaller chunks which are easier to fit in a fragmented address
  space.
2012-02-08 11:43:29 +00:00
Pawel Jakub Dawidek
50c8ec53f6 Allow to set if_bridge(4) sysctls from /boot/loader.conf.
MFC after:	3 days
2012-02-07 14:50:33 +00:00
Gleb Smirnoff
e8aa8bdd64 Fix typo in r231010.
Submitted by:	linimon
2012-02-05 12:52:28 +00:00
Gleb Smirnoff
cad582b753 Better comment for ifa_init(), ifa_ref(), ifa_free(). 2012-02-05 08:53:05 +00:00
Gleb Smirnoff
adc7231d5d In ifa_init() initialize if_data.ifi_datalen. This would be
required after upcoming changes from bz@.

Discussed with:	bz
2012-02-05 08:31:15 +00:00
Bjoern A. Zeeb
81d5d46b3c Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
  the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
  where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
  prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
  ease MFCs.  Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
  currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
  are locked to the default FIB.  Individual IPv6 addresses will only
  appear in the default FIB, however redirect information and prefixes
  of connected subnets are automatically propagated to all FIBs by
  default (mimicking IPv4 behavior as closely as possible).

Sponsored by:	Cisco Systems, Inc.
2012-02-03 13:08:44 +00:00
Bjoern A. Zeeb
a84986256f Move a comment from rtinit1() to the top of the file where dealing with
the (maximum) number of FIBs trying to clarify that evetually FIBs
should probably attached to domain(9) specific storage. [1]

Add a comment on a limitimation on the rt_add_addr_allfibs option.

Use RT_DEFAULT_FIB instead of 0 where applicable.

Add empty line to functions without local variables per style.

Put public yet unused in-tree function rtinit_fib() under BURN_BRIDGES
to indicate that it might go away in the future.

No functional change.

Discussed with:	julian [1] (clarification on what the original one meant)
Sponsored by:	Cisco Systems, Inc.
2012-02-03 12:25:14 +00:00
Bjoern A. Zeeb
b3dd077152 Minor optimization doing input validation with a possible early return
before doing further work.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 11:20:11 +00:00
Bjoern A. Zeeb
096f27864f Fix FLOWTABLE IPv6 handling in route.c missed in r205066.
While doing so, for consistency with the rtalloc_ign_fib(9) interface
called, remove the "in_" prefix from rtalloc_ign_wrapper() no longer
indicating that it would only handle the INET case.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 10:17:34 +00:00
Bjoern A. Zeeb
b680a383a8 Allow for IPv6 to allocate (and in the VIMAGE case free) as many routing
tables (FIBs) as IPv4.
Prepare various general rt* functions for multi-FIB IPv6 handling in
addition to already existing multi-FIB IPv4 cases.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 09:23:55 +00:00
Bjoern A. Zeeb
556d81ddd7 Rather than putting magic 0s as FIB argument into the rt* calls, provide
a macro RT_DEFAULT_FIB defined to 0 to more easily identify the cases
tied to the default FIB.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 09:06:24 +00:00
Kip Macy
0fe48d670f A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly
referenced within its timeout window. This change clears the LLE_VALID flag when an llentry
is removed from an interface's hash table and adds an extra check to the flowtable code
for the LLE_VALID flag in llentry to avoid retaining and using a stale reference.

Reviewed by:	qingli@
MFC after:	2 weeks
2012-01-26 20:02:40 +00:00
Bjoern A. Zeeb
8d74af3668 Replace random ARIN direct assignment legacy IPs with proper RFC 5735
TEST-NET1 block for use in documentation and example code addresses.

MFC after:	3 days
2012-01-24 15:20:31 +00:00
Eitan Adler
dde153da49 - Fix trivial typo
Approved by:	nwhitehorn
MFC after:	3 days
2012-01-14 17:07:52 +00:00
Robert Watson
7983103ae6 Clarify throughout the vlan(4) code the difference between a "tag" (the
802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a
VLAN ID (VID).  Tags go in packets.  VIDs identify VLANs.

No functional change is intended, so this should be safe to MFC.  Further
cleanup with functional changes will be committed separately (for example,
renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI).

Reviewed by:	bz
Sponsored by:	ADARA Networks, Inc.
MFC after:	3 days
2012-01-12 18:39:37 +00:00
Lawrence Stewart
9a7e6bac47 Consumers of bpfdetach() expect it to remove all bpf_if structs from the
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.

Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.

Whilst here, also:

- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
  exist in the list with a NULL ifnet pointer.

- Except when INVARIANTS is in the kernel config, silently ignore the case where
  no bpf_if referencing the specified ifnet is found, as it is harmless and does
  not require user attention.

Reviewed by:	csjp
MFC after:	1 week
2012-01-10 00:48:29 +00:00
John Baldwin
fbcebf7f71 Convert the per-interface address list lock from a mutex to a reader/writer
lock.

Reviewed by:	bz
2012-01-09 19:34:12 +00:00
Gleb Smirnoff
c82c34b4a9 Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571.
Submitted by:	bz
2012-01-08 17:11:53 +00:00
Gleb Smirnoff
94901d5e60 Move arprequest() declaration to if_ether.h. 2012-01-08 13:34:00 +00:00
Gleb Smirnoff
dcb39bd84a Since r228571 CARP is no longer an interface. 2012-01-06 12:05:43 +00:00
John Baldwin
137f91e80f Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
John Baldwin
a2cb1d522b Add new variants of the IF_ADDR_*LOCK*() macros used for protecting
interface address lists that distinguish read locks from write locks.
To preserve the KPI, the previous operations are mapped to the write
lock macros.  The lock is still kept as a mutex for now.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 18:35:49 +00:00
Robert Watson
5a39f779b2 Refine last comment.
Submitted by:	joeld
Sponsored by:	ADARA Networks, Inc.
MFC after:	3 days
2012-01-05 11:42:34 +00:00
Robert Watson
15f6780ef4 Add comment to the VLAN code about its integration with VIMAGE: we see what
the code is doing, we recognise the legitimacy of its goal, but we're not
quite sure it's going about it the right way.  More pondering is clearly
required.

Sponsored by:	ADARA Networks, Inc.
Discussed with:	bz
MFC after:	3 days
2012-01-05 11:24:22 +00:00
Lawrence Stewart
253a3814d4 Revert r228986 until it can be reworked to avoid panicing the kernel when the
same interface is attached multiple times with different DLTs, as is done in
net80211 for example.

Reported by:	adrian
2011-12-31 07:21:28 +00:00
Lawrence Stewart
0f89fc22f3 - Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one
aspect of time stamp configuration per interface rather than per BPF
  descriptor. Prior to this, the order in which BPF devices were opened and the
  per descriptor time stamp configuration settings could cause non-deterministic
  and unintended behaviour with respect to time stamping. With the new scheme, a
  BPF attached interface's tscfg sysctl entry can be set to "default", "none",
  "fast", "normal" or "external". Setting "default" means use the system default
  option (set with the net.bpf.tscfg.default sysctl), "none" means do not
  generate time stamps for tapped packets, "fast" means generate time stamps for
  tapped packets using a hz granularity system clock read, "normal" means
  generate time stamps for tapped packets using a full timecounter granularity
  system clock read and "external" (currently unimplemented) means use the time
  stamp provided with the packet from an underlying source.

- Utilise the recently introduced sysclock_getsnapshot() and
  sysclock_snap2bintime() KPIs to ensure the system clock is only read once per
  packet, regardless of the number of BPF descriptors and time stamp formats
  requested. Use the per BPF attached interface time stamp configuration to
  control if sysclock_getsnapshot() is called and whether the system clock read
  is fast or normal. The per BPF descriptor time stamp configuration is then
  used to control how the system clock snapshot is converted to a bintime by
  sysclock_snap2bintime().

- Remove all FAST related BPF descriptor flag variants. Performing a "fast"
  read of the system clock is now controlled per BPF attached interface using
  the net.bpf.tscfg sysctl tree.

- Update the bpf.4 man page.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

In collaboration with:	Julien Ridoux (jridoux at unimelb edu au)
2011-12-30 08:57:58 +00:00
Pyun YongHyeon
1ad7a2570d Update if_obytes and if_omcast after successful transmit.
While I'm here update if_oerrors if parent interface of vlan is not
up and running.  Previously it updated collision counter and it was
confusing to interprete it.

PR:		kern/163478
Reviewed by:	glebius, jhb
Tested by:	Joe Holden < lists <> rewt dot org dot uk >
2011-12-29 18:40:58 +00:00
Gleb Smirnoff
7121247312 Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by:	brooks
2011-12-21 12:39:08 +00:00
Gleb Smirnoff
f08535f872 Restore a feature that was present in 5.x and 6.x, and was cleared in
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.

However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:

- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
  conditions, for now these are:
  - interface goes down
  - carp(4) has problems with ip_output() or ip6_output()
  - pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
  is actual value added to advskew. The adjustment values for
  particular error conditions are also configurable, and their
  defaults are maximum advskew value, so a single failure bumps
  demotion to maximum. This is for POLA compatibility, and should
  satisfy most users.
- Demotion factor is a writable sysctl, so user can do
  foot shooting, if he desires to.
2011-12-20 13:53:31 +00:00
Gleb Smirnoff
08b68b0e4c A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
Gleb Smirnoff
f3909e37ff Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib(). 2011-12-15 12:49:10 +00:00
Brooks Davis
f26fa169e7 Remove the unused if_free_type() function.
X-MFC after:	never
2011-12-09 23:26:28 +00:00
Luigi Rizzo
506cc70cce 1. Fix the handling of link reset while in netmap more.
A link reset now is completely transparent for the netmap client:
   even if the NIC resets its own ring (e.g. restarting from 0),
   the client will not see any change in the current rx/tx positions,
   because the driver will keep track of the offset between the two.

2. make the device-specific code more uniform across different drivers
   There were some inconsistencies in the implementation of the netmap
   support routines, now drivers have been aligned to a common
   code structure.

3. import netmap support for ixgbe . This is implemented as a very
   small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
   in total the patch has only 54 lines of new code) , as most of
   the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
   following some initial comments from Jack Vogel about making
   changes less intrusive.
   (Note, i have emailed Jack multiple times asking if he had
   comments on this structure of the code; i got no reply so
   i assume he is fine with it).

Support for other drivers (em, lem, re, igb) will come later.

"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.

Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
2011-12-05 12:06:53 +00:00
Lawrence Stewart
3e47c78798 Revert r227778 in preparation for committing reworked patches in its place. 2011-11-29 12:55:26 +00:00
John Baldwin
d9b1d61535 Change the if_vlan driver to use if_transmit for forwarding packets to the
parent interface.  This avoids the overhead of queueing a packet to an IFQ
only to immediately dequeue it again.

Suggested by:	np
Reviewed by:	brooks
MFC after:	1 month
2011-11-28 19:35:08 +00:00
Gleb Smirnoff
2e9fff5b18 - Use generic alloc_unr(9) allocator for if_clone, instead
of hand-made.
- When registering new cloner, check whether a cloner with
  same name already exist.
- When allocating unit, also check with help of ifunit()
  whether such interface already exist or not. [1]

PR:		kern/162789 [1]
2011-11-28 14:44:59 +00:00
Gleb Smirnoff
c0ba290b5f Improve logging:
- don't hardcode function name
- use LOG_DEBUG for such a debug message
- print error value
2011-11-22 19:42:17 +00:00
Lawrence Stewart
b6f1c7db32 - When feed-forward clock support is compiled in, change the BPF header to
contain both a regular timestamp obtained from the system clock and the
  current feed-forward ffcounter value. This enables new possibilities including
  comparison of timekeeping performance and timestamp correction during post
  processing.

- Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping
  packets using the feedback or feed-forward system clock.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 04:17:24 +00:00
Luigi Rizzo
68b8534bdf Bring in support for netmap, a framework for very efficient packet
I/O from userspace, capable of line rate at 10G, see

	http://info.iet.unipi.it/~luigi/netmap/

At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]

In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in

	sys/dev/netmap/head.diff

The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.

Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough so interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.

CREDITS:
    Netmap has been developed by Luigi Rizzo and other collaborators
    at the Universita` di Pisa, and supported by EU project CHANGE
    (http://www.change-project.eu/)
    The code is distributed under a BSD Copyright.

[1] In my opinion is a bad idea to have all manpage in one directory.
  We should place kernel documentation in the same dir that contains
  the code, which would make it much simpler to keep doc and code
  in sync, reduce the clutter in share/man/ and incidentally is
  the policy used for all of userspace code.
  Makefiles and doc tools can be trivially adjusted to find the
  manpages in the relevant subdirs.
2011-11-17 12:17:39 +00:00
Robert Millan
ea4d9a14f1 Remove a few bits of FreeBSD 2.x compatibility code.
Approved by:	kib (mentor)
2011-11-14 18:21:27 +00:00
Brooks Davis
4b22573a89 In r191367 the need for if_free_type() was removed and a new member
if_alloctype was used to store the origional interface type.  Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().

MFC after:	1 Month
2011-11-11 22:57:52 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Max Laier
3ca1a2d6a0 Fix a use-after-free/redzone issue in the routing code.
Reported by (repeatedly):	Mike Tancsa
Prodded by (repeatedly):	bz
Forgotten by (repeatedly):	mlaier
MFC after:			2 weeks
2011-11-03 18:33:30 +00:00
Gleb Smirnoff
a0af7c3edb Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off
the queue. It can be utilized in queue processing to avoid multiple
locking/unlocking.
2011-10-27 09:45:12 +00:00