4821 Commits

Author SHA1 Message Date
Alexander V. Chernikov
9d099b4f38 * Dump available table algorithms via "ipfw talist" cmd.
Kernel changes:
* Add type/refcount fields to table algo instances.
* Add IP_FW_TABLES_ALIST opcode to export available algorihms to userland.

Userland changes:
* Fix cores on empty input inside "ipfw table" handler.
* Add "ipfw talist" cmd to print availabled kernel algorithms.
* Change "table info" output to reflect long algorithm config lines.
2014-07-29 22:44:26 +00:00
Alexander V. Chernikov
68394ec88e * Add generic ipfw interface tracking API
* Rewrite interface tables to use interface indexes

Kernel changes:
* Add generic interface tracking API:
 - ipfw_iface_ref (must call unlocked, performs lazy init if needed, allocates
  state & bumps ref)
 - ipfw_iface_add_ntfy(UH_WLOCK+WLOCK, links comsumer & runs its callback to
  update ifindex)
 - ipfw_iface_del_ntfy(UH_WLOCK+WLOCK, unlinks consumer)
 - ipfw_iface_unref(unlocked, drops reference)
Additionally, consumer callbacks are called in interface withdrawal/departure.

* Rewrite interface tables to use iface tracking API. Currently tables are
  implemented the following way:
  runtime data is stored as sorted array of {ifidx, val} for existing interfaces
  full data is stored inside namedobj instance (chained hashed table).

* Add IP_FW_XIFLIST opcode to dump status of tracked interfaces

* Pass @chain ptr to most non-locked algorithm callbacks:
  (prepare_add, prepare_del, flush_entry ..). This may be needed for better
  interaction of given algorithm an other ipfw subsystems

* Add optional "change_ti" algorithm handler to permit updating of
  cached table_info pointer (happens in case of table_max resize)

* Fix small bug in ipfw_list_tables()
* Add badd (insert into sorted array) and bdel (remove from sorted array) funcs

Userland changes:
* Add "iflist" cmd to print status of currently tracked interface
* Add stringnum_cmp for better interface/table names sorting
2014-07-28 19:01:25 +00:00
Alexander V. Chernikov
7e767c791f * Use different rule structures in kernel/userland.
* Switch kernel to use per-cpu counters for rules.
* Keep ABI/API.

Kernel changes:
* Each rules is now exported as TLV with optional extenable
  counter block (ip_fW_bcounter for base one) and
  ip_fw_rule for rule&cmd data.
* Counters needs to be explicitly requested by IPFW_CFG_GET_COUNTERS flag.
* Separate counters from rules in kernel and clean up ip_fw a bit.
* Pack each rule in IPFW_TLV_RULE_ENT tlv to ease parsing.
* Introduce versioning in container TLV (may be needed in future).
* Fix ipfw_cfg_lheader broken u64 alignment.

Userland changes:
* Use set_mask from cfg header when requesting config
* Fix incorrect read accouting in ipfw_show_config()
* Use IPFW_RULE_NOOPT flag instead of playing with _pad
* Fix "ipfw -d list": do not print counters for dynamic states
* Some small fixes
2014-07-08 23:11:15 +00:00
Alexander V. Chernikov
6447bae661 * Prepare to pass other dynamic states via ipfw_dump_config()
Kernel changes:
* Change dump format for dynamic states:
  each state is now stored inside ipfw_obj_dyntlv
  last dynamic state is indicated by IPFW_DF_LAST flag
* Do not perform sooptcopyout() for !SOPT_GET requests.

Userland changes:
* Introduce foreach_state() function handler to ease work
  with different states passed by ipfw_dump_config().
2014-07-06 23:26:34 +00:00
Alexander V. Chernikov
81d3153d61 * Add "lookup" table functionality to permit userland entry lookups.
* Bump table dump format preserving old ABI.

Kernel size:
* Add IP_FW_TABLE_XFIND to handle "lookup" request from userland.
* Add ta_find_tentry() algorithm callbacks/handlers to support lookups.
* Fully switch to ipfw_obj_tentry for various table dumps:
  algorithms are now required to support the latest (ipfw_obj_tentry) entry
    dump format, the rest is handled by generic dump code.
  IP_FW_TABLE_XLIST opcode version bumped (0 -> 1).
* Eliminate legacy ta_dump_entry algo handler:
  dump_table_entry() converts data from current to legacy format.

Userland side:
* Add "lookup" table parameter.
* Change the way table type is guessed: call table_get_info() first,
  and check value for IPv4/IPv6 type IFF table does not exist.
* Fix table_get_list(): do more tries if supplied buffer is not enough.
* Sparate table_show_entry() from table_show_list().
2014-07-06 18:16:04 +00:00
Alexander V. Chernikov
ac35ff1784 Fully switch to named tables:
Kernel changes:
* Introduce ipfw_obj_tentry table entry structure to force u64 alignment.
* Support "update-on-existing-key" "add" bahavior (TEI_FLAGS_UPDATED).
* Use "subtype" field to distingush between IPv4 and IPv6 table records
  instead of previous hack.
* Add value type (vtype) field for kernel tables. Current types are
  number,ip and dscp
* Fix sets mask retrieval for old binaries
* Fix crash while using interface tables

Userland changes:
* Switch ipfw_table_handler() to use named-only tables.
* Add "table NAME create [type {cidr|iface|u32} [valtype {number|ip|dscp}] ..."
* Switch ipfw_table_handler to match_token()-based parser.
* Switch ipfw_sets_handler to use new ipfw_get_config() for mask  retrieval.
* Allow ipfw set X table ... syntax to permit using per-set table namespaces.
2014-07-03 22:25:59 +00:00
Alexander V. Chernikov
6c2997ffec * Add new IP_FW_XADD opcode which permits to
a) specify table ids as names
  b) add multiple rules at once.
Partially convert current code for atomic addition of multiple rules.
2014-06-29 22:35:47 +00:00
Alexander V. Chernikov
563b5ab132 Suppord showing named tables in ipfw(8) rule listing.
Kernel changes:
* change base TLV header to be u64 (so size can be u32).
* Introduce ipfw_obj_ctlv generc container TLV.
* Add IP_FW_XGET opcode which is now used for atomic configuration
  retrieval. One can specify needed configuration pieces to retrieve
  via flags field. Currently supported are
  IPFW_CFG_GET_STATIC (static rules) and
  IPFW_CFG_GET_STATES (dynamic states).
  Other configuration pieces (tables, pipes, etc..) support is planned.

Userland changes:
* Switch ipfw(8) to use new IP_FW_XGET for rule listing.
* Split rule listing code get and show pieces.
* Make several steps forward towards libipfw:
  permit printing states and rules(paritally) to supplied buffer.
  do not die on malloc/kernel failure inside given printing functions.
  stop assuming cmdline_opts is global symbol.
2014-06-28 23:20:24 +00:00
Alexander V. Chernikov
9490a62716 * Add IP_FW_TABLE_XCREATE / IP_FW_TABLE_XMODIFY opcodes.
* Add 'algoname' string to ipfw_xtable_info permitting to specify lookup
algoritm with parameters.
* Rework part of ipfw_rewrite_table_uidx()

Sponsored by:	Yandex LLC
2014-06-16 13:05:07 +00:00
Alexander V. Chernikov
d3a4f9249c Simplify opcode handling.
* Use one u16 from op3 header to implement opcode versioning.
* IP_FW_TABLE_XLIST has now 2 handlers, for ver.0 (old) and ver.1 (current).
* Every getsockopt request is now handled in ip_fw_table.c
* Rename new opcodes:
IP_FW_OBJ_DEL -> IP_FW_TABLE_XDESTROY
IP_FW_OBJ_LISTSIZE -> IP_FW_TABLES_XGETSIZE
IP_FW_OBJ_LIST -> IP_FW_TABLES_XLIST
IP_FW_OBJ_INFO -> IP_FW_TABLE_XINFO
IP_FW_OBJ_INFO -> IP_FW_TABLE_XFLUSH

* Add some docs about using given opcodes.
* Group some legacy opcode/handlers.
2014-06-15 13:40:27 +00:00
Alexander V. Chernikov
f1220db8d7 Move further to eliminate next pieces of number-assuming code inside tables.
Kernel changes:
* Add IP_FW_OBJ_FLUSH opcode (flush table based on its name/set)
* Add IP_FW_OBJ_DUMP opcode (dumps table data based on its names/set)
* Add IP_FW_OBJ_LISTSIZE / IP_FW_OBJ_LIST opcodes (get list of kernel tables)

Userland changes:
* move tables code to separate tables.c file
* get rid of tables_max
* switch "all"/list handling to new opcodes
2014-06-14 22:47:25 +00:00
Alexander V. Chernikov
9f7d47b025 Add API to ease adding new algorithms/new tabletypes to ipfw.
Kernel-side changelog:
* Split general tables code and algorithm-specific table data.
  Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to
  new ip_fw_table_algo.c file.
  Tables code now supports any algorithm implementing the following callbacks:
+struct table_algo {
+       char            name[64];
+       int             idx;
+       ta_init         *init;
+       ta_destroy      *destroy;
+       table_lookup_t  *lookup;
+       ta_prepare_add  *prepare_add;
+       ta_prepare_del  *prepare_del;
+       ta_add          *add;
+       ta_del          *del;
+       ta_flush_entry  *flush_entry;
+       ta_foreach      *foreach;
+       ta_dump_entry   *dump_entry;
+       ta_dump_xentry  *dump_xentry;
+};

* Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to
   ->tablestate pointer (array of 32 bytes structures necessary for
   runtime lookups (can be probably shrinked to 16 bytes later):

   +struct table_info {
   +       table_lookup_t  *lookup;        /* Lookup function */
   +       void            *state;         /* Lookup radix/other structure */
   +       void            *xstate;        /* eXtended state */
   +       u_long          data;           /* Hints for given func */
   +};

* Add count method for namedobj instance to ease size calculations
* Bump ip_fw3 buffer in ipfw_clt 128->256 bytes.
* Improve bitmask resizing on tables_max change.
* Remove table numbers checking from most places.
* Fix wrong nesting in ipfw_rewrite_table_uidx().

* Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently
    implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data,
    currenly implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_INFO (requests info for one object of given type).

Some name changes:
s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics)
s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics)

Userland changes:
* Add do_set3() cmd to ipfw2 to ease dealing with op3-embeded opcodes.
* Add/improve support for destroy/info cmds.
2014-06-14 10:58:39 +00:00
Alexander V. Chernikov
b074b7bbce Make ipfw tables use names as used-level identifier internally:
* Add namedobject set-aware api capable of searching/allocation objects by their name/idx.
* Switch tables code to use string ids for configuration tasks.
* Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks.
* Reduce number of arguments passed to ipfw_table_add/del by using separate structure.
* Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support)
* Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference)
* Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode

Namedobj more detailed:
* Blackbox api providing methods to add/del/search/enumerate objects
* Statically-sized hashes for names/indexes
* Per-set bitmask to indicate free indexes
* Separate methods for index alloc/delete/resize

Basically, there should not be any user-visible changes except the following:
* reducing table_max is not supported
* flush & add change table type won't work if table is referenced

Sponsored by:	Yandex LLC
2014-06-12 09:59:11 +00:00
Michael Tuexen
dfa9c0b787 Use ENOBUFS instead of ENOMEM in error situations related to m_uiotombuf().
This was suggested by kevlo@.

MFC after: 3 days
2014-06-05 12:51:12 +00:00
Kevin Lo
71c92ff80a Fix build UDP-Lite with VIMAGE enabled when building with gcc.
Reported and tested by: Jason Hellenthal
2014-06-03 01:30:32 +00:00
Hiren Panchasara
fc5e1956d9 ECN marking implenetation for dummynet.
Changes include both DCTCP and RFC 3168 ECN marking methodology.

DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Submitted by:	Midori Kato (aoimidori27@gmail.com)
Worked with:	Lars Eggert (lars@netapp.com)
Reviewed by:	luigi, hiren
2014-06-01 07:28:24 +00:00
Bjoern A. Zeeb
700515aa62 While PAWS is disabled, there are no consumers for the tcp options
argument to tcp_twcheck();  thus mark it __unused.

MFC after:	2 weeks
2014-05-30 22:34:06 +00:00
Alan Somers
2f308a343f Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr()  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
	Add legacy-compatible functions as described above.  Ensure legacy
	behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
	Call with _fib() functions if we must use a specific fib, or the
	legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
	Improve the udp_dontroute test.  The bug that this test exercises is
	that ifa_ifwithnet() will return the wrong address, if multiple
	interfaces have addresses on the same subnet but with different
	fibs.  The previous version of the test only considered one possible
	failure mode: that ifa_ifwithnet_fib() might fail to find any
	suitable address at all.  The new version also checks whether
	ifa_ifwithnet_fib() finds the correct address by checking where the
	ARP request goes.

Reported by:	bz, hrs
Reviewed by:	hrs
MFC after:	1 week
X-MFC-with:	264905
Sponsored by:	Spectra Logic
2014-05-29 21:03:49 +00:00
Jilles Tjoelker
ced01b33f4 netinet/in.h: Expose htonl(), htons(), ntohl() and ntohs() in strict POSIX
mode.

Put the htonl(), htons(), ntohl() and ntohs() declarations under
__POSIX_VISIBLE >= 200112. POSIX.1-2001 and newer require these to be
exposed from <netinet/in.h> (as well as <arpa/inet.h>).

Note that it may be unnecessary to check __POSIX_VISIBLE >= 200112 because
older versions of POSIX and the C standard do not define this header.
However, other places in the same file already perform the check.

PR:		188316
Submitted by:	Christian Neukirchen
2014-05-29 15:23:37 +00:00
Adrian Chadd
8bde802a2b The users of RSS shouldn't be directly concerned about hash -> CPU ID
mappings.  Instead, they should be first mapping to an RSS bucket and
then querying the RSS bucket -> CPU ID mapping to figure out the target
CPU.

When (if?) RSS rebalancing is implemented or some other (non round-robin)
distribution of work from buckets to CPU IDs, various bits of code - both
userland and kernel - will need to know how this mapping works.

So, to support this:

* Add a new function rss_m2bucket() - this maps an mbuf to a given bucket.
  Anything which is currently doing hash -> CPU work may instead wish to
  do hash -> bucket, and then query the bucket->cpuid map for which
  CPU it belongs on.  Or, map it to a bucket, then re-pin that bucket ->
  CPU during a rebalance operation.

* For userland applications which wish to exploit affinity to RSS buckets,
  the bucket -> CPU ID mapping is now available via a sysctl.
  net.inet.rss.bucket_mapping lists the bucket to CPU ID mapping via
  a list of bucket:cpu pairs.
2014-05-27 08:06:20 +00:00
Bjoern A. Zeeb
3150f357ff Remove the prototpye for the static inline function
tcp_signature_verify_input().
The function is defined before first use already.

MFC after:	2 weeks
2014-05-24 15:31:40 +00:00
Bjoern A. Zeeb
ad494fa898 syncache_lookup() is a file local function. Make it static and
take it out of the public KPI; seems it was never used elsewhere.

MFC after:	2 weeks
2014-05-24 15:03:36 +00:00
Bjoern A. Zeeb
4fd2b4eb53 Make tcp_twrespond() file local private; this removes it from the
public KPI; it is not used anywhere else and seems it never was.

MFC after:	2 weeks
2014-05-24 14:01:18 +00:00
Bjoern A. Zeeb
5688fa661b Remove the prototypes for things that are no longer file local but were
moved to the header file.

Pointy hat to:	clang || bz
MFC after:	2 weeks
X-MFC with:	r266596
Reported by:	gcc build of sparc64
2014-05-23 21:12:33 +00:00
Bjoern A. Zeeb
255cd9fd58 Move the tcp_fields_to_host() and tcp_fields_to_net() (inline)
functions to the tcp_var.h header file in order to avoid further
duplication with upcoming commits.

Reviewed by:	np
MFC after:	2 weeks
2014-05-23 20:15:01 +00:00
Adrian Chadd
bad008ce85 Use CPU_FIRST() / CPU_NEXT() to iterate over the valid CPU IDs. 2014-05-22 07:25:36 +00:00
Adrian Chadd
883831c675 When RSS is enabled and per cpu TCP timers are enabled, do an RSS
lookup for the inp flowid/flowtype to destination CPU.

This only modifies the case where RSS is enabled and the per-cpu tcp
timer option is enabled.  Otherwise the behaviour should be the same
as before.
2014-05-18 22:39:01 +00:00
Adrian Chadd
9c42397277 * When copying the flowid from inp -> outbound mbuf, also assign the
hashtype to to the outbound mbuf as well as the flowid.

* Add in socket options to fetch the hashid, the hashtype and RSS CPU
  ID for a given socket.
2014-05-18 22:37:31 +00:00
Adrian Chadd
2f71993288 Ensure that the flowid hashtype is assigned to the inp if the flowid
is also assigned.
2014-05-18 22:34:06 +00:00
Adrian Chadd
cc6c187794 Add a new function to do a CPU ID lookup based on RSS hash information.
This is intended to be used by various places that wish to hash some
information about a TCP/UDP/IP flow but don't necessarily have a
live mbuf to do it with.

Refactor rss_m2cpuid() to use the refactored function.
2014-05-18 22:32:04 +00:00
Adrian Chadd
34e3dcedec Add the flowtype to the inpcb.
The flowid isn't enough to use as part of any RSS related CPU affinity
lookups - the RSS code would like to know what kind of hash it is.
2014-05-18 22:30:12 +00:00
Alexander V. Chernikov
c3015737f3 Fix wrong formatting of 0.0.0.0/X table records in ipfw(8).
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().

Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.

PR:		bin/189471
Submitted by:	Dennis Yusupoff <dyr@smartspb.net>
MFC after:	2 weeks
2014-05-17 13:45:03 +00:00
Gleb Smirnoff
b1a4156614 Provide compatibility #define after r265408.
Suggested by:	truckman
2014-05-17 12:33:27 +00:00
Adrian Chadd
d804a08f3e Reserve IP_FLOWID, IP_FLOWTYPE, IP_RSSCPUID socket option IDs for
near-term future use.

These are intended to fetch the current flow id, flow hash type
(M_HASHTYPE_* from the sys/mbuf.h) and if RSS is enabled, the
RSS destined CPU ID for the receive path.
2014-05-17 00:09:12 +00:00
Mike Silbersack
f1395664e5 Remove the function tcp_twrecycleable; it has been #if 0'd for
eight years.  The original concept was to improve the
corner case where you run out of ephemeral ports, but it
was causing performance problems and the mechanism
of limiting the number of time_wait sockets serves
the same purpose in the end.

Reviewed by:	bz
2014-05-16 01:38:38 +00:00
Pyun YongHyeon
c732cd1af1 Fix checksum computation. Previously it didn't include carry.
Reviewed by:	tuexen
2014-05-13 05:07:03 +00:00
Michael Tuexen
a485f139c3 Disable TX checksum offload for UDP-Lite completely. It wasn't used for
partial checksum coverage, but even for full checksum coverage it doesn't
work.
This was discussed with Kevin Lo (kevlo@).
2014-05-12 09:46:48 +00:00
Michael Tuexen
6c19260269 Whitespace change. 2014-05-10 08:48:04 +00:00
Michael Tuexen
d58c15339b Fix a logic bug which prevented the sending of UDP packet with 0 checksum.
This bug was introduced in r264212 and should be X-MFCed with that
revision, if UDP-Lite support if MFCed.
2014-05-09 14:15:48 +00:00
Michael Tuexen
26461454fc Use KASSERTs as suggested by glebius@
MFC after: 3 days
X-MFC with: 265691
2014-05-08 20:47:54 +00:00
Michael Tuexen
8e1d0a568a For some UDP packets (for example with 200 byte payload) and IP options,
the IP header and the UDP header are not in the same mbuf.
Add code to in_delayed_cksum() to deal with this case.

MFC after: 3 days
2014-05-08 17:27:46 +00:00
Michael Tuexen
4aa74d8b65 Remove unused code. This is triggered by the bugreport of Sylvestre Ledru
which deal with useless code in the user land stack:
https://bugzilla.mozilla.org/show_bug.cgi?id=1003929

MFC after: 3 days
2014-05-06 16:51:07 +00:00
Gleb Smirnoff
c669105d17 - Remove net.inet.tcp.reass.overflows sysctl. It counts exactly
same events that tcpstat's tcps_rcvmemdrop counter counts.
- Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
  description in netstat(1) output.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-05-06 00:00:07 +00:00
Gleb Smirnoff
6c42c8a93f The tcp_log_addrs() uses th pointer, which points into the mbuf, thus we
can not free the mbuf before tcp_log_addrs().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-05-05 21:33:20 +00:00
Gleb Smirnoff
e407b67be4 The FreeBSD-SA-14:08.tcp was a lesson on not doing acrobatics with
mixing on stack memory and UMA memory in one linked list.

Thus, rewrite TCP reassembly code in terms of memory usage. The
algorithm remains unchanged.

We actually do not need extra memory to build a reassembly queue.
Arriving mbufs are always packet header mbufs. So we got the length
of data as pkthdr.len. We got m_nextpkt for linkage. And we need
only one pointer to point at the tcphdr, use PH_loc for that.

In tcpcb the t_segq fields becomes mbuf pointer. The t_segqlen
field now counts not packets, but bytes in the queue. This gives
us more precision when comparing to socket buffer limits.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-05-04 23:25:32 +00:00
Alexander V. Chernikov
a32603a55a Fix panic on IPv4 address removal introduced in r265279.
Reported by:	Trond Endrestøl
MFC with:	r265279
2014-05-03 20:22:13 +00:00
Alexander V. Chernikov
b980262e63 Pass radix head ptr along with rte to rtexpunge().
Rename rtexpunge to rt_expunge().
2014-05-03 16:28:54 +00:00
Xin LI
c6f70658c3 Fix TCP reassembly vulnerability.
Patch done by:	glebius
Security:	FreeBSD-SA-14:08.tcp
Security:	CVE-2014-3000
2014-04-30 04:02:57 +00:00
Alan Somers
7278b62aee Fix a panic when removing an IP address from an interface, if the same address
exists on another interface.  The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS.  The solution
is to use the interface fib in that call.  For the majority of users, that will
be equivalent to the legacy behavior.

PR:		kern/189089
Reported by:	neel
Reviewed by:	neel
MFC after:	3 weeks
X-MFC with:	264887
Sponsored by:	Spectra Logic
2014-04-29 14:46:45 +00:00
Alan Somers
0cfee0c223 Fix subnet and default routes on different FIBs on the same subnet.
These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
	Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
	functions will only return an address whose interface fib equals the
	argument.

sys/net/route.c
	Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
	arguments.

sys/netinet/in.c
	Update in_addprefix to consider the interface fib when adding
	prefixes.  This will prevent it from not adding a subnet route when
	one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
	Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
	In some cases it there wasn't a clear specific fib number to use.
	In others, I was unable to test those functions so I chose
	RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
	fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
	Revert r263738.  The udp_dontroute test was right all along.
	However, bugs kern/187550 and kern/187553 cancelled each other out
	when it came to this test.  Because of kern/187553, ifa_ifwithnet
	searched the default fib instead of the requested one, but because
	of kern/187550, there was an applicable subnet route on the default
	fib.  The new test added in r263738 doesn't work right, however.  I
	can verify with dtrace that ifa_ifwithnet returned the wrong address
	before I applied this commit, but route(8) miraculously found the
	correct interface to use anyway.  I don't know how.

	Clear expected failure messages for kern/187550 and kern/187552.

PR:		kern/187550
PR:		kern/187552
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-04-24 23:56:56 +00:00