Commit Graph

382 Commits

Author SHA1 Message Date
Luigi Rizzo
1cdc5f0b87 cleanup and document in some detail the internals of the testing code
for dummynet schedulers
2016-01-27 02:22:31 +00:00
Luigi Rizzo
ff8d60ab4d the _Static_assert was not supposed to be in the commit. 2016-01-27 02:14:08 +00:00
Luigi Rizzo
788c0c66ab bugfix: the scheduler template (dn_schk) for the round robin scheduler
is followed by another structure (rr_schk) whose size must be set
in the schk_datalen field of the descriptor.
Not allocating the memory may cause other memory to be overwritten
(though dn_schk is 192 bytes and rr_schk only 12 so we may be lucky
and end up in the padding after the dn_schk).

This is a merge candidate for stable and 10.3

MFC after:	3 days
2016-01-27 02:08:30 +00:00
Luigi Rizzo
10d72ffc7d fix various warnings to compile the test code with -Wextra 2016-01-26 23:37:07 +00:00
Luigi Rizzo
fa57c83c70 fix various warnings (signed/unsigned, printf types, unused arguments) 2016-01-26 23:36:18 +00:00
Luigi Rizzo
f6a5c66400 prevent warnings for signed/unsigned comparisons and unused arguments.
Add checks for parameters overflowing 32 bit.
2016-01-26 22:46:58 +00:00
Luigi Rizzo
e72cd9a70d prevent warning for unused argument 2016-01-26 22:45:45 +00:00
Luigi Rizzo
4d85bfeb07 avoid warnings for signed/unsigned comparison and unused arguments 2016-01-26 22:45:05 +00:00
Luigi Rizzo
f51b072d4c Revert one chunk from commit 285362, which introduced an off-by-one error
in computing a shift index. The error was due to the use of mixed
fls() / __fls() functions in another implementation of qfq.
To avoid that the problem occurs again, properly document which
incarnation of the function we need.
Note that the bug only affects QFQ in FreeBSD head from last july, as
the patch was not merged to other versions.
2016-01-26 04:48:24 +00:00
Alexander V. Chernikov
61eee0e202 MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 does not use radix locking. On the other hand, routing
  structure do have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still uses the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
Alexander V. Chernikov
fa7c058bf8 Fix panic on table/table entry delete. The panic could have happened
if more than 64 distinct values had been used.

Table value code uses internal objhash API which requires unique key
  for each object. For value code, pointer to the actual value data
  is used. The actual problem arises from the fact that 'actual' e.g.
  runtime data is stored in array and that array is auto-growing. There is
  special hook (update_tvalue() function) which is used to update the pointers
  after the change. For some reason, object 'key' was not updated.
  Fix this by adding update code to the update_tvalue().

Sponsored by:	Yandex LLC
2016-01-21 18:20:40 +00:00
Alexander V. Chernikov
89fc126add Initialize error value ta_lookup_kfib() by default to please compiler. 2016-01-10 08:37:00 +00:00
Bjoern A. Zeeb
60c274aaf8 Initialize error after r293626 in case neither INET nor INET6 is
compiled into the kernel.  Ideally lots more code would just not
be called (or compiled in) in that case but that requires a lot
more surgery.  For now try to make IP-less kernels compile again.
2016-01-10 08:14:25 +00:00
Alexander V. Chernikov
004d3e30a7 Make ipfw addr:kfib lookup algo use new routing KPI. 2016-01-10 06:43:43 +00:00
Alexander V. Chernikov
3673828490 Use already pre-calculated number of entries instead of tc->count. 2016-01-10 00:28:44 +00:00
Alexander V. Chernikov
ea8d14925c Remove sys/eventhandler.h from net/route.h
Reviewed by:	ae
2016-01-09 09:34:39 +00:00
Alexander V. Chernikov
460a5b502f Convert pf(4) to the new routing API.
Differential Revision:	https://reviews.freebsd.org/D4763
2016-01-07 10:20:03 +00:00
Hans Petter Selasky
c8cfbc066f Properly drain callouts in the IPFW subsystem to avoid use after free
panics when unloading the dummynet and IPFW modules:

- The callout drain function can sleep and should not be called having
a non-sleepable lock locked. Remove locks around "ipfw_dyn_uninit(0)".

- Add a new "dn_gone" variable to prevent asynchronous restart of
dummynet callouts when unloading the dummynet kernel module.

- Call "dn_reschedule()" locked so that "dn_gone" can be set and
checked atomically with regard to starting a new callout.

Reviewed by:	hiren
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3855
2015-12-15 09:02:05 +00:00
Alexander V. Chernikov
65ff3638df Merge helper fib* functions used for basic lookups.
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
  I have?". "what is the MTU?", "Give me the IPv4 source address to use",
  etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
  dealing with IPv6 mtu, finding "address" ifp (almost never done right),
  provide easy-to-use API hiding all the complexity and returning the
  needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
  rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
  locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
  and rtentry WLOCK).

Sponsored by:	Yandex LLC
2015-12-08 10:50:03 +00:00
Andrey V. Elsukov
1cf09efe5d Add destroy_object callback to object rewriting framework.
It is called when last reference to named object is going to be released
and allows to do additional cleanup for implementation of named objects.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-23 22:06:55 +00:00
Bryan Drewery
7143303723 Fix dynamic IPv6 rules showing junk for non-specified address masks.
For example:
  00002      0         0 (19s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
  00002      4       412 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
  00002     10       777 (1s) LIMIT tcp 2001:894:5a24:653::503:1 52023 <-> 2001:894:5a24:653:ca0a:a9ff:fe04:3978 22
  00002      0         0 (17s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> 80f3:70d:23fe:ffff:1005:: 0

Fix this by zeroing the unused address, as is done for IPv4:
  00002     0         0 (18s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
  00002    36     14952 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
  00002     0         0 (0s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> :: 0
  00002     4       345 (274s) LIMIT tcp 2001:894:5a24:653::503:1 34131 <-> 2001:470:1f11:262:ca0a:a9ff:fe04:3978 22

MFC after:	2 weeks
2015-11-17 20:42:08 +00:00
Alexander V. Chernikov
637670e77e Bring back the ability of passing cached route via nd6_output_ifp(). 2015-11-15 16:02:22 +00:00
Randall Stewart
7c4676ddee This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
Alexander V. Chernikov
91e93daf9c Print proper setfib values in ipfw log.
Submitted by:	Denis Schneider <v1ne2go at gmail>
2015-11-08 13:44:21 +00:00
Alexander V. Chernikov
b554a27822 Fix setfib target.
Problem was introduced in r272840 when converting tablearg value to 0.

Submitted by:	Denis Schneider <v1ne2go at gmail>
2015-11-08 12:24:19 +00:00
Kristof Provost
5a505b317a pf: Fix broken rule skip calculation
r289932 accidentally broke the rule skip calculation. The address family
argument to PF_ANEQ() is now important, and because it was set to 0 the macro
always evaluated to false.
This resulted in incorrect skip values, which in turn broke the rule
evaluations.
2015-11-07 23:51:42 +00:00
Andrey V. Elsukov
ee09cb0bfb Remove now obsolete KASSERT.
Actually, object classify callbacks can skip some opcodes, that could
be rewritten. We will deteremine real numbed of rewritten opcodes a bit
later in this function.

Reported by:	David H. Wolfskill <david at catwhisker dot org>
2015-11-03 22:23:09 +00:00
Andrey V. Elsukov
748c9559ee Eliminate any conditional increments of object_opcodes in the
check_ipfw_rule_body() function. This function is intended to just
determine that rule has some opcodes that can be rewrited. Then the
ref_rule_objects() function will determine real number of rewritten
opcodes using classify callback.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-03 10:34:26 +00:00
Andrey V. Elsukov
f81431cca1 Add ipfw_check_object_name_generic() function to do basic checks for an
object name correctness. Each type of object can do more strict checking
in own implementation. Do such checks for tables in check_table_name().

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-03 10:29:46 +00:00
Andrey V. Elsukov
5dc5a0e0aa Implement ipfw internal olist command to list named objects.
Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-03 10:21:53 +00:00
Kristof Provost
679e3c77b7 pf: Fix IPv6 checksums with route-to.
When using route-to (or reply-to) pf sends the packet directly to the output
interface. If that interface doesn't support checksum offloading the checksum
has to be calculated in software.
That was already done in the IPv4 case, but not for the IPv6 case. As a result
we'd emit packets with pseudo-header checksums (i.e. incorrect checksums).

This issue was exposed by the changes in r289316 when pf stopped performing full
checksum calculations for all packets.

Submitted by:	Luoqi Chen
MFC after:	1 week
2015-10-29 20:45:53 +00:00
Alexander V. Chernikov
78546dad4e Eliminate last rtalloc_ign() caller.
Differential Revision:	https://reviews.freebsd.org/D3927
2015-10-27 21:25:40 +00:00
Kristof Provost
c110fc49da pf: Fix TSO issues
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR:		154428, 193579, 198868
Reviewed by:	sbruno
MFC after:	1 week
Relnotes:	yes
Sponsored by:	RootBSD
Differential Revision:	https://reviews.freebsd.org/D3779
2015-10-14 16:21:41 +00:00
Alexander V. Chernikov
c6fb65b1df Bump number of prefixes in O_IP_<SRC|DST> from 15 to 31 (max possible).
PR:		203459
Submitted by:	groos at xiplink.com
MFC after:	2 weeks
2015-10-03 05:42:25 +00:00
Alexander V. Chernikov
1fe201c322 Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_ouput() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performes lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output _lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
Kristof Provost
2f6c345adf pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set
If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in
pf_test6() because the rcvif and the ifp (output interface) are different.
In that case we're bridging though, and the rcvif the the bridge member on which
the packet was received and ifp is the bridge itself.
If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect.

Instead check if the rcvif is a member of the ifp bridge. (In other words, the
if_bridge is the ifp's softc). If that's the case we're not forwarding but
bridging.

PR:	202351
Reviewed by:	eri
Differential Revision:	https://reviews.freebsd.org/D3534
2015-09-01 19:04:04 +00:00
Kristof Provost
64b3b4d611 pf: Remove support for 'scrub fragment crop|drop-ovl'
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.

Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.

Reviewed by:	gnn, eri
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D3466
2015-08-27 21:27:47 +00:00
Alexander V. Chernikov
3535eac433 Fix packets/bytes accounting on i386.
Spotted by:	julian
2015-08-27 07:53:58 +00:00
Luiz Otavio O Souza
22932fc9be Reapply r196551 which was accidentally reverted by r223637 (update to
OpenBSD pf 4.5).

Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.

This fix the failure when you hit the self table size and force it to be
resized.

MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-24 21:41:05 +00:00
Luiz Otavio O Souza
0a70aaf8f5 Add ALTQ(9) support for the CoDel algorithm.
CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.

It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.

Differential Revision:	https://reviews.freebsd.org/D3272
Reviewd by:	rpaulo, gnn (previous version)
Obtained from:	pfSense
Sponsored by:	Rubicon Communications (Netgate)
2015-08-21 22:02:22 +00:00
Luiz Otavio O Souza
f2fc809dcd Fix the copy of addresses passed from userland in table replace command.
The size2 is the maximum userland buffer size (used when the addresses are
copied back to userland).

Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-17 23:03:54 +00:00
Mariusz Zaborski
643ef281cd Use correct src/dst ports when removing states.
Submitted by:	Milosz Kaniewski <m.kaniewski@wheelsystems.com>,
		UMEZAWA Takeshi <umezawa@iij.ad.jp> (orginal)
Reviewed by:	glebius
Approved by:	pjd (mentor)
Obtained from:	OpenBSD
MFC after:	3 days
2015-08-11 17:24:34 +00:00
Andrey V. Elsukov
b13653baf9 Reduce overhead of ipfw's me6 opcode.
Skip checks for IPv6 multicast addresses.
Use in6_localip() for global unicast.
And for IPv6 link-local addresses do search in the IPv6 addresses list.
Since LLA are stored in the kernel internal form, use
IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison.
lla_mask has zero bits in the second word, where we keep sin6_scope_id.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 10:53:42 +00:00
Kristof Provost
48c29b118e pf: Always initialise pf_fragment.fr_flags
When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to
initialise the fr_flags field. As a result we sometimes mistakenly thought the
fragment to not be a buffered fragment. This resulted in panics because we'd end
up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it
to be part of V_pf_cachequeue).
The next time we iterated V_pf_fragqueue we'd use a freed object and panic.

While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.

PR:		201879, 201932
MFC after:	5 days
2015-07-29 06:35:36 +00:00
Renato Botelho
299c819a75 Simplify logic added in r285945 as suggested by glebius
Approved by:	glebius
MFC after:	3 days
Sponsored by:	Netgate
2015-07-28 14:59:29 +00:00
Renato Botelho
b1b98a2db7 Respect pf rule log option before log dropped packets with IP options or
dangerous v6 headers

Reviewed by:	gnn, eri
Approved by:	gnn
Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Netgate
Differential Revision:	https://reviews.freebsd.org/D3222
2015-07-28 10:31:34 +00:00
Gleb Smirnoff
3e437fd2c6 Fix a typo in r280169. Of course we are interested in deleting nsn only
if we have just created it and we were the last reference.

Submitted by:	dhartmei
2015-07-28 09:36:26 +00:00
Andrey V. Elsukov
af9aa0a837 Add helper functions for IP checksum adjusting. Use these functions in
dummynet code and for setdscp. This fixes wrong checksums in some cases.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 07:26:31 +00:00
Luigi Rizzo
4af7aed7c6 assorted algorithmic fixes from Paolo Valente (one of my qfq coauthors):
- use 1ULL to avoid shift truncations
- recompute the sum of weight dynamically to provide better fairness
- fix an erroneous constant in the computation of the slot
- preserve timestamp correctness when the old timestamp is stale.
2015-07-10 19:24:36 +00:00
Luigi Rizzo
e38e277fc4 one more warning suppression when compiling the test code in userspace. 2015-07-10 19:18:49 +00:00