Commit Graph

3771 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
7a657e630d Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by:	(ten 211.ru)
Tested by:	(ten 211.ru)
MFC after:	4 days
2010-04-27 15:07:08 +00:00
Bruce M Simpson
fd963b9929 Fix a regression where DVMRP diagnostic traffic, such as that used
by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control
traffic must always have a TTL of 1.

Submitted by:	Matthew Luckie
MFC after:	3 days
2010-04-27 14:14:21 +00:00
Michael Tuexen
6dbd88581d Sending a FWDTSN chunk should not affect the retran count.
MFC after: 3 days.
2010-04-25 19:00:37 +00:00
Michael Tuexen
475d0674a6 Undo my lastest fix since that wasn't one at all.
MFC after: 3 days.
2010-04-25 15:04:57 +00:00
Michael Tuexen
f31e6c7f26 * Fix compilation when using SCTP_AUDITING_ENABLED.
* Fix delaying of SACK by taking out old optimization code
  which does not optimize anymore.
* Fix fast retransmission of chunks abandoned by the
  "number of retransmissions" policy.

MFC after: 3 days.
2010-04-23 08:19:47 +00:00
Bjoern A. Zeeb
1c044382c3 Avoid memory access after free. Use the (shortend) copy for the
ipsec mtu lookup as well.

PR:		kern/145736
Submitted by:	Peter Molnar (peter molnar.cc)
MFC after:	3 days
2010-04-21 10:21:34 +00:00
Michael Tuexen
ee94f0a272 Update highest_tsn variables when sliding mapping arrays. 2010-04-20 08:51:21 +00:00
Michael Tuexen
553aff12d4 Really print the nr_mapping array when it should be printed.`
MFC after: 3 days.
2010-04-20 08:50:19 +00:00
Luigi Rizzo
6ba1ccc0f2 whitespace fixes (trailing whitespace, bad indentation
after a merge, etc.)
2010-04-19 16:17:30 +00:00
Kenneth D. Merry
3579cf4c4f Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO. This was
causing TSO to break for the Xen netfront driver.

Reviewed by:	gibbs, rwatson
MFC after:	7 days
2010-04-19 15:15:36 +00:00
Michael Tuexen
307b49efef Get delayed SACK working again.
MFC after: 3 days.
2010-04-19 14:15:58 +00:00
Michael Tuexen
37f144eb5d Fix a bug where SACKs are not sent when they should.
Move some protection code to INVARIANTS.
Cleanups.

MFC after: 3 days.
2010-04-17 12:22:44 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
Bjoern A. Zeeb
0f08182a03 Try to help with a virtualized dummynet after r206428.
This adds the explicit include (so far probably included through one of the
few "hidden" includes in other header files) for vnet.h and adds a cast
to unbreak LINT-VIMAGE.
2010-04-10 22:11:01 +00:00
Rui Paulo
9c251892c0 Honor the CE bit even when the CWR bit is set.
PR:		145600
Submitted by:	Richard Scheffenegger <rs at netapp.com>
MFC after:	1 week
2010-04-10 12:47:06 +00:00
Bruce M Simpson
933fc4dde6 Fix a few issues related to the legacy 4.4 BSD multicast APIs.
IPv4 addresses can and do change during normal operation. Testing by
pfSense developers exposed an issue where OpenOSPFD was using the IPv4
address to leave the OSPF link-scope multicast groups on a dynamic
OpenVPN tun interface, rather than using RFC 3678 with the interface
index, which won't be raced when the interface's addresses change.

In inp_join_group():
 If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or
 MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the
 legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does
 not violate POLA for apps using the old API.
 It also stops us falling through to kicking IGMP state transactions
 in what is otherwise a no-op case.
 [This has already been dealt with in HEAD, but make it explicit before
  we MFC the change to 8.]

In inp_leave_group():
 Fix a bogus conditional.
 Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case
 where it actually belongs.
 If an interface was specified, by primary IPv4 address, for ioctl
 IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation),
 then and only then should we look up the ifp from the IPv4 address in
 mreqs.imr_interface.
 If not, we fall through to imo_match_group() as before, but only in
 the IP_DROP_MEMBERSHIP case.

With these changes, the legacy 4.4BSD multicast API idempotence should
be mostly preserved in the SSM enabled IPv4 stack.

Found by:	ermal (with pfSense)
MFC after:	3 days
2010-04-10 12:05:31 +00:00
Luigi Rizzo
368a605202 This commit enables partial operation of dummynet with kernels
compiled with "options VIMAGE".
As it is now, there is still a single instance of the pipes,
and it is only usable from vnet0 (the main instance).
Trying to use a pipe from a different vimage does not crash
the system as it did before, but the traffic coming out from
the pipe goes to the wrong place, and i still need to
figure out where.

Support for per-vimage pipes is almost there (just a matter of
uncommenting the VNET_* definitions for dn_cfg, plus putting into
the structure the remaining static variables), however i need
first to figure out how init/uninit work, and also to understand
where packets are ending up on exit from a pipe.

In summary: vimage support for dummynet is not complete yet,
but we are getting there.
2010-04-09 18:02:19 +00:00
Luigi Rizzo
c11e54acfc no need to pass an argument to dn_compat_calc_size()
MFC after:	3 days
2010-04-09 16:06:53 +00:00
Luigi Rizzo
7f0de52d2c Hopefully fix the recent breakage in rule deletion.
A few  more tests and this will also go into -stable where
the problem is more critical.
2010-04-07 08:23:58 +00:00
Michael Tuexen
aed5947cd0 Fix a off-by-one bug in zeroing out the mapping arrays.
Fix sctp_print_mapping_array().

MFC after: 1 week
2010-04-06 18:57:50 +00:00
Michael Tuexen
c1589eec14 Use also SCTP/IPv6 checksum offloading in special cases.
MFC after: 2 weeks
2010-04-03 23:51:41 +00:00
Michael Tuexen
b5c164935e * Fix some race condition in SACK/NR-SACK processing.
* Fix handling of mapping arrays when draining mbufs or processing
  FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.
2010-04-03 15:40:14 +00:00
Xin LI
b80d1bf60e Add definition of IPv6 mobility header's protocol number, as assigned by
IANA and defined in RFC 3775.

Obtained from:	KAME
2010-03-31 23:02:25 +00:00
Luigi Rizzo
af84b6f8a7 fix bug in previous commit related to rule deletion
(stable/8 just fixed moments ago)
2010-03-31 02:20:22 +00:00
Luigi Rizzo
10afb58b81 remove a leftover debugging message 2010-03-29 12:27:49 +00:00
Luigi Rizzo
296ec631be Fix handling of set manipulations.
This patch has two fixes for potential kernel panics (one wrong
index, one access to the wrong lock) and two fixes to wrong logic
in a conditional. The potential panics are also on stable/8,
so I am going to MFC the fix quickly.
2010-03-29 12:19:23 +00:00
Randall Stewart
ff014514ee Adds the option of keeping per-cpu statistics in SCTP. This
may be useful since it gets rid of atomics but I want it to
remain an option until I can do further testing on if it really
speeds things up.
2010-03-24 20:02:40 +00:00
Randall Stewart
7fa19ca6c1 lagging file I forgot to commit with my nr-sack fixes... opps
Reviewed by:	tuexen@freebsd.org
2010-03-24 20:01:14 +00:00
Randall Stewart
77acdc2565 Fix for NR-Sack code. The code was NOT working properly when
enabled. Basically most of the operations were incorrect causing
bad sacks when you enabled nr-sack. The fixes range across
4 files and unifiy most of the processing so that we only test
nr_sack flags to decide which type of sack to generate.

Optimization left for this is to combine the sack generation
code and make it capable of generating either sack thus shrinking
out a routine.

Reviewed by:	tuexen@freebsd.org
2010-03-24 19:45:36 +00:00
Luigi Rizzo
592a685e33 Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed.
I forgot to handle this case when i did the mtag cleanup three months ago.

PR:		145004
2010-03-24 15:16:59 +00:00
Randall Stewart
0e13104de6 Fixes a bug where SACKs in the face of
mapping_array expansion would break. Basically
once we expanded the array we no longer had both
mapping arrays in sync which the sack processing code depends on.
This would mean we were randomly referring to memory that was probably
not there. This mostly just gave us bad sack results going back to the peer.
If INVARIENTS was on of course we would hit the panic routine in the sack_check
call.

We also add a print routine for the place where one would panic in
invarients so one can see what the main mapping array holds.

Reviewed by: tuexen@freebsd.org
MFC after:	2 weeks
2010-03-23 01:36:50 +00:00
Kip Macy
3059584e2a - boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
  when near the flow limit
- stop allocating new flows when within 3% of maxflows don't start
  allocating again until below 12.5%

MFC after:	7 days
2010-03-22 23:04:12 +00:00
Luigi Rizzo
3b4d8b3f7a Add a priority-based packet scheduler.
Sponsored by:	The ONELAB2 Project
Submitted by:	Riccardo Panicucci
2010-03-21 16:30:32 +00:00
Luigi Rizzo
b4eacea680 no need for ipfw_flush_tables(), we just need ipfw_destroy_tables() 2010-03-21 15:54:07 +00:00
Luigi Rizzo
2baa9be5d7 revise documentation 2010-03-21 15:52:55 +00:00
Kip Macy
87aedea449 - spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1
- don't default to acquiring tcbinfo lock exclusively in rexmt

MFC after:	7 days
2010-03-20 19:47:30 +00:00
Bjoern A. Zeeb
d0e157f6aa Add pcb reference counting to the pcblist sysctl handler functions
to ensure type stability while caching the pcb pointers for the
copyout.

Reviewed by:	rwatson
MFC after:	7 days
2010-03-17 18:28:27 +00:00
Luigi Rizzo
0804384f1d small fixes to estimate the buffer size when requesting all pipes/flows. 2010-03-15 18:09:21 +00:00
Luigi Rizzo
f9f7bde3bc + implement (two lines) the kernel side of 'lookup dscp N' to use the
dscp as a search key in table lookups;

+ (re)implement a sysctl variable to control the expire frequency of
  pipes and queues when they become empty;

+ add 'queue number' as optional part of the flow_id. This can be
  enabled with the command

        queue X config mask queue ...

  and makes it possible to support priority-based schedulers, where
  packets should be grouped according to the priority and not some
  fields in the 5-tuple.
  This is implemented as follows:
  - redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
    without changing the size or shape of the structure, so there are
    no ABI changes. On passing, also document how other fields are
    used, and remove some useless assignments in ip_fw2.c

  - implement small changes in the userland code to set/read the field;

  - revise the functions in ip_dummynet.c to manipulate masks so they
    also handle the additional field;

There are no ABI changes in this commit.
2010-03-15 17:14:27 +00:00
Robert Watson
9bcd427b89 Abstract out initialization of most aspects of struct inpcbinfo from
their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and
create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy()
to do this work in a central spot.  As inpcbinfo becomes more complex
due to ongoing work to add connection groups, this will reduce code
duplication.

MFC after:      1 month
Reviewed by:    bz
Sponsored by:   Juniper Networks
2010-03-14 18:59:11 +00:00
Randall Stewart
1966e5b5a1 The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
PR:		144529
MFC after:	2 weeks
2010-03-12 22:58:52 +00:00
Kip Macy
d4121a02c0 - restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
  (e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate

MFC after:	7 days
2010-03-12 05:03:26 +00:00
Luigi Rizzo
5007b59f26 implement listing of a subset of pipes/queues/schedulers.
The filtering of the output is done in the kernel instead of userland
to reduce the amount of data transfered.
2010-03-11 22:42:33 +00:00
Luigi Rizzo
642dddf0f8 fix handling of commands issued by RELENG_7 version of /sbin/ipfw,
Submitted by:	Riccardo Panicucci
2010-03-10 14:21:05 +00:00
Qing Li
c7ea0aa648 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
Luigi Rizzo
feadd2b1ca cosmetic changes and C++ compatibility 2010-03-08 11:27:39 +00:00
Luigi Rizzo
d12cc63303 don't use C++ keywords as variable names 2010-03-08 11:27:08 +00:00
Luigi Rizzo
b854138d5f do not report an error unnecessarily 2010-03-08 11:22:47 +00:00
Bjoern A. Zeeb
376aadf896 Destroy TCP UMA zones (empty or not) upon network stack teardown
to not leak them, otherwise making UMA/vmstat unhappy with every stoped vnet.
We will still leak pages (especially for zones marked NOFREE).

Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC after:	5 days
2010-03-07 15:58:44 +00:00
Bjoern A. Zeeb
e253cdd07c Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual netowrk stack, but also free the Radix Node Head.

Sponsored by:	ISPsystem
Reviewed by:	julian
MFC after:	5 days
2010-03-07 15:37:58 +00:00