5139 Commits

Author SHA1 Message Date
ae
cbc4e577f0 Add an ability accept encapsulated packets from different sources by one
gif(4) interface. Add new option "ignore_source" for gif(4) interface.
When it is enabled, gif's encapcheck function requires match only for
packet's destination address.

Differential Revision:	https://reviews.freebsd.org/D2004
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-15 12:19:45 +00:00
tuexen
02ec72fed7 Ensure that the COOKIE-ACK can be sent over UDP if the COOKIE-ECHO was
received over UDP.
Thanks to Felix Weinrank for makeing me aware of the problem and to
Irene Ruengeler for providing the fix.

MFC after: 1 week
2015-05-12 08:08:16 +00:00
gnn
ac5008de11 Add a state transition call to show that we have entered TIME_WAIT.
Although this is not important to the rest of the TCP processing
it is a conveneint way to make the DTrace state-transition probe
catch this important state change.

MFC after:	1 week
2015-05-01 12:49:03 +00:00
gnn
d333d3e495 Move the SIFTR DTrace probe out of the writing thread context
and directly into the place where the data is collected.
2015-04-30 17:43:40 +00:00
gnn
08d35a248b Brief demo script showing the various values that can be read via
the new SIFTR statically defined tracepoint (SDT).

Differential Revision:	https://reviews.freebsd.org/D2387
Reviewed by:	bz, markj
2015-04-29 17:19:55 +00:00
melifaro
9f3d7ccd07 Make rule table kernel-index rewriting support any kind of objects.
Currently we have tables identified by their names in userland
with internal kernel-assigned indices. This works the following way:

When userland wishes to communicate with kernel to add or change rule(s),
it makes indexed sorted array of table names
(internally ipfw_obj_ntlv entries), and refer to indices in that
array in rule manipulation.
Prior to committing new rule to the ruleset kernel
a) finds all referenced tables, bump their refcounts and change
 values inside the opcodes to be real kernel indices
b) auto-creates all referenced but not existing tables and then
 do a) for them.

Kernel does almost the same when exporting rules to userland:
 prepares array of used tables in all rules in range, and
 prepends it before the actual ruleset retaining actual in-kernel
 indexes for that.

There is also special translation layer for legacy clients which is
able to provide 'real' indices for table names (basically doing atoi()).

While it is arguable that every subsystem really needs names instead of
numbers, there are several things that should be noted:

1) every non-singleton subsystem needs to store its runtime state
somewhere inside ipfw chain (and be able to get it fast)
2) we can't assume object numbers provided by humans will be dense.

Existing nat implementation (O(n) access and LIST inside chain) is a
good example.

Hence the following:
* Convert table-centric rewrite code to be more generic, callback-based
* Move most of the code from ip_fw_table.c to ip_fw_sockopt.c
* Provide abstract API to permit subsystems convert their objects
  between userland string identifier and in-kernel index.
  (See struct opcode_obj_rewrite) for more details
* Create another per-chain index (in next commit) shared among all subsystems
* Convert current NAT44 implementation to use new API, O(1) lookups,
 shared index and names instead of numbers (in next commit).

Sponsored by:	Yandex LLC
2015-04-27 08:29:39 +00:00
ae
b38774ffc9 Remove now unneded KEY_FREESP() for case when ipsec[46]_process_packet()
returns EJUSTRETURN.

Sponsored by:	Yandex LLC
2015-04-27 01:11:09 +00:00
ae
5a6412a276 Fix possible use after free due to security policy deletion.
When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(),
we hold one reference to security policy and release it just after return
from this function. But IPSec processing can be deffered and when we release
reference to security policy after ipsec[46]_process_packet(), user can
delete this security policy from SPDB. And when IPSec processing will be
done, xform's callback function will do access to already freed memory.

To fix this move KEY_FREESP() into callback function. Now IPSec code will
release reference to SP after processing will be finished.

Differential Revision:	https://reviews.freebsd.org/D2324
No objections from:	#network
Sponsored by:	Yandex LLC
2015-04-27 00:55:56 +00:00
tuexen
5106da568a Don't panic under INVARIANTS when receiving a SACK which cumacks
a TSN never sent.
While there, fix two typos.

MFC after: 1 week
2015-04-26 21:47:15 +00:00
bapt
9ec79fef30 mdoc: fix rendering issues 2015-04-26 11:39:25 +00:00
ae
2af9b531aa Fix possible reference leak.
Sponsored by:	Yandex LLC
2015-04-24 21:05:29 +00:00
glebius
2ac0a031ed Improve carp(4) locking:
- Use the carp_sx to serialize not only CARP ioctls, but also carp_attach()
  and carp_detach().
- Use cif_mtx to lock only access to those the linked list.
- These locking changes allow us to do some memory allocations with M_WAITOK
  and also properly call callout_drain() in carp_destroy().
- In carp_attach() assert that ifaddr isn't attached. We always come here
  with a pristine address from in[6]_control().

Reviewed by:	oleg
Sponsored by:	Nginx, Inc.
2015-04-21 20:25:12 +00:00
glebius
14b7122d6d Provide functions to determine presence of a given address
configured on a given interface.

Discussed with:	np
Sponsored by:	Nginx, Inc.
2015-04-17 11:57:06 +00:00
jch
b227cb3d85 Fix an old and well-documented use-after-free race condition in
TCP timers:
 - Add a reference from tcpcb to its inpcb
 - Defer tcpcb deletion until TCP timers have finished

Differential Revision:	https://reviews.freebsd.org/D2079
Submitted by:		jch, Marc De La Gueronniere <mdelagueronniere@verisign.com>
Reviewed by:		imp, rrs, adrian, jhb, bz
Approved by:		jhb
Sponsored by:		Verisign, Inc.
2015-04-16 10:00:06 +00:00
adrian
efbeb67101 Fix RSS build - netisr input / NETISR_IP_DIRECT is used here. 2015-04-15 00:57:21 +00:00
mjg
7f65d178b4 Replace struct filedesc argument in getsock_cap with struct thread
This is is a step towards removal of spurious arguments.
2015-04-11 16:00:33 +00:00
mjg
22da590f11 fd: remove filedesc argument from fdclose
Just accept a thread instead. This makes it consistent with fdalloc.

No functional changes.
2015-04-11 15:40:28 +00:00
delphij
34b5443a48 Attempt to fix build after 281351 by defining full prototype for the
functions that were moved to ip_reass.c.
2015-04-11 01:06:59 +00:00
glebius
d2eb1d4ad8 o Use Jenkins hash. With previous hash, for a single source IP address and
sequential IP ID case (e.g. ping -f), distribution fell into 8-10 buckets
  out of 64. With Jenkins hash, distribution is even.
o Add random seed to the hash.

Sponsored by:	Nginx, Inc.
2015-04-10 06:55:43 +00:00
glebius
a0f9f3c303 Move all code related to IP fragment reassembly to ip_reass.c. Some
function names have changed and comments are reformatted or added, but
there is no functional change.

Claim copyright for me and Adrian.

Sponsored by:	Nginx, Inc.
2015-04-10 06:02:37 +00:00
glebius
1f2f31e87d Now that IP reassembly is no longer under single lock, book-keeping amount
of allocations in V_nipq is racy.  To fix that, we would simply stop doing
book-keeping ourselves, and rely on UMA doing that.  There could be a
slight overcommit due to caches, but that isn't a big deal.

o V_nipq and V_maxnipq go away.
o net.inet.ip.fragpackets is now just SYSCTL_UMA_CUR()
o net.inet.ip.maxfragpackets could have been just SYSCTL_UMA_MAX(), but
  historically it has special semantics about values of 0 and -1, so
  provide sysctl_maxfragpackets() to handle these special cases.
o If zone limit lowers either due to net.inet.ip.maxfragpackets or due to
  kern.ipc.nmbclusters, then new function ipq_drain_tomax() goes over
  buckets and frees the oldest packets until we are in the limit.
  The code that (incorrectly) did that in ip_slowtimo() is removed.
o ip_reass() doesn't check any limits and calls uma_zalloc(M_NOWAIT).
  If it fails, a new function ipq_reuse() is called. This function will
  find the oldest packet in the currently locked bucket, and if there is
  none, it will search in other buckets until success.

Sponsored by:	Nginx, Inc.
2015-04-09 22:13:27 +00:00
glebius
8685b42463 In the ip_reass() do packet examination and adjusting before acquiring
locks and doing lookups.

Sponsored by:	Nginx, Inc.
2015-04-09 21:32:32 +00:00
glebius
a68631b0e4 Make ip reassembly queue mutexes per-vnet, putting them into the structure
that they protect.

Sponsored by:	Nginx, Inc.
2015-04-09 21:17:07 +00:00
glebius
977624bc79 Use TAILQ_FOREACH_SAFE() instead of implementing it ourselves.
Sponsored by:	Nginx, Inc.
2015-04-09 09:00:32 +00:00
glebius
03afa52780 If V_maxnipq is set to zero, drain the queue here and now, instead of
relying on timeouts.

Sponsored by:	Nginx, Inc.
2015-04-09 08:56:23 +00:00
glebius
ddd581998f o Since we always update either fragdrop or fragtimeout stat counter when we
free a fragment, provide two inline functions that do that for us:
  ipq_drop() and ipq_timeout().
o Rename ip_free_f() to ipq_free() to match the name scheme of IP reassembly.
o Remove assertion from ipq_free(), since it requires extra argument to be
  passed, but locking scheme is simple enough and function is static.

Sponsored by:	Nginx, Inc.
2015-04-09 08:52:02 +00:00
glebius
89392dc82f Rename ip_drain_locked() to ip_drain_vnet(), since the function differs
from ip_drain() not in locking, but in the scope of its work.

Sponsored by:	Nginx, Inc.
2015-04-09 08:37:16 +00:00
adrian
f1a634931a Move the IPv4 reassembly queue locking from a single lock to be per-bucket (global).
This significantly improves performance on multi-core servers where there
is any kind of IPv4 reassembly going on.

glebius@ would like to see the locking moved to be attached to the reassembly
bucket, which would make it per-bucket + per-VNET, instead of being global.
I decided to keep it global for now as it's the minimal useful change;
if people agree / wish to migrate it to be per-bucket / per-VNET then please
do feel free to do so.  I won't complain.

Thanks to Norse Corp for giving me access to much larger servers
to test this at across the 4 core boxes I have at home.

Differential Revision:	https://reviews.freebsd.org/D2095
Reviewed by:	glebius (initial comments incorporated into this patch)
MFC after:	2 weeks
Sponsored by:	Norse Corp, Inc (hardware)
2015-04-07 23:09:34 +00:00
delphij
4836e6055c Improve patch for SA-15:04.igmp to solve a potential buffer overflow.
Reported by:	bde
Submitted by:	oshogbo
2015-04-07 20:20:03 +00:00
glebius
42eded17aa Add sleepable lock to protect at least against two parallel SIOCSVHs.
Sponsored by:	Nginx, Inc.
2015-04-06 15:31:19 +00:00
hselasky
5d01d90225 Extend fixes made in r278103 and r38754 by copying the complete packet
header and not only partial flags and fields. Firewalls can attach
classification tags to the outgoing mbufs which should be copied to
all the new fragments. Else only the first fragment will be let
through by the firewall. This can easily be tested by sending a large
ping packet through a firewall. It was also discovered that VLAN
related flags and fields should be copied for packets traversing
through VLANs. This is all handled by "m_dup_pkthdr()".

Regarding the MAC policy check in ip_fragment(), the tag provided by
the originating mbuf is copied instead of using the default one
provided by m_gethdr().

Tested by:		Karim Fodil-Lemelin <fodillemlinkarim at gmail.com>
MFC after:		2 weeks
Sponsored by:		Mellanox Technologies
PR:			7802
2015-04-02 15:47:37 +00:00
jch
990bf230ee Provide better debugging information in tcp_timer_activate() and
tcp_timer_active()

Differential Revision:	https://reviews.freebsd.org/D2179
Suggested by:		bz
Reviewed by:		jhb
Approved by:		jhb
2015-04-02 14:43:07 +00:00
glebius
0514a3075b Provide a comment explaining issues with the counter(9) trick, so that
people won't copy and paste it blindly.

Prodded by:	ian
Sponsored by:	Nginx, Inc.
2015-04-02 14:22:59 +00:00
bz
868c1ec5c2 Try to unbreak the build after r280971 by providing the missing
#include header for SYSINIT.
2015-04-02 00:30:53 +00:00
glebius
7c22152af0 o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
jch
e6365e0ee8 Use appropriate timeout_t* instead of void* in tcp_timer_activate()
Suggested by:		imp
Differential Revision:	https://reviews.freebsd.org/D2154
Reviewed by:		imp, jhb
Approved by:		jhb
2015-03-31 10:17:13 +00:00
glebius
5292c6a2d2 VNETalize random IP ID engine.
Sponsored by:	Nginx, Inc.
2015-03-28 16:59:57 +00:00
glebius
96ef75c4d1 Initialize random IP ID engine via SYSINIT() instead of doing that on
first packet.  This allow to use M_WAITOK and cut down some error handling.

Sponsored by:	Nginx, Inc.
2015-03-28 16:06:46 +00:00
fabient
d819d79785 On multi CPU systems, we may emit successive packets with the same id.
Fix the race by using an atomic operation.

Differential Revision:	https://reviews.freebsd.org/D2141
Obtained from:	emeric.poupon@stormshield.eu
MFC after:	1 week
Sponsored by:	Stormshield
2015-03-27 13:26:59 +00:00
tuexen
65bba872bd Improve the selection of the destination address of SACK chunks.
This fixes
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196755
and is joint work with rrs@.

MFC after: 1 week
2015-03-26 22:05:31 +00:00
tuexen
7ca13bb5b0 Make sure that we don't free an SCTP shared key too early.
Thanks to Pouyan Sepehrdad from Qualcomm Product Security Initiative
for reporting the issue.
MFC after: 3 days
2015-03-25 22:45:54 +00:00
tuexen
34cbdad427 Use the reference count of the right SCTP inp.
Joint work with rrs@

MFC after: 3 days
2015-03-25 21:41:20 +00:00
tuexen
5d17873a8e Fix two bugs which resulted in a screwed up end point list:
* Use a save way to walk throught a list while manipulting it.
* Have to appropiate locks in place.
Joint work with rrs@

MFC after: 3 days
2015-03-24 21:12:45 +00:00
lstewart
3cad1a07e8 The addition of flowid and flowtype in r280233 and r280237 respectively forgot
to extend the IPv6 packet node format string, which causes a build failure when
SIFTR is compiled with IPv6 support.

Reported by:	Lars Eggert
2015-03-24 15:08:43 +00:00
tuexen
dd7a50a918 Fix the bug in the handling of fragmented abandoned SCTP user messages reported in
https://code.google.com/p/sctp-refimpl/issues/detail?id=11
Thanks to Lally Singh for reporting it.
MFC after: 3 days
2015-03-24 15:05:36 +00:00
tuexen
8a1215fa15 Fix an accounting bug related to the per stream chunk counter.
While there, don't refer to a net articifically.

MFC after: 3 days
2015-03-24 14:51:46 +00:00
tuexen
aa1ba0c150 When an ICMP message is received and the MTU shrinks, only
mark outstanding chunks for retransmissions.

MFC after: 3 days
2015-03-23 23:34:21 +00:00
tuexen
834e273477 Remove a useless assignment.
MFC after: 1 week
2015-03-23 15:12:02 +00:00
hiren
78a38e3fbf Add connection flow type to siftr(4).
Suggested by:	adrian
Sponsored by:	Limelight Networks
2015-03-19 00:23:16 +00:00
hiren
3617a765ed Add connection flowid to siftr(4).
Reviewed by:	lstewart
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D2089
2015-03-18 23:24:25 +00:00