4624 Commits

Author SHA1 Message Date
Gleb Smirnoff
4937a6561f Simplify ip_stripoptions() reducing number of intermediate
variables.
2012-10-23 10:29:31 +00:00
Gleb Smirnoff
8ad458a471 Do not reduce ip_len by size of IP header in the ip_input()
before passing a packet to protocol input routines.
  For several protocols this mean that now protocol needs to
do subtraction itself, and for another half this means that
we do not need to add header length back to the packet.

  Make ip_stripoptions() to adjust ip_len, since now we enter
this function with a packet header whose ip_len does represent
length of entire packet, not payload only.
2012-10-23 08:33:13 +00:00
Xin LI
6f56329a25 Remove __P.
Submitted by:	kevlo
Reviewed by:	md5(1)
MFC after:	2 months
2012-10-22 21:49:56 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
Andrey Zonov
32fe38f123 - Update cachelimit after hashsize and bucketlimit were set.
Reported by:	az
Reviewed by:	melifaro
Approved by:	kib (mentor)
MFC after:	1 week
2012-10-19 14:00:03 +00:00
Andre Oppermann
c9b652e3e8 Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.
2012-10-18 13:57:24 +00:00
Ed Maste
983731268c Avoid potential bad pointer dereference.
Previously RuleAdd would leave entry->la unset for the first entry in
the proxyList.

Sponsored by: ADARA Networks
MFC After: 1 week
2012-10-17 20:23:07 +00:00
Gleb Smirnoff
e76163a539 We don't need to convert ip6_len to host byte order before
ip6_output(), the IPv6 stack is working in net byte order.

The reason this code worked before is that ip6_output()
doesn't look at ip6_plen at all and recalculates it based
on mbuf length.
2012-10-15 07:57:55 +00:00
Gleb Smirnoff
347d90acff Fix a miss from r241344: in ip_mloopback() we need to go to
net byte order prior to calling in_delayed_cksum().

Reported by:	 Olivier Cochard-Labbe <olivier cochard.me>
2012-10-14 15:08:07 +00:00
Alexander V. Chernikov
3bff27cd67 Cleanup documentation: cloning route support has been removed in r186119.
MFC after:	2 weeks
2012-10-13 09:31:01 +00:00
Gleb Smirnoff
86b61e4748 Revert fixup of ip_len from r241480. Now stack isn't yet
ready for that change.
2012-10-12 09:32:38 +00:00
Gleb Smirnoff
105bd2113b In ip_stripoptions():
- Remove unused argument and incorrect comment.
  - Fixup ip_len after stripping.
2012-10-12 09:24:24 +00:00
Alexander V. Chernikov
3c2824b9ef Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect is
enabled. This eliminates one mtx_lock() per each routing lookup thus improving
performance in several cases (routing to directly connected interface or routing
to default gateway).

Icmp redirects should not be used to provide routing direction nowadays, even
for end hosts. Routers should not use them too (and this is explicitly restricted
in IPv6, see RFC 4861, clause 8.2).

Current commit changes rnh_machaddr function to 'stock' rn_match (and back) for every
AF_INET routing table in given VNET instance on drop_redirect sysctl change.

This change is part of bigger patch eliminating rte locking.

Sponsored by:	Yandex LLC
MFC after:	2 weeks
2012-10-10 19:06:11 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Gleb Smirnoff
23e9c6dc1e After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
  - ip_output(), ipfilter(4) not changed, since already call
    in_delayed_cksum() with header in net byte order.
  - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
    there and back.
  - mrouting code and IPv6 ipsec now need to switch byte order there and
    back, but I hope, this is temporary solution.
  - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
  - pf_route() catches up on r241245 changes to ip_output().
2012-10-08 08:03:58 +00:00
Gleb Smirnoff
b7fb54d8ae No reason to play with IP header before calling sctp_delayed_cksum()
with offset beyond the IP header.
2012-10-08 07:21:32 +00:00
Gleb Smirnoff
21d172a3f1 A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order.
  - ip_input() is entered in net byte order and converts packet
    to host byte order right _after_ processing pfil(9) hooks.
  - ip_output() is entered in host byte order and converts packet
    to net byte order right _before_ processing pfil(9) hooks.
  - ip_fragment() accepts and emits packet in net byte order.
  - ip_forward(), ip_mloopback() use host byte order (untouched actually).
  - ip_fastforward() no longer modifies packet at all (except ip_ttl).
  - Swapping of byte order there and back removed from the following modules:
    pf(4), ipfw(4), enc(4), if_bridge(4).
  - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
  - __FreeBSD_version bumped.
  - pfil(9) manual page updated.

Reviewed by:	ray, luigi, eri, melifaro
Tested by:	glebius (LE), ray (BE)
2012-10-06 10:02:11 +00:00
Gleb Smirnoff
df4e91d386 There is a complex race in in_pcblookup_hash() and in_pcblookup_group().
Both functions need to obtain lock on the found PCB, and they can't do
classic inter-lock with the PCB hash lock, due to lock order reversal.
To keep the PCB stable, these functions put a reference on it and after PCB
lock is acquired drop it. If the reference was the last one, this means
we've raced with in_pcbfree() and the PCB is no longer valid.

  This approach works okay only if we are acquiring writer-lock on the PCB.
In case of reader-lock, the following scenario can happen:

  - 2 threads locate pcb, and do in_pcbref() on it.
  - These 2 threads drop the inp hash lock.
  - Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock,
    does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which
    doesn't free the pcb due to two references on it. Then it unlocks the pcb.
  - 2 aforementioned threads acquire reader lock on the pcb and run
    in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues,
    second gets 0 and considers pcb freed, returns.
  - The thread that got 1 continutes working with detached pcb, which later
    leads to panic in the underlying protocol level.

  To plumb that problem an additional INPCB flag introduced - INP_FREED. We
check for that flag in the in_pcbrele_rlocked() and if it is set, we pretend
that that was the last reference.

Discussed with:		rwatson, jhb
Reported by:		Vladimir Medvedkin <medved rambler-co.ru>
2012-10-02 12:03:02 +00:00
Gleb Smirnoff
891122d180 carp_send_ad() should never return without rescheduling next run. 2012-09-29 05:52:19 +00:00
Gleb Smirnoff
85c05144f1 Fix bug in TCP_KEEPCNT setting, which slipped in in the last round
of reviewing of r231025.

Unlike other options from this family TCP_KEEPCNT doesn't specify
time interval, but a count, thus parameter supplied doesn't need
to be multiplied by hz.

Reported & tested by:	amdmi3
2012-09-27 07:13:21 +00:00
Michael Tuexen
e06f3469e0 Whitespace change.
MFC after:	3 days
2012-09-23 07:43:10 +00:00
Michael Tuexen
a98809db78 Declare a static function as such.
MFC after:	3 days
2012-09-23 07:23:18 +00:00
Michael Tuexen
efb0814c24 Fix a bug related to handling Re-config chunks. It is not true that
the association can be removed if the socket is gone.

MFC after:	3 days
2012-09-22 22:04:17 +00:00
Michael Tuexen
2089750009 Small cleanups. No functional change.
MFC after:	10 days
2012-09-22 14:39:20 +00:00
Kevin Lo
b7e1113e8f Fix typo: s/pakcet/packet 2012-09-20 03:29:43 +00:00
Eitan Adler
582212fa04 s/teh/the/g
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:59:55 +00:00
Michael Tuexen
dcb68fba2d Small cleanups. No functional change.
MFC after:	10 days
2012-09-14 18:32:20 +00:00
Gleb Smirnoff
3b3a8eb937 o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c		-> sys/netpfil/pf/
sys/contrib/pf/net/*.h		-> sys/net/
contrib/pf/pfctl/*.c		-> sbin/pfctl
contrib/pf/pfctl/*.h		-> sbin/pfctl
contrib/pf/pfctl/pfctl.8	-> sbin/pfctl
contrib/pf/pfctl/*.4		-> share/man/man4
contrib/pf/pfctl/*.5		-> share/man/man5

sys/netinet/ipfw		-> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with:		bz, luigi
2012-09-14 11:51:49 +00:00
Michael Tuexen
8225a9bc85 Whitespace changes.
MFC after: 10 days
2012-09-09 08:14:04 +00:00
Michael Tuexen
fe6bb0a788 Whitespace cleanup.
MFC after: 10 days
2012-09-08 20:54:54 +00:00
Gleb Smirnoff
d6d3f01e0a Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

 o Fine grained locking, thus much better performance.
 o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

  Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by:	Florian Smeets <flo freebsd.org>
Tested by:	Chekaluk Vitaly <artemrts ukr.net>
Tested by:	Ben Wilber <ben desync.com>
Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-09-08 06:41:54 +00:00
Michael Tuexen
a169d6ec2b Don't include a structure containing a flexible array in another
structure.

MFC after:	10 days
2012-09-07 13:36:42 +00:00
Michael Tuexen
12780a595e Get rid of a gcc'ism.
MFC after: 10 days
2012-09-06 07:03:56 +00:00
Michael Tuexen
dd294dcec6 Using %p in a format string requires a void *.
MFC after: 10 days
2012-09-05 18:52:01 +00:00
Michael Tuexen
2899aa8f65 Use the consistenly the size of a variable. This helps to keep the code
simpler for the userland implementation.

MFC after: 3 days
2012-09-04 22:45:00 +00:00
Michael Tuexen
c6328f940e Whitespace change.
MFC after: 3 days
2012-09-04 22:40:49 +00:00
Alexander V. Chernikov
7d4317bd40 Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after:     3 weeks
2012-09-04 19:43:26 +00:00
Gleb Smirnoff
478df1d534 Provide a sysctl switch that allows to install ARP entries
with multicast bit set. FreeBSD refuses to install such
entries since 9.0, and this broke installations running
Microsoft NLB, which are violating standards.

Tested by:	Tarasov Oleg <oleg_tarasov sg-tea.com>
2012-09-03 14:29:28 +00:00
Michael Tuexen
81eb4e6351 Fix a typo which results in RTT to be off by a factor of 10, if the RTT is
larger than 1 second.

MFC after:	3 days
2012-09-02 12:37:30 +00:00
Eitan Adler
64baf9fbe0 Mark the ipfw interface type as not being ether. This fixes an issue
where uuidgen tried to obtain a ipfw device's mac address which was
    always zero.

    PR:		170460
    Submitted by:	wxs
    Reviewed by:	bdrewery
    Reviewed by:	delphij
    Approved by:	cperciva
    MFC after:	1 week
2012-09-01 23:33:49 +00:00
Randall Stewart
ec03d5433f This small change takes care of a race condition
that can occur when both sides close at the same time.
If that occurs, without this fix the connection enters
FIN1 on both sides and they will forever send FIN|ACK at
each other until the connection times out. This is because
we stopped processing the FIN|ACK and thus did not advance
the sequence and so never ACK'd each others FIN. This
fix adjusts it so we *do* process the FIN properly and
the race goes away ;-)

MFC after:	1 month
2012-08-25 09:26:37 +00:00
Navdeep Parhar
06fd9875aa Correctly handle the case where an inp has already been dropped by the time
the TOE driver reports that an active open failed.  toe_connect_failed is
supposed to handle this but it should be provided the inpcb instead of the
tcpcb which may no longer be around.
2012-08-21 18:09:33 +00:00
Randall Stewart
7db496de2c Though I disagree, I conceed to jhb & Rui. Note
that we still have a problem with this whole structure of
locks and in_input.c [it does not lock which it should not, but
this *can* lead to crashes]. (I have seen it in our SQA
testbed.. besides the one with a refcnt issue that I will
have SQA work on next week ;-)
2012-08-19 11:54:02 +00:00
Randall Stewart
9424879158 Ok jhb, lets move the ifa_free() down to the bottom to
assure that *all* tables and such are removed before
we start to free. This won't protect the Hash in ip_input.c
but in theory should protect any other uses that *do* use locks.

MFC after:	1 week (or more)
2012-08-17 05:51:46 +00:00
Lawrence Stewart
ee24d3b840 The TCP PAWS fix for kernels with fast tick rates (r231767) changed the TCP
timestamp related stack variables to reference ms directly instead of ticks.
The h_ertt(4) Khelp module relies on TCP timestamp information in order to
calculate its enhanced RTT estimates, but was not updated as part of r231767.

Consequently, h_ertt has not been calculating correct RTT estimates since
r231767 was comitted, which in turn broke all delay-based congestion control
algorithms because they rely on the h_ertt RTT estimates.

Fix the breakage by switching h_ertt to use tcp_ts_getticks() in place of all
previous uses of the ticks variable. This ensures all timestamp related
variables in h_ertt use the same units as the TCP stack and therefore results in
meaningful comparisons and RTT estimate calculations.

Reported & tested by:	Naeem Khademi (naeemk at ifi uio no)
Discussed with:	bz
MFC after:	3 days
2012-08-17 01:49:51 +00:00
Randall Stewart
184749821f Its never a good idea to double free the same
address.

MFC after:	1 week (after the other commits ahead of this gets MFC'd)
2012-08-16 17:55:16 +00:00
Luigi Rizzo
e5813a3bce s/lenght/length/ in comments 2012-08-07 07:52:25 +00:00
Luigi Rizzo
17369272e4 move functions outside the SYSBEGIN/SYSEND block
(SYSBEGIN/SYSEND are specific to ipfw/dummynet and are used to
emulate sysctl on platforms that do not have them, and they work
by creating an array which contains all the sysctl-ed symbols.)
2012-08-06 11:02:23 +00:00
Luigi Rizzo
00c4633285 use FREE_PKT instead of m_freem to free an mbuf.
The former is the standard form used in ipfw/dummynet, so that
it is easier to remap it to different memory managers depending
on the platform.
2012-08-06 10:50:43 +00:00