Commit Graph

2958 Commits

Author SHA1 Message Date
ae
b3c4973a10 Fix style and comments. 2013-03-19 05:51:47 +00:00
glebius
b37af62b9e Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
glebius
76306b3465 - Use m_getcl() instead of hand allocating.
- Convert panic() to KASSERT.
- Remove superfluous cleaning of mbuf fields after allocation.
- Add comment on possible use of m_get2() here.

Sponsored by:	Nginx, Inc.
2013-03-15 12:52:59 +00:00
glebius
37a43650ed Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
glebius
18f90896db Reinitialize eh after pfil(9) processing.
PR:		176764
Submitted by:	adri
2013-03-11 12:06:57 +00:00
melifaro
fccbb24392 Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with:	andre, eri
MFC after:	3 weeks
2013-03-08 20:33:50 +00:00
melifaro
2a77ba4103 Write lock is not required for find&compare operation.
MFC after:	2 weeks
2013-03-05 13:38:45 +00:00
glebius
f8098d720c Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-15 09:03:56 +00:00
glebius
a47c0295c5 Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master()   - boolean function which is true if an address
		    is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
		    and is aware of CARP.

  Utilize ifa_preferred() in ifa_ifwithnet().

  The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-11 10:58:22 +00:00
rrs
75ad250e97 This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will  need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by:	jhb@freebsd.org, jlv@freebsd.org
2013-02-07 15:20:54 +00:00
glebius
7f832c3059 Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by:	ru, andre, net@
2013-01-31 08:55:21 +00:00
glebius
5e925b9e5a route_output() always supplies info with RTAX_GATEWAY member that
points to a sockaddr of AF_LINK family. Assert this instead of
checking.
2013-01-29 21:44:22 +00:00
np
66b6d0e94e Move lle_event to if_llatbl.h
lle_event replaced arp_update_event after the ARP rewrite and ended up
in if_ether.h simply because arp_update_event used to be there too.
IPv6 neighbor discovery is going to grow lle_event support and this is a
good time to move it to if_llatbl.h.

The two in-tree consumers of this event - OFED and toecore - are not
affected.

Reviewed by:	bz@
2013-01-25 23:58:21 +00:00
glebius
9f6a60e000 - Utilize m_get2(), accidentially fixing some signedness bugs.
- Return EMSGSIZE in both cases if uio_resid is oversized or undersized.
- No need to clear rcvif.
2013-01-24 14:29:31 +00:00
luigi
47486b84ac leftover from r245579... flags for semi transparent mode and direct
forwarding through a VALE switch
2013-01-23 03:49:48 +00:00
glebius
bc87b91f9e If lagg(4) can't forward a packet due to underlying port problems,
return much more meaningful ENETDOWN to the stack, instead of EBUSY.
2013-01-21 08:59:31 +00:00
glebius
cf8b6db820 - Add dashes before copyright notices.
- Add $FreeBSD$.
- Remove unused define.
2013-01-07 19:36:11 +00:00
peter
184da5d83d Juggle some internal symbols from our antique zlib (that originally came
in from kernel-pppd which is long gone) so that ZFS and DTRACE play nice.

This is a horrible hack to get freefall to compile, and is in dire need
of reconciliation.  This antique zlib-1.04 code needs to go away.
2013-01-06 14:59:59 +00:00
ae
f89408c03e Add an ability to set net.link.stf.permit_rfc1918 from the loader.
MFC after:	2 weeks
2012-12-27 21:26:08 +00:00
ae
6064718574 Add net.link.stf.permit_rfc1918 sysctl variable. It can be used to allow
the use of private IPv4 addresses with stf(4).

MFC after:	2 weeks
2012-12-27 20:59:22 +00:00
kevlo
df821e6509 Fix typo in comment.
Reviewed by:	thompsa
2012-12-18 06:37:23 +00:00
glebius
8137816adb Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by:	Ian FREISLICH <ianf clue.co.za>
2012-12-13 11:11:15 +00:00
ghelmer
726beb3d43 Changes to resolve races in bpfread() and catchpacket() that, at worst,
cause kernel panics.

Add a flag to the bpf descriptor to indicate whether the hold buffer
is in use. In bpfread(), set the "hold buffer in use" flag before
dropping the descriptor lock during the call to bpf_uiomove().
Everywhere else the hold buffer is used or changed, wait while
the hold buffer is in use by bpfread(). Add a KASSERT in bpfread()
after re-acquiring the descriptor lock to assist uncovering any
additional hold buffer races.
2012-12-10 16:14:44 +00:00
hrs
377b89c55f - Move definition of V_deembed_scopeid to scope6_var.h.
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.
2012-12-05 19:45:24 +00:00
glebius
8e20fa5ae9 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
hrs
db5359d69a - Fix LOR in sa6_recoverscope() in rt_msg2()[1].
- Check V_deembed_scopeid before checking if sa_family == AF_INET6.
- Fix scope id handing in route(8)[2] and ifconfig(8).

Reported by:	rpaulo[1], Mateusz Guzik[1], peter[2]
2012-12-04 17:12:23 +00:00
melifaro
20cedcd3c8 Fix bpf_if structure leak introduced in r235745.
Move all such structures to delayed-free lists and
delete all matching on interface departure event.

MFC after:	1 week
2012-12-02 21:43:37 +00:00
pjd
30b3450aa9 - Use more appropriate loop (do { } while()) when generating ethernet address
for bridge interface.
- If we found a collision we can break the loop - only one collision is
  possible and one is exactly enough to need to renegerate.

Obtained from:	WHEEL Systems
MFC after:	1 week
2012-11-29 08:06:23 +00:00
andre
cd840ea089 Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability.
Checksumming the IP header of fragments is no different from doing
normal IP headers.

Discussed with:	yongari
MFC after:	1 week
2012-11-27 19:31:49 +00:00
davidxu
852ac8ea6c Pass allocated unit number to make_dev, otherwise kernel panics later while
cloning second tap.

Reviewed by: kevlo,ed
2012-11-27 12:23:57 +00:00
glebius
d927dc2039 Better safe than sorry: reinitialize eh after ng_ether(4) and
if_bridge(4) processing, since mbuf may be modified there.

Submitted by:	youngari
2012-11-27 06:35:26 +00:00
glebius
ff9490f069 Re-initialize eh pointer after m_adj()
Submitted by:	Kohji Okuno <okuno.kohji jp.panasonic.com>
Reviewed by:	yongari
2012-11-26 19:45:01 +00:00
adrian
2191ae37a2 Fix up a compile time warning if INET6 isn't defined. 2012-11-18 04:51:46 +00:00
hrs
456b7a9341 Fill sin6_scope_id in sockaddr_in6 before passing it from the kernel to
userland via routing socket or sysctl.  This eliminates the following
KAME-specific sin6_scope_id handling routine from each userland utility:

 sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]);

This behavior can be controlled by net.inet6.ip6.deembed_scopeid.  This is
set to 1 by default (sin6_scope_id will be filled in the kernel).

Reviewed by:	bz
2012-11-17 20:19:00 +00:00
ghelmer
249e3589f9 Work around a race in bpfread() by validating the hold buffer pointer
before freeing it. Otherwise, we can lose a buffer and cause a panic
in catchpacket().
2012-11-06 21:07:04 +00:00
ae
4354018055 Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by:	andre
2012-11-02 01:20:55 +00:00
glebius
f79061ff05 o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
  hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
  although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
  After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by:	Sebastian Kuzminsky <seb lineratesystems.com>
2012-10-26 21:06:33 +00:00
ae
71112b5a8e Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by:	Yandex LLC
Discussed with:	net@
MFC after:	2 weeks
2012-10-25 09:39:14 +00:00
glebius
278c83657a Fix fallout from r240071. If destination interface lookup fails,
we should broadcast a packet, not try to deliver it to NULL.

Reported by:	rpaulo
2012-10-24 18:33:44 +00:00
glebius
5cc3ac5902 Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
melifaro
030e8d5bab Make PFIL use per-VNET lock instead of per-AF lock. Since most used packet
filters (ipfw and PF) use the same ruleset with the same lock for both
AF_INET and AF_INET6 there is no need in more fine-grade locking.
However, it is possible to request personal lock by specifying
PFIL_FLAG_PRIVATE_LOCK flag in pfil_head structure (see pfil.9 for
more details).

Export PFIL lock via rw_lock(9)/rm_lock(9)-like API permitting pfil consumers
to use this lock instead of own lock. This help reducing locks on main
traffic path.

pfil_assert() is currently not implemented due to absense of rm_assert().
Waiting for some kind of r234648 to be merged in HEAD.

This change is part of bigger patch reducing routing locking.

Sponsored by:	Yandex LLC
Reviewed by:	glebius, ae
OK'd by:	silence on net@
MFC after:	3 weeks
2012-10-22 14:10:17 +00:00
andre
91a702221a Update to previous r241688 to use __func__ instead of spelled out function
name in log(9) message.

Suggested by:	glebius
2012-10-19 10:07:55 +00:00
andre
a2f4fcfc3d Use LOG_WARNING level in in_attachdomain1() instead of printf().
Submitted by:	vijju.singh-at-gmail.com
2012-10-18 14:08:26 +00:00
andre
34a9a386cb Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.
2012-10-18 13:57:24 +00:00
glebius
7bdf4320f7 Utilize new macro to initialize if_baudrate(). 2012-10-18 09:57:56 +00:00
glebius
d5815329e7 Fix VIMAGE build.
Reported by:	Nikolai Lifanov <lifanov mail.lifanov.com>
Pointy hat to:	glebius
2012-10-17 21:19:27 +00:00
emax
0dfb309a1f provide helper if_initbaudrate() to set if_baudrate_pf and if_baudrate_pf.
again, use ixgbe(4) as an example of how to use new helper function.

Reviewed by:	jhb
MFC after:	1 week
2012-10-17 19:24:13 +00:00
delphij
37fb264720 Fix build. 2012-10-17 08:19:08 +00:00
emax
f4127691ff report total number of ports for each lagg(4) interface
via net.link.lagg.X.count sysctl

MFC after:	1  week
2012-10-16 22:43:14 +00:00
emax
214df82afa introduce concept of ifi_baudrate power factor. the idea is to work
around the problem where high speed interfaces (such as ixgbe(4))
are not able to report real ifi_baudrate. bascially, take a spare
byte from struct if_data and use it to store ifi_baudrate power
factor. in other words,

real ifi_baudrate = ifi_baudrate * 10 ^ ifi_baudrate power factor

this should be backwards compatible with old binaries. use ixgbe(4)
as an example on how drivers would set ifi_baudrate power factor

Discussed with:	kib, scottl, glebius
MFC after:	1 week
2012-10-16 20:18:15 +00:00