Commit Graph

2099 Commits

Author SHA1 Message Date
andre
a8987888fb Shave 40 unused bytes from struct tcpcb. 2004-10-22 19:55:04 +00:00
andre
42e8443fa1 When printing the initialization string and IPDIVERT is not compiled into the
kernel refer to it as "loadable" instead of "disabled".
2004-10-22 19:18:06 +00:00
andre
7c8480e7f1 Refuse to unload the ipdivert module unless the 'force' flag is given to kldunload.
Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8)
man pages.
2004-10-22 19:12:01 +00:00
andre
513a1a5fd1 Destroy the UMA zone on unload. 2004-10-19 22:51:20 +00:00
andre
952f796999 Slightly extend the locking during unload to fully cover the protocol
deregistration.  This does not entirely close the race but narrows the
even previously extremely small chance of a race some more.
2004-10-19 22:08:13 +00:00
rwatson
009582eb77 Annotate a newly introduced race present due to the unloading of
protocols: it is possible for sockets to be created and attached
to the divert protocol between the test for sockets present and
successful unload of the registration handler.  We will need to
explore more mature APIs for unregistering the protocol and then
draining consumers, or an atomic test-and-unregister mechanism.
2004-10-19 21:35:42 +00:00
andre
9f43dad9fc Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability
of protocols.  The call to divert_packet() is done through a function pointer.  All
semantics of IPDIVERT remain intact.  If IPDIVERT is not loaded ipfw will refuse to
install divert rules and  natd will complain about 'protocol not supported'.  Once
it is loaded both will work and accept rules and open the divert socket.  The module
can only be unloaded if no divert sockets are open.  It does not close any divert
sockets when an unload is requested but will return EBUSY instead.
2004-10-19 21:14:57 +00:00
andre
40693cc7d9 Properly declare the "net.inet" sysctl subtree. 2004-10-19 21:06:14 +00:00
andre
11ab41ab2f Pre-emptively define IPPROTO_SPACER to 32767, the same value as PROTO_SPACER
to document that this value is globally assigned for a special purpose and
may not be reused within the IPPROTO number space.
2004-10-19 20:59:01 +00:00
andre
cf99677e64 Make use of the PROTO_SPACER functionality for dynamically loadable
protocols in inetsw[] and define initially eight spacer slots.

Remove conflicting declaration 'struct pr_usrreqs nousrreqs'.  It is
now declared and initialized in kern/uipc_domain.c.
2004-10-19 15:58:22 +00:00
andre
00d43f4bd8 Support for dynamically loadable and unloadable IP protocols in the ipmux.
With pr_proto_register() it has become possible to dynamically load protocols
within the PF_INET domain.  However the PF_INET domain has a second important
structure called ip_protox[] that is derived from the 'struct protosw inetsw[]'
and takes care of the de-multiplexing of the various protocols that ride on
top of IP packets.

The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[]
array mux in a consistent and easy way.  To register a protocol within
ip_protox[] the existence of a corresponding and matching protocol definition
in inetsw[] is required.  The function does not allow to overwrite an already
registered protocol.  The unregister function simply replaces the mux slot with
the default index pointer to IPPROTO_RAW as it was previously.
2004-10-19 15:45:57 +00:00
andre
23afa2eef1 Add a macro for the destruction of INP_INFO_LOCK's used by loadable modules. 2004-10-19 14:34:13 +00:00
andre
dcb4801af4 Make comments more clear. Change the order of one if() statement to check the
more likely variable first.
2004-10-19 14:31:56 +00:00
rwatson
4b81ce6dd2 Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-18 22:19:43 +00:00
rwatson
cf6eacbc1e Don't release the udbinfo lock until after the last use of UDP inpcb
in udp_input(), since the udbinfo lock is used to prevent removal of
the inpcb while in use (i.e., as a form of reference count) in the
in-bound path.

RELENG_5 candidate.
2004-10-12 20:03:56 +00:00
rwatson
338d307146 Modify the thrilling "%D is using my IP address %s!" message so that
it isn't printed if the IP address in question is '0.0.0.0', which is
used by nodes performing DHCP lookup, and so constitute a false
positive as a report of misconfiguration.
2004-10-12 17:10:40 +00:00
rwatson
84c4649f53 When the access control on creating raw sockets was modified so that
processes in jail could create raw sockets, additional access control
checks were added to raw IP sockets to limit the ways in which those
sockets could be used.  Specifically, only the socket option IP_HDRINCL
was permitted in rip_ctloutput().  Other socket options were protected
by a call to suser().  This change was required to prevent processes
in a Jail from modifying system properties such as multicast routing
and firewall rule sets.

However, it also introduced a regression: processes that create a raw
socket with root privilege, but then downgraded credential (i.e., a
daemon giving up root, or a setuid process switching back to the real
uid) could no longer issue other unprivileged generic IP socket option
operations, such as IP_TOS, IP_TTL, and the multicast group membership
options, which prevented multicast routing daemons (and some other
tools) from operating correctly.

This change pushes the access control decision down to the granularity
of individual socket options, rather than all socket options, on raw
IP sockets.  When rip_ctloutput() doesn't implement an option, it will
now pass the request directly to in_control() without an access
control check.  This should restore the functionality of the generic
IP socket options for raw sockets in the above-described scenarios,
which may be confirmed with the ipsockopt regression test.

RELENG_5 candidate.

Reviewed by:	csjp
2004-10-12 16:47:25 +00:00
rwatson
a475461b84 Acquire the send socket buffer lock around tcp_output() activities
reaching into the socket buffer.  This prevents a number of potential
races, including dereferencing of sb_mb while unlocked leading to
a NULL pointer deref (how I found it).  Potentially this might also
explain other "odd" TCP behavior on SMP boxes (although  haven't
seen it reported).

RELENG_5 candidate.
2004-10-09 16:48:51 +00:00
rwatson
ccb2845f23 When running with debug.mpsafenet=0, initialize IP multicast routing
callouts as non-CALLOUT_MPSAFE.  Otherwise, they may trigger an
assertion regarding Giant if they enter other parts of the stack from
the callout.

MFC after:	3 days
Reported by:	Dikshie < dikshie at ppk dot itb dot ac dot id >
2004-10-07 14:13:35 +00:00
ps
c8e4aa1cd5 - Estimate the amount of data in flight in sack recovery and use it
to control the packets injected while in sack recovery (for both
  retransmissions and new data).
- Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c.
- Add a new sysctl (net.inet.tcp.sack.initburst) that controls the
  number of sack retransmissions done upon initiation of sack recovery.

Submitted by:	Mohan Srinivasan <mohans@yahoo-inc.com>
2004-10-05 18:36:24 +00:00
green
cb606898b9 Add support to IPFW for matching by TCP data length. 2004-10-03 00:47:15 +00:00
green
4f70622005 Add support to IPFW for classification based on "diverted" status
(that is, input via a divert socket).
2004-10-03 00:26:35 +00:00
green
a1ab5f0c7d Add to IPFW the ability to do ALTQ classification/tagging. 2004-10-03 00:17:46 +00:00
green
3f01e230b4 Validate the action pointer to be within the rule size, so that trying to
add corrupt ipfw rules would not potentially panic the system or worse.
2004-09-30 17:42:00 +00:00
mlaier
b65eae4c19 Add an additional struct inpcb * argument to pfil(9) in order to enable
passing along socket information. This is required to work around a LOR with
the socket code which results in an easy reproducible hard lockup with
debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do
so later. The missing piece is to turn the filter locking into a leaf lock
and will follow in a seperate (later) commit.

This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in
forseeable future.

Suggested by:		rwatson
A lot of work by:	csjp (he'd be even more helpful w/o mentor-reviews ;)
Reviewed by:		rwatson, csjp
Tested by:		-pf, -ipfw, LINT, csjp and myself
MFC after:		3 days

LOR IDs:		14 - 17 (not fixed yet)
2004-09-29 04:54:33 +00:00
rwatson
e455dd69f8 Assign so_pcb to NULL rather than 0 as it's a pointer.
Spotted by:	dwhite
2004-09-29 04:01:13 +00:00
maxim
72a6bed376 o Turn net.inet.ip.check_interface sysctl off by default.
When net.inet.ip.check_interface was MFCed to RELENG_4 3+ years ago in
rev. 1.130.2.17 ip_input.c it was 1 by default but shortly changed to
0 (accidently?) in rev. 1.130.2.20 in RELENG_4 only.  Among with the
fact this knob is not documented it breaks POLA especially in bridge
environment.

OK'ed by:	andre
Reviewed by:	-current
2004-09-24 12:18:40 +00:00
andre
d4e3412583 Fix an out of bounds write during the initialization of the PF_INET protocol
family to the ip_protox[] array.  The protocol number of IPPROTO_DIVERT is
larger than IPPROTO_MAX and was initializing memory beyond the array.
Catch all these kinds of errors by ignoring protocols that are higher than
IPPROTO_MAX or 0 (zero).

Add more comments ip_init().
2004-09-16 18:33:39 +00:00
andre
44b7e0a719 Clarify some comments for the M_FASTFWD_OURS case in ip_input(). 2004-09-15 20:17:03 +00:00
andre
5b67b5c1f3 Remove the last two global variables that are used to store packet state while
it travels through the IP stack.  This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.

The IP source route options of a packet are now stored in a mtag instead of the
global variable.
2004-09-15 20:13:26 +00:00
andre
3767c4cf7b Do not allow 'ipfw fwd' command when IPFIREWALL_FORWARD is not compiled into
the kernel.  Return EINVAL instead.
2004-09-13 19:27:23 +00:00
andre
2c213c186f If we have to 'ipfw fwd'-tag a packet the second time in ipfw_pfil_out() don't
prepend an already existing tag again.  Instead unlink it and prepend it again
to have it as the first tag in the chain.

PR:		kern/71380
2004-09-13 19:20:14 +00:00
andre
e837c32545 Make comments more clear for the packet changed cases after pfil hooks. 2004-09-13 17:09:06 +00:00
andre
532ad0416b Fix ip_input() fallback for the destination modified cases (from the packet
filters).  After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS
tagged packets to have ip_len and ip_off in host byte order instead of
network byte order.

PR:		kern/71652
Submitted by:	mlaier (patch)
2004-09-13 17:01:53 +00:00
andre
eba7c4085c Make 'ipfw tee' behave as inteded and designed. A tee'd packet is copied
and sent to the DIVERT socket while the original packet continues with the
next rule.  Unlike a normally diverted packet no IP reassembly attemts are
made on tee'd packets and they are passed upwards totally unmodified.

Note: This will not be MFC'd to 4.x because of major infrastucture changes.

PR:		kern/64240 (and many others collapsed into that one)
2004-09-13 16:46:05 +00:00
glebius
c887bf2814 Check flag do_bridge always, even if kernel was compiled without
BRIDGE support. This makes dynamic bridge.ko working.

Reviewed by:	sam
Approved by:	julian (mentor)
MFC after:	1 week
2004-09-09 12:34:07 +00:00
jmg
af7268d8eb revert comment from rev1.158 now that rev1.225 backed it out..
MFC after:	3 days
2004-09-06 15:48:38 +00:00
glebius
2087edb938 Recover normal behavior: return EINVAL to attempt to add a divert rule
when module is built without IPDIVERT.

Silence from:	andre
Approved by:	julian (mentor)
2004-09-05 20:06:50 +00:00
jmg
8e8293b765 fix up socket/ip layer violation... don't assume/know that
SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
2004-09-05 02:34:12 +00:00
andre
2126402238 Apply error and success logic consistently to the function netisr_queue() and
its users.

netisr_queue() now returns (0) on success and ERRNO on failure.  At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.

Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success.  Due to this
schednetisr() was never called to kick the scheduling of the isr.  However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.

PR:		kern/70988
Found by:	MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after:	3 days
2004-08-27 18:33:08 +00:00
andre
c3936067e7 In the case the destination of a packet was changed by the packet filter
to point to a local IP address; and the packet was sourced from this host
we fill in the m_pkthdr.rcvif with a pointer to the loopback interface.

Before the function ifunit("lo0") was used to obtain the ifp.  However
this is sub-optimal from a performance point of view and might be dangerous
if the loopback interface has been renamed.  Use the global variable 'loif'
instead which always points to the loopback interface.

Submitted by:	brooks
2004-08-27 15:39:34 +00:00
andre
b32a1f9e40 Remove a junk line left over from the recent IPFW to PFIL_HOOKS conversion. 2004-08-27 15:32:28 +00:00
andre
d243747d92 Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option.  All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over.  This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
2004-08-27 15:16:24 +00:00
ru
fdbaf5c59c Revert the last change to sys/modules/ipfw/Makefile and fix a
standalone module build in a better way.

Silence from:	andre
MFC after:	3 days
2004-08-26 14:18:30 +00:00
pjd
8c34a02cfb Allocate memory when dumping pipes with M_WAITOK flag.
On a system with huge number of pipes, M_NOWAIT failes almost always,
because of memory fragmentation.
My fix is different than the patch proposed by Pawel Malachowski,
because in FreeBSD 5.x we cannot sleep while holding dummynet mutex
(in 4.x there is no such lock).
My fix is also ugly, but there is no easy way to prepare nice and clean fix.

PR:		kern/46557
Submitted by:	Eugene Grosbein <eugen@grosbein.pp.ru>
Reviewed by:	mlaier
2004-08-25 09:31:30 +00:00
mlaier
252cbf1c2a Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel.
Previously the early drop was disabled unconditionally for ALTQ-enabled
kernels.

This should give some benefit for the normal gateway + LAN-server case with
a busy LAN leg and an ALTQ managed uplink.

Reviewed and style help from:	cperciva, pjd
2004-08-22 16:42:28 +00:00
rwatson
2989f4181e When sliding the m_data pointer forward, update m_pktrhdr.len as well
as m_len, or the pkthdr length will be inconsistent with the actual
length of data in the mbuf chain.  The symptom of this occuring was
"out of data" warnings from in_cksum_skip() on large UDP packets sent
via the loopback interface.

Foot shot:	green
2004-08-22 01:32:48 +00:00
csjp
657b6f650c When a prison is given the ability to create raw sockets (when the
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.

This commit makes the following changes:

- Add a comment to rtioctl warning developers that if they add
  any ioctl commands, they should use super-user checks where necessary,
  as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
  and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
  is PRISON root, make sure the socket option name is IP_HDRINCL,
  otherwise deny the request.

Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.

Looking forward, we will probably want to eliminate the
references to curthread.

This may be a MFC candidate for RELENG_5.

Reviewed by:	rwatson
Approved by:	bmilekic (mentor)
2004-08-21 17:38:57 +00:00
rwatson
51b320a56b When prepending space onto outgoing UDP datagram payloads to hold the
UDP/IP header, make sure that space is also allocated for the link
layer header.  If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer.  This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
2004-08-21 16:14:04 +00:00
andre
80ff6433dd Fix a stupid typo which prevented an ipfw KLD unload from successfully cleaning
up its remains.  Do not terminate 'if' lines with ';'.

Spotted by:	claudio@OpenBSD.ORG (sitting 3m from my desk)
Pointy hat to:	andre
2004-08-20 00:36:55 +00:00