Commit Graph

1595 Commits

Author SHA1 Message Date
Robert Watson
6063b5f0ad Style update: use newer style function prototypes in if_sl.c in
prep for merging locking.
2004-08-22 21:32:52 +00:00
Robert Watson
201a36deca Don't need to assert Giant in fw_output(), only in the firewire start
routine.
2004-08-22 14:48:55 +00:00
Robert Watson
b062951a3d If a tunable for the routing socket netisr queue max is defined, allow it
to override the default value, rather than the default value overriding
the tunable.
2004-08-21 21:45:40 +00:00
Robert Watson
190a4c9436 Allow the size of the routing socket netisr queue to be configured using
the tunable or sysctl 'net.route.netisr_maxqlen'.  Default the maximum
depth to 256 rather than IFQ_MAXLEN due to the downsides of dropping
routing messages.

MT5 candidate.

Discussed with:	mdodd, mlaier, Vincent Jardin <jardin at 6wind.com>
2004-08-21 21:20:06 +00:00
Christian S.J. Peron
5090559b7f When a prison is given the ability to create raw sockets (when the
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.

This commit makes the following changes:

- Add a comment to rtioctl warning developers that if they add
  any ioctl commands, they should use super-user checks where necessary,
  as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
  and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
  is PRISON root, make sure the socket option name is IP_HDRINCL,
  otherwise deny the request.

Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.

Looking forward, we will probably want to eliminate the
references to curthread.

This may be a MFC candidate for RELENG_5.

Reviewed by:	rwatson
Approved by:	bmilekic (mentor)
2004-08-21 17:38:57 +00:00
Andre Oppermann
9b932e9e04 Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland
and preserves the ipfw ABI.  The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.

However there are many changes how ipfw is and its add-on's are handled:

 In general ipfw is now called through the PFIL_HOOKS and most associated
 magic, that was in ip_input() or ip_output() previously, is now done in
 ipfw_check_[in|out]() in the ipfw PFIL handler.

 IPDIVERT is entirely handled within the ipfw PFIL handlers.  A packet to
 be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
 reassembly.  If not, or all fragments arrived and the packet is complete,
 divert_packet is called directly.  For 'tee' no reassembly attempt is made
 and a copy of the packet is sent to the divert socket unmodified.  The
 original packet continues its way through ip_input/output().

 ipfw 'forward' is done via m_tag's.  The ipfw PFIL handlers tag the packet
 with the new destination sockaddr_in.  A check if the new destination is a
 local IP address is made and the m_flags are set appropriately.  ip_input()
 and ip_output() have some more work to do here.  For ip_input() the m_flags
 are checked and a packet for us is directly sent to the 'ours' section for
 further processing.  Destination changes on the input path are only tagged
 and the 'srcrt' flag to ip_forward() is set to disable destination checks
 and ICMP replies at this stage.  The tag is going to be handled on output.
 ip_output() again checks for m_flags and the 'ours' tag.  If found, the
 packet will be dropped back to the IP netisr where it is going to be picked
 up by ip_input() again and the directly sent to the 'ours' section.  When
 only the destination changes, the route's 'dst' is overwritten with the
 new destination from the forward m_tag.  Then it jumps back at the route
 lookup again and skips the firewall check because it has been marked with
 M_SKIP_FIREWALL.  ipfw 'forward' has to be compiled into the kernel with
 'option IPFIREWALL_FORWARD' to enable it.

 DUMMYNET is entirely handled within the ipfw PFIL handlers.  A packet for
 a dummynet pipe or queue is directly sent to dummynet_io().  Dummynet will
 then inject it back into ip_input/ip_output() after it has served its time.
 Dummynet packets are tagged and will continue from the next rule when they
 hit the ipfw PFIL handlers again after re-injection.

 BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
 they did before.  Later this will be changed to dedicated ETHER PFIL_HOOKS.

More detailed changes to the code:

 conf/files
	Add netinet/ip_fw_pfil.c.

 conf/options
	Add IPFIREWALL_FORWARD option.

 modules/ipfw/Makefile
	Add ip_fw_pfil.c.

 net/bridge.c
	Disable PFIL_HOOKS if ipfw for bridging is active.  Bridging ipfw
	is still directly invoked to handle layer2 headers and packets would
	get a double ipfw when run through PFIL_HOOKS as well.

 netinet/ip_divert.c
	Removed divert_clone() function.  It is no longer used.

 netinet/ip_dummynet.[ch]
	Neither the route 'ro' nor the destination 'dst' need to be stored
	while in dummynet transit.  Structure members and associated macros
	are removed.

 netinet/ip_fastfwd.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_fw.h
	Removed 'ro' and 'dst' from struct ip_fw_args.

 netinet/ip_fw2.c
	(Re)moved some global variables and the module handling.

 netinet/ip_fw_pfil.c
	New file containing the ipfw PFIL handlers and module initialization.

 netinet/ip_input.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.  ip_forward() does not longer require
	the 'next_hop' struct sockaddr_in argument.  Disable early checks
	if 'srcrt' is set.

 netinet/ip_output.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_var.h
	Add ip_reass() as general function.  (Used from ipfw PFIL handlers
	for IPDIVERT.)

 netinet/raw_ip.c
	Directly check if ipfw and dummynet control pointers are active.

 netinet/tcp_input.c
	Rework the 'ipfw forward' to local code to work with the new way of
	forward tags.

 netinet/tcp_sack.c
	Remove include 'opt_ipfw.h' which is not needed here.

 sys/mbuf.h
	Remove m_claim_next() macro which was exclusively for ipfw 'forward'
	and is no longer needed.

Approved by:	re (scottl)
2004-08-17 22:05:54 +00:00
John-Mark Gurney
ad3b9257c2 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
Robert Watson
3b7d076fe7 Use IFQ_SET_MAXLEN() to set the maximum queue depth of the routing
socket netisr queue.

Pointed out by:	winter
2004-08-13 22:23:21 +00:00
Tony Ackerman
b59db7bbe8 Added two new media types for 10GBASE-SR and 10GBASE-LR 2004-08-12 23:48:26 +00:00
Andre Oppermann
2dc1d58164 Convert the routing table to use an UMA zone for rtentries. The zone is
called "rtentry".

This saves a considerable amount of kernel memory.  R_Zmalloc previously
used 256 byte blocks (plus kmalloc overhead) whereas UMA only needs 132
bytes.

Idea from:	OpenBSD
2004-08-11 17:26:56 +00:00
Maksim Yevmenkin
285b72aa78 Set IFF_RUNNING flag on the interface as soon as the control device is opened. 2004-08-11 00:12:27 +00:00
Max Laier
de0332d4fa Add a "void *if_carp" placeholder to struct ifnet with prospect to bring in
the "Common address redundancy protocol" (CARP) during the 5-STABLE cycle.
Hence doing the ABI break now.

Approved by:	re (scottl)
2004-08-07 09:32:04 +00:00
Robert Watson
ebcd28e669 As SLIP directly accesses the tty code from its if_start() routine,
mark if_sl as IFF_NEEDSGIANT.
2004-08-06 22:41:13 +00:00
Peter Pentchev
3f35d5150b Do not attempt to clean up data that has not been initialized yet.
This fixes two kernel panics on boot when the xl driver fails to
allocate bus/port/memory resources.

Reviewed by:	silence on -net
2004-08-06 09:08:33 +00:00
Maxim Sobolev
97c4cd9853 Set ip_v field properly.
PR:	kern/69957
2004-08-05 08:12:46 +00:00
Robert Watson
46691dd8d7 Do a lockless read of the BPF interface structure descriptor list head
before grabbing BPF locks to see if there are any entries in order to
avoid the cost of locking if there aren't any.  Avoids a mutex lock/
unlock for each packet received if there are no BPF listeners.
2004-08-05 02:37:36 +00:00
Alexander Kabaev
445e045b0d Avoid casts as lvalues. 2004-07-28 06:59:55 +00:00
Alexander Kabaev
a0ec13c419 Initialize ; variable eraly to shut up GCC warning. 2004-07-28 06:48:36 +00:00
Robert Watson
af5e59bf28 Add a new network interface flag, IFF_NEEDSGIANT, which will allow
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.

Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment.  To do this, introduce if_start(), a
wrapper function for ifp->if_start().  If the interface can run MPSAFE,
it directly dispatches into the interface start routine.  If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().

Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.

This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch.  However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.
2004-07-27 23:20:45 +00:00
Yaroslav Tykhiy
d6fcfb7ae1 Stop tinkering with the parent's VLAN_MTU capability.
Now it is user-controlled through ifconfig(8).

The former ``automagic'' way of operation created more
trouble than good.  First, VLAN_MTU consumers other than
vlan(4) had appeared, e.g., ng_vlan(4).  Second, there was
no way to disable VLAN_MTU manually if it were causing
trouble, e.g., data corruption.

Dropping the ``automagic'' should be completely invisible
to the user since
a) all the drivers supporting VLAN_MTU
have it enabled by default, and in the first place
b) there is only one driver that can really toggle VLAN_MTU
in the hardware under its control (it's fxp(4), to which
I added VLAN_MTU controls to illustrate the principle.)
2004-07-26 14:46:04 +00:00
Robert Watson
572bde2aea Prefer NULL to '0' when checking a pointer value. 2004-07-24 16:58:56 +00:00
Brooks Davis
b4e9f8379e Actually free the unit when destroying the interface.
Reported by:	la at delfi.lt
Tested by:	la at delfi.lt
PR:		68618
2004-07-22 22:50:15 +00:00
Max Laier
ca64c799d4 When removing the last reference to a cloner, do not try to unlock twice -
esp. not since the backing memory was just freed.

Reviewed by:	rwatson
2004-07-20 21:44:28 +00:00
Robert Watson
08f85b089e Comment clarifying debug_mpsafenet. 2004-07-18 21:50:22 +00:00
Robert Watson
8bbfdc98e4 Gratuitous whitespace change to un-wrap a short line. 2004-07-18 19:53:35 +00:00
Poul-Henning Kamp
672c05d49c Preparation commit for the tty cleanups that will follow in the near
future:

rename ttyopen() -> tty_open() and ttyclose() -> tty_close().

We need the ttyopen() and ttyclose() for the new generic cdevsw
functions for tty devices in order to have consistent naming.
2004-07-15 20:47:41 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Max Laier
bfe4641596 Fix a copy-and-paste-o in IFQ_DRV_PREPEND - all pointyhats to me.
While here also fix a (not less stupid) braino in IFQ_DRV_PURGE.

Reported-by:	clement
Tested-by:	clement (_PREPEND in sis(4))
2004-07-14 13:31:41 +00:00
Robert Watson
efe0ab01b2 Convert SLIP to using C99 structure initialization for its struct
linesw.
2004-07-14 05:01:40 +00:00
Bruce M Simpson
086e98c437 Use ETHER_IS_MULTICAST() consistently in ether_resolvemulti().
Reviewed by:	jmallett
2004-07-09 05:26:27 +00:00
Bruce M Simpson
ca28620f0d Use M_ZERO instead of bzero(). 2004-07-06 03:34:16 +00:00
Bruce M Simpson
9b3d77e7c9 Be consistent and use bzero() instead of memset(). 2004-07-06 03:29:41 +00:00
Bruce M Simpson
b3c9a01e5e Use M_ZERO instead of memset() (!). 2004-07-06 03:28:24 +00:00
Bruce M Simpson
e1a8c3dc33 Use M_ZERO instead of bzero(). 2004-07-06 03:26:26 +00:00
Bruce M Simpson
60323f48bd Replace a bzero() after malloc() with M_ZERO. 2004-07-06 03:16:55 +00:00
Bruce M Simpson
832cb4aef7 Style. 2004-07-06 03:07:50 +00:00
Robert Watson
28b8605232 In the BPF and ethernet bridging code, don't allow callouts to execute
without Giant if we're not debug.mpsafenet=1.
2004-07-05 16:28:31 +00:00
Bruce M Simpson
29c2dfbe32 Workaround a locking problem in vlan(4). vlan_setmulti() may be called
with sleepable locks held from further up in the network stack, and
attempts to allocate memory to hold multicast group membership information
with M_WAITOK.

This panic was triggered specifically when an exiting routing daemon
process closes its raw sockets after joining multicast groups on them.

While we're here, comment some possible locking badness.

PR:	kern/48560
2004-07-04 18:32:54 +00:00
Bruce M Simpson
15a66c21c0 style(9)/whitespace cleanup while I'm in this file. 2004-07-04 16:43:24 +00:00
Bruce M Simpson
4c9e94d42c The net.link.ether.bridge.enable sysctl MIB variable enables bridge
functionality by setting to a non-zero value. This is an integer, but
is treated as a boolean by the code, so clamp it to a boolean value
when set so as to avoid unnecessary bridge reinitialization if it's
changed to another value.

PR:		kern/61174
Requested by:	Bruce Cran
2004-07-04 15:53:28 +00:00
Brooks Davis
f93dfa28b1 Don't announce the ethernet address when it's 00:00:00:00:00:00. It's
not of any interest.  This primairly happens when vlan(4) interfaces are
created.
2004-07-02 19:44:59 +00:00
Max Laier
7929aa036c Bring in the first chunk of altq driver modifications. This covers the
following drivers: bfe(4), em(4), fxp(4), lnc(4), tun(4), de(4) rl(4),
sis(4) and xl(4)

More patches are pending on: http://peoples.freebsd.org/~mlaier/ Please take
a look and tell me if "your" driver is missing, so I can fix this.

Tested-by:	many
No-objection:	-current, -net
2004-07-02 12:16:02 +00:00
Roman Kurakin
e874bf6648 Do not m_free packet since IF_HANDOFF (called from netisr_queue) will
do it for us, just count it.
2004-06-28 15:32:24 +00:00
Pawel Jakub Dawidek
0a44517d3a Those are unneeded too. 2004-06-27 09:06:10 +00:00
Pawel Jakub Dawidek
46e3b1cbe7 Add two missing includes and remove two uneeded.
This is quite serious fix, because even with MAC framework compiled in,
MAC entry points in those two files were simply ignored.
2004-06-27 09:03:22 +00:00
Poul-Henning Kamp
cb9ea5f4cb Pick the hotchar out of the tty structure instead of caching private
copies.

No current line disciplines have a dynamically changing hotchar, and
expecting to receive anything sensible during a change in ldisc is
insane so no locking of the hotchar field is necessary.
2004-06-26 09:20:07 +00:00
Poul-Henning Kamp
4776c07426 Fix line discipline switching issues: If opening a new ldisc fails,
we have to revert to TTYDISC which we know will successfully open
rather than try the previous ldisc which might also fail to open.

Do not let ldisc implementations muck about with ->t_line, and remove
code which checks for reopens, it should never happen.

Move ldisc->l_hotchar to tty->t_hotchar and have ldisc implementation
initialize it in their open routines.  Reset to zero when we enter
TTYDISC.  ("no" should really be -1 since zero could be a valid
hotchar for certain old european mainframe protocols.)
2004-06-26 08:44:04 +00:00
Roman Kurakin
1127aac31e Do not count loobacks as other fuilures.
As a result magic will not be rejected any more in case of loopback.

Discussed with:	joerg@
2004-06-25 10:25:33 +00:00
Joerg Wunsch
b46f884b80 Add a couple of #ifdef DEBUG printf()s in vlan_input() I found to be
useful when debugging the ether_demux() problem (when bridging over
VLANs).
2004-06-24 12:32:41 +00:00
Joerg Wunsch
cd0cd0149b When considering an ethernet frame that is not destined for us, do not
only allow this to be further processed when bridging is active on
that interface, but also if the current packet has a VLAN tag and
VLANs are active on our interface.  This gives the VLAN layers a
chance to also consider the packet (and perhaps drop it instead of the
main dispatcher).

This fixes a situation where bridging was only active on VLAN
interfaces but ether_demux() called on behalf of the main interface
had already thrown the packet away.

MFC after:	4 weeks
2004-06-24 12:31:44 +00:00
Dag-Erling Smørgrav
d7647d966e Make dependencies on the TCP/IP stack conditional on INET / INET6. This
makes it possible to build a kernel with NIC drivers but no TCP/IP stack.

Sponsored by:	Teleplan AS
2004-06-24 10:58:08 +00:00
Brooks Davis
f889d2ef8d Major overhaul of pseudo-interface cloning. Highlights include:
- Split the code out into if_clone.[ch].
 - Locked struct if_clone. [1]
 - Add a per-cloner match function rather then simply matching names of
   the form <name><unit> and <name>.
 - Use the match function to allow creation of <interface>.<tag>
   vlan interfaces.  The old way is preserved unchanged!
 - Also the match function to allow creation of stf(4) interfaces named
   stf0, stf, or 6to4.  This is the only major user visible change in
   that "ifconfig stf" creates the interface stf rather then stf0 and
   does not print "stf0" to stdout.
 - Allow destroy functions to fail so they can refuse to delete
   interfaces.  Currently, we forbid the deletion of interfaces which
   were created in the init function, particularly lo0, pflog0, and
   pfsync0.  In the case of lo0 this was a panic implementation so it
   does not count as a user visiable change. :-)
 - Since most interfaces do not need the new functionality, an family of
   wrapper functions, ifc_simple_*(), were created to wrap old style
   cloner functions.
 - The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
   IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
   instead.

Submitted by:   Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by:    andre, mlaier
Discussed on:	net
2004-06-22 20:13:25 +00:00
Mark Murray
3410878421 Give zlib the ability to be a module that can be depended on,
in the MODULE_DEPEND() sense.
2004-06-20 17:42:35 +00:00
Bruce Evans
7a637a637e Include <sys/_lock.h>'s prerequisite <sys/queue.h> before including the
former, not after.

Don't hide this bug by including <sys/queue.h> in <sys/_lock.h>.
2004-06-19 14:58:35 +00:00
Poul-Henning Kamp
f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Max Laier
affc907d0c Replace IF_HANDOFF with new IFQ_HANDOFF to enqueue with ALTQ once enabled on
the respective drivers.
2004-06-15 23:57:42 +00:00
Robert Watson
730262cdf7 Lock down rawcb_list, a global list of control blocks for raw sockets,
using rawcb_mtx.  Hold this mutex while modifying or iterating over
the control list; this means that the mutex is held over calls into
socket delivery code, which no longer causes a lock order reversal as
the routing socket code uses a netisr to avoid recursing socket ->
routing -> socket.

Note: Locking of IPsec consumers of rawcb_list is not included in this
commit.
2004-06-15 04:13:59 +00:00
Max Laier
62d7f46e88 Fix a typeo in IFQ_HANDOFF. 2004-06-15 03:40:39 +00:00
Max Laier
4cb655c020 Transform tbr_dequeue into a function pointer in order to build drivers with
ALTQ enabled versions of IFQ_* macros by default, as requested by serveral
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.
2004-06-15 01:45:19 +00:00
Doug Rabson
941d37182e Fix big-endian build. 2004-06-14 08:17:51 +00:00
Max Laier
930e2cfa1f Unbreak non-ALTQ kernel linking. I forgot about tbr_dequeue.
In the end drivers should be building with ALTQ checks by default, but for
now build them with the old macros for non-ALTQ kernels.

Note: Check new features w/ LINT *and* w/ LINT minus the new feature.

Found-by:	rwatson
2004-06-14 03:55:09 +00:00
Doug Rabson
eedccad06a Add MAC framework bits to the output path. 2004-06-13 19:55:16 +00:00
Doug Rabson
d9eb70ad37 Remove advertising clause. 2004-06-13 19:15:44 +00:00
Max Laier
02b199f158 Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by:	(i386)LINT
2004-06-13 17:29:10 +00:00
Doug Rabson
b8b3323469 Add a new driver to support IP over firewire. This driver is intended to
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supercede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.
2004-06-13 10:54:36 +00:00
Robert Watson
395a08c904 Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  this could result in frobbing of a non-present socket if
  sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
2004-06-12 20:47:32 +00:00
Robert Watson
935becd8dd Constify raw_sendspace and raw_recvspace, as they're not mutable. 2004-06-11 03:52:56 +00:00
Robert Watson
b8f9429d55 Switch to conditionally acquiring and dropping Giant around calls into
ifp->if_output() basedd on debug.mpsafenet.  That way once bpfwrite()
can be called without Giant, it will acquire Giant (if desired) before
entering the network stack.
2004-06-11 03:47:21 +00:00
Robert Watson
8240bf1e04 Un-staticize 'dst' sockaddr in the stack of bpfwrite() to prevent
the need to synchronize access to the structure.  I believe this
should fit into the stack under the necessary circumstances, but
if not we can either add synchronization or use a thread-local
malloc for the duration.
2004-06-11 03:45:42 +00:00
Robert Watson
d989c7b389 Introduce a netisr to deliver kernel-generated routing, avoiding
recursive entering of the socket code from the routing code:

- Modify rt_dispatch() to bundle up the sockaddr family, if any,
  associated with a pending mbuf to dispatch to routing sockets, in
  an m_tag on the mbuf.

- Allocate NETISR_ROUTE for use by routing sockets.

- Introduce rtsintrq, an ifqueue to be used by the netisr, and
  introduce rts_input(), a function to unbundle the tagged sockaddr
  and inject the mbuf and address into raw_input(), which previously
  occurred in rt_dispatch().

- Introduce rts_init() to initialize rtsintrq, its mutex, and
  register the netisr.  Perform this at the same point in system
  initialization as setup of the domains.

This change introduces asynchrony between the generation of a
pending routing socket message and delivery to sockets for use
by userspace.  It avoids socket->routing->rtsock->socket use and
helps to avoid lock order reversals between the routing code and
socket code (in particular, raw socket control blocks), as route
locks are held over calls to rt_dispatch().

Reviewed by:		"George V.Neville-Neil" <gnn@neville-neil.com>
Conceptual head nod by:	sam
2004-06-09 02:48:23 +00:00
Poul-Henning Kamp
3786c125c7 Use ldisc_[de]register() instead of frobbing linesw[] directly. 2004-06-07 20:43:37 +00:00
Christian Weisgerber
16b4a34316 Add helper functions to calculate the standard ethernet CRC in
little/big endian fashion, so that network drivers can just reference
the standard implementation and don't have to bring their own.

As discussed on arch@.

Obtained from:	NetBSD
2004-06-02 21:34:14 +00:00
Poul-Henning Kamp
5dba30f15a add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
Poul-Henning Kamp
41ee9f1c69 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
David Malone
bde800e688 Make the comment for DLT_NULL slightly more accurate.
PR:		62272
Submitted by:	Radim Kolar <hsn@netmag.cz>
MFC after:	1 week
2004-05-30 17:03:48 +00:00
Yaroslav Tykhiy
6cbd3e99ec if_printf() won't emit a newline unless told to. 2004-05-26 11:41:26 +00:00
Roman Kurakin
9105841d31 Keepalive timer should be added if we does not have any sppp consumers before
and should be deleted if we do not have any anymore.
2004-05-25 21:54:07 +00:00
Yaroslav Tykhiy
656acce4f4 After all the relevant drivers have been fixed, fix vlan(4) itself
WRT manipulating capabilities of the parent interface:

- use ioctl(SIOCSIFCAP) to toggle VLAN_MTU (the way that was done
  before was just wrong);

- use the right order of conditional clauses to set the MTU fudge
  (that is logically independent from toggling VLAN_MTU.)
2004-05-25 14:30:12 +00:00
Maxime Henrion
7131aeaea1 Remove another redundant if_output initialization. 2004-05-24 11:01:45 +00:00
Yaroslav Tykhiy
b08347a005 Consult parent's if_capenable for active VLAN-related capabilities.
This change is possible since all the relevant drivers have been
fixed to set if_capenable properly.  The field if_capabilities tracks
supported capabilities, which may be disabled administratively.

Inheriting checksum offload support from the parent interface isn't
that easy because the checksumming capabilities of the parent may be
toggled on the fly.  Disable the code for now.
2004-05-23 22:32:15 +00:00
Ruslan Ermilov
d35bcd3bbf Added dependency on the miibus module. 2004-05-21 08:43:38 +00:00
Christian S.J. Peron
3581cc66bb Zero the un-used portions of the struct sockaddr data before sending
it back to userspace, so it does not break bind(2) on raw sockets in jails.

Currently some processes, like traceroute(8) construct a routing request
to determine its source address based on the destination. This sockaddr
data is fed directly to bind(2). When bind calls ifa_ifwithaddr(9) to
make sure the address exists on the interface, the comparison will
fail causing bind(2) to return EADDRNOTAVAIL if the data wasnt zero'ed
before initialization.

Approved by:	bmilekic (mentor)
2004-05-10 15:07:23 +00:00
Scott Long
e6d95d5137 Add route.h to pick up the rt_ifmsg() declaration. 2004-05-04 02:39:41 +00:00
Maxim Konovalov
1a0c4873ed o Fix misindentation in the previous commit. 2004-05-03 17:15:34 +00:00
Andre Oppermann
127d7b2d2d Link state change notification of ethernet media to the routing socket.
o Extend the if_data structure with an ifi_link_state field and
  provide the corresponding defines for the valid states.

o The mii_linkchg() callback updates the ifi_link_state field
  and calls rt_ifmsg() to notify listeners on the routing socket
  in addition to the kqueue KNOTE.

o If vlans are configured on a physical interface notify and update
  all vlan pseudo devices as well with the vlan_link_state() callback.

No objections by:	sam, wpaul, ru, bms
Brucification by:	bde
2004-05-03 13:48:35 +00:00
Bosko Milekic
5a59cefcd1 Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR.  The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely.  This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
2004-04-26 19:46:52 +00:00
Luigi Rizzo
cd46a114fc This commit does two things:
1. rt_check() cleanup:
    rt_check() is only necessary for some address families to gain access
    to the corresponding arp entry, so call it only in/near the *resolve()
    routines where it is actually used -- at the moment this is
    arpresolve(), nd6_storelladdr() (the call is embedded here),
    and atmresolve() (the call is just before atmresolve to reduce
    the number of changes).
    This change will make it a lot easier to decouple the arp table
    from the routing table.

    There is an extra call to rt_check() in if_iso88025subr.c to
    determine the routing info length. I have left it alone for
    the time being.

    The interface of arpresolve() and nd6_storelladdr() now changes slightly:
     + the 'rtentry' parameter (really a hint from the upper level layer)
       is now passed unchanged from *_output(), so it becomes the route
       to the final destination and not to the gateway.
     + the routines will return 0 if resolution is possible, non-zero
       otherwise.
     + arpresolve() returns EWOULDBLOCK in case the mbuf is being held
       waiting for an arp reply -- in this case the error code is masked
       in the caller so the upper layer protocol will not see a failure.

2. arpcom untangling
    Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
    and use the IFP2AC macro to access arpcom fields.
    This mostly affects the netatalk code.

=== Detailed changes: ===
net/if_arcsubr.c
   rt_check() cleanup, remove a useless variable

net/if_atmsubr.c
   rt_check() cleanup

net/if_ethersubr.c
   rt_check() cleanup, arpcom untangling

net/if_fddisubr.c
   rt_check() cleanup, arpcom untangling

net/if_iso88025subr.c
   rt_check() cleanup

netatalk/aarp.c
   arpcom untangling, remove a block of duplicated code

netatalk/at_extern.h
   arpcom untangling

netinet/if_ether.c
   rt_check() cleanup (change arpresolve)

netinet6/nd6.c
   rt_check() cleanup (change nd6_storelladdr)
2004-04-25 09:24:52 +00:00
Luigi Rizzo
490b9d88fa fix one typo and remove one wrong line 2004-04-25 01:39:00 +00:00
Luigi Rizzo
769270223c Correct and extend the description of the behaviour of rt_check(). 2004-04-24 23:34:56 +00:00
Luigi Rizzo
3916ebe8f0 document the locking behaviour of the functions that access
the routing table.
2004-04-24 23:34:04 +00:00
Luigi Rizzo
3fefbff0c2 arpcom untangling:
consistently with the rest of the code, use IFP2AC(ifp) to access
the arpcom structure given the ifp.

In this case also fix a difference in assumptions WRT the rest of
the net/ sources: it is not the 'struct *softc' that starts with a
'struct arpcom', but a 'struct arpcom' that starts with a
'struct ifnet'
2004-04-24 22:24:48 +00:00
Luigi Rizzo
56f7062728 arpcom untangling:
do not use struct arpcom directly, rather use IFP2AC(ifp).
2004-04-24 22:11:13 +00:00
Luigi Rizzo
49572c5b0d arpcom untangling:
- use ifp instead if &ac->ac_if in a couple of nd6* calls;
   this removes a useless dependency.

 - use IFP2AC(ifp) instead of an extra variable to point to the struct arpcom;
   this does not remove the nesting dependency between arpcom and ifnet but
   makes it more evident.
2004-04-24 21:59:41 +00:00
Andre Oppermann
8b75eec175 Add the comment of the previous commit to the source file directly.
Requested by:	ru
2004-04-23 16:57:43 +00:00
Andre Oppermann
5efdd80a6a Call ip_output() with IP_FORWARD flag to prevent it from overwriting the
ip_id again.  ip_id is already set to the ip_id of the encapsulated packet.

Make a comment about mbuf allocation failures more realistic.

Reviewed by:	sobomax
2004-04-23 16:10:23 +00:00
Luigi Rizzo
04f05de961 Readability fixes:
Clearly comment the assumptions on the structure of keys (addresses)
and masks, and introduce a macro, LEN(p), to extract the size of these
objects instead of using *(u_char *)p which might be confusing.

Comment the confusion in the types used to pass around pointers
to keys and masks, as a reminder to fix that at some point.

Add a few comments on what some functions do.

Comment a probably inefficient (but still correct) section of code
in rn_walktree_from()

The object code generated after this commit is the same as before.

At some point we should also change same variable identifiers such
as "t, tt, ttt" to fancier names such as "root, left, right" (just
in case someone wants to understand the code!), replace misspelling
of NULL as 0, remove 'register' declarations that make little sense
these days.
2004-04-21 15:27:36 +00:00
Luigi Rizzo
d6941ce931 Clearly comment the assumptions that allow us to cast a
'struct radix_node *' to a 'struct rtentry *' in this code,
and introduce a macro, RNTORT(), to do this type conversion.
2004-04-21 15:16:08 +00:00
Luigi Rizzo
85911824db Fix the initial check for NULL arguments in rtfree (previously
it checked for rt == NULL after dereferencing the pointer).
We never check for those events elsewhere, so probably these checks
might go away here as well.

Slightly simplify (and document) the logic for memory allocation
in rt_setgate().

The rest is mostly style changes -- replace 0 with NULL where appropriate,
remove the macro SA() that was only used once, remove some useless
debugging code in rt_fixchange, explain some odd-looking casts.
2004-04-20 07:04:47 +00:00
Luigi Rizzo
f76d5670c0 Document an assumption on the structure of 'struct rtentry' 2004-04-20 07:03:30 +00:00