Commit Graph

1542 Commits

Author SHA1 Message Date
dillon
bb806c49bf Implement TCP bandwidth delay product window limiting, similar to (but
not meant to duplicate) TCP/Vegas.  Add four sysctls and default the
implementation to 'off'.

net.inet.tcp.inflight_enable	enable algorithm (defaults to 0=off)
net.inet.tcp.inflight_debug	debugging (defaults to 1=on)
net.inet.tcp.inflight_min	minimum window limit
net.inet.tcp.inflight_max	maximum window limit

MFC after:	1 week
2002-08-17 18:26:02 +00:00
hsu
2c79764ced Cosmetic-only changes for readability.
Reviewed by:	(early form passed by) bde
Approved by:	itojun (from core@kame.net)
2002-08-17 02:05:25 +00:00
luigi
6d2f675ff7 sys/netinet/ip_fw2.c:
Implement the M_SKIP_FIREWALL bit in m_flags to avoid loops
    for firewall-generated packets (the constant has to go in sys/mbuf.h).

    Better comments on keepalive generation, and enforce dyn_rst_lifetime
    and dyn_fin_lifetime to be less than dyn_keepalive_period.

    Enforce limits (up to 64k) on the number of dynamic buckets, and
    retry allocation with smaller sizes.

    Raise default number of dynamic rules to 4096.

    Improved handling of set of rules -- now you can atomically
    enable/disable multiple sets, move rules from one set to another,
    and swap sets.

sbin/ipfw/ipfw2.c:

    userland support for "noerror" pipe attribute.

    userland support for sets of rules.

    minor improvements on rule parsing and printing.

sbin/ipfw/ipfw.8:

    more documentation on ipfw2 extensions, differences from ipfw1
    (so we can use the same manpage for both), stateful rules,
    and some additional examples.
    Feedback and more examples needed here.
2002-08-16 10:31:47 +00:00
alfred
30a4b05c04 make the strings for tcptimers, tanames and prurequests const to silence
warnings.
2002-08-16 09:07:59 +00:00
rwatson
331fe87203 Code formatting sync to trustedbsd_mac: don't perform an assignment
in an if clause.

PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
2002-08-15 22:04:31 +00:00
rwatson
aa8060c29e Rename mac_check_socket_receive() to mac_check_socket_deliver() so that
we can use the names _receive() and _send() for the receive() and send()
checks.  Rename related constants, policy implementations, etc.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 18:51:27 +00:00
hsu
deb0142560 Reset dupack count in header prediction.
Follow-on to rev 1.39.

Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon
2002-08-15 17:13:18 +00:00
luigi
2cf7fe40e7 Kernel support for a dummynet option:
When a pipe or queue has the "noerror" attribute, do not report
drops to the caller (ip_output() and friends).
(2 lines to implement it, 2 lines to document it.)

This will let you simulate losses on the sender side as if they
happened in the middle of the network, i.e. with no explicit feedback
to the sender.

manpage and ipfw2.c changes to follow shortly, together with other
ipfw2 changes.

Requested by: silby
MFC after: 3 days
2002-08-15 16:53:43 +00:00
rwatson
bdc6074d4f It's now sufficient to rely on a nested include of _label.h to make sure
all structures in ip_var.h are defined, so remove include of mac.h.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:34:45 +00:00
rwatson
9b4d2c89ad Perform a nested include of _label.h if #ifdef _KERNEL. This will
satisfy consumers of ip_var.h that need a complete definition of
struct ipq and don't include mac.h.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:34:02 +00:00
rwatson
3df0baf6ac Add mac.h -- raw_ip.c was depending on nested inclusion of mac.h which
is no longer present.

Pointed out by:	bmilekic
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:27:46 +00:00
phk
88ea06b67f remove spurious printf 2002-08-13 19:13:23 +00:00
jennifer
5709e80b7d Assert that the inpcb lock is held when calling tcp_output().
Approved by:	hsu
2002-08-12 03:22:46 +00:00
luigi
e3c4c6c9da One bugfix and one new feature.
The bugfix (ipfw2.c) makes the handling of port numbers with
a dash in the name, e.g. ftp-data, consistent with old ipfw:
use \\ before the - to consider it as part of the name and not
a range separator.

The new feature (all this description will go in the manpage):

each rule now belongs to one of 32 different sets, which can
be optionally specified in the following form:

	ipfw add 100 set 23 allow ip from any to any

If "set N" is not specified, the rule belongs to set 0.

Individual sets can be disabled, enabled, and deleted with the commands:

	ipfw disable set N
	ipfw enable set N
	ipfw delete set N

Enabling/disabling of a set is atomic. Rules belonging to a disabled
set are skipped during packet matching, and they are not listed
unless you use the '-S' flag in the show/list commands.
Note that dynamic rules, once created, are always active until
they expire or their parent rule is deleted.
Set 31 is reserved for the default rule and cannot be disabled.

All sets are enabled by default. The enable/disable status of the sets
can be shown with the command

	ipfw show sets

Hopefully, this feature will make life easier to those who want to
have atomic ruleset addition/deletion/tests. Examples:

To add a set of rules atomically:

	ipfw disable set 18
	ipfw add ... set 18 ...		# repeat as needed
	ipfw enable set 18

To delete a set of rules atomically

	ipfw disable set 18
	ipfw delete set 18
	ipfw enable set 18

To test a ruleset and disable it and regain control if something
goes wrong:

	ipfw disable set 18
	ipfw add ... set 18 ...         # repeat as needed
	ipfw enable set 18 ; echo "done "; sleep 30 && ipfw disable set 18

    here if everything goes well, you press control-C before
    the "sleep" terminates, and your ruleset will be left
    active. Otherwise, e.g. if you cannot access your box,
    the ruleset will be disabled after the sleep terminates.

I think there is only one more thing that one might want, namely
a command to assign all rules in set X to set Y, so one can
test a ruleset using the above mechanisms, and once it is
considered acceptable, make it part of an existing ruleset.
2002-08-10 04:37:32 +00:00
silby
23c761aa82 Handle PMTU discovery in syn-ack packets slightly differently;
rely on syncache flags instead of directly accessing the route
entry.

MFC after:	3 days
2002-08-05 22:34:15 +00:00
luigi
056c90a35e bugfix: move check for udp_blackhole before the one for icmp_bandlim.
MFC after: 3 days
2002-08-04 20:50:13 +00:00
luigi
6f96abf099 Fix handling of packets which matched an "ipfw fwd" rule on the input side. 2002-08-03 14:59:45 +00:00
rwatson
d429ea44ea When preserving the IP header in extra mbuf in the IP forwarding
case, also preserve the MAC label.  Note that this mbuf allocation
is fairly non-optimal, but not my fault.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-02 20:45:27 +00:00
rwatson
f0cf913285 Work to fix LINT build.
Reported by:	phk
2002-08-02 18:08:14 +00:00
rwatson
3b36c9b2c4 Introduce support for Mandatory Access Control and extensible
kernel access control.

Add MAC support for the UDP protocol.  Invoke appropriate MAC entry
points to label packets that are generated by local UDP sockets,
and to authorize delivery of mbufs to local sockets both in the
multicast/broadcast case and the unicast case.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 21:37:34 +00:00
rwatson
00b9d17af8 Document the undocumented assumption that at least one of the PCB
pointer and incoming mbuf pointer will be non-NULL in tcp_respond().
This is relied on by the MAC code for correctness, as well as
existing code.

Obtained from:	TrustedBSD PRoject
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:54:43 +00:00
rwatson
41180c5da4 Introduce support for Mandatory Access Control and extensible
kernel access control.

Add support for labeling most out-going ICMP messages using an
appropriate MAC entry point.  Currently, we do not explicitly
label packet reflect (timestamp, echo request) ICMP events,
implicitly using the originating packet label since the mbuf is
reused.  This will be made explicit at some point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:53:04 +00:00
rwatson
a034d0cd3c Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the TCP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check socket and
mbuf labels before permitting delivery to a socket.  Assign labels
to newly accepted connections when the syncache/cookie code has done
its business.  Also set peer labels as convenient.  Currently,
MAC policies cannot influence the PCB matching algorithm, so cannot
implement polyinstantiation.  Note that there is at least one case
where a PCB is not available due to the TCP packet not being associated
with any socket, so we don't label in that case, but need to handle
it in a special manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 19:06:49 +00:00
rwatson
9ab1b809a6 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the raw IP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check the
socket and mbuf labels before permitting delivery to a socket,
permitting MAC policies to selectively allow delivery of raw IP mbufs
to various raw IP sockets that may be open.  Restructure the policy
checking code to compose IPsec and MAC results in a more readable
manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 18:30:34 +00:00
rwatson
7ecedd74b3 Introduce support for Mandatory Access Control and extensible
kernel access control.

When fragmenting an IP datagram, invoke an appropriate MAC entry
point so that MAC labels may be copied (...) to the individual
IP fragment mbufs by MAC policies.

When IP options are inserted into an IP datagram when leaving a
host, preserve the label if we need to reallocate the mbuf for
alignment or size reasons.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:21:01 +00:00
rwatson
c520fb317a Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the code managing IP fragment reassembly queues (struct ipq)
to invoke appropriate MAC entry points to maintain a MAC label on
each queue.  Permit MAC policies to associate information with a queue
based on the mbuf that caused it to be created, update that information
based on further mbufs accepted by the queue, influence the decision
making process by which mbufs are accepted to the queue, and set the
label of the mbuf holding the reassembled datagram following reassembly
completetion.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:17:51 +00:00
rwatson
47bde04257 Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating an IGMP message, invoke a MAC entry point to permit
the MAC framework to label its mbuf appropriately for the target
interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:46:56 +00:00
rwatson
ea680c3385 Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating an ARP query, invoke a MAC entry point to permit the
MAC framework to label its mbuf appropriately for the interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:45:16 +00:00
rwatson
f2eb16e52d Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the MAC framework to label mbuf created using divert sockets.
These labels may later be used for access control on delivery to
another socket, or to an interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI LAbs
2002-07-31 16:42:47 +00:00
rwatson
2cef0b1901 Introduce support for Mandatory Access Control and extensible
kernel access control.

Label IP fragment reassembly queues, permitting security features to
be maintained on those objects.  ipq_label will be used to manage
the reassembly of fragments into IP datagrams using security
properties.  This permits policies to deny the reassembly of fragments,
as well as influence the resulting label of a datagram following
reassembly.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 23:09:20 +00:00
maxim
3a86ff2aeb Use a common way to release locks before exit.
Reviewed by:	hsu
2002-07-29 09:01:39 +00:00
truckman
b1555a2743 Wire the sysctl output buffer before grabbing any locks to prevent
SYSCTL_OUT() from blocking while locks are held.  This should
only be done when it would be inconvenient to make a temporary copy of
the data and defer calling SYSCTL_OUT() until after the locks are
released.
2002-07-28 19:59:31 +00:00
ume
3428bc649e make setsockopt(IPV6_V6ONLY, 0) actuall work for tcp6.
MFC after:	1 week
2002-07-25 18:10:04 +00:00
ume
e96d7f2205 cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,
ip6_mapped_addr_on is unified into ip6_v6only.

MFC after:	1 week
2002-07-25 17:40:45 +00:00
luigi
ccc9f3815e Only log things net.inet.ip.fw.verbose is set 2002-07-24 02:41:19 +00:00
ru
7bd1d4e8de Don't forget to recalculate the IP checksum of the original
IP datagram embedded into ICMP error message.

Spotted by:	tcpdump 3.7.1 (-vvv)
MFC after:	3 days
2002-07-23 00:16:19 +00:00
ru
cb7f1cefec Don't shrink socket buffers in tcp_mss(), application might have already
configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows.

PR:		kern/11966
MFC after:	1 week
2002-07-22 22:31:09 +00:00
ume
cd3f29ae66 do not refer to IN6P_BINDV6ONLY anymore.
Obtained from:	KAME
MFC after:	1 week
2002-07-22 15:51:02 +00:00
jdp
4a190d7015 Fix overflows in intermediate calculations in sysctl_msec_to_ticks().
At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle
to be reported as negative.

MFC after:	3 days
2002-07-20 23:48:59 +00:00
rwatson
25dc25ceae Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel
data structures pick up security and synchronization primitives, it
becomes increasingly desirable not to arbitrarily export them via
include files to userland, as the userland applications pick up new
#include dependencies.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-20 22:46:20 +00:00
dillon
62fef107e4 Add the tcps_sndrexmitbad statistic, keep track of late acks that caused
unnecessary retransmissions.
2002-07-19 18:29:38 +00:00
dillon
66e9d99f6c Introduce two new sysctl's:
net.inet.tcp.rexmit_min (default 3 ticks equiv)

    This sysctl is the retransmit timer RTO minimum,
    specified in milliseconds.  This value is
    designed for algorithmic stability only.

net.inet.tcp.rexmit_slop (default 200ms)

    This sysctl is the retransmit timer RTO slop
    which is added to every retransmit timeout and
    is designed to handle protocol stack overheads
    and delayed ack issues.

Note that the *original* code applied a 1-second
RTO minimum but never applied real slop to the RTO
calculation, so any RTO calculation over one second
would have no slop and thus not account for
protocol stack overheads (TCP timestamps are not
a measure of protocol turnaround!).  Essentially,
the original code made the RTO calculation almost
completely irrelevant.

Please note that the 200ms slop is debateable.
This commit is not meant to be a line in the sand,
and if the community winds up deciding that increasing
it is the correct solution then it's easy to do.
Note that larger values will destroy performance
on lossy networks while smaller values may result in
a greater number of unnecessary retransmits.
2002-07-18 19:06:12 +00:00
luigi
45b936c950 Move IPFW2 definition before including ip_fw.h
Make indentation of new parts consistent with the style used for this file.
2002-07-18 05:18:41 +00:00
dillon
f7ed3f7332 I don't know how the minimum retransmit timeout managed to get set to
one second but it badly breaks throughput on networks with minor packet
loss.

Complaints by: at least two people tracked down to this.
MFC after:	3 days
2002-07-17 23:32:03 +00:00
luigi
7bf1404d8f Fix a panic when doing "ipfw add pipe 1 log ..."
Also synchronize ip_dummynet.c with the version in RELENG_4 to
ease MFC's.
2002-07-17 07:21:42 +00:00
luigi
d3091fc32b Implement keepalives for dynamic rules, so they will not expire
just because you leave your session idle.

Also, put in a fix for 64-bit architectures (to be revised).

In detail:

ip_fw.h

  * Reorder fields in struct ip_fw to avoid alignment problems on
    64-bit machines. This only masks the problem, I am still not
    sure whether I am doing something wrong in the code or there
    is a problem elsewhere (e.g. different aligmnent of structures
    between userland and kernel because of pragmas etc.)

  * added fields in dyn_rule to store ack numbers, so we can
    generate keepalives when the dynamic rule is about to expire

ip_fw2.c

  * use a local function, send_pkt(), to generate TCP RST for Reset rules;

  * save about 250 bytes by cleaning up the various snprintf()
    in ipfw_log() ...

  * ... and use twice as many bytes to implement keepalives
    (this seems to be working, but i have not tested it extensively).

Keepalives are generated once every 5 seconds for the last 20 seconds
of the lifetime of a dynamic rule for an established TCP flow.  The
packets are sent to both sides, so if at least one of the endpoints
is responding, the timeout is refreshed and the rule will not expire.

You can disable this feature with

        sysctl net.inet.ip.fw.dyn_keepalive=0

(the default is 1, to have them enabled).

MFC after: 1 day

(just kidding... I will supply an updated version of ipfw2 for
RELENG_4 tomorrow).
2002-07-14 23:47:18 +00:00
luigi
7cb243d2f4 Avoid dereferencing a null pointer in ro_rt.
This was always broken in HEAD (the offending statement was introduced
in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev.
1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30.

So I am also restoring these two lines in RELENG_4 now.
We might need another few things from 1.99.2.30.
2002-07-12 22:08:47 +00:00
truckman
90d4a243cd Back out the previous change, since it looks like locking udbinfo provides
sufficient protection.
2002-07-12 09:55:48 +00:00
truckman
eadeed2263 Lock inp while we're accessing it. 2002-07-12 08:05:22 +00:00
truckman
5d8999f18a Defer calling SYSCTL_OUT() until after the locks have been released. 2002-07-11 23:18:43 +00:00
truckman
fe3b828f84 Reduce the nesting level of a code block that doesn't need to be in
an else clause.
2002-07-11 23:13:31 +00:00
luigi
65eee719d2 Change one variable to make it easier to switch between ipfw and ipfw2 2002-07-09 06:53:38 +00:00
luigi
152cae690e Fix a bug caused by dereferencing an invalid pointer when
no punch_fw was used.
Fix another couple of bugs which prevented rules from being
installed properly.

On passing, use IPFW2 instead of NEW_IPFW to compile the new code,
and slightly simplify the instruction generation code.
2002-07-08 22:57:35 +00:00
luigi
c0b0dbd582 No functional changes, but:
Following Darren's suggestion, make Dijkstra happy and rewrite the
ipfw_chk() main loop removing a lot of goto's and using instead a
variable to store match status.

Add a lot of comments to explain what instructions are supposed to
do and how -- this should ease auditing of the code and make people
more confident with it.

In terms of code size: the entire file takes about 12700 bytes of text,
about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!)
for ipfw_log().
2002-07-08 22:46:01 +00:00
luigi
29f095d651 Remove one unused command name. 2002-07-08 22:39:19 +00:00
luigi
1bddbf46a1 Forgot to update one field name in one of the latest commits. 2002-07-08 22:37:55 +00:00
luigi
95e13a442b Implement the last 2-3 missing instructions for ipfw,
now it should support all the instructions of the old ipfw.

Fix some bugs in the user interface, /sbin/ipfw.

Please check this code against your rulesets, so i can fix the
remaining bugs (if any, i think they will be mostly in /sbin/ipfw).

Once we have done a bit of testing, this code is ready to be MFC'ed,
together with a bunch of other changes (glue to ipfw, and also the
removal of some global variables) which have been in -current for
a couple of weeks now.

MFC after: 7 days
2002-07-05 22:43:06 +00:00
brian
95fdc0d642 Remove trailing whitespace 2002-07-01 11:19:40 +00:00
jesper
3a003d229a Extend the effect of the sysctl net.inet.tcp.icmp_may_rst
so that, if we recieve a ICMP "time to live exceeded in transit",
(type 11, code 0) for a TCP connection on SYN-SENT state, close
the connection.

MFC after:	2 weeks
2002-06-30 20:07:21 +00:00
jlemon
3d67b56283 One possible code path for syncache_respond() is:
syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B)

Which winds up deleting a different entry from the syncache.  Handle
this by not utilizing the next entry in the timer chain until after
syncache_respond() completes.  The case of A == B should not be possible.

Problem found by: Don Bowman <don@sandvine.com>
2002-06-28 19:12:38 +00:00
dfr
e59e678096 Fix warning.
Reviewed by: luigi
2002-06-28 08:36:26 +00:00
luigi
a9ab854862 The new ipfw code.
This code makes use of variable-size kernel representation of rules
(exactly the same concept of BPF instructions, as used in the BSDI's
firewall), which makes firewall operation a lot faster, and the
code more readable and easier to extend and debug.

The interface with the rest of the system is unchanged, as witnessed
by this commit. The only extra kernel files that I am touching
are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In
userland I only had to touch those programs which manipulate the
internal representation of firewall rules).

The code is almost entirely new (and I believe I have written the
vast majority of those sections which were taken from the former
ip_fw.c), so rather than modifying the old ip_fw.c I decided to
create a new file, sys/netinet/ip_fw2.c .  Same for the user
interface, which is in sbin/ipfw/ipfw2.c (it still compiles to
/sbin/ipfw).  The old files are still there, and will be removed
in due time.

I have not renamed the header file because it would have required
touching a one-line change to a number of kernel files.

In terms of user interface, the new "ipfw" is supposed to accepts
the old syntax for ipfw rules (and produce the same output with
"ipfw show". Only a couple of the old options (out of some 30 of
them) has not been implemented, but they will be soon.

On the other hand, the new code has some very powerful extensions.
First, you can put "or" connectives between match fields (and soon
also between options), and write things like

ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any

This should make rulesets slightly more compact (and lines longer!),
by condensing 2 or more of the old rules into single ones.

Also, as an example of how easy the rules can be extended, I have
implemented an 'address set' match pattern, where you can specify
an IP address in a format like this:

        10.20.30.0/26{18,44,33,22,9}

which will match the set of hosts listed in braces belonging to the
subnet 10.20.30.0/26 . The match is done using a bitmap, so it is
essentially a constant time operation requiring a handful of CPU
instructions (and a very small amount of memmory -- for a full /24
subnet, the instruction only consumes 40 bytes).

Again, in this commit I have focused on functionality and tried
to minimize changes to the other parts of the system. Some performance
improvement can be achieved with minor changes to the interface of
ip_fw_chk_t. This will be done later when this code is settled.

The code is meant to compile unmodified on RELENG_4 (once the
PACKET_TAG_* changes have been merged), for this reason
you will see #ifdef __FreeBSD_version in a couple of places.
This should minimize errors when (hopefully soon) it will be time
to do the MFC.
2002-06-27 23:02:18 +00:00
mux
5347bbeaf2 Warning fixes for 64 bits platforms. With this last fix,
I can build a GENERIC sparc64 kernel with -Werror.

Reviewed by:	luigi
2002-06-27 11:02:06 +00:00
luigi
c576432afc Just a comment on some additional consistency checks that could
be added here.
2002-06-26 21:00:53 +00:00
ken
0d3a835f3f At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
hsu
deb4c976c7 Avoid unlocking the inp twice if badport_bandlim() returns -1.
Reported by:	jlemon
2002-06-24 22:25:00 +00:00
hsu
f13c72f301 Style bug: fix 4 space indentations that should have been tabs.
Submitted by:	jlemon
2002-06-24 16:47:02 +00:00
luigi
fbbaa9503a Slightly restructure the #ifdef INET6 sections to make the code
more readable.

Remove the six "register" attributes from variables tcp_output(), the
compiler surely knows well how to allocate them.
2002-06-23 21:25:36 +00:00
luigi
ebcf841898 Move two global variables to automatic variables within the
only function where they are used (they are used with TCPDEBUG only).
2002-06-23 21:22:56 +00:00
luigi
e49d2528a1 Move some global variables in more appropriate places.
Add XXX comments to mark places which need to be taken care of
if we want to remove this part of the kernel from Giant.

Add a comment on a potential performance problem with ip_forward()
2002-06-23 20:48:26 +00:00
luigi
21d4ca5fd2 fix bad indentation and whitespace resulting from cut&paste 2002-06-23 09:15:43 +00:00
luigi
d781bb8585 fix indentation of a comment 2002-06-23 09:14:24 +00:00
luigi
b2109fbe30 fix a typo in a comment 2002-06-23 09:13:46 +00:00
luigi
085f8ffcb0 Remove ip_fw_fwd_addr (forgotten in previous commit)
remove some extra whitespace.
2002-06-23 09:03:42 +00:00
luigi
5259888148 Remove (almost all) global variables that were used to hold
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.

The variables removed by this change are:

        ip_divert_cookie        used by divert sockets
        ip_fw_fwd_addr          used for transparent ip redirection
        last_pkt                used by dynamic pipes in dummynet

Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().

On passing, remove a bug in divert handling of fragmented packet.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly it was the last incoming fragment
to decide.

Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). On passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.

option IPFIREWALL_FORWARD is effectively useless, the code to
implement it is very small and is now in by default to avoid the
obfuscation of conditionally compiled code.

NOTES:
 * there is at least one global variable left, sro_fwd, in ip_output().
   I am not sure if/how this can be removed.

 * I have deliberately avoided gratuitous style changes in this commit
   to avoid cluttering the diffs. Minor stule cleanup will likely be
   necessary

 * this commit only focused on the IP layer. I am sure there is a
   number of global variables used in the TCP and maybe UDP stack.

 * despite the number of files touched, there are absolutely no API's
   or data structures changed by this commit (except the interfaces of
   ip_fw_chk() and dummynet_io(), which are internal anyways), so
   an MFC is quite safe and unintrusive (and desirable, given the
   improved readability of the code).

MFC after: 10 days
2002-06-22 11:51:02 +00:00
hsu
3710c0eed0 Fix logic which resulted in missing a call to INP_UNLOCK().
Submitted by:	jlemon, mux
2002-06-21 22:54:16 +00:00
hsu
80cef86a8d TCP notify functions can change the pcb list. 2002-06-21 22:52:48 +00:00
peter
c7fdf6d30b Solve the 'unregistered netisr 18' information notice with a sledgehammer.
Register the ISR early, but do not actually kick off the timer until we
see some activity.  This still saves us from running the arp timers on
a system with no network cards.
2002-06-20 01:27:40 +00:00
tanimura
cb3347e926 Remove so*_locked(), which were backed out by mistake. 2002-06-18 07:42:02 +00:00
hsu
abda76de0b Notify functions can destroy the pcb, so they have to return an
indication of whether this happenned so the calling function
knows whether or not to unlock the pcb.

Submitted by:	Jennifer Yang (yangjihui@yahoo.com)
Bug reported by:  Sid Carter (sidcarter@symonds.net)
2002-06-14 08:35:21 +00:00
silby
86950bb1f4 Re-commit w/fix:
Ensure that the syn cache's syn-ack packets contain the same
  ip_tos, ip_ttl, and DF bits as all other tcp packets.

  PR:             39141
  MFC after:      2 weeks

This time, make sure that ipv4 specific code (aka all of the above)
is only run in the ipv4 case.
2002-06-14 03:08:05 +00:00
silby
0bbc7c9dc5 Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :)
Pointy-hat to:	silby
2002-06-14 02:43:20 +00:00
silby
acb0745bfa Ensure that the syn cache's syn-ack packets contain the same
ip_tos, ip_ttl, and DF bits as all other tcp packets.

PR:		39141
MFC after:	2 weeks
2002-06-14 02:36:34 +00:00
hsu
c78cdaf83b Because we're holding an exclusive write lock on the head, references to
the new inp cannot leak out even though it has been placed on the head list.
2002-06-13 23:14:58 +00:00
hsu
c580ba61b3 The UDP head was unlocked too early in one unicast case.
Submitted by:	bug reported by arr
2002-06-12 15:21:41 +00:00
hsu
b67cb93fe3 Fix logic which resulted in missing a call to INP_UNLOCK(). 2002-06-12 03:11:06 +00:00
hsu
ab949ac863 Fix typo where INP_INFO_RLOCK should be INP_INFO_RUNLOCK.
Submitted by: tegge, jlemon

Prefer LIST_FOREACH macro.
  Submitted by: jlemon
2002-06-12 03:08:08 +00:00
hsu
d1834ccc3b Remember to initialize the control block head mutex. 2002-06-11 10:58:57 +00:00
hsu
f140f41dad Fix typo.
Submitted by:	Kyunghwan Kim <redjade@atropos.snu.ac.kr>
2002-06-11 10:56:49 +00:00
hsu
439384bfd7 Every array elt is initialized in the following loop, so remove
unnecessary M_ZERO.
2002-06-10 23:48:37 +00:00
hsu
cd25d4648f Lock up inpcb.
Submitted by:	Jennifer Yang <yangjihui@yahoo.com>
2002-06-10 20:05:46 +00:00
tanimura
e6fa9b9e92 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
wollman
5f28f6025e Avoid unintentional trigraph. 2002-05-30 20:53:45 +00:00
arr
37981f345c - Change the newly turned INVARIANTS #ifdef blocks (they were changed from
DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code
  readability.
2002-05-21 18:52:24 +00:00
arr
f20545d47c - Turn a few DIAGNOSTIC into INVARIANTS since they are really sanity
checks.
2002-05-20 22:05:13 +00:00
arr
56aea61cc9 - Turn a DIAGNOSTIC into an INVARIANTS since it's a sanity check. Use
proper ``if'' statement style.
2002-05-20 22:04:19 +00:00
arr
6fe64080f2 - Turn a #ifdef DIAGNOSTIC to #ifdef INVARIANTS as the code from this line
through the #endif is really a sanity check.

Reviewed by: jake
2002-05-20 21:50:39 +00:00
tanimura
92d8381dd5 Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
kbyanc
134bb77c23 Reset token-ring source routing control field on receipt of ethernet frame
without source routing information.  This restores the behaviour in this
scenario to that of prior to my last commit.
2002-05-15 01:03:32 +00:00
rwatson
be8339f00b Modify the arguments to syncache_socket() to include the mbuf (m) that
results in the syncache entry being turned into a socket.  While it's
not used in the main tree, this is required in the MAC tree so that
labels can be propagated from the mbuf to the socket.  This is also
useful if you're doing things like transparent IP connection hijacking
and you want to use the syncache/cookie mechanism, but we won't go
there.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-05-14 18:57:55 +00:00
luigi
2afce45ffc Add ipfw hooks to ether_demux() and ether_output_frame().
Ipfw processing of frames at layer 2 can be enabled by the sysctl variable

	net.link.ether.ipfw=1

Consider this feature experimental, because right now, the firewall
is invoked in the places indicated below, and controlled by the
sysctl variables listed on the right.  As a consequence, a packet
can be filtered from 1 to 4 times depending on the path it follows,
which might make a ruleset a bit hard to follow.

I will add an ipfw option to tell if we want a given rule to apply
to ether_demux() and ether_output_frame(), but we have run out of
flags in the struct ip_fw so i need to think a bit on how to implement
this.

		to upper layers
	     |			     |
	     +----------->-----------+
	     ^			     V
	[ip_input]		[ip_output]	net.inet.ip.fw.enable=1
	     |			     |
	     ^			     V
	[ether_demux]      [ether_output_frame]	net.link.ether.ipfw=1
	     |			     |
	     +->- [bdg_forward]-->---+		net.link.ether.bridge_ipfw=1
	     ^			     V
	     |			     |
		 to devices
2002-05-13 10:37:19 +00:00
luigi
f172dc1bd4 Remove custom definitions (IP_FW_TCPF_SYN etc.) of TCP header flags
which are the same as the original ones (TH_SYN etc.)
2002-05-13 10:21:13 +00:00
luigi
320493f9eb Add code to match MAC header fields (at the moment supported on
bridged packets only, soon to come also for packets on ordinary
ether_input() and ether_output() paths. The syntax is

    ipfw add <action> MAC dst src type

where dst and src can be "any" or a MAC address optionallyfollowed
by a mask, e.g.

	10:20:30:40:50
	10:20:30:40:50/32
	10:20:30:40:50&ff:ff:ff:f0:ff:0f

and type can be a single ethernet type, a range, or a type followed by
a mask (values are always in hexadecimal) e.g.

	0800
	0800-0806
	0800/8
	0800&03ff

Note, I am still uncertain on what is the best format for inputting
these values, having the values in hexadecimal is convenient in most
cases but can be confusing sometimes. Suggestions welcome.

Implement suggestion from PR 37778 to allow "not me" on destination
and source IP. The code in the PR was slightly wrong and interfered
with the normal handling of IP addresses. This version hopefully is
correct.

Minor cleanup of the code, in some places moving the indentation to 4
spaces because the code was becoming too deep. Eventually, in a
separate commit, I will move the whole file to 4 space indent.
2002-05-12 20:43:50 +00:00
dd
9cc64ca23c s/demon/daemon/ 2002-05-12 00:22:38 +00:00
mike
3ef853a60c Remove some duplicate types that should have been removed as part of
the rearranging in the previous revision.

Pointy hat to:	cvs update (merging), mike (for not noticing)
2002-05-11 23:28:51 +00:00
luigi
23cf222c81 Cleanup the interface to ip_fw_chk, two of the input arguments
were totally useless and have been removed.

ip_input.c, ip_output.c:
    Properly initialize the "ip" pointer in case the firewall does an
    m_pullup() on the packet.

    Remove some debugging code forgotten long ago.

ip_fw.[ch], bridge.c:
    Prepare the grounds for matching MAC header fields in bridged packets,
    so we can have 'etherfw' functionality without a lot of kernel and
    userland bloat.
2002-05-09 10:34:57 +00:00
kbyanc
cc607e6c2d Move ISO88025 source routing information into sockaddr_dl's sdl_data
field.  This returns the sdl_data field to a variable-length field.  More
importantly, this prevents a easily-reproduceable data-corruption bug when
the interface name plus the hardware address exceed the sdl_data field's
original 12 byte limit.  However, token-ring interfaces may still overflow
the new sdl_data field's 46 byte limit if the interface name exceeds 6
characters (since 6 characters for interface name plus 6 for hardware
address plus 34 for source routing = the size of sdl_data).  Further
refinements could overcome this limitation but would break binary
compatibility; this commit only addresses fixing the bug for
commonly-occuring cases without breaking binary compatibility with the
intention that the functionality can be MFC'ed to -stable.

  See message ID's (both send to -arch):
	20020421013332.F87395-100000@gateway.posi.net
	20020430181359.G11009-300000@gateway.posi.net
  for a more thorough description of the bug addressed and how to
reproduce it.

Approved by:	silence on -arch and -net
Sponsored by:	NTT Multimedia Communications Labs
MFC after:	1 week
2002-05-07 22:14:06 +00:00
ume
0dc033806b Revised MLD-related definitions
- Used mld_xxx and MLD_xxx instead of mld6_xxx and MLD6_xxx according
  to the official defintions in rfc2292bis
  (macro definitions for backward compatibility were provided)
- Changed the first member of mld_hdr{} from mld_hdr to mld_icmp6_hdr
  to avoid name space conflict in C++

This change makes ports/net/pchar compilable again under -CURRENT.

Obtained from:	KAME
2002-05-06 16:28:25 +00:00
luigi
a03098c406 Indentation and comments cleanup, no functional change.
MFC after: 3 days
2002-05-05 21:27:47 +00:00
alfred
798c53d495 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
alfred
21257e117d Fix some edge cases where bad string handling could occur.
Submitted by: ps
2002-05-01 08:29:41 +00:00
alfred
f34e021666 cleanup:
fix line wraps, add some comments, fix macro definitions, fix for(;;) loops.
2002-05-01 08:08:24 +00:00
cjc
6b0c9026c6 Enlighten those who read the FINE POINTS of the documentation a bit
more on how ipfw(8) deals with tiny fragments. While we're at it, add
a quick log message to even let people know we dropped a packet. (Note
that the second FINE POINT is somewhat redundant given the first, but
since the code is there, leave the docs for it.)

MFC after:	1 day
2002-05-01 06:29:16 +00:00
tanimura
89ec521d91 Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by:	bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.
2002-04-30 01:54:54 +00:00
tanimura
dbb4756491 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
mike
491520a810 Rearrange <netinet/in.h> so that it is easier to conditionalize
sections for various standards.  Conditionalize sections for various
standards.  Use standards conforming spelling for types in the
sockaddr_in structure.
2002-04-24 01:26:11 +00:00
mike
39f7a31d80 Add sa_family_t type to <sys/_types.h> and typedefs to <netinet/in.h>
and <sys/socket.h>.  Previously, sa_family_t was only typedef'd in
<sys/socket.h>.
2002-04-20 02:24:35 +00:00
suz
553226e8e1 just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by:	ume
MFC after:	1 week
2002-04-19 04:46:24 +00:00
suz
7a2a62c14d initialize local variable explicitly
Reviewed by:	ume
Obtained from:	Fujitsu guys
MFC after:	1 week
2002-04-11 02:14:21 +00:00
silby
c7389be7ba Remove some ISN generation code which has been unused since the
syncache went in.

MFC after:	3 days
2002-04-10 22:12:01 +00:00
silby
5c10a8af24 Totally nuke IPPORT_USERRESERVED, it is no longer used anywhere, update
remaining comments to reflect new ephemeral port range.

Reminded by:	Maxim Konovalov <maxim@macomnet.ru>
MFC after:	3 days
2002-04-10 19:30:58 +00:00
mike
4100d7ad0f Unconditionalize the definition of INET_ADDRSTRLEN and
INET6_ADDRSTRLEN.  Doing this helps expose bogus redefinitions in 3rd
party software.
2002-04-10 11:59:02 +00:00
brian
c321804c50 Remove the code that masks an EEXIST returned from rtinit() when
calling ioctl(SIOC[AS]IFADDR).

This allows the following:

  ifconfig xx0 inet 1.2.3.1 netmask 0xffffff00
  ifconfig xx0 inet 1.2.3.17 netmask 0xfffffff0 alias
  ifconfig xx0 inet 1.2.3.25 netmask 0xfffffff8 alias
  ifconfig xx0 inet 1.2.3.26 netmask 0xffffffff alias

but would (given the above) reject this:

  ifconfig xx0 inet 1.2.3.27 netmask 0xfffffff8 alias

due to the conflicting netmasks.  I would assert that it's wrong
to mask the EEXIST returned from rtinit() as in the above scenario, the
deletion of the 1.2.3.25 address will leave the 1.2.3.27 address
as unroutable as it was in the first place.

Offered for review on: -arch, -net
Discussed with: stephen macmanus <stephenm@bayarea.net>
MFC after: 3 weeks
2002-04-10 01:42:44 +00:00
brian
2eb3cb5cca Don't add host routes for interface addresses of 0.0.0.0/8 -> 0.255.255.255.
This change allows bootp to work with more than one interface, at the
expense of some rather ``wrong'' looking code.  I plan to MFC this in
place of luigi's recent #ifdef BOOTP stuff that was committed to this
file in -stable, as that's slightly more wrong that this is.

Offered for review on: -arch, -net
MFC after: 2 weeks
2002-04-10 01:42:32 +00:00
jhb
6615797e53 Change the first argument of prison_xinpcb() to be a thread pointer instead
of a proc pointer so that prison_xinpcb() can use td_ucred.
2002-04-09 20:04:10 +00:00
silby
5339bdcf65 Update comments to reflect the recent ephemeral port range
change.

Noticed by:	ru
MFC After:	1 day
2002-04-09 18:01:26 +00:00
mdodd
23ff620fb4 Retire this copy; it now lives in sys/net/fddi.h. 2002-04-05 19:24:38 +00:00
jhb
db9aa81e23 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
jhb
dc2e474f79 Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
mike
beecc37c73 o Implement <sys/_types.h>, a new header for storing types that are
MI, not required to be a fixed size, and used in multiple headers.
  This will grow in time, as more things move here from <sys/types.h>
  and <machine/ansi.h>.
o Add missing type definitions (uint16_t and uint32_t) to
  <arpa/inet.h> and <netinet/in.h>.
o Reduce pollution in <sys/types.h> by using `#if _FOO_T_DECLARED'
  widgets to avoid including <sys/stdint.h>.
o Add some missing type definitions to <unistd.h> and note the ones
  that still need to be added.
o Make use of <sys/_types.h> primitives in <grp.h> and <sys/types.h>.

Reviewed by:	bde
2002-04-01 08:12:25 +00:00
bde
867fc1ed1c Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting.
2002-03-24 10:19:10 +00:00
rwatson
afe2b1f929 Merge from TrustedBSD MAC branch:
Move the network code from using cr_cansee() to check whether a
    socket is visible to a requesting credential to using a new
    function, cr_canseesocket(), which accepts a subject credential
    and object socket.  Implement cr_canseesocket() so that it does a
    prison check, a uid check, and add a comment where shortly a MAC
    hook will go.  This will allow MAC policies to seperately
    instrument the visibility of sockets from the visibility of
    processes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-03-22 19:57:41 +00:00
ru
cb4688c90e Prevent icmp_reflect() from calling ip_output() with a NULL route
pointer which will then result in the allocated route's reference
count never being decremented.  Just flood ping the localhost and
watch refcnt of the 127.0.0.1 route with netstat(1).

Submitted by:	jayanth

Back out ip_output.c,v 1.143 and ip_mroute.c,v 1.69 that allowed
ip_output() to be called with a NULL route pointer.  The previous
paragraph shows why this was a bad idea in the first place.

MFC after:	0 days
2002-03-22 16:45:54 +00:00
silby
c260993d3b Change the ephemeral port range from 1024-5000 to 49152-65535.
This increases the number of concurrent outgoing connections from ~4000
to ~16000.  Other OSes (Solaris, OS X, NetBSD) and many other NAT
products have already made this change without ill effects, so we
should not run into any problems.

MFC after:	1 week
2002-03-22 03:28:11 +00:00
orion
10ea87ba4b Send periodic ARP requests when ARP entries for hosts we are sending
to are about to expire.  This prevents high packet rate flows from
experiencing packet drops at the sender following ARP cache entry
timeout.

PR:		kern/25517
Reviewed by:	luigi
MFC after:	7 days
2002-03-20 15:56:36 +00:00
jeff
0a59f1223c Switch vm_zone.h with uma.h. Change over to uma interfaces. 2002-03-20 05:48:55 +00:00
alfred
357e37e023 Remove __P. 2002-03-19 21:25:46 +00:00
jeff
2923687da3 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
rwatson
60d6d81252 NAI DBA update 2002-03-14 16:53:39 +00:00
mike
b9910027dd o Add INET_ADDRSTRLEN and INET6_ADDRSTRLEN defines to <arpa/inet.h>
for POSIX.1-2001 conformance.
o Add magic to <netinet/in.h> and <netinet6/in6.h> to prevent
  redefining INET_ADDRSTRLEN and INET6_ADDRSTRLEN.
o Add a note about missing typedefs in <arpa/inet.h>.
2002-03-10 06:42:27 +00:00
mike
b8cc0d1207 o Don't require long long support in bswap64() functions.
o In i386's <machine/endian.h>, macros have some advantages over
  inlines, so change some inlines to macros.
o In i386's <machine/endian.h>, ungarbage collect word_swap_int()
  (previously __uint16_swap_uint32), it has some uses on i386's with
  PDP endianness.

Submitted by:	bde

o Move a comment up in <machine/endian.h> that was accidentially moved
  down a few revisions ago.
o Reenable userland's use of optimized inline-asm versions of
  byteorder(3) functions.
o Fix ordering of prototypes vs. redefinition of byteorder(3)
  functions, so that the non-GCC (libc asm) case has proper
  prototypes.
o Add proper prototypes for byteorder(3) functions in <sys/param.h>.
o Prevent redundant duplicate prototypes by making use of the
  _BYTEORDER_PROTOTYPED define.
o Move the bswap16(), bswap32(), bswap64() C functions into MD space
  for platforms in which asm versions don't exist.  This significantly
  reduces the complexity of some things at the cost of duplicate code.

Reviewed by:	bde
2002-03-09 21:02:16 +00:00
ume
3d5b174433 - Set inc_isipv6 in tcp6_usr_connect().
- When making a pcb from a sync cache, do not forget to copy inc_isipv6.

Obtained from:	KAME
MFC After:	1 week
2002-02-28 17:11:10 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
cjc
822f4e8381 Change the wording of the inline comments from the previous commit.
Objection from:	ru
2002-02-27 13:52:06 +00:00
alfred
943268c4b5 More IPV6 const fixes. 2002-02-27 05:11:50 +00:00
dd
c8a6bd9922 Introduce a version field to `struct xucred' in place of one of the
spares (the size of the field was changed from u_short to u_int to
reflect what it really ends up being).  Accordingly, change users of
xucred to set and check this field as appropriate.  In the kernel,
this is being done inside the new cru2x() routine which takes a
`struct ucred' and fills out a `struct xucred' according to the
former.  This also has the pleasant sideaffect of removing some
duplicate code.

Reviewed by:	rwatson
2002-02-27 04:45:37 +00:00
brooks
b1c3d8a603 Staticize an extern that no one else used. 2002-02-26 18:24:00 +00:00
jedgar
ecdaec0ea7 Enforce inbound IPsec SPD
Reviewed by:	fenner
2002-02-26 02:11:13 +00:00
alfred
96af38570e Document what inpcb->inp_vflag is for.
Submitted by: Marco Molteni <molter@tin.it>
2002-02-25 09:41:43 +00:00
cjc
8b28692f71 The TCP code did not do sufficient checks on whether incoming packets
were destined for a broadcast IP address. All TCP packets with a
broadcast destination must be ignored. The system only ignored packets
that were _link-layer_ broadcasts or multicast. We need to check the
IP address too since it is quite possible for a broadcast IP address
to come in with a unicast link-layer address.

Note that the check existed prior to CSRG revision 7.35, but was
removed. This commit effectively backs out that nine-year-old change.

PR:		misc/35022
2002-02-25 08:29:21 +00:00