freebsd-dev

Author	SHA1	Message	Date
Paul Saab	4f590175b7	Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.	2006-04-21 09:25:40 +00:00
Robert Watson	a34f6c1e1d	Correct incorrect assertion in div_bind(): inp must not be NULL here. Reported by: tegge MFC after: 3 months	2006-04-03 09:01:17 +00:00
Robert Watson	14ba8add01	Update in_pcb-derived basic socket types following changes to pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, in protocol shutdown methods, and in raw IP send. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. - Invoke in_pcbfree() after in_pcbdetach() in order to free the detached in_pcb structure for a socket. MFC after: 3 months	2006-04-01 16:20:54 +00:00
Robert Watson	bc725eafc7	Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
Ruslan Ermilov	303989a2f3	Use sparse initializers for "struct domain" and "struct protosw", so they are easier to follow for the human being.	2005-11-09 13:29:16 +00:00
Gleb Smirnoff	b3cf6808ce	In div_output() explicitly set m->m_nextpkt to NULL. If divert socket is not userland, but ng_ksocket, then m->m_nextpkt may be non-NULL. In this case we would panic in sbappend.	2005-05-13 11:44:37 +00:00
Colin Percival	fd94099ec2	If we are going to 1. Copy a NULL-terminated string into a fixed-length buffer, and 2. copyout that buffer to userland, we really ought to 0. Zero the entire buffer first. Security: FreeBSD-SA-05:08.kmem	2005-05-06 02:50:00 +00:00
Warner Losh	c398230b64	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
Gleb Smirnoff	c1384b5ae2	- Since divert protocol is not connection oriented, remove SS_ISCONNECTED flag from divert sockets. - Remove div_disconnect() method, since it shouldn't be called now. - Remove div_abort() method. It was never called directly, since protocol doesn't have listen queue. It was called only from div_disconnect(), which is removed now. Reviewed by: rwatson, maxim Approved by: julian (mentor) MT5 after: 1 week MT4 after: 1 month	2004-11-18 13:49:18 +00:00
Gleb Smirnoff	ea0bd57615	Fix ng_ksocket(4) operation as a divert socket, which is pretty useful and has been broken twice: - in the beginning of div_output() replace KASSERT with assignment, as it was in rev. 1.83. [1] [to be MFCed] - refactor changes introduced in rev. 1.100: do not prepend a new tag unconditionally. Before doing this check whether we have one. [2] A small note for all hacking in this area: when divert socket is not a real userland, but ng_ksocket(4), we receive _the same_ mbufs, that we transmitted to socket. These mbufs have rcvif, the tags we've put on them. And we should treat them correctly. Discussed with: mlaier [1] Silence from: green [2] Reviewed by: maxim Approved by: julian (mentor) MFC after: 1 week	2004-11-12 22:17:42 +00:00
Poul-Henning Kamp	e21e4c19c9	Add missing '=' Spotted by: obrien	2004-11-11 19:02:01 +00:00
Poul-Henning Kamp	756d52a195	Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain(). This makes it possible to grep these structures and see any bogosities.	2004-11-08 14:44:54 +00:00
Andre Oppermann	84bb6a2e75	IPDIVERT is a module now; tell the other parts of the kernel about it. IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.	2004-10-25 20:02:34 +00:00
Andre Oppermann	24fc79b0a4	Refuse to unload the ipdivert module unless the 'force' flag is given to kldunload. Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8) man pages.	2004-10-22 19:12:01 +00:00
Andre Oppermann	57bbe2e1ab	Destroy the UMA zone on unload.	2004-10-19 22:51:20 +00:00
Andre Oppermann	2de1a9eb6e	Slightly extend the locking during unload to fully cover the protocol deregistration. This does not entirely close the race but narrows the even previously extremely small chance of a race some more.	2004-10-19 22:08:13 +00:00
Robert Watson	279128e295	Annotate a newly introduced race present due to the unloading of protocols: it is possible for sockets to be created and attached to the divert protocol between the test for sockets present and successful unload of the registration handler. We will need to explore more mature APIs for unregistering the protocol and then draining consumers, or an atomic test-and-unregister mechanism.	2004-10-19 21:35:42 +00:00
Andre Oppermann	72584fd2c0	Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket. The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.	2004-10-19 21:14:57 +00:00
Brian Feldman	6daf7ebd28	Add support to IPFW for classification based on "diverted" status (that is, input via a divert socket).	2004-10-03 00:26:35 +00:00
John-Mark Gurney	b5d47ff592	fix up socket/ip layer violation... don't assume/know that SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...	2004-09-05 02:34:12 +00:00
Andre Oppermann	9b932e9e04	Convert ipfw to use PFIL_HOOKS. This change is transparent to userland and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different. However, there are many changes in how ipfw and its add-ons are handled: In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in|out]() in the ipfw PFIL handler. IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output(). ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and then directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it. DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection. BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS. More detailed changes to the code: conf/files Add netinet/ip_fw_pfil.c. conf/options Add IPFIREWALL_FORWARD option. modules/ipfw/Makefile Add ip_fw_pfil.c. net/bridge.c Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well. netinet/ip_divert.c Removed divert_clone() function. It is no longer used. netinet/ip_dummynet.[ch] Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed. netinet/ip_fastfwd.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_fw.h Removed 'ro' and 'dst' from struct ip_fw_args. netinet/ip_fw2.c (Re)moved some global variables and the module handling. netinet/ip_fw_pfil.c New file containing the ipfw PFIL handlers and module initialization. netinet/ip_input.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. ip_forward() no longer requires the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set. netinet/ip_output.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_var.h Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.) netinet/raw_ip.c Directly check if ipfw and dummynet control pointers are active. netinet/tcp_input.c Rework the 'ipfw forward' to local code to work with the new way of forward tags. netinet/tcp_sack.c Remove include 'opt_ipfw.h' which is not needed here. sys/mbuf.h Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed. Approved by: re (scottl)	2004-08-17 22:05:54 +00:00
Andre Oppermann	420a281164	Backout removal of UMA_ZONE_NOFREE flag for all zones which are established for structures with timers in them. It might be that a timer fires even when the associated structure has already been free'd. Having type-stable storage in this case is beneficial for graceful failure handling and debugging. Discussed with: bosko, tegge, rwatson	2004-08-11 20:30:08 +00:00
Andre Oppermann	4efb805c0c	Remove the UMA_ZONE_NOFREE flag to all uma_zcreate() calls in the IP and TCP code. This flag would have prevented giving back excessive free slabs to the global pool after a transient peak usage.	2004-08-11 17:08:31 +00:00
Andre Oppermann	f0cada84b1	o Move all parts of the IP reassembly process into the function ip_reass() to make it fully self-contained. o ip_reass() now returns a new mbuf with the reassembled packet and ip->ip_len including the IP header. o Computation of the delayed checksum is moved into divert_packet(). Reviewed by: silby	2004-08-03 12:31:38 +00:00
Poul-Henning Kamp	e3e244bff6	Rwatson, write 100 times for tomorrow: First unlock, then assign NULL to pointer.	2004-06-27 21:54:34 +00:00
Robert Watson	1e4d7da707	Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer: - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append(). - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/receive flags and dropping data, and in uipc_rcvd() when adjusting back-pressure on the sockets. For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduced mutex costs.	2004-06-26 19:10:39 +00:00
Robert Watson	bb7479a613	Acquire socket lock around frobbing of socket state in divert sockets.	2004-06-22 04:00:51 +00:00
Robert Watson	ffcbc0e4c5	Prefer use of the inpcb as a MAC label source for outgoing packets sent via divert sockets, when available.	2004-06-22 03:58:50 +00:00
Robert Watson	310e7ceb94	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
Robert Watson	c1d587c848	Remove unneeded Giant acquisition in divert_packet(), which is left over from debug.mpsafenet affecting only the forwarding plane. Giant is now acquired in the ithread/netisr or in the system call code.	2004-06-11 04:06:51 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Pawel Jakub Dawidek	b0330ed929	Reduce 'td' argument to 'cred' (struct ucred) argument in those functions: - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson Requested by: rwatson Reviewed by: rwatson, ume	2004-03-27 21:05:46 +00:00
Pawel Jakub Dawidek	6823b82399	Remove unused argument. Reviewed by: ume	2004-03-27 20:41:32 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
Max Laier	ac9d7e2618	Re-remove MT_TAGs. The problems with dummynet have been fixed now. Tested by: -current, bms(mentor), me Approved by: bms(mentor), sam	2004-02-25 19:55:29 +00:00
Max Laier	36e8826ffb	Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is not working properly with the patch in place. Approved by: bms(mentor)	2004-02-18 00:04:52 +00:00
Max Laier	1094bdca51	This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag). This is (mostly) work from: sam Silence from: -arch Approved by: bms(mentor), sam, rwatson	2004-02-13 19:14:16 +00:00
Sam Leffler	5bd311a566	Split the "inp" mutex class into separate classes for each of divert, raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints. Reviewed by: rwatson Approved by: re (rwatson)	2003-11-26 01:40:44 +00:00
Andre Oppermann	97d8d152c2	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 20:07:39 +00:00
Robert Watson	a557af222b	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
Brian Feldman	633461295a	Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but do not have mh_nextpkt initialized. Sometimes what's there is "1", and the ip_input() code pukes trying to m_free() it, rendering divert sockets and such broken. This really underscores the need to get rid of MT_TAG. Reviewed by: rwatson	2003-11-17 03:17:49 +00:00
Sam Leffler	252f24a2cf	divert socket fixups: o pickup Giant in divert_packet to protect sbappendaddr since it can be entered through MPSAFE callouts or through ip_input when mpsafenet is 1 o add missing locking on output o add locking to abort and shutdown o add a ctlinput handler to invalidate held routing table references on an ICMP redirect (may not be needed) Supported by: FreeBSD Foundation	2003-11-08 23:09:42 +00:00
Brooks Davis	9bf40ede4a	Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance. This change paves the way for interface renaming and enhanced pseudo device creation and configuration semantics. Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)	2003-10-31 18:32:15 +00:00
Sam Leffler	26f91065e7	o add locking o move the global divsrc socket address to a local variable instead of locking it Sponsored by: FreeBSD Foundation	2003-09-05 00:00:51 +00:00
Dag-Erling Smørgrav	fe58453891	Introduce an M_ASSERTPKTHDR() macro which performs the very common task of asserting that an mbuf has a packet header. Use it instead of hand-rolled versions wherever applicable. Submitted by: Hiten Pandya <hiten@unixdaemons.com>	2003-04-08 14:25:47 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Poul-Henning Kamp	4ee6e70ef3	Check bounds for index before dereferencing memory past end of array. Found by: FlexeLint	2003-01-28 22:44:12 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Bill Fenner	4d3ffc9841	Renumber IPPROTO_DIVERT out of the range of valid IP protocol numbers. This allows socket() to return an error when the kernel is not built with IPDIVERT, and doesn't prevent future applications from using the "borrowed" IP protocol number. The sysctl net.inet.raw.olddiverterror controls whether opening a socket with the "borrowed" IP protocol fails with an accompanying kernel printf; this code should last only a couple of releases. Approved by: re	2002-10-29 16:46:13 +00:00
Maxime Henrion	56e77afa59	Fix kernel build on sparc64 in the IPDIVERT case.	2002-10-24 09:58:50 +00:00
Sam Leffler	5d84645305	Replace aux mbufs with packet tags: o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month	2002-10-16 01:54:46 +00:00
Robert Watson	d3990b06e1	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the MAC framework to label mbufs created using divert sockets. These labels may later be used for access control on delivery to another socket, or to an interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:42:47 +00:00
Luigi Rizzo	a5924d6100	fix a typo in a comment	2002-06-23 09:13:46 +00:00
Luigi Rizzo	2b25acc158	Remove (almost all) global variables that were used to hold packet forwarding state ("annotations") during ip processing. The code is considerably cleaner now. The variables removed by this change are: ip_divert_cookie used by divert sockets ip_fw_fwd_addr used for transparent ip redirection last_pkt used by dynamic pipes in dummynet Removal of the first two has been done by carrying the annotations into volatile structs prepended to the mbuf chains, and adding appropriate code to add/remove annotations in the routines which make use of them, i.e. ip_input(), ip_output(), tcp_input(), bdg_forward(), ether_demux(), ether_output_frame(), div_output(). On passing, remove a bug in divert handling of fragmented packet. Now it is the fragment at offset 0 which sets the divert status of the whole packet, whereas formerly it was the last incoming fragment to decide. Removal of last_pkt required a change in the interface of ip_fw_chk() and dummynet_io(). On passing, use the same mechanism for dummynet annotations and for divert/forward annotations. option IPFIREWALL_FORWARD is effectively useless, the code to implement it is very small and is now in by default to avoid the obfuscation of conditionally compiled code. NOTES: * there is at least one global variable left, sro_fwd, in ip_output(). I am not sure if/how this can be removed. * I have deliberately avoided gratuitous style changes in this commit to avoid cluttering the diffs. Minor style cleanup will likely be necessary. * this commit only focused on the IP layer. I am sure there is a number of global variables used in the TCP and maybe UDP stack. * despite the number of files touched, there are absolutely no API's or data structures changed by this commit (except the interfaces of ip_fw_chk() and dummynet_io(), which are internal anyways), so an MFC is quite safe and unintrusive (and desirable, given the improved readability of the code). MFC after: 10 days	2002-06-22 11:51:02 +00:00
Jeffrey Hsu	7a9378e7f5	Remember to initialize the control block head mutex.	2002-06-11 10:58:57 +00:00
Jeffrey Hsu	3d9baf34c0	Fix typo. Submitted by: Kyunghwan Kim <redjade@atropos.snu.ac.kr>	2002-06-11 10:56:49 +00:00
Jeffrey Hsu	f76fcf6d4c	Lock up inpcb. Submitted by: Jennifer Yang <yangjihui@yahoo.com>	2002-06-10 20:05:46 +00:00
Seigo Tanimura	4cc20ab1f0	Back out my last commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu	2002-05-31 11:52:35 +00:00
Seigo Tanimura	243917fe3b	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each member in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred	2002-05-20 05:41:09 +00:00
Seigo Tanimura	960ed29c4b	Revert the change of #includes in sys/filedesc.h and sys/socketvar.h. Requested by: bde Since locking sigio_lock is usually followed by calling pgsigio(), move the declaration of sigio_lock and the definitions of SIGIO_*() to sys/signalvar.h. While I am here, sort include files alphabetically, where possible.	2002-04-30 01:54:54 +00:00
John Baldwin	ad278afdf0	Change the first argument of prison_xinpcb() to be a thread pointer instead of a proc pointer so that prison_xinpcb() can use td_ucred.	2002-04-09 20:04:10 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Jeff Roberson	69c2d429c1	Switch vm_zone.h with uma.h. Change over to uma interfaces.	2002-03-20 05:48:55 +00:00
Mike Barcroft	fd8e4ebc8c	o Move NTOHL() and associated macros into <sys/param.h>. These are deprecated in favor of the POSIX-defined lowercase variants. o Change all occurrences of NTOHL() and associated macros in the source tree to use the lowercase function variants. o Add missing license bits to sparc64's <machine/endian.h>. Approved by: jake o Clean up <machine/endian.h> files. o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>. o Remove prototypes for non-existent bswapXX() functions. o Include <machine/endian.h> in <arpa/inet.h> to define the POSIX-required ntohl() family of functions. o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>, and <sys/param.h>. o Prepend underscores to the ntohl() family to help deal with complexities associated with having MD (asm and inline) versions, and having to prevent exposure of these functions in other headers that happen to make use of endian-specific defines. o Create weak aliases to the canonical function name to help deal with third-party software forgetting to include an appropriate header. o Remove some now unneeded pollution from <sys/types.h>. o Add missing <arpa/inet.h> includes in userland. Tested on: alpha, i386 Reviewed by: bde, jake, tmm	2002-02-18 20:35:27 +00:00
David E. O'Brien	6e551fb628	Update to C99, s/__FUNCTION__/__func__/, also don't use ANSI string concatenation.	2001-12-10 08:09:49 +00:00
Robert Watson	ce17880650	o Replace reference to 'struct proc' with 'struct thread' in 'struct sysctl_req', which describes in-progress sysctl requests. This permits sysctl handlers to have access to the current thread, permitting work on implementing td->td_ucred, migration of suser() to using struct thread to derive the appropriate ucred, and allowing struct thread to be passed down to other code, such as network code where td is not currently available (and curproc is used). o Note: netncp and netsmb are not updated to reflect this change, as they are not currently KSE-adapted. Reviewed by: julian Obtained from: TrustedBSD Project	2001-11-08 02:13:18 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Julian Elischer	f0ffb944d2	Patches from Keiichi SHIMA <keiichi@iij.ad.jp> to make ip use the standard protosw structure again. Obtained from: Well, KAME I guess.	2001-09-03 20:03:55 +00:00
Hajimu UMEMOTO	13cf67f317	move ipsec security policy allocation into in_pcballoc, before making pcbs available to the outside world. otherwise, we will see inpcb without ipsec security policy attached (-> panic() in ipsec.c). Obtained from: KAME MFC after: 3 days	2001-07-26 19:19:49 +00:00
Poul-Henning Kamp	fc2ffbe604	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
Poul-Henning Kamp	cf9fa8e725	Move suser() and suser_xxx() prototypes and a related #define from <sys/proc.h> to <sys/systm.h>. Correctly document the #includes needed in the manpage. Add one now needed #include of <sys/systm.h>. Remove the consequent 48 unused #includes of <sys/proc.h>.	2000-10-29 16:06:56 +00:00
Ruslan Ermilov	e30177e024	Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time. Requested by: wollman	2000-09-14 14:42:04 +00:00
Ruslan Ermilov	04287599db	Fixed broken ICMP error generation, unified conversion of IP header fields between host and network byte order. The details: o icmp_error() now does not add IP header length. This fixes the problem when icmp_error() is called from ip_forward(). In this case the ip_len of the original IP datagram returned with ICMP error was wrong. o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host byte order, so DTRT and convert these fields back to network byte order before sending a message. This fixes the problem described in PR 16240 and PR 20877 (ip_id field was returned in host byte order). o ip_ttl decrement operation in ip_forward() was moved down to make sure that it does not corrupt the copy of original IP datagram passed later to icmp_error(). o A copy of original IP datagram in ip_forward() was made a read-write, independent copy. This fixes the problem I first reported to Garrett Wollman and Bill Fenner and later put in audit trail of PR 16240: ip_output() (not always) converts fields of original datagram to network byte order, but because copy (mcopy) and its original (m) most likely share the same mbuf cluster, ip_output()'s manipulations on original also corrupted the copy. o ip_output() now expects all three fields, ip_len, ip_off and (what is significant) ip_id in host byte order. It was a headache for years that ip_id was handled differently. The only compatibility issue here is the raw IP socket interface with IP_HDRINCL socket option set and a non-zero ip_id field, but ip.4 manual page was unclear on whether in this case ip_id field should be in host or network byte order.	2000-09-01 12:33:03 +00:00
Ruslan Ermilov	3e065e76ac	Fixed the bug that div_bind() always returned zero even if there was an error (broken in rev 1.9).	2000-08-30 14:43:02 +00:00
Ruslan Ermilov	cec335f937	Make netstat(1) to be aware of divert(4) sockets.	2000-08-03 14:09:52 +00:00
Paul Richards	7a04c4f85a	Force the address of the socket to be INADDR_ANY immediately before calling in_pcbbind so that in_pcbbind sees a valid address if no address was specified (since divert sockets ignore them). PR: 17552 Reviewed by: Brian	2000-05-02 23:53:46 +00:00
Yoshinobu Inoue	0ba9128b0c	prevent kernel panic which happens when either of IPSEC and IPDIVERT is enabled. Confirmed by: Eugene M. Kim <ab@astralblue.com>	2000-01-08 12:53:48 +00:00
Yoshinobu Inoue	6a800098cc	IPSEC support in the kernel. pr_input() routines prototype is also changed to support IPSEC and IPV6 chained protocol headers. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project	1999-12-22 19:13:38 +00:00
Archie Cobbs	8948e4ba8e	Miscellaneous fixes/cleanups relating to ipfw and divert(4): - Implement 'ipfw tee' (finally) - Divert packets by calling new function divert_packet() directly instead of going through protosw[]. - Replace kludgey global variable 'ip_divert_port' with a function parameter to divert_packet() - Replace kludgey global variable 'frag_divert_port' with a function parameter to ip_reass() - style(9) fixes Reviewed by: julian, green	1999-12-06 00:43:07 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	f711d546d2	Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>. 3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.	1999-04-27 11:18:52 +00:00
Julian Elischer	a0c091ad1d	remove leftover garbage line.	1999-02-08 05:53:39 +00:00
Julian Elischer	b0935ca284	Fix for PR 9309. Divert was not feeding clean data to ifa_ifwithaddr() so it was giving bad results. Submitted by: kseel <kseel@utcorp.com>, Ruslan Ermilov <ru@ucb.crimea.ua>	1999-02-08 05:48:46 +00:00
Archie Cobbs	2127f26023	Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc. These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>	1998-12-04 22:54:57 +00:00
Julian Elischer	efe39c6a1a	Bring back some slight cleanups from 2.2	1998-07-06 09:06:58 +00:00
Julian Elischer	7d82bea53d	Remove out of date comment.	1998-07-02 06:31:25 +00:00
Julian Elischer	b3adeeb209	Remove the option to keep IPFW diversion backwards compatible WRT diversion reinjection. No-one has been bitten by the new behaviour that I know of.	1998-07-02 05:49:12 +00:00
Julian Elischer	0cab75363d	include opt_ipdivert.h so we get correct options	1998-06-12 02:48:47 +00:00
Julian Elischer	bab04eb816	Allow diverted packets from the transmit side to remember if they had a recv interface and allow that state to be available after re-injection for further tests.	1998-06-12 01:54:29 +00:00
Julian Elischer	3ed81d03b3	Fix wrong data type for a pointer.	1998-06-06 20:45:28 +00:00
Julian Elischer	c977d4c735	clean up the changes made to ipfw over the last weeks (should make the ipfw lkm work again)	1998-06-06 19:39:10 +00:00
Julian Elischer	e256a933a8	Reverse the default sense of the IPFW/DIVERT reinjection code so that the new behaviour is now default. Solves the "infinite loop in diversion" problem when more than one diversion is active. Man page changes follow. The new code is in -stable as the NON default option.	1998-06-05 22:40:01 +00:00
Julian Elischer	bb60f459a0	Add optional code to change the way that divert and ipfw work together. Prior to this change, Accidental recursion protection was done by the diverted daemon feeding back the divert port number it got the packet on, as the port number on a sendto(). IPFW knew not to redivert a packet to this port (again). Processing of the ruleset started at the beginning again, skipping that divert port. The new semantic (which is how we should have done it the first time) is that the port number in the sendto() is the rule number AFTER which processing should restart, and on a recvfrom(), the port number is the rule number which caused the diversion. This is much more flexible, and also more intuitive. If the user uses the same sockaddr received when resending, processing resumes at the rule number following that that caused the diversion. The user can however select to resume rule processing at any rule. (0 is restart at the beginning) To enable the new code use option IPFW_DIVERT_RESTART This should become the default as soon as people have looked at it a bit	1998-05-25 10:37:48 +00:00
Julian Elischer	436c7212e6	Hide the interface name in the sin_zero section of the sockaddr_in passed to the user process for incoming packets. When the sockaddr_in is passed back to the divert socket later, use this as the primary interface lookup and only revert to the IP address when the name fails. This solves a long standing bug with divert sockets: When two interfaces had the same address (P2P for example) the interface "assigned" to the reinjected packet was sometimes incorrect. Probably we should define a "sockaddr_div" to officially hold this extended information in the same manner as sockaddr_dl.	1998-05-25 08:44:31 +00:00
Julian Elischer	25e75fb320	Take the user's "IGNORE_DIVERT" argument from where the user put it and not from the PCB which HAPPENS to contain the same number most of the time, but not always.	1998-05-25 07:41:23 +00:00
Garrett Wollman	98271db4d5	Convert socket structures to be type-stable and add a version number. Define a parameter which indicates the maximum number of sockets in a system, and use this to size the zone allocators used for sockets and for certain PCBs. Convert PF_LOCAL PCB structures to be type-stable and add a version number. Define an external format for information about socket structures and use it in several places. Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through sysctl(3) without blocking network interrupts for an unreasonable length of time. This probably still has some bugs and/or race conditions, but it seems to work well enough on my machines. It is now possible for `netstat' to get almost all of its information via the sysctl(3) interface rather than reading kmem (changes to follow).	1998-05-15 20:11:40 +00:00
Bruce Evans	8781d8e928	Fixed style bugs (mostly) in previous commit.	1998-03-28 10:18:26 +00:00
Garrett Wollman	3d4d47f398	Use the zone allocator to allocate inpcbs and tcpcbs. Each protocol creates its own zone; this is used particularly by TCP which allocates both inpcb and tcpcb in a single allocation. (Some hackery ensures that the tcpcb is reasonably aligned.) Also keep track of the number of pcbs of each type allocated, and keep a generation count (instance version number) for future use.	1998-03-24 18:06:34 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00

168 Commits