freebsd-skq

Author	SHA1	Message	Date
rwatson	ea70a3542d	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
rwatson	3c02410d55	Add a new function, ifa_ifwithaddr_check(), which rather than returning a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference. MFC after: 3 weeks	2009-06-22 10:59:34 +00:00
rdivacky	9992cf9aeb	Switch cmd argument to u_long. This matches what if_ethersubr.c does and allows the code to compile cleanly on amd64 with clang. Reviewed by: rwatson Approved by: ed (mentor)	2009-06-21 10:29:31 +00:00
bz	b7ff2bdc20	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
zec	8b1f38241a	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
rwatson	f4934662e5	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
luigi	1384b62c4b	More cleanup in preparation of ipfw relocation (no actual code change): + move ipfw and dummynet hooks declarations to raw_ip.c (definitions in ip_var.h) same as for most other global variables. This removes some dependencies from ip_input.c; + remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly; + remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly; + move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it; To be merged together with rev 193497 MFC after: 5 days	2009-06-05 13:44:30 +00:00
pjd	5243d2d206	- Rename IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option as there is no more space for more SOL_SOCKET options, but this option also fits better as an IP socket option, it seems. - Implement this functionality also for IPv6 and RAW IP sockets. - Always compile it in (don't use additional kernel options). - Remove sysctl to turn this functionality on and off. - Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this functionality (currently only unjail root can use it). Discussed with: julian, adrian, jhb, rwatson, kmacy	2009-06-01 10:30:00 +00:00
zec	39b6dc8ba2	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
rwatson	692f8aa2fa	Update stats in struct ipstat using four new macros, IPSTAT_ADD(), IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-11 23:35:20 +00:00
bms	71233409ea	Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD IPv4 stack. Diffs are minimized against p4. PCS has been used for some protocol verification, more widespread testing of recorded sources in Group-and-Source queries is needed. sizeof(struct igmpstat) has changed. __FreeBSD_version is bumped to 800070.	2009-03-09 17:53:05 +00:00
bz	df2be82cec	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
jamie	12bbe1869f	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)	2009-02-05 14:06:09 +00:00
bz	6dddd78341	For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN. Submitted by: jamie (as part of a larger patch) MFC after: 1 week	2009-01-25 10:11:58 +00:00
kmacy	9be993627e	check pointer against NULL add new line after declaration for style	2008-12-16 03:18:59 +00:00
zec	7b573d1496	Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instatiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-12-10 23:12:39 +00:00
bz	604d89458a	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
bz	d2730d5b27	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
julian	cf07f793f2	Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly. Obtained from: Ironport MFC after: 3 days	2008-11-19 19:19:30 +00:00
zec	815d52c5df	Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instatiation, assign initial values to them in initializer functions. As a rule, initialization at instatiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentialy, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to intialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-19 09:39:34 +00:00
bz	4d4d2d367d	Add cr_canseeinpcb() doing checks using the cached socket credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock. Reviewed by: rwatson MFC after: 3 months (set timer; decide then)	2008-10-17 16:26:16 +00:00
bz	c3b991ea35	Remove an INP_RUNLOCK() missed in SVN r183606, cvs rev. 1.195 raw_ip.c when transitioning from so_cred to inp_cred. MFC after: 6 weeks	2008-10-04 16:48:09 +00:00
bz	77f80e0672	Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred	2008-10-04 15:06:34 +00:00
zec	8797d4caec	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
julian	0592958505	A bunch of formatting fixes brough to light by, or created by the Vimage commit a few days ago.	2008-08-20 01:05:56 +00:00
bz	1021d43b56	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
mav	ac42a1aad0	Some style and assertion fixes to the previous commits hinted by rwatson. There is no functional changes.	2008-07-28 06:57:28 +00:00
mav	0023432ccb	According to in_pcb.h protocol binding information has double locking. It allows access it while list travercing holding only global pcbinfo lock. This relaxed locking noticably increses receive socket lookup performance.	2008-07-26 21:12:00 +00:00
mav	4e8bcc24da	Add hash table lookup for a fully connected raw sockets. This gives significant performance improvements when many raw sockets used. Benchmarks of mpd handeling 1000 simultaneous PPTP connections show up to 50% performance boost. With higher number of connections benefit becomes even bigger. PopTop snd others should also get some benefits.	2008-07-26 17:32:15 +00:00
rwatson	4efb6d2a37	Eliminate use of the global ripsrc which was being used to pass address information from rip_input() to rip_append(). Instead, pass the source address for an IP datagram to rip_append() using a stack-allocated sockaddr_in, similar to udp_input() and udp_append(). Prior to the move to rwlocks for inpcbinfo, this was not a problem, as use of the global was synchronized using the ripcbinfo mutex, but with read-locking there is the potential for a race during concurrent receive. This problem is not present in the IPv6 raw IP socket code, which already used a stack variable for the address. Spotted by: mav MFC after: 1 week (before inpcbinfo rwlock changes)	2008-07-18 10:47:07 +00:00
rwatson	6f9bd693c0	Rename raw_append() to rip_append(): the raw_ prefix is generally used for functions in the generic raw socket library (raw_cb.c, raw_usrreq.c), and they are not used for IPv4 raw sockets. MFC after: 3 days	2008-07-05 18:55:03 +00:00
rwatson	6ee57a292b	Improve approximation of style(9) in raw socket code.	2008-07-05 18:03:39 +00:00
rwatson	fcb43193c1	Consistently check IPFW and DUMMYNET privileges in the configuration routines for those modules, rather than in the raw socket code. This each privilege check to occur in exactly once place and avoids duplicate checks across layers. MFC after: 3 weeks Sponsored by: nCircle Network Security, Inc.	2008-05-22 08:10:31 +00:00
julian	1dfc5c98a4	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco	2008-05-09 23:03:00 +00:00
rwatson	9ee84cddef	With IPv4 raw sockets, read lock rather than write lock the inpcb when receiving or transmitting. With IPv6 raw sockets, read lock rather than write lock the inpcb when receiving. Unfortunately, IPv6 source address selection appears to require a write lock on the inpcb for the time being. MFC after: 3 months	2008-04-21 12:06:41 +00:00
rwatson	ca47fccd6b	Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch)	2008-04-17 21:38:18 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
silby	f965c7bdc4	Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports. Approved by: re (kensmith)	2007-10-07 20:44:24 +00:00
gnn	aeca69ded5	Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing	2007-07-03 12:13:45 +00:00
rwatson	bb6f1c3d9b	Continue pre-7.0 privilege cleanup: update suser(9) comments to be priv(9) comments. Approved by: re (bmah)	2007-07-02 15:44:30 +00:00
gnn	0cd74db89b	Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing	2007-07-01 11:41:27 +00:00
rwatson	00b02345d4	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
rwatson	47d37a80be	Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.	2007-05-11 10:20:51 +00:00
rwatson	46a4c44c3b	Remove unneeded wrappers for in_setsockaddr() and in_setpeeraddr(), which used to exist so pcbinfo locks could be acquired, but are no longer required as a result of socket/pcb reference model refinements.	2007-05-11 09:54:53 +00:00
rwatson	a25f94b5ae	Move universally to ANSI C function declarations, with relatively consistent style(9)-ish layout.	2007-05-10 15:58:48 +00:00
rwatson	a9656f2df2	Remove unused pcbinfo arguments to in_setsockaddr() and in_setpeeraddr().	2007-05-01 16:31:02 +00:00
rwatson	c27ef03414	Rename some fields of struct inpcbinfo to have the ipi_ prefix, consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.	2007-04-30 23:12:05 +00:00
bms	a243534dde	Increase default size of raw IP send and receive buffers to the same as udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB) are unnecessarily fragmented. A common use case for this is OSPF link-state database synchronization during adjacency bringup on a high speed network with a large MTU. It is not possible to auto-tune this setting until a socket is bound to a given interface, and because the laddr part of the inpcb tuple may be overridden, it makes no sense to do so. Applications may request a larger socket buffer size by using the SO_SENDBUF and SO_RECVBUF socket options. Certain applications such as Quagga ospfd do not probe for interface MTU and therefore do not increase SO_SENDBUF in this use case. XORP is not affected by this problem as it preemptively uses SO_SENDBUF and SO_RECVBUF to account for any possible additional latency in XRL IPC. PR: kern/108375 Requested by: Vladimir Ivanov MFC after: 1 week	2007-03-20 13:15:20 +00:00
piso	0db606a3b1	Summer of Code 2005: improve libalias - part 2 of 2 With the second (and last) part of my previous Summer of Code work, we get: -ipfw's in kernel nat -redirect_* and LSNAT support General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page. To enable in kernel nat in rc.conf, two options were added: o firewall_nat_enable: equivalent to natd_enable o firewall_nat_interface: equivalent to natd_interface Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased. NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso). Approved by: glebius (mentor)	2006-12-29 21:59:17 +00:00
jhb	9adb288460	Some whitespace nits and remove a few casts.	2006-12-29 14:58:18 +00:00

1 2 3 4 5

216 Commits