freebsd-skq

Author	SHA1	Message	Date
glebius	25a76c3618	Make second argument of ip_divert(), that specifies packet direction a bool. This allows pf(4) to avoid including ipfw(4) private files.	2019-03-14 22:23:09 +00:00
glebius	6d8cc191f9	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
markj	5c563658ea	Plug some networking sysctl leaks. Various network protocol sysctl handlers were not zero-filling their output buffers and thus would export uninitialized stack memory to userland. Fix a number of such handlers. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: tuexen MFC after: 3 days Security: kernel memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18301	2018-11-22 20:49:41 +00:00
andrew	a6605d2938	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
mmacy	14de8a2820	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
mmacy	c7b15ce781	inpcb: don't gratuitously defer frees Don't defer frees in sysctl handlers. It isn't necessary and it just confuses things. revert: r333911, r334104, and r334125 Requested by: jtl	2018-07-02 05:19:44 +00:00
mmacy	1cbc14be82	mechanical CK macro conversion of inpcbinfo lists This is a dependency for converting the inpcbinfo hash and info rlocks to epoch.	2018-06-12 22:18:20 +00:00
mmacy	9c1b4ad579	convert allocations to INVARIANTS M_ZERO	2018-05-24 01:04:56 +00:00
mmacy	ecd6e9d307	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
mmacy	4a166d604d	epoch: allow for conditionally asserting that the epoch context fields are unused by zeroing on INVARIANTS builds	2018-05-23 17:00:05 +00:00
mmacy	fe7765802e	inpcb: consolidate possible deletion in pcblist functions in to epoch deferred context.	2018-05-20 02:27:58 +00:00
mmacy	7aeac9ef18	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
ae	9c4008f018	Set M_BCAST and M_MCAST flags on mbuf sent via divert socket. r290383 has changed how mbufs sent by divert socket are handled. Previously they are always handled by slow path processing in ip_input(). Now ip_tryforward() is invoked from ip_input() before in_broadcast() check. Since diverted packet lost all mbuf flags, it passes the broadcast check in ip_tryforward() due to missing M_BCAST flag. In the result the broadcast packet is forwarded to the wire instead of be consumed by network stack. Add in_broadcast() check to the div_output() function. And restore the M_BCAST flag if destination address is broadcast for the given network interface. PR: 209491 MFC after: 1 week	2017-05-17 09:04:09 +00:00
glebius	19fea9c3e0	Reduce in_pcbinfo_init() by two params. No users supply any flags to this function (they used to say UMA_ZONE_NOFREE), so flag parameter goes away. The zone_fini parameter also goes away. Previously no protocols (except divert) supplied zone_fini function, so inpcb locks were leaked with slabs. This was okay while zones were allocated with UMA_ZONE_NOFREE flag, but now this is a leak. Fix that by suppling inpcb_fini() function as fini method for all inpcb zones.	2017-05-15 21:58:36 +00:00
glebius	3a5c9aaf2b	Hide struct inpcb, struct tcpcb from the userland. This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas, on the other hand we still eventually modify them and tools like netstat(1) never work on next version of FreeBSD. We maintain a ton of spares in them, and we already got some ifdef hell at the end of tcpcb. Details: - Hide struct inpcb, struct tcpcb under _KERNEL \|\| _WANT_FOO. - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space. - Make kernel and userland utilities compilable after these changes. - Bump __FreeBSD_version. Reviewed by: rrs, gnn Differential Revision: D10018	2017-03-21 06:39:49 +00:00
imp	7e6cabd06e	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
bz	fac944a70a	The pr_destroy field does not allow us to run the teardown code in a specific order. VNET_SYSUNINITs however are doing exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use internally. Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come. Reviewed by: gnn, tuexen (sctp), jhb (kernel.h) Obtained from: projects/vnet MFC after: 2 weeks X-MFC: do not remove pr_destroy Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6652	2016-06-01 10:14:04 +00:00
pfg	f749d852b1	ipdivert: Remove unnecessary and incorrectly typed variable. In principle n is only used to carry a copy of ipi_count, which is unsigned, in the non-VIMAGE case, however ipi_count can be used directly so it is not needed at all. Removing it makes things look cleaner.	2016-04-28 02:46:08 +00:00
pfg	fe6332dab0	netinet: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle. Reviewed by: ae. tuexen	2016-04-15 15:46:41 +00:00
bz	7d6c3b6b6e	Mfp: r296346 No reason identified to keep UMA_ZONE_NOFREE here. Reviewed by: gnn MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5736	2016-04-09 10:39:54 +00:00
jhb	93f2095877	Update ip_divert.ko to depend on version 3 of ipfw.	2014-10-11 16:08:54 +00:00
kevlo	7727a3c215	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
glebius	f469ae1d45	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
glebius	ff6e113f1b	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
glebius	5cc3ac5902	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
glebius	eecb11a14e	We don't need to convert ip6_len to host byte order before ip6_output(), the IPv6 stack is working in net byte order. The reason this code worked before is that ip6_output() doesn't look at ip6_plen at all and recalculates it based on mbuf length.	2012-10-15 07:57:55 +00:00
kevlo	ceb08698f2	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
kevlo	8747a46991	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
glebius	9086143e8c	After r241245 it appeared that in_delayed_cksum(), which still expects host byte order, was sometimes called with net byte order. Since we are moving towards net byte order throughout the stack, the function was converted to expect net byte order, and its consumers fixed appropriately: - ip_output(), ipfilter(4) not changed, since already call in_delayed_cksum() with header in net byte order. - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order there and back. - mrouting code and IPv6 ipsec now need to switch byte order there and back, but I hope, this is temporary solution. - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum(). - pf_route() catches up on r241245 changes to ip_output().	2012-10-08 08:03:58 +00:00
glebius	ef52d4e591	No reason to play with IP header before calling sctp_delayed_cksum() with offset beyond the IP header.	2012-10-08 07:21:32 +00:00
bz	2e7b42f00f	Make #error messages string-literals and remove punctuation. Reported by: bde (for ip_divert) Reviewed by: bde MFC after: 3 days	2012-01-22 10:41:58 +00:00
bz	6aadb2bfe6	Fix ip_divert handling of inet and inet6 and module building some more. Properly sort the "carp" case in modules/Makefile after it was renamed. Reported by: bde (most) Reviewed by: bde MFC after: 3 days	2012-01-22 02:16:31 +00:00
ed	0c56cf839d	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
glebius	fa73fbf6a6	Add missing break; in r223593. Submitted by: sem Pointy hat to: glebius Approved by: re (kib)	2011-08-01 13:41:38 +00:00
glebius	b39e872c06	Add possibility to pass IPv6 packets to a divert(4) socket. Submitted by: sem	2011-06-27 12:21:11 +00:00
rwatson	6e29aea1db	Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table. Connections are assigned to connection groups base on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow). Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state. Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture. Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz). Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default. Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-06-06 12:55:02 +00:00
rwatson	f2e70f34ff	IP divert sockets use their inpcbinfo for port reservation, although not for lookup. I missed its call to in_pcbbind() when preparing previous patches, which would lead to a lock assertion failure (although problem not an actual race condition due to global pcbinfo locks providing required synchronisation -- in this particular case only). This change adds the missing locking of the pcbhash lock. (Existing comments in the ipdivert code question the need for using the global hash to manage the namespace, as really it's a simple port namespace and not an address/port namespace. Also, although in_pcbbind is used to manage reservations, the hash tables aren't used for lookup. It might be a good idea to make them use hashed lookup, or to use a different reservation scheme.) Reviewed by: bz Reported by: Kristof Provost <kristof at sigsegv.be> Sponsored by: Juniper Networks	2011-06-04 16:26:02 +00:00
rwatson	fdfdadb612	Decompose the current single inpcbinfo lock into two locks: - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit). - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space. Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required. A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag: INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb Callers must pass exactly one of these flags (for the time being). Some notes: - All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking. - The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit. - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed. - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables. - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be. - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND). - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited. - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change. - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?). This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-30 09:43:55 +00:00
mdf	0086ff603f	Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string. For SYSCTL_PROC instances that I noticed a discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.	2011-01-18 21:14:13 +00:00
dim	fb307d7d1d	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 \| dim \| 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) \| 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 \| dim \| 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) \| 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 \| dim \| 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) \| 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
dim	fda4020a88	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
jhb	d8e2e27165	Ensure a minimum "slop" of 10 extra pcb structures when providing a memory size estimate to userland for pcb list sysctls. The previous behavior of a "slop" of n/8 does not work well for small values of n (e.g. no slop at all if you have less than 8 open UDP connections). Reviewed by: bz MFC after: 1 week	2010-08-17 16:41:16 +00:00
bz	d9875d4fd4	Add pcb reference counting to the pcblist sysctl handler functions to ensure type stability while caching the pcb pointers for the copyout. Reviewed by: rwatson MFC after: 7 days	2010-03-17 18:28:27 +00:00
rwatson	1fdd3bccc0	Abstract out initialization of most aspects of struct inpcbinfo from their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy() to do this work in a central spot. As inpcbinfo becomes more complex due to ongoing work to add connection groups, this will reduce code duplication. MFC after: 1 month Reviewed by: bz Sponsored by: Juniper Networks	2010-03-14 18:59:11 +00:00
rrs	5db64758fc	The proper fix for the delayed SCTP checksum is to have the delayed function take an argument as to the offset to the SCTP header. This allows it to work for V4 and V6. This of course means changing all callers of the function to either pass the header len, if they have it, or create it (ip_hl << 2 or sizeof(ip6_hdr)). PR: 144529 MFC after: 2 weeks	2010-03-12 22:58:52 +00:00
rwatson	c25f1494fd	Remove unnecessary locking of divcbinfo lock from div_output(): this has not been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was guaranteed to be stable when a valid socket reference is held (as it is in the output path). MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks	2010-03-06 22:04:45 +00:00
luigi	db333db4e6	Following up on a request from Ermal Luci to make ip_divert work as a client of pf(4), make ip_divert not depend on ipfw. This is achieved by moving to ip_var.h the struct ipfw_rule_ref (which is part of the mtag for all reinjected packets) and other declarations of global variables, and moving to raw_ip.c global variables for filter and divert hooks. Note that names and locations could be made more generic (ipfw_rule_ref is really a generic reference robust to reconfigurations; the packet filter is not necessarily ipfw; filters and their clients are not necessarily limited to ipv4), but _right now_ most of this stuff works on ipfw and ipv4, so i don't feel like doing a gratuitous renaming, at least for the time being.	2010-01-07 10:39:15 +00:00
luigi	40024ff7c3	Various cleanup done in ipfw3-head branch including: - use a uniform mtag format for all packets that exit and re-enter the firewall in the middle of a rulechain. On reentry, all tags containing reinject info are renamed to MTAG_IPFW_RULE so the processing is simpler. - make ipfw and dummynet use ip_len and ip_off in network format everywhere. Conversion is done only once instead of tracking the format in every place. - use a macro FREE_PKT to dispose of mbufs. This eases portability. On passing i also removed a few typos, staticise or localise variables, remove useless declarations and other minor things. Overall the code shrinks a bit and is hopefully more readable. I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr. For ng_ipfw i am actually waiting for feedback from glebius@ because we might have some small changes to make. For if_bridge and if_ethersubr feedback would be welcome (there are still some redundant parts in these two modules that I would like to remove, but first i need to check functionality).	2010-01-04 19:01:22 +00:00
luigi	c4e6c7a490	Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part. Files touched by this commit: conf/files now references the two new source files netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h. netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions netinet/ipfw/ip_fw_log.c ipfw_log and related functions netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules netinet/ipfw/ip_fw2.c removed the pieces that goes in the new files netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This require a new function pointer. A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them. MFC after: 1 month	2009-12-15 16:15:14 +00:00

1 2 3 4 5

207 Commits