freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	1830dae3d3	Make second argument of ip_divert(), that specifies packet direction a bool. This allows pf(4) to avoid including ipfw(4) private files.	2019-03-14 22:23:09 +00:00
Kristof Provost	f8e7fe32a4	pf: Fix DIOCGETSRCNODES r343295 broke DIOCGETSRCNODES by failing to reset 'nr' after counting the number of source tracking nodes. This meant that we never copied the information to userspace, leading to '? -> ?' output from pfctl. PR: 236368 MFC after: 1 week	2019-03-08 09:33:16 +00:00
Kristof Provost	6f4909de5f	pf: IPv6 fragments with malformed extension headers could be erroneously passed by pf or cause a panic We mistakenly used the extoff value from the last packet to patch the next_header field. If a malicious host sends a chain of fragmented packets where the first packet and the final packet have different lengths or number of extension headers we'd patch the next_header at the wrong offset. This can potentially lead to panics or rule bypasses. Security: CVE-2019-5597 Obtained from: OpenBSD Reported by: Corentin Bayet, Nicolas Collignon, Luca Moro at Synacktiv	2019-03-01 07:37:45 +00:00
Kristof Provost	22c58991e3	pf: Small performance tweak Because fetching a counter is a rather expansive function we should use counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr pass" rule should not cause more effort than separate "rdr" and "pass" rules. For rules with adaptive timeout values the call of counter_u64_fetch() should be accepted, but otherwise not. From the man page: The adaptive timeout values can be defined both globally and for each rule. When used on a per-rule basis, the values relate to the number of states created by the rule, otherwise to the total number of states. This handling of adaptive timeouts is done in pf_state_expires(). The calculation needs three values: start, end and states. 1. Normal rules "pass .." without adaptive setting meaning "start = 0" runs in the else-section and therefore takes "start" and "end" from the global default settings and sets "states" to pf_status.states (= total number of states). 2. Special rules like "pass .. keep state (adaptive.start 500 adaptive.end 1000)" have start != 0, run in the if-section and take "start" and "end" from the rule and set "states" to the number of states created by their rule using counter_u64_fetch(). Thats all ok, but there is a third case without special handling in the above code snippet: 3. All "rdr/nat pass .." statements use together the pf_default_rule. Therefore we have "start != 0" in this case and we run the if-section but we better should run the else-section in this case and do not fetch the counter of the pf_default_rule but take the total number of states. Submitted by: Andreas Longwitz <longwitz@incore.de> MFC after: 2 weeks	2019-02-24 17:23:55 +00:00
Patrick Kelsey	d178fee632	Place pf_altq_get_nth_active() under the ALTQ ifdef MFC after: 1 week	2019-02-11 05:39:38 +00:00
Patrick Kelsey	8f2ac65690	Reduce the time it takes the kernel to install a new PF config containing a large number of queues In general, the time savings come from separating the active and inactive queues lists into separate interface and non-interface queue lists, and changing the rule and queue tag management from list-based to hash-bashed. In HFSC, a linear scan of the class table during each queue destroy was also eliminated. There are now two new tunables to control the hash size used for each tag set (default for each is 128): net.pf.queue_tag_hashsize net.pf.rule_tag_hashsize Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D19131	2019-02-11 05:17:31 +00:00
Gleb Smirnoff	d38ca3297c	Return PFIL_CONSUMED if packet was consumed. While here gather all the identical endings of pf_check_*() into single function. PR: 235411	2019-02-02 05:49:05 +00:00
Gleb Smirnoff	b252313f0b	New pfil(9) KPI together with newborn pfil API and control utility. The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses. In nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together. New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters. Now it possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interface would pride their filtering points. Another future feature is possiblity to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951	2019-01-31 23:01:03 +00:00
Patrick Kelsey	59099cd385	Don't re-evaluate ALTQ kernel configuration due to events on non-ALTQ interfaces Re-evaluating the ALTQ kernel configuration can be expensive, particularly when there are a large number (hundreds or thousands) of queues, and is wholly unnecessary in response to events on interfaces that do not support ALTQ as such interfaces cannot be part of an ALTQ configuration. Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D18918	2019-01-28 20:26:09 +00:00
Kristof Provost	d9d146e67b	pf: Fix use-after-free of counters When cleaning up a vnet we free the counters in V_pf_default_rule and V_pf_status from shutdown_pf(), but we can still use them later, for example through pf_purge_expired_src_nodes(). Free them as the very last operation, as they rely on nothing else themselves. PR: 235097 MFC after: 1 week	2019-01-25 01:06:06 +00:00
Kristof Provost	180b0dcbbb	pf: Validate psn_len in DIOCGETSRCNODES psn_len is controlled by user space, but we allocated memory based on it. Check how much memory we might need at most (i.e. how many source nodes we have) and limit the allocation to that. Reported by: markj MFC after: 1 week	2019-01-22 02:13:33 +00:00
Kristof Provost	6a8ee0f715	pf: fix pfsync breaking carp Fix missing initialisation of sc_flags into a valid sync state on clone which breaks carp in pfsync. This regression was introduce by r342051. PR: 235005 Submitted by: smh@FreeBSD.org Pointy hat to: kp MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D18882	2019-01-18 08:19:54 +00:00
Kristof Provost	032dff662c	pf: silence a runtime warning Sometimes, for negated tables, pf can log 'pfr_update_stats: assertion failed'. This warning does not clarify anything for users, so silence it, just as OpenBSD has. PR: 234874 MFC after: 1 week	2019-01-15 08:59:51 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Kristof Provost	336683f24f	pf: Fix endless loop on NAT exhaustion with sticky-address When we try to find a source port in pf_get_sport() it's possible that all available source ports will be in use. In that case we call pf_map_addr() to try to find a new source IP to try from. If there are no more available source IPs pf_map_addr() will return 1 and we stop trying. However, if sticky-address is set we'll always return the same IP address, even if we've already tried that one. We need to check the supplied address, because if that's the one we'd set it means pf_get_sport() has already tried it, and we should error out rather than keep trying. PR: 233867 MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18483	2018-12-12 20:15:06 +00:00
Kristof Provost	5b551954ab	pf: Prevent integer overflow in PF when calculating the adaptive timeout. Mainly states of established TCP connections would be affected resulting in immediate state removal once the number of states is bigger than adaptive.start. Disabling adaptive timeouts is a workaround to avoid this bug. Issue found and initial diff by Mathieu Blanc (mathieu.blanc at cea dot fr) Reported by: Andreas Longwitz <longwitz AT incore.de> Obtained from: OpenBSD MFC after: 2 weeks	2018-12-11 21:44:39 +00:00
Kristof Provost	4fc65bcbe3	pfsync: Performance improvement pfsync code is called for every new state, state update and state deletion in pf. While pf itself can operate on multiple states at the same time (on different cores, assuming the states hash to a different hashrow), pfsync only had a single lock. This greatly reduced throughput on multicore systems. Address this by splitting the pfsync queues into buckets, based on the state id. This ensures that updates for a given connection always end up in the same bucket, which allows pfsync to still collapse multiple updates into one, while allowing multiple cores to proceed at the same time. The number of buckets is tunable, but defaults to 2 x number of cpus. Benchmarking has shown improvement, depending on hardware and setup, from ~30% to ~100%. MFC after: 1 week Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D18373	2018-12-06 19:27:15 +00:00
Kristof Provost	2b0a4ffadb	pf: add a comment describing why do we call pf_map_addr again if port selection process fails Obtained from: OpenBSD	2018-12-06 18:58:54 +00:00
Kristof Provost	b2e0b24f76	pf: Fix panic on overlapping interface names In rare situations[] it's possible for two different interfaces to have the same name. This confuses pf, because kifs are indexed by name (which is assumed to be unique). As a result we can end up trying to if_rele(NULL), which panics. Explicitly checking the ifp pointer before if_rele() prevents the panic. Note pf will likely behave in unexpected ways on the the overlapping interfaces. [] Insert an interface in a vnet jail. Rename it to an interface which exists on the host. Remove the jail. There are now two interfaces with the same name in the host.	2018-12-01 09:58:21 +00:00
Kristof Provost	87e4ca37d5	pf: Prevent tables referenced by rules in anchors from getting disabled. PR: 183198 Obtained from: OpenBSD MFC after: 2 weeks	2018-11-08 21:54:40 +00:00
Kristof Provost	58ef854f8b	pf: Fix build if INVARIANTS is not set r340061 included a number of assertions pf_frent_remove(), but these assertions were the only use of the 'prev' variable. As a result builds without INVARIANTS had an unused variable, and failed. Reported by: vangyzen@	2018-11-02 19:23:50 +00:00
Kristof Provost	14624ab582	pf: Keep a reference to struct ifnets we're using Ensure that the struct ifnet we use can't go away until we're done with it.	2018-11-02 17:05:40 +00:00
Kristof Provost	dde6e1fecb	pfsync: Add missing unlock If we fail to set up the multicast entry for pfsync and return an error we must release the pfsync lock first. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17506	2018-11-02 17:03:53 +00:00
Kristof Provost	04fe85f068	pfsync: Allow module to be unloaded MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17505	2018-11-02 17:01:18 +00:00
Kristof Provost	fbbf436d56	pfsync: Handle syncdev going away If the syncdev is removed we no longer need to clean up the multicast entry we've got set up for that device. Pass the ifnet detach event through pf to pfsync, and remove our multicast handle, and mark us as no longer having a syncdev. Note that this callback is always installed, even if the pfsync interface is disabled (and thus it's not a per-vnet callback pointer). MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17502	2018-11-02 16:57:23 +00:00
Kristof Provost	26549dfcad	pfsync: Ensure uninit is done before pf pfsync touches pf memory (for pf_state and the pfsync callback pointers), not the other way around. We need to ensure that pfsync is torn down before pf. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17501	2018-11-02 16:53:15 +00:00
Kristof Provost	5f6cf24e2d	pfsync: Make pfsync callbacks per-vnet The callbacks are installed and removed depending on the state of the pfsync device, which is per-vnet. The callbacks must also be per-vnet. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17499	2018-11-02 16:47:07 +00:00
Kristof Provost	790194cd47	pf: Limit the fragment entry queue length to 64 per bucket. So we have a global limit of 1024 fragments, but it is fine grained to the region of the packet. Smaller packets may have less fragments. This costs another 16 bytes of memory per reassembly and devides the worst case for searching by 8. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17734	2018-11-02 15:32:04 +00:00
Kristof Provost	fd2ea405e6	pf: Split the fragment reassembly queue into smaller parts Remember 16 entry points based on the fragment offset. Instead of a worst case of 8196 list traversals we now check a maximum of 512 list entries or 16 array elements. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17733	2018-11-02 15:26:51 +00:00
Kristof Provost	2b1c354ee6	pf: Count holes rather than fragments for reassembly Avoid traversing the list of fragment entris to check whether the pf(4) reassembly is complete. Instead count the holes that are created when inserting a fragment. If there are no holes left, the fragments are continuous. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17732	2018-11-02 15:23:57 +00:00
Kristof Provost	19a22ae313	Revert "pf: Limit the maximum number of fragments per packet" This reverts commit r337969. We'll handle this the OpenBSD way, in upcoming commits.	2018-11-02 15:01:59 +00:00
Kristof Provost	99eb00558a	pf: Make ':0' ignore link-local v6 addresses too When users mark an interface to not use aliases they likely also don't want to use the link-local v6 address there. PR: 201695 Submitted by: Russell Yount <Russell.Yount AT gmail.com> Differential Revision: https://reviews.freebsd.org/D17633	2018-10-28 05:32:50 +00:00
Kristof Provost	13d640d376	pf: Fix copy/paste error in IPv6 address rewriting We checked the destination address, but replaced the source address. This was fixed in OpenBSD as part of their NAT rework, which we don't want to import right now. CID: 1009561 MFC after: 3 weeks	2018-10-24 00:19:44 +00:00
Kristof Provost	73c9014569	pf: ifp can never be NULL in pfi_ifaddr_event() There's no point in the NULL check for ifp, because we'll already have dereferenced it by then. Moreover, the event will always have a valid ifp. Replace the late check with an early assertion. CID: 1357338	2018-10-23 23:15:44 +00:00
Kristof Provost	1563a27e1f	pf synproxy will do the 3WHS on behalf of the target machine, and once the 3WHS is completed, establish the backend connection. The trigger for "3WHS completed" is the reception of the first ACK. However, we should not proceed if that ACK also has RST or FIN set. PR: 197484 Obtained from: OpenBSD MFC after: 2 weeks	2018-10-20 18:37:21 +00:00
John-Mark Gurney	032d3aaa96	Significantly improve pf purge cpu usage by only taking locks when there is work to do. This reduces CPU consumption to one third on systems. This will help keep the thread CPU usage under control now that the default hash size has increased. Reviewed by: kp Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17097	2018-09-16 00:44:23 +00:00
Patrick Kelsey	249cc75fd1	Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of 2^32 bps or greater to be used. Prior to this, bandwidth parameters would simply wrap at the 2^32 boundary. The computations in the HFSC scheduler and token bucket regulator have been modified to operate correctly up to at least 100 Gbps. No other algorithms have been examined or modified for correct operation above 2^32 bps (some may have existing computation resolution or overflow issues at rates below that threshold). pfctl(8) will now limit non-HFSC bandwidth parameters to 2^32 - 1 before passing them to the kernel. The extensions to the pf(4) ioctl interface have been made in a backwards-compatible way by versioning affected data structures, supporting all versions in the kernel, and implementing macros that will cause existing code that consumes that interface to use version 0 without source modifications. If version 0 consumers of the interface are used against a new kernel that has had bandwidth parameters of 2^32 or greater configured by updated tools, such bandwidth parameters will be reported as 2^32 - 1 bps by those old consumers. All in-tree consumers of the pf(4) interface have been updated. To update out-of-tree consumers to the latest version of the interface, define PFIOC_USE_LATEST ahead of any includes and use the code of pfctl(8) as a guide for the ioctls of interest. PR: 211730 Reviewed by: jmallett, kp, loos MFC after: 2 weeks Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D16782	2018-08-22 19:38:48 +00:00
Kristof Provost	d47023236c	pf: Limit the maximum number of fragments per packet Similar to the network stack issue fixed in r337782 pf did not limit the number of fragments per packet, which could be exploited to generate high CPU loads with a crafted series of packets. Limit each packet to no more than 64 fragments. This should be sufficient on typical networks to allow maximum-sized IP frames. This addresses the issue for both IPv4 and IPv6. MFC after: 3 days Security: CVE-2018-5391 Sponsored by: Klara Systems	2018-08-17 15:00:10 +00:00
Kristof Provost	e9ddca4a40	pf: Take the IF_ADDR_RLOCK() when iterating over the group list We did do this elsewhere in pf, but the lock was missing here. Sponsored by: Essen Hackathon	2018-08-11 16:37:55 +00:00
Kristof Provost	33b242b533	pf: Fix 'set skip on' for groups The pfi_skip_if() function sometimes caused skipping of groups to work, if the members of the group used the groupname as a name prefix. This is often the case, e.g. group lo usually contains lo0, lo1, ..., but not always. Rather than relying on the name explicitly check for group memberships. Obtained from: OpenBSD (pf_if.c,v 1.62, pf_if.c,v 1.63) Sponsored by: Essen Hackathon	2018-08-11 16:34:30 +00:00
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Kristof Provost	32ece669c2	pf: Fix synproxy Synproxy was accidentally broken by r335569. The 'return (action)' must be executed for every non-PF_PASS result, but the error packet (TCP RST or ICMP error) should only be sent if the packet was dropped (i.e. PF_DROP) and the return flag is set. PR: 229477 Submitted by: Andre Albsmeier <mail AT fbsd.e4m.org> MFC after: 1 week	2018-07-14 10:14:59 +00:00
Kristof Provost	3e603d1ffa	pf: Fix panic on vnet jail shutdown with synproxy When shutting down a vnet jail pf_shutdown() clears the remaining states, which through pf_clear_states() calls pf_unlink_state(). For synproxy states pf_unlink_state() will send a TCP RST, which eventually tries to schedule the pf swi in pf_send(). This means we can't remove the software interrupt until after pf_shutdown(). MFC after: 1 week	2018-07-14 09:11:32 +00:00
Will Andrews	cc535c95ca	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
Will Andrews	c1887e9f09	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
Kristof Provost	de210decd1	pfsync: Fix state sync during initial bulk update States learned via pfsync from a peer with the same ruleset checksum were not getting assigned to rules like they should because pfsync_in_upd() wasn't passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import. PR: 229092 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Obtained from: OpenBSD MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-30 12:51:08 +00:00
Kristof Provost	150182e309	pf: Support "return" statements in passing rules when they fail. Normally pf rules are expected to do one of two things: pass the traffic or block it. Blocking can be silent - "drop", or loud - "return", "return-rst", "return-icmp". Yet there is a 3rd category of traffic passing through pf: Packets matching a "pass" rule but when applying the rule fails. This happens when redirection table is empty or when src node or state creation fails. Such rules always fail silently without notifying the sender. Allow users to configure this behaviour too, so that pf returns an error packet in these cases. PR: 226850 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-22 21:59:30 +00:00
Kristof Provost	0b799353d8	pf: Fix deadlock with route-to If a locally generated packet is routed (with route-to/reply-to/dup-to) out of a different interface it's passed through the firewall again. This meant we lost the inp pointer and if we required the pointer (e.g. for user ID matching) we'd deadlock trying to acquire an inp lock we've already got. Pass the inp pointer along with pf_route()/pf_route6(). PR: 228782 MFC after: 1 week	2018-06-09 14:17:06 +00:00
Kristof Provost	455969d305	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Sean Bruno	2695c9c109	Retire ixgb(4) This driver was for an early and uncommon legacy PCI 10GbE for a single ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family. Submitted by: kbowling Reviewed by: brooks imp jeffrey.e.pieper@intel.com Relnotes: yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15234	2018-05-02 15:59:15 +00:00
Kristof Provost	c41420d5dc	pf: limit ioctl to a reasonable and tuneable number of elements pf ioctls frequently take a variable number of elements as argument. This can potentially allow users to request very large allocations. These will fail, but even a failing M_NOWAIT might tie up resources and result in concurrent M_WAITOK allocations entering vm_wait and inducing reclamation of caches. Limit these ioctls to what should be a reasonable value, but allow users to tune it should they need to. Differential Revision: https://reviews.freebsd.org/D15018	2018-04-11 11:43:12 +00:00
Kristof Provost	1a125a2f7f	pf: Improve ioctl validation Ensure that multiplications for memory allocations cannot overflow, and that we'll not try to allocate M_WAITOK for potentially overly large allocations. MFC after: 1 week	2018-04-06 19:36:35 +00:00
Kristof Provost	02214ac854	pf: Improve ioctl validation for DIOCIGETIFACES and DIOCXCOMMIT These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. There's no obvious limit to the request size for these, so we limit the requests to something which won't overflow. Change the memory allocation to M_NOWAIT so excessive requests will fail rather than stall forever. MFC after: 1 week	2018-04-06 19:20:45 +00:00
Kristof Provost	adfe2f6aff	pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. Limit the allocation to required size (or the user allocation, if that's smaller). That does mean we need to do the allocation with the rules lock held (so the number doesn't change while we're doing this), so it can't M_WAITOK. MFC after: 1 week	2018-04-06 15:54:30 +00:00
Kristof Provost	8748b499c1	pf: Improve ioctl validation for DIOCRADDTABLES and DIOCRDELTABLES The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of tables at a time, and as such try to allocate <number of tables> * sizeof(struct pfr_table). This multiplication can overflow. Thanks to mallocarray() this is not exploitable, but an overflow does panic the system. Arbitrarily limit this to 65535 tables. pfctl only ever processes one table at a time, so it presents no issues there. MFC after: 1 week	2018-04-06 15:01:45 +00:00
Brooks Davis	541d96aaaf	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command defintions are required as struct ifreq is the same size). This is believed to be sufficent to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900	2018-03-30 18:50:13 +00:00
Kristof Provost	effaab8861	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
Kristof Provost	b4b8fa3387	pf: Fix memory leak in DIOCRADDTABLES If a user attempts to add two tables with the same name the duplicate table will not be added, but we forgot to free the duplicate table, leaking memory. Ensure we free the duplicate table in the error path. Reported by: Coverity CID: 1382111 MFC after: 3 weeks	2018-03-19 21:13:25 +00:00
Kristof Provost	bf56a3fe47	pf: Cope with overly large net.pf.states_hashsize If the user configures a states_hashsize or source_nodes_hashsize value we may not have enough memory to allocate this. This used to lock up pf, because these allocations used M_WAITOK. Cope with this by attempting the allocation with M_NOWAIT and falling back to the default sizes (with M_WAITOK) if these fail. PR: 209475 Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com> MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14367	2018-02-25 08:56:44 +00:00
Kristof Provost	c201b5644d	pf: Avoid warning without INVARIANTS When INVARIANTS is not set the 'last' variable is not used, which can generate compiler warnings. If this invariant is ever violated it'd result in a KASSERT failure in refcount_release(), so this one is not strictly required.	2018-02-01 07:52:06 +00:00
Kristof Provost	6701c43213	pf: States have at least two references pf_unlink_state() releases a reference to the state without checking if this is the last reference. It can't be, because pf_state_insert() initialises it to two. KASSERT() that this is always the case. CID: 1347140	2018-01-24 04:29:16 +00:00
Kristof Provost	6273ba66f2	pf: Avoid integer overflow issues by using mallocarray() iso. malloc() pfioctl() handles several ioctl that takes variable length input, these include: - DIOCRADDTABLES - DIOCRDELTABLES - DIOCRGETTABLES - DIOCRGETTSTATS - DIOCRCLRTSTATS - DIOCRSETTFLAGS All of them take a pfioc_table struct as input from userland. One of its elements (pfrio_size) is used in a buffer length calculation. The calculation contains an integer overflow which if triggered can lead to out of bound reads and writes later on. Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>	2018-01-07 13:35:15 +00:00
Kristof Provost	9d671fee3a	pf: Allow the module to be unloaded pf can now be safely unloaded. Most of this code is exercised on vnet jail shutdown. Don't block unloading.	2017-12-31 16:18:13 +00:00
Kristof Provost	5d0020d6d7	pf: Clean all fragments on shutdown When pf is unloaded, or a vnet jail using pf is stopped we need to ensure we clean up all fragments, not just the expired ones.	2017-12-31 10:01:31 +00:00
Pedro F. Giffuni	6e778a7efd	SPDX: license IDs for some ISC-related files.	2017-12-08 15:57:29 +00:00
Pedro F. Giffuni	8820ecc040	SPDX: Fix some cases wrongly attributed to MIT. In the cases of BSD-style license variants without clauses, use 0BSD for the time being in lack of a better description.	2017-11-30 15:10:11 +00:00
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Kristof Provost	7f3ad01804	pf_get_sport(): Prevent possible endless loop when searching for an unused nat port This is an import of Alexander Bluhm's OpenBSD commit r1.60, the first chunk had to be modified because on OpenBSD the 'cut' declaration is located elsewhere. Upstream report by Jingmin Zhou: https://marc.info/?l=openbsd-pf&m=150020133510896&w=2 OpenBSD commit message: Use a 32 bit variable to detect integer overflow when searching for an unused nat port. Prevents a possible endless loop if high port is 65535 or low port is 0. report and analysis Jingmin Zhou; OK sashan@ visa@ Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c PR: 221201 Submitted by: Fabian Keil <fk@fabiankeil.de> Obtained from: OpenBSD via ElectroBSD MFC after: 1 week	2017-08-08 21:09:26 +00:00
Kristof Provost	b7ae43552b	pf: Fix vnet purging pf_purge_thread() breaks up the work of iterating all states (in pf_purge_expired_states()) and tracks progress in the idx variable. If multiple vnets exist this results in pf_purge_thread() only calling pf_purge_expired_states() for part of the states (the first part of the first vnet, second part of the second vnet and so on). Combined with the mark-and-sweep approach to cleaning up old rules (in V_pf_unlinked_rules) that resulted in pf freeing rules that were still referenced by states. This in turn caused panics when pf_state_expires() encounters that state and attempts to access the rule. We need to track the progress per vnet, not globally, so idx is moved into a per-vnet V_pf_purge_idx. PR: 219251 Sponsored by: Hackathon Essen 2017	2017-07-09 17:56:39 +00:00
Kristof Provost	468cefa22e	pf: Fix vnet initialisation When running the vnet init code (pf_load_vnet()) we used to iterate over all vnets, marking them as unhooked. This is incorrect and leads to panics if pf is unloaded, as the unload code does not unregister the pfil hooks (because the vnet is marked as unhooked). There's no need or reason to touch other vnets during initialisation. Their pf_load_vnet() function will be triggered, which handles all required initialisation. Reviewed by: zec, gnn Differential Revision: https://reviews.freebsd.org/D10592	2017-05-07 14:33:58 +00:00
Kristof Provost	64c79ee733	pf: Fix panic on unload vnet_pf_uninit() is called through vnet_deregister_sysuninit() and linker_file_unload() when the pf module is unloaded. This is executed after pf_unload() so we end up trying to take locks which have been destroyed already. Move pf_unload() to a separate SYSUNINIT() to ensure it's called after all the vnet_pf_uninit() calls. Differential Revision: https://reviews.freebsd.org/D10025	2017-05-03 20:56:54 +00:00
Marko Zec	1e9e374199	Fix VNET leakages in PF by V_irtualizing pfr_ktables and friends. Apparently this resolves a PF-triggered panic when destroying VNET jails. Submitted by: Peter Blok <peter.blok@bsd4all.org> Reviewed by: kp	2017-04-25 08:34:39 +00:00
Marko Zec	3a36ee404f	Since curvnet is already properly set on entry to event handlers, there's no need to override it, particularly not unconditionally with vnet0. Submitted by: Peter Blok <peter.blok@bsd4all.org> Reviewed by: kp	2017-04-25 08:30:28 +00:00
Kristof Provost	00eab743ab	pf: Fix possible incorrect IPv6 fragmentation When forwarding pf tracks the size of the largest fragment in a fragmented packet, and refragments based on this size. It failed to ensure that this size was a multiple of 8 (as is required for all but the last fragment), so it could end up generating incorrect fragments. For example, if we received an 8 byte and 12 byte fragment pf would emit a first fragment with 12 bytes of payload and the final fragment would claim to be at offset 8 (not 12). We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so other users won't make the same mistake. Reported by: Antonios Atlasis <aatlasis at secfu net> MFC after: 3 days	2017-04-20 09:05:53 +00:00
Kristof Provost	4e261006a1	pf: Also clear limit counters The "pfctl -F info" command didn't clear the limit counters ( as shown in the "pfctl -vsi" output). Submitted by: Max <maximos@als.nnov.ru>	2017-04-18 20:07:21 +00:00
Gleb Smirnoff	9f5efe718f	Fix potential NULL deref. Found by: PVS Studio	2017-04-14 01:56:15 +00:00
Kristof Provost	3601d25181	pf: Fix leak of pf_state_keys If we hit the state limit we returned from pf_create_state() without cleaning up. PR: 217997 Submitted by: Max <maximos@als.nnov.ru> MFC after: 1 week	2017-04-01 12:22:34 +00:00
Kristof Provost	2f8fb3a868	pf: Fix possible shutdown race Prevent possible races in the pf_unload() / pf_purge_thread() shutdown code. Lock the pf_purge_thread() with the new pf_end_lock to prevent these races. Use a shared/exclusive lock, as we need to also acquire another sx lock (VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload() to sleep, Pointed out by: eri, glebius, jhb Differential Revision: https://reviews.freebsd.org/D10026	2017-03-22 21:18:18 +00:00
Kristof Provost	08ef4ddb0f	pf: Fix rule evaluation after inet6 route-to In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out of a different interface. pf_test6() needs to know that the packet was forwarded (in case it needs to refragment so it knows whether to call ip6_output() or ip6_forward()). This lead pf_test6() to try to evaluate rules against the PF_FWD direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT. Once fwdir is set correctly the correct output/forward function will be called. PR: 217883 Submitted by: Kajetan Staszkiewicz MFC after: 1 week Sponsored by: InnoGames GmbH	2017-03-19 03:06:09 +00:00
Kristof Provost	5c172e7059	pf: Fix memory leak on vnet shutdown or unload Rules are unlinked in shutdown_pf(), so we must call pf_unload_vnet_purge(), which frees unlinked rules, after that, not before. Reviewed by: eri, bz Differential Revision: https://reviews.freebsd.org/D10040	2017-03-18 01:37:20 +00:00
Kristof Provost	2a57d24bd1	pf: Fix incorrect rw_sleep() in pf_unload() When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep() with it, because it would release a lock we do not hold. There's no need for the lock either, so we can just tsleep(). While here also make the same change in pf_purge_thread(), because it explicitly takes the lock before rw_sleep() and then immediately releases it afterwards.	2017-03-12 05:42:57 +00:00
Kristof Provost	f618201314	pf: Do not lose the VNET lock when ending the purge thread When the pf_purge_thread() exits it must make sure to release the VNET_LIST_RLOCK it still holds. kproc_exit() does not return.	2017-03-12 05:00:04 +00:00
Kristof Provost	98a9874f7b	pf: Fix a crash in low-memory situations If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's no more memory for it) it frees skp. This is wrong, because skp is a pf_state_key *, so we need to free skp, as is done later in the function. Getting it wrong means we try to free a stack variable of the calling pf_test_rule() function, and we panic.	2017-03-06 23:41:23 +00:00
Eric van Gyzen	643faabe0d	pf: use inet_ntoa_r() instead of inet_ntoa(); maybe fix IPv6 OS fingerprinting inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. This code had an INET6 conditional before this commit, but opt_inet6.h was not included, so INET6 was never defined. Apparently, pf's OS fingerprinting hasn't worked with IPv6 for quite some time. This commit might fix it, but I didn't test that. Reviewed by: gnn, kp MFC after: 2 weeks Relnotes: yes (if I/someone can test pf OS fingerprinting with IPv6) Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:44:44 +00:00
Gleb Smirnoff	164aa3ce5e	Fix indentantion in pf_purge_thread(). No functional change.	2017-01-30 22:47:48 +00:00
Luiz Otavio O Souza	a5c1a50a26	Do not run the pf purge thread while the VNET variables are not initialized, this can cause a divide by zero (if the VNET initialization takes to long to complete). Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-29 02:17:52 +00:00
Marcel Moolenaar	aa8c6a6dca	Improve upon r309394 Instead of taking an extra reference to deal with pfsync_q_ins() and pfsync_q_del() taken and dropping a reference (resp,) make it optional of those functions to take or drop a reference by passing an extra argument. Submitted by: glebius@	2016-12-10 03:31:38 +00:00
Gleb Smirnoff	296d65b7a9	Backout accidentially leaked in r309746 not yet reviewed patch :(	2016-12-09 18:00:45 +00:00
Gleb Smirnoff	3cbee8caa1	Use counter_ratecheck() in the ICMP rate limiting. Together with: rrs, jtl	2016-12-09 17:59:15 +00:00
Kristof Provost	c3e14afc18	pflog: Correctly initialise subrulenr subrulenr is considered unset if it's set to -1, not if it's set to 1. See contrib/tcpdump/print-pflog.c pflog_print() for a user. This caused incorrect pflog output (tcpdump -n -e -ttt -i pflog0): rule 0..16777216(match) instead of the correct output of rule 0/0(match) PR: 214832 Submitted by: andywhite@gmail.com	2016-12-05 21:52:10 +00:00
Marcel Moolenaar	d6d35f1561	Fix use-after-free bugs in pfsync(4) Use after free happens for state that is deleted. The reference count is what prevents the state from being freed. When the state is dequeued, the reference count is dropped and the memory freed. We can't dereference the next pointer or re-queue the state. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D8671	2016-12-02 06:15:59 +00:00
Kristof Provost	1f4955785d	pf: port extended DSCP support from OpenBSD Ignore the ECN bits on 'tos' and 'set-tos' and allow to use DCSP names instead of having to embed their TOS equivalents as plain numbers. Obtained from: OpenBSD Sponsored by: OPNsense Differential Revision: https://reviews.freebsd.org/D8165	2016-10-13 20:34:44 +00:00
Kristof Provost	813196a11a	pf: remove fastroute tag The tag fastroute came from ipf and was removed in OpenBSD in 2011. The code allows to skip the in pfil hooks and completely removes the out pfil invoke, albeit looking up a route that the IP stack will likely find on its own. The code between IPv4 and IPv6 is also inconsistent and marked as "XXX" for years. Submitted by: Franco Fichtner <franco@opnsense.org> Differential Revision: https://reviews.freebsd.org/D8058	2016-10-04 19:35:14 +00:00
Kevin Lo	c7641cd18d	Remove ifa_list, use ifa_link (structure field) instead. While here, prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming for the interface address list in sctp_bsd_addr.c Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8051	2016-09-28 13:29:11 +00:00
Kristof Provost	0df377cbb8	pf: Add missing byte-order swap to pf_match_addr_range Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not match addresses correctly on little-endian systems. PR: 211796 Obtained from: OpenBSD (sthen) MFC after: 3 days	2016-08-15 12:13:14 +00:00
Kristof Provost	aa7cac58c6	pf: Map hook returns onto the correct error values pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers expect to get E<foo> error codes. Map the returns values. A pass is 0 (everything is OK), anything else means pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP error message. PR: 207598	2016-07-09 12:17:01 +00:00
Bjoern A. Zeeb	a8fc1b786d	The void isn't void. Unbreak sparc64 and powerpc builds. Approved by: re (gjb) Sponsored by: The FreeBSD Foundation MFC after: 12 days	2016-06-24 11:53:12 +00:00
Bjoern A. Zeeb	66c00e9efb	Proerply virtualize pfsync for bringup after pf is initialized and teardown of VNETs once pf(4) has been shut down. Properly split resources into VNET_SYS(UN)INITs and one time module loading. While here cover the INET parts in the uninit callpath with proper #ifdefs. Approved by: re (gjb) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-23 22:31:44 +00:00
Bjoern A. Zeeb	7d7751a071	Make sure pflog is attached after pf is initializaed so we can borrow pf's lock, and also make sure pflog goes after pf is gone in order to avoid callouts in VNETs to an already freed instance. Reported by: Ivan Klymenko, Johan Hendriks on current@ today Obtained from: projects/vnet Sponsored by: The FreeBSD Foundation MFC after: 13 days Approved by: re (gjb)	2016-06-23 22:31:10 +00:00
Bjoern A. Zeeb	a8e8c57443	PFSTATE_NOSYNC goes onto state_flags, not sync_state; this prevents: panic: pfsync_delete_state: unexpected sync state 8 Reviewed by: kp Approved by: re (gjb) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6942	2016-06-23 21:42:43 +00:00
Bjoern A. Zeeb	a0429b5459	Update pf(4) and pflog(4) to survive basic VNET testing, which includes proper virtualisation, teardown, avoiding use-after-free, race conditions, no longer creating a thread per VNET (which could easily be a couple of thousand threads), gracefully ignoring global events (e.g., eventhandlers) on teardown, clearing various globally cached pointers and checking them before use. Reviewed by: kp Approved by: re (gjb) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6924	2016-06-23 21:34:38 +00:00
Bjoern A. Zeeb	8147948e19	Import a fix for and old security issue (CVE-2010-3830) in pf which was not relevant to FreeBSD as only root could open /dev/pf by default. With VIMAGE this is will longer be the case. As pf(4) starts to be supported with VNETs 3rd party users may open /dev/pf inside the virtual jail instance; thus we need to address this issue after all. While OpenBSD largely rewrote code parts for the fix [1], and it's unclear what Apple [3] did, import the minimal fix from NetBSD [2]. [1] http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_ioctl.c.diff?r1=1.235&r2=1.236 [2] http://mail-index.netbsd.org/source-changes/2011/01/19/msg017518.html [3] https://support.apple.com/en-gb/HT202154 Obtained from: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dist/pf/net/pf_ioctl.c.diff?r1=1.42&r2=1.43&only_with_tag=MAIN MFC After: 2 weeks Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Security: CVE-2010-3830	2016-06-23 05:41:46 +00:00
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Kristof Provost	3e248e0fb4	pf: Filter on and set vlan PCP values Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to filter on it. Reviewed by: allanjude, araujo Approved by: re (gjb) Obtained from: OpenBSD (mostly) Differential Revision: https://reviews.freebsd.org/D6786	2016-06-17 18:21:55 +00:00
Kristof Provost	b599e8dc59	pf: Fix more ICMP mistranslation In the default case fix the substitution of the destination address. PR: 201519 Submitted by: Max <maximos@als.nnov.ru> MFC after: 1 week	2016-05-23 13:59:48 +00:00
Kristof Provost	c0c82715b8	pf: Fix ICMP translation Fix ICMP source address rewriting in rdr scenarios. PR: 201519 Submitted by: Max <maximos@als.nnov.ru> MFC after: 1 week	2016-05-23 12:41:29 +00:00
Kristof Provost	d9f4fce5a7	pf: Fix fragment timeout We were inconsistent about the use of time_second vs. time_uptime. Always use time_uptime so the value can be meaningfully compared. Submitted by: "Max" <maximos@als.nnov.ru> MFC after: 4 days	2016-05-20 15:41:05 +00:00
Pedro F. Giffuni	a4641f4eaa	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
Kristof Provost	0d8c93313e	pf: Improve forwarding detection When we guess the nature of the outbound packet (output vs. forwarding) we need to take bridges into account. When bridging the input interface does not match the output interface, but we're not forwarding. Similarly, it's possible for the interface to actually be the bridge interface itself (and not a member interface). PR: 202351 MFC after: 2 weeks	2016-03-16 06:42:15 +00:00
Kristof Provost	14b5e85b18	pf: Fix possible out-of-bounds write In the DIOCRSETADDRS ioctl() handler we allocate a table for struct pfr_addrs, which is processed in pfr_set_addrs(). At the users request we also provide feedback on the deleted addresses, by storing them after the new list ('bcopy(&ad, addr + size + i, sizeof(ad));' in pfr_set_addrs()). This means we write outside the bounds of the buffer we've just allocated. We need to look at pfrio_size2 instead (i.e. the size the user reserved for our feedback). That'd allow a malicious user to specify a smaller pfrio_size2 than pfrio_size though, in which case we'd still read outside of the allocated buffer. Instead we allocate the largest of the two values. Reported By: Paul J Murphy <paul@inetstat.net> PR: 207463 MFC after: 5 days Differential Revision: https://reviews.freebsd.org/D5426	2016-02-25 07:33:59 +00:00
Kristof Provost	c90369f880	in pf_print_state_parts, do not use skw->proto to print the protocol but our local copy proto that we very carefully set beforehands. skw being NULL is perfectly valid there. Obtained from: OpenBSD (henning)	2016-02-20 12:53:53 +00:00
Alexander V. Chernikov	61eee0e202	MFP r287070,r287073: split radix implementation and route table structure. There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
Alexander V. Chernikov	ea8d14925c	Remove sys/eventhandler.h from net/route.h Reviewed by: ae	2016-01-09 09:34:39 +00:00
Alexander V. Chernikov	460a5b502f	Convert pf(4) to the new routing API. Differential Revision: https://reviews.freebsd.org/D4763	2016-01-07 10:20:03 +00:00
Alexander V. Chernikov	637670e77e	Bring back the ability of passing cached route via nd6_output_ifp().	2015-11-15 16:02:22 +00:00
Randall Stewart	7c4676ddee	This fixes several places where callout_stops return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
Kristof Provost	5a505b317a	pf: Fix broken rule skip calculation r289932 accidentally broke the rule skip calculation. The address family argument to PF_ANEQ() is now important, and because it was set to 0 the macro always evaluated to false. This resulted in incorrect skip values, which in turn broke the rule evaluations.	2015-11-07 23:51:42 +00:00
Kristof Provost	679e3c77b7	pf: Fix IPv6 checksums with route-to. When using route-to (or reply-to) pf sends the packet directly to the output interface. If that interface doesn't support checksum offloading the checksum has to be calculated in software. That was already done in the IPv4 case, but not for the IPv6 case. As a result we'd emit packets with pseudo-header checksums (i.e. incorrect checksums). This issue was exposed by the changes in r289316 when pf stopped performing full checksum calculations for all packets. Submitted by: Luoqi Chen MFC after: 1 week	2015-10-29 20:45:53 +00:00
Alexander V. Chernikov	78546dad4e	Eliminate last rtalloc_ign() caller. Differential Revision: https://reviews.freebsd.org/D3927	2015-10-27 21:25:40 +00:00
Kristof Provost	c110fc49da	pf: Fix TSO issues In certain configurations (mostly but not exclusively as a VM on Xen) pf produced packets with an invalid TCP checksum. The problem was that pf could only handle packets with a full checksum. The FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only addresses, length and protocol). Certain network interfaces expect to see the pseudo-header checksum, so they end up producing packets with invalid checksums. To fix this stop calculating the full checksum and teach pf to only update TCP checksums if TSO is disabled or the change affects the pseudo-header checksum. PR: 154428, 193579, 198868 Reviewed by: sbruno MFC after: 1 week Relnotes: yes Sponsored by: RootBSD Differential Revision: https://reviews.freebsd.org/D3779	2015-10-14 16:21:41 +00:00
Alexander V. Chernikov	1fe201c322	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether\|atm\|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_ouput() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performes lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output\|forward> -> nd6_output() -> nd6_output _lle() which returns 0. In new scenario packet is propagated to <ether\|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469	2015-09-16 14:26:28 +00:00
Kristof Provost	2f6c345adf	pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in pf_test6() because the rcvif and the ifp (output interface) are different. In that case we're bridging though, and the rcvif the the bridge member on which the packet was received and ifp is the bridge itself. If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect. Instead check if the rcvif is a member of the ifp bridge. (In other words, the if_bridge is the ifp's softc). If that's the case we're not forwarding but bridging. PR: 202351 Reviewed by: eri Differential Revision: https://reviews.freebsd.org/D3534	2015-09-01 19:04:04 +00:00
Kristof Provost	64b3b4d611	pf: Remove support for 'scrub fragment crop\|drop-ovl' The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse users into making poor choices. It's also a fairly large amount of complex code, so just remove the support altogether. Users who have 'scrub fragment crop\|drop-ovl' in their pf configuration will be implicitly converted to 'scrub fragment reassemble'. Reviewed by: gnn, eri Relnotes: yes Differential Revision: https://reviews.freebsd.org/D3466	2015-08-27 21:27:47 +00:00
Luiz Otavio O Souza	22932fc9be	Reapply r196551 which was accidentally reverted by r223637 (update to OpenBSD pf 4.5). Fix argument ordering to memcpy as well as the size of the copy in the (theoretical) case that pfi_buffer_cnt should be greater than ~_max. This fix the failure when you hit the self table size and force it to be resized. MFC after: 3 days Sponsored by: Rubicon Communications (Netgate)	2015-08-24 21:41:05 +00:00
Luiz Otavio O Souza	0a70aaf8f5	Add ALTQ(9) support for the CoDel algorithm. CoDel is a parameterless queue discipline that handles variable bandwidth and RTT. It can be used as the single queue discipline on an interface or as a sub discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ. Differential Revision: https://reviews.freebsd.org/D3272 Reviewd by: rpaulo, gnn (previous version) Obtained from: pfSense Sponsored by: Rubicon Communications (Netgate)	2015-08-21 22:02:22 +00:00
Luiz Otavio O Souza	f2fc809dcd	Fix the copy of addresses passed from userland in table replace command. The size2 is the maximum userland buffer size (used when the addresses are copied back to userland). Obtained from: pfSense MFC after: 3 days Sponsored by: Rubicon Communications (Netgate)	2015-08-17 23:03:54 +00:00
Mariusz Zaborski	643ef281cd	Use correct src/dst ports when removing states. Submitted by: Milosz Kaniewski <m.kaniewski@wheelsystems.com>, UMEZAWA Takeshi <umezawa@iij.ad.jp> (orginal) Reviewed by: glebius Approved by: pjd (mentor) Obtained from: OpenBSD MFC after: 3 days	2015-08-11 17:24:34 +00:00
Kristof Provost	48c29b118e	pf: Always initialise pf_fragment.fr_flags When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to initialise the fr_flags field. As a result we sometimes mistakenly thought the fragment to not be a buffered fragment. This resulted in panics because we'd end up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it to be part of V_pf_cachequeue). The next time we iterated V_pf_fragqueue we'd use a freed object and panic. While here also fix a pf_fragment use after free in pf_normalize_ip(). pf_reassemble() frees the pf_fragment, so we can't use it any more. PR: 201879, 201932 MFC after: 5 days	2015-07-29 06:35:36 +00:00
Renato Botelho	299c819a75	Simplify logic added in r285945 as suggested by glebius Approved by: glebius MFC after: 3 days Sponsored by: Netgate	2015-07-28 14:59:29 +00:00
Renato Botelho	b1b98a2db7	Respect pf rule log option before log dropped packets with IP options or dangerous v6 headers Reviewed by: gnn, eri Approved by: gnn Obtained from: pfSense MFC after: 3 days Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D3222	2015-07-28 10:31:34 +00:00
Gleb Smirnoff	3e437fd2c6	Fix a typo in r280169. Of course we are interested in deleting nsn only if we have just created it and we were the last reference. Submitted by: dhartmei	2015-07-28 09:36:26 +00:00
Ermal Luçi	a5b789f65a	ALTQ FAIRQ discipline import from DragonFLY Differential Revision: https://reviews.freebsd.org/D2847 Reviewed by: glebius, wblock(manpage) Approved by: gnn(mentor) Obtained from: pfSense Sponsored by: Netgate	2015-06-24 19:16:41 +00:00
Kristof Provost	06ba348d27	pf: Remove frc_direction We don't use the direction of the fragments for anything. The frc_direction field is assigned, but never read. Just remove it. Differential Revision: https://reviews.freebsd.org/D2773 Approved by: philip (mentor)	2015-06-11 17:57:47 +00:00
Kristof Provost	837b925aba	pf: Save the protocol number in the pf_fragment When we try to look up a pf_fragment with pf_find_fragment() we compare (see pf_frag_compare()) addresses (and family), id but also protocol. We failed to save the protocol to the pf_fragment in pf_fragcache(), resulting in failing reassembly. Differential Revision: https://reviews.freebsd.org/D2772	2015-06-11 13:26:16 +00:00
Kristof Provost	0b7eba6ad4	pf: address family must be set when creating a pf_fragment Fix a panic when handling fragmented ip4 packets with 'drop-ovl' set. In that scenario we take a different branch in pf_normalize_ip(), taking us to pf_fragcache() (rather than pf_reassemble()). In pf_fragcache() we create a pf_fragment, but do not set the address family. This leads to a panic when we try to insert that into pf_frag_tree because pf_addr_cmp(), which is used to compare the pf_fragments doesn't know what to do if the address family is not set. Simply ensure that the address family is set correctly (always AF_INET in this path). PR: 200330 Differential Revision: https://reviews.freebsd.org/D2769 Approved by: philip (mentor), gnn (mentor)	2015-06-10 13:44:04 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Gleb Smirnoff	3dd01a884c	Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization from associated structures initialization. The mutexes are global, while the structures are per-vnet. Submitted by: Nikos Vassiliadis <nvass gmx.com>	2015-05-19 14:04:21 +00:00
Gleb Smirnoff	30fe681e44	During module unload unlock rules before destroying UMA zones, which may sleep in uma_drain(). It is safe to unlock here, since we are already dehooked from pfil(9) and all pf threads had quit. Sponsored by: Nginx, Inc.	2015-05-19 14:02:40 +00:00
Gleb Smirnoff	78680d05d1	A miss from r283061: don't dereference NULL is pf_get_mtag() fails. PR: 200222 Submitted by: Franco Fichtner <franco opnsense.org>	2015-05-18 15:51:27 +00:00
Gleb Smirnoff	b7f69c506d	Don't dereference NULL is pf_get_mtag() fails. PR: 200222 Submitted by: Franco Fichtner <franco opnsense.org>	2015-05-18 15:05:12 +00:00
Gleb Smirnoff	772e66a6fc	Move ALTQ from contrib to net/altq. The ALTQ code is for many years discontinued by its initial authors. In FreeBSD the code was already slightly edited during the pf(4) SMP project. It is about to be edited more in the projects/ifnet. Moving out of contrib also allows to remove several hacks to the make glue. Reviewed by: net@	2015-04-16 20:22:40 +00:00
Kristof Provost	3d1bbe5fa0	pf: Fix forwarding detection If the direction is not PF_OUT we can never be forwarding. Some input packets have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound packets, causing panics. Equally, we need to ensure that packets were really received and not locally generated before trying to ip6_forward() them. Differential Revision: https://reviews.freebsd.org/D2286 Approved by: gnn(mentor)	2015-04-14 19:07:37 +00:00
George V. Neville-Neil	916e17fd56	I can find no reason to allow packets with both SYN and FIN bits set past this point in the code. The packet should be dropped and not massaged as it is here. Differential Revision: https://reviews.freebsd.org/D2266 Submitted by: eri Sponsored by: Rubicon Communications (Netgate)	2015-04-14 14:43:42 +00:00
Kristof Provost	1873dcc8c9	pf: Skip firewall for refragmented ip6 packets In cases where we scrub (fragment reassemble) on both input and output we risk ending up in infinite loops when forwarding packets. Fragmented packets come in and get collected until we can defragment. At that point the defragmented packet is handed back to the ip stack (at the pfil point in ip6_input(). Normal processing continues. Eventually we figure out that the packet has to be forwarded and we end up at the pfil hook in ip6_forward(). After doing the inspection on the defragmented packet we see that the packet has been defragmented and because we're forwarding we have to refragment it. In pf_refragment6() we split the packet up again and then ip6_forward() the individual fragments. Those fragments hit the pfil hook on the way out, so they're collected until we can reconstruct the full packet, at which point we're right back where we left off and things continue until we run out of stack. Break that loop by marking the fragments generated by pf_refragment6() as M_SKIP_FIREWALL. There's no point in processing those packets in the firewall anyway. We've already filtered on the full packet. Differential Revision: https://reviews.freebsd.org/D2197 Reviewed by: glebius, gnn Approved by: gnn (mentor)	2015-04-06 19:05:00 +00:00
Gleb Smirnoff	6d947416cc	o Use new function ip_fillid() in all places throughout the kernel, where we want to create a new IP datagram. o Add support for RFC6864, which allows to set IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by default. o In case if we generate IP ID, use counter(9) to improve performance. o Gather all code related to IP ID into ip_id.c. Differential Revision: https://reviews.freebsd.org/D2177 Reviewed by: adrian, cy, rpaulo Tested by: Emeric POUPON <emeric.poupon stormshield.eu> Sponsored by: Netflix Sponsored by: Nginx, Inc. Relnotes: yes	2015-04-01 22:26:39 +00:00
Kristof Provost	7dce9b515b	pf: Deal with runt packets On Ethernet packets have a minimal length, so very short packets get padding appended to them. This padding is not stripped off in ip6_input() (due to support for IPv6 Jumbograms, RFC2675). That means PF needs to be careful when reassembling fragmented packets to not include the padding in the reassembled packet. While here also remove the 'Magic from ip_input.' bits. Splitting up and re-joining an mbuf chain here doesn't make any sense. Differential Revision: https://reviews.freebsd.org/D2189 Approved by: gnn (mentor)	2015-04-01 12:16:56 +00:00
Kristof Provost	798318490e	Preserve IPv6 fragment IDs accross reassembly and refragmentation When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs so we can do the reassembly so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2188 Approved by: gnn (mentor)	2015-04-01 12:15:01 +00:00
Sergey Kandaurov	a4879be402	Static'ize pf_fillup_fragment body to match its declaration. Missed in 278925.	2015-03-26 13:31:04 +00:00
Gleb Smirnoff	3e8c6d74bb	Always lock the hash row of a source node when updating its 'states' counter. PR: 182401 Sponsored by: Nginx, Inc.	2015-03-17 12:19:28 +00:00
Andrey V. Elsukov	998fbd14b8	Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was consumed by filter. This fixes several panics due to accessing to mbuf after free. Submitted by: Kristof Provost MFC after: 1 week	2015-03-12 08:57:24 +00:00
Gleb Smirnoff	4ac6485cc6	Even more fixes to !INET and !INET6 kernels. In collaboration with: pluknet	2015-02-17 22:33:22 +00:00
Gleb Smirnoff	0324938a0f	- Improve INET/INET6 scope. - style(9) declarations. - Make couple of local functions static.	2015-02-16 23:50:53 +00:00
Gleb Smirnoff	8dc98c2a36	Toss declarations to fix regular build and NO_INET6 build.	2015-02-16 21:52:28 +00:00
Gleb Smirnoff	39a58828ef	In the forwarding case refragment the reassembled packets with the same size as they arrived in. This allows the sender to determine the optimal fragment size by Path MTU Discovery. Roughly based on the OpenBSD work by Alexander Bluhm. Submitted by: Kristof Provost Differential Revision: D1767	2015-02-16 07:01:02 +00:00
Gleb Smirnoff	f5ceb22b78	Update the pf fragment handling code to closer match recent OpenBSD. That partially fixes IPv6 fragment handling. Thanks to Kristof for working on that. Submitted by: Kristof Provost Tested by: peter Differential Revision: D1765	2015-02-16 03:38:27 +00:00
Gleb Smirnoff	efc6c51ffa	Back out r276841, r276756, r276747, r276746. The change in r276747 is very very questionable, since it makes vimages more dependent on each other. But the reason for the backout is that it screwed up shutting down the pf purge threads, and now kernel immedially panics on pf module unload. Although module unloading isn't an advertised feature of pf, it is very important for development process. I'd like to not backout r276746, since in general it is good. But since it has introduced numerous build breakages, that later were addressed in r276841, r276756, r276747, I need to back it out as well. Better replay it in clean fashion from scratch.	2015-01-22 01:23:16 +00:00
Craig Rodrigues	7259906eb0	Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx. They are already initialized by MTX_SYSINIT. Submitted by: Nikos Vassiliadis <nvass@gmx.com>	2015-01-08 17:49:07 +00:00
Craig Rodrigues	8d665c6ba8	Reapply previous patch to fix build. PR: 194515	2015-01-06 16:47:02 +00:00
Craig Rodrigues	4de985af0b	Instead of creating a purge thread for every vnet, create a single purge thread and clean up all vnets from this thread. PR: 194515 Differential Revision: D1315 Submitted by: Nikos Vassiliadis <nvass@gmx.com>	2015-01-06 09:03:03 +00:00
Craig Rodrigues	c75820c756	Merge: r258322 from projects/pf branch Split functions that initialize various pf parts into their vimage parts and global parts. Since global parts appeared to be only mutex initializations, just abandon them and use MTX_SYSINIT() instead. Kill my incorrect VNET_FOREACH() iterator and instead use correct approach with VNET_SYSINIT(). PR: 194515 Differential Revision: D1309 Submitted by: glebius, Nikos Vassiliadis <nvass@gmx.com> Reviewed by: trociny, zec, gnn	2015-01-06 08:39:06 +00:00
Ermal Luçi	7b56cc430a	pf(4) needs to have a correct checksum during its processing. Calculate checksums for the IPv6 path when needed before delving into pf(4) code as required. PR: 172648, 179392 Reviewed by: glebius@ Approved by: gnn@ Obtained from: pfSense MFC after: 1 week Sponsored by: Netgate	2014-11-19 13:31:08 +00:00
Alexander V. Chernikov	5b07fc31cc	Finish r274315: remove union 'u' from struct pf_send_entry. Suggested by: kib	2014-11-09 17:01:54 +00:00
Alexander V. Chernikov	a458ad86ee	Remove unused 'struct route' fields.	2014-11-09 16:15:28 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Dag-Erling Smørgrav	99e9de871a	Add a complete implementation of MurmurHash3. Tweak both implementations so they match the established idiom. Document them in hash(9). MFC after: 1 month MFC with: r272906	2014-10-18 22:15:11 +00:00
George V. Neville-Neil	1d2baefc13	Change the PF hash from Jenkins to Murmur3. In forwarding tests this showed a conservative 3% incrase in PPS. Differential Revision: https://reviews.freebsd.org/D461 Submitted by: des Reviewed by: emaste MFC after: 1 month	2014-10-10 19:26:26 +00:00
Alexander V. Chernikov	31f0d081d8	Remove lock init from radix.c. Radix has never managed its locking itself. The only consumer using radix with embeded rwlock is system routing table. Move per-AF lock inits there.	2014-10-01 14:39:06 +00:00
Gleb Smirnoff	495a22b595	Use rn_detachhead() instead of direct free(9) for radix tables. Sponsored by: Nginx, Inc.	2014-10-01 13:35:41 +00:00
Gleb Smirnoff	2a6009bfa6	Mechanically convert to if_inc_counter().	2014-09-19 09:19:29 +00:00
Gleb Smirnoff	56b61ca27a	Remove ifq_drops from struct ifqueue. Now queue drops are accounted in struct ifnet if_oqdrops. Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops is simply removed from them. There were no API to read this statistic. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-19 09:01:19 +00:00
Gleb Smirnoff	450cecf0a0	- Provide a sleepable lock to protect against ioctl() vs ioctl() races. - Use the new lock to protect against simultaneous DIOCSTART and/or DIOCSTOP ioctls. Reported & tested by: jmallett Sponsored by: Nginx, Inc.	2014-09-12 08:39:15 +00:00
Gleb Smirnoff	bf7dcda366	Clean up unused CSUM_FRAGMENT. Sponsored by: Nginx, Inc.	2014-09-03 08:30:18 +00:00
Gleb Smirnoff	b616ae250c	Explicitly free packet on PF_DROP, otherwise a "quick" rule with "route-to" may still forward it. PR: 177808 Submitted by: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH	2014-09-01 13:00:45 +00:00
Gleb Smirnoff	e85343b1a5	Do not lookup source node twice when pf_map_addr() is used. PR: 184003 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-15 14:16:08 +00:00
Gleb Smirnoff	afab0f7e01	pf_map_addr() can fail and in this case we should drop the packet, otherwise bad consequences including a routing loop can occur. Move pf_set_rt_ifp() earlier in state creation sequence and inline it, cutting some extra code. PR: 183997 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-15 14:02:24 +00:00
Gleb Smirnoff	11341cf97e	Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL. PR: 127920 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-15 04:35:34 +00:00
Kevin Lo	73d76e77b6	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
Gleb Smirnoff	a9572d8f02	- Count global pf(4) statistics in counter(9). - Do not count global number of states and of src_nodes, use uma_zone_get_cur() to obtain values. - Struct pf_status becomes merely an ioctl API structure, and moves to netpfil/pf/pf.h with its constants. - V_pf_status is now of type struct pf_kstatus. Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-14 18:57:46 +00:00
Kevin Lo	8f5a8818f5	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
Gleb Smirnoff	8ff2bd98d6	On machines with strict alignment copy pfsync_state_key from packet on stack to avoid unaligned access. PR: 187381 Submitted by: Lytochkin Boris <lytboris gmail.com>	2014-07-10 12:41:58 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
John Baldwin	b437b06c79	Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count, not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).	2014-05-29 19:17:10 +00:00
Gleb Smirnoff	0e4f18aa68	o In pf_normalize_ip() we don't need mtag in !(PFRULE_FRAGCROP\|PFRULE_FRAGDROP) case. o In the (PFRULE_FRAGCROP\|PFRULE_FRAGDROP) case we should allocate mtag if we don't find any. Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2014-05-17 12:30:27 +00:00
Gleb Smirnoff	53f4b0cf9b	The current API for adding rules with pool addresses is the following: - DIOCADDADDR adds addresses and puts them into V_pf_pabuf - DIOCADDRULE takes all addresses from V_pf_pabuf and links them into rule. The ugly part is that if address is a table, then it is initialized in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't supply ruleset. But if address is a dynaddr, we need address family, and address family could be different for different addresses in one rule, so dynaddr is initialized in DIOCADDADDR. This leads to the entangled state of addresses on V_pf_pabuf. Some are initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf) can lead to a panic on a NULL table address. Since proper fix requires API/ABI change, for now simply plug the panic in pf_empty_pool(). Reported by: danger	2014-04-25 11:36:11 +00:00
Martin Matuska	ecb47cf9c5	Backport from projects/pf r263908: De-virtualize UMA zone pf_mtag_z and move to global initialization part. The m_tag struct does not know about vnet context and the pf_mtag_free() callback is called unaware of current vnet. This causes a panic. MFC after: 1 week	2014-04-20 09:17:48 +00:00
Gleb Smirnoff	79bde95f51	Backout r257223,r257224,r257225,r257246,r257710. The changes caused some regressions in ICMP handling, and right now me and Baptiste are out of time on analyzing them. PR: 188253	2014-04-16 09:25:20 +00:00
Martin Matuska	42311ccc0b	Merge from projects/pf r264198: Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic. Reviewed by: trociny MFC after: 1 week	2014-04-07 07:06:13 +00:00
Martin Matuska	0a7c583acc	Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic. Reviewed by: trociny	2014-04-06 19:19:25 +00:00
Martin Matuska	7e92ce7380	De-virtualize UMA zone pf_mtag_z and move to global initialization part. The m_tag struct does not know about vnet context and the pf_mtag_free() callback is called unaware of current vnet. This causes a panic. Reviewed by: Nikos Vassiliadis, trociny@	2014-03-29 09:05:25 +00:00
Martin Matuska	1709ccf9d3	Merge head up to r263906.	2014-03-29 08:39:53 +00:00
Martin Matuska	d318d97fb5	Merge from projects/pf r251993 (glebius@): De-vnet hash sizes and hash masks. Submitted by: Nikos Vassiliadis <nvass gmx.com> Reviewed by: trociny MFC after: 1 month	2014-03-25 06:55:53 +00:00
Gleb Smirnoff	e3a7aa6f56	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
Gleb Smirnoff	fb3541ad15	Instead of playing games with casts simply add 3 more members to the structure pf_rule, that are used when the structure is passed via ioctl(). PR: 187074	2014-03-05 00:40:03 +00:00
Martin Matuska	5748b897da	Merge head up to r262222 (last merge was incomplete).	2014-02-19 22:02:15 +00:00

... 2 3 4 5 6 ...

430 Commits