freebsd-dev

Author	SHA1	Message	Date
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Pawel Biernacki	10b49b2302	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. Mark all nodes in pf, pfsync and carp as MPSAFE. Reviewed by: kp Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23634	2020-02-21 16:23:00 +00:00
Hans Petter Selasky	fbb890056e	Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process incoming packets in taskqueue context. This patch extends r357772. Differential Revision: https://reviews.freebsd.org/D23742 Reviewed by: glebius@ Sponsored by: Mellanox Technologies	2020-02-18 19:53:36 +00:00
Hans Petter Selasky	b4426a7175	Add missing EPOCH(9) wrapper in ipfw(8). Backtrace: panic() ip_output() dyn_tick() softclock_call_cc() softclock() ithread_loop() Differential Revision: https://reviews.freebsd.org/D23599 Reviewed by: glebius@ and ae@ Found by: mmacy@ Reported by: jmd@ Sponsored by: Mellanox Technologies	2020-02-11 18:16:29 +00:00
Kristof Provost	e3e03bc159	pf: Apply kif flags to new group members If we have a 'set skip on <ifgroup>' rule this flag it set on the group kif, but must also be set on all members. pfctl does this when the rules are set, but if groups are added afterwards we must also apply the flags to the new member. If not, new group members will not be skipped until the rules are reloaded. Reported by: dvl@ Reviewed by: glebius@ Differential Revision: https://reviews.freebsd.org/D23254	2020-01-23 22:13:41 +00:00
Kristof Provost	ef1bd1e517	pfsync: Ensure we enter network epoch before calling ip_output As of r356974 calls to ip_output() require us to be in the network epoch. That wasn't the case for the calls done from pfsyncintr() and pfsync_defer_tmo().	2020-01-22 21:01:19 +00:00
Gleb Smirnoff	2a4bd982d0	Introduce NET_EPOCH_CALL() macro and use it everywhere where we free data based on the network epoch. The macro reverses the argument order of epoch_call(9) - first function, then its argument. NFC	2020-01-15 06:05:20 +00:00
Alexander V. Chernikov	57ddf39678	ipfw: Don't rollback state in alloc_table_vidx() if atomicity is not required. Submitted by: Neel Chauhan <neel AT neelc DOT org> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22662	2019-12-19 10:22:16 +00:00
Alexander V. Chernikov	00b45f58e8	Revert r355908 to commit it with a proper message.	2019-12-19 10:20:38 +00:00
Alexander V. Chernikov	880266635d	svn-commit.tmp	2019-12-19 09:19:27 +00:00
Kristof Provost	cca2ea64e9	pf: Make request_maxcount runtime adjustable There's no reason for this to be a tunable. It's perfectly safe to change this at runtime. Reviewed by: Lutz Donnerhacke Differential Revision: https://reviews.freebsd.org/D22737	2019-12-14 02:06:07 +00:00
Andrey V. Elsukov	1ae74c1359	Make TCP options parsing stricter. Rework tcpopts_parse() to be more strict. Use const pointer. Add length checks for specific TCP options. The main purpose of the change is avoiding of possible out of mbuf's data access. Reported by: Maxime Villard Reviewed by: melifaro, emaste MFC after: 1 week	2019-12-13 11:47:58 +00:00
Andrey V. Elsukov	2873980947	Follow RFC 4443 p2.2 and always use own addresses for reflected ICMPv6 datagrams. Previously destination address from original datagram was used. That looked confusing, especially in the traceroute6 output. Also honor IPSTEALTH kernel option and do TTL/HLIM decrementing only when stealth mode is disabled. Reported by: Marco van Tol <marco at tols org> Reviewed by: melifaro MFC after: 2 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D22631	2019-12-12 13:28:46 +00:00
Andrey V. Elsukov	ca0ac0a6c1	Avoid access to stale ip pointer and call UPDATE_POINTERS() after PULLUP_LEN_LOCKED(). PULLUP_LEN_LOCKED() could update mbuf and thus we need to update related pointers that can be used in next opcodes. Reported by: Maxime Villard <max at m00nbsd net> MFC after: 1 week	2019-12-10 10:35:32 +00:00
Kristof Provost	492f3a312a	pf: Add endline to all DPFPRINTF() DPFPRINTF() doesn't automatically add an endline, so be consistent and always add it.	2019-11-24 13:53:36 +00:00
Kristof Provost	a0d571cbef	pf: Must be in NET_EPOCH to call icmp_error icmp_reflect(), called through icmp_error() requires us to be in NET_EPOCH. Failure to hold it leads to the following panic (with INVARIANTS): panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_icmp.c:742 cpuid = 2 time = 1571233273 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e0977920 vpanic() at vpanic+0x17e/frame 0xfffffe00e0977980 panic() at panic+0x43/frame 0xfffffe00e09779e0 icmp_reflect() at icmp_reflect+0x625/frame 0xfffffe00e0977aa0 icmp_error() at icmp_error+0x720/frame 0xfffffe00e0977b10 pf_intr() at pf_intr+0xd5/frame 0xfffffe00e0977b50 ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe00e0977bb0 fork_exit() at fork_exit+0x80/frame 0xfffffe00e0977bf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e0977bf0 Note that we now enter NET_EPOCH twice if we enter ip_output() from pf_intr(), but ip_output() will soon be converted to a function that requires epoch, so entering NET_EPOCH directly from pf_intr() makes more sense. Discussed with: glebius@	2019-10-18 03:36:26 +00:00
Gleb Smirnoff	16a72f53e2	Use epoch(9) directly instead of obsoleted KPI.	2019-10-14 16:37:41 +00:00
Gleb Smirnoff	961c033ef1	ipfw(4) rule matching always happens in network epoch.	2019-10-14 16:37:00 +00:00
Mark Johnston	bff630d1dc	Fix the build after r353458. MFC with: r353458 Sponsored by: The FreeBSD Foundation	2019-10-13 00:08:17 +00:00
Mark Johnston	6cc9ab8610	Add a missing include of opt_sctp.h. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-10-12 23:01:16 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Gleb Smirnoff	f8b45306c6	Drivers may pass runt packets to filter. This is okay. Reviewed by: gallatin	2019-09-13 22:36:04 +00:00
Andrey V. Elsukov	773a7e2224	Fix rule truncation on external action module unloading. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-08-15 13:44:33 +00:00
Ed Maste	c54ee572e5	pf: zero (another) output buffer in pfioctl Avoid potential structure padding leak. r350294 identified a leak via static analysis; although there's no report of a leak with the DIOCGETSRCNODES ioctl it's a good practice to zero the memory. Suggested by: kp MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-07-31 16:58:09 +00:00
Andrey V. Elsukov	e758846c09	dd ipfw_get_action() function to get the pointer to action opcode. ACTION_PTR() returns pointer to the start of rule action section, but rule can keep several rule modifiers like O_LOG, O_TAG and O_ALTQ, and only then real action opcode is stored. ipfw_get_action() function inspects the rule action section, skips all modifiers and returns action opcode. Use this function in ipfw_reset_eaction() and flush_nat_ptrs(). MFC after: 1 week Sponsored by: Yandex LLC	2019-07-29 15:09:12 +00:00
Kristof Provost	f287767d4f	pf: Remove partial RFC2675 support Remove our (very partial) support for RFC2675 Jumbograms. They're not used, not actually supported and not a good idea. Reviewed by: thj@ Differential Revision: https://reviews.freebsd.org/D21086	2019-07-29 13:21:31 +00:00
Andrey V. Elsukov	d4e6a52959	Avoid possible lock leaking. After r343619 ipfw uses own locking for packets flow. PULLUP_LEN() macro is used in ipfw_chk() to make m_pullup(). When m_pullup() fails, it just returns via `goto pullup_failed`. There are two places where PULLUP_LEN() is called with IPFW_PF_RLOCK() held. Add PULLUP_LEN_LOCKED() macro to use in these places to be able release the lock, when m_pullup() fails. Sponsored by: Yandex LLC	2019-07-29 12:55:48 +00:00
Ed Maste	532bc58628	pf: zero output buffer in pfioctl Avoid potential structure padding leak. Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: kp MFC after: 3 days Security: Potential kernel memory disclosure Sponsored by: The FreeBSD Foundation	2019-07-24 16:51:14 +00:00
Andrey V. Elsukov	455d2ecb71	Eliminate rmlock from ipfw's BPF code. After r343631 pfil hooks are invoked in net_epoch_preempt section, this allows to avoid extra locking. Add NET_EPOCH_ASSER() assertion to each ipfw_bpf_tap() call to require to be called from inside epoch section. Use NET_EPOCH_WAIT() in ipfw_clone_destroy() to wait until it becomes safe to free() ifnet. And use on-stack ifnet pointer in each ipfw_bpf_tap() call to avoid NULL pointer dereference in case when V_log_if global variable will become NULL during ipfw_bpf_tap*() call. Sponsored by: Yandex LLC	2019-07-23 12:52:36 +00:00
Andrey V. Elsukov	2dab0de635	Do not modify cmd pointer if it is already last opcode in the rule. MFC after: 1 week	2019-07-12 09:59:21 +00:00
Andrey V. Elsukov	4ee2f4c180	Correctly truncate the rule in case when it has several action opcodes. It is possible, that opcode at the ACTION_PTR() location is not real action, but action modificator like "log", "tag" etc. In this case we need to check for each opcode in the loop to find O_EXTERNAL_ACTION. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-07-12 09:48:42 +00:00
Hans Petter Selasky	59854ecf55	Convert all IPv4 and IPv6 multicast memberships into using a STAILQ instead of a linear array. The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), which needs to be dropped when calling the underlying possibly sleeping if_ioctl() method. When using a linear array to keep track of multicast memberships, the computed memory location of the multicast filter may suddenly change, due to concurrent insertion or removal of elements in the linear array. This in turn leads to various invalid memory access issues and kernel panics. To avoid this problem, put all multicast memberships on a STAILQ based list. Then the memory location of the IPv4 and IPv6 multicast filters become fixed during their lifetime and use after free and memory leak issues are easier to track, for example by: vmstat -m \| grep multi All list manipulation has been factored into inline functions including some macros, to easily allow for a future hash-list implementation, if needed. This patch has been tested by pho@ . Differential Revision: https://reviews.freebsd.org/D20080 Reviewed by: markj @ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-25 11:54:41 +00:00
Andrey V. Elsukov	019c8c9330	Follow the RFC 3128 and drop short TCP fragments with offset = 1. Reported by: emaste MFC after: 1 week	2019-06-25 11:40:37 +00:00
Andrey V. Elsukov	7d4b2d5244	Mark default rule with IPFW_RULE_NOOPT flag, so it can be showed in compact form. MFC after: 1 week	2019-06-25 09:11:22 +00:00
Andrey V. Elsukov	978f2d1728	Add "tcpmss" opcode to match the TCP MSS value. With this opcode it is possible to match TCP packets with specified MSS option, whose value corresponds to configured in opcode value. It is allowed to specify single value, range of values, or array of specific values or ranges. E.g. # ipfw add deny log tcp from any to any tcpmss 0-500 Reviewed by: melifaro,bcr Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-06-21 10:54:51 +00:00
Xin LI	f89d207279	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
Andrey V. Elsukov	efdadaa2d8	Initialize V_nat64out methods explicitly. It looks like initialization of static variable doesn't work for VIMAGE and this leads to panic. Reported by: olivier MFC after: 1 week	2019-06-05 09:25:40 +00:00
Li-Wen Hsu	d086d41363	Remove an uneeded indentation introduced in r223637 to silence gcc warnging MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-05-25 23:58:09 +00:00
John Baldwin	fb3bc59600	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code alloating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117	2019-05-24 22:30:40 +00:00
Andrey V. Elsukov	90ecb41fba	Add IPv6 support for O_IPLEN opcode. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-04-29 09:33:16 +00:00
Kristof Provost	1c75b9d2cd	pf: No need to M_NOWAIT in DIOCRSETTFLAGS Now that we don't hold a lock during DIOCRSETTFLAGS memory allocation we can use M_WAITOK. MFC after: 1 week Event: Aberdeen hackathon 2019 Pointed out by: glebius@	2019-04-18 11:37:44 +00:00
Kristof Provost	f5e0d9fcb4	pf: Fix panic on invalid DIOCRSETTFLAGS If during DIOCRSETTFLAGS pfrio_buffer is NULL copyin() will fault, which we're not allowed to do with a lock held. We must count the number of entries in the table and release the lock during copyin(). Only then can we re-acquire the lock. Note that this is safe, because pfr_set_tflags() will check if the table and entries exist. This was discovered by a local syzcaller instance. MFC after: 1 week Event: Aberdeen hackathon 2019	2019-04-17 16:42:54 +00:00
Rodney W. Grimes	6c1c6ae537	Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage. Reviewed by: karels, kristof Approved by: bde (mentor) MFC after: 3 weeks Sponsored by: John Gilmore Differential Revision: https://reviews.freebsd.org/D19317	2019-04-04 19:01:13 +00:00
Conrad Meyer	a8a16c7128	Replace read_random(9) with more appropriate arc4rand(9) KPIs Reviewed by: ae, delphij Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19760	2019-04-04 01:02:50 +00:00
Ed Maste	a342f5772f	pf: use UID_ROOT and GID_WHEEL named constants in make_dev No functional change but improves consistency and greppability of make_dev calls. Discussed with: kp	2019-03-26 21:20:42 +00:00
Gleb Smirnoff	97245d4074	Always create ipfw(4) hooks as long as module is loaded. Now enabling ipfw(4) with sysctls controls only linkage of hooks to default heads. When module is loaded fetch sysctls as tunables, to make it possible to boot with ipfw(4) in kernel, but not linked to any pfil(9) hooks.	2019-03-21 16:15:29 +00:00
Kristof Provost	64af73aade	pf: Ensure that IP addresses match in ICMP error packets States in pf(4) let ICMP and ICMP6 packets pass if they have a packet in their payload that matches an exiting connection. It was not checked whether the outer ICMP packet has the same destination IP as the source IP of the inner protocol packet. Enforce that these addresses match, to prevent ICMP packets that do not make sense. Reported by: Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv Obtained from: OpenBSD Security: CVE-2019-5598	2019-03-21 08:09:52 +00:00
Andrey V. Elsukov	b8c431f9c0	Do not enter epoch section recursively. A pfil hook is already invoked in NET_EPOCH section.	2019-03-20 10:11:21 +00:00
Andrey V. Elsukov	e0b7b6d465	Use NET_EPOCH instead of allocating separate one. MFC after: 1 month	2019-03-20 10:06:44 +00:00
Andrey V. Elsukov	d18c1f26a4	Reapply r345274 with build fixes for 32-bit architectures. Update NAT64LSN implementation: o most of data structures and relations were modified to be able support large number of translation states. Now each supported protocol can use full ports range. Ports groups now are belongs to IPv4 alias addresses, not hosts. Each ports group can keep several states chunks. This is controlled with new `states_chunks` config option. States chunks allow to have several translation states for single alias address and port, but for different destination addresses. o by default all hash tables now use jenkins hash. o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path. o one NAT64LSN instance now can be used to handle several IPv6 prefixes, special prefix "::" value should be used for this purpose when instance is created. o due to modified internal data structures relations, the socket opcode that does states listing was changed. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2019-03-19 10:57:03 +00:00

1 2 3 4 5 ...

693 Commits