freebsd-skq

Author	SHA1	Message	Date
Gleb Smirnoff	825398f946	ipfw: make the "frag" keyword accept additional options "mf", "df", "rf" and "offset". This allows to match on specific bits of ip_off field. For compatibility reasons lack of keyword means "offset". Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D26021	2020-08-11 15:46:22 +00:00
Andrey V. Elsukov	aaef76e1fd	Handle delayed checksums if needed in NAT64. Upper level protocols defer checksums calculation in hope we have checksums offloading in a network card. CSUM_DELAY_DATA flag is used to determine that checksum calculation was deferred. And IP output routine checks for this flag before pass mbuf to lower layer. Forwarded packets have not this flag. NAT64 uses checksums adjustment when it translates IP headers. In most cases NAT64 is used for forwarded packets, but in case when it handles locally originated packets we need to finish checksum calculation that was deferred to correctly adjust it. Add check for presence of CSUM_DELAY_DATA flag and finish checksum calculation before adjustment. Reported and tested by: Evgeniy Khramtsov <evgeniy at khramtsov org> MFC after: 1 week	2020-08-05 09:16:35 +00:00
Tom Jones	b2776a1809	Don't print VNET pointer when initializing dummynet When dummynet initializes it prints a debug message with the current VNET pointer unnecessarily revealing kernel memory layout. This appears to be left over from when the first pieces of vimage support were added. PR: 238658 Submitted by: huangfq.daxian@gmail.com Reviewed by: markj, bz, gnn, kp, melifaro Approved by: jtl (co-mentor), bz (co-mentor) Event: July 2020 Bugathon MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D25619	2020-07-13 13:35:36 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Mark Johnston	1388cfe1b5	ipfw(4): make O_IPVER/ipversion match IPv4 or 6, not just IPv4. Submitted by: Neel Chauhan <neel AT neelc DOT org> Reviewed by: Lutz Donnerhacke MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25227	2020-06-24 15:46:33 +00:00
Eugene Grosbein	47cb0632e8	ipfw: unbreak matching with big table type flow. Test case: # n=32769 # ipfw -q table 1 create type flow:proto,dst-ip,dst-port # jot -w 'table 1 add tcp,127.0.0.1,' $n 1 | ipfw -q /dev/stdin # ipfw -q add 5 unreach filter-prohib flow 'table(1)' The rule 5 matches nothing without the fix if n>=32769. With the fix, it works: # telnet localhost 10001 Trying 127.0.0.1... telnet: connect to address 127.0.0.1: Permission denied telnet: Unable to connect to remote host MFC after: 2 weeks Discussed with: ae, melifaro	2020-06-04 14:15:39 +00:00
Andrey V. Elsukov	e43ae8dcb5	Fix O_IP_FLOW_LOOKUP opcode handling. Do not check table value matching when table lookup has failed. Reported by: Sergey Lobanov MFC after: 1 week	2020-05-29 10:37:42 +00:00
Ed Maste	db462d948f	ipfw: whitespace fix in SCTP_ABORT_ASSOCIATION case statement comment Submitted by: Neel Chauhan <neel AT neelc DOT org> Reviewed by: rgrimes, tuexen MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24602	2020-05-03 03:44:16 +00:00
Alexander V. Chernikov	e7d8af4f65	Move route_temporal.c and route_var.h to net/route. Nexthop objects implementation, defined in r359823, introduced sys/net/route directory intended to hold all routing-related code. Move recently-introduced route_temporal.c and private route_var.h header there. Differential Revision: https://reviews.freebsd.org/D24597	2020-04-28 19:14:09 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Hans Petter Selasky	fbb890056e	Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process incoming packets in taskqueue context. This patch extends r357772. Differential Revision: https://reviews.freebsd.org/D23742 Reviewed by: glebius@ Sponsored by: Mellanox Technologies	2020-02-18 19:53:36 +00:00
Hans Petter Selasky	b4426a7175	Add missing EPOCH(9) wrapper in ipfw(8). Backtrace: panic() ip_output() dyn_tick() softclock_call_cc() softclock() ithread_loop() Differential Revision: https://reviews.freebsd.org/D23599 Reviewed by: glebius@ and ae@ Found by: mmacy@ Reported by: jmd@ Sponsored by: Mellanox Technologies	2020-02-11 18:16:29 +00:00
Gleb Smirnoff	2a4bd982d0	Introduce NET_EPOCH_CALL() macro and use it everywhere where we free data based on the network epoch. The macro reverses the argument order of epoch_call(9) - first function, then its argument. NFC	2020-01-15 06:05:20 +00:00
Alexander V. Chernikov	57ddf39678	ipfw: Don't rollback state in alloc_table_vidx() if atomicity is not required. Submitted by: Neel Chauhan <neel AT neelc DOT org> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22662	2019-12-19 10:22:16 +00:00
Alexander V. Chernikov	00b45f58e8	Revert r355908 to commit it with a proper message.	2019-12-19 10:20:38 +00:00
Alexander V. Chernikov	880266635d	svn-commit.tmp	2019-12-19 09:19:27 +00:00
Andrey V. Elsukov	1ae74c1359	Make TCP options parsing stricter. Rework tcpopts_parse() to be more strict. Use const pointer. Add length checks for specific TCP options. The main purpose of the change is avoiding of possible out of mbuf's data access. Reported by: Maxime Villard Reviewed by: melifaro, emaste MFC after: 1 week	2019-12-13 11:47:58 +00:00
Andrey V. Elsukov	2873980947	Follow RFC 4443 p2.2 and always use own addresses for reflected ICMPv6 datagrams. Previously destination address from original datagram was used. That looked confusing, especially in the traceroute6 output. Also honor IPSTEALTH kernel option and do TTL/HLIM decrementing only when stealth mode is disabled. Reported by: Marco van Tol <marco at tols org> Reviewed by: melifaro MFC after: 2 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D22631	2019-12-12 13:28:46 +00:00
Andrey V. Elsukov	ca0ac0a6c1	Avoid access to stale ip pointer and call UPDATE_POINTERS() after PULLUP_LEN_LOCKED(). PULLUP_LEN_LOCKED() could update mbuf and thus we need to update related pointers that can be used in next opcodes. Reported by: Maxime Villard <max at m00nbsd net> MFC after: 1 week	2019-12-10 10:35:32 +00:00
Gleb Smirnoff	16a72f53e2	Use epoch(9) directly instead of obsoleted KPI.	2019-10-14 16:37:41 +00:00
Gleb Smirnoff	961c033ef1	ipfw(4) rule matching always happens in network epoch.	2019-10-14 16:37:00 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Gleb Smirnoff	f8b45306c6	Drivers may pass runt packets to filter. This is okay. Reviewed by: gallatin	2019-09-13 22:36:04 +00:00
Andrey V. Elsukov	773a7e2224	Fix rule truncation on external action module unloading. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-08-15 13:44:33 +00:00
Andrey V. Elsukov	e758846c09	Add ipfw_get_action() function to get the pointer to action opcode. ACTION_PTR() returns pointer to the start of rule action section, but rule can keep several rule modifiers like O_LOG, O_TAG and O_ALTQ, and only then real action opcode is stored. ipfw_get_action() function inspects the rule action section, skips all modifiers and returns action opcode. Use this function in ipfw_reset_eaction() and flush_nat_ptrs(). MFC after: 1 week Sponsored by: Yandex LLC	2019-07-29 15:09:12 +00:00
Andrey V. Elsukov	d4e6a52959	Avoid possible lock leaking. After r343619 ipfw uses own locking for packets flow. PULLUP_LEN() macro is used in ipfw_chk() to make m_pullup(). When m_pullup() fails, it just returns via `goto pullup_failed`. There are two places where PULLUP_LEN() is called with IPFW_PF_RLOCK() held. Add PULLUP_LEN_LOCKED() macro to use in these places to be able release the lock, when m_pullup() fails. Sponsored by: Yandex LLC	2019-07-29 12:55:48 +00:00
Andrey V. Elsukov	455d2ecb71	Eliminate rmlock from ipfw's BPF code. After r343631 pfil hooks are invoked in net_epoch_preempt section, this allows to avoid extra locking. Add NET_EPOCH_ASSERT() assertion to each ipfw_bpf_tap() call to require to be called from inside epoch section. Use NET_EPOCH_WAIT() in ipfw_clone_destroy() to wait until it becomes safe to free() ifnet. And use on-stack ifnet pointer in each ipfw_bpf_tap() call to avoid NULL pointer dereference in case when V_log_if global variable will become NULL during ipfw_bpf_tap*() call. Sponsored by: Yandex LLC	2019-07-23 12:52:36 +00:00
Andrey V. Elsukov	2dab0de635	Do not modify cmd pointer if it is already last opcode in the rule. MFC after: 1 week	2019-07-12 09:59:21 +00:00
Andrey V. Elsukov	4ee2f4c180	Correctly truncate the rule in case when it has several action opcodes. It is possible, that opcode at the ACTION_PTR() location is not real action, but action modificator like "log", "tag" etc. In this case we need to check for each opcode in the loop to find O_EXTERNAL_ACTION. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-07-12 09:48:42 +00:00
Andrey V. Elsukov	019c8c9330	Follow the RFC 3128 and drop short TCP fragments with offset = 1. Reported by: emaste MFC after: 1 week	2019-06-25 11:40:37 +00:00
Andrey V. Elsukov	7d4b2d5244	Mark default rule with IPFW_RULE_NOOPT flag, so it can be showed in compact form. MFC after: 1 week	2019-06-25 09:11:22 +00:00
Andrey V. Elsukov	978f2d1728	Add "tcpmss" opcode to match the TCP MSS value. With this opcode it is possible to match TCP packets with specified MSS option, whose value corresponds to configured in opcode value. It is allowed to specify single value, range of values, or array of specific values or ranges. E.g. # ipfw add deny log tcp from any to any tcpmss 0-500 Reviewed by: melifaro,bcr Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-06-21 10:54:51 +00:00
Andrey V. Elsukov	efdadaa2d8	Initialize V_nat64out methods explicitly. It looks like initialization of static variable doesn't work for VIMAGE and this leads to panic. Reported by: olivier MFC after: 1 week	2019-06-05 09:25:40 +00:00
John Baldwin	fb3bc59600	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code allocating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117	2019-05-24 22:30:40 +00:00
Andrey V. Elsukov	90ecb41fba	Add IPv6 support for O_IPLEN opcode. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-04-29 09:33:16 +00:00
Rodney W. Grimes	6c1c6ae537	Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage. Reviewed by: karels, kristof Approved by: bde (mentor) MFC after: 3 weeks Sponsored by: John Gilmore Differential Revision: https://reviews.freebsd.org/D19317	2019-04-04 19:01:13 +00:00
Gleb Smirnoff	97245d4074	Always create ipfw(4) hooks as long as module is loaded. Now enabling ipfw(4) with sysctls controls only linkage of hooks to default heads. When module is loaded fetch sysctls as tunables, to make it possible to boot with ipfw(4) in kernel, but not linked to any pfil(9) hooks.	2019-03-21 16:15:29 +00:00
Andrey V. Elsukov	b8c431f9c0	Do not enter epoch section recursively. A pfil hook is already invoked in NET_EPOCH section.	2019-03-20 10:11:21 +00:00
Andrey V. Elsukov	e0b7b6d465	Use NET_EPOCH instead of allocating separate one. MFC after: 1 month	2019-03-20 10:06:44 +00:00
Andrey V. Elsukov	d18c1f26a4	Reapply r345274 with build fixes for 32-bit architectures. Update NAT64LSN implementation: o most of data structures and relations were modified to be able support large number of translation states. Now each supported protocol can use full ports range. Ports groups now are belongs to IPv4 alias addresses, not hosts. Each ports group can keep several states chunks. This is controlled with new `states_chunks` config option. States chunks allow to have several translation states for single alias address and port, but for different destination addresses. o by default all hash tables now use jenkins hash. o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path. o one NAT64LSN instance now can be used to handle several IPv6 prefixes, special prefix "::" value should be used for this purpose when instance is created. o due to modified internal data structures relations, the socket opcode that does states listing was changed. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2019-03-19 10:57:03 +00:00
Andrey V. Elsukov	d6369c2d18	Revert r345274. It appears that not all 32-bit architectures have necessary CK primitives.	2019-03-18 14:00:19 +00:00
Andrey V. Elsukov	d7a1cf06f3	Update NAT64LSN implementation: o most of data structures and relations were modified to be able support large number of translation states. Now each supported protocol can use full ports range. Ports groups now are belongs to IPv4 alias addresses, not hosts. Each ports group can keep several states chunks. This is controlled with new `states_chunks` config option. States chunks allow to have several translation states for single alias address and port, but for different destination addresses. o by default all hash tables now use jenkins hash. o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path. o one NAT64LSN instance now can be used to handle several IPv6 prefixes, special prefix "::" value should be used for this purpose when instance is created. o due to modified internal data structures relations, the socket opcode that does states listing was changed. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2019-03-18 12:59:08 +00:00
Andrey V. Elsukov	5c04f73e07	Add NAT64 CLAT implementation as defined in RFC6877. CLAT is customer-side translator that algorithmically translates 1:1 private IPv4 addresses to global IPv6 addresses, and vice versa. It is implemented as part of ipfw_nat64 kernel module. When module is loaded or compiled into the kernel, it registers "nat64clat" external action. External action named instance can be created using `create` command and then used in ipfw rules. The create command accepts two IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is omitted, IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used. # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in Obtained from: Yandex LLC Submitted by: Boris N. Lytochkin MFC after: 1 month Relnotes: yes Sponsored by: Yandex LLC	2019-03-18 11:44:53 +00:00
Andrey V. Elsukov	002cae78da	Add SPDX-License-Identifier and update year in copyright. MFC after: 1 month	2019-03-18 10:50:32 +00:00
Andrey V. Elsukov	b11efc1eb6	Modify struct nat64_config. Add second IPv6 prefix to generic config structure and rename another fields to conform to RFC6877. Now it contains two prefixes and length: PLAT is provider-side translator that translates N:1 global IPv6 addresses to global IPv4 addresses. CLAT is customer-side translator (XLAT) that algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses. Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn) translators. Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept prefix length and use plat_plen to specify prefix length. Retire net.inet.ip.fw.nat64_allow_private sysctl variable. Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to configure this ability separately for each NAT64 instance. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2019-03-18 10:39:14 +00:00
Gleb Smirnoff	f355cb3e6f	PFIL_MEMPTR for ipfw link level hook With new pfil(9) KPI it is possible to pass a void pointer with length instead of mbuf pointer to a packet filter. Until this commit no filters supported that, so pfil run through a shim function pfil_fake_mbuf(). Now the ipfw(4) hook named "default-link", that is instantiated when net.link.ether.ipfw sysctl is on, supports processing pointer/length packets natively. - ip_fw_args now has union for either mbuf or void *, and if flags have non-zero length, then we use the void *. - through ipfw_chk() we handle mem/mbuf cases differently. - ether_header goes away from args. It is ipfw_chk() responsibility to do parsing of Ethernet header. - ipfw_log() now uses different bpf APIs to log packets. Although ipfw_chk() is now capable to process pointer/length packets, this commit adds support for the link level hook only, see ipfw_check_frame(). Potentially the IP processing hook ipfw_check_packet() can be improved too, but that requires more changes since the hook supports more complex actions: NAT, divert, etc. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D19357	2019-03-14 22:52:16 +00:00
Gleb Smirnoff	dc0fa4f712	Remove 'dir' argument from dummynet_io(). This makes it possible to make dn_dir flags private to dummynet. There is still some room for improvement.	2019-03-14 22:32:50 +00:00
Gleb Smirnoff	b00b7e03fd	Reduce argument list to ipfw_divert(), as args holds the rule ref and the direction. While here make 'tee' a bool.	2019-03-14 22:31:12 +00:00
Gleb Smirnoff	cef9f220cd	Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info. While here make 'tee' boolean.	2019-03-14 22:30:05 +00:00
Gleb Smirnoff	b7795b6746	- Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and IPFW_ARGS_OUT are utilized. They are intended to substitute the "dir" parameter that is often passed together with args. - Rename ip_fw_args.oif to ifp and now it is set to either input or output interface, depending on IPFW_ARGS_IN/OUT bit set.	2019-03-14 22:28:50 +00:00
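
Commit 825398f946 above lets the "frag" keyword match individual bits of the ip_off field. The sketch below is not the ipfw implementation; it only illustrates how such a match could be expressed. The mask values follow the standard IPv4 header layout (the same values as IP_RF, IP_DF, IP_MF and IP_OFFMASK in netinet/ip.h). The names frag_opt and frag_match(), and the rule that every selected option must be present, are assumptions made for illustration. The fallback to "offset" when no option is given follows the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the IPv4 ip_off field, in host byte order. */
#define FRAG_RF      0x8000u /* reserved fragment flag */
#define FRAG_DF      0x4000u /* don't fragment flag */
#define FRAG_MF      0x2000u /* more fragments flag */
#define FRAG_OFFMASK 0x1fffu /* fragment offset, in 8-byte units */

/* Hypothetical selector bits, one per "frag" option named in the log. */
enum frag_opt {
	FRAG_OPT_MF     = 1 << 0,
	FRAG_OPT_DF     = 1 << 1,
	FRAG_OPT_RF     = 1 << 2,
	FRAG_OPT_OFFSET = 1 << 3
};

/*
 * Return true when every selected option is present in ip_off.
 * ip_off must already be converted to host byte order (ntohs()).
 * With no option selected, fall back to "offset" for compatibility.
 */
bool
frag_match(uint16_t ip_off, unsigned int opts)
{
	if (opts == 0)
		opts = FRAG_OPT_OFFSET;
	if ((opts & FRAG_OPT_MF) && (ip_off & FRAG_MF) == 0)
		return (false);
	if ((opts & FRAG_OPT_DF) && (ip_off & FRAG_DF) == 0)
		return (false);
	if ((opts & FRAG_OPT_RF) && (ip_off & FRAG_RF) == 0)
		return (false);
	if ((opts & FRAG_OPT_OFFSET) && (ip_off & FRAG_OFFMASK) == 0)
		return (false);
	return (true);
}
```

Under these assumptions, the first fragment of a datagram (MF set, offset zero) matches FRAG_OPT_MF but not the default "offset" selection, while any later fragment has a non-zero offset and matches the default.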

425 Commits