Commit Graph

699 Commits

Author SHA1 Message Date
thj
c2faa57f9e Don't print VNET pointer when initializing dummynet
When dummynet initializes it prints a debug message with the current VNET
pointer unnecessarily revealing kernel memory layout. This appears to be left
over from when the first pieces of vimage support were added.

PR:		238658
Submitted by:	huangfq.daxian@gmail.com
Reviewed by:	markj, bz, gnn, kp, melifaro
Approved by:	jtl (co-mentor), bz (co-mentor)
Event:		July 2020 Bugathon
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D25619
2020-07-13 13:35:36 +00:00
melifaro
bb8dbff4ac Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup().
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
 over pointer validness and requiring on-stack data copying.

With no callers remaining, remove fib[46]_lookup_nh_ functions.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25445
2020-07-02 21:04:08 +00:00
markj
a43e851194 ipfw(4): make O_IPVER/ipversion match IPv4 or 6, not just IPv4.
Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Reviewed by:	Lutz Donnerhacke
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25227
2020-06-24 15:46:33 +00:00
markj
864f5d9ff1 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
eugen
ad5077cb33 ipfw: unbreak matching with big table type flow.
Test case:

# n=32769
# ipfw -q table 1 create type flow:proto,dst-ip,dst-port
# jot -w 'table 1 add tcp,127.0.0.1,' $n 1 | ipfw -q /dev/stdin
# ipfw -q add 5 unreach filter-prohib flow 'table(1)'

The rule 5 matches nothing without the fix if n>=32769.

With the fix, it works:
# telnet localhost 10001
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Permission denied
telnet: Unable to connect to remote host

MFC after:	2 weeks
Discussed with: ae, melifaro
2020-06-04 14:15:39 +00:00
ae
d7e9bff26b Fix O_IP_FLOW_LOOKUP opcode handling.
Do not check table value matching when table lookup has failed.

Reported by:	Sergey Lobanov
MFC after:	1 week
2020-05-29 10:37:42 +00:00
markj
68806c4044 pf: Add a new zone for per-table entry counters.
Right now we optionally allocate 8 counters per table entry, so in
addition to memory consumed by counters, we require 8 pointers worth of
space in each entry even when counters are not allocated (the default).

Instead, define a UMA zone that returns contiguous per-CPU counter
arrays for use in table entries.  On amd64 this reduces sizeof(struct
pfr_kentry) from 216 to 160.  The smaller size also results in better
slab efficiency, so memory usage for large tables is reduced by about
28%.

Reviewed by:	kp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24843
2020-05-16 00:28:12 +00:00
markj
5ffbc764fb pf: Don't allocate per-table entry counters unless required.
pf by default does not do per-table address accounting unless the
"counters" keyword is specified in the corresponding pf.conf table
definition.  Yet, we always allocate 12 per-CPU counters per table.  For
large tables this carries a lot of overhead, so only allocate counters
when they will actually be used.

A further enhancement might be to use a dedicated UMA zone to allocate
counter arrays for table entries, since close to half of the structure
size comes from counter pointers.  A related issue is the cost of
zeroing counters, since counter_u64_zero() calls smp_rendezvous() on
some architectures.

Reported by:	loos, Jim Pingle <jimp@netgate.com>
Reviewed by:	kp
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D24803
2020-05-11 18:47:38 +00:00
kp
0300ecad73 pf: Improve DIOCADDRULE validation
We expect the addrwrap.p.dyn value to be set to NULL (and assert such),
but do not verify it on input.

Reported-by:	syzbot+936a89182e7d8f927de1@syzkaller.appspotmail.com
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24538
2020-05-03 16:09:35 +00:00
emaste
8aa4dbecf5 ipfw: whitespace fix in SCTP_ABORT_ASSOCIATION case statement comment
Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Reviewed by:	rgrimes, tuexen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24602
2020-05-03 03:44:16 +00:00
melifaro
7c1be85b66 Move route_temporal.c and route_var.h to net/route.
Nexthop objects implementation, defined in r359823,
 introduced sys/net/route directory intended to hold all
 routing-related code. Move recently-introduced route_temporal.c and
 private route_var.h header there.

Differential Revision:	https://reviews.freebsd.org/D24597
2020-04-28 19:14:09 +00:00
kp
827ff56c93 pf: Virtualise pf_frag_mtx
The pf_frag_mtx mutex protects the fragments queue. The fragments queue
is virtualised already (i.e. per-vnet) so it makes no sense to block
jail A from accessing its fragments queue while jail B is accessing its
own fragments queue.

Virtualise the lock for improved concurrency.

Differential Revision:	https://reviews.freebsd.org/D24504
2020-04-26 16:30:00 +00:00
kp
4cc4938336 pf: Improve input validation
If we pass an anchor name which doesn't exist pfr_table_count() returns
-1, which leads to an overflow in mallocarray() and thus a panic.

Explicitly check that pfr_table_count() does not return an error.

Reported-by:	syzbot+bd09d55d897d63d5f4f4@syzkaller.appspotmail.com
Reviewed by:	melifaro
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24539
2020-04-26 16:16:39 +00:00
kp
96abf553f8 pf: Improve ioctl() input validation
Both DIOCCHANGEADDR and DIOCADDADDR take a struct pf_pooladdr from
userspace. They failed to validate the dyn pointer contained in its
struct pf_addr_wrap member structure.

This triggered assertion failures under fuzz testing in
pfi_dynaddr_setup(). Happily the dyn variable was overruled there, but
we should verify that it's set to NULL anyway.

Reported-by:	syzbot+93e93150bc29f9b4b85f@syzkaller.appspotmail.com
Reviewed by:	emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24431
2020-04-19 16:10:20 +00:00
kp
9319f3ce02 pf: Do not allow negative ps_len in DIOCGETSTATES
Userspace may pass a negative ps_len value to us, which causes an
assertion failure in malloc().
Treat negative values as zero, i.e. return the required size.

Reported-by:	syzbot+53370d9d0358ee2a059a@syzkaller.appspotmail.com
Reviewed by:	lutz at donnerhacke.de
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24447
2020-04-17 14:35:11 +00:00
melifaro
7b9732d3ab Convert pf rtable checks to the new routing KPI.
Switch uRPF to use specific fib(9)-provided uRPF.
Switch MSS calculation to the latest fib(9) kpi.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D24386
2020-04-15 13:00:48 +00:00
kaktus
ad355b0a9d Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
kaktus
5c176e02f4 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

Mark all nodes in pf, pfsync and carp as MPSAFE.

Reviewed by:	kp
Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23634
2020-02-21 16:23:00 +00:00
hselasky
e0a1cd8bc2 Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

This patch extends r357772.

Differential Revision:	https://reviews.freebsd.org/D23742
Reviewed by:	glebius@
Sponsored by:	Mellanox Technologies
2020-02-18 19:53:36 +00:00
hselasky
845016f4a2 Add missing EPOCH(9) wrapper in ipfw(8).
Backtrace:
panic()
ip_output()
dyn_tick()
softclock_call_cc()
softclock()
ithread_loop()

Differential Revision:	https://reviews.freebsd.org/D23599
Reviewed by:	glebius@ and ae@
Found by:	mmacy@
Reported by:	jmd@
Sponsored by:	Mellanox Technologies
2020-02-11 18:16:29 +00:00
kp
95e3476bfd pf: Apply kif flags to new group members
If we have a 'set skip on <ifgroup>' rule this flag it set on the group
kif, but must also be set on all members. pfctl does this when the rules
are set, but if groups are added afterwards we must also apply the flags
to the new member. If not, new group members will not be skipped until
the rules are reloaded.

Reported by:	dvl@
Reviewed by:	glebius@
Differential Revision:	https://reviews.freebsd.org/D23254
2020-01-23 22:13:41 +00:00
kp
02995f6a7e pfsync: Ensure we enter network epoch before calling ip_output
As of r356974 calls to ip_output() require us to be in the network epoch.
That wasn't the case for the calls done from pfsyncintr() and
pfsync_defer_tmo().
2020-01-22 21:01:19 +00:00
glebius
5057bc60d0 Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch.   The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
2020-01-15 06:05:20 +00:00
melifaro
7cd596605e ipfw: Don't rollback state in alloc_table_vidx() if atomicity is not required.
Submitted by:	Neel Chauhan <neel AT neelc DOT org>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22662
2019-12-19 10:22:16 +00:00
melifaro
4344de7fc6 Revert r355908 to commit it with a proper message. 2019-12-19 10:20:38 +00:00
melifaro
52b84efbe2 svn-commit.tmp 2019-12-19 09:19:27 +00:00
kp
f063394b44 pf: Make request_maxcount runtime adjustable
There's no reason for this to be a tunable. It's perfectly safe to
change this at runtime.

Reviewed by:	Lutz Donnerhacke
Differential Revision:	https://reviews.freebsd.org/D22737
2019-12-14 02:06:07 +00:00
ae
5d1408ea62 Make TCP options parsing stricter.
Rework tcpopts_parse() to be more strict. Use const pointer. Add length
checks for specific TCP options. The main purpose of the change is
avoiding of possible out of mbuf's data access.

Reported by:	Maxime Villard
Reviewed by:	melifaro, emaste
MFC after:	1 week
2019-12-13 11:47:58 +00:00
ae
fb953e763f Follow RFC 4443 p2.2 and always use own addresses for reflected ICMPv6
datagrams.

Previously destination address from original datagram was used. That
looked confusing, especially in the traceroute6 output.
Also honor IPSTEALTH kernel option and do TTL/HLIM decrementing only
when stealth mode is disabled.

Reported by:	Marco van Tol <marco at tols org>
Reviewed by:	melifaro
MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D22631
2019-12-12 13:28:46 +00:00
ae
6dfdda7982 Avoid access to stale ip pointer and call UPDATE_POINTERS() after
PULLUP_LEN_LOCKED().

PULLUP_LEN_LOCKED() could update mbuf and thus we need to update related
pointers that can be used in next opcodes.

Reported by:	Maxime Villard <max at m00nbsd net>
MFC after:	1 week
2019-12-10 10:35:32 +00:00
kp
48915ceec6 pf: Add endline to all DPFPRINTF()
DPFPRINTF() doesn't automatically add an endline, so be consistent and
always add it.
2019-11-24 13:53:36 +00:00
kp
9cc4228e50 pf: Must be in NET_EPOCH to call icmp_error
icmp_reflect(), called through icmp_error() requires us to be in NET_EPOCH.
Failure to hold it leads to the following panic (with INVARIANTS):

  panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_icmp.c:742
  cpuid = 2
  time = 1571233273
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e0977920
  vpanic() at vpanic+0x17e/frame 0xfffffe00e0977980
  panic() at panic+0x43/frame 0xfffffe00e09779e0
  icmp_reflect() at icmp_reflect+0x625/frame 0xfffffe00e0977aa0
  icmp_error() at icmp_error+0x720/frame 0xfffffe00e0977b10
  pf_intr() at pf_intr+0xd5/frame 0xfffffe00e0977b50
  ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe00e0977bb0
  fork_exit() at fork_exit+0x80/frame 0xfffffe00e0977bf0
  fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e0977bf0

Note that we now enter NET_EPOCH twice if we enter ip_output() from pf_intr(),
but ip_output() will soon be converted to a function that requires epoch, so
entering NET_EPOCH directly from pf_intr() makes more sense.

Discussed with:	glebius@
2019-10-18 03:36:26 +00:00
glebius
5d9a41a2e5 Use epoch(9) directly instead of obsoleted KPI. 2019-10-14 16:37:41 +00:00
glebius
962bb78b05 ipfw(4) rule matching always happens in network epoch. 2019-10-14 16:37:00 +00:00
markj
fdce34ac5e Fix the build after r353458.
MFC with:	r353458
Sponsored by:	The FreeBSD Foundation
2019-10-13 00:08:17 +00:00
markj
0ca8957b6d Add a missing include of opt_sctp.h.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-12 23:01:16 +00:00
glebius
337378e04f Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
glebius
3d08ed450e Drivers may pass runt packets to filter. This is okay.
Reviewed by:	gallatin
2019-09-13 22:36:04 +00:00
ae
8915b51753 Fix rule truncation on external action module unloading.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-08-15 13:44:33 +00:00
emaste
2232ec44d9 pf: zero (another) output buffer in pfioctl
Avoid potential structure padding leak.  r350294 identified a leak via
static analysis; although there's no report of a leak with the
DIOCGETSRCNODES ioctl it's a good practice to zero the memory.

Suggested by:	kp
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-31 16:58:09 +00:00
ae
fe1518789b dd ipfw_get_action() function to get the pointer to action opcode.
ACTION_PTR() returns pointer to the start of rule action section,
but rule can keep several rule modifiers like O_LOG, O_TAG and O_ALTQ,
and only then real action opcode is stored.

ipfw_get_action() function inspects the rule action section, skips
all modifiers and returns action opcode.

Use this function in ipfw_reset_eaction() and flush_nat_ptrs().

MFC after:	1 week
Sponsored by:	Yandex LLC
2019-07-29 15:09:12 +00:00
kp
65ea01a683 pf: Remove partial RFC2675 support
Remove our (very partial) support for RFC2675 Jumbograms. They're not
used, not actually supported and not a good idea.

Reviewed by:	thj@
Differential Revision:	https://reviews.freebsd.org/D21086
2019-07-29 13:21:31 +00:00
ae
aaf6f1c4ac Avoid possible lock leaking.
After r343619 ipfw uses own locking for packets flow. PULLUP_LEN() macro
is used in ipfw_chk() to make m_pullup(). When m_pullup() fails, it just
returns via `goto pullup_failed`. There are two places where PULLUP_LEN()
is called with IPFW_PF_RLOCK() held.

Add PULLUP_LEN_LOCKED() macro to use in these places to be able release
the lock, when m_pullup() fails.

Sponsored by:	Yandex LLC
2019-07-29 12:55:48 +00:00
emaste
e6cea1ca6d pf: zero output buffer in pfioctl
Avoid potential structure padding leak.

Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	kp
MFC after:	3 days
Security:	Potential kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-07-24 16:51:14 +00:00
ae
5163ded6d4 Eliminate rmlock from ipfw's BPF code.
After r343631 pfil hooks are invoked in net_epoch_preempt section,
this allows to avoid extra locking. Add NET_EPOCH_ASSER() assertion
to each ipfw_bpf_*tap*() call to require to be called from inside
epoch section.

Use NET_EPOCH_WAIT() in ipfw_clone_destroy() to wait until it becomes
safe to free() ifnet. And use on-stack ifnet pointer in each
ipfw_bpf_*tap*() call to avoid NULL pointer dereference in case when
V_*log_if global variable will become NULL during ipfw_bpf_*tap*() call.

Sponsored by:	Yandex LLC
2019-07-23 12:52:36 +00:00
ae
571b48c044 Do not modify cmd pointer if it is already last opcode in the rule.
MFC after:	1 week
2019-07-12 09:59:21 +00:00
ae
15268fa047 Correctly truncate the rule in case when it has several action opcodes.
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-07-12 09:48:42 +00:00
hselasky
1a5fd513af Convert all IPv4 and IPv6 multicast memberships into using a STAILQ
instead of a linear array.

The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.

To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi

All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.

This patch has been tested by pho@ .

Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by:	markj @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:54:41 +00:00
ae
e59b219e5f Follow the RFC 3128 and drop short TCP fragments with offset = 1.
Reported by:	emaste
MFC after:	1 week
2019-06-25 11:40:37 +00:00
ae
a864a749ac Mark default rule with IPFW_RULE_NOOPT flag, so it can be showed in
compact form.

MFC after:	1 week
2019-06-25 09:11:22 +00:00