Commit Graph

664 Commits

Author SHA1 Message Date
Andrey V. Elsukov
2dab0de635 Do not modify cmd pointer if it is already last opcode in the rule.
MFC after:	1 week
2019-07-12 09:59:21 +00:00
Andrey V. Elsukov
4ee2f4c180 Correctly truncate the rule in case when it has several action opcodes.
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-07-12 09:48:42 +00:00
Hans Petter Selasky
59854ecf55 Convert all IPv4 and IPv6 multicast memberships into using a STAILQ
instead of a linear array.

The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.

To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi

All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.

This patch has been tested by pho@ .

Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by:	markj @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:54:41 +00:00
Andrey V. Elsukov
019c8c9330 Follow the RFC 3128 and drop short TCP fragments with offset = 1.
Reported by:	emaste
MFC after:	1 week
2019-06-25 11:40:37 +00:00
Andrey V. Elsukov
7d4b2d5244 Mark default rule with IPFW_RULE_NOOPT flag, so it can be showed in
compact form.

MFC after:	1 week
2019-06-25 09:11:22 +00:00
Andrey V. Elsukov
978f2d1728 Add "tcpmss" opcode to match the TCP MSS value.
With this opcode it is possible to match TCP packets with specified
MSS option, whose value corresponds to configured in opcode value.
It is allowed to specify single value, range of values, or array of
specific values or ranges. E.g.

 # ipfw add deny log tcp from any to any tcpmss 0-500

Reviewed by:	melifaro,bcr
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-06-21 10:54:51 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Andrey V. Elsukov
efdadaa2d8 Initialize V_nat64out methods explicitly.
It looks like initialization of static variable doesn't work for
VIMAGE and this leads to panic.

Reported by:	olivier
MFC after:	1 week
2019-06-05 09:25:40 +00:00
Li-Wen Hsu
d086d41363 Remove an uneeded indentation introduced in r223637 to silence gcc warnging
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-05-25 23:58:09 +00:00
John Baldwin
fb3bc59600 Restructure mbuf send tags to provide stronger guarantees.
- Perform ifp mismatch checks (to determine if a send tag is allocated
  for a different ifp than the one the packet is being output on), in
  ip_output() and ip6_output().  This avoids sending packets with send
  tags to ifnet drivers that don't support send tags.

  Since we are now checking for ifp mismatches before invoking
  if_output, we can now try to allocate a new tag before invoking
  if_output sending the original packet on the new tag if allocation
  succeeds.

  To avoid code duplication for the fragment and unfragmented cases,
  add ip_output_send() and ip6_output_send() as wrappers around
  if_output and nd6_output_ifp, respectively.  All of the logic for
  setting send tags and dealing with send tag-related errors is done
  in these wrapper functions.

  For pseudo interfaces that wrap other network interfaces (vlan and
  lagg), wrapper send tags are now allocated so that ip*_output see
  the wrapper ifp as the ifp in the send tag.  The if_transmit
  routines rewrite the send tags after performing an ifp mismatch
  check.  If an ifp mismatch is detected, the transmit routines fail
  with EAGAIN.

- To provide clearer life cycle management of send tags, especially
  in the presence of vlan and lagg wrapper tags, add a reference count
  to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
  Provide a helper function (m_snd_tag_init()) for use by drivers
  supporting send tags.  m_snd_tag_init() takes care of the if_ref
  on the ifp meaning that code alloating send tags via if_snd_tag_alloc
  no longer has to manage that manually.  Similarly, m_snd_tag_rele
  drops the refcount on the ifp after invoking if_snd_tag_free when
  the last reference to a send tag is dropped.

  This also closes use after free races if there are pending packets in
  driver tx rings after the socket is closed (e.g. from tcpdrop).

  In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
  csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
  Drivers now also check this flag instead of checking snd_tag against
  NULL.  This avoids false positive matches when a forwarded packet
  has a non-NULL rcvif that was treated as a send tag.

- cxgbe was relying on snd_tag_free being called when the inp was
  detached so that it could kick the firmware to flush any pending
  work on the flow.  This is because the driver doesn't require ACK
  messages from the firmware for every request, but instead does a
  kind of manual interrupt coalescing by only setting a flag to
  request a completion on a subset of requests.  If all of the
  in-flight requests don't have the flag when the tag is detached from
  the inp, the flow might never return the credits.  The current
  snd_tag_free command issues a flush command to force the credits to
  return.  However, the credit return is what also frees the mbufs,
  and since those mbufs now hold references on the tag, this meant
  that snd_tag_free would never be called.

  To fix, explicitly drop the mbuf's reference on the snd tag when the
  mbuf is queued in the firmware work queue.  This means that once the
  inp's reference on the tag goes away and all in-flight mbufs have
  been queued to the firmware, tag's refcount will drop to zero and
  snd_tag_free will kick in and send the flush request.  Note that we
  need to avoid doing this in the middle of ethofld_tx(), so the
  driver grabs a temporary reference on the tag around that loop to
  defer the free to the end of the function in case it sends the last
  mbuf to the queue after the inp has dropped its reference on the
  tag.

- mlx5 preallocates send tags and was using the ifp pointer even when
  the send tag wasn't in use.  Explicitly use the ifp from other data
  structures instead.

- Sprinkle some assertions in various places to assert that received
  packets don't have a send tag, and that other places that overwrite
  rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.

Reviewed by:	gallatin, hselasky, rgrimes, ae
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20117
2019-05-24 22:30:40 +00:00
Andrey V. Elsukov
90ecb41fba Add IPv6 support for O_IPLEN opcode.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-04-29 09:33:16 +00:00
Kristof Provost
1c75b9d2cd pf: No need to M_NOWAIT in DIOCRSETTFLAGS
Now that we don't hold a lock during DIOCRSETTFLAGS memory allocation we can
use M_WAITOK.

MFC after:	1 week
Event:		Aberdeen hackathon 2019
Pointed out by:	glebius@
2019-04-18 11:37:44 +00:00
Kristof Provost
f5e0d9fcb4 pf: Fix panic on invalid DIOCRSETTFLAGS
If during DIOCRSETTFLAGS pfrio_buffer is NULL copyin() will fault, which we're
not allowed to do with a lock held.
We must count the number of entries in the table and release the lock during
copyin(). Only then can we re-acquire the lock. Note that this is safe, because
pfr_set_tflags() will check if the table and entries exist.

This was discovered by a local syzcaller instance.

MFC after:	1 week
Event:		Aberdeen hackathon 2019
2019-04-17 16:42:54 +00:00
Rodney W. Grimes
6c1c6ae537 Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code
There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros.  Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by:		karels, kristof
Approved by:		bde (mentor)
MFC after:		3 weeks
Sponsored by:		John Gilmore
Differential Revision:	https://reviews.freebsd.org/D19317
2019-04-04 19:01:13 +00:00
Conrad Meyer
a8a16c7128 Replace read_random(9) with more appropriate arc4rand(9) KPIs
Reviewed by:	ae, delphij
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19760
2019-04-04 01:02:50 +00:00
Ed Maste
a342f5772f pf: use UID_ROOT and GID_WHEEL named constants in make_dev
No functional change but improves consistency and greppability of
make_dev calls.

Discussed with: kp
2019-03-26 21:20:42 +00:00
Gleb Smirnoff
97245d4074 Always create ipfw(4) hooks as long as module is loaded.
Now enabling ipfw(4) with sysctls controls only linkage of hooks to default
heads. When module is loaded fetch sysctls as tunables, to make it possible
to boot with ipfw(4) in kernel, but not linked to any pfil(9) hooks.
2019-03-21 16:15:29 +00:00
Kristof Provost
64af73aade pf: Ensure that IP addresses match in ICMP error packets
States in pf(4) let ICMP and ICMP6 packets pass if they have a
packet in their payload that matches an exiting connection.  It was
not checked whether the outer ICMP packet has the same destination
IP as the source IP of the inner protocol packet.  Enforce that
these addresses match, to prevent ICMP packets that do not make
sense.

Reported by:	Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv
Obtained from:	OpenBSD
Security:	CVE-2019-5598
2019-03-21 08:09:52 +00:00
Andrey V. Elsukov
b8c431f9c0 Do not enter epoch section recursively.
A pfil hook is already invoked in NET_EPOCH section.
2019-03-20 10:11:21 +00:00
Andrey V. Elsukov
e0b7b6d465 Use NET_EPOCH instead of allocating separate one.
MFC after:	1 month
2019-03-20 10:06:44 +00:00
Andrey V. Elsukov
d18c1f26a4 Reapply r345274 with build fixes for 32-bit architectures.
Update NAT64LSN implementation:

  o most of data structures and relations were modified to be able support
    large number of translation states. Now each supported protocol can
    use full ports range. Ports groups now are belongs to IPv4 alias
    addresses, not hosts. Each ports group can keep several states chunks.
    This is controlled with new `states_chunks` config option. States
    chunks allow to have several translation states for single alias address
    and port, but for different destination addresses.
  o by default all hash tables now use jenkins hash.
  o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path.
  o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
    special prefix "::" value should be used for this purpose when instance
    is created.
  o due to modified internal data structures relations, the socket opcode
    that does states listing was changed.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-19 10:57:03 +00:00
Andrey V. Elsukov
d6369c2d18 Revert r345274. It appears that not all 32-bit architectures have
necessary CK primitives.
2019-03-18 14:00:19 +00:00
Andrey V. Elsukov
d7a1cf06f3 Update NAT64LSN implementation:
o most of data structures and relations were modified to be able support
  large number of translation states. Now each supported protocol can
  use full ports range. Ports groups now are belongs to IPv4 alias
  addresses, not hosts. Each ports group can keep several states chunks.
  This is controlled with new `states_chunks` config option. States
  chunks allow to have several translation states for single alias address
  and port, but for different destination addresses.
o by default all hash tables now use jenkins hash.
o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path.
o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
  special prefix "::" value should be used for this purpose when instance
  is created.
o due to modified internal data structures relations, the socket opcode
  that does states listing was changed.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-18 12:59:08 +00:00
Andrey V. Elsukov
5c04f73e07 Add NAT64 CLAT implementation as defined in RFC6877.
CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.

  # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
  # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
  # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in

Obtained from:	Yandex LLC
Submitted by:	Boris N. Lytochkin
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
2019-03-18 11:44:53 +00:00
Andrey V. Elsukov
002cae78da Add SPDX-License-Identifier and update year in copyright.
MFC after:	1 month
2019-03-18 10:50:32 +00:00
Andrey V. Elsukov
b11efc1eb6 Modify struct nat64_config.
Add second IPv6 prefix to generic config structure and rename another
fields to conform to RFC6877. Now it contains two prefixes and length:
PLAT is provider-side translator that translates N:1 global IPv6 addresses
to global IPv4 addresses. CLAT is customer-side translator (XLAT) that
algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses.
Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn)
translators.

Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept
prefix length and use plat_plen to specify prefix length.

Retire net.inet.ip.fw.nat64_allow_private sysctl variable.
Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to
configure this ability separately for each NAT64 instance.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-18 10:39:14 +00:00
Kristof Provost
812483c46e pf: Rename pfsync bucket lock
Previously the main pfsync lock and the bucket locks shared the same name.
This lead to spurious warnings from WITNESS like this:

    acquiring duplicate lock of same type: "pfsync"
     1st pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1402
     2nd pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1429

It's perfectly okay to grab both the main pfsync lock and a bucket lock at the
same time.

We don't need different names for each bucket lock, because we should always
only acquire a single one of those at a time.

MFC after:	1 week
2019-03-16 10:14:03 +00:00
Kristof Provost
5904868691 pf :Use counter(9) in pf tables.
The counters of pf tables are updated outside the rule lock. That means state
updates might overwrite each other. Furthermore allocation and
freeing of counters happens outside the lock as well.

Use counter(9) for the counters, and always allocate the counter table
element, so that the race condition cannot happen any more.

PR:		230619
Submitted by:	Kajetan Staszkiewicz <vegeta@tuxpowered.net>
Reviewed by:	glebius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19558
2019-03-15 11:08:44 +00:00
Gleb Smirnoff
f355cb3e6f PFIL_MEMPTR for ipfw link level hook
With new pfil(9) KPI it is possible to pass a void pointer with length
instead of mbuf pointer to a packet filter. Until this commit no filters
supported that, so pfil run through a shim function pfil_fake_mbuf().

Now the ipfw(4) hook named "default-link", that is instantiated when
net.link.ether.ipfw sysctl is on, supports processing pointer/length
packets natively.

- ip_fw_args now has union for either mbuf or void *, and if flags have
  non-zero length, then we use the void *.
- through ipfw_chk() we handle mem/mbuf cases differently.
- ether_header goes away from args. It is ipfw_chk() responsibility
  to do parsing of Ethernet header.
- ipfw_log() now uses different bpf APIs to log packets.

Although ipfw_chk() is now capable to process pointer/length packets,
this commit adds support for the link level hook only, see
ipfw_check_frame(). Potentially the IP processing hook ipfw_check_packet()
can be improved too, but that requires more changes since the hook
supports more complex actions: NAT, divert, etc.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D19357
2019-03-14 22:52:16 +00:00
Gleb Smirnoff
dc0fa4f712 Remove 'dir' argument from dummynet_io(). This makes it possible to make
dn_dir flags private to dummynet. There is still some room for improvement.
2019-03-14 22:32:50 +00:00
Gleb Smirnoff
b00b7e03fd Reduce argument list to ipfw_divert(), as args holds the rule ref and
the direction. While here make 'tee' a bool.
2019-03-14 22:31:12 +00:00
Gleb Smirnoff
cef9f220cd Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info.
While here make 'tee' boolean.
2019-03-14 22:30:05 +00:00
Gleb Smirnoff
b7795b6746 - Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and
IPFW_ARGS_OUT are utilized. They are intented to substitute the "dir"
  parameter that is often passes together with args.
- Rename ip_fw_args.oif to ifp and now it is set to either input or
  output interface, depending on IPFW_ARGS_IN/OUT bit set.
2019-03-14 22:28:50 +00:00
Gleb Smirnoff
1830dae3d3 Make second argument of ip_divert(), that specifies packet direction a bool.
This allows pf(4) to avoid including ipfw(4) private files.
2019-03-14 22:23:09 +00:00
Gleb Smirnoff
2d0232783c Simplify ipfw_bpf_mtap2(). No functional change. 2019-03-14 22:20:48 +00:00
Andrey V. Elsukov
ca0f03e808 Add IP_FW_NAT64 to codes that ipfw_chk() can return.
It will be used by upcoming NAT64 changes. We use separate code
to avoid propogating EACCES error code to user level applications
when NAT64 consumes a packet.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-03-11 10:42:09 +00:00
Andrey V. Elsukov
d76227959a Add NULL pointer check to nat64_output().
It is possible, that a processed packet was originated by local host,
in this case m->m_pkthdr.rcvif is NULL. Check and set it to V_loif to
avoid NULL pointer dereference in IP input code, since it is expected
that packet has valid receiving interface when netisr processes it.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-03-11 10:33:32 +00:00
Kristof Provost
f8e7fe32a4 pf: Fix DIOCGETSRCNODES
r343295 broke DIOCGETSRCNODES by failing to reset 'nr' after counting the
number of source tracking nodes.
This meant that we never copied the information to userspace, leading to '? ->
?' output from pfctl.

PR:		236368
MFC after:	1 week
2019-03-08 09:33:16 +00:00
Andrey V. Elsukov
83354acf5a Fix the problem with O_LIMIT states introduced in r344018.
dyn_install_state() uses `rule` pointer when it creates state.
For O_LIMIT states this pointer actually is not struct ip_fw,
it is pointer to O_LIMIT_PARENT state, that keeps actual pointer
to ip_fw parent rule. Thus we need to cache rule id and number
before calling dyn_get_parent_state(), so we can use them later
when the `rule` pointer is overrided.

PR:		236292
MFC after:	3 days
2019-03-07 04:40:44 +00:00
Kristof Provost
6f4909de5f pf: IPv6 fragments with malformed extension headers could be erroneously passed by pf or cause a panic
We mistakenly used the extoff value from the last packet to patch the
next_header field. If a malicious host sends a chain of fragmented packets
where the first packet and the final packet have different lengths or number of
extension headers we'd patch the next_header at the wrong offset.
This can potentially lead to panics or rule bypasses.

Security:       CVE-2019-5597
Obtained from:  OpenBSD
Reported by:    Corentin Bayet, Nicolas Collignon, Luca Moro at Synacktiv
2019-03-01 07:37:45 +00:00
Kristof Provost
22c58991e3 pf: Small performance tweak
Because fetching a counter is a rather expansive function we should use
counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
pass" rule should not cause more effort than separate "rdr" and "pass"
rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

From the man page:
    The adaptive timeout values can be defined both globally and for
    each rule.  When used on a per-rule basis, the values relate to the
    number of states created by the rule, otherwise to the total number
    of states.

This handling of adaptive timeouts is done in pf_state_expires().  The
calculation needs three values: start, end and states.

1. Normal rules "pass .." without adaptive setting meaning "start = 0"
   runs in the else-section and therefore takes "start" and "end" from
   the global default settings and sets "states" to pf_status.states
   (= total number of states).

2. Special rules like
   "pass .. keep state (adaptive.start 500 adaptive.end 1000)"
   have start != 0, run in the if-section and take "start" and "end"
   from the rule and set "states" to the number of states created by
   their rule using counter_u64_fetch().

Thats all ok, but there is a third case without special handling in the
above code snippet:

3. All "rdr/nat pass .." statements use together the pf_default_rule.
   Therefore we have "start != 0" in this case and we run the
   if-section but we better should run the else-section in this case and
   do not fetch the counter of the pf_default_rule but take the total
   number of states.

Submitted by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	2 weeks
2019-02-24 17:23:55 +00:00
Andrey V. Elsukov
804a6541db Remove `set' field from state structure and use set from parent rule.
Initially it was introduced because parent rule pointer could be freed,
and rule's information could become inaccessible. In r341471 this was
changed. And now we don't need this information, and also it can become
stale. E.g. rule can be moved from one set to another. This can lead
to parent's set and state's set will not match. In this case it is
possible that static rule will be freed, but dynamic state will not.
This can happen when `ipfw delete set N` command is used to delete
rules, that were moved to another set.
To fix the problem we will use the set number from parent rule.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-02-11 18:10:55 +00:00
Patrick Kelsey
d178fee632 Place pf_altq_get_nth_active() under the ALTQ ifdef
MFC after:	1 week
2019-02-11 05:39:38 +00:00
Patrick Kelsey
8f2ac65690 Reduce the time it takes the kernel to install a new PF config containing a large number of queues
In general, the time savings come from separating the active and
inactive queues lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-bashed.

In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.

There are now two new tunables to control the hash size used for each
tag set (default for each is 128):

net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D19131
2019-02-11 05:17:31 +00:00
Gleb Smirnoff
d38ca3297c Return PFIL_CONSUMED if packet was consumed. While here gather all
the identical endings of pf_check_*() into single function.

PR:		235411
2019-02-02 05:49:05 +00:00
Gleb Smirnoff
2790ca97d9 Fix build without INET6. 2019-02-01 00:33:17 +00:00
Gleb Smirnoff
b252313f0b New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented.  The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision:	https://reviews.freebsd.org/D18951
2019-01-31 23:01:03 +00:00
Gleb Smirnoff
f712b16127 Revert r316461: Remove "IPFW static rules" rmlock, and use pfil's global lock.
The pfil(9) system is about to be converted to epoch(9) synchronization, so
we need [temporarily] go back with ipfw internal locking.

Discussed with:	ae
2019-01-31 21:04:50 +00:00
Andrey V. Elsukov
7664b71b62 Fix the bug introduced in r342908, that causes problems with dynamic
handling for protocols without ports numbers.

Since port numbers were uninitialized for protocols like ICMP/ICMPv6,
ipfw_chk() used some non-zero values to create dynamic states, and due
this it failed to match replies with created states.

Reported by:	Oliver Hartmann, Boris Lytochkin
Obtained from:	Yandex LLC
X-MFC after:	r342908
2019-01-29 11:18:41 +00:00
Patrick Kelsey
59099cd385 Don't re-evaluate ALTQ kernel configuration due to events on non-ALTQ interfaces
Re-evaluating the ALTQ kernel configuration can be expensive,
particularly when there are a large number (hundreds or thousands) of
queues, and is wholly unnecessary in response to events on interfaces
that do not support ALTQ as such interfaces cannot be part of an ALTQ
configuration.

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D18918
2019-01-28 20:26:09 +00:00