Commit Graph

103 Commits

Author SHA1 Message Date
Andrey V. Elsukov
90ecb41fba Add IPv6 support for O_IPLEN opcode.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-04-29 09:33:16 +00:00
Gleb Smirnoff
97245d4074 Always create ipfw(4) hooks as long as module is loaded.
Now enabling ipfw(4) with sysctls controls only linkage of hooks to default
heads. When module is loaded fetch sysctls as tunables, to make it possible
to boot with ipfw(4) in kernel, but not linked to any pfil(9) hooks.
2019-03-21 16:15:29 +00:00
Gleb Smirnoff
f355cb3e6f PFIL_MEMPTR for ipfw link level hook
With new pfil(9) KPI it is possible to pass a void pointer with length
instead of mbuf pointer to a packet filter. Until this commit no filters
supported that, so pfil run through a shim function pfil_fake_mbuf().

Now the ipfw(4) hook named "default-link", that is instantiated when
net.link.ether.ipfw sysctl is on, supports processing pointer/length
packets natively.

- ip_fw_args now has union for either mbuf or void *, and if flags have
  non-zero length, then we use the void *.
- through ipfw_chk() we handle mem/mbuf cases differently.
- ether_header goes away from args. It is ipfw_chk() responsibility
  to do parsing of Ethernet header.
- ipfw_log() now uses different bpf APIs to log packets.

Although ipfw_chk() is now capable to process pointer/length packets,
this commit adds support for the link level hook only, see
ipfw_check_frame(). Potentially the IP processing hook ipfw_check_packet()
can be improved too, but that requires more changes since the hook
supports more complex actions: NAT, divert, etc.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D19357
2019-03-14 22:52:16 +00:00
Gleb Smirnoff
dc0fa4f712 Remove 'dir' argument from dummynet_io(). This makes it possible to make
dn_dir flags private to dummynet. There is still some room for improvement.
2019-03-14 22:32:50 +00:00
Gleb Smirnoff
b7795b6746 - Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and
IPFW_ARGS_OUT are utilized. They are intented to substitute the "dir"
  parameter that is often passes together with args.
- Rename ip_fw_args.oif to ifp and now it is set to either input or
  output interface, depending on IPFW_ARGS_IN/OUT bit set.
2019-03-14 22:28:50 +00:00
Gleb Smirnoff
f712b16127 Revert r316461: Remove "IPFW static rules" rmlock, and use pfil's global lock.
The pfil(9) system is about to be converted to epoch(9) synchronization, so
we need [temporarily] go back with ipfw internal locking.

Discussed with:	ae
2019-01-31 21:04:50 +00:00
Andrey V. Elsukov
7664b71b62 Fix the bug introduced in r342908, that causes problems with dynamic
handling for protocols without ports numbers.

Since port numbers were uninitialized for protocols like ICMP/ICMPv6,
ipfw_chk() used some non-zero values to create dynamic states, and due
this it failed to match replies with created states.

Reported by:	Oliver Hartmann, Boris Lytochkin
Obtained from:	Yandex LLC
X-MFC after:	r342908
2019-01-29 11:18:41 +00:00
Andrey V. Elsukov
48266154de Relax requirement to packet size of CARP protocol and remove version check.
CARP shares protocol number 112 with VRRP (RFC 5798). And the size of
VRRP packet may be smaller than CARP. ipfw_chk() does m_pullup() to at
least sizeof(struct carp_header) and can fail when packet is VRRP. This
leads to packet drop and message about failed pullup attempt.
Also, RFC 5798 defines version 3 of VRRP protocol, this version number
also unsupported by CARP and such check leads to packet drop.

carp_input() does its own checks for protocol version and packet size,
so we can remove these checks to be able pass VRRP packets.

PR:		234207
MFC after:	1 week
2019-01-11 01:54:15 +00:00
Andrey V. Elsukov
1cdf23bc03 Reduce the size of struct ip_fw_args from 240 to 128 bytes on amd64.
And refactor the code to avoid unneeded initialization to reduce overhead
of per-packet processing.

ipfw(4) can be invoked by pfil(9) framework for each packet several times.
Each call uses on-stack variable of type struct ip_fw_args to keep the
state of ipfw(4) processing. Currently this variable has 240 bytes size
on amd64.  Each time ipfw(4) does bzero() on it, and then it initializes
some fields.

glebius@ has reported that they at Netflix discovered, that initialization
of this variable produces significant overhead on packet processing.
After patching I managed to increase performance of packet processing on
simple routing with ipfw(4) firewalling to about 11% from 9.8Mpps up to
11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card).

Introduced new field flags, it is used to keep track of what fields was
initialized. Some fields were moved into the anonymous union, to reduce
the size. They all are mutually exclusive. dummypar field was unused, and
therefore it is removed.  The hopstore6 field type was changed from
sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct
ip_fw_args is 128 bytes.

ipfw_chk() was modified to properly handle ip_fw_args.flags instead of
rely on checking for NULL pointers.

Reviewed by:	gallatin
Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D18690
2019-01-10 01:47:57 +00:00
Andrey V. Elsukov
986368d85d Add extra parentheses to fix "versrcreach" opcode, (oif != NULL) should
not be used as condition for ternary operator.

Submitted by:	Tatsuki Makino <tatsuki_makino at hotmail dot com>
Approved by:	re (kib)
MFC after:	1 week
2018-10-15 10:25:34 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Andrey V. Elsukov
f7c4fdee1a Add "record-state", "set-limit" and "defer-action" rule options to ipfw.
"record-state" is similar to "keep-state", but it doesn't produce implicit
O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the
same feature as "record-state", it is single opcode without implicit
O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic
states. When rule with this opcode is matched, the rule's action will
not be executed, instead dynamic state will be created. And when this
state will be matched by "check-state", then rule action will be executed.
This allows create a more complicated rulesets.

Submitted by:	lev
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D1776
2018-07-09 11:35:18 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Andrey V. Elsukov
12c080e613 Do not try to reassemble IPv6 fragments in "reass" rule.
ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets
that it gets. Until proper IPv6 fragments handling function will be
implemented, pass IPv6 packets to next rule.

PR:		170604
MFC after:	1 week
2018-03-12 09:40:46 +00:00
Andrey V. Elsukov
b99a682320 Rework ipfw dynamic states implementation to be lockless on fast path.
o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
  for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
  need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
  information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
  dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
  lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
  and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
  O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
  prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
  separate arrays. 65535 limit to maximum number of hash buckets now
  removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
  and for parent states.
o By default now is used Jenkinks hash function.

Obtained from:	Yandex LLC
MFC after:	42 days
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D12685
2018-02-07 18:59:54 +00:00
Andrey V. Elsukov
14a6bab1da When IPv6 packet is handled by O_REJECT opcode, convert ICMP code
specified in the arg1 into ICMPv6 destination unreachable code according
to RFC7915.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2018-01-24 12:40:28 +00:00
Andrey V. Elsukov
d38344208e Add UDPLite support to ipfw(4).
Now it is possible to use UDPLite's port numbers in rules,
create dynamic states for UDPLite packets and see "UDPLite" for matched
packets in log.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2018-01-19 12:50:03 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Michael Tuexen
665c8a2ee5 Add to ipfw support for sending an SCTP packet containing an ABORT chunk.
This is similar to the TCP case. where a TCP RST segment can be sent.

There is one limitation: When sending an ABORT in response to an incoming
packet, it should be tested if there is no ABORT chunk in the received
packet. Currently, it is only checked if the first chunk is an ABORT
chunk to avoid parsing the whole packet, which could result in a DOS attack.

Thanks to Timo Voelker for helping me to test this patch.
Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part)
Differential Revision:	https://reviews.freebsd.org/D13239
2017-11-26 18:19:01 +00:00
Andrey V. Elsukov
1719df1bb4 Modify ipfw's dynamic states KPI.
Hide the locking logic used in the dynamic states implementation from
generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule()
function to have similar names: ipfw_dyn_install_state() and
ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the
ipfw_dyn_lookup_state() function. Now this function return NULL when
there is no state and pointer to the parent rule when state is found.
Thus now there is no need to return pointer to dynamic rule, and no need
to hold bucket lock for this state. Remove ipfw_dyn_unlock() function.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D11657
2017-11-23 08:02:02 +00:00
Andrey V. Elsukov
30df59d581 Move ipfw_send_pkt() from ip_fw_dynamic.c into ip_fw2.c.
It is not specific for dynamic states function and called also from
generic code.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-11-23 06:04:57 +00:00
Andrey V. Elsukov
7143bb7626 Add ipfw_add_protected_rule() function that creates rule with 65535
number in the reserved set 31. Use this function to create default rule.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-11-22 05:49:21 +00:00
Andrey V. Elsukov
66f84fabb3 Add comment for accidentally committed unrelated change in r325960.
Do not invoke IPv4 NAT handler for non IPv4 packets. Libalias expects
a packet is IPv4. And in case when it is IPv6, it just translates them
as IPv4. This leads to corruption and in some cases to panics.
In particular a panic can happen when value of ip6_plen modified to
something that leads to IP fragmentation, but actual packet length does
not match the IP length.

Packets that are not IPv4 will be dropped by NAT rule.

Reported by:	Viktor Dukhovni <freebsd at dukhovni dot org>
MFC after:	1 week
2017-11-17 23:25:06 +00:00
Andrey V. Elsukov
e11f0a0c4c Unconditionally enable support for O_IPSEC opcode.
IPsec support can be loaded as kernel module, thus do not depend from
kernel option IPSEC and always build O_IPSEC opcode implementation as
enabled.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-11-17 22:40:02 +00:00
Andrey V. Elsukov
5c70ebfa57 Add IPv6 support for O_TCPDATALEN opcode.
PR:		222746
MFC after:	1 week
2017-10-24 08:39:05 +00:00
Andrey V. Elsukov
ff0a137952 Fix regression in handling O_FORWARD_IP opcode after r279948.
To properly handle 'fwd tablearg,port' opcode, copy sin_port value from
sockaddr_in structure stored in the opcode into corresponding hopstore
field.

PR:		222953
MFC after:	1 week
2017-10-13 11:11:53 +00:00
Michael Tuexen
945906384d Fix a bug which avoided that rules for matching port numbers for SCTP
packets where actually matched.
While there, make clean in the man-page that SCTP port numbers are
supported in rules.

MFC after:	1 month
2017-10-02 18:25:30 +00:00
Andrey V. Elsukov
5df8171da3 Use in_localip() function instead of unlocked access to addresses hash
to determine that an address is our local.

PR:		220078
MFC after:	1 week
2017-09-20 22:35:28 +00:00
Philip Paeps
b0e1660d53 Fix GRE over IPv6 tunnels with IPFW
Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless
net.inet6.ip6.fw.deny_unknown_exthdrs was unset).

PR:		220640
Submitted by:	Kun Xie <kxie@xiplink.com>
MFC after:	1 week
2017-07-13 09:01:22 +00:00
Andrey V. Elsukov
88d950a650 Remove "IPFW static rules" rmlock.
Make PFIL's lock global and use it for this purpose.
This reduces the number of locks needed to acquire for each packet.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
No objection from: #network
Differential Revision:	https://reviews.freebsd.org/D10154
2017-04-03 13:35:04 +00:00
Andrey V. Elsukov
788e62864f Reset the cached state of last lookup in the dynamic states when an
external action is completed, but the rule search is continued.

External action handler can change the content of @args argument,
that is used for dynamic state lookup. Enforce the new lookup to be able
install new state, when the search is continued.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-03-31 09:26:08 +00:00
Andrey V. Elsukov
54e5669d8c Add IPv6 support to O_IP_DST_LOOKUP opcode.
o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of
  ipfw_insn_u32;
o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and
  remove old ipfw_lookup_table();
o use args->f_id.flow_id6 that is in host byte order to get DSCP value;
o add SCTP ports support to 'lookup src/dst-port' opcode;
o add IPv6 support to 'lookup src/dst-ip' opcode.

PR:		217292
Reviewed by:	melifaro
MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9873
2017-03-05 23:48:24 +00:00
Andrey V. Elsukov
43b294a4db Fix matching table entry value. Use real table value instead of its index
in valuestate array.

When opcode has size equal to ipfw_insn_u32, this means that it should
additionally match value specified in d[0] with table entry value.
ipfw_table_lookup() returns table value index, use TARG_VAL() macro to
convert it to its value. The actual 32-bit value stored in the tag field
of table_value structure, where all unspecified u32 values are kept.

PR:		217262
Reviewed by:	melifaro
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-03-03 20:22:42 +00:00
Andrey V. Elsukov
576429f04b Fix NPTv6 rule counters when one_pass is not enabled.
Consider the rule matching when both @done and @retval values
returned from ipfw_run_eaction() are zero. And modify ipfw_nptv6()
to return IP_FW_DENY and @done=0 when addresses do not match.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2017-03-01 20:00:19 +00:00
Andrey V. Elsukov
0d9cbb874c Move opcode rewriter init and destroy handlers into non-VENT code.
PR:		212576,212649,212077
Submitted by:	John Zielinski
MFC after:	1 week
2016-09-18 17:35:17 +00:00
Andrey V. Elsukov
56132dcc0d Move logging via BPF support into separate file.
* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
  ipfw0 by DLT type used in bpfattach. This interface is intended to
  used by ipfw modules to dump packets with additional info attached.
  Currently pflog format is used. ipfw_bpf_mtap2() function uses second
  argument to determine which interface use for dumping. If dlen is equal
  to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
  PFLOG_HDRLEN - ipfwlog0 will be used.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-08-13 15:41:04 +00:00
Andrey V. Elsukov
d6eb9b0249 Restore "nat global" support.
Now zero value of arg1 used to specify "tablearg", use the old "tablearg"
value for "nat global". Introduce new macro IP_FW_NAT44_GLOBAL to replace
hardcoded magic number to specify "nat global". Also replace 65535 magic
number with corresponding macro. Fix typo in comments.

PR:		211256
Tested by:	Victor Chernov
MFC after:	3 days
2016-08-11 10:10:10 +00:00
Andrey V. Elsukov
ed22e564b8 Add named dynamic states support to ipfw(4).
The keep-state, limit and check-state now will have additional argument
flowname. This flowname will be assigned to dynamic rule by keep-state
or limit opcode. And then can be matched by check-state opcode or
O_PROBE_STATE internal opcode. To reduce possible breakage and to maximize
compatibility with old rulesets default flowname introduced.
It will be assigned to the rules when user has omitted state name in
keep-state and check-state opcodes. Also if name is ambiguous (can be
evaluated as rule opcode) it will be replaced to default.

Reviewed by:	julian
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6674
2016-07-19 04:56:59 +00:00
Bjoern A. Zeeb
31fe4e62fa Move the ipfw_log_bpf() calls from global module initialisation to
per-VNET initialisation and virtualise the interface cloning to
allow a dedicated ipfw log interface per VNET.

Approved by:		re (gjb)
MFC after:		2 weeks
Sponsored by:		The FreeBSD Foundation
2016-06-30 01:33:14 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Andrey V. Elsukov
2acdf79f53 Add External Actions KPI to ipfw(9).
It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part base system.
Sample modules will coming soon.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 22:51:23 +00:00
Andrey V. Elsukov
23a6c7330c Fix bug in filling and handling ipfw's O_DSCP opcode.
Due to integer overflow CS4 token was handled as BE.

PR:		207459
MFC after:	1 week
2016-02-24 13:16:03 +00:00
Hans Petter Selasky
c8cfbc066f Properly drain callouts in the IPFW subsystem to avoid use after free
panics when unloading the dummynet and IPFW modules:

- The callout drain function can sleep and should not be called having
a non-sleepable lock locked. Remove locks around "ipfw_dyn_uninit(0)".

- Add a new "dn_gone" variable to prevent asynchronous restart of
dummynet callouts when unloading the dummynet kernel module.

- Call "dn_reschedule()" locked so that "dn_gone" can be set and
checked atomically with regard to starting a new callout.

Reviewed by:	hiren
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3855
2015-12-15 09:02:05 +00:00
Alexander V. Chernikov
65ff3638df Merge helper fib* functions used for basic lookups.
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
  I have?". "what is the MTU?", "Give me the IPv4 source address to use",
  etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
  dealing with IPv6 mtu, finding "address" ifp (almost never done right),
  provide easy-to-use API hiding all the complexity and returning the
  needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
  rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
  locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
  and rtentry WLOCK).

Sponsored by:	Yandex LLC
2015-12-08 10:50:03 +00:00
Alexander V. Chernikov
b554a27822 Fix setfib target.
Problem was introduced in r272840 when converting tablearg value to 0.

Submitted by:	Denis Schneider <v1ne2go at gmail>
2015-11-08 12:24:19 +00:00
Andrey V. Elsukov
b13653baf9 Reduce overhead of ipfw's me6 opcode.
Skip checks for IPv6 multicast addresses.
Use in6_localip() for global unicast.
And for IPv6 link-local addresses do search in the IPv6 addresses list.
Since LLA are stored in the kernel internal form, use
IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison.
lla_mask has zero bits in the second word, where we keep sin6_scope_id.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 10:53:42 +00:00
Andrey V. Elsukov
af9aa0a837 Add helper functions for IP checksum adjusting. Use these functions in
dummynet code and for setdscp. This fixes wrong checksums in some cases.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 07:26:31 +00:00
Luigi Rizzo
8ff71b031e bugfix (only affecting the "lookup" option in the userspace version of ipfw):
the conditional block should not include the 'else' otherwise
the code does a 'break;' without completing the check
2015-05-13 11:53:25 +00:00
Alexander V. Chernikov
74b22066b0 Make rule table kernel-index rewriting support any kind of objects.
Currently we have tables identified by their names in userland
with internal kernel-assigned indices. This works the following way:

When userland wishes to communicate with kernel to add or change rule(s),
it makes indexed sorted array of table names
(internally ipfw_obj_ntlv entries), and refer to indices in that
array in rule manipulation.
Prior to committing new rule to the ruleset kernel
a) finds all referenced tables, bump their refcounts and change
 values inside the opcodes to be real kernel indices
b) auto-creates all referenced but not existing tables and then
 do a) for them.

Kernel does almost the same when exporting rules to userland:
 prepares array of used tables in all rules in range, and
 prepends it before the actual ruleset retaining actual in-kernel
 indexes for that.

There is also special translation layer for legacy clients which is
able to provide 'real' indices for table names (basically doing atoi()).

While it is arguable that every subsystem really needs names instead of
numbers, there are several things that should be noted:

1) every non-singleton subsystem needs to store its runtime state
somewhere inside ipfw chain (and be able to get it fast)
2) we can't assume object numbers provided by humans will be dense.

Existing nat implementation (O(n) access and LIST inside chain) is a
good example.

Hence the following:
* Convert table-centric rewrite code to be more generic, callback-based
* Move most of the code from ip_fw_table.c to ip_fw_sockopt.c
* Provide abstract API to permit subsystems convert their objects
  between userland string identifier and in-kernel index.
  (See struct opcode_obj_rewrite) for more details
* Create another per-chain index (in next commit) shared among all subsystems
* Convert current NAT44 implementation to use new API, O(1) lookups,
 shared index and names instead of numbers (in next commit).

Sponsored by:	Yandex LLC
2015-04-27 08:29:39 +00:00