Commit Graph

200 Commits

Author SHA1 Message Date
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Alexander V. Chernikov
da187ddb3d * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D25067
2020-06-01 20:49:42 +00:00
Alexander V. Chernikov
e7403d0230 Revert r361704, it accidentally committed merged D25067 and D25070. 2020-06-01 20:40:40 +00:00
Alexander V. Chernikov
79674562b8 * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D25067
2020-06-01 20:32:02 +00:00
Alexander V. Chernikov
3553b3007f Switch ip_output/icmp_reflect rt lookup calls with fib4_lookup.
fib4_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.

Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24976
2020-05-28 07:31:53 +00:00
Alexander V. Chernikov
9ac7c6cfed Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to
the new routing KPI.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24245
2020-04-14 23:06:25 +00:00
Randall Stewart
481be5de9d White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by:	Netflix Inc.
2020-02-12 13:31:36 +00:00
Alexander V. Chernikov
34a5582c47 Bring back redirect route expiration.
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces net.inet.icmp.redirtimeout sysctl, allowing to set
 an expiration time for redirected routes. It defaults to 10 minutes,
 analogues with net.inet6.icmp6.redirtimeout.

Implementation uses separate file, route_temporal.c, as route.c is already
 bloated with tons of different functions.
Internally, expiration is implemented as an per-rnh callout scheduled when
 route with non-zero rt_expire time is added or rt_expire is changed.
 It does not add any overhead when no temporal routes are present.

Callout traverses entire routing tree under wlock, scheduling expired routes
 for deletion and calculating the next time it needs to be run. The rationale
 for such implemention is the following: typically workloads requiring large
 amount of routes have redirects turned off already, while the systems with
 small amount of routes will not inhibit large overhead during tree traversal.

This changes also fixes netstat -rn display of route expiration time, which
 has been broken since the conversion from kread() to sysctl.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23075
2020-01-22 13:53:18 +00:00
Gleb Smirnoff
b8a6e03fac Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
Mark Johnston
7762bbc30e Add CTLFLAG_VNET to the net.inet.icmp.tstamprepl definition.
Reported by:	Hans Fiedler <hans@hfconsulting.com>
MFC after:	3 days
2019-03-26 22:14:50 +00:00
Gleb Smirnoff
a68cc38879 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
Ed Maste
2bfaf585ca Avoid buffer underwrite in icmp_error
icmp_error allocates either an mbuf (with pkthdr) or a cluster depending
on the size of data to be quoted in the ICMP reply, but the calculation
failed to account for the additional padding that m_align may apply.

Include the ip header in the size passed to m_align.  On 64-bit archs
this will have the net effect of moving everything 4 bytes later in the
mbuf or cluster.  This will result in slightly pessimal alignment for
the ICMP data copy.

Also add an assertion that we do not move m_data before the beginning of
the mbuf or cluster.

Reported by:	A reddit user
Reviewed by:	bz, jtl
MFC after:	3 days
Security:	CVE-2018-17156
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17909
2018-11-08 20:17:36 +00:00
Jonathan T. Looney
54e675342b m_pulldown() may reallocate n. Update the oip pointer after the
m_pulldown() call.

MFC after:	2 weeks
Sponsored by:	Netflix
2018-11-02 19:14:15 +00:00
Eugene Grosbein
410634efd1 New sysctl: net.inet.icmp.error_keeptags
Currently, icmp_error() function copies FIB number from original packet
into generated ICMP response but not mbuf_tags(9) chain.
This prevents us from easily matching ICMP responses corresponding
to tagged original packets by means of packet filter such as ipfw(8).
For example, ICMP "time-exceeded in-transit" packets usually generated
in response to traceroute probes lose tags attached to original packets.

This change adds new sysctl net.inet.icmp.error_keeptags
that defaults to 0 to avoid extra overhead when this feature not needed.

Set net.inet.icmp.error_keeptags=1 to make icmp_error() copy mbuf_tags
from original packet to generated ICMP response.

PR:		215874
MFC after:	1 month
2018-10-21 21:29:19 +00:00
Andrew Turner
1e0582fd55 icmp_quotelen was accidentially changes in r336676, undo this.
Sponsored by:	DARPA, AFRL
2018-07-24 16:45:01 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Randall Stewart
8de9ac5eec Bump the ICMP echo limits to match the RFC
Reviewed by:	tuexen
Sponsored by: Netflix Inc.
Differential Revision:		https://reviews.freebsd.org/D16333
2018-07-18 22:49:53 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Andrey V. Elsukov
f415d666c3 Some mbuf related fixes in icmp_error()
* check mbuf length before doing mtod() and accessing to IP header;
* update oip pointer and all depending pointers after m_pullup();
* remove extra checks and extra parentheses, wrap long lines;

PR:		222670
Reported by:	Prabhakar Lakhera
MFC after:	1 week
2017-09-29 06:24:45 +00:00
Jonathan T. Looney
382a6bbcf1 Enforce the limit on ICMP messages before doing work to formulate the
response.

Delete an unneeded rate limit for UDP under IPv6. Because ICMP6
messages have their own rate limit, it is unnecessary to apply a
second rate limit to UDP messages.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10387
2017-05-30 14:32:44 +00:00
Ed Maste
3e85b721d6 Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193
2017-05-17 00:34:34 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Eric van Gyzen
8144690af4 Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

Suggested by:	glebius, emaste
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9625
2017-02-16 20:47:41 +00:00
Gleb Smirnoff
8c70a35334 Fix build for 32-bit machines.
Submitted by:	tuexen
2016-12-09 20:50:35 +00:00
Gleb Smirnoff
3cbee8caa1 Use counter_ratecheck() in the ICMP rate limiting.
Together with:	rrs, jtl
2016-12-09 17:59:15 +00:00
Michael Tuexen
3e1465754f Make ICMPv6 hard error handling for TCP consistent with the ICMPv4
handling. Ensure that:
* Protocol unreachable errors are handled by indicating ECONNREFUSED
  to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Communication prohibited errors are handled by indicating ECONNREFUSED
  to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Hop Limited exceeded errors are handled by indicating EHOSTUNREACH
  to the TCP user for both IPv4 and IPv6.
  For IPv6 the TCP connected was dropped but errno wasn't set.

Reviewed by: gallatin, rrs
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: 7904
2016-10-21 10:32:57 +00:00
Michael Tuexen
f88d0cfe7a When sending in ICMP response to an SCTP packet,
* include the SCTP common header, if possible
* include the first 8 bytes of the INIT chunk, if possible
This provides the necesary information for the receiver of the ICMP
packet to process it.

MFC after:	1 week
2016-05-25 22:16:11 +00:00
Mark Johnston
e8800f3c2f Fix a few style issues in the ICMP sysctl descriptions.
MFC after:	1 week
2016-05-15 03:19:53 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Pedro F. Giffuni
99d628d577 netinet: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.

Reviewed by:	ae. tuexen
2016-04-15 15:46:41 +00:00
Michael Tuexen
f77b842746 When delivering an ICMP packet to the ctlinput function, ensure that
the outer IP header, the ICMP header, the inner IP header and the
first n bytes are stored in contgous memory. The ctlinput functions
currently rely on this for n = 8. This fixes a bug in case the inner IP
header had options.
While there, remove the options from the outer header and provide a
way to increase n to allow improved ICMP handling for SCTP. This will
be added in another commit.

MFC after:	1 week
2016-04-14 19:51:29 +00:00
Alexander V. Chernikov
9977be4a64 Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
  in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost diring merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
  return IPv6 destination address with embedded scope. Currently
  rw_gateway has it scope embedded, do the same for non-gatewayed
  destinations.

Sponsored by:	Yandex LLC
2015-12-09 11:14:27 +00:00
Andrey V. Elsukov
cc0a3c8ca4 Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
Warner Losh
61f26cae7d Where appropriate, use the modern terms for the one true time base
(UTC) rather than the archaic (GMT) in comments. Except where the
comments are making fun of people doing this (and pedants who insist
on the new terms).
2014-12-21 05:07:11 +00:00
Andrey V. Elsukov
2d957916ef Remove route chaching support from ipsec code. It isn't used for some time.
* remove sa_route_union declaration and route_cache member from struct secashead;
* remove key_sa_routechange() call from ICMP and ICMPv6 code;
* simplify ip_ipsec_mtu();
* remove #include <net/route.h>;

Sponsored by:	Yandex LLC
2014-12-02 04:20:50 +00:00
Alexander V. Chernikov
670e8b3b8c Kill custom in_matroute() radix mathing function removing one rte mutex lock.
Initially in_matrote() in_clsroute() in their current state was introduced by
r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them
in route table, setting RTPRF_OURS flag and some expire time. After that, either
GC came or RTPRF_OURS got removed on first-packet. It was a good solution
in that days (and probably another decade after that) to keep TCP metrics.
However, after moving metrics to TCP hostcache in r122922, most of in_rmx
functionality became unused. It might had been used for flushing icmp-originated
routes before rte mutexes/refcounting, but I'm not sure about that.

So it looks like this is nearly impossible to make GC do its work nowadays:

in_rtkill() ignores non-RTPRF_OURS routes.
route can only become RTPRF_OURS after dropping last reference via rtfree()
which calls in_clsroute(), which, it turn, ignores UP and non-RTF_DYNAMIC routes.

Dynamic routes can still be installed via received redirect, but they
have default lifetime (no specific rt_expire) and no one has another trie walker
to call RTFREE() on them.

So, the changelist:
* remove custom rnh_match / rnh_close matching function.
* remove all GC functions
* partially revert r256695 (proto3 is no more used inside kernel,
  it is not possible to use rt_expire from user point of view, proto3 support
  is not complete)
* Finish r241884 (similar to this commit) and remove remaining IPv6 parts

MFC after:	1 month
2014-11-11 02:52:40 +00:00
Alexander V. Chernikov
d1f79a3bfc Remove kernel handling of ICMP_SOURCEQUENCH.
It hasn't been used for a very long time.
Additionally, it was deprecated by RFC 6633.
2014-11-10 23:10:01 +00:00
Alexander V. Chernikov
603eaf792b Renove faith(4) and faithd(8) from base. It looks like industry
have chosen different (and more traditional) stateless/statuful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.

No objections from:	net@
2014-11-09 21:33:01 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Mark Johnston
00cb6bef99 Add a sysctl, net.inet.icmp.tstamprepl, which can be used to disable replies
to ICMP Timestamp packets.

PR:		193689
Submitted by:	Anthony Cornehl <accornehl@gmail.com>
MFC after:	3 weeks
Sponsored by:	EMC / Isilon Storage Division
2014-10-01 18:07:34 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Andrey V. Elsukov
41ea685c32 Don't copy the MF flag from original IP header to ICMP error message.
PR:		188092
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-03-31 13:00:49 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Gleb Smirnoff
9e2a372fd2 Use ip_stripoptions() instead of handrolled version. 2012-10-23 10:30:09 +00:00
Gleb Smirnoff
8ad458a471 Do not reduce ip_len by size of IP header in the ip_input()
before passing a packet to protocol input routines.
  For several protocols this mean that now protocol needs to
do subtraction itself, and for another half this means that
we do not need to add header length back to the packet.

  Make ip_stripoptions() to adjust ip_len, since now we enter
this function with a packet header whose ip_len does represent
length of entire packet, not payload only.
2012-10-23 08:33:13 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00