4697 Commits

Author SHA1 Message Date
yongari
9c37e88860 MFC r265942:
Fix checksum computation.  Previously it didn't include carry.
2014-05-16 05:05:53 +00:00
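
The log entry above does not say which checksum routine r265942 touched, so the following is only a generic sketch of what "including the carry" means for a 16-bit one's-complement (RFC 1071 style) checksum. The function name `cksum16` is hypothetical and is not taken from the FreeBSD tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: 16-bit one's-complement checksum over a byte buffer.
 * The 32-bit accumulator is adequate for packet-sized buffers.
 */
uint16_t
cksum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;   /* pad the odd trailing byte */

    /* End-around carry: fold anything above bit 15 back into the sum. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Dropping the fold loop is the kind of bug the message describes: for the bytes ff ff 00 01 the sum overflows 16 bits, and ignoring the carry yields a different checksum.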
kevlo
69da76e9e5 MFC r264212,r264213,r264248,r265776,r265811,r265909:
- Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks.
  Tested with vlc and a test suite [1].
  [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

  Reviewed by:	jhb, glebius, adrian

- Fix a logic bug which prevented the sending of UDP packet with 0 checksum.

- Disable TX checksum offload for UDP-Lite completely. It wasn't used for
  partial checksum coverage, but even for full checksum coverage it doesn't
  work.
2014-05-13 06:05:53 +00:00
melifaro
aaa6b80bb3 Merge r260488, r260508.
r260488:
  Split rt_newaddrmsg_fib() into two different functions.
  Adding/deleting interface addresses involves access to 3 different subsystems,
  in different parts of code. Each call can fail, so reporting successful
  operation by rtsock in the middle of the process is error-prone.

  Further split routing notification API and actual rtsock calls via creating
  public-available rt_addrmsg() / rt_routemsg() functions with "private"
  rtsock_* backend.

r260508:
  Simplify inet alias handling code: if we're adding/removing alias which
  has the same prefix as some other alias on the same interface, use
  newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

  This eliminates the following rtsock messages:

  Pinned RTM_ADD for prefix (for alias addition).
  Pinned RTM_DELETE for prefix (for alias withdrawal).

  Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

  before commit, addition:

    got message of size 116 on Fri Jan 10 14:13:15 2014
    RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

    got message of size 192 on Fri Jan 10 14:13:15 2014
    RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
    locks:  inits:
    sockaddrs: <DST,GATEWAY,NETMASK>
     10.0.0.0 10.0.0.2 (255) ffff ffff ff

  after commit, addition:

    got message of size 116 on Fri Jan 10 13:56:26 2014
    RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

  before commit, withdrawal:

    got message of size 192 on Fri Jan 10 13:58:59 2014
    RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
    locks:  inits:
    sockaddrs: <DST,GATEWAY,NETMASK>
     10.0.0.0 10.0.0.2 (255) ffff ffff ff

    got message of size 116 on Fri Jan 10 13:58:59 2014
    RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  after commit, withdrawal:

    got message of size 116 on Fri Jan 10 14:14:11 2014
    RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
  (and requires some hacks to keep prefix in route table on RTM_DELETE).

  I've tested this change with quagga (no change) and bird (*).

  bird alias handling is already broken in *BSD sysdep code, so nothing
  changes here, too.

  I'm going to MFC this change if there are no complaints about the behavior
  change.

  While here, fix some style(9) bugs introduced by r260488
  (pointed out by glebius and bde).
2014-05-08 21:03:31 +00:00
rmacklem
d706690fda MFC: r264739
Add {} braces so that the code conforms to the indentation.
Fortunately, I don't think doing the assignment of cap->tsomax
unconditionally causes any problem.
2014-05-06 22:04:50 +00:00
delphij
acd7398463 Fix devfs rules not applied by default for jails.
Fix OpenSSL use-after-free vulnerability.

Fix TCP reassembly vulnerability.

Security:	FreeBSD-SA-14:07.devfs
Security:	CVE-2014-3001
Security:	FreeBSD-SA-14:08.tcp
Security:	CVE-2014-3000
Security:	FreeBSD-SA-14:09.openssl
Security:	CVE-2010-5298
2014-04-30 04:03:05 +00:00
ae
92efc4b623 MFC r263966:
Don't copy the MF flag from original IP header to ICMP error message.

PR:		188092
Sponsored by:	Yandex LLC
2014-04-07 12:50:08 +00:00
glebius
be0e4274f5 Merge r262341:
- Improve logging of send errors, reporting error code and interface.
  - Reduce code duplication between INET and INET6.
2014-04-04 10:33:59 +00:00
glebius
03fdc2934e Merge r262763, r262767, r262771, r262806 from head:
- Remove rt_metrics_lite and simply put its members into rtentry.
  - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
    removes another cache-thrashing ++ from the packet forwarding path.
  - Create zinit/fini methods for the rtentry UMA zone, and initialize
    the mutex and counter in them.
  - Fix reporting of rmx_pksent to routing socket.
  - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
2014-03-21 15:15:30 +00:00
glebius
73a339242d Merge r262747: remove extraneous ifa_ref()/ifa_free(). 2014-03-19 09:23:58 +00:00
glebius
ad70c4103e Merge r263091: fix mbuf flags clash that led to failure of operation
of IPSEC and packet filters.

PR:		kern/185876
PR:		kern/186755
2014-03-18 16:56:05 +00:00
glebius
ed41469327 Merge r261582, r261601, r261610, r261613, r261627, r261640, r261641, r261823,
r261825, r261859, r261875, r261883, r261911, r262027, r262028, r262029,
      r262030, r262162 from head.

  Large flowtable revamp. See commit messages for merged revisions for
  details.

Sponsored by:	Netflix
2014-03-04 15:14:47 +00:00
glebius
352d508b16 Merge r261590: Fixup for r261590 (vnet sysctl handlers cleanup) 2014-03-04 14:05:37 +00:00
glebius
4b9e17c3ef Merge r261590, r261592 from head:
Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET
  in the sysctl_root().

  Note: SYSCTL_VNET_* macros can be removed as well. All that is
    needed to virtualize a sysctl oid is to set CTLFLAG_VNET on it.
    But for now keep the macros in place to avoid large code churn.
2014-03-04 14:01:12 +00:00
adrian
3ee8d78c97 MFC r260871:
If the flowid is available for the mbuf that finalised the creation
  of a syncache connection, copy it into the inp_flowid field.

  Without this, an incoming TCP connection won't have an inp_flowid marked
  until some data comes in, and this means that things like the per-CPU
  TCP timer option will choose a different CPU for the timer work.
  (It also means that if one grabbed the flowid via an ioctl from userland,
  it won't be available until some data has been received.)

Sponsored by:	Netflix, Inc.
2014-02-10 06:29:05 +00:00
ae
88938f44a0 MFC r260702 (by melifaro):
Fix ipfw fwd for IPv4 traffic broken by r249894.

  Problem case:
  Original lookup returns route with GW set, so gw points to
  rte->rt_gateway.
  After that we're changing dst and performing lookup another time.
  Since fwd host is most probably directly reachable, resulting
  rte does not contain rt_gateway, so gw is not set. Finally, we
  end with packet transmitted to proper interface but wrong
  link-layer address.
2014-02-06 10:48:55 +00:00
gnn
dbf0fc2c51 MFC 260796
Fix various places where we don't properly release a lock

PR:		185043
Submitted by:	Michael Bentkofsky
2014-02-03 03:31:35 +00:00
glebius
6e1079d8b4 Merge 261024: fix PIM input regression. 2014-01-27 09:33:30 +00:00
glebius
34e36d1706 Merge r257846:
Make TCP_KEEP* socket options readable. At least PostgreSQL wants
  to read the values.
2014-01-22 10:08:33 +00:00
avg
c1dbdbde60 MFC r258622: dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE 2014-01-17 10:58:59 +00:00
avg
c2040a08a9 MFC r258605: Convert over the TCP probes to use mtod()
MFC slacker:	adrian
2014-01-17 10:48:44 +00:00
ae
65169ca8a0 MFC r260151 (by adrian):
Use an RLOCK here instead of an RWLOCK - matching all the other calls
  to lla_lookup().

  This drastically reduces the very high lock contention when doing parallel
  TCP throughput tests (> 1024 sockets) with IPv6.

MFC r260187:
  lla_lookup() does modification only when LLE_CREATE is specified.
  Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
  lla_lookup() without LLE_CREATE flag.

MFC r260217:
  Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
  LLE_CREATE flag.
2014-01-10 09:45:28 +00:00
peter
aa3916c7c0 Revert MFC of r258821 - it was already handled by MFC of r239672.
Pointy hat to: peter
2014-01-08 03:16:21 +00:00
tuexen
624cda0839 MFC r259943:
Address some warnings which showed up on the userland version.
2014-01-07 23:50:02 +00:00
peter
16f467cacd MFC r258821 - fix tcp simultaneous close
PR:		kern/99188
2014-01-07 23:00:58 +00:00
glebius
e3ce8ac51c Merge r260188 from head:
Fix regression from r249894. Now we pass "gw" as argument to if_output
  method, thus for multicast case we need it to point at "dst".

PR:		185395
2014-01-05 13:55:33 +00:00
pluknet
ac5822de6e MFC r259906: Draft-ietf-tcpm-initcwnd-05 became RFC6928. 2014-01-02 16:48:08 +00:00
dim
58bc1e001e MFC r259839:
In sys/netinet/in_mcast.c, inm_is_ifp_detached() is only used whenever
KTR is defined, so put it between #ifdef KTR guards.  This avoids a
warning about an unused function if KTR is not enabled.
2013-12-28 01:15:34 +00:00
tuexen
cfa2934865 MFC r258574:
Only initialize some mutexes for the default VNET.

In r208160, sctp_it_ctl was made a global variable, across all VNETs.
However, sctp_init() is called for every VNET that is created.  This results
in the same global mutexes, which are part of sctp_it_ctl, being initialized
more than once.  This can result in crashes if many jails are created.

To reproduce the problem:
  (1)  Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
       INVARIANTS.
  (2)  Run this command in a loop:
       jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

       (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )

Witness will warn about the same mutex being initialized.

Fix the problem by only initializing these mutexes in the default VNET.

MFC r258765:

In
http://svnweb.freebsd.org/changeset/base/258221
I introduced a bug which initialized global locks
whenever the SCTP stack initialized. This was fixed in
http://svnweb.freebsd.org/changeset/base/258574
by rodrigc@. He just initialized the locks for
the default vnet. This fix reverts to the old
behaviour before r258221, which explicitly makes
sure it is only called once, because this works also on
other platforms.

Approved by: re@ (gjb)
2013-12-03 20:55:37 +00:00
tuexen
877516e51a MFC r256556:
Remove a buggy comparison when manually setting the path MTU.
After fixing, the comparison would have become redundant.
Thanks to Andrew Galante for reporting the issue.

MFC r257272:
Fix compilation if SCTP_DONT_DO_PRIVADDR_SCOPE is defined.
The issue was reported by Andrew Galante.

MFC r257274:
Fix the value of *optlen when calling getsockopt() for
SCTP_REMOTE_UDP_ENCAPS_PORT.
This issue was reported by Andrew Galante.

MFC r257359:
Terminate a debug output with a \n.

MFC r257555:
Changes from upstream to improve compilation when INET or INET6
or none of them is defined.

MFC r257574:
Unlock the lock before destroying it.
This issue was reported by Andrew Galante.

MFC r257800:
Use htons()/ntohs() appropriately.
These issues were reported by Andrew Galante.

MFC r257803:
Make sure that we don't try to build an ASCONF-ACK chunk
larger than what fits in the mbuf cluster.
This issue was reported by Andrew Galante.

MFC r257804:
Get rid of the artificial limitation enforced by
SCTP_AUTH_RANDOM_SIZE_MAX.
This was suggested by Andrew Galante.

MFC r258221:
Cleanups which result in fixes which have been made upstream
and were partially suggested by Andrew Galante.
There is no functional change in FreeBSD.

MFC r258224:
When determining if an address belongs to an stcb, take the address family
into account for wildcard bound endpoints.

MFC r258228:
Remove a stray write operation.

MFC r258235:
Use SCTP_PR_SCTP_TTL when the user provides a positive
timetolive in sctp_sendmsg().

Approved by: re@
2013-11-21 23:00:09 +00:00
andre
82fad41b08 MFC r256920:
The TCP delayed ACK logic isn't aware of LRO passing up large aggregated
  segments thinking it received only one segment. This causes it to
  delay the ACK for 100ms to wait for another segment which may never
  come because all the data was received already.

  Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes
  us further away from acking every other packet; b) it introduces additional
  delay in responding to the sender.  The latter is especially bad because it
  is in the nature of LRO to aggregate all segments of a burst with no more
  coming until an ACK is sent back.

  Change the delayed ACK logic to detect LRO segments by being larger than
  the MSS for this connection and issuing an immediate ACK for them to keep
  the ACK clock ticking without interruption.

  Reported by:  julian, cperciva
  Tested by:    cperciva
  Reviewed by:  lstewart

Approved by:    re (glebius)
2013-10-29 21:00:54 +00:00
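
The detection rule described in r256920 can be sketched as a small decision function. This is a simplified model, not the tcp_input.c change itself: the name `ack_immediately` and the `delack_pending` parameter are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether to ACK now or arm the delayed-ACK
 * timer. A segment longer than the connection's MSS can only be the
 * result of LRO aggregation, so it is ACKed at once.
 */
bool
ack_immediately(uint32_t seg_len, uint32_t mss, bool delack_pending)
{
    assert(mss > 0);
    if (seg_len > mss)
        return true;        /* LRO segment: keep the ACK clock ticking */
    /* Plain segment: ACK only if one delayed ACK is already outstanding. */
    return delack_pending;
}
```

With an MSS of 1448, a 4344-byte aggregated segment returns true regardless of timer state, while a single 1448-byte segment with no ACK pending returns false and would wait on the timer.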
glebius
2f1b9cddbb When processing ACK in tcp_do_segment, use sbcut_locked() instead of
sbdrop_locked() to cut acked mbufs from the socket buffer. Free this
chain in a batch manner after the socket buffer lock is dropped.

This measurably reduces contention on socket buffer.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Approved by:	re (marius)
2013-10-09 12:00:38 +00:00
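
The pattern in this commit (detach under the lock, free after unlocking) can be sketched in userland as follows. All types and names here (`struct buf`, `struct sockbuf_sketch`, `cut_locked`, `ack_and_free`) are hypothetical stand-ins, a pthread mutex replaces the kernel socket buffer lock, and only whole buffers are cut, whereas the real code also handles a partially acked buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct buf {                    /* stand-in for an mbuf */
    struct buf *next;
    size_t len;
};

struct sockbuf_sketch {         /* stand-in for a socket buffer */
    pthread_mutex_t lock;
    struct buf *head;
};

/* Detach the leading buffers fully covered by `acked`; lock must be held. */
static struct buf *
cut_locked(struct sockbuf_sketch *sb, size_t acked)
{
    struct buf *chain = sb->head, *last = NULL;

    while (sb->head != NULL && sb->head->len <= acked) {
        acked -= sb->head->len;
        last = sb->head;
        sb->head = sb->head->next;
    }
    if (last == NULL)
        return NULL;
    last->next = NULL;          /* terminate the detached chain */
    return chain;
}

/* Cut under the lock, then free the chain with the lock dropped. */
size_t
ack_and_free(struct sockbuf_sketch *sb, size_t acked)
{
    struct buf *chain, *next;
    size_t freed = 0;

    assert(sb != NULL);
    pthread_mutex_lock(&sb->lock);
    chain = cut_locked(sb, acked);
    pthread_mutex_unlock(&sb->lock);

    for (; chain != NULL; chain = next, freed++) {
        next = chain->next;
        free(chain);
    }
    return freed;
}
```

The lock hold time now covers only pointer manipulation; the per-buffer free calls, which are the expensive part, no longer block other users of the buffer.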
markj
8ff2d52009 Add a separate translator for headers passed to the TCP probes in the
input path. These probes get some of the fields in host order, whereas the
output probes get them in network order, so a single translator isn't
enough. This workaround ensures that the problem is essentially invisible
to users: none of the probe arguments or their fields have changed.

Approved by:	re (hrs)
2013-10-02 17:14:12 +00:00
bz
b67689ea7e Introduce spares in the TCP syncache and timewait structures
so that fixed TCP_SIGNATURE handling can later be merged.

This is derived from follow-up work to SVN r183001 posted to
net@ on Sep 13 2008.

Approved by:	re (gjb)
2013-09-21 10:01:51 +00:00
trociny
8439b55778 Unregister inet/inet6 pfil hooks on vnet destroy.
Discussed with:	andre
Approved by:	re (rodrigc)
2013-09-13 18:45:10 +00:00
tuexen
545d815a3b Fix the aborting of association with the iterator using an empty
user initiated error cause (using SCTP_ABORT|SCTP_SENDALL).

Approved by: re (delphij)
MFC after: 1 week
2013-09-09 21:40:07 +00:00
trociny
4233132eb4 Release the interface last.
Reviewed by:	glebius
Approved by:	re (kib)
2013-09-08 18:19:40 +00:00
tuexen
aa05f03aae When computing the partial delivery point, take the
receiver socket buffer size correctly into account.

MFC after: 1 week
2013-09-07 00:45:24 +00:00
jhb
42eb0e69b4 Use LIST_FOREACH_SAFE() instead of doing it by hand. 2013-09-05 14:26:37 +00:00
jhb
057d03f1de Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This
matches the types used when computing hash indices and the type of the
maximum size of mfchashtbl[].

PR:		kern/181821
Submitted by:	Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4)
MFC after:	1 week
2013-09-05 14:16:37 +00:00
ae
0a28609aca Remove unused code and sort variables declarations.
PR:		kern/181822
MFC after:	1 week
2013-09-05 08:12:36 +00:00
tuexen
0ad83fb985 Remove redundant field pr_sctp_on.
MFC after: 1 week
2013-09-03 19:31:59 +00:00
tuexen
d6366593f2 Use uint16_t instead of in_port_t for consistency with the SCTP code.
MFC after: 1 week
2013-09-02 23:27:53 +00:00
tuexen
7d2fcf1932 All changes affect only SCTP-AUTH:
* Remove non working code related to SHA224.
* Remove support for non-standardised HMAC-IDs using SHA384 and SHA512.
* Prefer SHA256 over SHA1.
* Minor cleanup.

MFC after: 2 weeks
2013-09-02 22:48:41 +00:00
np
e9b6cb5ecc Merge r254336 from user/np/cxl_tuning.
Add a last-modified timestamp to each LRO entry and provide an interface
to flush all inactive entries.  Drivers decide when to flush and what
the inactivity threshold should be.

Network drivers that process an rx queue to completion can enter a
livelock type situation when the rate at which packets are received
reaches equilibrium with the rate at which the rx thread is processing
them.  When this happens the final LRO flush (normally when the rx
routine is done) does not occur.  Pure ACKs and segments with total
payload < 64K can get stuck in an LRO entry.  Symptoms are that TCP
tx-mostly connections' performance falls off a cliff during heavy,
unrelated rx on the interface.

Flushing only inactive LRO entries works better than any of these
alternates that I tried:
- don't LRO pure ACKs
- flush _all_ LRO entries periodically (every 'x' microseconds or every
  'y' descriptors)
- stop rx processing in the driver periodically and schedule remaining
  work for later.

Reviewed by:	andre
2013-08-28 23:00:34 +00:00
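
A minimal sketch of the "timestamp each entry, flush only the inactive ones" idea from r254336, assuming a flat table of slots. `struct lro_slot` and `lro_flush_inactive_sketch` are hypothetical names and do not reproduce the tcp_lro interface; a real flush would hand the aggregated data up the stack rather than just clearing a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lro_slot {               /* hypothetical, not struct lro_entry */
    int in_use;
    uint64_t mtime_us;          /* last time a segment was merged in */
};

/*
 * Flush every active slot that has been idle for at least `timeout_us`.
 * The driver chooses both when to call this and the threshold.
 * Returns the number of slots flushed.
 */
size_t
lro_flush_inactive_sketch(struct lro_slot *tbl, size_t n, uint64_t now_us,
    uint64_t timeout_us)
{
    size_t flushed = 0;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].in_use)
            continue;
        if (now_us - tbl[i].mtime_us >= timeout_us) {
            tbl[i].in_use = 0;  /* real code would deliver the data here */
            flushed++;
        }
    }
    return flushed;
}
```

Entries still receiving segments keep aggregating, while pure ACKs and short payloads stuck in an idle entry get delivered even if the rx routine never reaches its final flush.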
jhb
a437be7257 Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by:	bde
Tested by:	exp-run by bdrewery
2013-08-26 18:16:05 +00:00
markj
f9ffef3723 The second last argument of udp:::receive is supposed to contain the
connection state, not the IP header.

X-MFC with:	r254889
2013-08-26 00:28:57 +00:00
markj
29e4661920 Implement the ip, tcp, and udp DTrace providers. The probe definitions use
dynamic translation so that their arguments match the definitions for
these providers in Solaris and illumos. Thus, existing scripts for these
providers should work unmodified on FreeBSD.

Tested by:	gnn, hiren
MFC after:	1 month
2013-08-25 21:54:41 +00:00
tuexen
3869cd403d Provide human readable debug output. 2013-08-25 12:44:03 +00:00
andre
10b033d327 For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits.
The upper 32bits are not occupied for now.

Sponsored by:	The FreeBSD Foundation
2013-08-25 09:49:00 +00:00
andre
e3737c33e7 Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features.  The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
  layer specific union PH_loc for local use.  Protocols can flexibly overlay
  their own 8 to 64 bit fields to store information while the packet is
  worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
  instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
  information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
  rsstype field.  Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
  with the packet.  It is not yet supported in any drivers but allows us to
  get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
  a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
  from the start of the packet.  This is important for various offload
  capabilities and to relieve the drivers from having to parse the packet
  and protocol headers to find out location of checksums and other
  information.  Header parsing in drivers is a lot of copy-paste and
  unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
  packet information, like ether_vtag, tso_segsz and csum fields.
  Depending on the csum_flags settings some fields may have different usage
  making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
  stack) and inbound (up the stack) use.  The CSUM flags used to be a bit
  chaotic and rather poorly documented leading to incorrect use in many
  places.  Bring clarity into their use through better naming.
  Compatibility mappings are provided to preserve the API.  The drivers
  can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by:	The FreeBSD Foundation
2013-08-24 19:51:18 +00:00
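
To illustrate the "64bit protocol/layer specific union" idea from the pkthdr restructuring, here is a sketch of a fixed-size scratch area that a protocol overlays with its own fields. The union name, its members, and the `frag` overlay are assumptions made for this example and are not the actual PH_loc definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit scratch area in the spirit of PH_loc. */
union ph_loc_sketch {
    uint8_t  eight[8];
    uint64_t sixtyfour;
    struct {                    /* example overlay for a reassembly layer */
        uint32_t id;
        uint16_t off;
        uint16_t pad;
    } frag;
};

/* The scratch area must not grow, or the packet header would grow too. */
_Static_assert(sizeof(union ph_loc_sketch) == 8,
    "scratch area must stay 64 bits");

/* Store per-packet state while the packet is being worked on. */
void
stash_frag_info(union ph_loc_sketch *loc, uint32_t id, uint16_t off)
{
    assert(loc != NULL);
    memset(loc, 0, sizeof(*loc));
    loc->frag.id = id;
    loc->frag.off = off;
}
```

Because each layer only uses the area while it owns the packet, different protocols can define different overlays on the same 8 bytes without widening the header.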