93918 Commits

Author SHA1 Message Date
Nathan Whitehorn
d9675cb0af Fix check: bitwise and has only one &.
MFC after:	1 week
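
The class of bug named in this message can be illustrated with a short sketch. The helper names and values below are hypothetical and are not taken from the commit; they only contrast a bitwise `&` test with an accidental logical `&&`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Correct check: bitwise AND tests whether the bits in mask are set. */
static bool
flag_is_set(uint32_t flags, uint32_t mask)
{
    assert(mask != 0);
    return ((flags & mask) != 0);
}

/*
 * Buggy check: logical AND is true whenever both operands are
 * nonzero, regardless of which bits are actually set.
 */
static bool
flag_is_set_buggy(uint32_t flags, uint32_t mask)
{
    assert(mask != 0);
    return (flags && mask);
}
```

With flags 0x04 and mask 0x02 the bitwise version reports the bit as clear, while the logical version reports it as set.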
2013-07-12 15:56:30 +00:00
Marius Strobl
85338755c1 Prefix the alias macros for members of struct __mcontext with an underscore
in order to avoid a clash in the net80211 code.
2013-07-12 14:24:52 +00:00
Hiroki Sato
4825b1e098 Add a leaf node CTL_NET.PF_ROUTE.0.AF.NET_RT_DUMP.0.FIB. This returns
routing table with the specified FIB number, not td->td_proc->p_fibnum.
2013-07-12 12:36:12 +00:00
Hiroki Sato
e9f947e27c - Drop GIF_ACCEPT_REVETHIP flag by default.
- Add IFF_MONITOR support.
2013-07-12 12:18:07 +00:00
Craig Rodrigues
71e6a9ce71 PR: kern/168520
Submitted by: "YAMAMOTO, Shigeru" <shigeru@iij.ad.jp>
Reviewed by: adrian

In PC-BSD 9.1, VIMAGE is enabled in the kernel config.
For laptops with Bluetooth capability, such as the HP Elitebook 8460p,
the kernel will panic upon bootup, because curthread->td_vnet
is not initialized.

Properly initialize curthread->td_vnet when initializing the Bluetooth stack.

This allows laptops such as the HP Elitebook 8460p laptop
to properly boot with VIMAGE kernels.
2013-07-12 08:03:10 +00:00
Andre Oppermann
10c982958c Unbreak VIMAGE by correctly naming the vnet pointer in struct tcp_syncache.
Reported by:	trociny, rodrigc
2013-07-12 07:43:56 +00:00
Scott Long
b27b6b66c0 Refactor the various delete methods out of dastart(). Cleans up a bunch
of style and adds more modularity and clarity.

Obtained from:	Netflix
MFC after:	3 days
2013-07-12 00:50:25 +00:00
Konstantin Belousov
5a3c920f45 When swap pager allocates metadata in the pagedaemon context, allow it
to drain the reserve.  This was broken in r243040, causing deadlock.
Note that VM_WAIT call in case of uma_zalloc() failure from pagedaemon
would only wait for the v_pageout_free_min anyway.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-07-11 20:33:57 +00:00
Navdeep Parhar
2b66d73259 Attach to the 4x10G T540-CR card.
2013-07-11 19:09:31 +00:00
Andre Oppermann
bf0354c0f2 Fix const propagation issues to make GCC happy.
Submitted by:	Michael Butler <imb@protected-networks.net>
2013-07-11 16:27:11 +00:00
Andre Oppermann
81d392a09d Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK
information into the ISN (initial sequence number) without the additional
use of timestamp bits and switching to the very fast and cryptographically
strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against
forgeries.

The purpose of SYN cookies is to encode all necessary session state in
the 32 bits of our initial sequence number to avoid storing any information
locally in memory.  This is especially important when under heavy spoofed
SYN attacks where we would either run out of memory or the syncache would
fill with bogus connection attempts swamping out legitimate connections.

The original SYN cookies method only stored indexed MSS values in the
cookie.  This isn't sufficient anymore and breaks down in the presence of
WSCALE information which is only exchanged during SYN and SYN-ACK.  If we
can't keep track of it then we may severely underestimate the available
send or receive window. This is compounded with large windows whose size
information on the TCP segment header is even lower numerically.  A number
of years back SYN cookies were extended to store the additional state in
the TCP timestamp fields, if available on a connection.  While timestamps
are common among BSD, Linux and other *nix systems, Windows never enabled
them by default and thus they are not present for the vast majority of
clients seen on the Internet.

The common parameters used on TCP sessions have changed quite a bit since
SYN cookies were invented some 17 years ago.  Today we have a lot more
bandwidth available, making the use of window scaling almost mandatory.  Also
SACK has become standard making recovering from packet loss much more
efficient.

This change moves all necessary information into the ISS removing the need
for timestamps.  Both the MSS (16 bits) and send WSCALE (4 bits) are stored
in 3 bit indexed form together with a single bit for SACK.  While this is
significantly less than the original range, it is sufficient to encode all
common values with minimal rounding.

The MSS depends on the MTU of the path and with the dominance of ethernet
the main value seen is around 1460 bytes.  Encapsulations for DSL lines
and some other overheads reduce it by a few more bytes for many connections
seen.  Rounding down to the next lower value in some cases isn't a problem
as we send only slightly more packets for the same amount of data.

The send WSCALE index is a bit more tricky as rounding down under-estimates
the send space available towards the remote host; however, a small
number of values dominate and are carefully selected again.

The receive WSCALE isn't encoded at all but recalculated based on the local
receive socket buffer size when a valid SYN cookie returns.  A listen socket
buffer size is unlikely to change while active.

The index values for MSS and WSCALE are selected for minimal rounding errors
based on large traffic surveys.  These values have to be periodically
validated against newer traffic surveys adjusting the arrays tcp_sc_msstab[]
and tcp_sc_wstab[] if necessary.
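
The indexed encoding described above can be sketched in C. This is only an illustration under assumptions: the table contents, the bit layout, and the names `example_msstab`, `example_wstab`, `round_down_index` and `pack_cookie_bits` are hypothetical and are not the values or code from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed example tables, sorted ascending; not the kernel's values. */
static const uint16_t example_msstab[8] =
    { 256, 536, 1024, 1220, 1360, 1400, 1440, 1460 };
static const uint16_t example_wstab[8] =
    { 0, 1, 2, 3, 5, 7, 8, 10 };

/*
 * Return the index of the largest table entry that is <= value.
 * Values below the first entry map to index 0.
 */
static unsigned
round_down_index(const uint16_t *tab, size_t n, uint16_t value)
{
    unsigned idx = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i] <= value)
            idx = (unsigned)i;
    return (idx);
}

/*
 * Pack a 3-bit MSS index, a 3-bit WSCALE index and one SACK bit
 * into 7 bits.  The bit positions are an assumption of this sketch.
 */
static uint8_t
pack_cookie_bits(unsigned mss_idx, unsigned ws_idx, int sack)
{
    assert(mss_idx < 8 && ws_idx < 8);
    return ((uint8_t)(mss_idx | (ws_idx << 3) | ((sack ? 1u : 0u) << 6)));
}
```

With these assumed tables, an MSS of 1452 rounds down to 1440 (index 6) and a send WSCALE of 6 rounds down to 5 (index 4), so only a small amount of precision is lost.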

In addition the hash MAC to protect the SYN cookies is changed from MD5
to SipHash-2-4, a much faster and cryptographically secure algorithm.

Reviewed by:	dwmalone
Tested by:	Fabian Keil <fk@fabiankeil.de>
2013-07-11 15:29:25 +00:00
Jim Harris
66619178b5 Fix a poorly worded comment in nvme(4).
MFC after:	3 days
2013-07-11 15:02:38 +00:00
Andre Oppermann
6856398eab SipHash is a cryptographically strong pseudo-random function (a.k.a. keyed
hash function) optimized for speed on short messages returning a 64bit hash/
digest value.

SipHash is simpler and much faster than other secure MACs and competitive
in speed with popular non-cryptographic hash functions.  It uses a 128-bit
key without the hidden cost of a key expansion step.  SipHash iterates a
simple round function consisting of four additions, four xors, and six
rotations, interleaved with xors of message blocks for a pre-defined number
of compression and finalization rounds.  The absence of secret load/store
addresses or secret branch conditions avoids timing attacks.  No state is
shared between messages.  Hashing is deterministic and doesn't use nonces.
It is not susceptible to length extension attacks.
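
The round structure described above can be sketched as a small standalone C function. This follows the SipHash-2-4 definition from the paper as understood here; it is not the kernel implementation added by this commit, and the names `sipround`, `load_le64` and `example_siphash24` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit word left by b bits (0 < b < 64). */
static uint64_t
rotl64(uint64_t x, unsigned b)
{
    return ((x << b) | (x >> (64 - b)));
}

/* One SipRound: four additions, four xors, six rotations. */
static void
sipround(uint64_t v[4])
{
    v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
    v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t
load_le64(const uint8_t *p)
{
    uint64_t x = 0;

    for (int i = 7; i >= 0; i--)
        x = (x << 8) | p[i];
    return (x);
}

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
static uint64_t
example_siphash24(const uint8_t key[16], const uint8_t *msg, size_t len)
{
    uint64_t k0, k1, m, b, v[4];
    size_t i, blocks = len / 8;

    assert(key != NULL);
    k0 = load_le64(key);
    k1 = load_le64(key + 8);
    v[0] = k0 ^ 0x736f6d6570736575ULL;
    v[1] = k1 ^ 0x646f72616e646f6dULL;
    v[2] = k0 ^ 0x6c7967656e657261ULL;
    v[3] = k1 ^ 0x7465646279746573ULL;

    /* Message blocks are xored in around the compression rounds. */
    for (i = 0; i < blocks; i++) {
        m = load_le64(msg + 8 * i);
        v[3] ^= m;
        sipround(v);
        sipround(v);
        v[0] ^= m;
    }

    /* Last block: remaining bytes plus the length in the top byte. */
    b = (uint64_t)len << 56;
    for (i = 0; i < (len & 7); i++)
        b |= (uint64_t)msg[8 * blocks + i] << (8 * i);
    v[3] ^= b;
    sipround(v);
    sipround(v);
    v[0] ^= b;

    /* Finalization. */
    v[2] ^= 0xff;
    for (i = 0; i < 4; i++)
        sipround(v);
    return (v[0] ^ v[1] ^ v[2] ^ v[3]);
}
```

The sketch keeps all state in local variables, which mirrors the statement that no state is shared between messages.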

Target applications include network traffic authentication, message
authentication (MAC) and hash-tables protection against hash-flooding
denial-of-service attacks.

The number of update/finalization rounds is defined during initialization:

 SipHash24_Init() for the fast and reasonably strong version.
 SipHash48_Init() for the strong version (half as fast).

SipHash usage is similar to other hash functions:

 struct SIPHASH_CTX ctx;
 char *k = "16bytes long key";
 char *s = "string";
 uint64_t h = 0;
 SipHash24_Init(&ctx);
 SipHash_SetKey(&ctx, k);
 SipHash_Update(&ctx, s, strlen(s));
 SipHash_Final(&h, &ctx);  /* or */
 h = SipHash_End(&ctx);    /* or */
 h = SipHash24(&ctx, k, s, strlen(s));

It was designed by Jean-Philippe Aumasson and Daniel J. Bernstein and
is described in the paper "SipHash: a fast short-input PRF", 2012.09.18:
 https://131002.net/siphash/siphash.pdf
 Permanent ID: b9a943a805fbfc6fde808af9fc0ecdfa

Implemented by:	andre (based on the paper)
Reviewed by:	cperciva
2013-07-11 14:18:38 +00:00
Andre Oppermann
bc4a1b8ccd Make use of the fact that uma_zone_set_max(9) already returns the
rounded limit making a call to uma_zone_get_max(9) unnecessary.

MFC after:	1 day
2013-07-11 12:53:13 +00:00
Andre Oppermann
e0c00adda2 Fix style issues, a typo in "kern.ipc.nmbufs" and correctly place and
expose the value of the tunable maxmbufmem as "kern.ipc.maxmbufmem"
through sysctl.

Reported by:	smh
MFC after:	1 day
2013-07-11 12:46:35 +00:00
Konstantin Belousov
4f9c9114a3 The vm_fault() should not be allowed to proceed on the map entry which
is being wired now.  The entry wired count is changed to non-zero in
advance, before the map lock is dropped.  This makes vm_fault()
perceive the entry as wired, and breaks the fragment which moves the
wire count from the shadowed page to the upper page, making the code
unwire a non-wired page.

On the other hand, the vm_fault() calls from vm_fault_wire() should be
allowed to proceed, so only drain MAP_ENTRY_IN_TRANSITION from
vm_fault() when wiring_thread is not current.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-07-11 05:58:28 +00:00
Konstantin Belousov
0acea7dfde The mlockall() or VM_MAP_WIRE_HOLESOK does not interact properly with
parallel creation of the map entries, e.g. by mmap() or stack growing.
It also breaks when another entry is wired in parallel.

The vm_map_wire() iterates over the map entries in the region, and
assumes that the map entries it finds were marked as in transition
earlier, and that any entry marked as in transition was marked by the
current invocation of vm_map_wire().  This is not true for new entries in the
holes.

Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct
vm_map_entry.  In vm_map_wire() and vm_map_unwire(), only process the
entries whose transition owner is the current thread.
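
The ownership check can be sketched with a simplified model. The structure, field and function names below are hypothetical stand-ins and not the actual struct vm_map_entry layout or kernel code; threads are represented by plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_NO_THREAD (-1)

/* Simplified stand-in for a map entry; not struct vm_map_entry. */
struct example_entry {
    bool in_transition;  /* models MAP_ENTRY_IN_TRANSITION */
    int  wiring_thread;  /* owner of the in-transition mark */
    int  wired_count;
};

/*
 * Finish wiring: only touch entries whose in-transition mark is
 * owned by the calling thread, and skip everything else.
 * Returns the number of entries processed.
 */
static size_t
example_finish_wiring(struct example_entry *entries, size_t n, int curthread)
{
    size_t processed = 0;

    assert(entries != NULL);
    for (size_t i = 0; i < n; i++) {
        struct example_entry *e = &entries[i];

        if (!e->in_transition || e->wiring_thread != curthread)
            continue;  /* not ours: another thread or a new entry */
        e->in_transition = false;
        e->wiring_thread = EXAMPLE_NO_THREAD;
        e->wired_count++;
        processed++;
    }
    return (processed);
}
```

An entry marked by a different thread keeps its mark and wire count, which is the behaviour the ownership field is meant to guarantee.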

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-07-11 05:55:08 +00:00
Konstantin Belousov
ebf5d94e82 Never remove user-wired pages from an object when doing
msync(MS_INVALIDATE).  The vm_fault_copy_entry() requires that object
range which corresponds to the user-wired vm_map_entry, is always
fully populated.

Add OBJPR_NOTWIRED flag for vm_object_page_remove() to request the
preserving behaviour, use it when calling vm_object_page_remove() from
vm_object_sync().

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-07-11 05:47:26 +00:00
Konstantin Belousov
3abeb8113d In the vm_page_set_invalid() function, do not assert that the page is
not busy, since its only caller brelse() can legitimately call it on
a busy page.  This happens for VOP_PUTPAGES() on filesystems that use
buffers and whose VOP_WRITE() method marked the buffer containing the page
as non-cacheable.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-07-11 05:38:39 +00:00
Konstantin Belousov
92e5367354 Do not invalidate page of the B_NOCACHE buffer or buffer after an I/O
error if any user wired mappings exist.  Doing the invalidation
destroys the user wiring.

The change is a temporary measure to close the bug; the more proper
fix is to delegate the invalidation of the page to upper layers
always.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-07-11 05:36:26 +00:00
Konstantin Belousov
30dac21d0a Explicitly panic instead of possibly doing undefined things when
ptelist KVA is exhausted.  Currently this cannot happen, the added
panic serves as assert.

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-07-11 05:15:30 +00:00
Konstantin Belousov
3fb25770a9 MFamd64 r253140:
Clear m->object for the page taken from the delayed free list in
pmap_pv_reclaim().

Noted by:	alc
2013-07-11 05:10:36 +00:00
Jack F Vogel
3f80cc03fd Fix my last commit, flags rather than flag... duh.
MFC after: 2 days
2013-07-11 03:44:06 +00:00
Jack F Vogel
804d70535a Fix to a panic found internally, bad pointer during rxeof
processing. Thanks to John Baldwin for catching this. Not
clearing the flag member of the rxbuf could result in a NULL
mbuf pointer being used.

MFC after:	2 days (this needs to get into 9.2!)
2013-07-10 23:14:24 +00:00
Pedro F. Giffuni
c5249f35b8 Implement 1003.1-2001 pathconf() keys.
This is based on r106058 in UFS.

MFC after:	1 month
2013-07-10 22:03:01 +00:00
Marcel Moolenaar
68fd965f98 Add 2 builtin words for working with directories:
	isdir?		( fd -- bool )
	freaddir	( fd -- ptr len TRUE | FALSE )

The 'isdir?' word returns `true' if the file descriptor is for a
directory and `false' otherwise.

The 'freaddir' word reads the next directory entry and if successful,
returns its name and 'true'. Otherwise 'false' is returned.

These words give the loader the ability to scan directories and read
files contained in them for 'rc.d'-like flexibility in handling which
modules to load and/or which tunables to set.

Obtained from:	Juniper Networks, Inc.
2013-07-10 21:37:50 +00:00
Pedro F. Giffuni
53aa3d1a99 Change i_gen in UFS to an unsigned type.
Missing type change from r252435.

This fixes a "Stale NFS file handle" error.

Reported by:	Claude Bisson
Tested by:	Claude Bisson
Pointed hat:	pfg
2013-07-10 18:19:48 +00:00
Marcel Moolenaar
eead2d551c Protect against broken hardware. In this particular case, protect against
H/W not de-asserting the interrupt at all. On x86, and because of the
following conditions, this results in a hard hang with interrupts disabled:
1.  The uart(4) driver uses a spin lock to protect against concurrent
    access to the H/W. Spin locks disable and restore interrupts.
2.  Restoring the interrupt on x86 always writes the flags register. Even
    if we're restoring the interrupt from disabled to disabled.
3.  The x86 CPU has a short window in which interrupts are enabled when the
    flags register is written.
4.  The uart(4) driver registers a fast interrupt by default.

To catch this case, we first try to clear any pending H/W interrupts and in
particular, before setting up the interrupt. This makes sure the interrupt
is masked on the PIC. The interrupt handler now has a limit set on the
number of iterations it'll go through to clear interrupt conditions. If the
limit is hit, the handler will return FILTER_SCHEDULE_THREAD. The attach
function will check for this return code and avoid setting up the interrupt
and force polling in that case.
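
The bounded interrupt loop and the attach-time fallback can be sketched with a simulated device. All names, the iteration limit and the result codes below are hypothetical; the real driver returns FILTER_SCHEDULE_THREAD and works on actual hardware registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for this sketch; the commit does not state a value here. */
#define EXAMPLE_INTR_LIMIT 256

enum example_result { EXAMPLE_HANDLED, EXAMPLE_STUCK };

/* Simulated device state, standing in for real hardware registers. */
struct example_dev {
    int  pending;  /* interrupt conditions still pending */
    bool stuck;    /* broken H/W that never de-asserts the interrupt */
};

/* Report and clear one pending condition; stuck H/W never clears. */
static bool
example_ipend(struct example_dev *dev)
{
    if (dev->stuck)
        return (true);
    if (dev->pending > 0) {
        dev->pending--;
        return (true);
    }
    return (false);
}

/* Handler: clear conditions, but give up after a fixed iteration count. */
static enum example_result
example_intr(struct example_dev *dev)
{
    int iterations = 0;

    assert(dev != NULL);
    while (example_ipend(dev)) {
        if (++iterations >= EXAMPLE_INTR_LIMIT)
            return (EXAMPLE_STUCK);
    }
    return (EXAMPLE_HANDLED);
}

/* Attach: run the handler once and fall back to polling if it is stuck. */
static bool
example_attach_use_polling(struct example_dev *dev)
{
    return (example_intr(dev) == EXAMPLE_STUCK);
}
```

The limit turns an otherwise endless loop into a detectable condition, so the driver can choose polling instead of hanging with interrupts disabled.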

Obtained from:	Juniper Networks, Inc.
2013-07-10 17:42:20 +00:00
Marcel Moolenaar
8939c0693c Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

This change differs from r253224 in the following way:
o   The vfs_mounted handler is called before mountcheckdirs() and with
    newdp locked. vp is unlocked.
o   The event handlers are declared in <sys/eventhandler.h> and not in
    <sys/mount.h>. The <sys/mount.h> header is used in user land code
    that pretends to be kernel code and as such creates a very convoluted
    environment. It's hard to untangle.

Submitted by:	stevek@juniper.net
Discussed with:	pjd@
Obtained from:	Juniper Networks, Inc.
2013-07-10 15:35:25 +00:00
Andre Oppermann
07dacf031e Extend debug logging of TCP timestamp related specification
violations.

Update related comments and style.
2013-07-10 12:06:01 +00:00
Alexander Leidinger
19ca1aa0d4 Fix build for gcc users by declaring variables for unions in structs which
don't declare a variable. The size before/after this change of the structs
doesn't change with gcc/clang.

Noticed by:	several
Suggested by:	Gary Jennejohn <gljennjohn@googlemail.com>
2013-07-10 10:40:52 +00:00
Aleksandr Rybalko
614901534b Remove trailing whitespaces.
2013-07-10 10:15:38 +00:00
Konstantin Belousov
a4a65e69c6 When panicking due to the gjournal overflow, print the geom metadata
journal id.

Requested by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	1 week
2013-07-10 10:11:43 +00:00
Konstantin Belousov
0cdd261571 Clear m->object for the page taken from the delayed free list for
reuse as the pv chunk page in reclaim_pv_chunk().  Having non-NULL
m->object is wrong for page not owned by an object and confuses both
vm_page_free_toq() and vm_page_remove() when the page is freed later.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-07-10 09:24:03 +00:00
Hiren Panchasara
3c9d5a037d Adding urtwn(4) firmware and related changes.
Reviewed by:	rpaulo
Approved by:	sbruno (mentor)
2013-07-10 08:21:09 +00:00
Kevin Lo
039418617b Add the ARM processor-specific section types.
Reviewed by:	imp
2013-07-10 07:15:39 +00:00
Pyun YongHyeon
37d17b6b63 Avoid controller reinitialization which could be triggered by
dhclient(8) or when alias addresses are added.

Tested by:	dcx dcy <dcbsdx@hotmail.com>
2013-07-10 06:46:46 +00:00
David E. O'Brien
d0961945bb Refactor random_systat to be a *random_systat. This avoids unnecessary
structure copying in random_ident_hardware(). This change will also help
further modularization of random(4) subsystem.

Submitted by: arthurmesh@gmail.com
Reviewed by: obrien
Obtained from: Juniper Networks
2013-07-09 23:47:28 +00:00
Marius Strobl
68e9cbd385 - As it turns out, not only MSI-X is broken for devices passed through by
  VMware up to at least ESXi 5.1. Actually, using INTx in that case instead
  may still result in interrupt storms, with MSI being the only working
  option in some configurations. So introduce a PCI_QUIRK_DISABLE_MSIX quirk
  which only blacklists MSI-X but not also MSI and use it for the VMware
  PCI-PCI-bridges. Note that, currently, we still assume that if MSI doesn't
  work, MSI-X won't work either - but that's part of the internal logic and
  not guaranteed as part of the API contract. While at it, add and employ
  a pci_has_quirk() helper.
  Reported and tested by: Paul Bucher
- Use NULL instead of 0 for pointers.
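
A quirk lookup of this kind can be sketched as a table scan. The device IDs, table contents and names below are hypothetical and are not the kernel's quirk table or the actual pci_has_quirk() code; the sketch only shows how a separate MSI-X quirk lets MSI stay enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum example_quirk_type {
    EXAMPLE_QUIRK_DISABLE_MSI = 1,   /* blacklists MSI and MSI-X */
    EXAMPLE_QUIRK_DISABLE_MSIX = 2   /* blacklists MSI-X only */
};

struct example_quirk {
    uint32_t devid;  /* made-up device ID */
    int      type;
};

/* Hypothetical quirk table, terminated by a zero device ID. */
static const struct example_quirk example_quirks[] = {
    { 0x11112222u, EXAMPLE_QUIRK_DISABLE_MSI },
    { 0x33334444u, EXAMPLE_QUIRK_DISABLE_MSIX },
    { 0, 0 }
};

/* Return true if the device has an entry of the given quirk type. */
static bool
example_has_quirk(uint32_t devid, int type)
{
    const struct example_quirk *q;

    assert(type != 0);
    for (q = example_quirks; q->devid != 0; q++)
        if (q->devid == devid && q->type == type)
            return (true);
    return (false);
}

static bool
example_msi_allowed(uint32_t devid)
{
    return (!example_has_quirk(devid, EXAMPLE_QUIRK_DISABLE_MSI));
}

/* If MSI does not work, MSI-X is assumed not to work either. */
static bool
example_msix_allowed(uint32_t devid)
{
    return (example_msi_allowed(devid) &&
        !example_has_quirk(devid, EXAMPLE_QUIRK_DISABLE_MSIX));
}
```

A device carrying only the MSI-X quirk still gets MSI, which matches the configuration described for the VMware bridges.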

Submitted by:	jhb (mostly)
Approved by:	jhb
MFC after:	3 days
2013-07-09 23:12:26 +00:00
Xin LI
76a207c2b9 Sync with KAME.
MFC after:	1 month
2013-07-09 22:04:35 +00:00
Jim Harris
bd6b0ac5be Add comment explaining why CACHE_LINE_SIZE is defined in nvme_private.h
if not already defined elsewhere.

Requested by:	attilio
MFC after:	3 days
2013-07-09 21:24:19 +00:00
Jim Harris
e9efbc134f Update copyright dates.
MFC after:	3 days
2013-07-09 21:22:17 +00:00
Jim Harris
ec526ea90b Do not retry failed async event requests.
Sponsored by:	Intel
MFC after:	3 days
2013-07-09 21:03:39 +00:00
Jim Harris
eb32b874f6 Add pci_enable_busmaster() and pci_disable_busmaster() calls in
nvme_attach() and nvme_detach() respectively.

Sponsored by:	Intel
MFC after:	3 days
2013-07-09 21:02:45 +00:00
Konstantin Belousov
cc3d8c35f5 There are several code sequences like
      vfs_busy(mp);
      vfs_write_suspend(mp);
which are problematic if another thread starts unmount between the two
calls.  The unmount starts a write, while vfs_write_suspend() drains
writers.  On the other hand, unmount drains busy references, causing
the deadlock.

Add a flag argument to vfs_write_suspend and require the callers of it
to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the
mount path, i.e. the covered vnode is not locked.  The suspension is
not attempted if VS_SKIP_UNMOUNT is specified and unmount is in
progress.

Reported and tested by:	Andreas Longwitz <longwitz@incore.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2013-07-09 20:49:32 +00:00
Warner Losh
5163701c22 Nearly a complete rewrite of elf.h.
Start with NetBSD's sys/arch/mips/include/elf_machdep.h 1.18. Remove the NetBSD
specific glue pieces (leaving mostly just relocation types).

Add in FreeBSD specific glue pieces from older versions of this file, and
move to the top of the file:
r237430 | kib | 2012-06-22 00:38:31 -0600 (Fri, 22 Jun 2012) | 5 lines
r232449 | jmallett | 2012-03-03 01:19:18 -0700 (Sat, 03 Mar 2012) | 18 lines
r217097 | kib | 2011-01-07 07:22:34 -0700 (Fri, 07 Jan 2011) | 3 lines
r211412 | kib | 2010-08-17 02:55:45 -0600 (Tue, 17 Aug 2010) | 7 lines
r202908 | gonzo | 2010-01-23 19:59:22 -0700 (Sat, 23 Jan 2010) | 4 lines
r195356 | imp | 2009-07-05 01:00:51 -0600 (Sun, 05 Jul 2009) | 6 lines
r195128 | gonzo | 2009-06-27 17:27:41 -0600 (Sat, 27 Jun 2009) | 4 lines
r197933 | kib | 2009-10-10 09:31:24 -0600 (Sat, 10 Oct 2009) | 9 lines
r189926 | kib | 2009-03-17 06:50:16 -0600 (Tue, 17 Mar 2009) | 9 lines
r186191 | imp | 2008-12-16 13:07:47 -0700 (Tue, 16 Dec 2008) | 7 lines
as closely as I can tell, the projects/mips branch merge was disruptive
to good history.

This should make merges easier in the future from NetBSD and vice versa.
2013-07-09 19:01:38 +00:00
Jung-uk Kim
835fbe0ae7 Remove redundant definitions to appease tinderbox.
2013-07-09 18:15:59 +00:00
Andrey V. Elsukov
9f0f032d10 Correct the size of allocated memory to store array of counters.
2013-07-09 15:20:46 +00:00
Andrey V. Elsukov
9bea6fd6c6 Correct CTASSERT condition.
2013-07-09 15:10:27 +00:00
Michael Tuexen
e5aeb83c42 Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

X-MFC with: r252026
2013-07-09 14:38:26 +00:00