Commit Graph

122837 Commits

Author SHA1 Message Date
Emmanuel Vadot
38d3befe9c arm timer: Add workaround for Allwinner A64 timer
The timer present in allwinner A64 SoC is unstable, value can jump backward
or forward.
It was found that when bit 11 and upper roll over the low bits can sometimes
being read as all as 1 or all as 0.
Simply ignore the values for those cases.
2018-06-14 17:18:15 +00:00
Kenneth D. Merry
e4b58dfe33 Fix da(4) locking when probing SMR drives.
Probing host aware and host managed SMR drives got broken in revision
330796.

The added cam_periph_lock() calls were in areas in dadone() where
the peripheral lock was already held.

Since then, dadone() has been split into separate functions that are
dedicated to each probe state.

The result is that when probing a host aware drive, I ran into a recursive
lock acquisition in dadone_probeatalogdir(). I would have run into the
same problem in dadone_probeataiddir(), and in dadone_probeatasup() and
dadone_probeatazone() in the error paths had the probe continued.

The solution is to take out all of the extra cam_periph_lock() calls. I
also added cam_periph_assert(periph, MA_OWNED) near the top of each of
the dadone_* calls. These make it clear to anyone coming along in the
the future that the lock is held in the probe done functions.

Also add a locking assert in daprobedone(), to make it clear that it must
be called with the periph lock held.

Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D15764
2018-06-14 17:08:44 +00:00
Justin Hibbits
402c7806cb Fix CTR formatting for moea64_native bootstrap
On very large memory systems 'size' can become 2GB or larger, resulting in a
negative value being formatted.  Also, moea64_pteg_count is already a long, so
format it as such.
2018-06-14 16:01:11 +00:00
Andrey V. Elsukov
9597ff83b8 Add missing BPF_MTAP2() for outbound packets. 2018-06-14 15:04:30 +00:00
Andrey V. Elsukov
2addcba7d5 Convert if_me(4) driver to use encap_lookup_t method and be lockless on
data path.
2018-06-14 14:53:24 +00:00
Konstantin Belousov
459ccd3c5f linuxolator/amd64: Don't mangle %r10 on return from syscall for EJUSTRETURN.
This fixes the %r10 content for rt_sigreturn.

Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
MFC after:	1 week
2018-06-14 12:35:57 +00:00
Andrey V. Elsukov
eb548a1a5c In m_megapullup() use m_getjcl() to allocate 9k or 16k mbuf when requested.
It is better to try allocate a big mbuf, than just silently drop a big
packet. A better solution could be reworking of libalias modules to be
able use m_copydata()/m_copyback() instead of requiring the single
contiguous buffer.

PR:		229006
MFC after:	1 week
2018-06-14 11:15:39 +00:00
Konstantin Belousov
5803d744c7 Reorganize code flow in fpudna()/npxdna() to highlight the critical
section scope.  Sprinkle __predict_false() for conditions known to
never occur or occur only on rare platforms.

Sponsored by:	The FreeBSD Foundation
2018-06-14 11:09:51 +00:00
Konstantin Belousov
fa7fad8ab9 Remove printf() in #NM handler.
Give up and remove the almost useless informational message reporting
that device not available exception occured while our state tracking
indicates the current CPU has FPU context loaded for the current
thread.

It seems that this is recurring bug with some VM monitors.

Sponsored by:	The FreeBSD Foundation
2018-06-14 10:33:26 +00:00
Rick Macklem
c338c94d20 Move four functions in nfscl.ko to nfscommon.ko.
Four functions nfscl_reqstart(), nfscl_fillsattr(), nfsm_stateidtom()
and nfsmnt_mdssession() are now called from within the nfsd.
As such, they needed to be moved from nfscl.ko to nfscommon.ko so that
nfsd.ko would load when nfscl.ko wasn't loaded.

Reported by:	herbert@gojira.at
2018-06-14 10:00:19 +00:00
Andrey V. Elsukov
a4f647571f Add NULL check like the rest of code has.
It is possible that ifma_protospec becomes NULL in this function for
some entry, but it is still referenced and thus it will not unlinked
from the list. Then "restart" condition triggers and this entry with
NULL ifma_protospec will lead to page fault.

PR:		228982
2018-06-14 09:36:25 +00:00
Andrey V. Elsukov
e36c281fc9 Remove stale comment. in6_ifdetach() can be called from places
where addresses are not removed yet.
2018-06-14 09:29:39 +00:00
Hans Petter Selasky
3ea947d615 Revert r335094 and properly fix OFED build after r335053.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-06-14 07:55:10 +00:00
Emmanuel Vadot
63e12418fd dts: Update our copy to Linux 4.17 2018-06-14 07:12:10 +00:00
Emmanuel Vadot
c0fc404789 Add modules/rockchip
Build rockchip modules as part of buildkernel.
Add the i2c controller module.
2018-06-14 06:40:59 +00:00
Emmanuel Vadot
3de61a6883 rk_i2c: Add driver for the I2C controller present in RockChip SoC
This controller have a special mode for RX to help with smbus-like transfer
when the controller will automatically send the slave address, register address
and read the data. Use it when possible.
The same mode for TX is describe is the datasheet but is broken and have been
since ~10 years of presence of this controller in RockChip SoCs.

Attach this driver early at we need it to communicate with the PMIC early in the
boot.
Do not hook it to the kernel build for now.
2018-06-14 06:39:33 +00:00
Emmanuel Vadot
e3fc845c91 rk3328: Add support for the i2c clocks 2018-06-14 06:34:27 +00:00
Emmanuel Vadot
3476304a69 if_dwc_rk: Add DesignWare driver for RockChip SoCs.
Add driver for the designware ethernet controller found in some RockChip SoCs.
The driver still rely on a lot of things setup by the bootloader like clocks
and phy mode.
But since netbooting is the only/easiest way to boot rockchip board at the
moment add the driver so other people can test/dev on thoses boards.
2018-06-14 06:28:09 +00:00
Emmanuel Vadot
282d1ef778 rk_armclk: Add the write mask to the register mux value
This was omitted in r334112 and r334996 which cause the PLL to not correctly
reparent, leaving the armclk to be derived from the APLL instead of the NPLL.
The arm core clock is now correctly set to 600Mhz via the assigned-clock present
in the DTB.
2018-06-14 05:46:57 +00:00
Emmanuel Vadot
1e7af4cc7a rk_pll: Add support for mode
RockChip PLL have two modes controlled by a register, a "slow mode" (the
default one) where the frequency is derived from the 24Mhz oscillator on the
board, and a "normal" one when the pll take it's input from the real PLL output.

Default the mode to normal for all the PLLs.
2018-06-14 05:43:45 +00:00
Emmanuel Vadot
b1b521b1d5 rk_pinctrl: Only add gpio subnode
This is the only node we are interested in so do not waste time to test
creating device that will be either unused or fail as most of the nodes
don't have a compatible string.
2018-06-14 05:41:16 +00:00
Randall Stewart
4aec110f70 This fixes several bugs that Larry Rosenman helped me find in
Rack with respect to its handling of TCP Fast Open. Several
fixes all related to TFO are included in this commit:
1) Handling of non-TFO retransmissions
2) Building the proper send-map when we are doing TFO
3) Dealing with the ack that comes back that includes the
   SYN and data.

It appears that with this commit TFO now works :-)

Thanks Larry for all your help!!

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15758
2018-06-14 03:27:42 +00:00
Navdeep Parhar
2ea5b0f54a cxgbe(4): Catch up with recent changes in the kernel -- it no longer
holds non-sleepable locks around any of the driver ioctls.

Sponsored by:	Chelsio Communications
2018-06-14 01:27:35 +00:00
Matt Macy
67b3b4d245 fix OFED build after r335053 2018-06-13 23:30:54 +00:00
Matt Macy
feeef8509b Fix PCBGROUPS build post CK conversion of pcbinfo 2018-06-13 23:19:54 +00:00
Konstantin Belousov
fc3e80c322 Enable eager FPU context switch by default on i386 too, based on
amd64 r335072.

Security:	CVE-2018-3665
Sponsored by:	The FreeBSD Foundation
2018-06-13 21:10:23 +00:00
Warner Losh
75b6758daa Add PNP info to PCI attachment of ae driver
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:36 +00:00
Warner Losh
ae8b178b0a Add PNP info to PCI attachments of age driver
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:32 +00:00
Warner Losh
542f4c5c92 Add PNP info to PCI attachment of amr driver
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:27 +00:00
Warner Losh
ef50201a2e Add PNP info to PCI attachment of ale driver
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:23 +00:00
Warner Losh
fc7449ce6a Add PNP info to PCI attachment of bwi driver
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:18 +00:00
Warner Losh
96b523613c Add PNP info to PCI attachment of bwn driver
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:13 +00:00
Warner Losh
769ac9e65b Add PNP info to PCI attachment of an driver
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:09 +00:00
Warner Losh
491589f3af Add PNP info to the PCI attachment of the ahci driver
Mark the PNP table, but still need to handle the CLASS / SUBCLASS /
REVID matching.

Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:04 +00:00
Warner Losh
c3f3f3e648 Add PNP info to the PCI attachment of the aacraid driver.
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:25:00 +00:00
Warner Losh
791a8cbbe6 Add PNP info to the PCI attachment of the ncr driver.
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-13 20:24:49 +00:00
Ryan Libby
a7be368aec i386: copyin/copyout error is EFAULT
Discussed with:	kib
MFC with:	r332489
Sponsored by:	Dell EMC Isilon
2018-06-13 19:57:03 +00:00
Konstantin Belousov
d1a07e31e5 Enable eager FPU context switch by default on amd64.
With compilers making increasing use of vector instructions the
performance benefit of lazily switching FPU state is no longer a
desirable tradeoff.  Linux switched to eager FPU context switch some
time ago, and the idea was floated on the FreeBSD-current mailing list
some years ago[1].

Enable eager FPU context switch by default on amd64, with a tunable/sysctl
available to turn it back off.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055198.html

Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2018-06-13 17:55:09 +00:00
Jonathan T. Looney
0766f278d8 Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
Warner Losh
a971acbc25 Implement a 'car limit' for bioq.
Allow one to implement a 'car limit' for
bioq_disksort. debug.bioq_batchsize sets the size of car limit. Every
time we queue that many requests, we start over so that we limit the
latency for requests when the software queue depths are large. A value
of '0', the default, means to revert to the old behavior.

Sponsored by: Netflix
2018-06-13 16:48:07 +00:00
Andrew Turner
4e050d14e0 Add ThunderX2 to the list of CPUs we need to apply the branch predictor
hardening to.

Sponsored by:	DARPA, AFRL
2018-06-13 15:58:33 +00:00
Andrew Turner
3c4dad8812 Switch to the SMCCC function for branch predictor hardening. The previous
method may not have worked as the firmware checks for the ARCH_WORKAROUND_1
function ID.

Sponsored by:	DARPA, AFRL
2018-06-13 15:56:24 +00:00
Andrew Turner
09d1a08ddc Add the SMCCC return codes from ARM DEN 0070A.
While here add a comment with the document the function IDs come from.

Sponsored by:	DARPA, AFRL
2018-06-13 15:41:22 +00:00
Andrew Turner
f651b52527 Add support for the ARM SMC Calling Convention (SMCCC). This is a method
to call into the firmware in a similar way to the existing PSCI, and used
PSCI to detect when SMCCC is enabled.

There is a function ID space we can use. Currently we only support 3
functions in the ARM Architecture Calls region, however it is expected we
will expend these in the future.

Sponsored by:	DARPA, AFRL
2018-06-13 15:32:00 +00:00
Andrew Turner
9e8cb3d226 Move psci_call to a header file so we can use it in other files to
communicate with the firmware.

Sponsored by:	DARPA, AFRL
2018-06-13 15:24:07 +00:00
Alan Somers
ebc0d5599d audit(4): fix the definition of ARG_TERMID_ADDR
Due to a copy/paste error in r168688, ARG_TERMID_ADDR has the same
definition as ARG_SADDRUNIX.  Fix it.

The header change, while publicly visible, is guarded by #ifdef KERNEL, and
I can't find any kmod ports that use it.  So I'm not bumping
__FreeBSD_version.

PR:		228820
Submitted by:	aniketp
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15702
2018-06-13 14:55:31 +00:00
Bruce Evans
407a812657 Oops, r335053 had an old version of the comment about 16-bit linux dev_t
translation.
2018-06-13 12:44:45 +00:00
Andrew Turner
5add83935a Add a handler for the PSCI_FEATURES function. This needs PSCI 1.0, so
check for this, returning an error if the version is too old.

Sponsored by:	DARPA, AFRL
2018-06-13 12:33:47 +00:00
Andrew Turner
4493861b9a Find and cache the PSCI version on driver attach.
Sponsored by:	DARPA, AFRL
2018-06-13 12:32:04 +00:00
Andrew Turner
811880f2a4 Add the PSCI_FEATURES function ID. This is found in PSCI 1.0 and is used
to query if a given function is implemented and its features.

Sponsored by:	DARPA, AFRL
2018-06-13 12:26:37 +00:00
Bruce Evans
ab35e1c71b Fix the encoding of major and minor numbers in 64-bit dev_t by restoring
the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers.  This
change breaks compatibility with the previous encoding (which was only
used in -current).

Fix truncation to (essentially) 16-bit dev_t in newnfs v3.

Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility.  Extra bits give the much
larger complication that the translations need to compress into fewer
bits.  Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.

The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings.  E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2.  But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102.  ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.

The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them.  I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current.  E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.

I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation.  So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set.  Add some error checking for when bits are lost.  Keep not
doing any error checking for translations for almost everything in
compat/linux.

compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.

compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.

fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself.  Also fix nearby style bugs.

kern/vfs_syscalls.c:
As for freebsd32.  Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.

sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros.  Describe the encoding and
some of the reasons for it.  16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding.  My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits.  This minimizes discontiguities.

Reviewed by:	kib (except for rewrite of the comment in linux_stats.c)
2018-06-13 12:22:00 +00:00
Andrew Turner
8b47c1ae54 Rename the ThunderX CPU identification macros to include the X. This is the
name people know the product by, and is consistent with the later SoC ID
macros.

Sponsored by:	DARPA, AFRL
2018-06-13 12:17:11 +00:00
Andrew Turner
0014ef8a04 Add more Cavium CPU part numbers.
While here split the lists by vendor.

Sponsored by:	DARPA, AFRL
2018-06-13 11:58:41 +00:00
Andrey V. Elsukov
a5185adeb6 Rework if_gre(4) to use encap_lookup_t method to speedup lookup
of needed interface when many gre interfaces are present.

Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations. Use hash table to
speedup lookup of needed softc.
2018-06-13 11:11:33 +00:00
Ruslan Bukin
b626c976dc Don't jump to VA space until kernel is ready.
This fixes the race when first core sets up the pagetables, while
secondary cores do translating the address of __riscv_boot_ap.

This now allows us to smpboot in QEMU with 8 cores just fine.

Sponsored by:	DARPA, AFRL
2018-06-13 10:32:21 +00:00
Bruce Evans
372639f944 Fix some bugs found while fixing the representation and translation
of 64-bit dev_t's (but not ones involving dev_t's).

st_size was supposed to be clamped in cvtstat() and linux's copy_stat(),
but the clamping code wasn't aware that st_size is signed, and also had
an obfuscated off-by-1 value for the unsigned limit, so its effect was
to produce a bizarre negative size instead of clamping.

Change freebsd32's copy_ostat() to be no worse than cvtstat().  It was
missing clamping and bzero()ing of padding.

Reviewed by:	kib (except a final fix of the clamp to the signed maximum)
2018-06-13 08:50:43 +00:00
Dimitry Andric
2b6fe1b2da Fix build of liquidio with base gcc on i386
Some casts from pointers to uint64_t and back in lio_main.c cause base
gcc on i386 to warn "cast from pointer to integer of different size",
and vice versa.  Add additional casts to uintptr_t to suppress these.

Reviewed by:	sbruno
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D15754
2018-06-13 07:55:57 +00:00
Marcelo Araujo
ebc3c37c6f Add SPDX tags to vmm(4).
MFC after:	4 weeks.
Sponsored by:	iXsystems Inc.
2018-06-13 07:02:58 +00:00
Matt Macy
483305b99c Handle INP_FREED when looking up an inpcb
When hash table lookups are not serialized with in_pcbfree it will be
possible for callers to find an inpcb that has been marked free. We
need to check for this and return NULL.
2018-06-13 04:23:49 +00:00
Randall Stewart
c9b4ac7587 This fixes missing VNET sets in the hpts system. Basically
without this and running vnets with a TCP stack that uses
some of the features is a recipe for panic (without this commit).

Reported by:	Larry Rosenman
Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15757
2018-06-12 23:54:08 +00:00
Matt Macy
700e893c34 Defer inpcbport free in in_pcbremlists as well 2018-06-12 23:26:25 +00:00
Jung-uk Kim
6362b1a6b1 Fix number of auxargs entries to copy out for 32-bit Linuxulator.
PR:		228790
2018-06-12 22:54:48 +00:00
Rick Macklem
7e4595e82b Version bump since r334930 changed the interface between the NFS modules,
so they all need to be rebuilt.
2018-06-12 22:48:19 +00:00
Matt Macy
f09ee4fc01 Defer inpcbport free until after a grace period has elapsed
This is a dependency for inpcbinfo rlock conversion to epoch
2018-06-12 22:18:27 +00:00
Matt Macy
b872626dbe mechanical CK macro conversion of inpcbinfo lists
This is a dependency for converting the inpcbinfo hash and info rlocks
to epoch.
2018-06-12 22:18:20 +00:00
Matt Macy
addf2b2009 Defer inpcb deletion until after a grace period has elapsed
Deferring the actual free of the inpcb until after a grace
period has elapsed will allow us to convert the inpcbinfo
info and hash read locks to epoch.

Reviewed by: gallatin, jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15510
2018-06-12 22:18:15 +00:00
Emmanuel Vadot
25f0326aea simplebus pnp: Do not generate pnp info is the bus status is not okay
Generating the pnp info have the side effect to include all nodes even
if the status isn't "okay".
That means that loading the module will load but not attach as it checks
the status in the probe function.

On pine64 before :
root@pine64-lts:~ # devmatch -u
unattached on ofwbus pnpinfo name=memory
unattached on ofwbus pnpinfo name=chosen
unattached on ofwbus pnpinfo name=sound_spdif compat=simple-audio-card
unattached on ofwbus pnpinfo name=spdif-out compat=linux,spdif-dit
unattached on simplebus pnpinfo name=dma-controller@1c02000 compat=allwinner,sun50i-a64-dma
unattached on simplebus pnpinfo name=mmc@1c10000 compat=allwinner,sun50i-a64-mmc
unattached on simplebus pnpinfo name=usb@1c19000 compat=allwinner,sun8i-a33-musb
unattached on simplebus pnpinfo name=spdif@1c21000 compat=allwinner,sun50i-a64-spdif
unattached on simplebus pnpinfo name=i2s@1c22000 compat=allwinner,sun50i-a64-i2s
unattached on simplebus pnpinfo name=i2s@1c22400 compat=allwinner,sun50i-a64-i2s
unattached on simplebus pnpinfo name=serial@1c28400 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=serial@1c28800 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=serial@1c28c00 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=serial@1c29000 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=i2c@1c2ac00 compat=allwinner,sun6i-a31-i2c
unattached on simplebus pnpinfo name=i2c@1c2b000 compat=allwinner,sun6i-a31-i2c
unattached on simplebus pnpinfo name=i2c@1c2b400 compat=allwinner,sun6i-a31-i2c
unattached on ofwbus pnpinfo name=aliases
unattached on ofwbus pnpinfo name=symbols

All simplebus node are disabled

After :
root@pine64-lts:~ # devmatch -u
unattached on ofwbus pnpinfo name=memory
unattached on ofwbus pnpinfo name=chosen
unattached on ofwbus pnpinfo name=sound_spdif compat=simple-audio-card
unattached on ofwbus pnpinfo name=spdif-out compat=linux,spdif-dit
unattached on simplebus pnpinfo name=dma-controller@1c02000 compat=allwinner,sun50i-a64-dma
unattached on simplebus pnpinfo name=usb@1c19000 compat=allwinner,sun8i-a33-musb
unattached on ofwbus pnpinfo name=aliases
unattached on ofwbus pnpinfo name=symbols

Reviewed by:	imp (with some objection)
Differential Revision:	https://reviews.freebsd.org/D15770
2018-06-12 20:03:00 +00:00
Breno Leitao
5ecc8c2077 powerpc64/powernv: Avoid type promotion
There is a type promotion that transform count = -1 into a unsigned int causing
the default TCE SEG SIZE not being returned on a Boston POWER9 machine.

This machine does not have the 'ibm,supported-tce-sizes' entries, thus, count
is set to -1, and the function continue to execute instead of returning.

Reviewed by: jhibbits, wma
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15763
2018-06-12 19:50:33 +00:00
Rick Macklem
90d2dfab19 Merge the pNFS server code from projects/pnfs-planb-server into head.
This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".

Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write oeprations from all the other NFSv4.1
Metadata operations. It is hoped that this separation allows a pNFS service
to be configured that exceeds the limits of a single NFS server for either
storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so that
the data storage file for an MDS file will be mirrored on two or more of
the DSs.
When this is used, failure of a DS will not stop the pNFS service and a
failed DS can be recovered once repaired while the pNFS service continues
to operate.  Although two way mirroring would be the norm, it is possible
to set a mirroring level of up to four or the number of DSs, whichever is
less.
The Metadata server will always be a single point of failure,
just as a single NFS server is.

A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree on the MDS directly
on the MDS server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attrbute stores the information that
    the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
    and Change attributes for the file, so that the MDS doesn't need to
    acquire the attributes from the DS for every Getattr operation.
For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories.  The name of this file is the file handle
of the file on the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger.
It can be increased when the "nfsd" daemon is not running on the MDS,
once the "dsK" directories are created.

For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allows it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
             is. The critical bits of information returned by the FreeBSD
             server is the IP address of the DS and, for the Flexible
             File layout, that NFSv4.1 is to be used and that it is
             "tightly coupled".
             There is a "deviceid" which identifies the DeviceInfo.
Layout     - This is per file and can be recalled by the server when it
             is no longer valid. For the FreeBSD server, there is support
             for two types of layout, call File and Flexible File layout.
             Both allow the client to do I/O on the DS via NFSv4.1 I/O
             operations. The Flexible File layout is a more recent variant
             that allows specification of mirrors, where the client is
             expected to do writes to all mirrors to maintain them in a
             consistent state. The Flexible File layout also allows the
             client to report I/O errors for a DS back to the MDS.
             The Flexible File layout supports two variants referred to as
             "tightly coupled" vs "loosely coupled". The FreeBSD server always
             uses the "tightly coupled" variant where the client uses the
             same credentials to do I/O on the DS as it would on the MDS.
             For the "loosely coupled" variant, the layout specifies a
             synthetic user/group that the client uses to do I/O on the DS.
             The FreeBSD server does not do striping and always returns
             layouts for the entire file. The critical information in a layout
             is Read vs Read/Writea and DeviceID(s) that identify which
             DS(s) the data is stored on.

At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
are done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.

Tested by:	james.rose@framestore.com
Relnotes:	yes
2018-06-12 19:36:32 +00:00
Ruslan Bukin
6fdc57357e Include VirtIO devices to the GENERIC configuration file.
These are now available in QEMU/RISC-V.

Sponsored by:	DARPA, AFRL
2018-06-12 17:55:40 +00:00
Ruslan Bukin
2d53a67c2c o Add driver for PLIC (Platform-Level Interrupt Controller) device.
o Convert interrupt machdep support to use INTRNG code.

Sponsored by:	DARPA, AFRL
2018-06-12 17:45:15 +00:00
Ruslan Bukin
ebdf0baf3a Add simplebus-like RISC-V SoC bus.
This is required in order to probe and attach devices described under
"riscv-virtio-soc" node of DTS.

Sponsored by:	DARPA, AFRL
2018-06-12 17:07:30 +00:00
Ruslan Bukin
f2e299880a Release secondary cores from WFI (wait for interrupt) by sending them
an IPI.

This does not work however yet in QEMU. As a temporary workaround set
software interrupt pending bit manually on a local core to ensure WFI
doesn't halt the hart.

This is required to smpboot in QEMU.

Sponsored by:	DARPA, AFRL
2018-06-12 16:47:33 +00:00
Ruslan Bukin
a9063ba1d7 Align virtual addressing entries.
This is required due to C-compressed ISA extension option being turned on.

This fixes SMP operation in QEMU.

Sponsored by:	DARPA, AFRL
2018-06-12 16:19:27 +00:00
Andrew Turner
0c38b2d37c Rework PSCI so it only searches for the call function once.
This is in preperation for supporting newer smccc functions that also use
the same call method.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D15745
2018-06-12 14:54:17 +00:00
Ed Maste
0f69696824 linux64: use linux output target for linux_vdso.so
linux_vdso.so provides the vdso for the linuxulator's amd64 target and
is mapped into a Linux binary's address space.  Thus it should be a
Linux-style .so, which has the ELF OS/ABI unset.

It turns out that ELF Tool Chain elfcopy/objcopy also has a bug where
the OS/ABI field is unset, regardless of the specified --output-target,
so this change is a no-op with the default in-tree toolchain.  This is a
real fix when using external binutils, and the ELF Tool Chain bug will
be fixed in the future.

PR:		228934
Sponsored by:	Turing Robotic Industries
2018-06-12 13:32:42 +00:00
Diane Bruce
5bede50958 Add a driver for the BCM2835 Mini-UART as seen on the RPi3
Reviewed by:	andrew
Approved by:	andrew
Differential Revision:	https://reviews.freebsd.org/D15684
2018-06-12 13:26:31 +00:00
Emmanuel Vadot
e34425be26 arm64: rockchip: Correctly set armclk
Parent needs to be the same frequency as the armclk, not twice the freq.
The real divider is incremented by one so write it with - 1
The rate can be at index 0

Pointy Hat To: myself
2018-06-12 11:47:21 +00:00
Konstantin Belousov
6ee7d5afcc All exceptions IDT descriptors must use interrupt gates on 4/4 kernel.
Fix it for #MF.

Noted by:	rlibby
Sponsored by:	The FreeBSD Foundation
2018-06-12 10:43:20 +00:00
Konstantin Belousov
7a18e90447 Fix typo.
Sponsored by:	The FreeBSD Foundation
2018-06-12 10:41:26 +00:00
Hans Petter Selasky
986e3bed8b Implement the ip_eth_mc_map() function in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-06-12 08:43:49 +00:00
Navdeep Parhar
1bb577b4c2 cxgbe(4): Remove homemade version of htobe32 from the driver.
It was needed only for ia64 where it was implemented as a call to
bswapXX, which was always a real function.  htobeXX with a constant
argument is calculated at compile-time everywhere else.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-06-12 06:46:03 +00:00
Jonathan T. Looney
16a227c7c9 Fix a memory leak for the BIOCSETWF ioctl on kernels with the BPF_JITTER
option.

The BPF code was creating a compiled filter in the common filter-creation
path.  However, BPF only uses compiled filters in the read direction.
When creating a write filter, the common filter-creation code was
creating an unneeded write filter and leaking the memory used for that.

MFC after:	2 weeks
Sponsored by:	Netflix
2018-06-11 23:32:06 +00:00
Ed Maste
2c8cf0c505 if_muge: retire lan78xx_eeprom_read
lan78xx_eeprom_read just checked for EEPROM presence then called
lan78xx_eeprom_read_raw if present, and had only one caller.  Introduce
lan78xx_eeprom_present to check for EEPROM presence, and use it in the
one place it is needed.

This is used by r334964, which was accidentally committed out-of-order
from my work tree.

Reported by:	markj
Sponsored by:	The FreeBSD Foundation
2018-06-11 19:34:47 +00:00
Rick Macklem
73b1879c2d Add a couple of safety belt checks to the NFSv4.1 client related to sessions.
There were a couple of cases in newnfs_request() that it assumed that it
was an NFSv4.1 mount with a session. This should always be the case when
a Sequence operation is in the reply or the server replies NFSERR_BADSESSION.
However, if a server was broken and sent an erroneous reply, these safety
belt checks should avoid trouble.
The one check required a small tweak to nfsmnt_mdssession() so that it
returns NULL when there is no session instead of the offset of the field
in the structure (0x8 for i386).
This patch should have no effect on normal operation of the client.
Found by inspection during pNFS server development.

MFC after:	2 weeks
2018-06-11 19:00:07 +00:00
Ed Maste
00ce0c6258 makesyscalls: simplify capenabled pipeline
Replace cat + 2x grep with one grep.

Sponsored by:	Turing Robotic Industries
2018-06-11 18:57:40 +00:00
Ed Maste
2d14fb8bec if_muge: add LAN7850 support
Differences between LAN7800 and LAN7850 from the driver's perspective:

* The LAN7800 muxes EEPROM signals with LEDs, so LED mode needs to be
  disabled when reading/writing EEPROM.  The EEPROM is not muxed on the
  LAN7850.

* The Linux driver enables automatic duplex and speed detection when
  there is no EEPROM, for the LAN7800 only.  With this FreeBSD driver
  LAN7850-based adapters without a configuration EEPROM fail to link
  (with or without the automatic duplex and speed detection code), so
  I have just followed the example of the Linux driver for now.

Sponsored by:	The FreeBSD Foundation
Sponsored by:	Microchip (hardware)
2018-06-11 18:44:56 +00:00
Matt Macy
0ea9d9376e limit change to fixing controlp handling pending review 2018-06-11 17:10:19 +00:00
Matt Macy
c34bf30069 soreceive_stream: correctly handle edge cases
- non NULL controlp is not an error, returning EINVAL
  would cause X forwarding to fail

- MSG_PEEK and MSG_WAITALL are fairly exceptional, but we still
  want to handle them - punt to soreceive_generic
2018-06-11 16:31:42 +00:00
Mark Johnston
a9336cef39 Use the cached curthread reference in pmc_process_interrupt().
Fix indentation while here.
2018-06-11 16:27:09 +00:00
Hans Petter Selasky
35555d474b Implement the kstrtobool() and kstrtobool_from_user() functions
in the LinuxKPI.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-11 16:26:33 +00:00
Hans Petter Selasky
93ed9ab20b Implement the user_access_begin(), user_access_end(), usafe_get_user() and
unsafe_put_user() function macros in the LinuxKPI.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-11 15:42:29 +00:00
Konstantin Belousov
b45e10c3f4 Fix braino in r334799. Maxmem is in pages.
Reported by:	ae, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-06-11 15:28:20 +00:00
Jonathan T. Looney
cff21e484b Change RACK dependency on TCPHPTS from a build-time dependency to a load-
time dependency.

At present, RACK requires the TCPHPTS option to run. However, because
modules can be moved from machine to machine, this dependency is really
best assessed at load time rather than at build time.

Reviewed by:	rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15756
2018-06-11 14:27:19 +00:00
Dimitry Andric
48f00bafa1 Fix build of bxe with base gcc on i386
Casting from rman_res_t to a pointer results in "cast to pointer from
integer of different size" warnings with base gcc on i386, so print
these without casting.  The kva field of struct bxe_bar is of type
vm_offset_t, which can be 32 or 64 bit, so cast it to uintmax_t before
printing.

Reviewed by:	markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D15733
2018-06-11 10:08:22 +00:00
Dimitry Andric
ee3d52d730 Disable building aesni with base gcc
Because base gcc does not support the required intrinsics, do not
attempt to compile the aesni module with it.

Noticed by:	Dan Allen <danallen46@gmail.com>
MFC after:	3 days
2018-06-11 08:42:03 +00:00
Dimitry Andric
a0e8aab7d8 Fix build of i915kms with base gcc
Base gcc fails to compile sys/dev/drm2/i915/intel_display.c for i386,
with the following -Werror warnings:

cc1: warnings being treated as errors
/usr/src/sys/dev/drm2/i915/intel_display.c:8884: warning:
initialization from incompatible pointer type

This is due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36432, which
incorrectly interprets the [] as a flexible array member.

Because base gcc does not have a -W flag to suppress this particular
warning, it requires a rather ugly cast.  To not influence any other
compiler, put it in a #if/#endif block.

Reviewed by:	kib
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D15744
2018-06-11 08:11:35 +00:00
Dimitry Andric
bc5ea07c09 Fix build of ocs_fs with base gcc on i386
Add a few intermediate casts to uintptr_t to suppress "cast to pointer
from integer of different size" warnings from gcc.  Also remove a few
incorrect casts.

Reviewed by:	ram
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D15747
2018-06-11 07:57:32 +00:00
Eitan Adler
0642b65f38 usbdevs: sort my prior commit 2018-06-11 05:28:00 +00:00
Eitan Adler
876713d563 usbdevs: adding vendor
PR:		228856
Reported by:	hrs, ys-h@imail.earth
2018-06-11 05:27:07 +00:00
Andrew Turner
619e50a657 Remove the psci option from arm64. It is now a standard option as it is
required to boot correctly.

Sponsored by:	DARPA, AFRL
2018-06-10 19:42:44 +00:00
Eitan Adler
c8141b92e9 Revert r334929
Apparently some software might depend on a header whose sole contents is
a `#warning` to remove it. Revert pending exp-run.
2018-06-10 19:15:38 +00:00
Rick Macklem
8097753476 Add checks for the Flexible File layout to LayoutRecall callbacks.
The Flexible File layout case wasn't handled by LayoutRecall callbacks
because it just checked for File layout and returned NFSERR_NOMATCHLAYOUT
otherwise. This patch adds the Flexible File layout handling.
Found during testing of the pNFS server.

MFC after:	2 weeks
2018-06-10 19:03:21 +00:00
Eitan Adler
7f7bc5b2f9 include: remove sys/capability.h
This file has only generated a warning for the last 18 months. Its
existence at this point only serves to confuse software looking for
POSIX.1e capabilities and produce actionless warnings.
2018-06-10 18:38:48 +00:00
Andrew Turner
dc9b99a884 Clean up handling of unexpected exceptions. Previously we would issue a
breakpoint instruction, however this would lose information that may be
useful for debugging.

These are now handled in a similar way to other exceptions, however it
won't exit out of the exception handler until it is known if we can
handle these exceptions in a useful way.

Sponsored by:	DARPA, AFRL
2018-06-10 16:21:21 +00:00
Bruce Evans
3cd246d9a9 Untangle configuration ifdefs a little. On x86, msi is optional on pci,
and also on apic in common and i386 files (except for xen it is optional
only on xenhvm), but it was not ifdefed except on apic in common and i386
files.

This is all that is left from an attempt to build a (sub-)minimal kernel
without any devices.  The isa "option" is still used without ifdefs in many
standard files even on amd64.  ISAPNP is not optional on at least i386.
ATPIC is not optional on i386 (it is used mainly for Xspuriousint).  But
pci is now supposed to be optional on x86.
2018-06-10 14:49:13 +00:00
Bruce Evans
09e3c9a4ec Fix panics in potentially all x86bios calls on i386 since r332489.
A call to npxsave() in the exception trampolines was not relocated.
This call to a garbage address usually paniced when made, but it is only
made when the thread has used an FPU recently, and this is not the usual
case.

PR:		228755
Reviewed by:	kib
2018-06-10 14:21:01 +00:00
Vladimir Kondratyev
67580198b7 Drop MOUSE_GETVARS and MOUSE_SETVARS ioctls support.
These ioctls are not documented and only stubbed in a few drivers: mse(4),
psm(4) and syscon's sysmouse(4). The only exception is MOUSE_GETVARS
implemented in psm(4)

Given the fact that they were introduced 20 years ago and implementation
has never been completed, remove any related code.

PR:		228718 (exp-run)
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D15726
2018-06-10 10:23:31 +00:00
Rick Macklem
be9d155ff4 Delete some macros that are unused.
These macros were added because they were used by the pNFS server last
year. However, they are no longer used by the pNFS server code and
might as well be deleted.
This is a partial reversion of r326735.
2018-06-09 23:38:22 +00:00
Rick Macklem
d506aa140d Delete an unused macro and clean up a comment about it.
NFSDEV_MIRRORSTR was defined for the pNFS server, but has not been used,
so this patch deletes it. It also cleans up the comment and hopefully
makes it more readable.
2018-06-09 23:14:59 +00:00
Mark Johnston
8590505f48 Bump __FreeBSD_version after r334881 and force libdwarf to be rebuilt.
Reported by:	O. Hartmann <ohartmann@walstatt.org>
Reviewed by:	bdrewery
2018-06-09 20:01:03 +00:00
Mark Johnston
f090f67503 Tell the compiler that rdtscp clobbers %ecx. 2018-06-09 18:31:19 +00:00
Andrew Turner
c11862c030 In the ThunderX BGX network driver we were skipping the NULL terminator
when parsing the phy type, however this is included in the length returned
by OF_getprop. To fix this stop ignoring the terminator.

PR:		228828
Reported by:	sbruno
Sponsored by:	DARPA, AFRL
2018-06-09 14:47:49 +00:00
Kristof Provost
0b799353d8 pf: Fix deadlock with route-to
If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
a different interface it's passed through the firewall again. This meant we
lost the inp pointer and if we required the pointer (e.g. for user ID matching)
we'd deadlock trying to acquire an inp lock we've already got.

Pass the inp pointer along with pf_route()/pf_route6().

PR:		228782
MFC after:	1 week
2018-06-09 14:17:06 +00:00
Andrey V. Elsukov
44bcc06816 Explicitly change the link state when we assingn an address.
Since we are setting IFF_UP flag on SIOCSIFADDR, it is possible, that
after this link state information still not initialized properly.
This leads to problems with routing, since now interface has
IFCAP_LINKSTATE capability and a route is considered as working only
when interface's link state is in LINK_STATE_UP (see RT_LINK_IS_UP()
macro).

Reported by:	Marek Zarychta
MFC after:	3 days
2018-06-09 09:57:14 +00:00
Mateusz Guzik
0001edb823 counter: add a bit missed in r334858
It happens to be a noop.
2018-06-08 22:06:32 +00:00
Stephen Hurd
3ab4a96085 Remove tx task spinning added in r333686
This caused issues with PASTE.  Just remove the reschedule since the DELAY()
should be enough for use cases such as pkt-gen which were failing before the
change.

Reported by:	Michio Honda
Sponsored by:	Limelight Networks
2018-06-08 21:49:19 +00:00
Mateusz Guzik
4e180881ae uma: implement provisional api for per-cpu zones
Per-cpu zone allocations are very rarely done compared to regular zones.
The intent is to avoid pessimizing the latter case with per-cpu specific
code.

In particular contrary to the claim in r334824, M_ZERO is sometimes being
used for such zones. But the zeroing method is completely different and
braching on it in the fast path for regular zones is a waste of time.
2018-06-08 21:40:03 +00:00
Matt Macy
4f63fbc955 hwpmc: remove dangling references to hwpmc_xscale
Reported by:	mjg
2018-06-08 20:39:49 +00:00
Tycho Nightingale
4d20e87b7e Don't bother looking for non-executable pages when a process is
excluded from PTI.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15708
2018-06-08 20:35:58 +00:00
Matt Macy
a62b4665f4 AF_UNIX: bring uipc_ready in compliance with new locking protocol
PR:	228742
Submitted by: markj
Reviewed by:	markj
2018-06-08 20:31:59 +00:00
Jonathan T. Looney
1fbe13cf4b Add a socket destructor callback. This allows kernel providers to set
callbacks to perform additional cleanup actions at the time a socket is
closed.

Michio Honda presented a use for this at BSDCan 2018.
(See https://www.bsdcan.org/2018/schedule/events/965.en.html .)

Submitted by:	Michio Honda <micchie at sfc.wide.ad.jp> (previous version)
Reviewed by:	lstewart (previous version)
Differential Revision:	https://reviews.freebsd.org/D15706
2018-06-08 19:35:24 +00:00
Matt Macy
239f5f541b hwpmc: yet another missed fixup 2018-06-08 18:54:47 +00:00
Konstantin Belousov
8d59ab652b Restore release semantic for the old thread unlock on arm64.
With the introduction of pmap_switch(), the DSB instruction on the
address map switch is not necessary executed, which is fixed by
changing the unlock store to release.  Also remove comment which
documented pre-pmap_switch() code.

Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-06-08 18:32:26 +00:00
Matt Macy
6272d7e647 hwpmc: remove hwpmc_xscale.c from corresponding arm build 2018-06-08 18:24:46 +00:00
Adrian Chadd
8a97beff98 [ath_hal] Return failure if noise floor calibration fails.
If we fail noise floor calibration then we may end up with a deaf NIC
which we can't recover without a full chip reset.

Earlier chips seem to get less stuck in this condition versus AR9280/later
and AR9300/later, but whilst here just fix up the AR5212 era chips to also
return NF calibration failures.

This HAL routine would only return failure if the channel was not configured.

This is a no-op until the driver side code for doing resets and the HAL
code for being told about the reset type (and then handling it!) is
implemented.

Tested:

* AR9280, STA mode
* AR2425, STA mode
* AR9380, STA mode
2018-06-08 18:21:57 +00:00
Adrian Chadd
7b1c2c4ec2 [ath_hal] Don't do ANI processing if we've reset.
If we've reset then we can't trust the current state of the ANI tracking,
so just wait until next time.

Tested:

* AR5424, STA mode (2GHz)
2018-06-08 18:15:23 +00:00
Matt Macy
7bca795ee0 hwpmc: retire never completed xscale support
hwpmc xscale support is not actually functional and the
architecture is well past its shelf life.
2018-06-08 18:09:19 +00:00
Matt Macy
d73912e57a hwpmc: update files missed by r334827 2018-06-08 17:41:49 +00:00
Sean Eric Fagan
69724399c4 This originated from ZFS On Linux, as
d4a72f2386

During scans (scrubs or resilvers), it sorts the blocks in each transaction
group by block offset; the result can be a significant improvement. (On my
test system just now, which I put some effort to introduce fragmentation into
the pool since I set it up yesterday, a scrub went from 1h2m to 33.5m with the
changes.) I've seen similar rations on production systems.

Approved by:	Alexander Motin
Obtained from:	ZFS On Linux
Relnotes:	Yes (improved scrub performance, with tunables)
Differential Revision:	https://reviews.freebsd.org/D15562
2018-06-08 17:38:28 +00:00
Matt Macy
3db28e6656 avoid 'tcp_outflags defined but not used' 2018-06-08 17:37:49 +00:00
Matt Macy
afbd6cfa72 hpts: remove redundant decl breaking gcc build 2018-06-08 17:37:43 +00:00
Matt Macy
46033610ec unbreak LINT build after r334804 2018-06-08 05:48:36 +00:00
Matt Macy
7f5336f666 hwpmc: fix arm64 INVARIANTS build 2018-06-08 05:48:28 +00:00
Mateusz Guzik
b8af2820f6 uma: fix up r334824
Turns out there is code which ends up passing M_ZERO to counters.
Since counters zero unconditionally on their own, just ignore drop the
flag in that place.
2018-06-08 05:40:36 +00:00
Matt Macy
58378a8971 rtentry_zinit: don't blindly pass through M_ZERO to counter alloc 2018-06-08 05:17:06 +00:00
Matt Macy
978910109d hwpmc: avoid undefined variable on LINT 2018-06-08 05:01:09 +00:00
Matt Macy
eb7c901995 hwpmc: simplify calling convention for hwpmc interrupt handling
pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.

While facially a reasonable cleanup this change was motivated
by the need to workaround a compiler bug.

core2_intr(cpu, tf) ->
  pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
    pmc_add_sample(cpu, ring, pm, tf, inuserspace)

In the process of optimizing the tail call the tf pointer was getting
clobbered:

(kgdb) up
    at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709                                pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205                    error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,

resulting in a crash in pmc_save_kernel_callchain.
2018-06-08 04:58:03 +00:00
Mateusz Guzik
dfa5753e09 amd64: remove now unused bzero, bcmp and bcopy. move pagecopy higher up. 2018-06-08 04:18:42 +00:00
Mateusz Guzik
ea99223ec9 uma: remove M_ZERO support for pcpu zones
Nothing in the tree uses it and pcpu zones have a fundamentally different use
case than the regular zones - they are not supposed to be allocated and freed
all the time.

This reduces pollution in the allocation fast path.
2018-06-08 03:16:16 +00:00
Mateusz Guzik
c9ca1a70cc amd64: fix a retarded bug in memset
memset fills the target buffer from a byte-sized value passed in as the
second argument.

The fully-sized (8 bytes) register containing it is named %rsi. Lower 4 bytes
can be referred to as %esi and finally the lowest byte is %sil.

Vast majority of all the callers just zero the target buffer and set it up by
doing xor %esi,%esi which has a side-effect of zeroing the upper parts of
the register as well. Some others do a word-sized move to %esi which has the
same result.

However, there are callers which only fill %sil. This does *not* clear up
the rest of the register.

The value of %rsi is multiplied by $0x0101010101010101 to create a 8-byte sized
pattern for 8-byte stores.

Prior to the patch, the func just blindly took %rsi assuming the unwanted bytes
are zeroed out. Since this is not the case for the callers which only play with
%sil (the rest of the register can have absolutely anything), the resulting
pattern can be garbage.

This has potential for funny bugs. One side effect (which was not amusing)
after enabling it instead of bzero was that the kernel was hanging on boot
as a xen domU.

Reported by:	Trond Endrestøl <Trond.Endrestol fagskolen.gjovik.no>
Pointy hat: me
2018-06-08 00:47:24 +00:00
Gleb Smirnoff
c5deaf0452 UMA memory debugging enabled with INVARIANTS consists of two things:
trashing freed memory and checking that allocated memory is properly
trashed, and also of keeping a bitset of freed items. Trashing/checking
creates a lot of CPU cache poisoning, while keeping debugging bitsets
consistent creates a lot of contention on UMA zone lock(s). The performance
difference between INVARIANTS kernel and normal one is mostly attributed
to UMA debugging, rather than to all KASSERT checks in the kernel.

Add loader tunable vm.debug.divisor that allows either to turn off UMA
debugging completely, or turn it on only for a fraction of allocations,
while still running all KASSERTs in kernel. That allows to run INVARIANTS
kernels in production environments without reducing load by orders of
magnitude, but still doing useful extra checks.

Default value is 1, meaning debug every allocation. Value of 0 would
disable UMA debugging completely. Values above 1 enable debugging only
for every N-th item. It isn't possible to strictly follow the number,
but still amount of debugging is reduced roughly by (N-1)/N percent.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D15199
2018-06-08 00:15:08 +00:00
Breno Leitao
6d645c57a3 Fix excise_initrd_region() to support 32- and 64-bit initrd params.
Changed excise_initrd_region to support both 32- and 64-bit
values for linux,initrd-start and linux,initrd-end.

This fixes the boot problem on some machines after rS334485.

Submitted by: Luis Pires <lffpires@ruabrasil.org>
Reviewed by: jhibbits, leitao
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15667
2018-06-07 21:24:21 +00:00
Randall Stewart
401e870791 Take out the stack alias inadvertantly added by my commit.
Reported by:	Peter Lei
2018-06-07 20:57:12 +00:00
Randall Stewart
12693c6c83 Fix build issue with const and volatile and the
myriad ways that the various compliers treat this. The
only safe prefetch appears to be for AMD. The other
compilers either are not volatile or are not const :(

Reported by:	Michael Tuexen
2018-06-07 19:57:55 +00:00
Benno Rice
b3b11d6400 Break recursion involving getnewvnode and zfs_rmnode.
When we're at our vnode limit, getnewvnode will call into the vnode LRU
cache to free up vnodes. If the vnode we try to recycle is a ZFS vnode we
end up, eventually, in zfs_rmnode. If the ZFS vnode we're recycling
represents something with extended attributes, zfs_rmnode will call
zfs_zget which will attempt to allocate another vnode. If the next vnode we
try to recycle is also a ZFS vnode representing something with extended
attributes we can recurse further. This ends up being unbounded and can end
up overflowing the stack.

In order to avoid this, restructure zfs_rmnode to simply add the extended
attribute directory's object ID to the unlinked set, thus not requiring the
allocation of a vnode. We then schedule a task that calls zfs_unlinked_drain
which will do the work of properly marking the vnodes for unlinking.
zfs_unlinked_drain is also called on mount so these will be cleaned up
there.

Reviewed by:	avg, mav
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15342
2018-06-07 18:59:32 +00:00
Randall Stewart
89e560f441 This commit brings in a new refactored TCP stack called Rack.
Rack includes the following features:
 - A different SACK processing scheme (the old sack structures are not used).
 - RACK (Recent acknowledgment) where counting dup-acks is no longer done
        instead time is used to knwo when to retransmit. (see the I-D)
 - TLP (Tail Loss Probe) where we will probe for tail-losses to attempt
        to try not to take a retransmit time-out. (see the I-D)
 - Burst mitigation using TCPHTPS
 - PRR (partial rate reduction) see the RFC.

Once built into your kernel, you can select this stack by either
socket option with the name of the stack is "rack" or by setting
the global sysctl so the default is rack.

Note that any connection that does not support SACK will be kicked
back to the "default" base  FreeBSD stack (currently known as "default").

To build this into your kernel you will need to enable in your
kernel:
   makeoptions WITH_EXTRA_TCP_STACKS=1
   options TCPHPTS

Sponsored by:	Netflix Inc.
Differential Revision:		https://reviews.freebsd.org/D15525
2018-06-07 18:18:13 +00:00
Konstantin Belousov
943defc3a0 Account for dmap limit when selecting the pages for the bootstrap
pagetables.

physmap[] can be inconsistent with the physical memory limit due to
buggy bios, or to the hw.physmem tunable. Since bootstrap pagetables
are initialized by accesses through the DMAP, we must ensure that DMAP
really cover the selected pages. This is only relevant when machine
has less than 4G RAM and buggy BIOS, which is the combination on Acer
Chromebook 720.

The call to mp_bootaddress() is moved later to have Maxmem initialized.

An alternative could be to always cover 4G for DMAP, but this change
seems to be simpler.

Reported and tested by:	grembo
Reviewed by:	royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15675
2018-06-07 17:04:34 +00:00
Breno Leitao
4a4b4c98f5 dev/ofw: Fix ofw_fdt_getprop() return values to match documentation
Fix the behavior of ofw_fdt_getprop() and ofw_fdt_getprop() functions to match
the documentation as the non-fdt code.

Submitted by: Luis Pires <lffpires@ruabrasil.org>
Reviewed by: manu, jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15680
2018-06-07 15:59:08 +00:00
Andriy Gapon
0fb3a72a0d x86: reorganize code that deals with unexpected NMI-s
Expected NMI-s are those than are either generated by the software (such
as a CPU sending NMI to other CPU) or generated by the hardware after
the software configured it to do so (such as NMI-s on PMC events).

Some unexpected NMI-s can be caused by hardware failures and it is
possible to inquire the hardware about them (somewhat like MCA but much
more primitive) using an EISA mechanism.  In some cases the origin of
the NMI can remain truly unknown.

This commit should not change any functionality.  It just reorganizes
the code, so that it is easier to extend with new checks for the origin
of the NMI.  Also, it frees the code that has nothing to do with ISA
from DEV_ISA.

MFC after:	3 weeks
2018-06-07 14:46:52 +00:00
Andriy Gapon
413ed27cd7 expand descriptions of x86 panic_on_nmi and kdb_on_nmi sysctls
The descriptions were as terse as the variable names and they did not
explain additional conditions for knobs.

MFC after:	1 week
2018-06-07 14:23:31 +00:00
Breno Leitao
7b2c7b92be md: use prestaged mfs_root
On PowerNV systems, the rootfs is passed through kexec, which loads the rootfs
into memory and set two fdt entries to describe where the file is located in
the memory;

I need to pass this memory region to the md device as a mfs_root, but, current
md driver does not support two things:

 * Just getting a pointer from an external (bootloader) memory. If I need to
workaround it, I would need to declare a static array and memcopy from this
external memory to this static variable.

 * The size of the image. The usage of mfs_root_end, which is not a pointer,
seems to be not possible for this prestaged scenario.

This patch simply adds a new way to load mfs_root from memory.

Differential Revision: https://reviews.freebsd.org/D15625
Approved by: kib, jhibbits (mentor)
2018-06-07 13:57:34 +00:00
Jonathan T. Looney
16e05b3275 Fix a typo in vm_domain_set(). When a domain crosses into the severe range,
we need to set the domain bit from the vm_severe_domains bitset (instead
of clearing it).

Reviewed by:	jeff, markj
Sponsored by:	Netflix, Inc.
2018-06-07 13:29:54 +00:00
Eric Joyner
a06424ddd3 iflib: Record TCP checksum info in iflib when TCP checksum is requested
ixl(4) (when it switches over to using iflib) devices need the TCP header
length in order to do TCP checksum offload.

Reviewed by:	gallatin@, shurd@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15558
2018-06-07 13:03:07 +00:00
Hans Petter Selasky
71ee95ddf7 Define ARCH_KMALLOC_MINALIGN in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-07 11:44:11 +00:00
Hans Petter Selasky
d150e15285 Wrap timespec64 into timespec in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-07 11:41:42 +00:00
Hans Petter Selasky
a041e75a34 Move the EXPORT_SYMBOL_XXX() function macros into own header file.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-07 11:34:59 +00:00
Hans Petter Selasky
422d8af4df Implement the dev_pm_set_driver_flags() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-07 11:29:07 +00:00
Justin Hibbits
b5e08a60e0 Build nvme modules for powerpc, and install man pages
NVMe builds for powerpc now, so just build modules for all powerpc targets,
and install NVMe man pages for all powerpc targets.
2018-06-07 11:25:36 +00:00
Alan Cox
5b274055d1 When pidctrl_daemon() is called multiple times within an interval, it
should use the cumulative error to calculate the output.
2018-06-07 07:48:50 +00:00
Matt Macy
fcabd54160 AF_UNIX: check for unp == unp2 on disconnect 2018-06-07 04:57:40 +00:00
Alan Cox
e768070ca9 pidctrl_daemon() implements a variation on the classical, discrete PID
controller that tries to handle early invocations of the controller,
in other words, invocations before the expected end of the interval.
However, there were some calculation errors in this early invocation
case.  Notably, if an early invocation occurred while the error was
negative, the derivative term was off by a large amount.  One visible
effect of this error was that processes were being killed by the
virtual memory system's OOM killer when in fact there was plentiful
free memory.

Correct a couple minor errors in the sysctl descriptions, and apply
some style fixes.

Reviewed by:	jeff, markj
2018-06-07 02:54:11 +00:00
Matt Macy
9616acde97 hwpmc: don't do EMIT64 on constant 2018-06-07 02:20:27 +00:00
Matt Macy
f992dd4b5c pmc: convert native to jsonl and track TSC value of samples
- add '-j' options to filter to enable converting native pmc
  log format to json lines format to enable the use of scripts
  and external tooling

% pmc filter -j pmc.log pmc.jsonl

- Record the tsc value in sampling interrupts as opposed to
  recording nanotime when the sample is copied to a global log
  in hardclock - potentially many milliseconds later.

- At initialize record the tsc_freq and the time of day to give
  us an offset for translating the tsc values in callchain records
2018-06-07 02:03:22 +00:00
Matt Macy
41abd7afa3 hwpmc: don't log pid->name more than once 2018-06-07 00:54:43 +00:00
Matt Macy
155046394a cpufunc: add rdtscp for x86 2018-06-07 00:54:11 +00:00
Michael Tuexen
ff34bbe9c2 Improve compliance with RFC 4895 and RFC 6458.
Silently dicard SCTP chunks which have been requested to be
authenticated but are received unauthenticated no matter if support
for SCTP authentication has been negotiated. This improves compliance
with RFC 4895.

When the application uses the SCTP_AUTH_CHUNK socket option to
request a chunk to be received in an authenticated way, enable
the SCTP authentication extension for the end-point. This improves
compliance with RFC 6458.

Discussed with:	Peter Lei
MFC after:	3 days
2018-06-06 19:27:06 +00:00
Conrad Meyer
cbb009b9fe puc(4): Add provisional support for Exar XR17V352
Reportedly, this is sufficient for 4800bps use, but maybe not any better.

PR:		228781
Submitted by:	peo AT nethead.se
2018-06-06 16:47:33 +00:00
Hans Petter Selasky
40ddfc7604 Make some list functions RCU safe in the LinuxKPI.
While at it rename hlist_add_after() into hlist_add_behind().

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 15:49:01 +00:00
Sean Bruno
1a43cff92a Load balance sockets with new SO_REUSEPORT_LB option.
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967.  Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Obtained from:	DragonflyBSD
Relnotes:	Yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-06-06 15:45:57 +00:00
Hans Petter Selasky
17e2a84e9a Rewrite code using atomic_fcmpset_int() in the LinuxKPI.
Suggested by:	mjg@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 15:31:47 +00:00
Hans Petter Selasky
e6e028d01f Implement the __add_wait_queue_entry_tail() function in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 15:19:30 +00:00
Justin Hibbits
3f9e1fc8ee Revert r334708
This is the wrong place to put the barrier.
Requested by:	kib,mjg
2018-06-06 15:12:19 +00:00
Hans Petter Selasky
7e95e98db8 Implement the might_sleep_if() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 15:10:11 +00:00
Hans Petter Selasky
ab98f1e8d8 Rename two structure field members while keeping backwards compatibility in
the LinuxKPI. Add a comment saying in which Linux version this change was made.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 15:06:21 +00:00
Hans Petter Selasky
1b092623b2 Implement the init_wait_entry() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 14:59:23 +00:00
Hans Petter Selasky
23dcf4359e Implement the atomic_dec_if_positive() function in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 13:59:51 +00:00
Hans Petter Selasky
9e067b2256 Implement the ktime_compare() and ktime_after() functions in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 13:37:31 +00:00
Hans Petter Selasky
a16387c1d4 Implement the rdmsrl_safe() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 13:29:52 +00:00
Andrey V. Elsukov
590d0a43b6 Make in_delayed_cksum() be similar to IPv6 implementation.
Use m_copyback() function to write checksum when it isn't located
in the first mbuf of the chain. Handmade analog doesn't handle the
case when parts of checksum are located in different mbufs.
Also in case when mbuf is too short, m_copyback() will allocate new
mbuf in the chain instead of making out of bounds write.

Also wrap long line and remove now useless KASSERTs.

X-MFC after:	r334705
2018-06-06 13:01:53 +00:00
Justin Hibbits
32c369f40c Add a memory barrier after taking a reference on the vnode holdcnt in _vhold
This is needed to avoid a race between the VNASSERT() below, and another
thread updating the VI_FREE flag, on weakly-ordered architectures.

On a 72-thread POWER9, without this barrier a 'make -j72 buildworld' would
panic on the assert regularly.

It may be possible to use a weaker barrier, and I'll investigate that once
all stability issues are worked out on POWER9.
2018-06-06 12:57:11 +00:00
Andrey V. Elsukov
4a089e6bc5 Use m_copyback() function to write delayed checksum when it isn't located
in the first mbuf of the chain.

MFC after:	1 week
2018-06-06 10:46:24 +00:00
Tom Jones
1fdbfb909f Use UDP len when calculating UDP checksums
The length of the IP payload is normally equal to the UDP length, UDP Options
(draft-ietf-tsvwg-udp-options-02) suggests using the difference between IP
length and UDP length to create space for trailing data.

Correct checksum length calculation to use the UDP length rather than the IP
length when not offloading UDP checksums.

Approved by: jtl (mentor)
Differential Revision:	https://reviews.freebsd.org/D15222
2018-06-06 07:04:40 +00:00
Andrey V. Elsukov
a41372abd4 Fix LINT-NOINET build.
Use known at build time size for min_length value. Also remove the
check from in6_gre_encapcheck(), now it is done in generic code.
2018-06-06 05:17:21 +00:00
Mateusz Guzik
ffc3ab5d39 malloc: elaborate on r334545 due to frequent questions
While here annotate the NULL check as probably true.
2018-06-06 05:08:05 +00:00
Matt Macy
b2ca2e50b9 hwpmc: add summary command and further metadata extensions
metadata changes:
- log pmc sample rate with pmcallocate
- log proc flags with thread / process logging
  to identify user vs kernel threads

fixes:
- use log cpuid to translate event id to event name

Implement rudimentary summary command to track sample
counts by thread and process name within a pmc log.

% make -j4 buildkernel >& /dev/null &
% sudo pmcstat -S unhalted_core_cycles -S llc-misses -O foo sleep 15
% pmc summary foo
cpu_clk_unhalted.thread_p_any:
        idle: 138108207162
        clang-6.0: 105336158004
        sh: 72340108510
        make: 8642012963
        kernel: 7754011631
longest_lat_cache.miss:
        clang-6.0: 87502625
        sh: 40901227
        make: 5500165
        kernel: 3300099
        awk: 2000060

%  pmc summary -f ~/foo
idx: 278 name: cpu_clk_unhalted.thread_p_any rate: 2000003
idle: 69054
clang-6.0: 52668
sh: 36170
make: 4321
kernel: 3877
hwpmc: proc(7445): 3319
awk: 1289
xargs: 357
rand_harvestq: 181
mtree: 102
intr: 53
zfskern: 31
usb: 7
pagedaemon: 4
ntpd: 3
syslogd: 1
acpi_thermal: 1
logger: 1
syncer: 1
snmptrapd: 1
sleep: 1
idx: 17 name: longest_lat_cache.miss rate: 100003
clang-6.0: 875
sh: 409
make: 55
kernel: 33
awk: 20
hwpmc: proc(7445): 14
xargs: 9
idle: 8
intr: 3
zfskern: 2
2018-06-06 02:48:09 +00:00
Andrey V. Elsukov
b941bc1d6e Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.

Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).

This change allows dramatically improve performance when many gif(4)
interfaces are used.

Sponsored by:	Yandex LLC
2018-06-05 21:24:59 +00:00
Andrey V. Elsukov
73ae9758a1 Constify argument of in6_getscope(). 2018-06-05 20:54:29 +00:00
Andrey V. Elsukov
6d8fdfa9d5 Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  addedr: min_length is the minimum packet length, that encapsulation
  handler expects to see; exact_match is maximum number of bits, that
  can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly trough encap_input_t method.
  encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
Eric van Gyzen
56009ba0ed Make Coverity more happy with r334545
Coverity complains about:

	if (((flags) & M_WAITOK) || _malloc_item != NULL)

saying:

	The expression
		1 /* (2 | 0x100) & 2 */ || _malloc_item != NULL
	is suspicious because it performs a Boolean operation
	on a constant other than 0 or 1.

Although the code is correct, add "!= 0" to make it slightly
more legible and to silence hundreds(?) of Coverity warnings.

Reported by:	Coverity
Discussed with:	mjg
Sponsored by:	Dell EMC
2018-06-05 20:34:11 +00:00
Andrey V. Elsukov
9f855fb282 tcp_lro.h requires <netinet/in.h>, include it directly instead of
indirect inclusion trough if_gif.h
2018-06-05 19:23:23 +00:00
Hans Petter Selasky
7a13eeba18 Declare and set the global "system_highpri_wq" workqueue structure pointer
in the LinuxKPI.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:49:35 +00:00
Hans Petter Selasky
c6d920309c Implement the INIT_DELAYED_WORK_ONSTACK() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:46:16 +00:00
Hans Petter Selasky
7f346854b8 Define the __kernel_size_t type in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:42:35 +00:00
Hans Petter Selasky
5a1d03bb7c Implement the task_pid_vnr() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:40:09 +00:00
Hans Petter Selasky
747d9a8165 Add "access" function pointer to the "vm_operations_struct" structure
in the LinuxKPI. While at it document when to use the "virtual_address" or
the "address" field in the "vm_fault" structure.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:37:28 +00:00
Hans Petter Selasky
0d2dce0b78 Implement mul_u32_u32() function in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:30:36 +00:00
Hans Petter Selasky
f446b7cab4 Implement timer_setup() and from_timer() function macros in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:20:20 +00:00
Ram Kishore Vegesna
ca21db8546 Issue:
Utility hangs when  OCS_IOCTL_CMD_MGMT_GET_ALL called in parallel on port 0 and port 1.

Fix: Using static structure for results is corrupting the second ioctl request. Removed static for results structure.
Approved by: ken
MFC after: 3 days
2018-06-05 15:05:26 +00:00
Ilya Bakulin
d670d9518f Enable high-speed on the card before increasing frequency on the controller
Increasing operating frequency without telling card to switch
to high-speed mode first upsets some cards and generates CRC errors.

While here, deselect / reselect cards after CMD6 and SCR fetch, as in original code.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15568
2018-06-05 11:03:24 +00:00