Commit Graph

108849 Commits

Author SHA1 Message Date
ed
e04e947299 Fix the build after breaking it in r285549.
I performed the commit on a different system as where I wrote the
change. After pulling in the change from Phabricator, I didn't notice
that a single chunk did not apply.

Approved by:	secteam (implicit, as intended change was approved)
Pointy hat to:	me
2015-07-14 20:45:24 +00:00
andrew
09018605c0 Also accept "ok" to enable a device, some vendor device trees use this when
they mean "okay"
2015-07-14 19:11:16 +00:00
ed
4f8313adcd Implement the CloudABI random_get() system call.
The random_get() system call works similar to getentropy()/getrandom()
on OpenBSD/Linux. It fills a buffer with random data.

This change introduces a new function, read_random_uio(), that is used
to implement read() on the random devices. We can call into this
function from within the CloudABI compatibility layer.

Approved by:	secteam
Reviewed by:	jmg, markm, wblock
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3053
2015-07-14 18:45:15 +00:00
markj
0f95984131 Fix some error-handling bugs when core dump compression is enabled:
- Ensure that core dump parameters are initialized in the error path.
- Don't call gzio_fini() on a NULL stream.

Reported by:	rpaulo
2015-07-14 18:24:05 +00:00
ed
f01f2379e0 Regenerate system call table for r285540. 2015-07-14 15:12:24 +00:00
ed
7073f2e35c Implement thread_tcb_set() and thread_yield().
The first system call is used to set the user TLS address. Right now
this system call is invoked by the C library for both the initial thread
and additional threads unconditionally, but in the future we'll only
call this if the architecture does not support this. On recent x86-64
CPUs we could use the WRFSBASE instruction.

This system call was erroneously placed in sys/compat/cloudabi64, even
though it does not depend on any pointer size dependent datastructure.
Move it to the right place.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-14 15:11:50 +00:00
ed
295791c35b Implement {,p}{read,write}{,v}().
Add a routine similar to copyinuio() and freebsd32_copyinuio() that
copies in CloudABI's struct iovecs. These are then translated into
FreeBSD format and placed in a 'struct uio', so we can call into the
kern_*() functions.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-14 14:33:21 +00:00
andrew
f206c90cda Set memory to be inner-sharable. This isn't needed on device memory as the
MMU will ignore the attribute there, howeverit simplifies to code to alwas
set it.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-14 12:37:47 +00:00
ed
0ca2e5d58e Let proc_raise() call into pksignal() directly.
Summary:
As discussed with kib@ in response to r285404, don't call into
kern_sigaction() within proc_raise() to reset the signal to the default
action before delivery. We'd better do that during image execution.

Change the code to simply use pksignal(), so we don't waste cycles on
functions like pfind() to look up the currently running process itself.

Test Plan:
This change has also been pushed into the cloudabi branch on GitHub. The
raise() tests still seem to pass.

Reviewers: kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3076
2015-07-14 12:16:14 +00:00
zbb
f8ff9f2f9b Fix secondary PIC initialization order
Call arm_init_secondary before any other PIC-related functions
are called. This is necessary for GICv3 where PIC_INIT_SECONDARY
allocates resources needed for all further operations.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3066
2015-07-14 12:02:56 +00:00
zbb
820d648934 Fix intr_machdep.c for ARM64
On ARMv8 IPIs are mapped to 0-15. Incrementing the number by 16
is wrong, because it sets a reserved bit in the IPI register.
This patch removes all "+16" to comply with specs.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3029
2015-07-14 11:59:43 +00:00
brueffer
d9ba778236 Spell crypto correctly. 2015-07-14 10:47:56 +00:00
hiren
96a5a3791e Expose full 32bit RSS hash from card regardless of whether RSS is defined or
not. When doing multiqueue, we are all setup to have full 32bit RSS hash from
the card. We do not need to hide that under "ifdef RSS" and should expose that
by default so others like lagg(4) can use that and avoid hashing the traffic by
themselves.

While here, delete the FreeBSD version check and use of deprecated M_FLOWID.

Reviewed by:	adrian, erj
MFC after:	1 week
Sponsored by:	Limelight Networks
2015-07-14 09:13:18 +00:00
np
0f6290037a cxgbe(4): Update T4 and T5 firmwares to 1.14.2.0.
Obtained from:	Chelsio Communications
MFC after:	3 days
2015-07-14 08:02:05 +00:00
jmg
a765cff836 Fix XTS, and name things a bit better...
Though confusing, GCM using ICM_BLOCK_LEN, but ICM does not is
correct...  GCM is built on ICM, but uses a function other than
swcr_encdec...  swcr_encdec cannot handle partial blocks which is
why it must still use AES_BLOCK_LEN and is why XTS was broken by the
commit...

Thanks to the tests for helping sure I didn't break GCM w/ an earlier
patch...

I did run the tests w/o this patch, and need to figure out why they
did not fail, clearly more tests are needed...

Prodded by:	peter
2015-07-14 07:45:18 +00:00
jmg
fb11a07bc9 fix typos..
Submitted by:	brueffer
2015-07-14 06:34:57 +00:00
adrian
af6717c229 Populate hw.model with the CPU model information.
Now you see something like:

# sysctl hw.model
hw.model: Atheros AR9330 rev 1

Tested:

* Carambola 2, AR9331 SoC
2015-07-14 05:14:10 +00:00
jmg
52344fe373 cryptodev is not needed for TCP_SIGNATURE...
Comment that cryptodev shouldn't be used unless you know what you're
doing...

The various arm/mips and one powerpc configs that have cryptodev in
them need to be addressed, audited if they provide benefit and removed
if they don't...
2015-07-14 05:09:58 +00:00
cem
576619e564 Fix cleanup race between unp_dispose and unp_gc
unp_dispose and unp_gc could race to teardown the same mbuf chains, which
can lead to dereferencing freed filedesc pointers.

This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
as invalid/freed. The flag is protected by UNP_LIST_LOCK.

To serialize against unp_gc, unp_dispose needs the socket object. Change the
dom_dispose() KPI to take a socket object instead of an mbuf chain directly.

PR:		194264
Differential Revision:	https://reviews.freebsd.org/D3044
Reviewed by:	mjg (earlier version)
Approved by:	markj (mentor)
Obtained from:	mjg
MFC after:	1 month
Sponsored by:	EMC / Isilon Storage Division
2015-07-14 02:00:50 +00:00
mjg
2ee8352407 exec: textvp -> oldtextvp; binvp -> newtextvp
This makes it consistent with the rest of the naming in do_execve.

No functional changes.
2015-07-14 01:13:37 +00:00
mjg
1fc8c9c24b exec plug a redundant vref + vrele of the image vnode 2015-07-14 00:43:08 +00:00
mjg
2c8a153253 racct: perform a lockless check for p_throttled
This reduces proc lock contention.

Reviewed by:	trasz
2015-07-13 22:52:11 +00:00
mav
bbe5427644 Switch initiator IDs in target mode to the same address space as target
IDs in initiator mode -- index in port database instead of handlers.

This makes initiator IDs persist across role changes and firmware resets,
when handlers previously assigned by firmware are lost and reused.

Sponsored by:	iXsystems, Inc.
2015-07-13 21:01:24 +00:00
loos
a59cb6a48a Bring a few simplifications to a10_gpio:
o Return the real hardware state in gpio_pin_getflags() instead of keep
   the last state in an internal table.  Now the driver returns the real
   state of pins (input/output and pull-up/pull-down) at all times.
 o Use a spin mutex.  This is required by interrupts and the 1-wire code.
 o Use better variable names and place parentheses around them in MACROS.
 o Do not lock the driver when returning static data.

Tested with gpioled(4) and DS1820 (1-wire) sensors on banana pi.
2015-07-13 18:19:26 +00:00
cem
08bb34a547 pipe_direct_write: Fix mismatched pipelock/unlock
If a signal is caught in pipelock, causing it to fail, pipe_direct_write
should not try to pipeunlock.

Reported by:	pho
Differential Revision:	https://reviews.freebsd.org/D3069
Reviewed by:	kib
Approved by:	markj (mentor)
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-07-13 17:45:22 +00:00
mav
d576761ee8 Make role sysctl handling from r284727 less strict. 2015-07-13 15:51:28 +00:00
mav
5a7280556e Unify port database use for target and initiator roles.
Aside from cleaner and more consistent code, this allows ports to be both
target and initiator same time, and easily switch from any role to any.

Sponsored by:	iXsystems, Inc.
2015-07-13 15:11:05 +00:00
luigi
c1ee844bc0 set the refcount for the structure (dropped by mistake in the last commit). 2015-07-13 10:23:52 +00:00
markm
054d3d8bb0 Rework the read routines to keep the PRNG sources happy. These work
in units of crypto blocks, so must have adequate space to write.
This means needing to be careful about buffers and keeping track
of external read request length.

Approved by:	so (/dev/random blanket)
2015-07-13 08:38:21 +00:00
adrian
20587f04e1 Fixes the RF switch state polling by comparing with the revision of the
PHY instead of the revision of the RADIO.

This fixes the RF switch state polling.

This is from DragonflyBSD, Commit 202e28d1f65e9f35df6032400df3242a3bafb483

Obtained from:	DragonflyBSD
2015-07-13 05:13:39 +00:00
ian
93f91db3aa Add PRINTF_BUFR_SIZE=128 to avoid interleaved output. 2015-07-12 19:58:12 +00:00
ian
098fcc1952 Use the monotonic (uptime) counter rather than time-of-day to measure elapsed
time between ntp_adjtime() clock offset adjustments.  This eliminates spurious
frequency steering after a large clock step (such as a 1970->2015 step on a
system with no battery-backed clock hardware).

This problem was discovered after the import of ntpd 4.2.8, which does things
in a slightly different (but still correct) order than the 4.2.4 we had
previously.  In particular, 4.2.4 would step the clock then immediately after
use ntp_adjtime() to set the frequency and offset to zero, which captured the
post-step time-of-day as a side effect.  In 4.2.8, ntpd sets frequency and
offset to zero before any initial clock step, capturing the time as 1970-ish,
then when it next calls ntp_adjtime() it's with a non-zero offset measurement.
This non-zero value gets multiplied by the apparent 45-year interval, which
blows up into a completely bogus frequency steer.  That gets clamped to
500ppm, but that's still enough to make the clock drift so fast that ntpd has
to keep stepping it every few minutes to compensate.
2015-07-12 18:38:17 +00:00
zbb
f3029e5e66 Add ARM64TODO comments to ACPI PCI stubs
This will make searching for missing functionalities easier.
2015-07-12 18:32:16 +00:00
markm
4637ff6821 * Address review (and add a bit myself).
- Tweek man page.
 - Remove all mention of RANDOM_FORTUNA. If the system owner wants YARROW or DUMMY, they ask for it, otherwise they get FORTUNA.
 - Tidy up headers a bit.
 - Tidy up declarations a bit.
 - Make static in a couple of places where needed.
 - Move Yarrow/Fortuna SYSINIT/SYSUNINIT to randomdev.c, moving us towards a single file where the algorithm context is used.
 - Get rid of random_*_process_buffer() functions. They were only used in one place each, and are better subsumed into those places.
 - Remove *_post_read() functions as they are stubs everywhere.
 - Assert against buffer size illegalities.
 - Clean up some silly code in the randomdev_read() routine.
 - Make the harvesting more consistent.
 - Make some requested argument name changes.
 - Tidy up and clarify a few comments.
 - Make some requested comment changes.
 - Make some requested macro changes.

* NOTE: the thing calling itself a 'unit test' is not yet a proper
  unit test, but it helps me ensure things work. It may be a proper
  unit test at some time in the future, but for now please don't make
  any assumptions or hold any expectations.

Differential Revision:	https://reviews.freebsd.org/D2025
Approved by:	so (/dev/random blanket)
2015-07-12 18:14:38 +00:00
zbb
69cdb114a9 Implement stubs for ACPI PCI routines
ACPI driver requires special functions to be provided by machdep code.
Add temporary stubs to satisfy the compiler when both "pci" and "acpi"
are enabled in the kernel configuration file.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3028
2015-07-12 17:28:31 +00:00
bz
782cc51566 Try to unbreak the build after r285390 removing the obsolete static
declaration.
2015-07-12 00:26:22 +00:00
loos
5e85f1c8a9 Return the FDT node of the GPIO controller to gpiobus. It is used by the
children of gpiobus.
2015-07-11 21:09:43 +00:00
ed
c6e2a0a560 Implement normal and abnormal process termination.
CloudABI does not provide an explicit kill() system call, for the reason
that there is no access to the global process namespace. Instead, it
offers a raise() system call that can at least be used to terminate the
process abnormally.

CloudABI does not support installing signal handlers. CloudABI's raise()
system call should behave as if the default policy is set up. Call into
kern_sigaction(SIG_DFL) before calling sys_kill() to force this.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-11 19:41:31 +00:00
ed
84e10df8b1 Use FDDUP_NORMAL instead of hardcoding value 0.
Proposed by:	mjg
2015-07-11 18:53:30 +00:00
ed
a75cc35b71 Add missing function parameter.
A function parameter got added in r285356, meaning that the call to
kern_dup() needs to be patched up.
2015-07-11 18:39:16 +00:00
jhibbits
34875849ac cpu_number and cpu_swapout are never used, and only defined in powerpc. 2015-07-11 17:33:50 +00:00
mjg
b9cbaea559 linprocfs: vref the vnode passed to vn_fullpath 2015-07-11 16:44:28 +00:00
mjg
b3b0716b63 vfs: always clear VI_OWEINACT in consumers bumping v_usecount
Previously vputx would detect the condition and clear the flag.

With this change it is invalid to have both v_usecount > 0 and the flag
set. Assert the condition is met in all revlevant places.

Reviewed by:	kib
2015-07-11 16:28:55 +00:00
mjg
74d7b1e72e vfs: move si_usecount manipulation to dedicated functions
Reviewed by:	kib
2015-07-11 16:28:12 +00:00
mjg
a85ed5531d Create a dedicated function for ensuring that cdir and rdir are populated.
Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.

Reviewed by:	kib
2015-07-11 16:22:48 +00:00
mjg
c71e9ab863 Move chdir/chroot-related fdp manipulation to kern_descrip.c
Prefix exported functions with pwd_.

Deduplicate some code by adding a helper for setting fd_cdir.

Reviewed by:	kib
2015-07-11 16:19:11 +00:00
andrew
cf5550c91f Always send a SIGSEGV on a map failure. Use the code to tell the reason
for the signal.

Sponsored by:	ABT Systems Ltd
2015-07-11 16:02:06 +00:00
adrian
11d40ec060 Regenerate syscalls. 2015-07-11 15:22:11 +00:00
adrian
41db4b88e0 Add an initial NUMA affinity/policy configuration for threads and processes.
This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.

* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
  the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
  if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
  policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
  policities.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
  set in a variety of methods.

This is only relevant for very specific workloads.

This doesn't pretend to be a final NUMA solution.

The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.

This is only relevant if MAXMEMDOM is set to something other than 1.
Ie, if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.

Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.

Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.

Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.

Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!

Tested:

* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thankyou norse!)
* sandy bridge, 2 socket (thankyou dell!)
* ivy bridge, 2 socket (thankyou norse!)
* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
* haswell, 2 socket (thankyou norse!)
* haswell v3, 2 socket (thankyou dell)
* haswell v3, 2x18 core (thankyou scott long / netflix!)

* Peter Holm ran a stress test suite on this work and found one
  issue, but has not been able to verify it (it doesn't look NUMA
  related, and he only saw it once over many testing runs.)

* I've tested bhyve instances running in fixed NUMA domains and cpusets;
  all seems to work correctly.

Verified:

* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
  NUMA policies for processes under test.

Review:

This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@.  The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).

This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but not achieved a clear consensus.  My hope is that with further
exposure and testing more functionality can be implemented and evaluated.

Notes:

* The VM doesn't handle unbalanced domains very well, and if you have an overly
  unbalanced memory setup whilst under high memory pressure, VM page allocation
  may fail leading to a kernel panic.  This was a problem in the past, but it's
  much more easily triggered now with these tools.

* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
  affect contigmalloc, KVA placement, UMA, etc.  So, driver placement of memory
  isn't really guaranteed in any way.  That's next on my plate.

Sponsored by:	Norse Corp, Inc.; Dell
2015-07-11 15:21:37 +00:00
kib
564e88d499 Do not allow creation of the dirty buffers for the dead buffer
objects, i.e. for buffer objects which vnode was reclaimed.  Buffer
cache cannot write such buffers.  Return the error and discard the
buffer immediately on write attempt.

BO_DIRTY now always set during vnode reclamation, since it is used not
only for the INVARIANTS checks.  Do allow placement of the clean
buffers on dead bufobj list, otherwise filesystems cannot use bufcache
at all after the devvp reclaim.

Reported and tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-11 11:21:56 +00:00