CloudABI does not provide an explicit kill() system call, for the reason
that there is no access to the global process namespace. Instead, it
offers a raise() system call that can at least be used to terminate the
process abnormally.
CloudABI does not support installing signal handlers. CloudABI's raise()
system call should behave as if the default policy is set up. Call into
kern_sigaction(SIG_DFL) before calling sys_kill() to force this.
Obtained from: https://github.com/NuxiNL/freebsd
Previously vputx would detect the condition and clear the flag.
With this change it is invalid to have both v_usecount > 0 and the flag
set. Assert the condition is met in all revlevant places.
Reviewed by: kib
Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.
Reviewed by: kib
This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.
* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
policities.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
set in a variety of methods.
This is only relevant for very specific workloads.
This doesn't pretend to be a final NUMA solution.
The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.
This is only relevant if MAXMEMDOM is set to something other than 1.
Ie, if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.
Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.
Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.
Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.
Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!
Tested:
* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thankyou norse!)
* sandy bridge, 2 socket (thankyou dell!)
* ivy bridge, 2 socket (thankyou norse!)
* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
* haswell, 2 socket (thankyou norse!)
* haswell v3, 2 socket (thankyou dell)
* haswell v3, 2x18 core (thankyou scott long / netflix!)
* Peter Holm ran a stress test suite on this work and found one
issue, but has not been able to verify it (it doesn't look NUMA
related, and he only saw it once over many testing runs.)
* I've tested bhyve instances running in fixed NUMA domains and cpusets;
all seems to work correctly.
Verified:
* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
NUMA policies for processes under test.
Review:
This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@. The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).
This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but not achieved a clear consensus. My hope is that with further
exposure and testing more functionality can be implemented and evaluated.
Notes:
* The VM doesn't handle unbalanced domains very well, and if you have an overly
unbalanced memory setup whilst under high memory pressure, VM page allocation
may fail leading to a kernel panic. This was a problem in the past, but it's
much more easily triggered now with these tools.
* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory
isn't really guaranteed in any way. That's next on my plate.
Sponsored by: Norse Corp, Inc.; Dell
objects, i.e. for buffer objects which vnode was reclaimed. Buffer
cache cannot write such buffers. Return the error and discard the
buffer immediately on write attempt.
BO_DIRTY now always set during vnode reclamation, since it is used not
only for the INVARIANTS checks. Do allow placement of the clean
buffers on dead bufobj list, otherwise filesystems cannot use bufcache
at all after the devvp reclaim.
Reported and tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
- use 1ULL to avoid shift truncations
- recompute the sum of weight dynamically to provide better fairness
- fix an erroneous constant in the computation of the slot
- preserve timestamp correctness when the old timestamp is stale.
(detected by jenkins run with gcc 4.9)
Update documentation on the use of netmap_priv_d,
rename the refcount and use the same structure in
FreeBSD and linux
No functional changes.
- make mode enum start from 0 so that the assertion covers all cases [1]
- rename prefix _CLOEXEC flag with _FLAG
- postpone fhold on the old file descriptor, which eliminates the need to fdrop
in error cases.
- fixup FDDUP_FCNTL check missed in the previous commit
This removes 'fp == oldfde->fde_file' assertion which had little value. kern_dup
only calls fd-related functions which cannot drop the lock or a whole lot of
races would be introduced.
Noted by: kib [1]
versions of the x87 tags. The conversion is naive, used abridged tag
is converted to valid unabridged, without additional checks for zero
and special values.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
i386/include/frame.h after a code was moved from machdep.c to frame.h
in r284925.
Use include guards style similar to other guards.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
to more C11-ish atomic_thread_fence_seq_cst().
Note that on PowerPC, which currently uses lwsync for mb(), the change
actually fixes the missed store/load barrier, intended by r271604 [*].
Reviewed by: alc
Noted by: alc [*]
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
We currently return EINVAL when calling listen() on a UNIX socket that
has not been bound to a pathname. If my interpretation of POSIX is
correct, we should return EDESTADDRREQ: "The socket is not bound to a
local address, and the protocol does not support listening on an unbound
socket."
Return EDESTADDRREQ instead when not bound and not connected.
Differential Revision: https://reviews.freebsd.org/D3038
Reviewed by: gnn, network
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:
- fix zerocopy monitor ports and introduce copying monitor ports
(the latter are lower performance but give access to all traffic
in parallel with the application)
- exclusive open mode, useful to implement solutions that recover
from crashes of the main netmap client (suggested by Patrick Kelsey)
- revised memory allocator in preparation for the 'passthrough mode'
(ptnetmap) recently presented at bsdcan. ptnetmap is described in
S. Garzarella, G. Lettieri, L. Rizzo;
Virtual device passthrough for high speed VM networking,
ACM/IEEE ANCS 2015, Oakland (CA) May 2015
http://info.iet.unipi.it/~luigi/research.html
- fix rx CRC handing on ixl
- add module dependencies for netmap when building drivers as modules
- minor simplifications to device-specific routines (*txsync, *rxsync)
- general code cleanup (remove unused variables, introduce macros
to access rings and remove duplicate code,
Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).
Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.
MFC after: 1 month
Sponsored by: (partly) Verisign, Cisco
Detected by clang 3.7.0 with the warning:
sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c:309:18: error: variable
'rptr' is uninitialized when used here [-Werror,-Wuninitialized]
chp->cq.rptr = rptr;
^~~~
MFC after: 1 week
mode and with hardware support on systems that have AESNI instructions.
Differential Revision: D2936
Reviewed by: jmg, eri, cognet
Sponsored by: Rubicon Communications (Netgate)
The logic is reorganised so that there is one exit point prior to the
lookup loop. This is an intermediate step to making audit logging
functions use found vnode instead of translating ni_dirfd on their own.
ni_startdir validation is removed. The only in-tree consumer is nfs
which already makes sure it is a directory.
Reviewed by: kib
apparently neither clang nor gcc complain about this.
But clang intis the var to NULL correctly while gcc on at least mips does not.
Correct the undefined behavior by initializing the variable properly.
PR: 201371
Differential Revision: https://reviews.freebsd.org/D3036
Reviewed by: gnn
Approved by: gnn(mentor)
All of the CloudABI system calls that operate on file descriptors of an
arbitrary type are prefixed with fd_. This change adds wrappers for
most of these system calls around their FreeBSD equivalents.
The dup2() system call present on CloudABI deviates from POSIX, in the
sense that it can only be used to replace existing file descriptor. It
cannot be used to create new ones. The reason for this is that this is
inherently thread-unsafe. Furthermore, there is no need on CloudABI to
use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no
special meaning.
This change exposes the kern_dup() through <sys/syscallsubr.h> and puts
the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag,
FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not
allocated.
Differential Revision: https://reviews.freebsd.org/D3035
Reviewed by: mjg
namei used to vref fd_cdir, which was immediatley vrele'd on entry to
the loop.
Check for absolute lookup and vref the right vnode the first time.
Reviewed by: kib
fd_rdir vnode was stored in ni_rootdir without refing it in any way,
after which the filedsc lock was being dropped.
The vnode could have been freed by mountcheckdirs or another thread doing
chroot.
VREF the vnode while the lock is held.
Reviewed by: kib
MFC after: 1 week
and psci to start them. I expect ACPI support to be added later.
This has been tested on qemu with 2 cpus as that is the current value of
MAXCPUS. This is expected to be increased in the future as FreeBSD has
been tested on 48 cores on the Cavium ThunderX hardware.
Partially based on a patch from Robin Randhawa from ARM.
Approved by: ABT Systems Ltd
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3024
include this file without first including the headers needed for uint32_t
and the like use the __foo type.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
While writing tests for CloudABI, I noticed that close() on process
descriptors returns the process ID of the child process. This is
interesting, as close() is only allowed to return 0 or -1. It turns out
that we clobber td->td_retval[0] in proc_reap(), so that wait*()
properly returns the process ID.
Change proc_reap() to leave td->td_retval[0] alone. Set the return value
in kern_wait6() instead, by keeping track of the PID before we
(potentially) reap the process.
Differential Revision: https://reviews.freebsd.org/D3032
Reviewed by: kib
This commit reworks the code responsible for identification of
the CPUs during runtime.
It is necessary to provide a way for workarounds and erratums
to be applied only for certain HW versions.
The copy of MIDR is now stored in pcpu to provide a fast and
convenient way for assambly code to read it (pcpu is used quite often
so there is a chance it's inside the cache).
The MIDR is also better way of identification than using user-friendly
cpu_desc structure, because it can be compiled into comparision of
single u32 with only one access to the memory - this is crucial
for some erratums which are called from performance-critical
places.
Changes in cpu_identify makes this function safe to be called
on non-boot CPUs.
New function CPU_MATCH was implemented which returns boolean
value based on mathing masked MIDR with chip identification.
Example of usage:
printf("is thunder: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
CPU_IMPL_CAVIUM, CPU_PART_THUNDER, 0, 0));
printf("is generic: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
CPU_IMPL_ARM, CPU_PART_FOUNDATION, 0, 0));
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3030
loop finds the selfd entry and clears its sf_si pointer, which is
handled by selfdfree() in parallel, NULL sf_si makes selfdfree() free
the memory. The result is the race and accesses to the freed memory.
Refcount the selfd ownership. One reference is for the sf_link
linkage, which is unconditionally dereferenced by selfdfree().
Another reference is for sf_threads, both selfdfree() and
doselwakeup() race to deref it, the winner unlinks and than frees the
selfd entry.
Reported by: Larry Rosenman <ler@lerctr.org>
Tested by: Larry Rosenman <ler@lerctr.org>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.
CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.
The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.
We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.
The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executabled
to be started through execve().
More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA
Differential Revision: https://reviews.freebsd.org/D2848
Reviewed by: emaste, brooks
Obtained from: https://github.com/NuxiNL/freebsd
session in multiple threads w/o locking.. There was a single fpu
context shared per session, if multiple threads were using the session,
and both migrated away, they could corrupt each other's fpu context...
This patch adds a per cpu context and a lock to protect it...
It also tries to better address unloading of the aesni module...
The pause will be removed once the OpenCrypto Framework provides a
better method for draining callers into _newsession...
I first discovered the fpu context sharing issue w/ a flood ping over
an IPsec tunnel between two bhyve machines... The patch in D3015
was used to verify that this fix does fix the issue...
Reviewed by: gnn, kib (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D3016
for timehands consumers, by using fences.
Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*]. tc_setget() allowed data update
writes to become visible before generation (but not on TSO
architectures).
Remove tc_setgen(), tc_getgen() helpers, use atomics inline [**].
Noted by: alc [*]
Requested by: bde [**]
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
seq_write_begin(), instead of the load_rmb/rbm_load functions. The
update does not need to be atomic due to the write lock owned.
Similarly, in seq_write_end(), update of *seqp needs not be atomic.
Only store must be atomic with release.
For seq_read(), the natural operation is the load acquire of the
sequence value, express this directly with atomic_load_acq_int()
instead of using custom partial fence implementation
atomic_load_rmb_int().
In seq_consistent, use atomic_thread_fence_acq() which provides the
desired semantic of ordering reads before fence before the re-reading
of *seqp, instead of custom atomic_rmb_load_int().
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
However, I've observed the active queue scan stopping when there are
frequent free page shortages and the inactive queue is steadily refilled
by other mechanisms, such as the sequential access heuristic in vm_fault()
or madvise(2). To remedy this problem, record the time of the last active
queue scan, and always scan a number of pages proportional to the time
since the last scan, regardless of whether that last scan was a
timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass >
0") scan.
Also, on a timeout-triggered scan, allow a full scan of the active queue
when the system is short of inactive pages.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division
On platforms which are fully IO-coherent, the map might be null.
We need to guarantee that all data is observable after the
sync operation is called. Add a memory barrier to ensure that on ARM.
Reviewed by: andrew, kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3012
The number of available lock list entries for a thread is LOCK_CHILDCOUNT,
and each entry can record up to LOCK_NCHILDREN locks. When iterating over
the locks held by a thread, a bound on the loop index is therefore given
by LOCK_CHILDCOUNT * LOCK_NCHILDREN; WITNESS_COUNT is an unrelated
constant.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2974
The UEFI loader on the 10.1 release install disk (disc1) modifies an
existing EFI_DEVICE_PATH_PROTOCOL instance in an apparent attempt to
truncate the device path. In doing so it creates an invalid device
path.
Perform the equivalent action without modification of structures
allocated by firmware.
PR: 197641
MFC After: 1 week
Submitted by: Chris Ruffin <chris.ruffin@intel.com>
Place sched_random nearer to where it's first used: moving the
code nearer to where it is used makes the code easier to read
and we can reduce the initial "#ifdef SMP" island.
Reword a little the comment and clean some whitespaces
while here.
The 6205 (Taylor Peak) in the Lenovo X230 works fine in 5GHz 11a and 11n HT20,
but not 11n HT40. The NIC goes RX deaf the moment HT40 is configured.
It's so RX deaf that it doesn't even hear beacons and the firmware sends
"BEACON MISS" events. That's pretty deaf.
I tried configuring up the HT40 flags in monitor mode and it worked - so
I assumed that doing the transition from 20 -> 40MHz channel configuration
when going auth->assoc (ie, after the NIC has been partially configured)
is a problem.
So for now, let's just always set them if they're available.
Tested:
* Intel 5300, STA mode, 5GHz HT/40 AP; 2GHz HT/20 AP
* Intel 6205, STA mode, 5GHz HT/40, HT20, 11a AP; 2GHz HT/20 AP
This was pointed out to me by coworkers trying to use FreeBSD-HEAD
in the office on their Thinkpad T420p laptops.
TODO:
* I don't like how the HT40 flags are configured - the whole interop/
protection config should be re-checked. Notably, I think curhtprotmode
is 0 in a lot of cases, which means "no interoperability" and i think
that's busted.
Sponsored by: Norse Corp, Inc.
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.
Requested by: grehan
flags are not specified... This bug was introduced in r275732...
This only affects IPsec ESP only policies w/ the aesni module loaded,
other subsystems specify one or both of the flags...
Reviewed by: gnn, delphij, eri
Add ARM ITS (Interrupt Translation Services) support required
to bring-up message signalled interrupts on some ARM64 platforms.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
pointer is NULL, as in that case there are no userland pages that
could potentially be wired. It is common for old to be NULL and
oldlenp to be non-NULL in calls to userland_sysctl(), as this is used
to probe for the length of a variable-length sysctl entry before
retrieving a value. Note that it is typical for such calls to be made
with an uninitialized value in *oldlenp, so sysctlmemlock was
essentially being acquired at random (depending on the uninitialized
value in *oldlenp being > PAGE_SIZE or not) for these calls prior to
this patch.
Differential Revision: https://reviews.freebsd.org/D2987
Reviewed by: mjg, kib
Approved by: jmallett (mentor)
MFC after: 1 month
Summary:
Both booke and AIM interrupt.c files contain nearly identical code. This merges
the two files, to reduce duplication.
Reviewers: #powerpc, marcel
Reviewed By: marcel
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D2991
bd_hdrcmplt. As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt. Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.
Differential Revision: https://reviews.freebsd.org/D2989
Reviewed by: gnn, melifaro
Approved by: jmallett (mentor)
MFC after: 3 days
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.
Reviewed by: kib, mjg
Differential Revision: https://reviews.freebsd.org/D2937
To avoid conflicts between target and initiator devices in CAM, make
CTL use target ID reported by HBA as its initiator_id in XPT_PATH_INQ.
That target ID is known to never be used for initiator role, so it won't
conflict. For Fibre Channel and FireWire HBAs this specific ID choice
is irrelevant since all target IDs there are virtual. Same time for SPI
HBAs it seems could be even requirement to use same target ID for both
initiator and target roles.
While there are some more things to polish in isp(4) driver, first tests
of using both roles same time on the same port appeared successfull:
# camcontrol devlist -v
scbus0 on isp0 bus 0:
<FREEBSD CTLDISK 0001> at scbus0 target 1 lun 0 (da20,pass21)
<> at scbus0 target 256 lun 0 (ctl0)
<> at scbus0 target -1 lun ffffffff (ctl1)
FreeBSD never had limitation on number of target IDs, and there is no
any other requirement to allocate them densely. Since slots of port
database already populated just sequentially, there is no much need
for another indirection to allocate sequentially too.
On Book-E, physical addresses are actually 36-bits, not 32-bits. This is
currently worked around by ignoring the top bits. However, in some cases, the
boot loader configures CCSR to something above the 32-bit mark. This is stage 1
in updating the pmap to handle 36-bit physaddr.
This will print out the Memory Subsystem Status Register on MPC745x (G4+ class),
and the Machine Check Status Register on Book-E class CPUs, to aid in debugging
machine checks. Other relevant registers, for other CPUs, can be added in the
future.
directory sys/contrib/libnv.
The goal of this operation is to NOT install header files which shouldn't
be used outside the nvlist library.
Approved by: pjd (mentor)
make the find it does extremely expensive, so compute it only
once. Also make sure the 'traditional' module building method works at
the expense of a bit of duplicated code.
and start teaching subsystems about it.
The Atheros MIPS platforms don't guarantee any kind of FIFO consistency
with interrupts in hardware. So software needs to do a flush when it
receives an interrupt and before it calls the interrupt handler.
There are new ones for the QCA934x and QCA955x, so do a few things:
* Get rid of the individual ones (for ethernet and IP2);
* Create a mux and enum listing all the variations on DDR flushes;
* replace the uses of IP2 with the relevant one (which will typically
be "PCI" here);
* call the USB DDR flush before calling the real USB interrupt handlers;
* call the ethernet one upon receiving an interrupt that's for us,
rather than never calling it during operation.
Tested:
* QCA9558 (TP-Link archer c7 v2)
* AR9331 (Carambola 2)
TODO:
* PCI, USB, ethernet, etc need to do a double-check to see if the
interrupt was truely for them before doing the DDR. For now I
prefer "correct" over "fast".
This stops the panics that occur on MIPS platforms when doing say,
'sysctl dev.ath.0' whilst the MAC is asleep. The MIPS platform is
rather unforgiving in getting power-save register access wrong and you
will get all kinds of odd failures if you don't have things woken
up at the right times.
Tested:
* QCA9558 (TP-Link Archer C7 v2)
* AR9331 (Carambola 2)
.. with no VAPs configured and ath0 down (thus the MAC is definitely
asleep.)
PR: kern/201117
the kernel would generate a bogus one with a ":/<path>" suffix.
This would only occur for the case where there was no explicit
"principal" argument and the getaddrinfo() call in mount_nfs.c failed to a
return a cannonical name for the server.
This patch fixes this unusual case.
PR: 201073
Submitted by: masato@itc.naist.jp
MFC after: 2 weeks
Update setkey and libipsec to understand aes-gcm-16 as an
encryption method.
A partial commit of the work in review D2936.
Submitted by: eri
Reviewed by: jmg
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
change to GMAC easier on A20 SoCs.
On A10 only the EMAC controller is available (fast ethernet), but on A20
there is also GMAC a high (or better) performant controller (gigabit
ethernet).
On A20 the both controllers uses the same pins to talk to the ethernet PHY
(MII or RGMII) and they can be selected by the GPIO pin mux.
There is work in progress to bring in GMAC support.
When IPSEC is enabled on the kernel the forwarding path has an optimization to not enter the code paths
for checking security policies but first checks if there is any security policy active at all.
The patch introduces the same optimization but for traffic generated from the host itself.
This reduces the overhead by 50% on my tests for generated host traffic without and SP active.
Differential Revision: https://reviews.freebsd.org/D2980
Reviewed by: ae, gnn
Approved by: gnn(mentor)
The Allwinner SoC has an AHCI device on its internal main bus rather
than the PCI bus. This SoC is somewhat underdocumented, and its SATA
controller is no exception. The methods to support this chip were
harvested from the Linux Allwinner SDK, and then constants invented to
describe what's going on based on low-level constants contained in the
SATA standard and guess work.
This SoC requires a specific AHCI channel setup in order to start the
operations on the channel properly.
Clock setup and AHCI channel setup idea came from NetBSD.
Tested on Cubieboard 2 and Banana pi (and attachment on Cubieboard by
Pratik Singhal).
Differential Revision: https://reviews.freebsd.org/D737
Submitted by: imp
Reviewed by: imp, ganbold, mav, andrew
Try to preserve the xn configuration when migrating. This is not always
possible since the backend might not have the same set of options
available, in which case we will try to preserve as many as possible.
MFC after: 2 weeks
PR: 183139
Reported by: mcdouga9@egr.msu.edu
Sponsored by: Citrix Systems R&D
kmalloc() call. Make function global instead of static inline to fix
compiler warnings about passing variable argument lists to inline
functions.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The SoC, the flash, the ethernet ports and ethernet switch all work.
The USB works.
The 11ac PCIe NIC internally is at least seen by the PCIE RC, but
I haven't tried using it yet. There's no driver and I haven't
yet swapped it out for a non-11ac chip.
The on-chip 2GHz wifi works, but there are some data errors that
get thrown up in STA mode when scanning. I have a feeling I have
to finish the DDR flush code out and have it run correctly on the
shared interrupts; that'll take a bit of time to get right.
But if you're after an updated piece of hardware, the Archer C7 v2
is certainly there, and you can replace the 11ac NIC with a 3x3
Atheros PCIe device (eg AR9380, AR9390, AR9580, etc) and it'll
"just work".
Tested:
* TP-Link archer c7v2.
The Tp-link Archer-C7v2 unit has a QCA9558 internally but hangs the
QCA988x 11ac PCIe NIC off of PCI RC #1, not #0.
So I actually finally /do/ have a board to verify whether PCIe is working.
Grr.
Tested:
* TP-Link Archer-C7v2.
lightly used. Find the proper .m file when we depend on *_if.[ch] in
the srcs line, with seat-belts for false positive matches. This uses
make's path mechanism. A further refinement would be to calculate this
once, and then pass the resulting _MPATH to modules submakes.
Differential Revision: https://reviews.freebsd.org/D2327
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more that one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.
Reported by: alex.burlyga.ietf@gmail.com
Tested by: alex.burlyga.ietf@gmail.com
MFC after: 2 weeks
ip_forward() does a route lookup for testing this packet can be sent to a known destination,
it also can do another route lookup if it detects that an ICMP redirect is needed,
it forgets all of this and handovers to ip_output() to do the same lookup yet again.
This optimisation just does one route lookup during the forwarding path and handovers that to be considered by ip_output().
Differential Revision: https://reviews.freebsd.org/D2964
Approved by: ae, gnn(mentor)
MFC after: 1 week
user address when ABI uses shared page.
Note that the change is no-op for correctness, since shared page does
not fault. The mapping for the shared page is installed at the
address space creation, the page is unmanaged and its pte/pv entry
cannot be reclaimed.
Submitted by: Oliver Pinter
Review: https://reviews.freebsd.org/D2954
MFC after: 1 week
macros on amd64 and i386. Move the definition to machine/param.h.
kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb
version to PINKERNEL().
On i386, correct the lowest kernel address. After the shared page was
introduced, USRSTACK no longer points to the last user address + 1 [*]
Submitted by: Oliver Pinter [*]
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
asserts are made. Remove them, since we might dereference freed
memory. Leaked locks are asserted by the syscall return code anyway.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
We now take z_teardown_lock as a writer to ensure that there is no I/O
while the filesystem state is in a flux. Also, zfs_suspend_fs() ->
zfsvfs_teardown() call zfs_unregister_callbacks() and zfs_resume_fs() ->
zfsvfs_setup() call zfs_unregister_callbacks(). Previously there was no
synchronization between those calls and the calls in the re-mounting
case. That could lead to concurrent execution and a crash.
PR: 180060
Differential Revision: https://reviews.freebsd.org/D2865
Suggested by: mahrens
Reviewed by: delphij, pho, mahrens, will
MFC after: 13 days
Sponsored by: ClusterHQ
According to report, some recent unrelated changes in the driver triggered
timeouts when testing for absent port multiplier. Cause of this behavior
channge is unclear, but since these chips are old, rare and buggy, it is
easier to just disable port multiplier support, same as done in Linux.
Reported by: bar
MFC after: 3 days
DMA handles all data transfers up to 128K or 16 segments and fallback to
pio mode when DMA requirements are not met.
The read performance has improved greatly while the write performance also
showed some improvement but seems limited by the card type and quality.
Submitted by: Pratik Singhal <pratiksinghal@freebsd.org>
Sponsored by: Google Summer of Code 2015
Tested on: A10 (cubieboard) and A20 (cubieboard 2 and banana pi)
them when a different thread last used them, or when the thread was last
run on a different cpu.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
process beyond the end of the process address space. Such setting is
not dangerous to the kernel integrity, but it causes confusing
application misbehaviour.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
random(4) device in the kernel, and the KERN_ARND sysctl will
return nothing. With RANDOM_DUMMY there will be a random(4) that
always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
(direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
when the dust settles.
- Use macros for locks.
- Fix comments.
* src/share/man/...
- Update the man pages.
* src/etc/...
- The startup/shutdown work is done in D2924.
* src/UPDATING
- Add UPDATING announcement.
* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.
* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.
* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.
* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
selection is the way to go.
* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.
* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.
* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
that instead. All that is needed here is N=0, N++, N==0, and some
localised trickery is used to manufacture a 128-bit 0ULLL.
* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
now the test will do a basic check of compressibility. Clairvoyant
talent is still a good idea.
- This is still a long way off a proper unit test.
* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.
Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
condition.
If you send a 0-length packet, but there is data is the socket buffer, and
neither the rexmt or persist timer is already set, then activate the persist
timer.
PR: 192599
Differential Revision: D2946
Submitted by: jlott at averesystems dot com
Reviewed by: jhb, jch, gnn, hiren
Tested by: jlott at averesystems dot com, jch
MFC after: 2 weeks
booting on a PC with CMOS clock set to a year before 2000.
This uses 1980 (instead of 1970 as in the initial patch) as pivot year as
suggested by imp in the PR followup.
PR: 195703
Submitted by: cs@soi.spb.ru
Reviewed by: imp
MFC after: 1 weeks
restore the FPU state from the format of machine FSAVE area. The
intended use is for ABI emulators to provide FSAVE-formatted FPU state
to usermode requiring it, while kernel could use FXSAVE due to
XMM/XSAVE.
The core functionality to convert from/to FXSAVE format is shared with
the fill_fpregs_xmm() and set_fpregs_xmm(). Move the later functions
to npx.c and rename them to npx_fill_fpregs_xmm() and
npx_set_fpregs_xmm(). They differ from nptx_get/set_fsave(9) since
our mcontext contains padding to be zeroed or ignored.
fill_fpregs() and set_fpregs() could be converted to use the new
interface, but there are small differences to handle.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
obtain the thread %fs and %gs bases. Add x86 PT_SETFSBASE and
PT_SETGSBASE requests to set the bases from debuggers. The set
requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE),
override the corresponding segment registers.
The main purpose of the operations is to retrieve and modify the tcb
address for debuggee.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The sentinel attribute was originally implemented in OpenBSD's gcc and
later adopted by upstream GCC 4.0 (and clang). From the OpenBSD's
gcc-local manpage:
- gcc recognizes the extra attribute __sentinel__, which can be used to
mark varargs function that need a NULL pointer to mark argument
termination, like execl(3). This exposes latent bugs for 64-bit
architectures, where a terminating 0 will expand to a 32-bit int, and
not a full-fledged 64-bits pointer.
While here sort the visibility attributes.
Hinted-by: OpenBSD