We currently return EINVAL when calling listen() on a UNIX socket that
has not been bound to a pathname. If my interpretation of POSIX is
correct, we should return EDESTADDRREQ: "The socket is not bound to a
local address, and the protocol does not support listening on an unbound
socket."
Return EDESTADDRREQ instead when not bound and not connected.
Differential Revision: https://reviews.freebsd.org/D3038
Reviewed by: gnn, network
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:
- fix zerocopy monitor ports and introduce copying monitor ports
(the latter are lower performance but give access to all traffic
in parallel with the application)
- exclusive open mode, useful to implement solutions that recover
from crashes of the main netmap client (suggested by Patrick Kelsey)
- revised memory allocator in preparation for the 'passthrough mode'
(ptnetmap) recently presented at bsdcan. ptnetmap is described in
S. Garzarella, G. Lettieri, L. Rizzo;
Virtual device passthrough for high speed VM networking,
ACM/IEEE ANCS 2015, Oakland (CA) May 2015
http://info.iet.unipi.it/~luigi/research.html
- fix rx CRC handing on ixl
- add module dependencies for netmap when building drivers as modules
- minor simplifications to device-specific routines (*txsync, *rxsync)
- general code cleanup (remove unused variables, introduce macros
to access rings and remove duplicate code,
Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).
Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.
MFC after: 1 month
Sponsored by: (partly) Verisign, Cisco
Detected by clang 3.7.0 with the warning:
sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c:309:18: error: variable
'rptr' is uninitialized when used here [-Werror,-Wuninitialized]
chp->cq.rptr = rptr;
^~~~
MFC after: 1 week
mode and with hardware support on systems that have AESNI instructions.
Differential Revision: D2936
Reviewed by: jmg, eri, cognet
Sponsored by: Rubicon Communications (Netgate)
The logic is reorganised so that there is one exit point prior to the
lookup loop. This is an intermediate step to making audit logging
functions use found vnode instead of translating ni_dirfd on their own.
ni_startdir validation is removed. The only in-tree consumer is nfs
which already makes sure it is a directory.
Reviewed by: kib
apparently neither clang nor gcc complain about this.
But clang intis the var to NULL correctly while gcc on at least mips does not.
Correct the undefined behavior by initializing the variable properly.
PR: 201371
Differential Revision: https://reviews.freebsd.org/D3036
Reviewed by: gnn
Approved by: gnn(mentor)
All of the CloudABI system calls that operate on file descriptors of an
arbitrary type are prefixed with fd_. This change adds wrappers for
most of these system calls around their FreeBSD equivalents.
The dup2() system call present on CloudABI deviates from POSIX, in the
sense that it can only be used to replace existing file descriptor. It
cannot be used to create new ones. The reason for this is that this is
inherently thread-unsafe. Furthermore, there is no need on CloudABI to
use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no
special meaning.
This change exposes the kern_dup() through <sys/syscallsubr.h> and puts
the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag,
FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not
allocated.
Differential Revision: https://reviews.freebsd.org/D3035
Reviewed by: mjg
namei used to vref fd_cdir, which was immediatley vrele'd on entry to
the loop.
Check for absolute lookup and vref the right vnode the first time.
Reviewed by: kib
fd_rdir vnode was stored in ni_rootdir without refing it in any way,
after which the filedsc lock was being dropped.
The vnode could have been freed by mountcheckdirs or another thread doing
chroot.
VREF the vnode while the lock is held.
Reviewed by: kib
MFC after: 1 week
and psci to start them. I expect ACPI support to be added later.
This has been tested on qemu with 2 cpus as that is the current value of
MAXCPUS. This is expected to be increased in the future as FreeBSD has
been tested on 48 cores on the Cavium ThunderX hardware.
Partially based on a patch from Robin Randhawa from ARM.
Approved by: ABT Systems Ltd
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3024
include this file without first including the headers needed for uint32_t
and the like use the __foo type.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
While writing tests for CloudABI, I noticed that close() on process
descriptors returns the process ID of the child process. This is
interesting, as close() is only allowed to return 0 or -1. It turns out
that we clobber td->td_retval[0] in proc_reap(), so that wait*()
properly returns the process ID.
Change proc_reap() to leave td->td_retval[0] alone. Set the return value
in kern_wait6() instead, by keeping track of the PID before we
(potentially) reap the process.
Differential Revision: https://reviews.freebsd.org/D3032
Reviewed by: kib
This commit reworks the code responsible for identification of
the CPUs during runtime.
It is necessary to provide a way for workarounds and erratums
to be applied only for certain HW versions.
The copy of MIDR is now stored in pcpu to provide a fast and
convenient way for assambly code to read it (pcpu is used quite often
so there is a chance it's inside the cache).
The MIDR is also better way of identification than using user-friendly
cpu_desc structure, because it can be compiled into comparision of
single u32 with only one access to the memory - this is crucial
for some erratums which are called from performance-critical
places.
Changes in cpu_identify makes this function safe to be called
on non-boot CPUs.
New function CPU_MATCH was implemented which returns boolean
value based on mathing masked MIDR with chip identification.
Example of usage:
printf("is thunder: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
CPU_IMPL_CAVIUM, CPU_PART_THUNDER, 0, 0));
printf("is generic: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
CPU_IMPL_ARM, CPU_PART_FOUNDATION, 0, 0));
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3030
loop finds the selfd entry and clears its sf_si pointer, which is
handled by selfdfree() in parallel, NULL sf_si makes selfdfree() free
the memory. The result is the race and accesses to the freed memory.
Refcount the selfd ownership. One reference is for the sf_link
linkage, which is unconditionally dereferenced by selfdfree().
Another reference is for sf_threads, both selfdfree() and
doselwakeup() race to deref it, the winner unlinks and than frees the
selfd entry.
Reported by: Larry Rosenman <ler@lerctr.org>
Tested by: Larry Rosenman <ler@lerctr.org>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.
CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.
The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.
We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.
The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executabled
to be started through execve().
More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA
Differential Revision: https://reviews.freebsd.org/D2848
Reviewed by: emaste, brooks
Obtained from: https://github.com/NuxiNL/freebsd
Merge upstream fix to eliminate build-breaking gcc warnings of no
importance.
commit: cab33b7a0acba7d2268a23c4383be6167106e549
Update ND_TTEST2 to fix issue 443
Add IS_NOT_NEGATIVE macro.
Avoid these warnings:
- comparison of unsigned expression >= 0 is always true [-Wtype-limits],
- comparison is always true due to limited range of data type [-Wtype-limits].
Reviewed by: adrian
Approved by: jmallett (mentor)
MFC after: 1 month
OCF w/o documentation...
Document the new (8+ year old) device_t way of handling things, that
_unregister_all will leave no threads in newsession, the _SYNC flag,
the requirement that a flag be specified...
Other minor changes like breaking up a wall of text into paragraphs...
session in multiple threads w/o locking.. There was a single fpu
context shared per session, if multiple threads were using the session,
and both migrated away, they could corrupt each other's fpu context...
This patch adds a per cpu context and a lock to protect it...
It also tries to better address unloading of the aesni module...
The pause will be removed once the OpenCrypto Framework provides a
better method for draining callers into _newsession...
I first discovered the fpu context sharing issue w/ a flood ping over
an IPsec tunnel between two bhyve machines... The patch in D3015
was used to verify that this fix does fix the issue...
Reviewed by: gnn, kib (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D3016
for timehands consumers, by using fences.
Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*]. tc_setget() allowed data update
writes to become visible before generation (but not on TSO
architectures).
Remove tc_setgen(), tc_getgen() helpers, use atomics inline [**].
Noted by: alc [*]
Requested by: bde [**]
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
seq_write_begin(), instead of the load_rmb/rbm_load functions. The
update does not need to be atomic due to the write lock owned.
Similarly, in seq_write_end(), update of *seqp needs not be atomic.
Only store must be atomic with release.
For seq_read(), the natural operation is the load acquire of the
sequence value, express this directly with atomic_load_acq_int()
instead of using custom partial fence implementation
atomic_load_rmb_int().
In seq_consistent, use atomic_thread_fence_acq() which provides the
desired semantic of ordering reads before fence before the re-reading
of *seqp, instead of custom atomic_rmb_load_int().
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
However, I've observed the active queue scan stopping when there are
frequent free page shortages and the inactive queue is steadily refilled
by other mechanisms, such as the sequential access heuristic in vm_fault()
or madvise(2). To remedy this problem, record the time of the last active
queue scan, and always scan a number of pages proportional to the time
since the last scan, regardless of whether that last scan was a
timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass >
0") scan.
Also, on a timeout-triggered scan, allow a full scan of the active queue
when the system is short of inactive pages.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division
jail.conf parameters. This flag disallows redefinition of the parameter.
"name" and/or "jid" are automatically defined in jail.conf by using
the jail names at the front of jail parameter definitions. However,
one could override them by using a variable with the same name like
$name = "foo". This confused the parser and could end up with SIGSEGV.
Note that this change also affects a case when all of parameters are
defined in the command line arguments, not in jail.conf. Specifically,
"jail -c name=j1 name=j2" no longer works. This should be harmless.
PR: 196574
Reviewed by: jamie
Differential Revision: https://reviews.freebsd.org/D3017