Remove useless release semantic for some stores to it_need. For
stores where the release is needed, add a comment explaining why.
Fence after the atomic_cmpset() op on the it_need should be acquire
only, release is not needed (see above). The combination of
atomic_cmpset() + fence_acq() is better expressed there as
atomic_cmpset_acq().
Use atomic_cmpset() for swi' ih_need read and clear.
Discussed with: alc, bde
Reviewed by: bde
Comments wording provided by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
with some cards that causes them to become deselected after probing for
switch capabilities. The old workaround fixes the behavior with some cards,
but causes problems with the cards the behave correctly and don't become
deselected. Forcing a deselect then reselect appears to work correctly
with all cards in initial testing.
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).
Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig. p_xexit contains complete status
and copied out into si_status.
Requested by: Joerg Schilling
Reviewed by: jilles (previous version), pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
a set of differentiated services, set IPTOS_PREC_* macros using
IPTOS_DSCP_* macro definitions.
While here, add IPTOS_DSCP_VA macro according to RFC 5865.
Differential Revision: https://reviews.freebsd.org/D3119
Reviewed by: gnn
LO_NOPROFILE is set. Some timecounter handlers acquire a spin mutex, and
we don't want to recurse if lockstat probes are enabled.
PR: 201642
Reviewed by: avg
MFC after: 3 days
enabled. The cost of a timecounter read can be quite significant, and the
problem became more apparent after r284297, since that change resulted in
a call to lockstat_nsecs() for each acquisition of an rwlock read lock.
PR: 201642
Reviewed by: avg
Tested by: Jason Unovitch
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D3073
It turns out that the CDDL sources already introduce a function called
thread_create(). I'll investigate what we can do to make these functions
coexist.
Reported by: Ivan Klymenko
scaling code does not use an uninitialized timestamp echo reply value
from the stack when timestamps are not enabled.
Differential Revision: https://reviews.freebsd.org/D3060
Reviewed by: hiren
Approved by: jmallett (mentor)
MFC after: 3 days
Sponsored by: Norse Corp, Inc.
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.
This function is going to be used by the CloudABI compatibility layer.
Reviewed by: kib
MFC after: 1 month
Basing on B.2.3.4:
Synchronization and coherency issues between data and
instruction accesses.
To ensure that modified instructions are visible to all PEs
(Processing Elements) in a shareability domain one need to
perform following sequence:
1. Clean D-cache
2. Ensure the visibility of data cleaned from cache
3. Invalidate I-cache
4. Ensure completion
5. In SMP system PE must issue isb to ensure execution of the
modified instructions
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3106
Secondary stack calculation is modified to provide
stack_top = secondary_stacks + (cpu_id) * PAGE_SIZE * KSTACK_PAGES
because on ARM64 the stack grows to lower memory addresses.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3107
Previous DMAP size was too small for systems with more than 64GB
of RAM. Increase it to 128GB to support ThunderX CRB.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3113
Add support for the <sys/mman.h> functions by wrapping around our own
implementations. There are no kern_*() variants of these system calls,
but we also don't need them in this case. It is sufficient to just call
into the sys_*() functions.
Differential Revision: https://reviews.freebsd.org/D3033
Reviewed by: brooks
save it for later. This enables direct manipulation of the indirection
tables (although the stock driver doesn't do that right now).
MFC after: 1 month
belongs to the kernel stack address range for the thread. Right now,
code checks that new frame is not farther then KSTACK_PAGES pages from
the current frame, which allows the address to point past the top of
the stack.
Reviewed by: andrew, emaste, markj
Differential revision: https://reviews.freebsd.org/D3108
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Summary:
For CloudABI we need to put two things on the stack of new processes:
the argument data (a binary blob; not strings) and a startup data
structure. The startup data structure contains interesting things such
as a pointer to the ELF program header, the thread ID of the initial
thread, a stack smashing protection canary, and a pointer to the
argument data.
Fetching system call arguments and setting the return value is similar
to FreeBSD. The only differences are that system call 0 does not exist
and that we call into cloudabi_convert_errno() to convert the error
code. We also need this function in a couple of other places, so we'd
better reuse it here.
Reviewers: dchagin, kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3098
1) Use the TX FIFO empty interrupts to poll the transmit FIFO usage,
instead of using own software counters and waiting for SOF
interrupts. Assume that enough FIFO space is available to execute one
USB OUT transfer of any kind when the TX FIFO is empty.
2) Use the host channel halted event to asynchronously wait for host
channels to be disabled instead of waiting for SOF interrupts. This
results in less turnaround time for re-using host channels and at the
same time increases the performance.
The network transmit performance measured by "iperf" for the "RPi-B v1
2011/12" board, increased from 45MBit/s to 65Mbit/s after applying the
changes above.
No regressions seen using:
- High Speed (BULK, CONTROL, INTERRUPT)
- Full Speed (All transfer types)
- Low Speed (Control and Interrupt)
MFC after: 1 month
Submitted by: Daisuke Aoyama <aoyama@peach.ne.jp>
Their primary use was in thread_cow_update to free up old resources.
Freeing had to be done with proc lock held and _cow_ funcs already knew
how to free old structs.
Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free
list) of either counter are still guarded with vnode interlock.
Reviewed by: kib (earlier version)
Tested by: pho
If KSTACK_PAGES was changed to anything alse than the default,
the value from param.h was taken instead in some places and
the value from KENRCONF in some others. This resulted in
inconsistency which caused corruption in SMP envorinment.
Ensure all places where KSTACK_PAGES are used the opt_kstack_pages.h
is included.
The file opt_kstack_pages.h could not be included in param.h
because was breaking the toolchain compilation.
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3094
This commit adds proper cache and shareability attributes to
the TCR register.
Set memory attributes to Normal, outer and inner cacheable WBWA.
Set shareability to inner and outer shareable when SMP is enabled.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3093
Summary:
In a runtime that is purely based on capability-based security, there is
a strong emphasis on how programs start their execution. We need to make
sure that we execute an new program with an exact set of file
descriptors, ensuring that credentials are not leaked into the process
accidentally.
Providing the right file descriptors is just half the problem. There
also needs to be a framework in place that gives meaning to these file
descriptors. How does a CloudABI mail server know which of the file
descriptors corresponds to the socket that receives incoming emails?
Furthermore, how will this mail server acquire its configuration
parameters, as it cannot open a configuration file from a global path on
disk?
CloudABI solves this problem by replacing traditional string command
line arguments by tree-like data structure consisting of scalars,
sequences and mappings (similar to YAML/JSON). In this structure, file
descriptors are treated as a first-class citizen. When calling exec(),
file descriptors are passed on to the new executable if and only if they
are referenced from this tree structure. See the cloudabi-run(1) man
page for more details and examples (sysutils/cloudabi-utils).
Fortunately, the kernel does not need to care about this tree structure
at all. The C library is responsible for serializing and deserializing,
but also for extracting the list of referenced file descriptors. The
system call only receives a copy of the serialized data and a layout of
what the new file descriptor table should look like:
int proc_exec(int execfd, const void *data, size_t datalen, const int *fds,
size_t fdslen);
This change introduces a set of fd*_remapped() functions:
- fdcopy_remapped() pulls a copy of a file descriptor table, remapping
all of the file descriptors according to the provided mapping table.
- fdinstall_remapped() replaces the file descriptor table of the process
by the copy created by fdcopy_remapped().
- fdescfree_remapped() frees the table in case we aborted before
fdinstall_remapped().
We then add a function exec_copyin_data_fds() that builds on top these
functions. It copies in the data and constructs a new remapped file
descriptor. This is used by cloudabi_sys_proc_exec().
Test Plan:
cloudabi-run(1) is capable of spawning processes successfully, providing
it data and file descriptors. procstat -f seems to confirm all is good.
Regular FreeBSD processes also work properly.
Reviewers: kib, mjg
Reviewed By: mjg
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3079
It appears that the linker will not handle 64-bit relocations at addresses that
are not aligned to 8-byte boundaries. Prior to this change the line:
.llong generictrap
was aligned to a 4-byte address, and the linker replaced that with an 8-byte
0x0. Aligning that address to 8 bytes caused the linker to generate the proper
relocation. As a follow-through, the dblow from trap_subr33.S used the code
sequence 'lwz %r1, TRAP_GENTRAP(0)', so this reproduces the analogue of that for
64-bit.
polling at device attach time [1].
Add tunables 'debug.uart_force_poll' and 'debug.uart_poll_freq' to control
uart polling.
Submitted by: Aleksey Kuleshov (rndfax@yandex.ru) [1]
architectures. Atomic_cmpset_int(9) is a direct replacement, due to
loop. The change fixes arm, arm64, mips an sparc64, which lack
atomic_swap().
Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
current value. It is believed that the change is the real fix for the
issue which was covered over by the r252683.
With the current code, if the interrupt handler sets it_need between
read and consequent reset, the update could be lost and
ithread_execute_handlers() would not be called in response to the lost
update.
The r252683 could have hide the issue since at the moment of commit,
atomic_load_acq_int() did locked cmpxchg on the variable, which puts
the cache line into the exclusive owned state and clears store
buffers. Then the immediate store of zero has very high chance of
reusing the exclusive state of the cache line and make the load and
store sequence operate as atomic swap.
For now, add the acq+rel fence immediately after the swap, to not
disturb current (but excessive) ordering. Acquire is needed for the
ih_need reads after the load, while release does not serve a useful
purpose [*].
Reviewed by: alc
Noted by: alc [*]
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Identify current CPU. This is necessary to setup
affinity registers and to provide support for
runtime chip identification.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3095
We can map these system calls directly to the FreeBSD counterparts. The
other filesystem related system calls will be sent out for review
separately, as they are a bit more complex to get right.
an if_ixv instance can now set at creation time, and the receive ring
tail pointer is correctly initialized (previously, things still worked
because the receive ring tail pointer was being fixed up as a side
effect of other activity).
Differential Revision: https://reviews.freebsd.org/D2922
Reviewed by: erj, gnn
Approved by: jmallett (mentor)
Sponsored by: Norse Corp, Inc.
I performed the commit on a different system as where I wrote the
change. After pulling in the change from Phabricator, I didn't notice
that a single chunk did not apply.
Approved by: secteam (implicit, as intended change was approved)
Pointy hat to: me
The random_get() system call works similar to getentropy()/getrandom()
on OpenBSD/Linux. It fills a buffer with random data.
This change introduces a new function, read_random_uio(), that is used
to implement read() on the random devices. We can call into this
function from within the CloudABI compatibility layer.
Approved by: secteam
Reviewed by: jmg, markm, wblock
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3053
The first system call is used to set the user TLS address. Right now
this system call is invoked by the C library for both the initial thread
and additional threads unconditionally, but in the future we'll only
call this if the architecture does not support this. On recent x86-64
CPUs we could use the WRFSBASE instruction.
This system call was erroneously placed in sys/compat/cloudabi64, even
though it does not depend on any pointer size dependent datastructure.
Move it to the right place.
Obtained from: https://github.com/NuxiNL/freebsd
Add a routine similar to copyinuio() and freebsd32_copyinuio() that
copies in CloudABI's struct iovecs. These are then translated into
FreeBSD format and placed in a 'struct uio', so we can call into the
kern_*() functions.
Obtained from: https://github.com/NuxiNL/freebsd
Summary:
As discussed with kib@ in response to r285404, don't call into
kern_sigaction() within proc_raise() to reset the signal to the default
action before delivery. We'd better do that during image execution.
Change the code to simply use pksignal(), so we don't waste cycles on
functions like pfind() to look up the currently running process itself.
Test Plan:
This change has also been pushed into the cloudabi branch on GitHub. The
raise() tests still seem to pass.
Reviewers: kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3076
Call arm_init_secondary before any other PIC-related functions
are called. This is necessary for GICv3 where PIC_INIT_SECONDARY
allocates resources needed for all further operations.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3066
On ARMv8 IPIs are mapped to 0-15. Incrementing the number by 16
is wrong, because it sets a reserved bit in the IPI register.
This patch removes all "+16" to comply with specs.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3029
not. When doing multiqueue, we are all setup to have full 32bit RSS hash from
the card. We do not need to hide that under "ifdef RSS" and should expose that
by default so others like lagg(4) can use that and avoid hashing the traffic by
themselves.
While here, delete the FreeBSD version check and use of deprecated M_FLOWID.
Reviewed by: adrian, erj
MFC after: 1 week
Sponsored by: Limelight Networks
Though confusing, GCM using ICM_BLOCK_LEN, but ICM does not is
correct... GCM is built on ICM, but uses a function other than
swcr_encdec... swcr_encdec cannot handle partial blocks which is
why it must still use AES_BLOCK_LEN and is why XTS was broken by the
commit...
Thanks to the tests for helping sure I didn't break GCM w/ an earlier
patch...
I did run the tests w/o this patch, and need to figure out why they
did not fail, clearly more tests are needed...
Prodded by: peter
Comment that cryptodev shouldn't be used unless you know what you're
doing...
The various arm/mips and one powerpc configs that have cryptodev in
them need to be addressed, audited if they provide benefit and removed
if they don't...
unp_dispose and unp_gc could race to teardown the same mbuf chains, which
can lead to dereferencing freed filedesc pointers.
This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
as invalid/freed. The flag is protected by UNP_LIST_LOCK.
To serialize against unp_gc, unp_dispose needs the socket object. Change the
dom_dispose() KPI to take a socket object instead of an mbuf chain directly.
PR: 194264
Differential Revision: https://reviews.freebsd.org/D3044
Reviewed by: mjg (earlier version)
Approved by: markj (mentor)
Obtained from: mjg
MFC after: 1 month
Sponsored by: EMC / Isilon Storage Division
IDs in initiator mode -- index in port database instead of handlers.
This makes initiator IDs persist across role changes and firmware resets,
when handlers previously assigned by firmware are lost and reused.
Sponsored by: iXsystems, Inc.
o Return the real hardware state in gpio_pin_getflags() instead of keep
the last state in an internal table. Now the driver returns the real
state of pins (input/output and pull-up/pull-down) at all times.
o Use a spin mutex. This is required by interrupts and the 1-wire code.
o Use better variable names and place parentheses around them in MACROS.
o Do not lock the driver when returning static data.
Tested with gpioled(4) and DS1820 (1-wire) sensors on banana pi.
If a signal is caught in pipelock, causing it to fail, pipe_direct_write
should not try to pipeunlock.
Reported by: pho
Differential Revision: https://reviews.freebsd.org/D3069
Reviewed by: kib
Approved by: markj (mentor)
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
Aside from cleaner and more consistent code, this allows ports to be both
target and initiator same time, and easily switch from any role to any.
Sponsored by: iXsystems, Inc.
in units of crypto blocks, so must have adequate space to write.
This means needing to be careful about buffers and keeping track
of external read request length.
Approved by: so (/dev/random blanket)
PHY instead of the revision of the RADIO.
This fixes the RF switch state polling.
This is from DragonflyBSD, Commit 202e28d1f65e9f35df6032400df3242a3bafb483
Obtained from: DragonflyBSD
time between ntp_adjtime() clock offset adjustments. This eliminates spurious
frequency steering after a large clock step (such as a 1970->2015 step on a
system with no battery-backed clock hardware).
This problem was discovered after the import of ntpd 4.2.8, which does things
in a slightly different (but still correct) order than the 4.2.4 we had
previously. In particular, 4.2.4 would step the clock then immediately after
use ntp_adjtime() to set the frequency and offset to zero, which captured the
post-step time-of-day as a side effect. In 4.2.8, ntpd sets frequency and
offset to zero before any initial clock step, capturing the time as 1970-ish,
then when it next calls ntp_adjtime() it's with a non-zero offset measurement.
This non-zero value gets multiplied by the apparent 45-year interval, which
blows up into a completely bogus frequency steer. That gets clamped to
500ppm, but that's still enough to make the clock drift so fast that ntpd has
to keep stepping it every few minutes to compensate.
- Tweek man page.
- Remove all mention of RANDOM_FORTUNA. If the system owner wants YARROW or DUMMY, they ask for it, otherwise they get FORTUNA.
- Tidy up headers a bit.
- Tidy up declarations a bit.
- Make static in a couple of places where needed.
- Move Yarrow/Fortuna SYSINIT/SYSUNINIT to randomdev.c, moving us towards a single file where the algorithm context is used.
- Get rid of random_*_process_buffer() functions. They were only used in one place each, and are better subsumed into those places.
- Remove *_post_read() functions as they are stubs everywhere.
- Assert against buffer size illegalities.
- Clean up some silly code in the randomdev_read() routine.
- Make the harvesting more consistent.
- Make some requested argument name changes.
- Tidy up and clarify a few comments.
- Make some requested comment changes.
- Make some requested macro changes.
* NOTE: the thing calling itself a 'unit test' is not yet a proper
unit test, but it helps me ensure things work. It may be a proper
unit test at some time in the future, but for now please don't make
any assumptions or hold any expectations.
Differential Revision: https://reviews.freebsd.org/D2025
Approved by: so (/dev/random blanket)
ACPI driver requires special functions to be provided by machdep code.
Add temporary stubs to satisfy the compiler when both "pci" and "acpi"
are enabled in the kernel configuration file.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3028
CloudABI does not provide an explicit kill() system call, for the reason
that there is no access to the global process namespace. Instead, it
offers a raise() system call that can at least be used to terminate the
process abnormally.
CloudABI does not support installing signal handlers. CloudABI's raise()
system call should behave as if the default policy is set up. Call into
kern_sigaction(SIG_DFL) before calling sys_kill() to force this.
Obtained from: https://github.com/NuxiNL/freebsd
Previously vputx would detect the condition and clear the flag.
With this change it is invalid to have both v_usecount > 0 and the flag
set. Assert the condition is met in all revlevant places.
Reviewed by: kib
Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.
Reviewed by: kib
This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.
* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
policities.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
set in a variety of methods.
This is only relevant for very specific workloads.
This doesn't pretend to be a final NUMA solution.
The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.
This is only relevant if MAXMEMDOM is set to something other than 1.
Ie, if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.
Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.
Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.
Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.
Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!
Tested:
* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thankyou norse!)
* sandy bridge, 2 socket (thankyou dell!)
* ivy bridge, 2 socket (thankyou norse!)
* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
* haswell, 2 socket (thankyou norse!)
* haswell v3, 2 socket (thankyou dell)
* haswell v3, 2x18 core (thankyou scott long / netflix!)
* Peter Holm ran a stress test suite on this work and found one
issue, but has not been able to verify it (it doesn't look NUMA
related, and he only saw it once over many testing runs.)
* I've tested bhyve instances running in fixed NUMA domains and cpusets;
all seems to work correctly.
Verified:
* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
NUMA policies for processes under test.
Review:
This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@. The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).
This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but not achieved a clear consensus. My hope is that with further
exposure and testing more functionality can be implemented and evaluated.
Notes:
* The VM doesn't handle unbalanced domains very well, and if you have an overly
unbalanced memory setup whilst under high memory pressure, VM page allocation
may fail leading to a kernel panic. This was a problem in the past, but it's
much more easily triggered now with these tools.
* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory
isn't really guaranteed in any way. That's next on my plate.
Sponsored by: Norse Corp, Inc.; Dell
objects, i.e. for buffer objects which vnode was reclaimed. Buffer
cache cannot write such buffers. Return the error and discard the
buffer immediately on write attempt.
BO_DIRTY now always set during vnode reclamation, since it is used not
only for the INVARIANTS checks. Do allow placement of the clean
buffers on dead bufobj list, otherwise filesystems cannot use bufcache
at all after the devvp reclaim.
Reported and tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
- use 1ULL to avoid shift truncations
- recompute the sum of weight dynamically to provide better fairness
- fix an erroneous constant in the computation of the slot
- preserve timestamp correctness when the old timestamp is stale.
(detected by jenkins run with gcc 4.9)
Update documentation on the use of netmap_priv_d,
rename the refcount and use the same structure in
FreeBSD and linux
No functional changes.
- make mode enum start from 0 so that the assertion covers all cases [1]
- rename prefix _CLOEXEC flag with _FLAG
- postpone fhold on the old file descriptor, which eliminates the need to fdrop
in error cases.
- fixup FDDUP_FCNTL check missed in the previous commit
This removes 'fp == oldfde->fde_file' assertion which had little value. kern_dup
only calls fd-related functions which cannot drop the lock or a whole lot of
races would be introduced.
Noted by: kib [1]
versions of the x87 tags. The conversion is naive, used abridged tag
is converted to valid unabridged, without additional checks for zero
and special values.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
i386/include/frame.h after a code was moved from machdep.c to frame.h
in r284925.
Use include guards style similar to other guards.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
to more C11-ish atomic_thread_fence_seq_cst().
Note that on PowerPC, which currently uses lwsync for mb(), the change
actually fixes the missed store/load barrier, intended by r271604 [*].
Reviewed by: alc
Noted by: alc [*]
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
We currently return EINVAL when calling listen() on a UNIX socket that
has not been bound to a pathname. If my interpretation of POSIX is
correct, we should return EDESTADDRREQ: "The socket is not bound to a
local address, and the protocol does not support listening on an unbound
socket."
Return EDESTADDRREQ instead when not bound and not connected.
Differential Revision: https://reviews.freebsd.org/D3038
Reviewed by: gnn, network
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:
- fix zerocopy monitor ports and introduce copying monitor ports
(the latter are lower performance but give access to all traffic
in parallel with the application)
- exclusive open mode, useful to implement solutions that recover
from crashes of the main netmap client (suggested by Patrick Kelsey)
- revised memory allocator in preparation for the 'passthrough mode'
(ptnetmap) recently presented at bsdcan. ptnetmap is described in
S. Garzarella, G. Lettieri, L. Rizzo;
Virtual device passthrough for high speed VM networking,
ACM/IEEE ANCS 2015, Oakland (CA) May 2015
http://info.iet.unipi.it/~luigi/research.html
- fix rx CRC handing on ixl
- add module dependencies for netmap when building drivers as modules
- minor simplifications to device-specific routines (*txsync, *rxsync)
- general code cleanup (remove unused variables, introduce macros
to access rings and remove duplicate code,
Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).
Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.
MFC after: 1 month
Sponsored by: (partly) Verisign, Cisco
Detected by clang 3.7.0 with the warning:
sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c:309:18: error: variable
'rptr' is uninitialized when used here [-Werror,-Wuninitialized]
chp->cq.rptr = rptr;
^~~~
MFC after: 1 week
mode and with hardware support on systems that have AESNI instructions.
Differential Revision: D2936
Reviewed by: jmg, eri, cognet
Sponsored by: Rubicon Communications (Netgate)
The logic is reorganised so that there is one exit point prior to the
lookup loop. This is an intermediate step to making audit logging
functions use found vnode instead of translating ni_dirfd on their own.
ni_startdir validation is removed. The only in-tree consumer is nfs
which already makes sure it is a directory.
Reviewed by: kib
apparently neither clang nor gcc complain about this.
But clang intis the var to NULL correctly while gcc on at least mips does not.
Correct the undefined behavior by initializing the variable properly.
PR: 201371
Differential Revision: https://reviews.freebsd.org/D3036
Reviewed by: gnn
Approved by: gnn(mentor)
All of the CloudABI system calls that operate on file descriptors of an
arbitrary type are prefixed with fd_. This change adds wrappers for
most of these system calls around their FreeBSD equivalents.
The dup2() system call present on CloudABI deviates from POSIX, in the
sense that it can only be used to replace existing file descriptor. It
cannot be used to create new ones. The reason for this is that this is
inherently thread-unsafe. Furthermore, there is no need on CloudABI to
use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no
special meaning.
This change exposes the kern_dup() through <sys/syscallsubr.h> and puts
the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag,
FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not
allocated.
Differential Revision: https://reviews.freebsd.org/D3035
Reviewed by: mjg
namei used to vref fd_cdir, which was immediatley vrele'd on entry to
the loop.
Check for absolute lookup and vref the right vnode the first time.
Reviewed by: kib
fd_rdir vnode was stored in ni_rootdir without refing it in any way,
after which the filedsc lock was being dropped.
The vnode could have been freed by mountcheckdirs or another thread doing
chroot.
VREF the vnode while the lock is held.
Reviewed by: kib
MFC after: 1 week
and psci to start them. I expect ACPI support to be added later.
This has been tested on qemu with 2 cpus as that is the current value of
MAXCPUS. This is expected to be increased in the future as FreeBSD has
been tested on 48 cores on the Cavium ThunderX hardware.
Partially based on a patch from Robin Randhawa from ARM.
Approved by: ABT Systems Ltd
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3024
include this file without first including the headers needed for uint32_t
and the like use the __foo type.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
While writing tests for CloudABI, I noticed that close() on process
descriptors returns the process ID of the child process. This is
interesting, as close() is only allowed to return 0 or -1. It turns out
that we clobber td->td_retval[0] in proc_reap(), so that wait*()
properly returns the process ID.
Change proc_reap() to leave td->td_retval[0] alone. Set the return value
in kern_wait6() instead, by keeping track of the PID before we
(potentially) reap the process.
Differential Revision: https://reviews.freebsd.org/D3032
Reviewed by: kib
This commit reworks the code responsible for identification of
the CPUs during runtime.
It is necessary to provide a way for workarounds and erratums
to be applied only for certain HW versions.
The copy of MIDR is now stored in pcpu to provide a fast and
convenient way for assambly code to read it (pcpu is used quite often
so there is a chance it's inside the cache).
The MIDR is also better way of identification than using user-friendly
cpu_desc structure, because it can be compiled into comparision of
single u32 with only one access to the memory - this is crucial
for some erratums which are called from performance-critical
places.
Changes in cpu_identify makes this function safe to be called
on non-boot CPUs.
New function CPU_MATCH was implemented which returns boolean
value based on mathing masked MIDR with chip identification.
Example of usage:
printf("is thunder: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
CPU_IMPL_CAVIUM, CPU_PART_THUNDER, 0, 0));
printf("is generic: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
CPU_IMPL_ARM, CPU_PART_FOUNDATION, 0, 0));
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3030
loop finds the selfd entry and clears its sf_si pointer, which is
handled by selfdfree() in parallel, NULL sf_si makes selfdfree() free
the memory. The result is the race and accesses to the freed memory.
Refcount the selfd ownership. One reference is for the sf_link
linkage, which is unconditionally dereferenced by selfdfree().
Another reference is for sf_threads, both selfdfree() and
doselwakeup() race to deref it, the winner unlinks and than frees the
selfd entry.
Reported by: Larry Rosenman <ler@lerctr.org>
Tested by: Larry Rosenman <ler@lerctr.org>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.
CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.
The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.
We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.
The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executabled
to be started through execve().
More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA
Differential Revision: https://reviews.freebsd.org/D2848
Reviewed by: emaste, brooks
Obtained from: https://github.com/NuxiNL/freebsd
session in multiple threads w/o locking.. There was a single fpu
context shared per session, if multiple threads were using the session,
and both migrated away, they could corrupt each other's fpu context...
This patch adds a per cpu context and a lock to protect it...
It also tries to better address unloading of the aesni module...
The pause will be removed once the OpenCrypto Framework provides a
better method for draining callers into _newsession...
I first discovered the fpu context sharing issue w/ a flood ping over
an IPsec tunnel between two bhyve machines... The patch in D3015
was used to verify that this fix does fix the issue...
Reviewed by: gnn, kib (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D3016
for timehands consumers, by using fences.
Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*]. tc_setget() allowed data update
writes to become visible before generation (but not on TSO
architectures).
Remove tc_setgen(), tc_getgen() helpers, use atomics inline [**].
Noted by: alc [*]
Requested by: bde [**]
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
seq_write_begin(), instead of the load_rmb/rbm_load functions. The
update does not need to be atomic due to the write lock owned.
Similarly, in seq_write_end(), update of *seqp needs not be atomic.
Only store must be atomic with release.
For seq_read(), the natural operation is the load acquire of the
sequence value, express this directly with atomic_load_acq_int()
instead of using custom partial fence implementation
atomic_load_rmb_int().
In seq_consistent, use atomic_thread_fence_acq() which provides the
desired semantic of ordering reads before fence before the re-reading
of *seqp, instead of custom atomic_rmb_load_int().
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
However, I've observed the active queue scan stopping when there are
frequent free page shortages and the inactive queue is steadily refilled
by other mechanisms, such as the sequential access heuristic in vm_fault()
or madvise(2). To remedy this problem, record the time of the last active
queue scan, and always scan a number of pages proportional to the time
since the last scan, regardless of whether that last scan was a
timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass >
0") scan.
Also, on a timeout-triggered scan, allow a full scan of the active queue
when the system is short of inactive pages.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division
On platforms which are fully IO-coherent, the map might be null.
We need to guarantee that all data is observable after the
sync operation is called. Add a memory barrier to ensure that on ARM.
Reviewed by: andrew, kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3012
The number of available lock list entries for a thread is LOCK_CHILDCOUNT,
and each entry can record up to LOCK_NCHILDREN locks. When iterating over
the locks held by a thread, a bound on the loop index is therefore given
by LOCK_CHILDCOUNT * LOCK_NCHILDREN; WITNESS_COUNT is an unrelated
constant.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2974
The UEFI loader on the 10.1 release install disk (disc1) modifies an
existing EFI_DEVICE_PATH_PROTOCOL instance in an apparent attempt to
truncate the device path. In doing so it creates an invalid device
path.
Perform the equivalent action without modification of structures
allocated by firmware.
PR: 197641
MFC After: 1 week
Submitted by: Chris Ruffin <chris.ruffin@intel.com>
Place sched_random nearer to where it's first used: moving the
code nearer to where it is used makes the code easier to read
and we can reduce the initial "#ifdef SMP" island.
Reword a little the comment and clean some whitespaces
while here.
The 6205 (Taylor Peak) in the Lenovo X230 works fine in 5GHz 11a and 11n HT20,
but not 11n HT40. The NIC goes RX deaf the moment HT40 is configured.
It's so RX deaf that it doesn't even hear beacons and the firmware sends
"BEACON MISS" events. That's pretty deaf.
I tried configuring up the HT40 flags in monitor mode and it worked - so
I assumed that doing the transition from 20 -> 40MHz channel configuration
when going auth->assoc (ie, after the NIC has been partially configured)
is a problem.
So for now, let's just always set them if they're available.
Tested:
* Intel 5300, STA mode, 5GHz HT/40 AP; 2GHz HT/20 AP
* Intel 6205, STA mode, 5GHz HT/40, HT20, 11a AP; 2GHz HT/20 AP
This was pointed out to me by coworkers trying to use FreeBSD-HEAD
in the office on their Thinkpad T420p laptops.
TODO:
* I don't like how the HT40 flags are configured - the whole interop/
protection config should be re-checked. Notably, I think curhtprotmode
is 0 in a lot of cases, which means "no interoperability" and i think
that's busted.
Sponsored by: Norse Corp, Inc.
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.
Requested by: grehan
flags are not specified... This bug was introduced in r275732...
This only affects IPsec ESP only policies w/ the aesni module loaded,
other subsystems specify one or both of the flags...
Reviewed by: gnn, delphij, eri
Add ARM ITS (Interrupt Translation Services) support required
to bring-up message signalled interrupts on some ARM64 platforms.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
pointer is NULL, as in that case there are no userland pages that
could potentially be wired. It is common for old to be NULL and
oldlenp to be non-NULL in calls to userland_sysctl(), as this is used
to probe for the length of a variable-length sysctl entry before
retrieving a value. Note that it is typical for such calls to be made
with an uninitialized value in *oldlenp, so sysctlmemlock was
essentially being acquired at random (depending on the uninitialized
value in *oldlenp being > PAGE_SIZE or not) for these calls prior to
this patch.
Differential Revision: https://reviews.freebsd.org/D2987
Reviewed by: mjg, kib
Approved by: jmallett (mentor)
MFC after: 1 month
Summary:
Both booke and AIM interrupt.c files contain nearly identical code. This merges
the two files, to reduce duplication.
Reviewers: #powerpc, marcel
Reviewed By: marcel
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D2991
bd_hdrcmplt. As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt. Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.
Differential Revision: https://reviews.freebsd.org/D2989
Reviewed by: gnn, melifaro
Approved by: jmallett (mentor)
MFC after: 3 days
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.
Reviewed by: kib, mjg
Differential Revision: https://reviews.freebsd.org/D2937
To avoid conflicts between target and initiator devices in CAM, make
CTL use target ID reported by HBA as its initiator_id in XPT_PATH_INQ.
That target ID is known to never be used for initiator role, so it won't
conflict. For Fibre Channel and FireWire HBAs this specific ID choice
is irrelevant since all target IDs there are virtual. Same time for SPI
HBAs it seems could be even requirement to use same target ID for both
initiator and target roles.
While there are some more things to polish in isp(4) driver, first tests
of using both roles same time on the same port appeared successfull:
# camcontrol devlist -v
scbus0 on isp0 bus 0:
<FREEBSD CTLDISK 0001> at scbus0 target 1 lun 0 (da20,pass21)
<> at scbus0 target 256 lun 0 (ctl0)
<> at scbus0 target -1 lun ffffffff (ctl1)
FreeBSD never had limitation on number of target IDs, and there is no
any other requirement to allocate them densely. Since slots of port
database already populated just sequentially, there is no much need
for another indirection to allocate sequentially too.