Mark the pipe() system call as COMPAT10.
As of r302092 libc uses pipe2() with a zero flags value instead of pipe().
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D6816
As of r302092 libc uses pipe2() with a zero flags value instead of pipe().
Commit with regenerated files and implementation to follow.
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D6816
File and disk-backed I/O requests store counts of read/written disk
blocks in each AIO job so that they can be charged to the thread that
completes an AIO request via aio_return() or aio_waitcomplete(). This
change extends AIO jobs to store counts of received/sent messages and
updates socket backends to set these counts accordingly. Note that
the socket backends are careful to only charge a single messages for
each AIO request even though a single request on a blocking socket might
invoke sosend or soreceive multiple times. This is to mimic the
resource accounting of synchronous read/write.
Adjust the UNIX socketpair AIO test to verify that the message resource
usage counts update accordingly for aio_read and aio_write.
Approved by: re (hrs)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6911
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.
Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
to mount points with the given filesystem type, specified by mount
vfs_ops pointer.
Based on patch by: mckusick
Reviewed by: avg, mckusick
Tested by: allanjude, madpilot
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
threads, to make it less confusing and using modern kernel terms.
Rename the functions to reflect current use of the functions, instead
of the historic KSE conventions:
cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
cpu_set_upcall -> cpu_copy_thread (for forks)
cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation)
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
Differential revision: https://reviews.freebsd.org/D6731
reason for it in modern times. In the other case, expand the comment
stating instead of doubting.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
X-Differential revision: https://reviews.freebsd.org/D6731
This reduces the size of kaiocb slightly. I've also added some generic
fields that other backends can use in place of the BIO-specific fields.
Change the socket and Chelsio DDP backends to use 'backend3' instead of
abusing _aiocb_private.status directly. This confines the use of
_aiocb_private to the AIO internals in vfs_aio.c.
Reviewed by: kib (earlier version)
Approved by: re (gjb)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6547
we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which
wins race with us could dereference the covered vnode, and we are
left with the locked freed memory.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week
The allproc_lock lock used in the sysctl_kern_corefile function is initialized
in the procinit function which is called after setting sysctl values at boot.
That means if we set kern.corefile at boot we will be trying to use
lock with is uninitialized and machine will crash.
If we define kern.corefile as tunable instead of using CTFLAG_RWTUN we will
not call the sysctl_kern_corefile function and we will not use an uninitialized
lock. When machine will boot then we will start using function depending on
the lock.
Reviewed by: pjd
by holding allprison_lock exclusively (even if only for a moment before
downgrading) on all paths that call PR_METHOD_REMOVE. Since they may run
on a downgraded lock, it's still possible for them to run concurrently
with PR_METHOD_GET, which will need to use the prison lock.
when the process credentials were not changed. This can happen if an
error occured trying to activate the setuid binary. And on error, if
new credentials were not yet assigned, they must be freed to not
create the leak.
Use oldcred == NULL as the predicate to detect credential
reassignment.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
panic string again if set, in case it scrolled out of the active
window. This avoids having to remember the symbol name.
Also add a show callout <addr> command to DDB in order to inspect
some struct callout fields in case of panics in the callout code.
This may help to see if there was memory corruption or to further
ease debugging problems.
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Reviewed by: jhb (comment only on the show panic initally)
Differential Revision: https://reviews.freebsd.org/D4527
p_sched is unused.
The struct td_sched is always co-allocated with the struct thread,
except for the thread0. Avoid useless indirection, instead calculate
td_sched location using simple pointer arithmetic in td_get_sched(9).
For thread0, which is statically allocated, create a structure to
emulate layout of the dynamic allocation.
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D6711
BUS_MAP_INTR() is used to get an interrupt mapping data according
to provided hints. The hints could be modified afterwards, but only
if mapping data was allocated. This method is intended to be called
before BUS_ALLOC_RESOURCE().
An interrupt mapping data describes an interrupt - hardware number,
type, configuration, cpu binding, and whatever is needed to setup it.
(2) Introduce a method which allows storing of an additional data
in struct resource to be available for bus drivers. This method is
convenient in two ways:
- there is no need to rework existing bus drivers as they can simply
be extended to provide an additional data,
- there is no need to modify any existing bus methods as struct
resource is already passed to them as argument and thus stored data
is simply accessible by other bus drivers.
For now, implement this method only for INTRNG.
This is motivated by needs of modern SOCs where hardware initialization
is not straightforward and resources descriptions are complex, opaque
for everyone but provider, and may vary from SOC to SOC. Typical
situation is that one bus driver can fetch a resource description for
its child device, but it's opaque for this driver. Another bus driver
knows a provider for this kind of resource and can pass this resource
description to it. In fact, something like device IVARS would be
perfect for that if implemented generally enough. Unfortunatelly, IVARS
are usable only by their owners now. Only owner knows its IVARS layout,
thus other bus drivers are not able to use them.
Differential Revision: https://reviews.freebsd.org/D6632
range of interrupts they pass to a second controller driver to handle.
The parent driver is expected to detect when one of these interrupts has
been triggered and call intr_child_irq_handler to pass the interrupt to
a child. The children controllers are then expected to manage the range
by allocating interrupts as needed.
This will initially be used by the ARM GICv3 driver, but is is expected to
be useful for other driver where this type of allocation applies.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6436
Inline version of primitives do an atomic op and if it fails they fallback to
actual primitives, which immediately retry the atomic op.
The obvious optimisation is to check if the lock is free and only then proceed
to do an atomic op.
Reviewed by: jhb, vangyzen
specific order. VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.
Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.
Reviewed by: gnn, tuexen (sctp), jhb (kernel.h)
Obtained from: projects/vnet
MFC after: 2 weeks
X-MFC: do not remove pr_destroy
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6652
The lock was temporarily dropped for vrele calls, but they can be
postponed to a point where the lock is not held in the first place.
While here shuffle other code not needing the lock.
This was previously set after the hook and only if auxargs were present.
Now always provide it if possible.
MFC after: 2 weeks
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D6546
This allows an EVENTHANDLER(process_exec) hook to see if the new image
will cause credentials to change whether due to setgid/setuid or because
of POSIX saved-id semantics.
This adds 3 new fields into image_params:
struct ucred *newcred Non-null if the credentials will change.
bool credential_setid True if the new image is setuid or setgid.
This will pre-determine the new credentials before invoking the image
activators, where the process_exec hook is called. The new credentials
will be installed into the process in the same place as before, after
image activators are done handling the image.
MFC after: 2 weeks
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D6544
Use the C99 'static' keyword to hint to the compiler IVs and output digest
sizes. The keyword informs the compiler of the minimum valid size for a given
array. Obviously not every pointer can be validated (i.e., the compiler can
produce false negative but not false positive reports).
No functional change. No ABI change.
Sponsored by: EMC / Isilon Storage Division
Because the size of bool can be implementation defined, make a bool
sysctl handler which handle bools. Userspace sees the bools like
unsigned 8-bit integers. Values are filtered to either 1 or 0 upon
read and write, similar to what a compiler would do.
Requested by: kmacy @
Sponsored by: Mellanox Technologies
have ACLE support built in. The ACLE (ARM C Language Extensions) defines
a set of standardized symbols which indicate the architecture version and
features available. ACLE support is built in to modern compilers (both
clang and gcc), but absent from gcc prior to 4.4.
ARM (the company) provides the acle-compat.h header file to define the
right symbols for older versions of gcc. Basically, acle-compat.h does
for arm about the same thing cdefs.h does for freebsd: defines
standardized macros that work no matter which compiler you use. If ARM
hadn't provided this file we would have ended up with a big #ifdef __arm__
section in cdefs.h with our own compatibility shims.
Remove #include <machine/acle-compat.h> from the zillion other places (an
ever-growing list) that it appears. Since style(9) requires sys/types.h
or sys/param.h early in the include list, and both of those lead to
including cdefs.h, only a couple special cases still need to include
acle-compat.h directly.
Loves it: imp
After the previous changes to fix requests on blocking sockets to complete
across multiple operations, an edge case exists where a request can be
cancelled after it has partially completed. POSIX doesn't appear to
dictate exactly how to handle this case, but in general I feel that
aio_cancel() should arrange to cancel any request it can, but that any
partially completed requests should return a partial completion rather
than ECANCELED. To that end, fix the socket AIO cancellation routine to
return a short read/write if a partially completed request is cancelled
rather than ECANCELED.
Sponsored by: Chelsio Communications
We may enable interrupts from within the callback, e.g. in a data abort
during copyin. If we receive an interrupt at that time pmc_hook will be
called again and, as it is handling userspace stack tracing, will hit a
KASSERT as it checks if the trapframe is from userland.
With this I can run hwpmc with intrng on a ThunderX and have it trace all
CPUs.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Always requeue an AIO job at the head of the socket buffer's queue if
sosend() or soreceive() returns EWOULDBLOCK on a blocking socket.
Previously, requests were only requeued if they returned EWOULDBLOCK
and completed no data. Now after a partial completion on a blocking
socket the request is queued and the remaining request is retried when
the socket is ready. This allows writes larger than the currently
available space on a blocking socket to fully complete. Reads on a
blocking socket that satifsy the low watermark can still return a short
read (same as read()).
In order to track previously completed data, the internal 'status'
field of the AIO job is used to store the amount of previously
computed data.
Non-blocking sockets continue to return short completions for both
reads and writes.
Add a test for a "large" AIO write on a blocking socket that writes
twice the socket buffer size to a UNIX domain socket.
Sponsored by: Chelsio Communications