- Retire long time unused (basically always unused) sys__umtx_lock()
and sys__umtx_unlock() syscalls
- struct umtx and their supporting definitions
- UMUTEX_ERROR_CHECK flag
- Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall
__FreeBSD_version is not bumped yet because it is expected that further
breakages to the umtx interface will follow up in the next days.
However there will be a final bump when necessary.
Sponsored by: EMC / Isilon storage division
Reviewed by: jhb
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.
MFC after: 3 weeks
change... This eliminates a cast, and also forces td_retval
(often 2 32-bit registers) to be aligned so that off_t's can be
stored there on arches with strict alignment requirements like
armeb (AVILA)... On i386, this doesn't change alignment, and on
amd64 it doesn't either, as register_t is already 64bits...
This will also prevent future breakage due to people adding additional
fields to the struct...
This gets AVILA booting a bit farther...
Reviewed by: bde
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.
Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.
Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
mechanism, based on the new SB_STOP sockbuf flag. The old hack dynamically
changed the sending sockbuf's high water mark whenever adding or removing
data from the receiving sockbuf. It worked for stream sockets, but it never
worked for SOCK_SEQPACKET sockets because of their atomic nature. If the
sockbuf was partially full, it might return EMSGSIZE instead of blocking.
The new solution is based on DragonFlyBSD's fix from commit
3a6117bbe0ed6a87605c1e43e12a1438d8844380 on 2008-05-27. It adds an SB_STOP
flag to sockbufs. Whenever uipc_send surpasses the socket's size limit, it
sets SB_STOP on the sending sockbuf. sbspace() will then return 0 for that
sockbuf, causing sosend_generic and friends to block. uipc_rcvd will
likewise clear SB_STOP. There are two fringe benefits: uipc_{send,rcvd} no
longer need to call chgsbsize() on every send and receive because they don't
change the sockbuf's high water mark. Also, uipc_sense no longer needs to
acquire the UIPC linkage lock, because it's simpler to compute the
st_blksizes.
There is one drawback: since sbspace() will only ever return 0 or the
maximum, sosend_generic will allow the sockbuf to exceed its nominal maximum
size by at most one packet of size less than the max. I don't think that's
a serious problem. In fact, I'm not even positive that FreeBSD guarantees a
socket will always stay within its nominal size limit.
sys/sys/sockbuf.h
Add the SB_STOP flag and adjust sbspace()
sys/sys/unpcb.h
Delete the obsolete unp_cc and unp_mbcnt fields from struct unpcb.
sys/kern/uipc_usrreq.c
Adjust uipc_rcvd, uipc_send, and uipc_sense to use the SB_STOP
backpressure mechanism. Removing obsolete unpcb fields from
db_show_unpcb.
tests/sys/kern/unix_seqpacket_test.c
Clear expected failures from ATF.
Obtained from: DragonFly BSD
PR: kern/185812
Reviewed by: silence from freebsd-net@ and rwatson@
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
a single priority queue. If that queue had a thread or threads which
could not be migrated we would fail to steal load. This could cause
starvation in situations where cores are idle.
Submitted by: Doug Kilpatrick <dkilpatrick@isilon.com>
Tested by: pho
Reviewed by: mav
Sponsored by: EMC / Isilon Storage Division
perforce syntax and committed some unrelated files. Only devd files
should've been committed.
Reported by: imp
Pointy hat to: asomers
MFC after: 3 weeks
X-MFC-With: r262914
sbin/devd/devd.cc
Add a -q flag to devd that will suppress syslog logging at
LOG_NOTICE or below.
Requested by: ian@ and imp@
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
buffers drop packets". It was caused by a check for the space available
in a sockbuf, but it was checking the wrong sockbuf.
sys/sys/sockbuf.h
sys/kern/uipc_sockbuf.c
Add sbappendaddr_nospacecheck_locked(), which is just like
sbappendaddr_locked but doesn't validate the receiving socket's
space. Factor out common code into sbappendaddr_locked_internal().
We shouldn't simply make sbappendaddr_locked check the space and
then call sbappendaddr_nospacecheck_locked, because that would cause
the O(n) function m_length to be called twice.
sys/kern/uipc_usrreq.c
Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets,
because the receiving sockbuf's size limit is irrelevant.
tests/sys/kern/unix_seqpacket_test.c
Now that 185813 is fixed, pipe_128k_8k fails intermittently due to
185812. Make it fail every time by adding a usleep after starting
the writer thread and before starting the reader thread in
test_pipe. That gives the writer time to fill up its send buffer.
Also, clear the expected failure message due to 185813. It actually
said "185812", but that was a typo.
PR: kern/185813
Reviewed by: silence from freebsd-net@ and rwatson@
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
to use-after-free.
fdescfree proceeds to free file pointers once fd_refcnt reaches 0, but
kern_proc_{o,}filedesc_out only checked for hold count.
MFC after: 3 days
fdgrowtable() now only reallocates fd_map when necessary.
This fixes fdgrowtable() to use the same logic as fdescfree() for
when to free the fd_map. The logic in fdescfree() is intended to
not free the initial static allocation, however the fd_map grows
at a slower rate than the table does. The table is intended to hold
20 fd, but its initial map has many more slots than 20. The slot
sizing causes NDSLOTS(20) through NDSLOTS(63) to be 1 which matches
NDSLOTS(20), so fdescfree() was assuming that the fd_map was still
the initial allocation and not freeing it.
This partially reverts r244510 by reintroducing some of the logic
it removed in fdgrowtable().
Reviewed by: mjg
Approved by: bapt (mentor)
MFC after: 2 weeks
routine, now a platform can provide a pointer to an early_putc() routine
which is used instead of cn_putc(). Control can be handed off from early
printf support to standard console support by NULLing out the pointer
during standard console init.
This leverages all the existing error reporting that uses printf calls,
such as panic() which can now be usefully employed even in early
platform init code (useful at least to those who maintain that code and
build kernels with EARLY_PRINTF defined).
Reviewed by: imp, eadler
in the sysctl_root().
Note: SYSCTL_VNET_* macros can be removed as well. All is
needed to virtualize a sysctl oid is set CTLFLAG_VNET on it.
But for now keep macros in place to avoid large code churn.
Sponsored by: Nginx, Inc.
r261266:
Add a jail parameter, allow.kmem, which lets jailed processes access
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
Submitted by: netchild
MFC after: 1 week
It's common for multi-threaded processes to create a thread for
the purpose of synchronously processing signals. Allow such processes to
utilize a capabilities sandbox.
Discussed with: rwatson, pjd
MFC after: 2 weeks
paper trail now, this patch is similar to one posted for one of the
preliminary versions of a new armv6 port. I took them and made them
more generic. Option not enabled by default since each board/port has
to provide its own eputc, and possibly do other things as well...
This fires off a kqueue note (of type sendfile) to the configured kqfd
when the sendfile transaction has completed and the relevant memory
backing the transaction is no longer in use by this transaction.
This is analogous to SF_SYNC waiting for the mbufs to complete -
except now you don't have to wait.
Both SF_SYNC and SF_KQUEUE should work together, even if it
doesn't necessarily make any practical sense.
This is designed for use by applications which use backing cache/store
files (eg Varnish) or POSIX shared memory (not sure anything is using
it yet!) to know when a region of memory is free for re-use. Note
it doesn't mark the region as free overall - only free from this
transaction. The application developer still needs to track which
ranges are in the process of being recycled and wait until all
pending transactions are completed.
TODO:
* documentation, as always
Sponsored by: Netflix, Inc.
Callers do that already and additional check races with process
decreasing limits and can result in not growing the table at all, which
is currently not handled.
MFC after: 3 days
Having ncneg diverge with the actual length of the ncneg tailq causes
NULL dereference.
Add assertion that an entry taken from ncneg queue is indeed negative.
Reported by and discussed with: avg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
bio_completed, only manage bio_resid, e.g. sa(4).
Reported and tested by: Manfred Antar <null@pozo.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
They are stored as two separate characters in the vtbuf, so copy-pasting
will cause them to be passed to terminal_input_char() twice. Extend
terminal_input_char() to explicitly discard characters with TF_CJK_RIGHT
set. This causes only the left part to generate input.
o Forward termianl framebuffer ioctl to fbd.
o Forward terminal mmap request to fbd.
o Move inclusion of sys/conf.h to vt.h.
Sponsored by: The FreeBSD Foundation
Introduce a new formatting bit (TF_CJK_RIGHT) that is set when putting a
cell that is the right part of a CJK fullwidth character. This will
allow drivers like vt(9) to support fullwidth characters properly.
emaste@ has a patch to extend vt(9)'s font handling to increase the
number of Unicode -> glyph maps from 2 ({normal,bold)} to 4
({normal,bold} x {left,right}). This will need to use this formatting
bit to determine whether to draw the left or right glyph.
Reviewed by: emaste
covered by sbintime (LONG_MAX seconds).
Some programs use timeout values in excess of 1000 years. The conversion
to sbintime caused wrap-around on overflow, which resulted in short or
negative timeout values. This caused long delays on sockets opened by
affected programs (e.g. OpenSSH).
Kernels compiled without -fno-strict-overflow were not affected, apparently
because the compiler tested the sign of the timeout value before performing
the multiplication that lead to overflow.
When the -fno-strict-overflow option was added to CFLAGS, this optimization
was disabled and the test was performed on the result of the multiplication.
Negative products were caught and resulted in EINVAL being returned, but
wrap-around to positive values just shortened the timeout value to the
residue of the result that could be represented by sbintime.
The fix is to cap the timeout values at the maximum that can be represented
by sbintime, which is 2^31 - 1 seconds or more than 68 years.
After this change, the kernel can be compiled with -fno-strict-overflow
with no ill effects.
MFC after: 3 days
linker_unload_file() rather than kern_kldload() and kern_kldunload(). This
ensures that the handlers are invoked for files that are loaded/unloaded
automatically as dependencies. Previously, they were only invoked for files
loaded by a user.
As a side effect, the kld_load and kld_unload handlers are now invoked with
the kernel linker lock exclusively held.
Reported by: avg
Reviewed by: jhb
MFC after: 2 weeks
to fail and return error.
- Use make_dev_p() in tty_makedevf() instead of make_dev_cred().
- Always pass MAKEDEV_CHECKNAME flag.
- Optionally pass MAKEDEV_REF flag.
- Provide macro for compatibility with old API.
This fixes races with simultaneous creation and desctruction of
ttys, and makes it possible to call tty_makedevf() from device
cloners.
A race in tty_watermarks() still exist, since the latter drops
lock for M_WAITOK allocation. This will be addressed in separate
commit.
Reviewed by: kib
Sponsored by: Nginx, Inc.
child process that were inherited from its parent. However, this should
not be done in the case of a vfork, since the fork handler ends up removing
the tracepoints from the shared vm space, and userland DTrace probes in the
parent will no longer fire as a result.
Now the child of a vfork may trigger userland DTrace probes enabled in its
parent, so modify the fasttrap probe handler to handle this case and handle
the child process in the same way that it would handle the traced process.
In particular, if once traces function foo() in a process that vforks, and
the child calls foo(), fasttrap will treat this call as having come from the
parent. This is the behaviour of the upstream code.
While here, add #ifdef guards to some code that isn't present upstream.
MFC after: 1 month
advisory lock cannot be obtained, prevent double-close of the vnode in
vn_close() called from the fdrop(), by resetting file' f_ops methods.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This allows it to be better tracked as well as being able to leverage
UMA for more interesting/useful behaviour at a later date.
Sponsored by: Netflix, Inc.
the TTY. In such a case, ttydev_close() is called multiple times and
each time, t_revokecnt is incremented and cv_broadcast() is called for
both the t_outwait and t_inwait condition variables.
Let's say revoke(2) comes in first and gets to call tty_drain() from
ttydev_leave(). Let's say that the revoke comes from init(8) as the
result of running "shutdown -r now". Since shutdown prints various
messages to the console before announing that the machine will reboot
immediately, let's also say that the output queue is not empty and
that tty_drain() has something to do. Let's assume this all happens
on a 9600 baud serial console, so it takes a time to drain.
The shutdown command will exit(2) and as such will end up closing
stdout. Let's say this close will come in second, bump t_revokecnt
and call tty_wakeup(). This has tty_wait() return prematurely and
the next thing that will happen is that the thread doing revoke(2)
will flush the TTY. Since the drain wasn't complete, the flush will
effectively drop whatever is left in t_outq.
This change takes into account that tty_drain() will return ERESTART
due to the fact that t_revokecnt was bumped and in that case simply
call tty_drain() again. The thread in question is already performing
the close so it can safely finish draining the TTY before destroying
the TTY structure.
Now all messages from shutdown will be printed on the serial console.
Obtained from: Juniper Networks, Inc.
In case of 4K allocation quantum that means for allocations up to 128K.
With growth of memory fragmentation these lists may grow to quite a large
sizes (tenths and hundreds of thousands items). Having in one list items
of different sizes in worst case may require full linear list traversal,
that may be very expensive. Having lists for items of single size means
that unless user specify some alignment or border requirements (that are
very rare cases) first item found on the list should satisfy the request.
While running SPEC NFS benchmark on top of ZFS on 24-core machine with
84GB RAM this change reduces CPU time spent in vmem_xalloc() from 8%
and lock congestion spinning around it from 20% to invisible levels.
And that all is by the cost of just 26 more pointers per vmem instance.
If at some point our kernel will start to actively use KVA allocations
with odd sizes above 128K, something may need to be done to bigger lists
also.
the excess code in g_io_check(), bio_resid is also truncated by
g_io_deliver(). As result, bufdonebio() assigns truncated value to
the buffer b_resid field.
Use the residual bio_completed to calculate buffer b_resid from
b_bcount in bufdonebio(), instead of bio_resid, calculated from
bio_length in g_io_deliver().
The issue is seemingly caused by the code rearrange into g_io_check(),
which is not present in stable/10. The change still looks as the
useful change to have in 10 nevertheless.
Reported by: Stefan Hegnauer <stefan.hegnauer@gmx.ch>
Tested by: pho, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
defaults to PANIC_REBOOT_WAIT_TIME (a long-existing kernel config
setting). Use this now-variable value in place of the defined constant
to control how long the system waits after a panic before rebooting.
by SCHED_PRI_TICKS should be SCHED_PRI_RANGE - 1 so that the resulting
priority value (before nice adjustment) is between SCHED_PRI_MIN and
SCHED_PRI_MAX, inclusive.
Submitted by: kib
Reported by: pho
MFC after: 1 week
MACHINE_ARCH values whose binaries this kernel can run. This patch provides
a feature requested for implementing pkgng ABI identifiers in a robust
way.
The list is designed to indicate whether, say, an i386 package can be run on
the current system. If kern.supported_abis contains "i386", then the answer
is yes. Otherwise, the answer is no.
At the moment, this only supports MACHINE_ARCH and MACHINE_ARCH32. As we
gain support for more interesting combinations, this needs to become more
flexible, possibily through the sysent framework, along with the
hw.machine_arch emulation immediately preceding this code in kern_mib.c.
Reviewed by: imp
MFC after: 3 days
for extending and reusing it.
The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper,
used to indicate that the backing store for a group of mbufs has completed.
It's only being used by sendfile for now and it's only implementing a
sleep/wakeup rendezvous. However, there are other potential signaling
paths (kqueue) and other potential uses (socket zero-copy write) where the
same mechanism would also be useful.
So, with that in mind:
* extract the sendfile_sync code out into sf_sync_*() methods
* teach the sf_sync_alloc method about the current config flag -
it will eventually know about kqueue.
* move the sendfile_sync code out of do_sendfile() - the only thing
it now knows about is the sfs pointer. The guts of the sync
rendezvous (setup, rendezvous/wait, free) is now done in the
syscall wrapper.
* .. and teach the 32-bit compat sendfile call the same.
This should be a no-op. It's primarily preparation work for teaching
the sendfile_sync about kqueue notification.
Tested:
* Peter Holm's sendfile stress / regression scripts
Sponsored by: Netflix, Inc.
requires process descriptors to work and having PROCDESC in GENERIC
seems not enough, especially that we hope to have more and more consumers
in the base.
MFC after: 3 days
in one of the many layers of indirection and shims through stable/7
in jail_handle_ips(). When it was cleaned up and unified through
kern_jail() for 8.x, the byte order swap was lost.
This only matters for ancient binaries that call jail(2) themselves
internally.
This API has semantics similar to that of taskqueue_drain but acts on
all tasks that might be queued or running on a taskqueue.
A caller must ensure that no new tasks are being enqueued otherwise this
call would be totally meaningless. For example, if the tasks are
enqueued by an interrupt filter then its interrupt must be disabled.
MFC after: 10 days
given process.
Note that the correctness of the trampoline length returned for ABIs
which do not use shared page depends on the correctness of the struct
sysvec sv_szsigcodebase member, which will be fixed on as-need basis.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The new macros are implemented in terms of SDT_PROBE_DEFINE and SDT_PROBE.
Probes defined in this way will appear under SDT provider named "sdt".
Parameter types are exposed via SDT_PROBE_ARGTYPE.
This is something that illumos does not have by default.
This kind of SDT probes is already present in ZFS code, so those probes
will now be available if KDTRACE_HOOKS options is enabled.
A potential future illumos compatibility enhancement is to encode a provider
name as a prefix in a probe name.
Reviewed by: markj
MFC after: 3 weeks
X-MFC after: r258622
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).
Reviewed by: markj
MFC after: 3 weeks
callable from the kernel.
Right now vn_sendfile() can't be called from anything other than
a syscall handler _and_ return the number of bytes queued.
This simply moves the copyout() to do_sendfile() so that any kernel
code can initiate vn_sendfile() outside of a syscall context.
Tested:
* tiny little sendfile program spitting things out a tcp socket
Sponsored by: Netflix, Inc.