Commit Graph

8407 Commits

Author SHA1 Message Date
Hans Petter Selasky
c55f4c9445 Revert r287780 until more developers have their say.
Differential Revision:	https://reviews.freebsd.org/D3521
Requested by:		gnn
2015-09-22 06:51:55 +00:00
Ed Maste
d237d4cccb Add MIPS ELF section type SHT_MIPS_ABIFLAGS definition
Sponsored by:	The FreeBSD Foundation
2015-09-22 00:56:43 +00:00
Konstantin Belousov
cff8c6f2d1 Add support for weak symbols to the kernel linkers. It means that
linkers no longer raise an error when undefined weak symbols are
found, but relocate as if the symbol value was 0.  Note that we do not
repeat the mistake of userspace dynamic linker of making the symbol
lookup prefer non-weak symbol definition over the weak one, if both
are available.  In fact, kernel linker uses the first definition
found, and ignores duplicates.

Signature of the elf_lookup() and elf_obj_lookup() functions changed
to split result/error code and the symbol address returned.
Otherwise, it is impossible to return zero address as the symbol
value, to MD relocation code.  This explains the mechanical changes in
elf_machdep.c sources.

The powerpc64 R_PPC_JMP_SLOT handler did not checked error from the
lookup() call, the patch leaves the code as is (untested).

Reported by:	glebius
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-09-20 01:27:59 +00:00
Edward Tomasz Napierala
0d3d0cc358 Kernel part of reroot support - a way to change rootfs without reboot.
Note that the mountlist manipulations are somewhat fragile, and not very
pretty.  The reason for this is to avoid changing vfs_mountroot(), which
is (obviously) rather mission-critical, but not very well documented,
and thus hard to test properly.  It might be possible to rework it to use
its own simple root mount mechanism instead of vfs_mountroot().

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D2698
2015-09-18 17:32:22 +00:00
Gleb Smirnoff
37bde76922 Remove extra tabs. 2015-09-17 20:21:55 +00:00
Steven Hartland
412ce2744c Fix kqueue write events for files > 2GB
Due to the use of int's for file offsets in the VOP_WRITE_(PRE|POST)
macros, kqueue write events for files greater 2GB where never fired.

This caused tail -f on a file greater 2GB to never see updates.

MFC after:	1 week
Relnotes:	YES
Sponsored by:	Multiplay
2015-09-17 00:03:55 +00:00
Mateusz Guzik
7665e341ca sysctl: switch sysctllock to a sleepable rmlock, take 2
This restores r285125. Previous attempt was reverted due to a bug in rmlocks,
which is fixed since r287833.
2015-09-15 23:06:56 +00:00
Conrad Meyer
55d33667ee kevent(2): Note DOOMED vnodes with NOTE_REVOKE
In poll mode, check for and wake VBAD vnodes.  (Vnodes that are VBAD at
registration will never be woken by the RECLAIM trigger.)

Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation.  (Vnodes that
were fine at registration but are vgoned while being monitored should signal
waiters.)

Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3675
2015-09-15 20:22:30 +00:00
Hans Petter Selasky
9acc0eafd7 Implement callout_drain_async(), inspired by the projects/hps_head
branch.

This function is used to drain a callout via a callback instead of
blocking the caller until the drain is complete. Refer to the
callout_drain_async() manual page for a detailed description.

Limitation: If a lock is used with the callout, the callout can only
be drained asynchronously one time unless the callout_init_mtx()
function is called again. This limitation is not present in
projects/hps_head and will require more invasive changes to the
timeout code, which was not in the scope of this patch.

Differential Revision:	https://reviews.freebsd.org/D3521
Reviewed by:		wblock
MFC after:		1 month
2015-09-14 10:52:26 +00:00
Mark Johnston
610141cebb Add stack_save_td_running(), a function to trace the kernel stack of a
running thread.

It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.

This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.

Reviewed by:	bdrewery, jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3256
2015-09-11 03:54:37 +00:00
Mark Johnston
d73ce4c698 Remove the v_cache_min and v_cache_max sysctls. They are unused and have
no effect.

Reviewed by:	alc
Sponsored by:	EMC / Isilon Storage Division
2015-09-11 03:00:20 +00:00
Warner Losh
ad8d57a99d dev_strategy and dev_strategy_csw are unused since r281825. Remove
them.

Differential Revision: https://reviews.freebsd.org/D3620
2015-09-11 00:38:58 +00:00
Enji Cooper
7afd9bf059 Remove opt_random.h header pollution from sys/random.h by moving
RANDOM_LOADABLE and RANDOM_YARROW's definitions from opt_random.h to
opt_global.h

This unbreaks `make depend` in sys/modules with multiple drivers (tmpfs, etc)
after r286839

X-MFC with: r286839
Reviewed by: imp
Submitted by: lwhsu
Differential Revision: D3486
2015-09-08 08:50:28 +00:00
Mateusz Guzik
d7832811a7 fd: make the common case in filecaps_copy work lockless
The filedesc lock is only needed if ioctls caps are present, which is a
rare situation. This is a step towards reducing the scope of the filedesc
lock.
2015-09-07 20:02:56 +00:00
Conrad Meyer
bcb60d52e6 Follow-up to r287442: Move sysctl to compiled-once file
Avoid duplicate sysctl nodes.

Found by:	tijl
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3586
2015-09-07 16:44:28 +00:00
Kirk McKusick
17518b1a2b Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.

Reviewed by: kib
Tested by:   Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
MFC after:   2 weeks
2015-09-06 05:50:51 +00:00
Xin LI
28ffe927c2 Expose an interface to determine if an ACE is inherited.
Submitted by:	sef
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3540
2015-09-04 00:14:20 +00:00
Conrad Meyer
14bdbaf2e4 Detect badly behaved coredump note helpers
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.

When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.

NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath.  As vnodes may move around during dump, this is racy.

So:

 - Detect badly behaved notes in putnote() and pad underfilled notes.

 - Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
   exercise the NT_PROCSTAT_FILES corruption.  It simply picks random
   lengths to expand or truncate paths to in fo_fill_kinfo_vnode().

 - Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
   disable kinfo packing for PROCSTAT_FILES notes.  This should avoid
   both FILES note corruption and truncation, even if filenames change,
   at the cost of about 1 kiB in padding bloat per open fd.  Document
   the new sysctl in core.5.

 - Fix note_procstat_files to self-limit in the 2nd pass.  Since
   sometimes this will result in a short write, pad up to our advertised
   size.  This addresses note corruption, at the risk of sometimes
   truncating the last several fd info entries.

 - Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
   zero padding.

With suggestions from:	bjk, jhb, kib, wblock
Approved by:	markj (mentor)
Relnotes:	yes
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3548
2015-09-03 20:32:10 +00:00
John Baldwin
183b68f74f Export current system call code and argument count for system call entry
and exit events. procfs stop events for system call tracing report these
values (argument count for system call entry and code for system call exit),
but ptrace() does not provide this information. (Note that while the system
call code can be determined in an ABI-specific manner during system call
entry, it is not generally available during system call exit.)

The values are exported via new fields at the end of struct ptrace_lwpinfo
available via PT_LWPINFO.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3536
2015-09-01 22:24:54 +00:00
Konstantin Belousov
44e629f18d Remove single-use macros obfuscating malloc(9) and free(9) calls.
Style.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-30 17:58:11 +00:00
Pedro F. Giffuni
fa60bc2f86 Add underscores to attributes when checking for __has_attribute.
This is a good practice to avoid confusion with allowed macros.

Suggested by:	jilles
2015-08-28 15:36:05 +00:00
Pedro F. Giffuni
46e45c23d4 trailing space 2015-08-28 14:13:01 +00:00
Pedro F. Giffuni
ddd54889b7 Be more GCC-friendly with attributes
Being clang the default compiler, we were always giving precedence to
the __has_attribute check. Unfortunately clang generally doesn't support
the new attributes (alloc_size was briefly supported and then reverted)
so we were always doing both checks. Give the precedence to GCC as that is
the working case now.

Do the same for  __has_builtin() for consistency.
2015-08-28 14:06:28 +00:00
Mark Johnston
c25fabea97 Remove weighted page handling from vm_page_advise().
This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED).  Its objective was to ensure that the page daemon
would eventually reclaim other unreferenced pages (i.e., unreferenced pages
not touched by madvise()) from the active queue.

Now that the pagedaemon performs steady scanning of the active page queue,
this weighted handling is unnecessary.  Instead, always "cache" clean pages
by moving them to the head of the inactive page queue.  This simplifies the
implementation of vm_page_advise() and eliminates the fragmentation that
resulted from the distribution of pages among multiple queues.

Suggested by:	alc
Reviewed by:	alc
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3401
2015-08-28 00:44:17 +00:00
Ed Schouten
bc1ace0b96 Decompose linkat()/renameat() rights to source and target.
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.

This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.

Reviewed by:	rwatson, wblock
Differential Revision:	https://reviews.freebsd.org/D3411
2015-08-27 15:16:41 +00:00
Alexander Kabaev
2c51488ec8 Repair sys/cdefs.h enough to be usable with GCC 5.x
The __alloc_size and __alloc_align need to be defined to
nothingness for lint, but the existing check is deficient
and allows attributes with working __has_attrubute() to
slip through.
2015-08-27 14:00:23 +00:00
Edward Tomasz Napierala
c9ba65040f Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3467
2015-08-24 13:18:13 +00:00
Edward Tomasz Napierala
6e572e084b After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
Mark Murray
e866d8f05b Make the UMA harvesting go away completely if not wanted. Default to "not wanted".
Provide and document the RANDOM_ENABLE_UMA option.

Change RANDOM_FAST to RANDOM_UMA to clarify the harvesting.

Remove RANDOM_DEBUG option, replace with SDT probes. These will be of
use to folks measuring the harvesting effect when deciding whether to
use RANDOM_ENABLE_UMA.

Requested by:	scottl and others.
Approved by:	so (/dev/random blanket)
Differential Revision:    https://reviews.freebsd.org/D3197
2015-08-22 12:59:05 +00:00
Justin Hibbits
6aabc119b6 Create a RouterBoard platform and use it to create a flash map
Summary:
The RouterBoard uses a predefined partition map which doesn't exist in the fdt.
This change allows overriding the fdt slicer with a custom slicer, and uses this
custom slicer to define the flash map on the RouterBoard RB800.
D3305 converts the mpc85xx platform into a base class, so that systems based on
the mpc85xx platform can add their own overrides.  This change builds on D3305,
and creates a RouterBoard (RB800) platform to initialize the slicer override.

Reviewed By: nwhitehorn, imp
Differential Revision: https://reviews.freebsd.org/D3345
2015-08-22 05:50:18 +00:00
Jason Evans
0ba74efb31 Bump __FreeBSD_version for the jemalloc 4.0.0 import. 2015-08-18 08:29:13 +00:00
Mark Murray
646041a89a Add DEV_RANDOM pseudo-option and use it to "include out" random(4)
if desired.

Retire randomdev_none.c and introduce random_infra.c for resident
infrastructure. Completely stub out random(4) calls in the "without
DEV_RANDOM" case.

Add RANDOM_LOADABLE option to allow loadable Yarrow/Fortuna/LocallyWritten
algorithm.  Add a skeleton "other" algorithm framework for folks
to add their own processing code. NIST, anyone?

Retire the RANDOM_DUMMY option.

Build modules for Yarrow, Fortuna and "other".

Use atomics for the live entropy rate-tracking.

Convert ints to bools for the 'seeded' logic.

Move _write() function from the algorithm-specific areas to randomdev.c

Get rid of reseed() function - it is unused.

Tidy up the opt_*.h includes.

Update documentation for random(4) modules.

Fix test program (reviewers, please leave this).

Differential Revision:    https://reviews.freebsd.org/D3354
Reviewed by:              wblock,delphij,jmg,bjk
Approved by:              so (/dev/random blanket)
2015-08-17 07:36:12 +00:00
Peter Grehan
949856d8d7 Add define for SATA Check-Power-Mode command, 0xe5. 2015-08-17 05:56:41 +00:00
Xin LI
e370f90a60 so_vnet is constant after creation and no locking is necessary,
document this fact.

(netmap have an assignment too but that socket object is on stack).

MFC after:	2 weeks
2015-08-17 05:53:37 +00:00
Mariusz Zaborski
347a39b4a6 Add support for the arrays in nvlist library.
- Add
  nvlist_{add,get,take,move,exists,free}_{number,bool,string,nvlist,
  descriptor} functions.
- Add support for (un)packing arrays.
- Add the nvl_array_next field to the nvlist structure.
  If an array is added by the nvlist_{move,add}_nvlist_array function
  this field will contains next element in the array.
- Add the nitems field to the nvpair and nvpair_header structure.
  This field contains number of elements in the array.
- Add special flag (NV_FLAG_IN_ARRAY) which is set if nvlist is a part of
  an array.
- Add special type (NV_TYPE_NVLIST_ARRAY_NEXT).This type is used only
  on packing/unpacking.
- Add new API for traversing arrays (nvlist_get_array_next).
- Add the nvlist_get_pararr function which combines the
  nvlist_get_array_next and nvlist_get_parent functions. If nvlist is in
  the array it will return next element from array. If nvlist is last
  element in array or it isn't in array it will return his
  container (parent). This function should simplify traveling over nvlist.
- Add tests for new features.
- Add documentation for new functions.
- Add my copyright.
- Regenerate the sys/cddl/compat/opensolaris/sys/nvpair.h file.

PR:		191083
Reviewed by:	allanjude (doc)
Approved by:	pjd (mentor)
2015-08-15 06:34:49 +00:00
Ian Lepore
ca64e4807e Constify the pointers to eventtimer and timecounter name strings.
The need for this appears as soon as you try to set the names to something
that isn't a "quoted literal".  (I'm actually confused why quoted strings
aren't a problem as well, we must have some warning disabled.)
2015-08-13 14:43:25 +00:00
Warner Losh
ad38a9d962 Crunchgen needs to be bootstrapped to pick up the STRIP->STRIPBIN
changes to prevent the 'rescue: not found' errors from happening.
Bump FreeBSD_version to 1100078 since there's been no version bumps
since this change was made. Only people that installed since r284356
really need to do this bootstrapping, but since crunchgen needs to
bootstrap for other reasons, bumping the number was the simplest.
2015-08-12 16:43:15 +00:00
Koop Mast
d8f4b93526 Instead of defining the actualy user and group id in the drmP.h files
define GID_VIDEO in sys/conf.h, and use it together with UID_ROOT
to define DRM_DEV_UID and DRM_DEV_GID in the drmP.h files.

So there is one place where the UID's and GID's are defined.

Submitted by:	ed@
Reviewed by:	ed@, dumbbell@
Differential Revision:	https://reviews.freebsd.org/D3360
2015-08-11 16:51:44 +00:00
Ed Schouten
e26f6b5f6b Add support for anonymous kqueues.
CloudABI's polling system calls merge the concept of one-shot polling
(poll, select) and stateful polling (kqueue). They share the same data
structures.

Extend FreeBSD's kqueue to provide support for waiting for events on an
anonymous kqueue. Unlike stateful polling, there is no need to support
timeouts, as an additional timer event could be used instead.
Furthermore, it makes no sense to use a different number of input and
output kevents. Merge this into a single argument.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3307
2015-08-11 13:47:23 +00:00
Ed Schouten
aa04a06df5 Introduce kern_cap_rights_limit().
The existing sys_cap_rights_limit() expects that a cap_rights_t object
lives in userspace. It is therefore hard to call into it from
kernelspace.

Move the interesting bits of sys_cap_rights_limit() into
kern_cap_rights_limit(), so that we can call into it from the CloudABI
compatibility layer.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3314
2015-08-11 08:43:50 +00:00
Pedro F. Giffuni
2602e51339 cdefs: reduce code duplication 2015-08-09 15:38:32 +00:00
Alexander V. Chernikov
0cbefd30cb Add const-qualifiers for source mbuf argument in m_dup(), m_copym(),
m_dup_pkthdr() and m_tag_copy_chain().
2015-08-08 15:50:46 +00:00
Alexander Motin
709c835a5a Disable 32-bit PIO for 6Gbit/s Intel SATA controllers.
For some reason 32-bit PIO writes are not working on 6Gbit/s Intel SATA
ports, while 16/32-bit PIO reads and 16-bit PIO writes are working fine.
3Gbit/s ports on the same controllers have no this problem.

Workaround this by disabling 32-bit PIO for all Intel controllers that may
have 6Gbit/s ports.  It halves PIO performance from 6MB/s to 3MB/s, but
who bother about speed of such rare and slow mode, which is also highly
discouraged by SATA specifications?

MFC after:	2 weeks
2015-08-08 11:48:11 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
2433a4eb04 Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
Ed Schouten
52942c1eae Add missing const keyword to function parameter.
The umtx_key_get() function does not dereference the address off the
userspace object. The pointer can safely be const.
2015-08-03 21:11:33 +00:00
Ed Schouten
39f5ebb774 Add sysent flag to switch to capabilities mode on startup.
CloudABI processes should run in capabilities mode automatically. There
is no need to switch manually (e.g., by calling cap_enter()). Add a
flag, SV_CAPSICUM, that can be used to call into cap_enter() during
execve().

Reviewed by:	kib
2015-08-03 13:41:47 +00:00
Mark Johnston
ce1c953ee0 Don't modify curthread->td_locks unless INVARIANTS is enabled.
This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3205
2015-08-02 00:03:08 +00:00
Ed Schouten
7ee1b208c3 Add kern_shm_open().
This allows you to specify the capabilities that the new file descriptor
should have. This allows us to create shared memory objects that only
have the rights we're interested in.

The idea behind restricting the rights is that it makes it a lot easier
for CloudABI to get consistent behaviour across different operating
systems. We only need to make sure that a shared memory implementation
consistently implements the operations that are whitelisted.

Approved by:	kib
Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-01 07:21:14 +00:00
Ed Schouten
367a13f905 Limit rights on process descriptors.
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.

Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.

Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-31 10:21:58 +00:00