Commit Graph

371 Commits

Author SHA1 Message Date
Konstantin Belousov
d301b3580f Support for userspace non-transparent superpages (largepages).
Created with shm_open2(SHM_LARGEPAGE) and then configured with
FIOSSHMLPGCNF ioctl, largepages posix shared memory objects guarantee
that all userspace mappings of it are served by superpage non-managed
mappings.

Only amd64 for now, both 2M and 1G superpages can be requested, the
later requires CPU feature.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 22:12:51 +00:00
Konstantin Belousov
e8f77c204b Prepare to handle non-trivial errors from vm_map_delete().
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 21:34:31 +00:00
Konstantin Belousov
67a659d282 Add kern_mmap_racct_check(), a helper to verify limits in vm_mmap*().
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-08 23:48:19 +00:00
Mark Johnston
847ab36bf2 Include the psind in data returned by mincore(2).
Currently we use a single bit to indicate whether the virtual page is
part of a superpage.  To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.

The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl.  MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.

For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26238
2020-09-02 18:16:43 +00:00
Mateusz Guzik
c3aa3bf97c vm: clean up empty lines in .c and .h files 2020-09-01 21:20:45 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Edward Tomasz Napierala
52c81be11a Add linux_madvise(2) instead of having Linux apps call the native
FreeBSD madvise(2) directly.  While some of the flag values match,
most don't.

PR:		kern/230160
Reported by:	markj
Reviewed by:	markj
Discussed with:	brooks, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25272
2020-06-20 18:29:22 +00:00
Mark Johnston
0f1e6ec591 Add a helper function for validating VA ranges.
Functions which take untrusted user ranges must validate against the
bounds of the map, and also check for wraparound.  Instead of having the
same logic duplicated in a number of places, add a function to check.

Reviewed by:	dougm, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25328
2020-06-19 03:32:04 +00:00
Ed Maste
4d13f78444 Correct terminology in vm.imply_prot_max sysctl description
As with r361769 (man page), PROT_* are properly called protections, not
permissions.

MFC after:	1 week
MFC with:	r361769
Sponsored by:	The FreeBSD Foundation
2020-06-04 01:49:29 +00:00
Brooks Davis
d718de812f Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal.  We take advantage of the mandatory
zeroing of fields not listed in the initializer.

Remove kern_mmap_fpcheck() and use kern_mmap_req().

The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations.  In CheriBSD we have already added two more arguments.

Reviewed by:	kib
Discussed with:	kevans
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D23164
2020-03-04 21:27:12 +00:00
Ed Maste
acb8858f05 Return ENOTSUP for mmap/mprotect if prot not subset of prot_max
From POSIX,

[ENOTSUP]
    The implementation does not support the combination of accesses
    requested in the prot argument.

This fits the case that prot contains permissions which are not a subset
of prot_max.

Reviewed by:	brooks, cem
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23843
2020-02-26 20:03:43 +00:00
Mateusz Guzik
3379d2f926 vm: use new capsicum helpers 2020-02-15 01:29:07 +00:00
Mateusz Guzik
23ed568caa vm: remove no longer needed atomic_load_ptr casts 2020-02-14 23:16:29 +00:00
Mateusz Guzik
643656cfaf vfs: replace VOP_MARKATIME with VOP_MMAPPED
The routine is only provided by ufs and is only used on mmap and exec.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23422
2020-02-01 06:46:55 +00:00
Kyle Evans
2180f6c6f1 kern_mmap: restore character deleted in transit
Pointy hat to:	kevans
X-MFC-With:	r356359
2020-01-04 23:51:44 +00:00
Kyle Evans
18348a2369 kern_mmap: add a variant that allows caller to inspect fp
Linux mmap rejects mmap() on a write-only file with EACCES.
linux_mmap_common currently does a fun dance to grab the fp associated with
the passed in fd, validates it, then drops the reference and calls into
kern_mmap(). Doing so is perhaps both fragile and premature; there's still
plenty of chance for the request to get rejected with a more appropriate
error, and it's prone to a race where the file we ultimately mmap has
changed after it drops its referenced.

This change alleviates the need to do this by providing a kern_mmap variant
that allows the caller to inspect the fp just before calling into the fileop
layer. The callback takes flags, prot, and maxprot as one could imagine
scenarios where any of these, in conjunction with the file itself, may
influence a caller's decision.

The file type check in the linux compat layer has been removed; EINVAL is
seemingly not an appropriate response to the file not being a vnode or
device. The fileop layer will reject the operation with ENODEV if it's not
supported, which more closely matches the common linux description of
mmap(2) return values.

If we discover that we're allowing an mmap() on a file type that Linux
normally wouldn't, we should restrict those explicitly.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22977
2020-01-04 23:39:58 +00:00
Mark Johnston
5cff1f4dc3 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
Jeff Roberson
f2410510db Avoid acquiring the object lock if color is already set. It can not be
unset until the object is recycled so this check is stable.  Now that we
can acquire the ref without a lock it is not necessary to group these
operations and we can avoid it entirely in many cases.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22565
2019-11-29 19:49:20 +00:00
Doug Moore
7cdcf86360 Define wrapper functions vm_map_entry_{succ,pred} to act as wrappers
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.

Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
2019-11-13 15:56:07 +00:00
Mark Johnston
01cef4caa7 Remove page locking from pmap_mincore().
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore().  Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21823
2019-10-16 22:03:27 +00:00
Mark Johnston
d0c9294b81 Correct the range boundaries used by kern_mincore().
Reported by:	alc
Sponsored by:	Netflix
2019-10-16 21:47:58 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Mark Johnston
e8bcf6966b Revert r352406, which contained changes I didn't intend to commit. 2019-09-16 15:04:45 +00:00
Mark Johnston
41fd4b9422 Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:03:12 +00:00
Kyle Evans
fe7bcbaf50 vm pager: writemapping accounting for OBJT_SWAP
Currently writemapping accounting is only done for vnode_pager which does
some accounting on the underlying vnode.

Extend this to allow accounting to be possible for any of the pager types.
New pageops are added to update/release writecount that need to be
implemented for any pager wishing to do said accounting, and we implement
these methods now for both vnode_pager (unchanged) and swap_pager.

The primary motivation for this is to allow other systems with OBJT_SWAP
objects to check if their objects have any write mappings and reject
operations with EBUSY if so. posixshm will be the first to do so in order to
reject adding write seals to the shmfd if any writable mappings exist.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21456
2019-09-03 20:31:48 +00:00
Konstantin Belousov
5dc7e31a09 Control implicit PROT_MAX() using procctl(2) and the FreeBSD note
feature bit.

In particular, allocate the bit to opt-out the image from implicit
PROTMAX enablement.  Provide procctl(2) verbs to set and query
implicit PROTMAX handling.  The knobs mimic the same per-image flag
and per-process controls for ASLR.

Reviewed by:	emaste, markj (previous version)
Discussed with:	brooks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D20795
2019-07-02 19:07:17 +00:00
Konstantin Belousov
3730695151 Use traditional 'p' local to designate td->td_proc in kern_mmap.
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D20795
2019-07-02 19:01:14 +00:00
Brooks Davis
74a1b66cf4 Extend mmap/mprotect API to specify the max page protections.
A new macro PROT_MAX() alters a protection value so it can be OR'd with
a regular protection value to specify the maximum permissions.  If
present, these flags specify the maximum permissions.

While these flags are non-portable, they can be used in portable code
with simple ifdefs to expand PROT_MAX() to 0.

This change allows (e.g.) a region that must be writable during run-time
linking or JIT code generation to be made permanently read+execute after
writes are complete.  This complements W^X protections allowing more
precise control by the programmer.

This change alters mprotect argument checking and returns an error when
unhandled protection flags are set.  This differs from POSIX (in that
POSIX only specifies an error), but is the documented behavior on Linux
and more closely matches historical mmap behavior.

In addition to explicit setting of the maximum permissions, an
experimental sysctl vm.imply_prot_max causes mmap to assume that the
initial permissions requested should be the maximum when the sysctl is
set to 1.  PROT_NONE mappings are excluded from this for compatibility
with rtld and other consumers that use such mappings to reserve
address space before mapping contents into part of the reservation.  A
final version this is expected to provide per-binary and per-process
opt-in/out options and this sysctl will go away in its current form.
As such it is undocumented.

Reviewed by:	emaste, kib (prior version), markj
Additional suggestions from:	alc
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18880
2019-06-20 18:24:16 +00:00
Doug Moore
f8c8b2e8a0 r348879 introduced a wrong-way comparison that broke mmap.
This change rights that comparison.

Reported by: pho
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20595
2019-06-10 22:06:40 +00:00
Doug Moore
77555b849d Change the check for 'size' wrapping around to zero in kern_mmap to account
for both the lower and upper bound modifications. Change the error returned
to ENOMEM. Rename the parameter size to len and make size a local variable
that stores the value of len after it has been modified.

This addresses concerns expressed by Bruce Evans after r348843.

Reported by: brde@optusnet.com.au
Reviewed by: kib, markj (mentors)
MFC after: 3 days
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20592
2019-06-10 21:26:14 +00:00
Doug Moore
97220a279f There are times when a len==0 parameter to mmap is okay. But on a
32-bit machine, a len parameter just a few bytes short of 4G, rounded
up to a page boundary and hitting zero then, is not okay. Return
failure in that case.

Reported by: pho
Reviewed by: alc, kib (mentor)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D20580
2019-06-10 03:07:10 +00:00
Mark Johnston
8cd6a80d7d Restore the pre-r347532 behaviour of ignoring wiring failures in mmap().
The error handling added in r347532 is not right when mapping vnodes
and will be fixed separately.

Reported by:	syzbot+1d2cc393bd6c88a548be@syzkaller.appspotmail.com
MFC with:	r347532
2019-05-13 18:40:01 +00:00
Mark Johnston
54a3a11421 Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes.  User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.

The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2).  Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks.  In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
shorting of killing the wiring process.  The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.

The choice to count virtual user-wired pages rather than physical
pages was done for simplicity.  There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.

The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded.  For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.

Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM.  Users that wish to exceed the limit must tune
vm.max_user_wired.

Reviewed by:	kib, ngie (mlock() test changes)
Tested by:	pho (earlier version)
MFC after:	45 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19908
2019-05-13 16:38:48 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Konstantin Belousov
5dddee2d65 i386: honor kern.elf32.read_exec for ommap(2) and break(2), as already
done on amd64.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:56:48 +00:00
Konstantin Belousov
a7f67facdf Normalize the declaration of i386_read_exec variable.
It is currently re-declared in sys/sysent.h which is a wrong place for
MD variable.  Which causes redeclaration error with gcc when
sys/sysent.h and machine/md_var.h are included both.

Remove it from sys/sysent.h and instead include machine/md_var.h when
needed, under #ifdef for both i386 and amd64.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:51:51 +00:00
Konstantin Belousov
3fbc2e00d1 Add a tunable which changes mincore(2) algorithm to only report data
from the local mapping.

Enable the setting by default.
The article behind the change: https://arxiv.org/abs/1901.01161

Reviewed by:	markj
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18764
2019-01-07 22:10:48 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Brooks Davis
d48719bd96 Normalize COMPAT_43 syscall declarations.
Have ogetkerninfo, ogetpagesize, ogethostname, osethostname, and oaccept
declare o<foo>_args structs rather than non-compat ones. Due to a
failure to use NOARGS in most cases this adds only one new declaration.

No changes required in freebsd32 as only ogetpagesize() is implemented
and it has a 32-bit specific implementation.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15816
2018-12-04 16:48:47 +00:00
Mateusz Guzik
2554f86a8d vm: stop taking proc lock in mmap to satisfy racct if it is disabled
Limits can be safely obtained with lim_cur from the thread. racct is compiled
in but disabled by default. Note that racct enablement is a boot-only tunable.

This eliminates second most common place of taking the lock while pkg building.

While here don't take the lock in mlockall either.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17210
2018-09-18 01:24:30 +00:00
Konstantin Belousov
6e1d2cf679 For compat32, emulate the same wraparound check as occurs on the real
ILP32 system.

Reported by and discussed with:	asomers
PR:	230162
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16525
2018-07-31 18:00:47 +00:00
Alan Cox
3e7cb27cdd Use a single, consistent approach to returning success versus failure in
vm_map_madvise().  Previously, vm_map_madvise() used a traditional Unix-
style "return (0);" to indicate success in the common case, but Mach-
style return values in the edge cases.  Since KERN_SUCCESS equals zero,
the only problem with this inconsistency was stylistic.  vm_map_madvise()
has exactly two callers in the entire source tree, and only one of them
cares about the return value.  That caller, kern_madvise(), can be
simplified if vm_map_madvise() consistently uses Unix-style return
values.

Since vm_map_madvise() uses the variable modify_map as a Boolean, make it
one.

Eliminate a redundant error check from kern_madvise().  Add a comment
explaining where the check is performed.

Explicitly note that exec_release_args_kva() doesn't care about
vm_map_madvise()'s return value.  Since MADV_FREE is passed as the
behavior, the return value will always be zero.

Reviewed by:	kib, markj
MFC after:	7 days
2018-06-04 16:28:06 +00:00
Konstantin Belousov
633d3b1c71 Only check for MAP_32BIT when available.
Reported by:	mmacy
Sponsored by:	The FreeBSD Foundation
MFC after:	10 days
2018-06-01 23:50:51 +00:00
Alan Cox
60221a5701 Only a small subset of mmap(2)'s flags should be used in combination with
the flag MAP_GUARD.  Rather than enumerating the flags that are not
allowed, enumerate the flags that are allowed.  The list of allowed flags
is much shorter and less likely to change.  (As an aside, one of the
previously enumerated flags, MAP_PREFAULT, was not even a legal flag for
mmap(2).  However, because of an earlier check within kern_mmap(), this
misuse of MAP_PREFAULT was harmless.)

Reviewed by:	kib
MFC after:	10 days
2018-06-01 21:37:42 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Jeff Roberson
e958ad4cf3 Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc().  Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by:	markj
Discussed with:	kib, glebius
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14273
2018-02-12 22:53:00 +00:00
Konstantin Belousov
1c5196c3ed Assign map->header values to avoid boundary checks.
In several places, entry start and end field are checked, after
excluding the possibility that the entry is map->header.  By assigning
max and min values to the start and end fields of map->header in
vm_map_init, the explicit map->header checks become unnecessary.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc, kib, markj (previous version)
Tested by:	pho (previous version)
MFC after:	1 week
Differential Revision:  https://reviews.freebsd.org/D13735
2018-01-20 12:19:02 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Mark Johnston
bd0e1beb98 Correct the type of foff.
No functional change intended.

Github PR:	124
Submitted by:	Wuyang Chung <wuyang.m.chung@outlook.com>
MFC after:	1 week
2017-11-08 01:53:03 +00:00
Konstantin Belousov
6a97a3f756 Treat the addr argument for mmap(2) request without MAP_FIXED flag as
a hint.

Right now, for non-fixed mmap(2) calls, addr is de-facto interpreted
as the absolute minimal address of the range where the mapping is
created.  The VA allocator only allocates in the range [addr,
VM_MAXUSER_ADDRESS].  This is too restrictive, the mmap(2) call might
unduly fail if there is no free addresses above addr but a lot of
usable space below it.

Lift this implementation limitation by allocating VA in two passes.
First, try to allocate above addr, as before.  If that fails, do the
second pass with less restrictive constraints for the start of
allocation by specifying minimal allocation address at the max bss
end, if this limit is less than addr.

One important case where this change makes a difference is the
allocation of the stacks for new threads in libthr.  Under some
configuration conditions, libthr tries to hint kernel to reuse the
main thread stack grow area for the new stacks.  This cannot work by
design now after grow area is converted to stack, and there is no
unallocated VA above the main stack.  Interpreting requested stack
base address as the hint provides compatibility with old libthr and
with (mis-)configured current libthr.

Reviewed by:	alc
Tested by:	dim (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-28 04:02:36 +00:00