Commit Graph

239548 Commits

Author SHA1 Message Date
Marius Strobl
a6611c938b Fix the build with ALTQ after r344060. 2019-02-12 22:33:17 +00:00
Marius Strobl
f855ec814d Make taskqgroup_attach{,_cpu}(9) work across architectures
So far, intr_{g,s}etaffinity(9) take a single int for identifying
a device interrupt. This approach doesn't work on all architectures
supported, as a single int isn't sufficient to globally specify a
device interrupt. In particular, with multiple interrupt controllers
in one system as found on e. g. arm and arm64 machines, an interrupt
number as returned by rman_get_start(9) may be only unique relative
to the bus and, thus, interrupt controller, a certain device hangs
off from.
In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to
the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}()
not work across architectures. Yet in turn, iflib(4) as gtaskqueue
consumer so far doesn't fit architectures where interrupt numbers
aren't globally unique.
However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as
employed by the gtaskqueue implementation to bind an interrupt to a
particular CPU, using bus_bind_intr(9) instead is equivalent from
a functional point of view, with bus_bind_intr(9) taking the device
and interrupt resource arguments required for uniquely specifying a
device interrupt.
Thus, change the gtaskqueue implementation to employ bus_bind_intr(9)
instead and intr_{g,s}etaffinity(9) to take the device and interrupt
resource arguments required respectively. This change also moves
struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps
struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL
as userland likes to include <sys/_task.h> or indirectly drags it
in - for better or worse also with _KERNEL defined -, which with
device_t and struct resource dependencies otherwise is no longer
as easily possible now.
The userland inclusion problem probably can be improved a bit by
introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the
existing _WANT_PRISON etc., which is orthogonal to this change,
though, and likely needs an exp-run.

While at it:
- Change the gt_cpu member in the grouptask structure to be of type
  int as used elswhere for specifying CPUs (an int16_t may be too
  narrow sooner or later),
- move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to
  the gtaskqueue implementation as it's only used and needed there,
- change the GTASK_INIT macro to use "gtask" rather than "task" as
  argument given that it actually operates on a struct gtask rather
  than a struct task, and
- let subr_gtaskqueue.c consistently use __func__ to print functions
  names.

Reported by:	mmel
Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D19139
2019-02-12 21:23:59 +00:00
Kristof Provost
3838c6a3e6 garp: Fix vnet related panic for gratuitous arp
Gratuitous ARP packets are sent from a timer, which means we don't have a vnet
context set. As a result we panic trying to send the packet.

Set the vnet context based on the interface associated with the interface
address.

To reproduce:
  sysctl net.link.ether.inet.garp_rexmit_count=2
  ifconfig vtnet1 10.0.0.1/24 up

PR:		235699
Reviewed by:	vangyzen@
MFC after:	1 week
2019-02-12 21:22:57 +00:00
Marius Strobl
95dcf343b7 Further correct and optimize the bus_dma(9) usage of iflib(4):
o Correct the obvious bugs in the netmap(4) parts:
  - No longer check for the existence of DMA maps as bus_dma(9)
    is used unconditionally in iflib(4) since r341095.
  - Supply the correct DMA tag and map pairs to bus_dma(9)
    functions (see also the commit message of r343753).
  - In iflib_netmap_timer_adjust(), add synchronization of the
    TX descriptors before calling the ift_txd_credits_update
    method as the latter evaluates the TX descriptors possibly
    updated by the MAC.
  - In _task_fn_tx(), wrap the netmap(4)-specific bits in
    #ifdef DEV_NETMAP just as done in _task_fn_admin() and
    _task_fn_rx() respectively.
o In iflib_fast_intr_rxtx(), synchronize the TX rather than
  the RX descriptors before calling the ift_txd_credits_update
  method (see also above).
o There's no need to synchronize an RX buffer that is going to
  be recycled in iflib_rxd_pkt_get(), yet; it's sufficient to
  do that as late as passing RX buffers to the MAC via the
  ift_rxd_refill method. Hence, combine that synchronization
  with the synchronization of new buffers into a common spot
  in _iflib_fl_refill().
o There's no need to synchronize the RX descriptors of a free
  list in preparation of the MAC updating their statuses with
  every invocation of rxd_frag_to_sd(); it's enough to do this
  once before handing control over to the MAC, i. e. before
  calling ift_rxd_flush method in _iflib_fl_refill(), which
  already performs the necessary synchronization.
o Given that the ift_rxd_available method evaluates the RX
  descriptors which possibly have been altered by the MAC,
  synchronize as appropriate beforehand. Most notably this
  is now done in iflib_rxd_avail(), which in turn means that
  we don't need to issue the same synchronization yet again
  before calling the ift_rxd_pkt_get method in iflib_rxeof().
o In iflib_txd_db_check(), synchronize the TX descriptors
  before handing them over to the MAC for transmission via
  the ift_txd_flush method.
o In iflib_encap(), move the TX buffer synchronization after
  the invocation of the ift_txd_encap() method. If the MAC
  driver fails to encapsulate the packet and we retry with
  a defragmented mbuf chain or finally fail, the cycles for
  TX buffer synchronization have been wasted. Synchronizing
  afterwards matches what non-iflib(4) drivers typically do
  and is sufficient as the MAC will not actually start with
  the transmission before - in this case - the ift_txd_flush
  method is called.
  Moreover, for the latter reason the synchronization of the
  TX descriptors in iflib_encap() can go as it's enough to
  synchronize them before passing control over to the MAC by
  issuing the ift_txd_flush() method (see above).
o In iflib_txq_can_drain(), only synchronize TX descriptors
  if the ift_txd_credits_update method accessing these is
  actually called.

Differential Revision:	https://reviews.freebsd.org/D19081
2019-02-12 21:08:44 +00:00
Poul-Henning Kamp
50619ae7d5 Point people to SMP(4) for CPU<->domain mapping. 2019-02-12 21:06:07 +00:00
Warner Losh
4def346d56 Revert r343077 until the license issues surrounding it can be resolved.
Approved by:	core@
2019-02-12 19:05:09 +00:00
Dimitry Andric
54553daf6d Pull in r339734 from upstream llvm trunk (by Eli Friedman):
[ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.

  Intentionally excluding nodes from the DAGCombine worklist is likely
  to lead to weird optimizations and infinite loops, so it's generally
  a bad idea.

  To avoid the infinite loops, fix DAGCombine to use the
  isDesirableToCommuteWithShift target hook before performing the
  transforms in question, and implement the target hook in the ARM
  backend disable the transforms in question.

  Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
  reduced testcase for that bug. But we should have sufficient test
  coverage for PerformSHLSimplify given that we're not playing weird
  tricks with the worklist. I can try to bugpoint it if necessary,
  though.)

  Differential Revision: https://reviews.llvm.org/D50667

This should fix a possible hang when compiling sys/dev/nxge/if_nxge.c
(which exists now only in the stable/11 branch) for arm.
2019-02-12 18:32:14 +00:00
Edward Tomasz Napierala
240d69b9b4 Fix markup - use .Pa for the directory component, not .Fa.
Reported by:	0mp
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-02-12 13:01:55 +00:00
Leandro Lupori
b8efbfb9d3 [ppc64] prevent infinite loop on icache sync
At moea64_sync_icache(), when the 'va' argument has page size
alignment, round_page() will return the same value as 'va'.
This would cause 'len' to be 0 and thus an infinite loop.

With this change, 'lim' will always point to the next page boundary.

This issue occurred especially during debugging sessions, when a breakpoint
was placed on an exact page-aligned offset, for instance.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D19149
2019-02-12 11:29:03 +00:00
Michael Tuexen
aef0641755 Improve input validation for raw IPv4 socket using the IP_HDRINCL
option.

This issue was found by running syzkaller on OpenBSD.
Greg Steuck made me aware that the problem might also exist on FreeBSD.

Reported by:		Greg Steuck
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D18834
2019-02-12 10:17:21 +00:00
Li-Wen Hsu
4f3128086b Remove empty files
Approved by:	markj (mentor)
Sponsored by:	The FreeBSD Foundation
2019-02-12 08:16:05 +00:00
Ben Widawsky
2d8cb2f494 termcap: Add an entry for kitty
The project is here:
https://github.com/kovidgoyal/kitty/

I created a port (which now needs updating):
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233010

If only we could use terminfo :(

MFC after:      5 days
Approved by:    bapt
Differential Revision:	https://reviews.freebsd.org/D19060
2019-02-12 05:15:36 +00:00
Pedro F. Giffuni
6929b7d1ab UMA: unsign some variables related to allocation in hash_alloc().
As a followup to r343673, unsign some variables related to allocation
since the hashsize cannot be negative. This gives a bit more space to
handle bigger allocations and avoid some implicit casting.

While here also unsign uh_hashmask, it makes little sense to keep that
signed.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19148
2019-02-12 04:33:05 +00:00
Enji Cooper
7bc2a58ea6 Bump __FreeBSD_version__ for r343891
This will allow upstream consumers, e.g., capsicum-test and third-party
packages (via ports(7)), to test for a specific `__FreeBSD_version__` and
expect `renameat(2)` to be functional.

PR:		222258
Approved by:	emaste (mentor)
Reviewed by:	emaste
MFC with:	r343891
Differential Revision: https://reviews.freebsd.org/D19154
2019-02-12 03:32:40 +00:00
Kevin Lo
ec637bb957 Remove entry for Intenso product. 2019-02-12 02:55:25 +00:00
Kevin Lo
65564a5e76 Remove duplicate vendor id in r334650. Intenso doesn't have a USB VID. 2019-02-12 02:48:16 +00:00
Kyle Evans
446ae812b0 libbe(3): Belatedly note the BE_DESTROY_ORIGIN option added in r343977
X-MFC-With: r343977
2019-02-12 02:16:21 +00:00
Patrick Kelsey
997667302f Fix the fix added in r343287 for spurious HFSC bandwidth check errors
The logic added in r343287 to avoid false-positive
sum-of-child-bandwidth check errors for HFSC queues has a bug in it
that causes the upperlimit service curve of an HFSC queue to be pulled
down to its parent's linkshare service curve if it happens to be above
it.

Upon further inspection/reflection, this generic
sum-of-child-bandwidths check does not need to be fixed for HFSC - it
needs to be skipped.  For HFSC, the equivalent check is to ensure the
sum of child linkshare service curves are at or below the parent's
linkshare service curve, and this check is already being performed by
eval_pfqueue_hfsc().

This commit reverts the affected parts of r343287 and adds new logic
to skip the generic sum-of-child-bandwidths check for HFSC.

MFC after:	1 day
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D19124
2019-02-11 22:58:43 +00:00
David Bright
3420c04b44 CID 1009492: Logically dead code in sys/cam/scsi/scsi_xpt.c
In `probedone()`, for the `PROBE_REPORT_LUNS` case, all paths that
fall to the bottom of the case set `lp` to `NULL`, so the test for a
non-NULL value of `lp` and call to `free()` if true is dead code as
the test can never be true. Fix by eliminating the whole if
statement. To guard against a possible future change that accidentally
violates this assumption, use a `KASSERT()` to catch if `lp` is
non-NULL.

Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19109
2019-02-11 22:09:26 +00:00
Brooks Davis
f95509a489 mdmfs: Fix many bugs in automatic md(4) creation.
This code allocated a correctly sized buffer, read past the end of the
source buffer, writing off the end of the target buffer, and then writing
a '\0' terminator past the end of the target buffer (in the wrong place).
It then leaked the buffer.

Switch to a statically sized buffer on the stack and update the source
pointer and
length before use so the correct things are copied.

Fix a logic error in the checks that the format of the line is as
expected and move on out of an assert.

Remove an unneeded close(). fclose() closes the descriptor.

Found with:	CheriABI
Obtained from:	CheriBSD
Reviewed by:	kib, jhb, markj
Differential Revision:	https://reviews.freebsd.org/D19122
2019-02-11 21:31:26 +00:00
John Baldwin
967b2dce02 Enable PCI BAR reallocation by default.
When pci_realloc_bars was first added, the intention was to eventually
enable it by default, but it was left disabled to preserve existing
behavior.  The setting is pretty conservative in that it does not
attempt to allocate resources for BARs that the BIOS/firmware leaves
disabled.  It only attempts to reallocate resources for a BAR that the
firmware programmed during boot but that conflicts with another
resource during the kernel's device scan.

PR 221350 is an example of a machine that this knob fixes.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D18965
2019-02-11 20:47:09 +00:00
Edward Tomasz Napierala
22427daf7e Add explanation of branches to the ports(7) man page.
Reviewed by:	matthew@, freebsd@mhka.no
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19146
2019-02-11 20:46:32 +00:00
Andrey V. Elsukov
804a6541db Remove `set' field from state structure and use set from parent rule.
Initially it was introduced because parent rule pointer could be freed,
and rule's information could become inaccessible. In r341471 this was
changed. And now we don't need this information, and also it can become
stale. E.g. rule can be moved from one set to another. This can lead
to parent's set and state's set will not match. In this case it is
possible that static rule will be freed, but dynamic state will not.
This can happen when `ipfw delete set N` command is used to delete
rules, that were moved to another set.
To fix the problem we will use the set number from parent rule.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-02-11 18:10:55 +00:00
Martin Cracauer
bc235bb54d Bump .Dd for today's edit.
Thank you Enji Cooper
2019-02-11 16:31:15 +00:00
Martin Cracauer
aa255a10b0 Clarify NFSv4 /etc/exports semantics, with working example.
The existing wording has been confusing users for years.
2019-02-11 15:51:28 +00:00
Michael Tuexen
74a083d6c7 Fix flags used when compiling kern_kcov.c and subr_coverage.c.
Without this fix, the usage of kernel coverage would lockup the system.
Thanks to Andrew for suggesting the final form of the fix.

PR:			235611
Reviewed by:		andrew@, emaste@
Differential Revision:	https://reviews.freebsd.org/D19135
2019-02-11 15:38:05 +00:00
Ganbold Tsagaankhuu
66bddb4c70 Add sensors support for AXP803/AXP813. Sensor values such as
battery charging, charge state, voltage, charging current, discharging current,
battery capacity etc. can be obtained via sysctl.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D19145
2019-02-11 14:31:19 +00:00
Oleksandr Tymoshenko
3af08701cd Fix off-by-one error in BERI virtio driver
The hardcoded ident is exactly 20 bytes long but sprintf adds terminating zero,
so there is one byte written out of array bounds.As a fix use strncpy it
appends \0 only if space allows and its behavior matches virtio spec:

When VIRTIO_BLK_T_GET_ID is issued, the device identifier, up to 20 bytes, is
written to the buffer. The identifier should be interpreted as an ascii string.
It is terminated with \0, unless it is exactly 20 bytes long.

PR:		202298
Reviewed by:	br
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18852
2019-02-11 07:42:32 +00:00
Patrick Kelsey
d178fee632 Place pf_altq_get_nth_active() under the ALTQ ifdef
MFC after:	1 week
2019-02-11 05:39:38 +00:00
Patrick Kelsey
8f2ac65690 Reduce the time it takes the kernel to install a new PF config containing a large number of queues
In general, the time savings come from separating the active and
inactive queues lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-bashed.

In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.

There are now two new tunables to control the hash size used for each
tag set (default for each is 128):

net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D19131
2019-02-11 05:17:31 +00:00
Kyle Evans
6286a6438e bectl(8): commit missing test modifications from r343993
X-MFC-With:	r343993
2019-02-11 04:00:42 +00:00
Kyle Evans
77b4126ce6 bectl(8): Add -o flag to destroy to clean up the origin snapshot of BE
We can't predict when destruction of origin is needed, and currently we have
a precedent for not prompting for things. Leave the decision up to the user
of bectl(8) if they want the origin snapshot to be destroyed or not.

Emits a warning when -o isn't used and an origin snapshot is left to be
cleaned up, for the time being. This is handy when one drops the -o flag but
really did want to clean up the origin.

A couple of -e ignore's have been sprinkled around the test suite for places
that we don't care that the origin's not been cleaned up. -o functionality
tests will be added in the future, but are omitted for now to reduce
conflicts with work in flight to fix bits of the tests.

Reported by:	Shawn Webb
MFC after:	1 week
2019-02-11 04:00:01 +00:00
Conrad Meyer
39f37df26e gbde(8) - simplify randomisation with arc4random_buf
Submitted by:	David CARLIER <devnexen AT gmail.com>
Differential Revision:	https://reviews.freebsd.org/D18678
2019-02-11 00:11:02 +00:00
Andriy Voskoboinyk
f3f08e16a3 net80211(4): hide casts for 'i_seq' field offset calculation inside
ieee80211_getqos() and reuse it in various places.

Checked with RTL8188EE, HOSTAP mode + RTL8188CUS, STA mode.

MFC after:	2 weeks
2019-02-10 23:58:56 +00:00
Mariusz Zaborski
0020c845a0 libnv: fix memory leaks
Free the data array for NV_TYPE_DESCRIPTOR_ARRAY case.

MFC after:	2 weeks
2019-02-10 23:30:54 +00:00
Mariusz Zaborski
b5d787d93b libnv: fix memory leaks
nvpair_create_stringv: free the temporary string; this fix affects
nvlist_add_stringf() and nvlist_add_stringv().

nvpair_remove_nvlist_array (NV_TYPE_NVLIST_ARRAY case): free the chain
of nvpairs (as resetting it prevents nvlist_destroy() from freeing it).
Note: freeing the chain in nvlist_destroy() is not sufficient, because
it would still leak through nvlist_take_nvlist_array().  This affects
all nvlist_*_nvlist_array() use

Submitted by:	Mindaugas Rasiukevicius <rmind@netbsd.org>
Reported by:	clang/gcc ASAN
MFC after:	2 weeks
2019-02-10 23:28:55 +00:00
Conrad Meyer
e0d164c7a6 Prevent overflow for usertime/systime in caclru1
PR:		76972 and duplicates
Reported by:	Dr. Christopher Landauer <cal AT aero.org>,
		Steinar Haug <sthaug AT nethelp.no>
Submitted by:	Andrey Zonov <andrey AT zonov.org> (earlier version)
MFC after:	2 weeks
2019-02-10 23:07:46 +00:00
Jilles Tjoelker
aac5464b61 sh: Restore $((x)) error checking after fix for $((-9223372036854775808))
SVN r342880 was designed to fix $((-9223372036854775808)) and things like
$((0x8000000000000000)) but also broke error detection for values of
variables without dollar sign ($((x))).

For compatibility, overflow in plain literals continues to be ignored and
the value is clamped to the boundary (except 9223372036854775808 which is
changed to -9223372036854775808).

Reviewed by:	se (although he would like error checking to be removed)
MFC after:	2 weeks
X-MFC-with:	r342880
Differential Revision:	https://reviews.freebsd.org/D18926
2019-02-10 22:23:05 +00:00
Andriy Voskoboinyk
2a0f9d5416 ifconfig(8): display 802.11n rates correctly for 'roam:rate' parameter
MFC after:	5 days
2019-02-10 21:32:39 +00:00
Marius Strobl
345c692d18 As struct cryptop is wrapped in #ifdef _KERNEL, userland doesn't
need to drag in <sys/_task.h> either.
2019-02-10 21:27:03 +00:00
Kristof Provost
4c8fb952b5 pfctl: Fix ifa_grouplookup()
Setting the length of the request got lost in r343287, which means SIOCGIFGMEMB
gives us the required length, but does not copy the names of the group members.
As a result we don't get a correct list of group members, and 'set skip on
<ifgroup>' broke.

This produced all sorts of very unexpected results, because we would end up
applying 'set skip' to unexpected interfaces.

X-MFC-with:	r343287
2019-02-10 21:22:55 +00:00
Kyle Evans
13c62c50e3 libbe(3): Add a destroy option for removing the origin
Currently origin snapshots are left behind when a BE is destroyed, whether
it was an auto-created snapshot or explicitly specified via, for example,
`bectl create -e be@mysnap ...`.

Removing it automatically could be argued as a POLA violation in some
circumstances, so provide a flag to be_destroy for it. An accompanying
option will be added to bectl(8) to utilize this.

Some minor style/consistency nits in the affected areas also addressed.

Reported by:	Shawn Webb
MFC after:	1 week
2019-02-10 21:19:09 +00:00
Justin Hibbits
dcbd7de5b6 powerpc: Clamp MAXCPU for MPC85XXSPE kernel to 2
SoCs with e500v2 chips only have at most 2 cores, and there are no plans to
release any more e500v2-based SoCs.  Clamping MAXCPU down to 2 saves 5MB of
data, and 1.5MB bss.
2019-02-10 20:21:20 +00:00
Nathan Whitehorn
f68992cf66 Performance improvements for octe(4):
- Distribute RX load across multiple cores, if present. This reverts
  r217212, which is no longer relevant (I think because of the newer
  SDK).
- Use newer APIs for pinning taskqueue entries to specific cores.
- Deepen RX buffers.

This more than doubles NAT forwarding throughput on my EdgeRouter Lite from,
with typical packet mixture, 90 Mbps to over 200 Mbps. The result matches
forwarding throughput in Linux without the UBNT hardware offload on the same
hardware, and thus likely reflects hardware limits.

Reviewed by:	jhibbits
2019-02-10 20:13:59 +00:00
Navdeep Parhar
3c25d4ea3c cxgbe(4): Ignore unused interrupts.
Sponsored by:	Chelsio Communications
2019-02-10 19:20:03 +00:00
Sergey Kandaurov
9b9a527843 Sync "struct addrinfo" declaration with netdb.h.
Notably, unlike in OpenBSD, which the man page was copied from,
ai_canonname and ai_addr come in different order.

PR:		225880
MFC after:	1 week
2019-02-10 19:07:47 +00:00
Konstantin Belousov
f6d281e8aa struct xswdev on amd64 requires compat32 shims after ino64.
i386 is the only architecture where uint64_t does not specify 8-bytes
alignment, which makes struct xswdev layout not compatible between
64bit and i386.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-10 19:01:05 +00:00
Michal Meloun
9492d971eb Fix bug introduced by r343962.
DMAMAP_DMAMEM_ALLOC is property of dmamap, not dmatag.

MFC after:	1 week
Reported by:	ian
Pointy hat:	mmel
2019-02-10 18:28:37 +00:00
Konstantin Belousov
fa50a3552d Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
Michal Meloun
e609023c0b Don't allocate same clock twice..
MFC after:	1 week
Reported by:	jah
2019-02-10 14:30:15 +00:00