13469 Commits

Author SHA1 Message Date
kib
a42a93a102 Fix futexes on i386 after the 4/4G split.
Use proper method to access userspace.  For now, only the slow copyout
path is implemented.

Reported and tested by:	tijl (previous version)
Sponsored by:	The FreeBSD Foundation
2018-04-24 12:50:21 +00:00
kib
82b83ca7ef Use symbolic constant, explaining the operation.
Sponsored by:	The FreeBSD Foundation
2018-04-19 18:08:46 +00:00
jhb
96955e60da Simplify the code to allocate stack for auxv, argv[], and environment vectors.
Remove auxarg_size as it was only used once right after a confusing
assignment in each of the variants of exec_copyout_strings().

Reviewed by:	emaste
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D15123
2018-04-19 16:00:34 +00:00
avg
cc87433116 set kdb_why to "trap" when calling kdb_trap from trap_fatal
This will allow to hook a ddb script to "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.

Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred.  So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.

I have added the event only on architectures that have trap_fatal()
function defined.  I haven't looked at other architectures.  Their
maintainers can add support for the event later.

Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20

Reviewed by:	kib, jhb, markj
MFC after:	11 days
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D15093
2018-04-19 05:06:56 +00:00
kib
a62d4dbdb8 Fix pmap_trm_alloc(M_ZERO).
Sponsored by:	The FreeBSD Foundation
2018-04-18 20:09:26 +00:00
kib
9e8f392b01 For fatal traps other than pagefaults, print raw fault error codes.
For pagefaults, the error is already decoded and printed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-04-18 20:07:47 +00:00
avg
1bcaa50aa1 don't check for kdb reentry in trap_fatal(), it's impossible
trap() checks for it earlier and calls kdb_reentry().

Discussed with:	jhb
MFC after:	12 days
Sponsored by:	Panzura
2018-04-18 15:44:54 +00:00
brooks
c35e9275fc Remove the unused fuwintr() and suiwintr() functions.
Half of implementations always failed (returned (-1)) and they were
previously used in only one place.

Reviewed by:	kib, andrew
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15102
2018-04-17 18:04:28 +00:00
imp
853a633386 Use bool instead of boolean_t here. No reason to use boolean_t.
Also, stop passing FALSE to a bool parameter.
2018-04-16 13:52:40 +00:00
imp
5c64b878fa Make first a 'bool' instead of a 'boolean_t'.
'bool' is preferred to 'boolean_t'. We only get the boolean_t
definition by header pollution (though the same is true for
bool). Since we use both, switch entirely to bool.

Note: We still have TRUE/FALSE instead of true/false in heavy use in
the rest of the file. These are with ints of various flavors, so
that's appropriate, even though we should eventually migrate to bool
and true/false (though the tables they are in are nicely packed with
short and wouldn't be so nicely packed with bool, another reason
to leave it alone for now).
2018-04-14 22:14:18 +00:00
kib
e3089a0318 i386 4/4G split.
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.

By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap.  The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.

There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold().  An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls.  The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline.  If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.

The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done.  The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging.  I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.

Tested by: pho
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D14633
2018-04-13 20:30:49 +00:00
kib
eda1e69e06 Fix PSL_T inheritance on exec for x86.
The miscellaneous x86 sysent->sv_setregs() implementations tried to
migrate PSL_T from the previous program to the new executed one, but
they evaluated regs->tf_eflags after the whole regs structure was
bzeroed.  Make this functional by saving PSL_T value before zeroing.

Note that if the debugger is not attached, executing the first
instruction in the new program with PSL_T set results in SIGTRAP, and
since all intercepted signals are reset to default dispostion on
exec(2), this means that non-debugged process gets killed immediately
if PSL_T is inherited.  In particular, since suid images drop
P_TRACED, attempt to set PSL_T for execution of such program would
kill the process.

Another issue with userspace PSL_T handling is that it is reset by
trap().  It is reasonable to clear PSL_T when entering SIGTRAP
handler, to allow the signal to be handled without recursion or
delivery of blocked fault.  But it is not reasonable to return back to
the normal flow with PSL_T cleared.  This is too late to change, I
think.

Discussed with:	bde, Ali Mashtizadeh
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D14995
2018-04-12 20:43:39 +00:00
emaste
c5db8e6d0d linuxulator: deduplicate linux_exec_imgact_try
Previously linuxulator had three identical copies of
linux_exec_imgact_try.  Deduplicate before adding another arch to
linuxulator.

Sponsored by:	Turing Robotic Industries Inc
Differential Revision:	https://reviews.freebsd.org/D14856
2018-04-09 17:24:01 +00:00
brooks
9d79658aab Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
royger
5f1547e410 x86: improve reservation of AP trampoline memory
So that it doesn't rely on physmap[1] containing an address below
1MiB. Instead scan the full physmap and search for a suitable address
to place the trampoline code (below 1MiB) and the initial memory pages
(below 4GiB).

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D14878
2018-04-05 14:39:51 +00:00
avg
3331ff57c2 fix i386 build with CPU_ELAN (LINT for instance) after r331878
x86/cpu_machdep.c now needs to include elan_mmcr.h when CPU_ELAN is set.
While here, also remove the now unneeded inclusion of isareg.h in i386
and amd64 vm_machdep.c.

Reported by:	lwhsu
MFC after:	14 days
X-MFC with:	r331878
2018-04-03 17:16:06 +00:00
avg
cbde65132d unify amd64 and i386 cpu_reset() in x86/cpu_machdep.c
Because I didn't see any reason not too.
I've been making some changes to the code and couldn't help but notice
that the i386 and am64 code was nearly identical.

MFC after:	17 days
2018-04-02 13:45:23 +00:00
avg
f909741389 x86 cpu_reset: if failed to switch to BSP proceed to cpu_reset_real
If cpu_reset() is called on an AP and if it somehow fails to wake the
BSP, then it's better to attempt the reset on the AP than just sit there
spinning on an unusable and undebuggable system.

MFC after:	16 days
2018-04-02 08:06:18 +00:00
avg
a9c1c585d1 x86 cpu_reset_proxy: no need to stop_cpus() the original processor
The processor is "parked" in a spin-loop already and that's sufficient
for the reset.  There is nothing that stop_cpus() would add here, only
extra complexity and fragility.
The original processor does not need to enable interrupts now, in fact,
it must not do that.

MFC after:	2 weeks
2018-04-02 07:45:13 +00:00
avg
2bd01ebfdb align i386 cpu_reset() with amd64 version
Maybe this code could be moved to x86.

MFC after:	1 week
2018-03-30 11:25:30 +00:00
jeff
bfe01083f9 Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all
platforms.  Original commit message as follows:

Only use CPUs in the domain the device is attached to for default
assignment.  Device drivers are able to override the default assignment
if they bind directly.  There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.

Reviewed by:    jhb, kib
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14838
2018-03-28 18:47:35 +00:00
jhb
7c78547f5c Fix kernel builds without options DDB after r331650.
Reported by:	cy
2018-03-28 16:24:56 +00:00
jhb
24fa2df20b Remove very old and unused signal information codes.
These have been supplanted by the MI signal information codes in
<sys/signal.h> since 7.0.  The FPE_*_TRAP ones were deprecated even
earlier in 1999.

PR:		226579 (exp-run)
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14637
2018-03-27 20:57:51 +00:00
jeff
124dca372e Backout r331606 until I can identify why it does not boot on some
machines.
2018-03-27 10:20:50 +00:00
jeff
d1125a4e0d Only use CPUs in the domain the device is attached to for default
assignment.  Device drivers are able to override the default assignment
if they bind directly.  There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.

Reviewed by:	jhb, kib
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14838
2018-03-27 03:37:04 +00:00
emaste
fa8b1b66d0 Remove redundant cast from Linuxulator SYSINITs 2018-03-23 20:32:54 +00:00
emaste
9f106a66bc Sort headers in MD Linuxulator files
Bring #includes closer to style(9) and reduce differences between the
(three) MD versions of linux_machdep.c and linux_sysvec.c.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-23 17:16:36 +00:00
kib
e114084144 There is no need to disable interrupts around npxsave call.
i386 was changed to only require critical section around the thread
FPU state manipulations, and vm86_bioscall callers already enter
critical section for other reasons.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-23 15:46:53 +00:00
kib
97713e367c Update comment to match current field names.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-03-23 15:44:31 +00:00
emaste
289be8516c Share Linux errno table with libsysdecode
Requested by:	jhb
Reviewed by:	jhb
Sponsored by:	Turing Robotic Industries Inc.
2018-03-22 12:58:49 +00:00
emaste
444301c25e Fix kernel memory disclosure in ibcs2_getdents
ibcs2_getdents() copies a dirent structure to userland.  The ibcs2
dirent structure contains a 2 byte pad element.  This element is never
initialized, but copied to userland none-the-less.

Note that ibcs2 has not built on HEAD since r302095.

Submitted by:	Domagoj Stolfa <ds815@cam.ac.uk>
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after:	3 days
Security:	Kernel memory disclosure (803)
2018-03-21 23:26:42 +00:00
emaste
af3be5b4fb Add ) missing from r330297
Sponsored by:	The FreeBSD Foundation
2018-03-21 23:17:26 +00:00
emaste
6d7d087d6c Restore close quote lost in r331254 2018-03-20 21:04:47 +00:00
emaste
6fe54a5343 Rename assym.s to assym.inc
assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by:	imp, bdrewery (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14180
2018-03-20 17:58:51 +00:00
emaste
3df5d233f9 Rationalize license text on Linuxolator files
i386 linux.h missed in r330239.

Approved by:	sos
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-03-20 02:50:11 +00:00
emaste
58e1b5421f Rename linuxulator functions with linux_ prefix
It's preferable to have a consistent prefix.  This also reduces
differences between the three linux*_sysvec.c files.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-19 21:26:32 +00:00
emaste
47fc89046a linux*_sysvec.c: rationalize whitespace and comments
There's a fair amount of duplication between MD linuxulator files.
Make indentation and comments consistent between the three versions of
linux_sysvec.c to reduce diffs when comparing them.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-19 15:11:10 +00:00
emaste
566c3d41cc Share a single bsd-linux errno table across MD consumers
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table.  Move the table to a common
file to be used by all.  Make it 'const int' to place it in .rodata.

(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)

This change should introduce no functional change; a followup will add
missing errno values.

MFC after:	3 weeks
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14665
2018-03-16 14:46:38 +00:00
emaste
d90e216823 ANSIfy i386/vm86.c 2018-03-16 12:12:41 +00:00
emaste
9975d0e7b5 Remove stray ; at end of linux_vdso_deinstall() 2018-03-14 13:20:36 +00:00
emaste
6851f84d1f Use C99 boolean type for translate_osrel
Migrate to modern types before creating MD Linuxolator bits for new
architectures.

Reviewed by:	cem
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14676
2018-03-13 16:40:29 +00:00
emaste
bc4d21ce60 Apply some style(9) to Linuxulator linux_sysvec.c comments 2018-03-13 00:40:05 +00:00
emaste
c303c68d6a imgact_linux.c: use standard indentation
Sponsored by:	Turing Robotic Industries Inc.
2018-03-12 23:28:25 +00:00
emaste
ac81b47ca6 Linuxulator: apply style(9) to return
Sponsored by:	Turing Robotic Industries Inc.
2018-03-12 15:35:24 +00:00
brooks
1db1eebcab Remove obsolete pcaudioio.h.
Nothing uses the #define's values or the types.  (Some NTP code does use
an audio_info_t, but it is in #ifdef'd support for Solaris and is not
this audio_info_t).

Sponsored by:	DARPA, AFRL
2018-03-11 16:17:53 +00:00
eadler
9b1ea58f58 sys: Fix a few potential infoleaks in cloudabi
While there is no immediate leak, if the structure changes underneath
us, there might be in the future.

Submitted by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
MFC After:	1 month
Sponsored by:	DARPA/AFRL
2018-03-07 14:44:32 +00:00
jtl
8e9b6569cb amd64: Protect the kernel text, data, and BSS by setting the RW/NX bits
correctly for the data contained on each memory page.

There are several components to this change:
 * Add a variable to indicate the start of the R/W portion of the
   initial memory.
 * Stop detecting NX bit support for each AP.  Instead, use the value
   from the BSP and, if supported, activate the feature on the other
   APs just before loading the correct page table.  (Functionally, we
   already assume that the BSP and all APs had the same support or
   lack of support for the NX bit.)
 * Set the RW and NX bits correctly for the kernel text, data, and
   BSS (subject to some caveats below).
 * Ensure DDB can write to memory when necessary (such as to set a
   breakpoint).
 * Ensure GDB can write to memory when necessary (such as to set a
   breakpoint).  For this purpose, add new MD functions gdb_begin_write()
   and gdb_end_write() which the GDB support code can call before and
   after writing to memory.

This change is not comprehensive:
 * It doesn't do anything to protect modules.
 * It doesn't do anything for kernel memory allocated after the kernel
   starts running.
 * In order to avoid excessive memory inefficiency, it may let multiple
   types of data share a 2M page, and assigns the most permissions
   needed for data on that page.

Reviewed by:	jhb, kib
Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14282
2018-03-06 14:28:37 +00:00
kib
d27cb27779 Unify bulk free operations in several pmaps.
Submitted by:	Yoshihiro Ota
Reviewed by:	markj
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D13485
2018-03-04 20:53:20 +00:00
rpokala
03d8fd318a imcsmb(4): Intel integrated Memory Controller (iMC) SMBus controller driver
imcsmb(4) provides smbus(4) support for the SMBus controller functionality
in the integrated Memory Controllers (iMCs) embedded in Intel Sandybridge-
Xeon, Ivybridge-Xeon, Haswell-Xeon, and Broadwell-Xeon CPUs. Each CPU
implements one or more iMCs, depending on the number of cores; each iMC
implements two SMBus controllers (iMC-SMBs).

*** IMPORTANT NOTE ***
Because motherboard firmware or the BMC might try to use the iMC-SMBs for
monitoring DIMM temperatures and/or managing an NVDIMM, the driver might
need to temporarily disable those functions, or take a hardware interlock,
before using the iMC-SMBs. Details on how to do this may vary from board to
board, and the procedure may be proprietary. It is strongly suggested that
anyone wishing to use this driver contact their motherboard vendor, and
modify the driver as described in the manual page and in the driver itself.
(For what it's worth, the driver as-is has been tested on various SuperMicro
motherboards.)

Reviewed by:	avg, jhb
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D14447
Discussed with:	avg, ian, jhb
Tested by:	allanjude (previous version), Panasas
2018-03-03 01:53:51 +00:00
brooks
a0e10aee51 Rename kernel-only members of semid_ds and msgid_ds.
This deliberately breaks the API in preperation for future syscall
revisions which will remove these nonstandard members.

In an exp-run a single port (devel/qemu-user-static) was found to
use them which it did becuase it emulates system calls.  This has
been fixed in the ports tree.

PR:		224443 (exp-run)
Reviewed by:	kib, jhb (previous version)
Exp-run by:	antoine
Sponsored by:	DARPA, AFRP
Differential Revision:	https://reviews.freebsd.org/D14490
2018-03-02 22:10:48 +00:00