Commit Graph

7219 Commits

Author SHA1 Message Date
Conrad Meyer
5dc5dab6eb Add 4Kn kernel dump support
(And 4Kn minidump support, but only for amd64.)

Make sure all I/O to the dump device is of the native sector size.  To
that end, we keep a native sector sized buffer associated with dump
devices (di->blockbuf) and use it to pad smaller objects as needed (e.g.
kerneldumpheader).

Add dump_write_pad() as a convenience API to dump smaller objects with
zero padding.  (Rather than pull in NPM leftpad, we wrote our own.)

Savecore(1) has been updated to deal with these dumps.  The format for
512-byte sector dumps should remain backwards compatible.

Minidumps for other architectures are left as an exercise for the
reader.

PR:		194279
Submitted by:	ambrisko@
Reviewed by:	cem (earlier version), rpokala
Tested by:	rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump)
Relnotes:	yes
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D5848
2016-04-15 17:45:12 +00:00
Sepherosa Ziehau
0c29fe6db8 hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus
Submitted by:	Jun Su <junsu microsoft com>
Reviewed by:	jhb, kib, sephe
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5910
2016-04-15 02:20:18 +00:00
John Baldwin
4478441145 Expose doreti as a global symbol on amd64 and i386.
doreti provides the common code path for returning from interrupt
andlers on x86.  Exposing doreti as a global symbol allows kernel
modules to include low-level interrupt handlers instead of requiring
all low-level handlers to be statically compiled into the kernel.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	kib
2016-04-13 17:37:31 +00:00
John Baldwin
7ecf8cab6f Enable DEVICE_NUMA with up to 8 domains by default on amd64.
8 memory domains should handle a quad-socket board with dual-domain
processors.

Reviewed by:	kib
Relnotes:	maybe?
Differential Revision:	https://reviews.freebsd.org/D5893
2016-04-12 21:23:44 +00:00
Andriy Gapon
0d63fc3ed8 re-enable AMD Topology extension on certain models if disabled by BIOS
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors.  We re-enable the extension, so that we can properly discover
core and cache topology.  Linux seems to do the same.

Reported by:	Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by:	jhb, kib
Tested by:	Johannes Dieterich <dieterich.joh@gmail.com>
		(earlier version)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D5883
2016-04-12 13:30:39 +00:00
Andriy Gapon
9054bcbce7 [amd64] dtrace_invop handler is to be called only for kernel exceptions
DTrace-related exceptions in userland code are handled elsewhere.
One practical problem was a crash in dtrace_invop_start() when saved
%rsp pointed to a virtual address that was not backed.

i386 code already ignored userland exceptions.

Reviewed by: markj, kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D5906
2016-04-12 06:46:54 +00:00
Anish Gupta
441a3497f5 Allow guest writes to AMD microcode update[0xc0010020] MSR without updating actual hardware MSR. This allows guest microcode update to go through which otherwise failing because wrmsr() was returning EINVAL.
Submitted by:Yamagi Burmeister
Approved by:grehan
MFC after:2 weeks
2016-04-11 05:09:43 +00:00
Ed Schouten
ab83575070 Make CloudABI's way of doing TLS more friendly to userspace emulators.
We're currently seeing how hard it would be to run CloudABI binaries on
operating systems cannot be modified easily (Windows, Mac OS X). The
idea is that we want to just run them without any sandboxing. Now
that CloudABI executables are PIE, this is already a bit easier, but TLS
is still problematic:

- CloudABI executables want to write to the %fs, which typically
  requires extra system calls by the emulator every time it needs to
  switch between CloudABI's and its own TLS.

- If CloudABI executables overwrite the %fs base unconditionally, it
  also becomes harder for the emulator to store a backup of the old
  value of %fs. To solve this, let's no longer overwrite %fs, but just
  %fs:0.

As CloudABI's C library does not use a TCB, this space can now be used
by an emulator to keep track of its internal state. The executable can
now safely overwrite %fs:0, as long as it makes sure that the TCB is
copied over to the new TLS area.

Ensure that there is an initial TLS area set up when the process starts,
only containing a bogus TCB. We don't really care about its contents on
FreeBSD.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D5836
2016-04-06 11:11:31 +00:00
Baptiste Daroussin
b6348be7b9 Add kern.features flags for linux and linux64 modules
kern.features.linux: 1 meaning linux 32 bits binaries are supported
kern.features.linux64: 1 meaning linux 64 bits binaries are supported

The goal here is to help 3rd party applications (including ports) to determine
if the host do support linux emulation

Reviewed by:	dchagin
MFC after:	1 week
Relnotes:	yes
Differential Revision:	D5830
2016-04-05 22:36:48 +00:00
John Baldwin
2b1e924b69 Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386. 2016-04-03 23:03:54 +00:00
Ed Schouten
4a8b3b18cc Make Position Independent Executables work for CloudABI.
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
  regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
  knows at which address it got loaded and can apply relocations.
2016-03-31 18:52:00 +00:00
Konstantin Belousov
0df87548b9 Type of the interrupt handlers on x86 cannot be expressed in C.
Simplify and unify placeholder type definitions.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5771
2016-03-29 19:56:48 +00:00
Dmitry Chagin
7c5982000d Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET.
Pointed out by:	ae@
2016-03-27 10:09:10 +00:00
Dmitry Chagin
c826fcfe22 iConvert Linux SOL_IPV6 level.
MFC after:	1 week
2016-03-27 08:12:01 +00:00
Alexander Motin
baa7dd65be Polish wbwd(4) driver and add more supported chips.
MFC after:	1 month
2016-03-24 20:52:35 +00:00
John Baldwin
7a2c1d8c60 Enable interrupts on the BSP once all PICs are initialized.
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code).  This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5710
2016-03-24 00:24:07 +00:00
Dmitry Chagin
351cf753eb Regen for r297061 (fstatfs64 Linux syscall).
MFC after:	1 week
2016-03-20 13:23:01 +00:00
Dmitry Chagin
99546279d6 Implement fstatfs64 system call.
PR:		181012
Submitted by:	John Wehle
MFC after:	1 week
2016-03-20 13:21:20 +00:00
Gleb Smirnoff
e33c2e6b06 Due to invalid use of a signed intermediate value in the bounds checking
during argument validity verification, unbound zero'ing of the process LDT
and adjacent memory can be initiated from usermode.

Submitted by:	CORE Security
Patch by:	kib
Security:	SA-16:15
2016-03-16 22:33:12 +00:00
Konstantin Belousov
3ef966c4c0 The PKRU state size is 4 bytes, its support makes the XSAVE area size
non-multiple of 64 bytes.  Thereafter, the user state save area is
misaligned, which triggers assertion in the debugging kernels, or
segmentation violation on accesses for non-debugging configs.

Force the desired alignment of the user save area as the fix
(workaround is to disable bit 9 in the hw.xsave_mask loader tunable).
This correction is required for booting on the upcoming Intel' Purley
platform.

Reported and tested by:	"Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>,
	jimharris
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-03-15 15:42:53 +00:00
John Baldwin
6fc8053f1a Fix reporting of the CloudABI ABI in kdump.
- Advertise the word size for CloudABI ABIs via the SV_LP64 flag.  All of
  the other ABIs include either SV_ILP32 or SV_LP64.
- Fix kdump to not assume a 32-bit ABI if the ABI flags field is non-zero
  but SV_LP64 isn't set.  Instead, only assume a 32-bit ABI if SV_ILP32 is
  set and fallback to the unknown value of "00" if neither SV_LP64 nor
  SV_ILP32 is set.

Reviewed by:	kib, ed
Differential Revision:	https://reviews.freebsd.org/D5560
2016-03-09 18:38:30 +00:00
Marcel Moolenaar
6bcf245ebc Bump VM_MAX_MEMSEGS from 2 to 3 to match the number of VM segment
identifiers present in vmmapi.h. In particular, it's now possible
to create a VM_FRAMEBUFFER segment.
2016-02-26 16:18:47 +00:00
Konstantin Belousov
abb8f08388 Return dst as the result from memcpy(9) on amd64.
PR:	207422
MFC after:	1 week
2016-02-24 11:58:15 +00:00
Svatopluk Kraus
b352b10400 As <machine/vm.h> is included from <vm/vm.h>, there is no need to
include it explicitly when <vm/vm.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5380
2016-02-22 09:10:23 +00:00
Svatopluk Kraus
35a0bc1260 As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no
need to include it explicitly when <vm/vm_param.h> is already included.

Suggested by:	alc
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D5379
2016-02-22 09:08:04 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Gleb Smirnoff
b28cc462ad Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a
requirement for uma_int.h.

Suggested by:	jhb
2016-02-09 20:22:35 +00:00
Gleb Smirnoff
e60b2fcbeb Redo r292484. Embed task(9) into zone, so that uz_maxaction is called
in a context that can sleep, allowing consumers of the KPI to run their
drain routines without any extra measures.

Discussed with:	jtl
2016-02-03 23:30:17 +00:00
John Baldwin
aa949be551 Convert ss_sp in stack_t and sigstack to void *.
POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD.  NetBSD and OpenBSD both changed their
fields to void * back in 1998.  No new build failures were reported
via an exp-run.

PR:		206503 (exp-run)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5092
2016-01-27 17:55:01 +00:00
Xin LI
669414e4fb Implement AT_SECURE properly.
AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.

Submitted by:	imp, dchagin
Security:	FreeBSD-SA-16:10.linux
Security:	CVE-2016-1883
2016-01-27 07:20:55 +00:00
Dmitry Chagin
9dba79fb66 Remove obsolete comment.
MFC after:	3 days
2016-01-23 08:08:06 +00:00
Dmitry Chagin
f138999141 Fix a typo.
MFC after:	3 days
2016-01-23 08:04:29 +00:00
Hans Petter Selasky
c1ecb7e114 Add missing atomic wrapper macro.
Reviewed by:	alfred @
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-01-21 18:22:50 +00:00
Konstantin Belousov
f132cd0547 Use ANSI definitions. Wrap long line.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-01-19 08:08:08 +00:00
Konstantin Belousov
b57e68141f Clear whole XMM register file instead of only XMM0. Also clear x87
registers.  This brings amd64 on par with i386, providing consistent
initial FPU state.

Note that we do not clear any extended state, at least because kernel
does not understand extended state structure and consequences of zero
overwrite after fninit()/fpusave().

Submitted by:	joss.upton@yahoo.com
PR:	206370
MFC after:	2 weeks
2016-01-19 08:04:02 +00:00
Gleb Smirnoff
de44d808ef Regen after r293907. 2016-01-14 10:15:21 +00:00
Gleb Smirnoff
037f750877 Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.

In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.

So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.

Submitted by:	mjg
Security:	SA-16:03.linux
2016-01-14 10:13:58 +00:00
Jung-uk Kim
4ec1c9bfac Remove dead code when the target processor has POPCNT instruction. 2016-01-13 19:19:50 +00:00
Dmitry Chagin
038c720553 Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.

Differential Revision:  https://reviews.freebsd.org/D1090

Reviewed by:	kib, trasz
MFC after:	1 week
2016-01-09 20:18:53 +00:00
Ed Maste
0e42ee5dd8 Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
Ian Lepore
69dcb7e771 Make the 'env' directive described in config(5) work on all architectures,
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.

Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly.  Most startup code wasn't
doing so.  Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.

Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.

The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values.  Now if the size is zero, the provided buffer
full of existing env data is installed.  A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.

Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp.  A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data.  Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
2016-01-02 02:53:48 +00:00
John Baldwin
9e8d8b4b0c Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D4670
2015-12-23 21:41:42 +00:00
Enji Cooper
418629d81d Remove redundant ctx_switch_xsave declaration in sys/amd64/include/md_var.h
This variable was added to sys/x86/include/x86_var.h recently.

This unbreaks building kernel source that #includes both md_var.h and x86_var.h
with gcc 4.2.1 on amd64

Differential Revision: https://reviews.freebsd.org/D4686
Reviewed by: kib
X-MFC with: r291949
Sponsored by: EMC / Isilon Storage Division
2015-12-22 20:08:32 +00:00
Warner Losh
2fca0f2dd4 Save the physical address passed into the kernel of the UEFI system
table.
2015-12-19 19:01:43 +00:00
Konstantin Belousov
7c958a41fe Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
Konstantin Belousov
49e806677c Use ANSI C definition.
MFC after:	1 week
2015-12-07 17:24:55 +00:00
Conrad Meyer
10386b56ad pmap_invalidate_range: For very large ranges, flush the whole TLB
Typical TLBs have 40-512 entries available.  At some point, iterating
every single page in a requested invalidation range and issuing invlpg
on it is more expensive than flushing the TLB and allowing it to reload
on demand.

Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
number 4096 entries as a hueristic at which point we flush TLB rather
than invalidating every single potential page.

Reviewed by:	alc
Feedback from:	jhb, kib
MFC notes:	Depends on r291688
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4280
2015-12-06 17:39:13 +00:00
Konstantin Belousov
27691a24ab For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
Konstantin Belousov
724f4b62b0 Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct
sysent.

sv_prepsyscall is unused.

sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain.  It is only utilized on i386 for iBCS2
binaries.  The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2.  The same note is true for any other potential user if
sv_sigtbl.  In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.

Sponsored by:	The FreeBSD Foundation
2015-11-28 08:49:07 +00:00
Ed Maste
2e0002c18e Fix whitespace on addition of IPSEC option 2015-11-26 21:35:50 +00:00