Commit Graph

1143 Commits

Author SHA1 Message Date
Ed Maste
347a8ed1bf linuxulator: fix stack memory disclosure in linux_sigaltstack
Most siginfo_to_lsiginfo callers already zeroed the l_siginfo_t before
callit it, but linux_waitid did not.  Instead of zeroing in the called
function to address linux_waitid (as in commit 2e6ebe70), just do it in
linux_waitid.

admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	Andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 17:12:16 +00:00
Ed Maste
9866e7bbae linuxulator: fix stack memory disclosure in linux_ioctl_termio
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:21:03 +00:00
Ed Maste
4308a37410 linuxulator: fix stack memory disclosure in linux_ioctl_v4l
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:19:02 +00:00
Kirk McKusick
88640c0e8b Create new EINTEGRITY error with message "Integrity check failed".
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.

These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.

Reviewed by:    gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
2019-01-17 06:35:45 +00:00
Mark Johnston
bb376a990c Specify the correct option level when emulating SO_PEERCRED.
Our equivalent to SO_PEERCRED, LOCAL_PEERCRED, is implemented at
socket option level 0, not SOL_SOCKET.

PR:		234722
Submitted by:	Dániel Bakai <bakaidl@gmail.com>
MFC after:	2 weeks
2019-01-08 17:21:59 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Mark Johnston
792843c38f Pass malloc flags directly through kevent(2) subroutines.
Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9).  Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18318
2018-11-24 17:06:01 +00:00
Tijl Coosemans
7df0e7beb7 Fix another user address dereference in linux_sendmsg syscall.
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument.  Stop using the macro as well as LINUX_CMSG_FIRSTHDR.  Use
the size field of the kernel copy of the control message header to obtain
the next control message.

PR:		217901
MFC after:	2 days
X-MFC-With:	r340631
2018-11-20 14:18:57 +00:00
Tijl Coosemans
e3b385fc95 Do proper copyin of control message data in the Linux sendmsg syscall.
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin.  For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly.  One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).

PR:		217901
Reviewed by:	kib
MFC after:	3 days
2018-11-19 15:31:54 +00:00
Mateusz Guzik
2c054ce924 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
Brooks Davis
318f0d7720 Use declared types for caddr_t arguments.
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:46:38 +00:00
Tijl Coosemans
8fc08087a1 On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers.  The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver.  Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration.  These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again.  That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR:		206711
Reviewed by:	kib
MFC after:	3 days
2018-11-06 13:51:08 +00:00
Konstantin Belousov
4f77f48884 Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547
2018-10-25 22:16:34 +00:00
Mark Johnston
c7902fbeae Improve handling of control message truncation.
If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space
for all control messages, the kernel sets MSG_CTRUNC in the message
flags to indicate truncation of the control messages.  In the case
of SCM_RIGHTS messages, however, we were failing to dispose of the
rights that had already been externalized into the recipient's file
descriptor table.  Add a new function and mbuf type to handle this
cleanup task, and use it any time we fail to copy control messages
out to the recipient.  To simplify cleanup, control message truncation
is now only performed at control message boundaries.

The change also fixes a few related bugs:
- Rights could be leaked to the recipient process if an error occurred
  while copying out a message's contents.
- We failed to set MSG_CTRUNC if the truncation occurred on a control
  message boundary, e.g., if the caller received two control messages
  and provided only the exact amount of buffer space needed for the
  first.

PR:		131876
Reviewed by:	ed (previous version)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16561
2018-08-07 16:36:48 +00:00
Alan Somers
6040822c4e Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub.  NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions.  Solaris also defines a three-argument version, but
only in its kernel.  This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with:	cem, jilles, ian, bde
Differential Revision:	https://reviews.freebsd.org/D14725
2018-07-30 15:46:40 +00:00
Ed Maste
e8a1ec3e05 Split kern_break from sys_break and use it in linuxulator
Previously the linuxulator's linux_brk invoked the FreeBSD sys_break
syscall implementation directly.  Instead, move the bulk of the existing
implementation to kern_break, and call that from both sys_break and
linux_brk.

This also addresses a minor bug in linux_brk in that we now return the
actual (rounded up) break address, rather than the requested value.

Reviewed by:	brooks (earlier version)
Sponsored by:	Turing Robotic Industries
Differential Revision:	https://reviews.freebsd.org/D16019
2018-06-27 14:45:13 +00:00
Ed Maste
9c42fa94a6 Quiet unused fn warning for linuxulator w/o legacy syscalls
Sponsored by:	Turing Robotic Industries
2018-06-25 19:24:50 +00:00
Chuck Tuffli
3575504976 Fix the Linux kernel version number calculation
The Linux compatibility code was converting the version number (e.g.
2.6.32) in two different ways and then comparing the results.

The linux_map_osrel() function converted MAJOR.MINOR.PATCH similar to
what FreeBSD does natively. I.e. where major=v0, minor=v1, and patch=v2
    v = v0 * 1000000 + v1 * 1000 + v2;

The LINUX_KERNVER() macro, on the other hand, converted the value with
bit shifts. I.e. where major=a, minor=b, and patch=c
    v = (((a) << 16) + ((b) << 8) + (c))

The Linux kernel uses the later format via the KERNEL_VERSION() macro in
include/generated/uapi/linux/version.h

Fix is to use the LINUX_KERNVER() macro in linux_map_osrel() as well as
in the .trans_osrel functions.

PR: 229209
Reviewed by: emaste, cem, imp (mentor)
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15952
2018-06-22 00:02:03 +00:00
Konstantin Belousov
31665c1abb linux_clone_thread: mark new thread as TDB_BORN.
So that the ptrace code will catch it and report it to attached
debugger.  Enables debugging of threaded Linux binaries with FreeBSD
debugger.

Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15880
2018-06-21 21:15:04 +00:00
Ed Maste
aec1e6d390 linuxulator: handle V3 capget/capset
Linux 2.6.26 introduced 64-bit capability sets.  Extend our stub
implementation to handle both 32- and 64-bit.  (We still report no
capabilities in capget, and disallow any in capset.)

Reviewed by:	chuck
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D15887
2018-06-19 21:26:23 +00:00
Ed Maste
645f3d4345 linuxulator: add debugging for invalid capget/capset version
Sponsored by:	Turing Robotic Industries Inc.
2018-06-18 18:43:45 +00:00
Ed Maste
931e2a1a6e linuxulator: do not include legacy syscalls on arm64
Existing linuxulator platforms (i386, amd64) support legacy syscalls,
such as non-*at ones like open, but arm64 and other new platforms do
not.

Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
files.  We may need finer grained control in the future but this is
sufficient for now.

Reviewed by:	andrew
Sponsored by:	Turing Robotic Industries
Differential Revision:	https://reviews.freebsd.org/D15237
2018-06-15 14:41:51 +00:00
Ed Maste
4b842782da Correct debug control for linuxulator faccessat
The Linuxulator provides per-syscall debug control via the
compat.linux.debug sysctl.  There's generally a 1:1 mapping between
sysctl setting and syscall, but faccessat was controlled by the access
setting, perhaps due to copy-paste.

Sponsored by:	Turing Robotic Industries
2018-06-15 14:29:41 +00:00
Brooks Davis
9da5364ed9 Name the implementation of brk and sbrk sys_break().
The break() system call was renamed (several times) starting in v3
AT&T UNIX when C was invented and break was a language keyword. The
last vestage of a need for it to be called something else (eg obreak)
was removed in r225617 which consistantly prefixed all syscall
implementations.

Reviewed by:	emaste, kib (older version)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15638
2018-06-14 21:27:25 +00:00
Bruce Evans
407a812657 Oops, r335053 had an old version of the comment about 16-bit linux dev_t
translation.
2018-06-13 12:44:45 +00:00
Bruce Evans
ab35e1c71b Fix the encoding of major and minor numbers in 64-bit dev_t by restoring
the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers.  This
change breaks compatibility with the previous encoding (which was only
used in -current).

Fix truncation to (essentially) 16-bit dev_t in newnfs v3.

Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility.  Extra bits give the much
larger complication that the translations need to compress into fewer
bits.  Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.

The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings.  E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2.  But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102.  ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.

The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them.  I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current.  E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.

I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation.  So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set.  Add some error checking for when bits are lost.  Keep not
doing any error checking for translations for almost everything in
compat/linux.

compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.

compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.

fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself.  Also fix nearby style bugs.

kern/vfs_syscalls.c:
As for freebsd32.  Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.

sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros.  Describe the encoding and
some of the reasons for it.  16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding.  My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits.  This minimizes discontiguities.

Reviewed by:	kib (except for rewrite of the comment in linux_stats.c)
2018-06-13 12:22:00 +00:00
Bruce Evans
372639f944 Fix some bugs found while fixing the representation and translation
of 64-bit dev_t's (but not ones involving dev_t's).

st_size was supposed to be clamped in cvtstat() and linux's copy_stat(),
but the clamping code wasn't aware that st_size is signed, and also had
an obfuscated off-by-1 value for the unsigned limit, so its effect was
to produce a bizarre negative size instead of clamping.

Change freebsd32's copy_ostat() to be no worse than cvtstat().  It was
missing clamping and bzero()ing of padding.

Reviewed by:	kib (except a final fix of the clamp to the signed maximum)
2018-06-13 08:50:43 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Konstantin Belousov
0cde66af78 Fix futexes on i386 after the 4/4G split.
Use proper method to access userspace.  For now, only the slow copyout
path is implemented.

Reported and tested by:	tijl (previous version)
Sponsored by:	The FreeBSD Foundation
2018-04-24 12:50:21 +00:00
Ed Maste
38be312706 Map FreeBSD EDOOFUS to Linux EINVAL
Previously EDOOFUS mapped to EBUSY.  EINVAL seems more appropriate.

Discussed with:	cem
MFC after:	1 week
Sponsored by:	Turing Robotic Industries Inc.
2018-04-23 18:33:26 +00:00
Ed Maste
b267239d4b linuxulator: deduplicate linux_exec_imgact_try
Previously linuxulator had three identical copies of
linux_exec_imgact_try.  Deduplicate before adding another arch to
linuxulator.

Sponsored by:	Turing Robotic Industries Inc
Differential Revision:	https://reviews.freebsd.org/D14856
2018-04-09 17:24:01 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Ed Maste
19406511e5 Fix kernel memory disclosure in linux_ioctl_socket
strlcpy is used to copy a string into a buffer to be copied to userland,
previously leaving uninitialized data after the terminating NUL.  Zero
the buffer first to avoid a kernel memory disclosure.

admbugs:	765, 811
MFC after:	1 day
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reported by:	Vlad Tsyrklevich
Sponsored by:	The FreeBSD Foundation
2018-04-04 19:58:25 +00:00
Ed Maste
d851b216eb linux_ioctl_hdio: fix kernel memory disclosure
Stack-allocated struct linux_hd_big_geometry has undeclared padding
copied to userland.

admbugs:	765
Reported by:	Vlad Tsyrklevich
MFC after:	1 day
Security:	Kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
2018-04-04 14:41:48 +00:00
Ed Maste
b1ed95fd9d Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-Clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.

Approved by:	rdivacky, marcel
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-03-23 14:39:34 +00:00
Ed Maste
1ac2776bbb Share Linux errno table with libsysdecode
Requested by:	jhb
Reviewed by:	jhb
Sponsored by:	Turing Robotic Industries Inc.
2018-03-22 12:58:49 +00:00
Ed Maste
d595c5c0d6 linux_errno.c: add newer errno values
Also introduce a static assert to ensure the list is kept up to date.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-16 14:51:47 +00:00
Ed Maste
6e481f83f7 Share a single bsd-linux errno table across MD consumers
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table.  Move the table to a common
file to be used by all.  Make it 'const int' to place it in .rodata.

(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)

This change should introduce no functional change; a followup will add
missing errno values.

MFC after:	3 weeks
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14665
2018-03-16 14:46:38 +00:00
Ed Maste
340f4a8d3e Linuxulator: apply style(9) to return
Sponsored by:	Turing Robotic Industries Inc.
2018-03-12 15:35:24 +00:00
Eitan Adler
6eee883a17 sys/linux: Fix a few potential infoleaks in Linux IPC
Submitted by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
MFC After:	1 month
2018-03-03 21:14:55 +00:00
Ed Maste
023b850b62 Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.  Additional files
still waiting on permission from others are listed in review D14210.

Approved by:    dchagin, rdivacky, sos
MFC after:	1 week
MFC with:	r329370
Sponsored by:	The FreeBSD Foundation
2018-03-01 13:52:18 +00:00
Ed Maste
eae594f7d5 Correct proper nouns in the Linuxulator
- Capitalize Linux
- Spell FreeBSD out in full
- Address some style(9) on changed lines

Sponsored by:	Turing Robotic Industries Inc.
2018-02-22 02:24:17 +00:00
Hans Petter Selasky
07c757ec25 Allow LinuxKPI character devices to receive mmap() calls from the Linux
binary mode user-space emulation layer. This is a regression issue after
r328436, when LinuxKPI character devices started to use DTYPE_DEV in
the "f_type" field of the associated file structure(s).

MFC after:	3 days
Found by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-21 10:13:17 +00:00
Ed Maste
0ba1b36553 Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.  Additional files
waiting on permission from others are listed in review D14210.

Approved by:	kan, marcel, sos, rdivacky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-02-16 15:00:14 +00:00
Jeff Roberson
e958ad4cf3 Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc().  Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by:	markj
Discussed with:	kib, glebius
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14273
2018-02-12 22:53:00 +00:00
Ed Maste
132f90c660 Linuxolator whitespace cleanup
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator.  Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.

Sponsored by:	Turing Robotic Industries Inc.
2018-02-05 17:29:12 +00:00
Pedro F. Giffuni
7f2d13d607 sys/compat: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:13:23 +00:00
Ed Schouten
814629dd64 Don't let cpu_set_syscall_retval() clobber exec_setregs().
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).

Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.

Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.

To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.

Reviewed by:	kib, jhibbits
Differential Revision:	https://reviews.freebsd.org/D13180
2017-11-24 07:35:08 +00:00