Commit Graph

3326 Commits

Author SHA1 Message Date
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Dmitry Chagin
caaad8736e Linuxulator getpeername() returns EINVAL in case then namelen less then 0.
MFC after:	2 weeks
2019-05-13 18:14:20 +00:00
Dmitry Chagin
d5368bf3df Our bsd_to_linux_sockaddr() and linux_to_bsd_sockaddr() functions
alter the userspace sockaddr to convert the format between linux and BSD versions.
That's the minimum 3 of copyin/copyout operations for one syscall.

Also some syscall uses linux_sa_put() and linux_getsockaddr() when load
sockaddr to userspace or from userspace accordingly.

To avoid this chaos, especially converting sockaddr in the userspace,
rewrite these 4 functions to convert sockaddr only in kernel and leave
only 2 of this functions.

Also in order to reduce duplication between MD parts of the Linuxulator put
struct sockaddr conversion functions that are MI out into linux_common module.

PR:		232920
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20157
2019-05-13 17:48:16 +00:00
Johannes Lundberg
5098ed5f3b Implement linux_pci_unregister_drm_driver in linuxkpi so that drm drivers
can be unloaded.

This patch is a part of D19565.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-10 23:10:22 +00:00
Hans Petter Selasky
e2eb11e577 Fix memory leak of PCI BUS structure in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-05-09 10:23:42 +00:00
Hans Petter Selasky
eb6f534241 Fix regression issue after r346645 in the LinuxKPI.
Make sure LinuxKPI PCI devices get a default BUSDMA tag.

Found by:	Thomas Laus <lausts@acm.org>
Sponsored by:	Mellanox Technologies
2019-05-09 09:45:19 +00:00
Hans Petter Selasky
423530be04 Add support for Dynamic Interrupt Moderation, DIM, in mlx5en(4).
Add support for DIM based on Linux,
with some minor adaptions specific to FreeBSD.

Linux commit
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:23:33 +00:00
Ed Maste
0e26cd440f make sysent after r347228
Regenerate to add @generated tag in generated files.
2019-05-07 18:10:21 +00:00
Dmitry Chagin
52a9e429c8 Remove wrong copyright line. Discussed with Carlos Neira.
Reported by:	Rodney W. Grimes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13656
2019-05-07 05:08:13 +00:00
Dmitry Chagin
4e2f69f1cf Adds sys/class/net devices to linsysfs.
Only two interfaces are created eth0 and lo and they expose
the following properties:
address, addr_len, flags, ifindex, mty, tx_queue_len and type.

Initial patch developed by Carlos Neira in 2017 and finished by me.

PR:		223722
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13656
2019-05-06 20:01:13 +00:00
Dmitry Chagin
bbac65c772 Rewrite linux_ifflags() in more readable Linuxulator style.
Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20146
2019-05-06 19:57:51 +00:00
Dmitry Chagin
9c1437ae57 Complete r347052 (https://reviews.freebsd.org/D20137) as it it was not
a final revision.

Fix style issues and change bool-like variables from int to bool.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20141
2019-05-06 19:56:13 +00:00
Hans Petter Selasky
46068e86c3 Use PCIV_INVALID in pci_channel_offline() in the LinuxKPI.
Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:22:45 +00:00
Hans Petter Selasky
fa23397925 Disabling a PCI device should only disable busmaster in the LinuxKPI.
As Linux comment for this function point:
Signal to the system that the PCI device is not in use by the system
anymore. This only involves disabling PCI bus-mastering, if active.

Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:17:38 +00:00
Hans Petter Selasky
34cb771e01 Implement print_hex_dump_debug() function macro in the LinuxKPI.
Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:10:26 +00:00
Hans Petter Selasky
4580f5eadd Allow controlling pr_debug at runtime in the LinuxKPI.
Turning on pr_debug at compile time make it non-optional at runtime.
This often means that the amount of the debugging is unbearable.
Allow developer to turn on pr_debug output only when needed.

Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		kib@
Sponsored by:		Mellanox Technologies
2019-05-06 16:00:20 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Hans Petter Selasky
442d12d89c Fix regression issue after r346645 in the LinuxKPI.
The S/G list must be mapped AS-IS without any optimisations.
This also implies that sg_dma_len() must be equal to sg->length.
Many Linux drivers assume this and this fixes some DRM issues.

Put the BUS DMA map pointer into the scatter-gather list to
allow multiple mappings on the same physical memory address.

The FreeBSD version has been bumped to force recompilation of
external kernel modules.

Sponsored by:		Mellanox Technologies
2019-05-04 09:47:01 +00:00
Hans Petter Selasky
8ec9f0282a Fix regression issue after r346645 in the LinuxKPI.
Properly handle error case when mapping DMA address fails.

Sponsored by:		Mellanox Technologies
2019-05-04 09:30:03 +00:00
Dmitry Chagin
d151344dbf In order to reduce duplication between MD parts of the Linuxulator
move bits that are MI out into the headers in compat/linux.
For that remove bogus _packed attribute from struct l_sockaddr
and use MI types for struct members.

And continue to move into the linux_common module a code that is
intended for both Linuxulator modules (both instruction set - 32 & 64 bit)
or for external modules like linsysfs or linprocfs.

To avoid header pollution introduce new sys/compat/linux_common.h header.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20137
2019-05-03 08:42:49 +00:00
Edward Tomasz Napierala
967cbe64b1 Decode more CPU flags in cpuinfo.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20145
2019-05-03 08:27:03 +00:00
Edward Tomasz Napierala
6c8cb13dd8 Fix flags in cpuinfo.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20139
2019-05-02 19:02:16 +00:00
Dmitry Chagin
03ddf624e6 Remove unneeded includes.
MFC after:	2 week
2019-05-02 09:00:36 +00:00
Edward Tomasz Napierala
12f3888a98 Add sys/devices/system/cpu/{possible,present} to linsysfs(5).
That makes Linux lscpu(1) work.

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20131
2019-05-02 08:17:29 +00:00
Dmitry Chagin
5d520d7fab Follow the FreeBSD and implement PDEATH_SIG prctl ops in the Linuxulator.
It was first introduced in r163734 and missied by me in r283383.

MFC after:	1 week
2019-04-30 17:18:05 +00:00
Hans Petter Selasky
a6619e8d9c Reduce the number of mutexes after r346645 in the LinuxKPI.
Make function macro wrappers for locking and unlocking to ease readability.

No functional change.

Discussed with:		kib@, tychon@ and zeising@
Sponsored by:		Mellanox Technologies
2019-04-30 10:41:20 +00:00
Hans Petter Selasky
93a203ea65 Make the dma_pool structure private to the LinuxKPI similar to Linux.
No functional change.

Discussed with:		kib @
Sponsored by:		Mellanox Technologies
2019-04-30 09:38:22 +00:00
Hans Petter Selasky
5a637529ed Store a pointer to the device instead of the PCI device in the DMA pool
implementation in the LinuxKPI. This avoids use of container_of().

No functional change.

Discussed with:		kib @
Sponsored by:		Mellanox Technologies
2019-04-30 09:26:11 +00:00
Ed Maste
5803d72f7e make sysent after r346273 (readlinkat arg correction)
PR:		197915
Reminded by:	dchagin
2019-04-26 12:55:52 +00:00
Johannes Lundberg
af248a7cee Don't call cdev_init where cdev_alloc is called. cdev_alloc already
handles initialization.

Reported by:	johalun
Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19565
2019-04-25 21:54:32 +00:00
Tycho Nightingale
b09626b330 LinuxKPI buildfix for ppc64 after r346645.
Proposed by:	hselasky
Sponsored by:	Dell EMC Isilon
2019-04-25 18:13:55 +00:00
Hans Petter Selasky
d4fedb75ec LinuxKPI buildfix for 32-bit DMA architectures after r346645.
The <sys/pctrie.h> APIs expect a 64-bit DMA key.
This is fine as long as the DMA is less than or equal to 64 bits, which
is currently the case.

Sponsored by:		Mellanox Technologies
2019-04-25 09:13:15 +00:00
Tycho Nightingale
f211d536b6 LinuxKPI should use bus_dma(9) to be compatible with an IOMMU
Reviewed by:	hselasky, kib
Tested by:	greg@unrelenting.technology
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19845
2019-04-24 20:30:45 +00:00
Ed Maste
ff9be73ee3 Enable ioremap for aarch64 in the LinuxKPI
Required for Mellanox drivers (e.g. on Ampere eMAG at Packet.com).

PR:		237055
Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D19987
2019-04-20 15:57:05 +00:00
Ed Maste
e8ee7d9035 correct readlinkat(2) return type
r176215 corrected readlink(2)'s return type and the type of the last
argument.  readlink(2) was introduced in r177788 after being developed
as part of Google Summer of Code 2007; it appears to have inherited the
wrong return type.

Man pages and header files were already ssize_t; update syscalls.master
to match.

PR:		197915
Submitted by:	Henning Petersen <henning.petersen@t-online.de>
MFC after:	2 weeks
2019-04-16 13:26:31 +00:00
Mariusz Zaborski
a489026566 Regen after r345982. 2019-04-06 09:37:10 +00:00
Mariusz Zaborski
a1304030b8 Introduce funlinkat syscall that always us to check if we are removing
the file associated with the given file descriptor.

Reviewed by:	kib, asomers
Reviewed by:	cem, jilles, brooks (they reviewed previous version)
Discussed with:	pjd, and many others
Differential Revision:	https://reviews.freebsd.org/D14567
2019-04-06 09:34:26 +00:00
Conrad Meyer
a8a16c7128 Replace read_random(9) with more appropriate arc4rand(9) KPIs
Reviewed by:	ae, delphij
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19760
2019-04-04 01:02:50 +00:00
Jason A. Harmening
f0645b3a06 freebsd32: fix padding of computed control message length for recvmsg()
Each control message region must be aligned on a 4-byte boundary on 32-bit
architectures. The 32-bit compat shim for recvmsg() gets the actual layout
right, but doesn't pad the payload length when computing msg_controllen for
the output message header. If a control message contains an unaligned
payload, such as the 1-byte TTL field in the example attached to PR 236737,
this can produce control message payload boundaries that extend beyond
the boundary reported by msg_controllen.

PR:	236737
Reported by:	Yuval Pavel Zholkover <paulzhol@gmail.com>
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19768
2019-03-30 23:43:58 +00:00
Dmitry Chagin
803fff9065 Whitespace cleanup (annoying).
MFC after:	1 month
2019-03-24 15:08:30 +00:00
Dmitry Chagin
f730d606d5 Update syscall.master to 5.0.
For 32-bit Linuxulator, ipc() syscall was historically
the entry point for the IPC API. Starting in Linux 4.18, direct
syscalls are provided for the IPC. Enable it.

MFC after:	1 month
2019-03-24 14:50:02 +00:00
Dmitry Chagin
7dabf89bcf Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.

MFC after:	1 month
2019-03-24 14:44:35 +00:00
Konstantin Belousov
fd8d844f76 amd64 KPTI: add control from procctl(2).
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting.  PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.

Requested by:	jhb
Reviewed by:	jhb, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:44:33 +00:00
Hans Petter Selasky
582141f69d Revert r345102 until the DRM next port issues are resolved.
Requested by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-14 09:18:54 +00:00
Hans Petter Selasky
7d595f6b79 Resolve duplicate symbol name conflict after r345095, when building LINT.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-13 19:53:20 +00:00
Hans Petter Selasky
998f22eb4b Implement sg_virt() function in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:31:33 +00:00
Hans Petter Selasky
56b16627d4 Define SG_CHAIN and SG_END in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:30:40 +00:00
Hans Petter Selasky
c469882529 Implement pr_info_ratelimited() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:26:24 +00:00
Hans Petter Selasky
856b815d27 Define some RCU debug macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:24:30 +00:00
Hans Petter Selasky
0ac26c0e30 Honor SYSCTL function return values when creating sysfs nodes in the LinuxKPI.
Return proper error code upon failure.

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:21:19 +00:00
Hans Petter Selasky
925245aaa9 Implement more malloc function macros in the LinuxKPI.
Fix arguments for currently unused kvmalloc().

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:17:52 +00:00
Hans Petter Selasky
ab62989ae9 Implement more PCI speed related functions and macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:15:36 +00:00
Hans Petter Selasky
75f7460b34 Implement IS_ALIGNED() and DIV_ROUND_DOWN_ULL() function macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:04:06 +00:00
Hans Petter Selasky
8734a56285 Implement si_meminfo() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 19:01:55 +00:00
Hans Petter Selasky
5746b1cd46 Implement task_euid() and get_task_state() function macros in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:55:41 +00:00
Hans Petter Selasky
c7486758eb Implement get_task_comm() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:53:29 +00:00
Hans Petter Selasky
638fa5a36f Implement current_exiting() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:51:33 +00:00
Hans Petter Selasky
845a91ce0b Implement list_for_each_entry_from_reverse() and
list_bulk_move_tail() in the LinuxKPI.

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:47:17 +00:00
Hans Petter Selasky
4f081c4fc6 Implement dma_map_page_attrs() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:44:06 +00:00
Hans Petter Selasky
839b4bf24d Implement ida_free() and ida_alloc_max() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:02:47 +00:00
Hans Petter Selasky
8b2a8a4954 Implement DEFINE_STATIC_SRCU() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 18:00:25 +00:00
Hans Petter Selasky
884aaac660 Implement BITS_PER_TYPE() function macro in the LinuxKPI.
Fix some style while at it.

Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:55:58 +00:00
Hans Petter Selasky
4c09177ab7 Properly define the DMA attribute values in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:51:08 +00:00
Hans Petter Selasky
7f36930024 Implement dev_err_once() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:46:05 +00:00
Hans Petter Selasky
8fdb5febfc Implement dma_set_mask_and_coherent() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Limelight Networks
Sponsored by:		Mellanox Technologies
2019-03-13 17:42:31 +00:00
Warner Losh
329f0aa952 Kill tz_minuteswest and tz_dsttime.
Research Unix, 7th Edition introduced TIMEZONE and DSTFLAG
compile-time constants in sys/param.h to communicate these values for
the machine. 4.2BSD moved from the compile-time to run-time and
introduced these variables and used for localtime() to return the
right offset from UTC (sometimes referred to as GMT, for this purpose
is the same). 4.4BSD migrated to using the tzdata code/database and
these variables were basically unused.

FreeBSD removed the real need for these with adjkerntz in
1995. However, some RTC clocks continued to use these variables,
though they were largely unused otherwise.  Later, phk centeralized
most of the uses in utc_offset, but left it using both tz_minuteswest
and adjkerntz.

POSIX (IEEE Std 1003.1-2017) states in the gettimeofday specification
"If tzp is not a null pointer, the behavior is unspecified" so there's
no standards reason to retain it anymore. In fact, gettimeofday has
been marked as obsolecent, meaning it could be removed from a future
release of the standard. It is the only interface defined in POSIX
that references these two values. All other references come from the
tzdata database via tzset().

These were used to more faithfully implement early unix ABIs which
have been removed from FreeBSD.  NetBSD has completely eliminated
these variables years ago. Linux has migrated to tzdata as well,
though these variables technically still exist for compatibility
with unspecified older programs.

So, there's no real reason to have them these days. They are a
historical vestige that's no longer used in any meaningful way.

Reviewed By: jhb@, brooks@
Differential Revision: https://reviews.freebsd.org/D19550
2019-03-12 04:49:47 +00:00
Edward Tomasz Napierala
1699546def Remove sv_pagesize, originally introduced with r100384.
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.

Reviewed by:	imp (who also contributed the commit message)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19280
2019-03-01 16:16:38 +00:00
Bjoern A. Zeeb
4974f2b172 Add ushort and ulong to linux/types.h.
When porting code once written for Linux we find not only uints but also ushort and ulong.
Provide central typedefs as part of the linuxkpi for those as well.

Reviewed by:	hselasky, emaste
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19405
2019-03-01 14:33:20 +00:00
Matt Macy
3f6cab079c import linux debugfs support
Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19258
2019-02-23 20:56:41 +00:00
Matt Macy
2ce1771c12 linux/fs: simplify interop and correct definition of loff_t
- offsets can be negative, loff_t needs to be signed, it also simplifies
  interop with the rest of the code base to use off_t than the actual linux
  definition "long long"
- don't rely on the defining "file" to "linux_file" in interface definitions
  as that causes heartache with includes

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19274
2019-02-23 20:45:45 +00:00
Matt Macy
983ed4f9f1 lkpi: allow late binding of linux_alloc_current
Some consumers may be loosely coupled with the lkpi.
This allows them to call linux_alloc_current without
having a static dependency.

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19257
2019-02-22 23:15:32 +00:00
Marius Strobl
f855ec814d Make taskqgroup_attach{,_cpu}(9) work across architectures
So far, intr_{g,s}etaffinity(9) take a single int for identifying
a device interrupt. This approach doesn't work on all architectures
supported, as a single int isn't sufficient to globally specify a
device interrupt. In particular, with multiple interrupt controllers
in one system as found on e. g. arm and arm64 machines, an interrupt
number as returned by rman_get_start(9) may be only unique relative
to the bus and, thus, interrupt controller, a certain device hangs
off from.
In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to
the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}()
not work across architectures. Yet in turn, iflib(4) as gtaskqueue
consumer so far doesn't fit architectures where interrupt numbers
aren't globally unique.
However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as
employed by the gtaskqueue implementation to bind an interrupt to a
particular CPU, using bus_bind_intr(9) instead is equivalent from
a functional point of view, with bus_bind_intr(9) taking the device
and interrupt resource arguments required for uniquely specifying a
device interrupt.
Thus, change the gtaskqueue implementation to employ bus_bind_intr(9)
instead and intr_{g,s}etaffinity(9) to take the device and interrupt
resource arguments required respectively. This change also moves
struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps
struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL
as userland likes to include <sys/_task.h> or indirectly drags it
in - for better or worse also with _KERNEL defined -, which with
device_t and struct resource dependencies otherwise is no longer
as easily possible now.
The userland inclusion problem probably can be improved a bit by
introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the
existing _WANT_PRISON etc., which is orthogonal to this change,
though, and likely needs an exp-run.

While at it:
- Change the gt_cpu member in the grouptask structure to be of type
  int as used elswhere for specifying CPUs (an int16_t may be too
  narrow sooner or later),
- move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to
  the gtaskqueue implementation as it's only used and needed there,
- change the GTASK_INIT macro to use "gtask" rather than "task" as
  argument given that it actually operates on a struct gtask rather
  than a struct task, and
- let subr_gtaskqueue.c consistently use __func__ to print functions
  names.

Reported by:	mmel
Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D19139
2019-02-12 21:23:59 +00:00
Konstantin Belousov
fa50a3552d Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
Konstantin Belousov
a7f67facdf Normalize the declaration of i386_read_exec variable.
It is currently re-declared in sys/sysent.h which is a wrong place for
MD variable.  Which causes redeclaration error with gcc when
sys/sysent.h and machine/md_var.h are included both.

Remove it from sys/sysent.h and instead include machine/md_var.h when
needed, under #ifdef for both i386 and amd64.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:51:51 +00:00
Andriy Voskoboinyk
a99bdc110b Fix compilation with 'option NDISAPI + device ndis' and
without 'device pccard' in the kernel config file.

PR:		171532
Reported by:	Robert Bonomi <bonomi@host128.r-bonomi.com>
MFC after:	1 week
2019-01-30 11:40:12 +00:00
Hans Petter Selasky
232028b34e Add full support for PCI_ANY_ID when matching PCI IDs in the LinuxKPI.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-01-25 20:13:28 +00:00
Oleksandr Tymoshenko
1715256316 [ndis] Fix unregistered use of FPU by NDIS in kernel on amd64
amd64 miniport drivers are allowed to use FPU which triggers "Unregistered use
of FPU in kernel" panic.

Wrap all variants of MSCALL with fpu_kern_enter/fpu_kern_leave.  To reduce
amount of allocations/deallocations done via
fpu_kern_alloc_ctx/fpu_kern_free_ctx maintain cache of fpu_kern_ctx elements.

Based on the patch by Paul B Mahol

PR:		165622
Submitted by:	Vlad Movchan <vladislav.movchan@gmail.com>
MFC after:	1 month
2019-01-22 03:53:42 +00:00
Ed Maste
347a8ed1bf linuxulator: fix stack memory disclosure in linux_sigaltstack
Most siginfo_to_lsiginfo callers already zeroed the l_siginfo_t before
callit it, but linux_waitid did not.  Instead of zeroing in the called
function to address linux_waitid (as in commit 2e6ebe70), just do it in
linux_waitid.

admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	Andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 17:12:16 +00:00
Ed Maste
9866e7bbae linuxulator: fix stack memory disclosure in linux_ioctl_termio
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:21:03 +00:00
Ed Maste
4308a37410 linuxulator: fix stack memory disclosure in linux_ioctl_v4l
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:19:02 +00:00
Kirk McKusick
88640c0e8b Create new EINTEGRITY error with message "Integrity check failed".
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.

These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.

Reviewed by:    gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
2019-01-17 06:35:45 +00:00
Gleb Smirnoff
396694153f Fix compilation failures on different arches that have vm_machdep.c not
aware of counter_u64_t by including counter.h into uma_int.h. I'm not
happy about this inclusion, but it fixes compilation ASAP.
2019-01-15 19:33:47 +00:00
Gleb Smirnoff
2efcc8cbca Make uz_allocs, uz_frees and uz_fails counter(9). This removes some
atomic updates and reduces amount of data protected by zone lock.

During startup point these fields to EARLY_COUNTER. After startup
allocate them for all early zones.

Tested by:	pho
2019-01-15 18:24:34 +00:00
Olivier Houchard
21fb66241a Regenerate sysent files after having modified syscalls.master. 2019-01-13 00:38:55 +00:00
Olivier Houchard
2ca357528f amd64 is the only arch that doesn't require padding for 32bits syscalls, so
instead of listing every arch thar requires it, just exclude amd64.
2019-01-13 00:37:31 +00:00
Gleb Smirnoff
a68cc38879 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
Mark Johnston
bb376a990c Specify the correct option level when emulating SO_PEERCRED.
Our equivalent to SO_PEERCRED, LOCAL_PEERCRED, is implemented at
socket option level 0, not SOL_SOCKET.

PR:		234722
Submitted by:	Dániel Bakai <bakaidl@gmail.com>
MFC after:	2 weeks
2019-01-08 17:21:59 +00:00
Conrad Meyer
85f2a00b34 linuxkpi: Remove extraneous NULL check on M_WAITOK allocation
The check was not introduced in r342628, but the subsequent unchecked access to
refs was added then, prompting a Coverity warning about "Null pointer
dereferences (FORWARD_NULL)."  The warning is bogus due to M_WAITOK, but so is
the NULL check that hints it, so just remove it.

CID:		1398588
Reported by:	Coverity
2019-01-01 19:56:49 +00:00
Konstantin Belousov
9362b6a394 Fix 32bit gcc builds after r342625.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-12-30 16:39:26 +00:00
Konstantin Belousov
f823a36e83 Fix linux_destroy_dev() behaviour when there are still files open from
the destroying cdev.

Currently linux_destroy_dev() waits for the reference count on the
linux cdev to drain, and each open file hold the reference.
Practically it means that linux_destroy_dev() is blocked until all
userspace processes that have the cdev open, exit.  FreeBSD devfs does
not have such problem, because device refcount only prevents freeing
of the cdev memory, and separate 'active methods' counter blocks
destroy_dev() until all threads leave the cdevsw methods.  After that,
attempts to enter cdevsw methods are refused with an error.

Implement somewhat similar mechanism for LinuxKPI cdevs.  Demote cdev
refcount to only mean a hold on the linux cdev memory.  Add sirefs
count to track both number of threads inside the cdev methods, and for
single-bit indicator that cdev is being destroyed.  In the later case,
the call is redirected to the dummy cdev.

Reviewed by:	markj
Discussed with:	hselasky
Tested by:	zeising
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D18606
2018-12-30 15:46:45 +00:00
Konstantin Belousov
e5a3393a15 Implement zap_vma_ptes() for managed device objects.
Reviewed by:	markj
Discussed with:	hselasky
Tested by:	zeising
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D18606
2018-12-30 15:38:07 +00:00
Konstantin Belousov
069598b941 Use IDX_TO_OFF().
Reviewed by:	markj
Discussed with:	hselasky
Tested by:	zeising
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D18606
2018-12-30 15:28:31 +00:00
Mateusz Guzik
628888f0e0 Remove iBCS2, part2: general kernel
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
2018-12-19 21:57:58 +00:00
Brooks Davis
10f7b12c13 const poison the new pointer of __sysctl.
Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18444
2018-12-18 12:44:38 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Hans Petter Selasky
ca487c1888 Remove no longer needed ifdefs in the LinuxKPI, after r341787.
Differential Revision:	https://reviews.freebsd.org/D18450
Reviewed by:		kib@
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-12-10 13:41:33 +00:00
Konstantin Belousov
fd52edaf70 Regen. 2018-12-07 15:19:00 +00:00
Konstantin Belousov
d1fd400a80 Add new file handle system calls.
Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2).  The
syscalls are provided for a NFS userspace server (nfs-ganesha).

Submitted by:	Jack Halford <jack@gandi.net>
Sponsored by:	Gandi.net
Tested by:	pho
Feedback from:	brooks, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18359
2018-12-07 15:17:29 +00:00
Hans Petter Selasky
52da588961 Remove redundant declaration after r341517.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-12-05 15:56:44 +00:00
Hans Petter Selasky
6da0d28e6a Fix some build of LinuxKPI on some platforms after r341518.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-12-05 15:53:34 +00:00
Slava Shwartsman
31c3f64819 mlx5: Fix driver version location
Driver description should be set by core and not by the Ethernet driver.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:10 +00:00
Slava Shwartsman
a9c20af23d ibcore: ip6_dev_find() needs to know the scope ID.
Else the wrong network device can be returned for link-local addresses.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:24:43 +00:00
Slava Shwartsman
452d59e130 linuxkpi: Really check if PCI is offline
Currently we always return false if for PCI offline query.
Try to read PCI config, if the return value if 0xffff probably the
PCI is offline.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:17:45 +00:00
Slava Shwartsman
92cbd83001 linuxkpi: properly implement netif_carrier_ok().
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:17:15 +00:00
Slava Shwartsman
9c7b53cc65 linuxkpi: Fix for use-after-free when tearing down character devices.
Make sure we hold a reference on the character device for every opened file
to prevent the character device to be freed prematurely.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:16:39 +00:00
Slava Shwartsman
be34cfc587 linuxkpi: implement idr_is_empty() and ida_is_empty().
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:15:57 +00:00
Konstantin Belousov
f186340011 Improve procstat reporting for the linux cdev file descriptors.
If there is a vnode attached to the linux file, use it to fill
kinfo_file.  Otherwise, report a new KF_TYPE_DEV file type, without
supplying any type-specific information.

KF_TYPE_DEV is supposed to be used by most devfs-specific file types.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-12-03 23:39:45 +00:00
Brooks Davis
f373437a01 Add helper functions to copy strings into struct image_args.
Given a zeroed struct image_args with an allocated buf member,
exec_args_add_fname() must be called to install a file name (or NULL).
Then zero or more calls to exec_args_add_env() followed by zero or
more calls to exec_args_add_env(). exec_args_adjust_args() may be
called after args and/or env to allow an interpreter to be prepended to
the argument list.

To allow code reuse when adding arg and env variables, begin_envv
should be accessed with the accessor exec_args_get_begin_envv()
which handles the case when no environment entries have been added.

Use these functions to simplify exec_copyin_args() and
freebsd32_exec_copyin_args().

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15468
2018-11-29 21:00:56 +00:00
Mark Johnston
792843c38f Pass malloc flags directly through kevent(2) subroutines.
Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9).  Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18318
2018-11-24 17:06:01 +00:00
Ben Widawsky
f82dd310bb linuxkpi: Use pageproc instead of vmproc
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages.  vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary.  It's a
somewhat legacy component of the system and isn't required.  You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.

Based on this, we want pageproc to emulate kswapd, not vmproc.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D18061
2018-11-21 04:34:18 +00:00
Ben Widawsky
5a46107832 linuxkpi: Remove duplicated text
Somehow this got botched while moving from git -> svn
2018-11-20 23:05:09 +00:00
Ben Widawsky
c3f4f28c63 linuxkpi: Add some basic swap functions
These are used by kms-drm to determine various heuristics relate
memory conditions.

The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.

This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h

Reviewed by:    mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by:    emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D18052
2018-11-20 22:49:19 +00:00
Tijl Coosemans
7df0e7beb7 Fix another user address dereference in linux_sendmsg syscall.
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument.  Stop using the macro as well as LINUX_CMSG_FIRSTHDR.  Use
the size field of the kernel copy of the control message header to obtain
the next control message.

PR:		217901
MFC after:	2 days
X-MFC-With:	r340631
2018-11-20 14:18:57 +00:00
Tijl Coosemans
e3b385fc95 Do proper copyin of control message data in the Linux sendmsg syscall.
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin.  For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly.  One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).

PR:		217901
Reviewed by:	kib
MFC after:	3 days
2018-11-19 15:31:54 +00:00
Mateusz Guzik
2c054ce924 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
Hans Petter Selasky
0df8bab666 Define asm macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-16 16:23:45 +00:00
Hans Petter Selasky
1799873e3a Implement ktime_get_ts64() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-16 16:19:16 +00:00
Brooks Davis
5b1df30051 Use the main capabilities.conf for freebsd32.
Allow the location of capabilities.conf to be configured.

Also allow a per-abi syscall prefix to be configured with the
abi_func_prefix syscalls.conf variable and check syscalls against
entries in capabilities.conf with and without the prefix amended.

Take advantage of these two features to allow use shared capabilities.conf
between the default syscall vector and the freebsd32 compatability
layer.  We've been inconsistent about keeping the two in sync as
evidenced by the bugs fixed in r340294.  This eliminates that problem
going forward.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17932
2018-11-14 00:46:02 +00:00
Brooks Davis
4b499c75f9 Regen after r340302: Fix freebsd32 mknod(at).
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:02:07 +00:00
Brooks Davis
9a38df59e9 Fix freebsd32 mknod(at).
As dev_t is now a 64-bit integer, it requires special handling as a
system call argument.  64-bit arguments are split between two 64-bit
integers due to the way arguments are promoted to allow reuse of most
system call implementations.  They must be reassembled before use.
Further, 64-bit arguments at an odd offset (counting from zero) are
padded and slid to the next slot on powerpc and mips.  Fix the
non-COMPAT11 system call by adding a freebsd32_mknodat() and
appropriately padded declerations.

The COMPAT11 system calls are fully compatible with the 64-bit
implementations so remove the freebsd32_ versions.

Use uint32_t consistently as the type of the old dev_t.  This matches
the old definition.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:01:16 +00:00
Brooks Davis
1632f36305 Regen after r340294: Fix a number of bugs in freebsd32's capabilities.conf.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:06:25 +00:00
Brooks Davis
d457c0b61b Fix a number of bugs in freebsd32's capabilities.conf.
Bugs range from failure to update after changing syscall implementaion
names to using the wrong name.  Somewhat confusingly, the name in
capabilities.conf is exactly the string that appears in syscalls.master,
not the name with a COMPAT* prefix which is the actual function name.

Found while making a change to use the default capabilities.conf.

Fixes:	r335177, r336980, r340272, r340274, others
Reviewed by:	kib, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:03:01 +00:00
Brooks Davis
e1e300b671 Regen after r340274: Make freebsd32_utmx_op follow the freebsd32_foo
convention.
2018-11-09 00:46:50 +00:00
Brooks Davis
b34f4419fb Make freebsd32_umtx_op follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:46:10 +00:00
Brooks Davis
86e06fa55b Regen after 340272: Make __sysctl follow the freebsd32_foo convention
Sponsored by:	DARPA, AFRL
2018-11-09 00:22:45 +00:00
Brooks Davis
4074751718 Make __sysctl follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:21:58 +00:00
Brooks Davis
5577e44bf4 Regen after r340221: allow pointer return types.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:56:07 +00:00
Brooks Davis
e56ec0e519 makesyscalls.sh: allow pointer return types.
The previous code required that the return type be a single word.  This
allows it to be a pointer without using a typedef.

Update the return types of break, mmap, and shmat to be void * as
declared.  This only effects systrace output in-tree, but can aid in
generating system call wrappers from syscalls.master.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:55:04 +00:00
Brooks Davis
938e8dcf60 Regen after r340199: Use declared types for caddr_t arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:47:29 +00:00
Brooks Davis
318f0d7720 Use declared types for caddr_t arguments.
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:46:38 +00:00
Mariusz Zaborski
0b39d7e377 Remove ppoll. freebsd32 doesn't define a ppoll syscall.
Reported by:	jhb
2018-11-06 18:26:40 +00:00
Mariusz Zaborski
279e464dd5 Regenerate after r340195. 2018-11-06 18:06:52 +00:00
Mariusz Zaborski
4a1f3ed354 capsicum: Add ppoll and freebsd32_ppoll to compat32.
PR:		232495
Pointed out by: brooks
MFC after:	2 weeks
2018-11-06 18:05:46 +00:00
Tijl Coosemans
8fc08087a1 On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers.  The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver.  Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration.  These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again.  That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR:		206711
Reviewed by:	kib
MFC after:	3 days
2018-11-06 13:51:08 +00:00
Brooks Davis
4e8c73eb20 Regen after r340080: Add const to input-only char * arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:56:19 +00:00
Brooks Davis
12e69f96a2 Add const to input-only char * arguments.
These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.

This change improves the match between syscalls.master and the public
declerations of system calls.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:50:22 +00:00
Brooks Davis
f7e5ce325f Regent after r340034: Use mode_t when the documented signature does.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17784
2018-11-01 23:10:53 +00:00
Brooks Davis
2105ac07d7 Use mode_t when the documented signature does.
This is more clear and produces better results when generating function
stubs from syscalls.master.

Reviewed by:	kib, emaste
Obtained from:	CheribSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17784
2018-11-01 23:06:50 +00:00
Ben Widawsky
3d40cdf014 linuxkpi: Add GFP flags needed for ttm drivers
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Requested by:	bwidawsk
MFC after:	3 days
Approved by:	emaste (mentor)
2018-11-01 15:30:01 +00:00
Hans Petter Selasky
e35079db73 Implement the dump_stack() function in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:42:56 +00:00
Hans Petter Selasky
bf05cd05ac Implement __KERNEL_DIV_ROUND_UP() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:32:52 +00:00
Hans Petter Selasky
8ad9551d36 Implement dma_pool_zalloc() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-29 19:02:36 +00:00
Brooks Davis
ed34a7fcf2 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Unlike r339174 this change supports both places FIODGNAME is handled.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17475
2018-10-26 17:59:25 +00:00
Konstantin Belousov
4f77f48884 Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547
2018-10-25 22:16:34 +00:00
Brooks Davis
22c0c9a481 Remove __restrict qualifiers from syscalls.master.
The restruct qualifier is intended to aid code generation in the
compiler, but the only access to storage through these pointers is via
structs using copyin/copyout and the like which can not be written in C
or C++ and thus the compiler gains nothing from the qualifiers.

As such, the qualifiers add no value in current usage.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17574
2018-10-22 21:50:43 +00:00
Tijl Coosemans
642909fdfb Define linuxkpi readq for 64-bit architectures. It is used by drm-kmod.
Currently the compiler picks up the definition in machine/cpufunc.h.

Add compiler memory barriers to read* and write*.  The Linux x86
implementation of these functions uses inline asm with "memory" clobber.
The Linux x86 implementation of read_relaxed* and write_relaxed* uses the
same inline asm without "memory" clobber.

Implement ioread* and iowrite* in terms of read* and write* so they also
have memory barriers.

Qualify the addr parameter in write* as volatile.

Like Linux, define macros with the same name as the inline functions.

Only define 64-bit versions on 64-bit architectures because generally
32-bit architectures can't do atomic 64-bit loads and stores.

Regroup the functions a bit and add brief comments explaining what they do:
- __raw_read*, __raw_write*: atomic, no barriers, no byte swapping
- read_relaxed*, write_relaxed*: atomic, no barriers, little-endian
- read*, write*: atomic, with barriers, little-endian

Add a comment that says our implementation of ioread* and iowrite*
only handles MMIO and does not support port IO.

Reviewed by:	hselasky
MFC after:	3 days
2018-10-22 20:55:35 +00:00
Kyle Evans
29bf3a7ba8 Correct COMPAT* macro names in syscalls.master
Both ^/sys/compat/freebsd32/syscalls.master and ^/sys/kern/syscalls.master
cited "COMPAT[n] #ifdef" instead of "COMPAT_FREEBSD[n] #ifdef" in places.

Approved by:	re (glebius)
2018-10-15 21:35:57 +00:00
Brooks Davis
c7d0908e1c Regenerated assorted syscall related files after:
- r327895: Implement 'domainset'...
 - r329876: Use linux types for linux-specific syscalls

Diff generated with:
	find . -name syscalls.conf | xargs dirname | \
	    xargs -n1 -I DIR make -C DIR sysent

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-09 20:42:17 +00:00
Brooks Davis
9bc603bd20 Revert r339174: Move 32-bit compat support for FIODGNAME to the right place.
A case was missed in this commit which breaks sshing into a 32-bit sshd
on a 64-bit system.

Approved by:	re (gjb)
2018-10-04 23:55:03 +00:00
Brooks Davis
23f2e22802 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Reviewed by:	kib
Approved by:	re (rgrimes, gjb)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Review:	https://reviews.freebsd.org/D17388
2018-10-03 20:39:48 +00:00