freebsd-nq

Author	SHA1	Message	Date
Colin Percival	46dd801acb	Add userland boot profiling to TSLOG On kernels compiled with 'options TSLOG', record for each process ID: * The timestamp of the fork() which creates it and the parent process ID, * The first path passed to execve(), if any, * The first path resolved by namei, if any, and * The timestamp of the exit() which terminates the process. Expose this information via a new sysctl, debug.tslog_user. On kernels lacking 'options TSLOG' (the default), no information is recorded and the sysctl does not exist. Note that recording namei is needed in order to obtain the names of rc.d scripts being launched, as the rc system sources them in a subshell rather than execing the scripts. With this commit it is now possible to generate flamecharts of the entire boot process from the start of the loader to the end of /etc/rc. The code needed to perform this processing is currently found in github: https://github.com/cperciva/freebsd-boot-profiling Reviewed by: mhorne Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D32493	2021-10-16 11:47:34 -07:00
Dawid Gorecki	a97d697122	kern_exec: Add kern.stacktop sysctl. With stack gap enabled top of the stack is moved down by a random amount of bytes. Because of that some multithreaded applications which use kern.usrstack sysctl to calculate address of stacks for their threads can fail. Add kern.stacktop sysctl, which can be used to retrieve address of the stack after stack gap is applied to it. Returns value identical to kern.usrstack for processes which have no stack gap. Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31897	2021-10-15 10:21:55 +02:00
Dawid Gorecki	889b56c8cd	setrlimit: Take stack gap into account. Calling setrlimit with stack gap enabled and with low values of stack resource limit often caused the program to abort immediately after exiting the syscall. This happened due to the fact that the resource limit was calculated assuming that the stack started at sv_usrstack, while with stack gap enabled the stack is moved by a random number of bytes. Save information about stack size in struct vmspace and adjust the rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max, then the value is truncated to rlim_max. PR: 253208 Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31516	2021-10-15 10:21:47 +02:00
Mateusz Guzik	a0558fe90d	Retire code added to support CloudABI CloudABI was removed in cf0ee8738e31aa9e6fbf4dca4dac56d89226a71a	2021-10-10 18:24:29 +00:00
Konstantin Belousov	b5cadc643e	Make core dump writes interruptible with SIGKILL This can be disabled by sysctl kern.core_dump_can_intr Reported and tested by: pho Reviewed by: imp, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32313	2021-10-08 03:21:43 +03:00
Justin Hibbits	63cb9308a7	Fix segment size in compressing core dumps A core segment is bounded in size only by memory size. On 64-bit architectures this means a segment can be much larger than 4GB. However, compress_chunk() takes only a u_int, clamping segment size to 4GB-1, resulting in a truncated core. Everything else, including the compressor internally, uses size_t, so use size_t at the boundary here. This dates back to the original refactor back in 2015 (r279801 / aa14e9b7). MFC after: 1 week Sponsored by: Juniper Networks, Inc.	2021-10-01 14:16:33 -05:00
Konstantin Belousov	397f188936	Remove SV_CAPSICUM It was only needed for cloudabi Reviewed by: emaste Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D31923	2021-09-22 00:18:44 +03:00
Andrew Turner	b792434150	Create sys/reg.h for the common code previously in machine/reg.h Move the common kernel function signatures from machine/reg.h to a new sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2). Reviewed by: imp, markj Sponsored by: DARPA, AFRL (original work) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19830	2021-08-30 12:50:53 +01:00
Dmitry Chagin	af29f39958	umtx: Split umtx.h on two counterparts. To prevent umtx.h polluting by future changes split it on two headers: umtx.h - ABI header for userspace; umtxvar.h - the kernel staff. While here fix umtx_key_match style. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31248 MFC after: 2 weeks	2021-07-29 12:41:29 +03:00
Dmitry Chagin	5fd9cd53d2	linux(4): Modify sv_onexec hook to return an error. Temporary add stubs to the Linux emulation layer which calls the existing hook. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30911 MFC after: 2 weeks	2021-07-20 09:56:25 +03:00
Dmitry Chagin	62ba4cd340	Call sv_onexec hook after the process VA is created. For future use in the Linux emulation layer call sv_onexec hook right after the new process address space is created. It's safe, as sv_onexec used only by Linux abi and linux_on_exec() does not depend on a state of process VA. Reviewed by: kib Differential revision: https://reviews.freebsd.org/D30899 MFC after: 2 weeks	2021-07-20 09:55:14 +03:00
Konstantin Belousov	28a66fc3da	Do not call FreeBSD-ABI specific code for all ABIs Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle robust mutexes, for native FreeBSD ABI. Similarly, there is no sense in calling sigfastblock_clear() for non-native ABIs. Requested by: dchagin Reviewed by: dchagin, markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30987	2021-07-07 14:12:07 +03:00
Konstantin Belousov	71ab344524	Add sv_onexec_old() sysent hook for exec event Unlike sv_onexec(), it is called from the old (pre-exec) sysentvec structure. The old vmspace for the process is still intact during the call. Reviewed by: dchagin, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30987	2021-07-07 14:12:07 +03:00
Edward Tomasz Napierala	db8d680ebe	procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS This introduces a new, per-process flag, "NO_NEW_PRIVS", which is inherited, preserved on exec, and cannot be cleared. The flag, when set, makes subsequent execs ignore any SUID and SGID bits, instead executing those binaries as if they not set. The main purpose of the flag is implementation of Linux PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged chroot. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30939	2021-07-01 09:42:07 +01:00
Dmitry Chagin	615f22b2fb	Add a link to the Elf_Brandinfo into the struc proc. To allow the ABI to make a dicision based on the Brandinfo add a link to the Elf_Brandinfo into the struct proc. Add a note that the high 8 bits of Elf_Brandinfo flags is private to the ABI. Note to MFC: it breaks KBI. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30918 MFC after: 2 weeks	2021-06-29 20:15:08 +03:00
Konstantin Belousov	2d423f7671	sysent: allow ABI to disable setid on exec. Reviewed by: dchagin Tested by: trasz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28154	2021-06-06 21:42:52 +03:00
Konstantin Belousov	19e6043a44	kern_exec.c: Add execve_nosetid() helper Reviewed by: dchagin Tested by: trasz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28154	2021-06-06 21:42:41 +03:00
Edward Tomasz Napierala	3b9971c8da	Clean up some of the core dumping code. No functional changes. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30397	2021-05-25 16:30:32 +01:00
Mateusz Guzik	154f0ecc10	Fix tinderbox build after 1762f674ccb571e6 ktrace commit.	2021-05-22 19:41:19 +00:00
Konstantin Belousov	1762f674cc	ktrace: pack all ktrace parameters into allocated structure ktr_io_params Ref-count the ktr_io_params structure instead of vnode/cred. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257	2021-05-22 15:16:08 +03:00
Edward Tomasz Napierala	33621dfc19	Refactor core dumping code a bit This makes it possible to use core_write(), core_output(), and sbuf_drain_core_output(), in Linux coredump code. Moving them out of imgact_elf.c is necessary because of the weird way it's being built. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30369	2021-05-22 09:59:00 +01:00
Mark Johnston	f1c3adefd9	execve: Mark exec argument buffers We cache mapped execve argument buffers to avoid the overhead of TLB shootdowns. Mark them invalid when they are freed to the cache. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29460	2021-04-13 17:42:21 -04:00
Konstantin Belousov	4ea65707d3	exec_new_vmspace: print useful error message on ctty if stack cannot be mapped. After old vmspace is destroyed during execve(2), but before the new space is fully constructed, an error during image activation cannot be returned because there is no executing program to receive it. In the relatively common case of failure to map stack, print some hints on the control terminal. Note that user has enough knobs to cause stack mapping error, and this is the most common reason for execve(2) aborting the process. Requested by: jhb Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Konstantin Belousov	2e1c94aa1f	Implement enforcing write XOR execute mapping policy. It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE \| PROT_EXEC are never specified together, if vm_map has MAP_WX flag set. FreeBSD control flag allows specific binary to request WX exempt, and there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/ disable globally. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Konstantin Belousov	673e2dd652	Add ELF flag to disable ASLR stack gap. Also centralize and unify checks to enable ASLR stack gap in a new helper exec_stackgap(). PR: 239873 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-12-18 23:14:39 +00:00
Konstantin Belousov	87a9b18d22	Provide ABI modules hooks for process exec/exit and thread exit. Exec and exit are same as corresponding eventhandler hooks. Thread exit hook is called somewhat earlier, while thread is still owned by the process and enough context is available. Note that the process lock is owned when the hook is called. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27309	2020-11-23 17:29:25 +00:00
Konstantin Belousov	e68c619144	Stop using eventhandlers for itimers subsystem exec and exit hooks. While there, do some minor cleanup for kclocks. They are only registered from kern_time.c, make registration function static. Remove event hooks, they are not used by both registered kclocks. Add some consts. Perhaps we can stop registering kclocks at all and statically initialize them. Reviewed by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27305	2020-11-21 21:43:36 +00:00
Konstantin Belousov	74a093eb98	Stop using eventhandler to invoke umtx_exec hook. There is no point in dynamic registration, umtx hook is there always. Reviewed by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27303	2020-11-21 10:32:40 +00:00
Conrad Meyer	85078b8573	Split out cwd/root/jail, cmask state from filedesc table No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037	2020-11-17 21:14:13 +00:00
Mark Johnston	f7db0c9532	vmspace: Convert to refcount(9) This is mostly mechanical except for vmspace_exit(). There, use the new refcount_release_if_last() to avoid switching to vmspace0 unless other processes are sharing the vmspace. In that case, upon switching to vmspace0 we can unconditionally release the reference. Remove the volatile qualifier from vm_refcnt now that accesses are protected using refcount(9) KPIs. Reviewed by: alc, kib, mmel MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27057	2020-11-04 16:30:56 +00:00
Kyle Evans	275c821d3d	audit: correct reporting of execve(2) success r326145 corrected do_execve() to return EJUSTRETURN upon success so that important registers are not clobbered. This had the side effect of tapping out 'failures' for all execve(2) audit records, which is less than useful for auditing purposes. Audit exec returns earlier, where we can know for sure that EJUSTRETURN translates to success. Note that this unsets TDP_AUDITREC as we commit the audit record, so the usual audit in the syscall return path will do nothing. PR: 249179 Reported by: Eirik Oeverby <ltning-freebsd anduin net> Reviewed by: csjp, kib MFC after: 1 week Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26922	2020-10-24 14:39:17 +00:00
Konstantin Belousov	5dca94ee82	Remove pointless local variable. Reported by: alc Sponsored by: The FreeBSD Foundation MFC after: 6 days	2020-09-24 12:14:25 +00:00
Konstantin Belousov	aaf78c16f5	Do not leak oldvmspace if image activation failed and current address space is already destroyed, so kern_execve() terminates the process. While there, clean up some internals of post_execve() inlined in init_main. Reported by: Peter <pmc@citylink.dinoex.sub.org> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26525	2020-09-23 18:03:07 +00:00
Mateusz Guzik	feabaaf995	cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho	2020-08-24 08:57:02 +00:00
Mateusz Guzik	d8bc2a17a5	fd: remove fd_lastfile It keeps recalculated way more often than it is needed. Provide a routine (fdlastfile) to get it if necessary. Consumers may be better off with a bitmap iterator instead.	2020-07-15 10:24:04 +00:00
Conrad Meyer	698dda6777	kern_exec.c: Produce valid code ifndef SYS_PROTO_H Reported by: Coccinelle	2020-05-02 18:54:25 +00:00
Brooks Davis	b24e6ac8b7	Convert canary, execpathp, and pagesizes to pointers. Use AUXARGS_ENTRY_PTR to export these pointers. This is a followup to r359987 and r359988. Reviewed by: jhb Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24446	2020-04-16 21:53:17 +00:00
Brooks Davis	9df1c38bbc	Export argc, argv, envc, envv, and ps_strings in auxargs. This simplifies discovery of these values, potentially with reducing the number of syscalls we need to make at runtime. Longer term, we wish to convert the startup process to pass an auxargs pointer to _start() and use that rather than walking off the end of envv. This is cleaner, more C-friendly, and for systems with strong bounds (e.g. CHERI) necessary. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24407	2020-04-15 20:23:55 +00:00
Brooks Davis	397df744f9	Make ps_strings in struct image_params into a pointer. This is a prepratory commit for D24407. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA	2020-04-15 20:21:30 +00:00
John Baldwin	59838c1a19	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
Mateusz Guzik	8d4d271e92	execve: use LOCKSHARED when looking up the interpreter Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23956	2020-03-04 19:52:34 +00:00
Pawel Biernacki	b05ca4290c	sys/: Document few more sysctls. Submitted by: Antranig Vartanian <antranigv@freebsd.am> Reviewed by: kaktus Commented by: jhb Approved by: kib (mentor) Sponsored by: illuria security Differential Revision: https://reviews.freebsd.org/D23759	2020-03-02 15:30:52 +00:00
Jeff Roberson	7aaf252c96	Convert a few triviail consumers to the new unlocked grab API. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23847	2020-02-28 20:34:30 +00:00
Konstantin Belousov	a113b17f10	Do not read sigfastblock word on syscall entry. On machines with SMAP, fueword executes two serializing instructions which can be seen in microbenchmarks. As a measure to restore microbenchmark numbers, only read the word on the attempt to deliver signal in ast(). If the word is set, signal is not delivered and word is kept, preventing interruption of interruptible sleeps by signals until userspace calls sigfastblock(UNBLOCK) which clears the word. This way, the spurious EINTR that userspace can see while in critical section is on first interruptible sleep, if a signal is pending, and on signal posting. It is believed that it is not important for rtld and lbithr critical sections. It might be visible for the application code e.g. for the callback of dl_iterate_phdr(3), but again the belief is that the non-compliance is acceptable. Most important is that the retry of the sleeping syscall does not interrupt unless additional signal is posted. For now I added the knob kern.sigfastblock_fetch_always to enable the word read on syscall entry to be able to diagnose possible issues due to spurious EINTR. While there, do some code restructuting to have all sigfastblock() handling located in kern_sig.c. Reviewed by: jeff Discussed with: mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23622	2020-02-20 15:34:02 +00:00
Konstantin Belousov	146fc63fce	Add a way to manage thread signal mask using shared word, instead of syscall. A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by kernel on each syscall entry and on AST processing, non-zero count of blocks is interpreted same as the signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location, would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Disscussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773	2020-02-09 11:53:12 +00:00
Mateusz Guzik	643656cfaf	vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422	2020-02-01 06:46:55 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Jeff Roberson	af00971419	Handle pagein clustering in vm_page_grab_valid() so that it can be used by exec_map_first_page(). This will also enable pagein clustering for other interested consumers (tmpfs, md, etc). Discussed with: alc Approved by: kib Differential Revision: https://reviews.freebsd.org/D22731	2019-12-15 02:00:32 +00:00
John Baldwin	d8010b1175	Copy out aux args after the argument and environment vectors. Partially revert r354741 and r354754 and go back to allocating a fixed-size chunk of stack space for the auxiliary vector. Keep sv_copyout_auxargs but change it to accept the address at the end of the environment vector as an input stack address and no longer allocate room on the stack. It is now called at the end of copyout_strings after the argv and environment vectors have been copied out. This should fix a regression in r354754 that broke the stack alignment for newer Linux amd64 binaries (and probably broke Linux arm64 as well). Reviewed by: kib Tested on: amd64 (native, linux64 (only linux-base-c7), and i386) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22695	2019-12-09 19:17:28 +00:00
John Baldwin	31174518d2	Use uintptr_t instead of register_t * for the stack base. - Use ustringp for the location of the argv and environment strings and allow destp to travel further down the stack for the stackgap and auxv regions. - Update the Linux copyout_strings variants to move destp down the stack as was done for the native ABIs in r263349. - Stop allocating a space for a stack gap in the Linux ABIs. This used to hold translated system call arguments, but hasn't been used since r159992. Reviewed by: kib Tested on: md64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22501	2019-12-03 23:17:54 +00:00

1 2 3 4 5 ...

550 Commits