freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	6b3a9a0f3d	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)	2021-01-12 13:16:10 +00:00
Konstantin Belousov	57f22c828e	sigfastblock: do not skip cursig/postsig loop in ast() Even if sigfastblock block is non-zero, non-blockable signals must be checked on ast and delivered now. This also affects debugger ability to attach, because issignal() also calls ptracestop() if there is a pending stop for debuggee. Instead of checking for sigfastblock, and either setting PENDING flag for usermode or doing signal delivery loop, always do the loop after checking, and then handle PENDING bit. issignal() already does the right thing for fast-blocked case, allowing only STOPs and SIGKILL delivery to happen. Reported by: Vasily Postnicov <shamaz.mazum@gmail.com>, markj Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28089	2021-01-12 12:45:26 +02:00
Konstantin Belousov	513320c0f1	sigfastblock_setpend(): do not set PEND user flag unless TDP_SIGFASTPENDING is set. User pending bit should not be set if kernel did not note a pending signal. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28089	2021-01-12 12:43:34 +02:00
Alan Somers	ff1a307801	lio_listio: validate aio_lio_opcode Previously, we would accept any kind of LIO_* opcode, including ones that were intended for in-kernel use only like LIO_SYNC (which is not defined in userland). The situation became more serious with `022ca2fc7f`. After that revision, setting aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion. Note that POSIX does not specify what should happen if aio_lio_opcode is invalid. MFC-with: `022ca2fc7f` Reviewed by: jhb, tmunro, 0mp Differential Revision: https://reviews.freebsd.org/D28078	2021-01-11 19:53:01 -07:00
Jason A. Harmening	e8a5a1ad71	rctl(4): support throttling resource usage to 0 For rate-based resources that support throttling (e.g. readiops/writeiops), this fixes a divide-by-zero panic when rctl(8) passes 0 as the throttle value. For these resources, treat zero-throttle requests as requests to suspend forward progress as long as possible using the duration specified in kern.racct.rctl.throttle_max. PR: 251803 Reported by: chris@cretaforce.gr Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27858	2021-01-11 15:36:57 -08:00
Konstantin Belousov	4ea65707d3	exec_new_vmspace: print useful error message on ctty if stack cannot be mapped. After old vmspace is destroyed during execve(2), but before the new space is fully constructed, an error during image activation cannot be returned because there is no executing program to receive it. In the relatively common case of failure to map stack, print some hints on the control terminal. Note that user has enough knobs to cause stack mapping error, and this is the most common reason for execve(2) aborting the process. Requested by: jhb Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Konstantin Belousov	2e1c94aa1f	Implement enforcing write XOR execute mapping policy. It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE | PROT_EXEC are never specified together, if vm_map has MAP_WX flag set. FreeBSD control flag allows specific binary to request WX exempt, and there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/disable globally. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Robert Watson	30b68ecda8	Changes that improve DTrace FBT reliability on freebsd/arm64: - Implement a dtrace_getnanouptime(), matching the existing dtrace_getnanotime(), to avoid DTrace calling out to a potentially instrumentable function. (These should probably both be under KDTRACE_HOOKS. Also, it's not clear to me that they are correct implementations for the DTrace thread time functions they are used in .. fixes for another commit.) - Don't allow FBT to instrument functions involved in EL1 exception handling that are involved in FBT trap processing: handle_el1h_sync() and do_el1h_sync(). - Don't allow FBT to instrument DDB and KDB functions, as that makes it rather harder to debug FBT problems. Prior to these changes, use of FBT on FreeBSD/arm64 rapidly led to kernel panics due to recursion in DTrace. Reliable FBT on FreeBSD/arm64 is reliant on another change from @andrew to have the aarch64 instrumentor more carefully check that instructions it replaces are against the stack pointer, which can otherwise lead to memory corruption. That change remains under review. MFC after: 2 weeks Reviewed by: andrew, kp, markj (earlier version), jrtc27 (earlier version) Differential revision: https://reviews.freebsd.org/D27766	2021-01-11 15:42:22 +00:00
Robert Watson	4f2cbaf3cd	Track pipe(2) reads and writes as rusage message receives and sends, a feature misplaced during the transition from BSD 4.4's socket implementation to the optimised FreeBSD pipe implementation. MFC after: 1 week Reviewed by: arichardson, imp Differential Revision: https://reviews.freebsd.org/D27878	2021-01-10 12:16:39 +00:00
Jamie Gritton	2a4b225146	jail: Simplify handling of prison_deref() Track the current lock/reference state in a single variable, rather than deducing the proper prison_deref() flags from a combination of equations and hard-coded values.	2021-01-09 21:05:06 -08:00
Konstantin Belousov	5844bd058a	jobc: rework detection of orphaned groups. Instead of trying to maintain pg_jobc counter on each process group update (and sometimes before), just calculate the counter when needed. Still, for the benefit of the signal delivery code, explicitly mark orphaned groups as such with the new process group flag. This way we prevent bugs in the corner cases where updates to the counter were missed due to complicated configuration of p_pptr/p_opptr/real_parent (debugger). Since we need to iterate over all children of the process on exit, this change mostly affects the process group entry and leave, where we need to iterate all process group members to detect orphaned status. (For MFC, keep pg_jobc around but unused). Reported by: jhb Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	cf4f802e77	kinfo_proc: move job-control related data collection into a new helper. This improves code structure and allows to put the lock asserts right into place where the locks are needed. Also move zeroing of the kinfo_proc structure from fill_kinfo_proc_only() to fill_kinfo_proc(), this looks more symmetrical. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	4daea93813	Lock proctree around fill_kinfo_proc(). Proctree lock is needed for correct calculation and collection of the job-control related data in kinfo_proc. There was even an XXX comment about it. Satisfy locking and lock ordering requirements by taking proctree lock around pass over each bucket in proc_iterate(), and in sysctl_kern_proc() and note_procstat_proc() for individual process reporting. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	a008bdeda3	tty_wait_background: improve locking. Increase the scope of the process group lock ownership. This ensures that we are consistent in returning EIO for tty write from an orphan and delivery of TTYOUT signals. Reviewed by: jilles Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	ef739c7373	pgrp: Prevent use after free. Often, we have a process locked and need to get locked process group. In this case, because process group lock is before process lock, unlocking process allows the group to be freed. See for instance tty_wait_background(). Make pgrp structures allocated from nofree zone, and ensure type stability of the pgrp mutex. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:19 +02:00
Konstantin Belousov	e0d83cd3e4	issignal(): when handling STOP-like signals, drop sigacts mutex earlier. Reviewed by: jilles Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:19 +02:00
Konstantin Belousov	993a1699b1	Style. Improve some KASSERTs messages. Reviewed by: jilles Tested by: pho MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:19 +02:00
Michael Tuexen	6685e259e3	tcp: don't use KTLS socket option on listening sockets KTLS socket options make use of socket buffers, which are not available for listening sockets. Reported by: syzbot+a8829e888a93a4a04619@syzkaller.appspotmail.com Reviewed by: jhb@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D27948	2021-01-08 08:57:11 +01:00
Jan Kokemüller	4d0c33be63	kevent(2): Bugfix for wrong EVFILT_TIMER timeouts When using NOTE_NSECONDS in the kevent(2) API, NS_TO_SBT should be used instead of US_TO_SBT, otherwise the timeout results are misleading. PR: 252539 Reviewed by: kevans, kib Approved by: kevans MFC after: 3 weeks	2021-01-09 20:00:25 +01:00
Warner Losh	40e6e2c2f7	sysctl: improve debug.kdb.panic_str description Improve the wording for this sysctl. Submitted by: rpokala@	2021-01-09 11:10:42 -07:00
Warner Losh	936440560b	sysctl: implement debug.kdb.panic_str This is just like debug.kdb.panic, except the string that's passed in is reported in the panic message. This allows people with automated systems to collect kernel panics over a large fleet of machines to flag panics better. Strings like "Warner look at this hang" or "see JIRA ABC-1234 for details" allow these automated systems to route the forced panic to the appropriate engineers like you can with other types of panics. Other users are likely possible. Relnotes: Yes Sponsored by: Netflix Reviewed by: allanjude (earlier version) Suggestions from review folded in by: 0mp, emaste, lwhsu Differential Revision: https://reviews.freebsd.org/D28041	2021-01-08 14:30:28 -07:00
Andrew Gallatin	52cd25eb1a	mbuf: enable ext_pgs ("unmapped") mbufs by default Ext_pg mbufs allow carrying multiple pages per mbuf. This reduces mbuf linked list traversals, especially in socket buffers, thereby reducing cache misses and CPU use for applications using sendfile. Note that ext_pages use unmapped pages, eliminating KVA mapping costs on 32-bit platforms. Ext_pg mbufs are also required for ktls (KERN_TLS), and having them disabled by default is a stumbling block for those wishing to enable ktls. Reviewed-by: jhb, glebius Sponsored by: Netflix	2021-01-08 13:43:30 -05:00
Mateusz Guzik	8ddea0b127	cache: just assign ni_resflags = NIRES_ABS It is guaranteed to be 0 on entry.	2021-01-08 13:57:10 +00:00
Toomas Soome	742653ebd5	sysctl debug.dump_modinfo should recognize font module Add MODINFOMD_FONT to dump list.	2021-01-08 09:24:49 +02:00
Alan Somers	20321e6225	Regenerate syscall files after reallocation of aio_writev/aio_readv	2021-01-07 19:50:32 -07:00
Alan Somers	b3286afae3	Reallocate syscall numbers for aio_writev and aio_readv The originally chosen numbers interfere with downstream projects' syscalls. Move them to the end of the syscall table instead. Reported by: jrtc27 Reviewed by: brooks MFC-With: `022ca2fc7f` Differential Revision: `022ca2fc7f`	2021-01-07 19:49:27 -07:00
Thomas Munro	801ac943ea	aio_fsync(2): Support O_DSYNC. aio_fsync(O_DSYNC, ...) is the asynchronous version of fdatasync(2). Reviewed by: kib, asomers, jhb Differential Review: https://reviews.freebsd.org/D25071	2021-01-08 13:15:56 +13:00
Thomas Munro	a5e284038e	open(2): Add O_DSYNC flag. POSIX O_DSYNC means that writes include an implicit fdatasync(2), just as O_SYNC implies fsync(2). VOP_WRITE() functions that understand the new IO_DATASYNC flag can act accordingly, but we'll still pass down IO_SYNC so that file systems that don't understand it will continue to provide the stronger O_SYNC behaviour. Flag also applies to fcntl(2). Reviewed by: kib, delphij Differential Revision: https://reviews.freebsd.org/D25090	2021-01-08 13:15:56 +13:00
Mateusz Guzik	71bd18d373	fd: use seqc_read_notmodify when translating fds	2021-01-07 23:30:04 +00:00
Mateusz Guzik	20ac5cda96	fd: make fd/fp mandatory They are both always passed anyway.	2021-01-07 23:30:04 +00:00
Mateusz Guzik	fee405e057	cache: stop checkpointing cn_flags They are only modified, if ever, for the last component.	2021-01-07 23:29:52 +00:00
Mateusz Guzik	ac7715471c	cache: stop checkpointing cn_nameptr For aborts cn_nameptr is the same as cn_pnbuf. For partial results the same cn_nameptr is to be used.	2021-01-07 23:29:38 +00:00
Mateusz Guzik	0f1fc3a31f	cache: stop manipulating pathlen It is a copy-pasto from regular lookup. Add debug to ensure the result is the same.	2021-01-07 23:26:53 +00:00
Chuck Silvers	11403bdeb4	vfs: fix rangelock range in vn_rdwr() for IO_APPEND vn_rdwr() must lock the entire file range for IO_APPEND just like vn_io_fault() does for O_APPEND. Reviewed by: kib, imp, mckusick Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D28008	2021-01-07 13:37:35 -08:00
Mateusz Guzik	f2b794e1e9	cache: unengrish the comment in previous commit Reported by: rpokala, brd	2021-01-06 23:46:05 +00:00
Mateusz Guzik	deabdc6868	cache: stop pre-checking seqc when starting the lookup Tested by: pho	2021-01-06 07:28:07 +00:00
Mateusz Guzik	71a6a0b545	cache: skip checking for spurious slashes if possible Tested by: pho	2021-01-06 07:28:06 +00:00
Mateusz Guzik	33f3e81df5	cache: combine fast path enabled status into one flag Tested by: pho	2021-01-06 07:28:06 +00:00
Mateusz Guzik	dbbbc07cc3	cache: split handling of 0 and non-0 error codes Tested by: pho	2021-01-06 07:07:24 +01:00
Mateusz Guzik	a1a8f8ada1	cache: deinline state handling The intent is to reduce branchfest when finishing the lookup. Tested by: pho	2021-01-06 07:05:22 +01:00
Mateusz Guzik	05803be000	cache: stop setting cn_nameptr on entry as matches cn_pnbuf already While here tidy up other asserts.	2021-01-06 07:03:41 +01:00
Mateusz Guzik	3814bea00a	cache: drop the now spurious doomed check when crossing a mount point	2021-01-03 21:22:16 +00:00
Mateusz Guzik	33a195baf3	vfs: keep seqc unchanged as long as the vnode is accessible via SMR	2021-01-03 21:22:16 +00:00
Mark Johnston	214257da3a	sendfile: Clear page pointers when handling a pager error When INVARIANTS is configured, the sendfile_iodone() callback verifies that pages attached to the sendfile header are wired, but we unwire all such pages after a synchronous pager error, before calling sendfile_iodone(). Reported by: pho Tested by: pho Sponsored by: The FreeBSD Foundation	2021-01-03 11:50:31 -05:00
Mark Johnston	90f580b954	Ensure that dirent's d_off field is initialized We have the d_off field in struct dirent for providing the seek offset of the next directory entry. Several filesystems were not initializing the field, which ends up being copied out to userland. Reported by: Syed Faraz Abrar <faraz@elttam.com> Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27792	2021-01-03 11:50:31 -05:00
Mateusz Guzik	82397d7919	vfs: denote vnode being a mount point with VIRF_MOUNTPOINT Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27794	2021-01-03 06:50:06 +00:00
Mateusz Guzik	3e506a67bb	vfs: add v_irflag accessors Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27793	2021-01-03 06:50:06 +00:00
Mateusz Guzik	51bf55fa6c	cache: stop checkpointing cn_namelen The variable is recomputed by regular lookup from the get go.	2021-01-03 06:50:06 +00:00
Mateusz Guzik	7220a10b5b	cache: predict on no spurious slashes in cache_fpl_handle_root This is a step towards speculatively not handling them.	2021-01-03 06:50:06 +00:00
Mateusz Guzik	30a2fc91fa	cache: postpone NAME_MAX check as it may be unnecessary	2021-01-03 06:50:06 +00:00
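
The write XOR execute policy described in commit 2e1c94aa1f above can be illustrated with a small userland sketch. This is not the kernel code from that commit: the function name `sketch_wx_allows` and the flag `SKETCH_MAP_WX` are hypothetical stand-ins for the check the message says happens in vm_map_insert() and vm_map_protect() when the map has the MAP_WX flag set. Only the PROT_* constants come from the real `<sys/mman.h>` header.

```c
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/*
 * Hypothetical stand-in for the per-map flag the commit message calls
 * MAP_WX; the value here is arbitrary and chosen for this sketch only.
 */
#define	SKETCH_MAP_WX	0x1

/*
 * Return true if a mapping with protection 'prot' is acceptable for a
 * map carrying 'map_flags'.  When the W^X flag is set on the map, a
 * request that combines PROT_WRITE and PROT_EXEC is refused; any other
 * combination, or any request on a map without the flag, is accepted.
 */
static bool
sketch_wx_allows(int map_flags, int prot)
{
	const int wx = PROT_WRITE | PROT_EXEC;

	if ((map_flags & SKETCH_MAP_WX) == 0)
		return (true);		/* policy not enforced for this map */
	return ((prot & wx) != wx);	/* refuse only write+execute together */
}
```

Under these assumptions, `sketch_wx_allows(SKETCH_MAP_WX, PROT_READ | PROT_WRITE)` is accepted, while `sketch_wx_allows(SKETCH_MAP_WX, PROT_WRITE | PROT_EXEC)` is refused. The per-binary exemption and the kern.elf{32,64}.allow_wx sysctls mentioned in the commit message would decide whether the flag is set in the first place, which this sketch does not model.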


18077 Commits