freebsd-skq

Author	SHA1	Message	Date
ed	4eb594a8c4	Change the return type of msgrcv() to ssize_t as required by POSIX. It looks like the msgrcv() system call is already written in such a way that the size is internally computed as a size_t and written into all of td_retval[0]. This means that it is effectively already returning ssize_t. It's just that the userspace prototype doesn't match up.	2016-07-28 12:22:01 +00:00
kib	8fc564dae0	Rewrite subr_sleepqueue.c's use of callouts so that it does not depend on the specifics of the callout KPI. In particular, do not depend on the exact interface of callout_stop(9) return values. The main change is that instead of requiring precise callouts, the code maintains the absolute time to wake up. Callouts should now ensure that a wakeup occurs at the requested moment, but we can tolerate both a run-away callout and callout_stop(9) lying about a running callout either way. As a consequence, this removes a constant source of bugs where sleepq_check_timeout() causes an uninterruptible thread state in which the thread is detached from the CPU; see e.g. r234952 and r296320. The patch also removes the dual meaning of the TDF_TIMEOUT flag, making the code (IMO much) simpler to reason about. Tested by: pho Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7137	2016-07-28 09:09:55 +00:00
kib	2b85baaf40	Extract the calculation of the callout fire time into the new function callout_when(9). See the man page update for the description of the intended use. Tested by: pho Reviewed by: jhb, bjk (man page updates) Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7137	2016-07-28 08:57:01 +00:00
kib	6ebb9a02fc	When a debugger attaches to a process, SIGSTOP is sent to the target. Due to the way issignal() selects the next signal to deliver and report, if another signal arrives simultaneously or is already pending, that signal might be reported by the next waitpid(2) call. This causes minor annoyance for debuggers, which must be prepared to take any signal as the first event and then filter SIGSTOP later. More importantly, for tools like gcore(1), which attach and then detach without processing events, SIGSTOP might leak and be delivered after PT_DETACH. This results in the process being unintentionally stopped after detach, which is fatal for automatic tools. The solution is to force SIGSTOP to be the first signal reported after the attach. The attach code is modified to set P2_PTRACE_FSTP to indicate that the attach ritual has not yet finished, and issignal() prefers SIGSTOP in that condition. Also, the thread which handles P2_PTRACE_FSTP is guaranteed to own p_xthread during the first waitpid(2). All that ensures that SIGSTOP is consumed first. Additionally, if P2_PTRACE_FSTP is still set on detach, which means that waitpid(2) was not called at all, SIGSTOP is removed from the queue, ensuring that the process is resumed on detach. In issignal(), when acting on stopping signals, remove the signal from the queue before suspending. Otherwise a parallel attach could result in ptracestop() acting on that STOP as if it were the SIGSTOP from the attach, and the SIGSTOP from the attach leaks again. As a minor refactoring, some bits of the common attach code are moved to a new helper, proc_set_traced(). Reported by: markj Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7256	2016-07-28 08:41:13 +00:00
stevek	3acd4a25e6	Prepare for network stack as a module - Move cr_canseeinpcb to sys/netinet/in_prot.c in order to separate the INET and INET6-specific code from the rest of the prot code (It is only used by the network stack, so it makes sense for it to live with the other network stack code.) - Move cr_canseeinpcb prototype from sys/systm.h to netinet/in_systm.h - Rename cr_seeotheruids to cr_canseeotheruids and cr_seeothergids to cr_canseeothergids, make them non-static, and add prototypes (so they can be seen/called by in_prot.c functions.) - Remove sw_csum variable from ip6_forward in ip6_forward.c, as it is an unused variable. Reviewed by: gnn, jtl Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2901	2016-07-27 20:34:09 +00:00
jhb	e36f9065a4	Adjust tests in fsync job scheduling loop to reduce indentation.	2016-07-27 19:31:25 +00:00
emaste	47803d0890	ANSIfy kern_proc.c and delete the register keyword. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6478	2016-07-27 14:27:08 +00:00
kib	d01b9d3a7c	Remove Giant from settime(): tc_setclock_mtx guards tc_windup() calls, and there are no other issues with parallel settime() calls. Remove spl() vestiges there as well. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:54:24 +00:00
kib	d932ddae7a	Prevent parallel tc_windup() calls, both parallel top-level calls from setclock() and simultaneous top-level and interrupt calls. For this, tc_windup() is protected with a tc_setclock_mtx spinlock, taken in try mode when called from the hardclock interrupt. If the spinlock cannot be obtained without spinning in interrupt context, the top level is executing tc_windup() on another core, and our attempt may be skipped. The boottimebin and boottime variables should be adjusted from tc_windup(). To be correct, they must be part of the timehands and read using the lockless protocol. Remove the globals and reimplement the getboottime(9)/getboottimebin(9) KPI using the timehands read protocol. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:49:41 +00:00
kib	a25ba832b4	Fix a bug in r302252. Always make ntpadj_lock a spinlock, and rename the affected identifiers to remove ADJ/adj from their names. ntp_update_second() requires ntp_lock and is called from tc_windup(), so ntp_lock must be a spinlock. Add the missing lock to ntp_update_second(). Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:40:06 +00:00
kib	b1a1209a8d	Reduce the resettodr_lock scope to only CLOCK_SETTIME() call. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:34:25 +00:00
kib	b3d1fb0758	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:33:33 +00:00
kib	b46a956ad1	Reduce the number of timehands to just two. This is useful because consumers can now be only one tc_windup() call late. Use C99 initialization. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:27:52 +00:00
kib	42da5a6952	Hide the boottime and boottimebin globals, and provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. The issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on a leap second adjustment, but the commit is about the timekeeping code, and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
stevek	97d2f6d76a	Add the NUM_CORE_FILES kernel config option which specifies the limit for the number of core files allowed by a particular process when using the %I core file name pattern. Sanity check at compile time to ensure the value is within the valid range of 0-10. Reviewed by: jtl, sjg Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D6812	2016-07-27 03:21:02 +00:00
ed	1fc7f43ffa	Add shmatt_t. It looks like our "struct shmid_ds::shm_nattch" deviates from the standard in the sense that it is a signed integer, whereas POSIX requires that it is unsigned, having a special type shmatt_t. Patch up our native and 32-bit copies to use a new shmatt_t that is an unsigned integer. As it's unsigned, we can relax the comparisons that are performed on it. Leave the Linux, iBCS2, etc. copies of the structure alone. Reviewed by: ngie Differential Revision: https://reviews.freebsd.org/D6655	2016-07-26 17:23:49 +00:00
cem	4c8503deb3	devfs: Move most ioctl logic down to vnode layer Devfs' file layer ioctl is now just a thin shim around the vnode layer. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7286	2016-07-25 16:28:02 +00:00
kib	5fbe67effd	Implement mtx_trylock_spin(9). Discussed with: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7192	2016-07-23 05:30:55 +00:00
jhb	6db41da768	Add more documentation regarding unsafe AIO requests. The asynchronous I/O changes made previously result in different behavior out of the box. Previously all AIO requests failed with ENOSYS / SIGSYS unless aio.ko was explicitly loaded. Now, some AIO requests complete and others ("unsafe" requests) fail with EOPNOTSUPP. Reword the introductory paragraph in aio(4) to add a general description of AIO before describing the vfs.aio.enable_unsafe sysctl. Remove the ENOSYS error description from aio_fsync(2), aio_read(2), and aio_write(2) and replace it with a description of EOPNOTSUPP. Remove the ENOSYS error description from aio_mlock(2). Log a message to the system log the first time a process requests an "unsafe" AIO request that fails with EOPNOTSUPP. This is modeled on the log message used for processes using the legacy pty devices. Reviewed by: kib (earlier version) MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7151	2016-07-21 22:49:47 +00:00
kib	35a1ffc1d8	Hide counted_warning(9) under #ifdef _KERNEL braces, to allow building subr_prf.c in userspace for libsbuf. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-21 17:59:30 +00:00
kib	1051528910	Declare AIO requests on files from local filesystems safe. Two notes: - I allow AIO on reclaimed vnodes, since it is deterministically and quickly terminated. - devfs mounts are marked as MNT_LOCAL, but device vnodes have type VCHR, so slow device I/O is not allowed. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7273	2016-07-21 17:07:06 +00:00
kib	48a468731d	Provide the counted_warning(9) KPI, which allows issuing a limited number of warnings for some kernel events, mostly intended for warning about the use of obsolete or otherwise undesired interfaces. This is code abstracted from the compat pty driver, with its race removed. Requested and reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7270	2016-07-21 16:34:56 +00:00
cem	1b6fb112bc	imgact_elf: Rename the segment iterator to match reality. The each_writable_segment routine evaluates segments on a slightly more nuanced metric than simply "writable" or not. Rename the function to more closely match its behavior (each_dumpable_segment). Suggested by: jhb Sponsored by: EMC / Isilon Storage Division	2016-07-20 22:51:33 +00:00
cem	1e19a6f1d9	ANSI-fy imgact_elf.c Sponsored by: EMC / Isilon Storage Division	2016-07-20 22:46:56 +00:00
cem	b8b7be1d97	Fix DEBUG build on 64-bit arch after r303099 Reported by: Larry Rosenman <ler at lerctr.org>	2016-07-20 18:11:22 +00:00
cem	08b61c5d52	Extend ELF coredump to support more than 65535 segments The ELF e_phnum field is only 16 bits wide. To support more than 65535 segments (program headers), Sun's "Linker and Libraries Guide" table 7-7 (or 12-7, depending on document version) prescribes a special first section header where sh_info represents the real number of program headers. Test code to follow, when it is ready. Reference: http://docs.oracle.com/cd/E18752_01/pdf/817-1984.pdf Reviewed by: emaste, markj Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7255	2016-07-20 16:59:36 +00:00
glebius	ec1d5b2486	Redo r302894: the new return value for a non-scheduled callout is -1. This was recently added in r290664. Noticed by: hselasky Tested by: Larry Rosenman <ler lerctr.org> PR: 210884	2016-07-20 16:48:25 +00:00
glebius	40325c7987	Revert r303037. It re-introduces the panic with TCP timers. Agreed by: rrs, re (gjb)	2016-07-20 16:44:22 +00:00
rrs	992908d82b	This reverts Gleb's changes and adds three small fixes that I think close up the races Gleb was looking for. This is running quite nicely at Netflix and no longer causes TCP tcb leaks. Differential Revision: 7135	2016-07-19 18:31:19 +00:00
jhb	5535084c1a	Include process IDs in core dumps. When threads were added to the kernel, the pr_pid member of the NT_PRSTATUS note was repurposed to store LWP IDs instead of process IDs. As a result, the process ID was no longer recorded in core dumps. This change adds a pr_pid field to prpsinfo (NT_PRSINFO). Rather than bumping the prpsinfo version number, note parsers can use the note's payload size to determine if pr_pid is present. Reviewed by: kib, emaste (older version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D7117	2016-07-18 15:14:23 +00:00
jhb	f39e6951ab	Add PTRACE_VFORK to trace vfork events. First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when the new child was created via vfork() rather than fork(). Second, a new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK event mask. This new stop is reported after the vfork parent resumes due to the child calling exit or exec. Debuggers can use this stop to reinsert breakpoints in the vfork parent process before it resumes. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7045	2016-07-18 14:53:55 +00:00
kib	2b0b2600f6	The assertion re-added in r302614 was triggered when a stopping signal is delivered to a vforked child. The issue is that we avoid stopping such children in issignal() so as not to block their parents. But the executed AST, which ignored stops, leaves the child with the signal pending but no AST pending. On the first exec after vfork(), call signotify() to handle pending re-enabled signals. Adjust the assert to not check vfork children until exec. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-07-18 10:53:47 +00:00
glebius	26eb470bad	Revert the last commit. It must get more review and testing first.	2016-07-18 09:29:08 +00:00
glebius	65e3443cc5	Redo r302894: the new return value for a non-scheduled callout is -1. This was recently added in r290664. Noticed by: hselasky PR: 210884	2016-07-18 09:26:06 +00:00
kib	368a94767c	Fix another bug after r302350. Reported and tested by: pho PR: 210884 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-07-18 04:30:34 +00:00
kib	b5abcbd40f	Another issue reported on http://seclists.org/oss-sec/2016/q3/68 is that the struct kevent member ident has type uintptr_t, which is silently truncated to int in the call to fget(). Explicitly check for the valid range. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-16 13:24:58 +00:00
kib	0738dd9c9e	In ptrace_vm_entry(), do not call vmspace_free() while owning a vm object lock. The vmspace_free() operation might need to lock the map, object, etc. on the last dereference. Postpone the free until the object's inspection is done. Reported and tested by: will Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-15 23:26:33 +00:00
jhb	91d07047c4	Add a mask of optional ptrace() events. ptrace() now stores a mask of optional events in p_ptevents. Currently this mask is a single integer, but it can be expanded into an array of integers in the future. Two new ptrace requests can be used to manipulate the event mask: PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK sets the current event mask. The current set of events includes: - PTRACE_EXEC: trace calls to execve(). - PTRACE_SCE: trace system call entries. - PTRACE_SCX: trace system call exits. - PTRACE_FORK: trace forks and auto-attach to new child processes. - PTRACE_LWP: trace LWP events. The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS. The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for compatibility but now simply toggle corresponding flags in the event mask. While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX each modify the event mask and continue the traced process. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7044	2016-07-15 15:32:09 +00:00
glebius	ba9382e34a	Fix regression introduced by r302350. The change of return value for a callout that wasn't scheduled at all was unintentional and resulted in several panics. PR: 210884	2016-07-15 09:28:32 +00:00
kib	53c82a1389	Do not allow creation of char or block special nodes with VNOVAL dev_t. As was reported on http://seclists.org/oss-sec/2016/q3/68, tmpfs code contains an assertion that rdev != VNOVAL. On FreeBSD, there are no other consequences except triggering the assert. To be compatible with systems where device nodes have some significance, reject mknod(2) calls with dev == VNOVAL at the syscall level. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-15 09:23:18 +00:00
jhb	9a57990b79	Include command line arguments in core dump process info. Fill in pr_psargs in the NT_PRSINFO ELF core dump note with command line arguments. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D7116	2016-07-14 23:20:05 +00:00
markj	b1a6a8478f	Let DDB's buf printer handle NULL pointers in the buf page array. A buf's b_pages and b_npages fields may be inconsistent after a panic. For instance, vfs_vmio_invalidate() sets b_npages to zero only after all pages are unwired and their page array entries are cleared. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2016-07-14 18:49:05 +00:00
badger	5908cb719e	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
kib	4575263f3c	Trace timeval parameters to the getitimer(2) and setitimer(2) syscalls. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7158	2016-07-13 14:37:58 +00:00
kib	5137ef553e	Revive the check disabled in r197963. Although the implication (process has pending signals -> the current thread is marked for AST and has TDF_NEEDSIGCHK set) does not hold in general, because another thread might manipulate its signal blocking mask, it should still hold for single-threaded processes. Enable the check for the single-threaded case, and replicate it from userret() to ast() as well, where we check that ast() indeed has no signal to deliver. Note that the check is under DIAGNOSTIC; it is not enabled for INVARIANTS && !DIAGNOSTIC kernels, since it imposes locking that is too heavyweight for a debugging kernel in day-to-day use. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-12 03:53:15 +00:00
kib	c58cbf4e59	Add an assert to complement r302328. AST must not execute with the TDF_SBDRY or TDF_SEINTR/TDF_SERESTART thread flags set, which is asserted in userret(). As a consequence, a -1 return from cursig() must not be possible. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-12 03:52:05 +00:00
nwhitehorn	9852e91f16	Remove assumptions in MI code that the BSP is CPU 0. MFC after: 2 weeks	2016-07-11 21:25:28 +00:00
kib	9d45f8230f	Fix grammar. Submitted by: alc MFC after: 2 weeks	2016-07-11 17:04:22 +00:00
kib	c859b3f77d	In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called, if the vnode is VMIO. For VMIO vnodes, set BO_DEAD in vm_object_terminate(). The vnode_destroy_object(), when calling into vm_object_terminate(), must be able to flush buffers. The purpose of BO_DEAD is to quickly destroy buffers on write when the underlying vnode is no longer operable (one example is a devfs node after the geom is gone). Setting BO_DEAD for a reclaiming vnode before the object is terminated is premature, and results in the inability to flush buffers with live SU dependencies from vinvalbuf() in vm_object_terminate(). Reported by: David Cross <dcrosstech@gmail.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-07-11 14:19:09 +00:00
rwatson	70c84e86c1	In process-descriptor close(2) and fstat(2), audit target process information. pdkill(2) already audits the target process ID. MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 14:17:36 +00:00
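Commit 8fc564dae0 replaces the reliance on precise callouts in subr_sleepqueue.c with an absolute wake-up time. The following is a minimal userspace sketch of that idea, not the kernel code: `struct sleeper`, `sleeper_arm()`, `sleeper_callout()` and `sleeper_wakeup()` are hypothetical names, and time is a plain 64-bit tick count standing in for sbintime_t. Because the handler compares the current time with the stored deadline, an early (run-away) callout or a stale one that callout_stop(9) failed to cancel does no harm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a sleeping thread's timeout state. */
struct sleeper {
	uint64_t deadline;  /* absolute wake-up time; 0 means no timeout armed */
	bool     sleeping;
	bool     timed_out; /* set only once the deadline has really passed */
};

/* Arm a timeout 'timo' ticks after 'now' by recording an absolute deadline. */
static void
sleeper_arm(struct sleeper *s, uint64_t now, uint64_t timo)
{
	assert(timo > 0);   /* a zero deadline is reserved for "no timeout" */
	s->deadline = now + timo;
	s->sleeping = true;
	s->timed_out = false;
}

/*
 * Callout handler. It may fire too early, or after the sleeper was already
 * woken; both are tolerated because the decision depends only on the stored
 * deadline. (A real implementation would reschedule an early callout; this
 * sketch simply ignores it.) Returns true if the sleeper timed out.
 */
static bool
sleeper_callout(struct sleeper *s, uint64_t now)
{
	if (!s->sleeping || s->deadline == 0)
		return false;   /* stale callout: nothing to do */
	if (now < s->deadline)
		return false;   /* run-away (early) callout: ignore */
	s->timed_out = true;
	s->sleeping = false;
	return true;
}

/* Normal wakeup: logically cancels the timeout, whatever the callout does. */
static void
sleeper_wakeup(struct sleeper *s)
{
	s->sleeping = false;
	s->deadline = 0;
}
```

Keeping the deadline in the sleeper, rather than trusting callout_stop(9)'s return value, is what lets the handler be idempotent.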
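Commits d932ddae7a and b46a956ad1 move boottime into the timehands, read it with the lockless timehands protocol, and keep only two timehands. Below is a simplified userspace sketch of that generation-counter protocol using C11 atomics. The names `th_sketch`, `th_windup()` and `th_getboottime()` are hypothetical; a single writer is assumed (the role tc_setclock_mtx plays in the kernel), and the payload is read with a plain load, which is a data race under the strict C11 model and is shown here only to mirror the protocol's structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified timehands: a boot time guarded by a generation. */
struct th_sketch {
	int64_t           boottime_sec; /* payload protected by th_gen */
	atomic_uint       th_gen;       /* 0 means "update in progress" */
	struct th_sketch *next;         /* two entries form a ring */
};

static struct th_sketch th0, th1;
static struct th_sketch *_Atomic cur_th;

static void
th_init(int64_t boot)
{
	th0.next = &th1;
	th1.next = &th0;
	th0.boottime_sec = boot;
	atomic_store(&th0.th_gen, 1);
	atomic_store(&th1.th_gen, 0);
	atomic_store(&cur_th, &th0);
}

/* Writer (the tc_windup() role): update the inactive entry, then publish it. */
static void
th_windup(int64_t new_boot)
{
	struct th_sketch *th = atomic_load(&cur_th)->next;
	unsigned ogen = atomic_load(&th->th_gen);

	atomic_store(&th->th_gen, 0);   /* readers of this entry will retry */
	atomic_thread_fence(memory_order_release);
	th->boottime_sec = new_boot;
	if (++ogen == 0)
		ogen = 1;               /* never publish generation 0 */
	atomic_store(&th->th_gen, ogen);
	atomic_store(&cur_th, th);
}

/* Lockless reader (the getboottime(9) role): retry if the entry changed. */
static int64_t
th_getboottime(void)
{
	struct th_sketch *th;
	unsigned gen;
	int64_t v;

	do {
		th = atomic_load(&cur_th);
		gen = atomic_load(&th->th_gen);
		v = th->boottime_sec;
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load(&th->th_gen));
	return v;
}
```

With two entries, a reader that is more than one windup late sees the generation change and retries instead of returning a torn value.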
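Commit 97d2f6d76a adds a compile-time sanity check that the NUM_CORE_FILES option lies in the range 0-10. A sketch of such a check with the C preprocessor follows; the default value of 5 is an assumption made only so the fragment compiles on its own, and `num_core_files` is a hypothetical name.

```c
#include <assert.h>

/* Assumed default for this sketch; a kernel config would normally set it. */
#ifndef NUM_CORE_FILES
#define NUM_CORE_FILES 5
#endif

/* Reject out-of-range values at compile time rather than at run time. */
#if (NUM_CORE_FILES < 0) || (NUM_CORE_FILES > 10)
#error "NUM_CORE_FILES must be in the range 0-10"
#endif

static const int num_core_files = NUM_CORE_FILES;
```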
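Commit 48a468731d describes counted_warning(9) as issuing a limited number of warnings with the race from the compat pty driver removed. A userspace sketch of a race-free countdown is shown below, assuming C11 atomics in place of the kernel's atomic(9) primitives and stderr in place of the kernel log; `counted_warning_sketch()` is a hypothetical name, and it returns whether it printed so the behavior can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print 'msg' only while *counter is positive. The counter is decremented
 * with compare-and-swap, so concurrent callers can never print more than
 * the initial budget. Returns true if this call printed the warning.
 */
static bool
counted_warning_sketch(atomic_uint *counter, const char *msg)
{
	unsigned c = atomic_load(counter);

	while (c != 0) {
		/* On failure, c is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(counter, &c, c - 1)) {
			fprintf(stderr, "%s%s\n", msg,
			    c > 1 ? "" : " - not logging anymore");
			return true;
		}
	}
	return false;
}
```

A plain "if (count > 0) count--" check would let two racing threads both pass the test; the compare-and-swap loop closes that window.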
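Commit 08b61c5d52 handles core dumps with 65535 or more program headers, where the 16-bit e_phnum cannot hold the count. The sketch below shows the encoding on both the writer and the reader side. The structures are hypothetical reductions that keep only the relevant fields (real code would use the Elf64_Ehdr and Elf64_Shdr types), and the escape value 0xffff is the gABI's PN_XNUM.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PN_XNUM 0xffffu  /* value of PN_XNUM in the ELF gABI */

/* Only the fields relevant here (hypothetical reduced headers). */
struct ehdr_sketch  { uint16_t e_phnum; };
struct shdr0_sketch { uint32_t sh_info; };  /* section header at index 0 */

/* Writer side: store the real count in sh_info of section header 0. */
static void
set_phnum(struct ehdr_sketch *eh, struct shdr0_sketch *sh0, uint32_t nphdr)
{
	if (nphdr >= SKETCH_PN_XNUM) {
		eh->e_phnum = SKETCH_PN_XNUM;  /* escape value */
		sh0->sh_info = nphdr;          /* real count lives here */
	} else {
		eh->e_phnum = (uint16_t)nphdr;
		sh0->sh_info = 0;
	}
}

/* Reader side: recover the real program header count. */
static uint32_t
get_phnum(const struct ehdr_sketch *eh, const struct shdr0_sketch *sh0)
{
	return eh->e_phnum == SKETCH_PN_XNUM ? sh0->sh_info : eh->e_phnum;
}
```

Note that a count of exactly 0xffff must also use the escape, since that value in e_phnum is reserved.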
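Commit 5535084c1a appends pr_pid to the NT_PRSINFO note and lets parsers detect it from the note's payload size instead of a version bump. A minimal parser sketch follows. `struct psinfo_sketch` is a hypothetical layout, not FreeBSD's actual prpsinfo definition, and `psinfo_get_pid()` is likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: older notes end before pr_pid, newer ones include it. */
struct psinfo_sketch {
	int32_t pr_version;
	char    pr_fname[17];
	char    pr_psargs[81];
	int32_t pr_pid;     /* appended field; absent in older notes */
};

/*
 * Parse a note payload of 'descsz' bytes. Returns 1 and stores the PID if
 * the payload is large enough to contain pr_pid, 0 for an older, shorter
 * note, and -1 if it is too short even for the old layout.
 */
static int
psinfo_get_pid(const void *desc, size_t descsz, int32_t *pid)
{
	const size_t pid_end = offsetof(struct psinfo_sketch, pr_pid) +
	    sizeof(int32_t);
	struct psinfo_sketch ps;

	if (descsz < offsetof(struct psinfo_sketch, pr_pid))
		return -1;
	if (descsz < pid_end)
		return 0;
	memcpy(&ps, desc, pid_end);  /* copy only the bytes we validated */
	*pid = ps.pr_pid;
	return 1;
}
```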
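Commit b5abcbd40f adds an explicit range check because the uintptr_t kevent ident was silently truncated to int on its way to fget(). A sketch of that kind of check follows; `ident_to_fd()` is a hypothetical helper, and returning EBADF for an out-of-range ident is an assumption of this sketch, not necessarily the error the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Convert a kevent-style uintptr_t ident to a file descriptor only if it
 * fits in an int. Without the check, a large ident on an LP64 platform
 * would typically be truncated to its low 32 bits and could silently name
 * an unrelated descriptor.
 */
static int
ident_to_fd(uintptr_t ident, int *fdp)
{
	if (ident > (uintptr_t)INT_MAX)
		return EBADF;   /* assumed error code for this sketch */
	*fdp = (int)ident;
	return 0;
}
```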


14999 Commits