freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	3aeacc55a5	A followup to r315749, two more places where brand->interp_path was accessed unconditionally. Reported by: se Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-30 04:21:02 +00:00
Robert Watson	b783025921	When handling msgsys(2), semsys(2), and shmsys(2) multiplex system calls, map the 'which' argument into a suitable audit event identifier for the specific operation requested. Obtained from: TrustedBSD Project MFC after: 3 weeks Sponsored by: DARPA, AFRL	2017-03-29 23:31:35 +00:00
Robert Watson	d8ca0a2b70	Hook up new audit event identifiers for various non-Orange Book/CAPP system calls supported by OpenBSM 1.2-alpha5. Obtained from: TrustedBSD Project MFC after: 3 weeks Sponsored by: DARPA, AFRL	2017-03-29 22:33:56 +00:00
Bruce Evans	1370fa3380	Oops, my fix for bright colors broke bright black some more (in cases that used to work via the bold hack). Fix the table entry for bright black. Fix spelling of plain black in nearby table entries (use the macro for black everywhere). Fix the currently-unused non-bright color table to not have bright colors in entries 9-15. Improve nearby comments. Start converting to the xterm terminology and default rendering of "bright" instead of "light" for bright colors. Syscons wasn't affected by the bug since I optimized it a little by converting colors 0-15 directly. This also fixes the layering of the conversion for these colors. Apply the same optimization to vt (actually the layer above it). This also moves the conversion 1 closer to the correct layer for colors 0-15. The optimization of just avoiding 2 calls to a trivial function is worth about 10% for simple output to the virtual buffer with occasional rendering. The optimization is so large because the 2 calls are done on every character, so although there are too many other calls and other instructions per character, there are only about 10 times as many. Old versions of syscons were about 10 times faster for simple output, by using a fast path with about 12 instructions per character. Rendering to even slow hardware takes relatively little time provided it is rarely actually done.	2017-03-27 10:48:28 +00:00
Andriy Gapon	20c69e76b4	dtrace sched:::preempt should fire only when there is preemption. Previously, the probe fired on any thread switch. Reviewed by: markj MFC after: 1 week Sponsored by: Panzura	2017-03-25 19:08:51 +00:00
Gleb Smirnoff	9e3c8bd3e2	Make sendfile(2) more robust against file change. This fixes a possible crash when the file shrinks. This also fixes sendfile(2) not sending more data in a case when the file grows, and the request is open-ended or specifies a size that is greater than old file size. PR: 217789 Reviewed by: gallatin MFC after: 10 days	2017-03-24 16:01:19 +00:00
Ed Schouten	0fe9832013	Don't require the presence of the compat_3_brand. The existing ELF image activator requires the brandinfo to provide such a string unconditionally, even if the executable format in question doesn't use this type of branding. Skip matching when it's a null pointer. Reviewed by: kib MFC after: 2 weeks	2017-03-23 14:09:45 +00:00
Andriy Gapon	afa0a46cfd	move thread switch tracing from mi_switch to sched_switch This is done so that the thread state changes during the switch are not confused with the thread state changes reported when the thread spins on a lock. Here is an example, three consecutive entries for the same thread (from top to bottom): KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"sleep", attributes: prio:84, wmesg:"-", lockname:"(null)" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"spinning", attributes: lockname:"sched lock 1" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"running", attributes: none The above trace could leave an impression that the final state of the thread was "running". After this change the sleep state will be reported after the "spinning" and "running" states reported for the sched lock. Reviewed by: jhb, markj MFC after: 1 week Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9961	2017-03-23 08:57:04 +00:00
Konstantin Belousov	2274ab3d7b	Update r315753 with the proper flag name. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:28:13 +00:00
Konstantin Belousov	1438fe3cf2	Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only matches static binaries. Interpretation of the 'static' there is that the binary must not specify an interpreter. In particular, shared objects are matched by the brand if BI_CAN_EXEC_DYN is also set. This improves precision of the brand matching, which should eliminate surprises due to brand ordering. Revert r315701. Discussed with and tested by: ed (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:23:01 +00:00
Konstantin Belousov	7aab7a80e2	Adjust r314851 to not require every brand to specify interpreter path. Reported and tested by: ed Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:06:48 +00:00
Enji Cooper	8b69c3e79f	Print out name of non-dynamic sysctl in sysctl_remove_oid_locked This will provide a slightly better smoking gun than just stating "can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9) and sysctl_remove_{name,oid}(9) with a non-dynamic (likely static) sysctl. MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-03-22 05:27:20 +00:00
Conrad Meyer	5abb3b74d3	kern_fail: Allow sleeping for more than 2147483/hz seconds Because of integer types, the timeout calculation result was limited to INT_MAX / (1000 * hz) seconds. For systems with hz=10000, this is only 215 seconds. Perform the calculation with 64-bit math to allow sleeping for the full INT_MAX / hz interval (215000 seconds on such hz=10000 systems). Submitted by: Scott Ferris <sferris at isilon.com> Sponsored by: Dell EMC Isilon	2017-03-21 22:41:37 +00:00
Ed Maste	26af611582	tighten buffer bounds in imgact_binmisc_populate_interp We must ensure there's space for the terminating null in the temporary buffer in imgact_binmisc_populate_interp(). Note that there's no buffer overflow here because xbe->xbe_interpreter's length and null termination is checked in imgact_binmisc_add_entry() before imgact_binmisc_populate_interp() is called. However, the latter should correctly enforce its own bounds. Reviewed by: sbruno MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10042	2017-03-21 18:02:14 +00:00
Alan Cox	2a016de1a5	Use IDX_TO_OFF(), not ptoa(), when converting the difference between two vm_pindex_t's into a vm_ooffset_t. The length given to shm_dotruncate() must never be negative. Assert this. Tidy up a comment. Reviewed by: kib MFC after: 1 week	2017-03-20 05:15:55 +00:00
Alan Cox	ac46d38655	Style fixes. In particular, the variable "bogus" is used like a Boolean. Define it as such. Reviewed by: kib MFC after: 1 week	2017-03-19 23:06:11 +00:00
Eric van Gyzen	26f86ab732	Regenerate syscall files for r315526 Sponsored by: Dell EMC	2017-03-19 00:54:24 +00:00
Eric van Gyzen	3f8455b090	Add clock_nanosleep() Add a clock_nanosleep() syscall, as specified by POSIX. Make nanosleep() a wrapper around it. Attach the clock_nanosleep test from NetBSD. Adjust it for the FreeBSD behavior of updating rmtp only when interrupted by a signal. I believe this to be POSIX-compliant, since POSIX mentions the rmtp parameter only in the paragraph about EINTR. This is also what Linux does. (NetBSD updates rmtp unconditionally.) Copy the whole nanosleep.2 man page from NetBSD because it is complete and closely resembles the POSIX description. Edit, polish, and reword it a bit, being sure to keep any relevant text from the FreeBSD page. Reviewed by: kib, ngie, jilles MFC after: 3 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D10020	2017-03-19 00:51:12 +00:00
Alan Cox	c547cbb49c	Avoid unnecessary calls to vm_map_protect() in elf_load_section(). Typically, when elf_load_section() unconditionally passed VM_PROT_ALL to elf_map_insert(), it was needlessly enabling execute access on the mapping, and it would later have to call vm_map_protect() to correct the mapping's access rights. Now, instead, elf_load_section() always passes its parameter "prot" to elf_map_insert(). So, elf_load_section() must only call vm_map_protect() if it needs to remove the write access that was temporarily granted to perform a copyout(). Reviewed by: kib MFC after: 1 week	2017-03-18 23:37:00 +00:00
Eric van Gyzen	4cf66812ea	nanosleep: plug a kernel memory disclosure nanosleep() updates rmtp on EINVAL. In that case, kern_nanosleep() has not updated rmt, so sys_nanosleep() updates the user-space rmtp by copying garbage from its stack frame. This is not only a kernel memory disclosure, it's also not POSIX-compliant. Fix it to update rmtp only on EINTR. Reviewed by: jilles (via D10020), dchagin MFC after: 3 days Security: possibly Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D10044	2017-03-18 20:16:23 +00:00
Bruce Evans	4eb235fb4f	Fix bright colors for syscons, and make them work for the first time for vt. Restore syscons' rendering of background (bg) brightness as foreground (fg) blinking and vice versa, and add rendering of blinking as background brightness to vt. Bright/saturated is conflated with light/white in the implementation and in this description. Bright colors were broken in all cases, but appeared to work in the only case shown by "vidcontrol show". A boldness hack was applied only in 1 layering-violation place (for some syscons sequences) where it made some cases seem to work but was undone by clearing bold using ANSI sequences, and more seriously was not undone when setting ANSI/xterm dark colors so left them bright. Move this hack to drivers. The boldness hack is only for fg brightness. Restore/add a similar hack for bg brightness rendered as fg blinking and vice versa. This works even better for vt, since vt changes the default text mode to give the more useful bg brightness instead of fg blinking. The brightness bit in colors was unnecessarily removed by the boldness hack. In other cases, it was lost later by teken_256to8(). Use teken_256to16() to not lose it. teken_256to8() was intended to be used for bg colors to allow finer or bg-specific control for the more difficult reduction to 8; however, since 16 bg colors actually work on VGA except in syscons text mode and the conversion isn't subtle enough to significantly in that mode, teken_256to8() is not used now. There are still bugs, especially in vidcontrol, if bright/blinking background colors are set. Restore XOR logic for bold/bright fg in syscons (don't change OR logic for vt). Remove broken ifdef on FG_UNDERLINE and its wrong or missing bit and restore the correct hard-coded bit. FG_UNDERLINE is only for mono mode which is not really supported. Restore XOR logic for blinking/bright bg in syscons (in vt, add OR logic and render as bright bg). Remove related broken ifdef on BG_BLINKING and its missing bit and restore the correct hard-coded bit. The same bit means blinking or bright bg depending on the mode, and we want to ignore the difference everywhere. Simplify conversions of attributes in syscons. Don't pretend to support bold fonts. Don't support unusual encodings of brightness. It is as good as possible to map 16 VGA colors to 16 xterm-16 colors. E.g., VGA brown -> xterm-16 Olive will be converted back to VGA brown, so we don't need to convert to xterm-256 Brown. Teken cons25 compatibility code already does the same, and duplicates some small tables. This is mostly for the sc -> te direction. The other direction uses teken_256to16() which is too generic.	2017-03-18 11:13:54 +00:00
Konstantin Belousov	469ec1eb6a	When clearing altsigstack settings on exec, do it to the right thread. Diagnosed by: smh Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-17 13:37:37 +00:00
Eric Badger	b4d3325975	Don't clear p_ptevents on normal SIGKILL delivery The ptrace() user has the option of discarding the signal. In such a case, p_ptevents should not be modified. If the ptrace() user decides to send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events do not have the capability to discard the signal, so continue to clear the mask in that case. Reviewed by: jhb (initial revision) MFC after: 1 week Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9939	2017-03-16 13:03:31 +00:00
John Baldwin	03f7f17878	Use UMA_ALIGN_PTR instead of sizeof(void *) for zone alignment. uma_zcreate()'s alignment argument is supposed to be sizeof(foo) - 1, and uma.h provides a set of helper macros for common types. Passing sizeof(void *) results in all of the members being misaligned triggering unaligned access faults on certain architectures (notably MIPS). Reported by: brooks Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA / AFRL	2017-03-15 18:23:32 +00:00
Alan Cox	52d1addaa1	Relax the locking requirements for vm_object_page_noreuse(). While reviewing all uses of OFF_TO_IDX(), I observed that vm_object_page_noreuse() is requiring an exclusive lock on the object when, in fact, a shared lock suffices. Reviewed by: kib, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D10011	2017-03-15 17:43:45 +00:00
Mark Johnston	7d88be4c03	When draining a callout, don't clear CALLOUT_ACTIVE while it is running. The callout may reschedule itself and execute again before callout_drain() returns, but we should not clear CALLOUT_ACTIVE until the callout is stopped. Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-03-15 00:29:27 +00:00
Eric van Gyzen	8addc72b3e	Add missing pieces of r315280 I moved this branch from github to a private server, and pulled from the wrong one when committing r315280, so I failed to include two recent commits. Thankfully, they were only cosmetic and were included in the review. Specifically: Add documentation, polish comments, and improve style(9). Tested by: pho (r315280) MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9791	2017-03-14 22:02:02 +00:00
Konstantin Belousov	d1780e8dac	Use atop() instead of OFF_TO_IDX() for conversion of addresses or address offsets, as intended. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-03-14 19:39:17 +00:00
Eric van Gyzen	9dbdf2a169	When the RTC is adjusted, reevaluate absolute sleep times based on the RTC POSIX 2008 says this about clock_settime(2): If the value of the CLOCK_REALTIME clock is set via clock_settime(), the new value of the clock shall be used to determine the time of expiration for absolute time services based upon the CLOCK_REALTIME clock. This applies to the time at which armed absolute timers expire. If the absolute time requested at the invocation of such a time service is before the new value of the clock, the time service shall expire immediately as if the clock had reached the requested time normally. Setting the value of the CLOCK_REALTIME clock via clock_settime() shall have no effect on threads that are blocked waiting for a relative time service based upon this clock, including the nanosleep() function; nor on the expiration of relative timers based upon this clock. Consequently, these time services shall expire when the requested relative interval elapses, independently of the new or old value of the clock. When the real-time clock is adjusted, such as by clock_settime(3), wake any threads sleeping until an absolute real-clock time. Such a sleep is indicated by a non-zero td_rtcgen. The sleep functions will set that field to zero and return zero to tell the caller to reevaluate its sleep duration based on the new value of the clock. At present, this affects the following functions: pthread_cond_timedwait(3) pthread_mutex_timedlock(3) pthread_rwlock_timedrdlock(3) pthread_rwlock_timedwrlock(3) sem_timedwait(3) sem_clockwait_np(3) I'm working on adding clock_nanosleep(2), which will also be affected. Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed by: jhb, kib MFC after: 2 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9791	2017-03-14 19:06:44 +00:00
Konstantin Belousov	01feb4c3d4	Use designated initializers for kevent_copyops. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-14 09:25:01 +00:00
Konstantin Belousov	67d0b0ea60	Hide kev_iovlen() definition under #ifdef KTRACE, fixing build of kernel configs without KTRACE. Reported by: rpokala Sponsored by: The FreeBSD Foundation MFC after: 4 days	2017-03-14 08:45:52 +00:00
Ian Lepore	e3f87f6c70	Change 'Hz' back to 'HZ'... it's referring to the kernel config option named HZ, not being used as an abbreviation of the unit of measure.	2017-03-12 18:07:03 +00:00
Ian Lepore	8a3966405e	Correct the abbreviations for microseconds (us, not ms), and for Hz (not HZ).	2017-03-12 17:43:45 +00:00
Konstantin Belousov	9a2dde8013	Avoid reusing p_ksi while it is on queue. When sending SIGCHLD informing reaper that a zombie was reparented to it, we might race with the situation where the previous parent has still not finished delivering SIGCHLD and has its p_ksi structure on the signal queue. While on queue, the ksi should not be used for another send. Fix this by copying p_ksi into newly allocated ksi, which is directly put onto reaper sigqueue. The latter ensures that siginfo for reaper SIGCHLD is always present, similar to guarantees for siginfo of child. Reported by: bdrewery Discussed with: jilles Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-12 13:58:51 +00:00
Konstantin Belousov	9bcf2f2da0	Accept linker's representation for ELF segments with zero on-disk length. For such segments, GNU bfd linker writes knowingly incorrect value into the file offset field of the program header entry, with the motivation that file should not be mapped for creation of this segment at all. Relax checks for the ELF structure validity when on-disk segment length is zero, and explicitly set mapping length to zero for such segments to avoid validating rounding arithmetic. PR: 217610 Reported by: Robert Clausecker <fuz@fuz.su> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-12 13:51:13 +00:00
Konstantin Belousov	973d67c407	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-12 13:49:42 +00:00
Konstantin Belousov	1e4296c919	Ktracing kevent(2) calls with unusual arguments might lead to overly large allocation requests. When ktrace-ing io, sys_kevent() allocates memory to copy the requested changes and reported events. Allocations are sized by the incoming syscall length arguments, which are user-controlled, and might cause overflow in calculations or too large allocations. Since io trace chunks are limited by ktr_geniosize, there is no sense in even trying to satisfy unbounded allocations. Export ktr_geniosize and clamp the buffer sizes in advance. PR: 217435 Reported by: Tim Newsham <tim.newsham@nccgroup.trust> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-12 13:48:24 +00:00
Alan Cox	e383e820d3	Simplify the control flow and tidy up a comment in map_insert. In collaboration with: kib MFC after: 1 week	2017-03-11 18:57:13 +00:00
Andriy Gapon	28ef18b8c1	trace thread running state when a thread is run for the first time This applies to both KTR_SCHED and DTrace sched:::on-cpu tracing. MFC after: 10 days	2017-03-11 15:57:36 +00:00
Andriy Gapon	6c9271a918	actually implement proc:::lwp-exit probe MFC after: 4 days	2017-03-11 15:47:27 +00:00
Mahdi Mokhtari	32a1fb0d3d	Fix NULL pointer dereference and panic with shm file pread/pwrite. PR: 217429 Reported by: Tim Newsham <tim.newsham@nccgroup.trust> Reviewed by: kib Approved by: dchagin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9844	2017-03-10 10:09:44 +00:00
Gleb Smirnoff	d0147e10ca	In linker_load_file() print name of a file that failed to load. Discussed with: kib	2017-03-09 00:56:07 +00:00
Gleb Smirnoff	a344babf78	Reduce stack usage in link_elf_load_file(), allocating struct nameidata. This function may be called recursively, when a module pulls its dependencies. Under certain circumstances, e.g. quad chain of dependencies and presence of dtrace we may run out of stack.	2017-03-09 00:45:15 +00:00
Gleb Smirnoff	14984031b7	m_mbuftouio() doesn't modify the mbuf.	2017-03-07 19:00:50 +00:00
Eric Badger	b38bd91f4f	don't stop in issignal() if P_SINGLE_EXIT is set Suppose a traced process is stopped in ptracestop() due to receipt of a SIGSTOP signal, and is awaiting orders from the tracing process on how to handle the signal. Before sending any such orders, the tracing process exits. This should kill the traced process. But suppose a second thread handles the SIGKILL and proceeds to exit1(), calling thread_single(). The first thread will now awaken and will have a chance to check once more if it should go to sleep due to the SIGSTOP. It must not sleep after P_SINGLE_EXIT has been set; this would prevent the SIGKILL from taking effect, leaving a stopped orphan behind after the tracing process dies. Also add new tests for this condition. Reviewed by: kib MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9890	2017-03-07 13:41:01 +00:00
Konstantin Belousov	15a9aedfa1	When selecting brand based on old Elf branding, prefer the brand whose interpreter exactly matches the one requested by the activated image. This change applies r295277, which did the same for note branding, to the old brand selection, with the same reasoning of fixing compat32 interpreter substitution. PR: 211837 Reported by: kenji@kens.fm Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-07 13:38:25 +00:00
Konstantin Belousov	3d560b4be2	Require whole brand string matching for old Elf branding. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-07 13:37:35 +00:00
Konstantin Belousov	0bbee4cd3f	Consistently use vm_ooffset_t type for the vm object offset in elf_load_section. The values passed currently as vm_offset_t are phdr.p_offset, which have the native Elf word size. Since elf_load_section interprets them as the file offset, use vm object offset type. Noted and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-07 13:36:43 +00:00
Hiren Panchasara	f41b2de716	Fix the KASSERT check from r314813. len being 0 is valid. Submitted by: ngie Reported by: ngie (via jenkins test run) Sponsored by: Limelight Networks	2017-03-07 06:46:38 +00:00
Hiren Panchasara	b5b023b91e	We've found a recurring problem where some userland process would be stuck spinning at 100% cpu around sbcut_internal(). Inside sbflush_internal(), sb_ccc reached to about 4GB and before passing it to sbcut_internal(), we type-cast it from uint to int making it -ve. The root cause of sockbuf growing this large is unknown. Correct fix is also not clear but based on mailing list discussions, adding KASSERTs to panic instead of looping endlessly. Reviewed by: glebius Sponsored by: Limelight Networks	2017-03-07 00:20:01 +00:00
Gleb Smirnoff	6cf0c1db55	Fix compilation of r314784 on 32 bit.	2017-03-06 22:32:56 +00:00
Gleb Smirnoff	f2498877c9	In panic() print current timestamp, which matches timestamp in the dump header. This will help to correlate console server logs with dump files, no matter how precise the clock on a console server appliance is, and how buggy the appliance is.	2017-03-06 19:14:08 +00:00
Konstantin Belousov	aaadc41f6c	Instead of direct use of vm_map_insert(), call vm_map_fixed(MAP_CHECK_EXCL). This KPI explicitly indicates the intent of creating the mapping at the fixed address, and incorporates the map locking into the callee. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-06 14:09:54 +00:00
Alan Cox	28e8da6517	Style and punctuation fixes. Reviewed by: kib MFC after: 3 days	2017-03-05 23:59:04 +00:00
Emmanuel Vadot	c1b014c51c	Export a sysctl dev.<clkdom>.<unit>.clocks for each clock domain containing all the clocks that they provide. Each clock is exported under the node 'clock.<clkname>' and has the following child nodes : - frequency - parent (The selected parent, if any) - parents (The list of parents, if any) - childrens (The list of children, if any) - enable_cnt (The enabled counter) This gives us the possibility to examine clocks at runtime and make a graph of the clock flow. Reviewed by: mmel MFC after: 2 month Differential Revision: https://reviews.freebsd.org/D9833	2017-03-05 07:13:29 +00:00
Eric van Gyzen	8a8bea603c	Fix grammar in some comments in subr_sleepqueue.c While I'm here, remove trailing whitespace. Reviewed by: kib, mostly, as part of a larger review MFC after: 3 days	2017-03-03 21:03:28 +00:00
Mark Johnston	7813302434	Fix a ticks comparison in sched_pctcpu_update(). We may fail to reset the %CPU tracking window if a thread does not run for over half of the ticks rollover period, resulting in a bogus %CPU value for the thread until ticks fully rolls over. Handle this by comparing the unsigned difference ticks - ts_ltick with SCHED_TICK_TARG instead. Reviewed by: cem, jeff MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-03-03 20:57:40 +00:00
Ed Maste	e052a8b932	kern_sig.c: ANSIfy and remove archaic register keyword Sponsored by: The FreeBSD Foundation	2017-03-02 22:17:53 +00:00
Konstantin Belousov	fe0a8a3994	Style. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-03-02 17:35:13 +00:00
Hans Petter Selasky	403f4a31ab	Implement taskqueue_poll_is_busy() for use by the LinuxKPI. Refer to comment above function for a detailed description. Discussed with: kib @ MFC after: 1 week Sponsored by: Mellanox Technologies	2017-03-02 12:20:23 +00:00
Sean Bruno	d945ed6472	Make gtaskqueue compatible with drm-next such that they can be used with the linuxkpi tasklets. Submitted by: mmacy@nextbsd.org Reported by: hps	2017-03-01 18:37:35 +00:00
Konstantin Belousov	55b985b43b	Use vm_map_insert() instead of vm_map_find() in elf_map_insert(). Elf_map_insert() needs to create mapping at the known fixed address. Usage of vm_map_find() assumes, on the other hand, that any suitable address space range above or equal to the specified hint is acceptable. Due to operating on the fresh or cleared address space, vm_map_find() usually creates mapping starting exactly at hint. Switch to vm_map_insert() use to clearly request fixed mapping from the VM. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-03-01 10:28:15 +00:00
Konstantin Belousov	e3d8f8fed4	When deallocating the vm object in elf_map_insert() due to vm_map_insert() failure, drop the vnode lock around the call to vm_object_deallocate(). Since the deallocated object is the vm object of the vnode, we might get the vnode lock recursion there. In fact, it is almost impossible to make vm_map_insert() fail there on a stock kernel. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-03-01 10:22:07 +00:00
Mateusz Guzik	a21018063b	locks: ensure proper barriers are used with atomic ops when necessary Unclear how, but the locking routine for mutexes was using the release barrier instead of acquire. This must have been either a copy-pasto or bad completion. Going through other uses of atomics shows no barriers in: - upgrade routines (addressed in this patch) - sections protected with turnstile locks - this should be fine as necessary barriers are in the worst case provided by turnstile unlock I would like to thank Mark Millard and andreast@ for reporting the problem and testing previous patches before the issue got identified. Hardware provided by: IBM LTC	2017-03-01 05:06:21 +00:00
Scott Long	38e41e66e5	Provide a comment on why stdio.h needs to be included.	2017-02-28 21:27:51 +00:00
Jung-uk Kim	c4e929946c	Include stdio.h to fix libsbuf build. Reviewed by: scottl	2017-02-28 21:18:45 +00:00
Scott Long	388f3ce6c3	Implement sbuf_prf(), which takes an sbuf and outputs it to stdout in the non-kernel case and to the console+log in the kernel case. For the kernel case it hooks the putbuf() machinery underneath printf(9) so that the buffer is written completely atomically and without a copy into another temporary buffer. This is useful for fixing compound console/log messages that become broken and interleaved when multiple threads are competing for the console. Reviewed by: ken, imp Sponsored by: Netflix	2017-02-28 18:25:06 +00:00
Gleb Smirnoff	efe3b0de14	Remove SVR4 (System V Release 4) binary compatibility support. UNIX System V Release 4 is an operating system released in 1988. It ceased to exist in the early 2000s.	2017-02-28 05:14:42 +00:00
Konstantin Belousov	aca4bb9112	Do not leak mount references for dying threads. Thread might create a condition for delayed SU cleanup, which creates a reference to the mount point in td_su, but exit without returning through userret(), e.g. when terminating due to single-threading or process exit. In this case, td_su reference is not dropped and mount point cannot be freed. Handle the situation by clearing td_su also in the thread destructor and in exit1(). softdep_ast_cleanup() has to receive the thread as argument, since e.g. thread destructor is executed in different context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-25 10:38:18 +00:00
Konstantin Belousov	8cd5962571	Remove cpu_deepest_sleep variable. On Core2 and older Intel CPUs, where TSC stops in C2, system does not allow C2 entrance if timecounter hardware is TSC. This is done by tc_windup() which tests for TC_FLAGS_C2STOP flag of the new timecounter and increases cpu_disable_c2_sleep if flag is set. Right now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but TSC is initialized too early for this variable to be set by acpi_cpu.c. There is no reason to require that ACPI reported C2 and deeper states to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from init_TSC_tc() condition. And since this is the only use of the variable, remove it altogether. Reported and submitted by: Jia-Shiun Li <jiashiun@gmail.com> Suggested by: jhb MFC after: 2 weeks	2017-02-24 16:11:55 +00:00
Warner Losh	bbf6e5144e	Cast values to (int) before comparing them to the range of the enum. This ensures they are in range w/o the warnings.	2017-02-24 01:39:12 +00:00
Warner Losh	df1c30f6bd	KDTRACE_HOOKS isn't guaranteed to be defined. Change to check to see if it is defined or not rather than if it is non-zero. Sponsored by: Netflix, Inc	2017-02-24 01:39:08 +00:00
Mateusz Guzik	dfaa7859d6	mtx: microoptimize lockstat handling in spin mutexes and thread lock While here make the code compilable on kernels with LOCK_PROFILING but without KDTRACE_HOOKS.	2017-02-23 22:46:01 +00:00
Eric van Gyzen	b215ceaaec	Add sem_clockwait_np() This function allows the caller to specify the reference clock and choose between absolute and relative mode. In relative mode, the remaining time can be returned. The API is similar to clock_nanosleep(3). Thanks to Ed Schouten for that suggestion. While I'm here, reduce the sleep time in the semaphore "child" test to greatly reduce its runtime. Also add a reasonable timeout. Reviewed by: ed (userland) MFC after: 2 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9656	2017-02-23 19:36:38 +00:00
Jonathan T. Looney	c9cde8251c	Fix a panic during boot caused by inadequate locking of some vt(4) driver data structures. vt_change_font() calls vtbuf_grow() to change some vt driver data structures. It uses TF_MUTE to prevent the console from trying to use those data structures while it changes them. During the early stage of the boot process, the vt driver's tc_done routine uses those data structures; however, it is currently called outside the TF_MUTE check. Move the tc_done routine inside the locked TF_MUTE check. PR: 217282 Reviewed by: ed, ray Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9709	2017-02-23 01:18:47 +00:00
Warner Losh	6fec662c86	Make the code match the comments: If we have ANY buf's that failed then return EAGAIN. The current code just returns that if the LAST buf failed. Reviewed by: kib@, trasz@ Differential Revision: https://reviews.freebsd.org/D9677	2017-02-21 18:56:06 +00:00
John Baldwin	150599be12	Consolidate statements to initialize files. Previously, the first lines of various generated files from system call tables were generated in two sections. Some of the initialization was done in BEGIN, and the rest was done when the first line was encountered. The main reason for this split before r313564 was that most of the initialization done in the second section depended on the $FreeBSD$ tag extracted from the system call table. Now that the $FreeBSD$ tag is no longer used, consolidate all of the file initialization in the BEGIN section. This change was tested by confirming that the content of generated files did not change.	2017-02-20 20:37:25 +00:00
Mateusz Guzik	13d2ef0f3a	mtx: fix spin mutexes interaction with failed fcmpset While doing so move recursion support down to the fallback routine.	2017-02-20 19:08:36 +00:00
Eric Badger	82a4538f31	Defer ptracestop() signals that cannot be delivered immediately When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL. Add a number of tests for the new functionality. Several tests were authored by jhb. PR: 212607 Reviewed by: kib Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell EMC In collaboration with: jhb Differential Revision: https://reviews.freebsd.org/D9260	2017-02-20 15:53:16 +00:00
Konstantin Belousov	ecc6c515ab	Apply noexec mount option for mmap(PROT_EXEC). Right now the noexec mount option disallows image activators to try execve the files on the mount point. Also, after r127187, noexec also limits max_prot map entries permissions for mappings of files from such mounts, but not the actual mapping permissions. As a result, the API behaviour is inconsistent. The files from noexec mount can be mapped with PROT_EXEC, but if mprotect(2) drops execution permission, it cannot be re-enabled later. Make this consistent logically and aligned with behaviour of other systems, by disallowing PROT_EXEC for mmap(2). Note that this change only ensures aligned results from mmap(2) and mprotect(2), it does not prevent actual code execution from files coming from noexec mount. Such files can always be read into anonymous executable memory and executed from there. Reported by: shamaz.mazum@gmail.com PR: 217062 Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-19 20:51:04 +00:00
Mateusz Guzik	b247fd395d	locks: make trylock routines check for 'unowned' value Since fcmpset can fail without lock contention e.g. on arm, it was possible to get spurious failures when the caller was expecting the primitive to succeed. Reported by: mmel	2017-02-19 16:28:46 +00:00
Hans Petter Selasky	316e092a77	Make sure the thread constructor and destructor eventhandlers are called for all threads belonging to a process. Currently the first thread in a process is kept around as an optimisation step and is never freed. Because the first thread in a process is never freed nor allocated, its destructor and constructor callbacks are never called which means per thread structures allocated by dtrace and the Linux emulation layers for example, might be present for threads which don't need these structures. This patch adds a thread construction and destruction call for the first thread in a process. Tested: dtrace, linux emulation Reviewed by: kib @ MFC after: 1 week Sponsored by: Mellanox Technologies	2017-02-19 13:15:33 +00:00
Jason A. Harmening	e2a8d17887	Bring back r313037, with fixes for mips: Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to replace pcpu_find(curcpu) in MI code. Reviewed by: andreast, kan, lidl Tested by: lidl(mips, sparc64), andreast(powerpc) Differential Revision: https://reviews.freebsd.org/D9587	2017-02-19 02:03:09 +00:00
Mateusz Guzik	5c5df0d99b	locks: clean up trylock primitives In particular this reduces accesses of the lock itself.	2017-02-18 22:06:03 +00:00
Bryan Drewery	8e31b510b0	Fix panic with unlocked vnode to vrecycle(). MFC after: 2 weeks	2017-02-18 05:07:53 +00:00
Mateusz Guzik	a24c8eb847	mtx: plug the 'opts' argument when not used	2017-02-18 01:52:10 +00:00
Mateusz Guzik	cbebea4e67	mtx: get rid of file/line args from slow paths if they are unused This denotes changes which went in by accident in r313877. On most production kernels both said parameters are zeroed and have nothing reading them in either __mtx_lock_sleep or __mtx_unlock_sleep. Thus this change stops passing them by internal consumers which this is the case. Kernel modules use _flags variants which are not affected kbi-wise.	2017-02-17 15:40:24 +00:00
Mateusz Guzik	09f1319acd	mtx: restrict r313875 to kernels without LOCK_PROFILING	2017-02-17 15:34:40 +00:00
Mateusz Guzik	7640beb920	mtx: microoptimize lockstat handling in __mtx_lock_sleep This saves a function call and multiple branches after the lock is acquired.	2017-02-17 14:55:59 +00:00
Mateusz Guzik	0108a98012	sx: fix compilation on UP kernels after r313855 sx primitives use inlines as opposed to macros. Change the tested condition to LOCK_DEBUG which covers the case, but is slightly overzealous. Reported by: kib	2017-02-17 10:58:12 +00:00
Mateusz Guzik	91fa47076d	Introduce SCHEDULER_STOPPED_TD for use when the thread pointer was already read Sprinkle in few places.	2017-02-17 06:45:04 +00:00
Mateusz Guzik	ffd5c94c4f	locks: let primitives for modules unlock without always going to the slow path It is only needed if the LOCK_PROFILING is enabled. It has to always check if the lock is about to be released which requires an avoidable read if the option is not specified.	2017-02-17 05:39:40 +00:00
Mateusz Guzik	afa39f7a32	locks: remove SCHEDULER_STOPPED checks from primitives for modules They all fallback to the slow path if necessary and the check is there. This means a panicked kernel executing code from modules will be able to succeed doing actual lock/unlock, but this was already the case for core code which has said primitives inlined.	2017-02-17 05:09:51 +00:00
Ryan Stone	27ee18ad33	Revert r313814 and r313816 Something evidently got mangled in my git tree in between testing and review, as an old and broken version of the patch was apparently submitted to svn. Revert this while I work out what went wrong. Reported by: tuexen Pointy hat to: rstone	2017-02-16 21:18:31 +00:00
Eric van Gyzen	8144690af4	Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. Suggested by: glebius, emaste Reviewed by: gnn MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:47:41 +00:00
Ryan Stone	3600f4ba35	Fix a typo in my previous commit Somehow in the late stages of testing my sched_ule patch, a character was accidentally deleted from the file. Correct this. While I'm committing anyway, the previous commit message requires some clarification: in the normal case of unlending priority after releasing a mutex, the thread that was doing the lending will be woken up and immediately become the highest-priority thread, and in that case no priority inversion would take place. However, if that thread is pinned to a different CPU, then the currently running thread that just had its priority lowered will not be preempted and then priority inversion can occur. Reported by: O. Hartmann (typo), jhb (scheduler clarification) MFC after: 1 month Pointy hat to: rstone	2017-02-16 20:06:21 +00:00
Ryan Stone	09ae7c4814	Check for preemption after lowering a thread's priority When a high-priority thread is waiting for a mutex held by a low-priority thread, it temporarily lends its priority to the low-priority thread to prevent priority inversion. When the mutex is released, the lent priority is revoked and the low-priority thread goes back to its original priority. When the priority of that thread is lowered (through a call to sched_priority()), the scheduler was not checking whether there is now a high-priority thread in the run queue. This can cause threads with real-time priority to be starved in the run queue while the low-priority thread finishes its quantum. Fix this by explicitly checking whether preemption is necessary when a thread's priority is lowered. Sponsored by: Dell EMC Isilon Obtained from: Sandvine Inc Differential Revision: https://reviews.freebsd.org/D9518 Reviewed by: Jeff Roberson (ule) MFC after: 1 month	2017-02-16 19:41:13 +00:00
Mark Johnston	c6a4ba5a38	Apply MADV_FREE to exec_map entries only after a lowmem event. This effectively provides the same benefit as applying MADV_FREE inline upon every execve, since the page daemon invokes lowmem handlers prior to scanning the inactive queue. It also has less overhead; the cost of applying MADV_FREE is very noticeable on many-CPU systems since it includes that of a TLB shootdown of global PTEs. For instance, this change nearly halves the system CPU usage during a buildkernel on a 128-vCPU EC2 instance (with some other patches applied). Benchmarked by: cperciva (earlier version) Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9586	2017-02-15 01:50:58 +00:00
Eric Badger	28d2efa983	sleepq_catch_signals: do thread suspension before signal check Since locks are dropped when a thread suspends, it's possible for another thread to deliver a signal to the suspended thread. If the thread awakens from suspension without checking for signals, it may go to sleep despite having a pending signal that should wake it up. Therefore the suspension check is done first, so any signals sent while suspended will be caught in the subsequent signal check. Reviewed by: kib Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9530	2017-02-14 17:13:23 +00:00
Andriy Gapon	937c1b0757	try to fix RACCT_RSS accounting There could be a race between the vm daemon setting RACCT_RSS based on the vm space and vmspace_exit (called from exit1) resetting RACCT_RSS to zero. In that case we can get a zombie process with non-zero RACCT_RSS. If the process is jailed, that may break accounting for the jail. There could be other consequences. Fix this race in the vm daemon by updating RACCT_RSS only when a process is in the normal state. Also, make accounting a little bit more accurate by refreshing the page resident count after calling vm_pageout_map_deactivate_pages(). Finally, add an assert that the RSS is zero when a process is reaped. PR: 210315 Reviewed by: trasz Differential Revision: https://reviews.freebsd.org/D9464	2017-02-14 13:54:05 +00:00

15463 Commits