freebsd-skq

Author	SHA1	Message	Date
kib	7d0828c94e	When cleaning up from failed adv locking and checking for write, do not call VOP_CLOSE() manually. Instead, delegate the close to fo_close() performed as part of the fdrop() on the file that failed to open. For this, finish constructing the file on error, in particular, set f_vnode and f_ops. Forcibly resetting f_ops to badfileops disabled additional cleanups performed by fo_close() for some file types, in this case it was noted that cdevpriv data was corrupted. Since fo_close() call must be enabled for some file types, it makes more sense to enable it for all files opened through vn_open_cred(). In collaboration with: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-17 08:40:51 +00:00
jhb	ea7fa1c904	Remove aiod_timeout. It hasn't been used since the AIO code was made MPSAFE 10 years ago. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4946	2016-01-14 21:28:56 +00:00
jhb	61577b76c5	Rename aiod_bio taskqueue to aiod_kick. This taskqueue is not used to handle bio requests. It is only used to run aio_kick_nowait() to spin up new aio daemon processes. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4904	2016-01-14 20:51:48 +00:00
glebius	796cbcc738	Call crextend() before copying old credentials to the new credentials and replace crcopysafe by crcopy as crcopysafe is not intended to be safe in a threaded environment, it drops PROC_LOCK() in a while() loop, which can lead to unexpected results, such as overwriting kernel memory. In my POV crcopysafe() needs special attention. For now I do not see any problems with this function, but who knows. Submitted by: dchagin Found by: trinity Security: SA-16:04.linux	2016-01-14 10:16:25 +00:00
cperciva	9ca3584fdd	Fix a bug introduced in r291716: "The problem with the approach taken both in _bus_dmamap_load_pages and bus_dmamap_load_ma_triv is that they split the request buffer into arbitrary chunks based on page boundaries, creating segments that no longer have a size that's a multiple of the sector size. This breaks drivers like blkfront (and probably other stuff)." [1] This was most easily triggered by running `fsck /` on a system running in Xen (e.g. Amazon EC2) but also showed up via growfs(8) and probably many other userland tools which access the disk directly. Patch by: royger [1] "Thinks this should be fine" by: ken	2016-01-11 20:38:39 +00:00
dchagin	e706df7b9a	Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it. Differential Revision: https://reviews.freebsd.org/D1090 Reviewed by: kib, trasz MFC after: 1 week	2016-01-09 20:18:53 +00:00
markj	e38d62e90d	Prevent cv_waiters wraparound. r282971 attempted to fix this problem by decrementing cv_waiters after waking up from sleeping on a condition variable, but this can result in a use-after-free if the CV is freed before all woken threads have had a chance to run. Instead, avoid incrementing cv_waiters past INT_MAX, and have cv_signal() explicitly check for sleeping threads once cv_waiters has reached this bound. Reviewed by: jhb MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4822	2016-01-09 01:56:46 +00:00
glebius	aaa09777e1	New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and up to now. The new sendfile is the code that Netflix uses to send their multiple tens of gigabits of data per second. The new implementation features asynchronous I/O, when I/O operations are launched, but not awaited to be complete. An explanation of why such behavior is beneficial compared to old one is going to be too long for a commit message, so we will skip it here. Additional features of new syscall are extra flags, which provide an application more control over data sent. The SF_NOCACHE flag tells kernel that data shouldn't be cached after it was sent. The SF_READAHEAD() macro allows specifying readahead size in pages. The new syscall is a drop-in replacement. No modifications are required to applications. One can take nginx binary for stable/10 and run it successfully on head. Although SF_NODISKIO lost its original sense, as now sendfile doesn't block, and now means something completely different (tm), using the new sendfile the old way is absolutely safe. Celebrates: Netflix global launch! Sponsored by: Nginx, Inc. Sponsored by: Netflix Relnotes: yes	2016-01-08 20:34:57 +00:00
glebius	e25e77f91d	Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM sockets, due to sendfile(2) supporting only the latter, there is a corner case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake of control data, albeit being stream socket. Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY, similar to m_demote().	2016-01-08 19:03:20 +00:00
glebius	088235535d	Revert r293405: it breaks socket buffer INVARIANTS when sending control data over local sockets.	2016-01-08 17:27:23 +00:00
glebius	a4cad9f2ef	For SOCK_STREAM socket use sbappendstream() instead of sbappend().	2016-01-08 01:16:03 +00:00
kib	eb437d36bf	Convert tty common code to use make_dev_s(). Tty.c was untypical in that it handled the si_drv1 issue consistently and correctly, by always checking for si_drv1 being non-NULL and sleeping if NULL. The removed code also illustrated unneeded complications in drivers which are eliminated by the use of new KPI. Reviewed by: hps, jhb Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:15:09 +00:00
kib	3277da17a1	Provide yet another KPI for cdev creation, make_dev_s(9). Immediate problem fixed by the new KPI is the long-standing race between device creation and assignments to cdev->si_drv1 and cdev->si_drv2, which allows the window where cdevsw methods might be called with si_drv1,2 fields not yet set. Devices typically checked for NULL and returned spurious errors to usermode, and often left some methods unchecked. The new function interface is designed to be extensible, which should allow to add more features to make_dev_s(9) without inventing yet another name for function to create devices, while maintaining KPI and even KBI backward-compatibility. Reviewed by: hps, jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:08:02 +00:00
mjg	cbad85009d	cache: ansify functions and fix some style issues No functional changes.	2016-01-07 02:04:17 +00:00
kib	8c46f725d5	Two fixes for excessive iterations after r292326. Advance the logical block number to the lblkno of the found block plus one, instead of incrementing the block number which was used for lookup. This change skips sparsely populated buffer ranges, similar to r292325, instead of doing useless lookups. Do not restart the bnoreuselist() from the start of the range if buffer lock cannot be obtained without sleep. Only retry lookup and lock for the same queue and same logical block number. Reported by: benno Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-01-05 14:48:40 +00:00
ian	3d96cedc35	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
marius	05a298f61f	- (Ab)use udivx for dividing the u_int pc_cpuid when implementing CPU_ISSET(), CPU_SET etc. in sparc64 asm. This approach has the benefit of not clobbering %y, allowing to revert r222827 and partially r222828. - In r222828, CATR() already was changed to use the equivalent of PCPU_GET(cpuid) instead of the MD module ID for KTR_CPU, so belatedly also catch up with the C side of ktr(9). Originally, in r203838 CATR() was moved away from directly reading the module ID or equivalent as that became impractical with other CPU types than USI/II supported. With r222828 in place, per-CPU data generally is set up soon enough, though, that employing PCPU things in ktr(9) also for use during early stages works. - Unfortunately, an exception to the latter is the ktr(9) use in pmap_bootstrap(), which actually is run so early that even checking for bootverbose being set via the loader doesn't work. Consequently, replace the ktr(9) use in pmap_bootstrap() with OF_printf(9) and put it under #ifdef DIAGNOSTIC instead. MFC after: 3 days	2015-12-30 13:49:20 +00:00
jhb	fb5720f7be	Add ptrace(2) reporting for LWP events. Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting thread creation and destruction. Newly created threads will stop to report PL_FLAG_BORN before returning to userland and exiting threads will stop to report PL_FLAG_EXITED before exiting completely. Both of these events are only enabled and reported if PT_LWP_EVENTS is enabled on a process.	2015-12-29 23:25:26 +00:00
jhb	79ec12eeb6	Call kern_thr_exit() instead of duplicating it. This code is missing the racct_sub() call from kern_thr_exit() and would require further code duplication in future changes. Reviewed by: kib MFC after: 1 week	2015-12-29 23:16:20 +00:00
dchagin	dad1819732	Verify that the tv_sec value specified in settimeofday() and clock_settime() (CLOCK_REALTIME case) system calls is non-negative. This commit hides a kernel panic in atrtc_settime() as the clock_ts_to_ct() does not properly convert negative tv_sec. ps. in my opinion clock_ts_to_ct() should be rewritten to properly handle negative tv_sec values. Differential Revision: https://reviews.freebsd.org/D4714 Reviewed by: kib MFC after: 1 week	2015-12-27 15:37:07 +00:00
kib	cc13042464	Do not substitute interpreter if the brand interpreter path is different from the interpreter path requested by the binary. Before this change, it was impossible to activate a non-default interpreter for 32bit image on amd64, when /libexec/ld-elf32.so.1 file exists. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-12-26 15:40:12 +00:00
jtl	f41bf39357	Only allow one PT_INTERP ELF program header. This also fixes a potential memory leak for interp_buf. Differential Revision: https://reviews.freebsd.org/D4692 Reviewed by: kib MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-24 00:58:11 +00:00
ngie	b78f13918e	Fix r292640 vim overzealously removed some trailing `+' and I didn't check the diff MFC after: 1 week X-MFC with: r292640 Pointyhat to: ngie Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:34:43 +00:00
ngie	e1cc5a3ca1	Clean up trailing whitespace; no functional change MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:29:37 +00:00
ngie	9273c09a18	Fold lim_shared into lim_copy to mute a -Wunused compiler warning from clang when the kernel is compiled without INVARIANTS Differential Revision: https://reviews.freebsd.org/D4683 Reviewed by: kib, jhb MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-22 21:07:33 +00:00
kib	bcb048ba0c	If we annoy user with the terminal output due to failed load of interpreter, also show the actual error code instead of some interpretation. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-22 20:12:52 +00:00
jtl	94d8d1452b	Add a safety net to reclaim mbufs when one of the mbuf zones becomes exhausted. It is possible for a bug in the code (or, theoretically, even unusual network conditions) to exhaust all possible mbufs or mbuf clusters. When this occurs, things can grind to a halt fairly quickly. However, we currently do not call mb_reclaim() unless the entire system is experiencing a low-memory condition. While it is best to try to prevent exhaustion of one of the mbuf zones, it would also be useful to have a mechanism to attempt to recover from these situations by freeing "expendable" mbufs. This patch makes two changes: a) The patch adds a generic API to the UMA zone allocator to set a function that should be called when an allocation fails because the zone limit has been reached. Because of the way this function can be called, it really should do minimal work. b) The patch uses this API to try to free mbufs when an allocation fails from one of the mbuf zones because the zone limit has been reached. The function schedules a callout to run mb_reclaim(). Differential Revision: https://reviews.freebsd.org/D3864 Reviewed by: gnn Comments by: rrs, glebius MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-20 02:05:33 +00:00
mjg	e70da8e2e9	proc: fix a race which could result in dereference of bad p_pgrp pointer on fork During fork p_startcopy - p_endcopy area of a process is populated with bcopy with only proc lock held. Another forking thread can find such a process and proceed to access p_pgrp included in said area. Fix the problem by moving the field outside. It is being properly assigned later. Reviewed by: kib Diagnosed by: kib Tested by: Fabian Keil <freebsd-listen fabiankeil.de> MFC after: 10 days	2015-12-18 16:33:15 +00:00
adrian	a3e51ff0e6	[intrng] Migrate the intrng code from sys/arm/arm to sys/kern/subr_intr.c. The ci20 port (by kan@) is going to reuse almost all of the intrng code since the SoC in question looks suspiciously like someone took an ARM SoC design and replaced the ARM core with a MIPS core. * migrate out the code; * rename ARM_ -> INTR_; * rename arm_ -> intr_; * move the interrupt flush routine from intr.c / intrng.c into arm/machdep_intr.c - removing the code duplication and removing the ARM specific bits from here. Thanks to the Star Wars: The Force Awakens premiere line for allowing me a couple hours of quiet time to finish the universe builds. Tested: * make universe TODO: * The structure definitions in subr_intr.c still includes machine/intr.h which requires one duplicates all of the intrng definitions in the platform code (which kan has done, and I think we don't have to.) Instead I should break out the generic things (function declarations, common intr structures, etc) into a separate header. * Kan has requested I make the PIC based IPI stuff optional.	2015-12-18 05:43:59 +00:00
markj	338746b90e	Support an arbitrary number of arguments to DTrace syscall probes. Rather than pushing all eight possible arguments into dtrace_probe()'s stack frame, make the syscall_args struct for the current syscall available via the current thread. Using a custom getargval method for the systrace provider, this allows any syscall argument to be fetched, even in kernels that have modified the maximum number of system call arguments. Sponsored by: EMC / Isilon Storage Division	2015-12-17 00:00:27 +00:00
markj	fa1b8e9a4f	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week	2015-12-16 23:39:27 +00:00
glebius	63cd1c131a	A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strengthening the promises on the busy state of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-12-16 21:30:45 +00:00
kib	b5160b0280	Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a buffer for each block number in the range with gbincore(), look up the next instantiated buffer with the logical block number which is greater or equal to the next lblkno. This significantly speeds up the iteration for a sparsely populated range. Move the iteration into new helper bnoreuselist(), which is structured similarly to flushbuflist(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation	2015-12-16 08:48:37 +00:00
kib	764a2409cb	Simplify the loop step in the flushbuflist() and make it independent of the type stability of the buffers memory. Instead of memoizing pointer to the next buffer and validating it, remember the next logical block number in the bo list and re-lookup. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation	2015-12-16 08:39:51 +00:00
adrian	64b681fbd5	Don't call wakeup if we're just returning reserved space; just return the reservation and wait for more space to appear. Submitted by: jeff Reviewed by: kib	2015-12-16 00:13:16 +00:00
jamie	b78d6a91e2	Fix jail name checking that disallowed anything that starts with '0'. The intention was to just limit leading zeroes on numeric names. That check is now improved to also catch the leading spaces and '+' that strtoul can pass through. PR: 204897 MFC after: 3 days	2015-12-15 17:25:00 +00:00
trasz	9d2d111f78	Tweak comments. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:30:36 +00:00
trasz	6751d261c4	Actually make the 'amount' argument to racct_adjust_resource() signed, as it was always supposed to be. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:21:13 +00:00
trasz	3430b87794	Avoid useless relocking. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:08:29 +00:00
markj	98fd4878e0	Don't make assertions about td_critnest when the scheduler is stopped. A panicking thread always executes with a critical section held, so any attempt to allocate or free memory while dumping will otherwise cause a second panic. This can occur, for example, if xpt_polled_action() completes non-dump I/O that was pending at the time of the panic. The fact that this can occur is itself a bug, but asserting in this case does little but reduce the reliability of kernel dumps. Suggested by: kib Reported by: pho	2015-12-11 20:05:07 +00:00
imp	b4d51d26ba	Create the MDT_PNP_INFO metadata record to communicate PNP info about modules. External agents may use this data to automatically load those modules. Differential Review: https://reviews.freebsd.org/D3461	2015-12-11 05:27:53 +00:00
smh	4a58b9436f	Don't use 0 for pointer comparison Use NULL instead of 0 for comparison with panicstr. MFC after: 1 week Sponsored by: Multiplay	2015-12-08 18:38:33 +00:00
markj	f734f97f4e	Add helper functions proc_readmem() and proc_writemem(). These helper functions can be used to read in or write a buffer from or to an arbitrary process' address space. Without them, this can only be done using proc_rwmem(), which requires the caller to fill out a uio. This is onerous and results in code duplication; the new functions provide a simpler interface which is sufficient for most existing callers of proc_rwmem(). This change also adds a manual page for proc_rwmem() and the new functions. Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D4245	2015-12-07 21:33:15 +00:00
emaste	0d1c50f494	Replace magic value ELF note type with NT_FREEBSD_ABI_TAG As of r291909 elf_common.h provides a definition. Suggested by: kib Sponsored by: The FreeBSD Foundation	2015-12-07 18:43:27 +00:00
kib	80e8626b43	Add support for usermode (vdso-like) gettimeofday(2) and clock_gettime(2) on ARMv7 and ARMv8 systems which have architectural generic timer hardware. It is similar to how the RDTSC timer is used in userspace on x86. Fix a permission problem where generic timer access from EL0 (or userspace on v7) was not properly initialized on APs. For ARMv7, mark the stack non-executable. The shared page is added for all arms (including ARMv8 64bit), and the signal trampoline code is moved to the page. Reviewed by: andrew Discussed with: emaste, mmel Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4209	2015-12-07 12:20:26 +00:00
mckusick	1a9ecd3df9	We need to zero out the clustering variables in a freed vnode structure. For completeness add a VNASSERT that there are no threads waiting on a range lock (this was previously checked on every vnode free). Reported by: Rick Macklem Fix from: Mateusz Guzik PR: 204949	2015-12-04 03:54:18 +00:00
ken	d0f081c521	Add asynchronous command support to the pass(4) driver, and the new camdd(8) utility. CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIOGET ioctl. User processes can use poll(2) or kevent(2) to get notification when I/O has completed. While the existing CAMIOCOMMAND blocking ioctl interface only supports user virtual data pointers in a CCB (generally only one per CCB), the new CAMIOQUEUE ioctl supports user virtual and physical address pointers, as well as user virtual and physical scatter/gather lists. This allows user applications to have more flexibility in their data handling operations. Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O. The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB. There are some things that would be good to add: 1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. This would allow eliminating the copy we currently do for data. 2. Add an ioctl to list currently outstanding CCBs in the various queues. 3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do that. 4. Test physical address support.
Virtual pointers and scatter gather lists have been tested, but I have not yet tested physical addresses or scatter/gather lists. 5. Investigate multiple queue support. At the moment there is one queue of commands per pass(4) device. If multiple processes open the device, they will submit I/O into the same queue and get events for the same completions. This is probably the right model for most applications, but it is something that could be changed later on. Also, add a new utility, camdd(8) that uses the asynchronous pass(4) driver interface. This utility is intended to be a basic data transfer/copy utility, a simple benchmark utility, and an example of how to use the asynchronous pass(4) interface. It can copy data to and from pass(4) devices using any target queue depth, starting offset and blocksize for the input and output devices. It currently only supports SCSI devices, but could be easily extended to support ATA devices. It can also copy data to and from regular files, block devices, tape devices, pipes, stdin, and stdout. It does not support queueing multiple commands to any of those targets, since it uses the standard read(2)/write(2)/writev(2)/readv(2) system calls. The I/O is done by two threads, one for the reader and one for the writer. The reader thread sends completed read requests to the writer thread in strictly sequential order, even if they complete out of order. That could be modified later on for random I/O patterns or slightly out of order I/O. camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from the pass(4) driver and also to send request notifications internally. For pass(4) devices, camdd(8) uses a single buffer (CAM_DATA_VADDR) per CAM CCB on the reading side, and a scatter/gather list (CAM_DATA_SG) on the writing side. In addition to testing both interfaces, this makes any potential reblocking of I/O easier.
No data is copied between the reader and the writer, but rather the reader's buffers are split into multiple I/O requests or combined into a single I/O request depending on the input and output blocksize. For the file I/O path, camdd(8) also uses a single buffer (read(2), write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list (readv(2), writev(2), preadv(2), pwritev(2)) on writes. Things that would be nice to do for camdd(8) eventually: 1. Add support for I/O pattern generation. Patterns like all zeros, all ones, LBA-based patterns, random patterns, etc. Right now you can always use /dev/zero, /dev/random, etc. 2. Add support for a "sink" mode, so we do only reads with no writes. Right now, you can use /dev/null. 3. Add support for automatic queue depth probing, so that we can figure out the right queue depth on the input and output side for maximum throughput. At the moment it defaults to 6. 4. Add support for SATA device passthrough I/O. 5. Add support for random LBAs and/or lengths on the input and output sides. 6. Track average per-I/O latency and busy time. The busy time and latency could also feed in to the automatic queue depth determination. sys/cam/scsi/scsi_pass.h: Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue and fetch asynchronous CAM CCBs respectively. Although these ioctls do not have a declared argument, they both take a union ccb pointer. If we declare a size here, the ioctl code in sys/kern/sys_generic.c will malloc and free a buffer for either the CCB or the CCB pointer (depending on how it is declared). Since we have to keep a copy of the CCB (which is fairly large) anyway, having the ioctl malloc and free a CCB for each call is wasteful. sys/cam/scsi/scsi_pass.c: Add asynchronous CCB support. Add two new ioctls, CAMIOQUEUE and CAMIOGET. CAMIOQUEUE adds a CCB to the incoming queue.
The CCB is executed immediately (and moved to the active queue) if it is an immediate CCB, but otherwise it will be executed in passstart() when a CCB is available from the transport layer. When CCBs are completed (because they are immediate or passdone() if they are queued), they are put on the done queue. If we get the final close on the device before all pending I/O is complete, all active I/O is moved to the abandoned queue and we increment the peripheral reference count so that the peripheral driver instance doesn't go away before all pending I/O is done. The new passcreatezone() function is called on the first call to the CAMIOQUEUE ioctl on a given device to allocate the UMA zones for I/O requests and S/G list buffers. This may be good to move off to a taskqueue at some point. The new passmemsetup() function allocates memory and scatter/gather lists to hold the user's data, and copies in any data that needs to be written. For virtual pointers (CAM_DATA_VADDR), the kernel buffer is malloced from the new pass(4) driver malloc bucket. For virtual scatter/gather lists (CAM_DATA_SG), buffers are allocated from a new per-pass(9) UMA zone in MAXPHYS-sized chunks. Physical pointers are passed in unchanged. We have support for up to 16 scatter/gather segments (for the user and kernel S/G lists) in the default struct pass_io_req, so requests with longer S/G lists require an extra kernel malloc. The new passcopysglist() function copies a user scatter/gather list to a kernel scatter/gather list. The number of elements in each list may be different, but (obviously) the amount of data stored has to be identical. The new passmemdone() function copies data out for the CAM_DATA_VADDR and CAM_DATA_SG cases. The new passiocleanup() function restores data pointers in user CCBs and frees memory. Add new functions to support kqueue(2)/kevent(2): passreadfilt() tells kevent whether or not the done queue is empty. passkqfilter() adds a knote to our list. 
passreadfiltdetach() removes a knote from our list. Add a new function, passpoll(), for poll(2)/select(2) to use. Add devstat(9) support for the queued CCB path. sys/cam/ata/ata_da.c: Add support for the BIO_VLIST bio type. sys/cam/cam_ccb.h: Add a new enumeration for the xflags field in the CCB header. (This doesn't change the CCB header, just adds an enumeration to use.) sys/cam/cam_xpt.c: Add a new function, xpt_setup_ccb_flags(), that allows specifying CCB flags. sys/cam/cam_xpt.h: Add a prototype for xpt_setup_ccb_flags(). sys/cam/scsi/scsi_da.c: Add support for BIO_VLIST. sys/dev/md/md.c: Add BIO_VLIST support to md(4). sys/geom/geom_disk.c: Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size limiting code in g_disk_start() a bit. sys/kern/subr_bus_dma.c: Change _bus_dmamap_load_vlist() to take a starting offset and length. Add a new function, _bus_dmamap_load_pages(), that will load a list of physical pages starting at an offset. Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios. Allow unmapped I/O to start at an offset. sys/kern/subr_uio.c: Add two new functions, physcopyin_vlist() and physcopyout_vlist(). sys/pc98/include/bus.h: Guard kernel-only parts of the pc98 machine/bus.h header with #ifdef _KERNEL. This allows userland programs to include <machine/bus.h> to get the definition of bus_addr_t and bus_size_t. sys/sys/bio.h: Add a new bio flag, BIO_VLIST. sys/sys/uio.h: Add prototypes for physcopyin_vlist() and physcopyout_vlist(). share/man/man4/pass.4: Document the CAMIOQUEUE and CAMIOGET ioctls. usr.sbin/Makefile: Add camdd. usr.sbin/camdd/Makefile: Add a makefile for camdd(8). usr.sbin/camdd/camdd.8: Man page for camdd(8). usr.sbin/camdd/camdd.c: The new camdd(8) utility. Sponsored by: Spectra Logic MFC after: 1 week	2015-12-03 20:54:55 +00:00
mckusick	25671cd0d5	We need to zero out the union of pointers in a freed vnode structure. PR: 204949 Fix from: Mateusz Guzik Tested by: Jason Unovitch	2015-12-03 02:04:22 +00:00
nwhitehorn	635273ca5b	Missed header_supported call from r291020: make really, really sure the brand likes the executable.	2015-12-01 17:00:31 +00:00
mjg	101bd0f093	capsicum: plug spurious memset in __cap_rights_init Reviewed by: pjd	2015-12-01 02:48:42 +00:00
mckusick	6f0b4b3366	As the kernel allocates and frees vnodes, it fully initializes them on every allocation and fully releases them on every free. These are not trivial costs: it starts by zeroing a large structure then initializes a mutex, a lock manager lock, an rw lock, four lists, and six pointers. And looking at vfs.vnodes_created, these operations are being done millions of times an hour on a busy machine. As a performance optimization, this code update uses the uma_init and uma_fini routines to do these initializations and cleanups only as the vnodes enter and leave the vnode_zone. With this change the initializations are only done kern.maxvnodes times at system startup and then only rarely again. The frees are done only if the vnode_zone shrinks which never happens in practice. For those curious about the avoided work, look at the vnode_init() and vnode_fini() functions in kern/vfs_subr.c to see the code that has been removed from the main vnode allocation/free path. Reviewed by: kib Tested by: Peter Holm	2015-11-29 21:42:26 +00:00
kib	ee461b4bba	Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct sysent. sv_prepsyscall is unused. sv_sigsize and sv_sigtbl translate signal number from the FreeBSD namespace into the ABI domain. It is only utilized on i386 for iBCS2 binaries. The issue with this approach is that signals for iBCS2 were delivered with the FreeBSD signal frame layout, which does not follow iBCS2. The same note is true for any other potential user of sv_sigtbl. In other words, if ABI needs signal number translation, it really needs custom sv_sendsig method instead. Sponsored by: The FreeBSD Foundation	2015-11-28 08:49:07 +00:00
kib	d58541e227	Remove VI_AGE vnode iflag, it is unused. Noted by: bde Sponsored by: The FreeBSD Foundation	2015-11-27 01:45:40 +00:00
kib	3cb64bd1ce	Move the comment about resident pages preventing vnode from leaving active list, into the header comment for vdrop(), which is the function that decides whether to leave the vnode on the list. Note that dirty page write-out in vinactive() is asynchronous. Discussed with: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-11-27 01:16:35 +00:00
ae	da001d5bf7	Check that hhk_helper pointer isn't NULL before access. It isn't forbidden to use NULL pointer for hook_helper in hookinfo structure when hhook_add_hook() adds new helper hook.	2015-11-25 07:14:58 +00:00
kib	896302b6a8	Rework the vnode cache recycling to meet free and unused vnodes targets. See the comment above wantfreevnodes variable for the description of the algorithm. The vfs.vlru_alloc_cache_src sysctl is removed. New code frees namecache sources as the last chance to satisfy the highest watermark, instead of selecting the source vnodes randomly. This provides good enough behaviour to keep vn_fullpath() working in most situations. The filesystem layout with deep trees, where the removed knob was required, is thus handled automatically. Submitted by: bde Discussed with: mckusick Tested by: pho MFC after: 1 month	2015-11-24 09:45:36 +00:00
markj	9b7d03a6f4	The buffer passed to an sbuf drain callback is not necessarily null-terminated, so don't assume that it is. Reported by: pho X-MFC-With: r291059	2015-11-23 18:45:35 +00:00
kib	e0c4faece4	Split kernel timekeep ABI structure vdso_sv_tk out of the struct sysentvec. This allows the timekeep data to be shared between similar ABIs which cannot share sysentvec. Make the timekeep_push_vdso() tick callback to the timekeep structures instead of sysentvecs. If several sysentvec share the vdso_sv_tk structure, we would update the userspace data several times on each tick, without the change. Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when sysentvec is marked with the new SV_TIMEKEEP flag. This saves allocation and update of unneeded vdso_sv_tk for ABIs which do not provide userspace gettimeofday yet, which are PowerPCs arches right now. Make vdso_sv_tk allocator public, namely split out and export alloc_sv_tk() and alloc_sv_tk_compat32(). ABIs which share timekeep data now can allocate it manually and share as appropriate. Requested by: nwhitehorn Tested by: nwhitehorn, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-11-23 07:09:35 +00:00
glebius	6460e3db4e	Remove remnants of the old NFS from vnode pager. Reviewed by: kib Sponsored by: Netflix	2015-11-20 23:52:27 +00:00
trasz	a6632d64f5	The freebsd4_getfsstat() was broken in r281551 to always return 0 on success. All versions of getfsstat(3) are supposed to return the number of [o]statfs structs in the array that was copied out. Also fix missing bounds checking and signed comparison of unsigned types. Submitted by: bde@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-11-20 14:08:12 +00:00
jtl	f2aa140123	Consistently enforce the restriction against calling malloc/free when in a critical section. uma_zalloc_arg()/uma_zalloc_free() may acquire a sleepable lock on the zone. The malloc() family of functions may call uma_zalloc_arg() or uma_zalloc_free(). The malloc(9) man page currently claims that free() will never sleep. It also implies that the malloc() family of functions will not sleep when called with M_NOWAIT. However, it is more correct to say that these functions will not sleep indefinitely. Indeed, they may acquire a sleepable lock. However, a developer may overlook this restriction because the WITNESS check that catches attempts to call the malloc() family of functions within a critical section is inconsistently applied. This change clarifies the language of the malloc(9) man page to clarify the restriction against calling the malloc() family of functions while in a critical section or holding a spin lock. It also adds KASSERTs at appropriate points to make the enforcement of this restriction more consistent. PR: 204633 Differential Revision: https://reviews.freebsd.org/D4197 Reviewed by: markj Approved by: gnn (mentor) Sponsored by: Juniper Networks	2015-11-19 14:04:53 +00:00
markj	7b874e569f	Remove a commented-out debug print. MFC after: 1 week	2015-11-19 05:58:51 +00:00
markj	7df27c4620	Add support for a configurable output channel to witness(4). This is useful in environments where system configuration is performed by automated interaction with the system console, since unexpected witness output makes such automation difficult. With this change, the new debug.witness.output_channel sysctl allows one to specify that witness output is to be printed to the kernel log (using log(9)) rather than the console. Reviewed by: cem, jhb MFC after: 2 weeks Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4183	2015-11-19 05:56:59 +00:00
markj	afc7726a46	Add vlog(9). Reviewed by: cem, jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D4183	2015-11-19 05:50:22 +00:00
nwhitehorn	2b225aeb0b	Extend r270123 to run the brand info's header_supported() routine for branded as well as unbranded binaries. This will be required to add support for the new ELFv2 ABI on powerpc64, which is distinguished from ELFv1 by the contents of the ELF header's flags field. Reviewed by: imp MFC after: 2 weeks	2015-11-18 17:03:22 +00:00
marius	7965ca51f7	- Unbreak dumpsys(9) on sparc64 after r276772 - While at it, arrange #ifndefs in kern_dump.c more intelligently; it's rather confusing to have multiple competing and/or unused functions in the kernel.	2015-11-16 23:02:33 +00:00
trasz	ae1d62cec8	Speed up rctl operation with large rulesets, by holding the lock during iteration instead of relocking it for each traversed rule. Reviewed by: mjg@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D4110	2015-11-15 12:10:51 +00:00
rrs	dc494194a2	This fixes several places where callout_stop's return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
jhb	c1d9f70889	Export various helper variables describing the layout and size of certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3773	2015-11-12 22:00:59 +00:00
rrs	24a4335f25	Add new async_drain to the callout system. This is so-far not used but should be used by TCP for sure in its cleanup of the IN-PCB (will be coming shortly). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D4076	2015-11-10 14:49:32 +00:00
jpaetzel	22e9b6f780	Fix a bug in the CPU % limiting code If you attempt to set a pcpu limit that is higher than 110% using rctl (for instance, you want a jail to be able to use 2 cores on your system so you set pcpu to 200%) the thing you are trying to limit becomes unthrottled. PR: 189870 Submitted by: dustinwenz@ebureau.com Reviewed by: trasz MFC after: 1 week	2015-11-10 14:14:32 +00:00
trasz	425e227d02	Make naming more consistent; no functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-11-08 18:11:24 +00:00
trasz	183c511071	Speed up rctl(8) rule retrieval; the difference shows mostly in "rctl -n", as otherwise most of the time is spent resolving UIDs to names. Reviewed by: mjg@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D4059	2015-11-08 18:08:31 +00:00
tijl	4e8b6b4a06	Since r289279 bufinit() uses mp_ncpus, but some architectures set this variable during mp_start() which is too late. Move this to mp_setmaxid() where other architectures set it and move x86 assertions to MI code. Reviewed by: kib (x86 part)	2015-11-08 14:26:50 +00:00
markj	a7bb6eb720	- Consistently use PROC_ASSERT_HELD() to verify that a process' hold count is non-zero. - Include the process address in the PROC_ASSERT_HELD() and PROC_ASSERT_NOT_HELD() assertion messages so that the corresponding process can be found easily when debugging. MFC after: 1 week	2015-11-08 01:38:56 +00:00
cem	c56fbbf66e	Flesh out sysctl types further (follow-up of r290475) Use the right intmax_t type instead of intptr_t in a few remaining places. Add support for CTLFLAG_TUN for the new fixed-width types. Bruce will be upset that the new handlers silently truncate tuned quad-sized inputs, but so do all of the existing handlers. Add the new types to debug_dump_node, for whatever use that is. Bump FreeBSD_version again, for good measure. We are changing SYSCTL_HANDLER_ARGS and a member of struct sysctl_oid to intmax_t. Correct the sysctl typed NULL values for the fixed-width types. (Hat tip: hps@.) Suggested by: hps (partial) Sponsored by: EMC / Isilon Storage Division	2015-11-07 18:26:32 +00:00
adrian	4c139fdba3	Add a sched_yield() to work around low memory conditions in the current code. Things seem to get stuck in low memory conditions where no bufs are available, the reclamation path is called to wakeup the daemon, but no sleeping is done. Because of this, we are stuck in a tight loop in the current process and never run said reclamation path. This was introduced in r289279. This is only a temporary workaround to restore system usefulness until the more permanent solutions can be found. Tested: * Carambola2, 64MB (and 32MB by manual config.)	2015-11-07 04:04:00 +00:00
cem	65f14c5699	Round out SYSCTL macros to the full set of fixed-width types Add S8, S16, S32, and U32 types; add SYSCTL() macros for them, as well as for the existing 64-bit types. (While SYSCTLQUAD and UQUAD macros already exist, they do not take the same sort of 'val' parameter that the other macros do.) Clean up the documented "types" in the sysctl.9 document. (These are macros and thus not real types, but the manual page documents intent.) The sysctl_add_oid(9) arg2 has been bumped from intptr_t to intmax_t to accommodate 64-bit types on 32-bit pointer architectures. This is just the kernel support piece; the userspace sysctl(1) support will follow in a later patch. Submitted by: Ravi Pokala <rpokala@panasas.com> Reviewed by: cem Relnotes: no Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D4091	2015-11-07 01:43:01 +00:00
mjg	bc4e60465f	fd: implement kern.proc.nfds sysctl Intended purpose is to provide an equivalent of OpenBSD's getdtablecount syscall for the compat library.	2015-11-07 00:18:14 +00:00
jhb	78dafbc6fb	When dumping an rman in DDB, include the RID of each resource. Submitted by: Ravi Pokala (rpokala@panasas.com) Reviewed by: imp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D4086	2015-11-05 23:12:23 +00:00
markj	70321eeb6d	Have elf_lookup() return an error if the specified non-weak symbol could not be found. Otherwise, relocations against such symbols will be silently ignored instead of causing an error to be raised. Reviewed by: kib MFC after: 1 week	2015-11-03 03:29:35 +00:00
ngie	c5650de671	Define `fhard` in pps_event(..) only when PPS_SYNC is defined to mute an -Wunused-but-set-variable warning Reported by: FreeBSD_HEAD_amd64_gcc4.9 jenkins job Sponsored by: EMC / Isilon Storage Division	2015-11-02 03:14:37 +00:00
ngie	c5d7b522c7	Define `compress` in `__elfN(coredump)` when #ifdef GZIO is true to mute an -Wunused-but-set-variable warning Reported by: FreeBSD_HEAD_amd64_gcc4.9 jenkins job Sponsored by: EMC / Isilon Storage Division	2015-11-02 01:47:26 +00:00
imp	6471aad35d	The error classification from lower layers is a poor indicator of whether an error is recoverable. Always re-dirty the buffer on errors from write requests. The invalidation we used to do for errors not EIO doesn't need to be done for a device that's really gone, since that's done in a different path. Reviewed by: mckusick@, kib@	2015-10-31 04:53:07 +00:00
kib	979d5cd34d	Minor (and incomplete) style cleanup. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-30 20:47:42 +00:00
kib	eddc63a998	Also mark compat32 umtx op table as constant. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-30 19:32:30 +00:00
kib	25e819cedd	Use C99 array initialization, which also makes the code self-documented, and eases addition of new ops. For the similar reasons, eliminate UMTX_OP_MAX. nitems() handles the only use of the symbol. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-30 19:20:40 +00:00
trasz	e52d6f1e4a	After r290196, the kernel won't wait for stuff like gmirror nodes if they are not required for mounting rootfs. However, it's possible that some setups try to mount them in mountcritlocal (ie from fstab). Export the list of current root mount holds using a new sysctl, vfs.root_mount_hold, and make mountcritlocal retry if "mount -a" fails and the list is not empty. MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3709	2015-10-30 15:52:10 +00:00
trasz	3b970d7ccc	Make root mount wait mechanism smarter, by making it wait only if the root device doesn't yet exist. Reviewed by: kib@, marcel@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3709	2015-10-30 15:35:04 +00:00
bdrewery	7d7f09674b	getnewbuf: Initialize bp to avoid uninitialized pointer dereference and brelse(). This came in recently in r289279. Coverity CID: 1331561	2015-10-29 19:02:24 +00:00
hselasky	881f337ccc	Add missing NULL check in physio(). When destroying a character device the si_devsw field is set to NULL before all references are gone, to indicate the character device is going away. This can cause a NULL-dereference fault inside physio(). The callers of physio() should own a thread reference on the cdev and if si_devsw is seen as non-NULL, it is usable during the execution of the function. Else an ENXIO error code is returned. Reviewed by: kib MFC after: 2 weeks	2015-10-29 13:53:37 +00:00
imp	20050b3d3c	Add a note to the effect that BUS_ADD_CHILD calls device_add_child_ordered to add the child. device_add_child_ordered doesn't call BUS_ADD_CHILD.	2015-10-28 18:53:18 +00:00
mckusick	485c0ddc22	Bring the tags and links entries for amd64 up to date. Based on how out of date it is, I doubt that anyone other than me and my code-reading students still use it.	2015-10-27 22:59:24 +00:00
pjd	10f57f392f	The aio_waitcomplete(2) syscall should not sleep when the given timeout is 0. Without this change it was sleeping for one tick. Maybe not a big deal, but it makes share/dtrace/blocking script to report that. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D3814 Sponsored by: Wheel Systems, http://wheelsystems.com	2015-10-25 18:48:09 +00:00
cem	51e3af66cf	Sysctl: Add common support for U8, U16 types Sponsored by: EMC / Isilon Storage Division	2015-10-22 23:03:06 +00:00
jhb	88ad316e08	Missing regen after last change to sys/kern/syscalls.master.	2015-10-22 21:30:39 +00:00
jhb	9740ac3060	Rename remaining linux32 symbols such as linux_sysent[] and linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with linux64.ko. While here, add support for linux64 binaries to systrace. - Update NOPROTO entries in amd64/linux/syscalls.master to match the main table to fix systrace build. - Add a special case for union l_semun arguments to the systrace generation. - The systrace_linux32 module now only builds the systrace_linux32.ko module on amd64. - Add a new systrace_linux module that builds on both i386 and amd64. For i386 it builds the existing systrace_linux.ko. For amd64 it builds a systrace_linux.ko for 64-bit binaries. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D3954	2015-10-22 21:28:20 +00:00
ed	8be8acd7af	Add a way to distinguish between forking and thread creation in schedtail. For CloudABI we need to initialize the registers of new threads differently based on whether the thread got created through a fork or through simple thread creation. Add a flag, TDP_FORKING, that is set by do_fork() and cleared by fork_exit(). This can be tested against in schedtail. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3973	2015-10-22 09:33:34 +00:00
kib	f2fc8e1816	Trim spaces at end of line to record the proper commit message for r289660: Do not allow to execute ptrace(PT_TRACE_ME) when the process is already traced. Do not allow to execute ptrace(PT_TRACE_ME) when there is no parent which can trace the process, i.e. when the parent is already init. Note that after the PT_TRACE_ME request the process is unkillable and non-continuable until a debugger is attached, or parent is killed, the latter clears P_TRACED state. Since init clearly would not debug the caller, and cannot be killed, disallow creation of unkillable processes. Reviewed by: jhb, pho Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D3908	2015-10-20 20:38:20 +00:00
kib	c517f862f2	Mark struct thread zone as type-stable. When establishing the locking state for several lock types (including blockable mutexes and sx) failed, locking primitives try to spin while the owner thread is running. The spinning loop performs the test for running condition by dereferencing the owner->td_state field of the owner thread. If the owner thread exited while spinner was put off the processor, it is harmless to access reused struct thread owner, since in some near future the current processor would notice the owner change and make appropriate progress. But it could be that the page which carried the freed struct thread was unmapped, then we fault (this cannot happen on amd64). For now, disallowing free of the struct thread seems to be good enough, and tests which create a lot of threads once, did not demonstrate regressions. Reviewed by: jhb, pho Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D3908	2015-10-20 20:29:21 +00:00

14687 Commits