freebsd-dev

Author	SHA1	Message	Date
Sean Bruno	1ee17b070d	Fix bug where malloc(.., M_NOWAIT) return value is not checked, Change to M_WAITOK and move outside the mutex Submitted by: shurd Reviewed by: mmacy@nextbsd.org MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7649	2016-10-11 14:08:53 +00:00
Mateusz Guzik	45571f8886	vfs: assert empty tmp free list on unmount	2016-10-08 13:38:05 +00:00
Mateusz Guzik	c6c44ff7eb	vfs: clear the tmp free list flag before taking the free vnode list lock Safe access is already guaranteed because of the mnt_listmx lock.	2016-10-08 13:36:59 +00:00
Konstantin Belousov	f71d08566c	Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation	2016-10-07 11:38:28 +00:00
Bryan Drewery	32641585a9	vrefl: Assert that the interlock is held. Sponsored by: Dell EMC Isilon MFC after: 2 weeks	2016-10-06 18:10:19 +00:00
Bryan Drewery	5a22c9582c	Add vrecyclel() to vrecycle() a vnode with the interlock already held. Obtained from: OneFS Sponsored by: Dell EMC Isilon MFC after: 2 weeks	2016-10-06 18:09:22 +00:00
Conrad Meyer	f43292ecf4	vfs_bio: Remove a leading space (style) Introduced in r282085. Sponsored by: Dell EMC Isilon	2016-10-05 23:42:02 +00:00
Bryan Drewery	0617f64ec6	Correct some comments after r294299. Sponsored by: Dell EMC Isilon	2016-10-04 21:44:20 +00:00
Ed Maste	65eea7ede6	ANSIfy inflate.c Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D8143	2016-10-04 17:57:30 +00:00
Konstantin Belousov	5420f76b59	Style. Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-10-04 15:23:03 +00:00
Mateusz Guzik	4876636eb7	cache: ignore purgevfs requests for filesystems with few vnodes purgevfs is purely optional and induces lock contention in workloads which frequently mount and unmount filesystems. In particular, poudriere will do this for filesystems with 4 vnodes or less. Full cache scan is clearly wasteful. Since there is no explicit counter for namecache entries, the number of vnodes used by the target fs is checked. The default limit is the number of bucket locks. Reviewed by: kib	2016-10-03 00:02:32 +00:00
Mateusz Guzik	5bb81f9b2d	vfs: batch free vnodes in per-mnt lists Previously free vnodes would always be directly returned to the global LRU list. With this change up to mnt_free_list_batch vnodes are collected first. syncer runs always return the batch regardless of its size. While vnodes on per-mnt lists are not counted as free, they can be returned in case of vnode shortage. Reviewed by: kib Tested by: pho	2016-09-30 17:27:17 +00:00
Mateusz Guzik	8660b707ff	vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib	2016-09-30 17:11:03 +00:00
Gleb Smirnoff	7ed6b78b92	Provide kern.maxphys sysctl, which returns MAXPHYS. Naming matches NetBSD.	2016-09-29 23:07:28 +00:00
Allan Jude	0176ca2ed5	Allow reading the following sysctl MIBs in capability mode: kern.hostname, kern.domainname, and kern.hostuuid This allows sandboxed applications to read these sysctls Submitted by: cem (original version) Reviewed by: cem, jonathan, rwatson (original version) Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D8015	2016-09-29 16:29:49 +00:00
Hans Petter Selasky	99eca1b2b3	While draining a timeout task prevent the taskqueue_enqueue_timeout() function from restarting the timer. Commonly taskqueue_enqueue_timeout() is called from within the task function itself without any checks for teardown. Then it can happen the timer stays active after the return of taskqueue_drain_timeout(), because the timeout and task is drained separately. This patch factors out the teardown flag into the timeout task itself, allowing existing code to stay as-is instead of applying a teardown flag to each and every of the timeout task consumers. Add assert to taskqueue_drain_timeout() which prevents parallel execution on the same timeout task. Update manual page documenting the return value of taskqueue_enqueue_timeout(). Differential Revision: https://reviews.freebsd.org/D8012 Reviewed by: kib, trasz MFC after: 1 week	2016-09-29 10:38:20 +00:00
Hiren Panchasara	7c9a4d09d6	Revert r306337. dhw@ reported a panic which seems related to this and bde@ has raised some issues.	2016-09-26 15:45:30 +00:00
Eric van Gyzen	310ab671b8	Make no assertions about mutex state when the scheduler is stopped. This changes the assert path to match the lock and unlock paths. MFC after: 1 week Sponsored by: Dell EMC	2016-09-26 15:30:30 +00:00
Hiren Panchasara	41bb1a25a9	In sendit(), if mp->msg_control is present, then in sockargs() we are allocating mbuf to store mp->msg_control. Later in kern_sendit(), call to getsock_cap(), will check validity of file pointer passed, if this fails EBADF is returned but mbuf allocated in sockargs() is not freed. Fix this possible leak. Submitted by: Lohith Bellad <lohith.bellad@me.com> Reviewed by: adrian MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D7910	2016-09-26 10:13:58 +00:00
Julian Elischer	1c8260b61d	Give the user a clue as to which process hit maxfiles. MFC after: 1 week Sponsored by: Panzura	2016-09-24 22:56:13 +00:00
Konstantin Belousov	939457e3e0	Add the foundation copyrights to procctl kernel sources. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-23 12:32:20 +00:00
Mariusz Zaborski	ad5e83dd3c	fd: fix up fget_cap If the kernel is not compiled with the CAPABILITIES kernel option, fget_unlocked doesn't return the sequence number, so fd_modify will always report modification; in that case we get an infinite loop. Reported by: br Reviewed by: mjg Tested by: br, def	2016-09-23 08:13:46 +00:00
Mateusz Guzik	deffc4a026	fd: fix up fgetvp_rights after r306184 fget_cap_locked returns a referenced file, but the fgetvp_rights does not need it. Instead, due to the filedesc lock being held, it can ref the vnode after the file was looked up. Fix up fget_cap_locked to be consistent with other _locked helpers and not ref the file. This plugs a leak introduced in r306184. Pointy hat to: mjg, oshogbo	2016-09-23 06:51:46 +00:00
Mateusz Guzik	1d2541fd1a	cache: get rid of the global lock Add a table of vnode locks and use them along with bucketlocks to provide concurrent modification support. The approach taken is to preserve the current behaviour of the namecache and just lock all relevant parts before any changes are made. Lookups still require the relevant bucket to be locked. Discussed with: kib Tested by: pho	2016-09-23 04:45:11 +00:00
Gleb Smirnoff	a2d8f9d2fc	Fix regression from r297400, which truncates headers in case of low socket buffer and put a small optimization for low socket buffer case: - Do not hack uio_resid, and let m_uiotombuf() properly take care of it. This fixes truncation of headers at low buffer. - If headers ate all the space, jump right to the end of the cycle, to avoid doing single page I/O and allocating zero length mbuf. - Clear hdr_uio only if space is positive, which indicates that all uio was copied in. Reviewed by: pluknet, jtl, emax, rrs, lstewart, emax, gallatin, scottl	2016-09-22 20:34:44 +00:00
Ruslan Bukin	30f3bfe58e	Adjust the sopt_val pointer on bigendian systems (e.g. MIPS64EB). sooptcopyin() checks if size of data provided by user is <= than we can accept, else it strips down the size. On bigendian platforms we have to move pointer as well so we copy the actual data. Reviewed by: gnn Sponsored by: DARPA, AFRL Sponsored by: HEIF5 Differential Revision: https://reviews.freebsd.org/D7980	2016-09-22 12:41:53 +00:00
Mariusz Zaborski	6490bc6529	fd: simplify fgetvp_rights by using fget_cap_locked Reviewed by: mjg	2016-09-22 11:54:20 +00:00
Mariusz Zaborski	85b0f9de11	capsicum: propagate rights on accept(2) Descriptor returned by accept(2) should inherit capability rights from the listening socket. PR: 201052 Reviewed by: emaste, jonathan Discussed with: many Differential Revision: https://reviews.freebsd.org/D7724	2016-09-22 09:58:46 +00:00
Mark Johnston	bdaf6d6913	Regenerate syscall provider argument strings.	2016-09-22 04:50:03 +00:00
Mark Johnston	5a4dfc8d83	Annotate syscall provider pointer arguments with the "userland" keyword. This causes dtrace to automatically copyin arguments from userland, so one no longer has to explicitly use the copyin() action to do so. Moreover, copyin() on userland addresses is a no-op, so existing scripts should be unaffected by this change. Discussed with: rstone MFC after: 2 weeks	2016-09-22 04:49:31 +00:00
Konstantin Belousov	851194715d	Make resettodr_lock accessible outside subr_rtc.c. Protect CLOCK_GETTIME() with the lock. Now all time-related accesses to the CMOS for RTC should be under the lock. This is needed to allow upcoming EFI Runtime Services support to provide required execution environment for the firmware calls. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-21 10:15:08 +00:00
Konstantin Belousov	643f6f47fd	Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap. Both can be used to cause processes in capability mode to receive SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from syscalls. Idea by: emaste Reviewed by: oshogbo (previous version), emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7965	2016-09-21 08:23:33 +00:00
Edward Tomasz Napierala	e313b4dd95	Fix bug introduced with r302388, which could cause processes accessing automounted shares to hang with "vfs_busy" wchan. (As a workaround one can run 'automount -u' from cron.) Reviewed by: kib@ MFC after: 1 month	2016-09-21 05:44:13 +00:00
Sepherosa Ziehau	a5ec35dfee	Fix LINT building. Sponsored by: Microsoft	2016-09-18 07:37:00 +00:00
Ed Maste	69a2875821	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
Kevin Lo	c3bef61e58	Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878	2016-09-15 07:41:48 +00:00
Mariusz Zaborski	6e70b4f058	fd: add fget_cap and fget_cap_locked primitives They can be used to obtain capabilities along with a referenced fp. Reviewed by: mjg@	2016-09-12 22:46:19 +00:00
John Baldwin	71499f6a2d	Make device_quiet() an attachment property. In particular, reset the DF_QUIET flag when detaching from a device so that a driver that marks a device quiet doesn't dictate policy for a different driver that may claim the device in the future. Reviewed by: rpokala, wblock MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7803	2016-09-12 18:06:42 +00:00
Mateusz Guzik	a27815330c	cache: improve scalability by introducing bucket locks An array of bucket locks is added. All modifications still require the global cache_lock to be held for writing. However, most readers only need the relevant bucket lock and in effect can run concurrently to the writer as long as they use a different lock. See the added comment for more details. This is an intermediate step towards removal of the global lock. Reviewed by: kib Tested by: pho	2016-09-10 16:29:53 +00:00
Konstantin Belousov	2e4fd101fa	Fix build	2016-09-10 09:00:12 +00:00
Jilles Tjoelker	d30e66e53a	wait: Do not copyout uninitialized status/rusage/wrusage. If wait4() or wait6() return 0 because of WNOHANG, the status, rusage and wrusage information should not be returned. PR: 212048 Reported by: Casey Lucas MFC after: 2 weeks	2016-09-09 21:58:48 +00:00
Mateusz Guzik	a0d45f0fc8	locks: add backoff for spin mutexes and thread lock Reviewed by: jhb	2016-09-09 19:13:02 +00:00
Ed Maste	82b3cec52b	ANSIfy uipc_syscalls.c Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7839	2016-09-09 17:40:26 +00:00
Ed Maste	e62264e2dd	Update capabilities.conf comment getdtablesize is per-process state, not global state	2016-09-08 14:04:04 +00:00
Kevin Lo	cee4a05669	In m_devget(), if the data fits in a packet header mbuf, check the amount of data is less than or equal to MHLEN instead of MLEN when placing initial small packet header at end of mbuf. Reviewed by: glebius MFC after: 3 days	2016-09-08 01:02:53 +00:00
Brooks Davis	ed6d876b19	Modernize the initialization of sigproptbl. Use C99 designators to set the value of each slot and the nitems macro to check for valid entries. In the process, switch to indexing by signal number rather than signal-1 for improved clarity. Obtained from: CheriBSD (`a6053c5abf`) Sponsored by: DARPA, AFRL Reviewed by: kib	2016-09-06 22:03:53 +00:00
Mateusz Guzik	5b7d9ae2fd	cv: do a lockless check for no waiters in cv_signal and cv_broadcastpri In case of some consumers like zfs there are no waiters the vast majority of the time Reviewed by: jhb MFC after: 1 week	2016-09-06 17:16:59 +00:00
Mateusz Guzik	591df14528	cache: defer freeing entries until after the global lock is dropped This also defers vdrop for held vnodes. Glanced at by: kib	2016-09-04 16:52:14 +00:00
Mateusz Guzik	31977b420a	cache: manage negative entry list with a dedicated lock Since negative entries are managed with a LRU list, a hit requires a modification. Currently the code tries to upgrade the global lock if needed and is forced to retry the lookup if it fails. Provide a dedicated lock for use when the cache is only shared-locked. Reviewed by: kib MFC after: 1 week	2016-09-04 08:58:35 +00:00
Mateusz Guzik	b9042ae1bf	cache: put all negative entry management code into dedicated functions Reviewed by: kib MFC after: 1 week	2016-09-04 08:55:15 +00:00
Mark Johnston	3da0f3c9ae	Micro-optimize sleepq_signal(). Lift a comparison out of the loop that finds the highest-priority thread on the queue. MFC after: 1 week	2016-09-04 00:29:48 +00:00
Brooks Davis	fd50a70770	Merge from CheriBSD: Rename sigprop-table constants to SIGPROP_ from SA_ to reduce the impression of a namespace collision. Submitted by: rwatson Reviewed by: jhb, kib (slightly different versions) Obtained from: CheriBSD (`814ec5771c`) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7616	2016-09-02 18:22:56 +00:00
Ed Maste	dd38731e09	allow kern.proc.nfds sysctl in capability mode Reviewed by: allanjude MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7733	2016-09-01 02:51:50 +00:00
Patrick Kelsey	da2ded6575	_taskqueue_start_threads() now fails if it doesn't actually start any threads. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D7701	2016-09-01 02:05:46 +00:00
Mark Johnston	99ab95db4d	Rename unp_dispose_so() to unp_dispose(). It implements the dom_dispose method for local socket domain, so its name should match the method name.	2016-08-31 21:48:22 +00:00
Ed Maste	bce38b9f35	Regenerate after r305140, getdtablesize in capability mode Sponsored by: The FreeBSD Foundation	2016-08-31 18:37:51 +00:00
Ed Maste	ca380195ab	Allow getdtablesize in capability mode getdtablesize is "trivial global state" and is similar to getrlimit(RLIMIT_NOFILE), so should be permitted in capability mode. Reviewed by: oshogbo MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7719	2016-08-31 18:33:15 +00:00
Allan Jude	61bd7ae0ec	Eliminate unnecessary loop in _cap_check() Calling cap_rights_contains() several times with the same inputs is not going to produce a different output. The variable being iterated, i, is never used inside the for loop. The loop is actually done in cap_rights_contains() Submitted by: Ryan Moeller <ryan@freqlabs.com> Reviewed by: oshogbo, ed MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7369	2016-08-31 17:52:11 +00:00
Nathan Whitehorn	09c697016b	Back out misfired extra file in r305108.	2016-08-31 04:03:55 +00:00
Nathan Whitehorn	c9a124dc9a	Refix operation on sparse CPU mappings as in r302372, temporarily broken by r304716. PR: kern/210106 MFC after: 2 days	2016-08-31 04:02:52 +00:00
Mateusz Guzik	4cbafea09c	fd: add fdeget_locked and use in kern_descrip	2016-08-30 21:53:22 +00:00
Bryan Drewery	533f3e1026	Reduce duplicated logic for !SMP Sponsored by: EMC / Isilon Storage Division	2016-08-30 19:26:07 +00:00
John Baldwin	e05ec081fe	Implement 'devctl clear driver' to undo a previous 'devctl set driver'. Add a new 'clear driver' command for devctl along with the accompanying ioctl and devctl_clear_driver() library routine to reset a device to use a wildcard devclass instead of a fixed devclass. This can be used to undo a previous 'set driver' command. After the device's name has been reset to permit wildcard names, it is reprobed so that it can attach to newly-available (to it) device drivers. MFC after: 1 month Sponsored by: Chelsio Communications	2016-08-29 22:48:36 +00:00
Mateusz Guzik	11d3ad2eab	vfs: provide a common exit point in namei for error cases This shortens the function, adds the SDT_PROBE use for error cases and consistently unrefs rootdir last. Reviewed by: kib MFC after: 2 weeks	2016-08-27 22:43:41 +00:00
Konstantin Belousov	9ce60e28fd	Consistently delimit each vnode description block with two blank lines. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-08-27 18:12:42 +00:00
Konstantin Belousov	0f2d97838d	In both do_rw_wrlock() and do_rw_rdlock() after r304808, do not obliterate possible error from sleep with errors from umtxq_check_susp(), when looping to clear URWLOCK_{READ,WRITE}_WAITERS. Noted and reviewed by: vangyzen Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-25 19:15:02 +00:00
Konstantin Belousov	28e21133f3	Prevent leak of URWLOCK_READ_WAITERS flag for urwlocks. If there was some error, e.g. the sleep was interrupted, as in the referenced PR, do_rw_rdlock() did not clear URWLOCK_READ_WAITERS. Since unlock only wakes up write waiters when there are no read waiters, for URWLOCK_PREFER_READER kind of locks, the result was missed wakeups for writers. In particular, the most visible victims are ld-elf.so locks in processes which loaded libthr, because rtld locks are urwlocks in prefer-reader mode. Normal rwlocks fall into prefer-reader mode only if thread already owns rw lock in read mode, which is not typical and correspondingly less visible. In the PR, unowned rtld bind lock was waited for in the process where only one thread was left alive. Note that do_rw_wrlock() correctly clears URWLOCK_WRITE_WAITERS in case of errors. Reported and tested by: longwitz@incore.de PR: 211947 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-25 16:35:42 +00:00
Bruce Evans	d350ce61cf	Less-quick fix for locking fixes in r172250. r172250 added a second syscons spinlock for the output routine alone. It is better to extend the coverage of the first syscons spinlock added in r162285. 2 locks might work with complicated juggling, but no juggling was done. What the 2 locks actually did was to cover some of the missing locking in each other and deadlock less often against each other than a single lock with larger coverage would against itself. Races are preferable to deadlocks here, but 2 locks are still worse since they are harder to understand and fix. Prefer deadlocks to races and merge the second lock into the first one. Extend the scope of the spinlocking to all of sc_cnputc() instead of just the sc_puts() part. This further prefers deadlocks to races. Extend the kdb_active hack from sc_puts() internals for the second lock to all spinlocking. This reduces deadlocks much more than the other changes increase them. The s/p,10* test in ddb gets much further now. Hide this detail in the SC_VIDEO_LOCK() macro. Add namespace pollution in 1 nested #include and reduce namespace pollution in other nested #includes to pay for this. Move the first lock higher in the witness order. The second lock was unnaturally low and the first lock was unnaturally high. The second lock had to be above "sleepq chain" and/or "callout" to avoid spurious LORs for visual bells in sc_puts(). Other console driver locks are already even higher (but not adjacent like they should be) except when they are missing from the table. Audio bells also benefit from the syscons lock being high so that audio mutexes have a chance of being lower. Otherwise, console driver locks should be as low as possible. Non-spurious LORs now occur if the bell code calls printf() or is interrupted (perhaps by an NMI) and the interrupt handler calls printf(). Previous commits turned off many bells in console i/o but missed ones done by the teken layer.	2016-08-25 13:46:52 +00:00
Robert Watson	70a98c110e	Audit the accepted (or rejected) username argument to setlogin(2). (NB: This was likely a mismerge from XNU in audit support, where the text argument to setlogin(2) is captured -- but as a text token, whereas this change uses the dedicated login-name field in struct audit_record.) MFC after: 2 weeks Sponsored by: DARPA, AFRL	2016-08-20 20:28:08 +00:00
Robert Watson	c3c0088bb0	Audit additional vnode information in the implementation of the ftruncate(2) system call. This was not required by the Common Criteria, which needed only open-time audit. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2016-08-20 18:51:48 +00:00
Mark Johnston	e5574e0966	Don't set P2_PTRACE_FSTP in a process that invokes ptrace(PT_TRACE_ME). Such processes are stopped synchronously by a direct call to ptracestop(SIGTRAP) upon exec. P2_PTRACE_FSTP causes the exec()ing thread to suspend itself while waiting for a SIGSTOP that never arrives. Reviewed by: kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D7576	2016-08-19 17:57:14 +00:00
Michal Meloun	895c8b1c39	INTRNG: Rework handling with resources. Partially revert r301453. - Read interrupt properties at bus enumeration time and store it into global mapping table. - At bus_activate_resource() time, given mapping entry is resolved and connected to real interrupt source. A copy of mapping entry is attached to given resource. - At bus_setup_intr() time, mapping entry stored in resource is used for delivery of requested interrupt configuration. - For MSI/MSIX interrupts, mapping entry is created within pci_alloc_msi()/pci_alloc_msix() call. - For legacy PCI interrupts, mapping entry must be created within pcib_route_interrupt() by pcib driver itself. Reviewed by: nwhitehorn, andrew Differential Revision: https://reviews.freebsd.org/D7493	2016-08-19 10:52:39 +00:00
Mark Johnston	7f649dda55	Correct a check for P2_PTRACE_FSTP in ptracestop(). MFC after: 1 day	2016-08-19 01:27:24 +00:00
George V. Neville-Neil	3e7e23332f	Remove the obsolete and unused openbsd_poll system call. (Phase 2) Reported by: brooks Reviewed by: brooks, jhb Differential Revision: https://reviews.freebsd.org/D7548	2016-08-18 10:54:39 +00:00
George V. Neville-Neil	5cba398b0c	Remove unused and obsolete openbsd_poll system call. (Phase 1) Reported by: brooks Reviewed by: brooks,jhb Differential Revision: https://reviews.freebsd.org/D7548	2016-08-18 10:50:40 +00:00
Bryan Drewery	b387915115	Garbage collect _umtx_lock(2)/_umtx_unlock(2) references removed in r263318. This has no real impact on the resulting libc.so file. MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2016-08-17 10:20:05 +00:00
Konstantin Belousov	e2a18110f0	Remove duplicated code. aio_aqueue() calls aio_init_aioinfo() as the first action. There is no need to duplicate the code in kern_aio_fsync(). Also fix indent for aio_aqueue() definition. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7523	2016-08-17 10:14:22 +00:00
Konstantin Belousov	1680854946	Implement userspace gettimeofday(2) with HPET timecounter. Right now, userspace (fast) gettimeofday(2) on x86 only works for RDTSC. For older machines, like Core2, where RDTSC is not C2/C3 invariant, and which fall to HPET hardware, this means that the call has both the penalty of the syscall and of the uncached hw behind the QPI or PCIe connection to the sought bridge. Nothing can be done against the access latency, but the syscall overhead can be removed. System already provides mappable /dev/hpetX devices, which gives straight access to the HPET registers page. Add yet another algorithm to the x86 'vdso' timehands. Libc is updated to handle both RDTSC and HPET. For HPET, the index of the hpet device to mmap is passed from kernel to userspace, index might be changed and libc invalidates its mapping as needed. Remove cpu_fill_vdso_timehands() KPI, instead require that timecounters which can be used from userspace, to provide tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64 libc/<arch>/sys/__vdso_gettc.c into one source file in the new libc/x86/sys location. __vdso_gettc() internal interface is changed to move timecounter algorithm detection into the MD code. Measurements show that RDTSC even with the syscall overhead is faster than userspace HPET access. But still, userspace HPET is three-four times faster than syscall HPET on several Core2 and SandyBridge machines. Tested by: Howard Su <howard0su@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7473	2016-08-17 09:52:09 +00:00
Gleb Smirnoff	dc4ee9a895	Fix a stupid typo (or copy/paste buffer malfunction).	2016-08-16 23:00:22 +00:00
Gleb Smirnoff	c0f50fa012	We should not be allowing a timeout to reset when a drain is in progress on it (either async or sync drain). At this moment the only user of drain is TCP, but TCP wouldn't reschedule a callout after it has drained it, since it drains only when a tcpcb is closed. Thus for now the problem isn't observed. Submitted by: rrs	2016-08-16 21:55:34 +00:00
Ed Schouten	93d9ebd82e	Eliminate use of sys_fsync() and sys_fdatasync(). Make the kern_fsync() function public, so that it can be used by other parts of the kernel. Fix up existing consumers to make use of it. Requested by: kib	2016-08-15 20:11:52 +00:00
Eric Badger	b0f2185bbe	sem_post(): wake up the sleeper only after adjusting has_waiters If the caller of sem_post() wakes up a thread sleeping via sem_wait() before it clears the has_waiters flag, the caller of sem_wait() has no way of knowing when it is safe to destroy the semaphore and reuse the memory. This is because the caller of sem_post() may be interrupted between the wake step and the clearing of has_waiters. It will then write into the has_waiters flag in userspace after being preempted for some unknown amount of time. Reviewed by: jhb, kib, vangyzen Approved by: kib (mentor), vangyzen (mentor) MFC after: 2 weeks Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D7505	2016-08-15 20:09:09 +00:00
Konstantin Belousov	47e61f6cc6	Implement VOP_FDATASYNC() for msdosfs. Standard VOP_FSYNC() implementation just syncs data buffers, and due to this, is the correct and efficient implementation for msdosfs or any other filesystem which uses buffer cache trivially. Provide globally visible wrapper vop_stdfdatasync_buf() for future consumption by other filesystems. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471	2016-08-15 19:17:00 +00:00
Konstantin Belousov	1d2537a26a	Regen after r304176, fdatasync(2) addition.	2016-08-15 19:15:46 +00:00
Konstantin Belousov	295af703a0	Add an implementation of fdatasync(2). The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing code with fsync(2). For all filesystems, this commit provides the implementation which delegates the work of VOP_FDATASYNC() to VOP_FSYNC(). This is functionally correct but not efficient. This is not yet POSIX-compliant implementation, because it does not ensure that queued AIO requests are completed before returning. Reviewed by: mckusick Discussed with: avg (ZFS), jhb (AIO part) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471	2016-08-15 19:08:51 +00:00
Konstantin Belousov	c73fb33115	VOP_FSYNC() does not take cred as an argument. Correct comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-15 18:55:33 +00:00
Alan Cox	ce3ee09b53	Eliminate unneeded vm_page_xbusy() and vm_page_xunbusy() operations when neither vm_pager_has_page() nor vm_pager_get_pages() is called. Reviewed by: kib, markj MFC after: 3 weeks	2016-08-14 22:00:45 +00:00
Bruce Evans	99061149a3	Print the tid of curthread in "show pcpu" in ddb. It was remarkably hard to trace all current threads. "show pcpu" only showed the pid, and there was nothing (?) better than searching ps output to find the tids on CPUs. This change simplifies the search, but you still have to trace the tid for each CPU manually.	2016-08-14 15:52:00 +00:00
Alan Cox	fc9bbf2794	Eliminate two calls to vm_page_xunbusy() that are both unnecessary and incorrect from the error cases in exec_map_first_page(). They are unnecessary because we automatically unbusy the page in vm_page_free() when we remove it from the object. The calls are incorrect because they happen after the page is freed, so we might actually unbusy the page after it has been reallocated to a different object. (This error was introduced in r292373.) Reviewed by: kib MFC after: 1 week	2016-08-13 18:10:32 +00:00
Edward Tomasz Napierala	f8acef5a3e	Remove unused "X" vnode lock assertion, somehow missed in r303743. MFC after: 1 month	2016-08-12 22:22:11 +00:00
Edward Tomasz Napierala	f83cc0aae4	Print vnode details when vnode locking assertion gets triggered. MFC after: 1 month	2016-08-12 22:20:52 +00:00
Stephen Hurd	23ac9029f9	Update iflib to support more NIC designs - Move group task queue into kern/subr_gtaskqueue.c - Change intr_enable to return an int so it can be detected if it's not implemented - Allow different TX/RX queues per set to be different sizes - Don't split up TX mbufs before transmit - Allow a completion queue for TX as well as RX - Pass the RX budget to isc_rxd_available() to allow an earlier return and avoid multiple calls Submitted by: shurd Reviewed by: gallatin Approved by: scottl Differential Revision: https://reviews.freebsd.org/D7393	2016-08-12 21:29:44 +00:00
Mark Johnston	5004817335	Remove b_pin_count from struct buf. It was added in r153192 for XFS and doesn't appear to have been used for anything else. XFS was disconnected in r241607 and removed entirely in r247631. Reported by: mlaier Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D7468	2016-08-11 07:58:23 +00:00
Edward Tomasz Napierala	411455a8fb	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month	2016-08-10 16:12:31 +00:00
Mateusz Guzik	7c34b35b57	ktrace: do a lockless check on fork to see if tracing is enabled This saves 2 lock acquisitions in the common case.	2016-08-10 15:25:44 +00:00
Mateusz Guzik	382172be68	sigio: do a lockless check in funsetownlist There is no need to grab the lock first to see if sigio is used, and it typically is not.	2016-08-10 15:24:15 +00:00
Konstantin Belousov	3a77833e87	Fix indentation. Reported by: hselasky MFC after: 17 days	2016-08-10 14:41:53 +00:00
Konstantin Belousov	49c394a970	Re-schedule signals after kthread exits, since apparently there are processes which combine kernel and non-kernel threads, e.g. nfsd. For such processes, termination of a kthread must recheck signal delivery among other threads according to masks. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-10 13:47:12 +00:00
Jean-Sébastien Pédron	bd937497ea	Consistently use `device_t` Several files use the internal name of `struct device` instead of `device_t` which is part of the public API. This patch changes all `struct device *` to `device_t`. The remaining occurrences of `struct device` are those referring to the Linux or OpenBSD version of the structure, or the code is not built on FreeBSD and it's unclear what to do. Submitted by: Matthew Macy <mmacy@nextbsd.org> (previous version) Approved by: emaste, jhibbits, sbruno MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D7447	2016-08-09 19:32:06 +00:00
Stephen J. Kiernan	0ce1624d0e	Move IPv4-specific jail functions to new file netinet/in_jail.c _prison_check_ip4 renamed to prison_check_ip4_locked Move IPv6-specific jail functions to new file netinet6/in6_jail.c _prison_check_ip6 renamed to prison_check_ip6_locked Add appropriate prototypes to sys/sys/jail.h Adjust kern_jail.c to call prison_check_ip4_locked and prison_check_ip6_locked accordingly. Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that need to be built when INET and INET6, respectively, are configured in the kernel configuration file. Reviewed by: jtl Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D6799	2016-08-09 02:16:21 +00:00
Mark Johnston	434ac8b6b7	Handle races with listening socket close when connecting a unix socket. If the listening socket is closed while sonewconn() is executing, the nascent child socket is aborted, which results in recursion on the unp_link lock when the child's pru_detach method is invoked. Fix this by using a flag to mark such sockets, and skip a part of the socket's teardown during detach. Reported by: Raviprakash Darbha <rdarbha@juniper.net> Tested by: pho MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7398	2016-08-08 20:25:04 +00:00
Bryan Drewery	c1fa440409	Regenerate after r303755. MFC after: 3 days X-MFC-With: r303755 Sponsored by: EMC / Isilon Storage Division	2016-08-04 19:15:51 +00:00
Bryan Drewery	417d5dec39	Still provide freebsd10_* symbols from libc for COMPAT10. r296773 was done to only remove libc symbols for <7. We want to provide the syscall symbols going forward for 7+. Discussed with: jhb MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2016-08-04 19:14:18 +00:00
Edward Tomasz Napierala	7b255097eb	Remove unused - never actually implemented - vnode lock types from vnode_if.src. MFC after: 1 month	2016-08-04 13:45:18 +00:00
Bryan Drewery	78be18ae6e	Correct some comments. Sponsored by: EMC / Isilon Storage Division MFC after: 3 days	2016-08-03 18:48:56 +00:00
Mateusz Guzik	0453ade508	locks: fix sx compilation on mips after r303643 The kernel.h header is required for the SYSINIT macro, which apparently was present on amd64 by accident. Reported by: kib	2016-08-03 09:15:10 +00:00
Konstantin Belousov	e69ba32f88	Remove mention of the Giant from the fork_return() description. Making emphasis on this lock in the core function comment is confusing for the modern kernel. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-08-03 07:10:09 +00:00
Ed Schouten	e938ebbc0c	Regenerate system call tables for r303699 and r303700.	2016-08-03 06:36:45 +00:00
Ed Schouten	1b42875d4b	Re-add trailing slash that was removed in r303699. I must have accidentally pressed some random key in vim.	2016-08-03 06:35:58 +00:00
Ed Schouten	a813fdc6c3	mprotect(): Change prototype to comply to POSIX. Our mprotect() function seems to take a "const void *" address to the pages whose permissions need to be adjusted. POSIX uses "void *". Simply stick to the POSIX one to prevent us from writing unportable code. PR: 211423 (exp-run) Tested by: antoine@ (Thanks!)	2016-08-03 06:33:04 +00:00
Mateusz Guzik	fa5000a4f3	locks: fix compilation for KDTRACE_HOOKS && !ADAPTIVE_* case Reported by: Michael Butler <imb protected-networks.net>	2016-08-02 03:05:59 +00:00
Mateusz Guzik	0412689595	locks: fix up ifdef guards introduced in r303643 Both sx and rwlocks had copy-pasted ADAPTIVE_MUTEXES instead of the correct define. MFC after: 1 week	2016-08-02 00:15:08 +00:00
Mateusz Guzik	1ada904147	Implement trivial backoff for locking primitives. All current spinning loops retry an atomic op the first chance they get, which leads to performance degradation under load. One classic solution to the problem consists of delaying the test to an extent. This implementation has a trivial linear increment and a random factor for each attempt. For simplicity, this first touch implementation only modifies spinning loops where the lock owner is running. spin mutexes and thread lock were not modified. Current parameters are autotuned on boot based on mp_cpus. Autotune factors are very conservative and are subject to change later. Reviewed by: kib, jhb Tested by: pho MFC after: 1 week	2016-08-01 21:48:37 +00:00
Mateusz Guzik	61852185ba	locks: change sleep_cnt and spin_cnt types to u_int Both variables are uint64_t, but they only count spins or sleeps. All reasonable values which we can get here comfortably hit in 32-bit range. Suggested by: kib MFC after: 1 week	2016-07-31 12:11:55 +00:00
Mateusz Guzik	e0c45af904	sx: increment spin_cnt before cpu_spinwait in xlock The change is a no-op only done for consistency with the rest of the file.	2016-07-30 22:23:31 +00:00
Mateusz Guzik	7a54be1870	rwlock: s/READER/WRITER/ in wlock lockstat annotation	2016-07-30 22:21:48 +00:00
Konstantin Belousov	50c22263bb	Cache getbintime(9) answer in timehands, similarly to getnanotime(9) and getmicrotime(9). Suggested and reviewed by: bde (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 month	2016-07-30 09:25:57 +00:00
John Baldwin	e2325d8285	Don't treat NOCPU as a valid CPU to CPU_ISSET. If a thread is created bound to a cpuset it might already be bound before its very first timeslice, and td_lastcpu will be NOCPU in that case. MFC after: 1 week	2016-07-29 20:19:14 +00:00
John Baldwin	005ce8e4e6	Fix locking issues with aio_fsync(). - Use correct lock in aio_cancel_sync when dequeueing job. - Add _locked variants of aio_set/clear_cancel_function and use those to avoid lock recursion when adding and removing fsync jobs to the per-process sync queue. - While here, add a basic test for aio_fsync(). PR: 211390 Reported by: Randy Westlund <rwestlun@gmail.com> MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7339	2016-07-29 18:26:15 +00:00
Brooks Davis	40018b91dd	Don't create pointless backups of generated files in "make sysent". Any sensible workflow will include a revision control system from which to restore the old files if required. In normal usage, developers just have to clean up the mess. Reviewed by: jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7353	2016-07-28 21:29:04 +00:00
Ed Schouten	5590eb985e	Regenerate system call table for r303435.	2016-07-28 12:22:34 +00:00
Ed Schouten	d9c4cd2fbc	Change the return type of msgrcv() to ssize_t as required by POSIX. It looks like the msgrcv() system call is already written in such a way that the size is internally computed as a size_t and written into all of td_retval[0]. This means that it is effectively already returning ssize_t. It's just that the userspace prototype doesn't match up.	2016-07-28 12:22:01 +00:00
Konstantin Belousov	2d19b736ed	Rewrite subr_sleepqueue.c use of callouts to not depend on the specifics of callout KPI. Esp., do not depend on the exact interface of callout_stop(9) return values. The main change is that instead of requiring precise callouts, code maintains absolute time to wake up. Callouts now should ensure that a wake occurs at the requested moment, but we can tolerate both run-away callout, and callout_stop(9) lying about running callout either way. As consequence, it removes the constant source of the bugs where sleepq_check_timeout() causes uninterruptible thread state where the thread is detached from CPU, see e.g. r234952 and r296320. Patch also removes dual meaning of the TDF_TIMEOUT flag, making code (IMO much) simpler to reason about. Tested by: pho Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7137	2016-07-28 09:09:55 +00:00
Konstantin Belousov	a9e182e895	Extract the calculation of the callout fire time into the new function callout_when(9). See the man page update for the description of the intended use. Tested by: pho Reviewed by: jhb, bjk (man page updates) Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7137	2016-07-28 08:57:01 +00:00
Konstantin Belousov	b7a25e63b6	When a debugger attaches to the process, SIGSTOP is sent to the target. Due to a way issignal() selects the next signal to deliver and report, if the simultaneous or already pending another signal exists, that signal might be reported by the next waitpid(2) call. This causes minor annoyance for debuggers, which must be prepared to take any signal as the first event, then filter SIGSTOP later. More importantly, for tools like gcore(1), which attach and then detach without processing events, SIGSTOP might leak to be delivered after PT_DETACH. This results in the process being unintentionally stopped after detach, which is fatal for automatic tools. The solution is to force SIGSTOP to be the first signal reported after the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate that the attaching ritual was not yet finished, and issignal() prefers SIGSTOP in that condition. Also, the thread which handles P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first waitpid(2). All that ensures that SIGSTOP is consumed first. Additionally, if P2_PTRACE_FSTP is still set on detach, which means that waitpid(2) was not called at all, SIGSTOP is removed from the queue, ensuring that the process is resumed on detach. In issignal(), when acting on STOPing signals, remove the signal from queue before suspending. Otherwise parallel attach could result in ptracestop() acting on that STOP as if it was the STOP signal from the attach. Then SIGSTOP from attach leaks again. As a minor refactoring, some bits of the common attach code is moved to new helper proc_set_traced(). Reported by: markj Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7256	2016-07-28 08:41:13 +00:00
Stephen J. Kiernan	4ac21b4f09	Prepare for network stack as a module - Move cr_canseeinpcb to sys/netinet/in_prot.c in order to separate the INET and INET6-specific code from the rest of the prot code (It is only used by the network stack, so it makes sense for it to live with the other network stack code.) - Move cr_canseeinpcb prototype from sys/systm.h to netinet/in_systm.h - Rename cr_seeotheruids to cr_canseeotheruids and cr_seeothergids to cr_canseeothergids, make them non-static, and add prototypes (so they can be seen/called by in_prot.c functions.) - Remove sw_csum variable from ip6_forward in ip6_forward.c, as it is an unused variable. Reviewed by: gnn, jtl Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2901	2016-07-27 20:34:09 +00:00
John Baldwin	b9a53e161b	Adjust tests in fsync job scheduling loop to reduce indentation.	2016-07-27 19:31:25 +00:00
Ed Maste	18c23dffac	ANSIfy kern_proc.c and delete register keyword Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6478	2016-07-27 14:27:08 +00:00
Konstantin Belousov	1822421c0b	Remove Giant from settime(), tc_setclock_mtx guards tc_windup() calls, and there are no other issues with parallel settime(). Remove spl() vestiges there as well. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:54:24 +00:00
Konstantin Belousov	5760b029ee	Prevent parallel tc_windup() calls, both parallel top-level calls from setclock() and from simultaneous top-level and interrupt. For this, tc_windup() is protected with a tc_setclock_mtx spinlock, in the try mode when called from hardclock interrupt. If spinlock cannot be obtained without spinning from the interrupt context, this means that top-level executes tc_windup() on other core and our try may be avoided. The boottimebin and boottime variables should be adjusted from tc_windup(). To be correct, they must be part of the timehands and read using lockless protocol. Remove the globals and reimplement the getboottime(9)/getboottimebin(9) KPI using the timehands read protocol. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:49:41 +00:00
Konstantin Belousov	4493f659e5	Fix a bug in r302252. Change ntpadj_lock to spinlock always, and rename stuff removing ADJ/adj from the names. ntp_update_second() requires ntp_lock and is called from the tc_windup(), so ntp_lock must be a spinlock. Add missed lock to ntp_update_second(). Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:40:06 +00:00
Konstantin Belousov	21547fc7ca	Reduce the resettodr_lock scope to only CLOCK_SETTIME() call. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:34:25 +00:00
Konstantin Belousov	4d29106e55	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:33:33 +00:00
Konstantin Belousov	a83c016f00	Reduce number of timehands to just two. This is useful because consumers can now be only one tc_windup() call late. Use C99 initialization. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:27:52 +00:00
Konstantin Belousov	584b675ed6	Hide the boottime and boottimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
Stephen J. Kiernan	cc37baea09	Add the NUM_CORE_FILES kernel config option which specifies the limit for the number of core files allowed by a particular process when using the %I core file name pattern. Sanity check at compile time to ensure the value is within the valid range of 0-10. Reviewed by: jtl, sjg Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D6812	2016-07-27 03:21:02 +00:00
Ed Schouten	f63cd251b2	Add shmatt_t. It looks like our "struct shmid_ds::shm_nattch" deviates from the standard in the sense that it is a signed integer, whereas POSIX requires that it is unsigned, having a special type shmatt_t. Patch up our native and 32-bit copies to use a new shmatt_t that is an unsigned integer. As it's unsigned, we can relax the comparisons that are performed on it. Leave the Linux, iBCS2, etc. copies of the structure alone. Reviewed by: ngie Differential Revision: https://reviews.freebsd.org/D6655	2016-07-26 17:23:49 +00:00
Conrad Meyer	af326ace9d	devfs: Move most ioctl logic down to vnode layer Devfs' file layer ioctl is now just a thin shim around the vnode layer. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7286	2016-07-25 16:28:02 +00:00
Konstantin Belousov	90b581f2cc	Implement mtx_trylock_spin(9). Discussed with: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7192	2016-07-23 05:30:55 +00:00
John Baldwin	9c20dc9963	Add more documentation regarding unsafe AIO requests. The asynchronous I/O changes made previously result in different behavior out of the box. Previously all AIO requests failed with ENOSYS / SIGSYS unless aio.ko was explicitly loaded. Now, some AIO requests complete and others ("unsafe" requests) fail with EOPNOTSUPP. Reword the introductory paragraph in aio(4) to add a general description of AIO before describing the vfs.aio.enable_unsafe sysctl. Remove the ENOSYS error description from aio_fsync(2), aio_read(2), and aio_write(2) and replace it with a description of EOPNOTSUPP. Remove the ENOSYS error description from aio_mlock(2). Log a message to the system log the first time a process requests an "unsafe" AIO request that fails with EOPNOTSUPP. This is modeled on the log message used for processes using the legacy pty devices. Reviewed by: kib (earlier version) MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7151	2016-07-21 22:49:47 +00:00
Konstantin Belousov	492fe1b774	Hide counted_warning(9) under #ifdef _KERNEL braces, to allow building subr_prf.c in userspace for libsbuf. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-21 17:59:30 +00:00
Konstantin Belousov	9fe297bbdc	Declare aio requests on files from local filesystems safe. Two notes: - I allow AIO on reclaimed vnodes, since it is deterministically terminated fast. - devfs mounts are marked as MNT_LOCAL, but device vnodes have type VCHR, so the slow device io is not allowed. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7273	2016-07-21 17:07:06 +00:00
Konstantin Belousov	9837947b07	Provide counter_warning(9) KPI which allows to issue limited number of warnings for some kernel events, mostly intended for the use of obsoleted or otherwise undesired interfaces. This is an abstracted and race-expelled code from compat pty driver. Requested and reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7270	2016-07-21 16:34:56 +00:00
Conrad Meyer	1005d8aff4	imgact_elf: Rename the segment iterator to match reality The each_writable_segment routine evaluates segments on a slightly more nuanced metric than simply "writable" or not. Rename the function to more closely match its behavior (each_dumpable_segment). Suggested by: jhb Sponsored by: EMC / Isilon Storage Division	2016-07-20 22:51:33 +00:00
Conrad Meyer	f3325003d9	ANSI-fy imgact_elf.c Sponsored by: EMC / Isilon Storage Division	2016-07-20 22:46:56 +00:00
Conrad Meyer	07f825e871	Fix DEBUG build on 64-bit arch after r303099 Reported by: Larry Rosenman <ler at lerctr.org>	2016-07-20 18:11:22 +00:00
Conrad Meyer	c17b0bd2a6	Extend ELF coredump to support more than 65535 segments The ELF e_phnum field is only 16 bits wide. To support more than 65535 segments (program headers), Sun's "Linker and Libraries Guide" table 7-7 (or 12-7, depending on document version) prescribes a special first section header where sh_info represents the real number of program headers. Test code to follow, when it is ready. Reference: http://docs.oracle.com/cd/E18752_01/pdf/817-1984.pdf Reviewed by: emaste, markj Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7255	2016-07-20 16:59:36 +00:00
Gleb Smirnoff	9f3391243b	Redo the r302894: the very new value for a non-scheduled callout is -1. This was recently added in r290664. Noticed by: hselasky Tested by: Larry Rosenman <ler lerctr.org> PR: 210884	2016-07-20 16:48:25 +00:00
Gleb Smirnoff	47e4280922	Revert r303037. It re-introduces the panic with TCP timers. Agreed by: rrs, re (gjb)	2016-07-20 16:44:22 +00:00
Randall Stewart	3d84a18803	This reverts out Gleb's changes and adds three small fixes that I think closes up the races Gleb was looking for. This is running quite nicely in Netflix and now no longer causes TCP-tcb leaks. Differential Revision: 7135	2016-07-19 18:31:19 +00:00

15193 Commits