freebsd-skq

Author	SHA1	Message	Date
pjd	f695b590b4	Allocate descriptor number in dupfdopen() itself instead of depending on the caller using finstall(). This saves us the filedesc lock/unlock cycle, fhold()/fdrop() cycle and closes a race between finstall() and dupfdopen(). MFC after: 1 month	2012-06-13 21:32:35 +00:00
pjd	f7e18321ef	- Remove nfp variable that is not really needed. - Update comment. - Style nits. MFC after: 1 month	2012-06-13 21:22:35 +00:00
pjd	219cd5caaa	Remove duplicated code. MFC after: 1 month	2012-06-13 21:15:01 +00:00
pjd	5d3532ce69	Add missing {. MFC after: 1 month	2012-06-13 21:13:18 +00:00
pjd	c745de62f2	Style. MFC after: 1 month	2012-06-13 21:11:58 +00:00
pjd	54a86dc320	There is no need to set td->td_retval[0] to -1 on error. Confirmed by: jhb MFC after: 1 month	2012-06-13 21:10:00 +00:00
pjd	b836448bf3	There is only one caller of the dupfdopen() function, so we can simplify it a bit: - We can assert that only ENODEV and ENXIO errors are passed instead of handling other errors. - The caller always calls finstall() for the indx descriptor, so we can assume it is set. Actually the filedesc lock is dropped between finstall() and dupfdopen(), so there is a window there for another thread to close the indx descriptor, but that window will be closed in the next commit. Reviewed by: mjg MFC after: 1 month	2012-06-13 19:00:29 +00:00
mjg	29bd2f6d46	Remove 'low' argument from fd_last_used(). This function is static and the only caller always passes 0 as low. While here update note about return values in comment. Reviewed by: pjd Approved by: trasz (mentor) MFC after: 1 month	2012-06-13 17:18:16 +00:00
mjg	1ca4c8cbf9	Re-apply reverted parts of r236935 by pjd with some changes. If fdalloc() decides to grow fdtable it does it once and at most doubles the size. This still may be not enough for sufficiently large fd. Use fd in calculations of new size in order to fix this. When growing the table, fd is already equal to first free descriptor >= minfd, also fdgrowtable() no longer drops the filedesc lock. As a result of this there is no need to retry allocation nor lookup. Fix description of fd_first_free to note all return values. In co-operation with: pjd Approved by: trasz (mentor) MFC after: 1 month	2012-06-13 17:12:53 +00:00
pjd	bcf3f4263d	Revert part of the r236935 for now, until I figure out why it doesn't work properly. Reported by: davidxu	2012-06-12 10:25:11 +00:00
pjd	ea4cd345da	fdgrowtable() no longer drops the filedesc lock so it is enough to retry finding free file descriptor only once after fdgrowtable(). Spotted by: pluknet MFC after: 1 month	2012-06-11 22:05:26 +00:00
pjd	b7902b949c	Use consistent way of checking if descriptor number is valid. MFC after: 1 month	2012-06-11 20:17:20 +00:00
pjd	00ef5a8d82	Be consistent with white spaces. MFC after: 1 month	2012-06-11 20:01:50 +00:00
pjd	d698b8f852	Remove code duplicated in kern_close() and do_dup() and use closefp() function introduced a minute ago. This code duplication was responsible for the bug fixed in r236853. Discussed with: kib Tested by: pho MFC after: 1 month	2012-06-11 20:00:44 +00:00
pjd	c8465e01a1	Introduce closefp() function that we will be able to use to eliminate code duplication in kern_close() and do_dup(). This is committed separately from the actual removal of the duplicated code, as the combined diff was very hard to read. Discussed with: kib Tested by: pho MFC after: 1 month	2012-06-11 19:57:31 +00:00
pjd	cab8c2dc3a	Merge two ifs into one to make the code almost identical to the code in kern_close(). Discussed with: kib Tested by: pho MFC after: 1 month	2012-06-11 19:53:41 +00:00
pjd	b903b5753d	Move the code around a bit to move two parts of code duplicated from kern_close() close together. Discussed with: kib Tested by: pho MFC after: 1 month	2012-06-11 19:51:27 +00:00
pjd	2042e99ed8	Now that fdgrowtable() doesn't drop the filedesc lock we don't need to check if descriptor changed from under us. Replace the check with an assert. Discussed with: kib Tested by: pho MFC after: 1 month	2012-06-11 19:48:55 +00:00
iwasaki	e6018a6dfd	Another fix for r236772. - Adjust correct cpuset (stopped_cpus/suspended_cpus) for cpu_spinwait() in generic_stop_cpus().	2012-06-11 18:47:26 +00:00
pjd	859bb04daa	Style fixes and simplifications. MFC after: 1 month	2012-06-11 16:08:03 +00:00
pjd	f4a0f109b8	Remove redundant include. MFC after: 1 month	2012-06-10 20:24:01 +00:00
pjd	5081787525	Style: move opt_*.h includes in the proper place. MFC after: 1 month	2012-06-10 20:22:10 +00:00
pjd	23c7c80ef5	When we are closing capability during dup2(), we want to call mq_fdclose() on the underlying object and not on the capability itself. Discussed with: rwatson Sponsored by: FreeBSD Foundation MFC after: 1 month	2012-06-10 14:57:18 +00:00
pjd	0da1a67419	Merge two ifs into one. Other minor style fixes. MFC after: 1 month	2012-06-10 13:10:21 +00:00
pjd	67f6f356fc	Simplify fdtofp(). MFC after: 1 month	2012-06-10 06:31:54 +00:00
mckusick	070b3c0414	When synchronously syncing a device (MNT_WAIT), wait for buffers to become available. Otherwise we may excessively spin and fail with ``fsync: giving up on dirty''. Reviewed by: kib Tested by: Peter Holm MFC after: 1 week	2012-06-09 22:26:53 +00:00
pjd	0311d1f4cc	There is no need to drop the FILEDESC lock around malloc(M_WAITOK) anymore, as we now use sx lock for filedesc structure protection. Reviewed by: kib MFC after: 1 month	2012-06-09 18:50:32 +00:00
pjd	468d011a0d	Remove now unused variable. MFC after: 1 month MFC with: r236820	2012-06-09 18:48:06 +00:00
pjd	b9def82bd7	Make some of the loops more readable. Reviewed by: tegge MFC after: 1 month	2012-06-09 18:03:23 +00:00
pjd	b1dc458d22	Correct panic message. MFC after: 1 month MFC with: r236731	2012-06-09 12:27:30 +00:00
iwasaki	861bb3822c	Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of suspend/resume procedures are minimized among them. common: - Add global cpuset suspended_cpus to indicate APs are suspended/resumed. - Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used). - Add some variables in acpi_wakecode.S in order to minimize the difference among amd64 and i386. - Disable load_cr3() because now CR3 is restored in resumectx(). amd64: - Add suspend/resume related members (such as MSR) in PCB. - Modify savectx() for above new PCB members. - Merge acpi_switch.S into cpu_switch.S as resumectx(). i386: - Merge(and remove) suspendctx() into savectx() in order to match with amd64 code. Reviewed by: attilio@, acpi@	2012-06-09 00:37:26 +00:00
jhb	176ddf31c3	Split the second half of vn_open_cred() (after a vnode has been found via a lookup or created via VOP_CREATE()) into a new vn_open_vnode() function and use this function in fhopen() instead of duplicating code from vn_open_cred() directly. Tested by: pho Reviewed by: kib MFC after: 2 weeks	2012-06-08 18:32:09 +00:00
mjg	1d2ca7b8d8	Plug socket refcount leak on error in sys_sctp_peeloff. Reviewed by: tuexen Approved by: trasz (mentor) MFC after: 3 days	2012-06-08 08:04:51 +00:00
pjd	b738e3d524	In fdalloc() f_ofileflags for the newly allocated descriptor has to be 0. Assert that instead of setting it to 0. Sponsored by: FreeBSD Foundation MFC after: 1 month	2012-06-07 23:33:10 +00:00
pjd	576bf7639b	Eliminate redundant variable. Sponsored by: FreeBSD Foundation MFC after: 1 week	2012-06-07 23:08:18 +00:00
pjd	858973e30c	Plug file reference leak in capability failure case. Sponsored by: FreeBSD Foundation MFC after: 3 days	2012-06-07 22:49:09 +00:00
glebius	2bdbc6913f	style(9) for r236563.	2012-06-05 05:16:04 +00:00
glebius	b0d113b96e	Microoptimisation of code from r236560, also coming from Nginx Inc. Submitted by: ru	2012-06-04 14:18:13 +00:00
glebius	df2b290f0d	Optimise kern_sendfile(): skip cycling through the entire mbuf chain in m_cat(), storing pointer to last mbuf in chain in local variable and attaching new mbuf to the end of chain. Submitter reports that CPU load dropped for > 10% on a web server serving large files with this optimisation. Submitted by: Sergey Budnevitch <sb nginx.com>	2012-06-04 12:49:21 +00:00
kib	472101ae88	Add a knob to disable vn_io_fault. MFC after: 1 month	2012-06-03 16:19:37 +00:00
kib	d977144831	Count and export the number of prefaults that happen. MFC after: 1 month	2012-06-03 16:06:56 +00:00
avg	85a02186bc	free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG Those calls are useful with hardware watchdog drivers too. MFC after: 3 weeks	2012-06-03 08:01:12 +00:00
kib	3c3d727c68	Fix typo [1]. Use commas to separate flag printouts, in style with other parts of function. Submitted by: bf [1] MFC after: 1 week	2012-06-02 19:39:12 +00:00
kib	e32a51888c	Update the print mask for decoding b_flags. Add print masks for b_vflags and b_xflags_t and print them as well. MFC after: 1 week	2012-06-02 18:44:40 +00:00
jhb	65701cddc9	Extend VERBOSE_SYSINIT to also print out the name of variables passed to SYSINIT routines if they can be resolved via symbol look up in DDB. To avoid false positives, only honor a name if the symbol resolves exactly to the pointer value (no offset). MFC after: 1 week	2012-06-01 15:42:37 +00:00
pjd	fa432bf52c	Regenerate after r236361. MFC after: 3 days	2012-05-31 19:34:53 +00:00
pjd	b01a263416	Add missing system calls. MFC after: 3 days	2012-05-31 19:32:37 +00:00
pjd	cc9a75903e	There is no rmdirat system call. Weird, I know. MFC after: 3 days	2012-05-31 19:31:28 +00:00
imp	ce8d6b964c	Unlock in the error path to prevent a lock leak. PR: 162174 Submitted by: Ian Lepore MFC after: 2 weeks	2012-05-31 17:27:05 +00:00
kib	080f2e89d9	vn_io_fault() is a facility to prevent page faults while filesystems perform copyin/copyout of the file data into the usermode buffer. Typical filesystems hold the vnode lock and some buffer locks over the VOP_READ() and VOP_WRITE() operations, and since the page fault handler may need to recurse into VFS to get the page content, a deadlock is possible. The facility works by disabling page fault handling for the current thread and attempting to execute i/o while allowing uiomove() to access the usermode mapping of the i/o buffer. If all buffer pages are resident, uiomove() is successful and the request is finished. If EFAULT is returned from uiomove(), the pages backing the i/o buffer are faulted in and held, and the copyin/out is performed using uiomove_fromphys() over the held pages for the second attempt of the VOP call. Since pages are held in chunks to prevent large i/o requests from starving the free pages pool, and since the vnode lock is only taken for i/o over the current chunk, the vnode lock no longer protects atomicity of the whole i/o request. Use newly added rangelocks to provide the required atomicity of i/o regarding other i/o and truncations. Filesystems need to explicitly opt in to the scheme, by setting the MNTK_NO_IOPF struct mount flag, and optionally by using the vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or converting uio into a request for uiomove_fromphys(). Reviewed by: bf (comments), mdf, pjd (previous version) Tested by: pho Tested by: flo, Gustau Pérez <gperez entel upc edu> (previous version) MFC after: 2 months	2012-05-30 16:42:08 +00:00
kib	6f4e16f833	Add a rangelock implementation, intended to be used for range-locking the i/o regions of the vnode data space. The implementation is quite simple-minded, it uses the list of the lock requests, ordered by arrival time. Each request may be for read or for write. The implementation is fair FIFO. MFC after: 2 months	2012-05-30 16:06:38 +00:00
kib	7638868334	Assert that TDP_NOFAULTING and TDP_NOSLEEPING thread flags do not leak when thread returns from a syscall to usermode. Tested by: pho MFC after: 1 week	2012-05-30 13:44:42 +00:00
raj	7136f7f893	Let us manage differences of Book-E PowerPC variations i.e. vendor / implementation specific vs. the common architecture definition. Bring PPC4XX defines (PSL, SPR, TLB). Note the new definitions under BOOKE_PPC4XX are not used in the code yet. This change set is not supposed to affect existing E500 support, it's just another reorg step before bringing support for E500mc, E5500 and PPC465. Obtained from: AppliedMicro, Freescale, Semihalf	2012-05-27 10:25:20 +00:00
kib	cae6484163	Fix ki_cow for compat32 binaries. MFC after: 3 days	2012-05-27 05:24:53 +00:00
kib	dcb105721a	Stop treating td_sigmask specially for the purposes of new thread creation. Move it into the copied region of the struct thread. Update some comments. Requested by: bde X-MFC after: never	2012-05-26 20:03:47 +00:00
kib	08dbe8fa01	Add a vn_bmap_seekhole(9) vnode helper which can be used by any filesystem which supports VOP_BMAP(9) to implement SEEK_HOLE/SEEK_DATA commands for lseek(2). MFC after: 2 weeks	2012-05-26 05:28:47 +00:00
ed	0d9131d0d0	Regenerate system call tables.	2012-05-25 21:52:57 +00:00
ed	55e4d6365d	Remove use of non-ISO-C integer types from system call tables. These files already use ISO-C-style integer types, so make them less inconsistent by preferring the standard types.	2012-05-25 21:50:48 +00:00
avg	aa1a7122dc	device_add_child: protect against child device with no driver but fixed unit number This combination doesn't make sense, unit numbers should be hardwired only in context of a known driver. The wildcard devices should have wildcard unit numbers. Reviewed by: jhb MFC after: 2 weeks	2012-05-25 07:32:26 +00:00
mav	8f3c5562d6	MFprojects/zfsd: Hide warning behind bootverbose. Average user has nothing to do about it.	2012-05-24 11:24:44 +00:00
gleb	3c7243df78	Add kern_fhstat(), adjust sys_fhstat() to use it. Extend kern_getdirentries() to accept uio segflag and optionally return buffer residue. Sponsored by: Google Summer of Code 2011	2012-05-24 08:00:26 +00:00
kib	187a8c5cd6	Calculate the count of per-process cow faults. Export the count to userspace using the obscure spare int field in struct kinfo_proc. Submitted by: Andrey Zonov <andrey zonov org> MFC after: 1 week	2012-05-23 18:10:54 +00:00
trasz	b2747e472e	Fix use-after-free in kern_jail_set() triggered e.g. by attempts to clear "persist" flag from empty persistent jail, like this: jail -c persist=1 jail -n 1 -m persist=0 Submitted by: Mateusz Guzik <mjguzik at gmail dot com> MFC after: 2 weeks	2012-05-22 19:43:20 +00:00
trasz	a25d879040	Don't leak locks in prison_racct_modify(). Submitted by: Mateusz Guzik <mjguzik at gmail dot com> MFC after: 2 weeks	2012-05-22 17:30:02 +00:00
trasz	3a811deac7	Fix panic with RACCT that could occur in low memory (or out of swap) situations, due to fork1() calling racct_proc_exit() without calling racct_proc_fork() first. Submitted by: Mateusz Guzik <mjguzik at gmail dot com> (earlier version) Reviewed by: Mateusz Guzik <mjguzik at gmail dot com>	2012-05-22 15:58:27 +00:00
harti	c7e30562ca	Make dumptid non-static. It is used by libkvm to detect whether this is a VNET-kernel or not. gcc used to put the static symbol into the symbol table, clang does not. This fixes the 'netstat: no namelist' error seen on clang+VNET systems.	2012-05-22 07:23:41 +00:00
melifaro	34ec5c8650	Fix old panic when BPF consumer attaches to destroying interface. 'flags' field is added to the end of bpf_if structure. Currently the only flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd() Problem can be easily triggered on SMP stable/[89] by the following command (sort of): 'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done' Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory, while interface is still UP and there can be routes via this interface. Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api. Convert bpfd rwlock back to mutex due to lack of performance gain (currently checking if packet matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches) Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:17:29 +00:00
iwasaki	31eddd58e3	Add SMP/i386 suspend/resume support. Most part is merged from amd64. - i386/acpica/acpi_wakecode.S Replaced with amd64 code (from realmode to paging enabling code). - i386/acpica/acpi_wakeup.c Replaced with amd64 code (except for wakeup_pagetables stuff). - i386/include/pcb.h - i386/i386/genassym.c Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT and TR) needed for suspend/resume, not for context switch. - i386/i386/swtch.s Added suspendctx() and resumectx(). Note that savectx() was not changed and used for suspending (while amd64 code uses it). BSP and AP execute the same sequence, suspendctx(), acpi_wakecode() and resumectx() for suspend/resume (in case of UP system also). - i386/i386/apic_vector.s Added cpususpend(). - i386/i386/mp_machdep.c - i386/include/smp.h Added cpususpend_handler(). - i386/include/apicvar.h - kern/subr_smp.c - sys/smp.h Added IPI_SUSPEND and suspend_cpus(). - i386/i386/initcpu.c - i386/i386/machdep.c - i386/include/md_var.h - pc98/pc98/machdep.c Moved initializecpu() declarations to md_var.h. MFC after: 3 days	2012-05-18 18:55:58 +00:00
gleb	3288f283ff	Skip directory entries with zero inode number during traversal. Entries with zero inode number are considered placeholders by libc and UFS. Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp, unionfs. Sponsored by: Google Summer of Code 2011	2012-05-16 10:44:09 +00:00
pluknet	7aab7d56be	Fix typo in function name SDT_PROBE4 and unbreak 4BSD UP.	2012-05-15 10:58:17 +00:00
gber	112a2e964f	Do not call bremfree for managed buffers. Calling bremfree for these buffers results in panic: "bremfree: buffer %p not on a queue." Approved by: kib	2012-05-15 09:55:15 +00:00
rstone	a059a0e086	Implement the DTrace sched provider. This implementation aims to be compatible with the sched provider implemented by Solaris and its open-source derivatives. Full documentation of the sched provider can be found on Oracle's DTrace wiki pages. Note that for compatibility with scripts originally written for Solaris, several probes are defined that will never fire. These probes are defined to fire when Solaris-specific features perform certain actions. As these features are not present in FreeBSD, the probes can never fire. Also, I have added two probes that are not defined in Solaris, lend-pri and load-change. These probes have been added to make it possible to collect schedgraph data with DTrace. Finally, a few probes are defined in Solaris to take a cpuinfo_t * argument. As it was not immediately clear to me how to translate that to FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *. Sponsored by: Sandvine Incorporated MFC after: 2 weeks	2012-05-15 01:30:25 +00:00
delphij	53e510d1ef	Revert previous revision, misunderstood the code :(	2012-05-11 23:43:32 +00:00
delphij	f7e33a4a67	Release proc lock after setting signal queue. PR: kern/167727 Submitted by: Jinjun Gao <gjinjun gmail com> MFC after: 2 weeks	2012-05-11 23:41:52 +00:00
kib	c5f120d09b	Move the code to call the callout callback into the helper function softclock_call_cc(). While there, move some common code to callout_cc_del(). Requested by: avg, jhb Reviewed by: jhb MFC after: 1 week	2012-05-03 20:00:30 +00:00
kib	9e5fca0368	When callout_reset_on() cannot immediately migrate a callout since it is running on other cpu, the CALLOUT_PENDING flag is temporarily cleared. Then, callout_stop() on this, in fact active, callout fails because CALLOUT_PENDING is not set, and callout_stop() returns 0. Now, in sleepq_check_timeout(), the failed callout_stop() causes the sleepq code to execute mi_switch() without even setting the wmesg, since the switch-out is supposed to be transient. In fact, the thread is put off the CPU for full timeout interval, instead of being put on runq immediately. Until timeout fires, the process is unkillable for obvious reasons. Fix this by marking the migrating callouts with CALLOUT_DFRMIGRATION flag. The flag is cleared by callout_stop_safe() when the function detects a migration, besides returning the success. The softclock() rechecks the flag for migrating callout and cancels its execution if the flag was cleared meantime. PR: misc/166340 Reported, debugging traces provided and tested by: Christian Esken <christian.esken trivago com> Reviewed by: avg, jhb MFC after: 1 week	2012-05-03 10:38:02 +00:00
jhb	c96b8c07a4	- Don't log messages saying that accounting is being disabled and enabled if the accounting log file is atomically replaced with a new file (such as during log rotation). - Simplify accounting log rotation a bit. There is no need to re-run accton(8) after renaming the new log file to its real name. PR: kern/167321 Tested by: Jeremy Chadwick	2012-05-02 14:25:39 +00:00
kib	0e86d1558c	Allow for the process information sysctls to accept a thread id in addition to the process id. It follows the ptrace(2) interface and allows debugging libraries to use thread ids directly, without slow and verbose conversion of thread id into pid. The PGET_NOTID flag is provided to allow a specific sysctl to disallow this behaviour. All current callers of pget(9) have useful semantic to operate on tid and do not need this flag. Reviewed by: jhb, trocini MFC after: 1 week	2012-04-23 20:56:05 +00:00
trasz	023bd7c6bf	Remove unused thread argument to vrecycle(). Reviewed by: kib	2012-04-23 14:10:34 +00:00
trasz	baac623cd9	Remove unused thread argument from vtruncbuf(). Reviewed by: kib	2012-04-23 13:21:28 +00:00
jhb	aa85973504	Include the associated wait channel message for context switch ktrace records. kdump supports both the old and new messages. Submitted by: Andrey Zonov andrey zonov org MFC after: 1 week	2012-04-20 15:32:36 +00:00
jh	433fc8eeff	The value of flags matching VNOVAL can't be supported. Return EOPNOTSUPP from setfflags() in this case. This fixes the return value of chflags(path, -1). Discussed with: bde MFC after: 2 weeks	2012-04-20 10:08:30 +00:00
mckusick	d9895ac1fe	This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-20 07:00:28 +00:00
mckusick	5b7b29e35b	This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-20 06:50:44 +00:00
mckusick	a9a210460f	Delete a no longer useful VNASSERT missed during changes in 234400. Suggested by: kib	2012-04-18 19:34:20 +00:00
mckusick	be8731298f	Fix a memory leak of M_VNODE_MARKER introduced in 234386. Found by: Peter Holm	2012-04-18 19:30:22 +00:00
mckusick	841f20af50	Drop export of vdestroy() function from kern/vfs_subr.c as it is used only as a helper function in that file. Replace sole call to vbusy() with inline code in vholdl(). Replace sole calls to vfree() and vdestroy() with inline code in vdropl(). The Clang compiler already inlines these functions, so they do not show up in a kernel backtrace which is confusing. Also you cannot set their frame in kgdb which means that it is impossible to view their local variables. So, while the produced code is unchanged, the debugging should be easier. Discussed with: kib MFC after: 2 weeks	2012-04-17 21:46:59 +00:00
mckusick	ffee40eeff	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if it is DOOMED or other possible end of life scenarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-17 16:28:22 +00:00
trasz	7f09aee7a1	Fix bug where NFSv4 ACL enforcement code wouldn't unconditionally allow the owner to read and write ACL and file attributes when there was no entry with subject matching the owner. In other words, 'getfacl meh' shouldn't fail for the owner if the ACL looks like this: # file: meh # owner: trasz # group: wheel user:root:------a-------:------:allow Reported by: kientzle	2012-04-17 14:54:00 +00:00
trasz	29ba0a35f6	Stop treating system processes as special. This fixes panics like the one triggered by this: # kldload geom_vinum # pwait `pgrep -S gv_worker` & # kldunload geom_vinum or this: GEOM_JOURNAL: Shutting down geom gjournal 3464572051. panic: destroying non-empty racct: 1 allocated for resource 6 which were tracked by jh@ to be caused by checking p->p_flag, while it wasn't initialised yet. Basically, during fork, the code checked p_flag, concluded the process isn't marked as P_SYSTEM, incremented the counter, and later on, when exiting, checked that the process was marked as P_SYSTEM, and thus didn't decrement it. Also, I believe there wasn't any good reason for checking P_SYSTEM in the first place. Tested by: jh	2012-04-17 14:31:02 +00:00
trasz	a41bb18a29	Fix panic, triggered like this: "int main() { thr_exit(); }" Submitted by: Mateusz Guzik	2012-04-17 13:44:40 +00:00
trasz	c37ffba90a	Enforce upper bound on the input buffer length. Reported by: Mateusz Guzik	2012-04-17 13:28:14 +00:00
jkim	e210f689a8	- Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27 but GNU libc used it without checking its kernel version, e. g., Fedora 10. - Move pipe(2) implementation for Linuxulator from MD files to MI file, sys/compat/linux/linux_file.c. There is no MD code for this syscall at all. - Correct an argument type for pipe() from l_ulong * to l_int *. Probably this was the source of MI/MD confusion. Reviewed by: emulation	2012-04-16 21:22:02 +00:00
davide	63cc567af5	Fix a typo. Approved by: gnn (mentor) MFC after: 2 days	2012-04-14 23:59:58 +00:00
davide	ff8b0a29f3	Fix some style bugs introduced in a previous commit (r233045) Reported by: glebius, jmallet Reviewed by: jmallet Approved by: gnn (mentor) MFC after: 2 days	2012-04-14 23:53:31 +00:00
marius	6f1427f0e6	Fix !DDB build after r234190.	2012-04-14 11:21:24 +00:00
adrian	2c73480574	Use strdup() on the name (and free it when it's done) so non-static names can be used in firmware_register().	2012-04-13 04:22:42 +00:00
jhb	20ac4e4f81	- Extend the KDB interface to add a per-debugger callback to print a backtrace for an arbitrary thread (rather than the calling thread). A kdb_backtrace_thread() wrapper function uses the configured debugger if possible, otherwise it falls back to using stack(9) if that is available. - Replace a direct call to db_trace_thread() in propagate_priority() with a call to kdb_backtrace_thread() instead. MFC after: 1 week	2012-04-12 17:43:59 +00:00
jhb	51ec6999bb	If a linker file contains at least one module, but all of the modules fail to load (the MOD_LOAD event fails) during a kldload(2), unload the linker file and fail the kldload(2) with ENOEXEC. Reported by: gcooper MFC after: 1 week	2012-04-12 14:49:25 +00:00
kib	319ab382ef	Add thread-private flag to indicate that error value is already placed in td_errno. Flag is supposed to be used by syscalls returning EJUSTRETURN because errno was already placed into the usermode frame by a call to set_syscall_retval(9). Both ktrace and dtrace get errno value from td_errno if the flag is set. Use the flag to fix sigsuspend(2) error return ktrace records. Requested by: bde MFC after: 1 week	2012-04-12 10:48:43 +00:00
mckusick	7901256b30	Export vinactive() from kern/vfs_subr.c (e.g., make it no longer static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of the body of vinactive() being cut and pasted into process_deferred_inactive(). Reviewed by: kib MFC after: 2 weeks	2012-04-11 23:01:11 +00:00
jhb	294ae9574d	Allow device_busy() and device_unbusy() to be invoked while a device is being attached. This is implemented by adding a new DS_ATTACHING state while a device's DEVICE_ATTACH() method is being invoked. A driver is required to not fail an attach of a busy device. The device's state will be promoted to DS_BUSY rather than DS_ACTIVE() if the device was marked busy during DEVICE_ATTACH(). Reviewed by: kib MFC after: 1 week	2012-04-11 20:57:41 +00:00
eadler	2a42c5c4e9	Return EBADF instead of EMFILE from dup2 when the second argument is outside the range of valid file descriptors PR: kern/164970 Submitted by: Peter Jeremy <peterjeremy@acm.org> Reviewed by: jilles Approved by: cperciva MFC after: 1 week	2012-04-11 14:08:09 +00:00
jilles	4360dc9ca8	Remove unused and wrong SA_PROC internal signal property. The SA_PROC signal property indicated whether each signal number is directed at a specific thread or at the process in general. However, that depends on how the signal was generated and not on the signal number. SA_PROC was not used.	2012-04-09 21:58:58 +00:00
mav	e1ffe54fb7	Microoptimize cpu_search(). According to profiling, it makes one take 6% of CPU time on hackbench with its millions of context switches per second, instead of 8% before.	2012-04-09 18:24:58 +00:00
gleb	fb452e77b0	Add vfs_getopt_size. Support human readable file system options in tmpfs. Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs. Discussed with: delphij MFC after: 2 weeks	2012-04-07 15:27:34 +00:00
melifaro	8b1d10268c	- Improve BPF locking model. Interface locks and descriptor locks are converted from mutex(9) to rwlock(9). This greatly improves performance: in most common case we need to acquire 1 reader lock instead of 2 mutexes. - Remove filter(descriptor) (reader) lock in bpf_mtap[2] This was suggested by glebius@. We protect filter by requesting interface writer lock on filter change. - Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h without including rwlock stuff. However, this is a temporary solution, struct bpf_if should be made opaque for any external caller. Found by: Dmitrij Tejblum <tejblum@yandex-team.ru> Sponsored by: Yandex LLC Reviewed by: glebius (previous version) Reviewed by: silence on -net@ Approved by: (mentor) MFC after: 3 weeks	2012-04-06 06:53:58 +00:00
jhb	5829de48d9	Add new ktrace records for the start and end of VM faults. This gives a pair of records similar to syscall entry and return that a user can use to determine how long page faults take. The new ktrace records are enabled via the 'p' trace type, and are enabled in the default set of trace points. Reviewed by: kib MFC after: 2 weeks	2012-04-05 17:13:14 +00:00
davidxu	cc55f4943b	In sem_post, the field _has_waiters is no longer used, because some applications destroy the semaphore after sem_wait returns. Just enter the kernel to wake up sleeping threads, and only update _has_waiters if it is safe. While here, check if the value exceeds SEM_VALUE_MAX and return EOVERFLOW if this is true.	2012-04-05 03:05:02 +00:00
davidxu	8c31e244f2	umtx operation UMTX_OP_MUTEX_WAKE has a side-effect that it accesses a mutex after a thread has unlocked it, it even writes data to the mutex memory to clear the contention bit. There is a race where other threads can lock it and unlock it, then destroy it, so it should not write data to the mutex memory if there isn't any waiter. The new operation UMTX_OP_MUTEX_WAKE2 tries to fix the problem. It requires the thread library to clear the lock word entirely, then call the WAKE2 operation to check if there is any waiter in the kernel, and try to wake up a thread; if necessary, the contention bit is set again by the operation. This also mitigates the chance that other threads find the contention bit and try to enter the kernel to compete with each other to wake up a sleeping thread, which is unnecessary. With this change, the mutex owner is no longer holding the mutex until it reaches a point where the kernel umtx queue is locked, it releases the mutex as soon as possible. Performance is improved when the mutex is heavily contended. On Intel i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds to 2.39 seconds, it is even better than UMTX_OP_MUTEX_WAKE, which is deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c	2012-04-05 02:24:08 +00:00
np	307ef13f94	- Remove redundant call to pr_ctloutput from code that handles SO_SETFIB. - Add a check for errors during copyin while here. Reviewed by: julian, bz MFC after: 2 weeks	2012-04-03 18:38:00 +00:00
kib	ff6239a557	When process exits, not only the children shall be reparented to init, but also the orphans shall be removed from the orphan list, because the list header is destroyed. Reported and tested by: pho MFC after: 3 days	2012-04-02 19:35:36 +00:00
kib	9ad701f91f	Add helper function to remove the process from the orphans list and use it instead of inlined code. Tested by: pho MFC after: 3 days	2012-04-02 19:34:56 +00:00
jhb	506e2f15b9	Export some more useful info about shared memory objects to userland via procstat(1) and fstat(1): - Change shm file descriptors to track the pathname they are associated with and add a shm_path() method to copy the path out to a caller-supplied buffer. - Use the fo_stat() method of shared memory objects and shm_path() to export the path, mode, and size of a shared memory object via struct kinfo_file. - Add a struct shmstat to the libprocstat(3) interface along with a procstat_get_shm_info() to export the mode and size of a shared memory object. - Change procstat to always print out the path for a given object if it is valid. - Teach fstat about shared memory objects and to display their path, mode, and size. MFC after: 2 weeks	2012-04-01 18:22:48 +00:00
davidxu	42d5de0c66	Remove stale comments.	2012-03-31 06:48:41 +00:00
davidxu	0bd3403eb7	Remove trailing semicolon, it is a typo.	2012-03-30 12:57:14 +00:00
davidxu	febc18f31b	Fix COMPAT_FREEBSD32 build. Submitted by: Andreas Tobler < andreast at fgznet dot ch >	2012-03-30 09:03:53 +00:00
davidxu	f7f769bc6d	Remove trailing space.	2012-03-30 05:49:32 +00:00
davidxu	5faf75d34c	Merge umtxq_sleep and umtxq_nanosleep into a single function by using an abs_timeout structure which describes timeout info.	2012-03-30 05:40:26 +00:00
davidxu	362bad78ca	Reduce code size by creating common timed sleeping function.	2012-03-29 02:46:43 +00:00
fabient	5edfb77dd3	Add software PMC support. New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8). Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions. Sponsored by: NETASQ MFC after: 1 month	2012-03-28 20:58:30 +00:00
rstone	0ee65aa24e	Instead of only iterating over the set of known SDT probes when sdt.ko is loaded and unloaded, also have sdt.ko register callbacks with kern_sdt.c that will be called when a newly loaded KLD module adds more probes or a module with probes is unloaded. This fixes two issues: first, if a module with SDT probes was loaded after sdt.ko was loaded, those new probes would not be available in DTrace. Second, if a module with SDT probes was unloaded while sdt.ko was loaded, the kernel would panic the next time DTrace had cause to try and do anything with the no-longer-existent probes. This makes it possible to create SDT probes in KLD modules, although there are still two caveats: first, any SDT probes in a KLD module must be part of a DTrace provider that is defined in that module. At present DTrace only destroys probes when the provider is destroyed, so you can still panic the system if a KLD module creates new probes in a provider from a different module (including the kernel) and you then unload the first module. Second, the system will panic if you unload a module containing SDT probes while there is an active D script that has enabled those probes. MFC after: 1 month	2012-03-27 15:07:43 +00:00
melifaro	fd561480db	- Add knlist_init_rw_reader() function to kqueue(9). Function acquired reader lock if needed. Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED) - While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues Reviewed by: glebius Approved by: ae(mentor) MFC after: 2 weeks	2012-03-26 09:34:17 +00:00
trociny	0079b1f6c5	Add a sysctl to set and retrieve binary osreldate of another process. Suggested by: kib Reviewed by: kib MFC after: 2 weeks	2012-03-23 20:05:41 +00:00
ae	bb8b607479	Correct debug message.	2012-03-22 09:29:07 +00:00
alc	e02fd6b842	Handle spurious page faults that may occur in no-fault sections of the kernel. When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel. In collaboration with: kib Tested by: gibbs (an earlier version) I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem. MFC after: 1 week	2012-03-22 04:52:51 +00:00
ae	f0e7ec67c0	Acquire modules lock before call module_getname() in the KLD_DEBUG case. MFC after: 1 week	2012-03-21 09:48:32 +00:00
eadler	169b46c915	- Clean up timestamps in msgbuf code. The timestamps should now be inserted after the priority token thus cleaning up the output. - Remove the needless double internal do_add_char function. - Resolve a possible deadlock if interrupts are disabled and getnanotime is called Reviewed by: bde kmacy, avg, sbruno (various versions) Approved by: cperciva MFC after: 2 weeks	2012-03-19 00:36:32 +00:00
jh	683a986c03	Cast wallclock.tv_sec to uint64_t to avoid overflow in the calculation. PR: kern/161552 Reviewed by: trasz Tested by: Nikos Vassiliadis MFC after: 1 week	2012-03-18 19:13:32 +00:00
davide	cd0c342e57	Add rudimentary profiling of the hash table used in the umtx code to hold active lock queues. Reviewed by: attilio Approved by: davidxu, gnn (mentor) MFC after: 3 weeks	2012-03-16 20:32:11 +00:00
tuexen	b8b34b6ecf	Fix bugs which can result in a panic when a non-SCTP socket is used with an sctp_ system-call which expects an SCTP socket. MFC after: 3 days.	2012-03-15 14:13:38 +00:00
ae	894c8dc15b	Add CTLFLAG_TUN to the sysctl definition and fix style. Pointed by: Garrett Cooper MFC after: 2 weeks	2012-03-15 06:01:21 +00:00
ae	9be115302d	Add debug.kld_debug loader tunable. MFC after: 2 weeks	2012-03-15 05:11:29 +00:00
jh	59d9d84ca4	Add an assert for proctree_lock to proc_to_reap(). Discussed with: kib MFC after: 1 week	2012-03-14 15:52:23 +00:00
kib	6e85340add	Lock the process around manipulations with p_flag. Reported and reviewed by: jh MFC after: 3 days	2012-03-13 22:00:46 +00:00
adrian	f2bb6a85d7	Add module load/unload stubs.	2012-03-13 20:27:48 +00:00
mav	5b5fc4e585	Add kern.eventtimer.activetick tunable/sysctl, specifying whether each hardclock() tick should be run on every active CPU, or on only one. On my tests, avoiding extra interrupts because of this on 8-CPU Core i7 system with HZ=10000 saves about 2% of performance. At this moment option implemented only for global timers, as reprogramming per-CPU timers is too expensive now to be compensated by this benefit, especially since we still have to regularly run hardclock() on at least one active CPU to update system uptime. For global timer it is quite trivial: timer runs always, but we just skip IPIs to other CPUs when possible. Option is enabled by default now, keeping previous behavior, as periodic hardclock() calls are still used at least to implement setitimer(2) with ITIMER_VIRTUAL and ITIMER_PROF arguments. But since default schedulers don't depend on it since r232917, we are much more free to experiment with it. MFC after: 1 month	2012-03-13 10:21:08 +00:00
mav	ffaa080e67	Rewrite thread CPU usage percentage math to not depend on periodic calls with HZ rate through the sched_tick() calls from hardclock(). Potentially it can be used to improve precision, but now it is just minus one more reason to call hardclock() for every HZ tick on every active CPU. SCHED_4BSD never used sched_tick(), but keep it in place for now, as at least SCHED_FBFS existing in patches out of the tree depends on it. MFC after: 1 month	2012-03-13 08:18:54 +00:00
pho	e35bb21f2c	Always call fdrop().	2012-03-12 11:56:57 +00:00
kib	4e790f9b2b	ELF image can have several PT_NOTE program headers. Look for the ELF brand note in each header, instead of using only first one. Reviewed by: kan Tested by: andrew (arm), flo (sparc64) MFC after: 3 weeks	2012-03-11 19:38:49 +00:00
kib	8adabb0356	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni	2012-03-11 12:19:58 +00:00
mav	4be9351f8b	Revert r175376 and tune cpufreq(4) frequency comparison logic instead. Instead of using 25MHz equality threshold, look for the nearest value when handling dev.cpu.0.freq sysctl and for exact match when it is expected. ACPI may report extra level with frequency 1MHz above the nominal to control Intel Turbo Boost operation. It is not a bug, but feature: dev.cpu.0.freq_levels: 2934/106000 2933/95000 2800/82000 ... In this case value 2933 means 2.93GHz, but 2934 means 3.2-3.6GHz. I've found that my Core i7-870 based system has Intel Turbo Boost disabled by default and without this change it was absolutely invisible and hard to control. MFC after: 2 weeks	2012-03-10 18:56:16 +00:00
mav	1324baa4eb	Idle ticks optimization: - Pass number of events to the statclock() and profclock() functions same as to hardclock() before to not call them many times in a loop. - Rename them into statclock_cnt() and profclock_cnt(). - Turn statclock() and profclock() into compatibility wrappers, still needed for arm. - Rename hardclock_anycpu() into hardclock_cnt() for unification. MFC after: 1 week	2012-03-10 14:57:21 +00:00
trasz	a0d48d6f11	Remove useless thread_{lock,unlock}() in raccd.	2012-03-10 14:38:49 +00:00
jmallett	d25fa497f7	Export intrcnt correctly when running under 32-bit compatibility. Reviewed by: gonzo, nwhitehorn	2012-03-09 22:30:54 +00:00
pho	c84e05a07c	Perform the parameter validation before assigning it to a signed int variable. This fixes the problem seen with readdir(3) fuzzing. Submitted by: bde MFC after: 1 week	2012-03-09 21:31:12 +00:00
mav	d6e827162d	Make kern.sched.idlespinthresh default value adaptive depending on HZ. Otherwise with HZ above 8000 CPU may never skip timer ticks on idle.	2012-03-09 19:09:08 +00:00
mav	fb50c869a4	Be more polite when setting state->nextevent inside cpu_new_callout(). Hardclock is not the only one that wakes an idle CPU since the kdtrace cyclic addition. MFC after: 2 weeks	2012-03-09 07:30:48 +00:00
kib	5abd2bb7cb	Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which allows a filesystem to request VFS to not allow MNTK_ASYNC. MFC after: 1 week	2012-03-09 00:12:05 +00:00
pho	81cae127b0	Free up allocated memory used by posix_fadvise(2).	2012-03-08 20:34:13 +00:00

12802 Commits