freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	7054ee4e38	The kqueue_register() function assumes that it is called from the top of the syscall code and acquires various event subsystem locks as needed. The handling of the NOTE_TRACK for EVFILT_PROC is currently done by calling the kqueue_register() from filt_proc() filter, causing recursive entrance of the kqueue code. This results in the LORs and recursive acquisition of the locks. Implement the variant of the knote() function designed to only handle the fork() event. It mostly copies the knote() body, but also handles the NOTE_TRACK, removing the handling from the filt_proc(), where it causes problems described above. The function is called from the fork1() instead of knote(). When encountering NOTE_TRACK knote, it marks the knote as influx and drops the knlist and kqueue lock. In this context call to kqueue_register is safe from the problems. An error from the kqueue_register() is reported to the observer as NOTE_TRACKERR fflag. PR: 108201 Reviewed by: jhb, Pramod Srinivasan <pramod juniper net> (previous version) Discussed with: jmg Tested by: pho MFC after: 2 weeks	2008-07-07 09:30:11 +00:00
Konstantin Belousov	e1a32fd42b	In r178914 I erroneously put the setting of the KQ_FLUXWAIT flag before KQ_FLUX_WAKEUP(). Since the latter macro clears KQ_FLUXWAIT, the kqueue_scan() thread may not be woken up. Move the setting of KQ_FLUXWAIT after the wakeup to correct the issue. Reported and tested by: pho MFC after: 3 days	2008-07-07 09:15:29 +00:00
Alan Cox	b89eaf4e9f	Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang Make several variables related to kmem map auto-sizing static. Found by: CScout	2008-07-05 19:34:33 +00:00
Robert Watson	4f7d1876d5	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
Alan Cox	6819e13eeb	Correct an error in the comments for init_param3(). Discussed with: silby	2008-07-04 19:36:58 +00:00
Robert Watson	59dd72d040	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acquisition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.	2008-07-04 00:21:38 +00:00
Ed Maste	7928893d83	Use bcopy instead of strlcpy in uipc_bind and unp_connect, since soun->sun_path isn't a null-terminated string. As UNIX(4) states, "the terminating NUL is not part of the address." Since strlcpy has to return "the total length of the string [it] tried to create," it walks off the end of soun->sun_path looking for a \0. This reverts r105332. Reported by: Ryan Stone	2008-07-03 23:26:10 +00:00
Julian Elischer	f44e6e2ecc	Change a variable name to not shadow a global Obtained from: vimage	2008-07-03 08:35:59 +00:00
Robert Watson	6992381eca	Update copyright date in light of soreceive_dgram(9).	2008-07-03 06:47:45 +00:00
Robert Watson	5df3e83946	Add soreceive_dgram(9), an optimized socket receive function for use by datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets. This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers significant performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP. Tested by: kris, ps	2008-07-02 23:23:27 +00:00
Roman Divacky	bff2d4d5ff	Use msleep_spin() instead of unlock/tsleep/lock. This was already committed but with a wrong msleep variant and then backed out. Note that this changes the semantics a little, as msleep_spin() does not let us specify the priority after wakeup. Approved by: wkoszek, cognet Approved by: kib (mentor)	2008-07-02 20:44:33 +00:00
Bjoern A. Zeeb	04a58b9d5f	Remove an unneeded error variable to make clear that if reaching the end of the function we never return an error.	2008-06-29 18:26:07 +00:00
Bjoern A. Zeeb	ba931c0855	Add a new priv 'PRIV_SCHED_CPUSET' to check if manipulating cpusets is allowed and replace the suser() call. Do not allow it in jails. Reviewed by: rwatson	2008-06-29 17:58:16 +00:00
John Baldwin	6bc1e9cd84	Rework the lifetime management of the kernel implementation of POSIX semaphores. Specifically, semaphores are now represented as a new file descriptor type that is set to close on exec. This removes the need for all of the manual process reference counting (and fork, exec, and exit event handlers) as the normal file descriptor operations handle all of that for us nicely. It is also suggested as one possible implementation in the spec and at least one other OS (OS X) uses this approach. Some bugs that were fixed as a result include: - References to a named semaphore whose name is removed still work after the sem_unlink() operation. Prior to this patch, if a semaphore's name was removed, valid handles from sem_open() would get EINVAL errors from sem_getvalue(), sem_post(), etc. This fixes that. - Unnamed semaphores created with sem_init() were not cleaned up when a process exited or exec'd. They were only cleaned up if the process did an explicit sem_destroy(). This could result in a leak of semaphore objects that could never be cleaned up. - On the other hand, if another process guessed the id (kernel pointer to 'struct ksem') of an unnamed semaphore (created via sem_init()) and had write access to the semaphore based on UID/GID checks, then that other process could manipulate the semaphore via sem_destroy(), sem_post(), sem_wait(), etc. - As part of the permission check (UID/GID), the umask of the process creating the semaphore was not honored. Thus if your umask denied group read/write access but the explicit mode in the sem_init() call allowed it, the semaphore would be readable/writable by other users in the same group, for example. This includes access via the previous bug. - If the module refused to unload because there were active semaphores, then it might have deregistered one or more of the semaphore system calls before it noticed that there was a problem. I'm not sure if this actually happened as the order that modules are discovered by the kernel linker depends on how the actual .ko file is linked. One can make the order deterministic by using a single module with a mod_event handler that explicitly registers syscalls (and deregisters during unload after any checks). This also fixes a race where even if the sem_module unloaded first it would have destroyed locks that the syscalls might be trying to access if they are still executing when they are unloaded. XXX: By the way, deregistering system calls doesn't do any blocking to drain any threads from the calls. - Some minor fixes to errno values on error. For example, sem_init() isn't documented to return ENFILE or EMFILE if we run out of semaphores the way that sem_open() can. Instead, it should return ENOSPC in that case. Other changes: - Kernel semaphores now use a hash table to manage the namespace of named semaphores in a fashion nearly identical to the POSIX shared memory object file descriptors. Kernel semaphores can now also have names longer than 14 chars (up to MAXPATHLEN) and can include subdirectories in their pathname. - The UID/GID permission checks for access to a named semaphore are now done via vaccess() rather than a home-rolled set of checks. - Now that kernel semaphores have an associated file object, the various MAC checks for POSIX semaphores accept both a file credential and an active credential. There is also a new posixsem_check_stat() since it is possible to fstat() a semaphore file descriptor. - A small set of regression tests (using the ksem API directly) is present in src/tools/regression/posixsem. Reported by: kris (1) Tested by: kris Reviewed by: rwatson (lightly) MFC after: 1 month	2008-06-27 05:39:04 +00:00
Julian Elischer	9dcc73ed79	Someone cut and pasted a bunch of stuff here so lots of indents were spaces when they should have been tabs, screwing up diffs and patches.. Whitespace commit as my first SVN commit. (yay) MFC after: 1 week	2008-06-26 22:45:04 +00:00
Doug Rabson	c675522fc4	Re-implement the client side of rpc.lockd in the kernel. This implementation provides the correct semantics for flock(2) style locks which are used by the lockf(1) command line tool and the pidfile(3) library. It also implements recovery from server restarts and ensures that dirty cache blocks are written to the server before obtaining locks (allowing multiple clients to use file locking to safely share data). Sponsored by: Isilon Systems PR: 94256 MFC after: 2 weeks	2008-06-26 10:21:54 +00:00
Ruslan Ermilov	d03c587ffa	Fix a chicken-and-egg problem: this file implements SSP support, so we cannot compile it with -fstack-protector[-all] flags (or it will self-recurse); this is ensured in sys/conf/files. This OTOH means that checking for defines __SSP__ and __SSP_ALL__ to determine if we should be compiling the support is impossible (which it was trying, resulting in an empty object file). Fix this by always compiling the symbols in this file. It's good because it allows us to always have SSP support, and then compile with SSP selectively. Reported by: tinderbox	2008-06-26 07:52:45 +00:00
Ruslan Ermilov	042df2e2da	Enable GCC stack protection (aka Propolice) for userland: - It is opt-out for now so as to give it maximum testing, but it may be turned opt-in for stable branches depending on the consensus. You can turn it off with WITHOUT_SSP. - WITHOUT_SSP was previously used to disable the build of GNU libssp. It is harmless to steal the knob as SSP symbols have been provided by libc for a long time, GNU libssp should not have been much used. - SSP is disabled in a few corners such as system bootstrap programs (sys/boot), process bootstrap code (rtld, csu) and SSP symbols themselves. - It should be safe to use -fstack-protector-all to build world, however libc will be automatically downgraded to -fstack-protector because it breaks rtld otherwise. - This option is unavailable on ia64. Enable GCC stack protection (aka Propolice) for kernel: - It is opt-out for now so as to give it maximum testing. - Do not compile your kernel with -fstack-protector-all, it won't work. Submitted by: Jeremie Le Hen <jeremie@le-hen.org>	2008-06-25 21:33:28 +00:00
David Xu	7de1ecef2d	Add two commands to the _umtx_op system call to allow a simple mutex to be locked and unlocked completely in userland. Locking and unlocking the mutex in userland reduces the total time a mutex is held by a thread. In some application code a mutex protects only a small piece of code whose execution time is less than that of a simple system call. In the current implementation, however, if lock contention happens the lock holder has to extend its locking time and enter the kernel to unlock the mutex. The change avoids this disadvantage: it first sets the mutex to the free state and then enters the kernel to wake one waiter up. This improves performance dramatically in some sysbench mutex tests. Tested by: kris Sounds great: jeff	2008-06-24 07:32:12 +00:00
John Baldwin	c4f3a35a54	Remove the posixsem_check_destroy() MAC check. It is semantically identical to doing a MAC check for close(), but no other types of close() (including close(2) and ksem_close(2)) have MAC checks. Discussed with: rwatson	2008-06-23 21:37:53 +00:00
Robert Watson	3319d71265	If S_IFIFO is passed to mknod(2), invoke kern_mkfifoat(9) to create a FIFO, as required by SUSv3. No specific privilege check is performed in this case, as FIFOs may be created by unprivileged processes (subject to the normal file system name space restrictions that may be in place). Unlike the Apple implementation, we reject requests to create a FIFO using mknod(2) if there is a non-zero dev argument to the system call, which is permitted by the Open Group specification ("... undefined ..."). We might want to revise this if we find it causes compatibility problems for applications in practice. PR: kern/74242, kern/68459 Obtained from: Apple, Inc. MFC after: 3 weeks	2008-06-22 21:51:32 +00:00
Oleksandr Tymoshenko	22035f4727	Use the minimum of max_aio_procs and target_aio_procs when spawning a new aiod, since there should be no more than max_aio_procs processes.	2008-06-21 11:34:34 +00:00
Warner Losh	c14909b6e2	Split out the probing magic of device_probe_and_attach into device_probe() so that it can be used by busses that may wish to do additional processing between probe and attach. Reviewed by: dfr@	2008-06-20 16:58:15 +00:00
Alan Cox	ac68d1c960	Enforce the mapping of kernel loadable modules in the uppermost 2GB of the kernel virtual address space on amd64.	2008-06-20 06:24:34 +00:00
Xin LI	2110d913c0	Revert rev. 178124 as requested by kris@. Having jail id not being reused too frequently is useful for script controlled environment.	2008-06-19 21:41:57 +00:00
Oleksandr Tymoshenko	23c8064e66	Renew semaphore's pointer after wakeup since during msleep sem_base may have been modified by destroying one of semaphores and semptr would not be valid in this case. PR: kern/123731	2008-06-19 18:08:42 +00:00
Konstantin Belousov	05427aafc6	Struct cdev is always the member of the struct cdev_priv. When devfs needed to promote cdev to cdev_priv, the si_priv pointer was followed. Use member2struct() to calculate address of the wrapping cdev_priv. Rename si_priv to __si_reserved. Tested by: pho Reviewed by: ed MFC after: 2 weeks	2008-06-16 17:34:59 +00:00
John Birrell	5d846378f7	Remove code that isn't required. It actually breaks the case where KDTRACE_HOOKS is defined and KDB isn't. This is the case that it was intended for.	2008-06-16 04:44:29 +00:00
Ed Schouten	0f03ce1bb8	Turn dev2unit(), minor(), unit2minor() and minor2unit() into macros. Now that we got rid of the minor-to-unit conversion and the constraints on device minor numbers, we can convert the functions that operate on minor and unit numbers to simple macros. The unit2minor() and minor2unit() macros are now no-ops. The ZFS code also defined a macro named `minor'. Change the ZFS code to use umajor() and uminor() here, as it is the correct approach to do this. Also add $FreeBSD$ to keep SVN happy. Approved by: philip (mentor), pjd	2008-06-12 08:30:54 +00:00
Ed Schouten	29d4cb241b	Don't enforce unique device minor number policy anymore. Except for the case where we use the cloner library (clone_create() and friends), there is no reason to enforce a unique device minor number policy. There are various drivers in the source tree that allocate unr pools and such to provide minor numbers, without using them themselves. Because we still need to support unique device minor numbers for the cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's that are used in combination with the cloner library should be marked with this flag to make the cloning work. This means drivers can now freely use si_drv0 to store their own flags and state, making it effectively the same as si_drv1 and si_drv2. We still keep the minor() and dev2unit() routines around to make drivers happy. The NTFS code also used the minor number in its hash table. We should not do this anymore. If the si_drv0 field would be changed, it would no longer end up in the same list. Approved by: philip (mentor)	2008-06-11 18:55:19 +00:00
Oleksandr Tymoshenko	c9688a603b	Keep proper track of nsegs counter: sem_free is called for all allocated semaphores, so it's wrong to increase it conditionally, in this case for every over-the-limit semaphore nsegs is decreased without being previously increased. PR: kern/123685 Approved by: cognet (mentor)	2008-06-10 20:55:10 +00:00
Konstantin Belousov	a70537835f	Provide the mutual exclusion between the nfs export list modifications and nfs requests processing. Lockmgr lock provides the shared locking for nfs requests, while exclusive mode is used for modifications. The writer starvation is handled by lockmgr too. Reported by: kris, pho, many Based on the submission by: mohan Tested by: pho MFC after: 2 weeks	2008-06-09 10:31:38 +00:00
Wojciech A. Koszek	2e75877f12	Remove checks against DDB, which isn't used in this file. My intention is to bring no functional change. Discussion on: IRC Reviewed by: ed, kan, rink,	2008-06-08 20:43:27 +00:00
Ed Schouten	5db88944ac	Remove unneeded Giant locking of /dev/tty. The Giant lock is acquired in two places in tty_tty.c. In both places, it is unneeded. There is no reason to specify D_NEEDGIANT on this device node. The device node has only been designed to return ENXIO when opened. It doesn't make any sense to lock/unlock Giant, just to return this error. D_TTY is also unneeded. The unimplemented functions don't need to be patched by devfs. We don't need to lock Giant when we want to lookup the proper TTY vnode. s_ttyvp is already protected by proctree_lock (see devfs_vnops.c). Approved by: philip (mentor)	2008-06-03 12:38:00 +00:00
David Xu	6e24e61797	Use a separate hash table for mutexes and rwlocks, to avoid wasting time walking through idle threads sleeping on condition variables.	2008-05-30 02:18:54 +00:00
Ed Schouten	06d425f92e	Remove the distinction between device minor and unit numbers. Even though we got rid of device major numbers some time ago, device drivers still need to provide unique device minor numbers to make_dev(). These numbers are only used inside the kernel. They are not related to device major and minor numbers which are visible in devfs. These are actually based on the inode number of the device. It would eventually be nice to remove minor numbers entirely, but we don't want to be too aggressive here. Because the 8-15 bits of the device number field (si_drv0) are still reserved for the major number, there is no 1:1 mapping of the device minor and unit numbers. Because this is now unused, remove the restrictions on these numbers. The MAXMAJOR definition was actually used for two purposes. It was used to convert both the userspace and kernelspace device numbers to their major/minor pair, which is why it is now named UMINORMASK. minor2unit() and unit2minor() have now become useless. Both minor() and dev2unit() now serve the same purpose. We should eventually remove some of them, at least turning them into macros. If devfs would become completely minor number unaware, we could consider using si_drv0 directly, just like si_drv1 and si_drv2. Approved by: philip (mentor)	2008-05-29 12:50:46 +00:00
Ed Schouten	cc8945d204	Remove redundant checks from fcntl()'s F_DUPFD. Right now we perform some of the checks inside the fcntl()'s F_DUPFD operation twice. We first validate the `fd' argument. When finished, we validate the `arg' argument. These checks are also performed inside do_dup(). The reason we need to do this, is because fcntl() should return different errno's when the `arg' argument is out of bounds (EINVAL instead of EBADF). To prevent the redundant locking of the PROC_LOCK and FILEDESC_SLOCK, patch do_dup() to support the error semantics required by fcntl(). Approved by: philip (mentor)	2008-05-28 20:25:19 +00:00
Ed Schouten	09a80aba8e	Rename `tty_subr.c' to `subr_clist.c'. Because clists are also used outside the TTY layer, rename the file containing the clist routines to something more accurate. The mpsafetty TTY layer doesn't use clists. It uses its own buffers, which also implement the unbuffered copying to userspace. We cannot simply remove the clist routines then, because this would break various drivers that are present within the kernel. Approved by: philip (mentor)	2008-05-27 06:41:50 +00:00
Attilio Rao	48972152ee	Improve a comment which, in the actual CVS stock, doesn't completely explain the logic of the code chunk.	2008-05-27 00:27:50 +00:00
Konstantin Belousov	887aedc64e	Take into account possible overflow when multiplying. The casualty is the malloc call later, panicking the kernel due to the oversized allocation. Reported by: pho Reviewed by: jeff	2008-05-26 10:01:13 +00:00
Robert Watson	e4372ceba0	Remove netatm from HEAD as it is not MPSAFE and relies on the now removed NET_NEEDS_GIANT. netatm has been disconnected from the build for ten months in HEAD/RELENG_7. Specifics: - netatm include files - netatm command line management tools - libatm - ATM parts in rescue and sysinstall - sample configuration files and documents - kernel support as a module or in NOTES - netgraph wrapper nodes for netatm - ctags data for netatm. - netatm-specific device drivers. MFC after: 3 weeks Reviewed by: bz Discussed with: bms, bz, harti	2008-05-25 22:11:40 +00:00
Attilio Rao	5047a8fd88	The "if" semantic is not needed, just fix this.	2008-05-25 16:11:27 +00:00
Attilio Rao	258f4727f1	Replace direct atomic operations on the file refcount with the refcount interface. It also introduces the correct usage of memory barriers, as sometimes fdrop() and fhold() are used with shared locks, which don't use any release barrier.	2008-05-25 14:57:43 +00:00
John Birrell	6f5f25e521	Add the vtime (virtual time) hooks for DTrace.	2008-05-25 01:44:58 +00:00
John Birrell	5d217f173c	Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.	2008-05-24 06:22:16 +00:00
Craig Rodrigues	a9722ace80	Do not convert the "snapshot" string to the MNT_SNAPSHOT flag here, since we do it further down in ffs_vfsops.c MFC after: 1 month	2008-05-23 23:33:07 +00:00
Konstantin Belousov	15822fcdbe	Rev. 1.274 put the ttyrel() call before the destroy_dev() in ttyfree(), freeing the tty. Since destroy_dev() may call the d_purge() cdevsw method, which is ttypurge() for the tty, the code ends up accessing the freed tty structure. Put the ttyrel() after destroy_dev() in ttyfree(). To prevent the panic that rev. 1.274 provided a fix for, check TS_GONE in the sysctl handler and refuse to provide information on such a tty. Reported, debugging help and tested by: pho Discussed with and reviewed by: jhb MFC after: 1 week	2008-05-23 16:47:55 +00:00
Konstantin Belousov	cc57af357b	The dev_refthread() in the tty_gettp() may fail, because Giant is taken in the giant_trick routines after the dev_refthread increments the si_threadcount. Remove assert, do not perform dev_relthread() for failed dev_refthread(), and handle failure in the tty_gettp() callers (cdevsw tty methods). Before kern_conf.c 1.210 and 1.211, the kernel usually panicked in the giant_trick routines dereferencing NULL cdevsw, not taking this fault. Reported by: Vince Hoffman <jhary unsane co uk> Debugging help and tested by: pho Reviewed by: jhb MFC after: 1 week	2008-05-23 16:46:27 +00:00
Konstantin Belousov	ca091c56e3	Use the t_state for the TS_GONE test. Submitted by: jhb MFC after: 3 days	2008-05-23 16:43:59 +00:00
Konstantin Belousov	06fe11294d	Assert that si_threadcount > 0 before decrementing it. This helps catching the improper use of the dev_refthread/dev_relthread. Tested by: pho MFC after: 1 week	2008-05-23 16:38:38 +00:00
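
Note on 7928893d83 (bcopy instead of strlcpy): the point of that commit is that sun_path must be copied by length, not by searching for a NUL. The following is a minimal userland sketch of the same idea, not the kernel code; the helper name copy_sun_path() and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: copy the path bytes of a UNIX-domain address into
 * buf and NUL-terminate the copy. addrlen is the address length given by
 * the caller; the path itself is not assumed to be NUL-terminated.
 * Returns the number of path bytes copied, or (size_t)-1 on a bad length.
 */
static size_t
copy_sun_path(const struct sockaddr_un *soun, size_t addrlen, char *buf,
    size_t buflen)
{
	size_t hdr = offsetof(struct sockaddr_un, sun_path);
	size_t pathlen;

	if (addrlen < hdr)
		return ((size_t)-1);
	pathlen = addrlen - hdr;
	if (pathlen > sizeof(soun->sun_path) || pathlen >= buflen)
		return ((size_t)-1);
	/* Bounded by the explicit length; never scans for a terminator. */
	memcpy(buf, soun->sun_path, pathlen);
	buf[pathlen] = '\0';
	return (pathlen);
}
```

A string function that looks for a terminator would read past the end of sun_path whenever the caller filled the whole array, which is the failure the commit message describes.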
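
Note on cc8945d204 (F_DUPFD checks): the errno distinction that commit preserves can be observed from userland. This is a small sketch assuming a POSIX system; dupfd_errno() is a hypothetical helper, not part of any API.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: attempt fcntl(fd, F_DUPFD, arg). On success, close
 * the new descriptor and return 0; on failure return the errno value that
 * fcntl() reported.
 */
static int
dupfd_errno(int fd, int arg)
{
	int newfd = fcntl(fd, F_DUPFD, arg);

	if (newfd == -1)
		return (errno);
	close(newfd);
	return (0);
}
```

With a valid descriptor fd, dupfd_errno(-1, 0) is expected to yield EBADF (bad `fd'), while dupfd_errno(fd, -1) is expected to yield EINVAL (out-of-range `arg').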
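
Note on 887aedc64e (overflow when multiplying): the general guard against a wrapped size computation before an allocation can be sketched as below. This is a generic userland illustration, not the code from the commit; alloc_array() is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate nmemb elements of the given size, refusing
 * the request if the product nmemb * size would not fit in a size_t.
 */
static void *
alloc_array(size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size)
		return (NULL);	/* nmemb * size would wrap around */
	return (malloc(nmemb * size));
}
```

The division-based test is done before the multiplication, so the check itself cannot overflow.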
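
Note on 258f4727f1 (refcount interface and memory barriers): the barrier pattern usually associated with reference counting can be sketched with C11 atomics. This is not the FreeBSD refcount(9) implementation; struct obj_ref and the ref_*() names are assumptions for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted header. */
struct obj_ref {
	atomic_uint count;
};

static void
ref_init(struct obj_ref *r, unsigned int n)
{
	atomic_init(&r->count, n);
}

/* Taking a reference needs no ordering: the caller already holds one. */
static void
ref_acquire(struct obj_ref *r)
{
	atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);
}

/*
 * Drop a reference. The release ordering publishes this thread's earlier
 * writes to the object; the acquire fence makes the thread that drops the
 * last reference see the writes of all previous droppers before it frees.
 * Returns true when the caller dropped the last reference.
 */
static bool
ref_release(struct obj_ref *r)
{
	if (atomic_fetch_sub_explicit(&r->count, 1, memory_order_release) != 1)
		return (false);
	atomic_thread_fence(memory_order_acquire);
	return (true);
}
```

Putting the ordering inside the release operation means callers do not depend on a surrounding lock to supply it, which matters when the lock is only held shared.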
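
Note on 05427aafc6 (member2struct() for cdev_priv): recovering the enclosing structure from a pointer to one of its members is pointer arithmetic with offsetof(). The sketch below uses a locally defined CONTAINER_OF macro and made-up struct names; it illustrates the technique and is not the kernel's definition.

```c
#include <stddef.h>

/* Local illustration macro: step back from a member to its container. */
#define CONTAINER_OF(type, member, ptr) \
	((struct type *)(void *)((char *)(ptr) - offsetof(struct type, member)))

/* Hypothetical public part, always embedded in struct wrapper. */
struct inner {
	int unit;
};

/* Hypothetical private wrapper around struct inner. */
struct wrapper {
	int private_flags;
	struct inner in;
};

/* Promote a pointer to the embedded member to its enclosing wrapper. */
static struct wrapper *
inner_to_wrapper(struct inner *ip)
{
	return (CONTAINER_OF(wrapper, in, ip));
}
```

This only works when the member really is embedded in the wrapper, which is the invariant the commit message states for struct cdev and struct cdev_priv; in exchange, no back pointer needs to be stored in the member.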


10536 Commits