freebsd-dev

Author	SHA1	Message	Date
John Baldwin	64ecd1399f	- Make maxpipekva a signed long rather than an unsigned long as overflow is more likely to be noticed with signed types. - Make amountpipekva a long as well to match maxpipekva. Discussed with: bde	2009-03-10 21:28:43 +00:00
John Baldwin	060e911cf4	In the ABI shim for vfs.bufspace, rather than truncating values larger than INT_MAX to INT_MAX, just go ahead and write out the full long to give an error of ENOMEM to the user process. Requested by: bde	2009-03-10 21:27:15 +00:00
John Baldwin	72150f4cf7	- Remove a recently added comment from kernel_sysctlbyname() that isn't needed. - Move the release of the sysctl sx lock after the vsunlock() in userland_sysctl() to restore the original memlock behavior of minimizing the amount of memory wired to handle sysctl requests. MFC after: 1 week	2009-03-10 17:00:28 +00:00
John Baldwin	38cce81ab3	Add an ABI compat shim for the vfs.bufspace sysctl for sysctl requests that try to fetch it as an int rather than a long. If the current value is greater than INT_MAX it reports a value of INT_MAX.	2009-03-10 15:26:50 +00:00
John Baldwin	5bd65606f4	Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the follow values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require a some gross shims to allow for that. MFC after: 1 month	2009-03-09 19:35:20 +00:00
John Baldwin	4ab2a9a022	Move the debug.hashstat sysctl tree under DIAGNOSTIC. I measured the debug.hashstat.rawnchash sysctl in particular as taking 7 milliseconds on a 3GHz Intel Xeon (4x2) running 7.1. It accounted for almost a quarter of the total runtime of 'sysctl -a'. It also performs lots of copyout's while holding the namecache lock (this does not attempt to fix that). MFC after: 2 weeks	2009-03-09 19:04:53 +00:00
Warner Losh	2a7e13e5ad	Fix a long-standing bug in newbus. It was introduced when subclassing was introduced. If you have a bus, say cardbus, that is derived from a base-bus (say PCI), then ordinarily all PCI drivers would attach to cardbus devices. However, there had been one exception: kldload wouldn't work. The problem is in devclass_add_driver. In this routine, all we did was call to the pci device's BUS_DRIVER_ADDED routine. However, since cardbus bus instances had a different devclass, none of them were called. The solution is to call all subclass devclasses, recursively down the tree, of the class that was loaded. Since we don't have a 'children class' pointer, we search the whole list of devclasses for a class whose parent matches. Since just done a kldload time, this isn't as bad as it sounds. In addition, we short-circuit the whole process by marking those classes with subclasses with a flag. We'll likely have to reevaluate this method the number of devclasses with subclasses gets large. This means we can remove the "cardbus" lines from all the PCI drivers since we have no cardbus specific attach device attachments in the tree. # Also: minor tweak to an error message	2009-03-09 13:20:23 +00:00
Robert Watson	83160d1408	By default, don't compile in counters of calls to various time query functions in the kernel, as these effectively serialize parallel calls to the gettimeofday(2) system call, as well as other kernel services that use timestamps. Use the NetBSD version of the fix (kern_tc.c:1.32 by ad@) as they have picked up our timecounter code and also ran into the same problem. Reported by: kris Obtained from: NetBSD MFC after: 3 days	2009-03-08 22:19:28 +00:00
Robert Watson	3dab55bc86	Decompose the global UNIX domain sockets rwlock into two different locks: a global list/counter/generation counter protected by a new mutex unp_list_lock, and a global linkage rwlock, unp_global_rwlock, which protects the connections between UNIX domain sockets. This eliminates conditional lock acquisition that was previously a property of the global lock being held over sonewconn() leading to a call to uipc_attach(), which also required the global lock, but couldn't rely on it as other paths existed to uipc_attach() that didn't hold it: now uipc_attach() uses only the list lock, which follows the linkage lock in the lock order. It may also reduce contention on the global lock for some workloads. Add global UNIX domain socket locks to hard-coded witness lock order. MFC after: 1 week Discussed with: kris	2009-03-08 21:48:29 +00:00
Joe Marcus Clarke	f8ecc40737	Add a default implementation for VOP_VPTOCNP(9) which scans the parent directory of a vnode to find a dirent with a matching file number. The name from that dirent is then used to provide the component name. Note: if the initial vnode argument is not a directory itself, then the default VOP_VPTOCNP(9) implementation still returns ENOENT. Reviewed by: kib Approved by: kib Tested by: pho	2009-03-08 19:05:53 +00:00
Robert Watson	fefd0ac8a9	Remove 'uio' argument from MAC Framework and MAC policy entry points for extended attribute get/set; in the case of get an uninitialized user buffer was passed before the EA was retrieved, making it of relatively little use; the latter was simply unused by any policies. Obtained from: TrustedBSD Project Sponsored by: Google, Inc.	2009-03-08 12:32:06 +00:00
Robert Watson	6f6174a762	Improve the consistency of MAC Framework and MAC policy entry point naming by renaming certain "proc" entry points to "cred" entry points, reflecting their manipulation of credentials. For some entry points, the process was passed into the framework but not into policies; in these cases, stop passing in the process since we don't need it. mac_proc_check_setaudit -> mac_cred_check_setaudit mac_proc_check_setaudit_addr -> mac_cred_check_setaudit_addr mac_proc_check_setauid -> mac_cred_check_setauid mac_proc_check_setegid -> mac_cred_check_setegid mac_proc_check_seteuid -> mac_cred_check_seteuid mac_proc_check_setgid -> mac_cred_check_setgid mac_proc_check_setgroups -> mac_cred_ceck_setgroups mac_proc_check_setregid -> mac_cred_check_setregid mac_proc_check_setresgid -> mac_cred_check_setresgid mac_proc_check_setresuid -> mac_cred_check_setresuid mac_proc_check_setreuid -> mac_cred_check_setreuid mac_proc_check_setuid -> mac_cred_check_setuid Obtained from: TrustedBSD Project Sponsored by: Google, Inc.	2009-03-08 10:58:37 +00:00
Konstantin Belousov	125dcf8c7d	Extract the no_poll() and vop_nopoll() code into the common routine poll_no_poll(). Return a poll_no_poll() result from devfs_poll_f() when filedescriptor does not reference the live cdev, instead of ENXIO. Noted and tested by: hps MFC after: 1 week	2009-03-06 15:35:37 +00:00
Konstantin Belousov	45329b60da	Systematically use vm_size_t to specify the size of the segment for VM KPI. Do not overload the local variable size in kern_shmat() due to vm_size_t change. Fix style bug by adding explicit comparision with 0. Discussed with: bde MFC after: 1 week	2009-03-05 11:45:42 +00:00
Dmitry Chagin	b2421c29f6	as suggested by jhb@, panic in case the ncpus == 0. it helps to catch bugs in the callers. Approved by: kib (mentor) MFC after: 5 days	2009-03-03 17:34:09 +00:00
Robert Watson	73e416e35d	Reduce the verbosity of SDT trace points for DTrace by defining several wrapper macros that allow trace points and arguments to be declared using a single macro rather than several. This means a lot less repetition and vertical space for each trace point. Use these macros when defining privilege and MAC Framework trace points. Reviewed by: jb MFC after: 1 week	2009-03-03 17:15:05 +00:00
Jamie Gritton	f86bce5ed0	Extend the "vfsopt" mount options for more general use. Make struct vfsopt and the vfs_buildopts function public, and add some new fields to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and vfs_opterror. Further extend the interface to allow reading options from the kernel in addition to sending them to the kernel, with vfs_setopt and related functions. While this allows the "name=value" option interface to be used for more than just FS mounts (planned use is for jails), it retains the current "vfsopt" name and <sys/mount.h> requirement. Approved by: bz (mentor)	2009-03-02 23:26:30 +00:00
Alexander Kabaev	5ab4bb35fb	Change vfs_busy to wait until an outcome of pending unmount operation is known and to retry or fail accordingly to that outcome. This fixes the problem with namespace traversing programs failing with random ENOENT errors if someone just happened to try to unmount that same filesystem at the same time. Reported by: dhw Reviewed by: kib, attilio Sponsored by: Juniper Networks, Inc.	2009-03-02 20:51:39 +00:00
Konstantin Belousov	65067cc8b0	Correct types of variables used to track amount of allocated SysV shared memory from int to size_t. Implement a workaround for current ABI not allowing to properly save size for and report more then 2Gb sized segment of shared memory. This makes it possible to use > 2 Gb shared memory segments on 64bit architectures. Please note the new BUGS section in shmctl(2) and UPDATING note for limitations of this temporal solution. Reviewed by: csjp Tested by: Nikolay Dzham <i levsha org ua> MFC after: 2 weeks	2009-03-02 18:53:30 +00:00
Konstantin Belousov	2883703e00	Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process executing on 64bit kernel. This eliminates the direct comparisions of p_sysent with &ia32_freebsd_sysvec, that were left intact after r185169.	2009-03-02 18:43:50 +00:00
Dmitry Chagin	6485a22ccb	Fix range-check error introduced in r182292. Also do not do anything if all processors in the map are not available, simply return. Approved by: kib (mentor) MFC after: 1 week	2009-03-01 14:26:24 +00:00
Ed Schouten	c4d4bcdaf6	Improve my previous changes to the TTY code: also remove memcpy(). It's better to just use internal language constructs, because it is likely the compiler has a better opinion on whether to perform inlining, which is very likely to happen to struct winsize. Submitted by: Christoph Mallon <christoph mallon gmx de>	2009-03-01 09:50:13 +00:00
Andrew Thompson	fef11cb704	Move the NORELEASE check to after the recurse count decrement and bailout, this is not counted as actually releasing the lock.	2009-02-28 19:10:43 +00:00
Ed Schouten	4b2d6aaf4b	Replace bcopy() calls inside the TTY layer with memcpy()/strlcpy(). In all these cases the buffers never overlap. Program names are also likely to be shorter, so use a regular strlcpy() to copy p_comm.	2009-02-28 14:20:26 +00:00
Bjoern A. Zeeb	33553d6e99	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
Ed Schouten	91c3cbfe1f	Remove redundant code in printf() and vprintf(). printf() and vprintf() are exactly the same, except the way arguments are passed. Just like we see in other pieces of code (i.e. libc's printf()), implement printf() using vprintf(). Submitted by: Christoph Mallon <christoph mallon gmx de>	2009-02-27 13:28:54 +00:00
Ed Schouten	ff7b7d9039	Revert previous commit to subr_prf.c and make it more tidy. As mentioned by bz and bde, the change I made wasn't the proper way to fix. Inspired by bde's patch, perform some small cleanups to uprintf(). Reviewed by: bz	2009-02-27 12:50:25 +00:00
Ed Schouten	69c9eff894	Remove unneeded pointer `ndp'. Inside do_execve(), we have a pointer `ndp', which always points to `&nd'. I can imagine a primitive (non-optimizing) compiler to really reserve space for such a pointer, so just remove the variable and use `&nd' directly.	2009-02-26 16:32:48 +00:00
Ed Schouten	c90c9021e9	Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build	2009-02-26 15:51:54 +00:00
Ed Schouten	318b1c3fd0	Remove unneeded variable `ocn_mute'. Found by: LLVM's scan-build	2009-02-26 13:01:45 +00:00
Ed Schouten	5225593633	Remove unused variables `p' and unneeded assignments of` rval'. Found by: LLVM's scan-build	2009-02-26 13:00:13 +00:00
Ed Schouten	2bbada90c8	Remove redundant assignment of `p'. `p' is already initialized with `td->td_proc'. Because td is always curthread, it is safe to initialize it without any locks. Found by: LLVM's scan-build	2009-02-26 12:12:34 +00:00
Robert Watson	6efcc2f26a	Add static tracing for privilege checking: priv:kernel:priv_check:priv_ok fires for granted privileges priv:kernel:priv_check:priv_errr fires for denied privileges The first argument is the requested privilege number. The naming convention is a little different from the OpenSolaris equivilent because we can't have '-' in probefunc names, and our privilege namespace is different. MFC after: 1 week	2009-02-26 10:56:13 +00:00
Ed Schouten	9e5775857d	Silence compiler warning inside our ^T handler. It turns out we're casting fixpt_t* to int*. Spotted by: clang	2009-02-26 10:38:19 +00:00
Ed Schouten	1d952ed28c	Use unsigned longs for the TTY's sysctl stats. Spotted by: clang	2009-02-26 10:28:32 +00:00
Ed Schouten	1e737f33a0	Don't use PTY name as format string, even though it isn't insecure here. It's guaranteed that the `name' variable always contains a string of the form pty[l‐sL‐S][0‐9a‐v], but I'd rather keep the compiler happy (LLVM).	2009-02-26 10:14:10 +00:00
Jamie Gritton	613042491b	Add support for methods to the OSD subsystem. Each object type has a predefined set of methods, which are set in osd_register() and called via osd_call(). Currently, no methods are defined, though prison objects will have some in the future. Expand the locking from a single per-type mutex to three different kinds of locks (four if you include the requirement that the container (e.g. prison) be locked when getting/setting data). This clears up one existing issue, as well as others added by the method support. Approved by: bz (mentor)	2009-02-21 11:15:38 +00:00
Ed Schouten	0eee862a54	Don't make Linux stat() open character devices to resolve its name. The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example. Changes I have made: - Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode. - Remove unneeded printf's from stat() and statfs(). - Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at(). - Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev. Result: crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0 crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1 crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2 Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers. Reviewed by: kib, netchild, rdivacky (thanks!)	2009-02-20 13:05:29 +00:00
John Baldwin	03964c8e09	Enable caching of negative pathname lookups in the NFS client. To avoid stale entries, we save a copy of the directory's modification time when the first negative cache entry was added in the directory's NFS node. When a negative cache entry is hit during a pathname lookup, the parent directory's modification time is checked. If it has changed, all of the negative cache entries for that parent are purged and the lookup falls back to using the RPC. This required adding a new cache_purge_negative() method to the name cache to purge only negative cache entries for a given directory. Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp Reviewed by: mohans	2009-02-19 22:28:48 +00:00
Ed Schouten	40d05103d8	Squash some small bugs in pts(4). - Don't return a negative errno when using an unknown ioctl() on a pseudo-terminal master device. Be sure to convert ENOIOCTL to ENOTTY, just like the TTY layer does. - Even though we should return st_rdev of the master device node when emulating pty(4) devices, FIODGNAME should still return the name of the slave device. Otherwise ptsname(3) and ttyname(3) return an invalid device name.	2009-02-19 17:54:42 +00:00
Attilio Rao	f8d9048018	- Add a function (fill_kinfo_aggregate()) which aggregates relevant members for a kinfo entry on a process-wide system. - Use the newly introduced function in order to fix cases like KERN_PROC_PROC where aggregating stats are broken because they just consider the first thread in the pool for each process. (Note, additively, that KERN_PROC_PROC is rather inaccurate on thread-wide informations like the 'state' of the process. Such informations should maybe be invalidated and being forceably discarded by the consumers?). - Simplify the logic of sysctl_out_proc() and adjust the fill_kinfo_thread() accordingly. - Remove checks on the FIRST_THREAD_IN_PROC() being NULL but add assertives. This patch should fix aggregate statistics for KERN_PROC_PROC. This is one of the reasons why top doesn't use this option and now it can be use it safely. ps, when launched in order to display just processes, now should report correct cpu utilization percentages and times (as opposed by the old code). Reviewed by: jhb, emaste Sponsored by: Sandvine Incorporated	2009-02-18 21:52:13 +00:00
Joe Marcus Clarke	0618630015	Remove the printf's when the vnode to be exported for procstat is not a VDIR. If the file system backing a process' cwd is removed, and procstat -f PID is called, then these messages would have been printed. The extra verbosity is not required in this situation. Requested by: kib Approved by: kib	2009-02-14 21:55:09 +00:00
Joe Marcus Clarke	03fd9c2092	Change two KASSERTS to printfs and simple returns. Stress testing has revealed that a process' current working directory can be VBAD if the directory is removed. This can trigger a panic when procstat -f PID is run. Tested by: pho Discovered by: phobot Reviewed by: kib Approved by: kib	2009-02-14 21:12:24 +00:00
Andrew Thompson	a1797ef6c8	Remove semicolon left in the last commit Spotted by: csjp	2009-02-13 18:51:39 +00:00
John Baldwin	ea77ff0a15	Use shared vnode locks when invoking VOP_READDIR(). MFC after: 1 month	2009-02-13 18:18:14 +00:00
Luigi Rizzo	d4619572b4	Clarify and reimplement the bioq API so that bioq_disksort() has the correct behaviour (sorting by distance from the current head position in the scan direction) and bioq_insert_head() and bioq_insert_tail() have a well defined (and useful) behaviour, especially when intermixed with calls to bioq_disksort(). In particular: - fix a bug in the existing bioq_disksort() that did not use the current head position correctly; - redefine semantics of bioq_insert_head() and bioq_insert_tail(). bioq_insert_tail() can now be used as a barrier between previous and subsequent calls to bioq_disksort(). The code is heavily documented in the source code so please refer to that for the details. Much of this code comes from Fabio Checconi. Also thanks to Kirk for feedback on the (re)definition of bioq_insert_tail(). NOTE: in the current tree there is only a handful of files which intermix calls to bioq_disksort() with bioq_insert_head() and bioq_insert_tail(). The ordering of the queue in these situation was not specified (nor easy to figure out) before, so I doubt any of that code could be affected by the specification of the API. Also note that the current implementation is significantly simpler than the previous one (also used in ata_sort_queue()). It would be useful to reimplement ata_sort_queue() using the same code used in bioq_disksort(). MFC after: 1 week	2009-02-13 11:36:32 +00:00
Andrew Thompson	24ef070126	Check the exit flag at the start of the taskqueue loop rather than the end. It is possible to tear down the taskqueue before the thread has run and the taskqueue loop would sleep forever. Reviewed by: sam MFC after: 1 week	2009-02-13 01:16:51 +00:00
Ed Schouten	c0086bf202	Serialize write() calls on TTYs. Just like the old TTY layer, the current MPSAFE TTY layer does not make any attempt to serialize calls of write(). Data is copied into the kernel in 256 (TTY_STACKBUF) byte chunks. If a write() call occurs at the same time, the data may interleave. This is especially likely when the TTY starts blocking, because the output queue reaches the high watermark. I've implemented this by adding a new flag, TTY_BUSY_OUT, which is used to mark a TTY as having a thread stuck in write(). Because I don't want non-blocking processes to be possibly blocked by a sleeping thread, I'm still allowing it to bypass the protection. According to this message, the Linux kernel returns EAGAIN in such cases, but I think that's a little too restrictive: http://kerneltrap.org/index.php?q=mailarchive/linux-kernel/2007/5/2/85418/thread PR: kern/118287	2009-02-11 16:28:49 +00:00
Robert Watson	54fffe2d67	Modify fdcopy() so that, during fork(2), it won't copy file descriptors from the parent to the child process if they have an operation vector of &badfileops. This narrows a set of races involving system calls that allocate a new file descriptor, potentially block for some extended period, and then return the file descriptor, when invoked by a threaded program that concurrently invokes fork(2). Similar approches are used in both Solaris and Linux, and the wideness of this race was introduced in FreeBSD when we moved to a more optimistic implementation of accept(2) in order to simplify locking. A small race necessarily remains because the fork(2) might occur after the finit() in accept(2) but before the system call has returned, but that appears unavoidable using current APIs. However, this race is vastly narrower. The fix can be validated using the newfileops_on_fork regression test. PR: kern/130348 Reported by: Ivan Shcheklein <shcheklein at gmail dot com> Reviewed by: jhb, kib MFC after: 1 week	2009-02-11 15:22:01 +00:00
Warner Losh	c9584ebe61	o Use NULL in pereference to 0 in pointer contexts. o Use newly minted KOBJMETHOD_END as appropriate o fix prototype for root_setup_intr.	2009-02-11 04:54:02 +00:00

1 2 3 4 5 ...

10976 Commits