freebsd-skq

Author	SHA1	Message	Date
jhb	ee8312c8bb	Use shared vnode locks instead of exclusive vnode locks for the access(), chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls. Submitted by: ups (mostly) Tested by: pho MFC after: 1 month	2008-11-03 20:31:00 +00:00
attilio	26a604f3bc	Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless. Really, the concept of holdcnt in the struct mount is represented by the mnt_ref (which prevents the type-stable structure from being "recycled") handled through vfs_ref() and vfs_rel(). On this optic, switch the holdcnt acquisition into an emulated vfs_ref() (and subsequent release into vfs_rel()). Discussed with: kib Tested by: pho	2008-11-03 20:00:35 +00:00
jhb	24139401dd	A few style nits.	2008-11-03 19:33:20 +00:00
dfr	6929a6d99b	Regen.	2008-11-03 10:39:35 +00:00
dfr	2fb03513fc	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. 
The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month	2008-11-03 10:38:00 +00:00
ivoras	d819bb20f8	Increase the initial sbuf size for CPU topology dump to something more usable for newer CPUs. The new value allows 2 x quad core configuration dumps to fit within the initial buffer without reallocations. Approved by: gnn (mentor) (older version) Pointed out by: rdivacky	2008-11-02 23:11:20 +00:00
attilio	e1f493235e	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr already) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho	2008-11-02 10:15:42 +00:00
ed	57b4089c20	Clamp the values of t_column to 5 digits in `pstat -t' and `show all ttys'. We often run into these very high column numbers when we run curses applications, because they don't print any newlines. This messes up the table output of `pstat -t'. If these numbers get really high, they aren't of any use to the reader anyway. Convert them to `99999' when they run out of bounds.	2008-11-01 13:40:46 +00:00
ed	c2c324d379	Reimplement the /dev/console device node. One of the pieces of code that I had left alone during the development of the MPSAFE TTY layer, was tty_cons.c. This file actually has two different functions: - It contains low-level console input/output routines (cnputc(), etc). - It creates /dev/console and wraps all its cdevsw calls to the appropriate TTY. This commit reimplements the second set of functions by moving it directly into the TTY layer. /dev/console is now a character device node that's basically a regular TTY, but does a lookup of `si_drv1' each time you open it. d_write has also been changed to call log_console(). d_close() is not present, because we must make sure we don't revoke the TTY after writing a log message to it. Even though I'm not convinced this is in line with the future directions of our console code, it is a good move for now. It removes recursive locking from the top half of the TTY layer. The previous implementation called into the TTY layer with Giant held. I'm renaming tty_cons.c to kern_cons.c now. The code hardly contains any TTY related bits, so we'd better give it a less misleading name. Tested by: Andrzej Tobola <ato iem pw edu pl>, Carlos A.M. dos Santos <unixmania gmail com>, Eygene Ryabinkin <rea-fbsd codelabs ru>	2008-11-01 08:35:28 +00:00
peter	1f7fd22cbb	Add three extra to the kinfo_proc_vmmap data. kve_offset - the offset within an object that a mapping refers to. fileid and fsid are inode/dev for vnodes. (Linux procfs has these and valgrind is really unhappy without them.) I believe I didn't change the size of the struct.	2008-10-31 05:43:19 +00:00
sobomax	dafc63cd43	Make it possible to compile kernel with KTR but without DDB.	2008-10-30 21:48:28 +00:00
ivoras	483637ae39	Introduce a new sysctl, kern.sched.topology_spec, that returns an XML dump of detected ULE CPU topology. This dump can be used to check the topology detection and for general system information. An example of CPU topology dump is: kern.sched.topology_spec: <groups> <group level="1" cache-level="0"> <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu> <flags></flags> <children> <group level="2" cache-level="0"> <cpu count="4" mask="0xf">0, 1, 2, 3</cpu> <flags></flags> </group> <group level="2" cache-level="0"> <cpu count="4" mask="0xf0">4, 5, 6, 7</cpu> <flags></flags> </group> </children> </group> </groups> Reviewed by: jeff Approved by: gnn (mentor)	2008-10-29 13:36:23 +00:00
davidxu	11aa09b488	If threads limit is exceeded, increase the total number of failures.	2008-10-29 12:11:48 +00:00
trasz	4e57a80147	Rename a variable missed in previous accmode_t-related commits. Approved by: rwatson (mentor)	2008-10-28 21:58:48 +00:00
trasz	0ad8692247	Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)	2008-10-28 13:44:11 +00:00
kib	b9b0d2c54c	Style return statements in vn_pollrecord().	2008-10-28 12:22:33 +00:00
kib	86b5e61ab2	Protect check for v_pollinfo == NULL and assignment of the newly allocated vpollinfo with vnode interlock. Fully initialize vpollinfo before putting pointer to it into vp->v_pollinfo. Discussed with: dwhite Tested by: pho MFC after: 1 week	2008-10-28 12:08:36 +00:00
rwatson	a2129bd144	Rename three MAC entry points from _proc_ to _cred_ to reflect the fact that they operate directly on credentials: mac_proc_create_swapper(), mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies. Obtained from: TrustedBSD Project	2008-10-28 11:33:06 +00:00
peter	b5b26198a7	After a machine has been up for a bit more than 20 days with HZ=1000, "ticks" goes negative. This breaks the signed comparison in softclock. This causes sleep() to never wake up, tcp to stop, etc etc. This is bad(TM). Use the SEQ_LT() method from tcp's sequence number comparisons.	2008-10-28 03:26:25 +00:00
jhb	c343bee743	- Whitespace fix for vop_poll. - Use the right label for vop_vptofh lock assertions so they are enforced.	2008-10-27 21:41:55 +00:00
sobomax	2bddeb51d2	vm_pnames should be "const char *const[]". Submitted by: Christoph Mallon	2008-10-27 08:09:05 +00:00
sobomax	c9fd562aa0	vm_pnames has no reason to be global. MFC after: 2 weeks	2008-10-27 06:34:41 +00:00
sobomax	6b076dc603	Default HZ value (1,000) on i386/amd64 is not very virtual machine friendly. Due to the nature of the beast it causes lot of unproductive overhead. This is especially bad when running SMP kernel on VMWare with several virtual processors - idle FreeBSD guest with SMP kernel takes 150% host CPU time on my dual-core MacBook Pro when I am enabling two virtual CPUs, making even host not very usable. Detect when we are running in the sandbox and reduce HZ to 10 (can be adjusted via VM_HZ in the kernel config) in such cases. This brings host CPU usage of idle FreeBSD/SMP on two virtual processors down to 10%. Detect most popular VM platforms out there - VMWare, Parallels, VirtualBox and VirtualPC. MFC after: 2 weeks	2008-10-27 06:25:02 +00:00
dfr	f98e1f1bbf	Don't rely on the value of *statep without first taking the vnode interlock. Reviewed by: Mike Tancsa MFC after: 2 weeks	2008-10-24 16:04:10 +00:00
davidxu	238f3ee5f4	Don't rearm callout if the process is exiting, it may leak a callout because callout_drain() only waits for running callout, but not disable it if it is rearmed.	2008-10-24 01:09:24 +00:00
davidxu	e66e7ee6bb	Partly revert revision 184199, because TDF_NEEDSIGCHK is persistent when thread is in kernel mode, it can cause dead loop, now unlock process lock after acquired sleep queue lock and thread lock to avoid the problem. This means TDF_NEEDSIGCHK and TDF_NEEDSUSPCHK must be set with process lock and thread lock being held at same time.	2008-10-24 01:03:31 +00:00
jhb	2e4682de75	Whitespace fix.	2008-10-23 21:50:16 +00:00
des	a1e1ad22e0	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
des	66f807ed8b	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
davidxu	2062caca24	Actually, for signal and thread suspension, extra process spin lock is unnecessary, the normal process lock and thread lock are enough. The spin lock is still needed for process and thread exiting to mimic single sched_lock.	2008-10-23 07:55:38 +00:00
jhb	327ae6eb3a	Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland. Submitted by: ups MFC after: 1 week	2008-10-22 21:55:48 +00:00
marcel	7de1858d0c	Trivially avoid a null pointer dereference when drivers don't set the rman description. While drivers should set it, a kernel panic is not the right behaviour when faced without one.	2008-10-22 18:20:45 +00:00
thompsa	0fcb99be5e	Fix spelling mistake in the last rev.	2008-10-21 14:44:25 +00:00
thompsa	8ee58ba9e6	If we have getc_inject hooked then the outq buffer is inaccessible to the driver so skip the drain rather than waiting indefinitely. Reviewed by: ed	2008-10-21 14:18:45 +00:00
kib	cc3d7dc928	Change vn_start_write() to clear *mpp on all failures when non-NULL vp is supplied, since vm_pageout_scan() expects it to be cleared on error. Submitted by: tegge PR: 123768 MFC after: 1 week	2008-10-21 09:55:49 +00:00
attilio	42c5b05453	In the actual code for witness_warn: - If there aren't spinlocks held, but there are problems with old sleeplocks, they are not reported. - If the spinlock found is not the only one, problems are not reported. Fix these 2 problems. Reported by: tegge	2008-10-20 19:22:16 +00:00
kib	e4785f6af4	Assert that v_holdcnt is non-zero before entering lockmgr in vn_lock and ffs_lock. This cannot catch situations where holdcnt is incremented not by curthread, but I think it is useful. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks	2008-10-20 10:11:33 +00:00
kib	015479d466	In vfs_busy(), lockmgr() cannot legitimately sleep, because code checked MNTK_UNMOUNT before, and mnt_mtx is used as interlock. vfs_busy() always tries to obtain a shared lock on mnt_lock, the other user is unmount who tries to drain it, setting MNTK_UNMOUNT before. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks	2008-10-20 10:07:28 +00:00
davidxu	57a7a67ea5	In realtimer_delete(), clear timer's value and interval to tell realtimer_expire() to not rearm the timer, otherwise there is a chance that a callout will be left there and be triggered unexpectedly in the future. Bug reported by: tegge@	2008-10-20 02:37:53 +00:00
kib	e8c0b1746f	Ktr(9) stores format string and arguments in the event circular buffer, not the string formatted at the time of CTRX() call. Stack_ktr(9) uses an on-stack buffer for the symbol name, that is supplied as an argument to ktr. As result, stack_ktr() traces show garbage or cause page faults. Fix stack_ktr() by using pointer to module symbol table that is supposed to have a longer lifetime. Tested by: pho MFC after: 1 week	2008-10-19 11:13:49 +00:00
kmacy	4ceda2abba	- Forward port flush of page table updates on context switch or userret - Forward port vfork XEN hack	2008-10-19 01:35:27 +00:00
bz	4d4d2d367d	Add cr_canseeinpcb() doing checks using the cached socket credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock. Reviewed by: rwatson MFC after: 3 months (set timer; decide then)	2008-10-17 16:26:16 +00:00
kmacy	f9a07efdb6	make sure that SO_NO_DDP and SO_NO_OFFLOAD get passed in correctly PR: 127360 MFC after: 3 days	2008-10-17 01:25:45 +00:00
attilio	708fbd2d50	- Fix a race in witness_checkorder() where, between the PCPU_GET() and PCPU_PTR() curthread can migrate on another CPU and get incorrect results. - Fix a similar race into witness_warn(). - Fix the interlock's checks bypassing by correctly using the appropriate children even when the lock_list chunk to be explored is not the first one. - Allow witness_warn() to work with spinlocks too. Bugs found by: tegge Submitted by: jhb, tegge Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-10-16 12:42:56 +00:00
davidxu	3f5ab59cf2	Restore code wrongly removed in SVN revision 173004, it causes threaded process to be stuck in execv(). Noticed by: delphij	2008-10-16 04:17:17 +00:00
ed	48c0c8f51a	Import some improvements to the TTY code from the MPSAFE TTY branch. - Change the ddb(4) commands to be more useful (by thompsa@): - `show ttys' is now called `show all ttys'. This command will now also display the address where the TTY data structure resides. - Add `show tty <addr>', which dumps the TTY in a readable form. - Place an upper bound on the TTY buffer sizes. Some drivers do not want to care about baud rates. Protect these drivers by preventing the TTY buffers from getting enormous. Right now we'll just clamp it to 64K, which is pretty high, taking into account that these buffers are only used by the built-in discipline. - Only call ttydev_leave() when needed. Back in April/May the TTY reference counting mechanism was a little different, which required us to call ttydev_leave() each time we finished a cdev operation. Nowadays we only need to call ttydev_leave() when we really mark it as being closed. - Improve return codes of read() and write() on TTY device nodes. - Make sure we really wake up all blocked threads when the driver calls tty_rel_gone(). There were some possible code paths where we didn't properly wake up any readers/writers. - Add extra assertions to prevent sleeping on a TTY that has been abandoned by the driver. - Use ttydev_cdevsw as a more reliable method to figure out whether a device node is a real TTY device node. Obtained from: //depot/projects/mpsafetty/... Reviewed by: thompsa	2008-10-15 16:58:35 +00:00
davidxu	5068f6dcf0	Move per-thread userland debugging flags into a separate field, this eliminates some problems of locking, e.g, a thread lock is needed but can not be used at that time. Only the process lock is needed now for new field.	2008-10-15 06:31:37 +00:00
rdivacky	ead773b051	Check the result of copyin and in a case of error return one. This prevents setting wrong priority or (more likely) returning EINVAL. Approved by: kib (mentor)	2008-10-13 21:04:52 +00:00
rwatson	ef6dfc27c4	Downgrade XXX to a Note for fgetsock() and fputsock(). MFC after: 3 days	2008-10-12 20:03:17 +00:00
rwatson	f2c33837dd	Remove stale comment: while uipc_connect2() was, until recently, not static so it could be used by fifofs (actually portalfs), it is now static. Submitted by: kensmith	2008-10-11 17:28:22 +00:00
attilio	b8bf37e585	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-10-10 21:23:50 +00:00
imp	793aee6634	Close, but not eliminate, a race condition. It is one that properly designed drivers would never hit, but was exposed in diving into another problem... When expanding the devclass array, free the old memory after updating the pointer to the new memory. For the following single race case, this helps: allocate new memory; copy to new memory; free old memory; <interrupt> read pointer to freed memory; update pointer to new memory. Now we do: allocate new memory; copy to new memory; update pointer to new memory; free old memory. Which closes this problem, but doesn't even begin to address the multicpu races, which all should be covered by Giant at the moment, but likely aren't completely. Note: reviewers were ok with this fix, but suggested the use case wasn't one we wanted to encourage. Reviewed by: jhb, scottl.	2008-10-10 17:49:47 +00:00
kib	997f16fb43	If the ABI-overriden interpreter was not loaded, do not set have_interp to TRUE. This allows the code in image activator to try /libexec/ld-elf.so.1 as interpreter when newinterp is not found to execute. Reviewed by: peter MFC after: 2 weeks (together with r175105)	2008-10-08 11:11:36 +00:00
rwatson	8315016284	Remove stale comment (and XXX saying so) about why we zero the file descriptor pointer in unp_freerights: we can no longer recurse into unp_gc due to unp_gc being invoked in a deferred way, but it's still a good idea. MFC after: 3 days	2008-10-08 06:26:51 +00:00
rwatson	82c89c763f	Differentiate pr_usrreqs for stream and datagram UNIX domain sockets, and employ soreceive_dgram for the datagram case. MFC after: 3 months	2008-10-08 06:19:49 +00:00
rwatson	72d39e41ec	In soreceive_dgram, when a 0-length buffer is passed into recv(2) and no data is ready, return 0 rather than blocking or returning EAGAIN. This is consistent with the behavior of soreceive_generic (soreceive) in earlier versions of FreeBSD, and restores this behavior for UDP. Discussed with: jhb, sam MFC after: 3 days	2008-10-07 20:57:55 +00:00
rwatson	494f70982b	Remove temporary debugging KASSERT's introduced to detect protocols improperly invoking sosend(), soreceive(), and sopoll() instead of attach either specialized or _generic() versions of those functions to their pru_sosend, pru_soreceive, and pru_sopoll protosw methods. MFC after: 3 days	2008-10-07 09:57:03 +00:00
rwatson	73c76af492	Rewrite sbreserve_locked()'s comment on NULL thread pointers, eliminating an XXXRW about the comment being stale. MFC after: 3 days	2008-10-07 09:51:39 +00:00
rwatson	064d14f0bc	Lock receive socket buffer in soo_stat() rather than commenting that we should lock it, which may marginally improve the consistency of the results. Remove comment. MFC after: 3 days	2008-10-07 07:10:28 +00:00
rwatson	730ae9451a	Now that portalfs doesn't directly invoke uipc_connect2(), make it a static symbol. MFC after: 3 days	2008-10-06 18:43:11 +00:00
sam	5a99959acc	dynamically allocate the task structure in firmware_mountroot: when booting from an MFS root (e.g. from an install CD) firmware_mountroot can be called twice with the second call happening before the task callback occurs; this results in the task structure contents being corrupted because it was declared static. Submitted by: marius (original version)	2008-10-04 23:58:02 +00:00
jhb	c07f87c6a9	Oops, missed updating a place with 's/lock1/plock/' when adding interlock support to WITNESS. Specifically, the printf listing the first location when duplicate locks of the same type are acquired. Reported by: pho	2008-10-03 18:13:05 +00:00
rwatson	bbe0e18165	Further minor cleanups to UNIX domain sockets: - Staticize and locally prototype functions uipc_ctloutput(), unp_dispose(), unp_init(), and unp_externalize(), none of which have been required outside of uipc_usrreq.c since uipc_proto.c was removed. - Remove stale prototype for uipc_usrreq(), which has not existed in the code since 1997 - Forward declare and staticize uipc_usrreqs structure in uipc_usrreq.c and not un.h. - Comment on why uipc_connect2() is still non-static -- it is used directly by fifofs. - Remove stale comments, tidy up whitespace. MFC after: 3 days (where applicable)	2008-10-03 13:01:56 +00:00
rwatson	6ee0280c0f	Remove or update several stale comments. A bit of whitespace/style cleanup. Update copyright. MFC after: 3 days (applicable changes)	2008-10-03 09:01:55 +00:00
zec	8797d4caec	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
peter	ed8d07f232	Collect N identical (or near identical) mkdumpheader() implementations into one, as threatened in the comment. Textdump magic can be passed in.	2008-10-01 22:08:53 +00:00
jhb	f99df3bfe4	Enable shared locks for path name lookups on supported filesystems (NFS client, UFS, and ZFS) by default.	2008-10-01 19:25:37 +00:00
jhb	d705b32c33	Remove the LOOKUP_SHARED kernel option. Instead, make vfs.lookup_shared a loader tunable (it was already a sysctl).	2008-10-01 19:24:16 +00:00
jhb	ee566dffaa	Wait until after dropping the receive socket buffer lock to allocate space to store the socket address stored in the first mbuf in a packet chain. This reduces contention on the lock and CPU system time in certain UDP workloads. Tested by: ps Reviewed by: rwatson MFC after: 1 week	2008-10-01 19:14:05 +00:00
rwatson	2d03779951	Various cleanups for soreceive_dgram(): - Update or remove comments that were left over from the original soreceive_generic() implementation. Quite a few were misleading in the context of the new code. - Since soreceive_dgram() has a simpler structure, replace several gotos with a while loop making the invariants more clear. - In the blocking while loop, don't try to handle cases incompatible with the loop invariant (since m is always NULL, don't check for and handle non-NULL). - Don't drop and re-acquire the socket buffer lock unnecessarily after sbwait() returns, which may help reduce lock contention (etc). - Assume PR_ATOMIC since we assert it at the top of the function. MFC after: 3 days	2008-10-01 13:26:52 +00:00
jhb	3eb652b1c7	Update the function name in several assertions in soreceive_dgram(). Approved by: rwatson MFC after: 3 days	2008-09-30 18:44:26 +00:00
kib	1fb31bd167	If the panic thread is preempted after setting panicstr but before setting TDF_INPANIC then it will never be rescheduled again. Wrap setting the panic condition with the critical section. Noted and reviewed by: tegge MFC after: 1 week	2008-09-27 15:45:54 +00:00
ed	e40b7c4704	Move uminor() and umajor() to the same place as userspace minor() and major(). The uminor() and umajor() functions have the same use in kernel space as the minor() and major() functions in userspace. If we ever get rid of the minor() function in kernel space, we could decide to just expose minor() and major() to kernel space, making uminor() and umajor() redundant. There are two reasons why we want to have uminor() and umajor() in <sys/types.h>: - Having them close together prevents them from diverging. Even though it's unlikely the definitions will change, it's a good habit to have them at the same place. - They don't really belong in kern_conf.c. kern_conf.c has been liberated from dealing with device major and minor number handling. The device_ids(9) manpage now lists the wrong #include's, because it should only list <sys/types.h> now. I'm leaving it as it is now, because I wonder if we should document them anyway. We're probably better off documenting minor(3) and major(3).	2008-09-27 13:19:09 +00:00
ed	4efdef565f	Replace all calls to minor() with dev2unit(). After I removed all the unit2minor()/minor2unit() calls from the kernel yesterday, I realised calling minor() everywhere is quite confusing. Character devices now only have the ability to store a unit number, not a minor number. Remove the confusion by using dev2unit() everywhere. This commit could also be considered as a bug fix. A lot of drivers call minor(), while they should actually be calling dev2unit(). In -CURRENT this isn't a problem, but it turns out we never had any problem reports related to that issue in the past. I suspect not many people connect more than 256 pieces of the same hardware. Reviewed by: kib	2008-09-27 08:51:18 +00:00
ed	2e07c6d916	Don't forget to initialize `int error' in ttydev_open(). I've had some reports in the past that opening an already opened TTY through, for example, /dev/tty can fail with random error codes. Looking at ttydev_open(), I can see there is a way `error' is returned without initialising it. Even though I haven't had any confirmation this fixes the bug, I'll fix it anyway. Reported by: Andrzej Tobola <ato iem pw edu pl>	2008-09-26 18:17:04 +00:00
ed	d421caabf9	Rename the `minor' argument of make_dev(9) to `unit'. To prevent any further confusion about device minor and unit numbers, we'd better just refer to device unit numbers. Many people still think the numbers we show inside devfs have any relation to the numbers passed to make_dev(9), which is not the case. Discussed with: kib	2008-09-26 14:31:24 +00:00
ed	4212d51a7d	Remove unit2minor() use from kernel code. When I changed kern_conf.c three months ago I made device unit numbers equal to (unneeded) device minor numbers. We used to require bitshifting, because there were eight bits in the middle that were reserved for a device major number. Not very long after I turned dev2unit(), minor(), unit2minor() and minor2unit() into macro's. The unit2minor() and minor2unit() macro's were no-ops. We'd better not remove these four macro's from the kernel, because there is a lot of (external) code that may still depend on them. For now it's harmless to remove all invocations of unit2minor() and minor2unit(). Reviewed by: kib	2008-09-26 14:19:52 +00:00
jhb	6ccb676bf2	Regen.	2008-09-25 20:08:36 +00:00
jhb	00776aeb58	Tidy up a few things with syscall generation: - Instead of using a syscall slot (370) just to get a function prototype for lkmressys(), add an explicit function prototype to <sys/sysent.h>. This also removes unused special case checks for 'lkmressys' from makesyscalls.sh. - Instead of having magic logic in makesyscalls.sh to only generate a function prototype the first time 'lkmnosys' is seen, make 'NODEF' always not generate a function prototype and include an explicit prototype for 'lkmnosys' in <sys/sysent.h>. - As a result of the fix in (2), update the LKM syscall entries in the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'. - Use NOPROTO for the __syscall() entry (198) in the native ABI. This avoids the need for magic logic in makesyscalls.sh to only generate a function prototype the first time 'nosys' is encountered.	2008-09-25 20:07:42 +00:00
jhb	9c5408c4f9	- Don't do a WITNESS_SAVE() on the interlock if it is Giant in the condition variable wait routines. DROP_GIANT() already manages that state in the Giant interlock case. - Assert that Giant is held when it is passed as a sleep interlock.	2008-09-25 13:42:19 +00:00
jhb	1161006cb6	Part 1 of making shared lookups more resilient with respect to forced unmounts. When we upgrade a vnode lock from shared to exclusive during a name cache lookup, fail the lookup with EBADF if the vnode is invalidated while we are waiting for the exclusive lock. Also, for correctness (though I'm not sure it can occur in practice), downgrade an exclusively locked vnode if it should be share locked. Tested by: pho	2008-09-24 18:51:33 +00:00
jhb	8d01b3e526	Update description of witness_watch.	2008-09-24 18:47:24 +00:00
ed	7993f2b835	Fix a crash when calling tty_rel_free() while draining during closure. Yesterday I got two reports of potential crashes, related to TTY deallocation during device closure. When a thread is in TF_OPENCLOSE, draining its output upon closure, we should not allow calls to tty_rel_free() to happen at the same time. This could cause the TTY to be torn down twice. PR: kern/127561 Reported by: KOIE Hidetaka <koie suri co jp> Discussed with: thompsa	2008-09-24 11:16:09 +00:00
kib	c500808674	Change the static struct sysentvec and struct Elf_Brandinfo initializers to the C99 style. At least, it is easier to read sysent definitions that way, and search for the actual instances of sigcode etc. Explicitly initialize sysentvec.sv_maxssiz that was missed in most sysvecs. No objection from: jhb MFC after: 1 month	2008-09-24 10:14:37 +00:00
ed	94258be421	Track state to determine if the associated TTY device node has been used. It turns out our old TTY layer (and other implementations) block when you read() on a PTY master device of which the slave device node has not been opened yet. Our new implementation just returned 0. This caused applications like telnetd to die in a very subtle way (when child processes would open the TTY later than the first call to select()). Introduce a new flag called PTS_FINISHED, which indicates whether we should block or bail out when a read() or write() occurs. Reported by: Claude Buisson <clbuisson orange fr>	2008-09-23 17:12:25 +00:00
obrien	219d6d1626	style(9)	2008-09-23 14:25:56 +00:00
obrien	8cb3aed24c	Reverse if() logic to improve readability. Reviewed by: ru	2008-09-23 14:25:38 +00:00
ed	1475e942ed	Introduce a hooks layer for the MPSAFE TTY layer. One of the features that prevented us from fixing some of the TTY consumers to work once again, was an interface that allowed consumers to do the following: - `Sniff' incoming data, which is used by the snp(4) driver. - Take direct control of the input and output paths of a TTY, which is used by ng_tty(4), ppp(4), sl(4), etc. There's no practical advantage in committing a hooks layer without having any consumers. In P4 there is a preliminary port of snp(4) and thompsa@ is busy porting ng_tty(4) to this interface. I already want to have it in the tree, because this may stimulate others to work on the remaining modules. Discussed with: thompsa Obtained from: //depot/projects/mpsafetty/...	2008-09-22 19:25:14 +00:00
ed	d49d04e133	Fix style(9) issue in TTY header files: document function argument names. According to style(9), function argument names should only be omitted for prototypes that are exported to userspace. This means we should document the function arguments in the TTY header files, because they are only used in kernelspace. While there, change the type of the buffer argument of ttydisc_rint_bypass() to `const void *' instead of `char *'. Requested by: attilio Obtained from: //depot/projects/mpsafetty/...	2008-09-22 18:44:09 +00:00
jkoshy	9d661b5bf6	Support sparsely numbered CPUs. Requested by: obrien, alfred (long ago)	2008-09-22 10:37:02 +00:00
ed	acdad30e01	Make fstat() on a pseudo-terminal master return sane timestamps. Because pseudo-terminal master file descriptors no longer have a vnode underneath, we have to fill in fstat() values ourselves. Make our implementation somewhat sane by returning the timestamps of the TTY device node that corresponds with our file descriptor. Obtained from: //depot/projects/mpsafetty/...	2008-09-21 19:24:15 +00:00
ed	3c13ffd7b5	Now that the number of clist consumers has dropped massively, trim down the code to prevent useless waste of space. - Remove support for quote bits. There is not a single driver that needs these bits anymore. This means putc() now accepts a char instead of an int. - Remove the unneeded catq() and nextc() routines. They were only used by the old TTY layer. - Convert the clist code to use ANSI C prototypes.	2008-09-21 18:12:18 +00:00
kib	a127656dea	fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(), vattr_null() or by zeroing it. Remove these to allow preinitialization of fields to work in vn_stat(). This is needed to get birthtime initialized correctly. Submitted by: Jaakko Heinonen <jh saunalahti fi> Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:50:52 +00:00
kib	ef4f1dc9c7	Initialize va_rdev to NODEV instead of 0 or VNOVAL in VOP_GETATTR(). NODEV is more appropriate when va_rdev doesn't have a meaningful value. Submitted by: Jaakko Heinonen <jh saunalahti fi> Suggested by: bde Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:49:15 +00:00
kib	570463af80	Initialize va_rdev to NODEV and va_fsid to VNOVAL before the VOP_GETATTR() call in vn_stat(). Thus if a file system doesn't initialize those fields in VOP_GETATTR() they will have a sane default value. Submitted by: Jaakko Heinonen <jh saunalahti fi> Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:48:24 +00:00
kib	81d455e702	Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't initialize va_vaflags and va_spare because they are not part of the VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero. Submitted by: Jaakko Heinonen <jh saunalahti fi> Reviewed by: bde Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:46:45 +00:00
kib	c6232cabbe	Initialize birthtime fields in vn_stat() to prevent stat(2) from returning uninitialized birthtime. Most file systems don't initialize birthtime properly in their VOP_GETATTR(). Submitted by: Jaakko Heinonen <jh saunalahti fi> Reviewed by: bde Discussed on: freebsd-fs MFC after: 1 month	2008-09-20 19:43:22 +00:00
obrien	0c0da6bba7	Add freebsd32 compat shim for nmount(2). (and quiet some compiler warnings for vfs_donmount)	2008-09-19 15:17:32 +00:00
jhb	a0cba7a231	Various style fixes. 7 space indent is just odd.	2008-09-18 20:10:11 +00:00
jhb	2ba0911fd8	Sort includes.	2008-09-18 20:04:22 +00:00
attilio	23ff3dbeb8	Remove the suser(9) interface from the kernel. It has been replaced for years by the priv_check(9) interface and only very few uses are left. Note that compatibility stubs for older FreeBSD versions (all above the 8 limit though) are left in order to reduce diffs against old versions. It is the responsibility of the maintainers of any module, if they think it is the case, to axe out such cases. This patch breaks the KPI, so __FreeBSD_version will be bumped in a later commit. This patch needs to be credited 50-50 with rwatson@ as he found time to explain to me how priv_check() works in detail and to review patches. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Reviewed by: rwatson	2008-09-17 15:49:44 +00:00
ed	0f8f4f624b	Fix minor TTY API inconsistency. Unlike tty_rel_gone() and tty_rel_sess(), the tty_rel_pgrp() routine does not unlock the TTY. I once had the idea to make the code call tty_rel_pgrp() and tty_rel_sess(), picking up the TTY lock once. This turned out a little harder than I expected, so this is how it works now. It's a lot easier if we just let tty_rel_pgrp() unlock the TTY, because the other routines do this anyway.	2008-09-16 14:57:23 +00:00
kib	f6863a9ef7	When an attempt is made to suspend a filesystem that is already suspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write_resume() when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows bypassing the suspension point in vn_start_write(). It is intended for use by VFS in situations where the suspender wants to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 11:51:06 +00:00
kib	0488506405	Add the ffs structures introspection functions for ddb. Show the b_dep value for the buffer in the show buffer command. Add a command to dump the dirty/clean buffer list for vnode. Reviewed by: tegge Tested and used by: pho MFC after: 1 month	2008-09-16 11:19:38 +00:00
kib	039c5da1b2	Garbage-collect vn_write_suspend_wait(). Suggested and reviewed by: tegge Tested by: pho MFC after: 1 month	2008-09-16 11:09:26 +00:00
sam	05a7094fc1	Make ddb command registration dynamic so modules can extend the command set (only so long as the module is present): o add db_command_register and db_command_unregister to add and remove commands, respectively o replace linker sets with SYSINIT's (and SYSUINIT's) that register commands o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table for registering top-level commands, show operands, and show all operands, respectively While here also: o sort command lists o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases for existing commands o add "show all trace" as an alias for "show alltrace" o add "show all locks" as an alias for "show alllocks" Submitted by: Guillaume Ballet <gballet@gmail.com> (original version) Reviewed by: jhb MFC after: 1 month	2008-09-15 22:45:14 +00:00
jhb	a55e334c2b	Expose a new public routine intr_event_execute_handlers() which executes all the non-filter handlers attached to an interrupt event. This can be used by device drivers which multiplex their interrupt onto the interrupt handlers for child devices.	2008-09-15 22:19:44 +00:00
attilio	00ea27d0c3	- For any lock list we hold the head in order to reduce allocation from the free list and in this way avoid contention on the w_mtx. In order to make the code simple, we rely on the rule that when the head has no children it also doesn't have other subsequent entries. Actually this assertion is broken because we can free all the head children and quit witness_unlock() with the head still allocated, with no children and subsequent entries present. Fix this by shifting the head if other entries are present and still freeing the object, but always leaving a head. - Fix witness_thread_has_locks() in order to report, correctly, if the lock list linked to a specific thread has children or not based on the above explained rule. - Fix a printout in DDB's "show alllocks" command in order to show, correctly, the process name that is really what we want. - Fix style(9) for a comment. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Reported by: Marko Kiiskila <marko dot kiiskila at nokia dot com> Sponsored by: Nokia	2008-09-12 21:44:01 +00:00
csjp	1fa65beb80	Make sure the TTY has not disappeared out from under us before calling ttydevsw_outwakeup(). This should fix panics which occur after remote login sessions timeout during moderate TTY activity. An example of where this might occur is where a pending write to the terminal is occurring while sshd(8) is shutting down the TTY after a TCP timeout. Submitted by: ed	2008-09-10 20:12:10 +00:00
jhb	af0471aaec	Teach WITNESS about the interlocks used with lockmgr. This removes a bunch of spurious witness warnings since lockmgr grew witness support. Before this, every time you passed an interlock to a lockmgr lock WITNESS treated it as a LOR. Reviewed by: attilio	2008-09-10 19:13:30 +00:00
jhb	fd768740df	Various whitespace fixes.	2008-09-10 17:59:21 +00:00
trasz	9303940ffe	Remove VSVTX, VSGID and VSUID. This should be a no-op, as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID. Approved by: rwatson (mentor)	2008-09-10 13:16:41 +00:00
jhb	7851995759	- Reduce scope of #ifdef's in uma_zcreate() call in init_turnstile0(). - Set UMA_ZONE_NOFREE so that the per-turnstile spin locks are type stable to avoid a race where one thread might dereference a lock in a free'd turnstile that was previously used by another thread. Theorized by: tegge (2) MFC after: 1 week	2008-09-08 21:40:15 +00:00
jhb	f147e876b7	Close a race in sleepq_broadcast() where the sleepq could be reused after it had been assigned to the last sleeping thread. That thread might have started running on another CPU and have reused that sleep queue. Fix it by just walking the thread queue using TAILQ_FOREACH_SAFE() rather than a while loop. PR: amd64/124200 Discovered by: tegge Tested by: benjsc MFC after: 1 week	2008-09-08 19:44:57 +00:00
bz	cb1cd5ee09	Catch a possible NULL pointer deref in case the offsets got mangled somehow. As a consequence we may now get an unexpected result(*). Catch that error case with a well defined panic giving appropriate pointers to ease debugging. (*) While the consensus was that the case should never happen unless there was a bug, no one was definitively sure. Discussed with: kmacy (about 8 months back) Reviewed by: silby (as part of a larger patch in March) MFC after: 2 months	2008-09-07 13:09:04 +00:00
ed	e4b90a03d0	Make TIOCCONS use priv_check() instead of checking /dev/console permissions. As discussed with Robert on IRC, checking the permissions on /dev/console to see if we can call TIOCCONS could be unreliable. When we run a chroot() without a devfs instance mounted inside, it won't actually check the permissions on the device node inside the devfs instance. Using the already existing PRIV_TTY_CONSOLE for this seems like a better idea. Approved by: rwatson	2008-09-06 14:43:32 +00:00
ed	bf17d6c233	Fix a small typo in a comment in calcru1(). The word "happene" should read "happened". Submitted by: Jille Timmermans <jille quis cx>	2008-09-05 15:55:06 +00:00
davidxu	40259861b6	Fix LOR between vnode lock and internal mqueue locks.	2008-09-05 07:32:57 +00:00
thompsa	6beefd0e39	Remove the alignment of the align parameter. This is up to the caller to pass in and it breaks tap(4) on strict alignment machines as m_uiotombuf is called with ETHER_ALIGN. Found by: Jared Go Reviewed by: emax MFC after: 3 days	2008-09-05 04:05:31 +00:00
davidxu	74474a30a7	Fix lock name conflict. PR: kern/127040	2008-09-05 02:07:25 +00:00
ed	fa5c2849f2	Implement pts(4) packet mode. As reported by several users on the mailing lists, applications like screen(1) fail to properly handle ^S and ^Q characters. This was because MPSAFE TTY didn't implement packet mode (TIOCPKT) yet. Add basic packet mode support to make these applications work again. Obtained from: //depot/projects/mpsafetty/...	2008-09-04 16:39:02 +00:00
ed	28aa9d1022	Fix an awful bug inside our COMPAT_43TTY code. When I migrated tty_compat.c to MPSAFE TTY, I just hooked it up to the build and fixed it until it compiled and somewhat worked. It turns out this was not the smartest thing, because the old TTY layer also had a field called t_flags, which contained a set of sgtty flags. This means our current COMPAT_43TTY code overwrites the TTY flags, causing all strange problems to occur. Fix this code to use a new struct member called t_compatflags. This commit may cause kern/127054 to be fixed, but this still has to be tested/confirmed by the originator. It has to be fixed anyway. PR: kern/127054	2008-09-04 16:30:53 +00:00
kevlo	9f7bbf786b	If the process id specified is invalid, the system call returns ESRCH	2008-09-04 10:44:33 +00:00
simon	6bb93e188c	- Fix amd64 local privilege escalation. [08:07] - Fix nmount(2) local privilege escalation. [08:08] - Fix IPv6 remote kernel panics. [08:09] Fix for [08:07] is merge of r181823. Submitted by: kib [08:07], csjp [08:08], bz [08:09] Reviewed by: peter [08:07], jhb [08:07] Reviewed by: jinmei [08:09], rwatson [08:09] Approved by: re (SA blanket) Approved by: so (simon) Security: FreeBSD-SA-08:07.amd64 Security: FreeBSD-SA-08:08.nmount Security: FreeBSD-SA-08:09.icmp6	2008-09-03 19:09:47 +00:00
ed	8496708649	Use size_t to store the return value of ttydisc_getc(). The ttydisc_getc() routine obtains a read length from ttyoutq_read(). For no valid reason, the current code stores this value in an int, and returns a size_t. There is no need to perform this useless conversion. Obtained from: //depot/projects/mpsafetty/...	2008-09-02 17:13:11 +00:00
rwatson	34b4039202	Remove XXXRW in soreceive_dgram that proves unnecessary. Remove unused orig_resid variable in soreceive_dgram. Submitted by: alfred X-MFC with: soreceive_dgram (r180198, r180211)	2008-09-02 16:55:21 +00:00
pjd	9f3f074340	When setting error to EINVAL in 'fvp == tdvp' case, jump to out label, because if not, the error will be later overwritten by mac_vnode_check_rename_to() call. Reviewed by: rwatson	2008-09-01 10:11:39 +00:00
attilio	e2ca413d09	Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>	2008-08-31 14:26:08 +00:00
attilio	d6333c0f4d	- Improve some witness_watch operability in code which does perform both lock tracking and checks, doing just the former ones. - Fix a bug where sysctl utility was printing crazy values when setting a new value for debug.witness.watch [0] [0] Reported by: yongari	2008-08-30 13:20:35 +00:00
ed	fa61dcef0f	Fix some edge cases in the TTY queues: - In the current design, when a TTY decreases its baud rate, it tries to shrink the queues. This may not always be possible, because it will not free any blocks that are still filled with data. Change the TTY queues to store a `quota' value as well, which means it will not free any blocks when changing the baud rate, but when placing blocks back into the queue. When the amount of blocks exceeds the quota, they get freed. It also fixes some edge cases, where TIOCSETA during read()/ write()-calls could actually make the queue a tiny bit bigger than in normal cases. - Don't leak blocks of memory when calling TIOCSETA when the device driver abandons the TTY while allocating memory. - Create ttyoutq_init() and ttyinq_init() to initialize the queues, instead of initializing them by hand. The new TTY snoop driver also creates an outq, so it's good to have a proper interface to do this. Obtained from: //depot/projects/mpsafetty/...	2008-08-30 09:18:27 +00:00
attilio	8f53106a9e	- Make witness_watch a 3 state value. 1 means that witness is up and running. 0 means that witness is disabled but that it can be established later again in an effective way. -1 means that witness is disabled permanently - Fix a bug causing the kernel to panic on witness disabling through witness_watch. Lock list queues were still full of entries and this was causing troubles with debugging stubs (like witness_thread_exit()). Reported by: kris, yongari Sponsored by: Nokia	2008-08-29 15:47:53 +00:00
ed	e9104ac4da	Backport two small fixes from the MPSAFE TTY branch in Perforce: - Implement IMAXBEL. It turned out the IMAXBEL termios switch was marked as supported, while it had not been implemented. - Don't go into the high watermark when in canonical mode, no data has been canonicalized and the input buffer is full. This caused the terminal to lock up. This prevented users from pressing backspace/^U/etc in such cases. This could easily be simulated by pasting a very big amount of data in a shell with sh(1) in canonical mode. Obtained from: //depot/projects/mpsafetty/...	2008-08-29 15:02:50 +00:00
davidxu	b08de3bf2f	Don't remove queued SIGCHLD if options contain WNOWAIT, so other threads still can be notified by the signal.	2008-08-29 01:34:05 +00:00
trhodes	4076a3a80f	Fix a typo in r180291 "NAme of the current YP/NIS domain" -> "Name of the current YP/NIS domain"	2008-08-28 23:52:34 +00:00
ed	92daa6b410	Make ureadc() warn when holding any locks, just like uiomove(). A couple of months ago I was quite impressed, because when I was writing code, I discovered that uiomove() would not allow any locks to be held, while ureadc() did, mainly because ureadc() is implemented using the same building blocks as uiomove(). Let's see if this triggers any additional witness warnings on our source tree. Reviewed by: attilio	2008-08-28 19:34:58 +00:00
attilio	dbf35e279f	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
kib	dd53532f93	Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmntque() function to ignore the unmounting and forces insertion of the vnode into the mount vnode list. Change insmntque() to fail when forced unmount is in progress and VV_FORCEINSMQ is not specified. Add an assertion to the insmntque(), requiring the vnode to be exclusively locked for mp-safe filesystems. Use the VV_FORCEINSMQ for the creation of the syncvnode. Tested by: pho Reviewed by: tegge MFC after: 1 month	2008-08-28 09:08:15 +00:00
ed	17e49589f8	Properly unlock the init/lock-state devices when invoking TIOCSETA. For some reason a return-statement crept into this code, where it shouldn't belong. This means we didn't properly unlock the TTY before returning to userspace. Submitted by: Tor Egge <tor egge cvsup no freebsd org>	2008-08-27 19:37:21 +00:00
jhb	8011873cf5	- Only count the number of CPUs in the rendezvous map once rather than doing it on every CPU. - Use CPU_ABSENT() rather than pcpu_find() to determine if a CPU is not present. - Count up to mp_maxid rather than MAXCPU when iterating over CPUs to match the rest of the code in the kernel. MFC after: 1 week	2008-08-27 18:23:55 +00:00
kib	05dac85e4b	Implement WNOWAIT flag for wait4(2). It specifies that process whose status is returned shall be kept in the waitable state. Add WSTOPPED as an alias for WUNTRACED. Submitted by: Jukka Ukkonen <jau at iki fi> PR: standards/116221 MFC after: 2 weeks	2008-08-26 12:37:16 +00:00
kib	2d990eae05	When calculating arguments to the interpreter for the shebang script executed by fexecve(2), imgp->args->fname is NULL. Moreover, there is no way to recover the path to the script being executed. Do what some other U*ixes do unconditionally, namely supply /dev/fd/n as the script path when called from fexecve(). Document requirement of having fdescfs mounted as caveat.	2008-08-26 10:53:32 +00:00
jhb	3ed53c265b	Resort a few accessor routines so that they are consistently grouped with 'set_foo/get_foo' adjacent to each other.	2008-08-25 16:16:57 +00:00
rwatson	acf5da1d35	More fully audit fexecve(2) and its arguments. Obtained from: TrustedBSD Project Sponsored by: Google, Inc.	2008-08-25 13:50:01 +00:00
rwatson	70366e8fcc	Regenerate following r182123.	2008-08-24 21:23:08 +00:00
rwatson	6a45d33f33	When MPSAFE ttys were merged, a new BSM audit event identifier was allocated for posix_openpt(2). Unfortunately, that identifier conflicts with other events already allocated to other systems in OpenBSM. Assign a new globally unique identifier and conform better to the AUE_ event naming scheme. This is a stopgap until a new OpenBSM import is done with the correct identifier, so we'll maintain this as a local diff in svn until then. Discussed with: ed Obtained from: TrustedBSD Project	2008-08-24 21:20:35 +00:00
csjp	e30e00f1b7	Remove worrying printf warning on bootup when processing vnodes which have NULL mount-points. This is the case for special vnodes, such as the one used in nameiinit() which is used for crossing mount points in lookup() to avoid lock ordering issues. MFC after: 2 weeks Discussed with: rwatson, kib	2008-08-24 20:16:44 +00:00
ed	7eb7818496	Allow the user to suppress the rate-limited pty(4) warning. The pty(4) driver raises up to warnings when an old BSD-style PTY is created. The reason why I added this warning, was to make it easier to spot applications that allocate BSD-style PTY's, while they should just use openpty() or posix_openpt(). Add a sysctl, which allows you to override the number of remaining messages, making it possible to suppress the warnings. Requested by: kib Reviewed by: kib	2008-08-23 16:03:00 +00:00
rwatson	78a117e6fa	Introduce two related changes to the TrustedBSD MAC Framework: (1) Abstract interpreter vnode labeling in execve(2) and mac_execve(2) so that the general exec code isn't aware of the details of allocating, copying, and freeing labels, rather, simply passes in a void pointer to start and stop functions that will be used by the framework. This change will be MFC'd. (2) Introduce a new flags field to the MAC_POLICY_SET(9) interface allowing policies to declare which types of objects require label allocation, initialization, and destruction, and define a set of flags covering various supported object types (MPC_OBJECT_PROC, MPC_OBJECT_VNODE, MPC_OBJECT_INPCB, ...). This change reduces the overhead of compiling the MAC Framework into the kernel if policies aren't loaded, or if policies require labels on only a small number or even no object types. Each time a policy is loaded or unloaded, we recalculate a mask of labeled object types across all policies present in the system. Eliminate MAC_ALWAYS_LABEL_MBUF option as it is no longer required. MFC after: 1 week ((1) only) Reviewed by: csjp Obtained from: TrustedBSD Project Sponsored by: Apple, Inc.	2008-08-23 15:26:36 +00:00
jhb	6c646c3b72	Fix a race condition with concurrent LOOKUP namecache operations for a vnode not in the namecache when shared lookups are enabled (vfs.lookup_shared=1, it is currently off by default) and the filesystem supports shared lookups (e.g. NFS client). Specifically, if multiple concurrent LOOKUPs both miss in the name cache in parallel, each of the lookups may end up adding an entry to the namecache resulting in duplicate entries in the namecache for the same pathname. A subsequent removal of the mapping of that pathname to that vnode (via remove or rename) would only evict one of the entries from the name cache. As a result, subsequent lookups for that pathname would still return the old vnode. This race was observed with shared lookups over NFS where a file was updated by writing a new file out to a temporary file name and then renaming that temporary file to the "real" file to effect atomic updates of a file. Other processes on the same client that were periodically reading the file would occasionally receive an ESTALE error from open(2) because the VOP_GETATTR() in nfs_open() would receive that error when given the stale vnode. The fix here is to check for duplicates in cache_enter() and just return if an entry for this same directory and leaf file name for this vnode is already in the cache. The check for duplicates is done by walking the per-vnode list of name cache entries. It is expected that this list should be very small in the common case (usually 0 or 1 entries during a cache_enter() since most files only have 1 "leaf" name). Reviewed by: ups, scottl MFC after: 2 months	2008-08-23 15:13:39 +00:00
ed	b738ca88a2	Remove unused tty_gone() checks inside ttyoutq_read_uio(). When my earlier MPSAFE TTY prototypes still implemented line disciplines, we needed a mechanism to abort read()'s on PTY master devices when inside the line discipline. Because this is no longer the case, these checks have become unneeded.	2008-08-23 13:32:21 +00:00
rodrigc	bef9b4336c	In nmount(), when we see the "force" option, set the MNT_FORCE flag, but do not persist "force" in the options list, since it is a command, not a persistent property of a mount. Similarly, when we see "reload", set MNT_RELOAD, but delete "reload" from the options list. MFC after: 1 week	2008-08-23 01:16:09 +00:00
kmacy	df8989694a	Submit a band-aid for interrupt set up race. MFC after: 1 month	2008-08-22 23:24:53 +00:00
ed	a6b774bc3b	Fix two small bugs in tcsetattr(). - According to POSIX, tcsetattr() must not fail when any of the bits in the structure are unsupported, but it must leave the unsupported flags alone. - The CIGNORE flag (set by TCSASOFT, extension) was not cleared from c_cflag, which means using it would cause it to be applied during its entire lifespan. Eventually make sure we clear the flag. I don't really like CIGNORE, but I think we must keep it alive right now. With our new TTY layer, we don't actually need this mechanism, because if you leave c_cflag, c_ispeed and c_ospeed alone, we won't make a call into the device driver anyway. Reported by: naddy Tested by: naddy	2008-08-22 21:27:37 +00:00
jhb	b054f3f992	A suspended thread can, in fact, be swapped out. Thus, thread_unsuspend_one() needs to optionally wakeup the swapper. Since we hold the thread lock for that entire function, however, we have to push that requirement up into the caller. Found by: rwatson	2008-08-22 16:15:58 +00:00
jhb	b908d9aa36	Use |= rather than += when aggregating requests to wakeup the swapper. What we really want is an inclusive or of all the requests, and += can in theory roll over to 0.	2008-08-22 16:14:23 +00:00
ed	7c4fe3955e	Fix pts(4) error codes when slave device is closed. Unlike pre-MPSAFE TTY, the pts(4) driver always returned ENXIO when a read() or write() was performed on a pseudo-terminal master device when the slave device was not opened. The old implementation had different semantics: - When the slave device had not been opened yet, read() and write() just blocked. - When the slave device had been closed, a read() call would return 0 bytes length. - When the slave device had been closed, a write() call would return EIO. Change the new implementation to return 0 and EIO as well. We don't implement the first rule, but I suspect this is not needed, because routines like openpty() also open the slave device node. posix_openpt() users also do similar things. Reported by: rink Tested by: rink	2008-08-22 10:40:21 +00:00
ed	deab1dbbf7	Prevent VSTART flooding when turning on software flow control. It turned out we transmitted VSTART after each successful read on a TTY when software flow control was turned on. This was because of a very evil bug where we tested the TF_HIWAT_IN flag the other way around. Reported by: Christian Weisgerber <naddy mips inka de>	2008-08-22 05:15:52 +00:00
obrien	3b12eba1b0	Add comments on NOARGS, NODEF, and NOPROTO.	2008-08-21 22:57:31 +00:00
ed	2be6ecbc22	Properly lock proctree_lock before locking the process while accounting. During the import of the MPSAFE TTY layer (r181905), I changed acct_process() to lock proctree_lock instead of SESS_LOCK, because s_ttyp is now locked using proctree_lock. One of the things I forgot, was to lock it before we PROC_LOCK. Commit this patch, written by kib@. To ensure we hold proctree_lock as short as possible, obtaining `ac_tty' has now been made the first step of filling `acct'. Reported by: Kevin <kevinxlinuz 163 com> Solved by: kib	2008-08-21 15:02:17 +00:00
ed	ae0c3320a7	Remove the now unused `lbolt' variable from the kernel. We used to have a single wait channel inside the kernel which could be used by threads that just wanted to sleep for some time (the next second). The old TTY layer was the only piece of code that still used lbolt, because I already removed the use of lbolt from the NFS clients and the VFS syncer. Approved by: philip	2008-08-20 12:20:22 +00:00
kmacy	37e5c521d0	remove scheduler_running as xenbus no longer needs it MFC after: 1 month	2008-08-20 09:21:24 +00:00
ed	4b93c9151b	Update system call tables. The previous commit also included changes to all the system call lists, but it is a tradition to update these lists in a second commit, so rerun make sysent to update the $FreeBSD$ tags inside these files to refer to the latest version of syscalls.master. Requested by: rwatson	2008-08-20 08:39:10 +00:00
ed	cc3116a938	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan	2008-08-20 08:31:58 +00:00
kib	ee27d03e64	In brelse, put the B_NEEDSGIANT buffer on the QUEUE_DIRTY_GIANT queue, instead of QUEUE_DIRTY. Tested by: pho Reviewed by: attilio MFC after: 3 days	2008-08-19 11:31:49 +00:00
bz	1021d43b56	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
alfred	f8f6317629	Prevent crashes due to unlocked access to hash buckets in two sysctls. Use CACHE_LOCK to prevent crashes. Sysctls fixed: debug.hashstat.nchash and debug.hashstat.rawnchash. Obtained from: Juniper Networks MFC After: 1 week	2008-08-16 21:48:10 +00:00
kmacy	716fc76367	Add flag to indicate to xen support code that threads are running (and thus we can block). MFC after: 1 month	2008-08-15 21:03:13 +00:00
attilio	ff459eb3cf	Introduce some WITNESS improvements: - Speed up the lock orderings lookup by changing the witness graph from a linked tree to a matrix. A table lookup caches the lock orderings in order to make access to them O(1). Every witness object has a unique index within this lookup cache table. - Reduce the lock contention on w_mtx by acquiring it only when a LOR actually happens and not in the sane case. In order to do this, don't totally flush lock lists (per-CPU spinlocks list and per-thread sleeplocks list) but check ll_count any time we need to verify allocation sanity. - Introduce the function witness_thread_exit() in the witness namespace, which should verify that a thread doesn't hold any witness occurrence while exiting. - Rename the sysctl debug.witness.graphs to debug.witness.fullgraph and add debug.witness.badstacks, which prints out stacks for the LORs revealed. This is implemented using the stack(9) support, which makes WITNESS dependent on the STACK option or on the DDB (including STACK) option. - Fix style(9) for src/sys/kern/subr_witness.c The hash table approach has been developed by Ilya Maykov on behalf of Isilon Systems, which kindly released the patch. Jeff Roberson ported the patch to -CURRENT and fixed w_mtx contention, on behalf of Nokia. Submitted by: Ilya Maykov <ivmaykov at gmail dot com> (Isilon Systems), jeff Sponsored by: Nokia	2008-08-13 18:24:22 +00:00
csjp	0cdadff20e	Reduce the scope of the vnode lock such that it does not cover the various copyouts associated with initializing the process's argv/env data in userspace. It is possible that these copyout operations can fault under memory pressure, possibly resulting in deadlocks. This is believed to be safe since none of the copyout_strings() operations need to interact with the vnode here. Submitted by: Zhouyi Zhou PR: kern/111260 Discussed with: kib MFC after: 3 weeks	2008-08-12 21:27:48 +00:00
kib	ca3c43733a	Revert r181345. Move the NULL pointer check to the vfs_deleteopt() function. Discussed with: rodrigc MFC after: 3 days	2008-08-10 12:15:36 +00:00
ed	746d949d89	Remove unneeded D_NEEDGIANT from /dev/fd/{0,1,2}. There is no reason the fdopen() routine needs Giant. It only sets curthread->td_dupfd, based on the device unit number of the cdev. I guess we won't get massive performance improvements here, but still, I assume we eventually want to get rid of Giant.	2008-08-09 12:42:12 +00:00
des	c2c1c946ae	Add sbuf_new_auto as a shortcut for the very common case of creating a completely dynamic sbuf. Obtained from: Varnish MFC after: 2 weeks	2008-08-09 11:14:05 +00:00
des	50ef01bba1	Switch to simplified BSD license (with phk's approval), plus whitespace and style(9) cleanup.	2008-08-09 10:26:21 +00:00
jhb	e306c86e1b	Permit Giant to be passed as the explicit interlock either to msleep/mtx_sleep or the various cv_wait() routines. Currently, the "unlock" behavior of PDROP and cv_wait_unlock() with Giant is not permitted as it would be confusing, since Giant is fully unrecursed and unlocked during a thread sleep. This is handy for subsystems which wish to allow unlocked drivers to continue to use Giant such as CAM, the new TTY layer, and the new USB stack. CAM currently uses a hack that I told Scott to use because I really didn't want to permit this behavior, and the TTY and USB patches both have various patches to permit this. MFC after: 2 weeks	2008-08-07 21:00:13 +00:00
jhb	8af56fb687	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks	2008-08-05 20:02:31 +00:00
jhb	11d83a7f89	Close two different races with concurrent opens of pty master devices that could result in leaked ttys or a leaked pty + tty pair. MFC after: 1 week	2008-08-04 19:51:23 +00:00
jhb	e6ae2f5414	- Close a race with concurrent opens of a pts master device which could result in leaked tty structures. - When constructing a new pty, allocate its tty structure before adding it to the list. MFC after: 1 week	2008-08-04 19:49:05 +00:00
antoine	4dc3acdf62	Kill a dead variable PR: 126223 Submitted by: Mateusz Guzik	2008-08-03 21:07:19 +00:00
rwatson	8883e5c019	Remove broken code to replace st_mode value with ACCESSPERMS when lstat(2) is called on symlinks -- this code appears never to have worked. The PR this addresses suggests that the intended original behavior is the right one, but as bde points out in the PR comments, we do actually support storing a mode on symlinks, so returning it seems reasonable. This is consistent with Mac OS X, which despite documentation to the contrary does return the mode set on a symlink, but not some other platforms. The Single Unix Spec requires only that the returned bits be "meaningful", which seems at best unhelpful as advice goes. PR: 25018 MFC after: 3 days	2008-08-03 15:44:56 +00:00
kib	52aa4f35d0	Calling linker_load_dependencies() while holding the module's vnode lock may cause a LOR between kld_sx lock and vnode lock. linker_load_dependencies() drops kld_sx, and another thread may attempt to load the same kld. Reported and tested by: pjd MFC after: 1 week	2008-08-03 13:33:45 +00:00
sam	f28149353a	add callout_schedule; besides being useful it also improves compatibility with other systems Reviewed by: ed, battlez	2008-08-02 17:42:38 +00:00
csjp	743d0edd92	Currently, BSM audit pathname token generation for chrooted or jailed processes is not producing absolute pathname tokens. It is required that audited pathnames are generated relative to the global root mount point. This modification changes our implementation of audit_canon_path(9) and introduces a new function: vn_fullpath_global(9) which performs a vnode -> pathname translation relative to the global mount point based on the contents of the name cache. Much like vn_fullpath, vn_fullpath_global is a wrapper function which calls vn_fullpath1. Further, the string parsing routines have been converted to use the sbuf(9) framework. This change also removes the conditional acquisition of Giant, since the vn_fullpath1 method will not dip into file system dependent code. The vnode locking was modified to use vhold()/vdrop() instead of vref() and vrele(). This will modify the hold count instead of modifying the user count. This makes more sense since it's the kernel that requires the reference to the vnode. This also makes sure that the vnode does not get recycled while we hold the reference to it. [1] Discussed with: rwatson Reviewed by: kib [1] MFC after: 2 weeks	2008-07-31 16:57:41 +00:00
ed	faa0cddcb0	Remove the use of lbolt from the VFS syncer. It seems we only use `lbolt' inside the VFS syncer and the TTY layer now. Because I'm planning to replace the TTY layer next month, there's no reason to keep `lbolt' if it's only used in a single thread inside the kernel. Because the syncer code wanted to wake up the syncer thread before the timeout, it called sleepq_remove(). Because we now just use a condvar(9) with a timeout value of `hz', we can wake it up using cv_broadcast() without waking up any unrelated threads. Reviewed by: phk	2008-07-30 12:39:18 +00:00
ed	870de37626	Don't make subr_clist.c depend on the TTY layer. After the import of the new TTY layer, the TTY_QUOTE definition will not be present anymore. To make sure clists will still work as expected, introduce an internal definition called QUOTEMASK. Maybe we can decide to remove the quote bits entirely, but we still have to look into this. There may be drivers that still use the quote bits. Obtained from: //depot/projects/mpsafetty	2008-07-30 12:32:42 +00:00
jhb	bed722e078	When choosing a CPU for a thread in a cpuset, prefer the last CPU that the thread ran on if there are no other CPUs in the set with a shorter per-CPU runqueue.	2008-07-28 20:39:21 +00:00
jhb	421b41fe8c	Really fix this.	2008-07-28 18:33:43 +00:00
pjd	642dbd51b0	Properly check if td_name is empty and if it is, print process name, instead of empty thread name. Reviewed by: jhb	2008-07-28 18:10:26 +00:00
jhb	68f0af82de	Implement support for cpusets in the 4BSD scheduler. - When a cpuset is applied to a thread, walk the cpuset to see if it is a "full" cpuset (includes all available CPUs). If not, set a new TDS_AFFINITY flag to indicate that this thread can't run on all CPUs. When inheriting a cpuset from another thread during thread creation, the new thread also inherits this flag. It is in a new ts_flags field in td_sched rather than using one of the TDF_SCHEDx flags because fork() clears td_flags after invoking sched_fork(). - When placing a thread on a runqueue via sched_add(), if the thread is not pinned or bound but has the TDS_AFFINITY flag set, then invoke a new routine (sched_pickcpu()) to pick a CPU for the thread to run on next. sched_pickcpu() walks the cpuset and picks the CPU with the shortest per-CPU runqueue length. Note that the reason for the TDS_AFFINITY flag is to avoid having to walk the cpuset and examine runq lengths in the common case. - To avoid walking the per-CPU runqueues in sched_pickcpu(), add an array of counters to hold the length of the per-CPU runqueues and update them when adding and removing threads to per-CPU runqueues. MFC after: 2 weeks	2008-07-28 17:25:24 +00:00
jhb	69cc3c8c8a	Various and sundry style and whitespace fixes.	2008-07-28 15:52:02 +00:00
kmacy	70741e0245	- track maximum wait time - resize columns based on actual observed numerical values MFC after: 3 days	2008-07-27 21:45:20 +00:00
pjd	3f1807709d	Assert for exclusive vnode lock in vinactive(), vrecycle() and vgonel() functions. Reviewed by: kib	2008-07-27 11:48:15 +00:00
pjd	4dd19696a7	- Move vp test for being NULL under IGNORE_LOCK(). - Check whether panicstr is set; if it is, ignore the lock. This helps to avoid confusion, because lockmgr is a no-op when panicstr isn't NULL, so asserting anything at this point doesn't make sense and can just race with another panic. Discussed with: kib	2008-07-27 11:46:42 +00:00
trhodes	56ab14a8ae	Fill in a few sysctl descriptions. Approved by: rwatson	2008-07-26 00:55:35 +00:00
ed	c9af5459f4	Move ttyinfo() into its own C file. The ttyinfo() routine generates the fancy output when pressing ^T. Right now it is stored in tty.c. In the MPSAFE TTY code it is already stored in tty_info.c. To make integration of the MPSAFE TTY code a little easier, take the same approach. This makes the TTY code a little bit more readable, because having the proc_/thread_ routines in tty.c is very distracting. Approved by: philip (mentor)	2008-07-25 14:31:00 +00:00
kib	e2333a32b6	Call pargs_drop() unconditionally in do_execve(), the function correctly handles the NULL argument. Make pargs_free() static. MFC after: 1 week	2008-07-25 11:55:32 +00:00
kib	a0a58ba099	s/alredy/already/ in the comments and the log message.	2008-07-25 11:22:25 +00:00
kib	42aeaf36b0	Do the pargs_hold() on the copy of the pointer to the p_args of the child process immediately after bulk bcopy() without dropping the process lock. Since process is not single-threaded when forking, dropping and reacquiring the lock allows an other thread to change the process title of the parent in between, and results in hold being done on the invalid pointer. The problem manifested itself as the double free of the old p_args. Reported by: kris Reviewed by: jhb MFC after: 1 week	2008-07-23 08:45:25 +00:00
attilio	823ce79a5b	- Disallow XFS mounting in write mode. The write support never really worked and there is no need to maintain it. - Fix vn_get() in order to let it call vget(9) with a valid locking request. vget(9) returns the vnode locked in order to prevent recycling, but in this case internal XFS locks already prevent it from happening, so it is safe to drop the vnode lock before returning from vn_get(). - Add a VNASSERT() in vget(9) in order to catch malformed locking requests. Discussed with: kan, kib Tested by: Lothar Braun <lothar at lobraun dot de>	2008-07-21 23:01:09 +00:00
rwatson	f3c6f1e959	If run_interrupt_driven_config_hooks() waits 360 seconds and INVARIANTS is compiled into the kernel, then panic. MFC after: 3 days Discussed with: scottl	2008-07-21 20:50:49 +00:00
pjd	9d11b5b5b3	Implement the following macros for completeness: SYSCTL_QUAD() SYSCTL_ADD_QUAD() TUNABLE_QUAD() TUNABLE_QUAD_FETCH() Now we can use 64bit tunables on 32bit systems.	2008-07-21 15:05:25 +00:00
kmacy	565bc001a5	Add accessor functions for socket fields. MFC after: 1 week	2008-07-21 00:49:34 +00:00
alc	08181df483	Eliminate dead code. (The commit message for revision 1.287 explains why this code is dead.)	2008-07-20 04:13:51 +00:00
rwatson	b53b96f01c	Rather than simply waiting silently and indefinitely for all interrupt-driven configuration handlers to complete, print out a diagnostic message every 60 seconds indicating which handlers are still running. Do this at most 5 times per run so as to avoid scrolling out any useful information from the kernel message buffer. The interval of 60 seconds was selected based on a best guess as to the nature of "long enough" and may want to be tuned higher or lower depending on real-world tolerances. MFC after: 3 days Discussed with: scottl	2008-07-19 19:08:35 +00:00
rwatson	2df3fcd0c6	witness_addgraph() is required even if DDB isn't compiled into the kernel, so exclude it from #ifdef DDB. Submitted by: attilio	2008-07-19 17:47:23 +00:00
rwatson	8fd5cf995c	Add DDB "show conifhk" command, which lists hooks currently waiting for completion in run_interrupt_driven_config_hooks(). This is helpful when trying to figure out which device drivers have gone into la-la land during boot-time autoconfiguration. MFC after: 3 days	2008-07-19 12:12:54 +00:00
jeff	7ff6e9903f	Fix a race which could result in some timeout buckets being skipped. - When a tick occurs on a cpu, iterate from cs_softticks until ticks. The per-cpu tick processing happens asynchronously with the actual adjustment of the 'ticks' variable. Sometimes the results may be visible before the local call and sometimes after. Previously this could cause a one tick window where we didn't evaluate the bucket. - In softclock fetch curticks before incrementing cc_softticks so we don't skip insertions which were made for the current time. Sponsored by: Nokia	2008-07-19 05:18:29 +00:00
jeff	b2f69d1b1e	- Check whether we've recorded this tick in ts_ticks on another cpu in sched_tick() to prevent multiple increments for one tick. This pushes the value out of range and breaks priority calculation. Reviewed by: kib Found by: pho/nokia Sponsored by: Nokia MFC after: 3 days	2008-07-19 05:13:47 +00:00
kmacy	6dfc39c2b6	revert local change	2008-07-18 07:10:33 +00:00
kmacy	eacfaa0e61	revert change from local tree	2008-07-18 07:07:57 +00:00
kmacy	c01ed5ad9b	import vendor fixes to cxgb	2008-07-18 06:12:31 +00:00
kib	eff9ee09b4	Pair the VOP_OPEN call from do_execve() with the reciprocal VOP_CLOSE. This was unnoticed because local filesystems usually do nothing non-trivial in the close vop. Reported and tested by: Rick Macklem MFC after: 2 weeks	2008-07-17 16:44:07 +00:00
antoine	89ca3c5933	Staticize M_STACK. Approved by: rwatson (mentor) MFC after: 1 month	2008-07-13 17:15:05 +00:00
rodrigc	f280e5ed8f	In nmount(), if we see "update" in the mount options, set MNT_UPDATE in fsflags, and delete the "update" option from the global mount options. MNT_UPDATE is a command, and not a property of a mount that should persist after the command is executed. We need to do similar things for MNT_FORCE and MNT_RELOAD. All mount flags are prefixed by MNT_..... it would be nice if flags which were commands were named differently from flags which are persistent properties of a mount. This was not such a big deal in the pre-nmount() days, but with nmount() it is more important. Requested by: yar MFC after: 2 weeks	2008-07-12 20:12:40 +00:00
obrien	fa9172e3f7	Improve readability and cscope searches a little bit by not using the same variable name in closely related (but not conflicting) contexts.	2008-07-11 14:48:28 +00:00
kib	da671c0533	Make it atomic for the devfs_populate_loop() to see the setting of SI_ALIAS flag and initialization of the si_parent when alias is created. Assert that supplied parent device is not NULL. Both situations could cause NULL dereference in the devfs_populate_loop() when creating a symlink for SI_ALIAS'ed device. Namely, cdp->cdp_c.si_parent may be NULL. Reported by: mav MFC after: 2 weeks	2008-07-11 11:22:19 +00:00
obrien	3b9db50b75	Revert r180431. r180431 broke the AMD64 build (the only arch using kern/link_elf_obj.c)	2008-07-11 01:10:40 +00:00
obrien	0bc4bc025d	Allow 'elf_file_t' to be used in a wider scope.	2008-07-10 16:35:57 +00:00
edwin	e80b338f3b	Improve the output of kldload(8) to show which module can't be loaded. Was: kldload: Unsupported file type Is now: kldload: /boot/modules/test.ko: Unsupported file type PR: kern/121276 Submitted by: Edwin Groothuis <edwin@mavetju.org> Approved by: bde (mentor) MFC after: 1 week	2008-07-08 23:51:38 +00:00
bz	f93b85c0df	Add a `show cpusets' DDB command to print numbered root and assigned CPU affinity sets. Reviewed by: brooks	2008-07-07 21:32:02 +00:00
bz	6988e35234	MFp4 144659: Plug a memory leak with jail services. PR: 125257 Submitted by: Mateusz Guzik <mjguzik gmail.com> MFC after: 6 days	2008-07-07 20:53:49 +00:00
bz	cf63123d06	Move cpuset_refroot and cpuset_refbase functions up, grouping the cpuset_ref* functions together. Will make it easier to read and add code without forward declarations. No functional changes.	2008-07-07 20:45:55 +00:00
kib	d39c6bcffb	The kqueue_register() function assumes that it is called from the top of the syscall code and acquires various event subsystem locks as needed. The handling of the NOTE_TRACK for EVFILT_PROC is currently done by calling the kqueue_register() from filt_proc() filter, causing recursive entrance of the kqueue code. This results in the LORs and recursive acquisition of the locks. Implement the variant of the knote() function designed to only handle the fork() event. It mostly copies the knote() body, but also handles the NOTE_TRACK, removing the handling from the filt_proc(), where it causes problems described above. The function is called from the fork1() instead of knote(). When encountering NOTE_TRACK knote, it marks the knote as influx and drops the knlist and kqueue lock. In this context call to kqueue_register is safe from the problems. An error from the kqueue_register() is reported to the observer as NOTE_TRACKERR fflag. PR: 108201 Reviewed by: jhb, Pramod Srinivasan <pramod juniper net> (previous version) Discussed with: jmg Tested by: pho MFC after: 2 weeks	2008-07-07 09:30:11 +00:00
kib	ea1979e3d2	In r178914 I erroneously put the setting of the KQ_FLUXWAIT flag before KQ_FLUX_WAKEUP(). Since the latter macro clears KQ_FLUXWAIT, the kqueue_scan() thread may not be woken up. Move the setting of KQ_FLUXWAIT after the wakeup to correct the issue. Reported and tested by: pho MFC after: 3 days	2008-07-07 09:15:29 +00:00
alc	c016906f4e	Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang Make several variables related to kmem map auto-sizing static. Found by: CScout	2008-07-05 19:34:33 +00:00
rwatson	051819b847	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
alc	b7d6153751	Correct an error in the comments for init_param3(). Discussed with: silby	2008-07-04 19:36:58 +00:00
rwatson	482bfeab47	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acquisition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.	2008-07-04 00:21:38 +00:00
emaste	240825654b	Use bcopy instead of strlcpy in uipc_bind and unp_connect, since soun->sun_path isn't a null-terminated string. As UNIX(4) states, "the terminating NUL is not part of the address." Since strlcpy has to return "the total length of the string [it] tried to create," it walks off the end of soun->sun_path looking for a \0. This reverts r105332. Reported by: Ryan Stone	2008-07-03 23:26:10 +00:00
julian	7b11deb4f4	Change a variable name to not shadow a global Obtained from: vimage	2008-07-03 08:35:59 +00:00
rwatson	108da791bb	Update copyright date in light of soreceive_dgram(9).	2008-07-03 06:47:45 +00:00
rwatson	0c50a62527	Add soreceive_dgram(9), an optimized socket receive function for use by datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets. This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers significant performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP. Tested by: kris, ps	2008-07-02 23:23:27 +00:00
rdivacky	d3e39bd522	Use msleep_spin() instead of unlock/tsleep/lock. This was already committed but with a wrong msleep variant and then backed out. Note that this changes the semantics a little as msleep_spin does not let us specify priority after wakeup. Approved by: wkoszek, cognet Approved by: kib (mentor)	2008-07-02 20:44:33 +00:00
bz	30064ea555	Remove an unneeded error variable to make clear that if reaching the end of the function we never return an error.	2008-06-29 18:26:07 +00:00
bz	103613ceb8	Add a new priv 'PRIV_SCHED_CPUSET' to check if manipulating cpusets is allowed and replace the suser() call. Do not allow it in jails. Reviewed by: rwatson	2008-06-29 17:58:16 +00:00
jhb	411d068395	Rework the lifetime management of the kernel implementation of POSIX semaphores. Specifically, semaphores are now represented as new file descriptor type that is set to close on exec. This removes the need for all of the manual process reference counting (and fork, exec, and exit event handlers) as the normal file descriptor operations handle all of that for us nicely. It is also suggested as one possible implementation in the spec and at least one other OS (OS X) uses this approach. Some bugs that were fixed as a result include: - References to a named semaphore whose name is removed still work after the sem_unlink() operation. Prior to this patch, if a semaphore's name was removed, valid handles from sem_open() would get EINVAL errors from sem_getvalue(), sem_post(), etc. This fixes that. - Unnamed semaphores created with sem_init() were not cleaned up when a process exited or exec'd. They were only cleaned up if the process did an explicit sem_destroy(). This could result in a leak of semaphore objects that could never be cleaned up. - On the other hand, if another process guessed the id (kernel pointer to 'struct ksem' of an unnamed semaphore (created via sem_init)) and had write access to the semaphore based on UID/GID checks, then that other process could manipulate the semaphore via sem_destroy(), sem_post(), sem_wait(), etc. - As part of the permission check (UID/GID), the umask of the proces creating the semaphore was not honored. Thus if your umask denied group read/write access but the explicit mode in the sem_init() call allowed it, the semaphore would be readable/writable by other users in the same group, for example. This includes access via the previous bug. - If the module refused to unload because there were active semaphores, then it might have deregistered one or more of the semaphore system calls before it noticed that there was a problem. 
I'm not sure if this actually happened as the order that modules are discovered by the kernel linker depends on how the actual .ko file is linked. One can make the order deterministic by using a single module with a mod_event handler that explicitly registers syscalls (and deregisters during unload after any checks). This also fixes a race where even if the sem_module unloaded first it would have destroyed locks that the syscalls might be trying to access if they are still executing when they are unloaded. XXX: By the way, deregistering system calls doesn't do any blocking to drain any threads from the calls. - Some minor fixes to errno values on error. For example, sem_init() isn't documented to return ENFILE or EMFILE if we run out of semaphores the way that sem_open() can. Instead, it should return ENOSPC in that case. Other changes: - Kernel semaphores now use a hash table to manage the namespace of named semaphores nearly in a similar fashion to the POSIX shared memory object file descriptors. Kernel semaphores can now also have names longer than 14 chars (up to MAXPATHLEN) and can include subdirectories in their pathname. - The UID/GID permission checks for access to a named semaphore are now done via vaccess() rather than a home-rolled set of checks. - Now that kernel semaphores have an associated file object, the various MAC checks for POSIX semaphores accept both a file credential and an active credential. There is also a new posixsem_check_stat() since it is possible to fstat() a semaphore file descriptor. - A small set of regression tests (using the ksem API directly) is present in src/tools/regression/posixsem. Reported by: kris (1) Tested by: kris Reviewed by: rwatson (lightly) MFC after: 1 month	2008-06-27 05:39:04 +00:00
julian	e62e072121	Someone cut and pasted a bunch of stuff here so lots of indents were spaces when they should have been tabs, screwing up diffs and patches.. Whitespace commit as my first SVN commit. (yay) MFC after: 1 week	2008-06-26 22:45:04 +00:00
dfr	41cea6d5ca	Re-implement the client side of rpc.lockd in the kernel. This implementation provides the correct semantics for flock(2) style locks which are used by the lockf(1) command line tool and the pidfile(3) library. It also implements recovery from server restarts and ensures that dirty cache blocks are written to the server before obtaining locks (allowing multiple clients to use file locking to safely share data). Sponsored by: Isilon Systems PR: 94256 MFC after: 2 weeks	2008-06-26 10:21:54 +00:00
ru	c878414354	Fix a chicken-and-egg problem: this file implements SSP support, so we cannot compile it with -fstack-protector[-all] flags (or it will self-recurse); this is ensured in sys/conf/files. This OTOH means that checking for defines __SSP__ and __SSP_ALL__ to determine if we should be compiling the support is impossible (which it was trying, resulting in an empty object file). Fix this by always compiling the symbols in this file. It's good because it allows us to always have SSP support, and then compile with SSP selectively. Reported by: tinderbox	2008-06-26 07:52:45 +00:00
ru	8735fdbd4c	Enable GCC stack protection (aka Propolice) for userland: - It is opt-out for now so as to give it maximum testing, but it may be turned opt-in for stable branches depending on the consensus. You can turn it off with WITHOUT_SSP. - WITHOUT_SSP was previously used to disable the build of GNU libssp. It is harmless to steal the knob as SSP symbols have been provided by libc for a long time, GNU libssp should not have been much used. - SSP is disabled in a few corners such as system bootstrap programs (sys/boot), process bootstrap code (rtld, csu) and SSP symbols themselves. - It should be safe to use -fstack-protector-all to build world, however libc will be automatically downgraded to -fstack-protector because it breaks rtld otherwise. - This option is unavailable on ia64. Enable GCC stack protection (aka Propolice) for kernel: - It is opt-out for now so as to give it maximum testing. - Do not compile your kernel with -fstack-protector-all, it won't work. Submitted by: Jeremie Le Hen <jeremie@le-hen.org>	2008-06-25 21:33:28 +00:00
davidxu	70dd244f26	Add two commands to the _umtx_op system call to allow a simple mutex to be locked and unlocked completely in userland. Locking and unlocking the mutex in userland reduces the total time a mutex is held by a thread. In some application code a mutex protects only a small piece of code whose execution time is less than that of a simple system call; if lock contention happens, however, in the current implementation the lock holder has to extend its locking time and enter the kernel to unlock it. The change avoids this disadvantage: it first sets the mutex to the free state and then enters the kernel and wakes one waiter up. This improves performance dramatically in some sysbench mutex tests. Tested by: kris Sounds great: jeff	2008-06-24 07:32:12 +00:00
jhb	437891381c	Remove the posixsem_check_destroy() MAC check. It is semantically identical to doing a MAC check for close(), but no other types of close() (including close(2) and ksem_close(2)) have MAC checks. Discussed with: rwatson	2008-06-23 21:37:53 +00:00
rwatson	1e17e3cd45	If S_IFIFO is passed to mknod(2), invoke kern_mkfifoat(9) to create a FIFO, as required by SUSv3. No specific privilege check is performed in this case, as FIFOs may be created by unprivileged processes (subject to the normal file system name space restrictions that may be in place). Unlike the Apple implementation, we reject requests to create a FIFO using mknod(2) if there is a non-zero dev argument to the system call, which is permitted by the Open Group specification ("... undefined ..."). We might want to revise this if we find it causes compatibility problems for applications in practice. PR: kern/74242, kern/68459 Obtained from: Apple, Inc. MFC after: 3 weeks	2008-06-22 21:51:32 +00:00
gonzo	f0ffee5444	Use the minimum of max_aio_procs and target_aio_procs when spawning a new aiod, since there should be no more than max_aio_procs processes.	2008-06-21 11:34:34 +00:00
imp	bf94b8a5bf	Split out the probing magic of device_probe_and_attach into device_probe() so that it can be used by busses that may wish to do additional processing between probe and attach. Reviewed by: dfr@	2008-06-20 16:58:15 +00:00
alc	c5556f0762	Enforce the mapping of kernel loadable modules in the uppermost 2GB of the kernel virtual address space on amd64.	2008-06-20 06:24:34 +00:00
delphij	4f152d47fa	Revert rev. 178124 as requested by kris@. Having jail id not being reused too frequently is useful for script controlled environment.	2008-06-19 21:41:57 +00:00
gonzo	c5bc6314e2	Renew semaphore's pointer after wakeup since during msleep sem_base may have been modified by destroying one of semaphores and semptr would not be valid in this case. PR: kern/123731	2008-06-19 18:08:42 +00:00
kib	eecc60305f	Struct cdev is always the member of the struct cdev_priv. When devfs needed to promote cdev to cdev_priv, the si_priv pointer was followed. Use member2struct() to calculate address of the wrapping cdev_priv. Rename si_priv to __si_reserved. Tested by: pho Reviewed by: ed MFC after: 2 weeks	2008-06-16 17:34:59 +00:00
jb	567c5d727e	Remove code that isn't required. It actually breaks the case where KDTRACE_HOOKS is defined and KDB isn't. This is the case that it was intended for.	2008-06-16 04:44:29 +00:00
ed	4327eebef0	Turn dev2unit(), minor(), unit2minor() and minor2unit() into macros. Now that we got rid of the minor-to-unit conversion and the constraints on device minor numbers, we can convert the functions that operate on minor and unit numbers to simple macros. The unit2minor() and minor2unit() macros are now no-ops. The ZFS code also defined a macro named `minor'. Change the ZFS code to use umajor() and uminor() here, as it is the correct approach to do this. Also add $FreeBSD$ to keep SVN happy. Approved by: philip (mentor), pjd	2008-06-12 08:30:54 +00:00

10960 Commits