freebsd-dev

Author	SHA1	Message	Date
Alexander Motin	a9385ad10f	Change ttyhook_register() second argument from thread to process pointer. Thread was not really needed there, while previous ng_tty implementation that used thread pointer had locking issues (using sx while holding mutex).	2008-12-13 21:17:46 +00:00
Joseph Koshy	6fe00c7876	- Bug fix: prevent a thread from migrating between CPUs between the time it is marked for user space callchain capture in the NMI handler and the time the callchain capture callback runs. - Improve code and control flow clarity by invoking hwpmc(4)'s user space callchain capture callback directly from low-level code. Reviewed by: jhb (kern/subr_trap.c) Testing (various patch revisions): gnn, Fabien Thomas <fabien dot thomas at netasq dot com>, Artem Belevich <artemb at gmail dot com>	2008-12-13 13:07:12 +00:00
Ed Schouten	d4892ee51e	Add FIONREAD to pseudo-terminal master devices. All ioctl()'s that aren't implemented by pts(4) are forwarded to the TTY itself. Unfortunately this is not correct for FIONREAD, because it will give the wrong amount of bytes that are available to read. Tested by: keramida Reminded by: keramida	2008-12-13 07:23:55 +00:00
Konstantin Belousov	cd2983ca71	Uio_yield() already does DROP_GIANT/PICKUP_GIANT, no need to repeat this around the call. Noted by: bde	2008-12-12 14:03:04 +00:00
Konstantin Belousov	c7462f4387	Reference the vmspace of the process being inspected by procfs, linprocfs and sysctl kern_proc_vmmap handlers. Reported and tested by: pho Reviewed by: rwatson, des MFC after: 1 week	2008-12-12 12:12:36 +00:00
Konstantin Belousov	af80b2c901	The userland_sysctl() function retries sysctl_root() until returned error is not EAGAIN. Several sysctls that inspect another process use p_candebug() for checking access right for the curproc. p_candebug() returns EAGAIN for some reasons, in particular, for the process doing exec() now. If execing process tries to lock Giant, we get a livelock, because sysctl handlers are covered by Giant, and often do not sleep. Break the livelock by dropping Giant and allowing other threads to execute in the EAGAIN loop. Also, do not return EAGAIN from p_candebug() when process is executing, use more appropriate EBUSY error [1]. Reported and tested by: pho Suggested by: rwatson [1] Reviewed by: rwatson, des MFC after: 1 week	2008-12-12 12:06:28 +00:00
Joe Marcus Clarke	b9022449b3	Add a new VOP, VOP_VPTOCNP, which translates a vnode to its component name on a best-effort basis. Teach vn_fullpath to use this new VOP if a regular VFS cache lookup fails. This VOP is designed to supplement the VFS cache to provide a better chance that a vnode-to-name lookup will succeed. Currently, an implementation for devfs is being committed. The default implementation is to return ENOENT. A big thanks to kib for the mentorship on this, and to pho for running it through his stress test suite. Reviewed by: arch Approved by: kib	2008-12-12 00:57:38 +00:00
Ed Schouten	1ff90be789	Add kqueue()-support to pseudo-terminal master devices. One thing I didn't expect many applications to use, was kqueue() on pseudo-terminal master devices. There are applications that use kqueue() on the TTY itself (rtorrent, etc). That doesn't mean we shouldn't implement this. Libraries like libevent use kqueue() by default, which means they wouldn't be able to use kqueue(). The old TTY layer implements a very broken version of kqueue() by performing the actual polling on the TTY device. Discussed with: peter	2008-12-11 21:44:02 +00:00
Bjoern A. Zeeb	9ea9ef7e89	Order #includes - also to reduce diffs with vimage branches in p4. Sponsored by: The FreeBSD Foundation	2008-12-11 16:09:31 +00:00
Bjoern A. Zeeb	0f1fe22db5	Correctly check the number of prison states to not access anything outside the prison_states array. When checking if there is a name configured for the prison, check the first character to not be '\0' instead of checking if the char array is present, which it always is. Note, that this is different for the *jailname in the syscall. Found with: Coverity Prevent(tm) CID: 4156, 4155 MFC after: 4 weeks (just that I get the mail)	2008-12-11 01:04:25 +00:00
Marko Zec	385195c062	Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instantiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. 
Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-12-10 23:12:39 +00:00
Bjoern A. Zeeb	629386598e	Make sure nmbclusters are initialized before maxsockets by running the tunable_mbinit() SYSINIT at SI_ORDER_MIDDLE before the init_maxsockets() SYSINIT at SI_ORDER_ANY. Reviewed by: rwatson, zec Sponsored by: The FreeBSD Foundation MFC after: 4 weeks	2008-12-10 22:17:09 +00:00
Bjoern A. Zeeb	36b5ba0c49	Style changes only. Put the return type on an extra line[1] and add an empty line at the beginning as we do not have any local variables. Submitted by: rwatson [1] Reviewed by: rwatson MFC after: 4 weeks	2008-12-10 22:10:37 +00:00
Ed Schouten	d16ebcd4fe	Remove added newlines from logged messages written to /dev/console. The /dev/console device node logs all strings that are written to it. When the string does not contain a trailing newline, it appends one. I can imagine this was useful a long time ago, but with our current rc-scripts, it generates a whole bunch of messages that look like: \| Configuring syscons: \| blanktime \| . By not appending the newlines, the output of `dmesg -a' is now (almost?) exactly the same as what the user will see on the console device (syscons, uart).	2008-12-10 21:48:05 +00:00
John Baldwin	3858a1f4f5	- Add 32-bit compat system calls for VFS_AIO. The system calls live in the aio code and are registered via the recently added SYSCALL32_*() helpers. - Since the aio code likes to invoke fuword and suword a lot down in the "bowels" of system calls, add a structure holding a set of operations for things like storing errors, copying in the aiocb structure, storing status, etc. The 32-bit system calls use a separate operations vector to handle fuword32 vs fuword, etc. Also, the oldsigevent handling is now done by having separate operation vectors with different aiocb copyin routines. - Split out kern_foo() functions for the various AIO system calls so the 32-bit front ends can manage things like copying in and converting timespec structures, etc. - For both the native and 32-bit aio_suspend() and lio_listio() calls, just use copyin() to read the array of aiocb pointers instead of using a for loop that iterated over fuword/fuword32. The error handling in the old case was incomplete (lio_listio() just ignored any aiocb's that it got an EFAULT trying to read rather than reporting an error), and possibly slower. MFC after: 1 month	2008-12-10 20:56:19 +00:00
Kip Macy	e1d881ba31	add RW_SYSINIT_FLAGS macro and rw_sysinit_flags initialization function	2008-12-08 21:46:55 +00:00
Jung-uk Kim	9bd2cbe43f	- Detect Bochs BIOS variants and use HZ_VM as well. - Free kernel environment variable after its use. - Fix style(9) nits.	2008-12-08 18:39:59 +00:00
Konstantin Belousov	118d0afa28	Do drop vm map lock earlier in the sysctl_kern_proc_vmmap(), to avoid locking a vnode while having vm map locked. Reported and tested by: pho MFC after: 1 week	2008-12-08 12:29:30 +00:00
Kip Macy	3120b9d428	- convert radix node head lock from mutex to rwlock - make radix node head lock not recursive - fix LOR in rtexpunge - fix LOR in rtredirect Reviewed by: sam	2008-12-07 21:15:43 +00:00
Konstantin Belousov	aeb325719a	Several threads in a process may do vfork() simultaneously. Then, all parent threads sleep on the parent's struct proc until the corresponding child releases the vmspace. Each sleep is interlocked with the proc mutex of the child, which triggers an assertion in sleepq_add(). The assertion requires that at any time, all simultaneous sleepers for the channel use the same interlock. Silence the assertion by using a condition variable allocated in the child. Broadcast the variable event on exec() and exit(). Since the struct proc * sleep wait channel is overloaded for several unrelated events, I was unable to remove wakeups from the places where cv_broadcast() is added, except exec(). Reported and tested by: ganbold Suggested and reviewed by: jhb MFC after: 2 weeks	2008-12-05 20:50:24 +00:00
John Baldwin	75444a8590	When the SYSINIT() to load a module invokes the MOD_LOAD event successfully, move that module to the head of the associated linker file's list of modules. The end result is that once all the modules are loaded, they are sorted in the reverse of their load order. This causes the kernel linker to invoke the MOD_QUIESCE and MOD_UNLOAD events in the reverse of the order that MOD_LOAD was invoked. This means that the ordering of MOD_LOAD events that is set by the SI_* parameters to DECLARE_MODULE() is now honored in the same order it would be for SYSUNINIT() for the MOD_QUIESCE and MOD_UNLOAD events. MFC after: 1 month	2008-12-05 16:47:30 +00:00
John Baldwin	b4824b48b4	- Invoke MOD_QUIESCE on all modules in a linker file (kld) before unloading any modules. As a result, if any module vetoes an unload request via MOD_QUIESCE, the entire set of modules for that linker file will remain loaded and active now rather than leaving the kld in a weird state where some modules are loaded and some are unloaded. - This also moves the logic for handling the "forced" unload flag out of kern_module.c and into kern_linker.c which is a bit cleaner. - Add a module_name() routine that returns the name of a module and use that instead of printing pointer values in debug messages when a module fails MOD_QUIESCE or MOD_UNLOAD. MFC after: 1 month	2008-12-05 13:40:25 +00:00
Bjoern A. Zeeb	118258f5c2	Fix a credential reference leak. [1] Close subtle but relatively unlikely race conditions when propagating the vnode write error to other active sessions tracing to the same vnode, without holding a reference on the vnode anymore. [2] PR: kern/126368 [1] Submitted by: rwatson [2] Reviewed by: kib, rwatson MFC after: 4 weeks	2008-12-03 15:54:35 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Konstantin Belousov	d6568724e1	Shared lookup makes it possible to create several negative cache entries for one name. Then, creating inode with that name would remove one entry, leaving others dormant. Reclaiming the vnode would uncover negative entries, causing false return of ENOENT from the calls like stat, that do not create inode. Prevent creation of the duplicated negative entries. Reported and debugged with: pho Reviewed by: jhb X-MFC: after shared lookup changes	2008-12-02 11:14:16 +00:00
Peter Wemm	43151ee6cf	Merge user/peter/kinfo branch as of r185547 into head. This changes struct kinfo_filedesc and kinfo_vmentry such that they are same on both 32 and 64 bit platforms like i386/amd64 and won't require sysctl wrapping. Two new OIDs are assigned. The old ones are available under COMPAT_FREEBSD7 - but it isn't that simple. The superseded interface was never actually released on 7.x. The other main change is to pack the data passed to userland via the sysctl. kf_structsize and kve_structsize are reduced for the copyout. If you have a process with 100,000+ sockets open, the unpacked records require a 132MB+ copyout. With packing, it is "only" ~35MB. (Still seriously unpleasant, but not quite as devastating). A similar problem exists for the vmentry structure - have lots and lots of shared libraries and small mmaps and its copyout gets expensive too. My immediate problem is valgrind. It traditionally achieves this functionality by parsing procfs output, in a packed format. Secondly, when tracing 32 bit binaries on amd64 under valgrind, it uses a cross compiled 32 bit binary which ran directly into the differing data structures in 32 vs 64 bit mode. (valgrind uses this to track file descriptor operations and this therefore affected every single 32 bit binary) I've added two utility functions to libutil to unpack the structures into a fixed record length and to make it a little more convenient to use.	2008-12-02 06:50:26 +00:00
Alexander Kabaev	6ee7dd87ba	Shared memory objects that have size which is not necessarily equal to exact multiple of system page size should still be allowed to be mapped in their entirety to match the regular vnode backed file behavior. Reported by: ed Reviewed by: jhb	2008-12-01 22:33:50 +00:00
Ken Smith	f34015d47b	Catch up with the disappearance of sys/dev/hfa.	2008-12-01 14:34:42 +00:00
Attilio Rao	ccc55b33b7	Fix an inverted check introduced in r184554. Submitted by: tegge Pointy hat to: me	2008-12-01 03:00:26 +00:00
David Xu	6d9b63d6c8	Revision 184199 had not been fully reverted, add missing piece. Reported by: phk	2008-12-01 01:54:55 +00:00
Bjoern A. Zeeb	d465a41d3b	Unbreak the no-networks (no INET/6) build that I broke with the commit in r185435. Pointyhat: no, but I could need a ski cap for the winter	2008-11-29 16:17:39 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the aforementioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. 
Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Konstantin Belousov	6179164448	In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls. Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it. Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month	2008-11-29 13:34:59 +00:00
Pawel Jakub Dawidek	0c6a80e78d	Improve KASSERT() call a bit: - Print flags in hex. - Note that flags can be fine and the panic can be due to an unexpected error condition. - Remove redundant newline character. Even though the panic message exceeds 80 characters, keep it on one line so it is easier to grep.	2008-11-29 12:40:14 +00:00
Bjoern A. Zeeb	b9f0b66c75	With the permissions of phk@ change the license on kern_jail.c to a 2 clause BSD license.	2008-11-28 19:23:46 +00:00
Ed Schouten	1cbae70533	Fix matching of message queues by name. The mqfs_search() routine uses strncmp() to match message queue objects by name. This is because it can be called from environments where the file name is not null terminated (the VFS for example). Unfortunately it doesn't compare the lengths of the message queue names, which means if a system has "Queue12345", the name "Queue" will also match. I noticed this when a student of mine handed in an exercise using message queues with names "Queue2" and "Queue". Reviewed by: rink	2008-11-28 14:53:18 +00:00
Konstantin Belousov	b7a813fc21	Explicitly note that destroy_dev() sleeps. Requested by: ed (some time ago), Jaakko Heinonen <jh saunalahti fi>	2008-11-27 16:47:25 +00:00
Ganbold Tsagaankhuu	559b717f5e	Remove unused variable. Found with: Coverity Prevent(tm) CID: 3664 Approved by: kib	2008-11-27 04:40:37 +00:00
Marko Zec	97021c2464	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
Joe Marcus Clarke	ef61995ebd	Move vn_fullpath1() outside of FILEDESC locking. This is being done in advance of teaching vn_fullpath1() how to query file systems for vnode-to-name mappings when cache lookups fail. Thanks to kib for guidance and patience on this process. Reviewed by: kib Approved by: kib	2008-11-25 15:36:15 +00:00
Ed Maste	6ffb78d173	Correct typo in comment: thier -> their	2008-11-24 19:28:52 +00:00
David Malone	27d68f904f	It's possible that the dump device has gone away after it was configured, change the message to let people know this is a possibility. I've slightly changed the message from the one submitted by Pekka to keep the printf on one line. Submitted by: Pekka Savola <pekkas@netcore.fi>	2008-11-23 21:05:22 +00:00
Konstantin Belousov	b4cf0e62f4	Add sv_flags field to struct sysentvec with intention to provide description of the ABI of the currently executing image. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures to determine ABI features. Discussed with: dchagin, imp, jhb, peter	2008-11-22 12:36:15 +00:00
Kip Macy	db7f0b974f	- bump __FreeBSD version to reflect added buf_ring, memory barriers, and ifnet functions - add memory barriers to <machine/atomic.h> - update drivers to only conditionally define their own - add lockless producer / consumer ring buffer - remove ring buffer implementation from cxgb and update its callers - add if_transmit(struct ifnet ifp, struct mbuf m) to ifnet to allow drivers to efficiently manage multiple hardware queues (i.e. not serialize all packets through one ifq) - expose if_qflush to allow drivers to flush any driver managed queues This work was supported by Bitgravity Inc. and Chelsio Inc.	2008-11-22 05:55:56 +00:00
Julian Elischer	bc97ba5100	Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly. Obtained from: Ironport MFC after: 3 days	2008-11-19 19:19:30 +00:00
John Baldwin	0d484d249f	Allow device hints to wire the unit numbers of devices. - An "at" hint now reserves a device name. - A new BUS_HINT_DEVICE_UNIT method is added to the bus interface. When determining the unit number of a device, this method is invoked to let the bus driver specify the unit of a device given a specific devclass. This is the only way a device can be given a name reserved via an "at" hint. - Implement BUS_HINT_DEVICE_UNIT() for the acpi(4) and isa(4) bus drivers. Both of these busses implement this by comparing the resources for a given hint device with the resources enumerated by ACPI/PnPBIOS and wire a unit if the hint resources are a subset of the "real" resources. - Use bus_hinted_children() for adding hinted devices on isa(4) busses now instead of doing it by hand. - Remove the unit kludging from sio(4) as it is no longer necessary. Prodding from: peter, imp OK'd by: marcel MFC after: 1 month	2008-11-18 21:01:54 +00:00
John Baldwin	02f0ff6d92	When checking to see if another CPU is running its idle thread, examine the thread running on the other CPU instead of the thread being placed on the run queue. Reported by: Ravi Murty @ Intel Reviewed by: jeff	2008-11-18 05:41:34 +00:00
Xin LI	e1088cdca3	Obey signedness flag in %z case. MFC after: 2 months	2008-11-17 23:57:40 +00:00
Pawel Jakub Dawidek	1ba4a712dd	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This brings a huge number of changes; I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Previously, if a write request failed, the system panicked. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris	2008-11-17 20:49:29 +00:00
Konstantin Belousov	c5f77bf986	Revert r184118. There is actually code in the kernel, for instance in kern_unlinkat(), that expects that vn_start_write() actually fills the mp even when the call failed. As Tor noted, that pattern relies on the type stability of the mount points, as well as that suspended mount points are never freed and V_XSLEEP is always passed to vn_start_write() when called on a freed mount point. Reported by: stass Reviewed by: tegge PR: 123768	2008-11-16 21:56:29 +00:00
Nick Hibma	eaa5bb21ca	Silence detach messages if the device has marked itself quiet (u3g). MFC after: 3 weeks	2008-11-13 21:46:19 +00:00
Ed Schouten	87fe0fa84f	Don't forget to relock the TTY after uiomove() returns an error. Peter Holm just discovered this funny bug inside the TTY code: if uiomove() in ttydisc_write() returns an error, we forget to relock the TTY before jumping out of ttydisc_write(). Fix it by placing tty_unlock() and tty_lock() around uiomove(). Submitted by: pho	2008-11-12 09:04:44 +00:00
Ed Schouten	ab0d10f68e	Several cleanups related to pipe(2). - Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2) fills an array with two descriptors. - Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed. - Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2). - Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call. Approved by: rdivacky Discussed on: arch	2008-11-11 14:55:59 +00:00
Andrew Gallatin	528fb798ad	Avoid scheduling firmware taskqs when cold. This prevents a panic which occurs when a driver attempts to load firmware at boot via firmware_get() when the firmware module has not been preloaded. firmware_get() will enqueue a task using a struct taskqueue allocated on the stack, and the machine will crash much later in the firmware taskq thread when taskqs are started and the struct taskqueue is garbage. Not objected to by: sam	2008-11-11 12:25:08 +00:00
Ed Schouten	ebb45b0620	Regenerate system call tables for r184789.	2008-11-09 10:48:06 +00:00
Ed Schouten	a1b5a8955e	Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4. Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4. Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname(). I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality. Reviewed by: rdivacky, kib	2008-11-09 10:45:13 +00:00
Kip Macy	8aa7a58108	make kern.ipc.nmbclusters actually have a useful effect on nmbclusters et al. initialize pkthdr in field order	2008-11-09 01:53:06 +00:00
Ed Schouten	5bbae50149	Reduce the default baud rate of PTY's to 9600. On RELENG_6 (and probably RELENG_7) we see our syscons windows and pseudo-terminals have the following buffer sizes: \| LINE RAW CAN OUT IHIWT ILOWT OHWT LWT COL STATE SESS PGID DISC \| ttyv0 0 0 0 7680 6720 2052 256 7 OCcl 1146 1146 term \| ttyp0 0 0 0 7680 6720 1296 256 0 OCc 82033 82033 term These buffer sizes make no sense, because we often have much more output than input, but I guess having higher input buffer sizes improves guarantees of the system. On MPSAFE TTY I just set both the input and output buffer sizes to 7 KB, which is pretty big on a standard FreeBSD install with 8 syscons windows and some PTY's. Reduce the baud rate to 9600 baud, which means we now have the following buffer sizes: \| LINE INQ CAN LIN LOW OUTQ USE LOW COL SESS PGID STATE \| ttyv0 1920 0 0 192 1984 0 199 7 2401 2401 Oil \| pts/0 1920 0 0 192 1984 0 199 5631 1305 2526 Oi This is a lot smaller, but for pseudo-devices this should be good enough. You need to do a lot of punching to fill up a 7.5 KB input buffer. If it turns out things don't work out this way, we'll just switch to 19200 baud.	2008-11-08 20:40:39 +00:00
Craig Rodrigues	e506f34b24	Merge latest DTrace changes from Perforce. Approved by: jb	2008-11-05 19:40:36 +00:00
David Xu	7b4a950a7d	Revert rev 184216 and 184199, due to the way the thread_lock works, it may cause a lockup. Noticed by: peter, jhb	2008-11-05 03:01:23 +00:00
John Baldwin	927edcc9ba	Use shared vnode locks for auditing vnode arguments as auditing only does a VOP_GETATTR() which does not require an exclusive lock. Reviewed by: csjp, rwatson	2008-11-04 22:31:04 +00:00
John Baldwin	0caf1ab15a	Don't bother calling setrunnable() and clearing the sleeping flag in sleepq_resume_thread() if the thread isn't asleep.	2008-11-04 19:13:53 +00:00
John Baldwin	2ff47c5f18	Remove unnecessary locking around vn_fullpath(). The vnode lock for the vnode in question does not need to be held. All the data structures used during the name lookup are protected by the global name cache lock. Instead, the caller merely needs to ensure a reference is held on the vnode (such as vhold()) to keep it from being freed. In the case of procfs' <pid>/file entry, grab the process lock while we gain a new reference (via vhold()) on p_textvp to fully close races with execve(2). For the kern.proc.vmmap sysctl handler, use a shared vnode lock around the call to VOP_GETATTR() rather than an exclusive lock. MFC after: 1 month	2008-11-04 19:04:01 +00:00
Ed Schouten	394e94079c	Remove redundant return value tests. There is no need to test whether the return value is non-zero here. Just return the error number directly.	2008-11-04 10:58:02 +00:00
John Baldwin	4482f952b1	Adjust the license statement to more closely match a standard 3-clause BSD license. MFC after: 3 days	2008-11-03 21:17:02 +00:00
John Baldwin	21fc02d271	Use shared vnode locks instead of exclusive vnode locks for the access(), chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls. Submitted by: ups (mostly) Tested by: pho MFC after: 1 month	2008-11-03 20:31:00 +00:00
Attilio Rao	30f60d8c31	Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless. Really, the concept of holdcnt in the struct mount is represented by the mnt_ref (which prevents the type-stable structure from being "recycled") handled through vfs_ref() and vfs_rel(). With that in mind, switch the holdcnt acquisition into an emulated vfs_ref() (and subsequent release into vfs_rel()). Discussed with: kib Tested by: pho	2008-11-03 20:00:35 +00:00
John Baldwin	0f54f8c2b3	A few style nits.	2008-11-03 19:33:20 +00:00
Doug Rabson	45e6ab7f81	Regen.	2008-11-03 10:39:35 +00:00
Doug Rabson	a9148abd9d	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. 
The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month	2008-11-03 10:38:00 +00:00
Ivan Voras	aa880b9018	Increase the initial sbuf size for CPU topology dump to something more usable for newer CPUs. The new value allows 2 x quad core configuration dumps to fit within the initial buffer without reallocations. Approved by: gnn (mentor) (older version) Pointed out by: rdivacky	2008-11-02 23:11:20 +00:00
Attilio Rao	83b3bdbc8a	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr already) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho	2008-11-02 10:15:42 +00:00
Ed Schouten	37a9f58275	Clamp the values of t_column to 5 digits in `pstat -t' and `show all ttys'. We often run into these very high column numbers when we run curses applications, because they don't print any newlines. This messes up the table output of `pstat -t'. If these numbers get really high, they aren't of any use to the reader anyway. Convert them to `99999' when they run out of bounds.	2008-11-01 13:40:46 +00:00
Ed Schouten	c9dba40cc8	Reimplement the /dev/console device node. One of the pieces of code that I had left alone during the development of the MPSAFE TTY layer, was tty_cons.c. This file actually has two different functions: - It contains low-level console input/output routines (cnputc(), etc). - It creates /dev/console and wraps all its cdevsw calls to the appropriate TTY. This commit reimplements the second set of functions by moving it directly into the TTY layer. /dev/console is now a character device node that's basically a regular TTY, but does a lookup of `si_drv1' each time you open it. d_write has also been changed to call log_console(). d_close() is not present, because we must make sure we don't revoke the TTY after writing a log message to it. Even though I'm not convinced this is in line with the future directions of our console code, it is a good move for now. It removes recursive locking from the top half of the TTY layer. The previous implementation called into the TTY layer with Giant held. I'm renaming tty_cons.c to kern_cons.c now. The code hardly contains any TTY related bits, so we'd better give it a less misleading name. Tested by: Andrzej Tobola <ato iem pw edu pl>, Carlos A.M. dos Santos <unixmania gmail com>, Eygene Ryabinkin <rea-fbsd codelabs ru>	2008-11-01 08:35:28 +00:00
Peter Wemm	7a9c4d2409	Add three extra fields to the kinfo_proc_vmmap data. kve_offset - the offset within an object that a mapping refers to. fileid and fsid are inode/dev for vnodes. (Linux procfs has these and valgrind is really unhappy without them.) I believe I didn't change the size of the struct.	2008-10-31 05:43:19 +00:00
Maxim Sobolev	b0606bd11a	Make it possible to compile kernel with KTR but without DDB.	2008-10-30 21:48:28 +00:00
Ivan Voras	07095abf5d	Introduce a new sysctl, kern.sched.topology_spec, that returns an XML dump of detected ULE CPU topology. This dump can be used to check the topology detection and for general system information. An example of CPU topology dump is: kern.sched.topology_spec: <groups> <group level="1" cache-level="0"> <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu> <flags></flags> <children> <group level="2" cache-level="0"> <cpu count="4" mask="0xf">0, 1, 2, 3</cpu> <flags></flags> </group> <group level="2" cache-level="0"> <cpu count="4" mask="0xf0">4, 5, 6, 7</cpu> <flags></flags> </group> </children> </group> </groups> Reviewed by: jeff Approved by: gnn (mentor)	2008-10-29 13:36:23 +00:00
David Xu	94ec9c0245	If threads limit is exceeded, increase the total number of failures.	2008-10-29 12:11:48 +00:00
Edward Tomasz Napierala	013098c874	Rename a variable missed in previous accmode_t-related commits. Approved by: rwatson (mentor)	2008-10-28 21:58:48 +00:00
Edward Tomasz Napierala	15bc6b2bd8	Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)	2008-10-28 13:44:11 +00:00
Konstantin Belousov	7cd5a03a8e	Style return statements in vn_pollrecord().	2008-10-28 12:22:33 +00:00
Konstantin Belousov	ae53539e21	Protect check for v_pollinfo == NULL and assignment of the newly allocated vpollinfo with vnode interlock. Fully initialize vpollinfo before putting pointer to it into vp->v_pollinfo. Discussed with: dwhite Tested by: pho MFC after: 1 week	2008-10-28 12:08:36 +00:00
Robert Watson	212ab0cfb3	Rename three MAC entry points from _proc_ to _cred_ to reflect the fact that they operate directly on credentials: mac_proc_create_swapper(), mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies. Obtained from: TrustedBSD Project	2008-10-28 11:33:06 +00:00
Peter Wemm	1d387fe73b	After a machine has been up for a bit more than 20 days with HZ=1000, "ticks" goes negative. This breaks the signed comparison in softclock. This causes sleep() to never wake up, tcp to stop, etc etc. This is bad(TM). Use the SEQ_LT() method from tcp's sequence number comparisons.	2008-10-28 03:26:25 +00:00
John Baldwin	a48ac38144	- Whitespace fix for vop_poll. - Use the right label for vop_vptofh lock assertions so they are enforced.	2008-10-27 21:41:55 +00:00
Maxim Sobolev	4da059f304	vm_pnames should be "const char *const[]". Submitted by: Christoph Mallon	2008-10-27 08:09:05 +00:00
Maxim Sobolev	fa38c35148	vm_pnames has no reason to be global. MFC after: 2 weeks	2008-10-27 06:34:41 +00:00
Maxim Sobolev	7f03c419bc	Default HZ value (1,000) on i386/amd64 is not very virtual machine friendly. Due to the nature of the beast it causes lot of unproductive overhead. This is especially bad when running SMP kernel on VMWare with several virtual processors - idle FreeBSD guest with SMP kernel takes 150% host CPU time on my dual-core MacBook Pro when I am enabling two virtual CPUs, making even host not very usable. Detect when we are running in the sandbox and reduce HZ to 10 (can be adjusted via VM_HZ in the kernel config) in such cases. This brings host CPU usage of idle FreeBSD/SMP on two virtual processors down to 10%. Detect most popular VM platforms out there - VMWare, Parallels, VirtualBox and VirtualPC. MFC after: 2 weeks	2008-10-27 06:25:02 +00:00
Doug Rabson	842832aeae	Don't rely on the value of *statep without first taking the vnode interlock. Reviewed by: Mike Tancsa MFC after: 2 weeks	2008-10-24 16:04:10 +00:00
David Xu	300fa5ef6e	Don't rearm callout if the process is exiting, it may leak a callout because callout_drain() only waits for running callout, but not disable it if it is rearmed.	2008-10-24 01:09:24 +00:00
David Xu	6406fd0be6	partly revert revision 184199, because TDF_NEEDSIGCHK is persistent when thread is in kernel mode, it can cause dead loop, now unlock process lock after acquired sleep queue lock and thread lock to avoid the problem. This means TDF_NEEDSIGCHK and TDF_NEEDSUSPCHK must be set with process lock and thread lock being held at same time.	2008-10-24 01:03:31 +00:00
John Baldwin	6239752624	Whitespace fix.	2008-10-23 21:50:16 +00:00
Dag-Erling Smørgrav	e11e3f187d	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
David Xu	3f9be10eb0	Actually, for signal and thread suspension, extra process spin lock is unnecessary, the normal process lock and thread lock are enough. The spin lock is still needed for process and thread exiting to mimic single sched_lock.	2008-10-23 07:55:38 +00:00
John Baldwin	63f8fe9e8b	Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland. Submitted by: ups MFC after: 1 week	2008-10-22 21:55:48 +00:00
Marcel Moolenaar	2e0ce59f94	Trivially avoid a null pointer dereference when drivers don't set the rman description. While drivers should set it, a kernel panic is not the right behaviour when faced without one.	2008-10-22 18:20:45 +00:00
Andrew Thompson	93113aac8c	Fix spelling mistake in the last rev.	2008-10-21 14:44:25 +00:00
Andrew Thompson	8429751f67	If we have getc_inject hooked then the outq buffer is inaccessible to the driver so skip the drain rather than waiting indefinitely. Reviewed by: ed	2008-10-21 14:18:45 +00:00
Konstantin Belousov	3ba28ace77	Change vn_start_write() to clear *mpp on all failures when non-NULL vp is supplied, since vm_pageout_scan() expects it to be cleared on error. Submitted by: tegge PR: 123768 MFC after: 1 week	2008-10-21 09:55:49 +00:00
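
Commit 1d387fe73b above describes a signed comparison of "ticks" that breaks once the counter goes negative, and fixes it with the SEQ_LT() approach from TCP sequence number comparisons. The following is a minimal sketch of that idea in standalone C, not the kernel code itself: the helper names naive_before() and wrap_before() are hypothetical, and the sketch assumes a 32-bit int on a two's-complement platform. The point is that comparing the difference against zero stays correct across the wrap, as long as the two values are less than half the counter range apart.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Naive ordering test (hypothetical helper): correct only while the
 * counter has not wrapped from INT_MAX to INT_MIN.
 */
static int
naive_before(int a, int b)
{
	return (a < b);
}

/*
 * Wrap-safe ordering test (hypothetical helper) in the style of TCP's
 * SEQ_LT(): subtract first, then check the sign of the difference.
 * The subtraction is done on unsigned values so that it wraps instead
 * of overflowing; the conversion back to int32_t assumes two's
 * complement.
 */
static int
wrap_before(int a, int b)
{
	return ((int32_t)((uint32_t)a - (uint32_t)b) < 0);
}
```

With a timeout scheduled just after the wrap, naive_before(INT_MAX, INT_MIN) reports that INT_MAX is not before INT_MIN, so the timeout looks as if it lies in the distant past, while wrap_before(INT_MAX, INT_MIN) still orders the two values correctly.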

10872 Commits