freebsd-nq

Author	SHA1	Message	Date
Dag-Erling Smørgrav	32bf7cdf5a	Let vfs_lookup() return ENOTDIR if the path has a trailing slash and the last component is a symlink to something that isn't a directory. We introduce a new namei flag, TRAILINGSLASH, which is set by lookup() if the last component is followed by a slash. The trailing slash is then stripped, as before. If the final component is a symlink, lookup() will return to namei(), which will expand the symlink and call lookup() with the new path. When all symlinks have been resolved, lookup() checks if the TRAILINGSLASH flag is set, and if it is, and the vnode it ended up with is not a directory, it returns ENOTDIR. PR: kern/21768 Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru> MFC after: 3 weeks	2009-05-29 10:02:44 +00:00
Dag-Erling Smørgrav	b181c8aac6	Fix misleading comment. MFC after: 1 week	2009-05-29 09:52:13 +00:00
Attilio Rao	e31d083357	The patch for r193011 was partially rejected when applied, complete it.	2009-05-29 08:01:48 +00:00
Ed Schouten	c5e30cc02b	Last minute TTY API change: remove mutex argument from tty_alloc(). I don't want people to override the mutex when allocating a TTY. It has to be there, to keep drivers like syscons happy. So I'm creating a tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex() should eventually be removed. The advantage of this approach, is that we can just remove a function, without breaking the regular API in the future.	2009-05-29 06:41:23 +00:00
Attilio Rao	1ae1c2a3bd	Reverse the logic for ADAPTIVE_SX option and enable it by default. Introduce for this operation the reverse NO_ADAPTIVE_SX option. The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed and the new flag, offering the reversed logic, SX_NOADAPTIVE is added. Additionally, implement adaptive spinning for sx held in shared mode. The spinning limit can be handled through sysctls in order to be tuned while the code doesn't reach the release, after which time they should probably be dropped. This change has been made necessary by recent benchmarks where it does improve concurrency of workloads in presence of high contention (e.g. ZFS). KPI breakage is documented by __FreeBSD_version bumping, manpage and UPDATING updates. Requested by: jeff, kmacy Reviewed by: jeff Tested by: pho	2009-05-29 01:49:27 +00:00
Zachary Loafman	cfeb7489c2	fail(9) support: Add support for kernel fault injection using KFAIL_POINT_* macros and fail_point_* infrastructure. Add example fail point in vfs_bio.c to simulate VM buf pressure. Approved by: dfr (mentor)	2009-05-27 16:36:54 +00:00
Jamie Gritton	0304c73163	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
Stacey Son	00a5db46de	Add the ksyms(4) pseudo driver. The ksyms driver allows a process to get a quick snapshot of the kernel's symbol table including the symbols from any loaded modules (the symbols are all merged into one symbol table). Unlike other implementations, this ksyms driver maps memory in the process memory space to store the snapshot at the time /dev/ksyms is opened. It also checks whether the process already has a snapshot open and won't allow it to open /dev/ksyms again until it closes the existing one. This prevents kernel and process memory from being exhausted. Note that /dev/ksyms is used by the lockstat(1) command. Reviewed by: gallatin kib (freebsd-arch) Approved by: gnn (mentor)	2009-05-26 21:39:09 +00:00
Stacey Son	a5aedd68b4	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)	2009-05-26 20:28:22 +00:00
Ed Schouten	8b0d29d858	Get rid of M_TEMP.	2009-05-26 18:33:36 +00:00
Pawel Jakub Dawidek	ce332f1e67	Add missing socket options.	2009-05-26 09:19:21 +00:00
Konstantin Belousov	8af54d4cfc	The advisory lock may be activated or activated and removed during the sleep waiting for conditions when the lock may be granted. To prevent lf_setlock() from accessing possibly freed memory, add reference counting to the struct lockf_entry. Bump refcount around the sleep. Make lf_free_lock() return non-zero when structure was freed, and use this after the sleep to return EINTR to the caller. The error code might need a clarification, but we cannot return success to usermode, since the lock is not owned anymore. Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:39:38 +00:00
Konstantin Belousov	9727972e2c	In lf_purgelocks(), assert that state->ls_pending is empty after we weeded out threads, and clean ls_active instead of ls_pending. Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:37:55 +00:00
Konstantin Belousov	b33d617717	In lf_advlockasync(), recheck for doomed vnode after the state->ls_lock is acquired. In the lf_purgelocks(), assert that vnode is doomed and set *statep to NULL before clearing ls_pending list. Otherwise, we allow for the thread executing lf_advlockasync() to put new pending entry after state->ls_lock is dropped in lf_purgelocks(). Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:33:16 +00:00
Ed Schouten	47e6a3971f	Block when initially opening a TTY multiple times. In the original MPSAFE TTY code, I changed the behaviour by returning EBUSY. I thought this made more sense, because it's basically a race to see who gets the TTY first. It turns out this is not a good change, because it also causes EBUSY to be returned when another process is closing the TTY. This can happen during startup, when /etc/rc (or one of its children) is still busy draining its data and /sbin/init is attempting to open the TTY to spawn a getty. Reported by: bz Tested by: bz	2009-05-24 12:32:03 +00:00
Konstantin Belousov	8aec91b5e8	Replace the while statement with the if for clarity. The loop body cannot be executed more than once. Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:28:38 +00:00
Marko Zec	37f17770e0	V_irtualize the if_clone framework, thus allowing for clonable ifnets to optionally have overlapping unit numbers if attached in different vnets. At this stage if_loop is the only clonable ifnet class that has been extended to allow for such overlapping allocation of unit numbers, i.e. in each vnet it is possible to have a lo0 interface. Other clonable ifnet classes remain to operate with traditional semantics, i.e. each instance of a clonable ifnet will be assigned a globally unique unit number, regardless in which vnet such an ifnet becomes instantiated. While here, garbage collect unused _lo_list field in struct vnet_net, as well as improve indentation for #defines in sys/net/vnet.h. The layout of struct vnet_net has changed, therefore bump __FreeBSD_version. This change has no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz, brooks Approved by: julian (mentor)	2009-05-23 21:43:44 +00:00
Jamie Gritton	1e2a13e62a	Delay an error message until the variable it uses gets initialized. Found with: Coverity Prevent(tm) CID: 4316 Reported by: trasz Approved by: bz (mentor)	2009-05-23 16:13:26 +00:00
Marko Zec	e0c14af9b3	Introduce the if_vmove() function, which will be used in the future for reassigning ifnets from one vnet to another. if_vmove() works by calling a restricted subset of actions normally executed by if_detach() on an ifnet in the current vnet, and then switches to the target vnet and executes an appropriate subset of if_attach() actions there. if_attach() and if_detach() have become wrapper functions around if_attach_internal() and if_detach_internal(), where the latter variants have an additional argument, a flag indicating whether a full attach or detach sequence is to be executed, or only a restricted subset suitable for moving an ifnet from one vnet to another. Hence, if_vmove() will not call if_detach() and if_attach() directly, but will call the if_detach_internal() and if_attach_internal() variants instead, with the vmove flag set. While here, staticize ifnet_setbyindex() since it is not referenced from outside of sys/net/if.c. Also rename ifccnt field in struct vimage to ifcnt, and do some minor whitespace garbage collection where appropriate. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz, rwatson, brooks? Approved by: julian (mentor)	2009-05-22 22:09:00 +00:00
Edward Tomasz Napierala	ae1add4e55	Make 'struct acl' larger, as required to support NFSv4 ACLs. Provide compatibility interfaces in both kernel and libc. Reviewed by: rwatson	2009-05-22 15:56:43 +00:00
Ed Schouten	52f542a8e4	Enable secure TTY input buffer flushing by default. I'm leaving the sysctl there. If people really notice a slowdown, they can revert to the old behaviour. Discussed with: kib	2009-05-21 16:48:06 +00:00
Ed Schouten	770c15f60f	Add a new sysctl: kern.tty_inq_flush_secure. When enabled all TTY input queue buffers are zeroed when flushing or closing the TTY. Because TTY input queues are also used to store filled in passwords, this may be an interesting switch to enable for security minded people.	2009-05-21 16:19:54 +00:00
John Baldwin	d422da9a0a	Only use the ABI compat shim for vfs.bufspace if the old buffer is smaller than a long. PR: amd64/134786 Submitted by: Emil Mikulic emikulic\| gmail MFC after: 3 days	2009-05-21 16:18:45 +00:00
Attilio Rao	9995e57b01	Change the M_WAITOK flag in notify() to M_NOWAIT in order to match the behaviour already present with the further malloc() call in devctl_notify(). This fixes a bug in the CAM layer where the camisr handler ended up calling camperiphfree() (and subsequently destroy_dev(), resulting in a new dev notify) while the xpt lock is held. PR: kern/130330 Tested by: Riccardo Torrini <riccardo dot torrini at esaote dot com>	2009-05-21 13:22:07 +00:00
John Baldwin	6ca33ea345	Set the umask in a new file descriptor table earlier in fdcopy() to remove two lock operations.	2009-05-20 18:42:04 +00:00
John Baldwin	583220dc4c	Remove an obsolete assertion. We always wake up all waiters when unlocking a mutex and never set the lock cookie == MTX_CONTESTED.	2009-05-20 18:29:14 +00:00
John Baldwin	4ab9c8af92	Fix a typo.	2009-05-20 17:19:30 +00:00
Warner Losh	248343f9d1	We no longer need to use d_thread_t for portability here, switch to struct thread *.	2009-05-20 16:58:16 +00:00
Kip Macy	126f8425c3	Add minimal ZFS lock hierarchy	2009-05-20 02:51:48 +00:00
Robert Watson	56a3c6d4a7	With SMPng, DEVICE_POLLING uses its own idle threads, rather than the system idle loop, to run ether_poll(), so make ether_poll() static. MFC after: 1 week	2009-05-19 19:21:25 +00:00
Andriy Gapon	51ca6cd6df	sysctl_rman: report shared resources to devinfo shared uses of a resource are recorded on a sub-list hanging off a main resource object on a main resource list; without this change a shared resource (e.g. irq) is reported only once by devinfo -r/-u; with this change the resource is reported for each driver that allocates it (which is even more than what vmstat -i -a reports). Approved by: jhb (mentor)	2009-05-19 14:08:21 +00:00
Robert Watson	e84bcd8494	Binding interrupts to a CPU consists of two parts: setting up CPU affinity for the interrupt thread, and requesting that underlying hardware direct interrupts to the CPU. For software interrupt threads, implement a no-op interrupt event binder that returns success, so that the interrupt management code will just set the ithread's affinity and succeed. Reviewed by: jhb MFC after: 1 week	2009-05-18 14:02:55 +00:00
Ed Schouten	c383c2211b	Mark the clock sysctls as MPSAFE. These sysctls don't need any form of locking. At least cp_times is used by powerd very often, which means I get 50% less calls to non-MPSAFE sysctls on my system. The other 50% is consumed by dev.cpu.0.freq, but this seems to need Giant for Newbus.	2009-05-18 12:03:43 +00:00
Alan Cox	1be5269359	Several changes to vfs_bio_clrbuf(): Provide a more descriptive comment. Eliminate dead code. The page cannot possibly have PG_ZERO set. Eliminate unnecessary blank lines. Reviewed by: tegge	2009-05-17 23:25:53 +00:00
Alan Cox	6e5982caf7	Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg(). In collaboration with: tegge	2009-05-17 20:26:00 +00:00
Ed Schouten	379affd5cb	Print an extra newline when not at the first column already. This makes siginfo output look a lot better when pressing it the first time when in sh(1), for example: $ load: 0.00 cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k load: 0.00 cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k will now become: $ load: 0.00 cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k load: 0.00 cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k	2009-05-17 16:17:48 +00:00
Ed Schouten	dd970f41f7	Several cleanups to tty_info(), better known as Ctrl-T. - Only pick up PROC_LOCK once, which means we can drop the PGRP_LOCK right after picking up PROC_LOCK for the first time. - Print the process real time, making it consistent with tools like time(1). - Use `p' and `td' to reference the process/thread we are going to print. Only use pick-variables inside the loops. We already did this for the threads, but not the processes.	2009-05-17 12:30:25 +00:00
Dag-Erling Smørgrav	433e2f4763	Remove do-nothing code that was required to dirty the old buffer on Alpha. Coverity ID: 838 Approved by: jhb, alc	2009-05-15 21:34:58 +00:00
Konstantin Belousov	6b72d8db47	Revert r192094. The revision caused problems for sysctl(3) consumers that expect that oldlen is filled with required buffer length even when supplied buffer is too short and returned error is ENOMEM. Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when remaining buffer space is too short for the current kinfo_file structure. Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition to keep existing interface for the sysctl, though. Reported by: ed, Florian Smeets <flo kasimir com> Tested by: pho	2009-05-15 14:41:44 +00:00
John Baldwin	3e829b18d6	- Use a separate sx lock to try to limit the number of concurrent userland sysctl requests to avoid wiring too much user memory. Only grab this lock if the user's old buffer is larger than a page as a tradeoff to allow more concurrency for common small requests. - Just use a shared lock on the sysctl tree for user sysctl requests now. MFC after: 1 week	2009-05-14 22:01:32 +00:00
Konstantin Belousov	e401a6a54e	Do not advance req->oldidx when sysctl_old_user returns an error due to copyout failure or short buffer. The latter breaks the usermode iterators of the sysctl results that pack arbitrary number of variable-sized structures. Iterator expects that kernel filled exactly oldlen bytes, and tries to interpret half-filled or garbage structure at the end of the buffer. In particular, kinfo_getfile(3) segfaulted. Reported and tested by: pho MFC after: 3 weeks	2009-05-14 10:54:57 +00:00
Jeff Roberson	bf422e5f27	- Implement a lockless file descriptor lookup algorithm in fget_unlocked(). - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders. - Mark the file zone as NOFREE so we may attempt to reference potentially freed files. - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.	2009-05-14 03:24:22 +00:00
Alan Cox	1c1b26f276	Eliminate page queues locking from bufdone_finish() through the following changes: Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does. Suggested by: tegge Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held. Update two of the three callers to the old vfs_page_set_valid() to call vfs_page_set_validclean() instead because they actually require the page's dirty mask to be cleared. Introduce vm_page_set_valid(). Reviewed by: tegge	2009-05-13 05:39:39 +00:00
Edward Tomasz Napierala	e5023dd9f6	Add missing 'break' statement. Found with: Coverity Prevent(tm) CID: 3919	2009-05-12 17:05:40 +00:00
Konstantin Belousov	3b616faed5	Prevent overflow of uio_resid. Noted by: jhb MFC after: 3 days	2009-05-11 19:58:03 +00:00
Attilio Rao	22d7ae67d4	Fix a kernel compilation error, introduced after r191990, by defining thread with curthread in the AUDIT case. Reported by: dchagin	2009-05-11 16:32:58 +00:00
Attilio Rao	dfd233edd5	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and related parts no longer take the context, since it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (e.g. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavily changed by this commit, so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal such situation.	2009-05-11 15:33:26 +00:00
Alan Cox	c3d3fe6314	Revert CVS revision 1.94 (svn r16840). Current pmap implementations don't suffer from the race condition that motivated revision 1.94. Consequently, the work-around that was implemented by revision 1.94 is no longer needed. Moreover, reverting this work-around eliminates the need for vfs_busy_pages() to acquire the page queues lock when preparing a buffer for read. Reviewed by: tegge	2009-05-11 05:16:57 +00:00
Warner Losh	7cddab635b	Spell NULL properly, use (void) rather than () for functions with no parameters. Mark two items as static that aren't used elsewhere...	2009-05-09 19:08:22 +00:00
Warner Losh	e678f09a15	Retire kern.vm.kmem.size. It was marked as obsolete prior to 5.2, so it can go.	2009-05-09 19:00:47 +00:00
Alexander Kabaev	5679fe1957	Do not embed struct ucred into larger netcred parent structures. Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead. While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code. Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib	2009-05-09 18:09:17 +00:00
Marko Zec	2114e063f0	A NOP change: style / whitespace cleanup of the noise that slipped into r191816. Spotted by: bz Approved by: julian (mentor) (an earlier version of the diff)	2009-05-08 14:34:25 +00:00
Marko Zec	29b02909eb	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)	2009-05-08 14:11:06 +00:00
Jamie Gritton	7ae27ff49f	Move the per-prison Linux MIB from a private one-off pointer to the new OSD-based jail extensions. This allows the Linux MIB to be accessed via jail_set and jail_get, and serves as a demonstration of adding jail support to a module. Reviewed by: dchagin, kib Approved by: bz (mentor)	2009-05-07 18:36:47 +00:00
Konstantin Belousov	41b72e6e50	Eliminate the loop and the call to pause(9) in vfs_vget_ino(). If vfs_busy(MBF_NOWAIT) failed, unlock the vnode and sleep in vfs_busy(). Suggested and reviewed by: jeff Tested by: pho MFC after: 3 weeks	2009-05-07 18:14:21 +00:00
Ed Schouten	14358b0fec	If we have a regular rint handler, never go into rint_bypass mode. It turns out if we called cfmakeraw() on a TTY with only a rint handler in place, it could inject data into the TTY, even though it should be redirected. Always take a look at the hooks before looking at the termios flags.	2009-05-07 17:39:23 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Jamie Gritton	49939083a0	Add a constant PR_MAXMETHOD to better define the jail/OSD interface. Reviewed by: dchagin, kib Approved by: bz (mentor)	2009-05-05 05:49:08 +00:00
Ed Schouten	3382ac3233	Remove unneeded check for SESS_LEADER(). We perform the same check ~10 lines above.	2009-05-04 11:11:10 +00:00
Jamie Gritton	3dd4fac97c	Don't call the OSD destructor if the data slot is NULL (since it's already not done on unused slots, which are indistinguishable to the caller). Approved by: bz (mentor)	2009-04-30 22:43:21 +00:00
Marko Zec	f6dfe47a14	Permit building kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_* macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Jeff Roberson	09c8a4cc21	- Fix non-SMP build by encapsulating idle spin logic in a macro. Pointy hat to: me	2009-04-29 23:04:31 +00:00
Jamie Gritton	fe2f3c651f	Regen for new jail system calls in r191673. Approved by: bz (mentor)	2009-04-29 21:50:13 +00:00
Jamie Gritton	b38ff370e4	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)	2009-04-29 21:14:15 +00:00
Bruce M Simpson	33cde13046	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00
Jamie Gritton	af7bd9a4f4	Some non-functional changes: whitespace, KASSERT strings, declaration order. Approved by: bz (mentor)	2009-04-29 18:41:08 +00:00
Jeff Roberson	113dda8a7c	- Fix the FBSDID line.	2009-04-29 03:26:30 +00:00
Jeff Roberson	7b55ab0534	- Remove the bogus idle thread state code. This may have a race in it and it only optimized out an ipi or mwait in very few cases. - Skip the adaptive idle code when running on SMT or HTT cores. This just wastes cpu time that could be used on a busy thread on the same core. - Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive. Re-use CG_FLAG_THREAD to mean SMT or HTT. Sponsored by: Nokia	2009-04-29 03:15:43 +00:00
Bjoern A. Zeeb	6aaa0b3cf1	Prevent a superuser inside a jail from modifying the dedicated root cpuset of that jail. Processes inside the jail will still be able to change child sets. A superuser outside of a jail will still be able to change the jail cpuset and thus limit the number of cpus available to the jail. Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman) PR: kern/134050 Reviewed by: jeff MFC after: 3 weeks X-MFC: backout r191596	2009-04-28 21:00:50 +00:00
Robert Watson	d02add54ea	Improve approximation of style(9).	2009-04-26 21:16:03 +00:00
Marko Zec	5624194730	Extend the vnet module registration / initialization framework first introduced @ r190909 with a vnet module deregistration service. kldunloadable modules, which are currently using vnet_mod_register() to attach their per-vnet initialization routines to the vnet initialization framework, should call vnet_mod_deregister() before acknowledging MOD_UNLOAD requests in their mod_event handlers. Such changes to the existing code base will follow in subsequent commits. vnet_mod_deregister() does not check whether departing vnet modules are registered as prerequisites for another module(s), so it should be used with care. Currently I'm only aware of vnet modules which are leafs on module dependency graphs that are kldunloadable. This change also introduces per-vnet module destructor handler, which calls vnet's module cleanup function, which (if required) has to be registered in vnet module's vnet_modinfo_t structure .vmi_idetach field. Once options VIMAGE becomes operational, the framework will take care that module's cleanup function become invoked for each active vnet instance, and that the memory allocated for each instance gets freed. Currently calls to destructor handlers must always succeed.	2009-04-26 07:09:39 +00:00
Ed Schouten	ccfd3aab30	Turn MAXPTSDEVS into a sysctl tunable. This allows users to increase the maximum amount of pseudo-terminals without changing any source code. Users must increase UT_LINESIZE before attempting to increase kern.pts_maxdev.	2009-04-25 10:05:55 +00:00
Bjoern A. Zeeb	47479a8ceb	Correct a comment: the function name given had never existed in any (relevant) version of this file or any of my patches. MFC after: 1 month	2009-04-22 20:49:54 +00:00
Maksim Yevmenkin	e72a94adc3	Fix sbappendrecord_locked(). The main problem is that sbappendrecord_locked() relies on sbcompress() to set sb_mbtail. This will not happen if sbappendrecord_locked() is called with mbuf chain made of exactly one mbuf (i.e. m0->m_next == NULL). In this case sbcompress() will be called with m == NULL and will do nothing. I'm not entirely sure if m == NULL is a valid argument for sbcompress(), and it is rather pointless to call it like that, but keep calling it so it can do SBLASTMBUFCHK(). The problem is triggered by the SOCKBUF_DEBUG kernel option that enables SBLASTRECORDCHK() and SBLASTMBUFCHK() checks. PR: kern/126742 Investigated by: pluknet < pluknet -at- gmail -dot- com > No response from: freebsd-current@, freebsd-bluetooth@ MFC after: 3 days	2009-04-21 19:14:13 +00:00
Konstantin Belousov	6fae832ad7	Fix typo. Noted by: jhb MFC after: 2 weeks	2009-04-20 15:10:03 +00:00
Konstantin Belousov	007abb3d0f	On the exit of a child process whose parent either set SA_NOCLDWAIT or ignored SIGCHLD, unconditionally wake up the parent instead of doing this only when the child is the last child. This brings us in line with other U**xes that support SA_NOCLDWAIT. If the parent called waitpid(childpid), then exit of the child should wake up the parent immediately instead of forcing it to wait for all children to exit. Reported by: Alan Ferrency <alan pair com> Submitted by: Jilles Tjoelker <jilles stack nl> PR: 108390 MFC after: 2 weeks	2009-04-20 14:34:55 +00:00
Robert Watson	e5a9a8ead8	Lock the interface address list while iterating a network interface's address list when searching for a link-layer address to use during uuid generation. MFC after: 2 weeks	2009-04-19 21:36:18 +00:00
Robert Watson	bb1c7df80f	struct malloc_type has had a 'magic' field statically initialized to M_MAGIC by MALLOC_DEFINE() for a long time; add assertions that malloc_type's passed to malloc(), free(), etc have that magic set. MFC after: 2 weeks	2009-04-19 12:41:37 +00:00
Edward Tomasz Napierala	e0ee758989	When allocating 'struct acl' instances, use malloc(9) instead of uma(9). This struct will get much bigger soon, and we don't want to waste too much memory on UMA caches. Reviewed by: rwatson	2009-04-19 09:56:30 +00:00
Edward Tomasz Napierala	b998d381f2	Use acl_alloc() and acl_free() instead of using uma(9) directly. This will make switching to malloc(9) easier; also, it would be necessary to add these routines if/when we implement variable-size ACLs.	2009-04-18 16:47:33 +00:00
Alexander Kabaev	8aeb69d0f2	Undo private changes that should never have been committed.	2009-04-17 18:34:11 +00:00
Alexander Kabaev	348496ad39	More fallout from negative dotdot caching. Negative entries should be removed from and reinserted to proper ncneg list. Reported by: pho Submitted by: kib	2009-04-17 18:11:11 +00:00
Konstantin Belousov	28a1b4eb37	In flushbufqueues(), do not allocate sentinel buffer on the stack, struct buf is large. Use sleeping malloc(9) call, and zero the allocated buf as a debugging feature.	2009-04-16 09:37:48 +00:00
Konstantin Belousov	949af70942	Export the number of times bufdaemon got help from the normal threads.	2009-04-16 09:33:52 +00:00
Ed Schouten	6672361085	Remove dead code from devtoname(). In the good old days it was possible to have dev_t's that referred to nonexistent devices. In these cases devtoname() automatically generated names. This is no longer possible, so remove this dead code. Discussed with: kib	2009-04-15 20:43:12 +00:00
Ed Schouten	bce79dbb29	Remove unneeded variable and casting from newdev(). Remove the `udev' variable, which has a different type than the original function argument and si_drv0. The `udev' name is also misleading, because it is not the number returned by dev2udev(). Rename this argument to `unit'. It is the same number as returned by dev2unit().	2009-04-15 20:15:36 +00:00
Ed Schouten	d7cbfc1b18	Don't use si_drv0 directly. We should still access si_drv0 using dev2unit(). Also change the KASSERT() to really print the udev instead of the unit number. I suspect it's still useful to print the unit number, especially for devices that use clone lists, so keep the unit number in the panic string.	2009-04-15 20:08:26 +00:00
John Baldwin	3f11530b79	Update comment above _fget() for the earlier change that makes FWRITE failures return EBADF rather than EINVAL. Submitted by: Jaakko Heinonen jh saunalahti fi MFC after: 1 month	2009-04-15 19:10:37 +00:00
Alexander Kabaev	9cf6772211	Redo previous change using simpler patch that happens to be also more correct. Submitted by: tor	2009-04-14 23:56:48 +00:00
Alexander Kabaev	eed8a9edba	Fix yet another negative dotdot entry fallout. Reported by: pho	2009-04-14 23:46:57 +00:00
Kip Macy	5e6a926611	- use a shared lock for reads - remove stale comment Reviewed by: jeffr	2009-04-13 23:09:44 +00:00
David Xu	945488297b	Make UMTX_OP_WAIT_UINT actually wait for an unsigned integer on 64-bit machines. MFC after: 1 week	2009-04-13 05:21:17 +00:00
Kip Macy	f0b9868d3a	sendfile doesn't modify the vnode - acquire vnode lock shared Reviewed by: ups, jeffr	2009-04-12 05:19:35 +00:00
Robert Watson	89f28b1b86	Remove conditionally compiled time counter statistics; tools like DTrace, kernel profiling, etc, can provide this information without the overhead. MFC after: 3 days Suggested by: bde	2009-04-11 22:01:40 +00:00
Alexander Kabaev	9d75482f99	Fix v_cache_dd handling for negative entries. The v_cache_dd pointer was not populated in the parent directory if a negative entry was being created, yet the entry itself was added to the nc_neg list. It was possible for the parent vnode to get discarded later, leaving the negative entry pointing to a now unused memory block. Reported by: dho Reviewed by: kib	2009-04-11 20:23:08 +00:00
Konstantin Belousov	fd409594c6	When zapping v_cache_dd for !MAKEENTRY case in cache_lookup(), we shall lock cache as writer. Reviewed by: kan	2009-04-11 16:12:20 +00:00
Marko Zec	bfe1aba468	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. 
Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
Robert Watson	885868cd8f	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease-related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
Konstantin Belousov	3f54086eba	Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed. Check the condition and return ENOENT then. In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused by dvp reclaim. Reported and tested by: pho	2009-04-10 10:22:44 +00:00
Andrew Thompson	853a10a581	Revert r190676,190677 The geom and CAM changes for root_hold are the wrong solution for USB design quirks. Requested by: scottl	2009-04-10 04:08:34 +00:00

11174 Commits