freebsd-skq

Author	SHA1	Message	Date
jamie	c0264518e9	Call prison_check from vfs_suser rather than re-implementing it. Approved by: re (kib), bz (mentor)	2009-07-02 14:19:33 +00:00
rwatson	5260e3a73c	Audit file descriptor and command arguments to ioctl(2). Approved by: re (audit argument blanket) MFC after: 1 week	2009-07-02 09:16:25 +00:00
rwatson	b803771ae1	Clean up a number of aspects of token generation from audit arguments to system calls: - Centralize generation of argument tokens for VM addresses in a macro, ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments. - Fix up argument numbers across a large number of syscalls so that they match the numeric argument into the system call. - Don't audit the address argument to ioctl(2) or ptrace(2), but do keep generating tokens for mmap(2), minherit(2), since they relate to passing object access across execve(2). Approved by: re (audit argument blanket) Obtained from: TrustedBSD Project MFC after: 1 week	2009-07-02 09:15:30 +00:00
rwatson	0dd7c48b8f	For access(2) and eaccess(2), audit the requested access mode. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 22:47:45 +00:00
jeff	cd329071c1	- Use fd_lastfile + 1 as the upper bound on nd. This is more correct than using the size of the descriptor array. - A lock is not needed to fetch fd_lastfile. The results are stale the instant it is dropped. - Use a private mutex pool for select since the pool mutex is not used as a leaf. - Fetch the si_mtx pointer first before resorting to hashing to compute the mutex address. Reviewed by: McKusick Approved by: re (kib)	2009-07-01 20:43:46 +00:00
rwatson	cd692bb0b9	Audit file descriptor numbers for various socket-related system calls. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 19:55:11 +00:00
rwatson	4aa2654586	Define missing audit argument macro AUDIT_ARG_SOCKET(), and capture the domain, type, and protocol arguments to socket(2) and socketpair(2). Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 18:54:49 +00:00
jhb	76256698a1	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicing. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)	2009-07-01 17:20:07 +00:00
rwatson	9f2c78b3f9	When auditing unmount(2), capture FSID arguments as regular text strings rather than as paths, which would lead to them being treated as relative pathnames and hence confusingly converted into absolute pathnames. Capture flags to unmount(2) via an argument token. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 16:56:56 +00:00
rwatson	f90eaa96d0	Audit the file descriptor number passed to lseek(2). Approved by: re (kib) MFC after: 3 days	2009-07-01 15:37:23 +00:00
rwatson	0e50a12ccd	Fix link(2) auditing: use the second audit record path for the new object name. Approved by: re (kib) MFC after: 3 days	2009-07-01 13:22:08 +00:00
rwatson	dff449105c	udit the 'options' argument to wait4(2). Approved by: re (kib) MFC after: 3 days	2009-07-01 12:36:10 +00:00
alc	45ba0262c0	Remove a stale comment. The very same revision (r85511) that introduced this comment also implemented the proposed change to the code. Approved by: re (kib)	2009-06-30 19:39:17 +00:00
emaste	02a0b3a839	Add FIONSPACE from NetBSD. FIONSPACE is provided so that programs may easily determine how much space is left in the send queue; they do not need to know the send queue size. NetBSD revisions: sys_socket.c r1.41, 1.42 filio.h r1.9 Obtained from: NetBSD Approved by: re (kensmith)	2009-06-30 13:38:49 +00:00
kib	242d1f11af	Free struct ucreds allocated in vfs_hang_addrlist() when deleting the export element. While there, remove register storage-class specifiers. Reported and tested by: pho Reviewed by: kan Approved by: re (kensmith)	2009-06-29 18:09:07 +00:00
attilio	84c65084a0	Don't assume a default (currently 15) value for preloaded klds when loading hwpmc, but calculate at runtime and allocate the necessary space. Also the current logic is wrong as it can lead to an endless loop. Sponsored by: Sandvine Incorporated Reported by: Ryan Stone <rstone at sandvine dot com> Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Approved by: re (kib)	2009-06-29 16:03:18 +00:00
stas	c61e1d6988	- Turn the third (islocked) argument of the knote call into flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without KQ_LOCK mtx held. - Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required for ZFS as its getattr implementation may sleep. Approved by: re (rwatson) Reviewed by: kib MFC after: 2 weeks	2009-06-28 21:49:43 +00:00
ed	fda6c6ecf4	Add FIONWRITE support to TTYs. TTYs already supported TIOCOUTQ, but FIONWRITE seems to be a more generic name for this. Approved by: re (kib)	2009-06-28 12:02:15 +00:00
phk	ed08f853a1	There are a number of ways an application can check if there are inbound data waiting on a filedescriptor, such as a pipe or a socket, for instance by using select(2), poll(2), kqueue(2), ioctl(FIONREAD) etc. But we have no way of finding out if written data have yet to be disposed of, for instance, transmitted (and ack'ed!) to some remote host, or read by the applicantion at the far end of the pipe. The closest we get, is calling shutdown(2) on a TCP socket in non-blocking mode, but this has the undesirable sideeffect of preventing future communication. Add a complement to FIONREAD, called FIONWRITE, which returns the number of bytes not yet properly disposed of. Implement it for all sockets. Background: A HTTP server will want to time out connections, if no new request arrives within a certain period after the last transmitted response has actually been sent (and ack'ed). For a busy HTTP server, this timeout can be subsecond duration. In order to signal to a load-balancer that the connection is truly dead, TCP_RST will be the preferred method, as this avoids the need for a RTT delay for FIN handshaking, with a client which, surprisingly often, no longer at the remote IP number. If a slow, distant client is being served a response which is big enough to fill the window, but small enough to fit in the socket buffer, the write(2) call will return immediately. If the session timeout is armed at that time, all bytes in the response may not have been transmitted by the time it fires. FIONWRITE allows the timeout to check that no data is outstanding on the connection, before it TCP_RST's it. Input & Idea from: rwatson Approved by: re (kib)	2009-06-28 11:28:14 +00:00
alc	9456008567	Correct a long-standing performance bug in cluster_rbuild(). Specifically, in the case of a file system with a block size that is less than the page size, cluster_rbuild() looks at too many of the page's valid bits. Consequently, it may terminate prematurely, resulting in poor performance. Reported by: bde Reviewed by: tegge Approved by: re (kib)	2009-06-27 21:37:36 +00:00
rwatson	da78c9e4a2	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
alc	91cafd48b1	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
kib	4ce38c4283	In lf_iteratelocks_vnode, increment state->ls_threads around iterating of the vnode advisory lock list. This prevents deallocation of state while inside the loop. Reported and tested by: pho MFC after: 2 weeks	2009-06-25 18:54:56 +00:00
jhb	e059cd9143	Return errors from intr_event_bind() to the caller of intr_set_affinity(). Specifically, if a non-root user attempts to bind an interrupt the request will now report failure with EPERM rather than silently failing with a successful return code. MFC after: 1 week	2009-06-25 18:35:19 +00:00
jhb	fb327076f9	Use the correct cast for the arguments passed to freebsd_shmctl() in oshmctl(). Submitted by: kib	2009-06-25 17:11:27 +00:00
jhb	59a38c2e5e	Tweak the oshmctl() compile fix: convert the K&R definition to ANSI.	2009-06-25 13:36:57 +00:00
rwatson	769ee782fc	oshmctl() now requires a sysv_shm.c-local function prototype.	2009-06-25 07:16:10 +00:00
jeff	826fa583d1	- Use DPCPU for SCHED_STATS. This is somewhat awkward because the offset of the stat is not known until link time so we must emit a function to call SYSCTL_ADD_PROC rather than using SYSCTL_PROC directly. - Eliminate the atomic from SCHED_STAT_INC now that it's using per-cpu variables. Sched stats are always incremented while we're holding a spinlock so no further protection is required. Reviewed by: sam	2009-06-25 01:33:51 +00:00
jeff	a90de6966f	- Add a sysctl_dpcpu_long to support long typed pcpu stats. - Remove the #ifndef SMP case as the SMP code works on UP as well. Reviewed by: sam	2009-06-25 01:31:59 +00:00
jamie	3a4b2dc4d4	Wrap a PR_VNET inside "#ifdef VIMAGE" since that the only place it applies. bz wants the blame for this. Noticed by: rwatson Approved by: bz (mentor)	2009-06-24 22:06:56 +00:00
jhb	2908b25ed7	Regen.	2009-06-24 21:54:08 +00:00
jamie	e53e57277b	In case of prisons with their own network stack, permit additional privileges as well as not restricting the type of sockets a user can open. Note: the VIMAGE/vnet fetaure of of jails is still considered experimental and cannot guarantee that privileged users can be kept imprisoned if enabled. Reviewed by: rwatson Approved by: bz (mentor)	2009-06-24 21:39:50 +00:00
jhb	6f52fe78fb	Change the ABI of some of the structures used by the SYSV IPC API: - The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned short. - The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned short. - The mode member of struct ipc_perm is now mode_t instead of unsigned short (this is merely a style bug). - The rather dubious padding fields for ABI compat with SV/I386 have been removed from struct msqid_ds and struct semid_ds. - The shm_segsz member of struct shmid_ds is now a size_t instead of an int. This removes the need for the shm_bsegsz member in struct shmid_kernel and should allow for complete support of SYSV SHM regions >= 2GB. - The shm_nattch member of struct shmid_ds is now an int instead of a short. - The shm_internal member of struct shmid_ds is now gone. The internal VM object pointer for SHM regions has been moved into struct shmid_kernel. - The existing __semctl(), msgctl(), and shmctl() system call entries are now marked COMPAT7 and new versions of those system calls which support the new ABI are now present. - The new system calls are assigned to the FBSD-1.1 version in libc. The FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls. - A simplistic framework for tagging system calls with compatibility symbol versions has been added to libc. Version tags are added to system calls by adding an appropriate __sym_compat() entry to src/lib/libc/incldue/compat.h. [1] PR: kern/16195 kern/113218 bin/129855 Reviewed by: arch@, rwatson Discussed with: kan, kib [1]	2009-06-24 21:10:52 +00:00
jhb	0894d349bd	Deprecate the msgsys(), semsys(), and shmsys() system calls by moving them under COMPAT_FREEBSD[4567]. Starting with FreeBSD 5.0 the SYSV IPC API was implemented via direct system calls (e.g. msgctl(), msgget(), etc.) rather than indirecting through the var-args *sys() system calls. The shmsys() system call was already effectively deprecated for all but COMPAT_FREEBSD4 already as its implementation for the !COMPAT_FREEBSD4 case was to simply invoke nosys().	2009-06-24 20:01:13 +00:00
lulf	4208ef9967	- Similar to the previous commit, but for CURRENT: Fix a bug where a FIFO vnode use count was increased twice, but only decreased once.	2009-06-24 18:44:38 +00:00
lulf	6c60345b34	- Fix a bug where a FIFO vnode use count was increased twice, but only decreased once. MFC after: 1 week	2009-06-24 18:38:51 +00:00
jamie	5c8985f2c9	Fix a race in vi_if_move, where a vnet is used after the prison that referred to it has been released. Approved by: bz (mentor)	2009-06-24 15:29:36 +00:00
jhb	cceae54c51	Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.	2009-06-24 13:36:37 +00:00
jhb	d8d39adf3c	- Move syscall function argument structure types to be just above the relevenat system call function. - Whitespace fixes.	2009-06-24 13:35:38 +00:00
rwatson	df217187ce	Add stack_print_short() and stack_print_short_ddb() interfaces to stack(9), which generate a more compact rendition of a stack trace via the kernel's printf. MFC after: 1 week	2009-06-24 12:06:15 +00:00
jeff	5bc3a65e40	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas	2009-06-23 22:42:39 +00:00
jeff	92b4ecdc77	- Use cpuset_t and the CPU_ macros in place of cpumask_t so that ULE supports arbitrary numbers of cpus rather than being limited by cpumask_t to the number of bits in a long.	2009-06-23 22:12:37 +00:00
ed	4a4fe9e6fc	Improve my last commit: use a separate condvar to serialize. The advantage of using a separate condvar is that we can just use cv_signal(9) instead of cv_broadcast(9). It makes no sense to wake up multiple threads. It also makes the TTY code easier to understand. t_dcdwait sounds totally unrelated.	2009-06-23 21:43:02 +00:00
ed	e93642ed4f	Use dcdwait to block threads to serialize writes. I suspect the usage of bgwait causes a lot of spurious wakeups when threads are blocked in the background, because they will be woken up each time a write() call is performed. Also wakeup dcdwait when the TTY is abandoned.	2009-06-23 21:33:26 +00:00
kib	fa686c638e	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
jamie	eeafb36508	Add a limit for child jails via the "children.cur" and "children.max" parameters. This replaces the simple "allow.jails" permission. Approved by: bz (mentor)	2009-06-23 20:35:51 +00:00
bz	0808d0b1a6	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
jamie	4405625484	Remove unnecessary/redundant includes. Approved by: bz (mentor)	2009-06-23 14:39:21 +00:00
pho	6945f47d28	vn_open_cred() needs a non NULL ucred pointer Reviewed by: kib	2009-06-23 11:29:54 +00:00
andre	e66ed06df4	Add soreceive_stream(), an optimized version of soreceive() for stream (TCP) sockets. It is functionally identical to generic soreceive() but has a number stream specific optimizations: o does only one sockbuf unlock/lock per receive independent of the length of data to be moved into the uio compared to soreceive() which unlocks/locks per mbuf. o uses m_mbuftouio() instead of its own copy(out) variant. o much more compact code flow as a large number of special cases is removed. o much improved reability. It offers significantly reduced CPU usage and lock contention when receiving fast TCP streams. Additional gains are obtained when the receiving application is using SO_RCVLOWAT to batch up some data before a read (and wakeup) is done. This function was written by "reverse engineering" and is not just a stripped down variant of soreceive(). It is not yet enabled by default on TCP sockets. Instead it is commented out in the protocol initialization in tcp_usrreq.c until more widespread testing has been done. Testers, especially with 10GigE gear, are welcome. MFP4: r164817 //depot/user/andre/soreceive_stream/	2009-06-22 23:08:05 +00:00
andre	74f8982f2f	Add m_mbuftouio() helper function to copy(out) an arbitrary long mbuf chain into an arbitrary large uio in a single step. It is a functional mirror image of m_uiotombuf(). This function is supposed to be used instead of hand rolled code with the same purpose and to concentrate it into one place for potential further optimization or hardware assistance.	2009-06-22 22:20:38 +00:00
andre	00948f56d3	In sbappendstream_locked() demote all incoming packet mbufs (and chains) to pure data mbufs using m_demote(). This removes the packet header and all m_tag information as they are not meaningful anymore on a stream socket where mbufs are linked through m->m_next. Strictly speaking a packet header can be only ever valid on the first mbuf in an m_next chain. sbcompress() was doing this already when the mbuf chain layout lent itself to it (e.g. header splitting or merge-append), just not consistently. This frees resources at socket buffer append time instead of at sbdrop_internal() time after data has been read from the socket. For MAC the per packet information has done its duty and during socket buffer appending the policy of the socket itself takes over. With the append the packet boundaries disappear naturally and with it any context that was based on it. None of the residual information from mbuf headers in the socket buffer on stream sockets was looked at.	2009-06-22 21:46:40 +00:00
jhb	e206daf142	Regen.	2009-06-22 20:24:03 +00:00
jhb	4c0e16be52	Include definitions for the audit identifiers for compat system calls in sysproto.h. This makes it possible to use SYSCALL_MODULE() for compat system calls that live in kernel modules.	2009-06-22 20:14:10 +00:00
jhb	062accfe3d	Fix a typo in a comment.	2009-06-22 20:12:40 +00:00
andre	469e8778b5	Update m_demote: - remove HT_HEADER test (MT_HEADER == MT_DATA for some time now) - be more pedantic about m_nextpkt in other than first mbuf - update m_flags to be retained	2009-06-22 19:35:39 +00:00
kib	2fc79768f3	Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson	2009-06-21 19:21:01 +00:00
kib	171c37f865	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson	2009-06-21 13:41:32 +00:00
rdivacky	f56dfc12fb	In non-debugging mode make this define (void)0 instead of nothing. This helps to catch bugs like the below with clang. if (cond); <--- note the trailing ; something(); Approved by: ed (mentor) Discussed on: current@	2009-06-21 07:54:47 +00:00
brooks	03ed423a4a	Change crsetgroups_locked() (called by crsetgroups()) to sort the supplemental groups using insertion sort. Use this property in groupmember() to let us use a binary search instead of the previous linear search.	2009-06-20 20:29:21 +00:00
ed	63a4c7f522	Improve nested jail awareness of devfs by handling credentials. Now that we start to use credentials on character devices more often (because of MPSAFE TTY), move the prison-checks that are in place in the TTY code into devfs. Instead of strictly comparing the prisons, use the more common prison_check() function to compare credentials. This means that pseudo-terminals are only visible in devfs by processes within the same jail and parent jails. Even though regular users in parent jails can now interact with pseudo-terminals from child jails, this seems to be the right approach. These processes are also capable of interacting with the jailed processes anyway, through signals for example. Reviewed by: kib, rwatson (older version)	2009-06-20 14:50:32 +00:00
kmacy	573bf49614	define helper routines for deferred mbuf initialization	2009-06-19 21:14:39 +00:00
brooks	f53c1c309d	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
jhb	b74ac99a51	Fix a deadlock in the getpeername() method for UNIX domain sockets. Instead of locking the local unp followed by the remote unp, use the same locking model as accept() and read lock the global link lock followed by the remote unp while fetching the remote sockaddr. Reported by: Mel Flynn mel.flynn of mailing.thruhere.net Reviewed by: rwatson MFC after: 1 week	2009-06-18 20:56:22 +00:00
alc	32a3828c70	Utilize the new function kmem_alloc_contig() to implement the UMA back-end allocator for the jumbo frames zones. This change has two benefits: (1) a custom back-end deallocator is no longer required. UMA's standard deallocator suffices. (2) It eliminates a potentially confusing artifact of using contigmalloc(): The malloc(9) statistics contain bogus information about the usage of jumbo frames. Specifically, the malloc(9) statistics report all jumbo frames in use whereas the UMA zone statistics report the "truth" about the number in use vs. the number free.	2009-06-18 17:59:04 +00:00
jhb	0abfb2bd6a	Regen.	2009-06-17 19:53:47 +00:00
jhb	fd29528e09	- Add the ability to mix multiple flags seperated by pipe ('\|') characters in the type field of system call tables. Specifically, one can now use the 'NO' types as flags in addition to the 'COMPAT' types. For example, to tag 'COMPAT' system calls as living in a KLD via NOSTD. The COMPAT type is required to be listed first in this case. - Add new functions 'type()' and 'flag()' to the embedded awk script in makesyscalls.sh that return true if a requested flag is found in the type field ($3). The flag() function checks all of the flags in the field, but type() only checks the first flag. type() is meant to be used in the top-level "switch" statement and flag() should be used otherwise. - Retire the CPT_NOA type, it is now replaced with "COMPAT\|NOARGS" using the flags approach. - Tweak the comment descriptions of COMPAT[46] system calls so that they say "freebsd[46] foo" rather than "old foo". - Document the COMPAT6 type. - Sync comments in compat32 syscall table with the master table.	2009-06-17 19:50:38 +00:00
jhb	e4d63f780a	Remove the now-unused NOIMPL flag. It serves no useful purpose given the existing UNIMPL and NOSTD types.	2009-06-17 18:46:14 +00:00
jhb	4881bdf1ef	- NOSTD results in lkmressys being used instead of lkmssys. - Mark nfsclnt as UNIMPL. It should have been NOSTD instead of NOIMPL back when it lived in nfsclient.ko, but it was removed from that a long time ago.	2009-06-17 18:44:15 +00:00
bz	48dc6805f8	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
kib	21376236dd	Decrement state->ls_threads when vnode appeared to be doomed. Reported and tested by: pho	2009-06-17 12:43:04 +00:00
attilio	256667d4fb	Introduce support for adaptive spinning in lockmgr. Actually, as it did receive few tuning, the support is disabled by default, but it can opt-in with the option ADAPTIVE_LOCKMGRS. Due to the nature of lockmgrs, adaptive spinning needs to be selectively enabled for any interested lockmgr. The support is bi-directional, or, in other ways, it will work in both cases if the lock is held in read or write way. In particular, the read path is passible of further tunning using the sysctls debug.lockmgr.retries and debug.lockmgr.loops . Ideally, such sysctls should be axed or compiled out before release. Addictionally note that adaptive spinning doesn't cope well with LK_SLEEPFAIL. The reason is that many (and probabilly all) consumers of LK_SLEEPFAIL are mainly interested in knowing if the interlock was dropped or not in order to reacquire it and re-test initial conditions. This directly interacts with adaptive spinning because lockmgr needs to drop the interlock while spinning in order to avoid a deadlock (further details in the comments inside the patch). Final note: finding someone willing to help on tuning this with relevant workloads would be either very important and appreciated. Tested by: jeff, pho Requested by: many	2009-06-17 01:55:42 +00:00
kib	b8351fcda2	Do not use casts (int )0 and (struct thread )0 for the arguments of vn_rdwr, use NULL. Reviewed by: jhb MFC after: 1 week	2009-06-16 15:13:45 +00:00
ed	fa3d9801cc	Perform some more cleanups to in-kernel session handling. The code that was in place in exit1() was mainly based on code from the old TTY layer. The main reason behind this, was because at one moment I ran a system that had two TTY layers in place at the same time. It is now sufficient to do the following: - Remove references from the session structure to the TTY vnode and the session leader. - If we have a controlling TTY and the session used by the TTY is equal to our session, send the SIGHUP. - If we have a vnode to the controlling TTY which has not been revoked, revoke it. While there, change sys/kern/tty.c to use s_ttyp in the comparison instead of s_ttyvp. It should not make any difference, because s_ttyvp can only become null when the session leader already left, but it's nicer to compare against the proper value.	2009-06-15 20:45:51 +00:00
jhb	28b41377e3	Regen.	2009-06-15 20:40:23 +00:00
jhb	447d980cd0	Add a new 'void closefrom(int lowfd)' system call. When called, it closes any open file descriptors >= 'lowfd'. It is largely identical to the same function on other operating systems such as Solaris, DFly, NetBSD, and OpenBSD. One difference from other *BSD is that this closefrom() does not fail with any errors. In practice, while the manpages for NetBSD and OpenBSD claim that they return EINTR, they ignore internal errors from close() and never return EINTR. DFly does return EINTR, but for the common use case (closing fd's prior to execve()), the caller really wants all fd's closed and returning EINTR just forces callers to call closefrom() in a loop until it stops failing. Note that this implementation of closefrom(2) does not make any effort to resolve userland races with open(2) in other threads. As such, it is not multithread safe. Submitted by: rwatson (initial version) Reviewed by: rwatson MFC after: 2 weeks	2009-06-15 20:38:55 +00:00
ed	7de50e3afb	Make tcsetsid(3) work on revoked TTYs. Right now the only way to make tcsetsid(3)/TIOCSCTTY work, is by ensuring the session leader is dead. This means that an application that catches SIGHUPs and performs a sleep prevents us from assigning a new session leader. Change the code to make it work on revoked TTYs as well. This allows us to change init(8) to make the shutdown script run in a more clean environment.	2009-06-15 19:17:52 +00:00
jamie	f950eed7d7	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
jamie	5675a54fb1	Manage vnets via the jail system. If a jail is given the boolean parameter "vnet" when it is created, a new vnet instance will be created along with the jail. Networks interfaces can be moved between prisons with an ioctl similar to the one that moves them between vimages. For now vnets will co-exist under both jails and vimages, but soon struct vimage will be going away. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 18:59:29 +00:00
jamie	f419891544	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)	2009-06-13 15:39:12 +00:00
bz	3bf4a233cf	Remove the static from int hardlink_check_uid. There is an external use in the opensolaris code. I am not sure how this ever worked but I have seen two reports of: link_elf: symbol hardlink_check_uid undefined lately. Reported by: Scott Ullrich (sullrich gmail.com), pfsense Reported by: Mister Olli (mister.olli googlemail.com)	2009-06-13 13:09:20 +00:00
jamie	e9da16507b	Add counterparts to getcredhostname: getcreddomainname, getcredhostuuid, getcredhostid Suggested by: rmacklem Approved by: bz	2009-06-13 00:12:02 +00:00
ed	66c6fddb34	Revert my previous change, because it reintroduces an old regression. Because our rc scripts also open the /etc/ttyv* nodes, it revokes the console, preventing startup messages from being displayed. I really have to think about this. Maybe we should just give the console its own TTY and let it build on top of other TTYs. I'm still not sure what to do with input handling there.	2009-06-12 21:21:17 +00:00
ed	0e4ff69bba	Prevent yet another staircase effect bug in the console device. Even though I thought I fixed the staircase issue (and I was no longer able to reproduce it), I got some reports of the issue still being there. It turns out the staircase effect still occurred when /dev/console was kept open while killing the getty on the same TTY (ttyv0). For some reason I can't figure out how the old TTY code dealt with that, so I assume the issue has always been there. I only exposed it more by merging consolectl with ttyv0, which means that the issue was present, even on systems without a serial console. I'm now marking the console device as being closed when closing the regular TTY device node. This means that when the getty shuts down, init(8) will open /dev/console, which means the termios attributes will always be reset in this case.	2009-06-12 20:29:55 +00:00
ps	9cb3f3ca17	Stop asserting on exclusive locks in fsync since it can now support shared vnode locking on ZFS. Reviewed by: jhb	2009-06-11 17:06:45 +00:00
avg	fb53877c40	strict kobj signatures: linker_if fixes in symtab_get method symtab parameter is made constant as this reflects actual intention and usage of the method Reviewed by: imp, current@ Approved by: jhb (mentor)	2009-06-11 17:05:45 +00:00
kib	e1cb2941d4	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
kib	4184c18920	Do not leak the state->ls_lock after VI_DOOMED check introduced in the r192683. Reported by: pho Submitted by: jhb	2009-06-10 16:17:38 +00:00
bz	0b5c06357f	SCTP needs either IPv4 or IPv6 as lower layer[1]. So properly hide the already #ifdef SCTP code with #if defined(INET) \|\| defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]	2009-06-10 14:36:59 +00:00
kib	754aba6b97	Fix r193923 by noting that type of a_fp is struct file *, not int. It was assumed that r193923 was trivial change that cannot be done wrong. MFC after: 2 weeks	2009-06-10 14:24:31 +00:00
kib	2a37bc559b	s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_args definition. Discussed with: bde MFC after: 2 weeks	2009-06-10 14:09:05 +00:00
cperciva	632fa45574	Prevent integer overflow in direct pipe write code from circumventing virtual-to-physical page lookups. [09:09] Add missing permissions check for SIOCSIFINFO_IN6 ioctl. [09:10] Fix buffer overflow in "autokey" negotiation in ntpd(8). [09:11] Approved by: so (cperciva) Approved by: re (not really, but SVN wants this...) Security: FreeBSD-SA-09:09.pipe Security: FreeBSD-SA-09:10.ipv6 Security: FreeBSD-SA-09:11.ntpd	2009-06-10 10:31:11 +00:00
imp	9271db1148	We can actually remove devclass_find_driver.	2009-06-10 01:02:38 +00:00
imp	2f4822fe6a	As discussed on arch@, restire devclass_{add,delete,find,quiesce}_driver. They aren't needed or used and complicate locking newbus.	2009-06-09 23:24:04 +00:00
jamie	a353b40c7e	Fix some overflow errors: a signed allocation and an insufficiant array size. Reported by: pho Tested by: pho Approved by: bz (mentor)	2009-06-09 22:09:29 +00:00
trasz	583b806b87	Add part of NFSv4 ACL kernel support code that is required for the upcoming libc changes to work. Not connected to the kernel build yet; for now, it will be compiled into libc. Reviewed by: rwatson	2009-06-09 19:51:22 +00:00
alc	5e741eef84	Eliminate an instance of VM_PROT_READ_IS_EXEC that I overlooked in r190705.	2009-06-09 17:18:41 +00:00
jhb	77373ed468	Add support for multiple passes of the device tree during the boot-time probe. The current device order is unchanged. This commit just adds the infrastructure and ABI changes so that it is easier to merge later changes into 8.x. - Driver attachments now have an associated pass level. Attachments are not allowed to probe or attach to drivers until the system-wide pass level is >= the attachment's pass level. By default driver attachments use the "last" pass level (BUS_PASS_DEFAULT). Driver's that wish to probe during an earlier pass use EARLY_DRIVER_MODULE() instead of DRIVER_MODULE() which accepts the pass level as an additional parameter. - A new method BUS_NEW_PASS has been added to the bus interface. This method is invoked when the system-wide pass level is changed to kick off a rescan of the device tree so that drivers that have just been made "eligible" can probe and attach. - The bus_generic_new_pass() function provides a default implementation of BUS_NEW_PASS(). It first allows drivers that were just made eligible for this pass to identify new child devices. Then it propogates the rescan to child devices that already have an attached driver by invoking their BUS_NEW_PASS() method. It also reprobes devices without a driver. - BUS_PROBE_NOMATCH() is only invoked for devices that do not have an attached driver after being scanned during the final pass. - The bus_set_pass() function is used during boot to raise the pass level. Currently it is only called once during root_bus_configure() to raise the pass level to BUS_PASS_DEFAULT. This has the effect of probing all devices in a single pass identical to previous behavior. Reviewed by: imp Approved by: re (kib)	2009-06-09 14:26:23 +00:00
ps	ae099c88f7	Simply shared vnode locking and extend it to also include fsync. Also, in vop_write, no longer assert for exclusive locks on the vnode. Reviewed by: jhb, kmacy, jeffr	2009-06-08 21:23:54 +00:00
bz	b7ff2bdc20	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
zec	8b1f38241a	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
rwatson	2dc666fafe	Move zombie-reaping code out of kern_wait() and into its own function, proc_reap(). Reviewed by: jhb MFC after: 3 days Sponsored by: Google, Inc.	2009-06-08 15:26:09 +00:00
kib	66aa1d7eeb	Do not dereference vp->v_rdev without holding any of dev_mtx or vnode lock. Use code similar to devfs_fp_check(), but inlined to feet other checks performed by ttyhook_register(). Reviewed by: ed	2009-06-08 13:34:45 +00:00
alc	919e3cbf28	Eliminate unnecessary obfuscation when testing a page's valid bits.	2009-06-07 19:38:26 +00:00
alc	f482e4a9e0	Eliminate an unused variable from allocbuf(). Eliminate the unnecessary setting of page valid bits from a non-VMIO buffer in vm_hold_load_pages().	2009-06-07 18:19:04 +00:00
alc	569ccdf52b	If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check the page's valid bits. The page is guaranteed to be fully valid. (For the record, this is documented in vm/vm_pager.h's comments.)	2009-06-06 20:13:14 +00:00
des	8b0be2f4d3	Eliminate trailing_slash, which was made redundant in r193028. Remove a couple of 4-year-old "temporary" KASSERTs. Improve comments. MFC after: 1 week	2009-06-06 00:49:49 +00:00
marcus	3a1cf7a450	Unlock the cache lock before returning when we run out of buffer space trying to fill in the full path name. Reported by: David Naylor <naylor.b.david@gmail.com> Approved by: kib	2009-06-05 16:44:42 +00:00
ed	e2c9e0cf9e	Remove clists from the kernel. Clists were originally used by the TTY layer as a text buffer interface. The advantage of clists were that it would allocate a small set of additional buffers that could be shared between TTYs when needed. In the modern days we can just allocate some more KBs of memory to keep the TTYs satisfied. The global cfreelist also requires synchronisation, which may not be useful when trying to improve scalability. The MPSAFE TTY layer uses its own text buffers (ttyinq and ttyoutq). We had a small amount of drivers in the tree that still uses clists, like the old USB stack and some keyboard drivers. With the old USB stack gone and the keyboard drivers changed to use a circular buffer, we can safely remove clists from the kernel.	2009-06-05 15:31:38 +00:00
rwatson	f4934662e5	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
rwatson	5ed24cd3e4	Add mac_framework.h include missed when MAC code was (presumably) copied from another file.	2009-06-05 14:23:24 +00:00
brian	3f1308d2b5	If we're passed garbage in malloc_init(), panic() rather than expecting a KASSERT to handle it. People are likely to turn off INVARIANTS RSN and loading an old module can cause garbage-in here. I saw the issue with an older nvidia driver (x11/nvidia-driver) loading into a new kernel - a crash wasn't seen 'till sysctl_kern_malloc_stats(). I was lucky that mtp->ks_shortdesc was NULL and not something horrible. While I'm here, KASSERT that malloc_uninit() isn't passed something that's not in kmemstatistics. MFC after: 3 weeks	2009-06-05 09:16:52 +00:00
ps	46a77b2b36	When checking for shared writes, use the struct mount returned from vn_start_write. Reviewed by: jhb	2009-06-04 16:50:03 +00:00
ps	4505aa56ed	Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS. Reviewed by: jhb, kmacy, jeffr	2009-06-04 16:18:07 +00:00
rwatson	0f9e858440	Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies. Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies. Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls. Obtained from: TrustedBSD Project	2009-06-02 18:26:17 +00:00
rwatson	0010c8697b	Remove unneeded include. MFC after: 3 days	2009-06-02 15:59:46 +00:00
attilio	44c490ae17	Handle lock recursion differenty by always checking against LO_RECURSABLE instead the lock own flag itself. Tested by: pho	2009-06-02 13:03:35 +00:00
alc	4a00409486	Correct a boundary case error in the management of a page's dirty bits by shm_dotruncate() and vnode_pager_setsize(). Specifically, if the length of a shared memory object or a file is truncated such that the length modulo the page size is between 1 and 511, then all of the page's dirty bits were cleared. Now, a dirty bit is cleared only if the corresponding block is truncated in its entirety.	2009-06-02 08:02:27 +00:00
jeff	7bd92180e7	- Use an acquire barrier to increment f_count in fget_unlocked and remove the volatile cast. Describe the reason in detail in a comment. Discussed with: bde, jhb	2009-06-02 06:55:32 +00:00
jhb	fea04a3fd1	Add an extension to the character device interface that allows character device drivers to use arbitrary VM objects to satisfy individual mmap() requests. - A new d_mmap_single(cdev, &foff, objsize, &object, prot) callback is added to cdevsw. This function is called for each mmap() request. If it returns ENODEV, then the mmap() request will fall back to using the device's device pager object and d_mmap(). Otherwise, the method can return a VM object to satisfy this entire mmap() request via object. It can also modify the starting offset into this object via foff. This allows device drivers to use the file offset as a cookie to identify specific VM objects. - vm_mmap_vnode() has been changed to call vm_mmap_cdev() directly when mapping V_CHR vnodes. This avoids duplicating all the cdev mmap handling code and simplifies some of vm_mmap_vnode(). - D_VERSION has been bumped to D_VERSION_02. Older device drivers using D_VERSION_01 are still supported. MFC after: 1 month	2009-06-01 21:32:52 +00:00
jhb	a1af9ecca4	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem	2009-06-01 21:17:03 +00:00
jhb	e45af7ed87	Add a simple API to manage scatter/gather lists of phyiscal addresses. Each list describes a logical memory object that is backed by one or more physical address ranges. To minimize locking, the sglist objects themselves are immutable once they are shared. These objects may be used in the future to facilitate I/O requests using physically-addressed buffers. For the immediate future I plan to use them to implement a new type of VM object and pager. Reviewed by: jeff, scottl MFC after: 1 month	2009-06-01 20:35:39 +00:00
rwatson	d9b4d146e8	Add a flags field to struct ucred, and export that via kinfo_proc, consuming one of its spare fields. The cr_flags field is currently unused, but will be used for features, including capability mode and pay-as-you-go audit. Discussed with: jhb, sson	2009-06-01 20:26:51 +00:00
rwatson	b850b85534	Regenerate generated syscall files following changes to struct sysent in r193234.	2009-06-01 16:14:38 +00:00
rwatson	16864a2b80	Add 'sy_flags', a currently unused per-syscall entry flags field that will see future use in 9-CURRENT and 8-STABLE for features such as the capability-mode enable flag and pay-as-you-audit. Discussed with: jhb, sson	2009-06-01 16:13:06 +00:00
rwatson	2bab695560	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
alc	41ce4c9579	Eliminate a comment describing code that was deleted over eight years ago. Move another comment to its proper place. Fix a typo in a third comment.	2009-06-01 06:12:08 +00:00
rodrigc	2380a39feb	sys/boot/common.c ================= Extend the loader to parse the root file system mount options in /etc/fstab, and set a new loader variable vfs.root.mountfrom.options with these options. The root mount options must be a comma-delimited string, as specified in /etc/fstab. Only set the vfs.root.mountfrom.options variable if it has not been set in the environment. sys/kern/vfs_mount.c ==================== When mounting the root file system, pass the mount options specified in vfs.root.mountfrom.options, but filter out "rw" and "noro", since the initial mount of the root file system must be done as "ro". While we are here, try to add a few hints to the mountroot prompt to give users and idea what might of gone wrong during mounting of the root file system. Reviewed by: jhb (an earlier patch)	2009-06-01 01:02:30 +00:00
alc	5e539a33da	nfs_write() can use the recently introduced vfs_bio_set_valid() instead of vfs_bio_set_validclean(), thereby avoiding the page queues lock. Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.	2009-05-31 20:18:02 +00:00
kib	baf33bb569	Unbreak the build. Add missed probes. Reviewed by: rwatson Pointy hat to: me	2009-05-31 20:16:06 +00:00
kib	443228ae3b	Eliminate code duplication in vn_fullpath1() around the cache lookups and calls to vn_vptocnp() by moving more of the common code to vn_vptocnp(). Rename vn_vptocnp() to vn_vptocnp_locked() to signify that cache is locked around the call. Do not track buffer position by both the pointer and offset, use only buflen to record the start of the free space. Export vn_vptocnp() for external consumers as a wrapper around vn_vptocnp_locked() that locks the cache and handles hold counts. Tested by: pho	2009-05-31 14:57:43 +00:00
dchagin	ceed8fedd1	Split native socketpair() syscall onto kern_socketpair() which should be used by kernel consumers and socketpair() itself. Approved by: kib (mentor) MFC after: 1 month	2009-05-31 12:12:38 +00:00
zec	861b77b017	Introduce an interm userland-kernel API for creating vnets and assigning ifnets from one vnet to another. Deletion of vnets is not yet supported. The interface is implemented as an ioctl extension so that no syscalls had to be introduced. This should be acceptable given that the new interface will be used for a short / interim period only, until the new jail management framwork gains the capability of managing vnets. This method for managing vimages / vnets has been in use for the past 7 years without any observable issues. The userland tool to be used in conjunction with the interim API can be found in p4: //depot/projects/vimage-commit2/src/usr.sbin/vimage/... and will most probably never get commited to svn. While here, bump copyright notices in kern_vimage.c and vimage.h to cover work done in year 2009. Approved by: julian (mentor) Discussed with: bz, rwatson	2009-05-31 12:10:04 +00:00
nwhitehorn	71b3aa6738	Provide a new CPU device driver ivar to report the nominal speed of the CPU, if available. This is meant to solve the issue of cpufreq misreporting speeds on CPUs that boot in a reduced power mode and have only relative speed control.	2009-05-31 08:59:15 +00:00
attilio	e6a06610ff	Remove the now invalid (and possibly unused) debug.mpsafevfs sysctl/tunable. Reviewed by: emaste Sponsored by: Sandvine Incorporated	2009-05-30 23:52:23 +00:00
trasz	0c63bcbfa4	Add VOP_ACCESSX, which can be used to query for newly added V* permissions, such as VWRITE_ACL. For a filsystems that don't implement it, there is a default implementation, which works as a wrapper around VOP_ACCESS. Reviewed by: rwatson@	2009-05-30 13:59:05 +00:00
jamie	572db1408a	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
alc	40062c9280	Modify vm_hold_load_pages() to allocate pages using VM_ALLOC_NOOBJ rather than using the kernel object. This allows the elimination of page queues locking from vm_hold_free_pages().	2009-05-29 18:35:51 +00:00
rwatson	24aebfac43	Minor style tweak.	2009-05-29 14:25:51 +00:00
rwatson	baa6dded68	Since sched_pin() and sched_unpin() are already inlined, don't manually inline in rmlocks.	2009-05-29 14:20:10 +00:00
jhb	9f31bca435	Remove extra cpu_spinwait() invocations. This should really only be used in tight spin loops, not in these edge cases where we restart a much larger loop only a few times. Reviewed by: attilio	2009-05-29 14:03:34 +00:00
jhb	cdcee493a2	Tweak a few comments on adaptive spinning.	2009-05-29 13:56:34 +00:00
rwatson	52ba259960	Make the rmlock(9) interface a bit more like the rwlock(9) interface: - Add rm_init_flags() and accept extended options only for that variation. - Add a flags space specifically for rm_init_flags(), rather than borrowing the lock_init() flag space. - Define flag RM_RECURSE to use instead of LO_RECURSABLE. - Define flag RM_NOWITNESS to allow an rmlock to be exempt from WITNESS checking; this wasn't possible previously as rm_init() always passed LO_WITNESS when initializing an rmlock's struct lock. - Add RM_SYSINIT_FLAGS(). - Rename embedded mutex in rmlocks to make it more obvious what it is. - Update consumers. - Update man page.	2009-05-29 10:52:37 +00:00
des	ae136aabf0	Let vfs_lookup() return ENOTDIR if the path has a trailing slash and the last component is a symlink to something that isn't a directory. We introduce a new namei flag, TRAILINGSLASH, which is set by lookup() if the last component is followed by a slash. The trailing slash is then stripped, as before. If the final component is a symlink, lookup() will return to namei(), which will expand the symlink and call lookup() with the new path. When all symlinks have been resolved, lookup() checks if the TRAILINGSLASH flag is set, and if it is, and the vnode it ended up with is not a directory, it returns ENOTDIR. PR: kern/21768 Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru> MFC after: 3 weeks	2009-05-29 10:02:44 +00:00
des	0df4f0c59f	Fix misleading comment. MFC after: 1 week	2009-05-29 09:52:13 +00:00
attilio	4ae494ac2c	The patch for r193011 was partially rejected when applied, complete it.	2009-05-29 08:01:48 +00:00
ed	8d73adc757	Last minute TTY API change: remove mutex argument from tty_alloc(). I don't want people to override the mutex when allocating a TTY. It has to be there, to keep drivers like syscons happy. So I'm creating a tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex() should eventually be removed. The advantage of this approach, is that we can just remove a function, without breaking the regular API in the future.	2009-05-29 06:41:23 +00:00
attilio	e05714ba70	Reverse the logic for ADAPTIVE_SX option and enable it by default. Introduce for this operation the reverse NO_ADAPTIVE_SX option. The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed and the new flag, offering the reversed logic, SX_NOADAPTIVE is added. Additively implements adaptive spininning for sx held in shared mode. The spinning limit can be handled through sysctls in order to be tuned while the code doesn't reach the release, after which time they should be dropped probabilly. This change has made been necessary by recent benchmarks where it does improve concurrency of workloads in presence of high contention (ie. ZFS). KPI breakage is documented by __FreeBSD_version bumping, manpage and UPDATING updates. Requested by: jeff, kmacy Reviewed by: jeff Tested by: pho	2009-05-29 01:49:27 +00:00
zml	b5e46da5a4	fail(9) support: Add support for kernel fault injection using KFAIL_POINT_* macros and fail_point_* infrastructure. Add example fail point in vfs_bio.c to simulate VM buf pressure. Approved by: dfr (mentor)	2009-05-27 16:36:54 +00:00
jamie	a013e0afcb	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
sson	527dd57555	Add the ksyms(4) pseudo driver. The ksyms driver allows a process to get a quick snapshot of the kernel's symbol table including the symbols from any loaded modules (the symbols are all merged into one symbol table). Unlike like other implementations, this ksyms driver maps memory in the process memory space to store the snapshot at the time /dev/ksyms is opened. It also checks to see if the process has already a snapshot open and won't allow it to open /dev/ksyms it again until it closes first. This prevents kernel and process memory from being exhausted. Note that /dev/ksyms is used by the lockstat(1) command. Reviewed by: gallatin kib (freebsd-arch) Approved by: gnn (mentor)	2009-05-26 21:39:09 +00:00
sson	c0d5996eb6	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)	2009-05-26 20:28:22 +00:00
ed	a3ca9cc9a8	Get rid of M_TEMP.	2009-05-26 18:33:36 +00:00
pjd	d86eeb0dbe	Add missing socket options.	2009-05-26 09:19:21 +00:00
kib	862b0fc4e3	The advisory lock may be activated or activated and removed during the sleep waiting for conditions when the lock may be granted. To prevent lf_setlock() from accessing possibly freed memory, add reference counting to the struct lockf_entry. Bump refcount around the sleep. Make lf_free_lock() return non-zero when structure was freed, and use this after the sleep to return EINTR to the caller. The error code might need a clarification, but we cannot return success to usermode, since the lock is not owned anymore. Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:39:38 +00:00
kib	c54d127bf0	In lf_purgelocks(), assert that state->ls_pending is empty after we weeded out threads, and clean ls_active instead of ls_pending. Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:37:55 +00:00
kib	fec8f2f845	In lf_advlockasync(), recheck for doomed vnode after the state->ls_lock is acquired. In the lf_purgelocks(), assert that vnode is doomed and set *statep to NULL before clearing ls_pending list. Otherwise, we allow for the thread executing lf_advlockasync() to put new pending entry after state->ls_lock is dropped in lf_purgelocks(). Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:33:16 +00:00
ed	cb56810df2	Block when initially opening a TTY multiple times. In the original MPSAFE TTY code, I changed the behaviour by returning EBUSY. I thought this made more sense, because it's basically a race to see who gets the TTY first. It turns out this is not a good change, because it also causes EBUSY to be returned when another process is closing the TTY. This can happen during startup, when /etc/rc (or one of its children) is still busy draining its data and /sbin/init is attempting to open the TTY to spawn a getty. Reported by: bz Tested by: bz	2009-05-24 12:32:03 +00:00
kib	56e5e587c8	Replace the while statement with the if for clarity. The loop body cannot be executed more then once. Reviewed by: dfr Tested by: pho MFC after: 1 month	2009-05-24 12:28:38 +00:00
zec	48f748dc29	V_irtualize the if_clone framework, thus allowing for clonable ifnets to optionally have overlapping unit numbers if attached in different vnets. At this stage if_loop is the only clonable ifnet class that has been extended to allow for such overlapping allocation of unit numbers, i.e. in each vnet it is possible to have a lo0 interface. Other clonable ifnet classes remain to operate with traditional semantics, i.e. each instance of a clonable ifnet will be assigned a globally unique unit number, regardless in which vnet such an ifnet becomes instantiated. While here, garbage collect unused _lo_list field in struct vnet_net, as well as improve indentation for #defines in sys/net/vnet.h. The layout of struct vnet_net has changed, therefore bump __FreeBSD_version. This change has no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz, brooks Approved by: julian (mentor)	2009-05-23 21:43:44 +00:00
jamie	e446b3e48f	Delay an error message until the variable it uses gets initialized. Found with: Coverity Prevent(tm) CID: 4316 Reported by: trasz Approved by: bz (mentor)	2009-05-23 16:13:26 +00:00
zec	363a644ce6	Introduce the if_vmove() function, which will be used in the future for reassigning ifnets from one vnet to another. if_vmove() works by calling a restricted subset of actions normally executed by if_detach() on an ifnet in the current vnet, and then switches to the target vnet and executes an appropriate subset of if_attach() actions there. if_attach() and if_detach() have become wrapper functions around if_attach_internal() and if_detach_internal(), where the later variants have an additional argument, a flag indicating whether a full attach or detach sequence is to be executed, or only a restricted subset suitable for moving an ifnet from one vnet to another. Hence, if_vmove() will not call if_detach() and if_attach() directly, but will call the if_detach_internal() and if_attach_internal() variants instead, with the vmove flag set. While here, staticize ifnet_setbyindex() since it is not referenced from outside of sys/net/if.c. Also rename ifccnt field in struct vimage to ifcnt, and do some minor whitespace garbage collection where appropriate. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz, rwatson, brooks? Approved by: julian (mentor)	2009-05-22 22:09:00 +00:00
trasz	fb57d2691e	Make 'struct acl' larger, as required to support NFSv4 ACLs. Provide compatibility interfaces in both kernel and libc. Reviewed by: rwatson	2009-05-22 15:56:43 +00:00
ed	e309996070	Enable secure TTY input buffer flushing by default. I'm leaving the sysctl there. If people really notice a slowdown, they can revert to the old behaviour. Discussed with: kib	2009-05-21 16:48:06 +00:00
ed	77fae8a219	Add a new sysctl: kern.tty_inq_flush_secure. When enabled all TTY input queue buffers are zeroed when flushing or closing the TTY. Because TTY input queues are also used to store filled in passwords, this may be an interesting switch to enable for security minded people.	2009-05-21 16:19:54 +00:00
jhb	35ba8bce41	Only use the ABI compat shim for vfs.bufspace if the old buffer is smaller than a long. PR: amd64/134786 Submitted by: Emil Mikulic emikulic\| gmail MFC after: 3 days	2009-05-21 16:18:45 +00:00
attilio	68353e273f	Move the M_WAITOK flag in notify() into an M_NOWAIT one in order to match the behaviour alredy present with the further malloc() call in devctl_notify(). This fixes a bug in the CAM layer where the camisr handler finished to call camperiphfree() (and subsequently destroy_dev() resulting in a new dev notify) while the xpt lock is held. PR: kern/130330 Tested by: Riccardo Torrini <riccardo dot torrini at esaote dot com>	2009-05-21 13:22:07 +00:00
jhb	ebdd571432	Set the umask in a new file descriptor table earlier in fdcopy() to remove two lock operations.	2009-05-20 18:42:04 +00:00
jhb	53d01dc702	Remove an obsolete assertion. We always wake up all waiters when unlocking a mutex and never set the lock cookie == MTX_CONTESTED.	2009-05-20 18:29:14 +00:00
jhb	033485e00c	Fix a typo.	2009-05-20 17:19:30 +00:00
imp	d339e0404f	We no longer need to use d_thread_t for portability here, switch to struct thread *.	2009-05-20 16:58:16 +00:00
kmacy	879984a728	Add minimal ZFS lock hierarchy	2009-05-20 02:51:48 +00:00
rwatson	9c1bc41813	With SMPng, DEVICE_POLLING uses its own idle threads, rather than the system idle loop, to run ether_poll(), so make ether_poll() static. MFC after: 1 week	2009-05-19 19:21:25 +00:00
avg	89d59b82b3	sysctl_rman: report shared resources to devinfo shared uses of a resource are recorded on a sub-list hanging off a main resource object on a main resource list; without this change a shared resource (e.g. irq) is reported only once by devinfo -r/-u; with this change the resource is reported for each driver that allocates it (which is even more than what vmstat -i -a reports). Approved by: jhb (mentor)	2009-05-19 14:08:21 +00:00
rwatson	d9e163e093	Binding interrupts to a CPU consists of two parts: setting up CPU affinity for the interrupt thread, and requesting that underlying hardware direct interrupts to the CPU. For software interrupt threads, implement a no-op interrupt event binder that returns success, so that the interrupt management code will just set the ithread's affinity and succeed. Reviewed by: jhb MFC after: 1 week	2009-05-18 14:02:55 +00:00
ed	a6c06ba89c	Mark the clock sysctls as MPSAFE. These sysctls don't need any form of locking. At least cp_times is used by powerd very often, which means I get 50% less calls to non-MPSAFE sysctls on my system. The other 50% is consumed by dev.cpu.0.freq, but this seems to need Giant for Newbus.	2009-05-18 12:03:43 +00:00
alc	c8b00d493e	Several changes to vfs_bio_clrbuf(): Provide a more descriptive comment. Eliminate dead code. The page cannot possibly have PG_ZERO set. Eliminate unnecessary blank lines. Reviewed by: tegge	2009-05-17 23:25:53 +00:00
alc	dc942dabcf	Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg(). In collaboration with: tegge	2009-05-17 20:26:00 +00:00
ed	44192767f8	Print an extra newline when not at the first column already. This makes siginfo output look a lot better when pressing it the first time when in sh(1), for example: $ load: 0.00 cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k load: 0.00 cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k will now become: $ load: 0.00 cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k load: 0.00 cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k	2009-05-17 16:17:48 +00:00
ed	69acbef6ca	Several cleanups to tty_info(), better known as Ctrl-T. - Only pick up PROC_LOCK once, which means we can drop the PGRP_LOCK right after picking up PROC_LOCK for the first time. - Print the process real time, making it consistent with tools like time(1). - Use `p' and `td' to reference the process/thread we are going to print. Only use pick-variables inside the loops. We already did this for the threads, but not the processes.	2009-05-17 12:30:25 +00:00
des	9919b017d4	Remove do-nothing code that was required to dirty the old buffer on Alpha. Coverity ID: 838 Approved by: jhb, alc	2009-05-15 21:34:58 +00:00
kib	b8162aa0c9	Revert r192094. The revision caused problems for sysctl(3) consumers that expect that oldlen is filled with required buffer length even when supplied buffer is too short and returned error is ENOMEM. Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when remaining buffer space is too short for the current kinfo_file structure. Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition to keep existing interface for the sysctl, though. Reported by: ed, Florian Smeets <flo kasimir com> Tested by: pho	2009-05-15 14:41:44 +00:00
jhb	cbf4ebe5a3	- Use a separate sx lock to try to limit the number of concurrent userland sysctl requests to avoid wiring too much user memory. Only grab this lock if the user's old buffer is larger than a page as a tradeoff to allow more concurrency for common small requests. - Just use a shared lock on the sysctl tree for user sysctl requests now. MFC after: 1 week	2009-05-14 22:01:32 +00:00
kib	7361db2279	Do not advance req->oldidx when sysctl_old_user returning an error due to copyout failure or short buffer. The later breaks the usermode iterators of the sysctl results that pack arbitrary number of variable-sized structures. Iterator expects that kernel filled exactly oldlen bytes, and tries to interpret half-filled or garbage structure at the end of the buffer. In particular, kinfo_getfile(3) segfaulted. Reported and tested by: pho MFC after: 3 weeks	2009-05-14 10:54:57 +00:00
jeff	20397e6431	- Implement a lockless file descriptor lookup algorithm in fget_unlocked(). - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders. - Mark the file zone as NOFREE so we may attempt to reference potentially freed files. - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.	2009-05-14 03:24:22 +00:00
alc	82da6bfdea	Eliminate page queues locking from bufdone_finish() through the following changes: Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does. Suggested by: tegge Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held. Update two of the three callers to the old vfs_page_set_valid() to call vfs_page_set_validclean() instead because they actually require the page's dirty mask to be cleared. Introduce vm_page_set_valid(). Reviewed by: tegge	2009-05-13 05:39:39 +00:00
trasz	e6d7976851	Add missing 'break' statement. Found with: Coverity Prevent(tm) CID: 3919	2009-05-12 17:05:40 +00:00
kib	60c4168558	Prevent overflow of uio_resid. Noted by: jhb MFC after: 3 days	2009-05-11 19:58:03 +00:00
attilio	c639aa3d25	Fix a kernel compilation error, introduced after r191990, by defining thread with curthread in the AUDIT case. Reported by: dchagin	2009-05-11 16:32:58 +00:00
attilio	1dcb84131b	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.	2009-05-11 15:33:26 +00:00
alc	123b385c44	Revert CVS revision 1.94 (svn r16840). Current pmap implementations don't suffer from the race condition that motivated revision 1.94. Consequently, the work-around that was implemented by revision 1.94 is no longer needed. Moreover, reverting this work-around eliminates the need for vfs_busy_pages() to acquire the page queues lock when preparing a buffer for read. Reviewed by: tegge	2009-05-11 05:16:57 +00:00
imp	9a12159016	Spell NULL properly, use (void) rather than () for functions with no parameters. Mark two items as static that aren't used elsewhere...	2009-05-09 19:08:22 +00:00
imp	66ca9cb573	Retire kern.vm.kmem.size. It was marked as obsolete prior to 5.2, so it can go.	2009-05-09 19:00:47 +00:00
kan	7b57a857b7	Do not embed struct ucred into larger netcred parent structures. Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead. While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code. Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib	2009-05-09 18:09:17 +00:00
zec	b31e199a10	A NOP change: style / whitespace cleanup of the noise that slipped into r191816. Spotted by: bz Approved by: julian (mentor) (an earlier version of the diff)	2009-05-08 14:34:25 +00:00
zec	639797b2e6	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)	2009-05-08 14:11:06 +00:00
jamie	267ea54b44	Move the per-prison Linux MIB from a private one-off pointer to the new OSD-based jail extensions. This allows the Linux MIB to accessed via jail_set and jail_get, and serves as a demonstration of adding jail support to a module. Reviewed by: dchagin, kib Approved by: bz (mentor)	2009-05-07 18:36:47 +00:00
kib	78e147b4e4	Eliminate the loop and the call to pause(9) in vfs_vget_ino(). If vfs_busy(MBF_NOWAIT) failed, unlock the vnode and sleep in vfs_busy(). Suggested and reviewed by: jeff Tested by: pho MFC after: 3 weeks	2009-05-07 18:14:21 +00:00
ed	2f086e8725	If we have a regular rint handler, never go into rint_bypass mode. It turns out if we called cfmakeraw() on a TTY with only a rint handler in place, it could inject data into the TTY, even though it should be redirected. Always take a look at the hooks before looking at the termios flags.	2009-05-07 17:39:23 +00:00
zec	d78a1b1a82	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
jamie	8e4ffe653f	Add a constant PR_MAXMETHOD to better define the jail/OSD interface. Reviewed by: dchagin, kib Approved by: bz (mentor)	2009-05-05 05:49:08 +00:00
ed	fb2908c8ff	Remove unneeded check for SESS_LEADER(). We perform the same check ~10 lines above.	2009-05-04 11:11:10 +00:00
jamie	d462264a61	Don't call the OSD destructor if the data slot is NULL (since it's already not done on unused slots, which are indistinguishable to the caller). Approved by: bz (mentor)	2009-04-30 22:43:21 +00:00
zec	39b6dc8ba2	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
jeff	9ff631ca46	- Fix non-SMP build by encapsulating idle spin logic in a macro. Pointy hat to: me	2009-04-29 23:04:31 +00:00
jamie	8fbb51e637	Regen for new jail system calls in r191673. Approved by: bz (mentor)	2009-04-29 21:50:13 +00:00
jamie	453b86f943	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)	2009-04-29 21:14:15 +00:00
bms	32a71137f0	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00
jamie	51a4d1c4a3	Some non-functional changes: whitespace, KASSERT strings, declaration order. Approved by: bz (mentor)	2009-04-29 18:41:08 +00:00
jeff	fe5d856f47	- Fix the FBSDID line.	2009-04-29 03:26:30 +00:00
jeff	88a1cd92bb	- Remove the bogus idle thread state code. This may have a race in it and it only optimized out an ipi or mwait in very few cases. - Skip the adaptive idle code when running on SMT or HTT cores. This just wastes cpu time that could be used on a busy thread on the same core. - Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive. Re-use CG_FLAG_THREAD to mean SMT or HTT. Sponsored by: Nokia	2009-04-29 03:15:43 +00:00
bz	1507f5bd4d	Prevent a superuser inside a jail from modifying the dedicated root cpuset of that jail. Processes inside the jail will still be able to change child sets. A superuser outside of a jail will still be able to change the jail cpuset and thus limit the number of cpus available to the jail. Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman) PR: kern/134050 Reviewed by: jeff MFC after: 3 weeks X-MFC: backout r191596	2009-04-28 21:00:50 +00:00
rwatson	fb89496678	Improve approximation of style(9).	2009-04-26 21:16:03 +00:00
zec	71a3c7baca	Extend the vnet module registration / initialization framework first introduced @ r190909 with a vnet module deregistration service. kldunloadable modules, which are currently using vnet_mod_register() to attach their per-vnet initialization routines to the vnet initialization framework, should call vnet_mod_deregister() before acknowledging MOD_UNLOAD requests in their mod_event handlers. Such changes to the existing code base will follow in subsequent commits. vnet_mod_deregister() does not check whether departing vnet modules are registered as prerequisites for another module(s), so it should be used with care. Currently I'm only aware of vnet modules which are leafs on module dependency graphs that are kldunloadable. This change also introduces per-vnet module destructor handler, which calls vnet's module cleanup function, which (if required) has to be registered in vnet module's vnet_modinfo_t structure .vmi_idetach field. Once options VIMAGE becomes operational, the framework will take care that module's cleanup function become invoked for each active vnet instance, and that the memory allocated for each instance gets freed. Currently calls to destructor handlers must always succeed.	2009-04-26 07:09:39 +00:00
ed	b8d7cb1963	Turn MAXPTSDEVS into a sysctl tunable. This allows users to increase the maximum amount of pseudo-terminals without changing any source code. Users must increase UT_LINESIZE before attempting to increase kern.pts_maxdev.	2009-04-25 10:05:55 +00:00
bz	cb075fbf99	Correct a comment: the function name given had never existed in any (relevant) version of this file orany of my patches. MFC after: 1 month	2009-04-22 20:49:54 +00:00
emax	08708b99cb	Fix sbappendrecord_locked(). The main problem is that sbappendrecord_locked() relies on sbcompress() to set sb_mbtail. This will not happen if sbappendrecord_locked() is called with mbuf chain made of exactly one mbuf (i.e. m0->m_next == NULL). In this case sbcompress() will be called with m == NULL and will do nothing. I'm not entirely sure if m == NULL is a valid argument for sbcompress(), and, it rather pointless to call it like that, but keep calling it so it can do SBLASTMBUFCHK(). The problem is triggered by the SOCKBUF_DEBUG kernel option that enables SBLASTRECORDCHK() and SBLASTMBUFCHK() checks. PR: kern/126742 Investigated by: pluknet < pluknet -at- gmail -dot- com > No response from: freebsd-current@, freebsd-bluetooth@ MFC after: 3 days	2009-04-21 19:14:13 +00:00
kib	76ae16d421	Fix typo. Noted by: jhb MFC after: 2 weeks	2009-04-20 15:10:03 +00:00
kib	32441d8ae9	On the exit of the child process which parent either set SA_NOCLDWAIT or ignored SIGCHLD, unconditionally wake up the parent instead of doing this only when the child is a last child. This brings us in line with other U**xes that support SA_NOCLDWAIT. If the parent called waitpid(childpid), then exit of the child should wake up the parent immediately instead of forcing it to wait for all children to exit. Reported by: Alan Ferrency <alan pair com> Submitted by: Jilles Tjoelker <jilles stack nl> PR: 108390 MFC after: 2 weeks	2009-04-20 14:34:55 +00:00
rwatson	f15ded690a	Lock the interface address list while iterating a network interface's address list when searching for a link-layer address to use during uuid generation. MFC after: 2 weeks	2009-04-19 21:36:18 +00:00
rwatson	367054e0a3	struct malloc_type has had a 'magic' field statically initialized to M_MAGIC by MALLOC_DEFINE() for a long time; add assertions that malloc_type's passed to malloc(), free(), etc have that magic set. MFC after: 2 weeks	2009-04-19 12:41:37 +00:00
trasz	128b961224	When allocating 'struct acl' instances, use malloc(9) instead of uma(9). This struct will get much bigger soon, and we don't want to waste too much memory on UMA caches. Reviewed by: rwatson	2009-04-19 09:56:30 +00:00
trasz	858b10f6e2	Use acl_alloc() and acl_free() instead of using uma(9) directly. This will make switching to malloc(9) easier; also, it would be neccessary to add these routines if/when we implement variable-size ACLs.	2009-04-18 16:47:33 +00:00
kan	06ed41c4fa	Undo private changes that should never have been committed.	2009-04-17 18:34:11 +00:00
kan	febc8407e1	More fallout from negative dotdot caching. Negative entries should be removed from and reinserted to proper ncneg list. Reported by: pho Submitted by: kib	2009-04-17 18:11:11 +00:00
kib	c6f7194967	In flushbufqueues(), do not allocate sentinel buffer on the stack, struct buf is large. Use sleeping malloc(9) call, and zero the allocated buf as a debugging feature.	2009-04-16 09:37:48 +00:00
kib	0f76607a29	Export the number of times bufdaemon got help from the normal threads.	2009-04-16 09:33:52 +00:00
ed	244e1ecdbf	Remove dead code from devtoname(). In the good old days it was possible to have dev_t's that referred to nonexistent devices. In these cases devtoname() automatically generated names. This is no longer possible, so remove this dead code. Discussed with: kib	2009-04-15 20:43:12 +00:00
ed	46ae43e7c6	Remove unneeded variable and casting from newdev(). Remove the `udev' variable, which has a different type than the original function argument and si_drv0. The `udev' name is also misleading, because it is not the number returned by dev2udev(). Rename this argument to `unit'. It is the same number as returned by dev2unit().	2009-04-15 20:15:36 +00:00
ed	6df1ef4d2f	Don't use si_drv0 directly. We should still access si_drv0 using dev2unit(). Also change the KASSERT() to really print the udev instead of the unit number. I suspect it's still useful to print the unit number, especially for devices that use clone lists, so keep the unit number in the panic string.	2009-04-15 20:08:26 +00:00
jhb	a66490b889	Update comment above _fget() for earlier change to FWRITE failures return EBADF rather than EINVAL. Submitted by: Jaakko Heinonen jh saunalahti fi MFC after: 1 month	2009-04-15 19:10:37 +00:00
kan	9761044c2f	Redo previous change using simpler patch that happens to be also more correct. Submitted by: tor	2009-04-14 23:56:48 +00:00
kan	3d27e1410d	Fix yet another negative dotodot entry fallout. Reported by: pho	2009-04-14 23:46:57 +00:00
kmacy	de9c351c80	- use a shared lock for reads - remove stale comment Reviewed by: jeffr	2009-04-13 23:09:44 +00:00
davidxu	87940a2686	Make UMTX_OP_WAIT_UINT actually wait for an unsigned integer on 64-bits machine. MFC after: 1 week	2009-04-13 05:21:17 +00:00
kmacy	0cb01ac4ef	sendfile doesn't modify the vnode - acquire vnode lock shared Reviewed by: ups, jeffr	2009-04-12 05:19:35 +00:00
rwatson	90c1837110	Remove conditionally compiled time counter statistics; tools like DTrace, kernel profiling, etc, can provide this information without the overhead. MFC after: 3 days Suggested by: bde	2009-04-11 22:01:40 +00:00
kan	cae135c489	Fix v_cache_dd handling for negative entries. v_cache_dd pointer was not populated in parent directory if negative entry was being created, yet entry itself was added to the nc_neg list. It was possible for parent vnode to get discarded later, leaving negative entry pointing to now unused memory block. Reported by: dho Revewed by: kib	2009-04-11 20:23:08 +00:00
kib	9260481d50	When zapping v_cache_dd for !MAKEENTRY case in cache_lookup(), we shall lock cache as writer. Reviewed by: kan	2009-04-11 16:12:20 +00:00
zec	b39b54e6de	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
rwatson	fba90f2e03	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
kib	a5ecda6fd6	Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed. Check the condition and return ENOENT then. In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused by dvp reclaim. Reported and tested by: pho	2009-04-10 10:22:44 +00:00
thompsa	39714cb212	Revert r190676,190677 The geom and CAM changes for root_hold are the wrong solution for USB design quirks. Requested by: scottl	2009-04-10 04:08:34 +00:00
ed	df0bd942fb	Fix tty_wait_background() to comply with standards. It turns out my handling of SIGTTOU and SIGTTIN didn't entirely comply to the standards. It is true that in the SIGTTOU case we should not return EIO when the signal is ignored/blocked, but in the SIGTTIN case we must. See also: POSIX issue 7 section 11.1.4	2009-04-08 15:56:50 +00:00
rwatson	b2a611f93a	Nul-terminate strings in the VFS name cache, which negligibly change the size and cost of name cache entries, but make adding debugging and tracing easier. Add SDT DTrace probes for various namecache events: vfs:namecache:enter:done - new entry in the name cache, passed parent directory vnode pointer, name added to the cache, and child vnode pointer. vfs:namecache:enter_negative:done - new negative entry in the name cache, passed parent vnode pointer, name added to the cache. vfs:namecache:fullpath:enter - call to vn_fullpath1() is made, passed the vnode to resolve to a name. vfs:namecache:fullpath:hit - vn_fullpath1() successfully resolved a search for the parent of an object using the namecache, passed the discovered parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:fullpath:miss - vn_fullpath1() failed to resolve a search for the parent of an object using the namecache, passed the child vnode pointer. vfs:namecache:fullpath:return - vn_fullpath1() has completed, passed the error number, and if that is zero, the vnode to resolve, and the returned path. vfs:namecache:lookup:hit - postive name cache entry hit, passed the parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:lookup:hit_negative - negative name cache entry hit, passed the parent directory vnode pointer and name. vfs:namecache:lookup:miss - name cache miss, passed the parent directory pointer and the full remaining component name (not terminated after the cache miss component). vfs:namecache:purge:done - name cache purge for a vnode, passed the vnode pointer to purge. vfs:namecache:purge_negative:done - name cache purge of negative entries for children of a vnode, passed the vnode pointer to purge. vfs:namecache:purgevfs - name cache purge for a mountpoint, passed the mount pointer. Separate probes will also be invoked for each cache entry zapped. vfs:namecache:zap:done - name cache entry zapped, passed the parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:zap_negative:done - negative name cache entry zapped, passed the parent directory vnode pointer and name. For any probes involving an extant name cache entry (enter, hit, zapp), we use the nul-terminated string for the name component. For misses, the remainder of the path, including later components, is provided as an argument instead since there is no handy nul-terminated version of the string around. This is arguably a bug. MFC after: 1 month Sponsored by: Google, Inc. Reviewed by: jhb, kan, kib (earlier version)	2009-04-07 20:58:56 +00:00
rwatson	f775cac272	Add SDT DTrace probes for namei(): vfs:namei:lookup:entry takes parent directory vnode pointer, path to look up, and lookup flags. vfs:namei:lookup:return takes an error value, and if successful, the returned vnode pointer. MFC after: 1 month	2009-04-06 10:32:40 +00:00
dchagin	01bf63c9fb	Fix KBI breakage by r190520 which affects older linux.ko binaries: 1) Move the new field (brand_note) to the end of the Brandinfo structure. 2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer is valid. 3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old modules won't have the flag set, so the new field brand_note would be ignored. Suggested by: jhb Reviewed by: jhb Approved by: kib (mentor) MFC after: 6 days	2009-04-05 09:27:19 +00:00
kan	51a543ed33	Revert change 190655 temporarily. It breaks many setups where nullfs is used and needs to be revisited.	2009-04-04 17:48:38 +00:00
marcel	c9498bd9af	PowerPC, meet kernel core dumps. The support is based on a generic dumper that creates an ELF core file and uses PMAP functions to scan and iterate over memory chunks, as well as handle memory mappings used during dumping. the PMAP layer can choose to return physical memory chunks or virtual memory chunks. For minidumps, the chunks should be virtual. The default MMU I/F implementation for the scan_md() method returns NULL. Thus, when a PMAP implementation does not implement the required methods, an empty core file is created. Here, empty means having an ELF header only. Obtained from: Juniper Networks	2009-04-04 02:12:37 +00:00
thompsa	fe5458f665	Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called in situations where sleeping isnt allowed.	2009-04-03 19:46:12 +00:00

... 3 4 5 6 7 ...

11475 Commits