freebsd-skq

Author	SHA1	Message	Date
rwatson	0dd7c48b8f	For access(2) and eaccess(2), audit the requested access mode. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 22:47:45 +00:00
jeff	cd329071c1	- Use fd_lastfile + 1 as the upper bound on nd. This is more correct than using the size of the descriptor array. - A lock is not needed to fetch fd_lastfile. The results are stale the instant it is dropped. - Use a private mutex pool for select since the pool mutex is not used as a leaf. - Fetch the si_mtx pointer first before resorting to hashing to compute the mutex address. Reviewed by: McKusick Approved by: re (kib)	2009-07-01 20:43:46 +00:00
rwatson	cd692bb0b9	Audit file descriptor numbers for various socket-related system calls. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 19:55:11 +00:00
rwatson	4aa2654586	Define missing audit argument macro AUDIT_ARG_SOCKET(), and capture the domain, type, and protocol arguments to socket(2) and socketpair(2). Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 18:54:49 +00:00
jhb	76256698a1	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicing. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)	2009-07-01 17:20:07 +00:00
rwatson	9f2c78b3f9	When auditing unmount(2), capture FSID arguments as regular text strings rather than as paths, which would lead to them being treated as relative pathnames and hence confusingly converted into absolute pathnames. Capture flags to unmount(2) via an argument token. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 16:56:56 +00:00
rwatson	f90eaa96d0	Audit the file descriptor number passed to lseek(2). Approved by: re (kib) MFC after: 3 days	2009-07-01 15:37:23 +00:00
rwatson	0e50a12ccd	Fix link(2) auditing: use the second audit record path for the new object name. Approved by: re (kib) MFC after: 3 days	2009-07-01 13:22:08 +00:00
rwatson	dff449105c	udit the 'options' argument to wait4(2). Approved by: re (kib) MFC after: 3 days	2009-07-01 12:36:10 +00:00
alc	45ba0262c0	Remove a stale comment. The very same revision (r85511) that introduced this comment also implemented the proposed change to the code. Approved by: re (kib)	2009-06-30 19:39:17 +00:00
emaste	02a0b3a839	Add FIONSPACE from NetBSD. FIONSPACE is provided so that programs may easily determine how much space is left in the send queue; they do not need to know the send queue size. NetBSD revisions: sys_socket.c r1.41, 1.42 filio.h r1.9 Obtained from: NetBSD Approved by: re (kensmith)	2009-06-30 13:38:49 +00:00
kib	242d1f11af	Free struct ucreds allocated in vfs_hang_addrlist() when deleting the export element. While there, remove register storage-class specifiers. Reported and tested by: pho Reviewed by: kan Approved by: re (kensmith)	2009-06-29 18:09:07 +00:00
attilio	84c65084a0	Don't assume a default (currently 15) value for preloaded klds when loading hwpmc, but calculate at runtime and allocate the necessary space. Also the current logic is wrong as it can lead to an endless loop. Sponsored by: Sandvine Incorporated Reported by: Ryan Stone <rstone at sandvine dot com> Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Approved by: re (kib)	2009-06-29 16:03:18 +00:00
stas	c61e1d6988	- Turn the third (islocked) argument of the knote call into flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without KQ_LOCK mtx held. - Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required for ZFS as its getattr implementation may sleep. Approved by: re (rwatson) Reviewed by: kib MFC after: 2 weeks	2009-06-28 21:49:43 +00:00
ed	fda6c6ecf4	Add FIONWRITE support to TTYs. TTYs already supported TIOCOUTQ, but FIONWRITE seems to be a more generic name for this. Approved by: re (kib)	2009-06-28 12:02:15 +00:00
phk	ed08f853a1	There are a number of ways an application can check if there are inbound data waiting on a filedescriptor, such as a pipe or a socket, for instance by using select(2), poll(2), kqueue(2), ioctl(FIONREAD) etc. But we have no way of finding out if written data have yet to be disposed of, for instance, transmitted (and ack'ed!) to some remote host, or read by the applicantion at the far end of the pipe. The closest we get, is calling shutdown(2) on a TCP socket in non-blocking mode, but this has the undesirable sideeffect of preventing future communication. Add a complement to FIONREAD, called FIONWRITE, which returns the number of bytes not yet properly disposed of. Implement it for all sockets. Background: A HTTP server will want to time out connections, if no new request arrives within a certain period after the last transmitted response has actually been sent (and ack'ed). For a busy HTTP server, this timeout can be subsecond duration. In order to signal to a load-balancer that the connection is truly dead, TCP_RST will be the preferred method, as this avoids the need for a RTT delay for FIN handshaking, with a client which, surprisingly often, no longer at the remote IP number. If a slow, distant client is being served a response which is big enough to fill the window, but small enough to fit in the socket buffer, the write(2) call will return immediately. If the session timeout is armed at that time, all bytes in the response may not have been transmitted by the time it fires. FIONWRITE allows the timeout to check that no data is outstanding on the connection, before it TCP_RST's it. Input & Idea from: rwatson Approved by: re (kib)	2009-06-28 11:28:14 +00:00
alc	9456008567	Correct a long-standing performance bug in cluster_rbuild(). Specifically, in the case of a file system with a block size that is less than the page size, cluster_rbuild() looks at too many of the page's valid bits. Consequently, it may terminate prematurely, resulting in poor performance. Reported by: bde Reviewed by: tegge Approved by: re (kib)	2009-06-27 21:37:36 +00:00
rwatson	da78c9e4a2	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
alc	91cafd48b1	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
kib	4ce38c4283	In lf_iteratelocks_vnode, increment state->ls_threads around iterating of the vnode advisory lock list. This prevents deallocation of state while inside the loop. Reported and tested by: pho MFC after: 2 weeks	2009-06-25 18:54:56 +00:00
jhb	e059cd9143	Return errors from intr_event_bind() to the caller of intr_set_affinity(). Specifically, if a non-root user attempts to bind an interrupt the request will now report failure with EPERM rather than silently failing with a successful return code. MFC after: 1 week	2009-06-25 18:35:19 +00:00
jhb	fb327076f9	Use the correct cast for the arguments passed to freebsd_shmctl() in oshmctl(). Submitted by: kib	2009-06-25 17:11:27 +00:00
jhb	59a38c2e5e	Tweak the oshmctl() compile fix: convert the K&R definition to ANSI.	2009-06-25 13:36:57 +00:00
rwatson	769ee782fc	oshmctl() now requires a sysv_shm.c-local function prototype.	2009-06-25 07:16:10 +00:00
jeff	826fa583d1	- Use DPCPU for SCHED_STATS. This is somewhat awkward because the offset of the stat is not known until link time so we must emit a function to call SYSCTL_ADD_PROC rather than using SYSCTL_PROC directly. - Eliminate the atomic from SCHED_STAT_INC now that it's using per-cpu variables. Sched stats are always incremented while we're holding a spinlock so no further protection is required. Reviewed by: sam	2009-06-25 01:33:51 +00:00
jeff	a90de6966f	- Add a sysctl_dpcpu_long to support long typed pcpu stats. - Remove the #ifndef SMP case as the SMP code works on UP as well. Reviewed by: sam	2009-06-25 01:31:59 +00:00
jamie	3a4b2dc4d4	Wrap a PR_VNET inside "#ifdef VIMAGE" since that the only place it applies. bz wants the blame for this. Noticed by: rwatson Approved by: bz (mentor)	2009-06-24 22:06:56 +00:00
jhb	2908b25ed7	Regen.	2009-06-24 21:54:08 +00:00
jamie	e53e57277b	In case of prisons with their own network stack, permit additional privileges as well as not restricting the type of sockets a user can open. Note: the VIMAGE/vnet fetaure of of jails is still considered experimental and cannot guarantee that privileged users can be kept imprisoned if enabled. Reviewed by: rwatson Approved by: bz (mentor)	2009-06-24 21:39:50 +00:00
jhb	6f52fe78fb	Change the ABI of some of the structures used by the SYSV IPC API: - The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned short. - The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned short. - The mode member of struct ipc_perm is now mode_t instead of unsigned short (this is merely a style bug). - The rather dubious padding fields for ABI compat with SV/I386 have been removed from struct msqid_ds and struct semid_ds. - The shm_segsz member of struct shmid_ds is now a size_t instead of an int. This removes the need for the shm_bsegsz member in struct shmid_kernel and should allow for complete support of SYSV SHM regions >= 2GB. - The shm_nattch member of struct shmid_ds is now an int instead of a short. - The shm_internal member of struct shmid_ds is now gone. The internal VM object pointer for SHM regions has been moved into struct shmid_kernel. - The existing __semctl(), msgctl(), and shmctl() system call entries are now marked COMPAT7 and new versions of those system calls which support the new ABI are now present. - The new system calls are assigned to the FBSD-1.1 version in libc. The FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls. - A simplistic framework for tagging system calls with compatibility symbol versions has been added to libc. Version tags are added to system calls by adding an appropriate __sym_compat() entry to src/lib/libc/incldue/compat.h. [1] PR: kern/16195 kern/113218 bin/129855 Reviewed by: arch@, rwatson Discussed with: kan, kib [1]	2009-06-24 21:10:52 +00:00
jhb	0894d349bd	Deprecate the msgsys(), semsys(), and shmsys() system calls by moving them under COMPAT_FREEBSD[4567]. Starting with FreeBSD 5.0 the SYSV IPC API was implemented via direct system calls (e.g. msgctl(), msgget(), etc.) rather than indirecting through the var-args *sys() system calls. The shmsys() system call was already effectively deprecated for all but COMPAT_FREEBSD4 already as its implementation for the !COMPAT_FREEBSD4 case was to simply invoke nosys().	2009-06-24 20:01:13 +00:00
lulf	4208ef9967	- Similar to the previous commit, but for CURRENT: Fix a bug where a FIFO vnode use count was increased twice, but only decreased once.	2009-06-24 18:44:38 +00:00
lulf	6c60345b34	- Fix a bug where a FIFO vnode use count was increased twice, but only decreased once. MFC after: 1 week	2009-06-24 18:38:51 +00:00
jamie	5c8985f2c9	Fix a race in vi_if_move, where a vnet is used after the prison that referred to it has been released. Approved by: bz (mentor)	2009-06-24 15:29:36 +00:00
jhb	cceae54c51	Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.	2009-06-24 13:36:37 +00:00
jhb	d8d39adf3c	- Move syscall function argument structure types to be just above the relevenat system call function. - Whitespace fixes.	2009-06-24 13:35:38 +00:00
rwatson	df217187ce	Add stack_print_short() and stack_print_short_ddb() interfaces to stack(9), which generate a more compact rendition of a stack trace via the kernel's printf. MFC after: 1 week	2009-06-24 12:06:15 +00:00
jeff	5bc3a65e40	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas	2009-06-23 22:42:39 +00:00
jeff	92b4ecdc77	- Use cpuset_t and the CPU_ macros in place of cpumask_t so that ULE supports arbitrary numbers of cpus rather than being limited by cpumask_t to the number of bits in a long.	2009-06-23 22:12:37 +00:00
ed	4a4fe9e6fc	Improve my last commit: use a separate condvar to serialize. The advantage of using a separate condvar is that we can just use cv_signal(9) instead of cv_broadcast(9). It makes no sense to wake up multiple threads. It also makes the TTY code easier to understand. t_dcdwait sounds totally unrelated.	2009-06-23 21:43:02 +00:00
ed	e93642ed4f	Use dcdwait to block threads to serialize writes. I suspect the usage of bgwait causes a lot of spurious wakeups when threads are blocked in the background, because they will be woken up each time a write() call is performed. Also wakeup dcdwait when the TTY is abandoned.	2009-06-23 21:33:26 +00:00
kib	fa686c638e	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
jamie	eeafb36508	Add a limit for child jails via the "children.cur" and "children.max" parameters. This replaces the simple "allow.jails" permission. Approved by: bz (mentor)	2009-06-23 20:35:51 +00:00
bz	0808d0b1a6	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
jamie	4405625484	Remove unnecessary/redundant includes. Approved by: bz (mentor)	2009-06-23 14:39:21 +00:00
pho	6945f47d28	vn_open_cred() needs a non NULL ucred pointer Reviewed by: kib	2009-06-23 11:29:54 +00:00
andre	e66ed06df4	Add soreceive_stream(), an optimized version of soreceive() for stream (TCP) sockets. It is functionally identical to generic soreceive() but has a number stream specific optimizations: o does only one sockbuf unlock/lock per receive independent of the length of data to be moved into the uio compared to soreceive() which unlocks/locks per mbuf. o uses m_mbuftouio() instead of its own copy(out) variant. o much more compact code flow as a large number of special cases is removed. o much improved reability. It offers significantly reduced CPU usage and lock contention when receiving fast TCP streams. Additional gains are obtained when the receiving application is using SO_RCVLOWAT to batch up some data before a read (and wakeup) is done. This function was written by "reverse engineering" and is not just a stripped down variant of soreceive(). It is not yet enabled by default on TCP sockets. Instead it is commented out in the protocol initialization in tcp_usrreq.c until more widespread testing has been done. Testers, especially with 10GigE gear, are welcome. MFP4: r164817 //depot/user/andre/soreceive_stream/	2009-06-22 23:08:05 +00:00
andre	74f8982f2f	Add m_mbuftouio() helper function to copy(out) an arbitrary long mbuf chain into an arbitrary large uio in a single step. It is a functional mirror image of m_uiotombuf(). This function is supposed to be used instead of hand rolled code with the same purpose and to concentrate it into one place for potential further optimization or hardware assistance.	2009-06-22 22:20:38 +00:00
andre	00948f56d3	In sbappendstream_locked() demote all incoming packet mbufs (and chains) to pure data mbufs using m_demote(). This removes the packet header and all m_tag information as they are not meaningful anymore on a stream socket where mbufs are linked through m->m_next. Strictly speaking a packet header can be only ever valid on the first mbuf in an m_next chain. sbcompress() was doing this already when the mbuf chain layout lent itself to it (e.g. header splitting or merge-append), just not consistently. This frees resources at socket buffer append time instead of at sbdrop_internal() time after data has been read from the socket. For MAC the per packet information has done its duty and during socket buffer appending the policy of the socket itself takes over. With the append the packet boundaries disappear naturally and with it any context that was based on it. None of the residual information from mbuf headers in the socket buffer on stream sockets was looked at.	2009-06-22 21:46:40 +00:00
jhb	e206daf142	Regen.	2009-06-22 20:24:03 +00:00

1 2 3 4 5 ...

11272 Commits