freebsd-dev

Author	SHA1	Message	Date
Peter Holm	401679debe	vn_open_cred() needs a non NULL ucred pointer Reviewed by: kib	2009-06-23 11:29:54 +00:00
Andre Oppermann	ef760e6ad2	Add soreceive_stream(), an optimized version of soreceive() for stream (TCP) sockets. It is functionally identical to generic soreceive() but has a number stream specific optimizations: o does only one sockbuf unlock/lock per receive independent of the length of data to be moved into the uio compared to soreceive() which unlocks/locks per mbuf. o uses m_mbuftouio() instead of its own copy(out) variant. o much more compact code flow as a large number of special cases is removed. o much improved reability. It offers significantly reduced CPU usage and lock contention when receiving fast TCP streams. Additional gains are obtained when the receiving application is using SO_RCVLOWAT to batch up some data before a read (and wakeup) is done. This function was written by "reverse engineering" and is not just a stripped down variant of soreceive(). It is not yet enabled by default on TCP sockets. Instead it is commented out in the protocol initialization in tcp_usrreq.c until more widespread testing has been done. Testers, especially with 10GigE gear, are welcome. MFP4: r164817 //depot/user/andre/soreceive_stream/	2009-06-22 23:08:05 +00:00
Andre Oppermann	bc05b2f6fa	Add m_mbuftouio() helper function to copy(out) an arbitrary long mbuf chain into an arbitrary large uio in a single step. It is a functional mirror image of m_uiotombuf(). This function is supposed to be used instead of hand rolled code with the same purpose and to concentrate it into one place for potential further optimization or hardware assistance.	2009-06-22 22:20:38 +00:00
Andre Oppermann	4ad1c464fe	In sbappendstream_locked() demote all incoming packet mbufs (and chains) to pure data mbufs using m_demote(). This removes the packet header and all m_tag information as they are not meaningful anymore on a stream socket where mbufs are linked through m->m_next. Strictly speaking a packet header can be only ever valid on the first mbuf in an m_next chain. sbcompress() was doing this already when the mbuf chain layout lent itself to it (e.g. header splitting or merge-append), just not consistently. This frees resources at socket buffer append time instead of at sbdrop_internal() time after data has been read from the socket. For MAC the per packet information has done its duty and during socket buffer appending the policy of the socket itself takes over. With the append the packet boundaries disappear naturally and with it any context that was based on it. None of the residual information from mbuf headers in the socket buffer on stream sockets was looked at.	2009-06-22 21:46:40 +00:00
John Baldwin	e47a833f38	Regen.	2009-06-22 20:24:03 +00:00
John Baldwin	773fa740c9	Include definitions for the audit identifiers for compat system calls in sysproto.h. This makes it possible to use SYSCALL_MODULE() for compat system calls that live in kernel modules.	2009-06-22 20:14:10 +00:00
John Baldwin	0b0fe06a5e	Fix a typo in a comment.	2009-06-22 20:12:40 +00:00
Andre Oppermann	0cb6db1d21	Update m_demote: - remove HT_HEADER test (MT_HEADER == MT_DATA for some time now) - be more pedantic about m_nextpkt in other than first mbuf - update m_flags to be retained	2009-06-22 19:35:39 +00:00
Konstantin Belousov	c808c9632d	Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson	2009-06-21 19:21:01 +00:00
Konstantin Belousov	e0c161b89c	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson	2009-06-21 13:41:32 +00:00
Roman Divacky	f7490cfe01	In non-debugging mode make this define (void)0 instead of nothing. This helps to catch bugs like the below with clang. if (cond); <--- note the trailing ; something(); Approved by: ed (mentor) Discussed on: current@	2009-06-21 07:54:47 +00:00
Brooks Davis	7f92e57820	Change crsetgroups_locked() (called by crsetgroups()) to sort the supplemental groups using insertion sort. Use this property in groupmember() to let us use a binary search instead of the previous linear search.	2009-06-20 20:29:21 +00:00
Ed Schouten	f8f6146082	Improve nested jail awareness of devfs by handling credentials. Now that we start to use credentials on character devices more often (because of MPSAFE TTY), move the prison-checks that are in place in the TTY code into devfs. Instead of strictly comparing the prisons, use the more common prison_check() function to compare credentials. This means that pseudo-terminals are only visible in devfs by processes within the same jail and parent jails. Even though regular users in parent jails can now interact with pseudo-terminals from child jails, this seems to be the right approach. These processes are also capable of interacting with the jailed processes anyway, through signals for example. Reviewed by: kib, rwatson (older version)	2009-06-20 14:50:32 +00:00
Kip Macy	5b204a113c	define helper routines for deferred mbuf initialization	2009-06-19 21:14:39 +00:00
Brooks Davis	838d985825	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
John Baldwin	afd9f91c63	Fix a deadlock in the getpeername() method for UNIX domain sockets. Instead of locking the local unp followed by the remote unp, use the same locking model as accept() and read lock the global link lock followed by the remote unp while fetching the remote sockaddr. Reported by: Mel Flynn mel.flynn of mailing.thruhere.net Reviewed by: rwatson MFC after: 1 week	2009-06-18 20:56:22 +00:00
Alan Cox	c45c00344f	Utilize the new function kmem_alloc_contig() to implement the UMA back-end allocator for the jumbo frames zones. This change has two benefits: (1) a custom back-end deallocator is no longer required. UMA's standard deallocator suffices. (2) It eliminates a potentially confusing artifact of using contigmalloc(): The malloc(9) statistics contain bogus information about the usage of jumbo frames. Specifically, the malloc(9) statistics report all jumbo frames in use whereas the UMA zone statistics report the "truth" about the number in use vs. the number free.	2009-06-18 17:59:04 +00:00
John Baldwin	4b7b144f63	Regen.	2009-06-17 19:53:47 +00:00
John Baldwin	21def99b51	- Add the ability to mix multiple flags seperated by pipe ('\|') characters in the type field of system call tables. Specifically, one can now use the 'NO' types as flags in addition to the 'COMPAT' types. For example, to tag 'COMPAT' system calls as living in a KLD via NOSTD. The COMPAT type is required to be listed first in this case. - Add new functions 'type()' and 'flag()' to the embedded awk script in makesyscalls.sh that return true if a requested flag is found in the type field ($3). The flag() function checks all of the flags in the field, but type() only checks the first flag. type() is meant to be used in the top-level "switch" statement and flag() should be used otherwise. - Retire the CPT_NOA type, it is now replaced with "COMPAT\|NOARGS" using the flags approach. - Tweak the comment descriptions of COMPAT[46] system calls so that they say "freebsd[46] foo" rather than "old foo". - Document the COMPAT6 type. - Sync comments in compat32 syscall table with the master table.	2009-06-17 19:50:38 +00:00
John Baldwin	0ec0b41cf6	Remove the now-unused NOIMPL flag. It serves no useful purpose given the existing UNIMPL and NOSTD types.	2009-06-17 18:46:14 +00:00
John Baldwin	f425839118	- NOSTD results in lkmressys being used instead of lkmssys. - Mark nfsclnt as UNIMPL. It should have been NOSTD instead of NOIMPL back when it lived in nfsclient.ko, but it was removed from that a long time ago.	2009-06-17 18:44:15 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
Konstantin Belousov	f02c9d2858	Decrement state->ls_threads when vnode appeared to be doomed. Reported and tested by: pho	2009-06-17 12:43:04 +00:00
Attilio Rao	651175c9db	Introduce support for adaptive spinning in lockmgr. Actually, as it did receive few tuning, the support is disabled by default, but it can opt-in with the option ADAPTIVE_LOCKMGRS. Due to the nature of lockmgrs, adaptive spinning needs to be selectively enabled for any interested lockmgr. The support is bi-directional, or, in other ways, it will work in both cases if the lock is held in read or write way. In particular, the read path is passible of further tunning using the sysctls debug.lockmgr.retries and debug.lockmgr.loops . Ideally, such sysctls should be axed or compiled out before release. Addictionally note that adaptive spinning doesn't cope well with LK_SLEEPFAIL. The reason is that many (and probabilly all) consumers of LK_SLEEPFAIL are mainly interested in knowing if the interlock was dropped or not in order to reacquire it and re-test initial conditions. This directly interacts with adaptive spinning because lockmgr needs to drop the interlock while spinning in order to avoid a deadlock (further details in the comments inside the patch). Final note: finding someone willing to help on tuning this with relevant workloads would be either very important and appreciated. Tested by: jeff, pho Requested by: many	2009-06-17 01:55:42 +00:00
Konstantin Belousov	01ed174831	Do not use casts (int )0 and (struct thread )0 for the arguments of vn_rdwr, use NULL. Reviewed by: jhb MFC after: 1 week	2009-06-16 15:13:45 +00:00
Ed Schouten	eaaaf1906b	Perform some more cleanups to in-kernel session handling. The code that was in place in exit1() was mainly based on code from the old TTY layer. The main reason behind this, was because at one moment I ran a system that had two TTY layers in place at the same time. It is now sufficient to do the following: - Remove references from the session structure to the TTY vnode and the session leader. - If we have a controlling TTY and the session used by the TTY is equal to our session, send the SIGHUP. - If we have a vnode to the controlling TTY which has not been revoked, revoke it. While there, change sys/kern/tty.c to use s_ttyp in the comparison instead of s_ttyvp. It should not make any difference, because s_ttyvp can only become null when the session leader already left, but it's nicer to compare against the proper value.	2009-06-15 20:45:51 +00:00
John Baldwin	6653a58307	Regen.	2009-06-15 20:40:23 +00:00
John Baldwin	c4f16b69e1	Add a new 'void closefrom(int lowfd)' system call. When called, it closes any open file descriptors >= 'lowfd'. It is largely identical to the same function on other operating systems such as Solaris, DFly, NetBSD, and OpenBSD. One difference from other *BSD is that this closefrom() does not fail with any errors. In practice, while the manpages for NetBSD and OpenBSD claim that they return EINTR, they ignore internal errors from close() and never return EINTR. DFly does return EINTR, but for the common use case (closing fd's prior to execve()), the caller really wants all fd's closed and returning EINTR just forces callers to call closefrom() in a loop until it stops failing. Note that this implementation of closefrom(2) does not make any effort to resolve userland races with open(2) in other threads. As such, it is not multithread safe. Submitted by: rwatson (initial version) Reviewed by: rwatson MFC after: 2 weeks	2009-06-15 20:38:55 +00:00
Ed Schouten	9c373a81a3	Make tcsetsid(3) work on revoked TTYs. Right now the only way to make tcsetsid(3)/TIOCSCTTY work, is by ensuring the session leader is dead. This means that an application that catches SIGHUPs and performs a sleep prevents us from assigning a new session leader. Change the code to make it work on revoked TTYs as well. This allows us to change init(8) to make the shutdown script run in a more clean environment.	2009-06-15 19:17:52 +00:00
Jamie Gritton	9ed47d01eb	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
Jamie Gritton	679e13901c	Manage vnets via the jail system. If a jail is given the boolean parameter "vnet" when it is created, a new vnet instance will be created along with the jail. Networks interfaces can be moved between prisons with an ioctl similar to the one that moves them between vimages. For now vnets will co-exist under both jails and vimages, but soon struct vimage will be going away. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 18:59:29 +00:00
Jamie Gritton	c1f192193d	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)	2009-06-13 15:39:12 +00:00
Bjoern A. Zeeb	aed2a06174	Remove the static from int hardlink_check_uid. There is an external use in the opensolaris code. I am not sure how this ever worked but I have seen two reports of: link_elf: symbol hardlink_check_uid undefined lately. Reported by: Scott Ullrich (sullrich gmail.com), pfsense Reported by: Mister Olli (mister.olli googlemail.com)	2009-06-13 13:09:20 +00:00
Jamie Gritton	7455b100af	Add counterparts to getcredhostname: getcreddomainname, getcredhostuuid, getcredhostid Suggested by: rmacklem Approved by: bz	2009-06-13 00:12:02 +00:00
Ed Schouten	13ace80b16	Revert my previous change, because it reintroduces an old regression. Because our rc scripts also open the /etc/ttyv* nodes, it revokes the console, preventing startup messages from being displayed. I really have to think about this. Maybe we should just give the console its own TTY and let it build on top of other TTYs. I'm still not sure what to do with input handling there.	2009-06-12 21:21:17 +00:00
Ed Schouten	4650ad4cc0	Prevent yet another staircase effect bug in the console device. Even though I thought I fixed the staircase issue (and I was no longer able to reproduce it), I got some reports of the issue still being there. It turns out the staircase effect still occurred when /dev/console was kept open while killing the getty on the same TTY (ttyv0). For some reason I can't figure out how the old TTY code dealt with that, so I assume the issue has always been there. I only exposed it more by merging consolectl with ttyv0, which means that the issue was present, even on systems without a serial console. I'm now marking the console device as being closed when closing the regular TTY device node. This means that when the getty shuts down, init(8) will open /dev/console, which means the termios attributes will always be reset in this case.	2009-06-12 20:29:55 +00:00
Paul Saab	a03fb243dc	Stop asserting on exclusive locks in fsync since it can now support shared vnode locking on ZFS. Reviewed by: jhb	2009-06-11 17:06:45 +00:00
Andriy Gapon	e76f11f441	strict kobj signatures: linker_if fixes in symtab_get method symtab parameter is made constant as this reflects actual intention and usage of the method Reviewed by: imp, current@ Approved by: jhb (mentor)	2009-06-11 17:05:45 +00:00
Konstantin Belousov	d8b0556c6d	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
Konstantin Belousov	5dd6aaba88	Do not leak the state->ls_lock after VI_DOOMED check introduced in the r192683. Reported by: pho Submitted by: jhb	2009-06-10 16:17:38 +00:00
Bjoern A. Zeeb	c03528b663	SCTP needs either IPv4 or IPv6 as lower layer[1]. So properly hide the already #ifdef SCTP code with #if defined(INET) \|\| defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]	2009-06-10 14:36:59 +00:00
Konstantin Belousov	d6da640860	Fix r193923 by noting that type of a_fp is struct file *, not int. It was assumed that r193923 was trivial change that cannot be done wrong. MFC after: 2 weeks	2009-06-10 14:24:31 +00:00
Konstantin Belousov	e4d9bdc105	s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_args definition. Discussed with: bde MFC after: 2 weeks	2009-06-10 14:09:05 +00:00
Colin Percival	9a1bde1808	Prevent integer overflow in direct pipe write code from circumventing virtual-to-physical page lookups. [09:09] Add missing permissions check for SIOCSIFINFO_IN6 ioctl. [09:10] Fix buffer overflow in "autokey" negotiation in ntpd(8). [09:11] Approved by: so (cperciva) Approved by: re (not really, but SVN wants this...) Security: FreeBSD-SA-09:09.pipe Security: FreeBSD-SA-09:10.ipv6 Security: FreeBSD-SA-09:11.ntpd	2009-06-10 10:31:11 +00:00
Warner Losh	72aee35323	We can actually remove devclass_find_driver.	2009-06-10 01:02:38 +00:00
Warner Losh	98266dcf99	As discussed on arch@, restire devclass_{add,delete,find,quiesce}_driver. They aren't needed or used and complicate locking newbus.	2009-06-09 23:24:04 +00:00
Jamie Gritton	e92e0574f9	Fix some overflow errors: a signed allocation and an insufficiant array size. Reported by: pho Tested by: pho Approved by: bz (mentor)	2009-06-09 22:09:29 +00:00
Edward Tomasz Napierala	c0d345beb1	Add part of NFSv4 ACL kernel support code that is required for the upcoming libc changes to work. Not connected to the kernel build yet; for now, it will be compiled into libc. Reviewed by: rwatson	2009-06-09 19:51:22 +00:00
Alan Cox	cbe9534615	Eliminate an instance of VM_PROT_READ_IS_EXEC that I overlooked in r190705.	2009-06-09 17:18:41 +00:00
John Baldwin	4ef60d2686	Add support for multiple passes of the device tree during the boot-time probe. The current device order is unchanged. This commit just adds the infrastructure and ABI changes so that it is easier to merge later changes into 8.x. - Driver attachments now have an associated pass level. Attachments are not allowed to probe or attach to drivers until the system-wide pass level is >= the attachment's pass level. By default driver attachments use the "last" pass level (BUS_PASS_DEFAULT). Driver's that wish to probe during an earlier pass use EARLY_DRIVER_MODULE() instead of DRIVER_MODULE() which accepts the pass level as an additional parameter. - A new method BUS_NEW_PASS has been added to the bus interface. This method is invoked when the system-wide pass level is changed to kick off a rescan of the device tree so that drivers that have just been made "eligible" can probe and attach. - The bus_generic_new_pass() function provides a default implementation of BUS_NEW_PASS(). It first allows drivers that were just made eligible for this pass to identify new child devices. Then it propogates the rescan to child devices that already have an attached driver by invoking their BUS_NEW_PASS() method. It also reprobes devices without a driver. - BUS_PROBE_NOMATCH() is only invoked for devices that do not have an attached driver after being scanned during the final pass. - The bus_set_pass() function is used during boot to raise the pass level. Currently it is only called once during root_bus_configure() to raise the pass level to BUS_PASS_DEFAULT. This has the effect of probing all devices in a single pass identical to previous behavior. Reviewed by: imp Approved by: re (kib)	2009-06-09 14:26:23 +00:00

1 2 3 4 5 ...

11219 Commits