freebsd-nq

Author	SHA1	Message	Date
Matthew Dillon	b1e4abd246	Give struct socket structures a ref counting interface similar to vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.	2001-11-17 03:07:11 +00:00
Robert Watson	ce17880650	o Replace reference to 'struct proc' with 'struct thread' in 'struct sysctl_req', which describes in-progress sysctl requests. This permits sysctl handlers to have access to the current thread, permitting work on implementing td->td_ucred, migration of suser() to using struct thread to derive the appropriate ucred, and allowing struct thread to be passed down to other code, such as network code where td is not currently available (and curproc is used). o Note: netncp and netsmb are not updated to reflect this change, as they are not currently KSE-adapted. Reviewed by: julian Obtained from: TrustedBSD Project	2001-11-08 02:13:18 +00:00
David Malone	12396bdca7	When scanning for control messages, don't process the data mbufs. This could cause hangs if a unix domain socket was closed with data still to be read from it. Tested by: Andrea Campi <andrea@webcom.it>	2001-10-29 20:04:03 +00:00
Robert Watson	8a7d8cc675	- Combine kern.ps_showallprocs and kern.ipc.showallsockets into a single kern.security.seeotheruids_permitted, describes as: "Unprivileged processes may see subjects/objects with different real uid" NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is an API change. kern.ipc.showallsockets does not. - Check kern.security.seeotheruids_permitted in cr_cansee(). - Replace visibility calls to socheckuid() with cr_cansee() (retain the change to socheckuid() in ipfw, where it is used for rule-matching). - Remove prison_unpcb() and make use of cr_cansee() against the UNIX domain socket credential instead of comparing root vnodes for the UDS and the process. This allows multiple jails to share the same chroot() and not see each others UNIX domain sockets. - Remove unused socheckproc(). Now that cr_cansee() is used universally for socket visibility, a variety of policies are more consistently enforced, including uid-based restrictions and jail-based restrictions. This also better-supports the introduction of additional MAC models. Reviewed by: ps, billf Obtained from: TrustedBSD Project	2001-10-09 21:40:30 +00:00
Paul Saab	4787fd37af	Only allow users to see their own socket connections if kern.ipc.showallsockets is set to 0. Submitted by: billf (with modifications by me) Inspired by: Dave McKay (aka pm aka Packet Magnet) Reviewed by: peter MFC after: 2 weeks	2001-10-05 07:06:32 +00:00
David Malone	2bc21ed985	Hopefully improve control message passing over Unix domain sockets. 1) Allow the sending of more than one control message at a time over a unix domain socket. This should cover the PR 29499. 2) This requires that unp_{ex,in}ternalize and unp_scan understand mbufs with more than one control message at a time. 3) Internalize and externalize used to work on the mbuf in-place. This made life quite complicated and the code for sizeof(int) < sizeof(file ) could end up doing the wrong thing. The patch always create a new mbuf/cluster now. This resulted in the change of the prototype for the domain externalise function. 4) You can now send SCM_TIMESTAMP messages. 5) Always use CMSG_DATA(cm) to determine the start where the data in unp_{ex,in}ternalize. It was using ((struct cmsghdr )cm + 1) in some places, which gives the wrong alignment on the alpha. (NetBSD made this fix some time ago). This results in an ABI change for discriptor passing and creds passing on the alpha. (Probably on the IA64 and Spare ports too). 6) Fix userland programs to use CMSG_* macros too. 7) Be more careful about freeing mbufs containing (file *)s. This is made possible by the prototype change of externalise. PR: 29499 MFC after: 6 weeks	2001-10-04 13:11:48 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Julian Elischer	a8cfc0ee40	Forgot to remove this un-needed test. (M_WAITOK won't fail) I vaguely remember someone once proving it COULD return NULL.. was that changed? Reminded by: BDE MFC after: 2 weeks	2001-08-19 04:30:13 +00:00
Julian Elischer	ad4ff09012	fix typo Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2001-08-18 17:43:29 +00:00
Julian Elischer	8f364875fe	Don't alocate a 400 byte buffer on the stack, Nor 800 bytes of structures.. MFC after: 2 weeks	2001-08-18 02:53:50 +00:00
Dima Dorfman	0c1bb4fbf1	Implement a LOCAL_PEERCRED socket option which returns a `struct xucred` with the credentials of the connected peer. Obviously this only works (and makes sense) on SOCK_STREAM sockets. This works for both the connect(2) and listen(2) callers. There is precise documentation of the semantics in unix(4). Reviewed by: dwmalone (eyeballed)	2001-08-17 22:01:18 +00:00
Robert Watson	b1fc0ec1a7	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit	2001-05-25 16:59:11 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
Thomas Moestl	83f3198b2b	Change uipc_sockaddr so that a sockaddr_un without a path is returned nam for an unbound socket instead of leaving nam untouched in that case. This way, the getsockname() output can be used to determine the address family of such sockets (AF_LOCAL). Reviewed by: iedowse Approved by: rwatson	2001-04-24 19:09:23 +00:00
Robert Watson	91421ba234	o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project	2001-02-21 06:39:57 +00:00
Bosko Milekic	2a0c503e7a	* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.	2000-12-21 21:44:31 +00:00
Poul-Henning Kamp	46aa3347cb	Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc. Define __offsetof() in <machine/ansi.h> Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h> Remove myriad of local offsetof() definitions. Remove includes of <stddef.h> in kernel code. NB: Kernelcode should never include from /usr/include ! Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API. Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001. Paritials reviews by: various. Significant brucifications by: bde	2000-10-27 11:45:49 +00:00
Don Lewis	f535380cb6	Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().	2000-09-05 22:11:13 +00:00
Brian Feldman	6aef685fbb	Remove any possibility of hiwat-related race conditions by changing the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and a u_long target to set it to. The whole thing is splnet(). This fixes a problem that jdp has been able to provoke.	2000-08-29 11:28:06 +00:00
Kirk McKusick	f2a2857bb3	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).	2000-07-11 22:07:57 +00:00
Poul-Henning Kamp	77978ab8bc	Previous commit changing SYSCTL_HANDLER_ARGS violated KNF. Pointed out by: bde	2000-07-04 11:25:35 +00:00
Poul-Henning Kamp	82d9ae4e32	Style police catches up with rev 1.26 of src/sys/sys/sysctl.h: Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our sources: -sysctl_vm_zone SYSCTL_HANDLER_ARGS +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)	2000-07-03 09:35:31 +00:00
Alfred Perlstein	c636255150	fix races in the uidinfo subsystem, several problems existed: 1) while allocating a uidinfo struct malloc is called with M_WAITOK, it's possible that while asleep another process by the same user could have woken up earlier and inserted an entry into the uid hash table. Having redundant entries causes inconsistancies that we can't handle. fix: do a non-waiting malloc, and if that fails then do a blocking malloc, after waking up check that no one else has inserted an entry for us already. 2) Because many checks for sbsize were done as "test then set" in a non atomic manner it was possible to exceed the limits put up via races. fix: instead of querying the count then setting, we just attempt to set the count and leave it up to the function to return success or failure. 3) The uidinfo code was inlining and repeating, lookups and insertions and deletions needed to be in their own functions for clarity. Reviewed by: green	2000-06-22 22:27:16 +00:00
Yoshinobu Inoue	8692c02553	Enable SCM_RIGHTS on alpha. Allocate necessary buffer as conversion between int and struct file *. Approved by: jkh Submitted by: brian Reviewed by: bde, brian, peter	2000-03-09 15:15:27 +00:00
Eivind Eklund	762e6b856c	Introduce NDFREE (and remove VOP_ABORTOP)	1999-12-15 23:02:35 +00:00
Poul-Henning Kamp	2e3c8fcbd0	This is a partial commit of the patch from PR 14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914	1999-11-16 10:56:05 +00:00
Peter Wemm	d1f088dab5	Trim unused options (or #ifdef for undoc options). Submitted by: phk	1999-10-11 15:19:12 +00:00
Brian Feldman	ecf723083f	Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total usage limit.	1999-10-09 20:42:17 +00:00
Guido van Rooij	974784e8b4	Do not follow symlinks when binding a unix domain socket. This fixes the ssh 1.2.27 vulnerability as reported in bugtraq.	1999-09-29 21:09:41 +00:00
Brian Feldman	2f9a21326c	Change so_cred's type to a ucred, not a pcred. THis makes more sense, actually. Make a sonewconn3() which takes an extra argument (proc) so new sockets created with sonewconn() from a user's system call get the correct credentials, not just the parent's credentials.	1999-09-19 02:17:02 +00:00
Brian Feldman	ff8b0106a8	Get rid of some evil defines (a pair of snd and rcv.)	1999-09-17 21:38:24 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	bfbb9ce670	Divorce "dev_t" from the "major\|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).	1999-05-11 19:55:07 +00:00
Don Lewis	bd508d391b	Fix descriptor leak provoked by KKIS.05051999.003b exploit code. unp_internalize() takes a reference to the descriptor. If the send fails after unp_internalize(), the control mbuf would be freed ophaning the reference. Tested in -CURRENT by: Pierre Beyssac <beyssac@enst.fr>	1999-05-10 18:09:39 +00:00
Poul-Henning Kamp	75c1354190	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/	1999-04-28 11:38:52 +00:00
Eivind Eklund	e9e9477aac	More consistent with surrounding style. (Hey - it looked great in the diff...) Prodded by: bde	1999-04-12 14:34:52 +00:00
Eivind Eklund	632a035f84	Staticize.	1999-04-11 02:17:47 +00:00
Doug Rabson	ce02431ffa	* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime. * Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded. Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)	1999-02-16 10:49:55 +00:00
Matthew Dillon	39ebd17269	The code that reclaims descriptors from in-transit unix domain descriptor-passing messages was calling sorflush() without checking to see if the descriptor was actually a socket. This can cause a crash by exiting programs that use the mechanism under certain circumstances.	1999-01-21 09:02:18 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
Bruce Evans	a23d65bfc8	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.	1998-07-15 02:32:35 +00:00
Garrett Wollman	98271db4d5	Convert socket structures to be type-stable and add a version number. Define a parameter which indicates the maximum number of sockets in a system, and use this to size the zone allocators used for sockets and for certain PCBs. Convert PF_LOCAL PCB structures to be type-stable and add a version number. Define an external format for infomation about socket structures and use it in several places. Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through sysctl(3) without blocking network interrupts for an unreasonable length of time. This probably still has some bugs and/or race conditions, but it seems to work well enough on my machines. It is now possible for `netstat' to get almost all of its information via the sysctl(3) interface rather than reading kmem (changes to follow).	1998-05-15 20:11:40 +00:00
Mike Smith	7be2d30077	In the words of the submitter: --------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly. Quality testing was done with testvn, and lat_fs from the lmbench suite. Some NFS client testing courtesy of Patrik Kudo. vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. --------- Submitted by: Michael Hancock <michaelh@cet.co.jp>	1998-05-07 04:58:58 +00:00
Dag-Erling Smørgrav	dc73342347	Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.	1998-04-17 22:37:19 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
Bruce Evans	d826c47904	Fixed duplicate definitions of M_FILE (one static).	1997-11-23 10:43:49 +00:00
Poul-Henning Kamp	4a11ca4e29	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused	1997-11-07 08:53:44 +00:00
Poul-Henning Kamp	a1c995b626	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde	1997-10-12 20:26:33 +00:00

1 2

77 Commits