freebsd-skq

Author	SHA1	Message	Date
Jamie Gritton	ca04ba6430	Don't allow creating a socket with a protocol family that the current jail doesn't support. This involves a new function prison_check_af, like prison_check_ip[46] but that checks only the family. With this change, most of the errors generated by jailed sockets shouldn't ever occur, at least until jails are changeable. Approved by: bz (mentor)	2009-02-05 14:15:18 +00:00
Jamie Gritton	b89e82dd87	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)	2009-02-05 14:06:09 +00:00
Ed Schouten	f3b86a5fd7	Mark most often used sysctl's as MPSAFE. After running a `make buildkernel', I noticed most of the Giant locks in sysctl are only caused by a very small amount of sysctl's: - sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like sysctl.oidfmt. - kern.ident, kern.osrelease, kern.version, etc. These are just constant strings. - kern.arandom, used by the stack protector. It is already protected by arc4_mtx. I also saw the following sysctl's show up. Not as often as the ones above, but still quite often: - security.jail.jailed. Also mark security.jail.list as MPSAFE. They don't need locking or already use allprison_lock. - kern.devname, used by devname(3), ttyname(3), etc. This seems to reduce Giant locking inside sysctl by ~75% in my primitive test setup.	2009-01-28 19:58:05 +00:00
Bjoern A. Zeeb	1cecba0fcd	For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN. Submitted by: jamie (as part of a larger patch) MFC after: 1 week	2009-01-25 10:11:58 +00:00
Bjoern A. Zeeb	eb76156548	Back out r186615; the sanitizing of the pointers in the error case is not needed and seems that it will not be needed either. Pointy hat: mine, mine, mine and not pho's	2009-01-04 12:18:18 +00:00
Peter Holm	1d347f061f	Added missing second part of cleaning j->ip[46] as requested by bz Approved by: kib (mentor) Pointy hat: pho	2008-12-30 20:39:47 +00:00
Peter Holm	bc971b2c82	Make sure that unused j->ip[46] are cleared Reviewed by: bz Approved by: kib (mentor)	2008-12-30 17:54:25 +00:00
Bjoern A. Zeeb	0f1fe22db5	Correctly check the number of prison states to not access anything outside the prison_states array. When checking if there is a name configured for the prison, check the first character to not be '\0' instead of checking if the char array is present, which it always is. Note, that this is different for the *jailname in the syscall. Found with: Coverity Prevent(tm) CID: 4156, 4155 MFC after: 4 weeks (just that I get the mail)	2008-12-11 01:04:25 +00:00
Bjoern A. Zeeb	d465a41d3b	Unbreak the no-networks (no INET/6) build that I broke with the commit in r185435. Pointyhat: no, but I could need a ski cap for the winter	2008-11-29 16:17:39 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Bjoern A. Zeeb	b9f0b66c75	With the permissions of phk@ change the license on kern_jail.c to a 2 clause BSD license.	2008-11-28 19:23:46 +00:00
Pawel Jakub Dawidek	1ba4a712dd	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris	2008-11-17 20:49:29 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Bjoern A. Zeeb	45e48455cd	MFp4 144659: Plug a memory leak with jail services. PR: 125257 Submitted by: Mateusz Guzik <mjguzik gmail.com> MFC after: 6 days	2008-07-07 20:53:49 +00:00
Robert Watson	4f7d1876d5	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
Xin LI	2110d913c0	Revert rev. 178124 as requested by kris@. Having jail id not being reused too frequently is useful for script controlled environment.	2008-06-19 21:41:57 +00:00
Xin LI	31c50f53da	Instead of rolling our own jail number allocation procedure, use alloc_unr() to do it. Submitted by: Ed Schouten <ed 80386 nl> PR: kern/122270 MFC after: 1 month	2008-04-11 21:31:15 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
Bjoern A. Zeeb	79ba395267	Replace the last susers calls in netinet6/ with privilege checks. Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers). Leave a few comments to be addressed later. Reviewed by: rwatson (older version, before addressing his comments)	2008-01-24 08:25:59 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Robert Watson	e41966dc35	Add PRIV_VFS_STAT privilege, which will allow overriding policy limits on the right to stat() a file, such as in mac_bsdextended. Obtained from: TrustedBSD Project MFC after: 3 months	2007-10-21 22:50:11 +00:00
Pawel Jakub Dawidek	24b0502ee0	Fix jails and jail-friendly file systems handling: - We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail. - Move security checks to vfs_suser() and deny unmounting and updating for jailed root from different jails, etc. OK'ed by: rwatson	2007-04-13 23:54:22 +00:00
Robert Watson	4b08405682	Allow PRIV_NETINET_REUSEPORT in jail.	2007-04-10 15:59:49 +00:00
Pawel Jakub Dawidek	c2cda60911	prison_free() can be called with a mutex held. This wasn't a problem until I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR, move prison removal to prison_complete() entirely. To ensure that noone will reference this prison before it's beeing removed from the list skip prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to greater than 0 in prison_hold(). Reported by: kris OK'ed by: rwatson	2007-04-08 10:46:23 +00:00
Pawel Jakub Dawidek	b63b0c6529	Only use prison mutex to protect the fields that need to be protected by it.	2007-04-08 10:21:38 +00:00
Pawel Jakub Dawidek	264de85e73	pr_list is protected by the allprison_lock.	2007-04-08 02:13:32 +00:00
Pawel Jakub Dawidek	dc68a63332	Implement functionality I called 'jail services'. It may be used for external modules to attach some data to jail's in-kernel structure. - Change allprison_mtx mutex to allprison_sx sx(9) lock. We will need to call external functions while holding this lock, which may want to allocate memory. Make use of the fact that this is shared-exclusive lock and use shared version when possible. - Implement the following functions: prison_service_register() - registers a service that wants to be noticed when a jail is created and destroyed prison_service_deregister() - deregisters service prison_service_data_add() - adds service-specific data to the jail structure prison_service_data_get() - takes service-specific data from the jail structure prison_service_data_del() - removes service-specific data from the jail structure Reviewed by: rwatson	2007-04-05 23:19:13 +00:00
Pawel Jakub Dawidek	54b369c1ae	Make prison_find() globally accessible.	2007-04-05 21:34:54 +00:00
Pawel Jakub Dawidek	f3a8d2f93c	Add security.jail.mount_allowed sysctl, which allows to mount and unmount jail-friendly file systems from within a jail. Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user. It is turned off by default. A jail-friendly file system is a file system which driver registers itself with VFCF_JAIL flag via VFS_SET(9) API. The lsvfs(1) command can be used to see which file systems are jail-friendly ones. There currently no jail-friendly file systems, ZFS will be the first one. In the future we may consider marking file systems like nullfs as jail-friendly. Reviewed by: rwatson	2007-04-05 21:03:05 +00:00
Pawel Jakub Dawidek	2709e8904f	Minor simplification.	2007-03-09 05:22:10 +00:00
Pawel Jakub Dawidek	9e5dcf7b21	White space nits.	2007-03-07 21:24:51 +00:00
Robert Watson	0c14ff0eb5	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.	2007-03-04 22:36:48 +00:00
Pawel Jakub Dawidek	bb531912ff	Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better describe the privilege. OK'ed by: rwatson	2007-03-01 20:47:42 +00:00
Robert Watson	95420afea4	Remove unused PRIV_IPC_EXEC. Renumbers System V IPC privilege.	2007-02-20 00:12:52 +00:00
Robert Watson	95b091d2f2	Rename three quota privileges from the UFS privilege namespace to the VFS privilege namespace: exceedquota, getquota, and setquota. Leave UFS-specific quota configuration privileges in the UFS name space. This renumbers VFS and UFS privileges, so requires rebuilding modules if you are using security policies aware of privilege identifiers. This is likely no one at this point since none of the committed MAC policies use the privilege checks.	2007-02-19 13:33:10 +00:00
Robert Watson	e82d0201bd	Limit quota privileges in jail to PRIV_UFS_GETQUOTA and PRIV_UFS_SETQUOTA.	2007-02-19 13:26:39 +00:00
Robert Watson	c3c1b5e62a	For now, reflect practical reality that Audit system calls aren't allowed in Jail: return a privilege error.	2007-02-19 13:10:29 +00:00
Robert Watson	800c940832	Add a new priv(9) kernel interface for checking the availability of privilege for threads and credentials. Unlike the existing suser(9) interface, priv(9) exposes a named privilege identifier to the privilege checking code, allowing more complex policies regarding the granting of privilege to be expressed. Two interfaces are provided, replacing the existing suser(9) interface: suser(td) -> priv_check(td, priv) suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags) A comprehensive list of currently available kernel privileges may be found in priv.h. New privileges are easily added as required, but the comments on adding privileges found in priv.h and priv(9) should be read before doing so. The new privilege interface exposed sufficient information to the privilege checking routine that it will now be possible for jail to determine whether a particular privilege is granted in the check routine, rather than relying on hints from the calling context via the SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail check function, prison_priv_check(), is exposed from kern_jail.c and used by the privilege check routine to determine if the privilege is permitted in jail. As a result, a centralized list of privileges permitted in jail is now present in kern_jail.c. The MAC Framework is now also able to instrument privilege checks, both to deny privileges otherwise granted (mac_priv_check()), and to grant privileges otherwise denied (mac_priv_grant()), permitting MAC Policy modules to implement privilege models, as well as control a much broader range of system behavior in order to constrain processes running with root privilege. The suser() and suser_cred() functions remain implemented, now in terms of priv_check() and the PRIV_ROOT privilege, for use during the transition and possibly continuing use by third party kernel modules that have not been updated. The PRIV_DRIVER privilege exists to allow device drivers to check privilege without adopting a more specific privilege identifier. This change does not modify the actual security policy, rather, it modifies the interface for privilege checks so changes to the security policy become more feasible. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:37:19 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Robert Watson	5702e0965e	Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.	2006-09-17 20:00:36 +00:00
Christian S.J. Peron	453f7d5369	Push Giant down in jails. Pass the MPSAFE flag to NDINIT, and keep track of whether or not Giant was picked up by the filesystem. Add VFS_LOCK_GIANT macros around vrele as it's possible that this can call in the VOP_INACTIVE filesystem specific code. Also while we are here, remove the Giant assertion. from the sysctl handler, we do not actually require Giant here so we shouldn't assert it. Doing so will just complicate things when Giant is removed from the sysctl framework.	2005-09-28 00:30:56 +00:00
Pawel Jakub Dawidek	06a137780b	Actually only protect mount-point if security.jail.enforce_statfs is set to 2. If we don't return statistics about requested file systems, system tools may not work correctly or at all. Approved by: re (scottl)	2005-06-23 22:13:29 +00:00
Pawel Jakub Dawidek	820a0de9a9	Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs and extend its functionality: value policy 0 show all mount-points without any restrictions 1 show only mount-points below jail's chroot and show only part of the mount-point's path (if jail's chroot directory is /jails/foo and mount-point is /jails/foo/usr/home only /usr/home will be shown) 2 show only mount-point where jail's chroot directory is placed. Default value is 2. Discussed with: rwatson	2005-06-09 18:49:19 +00:00
Jeff Roberson	22fdc83f93	- Use taskqueue_thread rather than taskqueue_swi since our task is going to vrele, which may vop lock. This is not safe in a software interrupt context.	2005-04-05 08:51:45 +00:00
John Baldwin	2945387fee	Drop a bogus mp_fixme(). Adding a lock would do nothing to reduce userland races regarding changing of jail-related sysctls.	2005-03-31 22:47:57 +00:00
Colin Percival	79653046d8	Add a new sysctl, "security.jail.chflags_allowed", which controls the behaviour of chflags within a jail. If set to 0 (the default), then a jailed root user is treated as an unprivileged user; if set to 1, then a jailed root user is treated the same as an unjailed root user. This is necessary to allow "make installworld" to work inside a jail, since it attempts to manipulate the system immutable flag on certain files. Discussed with: csjp, rwatson MFC after: 2 weeks	2005-02-08 21:31:11 +00:00

1 2

95 Commits