freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	140dedb81c	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks	2012-11-02 13:56:36 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Konstantin Belousov	877d24ac8a	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks	2012-09-28 11:25:02 +00:00
Alexander Leidinger	19e252baeb	- >500 static DTrace probes for the linuxulator - DTrace scripts to check for errors, performance, ... they serve mostly as examples of what you can do with the static probe;s with moderate load the scripts may be overwhelmed, excessive lock-tracing may influence program behavior (see the last design decission) Design decissions: - use "linuxulator" as the provider for the native bitsize; add the bitsize for the non-native emulation (e.g. "linuxuator32" on amd64) - Add probes only for locks which are acquired in one function and released in another function. Locks which are aquired and released in the same function should be easy to pair in the code, inter-function locking is more easy to verify in DTrace. - Probes for locks should be fired after locking and before releasing to prevent races (to provide data/function stability in DTrace, see the man-page of "dtrace -v ..." and the corresponding DTrace docs).	2012-05-05 19:42:38 +00:00
Konstantin Belousov	3494f31ad2	Fix misuse of the kernel map in miscellaneous image activators. Vnode-backed mappings cannot be put into the kernel map, since it is a system map. Use exec_map for transient mappings, and remove the mappings with kmem_free_wakeup() to notify the waiters on available map space. Do not map the whole executable into KVA at all to copy it out into usermode. Directly use vn_rdwr() for the case of not page aligned binary. There is one place left where the potentially unbounded amount of data is mapped into exec_map, namely, in the COFF image activator enumeration of the needed shared libraries. Reviewed by: alc MFC after: 2 weeks	2012-02-17 23:47:16 +00:00
Ulrich Spörlein	9a14aa017b	Convert files to UTF-8	2012-01-15 13:23:18 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Edward Tomasz Napierala	1ba5ad4210	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
Andriy Gapon	605da56bc3	linux compat: improve and fix sendmsg/recvmsg compatibility - implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS and prctl PR_SET_KEEPCAPS. - add SCM_CREDS support to sendmsg and recvmsg - modify sendmsg to ignore control messages if not using UNIX domain sockets This should allow linux pulse audio daemon and client work on FreeBSD and interoperate with native counter-parts modulo the differences in pulseaudio versions. PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 11:05:53 +00:00
Dmitry Chagin	d207e753da	Put the macro declaration in the relevant include file for future use.	2011-02-15 21:22:09 +00:00
Dmitry Chagin	596ba1bd95	Style(9) fixes. MFC after: 1 Month.	2011-01-28 19:04:15 +00:00
Dmitry Chagin	adc7ece00a	Implement a variation of the linux_common_wait() which should be used by linuxolator itself. Move linux_wait4() to MD path as it requires native struct rusage translation to struct l_rusage on linux32/amd64. MFC after: 1 Month.	2011-01-28 18:47:07 +00:00
Dmitry Chagin	d908c2d2a2	Style(9) fix. MFC after: 1 month.	2011-01-28 05:42:14 +00:00
Alexander Leidinger	bb63fdde6d	By using the 32-bit Linux version of Sun's Java Development Kit 1.6 on FreeBSD (amd64), invocations of "javac" (or "java") eventually end with the output of "Killed" and exit code 137. This is caused by: 1. After calling exec() in multithreaded linux program threads are not destroyed and continue running. They get killed after program being executed finishes. 2. linux_exit_group doesn't return correct exit code when called not from group leader. Which happens regularly using sun jvm. The submitters fix this in a similar way to how NetBSD handles this. I took the PRs away from dchagin, who seems to be out of touch of this since a while (no response from him). The patches committed here are from [2], with some little modifications from me to the style. PR: 141439 [1], 144194 [2] Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk Reviewed by: rdivacky (in april 2010) MFC after: 5 days	2010-11-22 09:06:59 +00:00
Brooks Davis	3ef5ae2dde	Since all other comparisons involving ngroups_max use "ngroups_max + 1", use ">= ngroups_max+1" instead of the equivalent "> ngroups_max" to reduce confusion.	2010-01-15 07:05:00 +00:00
Brooks Davis	412f9500e2	Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to INT_MAX-1. Given that the Windows group limit is 1024, this range should be sufficient for most applications. MFC after: 1 month	2010-01-12 07:49:34 +00:00
Konstantin Belousov	b55ef216fe	kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in or out 4 bytes more then the process expected. Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI. Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week	2009-09-09 20:59:01 +00:00
Brooks Davis	838d985825	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
Jamie Gritton	7455b100af	Add counterparts to getcredhostname: getcreddomainname, getcredhostuuid, getcredhostid Suggested by: rmacklem Approved by: bz	2009-06-13 00:12:02 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Jamie Gritton	76ca6f88da	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
Dmitry Chagin	8d30f381ef	Do not export AT_CLKTCK when emulating Linux kernel prior to 2.4.0, as it has appeared in the 2.4.0-rc7 first time. Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK), glibc falls back to the hard-coded CLK_TCK value when aux entry is not present. Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value. For older applications/libc's which depends on hard-coded CLK_TCK value user should set compat.linux.osrelease less than 2.4.0. Approved by: kib (mentor)	2009-05-10 18:43:43 +00:00
Dmitry Chagin	1ca16454b3	Rework r189362, r191883. The frequency of the statistics clock is given by stathz. Use stathz if it is available, otherwise use hz. Pointed out by: bde Approved by: kib (mentor)	2009-05-10 18:16:07 +00:00
Dmitry Chagin	c65b9bfa3c	Linux exports HZ value to user space via AT_CLKTCK auxiliary vector entry, which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is not exported, Glibc uses 100. linux_times() shall use the value that is exported to user space. Pointyhat to: dchagin PR: kern/134251 Approved by: kib (mentor) MFC after: 2 weeks	2009-05-07 14:24:50 +00:00
Dmitry Chagin	4d706dcc08	Change linux struct tms definition to match actual linux one. Approved by: kib (mentor) MFC after: 2 weeks	2009-05-07 12:55:58 +00:00
Dmitry Chagin	4d7c2e8a48	Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?" from some programs at start, among them are top and pkill. Do the assignment of the vector entries in elf_linux_fixup() as it is done in glibc. Fix some minor style issues. Submitted by: Marcin Cieslak <saper at SYSTEM PL> Approved by: kib (mentor) MFC after: 1 week	2009-03-04 12:14:33 +00:00
Ed Schouten	ddf9d24349	Push down Giant inside sysctl. Also add some more assertions to the code. In the existing code we didn't really enforce that callers hold Giant before calling userland_sysctl(), even though there is no guarantee it is safe. Fix this by just placing Giant locks around the call to the oid handler. This also means we only pick up Giant for a very short period of time. Maybe we should add MPSAFE flags to sysctl or phase it out all together. I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root() and name2oid() are called with the sysctl lock held. Reviewed by: Jille Timmermans <jille quis cx>	2008-12-29 12:58:45 +00:00
Ed Schouten	a1b5a8955e	Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4. Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4. Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname(). I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality. Reviewed by: rdivacky, kib	2008-11-09 10:45:13 +00:00
Konstantin Belousov	68da8b22d2	Current linux_fooaffinity() emulation fails, as the FreeBSD affinity syscalls expect the bitmap size in the range from 32 to 128. Old glibc always assumed size 1024, while newer glibc searches for approriate size, starting from 1024 and going up. For now, use FreeBSD size of cpuset_t for bitmap size parameter and return EINVAL if length of user space bitmap less than our size of cpuset_t. Submitted by: dchagin MFC after: 1 week [This requires MFC of the actual linux affinity syscalls]	2008-10-04 19:23:30 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Edward Tomasz Napierala	9545354ed5	Fix usage of mac_vnode_check_open() in linuxulator - last argument should be VREAD, not FREAD. Approved by: rwatson (mentor)	2008-09-22 18:59:24 +00:00
Roman Divacky	0d62170990	The ERESTART to EINTR conversion is already done in kern_select so there is no need to repeat it in linux_select(). Submitted by: Dmitry Chagin <dchagin@> MFC after: 1 week Approved by: kib (mentor)	2008-09-11 15:28:28 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Roman Divacky	0864e2a4f1	Fix linux_alarm, the linux behaviour is to limit the secs to INT_MAX when the passed in parameter is bigger than INT_MAX. Submitted by: Dmitry Chagin <chagin.dmitry gmail com> Approved by: kib (mentor)	2008-07-23 17:19:02 +00:00
Robert Watson	4f7d1876d5	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
Roman Divacky	4732e446fb	Implement robust futexes. Most of the code is modelled after what Linux does. This is because robust futexes are mostly userspace thing which we cannot alter. Two syscalls maintain pointer to userspace list and when process exits a routine walks this list waking up processes sleeping on futexes from that list. Reviewed by: kib (mentor) MFC after: 1 month	2008-05-13 20:01:27 +00:00
Konstantin Belousov	48b05c3f82	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho	2008-04-08 09:45:49 +00:00
Ruslan Ermilov	d7a38db650	Fix build. Reported by: ache, tinderbox	2008-03-25 13:20:52 +00:00
Roman Divacky	5dfb688191	Implement sched_setaffinity and get_setaffinity using real cpu affinity setting primitives. Reviewed by: jeff Approved by: kib (mentor)	2008-03-16 16:27:44 +00:00
Konstantin Belousov	cbd2c621f8	Sanitize arguments to linux_mremap(). Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified. Check for the page alignment of the addr argument. Submitted by: rdivacky MFC after: 1 week	2008-02-22 11:47:56 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Konstantin Belousov	b6e645c90f	Implement fake linux sched_getaffinity() syscall to enable java to work with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets native scheduler affinity syscalls. Submitted by: rdivacky Reviewed by: jkim Sponsored by: Google Summer of Code 2007 Approved by: re (kensmith)	2007-08-28 12:26:35 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Attilio Rao	a1fe14bc33	rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc which wrappers both calls to be performed in atomic way. Reviewed by: jeff Approved by: jeff (mentor)	2007-06-09 21:48:44 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Alexander Leidinger	802e08a360	Partial MFp4 of 114977: Whitespace commit: Fix grammar, spelling and punctuation. Submitted by: "Scot Hetzel" <swhetzel@gmail.com>	2007-02-24 16:49:25 +00:00

1 2 3 4 5 ...

256 Commits