freebsd-nq

Author	SHA1	Message	Date
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the aforementioned and in-kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. 
Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Roman Divacky	7356a43c88	Document that all the other commands are either identical to the FreeBSD ones or rejected by kern_msgctl(). Found with: Coverity Prevent(tm) CID: 3456 Approved by: kib (mentor)	2008-11-26 16:38:43 +00:00
Konstantin Belousov	b4cf0e62f4	Add sv_flags field to struct sysentvec with intention to provide description of the ABI of the currently executing image. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures to determine ABI features. Discussed with: dchagin, imp, jhb, peter	2008-11-22 12:36:15 +00:00
Konstantin Belousov	62162dfc94	In the robust futexes list head, futex_offset shall be signed, and glibc actually supplies negative offsets. Change l_ulong to l_long. Submitted by: dchagin	2008-11-16 15:45:41 +00:00
Peter Wemm	dc5aaa8410	Sigh. Fix a pointer/int compile error.	2008-11-10 23:36:20 +00:00
Peter Wemm	a22600a1dd	Fix a signal emulation bug introduced in r163018 (and present in 7.x). This prevents 32 bit signal handlers from finding out what the faulting address is. Both the secret 4th argument and siginfo->si_addr are zero.	2008-11-10 23:26:52 +00:00
Ed Schouten	ebb45b0620	Regenerate system call tables for r184789.	2008-11-09 10:48:06 +00:00
Ed Schouten	a1b5a8955e	Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4. Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4. Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname(). I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality. Reviewed by: rdivacky, kib	2008-11-09 10:45:13 +00:00
Dag-Erling Smørgrav	faecfd5641	utf-8 MFC after: 3 weeks	2008-11-05 15:08:09 +00:00
John Baldwin	b1b3a8653d	Don't leak a reference on the /compat/linux vnode every time the linprocfs 'mtab' file is read. MFC after: 1 month	2008-11-04 18:53:33 +00:00
Doug Rabson	45e6ab7f81	Regen.	2008-11-03 10:39:35 +00:00
Doug Rabson	a9148abd9d	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. 
The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month	2008-11-03 10:38:00 +00:00
Konstantin Belousov	17b9edd35a	The code in linux_proc_exit() contains a race when multiple linux based processes exit at the same time. The linux_emuldata structure is freed but p->p_emuldata is left as a dangling pointer to the just freed memory. The check for W_EXIT in the loop scanning the child processes isn't safe since the state of the child process can change right afterwards. Lock the process and check the W_EXIT before delivering signal. Submitted by: tegge Reviewed by: davidxu MFC after: 1 week	2008-10-31 10:38:30 +00:00
Edward Tomasz Napierala	15bc6b2bd8	Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)	2008-10-28 13:44:11 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
John Baldwin	23aa8eeafc	Regen for freebsd32_getdirentries().	2008-10-22 21:56:44 +00:00
John Baldwin	63f8fe9e8b	Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland. Submitted by: ups MFC after: 1 week	2008-10-22 21:55:48 +00:00
Konstantin Belousov	aa8b201112	Correctly fill siginfo for the signals delivered by linux tkill/tgkill. It is required for async cancellation to work. Fix PROC_LOCK leak in linux_tgkill when a signal delivery attempt is made to a non-Linux process. Do not call em_find(p, ...) with p unlocked. Move common code for linux_tkill() and linux_tgkill() into linux_do_tkill(). Change linux siginfo_t definition to match actual linux one. Extend uid fields to 4 bytes from 2. The extension does not change structure layout and is binary compatible with previous definition, because i386 is little endian, and each uid field has 2 byte padding after it. Reported by: Nicolas Joly <njoly pasteur fr> Submitted by: dchagin MFC after: 1 month	2008-10-19 10:02:26 +00:00
Konstantin Belousov	175c6c319b	Make robust futexes work on linux32/amd64. Use PTRIN to read user-mode pointers. Change types used in the structures definitions to properly-sized architecture-specific types. Submitted by: dchagin MFC after: 1 week	2008-10-14 07:59:23 +00:00
Konstantin Belousov	68da8b22d2	Current linux_fooaffinity() emulation fails, as the FreeBSD affinity syscalls expect the bitmap size in the range from 32 to 128. Old glibc always assumed size 1024, while newer glibc searches for appropriate size, starting from 1024 and going up. For now, use FreeBSD size of cpuset_t for bitmap size parameter and return EINVAL if length of user space bitmap is less than our size of cpuset_t. Submitted by: dchagin MFC after: 1 week [This requires MFC of the actual linux affinity syscalls]	2008-10-04 19:23:30 +00:00
Konstantin Belousov	9a1e630dfd	Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use sbuf instead of doing uiomove. This allows for reads from non-zero offsets to work. The patch is a forward port of des@'s, adapted to the current code by dchagin@ and me. Reviewed by: des (linprocfs part) PR: kern/101453 MFC after: 1 week	2008-10-04 14:08:16 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Olivier Houchard	3b30175391	Advertise bit 26 as sse2. Spotted out by: gahr	2008-09-26 15:29:18 +00:00
John Baldwin	88ac915a9b	Add support for installing 32-bit system calls from kernel modules. This includes syscall32_{de,}register() routines as well as a module handler and wrapper macros similar to the support for native syscalls in <sys/sysent.h>. MFC after: 1 month	2008-09-25 20:50:21 +00:00
John Baldwin	d47faadce3	Sort includes and add multiple include guards.	2008-09-25 20:12:38 +00:00
John Baldwin	74d9b5a551	Regen.	2008-09-25 20:08:36 +00:00
John Baldwin	48a43ae819	Tidy up a few things with syscall generation: - Instead of using a syscall slot (370) just to get a function prototype for lkmressys(), add an explicit function prototype to <sys/sysent.h>. This also removes unused special case checks for 'lkmressys' from makesyscalls.sh. - Instead of having magic logic in makesyscalls.sh to only generate a function prototype the first time 'lkmnosys' is seen, make 'NODEF' always not generate a function prototype and include an explicit prototype for 'lkmnosys' in <sys/sysent.h>. - As a result of the fix in (2), update the LKM syscall entries in the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'. - Use NOPROTO for the __syscall() entry (198) in the native ABI. This avoids the need for magic logic in makesyscalls.sh to only generate a function prototype the first time 'nosys' is encountered.	2008-09-25 20:07:42 +00:00
Konstantin Belousov	a8d403e102	Change the static struct sysentvec and struct Elf_Brandinfo initializers to the C99 style. At least, it is easier to read sysent definitions that way, and search for the actual instances of sigcode etc. Explicitly initialize sysentvec.sv_maxssiz that was missed in most sysvecs. No objection from: jhb MFC after: 1 month	2008-09-24 10:14:37 +00:00
Edward Tomasz Napierala	9545354ed5	Fix usage of mac_vnode_check_open() in linuxulator - last argument should be VREAD, not FREAD. Approved by: rwatson (mentor)	2008-09-22 18:59:24 +00:00
David E. O'Brien	c750e17cf5	Add freebsd32 compat shims for ioctl(2) CDIOREADTOCHEADER and CDIOREADTOCENTRYS requests.	2008-09-22 16:24:36 +00:00
David E. O'Brien	663c58007e	Regenerate for r183270.	2008-09-22 16:09:43 +00:00
David E. O'Brien	ae528485c4	Add freebsd32 compat shims for ioctl(2) MDIOCATTACH, MDIOCDETACH, MDIOCQUERY, and MDIOCLIST requests.	2008-09-22 16:09:16 +00:00
David E. O'Brien	f1287854fd	Regenerate for r183188.	2008-09-19 15:21:40 +00:00
David E. O'Brien	6e6049e9df	Add freebsd32 compat shim for nmount(2). (and quiet some compiler warnings for vfs_donmount)	2008-09-19 15:17:32 +00:00
David E. O'Brien	109ea24cc1	style(9)	2008-09-15 17:39:40 +00:00
David E. O'Brien	7e29bc757e	Regenerate for r183042.	2008-09-15 17:39:01 +00:00
David E. O'Brien	f0f53d8f79	Fix bug in r100384 (rev 1.2) in which the 32-bit swapon(2) was made "obsolete, not included in system", whereas the system call does exist.	2008-09-15 17:37:41 +00:00
Ed Schouten	7969b32c44	Allow COMPAT_SVR4 to be built without COMPAT_43. It seems we only depend on COMPAT_43 to implement the send() and recv() routines. We can easily implement them using sendto() and recvfrom(), just like we do inside our very own C library. I wasn't able to really test it, apart from simple compilation testing. I've heard rumours that COMPAT_SVR4 is broken inside execve() anyway. It's still worth fixing this, because I suspect we'll get rid of COMPAT_43 somewhere in the future... Reviewed by: rdivacky Discussed with: jhb	2008-09-15 15:09:35 +00:00
Andrew Thompson	8fa962c745	Allow PAGE_SHIFT to already be defined. Submitted by: Hans Petter Selasky	2008-09-13 17:34:18 +00:00
Roman Divacky	0d62170990	The ERESTART to EINTR conversion is already done in kern_select so there is no need to repeat it in linux_select(). Submitted by: Dmitry Chagin <dchagin@> MFC after: 1 week Approved by: kib (mentor)	2008-09-11 15:28:28 +00:00
Roman Divacky	2963584278	Getdents requires padding with 2 bytes instead of 1 byte as with getdents64. The last byte is used for storing the d_type, add this to plain getdents case where it was missing before. Also change the code to use strlcpy instead of plain strcpy. These changes fix the getdents crash we had reports about (hl2 server etc.) PR: kern/117010 MFC after: 1 week Submitted by: Dmitry Chagin (dchagin@) Tested by: MITA Yoshio <mita ee.t.u-tokyo.ac jp> Approved by: kib (mentor)	2008-09-09 16:00:17 +00:00
Konstantin Belousov	745aaef5b5	Remove superfluous copyin() of args, structures are already in kernel space. Submitted by: dchagin MFC after: 1 week	2008-09-09 13:01:14 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and therefore served no purpose. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Julian Elischer	576c43c844	We left out V_static_len from ip_fw2.c (also a whitespace diff that I'd rather fix here than break in the vimage branch.)	2008-08-25 05:38:18 +00:00
Julian Elischer	1d89fc4ebe	All opt_x.h includes go at the top of other includes.	2008-08-25 04:55:29 +00:00
Robert Watson	5ae504055a	Regenerate following r182123.	2008-08-24 21:23:08 +00:00
Robert Watson	e484af13ed	When MPSAFE ttys were merged, a new BSM audit event identifier was allocated for posix_openpt(2). Unfortunately, that identifier conflicts with other events already allocated to other systems in OpenBSM. Assign a new globally unique identifier and conform better to the AUE_ event naming scheme. This is a stopgap until a new OpenBSM import is done with the correct identifier, so we'll maintain this as a local diff in svn until then. Discussed with: ed Obtained from: TrustedBSD Project	2008-08-24 21:20:35 +00:00
David E. O'Brien	35c316caaf	Add comments on NOARGS, NODEF, and NOPROTO.	2008-08-21 22:57:31 +00:00
Ed Schouten	18cf135421	Update system call tables. The previous commit also included changes to all the system call lists, but it is a tradition to update these lists in a second commit, so rerun make sysent to update the $FreeBSD$ tags inside these files to refer to the latest version of syscalls.master. Requested by: rwatson	2008-08-20 08:39:10 +00:00
Ed Schouten	bc093719ca	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan	2008-08-20 08:31:58 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Ed Schouten	b377be43a5	Add TIOCPKT and TIOCSPTLCK to the Linuxolator. We're very lucky, because the flags used by our TIOCPKT implementation are the same as flags used by Linux. We can safely enable TIOCPKT, assuming EXTPROC is not used. TIOCSPTLCK is used by unlockpt(). Because we don't need unlockpt() in our implementation, make this ioctl a no-op. Approved by: philip (mentor, implicit), rdivacky Obtained from: P4 (//depot/projects/mpsafetty/...)	2008-07-23 17:47:44 +00:00
Roman Divacky	0864e2a4f1	Fix linux_alarm, the linux behaviour is to limit the secs to INT_MAX when the passed in parameter is bigger than INT_MAX. Submitted by: Dmitry Chagin <chagin.dmitry gmail com> Approved by: kib (mentor)	2008-07-23 17:19:02 +00:00
Weongyo Jeong	138ddff935	When the NDIS framework tries to query or set information, NDIS drivers can return NDIS_STATUS_PENDING. In this case, it currently waits 5 secs for the response from the driver. However, some NDIS drivers can send the response before the NDIS framework is ready to receive it, so we might always be blocked for 5 secs in the current implementation. The NDIS framework should reset the event before calling the NDIS driver's callback, not after. MFC after: 1 month	2008-07-23 10:49:27 +00:00
Brooks Davis	e44f0b2a63	style(9): put parentheses around return values.	2008-07-10 19:54:34 +00:00
Brooks Davis	774b72e12e	Regen	2008-07-10 17:46:58 +00:00
Brooks Davis	a8c6d6d0ba	id_t is a 64-bit integer and thus is passed as two arguments like off_t is. As a result, those arguments must be recombined before calling the real syscall implementation. This change fixes 32-bit compatibility for cpuset_getid(), cpuset_setid(), cpuset_getaffinity(), and cpuset_setaffinity().	2008-07-10 17:45:57 +00:00
Robert Watson	4f7d1876d5	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
Coleman Kane	38ad9366dc	Silence warning about missing IoGetDeviceObjectPointer by implementing a simple stub that always returns STATUS_SUCCESS. Submitted by: Paul B. Mahol <onemda@gmail.com> Reviewed by: thompsa MFC after: 1 week	2008-06-15 13:37:29 +00:00
Wojciech A. Koszek	53a609f064	Remove obsolete PECOFF image activator support. PRs assigned at the time of removal: kern/80742 Discussed on: freebsd-current (silence), IRC Tested by: make universe Approved by: cognet (mentor)	2008-06-14 12:51:44 +00:00
Weongyo Jeong	1f22fabdfb	Fix a page fault that occurred while ifp is NULL. This bug happens when the NDIS driver's initialization fails and the NDIS driver tries to call NdisWriteErrorLogEntry().	2008-06-11 07:55:07 +00:00
Roman Divacky	2e1a489300	d_ino member of linux_dirent structure should be unsigned long. Submitted by: Chagin Dmitry <chagin.dmitry@gmail.com> Approved by: kib (mentor)	2008-06-08 11:09:25 +00:00
Roman Divacky	a47444d525	Switch to emulating Linux 2.6 by default. Approved by: kib (mentor)	2008-06-03 17:50:13 +00:00
Ed Schouten	a147e6cadf	Push down the major/minor conversion for pts/%u to improve consistency. In the mpsafetty branch, Linux sshd seems to work properly inside a jail. Some small modifications had to be made to the Linux compatibility layer. The Linux PTY routines always expect the device major number to be 136 or higher. Our code always set the major/minor number pair to 136:0. This makes routines like ttyname() and ptsname() fail, because we'll end up having ambiguous device numbers. The conversion was not performed on all *stat() routines, which meant in some cases the numbers didn't get transformed. By pushing the conversion into linux_driver_get_major_minor(), the transformation will take place on all calls. Approved by: philip (mentor), rdivacky	2008-06-02 08:40:06 +00:00
Weongyo Jeong	32e9c9dc71	Fix a panic caused by a priority value < 0 being passed to cv_broadcastpri(9). We don't ignore the `increment' argument, but we at least keep the priority value of NDIS threads above PRI_MIN_KERN. Reviewed by: thompsa	2008-05-30 06:31:55 +00:00
Weongyo Jeong	d9585f801b	Fix a panic that occurred while initializing the ndis driver, because it tried to read the network address through the ifnet structure, which is NULL until the ndis driver's initialization is finished. Reviewed by: thompsa	2008-05-15 04:29:28 +00:00
Roman Divacky	4732e446fb	Implement robust futexes. Most of the code is modelled after what Linux does. This is because robust futexes are mostly userspace thing which we cannot alter. Two syscalls maintain pointer to userspace list and when process exits a routine walks this list waking up processes sleeping on futexes from that list. Reviewed by: kib (mentor) MFC after: 1 month	2008-05-13 20:01:27 +00:00
Roman Divacky	a6d043e30d	Implement linux_truncate64() syscall. Tested by: Aline de Freitas <aline@riseup.net> Approved by: kib (mentor)	2008-04-23 15:56:33 +00:00
Roman Divacky	cabce2bf19	The vmspace->vm_daddr is constant until freed, there is no need to hold lock while accessing it. Approved by: kib (mentor)	2008-04-21 21:24:08 +00:00
Roman Divacky	872cbe6466	Remove using magic value of -1 to distinguish between linux_open() and linux_openat(). Instead just pass AT_FDCWD into linux_common_open() for the linux_open() case. This prevents passing -1 as a dirfd to openat() from succeeding which is wrong. Suggested by: rwatson, kib Approved by: kib (mentor)	2008-04-09 16:42:50 +00:00
Konstantin Belousov	48b05c3f82	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho	2008-04-08 09:45:49 +00:00
Konstantin Belousov	f2296b585e	Regen	2008-03-31 12:12:27 +00:00
Konstantin Belousov	4f1e7213d4	Add the freebsd32 compatibility shims for the *at() syscalls. Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:08:30 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
John Birrell	8f0cc58815	Remove files that have been repo copied to their new location in cddl-specific parts of the source tree.	2008-03-28 00:08:47 +00:00
Doug Rabson	a7ac0db6cb	Regen.	2008-03-26 15:24:02 +00:00
Doug Rabson	dfdcada31e	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. 
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
John Baldwin	5c63b21a1a	Regen.	2008-03-25 19:35:34 +00:00
John Baldwin	30c6422a8a	Add entries for the cpuset-related system calls. The existing system calls can be used on little endian systems. Pointy hat to: jeff	2008-03-25 19:34:47 +00:00
Ruslan Ermilov	d7a38db650	Fix build. Reported by: ache, tinderbox	2008-03-25 13:20:52 +00:00
Roman Divacky	6af821237d	o Add stub support for some new futex operations, so the annoying message is not printed. o Don't warn about FUTEX_FD not being implemented and return ENOSYS instead of 0 (eg. success). o Clear FUTEX_PRIVATE_FLAG as we actually implement only private futexes so there is no reason to return ENOSYS when app asks for a private futex. We don't reject shared futexes because they worked just fine with our implementation so far. Approved by: kib (mentor) Tested by: bsam MFC after: 1 week	2008-03-20 17:03:55 +00:00
Antoine Brodin	afe5acff1b	Simplify fcntl(SVR4_F_DUP2FD) code now that FreeBSD has F_DUP2FD. Approved by: rwatson (mentor)	2008-03-17 18:27:28 +00:00
Roman Divacky	5dfb688191	Implement sched_setaffinity and sched_getaffinity using real cpu affinity setting primitives. Reviewed by: jeff Approved by: kib (mentor)	2008-03-16 16:27:44 +00:00
Jeff Roberson	66257bc8d9	- The P_SA flag has been removed. Don't reference it in a KASSERT.	2008-03-12 22:17:06 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
Konstantin Belousov	a0b0d286bc	Return ENOSYS instead of 0 for the unknown futex operations. Submitted by: rdivacky Reported and tested by: Gary Stanley <gary velocity-servers net>	2008-03-02 14:00:50 +00:00
Konstantin Belousov	cbd2c621f8	Sanitize arguments to linux_mremap(). Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified. Check for the page alignment of the addr argument. Submitted by: rdivacky MFC after: 1 week	2008-02-22 11:47:56 +00:00
Ruslan Ermilov	b95bd24d29	Regenerate for readlink(2).	2008-02-12 20:11:54 +00:00
Ruslan Ermilov	5f56182b6f	Change readlink(2)'s return type and type of the last argument to match POSIX. Prodded by: Alexey Lyashkov	2008-02-12 20:09:04 +00:00
Poul-Henning Kamp	cf827063a9	Give MEXTADD() another argument to make both void pointers to the free function controllable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-conventional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.	2008-02-01 19:36:27 +00:00
Pawel Jakub Dawidek	44ce1efd91	Change type of kmem_used() and kmem_size() functions to uint64_t, so it doesn't overflow in arc.c in this check: if (kmem_used() > (kmem_size() * 4) / 5) return (1); With this bug ZFS almost doesn't cache. Only 32bit machines are affected that have vm.kmem_size set to values >=1GB. Reported by: David Taylor <davidt@yadt.co.uk>	2008-01-24 11:21:54 +00:00
Robert Watson	20c6fe828a	Regenerate.	2008-01-20 23:44:24 +00:00
Robert Watson	6c902059f2	Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls shm_open() and shm_unlink(). More auditing will need to be done for these calls to capture arguments properly.	2008-01-20 23:43:06 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unneeded extra argument and pass curthread explicitly to lower layer functions, when necessary. The KPI is broken by this change, which should affect several ports, so a version bump and manpage update will be committed later. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with 'curthread' passed as argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This change makes the code cleaner and in particular removes an annoying dependence, helping the next lockmgr() cleanup. The KPI, obviously, changes. Manpage and FreeBSD_version will be updated through further commits. As a side note, it is worth saying that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
John Baldwin	4ad6d200d6	Regen for shm_open(2) and shm_unlink(2).	2008-01-08 22:01:26 +00:00
John Baldwin	8e38aeff17	Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())	2008-01-08 21:58:16 +00:00
Konstantin Belousov	d075105da0	After applying LCONVPATH() to the path, do use the converted path instead of original user-mode string in the linux_stat() and linux_lstat() syscalls. Tested by: Peter Holm MFC after: 3 days	2008-01-05 12:36:35 +00:00
Jeff Roberson	397c19d175	Remove explicit locking of struct file. - Introduce a finit() which is used to initialize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
Konstantin Belousov	93eba2d50d	Plug the leaks in the present (hopefully, soon to be replaced) implementation of the linux_openat() for the quick MFC. Reported and tested by: Peter Holm MFC after: 3 days	2007-12-29 14:28:01 +00:00
Konstantin Belousov	15b78ac5d1	Apply the LCONVPATH() to the (old) linux_stat() and linux_lstat() syscalls. Without it, code has two problems: - behaviour of the old and new [l]stat are different with regard of the /compat/linux - directly accessing the userspace data from the kernel asks for the panics. Reported and tested by: Peter Holm Reviewed by: rdivacky MFC after: 3 days	2007-12-29 14:25:29 +00:00
Robert Watson	3de213cc00	Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run. Assign approximate why values to all current consumers of the kdb_enter() interface.	2007-12-25 17:52:02 +00:00
John Baldwin	0a63574164	Bah, remove last vestiges of some statfs conversion fixes that aren't quite ready for CVS yet that snuck into 1.68. Pointy hat to: jhb	2007-12-10 19:42:23 +00:00
Scott Long	d637500d06	Grrr, remove an unused variable missed in the last commit.	2007-12-08 01:41:31 +00:00
Scott Long	7815c9e2db	Don't expect a return value from statfs_scale_blocks().	2007-12-07 22:32:09 +00:00
John Baldwin	8120bb7e3a	Regen.	2007-12-06 23:37:26 +00:00
John Baldwin	695e8d536c	Add freebsd32 compat wrappers for msgctl() and __semctl() using kern_msgctl() and kern_semctl(). MFC after: 1 week	2007-12-06 23:36:57 +00:00
John Baldwin	3c39e0d8d4	Add freebsd32 compat wrappers for msgctl() and _semctl() using kern_msgctl() and kern_semctl(). MFC after: 1 week	2007-12-06 23:35:29 +00:00
John Baldwin	d43c6fa4fe	Move 32-bit SYSV IPC structure definitions into freebsd32_ipc.h. MFC after: 1 week	2007-12-06 23:23:16 +00:00
John Baldwin	74427aa423	Move several data structure definitions out of freebsd32_misc.c and into freebsd32.h instead. MFC after: 1 week	2007-12-06 23:11:27 +00:00
Jung-uk Kim	959a913b87	Remove redundant checks for msgsnd(3) and msgrcv(3). COMPAT_IA32 (implicitly) requires SYSVSEM, SYSVSHM and SYSVMSG in kernel. Pointed out by: jhb	2007-12-04 20:25:41 +00:00
Andrew Thompson	ac740aebcf	Implement functions required by some ndis drivers. NdisIMCopySendPerPacketInfo [1] KeQuerySystemTime [1] KeTickCount [1] strncat [1] KeBugCheckEx Submitted by: Marcin Simonides [1]	2007-12-03 23:43:58 +00:00
Andrew Thompson	e880149eb9	Correct the calculation for the number of 100ns intervals since January 1, 1601. The 1601 - 1970 period was in seconds rather than 100ns units. Remove duplication by having NdisGetCurrentSystemTime call ntoskrnl_time.	2007-12-02 08:54:50 +00:00
Andrew Thompson	f3ad39ccf5	Correct the nwbx_ies field type in struct ndis_wlan_bssid_ex. PR: kern/118369 Submitted by: Weongyo Jeong	2007-12-02 04:04:42 +00:00
Peter Wemm	7628402b07	Move the shared cp_time array (counts %sys, %user, %idle etc) to the per-cpu area. cp_time[] goes away and a new function creates a merged cp_time-like array for things like linprocfs, sysctl etc. The atomic ops for updating cp_time[] in statclock go away, and the scope of the thread lock is reduced. sysctl kern.cp_time returns a backwards compatible cp_time[] array. A new kern.cp_times sysctl returns the individual per-cpu stats. I have pending changes to make top and vmstat optionally show per-cpu stats. I'm very aware that there are something like 5 or 6 other versions "out there" for doing this - but none were handy when I needed them. I did merge my changes with John Baldwin's, and ended up replacing a few chunks of my stuff with his, and stealing some other code. Reviewed by: jhb Partly obtained from: jhb	2007-11-29 06:34:30 +00:00
John Birrell	35a04710d7	Remove some compatibility stuff that we now get from the Solaris header.	2007-11-29 00:15:08 +00:00
John Birrell	57438287ab	Add more OpenSolaris compatibility headers.	2007-11-28 21:50:40 +00:00
John Birrell	eca148b637	Remove an extern that is defined elsewhere.	2007-11-28 21:50:05 +00:00
John Birrell	edadde229a	Add compatibility cruft moved from under _SOLARIS_C_SOURCE in sys/types.h	2007-11-28 21:49:16 +00:00
John Birrell	35ba7f225f	Remove a typedef which was just a hack to avoid including vmem.h. That typedef breaks other Solaris code.	2007-11-28 21:48:25 +00:00
John Birrell	773f4e3849	Add a missing volatile so that the code compiles cleanly.	2007-11-28 21:47:09 +00:00
John Birrell	4fc8feafc7	Rename the definition of lbolt to LBOLT to avoid a clash with a global variable in FreeBSD. Until now lbolt in sys/proc.h has been #ifdef'ed out based on _SOLARIS_C_SOURCE, but that is going away now.	2007-11-28 21:44:17 +00:00
Konstantin Belousov	d60f0a3d6a	Implement LINUX_SIOCGIFCOUNT and LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX. LINUX_SIOCGIFCOUNT just returns 0 since it is not implemented in the Linux 2.6.16. LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX are mapped to the FreeBSD native SIOCGIFINDEX. Tested by: Peter Kostouros <kpeter@melbpc.org.au> Reviewed by: brooks, rpaulo (on net@) Submitted by: rdivacky MFC after: 1 week	2007-11-07 16:42:52 +00:00
Pawel Jakub Dawidek	171eb887e9	Remove "zfs:" prefix from lock and condvar names and also skip non-letter characters (mostly "&"). Because top(1) shows only first six characters of wait channel, without this change we saw only one meaningful character. Requested by: kris & others MFC after: 1 week	2007-11-05 18:40:55 +00:00
Konstantin Belousov	89b57fcf01	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicking or dereferencing NULL. As a consequence, vmspace_exec() and vmspace_unshare() return the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
Pawel Jakub Dawidek	4f2398ea17	- Move crfree() outside MNT_ILOCK()/MNT_IUNLOCK() to eliminate a LOR: 1st 0xc4cea568 struct mount mtx (struct mount mtx) @ /usr/src/sys/modules/zfs/../../compat/opensolaris/kern/opensolaris_vfs.c:209 2nd 0xc3ee9010 sleep mtxpool (sleep mtxpool) @ /usr/src/sys/kern/kern_resource.c:1266 - Move crdup() outside MNT_ILOCK()/MNT_IUNLOCK(), as it can sleep. Reported by: Olli Hauer <ohauer@gmx.de> MFC after: 3 days	2007-11-01 08:58:29 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Kevin Lo	976b010645	Spelling fix for interupt -> interrupt	2007-10-12 06:03:46 +00:00
John Baldwin	977b6507cb	Allow the ia32 resource limits (compat.ia32.max{dsiz,ssiz,vmem} to be set via loader tunables. They are already tunable via sysctl. MFC after: 1 week Approved by: re (kensmith)	2007-09-24 20:49:39 +00:00
David Malone	3ab8526963	The kernel version of Linux statfs64 is actually supposed to take 3 arguments, but we had forgotten the second argument. Also make the Linux statfs64 struct depend on the architecture because it has an extra 4 bytes padding on amd64 compared to i386. The three argument fix is from David Taylor, the struct statfs64 stuff is my fault. With this patch I can install i386 Linux matlab on an amd64 machine. Submitted by: David Taylor <davidt_at_yadt.co.uk> Approved by: re (kensmith)	2007-09-18 19:50:33 +00:00
John Baldwin	cc479dda4a	Rework the routines to convert a 5.x+ statfs structure (with fixed-size 64-bit counters) to a 4.x statfs structure (with long-sized counters). - For block counters, we scale up the block size sufficiently large so that the resulting block counts fit into the long-sized (long for the ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs VOP did this already. This can lie about the block size to 4.x binaries, but it presents a more accurate picture of the ratios of free and available space. - For non-block counters, fix the freebsd32 stats converter to cap the values at INT32_MAX rather than losing the upper 32-bits to match the behavior of the 4.x statfs conversion routine in vfs_syscalls.c Approved by: re (kensmith)	2007-08-28 20:28:12 +00:00
Konstantin Belousov	b6e645c90f	Implement fake linux sched_getaffinity() syscall to enable java to work with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets native scheduler affinity syscalls. Submitted by: rdivacky Reviewed by: jkim Sponsored by: Google Summer of Code 2007 Approved by: re (kensmith)	2007-08-28 12:26:35 +00:00
Pawel Jakub Dawidek	70eaa4219c	Some ZFS threads need a stack larger than the default 8kB, so use 16kB of alternate stack if the default is smaller than 16kB. Approved by: re (rwatson)	2007-08-16 20:33:20 +00:00
David Xu	6ec46f7aa8	Regenerate. Approved by: re(kensmith)	2007-08-16 05:32:26 +00:00
David Xu	81ca5b4257	Add thr_kill2 compat32 syscall. Submitted by: Tijl Coosemans tijl at ulyssis dot org Approved by: re (kensmith)	2007-08-16 05:30:04 +00:00
Robert Watson	0bf686c125	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminating quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
Andrew Thompson	a4e531102e	ndis will signal the kthread to exit and then sleep on the proc pointer to be woken up by kthread_exit. This is racy and in some cases the kthread will exit before ndis gets around to sleep so it will be stuck indefinitely. This change reuses the kq_exit variable to indicate that the thread has gone and will loop on tsleep with a timeout waiting for it. If the kthread has already exited then it will not sleep at all. Approved by: re (rwatson)	2007-07-22 20:53:28 +00:00
John Baldwin	59d8f3ff08	Fix a couple of issues with the stack limit for 32-bit processes on 64-bit kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processes. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk). Approved by: re (kensmith)	2007-07-12 18:01:31 +00:00
Peter Wemm	b77acb8748	Quiet warnings. I believe gcc is incorrect about these. Approved by: re (rwatson)	2007-07-05 07:38:17 +00:00
Peter Wemm	79d5bdcca5	Don't add the 'pad' argument to the mmap/truncate/etc syscalls. Submitted by: kensmith Approved by: re (kensmith)	2007-07-04 23:06:43 +00:00
Peter Wemm	5aa69f9c72	Add compat6 wrapper code for mmap/lseek/pread/pwrite/truncate/ftruncate. Approved by: re (kensmith)	2007-07-04 23:04:41 +00:00
Peter Wemm	486abf939c	Regenerate after mmap/lseek/etc syscall changes Approved by: re (kensmith)	2007-07-04 23:03:50 +00:00
Peter Wemm	b9f3e68f95	Add i386 emulation wrappers for mmap/lseek/etc. These use COMPAT6, so you must use the already existing, already in generic, COMPAT_FREEBSD6 kernel option for running old 32 bit binaries. Approved by: re (kensmith)	2007-07-04 23:02:40 +00:00
Matt Jacob	739c673c8d	Try a cheap way to get around gcc4.2 believing that user arguments to system calls can change across intervening functions.	2007-06-17 04:37:57 +00:00
Ed Maste	1dd702a59a	Remove stale 'XXX implement' comments for syscalls which have since been implemented.	2007-06-15 21:54:26 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Matt Jacob	3a4ac24970	Quiesce warnings by initializing irql values to zero.	2007-06-10 04:40:13 +00:00
Matt Jacob	2ba956ed13	Ensure that newpath is always initialized, even for the error case.	2007-06-10 04:37:22 +00:00
Attilio Rao	a1fe14bc33	rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically, changing their locking requirements (both assume the per-proc spinlock is held) and introducing rufetchcalc, which wraps both calls so they are performed atomically. Reviewed by: jeff Approved by: jeff (mentor)	2007-06-09 21:48:44 +00:00
Attilio Rao	a140976eb4	The current rusage code shows peculiar problems: - Unsafeness on ruadd() in thread_exit() - Unatomicity of thread_exit() in the exit1() operations This patch addresses these problems by allocating p_fd as part of the process and modifying the way it is accessed. A small chunk of this patch resolves a race on p_state in kern_wait(), since we have to be sure about the zombifying process. Submitted by: jeff Approved by: jeff (mentor)	2007-06-09 18:56:11 +00:00
Pawel Jakub Dawidek	3b7917d766	- Reduce number of atomic operations needed to be implemented in asm by implementing some of them using existing ones. - Allow to compile ZFS on all archs and use atomic operations surrounded by global mutex on archs we don't have or can't have all atomic operations needed by ZFS.	2007-06-08 12:35:47 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
David Malone	041b706b2f	Despite several examples in the kernel, the third argument of sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int. Remove the instances where a sizeof(variable) is passed to stop people accidentally cutting and pasting these examples. In a few places sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.	2007-06-04 18:25:08 +00:00
Pawel Jakub Dawidek	b166b92692	Reimplement traverse() helper function: 1. Pass locking flags to VFS_ROOT(). 2. Check v_mountedhere while the vnode is locked. 3. Always return locked vnode on success. Change 1 fixes the problem reported by Stephen M. Rumble - after zfs_vfsops.c,1.9 change, zfs_root() no longer locks the vnode unconditionally and traverse() didn't pass the right lock type to VFS_ROOT(). The result was that the kernel panicked when the .zfs/ directory was accessed via NFS.	2007-06-04 11:31:46 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probably a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Pawel Jakub Dawidek	0d99488ded	There are too many false positive LORs reported by WITNESS, so when ZFS debug is turned off, initialize locks with NOWITNESS flag. At some point I'll get back to them, we would probably need BLESSING functionality, which is currently turned off by default.	2007-05-26 21:37:14 +00:00
Pawel Jakub Dawidek	fbd08bbe6a	DNLC_NO_VNODE can't be NULL. Reported by: ru	2007-05-24 13:44:45 +00:00
Pawel Jakub Dawidek	d4c4dfe96f	FreeBSD's namecache works quite well with ZFS, so remove DNLC.	2007-05-23 21:33:02 +00:00
Olivier Houchard	302e130edc	Remove duplicate includes. Submitted by: Cyril Nguyen Huu <cyril ci0 org>	2007-05-23 13:36:02 +00:00
Konstantin Belousov	1c182de9a9	Move futex support code from <arch>/support.s into linux compat directory. Implement all futex atomic operations in assembler to not depend on the fuword() that does not allow to distinguish between -1 and failure return. Correctly return 0 from atomic operations on success. In collaboration with: rdivacky Tested by: Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz> Sponsored by: Google SoC 2007	2007-05-23 08:33:06 +00:00
Alexander Kabaev	23a29e45cd	Allow FreeBSD's native ELF image activators to execute shared libraries the same way it was enabled for Linux binaries in linuxulator. This allows binaries built with -pie. Many ports auto-detect -fPIE support in GCC 4.2 and build binaries FreeBSD was unable to run.	2007-05-22 02:22:58 +00:00
Jeff Roberson	0ad5e7f326	- Move GDT/LDT locking into a separate spinlock, removing the global scheduler lock from this responsibility. Contributed by: Attilio Rao <attilio@FreeBSD.org> Tested by: jeff, kkenn	2007-05-20 22:03:57 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
John Baldwin	19059a13ed	Rework the support for ABIs to override resource limits (used by 32-bit processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't overwrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit. MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected). MFC after: 1 week	2007-05-14 22:40:04 +00:00
Pawel Jakub Dawidek	57504dcfaf	Share-lock a vnode where possible.	2007-05-02 01:03:10 +00:00
Alan Cox	37f3c8939a	Eliminate the use of Giant from ia64-specific code in freebsd32_mmap().	2007-05-01 17:10:01 +00:00
Alan Cox	4bd4f5a2e2	Synchronize vm map and object accesses. Approved by: des@	2007-05-01 03:09:57 +00:00
Pawel Jakub Dawidek	cc7cd831b2	MFp4: Reduce diff against vendor code: - Move FreeBSD-specific code to zfs_freebsd_*() functions in zfs_vnops.c and keep original functions as similar to vendor's code as possible. - Add various includes back, now that we have them.	2007-04-23 00:52:07 +00:00
Dag-Erling Smørgrav	7621783a55	Now that we're MPSAFE, tell namei() to acquire Giant if necessary.	2007-04-22 08:41:52 +00:00
Pawel Jakub Dawidek	9de81c7273	MFp4: @118370 Correct typo. @118371 Integrate changes from vendor. @118491 Show backtrace on unexpected code paths. @118494 Integrate changes from vendor. @118504 Fix sendfile(2). I had two ways of fixing it: 1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of hacking around with vn_rdwr(UIO_NOCOPY), which was suggested by ups. 2. Modify ZFS behaviour to handle this special case. Although 1 is more correct, I've chosen 2, because the hack from 1 has a side-effect of being faster - it reads ahead MAXBSIZE bytes instead of reading page by page. This is not easy to implement with VOP_GETPAGES(), at least not for me in this very moment. Reported by: Andrey V. Elsukov <bu7cher@yandex.ru> @118525 Reorganize the code to reduce diff. @118526 This code path is expected. It is simply when file is opened with O_FSYNC flag. Reported by: kris Reported by: Michal Suszko <dry@dry.pl>	2007-04-21 12:02:57 +00:00
Pawel Jakub Dawidek	32371d2025	MFp4: Fix automatic snapshot mount when unprivileged user does lookup on a snapshot directory: - Remove PRIV_VFS_MOUNT check - regular users can mount snapshots via lookups on snapshot directory. - Reset mount credential to kcred, so user won't be able to unmount the snapshot. - Reset owner uid. - Unlock vnode in case of a failure. Reported by: simokawa	2007-04-18 15:24:48 +00:00
Pawel Jakub Dawidek	a1bcf4dc7b	- Fix a leftover - vfs_mount_alloc() is now exported properly. This fixes strange panics when listing .zfs/snapshot/ directory for me. Reported by: simokawa Reported by: Johan Hendriks <Johan@double-l.nl> - Hide cache_purge() under FREEBSD_NAMECACHE like in other files. - Protect mnt_flag with mount interlock.	2007-04-17 21:16:34 +00:00
Dag-Erling Smørgrav	78c3440e7d	Whitespace cleanup.	2007-04-15 17:02:03 +00:00
Robert Watson	d72a615878	Some Linux applications (ping) pass a non-NULL msg_control argument to sendmsg() while using a 0-length msg_controllen. This isn't allowed in the FreeBSD system call ABI, so detect this case and set msg_control to NULL. This allows Linux ping to work. Submitted by: rdivacky	2007-04-14 10:35:09 +00:00
Wojciech A. Koszek	f7caeade24	strchr() and strrchr() are already present in the kernel, but with less popular names. Hence: - comment current index() and rindex() functions, as these serve the same functionality as, respectively, strchr() and strrchr() from userland; - add inlined version of strchr() and strrchr(), as we tend to use them more often; - remove str[r]chr() definitions from ZFS code; Reviewed by: pjd Approved by: cognet (mentor)	2007-04-10 21:42:12 +00:00
Scott Long	6eef46be3b	Whitespace fixes	2007-04-10 21:37:37 +00:00
Pawel Jakub Dawidek	2d03e33170	Try to stabilize ZFS with regard to memory consumption: - Allow to shrink ARC down to 16MB (instead of 64MB). - Set arc_max to 1/2 of kmem_map by default. - Start freeing things earlier when low memory situation is detected. - Serialize execution of arc_lowmem(). I decided to setup minimum ZFS memory requirements to 512MB of RAM and 256MB of kmem_map size. If there is less RAM or kmem_map, a warning will be printed. World is cruel, be no better. In other words: modern file system requires modern hardware:) From ZFS administration guide: "Currently the minimum amount of memory recommended to install a Solaris system is 512 Mbytes. However, for good ZFS performance, at least one Gbyte or more of memory is recommended."	2007-04-10 02:35:57 +00:00
Pawel Jakub Dawidek	24bda1641f	Instead of detecting if lock is already initialized based on standard 1 bit check, use more accurate 13 bits check. We had too many false-positives with the standard check. Reported by: mlaier	2007-04-09 01:05:31 +00:00
Pawel Jakub Dawidek	bdebccf9b9	Extend kobj compatibility KPI to support operating on files before and after the root file system is mounted. This is one of the changes that will allow to put root file system on ZFS.	2007-04-08 23:57:08 +00:00
Pawel Jakub Dawidek	ffe54ff0ec	MFp4: Synchronize with recent OpenSolaris changes.	2007-04-08 16:29:25 +00:00
Scott Long	1eba4c7948	Add the CAM 'SG' peripheral device. This device implements a subset of the Linux SCSI SG passthrough device API. The intention is to allow for both running of Linux apps that want to talk to /dev/sg* nodes, and to facilitate porting of apps from Linux to FreeBSD. As such, both native and linuxolator entry points and definitions are provided. Caveats: - This does not support the procfs and sysfs nodes that the Linux SG driver provides. Some Linux apps may rely on these for operation, others may only use them for informational purposes. - More ioctls need to be implemented. - Linux uses a naming scheme of "sg[a-z]" for devices, while FreeBSD uses a scheme of "sg[0-9]". Devfs aliases (symlinks) are automatically created to link the two together. However, tools like camcontrol only see the native names. - Some operations were originally designed to return byte counts or other data directly as the syscall return value. The linuxolator doesn't appear to support this well, so this driver just punts for these cases. Now that the driver is in place, others are welcome to add missing functionality. Thanks to Roman Divacky for pushing this work along.	2007-04-07 19:40:58 +00:00
Jung-uk Kim	6e612eca81	Fix kernel module dependency. linprocfs depends on sysvmsg and sysvsem. Submitted by: nork	2007-04-06 18:15:56 +00:00
Pawel Jakub Dawidek	4d00f78b40	We have strcasecmp() in libkern now.	2007-04-06 11:18:57 +00:00
Pawel Jakub Dawidek	f0a75d274a	Please welcome ZFS - The last word in file systems. ZFS file system was ported from OpenSolaris operating system. The code is under CDDL license. I'd like to thank all SUN developers that created this great piece of software. Supported by: Wheel LTD (http://www.wheel.pl/) Supported by: The FreeBSD Foundation (http://www.freebsdfoundation.org/) Supported by: Sentex (http://www.sentex.net/)	2007-04-06 01:09:06 +00:00
Robert Watson	5e3f7694b1	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are important for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
Jung-uk Kim	357afa7113	MFP4: Turn emul_lock into a mutex. Submitted by: rdivacky	2007-04-02 18:38:13 +00:00
Jung-uk Kim	3dd8390fd9	Use underlying structures instead of kernel_sysctlbyname() for msginfo and seminfo because kernel_sysctlbyname() is slow. There is no dependency problem since linux module depends on both sysvmsg and sysvsem and linprocfs depends on it in turn. Pointed out by: des Reviewed by: des	2007-03-30 17:56:44 +00:00
Jung-uk Kim	a328699b34	MFP4: Linux futex support for amd64. Initial patch was submitted by kib and additional work was done by Divacky Roman. Tested by: emulation	2007-03-30 01:07:28 +00:00
Julian Elischer	6734f35eac	Implement the openat() linux syscall Submitted by: Roman Divacky (rdivacky@) MFC after: 2 weeks	2007-03-29 02:11:46 +00:00
Dag-Erling Smørgrav	771709eb78	Add a pn_destroy field to pfs_node. This field points to a destructor function which is called from pfs_destroy() before the node is reclaimed. Modify pfs_create_{dir,file,link}() to accept a pointer to a destructor function in addition to the usual attr / fill / vis pointers. This breaks both the programming and binary interfaces between pseudofs and its consumers. It is believed that there are no pseudofs consumers outside the source tree, so that the impact of this change is minimal. Submitted by: Aniruddha Bohra <bohra@cs.rutgers.edu>	2007-03-12 12:16:52 +00:00
Robert Watson	b77ad8fc3b	In translate_path_major_minor(), do not calculate otherwise unused 'fp' variable, avoiding an extra locking of the file descriptor array.	2007-03-06 07:39:12 +00:00
Jung-uk Kim	5017af608d	MFP4: 113090, 113130, 113132 Add Linux kernel version strings to /proc/sys/kernel.	2007-03-02 01:10:26 +00:00
Jung-uk Kim	a4e3bad794	MFP4: 115220, 115222 - Fix style(9) and reduce diff between amd64 and i386. - Prefix Linuxulator macros with LINUX_ to prevent future collision.	2007-03-02 00:08:47 +00:00
Alexander Leidinger	8cf5ee2e2a	MFp4 (110541): Sync with rev 1.7 in NetBSD. Obtained from: NetBSD	2007-02-25 12:43:07 +00:00
Alexander Leidinger	f9dac96185	MFp4 (110523, parts which apply cleanly): semi-automatic style(9) The futex stuff already differs a lot (only a small part does not differ) from NetBSD, so we are already way off and can't apply changes from NetBSD automatically. As we need to merge everything by hand already, we can even make the files comply to our world order.	2007-02-25 12:40:35 +00:00
Alexander Leidinger	802e08a360	Partial MFp4 of 114977: Whitespace commit: Fix grammar, spelling and punctuation. Submitted by: "Scot Hetzel" <swhetzel@gmail.com>	2007-02-24 16:49:25 +00:00
Alexander Leidinger	1a26db0a3a	MFp4 (114193 (i386 part), 114194, 114195, 114200): - Don't "return" in linux_clone() after we forked the new process in a case of problems. - Move the copyout of p2->p_pid outside the emul_lock coverage in linux_clone(). - Cache the em->pdeath_signal in a local variable and move the copyout out of the emul_lock coverage. - Move the free() out of the emul_shared_lock coverage in a preparation to switch emul_lock to non-sleepable lock (mutex). Submitted by: rdivacky	2007-02-23 22:39:26 +00:00
Alexander Leidinger	e8b8b834b4	MFp4 (part of 114132): - Fix a LOR caused by holding emul_lock and proctree_lock at once. Submitted by: rdivacky	2007-02-23 22:29:24 +00:00

1676 Commits