freebsd-dev

Author	SHA1	Message	Date
Jung-uk Kim	c9cefec159	Support memory wraparound instead of high memory as VM86 mode does. Suggested by: delphij	2010-03-22 18:43:36 +00:00
Jung-uk Kim	a6c8a9c258	Fix i386 PAE kernel build. Reported by: tinderbox	2010-03-22 17:30:34 +00:00
Ed Schouten	0fef797f4a	Actually make O_DIRECTORY work. According to POSIX open() must return ENOTDIR when the path name does not refer to a path name. Change vn_open() to respect this flag. This also simplifies the Linuxolator a bit.	2010-03-21 20:43:23 +00:00
Jung-uk Kim	ef8201d39d	- Map EBDA if available and add 64KB above 1MB (high memory), just in case. - Print the initial memory map when bootverbose is set. - Change the page fault address format from linear to %cs:%ip style. - Move duplicate code into a newly added function. - Add strictly aligned memory access for distant future. ;-)	2010-03-19 21:15:43 +00:00
Konstantin Belousov	28ad01d2ba	Regen	2010-03-19 11:14:37 +00:00
Konstantin Belousov	f7ae46da1f	Remove empty line. MFC after: 2 weeks	2010-03-19 11:13:42 +00:00
Konstantin Belousov	afde2b6593	Implement compat32 shims for mqueuefs. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:10:24 +00:00
Konstantin Belousov	0e5d5bc279	Implement compat32 shims for ksem syscalls. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:08:43 +00:00
Konstantin Belousov	75d633cbf6	Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding sysv_{msg,sem,shm}.c files. Mark SysV IPC freebsd32 syscalls as NOSTD and add required SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto register/unregister on module load. This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded as modules. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:04:42 +00:00
Konstantin Belousov	4cfc39cfa6	Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to sysv_ipc.c. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:01:51 +00:00
Konstantin Belousov	0687ba3e90	Introduce SYSCALL_INIT_HELPER and SYSCALL32_INIT_HELPER macros and neccessary support functions to allow registering dynamically loaded syscalls from the MOD_LOAD handlers. Helpers handle registration failures semi-automatically. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 10:56:30 +00:00
Konstantin Belousov	99b331a982	FOr SYSCALL_MODULE_HELPER, use "sys/<syscallname>" module name. FOr SYSCALL32_MODULE_HELPER, use "sys32/<syscallname>" module name. This avoids modules name conflict when compat32 syscall does not need shims. Note that SYSCALL_MODULE_HELPER is going to be unused in the tree by several next commits. Suggested by: jhb MFC after: 2 weeks	2010-03-19 10:52:54 +00:00
Konstantin Belousov	c5e4763dd3	Make freebsd32_copyiniov() available outside of freebsd32_misc. MFC after: 2 weeks	2010-03-19 10:49:03 +00:00
Jung-uk Kim	b92184ec0a	Detect illegal access to unmapped memory within real mode emulator to aid debugging. Update copyright date while I am here.	2010-03-18 20:15:34 +00:00
Nathan Whitehorn	6754ffc8a1	Regen after big endian compatibility import.	2010-03-11 14:56:59 +00:00
Nathan Whitehorn	841c0c7ec7	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb	2010-03-11 14:49:06 +00:00
Ed Schouten	ec492b4973	Make /proc/self/fd `work'. On Linux, /proc/<pid>/fd is comparable to fdescfs, where it allows you to inspect the file descriptors used by each process. Glibc's ttyname() works by performing a readlink() on these nodes, since all nodes in this directory are symlinks. It is a bit hard to implement this in linprocfs right now, so I am not going to bother. Add a way to make ttyname(3) work, by adding a /proc/<pid>/fd symlink, which points to /dev/fd only if the calling process matches. When fdescfs is mounted, this will cause the readlink() in ttyname() to fail, causing it to fall back on manually finding a matching node in /dev. Discussed on: emulation@	2010-03-07 10:43:45 +00:00
Joel Dahl	2f7bcda248	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-01 17:20:04 +00:00
Pawel Jakub Dawidek	957d68dd91	No need to include security/mac/mac_framework.h here.	2010-02-18 22:26:01 +00:00
Xin LI	5cb9c68cc9	- Return EAFNOSUPPORT instead of EINVAL for unsupported address family, this matches the Linux behavior. - Check if we have sufficient space allocated for socket structure, which fixes a buffer overflow when wrong length is being passed into the emulation layer. [1] PR: kern/138860 Submitted by: Mateusz Guzik <mjguzik gmail com> Reported by: Alexander Best [1] MFC after: 2 weeks	2010-02-09 22:30:51 +00:00
Ed Schouten	790f66db55	Remove unused LIBCOMPAT keyword from syscalls.master.	2010-02-08 10:02:01 +00:00
Wojciech A. Koszek	edfe497ed4	Let us to use our libusb(3) in Linuxolator. With this change, Linux binaries can work with our libusb(3) when it's compiled against our header files on GNU/Linux system -- this solves the problem with differences between /dev layouts. With ported libusb(3), I am able to use my USB JTAG cable with Linux binaries that support it. Reviewed by: thompsa	2010-01-18 22:46:06 +00:00
Alexander Leidinger	2883eb1ce1	Whitespace change to be able to provide the correct commit log for r202364: ---snip--- Add video clipping support but with the caveats below. Background info: Video clipping allows the user to provide either a series of clip rectangles or a clip bitmap to the driver and have the driver mask the video according to the clipping specs provided. Adding support for clipping to the FreeBSD Linux emulator is problematic because it seems that this feature is not supported by many drivers and therefore it is ignored by many applications. Unfortunately, when not using it, rather than passing in a null clipping list, some apps leave the clipping fields uninitialized, casuing random values to be passed in. In the case where the driver does not use the clipping info, this is not a problem (although it is bad form). But the Linux emulator does not know which drivers will use this and which won't, so the Linux emulator must try to handle this clip list, and deal gracefully with cases where the values seem to be uninitialized. Video clipping info is passed in using the VIDIOCSWIN ioctl in two fields in the video_window structure: the integer clipcount and the pointer clips. How the linuxulator handles this from this commit on: * if (clipcount == VIDEO_CLIP_BITMAP) The clips variable is a void * pointer to a 128625 byte (1024625 bit) memory area containing a bitmap of the clipping area. The pointer in the video_window structure is copied, but no video_clip structures are copied. * if (clipcount > 0 && clipcount <= 16384) The clips variable is pointer to a list of video_clip structures. Up to clipcount structures are copied and passed to the driver. The upper limit of 16384 was imposed here so that user code that does not properly initialize clipcount falls through below and no attempt is made to copy an uninitialized list. This value was found by examining Linux drivers that support the clip list. * else The clipcount is either negative (but not VIDEO_CLIP_BITMAP), zero or positive (> 16384). All these cases are treated as invalid data. Both the clipcount field and clips pointer are forced to zero/NULL and passed to the driver. It should be noted that, at the time of developing this V4L emulator code, the pwc(4) V4L driver does not support clipping. Submitted by: J.R. Oldroyd <fbsd@opal.com> MFC after: 1 month ---snip---	2010-01-15 15:38:31 +00:00
Alexander Leidinger	0f6800b944	This is v4l support for the linuxulator. This allows to access FreeBSD native devices which support the v4l API from processes running within the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd driver. Not tested is firmware upload, framebuffer stuff and video tuner stuff due to lack of hardware. The clipping part (VIDIOCSWIN) needs a little bit of further work (partly in progress, but can not be tested due to lack of a suitable device). The submitter tested this sucessfully with Skype and flash apps on amd64 and i386 with the multimedia/pwcbsd driver. Submitted by: J.R. Oldroyd <fbsd@opal.com>	2010-01-15 14:58:19 +00:00
Brooks Davis	3ef5ae2dde	Since all other comparisons involving ngroups_max use "ngroups_max + 1", use ">= ngroups_max+1" instead of the equivalent "> ngroups_max" to reduce confusion.	2010-01-15 07:05:00 +00:00
Brooks Davis	412f9500e2	Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to INT_MAX-1. Given that the Windows group limit is 1024, this range should be sufficient for most applications. MFC after: 1 month	2010-01-12 07:49:34 +00:00
Kirk McKusick	e268f54cb4	Background: When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes. Solution: This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are: setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it. As with all other fsck sysctls, these new ones may only be used by processes with appropriate priviledge. Reported by: jeff Security issues: rwatson	2010-01-11 20:44:05 +00:00
Martin Blapp	c2ede4b379	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
Konstantin Belousov	9ae781dfcf	Signal 0 is used to check the permission for current process to signal target one. Since r184058, linux_do_tkill() calls tdsignal() instead of kill(), without checking for validity of supplied signal number. Prevent panic when supplied signal is 0 by finishing work after checks. Found and tested by: scf MFC after: 3 days	2009-12-18 14:27:18 +00:00
Warner Losh	8ffab8645b	Revert 200606.	2009-12-16 21:53:56 +00:00
Warner Losh	42b3331b8a	Fix compiling FREEBSD_COMPAT[4,5,6] without FREEBSD_COMPAT7. Note: Not sure this is the right way to do compat, but it makes the headers consistent with the implementations.	2009-12-16 17:17:40 +00:00
Jung-uk Kim	3afa8e569e	Add two new debugging tunables for x86bios instead of abusing bootverbose, i.e., debug.x86bios.call and debug.x86bios.int.	2009-12-15 22:44:28 +00:00
Konstantin Belousov	9afcc9986c	Regenerate.	2009-12-04 21:53:20 +00:00
Konstantin Belousov	195edf8a5a	Add several syscall compat32 entries for acl manipulation. They do not require translation of the arguments. Tested by: bsam MFC after: 1 week	2009-12-04 21:52:31 +00:00
Alexander Leidinger	7b6bedd3a7	This is v4l support for the linuxulator. This allows to access FreeBSD native devices which support the v4l API from processes running within the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd driver. Not tested is firmware upload, framebuffer stuff and video tuner stuff due to lack of hardware. The clipping part (VIDIOCSWIN) needs a little bit of further work (partly in progress, but can not be tested due to lack of a suitable device). The submitter tested this sucessfully with Skype and flash apps on amd64 and i386 with the multimedia/pwcbsd driver. Submitted by: J.R. Oldroyd <fbsd@opal.com>	2009-12-04 21:06:54 +00:00
Alexander Leidinger	63f743fb25	Import the unchanged v4l videodev.h from the vendor branch.	2009-12-04 20:46:45 +00:00
Ed Schouten	0ab939d261	Include <sys/tty.h> instead of <sys/termios.h>. Right now <sys/termios.h> includes <sys/ttycom.h>, which provides the TTY ioctls to the svr4 code. We need both struct termios and the ioctls, so include <sys/tty.h> for now.	2009-11-28 16:30:06 +00:00
Alexander Leidinger	f3d62ac43d	Fix typo in kernel message. The fix is based upon the patch in the PR. PR: kern/140279 Submitted by: Alexander Best <alexbestms@math.uni-muenster.de> MFC after: 1 week	2009-11-05 07:37:48 +00:00
Rui Paulo	060ed74b53	Revert a functional change that snuck in.	2009-11-02 19:13:12 +00:00
Rui Paulo	f8b8dab2eb	Fix a non-style change that snuck in. Spotted by: danfe	2009-11-02 18:51:24 +00:00
Rui Paulo	99081d1c61	Big style cleanup. While there remove references to FreeBSD versions older than 6.0. Submitted by: Paul B Mahol <onemda at gmail.com>	2009-11-02 11:07:42 +00:00
Konstantin Belousov	063e9958a6	Regenerate	2009-10-27 11:02:04 +00:00
Konstantin Belousov	066d836b02	Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:55:34 +00:00
Konstantin Belousov	d6e029adbe	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:47:58 +00:00
Konstantin Belousov	84440afb54	In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery. Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop. Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Convertion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally correct in between. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:42:24 +00:00
Bjoern A. Zeeb	d97bee3e7e	Unconditionally call the setsockopt for IPV6_V6ONLY for v6 linux sockets no matter whether we are compiled as module or if our default of the net.inet6.ip6.v6only sysctl already matches what we would set. This avoids unnecessary complications with modules, VIMAGES, INET6 and the sysctl value, especially considering that most users will use linux compat as a module. Discussed with: kib, rwatson (weeks ago) Reviewed by: rwatson MFC after: 6 weeks	2009-10-25 09:58:56 +00:00
Jung-uk Kim	5a0a9182fe	Fix a copy-and-pasto in the previous commit.	2009-10-19 21:01:42 +00:00
Jung-uk Kim	3219f535d9	Rewrite x86bios and update its dependent drivers. - Do not map entire real mode memory (1MB). Instead, we map IVT/BDA and ROM area separately. Most notably, ROM area is mapped as device memory (uncacheable) as it should be. User memory is dynamically allocated and free'ed with contigmalloc(9) and contigfree(9). Remove now redundant and potentially dangerous x86bios_alloc.c. If this emulator ever grows to support non-PC hardware, we may implement it with rman(9) later. - Move all host-specific initializations from x86emu_util.c to x86bios.c and remove now unnecessary x86emu_util.c. Currently, non-PC hardware is not supported. We may use bus_space(9) later when the KPI is fixed. - Replace all bzero() calls for emulated registers with more obviously named x86bios_init_regs(). This function also initializes DS and SS properly. - Add x86bios_get_intr(). This function checks if the interrupt vector is available for the platform. It is not necessary for PC-compatible hardware but it may be needed later. ;-) - Do not try turning off monitor if DPMS does not support the state. - Allocate stable memory for VESA OEM strings instead of just holding pointers to them. They may or may not be accessible always. Fix a memory leak of video mode table while I am here. - Add (experimental) BIOS POST call for vesa(4). This function calls VGA BIOS POST code from the current VGA option ROM. Some video controllers cannot save and restore the state properly even if it is claimed to be supported. Usually the symptom is blank display after resuming from suspend state. If the video mode does not match the previous mode after restoring, we try BIOS POST and force the known good initial state. Some magic was taken from NetBSD (and it was taken from vbetool, I believe.) - Add a loader tunable for vgapci(4) to give a hint to dpms(4) and vesa(4) to identify who owns the VESA BIOS. This is very useful for multi-display adapter setup. By default, the POST video controller is automatically probed and the tunable "hw.pci.default_vgapci_unit" is set to corresponding vgapci unit number. You may override it from loader but it is very unlikely to be necessary. Unfortunately only AGP/PCI/PCI-E controllers can be matched because ISA controller does not have necessary device IDs. - Fix a long standing bug in state save/restore function. The state buffer pointer should be ES:BX, not ES:DI according to VBE 3.0. If it ever worked, that's because BX was always zero. :-) - Clean up register initializations more clearer per VBE 3.0. - Fix a lot of style issues with vesa(4).	2009-10-19 20:58:10 +00:00
Bjoern A. Zeeb	52bf2041ac	Make sure that the primary native brandinfo always gets added first and the native ia32 compat as middle (before other things). o(ld)brandinfo as well as third party like linux, kfreebsd, etc. stays on SI_ORDER_ANY coming last. The reason for this is only to make sure that even in case we would overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo would still be there and the system would be operational. Reviewed by: kib MFC after: 1 month	2009-10-03 11:57:21 +00:00
Robert Watson	718dbcfeaa	Regenerate system call files following r197636.	2009-09-30 08:48:59 +00:00
Robert Watson	72c35fc69c	Reserve system call numbers for Capsicum security framework capabilities, capability mode, and process descriptors: cap_new, cap_getrights, cap_enter, cap_getmode, pdfork, pdkill, pdgetpid, and pdwait. Obtained from: TrustedBSD Project Sponsored by: Google MFC after: 3 weeks	2009-09-30 08:46:01 +00:00
Xin LI	3d0cd00514	Use a 2 clause BSD-style license instead of stating the code as public domain, as requested by core@ and reviewed by the author.	2009-09-28 08:14:15 +00:00
Jung-uk Kim	6aabed1c77	- Reduce BIOS memory mapping. We want 1MB of physical memory, not 12MB[1]. - Remove CS and IP registers from x86bios.h. They have no use for us. - Adjust register dump to make it little bit more useful for debugging. Submitted by: paradox (ddkprog yahoo com)[1] (initial version)	2009-09-25 17:56:32 +00:00
Jung-uk Kim	5ec510d8f4	Dump real mode registers under bootverbose to help debugging BIOS emulator.	2009-09-24 22:42:35 +00:00
Jung-uk Kim	a867274808	- Use FreeBSD function naming convention. - Change x86biosCall() to more appropriate x86bios_intr().[1] Discussed with: delphij, paradox (ddkprog yahoo com) Submitted by: paradox (ddkprog yahoo com)[1]	2009-09-24 19:24:42 +00:00
Jung-uk Kim	19de5df5e5	Move sys/dev/x86bios to sys/compat/x86bios. It may not be optimal but it is clearly better than the old place. OK'ed by: delphij, paradox (ddkprog yahoo com)	2009-09-23 20:49:14 +00:00
Marko Zec	ed539ef656	Lock the ifnet list while iterating over it. Submitted by: julian MFC after: 3 days	2009-09-13 21:30:18 +00:00
Dag-Erling Smørgrav	80c03b8eee	As jhb@ pointed out to me, r197057 was incorrect, not least because these are generated files.	2009-09-10 13:20:27 +00:00
Dag-Erling Smørgrav	edefbdd7cb	If a certain feature that was present in FreeBSD 7 was removed or changed in FreeBSD 8, the compatibility shims should be built not just when FreeBSD 7 compatibility is requested, but also when compatibility with any older FreeBSD version where that feature was present is requested.o Without this patch, a kernel config that sets COMPAT_FREEBSD6 but not 7 would fail to build due to inconsistencies between the declaration of the compatibility shims and their use in the SysV code. There are similar errors in other proto.h headers in the tree. MFC after: 3 weeks	2009-09-10 08:33:28 +00:00
Konstantin Belousov	b55ef216fe	kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in or out 4 bytes more then the process expected. Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI. Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week	2009-09-09 20:59:01 +00:00
Bjoern A. Zeeb	ecc2fda872	Make sure FreeBSD binaries without .note.ABI-tag section work correctly and do not match a colliding Debian GNU/kFreeBSD brandinfo statements. For this mark the Debian GNU/kFreeBSD brandinfo that it must have an .note.ABI-tag section and ignore the old EI_OSABI brandinfo when comparing a possibly colliding set of options. Due to SYSINIT we add the brandinfo in a non-deterministic order, so native FreeBSD is not always first. We may want to consider to force native FreeBSD to come first as well. The only way a problem could currently be noticed is when running an i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD brandinfo was matched first, as the fallback to ld-elf32.so.1 does not exist in that case. Reported and tested by: ticso In collaboration with: kib MFC after: 3 days	2009-08-30 14:38:17 +00:00
Marko Zec	a26f987f5d	Fix a few panics in linuxulator + VIMAGE due to curvnet not being set. This change affects only options VIMAGE builds. Reviewed by: julian MFC after: 3 days	2009-08-28 22:51:07 +00:00
Bjoern A. Zeeb	89ffc202d6	Fix handling of .note.ABI-tag section for GNU systems [1]. Handle GNU/Linux according to LSB Core Specification 4.0, Chapter 11. Object Format, 11.8. ABI note tag. Also check the first word of desc, not only name, according to glibc abi-tags specification to distinguish between Linux and kFreeBSD. Add explicit handling for Debian GNU/kFreeBSD, which runs on our kernels as well [2]. In {amd64,i386}/trap.c, when checking osrel of the current process, also check the ABI to not change the signal behaviour for Linux binary processes, now that we save an osrel version for all three from the lists above in struct proc [2]. These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD and Linux binaries on the same machine again for at least i386 and amd64, and no longer break kFreeBSD which was detected as GNU(/Linux). PR: kern/135468 Submitted by: dchagin [1] (initial patch) Suggested by: kib [2] Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD Reviewed by: kib MFC after: 3 days	2009-08-24 16:19:47 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
John Baldwin	8e3764c0a6	Fix the freebsd32 versions of semsys(), shmsys(), and msgsys() to use the old ABI versions of the relevant control system call (e.g. freebsd7_freebsd32_msgctl() instead of freebsd32_msgctl() for msgsys()). Approved by: re (kib)	2009-07-27 16:03:04 +00:00
Jamie Gritton	7cbf72137f	Some jail parameters (in particular, "ip4" and "ip6" for IP address restrictions) were found to be inadequately described by a boolean. Define a new parameter type with three values (disable, new, inherit) to handle these and future cases. Approved by: re (kib), bz (mentor) Discussed with: rwatson	2009-07-25 14:48:57 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Edward Tomasz Napierala	51334c8257	Regen the freebsd32 parts. Approved by: re (kib)	2009-07-08 16:30:34 +00:00
Edward Tomasz Napierala	92f76afe3a	Fix freebsd32 version of lpathconf(2). Approved by: re (kib)	2009-07-08 16:26:43 +00:00
Edward Tomasz Napierala	c38898116a	There is an optimization in chmod(1), that makes it not to call chmod(2) if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't. This change adds lpathconf(3) to make it possible to solve that problem in a clean way. Reviewed by: rwatson (earlier version) Approved by: re (kib)	2009-07-08 15:23:18 +00:00
Robert Watson	14961ba789	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
Weongyo Jeong	58f30773d8	provides a extra write buffer when the NDIS driver want to send a request whose body has some datas through the default pipe. Tested by: Nikos Vassiliadis <nvass9573 at gmx.com>	2009-06-26 01:42:41 +00:00
John Baldwin	3899e0dfe7	Regen.	2009-06-24 21:54:08 +00:00
John Baldwin	b648d4806b	Change the ABI of some of the structures used by the SYSV IPC API: - The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned short. - The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned short. - The mode member of struct ipc_perm is now mode_t instead of unsigned short (this is merely a style bug). - The rather dubious padding fields for ABI compat with SV/I386 have been removed from struct msqid_ds and struct semid_ds. - The shm_segsz member of struct shmid_ds is now a size_t instead of an int. This removes the need for the shm_bsegsz member in struct shmid_kernel and should allow for complete support of SYSV SHM regions >= 2GB. - The shm_nattch member of struct shmid_ds is now an int instead of a short. - The shm_internal member of struct shmid_ds is now gone. The internal VM object pointer for SHM regions has been moved into struct shmid_kernel. - The existing __semctl(), msgctl(), and shmctl() system call entries are now marked COMPAT7 and new versions of those system calls which support the new ABI are now present. - The new system calls are assigned to the FBSD-1.1 version in libc. The FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls. - A simplistic framework for tagging system calls with compatibility symbol versions has been added to libc. Version tags are added to system calls by adding an appropriate __sym_compat() entry to src/lib/libc/incldue/compat.h. [1] PR: kern/16195 kern/113218 bin/129855 Reviewed by: arch@, rwatson Discussed with: kan, kib [1]	2009-06-24 21:10:52 +00:00
John Baldwin	3c366f1f14	Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.	2009-06-24 13:36:37 +00:00
Bjoern A. Zeeb	5736e6fb9d	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
Andrew Thompson	8f9e0ef947	Fix a typeo in the frame len function to unbreak the build, make it shorter while I am here.	2009-06-23 06:00:31 +00:00
Andrew Thompson	ed6d949afd	- Make struct usb_xfer opaque so that drivers can not access the internals - Reduce the number of headers needed for a usb driver, the common case is just usb.h and usbdi.h	2009-06-23 02:19:59 +00:00
John Baldwin	e47a833f38	Regen.	2009-06-22 20:24:03 +00:00
John Baldwin	0b0fe06a5e	Fix a typo in a comment.	2009-06-22 20:12:40 +00:00
Brooks Davis	838d985825	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
John Baldwin	4b7b144f63	Regen.	2009-06-17 19:53:47 +00:00
John Baldwin	21def99b51	- Add the ability to mix multiple flags seperated by pipe ('\|') characters in the type field of system call tables. Specifically, one can now use the 'NO' types as flags in addition to the 'COMPAT' types. For example, to tag 'COMPAT' system calls as living in a KLD via NOSTD. The COMPAT type is required to be listed first in this case. - Add new functions 'type()' and 'flag()' to the embedded awk script in makesyscalls.sh that return true if a requested flag is found in the type field ($3). The flag() function checks all of the flags in the field, but type() only checks the first flag. type() is meant to be used in the top-level "switch" statement and flag() should be used otherwise. - Retire the CPT_NOA type, it is now replaced with "COMPAT\|NOARGS" using the flags approach. - Tweak the comment descriptions of COMPAT[46] system calls so that they say "freebsd[46] foo" rather than "old foo". - Document the COMPAT6 type. - Sync comments in compat32 syscall table with the master table.	2009-06-17 19:50:38 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
John Baldwin	6653a58307	Regen.	2009-06-15 20:40:23 +00:00
John Baldwin	c4f16b69e1	Add a new 'void closefrom(int lowfd)' system call. When called, it closes any open file descriptors >= 'lowfd'. It is largely identical to the same function on other operating systems such as Solaris, DFly, NetBSD, and OpenBSD. One difference from other *BSD is that this closefrom() does not fail with any errors. In practice, while the manpages for NetBSD and OpenBSD claim that they return EINTR, they ignore internal errors from close() and never return EINTR. DFly does return EINTR, but for the common use case (closing fd's prior to execve()), the caller really wants all fd's closed and returning EINTR just forces callers to call closefrom() in a loop until it stops failing. Note that this implementation of closefrom(2) does not make any effort to resolve userland races with open(2) in other threads. As such, it is not multithread safe. Submitted by: rwatson (initial version) Reviewed by: rwatson MFC after: 2 weeks	2009-06-15 20:38:55 +00:00
Jamie Gritton	9ed47d01eb	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
Andrew Thompson	a593f6b8de	s/usb2_/usb_\|usbd_/ on all function names for the USB stack.	2009-06-15 01:02:43 +00:00
Dmitry Chagin	0046fd5dd9	Unlock process lock when return error from getrobustlist call. Tested by: Alexander Best <alexbestms at math uni-muenster de> Approved by: kib (mentor) MFC after: 3 days	2009-06-14 17:53:55 +00:00
Jamie Gritton	7455b100af	Add counterparts to getcredhostname: getcreddomainname, getcredhostuuid, getcredhostid Suggested by: rmacklem Approved by: bz	2009-06-13 00:12:02 +00:00
Konstantin Belousov	859357f2b3	Regenerate	2009-06-10 13:48:43 +00:00
Konstantin Belousov	f68be132fe	Add several syscall compat32 entries for extattr manipulation syscalls, that do not require translation of the arguments. Requested by: kientzle Reviewed by: jhb (previous wrong version) MFC after: 1 week	2009-06-10 13:48:13 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Andrew Thompson	ae60fdfba2	Rename usb pipes to endpoints as it better represents what they are, and struct usb_pipe may be used for a different purpose later on.	2009-06-07 19:41:11 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Dmitry Chagin	f83427b833	Add forgotten in previous commit flags argument. Approved by: kib (mentor) MFC after: 1 month	2009-06-01 20:54:41 +00:00
Dmitry Chagin	f8cd0af232	Implement accept4 syscall. Approved by: kib (mentor) MFC after: 1 month	2009-06-01 20:48:39 +00:00
Dmitry Chagin	93e694c9df	Implement a variation of the accept_common() which takes a flags argument. Do not preserve td_retval before kern_fcntl(F_SETFL) as it does not changed. Approved by: kib (mentor) MFC after: 1 month	2009-06-01 20:44:58 +00:00
Dmitry Chagin	c8f37d612d	Split linux_accept() syscall onto linux_accept_common() which should be used by linuxulator and linux_accept() itself. Approved by: kib (mentor) MFC after: 1 month	2009-06-01 20:42:27 +00:00
Robert Watson	33dd50646e	Regenerate generated syscall files following changes to struct sysent in r193234.	2009-06-01 16:14:38 +00:00
Dmitry Chagin	39253cf9bb	Implement a variation of the socketpair() syscall which takes a flags in addition to the type argument. Approved by: kib (mentor) MFC after: 1 month	2009-05-31 12:16:31 +00:00
Dmitry Chagin	38a18e9760	Move new socket flags handling into a separate function as Linux introduced more syscalls which uses these flags. Approved by: kib (mentor) MFC after: 1 month	2009-05-31 12:04:01 +00:00
Dmitry Chagin	20a4ff27b0	Remove empty lines. Approved by: kib (mentor) MFC after: 1 month	2009-05-31 12:00:16 +00:00
Xin LI	67feef6e5d	Attempt to fix build by updating hostid to follow the new world order.	2009-05-30 07:33:32 +00:00
Jamie Gritton	76ca6f88da	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
Andrew Thompson	e0a69b51ac	s/usb2_/usb_/ on all typedefs for the USB stack.	2009-05-29 18:46:57 +00:00
Xin LI	b6151caa3a	Implement SI_ISALIST. PR: kern/91293 Submitted by: "Pedro f. Giffuni" <giffunip asme org> Obtained from: NetBSD	2009-05-29 06:27:30 +00:00
Xin LI	db4bab7fb9	Fix the sysinfo(SI_HW_SERIAL, emulation so that we actually get the hostid of the machine rather than always getting "0". PR: kern/91293 Submitted by: "Pedro f. Giffuni" <giffunip asme org> Obtained from: NetBSD	2009-05-29 06:19:37 +00:00
Xin LI	818c7c128e	copyinstr(9) takes parameter 'len' as a size_t , not int . PR: kern/91293 Submitted by: "Pedro f. Giffuni" <giffunip asme org> Obtained from: NetBSD	2009-05-29 06:04:26 +00:00
Xin LI	329f10dc21	de-register. Submitted by: "Pedro f. Giffuni" <giffunip asme org> Obtained from: NetBSD PR: kern/91293	2009-05-29 05:58:46 +00:00
Xin LI	273bcf6d0c	svr4_sys_getdents64() should not assume that the cookie would exist everywhere. PR: kern/91293 Submitted by: "Pedro f. Giffuni" <giffunip asme org> Obtained from: NetBSD	2009-05-29 05:51:19 +00:00
Xin LI	531ad6344e	Add new sysconfig bits, Fix the bogus numbering of the old bits. Submitted by: "Pedro f. Giffuni" <giffunip asme org> Obtained from: NetBSD PR: kern/91293	2009-05-29 05:37:27 +00:00
Xin LI	90db92dccd	Use strlcpy().	2009-05-28 21:12:43 +00:00
Andrew Thompson	760bc48e7e	s/usb2_/usb_/ on all C structs for the USB stack.	2009-05-28 17:36:36 +00:00
Andriy Gapon	93f0eafde3	linux_ioctl_cdrom: reduce stack usage ... by moving two ~2KB structures from stack to heap allocation. I experienced stack overflow in linux emulation on i386 (8K stack) when LINUX_DVD_READ_STRUCT ioctl was performed on atapicam cd device and there was an error that resulted in additional quite heavy stack use in cam layer. Reviewed by: dchagin Approved by: jhb (mentor)	2009-05-27 15:23:12 +00:00
Jamie Gritton	0304c73163	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
Antoine Brodin	ad0498a383	Remove an unused variable.	2009-05-24 18:35:53 +00:00
John Baldwin	31bd8c7dcf	Comment nits.	2009-05-20 18:36:17 +00:00
John Baldwin	63ba66086a	Put the vnode returned from namei() immediately after namei() returns in svr4_sys_resolvepath().	2009-05-20 18:25:16 +00:00
Dmitry Chagin	ea7b81d2bd	Validate user-supplied arguments values. Args argument is a pointer to the structure located in user space in which the socketcall arguments are packed. The structure must be copied to the kernel instead of direct dereferencing. Approved by: kib (mentor) MFC after: 1 week	2009-05-19 09:10:53 +00:00
Dmitry Chagin	3a72bf04c4	Implement MSG_CMSG_CLOEXEC flag for linux_recvmsg(). Approved by: kib (mentor) MFC after: 1 month	2009-05-18 04:07:46 +00:00
Dmitry Chagin	3933bde22e	Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and SOCK_NONBLOCK flags, that allow to save fcntl() calls. Implement a variation of the socket() syscall which takes a flags in addition to the type argument. Approved by: kib (mentor) MFC after: 1 month	2009-05-16 18:48:41 +00:00
Dmitry Chagin	eeb63e515f	Return EINVAL in case when the incorrect or unsupported type argument is specified. Do not map type argument value as its Linux values are identical to FreeBSD values. Approved by: kib (mentor)	2009-05-16 18:46:51 +00:00
Dmitry Chagin	6994ea543f	Use the protocol family constants for the domain argument validation. Return immediately when the socket() failed. Approved by: kib (mentor) MFC after: 1 month	2009-05-16 18:44:56 +00:00
Dmitry Chagin	d4dd69c46c	Emulate SO_PEERCRED socket option. Temporarily use 0 for pid member as the FreeBSD does not cache remote UNIX domain socket peer pid. PR: kern/102956 Reviewed by: rwatson Approved by: kib (mentor) MFC after: 1 month	2009-05-16 18:42:18 +00:00
Christian Brueffer	bf15937833	Remove an unused variable. Found with: Coverity Prevent(tm) CID: 1167	2009-05-14 09:28:02 +00:00
Christian Brueffer	c3c1eab368	Fix memory leak in an error case. Found with: Coverity Prevent(tm) CID: 371 MFC after: 2 weeks	2009-05-13 08:50:13 +00:00
Dmitry Chagin	03cc95d21a	Translate l_timeval arg to native struct timeval in linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO, SO_SNDTIMEO opts as l_timeval has MD members. Remove bogus __packed attribute from l_timeval struct on __amd64__. PR: kern/134276 Submitted by: Thomas Mueller <tmueller sysgo com> Approved by: kib (mentor) MFC after: 2 weeks	2009-05-11 13:50:42 +00:00
Dmitry Chagin	3980a435a2	Add forgotten linux to bsd flags argument mapping into the linux_recv(). PR: kern/134276 Submitted by: Thomas Mueller <tmueller sysgo com> Approved by: kib (mentor) MFC after: 2 weeks	2009-05-11 13:42:40 +00:00
Dmitry Chagin	8d30f381ef	Do not export AT_CLKTCK when emulating Linux kernel prior to 2.4.0, as it has appeared in the 2.4.0-rc7 first time. Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK), glibc falls back to the hard-coded CLK_TCK value when aux entry is not present. Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value. For older applications/libc's which depends on hard-coded CLK_TCK value user should set compat.linux.osrelease less than 2.4.0. Approved by: kib (mentor)	2009-05-10 18:43:43 +00:00
Dmitry Chagin	580dd797fd	Introduce linux_kernver() interface which is intended for an exact designation of the emulated kernel version. linux_kernver() returns integer value formatted as 'VVVMMMIII' where VVV - version, MMM - major revision, III - minor revision. Approved by: kib (mentor)	2009-05-10 18:27:20 +00:00
Dmitry Chagin	1ca16454b3	Rework r189362, r191883. The frequency of the statistics clock is given by stathz. Use stathz if it is available, otherwise use hz. Pointed out by: bde Approved by: kib (mentor)	2009-05-10 18:16:07 +00:00
Ed Schouten	e5a34e5582	Regenerate system call tables to use SVN ids.	2009-05-08 20:16:04 +00:00
Ed Schouten	51e6ba0f00	Burn TTY ioctl bridges in compat layers. I really don't want any pieces of code to include ioctl_compat.h, so let the ibcs2 and svr4 compat leave sgtty alone. If they want to support sgtty, they should emulate it on top of termios, not sgtty. The code has been marked with BURN_BRIDGES for a long time. ibcs2 and svr4 are not really popular pieces of code anyway.	2009-05-08 20:06:37 +00:00
Marko Zec	29b02909eb	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)	2009-05-08 14:11:06 +00:00
Jamie Gritton	e03d223bd4	Give vfs_getopt the type it's expecting. Write 100 times: "32 bits is so twentieth century." Noticed by: dchagin	2009-05-07 19:46:29 +00:00
Jamie Gritton	7ae27ff49f	Move the per-prison Linux MIB from a private one-off pointer to the new OSD-based jail extensions. This allows the Linux MIB to accessed via jail_set and jail_get, and serves as a demonstration of adding jail support to a module. Reviewed by: dchagin, kib Approved by: bz (mentor)	2009-05-07 18:36:47 +00:00
Dmitry Chagin	ca8c3e7bba	Add KTR(9) tracing for futex emulation. Approved by: kib (mentor) MFC after: 1 month	2009-05-07 16:14:31 +00:00
Dmitry Chagin	c65b9bfa3c	Linux exports HZ value to user space via AT_CLKTCK auxiliary vector entry, which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is not exported, Glibc uses 100. linux_times() shall use the value that is exported to user space. Pointyhat to: dchagin PR: kern/134251 Approved by: kib (mentor) MFC after: 2 weeks	2009-05-07 14:24:50 +00:00
Dmitry Chagin	4d706dcc08	Change linux struct tms definition to match actual linux one. Approved by: kib (mentor) MFC after: 2 weeks	2009-05-07 12:55:58 +00:00
Dmitry Chagin	4ec3ea90eb	Add preliminary KTR(9) support to the linux emulation layer. Approved by: kib (mentor) MFC after: 1 month	2009-05-07 10:01:05 +00:00
Dmitry Chagin	13f20d7e86	To avoid excessive code duplication move MI definitions to the MI header file. As it is defined in Linux. Approved by: kib (mentor) MFC after: 1 month	2009-05-07 09:39:20 +00:00
Dmitry Chagin	d9b063cc9d	Return EAFNOSUPPORT instead of EINVAL in case when the incorrect or unsupported domain argument is specified. Approved by: kib (mentor)	2009-05-07 09:34:02 +00:00
Dmitry Chagin	1a52a4abf7	Rework r191742. Use the protocol family constants for the domain argument validation. Return EAFNOSUPPORT in case when the incorrect domain argument is specified. Return EPROTONOSUPPORT instead of passing values that are not 0 to the BSD layer. Suggested by: rwatson Approved by: kib (mentor) MFC after: 1 month	2009-05-07 03:23:22 +00:00
Jamie Gritton	84a8cad0f6	Mark Linux MIB sysctls MPSAFE. Reviewed by: dchagin, kib Approved by: bz (mentor)	2009-05-04 19:06:05 +00:00
Dmitry Chagin	40092d93b4	Linux socketpair() call expects explicit specified protocol for AF_LOCAL domain unlike FreeBSD which expects 0 in this case. Approved by: kib (mentor) MFC after: 1 month	2009-05-02 10:51:40 +00:00
Dmitry Chagin	d789bfd562	Move extern variable definitions to the header file. Approved by: kib (mentor) MFC after: 1 month	2009-05-02 10:06:49 +00:00
Dmitry Chagin	79262bf1f0	Reimplement futexes. Old implemention used Giant to protect the kernel data structures, but at the same time called malloc(M_WAITOK), that could cause the calling thread to sleep and lost Giant protection. User-visible result was the missed wakeup. New implementation uses one sx lock per futex. The sx protects the futex structures and allows to sleep while copyin or copyout are performed. Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation is requested and either caller specified futexes are equial or second futex already exists. This is acceptable since the situation can only occur from the application error, and glibc falls back to old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error. Approved by: kib (mentor) MFC after: 1 month	2009-05-01 15:36:02 +00:00
Jamie Gritton	fe2f3c651f	Regen for new jail system calls in r191673. Approved by: bz (mentor)	2009-04-29 21:50:13 +00:00
Jamie Gritton	b38ff370e4	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)	2009-04-29 21:14:15 +00:00
Marko Zec	093f25f8c8	In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)	2009-04-26 22:06:42 +00:00
Dmitry Chagin	b1121623d2	Remove support for FUTEX_REQUEUE operation. Glibc does not use this operation since 2.3.3 version (Jun 2004), as it is racy and replaced by FUTEX_CMP_REQUEUE operation. Glibc versions prior to 2.3.3 fall back to FUTEX_WAKE when FUTEX_REQUEUE returned EINVAL. Any application directly using FUTEX_REQUEUE without return value checking are definitely broken. Limit quantity of messages per process about unsupported operation. Approved by: kib (mentor) MFC after: 1 month	2009-04-19 13:48:42 +00:00
Andrew Thompson	4eae601ebd	MFp4 //depot/projects/usb@159909 - make usb2_power_mask_t 16-bit - remove "usb2_config_sub" structure from "usb2_config". To compensate for this "usb2_config" has a new field called "usb_mode" which select for which mode the current xfer entry is active. Options are: a) Device mode only b) Host mode only (default-by-zero) c) Both modes. This change was scripted using the following sed script: "s/\.mh\././g". - the standard packet size table in "usb_transfer.c" is now a function, hence the code for the function uses less memory than the table itself. Submitted by: Hans Petter Selasky	2009-04-05 18:20:38 +00:00
Dmitry Chagin	cd899aad76	Fix KBI breakage by r190520 which affects older linux.ko binaries: 1) Move the new field (brand_note) to the end of the Brandinfo structure. 2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer is valid. 3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old modules won't have the flag set, so the new field brand_note would be ignored. Suggested by: jhb Reviewed by: jhb Approved by: kib (mentor) MFC after: 6 days	2009-04-05 09:27:19 +00:00
Konstantin Belousov	9da6804056	Regen	2009-04-01 13:12:40 +00:00
Konstantin Belousov	088b38ac84	Rename implementation function for freebsd32 sysarch(2) to allow for the arguments translations. Provide ABI-compatible definition of the struct i386_ldt_args for freebsd32 compat layer. In collaboration with: pho Reviewed by: jhb	2009-04-01 13:11:50 +00:00
Konstantin Belousov	0cdf4ffabc	Add all segment registers for the amd64 CPU to struct reg and mcontext. To keep these structures ABI-compatible, half the size of r_trapno, r_err, mc_trapno, mc_flags. Add fsbase and gsbase to mcontext on both amd64 and i386. Add flags to amd64 mcontext to indicate that it contains valid segments or bases. In collaboration with: pho Discussed with: peter Reviewed by: jhb	2009-04-01 12:44:17 +00:00
Ed Schouten	7eac36ed3c	Emulate the FIODGNAME ioctl in our 32-bit emulator. It's quite strange that nobody reported this issue before. It turns out functions like ttyname(), ptsname() and fdevname() don't work in compat32. This means it't not even possible to run applications like script(1) inside a 32-bit FreeBSD jail. Fix this by converting 32-bit fiodgname_arg structures to their 64-bit equivalent. Reported by: kris Tested by: kris	2009-03-29 20:09:51 +00:00
Jamie Gritton	8571af59e5	Whitespace/spelling fixes in advance of upcoming functional changes. Approved by: bz (mentor)	2009-03-27 13:13:59 +00:00
Doug Ambrisko	d2b2128a28	Add stuff to support upcoming BMC/IPMI flashing of newer Dell machine via the Linux tool. - Add Linux shim to ipmi(4) - Create a partitions file to linprocfs to make Linux fdisk see disks. This file is dynamic so we can see disks come and go. - Convert msdosfs to vfat in mtab since Linux uses that for msdosfs. - In the Linux mount path convert vfat passed in to msdosfs so Linux mount works on FreeBSD. Note that tasting works so that if da0 is a msdos file system /compat/linux/bin/mount /dev/da0 /mnt works. - fix a 64it bug for l_off_t. Grabing sh, mount, fdisk, df from Linux, creating a symlink of mtab to /compat/linux/etc/mtab and then some careful unpacking of the Linux bmc update tool and hacking makes it work on newer Dell boxes. Note, probably if you can't figure out how to do this, then you probably shouldn't be doing it :-)	2009-03-26 17:14:22 +00:00
Weongyo Jeong	577b9fa3f8	Some NDIS USB drivers try to call URB funcs like URB_FUNCTION_VENDOR_xxx or URB_FUNCTION_CLASS_xxx with HAL preemption lock that means it's non-sleepable during USB requests though usb2_do_request() requires a sleep so it needs to send queries to the default pipe without those interfaces to avoid sleep.	2009-03-18 02:38:35 +00:00
Weongyo Jeong	08e06b60c1	If the caller sets irp_usriostat or irp_usrevent it try to process it whatever the IRP flag is because some drivers (eg. RTL8187L NDIS driver) call IoCompleteRequest() without setting flags. It will prevent waiting a event forever at attach.	2009-03-18 01:57:54 +00:00
Konstantin Belousov	3ff063577b	Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and compat32 binaries. Tested by: pho Reviewed by: kan	2009-03-17 12:53:28 +00:00
Weongyo Jeong	8d3fb5e2bc	grab NDIS USB lock instead of HAL preemption. This change should be happened in the previous.	2009-03-17 05:57:43 +00:00
Weongyo Jeong	d3947a869c	use usb2_desc_foreach() to iterate the USB config descriptor instread of accessing structures directly to check some invalid descriptors. Pointed by: hps	2009-03-16 11:19:07 +00:00
Dmitry Chagin	731aded86d	Sort include files in the alphabetical order. Approved by: kib (mentor) MFC after: 2 weeks	2009-03-16 05:39:37 +00:00
Dmitry Chagin	b41a7787e1	Ignore FUTEX_FD op, as it is done by linux. Approved by: kib (mentor) MFC after: 2 weeks	2009-03-15 19:38:34 +00:00
Dmitry Chagin	3b8cbbded3	Include linux_futex.h before linux_emul.h Approved by: kib (mentor) MFC after: 6 days	2009-03-15 19:16:12 +00:00
Dmitry Chagin	32c01de21c	Implement new way of branding ELF binaries by looking to a ".note.ABI-tag" section. The search order of a brand is changed, now first of all the ".note.ABI-tag" is looked through. Move code which fetch osreldate for ELF binary to check_note() handler. PR: 118473 Approved by: kib (mentor)	2009-03-13 16:40:51 +00:00
Weongyo Jeong	2c964f43b6	o change a lock model based on HAL preemption lock to a normal mtx. Based on the HAL preemption lock there is a problem on SMP machines and causes a panic. o When a device detached the current tactic to detach NDIS USB driver is to call SURPRISE_REMOVED event. So it don't need to call ndis_halt_nic() again. This fixes some page faults when some drivers work abnormal. o it assumes now that URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER is in DISPATCH_LEVEL (non-sleepable) and as further work URB_FUNCTION_VENDOR_XXX and URB_FUNCTION_CLASS_XXX should be. Reviewed by: Hans Petter Selasky <hselasky_at_freebsd.org> Tested by: Paul B. Mahol <onemda_at_gmail.com>	2009-03-12 02:51:55 +00:00
Weongyo Jeong	6affafd098	o port NDIS USB support from USB1 to the new usb(USB2). o implement URB_FUNCTION_ABORT_PIPE handling. o remove unused code related with canceling the timer list for USB drivers. o whitespace cleanup and style(9) Obtained from: hps's original patch	2009-03-07 07:26:22 +00:00
John Baldwin	2ee8325f42	A better fix for handling different FPU initial control words for different ABIs: - Store the FPU initial control word in the pcb for each thread. - When first using the FPU, load the initial control word after restoring the clean state if it is not the standard control word. - Provide a correct control word for Linux/i386 binaries under FreeBSD/amd64. - Adjust the control word returned for fpugetregs()/npxgetregs() when a thread hasn't used the FPU yet to reflect the real initial control word for the current ABI. - The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control word instead of trashing whatever the current state of the FPU is. Reviewed by: bde	2009-03-05 19:42:11 +00:00
Dmitry Chagin	4d7c2e8a48	Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?" from some programs at start, among them are top and pkill. Do the assignment of the vector entries in elf_linux_fixup() as it is done in glibc. Fix some minor style issues. Submitted by: Marcin Cieslak <saper at SYSTEM PL> Approved by: kib (mentor) MFC after: 1 week	2009-03-04 12:14:33 +00:00
Jamie Gritton	f86bce5ed0	Extend the "vfsopt" mount options for more general use. Make struct vfsopt and the vfs_buildopts function public, and add some new fields to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and vfs_opterror. Further extend the interface to allow reading options from the kernel in addition to sending them to the kernel, with vfs_setopt and related functions. While this allows the "name=value" option interface to be used for more than just FS mounts (planned use is for jails), it retains the current "vfsopt" name and <sys/mount.h> requirement. Approved by: bz (mentor)	2009-03-02 23:26:30 +00:00
Bjoern A. Zeeb	33553d6e99	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
Roman Divacky	af83f5d77c	Change the functions to ANSI in those cases where it breaks promotion to int rule. See ISO C Standard: SS6.7.5.3:15. Approved by: kib (mentor) Reviewed by: warner Tested by: silence on -current	2009-02-24 18:09:31 +00:00
Andrew Thompson	3975e3a1ea	Move usb to a graveyard location under sys/legacy/dev, it is intended that the new USB2 stack will fully replace this for 8.0. Remove kernel modules, a subsequent commit will update conf/files. Unhook usbdevs from the build.	2009-02-23 18:16:17 +00:00
Ed Schouten	0eee862a54	Don't make Linux stat() open character devices to resolve its name. The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example. Changes I have made: - Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode. - Remove unneeded printf's from stat() and statfs(). - Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at(). - Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev. Result: crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0 crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1 crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2 Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers. Reviewed by: kib, netchild, rdivacky (thanks!)	2009-02-20 13:05:29 +00:00
John Baldwin	ea77ff0a15	Use shared vnode locks when invoking VOP_READDIR(). MFC after: 1 month	2009-02-13 18:18:14 +00:00
John Baldwin	3d744ccdd2	Fix a bug in the previous change to the mtab handler: use the path returned by vn_fullpath() when vn_fullpath() succeeds instead of when it fails. Submitted by: Artem Belevich fbsdlist of src.cx MFC after: 3 days	2009-02-13 15:32:03 +00:00
Alexander Leidinger	0134f12f7a	Fix an edge-case of the linux readdir: We need the size of a linux dirent structure, not the size of a pointer to it. PR: 131099 Submitted by: Andreas Kies <andikies@gmail.com> MFC after: 2 weeks	2009-02-13 11:55:19 +00:00
David E. O'Brien	e6493bbebf	Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions for moving between a segment register and a 32-bit memory location. Looked at by: jhb	2009-01-31 11:37:21 +00:00
Ed Schouten	a4611ab612	Last step of splitting up minor and unit numbers: remove minor(). Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev(). We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check. Ths commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now. I suspect umajor() and uminor() isn't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.	2009-01-28 17:57:16 +00:00
Jung-uk Kim	34fe89473f	Replace couple of strcmp(cpu_vendor, "foo") with cpu_vendor_id for i386 and hide i386-specific code under #ifdef.	2009-01-22 17:06:33 +00:00
Ed Schouten	ddf9d24349	Push down Giant inside sysctl. Also add some more assertions to the code. In the existing code we didn't really enforce that callers hold Giant before calling userland_sysctl(), even though there is no guarantee it is safe. Fix this by just placing Giant locks around the call to the oid handler. This also means we only pick up Giant for a very short period of time. Maybe we should add MPSAFE flags to sysctl or phase it out all together. I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root() and name2oid() are called with the sysctl lock held. Reviewed by: Jille Timmermans <jille quis cx>	2008-12-29 12:58:45 +00:00
Konstantin Belousov	22a448c4d9	vm_map_lock_read() does not increment map->timestamp, so we should compare map->timestamp with saved timestamp after map read lock is reacquired, not with saved timestamp + 1. The only consequence of the +1 was unconditional lookup of the next map entry, though. Tested by: pho Approved by: des MFC after: 2 weeks	2008-12-29 12:45:11 +00:00
Ganbold Tsagaankhuu	dff1491c74	Remove unused variable. Found with: Coverity Prevent(tm) CID: 542 Approved by: weongyo	2008-12-28 13:50:58 +00:00
Weongyo Jeong	e3a7c990ff	fix a bug to handling the argument that it passed `device_t' but it's handled as `struct ndis_softc'. It'll cause a panic when the driver is detached.	2008-12-27 09:42:17 +00:00
Weongyo Jeong	b3974c00b5	Integrate the NDIS USB support code to CURRENT. Now the NDISulator supports NDIS USB drivers that it've tested with devices as follows: - Anygate XM-142 (Conexant) - Netgear WG111v2 (Realtek) - U-Khan UW-2054u (Marvell) - Shuttle XPC Accessory PN20 (Realtek) - ipTIME G054U2 (Ralink) - UNiCORN WL-54G (ZyDAS) - ZyXEL G-200v2 (ZyDAS) All of them succeeded to attach and worked though there are still some problems that it's expected to be solved. To use NDIS USB support, you should rebuild and install ndiscvt(8) and if you encounter a problem to attach please set `hw.ndisusb.halt' to 0 then retry. I expect no changes of the NDIS code for PCI, PCMCIA devices. Obtained from: //depot/projects/ndisusb/...	2008-12-27 08:03:32 +00:00
Konstantin Belousov	6f3475454e	Remove two remnant uses of AT_DEBUG.	2008-12-17 13:13:35 +00:00
Konstantin Belousov	c7462f4387	Reference the vmspace of the process being inspected by procfs, linprocfs and sysctl kern_proc_vmmap handlers. Reported and tested by: pho Reviewed by: rwatson, des MFC after: 1 week	2008-12-12 12:12:36 +00:00
Bjoern A. Zeeb	b2dbdae9f3	Add 32-bit compat support for AIO. jhb probably forgot to commit this file with r185878 and will want to review this. It unbreaks the build here. Obtained from: p4 //depot/user/jhb/lock/compat/freebsd32/freebsd32_signal.h#2	2008-12-11 00:58:05 +00:00
John Baldwin	5d8d23c71b	Regen.	2008-12-10 20:57:16 +00:00
John Baldwin	3858a1f4f5	- Add 32-bit compat system calls for VFS_AIO. The system calls live in the aio code and are registered via the recently added SYSCALL32_*() helpers. - Since the aio code likes to invoke fuword and suword a lot down in the "bowels" of system calls, add a structure holding a set of operations for things like storing errors, copying in the aiocb structure, storing status, etc. The 32-bit system calls use a separate operations vector to handle fuword32 vs fuword, etc. Also, the oldsigevent handling is now done by having seperate operation vectors with different aiocb copyin routines. - Split out kern_foo() functions for the various AIO system calls so the 32-bit front ends can manage things like copying in and converting timespec structures, etc. - For both the native and 32-bit aio_suspend() and lio_listio() calls, just use copyin() to read the array of aiocb pointers instead of using a for loop that iterated over fuword/fuword32. The error handling in the old case was incomplete (lio_listio() just ignored any aiocb's that it got an EFAULT trying to read rather than reporting an error), and possibly slower. MFC after: 1 month	2008-12-10 20:56:19 +00:00
Konstantin Belousov	c96f374195	Relock user map earlier, to have the lock held when break leaves the loop earlier due to sbuf error. Pointy hat to: me Submitted by: dchagin	2008-12-10 16:11:09 +00:00
Konstantin Belousov	9499cb83bf	Make two style changes to create new commit and document proper commit message for r185765. Noted by: rdivacky Requested by: des Commit message for r185765 should be: In procfs map handler, and in linprocfs maps handler, do not call vn_fullpath() while having vm map locked. This is done in anticipation of the vop_vptocnp commit, that would make vn_fullpath sometime acquire vnode lock. Also, in linprocfs, maps handler already acquires vnode lock. No objections from: des MFC after: 2 week	2008-12-08 13:15:31 +00:00
Konstantin Belousov	5a66e0259b	Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use sbuf instead of doing uiomove. This allows for reads from non-zero offsets to work. Patch is forward-ported des@' one, and was adopted to current code by dchagin@ and me. Reviewed by: des (linprocfs part) PR: kern/101453 MFC after: 1 week	2008-12-08 12:34:52 +00:00
John Baldwin	3cdf485f87	When unloading a 32-bit system call module, restore the sysent vector in the 32-bit system call table instead of the main system call table.	2008-12-03 18:45:38 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Konstantin Belousov	74f5d68011	Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64. Change types used in the linux' struct msghdr and struct cmsghdr definitions to the properly-sized architecture-specific types. Move ancillary data handler from linux_sendit() to linux_sendmsg(). Submitted by: dchagin	2008-11-29 17:14:06 +00:00
Bjoern A. Zeeb	759e7c0bbb	Regen after jail support was added in r185435.	2008-11-29 14:34:30 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Roman Divacky	7356a43c88	Document that all the other commands are either identical to the FreeBSD ones or rejected by kern_msgctl(). Found with: Coverity Prevent(tm) CID: 3456 Approved by: kib (mentor)	2008-11-26 16:38:43 +00:00
Konstantin Belousov	b4cf0e62f4	Add sv_flags field to struct sysentvec with intention to provide description of the ABI of the currently executing image. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures to determine ABI features. Discussed with: dchagin, imp, jhb, peter	2008-11-22 12:36:15 +00:00
Konstantin Belousov	62162dfc94	In the robust futexes list head, futex_offset shall be signed, and glibc actually supplies negative offsets. Change l_ulong to l_long. Submitted by: dchagin	2008-11-16 15:45:41 +00:00
Peter Wemm	dc5aaa8410	Sigh. Fix a pointer/int compile error.	2008-11-10 23:36:20 +00:00
Peter Wemm	a22600a1dd	Fix a signal emulation bug introduced in r163018 (and present in 7.x). This prevents 32 bit signal handlers from finding out what the faulting address is. Both the secret 4th argument and siginfo->si_addr are zero.	2008-11-10 23:26:52 +00:00
Ed Schouten	ebb45b0620	Regenerate system call tables for r184789.	2008-11-09 10:48:06 +00:00
Ed Schouten	a1b5a8955e	Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4. Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4. Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname(). I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality. Reviewed by: rdivacky, kib	2008-11-09 10:45:13 +00:00
Dag-Erling Smørgrav	faecfd5641	utf-8 MFC after: 3 weeks	2008-11-05 15:08:09 +00:00
John Baldwin	b1b3a8653d	Don't leak a reference on the /compat/linux vnode everytime the linprocfs 'mtab' file is read. MFC after: 1 month	2008-11-04 18:53:33 +00:00
Doug Rabson	45e6ab7f81	Regen.	2008-11-03 10:39:35 +00:00
Doug Rabson	a9148abd9d	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month	2008-11-03 10:38:00 +00:00
Konstantin Belousov	17b9edd35a	The code in linux_proc_exit() contains a race when multiple linux based processes exits at the same time. The linux_emuldata structure is freed but p->p_emuldata is left as a dangling pointer to the just freed memory. The check for W_EXIT in the loop scanning the child processes isn't safe since the state of the child process can change right afterwards. Lock the process and check the W_EXIT before delivering signal. Submitted by: tegge Reviewed by: davidxu MFC after: 1 week	2008-10-31 10:38:30 +00:00
Edward Tomasz Napierala	15bc6b2bd8	Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)	2008-10-28 13:44:11 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
John Baldwin	23aa8eeafc	Regen for freebsd32_getdirentries().	2008-10-22 21:56:44 +00:00
John Baldwin	63f8fe9e8b	Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland. Submitted by: ups MFC after: 1 week	2008-10-22 21:55:48 +00:00
Konstantin Belousov	aa8b201112	Correctly fill siginfo for the signals delivered by linux tkill/tgkill. It is required for async cancellation to work. Fix PROC_LOCK leak in linux_tgkill when signal delivery attempt is made to not linux process. Do not call em_find(p, ...) with p unlocked. Move common code for linux_tkill() and linux_tgkill() into linux_do_tkill(). Change linux siginfo_t definition to match actual linux one. Extend uid fields to 4 bytes from 2. The extension does not change structure layout and is binary compatible with previous definition, because i386 is little endian, and each uid field has 2 byte padding after it. Reported by: Nicolas Joly <njoly pasteur fr> Submitted by: dchangin MFC after: 1 month	2008-10-19 10:02:26 +00:00
Konstantin Belousov	175c6c319b	Make robust futexes work on linux32/amd64. Use PTRIN to read user-mode pointers. Change types used in the structures definitions to properly-sized architecture-specific types. Submitted by: dchagin MFC after: 1 week	2008-10-14 07:59:23 +00:00
Konstantin Belousov	68da8b22d2	Current linux_fooaffinity() emulation fails, as the FreeBSD affinity syscalls expect the bitmap size in the range from 32 to 128. Old glibc always assumed size 1024, while newer glibc searches for approriate size, starting from 1024 and going up. For now, use FreeBSD size of cpuset_t for bitmap size parameter and return EINVAL if length of user space bitmap less than our size of cpuset_t. Submitted by: dchagin MFC after: 1 week [This requires MFC of the actual linux affinity syscalls]	2008-10-04 19:23:30 +00:00
Konstantin Belousov	9a1e630dfd	Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use sbuf instead of doing uiomove. This allows for reads from non-zero offsets to work. Patch is forward-ported des@' one, and was adopted to current code by dchagin@ and me. Reviewed by: des (linprocfs part) PR: kern/101453 MFC after: 1 week	2008-10-04 14:08:16 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Olivier Houchard	3b30175391	Advertise bit 26 as sse2. Spotted out by: gahr	2008-09-26 15:29:18 +00:00
John Baldwin	88ac915a9b	Add support for installing 32-bit system calls from kernel modules. This includes syscall32_{de,}register() routines as well as a module handler and wrapper macros similar to the support for native syscalls in <sys/sysent.h>. MFC after: 1 month	2008-09-25 20:50:21 +00:00
John Baldwin	d47faadce3	Sort includes and add multiple include guards.	2008-09-25 20:12:38 +00:00
John Baldwin	74d9b5a551	Regen.	2008-09-25 20:08:36 +00:00
John Baldwin	48a43ae819	Tidy up a few things with syscall generation: - Instead of using a syscall slot (370) just to get a function prototype for lkmressys(), add an explicit function prototype to <sys/sysent.h>. This also removes unused special case checks for 'lkmressys' from makesyscalls.sh. - Instead of having magic logic in makesyscalls.sh to only generate a function prototype the first time 'lkmnosys' is seen, make 'NODEF' always not generate a function prototype and include an explicit prototype for 'lkmnosys' in <sys/sysent.h>. - As a result of the fix in (2), update the LKM syscall entries in the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'. - Use NOPROTO for the __syscall() entry (198) in the native ABI. This avoids the need for magic logic in makesyscalls.h to only generate a function prototype the first time 'nosys' is encountered.	2008-09-25 20:07:42 +00:00
Konstantin Belousov	a8d403e102	Change the static struct sysentvec and struct Elf_Brandinfo initializers to the C99 style. At least, it is easier to read sysent definitions that way, and search for the actual instances of sigcode etc. Explicitely initialize sysentvec.sv_maxssiz that was missed in most sysvecs. No objection from: jhb MFC after: 1 month	2008-09-24 10:14:37 +00:00
Edward Tomasz Napierala	9545354ed5	Fix usage of mac_vnode_check_open() in linuxulator - last argument should be VREAD, not FREAD. Approved by: rwatson (mentor)	2008-09-22 18:59:24 +00:00
David E. O'Brien	c750e17cf5	Add freebsd32 compat shims for ioctl(2) CDIOREADTOCHEADER and CDIOREADTOCENTRYS requests.	2008-09-22 16:24:36 +00:00
David E. O'Brien	663c58007e	Regenerate for r183270.	2008-09-22 16:09:43 +00:00
David E. O'Brien	ae528485c4	Add freebsd32 compat shims for ioctl(2) MDIOCATTACH, MDIOCDETACH, MDIOCQUERY, and MDIOCLIST requests.	2008-09-22 16:09:16 +00:00
David E. O'Brien	f1287854fd	Regenerate for r183188.	2008-09-19 15:21:40 +00:00
David E. O'Brien	6e6049e9df	Add freebsd32 compat shim for nmount(2). (and quiet some compiler warnings for vfs_donmount)	2008-09-19 15:17:32 +00:00
David E. O'Brien	109ea24cc1	style(9)	2008-09-15 17:39:40 +00:00
David E. O'Brien	7e29bc757e	Regenerate for r183042.	2008-09-15 17:39:01 +00:00
David E. O'Brien	f0f53d8f79	Fix bug in r100384 (rev 1.2) in which the 32-bit swapon(2) was made "obsolete, not included in system", where as the system call does exist.	2008-09-15 17:37:41 +00:00
Ed Schouten	7969b32c44	Allow COMPAT_SVR4 to be built without COMPAT_43. It seems we only depend on COMPAT_43 to implement the send() and recv() routines. We can easily implement them using sendto() and recvfrom(), just like we do inside our very own C library. I wasn't able to really test it, apart from simple compilation testing. I've heard rumours that COMPAT_SVR4 is broken inside execve() anyway. It's still worth to fix this, because I suspect we'll get rid of COMPAT_43 somewhere in the future... Reviewed by: rdivacky Discussed with: jhb	2008-09-15 15:09:35 +00:00
Andrew Thompson	8fa962c745	Allow PAGE_SHIFT to already be defined. Submitted by: Hans Petter Selasky	2008-09-13 17:34:18 +00:00
Roman Divacky	0d62170990	The ERESTART to EINTR conversion is already done in kern_select so there is no need to repeat it in linux_select(). Submitted by: Dmitry Chagin <dchagin@> MFC after: 1 week Approved by: kib (mentor)	2008-09-11 15:28:28 +00:00
Roman Divacky	2963584278	Getdents requires padding with 2 bytes instead of 1 byte as with getdents64. The last byte is used for storing the d_type, add this to plain getdents case where it was missing before. Also change the code to use strlcpy instead of plain strcpy. This changes fix the getdents crash we had reports about (hl2 server etc.) PR: kern/117010 MFC after: 1 week Submitted by: Dmitry Chagin (dchagin@) Tested by: MITA Yoshio <mita ee.t.u-tokyo.ac jp> Approved by: kib (mentor)	2008-09-09 16:00:17 +00:00
Konstantin Belousov	745aaef5b5	Remove superfluous copyin() of args, structures are already in kernel space. Submitted by: dchagin MFC after: 1 week	2008-09-09 13:01:14 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Julian Elischer	576c43c844	We left out V_static_len from ip_fw2.c (also a whitespace diff that i'd rahter fix her ethan break in the vimage branch.)	2008-08-25 05:38:18 +00:00
Julian Elischer	1d89fc4ebe	All opt_x.h includes go at the top of other includes.	2008-08-25 04:55:29 +00:00
Robert Watson	5ae504055a	Regenerate following r182123.	2008-08-24 21:23:08 +00:00
Robert Watson	e484af13ed	When MPSAFE ttys were merged, a new BSM audit event identifier was allocated for posix_openpt(2). Unfortunately, that identifier conflicts with other events already allocated to other systems in OpenBSM. Assign a new globally unique identifier and conform better to the AUE_ event naming scheme. This is a stopgap until a new OpenBSM import is done with the correct identifier, so we'll maintain this as a local diff in svn until then. Discussed with: ed Obtained from: TrustedBSD Project	2008-08-24 21:20:35 +00:00
David E. O'Brien	35c316caaf	Add comments on NOARGS, NODEF, and NOPROTO.	2008-08-21 22:57:31 +00:00
Ed Schouten	18cf135421	Update system call tables. The previous commit also included changes to all the system call lists, but it is a tradition to update these lists in a second commit, so rerun make sysent to update the $FreeBSD$ tags inside these files to refer to the latest version of syscalls.master. Requested by: rwatson	2008-08-20 08:39:10 +00:00
Ed Schouten	bc093719ca	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan	2008-08-20 08:31:58 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Ed Schouten	b377be43a5	Add TIOCPKT and TIOCSPTLCK to the Linuxolator. We're very lucky, because the flags used by our TIOCPKT implementation are the same as flags used by Linux. We can safely enable TIOCPKT, assuming EXTPROC is not used. TIOCSPTLCK is used by unlockpt(). Because we don't need unlockpt() in our implementation, make this ioctl a no-op. Approved by: philip (mentor, implicit), rdivacky Obtained from: P4 (//depot/projects/mpsafetty/...)	2008-07-23 17:47:44 +00:00
Roman Divacky	0864e2a4f1	Fix linux_alarm, the linux behaviour is to limit the secs to INT_MAX when the passed in parameter is bigger than INT_MAX. Submitted by: Dmitry Chagin <chagin.dmitry gmail com> Approved by: kib (mentor)	2008-07-23 17:19:02 +00:00
Weongyo Jeong	138ddff935	when NDIS framework try to query/set informations NDIS drivers can return NDIS_STATUS_PENDING. In this case, it's waiting for 5 secs to get the response from drivers now. However, some NDIS drivers can send the response before NDIS framework gets ready to receive it so we might always be blocked for 5 secs in current implementation. NDIS framework should reset the event before calling NDIS driver's callback not after. MFC after: 1 month	2008-07-23 10:49:27 +00:00
Brooks Davis	e44f0b2a63	style(9): put parentheses around return values.	2008-07-10 19:54:34 +00:00
Brooks Davis	774b72e12e	Regen	2008-07-10 17:46:58 +00:00
Brooks Davis	a8c6d6d0ba	id_t is a 64-bit integer and thus is passed as two arguments like off_t is. As a result, those arguments must be recombined before calling the real syscal implementation. This change fixes 32-bit compatibility for cpuset_getid(), cpuset_setid(), cpuset_getaffinity(), and cpuset_setaffinity().	2008-07-10 17:45:57 +00:00
Robert Watson	4f7d1876d5	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
Coleman Kane	38ad9366dc	Silence warning about missing IoGetDeviceObjectPointer by implementing a simple stub that always returns STATUS_SUCCESS. Submitted by: Paul B. Mahol <onemda@gmail.com> Reviewed by: thompsa MFC after: 1 week	2008-06-15 13:37:29 +00:00
Wojciech A. Koszek	53a609f064	Remove obselete PECOFF image activator support. PRs assigned at the time of removal: kern/80742 Discussed on: freebsd-current (silence), IRC Tested by: make universe Approved by: cognet (mentor)	2008-06-14 12:51:44 +00:00
Weongyo Jeong	1f22fabdfb	fix a page fault that it occurred during ifp is NULL. This bug happens when NDIS driver's initialization is failed and NDIS driver's trying to call NdisWriteErrorLogEntry().	2008-06-11 07:55:07 +00:00
Roman Divacky	2e1a489300	d_ino member of linux_dirent structure should be unsigned long. Submitted by: Chagin Dmitry <chagin.dmitry@gmail.com> Approved by: kib (mentor)	2008-06-08 11:09:25 +00:00
Roman Divacky	a47444d525	Switch to emulating Linux 2.6 on default. Approved by: kib (mentor)	2008-06-03 17:50:13 +00:00
Ed Schouten	a147e6cadf	Push down the major/minor conversion for pts/%u to improve consistency. In the mpsafetty branch, Linux sshd seems to work properly inside a jail. Some small modifications had to be made to the Linux compatibility layer. The Linux PTY routines always expect the device major number to be 136 or higher. Our code always set the major/minor number pair to 136:0. This makes routines like ttyname() and ptsname() fail, because we'll end up having ambiguous device numbers. The conversion was not performed on all *stat() routines, which meant in some cases the numbers didn't get transformed. By pushing the conversion into linux_driver_get_major_minor(), the transformation will take place on all calls. Approved by: philip (mentor), rdivacky	2008-06-02 08:40:06 +00:00
Weongyo Jeong	32e9c9dc71	Fix a panic that a priority value which is passed to cv_broadcastpri(9) can be < 0. We don't ignore a `increment' argument but at least we keep a priority value of NDIS threads over PRI_MIN_KERN. Reviewed by: thompsa	2008-05-30 06:31:55 +00:00
Weongyo Jeong	d9585f801b	Fix a panic when it occurred during initializing the ndis driver because it try to read network address through ifnet structure which is NULL until the ndis driver's initialization is finished. Reviewed by: thompsa	2008-05-15 04:29:28 +00:00
Roman Divacky	4732e446fb	Implement robust futexes. Most of the code is modelled after what Linux does. This is because robust futexes are mostly userspace thing which we cannot alter. Two syscalls maintain pointer to userspace list and when process exits a routine walks this list waking up processes sleeping on futexes from that list. Reviewed by: kib (mentor) MFC after: 1 month	2008-05-13 20:01:27 +00:00
Roman Divacky	a6d043e30d	Implement linux_truncate64() syscall. Tested by: Aline de Freitas <aline@riseup.net> Approved by: kib (mentor)	2008-04-23 15:56:33 +00:00
Roman Divacky	cabce2bf19	The vmspace->vm_daddr is constant until freed, there is no need to hold lock while accessing it. Approved by: kib (mentor)	2008-04-21 21:24:08 +00:00
Roman Divacky	872cbe6466	Remove using magic value of -1 to distinguish between linux_open() and linux_openat(). Instead just pass AT_FDCWD into linux_common_open() for the linux_open() case. This prevents passing -1 as a dirfd to openat() from succeeding which is wrong. Suggested by: rwatson, kib Approved by: kib (mentor)	2008-04-09 16:42:50 +00:00
Konstantin Belousov	48b05c3f82	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho	2008-04-08 09:45:49 +00:00
Konstantin Belousov	f2296b585e	Regen	2008-03-31 12:12:27 +00:00
Konstantin Belousov	4f1e7213d4	Add the freebsd32 compatibility shims for the *at() syscalls. Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:08:30 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
John Birrell	8f0cc58815	Remove files that have been repo copied to their new location in cddl-specific parts of the source tree.	2008-03-28 00:08:47 +00:00
Doug Rabson	a7ac0db6cb	Regen.	2008-03-26 15:24:02 +00:00
Doug Rabson	dfdcada31e	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
John Baldwin	5c63b21a1a	Regen.	2008-03-25 19:35:34 +00:00
John Baldwin	30c6422a8a	Add entries for the cpuset-related system calls. The existing system calls can be used on little endian systems. Pointy hat to: jeff	2008-03-25 19:34:47 +00:00
Ruslan Ermilov	d7a38db650	Fix build. Reported by: ache, tinderbox	2008-03-25 13:20:52 +00:00
Roman Divacky	6af821237d	o Add stub support for some new futex operations, so the annoying message is not printed. o Don't warn about FUTEX_FD not being implemented and return ENOSYS instead of 0 (eg. success). o Clear FUTEX_PRIVATE_FLAG as we actually implement only private futexes so there is no reason to return ENOSYS when app asks for a private futex. We don't reject shared futexes because they worked just fine with our implementation so far. Approved by: kib (mentor) Tested by: bsam MFC after: 1 week	2008-03-20 17:03:55 +00:00
Antoine Brodin	afe5acff1b	Simplify fcntl(SVR4_F_DUP2FD) code now that FreeBSD has F_DUP2FD. Approved by: rwatson (mentor)	2008-03-17 18:27:28 +00:00
Roman Divacky	5dfb688191	Implement sched_setaffinity and get_setaffinity using real cpu affinity setting primitives. Reviewed by: jeff Approved by: kib (mentor)	2008-03-16 16:27:44 +00:00
Jeff Roberson	66257bc8d9	- The P_SA flag has been removed. Don't reference it in a KASSERT.	2008-03-12 22:17:06 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
Konstantin Belousov	a0b0d286bc	Return ENOSYS instead of 0 for the unknown futex operations. Submitted by: rdivacky Reported and tested by: Gary Stanley <gary velocity-servers net>	2008-03-02 14:00:50 +00:00
Konstantin Belousov	cbd2c621f8	Sanitize arguments to linux_mremap(). Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified. Check for the page alignment of the addr argument. Submitted by: rdivacky MFC after: 1 week	2008-02-22 11:47:56 +00:00
Ruslan Ermilov	b95bd24d29	Regenerate for readlink(2).	2008-02-12 20:11:54 +00:00
Ruslan Ermilov	5f56182b6f	Change readlink(2)'s return type and type of the last argument to match POSIX. Prodded by: Alexey Lyashkov	2008-02-12 20:09:04 +00:00
Poul-Henning Kamp	cf827063a9	Give MEXTADD() another argument to make both void pointers to the free function controlable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-convetional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.	2008-02-01 19:36:27 +00:00
Pawel Jakub Dawidek	44ce1efd91	Change type of kmem_used() and kmem_size() functions to uint64_t, so it doesn't overflow in arc.c in this check: if (kmem_used() > (kmem_size() * 4) / 5) return (1); With this bug ZFS almost doesn't cache. Only 32bit machines are affected that have vm.kmem_size set to values >=1GB. Reported by: David Taylor <davidt@yadt.co.uk>	2008-01-24 11:21:54 +00:00
Robert Watson	20c6fe828a	Regenerate.	2008-01-20 23:44:24 +00:00
Robert Watson	6c902059f2	Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls shm_open() and shm_unlink(). More auditing will need to be done for these calls to capture arguments properly.	2008-01-20 23:43:06 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
John Baldwin	4ad6d200d6	Regen for shm_open(2) and shm_unlink(2).	2008-01-08 22:01:26 +00:00
John Baldwin	8e38aeff17	Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())	2008-01-08 21:58:16 +00:00
Konstantin Belousov	d075105da0	After applying LCONVPATH() to the path, do use the converted path instead of original user-mode string in the linux_stat() and linux_lstat() syscalls. Tested by: Peter Holm MFC after: 3 days	2008-01-05 12:36:35 +00:00

... 4 5 6 7 8 ...

1978 Commits