freebsd-nq

Author	SHA1	Message	Date
Dmitry Chagin	32fd44657c	In r218101 I have not changed properly the futex syscall definition. Some Linux futex ops atomically verifies that the futex address uaddr (uval) contains the value val. Comparing signed uval and unsigned val may lead to an unexpected result, mostly to a deadlock. So copyin uaddr to an unsigned int to compare the parameters correctly. While here change ktr records to print parameters in more readable format. Tested by eadler@ MFC after: 3 days	2014-05-28 05:57:35 +00:00
Marcel Moolenaar	0fa211be96	In freebsd32_sendmsg(), replace the call to sockargs() followed by a call to freebsd32_convert_msg_in() with freebsd32_copyin_control() to readin and convert in a single step. This makes it simpler to put all the control messages in a single mbuf or mbuf cluster as per the limitations imposed upon us by ip6_setpktopts(). The logic is as follows: 1. Go over the array of control messages to determine overall size and include extra padding for proper alignment as we go. 2. Get a mbuf or mbuf cluster as needed or fail if the overall (adjusted) size is larger than a cluster. 3. Go over the array of control messages again, but now copy them into kernel space and into aligned offsets. 4. Update the length of the control message to take padding between the header and the data into account (but not for padding added between one control message and the next). Obtained from: Juniper Networks, Inc. MFC after: 1 week	2014-04-05 18:56:01 +00:00
Warner Losh	8a27a339b6	Remove instances of variables that were set, but never used. gcc 4.9 warns about these by default.	2014-03-30 23:43:36 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Konstantin Belousov	88b124cede	Make the array pointed to by AT_PAGESIZES auxv properly aligned. Also, remove the expression which calculated the location of the strings for a new image and grown over the time to be non-comprehensible. Instead, calculate the offsets by steps, which also makes fixing the alignments much cleaner. Reported and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-19 12:35:04 +00:00
Attilio Rao	4f11a684ff	Regen per r263318. Sponsored by: EMC / Isilon storage division	2014-03-18 21:34:11 +00:00
Attilio Rao	ce42e79310	Remove dead code from umtx support: - Retire long time unused (basically always unused) sys__umtx_lock() and sys__umtx_unlock() syscalls - struct umtx and their supporting definitions - UMUTEX_ERROR_CHECK flag - Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall __FreeBSD_version is not bumped yet because it is expected that further breakages to the umtx interface will follow up in the next days. However there will be a final bump when necessary. Sponsored by: EMC / Isilon storage division Reviewed by: jhb	2014-03-18 21:32:03 +00:00
Ed Maste	0fcefb433d	Update NetBSD Foundation copyrights to 2-clause BSD The NetBSD Foundation states "Third parties are encouraged to change the license on any files which have a 4-clause license contributed to the NetBSD Foundation to a 2-clause license." This change removes clauses 3 and 4 from copyright / license blocks that list The NetBSD Foundation as the only copyright holder. Sponsored by: The FreeBSD Foundation	2014-03-18 01:40:25 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
John-Mark Gurney	6f2b769cac	change td_retval into a union w/ off_t, with defines to mask the change... This eliminates a cast, and also forces td_retval (often 2 32-bit registers) to be aligned so that off_t's can be stored there on arches with strict alignment requirements like armeb (AVILA)... On i386, this doesn't change alignment, and on amd64 it doesn't either, as register_t is already 64bits... This will also prevent future breakage due to people adding additional fields to the struct... This gets AVILA booting a bit farther... Reviewed by: bde	2014-03-16 00:53:40 +00:00
Gleb Smirnoff	b245f96c44	Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed. o Remove the if_baudrate_pf crutch. o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code. o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes. o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters. __FreeBSD_version bumped. Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-13 03:42:24 +00:00
Eitan Adler	9ace80105a	linprocfs: add support for /sys/kernel/random/uuid PR: kern/186187 Submitted by: Fernando <fernando.apesteguia@gmail.com> MFC After: 2 weeks	2014-02-27 00:43:10 +00:00
Konstantin Belousov	49d39308ba	The posix_madvise(3) and posix_fadvise(2) should return error on failure, same as posix_fallocate(2). Noted by: Bob Bishop <rb@gid.co.uk> Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-30 18:04:39 +00:00
Konstantin Belousov	2852de0489	The posix_fallocate(2) syscall should return error number on error, without modifying errno. Reported and tested by: Gennady Proskurin <gpr@mail.ru> Reviewed by: mdf PR: standards/186028 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-23 17:24:26 +00:00
Adrian Chadd	0cfea1c8fc	Implement a kqueue notification path for sendfile. This fires off a kqueue note (of type sendfile) to the configured kqfd when the sendfile transaction has completed and the relevant memory backing the transaction is no longer in use by this transaction. This is analogous to SF_SYNC waiting for the mbufs to complete - except now you don't have to wait. Both SF_SYNC and SF_KQUEUE should work together, even if it doesn't necessarily make any practical sense. This is designed for use by applications which use backing cache/store files (eg Varnish) or POSIX shared memory (not sure anything is using it yet!) to know when a region of memory is free for re-use. Note it doesn't mark the region as free overall - only free from this transaction. The application developer still needs to track which ranges are in the process of being recycled and wait until all pending transactions are completed. TODO: * documentation, as always Sponsored by: Netflix, Inc.	2014-01-17 05:26:55 +00:00
Adrian Chadd	a43caef195	Refactor out the common sendfile code from the do_sendfile() and the compat32 sendfile syscall. Sponsored by: Netflix, Inc.	2014-01-09 00:11:14 +00:00
Adrian Chadd	79750e3b36	Migrate the sendfile_sync structure into a public(ish) API in preparation for extending and reusing it. The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_() methods teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) is now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc.	2013-12-01 03:53:21 +00:00
Peter Wemm	b5019bc45b	jail_v0.ip_number was always in host byte order. This was handled in one of the many layers of indirection and shims through stable/7 in jail_handle_ips(). When it was cleaned up and unified through kern_jail() for 8.x, the byte order swap was lost. This only matters for ancient binaries that call jail(2) themselves internally.	2013-11-28 19:40:33 +00:00
Konstantin Belousov	80c3af4e80	Add an kinfo sysctl to retrieve signal trampoline location for the given process. Note that the correctness of the trampoline length returned for ABIs which do not use shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on as-need basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-26 19:47:09 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Adrian Chadd	7689abaedc	Fix the compat32 sendfile() to be in line with my recent changes. Reminded by: kib	2013-11-26 08:32:37 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Gleb Smirnoff	37968927c8	Fix build. Pointy hat to: glebius	2013-11-05 19:17:19 +00:00
Gleb Smirnoff	af50ea380f	Axe IFF_SMART. Fortunately this layering violating flag was never used, it was just declared.	2013-11-05 12:52:56 +00:00
Gleb Smirnoff	5fb009bda7	Drop support for historic ioctls and also undefine them, so that code that checks their presence via ifdef, won't use them. Bump __FreeBSD_version as safety measure.	2013-11-05 10:29:47 +00:00
Gleb Smirnoff	66e01d73cd	- Provide necessary includes. - Remove unnecessary includes. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-29 11:17:49 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Konstantin Belousov	3a2092bad0	Add padding to match the compat32 struct stat32 definition to the real struct stat on 32bit architectures. Debugged and tested by: bsam Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-04 22:05:23 +00:00
Mark Johnston	92c6196caa	Fix some typos that were causing probe argument types to show up as unknown. Reviewed by: rwatson (mac provider) Approved by: re (glebius) MFC after: 1 week	2013-10-01 15:40:27 +00:00
Mark Johnston	8d305ba0dc	Regenerate syscall argument strings after r255777. Approved by: re (gjb) MFC after: 1 week	2013-09-21 23:06:36 +00:00
John Baldwin	a566e8e3c5	Regen. Approved by: re (delphij)	2013-09-19 18:56:00 +00:00
John Baldwin	55648840de	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month	2013-09-19 18:53:42 +00:00
Roman Divacky	b12698e1a1	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)	2013-09-18 18:48:33 +00:00
Roman Divacky	253c75c0de	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>	2013-09-18 17:56:04 +00:00
Jilles Tjoelker	9fdb497cd0	Regenerate for freebsd32_cap_enter(). Approved by: re (hrs)	2013-09-17 20:49:05 +00:00
Jilles Tjoelker	529411c369	Disallow cap_enter() in freebsd32 compatibility mode. The freebsd32 compatibility mode (for running 32-bit binaries on 64-bit kernels) does not currently allow any system calls in capability mode, but still permits cap_enter(). As a result, 32-bit binaries on 64-bit kernels that use capability mode do not work (they crash after being disallowed to call sys_exit()). Affected binaries include dhclient and uniq. The latter's crashes cause obscure build failures. This commit makes freebsd32 cap_enter() fail with [ENOSYS], as if capability mode was not compiled in. Applications deal with this by doing their work without capability mode. This commit does not fix the uncommon situation where a 64-bit process enters capability mode and then executes a 32-bit binary using fexecve(). This commit should be reverted when allowing the necessary freebsd32 system calls in capability mode. Reviewed by: pjd Approved by: re (hrs)	2013-09-17 20:48:19 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
Pawel Jakub Dawidek	00a7f703b3	Regenerate after r255219. Sponsored by: The FreeBSD Foundation	2013-09-05 00:11:59 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Will Andrews	5e9ccc8797	Add the ability to display the default FIB number for a process to the ps(1) utility, e.g. "ps -O fib". bin/ps/keyword.c: Add the "fib" keyword and default its column name to "FIB". bin/ps/ps.1: Add "fib" as a supported keyword. sys/compat/freebsd32/freebsd32.h: sys/kern/kern_proc.c: sys/sys/user.h: Add the default fib number for a process (p->p_fibnum) to the user land accessible process data of struct kinfo_proc. Submitted by: Oliver Fromme <olli@fromme.com>, gibbs	2013-08-26 23:48:21 +00:00
Andre Oppermann	bb25e5ab00	Give (*ext_free) an int return value allowing for very sophisticated external mbuf buffer management capabilities in the future. For now only EXT_FREE_OK is defined with current legacy behavior. Sponsored by: The FreeBSD Foundation	2013-08-25 10:57:09 +00:00
Pawel Jakub Dawidek	f69e5b30c9	Regenerate after r254491.	2013-08-18 13:38:39 +00:00
Pawel Jakub Dawidek	32536142f6	The cap_rights_limit(2) system calls needs a wrapper for 32bit binaries running under 64bit kernels as the 'rights' argument has to be split into two registers or the half of the rights will disappear. Reported by: jilles Sponsored by: The FreeBSD Foundation	2013-08-18 13:37:54 +00:00
Pawel Jakub Dawidek	d6f6b87647	Move the PAIR32TO64() macro and the RETVAL_HI/RETVAL_LO defines to a header file for use by other .c files. Sponsored by: The FreeBSD Foundation	2013-08-18 13:34:11 +00:00
Pawel Jakub Dawidek	e11dc435ba	Regenerate after r254481.	2013-08-18 10:31:30 +00:00
Pawel Jakub Dawidek	0dac22d8ea	Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2) system calls as unsigned longs have different size on i386 and amd64. Reported by: jilles Sponsored by: The FreeBSD Foundation	2013-08-18 10:30:41 +00:00
Mark Johnston	1570438586	Remove a couple of unused macros. MFC after: 3 days	2013-08-17 21:53:37 +00:00
Pawel Jakub Dawidek	9a57f6e88c	Regenerate after r254447. Sponsored by: The FreeBSD Foundation	2013-08-17 14:18:41 +00:00
Pawel Jakub Dawidek	b49f2e4b48	Make pdfork(2), pdkill(2) and pdgetpid(2) syscalls available for 32bit binaries running under 64bit kernel. Sponsored by: The FreeBSD Foundation	2013-08-17 14:17:13 +00:00
Gleb Smirnoff	ca04d21d5f	Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix	2013-08-15 07:54:31 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Konstantin Belousov	68044954d2	Regenerate.	2013-07-21 19:44:53 +00:00
Konstantin Belousov	643ee87175	Implement compat32 wrappers for the ktimer_* syscalls. Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:43:52 +00:00
Konstantin Belousov	058262c69a	Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32 argument. Reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:40:30 +00:00
Konstantin Belousov	ab751a525b	The freebsd32_lio_listio() compat syscall takes the struct sigevent32. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:36:53 +00:00
Konstantin Belousov	9731998997	Move the convert_sigevent32() utility function into freebsd32_misc.c for consumption outside the vfs_aio.c. For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods, also copy in the sigev_value, since librt event pumping loop compares note generation number with the value passed through sigev_value. Tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:33:48 +00:00
Konstantin Belousov	df4d0ed1cd	Cosmetic change, use the same union name on the left and right sides of the conversion. Tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:17:46 +00:00
Konstantin Belousov	c5d0363709	Regenerate	2013-07-20 13:40:03 +00:00
Konstantin Belousov	d31e4b3a58	id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2). Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180652 Sponsored by: The FreeBSD Foundation	2013-07-20 13:39:41 +00:00
Hans Petter Selasky	a40a377cc7	Add some missing LIBUSB IOCTL conversion codes.	2013-07-14 10:13:01 +00:00
Alexander Leidinger	b85e1f7d05	- Move videodev headers from compat/linux to contrib/v4l (cp from vendor and apply diff to compat/linux versions). - The cp implies an update of videodev2.h to the linux kernel 2.6.34.14 one. The update makes video in skype v4 work on FreeBSD. Tested by: Artyom Mirgorodskiy <artyom.mirgorodsky@gmail.com> (update of header only)	2013-07-06 19:59:06 +00:00
Gleb Smirnoff	8d1aa3c6b4	aio_mlock() added: - Regen for r251526. - Bump __FreeBSD_version.	2013-06-08 13:30:13 +00:00
Gleb Smirnoff	6160e12c10	Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
Alan Cox	66c392df53	Relax the vm object locking. Use a read lock. Sponsored by: EMC / Isilon Storage Division	2013-06-05 17:00:10 +00:00
David E. O'Brien	1f4e9654bd	Add a "kern.features" MIB for 32bit support under a 64bit kernel.	2013-05-31 21:43:17 +00:00
Konstantin Belousov	f85769eb75	Regenerate.	2013-05-21 11:41:08 +00:00
Konstantin Belousov	48947eccee	Fix the wait6(2) on 32bit architectures and for the compat32, by using the right type for the argument in syscalls.master. Also fix the posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the architectures which require padding of the 64bit argument. Noted and reviewed by: jhb Pointy hat to: kib MFC after: 1 week	2013-05-21 11:40:16 +00:00
Jilles Tjoelker	b201f4a0dc	Regenerate files for pipe2().	2013-05-01 22:45:04 +00:00
Jilles Tjoelker	dc570d5e56	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib	2013-05-01 22:42:42 +00:00
Jilles Tjoelker	1bf6b724f1	Regenerate files for accept4().	2013-05-01 20:12:58 +00:00
Jilles Tjoelker	da7d2afb6d	Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.	2013-05-01 20:10:21 +00:00
Matthew D Fleming	b3e6bbc676	Regen. MFC after: 1 week	2013-04-02 05:30:52 +00:00
Matthew D Fleming	e324bf91e8	Fix return type of extattr_set_* and fix rmextattr(8) utility. extattr_set_{fd,file,link} is logically a write(2)-like operation and should return ssize_t, just like extattr_get_. Also, the user-space utility was using an int for the return value of extattr_get_ and extattr_list_*, both of which return an ssize_t. MFC after: 1 week	2013-04-02 05:30:41 +00:00
Jilles Tjoelker	d289dc7b73	Rename do_pipe() to kern_pipe2() and declare it properly.	2013-03-31 17:42:54 +00:00
Pawel Jakub Dawidek	5d46382415	Regenerate after r248599. Sponsored by: The FreeBSD Foundation	2013-03-21 23:02:19 +00:00
Pawel Jakub Dawidek	e948704e4b	Implement chflagsat(2) system call, similar to fchmodat(2), but operates on file flags. Reviewed by: kib, jilles Sponsored by: The FreeBSD Foundation	2013-03-21 22:59:01 +00:00
Pawel Jakub Dawidek	14cd1ffdf8	Regenerate after r248597. Sponsored by: The FreeBSD Foundation	2013-03-21 22:47:03 +00:00
Pawel Jakub Dawidek	b4b2596b97	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation	2013-03-21 22:44:33 +00:00
Gleb Smirnoff	dc4ad05ecd	Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.	2013-03-15 12:55:30 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Eitan Adler	1eb9ea583b	Remove check for NULL prior to free(9) and m_freem(9). Approved by: cperciva (mentor)	2013-03-04 02:21:34 +00:00
Pawel Jakub Dawidek	378a73d1bd	Regen after r247667.	2013-03-02 21:12:54 +00:00
Pawel Jakub Dawidek	7493f24ee6	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr name, socklen_t namelen); which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek	1dc31587bf	Regen after r247602.	2013-03-02 00:55:09 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Xin LI	69136e792a	Fix wrong assignment. Submitted by: Sascha Wildner <saw online de> Obtained from: DragonFly rev 9568dd07a22a136e380e6c19a8ea188eb92976d5 MFC after: 2 weeks	2013-03-01 23:21:18 +00:00
John Baldwin	d825ce0a5d	Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h by moving bits that are MI out into headers in compat/linux. Reviewed by: Chagin Dmitry dmitry \| gmail MFC after: 2 weeks	2013-01-29 18:41:30 +00:00
Dmitry Chagin	4d04cf1d9e	Arithmetic on pointers takes into account the size of the type. Properly cast the pointer to avoid incorrect pointer scaling. MFC after: 1 Week	2013-01-25 14:40:54 +00:00
John Baldwin	fb709557a3	Don't assume that all Linux TCP-level socket options are identical to FreeBSD TCP-level socket options (only the first two are). Instead, using a mapping function and fail unsupported options as we do for other socket option levels. MFC after: 2 weeks	2013-01-23 21:44:48 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Colin Percival	43f13bea35	MFS security patches which seem to have accidentally not reached HEAD: Fix insufficient message length validation for EAP-TLS messages. Fix Linux compatibility layer input validation error. Security: FreeBSD-SA-12:07.hostapd Security: FreeBSD-SA-12:08.linux Security: CVE-2012-4445, CVE-2012-4576 With hat: so@	2012-11-23 01:48:31 +00:00
Konstantin Belousov	a2a8559624	Style fixes for r242958. Reported and reviewed by: bde MFC after: 28 days	2012-11-16 06:22:14 +00:00
Konstantin Belousov	552e993580	Regen	2012-11-13 12:53:41 +00:00
Konstantin Belousov	f13b5a0f01	Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month	2012-11-13 12:52:31 +00:00
Konstantin Belousov	140dedb81c	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks	2012-11-02 13:56:36 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Konstantin Belousov	877d24ac8a	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks	2012-09-28 11:25:02 +00:00
Kevin Lo	457a9cfbc1	Remove redundant check	2012-09-12 10:12:03 +00:00
John Baldwin	a89828a2b0	Remove some more NetBSD compat shims and other unused bits from these drivers: - Remove scsi_low_pisa.*, they were unused. - Remove <compat/netbsd/physio_proc.h> and calls to the stubs in that header. They were empty nops. - Retire sl_xname and use device_get_nameunit() and device_printf() with the underlying device_t instead. - Remove unused {ct,ncv,nsp,stg}print() functions. - Remove empty SOFT_INTR_REQUIRED() macro and the unused sl_irq member.	2012-09-10 18:49:49 +00:00
David Xu	e31eb35c3f	regen.	2012-08-17 02:47:16 +00:00
David Xu	d65f1abca7	Implement syscall clock_getcpuclockid2, so we can get a clock id for process, thread or others we want to support. Use the syscall to implement POSIX API clock_getcpuclock and pthread_getcpuclockid. PR: 168417	2012-08-17 02:26:31 +00:00
Konstantin Belousov	f9ca52f21d	Regenerate.	2012-08-15 15:18:20 +00:00
Konstantin Belousov	1a5fe89842	Provide 32bit compat for truncate(2) and ftruncate(2). MFC after: 1 week	2012-08-15 15:17:56 +00:00
Konstantin Belousov	a8d4becf02	Regenerate.	2012-08-14 12:09:36 +00:00
Konstantin Belousov	f90fabce5a	Implement the old mmap syscall for compat32, when COMPAT_43 option is enabled. The syscall is used by FreeBSD 1.1.5.1 dynamic linker. MFC after: 1 week	2012-08-14 12:09:09 +00:00
Konstantin Belousov	481af8b933	Cosmetics: define FREEBSD32_MINUSER and AOUT32_MINUSER for struct sysentvec .sv_minuser. Also improve style. Submitted by: Oliver Pinter <oliver.pinter@gmail.com> MFC after: 1 week	2012-07-22 13:41:45 +00:00
Konstantin Belousov	c5c1199c83	Extend the KPI to lock and unlock f_offset member of struct file. It now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries(). Ensure that on 32bit architectures f_offset, which is 64bit quantity, always read and written under the mtxpool protection. This fixes apparently easy to trigger race when parallel lseek()s or lseek() and read/write could destroy file offset. The already broken ABI emulations, including iBCS and SysV, are not converted (yet). Tested by: pho No objections from: jhb MFC after: 3 weeks	2012-07-02 21:01:03 +00:00
Kevin Lo	544c5e5b53	Make sure that each va_start has one and only one matching va_end, especially in error cases.	2012-05-29 01:48:06 +00:00
Konstantin Belousov	371778a333	Fix ki_cow for compat32 binaries. MFC after: 3 days	2012-05-27 05:24:53 +00:00
Ed Schouten	4412ad4887	Regenerate system call tables.	2012-05-25 21:52:57 +00:00
Ed Schouten	520b6a84f6	Remove use of non-ISO-C integer types from system call tables. These files already use ISO-C-style integer types, so make them less inconsistent by preferring the standard types.	2012-05-25 21:50:48 +00:00
Gleb Kurtsou	76dcec5d09	Add kern_fhstat(), adjust sys_fhstat() to use it. Extend kern_getdirentries() to accept uio segflag and optionally return buffer residue. Sponsored by: Google Summer of Code 2011	2012-05-24 08:00:26 +00:00
Alexander Leidinger	19e252baeb	- >500 static DTrace probes for the linuxulator - DTrace scripts to check for errors, performance, ... they serve mostly as examples of what you can do with the static probe;s with moderate load the scripts may be overwhelmed, excessive lock-tracing may influence program behavior (see the last design decission) Design decissions: - use "linuxulator" as the provider for the native bitsize; add the bitsize for the non-native emulation (e.g. "linuxuator32" on amd64) - Add probes only for locks which are acquired in one function and released in another function. Locks which are aquired and released in the same function should be easy to pair in the code, inter-function locking is more easy to verify in DTrace. - Probes for locks should be fired after locking and before releasing to prevent races (to provide data/function stability in DTrace, see the man-page of "dtrace -v ..." and the corresponding DTrace docs).	2012-05-05 19:42:38 +00:00
Jung-uk Kim	d69a426fce	- Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27 but GNU libc used it without checking its kernel version, e. g., Fedora 10. - Move pipe(2) implementation for Linuxulator from MD files to MI file, sys/compat/linux/linux_file.c. There is no MD code for this syscall at all. - Correct an argument type for pipe() from l_ulong * to l_int *. Probably this was the source of MI/MD confusion. Reviewed by: emulation	2012-04-16 21:22:02 +00:00
Tijl Coosemans	a5e3a5a56f	Remove some unnecessary includes.	2012-03-18 19:15:11 +00:00
Tijl Coosemans	6e310b206f	Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h. Reviewed by: kib	2012-03-18 19:12:11 +00:00
Tijl Coosemans	01cd19680d	Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98 reg.h with stubs. The tREGISTER macros are only made visible on i386. These macros are deprecated and should not be available on amd64. The i386 and amd64 versions of struct reg have been renamed to struct __reg32 and struct __reg64. During compilation either __reg32 or __reg64 is defined as reg depending on the machine architecture. On amd64 the i386 struct is also available as struct reg32 which is used in COMPAT_FREEBSD32 code. Most of compat/ia32/ia32_reg.h is now IA64 only. Reviewed by: kib (previous version)	2012-03-18 19:06:38 +00:00
Tijl Coosemans	786645078b	Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h. Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed. Create machine/npx.h on amd64 to allow compiling i386 code that uses this header. The original npx.h and fpu.h define struct envxmm differently. Both definitions have been included in the new x86 header as struct __envxmm32 and struct __envxmm64. During compilation either __envxmm32 or __envxmm64 is defined as envxmm depending on machine architecture. On amd64 the i386 struct is also available as struct envxmm32. Reviewed by: kib	2012-03-16 20:24:30 +00:00
Rebecca Cran	03225fac13	Fix race condition in KfRaiseIrql(). After getting the current irql, if the kthread gets preempted and subsequently runs on a different CPU, the saved irql could be wrong. Also, correct the panic string. PR: kern/165630 Submitted by: Vladislav Movchan <vladislav.movchan at gmail.com>	2012-03-04 17:08:43 +00:00
Juli Mallett	a6d20bbaa2	On MIPS, _ALIGN always aligns to 8 bytes, even for 32-bit binaries. This might not be ideal, but is the ABI we've shipped so far. Fix macros which reflect the results of _ALIGN on 32-bit MIPS to use the right alignment. This fixes sendmsg under COMPAT_FREEBSD32 on n64 MIPS kernels.	2012-03-03 21:39:12 +00:00
Juli Mallett	9624d94701	o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands using the o32 ABI. This mostly follows nwhitehorn's lead in implementing COMPAT_FREEBSD32 on powerpc64. o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the 32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit ABIs use a 64-bit time_t. o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS with 32-bit compatibility, then, disable some code which assumes otherwise wrongly when built for MIPS. A more general macro to check in this case would seem like a good idea eventually. If someone adds support for using n32 userland with n64 kernels on MIPS, then they will have to add a variety of flags related to each piece of the ABI that can vary. That's probably the right time to generalize further. o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the freebsd32 compat code. Probably this should be generalized at some point. Reviewed by: gonzo	2012-03-03 08:19:18 +00:00
Martin Matuska	41c0675e6e	Add procfs to jail-mountable filesystems. Reviewed by: jamie MFC after: 1 week	2012-02-29 00:30:18 +00:00
Konstantin Belousov	526d0bd547	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month	2012-02-21 01:05:12 +00:00
Konstantin Belousov	3494f31ad2	Fix misuse of the kernel map in miscellaneous image activators. Vnode-backed mappings cannot be put into the kernel map, since it is a system map. Use exec_map for transient mappings, and remove the mappings with kmem_free_wakeup() to notify the waiters on available map space. Do not map the whole executable into KVA at all to copy it out into usermode. Directly use vn_rdwr() for the case of not page aligned binary. There is one place left where the potentially unbounded amount of data is mapped into exec_map, namely, in the COFF image activator enumeration of the needed shared libraries. Reviewed by: alc MFC after: 2 weeks	2012-02-17 23:47:16 +00:00
Ed Schouten	7870adb640	Remove direct access to si_name. Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly. MFC after: 2 weeks	2012-02-10 12:35:57 +00:00
David Xu	d56e058a79	Add 32-bit compat code for AIO kevent flags introduced in revision 230857.	2012-02-05 04:49:31 +00:00
Konstantin Belousov	8c6f8f3d5b	Add support for the extended FPU states on amd64, both for native 64bit and 32bit ABIs. As a side-effect, it enables AVX on capable CPUs. In particular: - Query the CPU support for XSAVE, list of the supported extensions and the required size of FPU save area. The hw.use_xsave tunable is provided for disabling XSAVE, and hw.xsave_mask may be used to select the enabled extensions. - Remove the FPU save area from PCB and dynamically allocate the (run-time sized) user save area on the top of the kernel stack, right above the PCB. Reorganize the thread0 PCB initialization to postpone it after BSP is queried for save area size. - The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as well. FPU state is only useful for suspend, where it is saved in dynamically allocated suspfpusave area. - Use XSAVE and XRSTOR to save/restore FPU state, if supported and enabled. - Define new mcontext_t flag _MC_HASFPXSTATE, indicating that mcontext_t has a valid pointer to out-of-struct extended FPU state. Signal handlers are supplied with stack-allocated fpu state. The sigreturn(2) and setcontext(2) syscall honour the flag, allowing the signal handlers to inspect and manipilate extended state in the interrupted context. - The getcontext(2) never returns extended state, since there is no place in the fixed-sized mcontext_t to place variable-sized save area. And, since mcontext_t is embedded into ucontext_t, makes it impossible to fix in a reasonable way. Instead of extending getcontext(2) syscall, provide a sysarch(2) facility to query extended FPU state. - Add ptrace(2) support for getting and setting extended state; while there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries. - Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to consumers, making it opaque. Internally, struct fpu_kern_ctx now contains a space for the extended state. Convert in-kernel consumers of fpu_kern KPI both on i386 and amd64. First version of the support for AVX was submitted by Tim Bird <tim.bird am sony com> on behalf of Sony. This version was written from scratch. Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org> MFC after: 1 month	2012-01-21 17:45:27 +00:00
Kirk McKusick	cc672d3599	Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks	2012-01-17 01:08:01 +00:00
Mikolaj Golub	fe7f89b71a	Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want to read strings completely to know the actual size. As a side effect it fixes the issue with kern.proc.args and kern.proc.env sysctls, which didn't return the size of available data when calling sysctl(3) with the NULL argument for oldp. Note, in get_ps_strings(), which does actual work for proc_getargv() and proc_getenvv(), we still have a safety limit on the size of data read in case of a corrupted procces stack. Suggested by: kib MFC after: 3 days	2012-01-15 18:47:24 +00:00
Ulrich Spörlein	9a14aa017b	Convert files to UTF-8	2012-01-15 13:23:18 +00:00
Dimitry Andric	69ee3e2f52	In sys/compat/linux/linux_ioctl.c, work around a warning when a pointer is compared to an integer, by casting the pointer to l_uintptr_t. No functional difference on both i386 and amd64. Reviewed by: ed, jhb MFC after: 1 week	2012-01-03 18:49:39 +00:00
Dimitry Andric	84143cee4f	In sys/compat/ndis/subr_ntoskrnl.c, change the RtlFillMemory function definition from K&R to ANSI, to avoid a clang warning about the uint8_t parameter being promoted to int, which is not compatible with the type declared in the earlier prototype. MFC after: 1 week	2011-12-30 17:18:09 +00:00
John Baldwin	dd01579cde	Implement linux_fadvise64() and linux_fadvise64_64() using kern_posix_fadvise(). Reviewed by: silence on emulation@ MFC after: 2 weeks	2011-12-29 15:34:59 +00:00
Mikolaj Golub	022eba3410	Protect process environment variables with p_candebug(). Discussed with: jilles, kib, rwatson MFC after: 2 weeks	2011-12-04 21:43:13 +00:00
Mikolaj Golub	7a837e17ba	Retire linprocfs_doargv(). Instead use new functions, proc_getargv() and proc_getenvv(), which were implemented using linprocfs_doargv() as a reference. Suggested by: kib Reviewed by: kib Approved by: des (linprocfs maintainer) MFC after: 2 weeks	2011-11-22 20:45:11 +00:00
Lawrence Stewart	cf13a58510	- Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate() system calls to provide feed-forward clock management capabilities to userspace processes. ffclock_getcounter() returns the current value of the kernel's feed-forward clock counter. ffclock_getestimate() returns the current feed-forward clock parameter estimates and ffclock_setestimate() updates the feed-forward clock parameter estimates. - Document the syscalls in the ffclock.2 man page. - Regenerate the script-derived syscall related files. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 01:26:10 +00:00
Ed Schouten	767a32641c	Make the Linux *at() calls a bit more complete. Properly support: - AT_EACCESS for faccessat(), - AT_SYMLINK_FOLLOW for linkat().	2011-11-19 07:19:37 +00:00
Ed Schouten	51cfb9474f	Regenerate system call tables.	2011-11-19 06:36:11 +00:00
Ed Schouten	d3a993d46b	Improve access() parameter name consistency. The current code mixes the use of `flags' and `mode'. This is a bit confusing, since the faccessat() function as a `flag' parameter to store the AT_ flag. Make this less confusing by using the same name as used in the POSIX specification -- `amode'.	2011-11-19 06:35:15 +00:00
John Baldwin	7edec6214e	- Split out a kern_posix_fadvise() from the posix_fadvise() system call so it can be used by in-kernel consumers. - Make kern_posix_fallocate() public. - Use kern_posix_fadvise() and kern_posix_fallocate() to implement the freebsd32 wrappers for the two system calls.	2011-11-14 18:00:15 +00:00
Sergey Kandaurov	81f7f2c4db	struct timespec32: change types of tv_sec and tv_nsec fields to signed to match native struct timespec ABI on __LP32__. This change is a prerequisite for upcoming futimens()/utimensat() in whose implementations it is assumed that timespec32 can take a negative value. MFC after: 1 week	2011-11-11 07:17:00 +00:00
Ryan Stone	493b584dbd	Correct the types of the arguments to return probes of the syscall provider. Previously we were erroneously supplying the argument types of the corresponding entry probe. Reviewed by: rpaulo MFC after: 1 week	2011-11-11 03:49:42 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
John Baldwin	cd06ae5c1b	Regen.	2011-11-04 04:06:31 +00:00
John Baldwin	936c09ac0f	Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month	2011-11-04 04:02:50 +00:00

1 2 3 4 5 ...

2170 Commits