freebsd-nq

Author	SHA1	Message	Date
Kevin Lo	41f1dccceb	Add unicode support to msdosfs and smbfs; original patches from imura, bug fixes by Kuan-Chung Chiu <buganini at gmail dot com>. Tested by me in production for several days at work.	2011-11-18 03:05:20 +00:00
Pawel Jakub Dawidek	d576deedb5	Constify arguments for locking KPIs where possible. This enables locking consumers to pass their own structures around as const and be able to assert locks embedded into those structures. Reviewed by: ed, kib, jhb	2011-11-16 21:51:17 +00:00
Pawel Jakub Dawidek	a20358302f	Constify stack argument for functions that don't modify it. Reviewed by: ed, kib, jhb	2011-11-16 19:06:55 +00:00
Marius Strobl	d7ecd801ed	As it turns out, r186347 actually is insufficient to avoid the use of the curthread-accessing part of mtx_{,un}lock(9) when using a r210623-style curthread implementation on sparc64, crashing the kernel in its early cycles as PCPU isn't set up, yet (and can't be set up as OFW is one of the things we need for that, which leads to a chicken-and-egg problem). What happens is that due to the fact that the idea of r210623 actually is to allow the compiler to cache invocations of curthread, it factors out obtaining curthread needed for both mtx_lock(9) and mtx_unlock(9) to before the branch based on kobj_mutex_inited when compiling the kernel without the debugging options. So change kobj_class_compile_static(9) to just never acquire kobj_mtx, effectively restricting it to its documented use, and add a kobj_init_static(9) for initializing objects using a class compiled with the former and that also avoids using mutex(9) (and malloc(9)). Also assert in both of these functions that they are used in their intended way only. While at it, inline kobj_register_method() and kobj_unregister_method() as there wasn't much point for factoring them out in the first place and so that a reader of the code has to figure out the locking for fewer functions missing a KOBJ_ASSERT. Tested on powerpc{,64} by andreast. Reviewed by: nwhitehorn (earlier version), jhb MFC after: 3 days	2011-11-15 20:11:03 +00:00
David E. O'Brien	0e31b3c15f	Reformat comment to be more readable in standard Xterm. (while I'm here, wrap other long lines)	2011-11-15 01:48:53 +00:00
Robert Millan	ea4d9a14f1	Remove a few bits of FreeBSD 2.x compatibility code. Approved by: kib (mentor)	2011-11-14 18:21:27 +00:00
John Baldwin	7edec6214e	- Split out a kern_posix_fadvise() from the posix_fadvise() system call so it can be used by in-kernel consumers. - Make kern_posix_fallocate() public. - Use kern_posix_fadvise() and kern_posix_fallocate() to implement the freebsd32 wrappers for the two system calls.	2011-11-14 18:00:15 +00:00
Alfred Perlstein	cfb09e00e6	Constify args to copyiniov and copyinuio.	2011-11-14 07:12:10 +00:00
Konstantin Belousov	56be1b9a7a	To limit amount of the kernel memory allocated, and to optimize the iteration over the fdsets, kern_select() limits the length of the fdsets copied in by the last valid file descriptor index. If any bit is set in a mask above the limit, current implementation ignores the filedescriptor, instead of returning EBADF. Fix the issue by scanning the tails of fdset before entering the select loop and returning EBADF if any bit above last valid filedescriptor index is set. The performance impact of the additional check is only imposed on the (somewhat) buggy applications that pass bad file descriptors to select(2) or pselect(2). PR: kern/155606, kern/162379 Discussed with: cognet, glebius Tested by: andreast (powerpc, all 64/32bit ABI combinations, big-endian), marius (sparc64, big-endian) MFC after: 2 weeks	2011-11-13 10:28:01 +00:00
Konstantin Belousov	4d651f4e5f	Style. MFC after: 1 week	2011-11-11 04:13:47 +00:00
Konstantin Belousov	f403cfb19c	Guard against the unlikely case of the alias path containing the '%' symbols. Reported by: arundel MFC after: 1 week	2011-11-11 04:12:58 +00:00
Ryan Stone	493b584dbd	Correct the types of the arguments to return probes of the syscall provider. Previously we were erroneously supplying the argument types of the corresponding entry probe. Reviewed by: rpaulo MFC after: 1 week	2011-11-11 03:49:42 +00:00
Ed Schouten	d09ebcec17	Simplify the code emitted by makeobjops.awk slightly. Just place the default kobj_method inside the kobjop_desc structure. There's no need to give these kobj_methods their own symbol. This shaves off 10 KB of a GENERIC kernel binary.	2011-11-09 11:00:29 +00:00
Ed Schouten	3f3f6bc302	Make kobj_methods constant. These structures hold no information that is modified during runtime. By marking this constant, we see approximately 600 symbols become read-only (amd64 GENERIC). While there, also mark the kobj_method structures generated by makeobjops.awk static. They are only referenced by the kobjop_desc structures within the same file. Before: $ ls -l kernel -rwxr-xr-x 1 ed wheel 15937309 Nov 8 16:29 kernel* $ size kernel text data bss dec hex filename 12260854 1358468 2848832 16468154 fb48ba kernel $ nm kernel | fgrep -c ' r ' 8240 After: $ ls -l kernel -rwxr-xr-x 1 ed wheel 15922469 Nov 8 16:25 kernel* $ size kernel text data bss dec hex filename 12302869 1302660 2848704 16454233 fb1259 kernel $ nm kernel | fgrep -c ' r ' 8838	2011-11-08 15:38:21 +00:00
Ryan Stone	6f6924e5a6	The in-kernel CTF parser caches the result of its first attempt to parse CTF data from a module. On subsequent attempts to retrieve CTF data for a module, return an error if there is no CTF data. This fixes a panic if you try to enable fbt probes on a module with CTF data twice. Submitted by: Paul Ambrose (ambrosehua AT gmail DOT com) MFC after: 3 days	2011-11-08 15:17:54 +00:00
Attilio Rao	ed1f6dc235	Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on all the architectures. The option allows mounting non-MPSAFE filesystems. Without it, the kernel will refuse to mount a non-MPSAFE filesystem. This patch is part of the effort of killing non-MPSAFE filesystems from the tree. No MFC is expected for this patch. Tested by: gianni Reviewed by: kib	2011-11-08 10:18:07 +00:00
Mikolaj Golub	5384d08913	Add KVME_FLAG_SUPER and use it in sysctl_kern_proc_vmmap for marking entries with superpages. Submitted by: Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net> Reviewed by: alc, rwatson	2011-11-07 21:13:19 +00:00
Mikolaj Golub	bde886fba4	In lim_fork() assert that processes locks are held. Suggested by: kib	2011-11-07 21:09:04 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Max Khon	4e313b699e	Add KLD_DEBUG option.	2011-11-06 08:10:41 +00:00
John Baldwin	cd06ae5c1b	Regen.	2011-11-04 04:06:31 +00:00
John Baldwin	936c09ac0f	Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month	2011-11-04 04:02:50 +00:00
John Baldwin	dccc45e4c0	Move the cleanup of f_cdevpriv when the reference count of a devfs file descriptor drops to zero out of _fdrop() and into devfs_close_f() as it is only relevant for devfs file descriptors. Reviewed by: kib MFC after: 1 week	2011-11-04 03:39:31 +00:00
Attilio Rao	2b10b1f872	Disable interrupt and preemption for smp_rendezvous() also in the UP/!SMP case. The callbacks may be relying on this feature and having 2 different ways to deal with them is not correct. Reported by: rstone Reviewed by: jhb MFC after: 2 weeks	2011-11-03 14:36:56 +00:00
Marcel Moolenaar	b2f1a8f2b3	Revert rev. 226893: subr_syscall.c is being included from C files and on amd64 with FREEBSD32 enabled, this means that systrace_probe_func gets defined twice.	2011-10-30 02:19:39 +00:00
Marcel Moolenaar	056f0ec755	Define systrace_probe_func in subr_syscall.c where it's used, instead of defining it in MD code. This eliminates porting to other architectures.	2011-10-29 01:26:36 +00:00
Sergey Kandaurov	c241c5e49a	Fix arguments list for proc:::signal-discard DTrace probe. Reported by: Anton Yuzhaninov <citrin citrin ru> MFC after: 1 week	2011-10-28 15:22:51 +00:00
John Baldwin	62238a6791	Whitespace fix.	2011-10-27 17:43:36 +00:00
Alan Cox	703dec68bf	Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.	2011-10-27 16:39:17 +00:00
Sergey Kandaurov	3bedc94069	Remove the long deprecated ``/stand/sysinstall'' from the init_path. It can be put back using the INIT_PATH config option or init_path loader variable, if still needed (which I doubt). MFC after: 1 week	2011-10-27 10:25:11 +00:00
Alan Cox	f346986b76	contigmalloc(9) and contigfree(9) are now implemented in terms of other more general VM system interfaces. So, their implementation can now reside in kern_malloc.c alongside the other functions that are declared in malloc.h.	2011-10-27 02:52:24 +00:00
John Baldwin	c48fb4da4c	- Fixup filenames in a few more places where they are used. - Some whitespace fixes.	2011-10-26 15:17:42 +00:00
Pawel Jakub Dawidek	4c11f091df	The v_data field is a pointer, so set it to NULL, not 0. MFC after: 3 days	2011-10-25 14:01:17 +00:00
Marcel Moolenaar	421b7fe574	Don't terminate the interactive root mount prompt on mount failure. This restores the previous behaviour. While here, match '?' and '.' inputs exactly and improve the error message. Requested by: avg@ Derived from a patch by: Arnaud Lacombe <lacombar@gmail.com>	2011-10-23 20:03:33 +00:00
Dag-Erling Smørgrav	e141be6f79	Revisit the capability failure trace points. The initial implementation only logged instances where an operation on a file descriptor required capabilities which the file descriptor did not have. By adding a type enum to struct ktr_cap_fail, we can catch other types of capability failures as well, such as disallowed system calls or attempts to wrap a file descriptor with more capabilities than it had to begin with.	2011-10-18 07:28:58 +00:00
Marcel Moolenaar	80f1c58b0a	Fix double vision syndrome (read: double output) when in the debugger without a panic.	2011-10-16 14:16:46 +00:00
Konstantin Belousov	126b36a21e	Control the execution permission of the readable segments for i386 binaries on the amd64 and ia64 with the sysctl, instead of unconditionally enabling it. Reviewed by: marcel	2011-10-15 12:35:18 +00:00
Marcel Moolenaar	676eda08d0	In elf32_trans_prot() and when compiling for amd64 or ia64, add PROT_EXECUTE when PROT_READ is needed. By default i386 allows execution when reading is allowed and JDK 1.4.x depends on that.	2011-10-13 16:16:46 +00:00
Gleb Smirnoff	8d689e042f	Make memguard(9) capable of guarding uma(9) allocations.	2011-10-12 18:08:28 +00:00
Robert Watson	b160c14194	Correct a bug in export of capability-related information from the sysctls supporting procstat -f: properly provide capability rights information to userspace. The bug resulted from a merge-o during upstreaming (or rather, a failure to properly merge FreeBSD-side changes downstream). Spotted by: des, kibab MFC after: 3 days	2011-10-12 12:08:03 +00:00
Adrian Chadd	df46ae53f6	Don't call fixup_filename() on each witness lock call. This has been irking me for a while. This causes significant CPU use on bottlenecked CPUs (eg my older EEEPC w/ an earlier Celeron CPU and my MIPS24k boards) when they're passing a lot of traffic. Since the file/line values are only used for printing, this should only affect display. It should have no operational change on the code, besides reducing CPU use.	2011-10-12 09:21:02 +00:00
Dag-Erling Smørgrav	c601ad8eeb	Add a new trace point, KTRFAC_CAPFAIL, which traces capability check failures. It is included in the default set for ktrace(1) and kdump(1).	2011-10-11 20:37:10 +00:00
Kirk McKusick	cd795a6e1f	When unmounting a filesystem always wait for the vfs_busy lock to clear so that if no vnodes in the filesystem are actively in use the unmount will succeed rather than failing with EBUSY. Reported by: Garrett Cooper Reviewed by: Attilio Rao and Kostik Belousov Tested by: Garrett Cooper PR: kern/161016 MFC after: 3 weeks	2011-10-11 18:46:41 +00:00
Marius Strobl	f305d1b0db	In device_get_children() avoid malloc(0) in order to increase portability to other operating systems. PR: 154287	2011-10-09 21:21:37 +00:00
Alan Cox	1549ed03ff	Fix the handling of an empty kmem map by sysctl_kmem_map_free(). In the unlikely event that sysctl_kmem_map_free() was performed on an empty kmem map, it would incorrectly report the free space as zero. Discussed with: avg MFC after: 1 week	2011-10-08 18:29:30 +00:00
Jonathan Anderson	25e33e625f	Change one printf() to log(). As noted in kern/159780, printf() is not very jail-friendly, since it can't be easily monitored by jail management tools. This patch reports an error via log() instead, which, if nobody is watching the log file, still prints to the console. Approved by: mentor (rwatson) Submitted by: Eugene Grosbein <eugen@eg.sd.rdtc.ru> MFC after: 5 days	2011-10-07 09:51:12 +00:00
David E. O'Brien	ef522f9515	Disallow various debug.kdb sysctl's when securelevel is raised. PR: 161350	2011-10-07 05:47:30 +00:00
Xin LI	2b03effa01	Return proper errno when we hit error when doing sanity check. This fixes dtrace crashes when module is not compiled with CTF data. Submitted by: Paul Ambrose ambrosehua at gmail.com MFC after: 1 week	2011-10-07 01:37:58 +00:00
Marius Strobl	880bf8b9bd	- Currently, sched_balance_pair() may cause a CPU to send an IPI_PREEMPT to itself, which sparc64 hardware doesn't support. One way to solve this would be to directly call sched_preempt() instead of issuing a self-IPI. However, quoting jhb@: "On the other hand, you can probably just skip the IPI entirely if we are going to send it to the current CPU. Presumably, once this routine finishes, the current CPU will exit softlock (or will do so "soon") and will then pick the next thread to run based on the adjustments made in this routine, so there's no need to IPI the CPU running this routine anyway. I think this is the better solution. Right now what is probably happening on other platforms is as soon as this routine finishes the CPU processes its self-IPI and causes mi_switch() which will just switch back to the softclock thread it is already running." - With r226054 and the above change in place, sparc64 now no longer is incompatible with ULE and vice versa. However, powerpc/E500 still is. Submitted by: jhb Reviewed by: jeff	2011-10-06 11:48:13 +00:00
Edward Tomasz Napierala	4ce9c95bd8	Remove assertion against empty NFSv4 ACLs. An empty ACL is not exactly valid - we don't allow for setting it on a file, for example - but it's not something we should assert on. For STABLE kernel, it changes nothing, because it's not compiled with INVARIANTS. If it was, it would fix crashes. It also fixes an assert in libc encountered with NFSv4 without nfsuserd(8) running. Submitted by: Yuri Pankov (earlier version) MFC after: 1 month	2011-10-05 17:29:49 +00:00
Konstantin Belousov	a101072d7f	Supply unique (st_dev, st_ino) value pair for the fstat(2) done on the pipes. Reviewed by: jhb, Peter Jeremy <peterjeremy acm org> MFC after: 2 weeks	2011-10-05 16:56:06 +00:00
Konstantin Belousov	837b4d462d	Move parts of the commit log for r166167, where Tor explained the interaction between vnode locks and vfs_busy(), into comment. MFC after: 1 week	2011-10-04 18:45:29 +00:00
Edward Tomasz Napierala	c0c0936205	Actually enforce limit for inheritable resources on fork. MFC after: 3 days	2011-10-04 14:56:33 +00:00
Edward Tomasz Napierala	2d8696d1e8	Move some code inside the racct_proc_fork(); it spares a few lock operations and it's more logical this way. MFC after: 3 days	2011-10-03 17:40:55 +00:00
Konstantin Belousov	24f3dcfe50	Assert that exiting process does not return to usermode. Reviewed by: avg, jhb MFC after: 1 week	2011-10-03 16:58:58 +00:00
Edward Tomasz Napierala	72a401d918	Fix another bug introduced in r225641, which caused rctl to access certain fields in 'struct proc' before they got initialized in do_fork(). MFC after: 3 days	2011-10-03 16:23:20 +00:00
Edward Tomasz Napierala	ac6fafe6c2	Fix bug introduced in r225641, which would cause panic if racct_proc_fork() returned error -- the racct_destroy_locked() would get called twice. MFC after: 3 days	2011-10-03 15:32:15 +00:00
Konstantin Belousov	8e9a54ee46	The sigwait(3) function shall not return EINTR, according to the POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7 contains the wrapper sigwait(3) which hides EINTR from callers. The EINTR return is used by libthr to handle required cancellation point in the sigwait(3). To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and earlier, to have right ABI for sigwait(3), transform EINTR return from sigwait(2) into ERESTART. Discussed with: davidxu MFC after: 1 week	2011-10-01 10:18:55 +00:00
Bjoern A. Zeeb	a06534c3c2	Fix handling of corrupt compress(1)ed data. [11:04] Add missing length checks on unix socket addresses. [11:05] Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-11:04.compress Security: CVE-2011-2895 [11:04] Security: FreeBSD-SA-11:05.unix	2011-09-28 08:47:17 +00:00
Attilio Rao	79a5956c23	Revert r225372: wdog_kern_pat() acquires eventhandler mutex, thus it cannot work in kernel context (from where kdb_trap() runs). The right way to fix this is both offering the cpu-stop-on-panic-and-skip-locking logic and also a context for KDB to officially run. We can re-enable this (or a similar) improvement when these 2 patches hit the tree. Sponsored by: Sandvine Incorporated Discussed with: emaste, rstone MFC after: immediately	2011-09-27 13:42:11 +00:00
Konstantin Belousov	ce8bd78b2a	Do not deliver SIGTRAP on exec as the normal signal, use ptracestop() on syscall exit path. Otherwise, if SIGTRAP is ignored, tdsendsignal() does not deliver the signal, and the debugger never gets a notification of exec. Found and tested by: Anton Yuzhaninov <citrin citrin ru> Discussed with: jhb MFC after: 2 weeks	2011-09-27 13:17:02 +00:00
Alexander Motin	556a5850fa	Fix interrupt counters dumping on SW_WATCHDOG fire.	2011-09-27 09:30:20 +00:00
Edward Tomasz Napierala	1dbf9dcc20	Fix error handling bug that would prevent MAC structures from getting freed properly if resource limit got exceeded. Approved by: re (kib)	2011-09-17 20:48:49 +00:00
Edward Tomasz Napierala	b38520f09c	Fix long-standing thinko regarding maxproc accounting. Basically, we were accounting the newly created process to its parent instead of the child itself. This caused problems later, when the child changed its credentials - the per-uid, per-jail etc counters were not properly updated, because the maxproc counter in the child process was 0. Approved by: re (kib)	2011-09-17 19:55:32 +00:00
Kip Macy	9eca9361f9	Auto-generated code from sys_ prefixing makesyscalls.sh change Approved by: re(bz)	2011-09-16 14:04:14 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Adrian Chadd	d2849f27bc	Ensure that ta_pending doesn't overflow u_short by capping its value at USHRT_MAX. If it overflows before the taskqueue can run, the task will be re-added to the taskqueue and cause a loop in the task list. Reported by: Arnaud Lacombe <lacombar@gmail.com> Submitted by: Ryan Stone <rysto32@gmail.com> Reviewed by: jhb Approved by: re (kib) MFC after: 1 day	2011-09-15 08:42:06 +00:00
Rick Macklem	4d30adc494	Modify vfs_register() to use a hash calculation on vfc_name to set vfc_typenum, so that vfc_typenum doesn't change when file systems are loaded in different orders. This keeps NFS file handles from changing, for file systems that use vfc_typenum in their fsid. This change is controlled via a loader.conf variable called vfs.typenumhash, since vfc_typenum will change once when this is enabled. It defaults to 1 for 9.0, but will default to 0 when MFC'd to stable/8. Tested by: hrs Reviewed by: jhb, pjd (earlier version) Approved by: re (kib) MFC after: 1 month	2011-09-13 21:01:26 +00:00
Attilio Rao	58379067a3	dump_write() returns ENXIO if the dump is trying to be written outside of the device boundary. While this is generally ok, the problem is that all the consumers handle similar cases (and expect to catch) ENOSPC for this (for a reference look at minidumpsys() and dumpsys() constructions). That ends up in consumers not recognizing the issue and amd64 failing to retry if the number of pages grows up during minidump. Fix this by returning ENOSPC in dump_write() and while here add some more diagnostic on involved values. Sponsored by: Sandvine Incorporated In collaboration with: emaste Approved by: re (kib) MFC after: 10 days	2011-09-12 20:39:31 +00:00
Ed Schouten	ca0856d3d1	Fix error return codes for ioctls on init/lock state devices. In revision 223722 we introduced support for driver ioctls on init/lock state devices. Unfortunately the call to ttydevsw_cioctl() clobbers the value of the error variable, meaning that in many cases ioctl() will now return ENOTTY, even though the ioctl() was processed properly. Reported by: Boris Samorodov <bsam ipt ru> Patch by: jilles@ Approved by: re@ (kib@)	2011-09-12 10:07:21 +00:00
Konstantin Belousov	26ccf4f10f	Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-11 16:05:09 +00:00
Attilio Rao	fa2b39a18d	Improve the information reported in case of busy buffers during the shutdown: - Axe out the SHOW_BUSYBUFS option and use a tunable to selectively enable/disable it, which defaults to not printing anything (0 value) but can be changed to print (1 value) and be verbose (2 value) - Improve the information output: right now, there is no track of the actual struct buf object or vnode which are referenced by the shutdown process, but the related struct bufobj object is printed, which is not really helpful - Add more verbosity about the state of the struct buf lock and the vnode information, with the latter to be activated separately by the sysctl Sponsored by: Sandvine Incorporated Reviewed by: emaste, kib Approved by: re (ksmith) MFC after: 10 days	2011-09-08 12:56:26 +00:00
Edward Tomasz Napierala	ba1b206990	Fix whitespace. Submitted by: amdmi3 Approved by: re (rwatson)	2011-09-07 07:52:45 +00:00
Edward Tomasz Napierala	3044751e35	Work around a kernel panic triggered by forkbomb with an rctl rule such as j:name:maxproc:sigkill=100. Proper fix - deferring psignal to a taskqueue - is somewhat complicated and thus will happen after 9.0. Approved by: re (kib)	2011-09-06 17:22:40 +00:00
Attilio Rao	9f39e22e6d	Interrupts are disabled/enabled when entering and exiting the KDB context. While this is generally good, it brings along a series of problems, like clocks going off sync and in presence of SW_WATCHDOG, watchdogs firing without a good reason (missed hardclock wdog ticks update). Fix the latter by kicking the watchdog just before re-enabling the interrupts. Also, while here, do not rely on users to stop the watchdog manually when entering DDB but do that when entering KDB context. Sponsored by: Sandvine Incorporated Reviewed by: emaste, rstone Approved by: re (kib) MFC after: 1 week	2011-09-04 13:07:02 +00:00
Edward Tomasz Napierala	cff08ec0f4	Since r224036 the cputime and wallclock are supposed to be in seconds, not microseconds. Make it so. Approved by: re (kib)	2011-09-04 05:04:34 +00:00
Edward Tomasz Napierala	2419d7f93b	Fix panic that happens when fork(2) fails due to a limit other than the rctl one - for example, it happens when someone reaches maximum number of processes in the system. Approved by: re (kib)	2011-09-03 08:08:24 +00:00
Robert Watson	9b6dd12e5d	Correct several issues in the integration of POSIX shared memory objects and the new setmode and setowner fileops in FreeBSD 9.0: - Add new MAC Framework entry point mac_posixshm_check_create() to allow MAC policies to authorise shared memory use. Provide a stub policy and test policy templates. - Add missing Biba and MLS implementations of mac_posixshm_check_setmode() and mac_posixshm_check_setowner(). - Add 'accmode' argument to mac_posixshm_check_open() -- unlike the mac_posixsem_check_open() entry point it was modeled on, the access mode is required as shared memory access can be read-only as well as writable; this isn't true of POSIX semaphores. - Implement full range of POSIX shared memory entry points for Biba and MLS. Sponsored by: Google Inc. Obtained from: TrustedBSD Project Approved by: re (kib)	2011-09-02 17:40:39 +00:00
Robert Watson	4cf7545589	Attempt to make break-to-debugger and alternative break-to-debugger more accessible: (1) Always compile in support for breaking into the debugger if options KDB is present in the kernel. (2) Disable both by default, but allow them to be enabled via tunables and sysctls debug.kdb.break_to_debugger and debug.kdb.alt_break_to_debugger. (3) options BREAK_TO_DEBUGGER and options ALT_BREAK_TO_DEBUGGER continue to behave as before -- only now instead of compiling in break-to-debugger support, they change the default values of the above sysctls to enable those features by default. Current kernel configurations should, therefore, continue to behave as expected. (4) Migrate alternative break-to-debugger state machine logic out of individual device drivers into centralised KDB code. This has a number of upsides, but also one downside: it's now tricky to release sio spin locks when entering the debugger, so we don't. However, similar logic does not exist in other device drivers, including uart. (5) dcons requires some special handling; unlike other console types, it allows overriding KDB's own debugger selection, so we need a new interface to KDB to allow that to work. GENERIC kernels in -CURRENT will now support break-to-debugger as long as appropriate boot/run-time options are set, which should improve the debuggability of BETA kernels significantly. MFC after: 3 weeks Reviewed by: kib, nwhitehorn Approved by: re (bz)	2011-08-26 21:46:36 +00:00
Xin LI	cd39bb098e	Fix format strings for KTR_STATE in 4BSD and ULE schedulers. Submitted by: Ivan Klymenko <fidaj@ukr.net> PR: kern/159904, kern/159905 MFC after: 2 weeks Approved by: re (kib)	2011-08-26 18:00:07 +00:00
Jamie Gritton	e6d5cb63fa	Delay the recursive decrement of pr_uref when jails are made invisible but not removed; decrement it instead when the child jail actually goes away. This avoids letting the counter go below zero in the case where dying (pr_uref==0) jails are "resurrected", and an associated KASSERT panic. Submitted by: Steven Hartland Approved by: re (bz) MFC after: 1 week	2011-08-26 16:03:34 +00:00
Attilio Rao	6aba400a70	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before destroying the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before destroying the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which install selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements fully correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks	2011-08-25 15:51:54 +00:00
Bjoern A. Zeeb	b233773bb9	Increase the defaults for the maximum socket buffer limit, and the maximum TCP send and receive buffer limits from 256kB to 2MB. For sb_max_adj we need to add the cast as already used in the sysctl handler to not overflow the type doing the maths. Note that this is just the defaults. They will allow more memory to be consumed per socket/connection if needed but not change the default "idle" memory consumption. All values are still tunable by sysctls. Suggested by: gnn Discussed on: arch (Mar and Aug 2011) MFC after: 3 weeks Approved by: re (kib)	2011-08-25 09:20:13 +00:00
Martin Matuska	82378711f9	Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week	2011-08-25 08:17:39 +00:00
Attilio Rao	e75baa2802	callout_cpu_switch() allows preemption when dropping the outcoming callout cpu lock (and after having dropped it). If the newly scheduled thread wants to acquire the old queue it will just spin forever. Fix this by disabling preemption and interrupts entirely (because fast interrupt handlers may incur in the same problem too) while switching locks. Reported by: hrs, Mike Tancsa <mike AT sentex DOT net>, Chip Camden <sterling AT camdensoftware DOT com> Tested by: hrs, Mike Tancsa <mike AT sentex DOT net>, Chip Camden <sterling AT camdensoftware DOT com>, Nicholas Esborn <nick AT desert DOT net> Approved by: re (kib) MFC after: 10 days	2011-08-21 10:52:50 +00:00
Konstantin Belousov	aab4f50170	Prevent the hiwatermark for the unix domain socket from becoming effectively negative. Often seen as upstream fastcgi connection timeouts in nginx when using sendfile over unix domain sockets for communication. Sendfile(2) may send more bytes than currently allowed by the hiwatermark of the socket, e.g. because the so_snd sockbuf lock is dropped after the sbspace() call in the kern_sendfile() loop. In this case, the recalculated hiwatermark will overflow. Since the lowatermark is renewed as half of the hiwatermark by the sendfile code, and both are unsigned, the send buffer never reaches the free space requested by the lowatermark, causing an indefinite wait in sendfile. Reviewed by: rwatson Approved by: re (bz) MFC after: 2 weeks	2011-08-20 16:12:29 +00:00
Robert Watson	311fa10b52	r222015 introduced a new assertion that the size of a fixed-length sbuf buffer is greater than 1. This triggered panics in at least one spot in the kernel (the MAC Framework) which passes non-negative, rather than >1 buffer sizes based on the size of a user buffer passed into a system call. While 0-size buffers aren't particularly useful, they also aren't strictly incorrect, so loosen the assertion. Discussed with: phk (fears I might be EDOOFUS but willing to go along) Spotted by: pho + stress2 Approved by: re (kib)	2011-08-19 08:29:10 +00:00
Jonathan Anderson	f8ca0a757a	Auto-generated system call code based on r224987. Approved by: re (implicit)	2011-08-18 23:08:52 +00:00
Jonathan Anderson	cfb5f76865	Add experimental support for process descriptors. A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-08-18 22:51:30 +00:00
John Baldwin	f55d3fbe84	One of the general principles of the sysctl(3) API is that a user can query the needed size for a sysctl result by passing in a NULL old pointer and a valid oldsize. The kern.proc.args sysctl handler broke this assumption by not calling SYSCTL_OUT() if the old pointer was NULL. Approved by: re (kib) MFC after: 3 days	2011-08-18 22:20:45 +00:00
Konstantin Belousov	68889ed699	Fix build breakage. Initialize error variables explicitly for !MAC case. Pointy hat to: kib Approved by: re (bz)	2011-08-17 12:37:14 +00:00
Konstantin Belousov	9c00bb9190	Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores. Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)	2011-08-16 20:07:47 +00:00
Jonathan Anderson	d6f7248983	poll(2) implementation for capabilities. When calling poll(2) on a capability, unwrap first and then poll the underlying object. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-08-16 14:14:56 +00:00
Robert Watson	359b396113	Trim some warnings and notes from capabilities.conf -- these are left over from Capsicum development, and no longer apply. Approved by: re (kib) Sponsored by: Google Inc	2011-08-13 17:22:16 +00:00
Robert Watson	fd9a5f73f6	When falloc() was broken into separate falloc_noinstall() and finstall(), a bug was introduced in kern_openat() such that the error from the vnode open operation was overwritten before it was passed as an argument to dupfdopen(). This broke operations on /dev/{stdin,stdout,stderr}. Fix by preserving the original error number across finstall() so that it is still available. Approved by: re (kib) Reported by: cognet	2011-08-13 16:03:40 +00:00
Robert Watson	854d7b9fc8	Update use of the FEATURE() macro in sys_capability.c to reflect the move to two different kernel options for capability mode vs. capabilities. Approved by: re (bz)	2011-08-13 13:34:01 +00:00
Robert Watson	73516dbd27	Now that capability support has been committed, update and expand the comment at the top of sys_capability.c to describe its new contents. Approved by: re (xxx)	2011-08-13 13:26:40 +00:00
Robert Watson	74536eddbe	Regenerate system call files following r224812 changes to capabilities.conf. A no-op for non-Capsicum kernels; for Capsicum kernels, completes the enabling of fooat(2) system calls using capabilities. With this change, and subject to bug fixes, Capsicum capability support is now complete for 9.0. Approved by: re (kib) Submitted by: jonathan Sponsored by: Google Inc	2011-08-13 12:14:40 +00:00
Jonathan Anderson	bc69c09054	Allow openat(2), fstatat(2), etc. in capability mode. namei() and lookup() can now perform "strictly relative" lookups. Such lookups, performed when in capability mode or when looking up relative to a directory capability, enforce two policies: - absolute paths are disallowed (including symlinks to absolute paths) - paths containing '..' components are disallowed These constraints make it safe to enable openat() and friends. These system calls are instrumental in supporting Capsicum components such as the capability-mode-aware runtime linker. Finally, adjust comments in capabilities.conf to reflect the actual state of the world (e.g. shm_open(2) already has the appropriate constraints, getdents(2) already requires CAP_SEEK). Approved by: re (bz), mentor (rwatson) Sponsored by: Google Inc.	2011-08-13 10:43:21 +00:00
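
The two "strictly relative" lookup policies named in bc69c09054 can be illustrated with a small userspace sketch. This is not the kernel's namei()/lookup() code: the function name `is_strictly_relative` is hypothetical, and the check is purely lexical, so it cannot detect a symlink that resolves to an absolute path (which the commit says is also rejected).

```python
def is_strictly_relative(path: str) -> bool:
    """Return True if `path` passes the two lexical policies from the commit.

    Hypothetical helper for illustration only; it does not resolve symlinks.
    """
    # Policy 1: absolute paths are disallowed.
    if path.startswith("/"):
        return False
    # Policy 2: any '..' component is disallowed, wherever it appears.
    return all(component != ".." for component in path.split("/"))


if __name__ == "__main__":
    for candidate in ("lib/libc.so.7", "/etc/passwd", "a/../b", "a/..b"):
        verdict = "allowed" if is_strictly_relative(candidate) else "rejected"
        print(f"{candidate}: {verdict}")
```

Note that a name such as `a/..b` passes, because only a component that is exactly `..` moves the lookup upward.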


12420 Commits