freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	803a9b3efd	panic() with reasonable message instead of returning zero frequency causing division by zero later if event timer's minimal period is above one second. For now it is just a theoretical possibility. Found by: Clang Static Analyzer	2012-10-10 19:46:46 +00:00
Attilio Rao	3a4730256a	Add an unified macro to deny ability from the compiler to reorder instruction loads/stores at its will. The macro __compiler_membar() is currently supported for both gcc and clang, but kernel compilation will fail otherwise. Reviewed by: bde, kib Discussed with: dim, theraven MFC after: 2 weeks	2012-10-09 14:32:30 +00:00
Andriy Gapon	298fbd1605	cngetc: use cpu_spinwait to ease the cncheckc loop a tiny bit Reviewed by: julian MFC after: 10 days	2012-10-06 19:50:23 +00:00
Andriy Gapon	c331c9703c	ktrace/kern_exec: check p_tracecred instead of p_cred .. when deciding whether to continue tracing across suid/sgid exec. Otherwise if root ktrace-d an unprivileged process and the process exec-ed a suid program, then tracing didn't continue across exec. Reviewed by: bde, kib MFC after: 22 days	2012-10-06 19:23:44 +00:00
Ed Schouten	6b1b791da6	Fix faulty error code handling in read(2) on TTYs. When performing a non-blocking read(2), on a TTY while no data is available, we should return EAGAIN. But if there's a modem disconnect, we should return 0. Right now we only return 0 when doing a blocking read, which is wrong. MFC after: 1 month	2012-10-03 13:51:03 +00:00
Garrett Wollman	48b5c7410f	Fix spelling of the function name in two assertion messages.	2012-10-02 18:38:05 +00:00
Eitan Adler	8dbce2a343	Provide a generic way to disable devices at boot time PR: kern/119202 Requested by: peterj Reviewed by: sbruno, jhb Approved by: cperciva MFC after: 1 week	2012-10-02 03:33:41 +00:00
Pawel Jakub Dawidek	55711729f3	- Enforce CAP_MKFIFO on mkfifoat(2), not on mknodat(2). Without this change mkfifoat(2) was not restricted. - Introduce CAP_MKNOD and enforce it on mknodat(2). Sponsored by: FreeBSD Foundation MFC after: 2 weeks	2012-10-01 05:43:24 +00:00
Konstantin Belousov	877d24ac8a	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks	2012-09-28 11:25:02 +00:00
Matthew D Fleming	fc8fdae0df	Fix up kernel sources to be ready for a 64-bit ino_t. Original code by: Gleb Kurtsou	2012-09-27 23:30:49 +00:00
Pawel Jakub Dawidek	c8e781f6e0	Revert r240931, as the previous comment was actually in sync with POSIX. I have to note that POSIX is simply stupid in how it describes O_EXEC/fexecve and friends. Yes, not only inconsistent, but stupid. In the open(2) description, O_RDONLY flag is described as: O_RDONLY Open for reading only. Taken from: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html Note "for reading only". Not "for reading or executing"! In the fexecve(2) description you can find: The fexecve() function shall fail if: [EBADF] The fd argument is not a valid file descriptor open for executing. Taken from: http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html As you can see the function shall fail if the file was not open with O_EXEC! And yet, if you look closer you can find this mess in the exec.html: Since execute permission is checked by fexecve(), the file description fd need not have been opened with the O_EXEC flag. Yes, O_EXEC flag doesn't have to be specified after all. You can open a file with O_RDONLY and you still be able to fexecve(2) it.	2012-09-27 16:43:23 +00:00
Mikolaj Golub	47813f5d94	Kernel and modules have "set_vnet" linker set, where virtualized global variables are placed. When a module is loaded by link_elf linker its variables from "set_vnet" linker set are copied to the kernel "set_vnet" ("modspace") and all references to these variables inside the module are relocated accordingly. The issue is when a module is loaded that has references to global variables from another, previously loaded module: these references are not relocated so an invalid address is used when the module tries to access the variable. The example is V_layer3_chain, defined in ipfw module and accessed from ipfw_nat. The same issue is with DPCPU variables, which use "set_pcpu" linker set. Fix this making the link_elf linker on a module load recognize "external" DPCPU/VNET variables defined in the previously loaded modules and relocate them accordingly. For this set_pcpu_list and set_vnet_list are used, where the addresses of modules' "set_pcpu" and "set_vnet" linker sets are stored. Note, archs that use link_elf_obj (amd64) were not affected by this issue. Reviewed by: jhb, julian, zec (initial version) MFC after: 1 month	2012-09-27 14:55:15 +00:00
Konstantin Belousov	94cb35459d	Make the updates of the tid ring buffer's head and tail pointers explicit by moving them into separate statements from the buffer element accesses. Requested by: jhb MFC after: 3 days	2012-09-26 09:25:11 +00:00
Pawel Jakub Dawidek	28f865b0b1	Fix freebsd32_kmq_timedreceive() and freebsd32_kmq_timedsend() to use getmq_read() and getmq_write() respectively, just like sys_kmq_timedreceive() and sys_kmq_timedsend(). Sponsored by: FreeBSD Foundation MFC after: 2 weeks	2012-09-25 22:15:59 +00:00
Pawel Jakub Dawidek	8c706ce0d0	vn_write() always expects FOF_OFFSET flag, which is asserted at the beginning, so there is no need to check for it. Sponsored by: FreeBSD Foundation MFC after: 2 weeks	2012-09-25 21:31:17 +00:00
Pawel Jakub Dawidek	3a038c4d68	We cannot open file for reading and executing (O_RDONLY \| O_EXEC). Well, in theory we can pass those two flags, because O_RDONLY is 0, but we won't be able to read from a descriptor opened with O_EXEC. Update the comment. Sponsored by: FreeBSD Foundation MFC after: 2 weeks	2012-09-25 21:11:40 +00:00
Pawel Jakub Dawidek	5c3e5c7f03	Require CAP_DELETE on directory descriptor for unlinkat(2). Sponsored by: FreeBSD Foundation MFC after: 2 weeks	2012-09-25 21:00:36 +00:00
Pawel Jakub Dawidek	cffcbad2bf	Require CAP_CREATE on directory descriptor for symlinkat(2). Sponsored by: FreeBSD Foundation MFC after: 2 weeks	2012-09-25 20:59:12 +00:00
Pawel Jakub Dawidek	d2e166e654	Require CAP_CREATE on directory descriptor for linkat(2). Sponsored by: FreeBSD Foundation MFC after: 2 weeks	2012-09-25 20:58:15 +00:00
Pawel Jakub Dawidek	1159429db8	O_EXEC flag is not part of the O_ACCMODE mask, check it separately. If O_EXEC is provided don't require CAP_READ/CAP_WRITE, as O_EXEC is mutually exclusive to O_RDONLY/O_WRONLY/O_RDWR. Without this change CAP_FEXECVE capability right is not enforced. Sponsored by: FreeBSD Foundation MFC after: 3 days	2012-09-25 20:48:49 +00:00
George V. Neville-Neil	0bf9cb917c	Change the module name for the I/O provider to "kernel" from "genunix" This will require us to modify externally created DTrace scripts but makes logical sense for FreeBSD. Requested by: rpaulo MFC after: 2 weeks	2012-09-25 19:16:28 +00:00
John Baldwin	d95dca1d08	Add optional entropy harvesting for software interrupts in swi_sched() as controlled by kern.random.sys.harvest.swi. SWI harvesting feeds into the interrupt FIFO and each event is estimated as providing a single bit of entropy. Reviewed by: markm, obrien MFC after: 2 weeks	2012-09-25 14:55:46 +00:00
Konstantin Belousov	787a64ddd2	Do not skip two elements of the tid_buffer when reusing the buffer slot. This eventually results in exhaustion of the tid space, causing new threads get tid -1 as identifier. The bad effect of having the thread id equal to -1 is that UMTX_OP_UMUTEX_WAIT returns EFAULT for a lock owned by such thread, because casuword cannot distinguish between literal value -1 read from the address and -1 returned as an indication of faulted access. _thr_umutex_lock() helper from libthr does not check for errors from _umtx_op_err(2), causing an infinite loop in mutex_lock_sleep(). We observed the JVM processes hanging and consuming enormous amount of system time on machines with approximately 100 days uptime. Reported by: Mykola Dzham <freebsd levsha org ua> MFC after: 1 week	2012-09-22 12:17:09 +00:00
Eitan Adler	96240c89f0	Correct double "the the" Approved by: cperciva MFC after: 3 days	2012-09-14 21:28:56 +00:00
Andriy Gapon	e87fc7cf7b	sched_ule: fix inverted condition in reporting of priority lending via ktr Reviewed by: kan MFC after: 1 week	2012-09-14 19:55:28 +00:00
Attilio Rao	0a15e5d30d	Remove all the checks on curthread != NULL with the exception of some MD trap checks (eg. printtrap()). Generally this check is not needed anymore, as there is not a legitimate case where curthread != NULL, after pcpu 0 area has been properly initialized. Reviewed by: bde, jhb MFC after: 1 week	2012-09-13 22:26:22 +00:00
John Baldwin	0f14f15b62	Ignore stop and continue signals sent to an exiting process. Stop signals set p_xstat to the signal that triggered the stop, but p_xstat is also used to hold the exit status of an exiting process. Without this change, a stop signal that arrived after a process was marked P_WEXIT but before it was marked a zombie would overwrite the exit status with the stop signal number. Reviewed by: kib MFC after: 1 week	2012-09-13 15:51:18 +00:00
Attilio Rao	e3ae0dfe69	Improve check coverage about idle threads. Idle threads are not allowed to acquire any lock but spinlocks. Deny any attempt to do so by panicing at the locking operation when INVARIANTS is on. Then, remove the check on blocking on a turnstile. The check in sleepqueues is left because they are not allowed to use tsleep() either which could happen still. Reviewed by: bde, jhb, kib MFC after: 1 week	2012-09-12 22:10:53 +00:00
Attilio Rao	faa1082aa2	Tweak the commit message in case of panic for sleeping from threads with TDP_NOSLEEPING on. The current message has no information on the thread and wchan involved, which may be useful in case where dumps have mangled dwarf information. Reported by: kib Reviewed by: bde, jhb, kib MFC after: 1 week	2012-09-12 22:05:54 +00:00
Konstantin Belousov	bcd5bb8e57	Add a facility for vgone() to inform the set of subscribed mounts about vnode reclamation. Typical use is for the bypass mounts like nullfs to get a notification about lower vnode going away. Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument lowervp which is reclaimed. It is possible to register several reclamation event listeners, to correctly handle the case of several nullfs mounts over the same directory. For the filesystem not having nullfs mounts over it, the overhead added is a single mount interlock lock/unlock in the vnode reclamation path. In collaboration with: pho MFC after: 3 weeks	2012-09-09 19:17:15 +00:00
Konstantin Belousov	84c3cd4f19	Add MNTK_LOOKUP_EXCL_DOTDOT struct mount flag, which specifies to the lookup code that dotdot lookups shall override any shared lock requests with the exclusive one. The flag is useful for filesystems which sometimes need to upgrade shared lock to exclusive inside the VOP_LOOKUP or later, which cannot be done safely for dotdot, due to dvp also locked and causing LOR. In collaboration with: pho MFC after: 3 weeks	2012-09-09 19:11:52 +00:00
Attilio Rao	16cbf13b53	Move the checks for td_pinned, td_critnest, TDP_NOFAULTING and TDP_NOSLEEPING leaking from syscallret() to userret() so that also trap handling is covered. Also, the check on td_locks is not duplicated between the two functions. Reported by: avg Reviewed by: kib MFC after: 1 week	2012-09-08 18:35:15 +00:00
Attilio Rao	fbe18392a1	Move PT_UPDATED_FLUSH() before td_locks check in order to have more coverage also in the XEN case. Reviewed by: kib MFC after: 1 week	2012-09-08 18:29:53 +00:00
Attilio Rao	324e57150d	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week	2012-09-08 18:27:11 +00:00
Gleb Smirnoff	aaf6343576	Supply the pr_ctloutput method for local datagram sockets, so that setsockopt() and getsockopt() work on them. This makes 'tools/regression/sockets/unix_cmsg -t dgram' more successful.	2012-09-07 21:06:54 +00:00
John Baldwin	773e3b7dda	A few whitespace and comment fixes.	2012-09-07 15:10:46 +00:00
Aleksandr Rybalko	1bccd8638e	Style fixes. Suggested by: mdf Approved by: adrian (mentor)	2012-09-04 23:16:55 +00:00
Aleksandr Rybalko	6a8dada257	Add missing braces. Approved by: bschmidt (while mentor offline) Pointed by: gcooper Pointy hat to: ray	2012-09-03 09:46:46 +00:00
Andrey Zonov	ceb0f71506	- Mark some sysctls with CTLFLAG_TUN flag instead of CTLFLAG_RDTUN. Pointed out by: avg Approved by: kib (mentor) MFC after: 1 week	2012-09-03 09:26:56 +00:00
Aleksandr Rybalko	70da14c4bb	Add kern.hintmode sysctl variable to show current state of hints: 0 - loader hints in environment only; 1 - static hints only 2 - fallback mode (Dynamic KENV with fallback to kernel environment) Add kern.hintmode write handler, accept only value 2. That will switch static KENV to dynamic. So it will be possible to change device hints. Approved by: adrian (mentor)	2012-09-03 08:52:05 +00:00
Andrey Zonov	c3927cd956	- Make kern.maxtsiz, kern.dfldsiz, kern.maxdsiz, kern.dflssiz, kern.maxssiz and kern.sgrowsiz sysctls writable. Approved by: kib (mentor)	2012-09-02 17:39:02 +00:00
Mikolaj Golub	bb9f214f64	In soreceive_generic() remove the optimization for the case when MSG_WAITALL is set, and it is possible to do the entire receive operation at once if we block (resid <= hiwat). Actually it might make the recv(2) with MSG_WAITALL flag get stuck when there is enough space in the receiver buffer to satisfy the request but not enough to open the window closed previously due to the buffer being full. The issue can be reproduced using the following scenario: On the sender side do 2 send(2) requests: 1) data of size much smaller than SOBUF_SIZE (e.g. SOBUF_SIZE / 10); 2) data of size equal to SOBUF_SIZE. On the receiver side do 2 recv(2) requests with MSG_WAITALL flag set: 1) recv() data of SOBUF_SIZE / 10 size; 2) recv() data of SOBUF_SIZE size; We totally fill the receiver buffer with one SOBUF_SIZE/10 size request and partial SOBUF_SIZE request. When the first request is processed we get SOBUF_SIZE/10 free space. It is just enough to receive the rest of bytes for the second request, and soreceive_generic() blocks in the part that is a subject of this change waiting for the rest. But the window was closed when the buffer was filled and to avoid silly window syndrome it opens only when available space is larger than sb_hiwat/4 or maxseg. So it is stuck and pending data is only sent via TCP window probes. Discussed with: kib (long ago) MFC after: 2 weeks	2012-09-02 07:33:52 +00:00
Mikolaj Golub	2ad099fcb1	In soreceive_generic() when checking if the type of mbuf has changed check it for MT_CONTROL type too, otherwise the assertion "m->m_type == MT_DATA" below may be triggered by the following scenario: - the sender sends some data (MT_DATA) and then a file descriptor (MT_CONTROL); - the receiver calls recv(2) with a MSG_WAITALL asking for data larger than the receive buffer (uio_resid > hiwat). MFC after: 2 weeks	2012-09-02 07:29:37 +00:00
Pawel Jakub Dawidek	707641ec28	Fix panic in procdesc that can be triggered in the following scenario: 1. Process A pdfork(2)s process B. 2. Process A passes process descriptor of B to unrelated process C. 3. Hit CTRL+C to terminate process A. Process B is also terminated with SIGINT. 4. init(8) collects status of process B. 5. Process C closes process descriptor associated with process B. When we have such order of events, init(8), by collecting status of process B, will call procdesc_reap(). This function sets pd_proc to NULL. Now when process C calls close on this process descriptor, procdesc_close() is called. Unfortunately procdesc_close() assumes that pd_proc points at a valid proc structure, but it was set to NULL earlier, so the kernel panics. The patch also adds setting 'p->p_procdesc' to NULL in procdesc_reap(), which I think should be done. MFC after: 1 week	2012-09-01 11:21:56 +00:00
Attilio Rao	d4a2ab8c07	Post r222812 KTR_CPUMASK started being initialized only as a tunable handler and not more statically. Unfortunately, it seems that this is not ideal for new platform bringup and boot low level development (which needs ktr_cpumask to be effective before tunables can be setup). Because of this, add a way to statically initialize cpusets, by passing a list of initializers, divided by commas. Also, provide a way to enforce an all-set mask, for above mentioned initializers. This imposes some differences on how KTR_CPUMASK is setup now as a kernel option, and in particular this makes the words specifications backward wrt. what is currently in -CURRENT. In order to avoid mismatches between KTR_CPUMASK definition and other way to setup the mask (tunable, sysctl) and to print it, change the ordering how cpusetobj_print() and cpusetobj_scan() acquire the words belonging to the set. Please give a look to sys/conf/NOTES in order to understand how the new format is supposed to work. Also, ktr manpages will be updated shortly by gjb who volunteered for this. This patch won't be merged because it changes a POLA (at least from the theoretical standpoint) and this is however a patch that proves to be effective only in development environments. Requested by: rpaulo Reviewed by: jeff, rpaulo	2012-08-30 21:22:47 +00:00
Marius Strobl	bf38cf8ab3	- Unlike cache invalidation and TLB demapping IPIs, reading registers from other CPUs doesn't require locking so get rid of it. As the latter is used for the timecounter on certain machine models, using a spin lock in this case can lead to a deadlock with the upcoming callout(9) rework. - Merge r134227/r167250 from x86: Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB demapping IPIs. - Mark some unused function arguments as such. MFC after: 1 week	2012-08-29 16:56:50 +00:00
Ed Schouten	fa4dd27847	Remove unused SI_* flags. The SI_DEVOPEN, SI_CONSOPEN and SI_CANDELETE flags are not used by any piece of code in the tree.	2012-08-28 19:30:29 +00:00
John Baldwin	10f0ab3933	Shorten the name of the fast SWI taskqueue to "fast taskq" so that it fits. Reported by: lev MFC after: 1 week	2012-08-28 13:35:37 +00:00
Navdeep Parhar	812302c3eb	Allow nmbjumbop, nmbjumbo9, and nmbjumbo16 to be set directly via loader tunables. MFC after: 1 month	2012-08-23 21:32:02 +00:00
Konstantin Belousov	258f94423b	Provide some compat32 shims for sysctl vfs.conflist. It is required for getvfsbyname(3) operation when called from 32bit process, and getvfsbyname(3) is used by recent bsdtar import. Reported by: many Tested by: David Naylor <naylor.b.david@gmail.com> MFC after: 5 days	2012-08-22 20:05:34 +00:00
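
Two of the commits listed above (94cb35459d and 787a64ddd2) concern the tid ring buffer: one moves the head and tail updates into their own statements, the other fixes a bug where two elements were skipped on slot reuse. The following is a minimal user-space sketch of that style of ring buffer, not the kernel code. The names `tid_ring`, `tid_ring_put`, `tid_ring_get` and the size `TID_BUFFER_SIZE` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size, kept tiny for illustration. */
#define TID_BUFFER_SIZE 4

/* Illustrative ring of reusable ids; not the kernel's data structure. */
struct tid_ring {
	int buf[TID_BUFFER_SIZE];
	size_t head;	/* index of the next element to take out */
	size_t tail;	/* index of the next free slot */
};

/* Advance an index by exactly one slot, wrapping at the end. */
static size_t
ring_next(size_t i)
{
	return ((i + 1) % TID_BUFFER_SIZE);
}

/* Store a released id. Returns 0 on success, -1 if the ring is full. */
static int
tid_ring_put(struct tid_ring *r, int tid)
{
	assert(r != NULL);
	if (ring_next(r->tail) == r->head)
		return (-1);
	r->buf[r->tail] = tid;		/* element access ... */
	r->tail = ring_next(r->tail);	/* ... then a separate, explicit update */
	return (0);
}

/* Take an id for reuse. Returns 0 on success, -1 if the ring is empty. */
static int
tid_ring_get(struct tid_ring *r, int *tid)
{
	assert(r != NULL && tid != NULL);
	if (r->head == r->tail)
		return (-1);
	*tid = r->buf[r->head];		/* element access ... */
	r->head = ring_next(r->head);	/* ... then advance by one slot only */
	return (0);
}
```

Keeping the index update in its own statement makes it easy to see that each operation moves the pointer by one slot. In this sketch one slot is left unused to tell a full ring from an empty one, so the ring holds at most `TID_BUFFER_SIZE - 1` ids.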


12841 Commits