freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	75c586a4c8	In r198506, kern_sigsuspend() started doing cursig/postsig loop to make sure that a signal was delivered to the thread before returning from syscall. Signal delivery puts new return frame on the user stack, and modifies trap frame to enter signal handler. As a consequence, syscall return code sets EINTR as error return for signal frame, instead of the syscall return. Also, for ia64, due to different registers layout for those two kind of frames, usermode sigsegfaulted when returned from signal handler. Use newly-introduced cpu_set_syscall_retval(9) to set syscall result, and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return code from modifying this frame [1]. Another issue is that pending SIGCONT might be cancelled by SIGSTOP, causing postsig() not to deliver any catched signal [2]. Modify postsig() to return 1 if signal was posted, and 0 otherwise, and use this in the kern_sigsuspend loop. Proposed by: marcel [1] Noted by: davidxu [2] Reviewed by: marcel, davidxu MFC after: 1 month	2009-11-10 11:46:53 +00:00
Edward Tomasz Napierala	7ba889b8f4	Add suggestion for zfs root.	2009-11-08 09:54:25 +00:00
Attilio Rao	337c5ff4fc	Save the sack when doing a lockmgr_disown() call. Requested by: kib MFC: 3 days	2009-11-06 22:33:03 +00:00
Edward Tomasz Napierala	7a172809b5	Fix build. Submitted by: Andrius Morkūnas <hinokind at gmail.com>	2009-11-04 08:25:58 +00:00
Edward Tomasz Napierala	3b1b09809f	Revert r198874, pending further discussion.	2009-11-04 07:14:16 +00:00
Edward Tomasz Napierala	da9ce28ecb	Style fixes.	2009-11-04 07:04:15 +00:00
Edward Tomasz Napierala	8fafa5cecf	Make sure we don't end up with VAPPEND without VWRITE, if someone calls open(2) like this: open(..., O_APPEND).	2009-11-04 06:48:34 +00:00
Edward Tomasz Napierala	597954c813	While VAPPEND without VWRITE makes sense for VOP_ACCESSX(9) (e.g. to check for the permission to create subdirectory (ACE4_ADD_SUBDIRECTORY)), it doesn't really make sense for VOP_ACCESS(9). Also, many VOP_ACCESS(9) implementations don't expect that. Make sure we don't confuse them.	2009-11-04 06:47:14 +00:00
Ed Schouten	ca1d2f657a	Make /dev/klog and kern.msgbuf* MPSAFE. Normally msgbufp is locked using Giant. Switch it to use the msgbuf_lock. Instead of changing the tsleep() calls to msleep(), just convert it to condvar(9). In my opinion the locking around msgbuf_peekbytes() still remains questionable. It looks like locks are dropped while performing copies of multiple blocks to userspace, which may cause the msgbuf to be reset in the mean time. At least getting it underneath from Giant should make it a little easier for us to figure out how to solve that. Reminded by: rdivacky	2009-11-03 21:06:19 +00:00
Attilio Rao	1b9d701fee	Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvements aims for avoiding further cache-misses in scheduler specific functions which need to keep track of average thread running time and further locking in places setting for this flag. Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>	2009-11-03 16:46:52 +00:00
Konstantin Belousov	1c89fc757a	If socket buffer space appears to be lower then sum of count of already prepared bytes and next portion of transfer, inner loop of kern_sendfile() aborts, not preparing next mbuf for socket buffer, and not modifying any outer loop invariants. The thread loops in the outer loop forever. Instead of breaking from inner loop, prepare only bytes that fit into the socket buffer space. In collaboration with: pho Reviewed by: bz PR: kern/138999 MFC after: 2 weeks	2009-11-03 12:52:35 +00:00
Konstantin Belousov	80a8b0f3bf	Trapsignal() and postsig() call kern_sigprocmask() with both process lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have ps_mtx locked to decide and wakeup a thread, causing recursion on the mutex. Inform kern_sigprocmask() and reschedule_signals() about lock state of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion. Reported and tested by: keramida MFC after: 1 month	2009-10-30 10:10:39 +00:00
Konstantin Belousov	8084540253	Trapsignal() calls kern_sigprocmask() when delivering catched signal with proc lock held. Reported and tested by: Mykola Dzham freebsd at levsha org ua MFC after: 1 month	2009-10-29 14:34:24 +00:00
Konstantin Belousov	7415a41f4a	Fix style issue.	2009-10-29 10:03:08 +00:00
Konstantin Belousov	17c974499c	Regenerate	2009-10-27 11:01:15 +00:00
Konstantin Belousov	066d836b02	Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:55:34 +00:00
Konstantin Belousov	d6e029adbe	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:47:58 +00:00
Konstantin Belousov	84440afb54	In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery. Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop. Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Convertion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally correct in between. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:42:24 +00:00
John Baldwin	86855bf549	Another nit that both I and ispell missed. Submitted by: Ben Kaduk minimarmot of gmail	2009-10-26 18:32:06 +00:00
John Baldwin	9390262576	Fix some spelling nits.	2009-10-26 17:42:03 +00:00
Joseph Koshy	16d95d4f92	Inform hwpmc(4) of a thread's impending demise prior to invoking sched_throw(). Debugging help: fabient Review and testing by: fabient	2009-10-25 04:34:47 +00:00
Alan Cox	a0c703bf21	Update a comment to reflect the previous change.	2009-10-25 02:48:29 +00:00
Ruslan Ermilov	4d9d1e823c	- Rename tunable kern.ipc.shmmaxpgs to kern.ipc.shmall. - Explain the fuss when initializing shmmax. PR: 75542 (mistakenly closed instead of PR 75541)	2009-10-24 19:00:58 +00:00
John Baldwin	5ca4819ddf	- Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that created shadow copies of these arrays were just using MAXCOMLEN. - Prefer using sizeof() of an array type to explicit constants for the array length in a few places. - Ensure that all of p_comm[] and td_name[] is always zero'd during execve() to guard against any possible information leaks. Previously trailing garbage in p_comm[] could be leaked to userland in ktrace record headers via td_name[]. Reviewed by: bde	2009-10-23 15:14:54 +00:00
John Baldwin	4f9d48e478	Don't bother copying the name of a kproc or kthread out into a temporary array just to pass that array to printf(). kproc and kthread names are NUL-terminated and can be printed using printf() directly. Reviewed by: bde	2009-10-23 15:09:51 +00:00
John Baldwin	3eec6f034a	Set the devclass_t pointer specified in the DRIVER_MODULE() macro sooner so it is always valid when a driver's identify routine is called. Previously, new-bus would attempt to create the devclass for a newly loaded driver in two separate places, once in devclass_add_driver(), and again after devclass_add_driver() returned in driver_module_handler(). Only the second lookup attempted to set a device class' parent and set the devclass_t pointer specified in the DRIVER_MODULE() macro. However, by the time it was executed, the driver was already added to existing instances of the parent driver at which point in time the new driver's identify routine would have been invoked. The fix is to merge the two attempts and only create the devclass once in devclass_add_driver() including setting the devclass_t pointer passed to DRIVER_MODULE() before the driver is added to any existing bus devices. Reported by: avg Reviewed by: imp MFC after: 2 weeks	2009-10-22 14:53:44 +00:00
Marcel Moolenaar	1a4fcaebe3	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.	2009-10-21 18:38:02 +00:00
Ruslan Ermilov	e64585bdc2	Random number generator initialization cleanup: - Introduce new SI_SUB_RANDOM point in boot sequence to make it clear from where one may start using random(9). It should be as early as possible, so place it just after SI_SUB_CPU where we have some randomness on most platforms via get_cyclecount(). - Move stack protector initialization to be after SI_SUB_RANDOM as before this point we have no randomness at all. This fixes stack protector to actually protect stack with some random guard value instead of a well-known one. Note that this patch doesn't try to address arc4random(9) issues. With current code, it will be implicitly seeded by stack protector and hence will get the same entropy as random(9). It will be securely reseeded once /dev/random is feeded by some entropy from userland. Submitted by: Maxim Dounin <mdounin@mdounin.ru> MFC after: 3 days	2009-10-20 16:36:51 +00:00
Ed Schouten	6015f6f35a	Properly set the low watermarks when reducing the baud rate. Now that buffers are deallocated lazily, we should not use tty*q_getsize() to obtain the buffer size to calculate the low watermarks. Doing this may cause the watermark to be placed outside the typical buffer size. This caused some regressions after my previous commit to the TTY code, which allows pseudo-devices to resize the buffers as well. Reported by: yongari, dougb MFC after: 1 week	2009-10-19 07:17:37 +00:00
Ed Schouten	5ed8d12443	Allow the buffer size to be configured for pseudo-like TTY devices. Devices that don't implement param() (which means they don't support hardware parameters such as flow control, baud rate) hardcode the baud rate to TTYDEF_SPEED. This means the buffer size cannot be configured, which is a little inconvenient when using canonical mode with big lines of input, etc. Make it adjustable, but do clamp it between B50 and B115200 to prevent awkward buffer sizes. Remove the baud rate assignment from /etc/gettytab. Trust the kernel to fill in a proper value. Reported by: Mikolaj Golub <to my trociny gmail com> MFC after: 1 month	2009-10-18 19:48:53 +00:00
Ed Schouten	99087885be	Make lock devices work properly. It turned out I did add the code to use the init state devices to set the termios structure when opening the device, but it seems I totally forgot to add the bits required to force the actual locking of flags through the lock state devices. Reported by: ru MFC after: 1 week (to be discussed)	2009-10-18 19:45:44 +00:00
Konstantin Belousov	7564c4ad9a	If ET_DYN binary has non-zero base address for some reason, honour it and do not relocate the binary to ET_DYN_LOAD_ADDR. This allows for the binary author to influence address map of the process. In particular, when the binary is actually an interpeter, this allows to have almost usual process address map. Communicate the relocation bias of the mapping for interpeter-less ET_DYN binary, that is interperter itself, in AT_BASE aux entry. This way, rtld is able to find its dynamic structure and relocate itself. Note that mapbase in the rtld is still wrong and requires further fixing. Reported and tested by: rwatson Discussed with: kan MFC after: 3 days	2009-10-18 12:57:48 +00:00
Ed Schouten	39410373b3	Print backspaces after echoing an EOF. Applications like shells expect EOF to give no graphical output, while our implementation prints ^D by default (tunable with stty echoctl). Make the new implementation behave like the old TTY code. Print two backspaces afterwards. Reported by: koitsu MFC after: 1 month	2009-10-17 08:59:41 +00:00
John Baldwin	d0c9a29169	Use language more closely resembling English in a panic message. Pointy hat to: jhb Submitted by: pluknet	2009-10-15 18:51:19 +00:00
John Baldwin	37b8ef16cd	Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64 and i386 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler(). Requested by: many Reviewed by: scottl MFC after: 2 weeks	2009-10-15 14:54:35 +00:00
John Baldwin	a0f1535205	Fix a sign bug in the handling of nice priorities when computing the interactive score for a thread. Submitted by: Taku YAMAMOTO taku of tackymt.homeip.net Reviewed by: jeff MFC after: 3 days	2009-10-15 11:41:12 +00:00
Joseph Koshy	8b5adf9dd9	Improve the description of sysctl "kern.sugid_coredump". Submitted by: Mel Flynn <mel.flynn+fbsd.hackers at mailing.thruhere.net> on -hackers	2009-10-12 15:49:48 +00:00
Konstantin Belousov	7df8f6ab6f	Fix typo. Submitted by: rdivacky MFC after: 1 month	2009-10-12 10:09:48 +00:00
Konstantin Belousov	6b286ee8b5	Currently, when signal is delivered to the process and there is a thread not blocking the signal, signal is placed on the thread sigqueue. If the selected thread is in kernel executing thr_exit() or sigprocmask() syscalls, then signal might be not delivered to usermode for arbitrary amount of time, and for exiting thread it is lost. Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode, since only then the thread can handle delivery of signal reliably. For exiting thread or thread that has blocked some signals, check whether the newly blocked signal is queued for the process, and try to find a thread to wakeup for delivery, in reschedule_signal(). For exiting thread, assume that all signals are blocked. Change cursig() and postsig() to look both into the thread and process signal queues. When there is a signal that thread returning to usermode could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do unlocked read of p_siglist and p_pendingcnt to check for queued signals. Note that thread that has a signal unblocked might get spurious wakeup and EINTR from the interruptible system call now, due to the possibility of being selected by reschedule_signals(), while other thread returned to usermode earlier and removed the signal from process queue. This should not cause compliance issues, since the thread has not blocked a signal and thus should be ready to receive it anyway. Reported by: Justin Teller <justin.teller gmail com> Reviewed by: davidxu, jilles MFC after: 1 month	2009-10-11 16:49:30 +00:00
Konstantin Belousov	6728dc169d	Refine r195509, instead of checking that vnode type is VBAD, that is set quite late in the revocation path, properly verify that vnode is not doomed before calling VOP. Reported and tested by: Harald Schmalzbauer <h.schmalzbauer omnilan de> MFC after: 3 days	2009-10-10 21:17:30 +00:00
Konstantin Belousov	ab02d85f0d	Map PIE binaries at non-zero base address. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time	2009-10-10 15:33:01 +00:00
Konstantin Belousov	5b33842a9b	Do not map segments of zero length. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time	2009-10-10 15:28:52 +00:00
Konstantin Belousov	47d81c1b70	Postpone dropping fp till both kq_global and kqueue mutexes are unlocked. fdrop() closes file descriptor when reference count goes to zero. Close method for vnodes locks the vnode, resulting in "sleepable after non-sleepable". For pipes, pipe mutex is before kqueue lock, causing LOR. Reported and tested by: pho MFC after: 2 weeks	2009-10-10 14:56:34 +00:00
Robert Watson	604f19c91e	Fix build on amd64, where sysctl arg1 is a pointer. Reported by: Mr Tinderbox MFC after: 3 months	2009-10-05 22:23:12 +00:00
Edward Tomasz Napierala	d5e0b21541	Fix NFSv4 ACLs on sparc64. Turns out that fuword(9) fetches 64 bits instead of sizeof(int), and on sparc64 that resulted in fetching wrong value for acl_maxcnt, which in turn caused __acl_get_link(2) to fail with EINVAL. PR: sparc64/139304 Submitted by: Dmitry Afanasiev <KOT at MATPOCKuH.Ru>	2009-10-05 19:56:56 +00:00
Robert Watson	84d61770bc	First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain sockets. This allows for reliable bi-directional datagram communication over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or SOCK_STERAM (bi-directional bytestream). Largely, this reuses existing UNIX domain socket code. This allows applications requiring record- oriented semantics to do so reliably via local IPC. Some implementation notes (also present in XXX comments): - Currently we lack an sbappend variant able to do datagrams and control data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR. Adding a new variant will solve this problem. - UNIX domain sockets on FreeBSD provide back-pressure/flow control notification for stream sockets by manipulating the send socket buffer's size during pru_send and pru_rcvd. This trick works less well for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to manage blocking, but also to determine maximum datagram size. Fixing this requires rethinking how back-pressure is done for SOCK_SEQPACKET; in the mean time, it's possible to get EMSGSIZE when buffers fill, instead of blocking. Discussed with: benl Reviewed by: bz, rpaulo MFC after: 3 months Sponsored by: Google	2009-10-05 14:49:16 +00:00
Attilio Rao	7f9f80ce03	When releasing a lockmgr held in shared way we need to use a write memory barrier in order to avoid, on architectures which doesn't have strong ordered writes, CPU instructions reordering. Diagnosed by: fabio	2009-10-03 15:02:55 +00:00
Bjoern A. Zeeb	925c8b5b04	Print a warning in case we cannot add more brandinfo because we would overflow the MAX_BRANDS sized array. Reviewed by: kib MFC After: 1 month	2009-10-03 10:50:00 +00:00
Robert Watson	afd8e45b45	Don't comment on stream socket handling in sosend_dgram, since that's not handled. MFC after: 3 weeks	2009-10-02 21:31:15 +00:00
Bjoern A. Zeeb	878adb8517	Add a mitigation feature that will prevent user mappings at virtual address 0, limiting the ability to convert a kernel NULL pointer dereference into a privilege escalation attack. If the sysctl is set to 0 a newly started process will not be able to map anything in the address range of the first page (0 to PAGE_SIZE). This is the default. Already running processes are not affected by this. You can either change the sysctl or the tunable from loader in case you need to map at a virtual address of 0, for example when running any of the extinct species of a set of a.out binaries, vm86 emulation, .. In that case set security.bsd.map_at_zero="1". Superseeds: r197537 In collaboration with: jhb, kib, alc	2009-10-02 17:48:51 +00:00

1 2 3 4 5 ...

11438 Commits