Commit Graph

10872 Commits

Author SHA1 Message Date
Konstantin Belousov
9316467d05 FFS puts the extended attributes blocks at the negative blocks for the
vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it
incorrectly does vm_object_page_remove(0, 0), removing all pages from
the underlying vm object, not only the pages that back the extended
attributes data.

Change vinvalbuf() to not remove any pages from the object when
V_NORMAL or V_ALT are specified. Instead, the only in-tree caller
in ffs_inode.c:ffs_truncate() that specifies V_ALT explicitely
removes the corresponding page range. The V_NORMAL caller
does vnode_pager_setsize(vp, 0) immediately after the call to
vinvalbuf(V_NORMAL) already.

Reported by:	csjp
Reviewed by:	ups
MFC after:	3 weeks
2009-01-20 11:27:45 +00:00
Stephen McKay
58c1607e03 Add a limit on namecache entries.
In normal operation, the number of cache entries is roughly equal to the
number of active vnodes.  However, when most of the recently accessed
vnodes have many hard links, the number of cache entries can be 32000
times as large, exhausting kernel memory and provoking a panic in
kmem_malloc().

MFC after: 2 weeks
2009-01-20 04:21:21 +00:00
Alexander Motin
129c5c814d Teach m_copyback() to use trailing space of the last mbuf in chain. 2009-01-18 20:19:55 +00:00
Jeff Roberson
8f51ad55e7 - Implement generic macros for producing KTR records that are compatible
with src/tools/sched/schedgraph.py.  This allows developers to quickly
   create a graphical view of ktr data for any resource in the system.
 - Add sched_tdname() and the pcpu field 'name' for quickly and uniformly
   identifying records associated with a thread or cpu.
 - Reimplement the KTR_SCHED traces using the new generic facility.

Obtained from:	attilio
Discussed with:	jhb
Sponsored by:	Nokia
2009-01-17 07:17:57 +00:00
Konstantin Belousov
137cf3b682 Lock the semaphore identifier lock during semaphore initialization to
guarantee atomicity of the operation for other semaphore consumers.
In particular, this should guard against access to the semaphore with
not done or partially done MAC label assignment.

Reviewed by:	rwatson
MFC after:	1 month
2009-01-15 12:15:46 +00:00
Konstantin Belousov
90a017ba64 It seems that there are at least three issues with IPC_RMID operation
on SysV semaphores.

  The squeeze of the semaphore array in the kern_semctl() modifies
  sem_base for the semaphores with sem_base greater then sem_base of
  the removed semaphore, as well as the values of the semaphores,
  without locking their mutex. This can lead to (killable) hangs or
  unexpected behaviour of the processes performing any sem operations
  while other process does IPC_RMID.

  The semexit_myhook() eventhandler unlocks SEMUNDO_LOCK() while
  accessing *suptr. This allows for IPC_RMID for the sem id to be
  performed in parallel with undo hook referenced by the current undo
  structure. This leads to the panic("semexit - semid not allocated") [1].

  The semaphore creation is protected by Giant, while IPC_RMID is done
  while only semaphore mutex is held. This seems to result in invalid
  values for semtot, causing random ENOSPC error returns [2].

Redo the locking of the semaphores lifetime cycle. Delegate the
sem_mtx to the sole purpose of protecting semget() and
semctl(IPC_RMID). Introduce new sem_undo_mtx to protect SEM_UNDO
handling. Remove the Giant remnants from the code.
Note that  mac_sysvsem_check_semget() and mac_sysvsem_create() are
now called while sem_mtx is held, as well as mac_sysvsem_cleanup() [3].

When semaphore is removed, acquire semaphore locks for all semaphores
with sem_base that is going to be changed by squeeze of the sema
array. The lock order is not important there, because the region is
protected by sem_mtx.

Organize both used and free sem_undo structures into the lists,
protected by sem_undo_mtx. In semexit_myhook(), remove sem_undo
structure that is being processed, from used list, without putting it
onto the free to prevent modifications by other threads. This allows
for sem_undo_lock to be dropped to acquire individial semaphore locks
without violating lock order. Since IPC_RMID may no longer find this
sem_undo, do tolerate references to unallocated semaphores in undo
structure, and check sequential number to not undo unrelated semaphore
with the same id.

While there, convert functions definitions to ANSI C and fix small
style(9) glitches.

Reported by:	Omer Faruk Sen <omerfsen gmail com> [1], pho [2]
Reviewed by:	rwatson [3]
Tested by:	pho
MFC after:	1 month
2009-01-14 15:20:13 +00:00
John Baldwin
b7f1c1d210 Add a new KTR tracepoint in the KTR_CALLOUT class to note when a callout
routine finishes executing.

MFC after:	1 week
2009-01-13 15:56:53 +00:00
Konstantin Belousov
d19b99278c Do not call namei() while having another user-controlled vnode
locked. Lookup could attempt to recursively lock that vnode.

Do not call vn_start_write(V_WAIT) while vnode is locked, this may
result in a deadlock with suspension.

vfs_busy() the mountpoint before dropping vnode lock for vnode
that was used to look up the mountpoint, to prevent unmount in
between.

Reported and tested by:	pho
Reviewed by:	rwatson
MFC after:	3 weeks
2009-01-08 12:47:30 +00:00
Ed Schouten
fdef61da21 Remove Giant locking from domains list.
During boot, the domain list is locked with Giant. It is not possible to
register any protocols after the system has booted, so the lock is only
used to protect insertion of entries.

There is already a mutex in uipc_domain.c called dom_mtx. Use this mutex
to lock the list, instead of using Giant. It won't matter anything with
respect to performance, but we'll never get rid of Giant if we don't
remove from places where we don't need it.

Approved by:	rwatson
MFC after:	3 weeks
2009-01-04 19:22:53 +00:00
Robert Watson
192a6120fc Remove two further uses (debugging and NULLing) of pr_ousrreq, missed due
to svn commit in the wrong directory.

Spotted by:	bz
2009-01-04 19:16:36 +00:00
Bjoern A. Zeeb
eb76156548 Back out r186615; the sanitizing of the pointers in the error case
is not needed and seems that it will not be needed either.

Pointy hat:	mine, mine, mine and not pho's
2009-01-04 12:18:18 +00:00
Konstantin Belousov
641e2829b6 Extend the struct vm_page wire_count to u_int to avoid the overflow
of the counter, that may happen when too many sendfile(2) calls are
being executed with this vnode [1].

To keep the size of the struct vm_page and offsets of the fields
accessed by out-of-tree modules, swap the types and locations
of the wire_count and cow fields. Add safety checks to detect cow
overflow and force fallback to the normal copy code for zero-copy
sockets. [2]

Reported by:	Anton Yuzhaninov <citrin citrin ru> [1]
Suggested by:	alc [2]
Reviewed by:	alc
MFC after:	2 weeks
2009-01-03 13:24:08 +00:00
Ed Schouten
bcca92c0e5 Fix a corner case in my previous commit.
Even though there are not many setups that have absolutely no console
device, make sure a close() on a TTY doesn't dereference a null pointer.
2009-01-02 23:39:29 +00:00
Ed Schouten
916501c81c Don't let /dev/console be revoked if the TTY below is being closed.
During startup some of the syscons TTY's are used to set attributes like
the screensaver and mouse options. These actions cause /dev/console to
be rendered unusable.

Fix the issue by leaving the TTY opened when it is used as the console
device.

Reported by:	imp
2009-01-02 23:32:43 +00:00
Robert Watson
b523ec24b9 White space and comment tweaks.
MFC after:	3 weeks
2009-01-01 20:03:22 +00:00
Robert Watson
877e88123e Temporary workaround for the limitations of the mbuf flowid field: zero
the field in the mbuf constructors, since otherwise we have no way to
tell if they are valid.  In the future, Kip has plans to add a flag
specifically to indicate validity, which is the preferred model.
2009-01-01 20:03:01 +00:00
Ed Schouten
4cfe479098 Don't clobber sysctl_root()'s error number.
When sysctl() is being called with a buffer that is too small, it will
return ENOMEM. Unfortunately the changes I made the other day sets the
error number to 0, because it just returns the error number of the
copyout(). Revert this part of the change.
2009-01-01 00:19:51 +00:00
Ivan Voras
98a4431181 Document the relationship between enum VM_GUEST and the vm_guest_sysctl_names
array.

Approved by:	gnn (original version)
2008-12-30 23:49:54 +00:00
Peter Holm
1d347f061f Added missing second part of cleaning j->ip[46] as requested by bz
Approved by:	kib (mentor)
Pointy hat:	pho
2008-12-30 20:39:47 +00:00
Peter Holm
bc971b2c82 Make sure that unused j->ip[46] are cleared
Reviewed by:	bz
Approved by:	kib (mentor)
2008-12-30 17:54:25 +00:00
Robert Watson
a9f3c7d2ff Rename mbcnt to mbcnt_delta in uipc_send() -- unlike other local
variables named mbcnt in uipc_usrreq.c, this instance is a delta
rather than a cache of sb_mbcnt.

MFC after:	3 weeks
2008-12-30 16:09:57 +00:00
Konstantin Belousov
7efa697d80 Clear the pointers to the file in the struct filedesc before file is closed
in fdfree. Otherwise, sysctl_kern_proc_filedesc may dereference stale
struct file * values.

Reported and tested by:	pho
MFC after:	1 month
2008-12-30 12:51:56 +00:00
Konstantin Belousov
83e73926ad In r185557, the check for existing negative entry for the given name
did not compared nc_dvp with supplied parent directory vnode pointer.
Add the check and note that now branches for vp != NULL and vp == NULL
are the same, thus can be merged.

Reported and reviewed by:	kan
Tested by:	pho
MFC after:	2 weeks
2008-12-30 12:51:14 +00:00
Ed Schouten
71b6d504c1 Fix compilation. Also move ogetkerninfo() to kern_xxx.c.
It seems I forgot to remove `int error' from a single piece of code. I'm
also moving ogetkerninfo() to kern_xxx.c, because it belongs to the
class of compat system information system calls, not the generic sysctl
code.
2008-12-29 19:24:00 +00:00
Ed Schouten
ddf9d24349 Push down Giant inside sysctl. Also add some more assertions to the code.
In the existing code we didn't really enforce that callers hold Giant
before calling userland_sysctl(), even though there is no guarantee it
is safe. Fix this by just placing Giant locks around the call to the oid
handler. This also means we only pick up Giant for a very short period
of time. Maybe we should add MPSAFE flags to sysctl or phase it out all
together.

I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root()
and name2oid() are called with the sysctl lock held.

Reviewed by:	Jille Timmermans <jille quis cx>
2008-12-29 12:58:45 +00:00
Konstantin Belousov
22a448c4d9 vm_map_lock_read() does not increment map->timestamp, so we should
compare map->timestamp with saved timestamp after map read lock is
reacquired, not with saved timestamp + 1. The only consequence of the +1
was unconditional lookup of the next map entry, though.

Tested by:	pho
Approved by:	des
MFC after:	2 weeks
2008-12-29 12:45:11 +00:00
Kip Macy
08a2459ee1 drop rnh lock before destroying it 2008-12-28 14:32:27 +00:00
Bjoern A. Zeeb
34820bbf06 Hide detect_virtual() along with the accompanying string
arrays under #ifndef XEN to make XEN config compile again.
In case of Xen vm_guest is hard coded.

Move the list for the vm_guest sysctl out of the restictive
bounds as the sysctl is there in either case.
2008-12-27 17:19:16 +00:00
Peter Holm
ab62a2d023 Prevent overflow of uio_resid.
Approved by:	kib
2008-12-27 10:13:43 +00:00
Robert Watson
9c232f86ca Following the recent security advisory, add a comment describing our
invariants and approach for protocol switch methods in protsw_init(),
and also some KASSERT's for non-domain init entries in protocol
switch tables: pru_abort and pru_send must both be implemented.

For now, leave those assertions #if 0'd, since there are a few
protocols that violate them in non-harmful ways.  Whether or not we
should enforce pru_abort being implemented for non-stream protocols
is an interesting question: currently abort is only invoked on stream
sockets in situations where un-accepted sockets must be abruptly
closed (i.e., close() on a listen socket with pending connections),
but in principle it is useful for datagram sockets and most datagram
socket types implement it.

MFC after:	3 weeks
2008-12-25 11:32:38 +00:00
Joe Marcus Clarke
4769218f4b Do not KASSERT when vp->v_dd is NULL. Only directories which have had ".."
looked up would have v_dd set to a non-NULL value.  This fixes a panic
seen when running installworld on a diskless system with a separate /usr
file system.

Submitted by:	cracauer
Approved by:	kib
2008-12-23 20:43:42 +00:00
Konstantin Belousov
86dcb537c9 Keep the hold on the vnode during VOP_VPTOCNP() call, allowing the vop
implementation to drop vnode lock, if needed.

Reported and tested by:	pho
2008-12-23 20:04:31 +00:00
Ivan Voras
59d9578919 Add missing newlines to flags tags of CPU topology, for prettier
output.

Reviewed by:	jeff (original version)
Approved by:	gnn (mentor) (original version)
2008-12-23 16:19:59 +00:00
Colin Percival
f0b40b1c97 Prevent cross-site forgery attacks on ftpd(8) due to splitting
long commands into multiple requests. [08:12]

Avoid calling uninitialized function pointers in protocol switch
code. [08:13]

Merry Christmas everybody...

Approved by:	so (cperciva)
Approved by:	re (kensmith)
Security:	FreeBSD-SA-08:12.ftpd, FreeBSD-SA-08:13.protosw
2008-12-23 01:23:09 +00:00
Ed Schouten
3a4d0c86aa Revert r185891.
In r185891 I removed the newlines from messages written to /dev/console,
because it made startup messages from rc-scripts harder to read. This,
unfortunately, causes the kernel message that is printed after a
non-terminated log message to be concatenated.

This could be fixed, but on short term it's better to just revert the
change.

Reported by:	Jaakko Heinonen <jh saunalahti fi>
2008-12-21 21:54:01 +00:00
Ed Schouten
67dd0ccbee Set PTS_FINISHED before waking up any threads.
Inside ptsdrv_{in,out}wakeup() we call KNOTE_LOCKED() to wake up any
kevent(2) users. Because the kqueue handlers are executed synchronously,
we must set PTS_FINISHED before calling ptsdrv_{in,out}wakeup().

Discovered by:	nork
2008-12-21 21:16:57 +00:00
Ed Schouten
9d34a1338c Let wchan names more closely match pre-MPSAFE TTY behaviour.
Right now the wchan strings "ttyinp" and "ttybgw" only differ one
character from the strings we used prior to MPSAFE TTY. Just rename them
back to their pre-MPSAFE TTY counterparts.

Also rename "ttylck" to "ttymtx", which should make it more clear that a
process is blocked on the TTY mutex, not some other form of locking.
2008-12-20 09:36:40 +00:00
Nathan Whitehorn
91416fb268 Modularize the Open Firmware client interface to allow run-time switching
of OFW access semantics, in order to allow future support for real-mode
OF access and flattened device frees. OF client interface modules are
implemented using KOBJ, in a similar way to the PPC PMAP modules.

Because we need Open Firmware to be available before mutexes can be used on
sparc64, changes are also included to allow KOBJ to be used very early in
the boot process by only using the mutex once we know it has been initialized.

Reviewed by:    marius, grehan
2008-12-20 00:33:10 +00:00
Ivan Voras
bb501b18e8 Further beautify the lock strings to be more pleasing to the eye and
self documenting within 6 characters.

Reviewed by:	ed (older version)
Approved by:	gnn (older version)
2008-12-19 14:49:14 +00:00
Ruslan Ermilov
98bd6a1982 Removed a comment made obsolete by revisions 157927 and 174292. 2008-12-18 15:56:12 +00:00
Ivan Voras
3610a2260b By popular request, stringify kern.vm_guest sysctl. Now it returns a
short, self-documenting string describing the detected virtual
environment.

Approved by:	gnn (mentor) (earlier version)
2008-12-18 15:34:38 +00:00
Ivan Voras
0e469db660 Remove spaces in wait object names to make top (1) output prettier and
unbreak scripts that examine ps (1) output.

Reviewed by:	ed
Approved by:	gnn (mentor)
2008-12-18 15:25:33 +00:00
Konstantin Belousov
548066ea66 The quotactl, statfs and fstatfs syscall implementations may dereference
NULL pointer to struct mount if the looked up vnode is reclaimed. Also,
these syscalls only mnt_ref() the mp, still allowing it to be unmounted;
only struct mount memory is kept from being reused.

Lock the vnode when doing name lookup, then reference its mount point,
unlock the vnode and vfs_busy the mountpoint. This sequence shall take
care of both races.

Reported and tested by:	pho
Discussed with:	attilio
MFC after:	1 month
2008-12-18 12:01:19 +00:00
Konstantin Belousov
2cfddad734 Do not return success and doomed vnode from lookup. LK_UPGRADE allows
the vnode to be reclaimed.

Tested by:	pho
MFC after:	1 month
2008-12-18 11:58:12 +00:00
Ivan Voras
3dc309114a Introduce a sysctl kern.vm_guest that reflects what the kernel knows about
it running under a virtual environment. This also introduces a globally
accessible variable vm_guest that can be used where appropriate in the
kernel to inspect this environment.

To make it easier for the long run, an enum VM_GUEST is also introduced,
which could possibly be factored out in a header somewhere (but the
question is where - vm/vm_param.h? sys/param.h?) so it eventually becomes
a part of the standard KPI. In any case, it's a start.

The purpose of all this isn't to absolutely detect that the OS is running
under a virtual environment (cf. "redpill") but to allow the parts of the
kernel and the userland that care about this particular aspect and can do
something useful depending on it to have a standardised interface. Reducing
kern.hz is one example but there are other things that could be done like
avoiding context switches, not using CPU instructions that are known to be
slow in emulation, possibly different strategies in VM (memory) allocation,
CPU scheduling, etc.

It isn't clear if the JAILS/VIMAGE functionality should also be exposed
by this particular mechanism (probably not since they're not "full"
virtual hardware environments). Sometime in the future another sysctl and
a variable could be introduced to reflect if the kernel supports any kind
of virtual hosting (e.g. VMWare VMI, Xen dom0).

Reviewed by:	silence from src-commiters@, virtualization@, kmacy@
Approved by:	gnn (mentor)
Security:	Obscurity doesn't help.
2008-12-17 19:57:12 +00:00
Peter Wemm
a3ac8c94cf Remove sysctl debug.elf_trace and the trace field in auxargs. They go
nowhere.  It used to be the equivalent of $LD_DEBUG in rtld-elf.
Elf_Auxargs is an internal structure.
2008-12-17 16:54:29 +00:00
Warner Losh
35c2a5a852 Minor style(9) nit. 2008-12-17 16:25:20 +00:00
Konstantin Belousov
6f3475454e Remove two remnant uses of AT_DEBUG. 2008-12-17 13:13:35 +00:00
Attilio Rao
4a0f807602 1) Fix a deadlock in the VFS:
- threadA runs vfs_rel(mp1)
- threadB does unmount the mp1 fs, sets MNTK_UNMOUNT and drop MNT_ILOCK()
- threadA runs vfs_busy(mp1) and, as long as, MNTK_UNMOUNT is set, sleeps
  waiting for threadB to complete the unmount
- threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting
  for the refcount to expire.

Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals the
unmounter is waiting for mnt_ref to expire.
The vfs_busy contenders got awake, fails, and if they retry the
MNTK_REFEXPIRE won't allow them to sleep again.

2) Simplify significantly the code of vfs_mount_destroy() trimming
   unnecessary codes:
   - as long as any reference exited, it is no-more possible to have
     write-op (primarty and secondary) in progress.
   - it is no needed to drop and reacquire the mount lock.
   - filling the structures with dummy values is unuseful as long as
     it is going to be freed.

Tested by:	pho, Andrea Barberio <insomniac at slackware dot it>
Discussed with:	kib
2008-12-16 23:16:10 +00:00
Alexander Motin
d288bcc4df If possible, try to obtain max_mhz on cpufreq attach instead of first request.
On HyperThreading CPUs logical cores have same frequency, so setting it
on any core will change the other's one. In most cases first request
to the second core will be the "set" request, done after setting frequency
of the first core. In such case second CPU will obtain throttled frequency
of the first core as it's max_mhz making cpufreq broken due to different
frequency sets.
2008-12-16 01:24:05 +00:00