7234 Commits

Author SHA1 Message Date
John Baldwin
e506e182dd Export some more useful info about shared memory objects to userland
via procstat(1) and fstat(1):
- Change shm file descriptors to track the pathname they are associated
  with and add a shm_path() method to copy the path out to a caller-supplied
  buffer.
- Use the fo_stat() method of shared memory objects and shm_path() to
  export the path, mode, and size of a shared memory object via
  struct kinfo_file.
- Add a struct shmstat to the libprocstat(3) interface along with a
  procstat_get_shm_info() to export the mode and size of a shared memory
  object.
- Change procstat to always print out the path for a given object if it
  is valid.
- Teach fstat about shared memory objects and to display their path,
  mode, and size.

MFC after:	2 weeks
2012-04-01 18:22:48 +00:00
David Chisnall
87d94367b9 Bump __FreeBSD_version for xlocale cleanup, as requested by ports people.
Approved by:	dim (mentor)
2012-04-01 09:35:23 +00:00
Hans Petter Selasky
d1eacc02f1 Move tty_opened_ns() into syscons.c which is currently the
only client of this macro.

Suggested by:	ed@
MFC after:	1 week
2012-03-29 15:47:29 +00:00
Hans Petter Selasky
8dbeb1b6cc Fix for NULL-pointer panic during boot, if keys are pressed too early.
MFC after:	1 week
2012-03-29 14:53:14 +00:00
Fabien Thomas
f5f9340b98 Add software PMC support.
New kernel events can be added at various locations for sampling or counting.
This allows, for example, easy system profiling on any processor with
familiar tools such as pmcstat(8).

Software and hardware PMCs can be used simultaneously, for example to count
lock acquisition failures and page faults while sampling on instructions.

Sponsored by: NETASQ
MFC after:	1 month
2012-03-28 20:58:30 +00:00
Kirk McKusick
1faacf5d09 Keep track of the mount point associated with a special device
to enable the collection of counts of synchronous and asynchronous
reads and writes for its associated filesystem. The counts are
displayed using `mount -v'.

Ensure that buffers used for paging indicate the vnode from
which they are operating so that counts of paging I/O operations
from the filesystem are collected.

This checkin only adds the setting of the mount point for the
UFS/FFS filesystem, but it would be trivial to add the setting
and clearing of the mount point at filesystem mount/unmount
time for other filesystems too.

Reviewed by: kib
2012-03-28 20:49:11 +00:00
Ryan Stone
9742410797 Instead of only iterating over the set of known SDT probes when sdt.ko is
loaded and unloaded, also have sdt.ko register callbacks with kern_sdt.c
that will be called when a newly loaded KLD module adds more probes or
a module with probes is unloaded.

This fixes two issues: first, if a module with SDT probes was loaded after
sdt.ko was loaded, those new probes would not be available in DTrace.
Second, if a module with SDT probes was unloaded while sdt.ko was loaded,
the kernel would panic the next time DTrace had cause to try and do
anything with the no-longer-existent probes.

This makes it possible to create SDT probes in KLD modules, although there
are still two caveats: first, any SDT probes in a KLD module must be part
of a DTrace provider that is defined in that module.  At present DTrace
only destroys probes when the provider is destroyed, so you can still
panic the system if a KLD module creates new probes in a provider from a
different module (including the kernel) and you then unload the first module.

Second, the system will panic if you unload a module containing SDT probes
while there is an active D script that has enabled those probes.

MFC after:	1 month
2012-03-27 15:07:43 +00:00
Oleksandr Tymoshenko
b74d1af74e Add .reginfo section entry
2012-03-26 21:26:23 +00:00
Robert Millan
63eebf9c6b Register signal 33 explicitly as reserved by real-time library, and
use it by its new name (SIGLIBRT) rather than internal definition
in librt (SIGSERVICE).

Approved by:	davidxu, arch
2012-03-26 19:12:09 +00:00
Marius Strobl
f7b9ae0882 Remove second consts in r233288 in order to appease C++ compilers.
While at it, remove some style(9) bugs in libkern.h.

Submitted by:	kan
2012-03-26 18:22:04 +00:00
Alexander V. Chernikov
b25711e6b0 - Add knlist_init_rw_reader() function to kqueue(9).
  The function acquires the reader lock if needed.
  Assertions check for a reader or writer lock (RA_LOCKED / RA_UNLOCKED).
- While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues.

Reviewed by:    glebius
Approved by:    ae(mentor)

MFC after:      2 weeks
2012-03-26 09:34:17 +00:00
Edward Tomasz Napierala
bd944e019e Remove unused define.
Discussed with:	kib
2012-03-25 12:53:19 +00:00
Oleksandr Tymoshenko
27f3b996d0 Add define for MIPS.options
2012-03-23 22:52:23 +00:00
Mikolaj Golub
903712c99c Add a sysctl to set and retrieve binary osreldate of another process.
Suggested by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-03-23 20:05:41 +00:00
Oleksandr Tymoshenko
d8a2d243bf Add Octeon class and CPU type
2012-03-23 00:03:26 +00:00
Oleksandr Tymoshenko
2d57314a1b Fix PMC syscall on 64-bit big endian systems.
The syscall argument is a pointer to an array of register_t values. By
casting it to a pointer to a structure with fields smaller than register_t,
we relied on the compiler-dependent memory layout of the structure.

Tested on: mips64 and amd64 systems
2012-03-22 17:36:53 +00:00
Alan Cox
5730afc9b6 Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB.  In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB.  This is exactly as
recommended in AMD's documentation.  In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry.  In short, this saves us from
having to perform potentially costly TLB flushes.  In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry.  Usually, such spurious page faults are handled
by vm_fault() without incident.  However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault().  This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with:	kib
Tested by:		gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after:	1 week
2012-03-22 04:52:51 +00:00
Marius Strobl
cc0c154ffb Declare the CRC lookup tables const, as they are not expected to
change at run-time.
2012-03-21 20:55:21 +00:00
Eitan Adler
24c10828e4 - Clean up timestamps in msgbuf code. The timestamps should now be
  inserted after the priority token, thus cleaning up the output.
- Remove the needless double internal do_add_char function.
- Resolve a possible deadlock if interrupts are disabled and getnanotime
  is called.

Reviewed by:	bde, kmacy, avg, sbruno (various versions)
Approved by:	cperciva
MFC after:	2 weeks
2012-03-19 00:36:32 +00:00
Alexander Motin
3907c073e5 Tune cpuset macros to optimize cases when CPU_SETSIZE fits into a single
machine word. For example, it turns CPU_SET() into the expected shift and OR,
removing two extra shifts and an additional index on memory access.

Generated code checked for kernel (optimized) and user-level (unoptimized)
cases with GCC and CLANG.

Reviewed by:	attilio
MFC after:	2 weeks
2012-03-12 07:02:16 +00:00
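The single-word optimization described in 3907c073e5 can be illustrated with a small sketch. This is not the actual cpuset code: every DEMO_* name, struct demo_set, and demo_set_cpu() are hypothetical, and the set size of 32 is chosen only so that the set fits in one unsigned long on common platforms. When the word count is the compile-time constant 1, the word index folds to 0 and the modulo disappears, leaving a shift and an OR.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* All DEMO_* names are hypothetical; this is not the FreeBSD cpuset API. */
#define DEMO_NBITS	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_SETSIZE	32	/* assumed to fit in one machine word */
#define DEMO_NWORDS	((DEMO_SETSIZE + DEMO_NBITS - 1) / DEMO_NBITS)

struct demo_set {
	unsigned long bits[DEMO_NWORDS];
};

/*
 * With DEMO_NWORDS == 1 both conditionals are resolved at compile time:
 * the word index is always 0 and no modulo is needed for the mask.
 */
#define DEMO_WORD(n)	(DEMO_NWORDS == 1 ? 0 : (n) / DEMO_NBITS)
#define DEMO_MASK(n)	(1UL << (DEMO_NWORDS == 1 ? (n) : (n) % DEMO_NBITS))

#define DEMO_SET(n, p)	 ((p)->bits[DEMO_WORD(n)] |= DEMO_MASK(n))
#define DEMO_ISSET(n, p) (((p)->bits[DEMO_WORD(n)] & DEMO_MASK(n)) != 0)

/* Set bit n in the set, rejecting out-of-range indices in debug builds. */
void
demo_set_cpu(struct demo_set *set, size_t n)
{
	assert(n < DEMO_SETSIZE);
	DEMO_SET(n, set);
}
```

The same macros still work for larger sets, because the general division and modulo paths remain in place when DEMO_NWORDS is greater than 1.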
Konstantin Belousov
b80dcb55aa Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by:	gianni
2012-03-11 12:19:58 +00:00
Alexander Motin
bcfd016cff Idle ticks optimization:
- Pass the number of events to the statclock() and profclock() functions,
  as is already done for hardclock(), so that they are not called many
  times in a loop.
- Rename them to statclock_cnt() and profclock_cnt().
- Turn statclock() and profclock() into compatibility wrappers,
  still needed for arm.
- Rename hardclock_anycpu() to hardclock_cnt() for unification.

MFC after:	1 week
2012-03-10 14:57:21 +00:00
Konstantin Belousov
3e1ca43bde Add brackets around bare '-1' used as the macro body.
Noted by:	bde
MFC after:	1 week
2012-03-10 08:48:52 +00:00
Konstantin Belousov
38ddb5725b Decommission mnt_noasync. Introduce the MNTK_NOASYNC mnt_kern_flag, which
allows a filesystem to request that VFS not allow MNTK_ASYNC.

MFC after:	1 week
2012-03-09 00:12:05 +00:00
John Baldwin
44ad547522 Add a new sched_clear_name() method to the scheduler interface to clear
the cached name used for KTR_SCHED traces when a thread's name changes.
This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's
name changes, most commonly via execve().

MFC after:	2 weeks
2012-03-08 19:41:05 +00:00
Konstantin Belousov
f950879e16 pipe_poll() performs lockless access to the vnode to test the
fifo_iseof() condition, allowing v_fifoinfo to be reset and freed
by fifo_cleanup().

Precalculate EOF at the places where fo_wgen is changed, and cache the
state in a new pipe state flag PIPE_SAMEWGEN.

Reported and tested by:	bf
Submitted by:	gianni
MFC after:	1 week (a backport)
2012-03-07 07:31:50 +00:00
Edward Tomasz Napierala
c34bbd2ada Make racct and rctl correctly handle jail renaming. Previously
they would continue using the old name, the one the jail was created with.

PR:		bin/165207
2012-03-06 11:05:50 +00:00
David Chisnall
a8ed63bb3d Reapply 227753 (xlocale cleanup), plus some fixes so that it passes build
universe with gcc.

Approved by:	dim (mentor)
2012-03-04 15:31:13 +00:00
Juli Mallett
9624d94701 o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
   using the o32 ABI.  This mostly follows nwhitehorn's lead in implementing
   COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
   32-bit ABI being used.  Since the MIPS port is relatively new, even the 32-bit
   ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
   with 32-bit compatibility, disable some code which wrongly assumes otherwise
   when built for MIPS.  A more general macro to check for this case would
   seem like a good idea eventually.  If someone adds support for using n32
   userland with n64 kernels on MIPS, then they will have to add a variety of
   flags related to each piece of the ABI that can vary.  That's probably the
   right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
   freebsd32 compat code.  Probably this should be generalized at some point.

Reviewed by:	gonzo
2012-03-03 08:19:18 +00:00
Rick Macklem
5e99212d36 Post r230394, the Lookup RPC counts for both NFS clients increased
significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.

Reported by:	bde
Discussed with:	kib
Reviewed by:	jhb
MFC after:	2 weeks
2012-03-03 01:06:54 +00:00
Davide Italiano
78d763a29b - Add support for the Intel Sandy Bridge microarchitecture (both core and uncore counting events)
- New manpages with event lists.
- Add MSRs for the Intel Sandy Bridge microarchitecture

Reviewed by:	attilio, brueffer, fabient
Approved by:	gnn (mentor)
MFC after:	3 weeks
2012-03-01 21:23:26 +00:00
John Baldwin
831ce4cb3d - Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
  constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t).  Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by:	scottl
2012-03-01 19:58:34 +00:00
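The motivation for 831ce4cb3d can be seen with a small sketch of a configuration where addresses are 64 bits wide but sizes are only 32 bits. The demo_* types and helpers below are hypothetical stand-ins, not the real bus_addr_t, bus_size_t, or bus_dma code. A 4GB boundary is 2^32, which does not fit in a 32-bit size type, so storing it there silently yields 0 and the constraint is lost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: 64-bit addresses with 32-bit sizes. */
typedef uint64_t demo_addr_t;
typedef uint32_t demo_size_t;

#define DEMO_BOUNDARY_4GB	((demo_addr_t)1 << 32)

/* Show what happens when a boundary is stored in the narrower size type. */
demo_size_t
demo_boundary_as_size(demo_addr_t boundary)
{
	return ((demo_size_t)boundary);	/* 2^32 wraps to 0 here */
}

/*
 * Return 1 if the range [start, start + len) crosses a multiple of
 * boundary, which must be a power of two; return 0 otherwise.
 */
int
demo_crosses_boundary(demo_addr_t start, demo_addr_t len, demo_addr_t boundary)
{
	demo_addr_t mask;

	assert(len > 0);
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
	mask = ~(boundary - 1);
	return ((start & mask) != ((start + len - 1) & mask));
}
```

With the address-sized type, a buffer starting just below 4GB and extending past it is correctly detected as crossing the boundary.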
Kirk McKusick
35338e6091 This change avoids a kernel deadlock on "snaplk" when using
snapshots on UFS filesystems running with journaled soft updates.
This is the first of several bugs that need to be fixed before
removing the restriction added in -r230250 to prevent the use
of snapshots on filesystems running with journaled soft updates.

The deadlock occurs when holding the snapshot lock (snaplk)
and then trying to flush an inode via ffs_update(). We become
blocked by another process trying to flush a different inode
contained in the same inode block that we need. It holds the
inode block for which we are waiting locked. When it tries to
write the inode block, it gets blocked waiting for our
snaplk when it calls ffs_copyonwrite() to see if the inode
block needs to be copied in our snapshot.

The most obvious place that this deadlock arises is in the
ffs_copyonwrite() routine when it updates critical metadata
in a snapshot and tries to write it out before proceeding.
The fix here is to write the data and indirect block pointer
for the snapshot, but to skip the call to ffs_update() to
write the snapshot inode. To ensure that we will never have
to update a pointer in the inode itself, the ffs_snapshot()
routine that creates the snapshot has to ensure that all the
direct blocks are allocated as part of the creation of the
snapshot.

A less obvious place that this deadlock occurs is when we hold
the snaplk because we are deleting a snapshot. In the course of
doing the deletion, we need to allocate various soft update
dependency structures and allocate some journal space. If we
hit a resource limit while doing this we decrease the resources
in use by flushing out an existing dirty file to get it to give
up the soft dependency resources that it holds. The flush can
cause an ffs_update() to be done on the inode for the file that
we have selected to flush resulting in the same deadlock as
described above when the inode that we have chosen to flush
resides in the same inode block as the snapshot inode that we hold.
The fix is to defer cleaning up any time that the inode on which
we are operating is a snapshot.

Help and review by:    Jeff Roberson
Tested by:             Peter Holm
MFC (to 9 only) after: 2 weeks
2012-03-01 18:45:25 +00:00
Mikolaj Golub
c7e41c8b50 Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()
operations for setting and accessing vnode's v_socket field.

The operations are necessary to implement proper unix socket handling
on layered file systems like nullfs(5).

This change fixes the long-standing issue with nullfs(5) that
unix sockets did not work between lower and upper layers: if we bound
to a socket on the lower layer we could connect only to the lower
path; if we bound to the upper layer we could connect only to the
upper path. With the new behavior, one can connect to both the lower and
the upper paths regardless of which layer path one binds to.

PR:		kern/51583, kern/159663
Suggested by:	kib
Reviewed by:	arch
MFC after:	2 weeks
2012-02-29 21:38:31 +00:00
Martin Matuska
41c0675e6e Add procfs to jail-mountable filesystems.
Reviewed by:	jamie
MFC after:	1 week
2012-02-29 00:30:18 +00:00
Konstantin Belousov
1d7ca9bb8e Currently, the debugger attached to the process executing vfork() does
not get syscall exit notification until the child has performed exec or
exit.  Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.

Reported, tested and reviewed by:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	 2 weeks
2012-02-27 21:10:10 +00:00
John Baldwin
d7ccbd7009 Typo.
2012-02-27 18:28:31 +00:00
Martin Matuska
e7af90ab00 Analogous to r232059, add a parameter for the ZFS file system:
allow.mount.zfs:
	allow mounting the zfs filesystem inside a jail

This way the permissions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled via allow.mount.* jail parameters.

Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.

TODO:	document the connection of allow.mount.* and VFCF_JAIL for kernel
	developers

MFC after:	10 days
2012-02-26 16:30:39 +00:00
Mikolaj Golub
6ce13747dc Add sysctl to retrieve or set umask of another process.
Submitted by:	Dmitry Banschikov <me ubique spb ru>
Discussed with:	kib, rwatson
Reviewed by:	kib
MFC after:	2 weeks
2012-02-26 14:25:48 +00:00
Konstantin Belousov
747d2fa178 Add SO_PROTOCOL/SO_PROTOTYPE socket SOL_SOCKET-level option to get the
socket protocol number.  This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.

Submitted by:	Jukka A. Ukkonen <jau iki fi>
PR:	  kern/162352
Discussed with:	bz
Reviewed by:	glebius
MFC after:	2 weeks
2012-02-26 13:55:43 +00:00
Martin Matuska
34b95dbb6a Bump __FreeBSD_version due to libarchive update.
2012-02-25 11:03:13 +00:00
Mikolaj Golub
662c901c54 When detaching a unix domain socket, uipc_detach() checks the
unp->unp_vnode pointer to detect if there is a vnode associated with
(bound to) this socket and does the necessary cleanup if there is.

The issue is that after forced unmount this check may be too late as
the unp_vnode is reclaimed and the reference is stale.

To fix this, provide a helper function that is called on socket vnode
reclamation to do the necessary cleanup.

Pointed by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-02-25 10:15:41 +00:00
David Xu
df1f1bae9e In revision 231989, we passed a 16-bit clock ID into the kernel. However,
according to the POSIX documentation, clock IDs may be dynamically allocated,
so they are unlikely to stay below 64K forever. To make this future
compatible, we pack all timeout information into a new structure called
_umtx_time and use the fourth argument as a size indication. A zero means
old code is using a timespec as the timeout value; the new structure also
includes flags and a clock ID, so its size argument differs from before and
is non-zero. With this change, a thread can sleep on any supported clock,
though the current kernel code does not have such a POSIX clock driver
system.
2012-02-25 02:12:17 +00:00
Martin Matuska
bf3db8aa65 To improve control over the use of mount(8) inside a jail(8), introduce
a new jail parameter node with the following parameters:

allow.mount.devfs:
	allow mounting the devfs filesystem inside a jail

allow.mount.nullfs:
	allow mounting the nullfs filesystem inside a jail

Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.

Reviewed by:	jamie
Suggested by:	pjd
MFC after:	2 weeks
2012-02-23 18:51:24 +00:00
Kip Macy
11ac7ec076 merge pipe and fifo implementations
Also reviewed by: jhb, jilles (initial revision)
Tested by: pho, jilles

Submitted by:	gianni
Reviewed by:	bde
2012-02-23 18:37:30 +00:00
Konstantin Belousov
dcd432817e Allow the parent to gather the exit status of the children reparented
to the debugger.  When reparenting for debugging, keep the child in
the new orphan list of old parent.  When looping over the children in
kern_wait(), iterate over both children list and orphan list to search
for the process by pid.

Submitted by:	Dmitry Mikulin <dmitrym juniper.net>
MFC after:	2 weeks
2012-02-23 11:50:23 +00:00
David Xu
b13a8fa78f Use the unused fourth argument of umtx_op to pass flags to the kernel for
operation UMTX_OP_WAIT. The upper 16 bits are enough to hold a clock ID, and
the lower 16 bits are used to pass flags. The change saves a clock_gettime()
syscall from libthr.
2012-02-22 03:22:49 +00:00
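The argument layout described in b13a8fa78f, with a clock ID in the upper 16 bits and flags in the lower 16 bits, can be sketched as plain bit packing. The function names below are hypothetical and the sketch does not call umtx_op; it only shows how two 16-bit fields share one 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the names are not taken from the commit. */

/* Pack a clock ID into the upper 16 bits and flags into the lower 16. */
uint32_t
demo_pack_wait_arg(uint32_t clock_id, uint32_t flags)
{
	/* Both fields must fit in 16 bits or they would overlap. */
	assert(clock_id <= 0xffff && flags <= 0xffff);
	return ((clock_id << 16) | flags);
}

/* Recover the clock ID from the upper 16 bits. */
uint16_t
demo_unpack_clock_id(uint32_t arg)
{
	return ((uint16_t)(arg >> 16));
}

/* Recover the flags from the lower 16 bits. */
uint16_t
demo_unpack_flags(uint32_t arg)
{
	return ((uint16_t)(arg & 0xffff));
}
```

The 16-bit ceiling on the clock ID is the limitation that the later commit df1f1bae9e removes by moving to a sized structure.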
Konstantin Belousov
526d0bd547 Fix the places found where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows SSIZE_MAX-sized I/O requests to be performed from
usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
Eitan Adler
f17a6f1b17 Add a timestamp to the msgbuf output in order to determine when
messages were printed.

This can be enabled with the kern.msgbuf_show_timestamp sysctl.

PR:		kern/161553
Reviewed by:	avg
Submitted by:	Arnaud Lacombe <lacombar@gmail.com>
Approved by:	cperciva
MFC after:	1 month
2012-02-16 05:11:35 +00:00
Dimitry Andric
b74cf6dcf1 Revert r231673 and r231682 for now, until we can run a full make
universe with them.  Sorry for the breakage.

Pointy hat to:	     me and brooks
2012-02-14 21:48:46 +00:00