11262 Commits

Author SHA1 Message Date
ed
e309996070 Enable secure TTY input buffer flushing by default.
I'm leaving the sysctl there. If people really notice a slowdown, they
can revert to the old behaviour.

Discussed with:	kib
2009-05-21 16:48:06 +00:00
ed
77fae8a219 Add a new sysctl: kern.tty_inq_flush_secure.
When enabled all TTY input queue buffers are zeroed when flushing or
closing the TTY. Because TTY input queues are also used to store filled
in passwords, this may be an interesting switch to enable for security
minded people.
2009-05-21 16:19:54 +00:00
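As a rough userland illustration of what 77fae8a219 describes, the sketch below zeroes a queue's storage on flush when a knob is set. The names struct inq, inq_write(), inq_flush() and inq_flush_secure are hypothetical stand-ins and are not the kernel's TTY code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a TTY input queue; not the kernel's structure. */
struct inq {
	char	buf[64];	/* queued input bytes, possibly a password */
	size_t	len;		/* number of valid bytes in buf */
};

/* Stand-in for the kern.tty_inq_flush_secure sysctl (1 = zero on flush). */
static int inq_flush_secure = 1;

/* Append bytes to the queue, silently truncating when it is full. */
static void
inq_write(struct inq *q, const char *data, size_t n)
{
	size_t room = sizeof(q->buf) - q->len;

	if (n > room)
		n = room;
	memcpy(q->buf + q->len, data, n);
	q->len += n;
}

/* Discard queued input; scrub the old bytes when the knob is set. */
static void
inq_flush(struct inq *q)
{
	assert(q->len <= sizeof(q->buf));
	if (inq_flush_secure)
		memset(q->buf, 0, sizeof(q->buf));
	q->len = 0;
}
```

Without the scrub, resetting len alone leaves the old bytes in memory until they are overwritten, which is the exposure the commit message is concerned with.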
jhb
35ba8bce41 Only use the ABI compat shim for vfs.bufspace if the old buffer is smaller
than a long.

PR:		amd64/134786
Submitted by:	Emil Mikulic  emikulic| gmail
MFC after:	3 days
2009-05-21 16:18:45 +00:00
attilio
68353e273f Change the M_WAITOK flag in notify() to M_NOWAIT in order to match
the behaviour already present with the further malloc() call in
devctl_notify().
This fixes a bug in the CAM layer where the camisr handler ended up
calling camperiphfree() (and subsequently destroy_dev(), resulting in a new
dev notify) while the xpt lock is held.

PR:		kern/130330
Tested by:	Riccardo Torrini <riccardo dot torrini at esaote dot com>
2009-05-21 13:22:07 +00:00
jhb
ebdd571432 Set the umask in a new file descriptor table earlier in fdcopy() to remove
two lock operations.
2009-05-20 18:42:04 +00:00
jhb
53d01dc702 Remove an obsolete assertion. We always wake up all waiters when unlocking
a mutex and never set the lock cookie == MTX_CONTESTED.
2009-05-20 18:29:14 +00:00
jhb
033485e00c Fix a typo. 2009-05-20 17:19:30 +00:00
imp
d339e0404f We no longer need to use d_thread_t for portability here, switch to
struct thread *.
2009-05-20 16:58:16 +00:00
kmacy
879984a728 Add minimal ZFS lock hierarchy 2009-05-20 02:51:48 +00:00
rwatson
9c1bc41813 With SMPng, DEVICE_POLLING uses its own idle threads, rather than the
system idle loop, to run ether_poll(), so make ether_poll() static.

MFC after:	1 week
2009-05-19 19:21:25 +00:00
avg
89d59b82b3 sysctl_rman: report shared resources to devinfo
shared uses of a resource are recorded on a sub-list hanging off
a main resource object on a main resource list;
without this change a shared resource (e.g. irq) is reported only
once by devinfo -r/-u;
with this change the resource is reported for each driver that
allocates it (which is even more than what vmstat -i -a reports).

Approved by:	jhb (mentor)
2009-05-19 14:08:21 +00:00
rwatson
d9e163e093 Binding interrupts to a CPU consists of two parts: setting up CPU
affinity for the interrupt thread, and requesting that underlying
hardware direct interrupts to the CPU.  For software interrupt
threads, implement a no-op interrupt event binder that returns
success, so that the interrupt management code will just set the
ithread's affinity and succeed.

Reviewed by:	jhb
MFC after:	1 week
2009-05-18 14:02:55 +00:00
ed
a6c06ba89c Mark the clock sysctls as MPSAFE.
These sysctls don't need any form of locking. At least cp_times is used
by powerd very often, which means I get 50% fewer calls to non-MPSAFE
sysctls on my system. The other 50% is consumed by dev.cpu.0.freq, but
this seems to need Giant for Newbus.
2009-05-18 12:03:43 +00:00
alc
c8b00d493e Several changes to vfs_bio_clrbuf():
Provide a more descriptive comment.

Eliminate dead code.  The page cannot possibly have PG_ZERO set.

Eliminate unnecessary blank lines.

Reviewed by:	tegge
2009-05-17 23:25:53 +00:00
alc
dc942dabcf Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This
eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg().

In collaboration with:	tegge
2009-05-17 20:26:00 +00:00
ed
44192767f8 Print an extra newline when not at the first column already.
This makes siginfo output look a lot better when it is triggered for the
first time in sh(1), for example:

	$ load: 0.00  cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k
	load: 0.00  cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k

will now become:

	$
	load: 0.00  cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k
	load: 0.00  cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k
2009-05-17 16:17:48 +00:00
ed
69acbef6ca Several cleanups to tty_info(), better known as Ctrl-T.
- Only pick up PROC_LOCK once, which means we can drop the PGRP_LOCK
  right after picking up PROC_LOCK for the first time.

- Print the process real time, making it consistent with tools like
  time(1).

- Use `p' and `td' to reference the process/thread we are going to
  print. Only use pick-variables inside the loops. We already did this
  for the threads, but not the processes.
2009-05-17 12:30:25 +00:00
des
9919b017d4 Remove do-nothing code that was required to dirty the old buffer on Alpha.
Coverity ID:	838
Approved by:	jhb, alc
2009-05-15 21:34:58 +00:00
kib
b8162aa0c9 Revert r192094. The revision caused problems for sysctl(3) consumers
that expect that oldlen is filled with required buffer length even when
supplied buffer is too short and returned error is ENOMEM.

Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when
remaining buffer space is too short for the current kinfo_file structure.
Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition
to keep existing interface for the sysctl, though.

Reported by:	ed, Florian Smeets <flo kasimir com>
Tested by:	pho
2009-05-15 14:41:44 +00:00
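The interplay described in b8162aa0c9 can be sketched in userland as follows. This is a simplified model under stated assumptions: struct req, req_out() and emit_records() are hypothetical names, records are NUL-terminated strings, and unlike the real copy-out path no partial record is copied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the sysctl request state. */
struct req {
	char	*oldptr;	/* caller's output buffer */
	size_t	 oldlen;	/* size of that buffer */
	size_t	 oldidx;	/* bytes produced so far */
};

/*
 * Generic copy-out: the index advances even when the record does not
 * fit, so oldidx can report the length the caller would have needed.
 */
static int
req_out(struct req *r, const void *p, size_t n)
{
	size_t start = r->oldidx;

	r->oldidx += n;
	if (start + n > r->oldlen)
		return (ENOMEM);
	memcpy(r->oldptr + start, p, n);
	return (0);
}

/*
 * Handler for a list of variable-sized records: when a record does not
 * fit, roll oldidx back to the end of the last complete record and
 * report success, so an iterator never sees a partial trailing record.
 */
static int
emit_records(struct req *r, const char *const *recs, size_t nrecs)
{
	size_t i, saved;
	int error;

	assert(r->oldidx <= r->oldlen);
	for (i = 0; i < nrecs; i++) {
		saved = r->oldidx;
		error = req_out(r, recs[i], strlen(recs[i]) + 1);
		if (error == ENOMEM) {
			r->oldidx = saved;	/* drop the partial record */
			return (0);		/* ENOMEM becomes "no error" */
		}
		if (error != 0)
			return (error);
	}
	return (0);
}
```

The generic helper keeps reporting the required length on ENOMEM, while the list handler alone hides the short-buffer case from its callers.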
jhb
cbf4ebe5a3 - Use a separate sx lock to try to limit the number of concurrent userland
sysctl requests to avoid wiring too much user memory.  Only grab this
  lock if the user's old buffer is larger than a page as a tradeoff to
  allow more concurrency for common small requests.
- Just use a shared lock on the sysctl tree for user sysctl requests now.

MFC after:	1 week
2009-05-14 22:01:32 +00:00
kib
7361db2279 Do not advance req->oldidx when sysctl_old_user returns an
error due to copyout failure or short buffer.

The latter breaks the usermode iterators of the sysctl results that pack
arbitrary number of variable-sized structures. Iterator expects that
kernel filled exactly oldlen bytes, and tries to interpret half-filled
or garbage structure at the end of the buffer. In particular,
kinfo_getfile(3) segfaulted.

Reported and tested by:	pho
MFC after:	3 weeks
2009-05-14 10:54:57 +00:00
jeff
20397e6431 - Implement a lockless file descriptor lookup algorithm in
fget_unlocked().
 - Save old file descriptor tables created on expansion until
   the entire descriptor table is freed so that pointers may be
   followed without regard for expanders.
 - Mark the file zone as NOFREE so we may attempt to reference
   potentially freed files.
 - Convert several fget_locked() users to fget_unlocked().  This
   requires us to manage reference counts explicitly but reduces
   locking overhead in the common case.
2009-05-14 03:24:22 +00:00
alc
82da6bfdea Eliminate page queues locking from bufdone_finish() through the
following changes:

Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect
what this function actually does.  Suggested by: tegge

Introduce a new version of vfs_page_set_valid() that does no more than
what the function's name implies.  Specifically, it does not update
the page's dirty mask, and thus it does not require the page queues
lock to be held.

Update two of the three callers to the old vfs_page_set_valid() to
call vfs_page_set_validclean() instead because they actually require
the page's dirty mask to be cleared.

Introduce vm_page_set_valid().

Reviewed by:	tegge
2009-05-13 05:39:39 +00:00
trasz
e6d7976851 Add missing 'break' statement.
Found with:	Coverity Prevent(tm)
CID:		3919
2009-05-12 17:05:40 +00:00
kib
60c4168558 Prevent overflow of uio_resid.
Noted by:	jhb
MFC after:	3 days
2009-05-11 19:58:03 +00:00
attilio
c639aa3d25 Fix a kernel compilation error, introduced after r191990, by defining
thread with curthread in the AUDIT case.

Reported by:	dchagin
2009-05-11 16:32:58 +00:00
attilio
1dcb84131b Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  The VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some places, in particular when dealing with VOPs and functions living
in the same namespace (e.g. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled.  Bump __FreeBSD_version in order to signal this
situation.
2009-05-11 15:33:26 +00:00
alc
123b385c44 Revert CVS revision 1.94 (svn r16840). Current pmap implementations don't
suffer from the race condition that motivated revision 1.94.  Consequently,
the work-around that was implemented by revision 1.94 is no longer needed.
Moreover, reverting this work-around eliminates the need for
vfs_busy_pages() to acquire the page queues lock when preparing a buffer
for read.

Reviewed by:	tegge
2009-05-11 05:16:57 +00:00
imp
9a12159016 Spell NULL properly, use (void) rather than () for functions with no
parameters.  Mark two items as static that aren't used elsewhere...
2009-05-09 19:08:22 +00:00
imp
66ca9cb573 Retire kern.vm.kmem.size. It was marked as obsolete prior to 5.2, so
it can go.
2009-05-09 19:00:47 +00:00
kan
7b57a857b7 Do not embed struct ucred into larger netcred parent structures.
The credential might need to hang around longer than its parent and be used
outside of the mnt_explock scope controlling netcred lifetime. Use a
reference-counted ucred allocated separately instead.

While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up
some unused declarations in new NFS code.

Reported by:	John Hickey
PR:		kern/133439
Reviewed by:	dfr, kib
2009-05-09 18:09:17 +00:00
zec
b31e199a10 A NOP change: style / whitespace cleanup of the noise that slipped
into r191816.

Spotted by:	bz
Approved by:	julian (mentor) (an earlier version of the diff)
2009-05-08 14:34:25 +00:00
zec
639797b2e6 Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to be replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
jamie
267ea54b44 Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to be accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
kib
78e147b4e4 Eliminate the loop and the call to pause(9) in vfs_vget_ino(). If
vfs_busy(MBF_NOWAIT) failed, unlock the vnode and sleep in vfs_busy().

Suggested and reviewed by:	jeff
Tested by:	pho
MFC after:	3 weeks
2009-05-07 18:14:21 +00:00
ed
2f086e8725 If we have a regular rint handler, never go into rint_bypass mode.
It turns out if we called cfmakeraw() on a TTY with only a rint handler
in place, it could inject data into the TTY, even though it should be
redirected. Always take a look at the hooks before looking at the
termios flags.
2009-05-07 17:39:23 +00:00
zec
d78a1b1a82 Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The curvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, with an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typical cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
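The save/restore discipline described in d78a1b1a82 can be sketched in userland as below. The macros here are simplified stand-ins written for illustration, not the FreeBSD definitions; struct vnet, timer_tick() and socket_op() are hypothetical, and a C11 _Thread_local variable plays the role of curthread->td_vnet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network stack instance. */
struct vnet {
	int	id;
};

/* Stand-in for curthread->td_vnet; _Thread_local needs C11. */
static _Thread_local struct vnet *curvnet;

/* Save the caller's context in a local, then switch to the new one. */
#define	CURVNET_SET(arg)						\
	struct vnet *saved_vnet = curvnet;				\
	curvnet = (arg)

/* Put back whatever context was active before CURVNET_SET(). */
#define	CURVNET_RESTORE()						\
	curvnet = saved_vnet

/* A timer-driven function: nothing else tells it which vnet it runs in. */
static int
timer_tick(struct vnet *vnet)
{
	int seen;

	CURVNET_SET(vnet);
	assert(curvnet == vnet);
	seen = curvnet->id;
	CURVNET_RESTORE();
	return (seen);
}

/* Socket-style entry point that recurses into another vnet's timer. */
static int
socket_op(struct vnet *so_vnet, struct vnet *other)
{
	int inner, outer;

	CURVNET_SET(so_vnet);
	inner = timer_tick(other);	/* nested set/restore */
	outer = curvnet->id;		/* outer context is back in place */
	CURVNET_RESTORE();
	return (outer * 100 + inner);
}
```

Because each CURVNET_SET() keeps the previous value in a local of its own function, nested calls unwind correctly, which is why recursion is tolerated even though the commit message discourages it.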
jamie
8e4ffe653f Add a constant PR_MAXMETHOD to better define the jail/OSD interface.
Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-05 05:49:08 +00:00
ed
fb2908c8ff Remove unneeded check for SESS_LEADER().
We perform the same check ~10 lines above.
2009-05-04 11:11:10 +00:00
jamie
d462264a61 Don't call the OSD destructor if the data slot is NULL
(since it's already not done on unused slots, which are indistinguishable
to the caller).

Approved by:	bz (mentor)
2009-04-30 22:43:21 +00:00
zec
39b6dc8ba2 Permit building kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with two additional fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
jeff
9ff631ca46 - Fix non-SMP build by encapsulating idle spin logic in a macro.
Pointy hat to:	me
2009-04-29 23:04:31 +00:00
jamie
8fbb51e637 Regen for new jail system calls in r191673.
Approved by:	bz (mentor)
2009-04-29 21:50:13 +00:00
jamie
453b86f943 Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2).  Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
  This replaces jail(2).
* jail_get, to read the parameters of existing jails.  This replaces the
  security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes.  The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by:	bz (mentor)
2009-04-29 21:14:15 +00:00
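The "name=value" interface mentioned in 453b86f943 passes parameters as an array of struct iovec in name/value pairs, as nmount(2) does. A minimal sketch of building such an array is shown below; iov_add_string() is a hypothetical helper, and the parameter names used with it ("name", "path") are only examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Append one "name=value" pair as two iovec entries: the name, then the
 * value, each with its terminating NUL counted in iov_len.  Returns the
 * new number of used entries.
 */
static unsigned int
iov_add_string(struct iovec *iov, unsigned int niov, unsigned int max,
    char *name, char *value)
{
	assert(niov + 2 <= max);
	iov[niov].iov_base = name;
	iov[niov].iov_len = strlen(name) + 1;
	iov[niov + 1].iov_base = value;
	iov[niov + 1].iov_len = strlen(value) + 1;
	return (niov + 2);
}
```

On a FreeBSD system with <sys/jail.h>, an array built this way would presumably be handed to jail_set(iov, niov, JAIL_CREATE); that call is left out here so the sketch compiles elsewhere.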
bms
32a71137f0 Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev.  Summary of changes:

 * Connect netinet6/in6_mcast.c to build.
   The legacy KAME KPIs are mostly preserved.
 * Eliminate now dead code from ip6_output.c.
   Don't do mbuf bingo, we are not going to do RFC 2292 style
   CMSG tricks for multicast options as they are not required
   by any current IPv6 normative reference.
 * Refactor transports (UDP, raw_ip6) to do own mcast filtering.
   SCTP, TCP unaffected by this change.
 * Add ip6_msource, in6_msource structs to in6_var.h.
 * Hookup mld_ifinfo state to in6_ifextra, allocate from
   domifattach path.
 * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
   Kernel consumers which need this should use in6m_lookup().
 * Refactor IPv6 socket group memberships to use a vector (like IPv4).
 * Update ifmcstat(8) for IPv6 SSM.
 * Add witness lock order for IN6_MULTI_LOCK.
 * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
 * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
 * Update carp(4) for new IPv6 SSM KPIs.
 * Virtualize ip6_mrouter socket.
   Changes mostly localized to IPv6 MROUTING.
 * Don't do a local group lookup in MROUTING.
 * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
 * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
 * Bump __FreeBSD_version to 800084.
 * Update UPDATING.

NOTE WELL:
 * This code hasn't been tested against real MLDv2 queriers
   (yet), although the on-wire protocol has been verified in Wireshark.
 * There are a few unresolved issues in the socket layer APIs to
   do with scope ID propagation.
 * There is a LOR present in ip6_output()'s use of
   in6_setscope() which needs to be resolved. See comments in mld6.c.
   This is believed to be benign and can't be avoided for the moment
   without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
jamie
51a4d1c4a3 Some non-functional changes: whitespace, KASSERT strings, declaration order.
Approved by:	bz (mentor)
2009-04-29 18:41:08 +00:00
jeff
fe5d856f47 - Fix the FBSDID line. 2009-04-29 03:26:30 +00:00
jeff
88a1cd92bb - Remove the bogus idle thread state code. This may have a race in it
and it only optimized out an ipi or mwait in very few cases.
 - Skip the adaptive idle code when running on SMT or HTT cores.  This
   just wastes cpu time that could be used on a busy thread on the same
   core.
 - Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive.  Re-use
   CG_FLAG_THREAD to mean SMT or HTT.

Sponsored by:   Nokia
2009-04-29 03:15:43 +00:00
bz
1507f5bd4d Prevent a superuser inside a jail from modifying the dedicated
root cpuset of that jail.
Processes inside the jail will still be able to change child sets.
A superuser outside of a jail will still be able to change the jail cpuset
and thus limit the number of cpus available to the jail.

Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman)
PR:		kern/134050
Reviewed by:	jeff
MFC after:	3 weeks
X-MFC:		backout r191596
2009-04-28 21:00:50 +00:00
rwatson
fb89496678 Improve approximation of style(9). 2009-04-26 21:16:03 +00:00