Commit Graph

11106 Commits

Author SHA1 Message Date
Marko Zec
e0c14af9b3 Introduce the if_vmove() function, which will be used in the future
for reassigning ifnets from one vnet to another.

if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.

if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the later
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another.  Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.

While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.

Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz, rwatson, brooks?
Approved by:	julian (mentor)
2009-05-22 22:09:00 +00:00
Edward Tomasz Napierala
ae1add4e55 Make 'struct acl' larger, as required to support NFSv4 ACLs. Provide
compatibility interfaces in both kernel and libc.

Reviewed by:	rwatson
2009-05-22 15:56:43 +00:00
Ed Schouten
52f542a8e4 Enable secure TTY input buffer flushing by default.
I'm leaving the sysctl there. If people really notice a slowdown, they
can revert to the old behaviour.

Discussed with:	kib
2009-05-21 16:48:06 +00:00
Ed Schouten
770c15f60f Add a new sysctl: kern.tty_inq_flush_secure.
When enabled all TTY input queue buffers are zeroed when flushing or
closing the TTY. Because TTY input queues are also used to store filled
in passwords, this may be an interesting switch to enable for security
minded people.
2009-05-21 16:19:54 +00:00
John Baldwin
d422da9a0a Only use the ABI compat shim for vfs.bufspace if the old buffer is smaller
than a long.

PR:		amd64/134786
Submitted by:	Emil Mikulic  emikulic| gmail
MFC after:	3 days
2009-05-21 16:18:45 +00:00
Attilio Rao
9995e57b01 Move the M_WAITOK flag in notify() into an M_NOWAIT one in order to match
the behaviour alredy present with the further malloc() call in
devctl_notify().
This fixes a bug in the CAM layer where the camisr handler finished to
call camperiphfree() (and subsequently destroy_dev() resulting in a new
dev notify) while the xpt lock is held.

PR:		kern/130330
Tested by:	Riccardo Torrini <riccardo dot torrini at esaote dot com>
2009-05-21 13:22:07 +00:00
John Baldwin
6ca33ea345 Set the umask in a new file descriptor table earlier in fdcopy() to remove
two lock operations.
2009-05-20 18:42:04 +00:00
John Baldwin
583220dc4c Remove an obsolete assertion. We always wake up all waiters when unlocking
a mutex and never set the lock cookie == MTX_CONTESTED.
2009-05-20 18:29:14 +00:00
John Baldwin
4ab9c8af92 Fix a typo. 2009-05-20 17:19:30 +00:00
Warner Losh
248343f9d1 We no longer need to use d_thread_t for portability here, switch to
struct thread *.
2009-05-20 16:58:16 +00:00
Kip Macy
126f8425c3 Add minimal ZFS lock hierarchy 2009-05-20 02:51:48 +00:00
Robert Watson
56a3c6d4a7 With SMPng, DEVICE_POLLING uses its own idle threads, rather than the
system idle loop, to run ether_poll(), so make ether_poll() static.

MFC after:	1 week
2009-05-19 19:21:25 +00:00
Andriy Gapon
51ca6cd6df sysctl_rman: report shared resources to devinfo
shared uses of a resource are recorded on a sub-list hanging off
a main resource object on a main resource list;
without this change a shared resource (e.g. irq) is reported only
once by devinfo -r/-u;
with this change the resource is reported for each driver that
allocates it (which is even more than what vmstat -i -a reports).

Approved by:	jhb (mentor)
2009-05-19 14:08:21 +00:00
Robert Watson
e84bcd8494 Binding interrupts to a CPU consists of two parts: setting up CPU
affinity for the interrupt thread, and requesting that underlying
hardware direct interrupts to the CPU.  For software interrupt
threads, implement a no-op interrupt event binder that returns
success, so that the interrupt management code will just set the
ithread's affinity and succeed.

Reviewed by:	jhb
MFC after:	1 week
2009-05-18 14:02:55 +00:00
Ed Schouten
c383c2211b Mark the clock sysctls as MPSAFE.
These sysctls don't need any form of locking. At least cp_times is used
by powerd very often, which means I get 50% less calls to non-MPSAFE
sysctls on my system. The other 50% is consumed by dev.cpu.0.freq, but
this seems to need Giant for Newbus.
2009-05-18 12:03:43 +00:00
Alan Cox
1be5269359 Several changes to vfs_bio_clrbuf():
Provide a more descriptive comment.

Eliminate dead code.  The page cannot possibly have PG_ZERO set.

Eliminate unnecessary blank lines.

Reviewed by:	tegge
2009-05-17 23:25:53 +00:00
Alan Cox
6e5982caf7 Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This
eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg().

In collaboration with:	tegge
2009-05-17 20:26:00 +00:00
Ed Schouten
379affd5cb Print an extra newline when not at the first column already.
This makes siginfo output look a lot better when pressing it the first
time when in sh(1), for example:

	$ load: 0.00  cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k
	load: 0.00  cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k

will now become:

	$
	load: 0.00  cmd: sh 1945 [ttyin] 3.94r 0.00u 0.00s 0% 1960k
	load: 0.00  cmd: sh 1945 [ttyin] 4.19r 0.00u 0.00s 0% 1960k
2009-05-17 16:17:48 +00:00
Ed Schouten
dd970f41f7 Several cleanups to tty_info(), better known as Ctrl-T.
- Only pick up PROC_LOCK once, which means we can drop the PGRP_LOCK
  right after picking up PROC_LOCK for the first time.

- Print the process real time, making it consistent with tools like
  time(1).

- Use `p' and `td' to reference the process/thread we are going to
  print. Only use pick-variables inside the loops. We already did this
  for the threads, but not the processes.
2009-05-17 12:30:25 +00:00
Dag-Erling Smørgrav
433e2f4763 Remove do-nothing code that was required to dirty the old buffer on Alpha.
Coverity ID:	838
Approved by:	jhb, alc
2009-05-15 21:34:58 +00:00
Konstantin Belousov
6b72d8db47 Revert r192094. The revision caused problems for sysctl(3) consumers
that expect that oldlen is filled with required buffer length even when
supplied buffer is too short and returned error is ENOMEM.

Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when
remaining buffer space is too short for the current kinfo_file structure.
Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition
to keep existing interface for the sysctl, though.

Reported by:	ed, Florian Smeets <flo kasimir com>
Tested by:	pho
2009-05-15 14:41:44 +00:00
John Baldwin
3e829b18d6 - Use a separate sx lock to try to limit the number of concurrent userland
sysctl requests to avoid wiring too much user memory.  Only grab this
  lock if the user's old buffer is larger than a page as a tradeoff to
  allow more concurrency for common small requests.
- Just use a shared lock on the sysctl tree for user sysctl requests now.

MFC after:	1 week
2009-05-14 22:01:32 +00:00
Konstantin Belousov
e401a6a54e Do not advance req->oldidx when sysctl_old_user returning an
error due to copyout failure or short buffer.

The later breaks the usermode iterators of the sysctl results that pack
arbitrary number of variable-sized structures. Iterator expects that
kernel filled exactly oldlen bytes, and tries to interpret half-filled
or garbage structure at the end of the buffer. In particular,
kinfo_getfile(3) segfaulted.

Reported and tested by:	pho
MFC after:	3 weeks
2009-05-14 10:54:57 +00:00
Jeff Roberson
bf422e5f27 - Implement a lockless file descriptor lookup algorithm in
fget_unlocked().
 - Save old file descriptor tables created on expansion until
   the entire descriptor table is freed so that pointers may be
   followed without regard for expanders.
 - Mark the file zone as NOFREE so we may attempt to reference
   potentially freed files.
 - Convert several fget_locked() users to fget_unlocked().  This
   requires us to manage reference counts explicitly but reduces
   locking overhead in the common case.
2009-05-14 03:24:22 +00:00
Alan Cox
1c1b26f276 Eliminate page queues locking from bufdone_finish() through the
following changes:

Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect
what this function actually does.  Suggested by: tegge

Introduce a new version of vfs_page_set_valid() that does no more than
what the function's name implies.  Specifically, it does not update
the page's dirty mask, and thus it does not require the page queues
lock to be held.

Update two of the three callers to the old vfs_page_set_valid() to
call vfs_page_set_validclean() instead because they actually require
the page's dirty mask to be cleared.

Introduce vm_page_set_valid().

Reviewed by:	tegge
2009-05-13 05:39:39 +00:00
Edward Tomasz Napierala
e5023dd9f6 Add missing 'break' statement.
Found with:	Coverity Prevent(tm)
CID:		3919
2009-05-12 17:05:40 +00:00
Konstantin Belousov
3b616faed5 Prevent overflow of uio_resid.
Noted by:	jhb
MFC after:	3 days
2009-05-11 19:58:03 +00:00
Attilio Rao
22d7ae67d4 Fix a kernel compilation error, introduced after r191990, by defining
thread with curthread in the AUDIT case.

Reported by:	dchagin
2009-05-11 16:32:58 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Alan Cox
c3d3fe6314 Revert CVS revision 1.94 (svn r16840). Current pmap implementations don't
suffer from the race condition that motivated revision 1.94.  Consequently,
the work-around that was implemented by revision 1.94 is no longer needed.
Moreover, reverting this work-around eliminates the need for
vfs_busy_pages() to acquire the page queues lock when preparing a buffer
for read.

Reviewed by:	tegge
2009-05-11 05:16:57 +00:00
Warner Losh
7cddab635b Spell NULL properly, use (void) rather than () for functions with no
parameters.  Mark two items as static that aren't used elsewhere...
2009-05-09 19:08:22 +00:00
Warner Losh
e678f09a15 Retire kern.vm.kmem.size. It was marked as obsolete prior to 5.2, so
it can go.
2009-05-09 19:00:47 +00:00
Alexander Kabaev
5679fe1957 Do not embed struct ucred into larger netcred parent structures.
Credential might need to hang around longer than its parent and be used
outside of mnt_explock scope controlling netcred lifetime. Use separate
reference-counted ucred allocated separately instead.

While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up
some unused declarations in new NFS code.

Reported by:	John Hickey
PR:		kern/133439
Reviewed by:	dfr, kib
2009-05-09 18:09:17 +00:00
Marko Zec
2114e063f0 A NOP change: style / whitespace cleanup of the noise that slipped
into r191816.

Spotted by:	bz
Approved by:	julian (mentor) (an earlier version of the diff)
2009-05-08 14:34:25 +00:00
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Jamie Gritton
7ae27ff49f Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
Konstantin Belousov
41b72e6e50 Eliminate the loop and the call to pause(9) in vfs_vget_ino(). If
vfs_busy(MBF_NOWAIT) failed, unlock the vnode and sleep in vfs_busy().

Suggested and reviewed by:	jeff
Tested by:	pho
MFC after:	3 weeks
2009-05-07 18:14:21 +00:00
Ed Schouten
14358b0fec If we have a regular rint handler, never go into rint_bypass mode.
It turns out if we called cfmakeraw() on a TTY with only a rint handler
in place, it could inject data into the TTY, even though it should be
redirected. Always take a look at the hooks before looking at the
termios flags.
2009-05-07 17:39:23 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Jamie Gritton
49939083a0 Add a constant PR_MAXMETHOD to better define the jail/OSD interface.
Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-05 05:49:08 +00:00
Ed Schouten
3382ac3233 Remove unneeded check for SESS_LEADER().
We perform the same check ~10 lines above.
2009-05-04 11:11:10 +00:00
Jamie Gritton
3dd4fac97c Don't call the OSD destructor if the data slot is NULL
(since it's already not done on unused slots, which are indistinguishable
to the caller).

Approved by:	bz (mentor)
2009-04-30 22:43:21 +00:00
Marko Zec
f6dfe47a14 Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
Jeff Roberson
09c8a4cc21 - Fix non-SMP build by encapsulating idle spin logic in a macro.
Pointy hat to:	me
2009-04-29 23:04:31 +00:00
Jamie Gritton
fe2f3c651f Regen for new jail system calls in r191673.
Approved by:	bz (mentor)
2009-04-29 21:50:13 +00:00
Jamie Gritton
b38ff370e4 Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2).  Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
  This replaces jail(2).
* jail_get, to read the parameters of existing jails.  This replaces the
  security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes.  The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by:	bz (mentor)
2009-04-29 21:14:15 +00:00
Bruce M Simpson
33cde13046 Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev.  Summary of changes:

 * Connect netinet6/in6_mcast.c to build.
   The legacy KAME KPIs are mostly preserved.
 * Eliminate now dead code from ip6_output.c.
   Don't do mbuf bingo, we are not going to do RFC 2292 style
   CMSG tricks for multicast options as they are not required
   by any current IPv6 normative reference.
 * Refactor transports (UDP, raw_ip6) to do own mcast filtering.
   SCTP, TCP unaffected by this change.
 * Add ip6_msource, in6_msource structs to in6_var.h.
 * Hookup mld_ifinfo state to in6_ifextra, allocate from
   domifattach path.
 * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
   Kernel consumers which need this should use in6m_lookup().
 * Refactor IPv6 socket group memberships to use a vector (like IPv4).
 * Update ifmcstat(8) for IPv6 SSM.
 * Add witness lock order for IN6_MULTI_LOCK.
 * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
 * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
 * Update carp(4) for new IPv6 SSM KPIs.
 * Virtualize ip6_mrouter socket.
   Changes mostly localized to IPv6 MROUTING.
 * Don't do a local group lookup in MROUTING.
 * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
 * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
 * Bump __FreeBSD_version to 800084.
 * Update UPDATING.

NOTE WELL:
 * This code hasn't been tested against real MLDv2 queriers
   (yet), although the on-wire protocol has been verified in Wireshark.
 * There are a few unresolved issues in the socket layer APIs to
   do with scope ID propagation.
 * There is a LOR present in ip6_output()'s use of
   in6_setscope() which needs to be resolved. See comments in mld6.c.
   This is believed to be benign and can't be avoided for the moment
   without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
Jamie Gritton
af7bd9a4f4 Some non-functional changes: whitespace, KASSERT strings, declaration order.
Approved by:	bz (mentor)
2009-04-29 18:41:08 +00:00
Jeff Roberson
113dda8a7c - Fix the FBSDID line. 2009-04-29 03:26:30 +00:00
Jeff Roberson
7b55ab0534 - Remove the bogus idle thread state code. This may have a race in it
and it only optimized out an ipi or mwait in very few cases.
 - Skip the adaptive idle code when running on SMT or HTT cores.  This
   just wastes cpu time that could be used on a busy thread on the same
   core.
 - Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive.  Re-use
   CG_FLAG_THREAD to mean SMT or HTT.

Sponsored by:   Nokia
2009-04-29 03:15:43 +00:00