Commit Graph

5577 Commits

Author SHA1 Message Date
John Baldwin
028923e54d - Use PARTIAL_PICKUP_GIANT() to implement PICKUP_GIANT().
- Move UGAR() macro up to the comment that describes it.
- Fix a couple of typos.
2007-03-30 18:10:08 +00:00
John Baldwin
ab2dab1680 - Use lock_init/lock_destroy() to setup the lock_object inside of lockmgr.
We can now use LOCK_CLASS() as a stronger check in lockmgr_chain() as a
  result.  This required putting back lk_flags as lockmgr's use of flags
  conflicted with other flags in lo_flags otherwise.
- Tweak 'show lock' output for lockmgr to match sx, rw, and mtx.
2007-03-30 18:07:24 +00:00
Konstantin Belousov
c146055fad Extend rev. 1.210 to avoid dereference NULL mp in VFS_NEEDSGIANT and
VFS_ASSERT_GIANT. Stop using reserved namespace.

Reported and tested by:		kris
Reviewed and enhanced by:	tegge
MFC after:	1 week
2007-03-29 08:21:09 +00:00
John Baldwin
8b4b92d2f6 Fix a comment grammar nit. 2007-03-27 15:09:10 +00:00
Nate Lawson
54c2673813 Bump FreeBSD version for inclusion of CPU frequency change notifiers. 2007-03-26 18:04:41 +00:00
Nate Lawson
0d4ac62a35 Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change.  cpufreq_post_change provides the results of the
change (success or failure).  cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed.  Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value.  When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq.  This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by:	bde, phk
MFC after:	1 month
2007-03-26 18:03:29 +00:00
Robert Watson
92bf861a71 General style cleanup.
Correct spelling errors.

Remove references to M_COPY_PKTHDR -- it was deprecated in 6.x and is not
used (or defined) in our tree.
2007-03-24 20:19:44 +00:00
John Baldwin
8f27b08e87 Rename the cv_*wait*() functions to _cv_*wait*() and change their second
argument from a mutex to a lock_object.  Add cv_*wait*() wrapper macros
that accept either a mutex, rwlock, or sx lock as the second argument and
convert it to a lock_object and then call _cv_*wait*().  Basically, the
visible difference is that you can now use rwlocks and sx locks with
condition variables using the same API as with mutexes.
2007-03-21 22:22:13 +00:00
John Baldwin
73de183262 Make use of 'lock_object' being the same field name in the witness_check*()
macros.
- witness_check() replaces witness_check_mtx() and
  witness_check_exclusive_sx() and checks for an exclusive acquire of
  either a mutex, rwlock, or sx lock.
- witness_check_shared() replaces witness_check_shared_sx() and checks for
  a shared acquire of either a rwlock or sx lock.
2007-03-21 22:18:10 +00:00
John Baldwin
aa89d8cd52 Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.
2007-03-21 21:20:51 +00:00
Andre Oppermann
ddca17a686 Space to tab in SB_* defines to match with rest of file. 2007-03-19 18:40:31 +00:00
Andre Oppermann
4e02375908 Maintain a pointer and offset pair into the socket buffer mbuf chain to
avoid traversal of the entire socket buffer for larger offsets on stream
sockets.

Adjust tcp_output() make use of it.

Tested by:	gallatin
2007-03-19 18:35:13 +00:00
Ken Smith
97fa781515 Bump __FreeBSD_version after changes to how insmntque(), getnewvnode(), and
vfs_hash_insert() work.
2007-03-19 00:19:35 +00:00
Robert Watson
985267d15e Revert/re-make previous commit in a manner that maintains hyphenation of
extended attributes.  I'm not sure I like it, but it is grammatically more
correct.

Requested by:	mckusick
2007-03-16 19:18:49 +00:00
Robert Watson
aec2fc247a Minor white space tweaks in comments. 2007-03-16 13:39:04 +00:00
Robert Watson
eef1df45d3 Update a comment: Rather than suggesting suser(), suggest priv(9) for
checking privilege.
2007-03-14 19:52:19 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
Alan Cox
c640357f04 Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
John Baldwin
e7573e7ad7 Allow threads to atomically release rw and sx locks while waiting for an
event.  Locking primitives that support this (mtx, rw, and sx) now each
include their own foo_sleep() routine.
- Rename msleep() to _sleep() and change it's 'struct mtx' object to a
  'struct lock_object' pointer.  _sleep() uses the recently added
  lc_unlock() and lc_lock() function pointers for the lock class of the
  specified lock to release the lock while the thread is suspended.
- Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks
  (rw_sleep()), and sx locks (sx_sleep()).  msleep() still exists and
  is now identical to mtx_sleep(), but it is deprecated.
- Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
- Rewrite much of sleep.9 to not be msleep(9) centric.
- Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS'
  section.
- Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will
  warn if you try to pass a NULL wait channel.  The functions already have
  a KASSERT to that effect.
2007-03-09 22:41:01 +00:00
John Baldwin
6e21afd40c Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes.
These functions are intended to be used to drop a lock and then reacquire
it when doing an sleep such as msleep(9).  Both functions accept a
'struct lock_object *' as their first parameter.  The 'lc_unlock' function
returns an integer that is then passed as the second paramter to the
subsequent 'lc_lock' function.  This can be used to communicate state.
For example, sx locks and rwlocks use this to indicate if the lock was
share/read locked vs exclusive/write locked.

Currently, spin mutexes and lockmgr locks do not provide working lc_lock
and lc_unlock functions.
2007-03-09 16:27:11 +00:00
Rong-En Fan
03c506ac80 Bump __FreeBSD_version for ncurses wide character support
Approved by:	delphij (mentor, implicit)
2007-03-09 12:12:55 +00:00
Mohan Srinivasan
f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
Julian Elischer
486a941418 Instead of doing comparisons using the pcpu area to see if
a thread is an idle thread, just see if it has the IDLETD
flag set. That flag will probably move to the pflags word
as it's permenent and never chenges for the life of the
system so it doesn't need locking.
2007-03-08 06:44:34 +00:00
John Baldwin
224a2f3144 Wrap a few lines at 80 cols. 2007-03-07 20:46:04 +00:00
Kirk McKusick
a9093e846d Move macros describing extended attributes in UFS from
<sys/extattr.h> to <ufs/ufs/extattr.h>. Move description
of extended attributes in UFS from man9/extattr.9 to
man5/fs.5.

Note that restore will not compile until <sys/extattr.h>
and <ufs/ufs/extattr.h> have been updated.

Suggested by:	Robert Watson
2007-03-06 08:13:21 +00:00
Florent Thoumie
7bd6fde395 - Add Intel firmwares for Intel PRO/Wireless LAN 2100/2200/2915 cards in a
uuencoded format along with their respective LICENSE files.
- Add new share/doc/legal directory to BSD.usr.dist mtree file. This is the
place we install LICENSE files for restricted firmwares.
- Teach firmware(9) and kmod.mk about licensed firmwares. Restricted firmwares
won't load properly unless legal.<name>.license_ack is set to 1, either
via kenv(1) or /boot/loader.conf.

Reviewed by:	mlaier, sam
Permitted by:	Intel (via Andrew Wilson)
MFC after:	1 month
2007-03-02 11:42:56 +00:00
Pawel Jakub Dawidek
bb531912ff Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by:	rwatson
2007-03-01 20:47:42 +00:00
Bruce M Simpson
a96ef0cd9a Introduce a new mbuf flag, M_PROMISC.
An mbuf packet chain with the M_PROMISC flag set contains a unicast packet
received by the link layer, which does not correspond to any configured
link layer address in the local system.

It is copied when copying m_pkthdr. It is not cleared when crossing layers.
As such, it is defined to have a flag value which is outside of the
M_PROTO* range, like M_VLANTAG has.

Reviewed by:	andre
Obtained from:	NetBSD
2007-03-01 14:38:08 +00:00
Thomas Quinot
f59aa46799 Minor reformatting. 2007-02-28 16:51:52 +00:00
Pawel Jakub Dawidek
1d1f5f8560 Add a comment for PRIV_NET_SETLLADDR.
OK'ed by:	rwatson
2007-02-27 23:38:58 +00:00
Kirk McKusick
34505cb376 KASSERT fails when the condition is false, not when it is true. 2007-02-27 07:34:28 +00:00
Kip Macy
f183910b97 Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
 - Move updates to the lock hash to after the lock is released for spin mutexes,
   sleep mutexes, and sx locks
 - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
   statistics when an acquisition is contended. This reduces the overhead of
   LOCK_PROFILING to increasing system time by 20%-25% which on
   "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
   of wall-clock time. Contrast this to enabling lock profiling without
   LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
2007-02-27 06:42:05 +00:00
Pawel Jakub Dawidek
3ad48efa4a Replace spaces with tabs in some places. 2007-02-27 01:48:58 +00:00
Robert Watson
e7c33e29ed Revise locking strategy used for UNIX domain sockets in order to improve
concurrency:

- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.

- Replace global UNP mutex with a global UNP rwlock, which will protect the
  UNIX domain socket connection topology, v_socket, and be acquired
  exclusively before acquiring more than per-unpcb at a time in order to
  avoid lock order issues.

In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.

Much testing by:	kris
Approved by:		re (kensmith)
2007-02-26 20:47:52 +00:00
Robert Watson
8525230afd Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not.  This is
non-ideal, but recursion scenarios in the network stack currently
require it.

Approved by:	jhb
2007-02-26 19:05:13 +00:00
Kip Macy
fe68a91631 general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
  someone familiar with that should review
2007-02-26 08:26:44 +00:00
Kirk McKusick
fdece2b8f7 Declare a `struct extattr' that defines the format of an extended
attribute. Also define some macros to manipulate one of these
structures. Explain their use in the extattr.9 manual page.

The next step will be to make a sweep through the kernel replacing
the old pointer manipulation code. To get an idea of how they would
be used, the ffs_findextattr() function in ufs/ffs/ffs_vnops.c is
currently written as follows:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eap, u_char **eac)
{
	u_char *p, *pe, *pn, *p0;
	int eapad1, eapad2, ealength, ealen, nlen;
	uint32_t ul;

	pe = ptr + length;
	nlen = strlen(name);

	for (p = ptr; p < pe; p = pn) {
		p0 = p;
		bcopy(p, &ul, sizeof(ul));
		pn = p + ul;
		/* make sure this entry is complete */
		if (pn > pe)
			break;
		p += sizeof(uint32_t);
		if (*p != nspace)
			continue;
		p++;
		eapad2 = *p++;
		if (*p != nlen)
			continue;
		p++;
		if (bcmp(p, name, nlen))
			continue;
		ealength = sizeof(uint32_t) + 3 + nlen;
		eapad1 = 8 - (ealength % 8);
		if (eapad1 == 8)
			eapad1 = 0;
		ealength += eapad1;
		ealen = ul - ealength - eapad2;
		p += nlen + eapad1;
		if (eap != NULL)
			*eap = p0;
		if (eac != NULL)
			*eac = p;
		return (ealen);
	}
	return(-1);
}

After applying the structure and macros, it would look like this:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eapp, u_char **eac)
{
	struct extattr *eap, *eaend;

	eaend = (struct extattr *)(ptr + length);
	for (eap = (struct extattr *)ptr; eap < eaend; eap = EXTATTR_NEXT(eap)){
		/* make sure this entry is complete */
		if (EXTATTR_NEXT(eap) > eaend)
			break;
		if (eap->ea_namespace != nspace ||
		    eap->ea_namelength != length ||
		    bcmp(eap->ea_name, name, length))
			continue;
		if (eapp != NULL)
			*eapp = eap;
		if (eac != NULL)
			*eac = EXTATTR_CONTENT(eap);
		return (EXTATTR_CONTENT_SIZE(eap));
	}
	return(-1);
}

Not only is it considerably shorter, but it hopefully more readable :-)
2007-02-26 06:18:53 +00:00
John Baldwin
37e80fcac2 Add a new kernel sleep function pause(9). pause(9) is for places that
want an equivalent of DELAY(9) that sleeps instead of spins.  It accepts
a wmesg and a timeout and is not interrupted by signals.  It uses a private
wait channel that should never be woken up by wakeup(9) or wakeup_one(9).

Glanced at by:	phk
2007-02-23 16:22:09 +00:00
Paolo Pisati
80b7fd0f47 Bump __FreeBSD_version after newbus api modification.
Reviewed by: re@
Approved by: re@
2007-02-23 16:21:08 +00:00
Paolo Pisati
ef544f6312 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
Robert Watson
7ee76f9d4e Remove unnecessary privilege and privilege check for WITNESS sysctl.
Head nod:	jhb
2007-02-20 23:49:31 +00:00
Robert Watson
3bb153ea78 Remove discontinuity in network privilege number space.
Spotted by:	emaste (ages ago)
2007-02-20 00:28:19 +00:00
Robert Watson
95420afea4 Remove unused PRIV_IPC_EXEC. Renumbers System V IPC privilege. 2007-02-20 00:12:52 +00:00
Robert Watson
95b091d2f2 Rename three quota privileges from the UFS privilege namespace to the
VFS privilege namespace: exceedquota, getquota, and setquota.  Leave
UFS-specific quota configuration privileges in the UFS name space.

This renumbers VFS and UFS privileges, so requires rebuilding modules
if you are using security policies aware of privilege identifiers.
This is likely no one at this point since none of the committed MAC
policies use the privilege checks.
2007-02-19 13:33:10 +00:00
Pawel Jakub Dawidek
2c7b0f41ec Remove VFS_VPTOFH entirely. API is already broken and it is good time to
do it.

Suggested by:	rwatson
2007-02-16 17:32:41 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
Luigi Rizzo
33d5497079 Cleanup and document the implementation of firmware(9) based on
a version that i posted earlier on the -current mailing list,
and subsequent feedback received.

The core of the change is just in sys/firmware.h and kern/subr_firmware.c,
while other files are just adaptation of the clients to the ABI change
(const-ification of some parameters and hiding of internal info,
so this is fully compatible at the binary level).

In detail:
- reduce the amount of information exported to clients in struct firmware,
  and constify the pointer;

- internally, document and simplify the implementation of the various
  functions, and make sure error conditions are dealt with properly.

The diffs are large, but the code is really straightforward now (i hope).

Note also that there is a subtle issue with the implementation of
firmware_register(): currently, as in the previous version, we just
store a reference to the 'imagename' argument, but we should rather
copy it because there is no guarantee that this is a static string.
I realised this while testing this code, but i prefer to fix it in
a later commit -- there is no regression with respect to the past.

Note, too, that the version in RELENG_6 has various bugs including
missing locks around the module release calls, mishandling of modules
loaded by /boot/loader, and so on, so an MFC is absolutely necessary
there.  I was just postponing it until this cleanup to avoid doing
things twice.

MFC after: 1 week
2007-02-15 17:21:31 +00:00
Colin Percival
9c518c234b Optimize bitcount32 by replacing 6 logical operations with 2. The key
observation here is that it doesn't matter what garbage accumulates in
bits which we're going to end up masking away anyway, as long as the
garbage doesn't overflow into bits which we care about.

This improved version may not be the fastest possible on all systems,
but it's certainly going to be better than what was here before.
2007-02-14 05:21:22 +00:00
Jeff Roberson
ed0e8f2fe9 - Change types for necent runq additions to u_char rather than int.
- Fix these types in ULE as well.  This fixes bugs in priority index
   calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.

Reported by:	kkenn using pho's stress2.
2007-02-08 01:52:25 +00:00
Marcel Moolenaar
1d3aed33e8 Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
2007-02-07 18:55:31 +00:00