Commit Graph

114 Commits

Author SHA1 Message Date
jeff
4ec9caf00c Refactor select to reduce contention and hide internal implementation
details from consumers.

 - Track individual selecters on a per-descriptor basis such that there
   are no longer collisions and after sleeping for events only those
   descriptors which triggered events must be rescaned.
 - Protect the selinfo (per descriptor) structure with a mtx pool mutex.
   mtx pool mutexes were chosen to preserve api compatibility with
   existing code which does nothing but bzero() to setup selinfo
   structures.
 - Use a per-thread wait channel rather than a global wait channel.
 - Hide select implementation details in a seltd structure which is
   opaque to the rest of the kernel.
 - Provide a 'selsocket' interface for those kernel consumers who wish to
   select on a socket when they have no fd so they no longer have to
   be aware of select implementation details.

Tested by:	kris
Reviewed on:	arch
2007-12-16 06:21:20 +00:00
julian
51d643caa6 Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
avatar
c5a4c40ab2 Fixing the mount_smbfs(8) hanging by utilising the destroy_dev_sched() KPI.
Relevant threads:

  http://lists.freebsd.org/pipermail/freebsd-current/2007-June/074329.html

Reviewed by:	kib, bp (slightly different version)
Tested by:	Yuri Pankov <yuri.pankov at gmail dot com>,
		Jiawei Ye <leafy7382 at gmail dot com>
Approved by:	re (kensmith)
2007-07-10 09:23:10 +00:00
mjacob
2e4e9b90cb Initialize some variables that GCC4.2 thinks might possibly be used without
being initialized.
2007-06-15 23:49:54 +00:00
jeff
91d1501790 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
rwatson
765a83fd79 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
avatar
c9f2d4f91b Backing out the wrong fix which could possibly trash the memory if devfs
tries to drop the reference count after our close routine returns.

A more correct fix is to defer the destroy_dev() to a taskqueue(either
in devfs or locally).

Reminded by:	jhb
2007-02-09 17:22:10 +00:00
avatar
7f38d2a60b It turns out that devfs_close() does a dev_refthread() before invoking
device specific d_close(), which makes subsequent destroy_dev() being
blocked in the "devdrn" loop.

This bandaid should fix the smbfs hang/crashing observed on -CURRENT since
the introduction of sys/kern/kern_conf.c:1.199:

 	# mount_smbfs -I server //server/share /mnt
 	Password:
 	[hang]

Reviewed by:	bp
See also:	http://lists.freebsd.org/pipermail/cvs-src/2006-November/071379.html
2007-02-09 02:54:13 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
bp
499769a943 It seems to be safe to ignore 'file not locked' error
from server.  This effectively suppresses 'Unmapped error 1:158'.

MFC after:	1 month
2006-11-05 06:31:08 +00:00
marcel
6723c51456 Fix misalignment bugs caused by invalid type casts of pointers
returned by md_reserve(). Space reserved by mb_reserve() is
byte aligned and need to be used in conjunction with le16enc()
and le32enc().

Tested on: ia64
2006-08-22 03:05:51 +00:00
jhb
0daea30185 - Fix ncp_poll() to not panic if the socket doesn't have any pending data.
We have to adjust curthread's state enough so that it appears to be
  in a poll(2) or select(2) call so that selrecord() will work and then
  teardown that state after calling sopoll().
- Fix some minor nits in nearby ncp_sock_rselect() and in the identical
  nbssn_rselect() function in the netsmb code:
  - Don't call nb_poll()/ncp_poll() now that ncp_poll() already fakes up
    poll(2) state since the rselect() functions already do that.  Just
    invoke sopoll() directly.
  - To make things slightly more intuitive, store the results of sopoll()
    in a new 'revents' variable rather than 'error' since that's what
    sopoll() actually returns.
  - If the requested timeout time has been exceeded by the time we get
    ready to block, then return EWOULDBLOCK rather than 0 to signal a
    timeout as this is what the calling code expects.

Tested by:	Eric Christeson <eric.j.christeson AT gmail> (1)
MFC after:	1 week
2006-08-03 15:31:52 +00:00
rwatson
40868fda8a soreceive_generic(), and sopoll_generic(). Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used univerally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
2006-07-24 15:20:08 +00:00
jhb
5d50738911 Always lock the lockmgr lock when creating an smb connection object rather
than only locking it if INVARIANTS is enabled.  All the callers expect
smb_co_init() to return with the lock held.

Tested by:	"Jiawei Ye" <leafy7382 at gmail>
2006-07-17 16:12:59 +00:00
yar
66715ad5a3 Retire NETSMBCRYPTO as a kernel option and make its functionality
enabled by default in NETSMB and smbfs.ko.

With the most of modern SMB providers requiring encryption by
default, there is little sense left in keeping the crypto part
of NETSMB optional at the build time.

This will also return smbfs.ko to its former properties users
are rather accustomed to.

Discussed with:		freebsd-stable, re (scottl)
Not objected by:	bp, tjr (silence)
MFC after:		5 days
2006-03-05 22:52:17 +00:00
csjp
be2af71ad1 Although we check the return value of copyin(9) while determaining how
long the string is in userspace, afterwards we call malloc(M_WAITOK),
which could sleep for an unknown amount of time. Check the return
value of copyin(9) just to be sure that nothing has changed during that
time.

Found with:	Coverity Prevent (tm)
MFC after:	1 week
2006-01-16 17:03:21 +00:00
bp
9032fdcdd0 Prevent module unloading if there are active connections.
PR:		kern/89085
Submitted by:	Rostislav Krasny
MFC after:	1 week
2005-11-22 02:15:46 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
bp
0d80e85872 Allow user to override default port numbers used by communication
protocols.  This is very useful for tunneled SMB connections.

MFC after:	4 weeks
2005-10-02 08:32:49 +00:00
rwatson
daa1c89f45 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
imura
3c148b71eb Change API of mb_copy_t in libmchain so that netsmb can handle
multibyte character share name correctly.

Reviewed by:	bp
2005-07-29 13:22:37 +00:00
peadar
67a392cd25 lockmgr(...,LK_DRAIN,...) requires a balancing LK_RELEASE: recent
INVARIANTS dependent checks in userret() pinpointed a missing
invocation here.

Remove an unused variable while here.

Reviewed By: bp@
Reported By: yongari@
MFC After: 3 days
2005-05-13 11:27:48 +00:00
phk
7af1e31761 Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.
2005-03-31 12:19:44 +00:00
imp
a50ffc2912 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
phk
28bd6b5898 Don't use vn_todev(). 2004-11-10 07:16:59 +00:00
kan
3140931e1f Avoid casts as lvalues. 2004-07-28 06:59:55 +00:00
rwatson
855c4bb01f Merge additional socket buffer locking from rwatson_netperf:
- Lock down low hanging fruit use of sb_flags with socket buffer
  lock.

- Lock down low hanging fruit use of so_state with socket lock.

- Lock down low hanging fruit use of so_options.

- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
  socket buffer lock.

- Annotate situations in which we unlock the socket lock and then
  grab the receive socket buffer lock, which are currently actually
  the same lock.  Depending on how we want to play our cards, we
  may want to coallesce these lock uses to reduce overhead.

- Convert a if()->panic() into a KASSERT relating to so_state in
  soaccept().

- Remove a number of splnet()/splx() references.

More complex merging of socket and socket buffer locking to
follow.
2004-06-17 22:48:11 +00:00
phk
40dd98a3bd Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
phk
dfd1f7fd50 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
rwatson
f2c0db1521 The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality.  This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties.  The
following fields are moved to sb_state:

  SS_CANTRCVMORE            (so_state)
  SS_CANTSENDMORE           (so_state)
  SS_RCVATMARK              (so_state)

Rename respectively to:

  SBS_CANTRCVMORE           (so_rcv.sb_state)
  SBS_CANTSENDMORE          (so_snd.sb_state)
  SBS_RCVATMARK             (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts).  In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
2004-06-14 18:16:22 +00:00
phk
f43aa0c4bc add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
rwatson
b0b5f961bd Rename dup_sockaddr() to sodupsockaddr() for consistency with other
functions in kern_socket.c.

Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".

Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.

Submitted by:	sam
2004-03-01 03:14:23 +00:00
truckman
1de257deb3 Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
phk
ad925439e0 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
tjr
28782c7efc Use automatic major number allocation for nsmb devices. 2004-02-11 12:49:49 +00:00
tjr
622d036645 Add support for SMB request signing, which prevents "man in the middle"
attacks and is required to connect to Windows 2003 servers in their
default configuration. This adds an extra field to the SMB header
containing the truncated 64-bit MD5 digest of a key (a function of the
user's password and the server's authentication challenge), an implicit
sequence number, and the message data itself. As signing each message
imposes a significant performance penalty, we only enable it if the
server will not let us connect without it; this should eventually become
an option to mount_smbfs.
2004-01-02 22:38:42 +00:00
fjoe
571ef024e3 - Support for multibyte charsets in LIBICONV.
- CD9660_ICONV, NTFS_ICONV and MSDOSFS_ICONV kernel options
(with corresponding modules).
- kiconv(3) for loadable charset conversion tables support.

Submitted by:	Ryuichiro Imura <imura@ryu16.org>
2003-09-26 20:26:25 +00:00
marcel
5d4c069fc5 Rewrite the code that uses the try/catch paradigm implemented by
goto and abstracted by the itry, ithrow and icatch macros (among
others). The problem with this code is that it doesn't compile on
ia64. The compiler is sufficiently confused that it inserts a call
to __ia64_save_stack_nonlock(). This is a magic function that saves
enough of the stack to allow for non-local gotos, such as would be
the case for nested functions. Since it's not a compiler defined
function, it needs a runtime implementation. This we have not in a
standalone compilation as is the kernel.

There's no indication that the compiler is not confused on other
platforms. It's likely that saving the stack in those cases is
trivial enough that the compiler doesn't need to off-load the
complexity to a runtime function.

The code is believed to be correctly translated, but has not been
tested. The overall structure remained the same, except that it's
made explicit. The macros that implement the try/catch construct
have been removed to avoid reintroduction of their use. It's not
a good idea.

In general the rewritten code is slightly more optimal in that it
doesn't need as much stack space and generally is smaller in size.

Found by: LINT
2003-08-23 21:43:33 +00:00
tjr
e8459951b2 Reserve space for the trailing null byte in the srvname member of
struct smb_vc_info.

PR:		46902
2003-07-27 11:36:00 +00:00
peter
44b5ea3111 size_t != int. Make this compile on 64 bit platforms (eg: amd64).
Also, "u_short value; if (value > 0xffff)" can never be true.
2003-07-24 01:59:18 +00:00
phk
c81c59299b Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
tjr
5155733cbd Avoid dereferencing the thread pointer in smb_iod_addrq() if it's NULL.
Fixes mdconfig -t vnode on smbfs: mdsetcred()'s "horrible kludge"
calls into smbfs VOP_READ with a NULL uio_td.
2003-06-14 15:45:34 +00:00
obrien
8b64eb1925 Use __FBSDID(). 2003-06-11 05:37:42 +00:00
jhb
89a4eb17de - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
jeff
46e6ba39f1 - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
tjr
96122ae0b7 Remove fragments of support for the FreeBSD 3.x and 4.x branches. 2003-03-06 10:38:18 +00:00
peter
fbc7526e8f Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.
2003-03-05 19:24:24 +00:00
phk
0ae911eb0e Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
tjr
5b9d37f7e4 Use noread(), nowrite() and nopoll() instead of our own stub functions. 2003-02-27 14:35:21 +00:00
phk
89fec4c203 NODEVFS cleanup: Don't call cdevsw_{add,remove}() 2003-02-26 21:04:51 +00:00