Commit Graph

43 Commits

Author SHA1 Message Date
davide
210fb477ab Call make_dev_credf() rather than using the couple make_dev()/dev_ref().
This closes a race with clone_cleanup().
2013-03-09 16:58:19 +00:00
pjd
f07ebb8888 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
davide
793cdde76e Fix panic due to page faults while in kernel mode, under conditions of
VM pressure. The reason is that in some codepaths pointers to stack
variables were passed from one thread to another.

In collaboration with:	pho
Reported by:	pho's stress2 suite
Sponsored by:	iXsystems inc.
2012-10-31 03:34:07 +00:00
pjd
0ca632f7e9 When checking if file descriptor number is valid, explicitely check for 'fd'
being less than 0 instead of using cast-to-unsigned hack.

Today's commit was brought to you by the letters 'B', 'D' and 'E' :)
2012-06-13 22:12:10 +00:00
joel
e0860804e3 Switch to our preferred 2-clause BSD license.
Approved by:	bp
2010-04-07 16:50:38 +00:00
n_hibma
532c339625 Don't print out a message on loading a module. 'kldload -v' and 'kldstat
| grep <modname>' can be used instead.

Put a message behind bootverbose as

	ichwd0: <Intel ICH6M watchdog timer> on isa0
	ichwd0: Intel ICH6M watchdog timer (ICH6 or equivalent)

does not make a lot of sense.

MFC after:	1 week
2009-09-10 18:33:08 +00:00
rwatson
8b15685a63 Implement device cloning for /dev/nsmb, the netsmb control pseudo-device.
The smb library in userspace already knows how to deal with this type of
cloning.

This also corrects a leak in which the netsmb kernel module could not be
unloaded if device nodes had been stat'd but not open'd.

Discussed with:	kib
2008-11-03 14:23:15 +00:00
rwatson
bd5c63c83e Catch up with netsmb locking: explicit thread arguments no longer required. 2008-11-02 23:20:27 +00:00
ed
4efdef565f Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by:	kib
2008-09-27 08:51:18 +00:00
ed
4212d51a7d Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
avatar
c5a4c40ab2 Fixing the mount_smbfs(8) hanging by utilising the destroy_dev_sched() KPI.
Relevant threads:

  http://lists.freebsd.org/pipermail/freebsd-current/2007-June/074329.html

Reviewed by:	kib, bp (slightly different version)
Tested by:	Yuri Pankov <yuri.pankov at gmail dot com>,
		Jiawei Ye <leafy7382 at gmail dot com>
Approved by:	re (kensmith)
2007-07-10 09:23:10 +00:00
rwatson
765a83fd79 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
avatar
c9f2d4f91b Backing out the wrong fix which could possibly trash the memory if devfs
tries to drop the reference count after our close routine returns.

A more correct fix is to defer the destroy_dev() to a taskqueue(either
in devfs or locally).

Reminded by:	jhb
2007-02-09 17:22:10 +00:00
avatar
7f38d2a60b It turns out that devfs_close() does a dev_refthread() before invoking
device specific d_close(), which makes subsequent destroy_dev() being
blocked in the "devdrn" loop.

This bandaid should fix the smbfs hang/crashing observed on -CURRENT since
the introduction of sys/kern/kern_conf.c:1.199:

 	# mount_smbfs -I server //server/share /mnt
 	Password:
 	[hang]

Reviewed by:	bp
See also:	http://lists.freebsd.org/pipermail/cvs-src/2006-November/071379.html
2007-02-09 02:54:13 +00:00
bp
9032fdcdd0 Prevent module unloading if there are active connections.
PR:		kern/89085
Submitted by:	Rostislav Krasny
MFC after:	1 week
2005-11-22 02:15:46 +00:00
rwatson
daa1c89f45 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
phk
7af1e31761 Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.
2005-03-31 12:19:44 +00:00
imp
a50ffc2912 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
phk
28bd6b5898 Don't use vn_todev(). 2004-11-10 07:16:59 +00:00
phk
40dd98a3bd Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
phk
dfd1f7fd50 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
phk
f43aa0c4bc add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
phk
ad925439e0 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
tjr
28782c7efc Use automatic major number allocation for nsmb devices. 2004-02-11 12:49:49 +00:00
fjoe
571ef024e3 - Support for multibyte charsets in LIBICONV.
- CD9660_ICONV, NTFS_ICONV and MSDOSFS_ICONV kernel options
(with corresponding modules).
- kiconv(3) for loadable charset conversion tables support.

Submitted by:	Ryuichiro Imura <imura@ryu16.org>
2003-09-26 20:26:25 +00:00
phk
c81c59299b Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
obrien
8b64eb1925 Use __FBSDID(). 2003-06-11 05:37:42 +00:00
tjr
96122ae0b7 Remove fragments of support for the FreeBSD 3.x and 4.x branches. 2003-03-06 10:38:18 +00:00
phk
0ae911eb0e Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
tjr
5b9d37f7e4 Use noread(), nowrite() and nopoll() instead of our own stub functions. 2003-02-27 14:35:21 +00:00
phk
89fec4c203 NODEVFS cleanup: Don't call cdevsw_{add,remove}() 2003-02-26 21:04:51 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
tjr
d16e9e6edf Pass a minor number instead of a unit number to make_dev().
Devices with the wrong major were being created for units >255.
2003-02-07 23:29:57 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
dillon
ccd5574cc6 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
dillon
ddf9ef103e Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
alfred
927595101c When compiling the kernel do not implicitly include filedesc.h from proc.h,
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to thier own header
(sigio.h) which is included from proc.h for the time being.
2003-01-01 01:56:19 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
alfred
844237b396 SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
bp
0b6f0e4522 Pull netsmb requester from the pre-KSE world. This update mostly based
on the patches submitted by Max Khon <fjoe@iclub.nsu.ru>
2001-12-02 08:47:29 +00:00
bp
52c8a1f983 Remove unnecessary "#if __FreeBSD_version". 2001-08-21 08:58:02 +00:00
markm
bcca5847d5 Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
bp
a414f03f5d Import kernel part of SMB/CIFS requester.
Add smbfs(CIFS) filesystem.

Userland part will be in the ports tree for a while.

Obtained from:	smbfs-1.3.7-dev package.
2001-04-10 07:59:06 +00:00