Commit Graph

1450 Commits

Author SHA1 Message Date
Doug Rabson
dfdcada31e Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
John Baldwin
5c63b21a1a Regen. 2008-03-25 19:35:34 +00:00
John Baldwin
30c6422a8a Add entries for the cpuset-related system calls. The existing system calls
can be used on little endian systems.

Pointy hat to:	jeff
2008-03-25 19:34:47 +00:00
Ruslan Ermilov
d7a38db650 Fix build.
Reported by:	ache, tinderbox
2008-03-25 13:20:52 +00:00
Roman Divacky
6af821237d o Add stub support for some new futex operations,
so the annoying message is not printed.

	o	Don't warn about FUTEX_FD not being implemented
		and return ENOSYS instead of 0 (eg. success).

	o	Clear FUTEX_PRIVATE_FLAG as we actually implement
		only private futexes so there is no reason to
		return ENOSYS when app asks for a private futex.
		We don't reject shared futexes because they worked
		just fine with our implementation so far.

Approved by:	kib (mentor)
Tested by:	bsam
MFC after:	1 week
2008-03-20 17:03:55 +00:00
Antoine Brodin
afe5acff1b Simplify fcntl(SVR4_F_DUP2FD) code now that FreeBSD has F_DUP2FD.
Approved by:	rwatson (mentor)
2008-03-17 18:27:28 +00:00
Roman Divacky
5dfb688191 Implement sched_setaffinity and get_setaffinity using
real cpu affinity setting primitives.

Reviewed by:	jeff
Approved by:	kib (mentor)
2008-03-16 16:27:44 +00:00
Jeff Roberson
66257bc8d9 - The P_SA flag has been removed. Don't reference it in a KASSERT. 2008-03-12 22:17:06 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Konstantin Belousov
a0b0d286bc Return ENOSYS instead of 0 for the unknown futex operations.
Submitted by: rdivacky
Reported and tested by: Gary Stanley <gary velocity-servers net>
2008-03-02 14:00:50 +00:00
Konstantin Belousov
cbd2c621f8 Sanitize arguments to linux_mremap().
Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified.
Check for the page alignment of the addr argument.

Submitted by:	rdivacky
MFC after:	1 week
2008-02-22 11:47:56 +00:00
Ruslan Ermilov
b95bd24d29 Regenerate for readlink(2). 2008-02-12 20:11:54 +00:00
Ruslan Ermilov
5f56182b6f Change readlink(2)'s return type and type of the last argument
to match POSIX.

Prodded by:	Alexey Lyashkov
2008-02-12 20:09:04 +00:00
Poul-Henning Kamp
cf827063a9 Give MEXTADD() another argument to make both void pointers to the
free function controlable, instead of passing the KVA of the buffer
storage as the first argument.

Fix all conventional users of the API to pass the KVA of the buffer
as the first argument, to make this a no-op commit.

Likely break the only non-convetional user of the API, after informing
the relevant committer.

Update the mbuf(9) manual page, which was already out of sync on
this point.

Bump __FreeBSD_version to 800016 as there is no way to tell how
many arguments a CPP macro needs any other way.

This paves the way for giving sendfile(9) a way to wait for the
passed storage to have been accessed before returning.

This does not affect the memory layout or size of mbufs.

Parental oversight by:	sam and rwatson.

No MFC is anticipated.
2008-02-01 19:36:27 +00:00
Pawel Jakub Dawidek
44ce1efd91 Change type of kmem_used() and kmem_size() functions to uint64_t, so it
doesn't overflow in arc.c in this check:

	if (kmem_used() > (kmem_size() * 4) / 5)
		return (1);

With this bug ZFS almost doesn't cache.

Only 32bit machines are affected that have vm.kmem_size set to values >=1GB.

Reported by:	David Taylor <davidt@yadt.co.uk>
2008-01-24 11:21:54 +00:00
Robert Watson
20c6fe828a Regenerate. 2008-01-20 23:44:24 +00:00
Robert Watson
6c902059f2 Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls
shm_open() and shm_unlink().  More auditing will need to be done for
these calls to capture arguments properly.
2008-01-20 23:43:06 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
John Baldwin
4ad6d200d6 Regen for shm_open(2) and shm_unlink(2). 2008-01-08 22:01:26 +00:00
John Baldwin
8e38aeff17 Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
  object which provides the backing store.  Each descriptor starts off with
  a size of zero, but the size can be altered via ftruncate(2).  The shared
  memory file descriptors also support fstat(2).  read(2), write(2),
  ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
  memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
  manage shared memory file descriptors.  The virtual namespace that maps
  pathnames to shared memory file descriptors is implemented as a hash
  table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
  of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
  path argument to shm_open(2).  In this case, an unnamed shared memory
  file descriptor will be created similar to the IPC_PRIVATE key for
  shmget(2).  Note that the shared memory object can still be shared among
  processes by sharing the file descriptor via fork(2) or sendmsg(2), but
  it is unnamed.  This effectively serves to implement the getmemfd() idea
  bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
  collected when they are not referenced by any open file descriptors or
  the shm_open(2) virtual namespace.

Submitted by:	dillon, peter (previous versions)
Submitted by:	rwatson (I based this on his version)
Reviewed by:	alc (suggested converting getmemfd() to shm_open())
2008-01-08 21:58:16 +00:00
Konstantin Belousov
d075105da0 After applying LCONVPATH() to the path, do use the converted path
instead of original user-mode string in the linux_stat() and
linux_lstat() syscalls.

Tested by:	Peter Holm
MFC after:	3 days
2008-01-05 12:36:35 +00:00
Jeff Roberson
397c19d175 Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
   in such a way that the ops vector is only valid after the data, type,
   and flags are valid.
 - Protect f_flag and f_count with atomic operations.
 - Remove the global list of all files and associated accounting.
 - Rewrite the unp garbage collection such that it no longer requires
   the global list of all files and instead uses a list of all unp sockets.
 - Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by:	kris, pho
2007-12-30 01:42:15 +00:00
Konstantin Belousov
93eba2d50d Plug the leaks in the present (hopefully, soon to be replaced)
implementation of the linux_openat() for the quick MFC.

Reported and tested by: Peter Holm
MFC after:      3 days
2007-12-29 14:28:01 +00:00
Konstantin Belousov
15b78ac5d1 Apply the LCONVPATH() to the (old) linux_stat() and linux_lstat() syscalls.
Without it, code has two problems:
- behaviour of the old and new [l]stat are different with regard of
  the /compat/linux
- directly accessing the userspace data from the kernel asks for
  the panics.

Reported and tested by:	Peter Holm
Reviewed by:	rdivacky
MFC after:	3 days
2007-12-29 14:25:29 +00:00
Robert Watson
3de213cc00 Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
John Baldwin
0a63574164 Bah, remove last vestiges of some statfs conversion fixes that aren't quite
ready for CVS yet that snuck into 1.68.

Pointy hat to:	jhb
2007-12-10 19:42:23 +00:00
Scott Long
d637500d06 Grrr, remove an unused variable missed in the last commit. 2007-12-08 01:41:31 +00:00
Scott Long
7815c9e2db Don't expect a return value from statfs_scale_blocks(). 2007-12-07 22:32:09 +00:00
John Baldwin
8120bb7e3a Regen. 2007-12-06 23:37:26 +00:00
John Baldwin
695e8d536c Add freebsd32 compat wrappers for msgctl() and __semctl() using
kern_msgctl() and kern_semctl().

MFC after:	1 week
2007-12-06 23:36:57 +00:00
John Baldwin
3c39e0d8d4 Add freebsd32 compat wrappers for msgctl() and _semctl() using
kern_msgctl() and kern_semctl().

MFC after:	1 week
2007-12-06 23:35:29 +00:00
John Baldwin
d43c6fa4fe Move 32-bit SYSV IPC structure definitions into freebsd32_ipc.h.
MFC after:	1 week
2007-12-06 23:23:16 +00:00
John Baldwin
74427aa423 Move several data structure definitions out of freebsd32_misc.c and into
freebsd32.h instead.

MFC after:	1 week
2007-12-06 23:11:27 +00:00
Jung-uk Kim
959a913b87 Remove redundant checks for msgsnd(3) and msgrcv(3).
COMPAT_IA32 (implicitly) requires SYSVSEM, SYSVSHM and SYSVMSG in kernel.

Pointed out by:	jhb
2007-12-04 20:25:41 +00:00
Andrew Thompson
ac740aebcf Implement functions required by some ndis drivers.
NdisIMCopySendPerPacketInfo [1]
 KeQuerySystemTime [1]
 KeTickCount [1]
 strncat [1]
 KeBugCheckEx

Submitted by:	Marcin Simonides [1]
2007-12-03 23:43:58 +00:00
Andrew Thompson
e880149eb9 Correct the calculation for the number of 100ns intervals since
January 1, 1601. The 1601 - 1970 period was in seconds rather than 100ns
units.

Remove duplication by having NdisGetCurrentSystemTime call ntoskrnl_time.
2007-12-02 08:54:50 +00:00
Andrew Thompson
f3ad39ccf5 Correct the nwbx_ies field type in struct ndis_wlan_bssid_ex.
PR:		kern/118369
Submitted by:	Weongyo Jeong
2007-12-02 04:04:42 +00:00
Peter Wemm
7628402b07 Move the shared cp_time array (counts %sys, %user, %idle etc) to the
per-cpu area.  cp_time[] goes away and a new function creates a merged
cp_time-like array for things like linprocfs, sysctl etc.  The
atomic ops for updating cp_time[] in statclock go away, and the scope
of the thread lock is reduced.

sysctl kern.cp_time returns a backwards compatible cp_time[] array.
A new kern.cp_times sysctl returns the individual per-cpu stats.

I have pending changes to make top and vmstat optionally show per-cpu
stats.

I'm very aware that there are something like 5 or 6 other versions "out
there" for doing this - but none were handy when I needed them.

I did merge my changes with John Baldwin's, and ended up replacing a
few chunks of my stuff with his, and stealing some other code.

Reviewed by:  jhb
Partly obtained from:  jhb
2007-11-29 06:34:30 +00:00
John Birrell
35a04710d7 Remove some compatibility stuff that we now get from the Solaris header. 2007-11-29 00:15:08 +00:00
John Birrell
57438287ab Add more OpenSolaris compatibility headers. 2007-11-28 21:50:40 +00:00
John Birrell
eca148b637 Remove an extern that is defined elsewhere. 2007-11-28 21:50:05 +00:00
John Birrell
edadde229a Add compatibility cruft moved from under _SOLARIS_C_SOURCE in sys/types.h 2007-11-28 21:49:16 +00:00
John Birrell
35ba7f225f Remove a typedef which was just a hack to avoid including vmem.h.
That typedef breaks other Solaris code.
2007-11-28 21:48:25 +00:00
John Birrell
773f4e3849 Add a missing volatile so that the code compiles cleanly. 2007-11-28 21:47:09 +00:00
John Birrell
4fc8feafc7 Rename the definition of lbolt to LBOLT to avoid a clash with a global
variable in FreeBSD. Until now lbolt in sys/proc.h has been #ifdef'ed
out based on _SOLARIS_C_SOURCE, but that is going away now.
2007-11-28 21:44:17 +00:00
Konstantin Belousov
d60f0a3d6a Implement LINUX_SIOCGIFCOUNT and LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX.
LINUX_SIOCGIFCOUNT just returns 0 since it is not implemented in the
Linux 2.6.16.

LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX are mapped to the FreeBSD native
SIOCGIFINDEX.

Tested by:	Peter Kostouros <kpeter@melbpc.org.au>
Reviewed by:	brooks, rpaulo (on net@)
Submitted by:	rdivacky
MFC after:	1 week
2007-11-07 16:42:52 +00:00
Pawel Jakub Dawidek
171eb887e9 Remove "zfs:" prefix from lock and condvar names and also skip non-letter
characters (mostly "&"). Because top(1) shows only first six characters of
wait channel, without this change we saw only one meaningful character.

Requested by:	kris & others
MFC after:	1 week
2007-11-05 18:40:55 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
Pawel Jakub Dawidek
4f2398ea17 - Move crfree() outside MNT_ILOCK()/MNT_IUNLOCK() to eliminate a LOR:
1st 0xc4cea568 struct mount mtx (struct mount mtx) @ /usr/src/sys/modules/zfs/../../compat/opensolaris/kern/opensolaris_vfs.c:209
  2nd 0xc3ee9010 sleep mtxpool (sleep mtxpool) @ /usr/src/sys/kern/kern_resource.c:1266
- Move crdup() outside MNT_ILOCK()/MNT_IUNLOCK(), as it can sleep.

Reported by:	Olli Hauer <ohauer@gmx.de>
MFC after:	3 days
2007-11-01 08:58:29 +00:00