and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.
The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.
To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.
Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.
The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.
Sponsored by: Isilon Systems
MFC after: 1 month
usable for newer CPUs. The new value allows 2 x quad core configuration
dumps to fit within the initial buffer without reallocations.
Approved by: gnn (mentor) (older version)
Pointed out by: rdivacky
- Implement real draining for vfs consumers by not relying on the
mnt_lock and using instead a refcount in order to keep track of lock
requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
Change so its KPI by removing the interlock argument and defining 2 new
flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
old version (which was unlinked from the lockmgr alredy) and
MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
mnt_ref if running for more than 3 seconds, make it totally useless.
Remove it as it was thought to work into older versions.
If a problem of "refcount held never going away" should appear, we will
need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
leaving the MNTK_MWAIT flag on even if it the waiters were actually
woken up. Just a place in vfs_mount_destroy() is left because it is
going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.
This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.
Discussed with: kib
Tested by: pho
We often run into these very high column numbers when we run curses
applications, because they don't print any newlines. This messes up the
table output of `pstat -t'. If these numbers get really high, they
aren't of any use to the reader anyway. Convert them to `99999' when
they run out of bounds.
One of the pieces of code that I had left alone during the development
of the MPSAFE TTY layer, was tty_cons.c. This file actually has two
different functions:
- It contains low-level console input/output routines (cnputc(), etc).
- It creates /dev/console and wraps all its cdevsw calls to the
appropriate TTY.
This commit reimplements the second set of functions by moving it
directly into the TTY layer. /dev/console is now a character device node
that's basically a regular TTY, but does a lookup of `si_drv1' each time
you open it. d_write has also been changed to call log_console().
d_close() is not present, because we must make sure we don't revoke the
TTY after writing a log message to it.
Even though I'm not convinced this is in line with the future directions
of our console code, it is a good move for now. It removes recursive
locking from the top half of the TTY layer. The previous implementation
called into the TTY layer with Giant held.
I'm renaming tty_cons.c to kern_cons.c now. The code hardly contains any
TTY related bits, so we'd better give it a less misleading name.
Tested by: Andrzej Tobola <ato iem pw edu pl>,
Carlos A.M. dos Santos <unixmania gmail com>,
Eygene Ryabinkin <rea-fbsd codelabs ru>
within an object that a mapping refers to. fileid and fsid are inode/dev
for vnodes. (Linux procfs has these and valgrind is really unhappy
without them.) I believe I didn't change the size of the struct.
dump of detected ULE CPU topology. This dump can be used to check the
topology detection and for general system information.
An example of CPU topology dump is:
kern.sched.topology_spec: <groups>
<group level="1" cache-level="0">
<cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
<flags></flags>
<children>
<group level="2" cache-level="0">
<cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
<flags></flags>
</group>
<group level="2" cache-level="0">
<cpu count="4" mask="0xf0">4, 5, 6, 7</cpu>
<flags></flags>
</group>
</children>
</group>
</groups>
Reviewed by: jeff
Approved by: gnn (mentor)
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
vpollinfo with vnode interlock. Fully initialize vpollinfo before putting
pointer to it into vp->v_pollinfo.
Discussed with: dwhite
Tested by: pho
MFC after: 1 week
that they operate directly on credentials: mac_proc_create_swapper(),
mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies.
Obtained from: TrustedBSD Project
"ticks" goes negative. This breaks the signed comparison in softclock.
This causes sleep() to never wake up, tcp to stop, etc etc. This is
bad(TM). Use the SEQ_LT() method from tcp's sequence number comparisons.
Due to the nature of the beast it causes lot of unproductive overhead. This
is especially bad when running SMP kernel on VMWare with several virtual
processors - idle FreeBSD guest with SMP kernel takes 150% host CPU time on my
dual-core MacBook Pro when I am enabling two virtual CPUs, making even host
not very usable. Detect when we are running in the sandbox and reduce HZ
to 10 (can be adjusted via VM_HZ in the kernel config) in such cases. This
brings host CPU usage of idle FreeBSD/SMP on two virtual processors down
to 10%.
Detect most popular VM platforms out there - VMWare, Parallels, VirtualBox
and VirtualPC.
MFC after: 2 weeks
when thread is in kernel mode, it can cause dead loop, now unlock
process lock after acquired sleep queue lock and thread lock to
avoid the problem. This means TDF_NEEDSIGCHK and TDF_NEEDSUSPCHK must
be set with process lock and thread lock being hold at same time.
unnecessary, the normal process lock and thread lock are enough. The
spin lock is still needed for process and thread exiting to mimic
single sched_lock.
rest in kern_getdirentries(). Use kern_getdirentries() to implement
freebsd32_getdirentries(). This fixes a bug where calls to getdirentries()
in 32-bit binaries would trash the 4 bytes after the 'long base' in
userland.
Submitted by: ups
MFC after: 1 week
- If there aren't spinlocks held, but there are problems with old
sleeplocks, they are not reported.
- If the spinlock found is not the only one, problems are not reported.
Fix these 2 problems.
Reported by: tegge
and ffs_lock. This cannot catch situations where holdcnt is incremented
not by curthread, but I think it is useful.
Reviewed by: tegge, attilio
Tested by: pho
MFC after: 2 weeks
MNTK_UNMOUNT before, and mnt_mtx is used as interlock. vfs_busy() always
tries to obtain a shared lock on mnt_lock, the other user is unmount who
tries to drain it, setting MNTK_UNMOUNT before.
Reviewed by: tegge, attilio
Tested by: pho
MFC after: 2 weeks
realtimer_expire() to not rearm the timer, otherwise there is a chance
that a callout will be left there and be tiggered in future unexpectly.
Bug reported by: tegge@
not the string formatted at the time of CTRX() call. Stack_ktr(9) uses
an on-stack buffer for the symbol name, that is supplied as an argument
to ktr. As result, stack_ktr() traces show garbage or cause page faults.
Fix stack_ktr() by using pointer to module symbol table that is supposed
to have a longer lifetime.
Tested by: pho
MFC after: 1 week
credentials from inp_cred which is also available after the
socket is gone.
Switch cr_canseesocket consumers to cr_canseeinpcb.
This removes an extra acquisition of the socket lock.
Reviewed by: rwatson
MFC after: 3 months (set timer; decide then)
PCPU_PTR() curthread can migrate on another CPU and get incorrect
results.
- Fix a similar race into witness_warn().
- Fix the interlock's checks bypassing by correctly using the appropriate
children even when the lock_list chunk to be explored is not the first
one.
- Allow witness_warn() to work with spinlocks too.
Bugs found by: tegge
Submitted by: jhb, tegge
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
- Change the ddb(4) commands to be more useful (by thompsa@):
- `show ttys' is now called `show all ttys'. This command will now
also display the address where the TTY data structure resides.
- Add `show tty <addr>', which dumps the TTY in a readable form.
- Place an upper bound on the TTY buffer sizes. Some drivers do not want
to care about baud rates. Protect these drivers by preventing the TTY
buffers from getting enormous. Right now we'll just clamp it to 64K,
which is pretty high, taking into account that these buffers are only
used by the built-in discipline.
- Only call ttydev_leave() when needed. Back in April/May the TTY
reference counting mechanism was a little different, which required us
to call ttydev_leave() each time we finished a cdev operation.
Nowadays we only need to call ttydev_leave() when we really mark it as
being closed.
- Improve return codes of read() and write() on TTY device nodes.
- Make sure we really wake up all blocked threads when the driver calls
tty_rel_gone(). There were some possible code paths where we didn't
properly wake up any readers/writers.
- Add extra assertions to prevent sleeping on a TTY that has been
abandoned by the driver.
- Use ttydev_cdevsw as a more reliable method to figure out whether a
device node is a real TTY device node.
Obtained from: //depot/projects/mpsafetty/...
Reviewed by: thompsa
this eliminates some problems of locking, e.g, a thread lock is needed
but can not be used at that time. Only the process lock is needed now
for new field.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()
and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.
As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP
Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
designed drivers would never hit, but was exposed in diving into
another problem...
When expanding the devclass array, free the old memory after updating
the pointer to the new memory. For the following single race case,
this helps:
allocate new memory
copy to new memory
free old memory
<interrupt> read pointer to freed memory
update pointer to new memory
Now we do
allocate new memory
copy to new memory
update pointer to new memory
free old memory
Which closes this problem, but doesn't even begin to address the
multicpu races, which all should be covered by Giant at the moment,
but likely aren't completely.
Note: reviewers were ok with this fix, but suggested the use case
wasn't one we wanted to encourage.
Reviewed by: jhb, scottl.
have_interp to TRUE. This allows the code in image activator to try
/libexec/ld-elf.so.1 as interpreter when newinterp is not found to
execute.
Reviewed by: peter
MFC after: 2 weeks (together with r175105)