This bring huge amount of changes, I'll enumerate only user-visible changes:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.
- slog
Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support to boot off of ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Before if write requested failed, system paniced. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris
"A function can be preceded by one or more '!' characters, in which
case the function shall be applied if the addresses do not select
the pattern space."
from one parent directory to another, in addition to the usual access checks
one also needs write access to the subdirectory being moved.
Approved by: rwatson (mentor), pjd
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.
The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.
To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.
Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.
The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.
Sponsored by: Isilon Systems
MFC after: 1 month
it relies on non-portable flock(2) semantics. Not only is flock(2) not
portable, but on some OSes that do have it, it is implemented in terms
of fcntl(2) locks, which are per-process rather than per-descriptor.
will cause it to return 0, not EAGAIN.
Add UNIX domain socket support to udpzerobyte, which suggests this
regression test should be moved to the general sockets test area rather
than netinet.
once it is lost, all data is gone.
Option '-B none' can by used to prevent backup. Option '-B path' can be
used to backup metadata to a different file than the default, which is
/var/backups/<prov>.eli.
The 'geli init' command also prints backup file location and gives short
procedure how to restore metadata.
The 'geli setkey' command now warns that even after passphrase change or keys
update there could be version of the master key encrypted with old
keys/passphrase in the backup file.
Add regression tests to verify that new functionality works as expected.
Update other regression tests so they don't create backup files.
Reviewed by: keramida, rink
Dedicated to: a friend who lost 400GB of his live by accidentally overwritting geli metadata
MFC after: 2 weeks
the first value (environ[0]) to NULL. This is in addition to the
current detection of environ being replaced, which includes being set to
NULL. Without this fix, the environment is not truly wiped, but appears
to be by getenv() until an *env() call is made to alter the enviroment.
This change is necessary to support those applications that use this
method for clearing environ such as Dovecot and Postfix. Applications
such as Sendmail and the base system's env replace environ (already
detected). While neither of these methods are defined by SUSv3, it is
best to support them due to historic reasons and in lieu of a clean,
defined method.
Add extra units tests for clearing environ using four different methods:
1. Set environ to NULL pointer.
2. Set environ[0] to NULL pointer.
3. Set environ to calloc()'d NULL-terminated array.
4. Set environ to static NULL-terminated array.
Noticed by: Timo Sirainen
MFC after: 3 days
the default ICMPv6 filter is pass all, test that we can set it to block
all and restore to pass all. No attempt is made to test that the
filtering works, just that we can get and set it.
I wrote these to test amd64 asm functions that used
maxss, maxsd, minss, and minsd, but it turns out that
those instructions don't handle NaNs and signed zero
in the same way as fmin() and fmax() are required to,
so we're stuck with the C versions for now.
The first test comes from OpenBSD, and the others are additions or
adaptations.
This is based on OpenBSD's
src/regress/lib/libc/sprintf/sprintf_test.c, v1.3.
I deliberately did not use v1.4 because it's bogus.
semaphores. Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.
Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem' of an unnamed semaphore (created via sem_init)) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the proces
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.
XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.
Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores nearly in a similar fashion to the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.
Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
provides the correct semantics for flock(2) style locks which are used by the
lockf(1) command line tool and the pidfile(3) library. It also implements
recovery from server restarts and ensures that dirty cache blocks are written
to the server before obtaining locks (allowing multiple clients to use file
locking to safely share data).
Sponsored by: Isilon Systems
PR: 94256
MFC after: 2 weeks
fifos, as this is required by the Single UNIX Specification, although
not currently implemented on FreeBSD.
While here, fix a bug in the directory timestamp checking test by
sleeping after querying the starting timestamp, rather than before.
AIO calls.
This small program queues up a controllable number of concurrent AIO
read operations w/ controllable io size against a disk or regular file.
There are a few other things to add (notably optional write support!)
but it works well enough at the present time to stress the AIO code out
relatively harshly in the disk IO case.
after similar calls related to struct pwd in libutil/pw_util.c:
- gr_equal()
Perform a deep comparison of two struct grp's. It does a thorough, yet
unoptimized comparison of all the members regardless of order.
- gr_make()
Create a string (see group(5)) from a struct grp.
- gr_dup()
Duplicate a struct grp. Returns a value that is a single contiguous
block of memory.
- gr_scan()
Create a struct grp from a string (as produced by gr_make()).
MFC after: 3 weeks
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionnaly equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).
PR: 120233
Submitted by: Jukka Ukkonen
Approved by: rwaston (mentor)
MFC after: 1 month
Use the correct value of errno. Although the errno value passed into
printf() follows the *env() call, it is not guaranteed to be the errno
from that call. When I wrote the regression tester, the environment I
used did pass the errno from the call. Consolidate the print for the
return code and errno into a function in the process of fixing this.
Approved by: wes (mentor)
various open flags and then tests various operations to make sure that
they are properly constrained by open flags. Various I/O mechansms
are tried, including aio if compiled into the kernel or loaded as a
module. There's more to be done here but it's a useful start, running
about 220 individual tests.
This is in support of FreeBSD-SA-08:03.sendfile.
the semantics of pthread_mutex_islocked_np() to return true if and only if
the mutex is held by the current thread.
Obviously, change the regression test to match.
MFC after: 2 weeks
- Process (a) is blocked in read on a socket waiting on data.
- Process (b) is blocked in shutdown() on a socket waiting on (a).
- Process (c) delivers a signal to (b) interrupting its wait.
When the signal is delivered, the kernel panics as sblock() fails in
sorflush(). Even if it didn't panic, shutdown() would block potentially
indefinitely waiting for recv() to succeeded. Fixes to follow.
Reported by: Jos Backus <jos at catnook dot com>
mostly just test corner cases rather than accuracy. Some of the
tests don't pass right now if you compile libm at -O2 due to gcc
constant-folding some things that it shouldn't. I'll fix that
shortly.
Add README.tcpmd5 to describe how to build a simple test setup
and run tests.
Convert compile time options to run time options [1].
Discussed with: rwatson
Suggested by: rwatson [1]
Add regression tests for privileged and supposedly unprivileged
IP_IPSEC_POLICY,IPV6_IPSEC_POLICY setsockopt cases.
We may need to review the current 'good' results to make
sure they reflect what we really want.
Discussed with: rwatson
Reviewed by: rwatson
Before that non-su users were able to open pfkey sockets as well.
Add a regression test so we can detect such problems in an automated way
in the future.
work present in FreeBSD 7.0 to refine the kernel privilege model:
- Introduce support for jail as a testing variable, in order to
confirm that privileges are properly restricted in the jail
environment.
- Restructure overall testing approach so that privilege and jail
conditions are set in the testing infrastructure before tests
are invoked, and done so in a custom-created process to isolate
the impact of tests from each other in a more consistent way.
- Tests now provide setup and cleanup hooks that occur before and
after the test runs.
- New privilege tests are now present for several audit
privileges, several credential management privileges, dmesg
buffer reading privilege, and netinet raw socket creation.
- Other existing tests are restructured and generally improved as
a result of better framework structure and jail as a variable.
For exampe, we now test that certain sysctls are writable only
outside jail, while others are writable within jail. On a
similar note, privileges relating to setting UFS file flags are
now better exercised, as with the right to chmod and utimes
files.
Approved by: re (bmah)
Obtained from: TrustedBSD Project
or replace (i.e., zdump) the environment after a call to setenv(), putenv()
or unsetenv() has been made, a few changes were made.
- getenv() will return the value from the new environ array.
- setenv() was split into two functions: __setenv() which is most of the
previous setenv() without checks on the name and setenv() which
contains the checks before calling __setenv().
- setenv(), putenv() and unsetenv() will unset all previous values and
call __setenv() on all entries in the new environ array which in turn
adds them to the end of the envVars array. Calling __setenv() instead
of setenv() is done to avoid the temporary replacement of the '=' in a
string with a NUL byte. Some strings may be read-only data.
Added more regression checks for clearing the environment array.
Replaced gettimeofday() with getrusage() in timing regression check for
better accuracy.
Fixed an off-by-one bug in __remove_putenv() in the use of memmove(). This
went unnoticed due to the allocation of double the number of environ
entries when building envVars.
Fixed a few spelling mistakes in the comments.
Reviewed by: ache
Approved by: wes
Approved by: re (kensmith)
- Solaris' setgroups(2) doesn't change process' effective gid, so set it
explicitly.
- POSIX doesn't define O_NOFOLLOW. FreeBSD returns EMLINK when target is
a symbolic link, but Solaris returns ELOOP then.
- Solaris doesn't define O_SHLOCK and O_EXLOCK flags.
Approved by: re (rwatson)
setenv(3) by tracking the size of the memory allocated instead of using
strlen() on the current value.
Convert all calls to POSIX from historic BSD API:
- unsetenv returns an int.
- putenv takes a char * instead of const char *.
- putenv no longer makes a copy of the input string.
- errno is set appropriately for POSIX. Exceptions involve bad environ
variable and internal initialization code. These both set errno to
EFAULT.
Several patches to base utilities to handle the POSIX changes from
Andrey Chernov's previous commit. A few I re-wrote to use setenv()
instead of putenv().
New regression module for tools/regression/environ to test these
functions. It also can be used to test the performance.
Bump __FreeBSD_version to 700050 due to API change.
PR: kern/99826
Approved by: wes
Approved by: re (kensmith)
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.
This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.
The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html
Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.
This work was financially supported by another FreeBSD committer.
Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)
Four tests currently fail:
test_ether_line_bad_1() and test_ether_line_bad_2() due to bugs in
ether_line(3).
test_ether_ntohost() and test_ether_hostton() due to not being fully
implemented tests.
on socket buffers is interruptible or not, which detacts the regression I
introduced recently in 7-CURRENT (spotted by alfred). This test passes
in older -CURRENT, and with the as-yet uncommitted sx_xlock_sig and
sblock fix patches.
each file independently from other files. The new semantics are
desired in the most of practical cases, e.g.: delete lines 5-9
from each file.
Keep the previous semantics of -i under a new option, -I, which
uses a single continuous address space covering all files to edit
in-place -- they are too cool to just drop them.
Add regression tests for -i and -I.
Approved by: dds
Compared with: GNU sed
Discussed on: -hackers
MFC after: 2 weeks
[Since the change to strict refcounting for in_multi objects, this test
began to fail; formerly the refcount was a count of the number of requests
for a given address, NOT a count of pointers to the object.]
- Close the new file objects created during socketpair() if the copyout of
the new file descriptors fails.
- Add a test to the socketpair regression test for this edge case.
and had no chance to match it by the 2nd address precisely.
Otherwise the unclosed range would bogusly extend to the end
of stream.
Add a basic regression test for the bug fixed. (This change
also fixes the more complex case 5.3 from `multitest.t'.)
Compared with: SUN and GNU seds
Tested by: regression tests
MFC after: 1 week
in a more reasonable way than BSD sed does: they properly
close the range even if we branched over its end. No doubt,
the range `1,5' should not match lines from 9 through 14.
them are related to the `c' function's need to know if we are at
the actual end of the address range. (It must print the text not
earlier than the whole pattern space was deleted.) It appears the
only sed function with this requirement.
There is `lastaddr' set by applies(), which is to notify the `c'
function, but it can't always help because it's false when we are
hitting the end of file early. There is also a bug in applies()
due to which `lastaddr' isn't set to true on degenerate ranges such
as `$,$' or `N,$' if N appears the last line number.
Handling early EOF condition in applies() could look more logical,
but it would effectively revert sed to the unreasonable behaviour
rev. 1.26 of main.c fought against, as it would require lastline()
be called for each line within each address range. So it's better
to call lastline() only if needed by the `c' function.
Together with this change to sed go regression tests for the bugs
fixed (c1-c3). A basic test of `c' (c0) is also added as it helped
me to spot my own error.
Discussed with: dds
Tested by: the regression tests
MFC after: 1 week
I have verified these with GNU sed 4.1.5 (and in some cases with Solaris
sed) and they are identical, with the following exceptions:
5.3: The result is unspecified and BSD sed behaves differently.
6.3: GNU sed gets it wrong
7.1: GNU sed gets it wrong
7.8: BSD sed gets it wrong
According to IEEE Std 1003.1, 2004 "Whenever the pattern space is
written to standard output or a named file, sed shall immediately
follow it with a <newline>."
An attempt at the same correction might have been made with r1.3,
which is however identical with r1.2.
effective group ID or to group ID of its parent directory.
- Add some comments from POSIX.
- Verify that after successful O_TRUNC open, size is equal to 0.
he is the file's owner, he can't set set-gid bit.
POSIX requires to return 0 and clear the bit, but FreeBSD returns
EPERM for UFS in such case. For now do the same in ZFS.
Almost all regression tests are based on very flexible fstest tool.
They verify correctness (POSIX conformance) of almost all file
system-related system calls.
The motivation behind this work is my ZFS port and POSIX, who doesn't
provide free test suites.
Runs on: FreeBSD/UFS, FreeBSD/ZFS, Solaris/UFS, Solaris/ZFS
To try it out:
# cd fstest
# make
# find tests/* -type d | xargs prove
various types, as well as pipes and fifos for good measure. RELENG_6
currently passes all of these tests, but 7-CURRENT fails 0-byte writes
and sends on all stream socket types (and fifos, as they are based on
stream sockets).
Bumped into by: peter
Diagnosed by: jhb
Problem of: andre
to floating-point, the result is a quiet NaN. The current implementation
may return a signaling NaN, and the vendor has no plans for changing this,
for reasons explained in the comment I added.
mbuf is dropped, to preserve the invariant in the PR_ADDR case.
Add a regression test to detect this condition, but do not hook it
up to the build for now.
PR: kern/38495
Submitted by: James Juran
Reviewed by: sam, rwatson
Obtained from: NetBSD
MFC after: 2 weeks
wildcard specifications. Earlier the only wildcard syntax
was "-j 0" for "any jail". There were at least
two shortcomings in it: First, jail ID 0 was abused; it
meant "no jail" in other utils, e.g., ps(1). Second, it
was impossible to match processed not in jail, which could
be useful to rc.d developers. Therefore a new syntax is
introduced: "-j any" means any jail while "-j none" means
out of jail. The old syntax is preserved for compatibility,
but now it's deprecated because it's limited and confusing.
Update the respective regression tests. While I'm here,
make the tests more complex but sensitive: Start several
processes, some in jail and some out of jail, so we can
detect that only the right processes are killed by pkill
or matched by pgrep.
Reviewed by: gad, pjd
MFC after: 1 week
nature of implied connect via sendto(). Oddly, uipc_usrreq.c implements
this for stream sockets, but doesn't set the flag in its protocol
definition so that it can actually be used. As such, the stream test is
implemented but doesn't run for now.
implemented properly for a number of kernel subsystems. In general, they
try to exercise the privilege first as the root user, then as a test user,
in order to determine when privilege is being checked.
Currently, these tests do not compare inside/outside jail, and probably
should be enhanced to do that.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
It is by no means expected to perform a complete test of the library
for correctness, but is meant to test the API to make sure libmp (or
libcrypto) updates don't totally break the library.
o If something is wrong with options, then output short usage help message.
o Output errstr returned from strtonum(3).
PR: bin/98141
Submitted by: Andrey Simonenko
subject: ranges of uid, ranges of gid, jail id
objects: ranges of uid, ranges of gid, filesystem,
object is suid, object is sgid, object matches subject uid/gid
object type
We can also negate individual conditions. The ruleset language is
a superset of the previous language, so old rules should continue
to work.
These changes require a change to the API between libugidfw and the
mac_bsdextended module. Add a version number, so we can tell if
we're running mismatched versions.
Update man pages to reflect changes, add extra test cases to
test_ugidfw.c and add a shell script that checks that the the
module seems to do what we expect.
Suggestions from: rwatson, trhodes
Reviewed by: trhodes
MFC after: 2 months
o Add mount and umount actions so that partitions can be in use.
o Extend the testing of the add verb to include overlapping
partitions.
o Add tests for the remove verb. this includes tests to remove
a partition when in use (i.e. is mounted).
o Add a MD5 checksum to the output of the conf action so that
it can be tested. Make sure the MD5 doesn't vary based on
certain dynamic behaviour that is irrelevant to the output.
o Add MD5 checksums to the expected result of conf actions.
Add support for read-write parameters. Allow an optional initializer
for read-write parameters. Print the value of those parameters on
success following the PASS.