This bring huge amount of changes, I'll enumerate only user-visible changes:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.
- slog
Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support to boot off of ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Before if write requested failed, system paniced. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris
"A function can be preceded by one or more '!' characters, in which
case the function shall be applied if the addresses do not select
the pattern space."
from one parent directory to another, in addition to the usual access checks
one also needs write access to the subdirectory being moved.
Approved by: rwatson (mentor), pjd
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.
The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.
To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.
Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.
The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.
Sponsored by: Isilon Systems
MFC after: 1 month
it relies on non-portable flock(2) semantics. Not only is flock(2) not
portable, but on some OSes that do have it, it is implemented in terms
of fcntl(2) locks, which are per-process rather than per-descriptor.
will cause it to return 0, not EAGAIN.
Add UNIX domain socket support to udpzerobyte, which suggests this
regression test should be moved to the general sockets test area rather
than netinet.
possible to make NanoBSD output more quite or verbose. The default
output should remain mostly unchanged. [1]
- Add missing shift for -i.
- Clean up usage() so it's now (mostly) sorted alphabetically.
- Make command line argument handling more consistent in the code and
remove redundant semicolons.
Reviwed by: phk [1]
* Allow the image name to be renamed via NANO_IMGNAME.
* Propagate TARGET_ARCH into src top level make targets
explicitly to support cross-building.
* Increase the default size of NanoBSD media from 488MB to
584MB to accomodate a -CURRENT world.
Reviewed by: phk
control over the result of buildworld and installworld; this especially
helps packaging systems such as nanobsd
Reviewed by: various (posted to arch)
MFC after: 1 month
once it is lost, all data is gone.
Option '-B none' can by used to prevent backup. Option '-B path' can be
used to backup metadata to a different file than the default, which is
/var/backups/<prov>.eli.
The 'geli init' command also prints backup file location and gives short
procedure how to restore metadata.
The 'geli setkey' command now warns that even after passphrase change or keys
update there could be version of the master key encrypted with old
keys/passphrase in the backup file.
Add regression tests to verify that new functionality works as expected.
Update other regression tests so they don't create backup files.
Reviewed by: keramida, rink
Dedicated to: a friend who lost 400GB of his live by accidentally overwritting geli metadata
MFC after: 2 weeks
This fixes potential out-of-bound accesses when testing ciphers with block size
greater than 8 bytes (e.g. AES).
Submitted by: Bartlomiej Sieka tur ! semihalf dot com
Discussed with: pjd, sam
larger than 2GB to prevent an overflow [1].
Make case-insensitive comparison work for siliconsystems, soekris and
transcend devices.
PR: conf/126386 [1]
Submitted by: Mark A [1]
MFC after: 1 month
the first value (environ[0]) to NULL. This is in addition to the
current detection of environ being replaced, which includes being set to
NULL. Without this fix, the environment is not truly wiped, but appears
to be by getenv() until an *env() call is made to alter the enviroment.
This change is necessary to support those applications that use this
method for clearing environ such as Dovecot and Postfix. Applications
such as Sendmail and the base system's env replace environ (already
detected). While neither of these methods are defined by SUSv3, it is
best to support them due to historic reasons and in lieu of a clean,
defined method.
Add extra units tests for clearing environ using four different methods:
1. Set environ to NULL pointer.
2. Set environ[0] to NULL pointer.
3. Set environ to calloc()'d NULL-terminated array.
4. Set environ to static NULL-terminated array.
Noticed by: Timo Sirainen
MFC after: 3 days
the default ICMPv6 filter is pass all, test that we can set it to block
all and restore to pass all. No attempt is made to test that the
filtering works, just that we can get and set it.
is used to grab and hold some number of multicast addresses in order
to test what happens when an interface goes over the number of multicast
addresses it can filter in hardware.
I wrote these to test amd64 asm functions that used
maxss, maxsd, minss, and minsd, but it turns out that
those instructions don't handle NaNs and signed zero
in the same way as fmin() and fmax() are required to,
so we're stuck with the C versions for now.
The first test comes from OpenBSD, and the others are additions or
adaptations.
This is based on OpenBSD's
src/regress/lib/libc/sprintf/sprintf_test.c, v1.3.
I deliberately did not use v1.4 because it's bogus.
semaphores. Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.
Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem' of an unnamed semaphore (created via sem_init)) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the proces
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.
XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.
Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores nearly in a similar fashion to the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.
Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
provides the correct semantics for flock(2) style locks which are used by the
lockf(1) command line tool and the pidfile(3) library. It also implements
recovery from server restarts and ensures that dirty cache blocks are written
to the server before obtaining locks (allowing multiple clients to use file
locking to safely share data).
Sponsored by: Isilon Systems
PR: 94256
MFC after: 2 weeks
- It is opt-out for now so as to give it maximum testing, but it may be
turned opt-in for stable branches depending on the consensus. You
can turn it off with WITHOUT_SSP.
- WITHOUT_SSP was previously used to disable the build of GNU libssp.
It is harmless to steal the knob as SSP symbols have been provided
by libc for a long time, GNU libssp should not have been much used.
- SSP is disabled in a few corners such as system bootstrap programs
(sys/boot), process bootstrap code (rtld, csu) and SSP symbols themselves.
- It should be safe to use -fstack-protector-all to build world, however
libc will be automatically downgraded to -fstack-protector because it
breaks rtld otherwise.
- This option is unavailable on ia64.
Enable GCC stack protection (aka Propolice) for kernel:
- It is opt-out for now so as to give it maximum testing.
- Do not compile your kernel with -fstack-protector-all, it won't work.
Submitted by: Jeremie Le Hen <jeremie@le-hen.org>
fifos, as this is required by the Single UNIX Specification, although
not currently implemented on FreeBSD.
While here, fix a bug in the directory timestamp checking test by
sleeping after querying the starting timestamp, rather than before.
a. The BSD version will be built and installed unless
WITHOUT_BSD_CPIO is defined.
b. The GNU version will not be built or installed unless
WITH_GNU_CPIO is defined. If this is defined, the symlink
in /usr/bin will be to the GNU version whether the BSD
version is present or not.
When these changes are MFCed the defaults should be flipped.
2. Add a knob to disable the building of GNU grep. This will
make it easier for those that want to test the BSD version in
the ports.
Approved by: kientzle [1]
parts relied on the now removed NET_NEEDS_GIANT.
Most of I4B has been disconnected from the build
since July 2007 in HEAD/RELENG_7.
This is what was removed:
- configuration in /etc/isdn
- examples
- man pages
- kernel configuration
- sys/i4b (drivers, layers, include files)
- user space tools
- i4b support from ppp
- further documentation
Discussed with: rwatson, re
AIO calls.
This small program queues up a controllable number of concurrent AIO
read operations w/ controllable io size against a disk or regular file.
There are a few other things to add (notably optional write support!)
but it works well enough at the present time to stress the AIO code out
relatively harshly in the disk IO case.
fit in a signed char
o change default output to something more useful for sta mode
o futz w/ various field names and widths; need to do full pass over this stuff
post collection. This is too error prone and introduces uncertainty into
the timing. We'll simply have to require synchronized TSCs to run
schedgraph on MP.
Sponsored by: Nokia
transmissions by the number of them running so that they do not
overwhelm the source.
Added a simple shell script to kick off sinks on multiple hosts as
well as a source on the host where the shell script is run. The script
also collects the output of all the sinks and the source into files named
for the host on which the tests are run. A date is appended to each output
file to make it unique per run.
after similar calls related to struct pwd in libutil/pw_util.c:
- gr_equal()
Perform a deep comparison of two struct grp's. It does a thorough, yet
unoptimized comparison of all the members regardless of order.
- gr_make()
Create a string (see group(5)) from a struct grp.
- gr_dup()
Duplicate a struct grp. Returns a value that is a single contiguous
block of memory.
- gr_scan()
Create a struct grp from a string (as produced by gr_make()).
MFC after: 3 weeks
a tinybsd build. If we do not do this, we can accidentally remove
critical files from directories like /lib (if mounted).
PR: misc/121763
Submitted by: Richard Arends < richard at unixguru dot nl >
MFC after: 3 days
for multicast packets. Parameters for the interface, packet size,
number of packets, and interpacket gap may be given on the command line.
The sink records how many packets were missed, and at what time each
packet arrived.
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionnaly equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).
PR: 120233
Submitted by: Jukka Ukkonen
Approved by: rwaston (mentor)
MFC after: 1 month
have dependencies or need to install a new user/group, are not
problematic anymore.
PR: 121367
Submitted by: Richard Arends < richard at unixguru dot nl >
MFC after: 3 days
Use the correct value of errno. Although the errno value passed into
printf() follows the *env() call, it is not guaranteed to be the errno
from that call. When I wrote the regression tester, the environment I
used did pass the errno from the call. Consolidate the print for the
return code and errno into a function in the process of fixing this.
Approved by: wes (mentor)
Note: it may be a good idea to deduce obsolete usr/lib32/ files from
obsolete lib/ and usr/lib/ files.
PR: 120492
PR: 121118
PR: 121121
Submitted by: KAMIYA Satosi, Richard Tector
Approved by: rwatson (mentor)
MFC after: 1 month
various open flags and then tests various operations to make sure that
they are properly constrained by open flags. Various I/O mechansms
are tried, including aio if compiled into the kernel or loaded as a
module. There's more to be done here but it's a useful start, running
about 220 individual tests.
This is in support of FreeBSD-SA-08:03.sendfile.
the semantics of pthread_mutex_islocked_np() to return true if and only if
the mutex is held by the current thread.
Obviously, change the regression test to match.
MFC after: 2 weeks
- Process (a) is blocked in read on a socket waiting on data.
- Process (b) is blocked in shutdown() on a socket waiting on (a).
- Process (c) delivers a signal to (b) interrupting its wait.
When the signal is delivered, the kernel panics as sblock() fails in
sorflush(). Even if it didn't panic, shutdown() would block potentially
indefinitely waiting for recv() to succeeded. Fixes to follow.
Reported by: Jos Backus <jos at catnook dot com>
mostly just test corner cases rather than accuracy. Some of the
tests don't pass right now if you compile libm at -O2 due to gcc
constant-folding some things that it shouldn't. I'll fix that
shortly.
* Explain why 32768 entries is usually not enough
* Increase the scaling ratio to 10 to deal with 32-bit overflows that
can occur in calculating the canvas offsets
disk image. In some cases this can be a significant speed-up, if
most of the image can be kept in RAM while being populated.
On the 2GB image I'm currently working with, the build time,
excluding buildworld/buildkernel, goes from ~17 minutes to ~6
minutes.
This is not enabled by default, as it might have the opposite effect
on low-memory systems.
- During the generation of the image file be a bit more verbose in the
log file so it is possible to see what's being done.
- Add a NANO_DISKIMGDIR variable which makes it possibly to place the
final images somewhere other than ${MAKEOBJDIRPREFIX}. The default
value for NANO_DISKIMGDIR is $MAKEOBJDIRPREFIX.
Go for it: phk
Due to a typo, setting NANO_DATASIZE=-1 resulted in the data slice
being the same size as entire image instead of the size of the
remaining space on the image.
- Fix detection of overcommit of the slices.
This fix mainly result in a nicer error than when newfs etc. tries to
write beyond the end of the disk image.
MFC after: 2 weeks
X-MFC after: RELENG_7 is open again
o push include paths to the Makefile
o use the AFTER trick to simplify adding new items
o prepare stat blocks for additional data
o align values for verbose output
o fillin some missing stats
MFC after: 1 week
default to the value of MK_KERBEROS unless set explicitly by
WITH_GSSAPI/WITHOUT_GSSAPI. (This introduces another type of
MK_* variables which itself is questionable.)
- Teach tools/build/options/makeman script that generates the
src.conf(5) manpage about the new type of MK_* variables.
- Fix broken logic in lib/Makefile.
WITHOUT_KERBEROS knob. While GSS can be used for other things
some third party software (most notably ports/x11/kdelibs3)
takes the presence of libgssapi as an indication that kerberos
is available, and attempts to link with the kerberos libs. If
they are not available, the build will fail.
Because you might want to use GSS but not kerberos, add a knob
to re-enable it if WITHOUT_KERBEROS is present.
Document the new knob, and the new behavior of WITHOUT_KERBEROS.
Not objected and/or generally agreed to by: freebsd-arch
Problem discussed/analyzed in:
PR: ports/116484
Add README.tcpmd5 to describe how to build a simple test setup
and run tests.
Convert compile time options to run time options [1].
Discussed with: rwatson
Suggested by: rwatson [1]
o add things i want to TODO list
o add Record entry to each event which back-maps to the line # in the ktr file;
useful for finding local context when the ktr file has lots of items that
schedgraph doesn't grok
o add missing KTR_SCHED event handlers
o expose Counter max value through a ymax method for widget building
o show timestamps in records rejected 'cuz time goes backwards
Add regression tests for privileged and supposedly unprivileged
IP_IPSEC_POLICY,IPV6_IPSEC_POLICY setsockopt cases.
We may need to review the current 'good' results to make
sure they reflect what we really want.
Discussed with: rwatson
Reviewed by: rwatson
Before that non-su users were able to open pfkey sockets as well.
Add a regression test so we can detect such problems in an automated way
in the future.
changes:
01 - Enhanced LRO:
LRO feature is extended to support multi-buffer mode. Previously,
Ethernet frames received in contiguous buffers were offloaded.
Now, frames received in multiple non-contiguous buffers can be
offloaded, as well. The driver now supports LRO for jumbo frames.
02 - Locks Optimization:
The driver code was re-organized to limit the use of locks.
Moreover, lock contention was reduced by replacing wait locks
with try locks.
03 - Code Optimization:
The driver code was re-factored to eliminate some memcpy
operations. Fast path loops were optimized.
04 - Tag Creations:
Physical Buffer Tags are now optimized based upon frame size.
For better performance, Physical Memory Maps are now re-used.
05 - Configuration:
Features such as TSO, LRO, and Interrupt Mode can be configured
either at load or at run time. Rx buffer mode (mode 1 or mode 2)
can be configured at load time through kenv.
06 - Driver Statistics:
Run time statistics are enhanced to provide better visibility
into the driver performance.
07 - Bug Fixes:
The driver contains fixes for the problems discovered and
reported since last submission.
08 - MSI support:
Added Message Signaled Interrupt feature which currently uses 1
message.
09 Removed feature:
Rx 3 buffer mode feature has been removed. Driver now supports 1,
2 and 5 buffer modes of which 2 and 5 buffer modes can be used
for header separation.
10 Compiler warning:
Fixed compiler warning when compiled for 32 bit system.
11 Copyright notice:
Source files are updated with the proper copyright notice.
MFC after: 3 days
Submitted by: Alicia Pena <Alicia dot Pena at neterion dot com>,
Muhammad Shafiq <Muhammad dot Shafiq at neterion dot com>