This brings a huge number of changes; I'll enumerate only the user-visible ones:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows additional disks to be used for cache.
Huge performance improvements, mostly for random reads of mostly
static content.
- slog
Allows additional disks to be used for the ZFS Intent Log to speed up
operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored
on ZFS file systems they own. Be very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support for booting off a ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Previously, if a write request failed, the system panicked. Now one
can select one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Like the quota and reservation properties, but they don't count space
consumed by child file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris
specification and regression test regress:25.
"A function can be preceded by one or more '!' characters, in which
case the function shall be applied if the addresses do not select
the pattern space."
MFC after: 2 weeks
subtle why it comes out the way it does. Once you realize that it
depends on the archiving order, it's also important to realize that
filesystem differences aren't going to break this case. (Some of the
other tests have had to be extensively rewritten to make them
independent of the order in which a particular filesystem returns file
entries.)
(This commit also serves to note the PR number that I accidentally
omitted from the previous commit.)
PR: bin/128562
MFC after: 30 days
good job writing this test; it exercises a lot of subtle cases. The
trickiest one is that a hardlink to something that didn't get
extracted should not itself be extracted. In some sense, this is not
the desired behavior (we'd rather restore the file), but it's the best
you can do in a single-pass restore of a tar archive.
The test here should be extended to exercise cpio and newc formats as
well, since their hardlink models are different, which will lead to
different handling of some of these edge cases.
Submitted by: Jaakko Heinonen
MFC after: 30 days
parenthesized subexpression is defined. For example, the
following command line caused unexpected behavior like
segmentation fault:
% echo test | sed -e 's/test/\1/'
PR: bin/126682
MFC after: 1 week
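The kind of check that avoids this crash can be sketched as follows. This is only an illustration in Python, not sed's C code: it uses Python's `re` syntax rather than POSIX basic regular expressions, and the helper name `check_backrefs` is hypothetical.

```python
import re


def check_backrefs(pattern: str, replacement: str) -> None:
    """Raise ValueError if the replacement refers to an undefined group."""
    ngroups = re.compile(pattern).groups  # number of capturing groups
    # Scan for a backslash followed by either another backslash (an
    # escaped backslash, which is skipped) or a digit 1-9 (a backreference).
    for match in re.finditer(r"\\(\\|[1-9])", replacement):
        ref = match.group(1)
        if ref != "\\" and int(ref) > ngroups:
            raise ValueError(f"\\{ref} not defined in the RE")
```

With this sketch, the failing command above corresponds to `check_backrefs("test", r"\1")`, which reports an error instead of reading a subexpression that was never matched.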
This replaces the getopt()/getopt_long() wrapper, the old-style
argument rewriter and the associated configuration glue with a more
straightforward custom command parser. In particular, this ensures
that bsdtar will have consistent option parsing on every platform,
regardless of whether the platform supports getopt_long().
MFC after: 30 days
-A Display the apparent size instead of the disk usage. This can be
helpful when operating on compressed volumes or sparse files.
-B blocksize
Calculate block counts in blocksize byte blocks. This is different
from the -k, -m options or setting BLOCKSIZE and gives an
estimate of how much space the examined file hierarchy would
require on a filesystem with the given blocksize. Unless in -A
mode, blocksize is rounded up to the next multiple of 512.
The former is similar to GNU's du(1) --apparent-size. The latter is
different from what GNU's du(1) -B does, which is equivalent to setting
BLOCKSIZE in our implementation and is rather pointless as it doesn't add
any real value (i.e. you can achieve the same with a simple awk-script).
No change in the normal output or processing.
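The -B arithmetic can be sketched roughly as below. This is an illustration under assumptions, not du(1)'s actual code: the function names are hypothetical, and the per-file estimate is assumed to be a plain round-up of the file size to whole blocks.

```python
def effective_blocksize(blocksize: int, apparent: bool) -> int:
    """Round blocksize up to the next multiple of 512 unless in -A mode."""
    if apparent:
        return blocksize
    return -(-blocksize // 512) * 512  # ceiling division, then scale back


def blocks_needed(size: int, blocksize: int) -> int:
    """Assumed estimate: whole blocks required to hold `size` bytes."""
    return -(-size // blocksize)
```

For example, a requested blocksize of 1000 becomes 1024 outside -A mode, and a 4097-byte file needs two 4096-byte blocks.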
Reviewed by: keramida@, Peter French
Otherwise silence from: freebsd-hackers@
from Jaakko's original patch: I have misgivings about the portability
of the 'z' printf modifier so opted to cast the arguments to (int)
instead.
PR: bin/128561
Submitted by: Jaakko Heinonen
MFC after: 30 days
Use ioctl() to get the window size in vmstat(8), and force a new
header to be prepended to the output every time the current window
size changes. Change the number of lines before each header to the
current lines of the terminal when the terminal is resized, so that
the full terminal length can be used for output lines.
Inspired by: svn change 175562 (same feature for iostat)
Reviewed by: ru (who fixed some of my bugs too)
MFC after: 1 week
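The window-size query described above can be sketched in Python. This is not the vmstat(8) code: `window_rows` and `header_due` are hypothetical helpers, and both the fallback of 24 rows and the two lines reserved for the header are assumptions made for the illustration.

```python
import fcntl
import os
import struct
import termios


def window_rows(fd: int, default: int = 24) -> int:
    """Return the terminal height for fd, or `default` if unavailable."""
    if not os.isatty(fd):
        return default
    try:
        # struct winsize holds four unsigned shorts: rows, cols, xpixel, ypixel.
        packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    except OSError:
        return default
    rows = struct.unpack("HHHH", packed)[0]
    return rows if rows > 0 else default


def header_due(lines_since_header: int, rows: int, resized: bool,
               header_lines: int = 2) -> bool:
    """Print a new header after a resize or once the window is full."""
    return resized or lines_since_header >= max(rows - header_lines, 1)
```

A caller would re-run `window_rows()` when the terminal is resized and pass `resized=True` once, so the next output line is preceded by a fresh header.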
control over the result of buildworld and installworld; this especially
helps packaging systems such as nanobsd
Reviewed by: various (posted to arch)
MFC after: 1 month
script mode like the MRI (Microtec Research Inc.) "librarian" program.
Originally this option was provided by Binutils ar(1) to ease the
transition for developers who are used to writing "librarian" scripts.
We added this option to BSD ar(1) because:
1. It further improves compatibility with Binutils ar(1).
2. A few pieces of software still use this -M option (at least one
in our ports collection).
Suggested by: rink & erwin
HAVE_STRUCT_STAT_ST_FLAGS, Linux support depends on the
existence of the appropriate ioctl() options. In particular,
this should fix some nagging compile errors on Linux platforms
that don't have e2fsprogs-devel installed.
In particular:
* tar -x -P follows symlinks to existing dirs, but not without -P
* symlinks to files are always replaced
* broken symlinks are always replaced
After the MPSAFE TTY import we support an additional rlimit, called
RLIMIT_NPTS. This limit allows you to cap the number of pseudo-terminals
allocated by one user.
We forgot to add support for this limit to limits(1), which means it
crashed. Add the proper bits to make it work like it should.
Unfortunately, not all shells actually implement this rlimit, so
I suspect it to be broken with certain shells.
Submitted by: Yuriy Tsibizov <yuriy tsibizov gfk ru>
for the convenience of rc.d. Now it has happily lived there for quite
a while. So move the pkill(1) source files from usr.bin to bin, too.
Approved by: gad
file with different permissions and set a non-zero umask
during the actual copy tests. The extra entry increases
the size of the test archives of course, so adjust the
expected sizes.
backslash if he/she wants to use a non-traditional delimiter, i.e.,
anything other than a slash. That is, /abc/ works as is, but xabcx
needs to be spelled as \xabcx.
Add appropriate markup.
Bump Dd.
Checked with: IEEE Std 1003.1, 2004 Edition
MFC after: 3 days
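The delimiter rule documented here can be illustrated with a small parser. This is a simplified sketch, not sed's implementation: the function name is hypothetical, and escaped delimiters inside the regular expression are deliberately not handled.

```python
def split_context_address(text: str) -> tuple[str, str]:
    """Return (regex, rest) for a context address at the start of text."""
    if text.startswith("/"):
        delim, start = "/", 1            # traditional form: /abc/
    elif text.startswith("\\") and len(text) > 1:
        delim, start = text[1], 2        # custom delimiter: \xabcx
    else:
        raise ValueError("not a context address")
    end = text.find(delim, start)
    if end < 0:
        raise ValueError("unterminated regular expression")
    return text[start:end], text[end + 1:]
```

Both `/abc/p` and `\xabcxp` yield the regular expression `abc` followed by the command `p`, whereas a bare `xabcx` is rejected.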
copying "dir/file" and then copying "dir" results in
"File on disk is not older; skipping" for the "dir" because
it was implicitly created by "dir/file." Among other sins,
this means that "dir" ends up with the wrong permissions
and ownership.
This is actually a libarchive bug; fix is forthcoming.
The number of blocks read from ustar archives is just an implementation
difference. The failure of bsdcpio to emit a block count to stderr
in -p mode is a real bug in bsdcpio.
following the archive structure. In particular, it no longer
crashes if you run it against GNU cpio 2.9 (although it does
still complain a lot more than it should).
This is easy to confuse with the actual exit status of the program.
Instead exit with EX_SOFTWARE if the command doesn't exit normally.
MFC after: 1 month
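A wrapper that follows this rule might look like the sketch below. It assumes the convention of Python's `subprocess` module, where a child killed by signal N is reported as return code -N; the function name is hypothetical.

```python
import os


def wrapper_exit_status(returncode: int) -> int:
    """Pass a normal exit code through; map abnormal exits to EX_SOFTWARE."""
    if returncode >= 0:
        return returncode       # the command exited normally
    return os.EX_SOFTWARE       # killed by a signal: do not mimic an exit code
```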
I would like to provide a way to preview the effects of pathname edits,
but pattern selection has to happen against the unedited path, so it
seems that we have to show people the unedited path to help in
designing selection patterns.
if a user logged in more than a week ago.
This may contain multibyte characters (e.g. when using UTF-8).
This string is then aligned on byte-length rather than char-length,
resulting in misalignment and unfinished multibyte characters.
PR: 126657
Submitted by: Johan van Selst <johans@stack.nl>
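The difference between the two alignment strategies can be shown with a short Python sketch. The function names are hypothetical, and counting characters is itself an approximation, since some characters occupy more than one terminal column.

```python
def fit_by_bytes(text: str, width: int) -> bytes:
    """Buggy approach: truncate and pad according to the UTF-8 byte count."""
    raw = text.encode("utf-8")[:width]   # may cut a multibyte character in half
    return raw.ljust(width, b" ")


def fit_by_chars(text: str, width: int) -> str:
    """Truncate and pad according to the character count."""
    return text[:width].ljust(width)
```

With the input `März` and a width of 2, the byte-based version ends in a dangling `0xc3` lead byte, while the character-based version returns `Mä`.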
For the last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.
If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTYs from
the system. This implementation has a two-step destruction design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).
The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTYs that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.
Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan
than linear relations. We can now convert degC to degF.
586 units, 56 prefixes
You have: 24 degC
You want: degF
75.2
You have: degC
You want: K
(-> x*1 +273.15)
(<- y*1 -273.15)
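The forward and inverse mappings printed above (`x*1 + 273.15` and `y*1 - 273.15`) are affine rather than purely multiplicative. A minimal Python sketch of that idea follows; the `Affine` class is hypothetical, and the degF constants are derived here from the usual definition, not taken from the units data file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """A unit defined as base = value * scale + offset (base unit: kelvin)."""
    scale: float
    offset: float

    def to_base(self, value: float) -> float:
        return value * self.scale + self.offset

    def from_base(self, base: float) -> float:
        return (base - self.offset) / self.scale


DEGC = Affine(1.0, 273.15)
DEGF = Affine(5.0 / 9.0, 273.15 - 32.0 * 5.0 / 9.0)


def convert(value: float, have: Affine, want: Affine) -> float:
    """Convert by going through the base unit."""
    return want.from_base(have.to_base(value))
```

Converting 24 degC this way gives 75.2 degF up to floating-point rounding, matching the session shown above.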
During the import of the 4.4BSD Lite sources, four files got added to
the repository called :tt, :tty, :var and :ww. They seem to contain some
kind of debug information. These files aren't used/installed anywhere.
Unfortunately the colons in the filenames prevent us from checking out
the source tree on file systems that don't support colons (such as FAT).
Just remove these unneeded files to keep SVN happy.
Reported by: Rohit Tripathi <rohit trip gmail com>
MFC after: 3 days
- Merge changes from NetBSD and OpenBSD.
- Add the Euro as a primitive unit, add old converted currency and
pegged currency (Obtained from Wikipedia)
- Rename "dollar" to "usdollar" as primitive unit, remove non-pegged
currency and add pegged currency (Obtained from Wikipedia)
- Updated the accuracy of a lot of constants (Obtained from Wikipedia)
PR: bin/106545 bin/88252
Submitted by: trasz<trasz@pin.if.uz.zgora.pl>, J Vinopal <banshee@abattoir.com>
Approved by: bde@ (mentor)
MFC after: 1 week
understand which code paths aren't possible.
This commit eliminates 117 false positive bug reports of the form
"allocate memory; error out if pointer is NULL; use pointer".
big endian platforms where time_t is 64 bits (i.e. armeb and sparc64), it will
be a problem.
Use a temporary time_t to work around this.
Submitted by: Matthew Luckie <mjl AT luckie DOT org dot nz>
MFC after: 3 days
but \0ddd in a %b argument, with a length restriction of 3 octal digits
in either case. This seems silly, but it needs to be right so it's possible
to write an octal escape followed by an ordinary digit. Solaris printf(1)
and GNU printf(1) also behave this way.
Example: "printf '\0752'" now produces "=2" instead of garbage.
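The two escape forms can be sketched as a small decoder. This is a simplified illustration, not the printf(1) source: the function name is hypothetical, only octal escapes are handled, and in %b mode an escape without the leading 0 is left untouched as an assumption of this sketch.

```python
OCTAL_DIGITS = "01234567"


def decode_octal_escapes(text: str, b_argument: bool = False) -> str:
    """Decode \\ddd (format string) or \\0ddd (%b argument) escapes."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            j = i + 1
            if b_argument:
                if text[j] != "0":
                    out.append(text[i])   # not a \0ddd escape; keep as-is
                    i += 1
                    continue
                j += 1                    # skip the mandatory leading 0
            k = j
            # Consume at most three octal digits in either form.
            while k < len(text) and k - j < 3 and text[k] in OCTAL_DIGITS:
                k += 1
            if k > j or b_argument:
                out.append(chr(int(text[j:k] or "0", 8) & 0xFF))
                i = k
                continue
        out.append(text[i])
        i += 1
    return "".join(out)
```

In format-string mode, `\0752` is read as the three digits `075` (the character `=`) followed by a literal `2`, which is the behaviour shown in the example above.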
Specifically, build a 32-bit /usr/bin/ldd32 on amd64 which handles 32-bit
objects. Since it is a 32-bit binary, it can fork a child process which
can dlopen() a 32-bit shared library. The current 32-bit support in ldd
can't do this because it does the dlopen() from a 64-bit process. In order
to preserve an intuitive interface for users, the ldd binary automatically
execs /usr/bin/ldd32 for 32-bit objects. The end result is that ldd on
amd64 now transparently handles 32-bit shared libraries in addition to
32-bit binaries.
Submitted by: ps (indirectly)
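The dispatch decision can be sketched by inspecting the ELF identification bytes. The `EI_CLASS` offset and the `ELFCLASS32`/`ELFCLASS64` values come from the ELF format; the helper names and the convention of returning `None` for "handle natively" are assumptions of this sketch, not the ldd(1) code.

```python
from typing import Optional

ELFMAG = b"\x7fELF"   # ELF magic number
EI_CLASS = 4          # index of the class byte in e_ident
ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_class(ident: bytes) -> int:
    """Return the ELF class byte from the start of an object file."""
    if len(ident) <= EI_CLASS or ident[:4] != ELFMAG:
        raise ValueError("not an ELF object")
    return ident[EI_CLASS]


def helper_for(ident: bytes) -> Optional[str]:
    """Return the helper to exec for this object, or None to handle it natively."""
    if elf_class(ident) == ELFCLASS32:
        return "/usr/bin/ldd32"
    return None
```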
This article [1] describes the -p flag for make(1):
Write to standard output the complete set of macro definitions and
target descriptions. The output format is unspecified.
We already support a similar flag (-d g1), but unlike -p, it still
executes commands. Our implementation just turns it into -d g1, but also
sets flag `printGraphOnly', which will cause make(1) to skip execution.
[1] http://www.opengroup.org/onlinepubs/009695399/utilities/make.html
Reviewed by: imp
PR: standards/99960
In particular, this fixes the oddity that -dumpl would apply
umask to copied dirs (which are created in the target tree)
but not to "copied" files (which are only linked). After
this change:
$ ls -ld a a/b a/b/c
d--x-w-r-- 3 tim tim 512 Jul 29 20:08 a
drwxr----x 3 tim tim 512 Jul 29 20:09 a/b
dr----x-w- 2 tim tim 512 Jul 29 20:09 a/b/c
$ (echo a; echo a/b; echo a/b/c) | cpio -dumpl o
$ cd o
$ ls -ld a a/b a/b/c
d--x-w-r-- 3 tim tim 512 Jul 29 20:08 a
drwxr----x 3 tim tim 512 Jul 29 20:09 a/b
dr----x-w- 2 tim tim 512 Jul 29 20:09 a/b/c
if we're reducing a rule that has an empty
right hand side and the yacc stackpointer is pointing at the very
end of the allocated stack, we end up accessing the stack out of
bounds by the implicit $$ = $1 action
Obtained from: OpenBSD
where it is used. [1]
Don't leak file descriptors in write_entry_backend if archive_write_header
returns ARCHIVE_FAILED.
Found by: Coverity Prevent [1]
[/] root@ed-exigent>ldd `which httpd`
ldd: /usr/local/sbin/httpd: can't read program header
ldd: /usr/local/sbin/httpd: not a dynamic executable
But...
[/] root@ed-exigent>LD_32_TRACE_LOADED_OBJECTS==1 `which httpd`
libm.so.4 => /lib32//libm.so.4 (0x280c8000)
libaprutil-1.so.2 => /usr/local/lib/libaprutil-1.so.2 (0x280de000)
libexpat.so.6 => /usr/local/lib/libexpat.so.6 (0x280f2000)
libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x28110000)
libapr-1.so.2 => /usr/local/lib/libapr-1.so.2 (0x281fd000)
libcrypt.so.3 => /lib32//libcrypt.so.3 (0x2821d000)
libpthread.so.2 => not found (0x0)
libc.so.6 => /lib32//libc.so.6 (0x28235000)
libpthread.so.2 => /usr/lib32/libpthread.so.2 (0x2830d000)
Added support in ldd(1) for the LD_32_xxx environment variables if
the architecture of the machine is >32 bits. If we ever go to 128
bit architectures this exercise will have to be repeated, but thanks
to earlier commits today it will be relatively simple.
PR: bin/124906
Submitted by: edwin
Approved by: bde (mentor)
MFC after: 1 week
the main-loop into a separate function.
Instead of using hardcoded environment variables, define them in a
lookup table.
For the rest, no functionality changes.
Approved by: bde (mentor)
MFC after: 1 week
semaphores. Specifically, semaphores are now represented as a new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.
Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem') of an unnamed semaphore (created via sem_init()) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the process
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.
XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.
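The umask fix in the list above comes down to masking the requested mode before it is stored. A minimal sketch, with a hypothetical function name:

```python
def effective_mode(requested_mode: int, umask: int) -> int:
    """Permission bits actually granted once the creator's umask is applied."""
    return requested_mode & ~umask & 0o777
```

For example, a requested mode of 0660 under a umask of 077 yields 0600, so group members no longer gain access that the umask was meant to deny.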
Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores in a fashion similar to the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.
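The separation between a semaphore's name and the object itself, which lets handles keep working after sem_unlink(), can be modelled in a few lines. This is a toy model and not the kernel code: the class names are hypothetical, a Python dict stands in for the hash table, and Python object references stand in for file references.

```python
class Ksem:
    """Toy semaphore object; it lives as long as something references it."""

    def __init__(self, value: int) -> None:
        self.value = value


class KsemNamespace:
    """Name-to-object table; removing a name does not destroy the object."""

    def __init__(self) -> None:
        self._names = {}

    def open(self, name: str, create: bool = False, value: int = 0) -> Ksem:
        sem = self._names.get(name)
        if sem is None:
            if not create:
                raise FileNotFoundError(name)
            sem = self._names[name] = Ksem(value)
        return sem

    def unlink(self, name: str) -> None:
        del self._names[name]   # only the name goes away
```

A handle obtained before `unlink()` remains usable afterwards, while a later `open()` of the same name without `create` fails.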
Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
link, just ignore the -l option and copy the file instead.
In particular, this should fix the COPYTREE_* macros used in the
ports infrastructure which use -l to preserve space but often get
used for cross-device copies.
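The fallback can be sketched as follows. This is an illustration in Python rather than the actual change: the function name is hypothetical, and a failed link is assumed to be reported as `EXDEV` when source and destination are on different devices.

```python
import errno
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst; fall back to copying across devices."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise               # unrelated failure: report it
    shutil.copy2(src, dst)      # cross-device: copy data and metadata instead
    return "copied"
```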
noticed that a "whereis -qs qemu" matched the distfiles subdir of qemu
rather than /usr/ports/emulators/qemu.
It now ignores all dot entries in /usr/ports, plus all entries
starting with a capital letter (maintenance stuff like Templates, but
also includes subdir CVS), plus /usr/ports/distfiles which is simply a
magic name in that respect.
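The filtering rule reads naturally as a predicate over the entries of /usr/ports. A sketch, with a hypothetical function name:

```python
def is_ports_category(name: str) -> bool:
    """Decide whether an entry in /usr/ports should be searched."""
    if name.startswith("."):
        return False            # dot entries
    if name[:1].isupper():
        return False            # maintenance entries such as Templates or CVS
    return name != "distfiles"  # the one magic lower-case name
```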
needed to promote cdev to cdev_priv, the si_priv pointer was followed.
Use member2struct() to calculate address of the wrapping cdev_priv.
Rename si_priv to __si_reserved.
Tested by: pho
Reviewed by: ed
MFC after: 2 weeks
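The member2struct() idea, recovering the enclosing structure from the address of an embedded member by subtracting the member's offset, can be demonstrated with ctypes. The structure layouts and field names below are hypothetical stand-ins chosen for the illustration, not the kernel's definitions.

```python
import ctypes


class Cdev(ctypes.Structure):
    _fields_ = [("si_flags", ctypes.c_uint)]


class CdevPriv(ctypes.Structure):
    # The public cdev is embedded inside the private wrapper.
    _fields_ = [("cdp_inuse", ctypes.c_uint), ("cdp_c", Cdev)]


def member2struct(struct_type, member_name: str, member_addr: int):
    """Return the enclosing struct_type given the address of one member."""
    offset = getattr(struct_type, member_name).offset
    return struct_type.from_address(member_addr - offset)
```

No stored back-pointer is needed: given the address of the embedded `cdp_c`, the wrapper is found by arithmetic alone.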
a. The BSD version will be built and installed unless
WITHOUT_BSD_CPIO is defined.
b. The GNU version will not be built or installed unless
WITH_GNU_CPIO is defined. If this is defined, the symlink
in /usr/bin will be to the GNU version whether the BSD
version is present or not.
When these changes are MFCed the defaults should be flipped.
2. Add a knob to disable the building of GNU grep. This will
make it easier for those that want to test the BSD version in
the ports.
Approved by: kientzle [1]
Even though I ran a `make universe' to see whether the changes to the
device minor number macros broke the build, I was not expecting `make
universe' to silently continue if build errors occurred, thus causing me
to overlook the build error.
Approved by: philip (mentor)
Pointyhat to: me
since they are only tested for zero/nonzero; but it's arguably a bad
idea to set a {-1, 0} variable to 1 (as happens in this code).
Found by: Coverity Prevent
characters. [1]
Add $FreeBSD$ tag so that I can actually commit this.
PR: bin/118782
Reported by: Bjoern Koenig
Patch by: edwin, Jaakko Heinonen (not used patch)
MFC after: 1 week
Approved by: imp (mentor, implicit)
Starting now, there are two cpio programs in the base system:
/usr/bin/gcpio - GNU cpio
/usr/bin/bsdcpio - bsdcpio
In addition, there is a symlink:
/usr/bin/cpio -> /usr/bin/gcpio (default)
/usr/bin/cpio -> /usr/bin/bsdcpio (WITH_BSDCPIO)
In particular, WITH_BSDCPIO only controls the
symlink; bsdcpio is always built regardless.
Unless there are objections or problems, I intend:
* to make /usr/bin/bsdcpio available in 7.1
* to have /usr/bin/cpio default to bsdcpio in 8.0
(WITH_GCPIO will be an option instead of WITH_BSDCPIO)
* to leave /usr/bin/gcpio in the tree until 9.0
A new implementation of cpio that uses libarchive as its back-end
archiving/dearchiving infrastructure. Includes test harness;
"make check" in the bsdcpio directory to build and run the test
harness.
In addition to a number of bug fixes and minor changes:
* --numeric-owner (ignore user/group names on create and extract)
* -S (sparsify files on extraction)
* -s (regex filename substitutions)
* Use new libarchive 'linkify' to get correct hardlink handling for
both old and new cpio formats
* Rework 'copy' test to be insensitive to readdir() filename ordering
Most of the credit for this work goes to Joerg Sonnenberger, who
has been duplicating features from NetBSD's 'pax' program.
similar to _WANT_UCRED and _WANT_PRISON and seems to be much nicer than
defining _KERNEL.
It is also needed for my sys/refcount.h change going in soon.
NET_NEEDS_GIANT. netatm has been disconnected from the build for ten
months in HEAD/RELENG_7. Specifics:
- netatm include files
- netatm command line management tools
- libatm
- ATM parts in rescue and sysinstall
- sample configuration files and documents
- kernel support as a module or in NOTES
- netgraph wrapper nodes for netatm
- ctags data for netatm.
- netatm-specific device drivers.
MFC after: 3 weeks
Reviewed by: bz
Discussed with: bms, bz, harti
hardlink table for two reasons: 1. If le->name is set to NULL, the
structure le won't be inserted into the table; 2. Even if le somehow
did manage to get into the table with le->name equal to NULL, we would
die when we dereferenced le->name before we could get to the point of
freeing the entry.
Remove the unnecessary "if (le->name != NULL)" test and just free the
pointer.
Found by: Coverity Prevent
running 'tar ""' would print 'No memory' instead of the correct error
message, 'Must specify one of -c, -r, -t, -u, -x' if malloc is set to
System V mode (malloc(0) == NULL).
(in fact, there has never been any way for it to be NULL, going all the
way back to revision 1.1 of this file), so remove the check and
unconditionally free entry.
Found by: Coverity Prevent
handling to bsdtar. When writing archives (including copying via the
@archive directive) a line is output to stderr indicating what is being
done (adding or copying), the path, and how far through the file we are;
extracting currently does not report progress within each file, but
this is likely to happen eventually.
Discussed with: kientzle
Obtained from: tarsnap
files if the existing file is newer than the archive entry).
Currently if any files are ignored, bsdtar will exit with a non-zero
exit status; this is likely to change in the future, but requires some
API changes in libarchive.
Discussed with: kientzle
Obtained from: tarsnap
(all types) used per socket buffer.
Add support to netstat to print out all of the socket buffer
statistics.
Update the netstat manual page to describe the new -x flag
which gives the extended output.
Reviewed by: rwatson, julian
Document this. Do not require channel number in server mode. If not
specified - bind to ''wildcard'' channel zero. Real channel number will
be obtained automatically and registered with local sdpd(8). While I'm
here fix serial port service registration.
Submitted by: luigi
Tested by: Helge Oldach <freebsd-bluetooth at oldach dot net>
MFC after: 3 days
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x).
Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
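The two dimensional lookup and the family-agnostic wrapper described above can be modelled as follows. This is a toy model, not the kernel code: dictionaries with exact-match lookup stand in for the per-family route tries, the argument lists are simplified, and `AF_OTHER` and `AF_MAX` are made-up values for the sketch.

```python
AF_INET = 2
AF_OTHER = 3      # hypothetical non-IPv4 protocol family
AF_MAX = 4        # hypothetical number of families in this model
NUM_FIBS = 16

# rt_tables[fib][family]; row 0 is what FIB-unaware code sees.
rt_tables = [[{} for _ in range(AF_MAX)] for _ in range(NUM_FIBS)]


def rtalloc(family: int, dst: str):
    """Legacy entry point: always consults the first row."""
    return rt_tables[0][family].get(dst)


def in_rtalloc(dst: str, fib: int):
    """IPv4-only entry point with an explicit FIB number."""
    return rt_tables[fib][AF_INET].get(dst)


def rtalloc_fib(family: int, dst: str, fib: int):
    """Family-agnostic entry point: only IPv4 honours the FIB number."""
    if family != AF_INET:
        return rtalloc(family, dst)
    return in_rtalloc(dst, fib)
```

Existing callers of `rtalloc()` keep seeing row 0, while IPv4-aware code can pick any row.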
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
Messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
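The precedence among the sources of a packet's FIB number (cases 1 to 3 above) can be summarised in a short sketch. The function and parameter names are hypothetical; `None` stands for "this source did not assign a FIB".

```python
from typing import Optional


def select_fib(socket_fib: Optional[int] = None,
               classifier_fib: Optional[int] = None) -> int:
    """Pick the FIB for a packet: classifier first, then socket, then 0."""
    if classifier_fib is not None:
        return classifier_fib   # e.g. assigned by an ipfw rule
    if socket_fib is not None:
        return socket_fib       # inherited from the process or set per socket
    return 0                    # default table
```

A forwarded packet that no classifier touches therefore uses FIB 0, and a classifier rule overrides whatever the socket would have chosen.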
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has, interestingly enough, built-in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1-dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (For example, the whole idea of a netmask is foreign
to AppleTalk.) This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached this way would be called. The methods would
have an argument that gives the FIB number, but the protocol would be
free to ignore it.
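A rough sketch of what such a method table could look like follows.
Nothing here exists in the tree: struct domain_sketch, struct
rt_methods_sketch, SKETCH_AF_MAX and the two example lookup functions
are hypothetical names used only to show the dispatch, with one family
that honours the FIB number and one that ignores it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_AF_MAX 4	/* hypothetical number of address families */

struct domain_sketch;

/* Hypothetical routing methods a protocol family would supply. */
struct rt_methods_sketch {
	/* Return an opaque route entry or NULL; a family may ignore fib. */
	const void *(*lookup)(const struct domain_sketch *dom,
	    const void *dst, unsigned int fib);
};

/* Stand-in for the 'domain' structure with the proposed additions. */
struct domain_sketch {
	int dom_family;
	const struct rt_methods_sketch *dom_rtmethods;
	const void *dom_rtdata;		/* opaque to external code */
};

/* Generic entry point: index by family, then call through the pointer. */
const void *
route_lookup_sketch(const struct domain_sketch *const doms[SKETCH_AF_MAX],
    int family, const void *dst, unsigned int fib)
{
	const struct domain_sketch *dom;

	if (family < 0 || family >= SKETCH_AF_MAX)
		return (NULL);
	dom = doms[family];
	if (dom == NULL || dom->dom_rtmethods == NULL ||
	    dom->dom_rtmethods->lookup == NULL)
		return (NULL);
	return (dom->dom_rtmethods->lookup(dom, dst, fib));
}

/* Example family keeping one table per FIB (two toy tables here). */
const void *
perfib_lookup(const struct domain_sketch *dom, const void *dst,
    unsigned int fib)
{
	const char *const *tables = dom->dom_rtdata;

	(void)dst;
	return (fib < 2 ? tables[fib] : NULL);
}

/* Example family with a single table; the FIB number is ignored. */
const void *
flat_lookup(const struct domain_sketch *dom, const void *dst,
    unsigned int fib)
{
	(void)dst;
	(void)fib;
	return (dom->dom_rtdata);
}
```

External code would only ever call route_lookup_sketch(); how a family
stores its routes, and whether it keeps per-FIB tables at all, stays
behind dom_rtdata.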
When the ABI can be changed, it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
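As an illustration of that idea only, the sketch below uses a
hypothetical route_sketch structure holding the three elements
(destination, fib number, resulting entry) and resolves the third from
the first two. The names, the string used in place of a sockaddr and
the exact-match toy table are all assumptions; a real lookup would do
longest-prefix matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a routing table entry; the real one is far richer. */
struct fib_entry_sketch {
	const char *fe_ifname;
};

/* Hypothetical variant of "struct route" carrying a FIB number. */
struct route_sketch {
	const struct fib_entry_sketch *ro_rt;	/* resulting fib entry */
	unsigned int ro_fibnum;			/* proposed addition */
	const char *ro_dst;			/* stand-in for the sockaddr */
};

/* One row of a toy table keyed by (fib, destination). */
struct toy_row {
	unsigned int tr_fib;
	const char *tr_dst;
	struct fib_entry_sketch tr_entry;
};

/*
 * Fill in ro->ro_rt from ro->ro_dst and ro->ro_fibnum.
 * Returns 0 on success, -1 if that FIB has no entry for the address.
 */
int
route_resolve_sketch(struct route_sketch *ro, const struct toy_row *rows,
    size_t nrows)
{
	size_t i;

	assert(ro != NULL && ro->ro_dst != NULL);
	ro->ro_rt = NULL;
	for (i = 0; i < nrows; i++) {
		if (rows[i].tr_fib == ro->ro_fibnum &&
		    strcmp(rows[i].tr_dst, ro->ro_dst) == 0) {
			ro->ro_rt = &rows[i].tr_entry;
			return (0);
		}
	}
	return (-1);
}
```

The same destination can then resolve to different entries depending
on ro_fibnum, which is the point of carrying the number in the
structure.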
Interaction with the ARP layer/LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by IronPort Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: IronPort Systems/Cisco
the vnode pointer is not NULL. This avoids spurious warnings in fstat -v
output for kernel processes.
MFC after: 1 week
PR: amd64/123456
Submitted by: KOIE Hidetaka | hide koie.org
Previously they would have left TIOCEXCL enabled, requiring
either a reboot or use of tip/cu as the root user.
Observed when running QEMU with character devices redirected to pty instances.
MFC after: 2 weeks
* --format can be used with -r or -u
* -o is a synonym for --format=ustar when used with -c, -r, or -u
Also, fix the erroneous sanity check that suppressed --format with -r or -u.
The bug was unnoticed on non-i386 because mp_maxid is
initialized differently there: kern.cp_times doesn't print
zeroes for non-existent CPUs, so no "writing outside of
array bounds" happens.
MFC after: 3 days
src/cddl and src/sys/cddl directories per the core@ decision following
the license review.
This change modifies the affected Makefiles to reference the sources
in their new location.
The current FreeBSD syscall generation script uses all 20 and I need
another open file.
It's a shame that something named the 'one-true-awk' is so limited
by an old definition like FOPEN_MAX when it could just make the file
handling dynamic.
This is done to avoid touching contrib sources on a vendor branch.
spaces in values. Without this change, the following valid
call broke due to parsing of .MAKEFLAGS in bsd.symver.mk:
cd /usr/src/lib/libc && make -n DEBUG_FLAGS="-DFOO -DBAR"
Spotted by: Igor Sysoev
Submitted by: Maxim Dounin, ru
MFC after: 1 week
by default rather than the setmask. This is consistent with the Linux
tool and more consistent with the notion that the default level is
the process level. The cpuset mask can still be modified by specifying
the -c option. You cannot set the per-thread and cpuset mask in
a single command.
- Update the man page to reflect this change.
Contributed by: gallatin