semaphores. Specifically, semaphores are now represented as a new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.
Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem') of an unnamed semaphore (created via sem_init()) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the process
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.
XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.
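
The umask bullet above comes down to one masking step at creation time. A
minimal userland sketch, assuming the usual POSIX rule that the effective
mode is the requested mode with the umask bits cleared.
sketch_effective_mode() is a hypothetical helper for illustration, not the
function used in the kernel:

```c
#include <sys/types.h>

/*
 * Hypothetical helper (not a kernel function): compute the permission
 * bits a newly created semaphore should end up with when the creating
 * process's umask is honored. Only the nine rwx permission bits are kept.
 */
mode_t
sketch_effective_mode(mode_t requested, mode_t cmask)
{
	return (requested & ~cmask & 0777);
}
```

With a umask of 027 and a requested mode of 0660, the result is 0640, so
group write access is denied even though the explicit mode asked for it.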
Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores in a fashion similar to the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.
Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
link, just ignore the -l option and copy the file instead.
In particular, this should fix the COPYTREE_* macros used in the
ports infrastructure which use -l to preserve space but often get
used for cross-device copies.
noticed that a "whereis -qs qemu" matched the distfiles subdir of qemu
rather than /usr/ports/emulators/qemu.
It now ignores all dot entries in /usr/ports, plus all entries
starting with a capital letter (maintenance stuff like Templates, but
also includes subdir CVS), plus /usr/ports/distfiles which is simply a
magic name in that respect.
needed to promote cdev to cdev_priv, the si_priv pointer was followed.
Use member2struct() to calculate address of the wrapping cdev_priv.
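
For readers unfamiliar with the idiom, a minimal sketch of recovering a
wrapping structure from a pointer to one of its members. The struct layouts
and the SKETCH_MEMBER2STRUCT name are hypothetical stand-ins; the real
struct cdev and struct cdev_priv have many more fields:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the embedded and the wrapping structure. */
struct sketch_cdev {
	int	si_flags;
};

struct sketch_cdev_priv {
	int			cdp_flags;
	struct sketch_cdev	cdp_c;	/* embedded public part */
};

/*
 * Subtract the member's offset from the member's address to get the
 * address of the enclosing structure. No stored back pointer is needed.
 */
#define	SKETCH_MEMBER2STRUCT(s, m, x)					\
	((struct s *)(void *)((char *)(x) - offsetof(struct s, m)))

struct sketch_cdev_priv *
sketch_cdev2priv(struct sketch_cdev *c)
{
	return (SKETCH_MEMBER2STRUCT(sketch_cdev_priv, cdp_c, c));
}
```

Because the address is computed rather than loaded, the pointer field that
used to hold it is no longer required.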
Rename si_priv to __si_reserved.
Tested by: pho
Reviewed by: ed
MFC after: 2 weeks
a. The BSD version will be built and installed unless
WITHOUT_BSD_CPIO is defined.
b. The GNU version will not be built or installed unless
WITH_GNU_CPIO is defined. If this is defined, the symlink
in /usr/bin will be to the GNU version whether the BSD
version is present or not.
When these changes are MFCed the defaults should be flipped.
2. Add a knob to disable the building of GNU grep. This will
make it easier for those that want to test the BSD version in
the ports.
Approved by: kientzle [1]
Even though I ran a `make universe' to see whether the changes to the
device minor number macros broke the build, I was not expecting `make
universe' to silently continue if build errors occurred, thus causing me
to overlook the build error.
Approved by: philip (mentor)
Pointyhat to: me
since they are only tested for zero/nonzero; but it's arguably a bad
idea to set a {-1, 0} variable to 1 (as happens in this code).
Found by: Coverity Prevent
characters. [1]
Add $FreeBSD$ tag so that I can actually commit this.
PR: bin/118782
Reported by: Bjoern Koenig
Patch by: edwin, Jaakko Heinonen (not used patch)
MFC after: 1 week
Approved by: imp (mentor, implicit)
Starting now, there are two cpio programs in the base system:
/usr/bin/gcpio - GNU cpio
/usr/bin/bsdcpio - bsdcpio
In addition, there is a symlink:
/usr/bin/cpio -> /usr/bin/gcpio (default)
/usr/bin/cpio -> /usr/bin/bsdcpio (WITH_BSDCPIO)
In particular, WITH_BSDCPIO only controls the
symlink; bsdcpio is always built regardless.
Unless there are objections or problems, I intend:
* to make /usr/bin/bsdcpio available in 7.1
* to have /usr/bin/cpio default to bsdcpio in 8.0
(WITH_GCPIO will be an option instead of WITH_BSDCPIO)
* to leave /usr/bin/gcpio in the tree until 9.0
A new implementation of cpio that uses libarchive as its back-end
archiving/dearchiving infrastructure. Includes test harness;
"make check" in the bsdcpio directory to build and run the test
harness.
In addition to a number of bug fixes and minor changes:
* --numeric-owner (ignore user/group names on create and extract)
* -S (sparsify files on extraction)
* -s (regex filename substitutions)
* Use new libarchive 'linkify' to get correct hardlink handling for
both old and new cpio formats
* Rework 'copy' test to be insensitive to readdir() filename ordering
Most of the credit for this work goes to Joerg Sonnenberger, who
has been duplicating features from NetBSD's 'pax' program.
similar to _WANT_UCRED and _WANT_PRISON and seems to be much nicer than
defining _KERNEL.
It is also needed for my sys/refcount.h change going in soon.
NET_NEEDS_GIANT. netatm has been disconnected from the build for ten
months in HEAD/RELENG_7. Specifics:
- netatm include files
- netatm command line management tools
- libatm
- ATM parts in rescue and sysinstall
- sample configuration files and documents
- kernel support as a module or in NOTES
- netgraph wrapper nodes for netatm
- ctags data for netatm.
- netatm-specific device drivers.
MFC after: 3 weeks
Reviewed by: bz
Discussed with: bms, bz, harti
hardlink table for two reasons: 1. If le->name is set to NULL, the
structure le won't be inserted into the table; 2. Even if le somehow
did manage to get into the table with le->name equal to NULL, we would
die when we dereferenced le->name before we could get to the point of
freeing the entry.
Remove the unnecessary "if (le->name != NULL)" test and just free the
pointer.
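
A minimal sketch of the resulting cleanup pattern, using a hypothetical
entry type (struct sketch_links_entry is not the real table entry). It
assumes, as described above, that entries never carry a NULL name, and
notes that free(NULL) is a no-op in ISO C in any case:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hardlink table entry. */
struct sketch_links_entry {
	char	*name;
};

/* Allocate an entry; returns NULL rather than an entry with no name. */
struct sketch_links_entry *
sketch_entry_new(const char *name)
{
	struct sketch_links_entry *le;
	size_t len;

	if (name == NULL)
		return (NULL);
	le = malloc(sizeof(*le));
	if (le == NULL)
		return (NULL);
	len = strlen(name) + 1;
	le->name = malloc(len);
	if (le->name == NULL) {
		free(le);
		return (NULL);
	}
	memcpy(le->name, name, len);
	return (le);
}

/* Free an entry without testing le->name first. */
void
sketch_entry_free(struct sketch_links_entry *le)
{
	if (le == NULL)
		return;
	free(le->name);		/* safe even if NULL: free(NULL) does nothing */
	free(le);
}
```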
Found by: Coverity Prevent
running 'tar ""' would print 'No memory' instead of the correct error
message, 'Must specify one of -c, -r, -t, -u, -x' if malloc is set to
System V mode (malloc(0) == NULL).
(in fact, there has never been any way for it to be NULL, going all the
way back to revision 1.1 of this file), so remove the check and
unconditionally free entry.
Found by: Coverity Prevent
handling to bsdtar. When writing archives (including copying via the
@archive directive) a line is output to stderr indicating what is being
done (adding or copying), the path, and how far through the file we are;
extracting currently does not report progress within each file, but
this is likely to happen eventually.
Discussed with: kientzle
Obtained from: tarsnap
files if the existing file is newer than the archive entry).
Currently if any files are ignored, bsdtar will exit with a non-zero
exit status; this is likely to change in the future, but requires some
API changes in libarchive.
Discussed with: kientzle
Obtained from: tarsnap
(all types) used per socket buffer.
Add support to netstat to print out all of the socket buffer
statistics.
Update the netstat manual page to describe the new -x flag
which gives the extended output.
Reviewed by: rwatson, julian
Document this. Do not require channel number in server mode. If not
specified - bind to ''wildcard'' channel zero. Real channel number will
be obtained automatically and registered with local sdpd(8). While I'm
here fix serial port service registration.
Submitted by: luigi
Tested by: Helge Oldach <freebsd-bluetooth at oldach dot net>
MFC after: 3 days
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
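
The layering of the three kinds of entry points can be sketched in
userland C. Everything here is a simplified, hypothetical model: the
sketch_ names, the family numbering and the lookup signatures are
assumptions for illustration and do not match the kernel's real
prototypes. The table count of 16 is taken from the limit mentioned above.

```c
#include <stddef.h>

#define	SKETCH_NUM_FIBS	16	/* assumed compile-time table count */
#define	SKETCH_AF_MAX	4	/* hypothetical number of families */
#define	SKETCH_AF_INET	2	/* hypothetical family number for IPv4 */

struct sketch_rt_head {
	int	id;
};

/* Row = FIB number (Y), column = protocol family (X). */
static struct sketch_rt_head *sketch_tables[SKETCH_NUM_FIBS][SKETCH_AF_MAX];

void
sketch_set_table(unsigned fib, unsigned af, struct sketch_rt_head *h)
{
	if (fib < SKETCH_NUM_FIBS && af < SKETCH_AF_MAX)
		sketch_tables[fib][af] = h;
}

/* Legacy-style entry point: unaware of FIBs, always uses row 0. */
struct sketch_rt_head *
sketch_rtalloc(unsigned af)
{
	if (af >= SKETCH_AF_MAX)
		return (NULL);
	return (sketch_tables[0][af]);
}

/* IPv4-only entry point: the extra argument selects the row. */
struct sketch_rt_head *
sketch_in_rtalloc(unsigned fib)
{
	if (fib >= SKETCH_NUM_FIBS)
		return (NULL);
	return (sketch_tables[fib][SKETCH_AF_INET]);
}

/* Protocol-independent entry point: only IPv4 honors the FIB number. */
struct sketch_rt_head *
sketch_rtalloc_fib(unsigned af, unsigned fib)
{
	if (af == SKETCH_AF_INET)
		return (sketch_in_rtalloc(fib));
	return (sketch_rtalloc(af));
}
```

Callers that know nothing about FIBs keep calling the legacy-style
function and see exactly the one dimensional view they saw before.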
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
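
The precedence between classes 1 to 3 above can be summarized in a small
sketch. The struct and function names are hypothetical and the model is
deliberately reduced: it covers only the default, the socket/process FIB
and the classifier override, and leaves out listen sockets, response
packets and tunnels:

```c
/* Hypothetical per-packet context, for illustration only. */
struct sketch_pkt {
	int		has_classifier_fib;	/* set by a packet classifier */
	unsigned	classifier_fib;
	int		has_socket_fib;		/* locally generated packet */
	unsigned	socket_fib;
};

/*
 * Pick the FIB for a packet: a classifier assignment wins over the
 * socket's FIB, which wins over the default of FIB 0.
 */
unsigned
sketch_select_fib(const struct sketch_pkt *p)
{
	if (p->has_classifier_fib)
		return (p->classifier_fib);
	if (p->has_socket_fib)
		return (p->socket_fib);
	return (0);
}
```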
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached this way would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
the vnode pointer is not NULL. This avoids spurious warnings in fstat -v
output for kernel processes.
MFC after: 1 week
PR: amd64/123456
Submitted by: KOIE Hidetaka | hide koie.org
Previously they would have left TIOCEXCL enabled, requiring
either a reboot or use of tip/cu as the root user.
Observed when running QEMU with character devices redirected to pty instances.
MFC after: 2 weeks
* --format can be used with -r or -u
* -o is a synonym for --format=ustar when used with -c, -r, or -u
Also, fix the erroneous sanity check that suppressed --format with -r or -u.
The bug was unnoticed on non-i386 because mp_maxid is
initialized differently, kern.cp_times doesn't print
zeroes for non-existing CPUs, so no "writing outside of
array bounds" happens.
MFC after: 3 days
src/cddl and src/sys/cddl directories per the core@ decision following
the license review.
This change modifies the affected Makefiles to reference the sources
in their new location.
The current FreeBSD syscall generation script uses all 20 and I need
another open file.
It's a shame that something named as the 'one-true-awk' is so limited
by an old definition like FOPEN_MAX when it could just make the file
handling dynamic.
This is done to avoid touching contrib sources on a vendor branch.