about the risks of enabling raw sockets in prisons.
Because raw sockets can be used to configure and interact
with various network subsystems, extra caution should be
used where privileged access to jails is given out to
untrusted parties. As such, by default this option is disabled.
A few others and I are currently auditing the kernel
source code to ensure that the use of raw sockets by
privledged prison users is safe.
Approved by: bmilekic (mentor)
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
Add two additional pairs of assertions, one at the end of the NFS
server event loop, and one one exit from the NFS daemon, that
assert that if debug.mpsafenet is enabled, Giant is not held, and
that if it is not enabled, Giant will be held. This is intended
to support debugging scenarios where Giant is "leaked" during NFS
processing.
my Elektor card. Note that the hints are necessary to specify the
IO base of the pcf chip. This enables to check the IO base when the
probe routine is called during ISA enumeration.
The interrupt driven code is mixed with polled mode, which is wrong
and produces supposed spurious interrupts at each access. I still have
to work on it.
/usr/local/www
[1] Semi-arbitrary cutoff, but I didn't want to add every locale directory
used by ports, because a lot are only used by one or two, and it's less
intrusive for these ports to just clean up after themselves.
MFC after: 2 days
install nfssvc(). It also updates the argument count, but did so
without setting SYF_MPSAFE, effectively removing the MPSAFE flag even
when syscalls.master indicates it doesn't require Giant. This change
forces the modevent to set MPSAFE as a flag to its internal notion of
an argument coutn.
Note: this duplication of information is a bad thing, but is a more
general problem I'm not currently willing to address.
kernel). No other sys/*.h file requires machine/foo.h to be included
before it. In addition, all the files that include rman.h would need
to include those two anyway. From these two perspectives, it is
traditional to include things like this.
This lets us stop treating sys/rman.h specially in every bus frontend
file.
a vnode. Not bumped into with asserts in the main tree because we
run the NFS server with Giant by default. Discovered by inspection.
Complete annotations of Giant acquisition/release to note that it's
only because of VFS that we acquire Giant in most places in the NFS
server.
explicitly fsynced after kernel messages are logged. This option
should be syntax compatible with a similar option in Linux syslogd.
I've made some small changes to Pekka's patch, hoepfully I haven't
goofed anything.
PR: 66790
Submitted by: Pekka Savola <pekkas@netcore.fi>
Obtained from: Martin Schulze's syslogd
MFC after: 1 month
arguments to the needed type and so the result type depended on the argument
type. Fixing them isn't really worth the effort because GCC emits the same
assembler code with or without them.
Not minded by: ru
Approved by: das (mentor)
long time, i.e., since the cleanup of the VM Page-queues code done two
years ago.
Reviewed by: Alan Cox <alc at freebsd.org>,
Matthew Dillon <dillon at backplane.com>
Syslogd should ensure that f_file is a valid file descriptor when
f_type is FILE, CONSOLE, TTY and for a PIPE where f_pid > 0. If the
descriptor is closed/invalid then the type should be set to UNUSED
or the pid should be set to 0.
To this end:
1) Don't close(f->f_file) if we can't send a message to a remote
host because the file descriptor used for remote logging is
stored in finet, not in f->f_file. f->f_file is probably
uninitialised, so I guess we usually end up closing fd 0.
2) Don't close PIPE file descriptors if they are invalid.
3) If the call to p_open fails, don't set the pid.
The OpenBSD patches in this area set f_file to -1 after the fd is
closed and then avoids calling close if f_file < 0. I haven't done
this, but it might be a good idea too.
Inspired by: PR 67139/OpenBSD