security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.
This commit makes the following changes:
- Add a comment to rtioctl warning developers that if they add
any ioctl commands, they should use super-user checks where necessary,
as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
is PRISON root, make sure the socket option name is IP_HDRINCL,
otherwise deny the request.
Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.
Looking forward, we will probably want to eliminate the
references to curthread.
This may be a MFC candidate for RELENG_5.
Reviewed by: rwatson
Approved by: bmilekic (mentor)
result of the notify() function to decide if we need to unlock the
in6pcb or not, rather than always unlocking. Otherwise, we may unlock
and already unlocked in6pcb.
Reported by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Tested by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Discussed with: mdodd
UDP/IP header, make sure that space is also allocated for the link
layer header. If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer. This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
to check aperture size, avoiding hangs. Maintain the rest of the bits when
setting/unsetting ATTBASE. This essentially matches Linux's AGP driver as well.
PR: kern/70037
Submitted by: Mark Tinguely <tinguely at casselton dot net>
Obtained from: NetBSD
in the shutdown_final state if the RB_NOSYNC flag is set.
The specific motivation in this case is that a system panic in an
interrupt context results in a call to module_shutdown(), which
calls g_modevent(), which calls g_malloc(..., M_WAITOK), which
results in a second panic. While g_modevent() could be fixed to
not call malloc() for MOD_SHUTDOWN events (which it doesn't handle
in any case), it is probably also a good idea to entirely skip the
execution of the module shutdown handlers after a panic.
This may be a MFC candidate for RELENG_5.
shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.
This is a MFC candidate for RELENG_5.
MFC after: 3 days
systems that have overlapping regions specified in their sysresource
objects. This patch fixes ATA DMA and acpi_timer allocation for such
sysctems. It should eventually be moved to resource_list_add() if it is
a valid generalized approach. The minimal approach for 5.3 is:
"Loop through all current resources to see if the new one overlaps
any existing ones. If so, the old one always takes precedence and
the new one is adjusted (or rejected). We check for three cases:
1. Tail of new resource overlaps head of old resource: truncate the
new resource so it is contiguous with the start of the old.
2. New resource wholly contained within the old resource: error.
3. Head of new resource overlaps tail of old resource: truncate the
new resource so it is contiguous, following the old."
Tested by: Radek Kozlowski <radek_at_raadradd.com>
Discussed with: imp
MFC after: 4 days
of 0x3f2-0x3f5,0x3f7 the ports are not 7 bytes apart. This should fix
floppy probing on such systems. (We handle the case of adjusting for
a start of 0x3f2 -> 0x3f0 separately, although that code should still be
checked if there are still floppy problems for others.)
Tested by: Sarunas Vancevicius <vsarunas_at_eircom.net>
MFC after: 3 days
sockets are connection-oriented for the purposes of kqueue
registration. Since UDP sockets aren't connection-oriented, this
appeared to break a great many things, such as RPC-based
applications and services (i.e., NFS). Since jmg isn't around I'm
backing this out before too many more feet are shot, but intend to
investigate the right solution with him once he's available.
Apologies to: jmg
Discussed with: imp, scottl
Centralize the fdctl_wr() function by adding the offset in
the resource to the softc structure.
Bugfix: Read the drive-change signal from the correct place:
same place as the ctl register.
Remove the cdevsw{} related code and implement a GEOM class.
Ditch the state-engine and park a thread on each controller
to service the queue.
Make the interrupt FAST & MPSAFE since it is just a simple
wakeup(9) call.
Rely on a per controller mutex to protect the bioqueues.
Grab GEOMs topology lock when we have to and Giant when
ISADMA needs it. Since all access to the hardware is
isolated in the per controller thread, the rest of the
driver is lock & Giant free.
Create a per-drive queue where requests are parked while
the motor spins up. When the motor is running the requests
are purged to the per controller queue. This allows
requests to other drives to be serviced during spin-up.
Only setup the motor-off timeout when we finish the last
request on the queue and cancel it when a new request
arrives. This fixes the bug in the old code where the motor
turned off while we were still retrying a request.
Make the "drive-change" work reliably. Probe the drive on
first opens. Probe with a recal and a seek to cyl=1 to
reset the drive change line and check again to see if we
have a media.
When we see the media disappear we destroy the geom provider,
create a new one, and flag that autodetection should happen
next time we see a media (unless a specific format is configured).
Add sysctl tunables for a lot of drive related parameters.
If you spend a lot of time waiting for floppies you can
grab the i82078 pdf from Intels web-page and try tuning
these.
Add sysctl debug.fdc.debugflags which will enable various
kinds of debugging printfs.
Add central definitions of our well known floppy formats.
Simplify datastructures for autoselection of format and
call the code at the right times.
Bugfix: Remove at least one piece of code which would have
made 2.88M floppies not work.
Use implied seeks on enhanced controllers.
Use multisector transfers on all controllers. Increase
ISADMA bounce buffers accordingly.
Fall back to single sector when retrying. Reset retry count
on every successful transaction.
Sort functions in a more sensible order and generally tidy
up a fair bit here and there.
Assorted related fixes and adjustments in userland utilities.
WORKAROUNDS:
Do allow r/w opens of r/o media but refuse actual write
operations. This is necessary until the p4::phk_bufwork
branch gets integrated (This problem relates to remounting
not reopening devices, see sys/*/*/${fs}_vfsops.c for details).
Keep PC98's private copy of the old floppy driver compiling
and presumably working (see below).
TODO (planned)
Move probing of drives until after interrupts/timeouts work
(like for ATA/SCSI drives).
TODO (unplanned)
This driver should be made to work on PC98 as well.
Test on YE-DATA PCMCIA floppy drive.
Fix 2.88M media.
This is a MT5 candidate (depends on the bioq_takefirst() addition).
is an effective band-aid for at least some of the scheduler corruption seen
recently. The real fix will involve protecting threads while they are
inconsistent, and will come later.
Submitted by: julian
requires a recompile of netgraph users.
Also change the size of a field in the bluetooth code
that was waiting for the next change that needed recompiles so
it could piggyback its way in.
Submitted by: jdp, maksim
MFC after: 2 days
changes to the ATA driver cause a kernel crash, no fault of the ATA
code. Work is in progress to add the necessary feature to the sparc64
kernel and this commit will be backed out when it is complete. This
bandaid is being put in mostly in the interests of getting the first
release snapshot done and out the door.
Tested on: Ultra-10 exhibiting the insta-panic.
MFC: Real Soon
If the bioq is empty, NULL is returned. Otherwise the front element
is removed and returned.
This can simplify locking in many drivers from:
lock()
bp = bioq_first(bq);
if (bp == NULL) {
unlock()
return
}
bioq_remove(bp, bq)
unlock
to:
lock()
bp = bioq_takefirst(bq);
unlock()
if (bp == NULL)
return;
Since pmap_enter() calls pmap_invalidate_page(), which needs interrupts
enabled in the SMP case, we defer the disable to right before saving the
register context. This has been incorrect for about a year but caused no
real problems because the identity page never actually replaces a previously
mapped page and suspend/resume on SMP systems has been uncommon.
Tested by: sos
MFC after: 3 days
the ipfw KLD.
For IPFIREWALL_FORWARD this does not have any side effects. If the module
has it but not the kernel it just doesn't do anything.
For IPDIVERT the KLD will be unloadable if the kernel doesn't have IPDIVERT
compiled in too. However this is the least disturbing behaviour. The user
can just recompile either module or the kernel to match the other one. The
access to the machine is not denied if ipfw refuses to load.
have been unified with that of msleep(9), further refine the sleepq
interface and consolidate some duplicated code:
- Move the pre-sleep checks for theaded processes into a
thread_sleep_check() function in kern_thread.c.
- Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
Specifically, if a thread is awakened by something other than a signal
while checking for signals before going to sleep, clear TDF_SINTR in
sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in
that edge case during an interruptible sleep. Also, fix
sleepq_check_signals() to properly handle the condition if TDF_SINTR is
clear rather than requiring the callers of the sleepq API to notice
this edge case and call a non-_sig variant of sleepq_wait().
- Clarify the flags arguments to sleepq_add(), sleepq_signal() and
sleepq_broadcast() by creating an explicit submask for sleepq types.
Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of
0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and
move the setting of TDF_SINTR to sleepq_add() if this flag is set rather
than sleepq_catch_signals(). Note that it is the caller's responsibility
to ensure that sleepq_catch_signals() is called if and only if this flag
is passed to the preceeding sleepq_add(). Note that this also removes a
sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures
that for an interruptible sleep, TDF_SINTR is always set when
TD_ON_SLEEPQ() is true.
has only been partly initialized via newfs(8) so that it applies to both
UFS1 and UFS2.
Submitted by: "Xin LI" delphij at frontfree dot net
MFC: maybe?
lock is not held.
Rather than annotating that the lock is released after calls to
unp_detach() with a comment, annotate with an assertion.
Assert that the UNIX domain socket subsystem lock is not held when
unp_externalize() and unp_internalize() are called.
This provides greater context for the locking and allows us to avoid
locking the pcbinfo structure if not binding operations will take
place (i.e., already bound, connected, and no expliti sendto()
address).