- Don't try to grab Giant before postsig() in userret() as it is no longer
needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.
proctree lock and the process lock are held when updating p_pptr and
p_oppid. When we are just reaading p_pptr we only need the proc lock and
not a proctree lock as well.
Giant. The only exception is the CANSIGNAL() macro. Unlocking the proc
lock around sendsig() in trapsignal() is also questionable. Note that
the functions sigexit(), psignal(), and issignal() must be called with
the proc lock of the process in question held. postsig() and
trapsignal() should not be called with the proc lock held, but they
also do not require Giant anymore either.
- Remove spl's that are now no longer needed as they are fully replaced.
don't end up back at ourselves which would indicate deadlock.
- Add the proc lock to the witness dup_list as we may hold more than one
process lock at a time.
- Don't assert a mutex is owned in _mtx_unlock_sleep() as that is too late.
We do the checks in the macros instead.
mutex operations in kthread_create().
- Lock a kthread's proc before changing its parent via proc_reparent().
- Test P_KTHREAD not P_SYSTEM in kthread_suspend() and kthread_resume().
P_SYSTEM just means that the process shouldn't be swapped and is used
for vinum's daemon for example.
- Lock all the signal state used for suspending and resuming kthreads with
the proc lock.
- Add proc locking to fork1(). Always lock the child procoess (new
process) first when both processes need to be locked at the same
time.
- Remove unneeded spl()'s as the data they protected is now locked.
- Ensure that the proctree is exclusively locked and the new process is
locked when setting up the parent process pointer.
- Lock the check for P_KTHREAD in p_flag in fork_exit().
possible for us to see a process in the early stages of fork before p_fd
has been initialized. Ideally, we wouldn't stick a process on the allproc
list until it was fully created however.
than dinking around in the process lists explicitly.
- Hold both the proctree lock and proc lock of the child process when
reparenting a process via proc_reparent.
- Lock processes while sending them signals.
- Miscellaenous proc locking.
- proc_reparent() now asserts that the child is locked in addition to an
exclusive proctree lock.
- Move the _mtx_assert() prototype up to the top of the file with the rest
of the function prototypes.
- Define all the mtx_foo() macros in terms of mtx_foo_flags().
- Add a KASSERT() to check for invalid options in mtx_lock_flags().
- Move the mtx_assert() to ensure a mutex is owned before releasing it
in front of WITNESS_EXIT() in all the mtx_unlock_* macros.
- Change the MPASS* macros to be on #ifdef INVARIANTS, not just #ifdef
MUTEX_DEBUG since most of them check to see that the mutex functions are
called properly. Define MPASS4() in terms of KASSERT() to do this.
- Define MPASS{,[23]} in terms of MPASS4() to simplify things and avoid
code duplication.
that write access to a member requires both locks and read access only
requires one of the given locks. Convert instances of '(c+)' to
'(c + k)' as a result.
- Change p_pptr from (e) to (c + e).
- Change p_oppid from (c) to (c + e).
- Change p_args from (b?) to (c + k).
- Move the actual work of STOPEVENT, PHOLD, and PRELE to _STOPEVENT,
_PHOLD, and _PRELE. The new macros do not acquire the proc lock and
simply assert that it is held. The non _ prefixed macros acquire the
proc lock and then call the _ prefixed macros.
- Add a PROC_LOCK_NOSWITCH() macro to be used when releasing the proc lock
while already holding a spin lock (usually sched_lock).
- Add a PROC_LOCK_ASSERT() macro to be used to make assertions about the
proc lock. It takes the usual mtx_assert() macro arguments as its
second argument.
INVARIANTS case, define the actual KASSERT() in _SX_ASSERT_[SX]LOCKED
macros that are used in the sx code itself and convert the
SX_ASSERT_[SX]LOCKED macros to simple wrappers that grab the mutex for the
duration of the check.
support implementations of ACLs in file systems. Introduce the
following new functions:
vaccess_acl_posix1e() vaccess() that accepts an ACL
acl_posix1e_mode_to_perm() Convert mode bits to ACL rights
acl_posix1e_mode_to_entry() Build ACL entry from mode/uid/gid
acl_posix1e_perms_to_mode() Generate file mode from ACL
acl_posix1e_check() Syntax verification for ACL
These functions allow a file system to rely on central ACL evaluation
and syntax checking, as well as providing useful utilities to
allow ACL-based file systems to generate mode/owner/etc information
to return via VOP_GETATTR(), and to support file systems that split
their ACL information over their existing inode storage (mode, uid,
gid) and extended ACL into extended attributes (additional users,
groups, ACL mask).
o Add prototypes for exported functions to sys/acl.h, sys/vnode.h
Reviewed by: trustedbsd-discuss, freebsd-arch
Obtained from: TrustedBSD Project
but potentially significant in -4.x.)
Eliminate a pointless parameter to aio_fphysio().
Remove unnecessary casts from aio_fphysio() and aio_physwakeup().
- Add sx_xholder member to sx struct which is used for INVARIANTS-enabled
assertions. It indicates the thread that presently owns the xlock.
- Add some assertions to the sx lock code that will detect the fatal
API abuse:
xlock --> xlock
xlock --> slock
which now works thanks to sx_xholder.
Notice that the remaining two problematic cases:
slock --> xlock
slock --> slock (a little less problematic, but still recursion)
will need to be handled by witness eventually, as they are more
involved.
Reviewed by: jhb, jake, jasone
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.
Submitted by: dfr
Debugging help: peter
Tested by: me
cursor.
The reason is: mouse cursor goes into hide/visible loop while text cursor even
not moved.
PR: 25536
Submitted by: David Xu <davidx@viasoft.com.cn>
palette. As a result, the colors on the video console can look rather
weird. For example, sysinstall on the alpha has a read background. We
can work around this partially by remapping the colors used by syscons for
the ANSI color escape sequences. Note that screen savers and anything that
sets the colors explicitly will still get incorrect colors, but programs
such as sysinstall will now use the correct colors. A more correct fix
would be to actually fix the VGA palette on boot by either swapping all
the red and blue attributes or by hardcoding a standard palette and
overwriting the entire palette.
Requested by: gallatin
Obtained from: NetBSD
This lets us run programs containing newer (eg bwx) instructions
on older (eg EV5 and less) machines. One win is that we can
now run Acrobat4 on EV4s and EV5s.
Obtained from: NetBSD
Glanced at by: mjacob
MFS: bring the consistent `compat_3_brand' support
This should fix the linux-related panics reported
by naddy@mips.inka.de (Christian Weisgerber)
Forgotten by: obrien
it doesn't block packets whose destination address has been translated to
the loopback net by ipnat.
Add warning comments about the ip_checkinterface feature.
aiocb's allocated by zalloc(). In other words, zfree() was never
called. Now, we call zfree(). Why eliminate this micro-
optimization? At some later point, when we multithread the AIO
system, we would need a mutex to synchronize access to aio_freejobs,
making its use nearly indistinguishable in cost from zalloc() and
zfree().
Remove unnecessary fhold() and fdrop() calls from aio_qphysio(),
undo'ing a part of revision 1.86. The reference count on the file
structure is already incremented by _aio_aqueue() before it calls
aio_qphysio(). (Update the comments to document this fact.)
Remove unnecessary casts from _aio_aqueue(), aio_read(), aio_write()
and aio_waitcomplete().
Remove an unnecessary "return;" from aio_process().
Add "static" in various places.
However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref
count is > 1, we won't decrement the count at all. This could lead to
route entries never being deleted.
Here, we call rtfree() not only if the initial two conditions fail, but
also if the ref count is > 1 (and we therefore don't immediately delete
the route, but let rtfree() handle it).
This is an urgent MFC candidate. Thanks go to Mike Silbersack for the
fix, once again. :-)
Submitted by: Mike Silbersack <silby@silby.com>
addressed to the interface on the other side of the box follow their
historical path.
Explicitly block packets sent to the loopback network sent from the outside,
which is consistent with the behavior of the forwarding path between
interfaces as implemented in in_canforward().
Always check the arrival interface when matching the packet destination
against the interface broadcast addresses. This bug allowed TCP
connections to be made to the broadcast address of an interface on the
far side of the system because the M_BCAST flag was not set because the
packet was unicast to the interface on the near side. This was broken
when the directed broadcast code was removed from revision 1.32. If
the directed broadcast code was stil present, the destination would not
have been recognized as local until the packet was forwarded to the output
interface and ether_output() looped a copy back to ip_input() with
M_BCAST set and the receive interface set to the output interface.
Optimize the order of the tests.
Reviewed by: jlemon
related code from aio_read() and aio_write(). This field was
intended, but never used, to allow a mythical user-level library to
make an aio_read() or aio_write() behave like an ordinary read() or
write(), i.e., a blocking I/O operation.
user space. It has already been copied in and mp->mnt_stat.f_mntonname has
already been initialised by the caller.
This fixes a panic on the alpha caused by the fact that the variable
'size' wasn't initialised because the call to copyinstr() bailed out with
an EFAULT error.
bolted to a ne-2000 chip. This is necessary for the NetGear FA-410TX
and other cards.
This also requires you add mii to your kernel if you have an ed driver
configured.
This code will result in a couple of timeout messages for ed on the
impacted cards. Additional work will be needed, but this does work
right now, and many people need these cards.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
`infrastructure' built with INVARIANT_SUPPORT for kern_mutex.c essentially
involves _mtx_assert(), which makes use of constants that were defined
under #ifdef INVARIANTS here.
This is a pretty invasive change, but there are three good
reasons to do this:
1. We'll never have > 16 bits of handle.
2. We can (eventually) enable the RIO (Reduced Interrupt Operation)
bits which return multiple completing 16 bit handles in mailbox
registers.
3. The !)$*)$*~)@$*~)$* Qlogic target mode for parallel SCSI spec
changed such that at_reserved (which was 32 bits) was split into
two pieces- and one of which was a 16 bit handle id that functions
like the at_rxid for Fibre Channel (a tag for the f/w to correlate
CTIOs with a particular command). Since we had to muck with that
and this changed the whole handler architecture, we might as well...
Propagate new at_handle on through int ct_fwhandle. Follow
implications of changing to 16 bit handles.
These above changes at least get Qlogic 1040 cards working in target
mode again. 1080/12160 cards don't work yet.
In isp.c:
Prepare for doing all loop management in outer layers.
with egcs-1.1.1. bus_space_write_multi_2() had an extra operation that
should have been removed.
Remove it.
This fixes the panic when bus_space_write_multi_2() is used.
Obtained from: jake
- Add a KASSERT() to ensure an ithread has a backing kernel thread when we
schedule it.
- Don't attempt to preemptively switch to an ithread if p_stat of curproc
is not SRUN.
newbus in revision 1.19. As a result, lnc was, I believe, broken
for all PCI cards. The softc fields `lnc_btag' and `lnc_bhandle'
were not initialised, `rap', `rdp' and `bdp' were initialised to
the wrong values, and the size of the DMA ring memory was calculated
incorrectly.
Paul Richards has further cleanups in the pipeline, but this at
least is enough to make the driver usable with VMware.
Approved by: paul
is to return EINPROGRESS, EALREADY, (so_error ONCE), EISCONN. Certain
linux applications rely on the so_error (normally 0) being returned in
order to operate properly.
Tested by: Thomas Moestl <tmoestl@gmx.net>
An initial tidyup of the mount() syscall and VFS mount code.
This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.
* the guts of the mount work has been moved into vfs_mount().
* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.
* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.
* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)
* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.
* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.
* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
of memory, rather than from the start.
This fixes problems allocating bouncebuffers on alphas where there is only
1 chunk of memory (unlike PCs where there is generally at least one small
chunk and a large chunk). Having 1 chunk had been fatal, because these
structures take over 13MB on a machine with 1GB of ram. This doesn't leave
much room for other structures and bounce buffers if they're at the front.
Reviewed by: dfr, anderson@cs.duke.edu, silence on -arch
Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>
for selecting unit). Instead, use the resource hints mechanism.
One unfortunate situation here is that there is no resource_quad_value
function- which is what I needed for WWN boot time replacement. Worse-
you can't store the hint as just plain
hint.isp.0.nodewwn="0x50000000aaaa0001"
because this gets interpreted as an int- incorrectly because it can't
be converted to an int. I can't even get this as a string. To work
around this particular case for nodewwn && portwwn setting, this
rather grotesque form will be used:
hint.isp.0.nodewwn="w50000000aaaa0001"
hint.isp.0.portwwn="w50000000aaaa0002"
At the same time, if we have no hinted WWN, set the default WWN (which, btw,
gets overridden if the card has valid NVRAM, which is usual) to
0x400000007F000009ull (which translates to NAA == IPv4, 127.0.0.9).
Eliminate more printf's and replace them either with device_printf or
isp_prt calls.
ones where we have a CAM path) and replacing them with calls to isp_prt.,
Eliminate isp_unit references- we no longer have an isp_unit- we now
have an isp_dev that device_get_unit can work with.
`rootvnode' pointer, but vfs_syscalls.c's checkdirs() assumed that
it did. This bug reliably caused a panic at reboot time if any
filesystem had been mounted directly over /.
The checkdirs() function is called at mount time to find any process
fd_cdir or fd_rdir pointers referencing the covered mountpoint
vnode. It transfers these to point at the root of the new filesystem.
However, this process was not reversed at unmount time, so processes
with a cwd/root at a mount point would unexpectedly lose their
cwd/root following a mount-unmount cycle at that mountpoint.
This change should fix both of the above issues. Start_init() now
holds an extra vnode reference corresponding to `rootvnode', and
dounmount() releases this reference when the root filesystem is
unmounted just before reboot. Dounmount() now undoes the actions
taken by checkdirs() at mount time; any process cdir/rdir pointers
that reference the root vnode of the unmounted filesystem are
transferred to the now-uncovered vnode.
Reviewed by: bde, phk
a regular basis. Adjust our linux emulation to conform. This will
cause more dirty pages to be left for the pagedaemon to deal with,
but our new low-memory handling code can deal with it. The linux
way appears to be a trend, and we may very well make MAP_NOSYNC the
default for FreeBSD as well (once we have reasonable sequential
write-behind heuristics for random faults).
(will be MFC'd prior to 4.3 freeze)
Suggested by: Andrew Gallatin
make sure that PG_NOSYNC is properly set. Previously we only set it
for a write-fault, but this can occur on a read-fault too.
(will be MFCd prior to 4.3 freeze)
hit on the client side and prevent the server side from retiring writes.
Pipeline operations turned off for all READs (no big loss since reads are
usually synchronous) and for NFS writes, and left on for the default bwrite().
(MFC expected prior to 4.3 freeze)
Testing by: mjacob, dillon
update native priority, it is diffcult to get right and likely
to end up horribly wrong. Use an honestly wrong fixed value
that seems to work; PUSER for user threads, and the interrupt
priority for ithreads. Set it once when the process is created
and forget about it.
Suggested by: bde
Pointy hat: me
if an arriving packet belongs to us, also check that the packet arrived
through the correct interface. Skip this check if the packet was locally
generated.
CVSrepo deletion of the previous attempt will be requested:
--original message--
Add the 'virtual nulmodem driver'
Particularly useful for debuging kernels using vmware.
If your name is Bruce evans and you are a WIZ at tty interfaces,
then you should probably rip this to shreds and offer lots of suggestions and
patches. I've been using this since 4.0-CURRENT and it's never caused
problems but I'm sure I got something wrong. This is similar to the pty/cty
device driver except that both sides are ttys. Even minor numbers
are side A and odd minor numbers are side B.
Work needs to be done regarding what happens to the other side when you
close a node.
to use with vmware, configure vmware to redirect COM2 out to side A of one
of these and boot a kernel with teh gdb remote port set to sio1.
AFTER dropping into the gdb kernel debugger in your test kernel,
fire up gdb with it's remote port pointing at the appropriate side B.
To catch all console output, you can boot the vmware kernel with a serial
console, (COM1) similarly redirected to a nulmodem, and use 'tip' to observe it.
This is practically unaltered since pre 4.0 days except for
changes made along the way needed to make it compile, so any suggestions
or offers of total rewrites will be listenned to :-)
When we recieve a fragmented TCP packet (other than the first) we can't
extract header information (we don't have state to reference). In a rather
unelegant fashion we just move on and assume a non-match.
Recent additions to the TCP header-specific section of the code neglected
to add the logic to the fragment code so in those cases the match was
assumed to be positive and those parts of the rule (which should have
resulted in a non-match/continue) were instead skipped (which means
the processing of the rule continued even though it had already not
matched).
Fault can be spread out over Rich Steenbergen (tcpoptions) and myself
(tcp{seq,ack,win}).
rwatson sent me a patch that got me thinking about this whole situation
(but what I'm committing / this description is mine so don't blame him).
rather than in silly places like "VFS Cluster debugging". People
should really be using COMPAT_LINUX instead of the linux module on
dynamic systems like -current.
process's priority go through the roof when it released a (contested)
mutex. Only set the native priority in mtx_lock if hasn't already
been set.
Reviewed by: jhb
in VMware reports 0x00000000 in the PCI subsystem ID register, but
0x10001000 when you read the mirror registers in I/O space. This causes
pcn_probe() to think it's found a card in 32-bit mode, and performing
a 32-bit I/O access makes on a 16-bit port makes VMware go boom. Special
case the 0x10001000 value until somebody at VMware grows a clue.
Finally discovered by: Andrew Gallatin
For TCP, verify that the sequence number in the ICMP packet falls within
the tcp receive window before performing any actions indicated by the
icmp packet.
Clean up some layering violations (access to tcp internals from in_pcb)
handle read and write requests for widths of multiple bytes. This
can be used to read 16-bit battery status registers for example.
- Remove some unused variables and #if 0'd debugging cruft.
- Don't complain about a GPE query that fails due to AE_NOT_FOUND if the
query method was _Q00.
from a BIF, use the size of the destinatino buffer, not the length of the
string to determine where to put the nul char. As a side effect, the
old code would truncate the string by one character while it was possibly
overflowing the buffer.
This piece of code has not been referenced since it was put there
in 1995. Also done a codebased search on popular networking libraries
and third-party applications. This is an orphan.
Reviewed by: jesper
o Allocate memory mapped by pcic even when not used for ncv.
This is for PC-Cards which needs offset, because I/O space should not be
used by other devices.
Pointed-out-by: YAMAMOTO Shigeru <shigeru@iij.ad.jp>
Incredibally useful for debugging kernels using vmware.
Vmware com1 is diverted to one side, and gdb listens to the other side.
viola.. instant debugging sandbox on one system.
Since we know there's always an upper bound we force that bound,
otherwise users can cause a panic via malloc getting hit with a
odd (huge or negative) amount of memory to allocate.
Tested by: kris
Pointed out by: Andrey Valyaev <dron@infosec.ru>
If you ever want to run midi(4) out of the giant lock, uncomment
MIDI_OUTOFGIANT in midi.h. Confirmed to work for csamidi with WITNESS
and INVARIANTS.
- midi_info, midi_open and seq_info are now tailqs, allowing arbitrary
numbers of devices to be configured.
- Do not send an active sensing message to reset midi modules.
- Clone /dev/sequencer*. /dev/sequencer0 and /dev/sequencer are generated
upon initialization.
Includes the following revisions from KAME (two of these were actually
committed previously but the CVS revisions weren't documented):
1.40 kame/kame/sys/netinet6/ah_core.c (committed in previous rev)
1.41 kame/kame/sys/netinet6/ah_core.c
1.28 kame/kame/sys/netinet6/ah_output.c (committed in previous rev)
1.29 kame/kame/sys/netinet6/ah_output.c
1.30 kame/kame/sys/netinet6/ah_output.c
1.129 kame/kame/sys/netinet6/nd6.c
1.130 kame/kame/sys/netinet6/nd6.c
1.24 kame/kame/sys/netinet6/dest6.c
1.25 kame/kame/sys/netinet6/dest6.c
Obtained from: KAME
- Convert to a more efficient queueing implementation.
- Don't allocate command buffers on the fly; simply work from a
static pool.
- Add a control device interface, for later use.
- Handle controller overload better as a consequence of the
improved queue implementation.
- Add support for the XPT_GET_TRAN_SETTINGS ccb, and correctly
set the virtual SCSI channels up for multiple outstanding I/Os.
- Update copyrights for 2001.
- Some whitespace fixes to improve readability.
Due to a misunderstanding on my part, previous versions of the
driver were limited to a single outstanding I/O per virtual drive.
Needless to say, this update improves performance substantially.
connection, but send it immediately. Prior to this change, it was possible
to delay a delayed-ack for multiple times, resulting in degraded TCP
behavior in certain corner cases.
o Offset and period in synch messages and width negotiation should be
done for per target not per lun. Move these from *lun_info to
*targ_info.
o Change in handling XPT_RESET_DEV and XPT_GET_TRAN_SETTINGS .
o Change CAM_* xpt_done return values.
o Busy loop did not timeout. Change this to timeout as original NetBSD/pc98.
Reviewed by: bsd-nomads ML
by the compiler. ie: char foo[0] comes out as 4 bytes on a.out, and
we depended on it coming out as 0 for the script version. :-(
Make double sure that genassym.o is built and nm'ed in elf mode.
(ia64 skipped since it is stuck on the linux toolchain and doesn't
understand the -elf switches)
gcc -aout -mno-underscores. The bioscall.s tweak is not an a.out
requirement really, but to work around the bugs in the antique version of
gas that used for a.out. Makefile hacks are all that is needed to
get an a.out kernel. There is no telling if it will work though.
This is little more than an academic curiosity anyway since all it is
good for is situations where the boot code is hard wired, eg: rom
bootstraps (such as the gnat box).
GENERIC:
...
size -aout kernel ; chmod 755 kernel
text data bss dec hex
3051520 368640 198688 3618848 373820
and used in C or vice versa. The elf compiler uses the same names
for both. Remove asnames.h with great prejudice; it has served its
purpose.
Note that this does not affect the ability to generate an aout kernel
due to gcc's -mno-underscores option.
moral support from: peter, jhb
ehternet frames to a netgraph hook.
Submitted by: "Vitaly V. Belekhov" <vitaly@riss-telecom.ru>
translated to 5.0 by me. man page not yet written.
This node still needs a little work.. don't use yet. Not yet linked into
the build.