one the number of variables needed for top and other setgid kmem
utilities that could only be accessed via /dev/kmem previously.
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: freebsd-audit
- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
not be retried. It is an indication that there was an error that was
corrected during the execution of the command. This is per ANSI SCSI2
spec.
It's possible that these should also be noted to the console (as indicative,
perhaps, of growing media defect lists in drives), but the default of
printing errors out if bootverbose in this case is probably enough.
Also, there'd been a missing ERESTART for that clause anyway.
2. If you have an ABORTED COMMAND, it's almost invariably a SCSI parity
error. You should never be silent about these since users should do something
about this if it occurs (moving that power cord *away* from the SCSI cable is
always a good first start). This should print irrespective of bootverbose
because it's an actual real error even if we retry a transmission.
Reviewed by: audit@freebsd.org, gibbs@freebsd.org
- Missing cpu_to_scr() added (endian-ness).
Improvement (fix|workaroung??):
- Blindly firing a PPR can lead to some messy situations due to
various causes or misfeatures, for example:
* The 53C1010-[33|66] supports offset 62 in DT mode, but only
offset 31 in ST mode. As a result, a PPR(DT, offset 62)
responded with PPR(ST, any offset > 31) must be rejected.
* A device that doesn't know about PPR should reject it, but
may also be confused by this message.
When a PPR encounters problems, the driver now patches the goal
transfer settings for legacy negotiations to be performed later
with the offending target. This give a chance for bad situations
to be fixed automagically.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb
were performed to determine if the received packet should be reset. This
created erroneous ratelimiting and false alarms in some cases. The code
has now been reorganized so that the checks for validity come before
the call to badport_bandlim. Additionally, a few changes in the symbolic
names of the bandlim types have been made, as well as a clarification of
exactly which type each RST case falls under.
Submitted by: Mike Silbersack <silby@silby.com>
and function argument declarations. Make sure that functions that are
supposed to return a pointer return NULL in case of failure. Don't cast
NULL. Finally, get rid of annoying `register' uses.
isp_iid_set/isp_iid for fibre channel- this is because we now
fake a port database entry for ourselves. Add the additional loop
states between LOOP_PDB_RCVD and LOOP_READY.
Change and comment on a wad of Fibre Channel isp_control functions.
Change and comment on some of the ISPASYNC Fibre Channel events.
the unit number doesn't get reused.
Make sure that if we've compiled for ISP_TARGET_MODE we set the
default role to be ISP_ROLE_INITIATOR|ISP_ROLE_TARGET.
Do some misc other cleanups.
and depending on role, make sure link is up, scan the fabric (if we're
connected to a fabric), scan the local loop (if appropriate), merge
the results into the local port database then, check once again
to make sure we have f/w at FW_READY state and the the loopstate
is LOOP_READY.
Comment out usage of ISP_SMPLOCK- I have my doubts that this works sanely
as yet because CAM itself still needs Giant. I *was* dropping my lock
and grabbing Giant when doing the upcall for completion, but this is all
seems ridiculous until CAM is fixed.
if we're ISP_ROLE_NONE. Change ISPASYNC_LOGGED_INOUT to ISPASYNC_PROMENADE.
Make sure we note if something is a fabric device.
Target mode:
Finally fix (to a first approximation) SCSI Target Mode again- we needed
to correctly check against CAM_TARGET_WILDCARD and CAM_LUN_WILDCARD
so that targbh won't confuse us. Comment out the drainqueue stuff for
now. Use isp_fc_runstate instead if isp_control/ISPCTL_FCLINK_TEST.
Remove ISP2100_FABRIC defines- we always handle fabric now. Insert
isp_getmap helper function (for getting Loop Position map). Make
sure we (for our own benefit) mark req_state_flags with RQSF_GOT_SENSE
for Fibre Channel if we got sense data- the !*$)!*$)~*$)*$ Qlogic
f/w doesn't do so. Add ISPCTL_SCAN_FABRIC, ISPCTL_SCAN_LOOP, ISPCTL_SEND_LIP,
and ISPCTL_GET_POSMAP isp_control functions. Correctly send async notifications
upstream for changes in the name server, changes in the port database, and
f/w crashes. Correctly set topology when we get a ASYNC_PTPMODE event.
Major stuff:
Quite massively redo how we handle Loop events- we've now added several
intermediate states between LOOP_PDB_RCVD and LOOP_READY. This allows us
a lot finer control about how we scan fabric, whether we go further
than scanning fabric, how we look at the local loop, and whether we
merge entries at the level or not. This is the next to last step for
moving managing loop state out of the core module entirely (whereupon
loop && fabric events will simply freeze the command queue and a thread
will run to figure out what's changed and *it* will re-enable the queu).
This fine amount of control also gets us closer to having an external
policy engine decide which fabric devices we really want to log into.
tracing in order to avoid duplication.
- Insert some tracepoints back into the mutex acq/rel code, thus ensuring
that we can trace all lock acq/rel's again.
- All CURPROC != NULL checks are MPASS()es (under MUTEX_DEBUG) because they
signify a serious mutex corruption.
- Change up some KASSERT()s to MPASS()es, and vice-versa, depending on the
type of problem we're debugging (INVARIANTS is used here to check that
the API is being used properly whereas MUTEX_DEBUG is used to ensure that
something general isn't happening that will have bad impact on mutex
locks).
Reminded by: jhb, jake, asmodai
genassym here, but what I've also noticed is that we're dorking
with a mutex directly at assembler level- I'm not sure that this
is wise at this stage in the SMP port- I think it's going to be much
safer for a while to do things in C until SMP wunderkind figure out
what works and slow down this 3 order differential...
Style nits.
Make sure that our selection hardware is disabled
as soon as possible after detecting a busfree and
even go so far as to disable the selection hardware
in advance of an event that will cause a busfree
(ABORT or BUS DEVICE RESET message). The concern
is that the selection hardware will select a target
for which, after processing the bus free, there
will be no commands pending. The sequencer idle
loop will re-enable the selection should it still be
necessary.
In ahc_handle_scsiint(), clear SSTAT0 events several
PCI transactions (most notably reads) prior to clearing
SCSIINT. The newer chips seem to take a bit of time to
see the change which can make the clearing of SCSIINT
ineffective.
Don't bother panicing at the end of ahc_handle_scsiint().
Getting to the final else just means we lost the race
with clearing SCSIINT.
In ahc_free(), handle init-level 0. This can happen when we
fail the attach for RAID devices. While I'm here, also kill
the parent dma tag.
In ahc_match_scb(), consider initiator ccbs to be any
that are not from the target mode group. This fixes
a bug where an external target reset CCB was not getting
cleaned up by the reset code.
Don't bother freezing a ccb in any of our "abort" routines
when the status is set to CAM_REQ_CMP. This can happen
for a target reset ccb.
aic7xxx.reg:
Reserve space for a completion queue. This will be used
to enhance performance in the near future.
aic7xxx.seq:
Remove an optimization for the 7890 autoflush bug that
turned out to allow, in rare cases, some data to get
lost.
Implement a simpler, faster, fix for the PCI_2_1 retry
bug that hangs the sequencer on an SCB dma for certain chips.
Test against SAVED_SCSIID rather than SELID during target
reselections. This is how we always did it in the past,
but the code was modified while trying to work around an
issue with the 7895. SAVED_SCSIID takes into account
twin channel adapters such as the 2742T, whereas SELID
does not have the channel bit. This caused invalid
selection warnings and other strangeness on these cards.
aic7xxx_pci.c
Use the correct mask for checking the generic aic7892
entry.
it as I was playing with some other ways of doing kernel preemption.
You must still specify the PREEMPTION option in your config file to get a
preemptive kernel.
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.
- I can't seem to reproduce the warning I got from WITNESS anymore.
- The fix was wrong. Since a uidinfo struct is a member of proc, it
makes sense for the locking order to be such that you are allowed to
hold proc and then grab the uidinfo lock.
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).
- Add a set of MI helper functions for interrupt threads:
- ithread_create() creates a new interrupt thread
- ithread_destroy() destroys an interrupt thread
- ithread_add_handler() attaches a new handler to an interrupt thread
- ithread_remove_handler() detaches a handler from an interrupt thread
- Rename sinthand_add() and sched_swi() to swi_add() and swi_sched()
respectively so that they live in a consistent namespace.
- struct intrhand is no longer a public type. It would be private to
kern_intr.c but the current implementation of fast interrupts on the
alpha requires the type to be exported. However, all handlers should
be treated as void * cookies in the way that new-bus treats them. This
includes references to software interrupt handlers.
This mistake seems to have been benign until very recently, probably
until msmith's PCI code reshuffle which cleaned up a lot of things.
Still, my AIC7770 doesn't work again, but it at least probes the
EISA bus now.
will only display sleep mutexes held by the current process.
- Clean up some nits in the witness_display() function and add a ddb
command 'show witness' that dumps the hierarchy and order lists to the
console.
- Use queue(3) macros where appropriate.
- Resort the spin lock order list so that "com" is before "sched_lock".
Also, add appropriate #ifdef's around SMP and i386-specific mutexes.
- Add two new mutexes used to protect the ithread lists and tables to the
order list.
Requested by: bde (1)
follows:
- show ktr_first display the first entry
- show ktr_next display the next entry
- show ktr display the entire buffer
The /v modifiers continue to work as described previously.
Requested by: bde
only the boot processor should be running in the comments.
- Initialize curproc to point to each CPU's respective idleproc if their
curproc is NULL.
- Keep track of the number of context switches performed by idleproc.
of returning an error code to the caller, NFS server op routines
must themselves build an error reply and return 0 to the caller.
This is achieved by replacing the erroneous return statements with
code that jumps forward to the op function's reply code. We need
to be careful to ensure that the 'struct mount' pointer is NULL
though, so that the final vn_finished_write() call becomes a no-op.
Reviewed by: mckusick, dillon
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
chipset. The MAC address is stored in the APC CMOS RAM and we have to
commit trememdous evil in order to read it. The code to do this is only
activated on the i386 platform. Thanks to Cameron Grant for providing
access to a test box for me to tinker with.
This will fix the problem where the sis driver ends up with a station
address of 00:00:00:00:00:00 on boards that use the 630E chipset.
* a ">" is really ">=" ;
* do not try to fetch zero-sized blocks from the card;
* make sure that bpf gets the packets it wants even with
bridging active;
different hardware address, we should drop it (this should only
happen in promiscuous mode). Relocate the code for this check
from before ng_ether(4) processing to after ng_ether(4) processing.
Also fix a compiler warning.
PR: kern/24465
kmem_free() for now. Kmem_malloc() and kmem_free() now have appropriate
assertions in place, and these checks aren't feasible until more of the
networking code is locked down. Also, the extra assertions here should
already be caught by the WITNESS code as lock order violations should
mutex operations on Giant be reintroduced here later.
adv_free() as the ISA probe routine doesn't malloc() ccb_infos but does
call adv_free().
- Release the ISA-only overrun DMA tags, bufs, and maps if the probe fails.
Tested by: rwatson
the index of the pollfd array to the number of fd's currently open, not
the maximum number of fd's. ie: if you had 0,1,2 open, you could not
use pollfd slots higher than 20. The specs say we only have to support
OPEN_MAX [64] entries but we allow way more than that.
only covers about 3-4 lines.
- Don't lower the IPL while we are on the interrupt stack. Instead, save
the raised IPL and change the saved IPL in sched_lock to IPL_0 before
calling mi_switch(). When we are resumed, restore the saved IPL in
sched_lock to the saved raised IPL so that when we release sched_lock
we won't lower the IPL. Without this, we would get nested interrupts
that would overflow the kernel stack.
Tested by: mjacob
also try implement teh documented behaviour in socket nodes
so that when there is only one hook, an unaddressed write/send
will DTRT and send the data to that hook.
except for setting it. Also remove count from aha and replace it with
optional.
Also add commented out pccard lines for all the old card drivers.
They have to be commented out until they are converted because it
causes problems in NEWCARD.
There are two 3rd party code chunks using this still - the IPv6 stuff and
i4b. Give them a private copy as an alternative to changing them too much.
XXX sys/kernel.h still has a #include <sys/module.h> in it. I will be
taking this out shortly - this affects a number of drivers.
by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and minor buffer cache boundry case fix.
Make the buffer cache use a system map for buffer cache KVM rather then a
normal map.
Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.
Fix a minor boundry case in the buffer cache size limit is reached that
could result in non-optimal code.
Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).
PR: 20609
Submitted by: (Tor Egge) tegge
entry fits within its DIRBLKSIZ block. The surrounding code is
extremely fragile with respect to corruption of the directory entry
'd_reclen' field; if directory corruption occurs, it can blindly
scan forward beyond the end of the filesystem block. Usually this
results in a 'fault on nofault entry' panic.
Directory corruption is now much more likely to be detected, resulting
in a 'ufs_dirbad' panic. If the filesystem is read-only, it will
simply print a warning message, and skip the corrupted block.
Reviewed by: mckusick
in ufs_dirbad(). The mnt_stat.f_flags field is only updated by the
syscalls *statfs and getfsstat, so mnt_flag should be used instead.
This only affects whether or not a panic is generated on detection of
certain types of directory corruption.
Reviewed by: mckusick
turned on, and the case of it not being defined at all.
i.e. Disabling bridging re-enables some of the checks it disables.
Submitted by: "Rogier R. Mulhuijzen" <drwilco@drwilco.net>
in tunopen())
o Change the default device permissions to 0600 root:wheel
(were uucp:dialer)
o Only let root (suser()) change the MTU
This makes it possible for an administrator to open up the
permissions on /dev/tun*, letting non-root programs service
a tun interface. Co-operation is still required with a
priviledged program that will configure the interface side
of things.
* Optimise the return path for syscalls so that they only restore a minimal
set of registers instead of performing a full exception_return.
A new flag in the trapframe indicates that the frame only holds partial
state. When it is necessary to perform a full state restore (e.g. after an
execve or signal), the flag is cleared to force a full restore.
striped plexes. This prevents various panics introduced in the last
rewrite of the locking code.
Suffered by: "Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk>