civilized way which doesn't cause grief.
The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.
Also, curthread may or may not have anything to do with the I/O request
at hand.
The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.
Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.
The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
check handling. In its current form, it only determines the largest
amount of state information it can get from SAL and allocates a region
7 memory block for it.
The next steps involve:
o get and log any unconsumed (NVM stored) error records across
reboots,
o register an OS_MCA handler and enable machine checks.
the SMP case. While on the subject, remove unnecessary stops. I don't
know if this resolves the memory corruption I'm seeing, but it does
have the potential. We'll see...
both Elf_Rel and Elf_Rela types of relocation, so handle them both
even though we only have Rel_Rela ATM. We don't handle 32-bit and
big-endian variants yet. Support for that is not trivial enough to
implement it without any evidence that we ever need it in the near
future.
For the FPTR relocations, we currently use the fptr_storage used by
_reloc() is locore.s. This is in no way a real solution, but for now
provides the service we need to get the basics going.
A static recursive function lookup_fdesc() is used to find the address
of a function in a way that keeps track of the load module so that
we can get the correct GP value if we need to construct an OPD (ie
there's no OPD yet for the function.
For simplicity, we create an OPD for the IPLT relocations as well and
simply fill the user provided function descriptor from the OPD. Since
the the official descriptors are unique, this has no bad side effects.
Note that we ignore the addend for FPTR relocations, but use the
addend for IPLT relocations as an offset to the function address.
This commit allows us to load and relocate modules and modules appear
to work correctly, although we probably need to make sure that we set
GP correctly in all cases when we have inter-module calls. This
especially applies to assembly coded functions that have cross module
calls.
the DT_PLTGOT value. On ia64 this is the value of GP. We need this
to construct function descriptors, but the elf file structure is
not exported to MD code.
Note that the name of the function is based on the meaning that
DT_PLTGOT has on ia64. This may differ on other architectures. As
such, link_elf_get_gp() has a high level of MD to it. Renaming the
function to describe what DT_* value is returned makes it generic,
but also makes the MD code less clear and if we only need this on
ia64, then a general name for a specific function doesn't help.
In short: I don't know what is "right" at this time, so I'll go
with what I have.
nfsrv_readdir and nfsrv_readdirplus can return. A client request
containing an over-large `count' field could trigger the "Bad nfs
svc reply" panic in nfs_syscalls.c.
Spotted while trying to reproduce kern/37304, which turned out to
be fixed in FreeBSD a long time ago.
MFC after: 1 week
here mostly mirror the changes made in
boot/efi/libefi/arch/ia64/start.S rev 1.5
Significant difference: We don't handle the IPLT relocation here.
For barebones KLD support, we make the fptr_storage global.
o We don't expect the PLT relocations to follow the .rela section
anymore. We still assume that PLT relocations are long formed,
o Document register usage,
o Improve ILP,
o Fix the FPTR relocation by creating unique OPDs per function.
Comparing functions is valid now,
o The IPLT relocation naturally handles the addend. Deal with it.
We ignore the addend for FPTR relocations for now. It's not at
all clear what it means anyway.
Fix ABI misinterpretation:
o For Elf_Rela relocations, the addend is explicit and should not
be loaded from the memory address we're relocating. Only do that
for Elf_Rel relocations (ie the short form).
o DIR64LSB is not the same as REL64LSB. DIR64LSB applies to a
symbol (S+A), whereas REL64LSB applies to the base address (BD+A),
user stack in response to a failed window fill, allowing the process to be
killed if its wrong. This caused user programs which misalign their stack
pointer to get stuck in an infinite loop at the kernel-userland boundary,
which is mostly harmless.
The same thing causes a fatal RED state exception on OpenBSD and probably
NetBSD.
Inspired by: art@openbsd.org
VOP_OPEN() and doing lots of manual checking. This would further
centralize use of the name functions, and once the MAC code is integrated,
meaning few extraneous MAC checks scattered all over the place. I don't
have time to fix this now, but want to make sure it doesn't get
forgotten. Anyone interested in fixing this should feel free.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
in various extattr_*() calls to match the rest of the file. Originally,
these bits at the end looked more like style(9). This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through. Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.
Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
-stable machine via the old-school methods):
Use __FreeBSD_version in preference to __FreeBSD__ >= N where possible.
Define a single variable mythread which is set to curproc or curthread
depending on the OS version (with a comment saying it is a white lie on
4.x since it really is a proc).
NB: __FreeBSD__ is the OS level of the host machine, not the target,
and should never be used, if possible, as __FreeBSD__ >= N.
constructing a struct aio and invoking VOP_READ() directly. This cleans
up the code a little, but also has the advantage of making sure almost
all vnode read/write access in the kernel goes through the helper
function, meaning that instrumentation of that helper function can impact
almost all relevant read/write operations. In this case, it permits us
to put MAC hooks into vn_rdwr() and not modify uipc_syscalls.c (yet).
In general, if helper vn_*() functions exist, they should be used in
preference to direct VOP's in system call service code.
Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
needed in the current code, in the MAC tree, create_init() relies on the
ability to modify the credentials present for initproc, and should not
perform that modification on a shared credential. Pro-active diff
reduction against MAC changes that are in the queue; also facilitates
other work, including the capabilities implementation.
Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
o Make the cam, cd9660 lomac and sound modules i386 and alpha
specific due to link problems (@gprel relocation when @ltoff
is required). Once resolved, these can be moved back to the
generic list.
o Build linprocfs only on those architectures that have the
linux module.
o Make the sppp module i386 and alpha specific due to compile
problems (pointers as switch cases). Once resolved, this can
be moved back to the generic list.
o Build all i386 specific modules, with the exception of those
mentioned above as being moved from the generic list to the
i386 list and those with dependencies on the linux module (aac)
or i386 dependent (ar, apm, atspeaker, fpu, gnufpu, ibcs2,
linux, ncv, nsp, netgraph, oltr, pecoff, s3, sbni, stg and
vesa).
o Don't build acpi as a module yet. It most be ported first.
Once ported, it can be added to the ia64 list.
o Don't build ipfilter yet due to compile errors (osreldate.h
not found).
Notice that if the device on which the dump is set is destroyed for
any reason, the dump setting is lost. This in particular will
happen in the case of spoilage. For instance if you set dump on
ad0s1b and open ad0 for writing, ad0s* will be spoilt and the dump
setting lost. See geom(4) for more about spoiling.
Sponsored by: DARPA & NAI Labs.
Replace with kevent(2) ops.
This is untested, but the code would rot even further if this wasn't
applied. I've chosen to apply this to prompt some cleanup.
Submitted by: bde
Rev 1.56 of if_dc.c removed calls to mii_pollstat() from the dc_tick()
routine. dc_tick() is called regularly to detect link up and link down
status, especially when autonegotiating.
The expectation was that mii_tick() (which is still called from dc_tick())
would update status information automatically in all cases where it would
be sensible to do so.
Unfortunately, with authentic 21143 chips this is not the case, and
the driver never successfully autonegotiates. This is because (despite
what it says in the 21143 manual) the chip always claims that link is not
present while the autonegotiation enable bit is set. Autonegotation takes
place and succeeds, but the driver tests the link bits before it switches
off the autonegotiation enable bit, and success is not recognised.
The simplest solution is to call dcphy_status() more often for MII_TICK
calls by dropping out of the switch statement instead of exiting when
we are autonegotiating and link appears to not be present. When
autonegotiation succeeds, dcphy_status() will note the speed and fdx/hdx
state and turn off the autonegotiation enable bit. The next call to
dcphy_status() will notice that link is present, and the dc driver code
will be notified.
Macronix chips also use this code, but implement link detection as
described in the manual, and hence don't need this patch. However, tests
on a Macronix 98715AEC-C show that it does not adversely affect them.
This could be done better but is the minimal effective change, and most
closely mimics what was happening prior to rev 1.56 of if_dc.c. (Actually
I also deleted a small amount of unnecessary code while I was in the area.)
Reviewed by: wpaul
due to conditions that suggest the possible need for stack growth.
This has two beneficial effects: (1) we can
now remove calls to vm_map_growstack() from the MD trap handlers and (2)
simple page faults are faster because we no longer unnecessarily perform
vm_map_growstack() on every page fault.
o Remove vm_map_growstack() from the i386's trap_pfault().
o Remove the acquisition and release of Giant from i386's trap_pfault().
(vm_fault() still acquires it.)
loop is inversly proportional to hz.
This makes things more sane for configurations with hz != 100.
Cosmetic: Make the loops look similar to the loops in digi.c
loop is inversly proportional to hz.
This makes things more sane for configurations with hz > 100.
Submitted by: Peter Jeremy <peter.jeremy@alcatel.com.au>
machine_checks.
This fixes pci config reads for non existing devices on secondary
pci busses.
Thanks to Andrew Gallatin for pointing me to the register
Reviewed by: gallatin
Approved by: gallatin
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.
This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.
The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).
Reviewed by: peter
statclock can access it in the tail end of statclock_process() at an
unfortunate time. This bit me several times on an SMP alpha (UP2000)
and the problem went away with this change. I'm not sure why it doesn't
break x86 as well. Maybe it's because the clocks are much faster
on alpha (HZ=1024 by default).
the tokens are legal ANSI-C. Maybe to enable 'op' to be a macro itself?
Anyway, with the ## concatenation Gcc 3.1's integrated `cpp' treats "=op("
as a single token vs. the three tokens it is.
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.
Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.
Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.
Reported by: simokawa
MFC after: 1 week
- Add stubs for EISA and SBUS cards.
(VME, FutureBUS, and TurboChannel stubs not provided.)
- Add infrastructure to build driver and bus front-end modules.
lun address modifier of sorts. Only an HP XP-512 seems to have cared.
Fix a few misplaced pointers for the new fabric goop, which has been
demonstrated to work on newer Brocades and McData switches now.
Put in commented out code which would run GFF_ID if the QLogic f/w
allowed it.
Don't whine about not being able to find a handle for a command if it
was a command aborted (by us).
- Use temporary variables to hold a pointer to a pgrp while we dink with it
while not holding either the associated proc lock or proctree_lock. It
is in theory possible that p->p_pgrp could change out from under us.
sx lock. Trying to get the lock order between these locks was getting
too complicated as the locking in wait1() was being fixed.
- leavepgrp() now requires an exclusive lock of proctree_lock to be held
when it is called.
- fixjobc() no longer gets a shared lock of proctree_lock now that it
requires an xlock be held by the caller.
- Locking notes in sys/proc.h are adjusted to note that everything that
used to be protected by the pgrpsess_lock is now protected by the
proctree_lock.
o move timeout from wihap_info to wihap_sta_info
o sprinkle spls into the code (need to use proper -current locking)
o better use of le16toh and htole16
o fix a few leaks m_freem(m)
o minor knf
o minor de-knf to match OpenBSD
o de__P
- Add a device_method_t array, fore_methods.
- Add a fore_ident_table that contains the various FORE Systems PCA-200
series devices.
- Rewrite of the fore_probe routine (formerly known as fore_pci_probe).
- Minor changes... mostly WIP stuff to get this updated... still much to
be done.
Gcc 3.1's 'cpp' vs. 2.95.3's. Maybe it is due to other code movement and
it just shows up weirdly in handling the .stab's. Anyway, w/o this change
building a kernel gives:
alpha/alpha/pal.s:75: relocation truncated to fit: REFLONG .text
alpha/alpha/prom_disp.s:67: relocation truncated to fit: REFLONG .text
the per-channel bus_addr_t offset. Also, cast the offset to (long long)
and use %#llx instead of %#x to fix printf warnings on architectures where
sizeof(bus_addr_t) != sizeof(int).
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)
Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
es137x.c: In function `es1371_rdcd':
es137x.c:598: warning: `x' might be used uninitialized in this function
PR: kern/35408
Submitted by: Thomas Quinot <thomas@cuivre.fr.eu.org>
Apply the change as a continuous slew rather than as a series of
discrete steps and make it possible to adjust arbitraryly huge
amounts of time in either direction.
In practice this is done by hooking into the same once-per-second
loop as the NTP PLL and setting a suitable frequency offset deducting
the amount slewed from the remainder. If the remaining delta is
larger than 1 second we slew at 5000PPM (5msec/sec), for a delta
less than a second we slew at 500PPM (500usec/sec) and for the last
one second period we will slew at whatever rate (less than 500PPM)
it takes to eliminate the delta entirely.
The old implementation stepped the clock a number of microseconds
every HZ to acheive the same effect, using the same rates of change.
Eliminate the global variables tickadj, tickdelta and timedelta and
their various use and initializations.
This removes the most significant obstacle to running timecounter and
NTP housekeeping from a timeout rather than hardclock.
that declares itself to be a disk, which may be the wrong thing to do in
the long term but it works well enough to attach to emulated disks in the
PowerPC simulator in gdb now that they have the proper device_type
property.
information related to bucket size effeciency. Three things are printed on
each row:
Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.
At the end the total memory used and total waste is displayed. Currently my
system displays about 33% wasted memory.
The intent of this code is to gather statistics for tuning the malloc bucket
sizes. It is not intended to be run with INVARIANTS and it is not entirely
mp safe. It can be enabled via 'options MALLOC_PROFILE' which was commited
earlier.
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.
Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data. This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.
Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
trying to run X on some Athlon systems where the BIOS does odd things
(mines an ASUS A7A266, but it seems to also help on other systems).
Here's a description of the problem and my fix:
The problem with the old MTRR code is that it only expects
to find documented values in the bytes of MTRR registers.
To convert the MTRR byte into a FreeBSD "Memory Range Type"
(mrt) it uses the byte value and looks it up in an array.
If the value is not in range then the mrt value ends up
containing random junk.
This isn't an immediate problem. The mrt value is only used
later when rewriting the MTRR registers. When we finally
go to write a value back again, the function i686_mtrrtype()
searches for the junk value and returns -1 when it fails
to find it. This is converted to a byte (0xff) and written
back to the register, causing a GPF as 0xff is an illegal
value for a MTRR byte.
To work around this problem I've added a new mrt flag
MDF_UNKNOWN. We set this when we read a MTRR byte which
we do not understand. If we try to convert a MDF_UNKNOWN
back into a MTRR value, then the new function, i686_mrt2mtrr,
just returns the old value of the MTRR byte. This leaves
the memory range type unchanged.
I have seen one side effect of the fix, which is that ACPI calls
after X has been run seem to hang my machine. As running X would
previously panic the machine, this is still an improvement ;-)
I'd like to MFC this before the 4.6 code freeze - please let me
know if it causes any problems.
PR: 28418, 25958
Tested by: jkh, Christopher Masto <chris@netmonger.net>
MFC after: 2 weeks
trying to run X on some Athlon systems where the BIOS does odd things
(mines an ASUS A7A266, but it seems to also help on other systems).
Here's a description of the problem and my fix:
The problem with the old MTRR code is that it only expects
to find documented values in the bytes of MTRR registers.
To convert the MTRR byte into a FreeBSD "Memory Range Type"
(mrt) it uses the byte value and looks it up in an array.
If the value is not in range then the mrt value ends up
containing random junk.
This isn't an immediate problem. The mrt value is only used
later when rewriting the MTRR registers. When we finally
go to write a value back again, the function i686_mtrrtype()
searches for the junk value and returns -1 when it fails
to find it. This is converted to a byte (0xff) and written
back to the register, causing a GPF as 0xff is an illegal
value for a MTRR byte.
To work around this problem I've added a new mrt flag
MDF_UNKNOWN. We set this when we read a MTRR byte which
we do not understand. If we try to convert a MDF_UNKNOWN
back into a MTRR value, then the new function, i686_mrt2mtrr,
just returns the old value of the MTRR byte. This leaves
the memory range type unchanged.
I'd like to merge this before the 4.6 code freeze, so if people
can test this with XFree 4 that would be very useful.
PR: 28418, 25958
Tested by: jkh, Christopher Masto <chris@netmonger.net>
MFC after: 2 weeks
exhausting the kernel timeout table. Perform the usual gymnastics to
avoid race conditions between node shutdown and timeouts occurring.
Also fix a bug in handling ack delays < PPTP_MIN_ACK_DELAY. Before,
we were ack'ing immediately. Instead, just impose a minimum ack delay
time, like the name of the macro implies.
MFC after: 1 week
hash while holding the lock on a zone. Fix this by doing the allocation
seperately from the actual hash expansion.
The lock is dropped before the allocation and reacquired before the expansion.
The expansion code checks to see if we lost the race and frees the new hash
if we do. We really never will lose this race because the hash expansion is
single threaded via the timeout mechanism.
The extra microphone channel capability is part of the "normal" ac97
capabilities and not an extended ac97 capability. Now recording on
codecs without a seperate mic channel works.
MFC after: 1 week
o Use chunk instead of region when we talk about a memory range.
Region can be confused with region register and we already
call it chunk in machdep.c
o Update the twiddle every 16MB
Fortunately we have no large zones with maximums specified yet, so it wasn't
breaking anything.
Implement blocking when a zone exceeds the maximum and M_WAITOK is specified.
Previously this just failed like the old zone allocator did. The old zone
allocator didn't support WAITOK/NOWAIT though so we should do what we
advertise.
While I was in there I cleaned up some more zalloc logic to further simplify
that code path and reduce redundant code. This was needed to make the blocking
work properly anyway.
we can use td_ucred.
- In killpg1(), the proc lock is sufficient to check if p_stat is SZOMB
or not. We don't need sched_lock.
- Close some races in psignal(). In psignal() there is a big switch
statement based on p_stat. All the different cases are assuming that
the process (or thread) isn't going to change state out from under it.
To ensure this is true, just lock sched_lock for the entire switch. We
practically held it the entire time already anyways. This also
simplifies the locking somewhat and actually results in fewer lock
operations.
- Allow signotify() to be called with the sched_lock held since psignal()
now does that.
- Use td_ucred in a couple of places.
process so it can use td_ucred.
- Require the target process of donice() to be locked when donice() is
called.
- Use td_ucred.
- Lock the target process of p_cansee() and while reading the credentials
of a process.
- Change the logic of rtprio() slightly so it does it's copyin() if needed
prior to locking the target process.
- rtprio() no longer needs Giant. In theory with full KSE it would still
need Giant to protect p_ucred of curproc for the p_canfoo() functions
but p_canfoo() will be changing to using td_ucred of curthread before
full KSE hits the tree.
of a process pointer.
- Move the p_candebug() at the start of procfs_control() a bit to make
locking feasible. We still perform the access check before doing
anything, we just now perform it after acquiring locks.
- Don't lock the sched_lock for TRACE_WAIT_P() and when checking to see if
p_stat is SSTOP. We lock the process while setting p_stat to SSTOP
so locking the process is sufficient to do a read to see if p_stat is
SSTOP or not.
allocate a blank cred first, lock the process, perform checks on the
old process credential, copy the old process credential into the new
blank credential, modify the new credential, update the process
credential pointer, unlock the process, and cleanup rather than trying
to allocate a new credential after performing the checks on the old
credential.
- Cleanup _setugid() a little bit.
- setlogin() doesn't need Giant thanks to pgrp/session locking and
td_ucred.
FIFO or the in-RAM descriptors it will switch to RX_IDLE from where it
is not restarted.
We used to deal with RX_IDLE by doing a total reinit but this lost
our link and caused a potential 30sec autonegotiation against
switches. This was changed to a less heavyhanded approach, but this
failed to restart the receiver it it were in the RX_IDLE state.
This change adds the RX_IDLE and the RX_FIFO_OFLOW conditions as
triggers for interrupts and receive side processing, and restarts
the receiver when it is RX_IDLE.
Remove the #ifdef notyet'ed nge_rxeoc() function.
Sponsored by: Cybercity Internet, Denmark.
MFC after: 7 days
and acquire the proctree_lock if needed first. Then we lock the process
if necessary and fiddle with it as appropriate. Finally we drop locks and
do any needed copyout's. This greatly simplifies the locking.
belong to a user virtual address; while this happens to work on some
architectures, it can't on sparc64, since user and kernel virtual
address spaces overlap there (the distinction between them is done via
separate address space identifiers).
Instead, look up the page in the vm_map of the process in question.
Reviewed by: jake
(apparently by markus@, at least committed by him). This has the
advantage of not using the bad IV's from Fluhrer/Mantin/Shamir as well
as bringing the drivers a little closer together.
Also use a few constants in place of magic numbers in one place.
Obtained from: OpenBSD 1.25, 1.28, 1.36, 1.38, 1.42
time we tell CAM to rescan the bus. Together with the previous patch
this should avoid the problem where the devices would wedge because they
got spoken to over two different pipes.
Tested by: Tomas Pluskal <plusik@pohoda.cz>
up the module_path string, we would walk one past the end of the buffer.
This hurting ia64 originally, but it was probably also happening on i386
occasionally as well. The effects were usually harmless, it would add
bogus "binary" search directories to the places it actually looked for
files.
the S_IFREG bit for regular files. This caused the path search code to
skip it when it finally did find the kernel (after the common/module.c
buffer overrun bug was fixed)