wrong for many years that negative niceness would lower the priority
of a process below PUSER, and once below PUSER, conditionals in the
code that test whether a process is in the kernel would break.
The breakage could (and did) cause lock-ups: under some conditions,
essentially nothing but the least nice program was able to run.
algorithm which adjusts the priority now subtracts PRIO_MIN to do
things properly, and the ESTCPULIM() algorithm was updated to use
PRIO_TOTAL (PRIO_MAX - PRIO_MIN) to calculate the estcpu.
NICE_WEIGHT is now 1 to accommodate the full range of priorities better
(a -20 process with full CPU time has the priority of a +0 process with
no CPU time). There are now exactly 20 run queues (80 priorities) for
user process scheduling, and PUSER has been lowered to 48 to accomplish
this.
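As a rough sketch of the arithmetic (constants per the description above;
the estcpu term and the clamping are simplified, not the actual kernel code):

    #define PRIO_MIN    (-20)
    #define PRIO_MAX    20
    #define PRIO_TOTAL  (PRIO_MAX - PRIO_MIN)   /* 40 */
    #define PUSER       48
    #define MAXPRI      127
    #define NICE_WEIGHT 1

    /* Sketch only: estcpu_term stands in for the (bounded) ESTCPULIM()-derived
     * contribution.  The point is that nice is shifted by PRIO_MIN, so even a
     * nice -20 process can never drop below PUSER. */
    static int
    user_priority(int estcpu_term, int nice)
    {
        int pri = PUSER + estcpu_term + NICE_WEIGHT * (nice - PRIO_MIN);

        return (pri > MAXPRI ? MAXPRI : pri);
    }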
To the user, this means that things will be scheduled more correctly
(noticeably), and there is no longer a lock-up in which a nice -20
process never releases CPU time to other processes. In this fairer
system, processes tsleep()ed at priorities below PUSER now correctly
get higher priority than user processes at priority >= PUSER.
The detective work here was done by me, along with part of the
solution. Luoqi Chen provided most of the solution, and really
helped me understand what was happening better, to boot :)
Submitted by: luoqi
Concept reviewed by: bde
bus/driver/kobj system. I am not 100% sure that this is the correct fix,
but it is harmless and does seem to solve the problem. At worst, it could
cause a tiny memory leak at unload time - this is better than a free(NULL)
and subsequent panic. I'm waiting for comments from Doug about this.
This may yet be backed out and fixed differently.
The change itself is to increment the reference count on drivers in one
case where it appears to have been missed. When everything is unloaded,
kobj_class_free() was being called twice in some cases, and panicking
the second time.
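The pattern is plain reference counting; a minimal userland-style sketch of
the idea (the names are illustrative, not the real kobj interfaces):

    #include <stdlib.h>

    /* Illustrative only: a class structure that several drivers may share. */
    struct class_meta {
        int   refs;     /* how many loaded drivers reference this class */
        void *ops;      /* compiled method table */
    };

    static void
    class_reference(struct class_meta *c)
    {
        c->refs++;      /* the missed case: take a reference when attaching */
    }

    static void
    class_free(struct class_meta *c)
    {
        /* Only release the method table when the last reference goes away;
         * freeing unconditionally is what produced the double free / panic. */
        if (--c->refs > 0)
            return;
        free(c->ops);
        c->ops = NULL;
    }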
version dependency system. This isn't quite finished, but it is at a
useful stage to do a functional checkpoint.
Highlights:
- version and dependency metadata is gathered via linker sets, so things
are handled the same for static kernels and code built to live in a kld.
- The dependencies are at module level (versus at file level).
- Dependencies determine kld symbol search order - this means that you
cannot link against symbols in another file unless you depend on it. This
is so that you cannot accidentally unload the target out from underneath
the ones referencing it.
- It is flexible enough that we can put tags in #include files and macros
so that we can get decent hooks for enforcing recompiles on incompatible
ABI changes. eg: if we change struct proc, we could force a recompile
for all kld's that reference the proc struct.
- Tangled dependency references at boot time are sorted. Files are
relocated once all their dependencies are already relocated.
Caveats:
- Loader support is incomplete, but has been worked on separately.
- Actual enforcement of the version number tags is not active yet - just
the module dependencies are live. The actual structure of versioning
hasn't been agreed on yet. (eg: major.minor, or whatever)
- There is some backwards compatibility for old modules without metadata,
but I'm not sure how good it is.
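For a module author the metadata ends up looking roughly like the sketch
below. The macro names are modeled on MODULE_VERSION()/MODULE_DEPEND() and
the version numbers are assumptions (per the caveats, the versioning scheme
isn't settled); the point is that both records land in linker sets, so the
same source works statically or as a kld:

    /* Hypothetical example of the metadata a kld would carry. */
    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/module.h>

    static int
    foo_modevent(module_t mod, int type, void *data)
    {
        return (0);
    }

    static moduledata_t foo_mod = { "foo", foo_modevent, NULL };
    DECLARE_MODULE(foo, foo_mod, SI_SUB_DRIVERS, SI_ORDER_MIDDLE);

    /* Version this module provides... */
    MODULE_VERSION(foo, 1);
    /* ...and the module it depends on (minimum, preferred, maximum version). */
    MODULE_DEPEND(foo, bar, 1, 1, 1);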
This is based on work originally done by Boris Popov (bp@freebsd.org),
but I'm not sure he'd recognize much of it now. Don't blame him. :-)
Also, ideas have been borrowed from Mike Smith.
program running under linux emulation, the script binary is checked for
in /compat/linux first. Without this patch the wrong script binary
(i.e. the FreeBSD binary) will be run instead of the linux binary.
For example, a #!/bin/sh script would get the FreeBSD /bin/sh, thus
breaking out of linux compatibility mode.
This solves a number of problems people have had installing linux
software on FreeBSD boxes.
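The lookup is just a prefix check; a userland approximation of the idea
(resolve_interp() is a made-up helper, not the actual image activator code):

    #include <stdio.h>
    #include <unistd.h>

    #define LINUX_PREFIX "/compat/linux"

    /* Given the interpreter named on the script's #! line, prefer the copy
     * under /compat/linux if one exists. */
    static const char *
    resolve_interp(const char *interp, char *buf, size_t len)
    {
        snprintf(buf, len, "%s%s", LINUX_PREFIX, interp);
        if (access(buf, X_OK) == 0)
            return (buf);       /* e.g. /compat/linux/bin/sh */
        return (interp);        /* fall back to the native FreeBSD binary */
    }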
There's no excuse to have code in synthetic filestores that allows direct
references to the textvp anymore.
Feature requested by: msmith
Feature agreed to by: warner
Move requested by: phk
Move agreed to by: bde
in struct bio. Eventually, bio_offset will probably obsolete the
bio_blkno and bio_pblkno fields.
Remove the special hack in atapi-cd.c to determine if bio_offset was valid.
* Report link errors to stdout with uprintf() so that the user can see
what went wrong (PR kern/9214).
* Add support code to allow module symbols to be loaded into GDB using
the debugger's "sharedlibrary" command.
maintainers.
After we established our branding method of writing up to 8 characters of
the OS name into the padding of the ELF header, the Binutils maintainers
and/or SCO (as USL) decided that instead the ELF header should grow two new
fields -- EI_OSABI and EI_ABIVERSION. Each of these is an 8-bit unsigned
integer. SCO has assigned official values for the EI_OSABI field. In
addition to this, the Binutils maintainers and NetBSD decided that a better
ELF branding method was to include ABI information in a ".note" ELF
section.
With this set of changes, we will now create ELF binaries branded using
both "official" methods. Due to the complexity of adding a section to a
binary, ``brandelf'' will only brand using the EI_OSABI method. Also, due
to the complexity of pulling a section out of an ELF file vs. poking around
in the ELF header, our image activator only looks at the EI_OSABI header
field.
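Branding via EI_OSABI amounts to patching one byte of the ELF identification
block; a minimal sketch of that method (error handling trimmed, and a real
tool would verify the ELF magic first -- this is not the brandelf source):

    #include <elf.h>
    #include <stdio.h>

    #ifndef ELFOSABI_FREEBSD
    #define ELFOSABI_FREEBSD 9          /* SCO-assigned value for FreeBSD */
    #endif

    /* Patch e_ident[EI_OSABI] in place; EI_ABIVERSION is the following byte. */
    static int
    brand_elf(const char *path)
    {
        unsigned char ident[EI_NIDENT];
        FILE *fp = fopen(path, "r+b");

        if (fp == NULL)
            return (-1);
        if (fread(ident, 1, EI_NIDENT, fp) != EI_NIDENT) {
            fclose(fp);
            return (-1);
        }
        ident[EI_OSABI] = ELFOSABI_FREEBSD;
        fseek(fp, 0, SEEK_SET);
        fwrite(ident, 1, EI_NIDENT, fp);
        fclose(fp);
        return (0);
    }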
Note that a new kernel can still properly load old binaries except for
Linux static binaries branded in our old method.
*
* For a short period of time, ``ld'' will also brand ELF binaries
* using our old method. This is so people can still use kernel.old
* with a new world. This support will be removed before 5.0-RELEASE,
* and may not last anywhere up to the actual release. My expiration
* time for this is about 6mo.
*
Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
non-device code.
* Re-implement the method dispatch to improve efficiency. The new system
takes about 40ns for a method dispatch on a 300MHz PII, which is only
10ns slower than a direct function call on the same hardware.
This changes the new-bus ABI slightly so make sure you re-compile any
driver modules which you use.
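A dispatch that cheap generally means a per-class cache of resolved methods,
so the common case is one compare plus an indirect call; a generic sketch of
that technique (not the actual kobj code, all names invented):

    #define CACHE_SIZE 256          /* power of two, assumed */

    typedef void (*method_fn_t)(void);

    struct method_desc {
        unsigned    id;             /* unique id assigned per method name */
        method_fn_t deflt;          /* default implementation */
    };

    struct class_ops {
        struct method_desc *cached_desc[CACHE_SIZE];
        method_fn_t         cached_fn[CACHE_SIZE];
    };

    /* Slow path: search the class's method table, fall back to desc->deflt. */
    method_fn_t slow_lookup(struct class_ops *ops, struct method_desc *desc);

    static inline void
    dispatch(struct class_ops *ops, struct method_desc *desc)
    {
        unsigned slot = desc->id & (CACHE_SIZE - 1);

        if (ops->cached_desc[slot] != desc) {   /* miss: resolve and remember */
            ops->cached_fn[slot] = slow_lookup(ops, desc);
            ops->cached_desc[slot] = desc;
        }
        ops->cached_fn[slot]();     /* hit: one compare + indirect call */
    }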
devstat_end_transaction_bio()
bioq_* versions of bufq_* incl bioqdisksort()
the corresponding "buf" versions will disappear when no longer used.
Move b_offset, b_data and b_bcount to struct bio.
Add BIO_FORMAT as a hack for fd.c etc.
We are now largely ready to start converting drivers to use struct
bio instead of struct buf.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
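Pulling the pieces above together, the transitional structure looks roughly
like this (a sketch assembled from this log, not a verbatim copy of
sys/bio.h):

    #include <sys/types.h>
    #include <sys/queue.h>

    struct bio;
    typedef void bio_done_t(struct bio *);

    /* Sketch of the transitional struct bio described above. */
    struct bio {
        u_int       bio_cmd;        /* BIO_READ, BIO_WRITE, BIO_FORMAT, ... */
        u_int       bio_flags;      /* BIO_ORDERED, ... */
        dev_t       bio_dev;        /* target device */
        daddr_t     bio_blkno;      /* block #; bio_offset may obsolete it */
        off_t       bio_offset;     /* byte offset on the device */
        long        bio_bcount;     /* request size (was b_bcount) */
        caddr_t     bio_data;       /* data pointer (was b_data) */
        daddr_t     bio_pblkno;     /* physical block #, transitional */
        bio_done_t *bio_done;       /* completion handler */
        void       *bio_iodone_chain; /* transitional chaining hook */
        TAILQ_ENTRY(bio) bio_queue; /* for bioq_* / bioqdisksort() */
    };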
async I/O's. The sequential read heuristic has been extended to
cover writes as well. We continue to call cluster_write() normally,
thus blocks in the file will still be reallocated for large (but still
random) I/O's, but I/O will only be initiated for truly sequential
writes.
This solves a number of annoying situations, especially with DBM (hash
method) writes, and also has the side effect of fixing a number of
(stupid) benchmarks.
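The heuristic itself is simple: remember where the last I/O ended and only
treat a write as sequential if it starts exactly there. A userland-style
sketch of the idea (names and the trigger threshold are assumptions, not the
actual vnode/cluster code):

    #include <stddef.h>
    #include <sys/types.h>

    /* Per-open-file bookkeeping, assumed. */
    struct seq_state {
        off_t next_off;         /* where the previous I/O ended */
        int   seqcount;         /* consecutive sequential I/Os observed */
    };

    /* Return non-zero if this write continues the previous one and should be
     * treated as sequential (and therefore worth initiating I/O for). */
    static int
    sequential_write(struct seq_state *sp, off_t offset, size_t len)
    {
        if (offset == sp->next_off)
            sp->seqcount++;     /* picks up exactly where we left off */
        else
            sp->seqcount = 0;   /* random access: back off */
        sp->next_off = offset + (off_t)len;
        return (sp->seqcount > 1);  /* trigger threshold is illustrative */
    }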
Reviewed-by: mckusick
release for inclusion into the release, but bde talked me out of
committing the module that needs this until after the release. It is
after the release now. :-)
via sysctl. It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h, as the comments that
went with it were wrong... we don't allocate KVM space for the pages, so
that comment is bogus. The only practical limit is how much physical
RAM you want to lock up, as this stuff isn't paged out or swap-backed.
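Exporting the limits is just a matter of hanging the variables off the
sysctl tree; a sketch along these lines (the node and variable names are
assumptions here, not necessarily what was committed):

    #include <sys/param.h>
    #include <sys/sysctl.h>

    static int shmmax = 32 * 1024 * 1024;   /* max segment size, bytes */
    static int shmmaxpgs = 8192;            /* formerly the SHMMAXPGS constant */

    SYSCTL_DECL(_kern_ipc);                 /* harmless if already declared */
    SYSCTL_INT(_kern_ipc, OID_AUTO, shmmax, CTLFLAG_RW, &shmmax, 0,
        "Maximum shared memory segment size in bytes");
    SYSCTL_INT(_kern_ipc, OID_AUTO, shmmaxpgs, CTLFLAG_RW, &shmmaxpgs, 0,
        "Maximum number of pages available for shared memory");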
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).
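In outline, the syscall path only takes the lock when the syscall's table
entry says it needs it; a heavily simplified sketch (the flag, field and
function names are illustrative, not the actual trap.c code):

    #define SYF_MPSAFE 0x1      /* assumed flag: syscall can run without the lock */

    void get_mplock(void);      /* the giant MP lock primitives */
    void rel_mplock(void);

    struct sysent_sketch {
        int sy_flags;
        int (*sy_call)(void *uap);
    };

    static int
    syscall_dispatch(struct sysent_sketch *callp, void *uap)
    {
        int mplocked = 0, error;

        if ((callp->sy_flags & SYF_MPSAFE) == 0) {
            get_mplock();       /* default: hold the MP lock across the call */
            mplocked = 1;
        }
        error = callp->sy_call(uap);
        if (mplocked)
            rel_mplock();
        return (error);
    }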
A great deal of conditional SMP code for various dead-ended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.
Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.
This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.
This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.
Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch
fragmentation problem due to geteblk() reserving too much space for the
buffer and imposes a larger granularity (16K) on KVA reservations for
the buffer cache to avoid fragmentation issues. The buffer cache size
calculations have been redone to simplify them (fewer defines, better
comments, less chance of running out of KVA).
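The granularity change is just a round-up of each reservation; trivially
(the 16K figure is from this log, the macro names are made up):

    #include <stddef.h>

    /* Round each buffer-cache KVA reservation up to a 16K granule so freed
     * space recombines into reusable chunks instead of odd-sized holes. */
    #define BKVAGRANULARITY 16384
    #define BKVAROUND(sz) \
        (((size_t)(sz) + BKVAGRANULARITY - 1) & ~(size_t)(BKVAGRANULARITY - 1))

    /* e.g. a 5000-byte geteblk() request reserves BKVAROUND(5000) == 16384
     * bytes of KVA rather than exactly 5000. */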
The geteblk() fix solves a performance problem that DG was able to reproduce.
This patch does not completely fix the KVA fragmentation problems, but
it goes a long way.
Mostly Reviewed by: bde and others
Approved by: jkh