longer has anything to do with vnodes and never had anything to do
with buffers, but it needs the definitions of B_READ and B_WRITE
for use with the bogus useracc() interface and was getting them
bogusly due to excessive cleanups in rev.1.49.
space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
convenient and makes life difficult for my next commit. We still need
an i386tss to point to for the tss slot in the gdt, so we use a common
tss shared between all processes.
Note that this is going to break debugging until this series of commits
is finished. core dumps will change again too. :-( we really need
a more modern core dump format that doesn't depend on the pcb/upages.
This change makes VM86 mode harder, but the following commits will remove
a lot of constraints for the VM86 system, including the possibility of
extending the pcb for an IO port map etc.
Obtained from: bde
Use the name argument almost the same in all LKM types. Maintain
the current behavior for the external (e.g., modstat) name for DEV,
EXEC, and MISC types being #name ## "_mod" and SYCALL and VFS only
#name. This is a candidate for change and I vote just the name without
the "_mod".
Change the DISPATCH macro to MOD_DISPATCH for consistency with the
other macros.
Add an LKM_ANON #define to eliminate the magic -1 and associated
signed/unsigned warnings.
Add MOD_PRIVATE to support wcd.c's poking around in the lkm structure.
Change source in tree to use the new interface.
Reviewed by: Bruce Evans
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.
The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.
The code in procfs_mem is now less bogus (but maybe still a little
so.)
implementation #ifdef out. This can be used for now by NFS. As soon
as all the other filesystems' locking is fixed, this can go away.
Print the vnode address in vprint for easier debugging.
1. imgp->image_header needs to be cleared for the bp == NULL && `goto
interpret' case, else exec_fail_dealloc would free it twice after
an error.
2. Moved the vp->v_writecount check in exec_check_permissions() to
near the end. This fixes execve("/dev/null", ...) returning the
bogus errno ETXTBSY. ETXTBSY is still returned for attempts to
exec interpreted files that are open for writing. The man page
is very old and wrong here. It says that ETXTBSY is for pure
procedure (shared text) files that are open for writing or reading.
3. Moved the setuid disabling in exec_check_permissions() to the end.
Cosmetic. It's more natural to dispose of all the error cases
first.
...plus a couple of other cosmetic changes.
Submitted by: bde
by bde.
Don't return EPERM in setre[ug]id() just because the caller passes in
the current effective id in the second arg (ie: no change), as suggested
by ache.
The magic number conflicted with the rotting disabled one in ext2fs for
debug.doasyncfree.
Removed messy debugging variable/constant/sysctl debug.doreallocblks.
Lite2 removed it, and we don't use the code that it controls.
This is valueable for library code which needs to be able to find out
whether the current process is or *was* set[ug]id at some point in the
past, and may have a "tainted" execution environment. This is especially
a problem with the trend to immediately revoke privs at startup and regain
them for critical sections. One problem with this is that if a cracker
is able to compromise the program while it's still got a saved id, the
cracker can direct the program to regain the privs. Another problem is
that the user may be able to affect the program in some other way (eg:
setting resolver host aliases) and the library code needs to know when it
should disable these sorts of features.
Reviewed by: ache
Inspired by: OpenBSD (but with a different implementation)
that allows traditional BSD setuid/setgid behavior.
The only visible difference should be that a non-root setuid program
(eg: inn's "rnews" program) that is setuid to news, can completely
"become" uid news. (ie: setuid(geteuid()) This was allowed in
traditional 4.2/4.3BSD and is now "blessed" by Posix as a special
case of "appropriate privilige".
Also, be much more careful with the P_SUGID flag so that we can use it
for issetugid() - only set it if something changed.
Reviewed by: ache
vector except for the egid in groups[0]. There is a risk that programs
that come from SYSV/Linux that expect this to work and don't check for
error returns may accidently pass root's groups on to child processes.
We now do what is least suprising (to non BSD programs/programmers) in
this scenario, and nothing is changed for programs written with BSD groups
rules in mind.
Reviewed by: ache
to removing the connection from the queue. The problem here is that
falloc() may block and this would allow another process to accept the
connection instead. If this happens to leave the queue empty, then the
system will panic with an "accept: nothing queued".
Also changed a wakeup() to a wakeup_one() to avoid the "thundering herd"
problem on new connections in Apache (or any other application that has
multiple processes blocked in accept() for the same socket).
as shadows of their containing directory. This should solve the problem
of users not being able to delete their symlinks from /tmp once and for
all.
Symlinks do not have modes though, they are accessable to everything that
can read the directory (as before). They are made to show this fact at
lstat time (they appear as mode 0777 always, since that's how the the
lookup routines in the kernel treat them).
More commits will follow, eg: add a real lchown() syscall and man pages.
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.
they were created later on. This is not the case when processing
syscalls.isc in the ibcs2 area. (It generates no declarations, it's
all either hidden (already prototyped elsewhere) or unimplemented).
<sys/ioctl_compat.h> and sometimes <sys/filio.h> instead of
<sys/ioctl.h> in tty-related files. <sys/ttycom.h> is still
usually imported bogusly via <sys/termios.h>.
<sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h>
in miscellaneous files. Most of these files have nothing to do
with ttys but need to include <sys/ttycom.h> to get the definitions
of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into
ioctls.
automatically have random generation numbers. The kenel way of handling those
also changed. Further it is advised to run fsirand on all your nfs exported
filesystems. the code is mostly copied from OpenBSD, with the randomization
chanegd to use /dev/urandom
Reviewed by: Garrett
Obtained from: OpenBSD
null casts. `time' is nonvolatile for accesses within a region locked
by splclock()/splx(). Accesses outside such a region are invalid, and
splx() must have the side effect of potentially changing all global
variables (since there are hundreds of sort of volatile variables like
`time'), so declaring `time' as volatile didn't have any real benefits.
form `tv = time'. Use a new function gettime(). The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.
processes using AF_LOCAL sockets. This hack is going to be used with
Secure RPC to duplicate a feature of STREAMS which has no real counterpart
in sockets (with STREAMS/TLI, you can apparently use t_getinfo() to learn
UID of a local process on the other side of a transport endpoint).
What happens is this: the client sets up a sendmsg() call with ancillary
data using the SCM_CREDS socket-level control message type. It does not
need to fill in the structure. When the kernel notices the data,
unp_internalize() fills in the cmesgcred structure with the sending
process' credentials (UID, EUID, GID, and ancillary groups). This data
is later delivered to the receiving process. The receiver can then
perform the follwing tests:
- Did the client send ancillary data?
o Yes, proceed.
o No, refuse to authenticate the client.
- The the client send data of type SCM_CREDS?
o Yes, proceed.
o No, refuse to authenticate the client.
- Is the cmsgcred structure the right size?
o Yes, proceed.
o No, signal a possible error.
The receiver can now inspect the credential information and use it to
authenticate the client.
devtotty(). devtotty() must check its arg carefully since the arg is
supplied as ioctl data. This should fix PR3004.
Renamed devtotty() to snpdevtotty().
formula uses `& nchash'. This is very broken when nchash is a prime
number instead of 1 less than a power of 2, but the Lite2 formula was
merged in.
Merged some cosmetic changes from Lite2, rev.1.21 and Lite1. The merge
was difficult because the Lite2 code is essentially ours (phk's) except
where Lite2 improved or broke it.
Summary of the Lite2 changes:
- in the copyright, phk's rights have been transferred to the Regents.
This change should be reviewed.
- nchENOENT went away; the "no" vnode is now simply 0.
- comments were improved.
- style was "improved".
- goto instead of Fanatism (sic) was considered bad :-).
- there are some small changes to support whiteouts.
- new cache entries are added in more cases. More work is required
near here to change the hash table size if kern.desiredvnodes is
changed using sysctl.
- rescanning of the hash bucket in cache_purgevfs() was removed. This
change should be reviewed.
(phk's) sysctl framework, and I needed special code to disambiguate
the VFS_GENERIC node from the VFS_VFSCONF leaf, so I only converted
the leaves to the FreeBSD framework. The error handling isn't quite
right. CSRGS's sysctls seem to return ENOTDIR too much and FreeBSD's
sysctls don't agree with the man page.
and getvfsbyname() interfaces. The new interfaces are now hidden from
applications unless _NEW_VFSCONF is defined. The new vfsconf interfaces
don't work yet.
cruft and resulted in loading usually following a null pointer. Use
something closer to the pre-Lite2 code, including not making a copy of
the new filesystem's config info. Not making a copy also fixes a race
for loading and a memory leak for unloading.
Fixed unloading of vfs's. maxvfsconf wasn't maintained.
Look up the vfs to unload by name instead of by number. The numbers
should go away as soon as all mount utilities are converted.
- getnewvnode() and vref() were missing one simple_unlock() each.
- the Lite2 locking changes weren't merged at all in
printlockedvnodes() or sysctl_vnode(). Merging these undid
some KNF style regressions.
all of the configurables and instrumentation related to
inter-process communication mechanisms. Some variables,
like mbuf statistics, are instrumented here for the first
time.
For mbuf statistics: also keep track of m_copym() and
m_pullup() failures, and provide for the user's inspection
the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.
- avoid malloc() if the number of fds is small.
- pack the bits better so that `small' is quite large.
- don't waste time generating zero bits for null fd_set pointers or
scanning these bits.
Possibly improved select():
- free malloc()ed storage before returning. This is simpler and I
think huge select()s aren't worth optimizing since they are rare,
relative gain would be small and there would be tiny costs for all
selects().
Reviewed by: ache (first version by him too)
execve() clears the P_SUGID process flag in execve() if the binary
executed does not have suid or sgid permission bits set.
This also happens when the effective uid is different from the real
uid or the effective gid is different from the real gid. Under
these circumstances, the process still has set id privileges and
the P_SUGID flag should not be cleared.
Submitted by: Tor Egge <Tor.Egge@idt.ntnu.no>
Successful lstat()s purged an existing entry as well as not caching the
result.
This bug was introduced in Lite1 by setting the LOCKPARENT flag for
[o]lstat() in order to support the inherit-attributes-from-parent-
directory misfeature for symlinks. LOCKPARENT was previously only set
for CREATEs and DELETEs. It is now set for LOOKUPs, but only for
[o]lstat(), so the problem wasn't very noticeable.
the old VFS_VFSCONF sysctl is enabled by default.
Initialize the vfc_vfsops field to non-NULL in sysctl_ovfs_conf()
so that the old VFS_VFSCONF sysctl actually works. The old (still
current) getvfsent.c uses this "kernel-only" field to decide which
vfs's are configured (the old implementation returned null entries
for unconfigured vfs's).
to coredump previously since it (somewhat uniquely) is setuid and forks
without execing, and thus without passing P_SUGID the child could
coredump and possibly divulge sensitive information (such as encrypted
passwords from the passwd database).
clusters greater than one page in length by calling contigmalloc1().
This uses a helper process `mclalloc' to do the allocation if
the system runs out at interrupt time to avoid calling contigmalloc
at high spl. It is not yet clear to me whether this works.
sb_max * MCLBYTES / (MSIZE + MCLBYTES)
used in sbreserve() to overflow, causing all socket creation attempts
to fail. Force the calculation to use u_quad_t's, which makes overflow
less likely.
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
The limit is now only used by init, so it may as well be "infinite".
Don't use RLIM_INFINITY, since setrlimit() doesn't allow setting
that value. Use maxfiles instead of RLIM_INFINITY for the hard
limit for the same reason.
Similarly for the maxprocesses limits (use the "infinite" value of
maxproc instead if MAXUPRC and RLIM_INFINITY).
NOFILES, MAXUPRC, CHILD_MAX and OPEN_MAX are no longer used in
/usr/src and should go away. Their values are almost guaranteed to
be wrong now that login.conf exists, so anything that uses the values
is broken. Unfortunately, there are probably a lot of ports that
depend on them being defined.
The global limits maxfilesperproc and maxprocperuid should go away
too.
on it.
makesyscalls.sh:
This parsed $Id$. Fixed(?) to parse $FreeBSD$. The output is wrong when
the id is not expanded in the source file.
syscalls.master:
Fixed declaration of sigsuspend(). There are still some bogons and
spam involving sigset_t.
Use `struct foo *' instead of the equivalent `foo_t *' for some nfs and
lfs syscalls so that <sys/sysproto.h> doesn't depend on <sys/mount.h>.
variable `kern.maxvnodes' which gives much better control over vnode
allocation than EXTRAVNODES (except in -current between 1995/10/28 and
1996/11/12, kern.maxvnodes was read-only and thus useless).
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.
Reviewed by: wollman
rev.1.10 two years ago. Children continued to run at splhigh()
after returning from vm_fork(). This mainly affected kernel
processes and init. For ordinary processes, interrupts are normally
unmasked a few instructions later after fork() returns (it may be
important for syscall() not to reschedule the child processes).
Kernel processes had workarounds for the problem. Init manages to
start because some routines "know" that it is safe to go to sleep
despite their caller starting them at a high ipl. Then its ipl
gets fixed on its first normal return from a syscall.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
Firstly, now our read-ahead clustering is on a file descriptor basis and not
on a per-vnode basis. This will allow multiple processes reading the
same file to take advantage of read-ahead clustering. Secondly, there
previously was a problem with large reads still using the ramp-up
algorithm. Of course, that was bogus, and now we read the entire
"chunk" off of the disk in one operation. The read-ahead clustering
algorithm should use less CPU than the previous also (I hope :-)).
NOTE: THAT LKMS MUST BE REBUILT!!!
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.
The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).
This makes unexpected faults (in an interrupt handler) more likely
to crash properly. It could be done even better (more robustly and
more efficiently) using lazy fault handling.
modules sort of works now. Their devswitch entries aren't cleaned
up, so accessing them after they have been unloaded causes a panic
in spec_open().
Submitted by: durian@plutotech.com (Mike Durian), IIRC
Most of the standard utilities that depended on (or were broken in
a different way by) the old behaviour of interpreting "" as "."
were fixed a year or two ago. There is still a fairly harmless
bug in tar and a harmless bug in gzip. Tar apparently replaces
"/" by "" when it strips leading slashes.
decrease the size of buffer_map to approx 2/3 of what it used to be
(buffer_map can be smaller now.) The original commit of these changes
increased the size of buffer_map to the point where the system would
not boot on large systems -- now large systems with large caches will
have even less problems than before.
the sd & od drivers. There is also slight changes to fdisk & newfs
in order to comply with different sectorsizes.
Currently sectors of size 512, 1024 & 2048 are supported, the only
restriction beeing in fdisk, which hunts for the sectorsize of
the device.
This is based on patches to od.c and the other system files by
John Gumb & Barry Scott, minor changes and the sd.c patches by
me.
There also exist some patches for the msdos filesys code, but I
havn't been able to test those (yet).
John Gumb (john@talisker.demon.co.uk)
Barry Scott (barry@scottb.demon.co.uk)
scheme. Additionally, add the capability for checking for unexpected
kernel page faults. The maximum amount of kva space for buffers hasn't
been decreased from where it is, but it will now be possible to do so.
This scheme manages the kva space similar to the buffers themselves. If
there isn't enough kva space because of usage or fragementation, buffers
will be reclaimed until a buffer allocation is successful. This scheme
should be very resistant to fragmentation problems until/if the LFS code
is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS
is not likely to use such a scheme.
Now there should be NO problem allocating buffers up to MAXPHYS.
succeeds. Writing an action now succeeds iff the handler isn't changed.
(POSIX allows attempts to change the handler to be ignored or cause an
error. Changing other parts of the action is allowed (except attempts
to mask unmaskable signals are silently ignored as usual).)
Found by: NIST-PCTS
the queues and generate a SIGINT. Previously, this wasn't done if ISIG
was clear or the VINTR character was disabled, and it was done by
converting the BREAK to a VINTR character and sometimes bogusly echoing
this character.
Found by: NIST-PCTS
larger than the vfs layer can provide. We now automatically support
32K clusters if MSDOSFS is installed, and panic if a filesystem tries
to allocate a buffer larger than MAXBSIZE.
This commit is a result of some "prodding" by BDE.
substantially increasing buffer space. Specifically, we double
the number of buffers, but allocate only half the amount of memory
per buffer. Note that VDIR files aren't cached unless instantiated
in a buffer. This will significantly improve caching.
using a sockaddr_dl.
Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR)
to work for multicast UDP and raw sockets as well. (They previously only
worked for unicast UDP).
(1) deleted #if 0
pc98/pc98/mse.c
(2) hold per-unit I/O ports in ed_softc
pc98/pc98/if_ed.c
pc98/pc98/if_ed98.h
(3) merge more files by segregating changes into headers.
new file (moved from pc98/pc98):
i386/isa/aic_98.h
deleted:
well, it's already in the commit message so I won't repeat the
long list here ;)
Submitted by: The FreeBSD(98) Development Team
If DEVFS is configured, create devfs devices for previously invisible
partitions on the slices.
Fixed an old aliasing bug which caused E=17 errors from DEVFS for
DIOCSDINFO when there were no real slices.
I decided to do this for every hardclock() call instead of lazily
in microtime(). The lazy method is simpler but has more overhead
if microtime() is called a lot.
CPU_THISTICKLEN() is now a no-op and should probably go away.
Previously it did nothing directly but had the side effect of
setting i586_last_tick for CPU_CLOCKUPDATE() and i586_avg_tick for
debugging. CPU_CLOCKUPDATE() now uses a better method and
i586_avg_tick is too much trouble to maintain.
Reduced nesting of #includes in the usual case.
Increased nesting of #includes when CLOCK_HAIR is defined. This
is a kludge to get typedefs for inline functions only when the
inline functions are used. Normally only kern_clock.c defines
this. kern_clock.c can't include the i386 headers directly.
Removed unused LOCORE support.
- use a more accurate and more efficient method of compensating for
overheads. The old method counted too much time against leaf
functions.
- normally use the Pentium timestamp counter if available.
On Pentiums, the times are now accurate to within a couple of cpu
clock cycles per function call in the (unlikely) event that there
are no cache misses in or caused by the profiling code.
- optionally use an arbitrary Pentium event counter if available.
- optionally regress to using the i8254 counter.
- scaled the i8254 counter by a factor of 128. Now the i8254 counters
overflow slightly faster than the TSC counters for a 150MHz Pentium :-)
(after about 16 seconds). This is to avoid fractional overheads.
files.i386:
permon.c temporarily has to be classified as a profiling-routine
because a couple of functions in it may be called from profiling code.
options.i386:
- I586_CTR_GUPROF is currently unused (oops).
- I586_PMC_GUPROF should be something like 0x70000 to enable (but not
use unless prof_machdep.c is changed) support for Pentium event
counters. 7 is a control mode and the counter number 0 is somewhere
in the 0000 bits (see perfmon.h for the encoding).
profile.h:
- added declarations.
- cleaned up separation of user mode declarations.
prof_machdep.c:
Mostly clock-select changes. The default clock can be changed by
editing kmem. There should be a sysctl for this.
subr_prof.c:
- added copyright.
- calibrate overheads for the new method.
- documented new method.
- fixed races and and machine dependencies in start/stop code.
mcount.c:
Use the new overhead compensation method.
gmon.h:
- changed GPROF4 counter type from unsigned to int. Oops, this should
be machine-dependent and/or int32_t.
- reorganized overhead counters.
Submitted by: Pentium event counter changes mostly by wollman
add free vnodes back to the freelist. They must do their own vnode
management. Anyway, this change is *only* activated with their filesystem
and doesn't affect anyone else. Whoops, forgot the submitted-by lines
in my previous commits too.. :-(
Submitted-By: Tony Ardolino <tony@netcon.com>
The heuristic for managment of memory backing the buffer cache was
nice, but didn't work due to some architectural problems. Simplify
and improve the algorithm.
capable of being used for things other than swap space allocation,
and splvm would have been appropriate for only swap space allocation
and other VM things. My commit broke that (and was actually a mistake.)
previous snap. Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.
(yes I had tested the hell out of this).
I've also temporarily disabled the code so that it behaves as it previously
did (tail drop's the syns) pending discussion with fenner about some socket
state flags that I don't fully understand.
Submitted by: fenner
Major: When blocking occurs in allocbuf() for VMIO files,
excess wire counts could accumulate.
Major: Pages are incorrectly accumulated into the physical
buffer for clustered reads. This happens when bogus
page is needed.
Minor: When reclaiming buffers, the async flag on the buffer
needs to be zero, or the reclaim is not optimal.
Minor: The age flag should be cleared, if a buffer is wanted.
1/ session leader
2/ Have a console device vnode (/dev/console)
3/ have NULL pointer for a consoel tty struct.
fix the only case where the tty struct is referenced without a prior
check for existance.
- kern.maxproc and kern.maxprocperuid were read-only (and thus essentially
useless. Apparently no one uses them).
- all the user sysctls were read-write (and thus it was possible for them
to be inconsistent with the authoritative fixed values in the library).
Removed unused #include.
It is needed for implementation details but very little of it is
needed for the interface. Include it in the few places that didn't
already include it.
Include <sys/ioccom.h> in <sys/disklabel.h> (as already in
<sys/diskslice.h>) so that all the disk-related headers are almost
self-sufficient.
incorrect, and correct the support for B_ORDERED. The spl window
fix was from Peter Wemm, and his questions led me to find the problem with
the interrupt time page manipulation.
data pointed at in a ktrace file, if this process is being ktrace'ed.
I'm using this to profile malloc usage.
The advantage is that there is no context around this call, ie, no
open file or socket, so it will work in any process, and you can
decide if you want it to collect data or not.
/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>
B_ASYNC flag broke things pretty bad (freeing buffer already on
queue or other wierd buffer queue errors.) The broken code is
left in commented out, but this makes the problem go away for
now.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)
for entire SYS5 SHM segments. This is totally unnecessary, and so the
correct allocation of VM objects has been substituted. (The vm_mmap
was misused -- vm_object_allocate is more appropriate.)
Bowrite guarantees that buffers queued after a call to bowrite will
be written after the specified buffer (on a particular device).
Bowrite does this either by taking advantage of hardware ordering support
(e.g. tagged queueing on SCSI devices) or resorting to a synchronous write.
were declared as non-const. This is backwards (_lkm_exec() changes the
pointers but all the target `struct execsw's are const). Fixed this
and poisoned related declarations to match and removed the bogus casts
that hid the bug.
the file access time update on reads and can be useful in reducing
filesystem overhead in cases where the access time is not important (like
Usenet news spools).
the primary and secondary return codes, causing it to not behave as
documented. This probably originates from the ancient BSD kernels that
had pipe(2) implemented by socketpair(2), there are no binaries left that
we can run that do this.
Pointed out by: Robert Withrow <witr@rwwa.com>, PR#731
note that at_shutdown has a new parameter to indicate When
during a shutdown the callout should be made. also
add a RB_POWEROFF flag to reboot "howto" parameter..
tells the reboot code in our at_shutdown module to turn off the UPS
and kill the power. bound to be useful eventually on laptops
The interface into the "VMIO" system has changed to be more consistant
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.
Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.
Reviewed by: davidg
half way through the range rather than possibly colliding with
fixed elements. Increase the size of the arrays to take this into account..
remember that each element in the array is now only 1 ponter so this
isn't that much..
also note a possible bug in debugging code in uipc_socket2.c (add XXX)
I've been meaning to do this for AGES as I keep having to patch those routines
whenever I write a proprietary package or similar..
any module that assigns resources to processes needs to know when
these events occur. there are existsing modules that should be modified
to take advantage of these.. e.g. SYSV IPC primatives
presently have #ifdef entries in exit()
this also helps with making LKMs out of such things..
(see the man pages at_exit(9) and at_fork(9))
called kern_shutdown.c
note: I couldn't see anything machine dependant in the
functions boot() and dumpsys() which were in machdep.c
I have left a prototype for cpu_boot() which would go in
machdep.c, but I have nothing to put in it. Iexpect others will
let me know in no uncertain ways that this or that is machine dependant
and should be there, but I'll way for that to happen.. :)
I haven't actually taken the functions OUT of machdep
or anywhere else yet.. I'm checking in this file so others can have a look
at it and comment. SO PLEASE DO COMMENT!
I am also (in another checkin) addinf a man(9) page for the new
at_shotdown().. er freudian slip there.. at_shutdown() call
so have a look at that (and at_exit and at_fork as well)
and feed me comments..
I'll heck in the changes to make these (shutdown) changes active tomorrow
if no-one objects too strongly..
Fixes unp_externalize panic which occurs when a process is at it's
ulimit for file descriptors and tries to receive a file descriptor from
another process.
Reviewed by: wollman
block number.. (assuming Debugger() returned). The disk drivers assume
that dscheck() sets both error markers (bp->b_error and set B_ERROR in
bp->b_flags) if it fails.
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
performance issues.
1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.
Saves about 280 butes of source per driver, 56 bytes in object size
and another 56 bytes moves from data to bss.
No functional change intended nor expected.
GENERIC should be about one k smaller now :-)