enough and can cause some strange performance problems. Specifically, at
or near startup time is when the problem is worst. To reproduce
the problem, run "lat_syscall stat" from the alpha lmbench code right
after bootup. A positive side effect of this mod is that the name
cache can be set to grow again by sysctl. A noticable positive
performance impact is realized due to a larger namecache being available
as needed (or tuned.)
when execing a setuid/setgid binary. Code submitted by Sean Eric Fagan
(sef@freebsd.org).
Also consolidated the setuid/setgid checks into one place.
Reviewed by: dyson,sef
defined, your really getting 32. Also warn about how you can't have
more than 256 pty's when your using DEVFS (non DEVFS can use more, just
the makedev script doesn't know how to make >256). it also doesn't
allocate more memory than needed in this case.
Make sure that the signal passed in TIOCSIG isn't 0 as it might cause
a panic. I personally haven't seen this happen, but after a similar
bug in syscons crashed my machine, I'm acutely aware of this one. :)
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.
mplock.s:
- minor update of try_mplock for new algorithm where a CPU uses try_mplock
instead of get_mplock in the ISRs.
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()
Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock
Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.
It seems to work!!!
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.
We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...
this code is controlled by smptests.h: TEST_CPUSTOP, OFF by default
new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
this code is controlled by smptests.h: TEST_ALTTIMER, ON by default
POSIX.4. Additionally, there is some initial code that supports LIO.
This code supports AIO/LIO for all types of file descriptors, with
few if any restrictions. There will be a followup very soon that
will support significantly more efficient operation for VCHR type
files (raw.) This code is also dependent on some kernel features
that don't work under SMP yet. After I commit the changes to the
kernel to support proper address space sharing on SMP, this code
will also work under SMP.
General cleanup.
New functions to stop/start CPUs via IPIs:
- int stop_cpus( u_int map );
- int restart_cpus( u_int map );
Turned off by default, enabled via smptests.h:TEST_CPUSTOP.
Current version has a BUG, perhaps a deadlock?
Till now NMIs would be ignored. Now an NMI is caught by the BSP.
APs still ignore NMI, am working on code to allow a CPU to stop other CPUs
via an IPI.
Specifically, don't allow a value < 1 for any of them (it doesn't make
sense), and don't let the low water mark be greater than the corresponding
high water mark.
Pre-Approved by: wollman
Obtained from: NetBSD
nothing good except of opening a can of (potential or real) security
holes. People maintaining a machine with higher security requirements
need to be on the console anyway, so there's no point in not forcing
them to reboot before starting maintenance.
Agreed by: hackers, guido
This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])
There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.
This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.
Reviewed by: fsmp, dyson
flag wasn't being respected during vref(), et. al. Note that this
isn't the eventual fix for the locking problem. Fine grained SMP
in the VM and VFS code will require (lots) more work.
cause a problem of spiraling death due to buffer resource limitations.
The vfs_bio code in general had little ability to handle buffer resource
management, and now it does. Also, there are a lot more knobs for tuning the
vfs_bio code now. The knobs came free because of the need that there
always be some immediately available buffers (non-delayed or locked) for
use. Note that the buffer cache code is much less likely to get bogged
down with lots of delayed writes, even more so than before.
It is possible for multiple process to sleep concurrently waiting
for a buffer. When the buffer shortage is a shortage of space but
not a shortage of buffer headers, the processes took turns creating
empty buffers and waking each other to advertise the brelse() of
the empties; progress was never made because tsleep() always found
another high-priority process to run and everything was done at
splbio(), so vfs_update never had a chance to flush delayed writes,
not to mention that i/o never had a chance to complete.
The problem seems to be rare in practice, but it can easily be
reproduced by misusing block devices, at least for sufficently slow
devices on machines with a sufficiently small buffer cache. E.g.,
`tar cvf /dev/fd0 /kernel' on an 8MB system with no disk in fd0
causes the problem quickly; the same command with a disk in fd0
causes the problem not quite as quickly; and people have reported
problems newfs'ing file systems on block devices.
Block devices only cause this problem indirectly. They are pessimized
for time and space, and the space pessimization causes the shortage
(it manifests as internal fragmentation in buffer_map).
This should be fixed in 2.2.
cost since it is only done in cpu_switch(), not for every exception.
The extra state is kept in the pcb, and handled much like the npx state,
with similar deficiencies (the state is not preserved across signal
handlers, and error handling loses state).
shared function.
- use p->p_sleepend to try and get more accurate "time remaining" results
when the time has been adjusted.
- verify writeability of return address so that we can fail before sleeping
if the address for the result is bogus.
Changes to pmap.c for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.
be (eventually) architecture independent. It provides an emulation
of the ISA interrupt registration function register_intr(), but that
function does no longer manipulated the interrupt controller and
interrupt descriptor table, but calls the architecture dependent
function setup_icu() for that purpose.
After theISA/EISA bus code has been modified to directly call the new
interrupt registartion functions (intr_create() and intr_connect()),
the emulation of register_intr() should be dropped.
The C level interrupt handler function should take a (void*) argument,
and the function pointer type (inthand2_t) should defined in some other
place than isa_device.h.
This commit is a pre-requisite for the removal of the PCI specific shared
interrupt code.
Reviewed by: dfr,bde
This is now the default, it delays most of the MP startup to the function
machdep.c:cpu_startup(). It should be possible to move the 2 functions
found there (mp_start() & mp_announce()) even further down the path once
we know exactly where that should be...
Help from: Peter Wemm <peter@spinner.dialix.com.au>
- The 1st (preparse_mp_table()) counts the number of cpus, busses, etc. and
records the LOCAL and IO APIC addresses.
- The 2nd pass (parse_mp_table()) does the actual parsing of info and recording
into the incore MP table.
This will allow us to defer the 2nd pass untill malloc() & private pages
are available (but thats for another day!).
When a panic occurs early in the SMP boot process 'cpunumber()' hangs,
causing the panic string to be lost. Now the system appears to hang
in 'breakpoint()', but at least the user sees the panic string before the
hang.
switch. I needed 'LINT' to compile for other reasons so I kinda got the
blood on my hands. Note: I don't know how to test this, I don't know if
it works correctly.
panic( "xxxxx\n" );
to:
printf( "xxxxx\n" );
panic( "\n" );
For some as yet undetermined reason the argument to panic() is often NOT
printed, and the system sometimes hangs before reaching the panic printout.
So we hopefully at least print some useful info before the hang, as oppossed to
leaving the user clueless as to what has happened.
and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully
can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly)
Reviewed by: dyson
to fill in the nfs_diskless structure, at the cost of some kernel
bloat. The advantage is that this code works on a wider range of
network adapters than netboot. Several new kernel options are
documented in LINT.
Obtained from: parts of the code comes from NetBSD.
Serious:
- An important timevalfix() in settime[ofday]() was lost.
Not so serious:
- There was a race initializing `delta' in the check for setting the
time backwards.
- The `#ifdef notyet' check for setting the time more than a day forwards
was back to front.
[[I deleted the code, it's useless because of iteration - Peter]]
- The timespec was not checked for validity in clock_settime().
- The timespec was not fully checked for validity in nanotime(). The
check in itimerfix() is too late, since the conversion from a timespec
to a timeval may overflow.
- A garbage timeval was checked in settimeofday() for the (uap->tv == NULL
&& uap->tzp != NULL) case. I added the broken check this some time ago.
Cosmetic:
- The "inadvertantly (sic) sleeping forever" test always failed. hzto()
always returns >= 1.
- The style wasn't very KNFish. (I only changed new code.)
Submitted by: bde
in NetBSD. The core of settimeofday() is moved to a seperate static
function settime() which both clock_settime() and settimeofday() call.
Note that I picked up the securelevel > 1 check from NetBSD that prevents
the clock being set backwards in high securelevel mode (this was a hole
that allowed resetting of inode access timestamps to arbitary values)
Obtained from: mostly from NetBSD, but the settime() function is from
our gettimeofday(), some tweaks by me.
the patches in freefall:/home/dfr/ld.diffs to your ld sources and set
BINFORMAT to aoutkld when linking the kernel.
Library changes and userland utilities will appear in a later commit.
".." vnode. This is cheaper storagewise than keeping it in the
namecache, and it makes more sense since it's a 1:1 mapping.
2. Also handle the case of "." more intelligently rather than stuff
the namecache with pointless entries.
3. Add two lists to the vnode and hang namecache entries which go from
or to this vnode. When cleaning a vnode, delete all namecache
entries it invalidates.
4. Never reuse namecache enties, malloc new ones when we need it, free
old ones when they die. No longer a hard limit on how many we can
have.
5. Remove the upper limit on namelength of namecache entries.
6. Make a global list for negative namecache entries, limit their number
to a sysctl'able (debug.ncnegfactor) fraction of the total namecache.
Currently the default fraction is 1/16th. (Suggestions for better
default wanted!)
7. Assign v_id correctly in the face of 32bit rollover.
8. Remove the LRU list for namecache entries, not needed. Remove the
#ifdef NCH_STATISTICS stuff, it's not needed either.
9. Use the vnode freelist as a true LRU list, also for namecache accesses.
10. Reuse vnodes more aggresively but also more selectively, if we can't
reuse, malloc a new one. There is no longer a hard limit on their
number, they grow to the point where we don't reuse potentially
usable vnodes. A vnode will not get recycled if still has pages in
core or if it is the source of namecache entries (Yes, this does
indeed work :-) "." and ".." are not namecache entries any longer...)
11. Do not overload the v_id field in namecache entries with whiteout
information, use a char sized flags field instead, so we can get
rid of the vpid and v_id fields from the namecache struct. Since
we're linked to the vnodes and purged when they're cleaned, we don't
have to check the v_id any more.
12. NFS knew about the limitation on name length in the namecache, it
shouldn't and doesn't now.
Bugs:
The namecache statistics no longer includes the hits for ".."
and "." hits.
Performance impact:
Generally in the +/- 0.5% for "normal" workstations, but
I hope this will allow the system to be selftuning over a
bigger range of "special" applications. The case where
RAM is available but unused for cache because we don't have
any vnodes should be gone.
Future work:
Straighten out the namecache statistics.
"desiredvnodes" is still used to (bogusly ?) size hash
tables in the filesystems.
I have still to find a way to safely free unused vnodes
back so their number can shrink when not needed.
There is a few uses of the v_id field left in the filesystems,
scheduled for demolition at a later time.
Maybe a one slot cache for unused namecache entries should
be implemented to decrease the malloc/free frequency.
but now that we've widened the scope of the smp work to -current, it might
be an idea to warn new people that might not have read all the docs yet
that the SMP support needs to be activated via a sysctl.
This code re-numbers PCI busses in the MP table to match PCI semantics
when the MP BIOS fails to do it properly.
Reviewed by: Peter Wemm <peter@spinner.DIALix.COM>
replace invldebug with invltlb_ok for throttling smp_invltlb() during boot.
Reviewed by: informal discussion with Peter Wemm <peter@spinner.DIALix.COM>
Peter Wemm <peter@spinner.DIALix.COM>, Steve Passe <smp@csn.net>
removed all the IPI_INTS code.
made the XFAST_IPI32 code default, renaming Xfastipi32 to Xinvltlb.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.
2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
There are various options documented in i386/conf/LINT, there is more to
come over the next few days.
The kernel should run pretty much "as before" without the options to
activate SMP mode.
There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.
This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!
Fix another bug: if argv[0] is NULL, garbadge args might be added for
shell script
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no> (with yet one fault detect from me)
difference of approx 3mins in make world on my P6!!! This means
that vfork now has full address space sharing, so beware with
sloppy vfork programming. Also, you really do need to apply
the previously committed popen fix in libc.
Zero the b_dirty{off,end} after cluster-comitting a group of buffers.
With these fixes, I was able to complete a 'make world' with remote src
and obj directories.
were always in a tss; that tss just changed from the one in the
pcb to common_tss (who knows where it was when there was no curpcb?).
Not using the pcb also fixed the problem that there is no pcb in
idle(), so we now always get useful register values.
cache queue more often. The pageout daemon had to be waken up
more often than necessary since pages were not put on the
cache queue, when they should have been.
Submitted by: David Greenman <dg@freebsd.org>
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)
Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.
Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.
Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.
Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads address space by an exec, so we
now automatically divorce the address space before modifying it.
longer has anything to do with vnodes and never had anything to do
with buffers, but it needs the definitions of B_READ and B_WRITE
for use with the bogus useracc() interface and was getting them
bogusly due to excessive cleanups in rev.1.49.
space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
convenient and makes life difficult for my next commit. We still need
an i386tss to point to for the tss slot in the gdt, so we use a common
tss shared between all processes.
Note that this is going to break debugging until this series of commits
is finished. core dumps will change again too. :-( we really need
a more modern core dump format that doesn't depend on the pcb/upages.
This change makes VM86 mode harder, but the following commits will remove
a lot of constraints for the VM86 system, including the possibility of
extending the pcb for an IO port map etc.
Obtained from: bde
Use the name argument almost the same in all LKM types. Maintain
the current behavior for the external (e.g., modstat) name for DEV,
EXEC, and MISC types being #name ## "_mod" and SYCALL and VFS only
#name. This is a candidate for change and I vote just the name without
the "_mod".
Change the DISPATCH macro to MOD_DISPATCH for consistency with the
other macros.
Add an LKM_ANON #define to eliminate the magic -1 and associated
signed/unsigned warnings.
Add MOD_PRIVATE to support wcd.c's poking around in the lkm structure.
Change source in tree to use the new interface.
Reviewed by: Bruce Evans
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.
The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.
The code in procfs_mem is now less bogus (but maybe still a little
so.)
implementation #ifdef out. This can be used for now by NFS. As soon
as all the other filesystems' locking is fixed, this can go away.
Print the vnode address in vprint for easier debugging.
1. imgp->image_header needs to be cleared for the bp == NULL && `goto
interpret' case, else exec_fail_dealloc would free it twice after
an error.
2. Moved the vp->v_writecount check in exec_check_permissions() to
near the end. This fixes execve("/dev/null", ...) returning the
bogus errno ETXTBSY. ETXTBSY is still returned for attempts to
exec interpreted files that are open for writing. The man page
is very old and wrong here. It says that ETXTBSY is for pure
procedure (shared text) files that are open for writing or reading.
3. Moved the setuid disabling in exec_check_permissions() to the end.
Cosmetic. It's more natural to dispose of all the error cases
first.
...plus a couple of other cosmetic changes.
Submitted by: bde
by bde.
Don't return EPERM in setre[ug]id() just because the caller passes in
the current effective id in the second arg (ie: no change), as suggested
by ache.
The magic number conflicted with the rotting disabled one in ext2fs for
debug.doasyncfree.
Removed messy debugging variable/constant/sysctl debug.doreallocblks.
Lite2 removed it, and we don't use the code that it controls.
This is valueable for library code which needs to be able to find out
whether the current process is or *was* set[ug]id at some point in the
past, and may have a "tainted" execution environment. This is especially
a problem with the trend to immediately revoke privs at startup and regain
them for critical sections. One problem with this is that if a cracker
is able to compromise the program while it's still got a saved id, the
cracker can direct the program to regain the privs. Another problem is
that the user may be able to affect the program in some other way (eg:
setting resolver host aliases) and the library code needs to know when it
should disable these sorts of features.
Reviewed by: ache
Inspired by: OpenBSD (but with a different implementation)
that allows traditional BSD setuid/setgid behavior.
The only visible difference should be that a non-root setuid program
(eg: inn's "rnews" program) that is setuid to news, can completely
"become" uid news. (ie: setuid(geteuid()) This was allowed in
traditional 4.2/4.3BSD and is now "blessed" by Posix as a special
case of "appropriate privilige".
Also, be much more careful with the P_SUGID flag so that we can use it
for issetugid() - only set it if something changed.
Reviewed by: ache
vector except for the egid in groups[0]. There is a risk that programs
that come from SYSV/Linux that expect this to work and don't check for
error returns may accidently pass root's groups on to child processes.
We now do what is least suprising (to non BSD programs/programmers) in
this scenario, and nothing is changed for programs written with BSD groups
rules in mind.
Reviewed by: ache
to removing the connection from the queue. The problem here is that
falloc() may block and this would allow another process to accept the
connection instead. If this happens to leave the queue empty, then the
system will panic with an "accept: nothing queued".
Also changed a wakeup() to a wakeup_one() to avoid the "thundering herd"
problem on new connections in Apache (or any other application that has
multiple processes blocked in accept() for the same socket).
as shadows of their containing directory. This should solve the problem
of users not being able to delete their symlinks from /tmp once and for
all.
Symlinks do not have modes though, they are accessable to everything that
can read the directory (as before). They are made to show this fact at
lstat time (they appear as mode 0777 always, since that's how the the
lookup routines in the kernel treat them).
More commits will follow, eg: add a real lchown() syscall and man pages.
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.
they were created later on. This is not the case when processing
syscalls.isc in the ibcs2 area. (It generates no declarations, it's
all either hidden (already prototyped elsewhere) or unimplemented).
<sys/ioctl_compat.h> and sometimes <sys/filio.h> instead of
<sys/ioctl.h> in tty-related files. <sys/ttycom.h> is still
usually imported bogusly via <sys/termios.h>.
<sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h>
in miscellaneous files. Most of these files have nothing to do
with ttys but need to include <sys/ttycom.h> to get the definitions
of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into
ioctls.
automatically have random generation numbers. The kenel way of handling those
also changed. Further it is advised to run fsirand on all your nfs exported
filesystems. the code is mostly copied from OpenBSD, with the randomization
chanegd to use /dev/urandom
Reviewed by: Garrett
Obtained from: OpenBSD
null casts. `time' is nonvolatile for accesses within a region locked
by splclock()/splx(). Accesses outside such a region are invalid, and
splx() must have the side effect of potentially changing all global
variables (since there are hundreds of sort of volatile variables like
`time'), so declaring `time' as volatile didn't have any real benefits.
form `tv = time'. Use a new function gettime(). The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.
processes using AF_LOCAL sockets. This hack is going to be used with
Secure RPC to duplicate a feature of STREAMS which has no real counterpart
in sockets (with STREAMS/TLI, you can apparently use t_getinfo() to learn
UID of a local process on the other side of a transport endpoint).
What happens is this: the client sets up a sendmsg() call with ancillary
data using the SCM_CREDS socket-level control message type. It does not
need to fill in the structure. When the kernel notices the data,
unp_internalize() fills in the cmesgcred structure with the sending
process' credentials (UID, EUID, GID, and ancillary groups). This data
is later delivered to the receiving process. The receiver can then
perform the follwing tests:
- Did the client send ancillary data?
o Yes, proceed.
o No, refuse to authenticate the client.
- The the client send data of type SCM_CREDS?
o Yes, proceed.
o No, refuse to authenticate the client.
- Is the cmsgcred structure the right size?
o Yes, proceed.
o No, signal a possible error.
The receiver can now inspect the credential information and use it to
authenticate the client.
devtotty(). devtotty() must check its arg carefully since the arg is
supplied as ioctl data. This should fix PR3004.
Renamed devtotty() to snpdevtotty().
formula uses `& nchash'. This is very broken when nchash is a prime
number instead of 1 less than a power of 2, but the Lite2 formula was
merged in.
Merged some cosmetic changes from Lite2, rev.1.21 and Lite1. The merge
was difficult because the Lite2 code is essentially ours (phk's) except
where Lite2 improved or broke it.
Summary of the Lite2 changes:
- in the copyright, phk's rights have been transferred to the Regents.
This change should be reviewed.
- nchENOENT went away; the "no" vnode is now simply 0.
- comments were improved.
- style was "improved".
- goto instead of Fanatism (sic) was considered bad :-).
- there are some small changes to support whiteouts.
- new cache entries are added in more cases. More work is required
near here to change the hash table size if kern.desiredvnodes is
changed using sysctl.
- rescanning of the hash bucket in cache_purgevfs() was removed. This
change should be reviewed.
(phk's) sysctl framework, and I needed special code to disambiguate
the VFS_GENERIC node from the VFS_VFSCONF leaf, so I only converted
the leaves to the FreeBSD framework. The error handling isn't quite
right. CSRGS's sysctls seem to return ENOTDIR too much and FreeBSD's
sysctls don't agree with the man page.
and getvfsbyname() interfaces. The new interfaces are now hidden from
applications unless _NEW_VFSCONF is defined. The new vfsconf interfaces
don't work yet.
cruft and resulted in loading usually following a null pointer. Use
something closer to the pre-Lite2 code, including not making a copy of
the new filesystem's config info. Not making a copy also fixes a race
for loading and a memory leak for unloading.
Fixed unloading of vfs's. maxvfsconf wasn't maintained.
Look up the vfs to unload by name instead of by number. The numbers
should go away as soon as all mount utilities are converted.
- getnewvnode() and vref() were missing one simple_unlock() each.
- the Lite2 locking changes weren't merged at all in
printlockedvnodes() or sysctl_vnode(). Merging these undid
some KNF style regressions.
all of the configurables and instrumentation related to
inter-process communication mechanisms. Some variables,
like mbuf statistics, are instrumented here for the first
time.
For mbuf statistics: also keep track of m_copym() and
m_pullup() failures, and provide for the user's inspection
the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.
- avoid malloc() if the number of fds is small.
- pack the bits better so that `small' is quite large.
- don't waste time generating zero bits for null fd_set pointers or
scanning these bits.
Possibly improved select():
- free malloc()ed storage before returning. This is simpler and I
think huge select()s aren't worth optimizing since they are rare,
relative gain would be small and there would be tiny costs for all
selects().
Reviewed by: ache (first version by him too)
execve() clears the P_SUGID process flag in execve() if the binary
executed does not have suid or sgid permission bits set.
This also happens when the effective uid is different from the real
uid or the effective gid is different from the real gid. Under
these circumstances, the process still has set id privileges and
the P_SUGID flag should not be cleared.
Submitted by: Tor Egge <Tor.Egge@idt.ntnu.no>
Successful lstat()s purged an existing entry as well as not caching the
result.
This bug was introduced in Lite1 by setting the LOCKPARENT flag for
[o]lstat() in order to support the inherit-attributes-from-parent-
directory misfeature for symlinks. LOCKPARENT was previously only set
for CREATEs and DELETEs. It is now set for LOOKUPs, but only for
[o]lstat(), so the problem wasn't very noticeable.
the old VFS_VFSCONF sysctl is enabled by default.
Initialize the vfc_vfsops field to non-NULL in sysctl_ovfs_conf()
so that the old VFS_VFSCONF sysctl actually works. The old (still
current) getvfsent.c uses this "kernel-only" field to decide which
vfs's are configured (the old implementation returned null entries
for unconfigured vfs's).
to coredump previously since it (somewhat uniquely) is setuid and forks
without execing, and thus without passing P_SUGID the child could
coredump and possibly divulge sensitive information (such as encrypted
passwords from the passwd database).
clusters greater than one page in length by calling contigmalloc1().
This uses a helper process `mclalloc' to do the allocation if
the system runs out at interrupt time to avoid calling contigmalloc
at high spl. It is not yet clear to me whether this works.
sb_max * MCLBYTES / (MSIZE + MCLBYTES)
used in sbreserve() to overflow, causing all socket creation attempts
to fail. Force the calculation to use u_quad_t's, which makes overflow
less likely.
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
The limit is now only used by init, so it may as well be "infinite".
Don't use RLIM_INFINITY, since setrlimit() doesn't allow setting
that value. Use maxfiles instead of RLIM_INFINITY for the hard
limit for the same reason.
Similarly for the maxprocesses limits (use the "infinite" value of
maxproc instead if MAXUPRC and RLIM_INFINITY).
NOFILES, MAXUPRC, CHILD_MAX and OPEN_MAX are no longer used in
/usr/src and should go away. Their values are almost guaranteed to
be wrong now that login.conf exists, so anything that uses the values
is broken. Unfortunately, there are probably a lot of ports that
depend on them being defined.
The global limits maxfilesperproc and maxprocperuid should go away
too.
on it.
makesyscalls.sh:
This parsed $Id$. Fixed(?) to parse $FreeBSD$. The output is wrong when
the id is not expanded in the source file.
syscalls.master:
Fixed declaration of sigsuspend(). There are still some bogons and
spam involving sigset_t.
Use `struct foo *' instead of the equivalent `foo_t *' for some nfs and
lfs syscalls so that <sys/sysproto.h> doesn't depend on <sys/mount.h>.
variable `kern.maxvnodes' which gives much better control over vnode
allocation than EXTRAVNODES (except in -current between 1995/10/28 and
1996/11/12, kern.maxvnodes was read-only and thus useless).
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.
Reviewed by: wollman
rev.1.10 two years ago. Children continued to run at splhigh()
after returning from vm_fork(). This mainly affected kernel
processes and init. For ordinary processes, interrupts are normally
unmasked a few instructions later after fork() returns (it may be
important for syscall() not to reschedule the child processes).
Kernel processes had workarounds for the problem. Init manages to
start because some routines "know" that it is safe to go to sleep
despite their caller starting them at a high ipl. Then its ipl
gets fixed on its first normal return from a syscall.