style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.
Reviewed by: alc
machdep.guessed_bootdev, and add code to sysctl to parse its value
and give a (not necessarily correct) name to the device we booted
from (the main motivation for this code is to use the info in the
PicoBSD boot scripts, and the impact on the kernel is minimal).
NOTE: the information available in bootdev is not always reliable,
so you should not trust it too much. The parsing code is the same
as in boot2.c, and cannot cover all cases -- as it is, it seems to
work fine with floppies and IDE disks recognised by the BIOS. It
_should_ work as well with SCSI disks recognised by the BIOS.
Booting from a CDROM in floppy emulation will return /dev/fd0 (because
this is what the BIOS tells us).
Booting off the network (e.g. with etherboot) leaves bootdev unset so
the value will be printed as "invalid (0xffffffff)".
Finally, this feature might go away at some point, hopefully when we
have a more reliable way to get the same information.
MFC-after: 5 days
in "missing dependencies" error when loading some kld modules. It is sad to
see how often these days style cleanus break doesn't broken things. Perhaps
people should recall good old principle: "don't fix it if it isn't broken".
-#if defined(__FreeBSD__) && __FreeBSD_version__ >= 500023
+#if defined(__FreeBSD__) && __FreeBSD_version >= 500023
is a genuine bug -- __FreeBSD_version__ does not exist.
The other one:
-#if (__FreeBSD__ < 5)
+#if (__FreeBSD_version < 500000)
pops out when you cross-compile the code:
__FreeBSD__ is a compiler predefine,
__FreeBSD_version is defined in <sys/param.h> .
Given that in this case (and all others in sys/dev/usb and sys/i4b)
the goal is to adapt to a different kernel interface, and not to
a compiler feature, I believe the correct form is the second one
(in the best case the two are synonyms so the change does not break
anything anyways).
for POSIX.1-2001 conformance.
o Add magic to <netinet/in.h> and <netinet6/in6.h> to prevent
redefining INET_ADDRSTRLEN and INET6_ADDRSTRLEN.
o Add a note about missing typedefs in <arpa/inet.h>.
the number of physical pages per KVA page allocated scales properly with
memory size. This fixes problems with kmem_map being too small.
Noticed by: mike, wollman
Submitted by: tmm
fully instaniated.
Revert the logic in pipeclose so that we don't have the entire function
pretty much under a single if() statement, instead invert the test and
just return if it fails.
Submitted (in different form) by: bde
Don't use pool mutexes for pipes. We can not use pool mutexes
because we will need to grab the select lock while holding a pipe
lock which is not allowed because you may not aquire additional
mutexes when holding a pool mutex.
Instead malloc(9) space for the mutex that is shared between the
pipes.
o In i386's <machine/endian.h>, macros have some advantages over
inlines, so change some inlines to macros.
o In i386's <machine/endian.h>, ungarbage collect word_swap_int()
(previously __uint16_swap_uint32), it has some uses on i386's with
PDP endianness.
Submitted by: bde
o Move a comment up in <machine/endian.h> that was accidentially moved
down a few revisions ago.
o Reenable userland's use of optimized inline-asm versions of
byteorder(3) functions.
o Fix ordering of prototypes vs. redefinition of byteorder(3)
functions, so that the non-GCC (libc asm) case has proper
prototypes.
o Add proper prototypes for byteorder(3) functions in <sys/param.h>.
o Prevent redundant duplicate prototypes by making use of the
_BYTEORDER_PROTOTYPED define.
o Move the bswap16(), bswap32(), bswap64() C functions into MD space
for platforms in which asm versions don't exist. This significantly
reduces the complexity of some things at the cost of duplicate code.
Reviewed by: bde
the revived code.
vm pages newly allocated are marked busy (PG_BUSY), thus calling
vm_page_delete before the pages has been freed or unbusied will
cause a deadlock since vm_page_object_page_remove will wait for the
busy flag to be cleared. This can be triggered by calling malloc
with size > PAGE_SIZE and the M_NOWAIT flag on systems low on
physical free memory.
A kernel module that reproduces the problem, written by Logan Gabriel
<logan@mail.2cactus.com>, can be found in the freebsd-hackers mail
archive (12 Apr 2001). The problem was recently noticed again by
Archie Cobbs <archie@dellroad.org>.
Reviewed by: dillon
simply need to prevent switching from another CPU and do not need
interrupts disabled.
- Add a comment to witness_list() about why displaying spin locks for
threads on other CPU's really is just a bad idea and probably shouldn't
be done.
and commenting of NETSMB, NETWMBCRYPTO, and SMBFS. In NOTES, they
had all floated to the bottom of the file with the list of seemingly
random and unclassified kernel options. This change moves them back
up to the network protocol and file system areas, and also documents
the dependencies.
This is belived to be the only place where a soft reference to a vnode
is held with no sort of hard reference, consequently this change should
allow us to free(9) vnodes from the freelist after properly cleaning
them up.
Reviewed by: dillon
it worked- but I ran into a case with a 2204 where commands were being lost
right and left. Best be safe.
For target mode, or things called if we call isp_handle_other response- note
that we might have dropped locks by changing the output pointer so we bail
from the loop. It's the responsibility of the entity dropping the lock to
make sure that we let the f/w know we've read thus far into the response
queue (else we begin processing the same entries again- blech!).
MFC after: 1 day
the upcoming 7.4 family (7xxx controllers).
- improved error reporting and handling
- more diagnostic output
- add extra command packet definitions
- merge sources again with -stable
than the other implementations; we have complete control over the tlb, so we
only demap specific pages. We take advantage of the ranged tlb flush api
to send one ipi for a range of pages, and due to the pm_active optimization
we rarely send ipis for demaps from user pmaps.
Remove now unused routines to load the tlb; this is only done once outside
of the tlb fault handlers.
Minor cleanups to the smp startup code.
This boots multi user with both cpus active on a dual ultra 60 and on a
dual ultra 2.
Due to allocating tlb contexts on the fly, we only ever need to demap the
primary context, non-primary contexts have already been implicitly flushed
by context switching. All we really need to tell is if its a kernel demap
or not, and its easier just to compare against the kernel_pmap which is a
constant.
the context is not actually stolen, as it would be for i386. Instead of
deactivating a user vmspace immediately when switching out, and recycling
its tlb context, wait until the next context switch to a different user
vmspace. In this way we can switch from a user process to any number of
kernel threads and back to the same user process again, without losing any
of its mappings in the tlb that would not already be knocked by the automatic
replacement algorithm. This is not expected to have a measurable performance
improvement on the machines we currently run on, but it sounds cool and makes
the sparc64 port SMPng buzz word compliant.
to exhaust all kmaps. The only reward for setting maxproc
to a value which will cause kmap exhaustion is a panic
during a forkbomb attack.
MFC after: 3 days
by removing parentheses. The main bug is in gcc: on machines with
64-bit longs and 64-bit long longs,
(unsigned long long)rdp->total_sectors / ((1024L * 1024L) / DEV_BSIZE))
has type plain unsigned long instead of the correctly promoted type
unsigned long long, so it can not be printfed using %llu format. Even
1ULL / 1L is mispromoted. Anyway, casting the correct operand
automatically avoids the problem. We do not want to to pessimize the
division; we just want to convert to a common maximal type for printing.
moderately improves msync's and VM object flushing for objects containing
randomly dirtied pages (fsync(), msync(), filesystem update daemon),
and improves cpu use for small-ranged sequential msync()s in the face of
very large mmap()ings from O(N) to O(1) as might be performed by a database.
A sysctl, vm.msync_flush_flag, has been added and defaults to 3 (the two
committed optimizations are turned on by default). 0 will turn off both
optimizations.
This code has already been tested under stable and is one in a series of
memq / vp->v_dirtyblkhd / fsync optimizations to remove O(N^2) restart
conditions that will be coming down the pipe.
MFC after: 3 days
- Move jail checks and some other checks involving constants and stack
variables out from under Giant. This isn't perfectly safe atm because
jail_sysvipc_allowed is read w/o a lock meaning that its value could be
stale. This global variable will soon become a per-jail flag, however,
at which time it will either not need a lock or will use the prison lock.
individual filesystems to determine whether they should operate in
"file system as a single object" mode, or "file system as a set of objects
with individual labels" mode. Note: in the trustedbsd_mac branch,
this is refered to as "MNT_MULTILEVEL", but the two mean the same thing.
MNT_MULTILABEL is more suggestive of a flexible policy system than one
providing purely hierarchal policies. The need for a reserved flag will
go away once nmount() is done.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
and for individual MAC policies. The framework event initializes the
access control subsystem; the policy event allows policies to register
themselves. The gap in between is for all the things we'll think of
later.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
pseudo-devices when an interface goes away. Otherwise, an open /dev/net/foo0
when the interface is removed can cause a crash.
Not objected to by: jlemon
Some buggy firmware workarounds. Fix some endian bugs.
These reduce the diffs from NetBSD, but NetBSD does have more changes since
my last manual merge.
people working on the MAC tree from getting toasted whenever system call
numbers are allocated in the main tree (for example, for KSE :-).
Calls allocated: __mac_{get,set}_proc, __mac_{get,set}_{fd,file}().
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
Includes some minor whitespace changes, and re-ordering to be able to document
properly (e.g, grouping of variables and the SYSCTL macro calls for them, where
the documentation has been added.)
Reviewed by: phk (but all errors are mine)
active-filter in pppd(8).
PR: kern/12281
Submitted by: Tim Moore <moore@bricoworks.com>
Not objected by: peter
Reviewed by: ru
Approved by: ru
MFC after: 1 week
be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.
Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.
Reviewed by: jake
Approved by: jake
Link if only ATAPI device in kernel config
Remove unused #includes
Rearrange a bit in ata-raid to make diff against -stable smaller
Enable wc as default again, dunne how this happend...
and again in vm_page.c and vm_pageq.c.
o Delete unusused prototypes. (Mainly a result of the earlier renaming
of various functions from vm_page_*() to vm_pageq_*().)
This makes other power-management system (APM for now) to be able to
generate power profile change events (ie. AC-line status changes), and
other kernel components, not only the ACPI components, can be notified
the events.
- move subroutines in acpi_powerprofile.c (removed) to kern/subr_power.c
- call power_profile_set_state() also from APM driver when AC-line
status changes
- add call-back function for Crusoe LongRun controlling on power
profile changes for a example
on the loader to do it. Improve smp startup code to be less racy and to
defer certain things until the right time. This almost boots single user
on my dual ultra 60, it is still very fragile:
SMP: AP CPU #1 Launched!
Enter full pathname of shell or RETURN for /bin/sh:
# ls
Debugger("trapsig")
Stopped at Debugger+0x1c: ta %xcc, 1
db> heh
No such command
db>
with pmaps. When the context numbers wrap around we flush all user mappings
from the tlb. This makes use of the array indexed by cpuid to allow a pmap
to have a different context number on a different cpu. If the context numbers
are then divided evenly among cpus such that none are shared, we can avoid
sending tlb shootdown ipis in an smp system for non-shared pmaps. This also
removes a limit of 8192 processes (pmaps) that could be active at any given
time due to running out of tlb contexts.
Inspired by: the brown book
Crucial bugfix from: tmm
clobbered by the child. This is more complicated than usual because the
window that could get clobbered is pushed in kernel mode, so a lot of
registers would have to be saved in other registers in userland and we
don't have enough. What we do have is space in the pcb to temporarily
store user windows that were spilled in kernel mode, but could not be
immediately stored to the user stack. So we copy in the parent's topmost
window and save it in the pcb, and arrange for it to be copied back out
when the child is done frobbing the stack.
Reviewed by: tmm
PAGE_SIZE / MCLBYTES == 1 crash. Fix them by changing the
appropriate "allocate new page and bucket" code in mb_alloc to use
the macro for properly grabbing an allocated object from a bucket,
the one that checks whether the bucket is empty.
This should allow ken to continue testing zero-copy stuff on -CURRENT.
Noticed and provided debug info: ken
* Move the section which manipulates ia64_pal_base to after cninit() so
that we don't risk printing anything before we have a console.
* Don't call ia64_probe_sapics() for a SKI build. This should really
be dependant on ACPICA being present or something.
Add code to properly detach/attach disks that are part of a RAID.
Mark a disk that is attached on an ATA channel belonging to a
RAID as a spare disk that can be used for rebuilding failed RAID1's.
Add support for rebuilding failed RAID1's.
Several fixes to the detach/attach code.
For replacing a disk in a failed RAID1 do the following:
Find the controller channel# of the failed disk.
Exec 'atacontrol detach <channel#>' to free the disk from the system.
Replace the failed disk with a new one of at least the same size.
If your have your disks in drawers/enclosures this can be done with
the system still running.
Exec 'atacontrol attach <channel#>' to add the disk to the system and
mark it as a valid spare for rebuild.
Exec 'atacontrol rebuild <array#>'
The system will rebuild the array on the fly, the array can still
be used during this, although with slower performance.
Please let me know of any problems with this!
Sponsored by: Advanis Inc.
MFC after: 2 weeks
fill out netc_anon (a `struct ucred'), and add an XXX around the
entire operation since it isn't clear whether it's doing the right
thing with things like cr_uidinfo and cr_prison.
is not a neighbor. see comments for the detailed reason.
- Rejected the process of nd6_rtrequest() when the request is RESOLVE and
the interface does not need neighbor caches.
Obtained from: KAME
MFC After: 1 week
the data was supplied as a uio or an mbuf. Previously the limit was
ignored for mbuf data, and NFS could run the kernel out of mbufs
when an ipfw rule blocked retransmissions.
Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Lukcily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.
Submitted by: dillon
Reviewed by: silby
MFC after: 1 week
I put these in to match the use of spl*() in the NetBSD code I was basing this
on, but it appears to cause problems.
I'm doing this in a separate commit so as to be able to refer back if locking
becomes an issue at a later stage.
Define the atm_dev_free() routine so that its OK to free stuff that is defined
as volatile. Note this doesn't FORCE the arguemnts to be volatile,
just says that it's not an error if it is..
and isn't strictly required. However, it lowers the number of false
positives found when grep'ing the kernel sources for p_ucred to ensure
proper locking.
the pipe is locked and shouldn't be.
initialize pipe->pipe_mtxp to NULL when creating pipes in order not
to trip the above assertions.
swap pipe lock with giant around calls to pipe_destroy_write_buffer()
pipe_destroy_write_buffer issue noticed by: jhb
fully protect p_ucred yet so Giant is needed until all the p_ucred
locking is done. This is the original reason td_ucred was not used
immediately after its addition. Unfortunately, not using td_ucred is
not enough to avoid problems. Since p_ucred could be stale, we could
actually be dereferencing a stale pointer to dink with the refcount, so
we really need Giant to avoid foot-shooting. This allows td_ucred to
be safely used as well.
In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.
Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.
Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.
Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.
PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week
in most machines of the Sun Ultra series. This is a port of the NetBSD
driver which I enhanced to make use of the gather functionality and the
configurable RX buffer offset to avoid copying all received/sent packet
(instead, packets will be directly DMAd from mbuf chains and into mbuf
clusters now).
device drivers for bus system with other endinesses than the CPU (using
interfaces compatible to NetBSD):
- bwap16() and bswap32(). These have optimized implementations on some
architectures; for those that don't, there exist generic implementations.
- macros to convert from a certain byte order to host byte order and vice
versa, using a naming scheme like le16toh(), htole16().
These are implemented using the bswap functions.
- stream bus space access functions, which do not perform a byte order
conversion (while the normal access functions would if the bus endianess
differs from the CPU endianess).
htons(), htonl(), ntohs() and ntohl() are implemented using the new
functions above for kernel usage. None of the above interfaces is currently
exported to user land.
Make use of the new functions in a few places where local implementations
of the same functionality existed.
Reviewed by: mike, bde
Tested on alpha by: mike
as arguments. The correct hostname is copied into the buffer
while having the prison's lock acquired in a jailed process'
case.
Reviewed by: jhb, rwatson
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.
1/ conditionalise (#if 0) function that is not used.
Unused code left in place for netBSD compatibility.
2/ Recode loop to convince gcc that it does initialise a variable
(use do-while instead of for() so gcc knows that we always go through
at least once. Feel free to check my logic.
Both ends of the pipe share a pool_mutex, this makes allocation
and deadlock avoidance easy.
Remove some un-needed FILE_LOCK ops while I'm here.
There are some issues wrt to select and the f{s,g}etown code that
we'll have to deal with, I think we may also need to move the calls
to vfs_timestamp outside of the sections covered by PIPE_LOCK.
work loads. It tapers off after that as gcc's working set generally just fits.
compiling bin/csh:
TSB_PAGES = 2
213.33 real 77.59 user 110.01 sys
TSB_PAGES = 4
116.43 real 75.78 user 19.16 sys
TSB_PAGES = 8
119.27 real 76.38 user 18.12 sys
Testing by: tmm