Problem:
selwakeup required calling pfind which would cause lock order
reversals with the allproc_lock and the per-process filedesc lock.
Solution:
Instead of recording the pid of the select()'ing process into the
selinfo structure, actually record a pointer to the thread. To
avoid dereferencing a bad address all the selinfo structures that
are in use by a thread are kept in a list hung off the thread
(protected by sellock). When a selwakeup occurs the selinfo is
removed from that threads list, it is also removed on the way out
of select or poll where the thread will traverse its list removing
all the selinfos from its own list.
Problem:
Previously the PROC_LOCK was used to provide the mutual exclusion
needed to ensure proper locking, this couldn't work because there
was a single condvar used for select and poll and condvars can
only be used with a single mutex.
Solution:
Introduce a global mutex 'sellock' which is used to provide mutual
exclusion when recording events to wait on as well as performing
notification when an event occurs.
Interesting note:
schedlock is required to manipulate the per-thread TDF_SELECT
flag, however if given its own field it would not need schedlock,
also because TDF_SELECT is only manipulated under sellock one
doesn't actually use schedlock for syncronization, only to protect
against corruption.
Proc locks are no longer used in select/poll.
Portions contributed by: davidc
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.
After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system. To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.
This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe. One more
user of lockmgr down, many more to go :)
which is initialized with whatever string a dhcp/bootp server passes
as vendor tag 134.
There is no standard tag that I know with this information, and
no vendor-defined tag that applies to FreeBSD that I could find
doing the same thing.
The intended use is to pass information to userland for run-time
configuration of a diskless client without having to run a bootp/dhcp
client for the third time (after the one in pxeboot/etherboot, and
the one in the kernel bootp), also because these clients generally
screwup the interface configuration, which is not exactly what you
want when you have your disks nfs-mounted.
Manpage update to follow soon.
MFC-after: 3 days
wait for those cpus, instead of all of them by using a count. Oops.
Make the pointer to the mask that the primary cpu spins on volatile, so
gcc doesn't optimize out an important load. Oops again.
Activate tlb shootdown ipi synchronization now that it works. We have
all involved cpus wait until all the others are done. This may not be
necessary, it is mostly for sanity.
Make the trigger level interrupt ipi handler work.
Submitted by: tmm
The stat() and open() calls have been changed to make use of this new functionality. Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes. Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.
A new kernel option "LOOKUP_SHARED" has been added. It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable. I have personally been running this
patch for over a year now, so it is believed to be fully stable.
Reviewed by: jake, obrien
Approved by: jake
test and play with this.
This is not yet production quality and should be run only on dedicated
test boxes.
For people who want to develop transformations for GEOM there exist a
set of shims to run geom in userland (ask phk@freebsd.org).
Reports of all kinds to: phk@freebsd.org
Please include in report:
dmesg
sysctl debug.geomdot
sysctl debug.geomconf
Known significant limitations:
no kernel dump facility.
ioctls severely restricted.
Sponsored by: DARPA, NAI Labs
update the free-space statistics in some cases. The problem affected
directory blocks when the free space dropped below the size of the
maximum allowed entry size. When this happened, the free-space
summary information could claim that there are no further blocks
that can fit a maximum-size entry, even if there are.
The effect of this bug is that the directory may be enlarged even
though there is space within the directory for the new entry. This
wastes disk space and has a negative impact on performance.
Fix it by correctly computing the dh_firstfree array index, adding
a helper macro for clarity. Put an extra sanity check into
ufsdirhash_checkblock() to detect the situation in future.
Found by: dwmalone
Reviewed by: dwmalone
MFC after: 1 week
read-only.
The trouble here is that we don't reopen the device in read/write mode
when we remount in read/write mode resulting in a filesystem sending
write requests to a device which was only opened read/only.
I'm not quite sure how such a reopen would best be done and defer
the problem to more agile hackers.
unit allocation with a bitmap in the generic layer. This
allows us to get rid of the duplicated rman code in every
clonable interface.
Reviewed by: brooks
Approved by: phk
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.
Reviewed by: alc
machdep.guessed_bootdev, and add code to sysctl to parse its value
and give a (not necessarily correct) name to the device we booted
from (the main motivation for this code is to use the info in the
PicoBSD boot scripts, and the impact on the kernel is minimal).
NOTE: the information available in bootdev is not always reliable,
so you should not trust it too much. The parsing code is the same
as in boot2.c, and cannot cover all cases -- as it is, it seems to
work fine with floppies and IDE disks recognised by the BIOS. It
_should_ work as well with SCSI disks recognised by the BIOS.
Booting from a CDROM in floppy emulation will return /dev/fd0 (because
this is what the BIOS tells us).
Booting off the network (e.g. with etherboot) leaves bootdev unset so
the value will be printed as "invalid (0xffffffff)".
Finally, this feature might go away at some point, hopefully when we
have a more reliable way to get the same information.
MFC-after: 5 days
in "missing dependencies" error when loading some kld modules. It is sad to
see how often these days style cleanus break doesn't broken things. Perhaps
people should recall good old principle: "don't fix it if it isn't broken".
-#if defined(__FreeBSD__) && __FreeBSD_version__ >= 500023
+#if defined(__FreeBSD__) && __FreeBSD_version >= 500023
is a genuine bug -- __FreeBSD_version__ does not exist.
The other one:
-#if (__FreeBSD__ < 5)
+#if (__FreeBSD_version < 500000)
pops out when you cross-compile the code:
__FreeBSD__ is a compiler predefine,
__FreeBSD_version is defined in <sys/param.h> .
Given that in this case (and all others in sys/dev/usb and sys/i4b)
the goal is to adapt to a different kernel interface, and not to
a compiler feature, I believe the correct form is the second one
(in the best case the two are synonyms so the change does not break
anything anyways).
for POSIX.1-2001 conformance.
o Add magic to <netinet/in.h> and <netinet6/in6.h> to prevent
redefining INET_ADDRSTRLEN and INET6_ADDRSTRLEN.
o Add a note about missing typedefs in <arpa/inet.h>.
the number of physical pages per KVA page allocated scales properly with
memory size. This fixes problems with kmem_map being too small.
Noticed by: mike, wollman
Submitted by: tmm
fully instaniated.
Revert the logic in pipeclose so that we don't have the entire function
pretty much under a single if() statement, instead invert the test and
just return if it fails.
Submitted (in different form) by: bde
Don't use pool mutexes for pipes. We can not use pool mutexes
because we will need to grab the select lock while holding a pipe
lock which is not allowed because you may not aquire additional
mutexes when holding a pool mutex.
Instead malloc(9) space for the mutex that is shared between the
pipes.
o In i386's <machine/endian.h>, macros have some advantages over
inlines, so change some inlines to macros.
o In i386's <machine/endian.h>, ungarbage collect word_swap_int()
(previously __uint16_swap_uint32), it has some uses on i386's with
PDP endianness.
Submitted by: bde
o Move a comment up in <machine/endian.h> that was accidentially moved
down a few revisions ago.
o Reenable userland's use of optimized inline-asm versions of
byteorder(3) functions.
o Fix ordering of prototypes vs. redefinition of byteorder(3)
functions, so that the non-GCC (libc asm) case has proper
prototypes.
o Add proper prototypes for byteorder(3) functions in <sys/param.h>.
o Prevent redundant duplicate prototypes by making use of the
_BYTEORDER_PROTOTYPED define.
o Move the bswap16(), bswap32(), bswap64() C functions into MD space
for platforms in which asm versions don't exist. This significantly
reduces the complexity of some things at the cost of duplicate code.
Reviewed by: bde
the revived code.
vm pages newly allocated are marked busy (PG_BUSY), thus calling
vm_page_delete before the pages has been freed or unbusied will
cause a deadlock since vm_page_object_page_remove will wait for the
busy flag to be cleared. This can be triggered by calling malloc
with size > PAGE_SIZE and the M_NOWAIT flag on systems low on
physical free memory.
A kernel module that reproduces the problem, written by Logan Gabriel
<logan@mail.2cactus.com>, can be found in the freebsd-hackers mail
archive (12 Apr 2001). The problem was recently noticed again by
Archie Cobbs <archie@dellroad.org>.
Reviewed by: dillon
simply need to prevent switching from another CPU and do not need
interrupts disabled.
- Add a comment to witness_list() about why displaying spin locks for
threads on other CPU's really is just a bad idea and probably shouldn't
be done.
and commenting of NETSMB, NETWMBCRYPTO, and SMBFS. In NOTES, they
had all floated to the bottom of the file with the list of seemingly
random and unclassified kernel options. This change moves them back
up to the network protocol and file system areas, and also documents
the dependencies.
This is belived to be the only place where a soft reference to a vnode
is held with no sort of hard reference, consequently this change should
allow us to free(9) vnodes from the freelist after properly cleaning
them up.
Reviewed by: dillon
it worked- but I ran into a case with a 2204 where commands were being lost
right and left. Best be safe.
For target mode, or things called if we call isp_handle_other response- note
that we might have dropped locks by changing the output pointer so we bail
from the loop. It's the responsibility of the entity dropping the lock to
make sure that we let the f/w know we've read thus far into the response
queue (else we begin processing the same entries again- blech!).
MFC after: 1 day
the upcoming 7.4 family (7xxx controllers).
- improved error reporting and handling
- more diagnostic output
- add extra command packet definitions
- merge sources again with -stable
than the other implementations; we have complete control over the tlb, so we
only demap specific pages. We take advantage of the ranged tlb flush api
to send one ipi for a range of pages, and due to the pm_active optimization
we rarely send ipis for demaps from user pmaps.
Remove now unused routines to load the tlb; this is only done once outside
of the tlb fault handlers.
Minor cleanups to the smp startup code.
This boots multi user with both cpus active on a dual ultra 60 and on a
dual ultra 2.
Due to allocating tlb contexts on the fly, we only ever need to demap the
primary context, non-primary contexts have already been implicitly flushed
by context switching. All we really need to tell is if its a kernel demap
or not, and its easier just to compare against the kernel_pmap which is a
constant.
the context is not actually stolen, as it would be for i386. Instead of
deactivating a user vmspace immediately when switching out, and recycling
its tlb context, wait until the next context switch to a different user
vmspace. In this way we can switch from a user process to any number of
kernel threads and back to the same user process again, without losing any
of its mappings in the tlb that would not already be knocked by the automatic
replacement algorithm. This is not expected to have a measurable performance
improvement on the machines we currently run on, but it sounds cool and makes
the sparc64 port SMPng buzz word compliant.
to exhaust all kmaps. The only reward for setting maxproc
to a value which will cause kmap exhaustion is a panic
during a forkbomb attack.
MFC after: 3 days