the pv lists in the vm_page, even unmanaged kernel mappings. This is so
that the virtual cachability of these mappings can be tracked when a page
is mapped to more than one virtual address. All virtually cachable
mappings of a physical page must have the same virtual colour, or illegal
alises can be created in the data cache. This is a bit tricky because we
still have to recognize managed and unmanaged mappings, even though they
are all on the pv lists.
is currently conditional on both the GEOM and GEOM_GPT options to
avoid getting GPT by default and having the MBR and GPT classes
clash.
The correct behaviour of the MBR class would be to back-off (reject)
a MBR if it's a Protective MBR (a MBR with a single partition of type
0xEE that spans the whole disk (as far as the MBR is concerned).
The correct behaviour if the GPT class would be to back-off (reject)
a GPT if there's a MBR that's not a Protective MBR.
At this stage it's inconvenient to destroy a good MBR when working
with GPTs that it's more convenient to have the MBR class back-off
when it detects the GPT signature on disk and have the GPT class
ignore the MBR.
In sys/gpt.h UUIDs (GUIDs) for the following FreeBSD partitions
have been defined:
GPT_ENT_TYPE_FREEBSD
FreeBSD slice with disklabel. This is the equivalent of
the well-known FreeBSD MBR partition type.
GPT_ENT_TYPE_FREEBSD_{SWAP|UFS|UFS2|VINUM}
FreeBSD partitions in the context of disklabel. This is
speculating on the idea to use the GPT to hold partitions
instead if slices and removing the fixed (and low) limits
we have on the number of partitions.
This commit lacks a GPT image for the regression suite.
The uuidgen command, by means of the uuidgen syscall, generates one
or more Universally Unique Identifiers compatible with OSF/DCE 1.1
version 1 UUIDs.
From the Perforce logs (change 11995):
Round of cleanups:
o Give uuidgen() the correct prototype in syscalls.master
o Define struct uuid according to DCE 1.1 in sys/uuid.h
o Use struct uuid instead of uuid_t. The latter is defined
in sys/uuid.h but should not be used in kernel land.
o Add snprintf_uuid(), printf_uuid() and sbuf_printf_uuid()
to kern_uuid.c for use in the kernel (currently geom_gpt.c).
o Rename the non-standard struct uuid in kern/kern_uuid.c
to struct uuid_private and give it a slightly better definition
for better byte-order handling. See below.
o In sys/gpt.h, fix the broken uuid definitions to match the now
compliant struct uuid definition. See below.
o In usr.bin/uuidgen/uuidgen.c catch up with struct uuid change.
A note about byte-order:
The standard failed to provide a non-conflicting and
unambiguous definition for the binary representation. My initial
implementation always wrote the timestamp as a 64-bit little-endian
(2s-complement) integral. The clock sequence was always written
as a 16-bit big-endian (2s-complement) integral. After a good
nights sleep and couple of Pan Galactic Gargle Blasters (not
necessarily in that order :-) I reread the spec and came to the
conclusion that the time fields are always written in the native
by order, provided the the low, mid and hi chopping still occurs.
The spec mentions that you "might need to swap bytes if you talk
to a machine that has a different byte-order". The clock sequence
is always written in big-endian order (as is the IEEE 802 address)
because its division is resulting in bytes, making the ordering
unambiguous.
"The only hard problem in cryptography is key-management."
All sectors are encrypted with AES in CBC mode using a constant key,
currently compiled in and all zero.
To activate this module, write the magic header on the partition:
echo "<<FreeBSD-GEOM-AES>>" | dd conv=sync of=/dev/md98
The encrypted device will be one sector shorter and have ".aes"
appended to its name.
Sponsored by: DARPA & NAI Labs.
back to -fformat-extensions (or whatever) when we have the functionality.
We are gaining warnings again that should be fixed but the are being hidden
by NO_WERROR and all the -Wformat noise.
option is used (not on by default).
- In the case of trying to lock a mutex, if the MTX_CONTESTED flag is set,
then we can safely read the thread pointer from the mtx_lock member while
holding sched_lock. We then examine the thread to see if it is currently
executing on another CPU. If it is, then we keep looping instead of
blocking.
- In the case of trying to unlock a mutex, it is now possible for a mutex
to have MTX_CONTESTED set in mtx_lock but to not have any threads
actually blocked on it, so we need to handle that case. In that case,
we just release the lock as if MTX_CONTESTED was not set and return.
- We do not adaptively spin on Giant as Giant is held for long times and
it slows SMP systems down to a crawl (it was taking several minutes,
like 5-10 or so for my test alpha and sparc64 SMP boxes to boot up when
they adaptively spinned on Giant).
- We only compile in the code to do this for SMP kernels, it doesn't make
sense for UP kernels.
Tested on: i386, alpha, sparc64
IFS had its fingers deep in the belly of the UFS/FFS split. IFS
will be reimplemented by the maintainer at a later date.
Requested by: adrian (maintainer)
shared code and converting all ufs references. Originally it may
have made sense to share common features between the two filesystems,
but recently it has only caused problems, the UFS2 work being the
final straw.
All UFS_* indirect calls are now direct calls to ext2_* functions,
and ext2fs-specific mount and inode structures have been introduced.
nearly in its entirety from i386, so it retains the phk/nati copyright.
Savecore likes the results, but I have no way to test it as gdb is
still broken.
0xdeadc0de and then check for it just before memory is handed off as part
of a new request. This will catch any post free/pre alloc modification of
memory, as well as introduce errors for anything that tries to dereference
it as a pointer.
This code takes the form of special init, fini, ctor and dtor routines that
are specificly used by malloc. It is in a seperate file because additional
debugging aids will want to live here as well.
ever connect a SCSI Cdrom/Tape/Jukebox/Scanner/Printer/kitty-litter-scooper
to your high-end RAID controller. The interface to the arrays is still
via the block interface; this merely provides a way to circumvent the
RAID functionality and access the SCSI buses directly. Note that for
somewhat obvious reasons, hard drives are not exposed to the da driver
through this interface, though you can still talk to them via the pass
driver. Be the first on your block to low-level format unsuspecting
drives that are part of an array!
To enable this, add the 'aacp' device to your kernel config.
MFC after: 3 days
to build kernel and kernel modules so stop supporting them in
bsd.subdir.mk and reimplement them in kern.post.mk and kmod.mk
as special versions of the install and reinstall targets, and
only define them if DEBUG is also defined (when debug versions
are really built).
Prompted by: bde
Ensure all standard targets honor SUBDIR. Now `make obj' descends into
SUBDIRs even if NOOBJ is set (some descendants may still need an object
directory, but we do not have such precedents). Now `make install' in
non-bsd.subdir.mk makefiles runs `afterinstall' target _after_ `install'
in SUBDIRs, like we do in bsd.subdir.mk. Nothing depended on the wrong
order anyway.
Fixed `distribute' targets (except for the bsd.subdir.mk version) so that
they do not depend on _SUBDIR; `distribute' calls `install' which already
depends on _SUBDIR.
De-standardize `maninstall', otherwise manpages would be installed twice.
(To be revised later.)
- Add stubs for EISA and SBUS cards.
(VME, FutureBUS, and TurboChannel stubs not provided.)
- Add infrastructure to build driver and bus front-end modules.
drivers with MI portions into the MI notes. Device drivers such as busses
like the isa, eisa, and pci devices are now in the MD NOTES section even
though they have some MI code. This will ensure that only the proper bits
of device drivers will be included due to the optional bits dependent on
the busses in sys/conf/files. This commit also takes the stance that since
hints are ignored in NOTES anyways, it is ok to include hints for a bus
that may not be present.
Advice from: bde
behavior by default. Also, change the options line to reflect this.
If there are no problems reported this will become the only behavior and the
knob will be removed in a month or so.
Demanded by: obrien
time-of-day clocks, ported from NetBSD. The front-ends are expected
to be at least partly machine-dependent; the sparc64 EBus and SBus
ones will be commited to MD directories for now (in a subsequent commit).
a set of helper routines to deal with real-time clocks. The generic
functions access the clock diver using a kobj interface. This is intended
to reduce code reduplication and make it easy to support more than one
clock model on a single architecture.
This code is currently only used on sparc64, but it is planned to convert
the code of the other architectures to it later.
Unfortunately, this level doesn't really provide enough granularity. We
probably need several MI NOTES type files for things that are shared by
several architectures but not by all. For example, the PCI options could
live in a NOTES.pci.
This also updates the Makefile for i386 to generate LINT. The only changes
in the generated LINT are the order of various options.
Suggestions for improvement welcome.
following sysctl variables:
debug.mutex.prof.enable enable / disable profiling
debug.mutex.prof.acquisitions number of mutex acquisitions recorded
debug.mutex.prof.records number of acquisition points recorded
debug.mutex.prof.maxrecords max number of acquisition points
debug.mutex.prof.rejected number of rejections (due to full table)
debug.mutex.prof.hashsize hash size
debug.mutex.prof.collisions number of hash collisions
debug.mutex.prof.stats profiling statistics
The code records four numbers for each acquisition point (identified by
source file name and line number): longest time held, total time held,
number of non-recursive acquisitions, average time held. The measurements
are in clock cycles (as returned by get_cyclecount(9)); this may cause
measurements on some SMP systems to be unreliable. This can probably be
worked around by replacing get_cyclecount(9) by some incarnation of
nanotime(9).
This work was derived from initial patches by eivind.
dump the trace buffer feasible.
- Remove KTR_EXTEND. This changes the format of the trace entries when
activated, making writing a userland tool which is not tied to a specific
kernel configuration difficult.
- Use get_cyclecount() for timestamps. nanotime() is much too heavy weight
and requires recursion protection due to ktr traces occuring as a result
of ktr traces. KTR_VERBOSE may still require recursion protection, which
is now conditional on it.
- Allow KTR_CPU to be overridden by MD code. This is so that it is possible
to trace early in startup before pcpu and/or curthread are setup.
- Add a version number for the ktr interface. A userland tool can check this
to detect mismatches.
- Use an array for the parameters to make decoding in userland easier.
- Add file and line recording to the non-extended traces now that the extended
version is no more.
These changes will break gdb macros to decode the extended version of the
trace buffer which are floating around. Users of these macros should either
use the show ktr command in ddb, or use the userland utility which can be run
on a core dump.
Approved by: jhb
Tested on: i386, sparc64
I have not been able to find very much information about the PC98
extended partition layout so this is gleaned from the source in
our pc98 architecture. Corrections and patched very welcome.
Sponsored by: DARPA and NAI Labs.
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).
Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.
This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.
Reviewed by: core
Approved by: core
"env name=value ... cmd ..." is just a pessimized way of doing
"name=value ... cmd ..." in real shells. Set the environment
(without using env(1)) before starting xargs so that env(1)
is not needed in "xargs env name=value ... cmd ..."
lint, so this is turned off by default. Setting WANT_LINT will turn
on generation of lint libraries for /usr/libdata/lint/*.ln.
Reviewd by: silence in -audit.
The detection code in this method is written so that it should work on
all architectures which means that you can plug a Sun disk into a i386
now and access the partitions.
We still need an endian-agnostic ufs/ffs before this is really
interresting, but the main focus was to get sparc64 onto the GEOM
trail.
The stat() and open() calls have been changed to make use of this new functionality. Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes. Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.
A new kernel option "LOOKUP_SHARED" has been added. It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable. I have personally been running this
patch for over a year now, so it is believed to be fully stable.
Reviewed by: jake, obrien
Approved by: jake
and commenting of NETSMB, NETWMBCRYPTO, and SMBFS. In NOTES, they
had all floated to the bottom of the file with the list of seemingly
random and unclassified kernel options. This change moves them back
up to the network protocol and file system areas, and also documents
the dependencies.
This makes other power-management system (APM for now) to be able to
generate power profile change events (ie. AC-line status changes), and
other kernel components, not only the ACPI components, can be notified
the events.
- move subroutines in acpi_powerprofile.c (removed) to kern/subr_power.c
- call power_profile_set_state() also from APM driver when AC-line
status changes
- add call-back function for Crusoe LongRun controlling on power
profile changes for a example
device drivers for bus system with other endinesses than the CPU (using
interfaces compatible to NetBSD):
- bwap16() and bswap32(). These have optimized implementations on some
architectures; for those that don't, there exist generic implementations.
- macros to convert from a certain byte order to host byte order and vice
versa, using a naming scheme like le16toh(), htole16().
These are implemented using the bswap functions.
- stream bus space access functions, which do not perform a byte order
conversion (while the normal access functions would if the bus endianess
differs from the CPU endianess).
htons(), htonl(), ntohs() and ntohl() are implemented using the new
functions above for kernel usage. None of the above interfaces is currently
exported to user land.
Make use of the new functions in a few places where local implementations
of the same functionality existed.
Reviewed by: mike, bde
Tested on alpha by: mike
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.
This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.
- fix the warnings, they are there for a reason!
- add -DNO_ERROR to your make(1) command.
- add 'makeoptions NO_WERROR=true' to your kernel config.
- add 'nowerror' to conf/files* that have warnings that should be fixed
due to tracking 3rd party vendor code.
- add 'nowerror' to conf/files* where the warning is false due to a
compiler bug and fixing it with brute force would be too expensive.
There are some very sloppy warnings in our kernel build, come on folks!
'make release' uses -DNO_WERROR intentionally.
so that this is safe even if VARIABLE is longer than kern.argmax.
There is another instance of CFILES which might need the same treatment,
and might be noticed when doing a "make links".
The same has to be done in RELENG_4 (on some different file).
Noticed-by: picobsd cross-compiling LINT
Suggested-by: Alfred (bright@mu.org), des@freebsd.org
MFC-after: 3 days
It doesn't actually do it yet though. This adds a flag to config so
that we can exclude certain vendor files from this even when the rest
of the kernel has it on. make -DNO_WERROR would also bypass all of it.