dev_refthread() will return the cdevsw pointer or NULL. If the
return value is non-NULL a threadcount is held which much be released
with dev_relthread(). If the returned cdevsw is NULL no threadcount
is held on the device.
use <machine/efi.h> for the necessary definitions. This makes the EFI
code in sys/boot/efi totally unused, except for pure EFI loaders. As
such, maintenance and porting (to IA-32) of the EFI code is made as easy
as possible.
because it was mostly irrelevant - except for the silly BIOS_PADDRTOVADDR
etc macros. Along the way of working around this, I missed a few things.
* Make syscons properly inherit the bios capslock/shiftlock/etc state like
i386 does. Note that we cannot inherit the bios key repeat rate because
that requires a bios call (which is impossible for us).
* Give syscons the ability to beep on amd64. Oops.
While here, make bios.c compile and add it to files.amd64.
PHYSADDR : Address of the physical memory
KERNPHYSADDR : Physical address where the kernel starts
KERNVIRTADDR : Virtual address of the kernel
STARTUP_PAGETABLE_ADDR : Where to put the page table at bootstrap
+ Xscale specific options
work out of the box too, but I have no hardware to test.
It works well enough to go multiuser. Network works, SATA does not, as I have
no drive to test.
Thanks to Intel for sending such a board.
Obtained from: NetBSD
Remove the cache state logic : right now, it provides more problems than it
helps.
Add helper functions for mapping devices while bootstrapping.
Reorganize the code a bit, and remove dead code.
Obtained from: NetBSD (partially)
tree:
- td_standin is (k + a) as it is only touched by either curthread or when
a thread is being created.
- td_upcall is (k + j)
- td_sticks is (k) rather than the earlier (j) note.
- td_uuticks and td_usticks are both (k).
- td_intrval is (j)
- Neither kg_nextupcall or kg_upquantum seem to be locked and that seems
to be on purpose, so mark those as (n).
It can be used to delay mounting root partition to give a chance to GEOM
providers to show up.
Now, when there is no needed provider, vfs_rootmount() function will look
for it every second and if it can't be find in defined time, it'll ask
for root device name (before this change it was done immediately).
This will allow to boot from gmirror device in degraded mode.
Users should move to the new geom_vinum implementation instead.
The refcount logic which is being added to devices to enable safe module
unloading and the buf/vm work also in progress would require a major rework
of the (old)-vinum code to comply with the new semantics.
The actual source files will not be removed until I have coordinated with
the geomvinum people if they need any bits repo-copied etc.
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.
Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.
Replace spechash_mtx use with dev_lock()/dev_unlock().
A thread must hold mp while calling cv_signal(), cv_broadcast(), or
cv_broadcastpri() even though it isn't passed as an argument.
and is right with this claim.
While here remove a "\" from the macro -> __inline conversion.
Found by: csjp
MFC after: 4 days
old or previous value instead of void. This is not as is documented
in atomic(9), but is API (and ABI) compatible and simply makes sense.
This feature will primarily be used for atomic PTE updates in PMAP/ng.
was seen when configuring addresses on interfaces using ifconfig. This
patch has been verified to work with over eight thousand addresses
assigned to an interface.
LOR id: 031
panic on hub detach bugs that have been reported. This work around
detaches the device before deleting it. This changes the detach order
from in-order to pre-order. This avoids uhub's deleting the children
after its subdevs has been deleted.
This is only a workaround. This leads to a strange condition in the
device tree where attached devices are children of detached ones. I
really don't know what that's supposed to mean, but does violate my
sense of POLA. Fortunately, the violation is short lived, which is
why I'm going ahead and committing the work around.
# We really need to consider life w/o the multiple nested layers of
# compatibility macros. They make finding bugs like this *MUCH*
# harder.
Patch by: iadowse
MT5 before: next_release(5.3-BETA5) (unless someting better comes along)
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.
See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.
Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.
MT5 Candidate
Better to kill all other threads than to panic the system if 2 threads call
execve() at the same time. A better fix will be committed later.
Note that this only affects the case where the execve fails.
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.
MFC after: 1 month
Tested on: i386, alpha, sparc64
Actually, it can even cause some problems, because GEOM requires sectorsize
to be more than 0 on first access, not on provider creation, so we can skip
valid providers by doing this check here.
Reported by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Sven Willenberger <sven@dmv.com>
Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks)
As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE
anyway, but that was an implementation side-effect....
KASSERT -> CTASSERT suggested by: dd@
Approved by: silence on -net
and 0x3f7. fdc_isa_alloc_resource() didn't work right in this case
(it accessed FDOUT correctly due to an overflow of the first resource.
It accesed FDSTS and FDDATA incorrectly via the second resource (which
wound up accessing FDOUT and the tape register at 0x3f3) and badly for
the CTL register (at location 0x3f4). This is a minimal fix that just
'eats' the first one if it covers two locations and has an offset of
0. This confusion lead the floppy driver to think there'd been a disk
change, which uncovered a deadlock in the floppy/geom code which lead
to a panic. These changes fix that by fixing the underlying resource
problem, but doesn't address the potential deadlock issue that might
still be there.
This is a minimal fix so it can more safely be merged into 5 w/o risk
for known working configurations (hence the use of the ugly goto,
which reduces case 8 to case 6 w/o affecting cases 1-7). A more
invasive fix that will handle more ACPI resource list diversity is in
the pipeline that should kill these issues once and for all, while
staying within the resources that we allocate.
Tested/Reported by: das
Reviewed by: njl
MFC before: re->next_release_name(5.3-BETA5);
registers are control bits or depending on the model contain additional
time bits with a different meaning than the lower ones. In order to
only read the desired time bits and not change the upper bits on write
use appropriate masks in the gettime and settime function respectively.
Due to the polarity of the stop oscillator bit and the fact that the
century bits aren't used on sparc64 not masking them didn't cause
problems so far.
- Fix two off-by-one errors in the handling of the day of week. The
genclock code represents the dow as 0 - 6 with 0 being Sunday but the
mk48txx use 1 - 7 with 1 being Sunday. In the settime function when
writing the dow to the clock the range wasn't adjusted accordingly but
the clock apparently played along nicely otherwise the second bug in
the gettime function which mapped 1 - 7 to 0 - 6 but with 0 meaning
Saturday would have been triggered. Fixing these makes the date being
stored in the same format Sun/Solaris uses and cures the "Invalid time
in real time clock. Check and reset the date immediately!" when the
date was set under Solaris prior to booting FreeBSD/sparc64. [1]
Looking at other clock drivers/code e.g. FreeBSD/alpha the former "bug",
i.e. storing the dow as 0 - 6 even when the clock uses 1 - 7, seems to
be common but might be on purpose for compatibility when multi-booting
with other OS which do the same. So it might make sense to add a flag
to handle the dow off-by-one for use of this driver on platforms other
than sparc64.
- Check the state of the battery on mk48txx that support this in the
attach function.
- Add a note that use of the century bit should be implemented but isn't
required at the moment because it isn't used on sparc64.
Problem noted by: joerg [1]
MT5 candidate.
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:
1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.
2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.
3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.
Reviewed by: tegge@
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called. Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code. I have retained
vm_proc_dispose(), since I consider its disuse a bug.
too much kernel copying, but it is not the right way to do it, and it is
in the way for straightening out the buffer cache.
The right way is to pass the VM page array down through the struct
bio to the disk device driver and DMA directly in to/out off the
physical memory. Once the VM/buf thing is sorted out it is next on
the list.
Retire most of vnode method. ffs_getpages(). It is not clear if what is
left shouldn't be in the default implementation which we now fall back to.
Retire specfs_getpages() as well, as it has no users now.
Completely remove the remaining EFI includes and add our own (type)
definitions instead. While here, abstract more of the internals by
providing interface functions.
Analogous to the drive level, give each volume and plex a worker thread
that picks up and processes incoming and completed BIOs.
This should fix the data corruption issues that have come up a few
weeks ago and improve performance, especially of RAID5 plexes.
The volume level needs a little work, though.
to 4.0 and RELENG_3), the BTX mini-kernel used paging rather than flat
mode and clients were limited to a virtual address space of 16 megabytes.
Because of this limitation, boot2 silently masked all physical addresses
in any binaries it loaded so that they were always loaded into the first
16 Meg. Since BTX no longer has this limitation (and hasn't for a long
time), remove the masking from boot2. This allows boot2 to load kernels
larger than about 12 to 14 meg (12 for non-PAE, 14 for PAE).
Submitted by: Sergey Lyubka devnull at uptsoft dot com
MFC after: 1 month
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.
These normally only manifest if the ndis compat module is statically
compiled into a kernel image by way of 'options NDISAPI'.
Submitted by: Dmitri Nikulin
Approved by: wpaul
PR: kern/71449
MFC after: 1 week
driver. Trim its fingernails by removing some useless bits before
fixing the 'thread not terminated on detach' problem.
o dmacnt is no longer used now that we allocate at attach time. Remove
it from struct fdc_data.
o ISPNP was only ever set, but never tested. It used to be used for the
allocation routines to change how it allocated resources. Since that's
no longer necessary, retire the flag.
o ISPCMICA was only ever tested, but never set. GC it. This removes
a special case in determining the drive type. The drive type is
now set in fdc_pcmcia.c, so the hack isn't needed anymore. Sadly,
this isn't tested with a Y-E Data pcmcia floppy drive because there
are a number of other issues that preclude it from working.
o Fix ifdef for reading from the rtc. I'm of the opinion that this ifdef
should be moved into fdc_isa.c, but not today as ideally there'd be
other fixes to the probing of children. So now we just read it on
i386 ! pc98 (there's no #define for MACHINE_ARCH, just MACHINE, hence
this slightly inelegant kludge) and amd64. The PC98 exclusion likely
isn't meaningful since pc98 uses a different driver, but will be when
merging of the pc98 floppy code into this driver is complete (this is the
other reason I think this block of code belongs outside fdc.c).
All of these changes are safe to MT5.
most if not all of our tty drivers in the future.
Centralizing this stuff enables us to remove about 100 lines of
almost but not quite perfectly copy&paste code from each tty driver.
This is necessary so source upgrades use the correct binary.
MFC after: 3 days
For the record: Problem spotted by Scott Long, who mentioned
that source upgrades from 4.7 to recent 5.x and 6.0 are broken.
Detailed analysis shows that 4.7 has a broken make(1) binary.
A breakage was fixed in RELENG_4 in make/main.c,v 1.35.2.7 by
imp@, though the commit log erroneously stated "MFC 1.68"
while in fact it should have been spelled as "MFC 1.67".
copper NICs, a link change event is posted whenever MII autopolling is
toggled off and on, which happens whenever someone calls
bge_miibus_readreg() or bge_miibus_writereg() to access the PHY
registers. This means anytime someone called the SIOCGIFMEDIA ioctl
on a bge interface, the link would reset. Even a simple "ifconfig bge0"
would do it, though other apps like dhclient or the PPPoE daemon could
trigger it as well. An obvious symptom of this problem is lots of
"bgeX: gigabit link up" messages appearing on the console for no
apparent reason.
Through experimentation, I determined that when a real link change
event occurs, the BGE_MIMODE_AUTOPOLL in the BGE_MI_MODE register
is always set, so now if we have a copper NIC and an link change
event occurs and the BGE_MIMODE_AUTOPOLL bit is clear, we ignore
the event.
Note that this does not apply to the original BCM5700 chip since we
use a different method for sensing link changes with that chip (the
status block method was broken), nor to fiber optic NICs since they
don't use the GMII PHY access registers.
another way to misinterpret the spec. Also, always fall back to the hints
probe on any attach failure, not just when _FDE fails.
Thanks to imp and scottl for finding this.
Tested by: rwatson (minimally)
MFC after: 5 days
functions that can be called from enable/disable pf as well. This improves
switching from non-altq ruleset to altq ruleset (and the other way 'round)
by a great deal and makes pfctl act like the user would except it to.
PR: kern/71746
Tested by: Aurilien "beorn" Rougemont (PR submitter)
MFC after: 3 days
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is
larger than IPPROTO_MAX and was initializing memory beyond the array.
Catch all these kinds of errors by ignoring protocols that are higher than
IPPROTO_MAX or 0 (zero).
Add more comments ip_init().
This make "_NEC" devices appear as "NEC" which is more corrent.
The reason is tha NEC originally screwed up on the byteorder in the
model string, so now that they have realized that they prefixed the '_'
so that not every ATA driver on the planet would call them "EN C" :)
reserving it at use time is more miserly, low memory (< 16MB)
evaporates quickly on many systems, so there may not be any suitable
buffers available. This specifically doesn't use the newer, fancier
isa_dma_init to ease merging to 5.
Reviewed by: tegge, phk
the firmware status register on the card to see if the firmware is still
running. There is no way to recover from this, but at least it can give
a hint as whether the car has crashed (which happens all too often).
MFC after: 3 days
errors for the attachment process for the floppy controller. This is
a band-aide because it doesn't try any of the fallback methods when
_FDE isn't long enough, but should be sufficient for people
experiencing the dreaded mutex not initialized panic.
not sure yet about 5.x... MFC if needed.
Also fixes small problems with examining some registers and
some specific gdb transfer problems.
As the patch says:
This is not a pretty patch and only meant as a temporary
fix until a better solution is committed.
PR: i386/71715
Submitted by: Stephan Uphoff <ups@tree.com>
MFC after: 1 week
it travels through the IP stack. This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.
The IP source route options of a packet are now stored in a mtag instead of the
global variable.
and which takes a M_WAITOK/M_NOWAIT flag argument.
Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.
Problem uncovered by: tegge