It can be used to delay mounting root partition to give a chance to GEOM
providers to show up.
Now, when there is no needed provider, vfs_rootmount() function will look
for it every second and if it can't be find in defined time, it'll ask
for root device name (before this change it was done immediately).
This will allow to boot from gmirror device in degraded mode.
Users should move to the new geom_vinum implementation instead.
The refcount logic which is being added to devices to enable safe module
unloading and the buf/vm work also in progress would require a major rework
of the (old)-vinum code to comply with the new semantics.
The actual source files will not be removed until I have coordinated with
the geomvinum people if they need any bits repo-copied etc.
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.
Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.
Replace spechash_mtx use with dev_lock()/dev_unlock().
A thread must hold mp while calling cv_signal(), cv_broadcast(), or
cv_broadcastpri() even though it isn't passed as an argument.
and is right with this claim.
While here remove a "\" from the macro -> __inline conversion.
Found by: csjp
MFC after: 4 days
old or previous value instead of void. This is not as is documented
in atomic(9), but is API (and ABI) compatible and simply makes sense.
This feature will primarily be used for atomic PTE updates in PMAP/ng.
was seen when configuring addresses on interfaces using ifconfig. This
patch has been verified to work with over eight thousand addresses
assigned to an interface.
LOR id: 031
panic on hub detach bugs that have been reported. This work around
detaches the device before deleting it. This changes the detach order
from in-order to pre-order. This avoids uhub's deleting the children
after its subdevs has been deleted.
This is only a workaround. This leads to a strange condition in the
device tree where attached devices are children of detached ones. I
really don't know what that's supposed to mean, but does violate my
sense of POLA. Fortunately, the violation is short lived, which is
why I'm going ahead and committing the work around.
# We really need to consider life w/o the multiple nested layers of
# compatibility macros. They make finding bugs like this *MUCH*
# harder.
Patch by: iadowse
MT5 before: next_release(5.3-BETA5) (unless someting better comes along)
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.
See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.
Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.
MT5 Candidate
Better to kill all other threads than to panic the system if 2 threads call
execve() at the same time. A better fix will be committed later.
Note that this only affects the case where the execve fails.
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.
MFC after: 1 month
Tested on: i386, alpha, sparc64
Actually, it can even cause some problems, because GEOM requires sectorsize
to be more than 0 on first access, not on provider creation, so we can skip
valid providers by doing this check here.
Reported by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Sven Willenberger <sven@dmv.com>
Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks)
As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE
anyway, but that was an implementation side-effect....
KASSERT -> CTASSERT suggested by: dd@
Approved by: silence on -net
and 0x3f7. fdc_isa_alloc_resource() didn't work right in this case
(it accessed FDOUT correctly due to an overflow of the first resource.
It accesed FDSTS and FDDATA incorrectly via the second resource (which
wound up accessing FDOUT and the tape register at 0x3f3) and badly for
the CTL register (at location 0x3f4). This is a minimal fix that just
'eats' the first one if it covers two locations and has an offset of
0. This confusion lead the floppy driver to think there'd been a disk
change, which uncovered a deadlock in the floppy/geom code which lead
to a panic. These changes fix that by fixing the underlying resource
problem, but doesn't address the potential deadlock issue that might
still be there.
This is a minimal fix so it can more safely be merged into 5 w/o risk
for known working configurations (hence the use of the ugly goto,
which reduces case 8 to case 6 w/o affecting cases 1-7). A more
invasive fix that will handle more ACPI resource list diversity is in
the pipeline that should kill these issues once and for all, while
staying within the resources that we allocate.
Tested/Reported by: das
Reviewed by: njl
MFC before: re->next_release_name(5.3-BETA5);
registers are control bits or depending on the model contain additional
time bits with a different meaning than the lower ones. In order to
only read the desired time bits and not change the upper bits on write
use appropriate masks in the gettime and settime function respectively.
Due to the polarity of the stop oscillator bit and the fact that the
century bits aren't used on sparc64 not masking them didn't cause
problems so far.
- Fix two off-by-one errors in the handling of the day of week. The
genclock code represents the dow as 0 - 6 with 0 being Sunday but the
mk48txx use 1 - 7 with 1 being Sunday. In the settime function when
writing the dow to the clock the range wasn't adjusted accordingly but
the clock apparently played along nicely otherwise the second bug in
the gettime function which mapped 1 - 7 to 0 - 6 but with 0 meaning
Saturday would have been triggered. Fixing these makes the date being
stored in the same format Sun/Solaris uses and cures the "Invalid time
in real time clock. Check and reset the date immediately!" when the
date was set under Solaris prior to booting FreeBSD/sparc64. [1]
Looking at other clock drivers/code e.g. FreeBSD/alpha the former "bug",
i.e. storing the dow as 0 - 6 even when the clock uses 1 - 7, seems to
be common but might be on purpose for compatibility when multi-booting
with other OS which do the same. So it might make sense to add a flag
to handle the dow off-by-one for use of this driver on platforms other
than sparc64.
- Check the state of the battery on mk48txx that support this in the
attach function.
- Add a note that use of the century bit should be implemented but isn't
required at the moment because it isn't used on sparc64.
Problem noted by: joerg [1]
MT5 candidate.
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:
1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.
2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.
3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.
Reviewed by: tegge@
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called. Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code. I have retained
vm_proc_dispose(), since I consider its disuse a bug.
too much kernel copying, but it is not the right way to do it, and it is
in the way for straightening out the buffer cache.
The right way is to pass the VM page array down through the struct
bio to the disk device driver and DMA directly in to/out off the
physical memory. Once the VM/buf thing is sorted out it is next on
the list.
Retire most of vnode method. ffs_getpages(). It is not clear if what is
left shouldn't be in the default implementation which we now fall back to.
Retire specfs_getpages() as well, as it has no users now.
Completely remove the remaining EFI includes and add our own (type)
definitions instead. While here, abstract more of the internals by
providing interface functions.
Analogous to the drive level, give each volume and plex a worker thread
that picks up and processes incoming and completed BIOs.
This should fix the data corruption issues that have come up a few
weeks ago and improve performance, especially of RAID5 plexes.
The volume level needs a little work, though.
to 4.0 and RELENG_3), the BTX mini-kernel used paging rather than flat
mode and clients were limited to a virtual address space of 16 megabytes.
Because of this limitation, boot2 silently masked all physical addresses
in any binaries it loaded so that they were always loaded into the first
16 Meg. Since BTX no longer has this limitation (and hasn't for a long
time), remove the masking from boot2. This allows boot2 to load kernels
larger than about 12 to 14 meg (12 for non-PAE, 14 for PAE).
Submitted by: Sergey Lyubka devnull at uptsoft dot com
MFC after: 1 month
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.
These normally only manifest if the ndis compat module is statically
compiled into a kernel image by way of 'options NDISAPI'.
Submitted by: Dmitri Nikulin
Approved by: wpaul
PR: kern/71449
MFC after: 1 week
driver. Trim its fingernails by removing some useless bits before
fixing the 'thread not terminated on detach' problem.
o dmacnt is no longer used now that we allocate at attach time. Remove
it from struct fdc_data.
o ISPNP was only ever set, but never tested. It used to be used for the
allocation routines to change how it allocated resources. Since that's
no longer necessary, retire the flag.
o ISPCMICA was only ever tested, but never set. GC it. This removes
a special case in determining the drive type. The drive type is
now set in fdc_pcmcia.c, so the hack isn't needed anymore. Sadly,
this isn't tested with a Y-E Data pcmcia floppy drive because there
are a number of other issues that preclude it from working.
o Fix ifdef for reading from the rtc. I'm of the opinion that this ifdef
should be moved into fdc_isa.c, but not today as ideally there'd be
other fixes to the probing of children. So now we just read it on
i386 ! pc98 (there's no #define for MACHINE_ARCH, just MACHINE, hence
this slightly inelegant kludge) and amd64. The PC98 exclusion likely
isn't meaningful since pc98 uses a different driver, but will be when
merging of the pc98 floppy code into this driver is complete (this is the
other reason I think this block of code belongs outside fdc.c).
All of these changes are safe to MT5.
most if not all of our tty drivers in the future.
Centralizing this stuff enables us to remove about 100 lines of
almost but not quite perfectly copy&paste code from each tty driver.
This is necessary so source upgrades use the correct binary.
MFC after: 3 days
For the record: Problem spotted by Scott Long, who mentioned
that source upgrades from 4.7 to recent 5.x and 6.0 are broken.
Detailed analysis shows that 4.7 has a broken make(1) binary.
A breakage was fixed in RELENG_4 in make/main.c,v 1.35.2.7 by
imp@, though the commit log erroneously stated "MFC 1.68"
while in fact it should have been spelled as "MFC 1.67".
copper NICs, a link change event is posted whenever MII autopolling is
toggled off and on, which happens whenever someone calls
bge_miibus_readreg() or bge_miibus_writereg() to access the PHY
registers. This means anytime someone called the SIOCGIFMEDIA ioctl
on a bge interface, the link would reset. Even a simple "ifconfig bge0"
would do it, though other apps like dhclient or the PPPoE daemon could
trigger it as well. An obvious symptom of this problem is lots of
"bgeX: gigabit link up" messages appearing on the console for no
apparent reason.
Through experimentation, I determined that when a real link change
event occurs, the BGE_MIMODE_AUTOPOLL in the BGE_MI_MODE register
is always set, so now if we have a copper NIC and an link change
event occurs and the BGE_MIMODE_AUTOPOLL bit is clear, we ignore
the event.
Note that this does not apply to the original BCM5700 chip since we
use a different method for sensing link changes with that chip (the
status block method was broken), nor to fiber optic NICs since they
don't use the GMII PHY access registers.
another way to misinterpret the spec. Also, always fall back to the hints
probe on any attach failure, not just when _FDE fails.
Thanks to imp and scottl for finding this.
Tested by: rwatson (minimally)
MFC after: 5 days
functions that can be called from enable/disable pf as well. This improves
switching from non-altq ruleset to altq ruleset (and the other way 'round)
by a great deal and makes pfctl act like the user would except it to.
PR: kern/71746
Tested by: Aurilien "beorn" Rougemont (PR submitter)
MFC after: 3 days
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is
larger than IPPROTO_MAX and was initializing memory beyond the array.
Catch all these kinds of errors by ignoring protocols that are higher than
IPPROTO_MAX or 0 (zero).
Add more comments ip_init().
This make "_NEC" devices appear as "NEC" which is more corrent.
The reason is tha NEC originally screwed up on the byteorder in the
model string, so now that they have realized that they prefixed the '_'
so that not every ATA driver on the planet would call them "EN C" :)
reserving it at use time is more miserly, low memory (< 16MB)
evaporates quickly on many systems, so there may not be any suitable
buffers available. This specifically doesn't use the newer, fancier
isa_dma_init to ease merging to 5.
Reviewed by: tegge, phk
the firmware status register on the card to see if the firmware is still
running. There is no way to recover from this, but at least it can give
a hint as whether the car has crashed (which happens all too often).
MFC after: 3 days
errors for the attachment process for the floppy controller. This is
a band-aide because it doesn't try any of the fallback methods when
_FDE isn't long enough, but should be sufficient for people
experiencing the dreaded mutex not initialized panic.
not sure yet about 5.x... MFC if needed.
Also fixes small problems with examining some registers and
some specific gdb transfer problems.
As the patch says:
This is not a pretty patch and only meant as a temporary
fix until a better solution is committed.
PR: i386/71715
Submitted by: Stephan Uphoff <ups@tree.com>
MFC after: 1 week
it travels through the IP stack. This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.
The IP source route options of a packet are now stored in a mtag instead of the
global variable.
and which takes a M_WAITOK/M_NOWAIT flag argument.
Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.
Problem uncovered by: tegge
problems:
1) Add locking for SMP, code provided by Alan Cox
2) While testing Alan's patches, I observed serious problems with
the jumbo buffer allocation code (machine crashed twice), so I gutted
it and rewrote the receive handler to use multiple chained descriptors.
Each RX descriptor gets a single 2K cluster, and the chip will fill in
as many as it needs to hold the complete packet.
User reports that this corrects the data corruption issues previously
observed and discussed on -current.
Note that this driver still needs to be hit with the busdma stick.
I intend to inflict said beating in the near future.
MFC after: 1 week
careful with the skip condition this time. Addresses are only not taken into
account if:
- The interface is POINTTOPOINT
- There is no route installed for the address
- The user specified noalias (:0)
and - We are looking at an IPv4 address.
This should be enough paranoia to not cause any false positives.
PR: misc/69954
Discussed with: yongari
MFC after: 4 days
o Allow for up to 3 resource I/O ranges to be given for the floppy
controller, rather than just two that are allowed for now.
o Make sure that we can work with either a base address of 0x3f0 or 0x3f2.
o Create new inline functions to access the YE DATA's unique BDCR register.
o Update pccard attachment to add the fd device.
o Do some minor style(9) polishing.
# I'm guessing that the fdc pccard attachment broke some time ago, since
# there are a number of issues with it still.
save to call if_attachdomain from if_attach() (as done for if_loop.c). We
will now end up with a properly initialized if_afdata array and the nd6
callout will no longer try to deref a NULL pointer.
Still this is a temp workaround and the locking for if_afdata should be
revisited at a later point.
Requested by: rwatson
Discussed with and tested by: yongari (a while ago)
PR: kern/70393
MFC after: 5 days
filters). After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS
tagged packets to have ip_len and ip_off in host byte order instead of
network byte order.
PR: kern/71652
Submitted by: mlaier (patch)
and sent to the DIVERT socket while the original packet continues with the
next rule. Unlike a normally diverted packet no IP reassembly attemts are
made on tee'd packets and they are passed upwards totally unmodified.
Note: This will not be MFC'd to 4.x because of major infrastucture changes.
PR: kern/64240 (and many others collapsed into that one)
doesn't do this is beyond me, but that will be investigated later. This
results in programming the chip with the correct frequency, which in turn
allows devices to negotiate up to the full 20MB/s.
and the previously malloc'ed snapshot lock.
Malloc struct snapdata instead of just the lock.
Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes).
While here, give the private readblock() function a vnode argument
in preparation for moving UFS to access GEOM directly.
preparation for integration of p4::phk_bufwork. In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly. Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
the loss of a page modified (PG_M) bit in a race between processors.
Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.
Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.
In collaboration with: tegge@
MT5 candidate
PR: 61852
calls in sb_cmd2() and sb_getmixer(). The lock has already be grabbed
before these functions are called.
This is a RELENG_5 candidate.
PR: 71189
Submitted by: stephane
MFC after: 3 days
the pseudo header. We really need the TCP packet length here. This happens
to end up in ip->ip_len in tcp_input.c, but here we should get it from the
len function variable instead.
Submitted by: yongari
Tested by: Nicolas Linard, yongari (sparc64 + hme)
MFC after: 5 days
fully initialed when the pmap layer tries to call sched_pini() early in the
boot and results in an quick panic. Use ke_pinned instead as was originally
done with Tor's patch.
Approved by: julian
value was only enough for 8GB of RAM, the new value can do 16GB. This still
isn't optimal since it doesn't scale. Fixing this for amd64 looks to be
fairly easy, but for i386 will be quite difficult.
Reviewed by: peter
scheduler specific extension to it. Put it in the extension as
the implimentation details of how the pinning is done needn't be visible
outside the scheduler.
Submitted by: tegge (of course!) (with changes)
MFC after: 3 days
VT6122 gigabit ethernet chip and integrated 10/100/1000 copper PHY.
The vge driver has been added to GENERIC for i386, pc98 and amd64,
but not to sparc or ia64 since I don't have the ability to test
it there. The vge(4) driver supports VLANs, checksum offload and
jumbo frames.
Also added the lge(4) and nge(4) drivers to GENERIC for i386 and
pc98 since I was in the neighborhood. There's no reason to leave them
out anymore.
to device driver unloading. Adding these two now and MT5'ing them will
allow us to preserve binary compatibility on RELENG_5 when these facilities
are MFC'ed.
MT5 Candiate.