pmc_flush_logfile is now non-blocking and just ask the kernel
to shutdown the file. From that point, no more data is
accepted by the log thread and when the last buffer is flushed
the file is closed.
This will remove a deadlock between pmcstat asking for
flush while it cannot flush the pipe itself.
MFC after: 3 days
to not leak them, otherwise making UMA/vmstat unhappy with every stoped vnet.
We will still leak pages (especially for zones marked NOFREE).
Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.
Sponsored by: ISPsystem
Reviewed by: rwatson
MFC after: 5 days
violated: so_pcb can never be NULL for a valid UDP socket, and it is
always SOCK_DGRAM. Use sotoinpcb() as the rest of the UDP code does.
MFC after: 1 week
Reviewed by: bz
Sponsored by: Juniper Networks
On Linux, /proc/<pid>/fd is comparable to fdescfs, where it allows you
to inspect the file descriptors used by each process. Glibc's ttyname()
works by performing a readlink() on these nodes, since all nodes in this
directory are symlinks.
It is a bit hard to implement this in linprocfs right now, so I am not
going to bother. Add a way to make ttyname(3) work, by adding a
/proc/<pid>/fd symlink, which points to /dev/fd only if the calling
process matches. When fdescfs is mounted, this will cause the
readlink() in ttyname() to fail, causing it to fall back on manually
finding a matching node in /dev.
Discussed on: emulation@
been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was
guaranteed to be stable when a valid socket reference is held (as it is in
the output path).
MFC after: 1 week
Reviewed by: bz
Sponsored by: Juniper Networks
tcbinfo lock there: r175612, which re-added it, masked a race between
sonewconn(2) and accept(2) that could allow an incompletely initialized
address on a newly-created socket on a listen queue to be exposed. Full
details can be found in that commit message.
MFC after: 1 week
Sponsored by: Juniper Networks
radix table root nodes. This is only needed (and available)
in the virtualization case to free the resources when tearing
down a virtual network stack.
Sponsored by: ISPsystem
Reviewed by: julian, zec
MFC after: 5 days
to not leak them making the VM subsystem unhappy with every stoped vnet(*).
We will still leak pages (especially as zones are marked NOFREE).
(*) This will also keep vmstat -z more usable.
Sponsored by: ISPsystem
MFC after: 5 days
or overflow the netisr queue and fall back to the interface
queue so that we can garuantee that the ifnet pointer stays
valid. Formerly we ended up with reference counts <= 0 in
case the netisr had returned ENOBUFS. The idea is to track
any packet in the netisr queue and only change the refount
on edge operations for the fallback interface queue. This
also avoids problems in case the if_snd.ifq_len lies to us.
Also rework refount assertions to make sure they trigger if
we go below 1. Formerly a negative refence count did not
trigger the assert as the refcount variable is u_int.
Sponsored by: ISPsystem
MFC after: 5 days
than spinning forever. This fixes booting with CF ejected.
NB: I've made the driver pretty chatty about errors in case there's hardware
that operates differently to mine, so we can easily track down any issues.
Reviewed by: imp
Sponsored by: Packet Forensics
redundant implementations.
o) Use ABI, not ISA, to determine address length.
o) Disable and restore interrupts around any operation that uses all 64 bits of
a register. In kernels using the O32 ABI, the upper 32 bits of those
registers is likely to be corrupted by an interrupt.
Sponsored by: Packet Forensics
In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows to bring out into commont x86 code the handling part for
machdep.lapic_allclocks tunable, which is retained.
- We don't need to fall back to uncacheable memory to satisfy BUS_DMA_COHERENT
requests on these CPUs.
- The bus_dmamap_sync() is a no-op for these CPUs.
A side-effect of this change is rename DMAMAP_COHERENT flag to
DMAMAP_UNCACHEABLE. This conveys the purpose of the flag more accurately.
Reviewed by: gonzo, imp
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).
sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.
MFC after: 3 weeks
interface didn't be attached automatically at boot time so changes a
approach to attach children based on leveraging some newbus niceties.
Submitted by: nwhitehorn
- change the way in which command queue overflow is handled;
- do not expose to CAM two command slots, used for driver's internal purposes;
- allow driver to use up to 1024 command slots, instead of 256 before.
88E1149 PHY. This will fix intermittent watchdog timeouts as well
as very slow network performance on 88E8072 Yukon Extreme.
PR: kern/144148
MFC after: 1 week
correctly initialized and just then assign to softclock/profclock.
Right now, some atrtc seems reporting strange diagnostic error* making the
current pattern bogus.
In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows to bring out into commont x86 code the handling part for
machdep.lapic_allclocks tunable, which is retained.
Sponsored by: Sandvine Incorporated
Tested by: yongari, Richard Todd
<rmtodd at ichotolot dot servalan dot com>
MFC: 3 weeks
X-MFC: r202387, 204309
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.
The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.
In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.
Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.
Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.
Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.
CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
device is different from the device used to the original mount.
Note that update_mp does not need devvp locked, and pmp->pm_devvp cannot
be freed meantime.
Reported and tested by: pho
MFC after: 3 weeks
o rename the new variables to comply with the naming scheme
o move the new variables to an AR5212 specific struct
o use ahp when available
o revert to previous ts_flags check
- remove unused and commented code (MIPS_BUS_SPACE_PCI, pic_usb_ack)
- use rmi_pci_bus_space for USB too (needs byteswap)
- uncomment xls_ehci.c in files.xlr
- changes to xls_ehci.c - updated with dev/usb/controller/ehci_*.c as
Obtained from: JC - c.jayachandran@gmail.com
Enhanced process coredump routines.
This brings in the following features:
1) Limit number of cores per process via the %I coredump formatter.
Example:
if corefilename is set to %N.%I.core AND num_cores = 3, then
if a process "rpd" cores, then the corefile will be named
"rpd.0.core", however if it cores again, then the kernel will
generate "rpd.1.core" until we hit the limit of "num_cores".
this is useful to get several corefiles, but also prevent filling
the machine with corefiles.
2) Encode machine hostname in core dump name via %H.
3) Compress coredumps, useful for embedded platforms with limited space.
A sysctl kern.compress_user_cores is made available if turned on.
To enable compressed coredumps, the following config options need to be set:
options COMPRESS_USER_CORES
device zlib # brings in the zlib requirements.
device gzio # brings in the kernel vnode gzip output module.
4) Eventhandlers are fired to indicate coredumps in progress.
5) The imgact sv_coredump routine has grown a flag to pass in more
state, currently this is used only for passing a flag down to compress
the coredump or not.
Note that the gzio facility can be used for generic output of gzip'd
streams via vnodes.
Obtained from: Juniper Networks
Reviewed by: kan
does not generate excessive interrupts any more so we don't need
to have two copies of interrupt handler.
While I'm here remove two STAT_PUT_IDX register accesses in LE
status event handler. After r204539 msk(4) always sync status LEs
so there is no need to resort to reading STAT_PUT_IDX register to
know the end of status LE processing. Just trust status LE's
ownership bit.
countdown timer register. The timer resolution may vary among
controllers but the value would be represented by core clock
cycles. msk(4) will automatically computes number of required clock
cycles from given micro-seconds unit.
The default interrupt holdoff timer value is 100us which will
ensure less than 10k interrupts under load. The timer value can be
changed with dev.mskc.0.int_holdoff sysctl node.
Note, the interrupt moderation is shared resource on dual-port
controllers so you can't use separate interrupt moderation value
for each port. This means we can't stop interrupt moderation in
driver stop routine. Also have msk_tick() reclaim transmitted Tx
buffers as safety belt. With this change there is no need to check
missing Tx completion interrupt in watchdog handler, so remove it.
full-duplex. Previously msk(4) used to allow flow-control on
1000baseT half-duplex media. Also GMAC pause is enabled if link
partner is capable of handling it.
While I'm here use IFM_OPTIONS instead of using IFM_GMASK to check
optional flags of link.
- Rename the netisr protocol registration array, 'np' to 'netisr_proto',
in order to reduce the chances of symbol name collisions. It remains
statically defined, but it will be looked up by netstat(1).
- Move certain internal structure definitions from netisr.c to
netisr_internal.h so that netstat(1) can find them. They remain
private, and should not be used for any other purpose (for example,
they should not be used by kernel modules, which must instead use the
public interfaces in netisr.h).
- Store a kernel-compiled version of NETISR_MAXPROT in the global variable
netisr_maxprot, and export via a sysctl, so that it is available for use
by netstat(1). This is especially important for crashdump
interpretation, where the size of the workstream structure is determined
by the maximum number of protocols compiled into the kernel.
MFC after: 1 week
Sponsored by: Juniper Networks
This is a split merge because of non-uniform licensing of the DTC package
contents and the way these components will be used in the FreeBSD environment.
The original DTC package is composed of the following two major pieces:
1. sys/contrib/libfdt (BSD [dual] license)
2. contrib/dtc (GPLv2)
The libfdt component is going to be shared in all aspects of the environment:
- /boot/loader
- kernel
- dtc (the device tree compiler proper, userspace tool)
msdosfs-specific variant of vn_vget_ino(), msdosfs_deget_dotdot().
As was done for UFS, relookup the dotdot denode after the call to
msdosfs_deget_dotdot(), because vnode lock is dropped and directory
might be moved.
Tested by: pho
MFC after: 3 weeks
SLOT_EMPTY deName[0] values. Besides conforming to FAT specification, it
also clears the issue where vfs_hash_insert found the vnode in hash, and
newly allocated vnode is vput()ed. There, deName[0] == 0, and vnode is
not reclaimed, indefinitely kept on mountlist.
Tested by: pho
MFC after: 3 weeks
The plan is to use vnode lock to protect denode and fat cache,
and having separate lock for block use map.
Change the check and return on impossible condition into KASSERT().
Tested by: pho
MFC after: 3 weeks
Do not do additional dev_ref() on the newly created interface in the
if_clone create method [1]. This reference is not needed and never
removed, causing struct cdevpriv leakage. Remove the setting of
SI_CHEAPCLONE flag as well, since it is unused.
For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF)
instead of calling make_dev() and then dev_ref(), to avoid a race.
Call drain_dev_clone_events() at the module unload time after dev_clone
handler is deinstalled.
Submitted by: Mikolaj Golub <to.my.trociny gmail com> [1]
MFC after: 1 week
It is the case, however, that the uidinfo of the temporary credential
set up for access(2) is not properly updated when its effective uid is
changed.
MFC after: 3 days
o Assign vectors based on priority, because vectors have
implied priority in hardware.
o Use unordered memory accesses to the I/O sapic and use
the acceptance form of the mf instruction.
o Remove the sapicreg.h and sapicvar.h headers. All definitions
in sapicreg.h are private to sapic.c and all definitions in
sapicvar.h are either private or interface functions. Move the
interface functions to intr.h.
o Hide the definition of struct sapic.
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.
Reviewed by: jhb, kib
MFC after: 3 days
in the process queue when gathering information for the process, and set
of signals pending for the thread, when gathering information for the
thread. Previously, the sysctl returned a union of the process and some
arbitrary thread pending set for the process, and union of the process
and the thread pending set for the thread.
MFC after: 1 week
32 bit handles. The RIO (reduced interrupt operation) and fast posting
for the parallel SCSI cards were all 16 bit handles. Furthermore,
target mode parallel SCSI only can have 16 bit handles.
Use part of a supplied patch to switch over to using 32 bit handles.
Be a bit more conservative here and only do this for parallel SCSI
for the 12160 (Ultra3) cards. There were a lot of marginal Ultra2
cards, and, frankly, few are findable now for testing.
Fix the target handle routine to only do 16 bit handles for parallel
SCSI cards. This is okay because the upper sixteen bits of the new
32 bit handles is a sequence number to help protect against duplicate
completions. This would be very unlikely to happen with parallel
SCSI target mode, and wasn't present before, so we're no worse off
than we used to be.
While we're at it, finally split the async mailbox completion handlers
into FC and parallel SCSI functions. This makes it much cleaner and
easier to figure out what is or isn't a legal async mailbox completion
code for different card classes.
PR: kern/144250
Submitted partially by: Charles D
MFC after: 1 week
the issue. I still have no idea why TSO does not work on this
controller. davidch@ also confirmed there is no known TSO related
issues for this controller.
parsing code in TSO path as the controller requires VLAN hardware
tagging to make TSO work over VLANs.
While parsing the mbuf in TSO patch, always perform check for
writable mbuf as bce(4) have to reset IP length and IP checksum
field of IP header and make sure to ensure contiguous buffer before
accessing IP/TCP headers. While I'm here replace magic number 40 to
more readable sizeof(struct ip) + sizeof(struct tcphdr).
Reviewed by: davidch
TX/RX checksum handler to set/clear relavant assist bits which was
used to cause unexpected results.
With this change, bce(4) can be bridged with other interfaces that
lack TSO, VLAN checksum offloading.
Reviewed by: davidch
around. Management firmware(ASF/IPMI/UMP) requires the VLAN
hardware tag stripping so don't actually disable VLAN hardware tag
stripping. If VLAN hardware tag stripping was disabled, bce(4)
manually reconstruct VLAN frame by appending stripped VLAN tag.
Also remove unnecessary IFCAP_VLAN_MTU message.
Reviewed by: davidch
for controllers like 88E8053 which reports two MSI messages.
Because we don't get anything useful things with 2 MSI messages,
allocating 1 MSI message would be more sane approach.
While I'm here, enable MSI for dual-port controllers too. Because
status block is shared for dual-port controllers, I don't think
msk(4) will encounter problem for using MSI on dual-port
controllers.
The register offset is not valid on 88E8072 controller. Also don't
blindly increase max read request size to 4096, instead, use 2048
which seems to be more sane value and only change the value if the
hardware default size(512) was used on that register.
For PCIX controllers, use system defined constant rather than using
magic value.
While I'm here stop showing negotiated link width.
not require checksum LE configuration if checksum start and write
position is the same as before. So keep track last checksum start
and write position and insert new LE whenever the position is
changed. This reduces number of LEs used in TX path as well as
slightly enhance TX performance.
the scratchpage is updated, the PVO's physical address is updated as well.
This makes pmap_extract() begin returning non-zero values again, causing
the panic partially fixed in r204297. Fix this by excluding addresses
beyond virtual_end from consideration as KVA addresses, instead of allowing
addresses up to VM_MAX_KERNEL_ADDRESS.
shared and generalized between our current amd64, i386 and pc98.
This is just an initial step that should lead to a more complete effort.
For the moment, a very simple porting of cpufreq modules, BIOS calls and
the whole MD specific ISA bus part is added to the sub-tree but ideally
a lot of code might be added and more shared support should grow.
Sponsored by: Sandvine Incorporated
Reviewed by: emaste, kib, jhb, imp
Discussed on: arch
MFC: 3 weeks
counters have not gone about MAXCPU or NETISR_MAXPROT. These problems
caused panics on UP kernels with INVARIANTS when using sysctl -a, but
would also have caused problems for 32-core boxes or if the netisr
protocol vector was fully populated.
Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com>
MFC after: 4 days
its PVO to map physical address 0 instead of kernelstart. This fixes a
situation in which a user process could attempt to return this address
via KVM, have it fault while being modified, and then panic the kernel
because (a) it is supposed to map a valid address and (b) it lies in the
no-fault region between VM_MIN_KERNEL_ADDRESS and virtual_avail.
While here, move msgbuf and dpcpu make into regular KVA space for
consistency with other implementations.
too high due to several overflows. The actual limit is somewhere in the
neighborhood of INT_MAX/4 on 64-bit machines, but most systems could not
support such a limit due to a lack of memory and the cost of duplicate
credentials.
Reported by: bde
been around for a long time now (7.1-ish or even earlier); assume
they are present. These includes MSI, TSO, LRO, VLAN, INTR_FILTERS,
FIRMWARE, etc.
Also, eliminate some dead code and clean up in other places as part
of this quick once-over.
MFC after: 1 week
race as it could already have been tx'd and freed by that time. Place
the bpf tap just _before_ writing the gen bit.
This fixes a panic when running tcpdump on a cxgb interface.
physical address is changed, there is a brief window during which its PTE
is invalid. Since moea64_set_scratchpage_pa() does not and cannot hold
the page table lock, it was possible for another CPU to insert a new PTE
into the scratch page's PTEG slot during this interval, corrupting both
mappings.
Solve this by creating a new flag, LPTE_LOCKED, such that
moea64_pte_insert will avoid claiming locked PTEG slots even if they
are invalid. This change also incorporates some additional paranoia
added to solve things I thought might be this bug.
Reported by: linimon
- Add a separate palette data for 8-bit DAC mode when SC_PIXEL_MODE is set
and fill it up with default gray-scale palette data for text. Now we don't
have to set `hint.sc.0.vesa_mode' to get the default palette data.
- Add a new adapter flag, V_ADP_DAC8 to track whether the controller is
using 8-bit palette format and load correct palette when switching modes.
- Set 8-bit DAC mode only for non-VGA compatible graphics mode.
change and have isp_make_here scan the whole bus which will then scan all
luns.
I think xpt_rescan needs to be fixed, but that's a separable issue.
Suggested by: Alexander
not support TSO over VLAN if VLAN hardware tagging is disabled so
there is no need to check VLAN here.
While I'm here make sure to pullup IP/TCP headers in the first
buffer.
GEM_MIF_CONFIG_MDI0 cannot be trusted when the firmware has powered
down the chip so the internal transceiver has to be hardcoded. This
is also in line with the AppleGMACEthernet driver, which just doesn't
distinguish between internal/external transceiver and MDIO/MDI1
respectively in the first place. Tested by: Andreas Tobler
MFC after: 1 week
frame, make sure to update VLAN capabilities whenever jumbo frame
is configured.
While I'm here rearrange interface capabilities configuration. The
controller requires VLAN hardware tagging to make TSO work on VLANs
so explicitly check this requirement.
Now all contiguous regions returned from bus-dma will be aligned to the
alignment constraint and all but the last region are guaranteed to be
a multiple of the alignment in length. This also means that the relative
alignment of two adjacent bytes in the I/O stream have a difference of 1
even if they are not physically contiguous.
The old code, when needing to perform a copy in order to align data, only
copied the amount of data needed to reach the next page boundary. This
often left an unaligned end to the segment. Drivers such as Xen's blkfront
can't deal with such segments.
The downside to this approach is that, once an unaligned region is encountered,
the remainder of the I/O will be bounced. However, bouncing should be rare.
It is typically caused by non-performance critical userland programs that
don't bother to align their I/O buffers (e.g. bsdlabel). In-kernel I/O
buffers are always aligned to at least a page boundary.
Reviewed by: scottl
MFC after: 2 weeks
no superpage mappings are created within the clean submap, aligning the
start of the clean submap helps to prevent interference with kmem_alloc()'s
use of superpages.
such that a fancier thermal management algorithm can be run from user
space, but the kernel will at least ensure your machine does not either
sound like a wind tunnel or catch fire.
Although this file has historically been used as a dumping ground for
random functions, nowadays it only contains functions related to copying
bits {from,to} userspace and hash table utility functions.
Behold, subr_uio.c and subr_hash.c.
and will try to load it if it's not present. To better meet these
expectations, change the module name for the loopback interface from
'loop' to 'if_lo'. The loopback interface is always compiled into the
base kernel, so there are no resulting changes in kld files, etc.
Discussed with: brooks (ages ago)
MFC after: 1 week
The `name' and `newp' arguments can be marked const, because the buffers
they refer to are never changed. While there, perform some other
cleanups:
- Remove K&R from sysctl.c.
- Implement sysctlbyname() using sysctlnametomib() to prevent
duplication of an undocumented kernel interface.
- Fix some whitespace nits.
It seems the prototypes are now in sync with NetBSD as well.
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
This file was missed in r204152.
Tx DMA burst size 2048, I beleive PCIe maximum read request size
also should match to the value of Tx DMA burst size. With this
change I can get more than 800Mbps for TCP bulk transfers.
Previously I was not able to get more than 700Mbps. If I enable TSO
it now shows 927Mbps.
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
to make TSO work on VLAN. So if VLAN hardware tagging is disabled
explicitly clear TSO on VLAN. While I'm here remove duplicated
VLAN_CAPABILITIES call.
from IFCAP_VLAN_HWTAGGING. I think some hardwares may be able to
TSO over VLAN without VLAN hardware tagging.
Driver changes and userland support will follow.
Reviewed by: thompsa
- 'show ifnets' prints a list of ifnet *s per virtual network stack,
- 'show ifnet <struct ifnet *>' prints fields matching the given ifp.
We do not yet print the complete set of fields and might want to
factor this out to an extra if_debug.c file in case this grows
a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].
Sponsored by: ISPsystem
Suggested by: rwatson [1]
Reviewed by: rwatson
MFC after: 5 days
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().
Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.
Sponsored by: ISPsystem
Reviewed by: rwatson
MFC After: 5 days
- add bus_space_rmi_pci.c for PCI bus space
- files.xlr update for changes in files
- pcibus.c merged into xlr_pci.c (they were small files with inter-dependencies)
- xlr_pci.c - lot of changes here with few fixes, formatting cleanup
Obtained from: C. Jayachandran (JC) - c.jayachandran@gmail.com
- remove pci related code from bus_space_rmi.c, we will have another
file for PCI bus space functions which will do byte-swapping.
- remove local SWAP implementation
- added TODO stub for unimplemented functions
Obtained from: C. Jayachandran - c.jayachandran@gmail.com
- (cleanup) remove rmi specific 'struct mips_intrhand' - this is no
longer needed since 'struct intr_event' have all the required hooks
- add xlr_cpu_establish_hardintr, which has args for pre/post ithread
and filter hooks, so that the PCI code can add the PCI controller
interrupt ack code here
- make 'cpu_establish_hardintr' use the above function.
- (fix) change type of eirr/eimr from register_t to uint64_t. These
have to be 64bit otherwise we cannot handle interrupts from 32.
- (fix) use eimr to mask eirr before checking interrupts, so that we
will not handle masked interrupts.
Obtained from: C. Jayachandran - c.jayachandran@gmail.com
UMA segments at their physical addresses instead of into KVA. This emulates
the direct mapping behavior of OEA32 in an ad-hoc way. To make this work
properly required sharing the entire kernel PMAP with Open Firmware, so
ofw_pmap is transformed into a stub on 64-bit CPUs.
Also implement some more tweaks to get more mileage out of our limited
amount of KVA, principally by extending KVA into segment 16 until the
beginning of the first OFW mapping.
Reported by: linimon
of its argument before atomically replacing it, which could occasionally
return the wrong value on an SMP system. This resulted in user mutex
operations hanging when using threaded applications.
This header file uses __packed, without including <sys/cdefs.h>. This
means it cannot be used in the way described in sysarch(3) by only
including <machine/sysarch.h>.
The backtrace code tries to look for an instruction of the form "sw ra, x(sp)"
to figure out the program counter of the calling function. When we generate
the kernel exception frame we store the 'ra' at the time of the exception
using an instruction of the same form. The problem is that the 'ra' at the
time of the exception is not the same as the 'program counter' at the time
of the exception.
The fix is to save the 'exception program counter' register by staging
it through the 'ra' register.
The RX overflow is reported in bit 2 on real hardware and Linux driver
for the same device already has this defined correctly.
This fixes frequent interrupt storms seen on RouterStation Pro boards.
Discussed with: gonzo
questions on the thermal calibration), and to read and set fan RPMs from
software. While here, fix a number of bugs.
Calibration code from: OpenBSD
MFC after: 2 weeks
HAST allows to transparently store data on two physically separated machines
connected over the TCP/IP network. HAST works in Primary-Secondary
(Master-Backup, Master-Slave) configuration, which means that only one of the
cluster nodes can be active at any given time. Only Primary node is able to
handle I/O requests to HAST-managed devices. Currently HAST is limited to two
cluster nodes in total.
HAST operates on block level - it provides disk-like devices in /dev/hast/
directory for use by file systems and/or applications. Working on block level
makes it transparent for file systems and applications. There in no difference
between using HAST-provided device and raw disk, partition, etc. All of them
are just regular GEOM providers in FreeBSD.
For more information please consult hastd(8), hastctl(8) and hast.conf(5)
manual pages, as well as http://wiki.FreeBSD.org/HAST.
Sponsored by: FreeBSD Foundation
Sponsored by: OMCnet Internet Service GmbH
Sponsored by: TransIP BV
PVOs, and so the modified state of the page can no longer be communicated
to the VM layer, causing pages not to be flushed to swap when needed, in
turn causing memory corruption. Also make several correctness adjustments
to I-Cache synchronization and TLB invalidation for 64-bit Book-S CPUs.
Obtained from: projects/ppc64
Discussed with: grehan
MFC after: 2 weeks
This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).
Tested by: jhb
Getting the little-endian PCI bus working on the big-endian CPU proved to be
quite challenging. We let the PCI devices be mapped in the "match byte lanes"
address window. This is where they are mapped by the CFE and DMA transfers
generated to or from addresses within this window are not subject to automatic
byte-swapping.
However any access by the driver to memory-mapped pci space is redirected
via the "match bit lanes" address window. We get the benefit of automatic
byte swapping through this address window and drivers don't need to change
to deal with CPU big-endianness.
module. With r203732 it became apparent that creating the sysctl nodes
twice causes at least a warning, however the whole code shouldn't be
present twice in the first place.
Discussed with: rmacklem
o uses v4 firmware instead of v3. A port will be committed to create
the bwn firmware module.
o supports B/G and LP(low power) PHYs.
o supports 32 / 64 bits DMA operations.
o tested on big / little endian machines so should work on all
architectures.
It'd not connected to the build until the firmware port is committed.
o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used
by calling either bus_space_map() or pmap_mapdev().
o Implement bus_space_map() in terms of pmap_mapdev() and implement
bus_space_unmap() in terms of pmap_unmapdev().
o Have ia64_pib hold the uncached virtual address of the processor interrupt
block throughout the kernel's life and access the elements of the PIB
through this structure pointer.
This is a non-functional change with the exception of using ia64_ld1() and
ia64_st8() to write to the PIB. We were still using assignments, for which
the compiler generates semaphore reads -- which cause undefined behaviour
for uncacheable memory. Note also that the memory barriers in ipi_send() are
critical for proper functioning.
With all the mapping of uncached memory done by pmap_mapdev(), we can keep
track of the translations and wire them in the CPU. This then eliminates
the need to reserve a whole region for uncached I/O and it eliminates
translation traps for device I/O accesses.
With FBS enabled, we have no idea what command caused timeout.
Implement same logic as in siis(4) - wait for other commands
complete or timeout and then give some more time.
According to the last POSIX specification that contained <sys/timeb.h>,
this header should also typedef time_t properly. Also add a proper
comment to the final #endif.
caching code for IPv6 by fixing a typo that used the incorrect variable.
It also fixes the indentation of the statement above it.
Reported by: simon AT comsys.ntu-kpi.kiev.ua
MFC after: 5 days
scalable shared memory node, which is used in large UltraSPARC III based
machines to group snooping-coherency domains together, like schizo(4) to
be treated like nexus(4) children.
to the exclusion lists as the CPU nodes aren't handled as regular devices
either. Also add the pseudo-devices found in Sun Fire V1280.
- Allow nexus_attach() and nexus_alloc_resource() to be used by drivers
derived from nexus(4) for subordinate busses.
- Don't add the zero-sized memory resources of glue devices to the resource
lists.
root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu'
nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
It would be great if we could use newbus to deal with the different ways
the 'cpu' devices can hang off of pseudo ones but unfortunately both
cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device
probing.
- Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these
are multi-core each CPU has two Fireplane config registers and thus the
module/target ID has to be determined differently so the one specific
to a certain core is used. Similarly, starting with UltraSPARC IV the
individual cores use a different property in the OFW device tree to
indicate the CPU/core ID as it no longer is in coincidence with the
shared slot/socket ID.
This involves changing the MD KTR code to not directly read the UPA
module ID either. We use the MID stored in the per-CPU data instead of
calling cpu_get_mid() as a replacement in order prevent clobbering any
registers as side-effect in the assembler version. This requires CATR()
invocations from mp_startup() prior to mapping the per-CPU pages to be
removed though.
While at it additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Make sparc64_shutdown_final() static as it's not used outside of
machdep.c.
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq
Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks
no cleanwindows handler so just remove trying to trigger it from _start
and the AP trampoline code as that leads to a crash there. This should
be okay as leaking data from the OFW via the CPU registers on start of
the kernel should be no real concern.
- Make the comments of _start and the AP trampoline code regarding the
initializations they perform match each other and reality.
- Make the comments of the AP trampoline code regarding iTLB accesses
refer to the right macro.
OpenBSD and OpenSolaris do instead of fiddling with the MMUs ourselves.
Unlike direct access the firmware methods don't automatically use the
next free (?) TLB slot, instead the slot to be used has to be specified.
We allocate the TLB slots for the kernel top-down as OpenSolaris suggests
that the firmware will always allocate the ones for its own use bottom-up.
Besides being simpler, according to OpenBSD using the firmware methods is
required to allow booting on Sun Fire E10K with multi-systemboard domains.
of Sun Fire V1280 doesn't round up the size itself but instead lets
claiming of non page-sized amounts of memory fail.
- Change parameters and variables related to the TLB slots to unsigned
which is more appropriate.
- Search the whole OFW device tree instead of only the children of the
root nexus device for the BSP as starting with UltraSPARC IV the 'cpu'
nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
- Add support for UltraSPARC IV and IV+ BSPs. Due to the fact that these
are multi-core each CPU has two Fireplane config registers and thus the
module/target ID has to be determined differently so the one specific
to a certain core is used. Similarly, starting with UltraSPARC IV the
individual cores use a different property in the OFW device tree to
indicate the CPU/core ID as it no longer is in coincidence with the
shared slot/socket ID.
While at it additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Check the return value of init_heap(). This requires moving it after
cons_probe() so we can panic when appropriate. This should be fine as
the PowerPC OFW loader uses that order for quite some time now.
- Update bpb structs with reserved fields.
- In direntry struct join deName with deExtension. Although a
fix was attempted in the past, these fields were being overflowed,
Now this is consistent with the spec, and we can now share the
WinChksum code with NetBSD.
Submitted by: Pedro F. Giffuni <giffunip tutopia com>
Mostly obtained from: NetBSD
Reviewed by: bde
MFC after: 2 weeks
pending blocks are scheduled for removal, goes to retry the (re)allocation,
clear the bp pointer. It might happen that meantime free space is really
exhausted and we are entering nospace: label without bread()ing buffer,
causing stale bp value to be brelse()d again.
Tested by: pho
(Producing a scenario to reliably reproduce the
race appeared to be much harder then fixing the bug)
MFC after: 1 week
this in the Sibyte PCI hostbridge driver instead.
The nexus driver sees resource allocation requests for memory and irq
resources only. These are legitimate resources on all MIPS platforms.
Suggested by: imp
It is belived that that pass s not needed anymore.
Specifically it is not required now for the reasons that were given
in the removed comment.
Discussed with: jhb
MFC after: 4 weeks
o Incorporate review comments:
- Properly reference and lock the map
- Take into account that the VM map can change inbetween requests
- Add the fileid and fsid attributes
Credits: kib@
Reviewed by: kib@