Merge changes for initial n64 support in pmap.c. Use direct mapped (XKPHYS)
access for a lot of operations that earlier needed temporary mapping. Add
support for using XKSEG for kernel mappings.
Reviewed by: imp
Obtained from: jmallett (http://svn.freebsd.org/base/user/jmallett/octeon)
The following systems are affected:
- MPC8555CDS
- MPC8572DS
This overhaul covers the following major changes:
- All integrated peripherals drivers for Freescale MPC85XX SoC, which are
currently in the FreeBSD source tree are reworked and adjusted so they
derive config data out of the device tree blob (instead of hard coded /
tabelarized values).
- This includes: LBC, PCI / PCI-Express, I2C, DS1553, OpenPIC, TSEC, SEC,
QUICC, UART, CFI.
- Thanks to the common FDT infrastrucutre (fdtbus, simplebus) we retire
ocpbus(4) driver, which was based on hard-coded config data.
Note that world for these platforms has to be built WITH_FDT.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
most one call to pmap_qremove(), and thus one TLB shootdown, instead of one
call and TLB shootdown per page.
Simplify the interface to vm_hold_free_pages().
MFC after: 3 weeks
was needed at preliminary version of the patch, where number of CPU ticks
was divided strictly on 16 seconds. Final code instead uses real interval
duration, so precise interval should not be important. Same time aliasing
issues around second boundary causes false positives, periodically logging
useless "t_delta ... too long/short" messages when HZ set below 256.
start so we should adjust the mbuf if the driver is running in PIO mode.
Now it should work well with WPA authentication and association for LP
PHY devices.
Tested by: Warren Block <wblock at wonkity.com>
MFC after: 1 month
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown. Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns. The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown. With this change, that is no longer so.
For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.
MFC after: 3 weeks
quirks for weak-symbol handling. Text symbols require also marking weak
the special dot-symbol associated with the function, and data symbols
require that you not do that. To fix this, provide a hacked
__weak_reference for powerpc64, and define a new __weak_reference_data
for the single weak data symbol in base.
Revert after: binutils 2.17 import
Obtained from: projects/ppc64
checksum offloading is enabled. The frame has a valid checksum
value so payload might be modified during TX checksum calculation.
Disable TX checksum offloading but give users chance to enable it
when they know their controller works without problems with TX
checksum offloading.
Reported by: Andrzej Tobola <ato <> iem dot pw dot edu dot pl>
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.
Submitted by: alc [1]
Reviewed by: alc
MFC after: 3 weeks
the maintenance of vm_pageout_deficit can be localized to just two places:
vm_page_alloc() and vm_pageout_scan().
This change also corrects an off-by-one error in the maintenance of
vm_pageout_deficit. Historically, the buffer cache functions, allocbuf()
and vm_hold_load_pages(), have not taken into account that vm_page_alloc()
already increments vm_pageout_deficit by one.
Reviewed by: kib
- Run the adapter's tick at 1Hz and remove link state checks from it.
Instead, have each port check its link state. Delay the check so that
it takes place slightly after the driver is notified of a change in
link state. This is a cheap way to debounce these notifications if
many are received in rapid succession. POLL_LINK_1ST_TIME flag can
also be eliminated as a side effect of these changes.
- Do not reset the PHY when link goes down.
- Clear port's link_fault flag if the PHY indicates link is down.
- get_link_status_r should leave speed and duplex alone when link is down.
MFC after: 1 month
checksum is enabled in sge_init_locked().
While I'm here do not set RX checksum bits in RX descriptor
initialization. It is controller's job to set these bits.
Tested by: xclin <xclin <> cs dot nctu dot edu dot tw >
The existing code only checked the alignment of the first mbuf and
didn't enforce the size constraints.
This commit introduces a simple function to check the alignment and
size of all mbufs in the list. This fixes the initial issue in the
PR.
PR: kern/148307
Reviewed by: gonzo@
Updated PTE/PDE macros from http://svn.freebsd.org/base/user/jmallett/octeon
Introduce pmap_segshift() macro, use pmap_segmap() in place of pmap_pde, and
remove pmap_pde().
Approved by: rrs (mentor)
Obtained from: jmallett@
a virtual-mode version for use on 64-bit systems, which have 32-bit
firmware implementations and require similar constraints on addressing
to the real-mode implementation.
flag is always provided, and unconditionally retry after sleep for the
busy page or failed allocation.
The intent is to remove VM_ALLOC_RETRY eventually.
Proposed and reviewed by: alc
do on i386. The consequences of not doing so on amd64 became apparent
with the introduction of the COUNT_IPIS and COUNT_XINVLTLB_HITS
options. Specifically, single-threaded applications were generating
unnecessary IPIs to shoot-down the TLB on other processors. However,
this is clearly nonsensical because a single-threaded application is
only running on the current processor. The reason that this happens
is that pmap_activate() is unable to properly update the old pmap's
field "pm_active" without the correct "curpmap". So, in effect, stale
bits in "pm_active" were leading pmap_protect(), pmap_remove(),
pmap_remove_pages(), etc. to flush the TLB contents on some arbitrary
processor that wasn't even running the same application.
Reviewed by: kib
MFC after: 3 weeks
- Fix a bug where thread may be in sleeping state but the wchan won't
be set, leading to an empty container for sleepq_type(). [0]
Sponsored by: Sandvine Incorporated
[0] Submitted by: Bryan Venteicher
<bryanv at daemoninthecloset dot org>
MFC after: 3 days
X-MFC: 209577
by timeout/error of one of probe commands, process may continue infinitely.
Make CAM ATA more robust to faulty devices and false positive detections,
abort probe after two restarts on timeouts or ten on other errors.
Because slave PICs send all interrupts to their CPU 0 output line (which
is routed to a pin on the master PIC), changes to per-CPU register banks
like EOI on the slave PIC must be accessed for CPU 0, instead of the
CPU actually processing the interrupt.
Submitted by: Andreas Tobler
There are special cases where tty_rel_free() can be called twice in a
row, namely when closing and revoking the TTY at the same moment. Only
call destroy_dev_sched_cb() once.
Reported by: Jeremie Le Hen
MFC after: 1 week
the context of the process that reduced the effective count. Previously
all truncation as a result of unlink happened in the softdep flush
thread. This had the effect of being impossible to rate limit properly
with the journal code. Now the process issuing unlinks is suspended
when the journal files. This has a side-effect of improving rm
performance by allowing more concurrent work.
- Handle two cases in inactive, one for effnlink == 0 and another when
nlink finally reaches 0.
- Eliminate the SPACECOUNTED related code since the truncation is no
longer delayed.
Discussed with: mckusick
configuration to get IPv4 TSO work on BCM57780. While I'm here
apply the same fix to BCM5785 which shares similar hardware feature
of BCM57780. This change makes TSO work on BCM57780.
Tested by: Tong Liu <nemoliu <> gmail dot com>
specify the increment of vm_pageout_deficit when sleeping due to page
shortage. Then, in allocbuf(), the code to allocate pages when extending
vmio buffer can be replaced by a call to vm_page_grab().
Suggested and reviewed by: alc
MFC after: 2 weeks
numbers. This change adds a new function alloc_unr_specific() which
returns the requested unit number if it is free. If the number is
already allocated or out of the range, -1 is returned.
Update alloc_unr(9) manual page accordingly and add a MLINK for
alloc_unr_specific(9).
Discussed on: freebsd-hackers
between determining the other CPUs and calling cpu_ipi_selected(), which
apart from generally doing the wrong thing can lead to a panic when a
CPU is told to IPI itself (which sun4u doesn't support).
Reported and tested by: Nathaniel W Filardo
- Add __unused where appropriate.
MFC after: 3 days
is ordered by page index. This greatly simplifies the implementation,
since we no longer need to mark the pages with VPO_CLEANCHK to denote
the progress. It is enough to remember the current position by index
before dropping the object lock.
Remove VPO_CLEANCHK and VM_PAGER_IGNORE_CLEANCHK as unused.
Garbage-collect vm.msync_flush_flags sysctl.
Suggested and reviewed by: alc
Tested by: pho
it. This can happen in some cases when plugging in SD/SmartCard PC
Cards with empty slots. It is better to detect this bogosity, and
refuse to attach rather than panic with a division by zero (in one of
many places) down stream.
document one of the optional flags; clarify which of the flags are
optional (and which are not), and remove mention of a restriction on
the reclamation of cached pages that no longer holds since version 7.
MFC after: 1 week
large number of packets queued to a crashing process.
In a specific case you may get 2 ABORT's back (from
say two packets in flight). If the aborts happened to
be processed at the same time its possible to have
one free the association while the other is trying
to report all the outbound packets. When this occured
it could lead to a crash.
MFC after: 3 days
FreeBSD. SIFTR logs a range of statistics on active TCP connections to a log
file, providing the ability to make highly granular measurements of TCP
connection state. The tool is aimed at system administrators, developers and
researchers alike. Please take it for a spin and test it out - the man page
should have all the information required to get you going.
Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: dwmalone, gnn, rpaulo
Tested by: Many on freebsd-current@ and elsewhere over the years
MFC after: 1 month
If we save/restore the PageMask, the value set by the bootloader will
persist, and will cause problems later in TLB exception handler.
This caused a crash in AR71xx boards.
Also fixes the EntryHi mask in pte.h
Reported by: Luiz Otavio O Souza <lists.br@gmail.com>
Tested by: Luiz Otavio O Souza <lists.br@gmail.com>
Approved by: rrs (mentor)
a read-lock is being called to check the vtag-timewait cache.
Then in two cases (where a vtag is bad i.e. in the time-wait
state) the write-unlock is called NOT the read-unlock. Under
conditions where lots of associations are coming and going
this will cause the system to panic at some point.
MFC after: 3 days
the core changes but left out the shared code, lol.
Well, and a couple fixes to the core... hopefully
this will all be complete now.
Happy happy joy joy :)
What this provides is support for the 'virtual function'
interface that a FreeBSD VM may be assigned from a host
like KVM on Linux, or newer versions of Xen with such
support.
When the guest is set up with the capability, a special
limited function 82576 PCI device is present in its virtual
PCI space, so with this driver installed in the guest that
device will be detected and function nearly like the bare
metal, as it were.
The interface is only allowed a single queue in this configuration
however initial performance tests have looked very good.
Enjoy!!
Previously, the caller unlocked the page, and vm_pageout_clean()
immediately reacquired the page lock. Also, assert rather than test
that the page is neither busy nor held. Since vm_pageout_clean() is
called with the object and page locked, the page can't have changed
state since the caller verified that the page is neither busy nor
held.
one or more mappings to the bogus page must be replaced, call pmap_qenter()
just once. Previously, pmap_qenter() was called for each mapping to the
bogus page.
MFC after: 3 weeks
limit the advertised speed of an SFP+ to 1G, effectively
"forcing" link at that lower speed. It is off by default
and is enabled by sysctl dev.ix.0.force_gig=1, 0 will
set it back to the norm.
The mpt driver previously didn't report a 'maxio' size to CAM, and so the
da(4) driver limited I/O sizes to DFLTPHYS (64K) by default. The number
of scatter gather segments allowed, as reported to busdma, was
(128K / PAGE_SIZE) + 1, or 33 on architectures with 4K pages.
Change things around so that we wait until we've determined how many
segments the adapter can support before creating the busdma tag used for
buffers, so we can potentially support more S/G segments and therefore
larger I/O sizes.
Also, fix some things that were broken about the module unload path. It
still gets hung up inside CAM, though.
mpt.c: Move some busdma initialization calls in here, and call
them just after we've gotten the IOCFacts, and know how
many S/G segments this adapter can support.
mpt.h: Get rid of MPT_MAXPHYS, it is no longer used.
Add max_cam_seg_cnt, which is used to report our maximum
I/O size up to CAM.
mpt_cam.c: Use max_cam_seg_cnt to report our maximum I/O size to CAM.
Fix the locking in mpt_cam_detach().
mpt_pci.c: Pull some busdma initialization and teardown out and put
it in mpt.c. We now delay it until we know many scatter
gather segments the adapter can support, and therefore
how to setup our busdma tags.
mpt_raid.c: Make sure we wake up the right wait channel to get the
raid thread to wake up when we're trying to shut it down.
Reviewed by: gibbs, mjacob
MFC after: 2 weeks
This prevents a kernel fault by dividing with zero because the initial
rate was 0 and didn't be initialized.
Tested by: Warren Block <wblock at wonkity.com>
MFC after: 3 days
changed to RUN because ic->ic_newassoc isn't set anywhere now. In the
previous bwi_newassoc() is used to initialize AMRR rate routines.
Tested by: Warren Block <wblock at wonkity.com>
MFC after: 3 days
- Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c.
- Add tdsignal() and tdksignal() routines that mirror psignal() and
pksignal() except that they accept a thread as an argument instead of
a process. They send a signal to a specific thread rather than to an
individual process.
Reviewed by: kib
"match" processing at the end of inner loop would look ahead into the next
rule, which is incorrect. Particularly, in the case when the next rule
started with F_NOT opcode it was skipped blindly.
To fix this, exit the inner loop with the continue operator forcibly and
explicitly.
PR: kern/147798
Use C99 initializers for the struct sysent generated by MAKE_SYSENT().
C++ does not have designator-initializer facility of C99, not using this
in the header makes us friendly to C++ kernel modules, whoever wants
such schism.
Requested by: mdf
MFC after: 6 days (not really)
shm syscalls, and initial check for the number of allocated segments
in the module deinitialization code, the following might happen:
after the check for active segment, while waiting for threads to
leave some other syscall, shmget(2) is called. Then, we can end
up with the shared segment that cannot be detached since sysvshm
module is unloaded.
Prevent the leak by rechecking and disclaiming a reference to the vm
object owned by sysvshm module, that might have grown during the drain.
Tested by: pho
Reviewed by: jhb
MFC after: 1 month
syscalls. On the dynamic syscall deregistration, wait until all
threads leave the syscall code. This somewhat increases the safety
of the loadable modules unloading.
Reviewed by: jhb
Tested by: pho
MFC after: 1 month
not providing a destination address and using ktrace.
* Do not copy out kernel memory when providing sinfo for sctp_recvmsg().
Both bug where reported by Valentin Nechayev.
The first bug results in a kernel panic.
MFC after: 3 days.
It has more features than acpi_aiboost(4) and it will eventually replace
acpi_aiboost(4).
Submitted by: Constantine A. Murenin <cnst at FreeBSD.org>
Reviewed by: freebsd-acpi, imp
MFC after: 1 month
Initial support for n32 and n64 ABIs from
http://svn.freebsd.org/base/user/jmallett/octeon
Changes are:
- syscall, exception and trap support for n32/n64 ABIs
- 64-bit address space defines
- _jmp_buf for n32/n64
- casts between registers and ptr/int updated to work on n32/n64
Approved by: rrs(mentor), jmallett
the internal interrupt sources as active-high. The internal interrupt
sources are disabled when programmed as active-low.
Note that the internal interrupts have no sense bit like the external
interrupts. We program them as edge-triggered to make sure we write a
0 value to a reserved register. It does not in any way say anything
about the sense of internal interrupt.
CPUs by default, and provide a functional version of BUS_BIND_INTR().
While here, fix some potential concurrency problems in the interrupt
handling code.
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.
The DF clearing for signal handlers was done some time ago.
MFC after: 1 week
PTE flag cleanup from http://svn.freebsd.org/base/user/jmallett/octeon
- Rename PTE_xx flags to match their MIPS names
- Use the new pte_set/test/clear macros uniformly, instead of a mixture
of mips_pg_xxx(), pmap_pte_x() macros and direct access.
- Remove unused macros and defines from pte.h and pmap.c
Discussed on freebsd-mips@
Approved by: rrs(mentor), jmallett
wrong side into account.
* sctp_findassociation_ep_addr() must check the local address if available.
This fixes a bug where ABORT chunks were accepted even in the case where
the local was not owned by the endpoint.
Thanks to brucec for pointing out a bug in my first version of the fix.
MFC after: 3 days
get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also,
note that usercontext is not initialized anymore in fpstate_drop().
Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.
Noted by: bde
removal, MFi386 r209198:
Use critical sections instead of disabling local interrupts to ensure
the consistency between PCPU fpcurthread and the state of FPU.
Reviewed by: bde
Tested by: pho
believed that all 486-class CPUs FreeBSD is capable to run on, either
have no FPU and cannot use external coprocessor, or have FPU on the
package and can use #MF.
Reviewed by: bde
Tested by: pho (previous version)
FPU registers for copying. Remove the switch table and jumps from
bcopy/bzero/... to the actual implementation.
As a side-effect, i486-optimized bzero is removed.
Reviewed by: bde
Tested by: pho (previous version)
properly short terminate their transfers. This fixes a problem where input
appears several seconds late.
Reported by: Alexander Yerenkow
Submitted by: Hans Petter Selasky
HPET to steal IRQ0 from i8254 and IRQ8 from RTC timers. It can be suitable
for HPETs without FSB interrupts support, as it gives them two unshared
IRQs. It allows them to provide one per-CPU event timer on dual-CPU system,
that should be suitable for further tickless kernels.
To enable it, such lines may be added to /boot/loader.conf:
hint.atrtc.0.clock=0
hint.attimer.0.clock=0
hint.hpet.0.legacy_route=1
vm_pageout_clean(). When iterating over a range of pages, these functions
can be cheaper than vm_page_lookup() because their implementation takes
advantage of the vm_object's memq being ordered.
Reviewed by: kib@
MFC after: 3 weeks
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.
Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.
For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.
This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.
Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
IBAT entry in early boot in order to prevent possible faults from races
between the instruction cache and the MMU.
PR: powerpc/148003
MFC after: 3 days
the kernel compiled with QUOTA option. ufs_accessx() upgrades the vdp
vnode lock from shared to exclusive to assign the dquot structure to
the vnode, and ufs_delete_denied() is called when tvp is locked. Since
upgrade drops shared lock when non-blocked upgrade failed, LOR is there.
Reported and tested by: Dmitry Pryanishnikov <lynx.ripe gmail com>
Tested by: pho
PR: kern/147890
MFC after: 1 week
core lower then set on other cores. Do not try to test P-states on attach
on SMP systems. It is hopeless now and will just pollute verbose logs.
If needed, check still can be forced via loader tunable.
* Add some per-device sysctl entries which record the watchdog state -
whether it is armed; whether the last reboot was due to the watchdog.
* Add a per-device sysctl debug flag to enable logging watchdog arming/
disarming.
Reviewed by: gonzo@
measured interval as upper bound. It should be more precise then just
assuming hz/2. For idle CPU it should be quite precise, for busy - not
worse then before.
external variables named "i". The "_" prefix is reserved for infrastructure
type code and is therefore not expected to be used by normal code likely to
call DPCPU_SUM(). [1]
- Change DPCPU_SUM to return the sum rather than calculate and assign it
internally. Usage is now: "sum = DPCPU_SUM(dpcpu_var, member_to_sum);" [2]
- Fix some style nits. [3]
Sponsored by: FreeBSD Foundation
Suggested by: bde [3], mdf [1], kib [1,2], pjd [1,3]
Reviewed by: kib
MFC after: 1 week (instead of r209119)
Although the Keywest registers have only 1 byte of content, they are
secretly 4-byte registers, which became apparent from them moving on the
big-endian Uninorth version of the controller.
On Apple systems at least, all the level interrupts are wired active low.
Before this change, our PIC programming only worked because Apple hardware
ignores the interrupt polarity bit on all interrupts except IRQ 0.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.
Add counters for timer-related IPIs.
Reviewed by: jhb@ (previous version)
New code that creates character devices shouldn't use device unit
numbers, but only si_drv[12] to hold pointer to per-device data. Make
this function more future proof by removing the unit number argument.
Discussed with: kib