header will make the data go over the 64k limits announced to busdma as
maxsize and the transaction will fail.
With TSO this can result in a TCP regression due to the lost packet.
According to the data sheets ixgbe(4) 82598 and 82599 can handle up to
256k so increase the maximum.
Reported by: Jon Kåre Hellan, UNINETT (jon.kare.hellan uninett.no)
Tested by: Jon Kåre Hellan, UNINETT (jon.kare.hellan uninett.no)
MFC after: 1 week
to the process id. It follows the ptrace(2) interface and allows debugging
libraries to use thread ids directly, without slow and verbose conversion
of thread id into pid.
The PGET_NOTID flag is provided to allow a specific sysctl to disallow
this behaviour. All current callers of pget(9) have useful semantic to
operate on tid and do not need this flag.
Reviewed by: jhb, trocini
MFC after: 1 week
not provide general barriers, but only barriers in the context of the
atomic sequences here. As such, make them private and keep the global
*mb() routines using a variant of sync.
isync to implement read and write barriers, following Appendix B.2 of
Book II of the architecture manual. This provides a 25% speed increase
to fork() on the PowerPC G4.
of sync (lwsync is an alternate encoding of sync on systems that do not
support it, providing graceful fallback). This provides more than an order
of magnitude reduction in the time required to acquire or release a mutex.
MFC after: 2 months
sync performs a strict superset of the functions of eieio, so using both
is redundant. While here, expand bus barriers to all bus_space operations,
since many drivers do not correctly use bus_space_barrier().
In principle, we can also replace sync just with eieio, for a significant
performance increase, but it remains to be seen whether any poorly-written
drivers currently depend on the side effects of sync to properly function.
MFC after: 1 week
the page. This PMAP requires an additional lock besides the PMAP lock
in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release.
Suggested by: kib
MFC after: 4 days
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.
Reported by: Pawel Tyll <ptyll@nitronet.pl>
Approved by: kib(mentor)
MFC after: 1 week
as otherwise the interrupt handling code may modify data in the non-DMA
part of the cache line while we have it stashed away in the temporary
stack buffer, then we end up restoring a stale value.
PR: 160431
Submitted by: Ian Lepore
MFC after: 1 week
cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent
call to vm_map_wire() to wire the whole stack region fails due to
VM_MAP_WIRE_NOHOLES flag.
Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack.
Reported and tested by: Sushanth Rai <sushanth_rai yahoo com>
Reviewed by: alc
MFC after: 1 week
sys/dev/dpt/dpt_scsi.c:612:18: error: implicit truncation from 'int' to bitfield changes value from -2 to 2 [-Werror,-Wconstant-conversion]
dpt->cache_type = DPT_CACHE_WRITEBACK;
^ ~~~~~~~~~~~~~~~~~~~
by defining DPT_CACHE_WRITEBACK as 2, since dpt_softc::cache_type is an
unsigned bitfield. No binary change.
MFC after: 1 week
The default priority is now '1000' rather than '0'. This may cause some
unforseen regressions.
Submitted by: Stefan Bethke <stb@lassitu.de>
Reviewed by: imp
- When switching to 4-bit operation, send a SET_CLR_CARD_DETECT command
to disconnect the card-detect pull-up resistor from the DAT3 line before
sending the SET_BUS_WIDTH command.
- Add the missing "reserved" zero entry to the mantissa table used to
decode various CSD fields. This was causing SD cards to report that they
could run at 30 MHz instead of the maximum 25 MHz mandated in the spec.
o Enhancements:
- At the MMC layer, format various info from the CID into a string that
uniquely identifies the card instance (manufacturer number, serial
number, product name and revision, etc). Export it as an instance
variable.
- At the MMCSD layer, display the formatted card ID string, and also
report the clock speed of the hardware (not the card's max speed), and
the number of bits and number of blocks per transfer. It comes out like
this now:
mmcsd0: 968MB <SD SD01G 8.0 SN 276886905 MFG 08/2008 by 3 SD> at mmc0
22.5MHz/4bit/128-block
o Use DEVMETHOD_END.
o Use NULL instead of 0 for pointers.
PR: 156496
Submitted by: Ian Lepore
MFC after: 1 week
sys/contrib/rdma/rdma_cma.c:1259:8: error: case value not in enumerated type 'enum iw_cm_event_status' [-Werror,-Wswitch]
case ECONNRESET:
^
@/sys/errno.h:118:20: note: expanded from macro 'ECONNRESET'
#define ECONNRESET 54 /* Connection reset by peer */
^
sys/contrib/rdma/rdma_cma.c:1263:8: error: case value not in enumerated type 'enum iw_cm_event_status' [-Werror,-Wswitch]
case ETIMEDOUT:
^
@/sys/errno.h:124:19: note: expanded from macro 'ETIMEDOUT'
#define ETIMEDOUT 60 /* Operation timed out */
^
sys/contrib/rdma/rdma_cma.c:1260:8: error: case value not in enumerated type 'enum iw_cm_event_status' [-Werror,-Wswitch]
case ECONNREFUSED:
^
@/sys/errno.h:125:22: note: expanded from macro 'ECONNREFUSED'
#define ECONNREFUSED 61 /* Connection refused */
^
This is because the switch uses iw_cm_event::status, which is an enum
iw_cm_event_status, while ECONNRESET, ETIMEDOUT and ECONNREFUSED are
just plain defines from errno.h.
It looks like there is only one use of any of the enumeration values of
iw_cm_event_status, in:
sys/contrib/rdma/rdma_iwcm.c: if (iw_event->status == IW_CM_EVENT_STATUS_ACCEPTED) {
So messing around with the enum definitions to fix the warning seems too
disruptive; the simplest fix is to cast the argument of the switch to
int.
Reviewed by: kmacy
MFC after: 1 week
sys/dev/nxge/if_nxge.c:1276:11: error: case value not in enumerated type 'xge_hal_event_e' (aka 'enum xge_hal_event_e') [-Werror,-Wswitch]
case XGE_LL_EVENT_TRY_XMIT_AGAIN:
^
sys/dev/nxge/if_nxge.c:1289:11: error: case value not in enumerated type 'xge_hal_event_e' (aka 'enum xge_hal_event_e') [-Werror,-Wswitch]
case XGE_LL_EVENT_DEVICE_RESETTING:
^
This is because the switch uses xge_queue_item_t::event_type, which is
an enum xge_hal_event_e, while the XGE_LL_EVENT_xx values are of the
enum xge_event_e.
Since messing around with the enum definitions is too disruptive, the
simplest fix is to cast the argument of the switch to int.
Reviewed by: gnn
MFC after: 1 week
STAILQ(). While here, fix another clang warning about a switch which
tests an enum type for a regular integer value.
Submitted by: jhb
MFC after: 1 week
assumes for small buffers (< 64k) that the OS driver is actually using
a buffer rounded up to the next power of 2. It also assumes that the
buffer is at least 4k in size. Furthermore, there is at least one
known instance of megarc sending a request with a 12k buffer where the
firmware writes out a 24k-ish reply.
To workaround the data corruption triggered by this "feature", ensure
that buffers for user commands use a minimum size of 32k, and that
buffers between 32k and 64k use a 64k buffer.
PR: kern/155658
Submitted by: Andreas Longwitz longwitz incore de
Reviewed by: scottl
MFC after: 1 week
code that is used to construct a loader (e.g. libstand, ficl, etc).
There is such a thing as a 64-bit EFI application, but it's not
as standard as 32-bit is. Let's make the 32-bit functional (as in
we can load and actualy boot a kernel) before solving the 64-bit
loader problem.
ar724x_pci.c.
* Move out the code which populates the firmware into ar71xx_fixup.c
* Shuffle around the ar724x fixup code to match what the ar71xx fixup
code does.
I've validated this on an AR7240 with AR9285 on-board NIC. It doesn't
yet load, as the AR9285 EEPROM code needs to be made "flash aware."
TODO:
* Validate that I haven't broken AR71xx
* Test AR9285/AR9287 onboard NICs, complete with EEPROM code changes
* Port over the needed BAR hacks for AR7240, AR7241 and AR7242 from
Linux OpenWRT. The current WAR has only been tested on the AR7240
and I'm not sure the way the BAR register is treated is "right".
The "fixup" method here is right when setting the BAR for local access -
ie, the BAR address is either 0xffff (AR7240) or 0x1000ffff (AR7241/AR7242),
but the ath9k-fixup.c code (Linux OpenWRT) does this when setting the
initial "fixup" BAR. It then restores the original BAR.
I'll have to read the ar724x PCI bus glue to see what other special cases
await.
over just the active vnodes associated with a mount point to replace
MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync
routines.
The vfs_msync routine is run every 30 seconds for every writably
mounted filesystem. It ensures that any files mmap'ed from the
filesystem with modified pages have those pages queued to be
written back to the file from which they are mapped.
The ffs_lazy_sync and qsync routines are run every 30 seconds for
every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine
ensures that any files that have been accessed in the previous
30 seconds have had their access times queued for updating in the
filesystem. The qsync routine ensures that any files with modified
quotas have those quotas queued to be written back to their
associated quota file.
In a system configured with 250,000 vnodes, less than 1000 are
typically active at any point in time. Prior to this change all
250,000 vnodes would be locked and inspected twice every minute
by the syncer. For UFS/FFS filesystems they would be locked and
inspected six times every minute (twice by each of these three
routines since each of these routines does its own pass over the
vnodes associated with a mount point). With this change the syncer
now locks and inspects only the tiny set of vnodes that are active.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.
To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).
This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
actually in it. This happens when SCTP receives an unknown chunk, which
requires the sending of an ERROR chunk, and there is no final padding but
the chunk is not 4-byte aligned.
Reported by yueting via rwatson@
MFC after: 3 days
at least until I can root cause what's going on.
The only platform I've seen this on is the AR9220 when attached to
the AR71xx CPUs. I get immediate PCIe bus errors and all subsequent
accesses cause further MIPS bus exceptions. I don't have any other
big-endian platforms to test this on.
If I get a chance (or two), I'll try to whack this on a bus analyser
and see exactly what happens.
I'd rather leave this on, especially for slower, embedded platforms.
But the #ifdef hell is something I'm trying to avoid.
- Implement "configure" command to allow switching operation mode of
running device on-fly without destroying and recreation.
- Implement Active/Read mode as hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads same time,
but unlike Active/Active only one path handles write requests at any
point in time. It allows to closer follow original write request order
if above layers need it for data consistency (not waiting for requisite
write completion before sending dependent write).
- Hide duplicate messages about device status change.
- Remove periodic thread wake up with 10Hz rate.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
used only as a helper function in that file. Replace sole call to
vbusy() with inline code in vholdl(). Replace sole calls to vfree()
and vdestroy() with inline code in vdropl().
The Clang compiler already inlines these functions, so they do not
show up in a kernel backtrace which is confusing. Also you cannot
set their frame in kgdb which means that it is impossible to view
their local variables. So, while the produced code is unchanged,
the debugging should be easier.
Discussed with: kib
MFC after: 2 weeks
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).
To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.
The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
allow the owner to read and write ACL and file attributes when there
was no entry with subject matching the owner. In other words,
'getfacl meh' shouldn't fail for the owner if the ACL looks like this:
# file: meh
# owner: trasz
# group: wheel
user:root:------a-------:------:allow
Reported by: kientzle
like the one triggered by this:
# kldload geom_vinum
# pwait `pgrep -S gv_worker` &
# kldunload geom_vinum
or this:
GEOM_JOURNAL: Shutting down geom gjournal 3464572051.
panic: destroying non-empty racct: 1 allocated for resource 6
which were tracked by jh@ to be caused by checking p->p_flag,
while it wasn't initialised yet. Basically, during fork, the code
checked p_flag, concluded the process isn't marked as P_SYSTEM,
incremented the counter, and later on, when exiting, checked that
the process was marked as P_SYSTEM, and thus didn't decrement it.
Also, I believe there wasn't any good reason for checking P_SYSTEM
in the first place.
Tested by: jh
even in the face of errors.
If the RX descriptor list fails, the RX lock won't be initialised, but
then the DMA free path wil try freeing it.
This commit is brought to you by a working mwl(4).
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c. There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *. Probably
this was the source of MI/MD confusion.
Reviewed by: emulation
push the address onto stack as we do for INTn emulation. This avoids stack
underflow when we encounter RETF instruction in VM86 mode. Lack of this
exit point actually caused page fault in VM86 mode with VESA module when we
resume from suspend state[1].
- Remove unnecessary CLI and STI instructions from BIOS interrupt emulation.
INTn and IRET must be able to emulate the flag correctly.
Reported by: gavin [1]
Tested by: gavin (early revision)
MFC after: 3 days
last show-stopper keeping PREEMPTION from being usable on sparc64 should
have been dealt with in r230662.
At least on 2-way systems, PREEMPTION causes a little bit of a degradation
in worldstone performance. However, FreeBSD seems to have started building
up regressions in !PREEMPTION cases so sparc64 better should not be an
oddball in this regard.
MFC after: 1 week
Since r230208 update mounts were allowed if the list of mount options
contained the "export" option. This is not correct as tmpfs doesn't
really support updating all options.
Reviewed by: kevlo, trociny
proposed MTU value from it and update the TCP host cache. Then
tcp_mss_update() is called on the corresponding tcpcb. It finds the
just allocated entry in the TCP host cache and updates MSS on the
tcpcb. And then we do a fast retransmit of what we have in the tcp
send buffer.
This sequence gets broken if the TCP host cache is exausted. In this
case allocation fails, and later called tcp_mss_update() finds nothing
in cache. The fast retransmit is done with not reduced MSS and is
immidiately replied by remote host with new ICMP datagrams and the
cycle repeats. This ping-pong can go up to wirespeed.
To fix this:
- tcp_mss_update() gets new parameter - mtuoffer, that is like
offer, but needs to have min_protoh subtracted.
- tcp_mtudisc() as notification method renamed to tcp_mtudisc_notify().
- tcp_mtudisc() now accepts not a useless error argument, but proposed
MTU value, that is passed to tcp_mss_update() as mtuoffer.
Reported by: az
Reported by: Andrey Zonov <andrey zonov.org>
Reviewed by: andre (previous version of patch)
Before r228267 the option was honored but the original content of
boot.config was not preserved. I tried to fix that but missed the idea.
Now the proper way of doing things is taken from i386/boo2.
Also, a comment is added to explain this a little bit unobvious
behavior.
Inspired by: jhb
MFC after: 5 days
* arge0 doesn't (yet) work via the switch PHY ports; I'm not sure why.
* arge1 maps to the WAN port. That works.
TODO:
* The PLL register needs a different (non-default) value for Gigabit
Ethernet. The board setup code needs to be extended a bit to allow
for non-default pll_1000 values - right now, those values come out
of hard-coded values in the per-chip set_pll_ge() routines.
Obtained from: Linux / OpenWRT
This may result in a bit of a throughput drop. However, any throughput
drop at this point should be investigated and root caused, as it's likely
because TX scheduling (all the way down to how preemption, scheduler work,
etc) is happening in a sub-optimal fashion.
This also makes it much more likely to be reloadable on a live machine.
Allocating 5120 TX ath_buf entries via contigmalloc is very unlikely
after a few hours of using X/Chromium.
dirty and murky past.
* Override the default cache line size to be something reasonable if
it's set to 0. Some NICs initialise with '0' (eg embedded ones)
and there are comments in the driver stating that various OSes (eg
older Linux ones) would incorrectly program things and 0 out this
register.
* Just default to overriding the latency timer. Every other driver
does this.
* Use a default cache line size of 32 bytes. It should be "reasonable
enough".
Obtained from: Linux ath9k, Atheros
a8af6270bd96be6ccd86f70b60fa6512b710e4f0
virtio_blk: Include function name in panic string
cbdb03a694b76c5253d7ae3a59b9995b9afbb67a
virtio_balloon: Do the notify outside of the lock
By the time we return from virtqueue_notify(), the descriptor
will be in the used ring so we shouldn't have to sleep.
10ba392e60692529a5cbc1e9987e4064e0128447
virtio: Use DEVMETHOD_END
80cbcc4d6552cac758be67f0c99c36f23ce62110
virtqueue: Add support for VIRTIO_F_RING_EVENT_IDX
This can be used to reduce the number of guest/host and
host/guest interrupts by delaying the interrupt until a
certain index value is reached.
Actual use by the network driver will come along later.
8fc465969acc0c58477153e4c3530390db436c02
virtqueue: Simplify virtqueue_nused()
Since the values just wrap naturally at UINT16_MAX, we
can just subtract the two values directly, rather than
doing 2's complement math.
a8aa22f25959e2767d006cd621b69050e7ffb0ae
virtio_blk: Remove debugging crud from 75dd732a
There seems to be an issue with Qemu (or FreeBSD VirtIO) that sets
the PCI register space for the device config to bogus values. This
only seems to happen after unloading and reloading the module.
d404800661cb2a9769c033f8a50b2133934501aa
virtio_blk: Use better variable name
75dd732a97743d96e7c63f7ced3c2169696dadd3
virtio_blk: Partially revert 92ba40e65
Just use the virtqueue to determine if any requests are
still inflight.
06661ed66b7a9efaea240f99f414c368f1bbcdc7
virtio_blk: error if allowed too few segments
Should never happen unless the host provides use with a
bogus seg_max value.
4b33e5085bc87a818433d7e664a0a2c8f56a1a89
virtio_blk: Sort function declarations
426b9f5cac892c9c64cc7631966461514f7e08c6
virtio_blk: Cleanup whitespace
617c23e12c61e3c2233d942db713c6b8ff0bd112
virtio_blk: Call disk_err() on error'd completed requests
081a5712d4b2e0abf273be4d26affcf3870263a9
virtio_blk: ASSERT the ready and inflight request queues are empty
a9be2631a4f770a84145c18ee03a3f103bed4ca8
virtio_blk: Simplify check for too many segments
At the cost of a small style violation.
e00ec09da014f2e60cc75542d0ab78898672d521
virtio_blk: Add beginnings of suspend/resume
Still not sure if we need to virtio_stop()/virtio_reinit()
the device before/after a suspend.
Don't start additional IO when marked as suspending.
47c71dc6ce8c238aa59ce8afd4bda5aa294bc884
virtio_blk: Panic when dealt an unhandled BIO cmd
1055544f90fb8c0cc6a2395f5b6104039606aafe
virtio_blk: Add VQ enqueue/dequeue wrappers
Wrapper functions managed the added/removing to the in-flight
list of requests.
Normally biodone() any completed IO when draining the virtqueue.
92ba40e65b3bb5e4acb9300ece711f1ea8f3f7f4
virtio_blk: Add in-flight list of requests
74f6d260e075443544522c0833dc2712dd93f49b
virtio_blk: Rename VTBLK_FLAG_DETACHING to VTBLK_FLAG_DETACH
7aa549050f6fc6551c09c6362ed6b2a0728956ef
virtio_blk: Finish all BIOs through vtblk_finish_bio()
Also properly set bio_resid in the case of errors. Most geom_disk
providers seem to do the same.
9eef6d0e6f7e5dd362f71ba097f2e2e4c3744882
Added function to translate VirtIO status to error code
ef06adc337f31e1129d6d5f26de6d8d1be27bcd2
Reset dumping flag when given unexpected parameters
393b3e390c644193a2e392220dcc6a6c50b212d9
Added missing VTBLK_LOCK() in dump handler
Obtained from: Bryan Venteicher bryanv at daemoninthecloset dot org
r233961:
Fix interrupt load balancing regression, introduced in revision
222813, that left all un-pinned interrupts assigned to CPU 0.
In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized
the "intr_cpus" cpuset to only contain CPU0.
This initialization is too late and nullifies the results of calls
to the intr_add_cpu() that occur much earlier in the boot process.
r234074 (partial):
The BSP is not added to the mask of valid target CPUs for interrupts.
Fix this by adding the BSP as an interrupt target directly in
r234105:
Fix !SMP build after r234074.
MFC after: 3 days
Contrarily to what i wrote in my previous commit, the 82599
does include the CRC in the length. The operating mode is
reset in ixgbe_init_locked() and so we need to hook into
the places where the two registers (HLREG0 and RDRXCTL) are
modified.
This uses the new firmware(9) method for squirreling away the EEPROM
contents from SPI flash so ath(4) can get to them later.
It won't work out of the box just yet - you have to add this to
if_ath_pci.c:
#define ATH_EEPROM_FIRMWARE
.. until I've added it as a configuration option and updated things.
interface.
* Introduce a device hint, 'eeprom_firmware', which is the name of firmware
to lookup.
* If the lookup succeeds, take a copy of it and use it as the eeprom data.
This isn't enabled by default - you have to define ATH_EEPROM_FIRMWARE.
I'll add it to the configuration variables in a later commit.
TODO:
* just keep a firmware reference in ath_softc, and remove the need to
waste the extra memory in having sc_eepromdata be a malloc()ed block.
future use by the ath(4) driver.
These embedded devices put the calibration/PCI bootstrap data on the
on board SPI flash rather than on an EEPROM connected to the NIC.
For some boards, there's two NICs and two sets of EEPROM data in the
main SPI flash.
The particulars:
* Introduce ath_fixup_size, which is the size of the EEPROM area in
bytes.
* Create a firmware image with a name based on the PCI device identifier
(bus/slot/device/function).
* Hide some verbose debugging behind 'bootverbose'.
ath(4) can then use this to load in the EEPROM data.
This requires AR71XX_ATH_EEPROM to be defined.
* the openwrt code doesn't treat 0/0/0 any differently
from other bus/slot/func combinations.
* A "local write" function writes to the LCONF area, and
so I've added it.
* The PCI workaround at attach time uses this LCONF code,
which it already did ..
* .. but it is a 4 byte write, not a 2 byte write.
Even though it's PCIR_COMMAND which is a two byte PCI register.
Tested on: AR7161
TODO: The other two AR71xx derivatives
TODO: More thoroughly stare at the datasheets I do have
and if it indeed is incorrect, push fixes to both
FreeBSD and Linux/OpenWRT.
Obtained from: Linux OpenWRT