This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386, and a large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.
1. fix the nointr check in intsmb_start; this matters only if ENABLE_ALART is
defined (by default, it is not);
2. drop unnecessary inspection/reporting of the power-management I/O registers
base address;
3. in verbose mode, report errors from the SMBus host controller and their
mapping to smbus(4) errors.
Approved by: jhb (mentor)
implement CD input in hardware, while unconditionally showing it confuses users.
Also, it was done in a way that is sometimes improper with the present driver.
Add patch for ALC268-based Acer TM5320 to make headphone jack sensing work.
The default configuration defines two separate playback associations, which
the current driver is unable to trace properly due to the order in which they
are defined and limited codec uniformity.
Submitted by: G. Mirov <g.mirov AT gmail.com>
the "nbufkv" sleep.
First, an ffs background cg block write requests a new buffer for
the shadow copy. When ffs_bufwrite() is called from the bufdaemon due
to a buffer shortage, requesting the buffer deadlocks the bufdaemon.
Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request getblk
to not block while allocating the buffer, and return failure
instead. Add a flags argument to geteblk() to allow passing the flags
to getblk(). Do not repeat the getnewbuf() call from geteblk() if buffer
allocation failed and either GB_NOWAIT_BD is specified, or geteblk()
is called from bufdaemon (or its helper, see below). In
ffs_bufwrite(), fall back to synchronous cg block write if shadow
block allocation failed.
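A minimal user-space sketch of the fallback described above. The pool
structure, the function names and DEMO_NOWAIT (a stand-in for GB_NOWAIT_BD)
are all hypothetical and are not the kernel's code; the sketch only shows a
non-blocking allocation that reports failure and a caller that then falls
back to a synchronous write.

```c
#define DEMO_NOWAIT	0x1	/* hypothetical stand-in for GB_NOWAIT_BD */

struct demo_pool {
	int	free_bufs;	/* buffers currently available */
	int	async_writes;	/* writes done through a shadow copy */
	int	sync_writes;	/* synchronous fallback writes */
};

/* Return 1 if a buffer was obtained, 0 on failure. */
static int
demo_getbuf(struct demo_pool *p, int flags)
{
	if (p->free_bufs > 0) {
		p->free_bufs--;
		return (1);
	}
	if ((flags & DEMO_NOWAIT) != 0)
		return (0);	/* report failure instead of sleeping */
	/* A real allocator would sleep here until a buffer is freed. */
	return (0);
}

/* Write a cg block: use a shadow copy if possible, else write synchronously. */
static void
demo_cg_write(struct demo_pool *p)
{
	if (demo_getbuf(p, DEMO_NOWAIT))
		p->async_writes++;
	else
		p->sync_writes++;
}
```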
Since r107847, buffer write assumes that the vnode owning the buffer is
locked. The second problem is that the buffer cache may accumulate many
buffers belonging to a limited number of vnodes. With such a workload,
quite often threads that own the mentioned vnode locks are trying to
read another block from the vnodes, and, due to buffer cache
exhaustion, are asking bufdaemon for help. Bufdaemon is unable to make
any substantial progress because the vnodes are locked.
Allow the threads owning vnode locks to help the bufdaemon by doing
the flush pass over the buffer cache before getnewbuf() goes into
uninterruptible sleep. Move the flushing code from buf_daemon() to a new
helper function, buf_do_flush(), which is called from getnewbuf(). The
number of buffers flushed by a single call to buf_do_flush() from
getnewbuf() is limited by the new sysctl vfs.flushbufqtarget. Prevent
recursive calls to buf_do_flush() by marking the bufdaemon and threads
that temporarily help bufdaemon with the TDP_BUFNEED flag.
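The recursion guard can be illustrated with a small user-space sketch. The
structure, the function names and DEMO_TDP_BUFNEED (a stand-in for
TDP_BUFNEED) are hypothetical; the only point shown is that a per-thread flag
set around the flush pass keeps a nested allocation from starting another
flush.

```c
#define DEMO_TDP_BUFNEED	0x1	/* hypothetical stand-in for TDP_BUFNEED */

struct demo_thread {
	int	pflags;		/* per-thread private flags */
	int	flushes;	/* number of flush passes started */
};

static int	demo_getnewbuf(struct demo_thread *td, int avail);

/* Flush pass; it may itself need a buffer, hence the guard flag. */
static void
demo_do_flush(struct demo_thread *td)
{
	td->pflags |= DEMO_TDP_BUFNEED;
	td->flushes++;
	(void)demo_getnewbuf(td, 0);	/* nested request must not re-enter */
	td->pflags &= ~DEMO_TDP_BUFNEED;
}

/* Return 1 if a buffer is available, 0 otherwise. */
static int
demo_getnewbuf(struct demo_thread *td, int avail)
{
	if (avail > 0)
		return (1);
	if ((td->pflags & DEMO_TDP_BUFNEED) == 0)
		demo_do_flush(td);	/* help out before giving up */
	return (0);
}
```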
In collaboration with: pho
Reviewed by: tegge (previous version)
Tested by: glebius, yandex ...
MFC after: 3 weeks
LO_CSUM_FEATURES - a bitmask of supported transmit offload features, which
will be stored in if_hwassist if IFCAP_TXCSUM is enabled, and be cleared
from mbuf packet header csum flags on transmit. (1)
LO_CSUM_SET - a bitmask of supported receive offload features, which will
be set on the mbuf packet header csum flags on transmit if IFCAP_RXCSUM
is enabled.
While here, fix SCTP offload for loopback: offer generation on the
transmit side, don't just skip validation on the receive side.
Obtained from: DragonflyBSD (1)
MFC after: 1 week
have its MTU set higher than 1500 (ETHERMTU). Its limit is now
65535, as enforced by ifhwioctl() in if.c.
This allows a tap(4) device to be added to a bridge, which requires all
interface members to have the same MTU, with an interface configured for
jumbo frames. QEMU may now connect to a network via tap(4) without
requiring the real interface to have its MTU set to 1500 or lower.
Reviewed by: rpaulo, bms
MFC after: 1 week
avoidance:
- Enable setting the RXCSUM and TXCSUM flags for loopback interfaces;
set both by default.
- When RXCSUM is set, flag packets sent over the loopback interface as
having checked and valid IP, UDP, TCP checksums so that higher
protocol layers won't check them.
- Always clear CSUM_{IP,UDP,TCP} checksum required flags on transmit,
as they will have gotten there as a result of TXCSUM being set.
This is done only for packets explicitly sent over the loopback, not
simulated loopback via if_simloop() due to !SIMPLEX interfaces, etc.
Note that enabling TXCSUM but not RXCSUM will lead to unhappiness, as
checksums won't be generated but will be validated.
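A rough sketch of the flag handling described above. The DEMO_ constants are
hypothetical and their values are arbitrary; they only mirror the idea of
"checksum required" flags on transmit and "checked and valid" flags on
receive, not the real mbuf definitions.

```c
/* Hypothetical "checksum required" flags requested by the sender. */
#define DEMO_CSUM_IP		0x01
#define DEMO_CSUM_TCP		0x02
#define DEMO_CSUM_UDP		0x04
/* Hypothetical "already checked and valid" flags seen by the receiver. */
#define DEMO_CSUM_IP_CHECKED	0x10
#define DEMO_CSUM_IP_VALID	0x20
#define DEMO_CSUM_DATA_VALID	0x40

#define DEMO_TX_REQUIRED	(DEMO_CSUM_IP | DEMO_CSUM_TCP | DEMO_CSUM_UDP)
#define DEMO_RX_SET		(DEMO_CSUM_IP_CHECKED | DEMO_CSUM_IP_VALID | \
				 DEMO_CSUM_DATA_VALID)

/* Compute the packet's checksum flags as it crosses the loopback. */
static int
demo_lo_csum_flags(int csum_flags, int rxcsum_enabled)
{
	if (rxcsum_enabled)
		csum_flags |= DEMO_RX_SET;	/* upper layers skip checks */
	csum_flags &= ~DEMO_TX_REQUIRED;	/* nothing left to compute */
	return (csum_flags);
}
```

With RXCSUM disabled the required flags are still cleared, which is the
"unhappiness" case noted above: no checksum is generated, yet the receive
side will validate it.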
Kris reports that this leads to significant performance improvements
in loopback benchmarking with TCP and UDP for throughput:
        RXCSUM    RXCSUM+TXCSUM
TCP     15%       37%
UDP     10%       74%
Update man page.
Reviewed by: sam
Tested by: kris
MFC after: 1 week
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.
Disable the build of device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:
if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd
Drivers that were already disabled because of tty changes:
if_ppp
if_sl
Discussed on: arch@
certain flags that should have been in inp_flags ended up in inp_vflag,
meaning that they were inconsistently locked, and in one case,
interpreted. Move the following flags from inp_vflag to gaps in the
inp_flags space (and clean up the inp_flags constants to make gaps
more obvious to future takers):
INP_TIMEWAIT
INP_SOCKREF
INP_ONESBCAST
INP_DROPPED
Some aspects of this change have no effect on kernel ABI at all, as these
are UDP/TCP/IP-internal uses; however, netstat and sockstat detect
INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this
into account.
MFC after: 1 week (or after dependencies are MFC'd)
Reviewed by: bz
guarantee that all cpus have acknowledged the cleared enable int by
scheduling the resetting thread on each cpu in succession. Since all
lock profiling happens within a critical section this guarantees that
all cpus have left lock profiling before we clear the data structures.
- Assert that the per-thread queue of locks lock profiling is aware of
is clear on thread exit. There were several cases where this was not
true, which slowed lock profiling and leaked information.
- Remove all objects from all lists before clearing any per-cpu
information in reset. Lock profiling objects can migrate between
per-cpu caches and previously these migrated objects could be zero'd
before they'd been removed.
Discussed with: attilio
Sponsored by: Nokia
C-NET(PC) has a cfe at location 1 that has both an odd irq mask (it
matches pc98 machines, so maybe it was a flag for pc98 operation) as
well as a memory map. Since this driver doesn't know how to cope, we
start with cfe2, which is purely I/O space mapped, and that seems to
make it work. I say 'seems' here, because the card I have doesn't
seem to have the right dongle for full testing...
when someone is passing new rules, not when they only want to read them.
Because of this bug, even if the given rules were incorrect, they
ended up in rule_string.
- Add missing protection for rule_string when copying it.
Reviewed by: rwatson
MFC after: 1 week
pthread_sigmask() to signal.h. In principle, this shouldn't break anything,
since they're already in signal.h on other systems, and the FreeBSD
manpage says that both pthread.h and signal.h need to be included to
get these functions.
Add a hack to declare pthread_t in the P1003.1-2008 namespace
in signal.h.
in the POSIX namespace, and hiding eaccess() and setproctitle().
Also move mknodat() from unistd.h to sys/stat.h where it belongs.
The *at() syscalls are only in CURRENT, so this shouldn't cause
problems.
improve performance:
- Eliminate custom reference count and condition variable to monitor
threads entering the framework, as this had both significant overhead
and behaved badly in the face of contention.
- Replace reference count with two locks: an rwlock and an sx lock,
which will be read-acquired by threads entering the framework
depending on whether a given policy entry point is permitted to sleep
or not.
- Replace previous mutex locking of the reference count for exclusive
access with write acquiring of both the policy list sx and rw locks,
which occurs only when policies are attached or detached.
- Do a lockless read of the dynamic policy list head before acquiring
any locks in order to reduce overhead when no dynamic policies are
loaded; this is a race we can afford to lose.
- For every policy entry point invocation, decide whether sleeping is
permitted, and if not, use a _NOSLEEP() variant of the composition
macros, which will use the rwlock instead of the sxlock. In some
cases, we decide which to use based on allocation flags passed to the
MAC Framework entry point.
As with the move to rwlocks/rmlocks in pfil, this may trigger witness
warnings, but these should (generally) be false positives as all
acquisition of the locks is for read with two very narrow exceptions
for policy load/unload, and those code blocks should never acquire
other locks.
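The entry-point decision described in the list above can be sketched as
follows. All names are hypothetical (DEMO_M_NOWAIT stands in for a
"may not sleep" allocation flag), and the sketch returns which lock would be
read-acquired rather than taking real locks.

```c
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_M_NOWAIT	0x1	/* hypothetical "caller may not sleep" flag */

enum demo_lock { DEMO_LOCK_NONE, DEMO_LOCK_RW, DEMO_LOCK_SX };

struct demo_policy {
	struct demo_policy *next;
};

/* Head of the dynamic policy list; NULL while no policy is loaded. */
static _Atomic(struct demo_policy *) demo_dynamic_head;

/* Decide which lock an entry point would read-acquire, if any. */
static enum demo_lock
demo_pick_lock(int alloc_flags)
{
	/* Lockless peek: losing this race only delays seeing a new policy. */
	if (atomic_load_explicit(&demo_dynamic_head,
	    memory_order_relaxed) == NULL)
		return (DEMO_LOCK_NONE);
	/* Non-sleepable callers use the rwlock, sleepable ones the sx lock. */
	if ((alloc_flags & DEMO_M_NOWAIT) != 0)
		return (DEMO_LOCK_RW);
	return (DEMO_LOCK_SX);
}
```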
Sponsored by: Google, Inc.
Obtained from: TrustedBSD Project
Discussed with: csjp (idea, not specific patch)
(1) Fix pcib_read/write_config prototypes.
(2) When constraining a resource request for a 'subtractive' bridge,
it is important to select a range outside the base/limit
registers, since those are the only values known to not
possibly work. On my HP laptop, the base bridge excludes I/O
ports 0xa000-0xafff, however that was the range we were passing
up the tree. Instead, when a range spans the "hole" we now
arbitrarily pick the range just above the hole to allocate from.
All of my rl and xl cards, at a minimum, started working again on this
laptop with those fixes.
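A small sketch of the range adjustment in (2), under the assumption that
only the "request spans the hole" case is handled; the structure and function
names are hypothetical and this is not the bridge driver's code.

```c
#include <stdint.h>

struct demo_range {
	uint64_t start;
	uint64_t end;		/* inclusive */
};

/*
 * If the requested range spans the hole [hole_base, hole_limit], move the
 * start to just above the hole; otherwise leave the request unchanged.
 */
static struct demo_range
demo_constrain(struct demo_range req, uint64_t hole_base, uint64_t hole_limit)
{
	if (req.start < hole_base && req.end > hole_limit)
		req.start = hole_limit + 1;
	return (req);
}
```

With the laptop's hole of 0xa000-0xafff, a request for 0x0-0xffff would be
narrowed to 0xb000-0xffff.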
- When sending large PR-SCTP messages over a
lossy link we would incorrectly calculate the fwd-tsn.
- When receiving large multipart PR-SCTP packets we would
incorrectly send back a SACK that would renege improperly
on already received packets, thus causing unneeded retransmissions.
is calculated as 0 which causes errors elsewhere.
Submitted by: KOIE Hidetaka <koie@suri.co.jp>
- When sched_affinity() is called with a thread that is not curthread we
need to handle the ON_RUNQ() case by adding the thread to the correct
run queue.
Submitted by: Justin Teller <justin.teller@gmail.com>
MFC after: 1 Week
".note.ABI-tag" section.
The brand search order is changed: the ".note.ABI-tag" section is now
checked first.
Move the code that fetches osreldate for an ELF binary to the check_note()
handler.
PR: 118473
Approved by: kib (mentor)
booting because the CD driver did not use bounce buffers to ensure
request buffers sent to the BIOS were always in the first 1MB. Copy over
the bounce buffer logic from the BIOS disk driver (minus the 64k boundary
code for floppies) to fix this.
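The bounce buffer idea can be sketched in user space as below. Everything
here is hypothetical: physical addresses are passed in as plain integers, the
BIOS read is a fake that fills the buffer with a pattern, and the 64k
boundary handling is left out, as in the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LOW_LIMIT	0x100000u	/* first 1MB, reachable by the BIOS */

typedef void (*demo_bios_read_t)(void *buf, size_t len);

/* Fake BIOS read used for illustration: fills the buffer with 0xCD. */
static void
demo_fake_bios_read(void *buf, size_t len)
{
	memset(buf, 0xCD, len);
}

/* A request needs bouncing if any part of it lies above the low limit. */
static int
demo_needs_bounce(uint32_t paddr, size_t len)
{
	return ((uint64_t)paddr + len > DEMO_LOW_LIMIT);
}

/* Read into dest, going through the bounce buffer when required. */
static int
demo_cd_read(demo_bios_read_t rd, uint32_t dest_paddr, void *dest,
    size_t len, void *bounce)
{
	if (!demo_needs_bounce(dest_paddr, len)) {
		rd(dest, len);
		return (0);	/* read directly */
	}
	rd(bounce, len);	/* BIOS only ever sees the low buffer */
	memcpy(dest, bounce, len);
	return (1);		/* bounced */
}
```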
Reported by: kensmith
look like a temperature.
This driver will most likely be renamed to something more meaningful in
the near future.
Submitted by: nork
MFC after: 2 weeks
o add 9280 attach that sets up ini, cal, etc.
o new rf backend for 9280 and later parts
o split ini setup and spur mitigation support out to methods
and provide 9280-specific support
o minor fixups to shared code to handle 9280-specific work
Obtained from: Atheros (ini values and some code)
Provide a custom lock around initializing and tearing down the EA area,
to prevent both memory leaks and double-free of it. Count the number
of EA area accessors.
The lock protocol requires either holding an exclusive vnode lock to modify
i_ea_area, or holding a shared vnode lock and owning the IN_EA_LOCKED flag
in i_flag.
Noted by: YAMAMOTO, Taku <taku tackymt homeip net>
Tested by: pho (previous version)
MFC after: 2 weeks
both disks, or if we should suppress the slave drive. Default to
suppressing the slave, in the case that this REQUIRED tuple turns out
to not actually be present...
Based on the HAL preemption lock there is a problem on SMP machines
and causes a panic.
o When a device is detached, the current tactic for detaching the NDIS USB
driver is to send the SURPRISE_REMOVED event, so it doesn't need to call
ndis_halt_nic() again. This fixes some page faults when some drivers
behave abnormally.
o It now assumes that URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER runs at
DISPATCH_LEVEL (non-sleepable); as further work,
URB_FUNCTION_VENDOR_XXX and URB_FUNCTION_CLASS_XXX should as well.
Reviewed by: Hans Petter Selasky <hselasky_at_freebsd.org>
Tested by: Paul B. Mahol <onemda_at_gmail.com>
More HID parsing fixes for USB mice.
- be less strict on the last HID item usage.
- preserve item size and count across items.
- improve default HID usage selection.
Tested by: ache
Submitted by: Hans Petter Selasky
o Header file cleanup.
o bus_dma(9) conversion.
- Removed all consumers of vtophys(9) and converted to use
bus_dma(9).
- Typhoon2 functional specification says the controller supports
64bit DMA addressing. However all Typhoon controllers are known
to lack DAC support so 64bit DMA support was disabled.
- The hardware can't handle more than 16 fragmented Tx DMA segments so
teach txp(4) to collapse these segments to be less than 16.
- Added Rx buffer alignment requirements(4 bytes alignment) and
implemented fixup code to align receive frame. Previously
txp(4) always copied Rx frame to align it on 2 byte boundary
but its copy overhead is much higher than unaligned access on
i386/amd64. Alignment fixup code is now applied only for
strict-alignment architectures. With this change i386 and
amd64 will get instant Rx performance boost. Typhoon2 datasheet
mentions a command that pads arbitrary bytes in Rx buffer but
that command does not work.
- Nuked pointer trick in descriptor ring. It does not work on
sparc64, so it was replaced with bcopy. Alternatively txp(4) can
embed a 32-bit index value into the descriptor and compute
real buffer address but it may make code complicated.
- Added endianness support code in various Tx/Rx/command/response
descriptor access. With this change txp(4) should work on all
architectures.
o Added comments for known firmware bugs(Tx checksum offloading,
TSO, VLAN stripping and Rx buffer padding control).
o Prefer faster memory space register access to I/O space access.
Added fall-back mechanism to use alternative I/O space access.
The hardware supports both memory and I/O mapped access. Users
can still force to use old I/O space access by setting
hw.txp.prefer_iomap tunable to 1 in /boot/loader.conf.
o Added experimental suspend/resume methods.
o Nuked error-prone Rx buffer handling code and implemented local
buffer management with TAILQ. By definition the controller can't
pass the last received frame to host if no Rx free buffers are
available to use, as head and tail pointer of Rx descriptor ring
can't have the same value. In that case the Rx buffer pointer in
Rx buffer ring still holds a valid buffer and txp_rxbuf_reclaim()
can't fill Rx buffers as the first buffer is still valid. Instead
of relying on the value of Rx buffer ring, introduce local buffer
management code to handle empty buffer situation. This should fix
a long standing bug which completely hangs the controller under
high network load. I could easily trigger the issue by sending 64
bytes UDP frames with netperf. I have no idea how this bug was
not fixed for a long time.
o Converted ithread interrupt handler to filter based one.
o Rearranged txp_detach routine such that it's now used for general
clean-up routine.
o Show sleep image version on device attach time. This will help
to know what action should be taken depending on sleep image
version. The version information in datasheet was wrong for newer
NV images so I followed Linux which seems to correctly extract
version numbers from response descriptors.
o Firmware image is no longer downloaded in device attach time. Now
it is reloaded whenever if_init is invoked. This is to ensure
correct operation of hardware when something goes wrong.
Previously the controller always ran without regard to running
state of firmware. This change adds controller
initialization time but it gives more robust operation as txp(4)
always starts off from a known state. The controller is put into
sleep state until the administrator explicitly brings up the interface.
o As firmware is loaded in if_init handler, it's now possible to
implement real watchdog timeout handler. When watchdog timer is
expired, full-reset the controller and initialize the hardware
again as most other drivers do. While I'm here use our own timer
for watchdog instead of using if_watchdog/if_timer interface.
o Instead of masking specific interrupts with TXP_IMR register,
program TXP_IER register with the interrupts to be raised and
use TXP_IMR to toggle interrupt generation.
o Implemented txp_wait() to wait for a specific state of a controller.
o Separate boot related code from txp_download_fw() and name it
txp_boot() to handle boot process.
o Added bus_barrier(9) to host to ARM communication.
o Added endianness to all typhoon command processing. The ARM93C
always expects little-endian format of command/data.
o Removed __STRICT_ALIGNMENT which is not valid on FreeBSD.
__NO_STRICT_ALIGNMENT is provided for that purpose on FreeBSD.
Previously __STRICT_ALIGNMENT was unconditionally defined for
all architectures.
o Rewrote SIOCSIFCAP ioctl handler such that each capability can be
controlled by ifconfig(8). Note, disabling VLAN hardware tagging
has no effect due to a firmware bug.
o Don't send TXP_CMD_CLEAR_STATISTICS to clear MAC statistics in
txp_tick(). The command is not atomic. Instead, just read the
statistics and reflect saved statistics to the statistics. The
dev.txp.%d.stats sysctl node provides detailed MAC statistics.
This also avoids wasting a lot of CPU cycles, as processing a
command ring takes a very long time on ARM93C. Note, Rx
multicast and broadcast statistics do not seem to be right. It
might be another firmware bug.
o Implemented link state change handling in txp_tick(). Now sending
packets is allowed only after establishing a valid link. Also
invoke link state change notification whenever its state is
changed so pseudo drivers like lagg(4) that rely on link state
can work with failover or link aggregation without hacks.
if_baudrate is updated to resolved speed so SNMP agents can get
correct bandwidth parameters.
o Overhauled Tx routine such that it now honors number of allowable
DMA segments and checks for 4 free descriptors before trying to
send a frame. A frame may require at least 4 descriptors (1 frame
descriptor, 1 or more fragment descriptors, 1 TSO option descriptor,
one free descriptor to prevent descriptor wrap-around),
so it's necessary to check available free descriptors prior to
setting up DMA operation.
o Added a sysctl variable dev.txp.%d.process_limit to control
how many received frames should be served in Rx handler. Valid
ranges are 16 to 128(default 64) in unit of frames.
o Added ALTQ(4) support.
o Added missing IFCAP_VLAN_HWCSUM as txp(4) can offload checksum
calculation as well as VLAN tag insertion/stripping.
o Fixed media header length for VLAN.
o Don't set if_mtu in device attach, it's already set in
ether_ifattach().
o Enabled MWI.
o Fixed module unload panic when bpf listeners are active.
o Rearranged ethernet address programming logic such that it works
on strict-alignment architectures.
o Removed unused member variables in softc.
o Added support for WOL.
o Removed now unused TXP_PCI_LOMEM/TXP_PCI_LOIO.
o Added wakeup command TXP_BOOTCMD_WAKEUP definition.
o Added a new firmware version query command, TXP_CMD_READ_VERSION.
o Removed volatile keyword in softc as bus_dmamap_sync(9) should
take care of this.
o Removed embedded union trick of a structure used to access
a pointer on LP64 systems.
o Added a few TSO related definitions for struct txp_tcpseg_desc.
However TSO is not used at all due to the limitation of hardware.
o Redefined PKT_MAX_PKTLEN to theoretical maximum size of a frame.
o Switched from bus_space_{read|write}_4 to bus_{read|write}_4.
o Added a new macro TXP_DESC_INC to compute next descriptor index.
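As an illustration of the free-descriptor check and the index increment
mentioned in the list above, here is a minimal sketch. The ring size, the
names and the macro body are assumptions: DEMO_DESC_INC is a hypothetical
stand-in, since the actual definition of TXP_DESC_INC is not shown here.

```c
#define DEMO_TX_ENTRIES	16	/* hypothetical ring size */
#define DEMO_TX_RESERVE	4	/* free descriptors required before sending */

/* Hypothetical stand-in for TXP_DESC_INC: advance an index with wrap. */
#define DEMO_DESC_INC(x, cnt)	((x) = ((x) + 1) % (cnt))

/* Number of free descriptors given producer and consumer indices. */
static int
demo_tx_free(int prod, int cons)
{
	int used = (prod - cons + DEMO_TX_ENTRIES) % DEMO_TX_ENTRIES;

	return (DEMO_TX_ENTRIES - used);
}

/* Refuse to start a frame unless enough descriptors are free. */
static int
demo_tx_can_send(int prod, int cons)
{
	return (demo_tx_free(prod, cons) >= DEMO_TX_RESERVE);
}
```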
Tested by: don.nasco <> gmail dot com
poll(), only copy out the revents field, not the whole pollfd
structure. Otherwise, if the events field is updated
concurrently by another thread, that update may be lost.
This issue apparently causes problems for the JDK on FreeBSD,
which expects the Linux behavior of not updating all fields
(somewhat oddly, Solaris does not implement the required
behavior, but presumably our adaptation of the JDK is based
on the Linux port?).
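A user-space sketch of the copy-out behavior described above, assuming a
hypothetical helper name; it is not the kernel's code. Only revents is
written back, so a concurrent change to events in the caller's array is
preserved.

```c
#include <poll.h>
#include <stddef.h>

/* Copy results back to the caller's array, touching only revents. */
static void
demo_copyout_revents(struct pollfd *user, const struct pollfd *kern,
    size_t nfds)
{
	for (size_t i = 0; i < nfds; i++)
		user[i].revents = kern[i].revents;
}
```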
MFC after: 2 weeks
PR: kern/130924
Submitted by: Kurt Miller <kurt @ intricatesoftware.com>
Discussed with: kib
internal sysctl_sysctl_name() handler to map the MIB array to a string
name and logs this name in the trace log. This can be useful to see
exactly which sysctls a thread is invoking.
MFC after: 1 month
filesystem supports additional operations using shared vnode locks.
Currently this is used to enable shared locks for open() and close() of
read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a
shared vnode lock for the leaf vnode only if the mount point has the
extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but
not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with
O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from
vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since
FIFOs require exclusive vnode locks for their open() and close()
routines. (My recent MPSAFE patches for UDF and cd9660 already included
this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.
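The lock-type decision for open() can be sketched as a pure function. The
names are hypothetical and the sketch uses user-space open(2) flags rather
than the kernel's internal ones; it only restates the rules listed above.

```c
#include <fcntl.h>

enum demo_lktype { DEMO_LK_SHARED, DEMO_LK_EXCLUSIVE };

/*
 * Pick the leaf vnode lock type: shared only for read-only opens without
 * O_CREAT on a mount that advertises extended shared operations.
 */
static enum demo_lktype
demo_open_lock(int oflags, int mnt_extended_shared)
{
	int rdonly = (oflags & O_ACCMODE) == O_RDONLY;

	if (rdonly && (oflags & O_CREAT) == 0 && mnt_extended_shared)
		return (DEMO_LK_SHARED);
	return (DEMO_LK_EXCLUSIVE);
}
```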
Submitted by: ups
Reviewed by: pjd (ZFS bits)
MFC after: 1 month
legal in the spec. Add newline to the verbose messages we print when
debugging when this happens. The Hitachi HT-4840-11 is the only card
to hit these in years, and it works well enough if we're liberal about
what we accept.