payload size. Previously we always added the header, regardless of
whether we actually sent out compressed data.
With this, after the opencrypto/deflate changes, IPcomp starts to work
apart from edge cases. Leave it disabled by default until those are
fixed as well.
PR: kern/123587
MFC after: 5 days
The <sys/termios.h> header file is hardlinked to <termios.h>. It
contains both the structures and the flag definitions, but also the C
library interface that's implemented by the C library.
This header file has the typical problem of including too many random
things and being badly ordered. Instead of trying to fix this, decompose
it into two header files:
- <sys/_termios.h>, which contains struct termios and the flags.
- <termios.h>, which includes <sys/_termios.h> and contains the C
library interface.
This means userspace has to include <termios.h> for struct termios,
while kernelspace code has to include <sys/tty.h>. Also add a
<sys/termios.h>, which prints a warning message before including
<termios.h>. I am aware that there are some applications that use this
header file as well.
This is needed to avoid running into out of buffer situations
where we cannot alloc a new buffer because we hit the array size
limit (ZBUF).
Use a combined allocation for the struct and the actual data buffer
to not increase the number of malloc calls. [1]
Defer initialization of zbuf until we actually need it.
Make sure the output buffer will be large enough in all cases.
Details discussed with: kib [1]
Reviewed by: kib [1]
MFC after: 6 days
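A minimal sketch of the combined allocation described above, assuming a
C99 flexible array member. The names zbuf_sketch and zbuf_sketch_alloc()
are hypothetical and are not taken from the opencrypto code; the sketch
only illustrates how one malloc() call can cover both the descriptor and
its data area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer descriptor; the payload shares its allocation. */
struct zbuf_sketch {
	size_t size;		/* usable bytes in data[] */
	unsigned char data[];	/* flexible array member (C99) */
};

/* One malloc() call covers both the descriptor and the data area. */
static struct zbuf_sketch *
zbuf_sketch_alloc(size_t size)
{
	struct zbuf_sketch *zb;

	assert(size > 0);
	zb = malloc(sizeof(*zb) + size);
	if (zb != NULL)
		zb->size = size;
	return (zb);
}
```

A single free() on the returned pointer releases both parts, so the
number of allocator calls per buffer stays at one.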
adding statistics counters to the PCPU structure. Export the counters
through sysctl by giving each PCPU structure its own sysctl context.
While here, fix cnt.v_intr by not just having it count clock interrupts,
but every interrupt and add more counters for each interrupt source.
replacement but only use it for inflate. For deflate use Z_FINISH
as Z_SYNC_FLUSH adds a trailing marker in some cases that inflate(),
despite the comment in zlib, does not seem to cope well with, resulting
in errors when uncompressing exactly fills the output buffer without
a Z_STREAM_END and a successive call returns an error.
MFC after: 6 days
more. This provides three new sysctls to user space:
hw.cpu_features - A bitmask of available CPU features
hw.floatingpoint - Whether or not there is hardware FP support
hw.altivec - Whether or not Altivec is available
PR: powerpc/139154
MFC after: 10 days
Right now <sys/termios.h> includes <sys/ttycom.h>, which provides the
TTY ioctls to the svr4 code. We need both struct termios and the ioctls,
so include <sys/tty.h> for now.
for specific "kinds" of disk labels - for example, GPT UUIDs. Reason
for this is that sometimes, other GEOM classes attach to these device
nodes instead of the proper ones - e.g. they attach to /dev/gptid/XXX
instead of /dev/ada0p2, which is annoying.
Reviewed by: pjd (earlier version)
MFC after: 1 month
video console which doesn't take any input from keyboard and hides
all output replacing it with ``spinning'' character (useful for
embedded products and custom installations).
Sponsored by: Sippy Software, Inc.
from SUSv4 XSI. Note that the functions are obsoleted, and only
provided to ease porting from System V-like systems. Since sigpause
already exists in compat with different interface, XSI sigpause is
named xsi_sigpause.
Reviewed by: davidxu
MFC after: 3 weeks
long as I remember, and completely superseded by better maintained umass(4).
Its main idea was to optionally avoid the CAM dependency for such devices,
but with the move of ATA to CAM, it is no longer relevant.
No objections: hselasky@, thompsa@, arch@
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages. Text pages are not just write protected but they are also
copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy. However, here is where things become
confused. It is the debugger, not the process being debugged that requires
write access to the copied page. Nonetheless, the copied page is being
mapped into the process with write access enabled. In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page. Whereas prior to setting the breakpoint, a
SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses
this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.
Reviewed by: kib
MFC after: 4 weeks
message has been printed is enough to get someone's attention. Also remove the
line number for DPRINTF/DPRINTFN, it already prints the function name and a
unique message.
of the last tick we incremented on.
Submitted by: matthew.fleming/at/isilon.com, is/at/rambler-co.ru
Reviewed by: jeff (who thinks there should be a better way in the future)
Approved by: gnn (mentor)
MFC after: 3 weeks
- fix a system deadlock on process exit when the sample buffer
is full (pmclog_loop blocked in fo_write) and pmcstat exit.
Reviewed by: jkoshy
MFC after: 3 weeks
bits on seems to confuse hardware TX engine.
- For 350 chips, set TX desc's buffer physical address before turning on the
TX desc valid bit.
Submitted by: Jeremy O'Brien obrien654j | gmail, sephe
Obtained from: DragonFly BSD
- Extend the XPT-SIM transfer settings control API. It now allows reporting
the number of tags supported by each device to the SATA SIM, and implementing
ATA mode and SATA revision negotiation for both SATA and PATA SIMs.
- Make ahci(4) and siis(4) use the submitted maximum tag number when
scheduling requests. This allows NCQ to be supported on devices with a lower
tag count than the controller supports.
- Make the PMP driver report the connection speeds of attached devices.
- Implement ATA mode negotiation between user settings, device and
controller capabilities.
When PAGE_SIZE is 16K, MJUMPAGESIZE equals MJUM16BYTES and
causes build breakages.
For PAGE_SIZE < 2K, define MJUMPAGESIZE as MCLBYTES.
For PAGE_SIZE > 8K, define MJUMPAGESIZE as 8K.
Everywhere in between, define MJUMPAGESIZE as PAGE_SIZE.
Thus MCLBYTES <= MJUMPAGESIZE <= 8KB.
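The clamping rule above can be sketched as a plain function. This is an
illustration only: the real change is made with preprocessor conditionals,
and SKETCH_MCLBYTES (assumed to be 2K here), SKETCH_MAX and
mjumpagesize_for() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MCLBYTES	2048u		/* assumed cluster size (2K) */
#define SKETCH_MAX	(8u * 1024u)	/* upper bound from the text (8K) */

/* Clamp the page size into [SKETCH_MCLBYTES, SKETCH_MAX]. */
static size_t
mjumpagesize_for(size_t page_size)
{
	size_t sz;

	if (page_size < SKETCH_MCLBYTES)
		sz = SKETCH_MCLBYTES;
	else if (page_size > SKETCH_MAX)
		sz = SKETCH_MAX;
	else
		sz = page_size;
	assert(sz >= SKETCH_MCLBYTES && sz <= SKETCH_MAX);
	return (sz);
}
```

With 16K pages the result stays at 8K, so it can no longer collide with
the 16K jumbo size.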
the kernel stack at all. The new USB stack simply caused a change
in timing that triggered a firmware bug more often. The addition
of PRINTF_BUFR_SIZE apparently triggered the same firmware bug
even more reliably.
But even with KSTACK_PAGES=5, one instance of the firmware bug
remained: booting with a CD inserted. This problem was run into
by accident after installing Debian and having to boot FreeBSD
to fix up the GPT partitioning (Thanks... not). After bumping
KSTACK_PAGES to 5, it was pretty unbelievable that the stack was
still too small.
After updating the firmware we could boot with a CD inserted and
KSTACK_PAGES could be lowered back to 4 pages without problems.
Note: It is believed to be a timing related firmware bug, because
the machine check information showed access to the serial console
on one CPU and access to the EHCI HCD on the other CPU. Since
both are devices on the management unit and thus virtualized in
some way, any execution trace that does not include concurrent
access to the BMC from both CPUs is fine.
Note also that it's not understood exactly how increasing the
kernel stack avoided hitting the firmware bug. A change in page
faults does change timing, but it's not known if that's what's
happening here.
In any case: the problem is being monitored. Reverting back to
4 pages for the kernel stack is preferred, because it makes it
easier to switch to 16K pages (double the page size) without
wasting too much memory by not being able to halve the number of
pages...
using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the
ReaddirPlus RPC. This patch is based upon one by pjd@ for the
regular nfs server which has not yet been committed. It is needed
when a ZFS volume is exported and ReaddirPlus (which almost
always happens for NFSv4) is performed by a client. The patch
also simplifies vnode lock handling somewhat.
MFC after: 2 weeks
This adds zfsloader which will be called by zfsboot/gptzfsboot code rather
than the traditional loader. This eliminates the need to set the
LOADER_ZFS_SUPPORT variable in order to get a ZFS enabled loader.
Note however, that you must reinstall your bootcode (zfsboot/gptzfsboot)
in order for the boot process to use the new loader.
New installations will no longer be required to build a ZFS enabled
loader for a working ZFS boot system. Installing zfsboot/gptzfsboot is
sufficient for acknowledging the use of CDDL code and therefore the ZFS
enabled loader.
Based on a previous patch from jhb@
Reviewed by: jhb@
MFC after: 2 weeks
Tx/Rx/Rx return ring such that large part of status block was not
used at all. All bge(4) controllers except BCM5700 AX/BX have a
feature to control the size of the status block. So use the minimum
status block size allowed by the controller. This reduces the size of
the DMAed status block from 80 bytes to 32 bytes.
seem to require a special firmware to use TSO. But the firmware is
not available to FreeBSD and Linux claims that the TSO performed by
the firmware is slower than hardware based TSO. Moreover the
firmware based TSO has one known bug: it can't handle TSO if the
ethernet header + IP/TCP header is greater than 80 bytes. A
workaround for the TSO bug exists but it seems to be more expensive
than not using TSO at all. Some hardware also has the TSO bug so
limit TSO to the controllers that are not affected by TSO issues
(e.g. 5755 or higher).
While I'm here set the VLAN tag bit on all descriptors that belong to
a frame instead of only the first descriptor of a frame. The datasheet
is not clear on how to handle the VLAN tag bit but it worked either way
in my testing. This simplifies TSO configuration a little bit.
Big thanks to davidch@ who sent me detailed TSO information.
Without this I was not able to implement it.
Tested by: current
have a DMA bug when buffer address crosses a multiple of the 4GB
boundary (e.g. 4GB, 8GB, 12GB etc.). Limit DMA address to be within
4GB address for these controllers. The second DMA bug limits DMA
address to be within 40bit address space. This bug applies to
BCM5714 and BCM5715 and 5708 (bce(4) controller). This is not
actually a MAC controller bug but an issue with the embedded PCIe
to PCI-X bridge in the device. So for BCM5714/BCM5715 controllers
also limit the DMA address to be within 40bit address space.
Special thanks to davidch@ who gave me detailed errata information.
I think this change will fix long standing bge(4) instability
issues on systems with more than 4GB memory.
Reviewed by: davidch
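The two restrictions above can be expressed as simple address checks.
This is a sketch under stated assumptions and not the driver's code: a
driver would normally encode such limits in its DMA tag rather than test
each buffer by hand, and dma_crosses_4g() and dma_fits_40bit() are
hypothetical helper names. The helpers assume that addr + len does not
wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [addr, addr + len) spans a multiple of 4GB. */
static bool
dma_crosses_4g(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return ((addr >> 32) != ((addr + len - 1) >> 32));
}

/* True if the whole buffer lies within a 40bit address space. */
static bool
dma_fits_40bit(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return (addr + len - 1 < (UINT64_C(1) << 40));
}
```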
PCI flush to get correct status block update. Add an optimized
interrupt handler that is activated for MSI case. Actual interrupt
handling is done by taskqueue such that the handler does not
require the driver lock for the Rx path. MSI capable bge(4) controllers
automatically disable further interrupts once they enter the interrupt
state so we don't need PIO access to disable interrupts in the interrupt
handler.
update and then clear status block. Previously it used to access
these indices without synchronization which may cause problems when
bounce buffers are used. Also add missing bus_dmamap_sync(9) in
polling handler. Since we now update status block in driver, adjust
bus_dmamap_sync(9) for status block.
checking IFF_DRV_RUNNING and IFF_DRV_OACTIVE flags. Also if we
have fewer than 16 free send BDs, set IFF_DRV_OACTIVE and try again
later. Previously bge(4) used to reserve 16 free send BDs after
loading dma maps but the hardware just needs one reserved send BD. If
the producer index has the same value as the consumer index it means
the Tx queue is empty.
While I'm here check IFQ_DRV_IS_EMPTY first to save one lock
operation.
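The ring accounting described above can be sketched as follows. The ring
size of 512 and all names are assumptions made for illustration; only the
rules themselves (equal indices mean empty, one send BD stays reserved,
stall below 16 free BDs) come from the text.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_TX_RING_CNT	512u	/* assumed power-of-two ring size */

/* Number of send BDs in use; zero when producer == consumer. */
static unsigned
tx_used(unsigned prod, unsigned cons)
{
	return ((prod - cons) & (SKETCH_TX_RING_CNT - 1));
}

/* Free send BDs, keeping one BD reserved for the hardware. */
static unsigned
tx_free(unsigned prod, unsigned cons)
{
	assert(prod < SKETCH_TX_RING_CNT && cons < SKETCH_TX_RING_CNT);
	return (SKETCH_TX_RING_CNT - 1 - tx_used(prod, cons));
}

/* True when the start routine should mark the queue busy and retry. */
static bool
tx_should_stall(unsigned prod, unsigned cons)
{
	return (tx_free(prod, cons) < 16);
}
```

Reserving one BD keeps a full ring distinguishable from an empty one,
since both would otherwise show equal indices.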
directly access them at fixed address. While I'm here don't touch
other bits of PCIe device control register except max payload size.
Reviewed by: marius
Binary divider value 10 specified in datasheet is not a hex 0x10.
UDMA2 should be 33/2 instead of 66/4, which is documented as reserved,
UDMA4 should be 66/2 instead of 66/4, which is definitely wrong.
unlock Giant twice.
While there, bring conditions in the do/while loops closer to style,
that also makes the lines fit into 80 columns.
Reported and tested by: dougb
r197525, so that the creation verifier is handled correctly
in va_atime for 64bit architectures. There were two problems.
One was that the code incorrectly assumed that
sizeof (struct timespec) == 8 and the other was that the tv_sec
field needs to be assigned from a signed 32bit integer, so that
sign extension occurs on 64bit architectures. This is required
for correct operation when exporting ZFS volumes.
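Both problems can be illustrated with a small conversion helper. This is
a sketch only, assuming the verifier is carried as two 32bit words; the
name verifier_to_timespec() is hypothetical and the code is not taken
from the NFS server.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Unpack an 8 byte verifier into a struct timespec without assuming
 * sizeof(struct timespec) == 8.
 */
static void
verifier_to_timespec(const unsigned char verf[8], struct timespec *ts)
{
	int32_t words[2];

	assert(verf != NULL && ts != NULL);
	memcpy(words, verf, sizeof(words));
	/* Assigning from a signed 32bit value sign-extends into tv_sec. */
	ts->tv_sec = words[0];
	ts->tv_nsec = words[1];
}
```

Copying the 8 bytes straight over a struct timespec would fill only part
of the structure on platforms where its fields are 64 bits wide.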
Reviewed by: pjd
MFC after: 2 weeks
controller also has support for IP/TCP checksum offloading for Rx
path. But I failed to find a way to enable Rx MAC to compute the
checksum of received frames.
both big-endian and little-endian format in descriptors for Rx path
but I couldn't find equivalent feature in Tx path. So just stick to
little-endian for now.
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9). It has no
performance benefit over malloc(9)/free(9)[2].
Requested by: rwatson[1]
Pointed out by: rwatson, jhb, alc[2]
fully support booting from large volumes.
Tested by: Emil Smolenski ambsd of raisa.eu.org
Submitted by: Matt Reimer mattjreimer of gmail (most of the C bits)
MFC after: 1 week
to panic when we have an unexpected TLB fault while interrupt
collection is disabled. Use a token rather than the actual address
of the restart point to avoid the need for the movl instruction.
The token is arbitrary. For the drummers: it's based on a single
paradiddle.
directly access them at fixed address. Frequently the register
offset could be changed if additional PCI capabilities are added to
controller.
One odd thing is the ET_PCIR_L0S_L1_LATENCY register. I think it's the
PCIe link capabilities register but the location of the register does
not match the PCIe capability pointer + offset. I'm not sure whether
it's a shadow register of the PCIe link capabilities register.
if_watchdog and if_timer.
- Fix some issues in detach for sn(4), ste(4), and ti(4). Primarily this
means calling ether_ifdetach() before anything else.
- Reorder detach so that ether_ifdetach() is called first. This removes
the race that ATE_FLAG_DETACHING closed, so that flag can be removed.
- Trim a duplicate clearing of IFF_DRV_RUNNING.
Reviewed by: imp
- Overhaul the locking to avoid recursion and add missing locking in a few
places.
- Don't schedule a task to call vge_start() from contexts that are safe to
call vge_start() directly. Just invoke the routine directly instead
(this is what all of the other NIC drivers I am familiar with do). Note
that vge(4) does not use an interrupt filter handler which is the primary
reason some other drivers use tasks.
- Add a new private timer to drive the watchdog timer instead of using
if_watchdog and if_timer.
- Fixup detach by calling ether_ifdetach() before stopping the interface.
just two different attachments (EISA and PCI) to a single driver.
- Add real locking. Previously these drivers only acquired their lock
in their interrupt handler or in the ioctl routine (but too broadly in
the latter). No locking was used for the stack calling down into the
driver via if_init() or if_start(), for device shutdown or detach. Also,
the interrupt handler held the driver lock while calling if_input(). All
this stuff should be fixed in the locking changes.
- Really fix these drivers to handle if_alloc(). The front-end attachments
were using if_initname() before the ifnet was allocated. Fix this by
moving some of the duplicated logic from each driver into pdq_ifattach().
While here, make pdq_ifattach() return an error so that the driver just
fails to attach if if_alloc() fails rather than panic'ing. Also, defer
freeing the ifnet until the driver has stopped using it during detach.
- Add a new private timer to drive the watchdog timer.
- Pass the softc pointer to the interrupt handlers instead of the device_t
so we can avoid the use of device_get_softc() and to better match what
other drivers do.
auto-negotiation. To make this simpler and easier to understand I have
split this out into two separate timers. One just manages the auto-neg
side of things and one is a transmit watchdog. Neither uses if_watchdog.
- Call ether_ifdetach() at the start of detach.
- Add a missing callout_drain() to detach.
- Hook into the stats timer and use that to drive the transmit watchdog
instead of using if_watchdog.
- Run the stats timer every second to match other drivers instead of every
other second.
- Remove dubious callout handling that stopped the timer only to start it
again while holding the driver lock without dropping it in between the
stop and the start.
Don't allow joins w/o source on an existing group.
This is almost always pilot error.
We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IPV6_MSFILTER
is also disallowed.
MFC after: 1 day
Tighten input checking in in6p_join_group():
* Don't try to use the source address, when its family is unspecified.
* If we get a join without a source, on an existing inclusive
mode group, this is an error, as it would change the filter mode.
Fix a problem with the handling of in6_mfilter for new memberships:
* Do not rely on im6f being NULL; it is explicitly initialized to a
non-NULL pointer when constructing a membership.
* Explicitly initialize *im6f to EX mode when the source address
is unspecified.
This fixes a problem with in_mfilter slot recycling in the join path.
MFC after: 1 day
Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.
MFC after: 1 day
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.
pages.
(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade. This and another bug
involving breakpoints will be fixed in coming changes.)
Reviewed by: kib
to the set actually restored by tl0_ret() instead of using the whole
trapframe. Additionally skip %g7 as that register is used as the
userland TLS pointer.
PR: 140523
MFC after: 1 week
reliable on some Marvell PHYs. If msk(4) knows it still has not
established a link, check whether it missed the link state
change by looking at the polled link state.
Reported by: Mel Flynn < mel.flynn+fbsd.current <> mailing.thruhere dot net >,
Gleb Kurtsou <gleb.kurtsou <> gmail dot com >
Tested by: Gleb Kurtsou <gleb.kurtsou <> gmail dot com >
the transmit watchdog. These drivers already used a private timer when
compiled to use Netgraph. This change just makes them always use the
private timer. Note that these drivers do not compile and are disconnected
from the build due to TTY changes.
if_watchdog and if_timer. The driver already contained an optional stats
timer that individual attachments could use to provide a 'tick' event. The
stats timer only ran if the tick function pointer was non-NULL and the
attachment's tick routine had to call callout_reset(), etc. Now the driver
always schedules a stat timer and manages the callout_reset() internally.
This timer is used to drive the watchdog and will also call the attachment's
'tick' handler if one is provided.
Tested by: WATANABE Kazuhiro
to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag
that allows sigqueue_add() to fail while trying to allocate memory for
new siginfo. When the flag is not set, behaviour is the same as for
KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is
kept to preserve KBI.
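The difference between the two behaviours can be modelled in a few lines.
This is a simplified userland model and not the kernel's sigqueue_add():
the structure, the flag value, the fixed capacity and the EAGAIN return
value are all assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_KSI_SIGQ	0x1	/* hypothetical flag value */

/* Toy signal queue: a bounded siginfo count plus a fallback bitmask. */
struct sigq_model {
	unsigned long kill_mask;	/* stands in for sq_kill */
	int queued;			/* siginfo entries currently queued */
	int capacity;			/* models the allocation limit */
};

static int
model_sigqueue_add(struct sigq_model *q, int sig, int flags)
{
	assert(sig >= 1 && sig <= 32);
	if (q->queued < q->capacity) {
		q->queued++;		/* siginfo was queued */
		return (0);
	}
	if (flags & SKETCH_KSI_SIGQ)
		return (EAGAIN);	/* caller asked to see the failure */
	/* Old behaviour: record only the signal number. */
	q->kill_mask |= 1UL << (sig - 1);
	return (0);
}
```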
Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is
generated by kernel. Deliver siginfo when signal is generated by kill(2)
family of syscalls (SI_USER with properly filled si_uid and si_pid), or
by kernel (SI_KERNEL, mostly job control or SIGIO). Since the KSI_SIGQ flag
is not set for the ksi, a low memory condition causes the old behaviour.
Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL
si_code. Pgsignal(9) and gsignal(9) now take ksi explicitly. Add
pksignal(9) that behaves like psignal but takes ksi, and ddb kill
command implemented as pksignal(..., ksi = NULL) to not do allocation
while in debugger.
While there, remove some register specifiers and use ANSI C prototypes.
Reviewed by: davidxu
MFC after: 1 month
and Marvell handled. Instead of trying to attach two different drivers to
a single device, wrapping each call, make one of them (atajmicron, atamarvell)
attach to the device exclusively, but create a child device for the AHCI
driver, passing it all required resources. It is quite easy, as none of the
resources are shared, except the IRQ.
As a result, it:
- makes driver operation more independent and straightforward,
- allows the new ahci(4) driver to be used with such devices, adding support
for new features, such as PMP and NCQ, while keeping legacy PATA support,
- will allow the old ataahci driver to simply be dropped when the time comes.
Userland daemons need to see IGMP traffic regardless of the group;
omit the imo filter check if the proto is IGMP. The kernel part
of IGMP will have already filtered appropriately at this point.
MFC after: ASAP
Submitted by: Franz Struwig
Reported by: Ivor Prebeg, Franz Struwig
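The delivery rule above amounts to a one-line exception. A minimal
sketch, assuming the group filter result is already available as a
boolean; raw_should_deliver() is a hypothetical name and not the function
changed by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>	/* IPPROTO_IGMP */

/*
 * Decide whether a multicast datagram reaches a raw socket.
 * filter_allows stands in for the result of the imo filter check.
 */
static bool
raw_should_deliver(int proto, bool filter_allows)
{
	assert(proto >= 0);
	if (proto == IPPROTO_IGMP)
		return (true);	/* daemons see IGMP for every group */
	return (filter_allows);
}
```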
Remove code that years ago closed a race between request submission
to the SIM and device/SIM freeze. That race became impossible after moving
from spl to mutex locking, while this workaround causes some unexpected
effects.
I have found that this is not only a desktop CPU problem, but a mobile one
too. Probably the AP on laptops just started at a lower frequency initially,
hiding the problem.
Disable frequency validation by default for systems with more than one CPU,
until we can implement it properly. It currently seems to do more harm than
good. Add the 'hw.est.strict' loader tunable to control it.
Now my iXsystems Invincibook is able to run at its lowest frequency of 800MHz,
instead of 1200MHz before, when 800MHz was incorrectly reported invalid.
CPU core, only a pair of them. As a result, both cores run at the highest
of the requested frequencies, and that is what the status register reports.
Such behavior confuses the frequency validation logic, as it runs on only
one core, since SMP is not yet launched, making EIST completely unusable.
To work around this, add a check of the validation result. If we haven't
found at least two usable frequencies, then our validation is probably
wrong and we have to trust the data provided by the BIOS as-is.
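The fallback can be sketched as a filter over the BIOS-provided table.
This is an illustration under stated assumptions, not the est(4) code:
select_freqs() and its array-of-flags interface are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * validated[i] says whether frequency i passed validation. keep[i] is
 * set to whether frequency i should be offered. Returns the number of
 * frequencies kept.
 */
static size_t
select_freqs(const bool validated[], bool keep[], size_t n)
{
	size_t i, ok = 0;

	assert(validated != NULL && keep != NULL);
	for (i = 0; i < n; i++)
		if (validated[i])
			ok++;
	/* Fewer than two survivors: distrust validation, keep everything. */
	for (i = 0; i < n; i++)
		keep[i] = (ok >= 2) ? validated[i] : true;
	return (ok >= 2 ? ok : n);
}
```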
errors. So far 3 different classes are present (correctable,
uncorrectable and fatal) but more can be added easily.
Obtained from: Sandvine Incorporated
Reviewed by: emaste, gibbs
Sponsored by: Sandvine Incorporated
MFC: 2 weeks
These controllers provide combination of AHCI for SATA and legacy
PCI ATA for PATA. Use same solution as used for JMicron controllers.
Add IDs of Marvell 88SX6102, 88SX6111, 88SX6141 and similar controllers
not properly set up. r199067 added a call to TUNABLE_INT_FETCH() to
initializecpu(), which results in a hang because APs are started when the
kernel environment is already dynamic and thus a mutex must be acquired,
which is too early in the AP start sequence to work.
Extract the code that should be executed only once, because it sets
up global variables, from initializecpu() to initializecpucache(),
and call the latter only from hammer_time() executed on the BSP. Now,
TUNABLE_INT_FETCH() is done only once on the BSP at the early boot stage.
In collaboration with: Mykola Dzham <freebsd levsha org ua>
Reviewed by: jhb
Tested by: ed, battlez
Right now syscons(4) uses a cons25-style terminal emulator. The
disadvantages of that are:
- Little compatibility with embedded devices with serial interfaces.
- Bad bandwidth efficiency, mainly because of the lack of scrolling
regions.
- A very hard transition path to support for modern character sets like
UTF-8.
Our terminal emulation library, libteken, has been supporting
xterm-style terminal emulation for months, so flip the switch and make
everyone use an xterm-style console driver.
I still have to enable this on i386. Right now pc98 and i386 share the
same /etc/ttys file. I'm not going to switch pc98, because it uses its
own Kanji-capable cons25 emulator.
IMPORTANT: What to do if things go wrong (i.e. graphical artifacts):
- Run the application inside script(1), try to reduce the problem and
send me the log file.
- In the meantime, you can run `vidcontrol -T cons25' and `export
TERM=cons25' so you can run applications the same way you did before.
You can also build your kernel with `options TEKEN_CONS25' to make all
virtual terminals use the cons25 emulator by default.
Discussed on: current@
This fixes a null pointer dereference with "gpart create -s GPT" after
the previous commit.
Reported by: Yuri Pankov
Pointyhat to: me
MFC after: 1 week
items rather than a single one. The list is a space-separated collection
of items, each in the format currently accepted for a single item.
While there, also fix a nit in a comment.
Obtained from: Sandvine Incorporated
Reviewed by: emaste
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
Sponsored by: Sandvine Incorporated
MFC: 2 weeks
from CD on 64-bit hardware to replace existing band-aids. This occurred
when the preloaded mdroot required too many mappings for the static
buffer.
Since we only use the translations buffer once, allocate a dynamic
buffer on the stack. This early in the boot process, the call chain
is quite short and we can be assured of having sufficient stack space.
Reviewed by: grehan
we did not add. Call LLE_REMREF() only when callout_stop()
actually canceled a pending callout.
- callout_reset() may cancel a pending callout. When
callout_reset() canceled a pending callout, call LLE_REMREF()
to drop a reference for the canceled callout.
MFC after: 1 week
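The reference counting rule above can be modelled in userland. This is a
toy model and not the kernel code: the structure and function names are
hypothetical, and it assumes that each armed timer holds exactly one
reference on the entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy link-layer entry: a reference count and a pending-timer flag. */
struct lle_sketch {
	int refs;
	bool timer_pending;
};

/* Models callout_stop(): reports whether a pending timer was canceled. */
static bool
sketch_callout_stop(struct lle_sketch *e)
{
	bool was_pending = e->timer_pending;

	e->timer_pending = false;
	return (was_pending);
}

/* Drop the timer's reference only if a timer was actually canceled. */
static void
lle_timer_cancel(struct lle_sketch *e)
{
	if (sketch_callout_stop(e)) {
		assert(e->refs > 0);
		e->refs--;
	}
}

/* Models callout_reset(): re-arming may cancel a pending timer. */
static void
lle_timer_arm(struct lle_sketch *e)
{
	bool canceled = e->timer_pending;

	e->timer_pending = true;
	e->refs++;		/* reference held by the new timer */
	if (canceled)
		e->refs--;	/* reference of the canceled timer */
}
```

Arming twice therefore leaves a single timer reference, and canceling
when no timer is pending leaves the count untouched.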