dscp as a search key in table lookups;
+ (re)implement a sysctl variable to control the expire frequency of
pipes and queues when they become empty;
+ add 'queue number' as optional part of the flow_id. This can be
enabled with the command
queue X config mask queue ...
and makes it possible to support priority-based schedulers, where
packets should be grouped according to the priority and not some
fields in the 5-tuple.
This is implemented as follows:
- redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
without changing the size or shape of the structure, so there are
no ABI changes. In passing, also document how other fields are
used, and remove some useless assignments in ip_fw2.c;
- implement small changes in the userland code to set/read the field;
- revise the functions in ip_dummynet.c to manipulate masks so they
also handle the additional field;
There are no ABI changes in this commit.
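The grouping idea above can be sketched in C as follows. This is only an
illustration: struct flow_id_sketch, its field names and both helper
functions are hypothetical and do not reproduce the real ipfw_flow_id
layout. It shows how a mask that selects only the queue number groups
packets by priority regardless of their 5-tuple.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow id; NOT the real struct ipfw_flow_id layout. */
struct flow_id_sketch {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t  proto;
	uint8_t  pad;
	uint16_t queue;		/* assumed name for the 'queue number' field */
};

/* Keep only the bits of each field that the mask selects. */
static struct flow_id_sketch
flow_apply_mask(struct flow_id_sketch id, const struct flow_id_sketch *mask)
{
	assert(mask != NULL);
	id.src_ip &= mask->src_ip;
	id.dst_ip &= mask->dst_ip;
	id.src_port &= mask->src_port;
	id.dst_port &= mask->dst_port;
	id.proto &= mask->proto;
	id.pad = 0;
	id.queue &= mask->queue;
	return (id);
}

/* Two packets share a bucket when their masked ids are identical. */
static int
flow_same_bucket(const struct flow_id_sketch *a,
    const struct flow_id_sketch *b, const struct flow_id_sketch *mask)
{
	struct flow_id_sketch ma = flow_apply_mask(*a, mask);
	struct flow_id_sketch mb = flow_apply_mask(*b, mask);

	return (ma.src_ip == mb.src_ip && ma.dst_ip == mb.dst_ip &&
	    ma.src_port == mb.src_port && ma.dst_port == mb.dst_port &&
	    ma.proto == mb.proto && ma.queue == mb.queue);
}
```

With a mask that is zero everywhere except the queue field, two packets
from unrelated connections land in the same bucket as long as they carry
the same queue number.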
a long time and has gone unnoticed just as long, because I kept
using sched_4bsd (due to sched_ule not working with preemption),
but GENERIC had sched_ule by default -- including SMP.
While here, remove unused inclusion of <machine/clock.h>, remove
totally bogus inclusion of <i386/include/specialreg.h>.
access, and reflects this by autonomously writing LPTE_M into PTE entries.
As such, we should not panic if LPTE_M changes by itself. While here,
fix a harmless typo in moea64_sync_icache().
configuration space on Yukon Ultra(88E8056) such that accesses to
these registers were NOPs, which in turn made msk(4) unstable on
this controller. Use indirect access method to access
PCI_OUR_REG_[1-5] registers. This should fix a long standing
instability bug which prevented msk(4) working on Yukon Ultra.
Special thanks to koitsu who gave me remote access to his system.
PR: kern/114631, kern/116853
MFC after: 1 week
arcconf tool by Adaptec already seems to use for identifying the
Serial Number of the devices.
Some simple things (like FIB setup and bound checks) are retrieved
from Adaptec's driver, but this implementation is quite different
because it uses the normal buffer dmat area for loading segments
and not a special one (as Adaptec's driver does).
Sponsored by: Sandvine Incorporated
Discussed with: emaste, scottl
Reviewed by: emaste, scottl
MFC: 2 weeks
their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and
create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy()
to do this work in a central spot. As inpcbinfo becomes more complex
due to ongoing work to add connection groups, this will reduce code
duplication.
MFC after: 1 month
Reviewed by: bz
Sponsored by: Juniper Networks
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
pointer, rather than octeon_fpa_alloc.
o) Report half duplex status properly.
o) Do not unconditionally update the last known link status in the softc. If
report_link isn't set, when octeon_rgmx_config_speed is called the first
time it will tell the driver (essentially) that we have already marked the
interface up. Likewise, don't change media speed and duplex if only the
link status is at issue. [1]
o) Remove manual changing of link state and let octeon_rgmx_config_speed do the
heavy lifting. [1]
Reviewed by: [1] imp
Sponsored by: Packet Forensics
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
PR: 144529
MFC after: 2 weeks
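A minimal sketch of the two ways a caller might derive the offset,
assuming an IPv6 packet without extension headers. The struct
ip6_hdr_sketch type and both function names are hypothetical stand-ins,
not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed 40-byte IPv6 header. */
struct ip6_hdr_sketch {
	uint32_t flow;		/* version, traffic class, flow label */
	uint16_t plen;		/* payload length */
	uint8_t  nxt;		/* next header */
	uint8_t  hlim;		/* hop limit */
	uint8_t  src[16];
	uint8_t  dst[16];
};

/* IPv4: ip_hl counts 32-bit words, so shift left by 2 to get bytes. */
static uint32_t
sctp_offset_v4(uint8_t ip_hl)
{
	assert(ip_hl >= 5 && ip_hl <= 15);
	return ((uint32_t)ip_hl << 2);
}

/* IPv6 (no extension headers assumed): the fixed header size. */
static uint32_t
sctp_offset_v6(void)
{
	return ((uint32_t)sizeof(struct ip6_hdr_sketch));
}
```

An IPv4 header without options gives 20 bytes and the fixed IPv6 header
gives 40 bytes.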
- Add a missing callout_drain(9) before the descriptor deallocation.[1]
- Prefer callout_init_mtx(9) over callout_init(9) and let the callout
subsystem handle the mutex for callout function.
PR: kern/144453
Submitted by: Alexander Sack (asack at niksun dot com)[1]
MFC after: 1 week
Yukon FE and Yukon Ultra2. These controllers provide a very simple
checksum computation mechanism that requires an additional pseudo
header checksum computation in the upper stack. Even though I couldn't
see much performance difference with/without Rx checksum offloading,
it may help notebook-based controllers.
The controller can actually compute two checksum values by giving
different starting positions for checksum computation on a received
frame. However, Marvell's checksum offloading engine has long been
known to have several silicon bugs, so don't blindly trust the
computed partial checksum value. Instead, compute the partial
checksum twice from the same starting position and compare the
results. If the values differ, it is a clear indication of a
hardware bug. This configuration loses IP checksum offloading
capability, but I think it's better to take the safe route.
Note, Rx checksum offloading for Yukon XL is still disabled due to
a known silicon bug.
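The compare-twice check described above might look roughly like this.
The struct rx_csum_sketch type and the function name are hypothetical.
How the driver actually reads the two values from the status descriptor
is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the two partial checksums reported for one frame. */
struct rx_csum_sketch {
	uint16_t csum1;
	uint16_t csum2;
};

/*
 * Both computations start at the same position, so the results must be
 * equal. A mismatch means the hardware miscomputed at least one of them
 * and the frame should be checksummed in software instead.
 */
static bool
rx_csum_trustworthy(const struct rx_csum_sketch *st)
{
	assert(st != NULL);
	return (st->csum1 == st->csum2);
}
```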
index of status block is read first before acknowledging the
interrupts. Otherwise bge(4) may get stale status block as
acknowledging an interrupt may yield another status block update.
Reviewed by: marius
starting from netgraph import in 1999.
netstat(8) used pointer to node as node address, oops. That didn't
work, we need the node ID in brackets to successfully address a node.
We can't look into ng_node, due to inability to include netgraph/netgraph.h
in userland code. So let the node leave a hint for userland, storing
the node ID in its private data.
MFC after: 2 weeks
address as well as the transport protocol port information
from the outbound packets. The routing code is generic and
compares every byte in the given sockaddr object. Therefore
the temporary sockaddr objects must be cleared due to padding
bytes. In addition, the port information must be stripped
or the route search will either fail or return the incorrect
route entry.
Unit testing is done using OpenVPN over the if_tun interface.
MFC after: 7 days
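Why the zeroing and port stripping matter can be shown with a small
userland sketch. The helper names are hypothetical, and the byte-wise
memcmp() only stands in for the generic routing comparison. On BSD
systems real code would also set sin_len, which is omitted here for
portability.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Build a lookup key that contains only the family and the address. */
static void
make_lookup_key(struct sockaddr_in *key, const struct sockaddr_in *from)
{
	assert(key != NULL && from != NULL);
	memset(key, 0, sizeof(*key));	/* clears padding and sin_port */
	key->sin_family = AF_INET;
	key->sin_addr = from->sin_addr;
}

/* Stand-in for a generic comparison that looks at every byte. */
static int
same_route_key(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

Two sockaddr objects for the same destination address compare unequal
when their ports or padding bytes differ, and compare equal once both
have been passed through make_lookup_key().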
no delayed checksum was added to the ip6 output code. This
meant that on cards that do not support SCTP checksum offload,
SCTP packets sent over IPv6 did NOT have the SCTP checksum
computed. Thus you could not communicate with a peer. This
adds the missing bits to make the checksum happen for these cards.
PR: 144529
MFC after: 2 weeks
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
(e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate
MFC after: 7 days
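The per-cpu statistics idea can be sketched as below. The names, the
fixed CPU count and the 64-byte cache line size are assumptions for
illustration, and the real flowtable code is organised differently.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCPU		4	/* assumed CPU count */
#define SKETCH_CACHE_LINE	64	/* assumed cache line size */

/* One slot per CPU, aligned so that slots never share a cache line. */
struct flow_stat_sketch {
	_Alignas(SKETCH_CACHE_LINE) uint64_t hits;
	uint64_t misses;
};

static struct flow_stat_sketch flow_stats[SKETCH_NCPU];

/* Each CPU updates only its own slot, so no line bounces between CPUs. */
static void
flow_stat_hit(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NCPU);
	flow_stats[cpu].hits++;
}

/* A reporting path (e.g. a sysctl handler) sums the slots on demand. */
static uint64_t
flow_stat_hits_total(void)
{
	uint64_t total = 0;

	for (int cpu = 0; cpu < SKETCH_NCPU; cpu++)
		total += flow_stats[cpu].hits;
	return (total);
}
```

The aggregate is only computed when it is asked for, which keeps the
hot path free of shared writes.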
o) Properly configure the CAM to handle IFF_PROMISC and note where IFF_ALLMULTI
handling would go if we didn't already force the NIC to receive all
multicast traffic.
Reviewed by: imp
Sponsored by: Packet Forensics
o) Inline octeon_rgmx_mark_ready into octeon_rgmx_init.
o) Add a media status handler that reports link and media status.
o) Set link state when if_init is called.
o) Remove some printfs related to driver state changes.
o) Remove some gratuitous comments.
Reviewed by: imp
Sponsored by: Packet Forensics
is issued at the beginning of the initial IN/OUT data transfers. Reason
unknown, probably firmware fault. Now the stall is only cleared on data
transfer errors.
PR: usb/144199
Submitted by: Hans Petter Selasky
other modules can generate USB descriptors.
- extend the vendor specific request function by one length pointer argument,
because not all descriptors store the length in the first byte. For example
HID descriptors.
Submitted by: Hans Petter Selasky
1) vm_machdep.c: remove the dangling allocations so they do not
un-necessarily turn off the cache upon consecutive access.
2) busdma_machdep.c: remove the same amount as was shadow mapped.
Reported by: Maks Verver
Submitted by: Mark Tinguely
Reviewed by: Grzegorz Bernacki
MFC after: 3 days
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.
Also, the RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.
This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed respectively. A similar approach is
already used in the if_tun device.
MFC after: 3 days
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.
Reviewed by: kib, jhb
given the advent of the extended family and extended model fields. The
values are printed in hex to match their common usage in documentation.
Submitted by: Alexander Best
MFC after: 1 week
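For reference, a sketch of how the extended fields combine with the base
fields of the CPUID signature (leaf 1, EAX). It follows the conditional
rule from the vendor manuals and is not necessarily the exact expression
this change uses. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Family: add the extended family only when the base family is 0xf. */
static unsigned int
cpuid_family(uint32_t sig)
{
	unsigned int base = (sig >> 8) & 0xf;
	unsigned int ext = (sig >> 20) & 0xff;

	return (base == 0xf ? base + ext : base);
}

/* Model: prepend the extended model for base family 0x6 or 0xf. */
static unsigned int
cpuid_model(uint32_t sig)
{
	unsigned int base_family = (sig >> 8) & 0xf;
	unsigned int model = (sig >> 4) & 0xf;
	unsigned int ext = (sig >> 16) & 0xf;

	if (base_family == 0x6 || base_family == 0xf)
		model |= ext << 4;
	assert(model <= 0xff);
	return (model);
}

static unsigned int
cpuid_stepping(uint32_t sig)
{
	return (sig & 0xf);
}
```

For the signature 0x00100f42 this yields family 0x10, model 0x4 and
stepping 2. For 0x000106a5 it yields family 0x6, model 0x1a and
stepping 5.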
- so_pcb is now guaranteed to be non-NULL and valid if a valid socket
reference is held.
- Need to check INP_TIMEWAIT and INP_DROPPED before assuming inp_ppcb is a
tcpcb, as it might be a tcptw or NULL otherwise.
- tp can never be NULL by the end of the function, so only check
TCPS_ESTABLISHED before extracting tcpcb fields.
The NFS server arguably incorporates too many assumptions about TCP
internals, but fixing that is left for another day.
MFC after: 1 week
Reviewed by: bz
Reviewed and tested by: rmacklem
Sponsored by: Juniper Networks
Also disable relaxed ordering as recommended by data sheet for
PCI-X devices. For PCI-X BCM5704, set maximum outstanding split
transactions to 0 as indicated by data sheet.
For BCM5703 in PCI-X mode, DMA read watermark should be less than
or equal to maximum read byte count configuration. Enforce this
limitation in DMA read watermark configuration.
chip revision often found in the blades and resulting in interfaces
not sensing carrier signal. Looking at all problem reports it
appears that it only affects some very specific silicon revision
(ASIC (0x57081021); Rev (B2)) and version of the PHY that
supports 1000baseSX-FDX media only. Therefore, narrow the scope of
workaround to combination of that revision and media type. Given
that the first report on this issue dates back to 2007, there is
not much hope that this issue will ever be properly resolved.
Among affected systems are IBM HS21, Intel SBXD132 and HP BL460c.
PR: 118238, 122551, 140970
MFC after: 1 month
These header files only provide functionality that can be used in
combination with libcompat. In order to prevent people from including
them without any actual use (which happens a lot with <sys/timeb.h>),
put a warning here to make people more aware.
This means we have to lower WARNS for libcompat, which is no big deal.
are referenced directly from the ivar pointer, as other buses
do. [1]
o changes exported prototypes. They no longer take struct siba_*
structures and use only device_t instead.
o removes duplicate code and debug messages.
o style(9)
Pointed out by: imp [1]
Setting the new sysctl MIB "debug.acpi.enable_debug_objects" to a non-zero
value makes the kernel print the Debug object when something is written to it.
- Allow users to disable interpreter slack mode. Setting the new tunable
"debug.acpi.interpreter_slack" to zero disables some workarounds for common
BIOS mistakes and enforces strict adherence to the ACPI specification.
processors. With this workaround, superpage promotion can be re-enabled
under virtualization. Moreover, machine check exceptions can safely be
enabled when FreeBSD is running natively on Family 10h processors.
Most of the credit should go to Andriy Gapon for diagnosing the error and
working with Borislav Petkov at AMD to document it. Andriy also reviewed
and tested my patches.
Discussed with: jhb
MFC after: 3 weeks
counting in incrementing the interrupt nesting level. This fixes a number
of bugs in which the interrupt thread could be preempted by an IPI,
indefinitely delaying acknowledgement of the interrupt to the PIC, causing
interrupt starvation and hangs.
Reported by: linimon
Reviewed by: marcel, jhb
MFC after: 1 week
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.
The other advantage of ECMP is for failover. The issue with the
current code is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.
Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.
MFC after: 5 days
pmc_flush_logfile is now non-blocking and just asks the kernel
to shut down the file. From that point, no more data is
accepted by the log thread and when the last buffer is flushed
the file is closed.
This removes a deadlock between pmcstat asking for a
flush and being unable to flush the pipe itself.
MFC after: 3 days
to not leak them, otherwise making UMA/vmstat unhappy with every stopped vnet.
We will still leak pages (especially for zones marked NOFREE).
Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.
Sponsored by: ISPsystem
Reviewed by: rwatson
MFC after: 5 days
violated: so_pcb can never be NULL for a valid UDP socket, and it is
always SOCK_DGRAM. Use sotoinpcb() as the rest of the UDP code does.
MFC after: 1 week
Reviewed by: bz
Sponsored by: Juniper Networks
On Linux, /proc/<pid>/fd is comparable to fdescfs, where it allows you
to inspect the file descriptors used by each process. Glibc's ttyname()
works by performing a readlink() on these nodes, since all nodes in this
directory are symlinks.
It is a bit hard to implement this in linprocfs right now, so I am not
going to bother. Add a way to make ttyname(3) work, by adding a
/proc/<pid>/fd symlink, which points to /dev/fd only if the calling
process matches. When fdescfs is mounted, this will cause the
readlink() in ttyname() to fail, causing it to fall back on manually
finding a matching node in /dev.
Discussed on: emulation@
been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was
guaranteed to be stable when a valid socket reference is held (as it is in
the output path).
MFC after: 1 week
Reviewed by: bz
Sponsored by: Juniper Networks
tcbinfo lock there: r175612, which re-added it, masked a race between
sonewconn(2) and accept(2) that could allow an incompletely initialized
address on a newly-created socket on a listen queue to be exposed. Full
details can be found in that commit message.
MFC after: 1 week
Sponsored by: Juniper Networks
radix table root nodes. This is only needed (and available)
in the virtualization case to free the resources when tearing
down a virtual network stack.
Sponsored by: ISPsystem
Reviewed by: julian, zec
MFC after: 5 days
to not leak them, making the VM subsystem unhappy with every stopped vnet(*).
We will still leak pages (especially as zones are marked NOFREE).
(*) This will also keep vmstat -z more usable.
Sponsored by: ISPsystem
MFC after: 5 days
or overflow the netisr queue and fall back to the interface
queue so that we can guarantee that the ifnet pointer stays
valid. Formerly we ended up with reference counts <= 0 in
case the netisr had returned ENOBUFS. The idea is to track
any packet in the netisr queue and only change the refcount
on edge operations for the fallback interface queue. This
also avoids problems in case the if_snd.ifq_len lies to us.
Also rework refcount assertions to make sure they trigger if
we go below 1. Formerly a negative reference count did not
trigger the assert as the refcount variable is u_int.
Sponsored by: ISPsystem
MFC after: 5 days
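The u_int pitfall mentioned above is easy to reproduce in plain C. Both
helper names are hypothetical and only illustrate the arithmetic. The
actual assertions in the kernel code are not shown here.

```c
#include <assert.h>
#include <limits.h>

/*
 * Dropping a reference from zero does not give -1: unsigned arithmetic
 * wraps to the largest value, so a "refcnt < 0" style check never fires.
 */
static unsigned int
unchecked_release(unsigned int refcnt)
{
	return (refcnt - 1);
}

/* Checking for "at least 1" before the decrement catches the underflow. */
static unsigned int
checked_release(unsigned int refcnt)
{
	assert(refcnt >= 1);
	return (refcnt - 1);
}
```

Calling checked_release(0) aborts, while unchecked_release(0) silently
returns UINT_MAX.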
than spinning forever. This fixes booting with CF ejected.
NB: I've made the driver pretty chatty about errors in case there's hardware
that operates differently to mine, so we can easily track down any issues.
Reviewed by: imp
Sponsored by: Packet Forensics
redundant implementations.
o) Use ABI, not ISA, to determine address length.
o) Disable and restore interrupts around any operation that uses all 64 bits of
a register. In kernels using the O32 ABI, the upper 32 bits of those
registers are likely to be corrupted by an interrupt.
Sponsored by: Packet Forensics
In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
point in calling such a function).
This allows moving the handling of the machdep.lapic_allclocks tunable,
which is retained, into common x86 code.