This regression was introduced in r213323.
There are probably no Intel cpus that support amd64 mode, but do not
support cpuid level 4, but it's better to keep i386 and amd64 versions
of this code in sync.
Discovered by: pho
Tested by: pho
MFC after: 2 weeks
the recent changes to track BAR state explicitly. The code would now
attempt to add the same BAR twice in this case. Instead, change this so
that it recognizes this case and only adds it once and do not delete the
BAR outright after parsing the CIS.
Tested by: bschmidt
chnage is different to the one suggested in the PR to try to avoid
cluttering the man page too much.
PR: docs/154494
Submitted by: kilian <kilian.klimek googlemail.com>
MFC after: 1 week
struct inpcbgroup. pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.
Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups. During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock. By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details). This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).
Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems". However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.
Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies. Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.
Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect. In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).
Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.
Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
Options for DNS Configuration) into rtadvd(8) and rtsold(8). DNS
information received by rtsold(8) will go to resolv.conf(5) by
resolvconf(8) script. This is based on work by J.R. Oldroyd (kern/156259)
but revised extensively[1].
- rtadvd(8) now supports "noifprefix" to disable gathering on-link prefixes
from interfaces when no "addr" is specified[2]. An entry in rtadvd.conf
with "noifprefix" + no "addr" generates an RA message with no prefix
information option.
- rtadvd(8) now supports RTM_IFANNOUNCE message to fix crashes when an
interface is added or removed.
- Correct bogus ND_OPT_ROUTE_INFO value to one in RFC 4191.
Reviewed by: bz[1]
PR: kern/156259 [1]
PR: bin/152458 [2]
an explicit action for INET6 configuration happens. The changes are:
1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl,
setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers
an attempt to clear the ND6_IFF_IFDISABLED flag.
2. When an AF_INET6 address is added successfully to an interface and
it is marked as ND6_IFF_IFDISABLED, an attempt to clear the
ND6_IFF_IFDISABLED happens.
This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8);
in most cases manual configuration is no longer needed.
- When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to
an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure
a link-local address.
This change ensures link-local address configuration when "ifconfig IF inet6"
command is invoked. For example, "ifconfig IF inet6 auto_linklocal" now
always try to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already
set to 1 (i.e. down/up cycle is no longer needed).
Reviewed by: bz
- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
This controls if accepting a route in an RA message as the default route.
The default value for each interface can be set by net.inet6.ip6.no_radr.
The system wide default value is 0.
- A new sysctl: net.inet6.ip6.norbit_raif. This controls if setting R-bit in
NA on RA accepting interfaces. The default is 0 (R-bit is set based on
net.inet6.ip6.forwarding).
Background:
IPv6 host/router model suggests a router sends an RA and a host accepts it for
router discovery. Because of that, KAME implementation does not allow
accepting RAs when net.inet6.ip6.forwarding=1. Accepting RAs on a router can
make the routing table confused since it can change the default router
unintentionally.
However, in practice there are cases where we cannot distinguish a host from
a router clearly. For example, a customer edge router often works as a host
against the ISP, and as a router against the LAN at the same time. Another
example is a complex network configurations like an L2TP tunnel for IPv6
connection to Internet over an Ethernet link with another native IPv6 subnet.
In this case, the physical interface for the native IPv6 subnet works as a
host, and the pseudo-interface for L2TP works as the default IP forwarding
route.
Problem:
Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
accepting them when net.inet6.ip6.forward=0 cause the following practical
issues:
- A router cannot perform SLAAC. It becomes a problem if a box has
multiple interfaces and you want to use SLAAC on some of them, for
example. A customer edge router for IPv6 Internet access service
using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
physical interface for administration purpose; updating firmware
and so on (link-local addresses can be used there, but GUAs by
SLAAC are often used for scalability).
- When a host has multiple IPv6 interfaces and it receives multiple RAs on
them, controlling the default route is difficult. Router preferences
defined in RFC 4191 works only when the routers on the links are
under your control.
Details of Implementation Changes:
Router Advertisement messages will be accepted even when
net.inet6.ip6.forwarding=1. More precisely, the conditions are as
follow:
(ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
=> Normal RA processing on that interface. (as IPv6 host)
(ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
=> Accept RA but add the router to the defroute list with
rtlifetime=0 unconditionally. This effectively prevents
from setting the received router address as the box's
default route.
(!ACCEPT_RTADV)
=> No RA processing on that interface.
ACCEPT_RTADV and NO_RADR are per-interface knob. In short, all interface
are classified as "RA-accepting" or not. An RA-accepting interface always
processes RA messages regardless of ip6.forwarding. The difference caused by
NO_RADR or ip6.forwarding is whether the RA source address is considered as
the default router or not.
R-bit in NA on the RA accepting interfaces is set based on
net.inet6.ip6.forwarding. While RFC 6204 W-1 rule (for CPE case) suggests
a router should disable the R-bit completely even when the box has
net.inet6.ip6.forwarding=1, I believe there is no technical reason with
doing so. This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
(the default is 0).
Usage:
# ifconfig fxp0 inet6 accept_rtadv
=> accept RA on fxp0
# ifconfig fxp0 inet6 accept_rtadv no_radr
=> accept RA on fxp0 but ignore default route information in it.
# sysctl net.inet6.ip6.norbit_no_radr=1
=> R-bit in NAs on RA accepting interfaces will always be set to 0.
o boot/mfsroot.gz is no more. Copy it only when it exists so as still
to be compatible with Makefile.sysinstall.
o while here, make ispfw.ko optional as well.
o '-b bootimage' is not a valid argument for makefs. What was meant
was '-o bootimage'.
o create the boot image in the current directory so that makefs can
find the file. Previously it had to be created under $BASE because
that's how mkisofs wanted it.
Eliminate one (of several) possible conflicting buffer locks when
trying to reclaim blocks. Rest of fix to be incorporated as part
of SUJ update by jeff.
Pointed out by: Kostik Belousov
should be ok, since the client now delays NFSv4 Close operations
until VOP_INACTIVE()/VOP_RECLAIM(). As such, there should be no
risk that the NFSv4 Open is closed while an associated byte range lock
still exists.
Tested by: avg
MFC after: 2 weeks
"p_leader" for the "id" for POSIX byte range locking. I think
this would only have affected processes created by rfork(2)
with the RFTHREAD flag specified. This patch fixes that by
passing the "id" down through the various functions from
nfs_advlock().
MFC after: 2 weeks
If the here-document is attached to a compound command or subshell, $?
already works properly. This is both a workaround for bin/41410 and a
requirement for a true fix for bin/41410.
PR: bin/41410
MFC after: 1 week
type of a software- or hardware-generated hash held in the
mbuf.m_pkthdr.flowid field, and provide accessor macros to easily
clear, set, receive, and test for hash values. Some of these
constants correspond to RSS hash types, but we don't want to limit
ourselves to that, as a number of other hashing techniques are in
use on hardware supported by FreeBSD.
Mark the M_FLOWID flag as deprecated; I hope to remove this before
9.0, changing drivers and the stack over to using the new
M_HASHTYPEBITS, most likely to use M_HASHTYPE_OPAQUE as we don't yet
want to nail down the KPI for RSS key/bucket management for device
drivers.
MFC after: 3 days
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
Ignoring the parameter with the unknown options is unlikely to be what was
intended.
Example:
find -n .
Note that things like
find -n
already caused an exit, equivalent to "find" by itself.
hash install, etc. For now, these are arguments are unused, but as we add
RSS support, we will want to use hashes extracted from mbufs, rather than
manually calculated hashes of header fields, due to the expensive of the
software version of Toeplitz (and similar hashes).
Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.
(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI. However, it won't be useful without
other previous possibly less MFCable changes.)
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
for lookup. I missed its call to in_pcbbind() when preparing previous
patches, which would lead to a lock assertion failure (although problem
not an actual race condition due to global pcbinfo locks providing
required synchronisation -- in this particular case only). This change
adds the missing locking of the pcbhash lock.
(Existing comments in the ipdivert code question the need for using the
global hash to manage the namespace, as really it's a simple port
namespace and not an address/port namespace. Also, although in_pcbbind
is used to manage reservations, the hash tables aren't used for lookup.
It might be a good idea to make them use hashed lookup, or to use a
different reservation scheme.)
Reviewed by: bz
Reported by: Kristof Provost <kristof at sigsegv.be>
Sponsored by: Juniper Networks
checks for collision/non-collision properties in binding them. This
test would have identified a bug recently reported on current@
involding my disaggregation of the pcbinfo lock.
It would be nice if this test also exercised packet diversion and
injection, but that is for another day.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
sending. What happens otherwise is that the sender splits all the
traffic into 32k chunks, while the receiver is waiting for the whole
packet. Then for a certain packet sizes, particularly 66607 bytes in
my case, the communication stucks to secondary is expecting to
read one chunk of 66607 bytes, while primary is sending two chunks
of 32768 bytes and third chunk of 1071. Probably due to TCP windowing
and buffering the final chunk gets stuck somewhere, so neither server
not client can make any progress.
This patch also protect from short reads, as according to the manual
page there are some cases when MSG_WAITALL can give less data than
expected.
MFC after: 3 days