packets. Reimplement this correctly and use a sysctl that defaults to off so
the user doesnt get any suprises if ipfw blocks the ARP packet.
MFC after: 3 days
m_pkthdr.ether_vlan. The presence of the M_VLANTAG flag on the mbuf
signifies the presence and validity of its content.
Drivers that support hardware VLAN tag stripping fill in the received
VLAN tag (containing both vlan and priority information) into the
ether_vtag mbuf packet header field:
m->m_pkthdr.ether_vtag = vlan_id; /* ntohs()? */
m->m_flags |= M_VLANTAG;
to mark the packet m with the specified VLAN tag.
On output the driver should check the mbuf for the M_VLANTAG flag to
see if a VLAN tag is present and valid:
if (m->m_flags & M_VLANTAG) {
... = m->m_pkthdr.ether_vtag; /* htons()? */
... pass tag to hardware ...
}
VLAN tags are stored in host byte order. Byte swapping may be necessary.
(Note: This driver conversion was mechanic and did not add or remove any
byte swapping in the drivers.)
Remove zone_mtag_vlan UMA zone and MTAG_VLAN definition. No more tag
memory allocation have to be done.
Reviewed by: thompsa, yar
Sponsored by: TCP/IP Optimization Fundraise 2005
o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6
o add CSUM_TSO flag to mbuf pkthdr csum_flags field
o add tso_segsz field to mbuf pkthdr
o enhance ip_output() packet length check to allow for large TSO packets
o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities
o adjust all callers of tcp_maxmtu[46]() accordingly
Discussed on: -current, -net
Sponsored by: TCP/IP Optimization Fundraise 2005
and skip over the normal IP processing.
Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.
PR: kern/99558
Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
by restoring the ifv_proto field in the vlan softc and putting it to use
this time. It's a good companion for ifv_encaplen, which has already been
used throughout this driver.
before tagging them. This can help to work around brain-damage in some
switches that fail to pad a frame after untagging it if its length drops
below the minimum. This option is blessed by IEEE Std 802.1Q (2003 Ed.),
paragraph C.4.4.3.b. It's controlled by sysctl net.link.vlan.soft_pad.
Idea by: az
MFC after: 1 week
Two almost identical patches based on the if_tap work were submitted
via GNATS; I started out with the patch in 100796 from David Gilbert,
but could have easily started with the patch from Vilmos Nebehaj which
I found only later.
MFC after: 1 week
PR: 93976, 100796
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.
Longer term we may want to kill off if_name() entierly since all modern
BSDs have if_xname variables rendering it unnecessicary.
interface, do not just assign -1 to tag because it breaks the logic of
the code to follow. The better way is to handle this case as an unsupported
protocol and return unless INVARIANTS is in effect and we can panic.
Panic is good there because the scenario can happen only because of a
coding error elsewhere.
We also should show the interface name in the panic message for easier
debugging of the problem, should it ever emerge.
Submitted by: qingli (initially)
as it tried to solve:
- it smuggled hidden 802.1q details into otherwise protocol-neutral code;
- it put an important code consistency check under DEBUG, which was never
defined by anyone but a developer hacking this file for the moment;
- lastly, the former bcopy() call had been correct as long as the "dead"
code was there.
(A new version of the fix for tag of -1 to come in the next commit.)
Agreed by: qingli
vlan tag processing, the code will use bcopy() to remove the vlan
tag field but the code copies 2 bytes too many, which essentially
overwrites the protocol type field.
Also, a tag value of -1 is generated for unrecognized interface type,
which would cause an invalid memory access in the vlans[] array.
In addition, removed a line of dead code and its associated comments.
Reviewed by: sam
take a timeval indicating when the packet was captured. Move
microtime() to the calling functions and grab the timestamp as soon
as we know that we're going to call catchpacket at least once.
This means that we call microtime() once per matched packet, as
opposed to once per matched packet per bpf listener. It also means
that we return the same timestamp to all bpf listeners, rather than
slightly different ones.
It would be more accurate to call microtime() even earlier for all
packets, as you have to grab (1+#listener) locks before you can
determine if the packet will be logged. You could always grab a
timestamp before the locks, but microtime() can be costly, so this
didn't seem like a good idea.
(I guess most ethernet interfaces will have a bpf listener these
days because of dhclient. That means that we could be doing two bpf
locks on most packets going through the interface.)
PR: 71711
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket. pru_abort is now a
notification of close also, and no longer detaches. pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket. This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.
This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree(). With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.
Reviewed by: gnn
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)
Reviewed by: arch@
already locked. The reason to do this is to avoid two lock+unlock operations
in a row. We need the lock here to serialize access to bd_pid for stats
collection purposes.
Drop the locks all together on detach, as they will be picked up by
knlist_remove.
This should fix a failed locking assertion when kqueue is being used with bpf
descriptors.
Discussed with: jmg
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures. Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.
Suggested by: glebius, jhb, ru
Previously, another thread could get a pointer to the
interface by scanning the system-wide list and sleep
on the global vlan mutex held by vlan_unconfig().
The interface was gone by the time the other thread
woke up.
In order to be able to call vlan_unconfig() on a detached
interface, remove the purely cosmetic bzero'ing of IF_LLADDR
from the function because a detached interface has no addresses.
Noticed by: a stress-testing script by maxim
Reviewed by: glebius
tested and then set. [1]
Reorganise things to eliminate this, we now ensure that enc0 can not be
destroyed which as the benefit of no longer needing to lock in
ipsec_filter and ipsec_bpf. The cloner will create one interface during the
init so we can guarantee that encif will be valid before any SPD entries are
added to ipsec.
Spotted by: glebius [1]
encryption. There are two functions, a bpf tap which has a basic header with
the SPI number which our current tcpdump knows how to display, and handoff to
pfil(9) for packet filtering.
Obtained from: OpenBSD
Based on: kern/94829
No objections: arch, net
MFC after: 1 month
departing trunk so that we don't get into trouble later
by dereferencing a stale pointer to dead trunk's things.
Prodded by: oleg
Sponsored by: RiNet (Cronyx Plus LLC)
MFC after: 1 week
list.
- First remove from global list, then start destroying.
PR: kern/97679
Submitted by: Alex Lyashkov <shadow itt.net.ru>
Reviewed by: rwatson, brooks
order to - for example - apply firewall rules to a whole group of
interfaces. This is required for importing pf from OpenBSD 3.9
Obtained from: OpenBSD (with changes)
Discussed on: -net (back in April)
pointer to a zeroed, statically allocated bpf_if structure. This way the
LIST_EMPTY() macro will always return true. This allows us to remove the
additional unconditional memory reference for each packet in the fast path.
Discussed with: sam
255.255.255.0, and a default route with gateway x.x.x.1. Now if
the address mask is changed to something more specific, e.g.,
255.255.255.128, then after the mask change the default gateway
is no longer reachable.
Since the default route is still present in the routing table,
when the output code tries to resolve the address of the default
gateway in function rt_check(), again, the default route will be
returned by rtalloc1(). Because the lock is currently held on the
rtentry structure, one more attempt to hold the lock will trigger
a crash due to "lock recursed on non-recursive mutex ..."
This is a general problem. The fix checks for the above condition
so that an existing route entry is not mistaken for a new cloned
route. Approriately, an ENETUNREACH error is returned back to the
caller
Approved by: andre
resulting in some build failures. Instead, to fix the problem of bpf not
being present, check the pointer before dereferencing it.
This is a temporary bandaid until we can decide on how we want to handle
the bpf code not being present. This will be fixed shortly.
(1) bpf peer attaches to interface netif0
(2) Packet is received by netif0
(3) ifp->if_bpf pointer is checked and handed off to bpf
(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
initialized to NULL.
(5) ifp->if_bpf is dereferenced by bpf machinery
(6) Kaboom
This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.
Summary of changes:
- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
bpf interface structure. Once this is done, ifp->if_bpf should never be
NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
a lockless read bpf peer list associated with the interface. It should
be noted that the bpf code will pickup the bpf_interface lock before adding
or removing bpf peers. This should serialize the access to the bpf descriptor
list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present
Now what happens is:
(1) Packet is received by netif0
(2) Check to see if bpf descriptor list is empty
(3) Pickup the bpf interface lock
(4) Hand packet off to process
From the attach/detach side:
(1) Pickup the bpf interface lock
(2) Add/remove from bpf descriptor list
Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).
[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
not we have any bpf peers that might be interested in receiving packets.
In collaboration with: sam@
MFC after: 1 month
result, raw_uabort() now needs to call raw_detach() directly. As
raw_uabort() is never called, and raw_disconnect() is probably not ever
actually called in practice, this is likely not a functional change, but
improves congruence between protocols, and avoids a NULL raw cb pointer
after disconnect, which could result in a panic.
MFC after: 1 month
notification so all interfaces including pseudo are reported. When netif
creates the clones at startup devctl_disable has not been turned off yet so the
interfaces will not be initialised twice, enforce this by adding an explicit
order between rc.d/netif and rc.d/devd.
This change allows actions to taken in userland when an interface is cloned
and the pseudo interface will be automatically configured if a ifconfig_<int>=""
line exists in rc.conf.
Reviewed by: brooks
No objections on: net
for IOCTLs where casting data to intptr_t * isn't the right thing to do
as _IO() isn't used for them but _IOR(..., int)/_IOW(..., int) are (i.e.
for all IOCTLs except VMIO_SIOCSIFFLAGS), fixing tap(4) on big-endian
LP64 machines.
PR: sparc64/98084
OK'ed by: emax
MFC after: 1 week
actually destroyed. Also move call to knlist_init() into tapcreate(). This
should fix panic described in kern/95357.
PR: kern/95357
No response from: freebsd-current@
MFC after: 3 days
gateways which are unreachable except through the default router. For
example, assuming there is a default route configured, and inserting
a route
"route add 64.102.54.0/24 60.80.1.1"
is currently allowed even when 60.80.1.1 is only reachable through
the default route. However, an error is thrown when this route is
utilized, say,
"ping 64.102.54.1" will return an error
This type of route insertion should be disallowed becasue:
1) Let's say that somehow our code allowed this packet to flow to
the default router, and the default router knows the next hop is
60.80.1.1, then the question is why bother inserting this route in
the 1st place, just simply use the default route.
2) Since we're not talking about source routing here, the default
router could very well choose a different path than using 60.80.1.1
for the next hop, again it defeats the purpose of adding this route.
Reviewed by: ru, gnn, bz
Approved by: andre
destination. These checks are needed so we do not install
a route looking like this:
(0) 192.0.2.200 UH tun0 =>
When removing this route the kernel will start to walk
the address space which looks like a hang on 64bit platforms
because it'll take ages while on 32bit you should see a panic
when kernel debugging options are turned on.
The problem is in rtrequest1:
if (netmask) {
rt_maskedcopy(dst, ndst, netmask);
} else
bcopy(dst, ndst, dst->sa_len);
In both cases the len might be 0 if the application forgot to
set it. If so ndst will be all-zero leading to above
mentioned strange routes.
This is an application error but we must not fail/hang/panic
because of this.
Looks ok: gnn
No objections: net@ (silence)
MFC after: 8 weeks
The packet filter may reassemble the ip fragments and return a packet that is
larger than the MTU of the sending interface. There is no check for DF or icmp
replies as we can only get a large packet to fragment by reassembling a
previous fragment, and this only happens after a call to pfil(9).
Obtained from: OpenBSD (mostly)
Glanced at by: mlaier
MFC after: 1 month
protocols invoke after allocating a PCB, so so_pcb should be non-NULL.
It is only used by the two IPSEC implementations, so I didn't hit it in
my testing.
Reported by: pjd
MFC after: 3 months
the so_pcb pointer on the socket is always non-NULL. This eliminates
countless unnecessary error checks, replacing them with assertions.
MFC after: 3 months
rather than an error. Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.
soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF. so_pcb is now entirely owned and
managed by the protocol code. Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.
Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.
In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.
netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit. In their current state they may leak
memory or panic.
MFC after: 3 months
than an int, as an error here is not meaningful. Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.
This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit. This will be corrected shortly in followup
commits to these components.
MFC after: 3 months
details see PR kern/94448.
PR: kern/94448
Original patch: Eygene A. Ryabinkin <rea-fbsd at rea dot mbslab dot kiae dot ru>Final patch: thompsa@
Tested by: thompsa@, Eygene A. Ryabinkin
MFC after: 7 days
K&R style function declarations with ANSI style. Also fix endian bugs
accessing ioctl arguments that are passed by value.
PR: kern/93897
Submitted by: Vilmos Nebehaj < vili at huwico dot hu >
MFC after: 1 week
applications.
Open[BGP|OSPF]D make use of this to determine the link status of
interfaces to make the right routing descisions.
Obtained from: OpenBSD
MFC after: 3 days
- Allow RTM_CHANGE to change a number of route flags as specified by
RTF_FMASK.
- The unused rtm_use field in struct rt_msghdr is redesignated as
rtm_fmask field to communicate route flag changes in RTM_CHANGE
messages from userland. The use count of a route was moved to
rtm_rmx a long time ago. For source code compatibility reasons
a define of rtm_use to rtm_fmask is provided.
These changes faciliate running of multiple cooperating routing
daemons at the same time without causing undesired interference.
Open[BGP|OSPF]D make use of these features to have IGP routes
override EGP ones.
Obtained from: OpenBSD (claudio@)
MFC after: 3 days
will remain in the disabled state until another link event happens in the
future (if at all). Add a timer to periodically check the interface state and
recover.
Reported by: Nik Lam <freebsdnik j2d.lam.net.au>
MFC after: 3 days
re-organizing the monitor return logic. We perform interface monitoring
checks after we have determined if the CRC is still on the packet, if
it is, m_adj() is called which will adjust the packet length. This
ensures that we are not including CRC lengths in the byte counters for
each packet.
Discussed with: andre, glebius
that we might have address collisions, so make sure that this hardware address
isn't already in use on another bridge.
Submitted by: csjp
MFC after: 1 month
destination interface as a member of our bridge or this is a unicast packet,
push it through the bpf(4) machinery.
For broadcast or multicast packets, don't bother with the bpf(4) because it will
be re-injected into ether_input. We do this before we pass the packets through
the pfil(9) framework, as it is possible that pfil(9) will drop the packet or
possibly modify it, making it very difficult to debug firewall issues on the
bridge.
Further, implemented IFF_MONITOR for bridge interfaces. This does much the same
thing that it does for regular network interfaces: it pushes the packet to any
bpf(4) peers and then returns. This bypasses all of the bridge machinery,
saving mutex acquisitions, list traversals, and other operations performed by
the bridging code.
This change to the bridging code is useful in situations where individuals use a
bridge to multiplex RX/TX signals from two interfaces, as is required by some
network taps for de-multiplexing links and transmitting the RX/TX signals
out through two separate interfaces. This behaviour is quite common for network
taps monitoring links, especially for certain manufacturers.
Reviewed by: thompsa
MFC after: 1 month
Sponsored by: Seccuris Labs
- use the cu_bridge_id rather than the cu_rootid for the bridge address [1]
- the memcmp return value is not signed so the wrong interface may have been
selected
- fix up the calculation of sc_bridge_id
PR: kern/93909 [1]
MFC after: 3 days
and want to have crypto support loaded as KLD. By moving zlib to separate
module and adding MODULE_DEPEND directives, it is possible to use such
configuration without complication. Otherwise, since IPSEC is linked with
zlib (just like crypto.ko) you'll get following error:
interface zlib.1 already present in the KLD 'kernel'!
Approved by: cognet (mentor)
zero in this case.) A kernel driver has IFF_DRV_RUNNING
at its full disposal while IFF_UP may be toggled only by
humans or their daemonic deputies from the userland.
MFC after: 3 days
did not stop at the right node. Change the backtracking check from
smaller-than to smaller-or-equal to prevent this from happening.
While here fix one additional problem where the insertion of the
default route traversed the entire tree.
PR: kern/38752
Submitted by: qingli (before I became committer)
Reviewed by: andre
MFC after: 3 days
filtering mechanisms to use the new rwlock(9) locking API:
- Drop the variables stored in the phil_head structure which were specific to
conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
macros into pfil.h
- Move pfil list locking macros intp phil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
hooks to be ran by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
hooks to be ran, and returns if not. This check is already performed by the
IP stacks when they call:
if (!PFIL_HOOKED(ph))
goto skip_hooks;
- Drop in assertion which makes sure that the number of hooks never drops
below 0 for good measure. This in theory should never happen, and if it
does than there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
API instead of our home rolled version
- Convert the inlined functions to macros
Reviewed by: mlaier, andre, glebius
Thanks to: jhb for the new locking API
- code expects memcmp() to return a signed value, our memcmp() returns 0 if
args are equal and > 0 if not.
- It's possible to hijack interface for static entry. If bridge recieves
packet from interface marked as learning it will replace the bridge_rtnode
entry for the source address even if such entry marked as static.
Submitted by: Gleb Kurtsov <k-gleb yandex.ru>
MFC after: 3 days
beginning and simply refuse to attach to a parent without either
flag.
Our network stack cannot handle well IFF_BROADCAST or IFF_MULTICAST
on an interface changing on the fly. E.g., IP will or won't assign
a broadcast address to an interface and join the all-hosts multicast
group on it depending on its IFF_BROADCAST and IFF_MULTICAST settings.
Should the flags alter later, IP will miss the change and keep using
bogus settings. This can lead to evil things like supplying an
invalid broadcast address or trying to leave a multicast group that
hasn't been joined. So just avoid touching the flags since an
interface was created. This has no practical purpose.
Discussed with: -net, glebius, oleg
MFC after: 1 week
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.
The most important changes:
o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.
In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>
however IPv4-in-IPv4 tunnels are now stable on SMP. Details:
- Add per-softc mutex.
- Hold the mutex on output.
The main problem was the rtentry, placed in softc. It could be
freed by ip_output(). Meanwhile, another thread being in
in_gif_output() can read and write this rtentry.
Reported by: many
Tested by: Alexander Shiryaev <aixp mail.ru>
Vararg functions have a different calling convention than regular
functions on amd64. Casting a varag function to a regular one to
match the function pointer declaration will hide the varargs from
the caller and we will end up with an incorrectly setup stack.
Entirely remove the varargs from these functions and change the
functions to match the declaration of the function pointers.
Remove the now unnecessary casts.
Lots of explanations and help from: peter
Reviewed by: peter
PR: amd64/89261
MFC after: 6 days
may have changed by m_pullup() during fastforward processing.
While this is a bug it is actually never triggered in real world
situations and it is not remotely exploitable.
Found by: Coverity Prevent(tm)
Coverity ID: CID780
Sponsored by: TCP/IP Optimization Fundraise 2005
restored when its removed from the bridge.
At the moment we only clear IFCAP_TXCSUM. Since a locally generated packet on
the bridge may be sent out any one or more interfaces it cant be assumed that
every card does hardware csums. Most bridges don't generate a lot of traffic
themselves so turning off offloading won't hurt, bridged packets are
unaffected.
Tested by: Bruce Walker (bmw borderware.com)
MFC after: 5 days
ef_clone(); we were testing the original ifnet, not the one allocated.
When aborting ef_clone() due to if_alloc() failing, free the allocated
efnet structure rather than leaking it.
Noticed by: Coverity Prevent analysis tool
MFC after: 3 days
SLIST_FOREACH_SAFE() rather than SLIST_FOREACH(), as elements are
freed on each iteration of the loop. This prevents use-after-free.
Noticed by: Coverity Prevent analysis tool
MFC after: 3 days
attempted to cast a struct ifnet to a struct fw_com which resulted in
data corruption.
PR: kern/91307
Submitted by: Alex Semenyaka <alex at semenyaka do ru>
MFC After: 6 days
ETHERTYPE_IPV6 frames. Change this to be a sysctl knob so that is able to still
bridge non-IP packets if desired.
Also return early if all pfil_* sysctls are turned off, the user obviously does
not want to filter on the bridge.
interfaces to bridges, which will then send and receive IP protocol 97 packets.
Packets are Ethernet frames with an EtherIP header prepended.
Obtained from: NetBSD
MFC after: 2 weeks
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
incorrectly in almost all drivers. Indicate failure with
mbuf value of NULL.
In collaboration with: yongari, ru, sam
span ports when they disappear. The span port does not have a pointer to the
softc so revert r1.31 and bring back the softc linked-list.
MFC after: 2 weeks
Use the following kernel configuration option to enable:
options BPF_JITTER
If you want to use bpf_filter() instead (e. g., debugging), do:
sysctl net.bpf.jitter.enable=0
to turn it off.
Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) no need, 2) avoid expensive m_copydata(9).
Obtained from: WinPcap 3.1 (for i386)
- In ifc_name2unit(), disallow leading zeroes in a unit.
Exploit: ifconfig lo01 create
- In ifc_name2unit(), properly handle overflows. Otherwise,
either of two local panic()'s can occur, either because
no interface with such a name could be found after it was
successfully created, or because the code will bogusly
assume that it's a wildcard (unit < 0 due to overflow).
Exploit: ifconfig lo<overflowed_integer> create
- Previous revision made the following sequence trigger
a KASSERT() failure in queue(3):
Exploit: ifconfig lo0 destroy; ifconfig lo0 destroy
This is because IFC_IFLIST_REMOVE() is always called
before ifc->ifc_destroy() has been run, not accounting
for the fact that the latter can fail and leave the
interface operating (like is the case for "lo0").
So we ended up calling LIST_REMOVE() twice. We cannot
defer IFC_IFLIST_REMOVE() until after a call to
ifc->ifc_destroy() because the ifnet may have been
removed and its memory has been freed, so recover from
this by re-inserting the ifnet in the cloned interfaces
list if ifc->ifc_destroy() indicates a failure.
If the packet is rejected from pfil(9) then continue the loop rather than
returning, this means that we can still try to send it out the remaining
interfaces but more importantly the mbuf is freed and refcount decremented on
exit.
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
copy of Ethernet address.
- Change iso88025_ifattach() and fddi_ifattach() to accept MAC
address as an argument, similar to ether_ifattach(), to make
this work.
softc lists and associated mutex are now unused so these have been removed.
Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.
Idea by: brooks
Reviewed by: brooks
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it. It only adds a layer of confusion. The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.
Non-native code is not changed in this commit. For compatibility
MT_HEADER is mapped to MT_DATA.
Sponsored by: TCP/IP Optimization Fundraise 2005
promisc flag from the member interface, this is a no-op anyway since the
interface is disappearing. The driver may have already released
its resources such as miibus and this is likely to panic the kernel.
Submitted and tested by: Wojciech A. Koszek
MFC after: 2 weeks
OR the flags bits with the driver managed status flags. This fixes an
issue where RUNNING flags would not be reported to processes, which
conflicts with the flags information provided by ifconfig(8).
that the following funtions are not used, wrap in '#ifdef noused' for the
moment.
bstp_enable_change_detection
bstp_disable_change_detection
bstp_set_bridge_priority
bstp_set_port_priority
bstp_set_path_cost
- move the function pointer definitions to if_bridgevar.h
- move most of the logic to the new BRIDGE_INPUT and BRIDGE_OUTPUT macros
- remove unneeded functions from if_bridgevar.h and sort a little.
Use bridge_ifdetach() to notify the bridge that a member has been detached. The
bridge can then remove it from its interface list and not try to send out via a
dead pointer.
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.
Assert that all cloners have been destroyed when freeing the memory.
Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.
Discussed with: brooks, pjd, -current
Reviewed by: brooks
http://lists.freebsd.org/pipermail/cvs-src/2004-October/033496.html
The same problem applies to if_bridge(4), too.
- Copy-and-paste the if_bridge(4) related block from
if_ethersubr.c to ng_ether.c
- Add XXXs, so that copy-and-paste would be noticed by
any future editors of this code.
- Also add XXXs near if_bridge(4) declarations.
Silence from: thompsa
module name to something that wouldn't conflict with
sys/dev/firewire/firewire.c.
Submitted by: Cai, Quanqing <caiquanqing at gmail dot com>
PR: kern/82727
MFC after: 3 days
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
can be compiled as loadable modules.
Reviewed by: bde
to the parent interface, such as IFF_PROMISC and
IFF_ALLMULTI. In addition, vlan(4) gains ability
to migrate from one parent to another w/o losing
its own flags.
PR: kern/81978
MFC after: 2 weeks
as it is done for usual promiscuous mode already. This info is important
because promiscuous mode in the hands of a malicious party can jeopardize
the whole network.
o Axe poll in trap.
o Axe IFF_POLLING flag from if_flags.
o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.
o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.
o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.
Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.
Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.
Reviewed by: ru, sam, jhb
replacement and has additional features which make it superior.
Discussed on: -arch
Reviewed by: thompsa
X-MFC-after: never (RELENG_6 as transition period)
copied mbuf, which keeps the IP header 32-bit aligned. This copied mbuf is
reinjected back into ether_input and off to the IP routines.
Reported and tested by: Peter van Dijk
Approved by: mlaier (mentor)
MFC after: 3 days
- Rearrange code so that in a case of failure the affected
route is not changed. Otherwise, a bogus rtentry will be
left and later rt_check() can recurse on its lock. [1]
- Remove comment about protocol cloning.
- Fix two places where rtentry mutex was recursed on, because
accessed via two different pointers, that were actually pointing
to the same rtentry in some cases. [1]
- Return EADDRINUSE instead of bogus EDQUOT, in case when gateway
uses the same route. [2]
Reported & tested by: ps, Andrej Zverev <az inec.ru> [1]
PR: kern/64090 [2]
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:
- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
protocol to perform early tear-down on the interface early in
if_detach().
- Annotate that if_detach() needs careful consideration.
- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
this is not the place to detect interface removal! This also
removes what is basically a nasty (and now unnecessary) hack.
- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
IPv4 sockets.
It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.
MFC after: 3 days
- Use __func__ consistently instead of copying function name
to message strings. Code tends to migrate around source files.
- DIAGNOSTIC is for information, INVARIANTS is for panics.
m_tag_locate(). This adds little overhead of a simple
bitwise operation in case hardware VLAN acceleration
is on, yet saves the more expensive function call if
the acceleration is off.
Reviewed by: ru, glebius
X-MFC-after: 6.0
locks were not aquired because the user buffers were not wired, thus it was
possible that that SYSCTL_OUT could sleep, causing a number of different
problems such as lock ordering issues and dead locks.
-Wire user supplied buffer to ensure SYSCTL_OUT will not sleep.
-Pickup ifnet locks to protect the list.
-Where applicable pickup address locks.
-Pickup radix node head locks.
-Remove splnet stubs
-Remove various comments about locking here, because they are no
longer needed.
It is the hope that these changes will make sysctl_rtsock MP safe.
MFC after: 3 weeks
assigned to the interface.
IPv6 auto-configuration is disabled. An IPv6 link-local address has a
link-local scope within one link, the spec is unclear for the bridge case and
it may cause scope violation.
An address can be assigned in the usual way;
ifconfig bridge0 inet6 xxxx:...
Tested by: bmah
Reviewed by: ume (netinet6)
Approved by: mlaier (mentor)
MFC after: 1 week
refresh the PID which has the descriptor open. The PID is refreshed in various
operations like ioctl(2), kevent(2) or poll(2). This produces more accurate
information about current bpf consumers. While we are here remove the bd_pcomm
member of the bpf stats structure because now that we have an accurate PID we
can lookup the via the kern.proc.pid sysctl variable. This is the trick that
NetBSD decided to use to deal with this issue.
Special care needs to be taken when MFC'ing this change, as we have made a
change to the bpf stats structure. What will end up happening is we will leave
the pcomm structure but just mark it as being un-used. This way we keep the ABI
in tact.
MFC after: 1 month
Discussed with: Rui Paulo < rpaulo at NetBSD dot org >
value from an m_tag, but also to set it. This reduces
complex code duplication and improves its readability.
Alas, we shouldn't rename the macro to VLAN_TAG_LVALUE()
globally because that would cause pain for kernel module
port maintainers and vendors using FreeBSD as their codebase.
Added a clarifying comment instead.
Discussed with: ru, glebius
X-MFC-After: 6.0-RELEASE (MFC is good just to reduce the diff)
attached.
This is caused by bpf_detachd clearing IFF_PROMISC on the interface which does
a SIOCSIFFLAGS ioctl. The problem here is that while the interface has been
stopped, IFF_UP has not been cleared so IFF_UP != IFF_DRV_RUNNING, this causes
the ioctl function to init() the interface which resets the callouts.
The destroy then completes and frees the softc but softclock will panic on a
dead callout pointer.
Ensure ifp->if_flags matches reality by clearing IFF_UP when we destroy.
Silence from: rwatson
Approved by: mlaier (mentor)
MFC after: 3 days
through locking; leave some spl references around code where there
are open questions about global variable references. Also, add
an XXX regarding locking in sysctl.
MFC after: 3 days
actually 1514, so comparing the mbuf length which includes the Ethernet header
to the interface MTU is wrong.
The check was a little over the top so just remove it.
Approved by: mlaier (mentor)
MFC after: 3 days
enhance the security of bpf(4) by further relinquishing the privilege of
the bpf(4) consumer (assuming the ioctl commands are being implemented).
Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underly parameters of the
bpf(4) device. An example might be the setting of bpf(4) filter programs or
attaching to different network interfaces.
BIOCSETWF can be used to set write filters for outgoing packets. Currently if
a bpf(4) consumer is compromised, the bpf(4) descriptor can essentially be used
as a raw socket, regardless of consumer's UID. Write filters give users the
ability to constrain which packets can be sent through the bpf(4) descriptor.
These features are currently implemented by a couple programs which came from
OpenBSD, such as the new dhclient and pflogd.
-Modify bpf_setf(9) to accept a "cmd" parameter. This will be used to specify
whether a read or write filter is to be set.
-Add a bpf(4) filter program as a parameter to bpf_movein(9) as we will run the
filter program on the mbuf data once we move the packet in from user-space.
-Rather than execute two uiomove operations, (one for the link header and the
other for the packet data), execute one and manually copy the linker header
into the sockaddr structure via bcopy.
-Restructure bpf_setf to compensate for write filters, as well as read.
-Adjust bpf(4) stats structures to include a bd_locked member.
It should be noted that the FreeBSD and OpenBSD implementations differ a bit in
the sense that we unconditionally enforce the lock, where OpenBSD enforces it
only if the calling credential is not root.
Idea from: OpenBSD
Reviewed by: mlaier
struct ifnet most of if_findindex() become a complex no-op. Remove it
and replace it with a corrected version of the four line for loop it
devolved to plus some error handling. This should probably be replaced
with subr_unit at some point.
Switch from checking ifaddr_byindex to ifnet_byindex when looking for
empty indexes. Since we're doing this from if_alloc/if_free, we can
only be sure that ifnet_byindex will be correct. This fixes panics when
loading the ef(4) module. The panics were caused by the fact that
if_alloc was called four time before if_attach was called and thus
ifaddr_byindex was not set and the same unit was allocated again. This
in turn caused the first if_attach to fail because the ifp was not the
one in ifnet_byindex(ifp->if_index).
Reported by: "Wojciech A. Koszek" <dunstan at freebsd dot czest dot pl>
PR: kern/84987
MFC After: 1 day
- Add a note that additions should be made to if_free_type and not
if_free to help avoid this in the future.
This apparently fixes a use after free in if_bridge and may fix bugs
in other direct if_free_type consumers.
Reported by: thompsa
could initialise while unlocked if the bridge is not up when setting the inet
address, ether_ioctl() would call bridge_init.
Change it so bridge_init is always called unlocked and then locks before
calling bstp_initialization().
Reported by: Michal Mertl
Approved by: mlaier (mentor)
MFC after: 3 days
could initialise while unlocked if the bridge is not up when setting the inet
address, ether_ioctl() would call bridge_init.
Change it so bridge_init is always called unlocked and then locks before
calling bstp_initialization().
Reported by: Michal Mertl
Approved by: mlaier (mentor)
MFC after: 3 days
- rt0 passed to rt_check() must not be NULL, assert this.
- rt returned by rt_check() must be valid locked rtentry,
if no error occured.
o Modify callers, so that they never pass NULL rt0
to rt_check().
Reviewed by: sam, ume (nd6.c)
device driver, owned by the network stack, or initialized by the device
driver before attach and read-only from then on.
Not all device drivers and network stack components currently follow
these rules, especially with respect to IFF_UP, and a few exceptions
with IFF_ALLMULTI.
MFC after: 7 days
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Reviewed by: pjd, bz
MFC after: 7 days
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.
Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.
When exposing the interface flags to user space, via interface ioctls
or routing sockets, or the two fields together. Since the driver flags
cannot currently be set for user space, no new logic is currently
required to handle this case.
Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.
With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.
Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.
Reviewed by: pjd, bz
MFC after: 7 days
m_copym(m, 0, M_COPYALL, how).
This is required for strict alignment architectures where we align the IP
header in the input path but m_copym() will create an unaligned copy in
bridge_broadcast(). m_copypacket() preserves alignment of the first mbuf.
Noticed by: Petri Simolin
Approved by: mlaier (mentor)
MFC after: 3 days
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.
This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.
Requested by: phk
MFC after: 3 days
if_attach(). This allows ethernet drivers to use it in their routines
to program their MAC filters before ether_ifattach() is called (de(4) is
one such driver). Also, the if_addr mutex is destroyed in if_free()
rather than if_detach(), so there was another potential bug in that a
driver that failed during attach and called if_free() without having
called ether_ifattach() would have tried to destroy an uninitialized mutex.
Reported by: Holm Tiffe holm at freibergnet dot de
Discussed with: rwatson
using ifp->if_addr_mtx:
- Initialize if_addr_mtx when ifnet is initialized.
- Destroy if_addr_mtx when ifnet is torn down.
- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.
- Extract ifmultiaddr allocation and initialization into if_allocmulti();
accept a 'mflags' argument to indicate whether or not sleeping is
permitted. This centralizes error handling and address duplication.
- Extract ifmultiaddr tear-down and deallocation in if_freemulti().
- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Make use of non-sleeping allocations. Annotate the fact that we only
generate routing socket events for explicit address addition, not
implicit link layer address addition.
- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Annotate the lack of a routing socket event for implicit link layer
address removal.
- De-spl all and sundry.
Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week
ifp->if_resolvemulti(), do so with M_NOWAIT rather than M_WAITOK, so
that a mutex can be held over the call. In the FDDI code, add a
missing M_ZERO. Consumers are already aware that if_resolvemulti()
can fail.
MFC after: 1 week