loss links, and 1 second appeared to be too small for high latency links.
If we will receive more complaints, we should make this parameter configurable.
PR: kern/69536
Approved by: archie, julian (mentor)
MFC after: 3 days
operation using NET_NEEDS_GIANT(). This will result in a boot-time
restoration of Giant-enabled network operation, or run-time warning on
dynamic load (applicable only to the Netgraph component). Additional
components will likely need to be marked with this in the future.
its users.
netisr_queue() now returns (0) on success and ERRNO on failure. At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.
Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success. Due to this
schednetisr() was never called to kick the scheduling of the isr. However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.
PR: kern/70988
Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 3 days
requires a recompile of netgraph users.
Also change the size of a field in the bluetooth code
that was waiting for the next change that needed recompiles so
it could piggyback its way in.
Submitted by: jdp, maksim
MFC after: 2 days
and preserves the ipfw ABI. The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.
However there are many changes how ipfw is and its add-on's are handled:
In general ipfw is now called through the PFIL_HOOKS and most associated
magic, that was in ip_input() or ip_output() previously, is now done in
ipfw_check_[in|out]() in the ipfw PFIL handler.
IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to
be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
reassembly. If not, or all fragments arrived and the packet is complete,
divert_packet is called directly. For 'tee' no reassembly attempt is made
and a copy of the packet is sent to the divert socket unmodified. The
original packet continues its way through ip_input/output().
ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet
with the new destination sockaddr_in. A check if the new destination is a
local IP address is made and the m_flags are set appropriately. ip_input()
and ip_output() have some more work to do here. For ip_input() the m_flags
are checked and a packet for us is directly sent to the 'ours' section for
further processing. Destination changes on the input path are only tagged
and the 'srcrt' flag to ip_forward() is set to disable destination checks
and ICMP replies at this stage. The tag is going to be handled on output.
ip_output() again checks for m_flags and the 'ours' tag. If found, the
packet will be dropped back to the IP netisr where it is going to be picked
up by ip_input() again and the directly sent to the 'ours' section. When
only the destination changes, the route's 'dst' is overwritten with the
new destination from the forward m_tag. Then it jumps back at the route
lookup again and skips the firewall check because it has been marked with
M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with
'option IPFIREWALL_FORWARD' to enable it.
DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for
a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will
then inject it back into ip_input/ip_output() after it has served its time.
Dummynet packets are tagged and will continue from the next rule when they
hit the ipfw PFIL handlers again after re-injection.
BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.
More detailed changes to the code:
conf/files
Add netinet/ip_fw_pfil.c.
conf/options
Add IPFIREWALL_FORWARD option.
modules/ipfw/Makefile
Add ip_fw_pfil.c.
net/bridge.c
Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw
is still directly invoked to handle layer2 headers and packets would
get a double ipfw when run through PFIL_HOOKS as well.
netinet/ip_divert.c
Removed divert_clone() function. It is no longer used.
netinet/ip_dummynet.[ch]
Neither the route 'ro' nor the destination 'dst' need to be stored
while in dummynet transit. Structure members and associated macros
are removed.
netinet/ip_fastfwd.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_fw.h
Removed 'ro' and 'dst' from struct ip_fw_args.
netinet/ip_fw2.c
(Re)moved some global variables and the module handling.
netinet/ip_fw_pfil.c
New file containing the ipfw PFIL handlers and module initialization.
netinet/ip_input.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code. ip_forward() does not longer require
the 'next_hop' struct sockaddr_in argument. Disable early checks
if 'srcrt' is set.
netinet/ip_output.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_var.h
Add ip_reass() as general function. (Used from ipfw PFIL handlers
for IPDIVERT.)
netinet/raw_ip.c
Directly check if ipfw and dummynet control pointers are active.
netinet/tcp_input.c
Rework the 'ipfw forward' to local code to work with the new way of
forward tags.
netinet/tcp_sack.c
Remove include 'opt_ipfw.h' which is not needed here.
sys/mbuf.h
Remove m_claim_next() macro which was exclusively for ipfw 'forward'
and is no longer needed.
Approved by: re (scottl)
- according to RFC2661 an offset size of 0 is allowed.
- when skipping offset padding do not forget to also skip
the 2 octets of the offset size field.
Reviewed by: archie
Approved by: pjd (mentor)
link[n].latency calculated from user supplied value.
This prevents repeated NGM_PPP_SET_CONFIG/NGM_PPP_GET_CONFIG
from failing because of link[n].conf.latency being out of range.
Reviewed by: archie
Approved by: pjd (mentor)
using linker_load_module(). This works OK if NGM_MKPEER message came
from userland and we have process associated with thread. But when
NGM_MKPEER was queued because target node was busy, linker_load_module()
is called from netisr thread leading to panic.
To workaround that we do not load modules by framework, instead ng_socket
loads module (if this is required) before sending NGM_MKPEER.
However, the race condition between return from NgSendMsg() and actual
creation of node still exist and needs to be solved.
PR: kern/62789
Approved by: julian
clients simultaneously. When node is client its mode is configured
with a control message.
sysctl net.graph.nonstandard_pppoe is deprecated but kept for
backward compatibility for some time.
Approved by: julian
Also introduce a macro to be called by persistent nodes to signal their
persistence during shutdown to hide this mechanism from the node author.
Make node flags have a consistent style in naming.
Document the change.
- Return meaningful return errorcodes.
- Free previously allocated connection in error cases.
In ng_device_rcvdata():
- Return meaningful return errorcodes.
- Detach mbuf from netgraph item, and free the item before
doing any other actions that may return from method.
- Do not call strange malloc() for buffer. [1]
- In case of any error jump to end, where mbuf is freed.
In ng_device_disconnect():
- Return meaningful return errorcodes.
- Free disconnected connection.
style(9) in mentioned above functions:
- Remove '/* NGD_DEBUG */', when only one line is ifdef'ed.
- Remove extra braces to easier reading.
- Add space after comma in function calls.
PR: kern/41881 (part)
Reviewed by: marks
Approved by: julian (mentor)
2. Sort includes, while here.
3. s/NULL/0/ in NG_SEND_MSG_HOOK(), since ng_ID_t is integer.
PR: kern/41881 (part)
Reviewed by: marks
Approved by: julian (mentor)
is obviously not run a lot. (but is in some test cases).
This code is not usually run because it covers a case that doesn't
happen a lot (removing a node that has data traversing it).
for unknown events.
A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
we have to revert to TTYDISC which we know will successfully open
rather than try the previous ldisc which might also fail to open.
Do not let ldisc implementations muck about with ->t_line, and remove
code which checks for reopens, it should never happen.
Move ldisc->l_hotchar to tty->t_hotchar and have ldisc implementation
initialize it in their open routines. Reset to zero when we enter
TTYDISC. ("no" should really be -1 since zero could be a valid
hotchar for certain old european mainframe protocols.)
Thanks to Sam for importing tags in a way that allowed this to be done.
Submitted by: Gleb Smirnoff <glebius@cell.sick.ru>
Also allow the sr and ar drivers to create netgraph versions of their modules.
Document the change to the ksocket node.
- Assert the mutex in NG_IDHASH_FIND() since the mutex is required to
safely walk the node lists in the ng_ID_hash table.
- Acquire the ng_nodelist_mtx when walking ng_allnodes or ng_allhooks
to generate state dump output from the netgraph sysctls.
Only the first link0..link$NLINKS hooks would be utilized, whereas
the link hooks may be connected sparsely.
Add a counter variable so that the link hook array is only traversed
while there is still work to do, but that it continues up to the end
if it has to.
Tweak things so that ng_fec has a chance of working with things
other than ethernet. Use ifp->if_output of the underlying interfaces
and use IF_HANDOFF() rather than depending on ether_output() and
ether_output_frame() explicitly. Also, don't insist that underlying
devices be IFM_ETHER when checking their link states in the link
monitor code.
With these changes, I was able to create a two channel bundle
consisting of one ethernet interface and one 802.11 wireless
device (via ndis). Note that this only works because both devices
use the same if_output vector: ng_fec will not let you bundle
devices with different output vectors together (it really doesn't
make sense to do that).
underlying interfaces rather than using ac_netgraph in struct arpcom.
The latter is meant only for use by ng_ether, and using it breaks
interoperability with the rest of netgraph.
- Lock down low hanging fruit use of sb_flags with socket buffer
lock.
- Lock down low hanging fruit use of so_state with socket lock.
- Lock down low hanging fruit use of so_options.
- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
socket buffer lock.
- Annotate situations in which we unlock the socket lock and then
grab the receive socket buffer lock, which are currently actually
the same lock. Depending on how we want to play our cards, we
may want to coallesce these lock uses to reduce overhead.
- Convert a if()->panic() into a KASSERT relating to so_state in
soaccept().
- Remove a number of splnet()/splx() references.
More complex merging of socket and socket buffer locking to
follow.
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
flags relating to several aspects of socket functionality. This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties. The
following fields are moved to sb_state:
SS_CANTRCVMORE (so_state)
SS_CANTSENDMORE (so_state)
SS_RCVATMARK (so_state)
Rename respectively to:
SBS_CANTRCVMORE (so_rcv.sb_state)
SBS_CANTSENDMORE (so_snd.sb_state)
SBS_RCVATMARK (so_rcv.sb_state)
This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts). In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.
- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:
so_qlen so_incqlen so_qstate
so_comp so_incomp so_list
so_head
While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.
While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races and address
lock order concerns. In particular:
- Reorganize the optimistic concurrency behavior in accept1() to
always allocate a file descriptor with falloc() so that if we do
find a socket, we don't have to encounter the "Oh, there wasn't
a socket" race that can occur if falloc() sleeps in the current
code, which broke inbound accept() ordering, not to mention
requiring backing out socket state changes in a way that raced
with the protocol level. We may want to add a lockless read of
the queue state if polling of empty queues proves to be important
to optimize.
- In accept1(), soref() the socket while holding the accept lock
so that the socket cannot be free'd in a race with the protocol
layer. Likewise in netgraph equivilents of the accept1() code.
- In sonewconn(), loop waiting for the queue to be small enough to
insert our new socket once we've committed to inserting it, or
races can occur that cause the incomplete socket queue to
overfill. In the previously implementation, it was sufficient
to simply tested once since calling soabort() didn't release
synchronization permitting another thread to insert a socket as
we discard a previous one.
- In soclose()/sofree()/et al, it is the responsibility of the
caller to remove a socket from the incomplete connection queue
before calling soabort(), which prevents soabort() from having
to walk into the accept socket to release the socket from its
queue, and avoids races when releasing the accept mutex to enter
soabort(), permitting soabort() to avoid lock ordering issues
with the caller.
- Generally cluster accept queue related operations together
throughout these functions in order to facilitate locking.
Annotate new locking in socketvar.h.
the socket is on an accept queue of a listen socket. This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different for the locking on the remainder of the
flags in so_state.
behaviour lost in the change from 4.x style netgraph tee nodes.
Alter the tee node to use the new method. Document the behaviour.
Step the ABI version number... old netgraph klds will refuse to load.
Better than just crashing.
Submitted by: Gleb Smirnoff <glebius@cell.sick.ru>
state. Apparently it happens when both devices try to disconnect RFCOMM
multiplexor channel at the same time.
The scenario is as follows:
- local device initiates RFCOMM connection to the remote device. This
creates both RFCOMM multiplexor channel and data channel;
- remote device terminates RFCOMM data channel (inactivity timeout);
- local device acknowledges RFCOMM data channel termination. Because
there is no more active data channels and local device has initiated
connection it terminates RFCOMM multiplexor channel;
- remote device does not acknowledges RFCOMM multiplexor channel
termination. Instead it sends its own request to terminate RFCOMM
multiplexor channel. Even though local device acknowledges RFCOMM
multiplexor channel termination the remote device still keeps
L2CAP connection open.
Because of hanging RFCOMM multiplexor channel subsequent RFCOMM
connections between local and remote devices will fail.
Reported by: Johann Hugo <jhugo@icomtek.csir.co.za>
there so there are no ABI changes);
+ replace 5 redefinitions of the IPF2AC macro with one in if_arp.h
Eventually (but before freezing the ABI) we need to get rid of
struct arpcom (initially with the help of some smart #defines
to avoid having to touch each and every driver, see below).
Apart from the struct ifnet, struct arpcom now only stores a copy
of the MAC address (ac_enaddr, but we already have another copy in
the struct ifnet -- if_addrhead), and a netgraph-specific field
which is _always_ accessed through the ifp, so it might well go
into the struct ifnet too (where, besides, there is already an entry
for AF_NETGRAPH data...)
Too bad ac_enaddr is widely referenced by all drivers. But
this can be fixed as follows:
#define ac_enaddr ac_if.the_original_ac_enaddr_in_struct_ifnet
(note that the right hand side would likely be a pointer rather than
the base address of an array.)
of an interface. No functional change.
On passing, comment an useless invocation of TAILQ_INIT(&ifp->if_addrhead)
which could probably be removed in the interest of clarity.
ethernet (tested) and FDDI (not tested). The main use for this is on ADSL (or
other ATM) connections where bridged ethernet is used, PPPoE being a prime
example.
There is no manual page as yet, I will write one shortly.
Reviewed by: harti
functions in kern_socket.c.
Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".
Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.
Submitted by: sam
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.
Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
Free approx 86 major numbers with a mostly automatically generated patch.
A number of strategic drivers have been left behind by caution, and a few
because they still (ab)use their major number.
layering violation. As pointed out, there is much better way to do this.
Sorry guys, I need to find a better way to force reviews.
Requested by: harti, julian, scottl (mentor)
Pointy hat to: pjd
It works as follows:
In every 'interval' seconds defined links are checked.
If they are non-active they will not be used by to data transfer.
No response from: julian, archie
Silent on: net@
Approved by: scottl (mentor)
problem here still to be solved: the sockaddr_hci has still a 16 byte
field for the node name. The code currently does not correctly use the
length field in the sockaddr to handle the address length, so
node names get truncated to 15 characters when put into a sockaddr_hci.
introducing a START_NOW command. This command does not send
and GET_IFINDEX message downstream (to wait for the response from
the ETHERNET node), but directly starts the sending process. This allows
one to generate traffic as input for any hook on any node.
a new bpf_mtap2 routine that does the right thing for an mbuf
and a variable-length chunk of data that should be prepended.
o while we're sweeping the drivers, use u_int32_t uniformly when
when prepending the address family (several places were assuming
sizeof(int) was 4)
o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated
mbufs have been eliminated; this may better be moved to the bpf
routines
Reviewed by: arch@ and several others
which means "always stay in the standard mode of PPPoE operation
regardless of any junk floating around."
As the referenced PR stated clearly, the old default setting of 0
was extremely dangerous because it opened a possibility for a
spurious frame not only to put down a single PPPoE node running
FreeBSD, but to plague *every* FreeBSD node in a PPPoE network in
such a way that those nodes would keep poisoning each other until
rebooted simultaneously.
PR: kern/47920
Reviewed by: Gleb Smirnoff <glebius <at> cell.sick.ru>
MFC after: 1 week
nonstandard. They differ in the values of certain fields in
the PPPoE frame. Previously, ng_pppoe would start in standard
mode, yet switch to nonstandard one upon reception of a single
nonstandard frame. After having done so, ng_pppoe would be unable
to interact with standard PPPoE peers. Thus, a DoS condition
existed that could be triggered by a buggy peer or malicious party.
Since few people have expressed their displeasure WRT this problem,
the default operation of ng_pppoe is left untouched for now. However,
a new value for the sysctl net.graph.nonstandard_pppoe is introduced,
-1, which will force ng_pppoe stay in standard mode regardless of any
bogus frames floating around.
PR: kern/47920
Submitted by: Gleb Smirnoff <glebius <at> cell.sick.ru>
MFC after: 1 week
In practice it seems that in situations of high packet loss the ACK
timeout seems to hit this maximum (perhaps inappropriately, but the
estimation algorithm is not perfect, so apparently it happens). In
any case, 10 seconds is way too high a value so lower to 1 second.
MFC after: 3 days
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
defines for these constants that include the trailing NUL byte. These
new constants have SIZ in their name instead of LEN. As soon as all
consumers in the tree are converted to use the new defines the old
defines will be put under BURN_BRIDGES.
Reviewed by: archie, julian, ru
Approved by: re (in principle)
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
conservative lock. The problem with the lock-less algorithm is that
it suffers from the ABA problem. Running an application with funnels
a couple of 100kpkts/s through the netgraph system on a dual CPU system
with MPSAFE drivers will panic almost immediatly with the old algorithm.
It may be possible to eliminate the contention between threads that insert
free items into the list and those that get free items by using the
Michael/Scott queue algorithm that has two locks.
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.
If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.
Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.
The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.