before entering ng_netflow. In this case it will have a non-NULL m_pkthdr.rcvif.
However, it will enter ng_iface soon with another index. So let the in_ifIndex
value configured by the user override m_pkthdr.rcvif.
Reported by: Damir Bikmuhametov
MFC after: 1 week
so we need to acquire Giant in netgraph methods, so that we don't
race with line discipline methods. Remove NET_NEEDS_GIANT.
- Packets coming into node from netgraph are queued in ifqueue
attached to node private data.
- Mutex in struct ifqueue is used to lock not only the queue, but
the whole private data, and tp->t_lsc field.
- tp->t_lsc pointer is used to indicate whether line discipline is
attached to netgraph or not.
- Use FLG_DIE flag to indicate that node may be destroyed.
(This protection doesn't work, and it didn't before. Must be redesigned.)
- Increment ngt_unit atomically, removing mutex.
- Acquire Giant, when executing ngt_start() from netgraph context.
- Acquire Giant, when {,de}registering line discipline.
- Uncomment forcing queue mode on peer's hook, since this is reasonable.
- Force queue mode on our hook, to avoid acquiring Giant when coming from
network stack. We may already hold some mutexes at this point.
Cleanups:
- Use callout_pending() instead of our own flag.
- Remove spl(9) calls. Now we can use return() instead of ERROUT().
style(9):
- Sort includes.
- Sparse initializer for struct linesw.
- Remove some empty lines, sort declarations.
Reviewed by: julian, phk
MFC after: 1 month
- Use callout_pending() instead of our own flags.
- Remove home-grown protection of node, which has a scheduled
callout().
- Remove spl(9) calls.
Tested by: bz
This is just a workaround for a known problem with Motorola E1000
phone. Something is wrong with the configuration of L2CAP/RFCOMM
channel. Even though we set L2CAP MTU to 132 bytes (default RFCOMM
MTU 127 + 5 bytes RFCOMM frame header) and the phone accepts it,
the phone still sends oversized L2CAP packets. It appears that the
phone wants to use bigger (667 bytes) RFCOMM frames, but it does
not segment them according to the configured L2CAP MTU. The 667
bytes RFCOMM frame size corresponds to the default L2CAP MTU of
672 bytes (667 + 5 bytes RFCOMM frame header).
This problem only appears if the connection was initiated from the
phone. I'm not sure who is at fault here, so for now just put a
workaround in place. A quick look at the spec did not reveal any
answer.
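The MTU arithmetic described above can be sketched as follows. This is only
an illustration: the helper names and the RFCOMM_HDR_SIZE macro are
hypothetical and are not taken from the Bluetooth code; the 5-byte header
size is the figure quoted in this message.

```c
#include <stdint.h>

/* Assumption: 5 bytes of RFCOMM frame header, as quoted in the message. */
#define RFCOMM_HDR_SIZE 5u

/* L2CAP MTU needed to carry one unsegmented RFCOMM frame (hypothetical helper). */
static inline uint16_t
l2cap_mtu_for_rfcomm(uint16_t rfcomm_frame_size)
{
    return (uint16_t)(rfcomm_frame_size + RFCOMM_HDR_SIZE);
}

/* Nonzero if an incoming L2CAP packet is larger than the MTU we configured. */
static inline int
l2cap_packet_oversized(uint16_t packet_len, uint16_t configured_mtu)
{
    return packet_len > configured_mtu;
}
```

With the default RFCOMM MTU of 127 this gives 132, and with 667 it gives 672,
so a 672-byte packet is oversized for a channel configured with MTU 132.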
Tested by: Jes < jjess at freebsd dot polarhome dot com >
MFC after: 3 days
- Introduce another ng_ether(4) callback ng_ether_link_state_p, which
is called from if_link_state_change(), every time link is changed.
- In ng_ether_link_state() send netgraph control message notifying
of link state change to a node connected to "lower" hook.
Reviewed by: sam
MFC after: 2 weeks
SI_SUB_INIT_IF but before SI_SUB_DRIVERS. Make Netgraph(4)
framework initialize at SI_SUB_NETGRAPH level.
This does not address the bigger problem: MODULE_DEPEND
does not seem to work when modules are compiled in the
kernel, but it fixes the problem with Netgraph Bluetooth
device drivers reported by a few folks.
PR: i386/69876
Reviewed by: julian, rik, scottl
MFC after: 3 days
- Do not put/remove node references, since this is no longer
needed.
- Remove timerActive flag, use callout flags.
- Schedule next callout after doing current one.
Reviewed by: archie
Approved by: julian (mentor)
- Always check that index number passed from userland
is <= NG_NETFLOW_MAXIFACES. [1]
- Increase NG_NETFLOW_MAXIFACES up to 512. [2]
Noticed by: Roman Palagin [1]
Requested by: Yuri Y. Bushmelev [2]
MFC after: 1 week
call net_add_domain(). Calling this function too early (or late) breaks
assertions about the global domains list.
Actually it should be forbidden to call net_add_domain() outside of
SI_SUB_PROTO_DOMAIN completely as there are many places where we traverse
the domains list unprotected, but for now we allow late calls (mostly to
support netgraph). In order to really fix this we have to lock the domains
list in all places or find another way to ensure that we can safely walk the
list while another thread might be adding a new domain.
Spotted by: se
Reviewed by: julian, glebius
PR: kern/73321 (partly)
normal PPP compression, as a workaround for certain (arguably) broken
Linux PPP implementations that can't handle this particular case.
MFC after: 1 week
This means that the node listens to flow control messages from downstream
and removes a link from the list of active links whenever a LINK_IS_DOWN message
is received. If a LINK_IS_UP message is received, the link is put
back into the list of active links.
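A minimal sketch of this bookkeeping, assuming the active links are tracked
as a bitmask. The struct, enum and function names and the MAX_LINKS limit
are hypothetical; the node itself may keep its list differently.

```c
#include <stdint.h>

#define MAX_LINKS 32    /* hypothetical limit for this sketch */

enum link_msg { LINK_IS_UP, LINK_IS_DOWN };

struct link_set {
    uint32_t active;    /* bit n set means link n is active */
};

/* Apply one flow control message to the set of active links. */
static inline void
link_set_handle(struct link_set *ls, unsigned link, enum link_msg msg)
{
    if (link >= MAX_LINKS)
        return;
    if (msg == LINK_IS_DOWN)
        ls->active &= ~(UINT32_C(1) << link);
    else
        ls->active |= UINT32_C(1) << link;
}

/* Nonzero if the link is currently in the active set. */
static inline int
link_set_is_active(const struct link_set *ls, unsigned link)
{
    return link < MAX_LINKS && ((ls->active >> link) & 1u) != 0;
}
```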
Approved by: julian (mentor), implicitly
MFC after: 1 week
o Implement some netgraph flow control:
- Whenever status of HDLC heartbeat from peer is timed out,
send NGM_LINK_IS_DOWN message.
- If HDLC link changes status from down to up, send
NGM_LINK_IS_UP message.
Approved by: julian (mentor), implicitly
MFC after: 1 week
out c->c_func, we can't take it after callout_stop(). To take it before
we need to acquire callout_lock, to avoid race. This commit narrows
down area where lock is held, but hack is still present.
This should be redesigned.
Approved by: julian (mentor)
field created for line discipline drivers private use. Also add NET_NEEDS_GIANT
warning. For whatever reason ng_tty(4) was fixed but ng_h4(4) was not :(
(sorele()/sotryfree()):
- This permits the caller to acquire the accept mutex before the socket
mutex, avoiding sofree() having to drop the socket mutex and re-order,
which could lead to races permitting more than one thread to enter
sofree() after a socket is ready to be free'd.
- This also covers clearing of the so_pcb weak socket reference from
the protocol to the socket, preventing races in clearing and
evaluation of the reference such that sofree() might be called more
than once on the same socket.
This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket. The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets. The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.
RELENG_5_3 candidate.
MFC after: 3 days
Reviewed by: dwhite
Discussed with: gnn, dwhite, green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>
List of functional changes:
- Make a single device per single node with a single hook.
This gives us parallelism, which can't be achieved on a single
node with many devices/hooks. This also gives us flexibility - we
can play with a particular device node, not affecting others.
- Remove read queue as it is. Use struct ifqueue instead. This change
removes a lot of extra memcpy()ing, m_devget()ting and m_copymem()ming.
In ng_device_receivedata() we enqueue an mbuf and wake readers.
In ngdread() we take one mbuf from queue and uiomove() it to
userspace. If no mbuf is present we optionally block. [1]
- In ngdwrite() we create an mbuf from uio using m_uiotombuf().
This is faster than uiomove() into buffer, and then m_copydata(),
and this is much better than huge m_pullup().
- Perform locking of device.
- Perform locking of connection list.
- Clear out _rcvmsg method, since it does nothing good yet.
- Implement NGM_DEVICE_GET_DEVNAME message.
- #if 0 ioctl method, while nothing is done here yet.
- Return immediately from ngdwrite() if uio_resid == 0.
List of tidiness changes:
- Introduce device2priv(), to remove cut'n'paste.
- Use MALLOC/FREE, instead of malloc/free.
- Use unit2minor().
- Use UID_ROOT/GID_WHEEL instead of 0/0.
- Define NGD_DEVICE_DEVNAME, use it.
- Use more nice macros for debugging. [2]
- Return Exxx, not -1.
style(9) changes:
- No "#endif" after short block.
- Break long lines.
- Remove extra spaces, add needed spaces.
[1] Obtained from: if_tun.c
[2] Obtained from: ng_pppoe.c
Reviewed by: marks
Approved by: julian (mentor)
MFC after: 1 month
The original idea was to use it for firmware upgrading and similar
operations. In real life almost all Bluetooth USB devices do not
need firmware download. If device does require firmware download
then ugen(4) (or specialized driver like ubtbcmfw(8)) should be
used instead.
MFC after: 3 days
- push all bridge logic from if_ethersubr.c into bridge.c
make bridge_in() return mbuf pointer (or NULL).
- call only bridge_in() from ether_input(), after ng_ether_input()
was optionally called.
- call bridge_in() from ng_ether_rcv_upper().
Long description: http://lists.freebsd.org/mailman/htdig/freebsd-net/2004-May/003881.html
Reported by: Jian-Wei Wang <jwwang at FreeBSD.csie.NCTU.edu.tw>
Tested by: myself, Sergey Lyubka
Reviewed by: sam
Approved by: julian (mentor)
MFC after: 2 months
loss links, and 1 second appeared to be too small for high latency links.
If we receive more complaints, we should make this parameter configurable.
PR: kern/69536
Approved by: archie, julian (mentor)
MFC after: 3 days
operation using NET_NEEDS_GIANT(). This will result in a boot-time
restoration of Giant-enabled network operation, or run-time warning on
dynamic load (applicable only to the Netgraph component). Additional
components will likely need to be marked with this in the future.
its users.
netisr_queue() now returns (0) on success and ERRNO on failure. At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.
Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success. Due to this
schednetisr() was never called to kick the scheduling of the isr. However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.
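The return convention described above can be illustrated with a userland
sketch. Everything here is hypothetical (the struct, the function name, the
queue depth, and the kicks counter standing in for schednetisr()); it only
shows the "0 on success, errno value on failure" contract, not the real
netisr code.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_QLEN 2   /* hypothetical queue depth */

struct sketch_queue {
    int    enabled;             /* 0 means the queue is not functional */
    size_t len;                 /* number of queued packets */
    void  *slots[SKETCH_QLEN];
    int    kicks;               /* times the consumer was scheduled */
};

/* Returns 0 on success, ENXIO if not functional, ENOBUFS if full. */
static inline int
sketch_queue_put(struct sketch_queue *q, void *pkt)
{
    if (!q->enabled)
        return ENXIO;
    if (q->len == SKETCH_QLEN)
        return ENOBUFS;
    q->slots[q->len++] = pkt;
    q->kicks++;                 /* stand-in for schednetisr() */
    return 0;
}
```

A caller that treats a nonzero return as failure then schedules the consumer
exactly when an enqueue succeeded, which is the behaviour the old inverted
return value prevented.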
PR: kern/70988
Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 3 days
requires a recompile of netgraph users.
Also change the size of a field in the bluetooth code
that was waiting for the next change that needed recompiles so
it could piggyback its way in.
Submitted by: jdp, maksim
MFC after: 2 days
and preserves the ipfw ABI. The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.
However, there are many changes in how ipfw and its add-ons are handled:
In general ipfw is now called through the PFIL_HOOKS and most associated
magic, that was in ip_input() or ip_output() previously, is now done in
ipfw_check_[in|out]() in the ipfw PFIL handler.
IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to
be diverted is checked for fragmentation; if it is fragmented, ip_reass() is
called for reassembly. If not, or all fragments arrived and the packet is complete,
divert_packet is called directly. For 'tee' no reassembly attempt is made
and a copy of the packet is sent to the divert socket unmodified. The
original packet continues its way through ip_input/output().
ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet
with the new destination sockaddr_in. A check if the new destination is a
local IP address is made and the m_flags are set appropriately. ip_input()
and ip_output() have some more work to do here. For ip_input() the m_flags
are checked and a packet for us is directly sent to the 'ours' section for
further processing. Destination changes on the input path are only tagged
and the 'srcrt' flag to ip_forward() is set to disable destination checks
and ICMP replies at this stage. The tag is going to be handled on output.
ip_output() again checks for m_flags and the 'ours' tag. If found, the
packet will be dropped back to the IP netisr where it is going to be picked
up by ip_input() again and then directly sent to the 'ours' section. When
only the destination changes, the route's 'dst' is overwritten with the
new destination from the forward m_tag. Then it jumps back at the route
lookup again and skips the firewall check because it has been marked with
M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with
'option IPFIREWALL_FORWARD' to enable it.
DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for
a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will
then inject it back into ip_input/ip_output() after it has served its time.
Dummynet packets are tagged and will continue from the next rule when they
hit the ipfw PFIL handlers again after re-injection.
BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.
More detailed changes to the code:
conf/files
Add netinet/ip_fw_pfil.c.
conf/options
Add IPFIREWALL_FORWARD option.
modules/ipfw/Makefile
Add ip_fw_pfil.c.
net/bridge.c
Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw
is still directly invoked to handle layer2 headers and packets would
get a double ipfw when run through PFIL_HOOKS as well.
netinet/ip_divert.c
Removed divert_clone() function. It is no longer used.
netinet/ip_dummynet.[ch]
Neither the route 'ro' nor the destination 'dst' need to be stored
while in dummynet transit. Structure members and associated macros
are removed.
netinet/ip_fastfwd.c
Removed all direct ipfw handling code and replaced it with the new
'ipfw forward' handling code.
netinet/ip_fw.h
Removed 'ro' and 'dst' from struct ip_fw_args.
netinet/ip_fw2.c
(Re)moved some global variables and the module handling.
netinet/ip_fw_pfil.c
New file containing the ipfw PFIL handlers and module initialization.
netinet/ip_input.c
Removed all direct ipfw handling code and replaced it with the new
'ipfw forward' handling code. ip_forward() no longer requires
the 'next_hop' struct sockaddr_in argument. Disable early checks
if 'srcrt' is set.
netinet/ip_output.c
Removed all direct ipfw handling code and replaced it with the new
'ipfw forward' handling code.
netinet/ip_var.h
Add ip_reass() as general function. (Used from ipfw PFIL handlers
for IPDIVERT.)
netinet/raw_ip.c
Directly check if ipfw and dummynet control pointers are active.
netinet/tcp_input.c
Rework the 'ipfw forward' to local code to work with the new way of
forward tags.
netinet/tcp_sack.c
Remove include 'opt_ipfw.h' which is not needed here.
sys/mbuf.h
Remove m_claim_next() macro which was exclusively for ipfw 'forward'
and is no longer needed.
Approved by: re (scottl)
- according to RFC2661 an offset size of 0 is allowed.
- when skipping offset padding do not forget to also skip
the 2 octets of the offset size field.
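A sketch of the skip computation, assuming buf points at the 16-bit Offset
Size field in network byte order and len is the number of bytes remaining in
the packet. The function name is hypothetical and this is not the node's
actual parsing code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the number of bytes to skip (the 2-octet Offset Size field
 * plus the offset padding it announces), or -1 if the packet is too short.
 * An offset size of 0 is valid and yields a skip of 2.
 */
static inline long
l2tp_offset_skip(const uint8_t *buf, size_t len)
{
    size_t offset_size, skip;

    if (len < 2)
        return -1;
    offset_size = ((size_t)buf[0] << 8) | buf[1];
    skip = 2 + offset_size;
    if (skip > len)
        return -1;
    return (long)skip;
}
```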
Reviewed by: archie
Approved by: pjd (mentor)
link[n].latency calculated from user supplied value.
This prevents repeated NGM_PPP_SET_CONFIG/NGM_PPP_GET_CONFIG
from failing because of link[n].conf.latency being out of range.
Reviewed by: archie
Approved by: pjd (mentor)
using linker_load_module(). This works OK if NGM_MKPEER message came
from userland and we have process associated with thread. But when
NGM_MKPEER was queued because target node was busy, linker_load_module()
is called from netisr thread leading to panic.
To work around that we do not load modules by framework, instead ng_socket
loads module (if this is required) before sending NGM_MKPEER.
However, the race condition between return from NgSendMsg() and actual
creation of node still exists and needs to be solved.
PR: kern/62789
Approved by: julian
clients simultaneously. When node is client its mode is configured
with a control message.
sysctl net.graph.nonstandard_pppoe is deprecated but kept for
backward compatibility for some time.
Approved by: julian
Also introduce a macro to be called by persistent nodes to signal their
persistence during shutdown to hide this mechanism from the node author.
Make node flags have a consistent style in naming.
Document the change.
- Return meaningful return errorcodes.
- Free previously allocated connection in error cases.
In ng_device_rcvdata():
- Return meaningful return errorcodes.
- Detach mbuf from netgraph item, and free the item before
doing any other actions that may return from method.
- Do not call strange malloc() for buffer. [1]
- In case of any error jump to end, where mbuf is freed.
In ng_device_disconnect():
- Return meaningful return errorcodes.
- Free disconnected connection.
style(9) in mentioned above functions:
- Remove '/* NGD_DEBUG */', when only one line is ifdef'ed.
- Remove extra braces for easier reading.
- Add space after comma in function calls.
PR: kern/41881 (part)
Reviewed by: marks
Approved by: julian (mentor)
2. Sort includes, while here.
3. s/NULL/0/ in NG_SEND_MSG_HOOK(), since ng_ID_t is integer.
PR: kern/41881 (part)
Reviewed by: marks
Approved by: julian (mentor)
is obviously not run a lot (but is in some test cases).
This code is not usually run because it covers a case that doesn't
happen a lot (removing a node that has data traversing it).
for unknown events.
A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
we have to revert to TTYDISC which we know will successfully open
rather than try the previous ldisc which might also fail to open.
Do not let ldisc implementations muck about with ->t_line, and remove
code which checks for reopens, it should never happen.
Move ldisc->l_hotchar to tty->t_hotchar and have ldisc implementation
initialize it in their open routines. Reset to zero when we enter
TTYDISC. ("no" should really be -1 since zero could be a valid
hotchar for certain old European mainframe protocols.)
Thanks to Sam for importing tags in a way that allowed this to be done.
Submitted by: Gleb Smirnoff <glebius@cell.sick.ru>
Also allow the sr and ar drivers to create netgraph versions of their modules.
Document the change to the ksocket node.
- Assert the mutex in NG_IDHASH_FIND() since the mutex is required to
safely walk the node lists in the ng_ID_hash table.
- Acquire the ng_nodelist_mtx when walking ng_allnodes or ng_allhooks
to generate state dump output from the netgraph sysctls.
Only the first link0..link$NLINKS hooks would be utilized, whereas
the link hooks may be connected sparsely.
Add a counter variable so that the link hook array is only traversed
while there is still work to do, but that it continues up to the end
if it has to.
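The counter idea can be sketched as below, assuming a fixed-size array in
which unconnected hooks are NULL. NLINKS, the struct and the function are
hypothetical names for illustration only.

```c
#include <stddef.h>

#define NLINKS 8    /* hypothetical array size */

struct link_table {
    const int *hooks[NLINKS];   /* NULL means not connected */
    size_t     nconnected;      /* number of non-NULL entries */
};

/*
 * Sum the values behind all connected hooks. The loop stops as soon as
 * every connected hook has been seen, but can run to the end of the
 * array when the hooks are connected sparsely.
 * Returns the number of slots examined.
 */
static inline size_t
link_table_sum(const struct link_table *t, long *sum)
{
    size_t remaining = t->nconnected;
    size_t i;

    *sum = 0;
    for (i = 0; i < NLINKS && remaining > 0; i++) {
        if (t->hooks[i] == NULL)
            continue;
        *sum += *t->hooks[i];
        remaining--;
    }
    return i;
}
```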
Tweak things so that ng_fec has a chance of working with things
other than ethernet. Use ifp->if_output of the underlying interfaces
and use IF_HANDOFF() rather than depending on ether_output() and
ether_output_frame() explicitly. Also, don't insist that underlying
devices be IFM_ETHER when checking their link states in the link
monitor code.
With these changes, I was able to create a two channel bundle
consisting of one ethernet interface and one 802.11 wireless
device (via ndis). Note that this only works because both devices
use the same if_output vector: ng_fec will not let you bundle
devices with different output vectors together (it really doesn't
make sense to do that).
underlying interfaces rather than using ac_netgraph in struct arpcom.
The latter is meant only for use by ng_ether, and using it breaks
interoperability with the rest of netgraph.
- Lock down low hanging fruit use of sb_flags with socket buffer
lock.
- Lock down low hanging fruit use of so_state with socket lock.
- Lock down low hanging fruit use of so_options.
- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
socket buffer lock.
- Annotate situations in which we unlock the socket lock and then
grab the receive socket buffer lock, which are currently actually
the same lock. Depending on how we want to play our cards, we
may want to coalesce these lock uses to reduce overhead.
- Convert an if()->panic() into a KASSERT relating to so_state in
soaccept().
- Remove a number of splnet()/splx() references.
More complex merging of socket and socket buffer locking to
follow.
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
flags relating to several aspects of socket functionality. This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties. The
following fields are moved to sb_state:
SS_CANTRCVMORE (so_state)
SS_CANTSENDMORE (so_state)
SS_RCVATMARK (so_state)
Rename respectively to:
SBS_CANTRCVMORE (so_rcv.sb_state)
SBS_CANTSENDMORE (so_snd.sb_state)
SBS_RCVATMARK (so_rcv.sb_state)
This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts). In the future, we may
wish to coalesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
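A reduced sketch of the resulting layout. The structures below are cut down
to the fields discussed here, the sketch_ names are hypothetical, and the
bit values are illustrative assumptions rather than values quoted from
socketvar.h.

```c
/* Bit values are illustrative assumptions for this sketch. */
#define SBS_CANTSENDMORE 0x0010
#define SBS_CANTRCVMORE  0x0020
#define SBS_RCVATMARK    0x0040

struct sketch_sockbuf {
    short sb_state;     /* SBS_* bits, covered by the socket buffer lock */
    short sb_flags;
};

struct sketch_socket {
    short so_state;     /* remaining SS_* bits, covered by the socket lock */
    struct sketch_sockbuf so_rcv;
    struct sketch_sockbuf so_snd;
};

/* Receive-side state now lives in so_rcv.sb_state. */
static inline void
sketch_cantrcvmore(struct sketch_socket *so)
{
    so->so_rcv.sb_state = (short)(so->so_rcv.sb_state | SBS_CANTRCVMORE);
}

/* Send-side state now lives in so_snd.sb_state. */
static inline void
sketch_cantsendmore(struct sketch_socket *so)
{
    so->so_snd.sb_state = (short)(so->so_snd.sb_state | SBS_CANTSENDMORE);
}
```

Marking one direction as closed touches only that buffer's sb_state, so it
can be done while holding only the matching socket buffer lock.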
reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.
- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.
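The locking contract in the list above can be modelled in a single-threaded
sketch, where the "lock" is just a flag so that the assertions are visible.
All names are hypothetical and this is not the kernel implementation; it
only shows that the reference count is touched with the lock held and that
the release path drops the lock whether or not the object is freed.

```c
#include <assert.h>

struct sketch_sock {
    int locked;     /* stand-in for the per-socket mutex */
    int so_count;   /* reference count */
    int freed;      /* set when the last reference goes away */
};

static inline void
sk_lock(struct sketch_sock *s)
{
    assert(!s->locked);
    s->locked = 1;
}

/* Take a reference; the caller must hold the lock. */
static inline void
sk_ref(struct sketch_sock *s)
{
    assert(s->locked);
    s->so_count++;
}

/*
 * Drop a reference; the caller must hold the lock.
 * The lock is released on return even if the object is not freed.
 */
static inline void
sk_rele(struct sketch_sock *s)
{
    assert(s->locked);
    assert(s->so_count > 0);
    if (--s->so_count == 0)
        s->freed = 1;
    s->locked = 0;
}
```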
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:
so_qlen so_incqlen so_qstate
so_comp so_incomp so_list
so_head
While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.
While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races and address
lock order concerns. In particular:
- Reorganize the optimistic concurrency behavior in accept1() to
always allocate a file descriptor with falloc() so that if we do
find a socket, we don't have to encounter the "Oh, there wasn't
a socket" race that can occur if falloc() sleeps in the current
code, which broke inbound accept() ordering, not to mention
requiring backing out socket state changes in a way that raced
with the protocol level. We may want to add a lockless read of
the queue state if polling of empty queues proves to be important
to optimize.
- In accept1(), soref() the socket while holding the accept lock
so that the socket cannot be free'd in a race with the protocol
layer. Likewise in netgraph equivalents of the accept1() code.
- In sonewconn(), loop waiting for the queue to be small enough to
insert our new socket once we've committed to inserting it, or
races can occur that cause the incomplete socket queue to
overfill. In the previous implementation, it was sufficient
to simply test once since calling soabort() didn't release
synchronization permitting another thread to insert a socket as
we discard a previous one.
- In soclose()/sofree()/et al, it is the responsibility of the
caller to remove a socket from the incomplete connection queue
before calling soabort(), which prevents soabort() from having
to walk into the accept socket to release the socket from its
queue, and avoids races when releasing the accept mutex to enter
soabort(), permitting soabort() to avoid lock ordering issues
with the caller.
- Generally cluster accept queue related operations together
throughout these functions in order to facilitate locking.
Annotate new locking in socketvar.h.
the socket is on an accept queue of a listen socket. This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different from the locking on the remainder of the
flags in so_state.
behaviour lost in the change from 4.x style netgraph tee nodes.
Alter the tee node to use the new method. Document the behaviour.
Step the ABI version number... old netgraph klds will refuse to load.
Better than just crashing.
Submitted by: Gleb Smirnoff <glebius@cell.sick.ru>
state. Apparently it happens when both devices try to disconnect RFCOMM
multiplexor channel at the same time.
The scenario is as follows:
- local device initiates RFCOMM connection to the remote device. This
creates both RFCOMM multiplexor channel and data channel;
- remote device terminates RFCOMM data channel (inactivity timeout);
- local device acknowledges RFCOMM data channel termination. Because
there are no more active data channels and local device has initiated
connection it terminates RFCOMM multiplexor channel;
- remote device does not acknowledge RFCOMM multiplexor channel
termination. Instead it sends its own request to terminate RFCOMM
multiplexor channel. Even though local device acknowledges RFCOMM
multiplexor channel termination the remote device still keeps
L2CAP connection open.
Because of hanging RFCOMM multiplexor channel subsequent RFCOMM
connections between local and remote devices will fail.
Reported by: Johann Hugo <jhugo@icomtek.csir.co.za>