a new bpf_mtap2 routine that does the right thing for an mbuf
and a variable-length chunk of data that should be prepended.
o while we're sweeping the drivers, use u_int32_t uniformly when
when prepending the address family (several places were assuming
sizeof(int) was 4)
o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated
mbufs have been eliminated; this may better be moved to the bpf
routines
Reviewed by: arch@ and several others
which means "always stay in the standard mode of PPPoE operation
regardless of any junk floating around."
As the referenced PR stated clearly, the old default setting of 0
was extremely dangerous because it opened a possibility for a
spurious frame not only to put down a single PPPoE node running
FreeBSD, but to plague *every* FreeBSD node in a PPPoE network in
such a way that those nodes would keep poisoning each other until
rebooted simultaneously.
PR: kern/47920
Reviewed by: Gleb Smirnoff <glebius <at> cell.sick.ru>
MFC after: 1 week
nonstandard. They differ in the values of certain fields in
the PPPoE frame. Previously, ng_pppoe would start in standard
mode, yet switch to nonstandard one upon reception of a single
nonstandard frame. After having done so, ng_pppoe would be unable
to interact with standard PPPoE peers. Thus, a DoS condition
existed that could be triggered by a buggy peer or malicious party.
Since few people have expressed their displeasure WRT this problem,
the default operation of ng_pppoe is left untouched for now. However,
a new value for the sysctl net.graph.nonstandard_pppoe is introduced,
-1, which will force ng_pppoe stay in standard mode regardless of any
bogus frames floating around.
PR: kern/47920
Submitted by: Gleb Smirnoff <glebius <at> cell.sick.ru>
MFC after: 1 week
In practice it seems that in situations of high packet loss the ACK
timeout seems to hit this maximum (perhaps inappropriately, but the
estimation algorithm is not perfect, so apparently it happens). In
any case, 10 seconds is way too high a value so lower to 1 second.
MFC after: 3 days
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
defines for these constants that include the trailing NUL byte. These
new constants have SIZ in their name instead of LEN. As soon as all
consumers in the tree are converted to use the new defines the old
defines will be put under BURN_BRIDGES.
Reviewed by: archie, julian, ru
Approved by: re (in principle)
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
conservative lock. The problem with the lock-less algorithm is that
it suffers from the ABA problem. Running an application with funnels
a couple of 100kpkts/s through the netgraph system on a dual CPU system
with MPSAFE drivers will panic almost immediatly with the old algorithm.
It may be possible to eliminate the contention between threads that insert
free items into the list and those that get free items by using the
Michael/Scott queue algorithm that has two locks.
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.
If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.
Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.
The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.
messages are forwarded as netgraph control messages to the node
that is connected to the manage hook. If that hook is not connected,
the event is lost. Flow control events are converted to netgraph
flow control messages and send along the hook that is connected to
the flow controlled VC. ACR change events are converted to control
messages and sent along the hook for the given VC.
in the case where the bridge node was closed down but a timeout
still applied to it, the final reference to the node was freeing the private
data structure using the wrong malloc type.
Approved by: re@
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
drain routines are done by swi_net, which allows for better queue control
at some future point. Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.
Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
branches:
Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
queue items that can be allocated by netgraph and the number of free queue
items that are cached on a private list.
Netgraph places an upper limit on the number of queue items it may allocate.
When there is a large number of netgraph messages travelling through the
system (100k/sec and more) there is a high probability, that messages get
queued at the nodes and netgraph runs out of queue items. In this case the data
flow through netgraph gets blocked. The tuneable for the number of free
items lets one trade memory for performance.
The tunables are also available as read-only sysctls.
PR: kern/47393
Reviewed by: julian
Approved by: jake (mentor)
- Make transmission of packets work again. This stopped working because
ether_ifattach() was forcing ifp->if_output to be ether_output() and
clobbering our attempt to override this vector with a pointer to
ng_fec_output(). Move the overriding of ifp->if_output to after
ether_ifattach().
- Abandon the use of the netgraph ng_ether_input_p hook for snagging
incoming frames, and instead override the ifp->if_input vector for
interfaces that have been aggregated into our bundle. (I would have
loved to have written things this way in the first place, but I
didn't want to have to be the one to implement the if_input hook
and change all the drivers.) This avoids collisions with the ng_ether
module, which uses the same hook. Each aggregated device now calls
ng_fec_input() directly, which then fakes up the rcvif pointer
before invoking ifp->if_input itself.
This module should actually work now.
changed since this code was written:
- The ng_ether_input_p hook only accepts two arguments now: the pointer
to the ether header structure is gone.
- It's no longer necessary to cons up a fake ether header before passing
incoming packets to BPF_MTAP().
ng_fec_input() has been modified to account for these two changes.
Running tcpdump on fec0 should work now.
PR: kern/46720
turns runs its tasks free of Giant too. It is intended that as drivers
become locked down, they will move out of the old, Giant-bound taskqueue
and into this new one. The old taskqueue has been renamed to
taskqueue_swi_giant, and the new one keeps the name taskqueue_swi.
we have the rc4 code already in the kernel (via wlan stuff or awi).
Add a dependency on the rc4 module so if it doesn't exist then load it.
Reviewed by: archie
pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.
the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}.
o Fix a bpf_compat issue where malloc() was defined to just call
bpf_alloc() and pass the 'canwait' flag(s) along. It's been changed
to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT
flag (and only one of those two).
Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)
(a) Save control message return address only if NGM_MPPC_CONFIG_DECOMP
(b) Properly count the number of required re-key operations
when we loose synchronization and have to resync
MFC after: 3 days
1) "ubt" driver did not work when system is booted with the device attached
2) missing "break;" in ubt_rcvmsg() function;
Submitted by: Maksim Yevmenkin <Maksim.Yevmenkin@cw.com>
Approved by: re (jhb)
Has been seen to work on several cards and communicating with
several mobile phones to use them as modems etc.
We are still talking with 3com to try get them to allow us to include
the firmware for their pccard in the driver but the driver is here..
In the mean time
it can be downloaded from the 3com website and loaded using the utility
bt3cfw(8) (supplied) (instructions in the man page)
Not yet linked to the build
Submitted by: Maksim Yevmenkin <myevmenk@exodus.net>
Approved by: re
TAILQ_FIRST(&ifp->if_addrhead) to find the link layer ifaddr.
(it's always first I believe)
Allows this to compile on -current.
.. need testers with FEC capable switches..
This is NOT YET CONVERTED TO -current.
This node is a source for preprogrammed packets at a known rate for testing.
I will convert it to -current "in place" but will MFC teh original
pre-conversion variant as that is what is originally submitted.
Man page my me, info from Dave's README.
Submitted by: Dave Chapeskie <dchapeskie@SANDVINE.com>
Obtained from: Sandvine inc.
MFC after: 1 week
These are really only partly netgraph nodes as they do not use the
netgraph interfaces for many of the functions for which they could
be used, however they represent important functionality.
Submitted by: wpaul
MFC after: 2 days
linker_load_module() instead.
This fixes a bug where the kernel was unable to properly locate and
load a kernel module in vfs_mount() (and probably in the netgraph
code as well since it was using the same function). This is because
the linker_load_file() does not properly search the module path.
Problem found by: peter
Reviewed by: peter
Thanks to: peter
so that /dev/mumble can be the entrypoint to some networking graph,
e.g. a tunnel or a remote tape drive or whatever...
Not fully tested (by me) yet.
Submitted by: Mark Santcroos <marks@ripe.net>
MFC after: 3 weeks
Eliminate some of the unnecessary complexity of ng_ether_glueback_header().
Simplify two functions a bit by doing the NG_FREE_META(meta) earlier.
Reviewed by: julian, brian
MFC after: 1 week
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.
o Determine the lock strategy for each members in struct socket.
o Lock down the following members:
- so_count
- so_options
- so_linger
- so_state
o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:
- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()
Reviewed by: alfred
Requested by: bde
Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.
While I am here, sort include files alphabetically, where possible.
exhausting the kernel timeout table. Perform the usual gymnastics to
avoid race conditions between node shutdown and timeouts occurring.
Also fix a bug in handling ack delays < PPTP_MIN_ACK_DELAY. Before,
we were ack'ing immediately. Instead, just impose a minimum ack delay
time, like the name of the macro implies.
MFC after: 1 week
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
complexities associated with having MD (asm and inline) versions, and
having to prevent exposure of these functions in other headers that
happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.
Tested on: alpha, i386
Reviewed by: bde, jake, tmm
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
a particular Ethernet interface will actually be delivered by (only) that
device driver. This is not necessarily true when ng_ether(4) is used.
To word around this, while a ng_ether(4)'s "upper" hook is connected,
turn off all hardware checksum, fragmentation, etc., features for that
interface.
PR: kern/31586
MFC after: 1 week
'struct ng_ksocket_sockopt') like to peek into the ng_mesg header for
information. Make sure when generating default values that we provide
a valid header to peek into.
MFC after: 1 week
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and
adapting it for KSE.
Locks:
1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.
1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp);
/* increments reference count on a file */
struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */
struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
socreate(), rather than getting it implicitly from the thread
argument.
o Make NFS cache the credential provided at mount-time, and use
the cached credential (nfsmount->nm_cred) when making calls to
socreate() on initially connecting, or reconnecting the socket.
This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well
as bugs involving NFS and mandatory access control implementations.
Reviewed by: freebsd-arch
hooks depending on ethertype. Great for prototyping protocols.
connects to the lower and upper hooks of an ethernet type of node.
Obtained from: Monzoon Networks.
Thanks to Andre Oppermann, May 2001.
1) Allow the sending of more than one control message at a time
over a unix domain socket. This should cover the PR 29499.
2) This requires that unp_{ex,in}ternalize and unp_scan understand
mbufs with more than one control message at a time.
3) Internalize and externalize used to work on the mbuf in-place.
This made life quite complicated and the code for sizeof(int) <
sizeof(file *) could end up doing the wrong thing. The patch always
create a new mbuf/cluster now. This resulted in the change of the
prototype for the domain externalise function.
4) You can now send SCM_TIMESTAMP messages.
5) Always use CMSG_DATA(cm) to determine the start where the data
in unp_{ex,in}ternalize. It was using ((struct cmsghdr *)cm + 1)
in some places, which gives the wrong alignment on the alpha.
(NetBSD made this fix some time ago).
This results in an ABI change for discriptor passing and creds
passing on the alpha. (Probably on the IA64 and Spare ports too).
6) Fix userland programs to use CMSG_* macros too.
7) Be more careful about freeing mbufs containing (file *)s.
This is made possible by the prototype change of externalise.
PR: 29499
MFC after: 6 weeks
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
allow an in-kernel webserver (or similar) to accept
and handle incoming connections using netgraph without ever leaving the
kernel. (allows incoming tunnel requests to be
handled totally within the kernel for example)
Needs work, but shouldn't break existing functionality.
Submitted by: John Polstra <jdp@polstra.com>
MFC after: 2 weeks
LISTENed for, return EEXISTS.
Only match the magic "*" service tag if no other LISTEN service tags
match.
Require an explicit LISTEN for an empty service tag in order to match
empty service requests.
Approved by: julian
MFC after: 3 days
improved readability. The two real functional changes are that
netgraph now sees this as the "split" node type rather then the
"ng_split" node type and that meta data is passed through without
processing rather then being dropped.
Reviewed by: jhb, julian
MFC after: 7 weeks
machines. The code formerly read:
long val;
if (val < (long)-0x80000000 || ...)
return EINVAL;
The constant 0x80000000 has type unsigned int. The unary `-'
operator does not change the type (or the value, in this case).
Therefore the promotion to long is done by 0-extension, giving
0x0000000080000000 instead of the desired 0xffffffff80000000. I
got rid of the `-' and changed the cast to (int32_t) to give proper
sign-extension on all architectures and to better reflect the fact
that we are range-checking a 32-bit value.
This commit also makes the analogous changes to ng_int{8,16}_parse
for consistency.
MFC after: 3 days
Change a prototype.
Add a function version of ng_ref_node() when debugging so
a breakpoint can be set on it.
ng_base.c:
add 'node' as an argument to ng_apply_item so that it is up
to the caller to take over and release the item's reference on
the node. If the release reports back that the node went away
due to the reference going to 0, the caller should cease referencing
the now defunct node. (e.g. the item was a 'kill node' message).
Alter ng_unref_node to report back the residual references as a result.
ng_pptpgre.c:
Don't reference a node after we dropped a reference to it.
(What if it was the last?)
Fixes a node leak reported by Harti Brandt <brandt@fokus.gmd.de>
which was due to an incorrect earlier attempt to fix the
"accessing node after dropping the last reference" problem.
ehternet frames to a netgraph hook.
Submitted by: "Vitaly V. Belekhov" <vitaly@riss-telecom.ru>
translated to 5.0 by me. man page not yet written.
This node still needs a little work.. don't use yet. Not yet linked into
the build.
and add a sysctl to pppoe to activate non standard ethertypes
so that idiot ISPs (apparently in France) who use
equipment from idiot suppliers (rumour says 3com)
who use nonstandard ethertypes can still connect.
"yep, sure we do pppoe, we use a different identifier to that dictated in
the standard, but sure it's pppoe!"
sysctl -w net.graph.stupid_isp=1 enables the changeover.
packet flow into two unidirectional flows.
Part of a suite of nodes developed for packet flow control.
More to follow as I have time to port them to 5.x or
as others do so. The ipfw node will be the hardest..
Submitted by: "Vitaly V. Belekhov" <vitaly@riss-telecom.ru>
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
also try implement teh documented behaviour in socket nodes
so that when there is only one hook, an unaddressed write/send
will DTRT and send the data to that hook.
(e.g. ethernet nodes are persistent until you rip out the hardware)
Use this support in the ethernet and sample nodes.
Add some more abstraction on the 'item's so that node and
hook reference counting can be checked easier.
Slight man page correction.
Make pppoe type dependent on ethernet type.
Clean up node shutdown a little.
Move a mutex from MTX_SPIN to MTX_DEF (oops)
Fix small ref-counting bug.
remove warning on one2many type.
The new method is 'flood' (in addition to the old round-robin)
in which incoming packets are sent to more than one outgoing hook.
(I'm not sure what Rogier is using this for but it seems generally useful
and isn't much extra)
Submitted by: Rogier R. Mulhuijzen (drwilco@drwilco.net )
from a node, but does it via the locking queue, thus ensuring that the
node is locked when it's hook is removed.
Add 'deadnode' and 'deadhook' structures for when a node or hook is
invalidated but not yet freed. (not yet freed)
This version is functional and is aproaching solid..
notice I said APROACHING. There are many node types I cannot test
I have tested: echo hole ppp socket vjc iface tee bpf async tty
The rest compile and "Look" right. More changes to follow.
DEBUGGING is enabled in this code to help if people have problems.
format version number. (userland programs should not need to be
recompiled when the netgraph kernel internal ABI is changed.
Also fix modules that don;t handle the fact that a caller may not supply
a return message pointer. (benign at the moment because the calling code
checks, but that will change)