Commit Graph

2692 Commits

Author SHA1 Message Date
marius
278d761d73 o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control
support in mii(4):
  - Merge generic flow control advertisement (which can be enabled by
    passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from
    NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
    IFM_FLOW isn't implemented as a global option via the "don't care
    mask" but instead as a media specific option this. This has the
    following advantages:
    o allows flow control advertisement with autonegotiation to be
      turned on and off via ifconfig(8) with the default typically
      being off (though MIIF_FORCEPAUSE has been added causing flow
      control to be always advertised, allowing to easily MFC this
      changes for drivers that previously used home-grown support for
      flow control that behaved that way without breaking POLA)
    o allows to deal with PHY drivers where flow control advertisement
      with manual selection doesn't work or at least isn't implemented,
      like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
      by setting MIIF_NOMANPAUSE
    o the available combinations of media options are readily available
      from the `ifconfig -m` output
  - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
    and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
    these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
  - Change IFM_ETH_MASTER from being implemented as a global option via
    the "don't care mask" to a media specific one as it actually is only
    applicable to IFM_1000_T to date.
  - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
    actually configure manually selected slave mode (like we also do in
    the PHY specific implementations).
  - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
    is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4),
  e1000phy(4) and ip1000phy(4) to use the generic flow control support
  instead of home-grown solutions via IFM_FLAGs. This includes changing
  these PHY drivers and smcphy(4) to no longer unconditionally advertise
  support for flow control but only if the selected media has IFM_FLOW
  set (or MIIF_FORCEPAUSE is set) and implemented for these media variants,
  i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and
  set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0
  and some IFM_FLAGn.
o Switch brgphy(4) to add at least the the supported copper media based on
  the contents of the BMSR via mii_phy_add_media() instead of hardcoding
  them. The latter approach seems to have developed historically, besides
  causing unnecessary code duplication it was also undesirable because
  brgphy_mii_phy_auto() already based the capability advertisement on the
  contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
  just BCM5701. Apparently this was a misinterpretation of a workaround
  in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
  BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
  this doesn't mean we can't set these as well on other PHYs for manual
  media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so
  IFM_1000_T master mode support now is generally available with all PHY
  drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
  not applicable there.

Reviewed by:	yongari (plus additional testing)
Obtained from:	NetBSD (partially), OpenBSD (partially)
MFC after:	2 weeks
2010-11-14 13:26:10 +00:00
kib
1889bb0afa Use 'z' modifier for size_t printing. 2010-11-13 11:11:51 +00:00
dim
7b0aabca30 Similar to r212647, remove the workaround in sys/net/vnet.h for an ld
bug (incorrect placement of __start_SECNAME in some cases) that was
fixed in r210245.

There is already an UPDATING entry about needing a recent ld.

MFC after:	1 month
2010-11-12 22:59:50 +00:00
gnn
c3225b5eaa Add a queue to hold packets while we await an ARP reply.
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry.  This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out.  The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by:	bz, rpaulo
MFC after: 3 weeks
2010-11-12 22:03:02 +00:00
dim
2c7257916f Use the same treatment as in linker_set.h for the __start and __stop
symbols of the set_vnet and set_pcpu sections, so those symbols will
always be emitted in kernel modules, if they use vnet.h or pcpu.h.

Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and
associated macros invisible to userland, to prevent it picking up these
symbols.

Reviewed by:	kib
2010-11-11 19:18:52 +00:00
rpaulo
2631ae0f3d Sync DLTs with the latest pcap version. 2010-10-29 18:41:09 +00:00
bz
491af1942e Factor out DDB commands from r204145, r204279 into if_debug.c for further
enhancements (1).  Switch to a standard 2-clause BSD license for this (2).

Unfortunately we have to un-static the ifindex_table for this but do not
publicly export it.

Suggested by:	rwatson (1) a while back.
Approved by:	thompsa (2) for the change from r204279.
MFC after:	6 days
2010-10-25 08:30:19 +00:00
pluknet
8950ed8036 Reshuffle SIOCGIFCONF32 handler from r155224.
- move all the chunks into one file, which allows to hide SIOCGIFCONF32
  global definition as well.
- replace __amd64__ with proper COMPAT_FREEBSD32 around.
- handle 32bit capacity before going into the handler itself instead of
  doing internal 32bit specific changes within it (e.g. as it's done for
  SIOCGDEFIFACE32_IN6).
- use explicitely sized types for ABI compat.

Approved by:	kib (mentor)
MFC after:	2 weeks
2010-10-21 16:20:48 +00:00
bz
753b9262ae Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating
over all interfaces to make sure the address will neither change nor be
freed while we are working on it.

PR:		kern/146250
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	1 week
2010-10-16 19:25:27 +00:00
bz
e7e5079137 lltable_drain() has never been used so far, thus #if 0 it for now.
While touching it add the missing locking to the now disabled code
for the time when we'll resurrect it.

MFC after:	3 days
2010-10-16 18:42:09 +00:00
bz
054f4fff23 Only hide the ifa and not the tp under #ifdef INET as the tp is needed
for locking evenwhen there is no INET.

MFC after:	3 days
2010-10-01 15:14:14 +00:00
jhb
b8914d3479 - Expand scope of tun/tap softc locks to cover more softc fields and
driver-maintained ifnet fields (such as if_drv_flags).
- Use soft locks as the mutex that protects each interface's knote list
  rather than using the global knote list lock.  Also, use the softc
  for kn_hook instead of the cdev.
- Use mtx_sleep() instead of tsleep() when blocking in the read routines.
  This fixes a lost wakeup race.
- Remove D_NEEDGIANT now that the cdevsw routines use the softc lock
  where locking is needed.
- Lock IFQ when calculating the result for FIONREAD in tap(4).  tun(4)
  already did this.
- Remove remaining spl calls.

Submitted by:	Marcin Cieslak  saper of saper|info (3)
MFC after:	2 weeks
2010-09-22 21:02:43 +00:00
jkim
4b1a6e6702 Fix a typo in a comment.
Submitted by:	afiveg
2010-09-16 18:37:33 +00:00
mdf
ab3a8b533a Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
bz
06977c291c MFp4 CH=183259:
No reason to use if_free_type() as we don't change our type.
  Just if_free() is fine.

MFC after:	3 days
2010-09-02 16:11:12 +00:00
emaste
a9a1b47f1d Add a sysctl knob to accept input packets on any link in a failover lagg. 2010-09-01 16:53:38 +00:00
bz
70761a9ea6 MFp4 CH=182972:
Add explicit linkstate UP/DOWN for the epair.  This is needed by carp(4)
and other things to work.

MFC after:	5 days
2010-08-27 23:22:58 +00:00
rpaulo
ea11ba6788 Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
zec
e1e5264fc5 When moving an ethernet ifnet from one vnet to another, destroy the
associated ng_ether netgraph node in the current vnet, and create a
new one in the target vnet.

Reviewed by:	julian
MFC after:	3 days
2010-08-13 18:17:32 +00:00
will
d548943ae9 Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
will
aa4e762c4a Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
jhb
4ab164bfd3 Adjust the interface type in the link layer socket address for vlan(4)
interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface
(IFT_ETHER).  The code already fixed if_type in the ifnet causing some
places to report the interface as a vlan (e.g. arp -a output) and other
places to report the interface as Ethernet (getifaddrs(3)).  Now they
should all report IFT_L2VLAN.

Reviewed by:	brooks
MFC after:	1 month
2010-08-06 15:15:26 +00:00
kib
c5df0ad3b8 Properly set ifi_datalen for compat32 struct if_data32.
PR:	kern/149240
Submitted by:	Stef Walter <stef memberwebs com>
MFC after:	1 weeks
2010-08-03 15:40:42 +00:00
glebius
b89adb17c9 Don't check malloc(M_WAITOK) result. 2010-07-27 11:56:49 +00:00
bz
0078d05705 Return NULL rather than 0 for a pointer.
MFC after:	3 days
2010-07-27 11:54:01 +00:00
glebius
43f917d899 When installing a new ARP entry via 'arp -S', lla_lookup() will
either find an existing entry, or allocate a new one. In the latter
case an entry would have flags, that were supplied as argument to
lla_lookup(). In case of an existing entry, flags aren't modified.

This lead to losing LLE_PUB and/or LLE_PROXY flags.

We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.

PR:		kern/148784, kern/146539
Silence from:	qingli, 5 days
2010-07-27 10:05:27 +00:00
jkim
9fdab3dd15 Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock
regardless of the given flags.

MFC after:	3 days
2010-07-22 18:44:40 +00:00
luigi
bbbbe1a022 whitespace cleanup 2010-07-15 14:41:59 +00:00
luigi
9f2bcd601a small portability fix to build on linux/windows 2010-07-15 14:41:06 +00:00
jkim
14f08fd627 Implement flexible BPF timestamping framework.
- Allow setting format, resolution and accuracy of BPF time stamps per
listener.  Previously, we were only able to use microtime(9).  Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command.  Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts.  bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp.  The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries.  On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long.  On 32-bit platforms, the size may
increase by 8 bytes.  For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested.  However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver.  Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by:	net@
2010-06-15 19:28:44 +00:00
jhb
9b74a62d73 Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
zec
66c3a596d7 Provide a macro for registering a virtualized sysctl handler for
VNET opaque data.

MFC after:	30 days
2010-06-02 15:29:21 +00:00
qingli
f6ab4a6810 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
jhb
7df2d528cf Ignore failures from removing multicast addresses from the parent (trunk)
interface when tearing down a vlan interface.  If a trunk interface is
detached, all of its multicast addresses are removed before the ifnet
departure eventhandlers are invoked.  This means that all of the multicast
addresses are removed before the vlan interfaces are removed which causes
the if_delmulti() calls in the vlan teardown to fail.

In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer
valid parent interface.  In the !VLAN_ARRAY case, the eventhandler gets
stuck in an infinite loop retrying vlan_unconfig_locked() forever.  In
general the callers of vlan_unconfig_locked() do not expect nor handle
failure, so I believe it is safer to ignore the errors and tear down as
much of the vlan state as possible.

Silence from:	net@
MFC after:	4 days
2010-05-17 19:36:56 +00:00
kmacy
a68dd336d5 allocate ipv6 flows from the ipv6 flow zone
reported by: rrs@

MFC after:	3 days
2010-05-16 21:48:39 +00:00
bz
c9d1ca826b Fix an issue with the dynamic pcpu/vnet data allocators.
We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous.  For the moment do not support more than
*_MODMIN space and ignore the extra space (*).

(*) We know how to get it back but it'll need testing.

Discussed with:	jeff, rwatson (briefly)
Reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-05-14 21:11:58 +00:00
kmacy
5e73b28c4d workaround bug with ipv6 where a flow can have a null rtentry 2010-05-12 04:51:20 +00:00
alc
e6c77ecaea Remove page queues locking from all sf_buf_mext()-like functions. The page
lock now suffices.

Fix a couple nearby style violations.
2010-05-06 17:43:41 +00:00
alc
c9aaa1e2a2 Add page locking to the vm_page_cow* functions.
Push down the acquisition and release of the page queues lock into
vm_page_wire().

Reviewed by:	kib
2010-05-04 15:55:41 +00:00
sobomax
213eac1f2c Add new tunable 'net.link.ifqmaxlen' to set default send interface
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.

MFC after:	1 month
2010-05-03 07:32:50 +00:00
alc
387e15c45a This is the first step in transitioning responsibility for synchronizing
access to the page's wire_count from the page queues lock to the page lock.

Submitted by:	kmacy
2010-05-03 05:41:50 +00:00
kmacy
1dc1263413 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
bz
0a90ef1728 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
kmacy
e04522d342 need to initialize the lock before it is used
MFC after:	3 days
2010-04-27 23:48:50 +00:00
bz
7fcd42cfee MFP4: @177254
Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
leaking the currently cached vnet into callers and to the process.

Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-04-27 15:16:54 +00:00
kib
2861e20e63 Provide compat32 shims for bpf(4), except zero-copy facilities.
bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.

Submitted by:	jhb (version for 6.x)
Reviewed and tested by:	emaste
MFC after:	2 weeks
2010-04-25 16:43:41 +00:00
kib
1bf8e84d75 Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST.
This allows getifaddrs(3) to work for compat32 binaries.

Submitted by:	jhb (6.x version)
Reviewed by:	emaste
Tested by:	emaste and <pluknet gmail com>
MFC after:	2 weeks
2010-04-25 16:42:47 +00:00
julian
e5e811b1e9 Move two copies of the same definition to a common include file.
MFC after: 3 weeks
2010-04-14 23:06:07 +00:00
delphij
58be1647f8 When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland.  However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by:	bschmidt
MFC after:	2 weeks
2010-04-14 22:02:19 +00:00
bz
778f026aa2 Take a reference to make sure that the interface cannot go away during
if_clone_destroy() in case parallel threads try to.

PR:		kern/116837
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	10 days
2010-04-11 18:47:38 +00:00