Commit Graph

3074 Commits

Author SHA1 Message Date
hselasky
9fcf944d2a MFC r274376:
Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove an unneeded KASSERT().
- Remove some unneeded variable casts.

Sponsored by:	Mellanox Technologies
2014-11-19 09:03:12 +00:00
kib
e4b2ee7e2b Merge the fueword(9) and casueword(9). In particular,
MFC r273783:
Add fueword(9) and casueword(9) functions.
MFC note: ia64 is handled like arm, with NO_FUEWORD define.

MFC r273784:
Replace some calls to fuword() by fueword() with proper error checking.

MFC r273785:
Convert kern_umtx.c to use fueword() and casueword().
MFC note: the sys__umtx_lock and sys__umtx_unlock syscalls are not
converted, they are removed from HEAD, and not used.  The do_sem2*()
family is not yet merged to stable/10, corresponding chunk will be
merged after do_sem2* are committed.

MFC r273788 (by jkim):
Actually install casuword(9) to fix build.

MFC r273911:
Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).
2014-11-18 12:53:32 +00:00
hselasky
fa183f0174 MFC r271946 and r272595:
Improve the TCP segmentation offload (TSO) algorithm in general. This
change allows all HCAs from Mellanox Technologies to function properly
when TSO is enabled. See r271946 and r272595 for more details about
this commit.

Sponsored by:	Mellanox Technologies
2014-11-03 12:38:29 +00:00
ae
33d2961d9a MFC r272770:
When a tunneling interface is going to insert an mbuf into the netisr queue after
  stripping the outer header, consider it a new packet and clear the protocol flags.

  This fixes problems when IPSEC traffic goes through various tunnels and the router
  doesn't send ICMP/ICMPv6 errors.

PR:		174602
Sponsored by:	Yandex LLC
2014-10-30 13:53:57 +00:00
hselasky
1d17f744c7 MFC r273733, r273740 and r273773:
The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

Sponsored by:	Mellanox Technologies
2014-10-30 08:04:48 +00:00
hselasky
1f41d295fb MFC r263710, r273377, r273378, r273423 and r273455:
- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.

Sponsored by:	Mellanox Technologies
2014-10-27 14:38:00 +00:00
glebius
9ea3e68626 Merge r272385 by melifaro from head:
Free radix mask entries on main radix destroy.
  This is a temporary commit to be merged to 10.
  Another approach (like a hash table) should be used
  to store different masks.

PR:             194078
2014-10-16 20:46:02 +00:00
ae
f7ad542948 MFC r272176:
Keep list of lagg ports sorted by if_index.
2014-10-07 07:52:47 +00:00
asomers
f906790c87 MFC r265232
Fix a panic caused by doing "ifconfig -am" while a lagg is being destroyed.
The thread that is destroying the lagg has already set sc->sc_psc=NULL when
the "ifconfig -am" thread gets to lacp_req().  It tries to dereference
sc->sc_psc and panics.  The solution is for lacp_req() to check the value of
sc->sc_psc.  If NULL, harmlessly return an lacp_opreq structure full of
zeros.  Full details in GNATS.

PR:	189003
2014-10-06 23:17:01 +00:00
glebius
3722b178a3 Merge r269998 from head:
- Count global pf(4) statistics in counter(9).
  - Do not count global number of states and of src_nodes,
    use uma_zone_get_cur() to obtain values.
  - Struct pf_status becomes merely an ioctl API structure,
    and moves to netpfil/pf/pf.h with its constants.
  - V_pf_status is now of type struct pf_kstatus.

  Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Sponsored by: InnoGames GmbH
2014-08-25 15:40:37 +00:00
np
c11c6b7951 Update a couple of header files that were missed in r270252. This is a
direct commit to stable/10.

Submitted by:	luigi
2014-08-21 19:42:03 +00:00
mav
0959ad1632 MFC r269492:
Improve locking of multicast addresses in VLAN and LAGG interfaces.

This fixes several scenarios of reproducible panics, caused by races
between multicast address changes and interface destruction.
2014-08-18 15:54:35 +00:00
kevlo
f112206e5a MFC r268787:
Deprecate m_act.  Use m_nextpkt always.
2014-07-24 06:02:03 +00:00
tuexen
493873a6ef MFC r264241:
Call sctp_addr_change() from rt_addrmsg() instead of rt_newaddrmsg_fib(),
since rt_addrmsg() gets also called from other functions.
2014-06-22 16:36:14 +00:00
luigi
2472187c4f MFC 267168:
misc bugfixes:
- stdio.h is needed for fprintf()
- make memsize uint32_t to avoid errors due to overflow
- honor the *XPOLL flag in NIOCREGIF requests
- mmap fails with MAP_FAILED, not NULL.
2014-06-09 15:16:17 +00:00
luigi
34919b06cf MFC 267167: whitespace changes (comments)
2014-06-09 15:15:08 +00:00
asomers
322a1ee4a0 MFC r264887
Fix host and network routes for new interfaces when net.add_addr_allfibs=0

sys/net/route.c
        In rtinit1, use the interface fib instead of the process fib.  The
        latter wasn't very useful because ifconfig(8) is usually invoked
        with the default process fib.  Changing ifconfig(8) to use setfib(2)
        would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
        Clear the expected ATF failure

sys/net/if.c
        Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
        Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
        in_scrubprefix.  Pass it the interface fib.
2014-06-06 21:45:14 +00:00
asomers
a8aa481895 MFC changes relating to running multiple interfaces on different fibs but
with addresses on the same subnet.

MFC r266860

Fix unintended KBI change from r264905.  Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr().  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
        Add legacy-compatible functions as described above.  Ensure legacy
        behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
        Call with _fib() functions if we must use a specific fib, or the
        legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
        Improve the udp_dontroute test.  The bug that this test exercises is
        that ifa_ifwithnet() will return the wrong address, if multiple
        interfaces have addresses on the same subnet but with different
        fibs.  The previous version of the test only considered one possible
        failure mode: that ifa_ifwithnet_fib() might fail to find any
        suitable address at all.  The new version also checks whether
        ifa_ifwithnet_fib() finds the correct address by checking where the
        ARP request goes.

MFC r264917

Style fixes, mostly trailing whitespace elimination.  No functional change.

MFC r264905

Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
        Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr.  Those
        functions will only return an address whose interface fib equals the
        argument.

sys/net/route.c
        Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
        arguments.

sys/netinet/in.c
        Update in_addprefix to consider the interface fib when adding
        prefixes.  This will prevent it from not adding a subnet route when
        one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
        Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
        In some cases there wasn't a clear specific fib number to use.
        In others, I was unable to test those functions so I chose
        RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
        fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
        Revert r263738.  The udp_dontroute test was right all along.
        However, bugs kern/187550 and kern/187553 cancelled each other out
        when it came to this test.  Because of kern/187553, ifa_ifwithnet
        searched the default fib instead of the requested one, but because
        of kern/187550, there was an applicable subnet route on the default
        fib.  The new test added in r263738 doesn't work right, however.  I
        can verify with dtrace that ifa_ifwithnet returned the wrong address
        before I applied this commit, but route(8) miraculously found the
        correct interface to use anyway.  I don't know how.

        Clear expected failure messages for kern/187550 and kern/187552.

MFC r263738

tests/sys/netinet/Makefile
tests/sys/netinet/fibs.sh
        Replace fibs:udp_dontroute with fibs:src_addr_selection_by_subnet.
        The original test was poorly written; it was actually testing
        kern/167947 instead of the desired kern/187553.  The root cause of the
        bug is that ifa_ifwithnet did not have a fib argument.  The new test
        more directly targets that behavior.

tests/sys/netinet/udp_dontroute.c
        Delete the auxiliary binary used by the old test
2014-06-06 20:35:40 +00:00
melifaro
aaa6b80bb3 Merge r260488, r260508.
r260488:
  Split rt_newaddrmsg_fib() into two different functions.
  Adding/deleting interface addresses involves access to 3 different subsystems,
  in different parts of code. Each call can fail, so reporting successful
  operation by rtsock in the middle of the process is error-prone.

  Further split the routing notification API and the actual rtsock calls by creating
  publicly available rt_addrmsg() / rt_routemsg() functions with a "private"
  rtsock_* backend.

r260508:
  Simplify inet alias handling code: if we're adding/removing alias which
  has the same prefix as some other alias on the same interface, use
  newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

  This eliminates the following rtsock messages:

  Pinned RTM_ADD for prefix (for alias addition).
  Pinned RTM_DELETE for prefix (for alias withdrawal).

  Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

  before commit, addition:

    got message of size 116 on Fri Jan 10 14:13:15 2014
    RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

    got message of size 192 on Fri Jan 10 14:13:15 2014
    RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
    locks:  inits:
    sockaddrs: <DST,GATEWAY,NETMASK>
     10.0.0.0 10.0.0.2 (255) ffff ffff ff

  after commit, addition:

    got message of size 116 on Fri Jan 10 13:56:26 2014
    RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

  before commit, withdrawal:

    got message of size 192 on Fri Jan 10 13:58:59 2014
    RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
    locks:  inits:
    sockaddrs: <DST,GATEWAY,NETMASK>
     10.0.0.0 10.0.0.2 (255) ffff ffff ff

    got message of size 116 on Fri Jan 10 13:58:59 2014
    RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  after commit, withdrawal:

    got message of size 116 on Fri Jan 10 14:14:11 2014
    RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
  (and requires some hacks to keep prefix in route table on RTM_DELETE).

  I've tested this change with quagga (no change) and bird (*).

  bird alias handling is already broken in *BSD sysdep code, so nothing
  changes here, too.

  I'm going to MFC this change if there are no complaints about the behavior
  change.

  While here, fix some style(9) bugs introduced by r260488
  (pointed out by glebius and bde).
2014-05-08 21:03:31 +00:00
melifaro
5ca6003c5c Merge r260379, r260460.
r260379:
  Partially fix IPv4 interface routes deletion in RADIX_MPATH.

  Noticed by:   Nikolay Denev <ndenev at gmail.com>

r260460:
  Consistently use RT_ALL_FIBS everywhere instead of -1.
2014-05-08 20:41:39 +00:00
melifaro
d42ec49fe7 Merge r259528, r260228, r260295.
r259528:
  Simplify contiguous mask checking.

  Suggested by: glebius

r260228:
  Remove useless register variable modifiers.
  Do some more style(9).

r260295:
  Change semantics for rnh_lookup() function: now
  it performs an exact-match search, regardless of netmask existence.
  This simplifies most of rnh_lookup() consumers.

  Fix panic triggered by deleting non-existent host route.

  PR:           kern/185092
  Submitted by: Nikolay Denev <ndenev at gmail.com>
2014-05-08 20:27:06 +00:00
rmacklem
5bd3f1337e MFC: r264630
For NFS mounts using rsize,wsize=65536 over TSO enabled
network interfaces limited to 32 transmit segments, there
are two known issues.
The more serious one is that for an I/O of slightly less than 64K,
the net device driver prepends an ethernet header, resulting in a
TSO segment slightly larger than 64K. Since m_defrag() copies this
into 33 mbuf clusters, the transmit fails with EFBIG.
A tester indicated observing a similar failure using iSCSI.

The second less critical problem is that the network
device driver must copy the mbuf chain via m_defrag()
(m_collapse() is not sufficient), resulting in measurable overhead.

This patch reduces the default size of if_hw_tsomax
slightly, so that the first issue is avoided.
Fixing the second issue will require a way for the
network device driver to inform tcp_output() that it
is limited to 32 transmit segments.
2014-05-06 02:54:59 +00:00
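The arithmetic behind the EFBIG failure described in r264630 above can be sketched as follows. This is a minimal illustration, assuming 2 KiB mbuf clusters and a 14-byte Ethernet header; the function names and constants are hypothetical and are not taken from the commit.

```c
/* Assumed values for illustration only; not taken from the commit. */
#define SKETCH_CLUSTER_BYTES 2048u  /* assumed mbuf cluster size */
#define SKETCH_ETHER_HDR_LEN 14u    /* assumed Ethernet header, no VLAN tag */
#define SKETCH_MAX_TX_SEGS   32u    /* transmit segment limit named in the commit */

/* Number of clusters needed to hold len bytes, rounding up. */
unsigned
sketch_clusters_needed(unsigned len)
{
	return (len + SKETCH_CLUSTER_BYTES - 1) / SKETCH_CLUSTER_BYTES;
}

/* Nonzero when payload plus Ethernet header needs more clusters than the limit. */
int
sketch_exceeds_limit(unsigned payload_len)
{
	return sketch_clusters_needed(payload_len + SKETCH_ETHER_HDR_LEN) >
	    SKETCH_MAX_TX_SEGS;
}
```

Under these assumptions a payload of exactly 64K minus the header length still fits in 32 clusters, while anything larger spills into a 33rd, which is why lowering the default if_hw_tsomax slightly avoids the failure.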
rmacklem
a54326376a MFC: r264517
Vlan did not set the value of if_hw_tsomax, so when vlan
was stacked on top of a network interface that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface. This patch modifies vlan so that
it sets if_hw_tsomax to the value of the parent interface.
2014-05-06 02:49:31 +00:00
rmacklem
1f951a5c9b MFC: r264469, r264498
Lagg did not set the value of if_hw_tsomax, so when lagg
was stacked on top of network interfaces that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface(s). This patch modifies lagg so that
it sets if_hw_tsomax to the minimum of the value(s) for the
underlying network interfaces.
2014-05-06 02:44:01 +00:00
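The "minimum of the value(s)" rule from r264469 and r264498 above might look roughly like the sketch below. The helper name, the array-based interface, and the fallback behavior for an empty port list are assumptions made for illustration; the actual lagg code walks its own port list.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest TSO limit among the member
 * ports, or fallback when the lagg has no ports yet.
 */
uint32_t
sketch_lagg_tsomax(const uint32_t *port_tsomax, size_t nports,
    uint32_t fallback)
{
	uint32_t min;
	size_t i;

	if (nports == 0)
		return fallback;

	min = port_tsomax[0];
	for (i = 1; i < nports; i++) {
		if (port_tsomax[i] < min)
			min = port_tsomax[i];
	}
	return min;
}
```

Taking the minimum means tcp_output() never builds a TSO burst larger than the most constrained member port can transmit.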
mm
5b89692b00 MFC r264689:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

PR:		kern/182964
2014-04-27 09:05:34 +00:00
jmmv
d9b0a628da MFC various fixes to the tools/regression/ tests.
- r262953 Fix m4 tests so that they run cleanly with prove.
- r262954 Fix printf tests so that they run cleanly with prove.
- r262959 Fix sed tests so that they run cleanly with prove.
- r262960 Fix yacc tests so that they run cleanly with prove.
- r262961 Fix pkill tests so that they run cleanly with prove.
- r262962 Fix ncal tests so that they run cleanly with prove.
- r263081 Fix lastcomm tests under amd64.
- r263082 Only run the make tests when make is fmake.
- r263083 Fix sa tests.
- r263084 Turn a test precondition into a skip in the mdconfig tests.
- r263085 Make the strerror tests work without libtap.
- r263087 Remove broken tests for eui64_line.
- r263221 Change etcupdate tests to return 1 on test failures.
- r263352 Make the priv test program exit with non-zero if any failures are detected.
- r263353 errx prepends the program name to the message; don't do it by hand.
- r263362 Include strings.h so that bpf_filter.c can be built in userland.
2014-04-14 13:30:08 +00:00
glebius
a25c39725c Merge r263203: garbage collect long time obsoleted (or never used) stuff
from routing API.
2014-04-09 11:15:50 +00:00
glebius
1e3b300892 o Provide a compatibility shim for netstat(1) to obtain output queue
drops via NET_RT_IFLISTL sysctl. The sysctl handler appends oqdrops
  at the end of struct if_msghdrl, and netstat(1) sees that as an
  additional field of struct if_data. This allows us to fetch the data
  keeping ABI and API compatibility.
  This is direct commit to stable/10.

o Merge r263331 from head, to restore printing of queue drops.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-04-03 14:58:52 +00:00
glebius
03fdc2934e Merge r262763, r262767, r262771, r262806 from head:
- Remove rt_metrics_lite and simply put its members into rtentry.
  - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
    removes another cache trashing ++ from packet forwarding path.
  - Create zini/fini methods for the rtentry UMA zone, and initialize the
    mutex and counter in them.
  - Fix reporting of rmx_pksent to routing socket.
  - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
2014-03-21 15:15:30 +00:00
glebius
f937dcf2bd Bulk sync of pf changes from head, in an attempt to fix up the broken build I
made in r263029.

Merge r257186,257215,257349,259736,261797.

These changesets split pfvar.h into several smaller headers and make
userland utilities include only some of them.
2014-03-12 10:45:58 +00:00
glebius
71d3a4f585 Merge r261882, r261898, r261937, r262760, r262799:
Once pf became not covered by a single mutex, many counters in it became
  race prone. Some just gather statistics, but some are later used in
  different calculations.

  A real problem was the race provoked underflow of the states_cur counter
  on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
  value is used in pf_state_expires() and any state created by this rule
  is immediately expired.

  Thus, make fields states_cur, states_tot and src_nodes of struct
  pf_rule be counter(9)s.
2014-03-11 15:43:06 +00:00
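The underflow described in the pf counter merge above follows from ordinary unsigned arithmetic. A minimal sketch, with a hypothetical function name standing in for the unprotected decrement:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for an unsynchronized decrement of a 32-bit
 * unsigned state counter. Unsigned subtraction wraps modulo 2^32.
 */
uint32_t
sketch_states_dec(uint32_t states_cur)
{
	return states_cur - 1;	/* yields UINT32_MAX when states_cur is 0 */
}
```

If a race causes one decrement too many, the counter lands on UINT32_MAX rather than on a negative number, so any later calculation that scales with the counter sees an enormous state count.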
glebius
7616e36e49 Merge r262770 from head: pacify gcc.
2014-03-05 03:16:23 +00:00
glebius
ed41469327 Merge r261582, r261601, r261610, r261613, r261627, r261640, r261641, r261823,
r261825, r261859, r261875, r261883, r261911, r262027, r262028, r262029,
      r262030, r262162 from head.

  Large flowtable revamp. See commit messages for merged revisions for
  details.

Sponsored by:	Netflix
2014-03-04 15:14:47 +00:00
glebius
352d508b16 Merge r261590: Fixup for r261590 (vnet sysctl handlers cleanup)
2014-03-04 14:05:37 +00:00
glebius
4b9e17c3ef Merge r261590, r261592 from head:
Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET
  in the sysctl_root().

  Note: SYSCTL_VNET_* macros can be removed as well. All that is
    needed to virtualize a sysctl oid is to set CTLFLAG_VNET on it.
    But for now keep macros in place to avoid large code churn.
2014-03-04 14:01:12 +00:00
luigi
5bacc3bb87 MFH: sync the netmap code with the one in HEAD
(enhanced VALE switch, netmap pipes, emulated netmap mode).
See details in the log for svn 261909.
2014-02-18 05:01:04 +00:00
gnn
183f607e23 MFC 260207
Convert #defines to enums so that the values are visible in the debugger.

Requested by:	gibbs
2014-02-14 00:26:30 +00:00
glebius
99ea781723 Merge r258478, r258479, r258480, r259719: fixes related to mass source
nodes removal.

PR:		176763
2014-01-22 10:29:15 +00:00
glebius
5da449f113 Merge several fixlets from head:
r257619: Remove unused PFTM_UNTIL_PACKET const.
r257620: Code logic of handling PFTM_PURGE into pf_find_state().
r258475: Don't compare unsigned <= 0.
r258477: Fix off by ones when scanning source nodes hash.
2014-01-22 10:18:25 +00:00
pluknet
9ee780ef73 MFC r258675: Fix build.
2014-01-18 21:57:38 +00:00
avg
c1dbdbde60 MFC r258622: dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
2014-01-17 10:58:59 +00:00
scottl
e03e146ca1 MFC r260070
Multi-queue NIC drivers and multi-port lagg tend to use the same lower
 bits of the flowid as each other, resulting in a poor distribution of
 packets among queues in certain cases.  Work around this by adding a
 set of sysctls for controlling a bit-shift on the flowid when doing
 multi-port aggregation in lagg and lacp.  By default, lagg/lacp will
 now use bits 16 and higher instead of 0 and higher.

Obtained from:	Netflix
2014-01-02 01:51:54 +00:00
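The effect of the flowid bit-shift from r260070 above can be sketched as below. The commit only states that a shift is applied; the modulo-based port selection and the function name are assumptions for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical port selection: drop the low `shift` bits of the flowid
 * before reducing it to a port index. Callers must pass nports > 0 and
 * shift < 32.
 */
uint32_t
sketch_pick_port(uint32_t flowid, unsigned shift, uint32_t nports)
{
	return (flowid >> shift) % nports;
}
```

With a shift of 0, two flows whose flowids differ only in the upper bits (for example 0x00010005 and 0x00020005) map to the same port; with a shift of 16 they can land on different ports, because the bits the NIC already used for queue selection no longer drive the choice.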
scottl
db06903b84 Merge r256563:
In the flowtable scanner, restart the scan at the last found position,
not at position 0.  Changes the scanner from O(N^2) to O(N).

Reviewed by:    emax
Obtained from:  Netflix
2013-12-30 05:19:27 +00:00
np
2aa8caeed6 MFC r258692 (gnn).
Add constants for use in interrogating various fiber and copper connectors
most often used with network interfaces.

The SFF-8472 standard defines the information that can be retrieved
from an optic or a copper cable plugged into a NIC, most often
referred to as SFP+.  Examples of values that can be read
include the cable vendor's name, part number, date of manufacture
as well as running data such as temperature, voltage and tx
and rx power.

Copious comments on how to use these values with an I2C interface
are given in the header file itself.

Discussed with:	gnn
2013-12-11 00:17:13 +00:00
rodrigc
a19b1d3a58 MFC 258591
In vnet_route_uninit(), free some memory that is allocated in vnet_route_init().

To reproduce the problem:
  (1)  Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
       INVARIANTS.
  (2)  Run this command in a loop:
       jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

       see: http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html
            http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021291.html

This doesn't eliminate all the "Freed UMA keg was not empty" warning messages
on the console, but it helps.

Approved by: re (gjb)
2013-12-04 07:55:49 +00:00
ae
95353ffa2c MFC r256689:
Use the same actor key for media types of the same speed.

  PR:		176097

MFC r256832:
  Add a note that lacp_compose_key() should be updated, when new media
  types will be added.

  Submitted by:	melifaro

Approved by:	re (hrs)
2013-11-11 09:47:51 +00:00
melifaro
c259ad5c52 MFC r256624:
Fix long-standing issue with incorrect radix mask calculation.

Usual symptoms are messages like
rn_delete: inconsistent annotation
rn_addmask: mask impossibly already in tree
routing daemon constantly deleting IPv6 default route
or inability to flush/delete particular prefix in ipfw table.

Changes:
* Assume 32 bytes as maximum radix key length
* Remove rn_init()
* Statically allocate rn_ones/rn_zeroes
* Make separate mask tree for each "normal" tree instead of system
global one
* Remove "optimization" on mask reuse and key zeroing
* Change rn_addmask() arguments to accept tree pointer (no users in base)

MFC changes:
* keep rn_init()
* create global mask tree, protected with mutex, for old rn_addmask
users (currently 0 in base)
* Add new rn_addmask_r() function (rn_addmask in head) with additional
argument to accept tree pointer

PR:		kern/182851, kern/169206, kern/135476, kern/134531
Found by:	Slawa Olhovchenkov <slw@zxy.spb.ru>
Reviewed by:	glebius (previous versions)
Sponsored by:	Yandex LLC
Approved by:	re (glebius)
2013-10-29 12:53:23 +00:00
grehan
2f628a2240 MFC r257078
Fix panic in the tap driver when a tap and vmnet interface were
  created after each other e.g.

   ifconfig tap0
   ifconfig vmnet0
   <panic>

  Appears to be a cut'n'paste error from the tap code to the vmnet
  code where the name string wasn't updated in the call to make_dev().

Approved by:  re (glebius)
2013-10-28 22:41:36 +00:00
markm
70d85b1cf3 Merge from project branch via main. Uninteresting commits are trimmed.
Refactor of /dev/random device. Main points include:

* Userland seeding is no longer used. This auto-seeds at boot time
on PC/Desktop setups; this may need some tweaking and intelligence
from those folks setting up embedded boxes, but the work is believed
to be minimal.

* An entropy cache is written to /entropy (even during installation)
and the kernel uses this at next boot.

* An entropy file written to /boot/entropy can be loaded by loader(8)

* Hardware sources such as rdrand are fed into Yarrow, and are no
longer available raw.

------------------------------------------------------------------------
r256240 | des | 2013-10-09 21:14:16 +0100 (Wed, 09 Oct 2013) | 4 lines

Add a RANDOM_RWFILE option and hide the entropy cache code behind it.
Rename YARROW_RNG and FORTUNA_RNG to RANDOM_YARROW and RANDOM_FORTUNA.
Add the RANDOM_* options to LINT.

------------------------------------------------------------------------
r256239 | des | 2013-10-09 21:12:59 +0100 (Wed, 09 Oct 2013) | 2 lines

Define RANDOM_PURE_RNDTEST for rndtest(4).

------------------------------------------------------------------------
r256204 | des | 2013-10-09 18:51:38 +0100 (Wed, 09 Oct 2013) | 2 lines

staticize struct random_hardware_source

------------------------------------------------------------------------
r256203 | markm | 2013-10-09 18:50:36 +0100 (Wed, 09 Oct 2013) | 2 lines

Wrap some policy-rich code in 'if NOTYET' until we can thrash out
what it really needs to do.

------------------------------------------------------------------------
r256184 | des | 2013-10-09 10:13:12 +0100 (Wed, 09 Oct 2013) | 2 lines

Re-add /dev/urandom for compatibility purposes.

------------------------------------------------------------------------
r256182 | des | 2013-10-09 10:11:14 +0100 (Wed, 09 Oct 2013) | 3 lines

Add missing include guards and move the existing ones out of the
implementation namespace.

------------------------------------------------------------------------
r256168 | markm | 2013-10-08 23:14:07 +0100 (Tue, 08 Oct 2013) | 10 lines

Fix some just-noticed problems:

o Allow this to work with "nodevice random" by fixing where the
MALLOC pool is defined.

o Fix the explicit reseed code. This was correct as submitted, but
in the project branch doesn't need to set the "seeded" bit as this
is done correctly in the "unblock" function.

o Remove some debug ifdeffing.

o Adjust comments.

------------------------------------------------------------------------
r256159 | markm | 2013-10-08 19:48:11 +0100 (Tue, 08 Oct 2013) | 6 lines

Time to eat crow for me.

I replaced the sx_* locks that Arthur used with regular mutexes;
this turned out to be the wrong thing to do as the locks need to
be sleepable. Revert this folly.

# Submitted by:	Arthur Mesh <arthurmesh@gmail.com> (In original diff)

------------------------------------------------------------------------
r256138 | des | 2013-10-08 12:05:26 +0100 (Tue, 08 Oct 2013) | 10 lines

Add YARROW_RNG and FORTUNA_RNG to sys/conf/options.

Add a SYSINIT that forces a reseed during proc0 setup, which happens
fairly late in the boot process.

Add a RANDOM_DEBUG option which enables some debugging printf()s.

Add a new RANDOM_ATTACH entropy source which harvests entropy from the
get_cyclecount() delta across each call to a device attach method.

------------------------------------------------------------------------
r256135 | markm | 2013-10-08 07:54:52 +0100 (Tue, 08 Oct 2013) | 8 lines

Debugging. My attempt at EVENTHANDLER(multiuser) was a failure; use
EVENTHANDLER(mountroot) instead.

This means we can't count on /var being present, so something will
need to be done about harvesting /var/db/entropy/... .

Some policy now needs to be sorted out, and a pre-sync cache needs
to be written, but apart from that we are now ready to go.

Over to review.

------------------------------------------------------------------------
r256094 | markm | 2013-10-06 23:45:02 +0100 (Sun, 06 Oct 2013) | 8 lines

Snapshot.

Looking pretty good; this mostly works now. New code includes:

* Read cached entropy at startup, both from files and from loader(8)
preloaded entropy. Failures are soft, but announced. Untested.

* Use EVENTHANDLER to do above just before we go multiuser. Untested.

------------------------------------------------------------------------
r256088 | markm | 2013-10-06 14:01:42 +0100 (Sun, 06 Oct 2013) | 2 lines

Fix up the man page for random(4). This mainly removes no-longer-relevant
details about HW RNGs, reseeding explicitly and user-supplied
entropy.

------------------------------------------------------------------------
r256087 | markm | 2013-10-06 13:43:42 +0100 (Sun, 06 Oct 2013) | 6 lines

As userland writing to /dev/random is no more, remove the "better
than nothing" bootstrap mode.

Add SWI harvesting to the mix.

My box seeds Yarrow by itself in a few seconds! YMMV; more to follow.

------------------------------------------------------------------------
r256086 | markm | 2013-10-06 13:40:32 +0100 (Sun, 06 Oct 2013) | 11 lines

Debug run. This now works, except that the "live" sources haven't
been tested. With all sources turned on, this unlocks itself in
a couple of seconds! That is on my box, and there is no guarantee
that this will be the case everywhere.

* Cut debug prints.

* Use the same locks/mutexes all the way through.

* Be a tad more conservative about entropy estimates.

------------------------------------------------------------------------
r256084 | markm | 2013-10-06 13:35:29 +0100 (Sun, 06 Oct 2013) | 5 lines

Don't use the "real" assembler mnemonics; older compilers may not
understand them (like when building CURRENT on 9.x).

# Submitted by:	Konstantin Belousov <kostikbel@gmail.com>

------------------------------------------------------------------------
r256081 | markm | 2013-10-06 10:55:28 +0100 (Sun, 06 Oct 2013) | 12 lines

SNAPSHOT.

Simplify the malloc pools; We only need one for this device.

Simplify the harvest queue.

Marginally improve the entropy pool hashing, making it a bit faster
in the process.

Connect up the hardware "live" source harvesting. This is simplistic
for now, and will need to be made rate-adaptive.

All of the above passes a compile test but needs to be debugged.

------------------------------------------------------------------------
r256042 | markm | 2013-10-04 07:55:06 +0100 (Fri, 04 Oct 2013) | 25 lines

Snapshot. This passes the build test, but has not yet been finished or debugged.

Contains:

* Refactor the hardware RNG CPU instruction sources to feed into
the software mixer. This is unfinished. The actual harvesting needs
to be sorted out. Modified by me (see below).

* Remove 'frac' parameter from random_harvest(). This was never
used and adds extra code for no good reason.

* Remove device write entropy harvesting. This provided a weak
attack vector, and was not very good at bootstrapping the device. To
follow will be a replacement explicit reseed knob.

* Separate out all the RANDOM_PURE sources into separate harvest
entities. This adds some security in the case where more than one
is present.

* Review all the code and fix anything obviously messy or inconsistent.
Address some review concerns while I'm here, like rename the pseudo-rng
to 'dummy'.

# Submitted by:	Arthur Mesh <arthurmesh@gmail.com> (the first item)

------------------------------------------------------------------------
r255319 | markm | 2013-09-06 18:51:52 +0100 (Fri, 06 Sep 2013) | 4 lines

Yarrow wants entropy estimations to be conservative; the usual idea
is that if you are certain you have N bits of entropy, you declare
N/2.

------------------------------------------------------------------------
r255075 | markm | 2013-08-30 18:47:53 +0100 (Fri, 30 Aug 2013) | 4 lines

Remove short-lived idea; thread to harvest (eg) RDRAND entropy into the
usual harvest queues. It was a nifty idea, but too heavyweight.

# Submitted by:	Arthur Mesh <arthurmesh@gmail.com>

------------------------------------------------------------------------
r255071 | markm | 2013-08-30 12:42:57 +0100 (Fri, 30 Aug 2013) | 4 lines

Separate out the Software RNG entropy harvesting queue and thread
into its own files.

# Submitted by:	 Arthur Mesh <arthurmesh@gmail.com>

------------------------------------------------------------------------
r254934 | markm | 2013-08-26 20:07:03 +0100 (Mon, 26 Aug 2013) | 2 lines

Remove the short-lived namei experiment.

------------------------------------------------------------------------
r254928 | markm | 2013-08-26 19:35:21 +0100 (Mon, 26 Aug 2013) | 2 lines

Snapshot; Do some running repairs on entropy harvesting. More needs
to follow.

------------------------------------------------------------------------
r254927 | markm | 2013-08-26 19:29:51 +0100 (Mon, 26 Aug 2013) | 15 lines

Snapshot of current work;

1) Clean up namespace; only use "Yarrow" where it is Yarrow-specific
or close enough to the Yarrow algorithm. For the rest use a neutral
name.

2) Tidy up headers; put private stuff in private places. More could
be done here.

3) Streamline the hashing/encryption; no need for a 256-bit counter;
128 bits will last for long enough.

There are bits of debug code lying around; these will be removed
at a later stage.

------------------------------------------------------------------------
r254784 | markm | 2013-08-24 14:54:56 +0100 (Sat, 24 Aug 2013) | 39 lines

1) example (partially humorous random_adaptor, that I call "EXAMPLE")
 * It's not meant to be used in a real system; it's there to show
   the basics of how to create interfaces for random_adaptors. Perhaps
   it should belong in a manual page.

2) Move probe.c's functionality in to random_adaptors.c
 * rename random_ident_hardware() to random_adaptor_choose()

3) Introduce a new way to choose (or select) random_adaptors via the tunable
"rngs_want". It's a list of comma-separated names of adaptors, ordered
by preference, e.g.:
rngs_want="yarrow,rdrand"

Such a setting would cause yarrow to be preferred to rdrand. If neither of
them are available (or registered), then system will default to
something reasonable (currently yarrow). If yarrow is not present, then
we fall back to the adaptor that's first on the list of registered
adaptors.

4) Introduce a way where RNGs can play a role of entropy source. This is
mostly useful for HW rngs.

The way I envision this is that every HW RNG will use this
functionality by default. Functionality to disable this is also present.
I have an example of how to use this in random_adaptor_example.c (see
modload event, and init function)

5) fix kern.random.adaptors from
kern.random.adaptors: yarrowpanicblock
to
kern.random.adaptors: yarrow,panic,block

6) add kern.random.active_adaptor to indicate currently selected
adaptor:
root@freebsd04:~ # sysctl kern.random.active_adaptor
kern.random.active_adaptor: yarrow

# Submitted by:	Arthur Mesh <arthurmesh@gmail.com>

Submitted by:	Dag-Erling Smørgrav <des@FreeBSD.org>, Arthur Mesh <arthurmesh@gmail.com>
Reviewed by:	des@FreeBSD.org
Approved by:	re (delphij)
Approved by:	secteam (des,delphij)
2013-10-12 15:31:36 +00:00
glebius
75528d8e36 There are some high performance NICs that count statistics in hardware,
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line trashing '+=' operation in ether_input().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Reviewed by:	melifaro, adrian
Approved by:	re (marius)
2013-10-09 19:04:40 +00:00