Because pmap_enter_locked() is called from few different functions
some redundancy in superpage promotion attempts can be observed.
Hence, avoid promotion in pmap_enter_object() (if the object can
be mapped by superpage it will be handled by pmap_enter_object()
itself) and also do not waste time in pmap_enter_quick().
From now on the promotion will be performed only in pmap_enter().
It was possible to create RW superpage mapping even if
the base pages were RO due to wrong setting of the prot
flag passed to pmap_map_section().
Promotion attempt should be canceled in case of attributes
mismatch between any two base pages. Since we still use
pv_flags to maintain permission to write (PVF_WRITE) and
wired status (PVF_WIRED) for a page, it is also necessary
to take those variables into account.
Invalidate L1 PTE regardles of existance of the corresponding
l2_bucket. This is relevant when superpage is entered via
pmap_enter_object() and will fix crash on entering page
in place of not properly removed superpage.
- netmap pipes, providing bidirectional blocking I/O while moving
100+ Mpps between processes using shared memory channels
(no mistake: over one hundred million. But mind you, i said
*moving* not *processing*);
- kqueue support (BHyVe needs it);
- improved user library. Just the interface name lets you select a NIC,
host port, VALE switch port, netmap pipe, and individual queues.
The upcoming netmap-enabled libpcap will use this feature.
- optional extra buffers associated to netmap ports, for applications
that need to buffer data yet don't want to make copies.
- segmentation offloading for the VALE switch, useful between VMs.
and a number of bug fixes and performance improvements.
My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial
amount of work on these features so we owe them a big thanks.
There are some external repositories that can be of interest:
https://code.google.com/p/netmap
our public repository for netmap/VALE code, including
linux versions and other stuff that does not belong here,
such as python bindings.
https://code.google.com/p/netmap-libpcap
a clone of the libpcap repository with netmap support.
With this any libpcap client has access to most netmap
feature with no recompilation. E.g. tcpdump can filter
packets at 10-15 Mpps.
https://code.google.com/p/netmap-ipfw
a userspace version of ipfw+dummynet which uses netmap
to send/receive packets. Speed is up in the 7-10 Mpps
range per core for simple rulesets.
Both netmap-libpcap and netmap-ipfw will be merged upstream at some
point, but while this happens it is useful to have access to them.
And yes, this code will be merged soon. It is infinitely better
than the version currently in 10 and 9.
MFC after: 3 days
insert flow entry. During the route lookup the critical section is
exited. It may happen, that after route lookup we will be executed
on an other CPU that already has such flowentry. Before this change
we simply freed the flowentry and returned to ip_output() with
failure.
Actually there is nothing wrong with using previously allocated
flow entry, updating it properly. Thus, make flowentry_insert()
return the new either old fle, and make use of it.
Count reuses as "collisions" and real inserts as "inserts".
Reviewed by: adrian
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
race prone. Some just gather statistics, but some are later used in
different calculations.
A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.
Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.
Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.
Thanks to: Dennis Yusupoff <dyr smartspb.net>
Also reported by: dumbbell, pgj, Rambler
Sponsored by: Nginx, Inc.
The on-board NIC is an 3x3 AR9380 with 5GHz only.
* enable pci code in AR9344_BASE
* enable ath_pci and the firmware loading bits in DB120
* add in the relevant hints in DB120.hints to inform the probe/attach
code where the PCIe fixup data is for the onboard chip.
This is only relevant for a default development board. I also have a
DB120 with the on-board PCIe wifi NIC disabled and it's exposed as
a real PCIe slot (to put normal PCIe NICs in); the fixup code will need
to be disabled to make this work correctly.
Tested:
* DB120
a call of pager_swap_freespace() was moved around, now leading to freeing
the incorrect page because of the pindex changes after vm_page_rename().
Get back to use the correct pindex when destroying the swap space.
Sponsored by: EMC / Isilon storage division
Reported by: avg
Tested by: pho
MFC after: 7 days
Some of the collisions that are occuring are due to flowtable lookups
that succeed but have an invalid lle - typically because the L2 adjacency
lookup hasn't completed. This would lead to a follow-up insert which
would then fail (ie, collision) and the code would fall through to doing
a slow-path L2/L3 lookup in the netinet/netinet6 code.
This patch simply aborts storing a new flowtable entry if the lle isn't
yet valid.
Whilst I'm here, add a new pcpu counter for the item so the number of
failures can be tracked separately from generic "collisions."
Reviewed by: glebius
MFC after: 10 days
Sponsored by: Netflix, Inc.
because we use the 1MiB section maps as they only need a single pagetable.
To allow this we only use pc relative loads to ensure we only load from
physical addresses until we are running from a known virtual address.
As a side effect any data from before or 64MiB after the kernel needs to
be mapped in to be used. This should not be an issue for kernels loaded
with ubldr as it places this data just after the kernel. It will be a
problem when loading directly from anything using the Linux ABI that
places the ATAG data outside this range, for example U-Boot.
Currently we have 3 usage patterns:
1) nd6_output (most traffic flow, no lle supplied, lle RLOCK sufficient)
2) corner cases for output (no lle, STALE lle, so on). lle WLOCK needed.
3) nd* iunternal machinery (WLOCK'ed lle provided, perform packet queing).
We separate case 1 and implement it inside its only customer - nd6_output.
This leads to some code duplication (especialy SEND stuff, which should be
hooked to output in a different way), but simplifies locking and control
flow logic fir nd6_output_lle.
Reviewed by: ae
MFC after: 3 weeks
Sponsored by: Yandex LLC
Add gpioled(4) to BEAGLEBONE kernel and add the description of the four
on-board leds of beaglebone-black to its DTS file.
Approved by: adrian (mentor, implicit)
change the gpio children can be described as directly connected to the GPIO
controller without the need of describing the OFW GPIO bus itself on the
DTS file.
With this commit the OFW GPIO bus is fully functional on BBB and RPi.
GPIO controllers which want to use the OFW GPIO bus will need similar
changes.
Approved by: adrian (mentor, implicit)
This change makes ofw_iicbus attach to iicbb(4) controllers in addition to
the already supported i2c host bridges (iichb).
On iicbb(4) allow the direct access of the OFW parent node by its children,
so they can be directly attached to iicbb(4) node on the DTS without the
need of describing the i2c bus.
Approved by: adrian (mentor, implicit)
gpioled(4).
Tested on RPi and BBB (using the hardware I2C controller and gpioiic(4) for
the I2C tests). It was also verified for regressions on RSPRO (MIPS/ar71xx)
used as reference for a non OFW-based system.
Update the gpioled(4) and gpioiic(4) man pages with some details and
examples about the FDT/OFW support.
Some compatibility details pointed out by imp@ will follow in subsequent
commits.
Approved by: adrian (mentor, implicit)
describe GPIO bindings in the system.
Move the GPIOBUS lock macros to gpiobusvar.h as they are now shared between
the OFW and the non OFW versions of GPIO bus.
Export gpiobus_print_pins() so it can also be used on the OFW GPIO bus.
Approved by: adrian (mentor, implicit)
- Get USB input report length from HID descriptor.
- Use 1 finger TAP for devices which has no integrated button.
- Move data buffer to softc instead of allocating it.
MFC after: 1 week
and probably is a leftover from first prototyping by Kip. The
non-pcpu implementation used mutexes, so it doubtfully worked
better than simple routing lookup.
o Use UMA_ZONE_PCPU zone for pointers instead of [MAXCPU] arrays,
use zpcpu_get() to access data in there.
o Substitute own single list implementation with SLIST(). This
has two functional side effects:
- new flows go into head of a list, before they went to tail.
- a bug when incorrect flow was deleted in flow cleaner is
fixed.
o Due to cache line alignment, there is no reason to keep
different zones for IPv4 and IPv6 flows. Both consume one
cache line, real size of allocation is equal.
o Rely on that f_hash, f_rt, f_lle are stable during fle
lifetime, remove useless volatile quilifiers.
o More INET/INET6 splitting.
Reviewed by: adrian
Sponsored by: Netflix
Sponsored by: Nginx, Inc.