much as possible, by avoiding null ANDs and ORs to the frame buffer.
Mouse cursors are fairly sparse, especially for their frame. Pixels
are written in groups of 8 in planar mode and the per-group sparseness
is not as large, but it still averages about 40% with the current
9x13 mouse cursor. The average drawing time is reduced by about this
amount (from 22 usec constant to 12.5 usec average on Haswell).
This optimization is relatively larger with larger cursors. Width 10
requires 6 frame buffer accesses per line instead of 4 if not done
sparsely, but rarely more than 4 if done sparsely.
mode.
Don't manually unroll the 2 inner loops. On Haswell, doing so gave a
speedup of about 0.5% (about 4 cycles per iteration out of 1400), but
hard-coded a limit of width 9 and made better better optimizations
harder to see. gcc-4.2.1 -O does the unrolling anyway, unless tricked
with a volatile hack. gcc's unrolling is not very good and gives a
a speedup of about half as much (about 2 cycles per iteration). (All
timing on i386.)
Manual unrolling was only feasible because the inner loop only iterates
once or twice. Usually twice, but a dynamic check is needed to decide,
and was not moved from the second-innermost loop manually or by gcc.
This commit basically adds another dynamic check in the inner loop.
Cursor widths of 10-17 require 3 iterations in the inner loop and this
is not so easy to unroll -- even gcc stops at 2.
TXP_CMD_WAIT argument allocates a response buffer. If the allocation
fails, txp_ext_command() returns an error and it's handed in caller.
Found by: PVS-Studio
the method a lot.
Reduce the AND mask to the complement of the cursor's frame, so that area
inside the frame is not drawn first in black and then in lightwhite. The
AND-OR method is only directly suitable for the text mouse image, since
it doesn't go to the hardware there. Planar mode Mouse cursor drawing
takes 10-20 usec on my Haswell system (approx. 100 graphics accesses
at 130 nsec each), so the transient was not visible.
The method used the fancy read mode 1 and its color compare and color
don't care registers with value 0 in them so that all colors matched.
All that this did was make byte reads of frame buffer memory return 0xff,
so that the x86 case could obfuscate read+write as "and". The read must
be done for its side effect on the graphics controller but is not used,
except it must return 0xff to avoid affecting the write when the write
is obfuscated as a read-modify-write "and". Perhaps that was a good
optimization for 8088 CPUs where each extra instruction byte took as
long as a byte memory access.
Just use read+write after removing the fancy read mode. Remove x86
ifdefs that did the "and". After removing the "and" in the non-x86
part of the ifdefs, fix 4 of 6 cases where the shift was wrong.
in unusual cases. Optimize and significantly clean up removal in this
renderer. Optimize removal in the vga direct renderer.
Removal only needs to be done in the border (the part with pixels) in
both cases. The planar renderer used the condition scp->xoff > 0 to
test whether a right border exists. This actually tests for a left
border, and when the total horizontal border is 8 pixels, rounding gives
only a right border. This was the unusual broken case. An example
is easy to configure using something like "vidcontrol -f 8x16 iso-8x16
-g 79x25 MODE_27".
Optimize the planar case a little by only removing 9x13 active pixels
out of 16x16. Optimize it a lot by not doing anything if there is no
overlap with the border. Don't unroll the main loop or hard-code so
many assumptions about font sizes in it. On my Haswell system, graphics
memory and i/o accesses takes about 520 cycles each so optimizations from
unrolling are in the noise.
Optimize the direct case to not do anything if there is no overlap with
the border. Do a sanity check on the saveunder's coordinates. This
requires a previous change to pass non-rounded coordinates.
The previous commit also fixed the coordinates passed to the mouse
removal renderer. The coordinates were rounded down to a character
boundary, and thus essentially unusable. The renderer had to keep
track of the previous position, or clear a larger area. The latter
is only safe in the border, which is all that needs special handling
anyway.
I think no renderer depends on the bug. They have the following
handling:
- gfb sparc64: this seems to assume non-rounded coordinates
- gfb other: does nothing (seems to be missing border handling)
- vga text: does nothing (doesn't need border handling)
- vga planar: clears extras in the border, with some bugs. The fixes
will use the precise coordinates to optimize.
- vga direct: clears at the previous position with no check that it
is active, and clears everything. Checking finds this bug.
- others: are there any?
reduce hard-coded assumptions on font sizes so that the cursor size
can be more independent of the font size. Moving the mouse in the
buggy mode left trails of garbage.
The mouse cursor currently has size 9x13 in all modes. This can occupy
2x3 character cells with 8x8 fonts, but the algorithm was hard-coded
for only 2x2 character cells. Rearrange to hard-code only a maximum
cursor size (now 10x16) and to not hard-code in the logic. The number
of cells needed is now over-estimated in some cases.
2x3 character cells should also be used in text mode with 8x8 fonts
(except with large pixels, the cursor size should be reduced), but
hard-coding for 2x2 in the implementation makes it not very easy to
expand, and it practice it shifts out bits to reduce to 2x2.
In graphics modes, expansion is easier and there is no shifting out
for 9x13 cursors (but 9 is a limit for hard-coding for 2 8-bit VGA
cells wide). A previous commit removed the same buggy hard-coding for
removal at a lower level in planar mode. Another previous commit fixed
the much larger code for lower-level removal in direct mode; this is
independent of the font size so worked for 8x8 fonts. Text mode always
depended on the higher-level removal here, and always worked since
everything was hard-coded consistently for 2x2 character cells.
scteken_init(). Move the internals of scteken_sync() into a local
function to help do this.
scteken_init() reset or adjusted the default attribute and screen
position at least 3 and 5 times, respectively. Warm init shouldn't
do any more than reset the "input" state.
(scterm-sc.c (which still works after minor editing), only resets
the escape state and the saved cursor position, and then does a
nearly-null sync of the current color.)
This mainly broke mode changes, and was most noticeable when the
background color is not teken's default (usually black). Then the
screen gets cleared in the wrong color. vidcontrol restores the
default normal attribute and tries to restore the default reverse
attribute. vidcontrol doesn't clear the screen again after restoring
the attribute(s), and it is too late to do it there without flicker.
Now the default normal attribute is restored before the change affects
the rendering.
When the foreground color is not teken's default, clearing with the
wrong attributes gave strange cursor colors for some cursor types.
The default reverse attribute is not restored since it is unsupported.
2/3 of the clobbering was from 2 resetting window resizing calls. The
second one is needed to restore the size, but must not reset. Window
resizing also sanitizes the cursor position, and after the main reset
resets the window size, the cursor row would often be adjusted from
24 to 23 if it were not already reset to 0. scteken_sync() is good
for restoring the window size and the cursor position in the correct
order, but was unusable at init time since scp->ts is not always
initialized then. Adjust to use its internals.
I didn't notice any problems from the cursor reset. The cursor should
be reset, and a previous fix was to reset it consistently a little
later.
Doing nothing for warm init works almost as well, if not better. It
is not very useful to reset the escape state for mode changes, since
the reset is especially likely to be null then. The escape state is
most likely to be non-initial and corrupted by its most normal uses
-- sloppy non-atomic output where a context switch or just mixing
stdout with stderr splits up escape sequences.
Omit unused rates while initializing / dumping Tx power values.
They were not accessed anywhere (except for debugging), so this is
(mostly) no-op.
Tested with
* RTL8188EU, STA mode.
* RTL8812AU, STA mode.
Found by: PVS-Studio
1. Deadcode in ecore_init_cache_line_size(), qlnx_ioctl() and
qlnx_clean_filters()
2. ARRAY_VS_SINGLETON issue in qlnx_remove_all_mcast_mac() and
qlnx_update_rx_prod()
MFC after:5 days
like I hoped, since they are needed for removing parts over the border.
Continue fixing bugs in them.
In the vga planar mode renderer, remove removal of the part of the
image over the text window. This was hard-coded for nearly 8x16 fonts
and in practice didn't remove enough for 8x8 fonts. This used the
wrong attribute over cutmarked regions. The caller refreshes with the
correct attribute later, so the attribute bug only caused flicker.
The caller uses the same hard-coding, so the refreshes fix up all the
spots with the wrong attribute, but keep missing the missed spots.
This still gives trails of bits of cursors for cursor motions in the
affected configurations (mainly depth 4 modes with 8x8) fonts. 8x14
fonts barely escape the problem since although the cursor is drawn
as 16x16, its active part is only 9x13 and the active part fits in
the hard-coded 2x2 character cell window for 8x14 fonts. 8x8 fonts
need a 2x3 window.
In the fb non-sparc64 renderer, the buggy image removal was buggier
and was already avoided by returning before it. Remove it completely
and fix nearby style bugs. It was essentially the same as for the vga
planar mode renderer (obfuscated by swapping x and y). This was buggier
since fb should handle more types of hardware so the hard-coding is
wronger.
The remaining fb image removal is also buggier. It never supported
software cursors drawn into the border, and the hardware cursor is
probably broken by other bugs to be fixed soon.
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().
Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313
(that is, in all supported 8, 15, 16 and 24-color modes). Moving the
mouse cursor while holding down a button (giving cut marking) left a
trail of garbage from misremoved mouse cursors (usually colored
rectangles and not cursor shapes). Cases with a button not held down
worked better and may even have worked.
No renderer support for removing (software) mouse cursors is needed
(and many renderers don't have any), since sc_remove_mouse_image()
marks for update the region containing the image and usually much
more. The mouse cursor can be (partially) over as many as 4 character
cells, and removing it in only the 1-4 cells occupied by it would be
best for efficiency and for avoiding flicker. However,
sc_remove_mouse_image() can only mark a single linear region and
usually marks a full row of cells and 1 more to be sure to cover the
4 cells. It always does this, so using the special rendering method
just wastes even more time and gives even more flicker. The special
methods will be removed soon.
The general method always works. vga_pxlmouse_direct() appeared to
defer to it by returning immediately if !on. However,
vga_pxlmouse_direct() actually did foot-shooting using a disguised
saveunder method. Normal order near a mouse move is:
(1) remove the mouse cursor in the renderer (optional)
(2) remove the mouse cursor again and refresh the screen over the
mouse cursor and much more from the vtb. When the mouse has
actually moved and a button is down, many attributes in this
region are changed to be up to date with the new cut marking
(3) draw the keyboard cursor again if it was clobbered by the update
(4) draw the mouse cursor image in its new position.
The bug was to remove the mouse cursor again in step (4), before the
drawing it again in (4), using a saveunder that was valid in step (1)
at best. The quick fix is to use the saveunder in step (1) and not
in step (4). Using it in step (4) also used it before it was
initialized, initially and after mode and screen switches.
in the vga renderer. Removal used stale attributes and didn't try to
merge with the current attribute for cut marking, so special rendering
of cut marking was lost in many cases. The gfb renderer is too broken
to support special rendering of cut marking at all, so this change is
supposed to be just a style fix for it. Remove all traces of the
saveunder method which was used to implement this bug.
Fix drawing of the cursor image in text mode, only in the vga
renderer. This used a stale attribute from the frame buffer instead
of from the saveunder, but did merge with the current attribute for
cut marking so it caused less obvious bugs (subtle misrendering for
the character under the cursor).
The saveunder method may be good in simpler drivers, but in syscons
the 'under' is already saved in a better way in the vtb. Just redraw
it from there, with visible complications for cut marking and
invisible complications for mouse cursors. Almost all drawing
requests are passed a flag 'flip' which currently means to flip to
reverse video for characters in the cut marking region, but should
mean that the the characters are in the cut marking regions so should
be rendered specially, preferably using something better than reverse
video. The gfb renderer always ignores this flag. The vga renderer
ignored it for removal of the text cursor -- the saveunder gave the
stale rendering at the time the cursor was drawn. Mouse cursors need
even more complicated methods. They are handled by drawing them last
and removing them first. Removing them usually redraws many other
characters with the correct cut marking (but transiently loses the
keyboard cursor, which is redrawn soon). This tended to hide the
saveunder bug for forward motions of the keyboard cursor. But slow
backward motions of the keyboard cursor always lost the cut marking,
and fast backwards motions lost in for about 4 in every 5 characters,
depending on races with the scrn_update() timeout handler. This is
because the forward motions are usually into the region redrawn for
the mouse cursor, while backwards motions rarely are.
Text cursor drawing in the vga renderer used also used a
possibly-stale copy of the character and its attribute. The vga
render has the "optimization" of sometimes reading characters from the
screen instead of from the vtb (this was not so good even in 1990 when
main memory was only a few times faster than video RAM). Due to care
in update orders, the character is never stale, but its attribute
might be (just the cut marking part, again due to care in order).
gfb doesn't have the scp->scr pointer used for the "optimization", and
vga only uses this pointer for text mode. So most cases have to
refresh from the vtb, and we can be sure that the ordering of vtb
updates and drawing is as required for this to work.
so that we can use it in iflib to detect pause frames.
The igb(4) driver definitely used to use this in its old timer function and
I see no reason to restrict it to that driver only.
Sponsored by: Limelight Networks
(and accurate).
T4 and later have an extra bit for page shift so the maximum page size
is 8TB (shift of 12 + 31) instead of 128MB (12 + 15). This saves space
in the chip's PBL (physical buffer list) when registering very large
memory regions.
MFC after: 3 days
Sponsored by: Chelsio Communications
- Move all bitmap related functions from bitops.h to bitmap.h, similar
to what Linux does.
- Apply some minor code cleanup and simplifications to optimize the
generated code when using static inline functions.
- Implement the following list of bitmap functions which are needed by
drm-next and ibcore:
- bitmap_find_next_zero_area_off()
- bitmap_find_next_zero_area()
- bitmap_or()
- bitmap_and()
- bitmap_xor()
- Add missing include directives to the qlnxe driver
(davidcs@ has been notified)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Under certain conditions on certain versions of Hyper-V, the RNDIS
rxfilter is _not_ zero on the hypervisor side after the successful
RNDIS initialization, which breaks the assumption of any following
code (well, it breaks the RNDIS API contract actually). Clear the
RNDIS rxfilter explicitly, drain packets sneaking through, and drain
the interrupt taskqueues scheduled due to the stealth packets.
Reported by: dexuan@
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D10230
Synthetic keyboard is the only supported keyboard on GEN2 Hyper-V.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D10196
The tsec_error_intr_locked() is called with the global lock owned (e.g.
the transmit and the receive lock are both owned). We must not call
tsec_receive_intr_locked() while owning the transmit lock. The normal
receive interrupt takes care that frames are received, this is none of
the business of the error interrupt.
Submitted by: Sebastian Huber <sebastian.huber_AT_embedded-brains.de>
Use a method similar to the if_dwc driver. Use a wmb() before the flags of the
first transmit buffer of a frame are written.
Group transmit/receive structure members for better cache efficiency.
Tested on P1020RDB. TCP transmit throughput increases from 60MiB/s to
90MiB/s.
Submitted by: Sebastian Huber <sebastian.huber_AT_embedded-brains.de>
Timeout is now effectively a boolean rather than a time-remaining. This was
missed in r316478, but included in the original patch (mis-merged with a manual
merge).