38862a5553
direct mode renderer. I thought that reads were not much slower than writes, so that the method only tripled the time for the whole function, but I recently measured that video memory reads can be up to 53 times slower than writes in loops tighter than the one here. Loop overhead here reduces the multiplier to only 16-20 on Haswell.

Start cleaning up and fixing larger bugs in this function. For now, only replace the 22-line removal loop with a 3-line one, since adjusting the old loop would have required many palette calculations that are better done in the DRAW_PIXEL() macro. This also fixes missing support for depth 24, but only for removal.

Removal is currently sloppy at the bottom right corner: it sometimes leaks border color into the text window. The caller soon cleans this up. The planar renderer has extra complications for clipping at the corner.
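To illustrate why a write-only removal loop avoids the read penalty and how a per-pixel helper can hide depth differences (including 3-byte pixels at depth 24), here is a minimal sketch. It assumes a linear framebuffer held in ordinary memory, little-endian pixel byte order, and that removal simply overwrites the pointer area with one fill color. `struct fb`, `draw_pixel()` and `remove_rect()` are hypothetical names, not the actual syscons structures or the real DRAW_PIXEL() macro, and the real removal path may restore the underlying area differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framebuffer description (not the syscons structures). */
struct fb {
    uint8_t *base;   /* start of pixel memory (ordinary RAM in this sketch) */
    int stride;      /* bytes per scanline */
    int depth;       /* bits per pixel: 8, 15, 16, 24 or 32 */
};

/*
 * Hypothetical stand-in for a DRAW_PIXEL()-style helper: store one pixel
 * without ever reading the framebuffer. Depth 24 uses 3 bytes per pixel.
 */
static void draw_pixel(const struct fb *fb, int x, int y, uint32_t color)
{
    int bpp = (fb->depth + 7) / 8;   /* 15 -> 2, 24 -> 3, 32 -> 4 */
    uint8_t *p;

    assert(bpp >= 1 && bpp <= 4);
    p = fb->base + (size_t)y * fb->stride + (size_t)x * bpp;
    for (int i = 0; i < bpp; i++)
        p[i] = (uint8_t)(color >> (8 * i));   /* little-endian byte order */
}

/* Write-only removal: overwrite a w x h rectangle at (x0, y0). */
static void remove_rect(const struct fb *fb, int x0, int y0, int w, int h,
                        uint32_t color)
{
    for (int y = y0; y < y0 + h; y++)
        for (int x = x0; x < x0 + w; x++)
            draw_pixel(fb, x, y, color);
}
```

The removal loop itself stays a few lines long because all depth-specific work lives in the helper. Since it only stores to the framebuffer, its cost does not include the slow video memory reads that a read-modify-write or save-and-restore loop would incur.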