- use CROSSJUMP() and CROSSJUMP_LABEL() for conditional jumps from idle()
into cpu_switch() and vice versa.
- moved badsw code to after cpu_switch().
Cosmetic changes:
- moved sw0 string to be immediately after its caller (badsw).
- removed unused #include.
the one place that depended on it. wakeup() is now prototyped in
<sys/systm.h> so that it is normally visible.
Added nested include of <sys/queue.h> in <vm/vm_object.h>. The queue
macros are a more fundamental prerequisite for <vm/vm_object.h> than
the wakeup prototype and previously happened to be included by
namespace pollution from <sys/proc.h> or elsewhere.
controllers, it is an error to issue a command before the keyboard
has had time to reply to the previous command. Setting the LEDs
involves issuing 2 commands, so it never worked on these keyboards.
Fixed resetting of keyboard. It is possible for unprocessed
scancodes to be present when the reset routine is called. This
usually occurs after switching from one console driver to another
in userconfig. pcvt and syscons attempt to flush any stale scancodes,
but sometimes fail to do so because keyboard and/or keyboard
controller takes a long time to send the scancodes after reset
(scancodes are apparently not flushed by reset!). syscons handles
this later by not checking for errors at strategic places, but pcvt
was confused.
Use an impossible initial and failure mode setting for the LEDs
so that the LEDs always get set if they are possibly out of sync.
Added missing spltty() in update_led().
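
A minimal sketch of the "impossible initial and failure mode setting"
idea above, in portable C. The struct, the names and the simulated
hardware write are all hypothetical; the real driver code is not shown
in this log. The point is only that a cached value which can never
equal a real LED setting forces the next update to reach the hardware.

```c
#define SKETCH_LED_MASK    0x07   /* three LED bits (assumption) */
#define SKETCH_LED_UNKNOWN 0xff   /* can never equal a masked LED value */

struct sketch_kbd {
    int cached_leds;   /* last value known to be in the keyboard */
    int writes;        /* counts simulated hardware writes */
    int fail_next;     /* test hook: make the next write fail */
};

/* Returns 0 on success (or when nothing had to be written), -1 on failure. */
int
sketch_update_led(struct sketch_kbd *k, int leds)
{
    leds &= SKETCH_LED_MASK;
    if (k->cached_leds == leds)
        return 0;                       /* already in sync, skip the write */
    k->writes++;                        /* stands in for the real commands */
    if (k->fail_next) {
        k->fail_next = 0;
        k->cached_leds = SKETCH_LED_UNKNOWN;   /* force a retry next time */
        return -1;
    }
    k->cached_leds = leds;
    return 0;
}
```

Initialising cached_leds to SKETCH_LED_UNKNOWN means even a request for
"all LEDs off" is written once, which is what keeps the LEDs from
staying out of sync.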
shipped with FreeBSD can be changed without modifying the Makefiles directly.
Creates: BOOT_FORCE_COMCONSOLE
BOOT_PROBE_KEYBOARD
BOOT_PROBE_KEYBOARD_LOCK
BOOT_COMCONSOLE (port value for console)
ufs_read() and ufs_write().
Found by: looking at warnings for comparing the result of lblktosize()
(which is usually daddr_t = long) with file sizes (which are u_quad_t
for ufs). File sizes should probably be off_t's to avoid warnings
when they are compared with file offsets, so the fixed lblktosize()
casts to off_t instead of u_quad_t.
Added definition of smalllblktosize(). It is the same as the old
lblktosize() and is more efficient for small block numbers on 32-bit
machines.
Use smalllblktosize() instead of its expansion in blksize() and
dblksize(). This keeps the line length short and makes it more
obvious that the shift can't overflow.
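
A rough illustration of why the cast matters, assuming a 32-bit block
number and a 64-bit off_t. The sketch_ names and the cut-down struct
are hypothetical; only the fs_bshift shift comes from the description
above. The "small" variant uses an unsigned type here so that the
wraparound is well defined in standard C.

```c
#include <stdint.h>

struct sketch_fs {
    int fs_bshift;      /* log2 of the block size */
};

/* Widen before shifting, so large block numbers do not overflow. */
int64_t
sketch_lblktosize(const struct sketch_fs *fs, int32_t blk)
{
    return (int64_t)blk << fs->fs_bshift;
}

/* 32-bit shift: cheaper on 32-bit machines, only safe for small blk. */
uint32_t
sketch_smalllblktosize(const struct sketch_fs *fs, uint32_t blk)
{
    return blk << fs->fs_bshift;
}
```

With fs_bshift = 13, block 3 gives 24576 either way, but block 2^20
gives 2^33 in the widened version and wraps to 0 in the 32-bit one.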
capable of being used for things other than swap space allocation,
and splvm would have been appropriate for only swap space allocation
and other VM things. My commit broke that (and was actually a mistake.)
previous snap. Specifically, kern_exit and kern_exec now make a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.
- fixed a sloppy common-style declaration.
- removed an unused macro.
- moved once-used macros to the one file where they are used.
- removed unused forward struct declarations.
- removed __pure.
- declared inline functions as inline in their prototype as well
as in their definition (gcc unfortunately allows the prototype
to be inconsistent).
- staticized.
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.
Still no support for Ultra-SCSI and other new features, but the code
should now correctly initialize the clock pre-scaler (based on frequency
measurement results, if necessary).
Fix support of 16 targets for WIDE SCSI.
Disable bus reset in case no progress is made for too long ("ncr dead"
message), which did not work too well with scanners and other slow devices.
(yes I had tested the hell out of this).
I've also temporarily disabled the code so that it behaves as it previously
did (tail-drops the SYNs) pending discussion with fenner about some socket
state flags that I don't fully understand.
Submitted by: fenner
if a single process is performing a large number of requests (in this
case writing a large file). The writing process could monopolise the
receive lock and prevent any other processes from receiving their
replies.
It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the
behaviour which originally pointed out the problem. When a process
writes to a file over NFS, it usually arranges for another process
(the 'iod') to perform the request. If no iods are available, then it
turns the write into a 'delayed write' which is later picked up by the
next iod to do a write request for that file. This can cause that
particular iod to do a disproportionate number of requests from a
single process which can harm performance on some NFS servers. The
alternative is to perform the write synchronously in the context of
the original writing process if no iod is available for asynchronous
writing.
The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and
the non-delayed behaviour is selected when vfs.nfs.dwrite=0. The
default is vfs.nfs.dwrite=1; if many people tell me that performance
is better if vfs.nfs.dwrite=0 then I will change the default.
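
The choice described above can be summarised as a small decision
function. This is only a sketch of the policy as worded in this log;
the enum and function names are hypothetical and do not come from the
NFS client code.

```c
enum sketch_write_mode {
    SKETCH_WRITE_ASYNC,     /* hand the request to a free iod */
    SKETCH_WRITE_DELAYED,   /* leave it for the next iod that writes the file */
    SKETCH_WRITE_SYNC       /* do it in the context of the writing process */
};

/*
 * iod_available: nonzero if an iod can take the request now.
 * dwrite: the value of vfs.nfs.dwrite (1 = delayed, 0 = synchronous).
 */
enum sketch_write_mode
sketch_choose_write_mode(int iod_available, int dwrite)
{
    if (iod_available)
        return SKETCH_WRITE_ASYNC;
    return dwrite ? SKETCH_WRITE_DELAYED : SKETCH_WRITE_SYNC;
}
```

The sysctl only matters when no iod is free; with a free iod the write
is asynchronous in both settings.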
Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
features are used without testing for i586 features that they depend
on. Configuring option PERFMON without configuring a suitable cpu
still doesn't fail right.
(1) Merged i386/i386/sb.h, deleted pc98/pc98/sb.h.
(2) pc98/conf/GENERIC8 looks more like i386/conf/GENERIC now.
(3) Fixed display bug in pc98/boot/biosboot/io.c.
(4) Prepare to merge memory allocation routines:
pc98/i386/locore.s
pc98/i386/machdep.c
pc98/pc98/pc98_machdep.c
pc98/pc98/pc98_machdep.h
(5) Support new board "C-NET(98)":
pc98/pc98/if_ed98.h
pc98/pc98/if_ed.c
(6) Make sure FPU is recognized for non-Intel CPUs:
pc98/pc98/npx.c
(7) Do not expect bss to be zero-allocated:
pc98/pc98/pc98.c
Submitted by: The FreeBSD(98) Development Team
Also disabled -Wunused. It caused too many warnings even for me.
The sign mismatch warnings should be fixed first. They are more
important and harder to disable (they are controlled by -W, which
controls too many things).
I586_OPTIMIZED_BCOPY is configured.
Similarly for bzero/I586_OPTIMIZED_BZERO.
Fake 586's had better have a hardware FPU with non-broken exception
handling (we mask exceptions, but broken exception handling may trap
on the instructions that do the masking). I guess this means that
the routines won't work on most 386's or FPUless 486's even when they
have a h/w FPU.
These are based on using the FPU to do 64-bit stores. They also
use i586-optimized instruction ordering, i586-optimized cache
management and a couple of other tricks. They should work on any
i*86 with a h/w FPU, but are slower on at least i386's and i486's.
They come close to saturating the memory bus on i586's. bzero()
can maintain a 3-3-3-3 burst cycle to 66 MHz non-EDO main memory
on a P133 (but is too slow to keep up with a 2-2-2-2 burst cycle
for EDO - someone with EDO should fix this). bcopy() is several
cycles short of keeping up with a 3-3-3-3 cycle for writing. For
a P133 writing to 66 MHz main memory, it just manages an N-3-3-3,
3-3-3-3 pair of burst cycles, where N is typically 6.
The new routines are not used by default. They are always configured
and can be enabled at runtime using a debugger or an lkm to change
their function pointer, or at compile time using new options (see
another log message).
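
Switching routines through a function pointer, as described above, can
be sketched in userland C as follows. All names are hypothetical, and
the "wide" routine merely imitates 64-bit stores with memcpy(); it does
not use the FPU and is not the i586 code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*sketch_bzero_fn)(void *, size_t);

/* Default routine. */
void
sketch_generic_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Alternative routine: clear 8 bytes at a time, then the tail. */
void
sketch_wide_bzero(void *buf, size_t len)
{
    unsigned char *p = buf;
    const uint64_t zero = 0;

    while (len >= sizeof(zero)) {
        memcpy(p, &zero, sizeof(zero));
        p += sizeof(zero);
        len -= sizeof(zero);
    }
    for (; len > 0; len--)
        *p++ = 0;
}

/* The vector that a debugger, module or build option would repoint. */
sketch_bzero_fn sketch_bzero_vector = sketch_generic_bzero;

void
sketch_bzero(void *buf, size_t len)
{
    sketch_bzero_vector(buf, len);
}
```

Callers always go through sketch_bzero(), so assigning
sketch_wide_bzero to sketch_bzero_vector changes the implementation
without touching any call site.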
Removed old, dead i586_bzero() and i686_bzero(). Read-before-write is
usually bad for i586's. It doubles the memory traffic unless the data
is already cached, and data is (or should be) very rarely cached for
large bzero()s (the system should prefer uncached pages for cleaning),
and the amount of data handled by small bzero()s is relatively small
in the kernel.
Improved comments about overlapping copies.
Removed unused #include.
callers of it to take advantage of this. This reduces new connection
request overhead in the face of a large number of PCBs in the system.
Thanks to David Filo <filo@yahoo.com> for suggesting this and providing
a sample implementation (which wasn't used, but showed that it could be
done).
Reviewed by: wollman
I have only tested the ABP5140 card and only with a single CDROM drive
but it seems to work fine. This driver relies on features found only in
the SCSI branch so will not work in -current until those changes
are brought in. It also doesn't have any error handling code *yet*.
The goal is to use this driver as the development platform for the new
generic SCSI layer error recovery/handling code.
PCI and EISA front ends will show up as soon as I get my hands on
the cards. There are also a few issues in the driver that I need
to clear up with AdvanSys before I can suggest sticking one of
these cards in your server. 8-)
Thanks to AdvanSys for releasing this code under a suitable copyright.
Obtained from: Ported from the Linux driver written by
bobf@advansys.com (Bob Frey).
64K. The change has essentially neutral effect on those machines with
little or no cache, and has a positive effect on "normal" machines
with 256K or more cache.
[long long] results when I last worked on them, but they are normally
used together with daddr_t's and off_t's which are signed, so the
unsigned results did little except cause warnings.
The main change is from unsigned long to unsigned int. It just needs to
be a 32-bit type and unsigned int is most natural. Using a non-long
type has the "advantage" of hiding bugs in the "machine-independent"
code where it prints foo_t's using %d or %x. These bugs are currently
hidden by not compiling with -Wformat.
I tried changing vm_ooffset_t from long long to unsigned long long, but
that was wrong because vm_ooffset_t needs to be long to match off_t,
although file offsets are never negative.
Reviewed by: dyson
Staticized it in userconfig. The one in pcvt is unused.
Removed bogus unused arg to getchar(). This should not have compiled
in the USERCONFIG_BOOT case, but the getchar() was also non-prototyped
and defined in K&R style.
Staticized the badly named global variable `next'. Even static variables
should have a unique module-specific prefix so that they can be referenced
easily in debuggers, etc.
Major: When blocking occurs in allocbuf() for VMIO files,
excess wire counts could accumulate.
Major: Pages are incorrectly accumulated into the physical
buffer for clustered reads. This happens when bogus
page is needed.
Minor: When reclaiming buffers, the async flag on the buffer
needs to be zero, or the reclaim is not optimal.
Minor: The age flag should be cleared, if a buffer is wanted.
First, change sysinstall and the Makefile rules to not build the kernel
nlist directly into sysinstall now. Instead, spit it out as an ascii
file in /stand and parse it from sysinstall later. This solves the
chicken-n-egg problem of building sysinstall into the fsimage before BOOTMFS is built
and can have its symbols extracted. Now we generate the symbol file in
release.8.
Second, add Poul-Henning's USERCONFIG_BOOT changes. These have two
effects:
1. Userconfig is always entered, rather than only after a -c
(don't scream yet, it's not as bad as it sounds).
2. Userconfig reads a message string which can optionally be
written just past the boot blocks. This string "preloads"
the userconfig input buffer and is parsed as user input.
If the first command is not "USERCONFIG", userconfig will
treat this as an implied "quit" (which is why you don't need
to scream - you never even know you went through userconfig
and back out again if you don't specifically ask for it),
otherwise it will read and execute the following commands
until a "quit" is seen or the end is reached, in which case
the normal userconfig command prompt will then be presented.
How to create your own startup sequences, using any boot.flp image
from the next snap forward (not yet, but soon):
% dd of=/dev/rfd0 seek=1 bs=512 count=1 conv=sync <<WAKKA_WAKKA_DOO
USERCONFIG
irq ed0 10
iomem ed0 0xcc000
disable ed1
quit
WAKKA_WAKKA_DOO
Third, add an intro screen to UserConfig so that users aren't just thrown
into this strange screen if userconfig is auto-launched. The default
boot.flp startup sequence is now, in fact, this:
USERCONFIG
intro
visual
(Since visual never returns, we don't need a following "quit").
Submitted-By: phk & jkh
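
The preload handling described in this message can be sketched as a
small parser. This is an illustration under assumptions, not the
userconfig code: the names are hypothetical, commands are taken to be
newline-separated, and "executing" a command is reduced to counting it.

```c
#include <stddef.h>
#include <string.h>

enum sketch_uc_result {
    SKETCH_UC_SKIP,     /* no "USERCONFIG" first: implied quit */
    SKETCH_UC_QUIT,     /* explicit "quit" seen */
    SKETCH_UC_PROMPT    /* input exhausted: fall into the normal prompt */
};

enum sketch_uc_result
sketch_userconfig_preload(const char *preload, int *executed)
{
    const char *line = preload;
    size_t len;
    int first = 1;

    *executed = 0;
    while (*line != '\0') {
        len = strcspn(line, "\n");
        if (first) {
            if (len != 10 || strncmp(line, "USERCONFIG", 10) != 0)
                return SKETCH_UC_SKIP;
            first = 0;
        } else if (len == 4 && strncmp(line, "quit", 4) == 0) {
            return SKETCH_UC_QUIT;
        } else if (len > 0) {
            (*executed)++;      /* stands in for running the command */
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return first ? SKETCH_UC_SKIP : SKETCH_UC_PROMPT;
}
```

An empty preload string is treated like a missing "USERCONFIG" line,
which matches the "you never even know you went through userconfig"
behaviour described above.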
1/ session leader
2/ Have a console device vnode (/dev/console)
3/ have NULL pointer for a console tty struct.
fix the only case where the tty struct is referenced without a prior
check for existence.
certain error conditions, it is possible for pages to be left allocated
in the object beyond its end. It is generally bad practice to allocate
pages beyond the end of an object.
support LD_HINTS_VERSION_2 that has the ldconfig pathname stored in the
ld.so.hints file (ie: a new library can be installed and used without
needing to run ldconfig -m first)
Reviewed by: nate, jdp
Obtained from: NetBSD (mostly)
divisor latch registers if the registers wouldn't change.
Use the default console cfcr setting while setting the divisor
latch registers for console i/o. Input may be messed up by
transiently changing the cfcr.
Use a usual cfcr setting while setting the divisor latch registers
in the probe. This shouldn't matter, but this is not the place to
test the UART's handling of 5 bit words.
Removed a stale devfs comment.
was always disabled because "pci.h" wasn't included. Now the configured
pci devices are listed and you can edit bogus flags for them.
Fixed bitrot in the disabled code. A used #include was removed and const
poisoning wasn't fixed.
Removed unused #include.
not resuming the NIC as required for transmit. Thanks to Alan Cox
<alc@cs.rice.edu> for noticing this.
Added another performance optimization to compensate. :-)
Changed crscdt to 1...strange, but this seems to be needed for some reason
despite what the manual says.
dependent operation, and not really a correct name. invltlb and invlpg
are more descriptive, and in the case of invlpg, a real opcode.
Additionally, fix the tlb management code for 386 machines.
- kern.maxproc and kern.maxprocperuid were read-only (and thus essentially
useless. Apparently no one uses them).
- all the user sysctls were read-write (and thus it was possible for them
to be inconsistent with the authoritative fixed values in the library).
Removed unused #include.
for headers in the compile directory work unsurprisingly. Without
-I-, the search for "foo.h" begins in the directory of the file
that includes it, and the compile directory is only searched because
`-I.' is in ${INCLUDES}.
Removed -I$S/sys from ${INCLUDES}. It was once necessary to find
things like "param.h" in $S/sys. Now <sys/param.h> is found in $S.
lcall 7,0 (ie: ldt slot 0) and lcall 0x87,0 (ldt slot 16, it's shifted
three bits to the left). I was fiddling with this so long ago, I don't
recall the specifics.
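
For reference, the selector arithmetic mentioned above works out as
follows. An x86 segment selector holds the table index in bits 3-15,
the table indicator (1 = LDT) in bit 2 and the requested privilege
level in bits 0-1. The helper names are made up for this sketch.

```c
/* Descriptor table slot: the selector shifted right by three bits. */
unsigned
sketch_sel_index(unsigned sel)
{
    return sel >> 3;
}

/* Table indicator: nonzero means the LDT, zero means the GDT. */
unsigned
sketch_sel_is_ldt(unsigned sel)
{
    return (sel >> 2) & 1;
}

/* Requested privilege level. */
unsigned
sketch_sel_rpl(unsigned sel)
{
    return sel & 3;
}
```

So selector 7 is LDT slot 0 at privilege level 3, and 0x87 is LDT slot
16 (0x87 >> 3) at the same privilege level.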
with this quite a while ago when somebody reported a BSD/OS 2.1 binary
that wouldn't run. I'm pretty sure they tried it and I'm pretty sure
they mentioned to me that the patch worked.
comparisons in the inb() and outb() macros. I decided that int args
are OK here. Any type that can hold a u_int16_t without overflow
is correct, and 32-bit types are optimal.
Introduced a few tens of warnings (100 in LINT) for use of pessimized
(short) types for the port arg. Only a few drivers are affected by
this. u_short pessimizations aren't detected.
Added `__extension__' before the statement-expression in inb() so
that it can be compiled without warnings by gcc -pedantic.
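
A sketch of the `__extension__' trick, for GNU C only. The real inb()
issues an I/O instruction; here an array stands in for the port space
so that the macro can be tried in userland. The SKETCH_ names and the
emulated port array are assumptions made for the illustration.

```c
/* Emulated 64K I/O port space; stands in for real hardware. */
static unsigned char sketch_ports[0x10000];

/*
 * Statement expression (a GNU C extension). Prefixing it with
 * __extension__ stops gcc -pedantic from warning about it. The port
 * is taken as an int: any type that can hold 16 bits will do.
 */
#define SKETCH_INB(port) __extension__ ({ \
    unsigned char sketch_data_; \
    int sketch_port_ = (port); \
    sketch_data_ = sketch_ports[sketch_port_ & 0xffff]; \
    sketch_data_; \
})

#define SKETCH_OUTB(port, data) \
    (sketch_ports[(int)(port) & 0xffff] = (unsigned char)(data))
```

The macro accepts an int or a u_short port argument equally; the
warnings mentioned above concern the efficiency of short types, not
their correctness.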
ring that caused wrong things to happen sometimes.
Doubled the number of transmit descriptors to 128 so that the internal
FIFO in the NIC can be fully filled when dealing with small packets.
Several minor performance improvements.
- don't include <sys/ioctl.h> in any header. Include <sys/ioccom.h>
instead. This was already done in 4.4Lite for the most important
ioctl headers. Header spam currently increases kernel build
times by 10-20%. There are more than 30000 #includes (not counting
duplicates) for compiling LINT.
- include <sys/types.h> if and only if it is necessary to make the header
almost self-sufficient (some ioctl headers still need structs from
elsewhere).
- uniformized idempotency ifdefs. Copied the style in the 4.4Lite
ioctl headers.
drop the oldest entry in the queue.
There was a fair bit of discussion as to whether or not the
proper action is to drop a random entry in the queue. It's
my conclusion that a random drop is better than a head drop,
however profiling this section of code (done by John Capo)
shows that a head-drop results in a significant performance
increase.
There are scenarios where a random drop is more appropriate.
If I find one in reality, I'll add the random drop code under
a conditional.
Obtained from: discussions and code done by Vernon Schryver (vjs@sgi.com).
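
A head drop on a full queue can be sketched with a small ring buffer.
This is not the socket code; the queue layout, its size and the names
are hypothetical, and the entries are plain ints instead of pending
connections.

```c
#define SKETCH_QLEN 4

struct sketch_queue {
    int items[SKETCH_QLEN];
    int head;       /* index of the oldest entry */
    int count;      /* number of valid entries */
};

/* Add an item; if the queue is full, discard the oldest entry first.
 * Returns 1 if an entry was dropped, 0 otherwise. */
int
sketch_enqueue_head_drop(struct sketch_queue *q, int item)
{
    int dropped = 0;

    if (q->count == SKETCH_QLEN) {
        q->head = (q->head + 1) % SKETCH_QLEN;  /* forget the oldest */
        q->count--;
        dropped = 1;
    }
    q->items[(q->head + q->count) % SKETCH_QLEN] = item;
    q->count++;
    return dropped;
}

int
sketch_oldest(const struct sketch_queue *q)
{
    return q->items[q->head];
}
```

A random drop would pick an arbitrary index instead of head, which
costs a search or a shuffle; the head drop is a constant-time pointer
bump.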
It is needed for implementation details but very little of it is
needed for the interface. Include it in the few places that didn't
already include it.
Include <sys/ioccom.h> in <sys/disklabel.h> (as already in
<sys/diskslice.h>) so that all the disk-related headers are almost
self-sufficient.
the prototype.
Put the jump table for i486_bzero() in the data section. This
speeds up i486_bzero() a little on Pentiums without significantly
affecting its speed on 486's.
Don't waste time falling through 14 nop's to return from do1 in
i486_bzero().
Use fastmove() for counts >= 1024 (was > 1024). Cosmetic.
Fixed profiling of fastmove().
Restored meaningful labels from the pre-1.1 version in fastmove().
Local labels are evil.
Fixed (high resolution non-) profiling of __bb_init_func().
comma expression has no effect" in the MAKE_SET() macro. This also
fixes compiling with -O3 (which removes static functions unless
there is a suitable reference to them). Declaring all the static
symbols as __unused would also fix the warning, but would be bogus
(they are used) and wouldn't fix -O3. However, the dummy pointers
for the references waste about 1.5K text and 20K symbol space for
GENERIC. This wastage hasn't changed - the dummy pointers are just
nonzero now.
to deal with the fact that we relied on devconf to do the shutdown
callouts in various drivers. The changes in this commit are to add support
for device shutdown in this driver via the new at_shutdown() mechanism.
Similar changes need to be made to all of the other drivers that need
a shutdown routine called (if_de.c comes to mind immediately).
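
The general shape of a shutdown callout list, as a userland analogue.
This does not reproduce the kernel's at_shutdown() interface; every
name and signature below is hypothetical and only shows the idea of a
driver registering a routine to be run at shutdown.

```c
#include <stddef.h>

#define SKETCH_MAX_SHUTDOWN 8

typedef void (*sketch_shutdown_fn)(void *arg);

static struct {
    sketch_shutdown_fn fn;
    void *arg;
} sketch_shutdown_list[SKETCH_MAX_SHUTDOWN];
static int sketch_shutdown_count;

/* Register a routine; returns 0 on success, -1 if the list is full. */
int
sketch_at_shutdown(sketch_shutdown_fn fn, void *arg)
{
    if (sketch_shutdown_count == SKETCH_MAX_SHUTDOWN)
        return -1;
    sketch_shutdown_list[sketch_shutdown_count].fn = fn;
    sketch_shutdown_list[sketch_shutdown_count].arg = arg;
    sketch_shutdown_count++;
    return 0;
}

/* Called once at shutdown: run every registered routine in order. */
void
sketch_run_shutdown(void)
{
    int i;

    for (i = 0; i < sketch_shutdown_count; i++)
        sketch_shutdown_list[i].fn(sketch_shutdown_list[i].arg);
}

/* Example driver routine: mark the (pretend) device as stopped. */
void
sketch_stop_device(void *arg)
{
    *(int *)arg = 0;
}
```

A driver would call the registration function once from its attach
routine, passing its own softc or flag as the argument.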
incorrect, and correct the support for B_ORDERED. The spl window
fix was from Peter Wemm, and his questions led me to find the problem with
the interrupt time page manipulation.
data pointed at in a ktrace file, if this process is being ktrace'ed.
I'm using this to profile malloc usage.
The advantage is that there is no context around this call, ie, no
open file or socket, so it will work in any process, and you can
decide if you want it to collect data or not.
/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>
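
With the corrected field names, code using the structure looks like
this. The helper is a made-up example; it assumes the standard
struct timespec from <time.h>.

```c
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
int64_t
sketch_timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}
```

Code written against ts_sec and ts_nsec would not compile on any other
system, which is why the names matter.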
I maintain that it saves more power to simply "hlt" the CPU than to
spend tons of time trying to tell the APM bios to do the same.
In particular if you do it 100 times a second...
changes. This version should fix a number of bugs such as with auto-
speed sensing and at least one known panic.
Submitted by: Matt Thomas (matt@3am-software.com)
bsd.obj.mk. Also, a make target called objwarn checks to see
if ${.OBJDIR} != ${.CURDIR} and ${.OBJDIR} != ${CANONICALOBJDIR}
and outputs a warning. (No warning for the latter if MAKEOBJDIR or
MAKEOBJDIRPREFIX is set). objwarn is called from all targets in bsd.prog.mk,
bsd.kmod.mk, and bsd.lib.mk.
Reviewed by: bde
is that it doesn't say _what_ did it! (the core dumped console message
is very useful for listing the process name and pid). This adds similar
information.
`show vmopag', `show page' and `show pageq'. Moved all vm ddb stuff
to the ends of the vm source files.
Changed printf() to db_printf(), `indent' to db_indent, and iprintf()
to db_iprintf() in ddb commands. Moved db_indent and db_iprintf()
from vm to ddb.
vm_page.c:
Don't use __pure. Staticized.
db_output.c:
Reduced page width from 80 to 79 to inhibit double spacing for long
lines (there are still some problems if words are printed across
column 79).
The details are hidden in the DB_COMMAND(cmd_name, func_name) and
DB_SHOW_COMMAND(cmd_name, func_name) macros. DB_COMMAND() adds to
the top-level ddb command table and DB_SHOW_COMMAND adds to the
`show' subtable. Most external commands will probably be `show'
commands with no side effects. They should check their pointer
args more carefully than `show map' :-), or ddb should trap internal
faults better (like it does for memory accesses).
The vm ddb commands are temporarily unattached.
ddb.h:
Also declare `db_indent' and db_iprintf() which will replace vm's
`indent' and iprintf().
Saved a few bytes by copying `dosdev' and/or `name' to local variables.
This optimization (for dosdev) was done in one place before but this
was lost in the devread() cleanup. This optimization (for dosdev)
can almost be done by bogusly declaring dosdev as const, but gcc still
often space-pessimizes code like the following:
extern const int dosdev; ... foo(dosdev); bar(dosdev);
gcc often doesn't bother to copy dosdev to a temporary local because
the local would have to be preserved in memory across the call to
foo(). OTOH, for
extern int dosdev; ... auto int dosdev_copy = dosdev; ...
foo(dosdev_copy); bar(dosdev_copy);
the copy must be made because foo() might alter dosdev.
the pointer to the string "/kernel". This pointer was once
statically initialized to save space, but it has had to be dynamically
initialized for some time, so the static initialization just wastes
space. The string gets moved to the text section, so the actual
savings may be negative due to padding.
instead of 0 if there is no input.
pcvt_drv.c:
Partially fixed pccncheckc(). It returned a boolean value instead of
the character that it fetches from the input fifo (if any). I think
it still discards characters after the first for multi-char input.
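
Why a check routine should return the character rather than a boolean
can be seen in a small sketch. The fifo and the names are hypothetical,
and the "no input" value of -1 is an assumption of this sketch; the log
only says that it is not 0.

```c
#define SKETCH_FIFO_SIZE 8

struct sketch_fifo {
    unsigned char buf[SKETCH_FIFO_SIZE];
    int head;       /* index of the next character to fetch */
    int count;      /* characters waiting */
};

/* Queue one character; returns 0 on success, -1 if the fifo is full. */
int
sketch_fifo_put(struct sketch_fifo *f, unsigned char c)
{
    if (f->count == SKETCH_FIFO_SIZE)
        return -1;
    f->buf[(f->head + f->count) % SKETCH_FIFO_SIZE] = c;
    f->count++;
    return 0;
}

/* Non-blocking check: the next character, or -1 if there is none. */
int
sketch_checkc(struct sketch_fifo *f)
{
    int c;

    if (f->count == 0)
        return -1;
    c = f->buf[f->head];
    f->head = (f->head + 1) % SKETCH_FIFO_SIZE;
    f->count--;
    return c;
}
```

Returning 0 for "no input" would make a typed NUL character
indistinguishable from an empty fifo; an out-of-range value avoids
that.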
instead of 0 if there is no input.
syscons.c:
Added missing spl locking in sccncheckc(). Return the same value as
sccngetc() would. It is wrong for sccngetc() to return non-ASCII, but
stripping the non-ASCII bits doesn't help.
still being used just to support printing of the device name in the
probe. Restored the method used in rev.1.6 and changed it to print
the same strings as the previous revision.
Reviewed by: Paul Richards
time, in seconds, that state for non-established TCP sessions stays about)
a sysctl modifiable variable.
[part 1 of two commits, I just realized I can't play with the indices as
I was typing this commit message.]
to "keepidle". this should not occur unless the connection has
been established via the 3-way handshake which requires an ACK
Submitted by: jmb
Obtained from: problem discussed in Stevens vol. 3
B_ASYNC flag broke things pretty bad (freeing buffer already on
queue or other weird buffer queue errors.) The broken code is
left in commented out, but this makes the problem go away for
now.
(1) Add PC98 support to apm_bios.h and ns16550.h, remove pc98/pc98/ic
(2) Move PC98 specific code out of cpufunc.h (to pc98.h)
(3) Let the boot subtrees look more alike
Submitted by: The FreeBSD(98) Development Team
<freebsd98-hackers@jp.freebsd.org>
modified. Pages that are removed by the pageout daemon were
the worst affected. Additionally, numerous minor cleanups,
including better handling of busy page table pages. This
commit fixes the worst of the pmap problems recently introduced.
biosextmem > 65536, but biosextmem is a 16-bit quantity so it is
guaranteed to be < 65536. Related cruft for biosbasemem was
mostly cleaned up in rev.1.26.
It worked because it is spelled correctly in LINT.
Added old obscure syscons options MAXCONS, SLOW_VGA and XT_KEYBOARD.
This file should be sorted both alphabetically and on the module
name by using a consistent prefix for each module, but there is no
consistency in the old options. E.g., MAXCONS is spelled PCVT_NSCREENS
for pcvt.
and xdm, possibly in general.
What was happening was that the server was doing a tcsetattr(.. TCSADRAIN)
on the mouse fd after a write. Since /dev/sysmouse had a null t_oproc,
the drain failed with EIO. Somehow this spammed XFree86 (!@&^#%*& binary
release!!), and the driver was left in a bogus state (ie: switch_in_progress
permanently TRUE).
The simplest way out was to implement a dummy scmousestart() routine to
accept any characters from the tty system and toss them into the void.
It would probably be more correct to intercept scwrite()'s to the mouse
device, but that's executed for every single write to the screen.
Supplying a start routine to eat the characters is only executed for the
mouse port during startup/shutdown, so it should be faster.
-I- to CFLAGS. <sb.h> must currently be used to give the version
of sb.h in the current directory, while "sb.h" in the buggy version
gave the (wrong) version in the source directory. Searching in the
source directory first is normal, but is the reverse of the order
suggested by the 4.4Lite2 #include style. -I- will remove the
ambiguities.
Sorry if this makes it harder to merge in lite2 stuff but hey..
At least I can figure out what is going on whenever I end up going through those
files again..
do we have a policy regarding commenting existing code?
This enables other consumers of the mouse to get its info via
moused/syscons.
In order to use it run moused (from sysconfig), and then tell
your Xserver that it should use /dev/sysmouse (mknod sysmouse c 12 128)
and that it is a mousesystems mouse. Everybody will be happy then :)
Remember that moused still needs to know what kind of mouse you
have..
Comments welcome, as are test results...
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)
(A pointer to a const was misused to avoid loading the same
value twice, but gcc does exactly the same optimization automatically.
It can see that the value hasn't changed.)
- avoiding strcmp("?" saved 12 bytes. gcc inlined the strcmp()
but this takes as much or more code as a function call. The
inlining was bogus because the strcmp() in the bootstrap isn't
standard.
- using a char instead of an int for the boolean `last_only' saved 8
bytes. Booleans should usually be represented as chars on the i386.
- simplifying the return tests saved 9 bytes.
- using putc instead of printf to print a newline saved 3 bytes of code
and 2 bytes of const data.
- avoiding `else's by always doing the else clause and fixing it up
saved 4+8 bytes.
gcc always generates large code for accesses to globals. For locals
it only generates large code if there are more than 128 bytes of
locals. It sorts scalar locals after array locals to pessimize for
space in the usual case when there are more (static) references to
scalars than to arrays.
Saved another 16 bytes (13 before padding) by adding a `continue'.
Fall-through tests normally save space, but here one of them made
gcc do space-unoptimal register allocation (it allocates ch in %bl
because preserving this register across function calls is "free",
but comparisons with %al take one byte fewer than comparisons with
%bl).