What makes today's software bloated is the cruft built up over time, plus standardization, security, reliability, a trend toward favoring maintainability and productivity over raw speed, and so on. Here's [1] a simple program that was trimmed down to mere bytes; it shows how much overhead those factors add to C code, which by itself compiles to very efficient assembly. For people wanting a middle ground, there are high-level assemblers such as Randall Hyde's HLA [2], and I've speculated that we could do something similar with LLVM's bytecode.
[1] http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...
[2] http://www.plantation-productions.com/Webster/HighLevelAsm/H...