Indeed, if we look at Church's paper, we'd find that the term is 80 symbols written fully, {λt[{{{{{t}(t)}(λh[λf[λn[{{{n}(h)}(f)}(n)]]])}(t)}(t)}(t)]}(λf[λx[{f}({f}(x))]]), or 44 symbols when abbreviated, {λt t(t, (λh λf λn n(h, f, n)), t, t, t)}(λf λx f(f(x))), which aren't too impressive given the alphabet of 11 or 12 symbols.
I made the simplest choices I could that do not waste bits.
> And why use a self-delimiting format in the first place?
Because a lambda term description has many different parts that you need to be able to separate from each other.
> And why encode de Bruijn indices in unary
I tried to answer that in more detail in this previous discussion [1].
[1] >>37584869
It's pretty natural to use de Bruijn indexes, since their meaning is local and their composition matches the language semantics (they have a denotational semantics, which gives expressions an intrinsic meaning). For example, with de Bruijn indices the expression `λ0` is always the identity function. If we tried a different scheme, like "the index of the lambda in the string", then expressions like `λ0` would have no intrinsic meaning; it depends on what "string" they occur in, which requires some concrete notion of "the string", which complicates any interpreter, etc.
Whenever I see an implementation of lambda calculus (or some derivative) which doesn't use de Bruijn indexes, I require some justification to convince me why not :) (There are such justifications, e.g. performance optimisations, use of a host language's naming facilities, etc.; but I've can't think of any reason a self-contained, minimalist approach should avoid them)
> why use a self-delimiting format in the first place?
There are various technical reasons to prefer self-delimiting languages when working in algorithmic information theory. For example, prefix-free (binary) encodings are paths through a (binary) tree, with programs as leaves. This lets us assign well-defined probability measures to the set of all programs (assign 1 to the root node, and divide each node's probability between its (two) children). It's also more natural when programs are executed before they're fully known; e.g. reading one bit at a time on-demand (say, generated by a coin toss)