Indeed, if we look at Church's paper, we find that the term is 80 symbols when written out in full, {λt[{{{{{t}(t)}(λh[λf[λn[{{{n}(h)}(f)}(n)]]])}(t)}(t)}(t)]}(λf[λx[{f}({f}(x))]]), or 44 symbols when abbreviated, {λt t(t, (λh λf λn n(h, f, n)), t, t, t)}(λf λx f(f(x))), which isn't too impressive given an alphabet of 11 or 12 symbols.
It's pretty natural to use de Bruijn indices, since their meaning is local and their composition matches the language's semantics: de Bruijn terms have a denotational semantics, which gives expressions an intrinsic meaning. For example, with de Bruijn indices the expression `λ0` is always the identity function. If we tried a different scheme, like "the index of the lambda in the string", then expressions like `λ0` would have no intrinsic meaning: their meaning would depend on which "string" they occur in, which requires some concrete notion of "the string", which complicates any interpreter, etc.
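For concreteness, here's a minimal sketch in Haskell of the textbook de Bruijn machinery (my own illustration; the `shift`/`subst` definitions follow the standard presentation, e.g. in Pierce's TAPL, and aren't taken from any particular implementation):

```haskell
-- Untyped lambda calculus with de Bruijn indices: `Var 0` refers to
-- the nearest enclosing Lam, so `Lam (Var 0)` is the identity
-- function regardless of where it appears.
data Term = Var Int        -- index: how many binders to skip outward
          | Lam Term       -- binder; its variable is index 0 in the body
          | App Term Term  -- application
  deriving Show

-- Shift indices >= cutoff c by d (needed when a term is moved under
-- additional binders during substitution).
shift :: Int -> Int -> Term -> Term
shift d c (Var k)   = Var (if k >= c then k + d else k)
shift d c (Lam t)   = Lam (shift d (c + 1) t)
shift d c (App f a) = App (shift d c f) (shift d c a)

-- Substitute s for index j in t.
subst :: Int -> Term -> Term -> Term
subst j s (Var k)   = if k == j then s else Var k
subst j s (Lam t)   = Lam (subst (j + 1) (shift 1 0 s) t)
subst j s (App f a) = App (subst j s f) (subst j s a)

-- One beta-reduction step at the root: (λ.body) applied to arg.
beta :: Term -> Term -> Term
beta body arg = shift (-1) 0 (subst 0 (shift 1 0 arg) body)
```

Notice there's no environment, no fresh-name generation and no alpha-conversion anywhere: e.g. `beta (Var 0) (Lam (Var 0))` reduces `(λ0)(λ0)` straight to `Lam (Var 0)`.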
Whenever I see an implementation of lambda calculus (or some derivative) which doesn't use de Bruijn indices, I require some justification to convince me why not :) (There are such justifications, e.g. performance optimisations, use of a host language's naming facilities, etc.; but I can't think of any reason a self-contained, minimalist approach should avoid them)
> why use a self-delimiting format in the first place?
There are various technical reasons to prefer self-delimiting languages when working in algorithmic information theory. For example, prefix-free (binary) encodings correspond to paths through a (binary) tree, with programs as leaves. This lets us assign a well-defined probability measure to the set of all programs: assign 1 to the root node, and divide each node's probability evenly between its (two) children. It's also more natural when programs are executed before they're fully known, e.g. reading one bit at a time on demand (say, generated by a coin toss).
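To make the tree picture concrete, here's a minimal sketch (my own illustration; `Program` and `measure` are hypothetical names, not a standard API):

```haskell
import Data.Ratio ((%))

-- A program is a path of bits from the root of the binary tree.
type Program = [Bool]

-- The root has probability 1, and each node splits its probability
-- evenly between its two children, so a program of length n ends up
-- with measure 2^(-n).
measure :: Program -> Rational
measure p = 1 % (2 ^ length p)

-- Because the encoding is prefix-free, no program is a prefix of
-- another, so distinct programs occupy disjoint subtrees and their
-- measures sum to at most 1 (Kraft's inequality). For example:
--   measure [True] + measure [False, True] + measure [False, False]
--     == 1%2 + 1%4 + 1%4 == 1
```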