If the number representation is encoded outside the 64 bits, then you have removed the 64-bit restriction. Of course it is hard to calculate how many bits of information are required to define the type. But "uint64" is pretty small and just requires our current human context (quite a few more bits of information) to make sense!
The explanation for how to interpret the 64-bit string is always outside of the 64-bit string. It's going to be tough to compare how big two explanations are, since that depends a lot on what each person considers a sufficient explanation.
I'm personally quite rusty on lambda calculus, and from glancing at the author's papers about lambda calculus encodings I suspect it will take a rather large number of bytes to represent an explanation that will make it through my thick skull!
Indeed, if we look at Church's paper, we'd find that the term is 80 symbols when written out fully, `{λt[{{{{{t}(t)}(λh[λf[λn[{{{n}(h)}(f)}(n)]]])}(t)}(t)}(t)]}(λf[λx[{f}({f}(x))]])`, or 44 symbols when abbreviated, `{λt t(t, (λh λf λn n(h, f, n)), t, t, t)}(λf λx f(f(x)))`, neither of which is too impressive given the alphabet of 11 or 12 symbols.
I made the simplest choices I could that do not waste bits.
> And why use a self-delimiting format in the first place?
Because a lambda term description has many different parts that you need to be able to separate from each other.
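As a concrete sketch of what "parts that separate themselves" looks like (Haskell; `Term` and `encode` are names I've made up, but the bit assignments follow the binary lambda calculus convention of 00 for lambda, 01 for application, and unary de Bruijn indices):

```haskell
-- De Bruijn terms: Var 0 refers to the innermost enclosing lambda.
data Term = Var Int | Lam Term | App Term Term

-- Self-delimiting encoding: 00 = lambda, 01 = application,
-- (n+1) ones followed by a 0 = variable n in unary. Every part
-- announces its own shape and length as it is read, so the pieces can
-- be concatenated with no separators in between.
encode :: Term -> String
encode (Var n)   = replicate (n + 1) '1' ++ "0"
encode (Lam b)   = "00" ++ encode b
encode (App f a) = "01" ++ encode f ++ encode a
```

For instance Church's numeral two, `Lam (Lam (App (Var 1) (App (Var 1) (Var 0))))`, encodes as `0000011100111010`: sixteen bits whose internal boundaries a reader can recover from the bits alone.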
> And why encode de Bruijn indices in unary
I tried to answer that in more detail in this previous discussion [1].
[1] >>37584869
It's pretty natural to use de Bruijn indices, since their meaning is local and their composition matches the language semantics (they have a denotational semantics, which gives expressions an intrinsic meaning). For example, with de Bruijn indices the expression `λ0` is always the identity function. If we tried a different scheme, like "the index of the lambda in the string", then expressions like `λ0` would have no intrinsic meaning: it would depend on which "string" they occur in, which requires some concrete notion of "the string", complicates any interpreter, and so on.
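To illustrate that locality with a sketch (same made-up `Term` type as above; the semantics here is just one obvious choice, not necessarily the author's):

```haskell
-- De Bruijn terms again: Var 0 refers to the innermost enclosing lambda.
data Term = Var Int | Lam Term | App Term Term

-- A denotational reading: a term's meaning depends only on the stack of
-- values bound by its enclosing lambdas, never on any surrounding
-- source string.
data Value = VFun (Value -> Value)

eval :: [Value] -> Term -> Value
eval env (Var n)   = env !! n                       -- look up the n-th binder
eval env (Lam b)   = VFun (\v -> eval (v : env) b)  -- push the argument
eval env (App f a) = let VFun g = eval env f in g (eval env a)
```

Here `eval env (Lam (Var 0))` is the identity function in every `env`, which is exactly the intrinsic meaning being claimed: `λ0` denotes the same thing no matter which term it is embedded in.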
Whenever I see an implementation of lambda calculus (or some derivative) which doesn't use de Bruijn indices, I require some justification to convince me why not :) (There are such justifications, e.g. performance optimisations, use of a host language's naming facilities, etc.; but I can't think of any reason a self-contained, minimalist approach should avoid them.)
> why use a self-delimiting format in the first place?
There are various technical reasons to prefer self-delimiting languages when working in algorithmic information theory. For example, prefix-free (binary) encodings are paths through a (binary) tree, with programs as leaves. This lets us assign a well-defined probability measure to the set of all programs (assign 1 to the root node, and divide each node's probability between its (two) children). It's also more natural when programs are executed before they're fully known, e.g. reading one bit at a time on demand (say, each bit generated by a coin toss).
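A minimal sketch of both points, reusing the made-up `Term` and bit conventions from above (`decode` and `prob` are my names, not anything from the author's code):

```haskell
-- De Bruijn terms: Var 0 refers to the innermost enclosing lambda.
data Term = Var Int | Lam Term | App Term Term

-- A prefix-free decoder: it consumes exactly the bits making up one term
-- and returns the leftover bits untouched. On a lazy list of bits this
-- genuinely reads one bit at a time, on demand.
decode :: String -> Maybe (Term, String)
decode ('0':'0':bs) = do (b, rest) <- decode bs
                         pure (Lam b, rest)
decode ('0':'1':bs) = do (f, rest)  <- decode bs
                         (a, rest') <- decode rest
                         pure (App f a, rest')
decode ('1':bs)     = case span (== '1') bs of
                        (ones, '0':rest) -> Just (Var (length ones), rest)
                        _                -> Nothing
decode _            = Nothing

-- The coin-toss measure: a program accepted after reading k bits has
-- probability 2^(-k). Prefix-freeness makes those events disjoint, so
-- the probabilities of all programs sum to at most 1.
prob :: String -> Double
prob bits = 2 ^^ negate (length bits)
```

Since no accepted bit string is a prefix of another, `decode` never needs a length header or an end marker, which is the self-delimiting property doing its job.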