Hey HN!
I've been curious about the history of computer science, and decided to try to read Turing's 1936 paper - the one where he conceptualizes what we now call the Turing machine.
I had trouble understanding the paper, so I read The Annotated Turing by Charles Petzold (which is wonderful), but still felt that reading a reference implementation would help formalize my understanding. When I couldn't find an open-source implementation, I decided to write my own.
The implementation includes:
- Abbreviated tables (m-functions)
- Conversions to Standard Descriptions and Description Numbers (sketched in code just after this list)
- A working universal machine
- A walkthrough of the paper section-by-section in the context of the codebase
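Since the Standard Description conversion is the most mechanical part, here's a minimal Go sketch of it, assuming a hypothetical Rule type (the repo's actual representation may well differ):

```go
package main

import (
	"fmt"
	"strings"
)

// One line of a machine table: in m-configuration State, scanning
// Symbol, print Write, move Move ('L', 'R', or 'N'), go to Next.
// (Hypothetical type for illustration - not the repo's actual one.)
type Rule struct {
	State, Symbol, Write int
	Move                 byte
	Next                 int
}

// Section 5 of the paper: m-configuration q_i becomes "D" followed by
// i copies of "A"; symbol S_j becomes "D" followed by j copies of "C".
func encState(i int) string { return "D" + strings.Repeat("A", i) }
func encSym(j int) string   { return "D" + strings.Repeat("C", j) }

// The Standard Description concatenates the encoded rules,
// separated by semicolons.
func standardDescription(rules []Rule) string {
	var sb strings.Builder
	for _, r := range rules {
		sb.WriteString(encState(r.State))
		sb.WriteString(encSym(r.Symbol))
		sb.WriteString(encSym(r.Write))
		sb.WriteByte(r.Move)
		sb.WriteString(encState(r.Next))
		sb.WriteByte(';')
	}
	return sb.String()
}

// The Description Number substitutes digits for letters:
// A->1, C->2, D->3, L->4, R->5, N->6, ;->7.
func descriptionNumber(sd string) string {
	return strings.NewReplacer(
		"A", "1", "C", "2", "D", "3",
		"L", "4", "R", "5", "N", "6", ";", "7",
	).Replace(sd)
}

func main() {
	// Turing's first example machine, printing 0 1 0 1 ... with blanks
	// in between (S0 = blank, S1 = "0", S2 = "1").
	rules := []Rule{
		{1, 0, 1, 'R', 2}, // b: on blank, print 0, move right, -> c
		{2, 0, 0, 'R', 3}, // c: on blank, move right, -> e
		{3, 0, 2, 'R', 4}, // e: on blank, print 1, move right, -> f
		{4, 0, 0, 'R', 1}, // f: on blank, move right, -> b
	}
	sd := standardDescription(rules)
	fmt.Println(sd) // DADDCRDAA;DAADDRDAAA;DAAADDCCRDAAAA;DAAAADDRDA;
	fmt.Println(descriptionNumber(sd))
}
```

The example rules are Turing's first machine, so you can check the output against the SD he gives in the paper.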
I'm still working on sections 8-11 (the math/logic is a bit difficult for me) - if you understand these sections well enough to explain them, I'd love to talk to you!

In general I'm interested in new media for learning source material (web annotations, walkthroughs, reference implementations, etc.), and for this project I was particularly inspired by Karpathy's Zero to Hero course.
Note that a straightforward universal machine for the lambda calculus can be orders of magnitude smaller than for Turing machines [1].
[1] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
I'd love to see how far you've got - I had more of an interest in lambda calculus (and particularly combinators), but I've now developed a serious fascination with TMs. And yes, the CS course I did covered them - but that was a long time ago!
https://www.youtube.com/watch?v=kmAc1nDizu0
Also the BB Challenge site - where I've been getting definitions of TMs to run:
e.g.
1RB0LD_1RC0RF_1LC1LA_0LE1RZ_1LF0RB_0RC0RE
I've run that one, even though there is no hope of me ever seeing it terminate
https://bbchallenge.org/1RB0LD_1RC0RF_1LC1LA_0LE1RZ_1LF0RB_0...
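In case the format is cryptic to others: my reading is that each underscore-separated group is one state (A, B, C, ...), and within a group the first three characters say what to do on reading 0 and the last three on reading 1 - write bit, move direction, next state, with Z as the halt state. Here's a rough Go sketch of a parser based on that reading (it skips the "---" convention for undefined transitions):

```go
package main

import (
	"fmt"
	"strings"
)

// A transition in the bbchallenge text format: write a symbol,
// move left or right, and switch to a new state.
type Transition struct {
	Write byte // '0' or '1'
	Move  byte // 'L' or 'R'
	Next  byte // 'A'..'Z'; 'Z' conventionally halts
}

// parse splits e.g. "1RB0LD_1RC0RF_..." into one row per state, with
// two transitions per row (one for reading 0, one for reading 1).
func parse(desc string) ([][2]Transition, error) {
	rows := strings.Split(desc, "_")
	table := make([][2]Transition, len(rows))
	for i, row := range rows {
		if len(row) != 6 {
			return nil, fmt.Errorf("state %c: want 6 chars, got %q", 'A'+i, row)
		}
		for j := 0; j < 2; j++ {
			t := row[3*j : 3*j+3]
			table[i][j] = Transition{Write: t[0], Move: t[1], Next: t[2]}
		}
	}
	return table, nil
}

func main() {
	table, err := parse("1RB0LD_1RC0RF_1LC1LA_0LE1RZ_1LF0RB_0RC0RE")
	if err != nil {
		panic(err)
	}
	for i, row := range table {
		fmt.Printf("state %c: on 0 -> %s, on 1 -> %s\n",
			'A'+i, fmtT(row[0]), fmtT(row[1]))
	}
}

func fmtT(t Transition) string {
	return fmt.Sprintf("write %c, move %c, goto %c", t.Write, t.Move, t.Next)
}
```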
I chose Go for the Turing machines because I enjoy writing it, and had planned to blindly use Go again for the lambda calculus for the same reason - but if you have a recommendation, I'd love to hear it!
Although I never wrote a Turing Machine interpreter in the lambda calculus, I did write one for its close cousin, the Brainfuck language [2].
[1] https://www.irif.fr/~krivine/articles/lazymach.pdf
[2] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
For encoding size, it would be optimal if index i occurred with frequency roughly 2^-i. In many lambda terms of practical interest, one does see higher indices occurring much less frequently, so it's not terribly far from optimal. Some compression is certainly possible: within n binding lambdas, the maximal index n could be encoded as 1^n instead of 1^n 0 (the decoder knows no larger index can occur there), but again, that severely complicates the interpreter itself.
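To make that concrete: in the binary lambda calculus encoding, 00 introduces an abstraction, 01 an application, and a variable with De Bruijn index i is the unary string 1^i 0. A small Go sketch, with types invented just for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// De Bruijn terms: a variable (index >= 1), an abstraction,
// or an application. (Types are just for illustration.)
type Term interface{ blc() string }

type Var int              // De Bruijn index, counting binders from 1
type Lam struct{ Body Term }
type App struct{ Fun, Arg Term }

// Variable i is encoded in unary as 1^i 0 - self-delimiting with no
// length prefix, which keeps the interpreter simple.
func (v Var) blc() string { return strings.Repeat("1", int(v)) + "0" }

// Abstraction is 00 followed by the body.
func (l Lam) blc() string { return "00" + l.Body.blc() }

// Application is 01 followed by both subterms.
func (a App) blc() string { return "01" + a.Fun.blc() + a.Arg.blc() }

func main() {
	// \x.\y. x y  ==  \ \ (2 1)  in De Bruijn form
	t := Lam{Lam{App{Var(2), Var(1)}}}
	fmt.Println(t.blc()) // 00000111010
}
```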
I noticed that some of the program lengths ended up in expressions of lower and upper bounds. Also, lambda terms represented with De Bruijn indices are essentially lists of numbers, and a binary encoding could give an exponentially shorter representation than a unary one, at the price of some overhead when dealing with the binary numbers - overhead which I thought might be a constant.
But admittedly I did not read the page too carefully, and would probably need to refresh my knowledge to properly understand the details. The programs there are also mostly short, so a binary encoding would probably make them longer. And if it really mattered, then it is of course such an obvious thing to do that it would certainly not have been overlooked.
Several of the most celebrated theorems in AIT (algorithmic information theory), such as Symmetry of Information, employ programs that need to interpret other programs, so the constants in these theorems suffer significantly from complicated index encodings.
A binary encoding is not as simple as you might think, because the code needs to be self-delimiting. So you first have to encode the number of bits used in the binary code - and that length must itself be self-delimiting. As you can see, at some point you have to fall back on a unary encoding, which is naturally self-delimiting.
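The classic Elias gamma code illustrates the pattern: the length of n's binary representation is sent first, in unary (as leading zeros), so the decoder knows where the digits end. A rough Go sketch - just an illustration, not the code used in [1]:

```go
package main

import (
	"fmt"
	"strings"
)

// Elias gamma: send the length of n's binary form in unary (leading
// zeros), then the binary digits themselves. Defined for n >= 1.
func gammaEncode(n uint) string {
	bin := fmt.Sprintf("%b", n)
	return strings.Repeat("0", len(bin)-1) + bin
}

// Decode one gamma-coded number and return the unread remainder.
func gammaDecode(code string) (n uint, rest string) {
	k := strings.IndexByte(code, '1')   // size of the unary length prefix
	for _, b := range code[k : 2*k+1] { // the k+1 binary digits of n
		n = n<<1 | uint(b-'0')
	}
	return n, code[2*k+1:]
}

func main() {
	for _, n := range []uint{1, 2, 5, 100} {
		c := gammaEncode(n)
		m, _ := gammaDecode(c)
		fmt.Printf("%3d -> %-13s -> %d\n", n, c, m)
	}
}
```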
An example of an efficient binary self-delimiting code can be found at [1]. Note that the program for decoding it into a Church numeral (the very best case) is already about 60% of the size of the entire self-interpreter.
Many programs used in AIT proofs use only single-digit indices (the self-interpreter, for example, uses only 1-5), and these would be negatively impacted by a binary encoding.
[1] https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...