So, basically, either way the interpreter only has to execute a small number of bytecodes: the handful needed to call the builtins is comparable to the handful needed to run the recursive definition, so the interpretive overhead is comparable in both cases (and it swamps the cost of things like allocating a new string).
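To make that concrete, here is a rough Python sketch of the two approaches being weighed; the builtin-based version is an assumption (something like len(str(n))), and the names dl_builtin and dl_recursive are just illustrative:

    def dl_builtin(n):
        # let the builtins do the work: format n and count the characters,
        # which allocates a new string on every call
        return len(str(n))

    def dl_recursive(n):
        # repeated integer division; only a handful of bytecodes per level
        return 1 if n < 10 else 1 + dl_recursive(n // 10)
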
This machine is kind of slow. This took 54 CPU seconds in LuaJIT:
> function dl(n) if n < 10 then return 1 else return 1 + dl(n/10) end end
> for i = 1, 1000*1000*100 do dl(15322) end
That means this approach took 540 ns per invocation (54 CPU seconds over 10⁸ calls) rather than Python's 4640 ns, which makes Python only about 9× slower here instead of the usual 40×. Or maybe this is a case where LuaJIT isn't really coming through the way it usually does.
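For comparison, a minimal sketch of how the Python side of that measurement might be reproduced with timeit; the Python dl here is assumed to mirror the Lua definition above, not necessarily the exact code behind the 4640 ns figure:

    import timeit

    def dl(n):
        # same recursive digit count as the Lua version above
        return 1 if n < 10 else 1 + dl(n // 10)

    # time a million calls and report nanoseconds per invocation
    t = timeit.timeit(lambda: dl(15322), number=10**6)
    print("%.0f ns per call" % (t / 10**6 * 1e9))
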