import math
def digitLengthCorrect(n):
return len(str(n))
def digitLengthClever2(n):
return 1 if n == 0 else (math.floor(math.log10(n)) + 1)
testcases = (
[10 ** i for i in range(300)] +
[(10 ** i) - 1 for i in range(300)]
)
for t in testcases:
a = digitLengthCorrect(t)
b = digitLengthClever2(t)
assert a == b, (t, a, b)If your target language is Python3, you have the "bit_length()" method of integer objects. Multiply by c ~= 0.30103 and then correct for being too low; superficially, it seems like this algorithm should work for extremely large numbers, at least to millions of digits.
As far as test cases, n10, (n10)+1 and (n10)-1 seem like good choices and as long as the underlying mathematical function is monotonic should ferret out failures. I could easily run my implementation for numbers up to 30,000 digits, but going to 300,000 digits didn't finish while writing this message.
log10_2 = math.log10(2)
def digit_length_clever3(n):
if not n: return 1
est = math.floor(n.bit_length() * log10_2)
low = n >= 10**est
return est + low def digitlength(n):
digits = 0
while True:
digits += 1
n /= 10
if n == 0:
return digits> As a result, solutions using strings are disallowed on problem sets and quizzes until they are taught.
Edit: In C, the version that loops is 15 times faster than the version that allocates a new string. Python is weird.
def digitlength(n):
digits = 1
while (n > 9):
digits += 1
n /= 10
return digits
Also may want to set n = abs(n) in the beginning, in case n is negative. In [1]: dl = lambda n: 1 if n < 10 else 1 + dl(n // 10)
In [3]: map(dl, [0, 1, 9, 10, 99, 1000, 15432, 32, 801])
Out[3]: [1, 1, 1, 2, 2, 4, 5, 2, 3]
In [4]: %timeit dl(15322)
100000 loops, best of 3: 4.64 µs per loop
In [5]: %timeit len(str(15322))
1000000 loops, best of 3: 1.41 µs per loop
In [6]: %timeit dl(0)
1000000 loops, best of 3: 721 ns per loop
In [7]: %timeit len(str(0))
1000000 loops, best of 3: 1.24 µs per loop def digitLength(n):
dlen = 1
high = 9
while n > high:
dlen += 1
high = 10*high + 9
return dlen fn digit_length_recursive(input: usize) -> usize {
if input == 0 {
0
} else {
1 + digit_length_recursive(input / 10)
}
}
fn digit_length(input: usize) -> usize {
std::cmp::max(digit_length_recursive(input), 1)
}I read https://www.netmeister.org/blog/cs-falsehoods.html which came from https://news.ycombinator.com/item?id=21500672 in the top HN articles at the moment
Item 27 of that list made me laugh when I read this article :)
Wow. this is one of the reasons I hated school. No programmatic reason what given for why a string solution couldnt be used, only an arbitrary reason. Here students may have knowledge from self teaching or whatever, but they are unallowed to use that knowledge because "reasons".
To any teacher that thinks its a good idea to punish students for thinking outside the box: shame on you. All youre going to end up doing is crushing enthusiasm and/or creating drones. Please dont.
This also fails my personal rule to always need to have a reason to do something mean - like failing someone on a problem, and no reason to do something nice - like saying it’s fine since it works.
I've taught CS courses before, and have seen plenty of self-proclaimed self-taught know-it-alls who seem to be more stackoverflow-copy-pasters than anything else.
Thankfully this TA agreed with you (as do I). He said it looked good, and I think that's the shortest lab I ever had.
the division examples are not necessary either, thats the point. you can solve it different ways, that doesnt mean one way is not necessary, it just means its different. one may be faster, one may be more readable. If you dont allow different solutions you cant explore the tradeoffs between them.
personally, when something is to basic to hold my interest, i try to find ways to make it more challenging.
So, basically, it's only interpreting a very small number of interpreter bytecodes either way, so the small number it has to interpret to use the builtins is comparable to the small number it has to interpret to run the recursive definition, and so the interpretive overhead is comparable (and it swamps the overhead of things like allocating a new string).
This machine is kind of slow. This took 54 CPU seconds in LuaJIT:
> function dl(n) if n < 10 then return 1 else return 1 + dl(n/10) end end
> for i = 1, 1000*1000*100 do dl(15322) end
That means that approach took 540 ns per invocation rather than Python's 4640 ns --- only about 9× slower instead of the usual 40×. Or maybe this is a case where LuaJIT isn't really coming through the way it usually does.In an operating systems class we had a little project to create a command line calculator in C, with the added hoop of using x86 ASM for all control flow and calculations. As this was not a programming class we had a very brief overview of the very basics needed to get this done. I assumed using floating point arithmetic would make this easier, but knew that was not part of the early spec, so I asked the professor in class what version of x86 and he said pentium 1.
I then found and read intel’s documentation for the first generation Pentium. Which completely changed my mental model of CPU’s. Honestly, it was probably the closest collage assignment to how real world coding works and much more useful long term than just implementing some simple algorithms by hand.
Sometimes you actually have to do arithmetic when programming. It's useful to teach people these sort of things so they don't cook up half baked solutions in the future.
I wonder what would happen if you implemented itoa and then counted digits. It’s pretty much the same thing.
EDIT: Setting Decimal module's precision manually via getcontext() may be required if you work with "long" numbers.
import math
import numpy as np
from decimal import getcontext, Decimal
def digit_length_correct(n):
return len(str(n))
def digit_length_clever_2(n):
if n ==0:
return 1
else:
return math.floor((np.log10(Decimal(n)))) + 1 # You can use np.floor() and eschew the use of math. module altogether, but I left it intact to show the minimal necessary modifications
def generate_cases(n):
getcontext().prec = n + 3
return (
[10 ** i for i in range(n)] +
[(10 ** i) - 1 for i in range(n)]
)
cases = generate_cases(300)
for t in cases:
a = digit_length_correct(t)
b = digit_length_clever_2(t)
assert a == b, (t, a, b)
for t in cases[-2:]:
print(t)
print(digit_length_correct(t))
print(digit_length_clever_2(t))The entire problem is arbitrary. It's not a real-world problem looking for a solution. It's a programming exercise.
>>> import math
>>>
>>> for i in range(1, 20):
... n = int('9' * i)
... digitLengthClever2 = math.floor(math.log10(n) + 1)
... float_value = math.log10(n)
... print(f'{n:<20} {digitLengthClever2:>2} {float_value}')
...
9 1 0.9542425094393249
99 2 1.99563519459755
999 3 2.9995654882259823
9999 4 3.9999565683801923
99999 5 4.999995657033466
999999 6 5.999999565705301
9999999 7 6.99999995657055
99999999 8 7.999999995657055
999999999 9 8.999999999565706
9999999999 10 9.99999999995657
99999999999 11 10.999999999995657
999999999999 12 11.999999999999567
9999999999999 13 12.999999999999957
99999999999999 14 13.999999999999996 # <-
999999999999999 16 15.0 # <-
9999999999999999 17 16.0
99999999999999999 18 17.0
999999999999999999 19 18.0
9999999999999999999 20 19.0These crop up all the time in those online programming challenge things.
They are usually maths problems (e.g. Find the combined area of two partially intersecting squares based on their coordinates of a,b,c,d and w,x,y,z) where the solution is done via programming but most of the work is, in my opinion, figuring out the maths which is testing for the wrong thing.
I like the general idea of it--improving the algorithm incrementally, finding exception cases, but I wonder if there is a better example that could be used.
It seems to me that one either knows how to use logarithms or not, and thus students would either skip to the final solution, or be stuck until having the answer given to them.
for larger numbers it is probably faster.
digits = ceil(bits * log10(2))
A 10 KiB picture can be thought of as one 24 661 digit number. A 2 GiB video file can be thought of as one 5 171 655 946 digit number. I find this really puts everything in perspective. Every number already exists, digital content creators are just trying to find them. Certain numbers are actually illegal.A practical application of this: calculating buffer sizes for a function that converts numbers to text.
/* digits = ceil(bits * log10(2))
* = ceil(64 * log10(2))
* = 20
*/
#define DECIMAL_DIGITS_64_BITS 20
/* digits + sign + NUL */
char digits[DECIMAL_DIGITS_64_BITS + 1 + 1];
It's possible to handle arbitrary precision numbers by counting the number of bits and calculating the amount of memory that must be allocated for the resulting string.Regarding itoa(), the first implementation provided in the article actually match it's implementation, without the unnecessary bits (building the output).
I hate those programming class just trying to teach python surface use, while in a programming class you have time to go deeper and learn about how python works, cause basically python use all str to binary and loop for doing all the work requested by the teacher, without student even being aware of how it does it !
One could say as well: this isn't programming, this is low level language tricks.
You don't need to know "how integers are manipulated into cpu" when learning to program, and at an introductory class like that described in the article you shouldn't either. There's a reason SICP at MIT was in Lisp and then Python.
I also very much doubt it would be "much more fun doing it in Assembly language".
The author went at length to explain how this is a useful exercize for programming, as it introduces edge cases, alternative implementations, testing, etc.
I hope he learnt more than the students...
Well, as long as he disallowed the obvious solution (last paragraph of the OP), and then he still got the solution wrong... (see other comments up threat)
In our collage, one of the first semester classes we were doing assembly on old Motorola 8 bit, the other one was Java.
people that never programmed before had more trouble with java, than assembler.
So, the discussion of this problem can be extended further into a discussion of specs, tradeoffs, benchmarking, etc.
I think if you tell stuff like this upfront it is totally acceptable: “Reduce the number of digits in a given float without converting it to a String”.
Even students with previous knowledge could take this as a challenge where they learn something.
But if you want to teach programming, i would follow my path, and provide a deeper understanding at how to control a cpu and how it really works. In order to demystify computer and gives student a real experience of all the hidden works that are done with a 4 lines python code.
and for the fun side, guess why assembly is the second searched language on Stack overflow during weekend : https://stackoverflow.blog/2017/02/07/what-programming-langu...
i guess people are trying to have more fun on weekend than on boring office project during work days :)
And those numbers show a real interest about Assembly which is usually greatly discarded in any common CS teaching anywere. So teachers decides it's not interesting, while in fact most peoples search about it on weekend...I mean it illustrates a real issue here. May be understanding how to program a cpu at low level is something natural, that only scholar peoples cannot understand, therefore neglecting natural tendency of normal peoples to try to understand how things really work...
And for those really wanting to even dig deeper and understand what is a CPU, i strongly suggest looking for "from nand to tetris" https://www.nand2tetris.org/ wich basically start at nand logical gate, to the extent to create a full working cpu and programming it to play tetris.
That said, I do agree that as stated the problem is a toy. The problem statement could at least motivate the lack of string operations - i.e. pretend you’re the language designer and you’re tasked with implementing str(int) in C. Just saying “don’t do that” isn’t helpful. Gaining an understanding that nothing is magical is useful though.
def digit_length(n):
totlen = 0
n += not n
while n > 0:
klen = 1
k = 10
while n > k * k:
k = k * k
klen *= 2
n = n // k
totlen += klen
return totlenI'd just picked up python and was having so much fun going through the project Euler problems.
I can't remember which question it was for but we had to know the length of an integer. At that point I hadn't learnt about type conversion so had to implement it the way the author wants his students to do it.
I remember fondly as well that I hadn't learnt about the modulo operator and had to implement that by myself as well.
I couldn't believe it when I got the answer and progressed to the forum where others shared their answers and they were doing it on one line with %!
Good times!
I get that sometimes the simplest solution to a problem is not teaching the right stuff, but in almost all cases that is a fault in the problem, not the solution.
They should consider the upper bound of their algorithm. Because it uses floating point, it doesn't work for all natural numbers, as required by their specification.
To address your point, counting digits in base 2 is much simpler than in base 10, because the internal representation of the number on the computer is already base 2. You can use numeric calculations, but you can also just look at your digits.
(Edited for fewer tangents and to be more positive.)
Avoiding loops by recursion is no worse than avoiding loops by calling a looping library function...
The thing I did learn, and which I think has paid more dividends in my career than knowing the definition of the Fibonacci sequence, is that knowing your tools well can save you time and effort and reduce the amount of your own code you need to maintain and put effort into.
For the purposes of teaching, the best approach is not always to start from the foundations and build upwards.
https://github.com/python/cpython/blob/master/Objects/floato...
Thinking outside the box is of course valuable, but the teachers have specific concept they want to teach and constraints like that help them accomplish that.
When it comes to tests, it’s not as easy. At tests it’s is more about solving the actual problem, but one could still argue that the test is meant to test how well one has learned what was thought, not to solve the problem.
I think this is totally fair in this case.
If you prefer, you can view the problem restrictions as a means to encourage students to develop a more general algorithm for solving the problem, rather than a programmatic implementation specific to Python. Hopefully this makes them seem less arbitrary.
In this case, even though the solution is disallowed for grading purposes, there is an opportunity to discuss it during the lesson.
Since the discussion takes place in a low-pressure, guided group lesson, students will certainly not be penalized for suggesting the string method in that setting. Also, students trying to use strings on homework will fail tests in the linter and autograder before they finally submit, so they will have plenty of opportunity to fix their mistake before receiving a grade on their work.
I once got points taken off for using a recursive function as part of my solution. I didn't know the language before, nor did I have much experience with recursive functions as a formal concept.
I didn't even know at that time that recursive functions were a special thing that had to be taught to me.
Blocking off a generic "anything you haven't been taught yet" is a hellfire of ambiguous egg shells.
Strictly speaking you might not have been taught to use division with loops. Maybe you haven't been taught about the log10 function, or you have but you haven't been taught about combining the log10 function with the round/ceil functions, are you sure you were taught about putting if's before these exact functions?
The issue is every programming question requires by its very nature that you use something you haven't been taught to solve it. So you have to know what the teacher counts as acceptable vs unacceptable, and this comes into weird edge cases where the logic behind why something would be considered part of the "you must be taught first" category vs the "you are to assume figuring this out is part of the question" category requires knowing an overview of how to teach computer science, something the students would not have.
I once got points taken off for using a recursive function as part of my solution because that wasn't taught until the next quarter. I didn't know the language before, nor did I have much experience with recursive functions as a formal concept. We knew how to make functions, call functions, do loops, and even do gotos.
I didn't even know at that time that recursive functions were a special thing that had to be taught to me, it just came to me as the way to solve the problem.
Also, counting digits in base 2 is not the log2. The former gives you the latter but not the reverse. Finding the number of decimal digits in a number given in base 2 is not a simplification.
Look how much discussion there is over a few thousand nanoseconds here, edge cases around 16 digit numbers (if my bank account gets that high, and you accidentally round it up to too many digits, we can deal with that when it happens).
It seems inconcievable that the most 'productive' can spend this much effort on everything they do, and 'pick your battles' only goes so far towards explaining it. 'Move fast and break things' goes farther but only in the space of possibilities where correct vs incorrect has approximately no consequences.
"Companies with unicorn valuations are most probably doing things which don't have to be done very well"?
constraints should be spelled out explicitly.
The issue is every programming question requires by its very nature that you use something you haven't been taught to solve it
that, i don't agree with. it's entirely possible to state programming questions in such a way that they only require things you have already been taught. (in its simplest form "being taught" means "being told about". most (if not all) freecodecamp exercises work like that.
For example, if early in an elementary number theory class the student is asked to prove that there are in infinite number of primes of the form 4n+3, a solution that just invokes Dirichlet's theorem on primes in arithmetic progressions would probably not be acceptable. That approach does work to show that there are an infinite number of 4n+3 primes, but completely fails to show that that the student understood the material actually taught in class.
It's the exact same thing with the digit counting problem. Solving it by just invoking the built in string length function does little to demonstrate that the student understands the material taught so far.
...and then checks for and corrects the potential off-by-one thus incurred.
If you do these calculations by hand, the complexity will be more obvious because each operation will be smaller.
> Evaluating different solutions leads to natural questions about the definitions implicit in the problem statement: mathematically, what is digit length?
OF COURSE! This is a MATHEMATICAL problem that is posed as a PROGRAMMING question. This has always confused me and I could never justify until reading this article why I felt them to be irrelevant to showcasing my programming knowledge. At the very least just tell people, this is the mathematical reasoning behind it, and watch the person implement it.
Good job teaching unreliable algorithms.
Notice that all the straightforward solutions the author rejects, they reject because it fails on 0. Then, with the oh-so-clever log10 solution... surprise! They special-case 0. I'm not sure what the lesson here is supposed to be.
Even within just Arabic numbers, it possible for localization systems to add digit separators such as commas, underscores, or periods depending on region.
Sure, str() of an out-of-the-box number in Python won't do localization by default, but Python is a highly dynamic language and I've seen __str__() handlers that do localization, even on things that otherwise act like numbers.
"Simple" string conversion-based "math" rarely is simple in the real world, and ignoring localization and the complexities of Unicode in situations where you are assuming they can't possibly happen is a jungle of mistakes that programmers continue to make every day. The world doesn't run on EBCDIC or ASCII alone anymore, and localization is a thing that matters to a lot of people. The correspondence between code points reported by len(str()) and either input to str() and output to a display device is tenuous at best in anything but the best ASCII circumstances, and handwaving it away is an exercise in your particular bias.
[ETA: And the parent discussion was explicitly about table formatting the output, and that itself specifically is a scenario where you should want to localize the output first and not just rely on str(yourNumber).]