zlacker

[parent] [thread] 7 comments
1. jepler+(OP)[view] [source] 2019-11-10 22:44:41
But the math.log10 solution is unfortunately "wrong" too, at least in my python3 implementation.

    import math

    def digitLengthCorrect(n):
        return len(str(n))

    def digitLengthClever2(n):
        return 1 if n == 0 else (math.floor(math.log10(n)) + 1)

    testcases = (
        [10 ** i for i in range(300)] +
        [(10 ** i) - 1 for i in range(300)]
    )

    for t in testcases:
        a = digitLengthCorrect(t)
        b = digitLengthClever2(t)
        assert a == b, (t, a, b)
replies(5): >>jstrie+I1 >>gspetr+Dr >>jdnier+rs >>FabHK+IE >>contra+FP
2. jstrie+I1[view] [source] 2019-11-10 23:06:05
>>jepler+(OP)
Interesting point; it could be worth adding a note for students about the finite precision of floating-point numbers in Python. Thanks for pointing this out!
replies(1): >>jepler+X2
3. jepler+X2[view] [source] [discussion] 2019-11-10 23:19:51
>>jstrie+I1
Incidentally, while it focuses on values that fit in machine registers, you might check out Hacker's Delight's section on computing "ilog10". In the second edition, section 11-4 is on integer logarithms.

If your target language is Python 3, you have the "bit_length()" method of integer objects. Multiply the bit length by c = log10(2) ~= 0.30103 and then correct the estimate where it comes out one too low; superficially, it seems like this algorithm should work for extremely large numbers, at least to millions of digits.

As far as test cases go, 10^n, (10^n)+1 and (10^n)-1 seem like good choices and, as long as the underlying mathematical function is monotonic, should ferret out failures (a small harness along those lines follows the snippet below). I could easily run my implementation for numbers up to 30,000 digits, but going to 300,000 digits didn't finish while I was writing this message.

    import math

    log10_2 = math.log10(2)

    def digit_length_clever3(n):
        if not n:
            return 1
        # est is either the digit count or one less
        est = math.floor(n.bit_length() * log10_2)
        # one exact integer comparison decides which
        low = n >= 10**est
        return est + low
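
For concreteness, here is a minimal harness along those lines (assuming digit_length_clever3 from the snippet above; digit_length_correct is just the len(str(n)) version from the top of the thread, and the bound is kept small here so it runs quickly):

    def digit_length_correct(n):
        return len(str(n))

    p = 1  # 10**0
    for i in range(1_000):  # push this bound up to taste
        # probe just below, at, and just above each power-of-ten boundary
        for t in (p - 1, p, p + 1):
            assert digit_length_clever3(t) == digit_length_correct(t), (i, t)
        p *= 10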
4. gspetr+Dr[view] [source] 2019-11-11 05:24:59
>>jepler+(OP)
Good point. To protect yourself from this type of precision loss, change math.log10() to np.log10() and use Decimal() for tasks involving precise number crunching.

EDIT: Setting the Decimal module's precision manually via getcontext() may be required if you work with "long" numbers.

    import math
    import numpy as np
    from decimal import getcontext, Decimal

    def digit_length_correct(n):
        return len(str(n))
    
    def digit_length_clever_2(n):
        if n == 0:
            return 1
        # np.floor() would work here too, letting you drop the math module
        # entirely; math.floor() is kept to show the minimal modification
        return math.floor(np.log10(Decimal(n))) + 1

    def generate_cases(n):
        getcontext().prec = n + 3
        return (
            [10 ** i for i in range(n)] +
            [(10 ** i) - 1 for i in range(n)]
        )
    
    cases = generate_cases(300)
    
    for t in cases:
        a = digit_length_correct(t)
        b = digit_length_clever_2(t)
        assert a == b, (t, a, b)
    
    for t in cases[-2:]:
        print(t)
        print(digit_length_correct(t))
        print(digit_length_clever_2(t))
replies(1): >>contra+EW
5. jdnier+rs[view] [source] 2019-11-11 05:37:29
>>jepler+(OP)
Sure enough. This may help show what happens with float precision:

    >>> import math
    >>>
    >>> for i in range(1, 20):
    ...     n = int('9' * i)
    ...     digitLengthClever2 = math.floor(math.log10(n) + 1)
    ...     float_value = math.log10(n)
    ...     print(f'{n:<20} {digitLengthClever2:>2}  {float_value}')
    ...
    9                     1  0.9542425094393249
    99                    2  1.99563519459755
    999                   3  2.9995654882259823
    9999                  4  3.9999565683801923
    99999                 5  4.999995657033466
    999999                6  5.999999565705301
    9999999               7  6.99999995657055
    99999999              8  7.999999995657055
    999999999             9  8.999999999565706
    9999999999           10  9.99999999995657
    99999999999          11  10.999999999995657
    999999999999         12  11.999999999999567
    9999999999999        13  12.999999999999957
    99999999999999       14  13.999999999999996  # <-
    999999999999999      16  15.0                # <- 
    9999999999999999     17  16.0
    99999999999999999    18  17.0
    999999999999999999   19  18.0
    9999999999999999999  20  19.0
6. FabHK+IE[view] [source] 2019-11-11 09:08:10
>>jepler+(OP)
Yes, the so-called "naive" solution is the best one, in my view, insofar as it is correct. Of course, choose the "clever" solution if speed is more important than correctness (in other words, if an approximate digit count is acceptable, e.g. with a +1 safety buffer) - though then benchmark to show that it is actually faster, for the distribution of numbers you actually encounter.

So, the discussion of this problem can be extended further into a discussion of specs, tradeoffs, benchmarking, etc.
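
For the benchmarking part, something like this timeit sketch would do (the function names and the sample distribution are just illustrative placeholders, not a recommendation):

    import math
    import timeit

    def digit_length_naive(n):
        return len(str(n))

    def digit_length_clever(n):
        return 1 if n == 0 else math.floor(math.log10(n)) + 1

    # substitute a sample drawn from the numbers you actually encounter
    sample = [7 ** i for i in range(1, 200)]

    for f in (digit_length_naive, digit_length_clever):
        t = timeit.timeit(lambda: [f(n) for n in sample], number=1_000)
        print(f.__name__, t)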

7. contra+FP[view] [source] 2019-11-11 11:21:45
>>jepler+(OP)
Yeah, it was bound to go wrong somewhere when you try to find the length of an arbitrary-length integer using floating-point arithmetic.
8. contra+EW[view] [source] [discussion] 2019-11-11 12:34:26
>>gspetr+Dr
Using arbitrary-precision arithmetic kind of defeats the point of not using loops.