All they had to say was that KiB et al. were introduced in 1998, and that adoption has been slow.
And not “but a kilobyte can be 1000,” as if it’s an effort issue.
In my mind base 10 only became relevant when disk drive manufacturers came up with disks of "weird" sizes (maybe they needed to reserve some space for internals, or the platters just didn't like powers of two) and realised that a base 10 system gave them better-looking marketing numbers. Who wants a 2.9TB drive when you can get a 3TB* drive for the same price?
Three binary terabytes, i.e. 3 × 2^40, is 3,298,534,883,328 bytes, or 298,534,883,328 more bytes than 3 decimal terabytes. That difference is about 298.5 decimal gigabytes, or 278 binary gigabytes.
Indeed, early hard drives had slightly more than even the binary size: the famous 10MB IBM disk, for example, held 10,653,696 bytes, which is 167,936 bytes more than 10 binary MB (10,485,760 bytes), i.e. more than an entire 160KB floppy's worth of data.
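For anyone who wants to sanity-check the numbers in this subthread, a quick C one-off (nothing assumed beyond the figures quoted above):

    #include <stdio.h>

    int main(void) {
        /* 3 binary TB vs 3 decimal TB, from the parent comment */
        long long tib = 3LL * 1024 * 1024 * 1024 * 1024;   /* 3 * 2^40 */
        long long tb  = 3LL * 1000 * 1000 * 1000 * 1000;
        printf("3*2^40 = %lld, surplus over 3 TB = %lld\n", tib, tib - tb);

        /* the 10MB IBM disk vs 10 binary MB vs a 160KB floppy */
        long long disk   = 10653696LL;
        long long mib10  = 10LL * 1024 * 1024;
        long long floppy = 160LL * 1024;
        printf("disk surplus = %lld bytes, floppy = %lld bytes\n",
               disk - mib10, floppy);
        return 0;
    }

This prints a surplus of 167,936 bytes against a floppy capacity of 163,840, confirming the "more than a whole floppy" claim.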
Okay, but what do you mean by “10”?
They communicate via the network, right? And telephony has always counted in decimal bits, as opposed to powers-of-two counts of eight-bit bytes, IIUC. So these two schemes have always been in tension.
So at some point the Ki, Mi, etc. prefixes were introduced, along with the b vs. B suffixes, and that solved the issue over two decades ago. So why is this on the HN front page?!
A better question might be: why do we privilege the 8-bit byte? Shouldn't KiB officially have a subscript 8 on the end?
That is to say, all the (high-end/“gamer”) consumer SSDs that I’ve checked use 10% overprovisioning and achieve that by exposing a given number of binary TB of physical flash (e.g. a “2TB” SSD will have 2×1024⁴ bytes’ worth of flash chips) as the same number of decimal TB of logical addresses (e.g. that same SSD will appear to the OS as 2×1000⁴ bytes of storage space). And this makes sense: you want a round number on your sticker to make the marketing people happy, you aren’t going to make non-binary-sized chips, and 10% overprovisioning is OK-ish (in reality, probably too low, but consumers don’t shop based on the endurance metrics even if they should).
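For anyone wondering where the ~10% figure comes from, it falls straight out of 1024^4 / 1000^4 ≈ 1.0995. A quick C sketch using the hypothetical "2TB" example above:

    #include <stdio.h>

    int main(void) {
        /* "2TB" SSD: 2 * 1024^4 bytes of physical flash exposed as
           2 * 1000^4 bytes of logical address space */
        long long physical = 2LL * 1024 * 1024 * 1024 * 1024;
        long long logical  = 2LL * 1000 * 1000 * 1000 * 1000;
        printf("spare = %lld bytes (%.2f%% of logical capacity)\n",
               physical - logical,
               100.0 * (double)(physical - logical) / (double)logical);
        return 0;
    }

That works out to roughly 199 GB of spare area, i.e. 9.95% of the logical capacity.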
TLC flash actually has a total number of bits that's a multiple of 3, but it and QLC are so unreliable that a significant number of extra bits are used for error correction and such.
SSDs haven't been real binary sizes since the early days of SLC flash, which didn't need more than basic ECC. (I have an old 16MB USB drive which actually has a user-accessible capacity of 16,777,216 bytes; the NAND flash itself stores 17,301,504 bytes.)
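Side note: the ratio of those two numbers is exactly 33/32, which would be consistent with the classic small-block NAND layout of 512 data bytes plus 16 spare bytes (for ECC and metadata) per page. A quick check in C:

    #include <stdio.h>

    int main(void) {
        long long user = 16777216LL;    /* user-accessible capacity */
        long long raw  = 17301504LL;    /* raw NAND capacity */
        /* raw/user = 33/32, i.e. 16 extra bytes per 512-byte sector,
           matching the classic 528-byte small-block NAND page */
        long long sectors = user / 512;
        printf("%lld sectors * 528 bytes = %lld (raw = %lld)\n",
               sectors, sectors * 528, raw);
        return 0;
    }

32,768 sectors at 528 bytes each gives exactly 17,301,504.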
I found some search results about Texas Instruments' digital signal processors using 16-bit bytes, and came across this blog post from 2017 about implementing 16-bit bytes in LLVM: https://embecosm.com/2017/04/18/non-8-bit-char-support-in-cl.... Not sure if they actually implemented it, but it was surprising to me that non-octet bytes still exist, albeit in a very limited manner.
Do you know of any other uses for bytes that are not 8 bits?
First, you implicitly assumed a decimal number base in your comment.
Second: of course it's meaningful. It's also relevant, since humans use binary computers and numeric input and output in text is almost always in decimal.
For "bytes" as the term-of-art itself? Probably not. For "codes" or "words"? 5 bits are the standard in Baudot transmission (in teletype though). 6- and 7-bit words were the standards of the day for very old computers (ASCII is in itself a 7-bit code), especially on DEC-produced ones (https://rabbit.eng.miami.edu/info/decchars.html).
It's been well over a decade now, and neither I nor anyone I know has ever had an SSD endurance issue. So it seems like the type of problem where you should just go enterprise if you have it.
NXP makes a number of audio DSPs with a native 24-bit width.
Microchip still ships chips in the PIC family with instructions of various widths, including 12 and 14 bits; however, I believe the data memory on those chips is either 8 or 16 bits. I have no idea how to classify a machine where the instruction and data memory widths don't match.
Unlike POSIX, C merely requires that char be at least 8 bits wide, although I assume lots of real-world code would break if challenged on that particular detail.
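For anyone who does want to lawyer it in code, limits.h exposes the value directly; a minimal sketch (the compile-time guard is the whole point):

    #include <limits.h>
    #include <stdio.h>

    /* C guarantees only CHAR_BIT >= 8; POSIX additionally requires
       CHAR_BIT == 8. Fail the build on exotic targets rather than
       miscomputing sizes later. */
    #if CHAR_BIT != 8
    #error "this code assumes 8-bit bytes"
    #endif

    int main(void) {
        printf("CHAR_BIT = %d\n", CHAR_BIT);
        return 0;
    }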
A little late to lawyer that...