1 kilobyte is precisely 1000 bytes?

This ambiguity is documented at least back to 1984, by IBM, the pre-eminent computer company of the time.

In 1972 IBM started selling the IBM 3333 magnetic disk drive. This product catalog [0] from 1979 shows them marketing the corresponding disks as "100 million bytes" or "200 million bytes" (3336 mdl 1 and 3336 mdl 11, respectively). By 1984, those same disks were marketed in the "IBM Input/Output Device Summary"[1] (which was intended for a customer audience) as "100MB" and "200MB"

0: (PDF page 281) "IBM 3330 DISK STORAGE" http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...

1: (PDF page 38, labeled page 2-7, Fig 2-4) http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...

Also, hats off to http://electronicsandbooks.com/ for keeping such incredible records available for the internet to browse.

-------

Edit: The below is wrong. Older experience has corrected me - there has always been ambiguity (perhaps bifurcated between CPU/OS and storage domains). "And that with such great confidence!", indeed.

-------

The article presents wishful thinking. The wish is for "kilobyte" to have one meaning. For the majority of its existence, it had only one meaning - 1024 bytes. Now it has an ambiguous meaning. People wish for an unambiguous term for 1000 bits, however that word does not exist. People also might wish that others use kibibyte any time they reference 1024 bytes, but that is also wishful thinking.

The author's wishful thinking is falsely presented as fact.

I think kilobyte was the wrong word to ever use for 1024 bytes, and I'd love to go back in time to tell computer scientists that they needed to invent a new prefix to mean "1,024" / "2^10" of something, which kilo- never meant before kilobit / kilobyte were invented. Kibi- is fine, the phonetics sound slightly silly to native English speakers, but the 'bi' indicates binary and I think that's reasonable.

I'm just not going to fool myself with wishful thinking. If, in arrogance or self-righteousness, one simply assumes that every time they see "kilobyte" it means 1,000 bytes - then they will make many, many failures. We will always have to take care to verify whether "kilobyte" means 1,000 or 1,024 bytes before implementing something which relies on that for correctness.

>>quotem+9c
It doesn't, there's no singular optimal amount of over-provisioning. And that would make no sense, you'd have 28% over-provisioning for a 100/128GB drive, vs 6% over-provisioning for a 500/512GB drive, vs. 1.2% over-provisioning for a 1000/1024GB drive.

It's easy to find some that are marketed as 500GB and have 500x10^9 bytes [0]. But all the NVMe's that I can find that are marketed as 512GB have 512x10^9 bytes[1], neither 500x10^9 bytes nor 2^39 bytes. I cannot find any that are labeled "1TB" and actually have 1 Tebibyte. Even "960GB" enterprise SSD's are measured in base-10 gigabytes[2].

0: https://download.semiconductor.samsung.com/resources/data-sh...

1: https://download.semiconductor.samsung.com/resources/data-sh...

2: https://image.semiconductor.samsung.com/resources/data-sheet...

(Why are these all Samsung? Because I couldn't find any other datasheets that explicitly call out how they define a GB/TB)

>>none_t+Of
No it’s not. KiB is an abbreviation for kibibyte

Eg https://en.wikipedia.org/wiki/Kilobyte

>>surpri+(OP)
Whenever this discussion comes up I liked to point out that even in the computer industry, prefixes like kilo/mega/etc more often mean a power of 10 than a power of 2:

I gave some examples in my post https://blog.zorinaq.com/decimal-prefixes-are-more-common-th...

>>none_t+mf
. . . it seems weird that these are reported with a lowercase 'k' but 'M' and so on remain uppercase.

For SI units, the abbreviations are defined, so a lowercase k for kilo and uppercase M for mega is correct. Lower case m is milli, c is centi, d is deci. Uppercase G is giga, T is tera and so on.

https://en.wikipedia.org/wiki/International_System_of_Units#...

>>angst_+xi
Even then it was not universal. For example, that Apple I ad that got posted a few days ago mentioned that "the system is expandable to 65K". https://upload.wikimedia.org/wikipedia/commons/4/48/Apple_1_...

>>angst_+xi
> The Z-80 microprocessor could address 64kb (which was 65,536 bytes) on its 16-bit address bus.

I was going to say that what it could address and what they called what it could address is an important distinction, but found this fun ad from 1976[1].

"16K Bytes of RAM Memory, expandable to 60K Bytes", "4K Bytes of ROM/RAM Monitor software", seems pretty unambiguous that you're correct.

Interestingly wikipedia at least implies the IBM System 360 popularized the base-2 prefixes[2], citing their 1964 documentation, but I can't find any use of it in there for the main core storage docs they cite[3]. Amusingly the only use of "kb" I can find in the pdf is for data rate off magnetic tape, which is explicitly defined as "kb = thousands of bytes per second", and the only reference to "kilo-" is for "kilobaud", which would have again been base-10. If we give them the benefit of the doubt on this, presumably it was from later System 360 publications where they would have had enough storage to need prefixes to describe it.

[1] https://commons.wikimedia.org/wiki/File:Zilog_Z-80_Microproc...

[2] https://en.wikipedia.org/wiki/Byte#Units_based_on_powers_of_...

[3] http://www.bitsavers.org/pdf/ibm/360/systemSummary/A22-6810-...

>>cedill+Wd
it's, way older in than the 1990's! In computering, "K" always meant 1024 at least from 1970's.

Example: in 1972, DEC PDP 11/40 handbook [0] said on first page: "16-bit word (two 8-bit bytes), direct addressing of 32K 16-bit words or 64K 8-bit bytes (K = 1024)". Same with Intel - in 1977 [1], they proudly said "Static 1K RAMs" on the first page.

[0] https://pdos.csail.mit.edu/6.828/2005/readings/pdp11-40.pdf

[1] https://deramp.com/downloads/mfe_archive/050-Component%20Spe...

>>cedill+Wd
> 1024 bearing near ubiquitous was only the case in the 90s or so

More like late 60s. In fact, in the 70s and 80s, I remember the storage vendors being excoriated for "lying" by following the SI standard.

There were two proposals to fix things in the late 60s, by Donald Morrison and Donald Knuth. Neither were accepted.

Another article suggesting we just roll over and accept the decimal versions is here:

https://cacm.acm.org/opinion/si-and-binary-prefixes-clearing...

This article helpfully explains that decimal KB has been "standard" since the very late 90s.

But when such an august personality as Donald Knuth declares the proposal DOA, I have no heartburn using binary KB.

https://www-cs-faculty.stanford.edu/~knuth/news99.html

>>surpri+(OP)
Nope.

It would be nice to have a different standard for decimal vs. binary kilobytes.

But if Don Knuth thinks that the "international standard" naming for binary kilobytes is dead on arrival, who am I to argue?

https://www-cs-faculty.stanford.edu/~knuth/news99.html

>>surpri+(OP)
>Why do we often say 1 kilobyte = 1024 bytes?

Because Windows, and only Windows, shows it this way. It is official and documented: https://devblogs.microsoft.com/oldnewthing/20090611-00/?p=17...

> Explorer is just following existing practice. Everybody (to within experimental error) refers to 1024 bytes as a kilobyte, not a kibibyte. If Explorer were to switch to the term kibibyte, it would merely be showing users information in a form they cannot understand, and for what purpose? So you can feel superior because you know what that term means and other people don’t.

>>Valdik+Gx
I know the only other software with this kind of error: https://github.com/lsd-rs/lsd/issues/807

>>pif+0j
Knuth thought the international standard promulgated naming (kibibyte) was DOA.

https://www-cs-faculty.stanford.edu/~knuth/news99.html

And he was right.

Context is important.

"K" is an excellent prefix for 1024 bytes when working with small computers, and a metric shit ton of time has been saved by standardizing on that.

When you get to bigger units, marketing intervenes, and, as other commenters have pointed out, we have the storage standard of MB == 1000 * 1024.

But why is that? Certainly it's because of the marketing, but also it's because KB has been standardized for bytes.

> Which is the reality. "kilobyte" means "1000 bytes". There's no possible discussion over this fact.

You couldn't be more wrong. Absolutely nobody talks about 8K bytes of memory and means 8000.

>>cedill+Wd
only when the 8 bit home computer era started was the 1024 convention firmly established.

That's the microcomputer era that has defined the vast majority of our relationship with computers.

IMO, having lived through this era, the only people pushing 1,000 byte kilobytes were storage manufacturers, because it allows them to bump their numbers up.

https://www.latimes.com/archives/la-xpm-2007-nov-03-fi-seaga...

>>surpri+(OP)
Metric prefixing should only be used with the unit bit. There is no confusion there. I mean, if you would equate a bit with a certain voltage threshold, you could even argue about fractional bits.

Approximating metric prefixing with kibi, Mibi, Gibi... is confusing because it doesn't make sense semantically. There is nothing base-10-ish about it.

I propose some naming based on shift distance, derived from the latin iterativum. https://en.wikipedia.org/wiki/Latin_numerals#Adverbial_numer...

* 2^10, the kibibyte, is a deci (shifted) byte, or just a 'deci'

* 2^20, the mibibyte, is a vici (shifted) byte, or a 'vici'

* 2^30, the gibibyte, is a trici (shifted) byte, or a 'trici'

I mean, we really only need to think in bytes for memory addressing, right? The base doesn't matter much, if we were talking exabytes, does it?

>>public+ss
It's "referer" in the HTTP standard, but "referrer" when correctly spelled in English. https://en.wikipedia.org/wiki/HTTP_referer

>>surpri+(OP)
I'm sticking with power-of-2 sizes. Invent a new word for decimal, metric units where appropriate. I proposed[0] "kitribytes", "metribytes", "gitribytes", etc. Just because "kilo" has a meaning in one context doesn't mean we're stuck with it in others. It's not as though the ancient Greeks originally meant "kilo" to mean "exactly 1,000". "Giga" just meant "giant". "Tera" is just "monster". SI doesn't have sole ownership for words meaning "much bigger than we can possibly count at a glance".

Donald Knuth himself said[1]:

> The members of those committees deserve credit for raising an important issue, but when I heard their proposal it seemed dead on arrival --- who would voluntarily want to use MiB for a maybe-byte?! So I came up with the suggestion above, and mentioned it on page 94 of my Introduction to MMIX. Now to my astonishment, I learn that the committee proposals have actually become an international standard. Still, I am extremely reluctant to adopt such funny-sounding terms; Jeffrey Harrow says "we're going to have to learn to love (and pronounce)" the new coinages, but he seems to assume that standards are automatically adopted just because they are there.

If Gordon Bell and Gene Amdahl used binary sizes -- and they did -- and Knuth thinks the new terms from the pre-existing units sound funny -- and they do -- then I feel like I'm in good company on this one.

0: https://honeypot.net/2017/06/11/introducing-metric-quantity....

1: https://www-cs-faculty.stanford.edu/~knuth/news99.html

>>kmm+rb
> classic 3.5 inch floppy disks

90 mm floppy disks. https://jdebp.uk/FGA/floppy-discs-are-90mm-not-3-and-a-half-...

Which I have taken to calling 1440 KiB – accurate and pretty recognizable at the same time.

>>nerdsn+Da
People were using metric words for binary numbers since at least the late 1950s: https://en.wikipedia.org/wiki/Timeline_of_binary_prefixes#19...

Which doesn't make it more correct, of course, even through I strongly believe believe that it is (where appropriate for things like memory sizes). Just saying, it goes much further back than 1984.

>>Taniwh+LI
From https://archive.org/details/byte-magazine-1977-02/page/n145/...:

“A byte was described as consisting of any number of parallel bits from one to six. Thus a byte was assumed to have a length appropriate for the occasion. Its first use was in the context of the input-output equipment of the 1950s, which handled six bits at a time.”

>>tomber+aG
>> It doesn't matter. "kilo" means 1000. People are free to use it wrong if they wish.

> All words are made up.

Yes, and the made up words of kilo and kibi were given specific definitions by the people who made them up:

* https://en.wikipedia.org/wiki/Metric_prefix

* https://en.wikipedia.org/wiki/Binary_prefix

> […] as long as both parties understand and are consistent in their usage to each other.

And if they don't? What happens then?

Perhaps it would be easier to use the words definitions as they are set up in standards and regulations so context is less of an issue.

* https://xkcd.com/1860/

>>surpri+(OP)
Ah, if only I had a dollar for every time I've had to point someone to the a tool like the following when trying to explain the difference between how much "bandwidth" their server has per month (an IEC unit) vs how fast the server connection is (a SI unit): https://null.53bits.co.uk/uploads/programming/javascript/dat...

>>soneil+2Q
RAMAC, not RAMDAC: https://en.wikipedia.org/wiki/History_of_IBM_magnetic_disk_d...

However it doesn't seem to be divided into sectors at all, more like each track is like a loop of magnetic tape. In that context it makes a bit more sense to use decimal units, measuring in bits per second like for serial comms.

Or maybe there were some extra characters used for ECC? 5 million / 100 / 100 = 500 characters per track, leaves 72 bits over for that purpose if the actual size was 512.

First floppy disks - also from IBM - had 128-byte sectors. IIRC, it was chosen because it was the smallest power of two that could store an 80-column line of text (made standard by IBM punched cards).

Disk controllers need to know how many bytes to read for each sector, and the easiest way to do this is by detecting overflow of an n-bit counter. Comparing with 80 or 100 would take more circuitry.

>>NooneA+bJ
You think the people selling RAM are somehow more virtuous than the people selling hard disks?

https://en.wikipedia.org/wiki/DRAM_price_fixing_scandal

>>superj+bE
It may "make sense" but that's actually a false equivalence. The raw disk space for a 3.5" high-density floppy disk for IBM PCs is 512 bytes per sector * 18 sectors per track * 80 tracks per side * 2 sides = 1,474,560 bytes. It is 1.47 MB or 1.40 MiB neither of which is 1440 KB or KiB. The 1440 number comes from Microsoft's FAT12 filesystem. That was the space that's left for files outside the allocation table.

Sectors per track or tracks per side is subject to change. Moreover a different filesystem may have non-linear growth of the MFT/superblock that'll have a different overhead.

https://en.wikipedia.org/wiki/List_of_floppy_disk_formats

>>surpri+(OP)
There is a counterproductive obsession with powers of 10.

Sometimes, other systems just make more sense.

For example, for time, or angles, or bytes. There are properties of certain numbers (or bases) that make everything descending from them easier to deal with.

for angles and time (and feet): https://en.wikipedia.org/wiki/Superior_highly_composite_numb...

For other problems we use base 2, 3, 8, 16, or 10.

Must we treat metric as a hammer, and every possible problem as a nail?

>>ZoomZo+cU1
>Capital K is for Kelvin.

It should be "kelvin" here. ;)

Unit names are always lower-case[1] (watt, joule, newton, pascal, hertz), except at the start of a sentence. When referring to the scientists the names are capitalized of course, and the unit symbols are also capitalized (W, J, N, Pa, Hz).

[1] SI Brochure, Section 5.3 "Unit Names" https://www.bipm.org/documents/20126/41483022/SI-Brochure-9-...

>>tomber+xW1
You could think of the SI as a form of language planning.

https://en.wikipedia.org/wiki/Language_planning

(Then you could decide what you think about language planning.)

>>fc417f+962
> Metric prefixes are always lower case, so GB isn't valid metric.

Ummm, what? https://en.wikipedia.org/wiki/Metric_prefix

>>its_ma+Na2
> even the metric people are not so crazy

No, they were absolutely that crazy [1]. Luckily the proposal fell through.

1. https://en.wikipedia.org/wiki/Decimal_time

>>f33d51+RC1
5000, but a slightly shorter foot than the modern one.

https://en.wikipedia.org/wiki/Mile#Roman

https://en.wikipedia.org/wiki/Ancient_Roman_units_of_measure...

>>fc417f+4f2
> no one is going to confuse mB for millibytes because what would that even mean?

Data compression. For example, look at http://prize.hutter1.net/ , heading "Contestants and Winners for enwik8". On 23.May'09, Alex's program achieved 1.278 bits per character. On 4.Nov'17, Alex achieved 1.225 bits per character. That is an improvement of 0.053 b/char, or 53 millibits per character. Similarly, we can talk about how many millibits per pixel JPEG-XL is better than classic JPEG for the same perceptual visual quality. (I'm using bits as the example, but you can use bytes and reach the same conclusion.)

Just because you don't see a use for mB doesn't mean it's open for use as a synonym of MB. Lowercase m means milli-, as already demonstrated in countless frequently used units - millilitre, millimetre, milliwatt, milliampere, and so on.

In case you're wondering, mHz is not a theoretical concept either. If you're generating a tone at say 440 Hz, you can talk about the frequency stability in millihertz of deviation.

>>fc417f+sZ1
To be fair, the octet as the byte has been dominant for decades. POSIX even has the definition “A byte is composed of a contiguous sequence of 8 bits.” I would wager many software engineers don't even know that a non-octet bytes were a thing, given that college CS curricula typically just teach a byte is 8 bits.

I found some search results about Texas Instruments' digital signal processors using 16-bit bytes, and came across this blogpost from 2017 talking about implementing 16-bit bytes in LLVM: https://embecosm.com/2017/04/18/non-8-bit-char-support-in-cl.... Not sure if they actually implemented it, but that was surprising to me that non octet bytes still exist, albeit in a very limited manner.

Do you know of any other uses for bytes that are not 8 bits?

>>pwdiss+sL
> 90 mm floppy disks. https://jdebp.uk/FGA/floppy-discs-are-90mm-not-3-and-a-half-...

That page is part right and part wrong.

It is right in claiming that "3.5-inch" floppies are actually 90 mm.

It is wrong in claiming that the earlier "5.25-inch" floppies weren't metric

"5.25-inch" floppies are actually 130 mm as standardised in ECMA-78 [0]

"8-inch" floppies are actually 200 mm as standardised in ECMA-69 [1]

Actually there's a few different ECMA standards for 130 and 200 mm floppies – the physical dimensions are the same, but using different recording mechanisms (FM vs MFM–those of a certain age may remember MFM as "double density", and those even older may remember FM as "single density"), and single-sided versus double-sided.

[0] ECMA-78: Data interchange on 130 mm flexible disk cartridges using MFM recording at 7 958 ftprad on 80 tracks on each side), June 1986: https://ecma-international.org/publications-and-standards/st...

[1] ECMA-69: Data interchange on 200 mm flexible disk cartridges using MFM recording at 13 262 ftprad on both sides, January 1981: https://ecma-international.org/publications-and-standards/st...

>>purple+Wk2
> Do you know of any other uses for bytes that are not 8 bits?

For "bytes" as the term-of-art itself? Probably not. For "codes" or "words"? 5 bits are the standard in Baudot transmission (in teletype though). 6- and 7-bit words were the standards of the day for very old computers (ASCII is in itself a 7-bit code), especially on DEC-produced ones (https://rabbit.eng.miami.edu/info/decchars.html).

>>moi238+Eg2
This is https://en.wikipedia.org/wiki/Reductio_ad_absurdum

Inability to communicate isn't what we observe because as I already stated, meaning is shared. Dictionaries are one way shared meaning can be developed, as are textbooks, software source codes, circuits, documentation, and any other artifact which links the observable with language. All of that being collectively labeled culture. The mass of which I analogized with inertia so as to avoid oversimplifications like yours.

My point is that one person's definition does not a culture, make. And that adoption of new word definitions is inherently a group cultural activity which requires time, effort, and the willingness of the group to participate. People must be convinced the change is an improvement on some axis. Dictation of a definition from on high is as likely to result in the word meaning the exact opposite in popular usage as not. Your comment seems to miss any understanding or acknowledgement that a language is a living thing, owned by the people who speak it, and useful for speaking about the things which matter most to them. That credible dictionaries generally don't accept words or definitions until widespread use can be demonstrated.

It seems like some of us really want human language to work like rule-based computer languages. Or think they already do. But all human languages come free with a human in the loop, not a rules engine.

>>assimp+yu2
Depends if you're using the botanical definition or the (more common) culinary definition[0].

I would argue fruit and fruit are two words, one created semasiologically and the other created onomasiologically. Had we chosen a different pronunciation for one of those words, there would be no confusion about what fruits are.

[0] - https://en.wikipedia.org/wiki/Fruit#Botanical_vs._culinary

>>surpri+(OP)
Yes. Go check https://en.wikipedia.org/wiki/Metric_prefix and also https://en.wikipedia.org/wiki/Binary_prefix if you want 1024 bytes.

>>highhe+hu2
Dunno but there are two similar but slightly different cross head screw designs https://www.pbswisstools.com/en/news/detail/phillips-and-poz...

>>nacoza+iX2
The Commdore 64 has 64 kibibytes.

https://en.wikipedia.org/wiki/Byte#Multiple-byte_units

"the C64 took its name from its 64 kilobytes (65,536 bytes) of RAM"

https://en.wikipedia.org/wiki/Commodore_64

>>kazina+dc5
I was there too. What you call being "widely reviled" is just another way of saying "in common usage". Maybe you "reviled" it, but it was just convention. So it's not false, and it wasn't well delineated. Just like it continues not to be.

Why don't you take a look at Wikipedia which clearly describes the many, many, many places in which powers-of-10 is used, and then also has a section on powers-of-2:

https://en.wikipedia.org/wiki/Byte#Units_based_on_powers_of_...

Remember, it wasn't just hard drives either. It's been data transfer speeds, network speeds, tape capacities, etc. There's an awful lot of stuff in computing that doesn't inherently depend on powers of 2 for its scaling.

And so as long as we have both units and will always have both units, it makes sense to give them different names. And, obviously, the one that matches the SI system should have the same name as it. Can you seriously disagree? Again, I don't care what you say in conversation. But in labels and specifications, how can you argue against it?

zlacker

1 kilobyte is precisely 1000 bytes?