This ambiguity is documented at least back to 1984, by IBM, the pre-eminent computer company of the time.
In 1972 IBM started selling the IBM 3333 magnetic disk drive. This product catalog [0] from 1979 shows them marketing the corresponding disks as "100 million bytes" or "200 million bytes" (3336 mdl 1 and 3336 mdl 11, respectively). By 1984, those same disks were marketed in the "IBM Input/Output Device Summary" [1] (which was intended for a customer audience) as "100MB" and "200MB".
0: (PDF page 281) "IBM 3330 DISK STORAGE" http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...
1: (PDF page 38, labeled page 2-7, Fig 2-4) http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...
Also, hats off to http://electronicsandbooks.com/ for keeping such incredible records available for the internet to browse.
-------
Edit: The below is wrong. Older experience has corrected me - there has always been ambiguity (perhaps bifurcated between CPU/OS and storage domains). "And that with such great confidence!", indeed.
-------
The article presents wishful thinking. The wish is for "kilobyte" to have one meaning. For the majority of its existence, it had only one meaning - 1024 bytes. Now it has an ambiguous meaning. People wish for an unambiguous term for 1000 bytes; however, that word does not exist. People also might wish that others use kibibyte any time they reference 1024 bytes, but that is also wishful thinking.
The author's wishful thinking is falsely presented as fact.
I think kilobyte was the wrong word to ever use for 1024 bytes, and I'd love to go back in time to tell computer scientists that they needed to invent a new prefix to mean "1,024" / "2^10" of something, which kilo- never meant before kilobit / kilobyte were invented. Kibi- is fine, the phonetics sound slightly silly to native English speakers, but the 'bi' indicates binary and I think that's reasonable.
I'm just not going to fool myself with wishful thinking. If, in arrogance or self-righteousness, one simply assumes that every time they see "kilobyte" it means 1,000 bytes - then they will make many, many failures. We will always have to take care to verify whether "kilobyte" means 1,000 or 1,024 bytes before implementing something which relies on that for correctness.
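A minimal sketch of that verification step in Python. The helper name `kilobyte_readings` is hypothetical (not from any library); it only lays out both readings so the gap is visible before anything relies on one of them.

```python
def kilobyte_readings(n_kilobytes):
    """Return the byte counts implied by the decimal and binary readings."""
    return {
        "decimal": n_kilobytes * 1000,  # SI reading: kilo = 10^3
        "binary": n_kilobytes * 1024,   # traditional memory reading: 2^10
    }


readings = kilobyte_readings(64)
gap = readings["binary"] - readings["decimal"]
print(readings, "differ by", gap, "bytes")
```

If a buffer, quota, or file format depends on the figure, the gap is the number of bytes you get wrong by guessing.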
There was always confusion about whether a kilobyte was 1000 or 1024 bytes. Early diskettes always used 1000; only when the 8-bit home computer era started was the 1024 convention firmly established.
Before that it made no sense to talk about kilo as 1024. Earlier computers measured space in records and words, and I guess you can see how in 1960, no one would use kilo to mean 1024 for a 13 bit computer with 40 byte records. A kiloword was, naturally, 1000 words, so why would a kilobyte be 1024?
1024 being near-ubiquitous was only the case in the 90s or so - except for drive manufacturing and signal processing. Binary prefixes didn't invent the confusion, they were a partial solution. As you point out, while it's possible to clearly indicate binary prefixes, we have no unambiguous notation for decimal bytes.
In fact, they practically say the same exact thing you have said: In a nutshell, base-10 prefixes were used for base-2 numbers, and now it's hard to undo that standard in practice. They didn't say anything about making assumptions. The only difference is that the author wants to keep trying, and you don't think it's possible? Which is perfectly fine. It's just not as dramatic as your tone implies.
Even worse, the 3.5" HD floppy disk format used a confusing combination of the two. Its true capacity (when formatted as FAT12) is 1,474,560 bytes. Divide that by 1024 and you get 1440KB; divide that result by 1000 and you get the oft-quoted (and often printed on the disk itself) "1.44MB", which is inaccurate no matter how you look at it.
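The floppy arithmetic can be checked in a few lines of Python. This is only a sketch using the 1,474,560-byte figure quoted above; the variable names are mine.

```python
FLOPPY_BYTES = 1_474_560  # capacity figure quoted above

kib = FLOPPY_BYTES / 1024             # binary kilobytes
marketing_mb = kib / 1000             # mixed units: 1024-byte KB, then divided by 1000
decimal_mb = FLOPPY_BYTES / 1000**2   # consistent decimal megabytes
binary_mib = FLOPPY_BYTES / 1024**2   # consistent binary mebibytes

print(f"{kib:.0f} KB -> '{marketing_mb:.2f} MB' on the label")
print(f"decimal: {decimal_mb:.5f} MB, binary: {binary_mib:.5f} MiB")
```

The label figure matches neither consistent reading, which is the "inaccurate no matter how you look at it" part.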
Similarly, the 4104 chip was a "4kb x 1 bit" RAM chip and stored 4096 bits. You'd see this in the whole 41xx series, and beyond.
Which is the reality. "kilobyte" means "1000 bytes". There's no possible discussion over this fact.
Many people have been using it wrong for decades, but its literal value did not change.
You are free to intend only one meaning in your own communication, but you may sometimes find yourself being misunderstood: that, too, is reality.
In fact, this is the only case I can think of where that has ever happened.
I was going to say that what it could address and what they called what it could address is an important distinction, but found this fun ad from 1976[1].
"16K Bytes of RAM Memory, expandable to 60K Bytes", "4K Bytes of ROM/RAM Monitor software", seems pretty unambiguous that you're correct.
Interestingly wikipedia at least implies the IBM System 360 popularized the base-2 prefixes[2], citing their 1964 documentation, but I can't find any use of it in there for the main core storage docs they cite[3]. Amusingly the only use of "kb" I can find in the pdf is for data rate off magnetic tape, which is explicitly defined as "kb = thousands of bytes per second", and the only reference to "kilo-" is for "kilobaud", which would have again been base-10. If we give them the benefit of the doubt on this, presumably it was from later System 360 publications where they would have had enough storage to need prefixes to describe it.
[1] https://commons.wikimedia.org/wiki/File:Zilog_Z-80_Microproc...
[2] https://en.wikipedia.org/wiki/Byte#Units_based_on_powers_of_...
[3] http://www.bitsavers.org/pdf/ibm/360/systemSummary/A22-6810-...
I wonder if there's a wikipedia article listing these...
Which makes it really @#ing annoying when you have things like "I want to transmit 8 gigabytes (meaning gibibytes, 2^30) over a 1 gigabit/s link, how long will it take?". Welcome to every networking class in the 90s.
We should continue moving towards a world where 2^k prefixes have separate names and we use SI prefixes only for their precise base-10 meanings. The past is polluted but we hopefully have hundreds of years ahead of us to do things better.
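For the record, the networking-class question works out as follows. This is a sketch that ignores protocol overhead and assumes the link rate is a decimal gigabit per second.

```python
PAYLOAD_BYTES = 8 * 2**30   # "8 gigabytes" meaning 8 GiB
LINK_BITS_PER_S = 10**9     # 1 gigabit/s, decimal (assumed)

seconds_binary = PAYLOAD_BYTES * 8 / LINK_BITS_PER_S

# The tempting shortcut: treat both "giga" prefixes as 10^9.
seconds_naive = 8 * 10**9 * 8 / LINK_BITS_PER_S

print(f"binary payload: {seconds_binary:.2f} s, naive: {seconds_naive:.2f} s")
```

The two answers differ by a little under five seconds, purely from the prefix mismatch.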
Example: in 1972, DEC PDP 11/40 handbook [0] said on first page: "16-bit word (two 8-bit bytes), direct addressing of 32K 16-bit words or 64K 8-bit bytes (K = 1024)". Same with Intel - in 1977 [1], they proudly said "Static 1K RAMs" on the first page.
[0] https://pdos.csail.mit.edu/6.828/2005/readings/pdp11-40.pdf
[1] https://deramp.com/downloads/mfe_archive/050-Component%20Spe...
You can say that one meaning is more correct than the other, but that doesn't vanish the other meaning from existence.
More like late 60s. In fact, in the 70s and 80s, I remember the storage vendors being excoriated for "lying" by following the SI standard.
There were two proposals to fix things in the late 60s, by Donald Morrison and Donald Knuth. Neither was accepted.
Another article suggesting we just roll over and accept the decimal versions is here:
https://cacm.acm.org/opinion/si-and-binary-prefixes-clearing...
This article helpfully explains that decimal KB has been "standard" since the very late 90s.
But when such an august personality as Donald Knuth declares the proposal DOA, I have no heartburn using binary KB.
https://www-cs-faculty.stanford.edu/~knuth/news99.html
And he was right.
Context is important.
"K" is an excellent prefix for 1024 bytes when working with small computers, and a metric shit ton of time has been saved by standardizing on that.
When you get to bigger units, marketing intervenes, and, as other commenters have pointed out, we have the storage standard of MB == 1000 * 1024.
But why is that? Certainly it's because of the marketing, but also it's because KB has been standardized for bytes.
> Which is the reality. "kilobyte" means "1000 bytes". There's no possible discussion over this fact.
You couldn't be more wrong. Absolutely nobody talks about 8K bytes of memory and means 8000.
Now, it depends.
That's the microcomputer era that has defined the vast majority of our relationship with computers.
IMO, having lived through this era, the only people pushing 1,000 byte kilobytes were storage manufacturers, because it allows them to bump their numbers up.
https://www.latimes.com/archives/la-xpm-2007-nov-03-fi-seaga...
Kudos for getting back. (and closing the tap of "you are wrong" comments :))
But once hard drives started hitting about a gigabyte was when everyone started noticing and howling.
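One plausible reason the complaints got louder around then: the binary/decimal gap compounds with every prefix step. A quick Python sketch (the function name is hypothetical):

```python
def binary_excess_percent(step):
    """How much larger 1024**step is than 1000**step, in percent."""
    return round((1024**step / 1000**step - 1) * 100, 2)


for step, prefix in enumerate(["kilo", "mega", "giga", "tera"], start=1):
    print(f"{prefix}: binary reading is {binary_excess_percent(step)}% larger")
```

At kilo the difference is easy to shrug off; by giga it is over 7% of the advertised capacity.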
> The author's wishful thinking is falsely presented as fact.
There's good reason why the meanings of SI prefixes aren't set by convention or by common usage or by immemorial tradition, but by the SI. We had several thousand years of setting weights and measures by local and trade tradition and it was a nightmare, which is how we ended up with the SI. It's not a good show for computing to come along and immediately recreate the long and short ton.
E.g., M-W lists both, with even the 1,024 B definition being listed first. Wiktionary lists the 1,024 B definition, though it is tagged as "informal".
As a prescriptivist myself I would love if the world could standardize on kilo = 1000, kibi = 1024, but that'll likely take some time … and the introduction of the word to the wider public, who I do not think is generally aware of the binary prefixes, and some large companies deciding to use the term, which they likely won't do, since companies are apt to always trade for low-grade perpetual confusion over some short-term confusion during the switch.
If we are talking about kilobytes, it could just as easily be the opposite.
Unless you were referring to only contracts which you yourself draft, in which case it'd be whatever you personally want.
Which doesn't make it more correct, of course, even though I strongly believe that it is (where appropriate for things like memory sizes). Just saying, it goes much further back than 1984.
Adding to your point, it is human nature to create industry- or context-specific units and refuse to play with others.
In the non-metric world, I see examples like: Paper publishing uses points (1/72 inch), metal machinists use thousandths of an inch, woodworkers use feet and inches and binary fractions, land surveyors use decimal feet (unusual!), waist circumference is in inches, body height is in feet and inches, but you buy fabric by the yard, airplane altitudes are in hundreds to tens of thousands of feet instead of decimal miles. Crude oil is traded in barrels but gasoline is dispensed in gallons. Everyone thinks their usage of units and numbers is intuitive and optimal, and everyone refuses to change.
In the metric(ish) world, I still see many tensions. The micron is a common alternate name for the micrometre, yet why don't we have a millin or nanon or picon? The solution is to eliminate the micron. I've seen the angstrom (0.1 nm) in spectroscopy and in the discussion of CPU transistor sizes, yet it diverts attention away from the picometre. The bar (100 kPa) is popular in talking about things like tire pressure because it's nearly 1 atmosphere. The mmHg is a unit of pressure that sounds metric but is not; the correct unit is pascal. No one in astronomy uses mega/giga/tera/peta/etc.-metres; instead they use AU and parsec and (thousand, million, billion) light-years. Particle physicists use eV/keV/MeV instead of some units around the picojoule.
Having a grab bag of units and domains that don't talk to each other is indeed the natural state of things. To put your foot down and say no, your industry does not get its own special snowflake unit, stop that nonsense and use the standardized unit - that takes real effort to achieve.
Here's my theory. In the beginning, everything was base10. Because humans.
Binary addressing made sense for RAM. Especially since it makes decoding address lines into chip selects (or slabs of core, or whatever) a piece of cake, having chips be a round number in binary made life easier for everyone.
Then early DOS systems (CP/M comes to mind particularly) mapped disk sectors to RAM regions, so to enable this shortcut, disk sectors became RAM-shaped. The 512-byte sector was born. File sizes can be written in bytes, but what actually matters is how many sectors they take up. So file sizing inherited this shortcut.
But these shortcuts never affected "real computers", only the hamstrung crap people were running at home.
So today we have multiple ecosystems. Some born out of real computers, some with a heavy DOS inheritance. Some of us were taught DOS's limitations as truth, and some of us weren't.
I don't know if that's correct, but at least it'd explain the mismatch.
This is a myth. The first IBM hard drive was 5,000,000 characters in 1956 - before bytes were even common usage. Drives have always been base10; it's not a conspiracy.
Drives are base10, lines are base10, clocks are base10, pretty much everything but RAM is base10. Base2 is the exception, not the rule.
Yeah, I already knew that, lol.
But thanks for bringing it to my attention. :-)
You can get away with those on machines with 64 bit address spaces and TFLOPs of math capacity. You can't on anything older or smaller.
However it doesn't seem to be divided into sectors at all, more like each track is like a loop of magnetic tape. In that context it makes a bit more sense to use decimal units, measuring in bits per second like for serial comms.
Or maybe there were some extra characters used for ECC? 5 million / 100 / 100 = 500 characters per track, leaves 72 bits over for that purpose if the actual size was 512.
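Spelling out that guess in Python, as a sketch only. The 6-bit character width is my assumption (it is the width that makes the 72-bit figure come out), and labelling the two divisors as surfaces and tracks is my reading of the "/ 100 / 100" above.

```python
TOTAL_CHARS = 5_000_000
SURFACES = 100            # first divisor above (assumed meaning)
TRACKS_PER_SURFACE = 100  # second divisor above (assumed meaning)
BITS_PER_CHAR = 6         # assumption, not stated in the thread

chars_per_track = TOTAL_CHARS // SURFACES // TRACKS_PER_SURFACE
spare_chars = 512 - chars_per_track          # only if a track really held 512
spare_bits = spare_chars * BITS_PER_CHAR

print(chars_per_track, "chars/track,", spare_bits, "spare bits")
```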
First floppy disks - also from IBM - had 128-byte sectors. IIRC, it was chosen because it was the smallest power of two that could store an 80-column line of text (made standard by IBM punched cards).
Disk controllers need to know how many bytes to read for each sector, and the easiest way to do this is by detecting overflow of an n-bit counter. Comparing with 80 or 100 would take more circuitry.
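Both points can be illustrated with a short Python sketch. It models the counter in software, so it is an analogy for the hardware rather than a description of any real controller; the function names are hypothetical.

```python
def smallest_power_of_two_at_least(n):
    """Smallest power of two that can hold n bytes (n >= 1)."""
    return 1 << (n - 1).bit_length()


def bytes_until_overflow(bits):
    """Count bytes with a bits-wide counter and stop when it wraps to zero."""
    mask = (1 << bits) - 1
    count = 0
    bytes_read = 0
    while True:
        bytes_read += 1
        count = (count + 1) & mask  # hardware analogue: carry out of the top bit
        if count == 0:
            return bytes_read


print(smallest_power_of_two_at_least(80))  # sector size for an 80-column card
print(bytes_until_overflow(7))             # a 7-bit counter wraps after this many bytes
```

With a power-of-two sector, "end of sector" is just the counter's carry; stopping at 80 or 100 would need an explicit comparison against that value.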
But that said, we aren't talking about sector sizes. Of course storage mediums are always going to use sector sizes of powers of two. What's being talked about here is the confusion in how to refer to the storage medium's total capacity.
Actually, that's not true.
As far as I know, IBM floppy disks always used power-of-2 sizes. The first read-write IBM floppy drives to ship to customers were part of the IBM 3740 Data Entry System (released 1973), designed as a replacement for punched cards. IBM's standard punched card format stored 80 bytes per card, although some of their systems used a 96 byte format instead. A 128 byte sector was enough to fit either, plus some room for expansion. In their original use case, files were stored with one record/line/card per disk sector.
However, unlike floppies, (most) IBM mainframe hard disks didn't use power-of-2 sectors. Instead, they supported variable sector sizes ("CKD" format) – when you created a file, it would be assigned one or more hard disk tracks, which then would be formatted with whatever sector size you wanted. In early systems, it was common to use 80 byte sectors, so you could store one punched card per sector. You could even use variable length sectors, so successive sectors on the same track could be of different sizes.
There was a limit on how many bytes you could fit in a track - for an IBM 3390 mainframe hard disk (released 1989), the maximum track size is 56,664 bytes – not a power of two.
IBM mainframes historically used physical hard disks with special firmware that supported all these unusual features. Nowadays, however, they use industry standard SSDs and hard disks, with power of two sector sizes, but running special software on the SAN which makes it look like a busload of those legacy physical hard disks to the mainframe. And newer mainframe applications use a type of file (VSAM) which uses power-of-two sector sizes (512 bytes through 32KB, but 4KB is most common). So weird sector sizes are really only a thing for legacy apps (BSAM, BDAM, BPAM-sans-PDSE), and certain core system files which are stuck on that format due to backward compatibility requirements. But go back to the 1960s/1970s, and non-power-of-2 sector sizes were totally mainstream on IBM mainframe hard disks.
And in that environment, 1000 bytes rather than 1024 bytes makes complete sense. However, file sizes were commonly given in allocation units of tracks/cylinders instead of bytes.