Which file can contain more information: one stored on a 1.44 MB floppy disk or one on a 1 TB hard disk?
Which password is more random (i.e. harder to guess): one that can be stored in only 1 byte of memory or one that is stored in 64 bytes?
Information theory deals with determining exactly how many bits it would take to encode a given problem.
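For a rough sense of the second question, here's a back-of-the-envelope sketch (my own illustration, and it assumes the best case where every byte of the password is chosen uniformly at random):

    # Upper bound on password entropy, assuming each byte is uniformly random.
    # Real passwords drawn from keyboard characters carry far fewer bits per byte.
    def max_entropy_bits(n_bytes: int) -> int:
        return 8 * n_bytes  # each byte has 256 possible values = 8 bits

    print(max_entropy_bits(1))   # 8 bits: at most 2**8 = 256 equally likely passwords
    print(max_entropy_bits(64))  # 512 bits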
Temperature. Sometimes you read that it's a measure of warmth; sometimes cold. Aren't hot and cold opposites?
Yes, hot and cold are opposites, but in a way they give the same kind of information. That's also true for information and randomness. Specifically, little randomness means more (certain) information.
I think one frequent source of confusion is the difference between "randomness" and "uncertainty" in colloquial versus formal usage. Entropy and randomness in the formal sense don't have a strong connotation that the uncertainty is intrinsic and irreducible. In the colloquial sense, I feel like there's often an implication that the uncertainty can't be avoided.
Let's use an analogy of a remote observation post with a soldier sending hourly reports:
0 ≝ we're not being attacked
1 ≝ we're being attacked!
Instead of thinking of a particular message x, you have to think of the distribution of messages this soldier sends, which we can model as a random variable X. For example, in peacetime the message will be 0 99.99% of the time, while in wartime, with active conflict, it could be 50-50. The entropy, denoted H(X), measures how uncertain the central command post is about the message before receiving it, or equivalently, the information it gains after receiving it. The peacetime messages contain virtually no information (very low entropy), while the wartime 50-50 messages contain H(X) = 1 bit each.
Another useful way to think about information is to ask "how easy would it be to guess the message instead of receiving it?" In peacetime you could just assume the message is 0 and you'd be right 99.99% of the time. In wartime it would be much harder to guess, hence the intuitive notion that wartime messages contain more information.
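To make that concrete, here is a quick sketch (my own, treating "99.99% of the time" as an attack probability of 0.0001) of the binary entropy in the two regimes:

    import math

    def entropy(p):
        # Entropy in bits of a binary source that sends "1" with probability p.
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    print(entropy(0.0001))  # peacetime: about 0.0015 bits per report
    print(entropy(0.5))     # wartime 50-50: exactly 1.0 bit per report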
Entropy is usually poorly taught; there are really three entropies that get conflated. There's the statistical mechanics entropy, which is the math describing the random distribution of ideal particles. There's Shannon's entropy, which describes randomness in strings of characters. And there's classical entropy, which describes the fraction of irreversible/unrecoverable losses to heat as a system transfers potential energy to other kinds of energy on its way towards equilibrium with its surroundings or the "dead state" (a reference state of absolute entropy).
These are all named with the same word, and while they have some relation with each other, they are each different enough that there should be unique names for all three, IMO.
Entropy is the amount of information it takes to describe a system. That is, how many bits it takes to "encode" all possible states of the system.
For example, say I had to communicate the result of 100 (fair) coin flips to you. This requires 100 bits of information, as each of the 2^100 possible 100-bit outcomes is equally likely.
If I were to complicate things by adding in a coin that was unfair, I would need fewer than 100 bits, as the unfair coin's outcomes would not be equally likely. In the extreme case where 1 of the 100 coins is completely unfair and always turns up heads, for example, I only need to send 99 bits, as we both know the result of flipping the one unfair coin.
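A quick sketch of that arithmetic (my own illustration; since the flips are independent, the total entropy is just the sum of the per-coin entropies):

    import math

    def coin_entropy(p_heads):
        # Entropy in bits of one coin that lands heads with probability p_heads.
        if p_heads in (0.0, 1.0):
            return 0.0
        q = 1.0 - p_heads
        return -(p_heads * math.log2(p_heads) + q * math.log2(q))

    print(100 * coin_entropy(0.5))                     # 100 fair coins: 100.0 bits
    print(99 * coin_entropy(0.5) + coin_entropy(1.0))  # 99 fair + 1 always-heads: 99.0 bits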
The shorthand of calling it a "measure of randomness" probably comes from the problem setup. For the 100 coin case, we could say (in my opinion, incorrectly) that flipping 100 fair coins is "more random" than flipping 99 fair coins with one bad penny that always comes up heads.
Shannon's original paper is extremely accessible and I encourage everyone to read it [1]. If you'll permit self-promotion, I made a condensed blog post about the derivations that you can also read, though it's really Shannon's paper without most of the text [2].
[1] http://people.math.harvard.edu/~ctm/home/text/others/shannon...