Sometimes, one wonders why Google offers so many amazing services for free. There is a saying in Silicon Valley: “If you don’t know what the product is, then YOU are the product.” In this case, Google collects millions upon millions of text queries, each one providing a little bit more data about what people are interested in. According to the book In the Plex, the short-lived free telephone directory service GOOG411 service allowed Google to collect voice samples to create the voice-recognition engine in the Android smart phone operating system. By running the Universe’s most popular search engine, Google collects samples of queries that allow it to build better and better algorithms for predicting what you are looking for, even before you finish typing. The most direct examples are Google Autocomplete and Spell-Check, but the Andriod keyboard can use your typing history, along with its database of what letters come next in queries, to make very good guesses about what you are trying to say.
A Pioneer in the this field, Claude Shannon, is probably the most important computer scientist (almost) no one has heard of. In addition to his work on artificial intelligence, Shannon came up with the definition of information.
The essential idea is that not all signals will carry the same amount of information. The redundancies in a message contains that could be excised without loss of meaning do not convey any extra information. In a sense, information is the same as the surprise the receive experiences by reading it. For example, if you see me throw a fair coin and ask me which way it landed, by saying “heads,” I am conveying one bit of information – the anwser to a single “yes/no” or “1/0” binary question. Now imagine that you ask me who won the Stanley Cup this year.
If I tell you “The Chicago Blackhawks,” that would also be one bit of information, assuming that you knew that they were one of the two finalists. However, if you had stopped paying attention after the end of the regular season, and only knew beforehand which 16 of the 30 NHL teams made the playoffs, the answer “The Chicago Blackhawks” contains more information, since now you know which of the 16 possibilities is correct. The Shannon Entropy is defined as the minimum number of binary bits that are needed narrow down the same number of choices. If all n choices are equally likely a priori, this is just Log2(n). Notice how important previous knowledge, and the set of “allowed” possibilities is. For example, when you play Chess over the internet (or by mail if you are old-school), you could imagine constructing a system of ordering all of the allowed moves (since both players know the rules of chess, and agree on a system of move notation), for example, that lists, in alphabetical order, all of the legal moves from the current position in alphabetical order. Then, to make a move, you can just send the corresponding number.
More generally, if not all of the choices are equally probable, for example, the next letter in a book, the amount of information or surprise is greatly reduced. For example, in English text, the letter “q” is almost always followed by a “u.” The formula for the Shannon Entropy can be extended to be:
Shannon attempted to measure the average information content of the English language by reading the letter of a book to his wife and seeing if she could guess the next letter. Current estimates out the answer at about 1.5 bits per letter. If all 26 letters were equally likely, this would be Log2(26)=4.7 . The greatly reduced information content represents the redundancy inherent in language, which is an intentional solution for the inevitable errors in transmission. This led Shannon to another remarkable result; no matter how noisy the channel is, it can still be used to communication, as long as you are OK with error-correcting algorithms and slow bandwidth. Conversely, you can use compression algorithms to reduce redundancy to increase the rate of information transfer. If you were paying by the letter, you might just get rid of all of these trailing “u”s after “q”s and save yourself some money. Similar considerations for people sending telegrams (paying by the word) or, more contemporary, using Twitter (limited to 140 characters), crop up, and standard abbreviations have come into common usage as a result. One of the big revolutions in digital media streaming, Netflix, Spotify, et al, is not just the “wider pipes” of the internet, but better compression algorithms for sending music and video.
What about the connection with physics? The reason the measure of information is called the “Shannon Entropy” is that the formula is identical to the one for the thermodynamic entropy of the system. As detailed on the Hammock Physicist, a good way to think about entropy is the amount of information you would need to specify the particular state the system is in. In the terms of statistical mechanics, the amount of additional data you would need to pick out the specific microstate a system is in, once you know the macrostate. This can be a lot of information, if you system is, lets say, an ideal gas in a canister. Even after you have measured all of the observable s like pressure and temperature, you would still be missing the location and velocities of the all bazillions of individual gas molecules. While this is already suggested by the Boltzmann formula for entropy:
S = k ln Γ, where Γ is the number of microstates, it is not obvious why the total entropy of a system either increases or stays the same. In other words, while it makes sense that a system, left to its own devices, will generally evolve to macrostates with more microstates (eg. more probable), but it is not a trivial consequence. But if you think about entropy in terms of information, the second law falls right into your lap:
The bit count, also referred to as the entropy, measures missing information. To put this slightly more precise: entropy quantifies the detailed information we lack about the system. It is the minimum number of additional bits that you need in order to be able to fill in the hidden information and to describe the energy configuration of the system down to its finest details. The second law of thermodynamics states that this missing information can’t decrease. If we are ignorant about the finer and fast details of the system, that ignorance can not disappear due to the system evolving. Ignorance can spread and grow, but it can’t disappear. That’s all there is to the much celebrated second law. (You are not disappointed, are you?)
That is, you can’t gain information about a system just by letting it evolve. In fact, your ignorance only grows as time goes on (see Chaos Theory).