A rough draft introducing the basic concepts of machine learning and image recognition.
Early unmanned space exploration vehicles took images of the moon with film cameras. These photographs were automatically developed using chemicals, dried, scanned and transmitted back to Earth as radio waves. A photograph is not data, but it can become data if you measure whether light reflects off each small area of its surface, working from the top left to the bottom right. If it does, send a beep; if it doesn't, send silence. On or off, 1 or 0. Then, at another time and place, those beeps, or the lack of them, are used to fill in some, not all, of the areas of a grid and the image appears. Not identical, missing some of the analogue nuance, but good enough.
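The scanning idea can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of how any real spacecraft worked: the tiny brightness grid, the cut-off value of 128 and the function names are all made up for the example.

```python
# Hypothetical 4x4 "photograph": brightness of each small area, 0 (dark) to 255 (bright).
photo = [
    [12, 200, 220, 30],
    [25, 240, 250, 40],
    [10, 180, 190, 20],
    [5, 15, 35, 8],
]

THRESHOLD = 128  # assumed cut-off between "beep" and "silence"


def scan(image, threshold=THRESHOLD):
    """Read the grid from top left to bottom right, emitting 1 (beep) or 0 (silence)."""
    return [1 if value >= threshold else 0 for row in image for value in row]


def rebuild(bits, width):
    """Fill in a grid from the received bits, one row at a time."""
    return [bits[i:i + width] for i in range(0, len(bits), width)]


signal = scan(photo)                 # what gets transmitted
received = rebuild(signal, width=4)  # what the receiver draws

for row in received:
    print("".join("#" if bit else "." for bit in row))
```

The receiver ends up with a crude black and white version of the original: every shade of grey has been forced to one side of the threshold or the other, which is the "missing nuance" in miniature.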
Before too long it was possible to bypass the chemicals and simply measure the light inside the camera using a sensor. A self-contained, theoretically portable digital camera was produced by a Kodak engineer in 1975. It weighed 3.6 kg, recorded 10,000 black and white pixels and stored them on magnetic tape. All consumer and professional digital cameras since have pretty much followed its process, albeit at somewhat higher resolutions.
At first, digital photographs were functionally the same as analogue ones; they just didn't look as good. They had their uses, like taking photos of distant planets, but photographers and publishers tended to prefer the high quality of film, which could be scanned into a computer system at a later date if needed. But then, around the mid-2000s, the quality increased to the point that, when combined with convenience, analogue film became a luxury, discarded by both professionals and consumers. In January 2012, 37 years after starting this revolution, Kodak filed for bankruptcy.
We now live in a world ruled by digital images. The vast majority of people carry a digital camera with them at all times which is capable of publishing its images to the internet where the vast majority of people are able to see them. These photographs exist only as microscopic switches on hard drives and storage cards. You cannot look at them, in the same way that you cannot hear music by looking at the grooves on a vinyl record. In order to see a digital image it has to be processed, translated from encoded data to light up areas of a screen with different colours. Each screen or device translates the data slightly differently depending on its size or operating system. And different apps might use different interpretations depending on their requirements. When I look at a photograph on my computer and then send it to you to look at on your phone, we are looking at different interpretations of the same data. We are not looking at the same data.
On the surface this doesn't really matter. The whole act of looking itself introduces enough psychological and emotional variables that any subtle differences in rendering are pretty much moot. And, of course, the medium is the message, meaning that the fact of your looking at the photo on Facebook on your phone on the bus is way more important to your understanding of it, and to any actions you might take because of it, than the contents of the image itself. Data is just the raw material. And as long as everyone's getting the same data then at least we can try to control the mediums that are translating it, say by regulating monopolies and encouraging a plurality of platforms.
But once we dig a bit deeper, this fundamental nature of digital images – that they do not exist as images until they are interpreted as such by software – starts to matter quite a bit.
We think of a JPEG file as an image, but it isn't. It's a data format, a way of storing information, which lends itself to being interpreted as an image by software. It can also be turned into sound. When you play JPEG data as sound it mostly sounds like static noise, but it's still sound. Some people like to manipulate images by importing them into audio editing tools and saving them back as images. If you've ever wondered what a photograph would look like when put through an echo filter, wonder no more.
What's kinda fascinating is it looks just like an image that's been put through an echo filter.
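To make the "image data as sound" idea concrete, here is a rough Python sketch that wraps arbitrary bytes in a WAV header so an audio player would treat each byte as one sound sample. It is an assumption about how you might try this yourself, not the workflow of any particular audio tool. The function name, the 8000 Hz sample rate and the stand-in bytes are all invented for the example.

```python
import io
import wave


def bytes_to_wav(data: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap raw bytes in a WAV header, treating each byte as one 8-bit mono sample."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(1)            # one byte per sample
        wav.setframerate(sample_rate)  # assumed playback speed
        wav.writeframes(data)
    return buffer.getvalue()


# Stand-in for the contents of a JPEG file. With a real photo you might use
# open("photo.jpg", "rb").read(), where "photo.jpg" is a hypothetical path.
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + bytes(range(256)) * 4

wav_bytes = bytes_to_wav(image_bytes)

# Reading the audio back shows the samples are the very same bytes, untouched.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    samples = wav.readframes(wav.getnframes())

print(len(image_bytes), "bytes in,", len(samples), "samples out")
```

Nothing about the data changes. Only the label on the front tells the software whether to send it to the screen or to the speakers.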
Digital data is a sequence of switches, some of them on, some of them off. When we take a photograph we translate the light that comes through the lens into millions of switches, on and off. The same thing when we record sound digitally – millions of switches. Or when we save a word processing file, or a CAD drawing, or a web page. Everything that we call "digital" is a sequence of switches. On and off. 0s and 1s.
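A small Python sketch shows what those switches look like for a two-letter piece of text, and how the same switches can be read as numbers instead. The helper name `to_switches` is made up for the example.

```python
def to_switches(data: bytes) -> str:
    """Spell out each byte as eight on/off switches."""
    return " ".join(f"{byte:08b}" for byte in data)


text = "Hi"
data = text.encode("ascii")

print(to_switches(data))     # 01001000 01101001
print(list(data))            # the same switches read as numbers: [72, 105]
print(data.decode("ascii"))  # the same switches read as text: Hi
```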
In order to experience these recordings, these creations, these pieces of media, we have to translate them from their stored state into something we can perceive. Most of the time this is pretty linear. Photo software turns JPEGs into images. Music software turns MP3s into sound. Word processing software turns DOCs into text. Just as record players turn vinyl into music or printing presses turn metal type into newspapers. But it doesn't have to be.
Years ago, when home computers were new and most households had record players, computer data was, very very occasionally, distributed on vinyl records. You would play the record and send the audio not to the speakers but to the computer which would interpret the different tones as 0s and 1s. Of course this isn't news to anyone who had a ZX Spectrum or some other home computer which loaded software from cassette tapes. Games came on exactly the same kind of tapes as albums did. Part of the nostalgia for that era is the sound of the software playing in a tape deck.
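The trick of carrying 0s and 1s as tones can be sketched in Python. This is a deliberately simplified scheme for illustration and not the actual tape format of the ZX Spectrum or any other machine: the two tone frequencies, the sample rate, the length of each bit and the zero-crossing threshold are all assumptions.

```python
import math

SAMPLE_RATE = 8000             # samples per second (assumed)
BIT_SAMPLES = 80               # samples per bit, i.e. 10 ms of sound (assumed)
TONE_FOR = {0: 1000, 1: 2000}  # low tone for 0, high tone for 1, in Hz (assumed)


def encode(bits):
    """Turn bits into audio samples: one short burst of tone per bit."""
    samples = []
    for bit in bits:
        freq = TONE_FOR[bit]
        samples += [math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                    for n in range(BIT_SAMPLES)]
    return samples


def decode(samples):
    """Recover bits by counting how often each burst crosses zero."""
    bits = []
    for start in range(0, len(samples), BIT_SAMPLES):
        burst = samples[start:start + BIT_SAMPLES]
        crossings = sum((a >= 0) != (b >= 0) for a, b in zip(burst, burst[1:]))
        # The high tone crosses zero roughly twice as often as the low one,
        # so a threshold halfway between the two tells them apart.
        bits.append(1 if crossings > 30 else 0)
    return bits


message = [0, 1, 1, 0, 1, 0, 0, 1]
audio = encode(message)
print(decode(audio) == message)
```

Played through a speaker, `audio` would be a warbling string of beeps. Fed to the decoder instead, it is the original message again.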
A similar aural nostalgia can be had for the "boing boing" sounds produced by a computer modem connecting to the internet over a phone line. For those of us online at home in the 1990s and early 2000s, this was the anthem of the 'net, a digital conversation between two computers rendered as a song for us to sing along to. There was no good reason for it to be audible to humans, but by being audible it neatly illustrated the neutrality of the digital signal. By design this was code to be interpreted by the modem's circuitry. But it was also music. Which can then be turned into a visual graphic. Same data, different results.