One area of technology that is helping improve the services that we use on our smartphones, and on the web, is machine learning. Sometimes, the terms machine learning and artificial intelligence get used as synonyms, especially when a big name company wants to talk about its latest innovations, however AI and machine learning are two quite distinct, yet connected, areas of computing.
The goals of AI is to create a machine which can mimic a human mind and to do that it needs learning capabilities. However the goal of AI researchers are quite broad and include not only learning, but also knowledge representation, reasoning, and even things like abstract thinking. Machine learning on the other hand is solely focused on writing software which can learn from past experience.
What you might find most astonishing is that machine learning is actually more closely related to data mining and statistical analysis than AI. Why is that? Well, lets look at what we mean by machine learning.
One of the standard definitions of machine learning, as given by Tom Mitchell – a Professor at the Carnegie Mellon University (CMU), is a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
To put that a bit more simply, if a computer program can improve how it performs a task by using previous experience then you can say it has learned. This is quite different to a program which can perform a task because its programmers have already defined all the parameters and data needed to perform the task. For example, a computer program can play tic-tac-toe (noughts and crosses) because a programmer wrote the code with a built-in winning strategy. However a program that has no pre-defined strategy and only has a set of rules about the legal moves, and what is a winning scenario, will need to learn by repeatedly playing the game until it is able to win.
This doesn’t only apply to games, it also true of programs which perform classification and prediction. Classification is the process whereby a machine can recognize and categorize things from a dataset including from visual data and measurement data. Prediction (known as regression in statistics) is where a machine can guess (predict) the value of something based on previous values. For example, given a set of characteristics about a house, how much is it worth based on previous house sales.
That leads us to another definition of machine learning, it is the extraction of knowledge from data. You have a question you are trying to answer and you think the answer is in the data. That is why machine learning is related to statistics and data mining.
Types of machine learning
Machine learning can be split into three broad categories: Supervised, unsupervised and reinforcement. Let’s look at what they mean:
Supervised learning is where you teach (train) the machine using data which is well labeled. That means that the data is already tagged with the correct answer (outcome). Here is a picture of the letter A. This is the flag for the UK, it has three colors, one of them is red, and so on. The greater the dataset the more the machine can learn about the subject matter. After the machine is trained, it is the given new, previously unseen data, and the learning algorithm then uses the past experience to give a result. That is the letter A, that is the UK flag, and so on.
Unsupervised learning is where the machine is trained using a dataset that doesn’t have any labeling. The learning algorithm is never told what the data represents. Here is a letter, but no other information is given about which letter. Here are the characteristics of a particular flag, but without naming the flag. Unsupervised learning is like listening to a podcast in a foreign language which you don’t understand. You don’t have a dictionary and you don’t have a supervisor (teacher) to tell you about what you are hearing. If you listen to just one podcast it won’t be of much benefit, but if you listen to hundreds of hours of these podcasts your brain will start to form a model about how the language works. You will start to recognize patterns and you will start to expect certain sounds. When you do get hold of a dictionary or a tutor then you will learn the language much quicker.
The key thing about unsupervised learning is that once the unlabeled data has been processed it only takes one example of labeled data to make the learning algorithm fully effective. Having processed thousands of images of letters, processing one letter A will instantly label a whole section of the processed data. The advantage is that only a small set of labelled data is needed. Labeled data is harder to create than unlabeled data. In general we all have access to large amounts of unlabeled data, and only small amounts of labeled data.
One of the buzzwords that we hear from companies like Google and Facebook is 'Neural Net.'
Reinforcement learning is similar to unsupervised training in that the training data is unlabeled, however when asked a question about the data the outcome will be graded. A good example of this is playing games. If the machine wins the game then the result is trickled back down through the set of moves to reinforce the validity of those moves. Again, this isn’t much use if the computer plays just one or two games. But if it plays thousands, even millions of games then the cumulative effect of reinforcement will create a winning strategy.
How does it work
There are lots of different techniques used by engineers building machine learning systems. As I mentioned before, a large number of them are related to data mining and statistics. For example, if you have a dataset which describes the characteristics of different coins including their weight and diameter then you can employ statistical techniques like the ‘nearest neighbors’ algorithm to classify a previously unseen coin. What the ‘nearest neighbors’ algorithm does it look to see what classification was give to the nearest neighbors and then give the same classification to the new coin. The number of neighbors used to make that decision is referred to as ‘k’, and so the full title for the algorithm is ‘k-nearest neighbors.’
However there are lots of other algorithms that try to do the same thing, but using different methods. Take a look at the following diagram:
The picture on the top left is the data set. The data is classified into two categories, red and blue. The data is hypothetical, however it could represent almost anything: coin weights and diameters, number of petals on a plant and their widths, etc. Clearly there is some definite grouping here. Everything in the upper left belongs to the red category, and the bottom right to blue. However in the middle there is some crossover. If you get a new, previously unseen, sample which fits somewhere in the middle, does it belong to the red category or to blue? The other images show different algorithms and how they attempt to categorize a new sample. If the new sample lands in a white area then it means it can’t be classified using that method. The number on the lower right shows the classification accuracy.
One of the buzzwords that we hear from companies like Google and Facebook is “Neural Net.” A neural net is a machine learning technique modeled on the way neurons work in the human brain. The idea is that given a number of inputs the neuron will propagate a signal depending on how it interprets the inputs. In machine learning terms this is done with matrix multiplication along with an activation function.
The use of neural networks has increased significantly in recent years and the current trend is to use deep neural networks with several layers of interconnected neurons. During Google I/O 2015, Senior Vice-President of Products, Sundar Pichai, explained how machine learning and deep neural networks are helping Google fulfill its core mission to “organize the world’s information and make it universally accessible and useful.” To that end you can ask Google Now things like, “How do you say Kermit the Frog in Spanish.” And because of DNNs, Google is able to do voice recognition, natural language processing, and translation.
Currently Google is using 30 layer neural nets, which is quite impressive. As a result of using DNNs, Google’s error rate for speech recognition has dropped from 23% in 2013 to just 8% in 2015.
Some examples of machine learning
So we know that companies like Google and Facebook use machine learning to help improve their services. So what can be achieved with machine learning? One interesting area is picture annotation. Here the machine is presented with a photograph and asked to describe it. Here are some examples of machine generated annotations:
The first two are quite accurate (although I am not sure there is a sink in the first picture), and the third is interesting in that the computer managed to detect the box of doughnuts, but it misinterpreted the other pastries as a cup of coffee. Of course the algorithm can also get it completely wrong:
Another example is teaching a machine to write. Cleveland Amory, an American author, reporter and commentator, once wrote, “In my day the schools taught two things, love of country and penmanship — now they don’t teach either.” I wonder what he would think about this:
The above handwriting sample was produced by a Recurrent Neural Network. To train the machine its creators asked 221 different writers to use a ‘smart whiteboard’ and to copy out some text. During the writing the position of their pen was tracked using infra-red. This resulted in a set of x and y coordinates which were used for supervised training. As you can see the results are quite impressive. In fact, the machine can actually write in several different styles, and at different levels of untidiness!
Google recently published a paper about using neural networks as a way to model conversations. As part of the experiment the researchers trained the machine using 62 million sentences from movie subtitles. As you can imagine the results are interesting. At one point the machine declares that it isn’t “ashamed of being a philosopher!” While later when asked about discussing morality and ethics it said, “and how i’m not in the mood for a philosophical debate.” So it seems that if you feed a machine a steady diet of Hollywood movie scripts the result is a moody philosopher!
Unlike many areas of AI research, machine learning isn’t an intangible target, it is a reality that is already working to improve the services we use. In many ways it is the unsung hero, the uncelebrated star which works in the background trawling through all our data to try and find the answers we are looking for. And like “Deep Thought” from Douglas Adam’s Hitchhiker’s Guide to the Galaxy, sometimes it is the question we need to understand first, before we can understand the answer!