What is machine learning and how does it work?

Machine learning is everywhere these days, but how does it work?
January 4, 2022

From personal assistants like Google Assistant and Alexa to content recommendations from YouTube and Amazon, it’s hard to think of a service or technology that machine learning hasn’t radically improved over the past few years.

Simply put, machine learning is a subset of artificial intelligence that allows computers to learn from their own experiences — much like we do when learning or picking up a new skill. When implemented correctly, the technology can perform certain complex tasks better than any human, and often within seconds.

Given how pervasive machine learning has become in today’s society, you may wonder how it works and what its limitations are. To that end, here’s a simple primer on the technology. Don’t worry if you don’t have a background in computer science — this article is just a high-level overview of what happens under the hood.

What is machine learning?

Using Google Lens to identify a bunch of bananas as seen on the camera of the OnePlus 7 Pro.

Even though many people conflate the terms machine learning (ML) and artificial intelligence (AI), there’s actually a distinction between the two. To understand why, it’s worth talking about how artificial intelligence started off in the first place.

Early applications of AI, theorized around 50 years ago, were extremely basic by today’s standards. A chess game where you play against computer-controlled opponents, for instance, was once considered revolutionary. It’s easy to see why: the ability to solve problems based on a set of rules can qualify as basic “intelligence”, after all. These days, however, we’d consider such a system extremely rudimentary because it lacks experience, a key component of human intelligence. This is where machine learning comes in.

Machine learning adds an entirely new dimension to artificial intelligence — it enables computers to learn or train themselves from massive amounts of existing data. In this context, “learning” means forming relationships and extracting new patterns from a given set of data. This is a lot like how human intelligence works as well. When we come across something unfamiliar, we use our senses to study its features and can use our memory to recognize it the next time.

How does machine learning work?

Google I/O 2021 introduction to ML dataset types

Broadly speaking, a machine learning problem can be solved in two distinct phases: training and inference. In the first phase, a computer algorithm analyzes a set of sample data, known as training data, to extract relevant features and patterns. Each algorithm is generally optimized for a certain type of data, which can be anything: numbers, images, text, and even speech.

The success of the training process, meanwhile, is directly linked to three factors: the algorithm itself, the amount of data you feed it, and the dataset’s quality. Every now and then, researchers propose new algorithms or techniques that improve accuracy and reduce errors, as you’d expect from cutting-edge technology. Increasing the amount of data you offer the algorithm, on the other hand, can also help cover more edge cases.

Machine learning programs involve two distinct stages: training and inference.

The output of a machine learning algorithm is often referred to as a model. You can think of an ML model as a dictionary or reference manual that is consulted for future predictions. In other words, we use trained models to infer results from new data that our program has never seen before.

The training process usually involves analyzing thousands or even millions of samples. As you’d expect, this is a fairly hardware-intensive process that needs to be completed ahead of time. Once the training process is complete and all of the relevant features have been analyzed, however, some resulting models can be small enough to fit on common devices like smartphones.

Consider a machine learning application that interprets handwritten text, for example. As part of the training process, a developer first feeds an ML algorithm with sample images. This eventually gives them an ML model that can be packaged and deployed within something like an Android application. When users install the app and feed it with new images of their own, their devices can reference the model to infer new results. In the real world, you won’t see any of this, of course — the app will simply convert handwritten words into digital text.
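To make the two phases more concrete, here is a minimal Python sketch. It is not how a handwriting app works; it assumes a toy "nearest centroid" approach, and the pet measurements, labels, and function names are all made up for illustration. The point is simply that training produces a model once, and inference reuses that model on new data.

```python
# Minimal sketch of the two phases: training builds a model, inference uses it.
# The data, labels, and function names here are illustrative assumptions.

def train(samples, labels):
    """Training: average the feature vectors of each label into a centroid."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The "model" is just one averaged point (centroid) per label.
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def infer(model, features):
    """Inference: return the label whose centroid is closest to the new sample."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: (height_cm, weight_kg) for two kinds of pets.
training_samples = [(25, 4), (23, 5), (60, 30), (55, 25)]
training_labels = ["cat", "cat", "dog", "dog"]

model = train(training_samples, training_labels)   # done once, ahead of time
prediction = infer(model, (58, 28))                # done on new, unseen data
print(prediction)
```

Notice that the model here is tiny (two averaged points) even though it summarizes every training sample, which hints at why a trained model can be small enough to ship inside an app.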

Training a machine learning model is a hardware-intensive task that may take several hours or even days.

While early machine learning applications relied on the cloud for training and inference, recent technological advancements have enabled local, on-device inference as well. Of course, this largely depends on the algorithm and hardware used — as we’ll discuss in a later section.

For now, here’s a rundown of the various machine learning training techniques and how they differ from each other.

Supervised, unsupervised, and reinforcement learning

Training and inference presentation slide at Google I/O

In a nutshell, the data used to train the algorithm can fall under one of two categories: labeled and unlabeled. As you may have guessed from the name, supervised learning involves a labeled dataset, which tells the training algorithm what it’s looking for.

Take a model whose sole purpose is to identify images of dogs and cats, for example. If you feed the algorithm with labeled images of the two animals, it is simply a case of supervised learning. However, if you expect the algorithm to figure out the differentiating features all on its own (that is, without labels indicating the image contains a dog or cat), it becomes unsupervised learning.

Unsupervised learning is especially useful in instances where you might not know what patterns to look for. Furthermore, new data is constantly fed back into the system for training — without any manual input required from a human.

Say an ecommerce website like Amazon wants to create a targeted marketing campaign. Such companies typically already know a lot about their customers, including their age, purchasing history, browsing habits, location, and much more. An unsupervised learning algorithm would be able to form relationships between these variables all by itself. It might help marketers realize that customers from a particular area tend to purchase certain types of clothing or that young shoppers are more likely to spend on recreational items. Whatever the case may be, it’s a completely hands-off, number-crunching discovery process.

Unsupervised learning excels at finding patterns and relationships in a dataset that a human might otherwise overlook.

All in all, unsupervised learning is a useful technique in scenarios that are not quite as straightforward as those with known outcomes.
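As a rough illustration of unsupervised learning, the Python sketch below groups customers by monthly spending without ever being told which group anyone belongs to. It assumes a simplified version of a clustering technique called k-means with two groups; the spending figures and the function name are invented for the example.

```python
# Minimal k-means sketch (k = 2) on unlabeled, one-dimensional data.
# The spending figures below are made up for illustration.

def k_means_1d(values, iterations=10):
    """Group values into two clusters without being told what the groups are."""
    centers = [min(values), max(values)]  # simple deterministic starting guess
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to whichever center is currently nearest.
            nearest = min((0, 1), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of the values assigned to it.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


monthly_spend = [12, 15, 11, 14, 95, 102, 99, 13, 97]
centers, clusters = k_means_1d(monthly_spend)
print(centers)
print(clusters)
```

The algorithm separates the low spenders from the high spenders on its own. Deciding what those two groups mean is still left to a human, such as the marketer in the example above.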

Finally, we have reinforcement learning, which works particularly well in applications that have many ways to reach a clear goal. It’s a system of trial and error: actions that move the model toward the goal are rewarded, while those that don’t are penalized or discarded. This means the model can evolve based on its own experiences over time.

Board games like chess are a natural fit for reinforcement learning because the algorithm can learn from its mistakes. In fact, Google’s DeepMind subsidiary built an ML program called AlphaGo that used reinforcement learning to master the board game Go. Between 2016 and 2017, it defeated multiple Go world champions in competitive settings, a remarkable achievement to say the least.
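Go is far too complex for a short example, but the trial-and-error idea can be sketched on a much smaller problem. The Python snippet below assumes a hypothetical five-cell corridor in which an agent starts at the left end and is rewarded only for reaching the right end. It uses a basic technique called Q-learning; the constants are illustrative rather than tuned, and none of this reflects how DeepMind's system was built.

```python
import random

# Tabular Q-learning sketch on a hypothetical five-cell corridor.
# The agent starts in cell 0 and is rewarded only for reaching cell 4.
# All constants below are illustrative assumptions, not tuned values.
GOAL = 4
ACTIONS = (-1, +1)            # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

rng = random.Random(0)        # fixed seed so runs are repeatable
q = [[0.0, 0.0] for _ in range(GOAL)]   # q[state][action index]


def choose_action(state):
    """Mostly pick the best-known action, but sometimes explore at random."""
    left, right = q[state]
    if rng.random() < EPSILON or left == right:
        return rng.randrange(2)
    return 0 if left > right else 1


for _ in range(200):                      # 200 episodes of trial and error
    state = 0
    while state != GOAL:
        action = choose_action(state)
        next_state = max(0, state + ACTIONS[action])
        reward = 1.0 if next_state == GOAL else 0.0
        future = 0.0 if next_state == GOAL else max(q[next_state])
        # Nudge the estimate toward the reward plus discounted future value.
        q[state][action] += ALPHA * (reward + GAMMA * future - q[state][action])
        state = next_state

policy = ["left" if left > right else "right" for left, right in q]
print(policy)
```

Nobody tells the agent that moving right is the correct strategy. It stumbles onto the reward by chance at first, and the learned values then steer it to prefer moving right in every cell.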

What about neural networks and what is deep learning?

DNA sequencing deep learning slide at Google I/O

A neural network is a specific subtype of machine learning inspired by the behavior of the human brain. Biological neurons in an animal body are responsible for sensory processing. They take information from our surroundings and transmit electrical signals over long distances to the brain. Our bodies have billions of such neurons that all communicate with each other, helping us see, feel, hear, and everything in between.

An artificial neural network mimics the behavior of biological neurons in an animal body.

In that vein, artificial neurons in a neural network talk to each other as well. They break down complex problems into smaller chunks or “layers”. Each layer is made up of neurons (also called nodes) that accomplish a specific task and communicate their results with nodes in the next layer. In a neural network trained to recognize objects, for example, you’ll have one layer with neurons that detect edges, another that looks at changes in color, and so on.

Layers are linked to each other, so “activating” a particular chain of neurons gives you a certain predictable output. Because of this multi-layer approach, neural networks excel at solving complex problems. Consider autonomous or self-driving vehicles, for instance. They use a myriad of sensors and cameras to detect roads, signage, pedestrians, and obstacles. All of these variables have some complex relationship with each other, making it a perfect application for a multi-layered neural network.

Deep learning is a term that’s often used to describe a neural network with many layers. The term “deep” here simply refers to the layer depth.
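The layered structure described above can be sketched in a few lines of Python. This toy network has one hidden layer and one output layer, and it computes the XOR function (output 1 only when exactly one input is 1). The weights are picked by hand purely for illustration; a real network would learn them during training and would have far more neurons and layers.

```python
# Forward pass through a tiny two-layer network with hand-picked weights.
# Real networks learn their weights during training; these are fixed by hand
# purely to illustrate how layers feed into one another.

def step(x):
    """Activation function: the neuron 'fires' (1) if its input is positive."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of all inputs, adds a bias, then activates."""
    return [
        step(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def xor_network(a, b):
    # Hidden layer: first neuron acts like OR, second acts like AND.
    hidden = layer([a, b], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output layer: fires when OR is on but AND is off, which is XOR.
    output = layer(hidden, weights=[[1, -1]], biases=[-0.5])
    return output[0]


results = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
print(results)
```

Neither hidden neuron can solve the problem alone, but the output neuron combines their results to get the right answer. That division of labor across layers is what lets deeper networks handle more complex relationships.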

Where do we see machine learning in our daily lives?

Sony Xperia 1 III Google Assistant
Robert Triggs / Android Authority

Machine learning influences pretty much every aspect of our digital lives. Social media platforms like Instagram, for example, often show you targeted advertisements based on the posts you interact with. If you like an image containing food, you might get advertisements related to meal kits or nearby restaurants. Similarly, streaming services like YouTube and Netflix can infer new genres and topics you may be interested in, based on your watch history and duration.

Even on personal devices like smartphones, features such as facial recognition rely heavily on machine learning. Take the Google Photos app, for example. It not only detects faces from your photos but also uses machine learning to identify unique facial features for each individual. The pictures you upload help improve the system, allowing it to make more accurate predictions in the future. The app also often prompts you to verify if a certain match is accurate — indicating that the system has a low confidence level in that particular prediction.
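The idea of acting on a prediction only when confidence is high can be sketched as follows. This is not how Google Photos is implemented; the names, raw scores, and the 0.8 threshold are assumptions for illustration, and softmax is just one common way to turn a model's raw scores into probabilities.

```python
import math

# Sketch of acting on a prediction only when the model is confident.
# The scores, names, and the 0.8 threshold are illustrative assumptions.

def softmax(scores):
    """Turn raw model scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_or_ask(names, scores, threshold=0.8):
    """Return the best match, or ask the user to confirm when confidence is low."""
    probabilities = softmax(scores)
    best = max(range(len(names)), key=lambda i: probabilities[i])
    if probabilities[best] >= threshold:
        return f"Tagged as {names[best]}"
    return f"Is this {names[best]}?"


people = ["Alice", "Bob", "Carol"]
print(label_or_ask(people, [6.0, 1.0, 0.5]))   # one score clearly dominates
print(label_or_ask(people, [2.0, 1.8, 0.5]))   # top two scores are close
```

In the first call one candidate clearly stands out, so the photo is tagged automatically. In the second, two candidates score almost the same, so the sketch falls back to asking the user.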

Indeed, machine learning is all about achieving reasonably high accuracy in the least amount of time. It’s not always successful, of course.

In 2016, Microsoft unveiled a state-of-the-art chatbot named Tay. As a showcase of its human-like conversational abilities, the company allowed Tay to interact with the public through a Twitter account. However, the project was taken offline within just 24 hours after the bot began responding with derogatory remarks and other inappropriate dialogue.

The above example highlights an important point — machine learning is only really useful if the training data is reasonably high quality and aligns with your end goal. Tay was trained on live Twitter submissions, meaning it was easily manipulated or trained by malicious actors.

Machine learning isn't a one-size-fits-all arrangement. It requires careful planning, a varied and clean data set, and occasional supervision.

Dangers of machine learning aside, the technology can also help in scenarios where traditional methods just cannot keep pace.

Rendering graphically complex video games represents one such application. For decades, we’ve relied on yearly performance increases to achieve this task. However, processing power has started to plateau of late — even as other technologies like display resolutions and refresh rates continue to march upwards.

ML-based upscaling technologies like Nvidia’s Deep Learning Super Sampling (DLSS) are helping bridge this gap. The way DLSS works is rather straightforward: the GPU first renders an image at a lower resolution and then uses a trained ML model to upscale it. The results are impressive, to say the least, and far better than traditional, non-ML upscaling technologies. Similarly, super-resolution upscaling is used to improve image quality in smartphone photography. Machine learning isn’t just for basic predictions anymore.

How does hardware affect machine learning performance?

Crypto mining with GPU
Edgar Cervantes / Android Authority

Many of the aforementioned machine learning applications, including facial recognition and ML-based image upscaling, were once impossible to accomplish on consumer-grade hardware. In other words, you had to connect to a powerful server sitting in a data center to accomplish most ML-related tasks.

Even today, training an ML model is extremely hardware intensive and pretty much requires dedicated hardware for larger projects. Since training involves running a small number of algorithms repeatedly, though, manufacturers often design custom chips to achieve better performance and efficiency. These are called application-specific integrated circuits, or ASICs. Large-scale ML projects typically use either ASICs or GPUs for training rather than general-purpose CPUs, because these specialized chips offer higher performance and better power efficiency on such workloads.

Machine learning accelerators help improve inference efficiency, making it possible to deploy ML apps to more and more devices.

Things have started to change, however, at least on the inference side of things. On-device machine learning is starting to become more commonplace on devices like smartphones and laptops. This is thanks to the inclusion of dedicated, hardware-level ML accelerators within modern processors and SoCs.

Machine learning accelerators are far more power efficient at ML workloads than an ordinary processor. This is why the DLSS upscaling technology we spoke about earlier, for example, is only available on newer Nvidia graphics cards with the requisite ML acceleration hardware. In smartphones, we’ve seen low-power accelerators designed specifically for voice detection, as well as a growing trend of integrating ML processing tightly with traditional image processors for better photography.

Going forward, we’re likely to see feature segmentation and exclusivity depending on each new hardware generation’s machine learning acceleration capabilities. In fact, we’re already witnessing that happen in the smartphone industry.

Machine learning at the edge: Smartphones and consumer devices

Pixel 6 showing Live Caption
Ryan Haines / Android Authority

ML accelerators have been built into smartphone SoCs for a while now. However, they’ve become a key focal point of late due to the rise of use-cases like computational photography and voice recognition.

In 2021, Google announced its first semi-custom SoC, nicknamed Tensor, for the Pixel 6. One of Tensor’s key differentiators was its custom TPU — or Tensor Processing Unit. Google claims that its chip delivers significantly faster ML inference versus the competition, especially in areas such as natural language processing. This, in turn, allowed Google to use Tensor for a suite of new features on the Pixel 6, including real-time language translation, HDR-enabled video recording, and faster speech-to-text functionality. Smartphone processors from MediaTek, Qualcomm, and Samsung have their own takes on dedicated ML hardware too.

That’s not to say that cloud-based inference isn’t still in use today — quite the opposite, in fact. While on-device machine learning has become increasingly common, it’s still far from ideal. This is especially true when we consider complex problems like voice recognition and image classification. Voice assistants like Amazon’s Alexa and Google Assistant are only as good as they are today because they rely on powerful cloud infrastructure — for both inference as well as model re-training.

On-device machine learning enabled a plethora of futuristic smartphone features, including computational photography, real-time translation, and live captions.

However, as with most new technologies, new solutions and techniques are constantly on the horizon. In 2017, Google’s HDRnet algorithm revolutionized smartphone imaging, while MobileNet brought down the size of ML models and made on-device inference feasible. More recently, the company highlighted how it uses a privacy-preserving technique called federated learning to train machine learning models with user-generated data.

Apple, meanwhile, also integrates hardware ML accelerators within all of its consumer chips these days. The Apple M1 family of SoCs included in the latest MacBooks, for instance, has enough machine learning grunt to perform training tasks on the device itself.

And with that, you’re now up to speed on the basics of machine learning! If you’re looking to get started with the technology on your own, consider checking out our guide on adding machine learning to an Android app.