If you’re like the members of the Android Authority team, you’re probably asked often: Which phone is the best? It’s a deceptively simple question, but we’re going to help you answer it and we’re going to do it in the most objective way possible. Introducing the new Android Authority testing methodology!
Going forward, all smartphones that we review will be put through over 40 standardized tests, with the aim of establishing which is the best of them all. Here’s an inside look at our all-new testing methods, including what we’re looking for in each test, the companies we partner with for testing equipment, and what this means for smartphone reviews on Android Authority going forward.
Test results are meaningless if you aren’t comparing apples to apples, so controlled test conditions are essential. Before and during testing, we seek to identify and eliminate all sources of error.
For our camera tests, we blacked out a concrete room, sealing up its windows completely. We coated the walls with duvetyne to make sure that even if light did get in, it wouldn’t reflect off the walls or reach our test charts. We also test our screens in the same lab, using a program called Calman.
We seek to identify and eliminate all sources of error
For audio and battery life, we implemented automated tests in order to remove as many variables as possible.
When we test a device, our main concern is collecting usable data in test conditions that are standardized for each and every review. For much of our data collection, that means using time-tested off-the-shelf solutions and very few tests that are susceptible to outside variables.
Because smartphone cameras are digital, there are literally hundreds of ways to rate their images objectively, pixel by pixel. To that end, companies like our partners at Imatest offer off-the-shelf solutions for objectively assessing image quality. We use a version called Imatest IS, along with the recommended charts, to collect and analyze the shots taken by the smartphones we review.
We gather our sample data in a controlled environment, free of outside light that could change our results
We gather our sample data in a controlled environment, free of outside light that could change our results. For every shot, we use the same lighting, with the same positioning and the same framing. This allows us to directly compare results from one smartphone camera to another.
Objective tests fall into several categories, but mainly they use only a handful of charts. Most outlets will use some combination of a ColorChecker, a resolution chart, and a 4K video resolution chart, to name a few. From these, you can assess how well a camera handles exposure, white balance, noise, dynamic range, sharpness, coma, chromatic aberration… you get the idea. There are huge amounts of data to be gleaned from each shot taken of a test chart.
This chart has been around for a long time, but it’s exceedingly useful for seeing just how good color, noise, and white balance performance are on a given camera. Using a properly-exposed shot of the X-Rite 24-patch ColorChecker chart tells you how accurate each of the 24 color patches is against Pointer’s gamut, and how saturated those colors are compared to what they should be. By looking at the color drift of the monochrome patches along the bottom, you can see whether the camera shifts color temperature, by comparing their chromaticity against the ideal white point (D65).
You can tell exactly how much noise you can expect to see in your shots
But the really cool thing you can measure with this chart is the level of noise in a given shot. For example, you can not only tell exactly how much noise you can expect to see in your shots at a certain setting, but also what kind of noise is present in every patch, too. While that’s more information than most users will ever care about, this testing is useful for those looking to compare which cameras are noisier than others. Generally, you don’t want to see anything exceeding 2%, but the realities of tiny camera sensors make that nearly impossible.
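To make that 2% guideline concrete, here’s a toy sketch of how RMS noise on a (supposedly uniform) chart patch can be expressed as a percentage of full scale. The synthetic patches below are purely illustrative; this is not Imatest’s actual algorithm:

```python
import numpy as np

def patch_noise_percent(patch):
    """RMS deviation of a uniform 8-bit chart patch, expressed
    as a percentage of full scale (0-255)."""
    pixels = patch.astype(np.float64)
    return 100.0 * pixels.std() / 255.0

# Two synthetic 64x64 gray patches: one with mild sensor noise,
# one with heavy noise (standard deviations of 2 and 8 counts).
rng = np.random.default_rng(0)
mild = np.clip(rng.normal(128, 2.0, (64, 64)), 0, 255)
heavy = np.clip(rng.normal(128, 8.0, (64, 64)), 0, 255)

print(patch_noise_percent(mild))   # under the 2% guideline
print(patch_noise_percent(heavy))  # around 3%, visibly noisy
```

Real chart analysis measures this per patch and per channel, which is how you can tell not just how much noise there is, but what kind.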
After we take a brightly-lit test shot of our color chart, we then dim the lights to a level you’d see in a bar or at dusk, and shoot again. The pitfall here is that most smartphone cameras have what’s called a noise reduction algorithm. This kills noise in low light, but it also makes photos look blotchy and ill-defined.
To test sharpness, you take a perfectly-aligned shot of an evenly-lit resolution chart. We use a variation on this chart, but there are many out there that accomplish the same thing. The main way to quantify sharpness is line widths per picture height (LW/PH) at MTF50.
Assuming the test shot is free of errors, it’s time to analyze the data. The software will look at the high-contrast slanted edges of the boxes in all portions of the chart to measure how many pixels it takes to transition from black to white, if there’s any added color that shouldn’t be there, if any lines are curved… you get the idea. We get the bulk of our data from this chart, because it’s extremely useful to assess not only the image sensor, but the lens as well. Crappy lenses will be prone to chromatic aberration, coma, and soft corner sharpness—and crappy sensors will be prone to soft center sharpness.
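For the curious, the slanted-edge idea can be sketched in a few lines: differentiate the edge profile to get a line spread function, FFT it into an MTF curve, and find where contrast falls to 50%. Real Imatest analysis does far more (edge detection, oversampling via the slant, chromatic aberration checks), so treat this as a toy model on synthetic edges:

```python
import numpy as np

def mtf50_cycles_per_pixel(edge_profile):
    """Simplified MTF50: differentiate the edge spread function (ESF)
    into a line spread function (LSF), FFT it into an MTF, and return
    the first spatial frequency where contrast drops below 50%."""
    lsf = np.diff(edge_profile.astype(np.float64))
    mtf = np.abs(np.fft.rfft(lsf))
    mtf /= mtf[0]                      # normalize to 1.0 at DC
    freqs = np.fft.rfftfreq(lsf.size)  # in cycles per pixel
    return float(freqs[np.nonzero(mtf < 0.5)[0][0]])

# Two synthetic black-to-white edges: one crisp, one blurry.
x = np.linspace(-8, 8, 128)
sharp = 1.0 / (1.0 + np.exp(-x * 2.0))
soft = 1.0 / (1.0 + np.exp(-x / 2.0))
print(mtf50_cycles_per_pixel(sharp) > mtf50_cycles_per_pixel(soft))  # True
```

To get from cycles per pixel to the LW/PH figure mentioned above, multiply by twice the image height in pixels.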
Using the video resolution chart is one of the easier tests out there because it’s so simple to set up. You place the chart in the setup like you would for any other test, and start recording. Once you’ve got focus locked, you gently but consistently move the camera back and forth, up and down.
The video resolution chart will demonstrate how well a camera resolves an image, measured in line pairs per picture height (LP/PH). Specifically, we use the DSC Labs CamAlign Mega Trumpet to accurately rate a 4K video signal. Once we record a sample clip, we mark the point where you can’t distinguish the lines from each other anymore and line it up with the guides on the chart.
All of this is well and good, but it’s important to remember that photography is as much art as science. In other words, the best cameras by the numbers don’t always take the prettiest pictures. Being technically “better” matters in some areas, but many of the things people actually like in photography depend on technical shortcomings. Take Instagram for example: every time you use a filter to alter your photos, you’re “negatively” affecting the objective quality of your snaps in some way, be it color accuracy, sharpness, dynamic range, or noise… But it looks really cool, right?
It's important to remember that photography is as much art as science
There’s so much that can “go wrong” when properly exposing a shot out in the real world, that even the “best” camera will struggle given the right circumstances. Even if you take the best shot you possibly could by the numbers, what’s “accurate” and what’s a “good photo” are often at odds with each other. There’s so much more to photography as a practice that can’t be replaced by sensor tech just yet, and one of the marks of a mature photographer is learning how to use the shortcomings of your camera to maximum effect.
While test results can imply stark differences in camera quality, the truth of the matter is that what people perceive as the “best” often isn’t the best objectively, and there’s nothing wrong with that. Some results just don’t matter as much as others. For this reason, our camera reviews will always include an assessment of image quality from one of our expert reviewers, in addition to the results of the technical, test-based assessment.
In the same lab where we test camera performance, we also test each smartphone’s screen. As before, this is because we don’t want any light pollution messing up our readings.
While there are plenty of competing standards out there, like Rec.709, DCI-P3, and Rec.2020, we measure each screen objectively so you know what you’re getting into. Where people’s tastes vary on things like color saturation and brightness, measurements like gamma and color accuracy are a little less subjective.
For this, we use a program called Calman in conjunction with an X-Rite i1 Pro colorimeter to measure things like screen brightness, color accuracy, and more. That way, there’s very little subjectivity in how well a screen is able to reproduce transitions from white to black, color to color, and so on. We can just show you!
Specifically, we look at how easily the phone can reach an acceptably bright level (200 nits), how bright the display can get, and whether or not those numbers change in the presence of ambient light. 200 nits is the minimum acceptable level for viewing in a well-lit room, so the lower the brightness setting (as a percentage) needed to reach it, the better. For peak brightness, the higher the number goes, the better you’ll be able to see your screen in direct sunlight… though anything over 600 nits will probably sear your retinas a little if you view it in the wrong situation.
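As a rough illustration of why a lower slider percentage at 200 nits is better, suppose (purely for this sketch) that a panel’s luminance follows a simple power curve of its brightness slider; a brighter panel then hits 200 nits with plenty of headroom to spare:

```python
def slider_percent_for_nits(target_nits, max_nits, gamma=2.2):
    """Estimate the brightness-slider position needed to hit
    target_nits, assuming luminance = max_nits * (slider/100)**gamma.
    Returns None if the display simply can't get that bright."""
    if target_nits > max_nits:
        return None
    return 100.0 * (target_nits / max_nits) ** (1.0 / gamma)

print(slider_percent_for_nits(200, 600))  # 600-nit panel: ~61% slider
print(slider_percent_for_nits(200, 300))  # 300-nit panel: ~83% slider
```

The power-curve assumption is a simplification; real panels are measured directly rather than modeled.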
This one’s sticky, but no less necessary. What we think of as white light isn’t the same coming from every display. Our eyes adjust to different lighting situations, and the way we measure how “warm” or “cool” light appears is in kelvins. That’s not very intuitive, we know, but all you need to be aware of is that normal daylight color temperature is 6500 K, otherwise referred to as “D65.”
What we think of as white light isn't the same coming from every display
If a display has a color temperature lower than D65, it’ll look a little yellow/orange compared to what it should be, and if it has a higher color temperature, it’ll look a little bluish. While some variation is perfectly fine, excessive color temperature errors will make your image look a bit strange.
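Measured chromaticity can be converted into an approximate color temperature using McCamy’s well-known formula; this is a simplified stand-in for what measurement software like Calman reports:

```python
def mccamy_cct(x, y):
    """Approximate correlated color temperature (in K) from CIE 1931
    (x, y) chromaticity, using McCamy's cubic approximation."""
    n = (x - 0.3320) / (0.1858 - y)
    return 449.0 * n**3 + 3525.0 * n**2 + 6823.3 * n + 5520.33

print(round(mccamy_cct(0.3127, 0.3290)))  # D65 white point: ~6505 K
print(round(mccamy_cct(0.3457, 0.3585)))  # D50: ~5001 K, visibly warmer
```

A display whose white lands near the second point would read as noticeably warm next to a D65-accurate one.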
Gamma simply refers to how well a screen is able to display all of the brightness values from black to white. In our tests, we sample known brightness values with calibration files supplied by Calman, and plot the resulting curve. Though there is no single standard for gamma in the display industry, we look for a slope between 2.1 and 2.2, as that’s the range preferred for movies. If that number goes too high, meaning the transition between gray values is too abrupt, you may start to see banding or posterization in your video. If the number is too low, you’ll see a lack of contrast.
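Here’s a minimal sketch of how a single gamma figure can be pulled out of measured brightness samples, assuming an ideal power-law display (our actual workflow relies on Calman’s analysis, not this snippet):

```python
import math

def fit_gamma(levels, luminances):
    """Least-squares slope of log(luminance) against log(input level).
    For an ideal display, luminance = peak * level**gamma, so the
    slope of the log-log line is the gamma."""
    xs = [math.log(lv) for lv in levels]
    ys = [math.log(lum) for lum in luminances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Simulated readings from a well-behaved 400-nit panel with gamma 2.2
levels = [0.1, 0.25, 0.5, 0.75, 1.0]
measured = [400.0 * lv**2.2 for lv in levels]
print(round(fit_gamma(levels, measured), 2))  # 2.2, right in range
```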
While there are many ways to skin a cat, there are relatively few ways to test audio. We know quite a bit about how sound behaves, and, like light, it can be objectively tested.
Given that we don’t have an anechoic chamber, we have to test speaker loudness in a fairly basic manner. Essentially, we get a sensitive decibel meter and set up shop in our extremely quiet testing lab. Placing the meter one meter away from the phone, we then record the maximum output of the phone’s speaker. While this doesn’t tell us anything about the speaker’s quality, it does tell us how well the speakers are able to project sound in a standard environment.
We can see how much distortion and added noise a phone adds to the audio signal
A bit on the technical side, this is also a fairly pedestrian test. Using a test file pre-loaded onto the phone, we measure the electrical output of the phone’s headphone jack (assuming it has one) with a multimeter, and record the result. While most people may not really care about this number, hardcore audiophiles will find it useful, because it tells them whether they need an amplifier or other equipment to get the most out of their listening.
Using a program called Rightmark Audio Analyzer, we can use another test file to see how much distortion and added noise the phone adds to the audio signal. The process is very similar to the output test, but instead of using a multimeter, we use an interface to record the output of the phone, and compare it to the test file. The software then handles the rest, and gives us our data.
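The underlying idea can be sketched like this: feed in a known tone, notch the tone’s fundamental out of the recorded spectrum, and compare whatever remains (distortion plus noise) against the tone itself. This toy version runs on a synthetic signal and is not RightMark’s actual algorithm:

```python
import numpy as np

def thd_plus_n_db(signal, sample_rate, fundamental_hz):
    """Crude THD+N: energy outside a narrow band around the
    fundamental, relative to the fundamental's energy, in dB."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)
    fund = np.abs(freqs - fundamental_hz) < 20  # +/- 20 Hz around tone
    fund_power = np.sum(spectrum[fund] ** 2)
    resid_power = np.sum(spectrum[~fund] ** 2)
    return 10.0 * np.log10(resid_power / fund_power)

# One second of a 1 kHz tone with a harmonic 40 dB down at 3 kHz
fs = 48000
t = np.arange(fs) / fs
dirty = np.sin(2 * np.pi * 1000 * t) + 0.01 * np.sin(2 * np.pi * 3000 * t)
print(thd_plus_n_db(dirty, fs, 1000))  # lands near -40 dB
```

In the real test, `signal` would be the recorded output of the phone rather than a generated waveform.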
Not all headphone jacks are created equal, and sometimes they alter the way music sounds. Sometimes. Not often.
By using the data we collected from the noise and distortion tests, we can piece together how a headphone jack emphasizes all audible music notes. Ideally, we’d like to see the phone treat every note equally, but sometimes that doesn’t happen. A little variation is fine, but anything over 5dB will be somewhat noticeable… and anything over 10dB will definitely be noticeable.
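Those 5dB and 10dB thresholds boil down to a trivial calculation once you have per-frequency output levels; the numbers below are made up for illustration:

```python
def response_deviation_db(band_levels_db):
    """Worst-case deviation from the average output level across
    the measured frequency bands, in dB. A perfectly flat
    (transparent) jack would score 0 dB."""
    avg = sum(band_levels_db) / len(band_levels_db)
    return max(abs(level - avg) for level in band_levels_db)

# Hypothetical jack outputs at a handful of test frequencies (dB)
flat = [0.0, -0.2, 0.1, -0.1, 0.2]   # effectively transparent
bassy = [6.0, 3.0, 0.0, -1.0, -6.0]  # boosted lows, rolled-off highs
print(response_deviation_db(flat))   # far under 5 dB: inaudible
print(response_deviation_db(bassy))  # over 5 dB: noticeable coloring
```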
Lordy, do we put the time in on this one. Nobody uses their smartphone in exactly the same way as anyone else, so we test for several different use cases. There’s no perfect way to test battery life, but we force each phone through a gauntlet of tests designed to cover many of the most common tasks performed by a mobile device.
This test is pretty much exactly what you might expect. We have a custom-built app that simulates normal web browsing behavior, repeating itself over and over again. The app visits common websites, scrolls through pages, and loads common objects like short video clips, images, and plenty of text. As with all of our battery tests, this is performed with the screen set to 200 nits, to make sure that all results are an apples-to-apples assessment.
Our custom battery testing app runs for 90 minutes and then estimates the total battery life based on the amount drained
Much like our WiFi browsing test, this test uses our custom-built app to simulate normal gaming behavior on an endless loop. We install a certain game that uses the Unreal engine, and let it go for as long as it can before giving up the ghost. This is our most brutal test, so don’t be surprised if the numbers are a bit low compared to what you might expect. Like all our other battery tests, the screen brightness is locked at 200 nits so that variable can’t skew the results.
In order to test how long an Android phone can play back video, we do the one thing we can do in this situation: play video!
We set the screen’s brightness to 200 nits and play back a standard video file we have pre-loaded onto the phone. The file loops until the battery runs out, and we record the time of death.
Our testing app also has a built-in quick test that shuffles between all three of these workloads for 90 minutes and then estimates total battery life based on the amount drained. This allows us to partially simulate real-world usage, although it’s worth noting that battery life varies dramatically according to your usage.
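The extrapolation itself is simple arithmetic: assuming the drain rate stays roughly linear (a simplification, since real-world drain isn’t perfectly linear), scale the 90-minute result up to a full charge:

```python
def estimated_battery_hours(start_pct, end_pct, test_minutes=90):
    """Extrapolate total battery life from a partial drain test,
    assuming the drain rate stays constant."""
    drained = start_pct - end_pct
    if drained <= 0:
        raise ValueError("battery did not drain during the test")
    return test_minutes * (100.0 / drained) / 60.0

# Dropping from 100% to 85% over the 90-minute cycle
print(estimated_battery_hours(100, 85))  # 10.0 hours estimated
```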
Adequately kicking the proverbial tires on a processor is a much tougher task than you might think. Reducing everything these chips can do to a handful of numbers isn’t very useful on its own, because a pile of hard-to-contextualize figures doesn’t help much, now does it? Every test out there has its own philosophy of how to score and what should be valued, so it’s common for phones to bomb one test but crush another.
That’s why we use a huge battery of processor tests in order to see how the chip handles the load in lots of different situations. We use all off-the-shelf tests that have been used to qualify processor performance for years, including GFXBench, Geekbench 4, 3DMark, Vellamo, AnTuTu, and Basemark OS. While no single benchmark tells us everything about a processor, using several at once allows us to gain a deeper understanding of the nuances of each phone’s processing package.
It can be tempting to just look at the numerical score and stop thinking, but it’s very rarely the case that a single number tells you everything you need to know about something.
First: there’s a fairly narrow range of things that humans can actually perceive. If you can’t actually tell the difference between two measurements, are they effectively any different? Our position is “no, they’re not.” While that’s not what hardcore enthusiasts want to hear, it would be wildly unfair to tip the scales based on metrics that might not be a big deal for most people. For example, if someone partnered with a phone company to put a DSLR sensor in a handset, it would absolutely obliterate all other cameras in our testing, enough to overcome the disparity in things like features, screen performance, and other metrics. Not really fair if it does nothing else well, right?
Some data we collect is also actively the target of manufacturer malfeasance, as many benchmarks can be gamed. That’s why we run so many tests at once: so any outliers in performance will be normalized over a large sample size.
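One simple way to stop a single gamed benchmark from dominating a composite score is to normalize each test against the best result seen, then take the median across tests rather than the mean. This is an illustrative sketch with made-up numbers, not our actual weighting:

```python
import statistics

def composite_scores(results_by_benchmark):
    """Normalize each benchmark to the best phone on that benchmark,
    then take each phone's median normalized score, so one inflated
    outlier can't drag the whole composite around."""
    per_phone = {}
    for results in results_by_benchmark.values():
        best = max(results.values())
        for phone, score in results.items():
            per_phone.setdefault(phone, []).append(score / best)
    return {p: statistics.median(s) for p, s in per_phone.items()}

results = {
    "bench_a": {"phone_x": 100, "phone_y": 98},
    "bench_b": {"phone_x": 100, "phone_y": 97},
    "bench_c": {"phone_x": 100, "phone_y": 300},  # y likely gamed this
}
print(composite_scores(results))  # x keeps a 1.0 median despite bench_c
```

Had we averaged instead, phone_x’s score would be dragged down by phone_y’s suspicious bench_c result.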
Read the reviews! Our trained experts will arm you with all the knowledge you need
Despite our best efforts to avoid publishing misleading graphs, sometimes our data can be tough to read and send the wrong signals. For this reason, we implore you to read the reviews. We have a wonderful staff of trained experts to contextualize the data and arm you with enough knowledge to make the best purchasing decision for you.
At Android Authority, we take great pride in our testing, but we are human. Sometimes we receive dud units, other times we get errant data, and, rarer still, we simply screw up. It happens. Nobody’s immune, and even the best of us make mistakes from time to time.
We guard against this by using as many standardized solutions as we can, and by running tests over and over to make sure our results are repeatable. If we need to admit something went awry, we’ll issue an update post. It’s very important to us that you get the information you need as a consumer, and if something is keeping us from getting it to you, that’s a problem.
When we don’t get everything right the first time, we won’t just leave incorrect data out there for all to see. That’s why we keep an open line of dialogue with our readers, as well as with manufacturers, to see whether our experience with a device mirrors that of others. If we deem it necessary, we’ll retest with a new unit and publish our findings.