Do you care that Android manufacturers cheat in benchmarks?
On this edition of the Friday Debate, we discuss benchmarks. There’s a bit of a scandal going on in the Android community, following reports that several large manufacturers are artificially boosting their scores in benchmarks. Some say it’s a normal practice, others that it’s downright cheating. Our questions for this week – do benchmarks matter? Do you care that Android OEMs are rigging them? Is the whole issue overblown?
Join us in the discussion, vote in our poll, and sound off in the comments!
This whole benchmark scandal was going to happen. It was just a matter of time. No reviewers really base their final judgment of a device on how well it does in benchmarks, but a lot of people do. Maybe not on the scores themselves, but on the graphs that get shown. People see that the bars for devices like the Note 3 and the HTC One are much, much longer (or taller, depending on the graph) than the competitors’. So people think that the device is proportionately that much better.
Now here’s the kicker. I don’t think that this should be a scandal and I don’t think people should care at all. Even if these OEMs code it into the OS to kick the processors into high gear, it’s not actually cheating. The chipset the device comes with can only do so well, so why not let it go nuts? Obviously, the Snapdragon 800 is better than last year’s quad cores, so what does it matter if they’re both maxed out for their graphics test? It’ll invariably show the same information in the end: that one is better for graphics than the other.
We already know these aren’t good (or even reasonably accurate) representations of the real world performance of these chipsets and devices. However, we do know one thing. If the OS maxes out the CPU for the tests, we know where the ceiling is. If the OS treats an intense graphics benchmark like any other app, then the CPU will adjust accordingly, which knocks off points, and the score ends up somewhere in the middle. With the CPU maxed out, the scores max out, and we can see just how high these devices can go. In either case, we still don’t know the real world performance, but if we let them max out the CPU, we’ll at least know the performance ceiling. So we may actually walk away from graphics benchmarks with a fact to give people, instead of the conjecture we offer now.
So what should the media do? Nothing. Most of us already add the standard “it’s a graphical benchmark which doesn’t show the real world performance of this device” boilerplate in all of our articles. The only difference is that when we do it now, it’ll actually be more accurate than it was last week!
We’ve all known for a while that benchmark scores don’t directly translate into real-world performance, and I can’t say that I’m shocked to hear that some manufacturers have been fiddling around to improve their scores. But at least if every manufacturer is optimizing their components and software when it comes to benchmarking, then the playing field is level and the benchmarks still perform their function.
However, optimization is one thing, but overclocking parts only when running benchmarks verges on misrepresentation of a product. People benchmark different devices with similar hardware because clock speeds, other pieces of hardware, and software optimisations can affect the results. But if you’re testing a secretly overclocked device then all comparisons instantly go out of the window, as these levels of performance cannot be obtained in the final product.
There are good reasons why components are locked at certain frequencies, usually for stability, but also to control power consumption and heat levels. In the Galaxy S4’s case, it’s very doubtful that the GPU would be able to run at 533MHz for a long period of time without draining the battery or generating substantially more heat than at the stock 480MHz clock, and consumers are being misled if they expect to see similar levels of performance and the advertised levels of battery life on their own handsets.
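To put those two clock speeds in perspective, here is a quick back-of-the-envelope sketch in Python. The function name is made up for illustration, and the result only describes the relative jump in clock frequency between the two figures quoted above. It says nothing on its own about how much a benchmark score, the battery drain, or the heat output would change.

```python
def relative_boost(stock_mhz: float, boosted_mhz: float) -> float:
    """Return the clock increase as a fraction of the stock clock.

    Hypothetical helper for illustration only; it compares two
    frequencies and nothing else.
    """
    return (boosted_mhz - stock_mhz) / stock_mhz


# GPU clocks quoted above for the Galaxy S4.
stock = 480.0
boosted = 533.0

# Format the fraction as a percentage with one decimal place.
print(f"Benchmark-only clock boost: {relative_boost(stock, boosted):.1%}")
```

That works out to a clock roughly 11% higher than the one buyers actually get, which is the gap the comparison charts would be quietly hiding.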
Looking at PC hardware for reference, stock and overclocked versions of graphics cards are openly tested and compared, and this lends itself to decent performance comparisons. Attempting to conceal or alter clock speeds based on the testing conditions defeats the point of doing a comparison.
Hopefully this experience will lead to more diligent and accurate testing in the benchmarking media, more scepticism on the part of the consumer, and the rightful calling out of those who attempt to cheat.
It’s a pretty sad move on Samsung’s part, especially since its devices would have apparently outperformed the competition without them gaming the system (just not by as much). If you’re altering the performance just for benchmarking apps then you’re creating a false impression because people will never get that level of performance in the real world. I guess if everyone cheats you’d have a level playing field again, but it would obviously be better if everyone didn’t cheat.
It reminds me of the mileage stats for cars. Manufacturers run them in vacuum conditions that will never be matched in the real world, but because it’s understood everyone does this, the comparison still has some value for consumers, even if it doesn’t reflect real world performance.
I’m not convinced many consumers check up on benchmarks before buying a new phone and you’d have to be really naïve to be shocked by this, but it is still shoddy. Not just putting your hands up and admitting it is actually more annoying. A Samsung spokesman said, “This was not an attempt to exaggerate particular benchmarking results. We remain committed to providing our customers with the best possible user experience.” What was it then? How does a weird decision to optimize for specific benchmarking apps help customers?
I suspect I’m like many of us: easily dazzled by bar charts and graphs. Still, it’s nice to see how high these numbers can go, right? We’ve reached the point where it’s just a matter of further optimizing apps and the Android OS itself. Oh, and perhaps a few manufacturer overlays could be (at last) laid to rest, permanently. Benchmarks are important, certainly, but there are other elements that people legitimately care about, like battery life and real world performance.
If you recall the Lenovo K900 we saw with the world’s first Intel Clover Trail+ SoC, it scored very high in AnTuTu, but the phone’s battery life under those conditions was shockingly poor. My advice – look at the big picture. I highly doubt most consumers look at benchmarks, and I’m not surprised in the slightest that they are ‘gamed’. Technically, they aren’t gamed. Rather, the hardware is pushed to its limit for a short period of time, and it’s documented. This is akin to an engine being properly tuned for a quarter mile speed test. Of course, the logic nerds and semantic specialists could debate this to no end, but we’ve better things to do. Is it right? Arguably not. Do I care? Nope.
First, maybe I’m too intransigent but I don’t think we should give tech companies a free pass to cheat customers. Because no matter how you look at it, that’s what’s going on: tech companies lie to customers so they can make more money. I know that’s how the world works, but that’s not important.
Also not important:
- That “everyone does it” (except, well, not everyone does it)
- That benchmarks are flawed anyway (an inaccurate benchmark is still better than a benchmark that is inaccurate and rigged)
- That the benefits of gaming benchmarks are small (5% can be enough to change a hierarchy, a headline, and ultimately, a buying decision)
- That people don’t care about benchmarks (those few people who do influence many others through their opinions)
- That customers don’t really lose anything (sports fans don’t lose anything when athletes take steroids, but that doesn’t make it right)
While phone makers are blameworthy for doing what they do, what I dislike the most about this whole affair is the patronizing attitude of some members of the Android community. I get it – you knew all along that benchmarks are rigged, kudos to you. But that doesn’t mean we should all just ignore the issue now that it’s in the open.
What do YOU think?
Join us in the comments and vote in our poll.