How GameBench makes its uncheatable benchmark

December 9, 2013

Companies like Samsung, LG, HTC and many others spend millions of dollars designing and manufacturing Android smartphones. These investments are made based on the projected sales of these devices. For all of the big names, this cycle of design, build and sell is continuous. The same development cycle applies to the System-on-a-Chip (SoC) makers, including Qualcomm, Samsung, NVIDIA, MediaTek and so on. All these companies have a vested interest in the performance of their products because performance (along with features and other quantifiable factors like power consumption) heavily influences sales.

For example, Samsung has a very popular range of Galaxy devices, probably the best known being the ‘S’ range. The Galaxy S3 was hugely popular, the S4 did well (though not quite as well as expected), and there are lots of rumors about the next iteration, the Galaxy S5. The problem is that companies like Samsung aren’t above a little cheating to boost the sales of their devices.

This year Samsung has been accused of tweaking the Android firmware on its devices to detect benchmark apps like AnTuTu or Quadrant and ensure that the devices run at maximum performance (and with the worst battery life) while running these tests. This means that the benchmark results were artificially boosted.
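To make the accusation concrete, here is a minimal, hypothetical Python sketch of how name-based benchmark detection could work: the firmware compares the foreground app's package name against a hard-coded list and picks a CPU policy accordingly. The package names, the `choose_cpu_policy` function and the policy labels are all invented for illustration; they are not taken from any real firmware.

```python
# Hypothetical illustration only: the package names and policy labels below are
# placeholders, not values taken from any manufacturer's firmware.
BENCHMARK_PACKAGES = frozenset({
    "com.example.antutu",
    "com.example.quadrant",
})


def choose_cpu_policy(foreground_package: str) -> str:
    """Pick a CPU frequency policy based on which app is in the foreground."""
    if foreground_package in BENCHMARK_PACKAGES:
        # Known benchmark detected: run flat out and ignore the battery cost.
        return "max_performance"
    # Any other app, including real games, gets normal frequency scaling.
    return "balanced"


print(choose_cpu_policy("com.example.quadrant"))    # max_performance
print(choose_cpu_policy("com.example.racinggame"))  # balanced
```

The sketch also shows why testing with real games is harder to game: an ordinary game never matches a fixed list, so it runs under the same policy a user actually gets.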

Enter stage left GameBench, a new startup company that wants to change benchmarking with an uncheatable benchmark that measures “real world performance.” Since our initial post about GameBench, we have been in contact with the company to find out more about how they plan to make these results reflect real world performance.

The first big difference about GameBench is that it uses real-world games, not synthetic tests. Since 70% of app revenue comes from games and nearly one third of the time users spend on their devices is spent playing games, it makes sense to measure how well a device plays those games, not how well it runs an artificial benchmark. GameBench picks games from multiple genres, such as first-person shooters (FPS), racing games and running games. The list of games is secret and changes with the market. The games are downloaded from within the GameBench app, which runs non-invasively in the background, and the devices are tested by several game testers (amateur, intermediate and power gamers). To make sure that none of the big companies is influencing the testers, the testers are also changed on a regular basis.

During the benchmarking phase, the devices are tested for both performance and battery life. If a manufacturer did somehow manage to artificially boost performance during a test, it would show up as reduced battery life, which would dent the overall score.

The devices are used as they come out-of-the-box. There is no rooting needed and the tests are even performed at a controlled temperature to ensure that the battery life testing is completely fair.

The results of the testing have two purposes. First, GameBench works with app developers, SoC makers and device manufacturers to help them improve overall performance, without revealing which games are used. The benchmarking collects lots of data that is useful for developers and can help them squeeze out an extra few percentage points of performance or reduce battery consumption.

Second, the company publishes an overall score, which is ranked on a curve and takes into account several factors, not just the raw frame rate. The first official scores released by the company compare the Snapdragon 600 versions of the Galaxy S4 and the HTC One, along with the Lenovo K900. The S4 won with an overall score of 3696, while the HTC One scored 2840. That means that in the real world, taking into account performance and battery life, the S4 is 30 percent better at playing games than the HTC One!
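The 30 percent figure follows directly from the two published scores. The short Python sketch below reproduces it, assuming the overall scores can be compared as simple ratios; the article says the score is ranked on a curve, so a percentage gap in score may not map one-to-one onto a difference in playing experience. The `relative_advantage` helper is a name introduced here for illustration.

```python
def relative_advantage(score: int, baseline: int) -> float:
    """Fractional lead of `score` over `baseline` (0.30 means 30 percent)."""
    return (score - baseline) / baseline


# Overall scores as published in the article.
S4, HTC_ONE, K900 = 3696, 2840, 264

print(f"S4 over HTC One: {relative_advantage(S4, HTC_ONE):.1%}")  # 30.1%
print(f"S4 vs K900 ratio: {S4 / K900:.1f}x")  # 14.0x
```

On the same simple-ratio assumption, the Lenovo K900's 264 is exactly one fourteenth of the S4's score.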

Interestingly, the Lenovo K900, which uses a dual-core 2GHz Intel chip, scores just 264. Ouch!

Comments

  • Salaried tips

    Stupid. Just have a test that won’t start until the CPU idles.

    • Jason Yuen

      how does that help?

    • joser116

      A test that temporarily kills all the other processes but the benchmark’s and runs all benchmarks at a uniform resolution.

  • APai

    cool, zigackly what i was looking for!!! now that I have this, I can make up my mind to decide which one sounds like the best device!

    I’ll buy one based on my needs and budget, but wth, what’s more important is which one I’d like to like.

    /s

    • Xarus

      If you like to play games on your phone – would this not help you pick one?

      • APai

        At the end of the day, who would really buy a phone based on benchmarks alone? Hairsplitting used to be a popular sport on the PC front. I’ve followed it for a number of years. So many fps, and other stuff, but at the end of the day an acceptable package is what we look at. On the PC, at least, you could design your own rig. Don’t know, I view it as a silly exercise.

  • Roberto Tomás

    this test kinda sucks. it favors select game developers, who will generally favor certain optimizations and therefore certain architectures.

    • Xarus

      How does it favour select developers? From what I read, if the game plays well (read: not laggy or with crazy drain on the battery) – it gets a good score.

      • Roberto Tomás

        it says “GameBench picks multiple game genres such as first-person-shooter (FPS), racing games and running games. The list of games is a secret and changes with the market. ”

        This selection routine favors some developers, in the sense that it is their games that are chosen, and eventually, if not immediately, they will benefit from media coverage. The games chosen are also secret, meaning all sorts of shady deals can happen and no one would know. That isn’t necessarily so bad: if Apple published its own synthetic benchmarks, they could still be useful even though they would obviously be biased. Intel is another example: they actually do that. But pre-selecting the games has a side effect: those developers will each only optimize for certain architectures.

        • Xarus

          I don’t know what developers optimising for a certain architecture has to do with anything. This isn’t like a normal benchmark with a synthetic test: either the popular apps run well on a given device or they don’t.

          • Roberto Tomás

            Yes, but the benchmark isn’t designed to benchmark game optimization; it is designed to benchmark GPUs.

        • http://www.garysims.co.uk garysims

          Roberto, I think you are missing the point. The benchmark isn’t to test the games it is to test the devices. It doesn’t matter if the game is optimized via help from GameBench or not, the point is the device is under test not the game. If the developer does optimize their app then everyone wins as the optimization will work across all devices.

          Also the games aren’t selected in conjunction with the developers and they won’t benefit from media coverage as the game list is not disclosed. So no developers are favored over others.

          • Roberto Tomás

            garysims, your response disappoints me. You usually know better, and I mean that; I enjoy reading what you write. My comment was off a bit too, though, so we can forgive each other.

            I was assuming that the benchmark was meant to compare GPUs, or to give a general comparison of gaming performance on devices. It is not, nor does the website claim that. As you point out, it is meant to give a “game metric” comparison of device performance. However, I stick with my initial feeling: as a GPU benchmark, this is going to suck. But if you want to know how a device might perform playing games that are kept on a secret list that you don’t have, this is the benchmark for you.

            To respond to your comment: of course the benchmark tests the devices (mostly the GPUs), not the games. Of course the games can’t be optimized for GameBench, and it is hard to imagine why anyone would care to do that. Why are you creating strawman arguments? Didn’t you read what I wrote before responding? Please go back and try to understand it better, at least if you had a real point in your comment.

          • http://www.garysims.co.uk garysims

            Roberto,

            I think the key to understanding GameBench’s idea is that they want a real-world metric, not an artificial one. Yes, there are lots of pure GPU benchmarks out there that put the GPU through fixed tests with shading, anti-aliasing, etc., and give you a score. Sure. But that score doesn’t necessarily reflect real life.

            The idea of using a selection of games is that these games, both 2D and 3D, are representative of what people are actually playing and how well any device will play those games.

            As time moves on and new games come out then the list will be revised to include the most common games. This time next year the most popular games will be different to those of this year.

            While GPU benchmarks measure pure speed, the GameBench system also measures battery life. Which would I prefer: a device that can play the latest game at 60fps for 2 hours, or a device that can play the same game at 45fps for 3 hours? I would pick the second one. GPU benchmarks stress the GPU but ignore the impact that has on the battery.

            Also, GameBench will release its app on the Play Store, so anyone can download it and test games to see which devices handle game A or game B better. The only thing missing from the released version will be the final score calculation, as that is something only GameBench releases, under lab/test conditions. However, I think other information like FPS and battery usage will be in the app.

            That means that as an artificial benchmark designed to stress the GPU, no, this isn’t the best tool, and there are others that are better. But as a benchmark of what happens in reality, using the games that people actually play, this is a brilliant benchmark.

          • Roberto Tomás

            Hey Gary. You really like the benchmark, then? I’ll say that they could do a pretty decent job of overcoming their design flaws if they tried, but there’s no way I’d trust this over GFXBench yet, nor even after a year. Eventually, with a continued record of successfully not deviating much from trusted benchmarks, I might hold it as just as useful. The design flaws are pretty steep, is all.

            • the list does not attempt to target games by which GPUs they optimize for, meaning some GPUs that are actually more muscular may appear just as anemic as the most common varieties.

            • the list is secret, so we have no way to know how good a job they did at selecting common games or at covering the GPU space. No way to know when it is artificially favoring games made by big studios that mostly design for kids when some of us might play different games (or the reverse), no way to know when it favors GPUs that support OpenGL 2.1 versus 2.0, etc.

            • the battery life thing is a good point, and I don’t imagine they will do poorly at it. But a lot of online reviews already do a great job of describing battery life.

            I honestly could not disagree more with your final paragraph. This will not tell you how the device performs in reality, or at least it has design flaws that would definitely make me wait years before including it in any reviews. Unless, that is, your reality happens to match their expectation of what games you play, which is something they keep secret by design.

          • http://www.garysims.co.uk garysims

            Roberto,

            I guess we are going to have to agree to disagree! But I can’t close without saying a couple more things!!!

            :-)

            Say I buy a car and the manufacturer says it can do X miles to the gallon (or liters/100 km, whatever). Under artificial conditions, they are right: on a test track, going around and around, that will be the consumption. But I don’t drive my car around a test track; I drive in town, on the freeway, etc.

            So the manufacturer benchmark is like GFXBench etc. The GameBench benchmark is the report from a guy who drives the car for a month and tells me how much he actually consumed. See the difference?

            When people buy cars, they may read reviews in motoring magazines from reviewers who drove the car for a longer period of time and reported the actual real-world results. That is what GameBench does. I would trust the review in a magazine, so why wouldn’t I trust GameBench’s results?

            As for your GPU optimization comments, I still don’t fully understand what you are getting at. The games are downloaded from the Play Store and are the exact same version that a normal user would get when they hit the download button on any particular device. If the Play Store does deliver an optimized version of the game to a specific device, it will be the same for the GameBench test.

            With regards to the list of games, I think that if you and I are able to raise questions about selecting a range of games that cover real world usages, I am sure they can!

            Just to conclude, the key is the difference between an artificial test and a real-world test.

  • Android Developer

    Why not provide a tool that, on first launch, downloads a benchmark app unique to that install, so that no company could use the “if app X is running, then boost the speed” trick?
    Each downloaded app would have a different package name (the unique identifier of apps on Android), so it would bypass the special list that the cheating mechanism relies on.

    Of course, this only tackles one way to cheat. I wonder what cheaters would do to overcome this.

    • Xarus

      From the Engadget article a couple of weeks ago, it looks like early 2014.

      • Android Developer

        I see, thanks. Do you perhaps know the answer to the other question I asked?

        • Xarus

          the ‘why not providing a tool that lets…’ question? ‘fraid not :(

          • Android Developer

            Yes. I also wonder what cheaters would do after such an act.
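The Android Developer's suggestion of a unique package name per download can be sketched in a few lines of Python. This is a hypothetical illustration: `unique_package_name`, `is_valid_package_name` and the base name `com.example.bench` are invented here, and the validity check is a simplified version of Android's package-name rules. A real tool would also have to rebuild and re-sign the APK under the new name, which this sketch does not attempt.

```python
import re
import secrets
import string

# Hypothetical base name for the benchmark app; not a real package.
BASE_PACKAGE = "com.example.bench"

# Simplified Android package-name rule: at least two dot-separated segments,
# each starting with a letter and containing only letters, digits or underscores.
_PACKAGE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)+$")


def is_valid_package_name(name: str) -> bool:
    """Check a name against the simplified package-name rule above."""
    return _PACKAGE_RE.fullmatch(name) is not None


def unique_package_name(base: str = BASE_PACKAGE, length: int = 10) -> str:
    """Append a random lowercase segment so each download gets a fresh name."""
    suffix = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
    name = f"{base}.{suffix}"
    if not is_valid_package_name(name):
        raise ValueError(f"invalid package name: {name}")
    return name


print(unique_package_name())  # different on every run
```

Because the random suffix is generated per install, a static list of package names cannot anticipate it. As the thread notes, though, this only defeats name-based detection, not every way to cheat.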