Benchmarking in Games: Built-in Vs. Actual Gameplay

By Gary Navat

Introduction

Benchmarking is a test, or a series of tests, that assesses the capability of a component or a whole computer. It is also a way to see how well a product performs against others. For PC enthusiasts and some gamers, benchmark results are crucial because they affect buying decisions. But despite the many benchmarking programs available, free or commercial, how can a tester or reader be sure that the results are reliable and give a trustworthy assessment?

Some games come with a built-in benchmarking tool, usually based on the 3D engine used in the actual game. Apart from these tools, a tester can also assess a game’s performance during actual gameplay. Both are measured as a frame rate, in frames per second (FPS): the number of frames drawn onto the monitor each second. Think of a film reel: it holds a series of still images (called frames) that create an illusion of motion when played back. The higher the frame rate, the smoother the playback you see on the screen.
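As a rough illustration of what an average FPS figure means, the sketch below divides the number of frames drawn by the time elapsed. The function name and the timestamps are hypothetical and are not taken from any tool used in this article.

```python
def average_fps(frame_times_ms):
    """Average FPS from cumulative frame timestamps given in milliseconds."""
    if len(frame_times_ms) < 2:
        raise ValueError("need at least two timestamps")
    elapsed_s = (frame_times_ms[-1] - frame_times_ms[0]) / 1000.0
    if elapsed_s <= 0:
        raise ValueError("timestamps must increase")
    # N timestamps mark the boundaries of N - 1 frames.
    frames_drawn = len(frame_times_ms) - 1
    return frames_drawn / elapsed_s


# Hypothetical data: one frame every 20 ms for one second.
timestamps = [i * 20.0 for i in range(51)]
print(average_fps(timestamps))
```

A frame every 20 ms works out to 50 frames in one second, so shorter gaps between frames mean a higher FPS.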

Today we will try to answer whether we should trust these benchmarking tools or test with actual gameplay instead.

System Setup and Methodology

Hardware

Processor: AMD Athlon X2 5000+ 2.6GHz Dual-core
Motherboard: ASUS M2N68-CM
Memory: Kingston 2GB DDR2-800
Hard Drive: Seagate 500GB hard disk
Video Card: Sparkle nVidia 9600GT DDR3-512MB

Software

Operating System: Microsoft Windows 7 x64 with Service Pack 1
Benchmarking: FRAPS 3.2
Video Card Driver: nVidia Forceware 270.61

Games

Half-Life 2 Lost Coast
Company of Heroes
Far Cry 2
HAWX
Street Fighter 4
Just Cause 2 Demo
Dirt 2 Demo

NOTES

– In this article, we compare each game’s built-in benchmarking tool with the performance of the actual game. A very high-end system was not necessary because we are not testing the games’ playability. For relatively demanding games, we set the quality to medium or custom to keep any single component from holding back performance (the bottleneck effect).

– As much as possible, we tried to replicate how the built-in benchmark performed the test, including the map, the level, and even the duration of the run. We used FRAPS to record the scores and took the average of each run. The length of the actual-gameplay run varied with the duration of the built-in benchmark.

– From this point on, I will write FPS only for “frames per second”, as the term is used frequently in this article. When I mean “first-person shooter”, I will write the whole term.
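The averaging step described in the notes above can be sketched as follows. This is only an illustration under stated assumptions: the frame counts are hypothetical, and the helper names are made up for this example rather than taken from FRAPS.

```python
from statistics import mean


def run_average_fps(frames, seconds):
    """Average FPS of a single run: frames rendered divided by run length."""
    return frames / seconds


def summarize_runs(runs):
    """Return the per-run averages and their mean for equal-length runs."""
    per_run = [run_average_fps(frames, seconds) for frames, seconds in runs]
    return per_run, mean(per_run)


# Hypothetical data: five 60-second runs as (frames rendered, seconds).
runs = [(3000, 60), (3120, 60), (2940, 60), (3060, 60), (2880, 60)]
per_run, overall = summarize_runs(runs)
print(per_run, overall)
```

Repeating a run several times and averaging the results reduces the effect of one unusually fast or slow run.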

RESULTS

Half-Life 2 Lost Coast

Settings: 1024×768, All High, 6x MSAA, 16x AF
Built-in: 2x, 286 seconds total duration
Actual: Chapter 1
5x 60-second FRAPS benchmark, 300 seconds total duration

Half-Life 2’s built-in benchmark is a fly-over of the first chapter that displays the game’s visual quality and environment, but it shows no gunfights and only a single explosion.

Half Life 2

The actual gameplay’s FPS is lower than the built-in benchmark’s because the actual gameplay has a lot of gunfights and explosions. The built-in benchmark could have been more accurate if it had shown actual gameplay.

Company of Heroes

Settings: 1024×768, All High, Antialiasing enabled
Built-in: 2x, 290 seconds total duration
Actual: Skirmish 2v2, McGechaen’s War map
Single 5-minute FRAPS benchmark, 300 seconds total duration

The built-in benchmark is a short movie showing airplanes and paratroopers, with lots of gunfights and explosions. The scene is set at night.

Company of Heroes

The actual gameplay’s FPS is only half of the built-in benchmark’s. Although the built-in benchmark showed explosions and gunfights, it didn’t reflect the actual gameplay’s performance.
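One way to put a number on a gap like this is to express the difference as a fraction of the built-in result. The sketch below is illustrative only: the function name is made up and the FPS values are hypothetical, not the measured Company of Heroes results.

```python
def relative_gap(built_in_fps, actual_fps):
    """Fraction by which actual gameplay falls short of the built-in result."""
    if built_in_fps <= 0:
        raise ValueError("built-in FPS must be positive")
    return (built_in_fps - actual_fps) / built_in_fps


# Hypothetical values: actual gameplay at half the built-in frame rate.
gap = relative_gap(80.0, 40.0)
print(f"{gap:.0%} lower than the built-in benchmark")
```

A gap of 0 would mean the built-in benchmark matches actual gameplay, while a negative value would mean actual gameplay ran faster.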

Far Cry 2

Settings: 1024×768, Medium render quality, no HDR, bloom, or antialiasing
Low fire, physics, and real trees
Built-in: Ranch medium – 2x, 300 seconds total duration
Action scene – 4x, 112 seconds total duration
Actual: Chapter 1 (4% of the campaign)
Single 2-minute FRAPS benchmark, 120 seconds total duration

The game’s benchmarking tool lets you choose from a variety of tests: either a fly-through of the game’s environment or an action scene that imitates the gunfights of actual gameplay.

Far Cry 2

The Ranch Medium benchmark’s FPS is much higher than both the built-in action scene’s and the actual gameplay’s, naturally, because it is a fly-over type of benchmark. The action scene’s FPS, on the other hand, is closer to the actual gameplay’s; a benchmark like this is more accurate than the fly-over type.

HAWX

Settings: 1024×768, Custom quality, 4x Antialiasing, DirectX 10
Built-in: 2x, 260 seconds total duration
Actual: Free flight – single 2-minute FRAPS benchmark (120 seconds)
Dogfight – 3x 2-minute FRAPS benchmarks (180 seconds)
Operation: Glass Hammer

HAWX’s benchmarking tool is divided into two parts. The first is a fly-over of Rio de Janeiro and the second is an actual scene from the game. The two tests are combined into a single result at the end of the benchmark. We ran the actual-gameplay benchmark the same way, and we also added the individual average results of the free flight and dogfight tests to show their differences.

HAWX

The free flight result stands out for its high FPS, while the actual dogfight and the built-in benchmark have almost the same results. Combining the dogfight and the free flight produced a higher result, moving it farther from the built-in benchmark, which is also a combination of free flight and a scene from the actual game. It is odd that the two combined results differ. This shows once again the inaccuracy of the fly-over type of test.
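A combined figure depends on how much time each segment contributes. We do not know how HAWX weights its two parts, so the sketch below is only an assumption-laden illustration: the FPS values are hypothetical, while the 120-second and 180-second durations come from the settings listed above.

```python
def combined_fps(segments):
    """Time-weighted average FPS: total frames divided by total seconds.

    segments is a list of (average_fps, seconds) pairs.
    """
    total_frames = sum(fps * seconds for fps, seconds in segments)
    total_seconds = sum(seconds for _, seconds in segments)
    return total_frames / total_seconds


# Hypothetical FPS values; durations taken from the test settings.
free_flight = (90.0, 120)
dogfight = (40.0, 180)

weighted = combined_fps([free_flight, dogfight])
simple = (free_flight[0] + dogfight[0]) / 2
print(weighted, simple)
```

With these made-up numbers the time-weighted result is 60 FPS and the simple mean is 65 FPS, so two combinations of a fast fly-over and a slower action scene can disagree purely because the segments are weighted differently.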

Street Fighter 4

Settings: 1024×768, Highest quality, C16xQ Antialiasing, Extra touch – Off
Built-in: 2x, 440 seconds total duration
Actual: 8x varied-duration FRAPS benchmark, 323 seconds total duration

The built-in benchmarking tool runs three consecutive rounds of fighting, each with different characters and a different map, and the last part shows selected characters with the camera circling around them.

Street Fighter 4

Street Fighter 4’s benchmarking tool is the most accurate so far. We ran four rounds of fighting with the same characters and map. But there is something odd in the actual game at lower quality settings: the FPS won’t go past 60, which makes it look as if the FPS is locked, while the built-in benchmark’s FPS soars up to 80. Setting the game to the highest quality resolved this and gave similar results.

Just Cause 2

Settings: 1024×768, Custom quality, 4x Antialiasing
Built-in: 2x, 240 seconds total duration
Actual: 2x 2-minute FRAPS benchmark, 240 seconds total duration

Just Cause 2’s benchmarking tool is another fly-over of the game’s environment, with a day and night cycle.

Just Cause 2

As expected, the built-in benchmark’s FPS is higher than the actual gameplay’s and doesn’t reflect the performance you get when you play the game.

Dirt 2

Settings: 1024×768, Custom quality, 4x MSAA
Built-in: 2x, 460 seconds total duration
Actual: 4x 2-minute FRAPS benchmark, 480 seconds total duration

Dirt 2’s benchmarking utility portrays actual gameplay: a racing event with AI cars on a map available in the game. The problem is that the demo only allowed us to play the same map (Morocco) in a different mode (Trailblazer), in which you race against time, not against AI cars. In this mode you see only the car you are driving, which is very different from the benchmarking tool. To imitate the benchmarking tool, we also ran another map (Baja) on which you race against AI cars, resulting in more dust, bumps, and intense driving.

Dirt 2

Running the individual maps gave different results, but they are fairly close to the built-in benchmark’s. Combining the results from the different maps gave exactly the same result as the built-in benchmark. This means that the built-in result is what you will experience in the actual game.

CONCLUSION

Based on all of the results, built-in fly-over benchmarks don’t reflect the actual game’s performance because they have little or no interactivity. This is also why they give higher FPS results than actual gameplay.

Benchmarking tools that depict actual gameplay usually give results closer to actual gameplay and are more accurate than the fly-over type. Particles, the number of objects, the day and night cycle, the artificial intelligence (AI), and even the hardware drivers and operating system all affect a game’s performance. A different map or scenario may also affect it.

Still, if you really want to know how a game will perform on a specific system, you have to play the game. This is also why we appeal to game developers and publishers to release a playable demo before the release of the actual game. It is a great way to help the gaming community. If you base your buying decisions on reviews, be careful and pay close attention to the details, especially the methodology. Otherwise you might be surprised or disappointed when you actually play the game.

Site founder and gaming hardware enthusiast.

1 Comment
  1. This is very interesting, especially the Company of Heroes comparison. Looks like they really want to make it look playable and have chosen a benchmark scene accordingly.
