[Call to Action] Benchmarks for the benchmark server #36

Open
reduz opened this issue Oct 13, 2023 · 3 comments

Comments

@reduz
Member

reduz commented Oct 13, 2023

Creation of benchmarks for the benchmarks server

This document is a compilation of benchmarks suggested by maintainers and contributors from all areas of the engine.

Our goal is to create a limited number of benchmarks that let us track how the optimizations we make to the engine every day are reflected in its performance. They will also help us spot performance regressions sooner.

Each benchmark should be relatively short, and running all the benchmarks should not take very long (an hour at most).

Because creating all these benchmarks is laborious, we are asking the community and contributors for help, so we can organize the creation of all of them (and, of course, suggest more, but we need to ensure the ones listed here are completed first).

For this, we have created a #benchmarks channel in our Rocket Chat (chat.godotengine.org). If you are interested in lending us a hand and implementing the benchmarks from this list, please feel free to join and say hi.

Ultimately the goal is that these benchmarks are contributed to this repository as pull requests.

Benchmark Methodologies

Depending on the type of benchmark, different methodologies need to be used:

  • 🟪Algorithm🟪 Benchmark : This type of benchmark measures how long a piece of code takes to run. The benchmark must call functions to indicate start and stop.
  • 🟥CPU🟥 Benchmark : This type of benchmark measures performance (CPU time in msec) averaged over several frames. The benchmark must indicate the frames at which measurement starts and stops.
  • 🟩GPU🟩 Benchmark : This type of benchmark measures performance (GPU time in msec) averaged over several frames. The benchmark must indicate the frames at which measurement starts and stops.
  • 🟦Startup🟦 Benchmark : This type of benchmark measures something that happens upon startup of the editor or game. The benchmark must erase a running directory, set it up, run, and initiate the benchmark.
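As a rough illustration of the first two methodologies, here is a minimal sketch in Python (not GDScript, and not the benchmark server's actual API). The names `AlgorithmBenchmark`, `start()`, `stop()` and `average_frame_msec()` are hypothetical and exist only for this example: the first times a piece of code between explicit start/stop calls, the second averages per-frame costs between a start frame and a stop frame.

```python
import time
from statistics import mean


class AlgorithmBenchmark:
    """Hypothetical helper: times code between explicit start/stop calls."""

    def __init__(self):
        self._started_at = None
        self.elapsed_msec = None

    def start(self):
        # Mark the beginning of the measured region.
        self._started_at = time.perf_counter()

    def stop(self):
        # Mark the end of the measured region and return the elapsed msec.
        if self._started_at is None:
            raise RuntimeError("stop() called before start()")
        self.elapsed_msec = (time.perf_counter() - self._started_at) * 1000.0
        self._started_at = None
        return self.elapsed_msec


def average_frame_msec(frame_times_msec, start_frame, stop_frame):
    """Average the per-frame cost over frames [start_frame, stop_frame)."""
    window = frame_times_msec[start_frame:stop_frame]
    if not window:
        raise ValueError("empty measurement window")
    return mean(window)


if __name__ == "__main__":
    bench = AlgorithmBenchmark()
    bench.start()
    sum(i * i for i in range(100_000))  # stand-in for the measured code
    print(f"algorithm: {bench.stop():.3f} msec")

    # Made-up frame times: the first two frames (warm-up) and the last one
    # fall outside the measurement window.
    frames = [30.0, 25.0, 4.0, 6.0, 5.0, 40.0]
    print(f"frames: {average_frame_msec(frames, 2, 5):.3f} msec")
```

Choosing the start and stop frames explicitly keeps one-off costs such as scene loading out of the average.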

Benchmark format

Benchmarks are written like this:

  • 🟫TYPE🟫 : [Groups it Belongs]: Name : Description

When a benchmark belongs to groups, the benchmark (graph) will be named after the group and contain all the plots of the individual benchmarks belonging to it, helping to contextualize and compare.

Otherwise, if no group is specified, the benchmark is standalone with a single plot.
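The naming rules above can be sketched as a small parser. This is only an illustration in Python: the regular expression, `parse_entry()` and `build_graphs()` are assumptions made for this example (and it reads the type as plain text such as `CPU`, without the colored markers), not part of the benchmark server.

```python
import re
from collections import defaultdict

# TYPE : [Groups]: Name : Description  (the group part is optional)
ENTRY_RE = re.compile(
    r"^\s*(?P<type>\w+)\s*:?\s*"            # benchmark type, e.g. CPU
    r"(?:\[(?P<groups>[^\]]*)\]\s*:?\s*)?"  # optional [Group, Group]
    r"(?P<name>[^:]+?)\s*:\s*"              # benchmark name
    r"(?P<description>.+)$"                 # free-form description
)


def parse_entry(line):
    """Split one list entry into type, groups, name and description."""
    match = ENTRY_RE.match(line)
    if match is None:
        raise ValueError(f"not a benchmark entry: {line!r}")
    raw_groups = match["groups"] or ""
    return {
        "type": match["type"],
        "groups": [g.strip() for g in raw_groups.split(",") if g.strip()],
        "name": match["name"].strip(),
        "description": match["description"].strip(),
    }


def build_graphs(entries):
    """Map each graph name to the plots it contains.

    Grouped benchmarks become plots on a graph named after the group;
    a benchmark without groups gets its own single-plot graph.
    """
    graphs = defaultdict(list)
    for entry in entries:
        for graph_name in entry["groups"] or [entry["name"]]:
            graphs[graph_name].append(entry["name"])
    return dict(graphs)


if __name__ == "__main__":
    lines = [
        "CPU [Cull]: Static: Cull 10k objects",
        "CPU [Cull]: Rotating: Cull 10k rotating objects",
        "CPU Polygon : Draw complex polygons (1000 points) every frame",
    ]
    print(build_graphs([parse_entry(line) for line in lines]))
```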

List of Benchmarks

These are all the benchmarks that have to be created. As they are submitted, the checkboxes will be ticked.

Core

  • 🟪Algorithm🟪 StringNames: Creating and freeing 1000 StringNames Implement Core benchmarks #40
  • 🟪Algorithm🟪 NodePaths: Creating and freeing 1000 NodePaths Implement Core benchmarks #40
  • 🟪Algorithm🟪 Strings: Create a benchmark that runs all complex search/merge/etc string operations 50 times (already in main).
  • 🟪Algorithm🟪 ConfigFileSave / ConfigFileLoad: Create a ConfigFile full of fields and sections (1000). Benchmark saving and benchmark loading. Implement Core benchmarks #40

Math

  • 🟪Algorithm🟪 QuickHull3D: Run QuickHull 50 times with random point clouds.
  • 🟪Algorithm🟪 Triangulate: Triangulate a circle of 1000 points
  • 🟪Algorithm🟪 Delaunay2D: Delaunay 1000 points in 2D
  • 🟪Algorithm🟪 Delaunay3D: Delaunay 1000 points in 3D
  • 🟪Algorithm🟪 Expression: Create 20 complex expressions. Run each 100 times.
  • 🟪Algorithm🟪 Noise: Create benchmarks for different noise models.

3D Rendering

CPU Benchmarks

  • 🟥CPU🟥 [Cull]: Static: Cull 10k objects

  • 🟥CPU🟥 [Cull]: Rotating: Cull 10k rotating objects

  • 🟥CPU🟥 Cull Directional Shadows : Cull 10k objects lit by a directional light with shadows (4 splits)

  • 🟥CPU🟥 [Cull Omni Shadows]: Static: Cull 10k objects, 200 omni lights with shadows, static

  • 🟥CPU🟥 [Cull Omni Shadows]: Dynamic: Cull 10k objects, 200 omni lights with shadows, lights move around.

  • 🟥CPU🟥 [Cull Spot Shadows]: Static: Cull 10k objects, 200 spot lights with shadows, static

  • 🟥CPU🟥 [Cull Spot Shadows]: Dynamic: Cull 10k objects, 200 spot lights with shadows, lights move around.

  • 🟥CPU🟥 Lightmap probe influence : Simple scene baked with lightmaps (and using tons of probes, like 500), 1000 objects moving around receiving influence from probes.

GPU Benchmarks

  • 🟩GPU🟩 [GI,Lighting,Effects,AA,DOF]: Sponza with Ambient Light: Sponza only lit by ambient light.

  • 🟩GPU🟩 [Lighting]: Sponza with directional light: Sponza lit with a directional light and shadows.

  • 🟩GPU🟩 [Lighting]: Sponza with omni lights: Sponza lit by several omni lights casting shadows.

  • 🟩GPU🟩 [GI]: Sponza lightmapped: Sponza lit with directional light, lightmapped only (lights all baked static).

  • 🟩GPU🟩 [GI]: Sponza with RefProbe: Sponza with directional light, with GI reflection probe.

  • 🟩GPU🟩 [GI]: Sponza with VoxelGI: Sponza with directional light, with GI from VoxelGI

  • 🟩GPU🟩 [GI]: Sponza with SDFGI: Sponza with directional light, with GI from SDFGI

  • 🟩GPU🟩 [GI]: Sponza with SSGI: Sponza with directional light, with GI from SSGI

  • 🟩GPU🟩 [Effects]: Sponza with SSR: Sponza with directional light, with SSR

  • 🟩GPU🟩 [Effects]: Sponza with Volumetric Fog: Sponza lit with directional light and volumetric fog.

  • 🟩GPU🟩 [Effects]: Sponza with SSAO: Sponza with constant ambient light (no lights) and SSAO.

  • 🟩GPU🟩 [Effects]: Glow: Sponza with ambient light, all glow levels enabled.

  • 🟩GPU🟩 [AA]: FXAA: Sponza with ambient light (no lights), and only FXAA

  • 🟩GPU🟩 [AA]: MSAA4x: Sponza with ambient light (no lights), and MSAA4x

  • 🟩GPU🟩 [AA]: MSAA8x: Sponza with ambient light (no lights), and MSAA8x

  • 🟩GPU🟩 [AA]: TAA: Sponza with ambient light (no lights), and TAA

  • 🟩GPU🟩 [AA]: FSR2_100: Sponza with ambient light (no lights), and FSR2 at 100%

  • 🟩GPU🟩 [AA]: FSR2_50: Sponza with ambient light (no lights), and FSR2 at 50% (or whatever)

  • 🟩GPU🟩 [DOF]: Box: Sponza with ambient light, running Box DOF effect (near and far)

  • 🟩GPU🟩 [DOF]: Hex: Sponza with ambient light, running Hex DOF effect (near and far)

  • 🟩GPU🟩 [DOF]: Circle: Sponza with ambient light, running Circle DOF effect (near and far)

  • 🟩GPU🟩 [SDFGI Motion]: OFF: Very large scene with directional light and SDFGI off, camera moving around.

  • 🟩GPU🟩 [SDFGI Motion]: ON: Very large scene with directional light and SDFGI on, camera moving around.

2D Rendering

  • 🟥CPU🟥 [CanvasItem]: Rendering: Draw different shapes (images, circles, etc) using the CanvasItem 2D drawing API. Draw 5000 elements. Measure performance.

  • 🟥CPU🟥 [CanvasItem]: Re-Rendering: Same as above, but call queue_redraw() every frame so it redraws. Measure performance.

  • 🟥CPU🟥 Polygon : Draw complex polygons (1000 points) every frame

  • 🟥CPU🟥 [BunnyMark]: nodes: Run a bunnymark style benchmark with 5000 nodes.

  • 🟥CPU🟥 [BunnyMark]: CanvasItem: Run a bunnymark style benchmark with 5000 bunnies drawn using CanvasItem API

  • 🟥CPU🟥 [BunnyMark]: MeshInstance2D: Run a bunnymark style benchmark with 5000 bunnies drawn using MeshInstance2D, drawing directly into the 2D array.

  • 🟥CPU🟥 [2D Lights]: Lights: Benchmark running several 2D lights on-screen

  • 🟥CPU🟥 [2D Lights]: Lights & Shadows: Benchmark running several 2D lights and shadows on-screen

2D Physics

  • 🟥CPU🟥 Rigid Bodies : Throw 2000 shapes next to each other in a pit, let them solve and stack, measure performance for 20 seconds
  • 🟥CPU🟥 Area2D : Place 2000 kinematic bodies, move around 1000 Area2D nodes of different shapes. Measure performance for 10 seconds.
  • 🟥CPU🟥 CharacterBody : Make a complex scene (maybe using tilemap). Throw 1000 CharacterBodies running around and jumping randomly. Measure performance for 10 seconds.
  • 🟪Algorithm🟪 RayCast: Measure how long it takes to do 10000 raycasts between random pairs of points in a complex scene (lots of shapes).

3D Physics

  • 🟥CPU🟥 Rigid Bodies : Throw 2000 shapes next to each other in a pit, let them solve and stack, measure performance for 20 seconds
  • 🟥CPU🟥 Area3D : Place 2000 kinematic bodies, move around 1000 Area3D nodes of different shapes. Measure performance for 10 seconds.
  • 🟥CPU🟥 CharacterBody : Make a complex scene (maybe using gridmap). Throw 1000 CharacterBodies running around and jumping randomly. Measure performance for 10 seconds.
  • 🟥CPU🟥 Triangle Mesh : Open a complex triangle mesh geometry collision, throw 1000 bodies on it, measure performance for 20 seconds.
  • 🟥CPU🟥 SoftBody : Create a cloth softbody, throw 500 rigid bodies on it. Measure performance for 10 seconds.
  • 🟪Algorithm🟪 RayCast: Measure how long it takes to do 10000 raycasts between random pairs of points in a complex scene (lots of shapes).

GDScript

  • 🟪Algorithm🟪 [Calculate Mandelbrot set]: GDScript: Calculate the set for a fixed-size image and time it.
  • 🟪Algorithm🟪 [Simple for loop add]: GDScript: Run 1,000,000 iterations of a for loop adding a number
  • 🟪Algorithm🟪 [Simple for loop call]: GDScript: Run 1,000,000 iterations calling a function
  • 🟪Algorithm🟪 [Lambda performance]: GDScript: Make a simple lambda function and call it 1000 times.
  • 🟪Algorithm🟪 [Port these benchmarks]: GDScript: https://programming-language-benchmarks.vercel.app/lua
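To give an idea of the workloads meant here, this is a minimal sketch of the Mandelbrot and "for loop add" items, written in Python purely for illustration (the real benchmarks would be GDScript, C# and C++ scripts, and Python timings say nothing about them). The image size, iteration cap and the `timed_msec()` helper are assumptions made for this example.

```python
import time


def mandelbrot_iterations(width, height, max_iter=50):
    """Sum of escape iteration counts over a fixed-size image."""
    total = 0
    for py in range(height):
        for px in range(width):
            # Map the pixel to the region [-2, 1] x [-1, 1] of the plane.
            c = complex(-2.0 + 3.0 * px / width, -1.0 + 2.0 * py / height)
            z = 0j
            n = 0
            while n < max_iter and abs(z) <= 2.0:
                z = z * z + c
                n += 1
            total += n
    return total


def loop_add(iterations):
    """Simple for loop that adds a number on every iteration."""
    total = 0
    for i in range(iterations):
        total += i
    return total


def timed_msec(fn, *args):
    """Hypothetical helper: run fn(*args), return (result, elapsed msec)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Fixed inputs, so every run does exactly the same amount of work.
    _, msec = timed_msec(mandelbrot_iterations, 160, 120)
    print(f"mandelbrot 160x120: {msec:.1f} msec")
    _, msec = timed_msec(loop_add, 1_000_000)
    print(f"for loop add: {msec:.1f} msec")
```

Returning a value from each workload makes it easy to check that ports to other languages compute the same thing before comparing timings.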

C#

  • Same benchmarks as GDScript, but named C#

GodotCPP (C++)

  • Same benchmarks as GDScript, but named C++

Asset importing

  • 🟦Startup🟦 [Mesh Import]: GLTF: Import a large GLTF2 scene (lots of geometry) but no textures.

  • 🟦Startup🟦 [Mesh Import]: FBX: Import a large FBX file (same as above, FBX format)

  • 🟦Startup🟦 [Mesh Import]: OBJ : Same as above, OBJ format.

  • 🟦Startup🟦 [Image Import]: Lossless: Import 200 images generated at random resolutions with lossless compression (WebP)

  • 🟦Startup🟦 [Image Import]: Lossy: Import 200 images generated at random resolutions with lossy compression (WebP)

  • 🟦Startup🟦 [Image Import]: S3TC: Import 200 images generated at random resolutions as S3TC

  • 🟦Startup🟦 [Image Import]: ETC2: Import 200 images generated at random resolutions as ETC2

  • 🟦Startup🟦 [Image Import]: BC7: Import 200 images generated at random resolutions as BC7

  • 🟦Startup🟦 [Image Import]: BC6H: Import 200 images generated at random resolutions as BC6H

  • 🟦Startup🟦 [Image Import]: ASTC: Import 200 images generated at random resolutions as ASTC

  • 🟦Startup🟦 GLTF Export : Export a large GLTF2 file (same as import I guess)

  • 🟦Startup🟦 [Audio Import]: OGG : Import 50 OGG Vorbis files

  • 🟦Startup🟦 [Audio Import]: MP3 Import: Import 50 MP3 files

  • 🟦Startup🟦 [Audio Import]: WAV Uncompressed: Import 50 WAV files

  • 🟦Startup🟦 [Audio Import]: WAV IMA-ADPCM: Import 50 WAV files

Scene nodes (base Node class)

  • 🟪Algorithm🟪 [Adding 5000 children]: Unnamed: Adding 5000 random children nodes without name.

  • 🟪Algorithm🟪 [Adding 5000 children]: Named: Adding 5000 random children nodes with the same name (let the conflict resolution happen).

  • 🟪Algorithm🟪 Moving node children: Move 5000 children nodes between two random positions 5000 times.

  • 🟪Algorithm🟪 [Delete children]: in order: Remove all 5000 children in order, first to last.

  • 🟪Algorithm🟪 [Delete children]: in reverse order: Remove all 5000 children in order, last to first.

  • 🟪Algorithm🟪 [Delete children]: in random order: Remove all 5000 children in random order.

  • 🟪Algorithm🟪 Get node: Create a complex scene hierarchy of 1000 nodes with random nesting, obtain the paths of all nodes, and test the performance of get_node() for each one from the root node
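The three [Delete children] variants share one workload and differ only in removal order. A minimal Python sketch of that structure follows; a plain Python list stands in for a node's children, which is an assumption made for illustration and says nothing about how the engine stores children. The `remove_all()` helper and the fixed seed (used so the random order is reproducible between runs) are likewise hypothetical.

```python
import random
import time


def make_children(count):
    """Stand-in for adding `count` children to a node."""
    return [f"child_{i}" for i in range(count)]


def remove_all(children, order, seed=0):
    """Remove every child in the given order; returns what is left."""
    remaining = list(children)
    if order == "forward":
        victims = list(children)
    elif order == "reverse":
        victims = children[::-1]
    elif order == "random":
        # A fixed seed keeps the "random" order identical on every run.
        victims = random.Random(seed).sample(children, len(children))
    else:
        raise ValueError(f"unknown order: {order!r}")
    for child in victims:
        remaining.remove(child)
    return remaining


def time_removals(count=5000):
    """Time each removal order on the same set of children (msec)."""
    results = {}
    for order in ("forward", "reverse", "random"):
        children = make_children(count)
        start = time.perf_counter()
        remove_all(children, order)
        results[order] = (time.perf_counter() - start) * 1000.0
    return results


if __name__ == "__main__":
    for order, msec in time_removals().items():
        print(f"{order}: {msec:.1f} msec")
```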

Animation

  • 🟥CPU🟥 Animated Models Blended : Animate 100 skeletal character models with a complex AnimationBlendTree (perhaps replacing string matching with another way of looking up the TrackPath would improve performance)

  • 🟥CPU🟥 Animated Models State : Animate 1000 skeletal character models with a complex AnimationStateMachine

  • 🟥CPU🟥 Tweens: Animate 100 properties with a Tween

  • 🟥CPU🟥 Tween Methods: Animate 1000 Tweens using tween_method().

Navigation

  • 🟪Algorithm🟪 AStar3D: Create a random map with 1000 interconnected points (probably using Delaunay3D). Benchmark solving it 1000 times between 2 random points
  • 🟪Algorithm🟪 Navigation: On a premade map, solve 1000 random paths between two points on the surface of the shapes.
  • Agents : Benchmark 1000 moving agents in a map with local collision avoidance.

GUI

  • 🟥CPU🟥 RichTextLabel long text shaping: Display a RichTextLabel with 100+ paragraphs of Lorem Ipsum
  • 🟥CPU🟥 Container sorting: Make a BoxContainer with 1000 Control children and call queue_sort() for 1000 frames in a row
  • 🟥CPU🟥 Container resizing : Create a random set of containers up to 20 levels. Every frame resize the parent container, measure CPU.
  • 🟥CPU🟥 Text Rendering : Create a label with a huge text (lorem ipsum) with a tiny font size that fills the screen. Measure performance.
  • 🟥CPU🟥 Text Resizing : Create a complex paragraph (lorem ipsum) in a Label. Make a script that resizes it every frame so it has to re-fit the text.

Editor

  • 🟦Startup🟦 [Editor startup]: with no shader cache

  • 🟦Startup🟦 [Editor startup]: with shaders cached

  • 🟦Startup🟦 [Editor scan 5000 files on first open]: no cache

  • 🟦Startup🟦 [Editor scan 5000 files on first open]: with cache

  • 🟦Startup🟦 Inspector full tree update: Open an object with a large number of properties, variety of property editors; force update_tree.

  • 🟦Startup🟦 Editor full theme update: Open the editor, change editor base/accent color or font size.

  • 🟦Startup🟦 Editor log update: Print 10000 messages (some duplicating, some unique) to the log, toggle on and off "Collapse similar".

Networking (no idea how to benchmark this, you are the expert)

  • [High level networking node]: sync: Sync 1000 nodes in the SceneTree

  • [High level networking node]: variable sync: Sync 1 node with 1000 variables in the SceneTree

@Chubercik
Contributor

A few boxes in the Math section can be ticked off :)

@coppolaemilio

This comment was marked as outdated.

@Calinou

This comment was marked as outdated.

4 participants