GPU Debug Scopes

September 17, 2024

Rendering APIs these days tend to capture their GPU workloads into a serialized form, such as a command-buffer or command-list, to be dispatched to a work-queue at a later time.

Diagnostic tooling such as RenderDoc or Nsight-Graphics allows the dissecting of these command-buffers, but it’s not very obvious what is happening at a high level from the list of API commands alone:

RenderDoc (Before)
Nsight-Graphics (Before)

Without any additional debugging information, RenderDoc and Nsight will show a flat list of command-buffer API-calls and will provide some filtering and categorization of these commands to help track down the ones that you care about. This process is slow, especially when working with multiple captures that you need to compare against each other.

  • What if you want to ensure that some host-code ran?
  • What if you want to ensure Step 1 and Step 2 ran before some issue at Step 3?
  • Where did this extra API call come from?
  • How do I make sure my cool optimization ran here?
  • What if your host code made an opportunistic early-return and skipped some API calls that you were expecting?

It’s difficult to capture this kind of context with a flat list of API calls.

Thankfully, rendering APIs tend to allow the attaching of diagnostic data to both command-buffers and objects to provide valuable diagnostic information to your captures:

RenderDoc (After)
Nsight-Graphics (After)

After adding some object names and debug scopes, both RenderDoc and Nsight will interpret this data to show readable object-names and to allow spans of command-buffer API-calls to be grouped and even colorized to your liking. Above, I generated a color for the pipeline-syncing API-call based on bits of the hash of the graphics-pipeline itself, so I can identify at a glance if a pipeline is being used repeatedly.

I’ll talk about Vulkan and OpenGL’s particular implementations of such features and how to utilize RAII (Resource Acquisition Is Initialization) patterns to automatically maintain nested scopes, creating a sort of call stack within your command-buffers.

This is a pattern I’ve found myself utilizing quite a lot to help with debugging, diagnosing issues, and profiling.

RAII

If you already know what RAII is, you can just skip to the implementation

Before we get to the Graphics-API specific implementation, here’s a quick rundown of what RAII looks like in C++, C#, and Rust.

The final implementations will be provided in C++, but can be mapped to C# and Rust through their various OpenGL and Vulkan bindings.

C++

In C++, one simply has to put code into the constructor and destructor to achieve a RAII pattern.

class DebugScope
{
public:
	// Constructor
	DebugScope(const char* ScopeName)
	{
		// Begin scope
		// Graphics API code
	}

	// Destructor
	~DebugScope()
	{
		// End Scope
		// Graphics API code
	}
};

Usage:

// Note: `error`, EvenMoreWork() and MoreWork() are placeholders
void Work()
{
	DebugScope Scope("Work"); // Constructor called

	EvenMoreWork();

	if( error )
	{
		return; // Destructor called
	}

} // Destructor called

void ScopeTest()
{
	DebugScope Scope("ScopeTest"); // Constructor called

	Work();

	if( error )
	{
		return; // Destructor called
	}

	MoreWork();

} // Destructor called
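
To see the ordering concretely, here is a small self-contained sketch that swaps the graphics-API calls for a plain log. The ScopeLog vector, the LoggingScope class, and the Inner/Outer functions are all stand-ins made up for illustration. It also deletes the copy operations, which is usually a good idea for scope objects so that a copy can’t end the same scope twice.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a command buffer: records begin/end events
inline std::vector<std::string> ScopeLog;

class LoggingScope
{
public:
	// Constructor: "begin" the scope
	explicit LoggingScope(const char* ScopeName) : Name(ScopeName)
	{
		ScopeLog.push_back("begin " + Name);
	}

	// Destructor: "end" the scope, no matter how the scope was left
	~LoggingScope()
	{
		ScopeLog.push_back("end " + Name);
	}

	// A copy would end the same scope twice, so forbid copying
	LoggingScope(const LoggingScope&) = delete;
	LoggingScope& operator=(const LoggingScope&) = delete;

private:
	std::string Name;
};

void Inner(bool EarlyOut)
{
	LoggingScope Scope("Inner");
	if( EarlyOut )
	{
		return; // "end Inner" is still recorded
	}
	ScopeLog.push_back("work");
}

void Outer(bool EarlyOut)
{
	LoggingScope Scope("Outer");
	Inner(EarlyOut);
}
```

Calling Outer(true) records "begin Outer", "begin Inner", "end Inner", "end Outer": the early return skips the work but never the matching end.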

C#

C# gets a bit more tricky. C# is a garbage-collected language, so the lifetime of an object is determined by the scheduling of the garbage collector, and you cannot deterministically know when a class’s finalizer gets called or when resources get released. Some additional work must be done to get C++’s behavior where the destructor automatically gets called upon leaving the scope. We’re trying to avoid having to manually call functions here!

C# exposes the IDisposable interface for classes to implement for the releasing of unmanaged resources such as file-objects, GPU-objects, Native-types, other IDisposable-types, or any other type that is not handled by the garbage collector. In this case, though, our needs are simpler. We aren’t actually freeing any GPU resources; we just want to issue some Graphics-API calls for automatic scope-management.

using System;

class DebugScope : IDisposable
{
	// Constructor
	public DebugScope(string ScopeName)
	{
		// Begin scope
		// Graphics API code
	}

	// Implement IDisposable
	public void Dispose()
	{
		// End Scope
		// Graphics API code
	}
}

The using statement ensures that the object is valid for the duration of the using-block and automatically calls Dispose upon leaving the scope. There’s no clean way to enforce that a class is only ever used within a using-block, though, so keeping proper RAII-behavior has to be left to the discretion of the code-base.

Usage:

static void Work()
{
	// Constructor called
	using( DebugScope Scope = new DebugScope("Work") )
	{
		EvenMoreWork();

		if( error )
		{
			return; // Dispose called
		}
	} // Dispose called
}

static void ScopeTest()
{
	// Constructor called
	using( DebugScope Scope = new DebugScope("ScopeTest") )
	{
		Work();

		if( error )
		{
			return; // Dispose called
		}

		MoreWork();
	} // Dispose called
}

Rust

Rust implements RAII through the Drop trait. By implementing fn drop(&mut self), code can now be run when the struct leaves a scope.

pub struct DebugScope;

impl DebugScope {
    // Constructor
    pub fn new(scope_name: &str) -> Self {
        // Begin scope
        // Graphics API code
        DebugScope
    }
}

// Implement Drop trait
impl Drop for DebugScope {
    // Destructor
    fn drop(&mut self) {
        // End scope
        // Graphics API code
    }
}

Similar to C++, just defining the object within a scope is enough for our code to run at the definition and again upon leaving the scope. Since scope-objects don’t usually have to be touched after they are defined, the object can be named with a leading underscore (_) to avoid any “unused variable”-warnings. Note that the name has to be more than a bare underscore: let _ = ... drops the value immediately rather than at the end of the scope.

Usage:

pub fn work() {
    // Constructor called
    let _scope = DebugScope::new("Work");

    even_more_work();

    if error {
        return; // Drop called
    }
} // Drop called

pub fn scope_test() {
    // Constructor called
    let _scope = DebugScope::new("ScopeTest");

    work();

    if error {
        return; // Drop called
    }

    more_work();
} // Drop called

Implementations

Vulkan

Vulkan provides the VK_EXT_debug_utils extension to allow attaching names to objects as well as colored labels to spans of command-buffer commands and queue-operations.

The vkCmd{Begin,End,Insert}DebugUtilsLabelEXT-functions are utilized to group and label particular spans of command buffer operations with a VkDebugUtilsLabelEXT structure. This allows both a plaintext name (const char*) and an RGBA floating-point color (float[4]) to be associated with command buffer operations:

// Provided by VK_EXT_debug_utils
typedef struct VkDebugUtilsLabelEXT {
	VkStructureType    sType;
	const void*        pNext;
	const char*        pLabelName;
	float              color[4];
} VkDebugUtilsLabelEXT;

vkCmdInsertDebugUtilsLabelEXT additionally allows one-off labels to be inserted within a command buffer. A function or operator-overload may be added to the DebugScope object to insert these labels.

A minimally viable Vulkan implementation, ready for you to copy-paste, could look like this:

#include <algorithm> // std::copy_n
#include <span>
#include <vulkan/vulkan.h>

// Note: the vkCmd*DebugUtilsLabelEXT functions are extension entry points.
// This assumes they have already been loaded, e.g. through
// vkGetInstanceProcAddr or a loader library such as volk.
class DebugScope {
private:
	// Keep this command buffer around so that the destructor can properly
	// end the debug-scope
	const VkCommandBuffer commandBuffer;

public:
	// Upon construction, begin the debug-scope
	DebugScope(
		VkCommandBuffer targetCommandBuffer,
		const char* scopeName, std::span<const float, 4> scopeColor
	) : commandBuffer(targetCommandBuffer)
	{
		VkDebugUtilsLabelEXT label = {};
		label.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_LABEL_EXT;
		label.pLabelName = scopeName;
		std::copy_n(scopeColor.begin(), 4, label.color);
		vkCmdBeginDebugUtilsLabelEXT(commandBuffer, &label);
	}

	// Scopes should not be copied, or the same scope would be ended twice
	DebugScope(const DebugScope&) = delete;
	DebugScope& operator=(const DebugScope&) = delete;

	// A bonus operator to insert plain labels within the command-buffer
	void operator()(const char* scopeName, std::span<const float, 4> scopeColor) const
	{
		VkDebugUtilsLabelEXT label = {};
		label.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_LABEL_EXT;
		label.pLabelName = scopeName;
		std::copy_n(scopeColor.begin(), 4, label.color);
		vkCmdInsertDebugUtilsLabelEXT(commandBuffer, &label);
	}

	// Upon destruction, end the debug-scope
	~DebugScope()
	{
		vkCmdEndDebugUtilsLabelEXT(commandBuffer);
	}
};
// Usage
void DoThing(VkCommandBuffer commandBuffer)
{
	static float thingColor[4] = {1.0f, 1.0f, 0.0f, 1.0f};
	DebugScope scope(commandBuffer, "DoThing", thingColor);

	scope("Step1", thingColor);

	vkCmd...(commandBuffer);

	scope("Step2", thingColor);

	vkCmd...(commandBuffer);

	scope("Step3", thingColor);

	vkCmd...(commandBuffer);
}

The basic pattern can be extended further to add even more conveniences, such as utilizing Vulkan-Hpp to help make the code more concise and expressive, or utilizing fmt to aid in scope and label name generation. You could even put __FILE__ or __LINE__ or the calling function-name itself into the debug-scope name to more easily “blame” each command buffer scope on the exact host-code that emitted it.

To avoid additional overhead, you might choose to use something like VK_EXT_tooling_info (Core in Vulkan 1.3) to only conditionally insert these API commands if it detects that RenderDoc or Nsight is attached to the Vulkan instance.

Scope and label coloration can either be decided manually at each call-site or generated procedurally.

A simple approach is to maintain a static depth-integer within the DebugScope-object that increments/decrements in the ctor/dtor. Knowing what depth each scope is at allows for procedural color-selection, such as utilizing something like Inigo Quilez’s procedural color-palettes. A contribution I made to DuckStation utilized this pattern in particular. This would work fine if you only ever operated upon a single recycled command buffer. In a multi-threaded environment, you will probably want to maintain this depth-variable in a per-command-buffer abstraction as opposed to having a globally shared depth-variable between all command buffers.
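
A minimal sketch of depth-based coloring could look like the following. It uses the cosine-palette formula from Inigo Quilez’s article, color(t) = a + b * cos(2π(c * t + d)), with one of his example coefficient sets. The DepthColor function, the ScopeDepthTracker struct, and the choice to cycle every 8 levels are assumptions for illustration, not the code used in DuckStation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Cosine palette: color(t) = a + b * cos(2*pi * (c * t + d)), per channel
inline std::array<float, 4> DepthColor(std::uint32_t Depth)
{
	constexpr float TwoPi = 6.28318530718f;
	constexpr float A[3] = {0.5f, 0.5f, 0.5f};
	constexpr float B[3] = {0.5f, 0.5f, 0.5f};
	constexpr float C[3] = {1.0f, 1.0f, 1.0f};
	constexpr float D[3] = {0.0f, 0.33f, 0.67f};

	// Assumption: wrap around the palette every 8 levels of nesting
	const float T = static_cast<float>(Depth % 8) / 8.0f;

	std::array<float, 4> Color = {0.0f, 0.0f, 0.0f, 1.0f}; // opaque alpha
	for( std::size_t i = 0; i < 3; ++i )
	{
		Color[i] = A[i] + B[i] * std::cos(TwoPi * (C[i] * T + D[i]));
	}
	return Color;
}

// Hypothetical per-command-buffer state, so that threads recording different
// command buffers never share a depth counter
struct ScopeDepthTracker
{
	std::uint32_t Depth = 0;

	// Call from the scope's constructor: color for this level, then descend
	std::array<float, 4> Push() { return DepthColor(Depth++); }

	// Call from the scope's destructor
	void Pop() { --Depth; }
};
```

A std::array<float, 4> converts to std::span<const float, 4>, so the returned color can be handed directly to the DebugScope constructor shown earlier.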

Before
After

Another option for coloration is to group certain operations by colors such as making all “transfer” workloads yellow, all “graphics” workloads green, all “compute” workloads orange, and all “present” operations magenta.
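
The category scheme above could be kept in one place with a small lookup function. This is only a sketch: the Workload enum and WorkloadColor function are hypothetical names, and the exact RGBA values are a matter of taste.

```cpp
#include <array>

// Hypothetical workload categories; the names are illustrative only
enum class Workload { Transfer, Graphics, Compute, Present };

// RGBA colors matching the scheme described above
constexpr std::array<float, 4> WorkloadColor(Workload Category)
{
	switch( Category )
	{
	case Workload::Transfer: return {1.0f, 1.0f, 0.0f, 1.0f}; // yellow
	case Workload::Graphics: return {0.0f, 1.0f, 0.0f, 1.0f}; // green
	case Workload::Compute:  return {1.0f, 0.5f, 0.0f, 1.0f}; // orange
	case Workload::Present:  return {1.0f, 0.0f, 1.0f, 1.0f}; // magenta
	}
	return {1.0f, 1.0f, 1.0f, 1.0f}; // fallback: white
}
```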

Some code-bases might further decide to color the labels based on the specific operation, such as coloring a label for a vkCmdClearColorImage operation with the clear-color itself. A contribution I made to Panda3DS utilized this pattern.

ClearColor

With this additional data in your command buffer, debug callbacks will also be able to read this context from the VkDebugUtilsMessengerCallbackDataEXT structure.

Each of the originally defined VkDebugUtilsLabelEXT structures can be retrieved by iterating over the pCmdBufLabels array, whose length is given by cmdBufLabelCount.

These labels are sorted from oldest to newest. So pCmdBufLabels[0] would be the oldest label that was set leading into the current debug message, and pCmdBufLabels[cmdBufLabelCount - 1] would be the most recent label.

This could provide valuable context around particular error messages to help diagnose an issue.

VKAPI_ATTR VkBool32 VKAPI_CALL DebugMessageCallback(
	VkDebugUtilsMessageSeverityFlagBitsEXT      MessageSeverity,
	VkDebugUtilsMessageTypeFlagsEXT             MessageType,
	const VkDebugUtilsMessengerCallbackDataEXT* CallbackData, void* UserData
)
{
	// Loop through all labels for this particular message
	for( std::uint32_t i = 0; i < CallbackData->cmdBufLabelCount; ++i )
	{
		const VkDebugUtilsLabelEXT& CurLabel = CallbackData->pCmdBufLabels[i];
		std::fprintf(stderr, "%u [%s]\n", i, CurLabel.pLabelName);
	}

	switch( vk::DebugUtilsMessageSeverityFlagBitsEXT(MessageSeverity) )
	{
	case vk::DebugUtilsMessageSeverityFlagBitsEXT::eError:
	case vk::DebugUtilsMessageSeverityFlagBitsEXT::eWarning:
	{
		// Something bad happened!
		const char* Message = CallbackData->pMessage;
		std::puts(Message);
		assert(0);
	}
	...
	}
	return VK_FALSE;
}

In this example output, I’ve artificially doubled the size of a pipeline barrier within a scope named Upload Data:

0 [Download]
1 [Rendering]
2 [Upload Data]        <<< This is the most-recent label reached before this error!
Validation Error: [ VUID-VkBufferMemoryBarrier-size-01189 ] Object 0: handle = 0xdb05ecf0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0xb63479f2 | vkCmdPipelineBarrier(): pBufferMemoryBarriers[0].size VkBuffer 0xab64de0000000020[] has offset 0x0 and size 0x2f4400 whose sum is greater than total size 0x17a200. The Vulkan spec states: If size is not equal to VK_WHOLE_SIZE, size must be less than or equal to than the size of buffer minus offset (https://vulkan.lunarg.com/doc/view/1.3.268.0/windows/1.3-extensions/vkspec.html#VUID-VkBufferMemoryBarrier-size-01189)

With this, I now know exactly what part of the code-base to start investigating.

OpenGL

OpenGL provides the GL_KHR_debug extension for attaching diagnostic information to the rendering context.

Since OpenGL operates upon a global-state, gl{Push,Pop}DebugGroup will group together API-calls at a global-scope.

I have yet to see any GPU tooling utilize the id parameter of glPushDebugGroup, but I’ve assigned it the global scope-depth to try and keep this code future-proof for any diagnostic tooling that may eventually decide to do something with it. You could just statically provide it 0 if you wanted to.

OpenGL

class DebugScope {
	inline static GLuint GlobalScopeDepth = 0;
	const GLuint ScopeDepth;

	public:
	DebugScope(std::string_view ScopeName)
		: ScopeDepth(GlobalScopeDepth++)
	{
		glPushDebugGroup(GL_DEBUG_SOURCE_APPLICATION, ScopeDepth, ScopeName.size(), ScopeName.data());
	}

	~DebugScope()
	{
		glPopDebugGroup();
		GlobalScopeDepth--;
	}
};
