Rendering APIs these days tend to capture their GPU workloads into a serialized form, such as a command-buffer or command-list, to be dispatched into a work-queue at a later time.
Diagnostic tooling such as RenderDoc or Nsight-Graphics allows you to dissect these command-buffers, but it is not obvious what is happening at a high level from the list of API commands alone:
RenderDoc(Before) |
---|
Nsight-Graphics(Before) |
---|
Without any additional debugging information, RenderDoc and Nsight will show a flat list of command-buffer API-calls and will provide some filtering and categorization of these commands to help track down the ones that you care about. This process is slow, especially when you are working with multiple captures and need to draw comparisons between them.
- What if you want to ensure that some host-code ran?
- What if you want to ensure Step 1 and Step 2 ran before some issue at Step 3?
- Where did this extra API call come from?
- How do I make sure my cool optimization ran here?
- What if your host code made an opportunistic early-return and skipped some API calls that you were expecting?
It’s difficult to capture this kind of context with a flat list of API calls.
Thankfully, rendering APIs tend to allow the attaching of diagnostic data to both command-buffers and objects to provide valuable diagnostic information to your captures:
RenderDoc(After) |
---|
Nsight-Graphics(After) |
---|
After adding some object names and debugging scopes, both RenderDoc and Nsight will interpret this data to show readable object-names and allow spans of command-buffer API-calls to be grouped and even colorized to your liking. Above, I generated a color for the pipeline-syncing API-call based on bits of the hash of the graphics-pipeline itself, so I can tell at a glance whether a pipeline is being used repeatedly.
I’ll talk about Vulkan’s and OpenGL’s particular implementations of such features and how to utilize RAII (Resource Acquisition Is Initialization) patterns to automatically maintain nested scopes, creating a sort of call stack within your command-buffers.
This is a pattern I’ve found myself utilizing quite a lot to help with debugging, diagnosing issues, and profiling.
RAII
If you already know what RAII is, you can just skip to the implementation
Before we get to the Graphics-API specific implementation, here’s a quick rundown of what RAII looks like in C++, C#, and Rust.
The final implementations will be provided in C++, but can be mapped to C# and Rust through their various OpenGL and Vulkan bindings.
C++
In C++, one simply has to put code into the constructor and destructor to achieve an RAII pattern.
class DebugScope
{
public:
    // Constructor
    DebugScope(const char* ScopeName)
    {
        // Begin scope
        // Graphics API code
    }
    // Destructor
    ~DebugScope()
    {
        // End Scope
        // Graphics API code
    }
};
Usage:
void Work()
{
    DebugScope Scope("Work"); // Constructor called
    EvenMoreWork();
    if( error )
    {
        return; // Destructor called
    }
} // Destructor called
void ScopeTest()
{
    DebugScope Scope("ScopeTest"); // Constructor called
    Work();
    if( error )
    {
        return; // Destructor called
    }
    MoreWork();
} // Destructor called
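To see the nesting behavior without any graphics API involved, here is a minimal sketch in which a stand-in class appends begin/end markers to a log instead of issuing graphics calls. TraceLog, TraceScope, Inner, and Outer are hypothetical names used only for this illustration. Copying is disabled so that a scope cannot end twice.

```cpp
#include <string>
#include <vector>

// Stand-in for a command buffer: records the order of begin/end events.
// TraceLog and TraceScope are illustrative names, not part of any API.
using TraceLog = std::vector<std::string>;

class TraceScope
{
public:
    TraceScope(TraceLog& TargetLog, const char* ScopeName)
        : Log(TargetLog), Name(ScopeName)
    {
        Log.push_back("begin " + Name);
    }
    ~TraceScope()
    {
        Log.push_back("end " + Name);
    }
    // A scope must end exactly once, so forbid copies
    TraceScope(const TraceScope&) = delete;
    TraceScope& operator=(const TraceScope&) = delete;

private:
    TraceLog& Log;
    std::string Name;
};

void Inner(TraceLog& Log, bool Error)
{
    TraceScope Scope(Log, "Inner");
    if( Error )
    {
        return; // Destructor still runs on the early return
    }
    Log.push_back("inner work");
}

void Outer(TraceLog& Log, bool Error)
{
    TraceScope Scope(Log, "Outer");
    Inner(Log, Error);
}
```

Calling Outer with Error set to true leaves the log as begin Outer, begin Inner, end Inner, end Outer, so the scopes stay balanced even though the work was skipped. Note that the scope object needs a variable name: a nameless temporary is destroyed at the end of its own statement, which would close the scope immediately.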
C#
C# gets a bit more tricky. C# is a garbage-collected language, so the lifetime of an object is determined by the scheduling of the garbage collector, and you cannot deterministically know when a class’s finalizer gets called or its resources get released. Some additional work must be done to get C++’s behavior, where the destructor is automatically called upon leaving the scope. We’re trying to avoid having to manually call functions here!
C# exposes the IDisposable interface for classes to implement for the releasing of unmanaged resources such as file-objects, GPU-objects, Native-types, other IDisposable-types, or any other type that is not handled by the garbage collector. In this case though our needs are simpler. We aren’t actually freeing any GPU resources, we just want to call some Graphics-API calls for some automatic scope-management.
using System;
class DebugScope : IDisposable
{
    // Constructor
    public DebugScope(string ScopeName)
    {
        // Begin scope
        // Graphics API code
    }
    // Implement IDisposable
    public void Dispose()
    {
        // End Scope
        // Graphics API code
    }
}
The using keyword will also ensure that an object is valid during the scope of the using-block and will automatically call Dispose upon leaving the scope.
There’s no clean way to enforce that a class is only ever used within a using-block, though, to ensure proper RAII behavior.
This pattern has to be upheld by convention within the code-base.
Usage:
static void Work()
{
    // Constructor called
    using( DebugScope Scope = new DebugScope("Work") )
    {
        EvenMoreWork();
        if( error )
        {
            return; // Dispose called
        }
    } // Dispose called
}
static void ScopeTest()
{
    // Constructor called
    using( DebugScope Scope = new DebugScope("ScopeTest") )
    {
        Work();
        if( error )
        {
            return; // Dispose called
        }
        MoreWork();
    } // Dispose called
}
- I have an old PR for Ryujinx that implements this for their Vulkan backend!
Rust
Rust implements RAII through the Drop trait.
By implementing fn drop(&mut self), code can be run when the struct leaves a scope.
pub struct DebugScope;
impl DebugScope {
    // Constructor
    pub fn new(scope_name: &str) -> Self {
        // Begin scope
        // Graphics API code
        DebugScope
    }
}
// Implement Drop trait
impl Drop for DebugScope {
    // Destructor
    fn drop(&mut self) {
        // End scope
        // Graphics API code
    }
}
Similar to C++, just defining the object within a scope is enough for our code to run at that point and again upon leaving the scope.
Since scope-objects don’t usually have to be touched after they are defined, the object can be named with a leading underscore (_) to avoid any “unused variable” warnings.
The name must be more than a bare underscore, though: binding to _ alone drops the value immediately instead of at the end of the scope.
Usage:
pub fn work() {
    // Constructor called
    let _scope = DebugScope::new("Work");
    even_more_work();
    if error {
        return; // Drop called
    }
} // Drop called
pub fn scope_test() {
    // Constructor called
    let _scope = DebugScope::new("ScopeTest");
    work();
    if error {
        return; // Drop called
    }
    more_work();
} // Drop called
Implementations
Vulkan
Vulkan provides the VK_EXT_debug_utils extension to allow attaching names to objects as well as colored labels to spans of command-buffer commands and queue-operations.
The vkCmd{Begin,End,Insert}DebugUtilsLabelEXT functions are utilized to group and label particular spans of command buffer operations with a VkDebugUtilsLabelEXT structure. This allows both a plaintext name (const char*) and an RGBA floating-point color (float[4]) to be correlated with command buffer operations:
// Provided by VK_EXT_debug_utils
typedef struct VkDebugUtilsLabelEXT {
    VkStructureType sType;
    const void* pNext;
    const char* pLabelName;
    float color[4];
} VkDebugUtilsLabelEXT;
vkCmdInsertDebugUtilsLabelEXT additionally allows the insertion of one-off labels within a command buffer.
A function or operator-overload may be added to the DebugScope object to insert these additional labels.
A minimally viable Vulkan implementation, ready for you to copy-paste, could look like this:
#include <algorithm> // std::copy_n
#include <span>      // std::span (C++20)
#include <vulkan/vulkan.h>

// Assumes the VK_EXT_debug_utils entry points have already been loaded,
// for example through vkGetInstanceProcAddr or a loader library
class DebugScope {
private:
    // Keep this command buffer around so that the destructor can properly
    // end the debug-scope
    const VkCommandBuffer commandBuffer;
public:
    // Upon construction, begin the debug-scope
    DebugScope(
        VkCommandBuffer targetCommandBuffer,
        const char* scopeName, std::span<const float, 4> scopeColor
    ) : commandBuffer(targetCommandBuffer)
    {
        VkDebugUtilsLabelEXT label = {};
        label.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_LABEL_EXT;
        label.pLabelName = scopeName;
        std::copy_n(scopeColor.begin(), 4, label.color);
        vkCmdBeginDebugUtilsLabelEXT(commandBuffer, &label);
    }
    // A scope must end exactly once, so disallow copies
    DebugScope(const DebugScope&) = delete;
    DebugScope& operator=(const DebugScope&) = delete;
    // A bonus operator to insert plain labels within the command-buffer
    void operator()(const char* scopeName, std::span<const float, 4> scopeColor) const
    {
        VkDebugUtilsLabelEXT label = {};
        label.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_LABEL_EXT;
        label.pLabelName = scopeName;
        std::copy_n(scopeColor.begin(), 4, label.color);
        vkCmdInsertDebugUtilsLabelEXT(commandBuffer, &label);
    }
    // Upon destruction, end the debug-scope
    ~DebugScope()
    {
        vkCmdEndDebugUtilsLabelEXT(commandBuffer);
    }
};
// Usage
void DoThing(VkCommandBuffer commandBuffer)
{
    static float thingColor[4] = {1.0f, 1.0f, 0.0f, 1.0f};
    DebugScope scope(commandBuffer, "DoThing", thingColor);
    scope("Step1", thingColor);
    vkCmd...(commandBuffer);
    scope("Step2", thingColor);
    vkCmd...(commandBuffer);
    scope("Step3", thingColor);
    vkCmd...(commandBuffer);
}
The basic pattern can be extended further to add even more conveniences, such as utilizing Vulkan-Hpp to help make the code more concise and expressive, or utilizing fmt to aid in scope and label name generation.
You could even put __FILE__ or __LINE__ or the calling function-name itself into the debug-scope name to be able to more easily “blame” each command buffer scope on the exact host-code that emitted it.
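One possible way to build such a name is sketched below using only standard macros. MakeScopeName, SCOPE_NAME, and ExampleCallSite are hypothetical names introduced for this illustration, and the output format is an arbitrary choice.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats "label (function @ file:line)".
// Overly long results are truncated to the buffer size by snprintf.
inline std::string MakeScopeName(
    const char* Label, const char* Function, const char* File, int Line)
{
    char Buffer[256];
    std::snprintf(
        Buffer, sizeof(Buffer), "%s (%s @ %s:%d)", Label, Function, File, Line);
    return std::string(Buffer);
}

// Captures the call site. Because this is a macro, __func__, __FILE__ and
// __LINE__ are evaluated where SCOPE_NAME is written, not inside the helper.
#define SCOPE_NAME(Label) MakeScopeName((Label), __func__, __FILE__, __LINE__)

// Example call site: the returned name starts with
// "Upload Data (ExampleCallSite @ " followed by this file's path and line
std::string ExampleCallSite()
{
    return SCOPE_NAME("Upload Data");
}
```

With the Vulkan DebugScope from earlier, this could be passed as SCOPE_NAME("Upload Data").c_str(). The temporary string stays alive for the duration of the constructor call, which is when the label is recorded.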
To avoid additional overhead, you might choose to use something like VK_EXT_tooling_info (Core in Vulkan 1.3) to only conditionally insert these API commands if it detects that RenderDoc or Nsight is attached to the Vulkan instance.
Scope and label coloration can either be decided manually at each call-site or generated however you choose.
A simple approach is to maintain a static depth-integer within the DebugScope-object that increments/decrements in the ctor/dtor. Knowing what depth each scope is at allows for procedural color-selection, such as utilizing something like Inigo Quilez’s procedural color-palettes.
A contribution I made to DuckStation utilized this pattern in particular.
This works fine if you only ever operate upon a single recycled command buffer.
In a multi-threaded environment, you will probably want to maintain this depth-variable in a per-command-buffer abstraction as opposed to having a globally shared depth-variable between all command buffers.
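A depth-driven palette along these lines could look like the sketch below. It uses the cosine-palette form a + b * cos(2π(c * t + d)) popularized by Inigo Quilez. The coefficients, the divisor of 8, and the names PaletteColor and ScopeDepthTracker are all assumptions made for this example rather than anything taken from DuckStation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Cosine palette: a + b * cos(2*pi * (c*t + d)), evaluated per channel.
// The coefficients are arbitrary example values that keep results in [0, 1].
inline std::array<float, 4> PaletteColor(float T)
{
    constexpr float TwoPi = 6.28318530718f;
    constexpr float A[3] = {0.5f, 0.5f, 0.5f};
    constexpr float B[3] = {0.5f, 0.5f, 0.5f};
    constexpr float C[3] = {1.0f, 1.0f, 1.0f};
    constexpr float D[3] = {0.00f, 0.33f, 0.67f};

    std::array<float, 4> Color = {0.0f, 0.0f, 0.0f, 1.0f}; // Opaque alpha
    for( int i = 0; i < 3; ++i )
    {
        Color[i] = A[i] + B[i] * std::cos(TwoPi * (C[i] * T + D[i]));
    }
    return Color;
}

// Hypothetical per-command-buffer state, so that threads recording different
// command buffers do not share a single global counter.
struct ScopeDepthTracker
{
    unsigned Depth = 0;

    // Returns the color for the scope being opened, then descends one level
    std::array<float, 4> Push()
    {
        return PaletteColor(static_cast<float>(Depth++) / 8.0f);
    }

    // Ascends one level when a scope closes
    void Pop()
    {
        assert(Depth > 0 && "Pop without a matching Push");
        --Depth;
    }
};
```

A DebugScope would call Push in its constructor and Pop in its destructor on the tracker that belongs to its command buffer. With these coefficients the palette repeats every eight levels of nesting.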
Before |
---|
After |
---|
Another option for coloration is to group certain operations by colors such as making all “transfer” workloads yellow, all “graphics” workloads green, all “compute” workloads orange, and all “present” operations magenta.
Some code-bases might further decide to color the labels based on the specific operation, such as coloring a label for a vkCmdClearColorImage operation with the clear-color itself. A contribution I made to Panda3DS utilized this pattern.
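Both the category-based scheme and the hash-based coloring mentioned in the introduction can be expressed as small lookup functions. The sketch below is only one way to do it: the Workload enum, the exact RGB values, and the choice of the low 24 bits of the hash are assumptions for illustration, not the code used in Panda3DS or in the captures above.

```cpp
#include <array>
#include <cstdint>

using Color = std::array<float, 4>; // RGBA, matching VkDebugUtilsLabelEXT::color

// Hypothetical workload categories with the colors suggested in the text
enum class Workload { Transfer, Graphics, Compute, Present };

inline Color ColorForWorkload(Workload Kind)
{
    switch( Kind )
    {
    case Workload::Transfer: return {1.0f, 1.0f, 0.0f, 1.0f}; // Yellow
    case Workload::Graphics: return {0.0f, 1.0f, 0.0f, 1.0f}; // Green
    case Workload::Compute:  return {1.0f, 0.5f, 0.0f, 1.0f}; // Orange
    case Workload::Present:  return {1.0f, 0.0f, 1.0f, 1.0f}; // Magenta
    }
    return {1.0f, 1.0f, 1.0f, 1.0f}; // White fallback for unknown values
}

// Derives a stable color from the low 24 bits of any 64-bit hash, so the
// same object (for example a pipeline) always receives the same color
inline Color ColorFromHash(std::uint64_t Hash)
{
    const float R = static_cast<float>((Hash >> 0) & 0xFF) / 255.0f;
    const float G = static_cast<float>((Hash >> 8) & 0xFF) / 255.0f;
    const float B = static_cast<float>((Hash >> 16) & 0xFF) / 255.0f;
    return {R, G, B, 1.0f};
}
```

Either function returns an array that converts directly to the std::span<const float, 4> parameter of the Vulkan DebugScope shown earlier.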
With this additional data in your command buffer, debug callbacks will also be able to interpret this additional context within the VkDebugUtilsMessengerCallbackDataEXT structure.
Each of the originally defined VkDebugUtilsLabelEXT structures for each label can be retrieved by iterating with the pCmdBufLabels and cmdBufLabelCount variables.
These labels are sorted from oldest to newest.
So pCmdBufLabels[0] would be the oldest label that was set leading into the current debug message, and pCmdBufLabels[cmdBufLabelCount - 1] would be the most recent label.
This could provide valuable context around particular error messages to help diagnose an issue.
VKAPI_ATTR VkBool32 VKAPI_CALL DebugMessageCallback(
    VkDebugUtilsMessageSeverityFlagBitsEXT MessageSeverity,
    VkDebugUtilsMessageTypeFlagsEXT MessageType,
    const VkDebugUtilsMessengerCallbackDataEXT* CallbackData, void* UserData
)
{
    // Loop through all labels for this particular message
    for( std::uint32_t i = 0; i < CallbackData->cmdBufLabelCount; ++i )
    {
        const VkDebugUtilsLabelEXT& CurLabel = CallbackData->pCmdBufLabels[i];
        std::fprintf(stderr, "%u [%s]\n", i, CurLabel.pLabelName);
    }
    switch( vk::DebugUtilsMessageSeverityFlagBitsEXT(MessageSeverity) )
    {
    case vk::DebugUtilsMessageSeverityFlagBitsEXT::eError:
    case vk::DebugUtilsMessageSeverityFlagBitsEXT::eWarning:
    {
        // Something bad happened!
        const char* Message = CallbackData->pMessage;
        std::puts(Message);
        assert(0);
    }
    ...
    }
    return VK_FALSE;
}
In this example output, I’ve artificially doubled the size of a pipeline barrier within a scope named Upload Data:
0 [Download]
1 [Rendering]
2 [Upload Data] <<< This is the most-recent label reached before this error!
Validation Error: [ VUID-VkBufferMemoryBarrier-size-01189 ] Object 0: handle = 0xdb05ecf0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0xb63479f2 | vkCmdPipelineBarrier(): pBufferMemoryBarriers[0].size VkBuffer 0xab64de0000000020[] has offset 0x0 and size 0x2f4400 whose sum is greater than total size 0x17a200. The Vulkan spec states: If size is not equal to VK_WHOLE_SIZE, size must be less than or equal to than the size of buffer minus offset (https://vulkan.lunarg.com/doc/view/1.3.268.0/windows/1.3-extensions/vkspec.html#VUID-VkBufferMemoryBarrier-size-01189)
With this, I can now know exactly what part of the code-base to start investigating the issue in.
OpenGL
OpenGL provides the GL_KHR_debug extension for attaching diagnostic information to the rendering context.
Since OpenGL API calls operate upon global state, gl{Push,Pop}DebugGroup will group together API-calls at a global scope.
I have yet to see any GPU tooling utilize the id parameter of glPushDebugGroup, but I’ve assigned it to the global scope-depth to try and keep this code future-proof for any diagnostic tooling that may eventually decide to do something with it.
You could just statically provide it 0 if you wanted to.
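#include <string_view>
// Assumes an OpenGL loader header that exposes GL_KHR_debug is already included

class DebugScope {
    inline static GLuint GlobalScopeDepth = 0;
    const GLuint ScopeDepth;
public:
    DebugScope(std::string_view ScopeName)
        : ScopeDepth(GlobalScopeDepth++)
    {
        // The length is passed explicitly, so the name does not need to be
        // null-terminated
        glPushDebugGroup(
            GL_DEBUG_SOURCE_APPLICATION, ScopeDepth,
            static_cast<GLsizei>(ScopeName.size()), ScopeName.data());
    }
    ~DebugScope()
    {
        glPopDebugGroup();
        GlobalScopeDepth--;
    }
};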