Vulkan in 30 minutes
30 minutes not actually guaranteed.
I've written this post with a specific target audience in mind, namely those who have a good grounding in existing APIs (e.g. D3D11 and GL) and understand the concepts of multithreading, staging resources, synchronisation and so on, but want to know specifically how they are implemented in Vulkan. So we end up with a whirlwind tour of what the main Vulkan concepts look like.
This isn't intended to be comprehensive (for that you should read the spec or a more in-depth tutorial), nor is it heavy in background or justification. Hopefully by the end of this you should be able to read specs or headers and have a sketched idea of how a simple Vulkan application is implemented, but you will need to do additional reading.
Mostly, this is the document I wish had already been written when I first encountered Vulkan - so for the most part it is tuned to what I would have wanted to know. I'll reference the spec whenever you should do more reading to get a precise understanding, but you'll at least know what to look for.
- baldurk
General
At the end of the post I've included a heavily abbreviated pseudocode program showing the rough steps to a hello world triangle, to match up to the explanations.
A few simple things that don't fit any of the other sections:
Vulkan is a C API, i.e. free function entry points. This is the same as GL.
The API is quite heavily typed - unlike GL. Each enum is its own type, and the handles that are returned are opaque 64-bit handles, so they are strongly typed on 64-bit platforms (on 32-bit they are not distinct types by default, although you can make them typed if you use C++).
A lot of functions (most, even) take extensible structures as parameters instead of basic types.
A VkAllocationCallbacks * is passed into creation/destruction functions, which lets you provide custom malloc/free functions for CPU memory. For more details read the spec; in simple applications you can just pass NULL and let the implementation do its own CPU-side allocation.
Warning: I'm not considering any error handling, nor do I talk much about querying for implementation limits and respecting them. While I'm not intentionally getting anything outright wrong, I am skipping over many details that a real application needs to respect. This post is just to get a grasp of the API, it's not a tutorial!
First steps
You initialise Vulkan by creating an instance (VkInstance). The instance is an entirely isolated silo of Vulkan - instances do not know about each other in any way. At this point you specify some simple information, including which layers and extensions you want to activate - there are query functions that let you enumerate what layers and extensions are available.
With a VkInstance, you can now examine the GPUs available. A given Vulkan implementation might not be running on a GPU, but let's keep things simple. Each GPU gives you a handle - VkPhysicalDevice. You can query each GPU's name, properties, capabilities, etc. For example see vkGetPhysicalDeviceProperties and vkGetPhysicalDeviceFeatures.
With a VkPhysicalDevice, you can create a VkDevice. The VkDevice is your main handle and it represents a logical connection - i.e. 'I am running Vulkan on this GPU'. VkDevice is used for pretty much everything else. This is the equivalent of a GL context or D3D11 device.
N.B. Each of these is a 1:many relationship. A VkInstance can have many VkPhysicalDevices, and a VkPhysicalDevice can have many VkDevices. In Vulkan 1.0 there is no cross-GPU activity, but you can bet this will come in the future.
I'm hand waving some book-keeping details; Vulkan in general is quite lengthy in setup due to its explicit nature, and this is a summary not an implementation guide. The overall picture is that your initialisation mostly looks like vkCreateInstance() → vkEnumeratePhysicalDevices() → vkCreateDevice(). For a quick and dirty hello world triangle program, you can do just that and pick the first physical device, then come back to it once you want error reporting & validation, enabling optional device features, etc.
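To make that concrete, here's a rough sketch of that initialisation flow inside your init code - no error handling, no layers or extensions enabled, and queue family selection simplified to 'take the first one that does graphics'. The names are just my own for illustration:

#include <vulkan/vulkan.h>

VkApplicationInfo appInfo = {
    VK_STRUCTURE_TYPE_APPLICATION_INFO, NULL,
    "HelloTriangle", 1, "NoEngine", 1, VK_API_VERSION_1_0,
};
VkInstanceCreateInfo instInfo = {
    VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO, NULL, 0,
    &appInfo,
    0, NULL,   // layers - enable validation layers here in a real app
    0, NULL,   // extensions - e.g. surface extensions for presentation
};
VkInstance instance;
vkCreateInstance(&instInfo, NULL, &instance);

// grab the first physical device - a real app should inspect them all
uint32_t physCount = 1;
VkPhysicalDevice phys;
vkEnumeratePhysicalDevices(instance, &physCount, &phys);

// find a queue family that supports graphics
uint32_t familyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(phys, &familyCount, NULL);
if(familyCount > 16) familyCount = 16;
VkQueueFamilyProperties families[16];
vkGetPhysicalDeviceQueueFamilyProperties(phys, &familyCount, families);
uint32_t gfxFamily = 0;
for(uint32_t i = 0; i < familyCount; i++)
    if(families[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) { gfxFamily = i; break; }

// create a logical device with one queue from that family
float priority = 1.0f;
VkDeviceQueueCreateInfo queueInfo = {
    VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, NULL, 0,
    gfxFamily, 1, &priority,
};
VkDeviceCreateInfo devInfo = {
    VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO, NULL, 0,
    1, &queueInfo,
    0, NULL, 0, NULL,   // layers/extensions - e.g. VK_KHR_swapchain for presenting
    NULL,               // optional features to enable
};
VkDevice device;
vkCreateDevice(phys, &devInfo, NULL, &device);

VkQueue queue;
vkGetDeviceQueue(device, gfxFamily, 0, &queue);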
Images and Buffers
Now that we have a VkDevice we can start creating pretty much every other resource type (a few have further dependencies on other objects), for example VkImage and VkBuffer.
For GL people, one kind of new concept is that you must declare at creation time how an image will be used. You provide a bit field, with each bit indicating a certain type of usage - color attachment, or sampled image in shader, or image load/store, etc.
You also specify the tiling for the image - LINEAR or OPTIMAL. This specifies the tiling/swizzling layout for the image data in memory. OPTIMAL tiled images are opaquely tiled, LINEAR are laid out just as you expect. This affects whether the image data is directly readable/writable, as well as format support - drivers report image support in terms of 'what image types are supported in OPTIMAL tiling, and what image types are supported in LINEAR'. Be prepared for very limited LINEAR support.
Buffers are similar and more straightforward: you give them a size and a usage, and that's about it.
Images aren't used directly, so you will have to create a VkImageView - this is familiar to D3D11 people. Unlike GL texture views, image views are mandatory, but they are the same idea - a description of what array slices or mip levels are visible to wherever the image view is used, and optionally a different (but compatible) format (like aliasing a UNORM texture as UINT).
Buffers are usually used directly as they're just a block of memory, but if you want to use them as a texel buffer in a shader, you need to provide a VkBufferView.
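As a rough sketch of what that looks like, here's a 2D color image we might render to and then sample, plus a view onto it. I'm assuming the VkDevice from earlier, the size and format are arbitrary, and note that memory must be allocated and bound (next two sections) before the view is actually created and used:

// a 2D RGBA8 image usable as a color attachment and for sampling
VkImageCreateInfo imageInfo = {
    VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO, NULL, 0,
    VK_IMAGE_TYPE_2D, VK_FORMAT_R8G8B8A8_UNORM,
    { 1280, 720, 1 },                 // extent
    1, 1,                             // mip levels, array layers
    VK_SAMPLE_COUNT_1_BIT, VK_IMAGE_TILING_OPTIMAL,
    VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_SAMPLED_BIT,
    VK_SHARING_MODE_EXCLUSIVE, 0, NULL,
    VK_IMAGE_LAYOUT_UNDEFINED,        // initial layout (see 'Image layouts' below)
};
VkImage image;
vkCreateImage(device, &imageInfo, NULL, &image);

// ... allocate and bind memory for the image here (see the next two sections) ...

VkImageViewCreateInfo viewInfo = {
    VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO, NULL, 0,
    image, VK_IMAGE_VIEW_TYPE_2D, VK_FORMAT_R8G8B8A8_UNORM,
    { 0, 0, 0, 0 },                              // identity component swizzle
    { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },   // all mips and layers
};
VkImageView view;
vkCreateImageView(device, &viewInfo, NULL, &view);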
Allocating GPU Memory
Those buffers and images can't be used immediately after creation as no memory has been allocated for them. This step is up to you.
Available memory is exposed to applications via vkGetPhysicalDeviceMemoryProperties(). It reports one or more memory heaps of given sizes, and one or more memory types with given properties. Each memory type comes from one heap - so a typical example for a discrete GPU on a PC would be two heaps - one for system RAM and one for GPU RAM - with multiple memory types from each.
The memory types have different properties. Some will be CPU visible or not, coherent between GPU and CPU access, cached or uncached, etc. You can find out all of these properties by querying from the physical device. This allows you to choose the memory type you want. E.g. staging resources will need to be in host visible memory, but the images you render to will want to be in device local memory for optimal use. However, there is an additional restriction on memory selection that we'll get to in the next section.
To allocate memory you call vkAllocateMemory(), which requires your VkDevice handle and a description structure. The structure dictates which type of memory to allocate from which heap and how much to allocate, and the call returns a VkDeviceMemory handle.
Host visible memory can be mapped for update - vkMapMemory()/vkUnmapMemory() are familiar functions. All maps are by definition persistent, and as long as you synchronise it's legal to have memory mapped while in use by the GPU.
GL people will be familiar with the concept, but to explain for D3D11 people - the pointers returned by vkMapMemory() can be held and even written to by the CPU while the GPU is using them. These 'persistent' maps are perfectly valid as long as you obey the rules and make sure to synchronise access so that the CPU isn't writing to parts of the memory allocation that the GPU is using (see later).
This is a little outside the scope of this guide but I'm going to mention it any chance I get - for the purposes of debugging, persistent maps of non-coherent memory with explicit region flushes will be much more efficient/fast than coherent memory. The reason being that for coherent memory the debugger must jump through hoops to detect and track changes, but the explicit flushes of non-coherent memory provide nice markup of modifications.
In RenderDoc, to help out with this, if you flush a memory region then the tool assumes you will flush for every write, and turns off the expensive hoop-jumping to track coherent memory. That way even if the only memory available is coherent, you can get efficient debugging.
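For illustration, here's a rough sketch of allocating some host visible memory, mapping it, and flushing a write explicitly. It's a hypothetical helper of my own, it does no error handling, it naively takes the first host visible type, and it ignores the resource requirements discussed in the next section:

#include <string.h>   // for memcpy

// allocate 'size' bytes of host visible memory and write 'data' into it
static VkDeviceMemory allocateAndFillHostMemory(VkPhysicalDevice phys, VkDevice device,
                                                VkDeviceSize size, const void *data)
{
    VkPhysicalDeviceMemoryProperties memProps;
    vkGetPhysicalDeviceMemoryProperties(phys, &memProps);

    // pick the first memory type that is host visible
    uint32_t typeIndex = 0;
    for(uint32_t i = 0; i < memProps.memoryTypeCount; i++)
    {
        if(memProps.memoryTypes[i].propertyFlags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)
        {
            typeIndex = i;
            break;
        }
    }

    VkMemoryAllocateInfo allocInfo = {
        VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, NULL, size, typeIndex,
    };
    VkDeviceMemory memory;
    vkAllocateMemory(device, &allocInfo, NULL, &memory);

    // persistent map - it can legally stay mapped while the GPU uses it, if synchronised
    void *ptr = NULL;
    vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, &ptr);
    memcpy(ptr, data, (size_t)size);

    // if the memory type isn't HOST_COHERENT, the write must be flushed explicitly
    VkMappedMemoryRange range = {
        VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE, NULL, memory, 0, VK_WHOLE_SIZE,
    };
    vkFlushMappedMemoryRanges(device, 1, &range);

    return memory;
}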
Binding Memory
Each VkBuffer or VkImage, depending on its properties like usage flags and tiling mode (remember that one?), will report its memory requirements to you via vkGetBufferMemoryRequirements or vkGetImageMemoryRequirements.
The reported size requirement will account for padding for alignment between mips, hidden meta-data, and anything else needed for the total allocation. The requirements also include a bitmask of the memory types that are compatible with this particular resource. The obvious restrictions kick in here: that OPTIMAL tiling color attachment image will report that only DEVICE_LOCAL memory types are compatible, and it will be invalid to try to bind some HOST_VISIBLE memory.
The memory type requirements generally won't vary if you have the same kind of image or buffer. For example, if you know that optimally tiled images can go in memory type 3, you can allocate all of them from the same place. You will only have to check the size and alignment requirements per-image. Read the spec for the exact guarantee here!
Note the memory allocation is by no means 1:1. You can allocate a large amount of memory and, as long as you obey the above restrictions, you can place several images or buffers in it at different offsets. The requirements include an alignment if you are placing the resource at a non-zero offset. In fact you will definitely want to do this in any real application, as there are limits on the total number of allocations allowed.
There is an additional alignment requirement, bufferImageGranularity - a minimum separation required between memory used for a VkImage and memory used for a VkBuffer in the same VkDeviceMemory. Read the spec for more details, but this mostly boils down to an effective page size, and a requirement that each page is only used for one type of resource.
Once you have the right memory type, size and alignment, you can bind it with vkBindBufferMemory or vkBindImageMemory. This binding is immutable, and must happen before you start using the buffer or image.
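Putting the last two sections together, a dedicated allocation for a single buffer might look roughly like this - a hypothetical helper with no error handling; a real application would suballocate from larger blocks instead of allocating per resource:

// back 'buffer' with a fresh allocation of a memory type with the wanted properties
static VkDeviceMemory allocateAndBind(VkPhysicalDevice phys, VkDevice device,
                                      VkBuffer buffer, VkMemoryPropertyFlags wanted)
{
    VkMemoryRequirements reqs;
    vkGetBufferMemoryRequirements(device, buffer, &reqs);

    VkPhysicalDeviceMemoryProperties memProps;
    vkGetPhysicalDeviceMemoryProperties(phys, &memProps);

    // find a memory type that is both compatible with the resource (reqs.memoryTypeBits)
    // and has the properties we want (e.g. DEVICE_LOCAL or HOST_VISIBLE)
    uint32_t typeIndex = 0;
    for(uint32_t i = 0; i < memProps.memoryTypeCount; i++)
    {
        if((reqs.memoryTypeBits & (1u << i)) &&
           (memProps.memoryTypes[i].propertyFlags & wanted) == wanted)
        {
            typeIndex = i;
            break;
        }
    }

    VkMemoryAllocateInfo allocInfo = {
        VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, NULL, reqs.size, typeIndex,
    };
    VkDeviceMemory memory;
    vkAllocateMemory(device, &allocInfo, NULL, &memory);

    // offset 0 trivially satisfies reqs.alignment; a suballocator must respect it
    vkBindBufferMemory(device, buffer, memory, 0);
    return memory;
}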
Command buffers and submission
Work is explicitly recorded to and submitted from a VkCommandBuffer.
A VkCommandBuffer isn't created directly, it is allocated from a VkCommandPool. This allows for better threading behaviour, since command buffers and command pools must be externally synchronised (see later). You can have a pool per thread and allocate/free command buffers from it with vkAllocateCommandBuffers()/vkFreeCommandBuffers() without heavy locking.
Once you have a VkCommandBuffer you begin recording, issue all your GPU commands into it *hand waving goes here* and end recording.
Command buffers are submitted to a VkQueue. Queues are how work is serialised to be passed to the GPU. A VkPhysicalDevice (remember way back? The GPU handle) can report a number of queue families with different capabilities, e.g. a graphics queue family and a compute-only queue family. When you create your device you ask for a certain number of queues from each family, and then you can enumerate them from the device after creation with vkGetDeviceQueue().
I'm going to focus on having just a single do-everything VkQueue as the simple case, since multiple queues must be synchronised against each other as they can run out of order or in parallel to each other. Be aware that some implementations might require you to use a separate queue for swapchain presentation - I think chances are that most won't, but you have to account for this. Again, read the spec for details!
You can vkQueueSubmit() several command buffers at once to the queue and they will be executed in turn. Nominally this defines the order of execution, but remember that Vulkan has very specific ordering guarantees - mostly about what work can overlap rather than wholesale rearrangement - so take care to read the spec to make sure you synchronise everything correctly.
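The skeleton of that flow looks roughly like this, reusing the device, queue and gfxFamily from the earlier sketch. No error handling, and a real application would not wait idle after every submit - that's just to keep the sketch honest:

// create a pool for the queue family we'll submit to (typically one pool per thread)
VkCommandPoolCreateInfo poolInfo = {
    VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO, NULL,
    VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT, gfxFamily,
};
VkCommandPool pool;
vkCreateCommandPool(device, &poolInfo, NULL, &pool);

// allocate a primary command buffer from the pool
VkCommandBufferAllocateInfo cmdAllocInfo = {
    VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, NULL,
    pool, VK_COMMAND_BUFFER_LEVEL_PRIMARY, 1,
};
VkCommandBuffer cmd;
vkAllocateCommandBuffers(device, &cmdAllocInfo, &cmd);

// record
VkCommandBufferBeginInfo beginInfo = {
    VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO, NULL,
    VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT, NULL,
};
vkBeginCommandBuffer(cmd, &beginInfo);
// ... vkCmd* calls go here ...
vkEndCommandBuffer(cmd);

// submit to the queue (see the synchronisation section for fences/semaphores)
VkSubmitInfo submit = {
    VK_STRUCTURE_TYPE_SUBMIT_INFO, NULL,
    0, NULL, NULL,   // wait semaphores
    1, &cmd,
    0, NULL,         // signal semaphores
};
vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
vkQueueWaitIdle(queue);   // brute-force wait, just for the sketch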
Shaders and Pipeline State Objects
The reasoning behind moving to monolithic PSOs is well trodden by now, so I won't go over it.
A Vulkan VkPipeline bakes in a lot of state, but allows specific parts of the fixed function pipeline to be set dynamically: things like viewport, stencil masks and refs, blend constants, etc. A full list, as ever, is in the spec. When you call vkCreateGraphicsPipelines(), you choose which states will be dynamic, and the others are taken from values specified in the PSO creation info.
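For example, marking viewport and scissor as dynamic looks roughly like this - the rest of the (large) VkGraphicsPipelineCreateInfo is omitted, and cmd is the command buffer from the earlier sketch:

// declared at pipeline creation time, via VkGraphicsPipelineCreateInfo::pDynamicState
VkDynamicState dynamics[] = { VK_DYNAMIC_STATE_VIEWPORT, VK_DYNAMIC_STATE_SCISSOR };
VkPipelineDynamicStateCreateInfo dynamicState = {
    VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO, NULL, 0,
    2, dynamics,
};

// ... then set at record time, after binding the pipeline
VkViewport viewport = { 0.0f, 0.0f, 1280.0f, 720.0f, 0.0f, 1.0f };
VkRect2D scissor = { { 0, 0 }, { 1280, 720 } };
vkCmdSetViewport(cmd, 0, 1, &viewport);
vkCmdSetScissor(cmd, 0, 1, &scissor);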
You can optionally specify a VkPipelineCache at creation time. This allows you to compile a whole bunch of pipelines and then call vkGetPipelineCacheData() to save the blob of data to disk. Next time you can prepopulate the cache to save on PSO creation time. The expected caveats apply - there is versioning to be aware of, so you can't load out of date or incorrect caches.
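Saving the cache out might look something like this sketch - loading is the reverse, reading the file back and passing it as pInitialData, and you're expected to check the cache header (vendor/device IDs and UUID) yourself before trusting old data. File handling is plain C stdio here just for illustration:

#include <stdio.h>
#include <stdlib.h>

// create an (initially empty) cache and use it for all pipeline creation
VkPipelineCacheCreateInfo cacheInfo = {
    VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO, NULL, 0,
    0, NULL,   // pass previously saved data here to prepopulate
};
VkPipelineCache cache;
vkCreatePipelineCache(device, &cacheInfo, NULL, &cache);

// ... vkCreateGraphicsPipelines(device, cache, ...) for all your PSOs ...

// query the size, then the data, and write it to disk
size_t dataSize = 0;
vkGetPipelineCacheData(device, cache, &dataSize, NULL);
void *data = malloc(dataSize);
vkGetPipelineCacheData(device, cache, &dataSize, data);

FILE *f = fopen("pipeline.cache", "wb");
fwrite(data, 1, dataSize, f);
fclose(f);
free(data);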
Shaders are specified as SPIR-V. This has already been discussed much better elsewhere, so I will just say that you create a VkShaderModule from a SPIR-V module, which could contain several entry points, and at pipeline creation time you choose one particular entry point.
The easiest way to get some SPIR-V for testing is with the reference compiler glslang, but other front-ends are available, as well as LLVM → SPIR-V support.
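Given a blob of SPIR-V words (however you produced them), module creation and entry point selection look roughly like this - 'code' and 'codeSizeInBytes' are my own placeholder names for the SPIR-V you loaded from disk or embedded:

// create the module from the raw SPIR-V words
VkShaderModuleCreateInfo moduleInfo = {
    VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO, NULL, 0,
    codeSizeInBytes, code,   // size in bytes, pointer to uint32_t words
};
VkShaderModule module;
vkCreateShaderModule(device, &moduleInfo, NULL, &module);

// the entry point is picked per stage at pipeline creation time
VkPipelineShaderStageCreateInfo stage = {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO, NULL, 0,
    VK_SHADER_STAGE_VERTEX_BIT, module,
    "main",   // entry point name within the SPIR-V module
    NULL,     // specialisation constants
};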
Binding Model
To establish a point of reference, let's roughly outline D3D11's binding model. GL's is quite similar.
Each shader stage has its own namespace, so pixel shader texture binding 0 is not vertex shader texture binding 0.
Each resource type is namespaced apart, so constant buffer binding 0 is definitely not the same as texture binding 0.
Resources are individually bound and unbound to slots (or at best in contiguous batches).
In Vulkan, the base binding unit is a descriptor. A descriptor is an opaque representation that stores 'one bind'. This could be an image, a sampler, a uniform/constant buffer, etc. It could also be arrayed - so you can have an array of images that can be different sizes etc, as long as they are all 2D floating point images.
Descriptors aren't bound individually, they are bound in blocks in a VkDescriptorSet, which each have a particular VkDescriptorSetLayout. The VkDescriptorSetLayout describes the types of the individual bindings in each VkDescriptorSet.
The easiest way I find to think about this is to consider VkDescriptorSetLayout as being like a C struct type - it describes some members, each member having an opaque type (constant buffer, load/store image, etc). The VkDescriptorSet is a specific instance of that type - and each member in the VkDescriptorSet is a binding you can update with whichever resource you want it to contain.
This is roughly how you create the objects too. You pass a list of the types, array sizes and bindings to Vulkan to create a VkDescriptorSetLayout, then you can allocate VkDescriptorSets with that layout from a VkDescriptorPool. The pool acts the same way as VkCommandPool, to let you allocate descriptors on different threads more efficiently by having a pool per thread.
VkDescriptorSetLayoutBinding bindings[] = {
    // binding 0 is a UBO, array size 1, visible to all stages
    { 0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 1, VK_SHADER_STAGE_ALL_GRAPHICS, NULL },
    // binding 1 is a sampler, array size 1, visible to all stages
    { 1, VK_DESCRIPTOR_TYPE_SAMPLER, 1, VK_SHADER_STAGE_ALL_GRAPHICS, NULL },
    // binding 5 is an image, array size 10, visible only to fragment shader
    { 5, VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, 10, VK_SHADER_STAGE_FRAGMENT_BIT, NULL },
};
Example C++ outlining creation of a descriptor set layout
Once you have a descriptor set, you can update it directly to put specific values in the bindings, and also copy between different descriptor sets.
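Continuing from the bindings array above, here's a rough sketch of creating the layout, allocating a set from a pool, and pointing binding 0 at a uniform buffer. The pool sizes are trimmed to just what this one set needs, there's no error handling, and 'ubo' is an assumed pre-existing VkBuffer:

// create the layout from the bindings array above
VkDescriptorSetLayoutCreateInfo layoutInfo = {
    VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO, NULL, 0,
    3, bindings,
};
VkDescriptorSetLayout setLayout;
vkCreateDescriptorSetLayout(device, &layoutInfo, NULL, &setLayout);

// a pool with enough descriptors of each type for one such set
VkDescriptorPoolSize poolSizes[] = {
    { VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 1 },
    { VK_DESCRIPTOR_TYPE_SAMPLER, 1 },
    { VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, 10 },
};
VkDescriptorPoolCreateInfo descPoolInfo = {
    VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO, NULL, 0,
    1,             // maxSets
    3, poolSizes,
};
VkDescriptorPool descPool;
vkCreateDescriptorPool(device, &descPoolInfo, NULL, &descPool);

// allocate a set of that layout from the pool
VkDescriptorSetAllocateInfo setAlloc = {
    VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO, NULL,
    descPool, 1, &setLayout,
};
VkDescriptorSet set;
vkAllocateDescriptorSets(device, &setAlloc, &set);

// point binding 0 at the uniform buffer
VkDescriptorBufferInfo bufInfo = { ubo, 0, VK_WHOLE_SIZE };
VkWriteDescriptorSet write = {
    VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET, NULL,
    set, 0, 0,      // set, binding, first array element
    1, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
    NULL, &bufInfo, NULL,
};
vkUpdateDescriptorSets(device, 1, &write, 0, NULL);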
When creating a pipeline, you specify N VkDescriptorSetLayouts for use in a VkPipelineLayout. Then when binding, you have to bind matching VkDescriptorSets of those layouts. The sets can be updated and bound at different frequencies, which allows grouping all resources by frequency of update.
To extend the above analogy, this defines the pipeline as something like a function, and it can take some number of structs as arguments. When creating the pipeline you declare the types (VkDescriptorSetLayouts) of each argument, and when binding the pipeline you pass specific instances of those types (VkDescriptorSets). A sketch of both halves follows below.
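In code, that 'function signature' and 'call' look roughly like this, continuing with the setLayout, set and cmd from the sketches above:

// the 'function signature': which set layouts the pipeline expects
VkPipelineLayoutCreateInfo pipeLayoutInfo = {
    VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO, NULL, 0,
    1, &setLayout,
    0, NULL,       // push constant ranges
};
VkPipelineLayout pipeLayout;
vkCreatePipelineLayout(device, &pipeLayoutInfo, NULL, &pipeLayout);

// ... VkGraphicsPipelineCreateInfo::layout = pipeLayout when creating the pipeline ...

// the 'call': bind a concrete set of that layout while recording
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeLayout,
                        0,           // first set
                        1, &set,     // the descriptor sets to bind
                        0, NULL);    // dynamic offsets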
The other side of the equation is fairly simple - instead of having shader- or type-namespaced bindings in your shader code, each resource in the shader simply says which descriptor set and binding it pulls from. This matches the descriptor set layout you created.
#version 430

layout(set = 0, binding = 0) uniform MyUniformBufferType {
// ...
} MyUniformBufferInstance;

// note in the C++ sample above, this is just a sampler - not a combined image+sampler
// as is typical in GL.
layout(set = 0, binding = 1) uniform sampler MySampler;

// a sampled image (texture2D in Vulkan GLSL), matching VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE above
layout(set = 0, binding = 5) uniform texture2D MyImages[10];
Example GLSL showing bindings
Synchronisation
I'm going to hand wave a lot in this section because the specific things you need to synchronise get complicated and long-winded fast, and I'm just going to focus on what synchronisation is available and leave the details of what you need to synchronise to reading of specs or more in-depth documents.
This is probably the hardest part of Vulkan to get right, especially since missing synchronisation might not necessarily break anything when you run it!
Several types of objects must be 'externally synchronised'. In fact I've used that phrase before in this post. The meaning is basically that if you try to use the same VkQueue on two different threads, there's no internal locking so it will crash - it's up to you to 'externally synchronise' access to that VkQueue.
For the exact requirements of which objects must be externally synchronised and when, you should check the spec. As a rule, you can use VkDevice for creation functions freely - it is locked for allocation's sake - but things like recording and submitting commands must be synchronised.
N.B. There is no explicit or implicit ref counting of any object - you can't destroy anything until you are sure it is never going to be used again by either the CPU or the GPU.
Vulkan has VkEvent, VkSemaphore and VkFence, which can be used for efficient CPU-GPU and GPU-GPU synchronisation. They work as you expect, so you can look up the precise use etc yourself, but there are no surprises here. Be careful that you do use synchronisation though, as there are few ordering guarantees in the spec itself.
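For example, the classic 'wait on the CPU until a submission has finished' pattern with a VkFence looks roughly like this, reusing the queue and VkSubmitInfo from the command buffer sketch above:

#include <stdint.h>   // for UINT64_MAX

VkFenceCreateInfo fenceInfo = {
    VK_STRUCTURE_TYPE_FENCE_CREATE_INFO, NULL, 0,
};
VkFence fence;
vkCreateFence(device, &fenceInfo, NULL, &fence);

// the fence is signalled when this submission's work completes on the GPU
vkQueueSubmit(queue, 1, &submit, fence);

// block the CPU until that happens (e.g. before reusing the command buffer)
vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &fence);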
Pipeline barriers are a new concept, used in general terms to ensure ordering of GPU-side operations where necessary - for example ensuring that results from one operation are complete before another operation starts, or that all work of one type finishes on a resource before it's used for work of another type.
There are three types of barrier - VkMemoryBarrier, VkBufferMemoryBarrier and VkImageMemoryBarrier. A VkMemoryBarrier applies to memory globally, and the other two apply to specific resources (and subsections of those resources).
The barrier takes a bit field of different memory access types to specify what operations on each side of the barrier should be synchronised against the other. A simple example of this would be "this VkImageMemoryBarrier has srcAccessMask = ACCESS_COLOR_ATTACHMENT_WRITE and dstAccessMask = ACCESS_SHADER_READ", which indicates that all color writes should finish before any shader reads begin - without this barrier in place, you could read stale data.
Image layouts
Image barriers have one additional property - images exist in states called image layouts. VkImageMemoryBarrier can specify a transition from one layout to another. The layout must match how the image is used at any time. There is a GENERAL layout which is legal to use for anything but might not be optimal, and there are optimal layouts for color attachment, depth attachment, shader sampling, etc.
Images begin in either the UNDEFINED or PREINITIALIZED state (you can choose). The latter is useful for populating an image with data before use, as the UNDEFINED layout has undefined contents - a transition from UNDEFINED to GENERAL may lose the contents, but PREINITIALIZED to GENERAL won't. Neither initial layout is valid for use by the GPU, so at minimum after creation an image needs to be transitioned into some appropriate state.
Usually you have to specify the previous and new layouts accurately, but it is always valid to transition from UNDEFINED to another layout. This basically means 'I don't care what the image was like before, throw it away and use it like this'.
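Putting the access masks and layouts together, the 'finish rendering to this image, then sample it in a shader' barrier from the example above might be recorded roughly like this, using the cmd and image from earlier sketches:

VkImageMemoryBarrier barrier = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER, NULL,
    VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,             // srcAccessMask: prior color writes
    VK_ACCESS_SHADER_READ_BIT,                        // dstAccessMask: subsequent shader reads
    VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,         // oldLayout
    VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,         // newLayout
    VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED, // no queue family transfer
    image,
    { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },        // whole image (1 mip, 1 layer)
};

vkCmdPipelineBarrier(cmd,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,    // wait for these stages...
    VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,            // ...before these stages run
    0,
    0, NULL,      // global memory barriers
    0, NULL,      // buffer barriers
    1, &barrier); // image barriers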
Render passes
A VkRenderPass is Vulkan's way of more explicitly denoting how your rendering happens, rather than letting you render into then sample images at will. More information about how the frame is structured will aid everyone, but primarily this is to aid tile based renderers, so that they have a direct notion of where rendering on a given target happens and what dependencies there are between passes, to avoid leaving tile memory as much as possible.
N.B. Because I primarily work on desktops (and for brevity & simplicity) I'm not mentioning a couple of optional things you can do that aren't commonly suited to desktop GPUs, like input and transient attachments. As always, read the spec :).
The first building block is a VkFramebuffer, which is a set of VkImageViews. This is not necessarily the same as the classic idea of a framebuffer as the particular images you are rendering to at any given point, as it can contain potentially more images than you ever render to at once.
A VkRenderPass consists of a series of subpasses. In your simple triangle case, and possibly in many other cases, this will just be one subpass. For now, let's just consider that case. The subpass selects some of the framebuffer attachments as color attachments and maybe one as a depth-stencil attachment. If you have multiple subpasses, this is where you might have different subsets used in each subpass - sometimes as output and sometimes as input.
Drawing commands can only happen inside a VkRenderPass, and some commands such as copies and clears can only happen outside a VkRenderPass. Some commands such as state binding can happen inside or outside at will. Consult the spec to see which commands are which.
Subpasses do not inherit state at all, so each time you start a VkRenderPass or move to a new subpass you have to bind/set all of the state. Subpasses also specify an action both for loading and storing each attachment. This allows you to say 'the depth should be cleared to 1.0, but the color can be initialised to garbage for all I care - I'm going to fully overwrite the screen in this pass'. Again, this can provide useful optimisation information that the driver no longer has to guess.
The last consideration is compatibility between these different objects. When you create a VkRenderPass (and all of its subpasses) you don't reference anything else, but you do specify both the format and use of all attachments. Then when you create a VkFramebuffer you must choose a VkRenderPass that it will be used with. This doesn't have to be the exact instance that you will later use, but it does have to be compatible - the same number and format of attachments. Similarly when creating a VkPipeline you have to specify the VkRenderPass and subpass that it will be used with, again not having to be identical but required to be compatible.
There are more complexities to consider if you have multiple subpasses within your render pass, as you have to declare barriers and dependencies between them, and annotate which attachments must be used for what. Again, if you're looking into that, read the spec.
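A single-subpass render pass over one color attachment, plus a matching framebuffer, might look roughly like this sketch. It assumes we're rendering straight into a swapchain image view (see the next section) whose format happens to be R8G8B8A8_UNORM, and it omits subpass dependencies and error handling:

// one color attachment: clear on load, keep the result, end up ready to present
VkAttachmentDescription colorAttach = {
    0, VK_FORMAT_R8G8B8A8_UNORM, VK_SAMPLE_COUNT_1_BIT,
    VK_ATTACHMENT_LOAD_OP_CLEAR, VK_ATTACHMENT_STORE_OP_STORE,
    VK_ATTACHMENT_LOAD_OP_DONT_CARE, VK_ATTACHMENT_STORE_OP_DONT_CARE,  // stencil
    VK_IMAGE_LAYOUT_UNDEFINED,          // don't care what was there before
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,    // layout to transition to at the end
};

// the single subpass renders to that attachment as color output 0
VkAttachmentReference colorRef = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
VkSubpassDescription subpass = {
    0, VK_PIPELINE_BIND_POINT_GRAPHICS,
    0, NULL,           // input attachments
    1, &colorRef,      // color attachments
    NULL, NULL,        // resolve, depth-stencil
    0, NULL,           // preserve attachments
};

VkRenderPassCreateInfo rpInfo = {
    VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO, NULL, 0,
    1, &colorAttach,
    1, &subpass,
    0, NULL,           // subpass dependencies
};
VkRenderPass renderPass;
vkCreateRenderPass(device, &rpInfo, NULL, &renderPass);

// a framebuffer bundling the image view(s), created against a compatible render pass
VkFramebufferCreateInfo fbInfo = {
    VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO, NULL, 0,
    renderPass,
    1, &view,
    1280, 720, 1,      // width, height, layers
};
VkFramebuffer framebuffer;
vkCreateFramebuffer(device, &fbInfo, NULL, &framebuffer);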
Backbuffers and presentation
I'm only going to talk about this fairly briefly, because not only is it platform-specific but it's fairly straightforward.
Note that Vulkan exposes native window system integration via extensions, so you will have to request them explicitly when you create your VkInstance and VkDevice.
To start with, you create a VkSurfaceKHR from whatever native windowing information is needed.
Once you have a surface you can create a VkSwapchainKHR for that surface. You'll need to query for things like what formats are supported on that surface, how many backbuffers you can have in the chain, etc.
You can then obtain the actual images in the VkSwapchainKHR via vkGetSwapchainImagesKHR(). These are normal VkImage handles, but you don't control their creation or memory binding - that's all done for you. You will have to create a VkImageView for each, though.
When you want to render to one of the images in the swapchain, you can call vkAcquireNextImageKHR(), which will return to you the index of the next image in the chain. You can render to it and then call vkQueuePresentKHR() with the same index to have it presented to the display.
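The per-frame skeleton then looks roughly like this sketch, assuming an existing VkSwapchainKHR 'swapchain' and two VkSemaphores ('acquireSem' and 'renderSem', created with vkCreateSemaphore) so the GPU waits for the acquire before rendering, and the present waits for the rendering. Error handling and the per-frame fence are omitted:

uint32_t imageIndex = 0;
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, acquireSem, VK_NULL_HANDLE, &imageIndex);

// submit rendering: wait for the image to be acquired, signal when rendering is done
VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
VkSubmitInfo frameSubmit = {
    VK_STRUCTURE_TYPE_SUBMIT_INFO, NULL,
    1, &acquireSem, &waitStage,
    1, &cmd,
    1, &renderSem,
};
vkQueueSubmit(queue, 1, &frameSubmit, VK_NULL_HANDLE);

// present the image once rendering has finished
VkPresentInfoKHR presentInfo = {
    VK_STRUCTURE_TYPE_PRESENT_INFO_KHR, NULL,
    1, &renderSem,
    1, &swapchain, &imageIndex,
    NULL,
};
vkQueuePresentKHR(queue, &presentInfo);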
There are many more subtleties and details if you want to get really optimal use out of the swapchain, but for the dead-simple hello world case, the above suffices.
Conclusion
Hopefully you're still with me after that rather break-neck pace.
As promised I've skipped a lot of details and skimmed over some complexities, for example I have completely failed to mention sparse resources support, primary and secondary command buffers, and I've probably missed some other cool things.
With any luck though you have the broad-strokes impression of how a simple Vulkan application is put together, and you're in a better place to go look at some documentation and figure the rest out for yourself.
Any questions or comments, let me know on twitter or email. In particular if anything is actually wrong I will correct it, as I don't want to mislead with this document - just set up a basic understanding that can be expanded on with further reading.
Also just to plug myself a little, if you need a graphics debugger for Vulkan consider giving RenderDoc a try, and let me know if you have any problems.
Happy hacking!