9. Command Recording and Drawing

[TOC]

Introduction

Contents

Clearing a color image

Clearing a depth-stencil image

Clearing render pass attachments

Binding vertex buffers

Binding an index buffer

Providing data to shaders through push constants

Setting viewport state dynamically

Setting scissor state dynamically

Setting line width state dynamically

Setting depth bias state dynamically

Setting blend constants state dynamically

Drawing a geometry

Drawing an indexed geometry

Dispatching compute work

Executing a secondary command buffer inside a primary command buffer

Recording a command buffer that draws a geometry with a dynamic viewport
and scissor states

Recording command buffers on multiple threads

Preparing a single frame of animation

Increasing performance through increasing the number of separately rendered
frames

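Overview

Vulkan was designed as a graphics and compute API. Its main purpose is to let us generate dynamic images using graphics hardware produced by multiple vendors.

We already know how to create and manage resources and how to use them in shaders. We have also covered the different shader stages and the pipeline objects that control rendering state or dispatch computational work. The last missing piece is knowing how to actually draw images.

This article discusses commands. We first learn the drawing commands and how to organize them in our source code to get the best performance. We finish with one of the most powerful capabilities of the Vulkan API: recording command buffers on multiple threads.

Preparation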

Clearing a color image

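In Vulkan, an attachment is usually cleared by setting loadOp to VK_ATTACHMENT_LOAD_OP_CLEAR in the render pass's attachment description.

Sometimes we cannot or do not want to clear an image this way and have to clear it explicitly, with a command recorded outside of a render pass: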


vkCmdClearColorImage( command_buffer, image, image_layout, &clear_color,
                     static_cast<uint32_t>(image_subresource_ranges.size()),
                     image_subresource_ranges.data() );

We provide the image's handle, its layout, and an array of subresource ranges (mipmap levels and/or array layers).

This command can only clear color images, and only images created with transfer destination usage.

Clearing a depth-stencil image


vkCmdClearDepthStencilImage( command_buffer, image, image_layout,
                            &clear_value, static_cast<uint32_t>(image_subresource_ranges.size()),
                            image_subresource_ranges.data() );

VkClearDepthStencilValue

depth: the value used when the depth aspect should be cleared

stencil: the value used to clear the stencil aspect

Clearing render pass attachments

vkCmdClearAttachments

Sometimes we need to clear the attachments of a subpass while a render pass is in progress.


vkCmdClearAttachments( command_buffer,
                      static_cast<uint32_t>(attachments.size()), attachments.data(),
                      static_cast<uint32_t>(rects.size()), rects.data() );

VkClearAttachment

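aspectMask: the aspect of the attachment to clear (color, depth, stencil)

colorAttachment: when aspectMask is VK_IMAGE_ASPECT_COLOR_BIT, the index of the color attachment in the current subpass; otherwise ignored

clearValue: the value to clear with

VkClearRect

rect: top-left corner, width, and height of the area to clear

baseArrayLayer, layerCount: the range of layers to clear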

Binding vertex buffers

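To draw geometry we need to provide vertex data. At the very least this means vertex positions (strictly speaking even those are optional, since they can be generated in a shader). Other typical attributes are normals, tangents/bitangents, colors, and texture coordinates. This data comes from buffers created with vertex buffer usage, and those buffers must be bound before the draw call.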

VertexBufferParameters

struct VertexBufferParameters {
    VkBuffer Buffer;
    VkDeviceSize MemoryOffset;
};

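The buffers to bind are collected in a std::vector<VertexBufferParameters> named buffers_parameters, which is then split into the two parallel arrays the API expects: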

std::vector<VkBuffer> buffers;
std::vector<VkDeviceSize> offsets;
for( auto & buffer_parameters : buffers_parameters ) {
    buffers.push_back( buffer_parameters.Buffer );
    offsets.push_back( buffer_parameters.MemoryOffset );
}
vkCmdBindVertexBuffers( command_buffer, first_binding,
                       static_cast<uint32_t>(buffers_parameters.size()), buffers.data(),
                       offsets.data() );

Binding an index buffer

An index buffer must be created with index buffer usage. The index type is, for example, VK_INDEX_TYPE_UINT16 or VK_INDEX_TYPE_UINT32.

vkCmdBindIndexBuffer( command_buffer, buffer, memory_offset, index_type );

Providing data to shaders through push constants

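Most of the time we use descriptor sets to provide large amounts of data through buffers or images. To pass a small amount of data to shaders quickly and conveniently, we can use push constants.

vkCmdPushConstants( command_buffer,
                   pipeline_layout,
                   pipeline_stages,
                   offset, // must be a multiple of 4
                   size,   // must be a multiple of 4
                   data    // const void *
                   );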

Hardware must support at least 128 bytes of push constants (the maxPushConstantsSize limit).
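The constraints above (4-byte alignment of offset and size, and the device's size limit) can be checked before recording the command. The following is a minimal sketch; `IsValidPushConstantRange` is a hypothetical helper, not part of the Vulkan API, and the limit defaults to the guaranteed minimum of 128 bytes rather than a value queried from a real device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Vulkan function): checks the basic rules for
// a push constant update. max_push_constants_size stands in for the
// device's maxPushConstantsSize limit.
bool IsValidPushConstantRange( uint32_t offset, uint32_t size,
                               uint32_t max_push_constants_size = 128 ) {
    // Every Vulkan implementation guarantees at least 128 bytes.
    assert( max_push_constants_size >= 128 );

    if( (offset % 4) != 0 ) return false;                 // offset: multiple of 4
    if( (size == 0) || ((size % 4) != 0) ) return false;  // size: non-zero multiple of 4
    if( offset >= max_push_constants_size ) return false; // offset inside the limit
    return size <= (max_push_constants_size - offset);    // range fits the limit
}
```

In a real application the third argument would come from the physical device limits, and the check would typically live in debug builds only.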

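An example:

std::array<float, 4> color = { 0.0f, 0.7f, 0.4f, 0.1f };
ProvideDataToShadersThroughPushConstants( CommandBuffer, *PipelineLayout,
                                         VK_SHADER_STAGE_FRAGMENT_BIT, 0,
                                         static_cast<uint32_t>(sizeof( color[0] ) * color.size()),
                                         &color[0] );

// Thin wrapper that records the push constant update.
void ProvideDataToShadersThroughPushConstants( VkCommandBuffer command_buffer,
        VkPipelineLayout pipeline_layout, VkShaderStageFlags pipeline_stages,
        uint32_t offset, uint32_t size, void * data ) {
    vkCmdPushConstants( command_buffer, pipeline_layout, pipeline_stages,
                        offset, size, data );
}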

settings

Setting viewport state dynamically

VkViewport

x: x coordinate of the upper-left corner
y: y coordinate of the upper-left corner
width
height
minDepth, maxDepth: depth range of the viewport

vkCmdSetViewport( command_buffer, first_viewport,
                 static_cast<uint32_t>(viewports.size()), viewports.data() );

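Marking the viewport state as dynamic lets us change the viewport parameters while recording, but the number of viewports is still fixed when the pipeline is created.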

Setting scissor state dynamically

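The scissor test additionally restricts rendering to a rectangular region within the viewport dimensions. It is always enabled. The scissor rectangle can be set statically at pipeline creation or dynamically in a command buffer.

VkRect2D

offset.x: horizontal offset (in pixels) of the rectangle's upper-left corner

offset.y: vertical offset (in pixels) of the rectangle's upper-left corner

extent.width: width of the rectangle

extent.height: height of the rectangle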

vkCmdSetScissor
vkCmdSetScissor( command_buffer, first_scissor,
                static_cast<uint32_t>(scissors.size()), scissors.data() );
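To make the effect of the scissor rectangle concrete, here is a small sketch of the test itself: a fragment survives only if its coordinates fall inside the half-open rectangle defined by the offset and extent. The `Offset2D`, `Extent2D`, and `Rect2D` structs are stand-ins that mirror the Vulkan types so the sketch compiles without the Vulkan headers, and `PassesScissorTest` is a hypothetical name; the real test is performed by the hardware.

```cpp
#include <cstdint>

// Stand-ins mirroring VkOffset2D, VkExtent2D and VkRect2D (assumption:
// used here only so the example is self-contained).
struct Offset2D { int32_t x; int32_t y; };
struct Extent2D { uint32_t width; uint32_t height; };
struct Rect2D   { Offset2D offset; Extent2D extent; };

// Returns true if the fragment at (xf, yf) lies inside the scissor
// rectangle: offset <= coordinate < offset + extent on both axes.
bool PassesScissorTest( Rect2D const & scissor, int32_t xf, int32_t yf ) {
    // 64-bit arithmetic avoids overflow when adding the extent.
    int64_t const left   = scissor.offset.x;
    int64_t const top    = scissor.offset.y;
    int64_t const right  = left + scissor.extent.width;
    int64_t const bottom = top + scissor.extent.height;
    return (xf >= left) && (xf < right) && (yf >= top) && (yf < bottom);
}
```

Note that the right and bottom edges are exclusive, so a 100-pixel-wide rectangle starting at x = 10 covers columns 10 through 109.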

Setting line width state dynamically

vkCmdSetLineWidth( command_buffer, line_width );

Setting depth bias state dynamically

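Depth bias adjusts the depth value calculated for a fragment by adding an offset to it. It is typically used when drawing objects that lie very close to each other, such as pictures or posters on a wall, which would otherwise suffer from z-fighting.

Depth bias modifies the calculated depth value, that is, the value stored in the depth attachment, but it does not affect the rendered image in any other way, so objects do not appear nearer or farther. The modification is based on a constant factor and the fragment's slope. We also specify the maximum or minimum value (clamp) of the bias that can be applied.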

vkCmdSetDepthBias( command_buffer, constant_factor, clamp, slope_factor );
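How the three parameters combine can be sketched in a few lines. The formula follows the depth bias computation described in the Vulkan specification: the slope term plus the constant term, then an optional clamp. `ComputeDepthBias` is a hypothetical helper; `max_slope` (the polygon's maximum depth slope) and `r` (the minimum resolvable depth difference of the depth format) are values the hardware derives on its own and are passed in here as assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Sketch of the bias added to a fragment's depth:
//   bias = max_slope * slope_factor + r * constant_factor
// followed by clamping when clamp is non-zero.
float ComputeDepthBias( float max_slope, float r, float constant_factor,
                        float clamp, float slope_factor ) {
    assert( r > 0.0f ); // the resolvable difference is a positive quantity

    float const bias = max_slope * slope_factor + r * constant_factor;
    if( clamp > 0.0f ) return std::min( bias, clamp ); // upper limit
    if( clamp < 0.0f ) return std::max( bias, clamp ); // lower limit
    return bias;                                       // clamp == 0: no clamping
}
```

A positive clamp caps how far the depth can be pushed back on steep polygons, which is why it is often set together with a non-zero slope factor.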

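Setting blend constants state dynamically

Blending is used to simulate transparent objects. The final result is determined by the blend factors and blend operations we choose. The calculation can also use a constant color, and that constant color can be set dynamically.

vkCmdSetBlendConstants( command_buffer, blend_constants.data() );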

drawing

Drawing a geometry

vkCmdDraw( command_buffer,
          vertex_count,
          instance_count,
          first_vertex,  // useful when several models are stored in one vertex buffer
          first_instance
          );

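Instancing is useful for drawing the same mesh many times without changing the vertex data (see "Specifying pipeline vertex binding description, attribute description, and input state" in Chapter 8, Graphics and Compute Pipelines).

There is no default state in Vulkan.

Take descriptor sets or dynamic pipeline state as an example. Every time we record a command buffer, all the required descriptor sets must be bound in it. Likewise, every pipeline state marked as dynamic must be given a value through the corresponding function, and a render pass must be started in the appropriate command buffer.

Drawing can be performed only inside a render pass.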

Drawing an indexed geometry

This is the most commonly used way of drawing.

vkCmdDrawIndexed()

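Indexed drawing removes duplicated vertices at the cost of an additional index buffer. It pays off especially when each vertex carries a lot of extra data (normal, tangent, bitangent, two texture coordinates).

$\color{red}{\text{New concept (vertex reuse)}}$: indexed drawing lets the hardware reuse vertices that have already been processed and sit in the vertex cache. If the vertex for a given index has already been processed, the cached result is reused.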

vkCmdDrawIndexed( command_buffer, index_count, instance_count, first_index,
                  vertex_offset, first_instance );
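The data preparation behind indexed drawing can be illustrated on the CPU side: a flat triangle list is turned into a list of unique vertices plus an index list. This is a minimal sketch under simplifying assumptions; `Vertex`, `IndexedMesh`, and `BuildIndexedMesh` are hypothetical names, vertices carry only a position, and duplicates are detected by exact equality.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Vertex = std::array<float, 3>; // position only, for brevity

struct IndexedMesh {
    std::vector<Vertex>   Vertices; // unique vertices (vertex buffer contents)
    std::vector<uint32_t> Indices;  // one index per input vertex (index buffer contents)
};

// Converts a non-indexed triangle list into an indexed one by storing
// each distinct vertex once and referring to it by index afterwards.
IndexedMesh BuildIndexedMesh( std::vector<Vertex> const & triangle_list ) {
    assert( triangle_list.size() % 3 == 0 ); // three vertices per triangle

    IndexedMesh mesh;
    std::map<Vertex, uint32_t> seen; // vertex -> index of its first occurrence
    for( auto const & vertex : triangle_list ) {
        auto it = seen.find( vertex );
        if( it == seen.end() ) {
            auto const index = static_cast<uint32_t>(mesh.Vertices.size());
            it = seen.emplace( vertex, index ).first;
            mesh.Vertices.push_back( vertex );
        }
        mesh.Indices.push_back( it->second );
    }
    return mesh;
}
```

For a quad made of two triangles, six input vertices collapse to four unique ones; the index list has six entries, which is the index_count passed to the draw call.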

Dispatching compute work

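Compute work is performed with a compute pipeline.

Resources (buffers and images) are provided to compute shaders through descriptor sets, and only through them.

Compute work can be used for image post-processing such as color correction or blur, and for physics calculations.

Compute shaders are dispatched in groups.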

vkCmdDispatch( command_buffer, x_size, y_size, z_size );

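The x_size, y_size, and z_size parameters are the numbers of workgroups in each dimension.

Each of them must not exceed the corresponding element of the maxComputeWorkGroupCount[3] limit.

Hardware must support at least 65,535 workgroups in each dimension.

Compute work cannot be dispatched inside a render pass.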
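The workgroup counts passed to vkCmdDispatch are usually derived from the problem size and the local workgroup size declared in the shader. A minimal sketch of that arithmetic follows; `WorkgroupCount` and `FitsWorkgroupLimit` are hypothetical helpers, and the default limit is the guaranteed minimum rather than a value queried from a device.

```cpp
#include <cassert>
#include <cstdint>

// Number of workgroups needed to cover item_count items when each
// workgroup processes local_size items along this dimension
// (ceiling division, written to avoid integer overflow).
uint32_t WorkgroupCount( uint32_t item_count, uint32_t local_size ) {
    assert( local_size > 0 );
    return item_count / local_size + ((item_count % local_size) != 0 ? 1u : 0u);
}

// Checks one dimension against the maxComputeWorkGroupCount limit.
// 65535 is the minimum every implementation must support.
bool FitsWorkgroupLimit( uint32_t workgroup_count, uint32_t max_count = 65535 ) {
    return workgroup_count <= max_count;
}
```

For a 1920x1080 image processed with 16x16 workgroups this gives 120 by 68 groups; the last row of groups is only partly covered by the image, so the shader needs a bounds check.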

Executing a secondary command buffer inside a primary command buffer

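Vulkan lets us record two kinds of command buffers: primary and secondary. Primary command buffers can be submitted to queues directly. Secondary command buffers can only be executed from within primary command buffers.

vkCmdExecuteCommands

Primary command buffers are usually enough for rendering or compute work, but sometimes we want to split the work across both kinds of command buffer. When we want the graphics hardware to execute secondary command buffers, we record the following in the primary command buffer: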


vkCmdExecuteCommands( command_buffer,
                     static_cast<uint32_t>(secondary_command_buffers.size()),
                     secondary_command_buffers.data() );

example

Recording a command buffer that draws a geometry with a dynamic viewport and scissor states

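struct Mesh {
    std::vector<float> Data;      // raw vertex data
    struct Part {
        uint32_t VertexOffset;    // first vertex of this part
        uint32_t VertexCount;     // number of vertices in this part
    };
    std::vector<Part> Parts;      // iterated by the drawing loop
};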

if( !BeginCommandBufferRecordingOperation( command_buffer,
                                          VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT, nullptr ) ) {
    return false;
}

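If the presentation and graphics queue families differ, an image memory barrier transfers ownership of the swapchain image to the graphics queue family: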

if( present_queue_family_index != graphics_queue_family_index ) {
    ImageTransition image_transition_before_drawing = {
        swapchain_image,
        VK_ACCESS_MEMORY_READ_BIT,
        VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        present_queue_family_index,
        graphics_queue_family_index,
        VK_IMAGE_ASPECT_COLOR_BIT
    };
    SetImageMemoryBarrier( command_buffer, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                          VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, {
                              image_transition_before_drawing } );
}

Start the render pass and bind the pipeline object:

BeginRenderPass( command_buffer, render_pass, framebuffer, { { 0, 0 },
                                                            framebuffer_size }, clear_values, VK_SUBPASS_CONTENTS_INLINE );
BindPipelineObject( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
                   graphics_pipeline );

Set the dynamic states (viewport and scissor) and bind the buffers that hold the vertex data:

VkViewport viewport = {
    0.0f,
    0.0f,
    static_cast<float>(framebuffer_size.width),
    static_cast<float>(framebuffer_size.height),
    0.0f,
    1.0f,
};
SetViewportStateDynamically( command_buffer, 0, { viewport } );
VkRect2D scissor = {
    {
        0,
        0
    },
    {
        framebuffer_size.width,
        framebuffer_size.height
    }
};
SetScissorStateDynamically( command_buffer, 0, { scissor } );
BindVertexBuffers( command_buffer, first_vertex_buffer_binding,
                  vertex_buffers_parameters );

Bind the descriptor sets that the shaders access:

BindDescriptorSets( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
                    pipeline_layout, index_for_first_descriptor_set,
                    descriptor_sets, {} );

Now we can draw the geometry. We could also bind an index buffer or provide push constant values at this point.

for( size_t i = 0; i < geometry.Parts.size(); ++i ) {
    DrawGeometry( command_buffer, geometry.Parts[i].VertexCount,
                 instance_count, geometry.Parts[i].VertexOffset, first_instance );
}

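Before we stop recording the command buffer, we need to end the render pass. After that, another transition on the swapchain image is needed. When we have finished rendering a single frame of animation, we want to display it on the swapchain image. For this, its layout must be VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, because that is what the presentation engine requires to display the image correctly. This step has to be handled explicitly.

$\color{red}{\text{Note}}$: if the queues used for graphics operations and for presentation are different, a queue ownership transfer is needed. It is done with another image memory barrier. After that, we can stop recording the command buffer.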

EndRenderPass( command_buffer );
if( present_queue_family_index != graphics_queue_family_index ) {
    ImageTransition image_transition_before_present = {
        swapchain_image,
        VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        VK_ACCESS_MEMORY_READ_BIT,
        VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        graphics_queue_family_index,
        present_queue_family_index,
        VK_IMAGE_ASPECT_COLOR_BIT
    };
    SetImageMemoryBarrier( command_buffer,
                          VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                          VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, { image_transition_before_present }
                         );
}
if( !EndCommandBufferRecordingOperation( command_buffer ) ) {
    return false;
}
return true;

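We can now take this command buffer and submit it to a (graphics) queue. It can be submitted only once, because it was recorded with the VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT flag.

After the command buffer is submitted, the swapchain image can be presented. Note that the submission and presentation operations need to be synchronized.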

advanced

*Recording command buffers on multiple threads

A custom structure:

struct CommandBufferRecordingThreadParameters {
    VkCommandBuffer CommandBuffer;
    std::function<bool( VkCommandBuffer )> RecordingFunction;
};

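One such structure is used per thread that records command buffers. RecordingFunction holds the function that records the command buffer on a separate thread.

To use Vulkan from multiple threads, we need to keep a few rules in mind.

First, the same object must not be modified from multiple threads at the same time. For example, we cannot allocate command buffers from the same pool on multiple threads, and we cannot update the same descriptor set from multiple threads.

Resources can be accessed from multiple threads only if they are read-only or if each thread accesses separate resources. Since it is hard to track which resource was created on which thread, resources are usually created and modified on the main (rendering) thread only.

The most common multithreading scenario in Vulkan is recording command buffers concurrently. Recording takes a lot of time, so splitting it across threads makes sense.

When recording command buffers on multiple threads, each thread needs its own separate command pool.

Recording a command buffer does not affect other resources (apart from the pool). We only prepare commands that will later be submitted to a queue, so we can record any operation that uses any resource. For example, we can record operations that access the same images or the same descriptor sets, and the same pipeline can be bound in different command buffers at the same time. We can also record operations that draw into the same attachments.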

std::vector<std::thread> threads( threads_parameters.size() );
for( size_t i = 0; i < threads_parameters.size(); ++i ) {
    threads[i] = std::thread(
        threads_parameters[i].RecordingFunction,
        threads_parameters[i].CommandBuffer );
}

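When all threads have finished recording, we collect their command buffers and submit them to a queue.

A real application would avoid creating and destroying threads like this. Instead, it would use an existing job/task system to record the required command buffers, as shown in the figure.

Submission can be performed from one thread only (queues, similarly to other resources, cannot be accessed concurrently), so we need to wait for all threads to finish.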

std::vector<VkCommandBuffer> command_buffers( threads_parameters.size() );
for( size_t i = 0; i < threads_parameters.size(); ++i ) {
    threads[i].join();
    command_buffers[i] = threads_parameters[i].CommandBuffer;
}
if( !SubmitCommandBuffersToQueue( queue, wait_semaphore_infos,
                                 command_buffers, signal_semaphores, fence ) ) {
    return false;
}
return true;
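The launch, join, and collect pattern above can be tried out without a Vulkan device. The sketch below replaces VkCommandBuffer with a plain std::vector<int> that each thread fills on its own, and it also keeps the result of every recording function, which a bare std::thread would discard. All names here (`RecordingThreadParameters`, `RecordOnMultipleThreads`) are hypothetical stand-ins, not the original helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct RecordingThreadParameters {
    std::vector<int> Commands; // stand-in for a VkCommandBuffer owned by one thread
    std::function<bool( std::vector<int> & )> RecordingFunction;
};

// Runs every recording function on its own thread, waits for all of
// them, and reports whether all recordings succeeded.
bool RecordOnMultipleThreads( std::vector<RecordingThreadParameters> & parameters ) {
    std::vector<char> results( parameters.size(), 0 ); // one slot per thread, never shared
    std::vector<std::thread> threads;
    threads.reserve( parameters.size() );

    for( size_t i = 0; i < parameters.size(); ++i ) {
        assert( static_cast<bool>(parameters[i].RecordingFunction) );
        threads.emplace_back( [&parameters, &results, i]() {
            // Each thread touches only its own element i.
            results[i] = parameters[i].RecordingFunction( parameters[i].Commands ) ? 1 : 0;
        } );
    }

    bool all_succeeded = true;
    for( size_t i = 0; i < threads.size(); ++i ) {
        threads[i].join(); // "submission" may start only after every thread is done
        all_succeeded = all_succeeded && (results[i] != 0);
    }
    return all_succeeded;
}
```

Because every thread writes only to its own element, no locking is needed, which mirrors the one-command-pool-per-thread rule. On some toolchains the program must be linked with thread support (for example `-pthread`).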

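Command buffers can be submitted to a queue from only one thread at a time.

The same applies to the swapchain object: swapchain images can be acquired and presented from only one thread at a time.

We also need to take care of the layout transitions of swapchain images: away from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR (or VK_IMAGE_LAYOUT_UNDEFINED) for rendering, and back to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR before presentation.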

Preparing a single frame of animation

Preparing a single frame of animation can be divided into five steps:

Acquiring a swapchain image.
Creating a framebuffer.
Recording a command buffer.
Submitting the command buffer to the queue.
Presenting an image.

uint32_t image_index;
if( !AcquireSwapchainImage( logical_device, swapchain,
                           image_acquired_semaphore, VK_NULL_HANDLE, image_index ) ) {
    return false;
}
std::vector<VkImageView> attachments = { swapchain_image_views[image_index]
                                       };
if( VK_NULL_HANDLE != depth_attachment ) {
    attachments.push_back( depth_attachment );
}
if( !CreateFramebuffer( logical_device, render_pass, attachments,
                       swapchain_size.width, swapchain_size.height, 1, *framebuffer ) ) {
    return false;
}
if( !record_command_buffer( command_buffer, image_index, *framebuffer ) ) {
    return false;
}

std::vector<WaitSemaphoreInfo> wait_semaphore_infos = wait_infos;
wait_semaphore_infos.push_back( {
    image_acquired_semaphore,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
} );
if( !SubmitCommandBuffersToQueue( graphics_queue, wait_semaphore_infos, {
    command_buffer }, { ready_to_present_semaphore }, finished_drawing_fence )
  ) {
    return false;
}
PresentInfo present_info = {
    swapchain,
    image_index
};
if( !PresentImage( present_queue, { ready_to_present_semaphore }, {
    present_info } ) ) {
    return false;
}
return true;

The fence lets the application determine when the GPU has finished executing the command buffer.

*Increasing performance through increasing the number of separately rendered frames

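The time spent waiting for a command buffer to finish executing is wasted. To avoid that, we render multiple frames of animation independently.

A custom structure:

struct FrameResources {
    VkCommandBuffer CommandBuffer;                     // command buffer used by this frame only
    VkDestroyer<VkSemaphore> ImageAcquiredSemaphore;   // signaled by the presentation engine when the acquired image can be used
    VkDestroyer<VkSemaphore> ReadyToPresentSemaphore;  // signaled when the queue has finished executing the command buffer
    VkDestroyer<VkFence> DrawingFinishedFence;         // signaled when the GPU has finished processing the frame
    VkDestroyer<VkImageView> DepthAttachment;          // image view of the depth attachment
    VkDestroyer<VkFramebuffer> Framebuffer;
};

It groups the resources whose lifetime is tied to a single frame.

Rendering an animation is a loop: in each iteration one frame is rendered and then displayed.

We need to prepare several such sets of frame resources.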

Tests have shown that increasing the number of frame resources from one to two may increase the performance by 50%.

Adding a third set increases the performance further, but the growth isn’t as big this time.

So, the performance gain is smaller with each additional set of frame resources. Three sets of rendering resources seems like a good choice, but we should perform our own tests and see what is best for our specific needs.

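Before a set of frame resources is reused, we check (by waiting on its fence) that the GPU has finished with it: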

static uint32_t frame_index = 0;
FrameResources & current_frame = frame_resources[frame_index];
if( !WaitForFences( logical_device, { *current_frame.DrawingFinishedFence
                                    }, false, 2000000000 ) ) {
    return false;
}
if( !ResetFences( logical_device, { *current_frame.DrawingFinishedFence } )
  ) {
    return false;
}

InitVkDestroyer( logical_device, current_frame.Framebuffer );
if( !PrepareSingleFrameOfAnimation( logical_device, graphics_queue,
                                   present_queue, swapchain, swapchain_size, swapchain_image_views,
                                   *current_frame.DepthAttachment, wait_infos,
                                   *current_frame.ImageAcquiredSemaphore,
                                   *current_frame.ReadyToPresentSemaphore,
                                   *current_frame.DrawingFinishedFence, record_command_buffer,
                                   current_frame.CommandBuffer, render_pass, current_frame.Framebuffer ) ) {
    return false;
}
frame_index = (frame_index + 1) % frame_resources.size();
return true;
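The bookkeeping behind the rotating frame_index can be shown in isolation. With N sets of frame resources, frame k uses set k % N, and the only earlier frame it ever has to wait for is frame k - N, which is why more sets mean less waiting. This is a minimal sketch; `ResourceSetForFrame` and `FrameToWaitFor` are hypothetical helpers that only model the indexing, not the fences themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index of the frame-resource set used by a given frame.
size_t ResourceSetForFrame( uint64_t frame_number, size_t set_count ) {
    assert( set_count > 0 );
    return static_cast<size_t>(frame_number % set_count);
}

// Finds the earlier frame whose fence must be signaled before
// frame_number may reuse its resource set. Returns false when the set
// has not been used yet (the first set_count frames).
bool FrameToWaitFor( uint64_t frame_number, size_t set_count,
                     uint64_t & earlier_frame ) {
    assert( set_count > 0 );
    if( frame_number < set_count ) {
        return false;
    }
    earlier_frame = frame_number - set_count;
    return true;
}
```

With three sets, frame 5 reuses set 2 and waits only for frame 2, which has most likely finished long ago; with a single set every frame waits for the one immediately before it.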