  • Would it be faster to batch SetVertex/PixelShaderConstant calls?

    From  http://www.gamedev.net/topic/435995-would-it-be-faster-to-batch-setvertexpixelshaderconstant-calls/

    The content is copied here in case the link goes dead someday.

    ------------------------------------------------------------

    Kimani:

    So I would imagine the API call I make most often is SetVertexShaderConstantF. This handy dandy page, at the bottom, says some stuff about it. It's not really all that expensive. But I do it a lot, so it adds up! And I've been thinking about optimizing.

    My question is thus: would it be faster for me to change things around such that I set a bunch of constant registers all in one go, or would it be about the same as if I kept it as is, setting them one at a time?

    Example: Going from this:
    pd3dDevice->SetVertexShaderConstantF( 0, (float*)&( value1 ), 1 );
    pd3dDevice->SetVertexShaderConstantF( 1, (float*)&( value2 ), 1 );
    pd3dDevice->SetVertexShaderConstantF( 2, (float*)&( matrix1 ), 4 );
    pd3dDevice->SetVertexShaderConstantF( 6, (float*)&( value3 ), 1 );

    to:
    vec4 values[7];
    
    // copy value1/value2/matrix1/value3 into values
    
    pd3dDevice->SetVertexShaderConstantF( 0, (float*)values, 7 );
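
The "copy into values" step above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Vec4`, `Mat4`, `MockDevice` and `packConstants` are hypothetical stand-ins (a real project would use its own vector and matrix types and the actual `IDirect3DDevice9`), and the mock only mirrors the argument order of `SetVertexShaderConstantF` (start register, float pointer, number of 4-float registers) so that the call count can be observed.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the poster's vector/matrix types.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

static_assert(sizeof(Vec4) == 4 * sizeof(float), "Vec4 must be 4 tightly packed floats");
static_assert(sizeof(Mat4) == 4 * sizeof(Vec4), "Mat4 must occupy exactly 4 registers");

// Hypothetical mock device: stores registers and counts API calls.
// It is NOT IDirect3DDevice9; it only imitates the argument order.
struct MockDevice {
    std::array<Vec4, 256> registers{};
    int callCount = 0;

    void SetVertexShaderConstantF(unsigned startRegister, const float* data, unsigned vec4Count) {
        assert(startRegister + vec4Count <= registers.size());
        std::memcpy(&registers[startRegister], data, vec4Count * sizeof(Vec4));
        ++callCount;
    }
};

// Lays the four values out in register order: c0, c1, c2..c5, c6.
std::array<Vec4, 7> packConstants(const Vec4& value1, const Vec4& value2,
                                  const Mat4& matrix1, const Vec4& value3) {
    std::array<Vec4, 7> values{};
    values[0] = value1;
    values[1] = value2;
    std::memcpy(&values[2], &matrix1, sizeof(Mat4)); // fills values[2]..values[5]
    values[6] = value3;
    return values;
}
```

With this layout, a single `SetVertexShaderConstantF( 0, &values[0].x, 7 )` replaces the four separate calls. It only works because the registers involved (0 through 6) are contiguous.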

    Any experience or thoughts? I have no idea if it even would be an optimization, and it's too much work to go ahead and implement it if it's not even going to help.

    By jollyjeffers:

    Less crossing of the API boundary is usually a good thing - you should get better performance, but the question remains as to how much better.

    A fundamental rule with optimization - something I see overlooked by far too many people - is that you absolutely MUST have performance data to start with. Before you even consider optimization strategies you need to spend a lot of time working out where your time is really being spent - optimizing the worst offender is going to pay off far more than tweaking parts that are simple to change but were cheap to begin with.

    All I'm trying to get at is that seeing you call SetVertexShaderConstantF lots doesn't necessarily mean it's a real performance bottleneck in your code. You need to measure it first, which also has the advantage of telling you whether your optimizations really did improve performance...

    Get used to using PIX for Windows - it's not great for benchmarking (last I used it the timing was a bit broken in this sense, and observation interferes with the measurements) but the call-stream capture can be invaluable.

    Use D3DPERF_BeginEvent() and D3DPERF_EndEvent() (or use one I prepared for you earlier [wink]) to make the call-stream easier to interpret.
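
One common way to keep begin/end markers balanced is a small RAII wrapper. The sketch below is an assumption about how such a wrapper could look, not the helper linked in the post: `EventLog` and `ScopedEvent` are hypothetical names, and the log stands in for the real marker functions so the nesting can be inspected without Direct3D. In a D3D9 build, `begin` and `end` would forward to `D3DPERF_BeginEvent( color, L"name" )` and `D3DPERF_EndEvent()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sink standing in for D3DPERF_BeginEvent / D3DPERF_EndEvent.
struct EventLog {
    std::vector<std::string> entries;
    void begin(const std::string& name) { entries.push_back("begin:" + name); }
    void end() { entries.push_back("end"); }
};

// Opens a named event on construction and closes it on destruction,
// so early returns cannot leave an event unclosed.
class ScopedEvent {
public:
    ScopedEvent(EventLog& log, const std::string& name) : log_(log) { log_.begin(name); }
    ~ScopedEvent() { log_.end(); }

    ScopedEvent(const ScopedEvent&) = delete;
    ScopedEvent& operator=(const ScopedEvent&) = delete;

private:
    EventLog& log_;
};

// Example frame: nested scopes produce nested events in the capture.
void drawScene(EventLog& log) {
    ScopedEvent frame(log, "Frame");
    {
        ScopedEvent constants(log, "SetConstants");
        // shader constant uploads would go here
    }
    // draw calls would go here
}
```

Each block of API calls then shows up under a named, correctly nested group in the captured call stream.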

    Running a difference against two streams (before/after) can give you a good idea as to whether your new algorithms really are reducing the number of API calls for example.
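
As a rough illustration of the before/after comparison, the sketch below counts calls per function name in two recorded streams and reports the change. It is a hand-rolled example, not a description of how PIX works: `CallStream`, `countCalls` and `diffCallCounts` are hypothetical names, and each stream is assumed to be just a list of API function names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation of a captured stream: one function name per call.
using CallStream = std::vector<std::string>;

// Counts how many times each function name appears in a stream.
std::map<std::string, int> countCalls(const CallStream& stream) {
    std::map<std::string, int> counts;
    for (const std::string& name : stream) {
        ++counts[name];
    }
    return counts;
}

// Returns (calls after) - (calls before) per function name.
// Negative numbers mean the change removed calls.
std::map<std::string, int> diffCallCounts(const CallStream& before, const CallStream& after) {
    std::map<std::string, int> delta = countCalls(after);
    const std::map<std::string, int> old = countCalls(before);
    for (const auto& entry : old) {
        delta[entry.first] -= entry.second;
    }
    return delta;
}
```

A negative entry for `SetVertexShaderConstantF` would confirm that batching reduced the number of calls. Whether that saves time still has to be measured separately.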

    hth
    Jack

    By Namethatnobodyelsetook:

    Jollyjeffers' advice is good... you should test.

    Back in DX8 it was well known that setting shader constants was one of the slower operations and batching was a huge win. With newer cards, and DX9, I have no idea if those performance concerns still exist. Your best bet is to test it, or rely on known data.

    The SDK has a page (click index, then "Accurately Profiling Direct3D API Calls") on how to profile D3D. The end of the article contains common timings for many D3D functions, and it appears that setting pixel shader constants can be extremely slow, while setting vertex shader constants is only somewhat slow. In other words, it looks like batching is still a good idea.
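
One possible way to batch without restructuring every call site is a shadow copy of the constant registers with a dirty range that is flushed once before drawing. The sketch below is only an assumption about how that could be done, not something described in the thread: `ConstantCache`, `MockDevice` and `kRegisterCount` are hypothetical, and the mock only imitates the argument order of `SetPixelShaderConstantF`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstring>

constexpr unsigned kRegisterCount = 32; // assumed register budget for this sketch

// Hypothetical mock device that records the last upload and counts calls.
struct MockDevice {
    std::array<float, kRegisterCount * 4> registers{};
    int calls = 0;
    unsigned lastStart = 0;
    unsigned lastCount = 0;

    void SetPixelShaderConstantF(unsigned startRegister, const float* data, unsigned vec4Count) {
        std::memcpy(&registers[startRegister * 4], data, vec4Count * 4 * sizeof(float));
        ++calls;
        lastStart = startRegister;
        lastCount = vec4Count;
    }
};

// Keeps a CPU-side copy of the registers and uploads the touched range in one call.
class ConstantCache {
public:
    void set(unsigned startRegister, const float* data, unsigned vec4Count) {
        assert(startRegister + vec4Count <= kRegisterCount);
        std::memcpy(&shadow_[startRegister * 4], data, vec4Count * 4 * sizeof(float));
        dirtyBegin_ = std::min(dirtyBegin_, startRegister);
        dirtyEnd_ = std::max(dirtyEnd_, startRegister + vec4Count);
    }

    template <class Device>
    void flush(Device& device) {
        if (dirtyBegin_ >= dirtyEnd_) {
            return; // nothing changed since the last flush
        }
        device.SetPixelShaderConstantF(dirtyBegin_, &shadow_[dirtyBegin_ * 4],
                                       dirtyEnd_ - dirtyBegin_);
        dirtyBegin_ = kRegisterCount;
        dirtyEnd_ = 0;
    }

private:
    std::array<float, kRegisterCount * 4> shadow_{};
    unsigned dirtyBegin_ = kRegisterCount; // empty range: begin >= end
    unsigned dirtyEnd_ = 0;
};
```

The trade-off is that untouched registers lying between two dirty ones are uploaded again from the shadow copy. That is harmless but wasteful if the gaps are large, so a tight register layout helps.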

    Kimani:

    Well, I implemented it with my SM2.0 render pipeline, which is the main one. It took quite a while, but it looks like, at least in this particular scene, I'm seeing a 45% increase in performance.

    I'm sure most of it is the batching of the pixel shader constants, since yes, they are quite a bit more expensive.

    I also put in the ScopeProfiler thingy. It's quite useful, and will help a lot :D

  • Original article: https://www.cnblogs.com/qilinzi/p/1952772.html