    Dissecting the Unreal Rendering System (07) - Post-Processing

    7.1 Overview of This Chapter

    To keep up with the times and follow Epic's official footsteps, starting from this chapter:

    The UE source code analyzed has been upgraded to 4.26.2; it is no longer the 4.25 source!

    The UE source code analyzed has been upgraded to 4.26.2; it is no longer the 4.25 source!!

    The UE source code analyzed has been upgraded to 4.26.2; it is no longer the 4.25 source!!!

    Important things are said three times. Readers who follow along with the source code, please take note and update.

    7.1.1 Chapter Contents

    In fact, post-processing was already briefly introduced in section 4.3.9 PostProcessing of Dissecting the Unreal Rendering System (04) - Deferred Rendering Pipeline, but this chapter elaborates on the post-processing flow and its main technical points in much greater depth.

    More specifically, this chapter mainly covers the following aspects of UE:

    • The main post-processing flow
    • The post-processing pass sequence
    • Key post-processing techniques

    That said, it is still recommended to read Dissecting the Unreal Rendering System (04) - Deferred Rendering Pipeline before this chapter, to keep a step-by-step pace in learning the UE rendering system. It is also recommended to first read the author's article 2.1 Color Theory; with a clear understanding of color spaces, color theory, linear space and gamma correction, the transition into this chapter will be smoother.

    The source code covered in this chapter mainly lives under:

    • Engine\Source\Runtime\Renderer\Private\PostProcess
    • Engine\Shaders\Private

    7.1.2 The Importance of Post-Processing

    Some readers may wonder: isn't post-processing just processing the already-rendered image? Is it really that tightly coupled with graphics rendering? Can one skip learning it?

    To answer these questions, and to show the importance of post-processing as well as its role and position in UE and in graphics rendering, this subsection was specifically written.

    Take UE's default scene as an example; it looks like this:

    Now disable all post-processing with the following console command:

    ShowFlag.PostProcessing 0
    

    The image then turns into the following:

    Capturing a frame with RenderDoc reveals that the image above still applies the gamma correction of the post-processing stage. Fine, turn that off too, which yields an image with absolutely no post-processing at all:
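
    As a side note, gamma correction is performed by the tonemap pass (in gamma-only mode, as we will see later), so presumably it can be switched off with the corresponding show flag; the exact flag name may vary by engine version:

    ShowFlag.Tonemapper 0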

    Compared with the first image, can you see the differences? The brightness, contrast, colors and even the aliasing are all different, aren't they?

    This confirms that even if you never configure or change any post-processing, UE still executes many post-process passes by default, and only then is the rendered image finally presented on screen correctly.

    From this it is clear what an important position post-processing occupies in rendering and in UE. With post-processing at hand, we can add the finishing touches that make the image more believable, vivid and interesting.

    In fact, the applications of post-processing go far beyond this: combined with screen-space information such as depth and normals, a much broader and richer world of magic opens up.

    7.2 Post-Processing Basics

    This chapter covers basic post-processing knowledge and concepts, and how to work with them in UE.

    7.2.1 Introduction to Post-Processing

    With the post-process effects provided by Unreal Engine, artists and designers can adjust the overall look and feel of a scene.

    By default, UE enables post-processes such as anti-aliasing, auto exposure, bloom, tone mapping and gamma correction:

    Of course, the options under the viewport's Show/Post Processing menu can dynamically toggle individual post-processes on and off, to observe the effect and changes each one produces in the scene.

    Post-processing can also be toggled with the console commands mentioned earlier.
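
    For example, individual effects can presumably be toggled through their respective show flags (the names follow the ShowFlag.* convention shown above; availability may vary by engine version):

    ShowFlag.Bloom 0
    ShowFlag.EyeAdaptation 0
    ShowFlag.MotionBlur 0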

    7.2.2 Post Process Volume

    For artists, a more common and convenient approach is to drag a Post Processing Volume into the scene, to precisely control post-process effects and their parameters.

    A post process volume involves a large number of types and parameters; below are its property categories:

    Among them, Lens holds lens-related effects, including Bloom, Exposure, Flares and DOF; Color Grading handles color grading, including white balance, global, shadows, midtones and highlights adjustments; Film is the filmic tone mapping, whose curve parameters such as Slope, Toe, Black, Shoulder and White can be adjusted; Rendering Features contains effects related to the rendering pipeline, including post process materials, ambient cubemap, AO, ray tracing features, GI, motion blur, LPV, reflections, SSR, translucency, path tracing and screen percentage; finally, Post Process Volume Settings specifies the priority, the blend weight, and whether the volume affects an infinite extent (image above).

    Multiple post process volumes may coexist in the same scene, but for performance and maintainability a scene should keep only one global volume (Infinite Extent) and make the rest local in range.
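
    To illustrate, here is a minimal C++ sketch of configuring such a volume at runtime. APostProcessVolume and its members are engine types; ConfigureGlobalVolume and the chosen values are made up for this example:

    #include "Engine/PostProcessVolume.h"

    // Hypothetical helper: make one volume the single global (unbound) volume.
    void ConfigureGlobalVolume(APostProcessVolume* Volume)
    {
        Volume->bUnbound = true;     // affect the whole level regardless of the volume's bounds
        Volume->Priority = 0.0f;     // local volumes should use higher priorities
        Volume->BlendWeight = 1.0f;  // fully apply this volume's settings

        // Override only the parameters we care about; the rest keep the engine defaults.
        Volume->Settings.bOverride_BloomIntensity = true;
        Volume->Settings.BloomIntensity = 1.5f;
    }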

    7.2.3 Post Process Material

    Although post process volumes provide many built-in effects, rendering needs vary endlessly and the built-ins certainly cannot satisfy every practical requirement. UE's Post Process Material fills this customization need: custom post-process effects can be implemented with the material editor.

    Adding a post process material is not complicated: create a new material and set its Material Domain to Post Process; at this point only the Emissive Color pin of the material stays enabled:

    And the Post Process Material section of the Details panel becomes editable:

    The meanings of these parameters are as follows:

    • Blendable Location: where the material is blended. The options are After Tonemapping, Before Tonemapping, Before Translucency, Replacing Tonemapping (replacing the tonemapper) and SSR Input (screen-space reflections input); the default is After Tonemapping.
    • Output Alpha: whether to output alpha. If enabled, the alpha channel of Emissive Color must be handled and output correctly. Disabled by default.
    • Blendable Priority: blend priority. The higher the value, the later it is rendered (i.e. lower priorities are rendered first). Defaults to 0.
    • Is Blendable: whether the material is blendable. If so, the parameters of all materials (or material instances) sharing the same parent material are pre-interpolated and blended on the C++ side. Enabled by default.
    • Enable Stencil Test: whether to enable the stencil test. If enabled, the comparison function and reference value can be set. Disabled by default.

    Once the post process material has been authored, it can be applied to the scene through the Post Process Materials list in the Rendering Features section of a post process volume:

    The blend weight and order of each material can also be adjusted (drag the dot grid to the left of the weight); a runtime equivalent is sketched below.
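
    At runtime the same can be done from C++ through FPostProcessSettings::AddBlendable, for example on a camera component. The snippet below is a sketch; the "Intensity" parameter name is an assumption:

    #include "Camera/CameraComponent.h"
    #include "Materials/MaterialInstanceDynamic.h"

    // Hypothetical helper: attach a post process material to a camera with a blend weight.
    void AddPostProcessMaterialToCamera(UCameraComponent* Camera, UMaterialInterface* PPMaterial)
    {
        // A dynamic instance lets us drive material parameters at runtime.
        UMaterialInstanceDynamic* MID = UMaterialInstanceDynamic::Create(PPMaterial, Camera);
        MID->SetScalarParameterValue(TEXT("Intensity"), 0.5f); // assumed parameter name

        // Weight in [0,1]; blended with other blendables according to priority and weight.
        Camera->PostProcessSettings.AddBlendable(MID, 1.0f);
    }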

    Note that in a post process material, the SceneColor of the SceneTexture material node cannot be accessed, otherwise an error is raised:

    SceneColor cannot be accessed in a post process material; the message says SceneColor can only be used in the Surface material domain.

    The fix is to select the SceneTexture node and choose PostProcessInput0 as the Scene Texture Id in the Details panel:

    Besides PostProcessInput0, many other screen-space buffers (the GBuffer) can be read by a post process material:

    However, in most post-processing passes, PostProcessInput1~PostProcessInput6 are empty textures.

    7.3 Post-Processing Flow

    This chapter dives into UE's post-processing code for analysis.

    7.3.1 AddPostProcessingPasses

    The main entry point of post-processing is AddPostProcessingPasses, located near the end of FDeferredShadingSceneRenderer::Render:

    void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
    {
        (......)
        
        RenderTranslucency(RHICmdList, ...);
        
        (......)
        
        // Post-processing stage.
        if (ViewFamily.bResolveScene)
        {
            GRenderTargetPool.AddPhaseEvent(TEXT("PostProcessing"));
    
            (......)
            
            // Input parameters for post-processing.
            FPostProcessingInputs PostProcessingInputs;
            PostProcessingInputs.ViewFamilyTexture = ViewFamilyTexture;
            PostProcessingInputs.SeparateTranslucencyTextures = &SeparateTranslucencyTextures;
            PostProcessingInputs.SceneTextures = SceneTextures;
    
            (......)
            
            {
                // Iterate over all views and add post-processing passes for each view.
                for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
                {
                    FViewInfo& View = Views[ViewIndex];
                    // Add the post-processing passes.
                    AddPostProcessingPasses(GraphBuilder, View, PostProcessingInputs);
                }
            }
    
            // Clear the scene color texture in the scene context.
            AddPass(GraphBuilder, [this, &SceneContext](FRHICommandListImmediate&)
            {
                SceneContext.SetSceneColor(nullptr);
            });
        }
        
        (......)
    }
    

    AddPostProcessingPasses handles UE's built-in post-processing sequence. It involves a large amount of code, but let's first analyze its main flow:

    // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessing.cpp
    
    void AddPostProcessingPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessingInputs& Inputs)
    {
        Inputs.Validate();
    
        // Fetch textures and view data.
        const FIntRect PrimaryViewRect = View.ViewRect;
        const FSceneTextureParameters SceneTextureParameters = GetSceneTextureParameters(GraphBuilder, Inputs.SceneTextures);
        const FScreenPassRenderTarget ViewFamilyOutput = FScreenPassRenderTarget::CreateViewFamilyOutput(Inputs.ViewFamilyTexture, View);
        const FScreenPassTexture SceneDepth(SceneTextureParameters.SceneDepthTexture, PrimaryViewRect);
        const FScreenPassTexture SeparateTranslucency(Inputs.SeparateTranslucencyTextures->GetColorForRead(GraphBuilder), PrimaryViewRect);
        const FScreenPassTexture CustomDepth((*Inputs.SceneTextures)->CustomDepthTexture, PrimaryViewRect);
        const FScreenPassTexture Velocity(SceneTextureParameters.GBufferVelocityTexture, PrimaryViewRect);
        const FScreenPassTexture BlackDummy(GSystemTextures.GetBlackDummy(GraphBuilder));
    
        // Scene color.
        FScreenPassTexture SceneColor((*Inputs.SceneTextures)->SceneColorTexture, PrimaryViewRect);
        FScreenPassTexture SceneColorBeforeTonemap;
        FScreenPassTexture SceneColorAfterTonemap;
        const FScreenPassTexture OriginalSceneColor = SceneColor;
    
        // Initialize textures.
        const FEyeAdaptationParameters EyeAdaptationParameters = GetEyeAdaptationParameters(View, ERHIFeatureLevel::SM5);
        FRDGTextureRef LastEyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);
        FRDGTextureRef EyeAdaptationTexture = LastEyeAdaptationTexture;
        FRDGTextureRef HistogramTexture = BlackDummy.Texture;
    
        // Process the post-processing enable flags.
        const FEngineShowFlags& EngineShowFlags = View.Family->EngineShowFlags;
        const bool bVisualizeHDR = EngineShowFlags.VisualizeHDR;
        const bool bViewFamilyOutputInHDR = GRHISupportsHDROutput && IsHDREnabled();
        const bool bVisualizeGBufferOverview = IsVisualizeGBufferOverviewEnabled(View);
        const bool bVisualizeGBufferDumpToFile = IsVisualizeGBufferDumpToFileEnabled(View);
        const bool bVisualizeGBufferDumpToPIpe = IsVisualizeGBufferDumpToPipeEnabled(View);
        const bool bOutputInHDR = IsPostProcessingOutputInHDR();
    
        const FPaniniProjectionConfig PaniniConfig(View);
    
        // Specific post-processing passes.
        enum class EPass : uint32
        {
            MotionBlur,    // Motion blur
            Tonemap,       // Tone mapping
            FXAA,          // FXAA anti-aliasing
            PostProcessMaterialAfterTonemapping, // Post-process material after tone mapping
            VisualizeDepthOfField,
            VisualizeStationaryLightOverlap,
            VisualizeLightCulling,
            SelectionOutline,
            EditorPrimitive,
            VisualizeShadingModels,
            VisualizeGBufferHints,
            VisualizeSubsurface,
            VisualizeGBufferOverview,
            VisualizeHDR,
            PixelInspector,
            HMDDistortion,
            HighResolutionScreenshotMask,
            PrimaryUpscale,     // Primary upscale
            SecondaryUpscale,   // Secondary upscale
            MAX
        };
        
        (......)
    
        // Names corresponding to the passes above.
        const TCHAR* PassNames[] =
        {
            TEXT("MotionBlur"),
            TEXT("Tonemap"),
            TEXT("FXAA"),
            TEXT("PostProcessMaterial (AfterTonemapping)"),
            TEXT("VisualizeDepthOfField"),
            TEXT("VisualizeStationaryLightOverlap"),
            TEXT("VisualizeLightCulling"),
            TEXT("SelectionOutline"),
            TEXT("EditorPrimitive"),
            TEXT("VisualizeShadingModels"),
            TEXT("VisualizeGBufferHints"),
            TEXT("VisualizeSubsurface"),
            TEXT("VisualizeGBufferOverview"),
            TEXT("VisualizeHDR"),
            TEXT("PixelInspector"),
            TEXT("HMDDistortion"),
            TEXT("HighResolutionScreenshotMask"),
            TEXT("PrimaryUpscale"),
            TEXT("SecondaryUpscale")
        };
    
        static_assert(static_cast<uint32>(EPass::MAX) == UE_ARRAY_COUNT(PassNames), "EPass does not match PassNames.");
    
        // Declare the post-processing PassSequence instance.
        TOverridePassSequence<EPass> PassSequence(ViewFamilyOutput);
        PassSequence.SetNames(PassNames, UE_ARRAY_COUNT(PassNames));
        
        // Enable or disable specific passes.
        PassSequence.SetEnabled(EPass::VisualizeStationaryLightOverlap, EngineShowFlags.StationaryLightOverlap);
        PassSequence.SetEnabled(EPass::VisualizeLightCulling, EngineShowFlags.VisualizeLightCulling);
        PassSequence.SetEnabled(EPass::SelectionOutline, false);
        PassSequence.SetEnabled(EPass::EditorPrimitive, false);
        PassSequence.SetEnabled(EPass::VisualizeShadingModels, EngineShowFlags.VisualizeShadingModels);
        PassSequence.SetEnabled(EPass::VisualizeGBufferHints, EngineShowFlags.GBufferHints);
        PassSequence.SetEnabled(EPass::VisualizeSubsurface, EngineShowFlags.VisualizeSSS);
        PassSequence.SetEnabled(EPass::VisualizeGBufferOverview, bVisualizeGBufferOverview || bVisualizeGBufferDumpToFile || bVisualizeGBufferDumpToPIpe);
        PassSequence.SetEnabled(EPass::VisualizeHDR, EngineShowFlags.VisualizeHDR);
        PassSequence.SetEnabled(EPass::PixelInspector, false);
        PassSequence.SetEnabled(EPass::HMDDistortion, EngineShowFlags.StereoRendering && EngineShowFlags.HMDDistortion);
        PassSequence.SetEnabled(EPass::HighResolutionScreenshotMask, IsHighResolutionScreenshotMaskEnabled(View));
        PassSequence.SetEnabled(EPass::PrimaryUpscale, PaniniConfig.IsEnabled() || (View.PrimaryScreenPercentageMethod == EPrimaryScreenPercentageMethod::SpatialUpscale && PrimaryViewRect.Size() != View.GetSecondaryViewRectSize()));
        PassSequence.SetEnabled(EPass::SecondaryUpscale, View.RequiresSecondaryUpscale());
        
        (......)
    
        if (IsPostProcessingEnabled(View)) // post-processing is enabled for the view
        {
            const EStereoscopicPass StereoPass = View.StereoPass;
            
            // Prepare data and flags.
            const bool bPrimaryView = IStereoRendering::IsAPrimaryView(View);
            const bool bHasViewState = View.ViewState != nullptr;
            const bool bDepthOfFieldEnabled = DiaphragmDOF::IsEnabled(View);
            const bool bVisualizeDepthOfField = bDepthOfFieldEnabled && EngineShowFlags.VisualizeDOF;
            const bool bVisualizeMotionBlur = IsVisualizeMotionBlurEnabled(View);
            const EAutoExposureMethod AutoExposureMethod = GetAutoExposureMethod(View);
            const EAntiAliasingMethod AntiAliasingMethod = !bVisualizeDepthOfField ? View.AntiAliasingMethod : AAM_None;
            const EDownsampleQuality DownsampleQuality = GetDownsampleQuality();
            const EPixelFormat DownsampleOverrideFormat = PF_FloatRGB;
            const bool bMotionBlurEnabled = !bVisualizeMotionBlur && IsMotionBlurEnabled(View);
            const bool bTonemapEnabled = !bVisualizeMotionBlur;
            const bool bTonemapOutputInHDR = View.Family->SceneCaptureSource == SCS_FinalColorHDR || View.Family->SceneCaptureSource == SCS_FinalToneCurveHDR || bOutputInHDR || bViewFamilyOutputInHDR;
            const bool bEyeAdaptationEnabled = bHasViewState && bPrimaryView;
            const bool bHistogramEnabled = bVisualizeHDR || (bEyeAdaptationEnabled && AutoExposureMethod == EAutoExposureMethod::AEM_Histogram && View.FinalPostProcessSettings.AutoExposureMinBrightness < View.FinalPostProcessSettings.AutoExposureMaxBrightness);
            const bool bBloomEnabled = View.FinalPostProcessSettings.BloomIntensity > 0.0f;
    
            // Post-process materials after tone mapping.
            const FPostProcessMaterialChain PostProcessMaterialAfterTonemappingChain = GetPostProcessMaterialChain(View, BL_AfterTonemapping);
    
            PassSequence.SetEnabled(EPass::MotionBlur, bVisualizeMotionBlur || bMotionBlurEnabled);
            PassSequence.SetEnabled(EPass::Tonemap, bTonemapEnabled);
            PassSequence.SetEnabled(EPass::FXAA, AntiAliasingMethod == AAM_FXAA);
            PassSequence.SetEnabled(EPass::PostProcessMaterialAfterTonemapping, PostProcessMaterialAfterTonemappingChain.Num() != 0);
            PassSequence.SetEnabled(EPass::VisualizeDepthOfField, bVisualizeDepthOfField);
    
            // Plugin post-processing callbacks.
            for (int32 ViewExt = 0; ViewExt < View.Family->ViewExtensions.Num(); ++ViewExt)
            {
                for (int32 SceneViewPassId = 0; SceneViewPassId != static_cast<int>(ISceneViewExtension::EPostProcessingPass::MAX); SceneViewPassId++)
                {
                    ISceneViewExtension::EPostProcessingPass SceneViewPass = static_cast<ISceneViewExtension::EPostProcessingPass>(SceneViewPassId);
                    EPass PostProcessingPass = TranslatePass(SceneViewPass);
    
                    View.Family->ViewExtensions[ViewExt]->SubscribeToPostProcessingPass(
                        SceneViewPass,
                        PassSequence.GetAfterPassCallbacks(PostProcessingPass),
                        PassSequence.IsEnabled(PostProcessingPass));
                }
            }
    
            // Enabling/disabling of the pass sequence is now finalized.
            PassSequence.Finalize();
    
            // Post-process material chain - Before Translucency
            {
                const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTranslucency);
    
                if (MaterialChain.Num())
                {
                    SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
                }
            }
    
            // Diaphragm DOF
            {
                FRDGTextureRef LocalSceneColorTexture = SceneColor.Texture;
    
                if (bDepthOfFieldEnabled)
                {
                    LocalSceneColorTexture = DiaphragmDOF::AddPasses(GraphBuilder, SceneTextureParameters, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);
                }
    
                if (LocalSceneColorTexture == SceneColor.Texture)
                {
                    LocalSceneColorTexture = AddSeparateTranslucencyCompositionPass(GraphBuilder, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);
                }
    
                SceneColor.Texture = LocalSceneColorTexture;
            }
    
            // Post-process material chain - Before Tonemapping
            {
                const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTonemapping);
    
                if (MaterialChain.Num())
                {
                    SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
                }
            }
    
            FScreenPassTexture HalfResolutionSceneColor;
    
            // The secondary view rect starts out as the primary view rect.
            FIntRect SecondaryViewRect = PrimaryViewRect;
    
            // Temporal anti-aliasing (TAA).
            if (AntiAliasingMethod == AAM_TemporalAA)
            {
                // Whether downsampling the scene color is allowed.
                const bool bAllowSceneDownsample =
                    IsTemporalAASceneDownsampleAllowed(View) &&
                    // We can only merge if the normal downsample pass would happen immediately after.
                    !bMotionBlurEnabled && !bVisualizeMotionBlur &&
                    // TemporalAA is only able to match the low quality mode (box filter).
                    GetDownsampleQuality() == EDownsampleQuality::Low;
    
                int32 UpscaleMode = ITemporalUpscaler::GetTemporalUpscalerMode();
    
                const ITemporalUpscaler* DefaultTemporalUpscaler = ITemporalUpscaler::GetDefaultTemporalUpscaler();
                const ITemporalUpscaler* UpscalerToUse = ( UpscaleMode == 0 || !View.Family->GetTemporalUpscalerInterface())? DefaultTemporalUpscaler : View.Family->GetTemporalUpscalerInterface();
    
                const TCHAR* UpscalerName = UpscalerToUse->GetDebugName();
    
                (......)
    
                ITemporalUpscaler::FPassInputs UpscalerPassInputs;
    
                UpscalerPassInputs.bAllowDownsampleSceneColor = bAllowSceneDownsample;
                UpscalerPassInputs.DownsampleOverrideFormat = DownsampleOverrideFormat;
                UpscalerPassInputs.SceneColorTexture = SceneColor.Texture;
                UpscalerPassInputs.SceneDepthTexture = SceneDepth.Texture;
                UpscalerPassInputs.SceneVelocityTexture = Velocity.Texture;
                UpscalerPassInputs.EyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);
    
                // Add the TAA pass.
                UpscalerToUse->AddPasses(
                    GraphBuilder,
                    View,
                    UpscalerPassInputs,
                    &SceneColor.Texture,
                    &SecondaryViewRect,
                    &HalfResolutionSceneColor.Texture,
                    &HalfResolutionSceneColor.ViewRect);
            }
            // Screen-space reflections (SSR).
            else if (ShouldRenderScreenSpaceReflections(View))
            {
                if (!View.bStatePrevViewInfoIsReadOnly)
                {
                    check(View.ViewState);
                    FTemporalAAHistory& OutputHistory = View.ViewState->PrevFrameViewInfo.TemporalAAHistory;
                    GraphBuilder.QueueTextureExtraction(SceneColor.Texture, &OutputHistory.RT[0]);
    
                    FTAAPassParameters TAAInputs(View);
                    TAAInputs.SceneColorInput = SceneColor.Texture;
                    TAAInputs.SetupViewRect(View);
                    OutputHistory.ViewportRect = TAAInputs.OutputViewRect;
                    OutputHistory.ReferenceBufferSize = TAAInputs.GetOutputExtent() * TAAInputs.ResolutionDivisor;
                }
            }
    
            // The scene color view rect becomes the secondary view rect.
            SceneColor.ViewRect = SecondaryViewRect;
    
            // Post-process material chain - SSR Input
            if (View.ViewState && !View.bStatePrevViewInfoIsReadOnly)
            {
                const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_SSRInput);
    
                if (MaterialChain.Num())
                {
                    // Save the SSR post-process output for use in the next frame.
                    FScreenPassTexture PassOutput = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
                    GraphBuilder.QueueTextureExtraction(PassOutput.Texture, &View.ViewState->PrevFrameViewInfo.CustomSSRInput);
                }
            }
    
            // Motion blur.
            if (PassSequence.IsEnabled(EPass::MotionBlur))
            {
                FMotionBlurInputs PassInputs;
                PassSequence.AcceptOverrideIfLastPass(EPass::MotionBlur, PassInputs.OverrideOutput);
                PassInputs.SceneColor = SceneColor;
                PassInputs.SceneDepth = SceneDepth;
                PassInputs.SceneVelocity = Velocity;
                PassInputs.Quality = GetMotionBlurQuality();
                PassInputs.Filter = GetMotionBlurFilter();
    
                if (bVisualizeMotionBlur)
                {
                    SceneColor = AddVisualizeMotionBlurPass(GraphBuilder, View, PassInputs);
                }
                else
                {
                    SceneColor = AddMotionBlurPass(GraphBuilder, View, PassInputs);
                }
            }
    
            SceneColor = AddAfterPass(EPass::MotionBlur, SceneColor);
    
            // If TAA did not downsample the scene color, do it here at half resolution.
            if (!HalfResolutionSceneColor.Texture)
            {
                FDownsamplePassInputs PassInputs;
                PassInputs.Name = TEXT("HalfResolutionSceneColor");
                PassInputs.SceneColor = SceneColor;
                PassInputs.Quality = DownsampleQuality;
                PassInputs.FormatOverride = DownsampleOverrideFormat;
    
                HalfResolutionSceneColor = AddDownsamplePass(GraphBuilder, View, PassInputs);
            }
    
            // Save the half-resolution scene color into the history.
            extern int32 GSSRHalfResSceneColor;
            if (ShouldRenderScreenSpaceReflections(View) && !View.bStatePrevViewInfoIsReadOnly && GSSRHalfResSceneColor)
            {
                check(View.ViewState);
                GraphBuilder.QueueTextureExtraction(HalfResolutionSceneColor.Texture, &View.ViewState->PrevFrameViewInfo.HalfResTemporalAAHistory);
            }
    
            FSceneDownsampleChain SceneDownsampleChain;
    
            // Histogram.
            if (bHistogramEnabled)
            {
                HistogramTexture = AddHistogramPass(GraphBuilder, View, EyeAdaptationParameters, HalfResolutionSceneColor, LastEyeAdaptationTexture);
            }
    
            // Eye adaptation (auto exposure).
            if (bEyeAdaptationEnabled)
            {
                const bool bBasicEyeAdaptationEnabled = bEyeAdaptationEnabled && (AutoExposureMethod == EAutoExposureMethod::AEM_Basic);
    
                if (bBasicEyeAdaptationEnabled)
                {
                    const bool bLogLumaInAlpha = true;
                    SceneDownsampleChain.Init(GraphBuilder, View, EyeAdaptationParameters, HalfResolutionSceneColor, DownsampleQuality, bLogLumaInAlpha);
    
                    // Use the alpha channel in the last downsample (smallest) to compute eye adaptations values.
                    EyeAdaptationTexture = AddBasicEyeAdaptationPass(GraphBuilder, View, EyeAdaptationParameters, SceneDownsampleChain.GetLastTexture(), LastEyeAdaptationTexture);
                }
                // Add histogram eye adaptation pass even if no histogram exists to support the manual clamping mode.
                else
                {
                    EyeAdaptationTexture = AddHistogramEyeAdaptationPass(GraphBuilder, View, EyeAdaptationParameters, HistogramTexture);
                }
            }
    
            FScreenPassTexture Bloom;
    
            // Bloom.
            if (bBloomEnabled)
            {
                FSceneDownsampleChain BloomDownsampleChain;
    
                FBloomInputs PassInputs;
                PassInputs.SceneColor = SceneColor;
    
                const bool bBloomThresholdEnabled = View.FinalPostProcessSettings.BloomThreshold > -1.0f;
    
                // Reuse the main scene downsample chain if a threshold isn't required for bloom.
                if (SceneDownsampleChain.IsInitialized() && !bBloomThresholdEnabled)
                {
                    PassInputs.SceneDownsampleChain = &SceneDownsampleChain;
                }
                else
                {
                    FScreenPassTexture DownsampleInput = HalfResolutionSceneColor;
    
                    if (bBloomThresholdEnabled)
                    {
                        const float BloomThreshold = View.FinalPostProcessSettings.BloomThreshold;
    
                        FBloomSetupInputs SetupPassInputs;
                        SetupPassInputs.SceneColor = DownsampleInput;
                        SetupPassInputs.EyeAdaptationTexture = EyeAdaptationTexture;
                        SetupPassInputs.Threshold = BloomThreshold;
    
                        DownsampleInput = AddBloomSetupPass(GraphBuilder, View, SetupPassInputs);
                    }
    
                    const bool bLogLumaInAlpha = false;
                    BloomDownsampleChain.Init(GraphBuilder, View, EyeAdaptationParameters, DownsampleInput, DownsampleQuality, bLogLumaInAlpha);
    
                    PassInputs.SceneDownsampleChain = &BloomDownsampleChain;
                }
    
                FBloomOutputs PassOutputs = AddBloomPass(GraphBuilder, View, PassInputs);
                SceneColor = PassOutputs.SceneColor;
                Bloom = PassOutputs.Bloom;
    
                FScreenPassTexture LensFlares = AddLensFlaresPass(GraphBuilder, View, Bloom, *PassInputs.SceneDownsampleChain);
    
                if (LensFlares.IsValid())
                {
                    Bloom = LensFlares;
                }
            }
    
            if (!Bloom.IsValid())
            {
                Bloom = BlackDummy;
            }
    
            SceneColorBeforeTonemap = SceneColor;
    
            // Tone mapping.
            if (PassSequence.IsEnabled(EPass::Tonemap))
            {
                const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_ReplacingTonemapper);
    
                if (MaterialChain.Num())
                {
                    const UMaterialInterface* HighestPriorityMaterial = MaterialChain[0];
    
                    FPostProcessMaterialInputs PassInputs;
                    PassSequence.AcceptOverrideIfLastPass(EPass::Tonemap, PassInputs.OverrideOutput);
                    PassInputs.SetInput(EPostProcessMaterialInput::SceneColor, SceneColor);
                    PassInputs.SetInput(EPostProcessMaterialInput::SeparateTranslucency, SeparateTranslucency);
                    PassInputs.SetInput(EPostProcessMaterialInput::CombinedBloom, Bloom);
                    PassInputs.SceneTextures = GetSceneTextureShaderParameters(Inputs.SceneTextures);
                    PassInputs.CustomDepthTexture = CustomDepth.Texture;
    
                    SceneColor = AddPostProcessMaterialPass(GraphBuilder, View, PassInputs, HighestPriorityMaterial);
                }
                else
                {
                    FRDGTextureRef ColorGradingTexture = nullptr;
    
                    if (bPrimaryView)
                    {
                        ColorGradingTexture = AddCombineLUTPass(GraphBuilder, View);
                    }
                    // We can re-use the color grading texture from the primary view.
                    else if (View.GetTonemappingLUT())
                    {
                        ColorGradingTexture = TryRegisterExternalTexture(GraphBuilder, View.GetTonemappingLUT());
                    }
                    else
                    {
                        const FViewInfo* PrimaryView = static_cast<const FViewInfo*>(View.Family->Views[0]);
                        ColorGradingTexture = TryRegisterExternalTexture(GraphBuilder, PrimaryView->GetTonemappingLUT());
                    }
    
                    FTonemapInputs PassInputs;
                    PassSequence.AcceptOverrideIfLastPass(EPass::Tonemap, PassInputs.OverrideOutput);
                    PassInputs.SceneColor = SceneColor;
                    PassInputs.Bloom = Bloom;
                    PassInputs.EyeAdaptationTexture = EyeAdaptationTexture;
                    PassInputs.ColorGradingTexture = ColorGradingTexture;
                    PassInputs.bWriteAlphaChannel = AntiAliasingMethod == AAM_FXAA || IsPostProcessingWithAlphaChannelSupported();
                    PassInputs.bOutputInHDR = bTonemapOutputInHDR;
    
                    SceneColor = AddTonemapPass(GraphBuilder, View, PassInputs);
                }
            }
            
            SceneColor = AddAfterPass(EPass::Tonemap, SceneColor);
            
            SceneColorAfterTonemap = SceneColor;
    
            // FXAA anti-aliasing.
            if (PassSequence.IsEnabled(EPass::FXAA))
            {
                FFXAAInputs PassInputs;
                PassSequence.AcceptOverrideIfLastPass(EPass::FXAA, PassInputs.OverrideOutput);
                PassInputs.SceneColor = SceneColor;
                PassInputs.Quality = GetFXAAQuality();
    
                SceneColor = AddFXAAPass(GraphBuilder, View, PassInputs);
            }
    
            SceneColor = AddAfterPass(EPass::FXAA, SceneColor);
    
            // Post-process material chain - After Tonemapping
            if (PassSequence.IsEnabled(EPass::PostProcessMaterialAfterTonemapping))
            {
                FPostProcessMaterialInputs PassInputs = GetPostProcessMaterialInputs(SceneColor);
                PassSequence.AcceptOverrideIfLastPass(EPass::PostProcessMaterialAfterTonemapping, PassInputs.OverrideOutput);
                PassInputs.SetInput(EPostProcessMaterialInput::PreTonemapHDRColor, SceneColorBeforeTonemap);
                PassInputs.SetInput(EPostProcessMaterialInput::PostTonemapHDRColor, SceneColorAfterTonemap);
                PassInputs.SceneTextures = GetSceneTextureShaderParameters(Inputs.SceneTextures);
    
                SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, PassInputs, PostProcessMaterialAfterTonemappingChain);
            }
    
            (......)
    
            SceneColor = AddAfterPass(EPass::VisualizeDepthOfField, SceneColor);
        }
        else // post-processing disabled for this view: minimal sequence with only translucency composition and gamma correction.
        {
            PassSequence.SetEnabled(EPass::MotionBlur, false);
            PassSequence.SetEnabled(EPass::Tonemap, true);
            PassSequence.SetEnabled(EPass::FXAA, false);
            PassSequence.SetEnabled(EPass::PostProcessMaterialAfterTonemapping, false);
            PassSequence.SetEnabled(EPass::VisualizeDepthOfField, false);
            PassSequence.Finalize();
    
            SceneColor.Texture = AddSeparateTranslucencyCompositionPass(GraphBuilder, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);
    
            SceneColorBeforeTonemap = SceneColor;
    
            if (PassSequence.IsEnabled(EPass::Tonemap))
            {
                FTonemapInputs PassInputs;
                PassSequence.AcceptOverrideIfLastPass(EPass::Tonemap, PassInputs.OverrideOutput);
                PassInputs.SceneColor = SceneColor;
                PassInputs.EyeAdaptationTexture = EyeAdaptationTexture;
                PassInputs.bOutputInHDR = bViewFamilyOutputInHDR;
                PassInputs.bGammaOnly = true;
    
                SceneColor = AddTonemapPass(GraphBuilder, View, PassInputs);
            }
    
            SceneColor = AddAfterPass(EPass::Tonemap, SceneColor);
    
            SceneColorAfterTonemap = SceneColor;
        }
    
    // Visualization post-processing passes.
        if (PassSequence.IsEnabled(EPass::VisualizeStationaryLightOverlap))
        {
            (......)
    
            SceneColor = AddVisualizeComplexityPass(GraphBuilder, View, PassInputs);
        }
    
    (......) // editor-only and visualization code omitted.
    
    // Primary upscale pass
        if (PassSequence.IsEnabled(EPass::PrimaryUpscale))
        {
            FUpscaleInputs PassInputs;
            PassSequence.AcceptOverrideIfLastPass(EPass::PrimaryUpscale, PassInputs.OverrideOutput);
            PassInputs.SceneColor = SceneColor;
            PassInputs.Method = GetUpscaleMethod();
            PassInputs.Stage = PassSequence.IsEnabled(EPass::SecondaryUpscale) ? EUpscaleStage::PrimaryToSecondary : EUpscaleStage::PrimaryToOutput;
    
        // Panini projection is handled by the primary upscale pass.
            PassInputs.PaniniConfig = PaniniConfig;
    
            SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
        }
    
    // Secondary upscale pass
        if (PassSequence.IsEnabled(EPass::SecondaryUpscale))
        {
            FUpscaleInputs PassInputs;
            PassSequence.AcceptOverrideIfLastPass(EPass::SecondaryUpscale, PassInputs.OverrideOutput);
            PassInputs.SceneColor = SceneColor;
            PassInputs.Method = View.Family->SecondaryScreenPercentageMethod == ESecondaryScreenPercentageMethod::LowerPixelDensitySimulation ? EUpscaleMethod::SmoothStep : EUpscaleMethod::Nearest;
            PassInputs.Stage = EUpscaleStage::SecondaryToOutput;
    
            SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
        }
    }
    

    AddPostProcessingPasses first declares a TOverridePassSequence instance and enables or disables its passes as needed, then branches on whether the view has post-processing enabled: if it does, each post-process effect is processed in sequence order; if not, only a minimal post-processing sequence is kept, containing just translucency composition and gamma correction.

    Whether a view has post-processing enabled is determined by the following function:

    bool IsPostProcessingEnabled(const FViewInfo& View)
    {
        if (View.GetFeatureLevel() >= ERHIFeatureLevel::SM5) // devices at SM5 or above
        {
            return
                // the view family has post-processing enabled
                 View.Family->EngineShowFlags.PostProcessing &&
                // and no visualization debug mode is active.
                !View.Family->EngineShowFlags.VisualizeDistanceFieldAO &&
                !View.Family->EngineShowFlags.VisualizeShadingModels &&
                !View.Family->EngineShowFlags.VisualizeMeshDistanceFields &&
                !View.Family->EngineShowFlags.VisualizeGlobalDistanceField &&
                !View.Family->EngineShowFlags.ShaderComplexity;
        }
    // devices below SM5
    else
    {
        // Post-processing enabled on the view family, shader complexity debug mode off, and the mobile HDR pipeline in use.
            return View.Family->EngineShowFlags.PostProcessing && !View.Family->EngineShowFlags.ShaderComplexity && IsMobileHDR();
        }
    }
    

    These specific post-processing passes all share a uniform shape: input textures, input parameters and an output texture. The input textures always include SceneColor, the output texture is usually also SceneColor, and the SceneColor output of one pass serves as the SceneColor input of the next, which is how the different post-process effects are stacked; see the sketch below.
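
    Below is a simplified, hypothetical pass that follows this convention. FRDGBuilder, FViewInfo, FScreenPassTexture, FScreenPassRenderTarget and AddDrawTexturePass are engine types and helpers that appear elsewhere in this chapter; AddMyPassthroughPass itself is invented for illustration:

    // A made-up pass with the standard shape: SceneColor in, SceneColor out.
    FScreenPassTexture AddMyPassthroughPass(FRDGBuilder& GraphBuilder, const FViewInfo& View, FScreenPassTexture SceneColor)
    {
        // Allocate an output texture matching the input description.
        FRDGTextureDesc OutputDesc = SceneColor.Texture->Desc;
        OutputDesc.Reset();
        FScreenPassRenderTarget Output(GraphBuilder.CreateTexture(OutputDesc, TEXT("MyPassthrough")), SceneColor.ViewRect, ERenderTargetLoadAction::ENoAction);

        // Real work would go here; a plain copy keeps the sketch minimal.
        AddDrawTexturePass(GraphBuilder, View, SceneColor, Output);

        // The returned texture becomes the SceneColor input of the next pass:
        //   SceneColor = AddMyPassthroughPass(GraphBuilder, View, SceneColor);
        return Output;
    }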

    Also note that some (screen-space) post-process effects are not represented in PassSequence at all, but are instead inserted at specific points in the post-processing pipeline.

    7.3.2 TOverridePassSequence

    Since post-process effects are stacked on top of each other, their blending (processing) order matters; with an improper order, unexpected results are produced.

    Given an enum type, TOverridePassSequence manages all passes and executes them in order according to special rules. It is defined as follows:

    // Engine\Source\Runtime\Renderer\Private\OverridePassSequence.h
    
    template <typename EPass>
    class TOverridePassSequence final
    {
    public:
        TOverridePassSequence(const FScreenPassRenderTarget& InOverrideOutput)
            : OverrideOutput(InOverrideOutput)
        {}
    
        ~TOverridePassSequence();
    
        // Set names
        void SetName(EPass Pass, const TCHAR* Name);
        void SetNames(const TCHAR* const* Names, uint32 NameCount);
    
        // Enable or disable a given pass.
        void SetEnabled(EPass Pass, bool bEnabled);
        bool IsEnabled(EPass Pass) const;
    
        // Whether this is the last pass.
        bool IsLastPass(EPass Pass) const;
    
        // Accept a pass; raises an error if accepted out of order.
        void AcceptPass(EPass Pass)
        {
    #if RDG_ENABLE_DEBUG
            const int32 PassIndex = (int32)Pass;
    
            check(bFinalized);
            checkf(NextPass == Pass, TEXT("Pass was accepted out of order: %s. Expected %s."), Passes[PassIndex].Name, Passes[(int32)NextPass].Name);
            checkf(Passes[PassIndex].bEnabled, TEXT("Only accepted passes can be enabled: %s."), Passes[PassIndex].Name);
    
            Passes[PassIndex].bAccepted = true;
    
            // Walk the remaining passes until we hit one that's enabled. This will be the next pass to add.
            for (int32 NextPassIndex = int32(NextPass) + 1; NextPassIndex < PassCountMax; ++NextPassIndex)
            {
                if (Passes[NextPassIndex].bEnabled)
                {
                    NextPass = EPass(NextPassIndex);
                    break;
                }
            }
    #endif
        }
    
        // If the pass is the last one, accept the override render target.
        bool AcceptOverrideIfLastPass(EPass Pass, FScreenPassRenderTarget& OutTargetToOverride, const TOptional<int32>& AfterPassCallbackIndex = TOptional<int32>())
        {
            bool bLastAfterPass = AfterPass[(int32)Pass].Num() == 0;
    
            if (AfterPassCallbackIndex)
            {
                bLastAfterPass = AfterPassCallbackIndex.GetValue() == AfterPass[(int32)Pass].Num() - 1;
            }
            else
            {
                // Display debug information for a Pass unless it is an after pass.
                AcceptPass(Pass);
            }
    
            // We need to override output only if this is the last pass and the last after pass.
            if (IsLastPass(Pass) && bLastAfterPass)
            {
                OutTargetToOverride = OverrideOutput;
                return true;
            }
    
            return false;
        }
    
        // Finalize pass enabling.
        void Finalize()
        {
    #if RDG_ENABLE_DEBUG
            check(!bFinalized);
            bFinalized = true;
    
            for (int32 PassIndex = 0; PassIndex < PassCountMax; ++PassIndex)
            {
                checkf(Passes[PassIndex].bAssigned, TEXT("Pass was not assigned to enabled or disabled: %s."), Passes[PassIndex].Name);
            }
    #endif
    
            bool bFirstPass = true;
    
            for (int32 PassIndex = 0; PassIndex < PassCountMax; ++PassIndex)
            {
                if (Passes[PassIndex].bEnabled)
                {
                    if (bFirstPass)
                    {
    #if RDG_ENABLE_DEBUG
                        NextPass = (EPass)PassIndex;
    #endif
                        bFirstPass = false;
                    }
                    LastPass = (EPass)PassIndex;
                }
            }
        }
    
        FAfterPassCallbackDelegateArray& GetAfterPassCallbacks(EPass Pass);
    
    private:
        static const int32 PassCountMax = (int32)EPass::MAX;
    
        struct FPassInfo
        {
    #if RDG_ENABLE_DEBUG
            const TCHAR* Name = nullptr;
            bool bAssigned = false;
            bool bAccepted = false;
    #endif
            bool bEnabled = false;
        };
    
        FScreenPassRenderTarget OverrideOutput;
        TStaticArray<FPassInfo, PassCountMax> Passes;
        TStaticArray<FAfterPassCallbackDelegateArray, PassCountMax> AfterPass;
        EPass LastPass = EPass::MAX;
    
    #if RDG_ENABLE_DEBUG
        EPass NextPass = EPass(0);
        bool bFinalized = false;
    #endif
    };
    

    With TOverridePassSequence an ordered set of post-process effects can be conveniently implemented, managed and executed. But a few points deserve attention:

    • After all passes have been enabled or disabled, Finalize must be called once manually on PassSequence, otherwise an error is raised in development builds.

    • TOverridePassSequence requires each pass to be explicitly enabled or disabled; if that state does not match the passes actually added, an error is raised. Below, several situations are examined in detail as to whether PassSequence raises an error (in development builds):

      • Pass A is enabled, no pass is added to GraphBuilder, and AcceptOverrideIfLastPass is not called: error.
      • Pass A is enabled, a pass is added to GraphBuilder, and AcceptOverrideIfLastPass is not called: error.
      • Pass A is enabled, no pass is added to GraphBuilder, and AcceptOverrideIfLastPass is called: no error. PassSequence cannot detect this abnormal case!!
      • Pass A is disabled, a pass is added to GraphBuilder, and AcceptOverrideIfLastPass is called: error.
      • Pass A is disabled, a pass is added to GraphBuilder, and AcceptOverrideIfLastPass is not called: no error. PassSequence cannot detect this abnormal case!!
      • If passes A and B are both enabled, but B adds its pass to GraphBuilder and calls AcceptOverrideIfLastPass before A does: error.

      As a concrete example, consider the following code:

      // Disable FXAA in the pass sequence.
      PassSequence.SetEnabled(EPass::FXAA, false);
      
      (......)
      
      // Build the FXAA input parameters
      FFXAAInputs PassInputs;
      // Accept the pass.
      PassSequence.AcceptOverrideIfLastPass(EPass::FXAA, PassInputs.OverrideOutput);
      PassInputs.SceneColor = SceneColor;
      PassInputs.Quality = GetFXAAQuality();
      
      // Add the FXAA pass.
      SceneColor = AddFXAAPass(GraphBuilder, View, PassInputs);
      

      Because the code above has disabled the FXAA pass in PassSequence but still tries to add the pass and call AcceptOverrideIfLastPass for it, the following error is raised in development builds:

      It means the pass was not accepted in the order of the enabled passes; the check fires inside AcceptOverrideIfLastPass (via AcceptPass).
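
      For comparison, the correct pattern, as the engine's own FXAA code earlier in this section shows, pairs SetEnabled with a matching IsEnabled-guarded block:

      // Enable FXAA in the pass sequence only when it is the active AA method.
      PassSequence.SetEnabled(EPass::FXAA, AntiAliasingMethod == AAM_FXAA);
      
      (......)
      
      // Add the pass (and accept it) only when it was actually enabled.
      if (PassSequence.IsEnabled(EPass::FXAA))
      {
          FFXAAInputs PassInputs;
          PassSequence.AcceptOverrideIfLastPass(EPass::FXAA, PassInputs.OverrideOutput);
          PassInputs.SceneColor = SceneColor;
          PassInputs.Quality = GetFXAAQuality();
      
          SceneColor = AddFXAAPass(GraphBuilder, View, PassInputs);
      }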

    7.3.3 BlendableLocation

    BlendableLocation is the blend location of a post process material. It is defined as follows:

    enum EBlendableLocation
    {
        // After tone mapping.
        BL_AfterTonemapping,
        // Before tone mapping.
        BL_BeforeTonemapping,
        // Before translucency composition.
        BL_BeforeTranslucency,
        // Replacing the tonemapper.
        BL_ReplacingTonemapper,
        // SSR input.
        BL_SSRInput,

        BL_MAX,
    };
    

    EBlendableLocation can be specified in the Details panel of the material editor:

    The default blend location is After Tonemapping, but it can be changed to achieve different effects. For example, if our post-process effect needs the scene color before tone mapping, the blend location should be changed to Before Tonemapping; to implement a custom tone mapping algorithm replacing UE's default, change it to Replacing the Tonemapper; if the effect should not affect translucent objects, change it to Before Translucency; and to implement a custom SSR algorithm, change it to SSR Input.
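
    The same property can also be driven from code, for instance in an editor utility. The sketch below assumes an editor context; MakeReplacingTonemapper is a made-up helper, while MaterialDomain, BlendableLocation and BlendablePriority are real UMaterial properties:

    #include "Materials/Material.h"

    // Hypothetical editor-side helper: turn a material into a tonemapper replacement.
    void MakeReplacingTonemapper(UMaterial* PPMaterial)
    {
        PPMaterial->MaterialDomain = MD_PostProcess;            // post process domain
        PPMaterial->BlendableLocation = BL_ReplacingTonemapper; // replace UE's tonemapper
        PPMaterial->BlendablePriority = 0;
    #if WITH_EDITOR
        PPMaterial->PostEditChange(); // trigger recompilation in the editor
    #endif
    }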

    To clarify the role of BlendableLocation in the post-processing pipeline and how it is processed, the related types, code and steps are extracted below:

    // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.h
    
    using FPostProcessMaterialChain = TArray<const UMaterialInterface*, TInlineAllocator<10>>;
    FPostProcessMaterialChain GetPostProcessMaterialChain(const FViewInfo& View, EBlendableLocation Location);
    
    // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.cpp
        
    FPostProcessMaterialChain GetPostProcessMaterialChain(const FViewInfo& View, EBlendableLocation Location)
    {
        if (!IsPostProcessMaterialsEnabledForView(View))
        {
            return {};
        }
    
        const FSceneViewFamily& ViewFamily = *View.Family;
    
        TArray<FPostProcessMaterialNode, TInlineAllocator<10>> Nodes;
        FBlendableEntry* Iterator = nullptr;
    
        (......)
    
        // Iterate over the view's post-process settings and collect all post-process material nodes. Note the iterator is already scoped to Location, so every node added to Nodes is a material at that Location.
        while (FPostProcessMaterialNode* Data = IteratePostProcessMaterialNodes(View.FinalPostProcessSettings, Location, Iterator))
        {
            check(Data->GetMaterialInterface());
            Nodes.Add(*Data);
        }
    
        if (!Nodes.Num())
        {
            return {};
        }
    
        // Sort by priority.
        ::Sort(Nodes.GetData(), Nodes.Num(), FPostProcessMaterialNode::FCompare());
    
        FPostProcessMaterialChain OutputChain;
        OutputChain.Reserve(Nodes.Num());
    
        // Add the materials to the output list.
        for (const FPostProcessMaterialNode& Node : Nodes)
        {
            OutputChain.Add(Node.GetMaterialInterface());
        }
    
        return OutputChain;
    }
    
    // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessing.cpp
    
    void AddPostProcessingPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessingInputs& Inputs)
    {
        (......)
        
        const FPostProcessMaterialChain PostProcessMaterialAfterTonemappingChain = GetPostProcessMaterialChain(View, BL_AfterTonemapping);
        
        (......)
        
        // Post-process material chain - Before Translucency
        const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTranslucency);
        SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
        
        (......)
        
        // Composite the separate translucency texture into the scene color texture.
        LocalSceneColorTexture = AddSeparateTranslucencyCompositionPass(GraphBuilder, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);
        
        (......)
        
        // Post-process material chain - Before Tonemapping
        const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTonemapping);
        SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
        
        (......)
        
        // Post-process material chain - SSR Input
        const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_SSRInput);
        GraphBuilder.QueueTextureExtraction(PassOutput.Texture, &View.ViewState->PrevFrameViewInfo.CustomSSRInput);
        
        (......)
        
        // Tone mapping
        if (PassSequence.IsEnabled(EPass::Tonemap))
        {
            const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_ReplacingTonemapper);
    
            // If a material exists that replaces UE's default tone mapping, run it.
            if (MaterialChain.Num())
            {
                SceneColor = AddPostProcessMaterialPass(GraphBuilder, View, PassInputs, HighestPriorityMaterial);
            }
            // No material replaces UE's default tone mapping; run the default tonemapper.
            else
            {
                SceneColor = AddTonemapPass(GraphBuilder, View, PassInputs);
            }
        }
    
        // Post-process material chain - After Tonemapping
        SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, PassInputs, PostProcessMaterialAfterTonemappingChain);
    
        (......)
    }
    

    As the code above shows, BlendableLocation lives up to its name: the literal meaning of each Location already tells you where it runs. The most important anchor is tone mapping; custom post-process materials can be injected before it, in place of it, and after it, contributing to the engine's extensibility.

    7.3.4 PostProcessMaterial

    PostProcessMaterial is what processes and renders the post-process materials of each BlendableLocation; its definition and related types are as follows:

    // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.h
    
    // Post-process material input slots.
    enum class EPostProcessMaterialInput : uint32
    {
        SceneColor = 0, // Scene color, always active (available). Comes from the output of the previous post-process.
        SeparateTranslucency = 1, // Separate translucency texture, always active.
        CombinedBloom = 2, // Combined bloom.

        // Only used for visualization.
        PreTonemapHDRColor = 2,
        PostTonemapHDRColor = 3,

        // Velocity.
        Velocity = 4
    };
    
    // Post-process material uniform buffer.
    BEGIN_SHADER_PARAMETER_STRUCT(FPostProcessMaterialParameters, )
        SHADER_PARAMETER_STRUCT_REF(FViewUniformShaderParameters, View)
        SHADER_PARAMETER_STRUCT_INCLUDE(FSceneTextureShaderParameters, SceneTextures)
        SHADER_PARAMETER_STRUCT(FScreenPassTextureViewportParameters, PostProcessOutput)
        SHADER_PARAMETER_STRUCT_ARRAY(FScreenPassTextureInput, PostProcessInput, [kPostProcessMaterialInputCountMax])
        SHADER_PARAMETER_SAMPLER(SamplerState, PostProcessInput_BilinearSampler)
        SHADER_PARAMETER_RDG_TEXTURE(Texture2D, MobileCustomStencilTexture)
        SHADER_PARAMETER_SAMPLER(SamplerState, MobileCustomStencilTextureSampler)
        SHADER_PARAMETER_RDG_TEXTURE(Texture2D, EyeAdaptationTexture)
        SHADER_PARAMETER_SRV(Buffer<float4>, EyeAdaptationBuffer)
        SHADER_PARAMETER(int32, MobileStencilValueRef)
        SHADER_PARAMETER(uint32, bFlipYAxis)
        SHADER_PARAMETER(uint32, bMetalMSAAHDRDecode)
        RENDER_TARGET_BINDING_SLOTS()
    END_SHADER_PARAMETER_STRUCT()
    
    // Post-process material inputs.
    struct FPostProcessMaterialInputs
    {
        inline void SetInput(EPostProcessMaterialInput Input, FScreenPassTexture Texture)
        {
            Textures[(uint32)Input] = Texture;
        }
    
        inline FScreenPassTexture GetInput(EPostProcessMaterialInput Input) const
        {
            return Textures[(uint32)Input];
        }
    
        // Validate texture validity.
        inline void Validate() const
        {
            ValidateInputExists(EPostProcessMaterialInput::SceneColor);
            ValidateInputExists(EPostProcessMaterialInput::SeparateTranslucency);
    
            // Either override output format is valid or the override output texture is; not both.
            if (OutputFormat != PF_Unknown)
            {
                check(OverrideOutput.Texture == nullptr);
            }
            if (OverrideOutput.Texture)
            {
                check(OutputFormat == PF_Unknown);
            }
    
            check(SceneTextures.SceneTextures || SceneTextures.MobileSceneTextures);
        }
    
        inline void ValidateInputExists(EPostProcessMaterialInput Input) const
        {
            const FScreenPassTexture Texture = GetInput(EPostProcessMaterialInput::SceneColor);
            check(Texture.IsValid());
        }
    
        // Optional: render into this specific RT. If not set, a new texture is created.
        FScreenPassRenderTarget OverrideOutput;
        // Texture list.
        TStaticArray<FScreenPassTexture, kPostProcessMaterialInputCountMax> Textures;
        // Output RT format.
        EPixelFormat OutputFormat = PF_Unknown;
        // Custom depth texture.
        FRDGTextureRef CustomDepthTexture = nullptr;

        // The scene's GBuffer.
        FSceneTextureShaderParameters SceneTextures;

        // Whether to flip the Y axis.
        bool bFlipYAxis = false;
        // Whether the input scene color may be used as the output.
        bool bAllowSceneColorInputAsOutput = true;
        // Special flag for Metal MSAA.
        bool bMetalMSAAHDRDecode = false;
    };
    
    // Add a post-process material pass.
    FScreenPassTexture AddPostProcessMaterialPass(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessMaterialInputs& Inputs, const UMaterialInterface* MaterialInterface);
    
    // Add a post-process material chain.
    FScreenPassTexture AddPostProcessMaterialChain(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessMaterialInputs& Inputs, const FPostProcessMaterialChain& MaterialChain);
    

    Next, let's continue with AddPostProcessMaterialChain, which the post-processing pipeline calls:

    // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.cpp
    
    FScreenPassTexture AddPostProcessMaterialChain(
        FRDGBuilder& GraphBuilder,
        const FViewInfo& View,
        const FPostProcessMaterialInputs& InputsTemplate,
        const FPostProcessMaterialChain& Materials)
    {
        // Initialize the output to the scene color input.
        FScreenPassTexture Outputs = InputsTemplate.GetInput(EPostProcessMaterialInput::SceneColor);
    
        (......)
        
        // Iterate over the material chain, adding one pass per material.
        for (const UMaterialInterface* MaterialInterface : Materials)
        {
            FPostProcessMaterialInputs Inputs = InputsTemplate;
            Inputs.SetInput(EPostProcessMaterialInput::SceneColor, Outputs);
            
            (......)
    
            // If this is not the last material, do not apply the output override.
            if (MaterialInterface != Materials.Last())
            {
                Inputs.OverrideOutput = FScreenPassRenderTarget();
                Inputs.bFlipYAxis = false;
            }
    
            // Add a single post-process material pass (analyzed below).
            Outputs = AddPostProcessMaterialPass(GraphBuilder, View, Inputs, MaterialInterface);
        }
    
        return Outputs;
    }
    
    // Add a single post-process material pass.
    FScreenPassTexture AddPostProcessMaterialPass(
        FRDGBuilder& GraphBuilder,
        const FViewInfo& View,
        const FPostProcessMaterialInputs& Inputs,
        const UMaterialInterface* MaterialInterface)
    {
        // Validate the inputs.
        Inputs.Validate();
    
        // Initialize input data.
        const FScreenPassTexture SceneColor = Inputs.GetInput(EPostProcessMaterialInput::SceneColor);
        const ERHIFeatureLevel::Type FeatureLevel = View.GetFeatureLevel();
    
        const FMaterial* Material = nullptr;
        const FMaterialRenderProxy* MaterialRenderProxy = nullptr;
        const FMaterialShaderMap* MaterialShaderMap = nullptr;
        GetMaterialInfo(MaterialInterface, FeatureLevel, Inputs.OutputFormat, Material, MaterialRenderProxy, MaterialShaderMap);
    
        FRHIDepthStencilState* DefaultDepthStencilState = FScreenPassPipelineState::FDefaultDepthStencilState::GetRHI();
        FRHIDepthStencilState* DepthStencilState = DefaultDepthStencilState;
    
        FRDGTextureRef DepthStencilTexture = nullptr;
    
        // Allocate custom depth stencil texture(s) and depth stencil state.
        const ECustomDepthPolicy CustomStencilPolicy = GetMaterialCustomDepthPolicy(Material, FeatureLevel);
    
        if (CustomStencilPolicy == ECustomDepthPolicy::Enabled)
        {
            check(Inputs.CustomDepthTexture);
            DepthStencilTexture = Inputs.CustomDepthTexture;
            DepthStencilState = GetMaterialStencilState(Material);
        }
    
        // Blend state.
        FRHIBlendState* DefaultBlendState = FScreenPassPipelineState::FDefaultBlendState::GetRHI();
        FRHIBlendState* BlendState = DefaultBlendState;
        
        if (IsMaterialBlendEnabled(Material))
        {
            BlendState = GetMaterialBlendState(Material);
        }
    
        // Process various flags.
        const bool bCompositeWithInput = DepthStencilState != DefaultDepthStencilState || BlendState != DefaultBlendState;
        const bool bPrimeOutputColor = bCompositeWithInput || !View.IsFirstInFamily();
        const bool bBackbufferWithDepthStencil = (DepthStencilTexture != nullptr && !GRHISupportsBackBufferWithCustomDepthStencil && Inputs.OverrideOutput.IsValid());
        const bool bCompositeWithInputAndFlipY = bCompositeWithInput && Inputs.bFlipYAxis;
        const bool bCompositeWithInputAndDecode = Inputs.bMetalMSAAHDRDecode && bCompositeWithInput;
        const bool bForceIntermediateTarget = bBackbufferWithDepthStencil || bCompositeWithInputAndFlipY || bCompositeWithInputAndDecode;
    
        // Render output.
        FScreenPassRenderTarget Output = Inputs.OverrideOutput;
    
        // Use the scene color as the output.
        if (!Output.IsValid() && !MaterialShaderMap->UsesSceneTexture(PPI_PostProcessInput0) && bPrimeOutputColor && !bForceIntermediateTarget && Inputs.bAllowSceneColorInputAsOutput)
        {
            Output = FScreenPassRenderTarget(SceneColor, ERenderTargetLoadAction::ELoad);
        }
        else
        {
            // Create a new texture as the output.
            if (!Output.IsValid() || bForceIntermediateTarget)
            {
                FRDGTextureDesc OutputDesc = SceneColor.Texture->Desc;
                OutputDesc.Reset();
                if (Inputs.OutputFormat != PF_Unknown)
                {
                    OutputDesc.Format = Inputs.OutputFormat;
                }
                OutputDesc.ClearValue = FClearValueBinding(FLinearColor::Black);
                OutputDesc.Flags |= GFastVRamConfig.PostProcessMaterial;
    
                Output = FScreenPassRenderTarget(GraphBuilder.CreateTexture(OutputDesc, TEXT("PostProcessMaterial")), SceneColor.ViewRect, View.GetOverwriteLoadAction());
            }
    
            if (bPrimeOutputColor || bForceIntermediateTarget)
            {
                // Copy existing contents to new output and use load-action to preserve untouched pixels.
                if (Inputs.bMetalMSAAHDRDecode)
                {
                    AddMobileMSAADecodeAndDrawTexturePass(GraphBuilder, View, SceneColor, Output);
                }
                else
                {
                    AddDrawTexturePass(GraphBuilder, View, SceneColor, Output);
                }
                Output.LoadAction = ERenderTargetLoadAction::ELoad;
            }
        }
    
        const FScreenPassTextureViewport SceneColorViewport(SceneColor);
        const FScreenPassTextureViewport OutputViewport(Output);
    
        RDG_EVENT_SCOPE(GraphBuilder, "PostProcessMaterial %dx%d Material=%s", SceneColorViewport.Rect.Width(), SceneColorViewport.Rect.Height(), *Material->GetFriendlyName());
    
        const uint32 MaterialStencilRef = Material->GetStencilRefValue();
    
        const bool bMobilePlatform = IsMobilePlatform(View.GetShaderPlatform());
    
        // Fill in the post-process material parameters.
        FPostProcessMaterialParameters* PostProcessMaterialParameters = GraphBuilder.AllocParameters<FPostProcessMaterialParameters>();
        PostProcessMaterialParameters->SceneTextures = Inputs.SceneTextures;
        PostProcessMaterialParameters->View = View.ViewUniformBuffer;
        if (bMobilePlatform)
        {
            PostProcessMaterialParameters->EyeAdaptationBuffer = GetEyeAdaptationBuffer(View);
        }
        else
        {
            PostProcessMaterialParameters->EyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);
        }
        PostProcessMaterialParameters->PostProcessOutput = GetScreenPassTextureViewportParameters(OutputViewport);
        PostProcessMaterialParameters->MobileCustomStencilTexture = DepthStencilTexture;
        PostProcessMaterialParameters->MobileCustomStencilTextureSampler = TStaticSamplerState<SF_Point, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
        PostProcessMaterialParameters->MobileStencilValueRef = MaterialStencilRef;
        PostProcessMaterialParameters->RenderTargets[0] = Output.GetRenderTargetBinding();
        PostProcessMaterialParameters->bMetalMSAAHDRDecode = Inputs.bMetalMSAAHDRDecode ? 1 : 0;
    
        // Handle the depth-stencil buffer.
        if (DepthStencilTexture && !bMobilePlatform)
        {
            PostProcessMaterialParameters->RenderTargets.DepthStencil = FDepthStencilBinding(
                DepthStencilTexture,
                ERenderTargetLoadAction::ELoad,
                ERenderTargetLoadAction::ELoad,
                FExclusiveDepthStencil::DepthRead_StencilRead);
        }
        else if (!DepthStencilTexture && bMobilePlatform && Material->IsStencilTestEnabled())
        {
            PostProcessMaterialParameters->MobileCustomStencilTexture = GSystemTextures.GetBlackDummy(GraphBuilder);
            
            switch (Material->GetStencilCompare())
            {
            case EMaterialStencilCompare::MSC_Less:
                PostProcessMaterialParameters->MobileStencilValueRef = -1;
                break;
            case EMaterialStencilCompare::MSC_LessEqual:
            case EMaterialStencilCompare::MSC_GreaterEqual:
            case EMaterialStencilCompare::MSC_Equal:
                PostProcessMaterialParameters->MobileStencilValueRef = 0;
                break;
            case EMaterialStencilCompare::MSC_Greater:
            case EMaterialStencilCompare::MSC_NotEqual:
                PostProcessMaterialParameters->MobileStencilValueRef = 1;
                break;
            case EMaterialStencilCompare::MSC_Always:
                PostProcessMaterialParameters->MobileStencilValueRef = 256;
                break;
            default:
                break;
            }
        }
    
        // System textures and samplers.
        PostProcessMaterialParameters->PostProcessInput_BilinearSampler = TStaticSamplerState<SF_Bilinear, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
        const FScreenPassTexture BlackDummy(GSystemTextures.GetBlackDummy(GraphBuilder));
        GraphBuilder.RemoveUnusedTextureWarning(BlackDummy.Texture);
        FRHISamplerState* PointClampSampler = TStaticSamplerState<SF_Point, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
    
        // 处理材质后处理参数的输入槽(PostProcessInput0~PostProcessInput4).
        for (uint32 InputIndex = 0; InputIndex < kPostProcessMaterialInputCountMax; ++InputIndex)
        {
            FScreenPassTexture Input = Inputs.GetInput((EPostProcessMaterialInput)InputIndex);
    
            // 如果指定插槽的输入纹理不存在或后处理材质没有用到该插槽,则将输入纹理置空。
            if (!Input.Texture || !MaterialShaderMap->UsesSceneTexture(PPI_PostProcessInput0 + InputIndex))
            {
                Input = BlackDummy;
            }
    
            PostProcessMaterialParameters->PostProcessInput[InputIndex] = GetScreenPassTextureInput(Input, PointClampSampler);
        }
    
        const bool bIsMobile = FeatureLevel <= ERHIFeatureLevel::ES3_1;
        PostProcessMaterialParameters->bFlipYAxis = Inputs.bFlipYAxis && !bForceIntermediateTarget;
    
        // 处理后处理材质的VS和PS.
        FPostProcessMaterialShader::FPermutationDomain PermutationVector;
        PermutationVector.Set<FPostProcessMaterialShader::FMobileDimension>(bIsMobile);
    
        TShaderRef<FPostProcessMaterialVS> VertexShader = MaterialShaderMap->GetShader<FPostProcessMaterialVS>(PermutationVector);
        TShaderRef<FPostProcessMaterialPS> PixelShader = MaterialShaderMap->GetShader<FPostProcessMaterialPS>(PermutationVector);
        ClearUnusedGraphResources(VertexShader, PixelShader, PostProcessMaterialParameters);
    
        EScreenPassDrawFlags ScreenPassFlags = EScreenPassDrawFlags::AllowHMDHiddenAreaMask;
    
        if (PostProcessMaterialParameters->bFlipYAxis)
        {
            ScreenPassFlags |= EScreenPassDrawFlags::FlipYAxis;
        }
    
        // 增加全屏幕绘制.
        AddDrawScreenPass(
            GraphBuilder,
            RDG_EVENT_NAME("PostProcessMaterial"),
            View,
            OutputViewport,
            SceneColorViewport,
            FScreenPassPipelineState(VertexShader, PixelShader, BlendState, DepthStencilState),
            PostProcessMaterialParameters,
            ScreenPassFlags,
            [&View, VertexShader, PixelShader, MaterialRenderProxy, PostProcessMaterialParameters, MaterialStencilRef](FRHICommandListImmediate& RHICmdList)
        {
            FPostProcessMaterialVS::SetParameters(RHICmdList, VertexShader, View, MaterialRenderProxy, *PostProcessMaterialParameters);
            FPostProcessMaterialPS::SetParameters(RHICmdList, PixelShader, View, MaterialRenderProxy, *PostProcessMaterialParameters);
            RHICmdList.SetStencilRef(MaterialStencilRef);
        });
    
        // 处理翻转和输出覆盖.
        if (bForceIntermediateTarget && !bCompositeWithInputAndDecode)
        {
            if (!Inputs.bFlipYAxis)
            {
                // We shouldn't get here unless we had an override target.
                check(Inputs.OverrideOutput.IsValid());
                AddDrawTexturePass(GraphBuilder, View, Output.Texture, Inputs.OverrideOutput.Texture);
                Output = Inputs.OverrideOutput;
            }
            else
            {
                FScreenPassRenderTarget TempTarget = Output;
                if (Inputs.OverrideOutput.IsValid())
                {
                    Output = Inputs.OverrideOutput;
                }
                else
                {
                    Output = FScreenPassRenderTarget(SceneColor, ERenderTargetLoadAction::ENoAction);
                }
    
                AddCopyAndFlipTexturePass(GraphBuilder, View, TempTarget.Texture, Output.Texture);
            }
        }
    
        return MoveTemp(Output);
    }
    

    需要注意,在后处理材质中,上述代码处理的PostProcessInput0~PostProcessInput4对应着材质编辑器SceneTexture节点的PostProcessInput插槽(下图)。

    除了PostProcessInput0被SceneColor占用,其它插槽可以搭载自定义的纹理,以便在材质编辑器中访问。示例:

    for (uint32 InputIndex = 0; InputIndex < kPostProcessMaterialInputCountMax; ++InputIndex)
    {
        FScreenPassTexture Input = Inputs.GetInput((EPostProcessMaterialInput)InputIndex);
    
        if (!Input.Texture || !MaterialShaderMap->UsesSceneTexture(PPI_PostProcessInput0 + InputIndex))
        {
            // 如果自定义的输入纹理有效, 则放到插槽4.
            if(MyInput.Texture && InputIndex == 4)
            {
                Input = MyInput;
            }
            else
            {
                Input = BlackDummy;
            }
        }
    
        PostProcessMaterialParameters->PostProcessInput[InputIndex] = GetScreenPassTextureInput(Input, PointClampSampler);
    }
    

    这样就实现了搭载自定义的纹理。当然这种方式带点侵入性,如果需要更加优雅的方式,则需要扩展EPostProcessMaterialInput并修改相关代码。

    7.4 后处理技术

    本章将阐述UE内置的常见的后处理技术,包含部分屏幕空间的渲染技术。

    7.4.1 Gamma和线性空间

    对图形学或PBR有所了解的同学应该清楚,现代图形引擎中存在线性空间的渲染管线和传统的sRGB空间的渲染管线,它们之间的区别如下图所示:

    上:Gamma空间渲染管线。在渲染前后不会对纹理颜色执行线性转换。

    下:线性空间渲染管线。在shader前期去除了Gamma校正,在shader后期恢复Gamma校正。

    为什么线性空间的渲染管线要在前期和后期分别去掉又加回Gamma校正呢?

    这其实是历史遗留的问题。

    早期的电视机采用CRT显像管,其显示亮度与输入电压不成正比,而是近似指数为2.2的幂次曲线。为了补偿这一非线性,就在编码端引入指数约为0.45(即1/2.2)的Gamma校正,强行提亮图像数据,让人眼感知的显示亮度与原始信号成线性比例。久而久之,之后的很多硬件设备、色彩空间(如sRGB)、文件格式(如jpeg、png等)、DCC软件(如Photoshop)默认都执行了Gamma校正,并且一直沿用至今。

    虽然当今的液晶显示器不再需要Gamma校正,但为了兼容已经广泛存在的有着Gamma校正的色彩空间及标准,也不得不保留Gamma校正。所以在shader后期还是要恢复Gamma校正,以便图片能够在显示设备正常显示。

    两个不同的渲染管线,对最终的光照结果会产生较大的差异:

    上半部分是线性空间,在不同光强度的反应下,能够得到更加物理正确的结果;下半部分是Gamma空间的计算,对光强度的反应过于强烈,得到过暗或过曝的画面。

    Gamma和线性空间的颜色是可以互相转换的,利用简单的指数运算即可:

    \[ c' = f(c) = c^n = \mathrm{pow}(c, n) \]

    其中,\(c\)是输入颜色,\(c'\)是输出颜色,\(n\)是Gamma校正指数。下图是\(n\)分别取0.45、1.0、2.2时的Gamma校正曲线图:

    1.0是线性空间,输入输出值一样;0.45和2.2是Gamma曲线,处于此空间的色彩将被提亮或压暗,并且\(0.45 \cdot 2.2 \approx 1.0\),以保证两次Gamma校正之后能够恢复到线性空间:

    \[ c' = f_{\gamma=2.2}(f_{\gamma=0.45}(c)) = (c^{0.45})^{2.2} = c^{0.45 \cdot 2.2} = c^{0.99} \approx c \]

    默认情况下,UE的渲染管线已经是线性空间的,意味着所有纹理和颜色在shader计算过程中需要保持在线性空间状态,最后呈现到屏幕前又需要经过Gamma校正转换成sRGB(如果显示器支持HDR或线性空间,则可以不需要)。

    通常情况下,UE在导入原始的纹理资源时,已经将纹理转换成了线性空间:

    导入sRGB的图片后,UE默认将其转换成线性空间。

    如果导入的图片已经是线性空间的,则需要去掉sRGB的勾选。如果想在材质编辑器中动态地转换Gamma校正,则可以使用类似以下的材质节点:

    当然,绝大多数情况下,在材质编辑器中,我们不需要关心Gamma和线性空间的转换,因为UE已经在背后为我们进行了处理。通常情况下,Gamma恢复会和色调映射一起处理。
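
    作为补充,下面用C++给出标准sRGB分段转换公式的草图(注意它并非简单的pow(c, 2.2)近似;仅作示意,非UE源码):

    #include <cmath>
    
    // sRGB编码值 -> 线性值(标准sRGB分段公式).
    float SrgbToLinear(float c)
    {
        return (c <= 0.04045f) ? c / 12.92f
                               : std::pow((c + 0.055f) / 1.055f, 2.4f);
    }
    
    // 线性值 -> sRGB编码值(上面的逆运算).
    float LinearToSrgb(float c)
    {
        return (c <= 0.0031308f) ? c * 12.92f
                                 : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
    }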

    7.4.2 HDR和色调映射

    HDR(High Dynamic Range)即高动态范围,拥有更高的对比度和更广的色域,可分为基于软件的后处理HDR和基于硬件的显示设备HDR。

    与HDR相对立的是LDR(Low Dynamic Range,低动态范围)。

    由于UE等现代引擎都已经支持线性空间的渲染管线,光照计算过程中可能产生比普通白色(颜色值为1.0)大成百上千倍的亮度,此时,如果不用某些曲线将其压缩到合理的值域,在很多设备上都会显示异常。

    色调映射(Tone Mapping)就是将过高的颜色值调整到和显示设备兼容的色彩范围。

    UE支持基于物理的色调映射技术,被称为ACES Tonemapper。ACES Tonemapper采用以下的曲线来映射线性空间的颜色:

    实现代码如下:

    float3 ACESToneMapping(float3 color, float adapted_lum)
    {
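        // 注: 这是Krzysztof Narkowicz提出的ACES曲线快速拟合(社区常用近似, 非UE源码), 原版在返回前还会saturate到[0,1].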
        const float A = 2.51f;
        const float B = 0.03f;
        const float C = 2.43f;
        const float D = 0.59f;
        const float E = 0.14f;
    
        color *= adapted_lum;
        return (color * (A * color + B)) / (color * (C * color + D) + E);
    }
    

    相较旧的色调映射器,ACES Tonemapper的渲染结果更加接近物理真实:

    上图是UE旧的色调映射,下图采用了ACES的色调映射。可见新的色调映射在自发光率足够大时,颜色开始变白,更符合物理真实。

    UE的实际实现代码远比上面介绍的复杂,对应的C++实现代码如下:

    // Engine/Source/Runtime/Renderer/Private/PostProcess/PostProcessTonemap.cpp
    
    FScreenPassTexture AddTonemapPass(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FTonemapInputs& Inputs)
    {
        const FSceneViewFamily& ViewFamily = *(View.Family);
        const FPostProcessSettings& PostProcessSettings = View.FinalPostProcessSettings;
    
        const bool bIsEyeAdaptationResource = (View.GetFeatureLevel() >= ERHIFeatureLevel::SM5) ? Inputs.EyeAdaptationTexture != nullptr : Inputs.EyeAdaptationBuffer != nullptr;
        const bool bEyeAdaptation = ViewFamily.EngineShowFlags.EyeAdaptation && bIsEyeAdaptationResource;
    
        const FScreenPassTextureViewport SceneColorViewport(Inputs.SceneColor);
    
        FScreenPassRenderTarget Output = Inputs.OverrideOutput;
    
        // 创建输出纹理.
        if (!Output.IsValid())
        {
            FRDGTextureDesc OutputDesc = Inputs.SceneColor.Texture->Desc;
            OutputDesc.Reset();
            OutputDesc.Flags |= View.bUseComputePasses ? TexCreate_UAV : TexCreate_RenderTargetable;
            OutputDesc.Flags |= GFastVRamConfig.Tonemap;
            // RGB is the color in LDR, A is the luminance for PostprocessAA
            OutputDesc.Format = Inputs.bOutputInHDR ? GRHIHDRDisplayOutputFormat : PF_B8G8R8A8;
            OutputDesc.ClearValue = FClearValueBinding(FLinearColor(0, 0, 0, 0));
    
            const FTonemapperOutputDeviceParameters OutputDeviceParameters = GetTonemapperOutputDeviceParameters(*View.Family);
            const ETonemapperOutputDevice OutputDevice = static_cast<ETonemapperOutputDevice>(OutputDeviceParameters.OutputDevice);
    
            if (OutputDevice == ETonemapperOutputDevice::LinearEXR)
            {
                OutputDesc.Format = PF_A32B32G32R32F;
            }
            if (OutputDevice == ETonemapperOutputDevice::LinearNoToneCurve || OutputDevice == ETonemapperOutputDevice::LinearWithToneCurve)
            {
                OutputDesc.Format = PF_FloatRGBA;
            }
    
            Output = FScreenPassRenderTarget(
                GraphBuilder.CreateTexture(OutputDesc, TEXT("Tonemap")),
                Inputs.SceneColor.ViewRect,
                ERenderTargetLoadAction::EClear);
        }
    
        const FScreenPassTextureViewport OutputViewport(Output);
    
        FRHITexture* BloomDirtMaskTexture = GBlackTexture->TextureRHI;
    
        if (PostProcessSettings.BloomDirtMask && PostProcessSettings.BloomDirtMask->Resource)
        {
            BloomDirtMaskTexture = PostProcessSettings.BloomDirtMask->Resource->TextureRHI;
        }
    
        // 采样器.
        FRHISamplerState* BilinearClampSampler = TStaticSamplerState<SF_Bilinear, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
        FRHISamplerState* PointClampSampler = TStaticSamplerState<SF_Point, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
    
        const float DefaultEyeExposure = bEyeAdaptation ? 0.0f : GetEyeAdaptationFixedExposure(View);
    
        const float SharpenDiv6 = FMath::Clamp(CVarTonemapperSharpen.GetValueOnRenderThread(), 0.0f, 10.0f) / 6.0f;
    
        // 处理色差参数.
        FVector4 ChromaticAberrationParams;
        {
            // 处理场景颜色边缘
            // 从百分比到分数
            float Offset = 0.0f;
            float StartOffset = 0.0f;
            float Multiplier = 1.0f;
    
            if (PostProcessSettings.ChromaticAberrationStartOffset < 1.0f - KINDA_SMALL_NUMBER)
            {
                Offset = PostProcessSettings.SceneFringeIntensity * 0.01f;
                StartOffset = PostProcessSettings.ChromaticAberrationStartOffset;
                Multiplier = 1.0f / (1.0f - StartOffset);
            }
    
            // 基色的波长,单位是纳米.
            const float PrimaryR = 611.3f;
            const float PrimaryG = 549.1f;
            const float PrimaryB = 464.3f;
    
            // 简单透镜的色差在波长上大致是线性的.
            float ScaleR = 0.007f * (PrimaryR - PrimaryB);
            float ScaleG = 0.007f * (PrimaryG - PrimaryB);
            ChromaticAberrationParams = FVector4(Offset * ScaleR * Multiplier, Offset * ScaleG * Multiplier, StartOffset, 0.f);
        }
    
        // 处理色调映射参数.
        FTonemapParameters CommonParameters;
        CommonParameters.View = View.ViewUniformBuffer;
        CommonParameters.FilmGrain = GetFilmGrainParameters(View);
        CommonParameters.OutputDevice = GetTonemapperOutputDeviceParameters(ViewFamily);
        CommonParameters.Color = GetScreenPassTextureViewportParameters(SceneColorViewport);
        if (Inputs.Bloom.Texture)
        {
            const FScreenPassTextureViewport BloomViewport(Inputs.Bloom);
            CommonParameters.Bloom = GetScreenPassTextureViewportParameters(BloomViewport);
            CommonParameters.ColorToBloom = GetScreenPassTextureViewportTransform(CommonParameters.Color, CommonParameters.Bloom);
        }
        CommonParameters.Output = GetScreenPassTextureViewportParameters(OutputViewport);
        CommonParameters.ColorTexture = Inputs.SceneColor.Texture;
        CommonParameters.BloomTexture = Inputs.Bloom.Texture;
        CommonParameters.EyeAdaptationTexture = Inputs.EyeAdaptationTexture;
        CommonParameters.ColorGradingLUT = Inputs.ColorGradingTexture;
        CommonParameters.BloomDirtMaskTexture = BloomDirtMaskTexture;
        CommonParameters.ColorSampler = BilinearClampSampler;
        CommonParameters.BloomSampler = BilinearClampSampler;
        CommonParameters.ColorGradingLUTSampler = BilinearClampSampler;
        CommonParameters.BloomDirtMaskSampler = BilinearClampSampler;
        CommonParameters.ColorScale0 = PostProcessSettings.SceneColorTint;
        CommonParameters.ColorScale1 = FLinearColor::White * PostProcessSettings.BloomIntensity;
        CommonParameters.BloomDirtMaskTint = PostProcessSettings.BloomDirtMaskTint * PostProcessSettings.BloomDirtMaskIntensity;
        CommonParameters.ChromaticAberrationParams = ChromaticAberrationParams;
        CommonParameters.TonemapperParams = FVector4(PostProcessSettings.VignetteIntensity, SharpenDiv6, 0.0f, 0.0f);
        CommonParameters.SwitchVerticalAxis = Inputs.bFlipYAxis;
        CommonParameters.DefaultEyeExposure = DefaultEyeExposure;
        CommonParameters.EditorNITLevel = EditorNITLevel;
        CommonParameters.bOutputInHDR = ViewFamily.bIsHDR;
        CommonParameters.LensPrincipalPointOffsetScale = View.LensPrincipalPointOffsetScale;
        CommonParameters.LensPrincipalPointOffsetScaleInverse.X = -View.LensPrincipalPointOffsetScale.X / View.LensPrincipalPointOffsetScale.Z;
        CommonParameters.LensPrincipalPointOffsetScaleInverse.Y = -View.LensPrincipalPointOffsetScale.Y / View.LensPrincipalPointOffsetScale.W;
        CommonParameters.LensPrincipalPointOffsetScaleInverse.Z = 1.0f / View.LensPrincipalPointOffsetScale.Z;
        CommonParameters.LensPrincipalPointOffsetScaleInverse.W = 1.0f / View.LensPrincipalPointOffsetScale.W;
        CommonParameters.EyeAdaptationBuffer = Inputs.EyeAdaptationBuffer;
    
        // 处理桌面版色调映射的排列.
        TonemapperPermutation::FDesktopDomain DesktopPermutationVector;
        {
            TonemapperPermutation::FCommonDomain CommonDomain = TonemapperPermutation::BuildCommonPermutationDomain(View, Inputs.bGammaOnly, Inputs.bFlipYAxis, Inputs.bMetalMSAAHDRDecode);
            DesktopPermutationVector.Set<TonemapperPermutation::FCommonDomain>(CommonDomain);
    
            if (!CommonDomain.Get<TonemapperPermutation::FTonemapperGammaOnlyDim>())
            {
                // 量化颗粒.
                {
                    static TConsoleVariableData<int32>* CVar = IConsoleManager::Get().FindTConsoleVariableDataInt(TEXT("r.Tonemapper.GrainQuantization"));
                    const int32 Value = CVar->GetValueOnRenderThread();
                    DesktopPermutationVector.Set<TonemapperPermutation::FTonemapperGrainQuantizationDim>(Value > 0);
                }
    
                DesktopPermutationVector.Set<TonemapperPermutation::FTonemapperColorFringeDim>(PostProcessSettings.SceneFringeIntensity > 0.01f);
            }
    
            DesktopPermutationVector.Set<TonemapperPermutation::FTonemapperOutputDeviceDim>(ETonemapperOutputDevice(CommonParameters.OutputDevice.OutputDevice));
    
            DesktopPermutationVector = TonemapperPermutation::RemapPermutation(DesktopPermutationVector, View.GetFeatureLevel());
        }
    
        const bool bComputePass = (Output.Texture->Desc.Flags & TexCreate_UAV) == TexCreate_UAV ? View.bUseComputePasses : false;
    
        if (bComputePass) // 启用CS.
        {
            FTonemapCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FTonemapCS::FParameters>();
            PassParameters->Tonemap = CommonParameters;
            PassParameters->RWOutputTexture = GraphBuilder.CreateUAV(Output.Texture);
    
            FTonemapCS::FPermutationDomain PermutationVector;
            PermutationVector.Set<TonemapperPermutation::FDesktopDomain>(DesktopPermutationVector);
            PermutationVector.Set<TonemapperPermutation::FTonemapperEyeAdaptationDim>(bEyeAdaptation);
    
            TShaderMapRef<FTonemapCS> ComputeShader(View.ShaderMap, PermutationVector);
    
            FComputeShaderUtils::AddPass(
                GraphBuilder,
                RDG_EVENT_NAME("Tonemap %dx%d (CS GammaOnly=%d)", OutputViewport.Rect.Width(), OutputViewport.Rect.Height(), Inputs.bGammaOnly),
                ComputeShader,
                PassParameters,
                FComputeShaderUtils::GetGroupCount(OutputViewport.Rect.Size(), FIntPoint(GTonemapComputeTileSizeX, GTonemapComputeTileSizeY)));
        }
        else // 启用PS
        {
            FTonemapPS::FParameters* PassParameters = GraphBuilder.AllocParameters<FTonemapPS::FParameters>();
            PassParameters->Tonemap = CommonParameters;
            PassParameters->RenderTargets[0] = Output.GetRenderTargetBinding();
    
            FTonemapVS::FPermutationDomain VertexPermutationVector;
            VertexPermutationVector.Set<TonemapperPermutation::FTonemapperSwitchAxis>(Inputs.bFlipYAxis);
            VertexPermutationVector.Set<TonemapperPermutation::FTonemapperEyeAdaptationDim>(bEyeAdaptation);
    
            TShaderMapRef<FTonemapVS> VertexShader(View.ShaderMap, VertexPermutationVector);
            TShaderMapRef<FTonemapPS> PixelShader(View.ShaderMap, DesktopPermutationVector);
    
            const bool bIsStereo = IStereoRendering::IsStereoEyeView(View);
            FRHIBlendState* BlendState = Inputs.bWriteAlphaChannel || bIsStereo ? FScreenPassPipelineState::FDefaultBlendState::GetRHI() : TStaticBlendStateWriteMask<CW_RGB>::GetRHI();
            FRHIDepthStencilState* DepthStencilState = FScreenPassPipelineState::FDefaultDepthStencilState::GetRHI();
    
            EScreenPassDrawFlags DrawFlags = EScreenPassDrawFlags::AllowHMDHiddenAreaMask;
    
            // 绘制全屏纹理.
            AddDrawScreenPass(
                GraphBuilder,
                RDG_EVENT_NAME("Tonemap %dx%d (PS GammaOnly=%d)", OutputViewport.Rect.Width(), OutputViewport.Rect.Height(), Inputs.bGammaOnly),
                View,
                OutputViewport,
                SceneColorViewport,
                FScreenPassPipelineState(VertexShader, PixelShader, BlendState, DepthStencilState),
                PassParameters,
                DrawFlags,
                [VertexShader, PixelShader, PassParameters](FRHICommandList& RHICmdList)
            {
                SetShaderParameters(RHICmdList, VertexShader, VertexShader.GetVertexShader(), PassParameters->Tonemap);
                SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
            });
        }
    
        return MoveTemp(Output);
    }
    

    由于在笔者的PC上运行的是PS分支的色调映射,下面就分析其使用的PS及相关代码:

    // Engine/Shaders/Private/PostProcessTonemap.usf
    
    // PS入口.
    void MainPS(
        in noperspective float2 UV : TEXCOORD0,
        in noperspective float3 InExposureScaleVignette : TEXCOORD1,
        in noperspective float4 GrainUV : TEXCOORD2,
        in noperspective float2 ScreenPos : TEXCOORD3,
        in noperspective float2 FullViewUV : TEXCOORD4,
        float4 SvPosition : SV_POSITION,        // after all interpolators
        out float4 OutColor : SV_Target0
        )
    {
        OutColor = TonemapCommonPS(UV, InExposureScaleVignette, GrainUV, ScreenPos, FullViewUV, SvPosition);
    }
    

    PS主入口会调用TonemapCommonPS。由于TonemapCommonPS存在大量宏定义,影响主流程分析,下面直接给出用RenderDoc截帧得到的简化版代码:

    // 注意是RenderDoc截帧得到的简化版代码, 非原版代码.
    float4 TonemapCommonPS(
        float2 UV,
        float3 ExposureScaleVignette,
        float4 GrainUV,
        float2 ScreenPos,
        float2 FullViewUV,
        float4 SvPosition
        )
    {
        float4 OutColor = 0;
    
        const float OneOverPreExposure = View_OneOverPreExposure;
        float  Grain = GrainFromUV(GrainUV.zw);
        float2 SceneUV = UV.xy;
        
        // 获取场景颜色
        float4  SceneColor = SampleSceneColor(SceneUV);
        SceneColor.rgb *= OneOverPreExposure;
        
        float ExposureScale = ExposureScaleVignette.x;
        float SharpenMultiplierDiv6 = TonemapperParams.y;
        
        float3  LinearColor = SceneColor.rgb * ColorScale0.rgb;
        
        // Bloom
        float2 BloomUV = ColorToBloom_Scale * UV + ColorToBloom_Bias;
        BloomUV = clamp(BloomUV, Bloom_UVViewportBilinearMin, Bloom_UVViewportBilinearMax);
        float4 CombinedBloom = Texture2DSample(BloomTexture, BloomSampler, BloomUV);
        CombinedBloom.rgb *= OneOverPreExposure;
    
        // 暗角参数.
        float2 DirtLensUV = ConvertScreenViewportSpaceToLensViewportSpace(ScreenPos) * float2(1.0f, -1.0f);
        float3 BloomDirtMaskColor = Texture2DSample(BloomDirtMaskTexture, BloomDirtMaskSampler, DirtLensUV * .5f + .5f).rgb * BloomDirtMaskTint.rgb;
        
        LinearColor += CombinedBloom.rgb * (ColorScale1.rgb + BloomDirtMaskColor);
        LinearColor *= ExposureScale;
        // 暗角.
        LinearColor.rgb *= ComputeVignetteMask( ExposureScaleVignette.yz, TonemapperParams.x );
    
        // LUT
        float3  OutDeviceColor = ColorLookupTable(LinearColor);
    
        // 颗粒.
        float  LuminanceForPostProcessAA = dot(OutDeviceColor,  float3 (0.299f, 0.587f, 0.114f));
        float  GrainQuantization = 1.0/256.0;
        float  GrainAdd = (Grain * GrainQuantization) + (-0.5 * GrainQuantization);
        OutDeviceColor.rgb += GrainAdd;
    
        OutColor = float4(OutDeviceColor, saturate(LuminanceForPostProcessAA));
    
        // HDR输出.
        [branch]
        if(bOutputInHDR)
        {
            OutColor.rgb = ST2084ToLinear(OutColor.rgb);
            OutColor.rgb = OutColor.rgb / EditorNITLevel;
            OutColor.rgb = LinearToPostTonemapSpace(OutColor.rgb);
        }
    
        return OutColor;
    }
    

    色调映射阶段处理和组合了颗粒、暗角、Bloom、曝光、LUT、HDR等处理,不过,这里有点奇怪,为什么没有找到色调映射相关的代码?

    结合RenderDoc截帧,可以发现端倪:原来答案就藏在ColorLookupTable,这里的LUT查找不是简单的Color Grading之类的效果,而是已经把色调映射烘焙了进去。下面进入ColorLookupTable:

    Texture3D ColorGradingLUT;
    SamplerState ColorGradingLUTSampler;
    static const float LUTSize = 32;
    
    float3 LinToLog( float3 LinearColor )
    {
        const float LinearRange = 14;
        const float LinearGrey = 0.18;
        const float ExposureGrey = 444;
    
        // 使用精简的“纯对数”公式, 以灰点和覆盖的动态范围作为参数.
        float3 LogColor = log2(LinearColor) / LinearRange - log2(LinearGrey) / LinearRange + ExposureGrey / 1023.0;
        LogColor = saturate( LogColor );
    
        return LogColor;
    }
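    
    // 补充: 文中多处用到的LogToLin是LinToLog的逆运算, 按上面的公式反推大致如下(便于理解的还原版, 实际定义见TonemapCommon.ush):
    float3 LogToLin( float3 LogColor )
    {
        const float LinearRange = 14;
        const float LinearGrey = 0.18;
        const float ExposureGrey = 444;
    
        // 从Log空间解码回线性空间.
        float3 LinearColor = exp2( ( LogColor - ExposureGrey / 1023.0 ) * LinearRange ) * LinearGrey;
    
        return LinearColor;
    }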
    
    float3  ColorLookupTable(  float3  LinearColor )
    {
        float3 LUTEncodedColor;
        // 线性转Log空间.
        LUTEncodedColor = LinToLog( LinearColor + LogToLin( 0 ) );
    
        // 缩放并偏移半个texel, 使采样对齐LUT纹素中心.
        float3 UVW = LUTEncodedColor * ((LUTSize - 1) / LUTSize) + (0.5f / LUTSize);
        // 采样3D的ColorGradingLUT纹理.
        float3  OutDeviceColor = Texture3DSample( ColorGradingLUT, ColorGradingLUTSampler, UVW ).rgb;
    
        return OutDeviceColor * 1.05;
    }
    

    下面简单分析ColorGradingLUT的生成过程。从截帧数据可以看到ColorGradingLUT是在Tonemap之前的CombineLUT生成的:

    它是一个32x32x32的3D纹理,下图是切片0的颜色值(放大8倍):

    需要注意的是,ColorGradingLUT会每帧动态生成,根据场景颜色动态调整,不是生成一次之后就缓存起来。下面直接进入其使用的PS代码(RenderDoc截帧的简化版本):

    // Engine/Shaders/Private/PostProcessCombineLUTs.usf
    
    // 校正颜色.
    float3 ColorCorrect( float3 WorkingColor,
        float4 ColorSaturation,
        float4 ColorContrast,
        float4 ColorGamma,
        float4 ColorGain,
        float4 ColorOffset )
    {
        float Luma = dot( WorkingColor, AP1_RGB2Y );
        WorkingColor = max( 0, lerp( Luma.xxx, WorkingColor, ColorSaturation.xyz*ColorSaturation.w ) );
        WorkingColor = pow( WorkingColor * (1.0 / 0.18), ColorContrast.xyz*ColorContrast.w ) * 0.18;
        WorkingColor = pow( WorkingColor, 1.0 / (ColorGamma.xyz*ColorGamma.w) );
        WorkingColor = WorkingColor * (ColorGain.xyz * ColorGain.w) + (ColorOffset.xyz + ColorOffset.w);
        return WorkingColor;
    }
    
    // 对颜色的阴影/中调/高调执行校正.
    float3 ColorCorrectAll( float3 WorkingColor )
    {
        float Luma = dot( WorkingColor, AP1_RGB2Y );
    
        float3 CCColorShadows = ColorCorrect(...);
        float CCWeightShadows = 1- smoothstep(0, ColorCorrectionShadowsMax, Luma);
    
        float3 CCColorHighlights = ColorCorrect(...);
        float CCWeightHighlights = smoothstep(ColorCorrectionHighlightsMin, 1, Luma);
    
        float3 CCColorMidtones = ColorCorrect(...);
        float CCWeightMidtones = 1 - CCWeightShadows - CCWeightHighlights;
    
        float3 WorkingColorSMH = CCColorShadows*CCWeightShadows + CCColorMidtones*CCWeightMidtones + CCColorHighlights*CCWeightHighlights;
    
        return WorkingColorSMH;
    }
    
    float BlueCorrection;
    float ExpandGamut;
    float ToneCurveAmount;
    
    float4 CombineLUTsCommon(float2 InUV, uint InLayerIndex)
    {
        // 计算中性(Neutral)颜色, 即LUT坐标对应的原始输入颜色.
        float4 Neutral;
        {
            float2 UV = InUV - float2(0.5f / LUTSize, 0.5f / LUTSize);
            Neutral = float4(UV * LUTSize / (LUTSize - 1), InLayerIndex / (LUTSize - 1), 0);
        }
    
        float4 OutColor = 0;
    
        // 初始化颜色转换系数.
        const float3x3 sRGB_2_AP1 = mul( XYZ_2_AP1_MAT, mul( D65_2_D60_CAT, sRGB_2_XYZ_MAT ) );
        const float3x3 AP1_2_sRGB = mul( XYZ_2_sRGB_MAT, mul( D60_2_D65_CAT, AP1_2_XYZ_MAT ) );
        const float3x3 AP0_2_AP1 = mul( XYZ_2_AP1_MAT, AP0_2_XYZ_MAT );
        const float3x3 AP1_2_AP0 = mul( XYZ_2_AP0_MAT, AP1_2_XYZ_MAT );
        const float3x3 AP1_2_Output = OuputGamutMappingMatrix( OutputGamut );
    
        float3 LUTEncodedColor = Neutral.rgb;
        float3 LinearColor;
    
        if (GetOutputDevice() >= 3)
            LinearColor = ST2084ToLinear(LUTEncodedColor) * LinearToNitsScaleInverse;
        else
            LinearColor = LogToLin( LUTEncodedColor ) - LogToLin( 0 );
    
        // 色彩平衡.
        float3 BalancedColor = WhiteBalance( LinearColor );
        
        // 计算颜色调整系数.
        float3 ColorAP1 = mul( sRGB_2_AP1, BalancedColor );
        if (!bUseMobileTonemapper)
        {
            float LumaAP1 = dot( ColorAP1, AP1_RGB2Y );
            float3 ChromaAP1 = ColorAP1 / LumaAP1;
            float ChromaDistSqr = dot( ChromaAP1 - 1, ChromaAP1 - 1 );
            float ExpandAmount = ( 1 - exp2( -4 * ChromaDistSqr ) ) * ( 1 - exp2( -4 * ExpandGamut * LumaAP1*LumaAP1 ) );
    
            const float3x3 Wide_2_XYZ_MAT =
            {
                0.5441691, 0.2395926, 0.1666943,
                0.2394656, 0.7021530, 0.0583814,
                -0.0023439, 0.0361834, 1.0552183,
            };
    
            const float3x3 Wide_2_AP1 = mul( XYZ_2_AP1_MAT, Wide_2_XYZ_MAT );
            const float3x3 ExpandMat = mul( Wide_2_AP1, AP1_2_sRGB );
    
            float3 ColorExpand = mul( ExpandMat, ColorAP1 );
            ColorAP1 = lerp( ColorAP1, ColorExpand, ExpandAmount );
        }
    
        // 校正颜色的高中低调.
        ColorAP1 = ColorCorrectAll( ColorAP1 );
        float3 GradedColor = mul( AP1_2_sRGB, ColorAP1 );
    
        // 蓝色校正.
        const float3x3 BlueCorrect =
        {
            0.9404372683, -0.0183068787, 0.0778696104,
            0.0083786969, 0.8286599939, 0.1629613092,
            0.0005471261, -0.0008833746, 1.0003362486
        };
        const float3x3 BlueCorrectInv =
        {
            1.06318, 0.0233956, -0.0865726,
            -0.0106337, 1.20632, -0.19569,
            -0.000590887, 0.00105248, 0.999538
        };
        const float3x3 BlueCorrectAP1 = mul( AP0_2_AP1, mul( BlueCorrect, AP1_2_AP0 ) );
        const float3x3 BlueCorrectInvAP1 = mul( AP0_2_AP1, mul( BlueCorrectInv, AP1_2_AP0 ) );
    
        ColorAP1 = lerp( ColorAP1, mul( BlueCorrectAP1, ColorAP1 ), BlueCorrection );
        
        // Film色调映射.
        float3 ToneMappedColorAP1 = FilmToneMap( ColorAP1 );
        
        ColorAP1 = lerp(ColorAP1, ToneMappedColorAP1, ToneCurveAmount);
        ColorAP1 = lerp( ColorAP1, mul( BlueCorrectInvAP1, ColorAP1 ), BlueCorrection );
    
        float3 FilmColor = max(0, mul( AP1_2_sRGB, ColorAP1 ));
        FilmColor = ColorCorrection( FilmColor );
        
        float3 FilmColorNoGamma = lerp( FilmColor * ColorScale, OverlayColor.rgb, OverlayColor.a );
        GradedColor = lerp(GradedColor * ColorScale, OverlayColor.rgb, OverlayColor.a);
        
        FilmColor = pow( max(0, FilmColorNoGamma), InverseGamma.y );
    
        float3  OutDeviceColor = 0;
        
        // 根据输出设备的类型调用不同的色彩处理, 默认是0.
        if( GetOutputDevice() == 0 )
        {
            // 高阶(原始)颜色是FilmColor.
            float3 OutputGamutColor = FilmColor;
            // 线性空间转到sRGB.
            OutDeviceColor = LinearToSrgb( OutputGamutColor );
        }
        else if( GetOutputDevice() == 1 )
        {
            float3 OutputGamutColor = mul( AP1_2_Output, mul( sRGB_2_AP1, FilmColor ) );
            OutDeviceColor = LinearTo709Branchless( OutputGamutColor );
        }
        else if( GetOutputDevice() == 3 || GetOutputDevice() == 5 )
        {
            float3 ODTColor = ACESOutputTransforms1000( GradedColor );
            ODTColor = mul( AP1_2_Output, ODTColor );
            OutDeviceColor = LinearToST2084( ODTColor );
        }
        else if( GetOutputDevice() == 4 || GetOutputDevice() == 6 )
        {
            float3 ODTColor = ACESOutputTransforms2000( GradedColor );
            ODTColor = mul( AP1_2_Output, ODTColor );
            OutDeviceColor = LinearToST2084( ODTColor );
        }
        else if( GetOutputDevice() == 7 )
        {
            float3 OutputGamutColor = mul( AP1_2_Output, mul( sRGB_2_AP1, GradedColor ) );
            OutDeviceColor = LinearToST2084( OutputGamutColor );
        }
        else if( GetOutputDevice() == 8 )
        {
            OutDeviceColor = GradedColor;
        }
        else if (GetOutputDevice() == 9)
        {
            float3 OutputGamutColor = mul(AP1_2_Output, mul(sRGB_2_AP1, FilmColorNoGamma));
            OutDeviceColor = OutputGamutColor;
        }
        else
        {
            float3 OutputGamutColor = mul( AP1_2_Output, mul( sRGB_2_AP1, FilmColor ) );
            OutDeviceColor = pow( OutputGamutColor, InverseGamma.z );
        }
    
        OutColor.rgb = OutDeviceColor / 1.05;
        OutColor.a = 0;
    
        return OutColor;
    }
    
    // PS主入口.
    void MainPS(FWriteToSliceGeometryOutput Input, out float4 OutColor : SV_Target0)
    {
        OutColor = CombineLUTsCommon(Input.Vertex.UV, Input.LayerIndex);
    }
    

    上面密密麻麻布满了颜色系数计算和空间转换,着实让人眼花缭乱,不过我们只需要重点关注FilmToneMap:

    // Engine/Shaders/Private/TonemapCommon.ush
    
    // 在后处理体积中编辑得到, 然后由C++传入.
    float FilmSlope;
    float FilmToe;
    float FilmShoulder;
    float FilmBlackClip;
    float FilmWhiteClip;
    
    half3 FilmToneMap( half3 LinearColor ) 
    {
        const float3x3 sRGB_2_AP0 = mul( XYZ_2_AP0_MAT, mul( D65_2_D60_CAT, sRGB_2_XYZ_MAT ) );
        const float3x3 sRGB_2_AP1 = mul( XYZ_2_AP1_MAT, mul( D65_2_D60_CAT, sRGB_2_XYZ_MAT ) );
    
        const float3x3 AP0_2_sRGB = mul( XYZ_2_sRGB_MAT, mul( D60_2_D65_CAT, AP0_2_XYZ_MAT ) );
        const float3x3 AP1_2_sRGB = mul( XYZ_2_sRGB_MAT, mul( D60_2_D65_CAT, AP1_2_XYZ_MAT ) );
        
        const float3x3 AP0_2_AP1 = mul( XYZ_2_AP1_MAT, AP0_2_XYZ_MAT );
        const float3x3 AP1_2_AP0 = mul( XYZ_2_AP0_MAT, AP1_2_XYZ_MAT );
        
        float3 ColorAP1 = LinearColor;
    
    #if 1
        // 先转换到AP0色彩空间, 后续的发光模块和红色修正都在AP0中进行.
        float3 ColorAP0 = mul( AP1_2_AP0, ColorAP1 );
    
        // 发光模块常数
        const float RRT_GLOW_GAIN = 0.05;
        const float RRT_GLOW_MID = 0.08;
    
        float saturation = rgb_2_saturation( ColorAP0 );
        float ycIn = rgb_2_yc( ColorAP0 );
        float s = sigmoid_shaper( (saturation - 0.4) / 0.2);
        float addedGlow = 1 + glow_fwd( ycIn, RRT_GLOW_GAIN * s, RRT_GLOW_MID);
        ColorAP0 *= addedGlow;
    #endif
    
    #if 1
        // --- 红色修改系数 --- //
        const float RRT_RED_SCALE = 0.82;
        const float RRT_RED_PIVOT = 0.03;
        const float RRT_RED_HUE = 0;
        const float RRT_RED_WIDTH = 135;
        float hue = rgb_2_hue( ColorAP0 );
        float centeredHue = center_hue( hue, RRT_RED_HUE );
        float hueWeight = Square( smoothstep( 0, 1, 1 - abs( 2 * centeredHue / RRT_RED_WIDTH ) ) );
            
        ColorAP0.r += hueWeight * saturation * (RRT_RED_PIVOT - ColorAP0.r) * (1. - RRT_RED_SCALE);
    #endif
        
        // 使用ACEScg基数作为工作空间.
        float3 WorkingColor = mul( AP0_2_AP1_MAT, ColorAP0 );
    
        WorkingColor = max( 0, WorkingColor );
    
        // 准备降低饱和度.
        WorkingColor = lerp( dot( WorkingColor, AP1_RGB2Y ), WorkingColor, 0.96 );
        
        const half ToeScale            = 1 + FilmBlackClip - FilmToe;
        const half ShoulderScale    = 1 + FilmWhiteClip - FilmShoulder;
        
        const float InMatch = 0.18;
        const float OutMatch = 0.18;
    
        float ToeMatch;
        if( FilmToe > 0.8 )
        {
            // 0.18 will be on straight segment
            ToeMatch = ( 1 - FilmToe  - OutMatch ) / FilmSlope + log10( InMatch );
        }
        else
        {
            // 0.18 will be on toe segment
    
            // Solve for ToeMatch such that input of InMatch gives output of OutMatch.
            const float bt = ( OutMatch + FilmBlackClip ) / ToeScale - 1;
            ToeMatch = log10( InMatch ) - 0.5 * log( (1+bt)/(1-bt) ) * (ToeScale / FilmSlope);
        }
    
        float StraightMatch = ( 1 - FilmToe ) / FilmSlope - ToeMatch;
        float ShoulderMatch = FilmShoulder / FilmSlope - StraightMatch;
        
        half3 LogColor = log10( WorkingColor );
        half3 StraightColor = FilmSlope * ( LogColor + StraightMatch );
        
        half3 ToeColor        = (    -FilmBlackClip ) + (2 *      ToeScale) / ( 1 + exp( (-2 * FilmSlope /      ToeScale) * ( LogColor -      ToeMatch ) ) );
        half3 ShoulderColor    = ( 1 + FilmWhiteClip ) - (2 * ShoulderScale) / ( 1 + exp( ( 2 * FilmSlope / ShoulderScale) * ( LogColor - ShoulderMatch ) ) );
    
        ToeColor        = LogColor <      ToeMatch ?      ToeColor : StraightColor;
        ShoulderColor    = LogColor > ShoulderMatch ? ShoulderColor : StraightColor;
    
        half3 t = saturate( ( LogColor - ToeMatch ) / ( ShoulderMatch - ToeMatch ) );
        t = ShoulderMatch < ToeMatch ? 1 - t : t;
        t = (3-2*t)*t*t;
        half3 ToneColor = lerp( ToeColor, ShoulderColor, t );
    
        // 后置降饱和度
        ToneColor = lerp( dot( float3(ToneColor), AP1_RGB2Y ), ToneColor, 0.93 );
    
        return max( 0, ToneColor );
    }
    

    可知UE的Film Tonemapping(电影色调映射)除了常规的色彩空间转换和曲线映射,还增加了Slope(斜率)、Toe(脚趾)、Shoulder(肩部)、Black Clip(黑色裁剪)、White Clip(白色裁剪)等不同色阶的调整,以便艺术家精确地控制画面效果。

    它们在后处理体积中可以编辑:

    上图是UE的默认值,但实际上在代码中,UE还给出了不同游戏和配置的参考系数:

    // Default settings
    Slope = 0.88;
    Toe = 0.55;
    Shoulder = 0.26;
    BlackClip= 0;
    WhiteClip = 0.04;
    
    // Uncharted settings
    Slope = 0.63;
    Toe = 0.55;
    Shoulder = 0.47;
    BlackClip= 0;
    WhiteClip = 0.01;
    
    // HP settings
    Slope = 0.65;
    Toe = 0.63;
    Shoulder = 0.45;
    BlackClip = 0;
    WhiteClip = 0;
    
    // Legacy settings
    Slope = 0.98;
    Toe = 0.3;
    Shoulder = 0.22;
    BlackClip = 0;
    WhiteClip = 0.025;
    
    // ACES settings
    Slope = 0.91;
    Toe = 0.53;
    Shoulder = 0.23;
    BlackClip = 0;
    WhiteClip = 0.035;
    

    更加具体的参数含义和效果变化参见官方文档:Color Grading and Filmic Tonemapper。
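    
    这些Film参数都暴露在FPostProcessSettings中,可以在C++里运行时切换预设。下面是一份示意片段(假定场景中已持有APostProcessVolume指针,属性名以引擎源码为准):
    
    #include "Engine/PostProcessVolume.h"
    
    // 将上面的Uncharted参考系数应用到后处理体积(示意代码).
    void ApplyUnchartedFilmPreset(APostProcessVolume* Volume)
    {
        if (!Volume)
        {
            return;
        }
    
        FPostProcessSettings& Settings = Volume->Settings;
    
        // 先打开覆盖开关, 对应属性面板中参数左侧的复选框.
        Settings.bOverride_FilmSlope = true;
        Settings.bOverride_FilmToe = true;
        Settings.bOverride_FilmShoulder = true;
        Settings.bOverride_FilmBlackClip = true;
        Settings.bOverride_FilmWhiteClip = true;
    
        // Uncharted settings.
        Settings.FilmSlope = 0.63f;
        Settings.FilmToe = 0.55f;
        Settings.FilmShoulder = 0.47f;
        Settings.FilmBlackClip = 0.0f;
        Settings.FilmWhiteClip = 0.01f;
    }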

    值得一提的是,前述着色器代码隐含了大量的色彩空间转换、色调曲线映射等知识点,如果没有接触过这类知识,将会云里雾里。幸好,它们可以在文献Tone Mapping中找到理论依据和参考实现,值得一读。通篇理解之后,将会豁然开朗,之前的很多疑团将被解开!

    参考文献Tone Mapping展示XYZ色彩空间转换到sRGB的过程和公式。

    7.4.3 Screen Percentage

    UE存在屏幕百分比(Screen Percentage)技术,用于比显示屏幕分辨率更低的分辨率进行渲染,然后上采样到指定屏幕分辨率。它有两种屏幕百分比:Primary Screen Percentage(主屏幕百分比)Secondary Screen Percentage(次屏幕百分比)

    Primary Screen Percentage是用户可以设置和修改的分辨率比例,其思路是先以较低的分辨率渲染帧,再在绘制用户界面(UI)之前将其上采样。Secondary Screen Percentage是在Primary Screen Percentage之后执行的再一次、也是最后一次的分辨率上采样通道,它不可在运行时动态修改,用于高DPI但性能较低的设备,以便先用较低的分辨率渲染,再上采样到高DPI分辨率。

    渲染阶段的较低分辨率经过主屏幕百分比后放大纹理分辨率,经过各种后处理效果后再由次屏幕百分比放大到适配屏幕的分辨率,之后再处理UI。
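    
    两级放大的分辨率关系可以用下面的C++小例子直观说明(纯示意,数值和变量均为假设,非引擎实际接口):
    
    #include <cmath>
    #include <cstdio>
    
    int main()
    {
        const int OutputW = 1920, OutputH = 1080; // 最终显示分辨率
        const float PrimaryPct = 70.0f;           // 对应 r.ScreenPercentage
        const float SecondaryPct = 100.0f;        // 次屏幕百分比, 此处假定不缩放
    
        // 实际参与场景渲染的分辨率由两级百分比依次缩放而来.
        const float Scale = (PrimaryPct / 100.0f) * (SecondaryPct / 100.0f);
        const int RenderW = (int)std::ceil(OutputW * Scale); // 1344
        const int RenderH = (int)std::ceil(OutputH * Scale); // 756
    
        // 后处理末端先由PrimaryUpscale放大; 若次百分比不为100%, 再经SecondaryUpscale放大到输出分辨率.
        std::printf("render %dx%d -> output %dx%d\n", RenderW, RenderH, OutputW, OutputH);
        return 0;
    }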

    在场景视图中可以调整主屏幕百分比:

    在后处理体积中也可以设置主屏幕百分比:

    当然,还可以通过控制台命令来更改:

    r.ScreenPercentage 100
    

    它们的实现在后处理渲染管线的最后阶段:

    void AddPostProcessingPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessingInputs& Inputs)
    {
        (......)
        
        // 主放大
        if (PassSequence.IsEnabled(EPass::PrimaryUpscale))
        {
            FUpscaleInputs PassInputs;
            PassSequence.AcceptOverrideIfLastPass(EPass::PrimaryUpscale, PassInputs.OverrideOutput);
            PassInputs.SceneColor = SceneColor;
            PassInputs.Method = GetUpscaleMethod();
            PassInputs.Stage = PassSequence.IsEnabled(EPass::SecondaryUpscale) ? EUpscaleStage::PrimaryToSecondary : EUpscaleStage::PrimaryToOutput;
    
            // Panini projection is handled by the primary upscale pass.
            PassInputs.PaniniConfig = PaniniConfig;
    
            SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
        }
    
        // 次放大
        if (PassSequence.IsEnabled(EPass::SecondaryUpscale))
        {
            FUpscaleInputs PassInputs;
            PassSequence.AcceptOverrideIfLastPass(EPass::SecondaryUpscale, PassInputs.OverrideOutput);
            PassInputs.SceneColor = SceneColor;
            PassInputs.Method = View.Family->SecondaryScreenPercentageMethod == ESecondaryScreenPercentageMethod::LowerPixelDensitySimulation ? EUpscaleMethod::SmoothStep : EUpscaleMethod::Nearest;
            PassInputs.Stage = EUpscaleStage::SecondaryToOutput;
    
            SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
        }
    }
    

    主放大和次放大都调用了AddUpscalePass

    // Engine/Source/Runtime/Renderer/Private/PostProcess/PostProcessUpscale.cpp
    
    FScreenPassTexture AddUpscalePass(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FUpscaleInputs& Inputs)
    {
        FScreenPassRenderTarget Output = Inputs.OverrideOutput;
    
        // 创建新的输出纹理.
        if (!Output.IsValid())
        {
            FRDGTextureDesc OutputDesc = Inputs.SceneColor.Texture->Desc;
            OutputDesc.Reset();
    
            if (Inputs.Stage == EUpscaleStage::PrimaryToSecondary)
            {
                const FIntPoint SecondaryViewRectSize = View.GetSecondaryViewRectSize();
                QuantizeSceneBufferSize(SecondaryViewRectSize, OutputDesc.Extent);
                Output.ViewRect.Min = FIntPoint::ZeroValue;
                Output.ViewRect.Max = SecondaryViewRectSize;
            }
            else
            {
                OutputDesc.Extent = View.UnscaledViewRect.Max;
                Output.ViewRect = View.UnscaledViewRect;
            }
    
            OutputDesc.Flags |= GFastVRamConfig.Upscale;
    
            Output.Texture = GraphBuilder.CreateTexture(OutputDesc, TEXT("Upscale"));
            Output.LoadAction = ERenderTargetLoadAction::EClear;
        }
    
        const FScreenPassTextureViewport InputViewport(Inputs.SceneColor);
        const FScreenPassTextureViewport OutputViewport(Output);
    
        // Panini投影.
        FPaniniProjectionConfig PaniniConfig = Inputs.PaniniConfig;
        PaniniConfig.Sanitize();
    
        const bool bUsePaniniProjection = PaniniConfig.IsEnabled();
    
        // 上采样参数.
        FUpscaleParameters* PassParameters = GraphBuilder.AllocParameters<FUpscaleParameters>();
        PassParameters->RenderTargets[0] = Output.GetRenderTargetBinding();
        PassParameters->Input = GetScreenPassTextureViewportParameters(InputViewport);
        PassParameters->Output = GetScreenPassTextureViewportParameters(OutputViewport);
        PassParameters->SceneColorTexture = Inputs.SceneColor.Texture;
        PassParameters->SceneColorSampler = TStaticSamplerState<SF_Bilinear, AM_Border, AM_Border, AM_Border>::GetRHI();
        PassParameters->PointSceneColorTexture = Inputs.SceneColor.Texture;
        PassParameters->PointSceneColorTextureArray = Inputs.SceneColor.Texture;
        PassParameters->PointSceneColorSampler = TStaticSamplerState<SF_Point, AM_Border, AM_Border, AM_Border>::GetRHI();
        PassParameters->Panini = GetPaniniProjectionParameters(PaniniConfig, View);
        PassParameters->UpscaleSoftness = FMath::Clamp(CVarUpscaleSoftness.GetValueOnRenderThread(), 0.0f, 1.0f);
        PassParameters->View = View.ViewUniformBuffer;
    
        // 处理FUpscalePS.
        FUpscalePS::FPermutationDomain PixelPermutationVector;
        PixelPermutationVector.Set<FUpscalePS::FMethodDimension>(Inputs.Method);
        TShaderMapRef<FUpscalePS> PixelShader(View.ShaderMap, PixelPermutationVector);
    
        const TCHAR* const StageNames[] = { TEXT("PrimaryToSecondary"), TEXT("PrimaryToOutput"), TEXT("SecondaryToOutput") };
        static_assert(UE_ARRAY_COUNT(StageNames) == static_cast<uint32>(EUpscaleStage::MAX), "StageNames does not match EUpscaleStage");
        const TCHAR* StageName = StageNames[static_cast<uint32>(Inputs.Stage)];
    
        GraphBuilder.AddPass(
            RDG_EVENT_NAME("Upscale (%s) %dx%d", StageName, Output.ViewRect.Width(), Output.ViewRect.Height()),
            PassParameters,
            ERDGPassFlags::Raster,
            [&View, bUsePaniniProjection, PixelShader, PassParameters, InputViewport, OutputViewport](FRHICommandList& RHICmdList)
        {
            RHICmdList.SetViewport(OutputViewport.Rect.Min.X, OutputViewport.Rect.Min.Y, 0.0f, OutputViewport.Rect.Max.X, OutputViewport.Rect.Max.Y, 1.0f);
    
            TShaderRef<FShader> VertexShader;
            // Panini投影使用特殊的VS. 亦即在VS里处理Panini投影.
            if (bUsePaniniProjection)
            {
                TShaderMapRef<FUpscaleVS> TypedVertexShader(View.ShaderMap);
                SetScreenPassPipelineState(RHICmdList, FScreenPassPipelineState(TypedVertexShader, PixelShader));
                SetShaderParameters(RHICmdList, TypedVertexShader, TypedVertexShader.GetVertexShader(), *PassParameters);
                VertexShader = TypedVertexShader;
            }
            else
            {
                TShaderMapRef<FScreenPassVS> TypedVertexShader(View.ShaderMap);
                SetScreenPassPipelineState(RHICmdList, FScreenPassPipelineState(TypedVertexShader, PixelShader));
                VertexShader = TypedVertexShader;
            }
            check(VertexShader.IsValid());
    
            SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
    
            // 全屏绘制.
            DrawRectangle(
                RHICmdList,
                // Output Rect (RHI viewport relative).
                0, 0, OutputViewport.Rect.Width(), OutputViewport.Rect.Height(),
                // Input Rect
                InputViewport.Rect.Min.X, InputViewport.Rect.Min.Y, InputViewport.Rect.Width(), InputViewport.Rect.Height(),
                OutputViewport.Rect.Size(),
                InputViewport.Extent,
                VertexShader,
                // Panini投影使用曲面细分.
                bUsePaniniProjection ? EDRF_UseTesselatedIndexBuffer : EDRF_UseTriangleOptimization);
        });
    
        return MoveTemp(Output);
    }
    

    以上可知,根据是否启用Panini投影,会使用不同的VS,但PS一样,都是FUpscalePS。下面分析FUpscalePS的shader代码:

    // Engine/Shaders/Private/PostProcessUpscale.usf
    
    void MainPS(noperspective float4 UVAndScreenPos : TEXCOORD0, float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
    {
        OutColor = 0;
    
        // 最近点上采样.(不会模糊, 但有块状)
    #if METHOD == UPSCALE_METHOD_NEAREST
        #if ES3_1_PROFILE
            #if MOBILE_MULTI_VIEW
                OutColor = Texture2DArraySample(PointSceneColorTextureArray, PointSceneColorSampler, float3(UVAndScreenPos.xy,0));
            #else
                OutColor = Texture2DSample(PointSceneColorTexture, PointSceneColorSampler, UVAndScreenPos.xy);
            #endif
        #else
            #if MOBILE_MULTI_VIEW
                OutColor = PointSceneColorTextureArray.SampleLevel(PointSceneColorSampler, vec3(UVAndScreenPos.xy,0), 0, int2(0, 0));
            #else
                OutColor = PointSceneColorTexture.SampleLevel(PointSceneColorSampler, UVAndScreenPos.xy, 0, int2(0, 0));
            #endif
        #endif
    
        // 双线性上采样.(快, 但有锯齿)
    #elif METHOD == UPSCALE_METHOD_BILINEAR
        OutColor.rgb = SampleSceneColorRGB(UVAndScreenPos.xy);
    
        // 定向模糊上采样, 结合了反锐化蒙版(Unsharp Mask).
    #elif METHOD == UPSCALE_METHOD_DIRECTIONAL
        float2 UV = UVAndScreenPos.xy;
        float X = 0.5;
        float3 ColorNW = SampleSceneColorRGB(UV + float2(-X, -X) * Input_ExtentInverse);
        float3 ColorNE = SampleSceneColorRGB(UV + float2( X, -X) * Input_ExtentInverse);
        float3 ColorSW = SampleSceneColorRGB(UV + float2(-X,  X) * Input_ExtentInverse);
        float3 ColorSE = SampleSceneColorRGB(UV + float2( X,  X) * Input_ExtentInverse);
        OutColor.rgb = (ColorNW * 0.25) + (ColorNE * 0.25) + (ColorSW * 0.25) + (ColorSE * 0.25);
        
        float LumaNW = Luma(ColorNW);
        float LumaNE = Luma(ColorNE);
        float LumaSW = Luma(ColorSW);
        float LumaSE = Luma(ColorSE);
    
        float2 IsoBrightnessDir;
        float DirSWMinusNE = LumaSW - LumaNE;
        float DirSEMinusNW = LumaSE - LumaNW;
        IsoBrightnessDir.x = DirSWMinusNE + DirSEMinusNW;
        IsoBrightnessDir.y = DirSWMinusNE - DirSEMinusNW;
    
        // avoid NaN on zero vectors by adding 2^-24 (float ulp when length==1, and also minimum representable half)
        IsoBrightnessDir = IsoBrightnessDir * (0.125 * rsqrt(dot(IsoBrightnessDir, IsoBrightnessDir) + 6e-8));
    
        float3 ColorN = SampleSceneColorRGB(UV - IsoBrightnessDir * Input_ExtentInverse);
        float3 ColorP = SampleSceneColorRGB(UV + IsoBrightnessDir * Input_ExtentInverse);
    
        float UnsharpMask = 0.25;
        OutColor.rgb = (ColorN + ColorP) * ((UnsharpMask + 1.0) * 0.5) - (OutColor.rgb * UnsharpMask);
    
        // 双立方的Catmull-Rom上采样, 每像素使用5个采样点.
    #elif METHOD == UPSCALE_METHOD_CATMULL_ROM
        FCatmullRomSamples Samples = GetBicubic2DCatmullRomSamples(UVAndScreenPos.xy, Input_Extent, Input_ExtentInverse);
        for (uint i = 0; i < Samples.Count; i++)
        {
            OutColor.rgb += SampleSceneColorRGB(Samples.UV[i]) * Samples.Weight[i];
        }
        OutColor *= Samples.FinalMultiplier;
    
        // LANCZOS上采样.
    #elif METHOD == UPSCALE_METHOD_LANCZOS
        {
            // Lanczos 3
            float2 UV = UVAndScreenPos.xy * Input_Extent;
            float2 tc = floor(UV - 0.5) + 0.5;
            float2 f = UV - tc + 2;
    
            // compute at f, f-1, f-2, f-3, f-4, and f-5 using trig angle addition
            float2 fpi = f*PI, fpi3 = f * (PI / 3.0);
            float2 sinfpi = sin(fpi), sinfpi3 = sin(fpi3), cosfpi3 = cos(fpi3);
            const float r3 = sqrt(3.0);
            float2 w0 = ( sinfpi *       sinfpi3              ) / ( f       * f       );
            float2 w1 = (-sinfpi * (     sinfpi3 - r3*cosfpi3)) / ((f - 1.0)*(f - 1.0));
            float2 w2 = ( sinfpi * (    -sinfpi3 - r3*cosfpi3)) / ((f - 2.0)*(f - 2.0));
            float2 w3 = (-sinfpi * (-2.0*sinfpi3             )) / ((f - 3.0)*(f - 3.0));
            float2 w4 = ( sinfpi * (    -sinfpi3 + r3*cosfpi3)) / ((f - 4.0)*(f - 4.0));
            float2 w5 = (-sinfpi * (     sinfpi3 + r3*cosfpi3)) / ((f - 5.0)*(f - 5.0));
    
            // use bilinear texture weights to merge center two samples in each dimension
            float2 Weight[5];
            Weight[0] = w0;
            Weight[1] = w1;
            Weight[2] = w2 + w3;
            Weight[3] = w4;
            Weight[4] = w5;
    
            float2 Sample[5];
            Sample[0] = Input_ExtentInverse * (tc - 2);
            Sample[1] = Input_ExtentInverse * (tc - 1);
            Sample[2] = Input_ExtentInverse * (tc + w3 / Weight[2]);
            Sample[3] = Input_ExtentInverse * (tc + 2);
            Sample[4] = Input_ExtentInverse * (tc + 3);
    
            OutColor = 0;
    
            // 5x5 footprint with corners dropped to give 13 texture taps
            OutColor += float4(SampleSceneColorRGB(float2(Sample[0].x, Sample[2].y)), 1) * Weight[0].x * Weight[2].y;
    
            OutColor += float4(SampleSceneColorRGB(float2(Sample[1].x, Sample[1].y)), 1) * Weight[1].x * Weight[1].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[1].x, Sample[2].y)), 1) * Weight[1].x * Weight[2].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[1].x, Sample[3].y)), 1) * Weight[1].x * Weight[3].y;
    
            OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[0].y)), 1) * Weight[2].x * Weight[0].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[1].y)), 1) * Weight[2].x * Weight[1].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[2].y)), 1) * Weight[2].x * Weight[2].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[3].y)), 1) * Weight[2].x * Weight[3].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[4].y)), 1) * Weight[2].x * Weight[4].y;
    
            OutColor += float4(SampleSceneColorRGB(float2(Sample[3].x, Sample[1].y)), 1) * Weight[3].x * Weight[1].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[3].x, Sample[2].y)), 1) * Weight[3].x * Weight[2].y;
            OutColor += float4(SampleSceneColorRGB(float2(Sample[3].x, Sample[3].y)), 1) * Weight[3].x * Weight[3].y;
    
            OutColor += float4(SampleSceneColorRGB(float2(Sample[4].x, Sample[2].y)), 1) * Weight[4].x * Weight[2].y;
    
            OutColor /= OutColor.w;
        }
    
        // 高斯上采样.
    #elif METHOD == UPSCALE_METHOD_GAUSSIAN
        {
            // Gaussian filtered unsharp mask
            float2 UV = UVAndScreenPos.xy * Input_Extent;
            float2 tc = floor(UV) + 0.5;
    
            // estimate pixel value and derivatives
            OutColor = 0;
            float3 Laplacian = 0;
            UNROLL for (int i = -3; i <= 2; ++i)
            {
                UNROLL for (int j = -3; j <= 2; ++j)
                {
                    float2 TexelOffset = float2(i, j) + 0.5;
    
                    // skip corners: eliminated entirely by UNROLL
                    if (dot(TexelOffset, TexelOffset) > 9) continue;
    
                    float2 Texel = tc + TexelOffset;
                    float2 Offset = UV - Texel;
                    float OffsetSq = 2 * dot(Offset, Offset);    // texel loop is optimized for variance = 0.5
                    float Weight = exp(-0.5 * OffsetSq);
                    float4 Sample = Weight * float4(SampleSceneColorRGB(Texel * Input_ExtentInverse), 1);
    
                    OutColor += Sample;
                    Laplacian += Sample.rgb * (OffsetSq - 2);
                }
            }
            OutColor /= OutColor.a;
            Laplacian /= OutColor.a;
    
            float UnsharpScale = UpscaleSoftness * (1 - Input_Extent.x * Input_Extent.y * Output_ViewportSizeInverse.x * Output_ViewportSizeInverse.y);
            OutColor.rgb -= UnsharpScale * Laplacian;
        }
    
        // 平滑采样.
    #elif METHOD == UPSCALE_METHOD_SMOOTHSTEP
        OutColor.rgb = SampleSceneColorRGB(GetSmoothstepUV(UVAndScreenPos.xy, Input_Extent, Input_ExtentInverse));
    #endif
    }
    

    上面涉及到了部分纹理过滤和采样技术:最近点、双线性、双立方、Lanczos等,其中部分采样曲线示意图如下:

    部分曲线的效果对比图如下:

    这里说一下最复杂的Lanczos采样算法。Lanczos的卷积核普态公式如下:

    \[ L(x) = \begin{cases} 1 & \text{if } x = 0, \\ \dfrac{a \sin(\pi x) \sin(\pi x / a)}{\pi^2 x^2} & \text{if } -a \leq x < a \text{ and } x \neq 0, \\ 0 & \text{otherwise}. \end{cases} \]

    其中\(a\)是正整数,通常取2或3,表示卷积核的尺寸。当\(a=2\)和\(a=3\)时,卷积核曲线如下所示:

    利用卷积核公式,可以获得Lanczos的插值(采样)公式:

    \[ S(x) = \sum_{i=\lfloor x \rfloor - a + 1}^{\lfloor x \rfloor + a} s_{i} \, L(x - i) \]

    其中\(x\)是采样位置,\(a\)是过滤尺寸大小,\(\lfloor x \rfloor\)表示floor(向下取整)函数。

    不过上面PS的shader代码中并没有完全按照公式实现,而是对三角函数运算和循环语句做了优化。
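    
    为便于理解,下面给出一份按公式直译的一维Lanczos重采样C++草图(未做上述优化,仅作参考):
    
    #include <cmath>
    #include <vector>
    
    static const double Pi = 3.14159265358979323846;
    
    // Lanczos卷积核L(x), a为核尺寸(通常取2或3).
    double LanczosKernel(double x, int a)
    {
        if (x == 0.0)
            return 1.0;
        if (x <= -a || x >= a)
            return 0.0;
        const double px = Pi * x;
        return a * std::sin(px) * std::sin(px / a) / (px * px);
    }
    
    // 按插值公式S(x)对一维离散信号s在位置x处重采样.
    double LanczosSample(const std::vector<double>& s, double x, int a)
    {
        const int xf = (int)std::floor(x);
        double sum = 0.0;
        for (int i = xf - a + 1; i <= xf + a; ++i)
        {
            if (i >= 0 && i < (int)s.size()) // 越界样本按0处理
                sum += s[i] * LanczosKernel(x - (double)i, a);
        }
        return sum;
    }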

    此外,上面的C++代码中涉及到了Panini Projection,它用来校正广角FOV产生的透视畸变。

    上:未采用Panini Projection,画面产生了明显的畸变;下:采用了Panini Projection,画面恢复正常。

    7.4.4 FXAA

    UE的内置抗锯齿算法有MSAA、FXAA、TAA:MSAA是主要用于前向渲染的基于硬件的抗锯齿算法,TAA是主要用于延迟渲染的抗锯齿算法,而FXAA是基于后处理的抗锯齿算法。它们的比较如下表:

    | 算法 | 适用管线 | 效果 | 消耗 | 其它描述 |
    | :--: | :--: | :-- | :-- | :-- |
    | MSAA | 前向 | 清晰度高,抗锯齿好 | 带宽中,显存中 | 需要额外记录采样覆盖数据 |
    | FXAA | 前向、延迟 | 清晰度较高,抗锯齿较好 | 带宽低,显存低 | 不需要额外记录数据,计算量较大 |
    | TAA | 延迟 | 清晰度较低,存在延时、闪烁、鬼影等,但静态画面抗锯齿非常好 | 带宽高,显存高 | 需要速度缓冲和额外记录历史帧数据 |

    UE4.26延迟渲染管线下的TAA和FXAA对比图。上面是TAA,下面是FXAA。

    FXAA全称Fast approXimate Anti-Aliasing,是就职于NVIDIA的Timothy Lottes首次提出的一种快速近似MSAA的后处理抗锯齿方法,他分别在2009年、2011年发表了文献FXAA和Filtering Approaches for Real-Time Anti-Aliasing。

    FXAA的核心算法如下图和文字所示(序号和图片一一对应):

    1、输入一幅没有抗锯齿的sRGB颜色空间的纹理,在Shader逻辑中,它将被内部转换为亮度的估计标量值。

    2、检查局部对比度以避免处理非边缘的部分。检测到的边缘数据放到R通道,用向黄色混合来表示检测到的子像素锯齿量。

    这一步实现中做了优化,会对低对比度(非边缘)的像素进行早期返回(Early Exit)。

    3、通过局部对比度检测的像素被分类为水平(金色)或垂直(蓝色)。

    4、给定边缘方向,选择与边缘成90度的最高对比度的像素作为一对(Pair),用蓝/绿表示。

    5、在边缘的正负(红/蓝)方向上搜索边缘末端(end-of-edge)。检查沿边缘高对比度像素对(Pair)的平均亮度的显著变化。

    6、给定边缘末端,将边缘上的像素位置转化为与边缘垂直(旋转90度)的子像素偏移量,以减少锯齿。其中,红/蓝是减少/增加水平偏移,金色/天空蓝是减少/增加垂直偏移。

    7、给定子像素偏移量,对输入纹理重新采样。

    8、最后根据检测到的子像素锯齿量加入一个低通滤波器。
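    
    其中第1、2步可以用下面的C++片段概括(示意实现;阈值数值是FXAA常见的默认值,仅作假设参考):
    
    #include <algorithm>
    
    // 加权亮度估计(Rec.601系数, FXAA也可直接用绿色通道近似).
    float Luma(float r, float g, float b)
    {
        return 0.299f * r + 0.587f * g + 0.114f * b;
    }
    
    // 局部对比度检测: 对比度低于阈值则视为非边缘, 可提前返回(Early Exit).
    bool IsLowContrast(float lumaM, float lumaN, float lumaS, float lumaW, float lumaE,
                       float edgeThreshold,    // 相对阈值, 如0.166
                       float edgeThresholdMin) // 绝对最小阈值, 如0.0833
    {
        const float rangeMax = std::max({lumaM, lumaN, lumaS, lumaW, lumaE});
        const float rangeMin = std::min({lumaM, lumaN, lumaS, lumaW, lumaE});
        return (rangeMax - rangeMin) < std::max(edgeThresholdMin, rangeMax * edgeThreshold);
    }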

    关于FXAA需要补充几点说明:

    • 由于FXAA不需要额外的纹理数据,输入和输出纹理只有一张,所以可以在一个Pass处理完,带宽和显存消耗低。
    • 要求输入纹理是sRGB,如果是XYZ或线性空间的颜色将得不到预想的效果。
    • 由于FXAA需要进行多次步骤的计算,因此计算消耗理论上要比MSAA高,相当于时间换空间。
    • FXAA是基于屏幕空间的后处理算法,不需要用到法线、深度等GBuffer数据。
    • 由于FXAA只根据颜色的亮度来查找边缘,所以效果有限,无法检测出深度边缘和曲面边缘。
    • UE4.26的实现正是基于第二篇文献的算法,有很多种预设(Preset),它们是针对不同平台和质量等级执行的优化和适配。

    下面分析UE的FXAA在PC平台的算法,其它平台核心算法类似,此文不涉及。下面代码涉及很多方位后缀,它们的含义如下图:

    上图的各个缩写含义如下:

    • M:Median,中心像素。
    • N:North,M上面的像素。
    • S:South,M下面的像素。
    • W:West,M左边的像素。
    • E:East,M右边的像素。
    • NW:Northwest,M左上角的像素。
    • NE:Northeast,M右上角的像素。
    • SW:Southwest,M左下角的像素。
    • SE:Southeast,M右下角的像素。

    // Engine/Shaders/Private/FXAAShader.usf
    
    // 包含NVIDIA的FXAA实现文件.
    #include "Fxaa3_11.ush"
    
    // FXAA的PS主入口.
    void FxaaPS(noperspective float2 TexCenter : TEXCOORD0, noperspective float4 TexCorners : TEXCOORD1, out float4 OutColor : SV_Target0)
    {
        FxaaTex TextureAndSampler;
        TextureAndSampler.tex = Input_Texture;
        TextureAndSampler.smpl = Input_Sampler;
        TextureAndSampler.UVMinMax = float4(Input_UVViewportBilinearMin, Input_UVViewportBilinearMax);
    
        OutColor = FxaaPixelShader(
            TexCenter, TexCorners,
            TextureAndSampler,
            TextureAndSampler,
            TextureAndSampler,
            Input_ExtentInverse,
            fxaaConsoleRcpFrameOpt,
            fxaaConsoleRcpFrameOpt2,
            fxaaConsole360RcpFrameOpt2,
            fxaaQualitySubpix,
            fxaaQualityEdgeThreshold,
            fxaaQualityEdgeThresholdMin,
            fxaaConsoleEdgeSharpness,
            fxaaConsoleEdgeThreshold,
            fxaaConsoleEdgeThresholdMin,
            fxaaConsole360ConstDir);
    
        #if (POST_PROCESS_ALPHA != 2)
            OutColor.a = 1.0;
        #endif
    }
    

    下面直接分析PC平台的FxaaPixelShader

    // Engine/Shaders/Private/Fxaa3_11.ush
    
    #if (FXAA_PC == 1) // 表明是PC平台
    
    FxaaFloat4 FxaaPixelShader(
        FxaaFloat2 pos, // 是像素中心, 这里使用非透视插值 (关闭透视插值).
        FxaaFloat4 fxaaConsolePosPos, // 只用于Console平台.
        FxaaTex tex, // 输入纹理
        FxaaTex fxaaConsole360TexExpBiasNegOne, // 只用于360平台.
        FxaaTex fxaaConsole360TexExpBiasNegTwo, // 只用于360平台.
        FxaaFloat2 fxaaQualityRcpFrame, // 只用于FXAA的质量, 必须是constant/uniform, {x_} = 1.0/screenWidthInPixels, {_y} = 1.0/screenHeightInPixels
        FxaaFloat4 fxaaConsoleRcpFrameOpt, // 只用于360平台.
        FxaaFloat4 fxaaConsoleRcpFrameOpt2, // 只用于360平台.
        FxaaFloat4 fxaaConsole360RcpFrameOpt2, // 只用于360平台.
        FxaaFloat fxaaQualitySubpix, // 只用于FXAA的质量. 控制锐利度.
        FxaaFloat fxaaQualityEdgeThreshold, // 边缘阈值. 只用于FXAA的质量. 
        FxaaFloat fxaaQualityEdgeThresholdMin, // 最小边缘阈值. 只用于FXAA的质量. 
        FxaaFloat fxaaConsoleEdgeSharpness, // 只用于360平台.
        FxaaFloat fxaaConsoleEdgeThreshold,
        FxaaFloat fxaaConsoleEdgeThresholdMin,
        FxaaFloat4 fxaaConsole360ConstDir
    ) {
        FxaaFloat2 posM;
        posM.x = pos.x;
        posM.y = pos.y;
        
        // 从输入纹理采样数据, 计算亮度值.
        #if (FXAA_GATHER4_ALPHA == 1)
            #if (FXAA_DISCARD == 0)
                FxaaFloat4 rgbyM = FxaaTexTop(tex, posM);
                #if (FXAA_GREEN_AS_LUMA == 0)
                    #define lumaM rgbyM.w
                #else
                    #define lumaM rgbyM.y
                #endif
            #endif
            #if (FXAA_GREEN_AS_LUMA == 0)
                FxaaFloat4 luma4A = FxaaTexAlpha4(tex, posM);
                FxaaFloat4 luma4B = FxaaTexOffAlpha4(tex, posM, FxaaInt2(-1, -1));
            #else
                FxaaFloat4 luma4A = FxaaTexGreen4(tex, posM);
                FxaaFloat4 luma4B = FxaaTexOffGreen4(tex, posM, FxaaInt2(-1, -1));
            #endif
            #if (FXAA_DISCARD == 1)
                #define lumaM luma4A.w
            #endif
            #define lumaE luma4A.z
            #define lumaS luma4A.x
            #define lumaSE luma4A.y
            #define lumaNW luma4B.w
            #define lumaN luma4B.z
            #define lumaW luma4B.x
        #else
            FxaaFloat4 rgbyM = FxaaTexTop(tex, posM);
            #if (FXAA_GREEN_AS_LUMA == 0)
                #define lumaM rgbyM.w
            #else
                #define lumaM rgbyM.y
            #endif
            FxaaFloat lumaS = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 0, 1), fxaaQualityRcpFrame.xy));
            FxaaFloat lumaE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 1, 0), fxaaQualityRcpFrame.xy));
            FxaaFloat lumaN = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 0,-1), fxaaQualityRcpFrame.xy));
            FxaaFloat lumaW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1, 0), fxaaQualityRcpFrame.xy));
        #endif
    /*--------------------------------------------------------------------------*/
        // 计算各个方向上的亮度最大最小值, 检测是否可提前退出.
        FxaaFloat maxSM = max(lumaS, lumaM);
        FxaaFloat minSM = min(lumaS, lumaM);
        FxaaFloat maxESM = max(lumaE, maxSM);
        FxaaFloat minESM = min(lumaE, minSM);
        FxaaFloat maxWN = max(lumaN, lumaW);
        FxaaFloat minWN = min(lumaN, lumaW);
        FxaaFloat rangeMax = max(maxWN, maxESM);
        FxaaFloat rangeMin = min(minWN, minESM);
        FxaaFloat rangeMaxScaled = rangeMax * fxaaQualityEdgeThreshold;
        FxaaFloat range = rangeMax - rangeMin;
        FxaaFloat rangeMaxClamped = max(fxaaQualityEdgeThresholdMin, rangeMaxScaled);
        FxaaBool earlyExit = range < rangeMaxClamped;
    /*--------------------------------------------------------------------------*/
        if(earlyExit)
            #if (FXAA_DISCARD == 1)
                FxaaDiscard;
            #else
                return rgbyM;
            #endif
    /*--------------------------------------------------------------------------*/
        // 计算对角方向的亮度值.
        #if (FXAA_GATHER4_ALPHA == 0)
            FxaaFloat lumaNW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1,-1), fxaaQualityRcpFrame.xy));
            FxaaFloat lumaSE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 1, 1), fxaaQualityRcpFrame.xy));
            FxaaFloat lumaNE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 1,-1), fxaaQualityRcpFrame.xy));
            FxaaFloat lumaSW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1, 1), fxaaQualityRcpFrame.xy));
        #else
            FxaaFloat lumaNE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(1, -1), fxaaQualityRcpFrame.xy));
            FxaaFloat lumaSW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1, 1), fxaaQualityRcpFrame.xy));
        #endif
    /*--------------------------------------------------------------------------*/
        // 下面计算各个方向的边缘在水平和竖直的值, 以及子像素的值.
    /*--------------------------------------------------------------------------*/
        FxaaFloat lumaNS = lumaN + lumaS;
        FxaaFloat lumaWE = lumaW + lumaE;
        FxaaFloat subpixRcpRange = 1.0/range;
        FxaaFloat subpixNSWE = lumaNS + lumaWE;
        FxaaFloat edgeHorz1 = (-2.0 * lumaM) + lumaNS;
        FxaaFloat edgeVert1 = (-2.0 * lumaM) + lumaWE;
    /*--------------------------------------------------------------------------*/
        FxaaFloat lumaNESE = lumaNE + lumaSE;
        FxaaFloat lumaNWNE = lumaNW + lumaNE;
        FxaaFloat edgeHorz2 = (-2.0 * lumaE) + lumaNESE;
        FxaaFloat edgeVert2 = (-2.0 * lumaN) + lumaNWNE;
    /*--------------------------------------------------------------------------*/
        FxaaFloat lumaNWSW = lumaNW + lumaSW;
        FxaaFloat lumaSWSE = lumaSW + lumaSE;
        FxaaFloat edgeHorz4 = (abs(edgeHorz1) * 2.0) + abs(edgeHorz2);
        FxaaFloat edgeVert4 = (abs(edgeVert1) * 2.0) + abs(edgeVert2);
        FxaaFloat edgeHorz3 = (-2.0 * lumaW) + lumaNWSW;
        FxaaFloat edgeVert3 = (-2.0 * lumaS) + lumaSWSE;
        FxaaFloat edgeHorz = abs(edgeHorz3) + edgeHorz4;
        FxaaFloat edgeVert = abs(edgeVert3) + edgeVert4;
    /*--------------------------------------------------------------------------*/
        FxaaFloat subpixNWSWNESE = lumaNWSW + lumaNESE;
        FxaaFloat lengthSign = fxaaQualityRcpFrame.x;
        // 如果水平方向的边缘强度>=竖直方向的边缘强度, 说明是水平方向的边缘.
        FxaaBool horzSpan = edgeHorz >= edgeVert;
        FxaaFloat subpixA = subpixNSWE * 2.0 + subpixNWSWNESE;
    /*--------------------------------------------------------------------------*/
        // 如果不是水平边缘, 则将N和S换成W和E.(这样后面就避免了重复的代码)
        if(!horzSpan) lumaN = lumaW;
        if(!horzSpan) lumaS = lumaE;
        if(horzSpan) lengthSign = fxaaQualityRcpFrame.y;
        FxaaFloat subpixB = (subpixA * (1.0/12.0)) - lumaM;
    /*--------------------------------------------------------------------------*/
        // 根据梯度计算配对.
        FxaaFloat gradientN = lumaN - lumaM;
        FxaaFloat gradientS = lumaS - lumaM;
        FxaaFloat lumaNN = lumaN + lumaM;
        FxaaFloat lumaSS = lumaS + lumaM;
        FxaaBool pairN = abs(gradientN) >= abs(gradientS);
        FxaaFloat gradient = max(abs(gradientN), abs(gradientS));
        if(pairN) lengthSign = -lengthSign;
        FxaaFloat subpixC = FxaaSat(abs(subpixB) * subpixRcpRange);
    /*--------------------------------------------------------------------------*/
        // 计算偏移.
        FxaaFloat2 posB;
        posB.x = posM.x;
        posB.y = posM.y;
        FxaaFloat2 offNP;
        offNP.x = (!horzSpan) ? 0.0 : fxaaQualityRcpFrame.x;
        offNP.y = ( horzSpan) ? 0.0 : fxaaQualityRcpFrame.y;
        if(!horzSpan) posB.x += lengthSign * 0.5;
        if( horzSpan) posB.y += lengthSign * 0.5;
    /*--------------------------------------------------------------------------*/
        // 计算偏移后的位置.
        // 沿边缘负方向(N端)的采样位置.
        FxaaFloat2 posN; 
        posN.x = posB.x - offNP.x * FXAA_QUALITY__P0;
        posN.y = posB.y - offNP.y * FXAA_QUALITY__P0;
        // 沿边缘正方向(P端)的采样位置.
        FxaaFloat2 posP; 
        posP.x = posB.x + offNP.x * FXAA_QUALITY__P0;
        posP.y = posB.y + offNP.y * FXAA_QUALITY__P0;
        FxaaFloat subpixD = ((-2.0)*subpixC) + 3.0;
        FxaaFloat lumaEndN = FxaaLuma(FxaaTexTop(tex, posN));
        FxaaFloat subpixE = subpixC * subpixC;
        FxaaFloat lumaEndP = FxaaLuma(FxaaTexTop(tex, posP));
    /*--------------------------------------------------------------------------*/
        if(!pairN) lumaNN = lumaSS;
        // 梯度缩放.
        FxaaFloat gradientScaled = gradient * 1.0/4.0;
        FxaaFloat lumaMM = lumaM - lumaNN * 0.5;
        FxaaFloat subpixF = subpixD * subpixE;
        FxaaBool lumaMLTZero = lumaMM < 0.0;
    /*--------------------------------------------------------------------------*/
        // 第1次边缘末端查找.
        lumaEndN -= lumaNN * 0.5;
        lumaEndP -= lumaNN * 0.5;
        FxaaBool doneN = abs(lumaEndN) >= gradientScaled;
        FxaaBool doneP = abs(lumaEndP) >= gradientScaled;
        if(!doneN) posN.x -= offNP.x * FXAA_QUALITY__P1;
        if(!doneN) posN.y -= offNP.y * FXAA_QUALITY__P1;
        FxaaBool doneNP = (!doneN) || (!doneP);
        if(!doneP) posP.x += offNP.x * FXAA_QUALITY__P1;
        if(!doneP) posP.y += offNP.y * FXAA_QUALITY__P1;
    /*--------------------------------------------------------------------------*/
        // 第2次边缘末端查找.
        if(doneNP) {
            if(!doneN) lumaEndN = FxaaLuma(FxaaTexTop(tex, posN.xy));
            if(!doneP) lumaEndP = FxaaLuma(FxaaTexTop(tex, posP.xy));
            if(!doneN) lumaEndN = lumaEndN - lumaNN * 0.5;
            if(!doneP) lumaEndP = lumaEndP - lumaNN * 0.5;
            doneN = abs(lumaEndN) >= gradientScaled;
            doneP = abs(lumaEndP) >= gradientScaled;
            if(!doneN) posN.x -= offNP.x * FXAA_QUALITY__P2;
            if(!doneN) posN.y -= offNP.y * FXAA_QUALITY__P2;
            doneNP = (!doneN) || (!doneP);
            if(!doneP) posP.x += offNP.x * FXAA_QUALITY__P2;
            if(!doneP) posP.y += offNP.y * FXAA_QUALITY__P2;
    /*--------------------------------------------------------------------------*/
            // 第3次边缘末端查找.
            #if (FXAA_QUALITY__PS > 3)
            if(doneNP) {
                if(!doneN) lumaEndN = FxaaLuma(FxaaTexTop(tex, posN.xy));
                if(!doneP) lumaEndP = FxaaLuma(FxaaTexTop(tex, posP.xy));
                if(!doneN) lumaEndN = lumaEndN - lumaNN * 0.5;
                if(!doneP) lumaEndP = lumaEndP - lumaNN * 0.5;
                doneN = abs(lumaEndN) >= gradientScaled;
                doneP = abs(lumaEndP) >= gradientScaled;
                if(!doneN) posN.x -= offNP.x * FXAA_QUALITY__P3;
                if(!doneN) posN.y -= offNP.y * FXAA_QUALITY__P3;
                doneNP = (!doneN) || (!doneP);
                if(!doneP) posP.x += offNP.x * FXAA_QUALITY__P3;
                if(!doneP) posP.y += offNP.y * FXAA_QUALITY__P3;
    /*--------------------------------------------------------------------------*/
                #if (FXAA_QUALITY__PS > 4)
                    (......) // 依质量预设重复上述查找, 最多可达12次边缘末端采样.
                #endif
    /*--------------------------------------------------------------------------*/
            }
            #endif
    /*--------------------------------------------------------------------------*/
        }
    /*--------------------------------------------------------------------------*/
        FxaaFloat dstN = posM.x - posN.x;
        FxaaFloat dstP = posP.x - posM.x;
        if(!horzSpan) dstN = posM.y - posN.y;
        if(!horzSpan) dstP = posP.y - posM.y;
    /*--------------------------------------------------------------------------*/
        FxaaBool goodSpanN = (lumaEndN < 0.0) != lumaMLTZero;
        FxaaFloat spanLength = (dstP + dstN);
        FxaaBool goodSpanP = (lumaEndP < 0.0) != lumaMLTZero;
        FxaaFloat spanLengthRcp = 1.0/spanLength;
    /*--------------------------------------------------------------------------*/
        FxaaBool directionN = dstN < dstP;
        FxaaFloat dstMin = min(dstN, dstP);
        FxaaBool goodSpan = directionN ? goodSpanN : goodSpanP;
        FxaaFloat subpixG = subpixF * subpixF;
        FxaaFloat pixelOffset = (dstMin * (-spanLengthRcp)) + 0.5;
        FxaaFloat subpixH = subpixG * fxaaQualitySubpix;
    /*--------------------------------------------------------------------------*/
        // 计算最终的采样位置并从输入纹理中采样.
        FxaaFloat pixelOffsetGood = goodSpan ? pixelOffset : 0.0;
        FxaaFloat pixelOffsetSubpix = max(pixelOffsetGood, subpixH);
        if(!horzSpan) posM.x += pixelOffsetSubpix * lengthSign;
        if( horzSpan) posM.y += pixelOffsetSubpix * lengthSign;
        // 注意FxaaTexTop使用了纹理的双线性采样, 所以才能呈现出混合过渡的效果.
        #if ((FXAA_DISCARD == 1) || (POST_PROCESS_ALPHA == 2))
            return FxaaTexTop(tex, posM);
        #else
            return FxaaFloat4(FxaaTexTop(tex, posM).xyz, lumaM);
        #endif
    }
    #endif
    

    以上代码需要注意:最终计算出来的是当前像素沿边缘垂直方向偏移后的采样位置,并且使用了双线性采样,所以才能达到混合过渡的抗锯齿效果。

    FXAA在延迟渲染管线中的表现不甚理想,综合抗锯齿效果不如TAA或SMAA。不过SMAA需要额外集成或实现,其在UE的实现可以在GitHub上找到:https://github.com/inequation/UnrealEngine/tree/SMAA-4.12。

    FXAA、TAA和SMAA的效果对比图。

    7.4.5 TAA

    TAA全称是Temporal Anti-Aliasing,通常被翻译成时间抗锯齿(或临时抗锯齿)。它是Epic Games的Brian Karis(很熟悉的名字吧,本系列文章数次提及他)实现并提出的UE当家抗锯齿技术,并且在SIGGRAPH 2014发表了演讲High Quality Temporal Supersampling。2016年,NVIDIA的研究员Marco Salvi发表了改进篇An Excursion in Temporal Supersampling。

    TAA的核心思想是将MSAA在同一帧的空间采样分摊到时间轴上的多帧采样,然后按照某种权重混合它们:

    由于要在多帧中生成采样偏移量(被称为Jitter),可以通过修改投影矩阵达成:
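    下面用一段C++示意Jitter如何注入投影矩阵(AddTemporalAAJitter是本文虚构的演示函数,假设UE行主序/行向量的矩阵约定,并非HackAddTemporalAAProjectionJitter的逐字实现;其中的缩放系数与后文源码传入的参数一致):

    // 把以像素为单位的Jitter转成NDC偏移, 并叠加到投影矩阵上,
    // 使整个场景的光栅化结果偏移子像素距离.
    void AddTemporalAAJitter(float ProjectionMatrix[4][4],
                             float SampleX, float SampleY,
                             int ViewWidth, int ViewHeight)
    {
        // 1个像素对应2/宽(x)与2/高(y)个NDC单位; y取负号是因为NDC的y轴与像素y轴方向相反.
        const float JitterX = SampleX *  2.0f / ViewWidth;
        const float JitterY = SampleY * -2.0f / ViewHeight;

        // 对透视投影(w等于视空间深度), 修改这两项等价于在NDC空间平移整个画面.
        ProjectionMatrix[2][0] += JitterX;
        ProjectionMatrix[2][1] += JitterY;
    }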

    当然,Jitter也可以通过特殊的Pattern或Halton等低差异序列算法生成:

    对于移动平均数(Moving Average),最简单的做法是保存多帧历史数据再与当前帧混合,但样本数受限于保存的历史帧数量,帧数太多又会导致带宽和显存暴增。改用指数型移动平均数(Exponential Moving Average),则只需固定的存储就能近似累积几乎无穷多的样本:

    当前帧的混合权重(alpha,UE默认是0.04)足够小时,指数型移动平均可以近似等于Simple方法的效果,而只需保留一帧历史数据:
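    用公式表示该指数型移动平均(示意写法,s_n为当前帧样本,c_{n-1}为上一帧的累积结果,alpha为当前帧权重):

    \[ c_n = \alpha \, s_n + (1 - \alpha) \, c_{n-1} \]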

    但是,以上方法只适用于静态场景;对于动态场景,需要配合某些方法(如速度缓冲、深度、材质索引、颜色差别、法线变化等)丢弃无效的采样样本(如突然出现或消失在屏幕中的像素)。

    UE使用的是结合了速度缓冲的重投影(Reprojection)和邻居截取(Neighborhood Clamping)的技术。其中速度缓冲与运动模糊使用的一致;重投影时需要根据速度缓冲找到像素在历史帧中的位置,并去除Jitter的影响:

    这对速度缓冲的精度要求非常高:需要记录所有像素的运动向量,通常使用R16G16精度的速度缓冲;对于程序化的动画、滚动的纹理以及半透明物体仍可能产生瑕疵:

    UE为了防止鬼影(Ghosting),采用了邻居截取(Neighborhood Clamping):先对当前像素及其周围数个像素的颜色建立一个颜色空间的包围盒(Bounding Box),然后在使用历史颜色样本之前,先将其Clamp在这个包围盒的区域内。直观地理解:当历史样本的颜色和当前帧差别特别大时,通过Clamp将其尽量拉回到当前帧该像素邻域的颜色范围内。

    需要注意的是,Clamp使用的颜色空间是YCoCg:对邻域颜色逐通道取min和max即可得到该空间的AABB,且这个Box的朝向与亮度方向对齐:

    关于YCoCg颜色空间

    也被称为YCgCo,其中Y是亮度值,Cg是绿色量,Co是橙色量。可以和sRGB空间互转,公式如下:

    \[ \begin{bmatrix} Y \\ Co \\ Cg \end{bmatrix} = \begin{bmatrix} \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 0 & -\frac{1}{2} \\ -\frac{1}{4} & \frac{1}{2} & -\frac{1}{4} \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix} \]

    \[ \begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \end{bmatrix} \cdot \begin{bmatrix} Y \\ Co \\ Cg \end{bmatrix} \]
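    按照上面的矩阵,两个方向的转换可写成如下C++示意(Vec3是本文为演示自定义的三分量结构,非UE类型):

    struct Vec3 { float x, y, z; };

    // RGB -> YCoCg, 与上面第一个矩阵一一对应.
    Vec3 RGBToYCoCg(Vec3 c)
    {
        return { 0.25f * c.x + 0.5f * c.y + 0.25f * c.z,   // Y
                 0.5f  * c.x              - 0.5f  * c.z,   // Co
                -0.25f * c.x + 0.5f * c.y - 0.25f * c.z }; // Cg
    }

    // YCoCg -> RGB, 与上面第二个矩阵一一对应.
    Vec3 YCoCgToRGB(Vec3 c)
    {
        return { c.x + c.y - c.z,   // R = Y + Co - Cg
                 c.x        + c.z,  // G = Y + Cg
                 c.x - c.y - c.z }; // B = Y - Co - Cg
    }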

    下图是正常图片被分解成YCoCg空间的图例(从上到下依次是正常、亮度Y、绿色Cg、橙色Co):

    有了YCoCg空间的AABB,便可以把历史颜色裁剪(Clip)到这个Box的边缘:
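    裁剪到盒面的操作可用如下C++示意(ClipHistoryToAABB是本文虚构的演示函数,假设颜色已在YCoCg空间;它沿"历史颜色到盒中心"的连线收缩,而不是逐通道Clamp,从而避免颜色被推到包围盒的角点上):

    #include <algorithm>
    #include <cmath>

    struct Vec3 { float x, y, z; };

    Vec3 ClipHistoryToAABB(Vec3 History, Vec3 BoxMin, Vec3 BoxMax)
    {
        // 包围盒中心与半长(加epsilon防止除零).
        Vec3 Center = { 0.5f * (BoxMin.x + BoxMax.x), 0.5f * (BoxMin.y + BoxMax.y), 0.5f * (BoxMin.z + BoxMax.z) };
        Vec3 Extent = { 0.5f * (BoxMax.x - BoxMin.x) + 1e-5f, 0.5f * (BoxMax.y - BoxMin.y) + 1e-5f, 0.5f * (BoxMax.z - BoxMin.z) + 1e-5f };
        Vec3 Dir = { History.x - Center.x, History.y - Center.y, History.z - Center.z };

        // 各轴到盒面的归一化距离, 最大者>1说明历史颜色在盒外.
        float MaxUnit = std::max({ std::fabs(Dir.x) / Extent.x, std::fabs(Dir.y) / Extent.y, std::fabs(Dir.z) / Extent.z });
        if (MaxUnit <= 1.0f)
            return History; // 已在盒内, 保持不变.

        // 在盒外: 沿Dir方向等比例收缩到盒面.
        return { Center.x + Dir.x / MaxUnit, Center.y + Dir.y / MaxUnit, Center.z + Dir.z / MaxUnit };
    }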

    NVIDIA版本提出一种改进的方式,叫方差裁剪(Variance Clipping):先计算邻域颜色的前两阶矩(Moment),利用它们求出平均值和标准差,进一步计算出新的边界minc和maxc,用其代替旧的AABB:
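    其过程可用公式示意(设邻域内有N个颜色样本c_i,gamma为缩放系数;这只是概念性写法,并非UE源码的逐字实现):

    \[ \mu = \frac{1}{N}\sum_{i=1}^{N} c_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} c_i^2 - \mu^2}, \qquad \mathrm{minc} = \mu - \gamma \sigma, \qquad \mathrm{maxc} = \mu + \gamma \sigma \]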

    方差裁剪相当于根据邻域样本拟合出一个Gaussian模型,然后用它生成更加紧凑的包围盒:

    理论部分已经说完了,接下来分析UE的具体实现代码。首先分析TAA的Jitter生成,它位于InitViews阶段的PreVisibilityFrameSetup中:

    // Engine/Source/Runtime/Renderer/Private/SceneVisibility.cpp
    
    void FSceneRenderer::PreVisibilityFrameSetup(FRHICommandListImmediate& RHICmdList)
    {
        RHICmdList.BeginScene();
        
        (......)
    
        for(int32 ViewIndex = 0;ViewIndex < Views.Num();ViewIndex++)
        {
            FViewInfo& View = Views[ViewIndex];
            FSceneViewState* ViewState = View.ViewState;
    
            (......)
            
            // TAA子像素采样数量.
            int32 CVarTemporalAASamplesValue = CVarTemporalAASamples.GetValueOnRenderThread();
    
            bool bTemporalUpsampling = View.PrimaryScreenPercentageMethod == EPrimaryScreenPercentageMethod::TemporalUpscale;
            
            // 计算视图的TAA子像素偏移量(Jitter).
            if (View.AntiAliasingMethod == AAM_TemporalAA && ViewState && (CVarTemporalAASamplesValue > 0 || bTemporalUpsampling) && View.bAllowTemporalJitter)
            {
                float EffectivePrimaryResolutionFraction = float(View.ViewRect.Width()) / float(View.GetSecondaryViewRectSize().X);
    
                // 计算TAA采样数量.
                int32 TemporalAASamples = CVarTemporalAASamplesValue;
                {
                    if (Scene->GetFeatureLevel() < ERHIFeatureLevel::SM5)
                    {
                        // 移动端只能使用2采样数量.
                        TemporalAASamples = 2;
                    }
                    else if (bTemporalUpsampling)
                    {
                        // 屏幕百分比<100%的TAA上采样需要额外的时间采样数量, 为最终输出纹理获得稳定的时间采样密度, 避免输出像素对齐的收敛问题.
                        TemporalAASamples = float(TemporalAASamples) * FMath::Max(1.f, 1.f / (EffectivePrimaryResolutionFraction * EffectivePrimaryResolutionFraction));
                    }
                    else if (CVarTemporalAASamplesValue == 5)
                    {
                        TemporalAASamples = 4;
                    }
    
                    TemporalAASamples = FMath::Clamp(TemporalAASamples, 1, 255);
                }
    
                // 计算在时间序列的采样点的索引.
                int32 TemporalSampleIndex            = ViewState->TemporalAASampleIndex + 1;
                if(TemporalSampleIndex >= TemporalAASamples || View.bCameraCut)
                {
                    TemporalSampleIndex = 0;
                }
    
                // 更新view state.
                if (!View.bStatePrevViewInfoIsReadOnly && !bFreezeTemporalSequences)
                {
                    ViewState->TemporalAASampleIndex          = TemporalSampleIndex;
                    ViewState->TemporalAASampleIndexUnclamped = ViewState->TemporalAASampleIndexUnclamped+1;
                }
    
                // 在时间序列上选择一个子像素采样坐标.
                float SampleX, SampleY;
                if (Scene->GetFeatureLevel() < ERHIFeatureLevel::SM5)
                {
                    float SamplesX[] = { -8.0f/16.0f, 0.0/16.0f };
                    float SamplesY[] = { /* - */ 0.0f/16.0f, 8.0/16.0f };
                    check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
                    SampleX = SamplesX[ TemporalSampleIndex ];
                    SampleY = SamplesY[ TemporalSampleIndex ];
                }
                else if (View.PrimaryScreenPercentageMethod == EPrimaryScreenPercentageMethod::TemporalUpscale)
                {
                    // 均匀分布时域Jitter在[-0.5, 0.5],因为不再有任何输入和输出像素对齐. 注意此处用的Halton序列.
                    SampleX = Halton(TemporalSampleIndex + 1, 2) - 0.5f;
                    SampleY = Halton(TemporalSampleIndex + 1, 3) - 0.5f;
    
                    View.MaterialTextureMipBias = -(FMath::Max(-FMath::Log2(EffectivePrimaryResolutionFraction), 0.0f) ) + CVarMinAutomaticViewMipBiasOffset.GetValueOnRenderThread();
                    View.MaterialTextureMipBias = FMath::Max(View.MaterialTextureMipBias, CVarMinAutomaticViewMipBias.GetValueOnRenderThread());
                }
                else if( CVarTemporalAASamplesValue == 2 )
                {
                    // 2xMSAA
                    // Pattern docs: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx
                    //   N.
                    //   .S
                    float SamplesX[] = { -4.0f/16.0f, 4.0/16.0f };
                    float SamplesY[] = { -4.0f/16.0f, 4.0/16.0f };
                    check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
                    SampleX = SamplesX[ TemporalSampleIndex ];
                    SampleY = SamplesY[ TemporalSampleIndex ];
                }
                else if( CVarTemporalAASamplesValue == 3 )
                {
                    // 3xMSAA
                    //   A..
                    //   ..B
                    //   .C.
                    // Rolling circle pattern (A,B,C).
                    float SamplesX[] = { -2.0f/3.0f,  2.0/3.0f,  0.0/3.0f };
                    float SamplesY[] = { -2.0f/3.0f,  0.0/3.0f,  2.0/3.0f };
                    check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
                    SampleX = SamplesX[ TemporalSampleIndex ];
                    SampleY = SamplesY[ TemporalSampleIndex ];
                }
                else if( CVarTemporalAASamplesValue == 4 )
                {
                    // 4xMSAA
                    // Pattern docs: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx
                    //   .N..
                    //   ...E
                    //   W...
                    //   ..S.
                    // Rolling circle pattern (N,E,S,W).
                    float SamplesX[] = { -2.0f/16.0f,  6.0/16.0f, 2.0/16.0f, -6.0/16.0f };
                    float SamplesY[] = { -6.0f/16.0f, -2.0/16.0f, 6.0/16.0f,  2.0/16.0f };
                    check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
                    SampleX = SamplesX[ TemporalSampleIndex ];
                    SampleY = SamplesY[ TemporalSampleIndex ];
                }
                else if( CVarTemporalAASamplesValue == 5 )
                {
                    // Compressed 4 sample pattern on same vertical and horizontal line (less temporal flicker).
                    // Compressed 1/2 works better than correct 2/3 (reduced temporal flicker).
                    //   . N .
                    //   W . E
                    //   . S .
                    // Rolling circle pattern (N,E,S,W).
                    float SamplesX[] = {  0.0f/2.0f,  1.0/2.0f,  0.0/2.0f, -1.0/2.0f };
                    float SamplesY[] = { -1.0f/2.0f,  0.0/2.0f,  1.0/2.0f,  0.0/2.0f };
                    check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
                    SampleX = SamplesX[ TemporalSampleIndex ];
                    SampleY = SamplesY[ TemporalSampleIndex ];
                }
                else // 大于5采样数, 则使用Halton序列.
                {
                    float u1 = Halton( TemporalSampleIndex + 1, 2 );
                    float u2 = Halton( TemporalSampleIndex + 1, 3 );
    
                    // 生成正态分布的样本.
                    // exp( x^2 / Sigma^2 )
                        
                    static auto CVar = IConsoleManager::Get().FindConsoleVariable(TEXT("r.TemporalAAFilterSize"));
                    float FilterSize = CVar->GetFloat();
    
                    // 缩放分布以设置非单位方差.
                    // Variance = Sigma^2
                    float Sigma = 0.47f * FilterSize;
    
                    float OutWindow = 0.5f;
                    float InWindow = FMath::Exp( -0.5 * FMath::Square( OutWindow / Sigma ) );
                        
                    // Box-Muller变换
                    float Theta = 2.0f * PI * u2;
                    float r = Sigma * FMath::Sqrt( -2.0f * FMath::Loge( (1.0f - u1) * InWindow + u1 ) );
                    
                    SampleX = r * FMath::Cos( Theta );
                    SampleY = r * FMath::Sin( Theta );
                }
    
                // 保存采样数据到View.
                View.TemporalJitterSequenceLength = TemporalAASamples;
                View.TemporalJitterIndex = TemporalSampleIndex;
                View.TemporalJitterPixels.X = SampleX;
                View.TemporalJitterPixels.Y = SampleY;
    
                View.ViewMatrices.HackAddTemporalAAProjectionJitter(FVector2D(SampleX * 2.0f / View.ViewRect.Width(), SampleY * -2.0f / View.ViewRect.Height()));
            }
    
        (......)
    }
    

    由于UE在PC平台默认的采样数量是8,所以默认使用Halton序列来采样子像素。其中Halton的生成序列如下图所示:

    相比随机采样,Halton获得的采样序列更加均匀,且可以获得没有上限的样本数(UE默认限制在8以内)。除此之外,还有Sobol、Niederreiter、Kronecker等低差异序列算法,它们的比较如下图:

    对于UE的默认设置,利用Halton生成的前8个序列计算出来的SampleX和SampleY分别是:

    0: (-0.163972363, 0.284008324)
    1: (-0.208000556, -0.360267729)
    2: (0.172162965, 0.144461900)
    3: (-0.430473328, 0.156679258)
    4: (0.0485312343, -0.275233328)
    5: (0.0647613853, 0.367280841)
    6: (-0.147184864, -0.0535709597)
    7: (0.366960347, -0.307915747)
    

    由于有正数有负数,且处于[-0.5, 0.5]之间,说明是基于像素中心(0.5, 0.5)的子像素偏移量。图形化后的坐标点如下:
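    上面大于5采样的分支,实际是先用Halton生成两个均匀样本(u1, u2),再经Box-Muller变换映射为高斯分布的子像素偏移(UE还对u1做了窗口化,把样本限制在像素范围内)。上述8个值可以用下面独立的C++小程序复现(示意版本,假设其中的Halton函数与引擎内置实现等价):

    #include <cmath>
    #include <cstdio>

    // 标准的Halton径向逆(radical inverse)序列.
    float Halton(int Index, int Base)
    {
        float Result = 0.0f;
        float InvBase = 1.0f / Base;
        float Fraction = InvBase;
        while (Index > 0)
        {
            Result += (Index % Base) * Fraction; // 累加当前位数字乘以对应权重.
            Index /= Base;                       // 在Base进制下右移一位.
            Fraction *= InvBase;
        }
        return Result;
    }

    int main()
    {
        const float Pi = 3.14159265f;
        const float FilterSize = 1.0f;            // 对应r.TemporalAAFilterSize的默认值.
        const float Sigma = 0.47f * FilterSize;
        const float OutWindow = 0.5f;
        const float InWindow = expf(-0.5f * (OutWindow / Sigma) * (OutWindow / Sigma));

        for (int i = 0; i < 8; i++)
        {
            float u1 = Halton(i + 1, 2);
            float u2 = Halton(i + 1, 3);
            // 带窗口化的Box-Muller变换, 与前文源码一致.
            float Theta = 2.0f * Pi * u2;
            float r = Sigma * sqrtf(-2.0f * logf((1.0f - u1) * InWindow + u1));
            printf("%d: (%f, %f)\n", i, r * cosf(Theta), r * sinf(Theta));
        }
        return 0;
    }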

    解析完了Jitter,继续分析TAA在C++侧是如何组织绘制逻辑的(UE4.26支持第4代和第5代TAA,此处以第4代为分析对象):

    // Engine/Source/Runtime/Renderer/Private/PostProcess/TemporalAA.cpp
    
    FTAAOutputs AddTemporalAAPass(
        FRDGBuilder& GraphBuilder,
        const FViewInfo& View,
        const FTAAPassParameters& Inputs,
        const FTemporalAAHistory& InputHistory,
        FTemporalAAHistory* OutputHistory)
    {
        // 记录标记等.
        const bool bSupportsAlpha = IsPostProcessingWithAlphaChannelSupported();
        const int32 IntputTextureCount = (IsDOFTAAConfig(Inputs.Pass) && bSupportsAlpha) ? 2 : 1;
        const bool bIsMainPass = IsMainTAAConfig(Inputs.Pass);
        const bool bCameraCut = !InputHistory.IsValid() || View.bCameraCut;
    
        const FIntPoint OutputExtent = Inputs.GetOutputExtent();
    
        // 记录输入区域.
        const FIntRect SrcRect = Inputs.InputViewRect;
        const FIntRect DestRect = Inputs.OutputViewRect;
        const FIntRect PracticableSrcRect = FIntRect::DivideAndRoundUp(SrcRect, Inputs.ResolutionDivisor);
        const FIntRect PracticableDestRect = FIntRect::DivideAndRoundUp(DestRect, Inputs.ResolutionDivisor);
    
        const uint32 PassIndex = static_cast<uint32>(Inputs.Pass);
        const TCHAR* PassName = kTAAPassNames[PassIndex];
    
        // 输出纹理.
        FTAAOutputs Outputs;
        // 当前帧的历史纹理.
        TStaticArray<FRDGTextureRef, FTemporalAAHistory::kRenderTargetCount> NewHistoryTexture;
    
        // 创建输出和历史帧纹理.
        {
            EPixelFormat HistoryPixelFormat = PF_FloatRGBA;
            if (bIsMainPass && Inputs.bUseFast && !bSupportsAlpha && CVarTAAR11G11B10History.GetValueOnRenderThread())
            {
                HistoryPixelFormat = PF_FloatR11G11B10;
            }
    
            FRDGTextureDesc SceneColorDesc = FRDGTextureDesc::Create2D(
                OutputExtent,
                HistoryPixelFormat,
                FClearValueBinding::Black,
                TexCreate_ShaderResource | TexCreate_UAV);
    
            if (Inputs.bOutputRenderTargetable)
            {
                SceneColorDesc.Flags |= TexCreate_RenderTargetable;
            }
    
            const TCHAR* OutputName = kTAAOutputNames[PassIndex];
    
            for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
            {
                NewHistoryTexture[i] = GraphBuilder.CreateTexture(
                    SceneColorDesc,
                    OutputName,
                    ERDGTextureFlags::MultiFrame);
            }
    
            NewHistoryTexture[0] = Outputs.SceneColor = NewHistoryTexture[0];
    
            if (IntputTextureCount == 2)
            {
                Outputs.SceneMetadata = NewHistoryTexture[1];
            }
    
            if (Inputs.bDownsample)
            {
                const FRDGTextureDesc HalfResSceneColorDesc = FRDGTextureDesc::Create2D(
                    SceneColorDesc.Extent / 2,
                    Inputs.DownsampleOverrideFormat != PF_Unknown ? Inputs.DownsampleOverrideFormat : Inputs.SceneColorInput->Desc.Format,
                    FClearValueBinding::Black,
                    TexCreate_ShaderResource | TexCreate_UAV | GFastVRamConfig.Downsample);
    
                Outputs.DownsampledSceneColor = GraphBuilder.CreateTexture(HalfResSceneColorDesc, TEXT("SceneColorHalfRes"));
            }
        }
    
        RDG_GPU_STAT_SCOPE(GraphBuilder, TAA);
    
        TStaticArray<bool, FTemporalAAHistory::kRenderTargetCount> bUseHistoryTexture;
    
        // 处理FTAAStandaloneCS参数.
        {
            FTAAStandaloneCS::FPermutationDomain PermutationVector;
            PermutationVector.Set<FTAAStandaloneCS::FTAAPassConfigDim>(Inputs.Pass);
            PermutationVector.Set<FTAAStandaloneCS::FTAAFastDim>(Inputs.bUseFast);
            PermutationVector.Set<FTAAStandaloneCS::FTAADownsampleDim>(Inputs.bDownsample);
            PermutationVector.Set<FTAAStandaloneCS::FTAAUpsampleFilteredDim>(true);
    
            if (IsTAAUpsamplingConfig(Inputs.Pass))
            {
                const bool bUpsampleFiltered = CVarTemporalAAUpsampleFiltered.GetValueOnRenderThread() != 0 || Inputs.Pass != ETAAPassConfig::MainUpsampling;
                PermutationVector.Set<FTAAStandaloneCS::FTAAUpsampleFilteredDim>(bUpsampleFiltered);
    
                // 根据屏幕百分比设置排列.
                if (SrcRect.Width() > DestRect.Width() ||
                    SrcRect.Height() > DestRect.Height())
                {
                    PermutationVector.Set<FTAAStandaloneCS::FTAAScreenPercentageDim>(2);
                }
                else if (SrcRect.Width() * 100 < 50 * DestRect.Width() &&
                    SrcRect.Height() * 100 < 50 * DestRect.Height() &&
                    Inputs.Pass == ETAAPassConfig::MainSuperSampling)
                {
                    PermutationVector.Set<FTAAStandaloneCS::FTAAScreenPercentageDim>(3);
                }
                else if (SrcRect.Width() * 100 < 71 * DestRect.Width() &&
                    SrcRect.Height() * 100 < 71 * DestRect.Height())
                {
                    PermutationVector.Set<FTAAStandaloneCS::FTAAScreenPercentageDim>(1);
                }
            }
    
            FTAAStandaloneCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FTAAStandaloneCS::FParameters>();
    
            // 设置通用的着色器参数.
            const FIntPoint InputExtent = Inputs.SceneColorInput->Desc.Extent;
            const FIntRect InputViewRect = Inputs.InputViewRect;
            const FIntRect OutputViewRect = Inputs.OutputViewRect;
    
            if (!IsTAAUpsamplingConfig(Inputs.Pass))
            {
                SetupSampleWeightParameters(PassParameters, Inputs, View.TemporalJitterPixels);
            }
    
            const float ResDivisor = Inputs.ResolutionDivisor;
            const float ResDivisorInv = 1.0f / ResDivisor;
    
            PassParameters->ViewUniformBuffer = View.ViewUniformBuffer;
            PassParameters->CurrentFrameWeight = CVarTemporalAACurrentFrameWeight.GetValueOnRenderThread();
            PassParameters->bCameraCut = bCameraCut;
    
            PassParameters->SceneDepthTexture = Inputs.SceneDepthTexture;
            PassParameters->GBufferVelocityTexture = Inputs.SceneVelocityTexture;
    
            PassParameters->SceneDepthTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
            PassParameters->GBufferVelocityTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
    
            PassParameters->StencilTexture = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateWithPixelFormat(Inputs.SceneDepthTexture, PF_X24_G8));
    
            // 速度缓冲.
            if (!PassParameters->GBufferVelocityTexture)
            {
                PassParameters->GBufferVelocityTexture = GraphBuilder.RegisterExternalTexture(GSystemTextures.BlackDummy);;
            }
    
            // 输入缓冲着色器参数.
            {
                PassParameters->InputSceneColorSize = FVector4(
                    InputExtent.X,
                    InputExtent.Y,
                    1.0f / float(InputExtent.X),
                    1.0f / float(InputExtent.Y));
                PassParameters->InputMinPixelCoord = PracticableSrcRect.Min;
                PassParameters->InputMaxPixelCoord = PracticableSrcRect.Max - FIntPoint(1, 1);
                PassParameters->InputSceneColor = Inputs.SceneColorInput;
                PassParameters->InputSceneColorSampler = TStaticSamplerState<SF_Point>::GetRHI();
                PassParameters->InputSceneMetadata = Inputs.SceneMetadataInput;
                PassParameters->InputSceneMetadataSampler = TStaticSamplerState<SF_Point>::GetRHI();
            }
    
            PassParameters->OutputViewportSize = FVector4(
                PracticableDestRect.Width(), PracticableDestRect.Height(), 1.0f / float(PracticableDestRect.Width()), 1.0f / float(PracticableDestRect.Height()));
            PassParameters->OutputViewportRect = FVector4(PracticableDestRect.Min.X, PracticableDestRect.Min.Y, PracticableDestRect.Max.X, PracticableDestRect.Max.Y);
            PassParameters->OutputQuantizationError = ComputePixelFormatQuantizationError(NewHistoryTexture[0]->Desc.Format);
    
            // 设置历史着色器参数.
            {
                FRDGTextureRef BlackDummy = GraphBuilder.RegisterExternalTexture(GSystemTextures.BlackDummy);
    
                if (bCameraCut)
                {
                    PassParameters->ScreenPosToHistoryBufferUV = FVector4(1.0f, 1.0f, 1.0f, 1.0f);
                    PassParameters->ScreenPosAbsMax = FVector2D(0.0f, 0.0f);
                    PassParameters->HistoryBufferUVMinMax = FVector4(0.0f, 0.0f, 0.0f, 0.0f);
                    PassParameters->HistoryBufferSize = FVector4(1.0f, 1.0f, 1.0f, 1.0f);
    
                    for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
                    {
                        PassParameters->HistoryBuffer[i] = BlackDummy;
                    }
    
                    // Remove dependency of the velocity buffer on camera cut, given it's going to be ignored by the shader.
                    PassParameters->GBufferVelocityTexture = BlackDummy;
                }
                else
                {
                    FIntPoint ReferenceViewportOffset = InputHistory.ViewportRect.Min;
                    FIntPoint ReferenceViewportExtent = InputHistory.ViewportRect.Size();
                    FIntPoint ReferenceBufferSize = InputHistory.ReferenceBufferSize;
    
                    float InvReferenceBufferSizeX = 1.f / float(InputHistory.ReferenceBufferSize.X);
                    float InvReferenceBufferSizeY = 1.f / float(InputHistory.ReferenceBufferSize.Y);
    
                    PassParameters->ScreenPosToHistoryBufferUV = FVector4(
                        ReferenceViewportExtent.X * 0.5f * InvReferenceBufferSizeX,
                        -ReferenceViewportExtent.Y * 0.5f * InvReferenceBufferSizeY,
                        (ReferenceViewportExtent.X * 0.5f + ReferenceViewportOffset.X) * InvReferenceBufferSizeX,
                        (ReferenceViewportExtent.Y * 0.5f + ReferenceViewportOffset.Y) * InvReferenceBufferSizeY);
    
                    FIntPoint ViewportOffset = ReferenceViewportOffset / Inputs.ResolutionDivisor;
                    FIntPoint ViewportExtent = FIntPoint::DivideAndRoundUp(ReferenceViewportExtent, Inputs.ResolutionDivisor);
                    FIntPoint BufferSize = ReferenceBufferSize / Inputs.ResolutionDivisor;
    
                    PassParameters->ScreenPosAbsMax = FVector2D(1.0f - 1.0f / float(ViewportExtent.X), 1.0f - 1.0f / float(ViewportExtent.Y));
    
                    float InvBufferSizeX = 1.f / float(BufferSize.X);
                    float InvBufferSizeY = 1.f / float(BufferSize.Y);
    
                    PassParameters->HistoryBufferUVMinMax = FVector4(
                        (ViewportOffset.X + 0.5f) * InvBufferSizeX,
                        (ViewportOffset.Y + 0.5f) * InvBufferSizeY,
                        (ViewportOffset.X + ViewportExtent.X - 0.5f) * InvBufferSizeX,
                        (ViewportOffset.Y + ViewportExtent.Y - 0.5f) * InvBufferSizeY);
    
                    PassParameters->HistoryBufferSize = FVector4(BufferSize.X, BufferSize.Y, InvBufferSizeX, InvBufferSizeY);
    
                    for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
                    {
                        if (InputHistory.RT[i].IsValid())
                        {
                            PassParameters->HistoryBuffer[i] = GraphBuilder.RegisterExternalTexture(InputHistory.RT[i]);
                        }
                        else
                        {
                            PassParameters->HistoryBuffer[i] = BlackDummy;
                        }
                    }
                }
    
                for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
                {
                    PassParameters->HistoryBufferSampler[i] = TStaticSamplerState<SF_Bilinear>::GetRHI();
                }
            }
    
            PassParameters->MaxViewportUVAndSvPositionToViewportUV = FVector4(
                (PracticableDestRect.Width() - 0.5f * ResDivisor) / float(PracticableDestRect.Width()),
                (PracticableDestRect.Height() - 0.5f * ResDivisor) / float(PracticableDestRect.Height()),
                ResDivisor / float(DestRect.Width()),
                ResDivisor / float(DestRect.Height()));
    
            PassParameters->HistoryPreExposureCorrection = View.PreExposure / View.PrevViewInfo.SceneColorPreExposure;
    
            {
                float InvSizeX = 1.0f / float(InputExtent.X);
                float InvSizeY = 1.0f / float(InputExtent.Y);
                PassParameters->ViewportUVToInputBufferUV = FVector4(
                    ResDivisorInv * InputViewRect.Width() * InvSizeX,
                    ResDivisorInv * InputViewRect.Height() * InvSizeY,
                    ResDivisorInv * InputViewRect.Min.X * InvSizeX,
                    ResDivisorInv * InputViewRect.Min.Y * InvSizeY);
            }
    
            PassParameters->EyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);
    
            // 时间上采样特定的参数.
            {
                float InputViewSizeInvScale = Inputs.ResolutionDivisor;
                float InputViewSizeScale = 1.0f / InputViewSizeInvScale;
    
                PassParameters->TemporalJitterPixels = InputViewSizeScale * View.TemporalJitterPixels;
                PassParameters->ScreenPercentage = float(InputViewRect.Width()) / float(OutputViewRect.Width());
                PassParameters->UpscaleFactor = float(OutputViewRect.Width()) / float(InputViewRect.Width());
                PassParameters->InputViewMin = InputViewSizeScale * FVector2D(InputViewRect.Min.X, InputViewRect.Min.Y);
                PassParameters->InputViewSize = FVector4(
                    InputViewSizeScale * InputViewRect.Width(), InputViewSizeScale * InputViewRect.Height(),
                    InputViewSizeInvScale / InputViewRect.Width(), InputViewSizeInvScale / InputViewRect.Height());
            }
    
            // UAVs
            {
                for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
                {
                    PassParameters->OutComputeTex[i] = GraphBuilder.CreateUAV(NewHistoryTexture[i]);
                }
    
                if (Outputs.DownsampledSceneColor)
                {
                    PassParameters->OutComputeTexDownsampled = GraphBuilder.CreateUAV(Outputs.DownsampledSceneColor);
                }
            }
    
            // Debug UAVs
            {
                FRDGTextureDesc DebugDesc = FRDGTextureDesc::Create2D(
                    OutputExtent,
                    PF_FloatRGBA,
                    FClearValueBinding::None,
                    /* InFlags = */ TexCreate_ShaderResource | TexCreate_UAV);
    
                FRDGTextureRef DebugTexture = GraphBuilder.CreateTexture(DebugDesc, TEXT("Debug.TAA"));
                PassParameters->DebugOutput = GraphBuilder.CreateUAV(DebugTexture);
            }
    
            TShaderMapRef<FTAAStandaloneCS> ComputeShader(View.ShaderMap, PermutationVector);
    
            ClearUnusedGraphResources(ComputeShader, PassParameters);
            for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
            {
                bUseHistoryTexture[i] = PassParameters->HistoryBuffer[i] != nullptr;
            }
    
            // 增加TAA处理CS通道.
            FComputeShaderUtils::AddPass(
                GraphBuilder,
                RDG_EVENT_NAME("TAA %s%s %dx%d -> %dx%d",
                    PassName, Inputs.bUseFast ? TEXT(" Fast") : TEXT(""),
                    PracticableSrcRect.Width(), PracticableSrcRect.Height(),
                    PracticableDestRect.Width(), PracticableDestRect.Height()),
                ComputeShader,
                PassParameters,
                FComputeShaderUtils::GetGroupCount(PracticableDestRect.Size(), GTemporalAATileSizeX));
        }
        
        // 处理历史输出数据.
        if (!View.bStatePrevViewInfoIsReadOnly)
        {
            OutputHistory->SafeRelease();
    
            for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
            {
                if (bUseHistoryTexture[i])
                {
                    GraphBuilder.QueueTextureExtraction(NewHistoryTexture[i], &OutputHistory->RT[i]);
                }
            }
    
            OutputHistory->ViewportRect = DestRect;
            OutputHistory->ReferenceBufferSize = OutputExtent * Inputs.ResolutionDivisor;
        }
    
        return Outputs;
    } // AddTemporalAAPass()
    

    下面分析FTAAStandaloneCS使用的CS shader:

    // Engine/Shaders/Private/TemporalAA/TAAStandalone.usf
    
    [numthreads(THREADGROUP_SIZEX, THREADGROUP_SIZEY, 1)]
    void MainCS(
        uint2 DispatchThreadId : SV_DispatchThreadID,
        uint2 GroupId : SV_GroupID,
        uint2 GroupThreadId : SV_GroupThreadID,
        uint GroupThreadIndex : SV_GroupIndex)
    {
        // 获取视口UV.
        float2 ViewportUV = (float2(DispatchThreadId) + 0.5f) * OutputViewportSize.zw;
        
        #if AA_LOWER_RESOLUTION
        {
            ViewportUV = (float2(DispatchThreadId) + 0.5f) * MaxViewportUVAndSvPositionToViewportUV.zw;
            ViewportUV = min(ViewportUV, MaxViewportUVAndSvPositionToViewportUV.xy);
        }
        #endif
    
        // 曝光缩放.
        float FrameExposureScale = EyeAdaptationLookup();
        FTAAHistoryPayload OutputPayload = TemporalAASample(GroupId, GroupThreadId, GroupThreadIndex, ViewportUV, FrameExposureScale);
    
        float4 OutColor0 = 0;
        float4 OutColor1 = 0;
    
        // 处理输出数据.
        #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC
        {
            OutColor0.rgb = OutputPayload.Color.rgb;
            OutColor0.a = OutputPayload.CocRadius;
        }
        #elif AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
        {
            OutColor0 = OutputPayload.Color;
            OutColor1.r = OutputPayload.CocRadius;
        }
        #else
        {
            OutColor0 = OutputPayload.Color;
        }
        #endif
    
        uint2 PixelPos = DispatchThreadId + OutputViewportRect.xy;
        if (all(PixelPos < OutputViewportRect.zw))
        {
            float4 FinalOutput0 = min(MaxHalfFloat.xxxx, OutColor0);
            // 随机量化采样.
            #if AA_ENABLE_STOCASTIC_QUANTIZATION
            {
                uint2 Random = Rand3DPCG16(int3(PixelPos, View.StateFrameIndexMod8)).xy;
                float2 E = Hammersley16(0, 1, Random);
    
                FinalOutput0.rgb += FinalOutput0.rgb * (E.x * OutputQuantizationError);
            }
            #endif
    
            // 存储最终的颜色输出.
            OutComputeTex_0[PixelPos] = FinalOutput0;
    
            #if HISTORY_RENDER_TARGETS == 2
                OutComputeTex_1[PixelPos] = OutColor1;
            #endif
        }
    
        // 下采样.
        #if TAA_DOWNSAMPLE
        {
            uint P0 = GroupThreadId.x + GroupThreadId.y * THREADGROUP_SIZEX;
            uint P1 = P0 + 1;
            uint P2 = P0 + THREADGROUP_SIZEX;
            uint P3 = P2 + 1;
    
            GroupSharedDownsampleArray[P0] = OutColor0;
    
            GroupMemoryBarrierWithGroupSync();
    
            if (((GroupThreadId.x | GroupThreadId.y) & 1) == 0)
            {
                OutComputeTexDownsampled[PixelPos / 2] =
                    (OutColor0 + GroupSharedDownsampleArray[P1] + GroupSharedDownsampleArray[P2] + GroupSharedDownsampleArray[P3]) * 0.25;
            }
        }
        #endif //TAA_DOWNSAMPLE
    }
    

    TAA的主要逻辑在TemporalAASample中:

    FTAAHistoryPayload TemporalAASample(uint2 GroupId, uint2 GroupThreadId, uint GroupThreadIndex, float2 ViewportUV, float FrameExposureScale)
    {
        // 设置TAA输入参数.
        FTAAInputParameters InputParams;
    
        // 预曝光.
        #if USE_PREEXPOSURE
            InputParams.FrameExposureScale = ToScalarMemory(FrameExposureScale * View.OneOverPreExposure);
        #else
            InputParams.FrameExposureScale = ToScalarMemory(FrameExposureScale);
        #endif
    
        // 逐像素设置.
        {
            InputParams.GroupId = GroupId;
            InputParams.GroupThreadId = GroupThreadId;
            InputParams.GroupThreadIndex = GroupThreadIndex;
            InputParams.ViewportUV = ViewportUV;
            InputParams.ScreenPos = ViewportUVToScreenPos(ViewportUV);
            InputParams.NearestBufferUV = ViewportUV * ViewportUVToInputBufferUV.xy + ViewportUVToInputBufferUV.zw;
    
            // 处理单个或多通道的响应AA(responsive AA).
            #if AA_SINGLE_PASS_RESPONSIVE
            {
                const uint kResponsiveStencilMask = 1 << 3;
                
                int2 SceneStencilUV = (int2)trunc(InputParams.NearestBufferUV * InputSceneColorSize.xy);
                uint SceneStencilRef = StencilTexture.Load(int3(SceneStencilUV, 0)) STENCIL_COMPONENT_SWIZZLE;
    
                InputParams.bIsResponsiveAAPixel = (SceneStencilRef & kResponsiveStencilMask) ? 1.f : 0.f;
            }
            #elif TAA_RESPONSIVE
                InputParams.bIsResponsiveAAPixel = 1.f;
            #else
                InputParams.bIsResponsiveAAPixel = 0.f;
            #endif
        
            // 处理上采样.
            #if AA_UPSAMPLE
            {
                // 像素原点坐标.
                float2 PPCo = ViewportUV * InputViewSize.xy + TemporalJitterPixels;
                // 像素中心坐标.
                float2 PPCk = floor(PPCo) + 0.5;
                // 像素左上角的中心坐标.
                float2 PPCt = floor(PPCo - 0.5) + 0.5;
            
                InputParams.NearestBufferUV = InputSceneColorSize.zw * (InputViewMin + PPCk);
                InputParams.NearestTopLeftBufferUV = InputSceneColorSize.zw * (InputViewMin + PPCt);
            }
            #endif
        }
    
        // 设置中间结果.
        FTAAIntermediaryResult IntermediaryResult = CreateIntermediaryResult();
    
        // 查找像素和最近相邻像素的运动向量.
        // ------------------------------------------------
        float3 PosN; // 本像素的位置, 但随后可能是最近的相邻像素.
        PosN.xy = InputParams.ScreenPos;
    
        PrecacheInputSceneDepth(InputParams);
        PosN.z = SampleCachedSceneDepthTexture(InputParams, int2(0, 0));
    
        // 最小深度的屏幕位置.
        float2 VelocityOffset = float2(0.0, 0.0);
        #if AA_CROSS // AA_CROSS: 深度X形交叉搜索时使用的像素偏移距离。
        {
            float4 Depths;
            // AA_CROSS默认是2.
            // 左上
            Depths.x = SampleCachedSceneDepthTexture(InputParams, int2(-AA_CROSS, -AA_CROSS));
            // 右上
            Depths.y = SampleCachedSceneDepthTexture(InputParams, int2( AA_CROSS, -AA_CROSS));
            // 左下
            Depths.z = SampleCachedSceneDepthTexture(InputParams, int2(-AA_CROSS,  AA_CROSS));
            // 右下
            Depths.w = SampleCachedSceneDepthTexture(InputParams, int2( AA_CROSS,  AA_CROSS));
    
            float2 DepthOffset = float2(AA_CROSS, AA_CROSS);
            float DepthOffsetXx = float(AA_CROSS);
            #if HAS_INVERTED_Z_BUFFER
                // Nearest depth is the largest depth (depth surface 0=far, 1=near).
                if(Depths.x > Depths.y) 
                {
                    DepthOffsetXx = -AA_CROSS;
                }
                if(Depths.z > Depths.w) 
                {
                    DepthOffset.x = -AA_CROSS;
                }
                float DepthsXY = max(Depths.x, Depths.y);
                float DepthsZW = max(Depths.z, Depths.w);
                if(DepthsXY > DepthsZW) 
                {
                    DepthOffset.y = -AA_CROSS;
                    DepthOffset.x = DepthOffsetXx; 
                }
                float DepthsXYZW = max(DepthsXY, DepthsZW);
                if(DepthsXYZW > PosN.z) 
                {
                    VelocityOffset = DepthOffset * InputSceneColorSize.zw;
    
                    PosN.z = DepthsXYZW;
                }
            #else // !HAS_INVERTED_Z_BUFFER
                #error Fix me!
            #endif // !HAS_INVERTED_Z_BUFFER
        }
        #endif    // AA_CROSS
    
        // 像素或最近像素的摄像机运动(在ScreenPos空间中).
        bool OffScreen = false;
        float Velocity = 0;
        float HistoryBlur = 0;
        float2 HistoryScreenPosition = InputParams.ScreenPos;
    
        #if 1
        {
            // 当前和上一帧裁剪数据.
            float4 ThisClip = float4( PosN.xy, PosN.z, 1 );
            float4 PrevClip = mul( ThisClip, View.ClipToPrevClip );
            float2 PrevScreen = PrevClip.xy / PrevClip.w;
            float2 BackN = PosN.xy - PrevScreen;
    
            float2 BackTemp = BackN * OutputViewportSize.xy;
    
            #if AA_DYNAMIC // 动态模糊.
            {
                float4 EncodedVelocity = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV + VelocityOffset, 0);
                bool DynamicN = EncodedVelocity.x > 0.0;
                if(DynamicN)
                {
                    BackN = DecodeVelocityFromTexture(EncodedVelocity).xy;
                }
                BackTemp = BackN * OutputViewportSize.xy;
            }
            #endif
    
            Velocity = sqrt(dot(BackTemp, BackTemp));
            #if !AA_BICUBIC
                // Save the amount of pixel offset of just camera motion, used later as the amount of blur introduced by history.
                float HistoryBlurAmp = 2.0;
                HistoryBlur = saturate(abs(BackTemp.x) * HistoryBlurAmp + abs(BackTemp.y) * HistoryBlurAmp);
            #endif
            // 当前像素对应的历史帧位置.
            HistoryScreenPosition = InputParams.ScreenPos - BackN;
    
            // 检测HistoryBufferUV是否在视口之外.
            OffScreen = max(abs(HistoryScreenPosition.x), abs(HistoryScreenPosition.y)) >= 1.0;
        }
        #endif
    
        // 缓存输入的颜色数据, 将它们加载到LDS中.
        PrecacheInputSceneColor(/* inout = */ InputParams);
    
        #if AA_UPSAMPLE_ADAPTIVE_FILTERING == 0
            // 过滤输入数据.
            FilterCurrentFrameInputSamples(
                InputParams,
                /* inout = */ IntermediaryResult);
        #endif
        
        // 计算邻域的包围盒.
        FTAAHistoryPayload NeighborMin;
        FTAAHistoryPayload NeighborMax;
    
        ComputeNeighborhoodBoundingbox(
            InputParams,
            /* inout = */ IntermediaryResult,
            NeighborMin, NeighborMax);
    
        // 采样历史数据.
        FTAAHistoryPayload History = SampleHistory(HistoryScreenPosition);
    
        // 是否需要忽略历史数据(历史数据在视口之外或突然出现).
        bool IgnoreHistory = OffScreen || bCameraCut;
    
        // 动态抗鬼影.
        // ---------------------
        #if AA_DYNAMIC_ANTIGHOST && AA_DYNAMIC && HISTORY_PAYLOAD_COMPONENTS == 3
        bool Dynamic4; // 判断这个点是不是运动的
        {
            #if !AA_DYNAMIC
                #error AA_DYNAMIC_ANTIGHOST requires AA_DYNAMIC
            #endif
            // 分别采样速度缓冲的上方(Dynamic1), 左方(Dynamic3), 自身(Dynamic4), 右方(Dynamic5), 下方(Dynamic7).
            bool Dynamic1 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2( 0, -1)).x > 0;
            bool Dynamic3 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2(-1,  0)).x > 0;
            Dynamic4 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0).x > 0;
            bool Dynamic5 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2( 1,  0)).x > 0;
            bool Dynamic7 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2( 0,  1)).x > 0;
    
            // 判断以上任意一点是否运动的
            bool Dynamic = Dynamic1 || Dynamic3 || Dynamic4 || Dynamic5 || Dynamic7;
            // 继续判断是否需要忽略历史数据(不运动且历史的alpha>0).
            IgnoreHistory = IgnoreHistory || (!Dynamic && History.Color.a > 0);
        }
        #endif
        
        // Clamp历史亮度之前先保存之.
        float LumaMin     = GetSceneColorLuma4(NeighborMin.Color);
        float LumaMax     = GetSceneColorLuma4(NeighborMax.Color);
        float LumaHistory = GetSceneColorLuma4(History.Color);
        
        FTAAHistoryPayload PreClampingHistoryColor = History;
        // Clamp历史数据.
        History = ClampHistory(IntermediaryResult, History, NeighborMin, NeighborMax);
        
        // 颜色Clamp之后过滤输入.
        #if AA_UPSAMPLE_ADAPTIVE_FILTERING == 1
        {
            #if AA_VARIANCE
                #error AA_VARIANCE and AA_UPSAMPLE_ADAPTIVE_FILTERING are not compatible because of circular code dependency.
            #endif
    
            // 忽略历史帧数据.
            if (IgnoreHistory)
            {
                IntermediaryResult.InvFilterScaleFactor = 0;
            }
    
            IntermediaryResult.InvFilterScaleFactor -= (Velocity * UpscaleFactor) * 0.1;
            IntermediaryResult.InvFilterScaleFactor = max(IntermediaryResult.InvFilterScaleFactor, ScreenPercentage);
    
            FilterCurrentFrameInputSamples(
                InputParams,
                /* inout = */ IntermediaryResult);
        }
        #endif
    
        // 重新添加锯齿以锐化
        // -------------------------------
        #if AA_FILTERED && !AA_BICUBIC
        {
            #if AA_UPSAMPLE
                #error Temporal upsample does not support sharpen.
            #endif
            
            // Blend in non-filtered based on the amount of sub-pixel motion.
            float AddAliasing = saturate(HistoryBlur) * 0.5;
            float LumaContrastFactor = 32.0;
            #if AA_YCOCG // TODO: Probably a bug arround here because using Luma4() even with YCOCG=0.
                // 1/4 as bright.
                LumaContrastFactor *= 4.0;
            #endif
            float LumaContrast = LumaMax - LumaMin;
            AddAliasing = saturate(AddAliasing + rcp(1.0 + LumaContrast * LumaContrastFactor));
            IntermediaryResult.Filtered.Color = lerp(IntermediaryResult.Filtered.Color, SampleCachedSceneColorTexture(InputParams, int2(0, 0)).Color, AddAliasing);
        }
        #endif
        
        // 计算混合因子.
        // --------------------
        float BlendFinal;
        {
            float LumaFiltered = GetSceneColorLuma4(IntermediaryResult.Filtered.Color);
    
            // CurrentFrameWeight是从c++传入的,默认为0.04f
            BlendFinal = IntermediaryResult.FilteredTemporalWeight * CurrentFrameWeight;
            // 根据速度进行插值,速度越大,则BlendFinal越大
            // 速度越大,历史帧越不可信
            BlendFinal = lerp(BlendFinal, 0.2, saturate(Velocity / 40));
    
            // 确保至少有一些小的贡献.
            BlendFinal = max( BlendFinal, saturate( 0.01 * LumaHistory / abs( LumaFiltered - LumaHistory ) ) );
    
            #if AA_NAN && (COMPILER_GLSL || COMPILER_METAL)
                // The current Metal & GLSL compilers don't handle saturate(NaN) -> 0, instead they return NaN/INF.
                BlendFinal = -min(-BlendFinal, 0.0);
            #endif
    
            // ResponsiveAA强制成新帧的1/4.
            BlendFinal = InputParams.bIsResponsiveAAPixel ? (1.0/4.0) : BlendFinal;
    
            #if AA_LERP 
                BlendFinal = 1.0/float(AA_LERP);
            #endif
        
            // 处理DOF.
            #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC || AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
            {
                float BilateralWeight = ComputeBilateralWeight(IntermediaryResult.Filtered.CocRadius, History.CocRadius);
    
                BlendFinal = lerp(1, BlendFinal, BilateralWeight);
            }
            #endif
            
            // 如果是镜头切换, 当前帧强制成1.
            if (bCameraCut)
            {
                BlendFinal = 1.0;
            }
        }
    
        // 忽略历史帧, 重置数据.
        if (IgnoreHistory)
        {
            // 历史帧等于滤波后的结果.
            History = IntermediaryResult.Filtered;
            #if HISTORY_PAYLOAD_COMPONENTS == 3
                History.Color.a = 0.0;
            #endif
        }
        
        // 最终在历史和过滤颜色之间混合
        // -------------------------------------------------
        // 亮度权重混合.
        float FilterWeight = GetSceneColorHdrWeight(InputParams, IntermediaryResult.Filtered.Color.x);
        float HistoryWeight = GetSceneColorHdrWeight(InputParams, History.Color.x);
    
        FTAAHistoryPayload OutputPayload;
        {
            // 计算带权重的插值.
            float2 Weights = WeightedLerpFactors(HistoryWeight, FilterWeight, BlendFinal);
            // 增加输出的历史负载数据, 会进行加权, 历史帧的alpha会乘以Weights.x系数下降.
            OutputPayload = AddPayload(MulPayload(History, Weights.x), MulPayload(IntermediaryResult.Filtered, Weights.y));
        }
    
        // 调整靠近1的Alpha, 0.995 < 0.996 = 254/255
        if (OutputPayload.Color.a > 0.995)
        {
            OutputPayload.Color.a = 1;
        }
    
        // 转换颜色回到线性空间.
        OutputPayload.Color = TransformBackToRawLinearSceneColor(OutputPayload.Color);
        
        #if AA_NAN // 非法数据.
            OutputPayload.Color = -min(-OutputPayload.Color, 0.0);
            OutputPayload.CocRadius = isnan(OutputPayload.CocRadius) ? 0.0 : OutputPayload.CocRadius;
        #endif
    
        #if HISTORY_PAYLOAD_COMPONENTS == 3
            #if  AA_DYNAMIC_ANTIGHOST && AA_DYNAMIC 
                // 如果这一帧是运动的话,那么alpha为1,写入历史帧.
                OutputPayload.Color.a = Dynamic4 ? 1 : 0;
            #else
                // 不运动或非动态, Alpha为0.
                OutputPayload.Color.a = 0;
            #endif
        #endif
    
        return OutputPayload;
    }
    
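    其中末尾用到的WeightedLerpFactors,其作用是把常规lerp的两个系数(1-Blend)和Blend分别乘以对应颜色的HDR权重后再归一化,使高亮(HDR)像素在混合时不会过度主导结果。可用如下C++示意理解(根据上文用法推断的演示代码,并非shader源码的逐字复制):

    #include <algorithm>
    #include <utility>

    std::pair<float, float> WeightedLerpFactors(float WeightA, float WeightB, float Blend)
    {
        float BlendA = (1.0f - Blend) * WeightA;   // 历史帧的原始系数乘以其HDR权重.
        float BlendB = Blend * WeightB;            // 当前帧的原始系数乘以其HDR权重.
        float RcpSum = 1.0f / std::max(BlendA + BlendB, 1e-6f); // 归一化, 防止除零.
        return { BlendA * RcpSum, BlendB * RcpSum };
    }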

    上面TAA的主流程中涉及到了很多个重要的接口调用,下面继续解析之:

    // 过滤当前帧的输入采样数据.
    void FilterCurrentFrameInputSamples(
        in FTAAInputParameters InputParams,
        inout FTAAIntermediaryResult IntermediaryResult)
    {
        (......)
    
        FTAAHistoryPayload Filtered;
    
        {
            // 上采样.
            #if AA_UPSAMPLE
                // Pixel coordinate of the center of output pixel O in the input viewport.
                float2 PPCo = InputParams.ViewportUV * InputViewSize.xy + TemporalJitterPixels;
    
                // Pixel coordinate of the center of the nearest input pixel K.
                float2 PPCk = floor(PPCo) + 0.5;
            
                // Vector in pixel between pixel K -> O.
                float2 dKO = PPCo - PPCk;
            #endif
            
            // 根据采样数量选择不同的卷积核.
            #if AA_SAMPLES == 9
                const uint SampleIndexes[9] = kSquareIndexes3x3;
            #elif AA_SAMPLES == 5 || AA_SAMPLES == 6
                const uint SampleIndexes[5] = kPlusIndexes3x3;
            #endif
    
            #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC || AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
                // Fetches center pixel's Coc for the bilateral filtering.
                float CenterCocRadius = SampleCachedSceneColorTexture(InputParams, int2(0, 0)).CocRadius;
            #endif
    
            // 计算邻居的HDR, 最终权重和颜色.
            float NeighborsHdrWeight = 0;
            float NeighborsFinalWeight = 0;
            float4 NeighborsColor = 0;
    
            UNROLL
            for (uint i = 0; i < AA_SAMPLES; i++)
            {
                // 从最近的输入像素获得样本偏移量.
                int2 SampleOffset;
                
                #if AA_UPSAMPLE && AA_SAMPLES == 6
                    if (i == 5)
                    {
                        SampleOffset = SignFastInt(dKO);
                    }
                    else
                #endif
                {
                    const uint SampleIndex = SampleIndexes[i];
                    SampleOffset = kOffsets3x3[SampleIndex];
                }
                float2 fSampleOffset = float2(SampleOffset);
                
                // When doing Coc bilateral, the center sample is accumulated last.
                #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC && 0
                    if (all(SampleOffset == 0) && (AA_SAMPLES != 6 || i != 5))
                    {
                        continue;
                    }
                #endif
    
                // 找出这个输入样本的空间权值.
                #if AA_UPSAMPLE
                    // 计算输出像素和输入像素I之间的像素增量.
                    // 注意: abs() 不必要, 因为后面会用dot(dPP, dPP).
                    float2 dPP = fSampleOffset - dKO;
    
                    float SampleSpatialWeight = ComputeSampleWeigth(IntermediaryResult, dPP);
    
                #elif AA_SAMPLES == 9
                    float SampleSpatialWeight = SampleWeights[i];
    
                #elif AA_SAMPLES == 5
                    float SampleSpatialWeight = PlusWeights[i];
    
                #else
                    #error Do not know how to compute filtering sample weight.
                #endif
    
                // 获取颜色采样.
                FTAASceneColorSample Sample = SampleCachedSceneColorTexture(InputParams, SampleOffset);
                    
                // 查找采样点的HDR权重.
                #if AA_TONE
                    float SampleHdrWeight = Sample.HdrWeight;
                #else
                    float SampleHdrWeight = 1;
                #endif
    
                // 根据有效负载求出样本的双边权重.
                #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC
                    float BilateralWeight = ComputeNeightborSampleBilateralWeight(CenterCocRadius, Sample.CocRadius);
    
                #else
                    float BilateralWeight = 1;
    
                #endif
    
                // 计算最终采样权重.
                float SampleFinalWeight = SampleSpatialWeight * SampleHdrWeight * BilateralWeight;
    
                // 应用权重到采样颜色中.
                NeighborsColor       += SampleFinalWeight * Sample.Color;
                NeighborsFinalWeight += SampleFinalWeight;
    
                NeighborsHdrWeight   += SampleSpatialWeight * SampleHdrWeight;
            }
            
            (......)
        }
    
        IntermediaryResult.Filtered = Filtered;
    }
    
    // 计算用于拒绝历史记录的邻域包围盒.
    void ComputeNeighborhoodBoundingbox(
        in FTAAInputParameters InputParams,
        in FTAAIntermediaryResult IntermediaryResult,
        out FTAAHistoryPayload OutNeighborMin,
        out FTAAHistoryPayload OutNeighborMax)
    {
        // 相邻像素的数据.
        FTAAHistoryPayload Neighbors[kNeighborsCount];
        UNROLL
        for (uint i = 0; i < kNeighborsCount; i++)
        {
            Neighbors[i].Color = SampleCachedSceneColorTexture(InputParams, kOffsets3x3[i]).Color;
            Neighbors[i].CocRadius = SampleCachedSceneColorTexture(InputParams, kOffsets3x3[i]).CocRadius;
        }
    
        FTAAHistoryPayload NeighborMin;
        FTAAHistoryPayload NeighborMax;
    
        #if AA_HISTORY_CLAMPING_BOX == HISTORY_CLAMPING_BOX_VARIANCE 
        // 这个就是NVIDIA版本的Variance Clipping.
        {
            #if AA_SAMPLES == 9
                const uint SampleIndexes[9] = kSquareIndexes3x3;
            #elif AA_SAMPLES == 5
                const uint SampleIndexes[5] = kPlusIndexes3x3;
            #else
                #error Unknown number of samples.
            #endif
    
            // 计算当前像素的矩(moment).
            float4 m1 = 0;
            float4 m2 = 0;
            for( uint i = 0; i < AA_SAMPLES; i++ )
            {
                float4 SampleColor = Neighbors[ SampleIndexes[i] ].Color;
    
                m1 += SampleColor;
                m2 += Pow2( SampleColor );
            }
    
            m1 *= (1.0 / AA_SAMPLES);
            m2 *= (1.0 / AA_SAMPLES);
    
            // 标准方差.
            float4 StdDev = sqrt( abs(m2 - m1 * m1) );
            // 邻居的最大最小值.
            NeighborMin.Color = m1 - 1.25 * StdDev;
            NeighborMax.Color = m1 + 1.25 * StdDev;
    
            // 跟输入的过滤数据做比较, 找出最大最小值.
            NeighborMin.Color = min( NeighborMin.Color, IntermediaryResult.Filtered.Color );
            NeighborMax.Color = max( NeighborMax.Color, IntermediaryResult.Filtered.Color );
        }
        #elif AA_HISTORY_CLAMPING_BOX == HISTORY_CLAMPING_BOX_SAMPLE_DISTANCE
        // 只在某个半径内执行颜色裁剪.
        {
            float2 PPCo = InputParams.ViewportUV * InputViewSize.xy + TemporalJitterPixels;
            float2 PPCk = floor(PPCo) + 0.5;
            float2 dKO = PPCo - PPCk;
            
            // 总是考虑4个样本.
            NeighborMin = Neighbors[4];
            NeighborMax = Neighbors[4];
            
            // 随着upscale因子的增大而减小距离阈值, 以减少鬼影.
            float DistthresholdLerp = UpscaleFactor - 1;
            float DistThreshold = lerp(1.51, 1.3, DistthresholdLerp);
    
            #if AA_SAMPLES == 9
                const uint Indexes[9] = kSquareIndexes3x3;
            #else
                const uint Indexes[5] = kPlusIndexes3x3;
            #endif
    
            // 计算所有样本的最大最小值.
            UNROLL
            for( uint i = 0; i < AA_SAMPLES; i++ )
            {
                uint NeightborId = Indexes[i];
                if (NeightborId != 4)
                {
                    float2 dPP = float2(kOffsets3x3[NeightborId]) - dKO;
    
                    FLATTEN
                    if (dot(dPP, dPP) < (DistThreshold * DistThreshold))
                    {
                        NeighborMin = MinPayload(NeighborMin, Neighbors[NeightborId]);
                        NeighborMax = MaxPayload(NeighborMax, Neighbors[NeightborId]);
                    }
                }
            }
        }
        #elif AA_HISTORY_CLAMPING_BOX == HISTORY_CLAMPING_BOX_MIN_MAX
        // 用最大最小包围盒来裁剪, 是默认的方式.
        {
            NeighborMin = MinPayload3( Neighbors[1], Neighbors[3], Neighbors[4] );
            NeighborMin = MinPayload3( NeighborMin,  Neighbors[5], Neighbors[7] );
    
            NeighborMax = MaxPayload3( Neighbors[1], Neighbors[3], Neighbors[4] );
            NeighborMax = MaxPayload3( NeighborMax,  Neighbors[5], Neighbors[7] );
            
            #if AA_SAMPLES == 6
            {
                float2 PPCo = InputParams.ViewportUV * InputViewSize.xy + TemporalJitterPixels;
                float2 PPCk = floor(PPCo) + 0.5;
                float2 dKO = PPCo - PPCk;
                
                int2 FifthNeighborOffset = SignFastInt(dKO);
    
                FTAAHistoryPayload FifthNeighbor;
                FifthNeighbor.Color = SampleCachedSceneColorTexture(InputParams, FifthNeighborOffset).Color;
                FifthNeighbor.CocRadius = SampleCachedSceneColorTexture(InputParams, FifthNeighborOffset).CocRadius;
                
                NeighborMin = MinPayload(NeighborMin, FifthNeighbor);
                NeighborMax = MaxPayload(NeighborMax, FifthNeighbor);
            }
            #elif AA_SAMPLES == 9
            {
                FTAAHistoryPayload NeighborMinPlus = NeighborMin;
                FTAAHistoryPayload NeighborMaxPlus = NeighborMax;
    
                NeighborMin = MinPayload3( NeighborMin, Neighbors[0], Neighbors[2] );
                NeighborMin = MinPayload3( NeighborMin, Neighbors[6], Neighbors[8] );
    
                NeighborMax = MaxPayload3( NeighborMax, Neighbors[0], Neighbors[2] );
                NeighborMax = MaxPayload3( NeighborMax, Neighbors[6], Neighbors[8] );
    
                if( AA_ROUND )
                {
                    NeighborMin = AddPayload(MulPayload(NeighborMin, 0.5), MulPayload(NeighborMinPlus, 0.5));
                    NeighborMax = AddPayload(MulPayload(NeighborMax, 0.5), MulPayload(NeighborMaxPlus, 0.5));
                }
            }
            #endif
        }
        #else
            #error Unknown history clamping box.
        #endif
    
        OutNeighborMin = NeighborMin;
        OutNeighborMax = NeighborMax;
    }
    
    // 采样历史数据.
    FTAAHistoryPayload SampleHistory(in float2 HistoryScreenPosition)
    {
        float4 RawHistory0 = 0;
        float4 RawHistory1 = 0;
    
        #if AA_BICUBIC // 用Catmull-Rom曲线采样历史数据, 以减少运动模糊.(默认使用)
        {
            float2 HistoryBufferUV = HistoryScreenPosition * ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
    
            // 裁剪HistoryBufferUV,避免对额外样本的计算.
            #if AA_MANUALLY_CLAMP_HISTORY_UV
                HistoryBufferUV = clamp(HistoryBufferUV, HistoryBufferUVMinMax.xy, HistoryBufferUVMinMax.zw);
            #endif
    
            FCatmullRomSamples Samples = GetBicubic2DCatmullRomSamples(HistoryBufferUV, HistoryBufferSize.xy, HistoryBufferSize.zw);
            for (uint i = 0; i < Samples.Count; i++)
            {
                float2 SampleUV = Samples.UV[i];
    
                // 裁剪SampleUV在HistoryBufferUVMinMax内, 避免取样潜在NaN跑到视图区域之外.
                // 可能消耗很大,但Samples.UVDir实际上是编译期常数。
                if (AA_MANUALLY_CLAMP_HISTORY_UV)
                {
                    if (Samples.UVDir[i].x < 0)
                    {
                        SampleUV.x = max(SampleUV.x, HistoryBufferUVMinMax.x);
                    }
                    else if (Samples.UVDir[i].x > 0)
                    {
                        SampleUV.x = min(SampleUV.x, HistoryBufferUVMinMax.z);
                    }
    
                    if (Samples.UVDir[i].y < 0)
                    {
                        SampleUV.y = max(SampleUV.y, HistoryBufferUVMinMax.y);
                    }
                    else if (Samples.UVDir[i].y > 0)
                    {
                        SampleUV.y = min(SampleUV.y, HistoryBufferUVMinMax.w);
                    }
                }
    
                RawHistory0 += HistoryBuffer_0.SampleLevel(HistoryBufferSampler_0, SampleUV, 0) * Samples.Weight[i];
            }
            RawHistory0 *= Samples.FinalMultiplier;
        }
        // 双线性采样历史数据.
        #else
        {
            // Clamp HistoryScreenPosition to be within viewport.
            if (AA_MANUALLY_CLAMP_HISTORY_UV)
            {
                HistoryScreenPosition = clamp(HistoryScreenPosition, -ScreenPosAbsMax, ScreenPosAbsMax);
            }
    
            float2 HistoryBufferUV = HistoryScreenPosition * ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
    
            RawHistory0 = HistoryBuffer_0.SampleLevel(HistoryBufferSampler_0, HistoryBufferUV, 0);
        }
        #endif
    
        #if HISTORY_RENDER_TARGETS == 2
        {
            if (AA_MANUALLY_CLAMP_HISTORY_UV)
            {
                HistoryScreenPosition = clamp(HistoryScreenPosition, -ScreenPosAbsMax, ScreenPosAbsMax);
            }
    
            float2 HistoryBufferUV = HistoryScreenPosition * ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
    
            RawHistory1 = HistoryBuffer_1.SampleLevel(HistoryBufferSampler_1, HistoryBufferUV, 0);
        }
        #endif
        
        // 处理和保存历史数据的结果.
        FTAAHistoryPayload HistoryPayload;
        HistoryPayload.Color = RawHistory0;
    
        #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
            HistoryPayload.CocRadius = RawHistory1.r;
        #else
            HistoryPayload.CocRadius = RawHistory0.a;
        #endif
    
        #if USE_PREEXPOSURE
            HistoryPayload.Color.rgb *= HistoryPreExposureCorrection;
        #endif
    
        HistoryPayload.Color = TransformSceneColor(HistoryPayload.Color);
    
        return HistoryPayload;
    }
    
    // 裁剪历史数据.
    FTAAHistoryPayload ClampHistory(inout FTAAIntermediaryResult IntermediaryResult, FTAAHistoryPayload History, FTAAHistoryPayload NeighborMin, FTAAHistoryPayload NeighborMax)
    {
        #if !AA_CLAMP
            return History;
        
        #elif AA_CLIP // 使用更紧的AABB裁剪历史数据.
            // 裁剪历史, 使用颜色AABB相交获得更紧的结果(裁剪只作用于颜色分量).
            float4 TargetColor = IntermediaryResult.Filtered.Color;
    
            // 历史裁剪.
            float ClipBlend = HistoryClip( History.Color.rgb, TargetColor.rgb, NeighborMin.Color.rgb, NeighborMax.Color.rgb );
            // 裁剪到0~1.
            ClipBlend = saturate( ClipBlend );
    
            // 根据混合权重插值历史和目标颜色.
            History.Color = lerp( History.Color, TargetColor, ClipBlend );
    
            #if AA_FORCE_ALPHA_CLAMP
                History.Color.a = clamp( History.Color.a, NeighborMin.Color.a, NeighborMax.Color.a );
            #endif
    
            return History;
    
        #else //!AA_CLIP, 使用Neighborhood clamping(邻域裁剪).
            History.Color = clamp(History.Color, NeighborMin.Color, NeighborMax.Color);
            History.CocRadius = clamp(History.CocRadius, NeighborMin.CocRadius, NeighborMax.CocRadius);
            return History;
        #endif
    }
    

    In addition, the weighted interpolation between the current and history frames, WeightedLerpFactors, deserves a closer look:

    // EngineShadersPrivateTemporalAATAACommon.ush
    
    taa_half2 WeightedLerpFactors(taa_half WeightA, taa_half WeightB, taa_half Blend)
    {
        // 先插值获得带权重的A和B.
        taa_half BlendA = (taa_half(1.0) - Blend) * WeightA;
        taa_half BlendB = Blend * WeightB;
        // 计算它们和的倒数.
        taa_half RcpBlend = SafeRcp(BlendA + BlendB);
        // 用它们和的倒数归一化.
        BlendA *= RcpBlend;
        BlendB *= RcpBlend;
        // 输出结果.
        return taa_half2(BlendA, BlendB);
    }
    

    Unlike plain linear interpolation, this weighted interpolation factors in the weights of A and B and divides by their sum to normalize the result.

    Expressed as a formula, the interpolated result \({WeightA}'\) is:

    \[ {WeightA}' = \cfrac{(1.0-\text{Blend}) \cdot \text{WeightA}}{(1.0-\text{Blend}) \cdot \text{WeightA} + \text{Blend} \cdot \text{WeightB}} \]

    and the interpolated result \({WeightB}'\) is:

    \[ {WeightB}' = \cfrac{\text{Blend} \cdot \text{WeightB}}{(1.0-\text{Blend}) \cdot \text{WeightA} + \text{Blend} \cdot \text{WeightB}} \]
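
    As a quick sanity check, assume Blend = 0.5, WeightA = 4 (a bright history sample) and WeightB = 1: then BlendA = 0.5 · 4 = 2 and BlendB = 0.5 · 1 = 0.5, and after dividing by their sum the function returns (0.8, 0.2). The two outputs always sum to 1, so the input with the larger HDR weight contributes proportionally more to the blend.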

    The comparison below shows no AA, FXAA, and TAA:

    Do not be misled by static comparison shots, though: in motion TAA still has quite a few problems, such as graininess on thin objects, overall softening of the image, artifacts on pixels that suddenly appear or disappear, and a small amount of latency.

    Even so, given current hardware performance, TAA remains the first choice for anti-aliasing in a deferred pipeline, and some hacks can mitigate the artifacts above.
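
    As a starting point for such hacks, a few relevant console variables are listed below (4.26-era names; treat the exact names and values as assumptions to verify in your build):

    r.TemporalAASamples 8                 // number of jittered sub-pixel sample positions
    r.TemporalAACurrentFrameWeight 0.2    // higher = less ghosting/blur, but more visible jitter
    r.TemporalAAFilterSize 1.0            // spatial kernel size used when filtering the current frame
    r.Tonemapper.Sharpen 0.5              // mild sharpening to counteract TAA softening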

    7.4.6 SSR

    SSR stands for Screen Space Reflections, a technique that computes reflections on glossy surfaces in screen space.

    An overview of UE's SSR effect.

    SSR differs from cubemap and planar reflections; both its quality and its cost sit between the two:

    | Reflection type | Quality | Cost | Description |
    | :-- | :-- | :-- | :-- |
    | Planar Reflections | High, dynamic | High | Renders the scene an extra time with a mirrored camera |
    | Screen Space Reflections | Medium, dynamic | Medium | Works from screen-space data; has hidden-geometry and edge-cutoff issues |
    | Cubemap Reflections | Low, static | Low | Pre-generated; only suitable for reflections of static objects |

    The core idea of SSR is to reuse data that already exists in screen space:

    The core algorithm performs the following steps for every pixel (see the minimal sketch after this list):

    • Compute the reflection ray.
    • Trace along the reflection direction (the depth buffer can be used).
    • Use the color at the intersection point as the reflection color.
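
    To make these steps concrete, below is a minimal fixed-step ray-march sketch in HLSL (the simplest of the tracing options listed later, not UE's HZB-based implementation); ProjectToScreen and GetDeviceZ are assumed helpers that project a point to (UV, device Z) and sample the depth buffer:

    // Illustrative fixed-step screen-space ray march (not UE's code).
    bool TraceScreenSpaceRay(float3 RayOrigin, float3 RayDir, uint NumSteps, float StepSize, out float2 OutHitUV)
    {
        OutHitUV = 0;
        for (uint i = 1; i <= NumSteps; ++i)
        {
            float3 P = RayOrigin + RayDir * (StepSize * i); // 1. march along the reflection ray
            float3 UVz = ProjectToScreen(P);                //    project the sample to screen space
            if (any(UVz.xy < 0.0) || any(UVz.xy > 1.0))
            {
                return false;                               //    the ray left the screen: nothing to reuse
            }
            float SceneZ = GetDeviceZ(UVz.xy);              // 2. the depth buffer stands in for the scene
            if (UVz.z < SceneZ)                             //    reversed-Z: the ray point fell behind the surface
            {
                OutHitUV = UVz.xy;                          // 3. scene color at the hit becomes the reflection
                return true;
            }
        }
        return false;
    }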


    The steps of SSR and the screen-space data it depends on: scene color, normals, depth, and a mask.

    When tracing along the reflection direction for an intersection, the available methods are:

    • Fixed-step ray march: the simplest to implement, but inefficient, requiring many steps.
    • Distance fields: must be pre-generated; can effectively reduce the number of steps.
    • Mesh/BVH: needs branching, complex code and data structures, and cache-incoherent memory access.
    • Voxels: must be pre-generated, memory-hungry, and only viable on high-end devices.
    • Hi-Z buffer (depth mip-map): GPU-friendly, though it cannot cover space perfectly. (See below.)

    Before tracing against the Hi-Z buffer, a depth mip chain must be generated with min/max reductions:
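
    A sketch of one downsample step of such a chain follows (a compute shader reducing each 2×2 parent footprint to one child texel; UE's actual pass lives in HZB.usf and additionally handles odd sizes and min/max variants):

    // Illustrative HZB mip build step (not UE's HZB.usf). Each child texel keeps the
    // closest depth of its 2x2 parent footprint, so a ray still in front of that depth
    // can safely skip the whole region. Under reversed-Z, "closest" is the maximum
    // device z; use min instead to build the farthest-depth chain.
    Texture2D<float>   ParentMip;
    RWTexture2D<float> ChildMip;

    [numthreads(8, 8, 1)]
    void BuildHZBMipCS(uint2 DTid : SV_DispatchThreadID)
    {
        uint2 Base = DTid * 2;
        float4 D = float4(
            ParentMip[Base + uint2(0, 0)],
            ParentMip[Base + uint2(1, 0)],
            ParentMip[Base + uint2(0, 1)],
            ParentMip[Base + uint2(1, 1)]);
        ChildMip[DTid] = max(max(D.x, D.y), max(D.z, D.w));
    }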

    For the reflection directions of rough (blurry) surfaces, importance sampling can be used (a Halton sequence combined with the BRDF):

    In an implementation, intersections can be reused across neighboring rays whose weights qualify:

    How to decide whether a neighbor's ray is usable? The figure below shows one such weighting scheme:

    SSR can also be computed at half resolution, using a TAA-style jitter to approximate multi-sampling:

    On the tracing side, sparse ray tracing is a further optimization: it decouples ray traversal from the color resolve, traces only at half resolution, and promotes the color resolve back to full resolution, still resolving from the 4 neighbors:

    When generating importance-sampled rays, they should be filtered by treating each ray as a cone, so that a wider range of intersections is gathered:

    This in turn requires downsampling the scene color into a mipmap, to model the footprint that reflection intersections cover for different surface roughnesses.

    Because SSR is based on screen-space data (effectively a sheet of paper covering the front of the view frustum), it suffers from the hidden-geometry problem:

    An SSR artifact: note the reflection of the index finger, which is clearly cut off.

    There is also the edge cutoff problem:

    Both issues can be alleviated with an edge fade:

    SSR is only usable in the deferred rendering pipeline, because it relies on GBuffer data plus mipmaps of scene color and depth. Its place in the pipeline is illustrated below:

    The internal rendering flow of SSR is illustrated as follows:

    Now let us turn to UE's implementation, starting with the entry point and logic of SSR rendering:

    void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
    {
        (......)
        
        // 渲染光源.
        RenderLights(GraphBuilder, ...);
        
        (......)
        
        // 渲染反射(包含SSR)和天空光.
        RenderDeferredReflectionsAndSkyLighting(GraphBuilder, ...);
        
        (......)
    }
    
    void FDeferredShadingSceneRenderer::RenderDeferredReflectionsAndSkyLighting(
        FRDGBuilder& GraphBuilder,
        TRDGUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBuffer,
        FRDGTextureMSAA SceneColorTexture,
        FRDGTextureRef DynamicBentNormalAOTexture,
        FRDGTextureRef VelocityTexture,
        FHairStrandsRenderingData* HairDatas)
    {
        (......)
    
        for (FViewInfo& View : Views)
        {  
            (......)
    
            // 处理SSR.
            else if (bScreenSpaceReflections)
            {
                bDenoise = DenoiserMode != 0 && CVarDenoiseSSR.GetValueOnRenderThread();
                bTemporalFilter = !bDenoise && View.ViewState && IsSSRTemporalPassRequired(View);
    
                ESSRQuality SSRQuality;
                GetSSRQualityForView(View, &SSRQuality, &DenoiserConfig);
    
                RDG_EVENT_SCOPE(GraphBuilder, "ScreenSpaceReflections(Quality=%d)", int32(SSRQuality));
    
                // 渲染SSR.
                RenderScreenSpaceReflections(GraphBuilder, SceneTextures, SceneColorTexture.Resolve, View, SSRQuality, bDenoise, &DenoiserInputs);
            }
                
        (......)
    }
    
    // EngineSourceRuntimeRendererPrivateScreenSpaceRayTracing.cpp
        
    void RenderScreenSpaceReflections(
        FRDGBuilder& GraphBuilder,
        const FSceneTextureParameters& SceneTextures,
        const FRDGTextureRef CurrentSceneColor,
        const FViewInfo& View,
        ESSRQuality SSRQuality,
        bool bDenoiser,
        IScreenSpaceDenoiser::FReflectionsInputs* DenoiserInputs,
        FTiledScreenSpaceReflection* TiledScreenSpaceReflection)
    {
        // 处理输入纹理.
        FRDGTextureRef InputColor = CurrentSceneColor;
        if (SSRQuality != ESSRQuality::VisualizeSSR)
        {
            if (View.PrevViewInfo.CustomSSRInput.IsValid())
            {
                InputColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.CustomSSRInput);
            }
            else if (GSSRHalfResSceneColor && View.PrevViewInfo.HalfResTemporalAAHistory.IsValid())
            {
                InputColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.HalfResTemporalAAHistory);
            }
            else if (View.PrevViewInfo.TemporalAAHistory.IsValid())
            {
                InputColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.TemporalAAHistory.RT[0]);
            }
        }
    
        const bool SSRStencilPrePass = CVarSSRStencil.GetValueOnRenderThread() != 0 && SSRQuality != ESSRQuality::VisualizeSSR && TiledScreenSpaceReflection == nullptr;
        
        // 为降噪分配输入.
        {
            FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
                FSceneRenderTargets::Get_FrameConstantsOnly().GetBufferSizeXY(),
                PF_FloatRGBA, FClearValueBinding(FLinearColor(0, 0, 0, 0)),
                TexCreate_RenderTargetable | TexCreate_ShaderResource | TexCreate_UAV);
    
            Desc.Flags |= GFastVRamConfig.SSR;
    
            DenoiserInputs->Color = GraphBuilder.CreateTexture(Desc, TEXT("ScreenSpaceReflections"));
    
            if (bDenoiser)
            {
                Desc.Format = PF_R16F;
                DenoiserInputs->RayHitDistance = GraphBuilder.CreateTexture(Desc, TEXT("ScreenSpaceReflectionsHitDistance"));
            }
        }
    
        IScreenSpaceDenoiser::FReflectionsRayTracingConfig RayTracingConfigs;
        GetSSRShaderOptionsForQuality(SSRQuality, &RayTracingConfigs);
            
        // SSR通用shader参数.
        FSSRCommonParameters CommonParameters;
        CommonParameters.SSRParams = ComputeSSRParams(View, SSRQuality, false);
        CommonParameters.ViewUniformBuffer = View.ViewUniformBuffer;
        CommonParameters.SceneTextures = SceneTextures;
    
        if (InputColor == CurrentSceneColor || !CommonParameters.SceneTextures.GBufferVelocityTexture)
        {
            CommonParameters.SceneTextures.GBufferVelocityTexture = GraphBuilder.RegisterExternalTexture(GSystemTextures.MidGreyDummy);
        }
        
        FRenderTargetBindingSlots RenderTargets;
        RenderTargets[0] = FRenderTargetBinding(DenoiserInputs->Color, ERenderTargetLoadAction::ENoAction);
    
        if (bDenoiser)
        {
            RenderTargets[1] = FRenderTargetBinding(DenoiserInputs->RayHitDistance, ERenderTargetLoadAction::ENoAction);
        }
    
        // SSR的模板缓冲Pass.
        if (SSRStencilPrePass)
        {
            // 绑定深度缓冲.
            RenderTargets.DepthStencil = FDepthStencilBinding(
                SceneTextures.SceneDepthTexture,
                ERenderTargetLoadAction::ENoAction,
                ERenderTargetLoadAction::ELoad,
                FExclusiveDepthStencil::DepthNop_StencilWrite);
    
            FScreenSpaceReflectionsStencilPS::FPermutationDomain PermutationVector;
            PermutationVector.Set<FSSROutputForDenoiser>(bDenoiser);
    
            FScreenSpaceReflectionsStencilPS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenSpaceReflectionsStencilPS::FParameters>();
            PassParameters->CommonParameters = CommonParameters;
            PassParameters->RenderTargets = RenderTargets;
            
            TShaderMapRef<FScreenSpaceReflectionsStencilPS> PixelShader(View.ShaderMap, PermutationVector);
            ClearUnusedGraphResources(PixelShader, PassParameters);
            
            // SSR模板Pass.
            GraphBuilder.AddPass(
                RDG_EVENT_NAME("SSR StencilSetup %dx%d", View.ViewRect.Width(), View.ViewRect.Height()),
                PassParameters,
                ERDGPassFlags::Raster,
                [PassParameters, &View, PixelShader](FRHICommandList& RHICmdList)
            {
                SCOPED_GPU_STAT(RHICmdList, ScreenSpaceReflections);
                RHICmdList.SetViewport(View.ViewRect.Min.X, View.ViewRect.Min.Y, 0.0f, View.ViewRect.Max.X, View.ViewRect.Max.Y, 1.0f);
            
                FGraphicsPipelineStateInitializer GraphicsPSOInit;
                FPixelShaderUtils::InitFullscreenPipelineState(RHICmdList, View.ShaderMap, PixelShader, /* out */ GraphicsPSOInit);
                // Clobers the stencil to pixel that should not compute SSR
                GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always, true, CF_Always, SO_Replace, SO_Replace, SO_Replace>::GetRHI();
    
                SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
                SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
    
                RHICmdList.SetStencilRef(0x80);
    
                FPixelShaderUtils::DrawFullscreenTriangle(RHICmdList);
            });
        }
    
        // 增加SSR pass.
        auto SetSSRParameters = [&](auto* PassParameters)
        {
            {
                const FVector2D HZBUvFactor(
                    float(View.ViewRect.Width()) / float(2 * View.HZBMipmap0Size.X),
                    float(View.ViewRect.Height()) / float(2 * View.HZBMipmap0Size.Y));
                PassParameters->HZBUvFactorAndInvFactor = FVector4(
                    HZBUvFactor.X,
                    HZBUvFactor.Y,
                    1.0f / HZBUvFactor.X,
                    1.0f / HZBUvFactor.Y);
            }
            {
                FIntPoint ViewportOffset = View.ViewRect.Min;
                FIntPoint ViewportExtent = View.ViewRect.Size();
                FIntPoint BufferSize = SceneTextures.SceneDepthTexture->Desc.Extent;
    
                if (View.PrevViewInfo.TemporalAAHistory.IsValid())
                {
                    ViewportOffset = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Min;
                    ViewportExtent = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Size();
                    BufferSize = View.PrevViewInfo.TemporalAAHistory.ReferenceBufferSize;
                    ensure(ViewportExtent.X > 0 && ViewportExtent.Y > 0);
                    ensure(BufferSize.X > 0 && BufferSize.Y > 0);
                }
    
                FVector2D InvBufferSize(1.0f / float(BufferSize.X), 1.0f / float(BufferSize.Y));
    
                PassParameters->PrevScreenPositionScaleBias = FVector4(
                    ViewportExtent.X * 0.5f * InvBufferSize.X,
                    -ViewportExtent.Y * 0.5f * InvBufferSize.Y,
                    (ViewportExtent.X * 0.5f + ViewportOffset.X) * InvBufferSize.X,
                    (ViewportExtent.Y * 0.5f + ViewportOffset.Y) * InvBufferSize.Y);
    
                PassParameters->ScreenSpaceRayTracingDebugOutput = CreateScreenSpaceRayTracingDebugUAV(GraphBuilder, DenoiserInputs->Color->Desc, TEXT("DebugSSR"), true);
            }
            PassParameters->PrevSceneColorPreExposureCorrection = InputColor != CurrentSceneColor ? View.PreExposure / View.PrevViewInfo.SceneColorPreExposure : 1.0f;
            
            PassParameters->SceneColor = InputColor;
            PassParameters->SceneColorSampler = GSSRHalfResSceneColor ? TStaticSamplerState<SF_Bilinear>::GetRHI() : TStaticSamplerState<SF_Point>::GetRHI();
            
            PassParameters->HZB = GraphBuilder.RegisterExternalTexture(View.HZB);
            PassParameters->HZBSampler = TStaticSamplerState<SF_Point>::GetRHI();
        };
    
        // SSR的PS参数.
        FScreenSpaceReflectionsPS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FSSRQualityDim>(SSRQuality);
        PermutationVector.Set<FSSROutputForDenoiser>(bDenoiser);
            
        FScreenSpaceReflectionsPS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenSpaceReflectionsPS::FParameters>();
        PassParameters->CommonParameters = CommonParameters;
        SetSSRParameters(&PassParameters->SSRPassCommonParameter);
        PassParameters->RenderTargets = RenderTargets;
    
        TShaderMapRef<FScreenSpaceReflectionsPS> PixelShader(View.ShaderMap, PermutationVector);
    
        if (TiledScreenSpaceReflection == nullptr) // 非分块SSR(PC默认方式).
        {
            ClearUnusedGraphResources(PixelShader, PassParameters);
            
            // 增加SSR RayMarch通道.
            GraphBuilder.AddPass(
                RDG_EVENT_NAME("SSR RayMarch(Quality=%d RayPerPixel=%d%s) %dx%d",
                    SSRQuality, RayTracingConfigs.RayCountPerPixel, bDenoiser ? TEXT(" DenoiserOutput") : TEXT(""),
                    View.ViewRect.Width(), View.ViewRect.Height()),
                PassParameters,
                ERDGPassFlags::Raster,
                [PassParameters, &View, PixelShader, SSRStencilPrePass](FRHICommandList& RHICmdList)
            {
                SCOPED_GPU_STAT(RHICmdList, ScreenSpaceReflections);
                RHICmdList.SetViewport(View.ViewRect.Min.X, View.ViewRect.Min.Y, 0.0f, View.ViewRect.Max.X, View.ViewRect.Max.Y, 1.0f);
            
                FGraphicsPipelineStateInitializer GraphicsPSOInit;
                FPixelShaderUtils::InitFullscreenPipelineState(RHICmdList, View.ShaderMap, PixelShader, /* out */ GraphicsPSOInit);
                if (SSRStencilPrePass)
                {
                    // Clobers the stencil to pixel that should not compute SSR
                    GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always, true, CF_Equal, SO_Keep, SO_Keep, SO_Keep>::GetRHI();
                }
    
                SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
                SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
    
                RHICmdList.SetStencilRef(0x80);
    
                // 绘制全屏幕.
                FPixelShaderUtils::DrawFullscreenTriangle(RHICmdList);
            });
        }
        else // 分块SSR.
        {
            check(TiledScreenSpaceReflection->TileSize == 8); // WORK_TILE_SIZE
    
            FScreenSpaceReflectionsTileVS::FPermutationDomain VsPermutationVector;
            TShaderMapRef<FScreenSpaceReflectionsTileVS> VertexShader(View.ShaderMap, VsPermutationVector);
    
            PassParameters->TileListData = TiledScreenSpaceReflection->TileListStructureBufferSRV;
            PassParameters->IndirectDrawParameter = TiledScreenSpaceReflection->DispatchIndirectParametersBuffer;
    
            ValidateShaderParameters(VertexShader, *PassParameters);
            ValidateShaderParameters(PixelShader, *PassParameters);
    
            // 增加SSR RayMarch通道.
            GraphBuilder.AddPass(
                RDG_EVENT_NAME("SSR RayMarch(Quality=%d RayPerPixel=%d%s) %dx%d",
                    SSRQuality, RayTracingConfigs.RayCountPerPixel, bDenoiser ? TEXT(" DenoiserOutput") : TEXT(""),
                    View.ViewRect.Width(), View.ViewRect.Height()),
                PassParameters,
                ERDGPassFlags::Raster,
                [PassParameters, &View, VertexShader, PixelShader, SSRStencilPrePass](FRHICommandList& RHICmdList)
            {
                SCOPED_GPU_STAT(RHICmdList, ScreenSpaceReflections);
                RHICmdList.SetViewport(View.ViewRect.Min.X, View.ViewRect.Min.Y, 0.0f, View.ViewRect.Max.X, View.ViewRect.Max.Y, 1.0f);
    
                FGraphicsPipelineStateInitializer GraphicsPSOInit;
                FPixelShaderUtils::InitFullscreenPipelineState(RHICmdList, View.ShaderMap, PixelShader, /* out */ GraphicsPSOInit);
                if (SSRStencilPrePass)
                {
                    // Clobers the stencil to pixel that should not compute SSR
                    GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always, true, CF_Equal, SO_Keep, SO_Keep, SO_Keep>::GetRHI();
                }
                GraphicsPSOInit.PrimitiveType = GRHISupportsRectTopology ? PT_RectList : PT_TriangleList;
                GraphicsPSOInit.BoundShaderState.VertexDeclarationRHI = GEmptyVertexDeclaration.VertexDeclarationRHI;
                GraphicsPSOInit.BoundShaderState.VertexShaderRHI = VertexShader.GetVertexShader();
                GraphicsPSOInit.BoundShaderState.PixelShaderRHI = PixelShader.GetPixelShader();
    
                SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
                SetShaderParameters(RHICmdList, VertexShader, VertexShader.GetVertexShader(), *PassParameters);
                SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
    
                RHICmdList.SetStencilRef(0x80);
    
                PassParameters->IndirectDrawParameter->MarkResourceAsUsed();
    
                RHICmdList.DrawPrimitiveIndirect(PassParameters->IndirectDrawParameter->GetIndirectRHICallBuffer(), 0);
            });
        }
    } // RenderScreenSpaceReflections()
    

    As the code above shows, SSR is rendered after RenderLights. On the author's PC, SSR runs with Quality 2, one ray per pixel, and the non-tiled path:
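
    The quality preset and related behavior can also be driven from the console; a few relevant variables (4.26-era names, as assumptions to verify in your build):

    r.SSR.Quality 3           // 0 = off, 1..4 selects the step/ray presets in the shader below
    r.SSR.MaxRoughness 0.8    // roughness at which SSR fades out (-1 = take it from the post process volume)
    r.SSR.Temporal 1          // enable SSR's temporal jitter
    r.SSR.Stencil 0           // enable the stencil prepass that skips pixels that cannot receive SSR
    r.SSR.HalfResSceneColor 0 // trace against half-resolution scene color to save bandwidth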

    Next, the shader code used by SSR's ray march:

    // EngineShadersPrivateSSRTSSRTReflections.usf
    
    void ScreenSpaceReflections(
        float4 SvPosition
        , out float4 OutColor
    #if SSR_OUTPUT_FOR_DENOISER
        , out float4 OutClosestHitDistance
    #endif
    )
    {
        // 获取坐标.
        float2 UV = SvPosition.xy * View.BufferSizeAndInvSize.zw;
        float2 ScreenPos = ViewportUVToScreenPos((SvPosition.xy - View.ViewRectMin.xy) * View.ViewSizeAndInvSize.zw);
        uint2 PixelPos = (uint2)SvPosition.xy;
        
        bool bDebugPrint = all(PixelPos == uint2(View.ViewSizeAndInvSize.xy) / 2);
    
        OutColor = 0;
        
        #if SSR_OUTPUT_FOR_DENOISER
            OutClosestHitDistance = -2.0;
        #endif
    
        // 获取GBuffer.
        FGBufferData GBuffer = GetGBufferDataFromSceneTextures(UV);
    
        float3 N = GBuffer.WorldNormal;
        const float SceneDepth = GBuffer.Depth;
        const float3 PositionTranslatedWorld = mul( float4( ScreenPos * SceneDepth, SceneDepth, 1 ), View.ScreenToTranslatedWorld ).xyz;
        const float3 V = normalize(View.TranslatedWorldCameraOrigin - PositionTranslatedWorld);
    
        // 修改GGX各向异性法线粗糙度.
        ModifyGGXAnisotropicNormalRoughness(GBuffer.WorldTangent, GBuffer.Anisotropy, GBuffer.Roughness, N, V);
        
        float Roughness = GetRoughness(GBuffer);
        float RoughnessFade = GetRoughnessFade(Roughness);
    
        // 提前退出. 如果用了模板prepass, 则无用.
        BRANCH if( RoughnessFade <= 0.0 || GBuffer.ShadingModelID == 0 )
        {
            return;
        }
    
        // 初始化粗糙度, Vis, 最近交点等数据.
        float a = Roughness * Roughness;
        float a2 = a * a;
        
        float NoV = saturate( dot( N, V ) );
        float G_SmithV = 2 * NoV / (NoV + sqrt(NoV * (NoV - NoV * a2) + a2));
    
        float ClosestHitDistanceSqr = INFINITE_FLOAT;
    
        // 根据质量等级设置步数, 每像素光线数量, 是否光滑.
    #if SSR_QUALITY == 1
        uint NumSteps = 8;
        uint NumRays = 1;
        bool bGlossy = false;
    #elif SSR_QUALITY == 2
        uint NumSteps = 16;
        uint NumRays = 1;
        #if SSR_OUTPUT_FOR_DENOISER
            bool bGlossy = true;
        #else
            bool bGlossy = false;
        #endif
    #elif SSR_QUALITY == 3
        uint NumSteps = 8;
        uint NumRays = 4;
        bool bGlossy = true;
    #else // SSR_QUALITY == 4
        uint NumSteps = 12;
        uint NumRays = 12;
        bool bGlossy = true;
    #endif
        
        if( NumRays > 1 ) // 每像素射线大于1
        {
            // 计算噪点和随机数.
            float2 Noise;
            Noise.x = InterleavedGradientNoise( SvPosition.xy, View.StateFrameIndexMod8 );
            Noise.y = InterleavedGradientNoise( SvPosition.xy, View.StateFrameIndexMod8 * 117 );
        
            uint2 Random = Rand3DPCG16( int3( PixelPos, View.StateFrameIndexMod8 ) ).xy;
            
            // 获取当前法线在切线空间的3个正交的基向量.
            float3x3 TangentBasis = GetTangentBasis( N );
            // 切线空间的V向量.
            float3 TangentV = mul( TangentBasis, V );
    
            float Count = 0;
    
            // 如果粗糙度很小, 说明是光滑的表面, 修改步数和射线数量.
            if( Roughness < 0.1 )
            {
                NumSteps = min( NumSteps * NumRays, 24u );
                NumRays = 1;
            }
    
            // 发射NumRays条射线.
            LOOP for( uint i = 0; i < NumRays; i++ )
            {
                float StepOffset = Noise.x;
                StepOffset -= 0.5;
                
                // Hammersley低差异序列.
                float2 E = Hammersley16( i, NumRays, Random );
                // 先对E在圆盘上采样, 再结合粗糙度和切线空间的V向量做GGX重要性采样, 最后将结果变换到切线空间, 获得了切线空间的半向量H.
                float3 H = mul( ImportanceSampleVisibleGGX(UniformSampleDisk(E), a2, TangentV ).xyz, TangentBasis );
                // 计算光源方向.
                float3 L = 2 * dot( V, H ) * H - V;
    
                float3 HitUVz;
                float Level = 0;
                
                // 如果是光滑表面, 将光源方向转化成其反射向量.
                if( Roughness < 0.1 )
                {
                    L = reflect(-V, N);
                }
                
                // 执行HZB射线检测.
                bool bHit = RayCast(
                    HZB, HZBSampler,
                    PositionTranslatedWorld, L, Roughness, SceneDepth, 
                    NumSteps, StepOffset,
                    HZBUvFactorAndInvFactor,
                    bDebugPrint,
                    HitUVz,
                    Level
                );
    
                // 如果命中, 采样场景颜色.
                BRANCH if( bHit )
                {
                    ClosestHitDistanceSqr = min(ClosestHitDistanceSqr, ComputeRayHitSqrDistance(PositionTranslatedWorld, HitUVz));
    
                    float2 SampleUV;
                    float Vignette;
                    // 重投影交点.
                    ReprojectHit(PrevScreenPositionScaleBias, GBufferVelocityTexture, GBufferVelocityTextureSampler, HitUVz, SampleUV, Vignette);
    
                    // 采样场景颜色.
                    float4 SampleColor = SampleScreenColor( SceneColor, SceneColorSampler, SampleUV ) * Vignette;
    
                    SampleColor.rgb *= rcp( 1 + Luminance(SampleColor.rgb) );
                    OutColor += SampleColor;
                }
            }
    
            OutColor /= max( NumRays, 0.0001 );
            OutColor.rgb *= rcp( 1 - Luminance(OutColor.rgb) );
        }
        else // 每像素射线==1
        {
            float StepOffset = InterleavedGradientNoise(SvPosition.xy, View.StateFrameIndexMod8);
            StepOffset -= 0.5;
            
            float3 L;
            if (bGlossy)
            {
                float2 E = Rand1SPPDenoiserInput(PixelPos);
                
                #if SSR_OUTPUT_FOR_DENOISER
                {
                    E.y *= 1 - GGX_IMPORTANT_SAMPLE_BIAS;
                }
                #endif
                
                float3x3 TangentBasis = GetTangentBasis( N );
                float3 TangentV = mul( TangentBasis, V );
    
                float3 H = mul( ImportanceSampleVisibleGGX(UniformSampleDisk(E), a2, TangentV ).xyz, TangentBasis );
                L = 2 * dot( V, H ) * H - V;
            }
            else
            {
                L = reflect( -V, N );
            }
            
            float3 HitUVz;
            float Level = 0;
            
            // HZB射线检测.
            bool bHit = RayCast(
                HZB, HZBSampler,
                PositionTranslatedWorld, L, Roughness, SceneDepth,
                NumSteps, StepOffset,
                HZBUvFactorAndInvFactor,
                bDebugPrint,
                HitUVz,
                Level
            );
    
            // 处理交点后的采样数据.
            BRANCH if( bHit )
            {
                ClosestHitDistanceSqr = ComputeRayHitSqrDistance(PositionTranslatedWorld, HitUVz);
    
                float2 SampleUV;
                float Vignette;
                ReprojectHit(PrevScreenPositionScaleBias, GBufferVelocityTexture, GBufferVelocityTextureSampler, HitUVz, SampleUV, Vignette);
    
                OutColor = SampleScreenColor(SceneColor, SceneColorSampler, SampleUV) * Vignette;
            }
        }
        
        // 颜色过渡.
        OutColor *= RoughnessFade;
        OutColor *= SSRParams.r;
    
    #if USE_PREEXPOSURE
        OutColor.rgb *= PrevSceneColorPreExposureCorrection;
    #endif
        
        // 为降噪输出最近交点的距离.
        #if SSR_OUTPUT_FOR_DENOISER
        {
            OutClosestHitDistance = ComputeDenoiserConfusionFactor(
                ClosestHitDistanceSqr > 0,
                length(View.TranslatedWorldCameraOrigin - PositionTranslatedWorld),
                sqrt(ClosestHitDistanceSqr));
        }
        #endif
    }
    
    // PS主入口.
    void ScreenSpaceReflectionsPS(
        float4 SvPosition : SV_POSITION
        , out float4 OutColor : SV_Target0
    #if SSR_OUTPUT_FOR_DENOISER
        , out float4 OutClosestHitDistance : SV_Target1
    #endif
    )
    {
        ScreenSpaceReflections(SvPosition, OutColor
    #if SSR_OUTPUT_FOR_DENOISER
            ,OutClosestHitDistance
    #endif
        );
    }
    
    

    Next, the main call stack of the ray-test logic, RayCast:

    // 射线检测.
    bool RayCast(
        Texture2D Texture, SamplerState Sampler,
        float3 RayOriginTranslatedWorld, float3 RayDirection,
        float Roughness, float SceneDepth,
        uint NumSteps, float StepOffset,
        float4 HZBUvFactorAndInvFactor, 
        bool bDebugPrint,
        out float3 OutHitUVz,
        out float Level)
    {
        FSSRTRay Ray = InitScreenSpaceRayFromWorldSpace(RayOriginTranslatedWorld, RayDirection, SceneDepth);
    
        // 检测单个屏幕空间的射线.
        return CastScreenSpaceRay(
            Texture, Sampler,
            Ray,
            Roughness, NumSteps, StepOffset,
            HZBUvFactorAndInvFactor, bDebugPrint,
            /* out */ OutHitUVz,
            /* out */ Level);
    } // RayCast()
    
    
    // 检测单个屏幕空间的射线.
    bool CastScreenSpaceRay(
        Texture2D Texture, SamplerState Sampler,
        FSSRTRay Ray,
        float Roughness,
        uint NumSteps, float StepOffset,
        float4 HZBUvFactorAndInvFactor, 
        bool bDebugPrint,
        out float3 OutHitUVz,
        out float Level)
    {
        // 初始化射线的起点, 步长等.
        const float3 RayStartScreen = Ray.RayStartScreen;
        float3 RayStepScreen = Ray.RayStepScreen;
    
        float3 RayStartUVz = float3( (RayStartScreen.xy * float2( 0.5, -0.5 ) + 0.5) * HZBUvFactorAndInvFactor.xy, RayStartScreen.z );
        float3 RayStepUVz  = float3(  RayStepScreen.xy  * float2( 0.5, -0.5 )         * HZBUvFactorAndInvFactor.xy, RayStepScreen.z );
        
        const float Step = 1.0 / NumSteps;
        float CompareTolerance = Ray.CompareTolerance * Step;
        
        float LastDiff = 0;
        Level = 1;
    
        RayStepUVz *= Step;
        float3 RayUVz = RayStartUVz + RayStepUVz * StepOffset;
        #if IS_SSGI_SHADER && SSGI_TRACE_CONE
            RayUVz = RayStartUVz;
        #endif
        
        float4 MultipleSampleDepthDiff;
        bool4 bMultipleSampleHit; // TODO: Might consumes VGPRS if bug in compiler.
        bool bFoundAnyHit = false;
        
        #if IS_SSGI_SHADER && SSGI_TRACE_CONE
            const float ConeAngle = PI / 4;
            const float d = 1;
            const float r = d * sin(0.5 * ConeAngle);
            const float Exp = 1.6; //(d + r) / (d - r);
            const float ExpLog2 = log2(Exp);
            const float MaxPower = exp2(log2(Exp) * (NumSteps + 1.0)) - 0.9;
    
            {
                //Level = 2;
            }
        #endif
    
        uint i;
    
        // 最多检测NumSteps次, 找到交点就退出循环. 每次检测SSRT_SAMPLE_BATCH_SIZE(4)个采样点.
        LOOP
        for (i = 0; i < NumSteps; i += SSRT_SAMPLE_BATCH_SIZE)
        {
            float2 SamplesUV[SSRT_SAMPLE_BATCH_SIZE];
            float4 SamplesZ;
            float4 SamplesMip;
    
            // 计算采样坐标, 深度和深度纹理Mip层级.
            #if IS_SSGI_SHADER && SSGI_TRACE_CONE // SSGI或锥体追踪
            {
                UNROLL_N(SSRT_SAMPLE_BATCH_SIZE)
                for (uint j = 0; j < SSRT_SAMPLE_BATCH_SIZE; j++)
                {
                    float S = float(i + j) + StepOffset;
    
                    float NormalizedPower = (exp2(ExpLog2 * S) - 0.9) / MaxPower;
    
                    float Offset = NormalizedPower * NumSteps;
    
                    SamplesUV[j] = RayUVz.xy + Offset * RayStepUVz.xy;
                    SamplesZ[j] = RayUVz.z + Offset * RayStepUVz.z;
                }
            
                SamplesMip.xy = Level;
                Level += (8.0 / NumSteps) * Roughness;
            
                SamplesMip.zw = Level;
                Level += (8.0 / NumSteps) * Roughness;
            }
            #else // SSR执行此分支.
            {
                UNROLL_N(SSRT_SAMPLE_BATCH_SIZE)
                for (uint j = 0; j < SSRT_SAMPLE_BATCH_SIZE; j++)
                {
                    SamplesUV[j] = RayUVz.xy + (float(i) + float(j + 1)) * RayStepUVz.xy;
                    SamplesZ[j] = RayUVz.z + (float(i) + float(j + 1)) * RayStepUVz.z;
                }
            
                // 采样深度的Mip层级.
                SamplesMip.xy = Level;
                // 调整层级, 注意受粗糙度影响, 粗糙度越小, Level也越小.
                Level += (8.0 / NumSteps) * Roughness;
            
                SamplesMip.zw = Level;
                Level += (8.0 / NumSteps) * Roughness;
            }
            #endif
    
            // 采样场景深度.
            float4 SampleDepth;
            {
                UNROLL_N(SSRT_SAMPLE_BATCH_SIZE)
                for (uint j = 0; j < SSRT_SAMPLE_BATCH_SIZE; j++)
                {
                    SampleDepth[j] = Texture.SampleLevel(Sampler, SamplesUV[j], SamplesMip[j]).r;
                }
            }
    
            // 计算是否相交.
            // 计算射线采样点深度和深度纹理的差异.
            MultipleSampleDepthDiff = SamplesZ - SampleDepth;
            // 检测是否小于深度对比容忍值.
            bMultipleSampleHit = abs(MultipleSampleDepthDiff + CompareTolerance) < CompareTolerance;
            // 4个采样点只要有1个满足就算相交.
            bFoundAnyHit = any(bMultipleSampleHit);
    
            // 找到交点, 退出循环.
            BRANCH
            if (bFoundAnyHit)
            {
                break;
            }
    
            LastDiff = MultipleSampleDepthDiff.w;
        } // for( uint i = 0; i < NumSteps; i += 4 )
        
        // 计算输出坐标.
        BRANCH
        if (bFoundAnyHit)
        {
            (......)
            #else // SSR
            {
                float DepthDiff0 = MultipleSampleDepthDiff[2];
                float DepthDiff1 = MultipleSampleDepthDiff[3];
                float Time0 = 3;
    
                FLATTEN
                if (bMultipleSampleHit[2])
                {
                    DepthDiff0 = MultipleSampleDepthDiff[1];
                    DepthDiff1 = MultipleSampleDepthDiff[2];
                    Time0 = 2;
                }
                FLATTEN
                if (bMultipleSampleHit[1])
                {
                    DepthDiff0 = MultipleSampleDepthDiff[0];
                    DepthDiff1 = MultipleSampleDepthDiff[1];
                    Time0 = 1;
                }
                FLATTEN
                if (bMultipleSampleHit[0])
                {
                    DepthDiff0 = LastDiff;
                    DepthDiff1 = MultipleSampleDepthDiff[0];
                    Time0 = 0;
                }
    
                Time0 += float(i);
                float Time1 = Time0 + 1;
    
                // 利用线段交点找到更准确的交点.
                float TimeLerp = saturate(DepthDiff0 / (DepthDiff0 - DepthDiff1));
                float IntersectTime = Time0 + TimeLerp;
                    
                OutHitUVz = RayUVz + RayStepUVz * IntersectTime;
            }
            #endif
    
            // 输出交点的数据.
            OutHitUVz.xy *= HZBUvFactorAndInvFactor.zw;
            OutHitUVz.xy = OutHitUVz.xy * float2( 2, -2 ) + float2( -1, 1 );
            OutHitUVz.xy = OutHitUVz.xy * View.ScreenPositionScaleBias.xy + View.ScreenPositionScaleBias.wz;
        }
        else
        {
            OutHitUVz = float3(0, 0, 0);
        }
        
        return bFoundAnyHit;
    } // CastScreenSpaceRay()
    

    SSR's ray march evaluates 4 samples per loop iteration to reduce the number of iterations. The depth mip level grows with each sample batch and is scaled by roughness, which matches the physics: the smoother the surface, the sharper its reflections, and the lower the mip level sampled. The intersection test also tolerates a certain range of depth error to speed up detection.
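
    The tolerance test deserves a closer look. Writing \(d = \text{SamplesZ} - \text{SampleDepth}\) and \(\tau = \text{CompareTolerance}\), the shader's condition expands to

    \[ |d + \tau| < \tau \iff -2\tau < d < 0 \]

    that is, a hit is registered only when the ray sample lies behind the depth surface, and by less than an assumed surface thickness of \(2\tau\).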

    A RenderDoc capture reveals that the scene color used is not this frame's, but last frame's TAA output:

    This is mainly because TAA runs in the post-processing stage: at SSR time, the current frame's scene color has not been post-processed yet and is still aliased, and using it directly would visibly degrade the SSR result. Since SSR's ray march defaults to just 1 SPP, the color produced at this stage carries obvious noise:

    Subsequent steps must therefore denoise it, which UE handles in the TAA stage:

    Top: the noisy SSR texture; middle: TAA's history texture; bottom: the texture after TAA.

    7.4.7 SSAO

    AO (Ambient Occlusion) is, in essence, the occlusion of ambient (indirect) light. It can be computed by casting many rays and measuring the occluded fraction:

    In real-time rendering there are two ways to compute the occlusion factor: object space and screen space. Object-space AO ray-casts against the real geometry, with cost that explodes with scene complexity; it is usually slow and needs aggressive mesh simplification and spatial data structures. Screen-space AO runs in the post-processing stage: it needs no precomputed data, is independent of scene complexity, is simple and cheap, but it is not physically correct and only yields an approximate occlusion.

    Approximating AO within a hemisphere of limited radius around the surface normal captures occlusion well and efficiently in enclosed areas such as interiors:

    SSAO (Screen Space Ambient Occlusion) uses the depth buffer as a proxy for scene geometry: for each pixel, it takes a number of samples within a sphere and tests them against the depth buffer:

    AO is applied when more than half of the samples are unoccluded (i.e. pass the depth test); a minimal sketch of this test follows below. If normals are unavailable, a full sphere must be used instead of a hemisphere:
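
    To make the sphere-sampling depth test concrete, here is a minimal SSAO estimator sketch in HLSL. It is illustrative only (not UE's MainPSandCS); KernelOffsets, ProjectToUV and GetViewSpaceDepth are assumed inputs/helpers:

    // Illustrative SSAO kernel (not UE's implementation).
    static const uint NUM_SAMPLES = 16;
    float3 KernelOffsets[NUM_SAMPLES]; // precomputed directions in the unit sphere

    float ComputeSSAO(float3 PosVS, float3 NormalVS, float Radius)
    {
        float Occluded = 0;
        for (uint i = 0; i < NUM_SAMPLES; ++i)
        {
            // With a valid normal, flip sphere samples into the upper hemisphere;
            // without normals the full sphere must be used, as noted above.
            float3 Dir = KernelOffsets[i];
            Dir *= sign(dot(Dir, NormalVS));
            float3 SampleVS = PosVS + Dir * Radius;

            // Project back to screen and compare against the depth buffer: the sample
            // is occluded if scene geometry lies in front of it (+Z away from camera).
            float2 SampleUV = ProjectToUV(SampleVS);
            if (GetViewSpaceDepth(SampleUV) < SampleVS.z)
            {
                Occluded += 1.0;
            }
        }
        return 1.0 - Occluded / NUM_SAMPLES; // 1 = fully open, 0 = fully occluded
    }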

    This screen-space depth test has flaws: the point indicated by the arrow in the figure below fails the depth test, yet is not actually occluded:

    If normal information is available, SSAO can be upgraded to HBAO (Horizon Based Ambient Occlusion), which samples within the hemisphere formed by the normal and approximates ray tracing against the depth buffer:

    HBAO uses a more precise formula:
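
    The formula image is not reproduced here, but from the original HBAO paper (Bavoil et al. 2008) the occlusion integrates the horizon over all 2D directions (a reconstruction of the paper's result, not UE's shader):

    \[ AO = \frac{1}{2\pi} \int_{\theta=-\pi}^{\pi} \big( \sin h(\theta) - \sin t(\theta) \big) \, W(\theta) \, d\theta \]

    where \(t(\theta)\) is the tangent angle of the view ray, \(h(\theta)\) the horizon angle found by marching the depth buffer, and \(W(\theta)\) a distance attenuation term.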

    SSAO can also run at half resolution to speed up the computation, at the cost of flickering artifacts on thin, high-frequency objects such as grass.

    Besides SSAO and HBAO, there is also SSDO (Screen Space Directional Occlusion). It differs from SSAO in that SSAO accumulates an occlusion factor from multiple samples per pixel, while SSDO generates multiple directions per pixel and accumulates their radiance:

    SSDO's visibility test also differs and is more involved, as described with the figure below.

    For each pixel, the following is performed:

    • Compute N samples (A–D) within the hemisphere of radius \(r_{max}\) formed around point P by the normal; for each sample:
      • Project the sample into screen space.
      • Compute the surface position from the depth buffer.
      • If the sample has to move up to the surface (it was below it), treat it as occluded (points A, B, and D in the figure).
      • If the sample moves down, P is lit from that direction (C), using a blurred environment map (filtered to roughly \(2\pi/N\)).

    For indirect illumination, SSDO proceeds as follows: each sample is treated as a small area light facing the pixel's normal, a form factor from the sample to P is computed, and the contributions are accumulated to approximate one bounce of indirect light:
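
    The usual approximation here is the point-to-patch form factor (quoted from radiosity theory as a hedged stand-in for the paper's exact notation):

    \[ F_{s \to r} \approx \frac{A_s \cos\theta_s \cos\theta_r}{\pi d^2} \]

    where \(A_s\) is the sender sample's area, \(d\) the sender-receiver distance, and \(\theta_s\), \(\theta_r\) the angles between the connecting direction and the two normals.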

    Compared with SSAO, SSDO is directional and carries color, producing a GI-like effect similar to color bleeding:

    Beyond the techniques above there are also HBAO+, HDAO (High Definition Ambient Occlusion), Hybrid Ambient Occlusion, MSSAO (Multi-Resolution Screen-Space Ambient Occlusion), VXAO (Voxel Accelerated Ambient Occlusion), GTAO (Ground-Truth Ambient Occlusion), and more.

    With the theory of SSAO and its relatives covered, let us go straight to UE's implementation. UE's SSAO entry point sits between RenderBasePass and RenderLights:

    void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
    {
        (......)
        
        RenderBasePass(...);
        
        (......)
        
        // 准备光照之前的组合光照阶段,如延迟贴花、SSAO等。
        GCompositionLighting.Reset();
        if (FeatureLevel >= ERHIFeatureLevel::SM5)
        {
            (......)
    
            for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
            {
                const FViewInfo& View = Views[ViewIndex];
                // SSAO在此接口渲染.
                GCompositionLighting.ProcessAfterBasePass(GraphBuilder, Scene->UniformBuffers, View, SceneTextures);
            }
            
            (......)
        }
        
        (......)
        
        RenderLights(...);
    }
    

    Now let us look at FCompositionLighting::ProcessAfterBasePass (invoked through the global instance GCompositionLighting):

    // EngineSourceRuntimeRendererPrivateCompositionLightingCompositionLighting.cpp
    
    void FCompositionLighting::ProcessAfterBasePass(
        FRDGBuilder& GraphBuilder,
        FPersistentUniformBuffers& UniformBuffers,
        const FViewInfo& View,
        TRDGUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBuffer)
    {
        FSceneRenderTargets& SceneContext = FSceneRenderTargets::Get(GraphBuilder.RHICmdList);
    
        if (CanOverlayRayTracingOutput(View))
        {
            const FSceneViewFamily& ViewFamily = *View.Family;
            RDG_EVENT_SCOPE(GraphBuilder, "LightCompositionTasks_PreLighting");
    
            AddPass(GraphBuilder, [&UniformBuffers, &View](FRHICommandList&)
            {
                UniformBuffers.UpdateViewUniformBuffer(View);
            });
    
            (......)
    
            // Forward shading SSAO is applied before the base pass using only the depth buffer.
        // SSAO只需要深度缓冲而不需要GBuffer, 因此前向渲染在BasePass之前应用; 延迟渲染则在此处计算.
            if (!IsForwardShadingEnabled(View.GetShaderPlatform()))
            {
                FScreenPassRenderTarget FinalTarget = FScreenPassRenderTarget(GraphBuilder.RegisterExternalTexture(SceneContext.ScreenSpaceAO, TEXT("AmbientOcclusionDirect")), View.ViewRect, ERenderTargetLoadAction::ENoAction);
    
                FScreenPassTexture AmbientOcclusion;
    
                // 根据SSAO级别和类型选择不同的渲染方式.
                const uint32 SSAOLevels = FSSAOHelper::ComputeAmbientOcclusionPassCount(View);
                if (SSAOLevels)
                {
                    const EGTAOType GTAOType = FSSAOHelper::GetGTAOPassType(View, SSAOLevels);
    
                    TUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBufferRHI = CreateSceneTextureUniformBuffer(GraphBuilder.RHICmdList, View.FeatureLevel);
    
                    // 分离的GTAO.
                    if (GTAOType == EGTAOType::EAsyncHorizonSearch || GTAOType == EGTAOType::EAsyncCombinedSpatial)
                    {
                        (......)
                        
                        AmbientOcclusion = AddPostProcessingGTAOPostAsync(GraphBuilder, View, Parameters, GTAOHorizons, FinalTarget);
                    }
                    else // 非分离的GTAO
                    {
                        if (GTAOType == EGTAOType::ENonAsync)
                        {
                            FGTAOCommonParameters Parameters = GetGTAOCommonParameters(GraphBuilder, View, SceneTexturesUniformBuffer, SceneTexturesUniformBufferRHI, GTAOType);
                            AmbientOcclusion = AddPostProcessingGTAOAllPasses(GraphBuilder, View, Parameters, FinalTarget);
                        }
                        else // 默认情况下, UE执行此分支.
                        {
                            FSSAOCommonParameters Parameters = GetSSAOCommonParameters(GraphBuilder, View, SceneTexturesUniformBuffer, SceneTexturesUniformBufferRHI, SSAOLevels, true);
                            AmbientOcclusion = AddPostProcessingAmbientOcclusion(GraphBuilder, View, Parameters, FinalTarget);
                        }
                        
                        (......)
                    }
    
                    SceneContext.bScreenSpaceAOIsValid = true;
                }
            }
        }
    }
    

    By default, UE executes AddPostProcessingAmbientOcclusion; stepping into it:

    // @param Levels 0..3, how many different resolution levels we want to render
    static FScreenPassTexture AddPostProcessingAmbientOcclusion(
        FRDGBuilder& GraphBuilder,
        const FViewInfo& View,
        const FSSAOCommonParameters& CommonParameters,
        FScreenPassRenderTarget FinalTarget)
    {
        check(CommonParameters.Levels >= 0 && CommonParameters.Levels <= 3);
    
        FScreenPassTexture AmbientOcclusionInMip1;
        FScreenPassTexture AmbientOcclusionPassMip1;
        // 如果Level>=2, 则执行1~2次SetupPass, 1~2次StepPass, 1次FinalPass
        if (CommonParameters.Levels >= 2)
        {
            AmbientOcclusionInMip1 =
                AddAmbientOcclusionSetupPass(
                    GraphBuilder,
                    View,
                    CommonParameters,
                    CommonParameters.SceneDepth);
    
            FScreenPassTexture AmbientOcclusionPassMip2;
            if (CommonParameters.Levels >= 3)
            {
                FScreenPassTexture AmbientOcclusionInMip2 =
                    AddAmbientOcclusionSetupPass(
                        GraphBuilder,
                        View,
                        CommonParameters,
                        AmbientOcclusionInMip1);
    
                AmbientOcclusionPassMip2 =
                    AddAmbientOcclusionStepPass(
                        GraphBuilder,
                        View,
                        CommonParameters,
                        AmbientOcclusionInMip2,
                        AmbientOcclusionInMip2,
                        FScreenPassTexture(),
                        CommonParameters.HZBInput);
            }
    
            AmbientOcclusionPassMip1 =
                AddAmbientOcclusionStepPass(
                    GraphBuilder,
                    View,
                    CommonParameters,
                    AmbientOcclusionInMip1,
                    AmbientOcclusionInMip1,
                    AmbientOcclusionPassMip2,
                    CommonParameters.HZBInput);
        }
    
        FScreenPassTexture FinalOutput =
            AddAmbientOcclusionFinalPass(
                GraphBuilder,
                View,
                CommonParameters,
                CommonParameters.GBufferA,
                AmbientOcclusionInMip1,
                AmbientOcclusionPassMip1,
                CommonParameters.HZBInput,
                FinalTarget);
    
        return FinalOutput;
    }
    

    The code above runs different numbers of Setup, Step, and Final passes depending on Levels. In the author's capture, one Setup pass and two further PS passes were executed:
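
    The level count comes from FSSAOHelper::ComputeAmbientOcclusionPassCount and can be steered from the console (4.26-era names, as assumptions to verify):

    r.AmbientOcclusionLevels -1      // -1 = derive from quality settings, 0..3 = force the level count
    r.AmbientOcclusionMaxQuality 100 // caps the SSAO quality level
    ShowFlag.AmbientOcclusion 1      // toggle the effect for comparison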

    The Setup pass mainly downsamples the normals to obtain a half-resolution normal buffer. Its downsampling PS is as follows:

    // EngineShadersPrivatePostProcessAmbientOcclusion.usf
    
    void MainSetupPS(in noperspective float4 UVAndScreenPos : TEXCOORD0, float4 SvPosition : SV_POSITION, out float4 OutColor0 : SV_Target0)
    {
        float2 ViewPortSize = AOViewport_ViewportSize;
        float2 InUV = UVAndScreenPos.xy;
    
        // 4个采样点.
        float2 UV[4];
        UV[0] = InUV + float2(-0.5f, -0.5f) * InputExtentInverse;
        UV[1] = min(InUV + float2( 0.5f, -0.5f) * InputExtentInverse, View.BufferBilinearUVMinMax.zw);
        UV[2] = min(InUV + float2(-0.5f,  0.5f) * InputExtentInverse, View.BufferBilinearUVMinMax.zw);
        UV[3] = min(InUV + float2( 0.5f,  0.5f) * InputExtentInverse, View.BufferBilinearUVMinMax.zw);
    
        float4 Samples[4];
        
        // 获取输入纹理的4个采样点数据.
        UNROLL for(uint i = 0; i < 4; ++i)
        {
    #if COMPUTE_SHADER || FORWARD_SHADING
            // Async compute and forward shading don't have access to the gbuffer.
            Samples[i].rgb = normalize(ReconstructNormalFromDepthBuffer(float4(UV[i] * ViewPortSize, SvPosition.zw))) * 0.5f + 0.5f;
    #else
            Samples[i].rgb = GetGBufferData(UV[i], true).WorldNormal * 0.5f + 0.5f;
    #endif
            Samples[i].a = CalcSceneDepth(UV[i]);
        }
        
        float MaxZ = max( max(Samples[0].a, Samples[1].a), max(Samples[2].a, Samples[3].a));
    
        // 平均颜色值, 此处采样了深度相似度来作为缩放权重.
        float4 AvgColor = 0.0f;
        if (USE_NORMALS)
        {
            AvgColor = 0.0001f;
    
            {
                UNROLL for(uint i = 0; i < 4; ++i)
                {
                    AvgColor += float4(Samples[i].rgb, 1) * ComputeDepthSimilarity(Samples[i].a, MaxZ, ThresholdInverse);
                }
                AvgColor.rgb /= AvgColor.w;
            }
        }
    
        OutColor0 = float4(AvgColor.rgb, MaxZ / Constant_Float16F_Scale);
    }
    

    When averaging the colors above, the depth similarity ComputeDepthSimilarity is used as a scaling weight:

    // 0 means very dissimilar, 1 means very similar.
    float ComputeDepthSimilarity(float DepthA, float DepthB, float TweakScale)
    {
        return saturate(1 - abs(DepthA - DepthB) * TweakScale);
    }
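
    As a quick check of how this weight behaves (hypothetical input values, purely for illustration):

    float WeightNear = ComputeDepthSimilarity(100.0, 102.0, 0.1); // saturate(1 - 2 * 0.1) = 0.8
    float WeightFar  = ComputeDepthSimilarity(100.0, 115.0, 0.1); // saturate(1 - 1.5)     = 0.0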
    

    The Step pass performs the AO computation at half resolution, while the Final pass performs the upsampled AO computation; both use the same PS shader code (but with different parameters and macros):

    void MainPS(in noperspective float4 UVAndScreenPos : TEXCOORD0, float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
    {
        MainPSandCS(UVAndScreenPos, SvPosition, OutColor);    
    }
    
    // Main logic of the AO computation
    void MainPSandCS(in float4 UVAndScreenPos, float4 SvPosition, out float4 OutColor)
    {
        OutColor = 0;
    
        // The following constants are set from the C++ side.
        float AmbientOcclusionPower = ScreenSpaceAOParams[0].x;
        float Ratio = ScreenSpaceAOParams[1].w;
        float AORadiusInShader = ScreenSpaceAOParams[1].z;
        float InvAmbientOcclusionDistance = ScreenSpaceAOParams[0].z;
        float AmbientOcclusionIntensity = ScreenSpaceAOParams[0].w;
        float2 ViewportUVToRandomUV = ScreenSpaceAOParams[1].xy;
        float AmbientOcclusionBias = ScreenSpaceAOParams[0].y;
        float ScaleFactor = ScreenSpaceAOParams[2].x;
        float ScaleRadiusInWorldSpace = ScreenSpaceAOParams[2].z;
    
        float2 UV = UVAndScreenPos.xy;
        float2 ScreenPos = UVAndScreenPos.zw;
    
        float InvTanHalfFov = ScreenSpaceAOParams[3].w;
        float3 FovFix = float3(InvTanHalfFov, Ratio * InvTanHalfFov, 1);
        float3 InvFovFix = 1.0f / FovFix;
    
        float SceneDepth = GetDepthFromAOInput(UV);
        float3 WorldNormal = GetWorldSpaceNormalFromAOInput(UV, SvPosition);
    
        // If normals are not used (!USE_NORMALS), ViewSpaceNormal may be NaN.
        float3 ViewSpaceNormal = normalize(mul(WorldNormal, (float3x3)View.TranslatedWorldToView));
        float3 ViewSpacePosition = ReconstructCSPos(SceneDepth, ScreenPos);
    
        // Compute the actual radius of the AO hemisphere.
        float ActualAORadius = AORadiusInShader * lerp(SceneDepth, 1, ScaleRadiusInWorldSpace);
    
        // Add the bias after the FOV fix.
        if (USE_NORMALS)
        {
            ViewSpacePosition += AmbientOcclusionBias * SceneDepth * ScaleFactor * (ViewSpaceNormal * FovFix);
        }
    
        float2 WeightAccumulator = 0.0001f;
        
        // Choose the random vector according to the sample quality.
    #if AO_SAMPLE_QUALITY != 0
        // no SSAO in this pass, only upsampling
    
    #if AO_SAMPLE_QUALITY == 1
        // no 4x4 randomization
        float2 RandomVec = float2(0, 1) * ActualAORadius;
        {
    #elif AO_SAMPLE_QUALITY == 2
        // Extract one of 16 base vectors (rotated and scaled) from a 4x4 repeating texture.
        float2 RandomVec = (Texture2DSample(RandomNormalTexture, RandomNormalTextureSampler, UV * ViewportUVToRandomUV).rg * 2 - 1) * ActualAORadius;
        {
    #else // AO_SAMPLE_QUALITY == 3
        // Extract one of 16 base vectors (rotated and scaled) from a 4x4 repeating texture, varying over time when TemporalAA is enabled.
    
        // Across multiple frames (only when TAA is enabled), adding a little jitter per frame yields higher quality, but may cause ghosting.
        const float2 TemporalOffset = ScreenSpaceAOParams[3].xy;
    
        // Debug mode.
        const bool bDebugLookups = DEBUG_LOOKUPS && ViewSpacePosition.x > 0;
    
        float2 RandomVec = (Texture2DSample(RandomNormalTexture, RandomNormalTextureSampler, TemporalOffset + UV * ViewportUVToRandomUV).rg * 2 - 1) * ActualAORadius;
        {
    #endif // AO_SAMPLE_QUALITY == 
    
            if(bDebugLookups && ViewSpacePosition.y > 0)
            {
                // top sample are not per pixel rotated
                RandomVec = float2(0, 1) * ActualAORadius;
            }
    
            float2 FovFixXY = FovFix.xy * (1.0f / ViewSpacePosition.z);
            float4 RandomBase = float4(RandomVec, -RandomVec.y, RandomVec.x) * float4(FovFixXY, FovFixXY);
            float2 ScreenSpacePos = ViewSpacePosition.xy / ViewSpacePosition.z;
    
            // .x means: for very anisotropic views, scale by x.
            float InvHaloSize = 1.0f / (ActualAORadius * FovFixXY.x * 2);
    
            float3 ScaledViewSpaceNormal = ViewSpaceNormal;
    
    #if OPTIMIZATION_O1
            ScaledViewSpaceNormal *= 0.08f * lerp(SceneDepth, 1000, ScaleRadiusInWorldSpace);
    #endif
    
            UNROLL for(int i = 0; i < SAMPLESET_ARRAY_SIZE; ++i)
            {
                // -1..1
                float2 UnrotatedRandom = OcclusionSamplesOffsets[i].xy;
    
                float2 LocalRandom = (UnrotatedRandom.x * RandomBase.xy + UnrotatedRandom.y * RandomBase.zw);
    
                if (bDebugLookups)
                {
                    (......)
                }
                else if (USE_NORMALS) // with normals
                {
                    float3 LocalAccumulator = 0;
    
                    UNROLL for(uint step = 0; step < SAMPLE_STEPS; ++step)
                    {
                        // Constant at runtime.
                        float Scale = (step + 1) / (float)SAMPLE_STEPS;
                        // Constant at runtime (higher is better for texture cache and performance, lower is better for quality).
                        float MipLevel = ComputeMipLevel(i, step);
    
                        // Sample for a single step.
                        float3 StepSample = WedgeWithNormal(ScreenSpacePos, Scale * LocalRandom, InvFovFix, ViewSpacePosition, ScaledViewSpaceNormal, InvHaloSize, MipLevel);
    
                        // Combine the horizon samples.
                        LocalAccumulator = lerp(LocalAccumulator, float3(max(LocalAccumulator.xy, StepSample.xy), 1), StepSample.z);
                    }
    
                    // Square(): scale the area quadratically with the angle, for a slightly darker result.
                    WeightAccumulator += float2(Square(1 - LocalAccumulator.x) * LocalAccumulator.z, LocalAccumulator.z);
                    WeightAccumulator += float2(Square(1 - LocalAccumulator.y) * LocalAccumulator.z, LocalAccumulator.z);
                }
                else // without normals
                {
                    (......)
                }
            }
        }
    
    #endif // #if AO_SAMPLE_QUALITY == 0
    
        OutColor.r = WeightAccumulator.x / WeightAccumulator.y;
        OutColor.gb = float2(0, 0);
    
        if(!bDebugLookups)
        {
    #if COMPUTE_SHADER || FORWARD_SHADING
            // In compute, Input1 and Input2 are not necessarily valid.
            float4 Filtered = 1;
    #else
            // Upsample.
            float4 Filtered = ComputeUpsampleContribution(SceneDepth, UV, WorldNormal);
    #endif
            // recombined result from multiple resolutions
            OutColor.r = lerp(OutColor.r, Filtered.r, ComputeLerpFactor());
        }
    
    #if !USE_AO_SETUP_AS_INPUT // The Final pass takes this branch.
        if(!bDebugLookups)
        {
            // Full resolution
    
            // Fade out the AO softly with distance
            {
                float Mul = ScreenSpaceAOParams[4].x;
                float Add = ScreenSpaceAOParams[4].y;
                OutColor.r = lerp(OutColor.r, 1, saturate(SceneDepth * Mul + Add));
            }
    
            // User-tweaked AO
            OutColor.r = 1 - (1 - pow(abs(OutColor.r), AmbientOcclusionPower)) * AmbientOcclusionIntensity;
    
            // Output a single channel only
            OutColor = OutColor.r;
        }
        else
        {
            OutColor.r = pow(1 - OutColor.r, 16);    // constant is tweaked with radius and sample count
        }
    #endif
    
    // SM4 does not support ddx_fine()
    #if !COMPUTE_SHADER && QUAD_MESSAGE_PASSING_BLUR > 0 && FEATURE_LEVEL >= FEATURE_LEVEL_SM5
        {
            // .x: AO output, .y:SceneDepth .zw:view space normal
            float4 CenterPixel = float4(OutColor.r, SceneDepth, normalize(ViewSpaceNormal).xy); 
    
            float4 dX = ddx_fine(CenterPixel);
            float4 dY = ddy_fine(CenterPixel);
    
            int2 Mod = (uint2)(SvPosition.xy) % 2;
    
            float4 PixA = CenterPixel;
            float4 PixB = CenterPixel - dX * (Mod.x * 2 - 1);
            float4 PixC = CenterPixel - dY * (Mod.y * 2 - 1);
    
            float WeightA = 1.0f;
            float WeightB = 1.0f;
            float WeightC = 1.0f;
    
            // Weight by normals.
    #if QUAD_MESSAGE_PASSING_NORMAL
            const float NormalTweak = 4.0f;
            float3 NormalA = ReconstructNormal(PixA.zw);
            float3 NormalB = ReconstructNormal(PixB.zw);
            float3 NormalC = ReconstructNormal(PixC.zw);
            WeightB *= saturate(pow(saturate(dot(NormalA, NormalB)), NormalTweak));
            WeightC *= saturate(pow(saturate(dot(NormalA, NormalC)), NormalTweak));
    #endif
    
            // Weight by depth.
    #if QUAD_MESSAGE_PASSING_DEPTH
            const float DepthTweak = 1;
            float InvDepth = 1.0f / PixA.y;
            WeightB *= 1 - saturate(abs(1 - PixB.y * InvDepth) * DepthTweak);
            WeightC *= 1 - saturate(abs(1 - PixC.y * InvDepth) * DepthTweak);
    #endif
    
            // + 1.0f to avoid div by 0
            float InvWeightABC = 1.0f / (WeightA + WeightB + WeightC);
    
            // Normalize the weights.
            WeightA *= InvWeightABC;
            WeightB *= InvWeightABC;
            WeightC *= InvWeightABC;
    
            // Blend the final output color with the weights.
            OutColor = WeightA * PixA.x + WeightB * PixB.x + WeightC * PixC.x;
        }
    #endif
    }
    

    The algorithm above uses both normals and depth as weights; their formulas are as follows (the original figure, excerpted from the MSSAO paper, is omitted here):
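
    Restated from the quad-message-passing code above (not new math; NormalTweak = 4, DepthTweak = 1):

    float WeightNormal = pow(saturate(dot(NormalA, NormalB)), 4.0);
    float WeightDepth  = 1.0 - saturate(abs(1.0 - PixB.y / PixA.y) * 1.0);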

    UE's SSAO on PC uses normals, so normals are taken into account when evaluating the sample points; this is handled by WedgeWithNormal:

    float3 WedgeWithNormal(float2 ScreenSpacePosCenter, float2 InLocalRandom, float3 InvFovFix, float3 ViewSpacePosition, float3 ScaledViewSpaceNormal, float InvHaloSize, float MipLevel)
    {
        float2 ScreenSpacePosL = ScreenSpacePosCenter + InLocalRandom;
        float2 ScreenSpacePosR = ScreenSpacePosCenter - InLocalRandom;
    
        float AbsL = GetHZBDepth(ScreenSpacePosL, MipLevel);
        float AbsR = GetHZBDepth(ScreenSpacePosR, MipLevel);
    
        float3 SamplePositionL = ReconstructCSPos(AbsL, ScreenSpacePosL);
        float3 SamplePositionR = ReconstructCSPos(AbsR, ScreenSpacePosR);
    
        float3 DeltaL = (SamplePositionL - ViewSpacePosition) * InvFovFix;
        float3 DeltaR = (SamplePositionR - ViewSpacePosition) * InvFovFix;
            
    #if OPTIMIZATION_O1
        float InvNormAngleL = saturate(dot(DeltaL, ScaledViewSpaceNormal) / dot(DeltaL, DeltaL));
        float InvNormAngleR = saturate(dot(DeltaR, ScaledViewSpaceNormal) / dot(DeltaR, DeltaR));
        float Weight = 1;
    #else
        float InvNormAngleL = saturate(dot(DeltaL, ScaledViewSpaceNormal) * rsqrt(dot(DeltaL, DeltaL)));
        float InvNormAngleR = saturate(dot(DeltaR, ScaledViewSpaceNormal) * rsqrt(dot(DeltaR, DeltaR)));
    
        float Weight = 
              saturate(1.0f - length(DeltaL) * InvHaloSize)
            * saturate(1.0f - length(DeltaR) * InvHaloSize);
    #endif
    
        return float3(InvNormAngleL, InvNormAngleR, Weight);
    }
    

    A wedge is constructed around the normal (figure below), and the sample data and weights are generated within this wedge. The main process is: generate left/right screen-space offsets around the normal, reconstruct the left and right sample positions from the HZB depth, and finally output the inverse angles of the two sides together with a weight.
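
    How those outputs become occlusion (a restatement of the USE_NORMALS accumulation shown earlier, for interpretation only):

    // WedgeWithNormal returns (InvNormAngleL, InvNormAngleR, Weight):
    //   InvNormAngle ~ 0 -> sample direction is open     -> Square(1 - 0) = 1 (visible)
    //   InvNormAngle ~ 1 -> sample direction is occluded -> Square(1 - 1) = 0 (occluded)
    // WeightAccumulator.x / WeightAccumulator.y is thus a visibility estimate
    // in [0, 1], which MainPSandCS writes to OutColor.r.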

    From the code analysis above, UE's SSAO is highly similar to the MSSAO approach in its rendering flow, downsampling, weight computation, AO blending, and so on:

    After rendering through the steps above, a full-resolution, single-channel ScreenSpaceAO texture like the one below is produced:

    So where and how does ScreenSpaceAO get applied to the lighting? Let's keep tracing and find out.

    Analyzing a RenderDoc frame capture reveals that several rendering stages use the ScreenSpaceAO texture:

    The image above shows that the stages compositing indirect lighting with AO, rendering standard deferred lights, and rendering the reflection environment and sky light all reference ScreenSpaceAO.

    The SSAO-related logic in the indirect-lighting/AO composition pass is as follows:

    // Engine/Shaders/Private/DiffuseIndirectComposite.usf
    
    Texture2D AmbientOcclusionTexture;
    SamplerState AmbientOcclusionSampler;
    
    void MainPS(float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
    {
        (......)
    
        // Sample the SSAO data.
        float DynamicAmbientOcclusion = 1.0f;
    #if DIM_APPLY_AMBIENT_OCCLUSION
        DynamicAmbientOcclusion = AmbientOcclusionTexture.SampleLevel(AmbientOcclusionSampler, BufferUV, 0).r;
    #endif
    
        // Compute the final AO: material AO * SSAO.
        float FinalAmbientOcclusion = GBuffer.GBufferAO * DynamicAmbientOcclusion;
    
        (......)
    
        {
            float AOMask = (GBuffer.ShadingModelID != SHADINGMODELID_UNLIT);
            // Interpolate the final AO using AOMask * AmbientOcclusionStaticFraction as the weight, then store the AO in the alpha channel instead of applying it directly to the RGB color.
            OutColor.a = lerp(1.0f, FinalAmbientOcclusion, AOMask * AmbientOcclusionStaticFraction);
        }
    }
    

    The SSAO-related code in standard deferred light rendering is as follows:

    // Engine/Shaders/Private/DeferredShadingCommon.ush
    
    FScreenSpaceData GetScreenSpaceData(float2 UV, bool bGetNormalizedNormal = true)
    {
        FScreenSpaceData Out;
    
        Out.GBuffer = GetGBufferData(UV, bGetNormalizedNormal);
        // Sample SSAO
        float4 ScreenSpaceAO = Texture2DSampleLevel(SceneTexturesStruct.ScreenSpaceAOTexture, SceneTexturesStruct_ScreenSpaceAOTextureSampler, UV, 0);
    
        // Use SSAO directly as the final AO.
        Out.AmbientOcclusion = ScreenSpaceAO.r;
    
        return Out;
    }
    
    FDeferredLightingSplit GetDynamicLightingSplit(..., float AmbientOcclusion, ...)
    {
        (......)
    
        FShadowTerms Shadow;
        // Use SSAO directly as the initial shadow value; shadow factors are multiplied on top later.
        Shadow.SurfaceShadow = AmbientOcclusion;
            
        (......)
        
        LightAccumulator_AddSplit( LightAccumulator, Lighting.Diffuse, Lighting.Specular, Lighting.Diffuse,
                              // Note: the light color is scaled by Shadow.SurfaceShadow here.
                                  LightColor * LightMask * Shadow.SurfaceShadow,
                                  bNeedsSeparateSubsurfaceLightAccumulation );
        (......)
    }
    

    The deferred lighting stage scales the light color by the combined AO/shadow factor, which therefore affects both diffuse and specular.
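
    Condensed into one sketch expression (not a verbatim engine line), the accumulation above amounts to:

    // FinalLighting = (Lighting.Diffuse + Lighting.Specular)
    //               * LightColor * LightMask * Shadow.SurfaceShadow;
    // where Shadow.SurfaceShadow starts at the SSAO value and is later
    // multiplied by the shadow factors.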

    The reflection environment and sky light are handled similarly and won't be analyzed redundantly here. To close this section, here is a comparison of UE's SSAO disabled vs. enabled (top: off, bottom: on):

    In addition, UE also supports a GTAO variant, which this article will not dissect.

    7.4.8 SSGI

    SSGI (Screen Space Global Illumination) is a GI technique that performs ray tracing against screen-space GBuffer data.

    By default, UE ships with SSGI disabled; it has to be enabled explicitly in the project settings:
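
    It can also be toggled at runtime from the console; assuming the 4.26 console variable names (quality corresponds to the CVarSSGIQuality read in the code below, clamped to 1..4):

    r.SSGI.Enable 1
    r.SSGI.Quality 2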

    With SSGI enabled, indirect lighting on surfaces in corners, crevices, and the like is increased, light leaking there is reduced, and the image becomes more believable.

    Top: SSGI off; bottom: SSGI on, with noticeably less light leaking under the table and chairs.

    UE's SSGI is done inside DiffuseIndirectAndAO, which means it also sits between the BasePass and the Lighting pass. SSGI consists of the following stages:

    • SSGI rendering stage:
      • First reduction of the previous frame's data (4 mip levels).
      • Second reduction of the previous frame's data (1 mip level, the lowest).
      • Compute the screen-space indirect diffuse.
    • Screen-space denoising:
      • Compress metadata.
      • Reconstruct the data.
      • Temporal accumulation denoising.
    • Composite SSGI with SSAO and other indirect lighting.

    7.4.8.1 SSGI Rendering

    The concrete SSGI implementation is laid out below following these steps, starting with the SSGI rendering stage:

    // Engine/Source/Runtime/Renderer/Private/ScreenSpaceRayTracing.cpp
    
    void RenderScreenSpaceDiffuseIndirect(
        FRDGBuilder& GraphBuilder, 
        const FSceneTextureParameters& SceneTextures,
        const FRDGTextureRef CurrentSceneColor,
        const FViewInfo& View,
        IScreenSpaceDenoiser::FAmbientOcclusionRayTracingConfig* OutRayTracingConfig,
        IScreenSpaceDenoiser::FDiffuseIndirectInputs* OutDenoiserInputs)
    {
        // Initialize quality, flags, sizes, etc.
        const int32 Quality = FMath::Clamp( CVarSSGIQuality.GetValueOnRenderThread(), 1, 4 );
        bool bHalfResolution = IsSSGIHalfRes();
    
        FIntPoint GroupSize;
        int32 RayCountPerPixel;
        GetSSRTGIShaderOptionsForQuality(Quality, &GroupSize, &RayCountPerPixel);
    
        FIntRect Viewport = View.ViewRect;
        if (bHalfResolution)
        {
            Viewport = FIntRect::DivideAndRoundUp(Viewport, 2);
        }
    
        RDG_EVENT_SCOPE(GraphBuilder, "SSGI %dx%d", Viewport.Width(), Viewport.Height());
    
        const FVector2D ViewportUVToHZBBufferUV(
            float(View.ViewRect.Width()) / float(2 * View.HZBMipmap0Size.X),
            float(View.ViewRect.Height()) / float(2 * View.HZBMipmap0Size.Y)
        );
    
        FRDGTexture* FurthestHZBTexture = GraphBuilder.RegisterExternalTexture(View.HZB);
        FRDGTexture* ClosestHZBTexture = GraphBuilder.RegisterExternalTexture(View.ClosestHZB);
    
        // Reproject and downsample the previous frame's color.
        FRDGTexture* ReducedSceneColor;
        FRDGTexture* ReducedSceneAlpha = nullptr;
        {
            // Number of leading mips to skip.
            const int32 DownSamplingMip = 1;
            // Number of mips.
            const int32 kNumMips = 5;
    
            bool bUseLeakFree = View.PrevViewInfo.ScreenSpaceRayTracingInput != nullptr;
    
            // Allocate ReducedSceneColor.
            {
                FIntPoint RequiredSize = SceneTextures.SceneDepthTexture->Desc.Extent / (1 << DownSamplingMip);
    
                int32 QuantizeMultiple = 1 << (kNumMips - 1);
                FIntPoint QuantizedSize = FIntPoint::DivideAndRoundUp(RequiredSize, QuantizeMultiple);
    
                FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
                    FIntPoint(QuantizeMultiple * QuantizedSize.X, QuantizeMultiple * QuantizedSize.Y),
                    PF_FloatR11G11B10,
                    FClearValueBinding::None,
                    TexCreate_ShaderResource | TexCreate_UAV);
                Desc.NumMips = kNumMips;
    
                ReducedSceneColor = GraphBuilder.CreateTexture(Desc, TEXT("SSRTReducedSceneColor"));
    
                if (bUseLeakFree)
                {
                    Desc.Format = PF_A8;
                    ReducedSceneAlpha = GraphBuilder.CreateTexture(Desc, TEXT("SSRTReducedSceneAlpha"));
                }
            }
    
            // Handle the first reduction pass (4 mip textures).
            
            // Set up the FSSRTPrevFrameReductionCS parameters.
            FSSRTPrevFrameReductionCS::FParameters DefaultPassParameters;
            {
                DefaultPassParameters.SceneTextures = SceneTextures;
                DefaultPassParameters.View = View.ViewUniformBuffer;
    
                DefaultPassParameters.ReducedSceneColorSize = FVector2D(
                    ReducedSceneColor->Desc.Extent.X, ReducedSceneColor->Desc.Extent.Y);
                DefaultPassParameters.ReducedSceneColorTexelSize = FVector2D(
                    1.0f / float(ReducedSceneColor->Desc.Extent.X), 1.0f / float(ReducedSceneColor->Desc.Extent.Y));
            }
    
            {
                FSSRTPrevFrameReductionCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FSSRTPrevFrameReductionCS::FParameters>();
                *PassParameters = DefaultPassParameters;
    
                FIntPoint ViewportOffset;
                FIntPoint ViewportExtent;
                FIntPoint BufferSize;
    
                // Leak-free handling or not.
                if (bUseLeakFree)
                {
                    BufferSize = View.PrevViewInfo.ScreenSpaceRayTracingInput->GetDesc().Extent;
                    ViewportOffset = View.ViewRect.Min; // TODO
                    ViewportExtent = View.ViewRect.Size();
    
                    PassParameters->PrevSceneColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.ScreenSpaceRayTracingInput);
                    PassParameters->PrevSceneColorSampler = TStaticSamplerState<SF_Point>::GetRHI();
    
                    PassParameters->PrevSceneDepth = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.DepthBuffer);
                    PassParameters->PrevSceneDepthSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
                }
                else
                {
                    BufferSize = View.PrevViewInfo.TemporalAAHistory.ReferenceBufferSize;
                    ViewportOffset = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Min;
                    ViewportExtent = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Size();
    
                    PassParameters->PrevSceneColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.TemporalAAHistory.RT[0]);
                    PassParameters->PrevSceneColorSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
                }
    
                PassParameters->PrevSceneColorPreExposureCorrection = View.PreExposure / View.PrevViewInfo.SceneColorPreExposure;
    
                PassParameters->PrevScreenPositionScaleBias = FVector4(
                    ViewportExtent.X * 0.5f / BufferSize.X,
                    -ViewportExtent.Y * 0.5f / BufferSize.Y,
                    (ViewportExtent.X * 0.5f + ViewportOffset.X) / BufferSize.X,
                    (ViewportExtent.Y * 0.5f + ViewportOffset.Y) / BufferSize.Y);
    
                // Create an output UAV for each mip.
                for (int32 MipLevel = 0; MipLevel < (PassParameters->ReducedSceneColorOutput.Num() - DownSamplingMip); MipLevel++)
                {
                    PassParameters->ReducedSceneColorOutput[DownSamplingMip + MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneColor, MipLevel));
                    if (ReducedSceneAlpha)
                        PassParameters->ReducedSceneAlphaOutput[DownSamplingMip + MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneAlpha, MipLevel));
                }
    
                FSSRTPrevFrameReductionCS::FPermutationDomain PermutationVector;
                PermutationVector.Set<FSSRTPrevFrameReductionCS::FLowerMips>(false); 
                PermutationVector.Set<FSSRTPrevFrameReductionCS::FLeakFree>(bUseLeakFree);
    
                // Add a CS pass to downsample the color and alpha.
                TShaderMapRef<FSSRTPrevFrameReductionCS> ComputeShader(View.ShaderMap, PermutationVector);
                FComputeShaderUtils::AddPass(
                    GraphBuilder,
                    RDG_EVENT_NAME("PrevFrameReduction(LeakFree=%i) %dx%d",
                        bUseLeakFree ? 1 : 0,
                        View.ViewRect.Width(), View.ViewRect.Height()),
                    ComputeShader,
                    PassParameters,
                    FComputeShaderUtils::GetGroupCount(View.ViewRect.Size(), 8));
            }
            
            // Handle the second reduction pass (only 1 mip texture).
    
            for (int32 i = 0; i < 1; i++)
            {
                int32 SrcMip = i * 3 + 2 - DownSamplingMip;
                int32 StartDestMip = SrcMip + 1;
                int32 Divisor = 1 << (StartDestMip + DownSamplingMip);
    
                FSSRTPrevFrameReductionCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FSSRTPrevFrameReductionCS::FParameters>();
                *PassParameters = DefaultPassParameters;
    
                PassParameters->HigherMipTexture = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateForMipLevel(ReducedSceneColor, SrcMip));
                if (bUseLeakFree)
                {
                    check(ReducedSceneAlpha);
                    PassParameters->HigherMipTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
                    PassParameters->HigherAlphaMipTexture = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateForMipLevel(ReducedSceneAlpha, SrcMip));
                    PassParameters->HigherAlphaMipTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
                }
                else
                {
                    PassParameters->HigherMipTextureSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
                }
    
                PassParameters->HigherMipDownScaleFactor = 1 << (DownSamplingMip + SrcMip);
    
                PassParameters->HigherMipBufferBilinearMax = FVector2D(
                    (0.5f * View.ViewRect.Width() - 0.5f) / float(ReducedSceneColor->Desc.Extent.X),
                    (0.5f * View.ViewRect.Height() - 0.5f) / float(ReducedSceneColor->Desc.Extent.Y));
    
                PassParameters->ViewportUVToHZBBufferUV = ViewportUVToHZBBufferUV;
                PassParameters->FurthestHZBTexture = FurthestHZBTexture;
                PassParameters->FurthestHZBTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
    
                for (int32 MipLevel = 0; MipLevel < PassParameters->ReducedSceneColorOutput.Num(); MipLevel++)
                {
                    PassParameters->ReducedSceneColorOutput[MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneColor, StartDestMip + MipLevel));
                    if (ReducedSceneAlpha)
                        PassParameters->ReducedSceneAlphaOutput[MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneAlpha, StartDestMip + MipLevel));
                }
    
                FSSRTPrevFrameReductionCS::FPermutationDomain PermutationVector;
                PermutationVector.Set<FSSRTPrevFrameReductionCS::FLowerMips>(true);
                PermutationVector.Set<FSSRTPrevFrameReductionCS::FLeakFree>(bUseLeakFree);
    
                // Second reduction pass
                TShaderMapRef<FSSRTPrevFrameReductionCS> ComputeShader(View.ShaderMap, PermutationVector);
                FComputeShaderUtils::AddPass(
                    GraphBuilder,
                    RDG_EVENT_NAME("PrevFrameReduction(LeakFree=%i) %dx%d",
                        bUseLeakFree ? 1 : 0,
                        View.ViewRect.Width() / Divisor, View.ViewRect.Height() / Divisor),
                    ComputeShader,
                    PassParameters,
                    FComputeShaderUtils::GetGroupCount(View.ViewRect.Size(), 8 * Divisor));
            }
        }
    
        {
            // Allocate the outputs.
            {
                FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
                    SceneTextures.SceneDepthTexture->Desc.Extent / (bHalfResolution ? 2 : 1),
                    PF_FloatRGBA,
                    FClearValueBinding::Transparent,
                    TexCreate_ShaderResource | TexCreate_UAV);
    
                OutDenoiserInputs->Color = GraphBuilder.CreateTexture(Desc, TEXT("SSRTDiffuseIndirect"));
    
                Desc.Format = PF_R16F;
                Desc.Flags |= TexCreate_RenderTargetable;
                OutDenoiserInputs->AmbientOcclusionMask = GraphBuilder.CreateTexture(Desc, TEXT("SSRTAmbientOcclusion"));
            }
        
            // Set up the FScreenSpaceDiffuseIndirectCS parameters.
            
            FScreenSpaceDiffuseIndirectCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenSpaceDiffuseIndirectCS::FParameters>();
    
            if (bHalfResolution)
            {
                PassParameters->PixelPositionToFullResPixel = 2.0f;
                PassParameters->FullResPixelOffset = FVector2D(0.5f, 0.5f); // TODO.
            }
            else
            {
                PassParameters->PixelPositionToFullResPixel = 1.0f;
                PassParameters->FullResPixelOffset = FVector2D(0.5f, 0.5f);
            }
    
            {
                PassParameters->ColorBufferScaleBias = FVector4(
                    0.5f * SceneTextures.SceneDepthTexture->Desc.Extent.X / float(ReducedSceneColor->Desc.Extent.X),
                    0.5f * SceneTextures.SceneDepthTexture->Desc.Extent.Y / float(ReducedSceneColor->Desc.Extent.Y),
                    -0.5f * View.ViewRect.Min.X / float(ReducedSceneColor->Desc.Extent.X),
                    -0.5f * View.ViewRect.Min.Y / float(ReducedSceneColor->Desc.Extent.Y));
    
                PassParameters->ReducedColorUVMax = FVector2D(
                    (0.5f * View.ViewRect.Width() - 0.5f) / float(ReducedSceneColor->Desc.Extent.X),
                    (0.5f * View.ViewRect.Height() - 0.5f) / float(ReducedSceneColor->Desc.Extent.Y));
            }
    
            PassParameters->FurthestHZBTexture = FurthestHZBTexture;
            PassParameters->FurthestHZBTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
            PassParameters->ColorTexture = ReducedSceneColor;
            PassParameters->ColorTextureSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
    
            PassParameters->HZBUvFactorAndInvFactor = FVector4(
                ViewportUVToHZBBufferUV.X,
                ViewportUVToHZBBufferUV.Y,
                1.0f / ViewportUVToHZBBufferUV.X,
                1.0f / ViewportUVToHZBBufferUV.Y );
    
            PassParameters->SceneTextures = SceneTextures;
            PassParameters->View = View.ViewUniformBuffer;
        
            PassParameters->IndirectDiffuseOutput = GraphBuilder.CreateUAV(OutDenoiserInputs->Color);
            PassParameters->AmbientOcclusionOutput = GraphBuilder.CreateUAV(OutDenoiserInputs->AmbientOcclusionMask);
            PassParameters->DebugOutput = CreateScreenSpaceRayTracingDebugUAV(GraphBuilder, OutDenoiserInputs->Color->Desc, TEXT("DebugSSGI"));
            PassParameters->ScreenSpaceRayTracingDebugOutput = CreateScreenSpaceRayTracingDebugUAV(GraphBuilder, OutDenoiserInputs->Color->Desc, TEXT("DebugSSGIMarshing"), true);
    
            FScreenSpaceDiffuseIndirectCS::FPermutationDomain PermutationVector;
            PermutationVector.Set<FScreenSpaceDiffuseIndirectCS::FQualityDim>(Quality);
    
            // Add the SSGI compute pass.
            TShaderMapRef<FScreenSpaceDiffuseIndirectCS> ComputeShader(View.ShaderMap, PermutationVector);
            FComputeShaderUtils::AddPass(
                GraphBuilder,
                RDG_EVENT_NAME("ScreenSpaceDiffuseIndirect(Quality=%d RayPerPixel=%d) %dx%d",
                    Quality, RayCountPerPixel, Viewport.Width(), Viewport.Height()),
                ComputeShader,
                PassParameters,
                FComputeShaderUtils::GetGroupCount(Viewport.Size(), GroupSize));
        }
    
        OutRayTracingConfig->ResolutionFraction = bHalfResolution ? 0.5f : 1.0f;
        OutRayTracingConfig->RayCountPerPixel = RayCountPerPixel;
    } // RenderScreenSpaceDiffuseIndirect()
    

    Next, let's analyze the CS shader used by the reduction passes:

    // Engine/Shaders/Private/SSRT/SSRTPrevFrameReduction.usf
    
    [numthreads(GROUP_TILE_SIZE, GROUP_TILE_SIZE, 1)]
    void MainCS(
        uint2 DispatchThreadId : SV_DispatchThreadID,
        uint2 GroupId : SV_GroupID,
        uint2 GroupThreadId : SV_GroupThreadID,
        uint GroupThreadIndex : SV_GroupIndex)
    {
        #if DIM_LOWER_MIPS
            float2 SceneBufferUV = HigherMipDownScaleFactor * (2.0 * float2(DispatchThreadId) + 1.0) * View.BufferSizeAndInvSize.zw;
        #else
            float2 SceneBufferUV = (float2(DispatchThreadId) + 0.5) * View.BufferSizeAndInvSize.zw;
        #endif
    
        SceneBufferUV = clamp(SceneBufferUV, View.BufferBilinearUVMinMax.xy, View.BufferBilinearUVMinMax.zw);
        float2 ViewportUV = BufferUVToViewportUV(SceneBufferUV);
        float2 ScreenPosition = ViewportUVToScreenPos(ViewportUV);
        
        float4 PrevColor;
        float WorldDepth;
    
        #if DIM_LOWER_MIPS // The second reduction pass takes this branch.
            #if DIM_LEAK_FREE
            {
                {
                    float HZBDeviceZ = FurthestHZBTexture.SampleLevel(FurthestHZBTextureSampler, ViewportUV * ViewportUVToHZBBufferUV, 2.0).r;
                    WorldDepth = ConvertFromDeviceZ(HZBDeviceZ);
                }
    
                float WorldDepthToPixelWorldRadius = GetTanHalfFieldOfView().x * View.ViewSizeAndInvSize.z * 100;
    
                float WorldBluringRadius = WorldDepthToPixelWorldRadius * WorldDepth;
                float InvSquareWorldBluringRadius = rcp(WorldBluringRadius * WorldBluringRadius);
    
                PrevColor = 0.0;
    
                // Reconstruct the world positions of the current pixel and its 4 neighbors from the depth buffer, and use squared-distance falloff between them as a weight to attenuate the color.
                UNROLL_N(4)
                for (uint i = 0; i < 4; i++)
                {
                    const float2 TexelOffset = float2(i % 2, i / 2) - 0.5;
    
                    // Sample UV
                    float2 HZBBufferUV = (ViewportUV + TexelOffset * HigherMipDownScaleFactor * View.ViewSizeAndInvSize.zw) * ViewportUVToHZBBufferUV;
                    // Sampled depth
                    float SampleDeviceZ = FurthestHZBTexture.SampleLevel(FurthestHZBTextureSampler, HZBBufferUV, 1.0).r;
                    // Distance between the current pixel depth and the sampled depth.
                    float SampleDist = WorldDepth - ConvertFromDeviceZ(SampleDeviceZ);
    
                    // Sample weight.
                    float SampleWeight = 0.25 * saturate(1 - SampleDist * SampleDist * InvSquareWorldBluringRadius);
    
                    float2 SampleUV = HigherMipDownScaleFactor * (2.0 * float2(DispatchThreadId) + 1.0 + TexelOffset) * 0.5 * ReducedSceneColorTexelSize;
                    SampleUV = min(SampleUV, HigherMipBufferBilinearMax);
    
                    float4 SampleColor = float4(
                        HigherMipTexture.SampleLevel(HigherMipTextureSampler, SampleUV, 0).rgb,
                        Texture2DSample_A8(HigherAlphaMipTexture,HigherAlphaMipTextureSampler, SampleUV));
    
                    // Accumulate the weighted color.
                    PrevColor += SampleColor * SampleWeight;
                }
            }
            #else
            {
                float2 HigherMipUV = HigherMipDownScaleFactor * (float2(DispatchThreadId) * 1.0 + 0.5) * ReducedSceneColorTexelSize;
                PrevColor = float4(HigherMipTexture.SampleLevel(HigherMipTextureSampler, HigherMipUV, 0).rgb, 1);
            }
            #endif
        #else // The first reduction pass takes this branch.
        {
            float DeviceZ = SampleDeviceZFromSceneTextures(SceneBufferUV);
            WorldDepth = ConvertFromDeviceZ(DeviceZ);
    
            // Screen-space motion vector of the current pixel.
            float4 ThisClip = float4(ScreenPosition, DeviceZ, 1);
            float4 PrevClip = mul(ThisClip, View.ClipToPrevClip);
            float2 PrevScreen = PrevClip.xy / PrevClip.w;
    
            bool bIsSky = WorldDepth > 100 * 1000;
    
            // Fetch the velocity.
            float4 EncodedVelocity = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, SceneBufferUV, 0);
            if (EncodedVelocity.x > 0.0)
            {
                PrevScreen = ThisClip.xy - DecodeVelocityFromTexture(EncodedVelocity).xy;
            }
    
            float2 PrevFrameUV = PrevScreen.xy * PrevScreenPositionScaleBias.xy + PrevScreenPositionScaleBias.zw;
    
            // Fetch the color (similar to the above)
            #if DIM_LEAK_FREE
            {
                float3 RefWorldPosition = ComputeTranslatedWorldPosition(ScreenPosition, WorldDepth, /* bIsPrevFrame = */ false);
    
                float NoV = dot(View.TranslatedWorldCameraOrigin - normalize(RefWorldPosition), GetGBufferDataFromSceneTextures(SceneBufferUV).WorldNormal);
                    
                float WorldDepthToPixelWorldRadius = GetTanHalfFieldOfView().x * View.ViewSizeAndInvSize.z * 100;
    
                float WorldBluringRadius = WorldDepthToPixelWorldRadius * WorldDepth;
                float InvSquareWorldBluringRadius = rcp(WorldBluringRadius * WorldBluringRadius);
    
                {
                    float2 SampleUV = PrevFrameUV;
                    
                    SampleUV = clamp(SampleUV, View.BufferBilinearUVMinMax.xy, View.BufferBilinearUVMinMax.zw);
    
                    float PrevDeviceZ = PrevSceneDepth.SampleLevel(PrevSceneDepthSampler, SampleUV, 0).r;
    
                    float3 SampleWorldPosition = ComputeTranslatedWorldPosition(PrevScreen.xy, ConvertFromDeviceZ(PrevDeviceZ), /* bIsPrevFrame = */ true);
    
                    float SampleDistSquare = length2(RefWorldPosition - SampleWorldPosition);
    
                    float SampleWeight = saturate(1 - SampleDistSquare * InvSquareWorldBluringRadius);
    
                    PrevColor = float4(PrevSceneColor.SampleLevel(PrevSceneColorSampler, SampleUV, 0).rgb * SampleWeight, SampleWeight);
                }
            }
            #else
            {
                PrevColor = float4(PrevSceneColor.SampleLevel(PrevSceneColorSampler, PrevFrameUV, 0).rgb, 1.0);
            }
            #endif
    
            PrevColor = -min(-PrevColor, 0.0);
    
            #if CONFIG_COLOR_TILE_CLASSIFICATION
            {
                if (bIsSky)
                    PrevColor = 0;
            }
            #endif
    
            // Correct the pre-exposure.
            #if USE_PREEXPOSURE
                PrevColor.rgb *= PrevSceneColorPreExposureCorrection;
            #endif
    
            // Apply the vignette.
            {
                float Vignette = min(ComputeHitVignetteFromScreenPos(ScreenPosition), ComputeHitVignetteFromScreenPos(PrevScreen));
                PrevColor *= Vignette;
            }
    
            (......)
        }
        #endif
    
        // Output mip 0
        #if DIM_LOWER_MIPS
        {
            ReducedSceneColorOutput_0[DispatchThreadId] = float4(PrevColor.rgb, 0);
    
            #if DIM_LEAK_FREE
                ReducedSceneAlphaOutput_0[DispatchThreadId] = PrevColor.a;
            #endif
        }
        #endif
    
        // Reduce the lower mip levels.
        {
            // Store the color into LDS.
            {
                SharedMemory[GROUP_PIXEL_COUNT * 0 | GroupThreadIndex] = (f32tof16(PrevColor.r) << 0) | (f32tof16(PrevColor.g) << 16);
                SharedMemory[GROUP_PIXEL_COUNT * 1 | GroupThreadIndex] = (f32tof16(PrevColor.b) << 0) | (f32tof16(PrevColor.a) << 16);
    
                #if DIM_LEAK_FREE
                    SharedFurthestDepth[GroupThreadIndex] = WorldDepth;
                #endif
            }
        
            GroupMemoryBarrierWithGroupSync();
    
            // Reduce the lower mip levels.
            UNROLL
            for (uint MipLevel = 1; MipLevel < 3; MipLevel++)
            {
                const uint ReductionAmount = 1 << MipLevel;
                const uint NumberPixelInMip = GROUP_PIXEL_COUNT / (ReductionAmount * ReductionAmount);
                
                if (GroupThreadIndex < NumberPixelInMip)
                {
                    uint2 OutputCoord = uint2(
                        GroupThreadIndex % (GROUP_TILE_SIZE / ReductionAmount),
                        GroupThreadIndex / (GROUP_TILE_SIZE / ReductionAmount));
    
                    // Ray marching uses the furthest HZB to avoid self-intersection, so stay conservative here.
                    #if DIM_LEAK_FREE
                    // Downsample the depth: take the maximum depth of this pixel and its 4 neighbors (the depth furthest from the camera).
                    float FurthestDepth;
                    {
                        UNROLL_N(2)
                        for (uint x = 0; x < 2; x++)
                        {
                            UNROLL_N(2)
                            for (uint y = 0; y < 2; y++)
                            {
                                uint2 Coord = OutputCoord * 2 + uint2(x, y);
                                uint LDSIndex = Coord.x + Coord.y * ((2 * GROUP_TILE_SIZE) / ReductionAmount);
    
                                float NeighborDepth = SharedFurthestDepth[LDSIndex];
    
                                if (x == 0 && y == 0)
                                    FurthestDepth = NeighborDepth;
                                else
                                    FurthestDepth = max(FurthestDepth, NeighborDepth);
                            }
                        }
                    }
    
                    float WorldDepthToPixelWorldRadius = GetTanHalfFieldOfView().x * View.ViewSizeAndInvSize.z * 100;
    
                    float WorldBluringRadius = WorldDepthToPixelWorldRadius * FurthestDepth;
                    float InvSquareWorldBluringRadius = rcp(WorldBluringRadius * WorldBluringRadius);
    
                    #endif
    
                    // Downsample the color, also weighting by the distance between the sampled depth and the current pixel depth.
                    float4 ReducedColor = 0;
    
                    UNROLL
                    for (uint x = 0; x < 2; x++)
                    {
                        UNROLL
                        for (uint y = 0; y < 2; y++)
                        {
                            uint2 Coord = OutputCoord * 2 + uint2(x, y);
                            uint LDSIndex = Coord.x + Coord.y * ((2 * GROUP_TILE_SIZE) / ReductionAmount);
    
                            uint Raw0 = SharedMemory[GROUP_PIXEL_COUNT * 0 | LDSIndex];
                            uint Raw1 = SharedMemory[GROUP_PIXEL_COUNT * 1 | LDSIndex];
    
                            float4 Color;
                            Color.r = f16tof32(Raw0 >> 0);
                            Color.g = f16tof32(Raw0 >> 16);
                            Color.b = f16tof32(Raw1 >> 0);
                            Color.a = f16tof32(Raw1 >> 16);
    
                            float SampleWeight = 1.0;
                            #if DIM_LEAK_FREE
                            {
                                float NeighborDepth = SharedFurthestDepth[LDSIndex];
                                float SampleDist = (FurthestDepth - NeighborDepth);
    
                                SampleWeight = saturate(1 - (SampleDist * SampleDist) * InvSquareWorldBluringRadius);
                            }
                            #endif
    
                            ReducedColor += Color * SampleWeight;
                        }
                    }
    
                    // Process and output the result.
                    
                    ReducedColor *= rcp(4.0);
    
                    uint2 OutputPosition = GroupId * (GROUP_TILE_SIZE / ReductionAmount) + OutputCoord;
    
                    if (MipLevel == 1)
                    {
                        ReducedSceneColorOutput_1[OutputPosition] = float4(ReducedColor.rgb, 0);
                        #if DIM_LEAK_FREE
                            ReducedSceneAlphaOutput_1[OutputPosition] = ReducedColor.a;
                        #endif
                    }
                    else if (MipLevel == 2)
                    {
                        ReducedSceneColorOutput_2[OutputPosition] = float4(ReducedColor.rgb, 0);
                        #if DIM_LEAK_FREE
                            ReducedSceneAlphaOutput_2[OutputPosition] = ReducedColor.a;
                        #endif
                    }
                    
                    SharedMemory[GROUP_PIXEL_COUNT * 0 | GroupThreadIndex] = (f32tof16(ReducedColor.r) << 0) | (f32tof16(ReducedColor.g) << 16);
                    SharedMemory[GROUP_PIXEL_COUNT * 1 | GroupThreadIndex] = (f32tof16(ReducedColor.b) << 0) | (f32tof16(ReducedColor.a) << 16);
    
                    #if DIM_LEAK_FREE
                    {
                        SharedFurthestDepth[GroupThreadIndex] = FurthestDepth;
                    }
                    #endif
                } // if (GroupThreadIndex < NumberPixelInMip)
            } // for (uint MipLevel = 1; MipLevel < 3; MipLevel++)
        }
    } // MainCS()
    

    After the processing above, downsampled color and depth textures with 5 mip levels have been produced:
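
    For intuition, with DownSamplingMip = 1 and kNumMips = 5 as above, the pyramid for a hypothetical 1920x1080 scene buffer looks roughly like this (illustrative sizes; the real extent is quantized up to a multiple of 16):

    // Mip 0: ~960x544  (1/2  resolution)
    // Mip 1: ~480x272  (1/4  resolution)
    // Mip 2: ~240x136  (1/8  resolution)
    // Mip 3: ~120x68   (1/16 resolution)
    // Mip 4: ~60x34    (1/32 resolution)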

    Let's continue with the SSGI tracing part. This pass takes the scene depth, HZB, GBuffer, and the downsampled textures generated in the previous stage as inputs; its CS shader is analyzed directly below:

    // Engine/Shaders/Private/SSRT/SSRTDiffuseIndirect.usf
    
    [numthreads(TILE_PIXEL_SIZE_X, TILE_PIXEL_SIZE_Y, CONFIG_RAY_COUNT)]
    void MainCS(
        uint2 GroupId : SV_GroupID,
        uint GroupThreadIndex : SV_GroupIndex)
    {
        // Wave ID within the thread group.
        uint GroupWaveIndex = GroupThreadIndex / 64;
        
        FSSRTTileInfos TileInfos;
        {
            const uint BinsAddress = TILE_PIXEL_COUNT * 2;
    
            uint GroupPixelId = GroupThreadIndex % TILE_PIXEL_COUNT;
            uint RaySequenceId = GroupThreadIndex / TILE_PIXEL_COUNT;
    
            // Compute TileCoord so that the compiler loads it as a scalar.
            uint2 TileCoord = GroupId / uint2(TILE_RES_DIVISOR / TILE_PIXEL_SIZE_X, TILE_RES_DIVISOR / TILE_PIXEL_SIZE_Y);
            TileInfos = LoadTileInfos(TileCoord);
    
            // Store GBuffer data into LDS
            {
                BRANCH
                if (RaySequenceId == 0)
                {
                    uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
                    uint2 PixelPosition = ComputePixelPosition(GroupId, GroupPixelOffset);
                    
                    float2 BufferUV;
                    float2 ScreenPos;
                    UpdateLane2DCoordinateInformations(PixelPosition, /* out */ BufferUV, /* out */ ScreenPos);
        
                    FGBufferData GBuffer = GetGBufferDataFromSceneTextures(BufferUV);
                    float3 N = mul(float4(GBuffer.WorldNormal, 0), View.TranslatedWorldToView).xyz;
                    float DeviceZ = SampleDeviceZFromSceneTextures(BufferUV);
    
                    bool bTraceRay = GBuffer.ShadingModelID != SHADINGMODELID_UNLIT;
                    
                    SharedMemory[TILE_PIXEL_COUNT * 0 | GroupPixelId] = CompressN(N);
                    SharedMemory[TILE_PIXEL_COUNT * 1 | GroupPixelId] = asuint(bTraceRay ? DeviceZ : -1.0);
                }
                else if (GroupWaveIndex == 1) // TODO.
                {
                    // Clears the bins
                    SharedMemory[BinsAddress | GroupPixelId] = 0;
                }
            }
            
            (......)
        }
        
        GroupMemoryBarrierWithGroupSync();
        
        // Cast the rays
        {
            uint GroupPixelId;
            uint RaySequenceId;
            uint CompressedN;
            float DeviceZ;
            bool bTraceRay;
            #if CONFIG_SORT_RAYS
            {
                uint Raw0 = SharedMemory[LANE_PER_GROUPS * 0 | GroupThreadIndex];
                uint Raw1 = SharedMemory[LANE_PER_GROUPS * 1 | GroupThreadIndex];
    
                // Decompress the ray data.
                RaySequenceId = Raw0 >> (24 + TILE_PIXEL_SIZE_X_LOG + TILE_PIXEL_SIZE_Y_LOG);
                GroupPixelId = (Raw0 >> 24) % TILE_PIXEL_COUNT;
                CompressedN = Raw0;
                DeviceZ = asfloat(Raw1);
                bTraceRay = asfloat(Raw1) > 0;
            }
            #else // !CONFIG_SORT_RAYS
            {
                GroupPixelId = GroupThreadIndex % TILE_PIXEL_COUNT;
                RaySequenceId = GroupThreadIndex / TILE_PIXEL_COUNT;
    
                uint Raw0 = SharedMemory[TILE_PIXEL_COUNT * 0 | GroupPixelId];
                uint Raw1 = SharedMemory[TILE_PIXEL_COUNT * 1 | GroupPixelId];
    
                CompressedN = Raw0;
                DeviceZ = asfloat(Raw1);
                bTraceRay = asfloat(Raw1) > 0;
            }
            #endif // !CONFIG_SORT_RAYS
    
            GroupMemoryBarrierWithGroupSync();
    
            #if DEBUG_RAY_COUNT
                float DebugRayCount = 0.0;
            #endif
            uint2 CompressedColor;
    
            BRANCH
            if (bTraceRay) // only cast a ray when actually needed
            {
                // Compute coordinates, UV, and screen position.
                uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
                uint2 PixelPosition = ComputePixelPosition(GroupId, GroupPixelOffset);
        
                float2 BufferUV;
                float2 ScreenPos;
                UpdateLane2DCoordinateInformations(PixelPosition, /* out */ BufferUV, /* out */ ScreenPos);
        
                // Randomly sample the ray direction.
                uint2 RandomSeed = ComputeRandomSeed(PixelPosition);
                float2 E = Hammersley16(RaySequenceId, CONFIG_RAY_COUNT, RandomSeed);        
                float3 L = ComputeL(DecompressN(CompressedN), E);
                
                // Step offset
                float StepOffset = InterleavedGradientNoise(PixelPosition + 0.5, View.StateFrameIndexMod8);
        
                #if !SSGI_TRACE_CONE
                    StepOffset -= 0.9;
                #endif
    
                bool bDebugPrint = all(PixelPosition == uint2(View.ViewSizeAndInvSize.xy) / 2);
    
                // Initialize the ray.
                FSSRTRay Ray = InitScreenSpaceRay(ScreenPos, DeviceZ, L);
                
                float Level;
                float3 HitUVz;
                bool bHit;
    
                #if !CONFIG_SORT_RAYS
                // Early out if tile classification can prove the ray cannot hit anything.
                bool bEarlyOut = TestRayEarlyReturn(TileInfos, Ray);
                
                BRANCH
                if (bEarlyOut)
                {
                    bHit = false;
                    Level = 0;
                    HitUVz = 0;
                }
                else
                #endif
                {
                    // HZB screen-space ray tracing (already analyzed in the SSAO discussion)
                    bHit = CastScreenSpaceRay(
                        FurthestHZBTexture, FurthestHZBTextureSampler,
                        Ray, 1, CONFIG_RAY_STEPS, StepOffset,
                        HZBUvFactorAndInvFactor, bDebugPrint,
                        /* out */ HitUVz,
                        /* out */ Level);
                }
                
                // If a hit is found, compute the weight, sample the color, apply the weight to the color, and accumulate.
                BRANCH
                if (bHit)
                {
                    float2 ReducedColorUV =  HitUVz.xy * ColorBufferScaleBias.xy + ColorBufferScaleBias.zw;
                    ReducedColorUV = min(ReducedColorUV, ReducedColorUVMax);
    
                    float4 SampleColor = ColorTexture.SampleLevel(ColorTextureSampler, ReducedColorUV, Level);
                    
                    float SampleColorWeight = 1.0;
    
                    // Backface modulation at the hit surface
                    #if CONFIG_BACKFACE_MODULATION
                    {
                        float3 SampleNormal = GetGBufferDataFromSceneTextures(HitUVz.xy).WorldNormal;
                        
                        uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
                        uint2 PixelPosition = ComputePixelPosition(GroupId, GroupPixelOffset);
    
                        uint2 RandomSeed = ComputeRandomSeed(PixelPosition);
    
                        float2 E = Hammersley16(RaySequenceId, CONFIG_RAY_COUNT, RandomSeed);
            
                        float3 L = ComputeL(DecompressN(CompressedN), E);
            
                        SampleColorWeight *= saturate( 1 - dot( SampleNormal, L ) );
                    }
                    #endif
                    
                    #if CONFIG_RAY_COUNT > 1
                        SampleColorWeight *= rcp( 1 + Luminance(SampleColor.rgb) );
                    #endif
                    
                    // Apply the weight to the color.
                    float3 DiffuseColor = SampleColor.rgb * SampleColorWeight;
                    float AmbientOcclusion = 1.0;
                    
                    #if CONFIG_COLOR_TILE_CLASSIFICATION
                    {
                        float Lumi = Luminance(DiffuseColor.rgb);
                        AmbientOcclusion *= saturate(Lumi / 0.25);
                    }
                    #endif
    
                    // Pack the RGB color and AO into the X/Y channels.
                    CompressedColor.x = asuint(f32tof16(DiffuseColor.r) << 16 | f32tof16(DiffuseColor.g));
                    CompressedColor.y = asuint(f32tof16(DiffuseColor.b) << 16 | f32tof16(AmbientOcclusion));
                }
                else
                {
                    CompressedColor = uint2(0, 0);
                }
                
            }
            else if (!bTraceRay)
            {
                CompressedColor = uint2(0, 0);
            }
            
            uint DestPos = GroupPixelId + RaySequenceId * TILE_PIXEL_COUNT;
            
        // Store the packed color and AO data into LDS.
            SharedMemory[LANE_PER_GROUPS * 0 | DestPos] = CompressedColor.x;
            SharedMemory[LANE_PER_GROUPS * 1 | DestPos] = CompressedColor.y;
        }
        
        GroupMemoryBarrierWithGroupSync();
        
        // Unpack the LDS data and write it to the UAVs.
        BRANCH
        if (GroupThreadIndex < TILE_PIXEL_COUNT)
        {
            const uint GroupPixelId = GroupThreadIndex;
        
            float3 DiffuseColor = 0;
            float AmbientOcclusion = 0;
    
            // LDS holds the data of multiple rays for the current pixel; unpack and accumulate them here.
            UNROLL
            for (uint RaySequenceId = 0; RaySequenceId < CONFIG_RAY_COUNT; RaySequenceId++)
            {
                uint SrcPos = GroupPixelId + RaySequenceId * TILE_PIXEL_COUNT;
    
                uint Row0 = SharedMemory[LANE_PER_GROUPS * 0 | SrcPos];
                uint Row1 = SharedMemory[LANE_PER_GROUPS * 1 | SrcPos];
    
                DiffuseColor.r += f16tof32(Row0 >> 16);
                DiffuseColor.g += f16tof32(Row0 >>  0);
                DiffuseColor.b += f16tof32(Row1 >> 16);
                AmbientOcclusion += f16tof32(Row1 >> 0);
            }
    
            // Normalize the color and AO (the rcp(1 - Luminance) below undoes the per-sample 1 / (1 + Luminance) firefly weighting applied earlier).
            #if CONFIG_RAY_COUNT > 1
            {
                DiffuseColor *= rcp(float(CONFIG_RAY_COUNT));
                AmbientOcclusion *= rcp(float(CONFIG_RAY_COUNT));
    
                DiffuseColor *= rcp( 1 - Luminance(DiffuseColor) );
            }    
            #endif
    
            DiffuseColor *= View.IndirectLightingColorScale;
            AmbientOcclusion = 1 - AmbientOcclusion;
    
            // Write the results to the UAVs.
            {
                uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
                uint2 OutputPixelCoordinate = ComputePixelPosition(GroupId, GroupPixelOffset);
    
                IndirectDiffuseOutput[OutputPixelCoordinate] = float4(DiffuseColor, 1.0);
                AmbientOcclusionOutput[OutputPixelCoordinate] = AmbientOcclusion;
            }
        } // if (GroupThreadIndex < TILE_PIXEL_COUNT)
    } // MainCS()
    

    As can be seen, SSGI's sample generation, ray tracing, weight computation, and result accumulation are quite similar to SSAO's. The computation above finally outputs noisy color and AO textures:
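
    One note before denoising: ComputeL above (not listed here) turns the 2D random sample E into a ray direction; for diffuse GI such directions are cosine-distributed over the hemisphere around the normal. A minimal sketch of that mapping, assuming a standard cosine-weighted hemisphere sample (illustrative only, not UE's exact implementation):

    // Sketch: map a uniform random E in [0,1)^2 to a cosine-weighted direction around N.
    float3 ComputeL_Sketch(float3 N, float2 E)
    {
        const float PI = 3.14159265f;
        float Phi = 2.0f * PI * E.x;
        float CosTheta = sqrt(E.y);          // pdf proportional to cos(theta)
        float SinTheta = sqrt(1.0f - E.y);
        float3 L = float3(SinTheta * cos(Phi), SinTheta * sin(Phi), CosTheta);

        // Build an orthonormal basis around N and transform the local direction into it.
        float3 Up = abs(N.z) < 0.999f ? float3(0, 0, 1) : float3(1, 0, 0);
        float3 TangentX = normalize(cross(Up, N));
        float3 TangentY = cross(N, TangentX);
        return TangentX * L.x + TangentY * L.y + N * L.z;
    }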

    7.4.8.2 SSGI Denoising

    The noise arises because the per-pixel sample count is low, so the variance between the estimate and the true value is large, hence the denoising steps that follow. Denoising consists of three passes: metadata compression, data reconstruction, and temporal accumulation. The metadata compression pass takes depth, normals, and other GBuffer data as inputs; its CS is analyzed below:

    // Engine/Shaders/Private/ScreenSpaceDenoise/SSDCompressMetadata.usf
    
    [numthreads(TILE_PIXEL_SIZE, TILE_PIXEL_SIZE, 1)]
    void MainCS(uint2 DispatchThreadId : SV_DispatchThreadID)
    {
        // Compute the UV and screen position.
        float2 SceneBufferUV = DispatchThreadId * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
        float2 ViewportUV = BufferUVToViewportUV(SceneBufferUV);
        float2 ScreenPosition = ViewportUVToScreenPos(ViewportUV);
    
        // Fetch the scene metadata.
        FSSDSampleSceneInfos SceneMetadata = FetchCurrentSceneInfosFromGBuffer(ScreenPosition, SceneBufferUV);
    
        // Compress the metadata.
        FSSDCompressedSceneInfos CompressedMetadata = CompressSampleSceneInfo(DIM_METADATA_LAYOUT, SceneMetadata);
    
        // No need to keep DispatchThreadId, while SceneBufferUV is arround at highest VGPR peak.
        uint2 OutputPixelPostion = uint2(SceneBufferUV * BufferUVToOutputPixelPosition);
    
        // Store the compressed metadata.
        BRANCH
        if (all(OutputPixelPostion < ViewportMax))
        {
            CompressedMetadataOutput_0[OutputPixelPostion] = CompressedMetadata.VGPR[0];
        }
    } // MainCS
    

    What exactly is the scene metadata? The answer can be found in FetchCurrentSceneInfosFromGBuffer:

    FSSDSampleSceneInfos FetchCurrentSceneInfosFromGBuffer(float2 ScreenPosition, float2 BufferUV)
    {
        float DeviceZ = SampleDeviceZFromSceneTextures(BufferUV);
        FGBufferData GBufferData = GetGBufferDataFromSceneTextures(BufferUV);
        
        // Fill in the scene sample info.
        FSSDSampleSceneInfos Infos = CreateSampleSceneInfos();
        Infos.ScreenPosition = ScreenPosition;
        Infos.DeviceZ = DeviceZ;
        Infos.WorldDepth = GBufferData.Depth;
        Infos.WorldNormal = GBufferData.WorldNormal;
        Infos.Roughness = GBufferData.Roughness;
        
        // Compute the translated world position.
        {
            float2 ClipPosition = ScreenPosition * (View.ViewToClip[3][3] < 1.0f ? Infos.WorldDepth : 1.0f);
            Infos.TranslatedWorldPosition = mul(float4(ClipPosition, Infos.WorldDepth, 1), View.ScreenToTranslatedWorld).xyz;
        }
        
        // Compute the view-space normal.
        Infos.ViewNormal = mul(float4(Infos.WorldNormal, 0), View.TranslatedWorldToView).xyz;
    
        return Infos;
    }
    

    The so-called scene metadata is simply the current pixel's screen position, device depth, world depth, world normal, roughness, translated world position, view-space normal, and so on. As for how they are compressed, we have to step into CompressSampleSceneInfo:

    // Engine/Shaders/Private/ScreenSpaceDenoise/SSDMetadata.ush
    
    FSSDCompressedSceneInfos CompressSampleSceneInfo(
        const uint CompressedLayout,
        FSSDSampleSceneInfos Infos)
    {
        FSSDCompressedSceneInfos CompressedInfos = CreateCompressedSceneInfos();
    
        (......)
        
        // Compress the depth and the view-space normal. (This branch is taken by default.)
        else if (CompressedLayout == METADATA_BUFFER_LAYOUT_DEPTH_VIEWNORMAL)
        {
            CompressedInfos.VGPR[0] = CompressDevizeZAndN(Infos.DeviceZ, Infos.ViewNormal);
        }
        
        (......)
    
        return CompressedInfos;
    }
    

    In the metadata compression pass involved in SSGI, what gets compressed is the depth and the view-space normal, done by CompressDevizeZAndN:

    uint CompressDevizeZAndN(float DevizeZ, float3 N)
    {
        uint FaceN;
        // Compress the normal. (Analyzed below.)
        EncodeNormal(/* inout */ N, /* out */ FaceN);
    
        // Convert the normal's xy from float to uint.
        uint2 FaceCood = uint2(clamp(round(127.0 * N.xy), 0, 127.0));
        // Pack the normal and depth into a 32-bit uint.
        uint Compressed = f32tof16(DevizeZ) | (FaceN << 15) | (FaceCood.x << 18) | (FaceCood.y << 25);
        return Compressed;
    }
    
    // Engine/Shaders/Private/DeferredShadingCommon.ush
        
    void EncodeNormal( inout float3 N, out uint Face )
    {
        // The default axis is z.
        uint Axis = 2;
        // If |N.x| is the largest component, switch the axis to x.
        if( abs(N.x) >= abs(N.y) && abs(N.x) >= abs(N.z) )
        {
            Axis = 0;
        }
        // If |N.y| is the largest component, switch the axis to y.
        else if( abs(N.y) > abs(N.z) )
        {
            Axis = 1;
        }
        Face = Axis * 2;
    
        // Swizzle the normal's components according to the axis.
        N = Axis == 0 ? N.yzx : N;
        N = Axis == 1 ? N.xzy : N;
        
        // Remap the normal into [0, 1]
        float MaxAbs = 1.0 / sqrt(2.0);
        Face += N.z > 0 ? 0 : 1;
        N.xy *= N.z > 0 ? 1 : -1;
        N.xy = N.xy * (0.5 / MaxAbs) + 0.5;
    }
    

    As shown above, metadata compression packs the scene depth and the view-space normal into a 32-bit unsigned integer: the depth occupies the first 15 bits, and the normal's face index, X, and Y occupy the following 3, 7, and 7 bits respectively. The compressed metadata output texture looks like this (values remapped for display):
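
    A hypothetical unpack of that layout (for illustration only; the real decode lives in SSDMetadata.ush):

    // bits 0..14 : device Z as f16 (the sign bit is always 0 for DeviceZ >= 0)
    // bits 15..17: face index (dominant axis * 2 + z-sign)
    // bits 18..24: quantized N.x, bits 25..31: quantized N.y
    float  DevizeZ = f16tof32(Compressed & 0x7FFF);
    uint   FaceN   = (Compressed >> 15) & 0x7;
    float2 FaceXY  = float2((Compressed >> 18) & 0x7F, (Compressed >> 25) & 0x7F) / 127.0f;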

    The data reconstruction pass takes the compressed metadata and the noisy color and AO textures as inputs. Its CS shader is analyzed directly below (simplified based on a RenderDoc capture):

    // Engine/Shaders/Private/ScreenSpaceDenoise/SSDSpatialAccumulation.usf
    
    [numthreads(TILE_PIXEL_SIZE, TILE_PIXEL_SIZE, 1)]
    void MainCS(
        uint2 DispatchThreadId : SV_DispatchThreadID,
        uint2 GroupId : SV_GroupID,
        uint2 GroupThreadId : SV_GroupThreadID,
        uint GroupThreadIndex : SV_GroupIndex)
    {
    #if CONFIG_SIGNAL_INPUT_TEXTURE_TYPE == SIGNAL_TEXTURE_TYPE_FLOAT4
        Texture2D Signal_Textures_0 = SignalInput_Textures_0;
        Texture2D Signal_Textures_1 = SignalInput_Textures_1;
        Texture2D Signal_Textures_2 = SignalInput_Textures_2;
        Texture2D Signal_Textures_3 = SignalInput_Textures_3;
    #else
        (......)
    #endif
    
        // Compute the UV.
        float2 SceneBufferUV = DispatchThreadId * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
        if (true)
        {
            SceneBufferUV = clamp(SceneBufferUV, BufferBilinearUVMinMax.xy, BufferBilinearUVMinMax.zw);
        }
        
        // Read the related metadata.
        FSSDCompressedSceneInfos CompressedRefSceneMetadata;
        FSSDSampleSceneInfos RefSceneMetadata;
        {
            CompressedRefSceneMetadata = SampleCompressedSceneMetadata(
                /* bPrevFrame = */ false,
                SceneBufferUV, BufferUVToBufferPixelCoord(SceneBufferUV));
            
            float2 ScreenPosition = DenoiserBufferUVToScreenPosition(SceneBufferUV);
        
            RefSceneMetadata = UncompressSampleSceneInfo(
                CONFIG_METADATA_BUFFER_LAYOUT, /* bPrevFrame = */ false,
                ScreenPosition, CompressedRefSceneMetadata);
        }
    
        // Sample the related signal data.
        #if !CONFIG_UPSCALE || 1
            FSSDSignalArray RefSamples;
            FSSDSignalFrequencyArray RefFrequencies;
            SampleMultiplexedSignals(
                Signal_Textures_0,
                Signal_Textures_1,
                Signal_Textures_2,
                Signal_Textures_3,
                GlobalPointClampedSampler,
                CONFIG_SIGNAL_INPUT_LAYOUT,
                /* MultiplexedSampleId = */ 0,
                /* bNormalizeSample = */ CONFIG_NORMALIZE_INPUT != 0,
                SceneBufferUV,
                /* out */ RefSamples,
                /* out */ RefFrequencies);
            
            #if CONFIG_NORMALIZE_INPUT
                FSSDSignalArray NormalizedRefSamples = RefSamples;
            #else
                // TODO(Denoiser): Decode twice instead.
                FSSDSignalArray NormalizedRefSamples = NormalizeToOneSampleArray(RefSamples);
            #endif
        #endif
    
        // Kernel spread factor (scaled when upscaling).
        #if CONFIG_UPSCALE
            float KernelSpreadFactor = UpscaleFactor;
        #elif !CONFIG_CUSTOM_SPREAD_FACTOR
            const float KernelSpreadFactor = 1;
        #endif
    
        // Compute the requested sample count.
        float RequestedSampleCount = 1024;
        
        #if CONFIG_SAMPLE_SET == SAMPLE_SET_NONE
            RequestedSampleCount = 1;
        #elif CONFIG_SAMPLE_COUNT_POLICY == SAMPLE_COUNT_POLICY_DISABLED
            // NOP
        #elif CONFIG_SAMPLE_COUNT_POLICY == SAMPLE_COUNT_POLICY_SAMPLE_ACCUMULATION_BASED
        {
            #if CONFIG_SIGNAL_BATCH_SIZE != 1
                #error Unable to support more than one signal.
            #endif
            RequestedSampleCount = clamp(TARGETED_SAMPLE_COUNT / RefSamples.Array[0].SampleCount, 1, MaxSampleCount);
        }
        #else
            #error Unknown policy to control the number of samples.
        #endif
    
        // Aliases of kernel members.
        #if (CONFIG_SAMPLE_SET == SAMPLE_SET_STACKOWIAK_4_SETS) && CONFIG_VGPR_OPTIMIZATION
            float2 KernelBufferUV;
            uint SampleTrackId;
        #endif
    
        // Spatially accumulate the input.
        FSSDSignalAccumulatorArray SignalAccumulators;
        {
            FSSDKernelConfig KernelConfig = CreateKernelConfig();
    
            #if DEBUG_OUTPUT
            {
                KernelConfig.DebugPixelPosition = DispatchThreadId;
                KernelConfig.DebugEventCounter = 0;
            }
            #endif
    
            // Fill the kernel config.
            KernelConfig.SampleSet = CONFIG_SAMPLE_SET;
            KernelConfig.SampleSubSetId = CONFIG_SAMPLE_SUBSET;
            KernelConfig.BufferLayout = CONFIG_SIGNAL_INPUT_LAYOUT;
            KernelConfig.MultiplexedSignalsPerSignalDomain = CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
            KernelConfig.NeighborToRefComputation = NEIGHBOR_TO_REF_LOWEST_VGPR_PRESSURE;
            KernelConfig.bUnroll = CONFIG_SAMPLE_SET != SAMPLE_SET_STACKOWIAK_4_SETS;
            KernelConfig.bDescOrder = CONFIG_OUTPUT_MODE == OUTPUT_MODE_DRB;
            KernelConfig.BilateralDistanceComputation = CONFIG_BILATERAL_DISTANCE_COMPUTATION;
            KernelConfig.WorldBluringDistanceMultiplier = CONFIG_BILATERAL_DISTANCE_MULTIPLIER;
            KernelConfig.bNormalizeSample = CONFIG_NORMALIZE_INPUT != 0;
            KernelConfig.bSampleKernelCenter = CONFIG_UPSCALE;
            KernelConfig.bForceKernelCenterAccumulation = true;
            KernelConfig.bClampUVPerMultiplexedSignal = CONFIG_CLAMP_UV_PER_SIGNAL != 0;
    
            // Reconstruct spherical harmonics from 1 spp.
            KernelConfig.bComputeSampleColorSH = DIM_STAGE == STAGE_RECONSTRUCTION && DIM_MULTI_SPP == 0;
            
            // Fill the color spaces.
            {
                UNROLL_N(SIGNAL_ARRAY_SIZE)
                for (uint MultiplexId = 0; MultiplexId < SIGNAL_ARRAY_SIZE; MultiplexId++)
                {
                    KernelConfig.BufferColorSpace[MultiplexId] = CONFIG_INPUT_COLOR_SPACE;
                    KernelConfig.AccumulatorColorSpace[MultiplexId] = CONFIG_ACCUMULATION_COLOR_SPACE;
                }
            }
    
            // Set the bilateral filtering preset.
            SetBilateralPreset(CONFIG_BILATERAL_PRESET, /* inout */ KernelConfig);
    
            // SGPRs
            KernelConfig.BufferSizeAndInvSize = BufferSizeAndInvSize;
            KernelConfig.BufferBilinearUVMinMax = BufferBilinearUVMinMax;
            KernelConfig.KernelSpreadFactor = KernelSpreadFactor;
            KernelConfig.HarmonicPeriode = HarmonicPeriode;
    
            (......)
    
            // VGPRs
            KernelConfig.BufferUV = SceneBufferUV; 
            {
                #if CONFIG_REF_METADATA_COMPRESSION == CONFIG_METADATA_BUFFER_LAYOUT
                    // Straight up plumb down the compress layout to save any VALU.
                    KernelConfig.CompressedRefSceneMetadata = CompressedRefSceneMetadata;
                #else
                    // Recompress the reference scene metadata
                    KernelConfig.CompressedRefSceneMetadata = CompressSampleSceneInfo(CONFIG_REF_METADATA_COMPRESSION, RefSceneMetadata);
                #endif
                KernelConfig.RefBufferUV = SceneBufferUV;
                KernelConfig.RefSceneMetadataLayout = CONFIG_REF_METADATA_COMPRESSION;
            }
            KernelConfig.HammersleySeed = Rand3DPCG16(int3(SceneBufferUV * BufferUVToOutputPixelPosition, View.StateFrameIndexMod8)).xy;
    
            (......)
    
            // Create the uncompressed accumulators.
            FSSDSignalAccumulatorArray UncompressedAccumulators = CreateSignalAccumulatorArray();
    
            // Manually force-accumulate the kernel's center sample when not upscaling.
            if (!KernelConfig.bSampleKernelCenter && !KernelConfig.bDescOrder)
            {
                // SIGNAL_ARRAY_SIZE defaults to 1, i.e. 1 spp.
                UNROLL_N(SIGNAL_ARRAY_SIZE)
                for (uint SignalMultiplexId = 0; SignalMultiplexId < SIGNAL_ARRAY_SIZE; SignalMultiplexId++)
                {
                    const uint BatchedSignalId = ComputeSignalBatchIdFromSignalMultiplexId(KernelConfig, SignalMultiplexId);
                    FSSDSignalDomainKnowledge DomainKnowledge = GetSignalDomainKnowledge(BatchedSignalId);
    
                    uint2 RefPixelCoord = floor(KernelConfig.BufferUV * KernelConfig.BufferSizeAndInvSize.xy);
                    // Sample the data.
                    FSSDSignalSample CenterSample = TransformSignalSampleForAccumulation(
                        KernelConfig,
                        SignalMultiplexId,
                        RefSceneMetadata,
                        RefSamples.Array[SignalMultiplexId],
                        RefPixelCoord);
                    
                    // Sample accumulation infos.
                    FSSDSampleAccumulationInfos SampleInfos;
                    SampleInfos.Sample = CenterSample;
                    SampleInfos.Frequency = RefFrequencies.Array[SignalMultiplexId];
                    SampleInfos.FinalWeight = 1.0;
                    SampleInfos.InvFrequency = GetSignalWorldBluringRadius(SampleInfos.Frequency, RefSceneMetadata, DomainKnowledge);
                    
                    if (KernelConfig.BilateralDistanceComputation == SIGNAL_WORLD_FREQUENCY_PRECOMPUTED_BLURING_RADIUS)
                    {
                        SampleInfos.InvFrequency = SampleInfos.Frequency.WorldBluringRadius;
                    }
    
                    // Accumulate the sample.
                    AccumulateSample(
                        /* inout */ UncompressedAccumulators.Array[SignalMultiplexId],
                        SampleInfos);
                }
            }
    
            #if CONFIG_SAMPLE_SET == SAMPLE_SET_STACKOWIAK_4_SETS
            {
                KernelConfig.SampleCount = clamp(uint(RequestedSampleCount) / kStackowiakSampleSetCount, 1, MaxSampleCount);
    
                (......)
                {
                    // Place the kernel center at the center of the quad; the half-pixel shift is done in the sample offsets.
                    KernelConfig.BufferUV = float2(DispatchThreadId | 1) * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
    
                    // Id of the pixel in the quad. This is to match hard coded first samples of the sample set.
                    KernelConfig.SampleTrackId = ((DispatchThreadId.x & 1) | ((DispatchThreadId.y & 1) << 1));
                }
                (......)
            }
            #elif CONFIG_SAMPLE_SET == SAMPLE_SET_DIRECTIONAL_RECT || CONFIG_SAMPLE_SET == SAMPLE_SET_DIRECTIONAL_ELLIPSE
            (......)
            #endif // CONFIG_SAMPLE_SET == SAMPLE_SET_DIRECTIONAL_*
            
            FSSDCompressedSignalAccumulatorArray CompressedAccumulators = CompressAccumulatorArray(UncompressedAccumulators, CONFIG_ACCUMULATOR_VGPR_COMPRESSION);
    
            if (1)
            {
                // Accumulate the kernel.
                AccumulateKernel(
                    KernelConfig,
                    Signal_Textures_0,
                    Signal_Textures_1,
                    Signal_Textures_2,
                    Signal_Textures_3,
                    /* inout */ UncompressedAccumulators,
                    /* inout */ CompressedAccumulators);
            }
    
            (......)
            
            // When accumulating in descending order, manually sample the kernel center after all other accumulation.
            if (!KernelConfig.bSampleKernelCenter && KernelConfig.bDescOrder)
            {
                KernelConfig.BufferUV = SceneBufferUV;
    
                SampleAndAccumulateCenterSampleAsItsOwnCluster(
                    KernelConfig,
                    Signal_Textures_0,
                    Signal_Textures_1,
                    Signal_Textures_2,
                    Signal_Textures_3,
                    /* inout */ UncompressedAccumulators,
                    /* inout */ CompressedAccumulators);
            }
            
            #if CONFIG_ACCUMULATOR_VGPR_COMPRESSION == ACCUMULATOR_COMPRESSION_DISABLED
                SignalAccumulators = UncompressedAccumulators;
            #else
                SignalAccumulators = UncompressAccumulatorArray(CompressedAccumulators, CONFIG_ACCUMULATOR_VGPR_COMPRESSION);
            #endif
        }
    
        (......)
    
        // Convert the spatially accumulated signals into multiplexed signals according to different output modes.
        uint MultiplexCount = 1;
        FSSDSignalArray OutputSamples = CreateSignalArrayFromScalarValue(0.0);
        FSSDSignalFrequencyArray OutputFrequencies = CreateInvalidSignalFrequencyArray();
        {
            {
                MultiplexCount = CONFIG_SIGNAL_BATCH_SIZE;
                
                UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
                for (uint MultiplexId = 0; MultiplexId < CONFIG_SIGNAL_BATCH_SIZE; MultiplexId++)
                {
                    UncompressSignalAccumulator(/* inout */ SignalAccumulators.Array[MultiplexId]);
    
                    OutputSamples.Array[MultiplexId] = SignalAccumulators.Array[MultiplexId].Moment1;
                    
                    // Output the minimal inverse frequency as the new world blurring radius for subsequent passes.
                    OutputFrequencies.Array[MultiplexId] = SignalAccumulators.Array[MultiplexId].MinFrequency;
                }
            }
            (......)
        }
        
        (......)
    
        // Compute the output pixel position.
        uint2 OutputPixelPostion;
        #if CONFIG_VGPR_OPTIMIZATION && !CONFIG_UPSCALE // TODO(Denoiser)
        {
            OutputPixelPostion = (uint2(KernelBufferUV * BufferUVToOutputPixelPosition) & ~0x1) | (uint2(SampleTrackId, SampleTrackId >> 1) & 0x1);
            (......)
        }
        #else
            OutputPixelPostion = ViewportMin + DispatchThreadId;
        #endif 
    
        BRANCH
        if (all(OutputPixelPostion < ViewportMax))
        {
            // Output the multiplexed signal.
            (......)
            {
                OutputMultiplexedSignal(
                    SignalOutput_UAVs_0,
                    SignalOutput_UAVs_1,
                    SignalOutput_UAVs_2,
                    SignalOutput_UAVs_3,
                    CONFIG_SIGNAL_OUTPUT_LAYOUT,
                    MultiplexCount,
                    OutputPixelPostion,
                    OutputSamples,
                    OutputFrequencies);
            }
        }
    } // MainCS
    

    After reconstruction, the noise in the output color and AO is reduced:

    Color and AO before and after reconstruction. Left: before reconstruction; right: after reconstruction.

    Since the image still shows obvious noise after reconstruction, the last stage of denoising is needed: temporal accumulation. Its inputs include the compressed metadata of the current and previous frames, the scene color, the temporal accumulation history, and so on. The compute shader it uses is as follows:

    // Engine/Shaders/Private/ScreenSpaceDenoise/SSDTemporalAccumulation.usf
    
    void TemporallyAccumulate(
        uint2 DispatchThreadId : SV_DispatchThreadID,
        uint2 GroupId : SV_GroupID,
        uint2 GroupThreadId : SV_GroupThreadID,
        uint GroupThreadIndex : SV_GroupIndex)
    {
        // Compute the buffer UV.
        float2 SceneBufferUV = DispatchThreadId * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
        if (true)
        {
            SceneBufferUV = clamp(SceneBufferUV, BufferBilinearUVMinMax.xy, BufferBilinearUVMinMax.zw);
        }
    
        // Sample the current frame's data.
        FSSDCompressedSceneInfos CompressedRefSceneMetadata = SampleCompressedSceneMetadata(
            /* bPrevFrame = */ false,
            SceneBufferUV, BufferUVToBufferPixelCoord(SceneBufferUV));
    
        float DeviceZ;
        {
            FSSDSampleSceneInfos RefInfo = UncompressSampleSceneInfo(
                CONFIG_METADATA_BUFFER_LAYOUT, /* bIsPrevFrame = */ false,
                DenoiserBufferUVToScreenPosition(SceneBufferUV),
                CompressedRefSceneMetadata);
    
            DeviceZ = RefInfo.DeviceZ;
        }
    
        // Reproject into the previous frame.
        float3 HistoryScreenPosition = float3(DenoiserBufferUVToScreenPosition(SceneBufferUV), DeviceZ);
        bool bIsDynamicPixel = false;
        if (1)
        {
            float4 ThisClip = float4(HistoryScreenPosition, 1);
    
            float4 PrevClip = mul(ThisClip, View.ClipToPrevClip);
            float3 PrevScreen = PrevClip.xyz * rcp(PrevClip.w);
            float3 Velocity = HistoryScreenPosition - PrevScreen;
    
            if (1)
            {
                float4 EncodedVelocity = GBufferVelocityTexture.SampleLevel(GlobalPointClampedSampler, SceneBufferUV, 0);
                if (EncodedVelocity.x > 0.0)
                {
                    Velocity = DecodeVelocityFromTexture(EncodedVelocity);
                }
            }
    
            HistoryScreenPosition -= Velocity;
        }
    
        // Sample the multiplexed signals.
        FSSDSignalArray CurrentFrameSamples;
        FSSDSignalFrequencyArray CurrentFrameFrequencies;
        SampleMultiplexedSignals(
            SignalInput_Textures_0,
            SignalInput_Textures_1,
            SignalInput_Textures_2,
            SignalInput_Textures_3,
            GlobalPointClampedSampler,
            CONFIG_SIGNAL_INPUT_LAYOUT,
            /* MultiplexedSampleId = */ 0,
            /* bNormalizeSample = */ CONFIG_NORMALIZED_INPUT != 0,
            SceneBufferUV,
            /* out */ CurrentFrameSamples,
            /* out */ CurrentFrameFrequencies);
    
        // Sample the history frame's data.
        FSSDSignalArray HistorySamples = CreateSignalArrayFromScalarValue(0.0);
        {
            float2 HistoryBufferUV = HistoryScreenPosition.xy* ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
            float2 ClampedHistoryBufferUV = clamp(HistoryBufferUV, HistoryBufferUVMinMax.xy, HistoryBufferUVMinMax.zw);
            bool bIsPreviousFrameOffscreen = any(HistoryBufferUV != ClampedHistoryBufferUV);
    
            BRANCH
            if (!bIsPreviousFrameOffscreen)
            {
                FSSDKernelConfig KernelConfig = CreateKernelConfig();
    
                #if DEBUG_OUTPUT
                {
                    KernelConfig.DebugPixelPosition = DispatchThreadId;
                    KernelConfig.DebugEventCounter = 0;
                }
                #endif
                
                // Fill the kernel config.
                KernelConfig.SampleSet = CONFIG_HISTORY_KERNEL;
                KernelConfig.bSampleKernelCenter = true;
                KernelConfig.BufferLayout = CONFIG_SIGNAL_HISTORY_LAYOUT;
                KernelConfig.MultiplexedSignalsPerSignalDomain = CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
                KernelConfig.bUnroll = true;
                KernelConfig.bPreviousFrameMetadata = true;
                KernelConfig.BilateralDistanceComputation = SIGNAL_WORLD_FREQUENCY_MIN_METADATA;
                KernelConfig.bClampUVPerMultiplexedSignal = CONFIG_CLAMP_UV_PER_SIGNAL != 0;
    
                // Allow a bit of error when bilaterally rejecting the history, to accommodate per-frame TAA jitter.
                KernelConfig.WorldBluringDistanceMultiplier = max(CONFIG_BILATERAL_DISTANCE_MULTIPLIER, 3.0);
                
                // Set the bilateral preset.
                SetBilateralPreset(CONFIG_HISTORY_BILATERAL_PRESET, /* inout */ KernelConfig);
                // SGPR config of the kernel.
                KernelConfig.BufferSizeAndInvSize = HistoryBufferSizeAndInvSize;
                KernelConfig.BufferBilinearUVMinMax = HistoryBufferUVMinMax;
                
                #if CONFIG_CLAMP_UV_PER_SIGNAL
                {
                    UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
                    for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
                    {
                        uint MultiplexId = BatchedSignalId / CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
                        KernelConfig.PerSignalUVMinMax[MultiplexId] = HistoryBufferScissorUVMinMax[MultiplexId];
                    }
                }
                #endif
                
                // VGPR config of the kernel.
                KernelConfig.BufferUV = HistoryBufferUV + BufferUVBilinearCorrection;
                KernelConfig.bIsDynamicPixel = bIsDynamicPixel;
    
                #if CONFIG_PATCH_PREV_SCENE_DEPTH
                {
                    KernelConfig.RefBufferUV = HistoryBufferUV;
                    KernelConfig.RefSceneMetadataLayout = CONFIG_METADATA_BUFFER_LAYOUT;
                    KernelConfig.bPreviousFrameRefMetadata = true;
    
                    FSSDSampleSceneInfos PrevRefInfo = UncompressSampleSceneInfo(
                        CONFIG_METADATA_BUFFER_LAYOUT, /* bIsPrevFrame = */ false,
                        BufferUVToBufferPixelCoord(SceneBufferUV),
                        CompressedRefSceneMetadata);
    
                    PrevRefInfo.ScreenPosition = HistoryScreenPosition.xy;
                    PrevRefInfo.DeviceZ = HistoryScreenPosition.z;
                    PrevRefInfo.WorldDepth = ConvertFromDeviceZ(HistoryScreenPosition.z);
                    
                    float4 ClipPosition = float4(HistoryScreenPosition.xy * (View.ViewToClip[3][3] < 1.0f ? PrevRefInfo.WorldDepth : 1.0f), PrevRefInfo.WorldDepth, 1);
                    
                    PrevRefInfo.TranslatedWorldPosition = mul(ClipPosition, View.PrevScreenToTranslatedWorld).xyz + (View.PreViewTranslation.xyz - View.PrevPreViewTranslation.xyz);
                    
                    KernelConfig.CompressedRefSceneMetadata = CompressSampleSceneInfo(
                        KernelConfig.RefSceneMetadataLayout,
                        PrevRefInfo);
                }
                #else
                {
                    KernelConfig.CompressedRefSceneMetadata = CompressedRefSceneMetadata;
                    KernelConfig.RefBufferUV = SceneBufferUV;
                    KernelConfig.RefSceneMetadataLayout = CONFIG_METADATA_BUFFER_LAYOUT;
                }
                #endif
    
                // Compute the randoms.
                ISOLATE
                {
                    KernelConfig.Randoms[0] = InterleavedGradientNoise(SceneBufferUV * BufferUVToOutputPixelPosition, View.StateFrameIndexMod8);
                }
                
                FSSDSignalAccumulatorArray SignalAccumulators = CreateSignalAccumulatorArray();
                FSSDCompressedSignalAccumulatorArray UnusedCompressedAccumulators = CreateUninitialisedCompressedAccumulatorArray();
    
                // Accumulate the kernel.
                AccumulateKernel(
                    KernelConfig,
                    PrevHistory_Textures_0,
                    PrevHistory_Textures_1,
                    PrevHistory_Textures_2,
                    PrevHistory_Textures_3,
                    /* inout */ SignalAccumulators,
                    /* inout */ UnusedCompressedAccumulators);
            
                // Extract the history samples from the accumulators.
                UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
                for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
                {
                    HistorySamples.Array[BatchedSignalId] = SignalAccumulators.Array[BatchedSignalId].Moment1;
                    BRANCH
                    if (bCameraCut[BatchedSignalId])
                    {
                        HistorySamples.Array[BatchedSignalId] = CreateSignalSampleFromScalarValue(0.0);
                    }
                }
    
                UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
                for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
                {
                    FSSDSignalSample CurrentFrameSample = CurrentFrameSamples.Array[BatchedSignalId];
                    FSSDSignalSample HistorySample = HistorySamples.Array[BatchedSignalId];
    
                    // Apply the history pre-exposure correction.
                    #if COMPILE_SIGNAL_COLOR
                        HistorySamples.Array[BatchedSignalId].SceneColor.rgb *= HistoryPreExposureCorrection;
                    #endif
                } // for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
            } // if (!bIsPreviousFrameOffscreen)
        }
        
        #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_DIFFUSE_INDIRECT_AND_AO && 0
            DebugOutput[DispatchThreadId] = float4(
                HistorySamples.Array[0].SampleCount / 4096,
                0,
                0,
                0);
        #endif
        
        const bool bPostRejectionBlending = true;
    
        // History rejection.
        #if (CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_MINMAX_BOUNDARIES || CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_VAR_BOUNDARIES)
        {
            FSSDKernelConfig KernelConfig = CreateKernelConfig();
            
            #if DEBUG_OUTPUT
            {
                KernelConfig.DebugPixelPosition = DispatchThreadId;
                KernelConfig.DebugEventCounter = 0;
            }
            #endif
    
            {
                KernelConfig.bSampleKernelCenter = CONFIG_USE_REJECTION_BUFFER != 0;
            
                // The history rejection has already been blurred by any reprojection. To prioritize rejection stability over sample precision, only take the reference sample's blurring distance, which only depends on the current frame's depth and pixel size.
                KernelConfig.BilateralDistanceComputation = SIGNAL_WORLD_FREQUENCY_REF_METADATA_ONLY;
                KernelConfig.NeighborToRefComputation = NEIGHBOR_TO_REF_LOWEST_VGPR_PRESSURE;
    
                if (CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SHADOW_VISIBILITY_MASK)
                    KernelConfig.BilateralDistanceComputation = SIGNAL_WORLD_FREQUENCY_PRECOMPUTED_BLURING_RADIUS;
                KernelConfig.WorldBluringDistanceMultiplier = CONFIG_BILATERAL_DISTANCE_MULTIPLIER;
                
                #if CONFIG_REJECTION_SAMPLE_SET == SAMPLE_SET_NXN
                {
                    KernelConfig.SampleSet = SAMPLE_SET_NXN;
                    KernelConfig.BoxKernelRadius = 3;
                    KernelConfig.bUnroll = false;
                }
                #else
                {
                    KernelConfig.SampleSet = CONFIG_REJECTION_SAMPLE_SET;
                    KernelConfig.bUnroll = true;
                }
                #endif
    
                if (CONFIG_USE_REJECTION_BUFFER)
                {
                    // The history rejection has the two moments of the signal being denoised.
                    KernelConfig.MultiplexedSignalsPerSignalDomain = 2;
                    
                    KernelConfig.BufferLayout = CONFIG_SIGNAL_HISTORY_REJECTION_LAYOUT;
                    KernelConfig.bNormalizeSample = false;
                
                    for (uint MultiplexId = 0; MultiplexId < SIGNAL_ARRAY_SIZE; MultiplexId++)
                    {
                        KernelConfig.BufferColorSpace[MultiplexId] = CONFIG_REJECTION_BUFFER_COLOR_SPACE;
                        KernelConfig.AccumulatorColorSpace[MultiplexId] = CONFIG_HISTORY_REJECTION_COLOR_SPACE;
                    }
    
                    // Force accumulating the kernel center, since it contains the two moments matching the scene metadata.
                    KernelConfig.bForceKernelCenterAccumulation = true;
                }
                else
                {
                    KernelConfig.MultiplexedSignalsPerSignalDomain = CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
                    KernelConfig.BufferLayout = CONFIG_SIGNAL_INPUT_LAYOUT;
                    KernelConfig.bNormalizeSample = true;
                
                    for (uint MultiplexId = 0; MultiplexId < SIGNAL_ARRAY_SIZE; MultiplexId++)
                    {
                        KernelConfig.AccumulatorColorSpace[MultiplexId] = CONFIG_HISTORY_REJECTION_COLOR_SPACE;
                    }
                
                    if (MAX_SIGNAL_BATCH_SIZE == 1)
                    {
                        KernelConfig.bForceAllAccumulation = CurrentFrameSamples.Array[0].SampleCount == 0;
                    }
                }
                
                SetBilateralPreset(CONFIG_BILATERAL_PRESET, /* inout */ KernelConfig);
            }
    
            // SGPR config.
            {
                KernelConfig.BufferSizeAndInvSize = BufferSizeAndInvSize;
                KernelConfig.BufferBilinearUVMinMax = BufferBilinearUVMinMax;
            }
    
            // VGPR config.
            {
                KernelConfig.BufferUV = SceneBufferUV;
                {
                    KernelConfig.CompressedRefSceneMetadata = CompressedRefSceneMetadata;
                    KernelConfig.RefBufferUV = SceneBufferUV;
                    KernelConfig.RefSceneMetadataLayout = CONFIG_METADATA_BUFFER_LAYOUT;
                }
            }
    
            // Accumulate the current frame to save unnecessary bilateral evaluations.
            FSSDSignalAccumulatorArray SignalAccumulators = CreateSignalAccumulatorArray();
            {
                FSSDSampleSceneInfos RefSceneMetadata = UncompressRefSceneMetadata(KernelConfig);
                
                FSSDCompressedSignalAccumulatorArray UnusedCompressedAccumulators = CreateUninitialisedCompressedAccumulatorArray();
    
                FSSDSignalArray CenterSample = CurrentFrameSamples;
                if (KernelConfig.bNormalizeSample)
                {
                    CenterSample = NormalizeToOneSampleArray(CurrentFrameSamples);
                }
    
                AccumulateRefSampleAsKernelCenter(
                    KernelConfig,
                    /* inout */ SignalAccumulators,
                    /* inout */ UnusedCompressedAccumulators,
                    KernelConfig.RefBufferUV,
                    RefSceneMetadata,
                    CenterSample,
                    CurrentFrameFrequencies);
            }
    
            {
                FSSDCompressedSignalAccumulatorArray UnusedCompressedAccumulators = CreateUninitialisedCompressedAccumulatorArray();
    
                #if CONFIG_USE_REJECTION_BUFFER
                    AccumulateKernel(
                        KernelConfig,
                        HistoryRejectionSignal_Textures_0,
                        HistoryRejectionSignal_Textures_1,
                        HistoryRejectionSignal_Textures_2,
                        HistoryRejectionSignal_Textures_3,
                        /* inout */ SignalAccumulators,
                        /* inout */ UnusedCompressedAccumulators);
                #else
                    AccumulateKernel(
                        KernelConfig,
                        SignalInput_Textures_0,
                        SignalInput_Textures_1,
                        SignalInput_Textures_2,
                        SignalInput_Textures_3,
                        /* inout */ SignalAccumulators,
                        /* inout */ UnusedCompressedAccumulators);
                #endif
            }
    
            // Clamp the history data.
            UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
            for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
            {
                FSSDSignalSample NeighborMoment1 = CreateSignalSampleFromScalarValue(0.0);
                FSSDSignalSample NeighborMoment2 = CreateSignalSampleFromScalarValue(0.0);
                #if CONFIG_REJECTION_INPUT_MODE == REJECTION_INPUT_MODE_1UNNORMALIZED
                {
                    float NormalizeFactor = SafeRcp(SignalAccumulators.Array[BatchedSignalId].Moment1.SampleCount);
                    NeighborMoment1 = MulSignal(SignalAccumulators.Array[BatchedSignalId].Moment1, NormalizeFactor);
                    
                    #if COMPILE_MOMENT2_ACCUMULATOR
                        NeighborMoment2 = MulSignal(SignalAccumulators.Array[BatchedSignalId].Moment2, NormalizeFactor);
                    #endif
                }
                #elif CONFIG_REJECTION_INPUT_MODE == REJECTION_INPUT_MODE_2PRETRANSFORMED_MOMMENTS
                {
                    #if SIGNAL_ARRAY_SIZE != 2 * MAX_SIGNAL_BATCH_SIZE
                        #error Invalid signal array size.
                    #endif
    
                    float NormalizeFactor = SafeRcp(SignalAccumulators.Array[BatchedSignalId * 2 + 0].Moment1.SampleCount);
                    NeighborMoment1 = MulSignal(SignalAccumulators.Array[BatchedSignalId * 2 + 0].Moment1, NormalizeFactor);
                    NeighborMoment2 = MulSignal(SignalAccumulators.Array[BatchedSignalId * 2 + 1].Moment1, NormalizeFactor);
                }
                #else
                    #error NOrmalized samples. 
                #endif
            
                #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_REFLECTIONS && 0
                    FSSDSignalSample Temp = TransformSignalForPostRejection(NeighborMoment1);
                    DebugOutput[DispatchThreadId] = float4(
                        Temp.SceneColor.rgb,
                        0);
                #endif
                    
                FSSDSignalSample CurrentFrameSample = CurrentFrameSamples.Array[BatchedSignalId];
                FSSDSignalSample HistorySample = HistorySamples.Array[BatchedSignalId];
    
                // Clamp the history data.
                #if CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_VAR_BOUNDARIES
                {
                    #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_AO
                        const float StdDevMultiplier = 6.00;
                    #else
                        const float StdDevMultiplier = 1.25;
                    #endif
    
                    FSSDSignalSample StdDev = SqrtSignal(AbsSignal(SubtractSignal(NeighborMoment2, PowerSignal(NeighborMoment1, 2))));
                    FSSDSignalSample NeighborMin = AddSignal(NeighborMoment1, MulSignal(StdDev, -StdDevMultiplier));
                    FSSDSignalSample NeighborMax = AddSignal(NeighborMoment1, MulSignal(StdDev,  StdDevMultiplier));
    
                    if (0)
                    {
                        FSSDSignalSample QuantizationErrorMin = MulSignal(NeighborMoment1, 1 - SafeRcp(HistorySample.SampleCount));
                        FSSDSignalSample QuantizationErrorMax = MulSignal(NeighborMoment1, 1 + SafeRcp(HistorySample.SampleCount));
    
                        NeighborMin = MinSignal(NeighborMin, QuantizationErrorMin);
                        NeighborMax = MaxSignal(NeighborMax, QuantizationErrorMax);
                    }
    
                    // Transform the history so it is in the correct component space, normalized for the clamping box.
                    FSSDSignalSample NormalizedHistorySample = NormalizeToOneSample(HistorySample);
                    FSSDSignalSample TransformedHistorySample = TransformInputBufferForPreRejection(NormalizedHistorySample);
    
                    // Clamp the history.
                    FSSDSignalSample ClampedTransformedHistorySample = ClampSignal(TransformedHistorySample, NeighborMin, NeighborMax);
    
                    // Transform the history back to linear component space.
                    FSSDSignalSample ClampedHistorySample = TransformSignalForPostRejection(ClampedTransformedHistorySample);
    
                    // Re-weight the history for ghosting rejection.
                    {
                        FSSDSignalSample RejectedDiff = AbsSignal(SubtractSignal(ClampedTransformedHistorySample, TransformedHistorySample));
    
                        // Measure how much the history has been changed.
                        float RejectionFactor = 0.0;
                        #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_REFLECTIONS && (CONFIG_HISTORY_REJECTION_COLOR_SPACE & COLOR_SPACE_LCOCG)
                        {
                            #if !COMPILE_SIGNAL_COLOR
                                #error Need to compile signal color.
                            #endif
                            RejectionFactor = abs(
                                Luma_To_LumaLog(ClampedTransformedHistorySample.SceneColor.x) -
                                Luma_To_LumaLog(TransformedHistorySample.SceneColor.x));
                    
                            RejectionFactor = max(RejectionFactor, 1 * max(RejectedDiff.SceneColor.y, RejectedDiff.SceneColor.z));
                            RejectionFactor = max(RejectionFactor, 1 * RejectedDiff.SceneColor.a);
                        }
                        #elif CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SHADOW_VISIBILITY_MASK || CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_POLYCHROMATIC_PENUMBRA_HARMONIC || CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_AO || CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_DIFFUSE_INDIRECT_AND_AO
                        {
                            RejectionFactor = abs(ClampedTransformedHistorySample.MissCount - TransformedHistorySample.MissCount);
                        }
                        #elif CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SSGI
                        {
                            RejectionFactor = abs(ClampedTransformedHistorySample.MissCount - TransformedHistorySample.MissCount);
                        }
                        #else
                            #error Unsupported signal rejection.
                        #endif
                
                        // Compute an initial history weight as if samples had been rejected.
                        float FinalHistoryWeight = HistorySample.SampleCount * saturate(1 - RejectionFactor);
    
                        // When blending before rejection, make sure the input weight passes through.
                        if (!bPostRejectionBlending)
                        {
                            FinalHistoryWeight = max(FinalHistoryWeight, CurrentFrameSample.SampleCount);
                        }
    
                        // When upscaling, some input samples may be invalid.
                        FinalHistoryWeight = max(FinalHistoryWeight, NeighborMoment1.SampleCount * 0.1);
                
                        // Apply the history weight.
                        HistorySample = MulSignal(ClampedHistorySample, FinalHistoryWeight);
                        HistorySample.SampleCount = FinalHistoryWeight;
                    }
                }
                #elif CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_MINMAX_BOUNDARIES
                {
                    FSSDSignalSample NeighborMin = SignalAccumulators.Array[BatchedSignalId].Min;
                    FSSDSignalSample NeighborMax = SignalAccumulators.Array[BatchedSignalId].Max;
    
                    // If no neighbor has any sample, the max sample count will be 0.
                    bool bIsValid = NeighborMax.SampleCount > 0.0;
    
                    float RejectedSampleCount = 0;
                    HistorySample = MulSignal(TransformSignalForPostRejection(ClampSignal(TransformInputBufferForPreRejection(NormalizeToOneSample(HistorySample)), NeighborMin, NeighborMax)), HistorySample.SampleCount - RejectedSampleCount);
    
                    // The entire clamping box is invalid, so the history sample is invalid too.
                    FLATTEN
                    if (!bIsValid)
                    {
                        HistorySample = CreateSignalSampleFromScalarValue(0.0);
                    }
                }
                #endif
    
                // Broaden the minimal inverse frequency.
                if (1)
                {
                    CurrentFrameFrequencies.Array[BatchedSignalId] = MinSignalFrequency(
                        CurrentFrameFrequencies.Array[BatchedSignalId],
                        SignalAccumulators.Array[BatchedSignalId].MinFrequency);
                }
                
                HistorySamples.Array[BatchedSignalId] = HistorySample;
                CurrentFrameSamples.Array[BatchedSignalId] = CurrentFrameSample;
            } // for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
        }
        #endif // CONFIG_HISTORY_REJECTION > 0
        
        // Process and store all history samples of the current pixel.
        {
            UNROLL
            for (uint BatchedSignalId = 0; BatchedSignalId < MAX_SIGNAL_BATCH_SIZE; BatchedSignalId++)
            {
                FSSDSignalSample CurrentFrameSample = CurrentFrameSamples.Array[BatchedSignalId];
                FSSDSignalSample HistorySample = HistorySamples.Array[BatchedSignalId];
                FSSDSignalFrequency CurrentFrequency = CurrentFrameFrequencies.Array[BatchedSignalId];
    
                float TargetedSampleCount;
                {
                    float2 ScreenPosition = DenoiserBufferUVToScreenPosition(SceneBufferUV);
        
                    FSSDSampleSceneInfos RefSceneMetadata = UncompressSampleSceneInfo(
                        CONFIG_METADATA_BUFFER_LAYOUT, /* bPrevFrame = */ false,
                        ScreenPosition, CompressedRefSceneMetadata);
    
                    // Use the diameter, because that is the distance between two pixel.
                    float PixelWorldBluringRadius = ComputeWorldBluringRadiusCausedByPixelSize(RefSceneMetadata);
                    float WorldBluringRadius = WorldBluringRadiusToBilateralWorldDistance(PixelWorldBluringRadius);
        
                    #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SHADOW_VISIBILITY_MASK
                    {
                        float ResolutionFraction = 0.5;
                        float ToleratedNoiseRatio = 0.25 * rcp(9 * sqrt(2));
    
                        float OutputPixelRadius = CurrentFrequency.WorldBluringRadius * rcp(PixelWorldBluringRadius) * ResolutionFraction;
    
                        TargetedSampleCount = clamp(OutputPixelRadius * OutputPixelRadius * (PI * ToleratedNoiseRatio), 1, TARGETED_SAMPLE_COUNT);
                    }
                    #elif CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_REFLECTIONS
                    {
                        float2 NormalizedScreenMajorAxis;
                        float InifinityMajorViewportRadius;
                        float InifinityMinorViewportRadius;
                        ProjectSpecularLobeToScreenSpace(
                            RefSceneMetadata,
                            /* out */ NormalizedScreenMajorAxis,
                            /* out */ InifinityMajorViewportRadius,
                            /* out */ InifinityMinorViewportRadius);
    
                        InifinityMajorViewportRadius *= View.ViewSizeAndInvSize.x;
                        InifinityMinorViewportRadius *= View.ViewSizeAndInvSize.x;
    
                        TargetedSampleCount = PI * InifinityMajorViewportRadius * InifinityMinorViewportRadius;
                        TargetedSampleCount = clamp(TargetedSampleCount, 1, TARGETED_SAMPLE_COUNT);
                    }
                    #else
                    {
                        TargetedSampleCount = TARGETED_SAMPLE_COUNT;
                    }
                    #endif
                }
    
                float PreviousFrameWeight = min(HistorySample.SampleCount, TargetedSampleCount - CurrentFrameSample.SampleCount);
                float PreviousFrameMultiplier = HistorySample.SampleCount > 0 ? PreviousFrameWeight / HistorySample.SampleCount : 0;
    
                // Pre-transform the signal.
                HistorySample = TransformSignal(
                    HistorySample,
                    /* SrcBasis = */ STANDARD_BUFFER_COLOR_SPACE,
                    /* DestBasis = */ CONFIG_HISTORY_BLENDING_COLOR_SPACE);
                CurrentFrameSample = TransformSignal(
                    CurrentFrameSample,
                    /* SrcBasis = */ STANDARD_BUFFER_COLOR_SPACE,
                    /* DestBasis = */ CONFIG_HISTORY_BLENDING_COLOR_SPACE);
    
                // Blend the current frame's and the history's samples.
                HistorySample = AddSignal(MulSignal(HistorySample, PreviousFrameMultiplier), CurrentFrameSample);
            
                // Post-transform the signal.
                HistorySample = TransformSignal(
                    HistorySample,
                    /* SrcBasis = */ CONFIG_HISTORY_BLENDING_COLOR_SPACE,
                    /* DestBasis = */ STANDARD_BUFFER_COLOR_SPACE);
                
                HistorySamples.Array[BatchedSignalId] = HistorySample;
            }
        }
        
        // Whitelist what should be output, to make sure the compiler strips out everything that ends up unused.
        uint MultiplexCount = 1;
        FSSDSignalArray OutputSamples = CreateSignalArrayFromScalarValue(0.0);
        FSSDSignalFrequencyArray OutputFrequencies = CreateInvalidSignalFrequencyArray();
        {
            MultiplexCount = CONFIG_SIGNAL_BATCH_SIZE;
            
            UNROLL
            for (uint BatchedSignalId = 0; BatchedSignalId < MultiplexCount; BatchedSignalId++)
            {
                OutputSamples.Array[BatchedSignalId] = HistorySamples.Array[BatchedSignalId];
                OutputFrequencies.Array[BatchedSignalId] = CurrentFrameFrequencies.Array[BatchedSignalId];
            }
        }
        
        uint2 OutputPixelPostion = BufferUVToBufferPixelCoord(SceneBufferUV);
            
        BRANCH
        if (all(OutputPixelPostion < ViewportMax))
        {
            OutputMultiplexedSignal(
                SignalHistoryOutput_UAVs_0,
                SignalHistoryOutput_UAVs_1,
                SignalHistoryOutput_UAVs_2,
                SignalHistoryOutput_UAVs_3,
                CONFIG_SIGNAL_HISTORY_LAYOUT,
                MultiplexCount,
                OutputPixelPostion,
                OutputSamples,
                OutputFrequencies);
        }
    } // TemporallyAccumulate
    
    // Main entry of temporal accumulation.
    [numthreads(TILE_PIXEL_SIZE, TILE_PIXEL_SIZE, 1)]
    void MainCS(
        uint2 DispatchThreadId : SV_DispatchThreadID,
        uint2 GroupId : SV_GroupID,
        uint2 GroupThreadId : SV_GroupThreadID,
        uint GroupThreadIndex : SV_GroupIndex)
    {
        // Main call to temporal accumulation.
        TemporallyAccumulate(DispatchThreadId, GroupId, GroupThreadId, GroupThreadIndex);
    }
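
    The blend step buried inside TemporallyAccumulate can be distilled into a few lines. The sketch below uses illustrative names (not UE's) and assumes the un-normalized storage scheme seen above, where signals are stored as sums with the sample count tracked alongside:

    // Hedged sketch of the sample-count-driven temporal blend used above.
    float4 BlendHistory(
        float4 HistorySum, float HistoryCount,  // accumulated (un-normalized) history
        float4 CurrentSum, float CurrentCount,  // this frame's accumulated samples
        float TargetCount)                      // the TARGETED_SAMPLE_COUNT equivalent
    {
        // Cap the history's contribution so the total weight converges towards TargetCount.
        float PrevWeight     = min(HistoryCount, TargetCount - CurrentCount);
        float PrevMultiplier = HistoryCount > 0 ? PrevWeight / HistoryCount : 0;
    
        // Scaled history plus the current frame; the new count is PrevWeight + CurrentCount.
        return HistorySum * PrevMultiplier + CurrentSum;
    }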
    

    In the field of real-time ray tracing there are many denoising algorithms, such as filtering with guided blur kernels, machine-learning-driven filters or importance sampling, improved sampling schemes via better quasi-random sequences (e.g. blue noise and spatio-temporal accumulation), and approximation techniques that try to quantize the result with some spatial structure (e.g. probes, irradiance caches).

    Filtering techniques include Gaussian, Bilateral, À-Trous, Guided, and Median filters, which are commonly used to filter noisy Monte Carlo traced images. In particular, guided filters driven by feature buffers (such as the GBuffer of deferred rendering) and special buffers (such as first-bounce data, reprojected path length, and view position) have been widely used.
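
    As a refresher on the idea behind the bilateral presets configured in the shaders above, here is a minimal sketch of a joint bilateral weight (code and constants are illustrative assumptions, not UE's):

    // Hedged sketch: a joint bilateral weight combining a spatial Gaussian with
    // depth and normal similarity terms (all sigmas/exponents are illustrative).
    float BilateralWeight(
        float2 PixelDelta,                          // neighbor offset in pixels
        float CenterDepth, float NeighborDepth,     // linear depths
        float3 CenterNormal, float3 NeighborNormal) // world-space normals
    {
        const float SpatialSigma = 2.0;
        const float DepthSigma   = 0.05;
    
        float SpatialW = exp(-dot(PixelDelta, PixelDelta) / (2.0 * SpatialSigma * SpatialSigma));
        float DepthW   = exp(-abs(NeighborDepth - CenterDepth) / (DepthSigma * CenterDepth + 1e-6));
        float NormalW  = pow(saturate(dot(CenterNormal, NeighborNormal)), 8.0);
        return SpatialW * DepthW * NormalW;
    }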

    Sampling techniques include TAA, spatio-temporal filtering, SVGF (Spatio-Temporal Variance Guided Filtering), Adaptive SVGF (A-SVGF), BMFR (Blockwise Multi-Order Feature Regression), ReSTIR (Spatiotemporal Importance Resampling for Many-Light Ray Tracing), and so on.

    Approximation techniques are often used to try to fine-tune the behavior of different aspects of a path tracer.

    There are also deep-learning-based techniques (such as DLSS). For more on denoising, see Ray Tracing Denoising.

    Judging from the code analyzed above, UE's screen-space denoiser combines several of these filtering and sampling techniques (bilateral filtering, spatial convolution, temporal convolution, stochastic sampling, signals and frequencies, and so on).

    After temporal accumulation, the image shows fewer and much less noticeable noise artifacts:

    7.4.8.3 SSGI Composition

    The final stage of SSGI is composition. It combines the color and AO output by the denoising stage with the GBuffer into the current frame's scene color, now carrying indirect lighting and AO:

    The composition pass uses a pixel shader, with the following code:

    // Engine/Shaders/Private/DiffuseIndirectComposite.usf
    
    void MainPS(float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
    {
        float2 BufferUV = SvPositionToBufferUV(SvPosition);
        float2 ScreenPosition = SvPositionToScreenPosition(SvPosition).xy;
    
        // Sample the GBuffer.
        FGBufferData GBuffer = GetGBufferDataFromSceneTextures(BufferUV);
    
        // Sample the dynamically generated per-frame AO.
        float DynamicAmbientOcclusion = 1.0f;
    #if DIM_APPLY_AMBIENT_OCCLUSION
        DynamicAmbientOcclusion = AmbientOcclusionTexture.SampleLevel(AmbientOcclusionSampler, BufferUV, 0).r;
    #endif
    
        // Compute the final AO to be applied.
        float FinalAmbientOcclusion = GBuffer.GBufferAO * DynamicAmbientOcclusion;
    
        OutColor.rgb = 0.0f;
        OutColor.a = 1.0f;
    
        // Apply diffuse indirect lighting.
    #if DIM_APPLY_DIFFUSE_INDIRECT
        {
            float3 DiffuseColor = GBuffer.DiffuseColor;
            if (UseSubsurfaceProfile(GBuffer.ShadingModelID))
            {
                DiffuseColor = GBuffer.StoredBaseColor;
            }
    
            OutColor.rgb += DiffuseColor * DiffuseIndirectTexture.SampleLevel(DiffuseIndirectSampler, BufferUV, 0).rgb;
        }
    #endif
    
        // Apply AO to the scene color, since before deferred direct lighting all lighting in SceneColor is assumed to be indirect.
        {
            float AOMask = (GBuffer.ShadingModelID != SHADINGMODELID_UNLIT);
            OutColor.a = lerp(1.0f, FinalAmbientOcclusion, AOMask * AmbientOcclusionStaticFraction);
        }
    }
    

    The SSGI composition logic is very similar to SSAO's.
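
    Note that the shader writes the final AO into the output alpha instead of applying it itself: the pass relies on its fixed-function blend state to do so. Assuming a SrcColor * 1 + DstColor * SrcAlpha blend (an assumption inferred from the shader comment above, not verified against the C++ pass setup), the hardware effectively computes:

    // Hedged sketch of what the assumed blend state computes for this pass.
    float3 ApplyComposite(float3 SceneColor, float4 OutColor)
    {
        // The diffuse indirect term is added, while the lighting already in SceneColor
        // (assumed to be indirect at this point) is attenuated by the AO stored in alpha.
        return OutColor.rgb + SceneColor * OutColor.a;
    }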

    7.4.9 Other Post-Processing

    Besides the techniques covered in the previous sections, there are in fact many more that this chapter has not touched on, for example:

    • Bloom
    • Depth of Field (DOF)
    • Auto exposure (also known as eye adaptation)
    • Vignette
    • Grain
    • Color Grading
    • Color lookup tables (LUT)
    • ......

    These are left for readers to dig into and study in the UE source code.

    7.5 Summary

    This chapter covered UE's post-processing, including both traditional post-processing techniques such as anti-aliasing, tone mapping, gamma correction, and screen percentage, and post-processing in the broader sense, such as SSR, SSAO, and SSGI.

    Of course, many more post-processing techniques are not covered here; this chapter merely serves as a starting point, giving readers a basic understanding of UE before they move on to the much wider world of post-processing.

    7.5.1 Questions

    As usual, here are a few small questions to help consolidate and deepen the knowledge in this chapter:

    • Describe the main flow and steps of the post-processing pipeline.
    • Describe the role of PassSequence and the points to watch out for.
    • Implement SMAA.
    • Implement a post-process material with custom tone mapping, replacing UE's default tone mapper.

    Special Notes

    • Thanks to the authors of all references; some images come from the references and the web, and will be removed upon request.
    • This series is original work by the author and is published only on cnblogs (博客园). Feel free to share the link, but reproduction without permission is not allowed.
    • The series is ongoing; for the full table of contents, see 内容纲目.
    • The series is ongoing; for the full table of contents, see 内容纲目.
    • The series is ongoing; for the full table of contents, see 内容纲目.

    References
