
Light pre-pass renderer

Revision editor: 土土

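About a month ago I bought an iPhone 4S and wrote some code on the new device. Although the device does not support multiple render targets (MRT), it does support rendering to floating-point render targets (iPhone 4S and iPad 2 only).

So I tested it with a light pre-pass renderer:

In the test, HDR lighting (gamma of 2.0 rather than 2.2) is implemented with three post-processing filters (filmic tone mapping, bloom, and a photo filter). The test scene uses three directional lights and 30 point lights with two skinned models, with Bullet physics running at the same time, at a frame rate of about 28-32 fps.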




G-buffer storing view-space normals (from Gamasutra)

Depth buffer (from Gamasutra)

Light bound (from Gamasutra)

Light buffer (from Gamasutra)




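The program runs at a low resolution, with a back buffer of 480x320 pixels. In addition, the G-buffer and the post-processing textures are further scaled down to 360x300 pixels. This reduces the number of fragments the pixel shaders need to shade.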



Shadow map (from Gamasutra)

Basic shadow map (from Gamasutra)

Blurred basic shadow map (from Gamasutra)

Cascaded shadow map (from Gamasutra)

Blurred cascaded shadow map (from Gamasutra)


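In this article I described the methods used to get a light pre-pass renderer running on the iPhone with 30 dynamic lights at 30 fps. However, high resolution had to be sacrificed to keep the dynamic lights, HDR lighting and post-processing filters.

Also, no anti-aliasing was done in the test because the frame rate was not high enough. MSAA might be possible if a basic shadow map were used instead of cascades. (Compiled by Gamerboom/gamerboom.com; author: Simon Yeung)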

In-Depth: Light pre-pass renderer on iPhone

Simon Yeung


About a month ago, I bought an iPhone 4S and wrote some code on my new toy. Although this device does not support multiple render targets (MRT), it does support rendering to a floating-point render target (only available on iPhone 4S and iPad 2).

So, I tested it with a light pre-pass renderer:

In the test, HDR lighting is done (gamma = 2.0 instead of 2.2, without adaptation) with three post-processing filters (filmic tone mapping, bloom, and photo filter). The test scene uses three directional lights (one of them casts shadows with four cascades) and 30 point lights with two skinned models, running Bullet physics at the same time, and runs at around 28-32 fps.
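A gamma of 2.0 is a common shortcut because decoding and encoding then reduce to a multiply and a square root rather than a general pow() call. The sketch below illustrates that idea in Python; the function names are hypothetical, and whether the author's shaders do exactly this is an assumption.

```python
import math

def decode_gamma20(c):
    """Display-encoded value -> linear, assuming gamma 2.0 (one multiply)."""
    return c * c

def encode_gamma20(c):
    """Linear value -> display-encoded, assuming gamma 2.0 (one sqrt)."""
    return math.sqrt(max(c, 0.0))

def encode_gamma22(c):
    """Reference gamma 2.2 encode, which needs a general pow()."""
    return max(c, 0.0) ** (1.0 / 2.2)

# Largest difference between the two encodes over [0, 1], sampled in 1/255 steps
max_diff = max(abs(encode_gamma20(i / 255.0) - encode_gamma22(i / 255.0))
               for i in range(256))
```

The two curves stay close over the whole range, which is why the cheaper approximation is usually acceptable visually.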

G-buffer layout

I tried two different layouts for the G-buffer. My first attempt used one 16-bit render target, with the R channel storing the depth value, the G and B channels storing the view-space normal using the encoding method from "A bit more deferred - CryEngine 3", and the A channel storing the glossiness for the specular lighting calculation.
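For reference, the two-channel normal encoding usually attributed to that CryEngine 3 presentation is a spheremap transform. The Python sketch below shows one commonly cited form of it; the sign convention for z, the pole handling and the function names are assumptions, and the author's actual shader may differ.

```python
import math

def encode_normal(n):
    """Pack a unit view-space normal (x, y, z) into two values."""
    x, y, z = n
    radius = math.sqrt(max(z * 0.5 + 0.5, 0.0))
    xy_len = math.hypot(x, y)
    if xy_len < 1e-8:
        # x and y vanish at both poles, so any direction works; use +x
        return (radius, 0.0)
    return (x / xy_len * radius, y / xy_len * radius)

def decode_normal(enc):
    """Recover the unit normal from the two packed values."""
    ex, ey = enc
    len_sq = ex * ex + ey * ey
    z = 2.0 * len_sq - 1.0
    length = math.sqrt(len_sq)
    if length < 1e-8:
        return (0.0, 0.0, -1.0)
    scale = math.sqrt(max(1.0 - z * z, 0.0)) / length
    return (ex * scale, ey * scale, z)

def round_trip_error(n):
    """Largest per-component error after encoding and decoding n."""
    decoded = decode_normal(encode_normal(n))
    return max(abs(a - b) for a, b in zip(n, decoded))
```

The encode and decode each cost a normalize and a square root per pixel, which is the work the second layout avoids.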

But later I discovered that this device supports the OpenGL ES extension GL_OES_depth_texture, which allows the depth buffer to be rendered into a texture. So my second attempt switched the G-buffer layout to use the RGB channels to store the view-space normal without encoding and the A channel to store the glossiness, while the depth is sampled directly from the depth texture.

Switching to this layout gives a boost in frame rate, as the normal no longer needs to be encoded and decoded. However, changing the render target from 16-bit to 8-bit to store the normal and glossiness does not give any performance improvement, probably because the test scene is not bandwidth-bound.
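Sampling depth from the depth texture means the lighting pass has to convert the stored non-linear value back into a view-space position. The sketch below shows the usual conversion, assuming a standard OpenGL perspective projection, a [0, 1] depth range and a camera looking down -z; the function names and parameters are hypothetical and not taken from the author's code.

```python
def depth_to_view_distance(depth, near, far):
    """Depth-texture value in [0, 1] -> positive distance along the view axis."""
    z_ndc = 2.0 * depth - 1.0
    return 2.0 * near * far / (far + near - z_ndc * (far - near))

def view_distance_to_depth(distance, near, far):
    """Inverse mapping, i.e. what the projection writes to the depth buffer."""
    z_ndc = (far + near) / (far - near) - 2.0 * far * near / ((far - near) * distance)
    return 0.5 * z_ndc + 0.5

def reconstruct_view_position(ndc_x, ndc_y, depth, near, far, tan_half_fov_y, aspect):
    """View-space position of a pixel given its NDC x/y in [-1, 1] and depth."""
    distance = depth_to_view_distance(depth, near, far)
    return (ndc_x * tan_half_fov_y * aspect * distance,
            ndc_y * tan_half_fov_y * distance,
            -distance)
```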

Stencil optimization

The second optimization targets the deferred lights: the stencil trick draws a convex light polygon to cull the pixels that do not need lighting.

However, after implementing the stencil trick, the frame rate dropped. This was because, when filling the stencil buffer, I used the same shader as the one used for lighting. Even with color writes disabled while filling the stencil buffer, the GPU still does redundant work. So a simple shader is used in the stencil pass instead, which improves performance.

Also, drawing out the shape of the point lights made me discover that the attenuation factor I used (i.e. 1/(1 + k.d + k.d^2)) leaves a large area of the light shape that does not get lit, so I switched to a simpler linear falloff model (e.g. 1 - lightDistance/lightRange, which can be raised to an exponent to control the falloff) to give a tighter bound.
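The difference between the two falloff models can be checked numerically. In the Python sketch below, the coefficients k1 and k2, the cutoff values and the function names are illustrative assumptions; the source only gives the general form of the two formulas.

```python
import math

def quadratic_attenuation(d, k1, k2):
    """1 / (1 + k1*d + k2*d^2); never reaches zero at any finite distance."""
    return 1.0 / (1.0 + k1 * d + k2 * d * d)

def linear_falloff(d, light_range, exponent=1.0):
    """(1 - d/range)^exponent, exactly zero at the edge of the light."""
    return max(0.0, 1.0 - d / light_range) ** exponent

def quadratic_bound_radius(k1, k2, cutoff):
    """Distance at which the quadratic attenuation drops to `cutoff`."""
    # Positive root of k2*d^2 + k1*d + (1 - 1/cutoff) = 0
    c = 1.0 - 1.0 / cutoff
    return (-k1 + math.sqrt(k1 * k1 - 4.0 * k2 * c)) / (2.0 * k2)

# Bound sized so the light is cut off at 1/256 of its intensity
radius = quadratic_bound_radius(1.0, 1.0, 1.0 / 256.0)
# Fraction of that radius where the light already contributes less than 5%
dim_fraction = 1.0 - quadratic_bound_radius(1.0, 1.0, 0.05) / radius
```

With these example coefficients, most of the bounding radius is spent on pixels that receive almost no light, whereas the linear model reaches zero exactly at lightRange, so the bound can match the lit area.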

Combining post-processing passes

Combining the full-screen render passes can help performance. In the test scene, the bloom result was originally additively blended with the tone-mapped scene render target, followed by a photo filter pass that renders to the back buffer. I combined these passes by computing the additive blend with the tone-mapped scene inside the photo filter shader, which is faster than before.
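The merge can be modelled on the CPU as function composition per pixel. In this Python sketch the photo filter is a hypothetical per-channel tint multiply, and single-channel lists stand in for textures; the author's actual filter is not described in the source.

```python
def photo_filter(value, tint):
    """Hypothetical photo filter: a simple tint multiply."""
    return value * tint

def two_passes(tone_mapped, bloom, tint):
    """Original flow: blend into an intermediate target, then filter it."""
    blended = [s + b for s, b in zip(tone_mapped, bloom)]   # full-screen pass 1
    return [photo_filter(c, tint) for c in blended]          # full-screen pass 2

def combined_pass(tone_mapped, bloom, tint):
    """Merged flow: the filter pass samples both inputs and adds them itself."""
    return [photo_filter(s + b, tint) for s, b in zip(tone_mapped, bloom)]

scene = [0.10, 0.45, 0.80]
glow = [0.00, 0.05, 0.30]
```

Both versions produce the same values here; the merged one avoids a full-screen pass and the write and read of the intermediate target.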


The program runs at a low resolution, with a back buffer of 480x320 pixels. Also, the G-buffer and the post-processing textures are further scaled down to 360x300 pixels. This reduces the number of fragments that need to be shaded by the pixel shaders.
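The saving from the smaller off-screen targets follows directly from the pixel counts, as this short Python calculation using the resolutions quoted above shows (the variable names are illustrative):

```python
back_buffer = (480, 320)
offscreen = (360, 300)

def pixel_count(size):
    """Number of fragments in one full-screen pass at this resolution."""
    width, height = size
    return width * height

# Fraction of back-buffer fragments shaded per off-screen full-screen pass
ratio = pixel_count(offscreen) / pixel_count(back_buffer)
```

Each full-screen pass on the scaled-down targets shades 108,000 fragments instead of 153,600, roughly 70% of the back-buffer count.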


In the scene, a cascaded shadow map is used with four cascades (resolution = 256x256). I tried using the GL_EXT_shadow_samplers extension, hoping that it would help the frame rate. But the result was disappointing, as the extension is no faster than performing the comparison inside the shader.
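The per-pixel work involved is picking a cascade and comparing depths, and GL_EXT_shadow_samplers moves that comparison into the sampler. A minimal Python sketch of the manual version follows; the split distances, the bias and the function names are hypothetical, since the source does not say how the cascades are split.

```python
# Hypothetical far distances (view space) for the four cascades
SPLIT_FAR = [5.0, 15.0, 40.0, 100.0]

def select_cascade(view_depth, split_far):
    """Index of the first cascade whose far distance covers view_depth."""
    for index, far in enumerate(split_far):
        if view_depth <= far:
            return index
    return len(split_far) - 1  # beyond the last split: reuse the last cascade

def shadow_test(frag_light_depth, stored_depth, bias=0.002):
    """Manual depth comparison: 1.0 if lit, 0.0 if in shadow."""
    return 1.0 if frag_light_depth - bias <= stored_depth else 0.0
```

For example, a fragment 20 units from the camera falls in the third cascade (index 2) with these splits, and is shadowed whenever its light-space depth exceeds the stored depth by more than the bias.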

It takes around 8 ms to calculate the shadow and blur it. If a basic shadow map is used instead (i.e. without cascades) with blurring, it gives a small performance boost, depending on how many point lights are on screen. Of course, switching off the blur speeds up the shadow calculation a lot.


In this post, I described the methods used to make a light pre-pass renderer run on the iPhone at 30 fps with 30 dynamic lights. However, high resolution is sacrificed in order to keep the dynamic lights, HDR lighting and the post-processing filters.

Also, no anti-aliasing is done in the test, as the frame rate is not good enough. Maybe MSAA can be done if the basic shadow map is used instead of the cascaded one. But we will leave that for future investigation. (Source: Gamasutra)
