Monday, November 28, 2011

Position reconstruction from depth (1)

Hello gents,
this post is just a quick recap of the possible ways I found around to reconstruct position from the depth buffer: almost all the credit goes to http://mynameismjp.wordpress.com/.

Let's define (again) the problem:

Reconstruct pixel position from the depth buffer.

Applying a personal way of seeing code, let's highlight Data and Transformations.
In my experience, I came up with a simplistic idea about coding:

Coding is a sequence of Data transformed into other Data.

I know it is very simplistic and low level (we're not taking into account any architecture), but this is a low-level view of the problem.
And more views of the same problem can shed more light on its true nature (as in life in general).

In this problem we have two pieces of data: the Pixel Position and the Depth Buffer.
The transformation is reconstruction.

To understand further, we can define the Domains of the data.
  • Pixel position can be either in World Space or in View Space
  • Depth buffer can be encoded either in linear or in post-perspective z (raw depth buffer)

So the transformations will go from the Depth Buffer to the Pixel Position, and can be:

  • Linear depth buffer to View Space position
  • Linear depth buffer to World Space position
  • Post-Perspective depth buffer to View Space position
  • Post-Perspective depth buffer to World Space position

To finish, we have two other transformations:

  • Encode to Linear depth buffer
  • Encode to Post-Perspective depth buffer

The post-perspective one is performed by the hardware, and it is what is stored inside the real depth buffer.

The linear one maps the eye/camera/view space z to the domain 0..1.

To really finish this introduction to the problem, we must know a little bit about our coordinate system. Moving data from world to view to projection spaces, we must define those domains.

We can just skip World Space and concentrate on the other two.

If we follow the OpenGL or DirectX APIs, we know that they differ in both spaces:

  • OpenGL uses a right-handed system for the view space
  • DirectX uses a left-handed one
  • OpenGL uses a cube between (-1, 1) on x,y,z as projection cube
  • DirectX uses a cube between (-1,1) on x,y and (0, 1) on z

Using a right-handed system, we end up looking down the negative z axis. Keep this in mind.

ENCODING AND DECODING TRANSFORMATIONS

In this section we'll talk about encoding: what do we want to encode?

The raw depth buffer contains a depth value transformed from the view-space depth, encoded in a simple way that depends on your projection matrix.

Let's take only the relevant part of the matrix (the bottom-right 2x2 corner) and multiply it by the point in view space, Pview = (zView, 1):

( A   B )   ( zView )
( -1  0 ) * (   1   )

The multiplication gives:

Pclip = ( A * zView + B, -zView )

to become the NDC point (1D here) we apply the division by w:

Pndc = ( (A * zView + B) / -zView, 1 ), which further simplified becomes

Pndc = ( -A - (B / zView), 1 ).

So zNdc is -A - (B / zView).

This is the way in which the depth is encoded in the depth buffer, and the value is between -1 and 1.
Note: if you try to do some math and plug in zView = n and zView = f, you'll notice that the values are not mapped correctly to -1 and 1. This is because we're looking down negative z, so the correct values are zView = -n and zView = -f.

To find zView, just solve for zView and we obtain:

zView = -B / (zNdc + A )

So now we have defined the two transformations:

Projection-Space Encoding: -A - (B / zView )
Projection-Space Decoding: -B / (zNdc + A)
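
As a quick sanity check (this is mine, not from the original post), the two transformations really are inverses of each other for any A and any non-zero B. Here is a tiny standalone C++ sketch, with made-up sample values for A and B (the real ones are derived just below):

#include <cstdio>

// Sample values, just to exercise the formulas: the real A and B are
// derived from the near and far planes, as explained below in the post.
static const float A = -1.5f;
static const float B = -2.5f;

// Projection-Space Encoding: zNdc = -A - (B / zView)
float encodeZ( float zView ) { return -A - ( B / zView ); }

// Projection-Space Decoding: zView = -B / (zNdc + A)
float decodeZ( float zNdc )  { return -B / ( zNdc + A ); }

int main()
{
    // The round trip gives back the original view-space z.
    const float zView = -42.5f;
    printf( "zView = %f, round trip = %f\n", zView, decodeZ( encodeZ( zView ) ) );
    return 0;
}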

Ok then, it's finished.

Wait... what are those A and B???

Those values depend, again, on the choice of your projection matrix.

In OpenGL they are defined as (n is near plane, f is far plane):

A = - (f + n) / (f - n)
B = -2 * n * f / (f - n)

those values can be easily calculated and passed to the shaders (don't bother doing it inside a shader, those are perfect values to be set once per frame with the other frame constants) to reconstruct depth.
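
As a minimal sketch of that idea (mine, not the author's code; the struct and function names are made up), the two values could be computed once per frame on the CPU and packed into a projParams constant, which is what the shader snippets below assume:

#include <cstdio>

// Hypothetical frame-constant block: filled once per frame on the CPU and
// then uploaded to the shaders with whatever constant mechanism the API
// offers (glUniform2f, SetPixelShaderConstantF, ...).
struct FrameConstants
{
    float projParams[2]; // x = A, y = B
};

void updateProjectionParams( FrameConstants& fc, float gNear, float gFar )
{
    const float rangeInv = 1.0f / ( gFar - gNear );
    fc.projParams[0] = -( gFar + gNear ) * rangeInv;    // A
    fc.projParams[1] = -2.0f * gFar * gNear * rangeInv; // B
}

int main()
{
    FrameConstants fc;
    updateProjectionParams( fc, 1.0f, 100.0f ); // near = 1, far = 100

    const float A = fc.projParams[0];
    const float B = fc.projParams[1];

    // Check of the note above: zView = -near maps to -1, zView = -far to +1.
    printf( "encode(-near) = %f (expected -1)\n", -A - ( B / -1.0f ) );
    printf( "encode(-far)  = %f (expected +1)\n", -A - ( B / -100.0f ) );
    return 0;
}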

Linear depth encoding is different. We're still encoding the view-space depth, but it becomes easier. The values in camera/eye/view space are like world-space values, just centered around the camera. For a right-handed system we encode the negative z values, because the camera is looking into the negative z half-space.
The z values will be in the range (0, -infinity): the projection will take care of discarding values closer than the near plane or farther than the far plane.

Linear depth encoding: -zView / f
Linear depth decoding: zLin * f (a positive distance; the view-space z is its negation)

Those values are between 0 and 1.

Finally...some CODE!!!

This is the POST-PROJECTION DEPTH path:

// A and B, calculated once per frame and passed to the shaders
// as a frame constant: projParams = float2( A, B )
float rangeInv = 1.0 / (gFar - gNear);
float A = -(gFar + gNear) * rangeInv;
float B = -2.0 * gFar * gNear * rangeInv;

// Write the -1..1 post-projection z
float encodePostProjectionDepth( float depthViewSpace )
{
    float depthCS = -projParams.x - (projParams.y / depthViewSpace);
    return depthCS;
}

// Read the -1..1 post-projection z
float decodePostProjectionDepth( float2 uv )
{
    float depthPPS = tex2D( depthMap, uv ).r;
    return depthPPS;
}

// Reconstruct view-space depth (near..far)
float decodeViewSpaceDepth( float2 uv )
{
    float depthPPS = decodePostProjectionDepth( uv );
    float zView = -B / (A + depthPPS); // negative in a right-handed system
    return -zView;                     // return the positive distance from the camera
}

This is the LINEAR DEPTH path:

// Encode the 0..1 linear view-space depth (written to its own render target)
float encodeLinearDepth( float depthViewSpace )
{
    return -depthViewSpace / gFar;
}

// Decode the 0..1 linear view-space depth
float decodeLinearDepth( float2 uv )
{
    float linearDepth = tex2D( depthMap, uv ).r;
    return linearDepth;
}

// Reconstruct view-space depth (near..far)
float decodeViewSpaceDepth( float2 uv )
{
    float linearDepth = decodeLinearDepth( uv );
    return linearDepth * gFar;
}

As you can see, linear depth is easier to encode and decode, but it's more expensive from a memory point of view: you need an additional render target, while the raw depth buffer is something you already have, since you are rendering with it anyway.

Next stop is a service post about reconstruction methods for position, even though they are explained at length by Matt Pettineo on his blog!
P.S. Fixed a typo in the postProjectionDepth encoding. Fixed a typo in the A and B calculations.

Wednesday, November 23, 2011

Rendering Architecture (2)

Hi guys,
a simple follow-up about the very low-level architecture I've been using in the last few months, after having a look at DirectX 11.
In the context of rendering, we can assume that every time we render we know exactly which type of renderer (DX9, DX11, LibGCM, X360, ...) we are using: we don't want to switch renderer on the fly, and on consoles this is impossible to do anyway.
Starting with this in mind, we know (and want) that the executable we build will contain only one renderer.
This can be achieved with a dll or a lib in a separate project, as you prefer, but the bit I want to talk about is the RenderInterface.
First of all, what is a RenderInterface?
We have another context parameter: we know that during the rendering phase of our game, we need to provide three main pieces of information to the graphics card, and they are
  1. Geometry informations
  2. Shading informations
  3. Render states

and we set this information in various ways: for example, on DX/X360 we use the Set*** commands and Draw*** to issue the draw call.

The geometry information covers the vertex buffer, the vertex format/declaration and the index buffer; the shading information consists of the various shaders (vertex, fragment, geometry, ... depending on the API) and the data used by the shaders (constants and textures); the render states are all the other settings, like render targets, depth/stencil and alpha blending: all the configurable states that are grouped into state objects in DirectX 10 and 11.
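
To make the three groups concrete, here is a rough sketch (mine, not from the post) of what setting up a single draw call could look like through the D3D9 API; the function name and its parameters are made up, and all resources are assumed to be created elsewhere:

#include <d3d9.h>

// Hypothetical per-draw-call setup: geometry, shading and render states are
// all set with Set*** calls, then Draw*** issues the draw call.
void drawMesh( IDirect3DDevice9* device,
               IDirect3DVertexBuffer9* vb, UINT vertexStride,
               IDirect3DIndexBuffer9* ib,
               IDirect3DVertexDeclaration9* vertexDecl,
               IDirect3DVertexShader9* vs, IDirect3DPixelShader9* ps,
               IDirect3DTexture9* diffuseMap,
               const float* worldViewProj, // 4x4 matrix, 16 floats
               UINT vertexCount, UINT triangleCount )
{
    // 1. Geometry information
    device->SetStreamSource( 0, vb, 0, vertexStride );
    device->SetIndices( ib );
    device->SetVertexDeclaration( vertexDecl );

    // 2. Shading information: shaders, constants and textures
    device->SetVertexShader( vs );
    device->SetPixelShader( ps );
    device->SetVertexShaderConstantF( 0, worldViewProj, 4 );
    device->SetTexture( 0, diffuseMap );

    // 3. Render states
    device->SetRenderState( D3DRS_ALPHABLENDENABLE, FALSE );
    device->SetRenderState( D3DRS_ZENABLE, D3DZB_TRUE );

    // Issue the draw call
    device->DrawIndexedPrimitive( D3DPT_TRIANGLELIST, 0, 0,
                                  vertexCount, 0, triangleCount );
}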

The render interface is thus split in two: a RenderContext, which only sets the information needed to issue draw calls and draws, and a RenderDevice, which manages the creation, destruction and mapping/unmapping of low-level graphics resources.

This division makes it easy to separate what is "deferrable" from what is not: if you want to create your own command buffer, or use the DX11 deferred one (good luck), then you already know that the RenderContext is the right guy to call.

Every renderable object will have a render method that takes the RenderContext, so that it can set the data for its draw calls.

The real catch is to use the curiously recurring template pattern (CRTP) to create the interface for both the RenderContext and the RenderDevice, and to create a different implementation for each platform: even though you need to typedef the specific template instantiation, you can assume (see above) that for each target you have only ONE RenderContext implementation alive, and thus you can use that type directly.

The methods of the API-dependent class can all be protected so that you enforce the interface, and inlining all the calls in the RenderContext class maps a call on your render context to a direct call of the implementation method, thus avoiding virtuals through "static polymorphism".
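
Here is a minimal sketch of that idea (mine, not the author's actual classes; every name is made up, and only the RenderContext side is shown, the RenderDevice would follow the same pattern):

#include <cstdio>
#include <cstdint>

typedef uint32_t VertexBufferHandle;
typedef uint32_t IndexBufferHandle;

// Common interface: forwards every call to the platform implementation.
// No virtuals, everything can be inlined ("static polymorphism").
template <typename Impl>
class RenderContextBase
{
public:
    void setVertexBuffer( VertexBufferHandle vb ) { impl()->setVertexBufferImpl( vb ); }
    void setIndexBuffer( IndexBufferHandle ib )   { impl()->setIndexBufferImpl( ib ); }
    void drawIndexed( uint32_t indexCount )       { impl()->drawIndexedImpl( indexCount ); }

private:
    Impl* impl() { return static_cast<Impl*>( this ); }
};

// One (and only one) implementation exists per target executable.
class DX11RenderContext : public RenderContextBase<DX11RenderContext>
{
    // The API-dependent methods are private: users go through the interface.
    friend class RenderContextBase<DX11RenderContext>;

    void setVertexBufferImpl( VertexBufferHandle vb ) { printf( "set VB %u\n", (unsigned)vb ); }
    void setIndexBufferImpl( IndexBufferHandle ib )   { printf( "set IB %u\n", (unsigned)ib ); }
    void drawIndexedImpl( uint32_t indexCount )       { printf( "draw %u indices\n", (unsigned)indexCount ); }
};

// The rest of the codebase only ever sees this typedef.
typedef DX11RenderContext RenderContext;

// A renderable object sets its data and issues its draw calls through the context.
struct Mesh
{
    VertexBufferHandle vb;
    IndexBufferHandle  ib;
    uint32_t           indexCount;

    void render( RenderContext& rc ) const
    {
        rc.setVertexBuffer( vb );
        rc.setIndexBuffer( ib );
        rc.drawIndexed( indexCount );
    }
};

int main()
{
    RenderContext rc;
    Mesh mesh = { 1, 2, 36 };
    mesh.render( rc );
    return 0;
}

The calls inside Mesh::render compile down to direct (and typically inlined) calls into DX11RenderContext, with no vtable lookup involved.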

Even though on PC this is not much of a cost, on consoles (I really suggest you to measure it, if you can) calling virtual functions a LOT of times is a bad hit (especially on PS3). Let's try to figure out the numbers:

if you have 1000 draw calls, you'll probably have 4 or 5 RenderContext calls (SetVertexBuffer, SetIndexBuffer, SetVertexShader, SetPixelShader, SetConstants, SetVertexFormats, DrawIndexed, ...) for each draw call, thus 4000-5000 virtual calls per frame. So you end up with 4000-5000 potential cache misses per frame, all without any apparent reason, and the cost of a cache miss on consoles varies, but can be from 40 to 600 cycles per call.

How many cycles are we wasting? With 4000-5000 calls at 40 to 600 cycles each, that's roughly 160,000 to 3,000,000 cycles per frame, just in call overhead.

With this system, you have a common interface and no virtuals. It's no silver bullet, but finding the solution to a problem requires a correct definition of its constraints...

BFP