preview.
Crossfire is fairly easy to explain, mostly because a lot of the
complexities involved in explaining an M3DR scheme like SLI disappear with
Crossfire. You'll see what I mean in due course. Because of that, I'm going
to keep this as brief as possible, without leaving out the key concepts
you'll need should you wish to understand Crossfire in a technical sense.
Even if you don't really care about how it works, it's still worth at least
glancing over the key details. I'll try to highlight those as I go along, so
even the least interested or technically minded of you can pick it all up.
Let's start with how more than one recent ATI graphics processor can be
joined together to produce a single frame of output that all of them have
worked on.
Supertiling
Anyone writing Crossfire up should explain Supertiling to you, since it's
the key element that makes the entire M3DR scheme work. When any recent ATI
graphics processor - from the R300 (which powered the Radeon 9700 and Radeon
9500-series of products), right up to the current ATI flagship GPU, R480
(Radeon X850-series) - renders a 3D scene, it does so by splitting the
scene up into tiles. Those tiles, usually 16x16 pixels in size, cover the
entire screen, making up the frame as a pixel mosaic. The mosaic tiles are
processed by the pixel engine and pixel output pipelines in the Radeon GPU
(the fragment processor and pixel ROPs). To understand that, here's a quick
refresher course on the basic building blocks of a modern immediate-mode
(IM) 3D processor. Skip this bit if you already know, in basic terms, how an
IM-focussed 3D processor works.
The CPU and graphics driver work together to feed the GPU with geometry
data, in the form of triangle primitives. Triangles are the basic building
blocks of all the geometry you'll see on your screen in any modern 3D
accelerator. Want to display a sphere, cylinder, box, or any other geometric
shape on a computer screen? It's built from tris. The GPU processes those
tris in its vertex processors. The vertex processors are complex
mini-processors made up of a combination of vector (basically a point or
direction in 3D space) and scalar (a single value, such as how long a vector
is) arithmetic logic units (ALUs). They work together in a SIMD or MIMD
fashion to output triangles to the triangle setup engine.
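To make the vector/scalar split a bit more concrete, here's a minimal Python
sketch of the sort of work a vertex processor does: transforming a vertex by
a 4x4 matrix is four dot products (vector work), and the perspective divide
is one reciprocal (scalar work). It's a generic illustration, not ATI's
hardware or shader code; the function names and the toy projection matrix
are made up for the example.

```python
import math


def dot4(a, b):
    """Vector-ALU style work: a four-component multiply-add."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def transform(matrix, vertex):
    """Transform a homogeneous vertex (x, y, z, w) by a row-major 4x4 matrix.

    Each output component is one dot product of a matrix row with the vertex.
    """
    return tuple(dot4(row, vertex) for row in matrix)


def perspective_divide(clip):
    """Scalar-ALU style work: one reciprocal of w, then scale x, y and z by it."""
    inv_w = 1.0 / clip[3]
    return (clip[0] * inv_w, clip[1] * inv_w, clip[2] * inv_w)


def length3(v):
    """A scalar result computed from a vector: its magnitude."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


# Hypothetical toy projection: copies z into w, so the divide shrinks
# vertices that are further away.
TOY_PROJECTION = (
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
)

clip = transform(TOY_PROJECTION, (1.0, 2.0, 4.0, 1.0))  # (1.0, 2.0, 4.0, 4.0)
screen = perspective_divide(clip)                       # (0.25, 0.5, 1.0)
```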
The triangle setup engine converts the triangle batches to pixel fragments
via specialised silicon called the rasteriser. Each fragment is assigned a
set of parameters and attributes that tell the fragment processor (the
correct term for the pixel processors, since they operate not on whole
pixels, but on pixel fragments) things like what colour the fragment is, and
what fragment programs to run for that particular fragment. This is where
things become relevant for Supertiling, since the fragment units operate on
pixel blocks called quads, each a block of 2x2 pixels. Modern immediate-mode
render architectures, like NV40 and R480, operate on quads for reasons of
efficiency and (relative) ease of design.
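A quick way to see how pixels, quads and tiles nest is to map a screen
position to each. This is a minimal sketch assuming the 16x16 tiles and 2x2
quads described above, with tile (0, 0) in the top-left corner; the helper
names are hypothetical.

```python
TILE_SIZE = 16  # tile edge in pixels ("usually 16x16")
QUAD_SIZE = 2   # a quad is a 2x2 block of pixels


def quad_of(x, y):
    """Which quad the pixel at screen position (x, y) belongs to."""
    return (x // QUAD_SIZE, y // QUAD_SIZE)


def tile_of(x, y):
    """Which screen tile the pixel at (x, y) belongs to."""
    return (x // TILE_SIZE, y // TILE_SIZE)


def tile_grid(width, height):
    """Columns and rows of tiles needed to cover a width x height screen.

    Ceiling division, so a partial tile at the right or bottom edge still counts.
    """
    return (-(-width // TILE_SIZE), -(-height // TILE_SIZE))


# 16 / 2 = 8 quads along each edge of a tile, so 64 quads per tile.
QUADS_PER_TILE = (TILE_SIZE // QUAD_SIZE) ** 2
```

At 1600x1200, for example, tile_grid gives 100 x 75, or 7,500 tiles, each
holding 64 quads.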
Fragments output by the fragment units are then processed by the ROPs,
which perform functions like colour combining and sampling, anti-aliasing
(Z-sampling) and buffer blends, before writing the pixel out to the output
buffer for display on your screen. The entire process is then repeated as
fast as the hardware can manage. I've left huge amounts out of all three
stages (vertices -> pixels -> ROPs), but that's the basics.
That grouping of pixel fragments into quads, and then into screen tiles, by
the rasteriser is how Supertiling works. With one GPU, that GPU processes all
the screen tiles, effectively Supertiling on its own. With more than one GPU
involved, though, each one gets a share of the tiles to work on, and the
results are combined at the end so you can see the finished frame. Since the
tiling takes place after rasterisation, it has an impact on overall
performance. I'll explain that shortly.
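Here's a minimal sketch of how a frame's screen tiles might be dealt out
between GPUs. The checkerboard pattern is an assumption for illustration:
the split is described only as each GPU getting a share of the tiles, so
gpu_for_tile and tiles_for_gpu are hypothetical helpers, not ATI's driver
logic.

```python
def gpu_for_tile(tx, ty, num_gpus=2):
    """Hypothetical owner of tile (tx, ty).

    With two GPUs this is a checkerboard; with more it becomes diagonal
    stripes. The pattern Crossfire really uses is an assumption here.
    """
    return (tx + ty) % num_gpus


def tiles_for_gpu(gpu, width, height, tile_size=16, num_gpus=2):
    """List the (tx, ty) tiles that one GPU renders for a width x height frame."""
    cols = -(-width // tile_size)   # ceiling division: partial tiles count
    rows = -(-height // tile_size)
    return [
        (tx, ty)
        for ty in range(rows)
        for tx in range(cols)
        if gpu_for_tile(tx, ty, num_gpus) == gpu
    ]
```

A checkerboard has the handy property that any busy patch of the screen ends
up shared roughly evenly between two GPUs, rather than landing on one of
them. At 1600x1200 each GPU would get 3,750 of the 7,500 tiles.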
The important thing to understand is that everything after rasterisation
can be accelerated by the M3DR scheme that Supertiling makes possible, and
that Crossfire implements.
How the Supertiling mode of Crossfire affects performance
Good question. Obviously, if the tile rendering acceleration only happens
after the fragments are rasterised, everything before that is unaccelerated.
With Crossfire, or any other Supertiling M3DR implementation, all geometry
is passed to every participating GPU. That means geometry performance can't
scale with the number of GPUs. If each GPU has to process all of the
geometry that all the others are working on, how can they accelerate the
creation of rasterised fragments?
ATI optimise what each rasterisation unit works on by discarding fragments
that fall outside the tiles each GPU is being asked to render, since those
fragments will never be processed on that GPU. Basically the GPU interrogates
each fragment to find out where it lies in screen space. If the fragment
overlaps or lies completely inside the tile boundary of any of the tiles the
GPU is processing, the GPU keeps it for processing. If not, it's discarded
and no further work is done on it, saving valuable bandwidth and processing
power.
So geometry performance can't traditionally scale with Supertiling, since
all tris must at least be analysed by all GPUs, but the end result can be
calculated faster. If you've been paying attention, you'll also have spotted
the absolutely key point for ATI's positioning with Crossfire. Absolutely
all 3D operations performed at the pixel fragment level and above on a
Radeon GPU are done on screen tiles. That means all your current games and
applications are rendered in this tiled fashion on a Radeon GPU as we speak,
and are accelerated just fine. Further, it means that, with (hopefully) a
very small number of exceptions, all game titles will be automatically
accelerated by a Crossfire setup. There's no profile list to turn it on for
games, just CATALYST A.I. to turn it off, if needed.
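Why geometry caps the gain can be shown with a deliberately simplified
model: every GPU repeats the geometry work, while the post-rasterisation
pixel work is split evenly, so frame time is geometry + pixels / GPUs. It's
the same shape as Amdahl's law. The sketch assumes perfect load balancing
and ignores the cost of combining the final frame; frame_time and speedup
are hypothetical helpers, and the timings used with them are made-up example
figures, not measurements.

```python
def frame_time(geometry_ms, pixel_ms, num_gpus):
    """Simplified Supertiling frame time.

    Geometry is repeated on every GPU, so it isn't divided; the pixel work
    after rasterisation is split evenly. Compositing cost and load
    imbalance are ignored.
    """
    return geometry_ms + pixel_ms / num_gpus


def speedup(geometry_ms, pixel_ms, num_gpus):
    """How much faster num_gpus GPUs are than one under this model."""
    return frame_time(geometry_ms, pixel_ms, 1) / frame_time(geometry_ms, pixel_ms, num_gpus)
```

With 2ms of geometry and 8ms of pixel work, one GPU takes 10ms per frame and
two take 6ms, a speedup of about 1.67x rather than 2x. However many GPUs you
add, the frame can never get faster than the 2ms spent on geometry.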
The main reasons why Crossfire won't be enabled for a game or application
come down to what happens to image quality in a Crossfire setup. Let me
explain that in more detail.
How Supertiling with Crossfire affects image quality
Since the ROP units operate on fragments from screen tiles, and the ROPs
are where anti-aliasing (sampling fragment depth) is performed, image
quality from anti-aliasing can be increased. Any Radeon GPU from the R300
upwards has a sample grid (the positions inside a pixel where the hardware
knows it can sample) that's 12x12 in size. From that 144-position grid, the
hardware chooses the samples used for depth sampling the pixel being
processed. Check out the sample grids for R300 and higher hardware, here,
and the explanation of how "temporal" anti-aliasing works, here.
With Supertiling, something similar to "temporal" anti-aliasing can happen.
Here, all tiles are rendered by all GPUs rather than being split between
them, but the depth sample grid is different for each GPU. After
processing, the resulting sample data is combined, increasing the number of
samples per pixel. So while the maximum number of multisamples per GPU
doesn't increase, the effective number of multisamples does, by a multiple
of the number of participating GPUs. For a dual-board Crossfire solution
based on X800 or X850, that's 12 multisamples per pixel from that
144-position grid (6 samples from each GPU). In other words, 12X AA. ATI
call that Super AA. Join me in a groan.
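Combining two GPUs' samples can be sketched as follows. The two 6-sample
patterns are invented for illustration (they are not ATI's real sample
positions), chosen so that between them every row and column of the 12x12
grid is used exactly once, in the spirit of a sparse grid. Each GPU resolves
its own samples and the two results are then averaged; with equal sample
counts per GPU, that gives exactly the same colour as averaging all 12
samples directly. The simple box-filter resolve is an assumption too.

```python
GRID_SIZE = 12  # 12x12 sample grid: 144 candidate positions per pixel

# Made-up 6-sample patterns (column, row) for illustration only; these are
# NOT ATI's real positions. Together they use each row and column once.
GPU0_PATTERN = [(1, 2), (5, 0), (9, 3), (3, 7), (7, 9), (11, 6)]
GPU1_PATTERN = [(0, 5), (4, 4), (8, 1), (2, 10), (6, 11), (10, 8)]


def combined_pattern(*patterns):
    """Union of per-GPU sample positions, rejecting off-grid or shared ones."""
    merged = set()
    for pattern in patterns:
        for col, row in pattern:
            if not (0 <= col < GRID_SIZE and 0 <= row < GRID_SIZE):
                raise ValueError(f"({col}, {row}) is off the sample grid")
            if (col, row) in merged:
                raise ValueError(f"({col}, {row}) is used by more than one GPU")
            merged.add((col, row))
    return merged


def resolve(samples):
    """Box-filter resolve: average the RGB colours of a list of samples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(3))


def super_aa_pixel(per_gpu_samples):
    """Resolve each GPU's samples, then average the per-GPU results."""
    return resolve([resolve(samples) for samples in per_gpu_samples])
```

If one GPU's six samples all hit a white triangle and the other's all hit a
black background, the combined pixel comes out mid grey, (0.5, 0.5, 0.5).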
If you're in Supertiling mode, using Superduper AA, you can also mix
super-sampling in with the multi-sample anti-aliasing, to anti-alias texture
data too (multi-sampling is geometry anti-aliasing using depth sampling, not
texture super-sampling). You can sample the texture twice per pixel (2X
RGSS) along with the sparse-grid multi-sampling you're doing in Super AA
mode. Twelve geometry samples and two texture samples is apparently 14X AA,
according to ATI.
To be fair, they've seemingly named it that way to make it easy for the
consumer to understand. However, if you want to get it right, call it dX
SGMS plus 2X RGSS, where d is the total number of multi-samples used across
all boards in Supertiling mode with Super AA. SGMS is sparse-grid
multi-sampling and RGSS is rotated-grid super-sampling.
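The naming arithmetic is simple enough to write down. This hypothetical
helper returns both ATI's figure, which (per the above) just adds the
texture samples to the total multisample count, and the more precise dX SGMS
plus RGSS description.

```python
def describe_aa(multisamples_per_gpu, num_gpus, supersamples=0):
    """Return (ATI-style name, technical name) for a Super AA configuration.

    d, the total multisample count, is the per-GPU count times the number
    of GPUs. The ATI-style figure simply adds the texture supersamples.
    """
    d = multisamples_per_gpu * num_gpus
    ati_name = f"{d + supersamples}X"
    technical = f"{d}X SGMS"
    if supersamples:
        technical += f" plus {supersamples}X RGSS"
    return ati_name, technical
```

So describe_aa(6, 2, 2) gives ("14X", "12X SGMS plus 2X RGSS").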
That also affects performance. In most cases, you can likely double your AA
level over a single board while keeping at least the same framerate, for
increased image quality at no speed penalty (given identical boards).
Can't you accelerate geometry performance somehow?
Yeah, you can, but not with Supertiling. ATI apparently own the patent on
the alternate frame rendering (AFR) method you can apply to M3DR schemes,
which NVIDIA uses with SLI. ATI offer AFR as a mode with Crossfire, too.
Since each GPU renders complete, alternating frames, the geometry work is
divided between the GPUs instead of being repeated on every one. AFR also
avoids a number of performance pitfalls that other M3DR modes like
Supertiling and SFR suffer from, since there's no load-balancing to perform,
just buffering of frame data to keep all GPUs as busy as possible. However,
you lose the ability to increase image quality in AFR mode the way you can
with Supertiling. So for titles where you're geometry limited in some way
and want to use AFR, you can't get more than 6X anti-aliasing, and it seems
there's no texture AA available either, although we'll see.
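Here's a minimal sketch of round-robin AFR timing, assuming ideal buffering
(the next frame is always ready to hand out) and a fixed per-frame render
time that includes geometry. afr_timeline and frames_per_second are
hypothetical helpers for illustration, not how ATI's driver actually
schedules frames.

```python
def afr_timeline(num_frames, frame_ms, num_gpus=2):
    """Round-robin AFR: frame i goes to GPU i % num_gpus.

    Each GPU renders a whole frame (geometry included) in frame_ms.
    Assumes ideal buffering: the next frame is always ready to be queued.
    Returns a (gpu, start_ms, finish_ms) tuple per frame.
    """
    gpu_free_at = [0.0] * num_gpus
    timeline = []
    for frame in range(num_frames):
        gpu = frame % num_gpus
        start = gpu_free_at[gpu]
        finish = start + frame_ms
        gpu_free_at[gpu] = finish
        timeline.append((gpu, start, finish))
    return timeline


def frames_per_second(timeline):
    """Average throughput over the whole timeline."""
    last_finish_ms = max(finish for _, _, finish in timeline)
    return len(timeline) / (last_finish_ms / 1000.0)
```

Four 20ms frames on two GPUs finish in 40ms rather than 80ms, doubling
throughput from 50 to 100 frames per second. Each individual frame still
takes 20ms to render, though, so AFR raises the frame rate rather than
shortening the time any single frame takes.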
Any other modes?
Along with Supertiling and AFR, there's a mode called Scissor. Scissor cuts
the screen horizontally, with each GPU getting a section to render. I'd
imagine that the break is aligned on a screen tile boundary. It seems to be
fixed, too. It's not like NVIDIA's SFR mode, where the split is
load-balanced, can happen on any pixel scanline and can be adjusted on a
per-frame basis. Rather, it seems at the moment to be fixed at 50/50 on tile
boundaries, and doesn't move. More on that mode as and when I get it, since
ATI's documentation is a bit pants in that regard.
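If the guess about tile alignment is right, the split point for a two-way
Scissor could be worked out like this. It's a sketch under that assumption,
using 16x16 tiles and rounding down when the number of tile rows is odd;
scissor_split is a hypothetical helper.

```python
def scissor_split(height, tile_size=16):
    """Fixed two-way Scissor split of a frame's scanlines.

    Assumes a 50/50 horizontal cut aligned to a tile boundary, rounding
    down when the tile rows don't divide evenly. Returns half-open
    scanline ranges: ((top_start, top_end), (bottom_start, bottom_end)).
    """
    tile_rows = -(-height // tile_size)   # ceiling division
    split = (tile_rows // 2) * tile_size  # first scanline of the bottom part
    return (0, split), (split, height)
```

At 1024x768 the cut falls exactly at scanline 384, but at 1600x1200 (75 tile
rows) the top GPU would get 592 lines and the bottom 608, so a tile-aligned
50/50 split isn't always exactly even.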