Voodoo Registers – Part 1

I was sucked into graphics programming by demoscene productions of the late eighties and early nineties. Back then it was all about software rendering, copper lists, mode X and other neat tricks. Later, I was exposed to hardware acceleration and OpenGL on high powered graphics workstations during my time at university. These things were massively expensive and I’d have to make trips to the graphics labs to test my programs. Owning a machine capable of accelerated 3D graphics was not realistic. Until, that is, 3Dfx started producing consumer grade hardware that slashed the cost of (basic) 3D hardware from tens of thousands of dollars to a few hundred. I rushed out and got my first ‘real’ graphics card — an Orchid Righteous 3D, which was based on the 3Dfx Voodoo 1 chipset. I still have it.

Orchid Righteous 3D - A 3Dfx Voodoo 1 based 3D accelerator

Orchid Righteous 3D

Now, the graphics API used to program this was Glide. The Glide API was proprietary to 3Dfx and has been touted as a ‘low-level API’ that’s close to the hardware. In fact, the very first paragraph of the Glide 3.0 reference manual states:

The Glide library is a low-level rendering and state management subroutine library that serves as a thin layer over the register level interface to the 3Dfx Interactive family of graphics accelerators.

By 2000, 3Dfx were not doing so great and entered into bankruptcy proceedings. However, right before this, in December of 1999, 3Dfx opened the source [1] and register specifications of their Voodoo 1, 2, and 3 products. If we examine them carefully, we’ll see that the Glide API was close to the hardware insofar as it didn’t support anything that the hardware didn’t support, but it was not actually a thin layer over the register interface and certainly wasn’t the most efficient way to program the hardware. At least, we can do better, and we will.

Our Platform

Our target hardware is my original 3Dfx Voodoo 1 based Orchid Righteous 3D that I’ve owned for nearly 16 years (and which still works flawlessly). The platform of choice is a PC running Linux Mint 14. As I’m going to be doing some register bashing, Linux is a good choice of platform because it’s relatively easy to write kernel-mode drivers and even take control of underlying hardware from user mode. However, we don’t actually need to write a kernel-mode driver because there’s a Linux framebuffer driver for our Voodoo that will let us do most of what we need to do. This driver is named sstfb and for our initial purposes, it’ll do. As you’ll see, though, we’ll need to work around a couple of minor issues.

First, we’ll install the framebuffer driver:

graz@minty ~ $ sudo modprobe sstfb

This results in a new framebuffer device appearing in our devices directory:

graz@minty ~ $ ls /dev/fb*
/dev/fb0  /dev/fb1

/dev/fb1 is the one we’re looking for (/dev/fb0 is the framebuffer device corresponding to our machine’s primary graphics adapter). These devices follow the Linux framebuffer device protocol, which essentially consists of being able to open them from an application (using open), query their properties and change video modes (using ioctl), and mmap them in order to access linear framebuffer memory. Of course, specific framebuffer device drivers are free to implement more, custom ioctls. The only ones that the mainline sstfb driver supports control the VGA passthrough [2] feature.
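As a rough sketch of that protocol, the helper below opens a framebuffer device and reads its current video mode with the standard FBIOGET_VSCREENINFO ioctl. The function name query_fb_mode is made up for illustration; the ioctl and struct fb_var_screeninfo come from <linux/fb.h>.

```cpp
#include <cassert>
#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical helper (not from the original post): open a framebuffer
// device and read its current video mode. Returns false if the device
// cannot be opened or the ioctl fails.
bool query_fb_mode(const char* path, fb_var_screeninfo* out)
{
    assert(path != nullptr && out != nullptr);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return false;

    // FBIOGET_VSCREENINFO fills in the variable screen info (resolution,
    // bits per pixel and so on) for the device's current mode.
    bool ok = (ioctl(fd, FBIOGET_VSCREENINFO, out) == 0);

    close(fd);
    return ok;
}
```

On the setup described here, calling query_fb_mode("/dev/fb1", &mode) should fill in fields such as mode.xres, mode.yres and mode.bits_per_pixel.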

Manipulating the Framebuffer

Normally, to use a framebuffer device, one would open it, mmap its display memory and then copy data into it. In fact, we can play videos on our trusty Voodoo with mplayer (which does exactly this) with the following command line:

graz@minty ~ $ sudo mplayer -vo fbdev:/dev/fb1 video.mp4

However, we need to do more than this. We need access to the memory mapped register file of the 3D accelerator. We can determine the physical address of both the display memory and the register file with the FBIOGET_FSCREENINFO ioctl. Apparently, it’s possible to mmap a framebuffer device’s memory mapped registers directly from the framebuffer device descriptor, but I couldn’t get that to work with the sstfb device. So, I resort to brute force and (as root) mmap /dev/mem using the device’s reported register addresses. Code to open the /dev/fb1 device and mmap its registers is as follows:

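#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/fb.h>

int fd;
int fd_mem;
struct fb_fix_screeninfo finfo;
void * vd_mmio;

// Open the Voodoo's framebuffer device and physical memory (needs root)
fd = open("/dev/fb1", O_RDWR | O_SYNC);
fd_mem = open("/dev/mem", O_RDWR | O_SYNC);

printf("fd opened as %d. /dev/mem opened as %d\n",
       fd, fd_mem);

// The fixed screen info reports the physical base and length of the MMIO region
ioctl(fd, FBIOGET_FSCREENINFO, &finfo);

printf("Attempt to map 0x%08X bytes @ %p for MMIO\n",
       finfo.mmio_len, (void *)finfo.mmio_start);

// Map the register file through /dev/mem at its physical address
vd_mmio = mmap(NULL, finfo.mmio_len,
               PROT_READ | PROT_WRITE, MAP_SHARED,
               fd_mem, finfo.mmio_start);

// errno is only meaningful if the mapping failed
if (vd_mmio == MAP_FAILED)
    printf("mmap failed (errno = %d)\n", errno);
else
    printf("MMIO @ %p\n", vd_mmio);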

The pointer we get back from our call to mmap is now a user-space virtual address of the register file of the Voodoo accelerator.

Bashing Registers

The register file of the Voodoo chipset is essentially organized as a large set of state registers with a few command registers interspersed amongst them. Writes to state registers are latched and writes to command registers trigger rendering that uses the values previously written into the state registers. In some cases the behavior of the command is affected by the value written to the command register, and in others, the simple act of writing any value to the register is enough to trigger execution of that command.
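The split between state and command registers can be sketched with a toy software model. This is not the Voodoo: the two offsets, the struct name and the eight-word memory are all made up, and the only point is that a state write is latched while a command write triggers work that uses the latched value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy software model of the state/command split. Every name and offset
// here is hypothetical and chosen only for illustration.
struct toy_device
{
    static constexpr std::uint32_t STATE_OFFSET   = 0x0;  // hypothetical state register
    static constexpr std::uint32_t COMMAND_OFFSET = 0x4;  // hypothetical command register

    std::uint32_t latched_value = 0;
    std::vector<std::uint32_t> memory = std::vector<std::uint32_t>(8, 0u);

    void write_reg(std::uint32_t offset, std::uint32_t value)
    {
        switch (offset)
        {
        case STATE_OFFSET:
            // State write: latch the value. Nothing else happens yet.
            latched_value = value;
            break;
        case COMMAND_OFFSET:
            // Command write: the act of writing triggers the operation,
            // which consumes the previously latched state. The value
            // written is ignored in this model.
            for (std::uint32_t& word : memory)
                word = latched_value;
            break;
        default:
            assert(false && "write to unmapped offset");
        }
    }
};
```

Writing the state register alone leaves the model's memory untouched; only the later write to the command register fills it.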

The first thing we’ll do is attempt to clear the display. To do this, we’ll use the fastFillCMD register, which is located at offset 0x124 from the start of the register file and triggers execution of the FASTFILL command. The FASTFILL command references several of the state registers to control its behavior. In fact, the FASTFILL command is not really a ‘clear the screen’ command, but rather a ‘fill an arbitrary rectangle with user-supplied values’ command. The bounds of the rectangle are set by two registers, clipLeftRight (at offset 0x118) and clipLowYHighY (at offset 0x11C), which set the horizontal and vertical bounds of the fill operation, respectively. We can define structures to represent these registers as follows:

union VD_CLIP_LEFTRIGHT_REG
{
    struct
    {
        unsigned int right      : 12;
        unsigned int            : 4;
        unsigned int left       : 12;
        unsigned int            : 4;
    };
    unsigned int uint_bits;
};

union VD_CLIP_LOWYHIGHY_REG
{
    struct
    {
        unsigned int highy      : 12;
        unsigned int            : 4;
        unsigned int lowy       : 12;
        unsigned int            : 4;
    };
    unsigned int uint_bits;
};

Note that the bounds of the rectangle are expressed as 12-bit numbers here, allowing offsets up to 4095. The original Voodoo 1, however, only honored 10 bits of these registers, allowing for framebuffers up to 1024 pixels wide. They were widened with Voodoo 2. Also notice the use of unnamed bitfields to skip reserved bits in the register.
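The order in which a compiler allocates bit-fields is implementation-defined, so the unions above rely on the compiler placing the first field in the least significant bits. As a hedged alternative, the same 32-bit values can be built with explicit shifts. The helper names below are hypothetical; the field positions (bits 11:0 and bits 27:16) are taken from the unions above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not from the original post). Each register holds
// one 12-bit field in bits 11:0 and another in bits 27:16.

// clipLeftRight: right in the low field, left in the high field.
inline std::uint32_t pack_clip_left_right(std::uint32_t left, std::uint32_t right)
{
    assert(left <= 0xFFFu && right <= 0xFFFu);
    return right | (left << 16);
}

// clipLowYHighY: highy in the low field, lowy in the high field.
inline std::uint32_t pack_clip_lowy_highy(std::uint32_t lowy, std::uint32_t highy)
{
    assert(lowy <= 0xFFFu && highy <= 0xFFFu);
    return highy | (lowy << 16);
}
```

For a fill spanning 0 to 640 horizontally, pack_clip_left_right(0, 640) yields 0x00000280.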

The FASTFILL command can fill the color and depth buffers with an arbitrary value. Which buffers get written is controlled by the fbzMode register (at 0x110), whose definition is expressed as:

union VD_FBZMODE_REG
{
    struct
    {
        unsigned int clip_enable                    : 1;
        unsigned int chroma_key_enable              : 1;
        unsigned int stipple_enable                 : 1;
        unsigned int w_buffer_select                : 1;
        unsigned int depth_write_enable             : 1;
        unsigned int depth_func                     : 3;
        unsigned int dither_enable                  : 1;
        unsigned int rgb_write_enable               : 1;
        unsigned int depth_alpha_write_enable       : 1;
        unsigned int dither_algorithm               : 1;
        unsigned int stipple_pattern_enable         : 1;
        unsigned int alpha_channel_mask             : 1;
        unsigned int draw_buffer                    : 2;
        unsigned int depth_bias_enable              : 1;
        unsigned int y_origin                       : 1;
        unsigned int alpha_plane_enable             : 1;
        unsigned int alpha_blend_dither_subtract    : 1;
        unsigned int depth_source_compare_select    : 1;
        unsigned int                                : 11;
    };
    unsigned int uint_bits;
};

This register has a huge amount of state in it. For our clear operation, we’re really only interested in the rgb_write_enable field, which controls writes to the color buffer. Everything else can be zeros. Finally, we need to set the color we’re going to clear to. The FASTFILL command uses the color stored in the color1 register (0x148) and fills the specified rectangle with it. The color1 register has the following form:

union VD_COLOR_REG
{
    struct
    {
        unsigned int B          : 8;
        unsigned int G          : 8;
        unsigned int R          : 8;
        unsigned int A          : 8;
    };
    unsigned int uint_bits;
};

When we fill in the bits of color1 with our desired color and then write to the fastFillCMD register, the hardware will fill the rectangle defined by the clipLeftRight and clipLowYHighY registers with that color as quickly as possible.
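To make the two values concrete, here is a small sketch that computes them with shifts rather than bit-fields. It assumes fields are allocated starting from the least significant bit, as the unions above do. The constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit position of rgb_write_enable, counted from the field widths in
// VD_FBZMODE_REG: 1 + 1 + 1 + 1 + 1 + 3 + 1 = 9 bits precede it.
const std::uint32_t FBZMODE_RGB_WRITE_ENABLE = 1u << 9;

// Hypothetical helper: pack four 8-bit channels in the VD_COLOR_REG
// layout, with B in bits 7:0, G in 15:8, R in 23:16 and A in 31:24.
inline std::uint32_t pack_color(std::uint32_t r, std::uint32_t g,
                                std::uint32_t b, std::uint32_t a)
{
    assert(r <= 0xFFu && g <= 0xFFu && b <= 0xFFu && a <= 0xFFu);
    return b | (g << 8) | (r << 16) | (a << 24);
}
```

With only rgb_write_enable set, the fbzMode word is 0x00000200, and pack_color(92, 78, 189, 0) gives 0x005C4EBD.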

The Register File

We can represent the register file as a structure in memory. Every register is 32 bits wide. Some are raw unsigned integers, some are floating point values, and some are bitfields as you have seen above. There are a few gaps (reserved spots) in the register file as well. By designing a structure that represents the register file and filling in the registers that we care about as we need them, we can build up our view of the underlying hardware. We can represent the reserved registers as unnamed 32 bit wide bitfields, which prevents us from accidentally writing to them. Here is the structure we have so far.

struct VD_REG_FILE
{
    unsigned int                            : 32;   /// reg000
    unsigned int                            : 32;   /// reg004
    unsigned int                            : 32;   /// reg008
    unsigned int                            : 32;   /// reg00C
    unsigned int                            : 32;   /// reg010
    unsigned int                            : 32;   /// reg014
    /* ... etc. Registers 0x018 through 0x10C live here ... */
    VD_FBZMODE_REG fbzMode;                         /// reg110
    unsigned int                            : 32;   /// reg114
    VD_CLIP_LEFTRIGHT_REG clipLeftRight;            /// reg118
    VD_CLIP_LOWYHIGHY_REG clipLowYHighY;            /// reg11C
    unsigned int                            : 32;   /// reg120
    unsigned int fastfillCMD;                       /// reg124
    unsigned int                            : 32;   /// reg128
    unsigned int                            : 32;   /// reg12C
    unsigned int                            : 32;   /// reg130
    unsigned int                            : 32;   /// reg134
    unsigned int                            : 32;   /// reg138
    unsigned int                            : 32;   /// reg13C
    unsigned int                            : 32;   /// reg140
    unsigned int                            : 32;   /// reg144
    VD_COLOR_REG color1;                            /// reg148
};

The register file is actually larger than this. We’ll add more registers to the end of it as we start to use them. Once we have our register file structure, we simply cast the pointer we got back from our mmap operation earlier to a pointer to a volatile instance of it and start reading and writing its members.
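Since a misplaced member would silently send writes to the wrong register, it can be worth checking the layout at compile time. The sketch below uses a simplified, hypothetical view of the same registers (plain 32-bit words, with padding arrays instead of unnamed bit-fields) and asserts the offsets quoted in the text. The struct and function names are assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flat view of the registers used so far. Every register is
// one 32-bit word; padding stands in for registers not described yet.
struct vd_reg_file_flat
{
    std::uint32_t not_described_000[0x110 / 4];            // 0x000 .. 0x10C
    std::uint32_t fbzMode;                                 // 0x110
    std::uint32_t not_described_114;                       // 0x114
    std::uint32_t clipLeftRight;                           // 0x118
    std::uint32_t clipLowYHighY;                           // 0x11C
    std::uint32_t not_described_120;                       // 0x120
    std::uint32_t fastFillCMD;                             // 0x124
    std::uint32_t not_described_128[(0x148 - 0x128) / 4];  // 0x128 .. 0x144
    std::uint32_t color1;                                  // 0x148
};

// Compile-time checks against the offsets given in the text.
static_assert(offsetof(vd_reg_file_flat, fbzMode) == 0x110, "fbzMode offset");
static_assert(offsetof(vd_reg_file_flat, clipLeftRight) == 0x118, "clipLeftRight offset");
static_assert(offsetof(vd_reg_file_flat, clipLowYHighY) == 0x11C, "clipLowYHighY offset");
static_assert(offsetof(vd_reg_file_flat, fastFillCMD) == 0x124, "fastFillCMD offset");
static_assert(offsetof(vd_reg_file_flat, color1) == 0x148, "color1 offset");

// View a mapped base address as a pointer to a volatile register file.
inline volatile vd_reg_file_flat* as_reg_file(void* mmio_base)
{
    assert(mmio_base != nullptr);
    return static_cast<volatile vd_reg_file_flat*>(mmio_base);
}
```

If a member is added in the wrong place later, the build fails instead of the hardware misbehaving.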

All Clear Ahead

Now we have everything we need to issue our first command to our Voodoo board. I’ve constructed a simple class that holds the pointer to the register file and will eventually manage other state for us. For now, we’ll build a function to fill an arbitrary rectangle with an arbitrary color.

void voodoo::fill_color_rect(unsigned int R,
                             unsigned int G,
                             unsigned int B,
                             unsigned int A,
                             unsigned int left,
                             unsigned int top,
                             unsigned int right,
                             unsigned int bottom)
{
    // Set the fill bounds
    regs->clipLeftRight.uint_bits = VD_CLIP_LEFTRIGHT_REG(left, right).uint_bits;
    regs->clipLowYHighY.uint_bits = VD_CLIP_LOWYHIGHY_REG(top, bottom).uint_bits;

    // Set the clear color
    regs->color1.uint_bits = VD_COLOR_REG(R, G, B, A).uint_bits;

    // Set the write mask
    VD_FBZMODE_REG fbzMode;

    fbzMode.uint_bits = 0;
    fbzMode.rgb_write_enable = 1;
    regs->fbzMode.uint_bits = fbzMode.uint_bits;

    // Issue the command
    regs->fastfillCMD = 0;
}

Notice that although I’m filling in individual bit-fields here, I’m only ever writing to the uint_bits field of the register file. This is because the compiler would otherwise insert read-modify-write cycles to PCI address space due to the volatile declaration of the register file. This is not what we want. At best that would be slow, and at worst it could trigger undesired behavior, hang the hardware, or otherwise bork our system.
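The difference in bus traffic can be illustrated with a stand-in register that counts its accesses. This is a sketch, not real hardware: counting_reg and both helper functions are hypothetical, and the read-modify-write version only mimics what a bit-field store on a volatile register amounts to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one memory-mapped register. It counts reads
// and writes so the two update styles can be compared.
struct counting_reg
{
    std::uint32_t value = 0;
    int reads = 0;
    int writes = 0;

    std::uint32_t read()        { ++reads;  return value; }
    void write(std::uint32_t v) { ++writes; value = v; }
};

// What a bit-field store on a volatile register amounts to:
// read the word, change some bits, write the word back.
inline void set_bit_read_modify_write(counting_reg& reg, unsigned bit)
{
    assert(bit < 32);
    reg.write(reg.read() | (1u << bit));
}

// The approach used in fill_color_rect: build the whole word in a local
// variable first, then issue a single write.
inline void set_word_single_write(counting_reg& reg, std::uint32_t word)
{
    reg.write(word);
}
```

Setting bit 9 through the first helper costs one read and one write; the second reaches the same register value with a single write and no read.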

To call the clear function, we just construct an instance of our voodoo class around the memory mapped register file (the constructor captures the register file pointer) and then start calling its member functions:

voodoo board(vd_mmio);

board.fill_color_rect(92, 78, 189, 0, 0, 0, 640, 480);

And the result…

The result of our first clear

Screen Cleared to Purple

Congratulations to us, we’ve just issued our first accelerated operation on the trusty Voodoo 1! In the next post we’ll look at the Voodoo’s status register and figure out how to draw a triangle.

Notes:

  1. Initial checkin of the Glide source code to Sourceforge by 3Dfx was on Wednesday Nov 24 21:39:26 1999 UTC
  2. The first few Voodoo chipsets supported only 3D and relied on the existing 2D graphics card for VGA support. The VGA card was connected to the 3Dfx card via a short cable, and the 3Dfx card would switch between the VGA card and its own video signal using analog electronics. Some 3Dfx boards used solid-state electronics to implement this switch, but mine features mechanical relays, which make a satisfying click as 3D acceleration engages.

6 responses to Voodoo Registers – Part 1


  1. Steen Rasmussen

    So I finally put together a voodoo1 machine. It’s an older dual-core Atom (330) running Linux Mint Mate 15 (32-bit). However when I modprobe sstfb it sort of locks up the screen with distorted blue curves. I tried replacing the Orchid with another voodoo1 but the result was the same. Not sure if the integrated GPU could cause problems but I’ll try another setup asap.

    • Steen Rasmussen

      Oh well, didn’t have much luck with two other setups. Basically the same happens just with a little variation to the output. I even went so far as to install Win98SE on an AMD Barton 3200+ and nForce2 system to confirm whether the Orchid Righteous 3D card was still working. I was able to play both Quake II and Unreal Tournament so the card seems to be OK.

  2. That’s odd. I’ve been using an Orchid Righteous 3D (Voodoo 1) and an STB (I think) Voodoo 2. Both work fine. This is Linux Mint 14 64-bit. I use 2 monitors – one plugged into the Voodoo and one directly to the regular VGA. That way, I can see what I’m doing. You can try an application like mplayer that can render using the framebuffer device. Is there anything odd in dmesg?

    • Steen Rasmussen

      Do you still use a passthrough cable between the cards?

      • No. There is one monitor attached directly to each card. You could, but then you don’t see what you’re doing when the Voodoo is active.

        • Steen Rasmussen

          Hah, I used the passthrough cable because that was how it was done in the old days but when I pause and think about it I can see that it really isn’t needed. I still get the initial blue curves but I have no problem playing a video with mplayer so it actually seems to work alright. Many thanks for pointing me in the right direction.