PSX Notes

This book is a collection of notes related to the PSX (PlayStation 1), geared towards emulator development. It is intended to be used as a complementary resource to the very good PSX-SPX.

note

While the information in this book is meant to be correct, this is only ensured on a best-effort basis. If you find something wrong or missing, please open an issue or create a pull request.

CPU

Notes related to the CPU of the PSX.

Encoding of BLTZ/BGEZ/BLTZAL/BGEZAL

Both PSX-SPX and the R3000 manual specify these instructions as having the following encoding:

000001 | rs   | 00000| <--immediate16bit--> | bltz
000001 | rs   | 00001| <--immediate16bit--> | bgez
000001 | rs   | 10000| <--immediate16bit--> | bltzal
000001 | rs   | 10001| <--immediate16bit--> | bgezal

This is only partially correct. The true encoding is the following:

000001 | rs   | xxxx0| <--immediate16bit--> | bltz
000001 | rs   | xxxx1| <--immediate16bit--> | bgez
000001 | rs   | 10000| <--immediate16bit--> | bltzal
000001 | rs   | 10001| <--immediate16bit--> | bgezal

Where xxxx is any bit sequence other than 1000.

This behaviour is required for passing the b_0xXX_y tests in Amidog’s CPU test.
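The decoding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `decode_regimm` and its string return values are assumptions made for this example, not something taken from PSX-SPX or the R3000 manual.

```python
def decode_regimm(instr: int) -> str:
    """Return the mnemonic of an instruction whose primary opcode is 0b000001."""
    assert (instr >> 26) & 0x3F == 0b000001, "not a BLTZ/BGEZ-family instruction"
    rt = (instr >> 16) & 0x1F  # the 5-bit field that selects the variant

    # Linking only happens when the upper four bits are exactly 1000.
    link = (rt >> 1) == 0b1000
    # The lowest bit alone selects "less than zero" vs "greater than or equal".
    name = "bgez" if rt & 1 else "bltz"
    return name + "al" if link else name
```

For instance, a field value of 10010 has its upper four bits set to 1001, so it decodes as a plain bltz rather than a linking branch.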

Unaligned loads and stores

These instructions are surprisingly confusing, but become easier to grasp once you understand their use case, which is performing unaligned reads/writes from/to memory.

LWL

Description

LWL’s goal is reading the “left” (i.e. higher order) part of an unaligned word in memory into the “left” part of a register.

After computing the address addr to operate on, LWL performs the following:

Iterate through the bytes of rt, from highest order to lowest (i.e. big endian)
At the same time, also iterate through bytes in memory starting at addr and decreasing (since the R3000 is little endian, this is also highest order to lowest)
In each iteration, set the byte of rt to the byte in memory
Stop once the byte address crosses a word boundary (in practice, this means you’ll iterate exactly (addr % 4) + 1 times)

Examples

// starting state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0xDEAD_BEEF

// operation: load left from address 6
lwl rt, 6($0)

// finishing state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]
rt:            0x6655_44EF

// starting state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0xDEAD_BEEF

// operation: load left from address 4
lwl rt, 4($0)

// finishing state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0x44AD_BEEF
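The byte-by-byte description of LWL can be written down almost literally. The sketch below is in Python and assumes memory is a plain byte sequence; the names `MEMORY` and `lwl` are made up for this example. It favours clarity over speed.

```python
# Same memory contents as in the examples above.
MEMORY = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99])

def lwl(rt: int, memory: bytes, addr: int) -> int:
    """Return the new value of rt after a 'load word left' from addr."""
    shift = 24  # start at the highest-order byte of rt
    while True:
        # Replace one byte of rt with the byte in memory.
        rt = (rt & ~(0xFF << shift) & 0xFFFFFFFF) | (memory[addr] << shift)
        if addr % 4 == 0:  # this was the first byte of the word: stop
            break
        addr -= 1   # walk down through memory...
        shift -= 8  # ...and down through the bytes of rt
    return rt
```

Calling `lwl(0xDEAD_BEEF, MEMORY, 6)` and `lwl(0xDEAD_BEEF, MEMORY, 4)` reproduces the two finishing states shown above.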

LWR

Description

LWR’s goal is reading the “right” (i.e. lower order) part of an unaligned word in memory into the “right” part of a register.

After computing the address addr to operate on, LWR performs the following:

Iterate through the bytes of rt, from lowest order to highest (i.e. little endian)
At the same time, also iterate through bytes in memory starting at addr and increasing (since the R3000 is little endian, this is also lowest order to highest)
In each iteration, set the byte of rt to the byte in memory
Stop once the byte address crosses a word boundary (in practice, this means you’ll iterate exactly 4 - addr % 4 times)

Examples

// starting state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0xDEAD_BEEF

// operation: load right from address 3
lwr rt, 3($0)

// finishing state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0xDEAD_BE33

// starting state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0xDEAD_BEEF

// operation: load right from address 1
lwr rt, 1($0)

// finishing state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0xDE33_2211
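LWR mirrors LWL, so the same kind of byte-by-byte sketch applies. As before, this is Python written for illustration only, and the names `MEMORY` and `lwr` are assumptions of this example.

```python
# Same memory contents as in the examples above.
MEMORY = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99])

def lwr(rt: int, memory: bytes, addr: int) -> int:
    """Return the new value of rt after a 'load word right' from addr."""
    shift = 0  # start at the lowest-order byte of rt
    while True:
        # Replace one byte of rt with the byte in memory.
        rt = (rt & ~(0xFF << shift) & 0xFFFFFFFF) | (memory[addr] << shift)
        if addr % 4 == 3:  # this was the last byte of the word: stop
            break
        addr += 1   # walk up through memory...
        shift += 8  # ...and up through the bytes of rt
    return rt
```

Calling `lwr(0xDEAD_BEEF, MEMORY, 3)` and `lwr(0xDEAD_BEEF, MEMORY, 1)` reproduces the two finishing states shown above.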

LWL and LWR example usage together

// starting state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0xDEAD_BEEF

// let's load the word at address 1
// operation: load right from address 1
lwr rt, 1($0)
// operation: load left from address 4
lwl rt, 4($0)

// finishing state
memory:        [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, ..]
> address:     [0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    ..]
> word_index:  [0,    0,    0,    0,    1,    1,    1,    1,    2,    2,    ..]

rt:            0x4433_2211
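Emulators usually do not loop over bytes. A common alternative is to read the aligned word once and merge it into rt with masks and shifts. The Python sketch below shows one such formulation, which should be equivalent to the byte-wise description for a little-endian CPU. All names (`read_word`, `lwl`, `lwr`, `load_unaligned`) are hypothetical helpers for this example.

```python
MEMORY = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99])

def read_word(memory: bytes, addr: int) -> int:
    """Read the little-endian word that contains addr."""
    aligned = addr & ~3
    return int.from_bytes(memory[aligned:aligned + 4], "little")

def lwl(rt: int, memory: bytes, addr: int) -> int:
    shift = (addr & 3) * 8
    word = read_word(memory, addr)
    # Keep the low bytes of rt, fill the high bytes from memory.
    return (rt & (0x00FFFFFF >> shift)) | ((word << (24 - shift)) & 0xFFFFFFFF)

def lwr(rt: int, memory: bytes, addr: int) -> int:
    shift = (addr & 3) * 8
    word = read_word(memory, addr)
    # Keep the high bytes of rt, fill the low bytes from memory.
    return (rt & (0xFFFFFF00 << (24 - shift)) & 0xFFFFFFFF) | (word >> shift)

def load_unaligned(rt: int, memory: bytes, addr: int) -> int:
    """The usual pairing: lwr at addr, then lwl at addr + 3."""
    return lwl(lwr(rt, memory, addr), memory, addr + 3)
```

With this, `load_unaligned(0xDEAD_BEEF, MEMORY, 1)` yields 0x4433_2211, matching the combined example above.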

note

Both LWL and LWR are special-cased in hardware so that they can bypass the load delay slot. This means that performing any load to a register and then immediately using either LWL or LWR to load into the same register will have the second load use the value loaded by the first one, even though it sits in the first load’s delay slot.

At the same time, however, a load cancel will happen, so the instruction right after the second load will not see the value loaded by the first instruction.

SWL and SWR

These two instructions work exactly like their load counterparts, except that instead of loading bytes from memory into the bytes of rt, they store the bytes of rt to memory.
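As a rough illustration, the stores can be sketched with the same byte-wise loops as the loads, with the direction of the copy reversed. The Python below assumes memory is a mutable `bytearray`; the names `swl` and `swr` are chosen for this example only.

```python
def swl(rt: int, memory: bytearray, addr: int) -> None:
    """Store the high-order bytes of rt, walking down from addr."""
    shift = 24
    while True:
        memory[addr] = (rt >> shift) & 0xFF
        if addr % 4 == 0:  # reached the first byte of the word
            break
        addr -= 1
        shift -= 8

def swr(rt: int, memory: bytearray, addr: int) -> None:
    """Store the low-order bytes of rt, walking up from addr."""
    shift = 0
    while True:
        memory[addr] = (rt >> shift) & 0xFF
        if addr % 4 == 3:  # reached the last byte of the word
            break
        addr += 1
        shift += 8

# Usage: store 0xDEAD_BEEF as an unaligned word at address 1.
ram = bytearray(8)
swr(0xDEAD_BEEF, ram, 1)
swl(0xDEAD_BEEF, ram, 4)
```

After the two calls, bytes 1 to 4 of `ram` hold the word in little-endian order and the surrounding bytes are untouched.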

Load Cancelling

Sequential loads to the same register exhibit “load cancelling”: non-terminal (i.e. not the last in a sequence) loads are completely discarded and only the last load will take effect.

This behaviour is required for passing some of the CPU MEM DLY tests in Amidog’s CPU test.

note

“Load” here refers to any kind of value move into a register, not only memory loads. This is especially important for instructions like jal, bltzal and bgezal, which modify ra and also exhibit load cancelling.

Example

Consider the following program:

lb t0, 0($a0) // load value at address in a0 to t0
lb t0, 0($a1) // load value at address in a1 to t0
nop
nop

Since both loads have t0 as their target register, load cancelling will happen and the first load will not be visible at any point in time. That is, up until the first nop, the value of t0 remains unchanged. It is only at the second nop that the value of t0 changes to that of the value loaded by the second load:

lb t0, 0($a0) // t0 unchanged
lb t0, 0($a1) // t0 unchanged
nop           // t0 unchanged
nop           // t0 changed to the value loaded by second instruction

Here’s another example:

lb t0, 0($a0)        // t0 unchanged
addi t0, r0, 0x0001  // moves 0x0001 into t0
nop                  // t0 is still 0x0001 - the first load was cancelled
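One possible way to model this in an emulator is a single "pending load" slot that is committed one instruction late and dropped when the next instruction targets the same register. The Python sketch below is a deliberately simplified model: the `Cpu` class, its `step` method and the `load`/`write` parameters are all hypothetical, and it ignores the LWL/LWR bypass and the hard-wired zero register.

```python
class Cpu:
    def __init__(self) -> None:
        self.regs = [0] * 32
        self.pending = None  # (register, value) of a load still in flight

    def step(self, load=None, write=None) -> None:
        """Execute one instruction.

        load:  (register, value) for a delayed load, or None
        write: (register, value) for an immediate register write, or None
        """
        pending, self.pending = self.pending, load
        if pending is not None:
            # A new load to the same register cancels the one in flight.
            cancelled = load is not None and load[0] == pending[0]
            if not cancelled:
                self.regs[pending[0]] = pending[1]
        if write is not None:
            # Applied after the commit, so an immediate write wins.
            self.regs[write[0]] = write[1]

T0 = 8
cpu = Cpu()
cpu.regs[T0] = 0xAA
cpu.step(load=(T0, 0x11))  # lb t0, 0($a0)
cpu.step(load=(T0, 0x22))  # lb t0, 0($a1): cancels the first load
seen_by_first_nop = cpu.regs[T0]
cpu.step()                 # nop
seen_by_second_nop = cpu.regs[T0]
```

In this model, `regs` after a call to `step` is what the following instruction observes, so the first nop still sees the old value of t0 and the second nop sees the value of the second load.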

Manual Errata

A list of corrections of the R3000 manual.

SLTIU

The manual describes SLTIU as putting the result into rt, but shows pseudocode that puts the result into rd. The pseudocode is wrong: rt is the correct register.
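For reference, here is a small Python sketch of SLTIU with the corrected destination register. The function name `sltiu` and the representation of the register file as a list are assumptions of this example. Note that the immediate is sign-extended before the unsigned comparison.

```python
def sltiu(regs: list, instr: int) -> None:
    """Set rt to 1 if rs < sign_extend(imm) as unsigned values, else to 0."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    imm = instr & 0xFFFF
    # Sign-extend the 16-bit immediate to 32 bits.
    imm_se = imm | 0xFFFF0000 if imm & 0x8000 else imm
    if rt != 0:  # register 0 is hard-wired to zero
        regs[rt] = 1 if regs[rs] < imm_se else 0

# Usage: sltiu $2, $4, 0xFFFF with $4 = 5
regs = [0] * 32
regs[4] = 5
sltiu(regs, (0b001011 << 26) | (4 << 21) | (2 << 16) | 0xFFFF)
```

Here the immediate becomes 0xFFFF_FFFF after sign extension, so the comparison is true and register 2 (the rt field) receives 1, while the register selected by what would be the rd bits is left alone.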

GPU

Notes related to the GPU of the PSX.

GPU Commands

GPU commands are composed of a sequence of 32-bit values (which I’ll call packets) sent through registers GP0 and GP1, with the former being used for rendering commands and the latter for display commands.

Most commands require a single packet, but some require more.

GP0 - Rendering Commands

note

This page is mostly intended as an encoding reference. For more information about behaviour, consult PSX-SPX.

Commands have an opcode defined in bits 29..32 of the first packet (the upper bound is exclusive, so this means bits 29 to 31):

Opcode	Name	Alias
`0x0`	Misc
`0x1`	Polygon
`0x2`	Line
`0x3`	Rectangle
`0x4`	VRAM to VRAM blit
`0x5`	CPU to VRAM blit	GP0(A0)
`0x6`	VRAM to CPU blit	GP0(C0)
`0x7`	Environment

0x00 - Misc

Misc commands have another opcode defined in bits 24..29 of the first packet (the upper bound is exclusive, so this means bits 24 to 28):

Misc Opcode	Name	Alias
`0x00`	NOP	GP0(00)
`0x01`	Clear Cache	GP0(01)
`0x02`	Quick Rect Fill	GP0(02)
`0x03`	Unknown*	GP0(03)
`0x04..=0x1E`	NOP	GP0(04)..=GP0(1E)
`0x1F`	Interrupt Request	GP0(1F)

* Takes space in the FIFO, but otherwise seems to behave like a NOP.

0x07 - Environment

Environment commands have another opcode defined in bits 24..29 of the first packet (the upper bound is exclusive, so this means bits 24 to 28):

Environment Opcode	Name	Alias
`0x00`	NOP	GP0(E0)
`0x01`	Drawing Settings	GP0(E1)
`0x02`	Texture Settings	GP0(E2)
`0x03`	Set Drawing Area Top-Left	GP0(E3)
`0x04`	Set Drawing Area Bottom-Right	GP0(E4)
`0x05`	Set Drawing Offset	GP0(E5)
`0x06`	Mask Settings	GP0(E6)
`0x07..=0x1F`	NOP	GP0(E7)..=GP0(FF)
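The two-level opcode layout described above can be extracted with two shifts and masks. The Python sketch below is illustrative only; `GP0_OPCODES` and `decode_gp0` are names invented for this example, and the secondary opcode is only meaningful for the Misc and Environment groups.

```python
GP0_OPCODES = {
    0x0: "Misc",
    0x1: "Polygon",
    0x2: "Line",
    0x3: "Rectangle",
    0x4: "VRAM to VRAM blit",
    0x5: "CPU to VRAM blit",
    0x6: "VRAM to CPU blit",
    0x7: "Environment",
}

def decode_gp0(packet: int) -> tuple:
    """Split the first packet of a GP0 command into (opcode, secondary opcode)."""
    opcode = (packet >> 29) & 0x7      # bits 29 to 31
    secondary = (packet >> 24) & 0x1F  # bits 24 to 28
    return opcode, secondary

# Usage: the top byte 0xE1 is the Drawing Settings command, GP0(E1).
opcode, secondary = decode_gp0(0xE100_0000)
name = GP0_OPCODES[opcode]
```

This also shows where the aliases come from: the top byte of the packet is the opcode shifted left by five bits, combined with the secondary opcode.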

GP1 - Display Commands

note

This page is mostly intended as an encoding reference. For more information about behaviour, consult PSX-SPX.

Commands have an opcode defined in bits 24..30 of the first packet (the upper bound is exclusive, so this means bits 24 to 29):

Opcode	Name	Alias
`0x00`	Reset GPU	GP1(00)
`0x01`	Reset Command Buffer	GP1(01)
`0x02`	Acknowledge GPU Interrupt	GP1(02)
`0x03`	Display Enable/Disable	GP1(03)
`0x04`	DMA Direction	GP1(04)
`0x05`	Display Area Start	GP1(05)
`0x06`	Horizontal Display Range	GP1(06)
`0x07`	Vertical Display Range	GP1(07)
`0x08`	Display Mode	GP1(08)
`0x09`	Set VRAM Size (v2)	GP1(09)
`0x0A..=0x0F`	Unknown	GP1(0A)..=GP1(0F)
`0x10`	Read GPU Register	GP1(10)
`0x11..=0x1F`	Mirrors of GP1(10)	GP1(11)..=GP1(1F)
`0x20`	Set VRAM Size (v1)	GP1(20)
`0x21..=0x3F`	Unknown	GP1(21)..=GP1(3F)

None of the display commands require extra packets.
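Decoding a GP1 packet is a single mask and shift. The Python sketch below assumes, following the bit range given above, that bits 30 and 31 of the packet are ignored, and it folds the mirrors listed in the table onto GP1(10). The name `decode_gp1` is hypothetical.

```python
def decode_gp1(packet: int) -> int:
    """Return the GP1 opcode of a packet, with mirrors of GP1(10) folded."""
    opcode = (packet >> 24) & 0x3F  # bits 24 to 29
    if 0x11 <= opcode <= 0x1F:
        opcode = 0x10  # mirrors of Read GPU Register
    return opcode

# Usage: a packet whose top byte is 0x08 is a Display Mode command.
display_mode = decode_gp1(0x0800_0000)
```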

References

This page is a collection of links that are commonly referenced throughout the book.

Resources

PSX-SPX

R3000 Manual

Tests

psxtest_cpu