LuaJIT 2.0 SSA IR

Introduction

The following document describes the Intermediate Representation (IR) used by the JIT-compiler of LuaJIT 2.0. The trace-compiler records bytecode instructions, following the control-flow, and emits the corresponding IR instructions on-the-fly.

The IR has the following characteristics:

The IR is in SSA (Static Single Assignment) form. Every instruction (node) represents a single definition of a value. Multiple instructions form a partially connected data-flow graph. Data-flow for loops is represented using PHI-instructions. Control-flow is always implicit.
The IR is linear, pointer-free and implicitly numbered: every instruction can be uniquely referenced (IRRef) by its position in a linear array. Specially crafted, biased IR references allow fast const vs. non-const decisions. No space is wasted on storing an explicit reference number, value number or similar.
The IR is in 2-operand-normalized form: every instruction has an opcode and a maximum of two operands. A few instructions may need more operands (e.g. CALL*), which are composed using extension instructions (CARG).
The IR is typed: every instruction has an output data type. The modeled types correspond to the basic Lua data types plus low-level data types. Higher-level data types are indirectly modeled as-needed with guarded assertions.
The IR has segregated, per-opcode chaining: this allows fast searching for specific instructions in reverse order without a full traversal. This is used to speed up many optimizations, like CSE or alias analysis. Most searches stop after zero (no match), one or two dereferences in practice.
The IR is very compact: it needs only 64 bits per instruction and all instructions are adjacent to each other. This layout is very cache-efficient and very fast to index or traverse.
The IR is incrementally generated: the IR array is bi-directionally grown: constants grow downwards, all other instructions grow upwards. Most optimizations are perfomed on-the-fly and eliminated instructions are either simply not emitted, ignored during code generation or appropriately tagged. There's no general need to insert or delete instructions in the middle. This avoids the very cache-inefficient linked-sea-of-nodes data structure, presented in most compiler textbooks.
The IR is unified: it carries both high-level semantics and low-level details. Different stages of the compiler use different aspects of the IR, but share a common IR format. Eliminating the classic HIR, MIR, LIR separation (high-, medium-, low-level IR) greatly reduces complexity and compiler overhead. It avoids semantic information loss due to abstraction mismatches and allows cheap and effective high-level semantic disambiguation for memory references.
The IR uses auxiliary snapshots: a snapshot captures the IR references corresponding to modified slots and frames in the bytecode execution stack. Every snapshot saves a specific bytecode execution state, which can later be restored on trace exits. Snapshots are sparsely emitted and compressed. Snapshots provide the link between the IR and the bytecode domain (and transitively the source code domain, via the bytecode debug info).

Status

COMPLETE REWRITE IN PROGRESS

See src/lj_ir.h and src/lj_jit.h in the LuaJIT source code for the full details. The generated IR can be listed with luajit -jdump (traced bytecode, IR and machine code) or luajit -jdump=i (IR only).

Example IR Dump

$ ./luajit -jdump=bitmsr
LuaJIT 2.0.0-beta10 -- Copyright (C) 2005-2012 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 AMD fold cse dce fwd dse narrow loop abc sink fuse
> local x = 1.2 for i=1,1e3 do x = x * -3 end
---- TRACE 1 start stdin:1
0006  MULVN    0   0   1  ; -3
0007  FORL     1 => 0006
---- TRACE 1 IR
....              SNAP   #0   [ ---- ]
0001 rbp      int SLOAD  #2    CI
0002 xmm7  >  num SLOAD  #1    T
0003 xmm7   + num MUL    0002  -3
0004 rbp    + int ADD    0001  +1
....              SNAP   #1   [ ---- 0003 ]
0005       >  int LE     0004  +1000
....              SNAP   #2   [ ---- 0003 0004 ---- ---- 0004 ]
0006 ------------ LOOP ------------
0007 xmm7   + num MUL    0003  -3
0008 rbp    + int ADD    0004  +1
....              SNAP   #3   [ ---- 0007 ]
0009       >  int LE     0008  +1000
0010 rbp      int PHI    0004  0008
0011 xmm7     num PHI    0003  0007
---- TRACE 1 mcode 81
394cffa3  mov dword [0x4183f4a0], 0x1
394cffae  movsd xmm0, [0x4184f698]
394cffb7  cvtsd2si ebp, [rdx+0x8]
394cffbc  cmp dword [rdx+0x4], 0xfffeffff
394cffc3  jnb 0x394c0010	->0
394cffc9  movsd xmm7, [rdx]
394cffcd  mulsd xmm7, xmm0
394cffd1  add ebp, +0x01
394cffd4  cmp ebp, 0x3e8
394cffda  jg 0x394c0014	->1
->LOOP:
394cffe0  mulsd xmm7, xmm0
394cffe4  add ebp, +0x01
394cffe7  cmp ebp, 0x3e8
394cffed  jle 0x394cffe0	->LOOP
394cffef  jmp 0x394c001c	->3
---- TRACE 1 stop -> loop

The above prints the bytecode of the trace, the IR generated from that bytecode with snapshots, and the machine code generated from the IR.

The columns of the IR are as follows:

1st column: IR instruction number (implicit SSA ref)
2nd column: physical CPU register or physical CPU stack slot that
  value is written to when converted to machine code.
  '[%x+]' (rather than register name) indicates hexadecimal offset
  from stack pointer.
  (This column is only present if the 'r' flags is included in -jdump, which
  augments the IR with register/stack slots.  It is not part of the IR itself.)
3rd column: Instruction flags:
  ">" (IRT_GUARD = 0x80 instruction flag) are locations of
    guards (leading to possible side exits from the trace).
  "+" (IRT_ISPHI = 0x40 instruction flag) indicates
    instruction is left or right PHI operand. (i.e referred
    to in some PHI instruction).
4th column: IR type (see IR Types below)
5th column: IR opcode (see opcode reference)
6th/7th column: IR operands (SSA refs or literals)
  '#' prefixes refer to slot numbers, used in SLOADS.
     #0 is the base frame (modified only in tail calls).
     #1 is the first slot in the first frame (register 0 in
     the bytecode)
  '[+-]' prefixes indicate positive or negative numeric literals.
  '[0x%d+]' and NULL are memory addresses.
  '"..."' are strings.
  '@' prefixes indicate slots (what is this?).
  Other possible values: "bias" (number 2^52+2^51 ?), "userdata:%p",
     "userdata:%p" (table)--when do these occur?.

See also SSA dump format comments: http://lua-users.org/lists/lua-l/2008-06/msg00225.html (older version). See formatk in dump.lua.

Each snaphot (SNAP) lists the modified stack slots and their values. The i-th value in the snapshot list represents the index of the IR that writes a value in slot number #i. '---' indicates that the slot is not written. Frames are separated by '|'. For further comments on snapshots, see http://lua-users.org/lists/lua-l/2009-11/msg00089.html.

IR Types

Every instruction has an output data type, which is either one of the basic Lua types or a low-level type. The order is carefully designed to simplify the mapping from tagged value types and to optimize common checks (e.g. 'any integer type'). See src/lj_ir.h and src/lj_obj.h.

#	Dump	IRT_	Description
0	nil	NIL	'nil' value
1	fal	FALSE	'false' value
2	tru	TRUE	'true' value
3	lud	LIGHTUD	Lightuserdata value
4	str	STR	Interned string object
5	p32	P32	32 bit pointer
6	thr	THREAD	Thread object
7	pro	PROTO	Function prototype object
8	fun	FUNC	Function (closure) object
9	p64	P64	64 bit pointer
10	cdt	CDATA	cdata object
11	tab	TAB	Table object
12	udt	UDATA	Userdata object
13	flt	FLOAT	32 bit FP number (float)
14	num	NUM	64 bit FP number (double)
15	i8	I8	8 bit signed integer (int8_t)
16	u8	U8	8 bit unsigned integer (uint8_t)
17	i16	I16	16 bit signed integer (int16_t)
18	u16	U16	16 bit unsigned integer (uint16_t)
19	int	INT	32 bit signed integer (int32_t)
20	u32	U32	32 bit unsigned integer (uint32_t)
21	i64	I64	64 bit signed integer (int64_t)
22	u64	U64	64 bit unsigned integer (uint64_t)
23	sfp	SOFTFP	Hi-word of split soft-fp operations

Constants

Constant instructions are only present in the constant part of the IR (growing upwards to lower refs). IR constants are interned (de-duplicated) and can be compared for equality only by looking at their references.

Constant instructions never appear in dumps, since -jdump always shows the actual constant value inlined into the referencing instructions.

OP	Left	Right	Description
KPRI			Primitive type: nil, false, true
KINT	#int		32 bit integer constant
KGC	#ptr		Garbage collected constant
KPTR	#ptr		Pointer constant
KKPTR	#ptr		Pointer constant to constant data
KNULL	#ptr		Typed NULL constant
KNUM	#k64ptr		Double-precision floating-point constant
KINT64	#k64ptr		64 bit integer constant
KSLOT	kref	#slot	Hash slot for constant

Every trace has three KPRI instructions at fixed references for the constants nil, false and true (REF_NIL, REF_FALSE, REF_TRUE).

The 32 bit integer or pointer values occupy the space for both the left and the right 16 bit operand. 64 bit values are interned in a global constant table and indirectly referenced by 32 bit pointers.

KPTR is a constant pointer (absolute address) to possibly non-constant data. KKPTR points to definitely constant data. Only data known by the VM to be constant qualifies, e.g. an interned Lua string. Content tagged as 'const' by users (e.g. const char *) doesn't qualify.

KSLOT is used as a key for HREFK and holds the hash slot where the key is to be found and a reference to the constant key itself.

Guarded Assertions

Guarded assertions have a dual purpose:

They provide an assertion about their operands that can be used by the compiler to optimize all following instructions in the same trace.
They are emitted by the backend as branching comparisons, with the 'true' outcome in the fall-through path. A 'false' outcome exits the trace and restores the state to the most recent snapshot.

OP	Left	Right	Description
LT	left	right	left < right (signed)
GE	left	right	left ≥ right (signed)
LE	left	right	left ≤ right (signed)
GT	left	right	left > right (signed)
ULT	left	right	left < right (unsigned/unordered)
UGE	left	right	left ≥ right (unsigned/unordered)
ULE	left	right	left ≤ right (unsigned/unordered)
UGT	left	right	left > right (unsigned/unordered)
EQ	left	right	left = right
NE	left	right	left ≠ right
ABC	bound	index	Array Bounds Check: bound > index (unsigned)
RETF	proto	pc	Return to lower frame: check target PC, shift base

The U.. opcodes provide unsigned comparison semantics for integer types and unordered comparison semantics for floating-point types. A NaN operand causes a 'false' outcome for EQ and ordered comparisons, and a 'true' outcome for NE and unordered comparisons.

ABC is treated just like UGT in the backend. But it follows different FOLD rules, which simplifies ABC elimination.

The prototype returned to by RETF is below the call graph covered by the trace up to this point. Thus RETF needs to anchor the prototype to prevent recycling the PC after garbage collection.

Bit Ops

OP	Left	Right	Description
BNOT	ref		Bitwise NOT of ref
BSWAP	ref		Byte-swapped ref
BAND	left	right	Bitwise AND of left and right
BOR	left	right	Bitwise OR of left and right
BXOR	left	right	Bitwise XOR of left and right
BSHL	ref	shift	Bitwise left shift of ref
BSHR	ref	shift	Bitwise logical right shift of ref
BSAR	ref	shift	Bitwise arithmetic right shift of ref
BROL	ref	shift	Bitwise left rotate of ref
BROR	ref	shift	Bitwise right rotate of ref

The shift count for shift and rotate instructions is interpreted modulo the bit width of the shifted type, i.e. only the lowest N bits are significant. Appropriate bit masking instructions (BAND) are inserted for backends where the underlying machine instructions don't perform the masking themselves. Similarly, rotates are unified to one direction, in case the architecture doesn't provide machine instructions for both.

Arithmetic Ops

OP	Left	Right	Description
ADD	left	right	left + right
SUB	left	right	left - right
MUL	left	right	left * right
DIV	left	right	left / right
MOD	left	right	left % right
POW	left	right	left ^ right
NEG	ref	kneg	-ref
ABS	ref	kabs	abs(ref)
ATAN2	left	right	atan2(left, right)
LDEXP	left	right	ldexp(left, right)
MIN	left	right	min(left, right)
MAX	left	right	max(left, right)
FPMATH	ref	#fpm	fpmath(ref), see below
ADDOV	left	right	left + right, overflow-checked
SUBOV	left	right	left - right, overflow-checked
MULOV	left	right	left * right, overflow-checked

All arithmetic ops operate within the domain of their types: integers, pointers or floating-point numbers. Not all ops are defined for all possible types. Both signed and unsigned integers wrap around on overflow.

Overflow-checking operations exit the trace upon signed integer arithmetic overflow.

MOD is decomposed into left-floor(left/right)right for floating-point numbers. POW is either turned into POW with an integer as the right operand, or into sqrt(left) if right is 0.5, or into exp2(log2(left)right) otherwise (the backend may later merge this into a call to pow()).

The undefined cases for the integer variants of DIV, MOD and POW return the integer value with only the highest bit set.

NEG and ABS for floating-point numbers reference a SIMD-aligned constant in the right operand. Some backends implement these as a bitwise XOR or AND of the number and the constant.

The right operand of LDEXP is a floating-point number on x86 and x64 platforms and a 32 bit integer on all others.

All floating-point arithmetic operations obey the standard definitions from IEEE 754 wrt. +-0, +-Inf, NaN and denormals. MIN and MAX have no defined behavior for NaN operands.

FPMATH is used for unary floating-point arithmetic operations. The right operand specifies the actual operation:

OP	Description
FPM_FLOOR	floor(ref)
FPM_CEIL	ceil(ref)
FPM_TRUNC	trunc(ref)
FPM_SQRT	sqrt(ref)
FPM_EXP	exp(ref)
FPM_EXP2	exp2(ref)
FPM_LOG	log(ref)
FPM_LOG2	log2(ref)
FPM_LOG10	log10(ref)
FPM_SIN	sin(ref)
FPM_COS	cos(ref)
FPM_TAN	tan(ref)

Memory References

Memory references generate a pointer value to be used by the respective loads or stores. To preserve higher-level semantics and to simplify alias analysis they are not decomposed into lower-level operations (like for XLOAD references). Some of them can be (partially) fused into the operands of load or store instructions by most backends.

OP	Left	Right	Description
AREF	array	index	Array reference
HREFK	hash	kslot	Constant hash reference
HREF	tab	key	Hash reference
NEWREF	tab	key	Create new reference
UREFO	func	#uv	Open upvalue reference
UREFC	func	#uv	Closed upvalue reference
FREF	obj	#field	Object field reference
STRREF	str	index	String reference

AREF and HREFK reference an FLOAD of the array part or hash part of a Lua table with their left operand. HREF and NEWREF directly reference a Lua table, since they need to search or extend the table. HREFK is specialized to the hash slot where the constant key is expected -- see KSLOT above for its right operand. NEWREF assumes the key does not exist in the Lua table, yet.

The left operand of UREFO and UREFC reference the current function (closure). The right operand holds an upvalue disambiguation hash in the lowest 8 bits and the upvalue index in the higher bits.

For the possible values of the field ID in FREF, see IRFLDEF in src/lj_ir.h.

Loads and Stores

Loads and stores operate on memory references and either load a value (result of the instruction) or store a value (the right operand). To preserve higher-level semantics and to simplify alias analysis they are not unified or decomposed into lower-level operations.

OP	Left	Right	Description
ALOAD	aref		Array load
HLOAD	href		Hash load
ULOAD	uref		Upvalue load
FLOAD	obj	#field	Object field load
XLOAD	xref	#flags	Extended load
SLOAD	#slot	#flags	Stack slot load
VLOAD	aref		Vararg slot load
ASTORE	aref	val	Array store
HSTORE	href	val	Hash store
USTORE	uref	val	Upvalue store
FSTORE	fref	val	Object field store
XSTORE	xref	val	Extended store

FLOAD and SLOAD inline their memory references, all other loads and all stores have a memory reference as their left operand. All loads except FLOAD and XLOAD work on tagged values and simultaneously function as a guarded assertion that checks the loaded type.

FLOAD and FSTORE access specific fields inside objects, identified by the field ID of their reference (e.g. the metatable field in table or userdata objects).

XLOAD works on lower-level types and the memory reference is either a STRREF or decomposed into lower-level operations, a combination of ADD, MUL or BSHL of pointers, offsets or indexes.

The slot number of SLOAD is relative to the starting frame of a trace, where #0 indicates the closure/frame slot and #1 the first variable slot (corresponding to slot 0 of the bytecode). Note that RETF shifts down BASE and subsequent SLOAD instructions refer to slots of the lower frame(s).

Note there are no store operations for stack slots or vararg slots. All stores to stack slots are effectively sunk into exits or side traces. Snapshots efficiently manage the references that are to be stored. Vararg slots are read-only from the perspective of the called vararg function.

For the possible values of the field ID in FLOAD and the flags in SLOAD and XLOAD, see IRFLDEF, IRSLOAD_* and IRXLOAD_* in src/lj_ir.h.

Allocations

OP	Left	Right	Description
SNEW	data	length	Allocate interned string
XSNEW	data	length	Allocate interned string from cdata
TNEW	#asize	#hbits	Allocate Lua table with minimum array and hash sizes
TDUP	template		Allocate Lua table, copying a template table
CNEW	ctypeid	(size)	Allocate mutable cdata
CNEWI	ctypeid	val	Allocate immutable cdata

SNEW is only used to reference data that's guaranteed to stay constant, e.g. other strings. This allows elimination of the allocation in case the string object isn't used (its data may still be used). This assumption doesn't hold for data referenced by ffi.string(), which has to emit XSNEW instead.

The size of the allocated cdata object for CNEW and CNEWI is inferred from the ctypeid operand. For variable-length cdata, the size is explicitly given by the size operand, otherwise the size operand is REF_NIL.

CNEWI is only used for immutable scalar cdata types. It combines an allocation with an initialization. The value given by the right operand is implicitly stored. This shortens the IR and turns allocation sinking into simple dead-code elimination (for immutable types only -- the generic allocation sinking optimization is more involved).

Barriers

OP	Left	Right	Description
TBAR	tab		Table write barrier
OBAR	obj	val	Object write barrier
XBAR			XLOAD/XSTORE optimization barrier

TBAR and OBAR are write barriers needed for the incremental GC. OBAR is currently only used for stores to upvalues. These barriers are placed after the corresponding stores.

XBAR prevents optimizations of XLOAD and XSTORE across the barrier. Note that CALLX* implicitly functions as such a barrier.

Type Conversions

OP	Left	Right	Description
CONV	src	#flags	Generic type conversion
TOBIT	num	bias	Convert double to integer with Lua BitOp semantics
TOSTR	number		Convert double or integer to string
STRTO	str		Convert string to double

The bias operand of TOBIT references the floating-point constant 2^52+2^51, which is added to the FP number. The lower 32 bits of the result represent the integer component modulo 2^32 (within certain limits, see the Lua BitOp semantics.

The lower 5 bits of the flags operand of CONV specify the source type, the next 5 bits specify the destination type (identical to the IR instruction result type). Bit 10 specifies truncation behavior for FP to integer conversions and bit 11 specifies sign-extension for widening integer conversions. Bits 12 and 13 specify the strength of the check for (guarded) conversions from double to integers: 1 means any FP number is ok, 2 and 3 check the FP number for integerness. 2 enables special rules for backpropagation of index conversions.

Calls

OP	Left	Right	Description
CALLN	args	#ircall	Call internal function (normal)
CALLL	args	#ircall	Call internal function (load)
CALLS	args	#ircall	Call internal function (store)
CALLXS	args	func	Call external function (store/barrier)
CARG	args	arg	Call argument extension

Calls to internal functions are divided into three categories:

normal calls can be commoned and eliminated like most arithmetic operations.
For calls that perform loads (e.g. lj_tabl_len()) the compiler must check for intervening stores.
Calls that perform stores cannot be eliminated (but they could be sunk, if such an optimization were beneficial).

For the possible values of the ircall operand, see src/lj_ircall.h.

The FFI emits CALLXS for calls to external functions. They are generally considered to be stores and cannot be eliminated. They form an implicit optimization barrier for XLOAD and XSTORE instructions.

The args operand of call instructions references a left-leaning tree of arguments using the CARG extension instruction:

func(): args = REF_NIL
func(arg1): args = arg1
func(arg1, arg2): args = CARG(arg1, arg2)
func(arg1, arg2, arg3): args = CARG(CARG(arg1, arg2), arg3)
etc.

The func operand of CALLXS either directly references the function pointer or a CARG holding the function pointer and the ctypeid for non-standard calling conventions (e.g. vararg calls).

Miscellaneous Ops

OP	Left	Right	Description
NOP			No operation
BASE	#parent	#exit	BASE reference, link to parent side exit
PVAL	#pref		Parent value reference
GCSTEP			Explicit GC step
HIOP	left	right	Hold hi-word operands of split instructions
LOOP			Separator before loop-part of a trace
USE	ref		Explicit use of a reference
PHI	left	right	PHI node for loops
RENAME	ref	#snap	Renamed reference below snapshot

NOP is used to patch previously emitted IR instructions, in case they cannot eliminated or ignored on-the-fly.

BASE is a fixed instruction at REF_BASE, used to hold the BASE pointer. It's implicitly referenced by e.g. SLOAD.

PVAL provides an alternate way to reference specific values of the parent trace, which cannot be referenced with a parent SLOAD (since they are not stored in the stack at the snapshot).

GCSTEP provides an explicit GC step for certain cases where it needs to be done after the initial snapshot.

HIOP must immediately follow a split instruction (split 64 bit op or soft-fp op).

USE is needed to keep instructions for their side-effects that would otherwise be eliminated: e.g. an ADDOV that's used to check for potential integer overflow of loop bounds.

PHI instructions are positioned at the end of a looping trace. The left operand holds a reference to the initial value, the right operand holds a reference to the value after each loop iteration.

RENAME is generated by the register allocator, when it renames a register for a value (for efficiency or to preserve PHI registers). A RENAME instruction holds the register used for the reference below the given snapshot. Multiple instances of this may be present for the same reference. The originally referenced instruction holds the register used above of the highest such snapshot (if any).

LuaJIT Wiki