The following document describes the Intermediate Representation (IR) used by the JIT-compiler of LuaJIT 2.0. The trace-compiler records bytecode instructions, following the control-flow, and emits the corresponding IR instructions on-the-fly.
The IR has the following characteristics:
The IR is in SSA (Static Single Assignment) form. Every instruction (node) represents a single definition of a value. Multiple instructions form a partially connected data-flow graph. Data-flow for loops is represented using PHI-instructions. Control-flow is always implicit.
The IR is linear, pointer-free and implicitly numbered: every instruction can be uniquely referenced (IRRef) by its position in a linear array. Specially crafted, biased IR references allow fast const vs. non-const decisions. No space is wasted on storing an explicit reference number, value number or similar.
The IR is in 2-operand-normalized form: every instruction has an opcode and a maximum of two operands. A few instructions may need more operands (e.g. CALL*
), which are composed using extension instructions (CARG
).
The IR is typed: every instruction has an output data type. The modeled types correspond to the basic Lua data types plus low-level data types. Higher-level data types are indirectly modeled as-needed with guarded assertions.
The IR has segregated, per-opcode chaining: this allows fast searching for specific instructions in reverse order without a full traversal. This is used to speed up many optimizations, like CSE or alias analysis. Most searches stop after zero (no match), one or two dereferences in practice.
The IR is very compact: it needs only 64 bits per instruction and all instructions are adjacent to each other. This layout is very cache-efficient and very fast to index or traverse.
The IR is incrementally generated: the IR array is bi-directionally grown: constants grow downwards, all other instructions grow upwards. Most optimizations are perfomed on-the-fly and eliminated instructions are either simply not emitted, ignored during code generation or appropriately tagged. There's no general need to insert or delete instructions in the middle. This avoids the very cache-inefficient linked-sea-of-nodes data structure, presented in most compiler textbooks.
The IR is unified: it carries both high-level semantics and low-level details. Different stages of the compiler use different aspects of the IR, but share a common IR format. Eliminating the classic HIR, MIR, LIR separation (high-, medium-, low-level IR) greatly reduces complexity and compiler overhead. It avoids semantic information loss due to abstraction mismatches and allows cheap and effective high-level semantic disambiguation for memory references.
The IR uses auxiliary snapshots: a snapshot captures the IR references corresponding to modified slots and frames in the bytecode execution stack. Every snapshot saves a specific bytecode execution state, which can later be restored on trace exits. Snapshots are sparsely emitted and compressed. Snapshots provide the link between the IR and the bytecode domain (and transitively the source code domain, via the bytecode debug info).
COMPLETE REWRITE IN PROGRESS
See src/lj_ir.h
and src/lj_jit.h
in the LuaJIT source code for the full details. The generated IR can be listed with luajit -jdump
(traced bytecode, IR and machine code) or luajit -jdump=i
(IR only).
$ ./luajit -jdump=bitmsr
LuaJIT 2.0.0-beta10 -- Copyright (C) 2005-2012 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 AMD fold cse dce fwd dse narrow loop abc sink fuse
> local x = 1.2 for i=1,1e3 do x = x * -3 end
---- TRACE 1 start stdin:1
0006 MULVN 0 0 1 ; -3
0007 FORL 1 => 0006
---- TRACE 1 IR
.... SNAP #0 [ ---- ]
0001 rbp int SLOAD #2 CI
0002 xmm7 > num SLOAD #1 T
0003 xmm7 + num MUL 0002 -3
0004 rbp + int ADD 0001 +1
.... SNAP #1 [ ---- 0003 ]
0005 > int LE 0004 +1000
.... SNAP #2 [ ---- 0003 0004 ---- ---- 0004 ]
0006 ------------ LOOP ------------
0007 xmm7 + num MUL 0003 -3
0008 rbp + int ADD 0004 +1
.... SNAP #3 [ ---- 0007 ]
0009 > int LE 0008 +1000
0010 rbp int PHI 0004 0008
0011 xmm7 num PHI 0003 0007
---- TRACE 1 mcode 81
394cffa3 mov dword [0x4183f4a0], 0x1
394cffae movsd xmm0, [0x4184f698]
394cffb7 cvtsd2si ebp, [rdx+0x8]
394cffbc cmp dword [rdx+0x4], 0xfffeffff
394cffc3 jnb 0x394c0010 ->0
394cffc9 movsd xmm7, [rdx]
394cffcd mulsd xmm7, xmm0
394cffd1 add ebp, +0x01
394cffd4 cmp ebp, 0x3e8
394cffda jg 0x394c0014 ->1
->LOOP:
394cffe0 mulsd xmm7, xmm0
394cffe4 add ebp, +0x01
394cffe7 cmp ebp, 0x3e8
394cffed jle 0x394cffe0 ->LOOP
394cffef jmp 0x394c001c ->3
---- TRACE 1 stop -> loop
The above prints the bytecode of the trace, the IR generated from that bytecode with snapshots, and the machine code generated from the IR.
The columns of the IR are as follows:
1st column: IR instruction number (implicit SSA ref)
2nd column: physical CPU register or physical CPU stack slot that
value is written to when converted to machine code.
'[%x+]' (rather than register name) indicates hexadecimal offset
from stack pointer.
(This column is only present if the 'r' flags is included in -jdump, which
augments the IR with register/stack slots. It is not part of the IR itself.)
3rd column: Instruction flags:
">" (IRT_GUARD = 0x80 instruction flag) are locations of
guards (leading to possible side exits from the trace).
"+" (IRT_ISPHI = 0x40 instruction flag) indicates
instruction is left or right PHI operand. (i.e referred
to in some PHI instruction).
4th column: IR type (see IR Types below)
5th column: IR opcode (see opcode reference)
6th/7th column: IR operands (SSA refs or literals)
'#' prefixes refer to slot numbers, used in SLOADS.
#0 is the base frame (modified only in tail calls).
#1 is the first slot in the first frame (register 0 in
the bytecode)
'[+-]' prefixes indicate positive or negative numeric literals.
'[0x%d+]' and NULL are memory addresses.
'"..."' are strings.
'@' prefixes indicate slots (what is this?).
Other possible values: "bias" (number 2^52+2^51 ?), "userdata:%p",
"userdata:%p" (table)--when do these occur?.
See also SSA dump format comments: http://lua-users.org/lists/lua-l/2008-06/msg00225.html (older version).
See formatk
in dump.lua.
Each snaphot (SNAP) lists the modified stack slots and their values. The i-th value in the snapshot list represents the index of the IR that writes a value in slot number #i. '---' indicates that the slot is not written. Frames are separated by '|'. For further comments on snapshots, see http://lua-users.org/lists/lua-l/2009-11/msg00089.html.
Every instruction has an output data type, which is either one of the
basic Lua types or a low-level type. The order is carefully designed to
simplify the mapping from tagged value types and to optimize common
checks (e.g. 'any integer type'). See src/lj_ir.h
and src/lj_obj.h
.
# | Dump | IRT_ | Description |
---|---|---|---|
0 | nil | NIL | 'nil' value |
1 | fal | FALSE | 'false' value |
2 | tru | TRUE | 'true' value |
3 | lud | LIGHTUD | Lightuserdata value |
4 | str | STR | Interned string object |
5 | p32 | P32 | 32 bit pointer |
6 | thr | THREAD | Thread object |
7 | pro | PROTO | Function prototype object |
8 | fun | FUNC | Function (closure) object |
9 | p64 | P64 | 64 bit pointer |
10 | cdt | CDATA | cdata object |
11 | tab | TAB | Table object |
12 | udt | UDATA | Userdata object |
13 | flt | FLOAT | 32 bit FP number (float) |
14 | num | NUM | 64 bit FP number (double) |
15 | i8 | I8 | 8 bit signed integer (int8_t) |
16 | u8 | U8 | 8 bit unsigned integer (uint8_t) |
17 | i16 | I16 | 16 bit signed integer (int16_t) |
18 | u16 | U16 | 16 bit unsigned integer (uint16_t) |
19 | int | INT | 32 bit signed integer (int32_t) |
20 | u32 | U32 | 32 bit unsigned integer (uint32_t) |
21 | i64 | I64 | 64 bit signed integer (int64_t) |
22 | u64 | U64 | 64 bit unsigned integer (uint64_t) |
23 | sfp | SOFTFP | Hi-word of split soft-fp operations |
Constant instructions are only present in the constant part of the IR (growing upwards to lower refs). IR constants are interned (de-duplicated) and can be compared for equality only by looking at their references.
Constant instructions never appear in dumps, since -jdump
always shows
the actual constant value inlined into the referencing instructions.
Description | |||
---|---|---|---|
KPRI | Primitive type: nil, false, true | ||
KINT | #int | 32 bit integer constant | |
KGC | #ptr | Garbage collected constant | |
KPTR | #ptr | Pointer constant | |
KKPTR | #ptr | Pointer constant to constant data | |
KNULL | #ptr | Typed NULL constant | |
KNUM | #k64ptr | Double-precision floating-point constant | |
KINT64 | #k64ptr | 64 bit integer constant | |
KSLOT | kref | #slot | Hash slot for constant |
Every trace has three KPRI
instructions at fixed references for the
constants nil, false and true (REF_NIL, REF_FALSE, REF_TRUE).
The 32 bit integer or pointer values occupy the space for both the left and the right 16 bit operand. 64 bit values are interned in a global constant table and indirectly referenced by 32 bit pointers.
KPTR
is a constant pointer (absolute address) to possibly non-constant
data. KKPTR
points to definitely constant data. Only data known by
the VM to be constant qualifies, e.g. an interned Lua string. Content
tagged as 'const' by users (e.g. const char *
) doesn't qualify.
KSLOT
is used as a key for HREFK
and holds the hash slot where the
key is to be found and a reference to the constant key itself.
Guarded assertions have a dual purpose:
Description | |||
---|---|---|---|
LT | left | right | left < right (signed) |
GE | left | right | left ≥ right (signed) |
LE | left | right | left ≤ right (signed) |
GT | left | right | left > right (signed) |
ULT | left | right | left < right (unsigned/unordered) |
UGE | left | right | left ≥ right (unsigned/unordered) |
ULE | left | right | left ≤ right (unsigned/unordered) |
UGT | left | right | left > right (unsigned/unordered) |
EQ | left | right | left = right |
NE | left | right | left ≠ right |
ABC | bound | index | Array Bounds Check: bound > index (unsigned) |
RETF | proto | pc | Return to lower frame: check target PC, shift base |
The U..
opcodes provide unsigned comparison semantics for integer
types and unordered comparison semantics for floating-point types. A
NaN
operand causes a 'false' outcome for EQ
and ordered comparisons,
and a 'true' outcome for NE
and unordered comparisons.
ABC
is treated just like UGT
in the backend. But it follows
different FOLD rules, which simplifies ABC elimination.
The prototype returned to by RETF
is below the call graph covered by
the trace up to this point. Thus RETF
needs to anchor the prototype to
prevent recycling the PC after garbage collection.
Description | |||
---|---|---|---|
BNOT | ref | Bitwise NOT of ref | |
BSWAP | ref | Byte-swapped ref | |
BAND | left | right | Bitwise AND of left and right |
BOR | left | right | Bitwise OR of left and right |
BXOR | left | right | Bitwise XOR of left and right |
BSHL | ref | shift | Bitwise left shift of ref |
BSHR | ref | shift | Bitwise logical right shift of ref |
BSAR | ref | shift | Bitwise arithmetic right shift of ref |
BROL | ref | shift | Bitwise left rotate of ref |
BROR | ref | shift | Bitwise right rotate of ref |
The shift count for shift and rotate instructions is interpreted modulo
the bit width of the shifted type, i.e. only the lowest N bits are
significant. Appropriate bit masking instructions (BAND
) are inserted
for backends where the underlying machine instructions don't perform the
masking themselves. Similarly, rotates are unified to one direction, in
case the architecture doesn't provide machine instructions for both.
Description | |||
---|---|---|---|
ADD | left | right | left + right |
SUB | left | right | left - right |
MUL | left | right | left * right |
DIV | left | right | left / right |
MOD | left | right | left % right |
POW | left | right | left ^ right |
NEG | ref | kneg | -ref |
ABS | ref | kabs | abs(ref) |
ATAN2 | left | right | atan2(left, right) |
LDEXP | left | right | ldexp(left, right) |
MIN | left | right | min(left, right) |
MAX | left | right | max(left, right) |
FPMATH | ref | #fpm | fpmath(ref), see below |
ADDOV | left | right | left + right, overflow-checked |
SUBOV | left | right | left - right, overflow-checked |
MULOV | left | right | left * right, overflow-checked |
All arithmetic ops operate within the domain of their types: integers, pointers or floating-point numbers. Not all ops are defined for all possible types. Both signed and unsigned integers wrap around on overflow.
Overflow-checking operations exit the trace upon signed integer arithmetic overflow.
MOD
is decomposed into left-floor(left/right)right for floating-point
numbers. POW
is either turned into POW
with an integer as the right
operand, or into sqrt(left) if right is 0.5, or into
exp2(log2(left)right) otherwise (the backend may later merge this into
a call to pow()).
The undefined cases for the integer variants of DIV
, MOD
and POW
return the integer value with only the highest bit set.
NEG
and ABS
for floating-point numbers reference a SIMD-aligned
constant in the right operand. Some backends implement these as a
bitwise XOR or AND of the number and the constant.
The right operand of LDEXP
is a floating-point number on x86 and x64
platforms and a 32 bit integer on all others.
All floating-point arithmetic operations obey the standard definitions
from IEEE 754 wrt. +-0, +-Inf, NaN and denormals. MIN
and MAX
have
no defined behavior for NaN operands.
FPMATH
is used for unary floating-point arithmetic operations. The
right operand specifies the actual operation:
Description | |
---|---|
FPM_FLOOR | floor(ref) |
FPM_CEIL | ceil(ref) |
FPM_TRUNC | trunc(ref) |
FPM_SQRT | sqrt(ref) |
FPM_EXP | exp(ref) |
FPM_EXP2 | exp2(ref) |
FPM_LOG | log(ref) |
FPM_LOG2 | log2(ref) |
FPM_LOG10 | log10(ref) |
FPM_SIN | sin(ref) |
FPM_COS | cos(ref) |
FPM_TAN | tan(ref) |
Memory references generate a pointer value to be used by the respective
loads or stores. To preserve higher-level semantics and to simplify
alias analysis they are not decomposed into lower-level operations (like
for XLOAD
references). Some of them can be (partially) fused into the
operands of load or store instructions by most backends.
Description | |||
---|---|---|---|
AREF | array | index | Array reference |
HREFK | hash | kslot | Constant hash reference |
HREF | tab | key | Hash reference |
NEWREF | tab | key | Create new reference |
UREFO | func | #uv | Open upvalue reference |
UREFC | func | #uv | Closed upvalue reference |
FREF | obj | #field | Object field reference |
STRREF | str | index | String reference |
AREF
and HREFK
reference an FLOAD
of the array part or hash part
of a Lua table with their left operand. HREF
and NEWREF
directly
reference a Lua table, since they need to search or extend the table.
HREFK
is specialized to the hash slot where the constant key is
expected -- see KSLOT
above for its right operand. NEWREF
assumes
the key does not exist in the Lua table, yet.
The left operand of UREFO
and UREFC
reference the current function
(closure). The right operand holds an upvalue disambiguation hash in the
lowest 8 bits and the upvalue index in the higher bits.
For the possible values of the field ID in FREF
, see IRFLDEF in
src/lj_ir.h
.
Loads and stores operate on memory references and either load a value (result of the instruction) or store a value (the right operand). To preserve higher-level semantics and to simplify alias analysis they are not unified or decomposed into lower-level operations.
Description | |||
---|---|---|---|
ALOAD | aref | Array load | |
HLOAD | href | Hash load | |
ULOAD | uref | Upvalue load | |
FLOAD | obj | #field | Object field load |
XLOAD | xref | #flags | Extended load |
SLOAD | #slot | #flags | Stack slot load |
VLOAD | aref | Vararg slot load | |
ASTORE | aref | val | Array store |
HSTORE | href | val | Hash store |
USTORE | uref | val | Upvalue store |
FSTORE | fref | val | Object field store |
XSTORE | xref | val | Extended store |
FLOAD
and SLOAD
inline their memory references, all other loads and
all stores have a memory reference as their left operand. All loads
except FLOAD
and XLOAD
work on tagged values and simultaneously
function as a guarded assertion that checks the loaded type.
FLOAD
and FSTORE
access specific fields inside objects, identified
by the field ID of their reference (e.g. the metatable field in table or
userdata objects).
XLOAD
works on lower-level types and the memory reference is either a
STRREF
or decomposed into lower-level operations, a combination of
ADD
, MUL
or BSHL
of pointers, offsets or indexes.
The slot number of SLOAD
is relative to the starting frame of a trace,
where #0 indicates the closure/frame slot and #1 the first variable slot
(corresponding to slot 0 of the bytecode). Note that RETF
shifts down
BASE and subsequent SLOAD
instructions refer to slots of the lower
frame(s).
Note there are no store operations for stack slots or vararg slots. All stores to stack slots are effectively sunk into exits or side traces. Snapshots efficiently manage the references that are to be stored. Vararg slots are read-only from the perspective of the called vararg function.
For the possible values of the field ID in FLOAD
and the flags in
SLOAD
and XLOAD
, see IRFLDEF, IRSLOAD_* and IRXLOAD_* in
src/lj_ir.h
.
Description | |||
---|---|---|---|
SNEW | data | length | Allocate interned string |
XSNEW | data | length | Allocate interned string from cdata |
TNEW | #asize | #hbits | Allocate Lua table with minimum array and hash sizes |
TDUP | template | Allocate Lua table, copying a template table | |
CNEW | ctypeid | (size) | Allocate mutable cdata |
CNEWI | ctypeid | val | Allocate immutable cdata |
SNEW
is only used to reference data that's guaranteed to stay
constant, e.g. other strings. This allows elimination of the allocation
in case the string object isn't used (its data may still be used). This
assumption doesn't hold for data referenced by ffi.string()
, which has
to emit XSNEW
instead.
The size of the allocated cdata object for CNEW
and CNEWI
is
inferred from the ctypeid operand. For variable-length cdata, the size
is explicitly given by the size operand, otherwise the size operand is
REF_NIL.
CNEWI
is only used for immutable scalar cdata types. It combines an
allocation with an initialization. The value given by the right operand
is implicitly stored. This shortens the IR and turns allocation sinking
into simple dead-code elimination (for immutable types only -- the
generic allocation sinking optimization is more involved).
Description | |||
---|---|---|---|
TBAR | tab | Table write barrier | |
OBAR | obj | val | Object write barrier |
XBAR | XLOAD/XSTORE optimization barrier |
TBAR
and OBAR
are write barriers needed for the incremental GC.
OBAR
is currently only used for stores to upvalues. These barriers are
placed after the corresponding stores.
XBAR
prevents optimizations of XLOAD
and XSTORE
across the
barrier. Note that CALLX*
implicitly functions as such a barrier.
Description | |||
---|---|---|---|
CONV | src | #flags | Generic type conversion |
TOBIT | num | bias | Convert double to integer with Lua BitOp semantics |
TOSTR | number | Convert double or integer to string | |
STRTO | str | Convert string to double |
The bias operand of TOBIT
references the floating-point constant
2^52+2^51, which is added to the FP number. The lower 32 bits of the
result represent the integer component modulo 2^32 (within certain
limits, see the Lua BitOp semantics.
The lower 5 bits of the flags operand of CONV
specify the source type,
the next 5 bits specify the destination type (identical to the IR
instruction result type). Bit 10 specifies truncation behavior for FP to
integer conversions and bit 11 specifies sign-extension for widening
integer conversions. Bits 12 and 13 specify the strength of the check
for (guarded) conversions from double to integers: 1 means any FP number
is ok, 2 and 3 check the FP number for integerness. 2 enables special
rules for backpropagation of index conversions.
Description | |||
---|---|---|---|
CALLN | args | #ircall | Call internal function (normal) |
CALLL | args | #ircall | Call internal function (load) |
CALLS | args | #ircall | Call internal function (store) |
CALLXS | args | func | Call external function (store/barrier) |
CARG | args | arg | Call argument extension |
Calls to internal functions are divided into three categories:
lj_tabl_len()
) the compiler must
check for intervening stores.For the possible values of the ircall operand, see src/lj_ircall.h
.
The FFI emits CALLXS
for calls to external functions. They are
generally considered to be stores and cannot be eliminated. They form an
implicit optimization barrier for XLOAD
and XSTORE
instructions.
The args operand of call instructions references a left-leaning tree of
arguments using the CARG
extension instruction:
The func operand of CALLXS
either directly references the function
pointer or a CARG
holding the function pointer and the ctypeid for
non-standard calling conventions (e.g. vararg calls).
Description | |||
---|---|---|---|
NOP | No operation | ||
BASE | #parent | #exit | BASE reference, link to parent side exit |
PVAL | #pref | Parent value reference | |
GCSTEP | Explicit GC step | ||
HIOP | left | right | Hold hi-word operands of split instructions |
LOOP | Separator before loop-part of a trace | ||
USE | ref | Explicit use of a reference | |
PHI | left | right | PHI node for loops |
RENAME | ref | #snap | Renamed reference below snapshot |
NOP
is used to patch previously emitted IR instructions, in case they
cannot eliminated or ignored on-the-fly.
BASE
is a fixed instruction at REF_BASE, used to hold the BASE
pointer. It's implicitly referenced by e.g. SLOAD
.
PVAL
provides an alternate way to reference specific values of the
parent trace, which cannot be referenced with a parent SLOAD
(since
they are not stored in the stack at the snapshot).
GCSTEP
provides an explicit GC step for certain cases where it needs
to be done after the initial snapshot.
HIOP
must immediately follow a split instruction (split 64 bit op or
soft-fp op).
USE
is needed to keep instructions for their side-effects that would
otherwise be eliminated: e.g. an ADDOV
that's used to check for
potential integer overflow of loop bounds.
PHI
instructions are positioned at the end of a looping trace. The
left operand holds a reference to the initial value, the right operand
holds a reference to the value after each loop iteration.
RENAME
is generated by the register allocator, when it renames a
register for a value (for efficiency or to preserve PHI registers). A
RENAME
instruction holds the register used for the reference below the
given snapshot. Multiple instances of this may be present for the same
reference. The originally referenced instruction holds the register used
above of the highest such snapshot (if any).