Target-specific lowering in ICE
===============================

This document discusses several issues around generating target-specific ICE
instructions from high-level ICE instructions.

Meeting register address mode constraints
-----------------------------------------

Target-specific instructions often require specific operands to be in physical
registers. Sometimes one specific register is required, but usually any
register in a particular register class will suffice, and that register class is
defined by the instruction/operand type.

The challenge is that ``Variable`` represents an operand that is either a stack
location in the current frame, or a physical register. Register allocation
happens after target-specific lowering, so during lowering we generally don't
know whether a ``Variable`` operand will meet a target instruction's physical
register requirement.

To this end, ICE allows certain directives:

* ``Variable::setWeightInfinite()`` forces a ``Variable`` to get some
  physical register (without specifying which particular one) from a
  register class.

* ``Variable::setRegNum()`` forces a ``Variable`` to be assigned a specific
  physical register.

These directives are described below in more detail. In most cases, though,
they don't need to be used explicitly, as the routines that create lowered
instructions have reasonable defaults and simple options that control these
directives.

The recommended ICE lowering strategy is to generate extra assignment
instructions involving extra ``Variable`` temporaries, using the directives to
force suitable register assignments for the temporaries, and then let the
register allocator clean things up.

Note: There is a spectrum of *implementation complexity* versus *translation
speed* versus *code quality*. This recommended strategy picks a point on the
spectrum representing very low complexity ("splat-isel"), pretty good code
quality in terms of frame size and register shuffling/spilling, but perhaps not
the fastest translation speed since extra instructions and operands are created
up front and cleaned up at the end.

Ensuring a non-specific physical register
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The x86 instruction::

    mov dst, src

needs at least one of its operands in a physical register (ignoring the case
where ``src`` is a constant). This can be done as follows::

    mov reg, src
    mov dst, reg

so long as ``reg`` is guaranteed to have a physical register assignment. The
low-level lowering code that accomplishes this looks something like::

    Variable *Reg;
    Reg = Func->makeVariable(Dst->getType());
    Reg->setWeightInfinite();
    NewInst = InstX8632Mov::create(Func, Reg, Src);
    NewInst = InstX8632Mov::create(Func, Dst, Reg);

``Cfg::makeVariable()`` generates a new temporary, and
``Variable::setWeightInfinite()`` gives it infinite weight for the purpose of
register allocation, thus guaranteeing it a physical register (though leaving
the particular physical register to be determined by the register allocator).

The ``_mov(Dest, Src)`` method in the ``TargetX8632`` class is sufficiently
powerful to handle these details in most situations. Its ``Dest`` argument is
an in/out parameter. If its input value is ``nullptr``, then a new temporary
variable is created, its type is set to the same type as the ``Src`` operand, it
is given infinite register weight, and the new ``Variable`` is returned through
the in/out parameter. (The new temporary also becomes the dest operand of the
``mov`` instruction.) The simpler version of the above example is::

    Variable *Reg = nullptr;
    _mov(Reg, Src);
    _mov(Dst, Reg);

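The in/out ``Dest`` convention can be illustrated with a small standalone model.
This is only a sketch: ``ToyLowering``, ``MovInst``, the ``InfiniteWeight``
field, and ``lowerMemToMemMov`` are hypothetical stand-ins invented for this
example and are not the real ICE classes or the real ``TargetX8632::_mov``.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for ICE concepts; these are NOT the real ICE classes.
struct Variable {
  std::string Name;
  bool InfiniteWeight; // true: must end up in some physical register
};

struct MovInst {
  Variable *Dest;
  Variable *Src;
};

class ToyLowering {
public:
  // Creates a fresh variable owned by this object.
  Variable *makeVariable(const std::string &Name) {
    Temps.emplace_back(new Variable{Name, false});
    return Temps.back().get();
  }

  // Models the in/out Dest convention: a null Dest is replaced by a new
  // infinite-weight temporary, which also becomes the mov's destination.
  void mov(Variable *&Dest, Variable *Src) {
    assert(Src != nullptr);
    if (Dest == nullptr) {
      Dest = makeVariable("tmp" + std::to_string(Temps.size()));
      Dest->InfiniteWeight = true;
    }
    Insts.push_back(MovInst{Dest, Src});
  }

  std::vector<MovInst> Insts; // lowered instructions, in emission order

private:
  std::vector<std::unique_ptr<Variable>> Temps;
};

// Lowers "mov Dst, Src" as "mov Reg, Src; mov Dst, Reg" and returns Reg.
Variable *lowerMemToMemMov(ToyLowering &L, Variable *Dst, Variable *Src) {
  Variable *Reg = nullptr;
  L.mov(Reg, Src); // Reg now points at the new temporary
  L.mov(Dst, Reg); // Dst is non-null, so it is used as-is
  return Reg;
}
```

Calling ``lowerMemToMemMov`` on two ordinary variables emits two moves in this
model, and only the intermediate temporary carries infinite weight; ``Dst`` and
``Src`` remain free to live on the stack.
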
Preferring another ``Variable``'s physical register
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(An older version of ICE allowed the lowering code to provide a register
allocation hint: if a physical register is to be assigned to one ``Variable``,
then prefer a particular ``Variable``'s physical register if available. This
hint would be used to try to reduce the amount of register shuffling.
Currently, the register allocator does this automatically through the
``FindPreference`` logic.)

Ensuring a specific physical register
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some instructions require operands in specific physical registers, or produce
results in specific physical registers. For example, the 32-bit ``ret``
instruction needs its operand in ``eax``. This can be done with
``Variable::setRegNum()``::

    Variable *Reg;
    Reg = Func->makeVariable(Src->getType());
    Reg->setWeightInfinite();
    Reg->setRegNum(Reg_eax);
    NewInst = InstX8632Mov::create(Func, Reg, Src);
    NewInst = InstX8632Ret::create(Func, Reg);

Precoloring a ``Variable`` with ``Variable::setRegNum()`` effectively gives it
infinite weight for register allocation, so the call to
``Variable::setWeightInfinite()`` is technically unnecessary, but perhaps
documents the intention a bit more strongly.

The ``_mov(Dest, Src, RegNum)`` method in the ``TargetX8632`` class has an
optional ``RegNum`` argument to force a specific register assignment when the
input ``Dest`` is ``nullptr``. As described above, passing in ``Dest=nullptr``
causes a new temporary variable to be created with infinite register weight, and
in addition the specific register is chosen. The simpler version of the above
example is::

    Variable *Reg = nullptr;
    _mov(Reg, Src, Reg_eax);
    _ret(Reg);

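The relationship between precoloring and infinite weight can be sketched with a
toy allocation step. Everything here is an assumption made for illustration:
the register numbering, the ``mustHaveRegister`` predicate, and the
``assignRegister`` policy are hypothetical and much simpler than the real ICE
register allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy register numbering; the real ICE enumerators differ.
enum RegNum { NoRegister = -1, Reg_eax = 0, Reg_ecx = 1, Reg_edx = 2, NumRegs = 3 };

struct Variable {
  bool InfiniteWeight = false;
  int Reg = NoRegister; // precolored register, if any

  void setWeightInfinite() { InfiniteWeight = true; }
  void setRegNum(int R) { Reg = R; }
  bool hasReg() const { return Reg != NoRegister; }
  // A precolored variable needs a register just like an infinite-weight one.
  bool mustHaveRegister() const { return InfiniteWeight || hasReg(); }
};

// Toy allocation step: returns the register V ends up in, or NoRegister if V
// is left on the stack. Busy[R] marks registers that are already taken.
int assignRegister(const Variable &V, const std::vector<bool> &Busy) {
  assert(Busy.size() == static_cast<std::size_t>(NumRegs));
  if (V.hasReg())
    return V.Reg; // precoloring leaves no choice; this toy ignores Busy here
  if (!V.mustHaveRegister())
    return NoRegister; // toy policy: ordinary variables go to the stack
  for (int R = 0; R < NumRegs; ++R)
    if (!Busy[R])
      return R; // first free register
  return NoRegister; // allocation failure in this toy model
}
```

In this model a variable that only calls ``setRegNum(Reg_eax)`` already reports
``mustHaveRegister()``, which is why the extra ``setWeightInfinite()`` call in
the longer example above is redundant.
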
Disabling live-range interference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(An older version of ICE allowed an overly strong preference for another
``Variable``'s physical register even if their live ranges interfered. This was
risky, and currently the register allocator derives this automatically through
the ``AllowOverlap`` logic.)

Call instructions kill scratch registers
----------------------------------------

A ``call`` instruction kills the values in all scratch registers, so it's
important that the register allocator doesn't allocate a scratch register to a
``Variable`` whose live range spans the ``call`` instruction. ICE provides the
``InstFakeKill`` pseudo-instruction to compactly mark such register kills. For
each scratch register, a fake trivial live range is created that begins and ends
in that instruction. The ``InstFakeKill`` instruction is inserted after the
``call`` instruction. For example::

    CallInst = InstX8632Call::create(Func, ... );
    NewInst = InstFakeKill::create(Func, CallInst);

The last argument to the ``InstFakeKill`` constructor links it to the previous
call instruction, such that if its linked instruction is dead-code eliminated,
the ``InstFakeKill`` instruction is eliminated as well. The linked ``call``
instruction could be to a target known to be free of side effects, and therefore
safe to remove if its result is unused.

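The effect of the trivial live ranges can be sketched with a toy interference
check. This is a simplified model under stated assumptions: instruction
positions are plain integers, live ranges are closed intervals, precolored
variables are ignored, and ``LiveRange``, ``isScratch``, and
``usableRegisters`` are hypothetical names rather than ICE APIs. The scratch
set ``eax``, ``ecx``, ``edx`` is the x86-32 set used elsewhere in this document.

```cpp
#include <cassert>
#include <vector>

// Toy model: instruction positions are integers, live ranges are closed
// intervals [Begin, End].
struct LiveRange {
  int Begin;
  int End;
  bool overlaps(const LiveRange &Other) const {
    return Begin <= Other.End && Other.Begin <= End;
  }
};

enum Reg { Reg_eax, Reg_ecx, Reg_edx, Reg_ebx, Reg_esi, Reg_edi, NumRegs };

// Scratch (caller-saved) registers in this toy x86-32 model.
bool isScratch(int R) { return R == Reg_eax || R == Reg_ecx || R == Reg_edx; }

// Returns the registers V may use. Each fake kill at position Pos gives every
// scratch register a one-point live range [Pos, Pos]; V cannot share a
// register with a live range that overlaps its own.
std::vector<int> usableRegisters(const LiveRange &V,
                                 const std::vector<int> &KillPositions) {
  std::vector<int> Result;
  for (int R = 0; R < NumRegs; ++R) {
    bool Conflict = false;
    if (isScratch(R)) {
      for (int Pos : KillPositions) {
        LiveRange Kill{Pos, Pos};
        if (V.overlaps(Kill))
          Conflict = true;
      }
    }
    if (!Conflict)
      Result.push_back(R);
  }
  return Result;
}
```

With a call at position 10 and its fake kill at position 11, a variable live
over ``[2, 20]`` is restricted to the non-scratch registers in this model, while
a variable whose last use is the call itself, ``[2, 10]``, may still use any
register.
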
Instructions producing multiple values
--------------------------------------

ICE instructions allow at most one destination ``Variable``. Some machine
instructions produce more than one usable result. For example, the x86-32
``call`` ABI returns a 64-bit integer result in the ``edx:eax`` register pair.
Also, x86-32 has a version of the ``imul`` instruction that produces a 64-bit
result in the ``edx:eax`` register pair. The x86-32 ``idiv`` instruction
produces the quotient in ``eax`` and the remainder in ``edx``, though generally
only one or the other is needed in the lowering.

To support multi-dest instructions, ICE provides the ``InstFakeDef``
pseudo-instruction, whose destination can be precolored to the appropriate
physical register. For example, a ``call`` returning a 64-bit result in
``edx:eax``::

    CallInst = InstX8632Call::create(Func, RegLow, ... );
    NewInst = InstFakeKill::create(Func, CallInst);
    Variable *RegHigh = Func->makeVariable(IceType_i32);
    RegHigh->setRegNum(Reg_edx);
    NewInst = InstFakeDef::create(Func, RegHigh);

``RegHigh`` is then assigned into the desired ``Variable``. If that assignment
ends up being dead-code eliminated, the ``InstFakeDef`` instruction may be
eliminated as well.

Managing dead-code elimination
------------------------------

ICE instructions with a non-nullptr ``Dest`` are subject to dead-code
elimination. However, some instructions must not be eliminated in order to
preserve side effects. This applies to most function calls, volatile loads, and
loads and integer divisions where the underlying language and runtime are
relying on hardware exception handling.

ICE facilitates this with the ``InstFakeUse`` pseudo-instruction. This forces a
use of its source ``Variable`` to keep that variable's definition alive. Since
the ``InstFakeUse`` instruction has no ``Dest``, it will not be eliminated.

Here is the full example of the x86-32 ``call`` returning a 32-bit integer
result::

    Variable *Reg = Func->makeVariable(IceType_i32);
    Reg->setRegNum(Reg_eax);
    CallInst = InstX8632Call::create(Func, Reg, ... );
    NewInst = InstFakeKill::create(Func, CallInst);
    NewInst = InstFakeUse::create(Func, Reg);
    NewInst = InstX8632Mov::create(Func, Result, Reg);

Without the ``InstFakeUse``, the entire call sequence could be dead-code
eliminated if its result were unused.

One more note on this topic. These tools can be used to allow a multi-dest
instruction to be dead-code eliminated only when none of its results is live.
The key is to use the optional source parameter of the ``InstFakeDef``
instruction. Using pseudocode::

    t1:eax = call foo(arg1, ...)
    InstFakeKill  // eax, ecx, edx
    t2:edx = InstFakeDef(t1)
    v_result_low = t1
    v_result_high = t2

If ``v_result_high`` is live but ``v_result_low`` is dead, adding ``t1`` as an
argument to ``InstFakeDef`` suffices to keep the ``call`` instruction live.

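Both uses of the pseudo-instructions can be checked against a toy dead-code
elimination pass. This is a minimal sketch, assuming a single straight-line
block, integer variable ids, and the rule stated above that only instructions
with a ``Dest`` are candidates for elimination. ``Inst``, ``eliminateDeadCode``,
and the block builders are hypothetical names, and ``InstFakeKill`` is left out
for brevity.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction: variables are plain ints, and NoDest marks "no Dest".
const int NoDest = -1;

struct Inst {
  std::string Opcode;
  int Dest;
  std::vector<int> Srcs;
  bool Deleted;
};

// Backward liveness over one straight-line block. An instruction whose Dest
// is not live is deleted; an instruction without a Dest is always kept.
// Live holds the variables that are live on exit from the block.
void eliminateDeadCode(std::vector<Inst> &Block, std::set<int> Live) {
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    assert(It->Dest >= NoDest); // either NoDest or a variable id
    if (It->Dest != NoDest) {
      if (Live.count(It->Dest) == 0) {
        It->Deleted = true;
        continue; // a deleted instruction does not use its sources
      }
      Live.erase(It->Dest);
    }
    Live.insert(It->Srcs.begin(), It->Srcs.end());
  }
}

// Counts surviving instructions with the given opcode.
int countLive(const std::vector<Inst> &Block, const std::string &Opcode) {
  int N = 0;
  for (const Inst &I : Block)
    if (!I.Deleted && I.Opcode == Opcode)
      ++N;
  return N;
}

// Reg = call; [fakeuse Reg]; Result = mov Reg.  Ids: Reg=0, Result=1.
std::vector<Inst> makeCallBlock(bool WithFakeUse) {
  std::vector<Inst> Block;
  Block.push_back(Inst{"call", 0, {}, false});
  if (WithFakeUse)
    Block.push_back(Inst{"fakeuse", NoDest, {0}, false});
  Block.push_back(Inst{"mov", 1, {0}, false});
  return Block;
}

// t1 = call; t2 = fakedef(t1); low = t1; high = t2.
// Ids: t1=0, t2=1, low=2, high=3.
std::vector<Inst> makeMultiDestBlock() {
  std::vector<Inst> Block;
  Block.push_back(Inst{"call", 0, {}, false});
  Block.push_back(Inst{"fakedef", 1, {0}, false});
  Block.push_back(Inst{"mov", 2, {0}, false});
  Block.push_back(Inst{"mov", 3, {1}, false});
  return Block;
}
```

Running ``eliminateDeadCode`` on ``makeCallBlock(true)`` with an empty live-out
set removes the ``mov`` but keeps the ``call``, whereas ``makeCallBlock(false)``
loses the ``call`` too. For ``makeMultiDestBlock()``, the ``call`` survives
when only ``high`` is live, and the whole sequence disappears when neither
result is live.
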
Instructions modifying source operands
--------------------------------------

Some native instructions may modify one or more source operands. For example,
the x86 ``xadd`` and ``xchg`` instructions modify both source operands. Some
analysis needs to identify every place a ``Variable`` is modified, and it uses
the presence of a ``Dest`` variable for this analysis. Since ICE instructions
have at most one ``Dest``, the ``xadd`` and ``xchg`` instructions need special
treatment.

A ``Variable`` that is not the ``Dest`` can be marked as modified by adding an
``InstFakeDef``. However, this is not sufficient, as the ``Variable`` may have
no more live uses, which could result in the ``InstFakeDef`` being dead-code
eliminated. The solution is to add an ``InstFakeUse`` as well.

To summarize, for every source ``Variable`` that is not equal to the
instruction's ``Dest``, append an ``InstFakeDef`` and ``InstFakeUse``
instruction to provide the necessary analysis information.

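The summary rule can be sketched as a small emission helper. This is an
illustrative model only: ``Inst``, ``emitWithModifiedSources``, and
``countDefs`` are hypothetical names, variables are plain integer ids, and the
sketch assumes that every source of the given instruction is modified, as with
``xchg``.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: variables are plain ints, and NoDest marks "no Dest".
const int NoDest = -1;

struct Inst {
  std::string Opcode;
  int Dest;
  std::vector<int> Srcs;
};

// Emits I, then a fakedef/fakeuse pair for every source of I that is not
// its Dest. Assumes I modifies all of its sources (as xchg does).
void emitWithModifiedSources(std::vector<Inst> &Out, const Inst &I) {
  Out.push_back(I);
  for (int Src : I.Srcs) {
    if (Src == I.Dest)
      continue; // already recorded as modified through Dest
    Out.push_back(Inst{"fakedef", Src, {}});       // marks Src as modified
    Out.push_back(Inst{"fakeuse", NoDest, {Src}}); // keeps the fakedef alive
  }
}

// The Dest-based analysis: counts the instructions that modify Var.
int countDefs(const std::vector<Inst> &Insts, int Var) {
  int N = 0;
  for (const Inst &I : Insts)
    if (I.Dest == Var)
      ++N;
  return N;
}
```

For an ``xchg`` whose ``Dest`` is variable 0 and whose sources are variables 0
and 1, the helper emits three instructions in this model, and ``countDefs``
then reports one modification for each of the two variables.
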