==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory.  Primarily, this feature is required for
compatibility with the Microsoft C++ ABI.  Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory.  Prior to the addition of inalloca, calls in LLVM
were indivisible instructions.  There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer.  With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call.  Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
    void g(Foo a, Foo b);
    void f() {
      g(Foo(), Foo());
    }

.. code-block:: llvm

    %struct.Foo = type { i32, i32 }
    declare i8* @llvm.stacksave()
    declare void @llvm.stackrestore(i8*)
    declare void @Foo_ctor(%struct.Foo* %this)
    declare void @Foo_dtor(%struct.Foo* %this)
    declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)

    define void @f() {
    entry:
      %base = call i8* @llvm.stacksave()
      ; The argument memory is allocated with the inalloca keyword.
      %memargs = alloca inalloca <{ %struct.Foo, %struct.Foo }>
      ; Field 1 of the argument struct holds b.
      %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
      call void @Foo_ctor(%struct.Foo* %b)

      ; If a's ctor throws, we must destruct b.
      %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
      invoke void @Foo_ctor(%struct.Foo* %a)
          to label %invoke.cont unwind label %invoke.unwind

    invoke.cont:
      call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
      call void @llvm.stackrestore(i8* %base)
      ...

    invoke.unwind:
      ; The landingpad instruction is omitted for brevity.
      call void @Foo_dtor(%struct.Foo* %b)
      call void @llvm.stackrestore(i8* %base)
      ...
    }

To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`.  Then, it allocates the
argument stack space with alloca and calls the default constructor for
``b``.  The default constructor for ``a`` could throw an exception, so
the frontend has to create a landing pad, which destroys the already
constructed argument ``b`` before restoring the stack pointer.  If the
constructor does not unwind, ``g`` is called.  In the Microsoft C++ ABI,
``g`` will destroy its arguments, and then the stack is restored in
``f``.

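The control flow above can be mirrored in portable C++ as a rough
analogy.  The sketch below is not what the ABI or any frontend emits.
It assumes a hypothetical ``ArgPack`` buffer standing in for
``%memargs``, a hypothetical instrumented ``Foo``, and a hypothetical
``throw_on_ctor`` knob that forces the failure path.  As in the
Microsoft C++ ABI, the callee ``g`` destroys its arguments.

.. code-block:: c++

    #include <new>
    #include <stdexcept>

    // Hypothetical instrumented stand-in for the non-trivial Foo above.
    struct Foo {
      static int live;           // objects currently alive
      static int ctor_calls;     // constructor calls so far
      static int throw_on_ctor;  // 1-based ctor call that throws; 0 = never
      int a = 0, b = 0;
      Foo() {
        if (++ctor_calls == throw_on_ctor)
          throw std::runtime_error("ctor failed");
        ++live;
      }
      ~Foo() { --live; }
    };
    int Foo::live = 0;
    int Foo::ctor_calls = 0;
    int Foo::throw_on_ctor = 0;

    // Uninitialized storage for one Foo; construction is done by the caller.
    union Slot {
      Foo value;
      Slot() {}
      ~Slot() {}
    };

    // Stands in for <{ %struct.Foo, %struct.Foo }>.
    struct ArgPack {
      Slot a, b;
    };

    // Callee: receives the whole pack and destroys its arguments.
    void g(ArgPack *args) {
      args->a.value.~Foo();
      args->b.value.~Foo();
    }

    // Caller: mirrors @f.  Returns true if g ran, false if a's ctor threw.
    bool f() {
      ArgPack memargs;                 // stands in for the argument alloca
      new (&memargs.b.value) Foo();    // construct b in place
      try {
        new (&memargs.a.value) Foo();  // construct a in place; may throw
      } catch (...) {
        memargs.b.value.~Foo();        // landing pad: destroy b
        return false;                  // real code would resume unwinding
      }
      g(&memargs);                     // callee destroys a and b
      return true;
    }

Setting ``Foo::throw_on_ctor`` to 2 makes the construction of ``a``
fail.  In both the normal and the failing path, ``Foo::live`` returns
to zero, which is the property the landing pad exists to preserve.
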
Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory at the top of the stack to pass
arguments.  We cannot vend pointers to that memory at function entry
because after code generation they will alias.

The rule against allocas between argument allocations and the call site
avoids this problem, but it creates a cleanup problem.  Cleanup and
lifetime are handled explicitly with stack save and restore calls.  In
the future, we may want to introduce a new construct such as ``freea``
or ``afree`` to make it clear that this stack adjusting cleanup is less
powerful than a full stack save and restore.

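The difference in power between a full save and restore and a
``freea``-like operation can be illustrated with a toy model.  The
sketch below assumes a hypothetical ``ToyStack`` class that only tracks
stack-pointer arithmetic.  It does not reflect how LLVM or any target
implements the stack.  ``restore`` rewinds to a saved level and releases
every allocation made since, whereas ``freea`` pops only the most recent
allocation.

.. code-block:: c++

    #include <cstddef>
    #include <vector>

    // Hypothetical toy model of a downward-growing stack pointer.
    class ToyStack {
    public:
      explicit ToyStack(std::size_t size) : sp_(size) {}

      // Models llvm.stacksave: remember the current stack level.
      std::size_t save() const { return sp_; }

      // Models a dynamic alloca of n bytes (assumes n <= sp()).
      std::size_t alloca_bytes(std::size_t n) {
        sp_ -= n;
        sizes_.push_back(n);
        return sp_;
      }

      // Models llvm.stackrestore: release everything allocated since save().
      void restore(std::size_t saved) {
        while (sp_ < saved && !sizes_.empty()) {
          sp_ += sizes_.back();
          sizes_.pop_back();
        }
      }

      // Models a hypothetical freea: release only the newest allocation.
      void freea() {
        if (sizes_.empty()) return;
        sp_ += sizes_.back();
        sizes_.pop_back();
      }

      std::size_t sp() const { return sp_; }
      std::size_t live_allocations() const { return sizes_.size(); }

    private:
      std::size_t sp_;
      std::vector<std::size_t> sizes_;  // sizes of live allocations, oldest first
    };

In this model, a single ``restore`` undoes any number of intervening
allocations, while ``freea`` must be paired with each allocation
individually.
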
Nested Calls and Copy Elision
-----------------------------

We also want to be able to support copy elision into these argument
slots.  This means we have to support multiple live argument
allocations.

Consider the evaluation of:

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
    Foo bar(Foo b);
    int main() {
      bar(bar(Foo()));
    }

In this case, we want to be able to elide copies into ``bar``'s argument
slots.  That means we need to have more than one set of argument frames
active at the same time.  First, we need to allocate the frame for the
outer call so we can pass it in as the hidden struct return pointer to
the inner call.  Then we do the same for the inner call, allocating a
frame and passing its address to ``Foo``'s default constructor.  By
wrapping the evaluation of the inner ``bar`` with stack save and
restore, we can have multiple overlapping active call frames.

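The source-level behavior being preserved here can be observed in plain
C++.  The sketch below is an illustration under stated assumptions, not
a test of the Microsoft C++ ABI.  It uses a hypothetical instrumented
``Foo`` that counts copy constructions, a hypothetical body for ``bar``,
and a hypothetical ``evaluate`` helper, and it assumes compilation as
C++17 or later, where copy elision of prvalues is guaranteed.

.. code-block:: c++

    // Hypothetical instrumented version of the Foo above.
    struct Foo {
      static int copies;  // number of copy constructions observed
      int a;
      Foo() : a(0) {}
      explicit Foo(int v) : a(v) {}
      Foo(const Foo &other) : a(other.a) { ++copies; }
      ~Foo() {}
    };
    int Foo::copies = 0;

    // Hypothetical body: returns a prvalue, so the result can be
    // constructed directly in the storage chosen by the caller.
    Foo bar(Foo b) { return Foo(b.a + 1); }

    // Evaluates the nested call and returns the final value of a.
    int evaluate() {
      Foo::copies = 0;
      Foo result = bar(bar(Foo()));
      return result.a;
    }

After ``evaluate()`` runs, ``Foo::copies`` is still zero: the temporary
``Foo()`` initializes the inner call's parameter directly, and the inner
call's result initializes the outer call's parameter directly.  For an
ABI that constructs arguments in place, this is why the outer call's
argument memory has to exist before the inner call returns.
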
Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions.  On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments.  In some sense, this means that
the allocas are automatically cleared by the call.  However, LLVM
models this as a write of undef to all of the inalloca values passed to
the call rather than as a stack adjustment.  Frontends should still
restore the stack pointer to avoid a stack leak.

Exceptions
----------

There is also the possibility of an exception.  If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak.  This means the cleanup of the stack memory cannot be tied to the
call itself.  There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct.  In particular, using inalloca should not require a base
pointer.  If the backend can prove that all points in the CFG only have
one possible stack level, then it can address the stack directly from
the stack pointer.  While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.

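The condition that every point in the CFG has only one possible stack
level can be phrased as a small data-flow check.  The sketch below is an
illustration under stated assumptions, not LLVM's implementation: the
``Block`` structure, its per-block ``delta`` (the net stack adjustment
in bytes), and the function name are all hypothetical.  It propagates
the stack depth from the entry block and fails if any block can be
reached with two different depths.

.. code-block:: c++

    #include <cstddef>
    #include <queue>
    #include <vector>

    // Hypothetical CFG node: block 0 is the entry block.
    struct Block {
      long delta;                      // net stack adjustment inside the block
      std::vector<std::size_t> succs;  // indices of successor blocks
    };

    // Returns true if every reachable block has a single possible entry depth.
    bool has_unique_stack_levels(const std::vector<Block> &cfg) {
      if (cfg.empty()) return true;
      std::vector<bool> seen(cfg.size(), false);
      std::vector<long> depth_in(cfg.size(), 0);
      std::queue<std::size_t> worklist;
      seen[0] = true;  // the entry block starts at depth 0
      worklist.push(0);
      while (!worklist.empty()) {
        std::size_t b = worklist.front();
        worklist.pop();
        long depth_out = depth_in[b] + cfg[b].delta;
        for (std::size_t s : cfg[b].succs) {
          if (!seen[s]) {
            seen[s] = true;
            depth_in[s] = depth_out;
            worklist.push(s);
          } else if (depth_in[s] != depth_out) {
            return false;  // two paths reach s at different stack levels
          }
        }
      }
      return true;
    }

A diamond whose two arms leave the stack at different depths fails the
check, as does a loop whose body grows the stack on every iteration.
In such functions the offset of a stack object from the stack pointer
is not a compile-time constant, which is the situation a base pointer
handles.
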