Object allocation and lifetime in ICE
=====================================

This document discusses object lifetime and scoping issues, starting with
bitcode parsing and ending with ELF file emission.


Multithreaded translation model
-------------------------------

A single thread is responsible for parsing PNaCl bitcode (possibly concurrently
with downloading the bitcode file) and constructing the initial high-level ICE.
The result is a queue of Cfg pointers. The parser thread adds each Cfg pointer
to the queue as soon as the Cfg is created, and then moves on to parse the next
function.

Multiple translation worker threads draw from the queue of Cfg pointers as they
are added to the queue, such that several functions can be translated in parallel.
The result is a queue of assembler buffers, each of which consists of machine code
plus fixups.

A single thread is responsible for writing the assembler buffers to an ELF file.
It consumes the assembler buffers from the queue that the translation threads
write to.

This means that Cfgs (including Cfg nodes, instructions, and most kinds of
operands) are created by the parser thread and destroyed by a translation
thread, and assembler buffers are created by a translation thread and destroyed
by the writer thread.

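The three-stage pipeline described above can be sketched with two blocking
queues. This is only an illustration under stated assumptions: ``WorkQueue``,
``runPipeline``, and the fields of ``Cfg`` and ``AsmBuffer`` are hypothetical
stand-ins, not the actual Subzero classes. Ownership moves between threads via
``std::unique_ptr``, so each object is destroyed by the stage that consumes it.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the real Cfg and assembler buffer types.
struct Cfg { std::size_t SeqNum; std::string Name; };
struct AsmBuffer { std::size_t SeqNum; std::string Code; };

// Minimal blocking FIFO queue; close() signals that no more items will arrive.
template <typename T> class WorkQueue {
public:
  void push(std::unique_ptr<T> Item) {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      Items.push_back(std::move(Item));
    }
    Cv.notify_one();
  }
  void close() {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      Closed = true;
    }
    Cv.notify_all();
  }
  // Blocks until an item is available; returns nullptr once closed and drained.
  std::unique_ptr<T> pop() {
    std::unique_lock<std::mutex> Lock(Mu);
    Cv.wait(Lock, [this] { return Closed || !Items.empty(); });
    if (Items.empty())
      return nullptr;
    std::unique_ptr<T> Item = std::move(Items.front());
    Items.pop_front();
    return Item;
  }

private:
  std::mutex Mu;
  std::condition_variable Cv;
  std::deque<std::unique_ptr<T>> Items;
  bool Closed = false;
};

// Runs one parser, NumWorkers translators, and one writer; returns the number
// of assembler buffers the writer consumed.
std::size_t runPipeline(std::size_t NumFuncs, std::size_t NumWorkers) {
  WorkQueue<Cfg> CfgQueue;
  WorkQueue<AsmBuffer> AsmQueue;
  std::size_t Written = 0;

  std::thread Parser([&] {
    for (std::size_t I = 0; I < NumFuncs; ++I)
      CfgQueue.push(std::unique_ptr<Cfg>(new Cfg{I, "f" + std::to_string(I)}));
    CfgQueue.close();
  });

  std::vector<std::thread> Workers;
  for (std::size_t W = 0; W < NumWorkers; ++W)
    Workers.emplace_back([&] {
      while (std::unique_ptr<Cfg> Func = CfgQueue.pop()) {
        std::unique_ptr<AsmBuffer> Buf(
            new AsmBuffer{Func->SeqNum, "code for " + Func->Name});
        Func.reset(); // the Cfg is destroyed by the translation thread
        AsmQueue.push(std::move(Buf));
      }
    });

  std::thread Writer([&] {
    while (std::unique_ptr<AsmBuffer> Buf = AsmQueue.pop())
      ++Written; // the buffer is destroyed by the writer at end of iteration
  });

  Parser.join();
  for (std::thread &T : Workers)
    T.join();
  AsmQueue.close(); // all translators are done, so the writer can drain and exit
  Writer.join();
  return Written;
}
```

Calling ``runPipeline(8, 3)`` pushes eight hypothetical functions through three
translation threads. The buffers reach the writer in an unspecified order,
which is the nondeterminism that the writer has to compensate for.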

Deterministic execution
^^^^^^^^^^^^^^^^^^^^^^^

Although code randomization is a key aspect of security, deterministic and
repeatable translation is sometimes needed, e.g. for regression testing.
Multithreaded translation introduces potential nondeterminism that may need to
be eliminated.

* Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the
  translation queue in deterministic order. But since translation is
  multithreaded, FIFO order for the assembler buffer queue may not be
  deterministic. The writer thread would be responsible for reordering the
  buffers, potentially waiting for slower translations to complete even if other
  assembler buffers are available.

* Different translation threads may add new constant pool entries at different
  times. Some constant pool entries are emitted as read-only data. This
  includes floating-point constants for x86, as well as integer immediate
  randomization through constant pooling. These constant pool entries are
  emitted after all assembler buffers have been written. The writer needs to be
  able to sort them deterministically before emitting them.

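Both points above can be illustrated with a small sketch. It assumes the parser
stamps each function with a sequence number that travels with its assembler
buffer, and that floating-point pool entries may be ordered by bit pattern.
Neither detail is specified in this document, and ``OrderedEmitter`` and
``sortedPool`` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Emits buffers in parse order regardless of the order in which they arrive.
class OrderedEmitter {
public:
  // Accepts the buffer for the function with the given parser-assigned number.
  void accept(std::size_t SeqNum, std::string Code) {
    Pending.emplace(SeqNum, std::move(Code));
    // Flush every buffer that is now contiguous with what was already emitted.
    while (!Pending.empty() && Pending.begin()->first == NextSeq) {
      Output.push_back(std::move(Pending.begin()->second));
      Pending.erase(Pending.begin());
      ++NextSeq;
    }
  }
  const std::vector<std::string> &output() const { return Output; }
  std::size_t pendingCount() const { return Pending.size(); }

private:
  std::size_t NextSeq = 0;
  std::map<std::size_t, std::string> Pending; // buffers waiting for earlier ones
  std::vector<std::string> Output;            // stands in for the ELF file
};

// Orders floating-point pool entries by bit pattern (an assumed sort key), so
// the result does not depend on which thread inserted an entry first.
std::vector<double> sortedPool(std::vector<double> Pool) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");
  std::sort(Pool.begin(), Pool.end(), [](double A, double B) {
    std::uint64_t BitsA, BitsB;
    std::memcpy(&BitsA, &A, sizeof A);
    std::memcpy(&BitsB, &B, sizeof B);
    return BitsA < BitsB;
  });
  return Pool;
}
```

The cost of this scheme is the one already noted: a buffer that finishes early
sits in ``Pending`` until every earlier function has been translated.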

Object lifetimes
----------------

Objects of type Constant, or a subclass of Constant, are pooled globally. The
pooling is managed by the GlobalContext class. Since Constants are added or
looked up by translation threads and the parser thread, access to the constant
pools, as well as to GlobalContext in general, needs to be arbitrated by locks.
(If there turns out to be too much contention, we can maintain a
thread-local cache for Constant pool lookups.) Constants live across all
function translations, and are destroyed only at the end.

Several object types are scoped within the lifetime of the Cfg. These include
CfgNode, Inst, Variable, and any target-specific subclasses of Inst and Operand.
When the Cfg is destroyed, these scoped objects are destroyed as well. To keep
this cheap, the Cfg includes a slab allocator from which these objects are
allocated, and the objects should not contain fields with non-trivial
destructors. Most of these fields are POD, but in a couple of cases these
fields are STL containers. We deal with this, and avoid leaking memory, by
providing the container with an allocator that uses the Cfg-local slab
allocator. Since the container allocator generally needs to be stateless, we
store a pointer to the slab allocator in thread-local storage (TLS). This is
straightforward since on any of the threads, only one Cfg is active at a time,
and a given Cfg is only active in one thread at a time (either the parser
thread, or at most one translation thread, or the writer thread).

Even though there is a one-to-one correspondence between Cfgs and assembler
buffers, they need to use different allocators. This is because the translation
thread wants to destroy the Cfg and reclaim all its memory after translation
completes, but possibly before the assembler buffer is written to the ELF file.
Ownership of the assembler buffer and its allocator is transferred to the
writer thread after translation completes, similar to the way ownership of the
Cfg and its allocator is transferred to the translation thread after parsing
completes.

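The stateless container allocator described above might look roughly like the
following. This is a minimal sketch, not Subzero's implementation:
``SlabAllocator``, ``CurrentSlab``, ``CfgLocalAllocator``, and the 4096-byte
slab size are all assumptions made for illustration, and the bump allocator
only handles alignments up to that of ``operator new[]``.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump-pointer slab allocator. Individual objects are never
// freed; all memory is released at once when the allocator is destroyed.
class SlabAllocator {
public:
  void *allocate(std::size_t Bytes, std::size_t Align) {
    std::size_t Offset = (Used + Align - 1) / Align * Align;
    if (Slabs.empty() || Offset + Bytes > Capacity) {
      // Start a new slab, sized up if the request is larger than a slab.
      Capacity = Bytes > SlabSize ? Bytes : SlabSize;
      Slabs.push_back(std::unique_ptr<char[]>(new char[Capacity]));
      Offset = 0;
    }
    Used = Offset + Bytes;
    BytesAllocated += Bytes;
    return Slabs.back().get() + Offset;
  }
  std::size_t bytesAllocated() const { return BytesAllocated; }

private:
  static constexpr std::size_t SlabSize = 4096; // assumed size
  std::vector<std::unique_ptr<char[]>> Slabs;
  std::size_t Capacity = 0;
  std::size_t Used = 0;
  std::size_t BytesAllocated = 0;
};

// Per-thread pointer to the slab allocator of the currently active Cfg.
thread_local SlabAllocator *CurrentSlab = nullptr;

// Stateless STL allocator: it holds no members and finds the slab through TLS.
template <typename T> struct CfgLocalAllocator {
  using value_type = T;
  CfgLocalAllocator() noexcept = default;
  template <typename U>
  CfgLocalAllocator(const CfgLocalAllocator<U> &) noexcept {}
  T *allocate(std::size_t N) {
    return static_cast<T *>(CurrentSlab->allocate(N * sizeof(T), alignof(T)));
  }
  void deallocate(T *, std::size_t) noexcept {} // reclaimed with the slab
};
template <typename T, typename U>
bool operator==(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) {
  return true; // stateless, so any two instances are interchangeable
}
template <typename T, typename U>
bool operator!=(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) {
  return false;
}

// Fills a vector whose storage comes from the slab and reports slab usage.
std::size_t demoCfgLocalVector() {
  SlabAllocator Slab;
  CurrentSlab = &Slab;
  {
    std::vector<int, CfgLocalAllocator<int>> Values;
    for (int I = 0; I < 100; ++I)
      Values.push_back(I);
  }
  CurrentSlab = nullptr;
  return Slab.bytesAllocated();
}
```

Because ``deallocate`` is a no-op, skipping a container's destructor leaks
nothing: the storage goes away with the slab, which is what allows Cfg-scoped
objects to hold STL containers at all.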

Allocators and TLS
------------------

Building the Cfg and transforming it involve STL container operations, which
may need to allocate additional memory in a stateless fashion. This requires
maintaining the proper slab allocator pointer in TLS.

When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's
slab allocator into its own TLS. This is used as the Cfg is built within the
parser thread. After the Cfg is built, the parser thread clears its allocator
pointer, adds the new Cfg pointer to the translation queue, and continues with
the next function.

When the translation thread grabs a new Cfg pointer, it installs the Cfg's slab
allocator into its TLS and translates the function. When generating the
assembler buffer, it must take care not to use the Cfg's slab allocator. If
there is a slab allocator for the assembler buffer, a pointer to it can also be
installed in TLS if needed.

The translation thread destroys the Cfg when it is done translating, including
the Cfg's slab allocator, and clears the allocator pointer from its TLS.
Likewise, the writer thread destroys the assembler buffer when it is finished
with it.

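One way to keep the install and clear steps paired is a scope guard, sketched
below. The names ``AllocatorScope``, ``CurrentAllocator``, and ``handOffCfg``
are hypothetical, ``SlabAllocator`` is reduced to an empty placeholder, and
joining the parser thread stands in for the translation queue. The sketch
clears the TLS pointer just before destroying the Cfg so that the pointer
never dangles.

```cpp
#include <memory>
#include <thread>

// Placeholder for the Cfg-local slab allocator; its interface is not needed here.
struct SlabAllocator {};

// Hypothetical Cfg that owns its allocator.
struct Cfg {
  SlabAllocator Allocator;
};

// Per-thread pointer consulted by stateless container allocators.
thread_local SlabAllocator *CurrentAllocator = nullptr;

// Installs a Cfg's allocator for the current thread; clears it on scope exit.
class AllocatorScope {
public:
  explicit AllocatorScope(Cfg &Func) { CurrentAllocator = &Func.Allocator; }
  ~AllocatorScope() { CurrentAllocator = nullptr; }
  AllocatorScope(const AllocatorScope &) = delete;
  AllocatorScope &operator=(const AllocatorScope &) = delete;
};

// Walks one Cfg through a parser thread and a translation thread. Returns true
// if the TLS pointer was set only while the Cfg was active on each thread.
bool handOffCfg() {
  std::unique_ptr<Cfg> Func(new Cfg);
  bool Ok = true;

  std::thread Parser([&] {
    {
      AllocatorScope Scope(*Func); // build the Cfg under its own allocator
      Ok = Ok && CurrentAllocator == &Func->Allocator;
    }
    Ok = Ok && CurrentAllocator == nullptr; // cleared before the Cfg is queued
  });
  Parser.join(); // stands in for the hand-off through the translation queue

  std::thread Translator([&] {
    Ok = Ok && CurrentAllocator == nullptr; // TLS is per thread: nothing set yet
    {
      AllocatorScope Scope(*Func); // translate under the Cfg's allocator
      Ok = Ok && CurrentAllocator == &Func->Allocator;
    }
    Func.reset(); // the translation thread destroys the Cfg and its allocator
    Ok = Ok && CurrentAllocator == nullptr;
  });
  Translator.join();
  return Ok;
}
```

A second ``thread_local`` pointer for the assembler buffer's allocator could
be managed the same way, which keeps the two allocators from being confused
during code emission.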

Thread safety
-------------

The parse/translate/write stages of the translation pipeline are fairly
independent, with little opportunity for threads to interfere. The Subzero
design calls for all shared accesses to go through the GlobalContext, which adds
locking as appropriate. This includes the coarse-grain work queues for Cfgs and
assembler buffers. It also includes finer-grain access to constant pool
entries, as well as output streams for verbose debugging output.

If locked access to constant pools becomes a bottleneck, we can investigate
thread-local caches of constants (as mentioned earlier). Also, it should be
safe, though slightly less efficient, to allow duplicate copies of constants
across threads (which could be deduplicated by the writer at the end).

We will use ThreadSanitizer as a way to detect potential data races in the
implementation.

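Locked access to a constant pool can be sketched as follows. The
``GlobalContext`` class here is a toy with an assumed interface
(``getConstantInt`` and ``poolSize`` are illustrative names, and only integer
constants are modeled); it shows the single-lock approach, not the
thread-local cache alternative.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical pooled constant.
struct ConstantInt {
  std::int64_t Value;
};

// Toy GlobalContext: every pool access takes the same lock.
class GlobalContext {
public:
  // Returns the unique pooled constant for Value, creating it on first use.
  const ConstantInt *getConstantInt(std::int64_t Value) {
    std::lock_guard<std::mutex> Lock(PoolMu);
    std::unique_ptr<ConstantInt> &Slot = Pool[Value];
    if (!Slot)
      Slot.reset(new ConstantInt{Value});
    return Slot.get(); // stays valid: the pool owns the object until the end
  }
  std::size_t poolSize() {
    std::lock_guard<std::mutex> Lock(PoolMu);
    return Pool.size();
  }

private:
  std::mutex PoolMu;
  std::unordered_map<std::int64_t, std::unique_ptr<ConstantInt>> Pool;
};

// Has several threads look up the same values; returns the final pool size.
std::size_t poolSizeAfterConcurrentLookups(unsigned NumThreads, int NumValues) {
  GlobalContext Ctx;
  std::vector<std::thread> Threads;
  for (unsigned T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&Ctx, NumValues] {
      for (int V = 0; V < NumValues; ++V)
        Ctx.getConstantInt(V);
    });
  for (std::thread &T : Threads)
    T.join();
  return Ctx.poolSize();
}
```

With the lock in place, concurrent lookups of the same value always resolve to
one pooled object, so no deduplication pass is needed in the writer. Building
such code with ``-fsanitize=thread`` is how ThreadSanitizer would check it for
data races.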