Subzero - Fast code generator for PNaCl bitcode
===============================================
Design
------
See the accompanying DESIGN.rst file for a more detailed technical overview of
Subzero.
Building
--------
Subzero is set up to be built within the Native Client tree. Follow the
`Developing PNaCl
<https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/developing-pnacl>`_
instructions, in particular the section on building PNaCl sources. This will
prepare the necessary external headers and libraries that Subzero needs.
Checking out the Native Client project also gets the pre-built clang and LLVM
tools in ``native_client/../third_party/llvm-build/Release+Asserts/bin`` which
are used for building Subzero.
The Subzero source is in ``native_client/toolchain_build/src/subzero``. From
within that directory, ``git checkout master && git pull`` to get the latest
version of Subzero source code.
The Makefile is designed to be used as part of the higher level LLVM build
system. To build manually, use the ``Makefile.standalone``. There are several
build configurations from the command line::
make -f Makefile.standalone
make -f Makefile.standalone DEBUG=1
make -f Makefile.standalone NOASSERT=1
make -f Makefile.standalone DEBUG=1 NOASSERT=1
make -f Makefile.standalone MINIMAL=1
make -f Makefile.standalone ASAN=1
make -f Makefile.standalone TSAN=1
``DEBUG=1`` builds without optimizations and is good when running the translator
inside a debugger. ``NOASSERT=1`` disables assertions and is the preferred
configuration for performance testing the translator. ``MINIMAL=1`` attempts to
minimize the size of the translator by compiling out everything unnecessary.
``ASAN=1`` enables AddressSanitizer, and ``TSAN=1`` enables ThreadSanitizer.
The result of the ``make`` command is the target ``pnacl-sz`` in the current
directory.
Building within LLVM trunk
--------------------------
Subzero can also be built from within a standard LLVM trunk checkout. Here is
an example of how it can be checked out and built::
mkdir llvm-git
cd llvm-git
git clone http://llvm.org/git/llvm.git
cd llvm/projects/
git clone https://chromium.googlesource.com/native_client/pnacl-subzero
cd ../..
mkdir build
cd build
cmake -G Ninja ../llvm/
ninja
./bin/pnacl-sz -version
This creates a default build of ``pnacl-sz``; currently any options such as
``DEBUG=1`` or ``MINIMAL=1`` have to be added manually.
``pnacl-sz``
------------
The ``pnacl-sz`` program parses a pexe or an LLVM bitcode file and translates it
into ICE (Subzero's intermediate representation). It then invokes the ICE
translate method to lower it to target-specific machine code, optionally dumping
the intermediate representation at various stages of the translation.
The program can be run as follows::
../pnacl-sz ./path/to/<file>.pexe
../pnacl-sz ./tests_lit/pnacl-sz_tests/<file>.ll
At this time, ``pnacl-sz`` accepts a number of arguments, including the
following:
``-help`` -- Show available arguments and possible values. (Note: this
unfortunately also pulls in some LLVM-specific options that are reported but
that Subzero doesn't use.)
``-notranslate`` -- Suppress the ICE translation phase, which is useful if
ICE is missing some support.
``-target=<TARGET>`` -- Set the target architecture. The default is x8632.
Future targets include x8664, arm32, and arm64.
``-filetype=obj|asm|iasm`` -- Select the output file type. ``obj`` is a
native ELF file, ``asm`` is a textual assembly file, and ``iasm`` is a
low-level textual assembly file demonstrating the integrated assembler.
``-O<LEVEL>`` -- Set the optimization level. Valid levels are ``2``, ``1``,
``0``, ``-1``, and ``m1``. Levels ``-1`` and ``m1`` are synonyms, and
represent the minimum optimization and worst code quality, but fastest code
generation.
``-verbose=<list>`` -- Set verbosity flags. This argument allows a
comma-separated list of values. The default is ``none``, and the value
``inst,pred`` will roughly match the .ll bitcode file. Of particular use
are ``all``, ``most``, and ``none``.
``-o <FILE>`` -- Set the assembly output file name. Default is stdout.
``-log <FILE>`` -- Set the file name for diagnostic output (whose level is
controlled by ``-verbose``). Default is stdout.
``-timing`` -- Dump some pass timing information after translating the input
file.
Running the test suite
----------------------
Subzero uses the LLVM ``lit`` testing tool for part of its test suite, which
lives in ``tests_lit``. To execute the test suite, first build Subzero, and then
run::
make -f Makefile.standalone check-lit
There is also a suite of cross tests in the ``crosstest`` directory. A cross
test takes a test bitcode file implementing some unit tests, and translates it
twice, once with Subzero and once with LLVM's known-good ``llc`` translator.
The Subzero-translated symbols are specially mangled to avoid multiple
definition errors from the linker. Both translated versions are linked together
with a driver program that calls each version of each unit test with a variety
of interesting inputs and compares the results for equality. The cross tests
are currently invoked by running::
make -f Makefile.standalone check-xtest
Similar, there is a suite of unit tests::
make -f Makefile.standalone check-unit
A convenient way to run the lit, cross, and unit tests is::
make -f Makefile.standalone check
Assembling ``pnacl-sz`` output as needed
----------------------------------------
``pnacl-sz`` can now produce a native ELF binary using ``-filetype=obj``.
``pnacl-sz`` can also produce textual assembly code in a structure suitable for
input to ``llvm-mc``, using ``-filetype=asm`` or ``-filetype=iasm``. An object
file can then be produced using the command::
llvm-mc -triple=i686 -filetype=obj -o=MyObj.o
Building a translated binary
----------------------------
There is a helper script, ``pydir/szbuild.py``, that translates a finalized pexe
into a fully linked executable. Run it with ``-help`` for extensive
documentation.
By default, ``szbuild.py`` builds an executable using only Subzero translation,
but it can also be used to produce hybrid Subzero/``llc`` binaries (``llc`` is
the name of the LLVM translator) for bisection-based debugging. In bisection
debugging mode, the pexe is translated using both Subzero and ``llc``, and the
resulting object files are combined into a single executable using symbol
weakening and other linker tricks to control which Subzero symbols and which
``llc`` symbols take precedence. This is controlled by the ``-include`` and
``-exclude`` arguments. These can be used to rapidly find a single function
that Subzero translates incorrectly leading to incorrect output.
There is another helper script, ``pydir/szbuild_spec2k.py``, that runs
``szbuild.py`` on one or more components of the Spec2K suite. This assumes that
Spec2K is set up in the usual place in the Native Client tree, and the finalized
pexe files have been built. (Note: for working with Spec2K and other pexes,
it's helpful to finalize the pexe using ``--no-strip-syms``, to preserve the
original function and global variable names.)
Status
------
Subzero currently fully supports the x86-32 architecture, for both native and
Native Client sandboxing modes. The x86-64 architecture is also supported in
native mode only, and only for the x32 flavor due to the fact that pointers and
32-bit integers are indistinguishable in PNaCl bitcode. Sandboxing support for
x86-64 is in progress. ARM and MIPS support is in progress. Two optimization
levels, ``-Om1`` and ``-O2``, are implemented.
The ``-Om1`` configuration is designed to be the simplest and fastest possible,
with a minimal set of passes and transformations.
* Simple Phi lowering before target lowering, by generating temporaries and
adding assignments to the end of predecessor blocks.
* Simple register allocation limited to pre-colored or infinite-weight
Variables.
The ``-O2`` configuration is designed to use all optimizations available and
produce the best code.
* Address mode inference to leverage the complex x86 addressing modes.
* Compare/branch fusing based on liveness/last-use analysis.
* Global, linear-scan register allocation.
* Advanced phi lowering after target lowering and global register allocation,
via edge splitting, topological sorting of the parallel moves, and final local
register allocation.
* Stack slot coalescing to reduce frame size.
* Branch optimization to reduce the number of branches to the following block.