327 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			HTML
		
	
	
	
			
		
		
	
	
			327 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			HTML
		
	
	
	
| <html>
 | |
| <head>
 | |
|     <title>Dalvik Optimization and Verification</title>
 | |
| </head>
 | |
| 
 | |
| <body>
 | |
| <h1>Dalvik Optimization and Verification With <i>dexopt</i></h1>
 | |
| 
 | |
| <p>
 | |
| The Dalvik virtual machine was designed specifically for the Android
 | |
| mobile platform.  The target systems have little RAM, store data on slow
 | |
| internal flash memory, and generally have the performance characteristics
 | |
| of decade-old desktop systems.  They also run Linux, which provides
 | |
| virtual memory, processes and threads, and UID-based security mechanisms.
 | |
| <p>
 | |
| The features and limitations caused us to focus on certain goals:
 | |
| 
 | |
| <ul>
 | |
|     <li>Class data, notably bytecode, must be shared between multiple
 | |
|     processes to minimize total system memory usage.
 | |
|     <li>The overhead in launching a new app must be minimized to keep
 | |
|     the device responsive.
 | |
|     <li>Storing class data in individual files results in a lot of
 | |
|     redundancy, especially with respect to strings.  To conserve disk
 | |
|     space we need to factor this out.
 | |
|     <li>Parsing class data fields adds unnecessary overhead during
 | |
|     class loading.  Accessing data values (e.g. integers and strings)
 | |
|     directly as C types is better.
 | |
|     <li>Bytecode verification is necessary, but slow, so we want to verify
 | |
|     as much as possible outside app execution.
 | |
|     <li>Bytecode optimization (quickened instructions, method pruning) is
 | |
|     important for speed and battery life.
 | |
|     <li>For security reasons, processes may not edit shared code.
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| The typical VM implementation uncompresses individual classes from a
 | |
| compressed archive and stores them on the heap.  This implies a separate
 | |
| copy of each class in every process, and slows application startup because
 | |
| the code must be uncompressed (or at least read off disk in many small
 | |
| pieces).  On the other hand, having the bytecode on the local heap makes
 | |
| it easy to rewrite instructions on first use, facilitating a number of
 | |
| different optimizations.
 | |
| <p>
 | |
| The goals led us to make some fundamental decisions:
 | |
| 
 | |
| <ul>
 | |
|     <li>Multiple classes are aggregated into a single "DEX" file.
 | |
|     <li>DEX files are mapped read-only and shared between processes.
 | |
|     <li>Byte ordering and word alignment are adjusted to suit the local
 | |
|     system.
 | |
|     <li>Bytecode verification is mandatory for all classes, but we want
 | |
|     to "pre-verify" whatever we can.
 | |
|     <li>Optimizations that require rewriting bytecode must be done ahead
 | |
|     of time.
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| The consequences of these decisions are explained in the following sections.
 | |
| 
 | |
| 
 | |
| <h2>VM Operation</h2>
 | |
| 
 | |
| <p>
 | |
| Application code is delivered to the system in a <code>.jar</code>
 | |
| or <code>.apk</code> file.  These are really just <code>.zip</code>
 | |
| archives with some meta-data files added.  The Dalvik DEX data file
 | |
| is always called <code>classes.dex</code>.
 | |
| <p>
 | |
| The bytecode cannot be memory-mapped and executed directly from the zip
 | |
| file, because the data is compressed and the start of the file is not
 | |
| guaranteed to be word-aligned.  These problems could be addressed by
 | |
| storing <code>classes.dex</code> without compression and padding out the zip
 | |
| file, but that would increase the size of the package sent across the
 | |
| data network.
 | |
| <p>
 | |
| We need to extract <code>classes.dex</code> from the zip archive before
 | |
| we can use it.  While we have the file available, we might as well perform
 | |
| some of the other actions (realignment, optimization, verification) described
 | |
| earlier.  This raises a new question however: who is responsible for doing
 | |
| this, and where do we keep the output?
 | |
| 
 | |
| <h3>Preparation</h3>
 | |
| 
 | |
| <p>
 | |
| There are at least three different ways to create a "prepared" DEX file,
 | |
| sometimes known as "ODEX" (for Optimized DEX):
 | |
| <ol>
 | |
|     <li>The VM does it "just in time".  The output goes into a special
 | |
|     <code>dalvik-cache</code> directory.  This works on the desktop and
 | |
|     engineering-only device builds where the permissions on the
 | |
|     <code>dalvik-cache</code> directory are not restricted.  On production
 | |
|     devices, this is not allowed.
 | |
|     <li>The system installer does it when an application is first added.
 | |
|     It has the privileges required to write to <code>dalvik-cache</code>.
 | |
|     <li>The build system does it ahead of time.  The relevant <code>jar</code>
 | |
|     / <code>apk</code> files are present, but the <code>classes.dex</code>
 | |
|     is stripped out.  The optimized DEX is stored next to the original
 | |
|     zip archive, not in <code>dalvik-cache</code>, and is part of the
 | |
|     system image.
 | |
| </ol>
 | |
| <p>
 | |
| The <code>dalvik-cache</code> directory is more accurately
 | |
| <code>$ANDROID_DATA/data/dalvik-cache</code>.  The files inside it have
 | |
| names derived from the full path of the source DEX.  On the device the
 | |
| directory is owned by <code>system</code> / <code>system</code>
 | |
| and has 0771 permissions, and the optimized DEX files stored there are
 | |
| owned by <code>system</code> and the
 | |
| application's group, with 0644 permissions.  DRM-locked applications will
 | |
| use 640 permissions to prevent other user applications from examining them.
 | |
| The bottom line is that you can read your own DEX file and those of most
 | |
| other applications, but you cannot create, modify, or remove them.
 | |
| <p>
 | |
| Preparation of the DEX file for the "just in time" and "system installer"
 | |
| approaches proceeds in three steps:
 | |
| <p>
 | |
| First, the dalvik-cache file is created.  This must be done in a process
 | |
| with appropriate privileges, so for the "system installer" case this is
 | |
| done within <code>installd</code>, which runs as root.
 | |
| <p>
 | |
| Second, the <code>classes.dex</code> entry is extracted from the the zip
 | |
| archive.  A small amount of space is left at the start of the file for
 | |
| the ODEX header.
 | |
| <p>
 | |
| Third, the file is memory-mapped for easy access and tweaked for use on
 | |
| the current system.  This includes byte-swapping and structure realigning,
 | |
| but no meaningful changes to the DEX file.  We also do some basic
 | |
| structure checks, such as ensuring that file offsets and data indices
 | |
| fall within valid ranges.
 | |
| <p>
 | |
| The build system uses a hairy process that involves starting the
 | |
| emulator, forcing just-in-time optimization of all relevant DEX files,
 | |
| and then extracting the results from <code>dalvik-cache</code>.  The
 | |
| reasons for doing this, rather than using a tool that runs on the desktop,
 | |
| will become more apparent when the optimizations are explained.
 | |
| <p>
 | |
| Once the code is byte-swapped and aligned, we're ready to go.  We append
 | |
| some pre-computed data, fill in the ODEX header at the start of the file,
 | |
| and start executing.  (The header is filled in last, so that we don't
 | |
| try to use a partial file.)  If we're interested in verification and
 | |
| optimization, however, we need to insert a step after the initial prep.
 | |
| 
 | |
| <h3>dexopt</h3>
 | |
| 
 | |
| <p>
 | |
| We want to verify and optimize all of the classes in the DEX file.  The
 | |
| easiest and safest way to do this is to load all of the classes into
 | |
| the VM and run through them.  Anything that fails to load is simply not
 | |
| verified or optimized.  Unfortunately, this can cause allocation of some
 | |
| resources that are difficult to release (e.g. loading of native shared
 | |
| libraries), so we don't want to do it in the same virtual machine that
 | |
| we're running applications in.
 | |
| <p>
 | |
| The solution is to invoke a program called <code>dexopt</code>, which
 | |
| is really just a back door into the VM.  It performs an abbreviated VM
 | |
| initialization, loads zero or more DEX files from the bootstrap class
 | |
| path, and then sets about verifying and optimizing whatever it can from
 | |
| the target DEX.  On completion, the process exits, freeing all resources.
 | |
| <p>
 | |
| It is possible for multiple VMs to want the same DEX file at the same
 | |
| time.  File locking is used to ensure that dexopt is only run once.
 | |
| 
 | |
| 
 | |
| <h2>Verification</h2>
 | |
| 
 | |
| <p>
 | |
| The bytecode verification process involves scanning through the instructions
 | |
| in every method in every class in a DEX file.  The goal is to identify
 | |
| illegal instruction sequences so that we don't have to check for them at
 | |
| run time.  Many of the computations involved are also necessary for "exact"
 | |
| garbage collection.  See
 | |
| <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more
 | |
| information.
 | |
| <p>
 | |
| For performance reasons, the optimizer (described in the next section)
 | |
| assumes that the verifier has run successfully, and makes some potentially
 | |
| unsafe assumptions.  By default, Dalvik insists upon verifying all classes,
 | |
| and only optimizes classes that have been verified.  If you want to
 | |
| disable the verifier, you can use command-line flags to do so.  See also
 | |
| <a href="embedded-vm-control.html"> Controlling the Embedded VM</a>
 | |
| for instructions on controlling these
 | |
| features within the Android application framework.
 | |
| <p>
 | |
| Reporting of verification failures is a tricky issue.  For example,
 | |
| calling a package-scope method on a class in a different package is
 | |
| illegal and will be caught by the verifier.  We don't necessarily want
 | |
| to report it during verification though -- we actually want to throw
 | |
| an exception when the method call is attempted.  Checking the access
 | |
| flags on every method call is expensive though.  The
 | |
| <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document
 | |
| addresses this issue.
 | |
| <p>
 | |
| Classes that have been verified successfully have a flag set in the ODEX.
 | |
| They will not be re-verified when loaded.  The Linux access permissions
 | |
| are expected to prevent tampering; if you can get around those, installing
 | |
| faulty bytecode is far from the easiest line of attack.  The ODEX file has
 | |
| a 32-bit checksum, but that's chiefly present as a quick check for
 | |
| corrupted data.
 | |
| 
 | |
| 
 | |
| <h2>Optimization</h2>
 | |
| 
 | |
| <p>
 | |
| Virtual machine interpreters typically perform certain optimizations the
 | |
| first time a piece of code is used.  Constant pool references are replaced
 | |
| with pointers to internal data structures, operations that always succeed
 | |
| or always work a certain way are replaced with simpler forms.  Some of
 | |
| these require information only available at runtime, others can be inferred
 | |
| statically when certain assumptions are made.
 | |
| <p>
 | |
| The Dalvik optimizer does the following:
 | |
| <ul>
 | |
|     <li>For virtual method calls, replace the method index with a
 | |
|     vtable index.
 | |
|     <li>For instance field get/put, replace the field index with
 | |
|     a byte offset.  Also, merge the boolean / byte / char / short
 | |
|     variants into a single 32-bit form (less code in the interpreter
 | |
|     means more room in the CPU I-cache).
 | |
|     <li>Replace a handful of high-volume calls, like String.length(),
 | |
|     with "inline" replacements.  This skips the usual method call
 | |
|     overhead, directly switching from the interpreter to a native
 | |
|     implementation.
 | |
|     <li>Prune empty methods.  The simplest example is
 | |
|     <code>Object.<init></code>, which does nothing, but must be
 | |
|     called whenever any object is allocated.  The instruction is
 | |
|     replaced with a new version that acts as a no-op unless a debugger
 | |
|     is attached.
 | |
|     <li>Append pre-computed data.  For example, the VM wants to have a
 | |
|     hash table for lookups on class name.  Instead of computing this
 | |
|     when the DEX file is loaded, we can compute it now, saving heap
 | |
|     space and computation time in every VM where the DEX is loaded.
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| All of the instruction modifications involve replacing the opcode with
 | |
| one not defined by the Dalvik specification.  This allows us to freely
 | |
| mix optimized and unoptimized instructions.  The set of optimized
 | |
| instructions, and their exact representation, is tied closely to the VM
 | |
| version.
 | |
| <p>
 | |
| Most of the optimizations are obvious "wins".  The use of raw indices
 | |
| and offsets not only allows us to execute more quickly, we can also
 | |
| skip the initial symbolic resolution.  Pre-computation eats up
 | |
| disk space, and so must be done in moderation.
 | |
| <p>
 | |
| There are a couple of potential sources of trouble with these
 | |
| optimizations.  First, vtable indices and byte offsets are subject to
 | |
| change if the VM is updated.  Second, if a superclass is in a different
 | |
| DEX, and that other DEX is updated, we need to ensure that our optimized
 | |
| indices and offsets are updated as well.  A similar but more subtle
 | |
| problem emerges when user-defined class loaders are employed: the class
 | |
| we actually call may not be the one we expected to call.
 | |
| <p>These problems are addressed with dependency lists and some limitations
 | |
| on what can be optimized.
 | |
| 
 | |
| 
 | |
| <h2>Dependencies and Limitations</h2>
 | |
| 
 | |
| <p>
 | |
| The optimized DEX file includes a list of dependencies on other DEX files,
 | |
| plus the CRC-32 and modification date from the originating
 | |
| <code>classes.dex</code> zip file entry.  The dependency list includes the
 | |
| full path to the <code>dalvik-cache</code> file, and the file's SHA-1
 | |
| signature.  The timestamps of files on the device are unreliable and
 | |
| not used.  The dependency area also includes the VM version number.
 | |
| <p>
 | |
| An optimized DEX is dependent upon all of the DEX files in the bootstrap
 | |
| class path.  DEX files that are part of the bootstrap class path depend
 | |
| upon the DEX files that appeared earlier.  To ensure that nothing outside
 | |
| the dependent DEX files is available, <code>dexopt</code> only loads the
 | |
| bootstrap classes.  References to classes in other DEX files fail, which
 | |
| causes class loading and/or verification to fail, and classes with
 | |
| external dependencies are simply not optimized.
 | |
| <p>
 | |
| This means that splitting code out into many separate DEX files has a
 | |
| disadvantage: virtual method calls and instance field lookups between
 | |
| non-boot DEX files can't be optimized.  Because verification is pass/fail
 | |
| with class granularity, no method in a class that has any reliance on
 | |
| classes in external DEX files can be optimized.  This may be a bit
 | |
| heavy-handed, but it's the only way to guarantee that nothing breaks
 | |
| when individual pieces are updated.
 | |
| <p>
 | |
| Another negative consequence: any change to a bootstrap DEX will result
 | |
| in rejection of all optimized DEX files.  This makes it hard to keep
 | |
| system updates small.
 | |
| <p>
 | |
| Despite our caution, there is still a possibility that a class in a DEX
 | |
| file loaded by a user-defined class loader could ask for a bootstrap class
 | |
| (say, String) and be given a different class with the same name.  If a
 | |
| class in the DEX file being processed has the same name as a class in the
 | |
| bootstrap DEX files, the class will be flagged as ambiguous and references
 | |
| to it will not be resolved during verification / optimization.  The class
 | |
| linking code in the VM does additional checks to plug another hole;
 | |
| see the verbose description in the VM sources for details (vm/oo/Class.c).
 | |
| <p>
 | |
| If one of the dependencies is updated, we need to re-verify and
 | |
| re-optimize the DEX file.  If we can do a just-in-time <code>dexopt</code>
 | |
| invocation, this is easy.  If we have to rely on the installer daemon, or
 | |
| the DEX was shipped only in ODEX, then the VM has to reject the DEX.
 | |
| <p>
 | |
| The output of <code>dexopt</code> is byte-swapped and struct-aligned
 | |
| for the host, and contains indices and offsets that are highly VM-specific
 | |
| (both version-wise and platform-wise).  For this reason it's tricky to
 | |
| write a version of <code>dexopt</code> that runs on the desktop but
 | |
| generates output suitable for a particular device.  The safest way to
 | |
| invoke it is on the target device, or on an emulator for that device.
 | |
| 
 | |
| 
 | |
| <h2>Generated DEX</h2>
 | |
| 
 | |
| <p>
 | |
| Some languages and frameworks rely on the ability to generate bytecode
 | |
| and execute it.  The rather heavy <code>dexopt</code> verification and
 | |
| optimization model doesn't work well with that.
 | |
| <p>
 | |
| We intend to support this in a future release, but the exact method is
 | |
| to be determined.  We may allow individual classes to be added or whole
 | |
| DEX files; may allow Java bytecode or Dalvik bytecode in instructions;
 | |
| may perform the usual set of optimizations, or use a separate interpreter
 | |
| that performs on-first-use optimizations directly on the bytecode (which
 | |
| won't be mapped read-only, since it's locally defined).
 | |
| 
 | |
| <address>Copyright © 2008 The Android Open Source Project</address>
 | |
| 
 | |
| </body>
 | |
| </html>
 |