| <html>
 | |
| <head>
 | |
| <title>Dalvik Bytecode Verifier Notes</title>
 | |
| </head>
 | |
| 
 | |
| <body>
 | |
| <h1>Dalvik Bytecode Verifier Notes</h1>
 | |
| 
 | |
| <p>
 | |
| The bytecode verifier in the Dalvik VM attempts to provide the same sorts
 | |
| of checks and guarantees that other popular virtual machines do.  We
 | |
| perform largely the same checks as those described in <cite>The Java
 | |
| Virtual Machine Specification, Second Edition</cite>, including the updates
 | |
| planned for the Third Edition.
 | |
| 
 | |
| <p>
 | |
| Verification can be enabled for all classes, disabled for all, or enabled
 | |
| only for "remote" (non-bootstrap) classes.  It should be performed for any
 | |
| class that will be processed with the DEX optimizer, and in fact the
 | |
| default VM behavior is to only optimize verified classes.
 | |
| 
 | |
| 
 | |
| <h2>Why Verify?</h2>
 | |
| 
 | |
| <p>
 | |
| The verification process adds time to the build and to
 | |
| the installation of new applications.  It's fairly quick for app-sized
 | |
| DEX files, but rather slow for the big "core" and "framework" files.
 | |
| Why do it at all, when our system relies on UNIX processes for security?
 | |
| <p>
 | |
| <ol>
 | |
|     <li>Optimizations.  The interpreter can ignore a lot of potential
 | |
|     error cases because the verifier guarantees that they are impossible.
 | |
|     Also, we can optimize the DEX file more aggressively if we start
 | |
|     with a stronger set of assumptions about the bytecode.
 | |
|     <li>"Precise" GC.  The work performed during verification has significant
 | |
|     overlap with the work required to compute register use maps for
 | |
|     type-precise GC.
 | |
|     <li>Intra-application security.  If an app wants to download bits
 | |
|     of interpreted code over the network and execute them, it can safely
 | |
|     do so using well-established security mechanisms.
 | |
|     <li>Third-party app failure analysis.  We have no way to control the
 | |
|     tools and post-processing utilities that external developers employ,
 | |
|     so when we get bug reports with a weird exception or native crash
 | |
|     it's very helpful to start with the assumption that the bytecode
 | |
|     is valid.
 | |
| </ol>
 | |
| <p>
 | |
| It's also a convenient framework to deal with certain situations, notably
 | |
| replacement of instructions that access volatile 64-bit fields with
 | |
| more rigorous versions that guarantee atomicity.
 | |
| 
 | |
| 
 | |
| <h2>Verifier Differences</h2>
 | |
| 
 | |
| <p>
 | |
| There are a few checks that the Dalvik bytecode verifier does not perform,
 | |
| because they're not relevant.  For example:
 | |
| <ul>
 | |
|     <li>Type restrictions on constant pool references are not enforced,
 | |
|     because Dalvik does not have a pool of typed constants.  (Dalvik
 | |
|     uses a simple index into type-specific pools.)
 | |
|     <li>Verification of the operand stack size is not performed, because
 | |
|     Dalvik does not have an operand stack.
 | |
|     <li>Limitations on <code>jsr</code> and <code>ret</code> do not apply,
 | |
|     because Dalvik doesn't support subroutines.
 | |
| </ul>
 | |
| 
 | |
| In some cases, checks are implemented differently, e.g.:
 | |
| <ul>
 | |
|     <li>In a conventional VM, backward branches and exceptions are
 | |
|     forbidden when a local variable holds an uninitialized reference.  The
 | |
|     restriction was changed to mark registers as invalid when they hold
 | |
|     references to the uninitialized result of a previous invocation of the
 | |
|     same <code>new-instance</code> instruction.
 | |
|     This solves the same problem -- trickery potentially allowing
 | |
|     uninitialized objects to slip past the verifier -- without unduly
 | |
|     limiting branches.
 | |
| </ul>
 | |
| 
 | |
| There are also some new checks, such as:
 | |
| <ul>
 | |
|     <li>The <code>move-exception</code> instruction can only appear as
 | |
|     the first instruction in an exception handler.
 | |
|     <li>The <code>move-result*</code> instructions can only appear
 | |
|     immediately after an appropriate <code>invoke-*</code>
 | |
|     or <code>filled-new-array</code> instruction.
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| The VM is permitted but not required to enforce "structured locking"
 | |
| constraints, which are designed to ensure that, when a method returns, all
 | |
| monitors locked by the method have been unlocked an equal number of times.
 | |
| This is not currently implemented.
 | |
| 
 | |
| <p>
 | |
| The Dalvik verifier is more restrictive than other VMs in one area:
 | |
| type safety on sub-32-bit integer widths.  These additional restrictions
 | |
| should make it impossible to, say, pass a value outside the range
 | |
| [-128, 127] to a function that takes a <code>byte</code> as an argument.
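<p>
As a rough source-level analogy (not a description of the verifier's
internals), the Java compiler already refuses to pass an <code>int</code>
where a <code>byte</code> is expected unless the caller narrows it explicitly.
The sketch below shows the effect of that narrowing; the class and method
names are hypothetical and exist only for illustration.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class ByteRangeDemo {
    // A method with a byte parameter: callers can only supply [-128, 127].
    static int takesByte(byte value) {
        return value; // widened back to int without changing the value
    }

    // The range that a byte-typed value is guaranteed to fall within.
    static boolean fitsInByte(int value) {
        return value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        int wide = 200;
        // takesByte(wide) does not compile; an explicit cast is required,
        // and the cast wraps the value into the byte range.
        int seen = takesByte((byte) wide);
        System.out.println(fitsInByte(wide)); // false
        System.out.println(fitsInByte(seen)); // true
        System.out.println(seen);             // -56
    }
}
```

<p>
The verifier's stricter typing aims to give the same guarantee for bytecode
that did not come from a well-behaved compiler.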
 | |
| 
 | |
| 
 | |
| <h2>Monitor Verification</h2>
 | |
| 
 | |
| <p>
 | |
| If a method locks an object with a <code>synchronized</code> statement, the
 | |
| object must be unlocked before the method returns.  At the bytecode level,
 | |
| this means the method must execute a matching <code>monitor-exit</code>
 | |
| for every <code>monitor-enter</code> instruction, whether the method
 | |
| completes normally or abnormally.  The bytecode verifier optionally
 | |
| enforces this.
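<p>
The source-level behavior being protected can be observed from Java code.
The following is a minimal sketch (class and method names are hypothetical)
that leaves a <code>synchronized</code> statement both normally and by
throwing, and then checks with <code>Thread.holdsLock</code> that the monitor
was released in each case.

```java
// Illustrative only: a synchronized statement releases its monitor on both
// normal and exceptional exit. Names here are hypothetical.
final class SynchronizedExitDemo {
    private static final Object LOCK = new Object();

    // Leaves the synchronized block normally or by throwing, as requested.
    static void lockedWork(boolean fail) {
        synchronized (LOCK) {            // compiles to monitor-enter
            if (fail) {
                throw new IllegalStateException("abnormal completion");
            }
        }                                // monitor-exit on the normal path
        // The compiler also emits a catch-all handler that executes
        // monitor-exit before rethrowing, covering the abnormal path.
    }

    // Returns true if the current thread no longer holds LOCK after the call.
    static boolean releasedAfter(boolean fail) {
        try {
            lockedWork(fail);
        } catch (IllegalStateException expected) {
            // fall through to the check below
        }
        return !Thread.holdsLock(LOCK);
    }

    public static void main(String[] args) {
        System.out.println("normal exit released lock:   " + releasedAfter(false));
        System.out.println("abnormal exit released lock: " + releasedAfter(true));
    }
}
```

<p>
Bytecode produced by other tools is not obliged to follow this pattern, which
is why a verifier check is useful.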
 | |
| 
 | |
| <p>
 | |
| The verifier uses a fairly simple-minded model.  If you enter a monitor
 | |
| held in register N, you can exit the monitor using register N or any
 | |
| subsequently-made copies of register N.  The verifier does not attempt
 | |
| to identify previously-made copies, track loads and stores through
 | |
| fields, or recognize identical constant values (for example, the result
 | |
| values from two <code>const-class</code> instructions on the same class
 | |
| will be the same reference, but the verifier doesn't recognize this).
 | |
| 
 | |
| <p>
 | |
| Further, you may only exit the monitor most recently entered.  "Hand
 | |
| over hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B",
 | |
| are not allowed.
 | |
| 
 | |
| <p>
 | |
| This means that there are a number of situations in which the verifier
 | |
| will throw an exception on code that would execute correctly at run time.
 | |
| This is not expected to be an issue for compiler-generated bytecode.
 | |
| 
 | |
| <p>
 | |
| For implementation convenience, the maximum nesting depth of
 | |
| <code>synchronized</code> statements has been set to 32.  This is not
 | |
| a limitation on the recursion count.  The only way to trip this would be
 | |
| to have a single method with more than 32 nested <code>synchronized</code>
 | |
| statements, something that is unlikely to occur.
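<p>
The rules in this section can be approximated with a small toy model. This is
a sketch under simplifying assumptions, not the Dalvik implementation: it
handles straight-line code only, and the <code>MonitorModel</code> class, its
<code>Op</code> type, and the per-register bit mask are all invented for
illustration.

```java
// Toy model of the monitor rules described above. Hypothetical names;
// straight-line code only (no branches or exception handlers).
final class MonitorModel {
    static final int MAX_DEPTH = 32;

    enum Kind { ENTER, EXIT, MOVE }

    static final class Op {
        final Kind kind;
        final int a; // register operand (destination for MOVE)
        final int b; // source register for MOVE, unused otherwise

        private Op(Kind kind, int a, int b) {
            this.kind = kind;
            this.a = a;
            this.b = b;
        }

        static Op enter(int reg) { return new Op(Kind.ENTER, reg, -1); }
        static Op exit(int reg) { return new Op(Kind.EXIT, reg, -1); }
        static Op move(int dst, int src) { return new Op(Kind.MOVE, dst, src); }
    }

    // Returns true if the sequence obeys the model and ends with no monitor held.
    static boolean verify(int registerCount, Op... ops) {
        // held[r] has bit d set if register r is known to refer to the
        // object locked at nesting depth d.
        int[] held = new int[registerCount];
        int depth = 0;
        for (Op op : ops) {
            switch (op.kind) {
                case ENTER:
                    if (depth == MAX_DEPTH) return false; // nesting limit
                    held[op.a] |= 1 << depth;
                    depth++;
                    break;
                case MOVE:
                    // A copy made after the enter inherits the lock bits;
                    // a copy made before it has nothing to inherit.
                    held[op.a] = held[op.b];
                    break;
                case EXIT:
                    if (depth == 0) return false;         // nothing to exit
                    depth--;
                    int bit = 1 << depth;
                    // Only the most recently entered monitor may be exited.
                    if ((held[op.a] & bit) == 0) return false;
                    for (int r = 0; r < held.length; r++) held[r] &= ~bit;
                    break;
            }
        }
        return depth == 0;
    }
}
```

<p>
In this model, <code>verify(2, Op.enter(0), Op.move(1, 0), Op.exit(1))</code>
is accepted because register 1 is a subsequently-made copy, while
<code>verify(2, Op.enter(0), Op.enter(1), Op.exit(0), Op.exit(1))</code> is
rejected as hand-over-hand locking.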
 | |
| 
 | |
| 
 | |
| <h2>Verification Failures</h2>
 | |
| 
 | |
| <p>
 | |
| The verifier may reject a class immediately, or it may defer throwing
 | |
| an exception until the code is actually used.  For example, if a class
 | |
| attempts to perform an illegal access on a field, the VM should throw
 | |
| an IllegalAccessError the first time the instruction is encountered.
 | |
| On the other hand, if a class contains an invalid bytecode, it should be
 | |
| rejected immediately with a VerifyError.
 | |
| 
 | |
| <p>
 | |
| Immediate VerifyErrors are accompanied by detailed, if somewhat cryptic,
 | |
| information in the log file.  From this it's possible to determine the
 | |
| exact instruction that failed, and the reason for the failure.
 | |
| 
 | |
| <p>
 | |
| It's a bit tricky to implement deferred verification errors in Dalvik.
 | |
| A few approaches were considered:
 | |
| 
 | |
| <ol>
 | |
| <li>We could replace the invalid field access instruction with a special
 | |
| instruction that generates an illegal access error, and allow class
 | |
| verification to complete successfully.  This type of verification must
 | |
| be deferred to first class load, rather than be performed ahead of time
 | |
| during DEX optimization, because some failures will depend on the current
 | |
| execution environment (e.g. not all classes are available at dexopt time).
 | |
| At that point the bytecode instructions are mapped read-only during
 | |
| verification, so rewriting them isn't possible.
 | |
| </li>
 | |
| 
 | |
| <li>We can perform the access checks when the field/method/class is
 | |
| resolved.  In a typical VM implementation we would do the check when the
 | |
| entry is resolved in the context of the current classfile, but our DEX
 | |
| files combine multiple classfiles together, merging the field/method/class
 | |
| resolution results into a single large table.  Once one class successfully
 | |
| resolves the field, every other class in the same DEX file would be able
 | |
| to access the field.  This is incorrect.
 | |
| </li>
 | |
| 
 | |
| <li>Perform the access checks on every field/method/class access.
 | |
| This adds significant overhead.  This is mitigated somewhat by the DEX
 | |
| optimizer, which will convert many field/method/class accesses into a
 | |
| simpler form after performing the access check.  However, not all accesses
 | |
| can be optimized (e.g. accesses to classes unknown at dexopt time),
 | |
| and we don't currently have an optimized form of certain instructions
 | |
| (notably static field operations).
 | |
| </li>
 | |
| </ol>
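<p>
The problem with the second approach can be made concrete with a small
sketch. It assumes a hypothetical per-DEX resolution table keyed by field
index and a boolean standing in for the real access rules; none of these
names come from the VM sources.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of approach 2's flaw. All names are hypothetical.
final class SharedResolutionSketch {
    // One table per DEX file, shared by every class in it.
    private final Map<Integer, String> resolvedFields = new HashMap<>();

    // Checks access only when the entry is first resolved.
    String resolve(int fieldIdx, String fieldName, boolean callerHasAccess) {
        String cached = resolvedFields.get(fieldIdx);
        if (cached != null) {
            return cached; // access check skipped for later callers
        }
        if (!callerHasAccess) {
            throw new IllegalAccessError(fieldName);
        }
        resolvedFields.put(fieldIdx, fieldName);
        return fieldName;
    }

    public static void main(String[] args) {
        SharedResolutionSketch dex = new SharedResolutionSketch();
        // A class with access resolves the field first...
        dex.resolve(7, "Foo.secret", true);
        // ...and a class without access now gets it too, which is incorrect.
        System.out.println(dex.resolve(7, "Foo.secret", false));
    }
}
```

<p>
If the class without access had resolved the field first, it would have
received the expected <code>IllegalAccessError</code>; the outcome depends on
resolution order, which is what makes this approach unsuitable.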
 | |
| 
 | |
| <p>
 | |
| In early versions of Dalvik (as found in Android 1.6 and earlier), the verifier
 | |
| simply regarded all problems as immediately fatal.  This generally worked,
 | |
| but in some cases the VM was rejecting classes because of bits of code
 | |
| that were never used.  The VerifyError itself was sometimes difficult to
 | |
| decipher, because it was thrown during verification rather than at the
 | |
| point where the problem was first noticed during execution.
 | |
| <p>
 | |
| The current version uses a variation of approach #1.  The dexopt
 | |
| command works the way it did before, leaving the code untouched and
 | |
| flagging fully-correct classes as "pre-verified".  When the VM loads a
 | |
| class that didn't pass pre-verification, the verifier is invoked.  If a
 | |
| "deferrable" problem is detected, a modifiable copy of the instructions
 | |
| in the problematic method is made.  In that copy, the troubled instruction
 | |
| is replaced with an "always throw" opcode, and verification continues.
 | |
| 
 | |
| <p>
 | |
| In the example used earlier, an attempt to read from an inaccessible
 | |
| field would result in the "field get" instruction being replaced by
 | |
| "always throw IllegalAccessError on field X".  Creating copies of method
 | |
| bodies requires additional heap space, but since this affects very few
 | |
| methods overall the memory impact should be minor.
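<p>
The copy-and-replace step can be sketched as follows. This is a simplified
model, not the VM's code: the opcode values, the <code>accessDenied</code>
array standing in for the real access check, and the <code>entry</code>
parameter standing in for a branch are all assumptions made for illustration.

```java
// Simplified model of deferred verification failures. Hypothetical names
// and opcode values; the input array plays the role of read-only bytecode.
final class DeferredFailureSketch {
    static final int OP_RETURN = 0;
    static final int OP_FIELD_GET = 1;
    static final int OP_ALWAYS_THROW = 2;

    // Returns the original array if nothing is wrong, otherwise a patched
    // copy. The input array is never modified.
    static int[] verify(int[] code, boolean[] accessDenied) {
        int[] patched = null;
        for (int pc = 0; pc < code.length; pc++) {
            if (code[pc] == OP_FIELD_GET && accessDenied[pc]) {
                if (patched == null) {
                    patched = code.clone(); // modifiable copy, made lazily
                }
                patched[pc] = OP_ALWAYS_THROW;
            }
        }
        return patched != null ? patched : code;
    }

    // Runs from the given entry point until a return or a throw.
    static void execute(int[] code, int entry) {
        for (int pc = entry; pc < code.length; pc++) {
            switch (code[pc]) {
                case OP_RETURN:
                    return;
                case OP_ALWAYS_THROW:
                    throw new IllegalAccessError("field access at pc " + pc);
                default:
                    break; // OP_FIELD_GET: nothing to model here
            }
        }
    }

    public static void main(String[] args) {
        int[] code = { OP_FIELD_GET, OP_RETURN, OP_FIELD_GET, OP_RETURN };
        boolean[] denied = { false, false, true, false };
        int[] patched = verify(code, denied);
        execute(patched, 0); // the bad instruction is never reached
        try {
            execute(patched, 2);
        } catch (IllegalAccessError e) {
            System.out.println("deferred failure: " + e.getMessage());
        }
    }
}
```

<p>
The class verifies successfully, and the error surfaces only if execution
actually reaches the replaced instruction.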
 | |
| 
 | |
| <p>
 | |
| <address>Copyright © 2008 The Android Open Source Project</address>
 | |
| 
 | |
| </body>
 | |
| </html>
 |