________________________________________________________________________

                  PYBENCH - A Python Benchmark Suite
________________________________________________________________________

     Extendable suite of low-level benchmarks for measuring
          the performance of the Python implementation
                 (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, as other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with the option '--help' to get a listing of the available
options. Without options, pybench will simply execute the benchmark
and then print a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for benchmarking.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
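
For example, you might first save a reference run and later compare
new runs against it (the file name 'reference.pybench' is only an
example; any name works):

   python pybench.py -f reference.pybench
   python pybench.py -c reference.pybench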

If the differences are well below 10% for each test, then you have a
system that is good for benchmarking. If you get random differences
of more than 10% or significant differences between the values for
minimum and average time, then you likely have some background
processes running which cause the readings to become inconsistent.
Examples include: web browsers, email clients, RSS readers, music
players, backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

| """
 | |
| ------------------------------------------------------------------------
 | |
| PYBENCH - a benchmark test suite for Python interpreters/compilers.
 | |
| ------------------------------------------------------------------------
 | |
| 
 | |
| Synopsis:
 | |
|  pybench.py [option] files...
 | |
| 
 | |
| Options and default settings:
 | |
|   -n arg           number of rounds (10)
 | |
|   -f arg           save benchmark to file arg ()
 | |
|   -c arg           compare benchmark with the one in file arg ()
 | |
|   -s arg           show benchmark in file arg, then exit ()
 | |
|   -w arg           set warp factor to arg (10)
 | |
|   -t arg           run only tests with names matching arg ()
 | |
|   -C arg           set the number of calibration runs to arg (20)
 | |
|   -d               hide noise in comparisons (0)
 | |
|   -v               verbose output (not recommended) (0)
 | |
|   --with-gc        enable garbage collection (0)
 | |
|   --with-syscheck  use default sys check interval (0)
 | |
|   --timer arg      use given timer (time.time)
 | |
|   -h               show this help text
 | |
|   --help           show this help text
 | |
|   --debug          enable debugging
 | |
|   --copyright      show copyright
 | |
|   --examples       show examples of usage
 | |
| 
 | |
| Version:
 | |
|  2.0
 | |
| 
 | |
| The normal operation is to run the suite and display the
 | |
| results. Use -f to save them for later reuse or comparisons.
 | |
| 
 | |
| Available timers:
 | |
| 
 | |
|    time.time
 | |
|    time.clock
 | |
|    systimes.processtime
 | |
| 
 | |
| Examples:
 | |
| 
 | |
| python2.1 pybench.py -f p21.pybench
 | |
| python2.5 pybench.py -f p25.pybench
 | |
| python pybench.py -s p25.pybench -c p21.pybench
 | |
| """
 | |
| 
 | |
License
-------

See LICENSE file.


Sample output
-------------

| """
 | |
| -------------------------------------------------------------------------------
 | |
| PYBENCH 2.0
 | |
| -------------------------------------------------------------------------------
 | |
| * using Python 2.4.2
 | |
| * disabled garbage collection
 | |
| * system check interval set to maximum: 2147483647
 | |
| * using timer: time.time
 | |
| 
 | |
| Calibrating tests. Please wait...
 | |
| 
 | |
| Running 10 round(s) of the suite at warp factor 10:
 | |
| 
 | |
| * Round 1 done in 6.388 seconds.
 | |
| * Round 2 done in 6.485 seconds.
 | |
| * Round 3 done in 6.786 seconds.
 | |
| ...
 | |
| * Round 10 done in 6.546 seconds.
 | |
| 
 | |
| -------------------------------------------------------------------------------
 | |
| Benchmark: 2006-06-12 12:09:25
 | |
| -------------------------------------------------------------------------------
 | |
| 
 | |
|     Rounds: 10
 | |
|     Warp:   10
 | |
|     Timer:  time.time
 | |
| 
 | |
|     Machine Details:
 | |
|        Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
 | |
|        Processor:    x86_64
 | |
| 
 | |
|     Python:
 | |
|        Executable:   /usr/local/bin/python
 | |
|        Version:      2.4.2
 | |
|        Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
 | |
|        Bits:         64bit
 | |
|        Build:        Oct  1 2005 15:24:35 (#1)
 | |
|        Unicode:      UCS2
 | |
| 
 | |
| 
 | |
| Test                             minimum  average  operation  overhead
 | |
| -------------------------------------------------------------------------------
 | |
|           BuiltinFunctionCalls:    126ms    145ms    0.28us    0.274ms
 | |
|            BuiltinMethodLookup:    124ms    130ms    0.12us    0.316ms
 | |
|                  CompareFloats:    109ms    110ms    0.09us    0.361ms
 | |
|          CompareFloatsIntegers:    100ms    104ms    0.12us    0.271ms
 | |
|                CompareIntegers:    137ms    138ms    0.08us    0.542ms
 | |
|         CompareInternedStrings:    124ms    127ms    0.08us    1.367ms
 | |
|                   CompareLongs:    100ms    104ms    0.10us    0.316ms
 | |
|                 CompareStrings:    111ms    115ms    0.12us    0.929ms
 | |
|                 CompareUnicode:    108ms    128ms    0.17us    0.693ms
 | |
|                  ConcatStrings:    142ms    155ms    0.31us    0.562ms
 | |
|                  ConcatUnicode:    119ms    127ms    0.42us    0.384ms
 | |
|                CreateInstances:    123ms    128ms    1.14us    0.367ms
 | |
|             CreateNewInstances:    121ms    126ms    1.49us    0.335ms
 | |
|        CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
 | |
|        CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
 | |
|                   DictCreation:    108ms    109ms    0.27us    0.361ms
 | |
|              DictWithFloatKeys:    149ms    153ms    0.17us    0.678ms
 | |
|            DictWithIntegerKeys:    124ms    126ms    0.11us    0.915ms
 | |
|             DictWithStringKeys:    114ms    117ms    0.10us    0.905ms
 | |
|                       ForLoops:    110ms    111ms    4.46us    0.063ms
 | |
|                     IfThenElse:    118ms    119ms    0.09us    0.685ms
 | |
|                    ListSlicing:    116ms    120ms    8.59us    0.103ms
 | |
|                 NestedForLoops:    125ms    137ms    0.09us    0.019ms
 | |
|           NormalClassAttribute:    124ms    136ms    0.11us    0.457ms
 | |
|        NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
 | |
|            PythonFunctionCalls:    107ms    113ms    0.34us    0.271ms
 | |
|              PythonMethodCalls:    140ms    149ms    0.66us    0.141ms
 | |
|                      Recursion:    156ms    166ms    3.32us    0.452ms
 | |
|                   SecondImport:    112ms    118ms    1.18us    0.180ms
 | |
|            SecondPackageImport:    118ms    127ms    1.27us    0.180ms
 | |
|          SecondSubmoduleImport:    140ms    151ms    1.51us    0.180ms
 | |
|        SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
 | |
|         SimpleDictManipulation:    134ms    136ms    0.11us    0.452ms
 | |
|          SimpleFloatArithmetic:    110ms    113ms    0.09us    0.571ms
 | |
|       SimpleIntFloatArithmetic:    106ms    111ms    0.08us    0.548ms
 | |
|        SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
 | |
|         SimpleListManipulation:    103ms    113ms    0.10us    0.587ms
 | |
|           SimpleLongArithmetic:    112ms    118ms    0.18us    0.271ms
 | |
|                     SmallLists:    105ms    116ms    0.17us    0.366ms
 | |
|                    SmallTuples:    108ms    128ms    0.24us    0.406ms
 | |
|          SpecialClassAttribute:    119ms    136ms    0.11us    0.453ms
 | |
|       SpecialInstanceAttribute:    143ms    155ms    0.13us    0.454ms
 | |
|                 StringMappings:    115ms    121ms    0.48us    0.405ms
 | |
|               StringPredicates:    120ms    129ms    0.18us    2.064ms
 | |
|                  StringSlicing:    111ms    127ms    0.23us    0.781ms
 | |
|                      TryExcept:    125ms    126ms    0.06us    0.681ms
 | |
|                 TryRaiseExcept:    133ms    137ms    2.14us    0.361ms
 | |
|                   TupleSlicing:    117ms    120ms    0.46us    0.066ms
 | |
|                UnicodeMappings:    156ms    160ms    4.44us    0.429ms
 | |
|              UnicodePredicates:    117ms    121ms    0.22us    2.487ms
 | |
|              UnicodeProperties:    115ms    153ms    0.38us    2.070ms
 | |
|                 UnicodeSlicing:    126ms    129ms    0.26us    0.689ms
 | |
| -------------------------------------------------------------------------------
 | |
| Totals:                           6283ms   6673ms
 | |
| """
 | |
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations operations each,
and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # numbers will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds rounds executing
            self.operations operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to face
        # a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            set up and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all
the symbols defined in that module for subclasses of pybench.Test
and automatically add them to the benchmark suite.

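As a minimal sketch (the module name MyTests is hypothetical), if your
test classes live in a file MyTests.py next to the other pybench test
modules, an import along these lines in Setup.py is typically all that
is needed:

    # In Setup.py (the pybench.Setup module); MyTests is a hypothetical
    # module defining one or more pybench.Test subclasses. The star
    # import makes those subclasses visible to pybench's symbol scan.
    from MyTests import *

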
Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.

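A minimal sketch, reusing the IntegerCounting example from above (the
new version number is arbitrary; it only has to differ from the old
one so that older results are no longer compared):

    from pybench import Test

    class IntegerCounting(Test):

        # Bumped from 1.0 after changing the test's workload; runs
        # recorded with the old version will show up as "n/a" in
        # comparisons instead of being compared directly.
        version = 1.1

        operations = 20
        rounds = 100000

        # .test() and .calibrate() unchanged (omitted here)

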
Version History
---------------

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
        - made the timer a parameter
        - changed the platform default timer to use high-resolution
          timers rather than process timers (which have a much lower
          resolution)
        - added an option to select the timer
        - added a process time timer (using systimes.py)
        - changed to use min() as the timing estimator (the average
          is still taken as well to provide an idea of the difference)
        - garbage collection is turned off by default
        - sys check interval is set to the highest possible value
        - calibration is now a separate step and done using
          a different strategy that allows measuring the test
          overhead more accurately
        - modified the tests to each give a run-time of between
          100-200ms using warp 10
        - changed the default warp factor to 10 (from 20)
        - compared results with timeit.py and confirmed measurements
        - bumped all test versions to 2.0
        - updated platform.py to the latest version
        - changed the output format a bit to make it look nicer
        - refactored the APIs somewhat
  1.3+: Steve Holden added the NewInstances test and the filtering
        option during the NeedForSpeed sprint; this also triggered a
        long discussion on how to improve benchmark timing and finally
        resulted in the release of 2.0
  1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com