by Denis Robilliard, Virginie Marion-Poty, Cyril Fonlupt
Univ. Lille Nord de France,
ULCO, LIL,
50 rue F. Buisson,
62100 Calais, France.
Last updated on Friday May 8th 2009
(last update may have modified the code package)
The new version of the package is now available: click here to download. The accompanying paper is due to appear in the GPEM journal (you can download a draft).
The installation of the package is similar to the previous one (see below): simply uncompress the archive in the "app" directory of ECJ; this will create an "rpncuda" directory containing the program files, which you should compile. Do not forget to run make in this directory in order to build the GPU libraries.
This package implements an RPN representation for GP trees in ECJ. We express our gratitude to W. B. Langdon for kindly giving access to his RPN code, see his web page. Breeding programs in RPN form accelerates the transfer of the GP population between CPU and GPU.
Another improvement is that GP programs are now cached in the fast memory bank of the G80.
Both these optimizations make this version about three times faster on the sextic regression benchmark, with a measured speed of 2.8 billion GP operations per second for 1024 fitness cases.
The size of the population is no longer limited to 65535 individuals.
ERCs are implemented as an array of fixed constant values, either randomly drawn or specified at the beginning of the run, i.e. in a way somewhat similar to Banzhaf and Brameier's Linear GP. At most 128 constants are allowed, so that the RPN opcodes can still be stored in bytes.
The package works only with graphics cards in the NVidia G80 family (e.g. 8800 GTX, 8600 GT, ...).
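The constant table can be pictured with the following minimal Java sketch. It is not the package's code: the class name, the method names and the byte layout (high bit set marks a constant, low 7 bits give its index) are assumptions chosen only to show why a 128-entry table fits in byte-sized opcodes.

```java
final class ErcTableSketch {
    // Limit taken from the text: at most 128 constants.
    static final int MAX_CONSTANTS = 128;

    private ErcTableSketch() {}

    // Hypothetical encoding: a byte with its high bit set refers to a constant,
    // and its low 7 bits are the index into the constant table.
    static byte constantOpcode(int index) {
        if (index < 0 || index >= MAX_CONSTANTS) {
            throw new IllegalArgumentException("constant index out of range: " + index);
        }
        return (byte) (0x80 | index);
    }

    static boolean isConstant(byte opcode) {
        return (opcode & 0x80) != 0;
    }

    static int constantIndex(byte opcode) {
        return opcode & 0x7F;
    }

    // Draw the table once at the start of the run; the values are fixed afterwards.
    static float[] randomTable(int size, long seed, float lo, float hi) {
        if (size < 1 || size > MAX_CONSTANTS) {
            throw new IllegalArgumentException("table size out of range: " + size);
        }
        java.util.Random rng = new java.util.Random(seed);
        float[] table = new float[size];
        for (int i = 0; i < size; i++) {
            table[i] = lo + rng.nextFloat() * (hi - lo);
        }
        return table;
    }
}
```

Under this assumed layout, a program refers to a constant by its index only, so mutation and crossover never create new constant values during the run.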
The code is written using single precision (float) type.
Several new parameters are needed. They are described in the commented parameter files included in the package. Most parameters are loaded in the problem class: this is untidy and should be corrected in the future.
The package does not support multi-threading for the ECJ/Java part.
Only one subpopulation is allowed.
The package has been tested with Nvidia graphics driver 169.09. Older graphics driver versions may/will halt with a core dump, or hang, or can even hang the system if you run your experiments on the same graphics card that runs the system desktop. Newer drivers have not been tested.
The package has been tested with Nvidia CUDA release 1.1, V0.2.1221. It has not been run with newer versions of the toolkit due to compatibility problems with gcc (that may have been corrected by now).
If experiments are run on the same graphics card that runs the system desktop, calculations that last for more than 5 seconds will be killed by the desktop watchdog, i.e. the evaluation of one generation must last less than 5 seconds.
Memory requirements are increased, as we need to store a contiguous copy of the whole population before transferring it to the GPU memory. Thus you may need to increase the Java Virtual Machine available memory (e.g. -Xms1000M -Xmx1000M command line parameters for the Sun JVM). We also recommend using -XX:+AggressiveHeap -XX:NewSize=100M to speed up breeding of large populations.
This code is distributed on an "as is" basis, without any warranty of suitability for any problem. In particular, the code is not well documented.
The code has been tested and has successfully passed several thousand GP runs, notably with populations ranging from 100 to 200,000 and fitness cases ranging from 20 to 100,000; thus you should probably be able to run your own experiments. The authors intend to improve the code and documentation, as far as our administrative authorities leave us some time available for research and study (i.e. don't expect too much).
Comments (if possible constructive ;-) or questions can be sent to: Denis.Robilliard at lisic.univ-littoral.fr
by Denis Robilliard, Virginie Marion-Poty, Cyril Fonlupt - Université du Littoral-Côte d'Opale, Calais, France.
This work was supported by European InterregIIIA, 182b project.
This web page presents a package named GPURegression that implements a population parallel scheme for Genetic Programming where the evaluation of individuals is performed on an NVidia G80 graphics processing unit. This parallel scheme was presented at the 11th European Conference on Genetic Programming (Euro'GP 2008, Naples, Italy), and the paper can be downloaded here.
The GPURegression package is based on Sean Luke's ECJ library. It lets GP runs benefit from the high computing power of modern parallel graphics hardware based on the Nvidia G80 graphics processor family (e.g. GeForce 8x00 graphics card series). GP individuals are evaluated in parallel on the graphics card using an interpreter written in the Cuda language, while breeding and selection are done on the CPU in the standard ECJ framework.
On an 8800 GTX card, speedups of up to 40 times over ECJ running on an Intel Core 2 6600 @ 2.40 GHz can be observed for regression problems, which amounts to up to 770 million GP operations per second.
1. Download and install the ECJ library, version 18.
2. Download and install the CUDA Driver, Toolkit and SDK, as explained on the NVidia website (make sure that you get a recent driver).
3. Download the GPURegression package and decompress it in the ECJ application directory; this will create a directory named gpuregression.
4. Add this new application directory to the main ECJ Makefile in the ECJ root directory (typically add the lines ec/app/gpuregression/*.java\ and ec/app/gpuregression/func/*.java\ under the DIRS = \ line in the Makefile), and execute the make command, as the Java classes must be generated before step 7.
5. If you have more than one G80 card, edit the regression.cu file and change the parameter of the cudaSetDevice() call to match the number of the graphics card that you want to target.
6. Verify the path to the Cuda SDK in the Makefile in the gpuregression directory, and adapt the "javah" invocation to suit your configuration (the default provided is the Sun SDK "javah" syntax).
7. Run make in the gpuregression directory to compile the ".cu" files associated with the tutorial problem into a library.
8. Go to the ECJ root directory and run the tutorial problem via the .params file, specifying the library path for both the Cuda runtime and the problem dependent library, e.g. with Sun SDK java:
java -Djava.library.path=/usr/local/cuda/lib/:ec/app/gpuregression -cp ./ ec.Evolve -file ec/app/gpuregression/cudaregression.params
The interpreter is a stack-based postfix interpreter (postfix notation is also known as Reverse Polish Notation or RPN): operands are read first and pushed on the stack, then the operator is read and interpreted, popping its arguments and pushing the result on the stack.
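The stack discipline described above can be illustrated with a minimal sketch, written here in Java on the CPU rather than in Cuda. The opcode names and values are hypothetical and are not those of the package; the program is assumed to be well formed.

```java
final class RpnInterpreterSketch {
    // Hypothetical opcodes, chosen for this sketch only.
    static final byte OP_X = 0;    // push the input variable x
    static final byte OP_ONE = 1;  // push the constant 1
    static final byte OP_ADD = 2;
    static final byte OP_SUB = 3;
    static final byte OP_MUL = 4;

    private RpnInterpreterSketch() {}

    // Evaluate one postfix program for one fitness case x.
    static float eval(byte[] program, float x) {
        // The stack can never hold more values than there are opcodes.
        float[] stack = new float[program.length];
        int sp = 0;
        for (byte op : program) {
            switch (op) {
                case OP_X:
                    stack[sp++] = x;
                    break;
                case OP_ONE:
                    stack[sp++] = 1.0f;
                    break;
                case OP_ADD: {
                    float b = stack[--sp]; // right operand is popped first
                    float a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case OP_SUB: {
                    float b = stack[--sp];
                    float a = stack[--sp];
                    stack[sp++] = a - b;
                    break;
                }
                case OP_MUL: {
                    float b = stack[--sp];
                    float a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown opcode " + op);
            }
        }
        if (sp != 1) {
            throw new IllegalStateException("malformed program");
        }
        return stack[0];
    }
}
```

For example, the program {OP_X, OP_ONE, OP_ADD, OP_X, OP_MUL} encodes (x + 1) * x and evaluates to 6 for x = 2.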
Breeding/evolutionary operators are performed as usual in ECJ, then evaluation is managed by class CudaEvaluator. First the GP individuals are parsed, translated into RPN and copied into a contiguous chunk of memory. Then the evalPopChunkGPU method of the problem class is called (e.g. CudaRegression.java). This method transfers the population, fitness cases and related data to the host cuda code (e.g. regression.cu) using the Java Native Interface (JNI). The host cuda code performs the actual transfer to the GPU and calls the interpreter (e.g. regressionKernel.cu).
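The translation and packing step can be sketched as follows. This is an illustration under our own assumptions, not the code of CudaEvaluator: the Node class, the method names and the offsets array are hypothetical. Each tree is written in postfix order (children first, then the node itself) and all programs are laid end to end in one byte array.

```java
final class RpnPackingSketch {
    // Hypothetical stand-in for a GP tree node: an opcode plus its children.
    static final class Node {
        final byte opcode;
        final Node[] children;

        Node(byte opcode, Node... children) {
            this.opcode = opcode;
            this.children = children;
        }
    }

    private RpnPackingSketch() {}

    // Number of nodes in the tree, i.e. the length of its postfix program.
    static int size(Node node) {
        int count = 1;
        for (Node child : node.children) {
            count += size(child);
        }
        return count;
    }

    // Write the subtree in postfix order starting at pos; return the next free position.
    static int emitPostfix(Node node, byte[] out, int pos) {
        for (Node child : node.children) {
            pos = emitPostfix(child, out, pos);
        }
        out[pos] = node.opcode;
        return pos + 1;
    }

    // Pack the whole population into one contiguous chunk.
    // offsets must have length population.length + 1; program i occupies
    // chunk[offsets[i]] up to (but not including) chunk[offsets[i + 1]].
    static byte[] pack(Node[] population, int[] offsets) {
        int total = 0;
        for (int i = 0; i < population.length; i++) {
            offsets[i] = total;
            total += size(population[i]);
        }
        offsets[population.length] = total;

        byte[] chunk = new byte[total];
        int pos = 0;
        for (Node root : population) {
            pos = emitPostfix(root, chunk, pos);
        }
        return chunk;
    }
}
```

A single contiguous array of this kind can be handed to native code in one transfer, instead of one transfer per individual.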
If you define your own functions (in the func subdirectory), the classes must define a public float postFixed() method that returns the opcode associated with the function. The opcode value should be defined in GPUCommonDefs.java and the opcode must of course be processed by the cuda interpreter in file regressionKernel.cu.
The package works only with graphics cards in the NVidia G80 family (e.g. 8800 GTX, 8600 GT, ...).
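The three pieces involved in adding a function can be sketched as below. Only the postFixed() signature comes from the text; the class names, the opcode name and its value are hypothetical, the opcode table is a stand-in for GPUCommonDefs.java, and the dispatch method is a Java stand-in for the case that would have to be added to the Cuda interpreter. A real function class would also extend the appropriate ECJ node class, which is omitted here so that the sketch compiles on its own.

```java
// Stand-in for the opcode definitions; names and values are hypothetical.
final class OpcodeDefsSketch {
    static final float OP_SQUARE = 5f;

    private OpcodeDefsSketch() {}
}

// Hypothetical user-defined function that squares its single argument.
final class SquareSketch {
    // Returns the opcode that the interpreter will see for this function.
    public float postFixed() {
        return OpcodeDefsSketch.OP_SQUARE;
    }
}

// Java stand-in for the interpreter side: every new opcode needs a matching case.
final class KernelDispatchSketch {
    private KernelDispatchSketch() {}

    // Apply a unary opcode to the value currently on top of the stack.
    static float applyUnary(float opcode, float top) {
        if (opcode == OpcodeDefsSketch.OP_SQUARE) {
            return top * top;
        }
        throw new IllegalArgumentException("unknown opcode " + opcode);
    }
}
```

If the opcode returned by postFixed() has no matching case on the interpreter side, the program cannot be evaluated, so both sides have to be kept in step by hand.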
The code is written using single precision (float) type.
The size of the population is limited to 65535 individuals.
The size of the whole population array must be specified in the parameter file (parameter eval.problem.progssize) and thus must be roughly evaluated before the run.
The package does not support multi-threading for the ECJ/Java part.
Only one subpopulation is allowed.
Old graphics driver versions may/will halt with a core dump, or hang, or can even hang the system if you run your experiments on the same graphics card that runs the system desktop. Recent drivers are much more stable.
If experiments are run on the same graphics card that runs the system desktop, calculations that last for more than 5 seconds will be killed by the desktop watchdog. However, unless the fitness function is very costly, this is unlikely to happen because each generation is computed as an independent GPU call, and thus usually lasts less than 5 seconds.
Memory requirements are increased, as we need to store a contiguous copy of the whole population before transferring it to the GPU memory. Thus you may need to increase the Java Virtual Machine available memory (e.g. -Xms1000M -Xmx1000M command line parameters for the Sun JVM).
This code is distributed on an "as is" basis, without any warranty of suitability for any problem. In particular, the code is not well documented.
Disclaimer: all registered brand and product names mentioned on this web page are owned by their respective proprietors.