Transcript
Page 1: Java gpu computing

Java GPU Computing

Maarten Steur & Arjan Lamers

● OpenCL overview
● Simple example
● Case study
● Tips & tricks
● Questions

Why GPU Computing

Abbreviations

● CPU, GPU, APU
● Khronos: OpenCL, OpenGL
● Nvidia: CUDA
● JogAmp JOCL, JavaCL, JOCL

GPU compared with CPU

● Many simple cores
● Lots of high-bandwidth memory

              Intel Core i7    GeForce GT 650M
Cores         8                384
Gflops        180              650

Programming model

● Define a stream (flow)

● Run in parallel
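
As a CPU-side analogy for this model, the sketch below uses plain Java streams: the index range plays the role of the stream, and a made-up `square` function plays the role of the kernel that is applied to each element independently. It runs on CPU threads, not on a GPU, and all class and method names are illustrative only.

```java
import java.util.stream.IntStream;

final class DataParallelSketch {
    // Hypothetical "kernel": computes one output element from one input element.
    static float square(float x) {
        return x * x;
    }

    // Applies the kernel to every index. The iterations are independent of
    // each other, so the runtime is free to execute them in parallel.
    static float[] mapInParallel(float[] in) {
        float[] out = new float[in.length];
        IntStream.range(0, in.length)
                 .parallel()
                 .forEach(i -> out[i] = square(in[i]));
        return out;
    }

    public static void main(String[] args) {
        float[] result = mapInParallel(new float[] {1f, 2f, 3f});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Each iteration writes only to its own slot in `out`, which is the same discipline an OpenCL kernel follows when it indexes by its global id.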

Usage

● Algorithm:
– High concurrency
– Partitionable
● But:
– Extra latency from loading data onto and off the GPU
– Extra complexity
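
The latency trade-off can be made concrete with a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not a measurement: the class, the method names and all timings are made up, and it simply compares CPU time against upload + kernel + download time.

```java
final class OffloadEstimate {
    // Estimated GPU wall time in milliseconds: copy in, compute, copy out.
    static double gpuTotalMs(double uploadMs, double kernelMs, double downloadMs) {
        return uploadMs + kernelMs + downloadMs;
    }

    // Offloading only pays off when the total GPU time beats the CPU time.
    static boolean worthOffloading(double cpuMs, double uploadMs,
                                   double kernelMs, double downloadMs) {
        return gpuTotalMs(uploadMs, kernelMs, downloadMs) < cpuMs;
    }

    public static void main(String[] args) {
        // Made-up numbers: a small job where the copies dominate.
        System.out.println(worthOffloading(2.0, 1.5, 0.2, 1.5));      // false
        // Made-up numbers: a large job where the kernel speed-up dominates.
        System.out.println(worthOffloading(400.0, 15.0, 40.0, 15.0)); // true
    }
}
```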

Components

Example (MacBook Pro)

Platform name: Apple
Platform profile: FULL_PROFILE
Platform spec version: OpenCL 1.2
Platform vendor: Apple

Device 16925696 HD Graphics 4000
Driver: 1.2(Aug 17 2014 20:29:07)
Max work group size: 512
Global mem size: 1073741824
Local mem size: 65536
Max clock freq: 1200
Max compute units: 16

Device 16918272 GeForce GT 650M
Driver: 8.26.28 310.40.55b01
Max work group size: 1024
Global mem size: 1073741824
Local mem size: 49152
Max clock freq: 900
Max compute units: 2

Device 4294967295 Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Driver: 1.1
Max work group size: 1024
Global mem size: 17179869184
Local mem size: 32768
Max clock freq: 2600
Max compute units: 8

Work & Memory

Application / Kernel

● Write .cl files in a C dialect
● Kernels are the 'public' functions
● Java bytecode
– Aparapi (OpenCL)
– RootBeer (CUDA)

Disclaimer

Parallel sort

kernel void sort(global const float* in, global float* out, int size) {
    int i = get_global_id(0); // current thread
    float id = in[i];
    int pos = 0;
    for (int j = 0; j < size; j++) {
        float jd = in[j];
        // in[j] < in[i] ? Equal values are ordered by index,
        // so every element gets a unique position.
        bool smaller = (jd < id) || (jd == id && j < i);
        pos += (smaller) ? 1 : 0;
    }
    out[pos] = id;
}
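
For checking the kernel's output, a plain Java version of the same rank-based sort can serve as a CPU reference. This is a minimal sketch: the class name is made up, the outer loop stands in for the work-items, and it assumes the input contains no NaN values.

```java
final class RankSortReference {
    // Each element counts how many elements sort before it and then
    // writes itself to that position in the output.
    static float[] sort(float[] in) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) {   // one iteration per work-item
            float id = in[i];
            int pos = 0;
            for (int j = 0; j < in.length; j++) {
                float jd = in[j];
                // Ties are broken by index so equal values land in distinct slots.
                boolean smaller = (jd < id) || (jd == id && j < i);
                if (smaller) {
                    pos++;
                }
            }
            out[pos] = id;
        }
        return out;
    }
}
```

The algorithm does O(n²) comparisons in total, but every outer iteration is independent of the others, which is what makes it easy to spread over many GPU cores.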

Java GPU Computing

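// Context over all devices, used to pick the GPU with the most flops.
CLContext globalContext = CLContext.create();
CLDevice device = globalContext.getMaxFlopsDevice(Type.GPU);

// Context restricted to the selected device, plus a command queue.
CLContext context = CLContext.create(device);
CLCommandQueue queue = device.createCommandQueue();

// Load the OpenCL source from the classpath and build it.
CLProgram program = context.createProgram(
        First8GpuComputing.class.getResourceAsStream("MyTask.cl")).build();

You can also build for specific devices: build(device)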

// Device buffers for the input and the output.
CLBuffer<FloatBuffer> inBuffer = context.createFloatBuffer(input.length, READ_ONLY);
CLBuffer<FloatBuffer> outBuffer = context.createFloatBuffer(input.length, WRITE_ONLY);
mapToBuffer(inBuffer.getBuffer(), workLoad);

// Look up the kernel and bind its arguments.
CLKernel kernel = program.createCLKernel("MyTask");
kernel.putArgs(inBuffer, outBuffer).putArg(workLoad.length);

// Non-blocking write, run the kernel, blocking read of the result.
queue.putWriteBuffer(inBuffer, false)
     .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
     .putReadBuffer(outBuffer, true);

FloatBuffer output = outBuffer.getBuffer();
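
The slides do not show how globalWorkSize and localWorkSize are chosen. In OpenCL 1.x the global work size has to be a multiple of the local work size when one is given, so a common approach is to round the element count up. The helper below is a sketch under that assumption; its name and signature are made up for illustration.

```java
final class WorkSizes {
    // Rounds elementCount up to the next multiple of localWorkSize.
    static int roundUp(int localWorkSize, int elementCount) {
        int remainder = elementCount % localWorkSize;
        return remainder == 0
                ? elementCount
                : elementCount + localWorkSize - remainder;
    }

    public static void main(String[] args) {
        int localWorkSize = 256;
        int globalWorkSize = roundUp(localWorkSize, 1000);
        System.out.println(globalWorkSize); // 1024
    }
}
```

With a rounded-up global size some work-items have an id beyond the last element, so the kernel needs a guard such as `if (id < size)` before it touches its buffers.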

Case study

● Calculation tool supporting the Programmatische Aanpak Stikstof (PAS), the Dutch programme for tackling nitrogen deposition.

● http://www.aerius.nl

Tips & tricks

● Managing CL sources
– getResourceAsStream()?
– Java constants → #define
– Locale? Oops!
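
One way to read the last two points: when Java constants are turned into #define lines that are prepended to the kernel source, number formatting must not depend on the default locale, or a Dutch or German JVM will emit a decimal comma that the OpenCL compiler rejects. The sketch below assumes that setup; the class, the constant and the method names are hypothetical.

```java
import java.util.Locale;

final class KernelDefines {
    // Hypothetical Java-side constant that the kernel source should share.
    static final double MAX_DISTANCE = 2.5;

    // Formats a "#define NAME value" line. Locale.ROOT keeps '.' as the
    // decimal separator regardless of the JVM's default locale.
    static String define(String name, double value) {
        return String.format(Locale.ROOT, "#define %s %f\n", name, value);
    }

    // Prepends the generated defines to the source text read from the .cl file.
    static String withDefines(String kernelSource) {
        return define("MAX_DISTANCE", MAX_DISTANCE) + kernelSource;
    }

    public static void main(String[] args) {
        System.out.print(withDefines("kernel void myTask() { }\n"));
        // The pitfall: a locale with a decimal comma.
        System.out.println(String.format(Locale.GERMANY, "%f", MAX_DISTANCE));
    }
}
```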

● Unit testing
– Separate test kernels
– Test cases in batches

kernel void testDifficultCalculation(const int testCount,
                                     global const double* distance,
                                     global double* results) {
    const int testId = get_global_id(0);
    // Skip work-items beyond the number of test cases.
    if (testId < testCount) {
        results[testId] = difficultCalculation(distance[testId]);
    }
}
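
On the Java side a batch of test inputs can then be checked in one pass against a reference implementation. The sketch below leaves out the kernel invocation: `results` stands in for the array read back from the device, and the reference formula is invented because the slides do not show what difficultCalculation computes.

```java
final class BatchAssert {
    // Hypothetical Java reference for the function under test in the kernel.
    static double difficultCalculationReference(double distance) {
        return Math.sqrt(distance) / (1.0 + distance);
    }

    // Returns the index of the first result that deviates from the reference
    // by more than the tolerance, or -1 if the whole batch matches.
    static int firstMismatch(double[] distances, double[] results, double tolerance) {
        for (int testId = 0; testId < distances.length; testId++) {
            double expected = difficultCalculationReference(distances[testId]);
            if (Math.abs(expected - results[testId]) > tolerance) {
                return testId;
            }
        }
        return -1;
    }
}
```

Comparing with a tolerance rather than for exact equality leaves room for small rounding differences between devices.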

Direct memory management

● -XX:MaxDirectMemorySize=??M
● ByteBuffer.allocateDirect(int capacity)
– Max 2GB per buffer
● Garbage collection comes too late
– Triggered by heap collection
– Release manually
– ((sun.nio.ch.DirectBuffer) myBuffer).cleaner().clean();
● VisualVM plugin for direct buffers
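
A minimal sketch of allocating such a direct buffer with only the public NIO API; the class and method names are made up. Because the capacity is an int number of bytes, the helper checks for overflow before allocating.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class DirectBuffers {
    // Allocates an off-heap FloatBuffer with room for floatCount floats.
    static FloatBuffer allocateFloats(int floatCount) {
        // The byte capacity must fit in an int; this throws ArithmeticException if not.
        int bytes = Math.multiplyExact(floatCount, Float.BYTES);
        return ByteBuffer.allocateDirect(bytes)
                         .order(ByteOrder.nativeOrder()) // byte order of the host platform
                         .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer buffer = allocateFloats(1024);
        System.out.println(buffer.isDirect() + " " + buffer.capacity());
    }
}
```

The memory behind the buffer lives outside the Java heap and counts against -XX:MaxDirectMemorySize, not against -Xmx.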

GPU vs CPU

● GPUs check less than CPUs
– Division by zero
– Out-of-bounds checks
– Test on the CPU first

Portability

● OpenCL is portable, its performance is not
– Memory sizes differ
– Memory latencies differ
– Work group sizes differ
– Compute devices differ
– OpenCL implementations differ
● So develop for the production hardware

Finally

● Float vs double
– Double the precision
– Half the performance
– Double support is optional
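
The precision side of this trade-off is easy to see on the CPU. The sketch below (class and method names are illustrative) adds up the same value a million times in both types and shows the point at which a float can no longer represent consecutive integers.

```java
final class PrecisionDemo {
    // Sums the same value n times in single precision.
    static float sumFloat(float value, int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += value;
        }
        return sum;
    }

    // Sums the same value n times in double precision.
    static double sumDouble(double value, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A float has a 24-bit significand, so 2^24 + 1 is not representable.
        System.out.println(16_777_216f + 1f == 16_777_216f); // true
        System.out.println(16_777_216d + 1d == 16_777_216d); // false
        // Accumulated rounding error is far larger in single precision.
        System.out.println(sumFloat(0.1f, 1_000_000));
        System.out.println(sumDouble(0.1, 1_000_000));
    }
}
```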

Conclusion

● When to use it?
– When the performance is really needed
– When the problem has high concurrency
– When the problem is partitionable

Questions?

Setting up OpenCL test on Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Warming up OpenCL test
[thread 32003 also had an error]
[thread 33027 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV[thread 32515 also had an error] (0xb)[thread 32771 also had an error][thread 32259 also had an error] at pc=0x00000001250ded70, pid=99851, tid=29475
#
# JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# [thread 17415 also had an error]C [cl_kernels+0x1d70] sort_wrapper+0x1b0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/arjanl/Documents/opencl/workspace/opencl-test/jogamp/hs_err_pid99851.log
[thread 31763 also had an error]
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#