Hello JOCL

Scala 2.9.0 RC1 came out the other day. IDE support keeps getting better too, which I'm very grateful for as someone who has always leaned on an IDE.


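This is also slightly old news, but JogAmp has released 2.0 RC builds of JOGL/JOCL/JOAL.
I had played with JOGL a little but had not looked at the others, so I'll take this opportunity to try them. First up: JOCL.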

What is JOCL?

As the tagline "Java binding for the OpenCL" says, it is a library for using OpenCL from Java.

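OpenCL (Open Computing Language) is a framework for parallel computing, programmed in the OpenCL C language, that uses heterogeneous computing resources such as multi-core CPUs, GPUs, Cell processors, and DSPs. Intended uses include high-performance computing servers and personal computer systems as well as mobile devices, and there is an OpenCL Embedded Profile with relaxed requirements for embedded systems.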

OpenCL - Wikipedia

Of course, it can be used from Scala as well.

Setup

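Either build it yourself by following JogAmp's "How to build JOCL", or grab the prebuilt binaries that are available for download.
Place the required files in your project, laid out roughly like this: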

  • path/to/project
    • lib
      • jogamp
        • gluegen-rt.jar
        • jocl.jar
    • lib_native
      • macosx_x86_64
        • libgluegen-rt.jnilib
        • libjocl.dylib
      • windows_amd64
        • gluegen-rt.dll
        • jocl.dll
      • windows_x86
        • gluegen-rt.dll
        • jocl.dll
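
The directory names under lib_native look like "OS name + `os.arch`", so the right one can be picked at startup. Below is a minimal sketch of that idea. The `NativeLibDir` object and its `dirName` helper are hypothetical names of my own, and the mapping assumes the usual JVM values of the `os.name` and `os.arch` system properties (`x86_64` on 64-bit Intel Macs, `amd64` and `x86` on Windows). It also assumes the native libraries are found through `java.library.path`.

```scala
object NativeLibDir {
  // Hypothetical helper: maps JVM system properties to the directory names
  // used in the layout above (e.g. "macosx_x86_64", "windows_amd64").
  def dirName(osName: String, osArch: String): String = {
    val os = osName.toLowerCase
    val prefix =
      if (os.startsWith("mac")) "macosx"
      else if (os.startsWith("windows")) "windows"
      else os.replaceAll("\\s+", "") // other platforms are not covered by the layout above
    prefix + "_" + osArch
  }

  def main(args: Array[String]): Unit = {
    val dir = dirName(System.getProperty("os.name"), System.getProperty("os.arch"))
    // Assumption: the JVM is then started with -Djava.library.path=lib_native/<dir>
    println("lib_native/" + dir)
  }
}
```

Running `main` only prints the directory that would apply to the current machine; it does not load anything.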

Hello JOCL

Now let's write it in Scala, based on the JOCL tutorial on the JogAmp wiki.
There are three arrays A, B, and C, and C(n) = A(n) + B(n) is computed in parallel.
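
To make the shape of the computation concrete, here is a minimal sketch in plain Scala, with no JOCL calls, of what each work item does. `VectorAddSketch`, `workItem`, and `run` are illustrative names of my own, not part of the tutorial or of JOCL, and the loop runs sequentially where a device would run the work items concurrently. The index guard is there on the assumption that the number of launched work items can be larger than the number of elements.

```scala
object VectorAddSketch {
  // One "work item": computes a single element, guarded against indices
  // beyond the real element count n (the launch size may be padded).
  def workItem(a: Array[Float], b: Array[Float], c: Array[Float], n: Int)(gid: Int): Unit =
    if (gid < n) c(gid) = a(gid) + b(gid)

  // Sequential stand-in for the device: visits every global id once.
  def run(a: Array[Float], b: Array[Float], globalWorkSize: Int): Array[Float] = {
    require(a.length == b.length, "input arrays must have the same length")
    val n = a.length
    val c = new Array[Float](n)
    var gid = 0
    while (gid < globalWorkSize) {
      workItem(a, b, c, n)(gid)
      gid += 1
    }
    c
  }
}
```

For example, `VectorAddSketch.run(Array(1f, 2f, 3f), Array(10f, 20f, 30f), 8)` launches 8 ids for 3 elements, and the guard makes the extra 5 do nothing.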

Results

Pasted below are the results for the JOCL version, a version that does the same work with Scala 2.9's Parallel Collections, and a plain sequential loop.
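
For reference, the two non-JOCL variants and the timing loop can be sketched roughly as follows. This is not the code that produced the logs. `AddBenchSketch` and its methods are names of my own, and the parallel variant uses a Java parallel stream as a stand-in for Scala's Parallel Collections (an assumption made so the sketch needs only the standard library on Scala 2.12 or later; with Parallel Collections one would go through `.par` instead). The element count 1444477 is taken from the logs; the random input data is made up.

```scala
import java.util.stream.IntStream

object AddBenchSketch {
  // Plain sequential loop: the "Linear" variant.
  def addLinear(a: Array[Float], b: Array[Float], c: Array[Float]): Unit = {
    var i = 0
    while (i < c.length) { c(i) = a(i) + b(i); i += 1 }
  }

  // Parallel variant. Assumption: a Java parallel stream stands in for
  // Scala's Parallel Collections; each index is written by exactly one task.
  def addParallel(a: Array[Float], b: Array[Float], c: Array[Float]): Unit =
    IntStream.range(0, c.length).parallel().forEach((i: Int) => { c(i) = a(i) + b(i) })

  // Runs `body` once and returns the elapsed time in microseconds.
  def timeMicros(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000
  }

  def main(args: Array[String]): Unit = {
    val n = 1444477 // element count taken from the logs
    val rnd = new scala.util.Random(12345)
    val a = Array.fill(n)(rnd.nextFloat() * 100)
    val b = Array.fill(n)(rnd.nextFloat() * 100)
    val c = new Array[Float](n)
    for (run <- 1 to 10)
      println("parallel " + run + ": " + timeMicros(addParallel(a, b, c)) + " micro sec")
    for (run <- 1 to 10)
      println("linear " + run + ": " + timeMicros(addLinear(a, b, c)) + " micro sec")
  }
}
```

Each run is printed separately because the first iteration tends to be much slower than the rest, which is consistent with JIT warm-up and one-time setup costs.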

Run 1
[info] created CLContext [id: 4493152704, platform: Apple, profile: FULL_PROFILE, devices: 1]
[info] using CLDevice [id: 16909312 name: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz type: CPU profile: FULL_PROFILE]
[info] used device memory: 16MiB
[info] localWorkSize: 1, globalWorkSize: 1444477
[info] a+b=c results snapshot: 
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took  1: 199454 micro sec
[info] computation took  2: 5878 micro sec
[info] computation took  3: 5936 micro sec
[info] computation took  4: 5919 micro sec
[info] computation took  5: 5926 micro sec
[info] computation took  6: 6737 micro sec
[info] computation took  7: 5873 micro sec
[info] computation took  8: 6457 micro sec
[info] computation took  9: 5999 micro sec
[info] computation took 10: 5835 micro sec

[info] Parallel - availableProcessors: 4
[info] a+b=c results snapshot: 
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 137413 micro sec
[info] computation took: 10175 micro sec
[info] computation took: 10248 micro sec
[info] computation took: 10013 micro sec
[info] computation took: 8839 micro sec
[info] computation took: 10014 micro sec
[info] computation took: 9133 micro sec
[info] computation took: 8972 micro sec
[info] computation took: 8906 micro sec
[info] computation took: 8920 micro sec

[info] Linear
[info] a+b=c results snapshot: 
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 78716 micro sec
[info] computation took: 22134 micro sec
[info] computation took: 14659 micro sec
[info] computation took: 14648 micro sec
[info] computation took: 14767 micro sec
[info] computation took: 14769 micro sec
[info] computation took: 14710 micro sec
[info] computation took: 14743 micro sec
[info] computation took: 14723 micro sec
[info] computation took: 14905 micro sec
Run 2
[info] created CLContext [id: 107566480, platform: NVIDIA CUDA, profile: FULL_PROFILE, devices: 1]
[info] using CLDevice [id: 107566400 name: GeForce GTX 470 type: GPU profile: FULL_PROFILE]
[info] used device memory: 16MiB
[info] localWorkSize: 256, globalWorkSize: 1444608
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444598 more
[info] computation took  1: 230009 micro sec
[info] computation took  2: 6853 micro sec
[info] computation took  3: 6605 micro sec
[info] computation took  4: 6697 micro sec
[info] computation took  5: 6668 micro sec
[info] computation took  6: 6846 micro sec
[info] computation took  7: 6490 micro sec
[info] computation took  8: 5926 micro sec
[info] computation took  9: 5903 micro sec
[info] computation took 10: 5843 micro sec

[info] Parallel - availableProcessors: 8
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 113953 micro sec
[info] computation took: 5854 micro sec
[info] computation took: 5863 micro sec
[info] computation took: 6205 micro sec
[info] computation took: 6049 micro sec
[info] computation took: 6591 micro sec
[info] computation took: 6518 micro sec
[info] computation took: 6511 micro sec
[info] computation took: 7761 micro sec
[info] computation took: 6342 micro sec

[info] Linear
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 60240 micro sec
[info] computation took: 18839 micro sec
[info] computation took: 19091 micro sec
[info] computation took: 19181 micro sec
[info] computation took: 18770 micro sec
[info] computation took: 18772 micro sec
[info] computation took: 18889 micro sec
[info] computation took: 18821 micro sec
[info] computation took: 19142 micro sec
[info] computation took: 20197 micro sec
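
One thing worth noting in the logs: with localWorkSize 1 the globalWorkSize equals the element count (1444477), while with localWorkSize 256 it becomes 1444608. That is consistent with rounding the element count up to the next multiple of the local work size, and it would also explain why the GPU snapshot reports "1444598 more" instead of "1444467 more". The sketch below reproduces the arithmetic. `WorkSize`, `roundUp`, and `deviceMiB` are names of my own, and `deviceMiB` assumes three float buffers (A, B, C) of 4 bytes per element, which is my guess at how "used device memory" is computed.

```scala
object WorkSize {
  // Smallest multiple of groupSize that is >= globalSize.
  def roundUp(groupSize: Int, globalSize: Int): Int = {
    require(groupSize > 0, "groupSize must be positive")
    val r = globalSize % groupSize
    if (r == 0) globalSize else globalSize + groupSize - r
  }

  // Assumption: `buffers` float buffers of `elements` entries each,
  // reported in whole MiB (integer division).
  def deviceMiB(elements: Int, buffers: Int = 3): Long =
    elements.toLong * 4L * buffers / (1024L * 1024L)
}
```

Here `WorkSize.roundUp(256, 1444477)` gives 1444608 and `WorkSize.roundUp(1, 1444477)` gives 1444477, matching both runs, and `WorkSize.deviceMiB(1444477)` gives 16, matching the "16MiB" line.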

Interop with JOGL

OpenGL and OpenCL can access each other's memory, and it looks like JOGL and JOCL can exchange data with each other in the same way.
I'll try that next.