Hello JOCL
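Scala 2.9.0 RC1 came out the other day. The IDEs keep getting better too, which I'm grateful for as someone who has always leaned on an IDE.
On another note, and also a little while back, JogAmp released the 2.0 RCs of JOGL/JOCL/JOAL.
I had played with JOGL a bit but hadn't looked at the others, so I'll take this opportunity to try them. First up: JOCL.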
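What is JOCL?
As "Java binding for the OpenCL" suggests, it is a library for using OpenCL from Java.
OpenCL (Open Computing Language) is a framework for parallel computing, programmed in the OpenCL C language, that uses heterogeneous computing resources such as multi-core CPUs, GPUs, Cell processors, and DSPs. Its intended uses include high-performance computing servers and personal computer systems as well as mobile devices, and there is an OpenCL Embedded Profile with relaxed requirements for embedded systems.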
OpenCL - Wikipedia
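Of course, it can be used from Scala too.
Setup
Either build it yourself by following JogAmp's "How to build JOCL", or grab the prebuilt binaries that JogAmp provides.
Then place the required files in your project.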
Hello JOCL
Now let's write it in Scala, based on the JOCL tutorial on the JogAmp wiki.
Given arrays A, B, and C, the program computes C(n) = A(n) + B(n) for every n in parallel.
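To make the computation concrete, here is a minimal CPU-only sketch of what each parallel "work item" does. It is not the JOCL code itself and does not call OpenCL. The names `VectorAddSketch`, `workItem`, and `run` are made up for illustration, and the bounds guard assumes the work-item range may be padded past the real element count, as kernels in this style usually allow.

```scala
object VectorAddSketch {
  // Body of one "work item": what a kernel invocation would do for index gid.
  // The guard skips padding indices beyond the real element count n.
  def workItem(a: Array[Float], b: Array[Float], c: Array[Float], n: Int, gid: Int): Unit =
    if (gid < n) c(gid) = a(gid) + b(gid)

  // Runs every work item one after another on the CPU.
  // An OpenCL device would instead schedule them in parallel.
  def run(a: Array[Float], b: Array[Float], globalSize: Int): Array[Float] = {
    val n = a.length
    require(b.length == n, "a and b must have the same length")
    require(globalSize >= n, "globalSize must cover every element")
    val c = new Array[Float](n)
    var gid = 0
    while (gid < globalSize) {
      workItem(a, b, c, n, gid)
      gid += 1
    }
    c
  }
}
```

For example, `VectorAddSketch.run(Array(1f, 2f, 3f), Array(10f, 20f, 30f), 4)` visits four indices but writes only the three real elements.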
Results
Below are the results for the JOCL version above, the same computation done with Scala 2.9's Parallel Collections, and a plain sequential loop.
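The two non-OpenCL variants and the timing can be sketched roughly as follows. This is not the code that produced the logs. It targets current Scala, where parallel collections live in a separate module, so it swaps in a `java.util.stream` parallel stream as a stand-in for Scala 2.9's Parallel Collections. `AddBenchmarkSketch` and its method names are hypothetical, and the random input data is an assumption.

```scala
import java.util.stream.IntStream

object AddBenchmarkSketch {
  // Baseline: a single thread walks the arrays front to back.
  def addLinear(a: Array[Float], b: Array[Float], c: Array[Float]): Unit = {
    var i = 0
    while (i < a.length) { c(i) = a(i) + b(i); i += 1 }
  }

  // Stand-in for a parallel collection: the index range is split
  // across the JVM's common fork/join pool.
  def addParallel(a: Array[Float], b: Array[Float], c: Array[Float]): Unit =
    IntStream.range(0, a.length).parallel().forEach((i: Int) => c(i) = a(i) + b(i))

  // Runs `body` once and returns the elapsed wall-clock time in microseconds.
  def timeMicros(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000L
  }

  def main(args: Array[String]): Unit = {
    val n = 1444477 // element count as reported in the logs
    val rnd = new scala.util.Random(12345) // assumed input: random floats
    val a = Array.fill(n)(rnd.nextFloat() * 100)
    val b = Array.fill(n)(rnd.nextFloat() * 100)
    val c = new Array[Float](n)
    for (run <- 1 to 10) println(s"linear   $run: ${timeMicros(addLinear(a, b, c))} micro sec")
    for (run <- 1 to 10) println(s"parallel $run: ${timeMicros(addParallel(a, b, c))} micro sec")
  }
}
```

Repeating each measurement ten times, as the logs do, helps separate the first run from the later ones, since the first run in each group is much slower than the rest.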
Result 1
[info] created CLContext [id: 4493152704, platform: Apple, profile: FULL_PROFILE, devices: 1]
[info] using CLDevice [id: 16909312 name: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz type: CPU profile: FULL_PROFILE]
[info] used device memory: 16MiB
[info] localWorkSize: 1, globalWorkSize: 1444477
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took 1: 199454 micro sec
[info] computation took 2: 5878 micro sec
[info] computation took 3: 5936 micro sec
[info] computation took 4: 5919 micro sec
[info] computation took 5: 5926 micro sec
[info] computation took 6: 6737 micro sec
[info] computation took 7: 5873 micro sec
[info] computation took 8: 6457 micro sec
[info] computation took 9: 5999 micro sec
[info] computation took 10: 5835 micro sec
[info] Parallel - availableProcessors: 4
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 137413 micro sec
[info] computation took: 10175 micro sec
[info] computation took: 10248 micro sec
[info] computation took: 10013 micro sec
[info] computation took: 8839 micro sec
[info] computation took: 10014 micro sec
[info] computation took: 9133 micro sec
[info] computation took: 8972 micro sec
[info] computation took: 8906 micro sec
[info] computation took: 8920 micro sec
[info] Linear
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 78716 micro sec
[info] computation took: 22134 micro sec
[info] computation took: 14659 micro sec
[info] computation took: 14648 micro sec
[info] computation took: 14767 micro sec
[info] computation took: 14769 micro sec
[info] computation took: 14710 micro sec
[info] computation took: 14743 micro sec
[info] computation took: 14723 micro sec
[info] computation took: 14905 micro sec
Result 2
[info] created CLContext [id: 107566480, platform: NVIDIA CUDA, profile: FULL_PROFILE, devices: 1]
[info] using CLDevice [id: 107566400 name: GeForce GTX 470 type: GPU profile: FULL_PROFILE]
[info] used device memory: 16MiB
[info] localWorkSize: 256, globalWorkSize: 1444608
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444598 more
[info] computation took 1: 230009 micro sec
[info] computation took 2: 6853 micro sec
[info] computation took 3: 6605 micro sec
[info] computation took 4: 6697 micro sec
[info] computation took 5: 6668 micro sec
[info] computation took 6: 6846 micro sec
[info] computation took 7: 6490 micro sec
[info] computation took 8: 5926 micro sec
[info] computation took 9: 5903 micro sec
[info] computation took 10: 5843 micro sec
[info] Parallel - availableProcessors: 8
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 113953 micro sec
[info] computation took: 5854 micro sec
[info] computation took: 5863 micro sec
[info] computation took: 6205 micro sec
[info] computation took: 6049 micro sec
[info] computation took: 6591 micro sec
[info] computation took: 6518 micro sec
[info] computation took: 6511 micro sec
[info] computation took: 7761 micro sec
[info] computation took: 6342 micro sec
[info] Linear
[info] a+b=c results snapshot:
[info] 116.87144, 96.54628, 165.53079, 148.23895, 147.75995, 30.946493, 95.10199, 100.24969, 92.733475, 55.860107, ...; 1444467 more
[info] computation took: 60240 micro sec
[info] computation took: 18839 micro sec
[info] computation took: 19091 micro sec
[info] computation took: 19181 micro sec
[info] computation took: 18770 micro sec
[info] computation took: 18772 micro sec
[info] computation took: 18889 micro sec
[info] computation took: 18821 micro sec
[info] computation took: 19142 micro sec
[info] computation took: 20197 micro sec
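The localWorkSize and globalWorkSize values in the two logs are consistent with rounding the element count (1444477) up to the next multiple of the local work size. The sketch below shows that arithmetic. It assumes the program follows the tutorial's approach of padding the global size this way. `WorkSizeSketch` is a hypothetical name, and `roundUp` is written here from scratch rather than taken from the post's code.

```scala
object WorkSizeSketch {
  // Smallest multiple of groupSize that is >= globalSize.
  def roundUp(groupSize: Int, globalSize: Int): Int = {
    require(groupSize > 0, "groupSize must be positive")
    val r = globalSize % groupSize
    if (r == 0) globalSize else globalSize + groupSize - r
  }

  def main(args: Array[String]): Unit = {
    val elementCount = 1444477
    println(roundUp(1, elementCount))   // CPU log: localWorkSize 1   -> 1444477
    println(roundUp(256, elementCount)) // GPU log: localWorkSize 256 -> 1444608
  }
}
```

This padding would also explain why the GPU run's snapshot reports "1444598 more" (1444608 minus the 10 values shown), while the other runs report "1444467 more".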
Interop with JOGL
OpenGL and OpenCL can access each other's memory, and it appears that JOGL and JOCL can exchange data with each other as well.
I'll try that next.