CUDA †

    $ sudo yum-config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
    $ sudo yum clean all
    $ sudo yum -y install nvidia-driver-latest-dkms cuda
    $ sudo yum -y install cuda-drivers

The installed version can be checked with:

    $ cat /usr/local/cuda/version.txt
    CUDA Version 11.0.228

cuDNN †

Follow https://qiita.com/tatsuya11bbs/items/70205b070c7afd7dd651. Downloading requires user registration with NVIDIA.
https://developer.nvidia.com/rdp/cudnn-download

As of February 2020, the tarball had to be used.

    $ cp cudnn-10.2-linux-x64-v7.6.5.32.tgz /usr/local/
    $ cd /usr/local/
    $ tar zxvf cudnn-10.2-linux-x64-v7.6.5.32.tgz

As of August 2020, RPM files are available. https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-linux says to run:

    $ rpm -ivh libcudnn8-8.0.2.39-1.cuda11.0.x86_64.rpm
    $ rpm -ivh libcudnn8-devel-8.0.2.39-1.cuda11.0.x86_64.rpm
    $ rpm -ivh libcudnn8-doc-8.0.2.39-1.cuda11.0.x86_64.rpm

Pyenv †

    $ git clone https://github.com/yyuu/pyenv.git ~/.pyenv

Add the following three lines to ~/.bashrc:

    export PYENV_ROOT=$HOME/.pyenv
    export PATH=$PYENV_ROOT/bin:$PATH
    eval "$(pyenv init -)"

Anaconda †

    $ pyenv install --list
    $ pyenv install anaconda3-2019.10
    $ pyenv versions
    * system (set by /home/oda/.pyenv/version)
      anaconda3-2019.10
    $ pyenv global anaconda3-2019.10
    $ pyenv versions
      system
    * anaconda3-2019.10 (set by /home/oda/.pyenv/version)

Keras †

    $ pip install --upgrade pip
    $ pip install keras

Updating in August 2020 brought Keras to 2.4.3.

    $ pip install -U keras
    Successfully installed keras-2.4.3

TensorFlow †

    $ pip install --upgrade pip
    $ pip install tensorflow
    $ pip install tensorflow-gpu
    $ python
    Python 3.7.4 (default, Aug 13 2019, 20:35:49)
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    2020-01-30 12:32:29.462468: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
    2020-01-30 12:32:29.462551: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
    2020-01-30 12:32:29.462562: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
    >>>

TensorRT †

To provide libnvinfer.so.6, I tried installing TensorRT.
https://developer.nvidia.com/tensorrt

    cp TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz /usr/local/
    cd /usr/local/
    tar zxvf TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz
    cd /usr/local/TensorRT-7.0.0.11/lib

libnvinfer.so.7 is there but libnvinfer.so.6 is not, so I tried creating symbolic links:

    ln -s libnvinfer.so libnvinfer.so.6
    ln -s libnvinfer_plugin.so libnvinfer_plugin.so.6

I then added the following line to ~/.bashrc:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64

and tried things such as changing glibc from version 2.17 to 2.27, but got stuck.

TensorFlow2.0.1 †

The libnvinfer.so.6 problem in TensorFlow 2.1.0 had been reported 10 days earlier at https://github.com/tensorflow/tensorflow/issues/35968, so I gave up on fixing it and downgraded instead.

    $ pip install tensorflow==2.0.1
    $ pip install tensorflow-gpu==2.0.1
    $ python
    Python 3.7.4 (default, Aug 13 2019, 20:35:49)
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>>

TensorFlow installed without problems. However, checking the devices as described at https://thr3a.hatenablog.com/entry/20180113/1515820265 gives the following:

    >>> from tensorflow.python.client import device_lib
    >>> device_lib.list_local_devices()
    2020-01-30 14:21:30.013092: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-01-30 14:21:30.022186: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
    2020-01-30 14:21:30.022941: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b1c0f6b170 executing computations on platform Host. Devices:
    2020-01-30 14:21:30.022982: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
    2020-01-30 14:21:30.024665: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2020-01-30 14:21:30.025440: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
    2020-01-30 14:21:30.025468: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: localhost.localdomain
    2020-01-30 14:21:30.025478: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: localhost.localdomain
    2020-01-30 14:21:30.025514: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 440.33.1
    2020-01-30 14:21:30.025543: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.44.0
    2020-01-30 14:21:30.025553: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 440.44.0 does not match DSO version 440.33.1 -- cannot find working devices in this configuration
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 15191092083718892008
    , name: "/device:XLA_CPU:0"
    device_type: "XLA_CPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 9175125806567128648
    physical_device_desc: "device: XLA_CPU device"
    ]

The GPU is not recognized: the log reports a mismatch between the kernel driver version (440.44.0) and the libcuda version (440.33.1). After a reboot, however, it was recognized.

    $ python
    Python 3.7.4 (default, Aug 13 2019, 20:35:49)
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tensorflow.python.client import device_lib
    >>> device_lib.list_local_devices()
    2020-01-30 15:02:51.584089: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-01-30 15:02:51.594858: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
    2020-01-30 15:02:51.595512: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564f260a01a0 executing computations on platform Host. Devices:
    2020-01-30 15:02:51.595559: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
    2020-01-30 15:02:51.597305: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2020-01-30 15:02:51.664145: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-01-30 15:02:51.664450: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564f260c30d0 executing computations on platform CUDA. Devices:
    2020-01-30 15:02:51.664470: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2060 SUPER, Compute Capability 7.5
    2020-01-30 15:02:51.664603: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-01-30 15:02:51.664822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
    name: GeForce RTX 2060 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.83
    pciBusID: 0000:01:00.0
    2020-01-30 15:02:51.664906: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64
    2020-01-30 15:02:51.664958: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64
    2020-01-30 15:02:51.665010: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64
    2020-01-30 15:02:51.665059: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64
    2020-01-30 15:02:51.665108: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64
    2020-01-30 15:02:51.665158: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64
    2020-01-30 15:02:51.667833: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-01-30 15:02:51.667848: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...
    2020-01-30 15:02:51.667868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-01-30 15:02:51.667878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
    2020-01-30 15:02:51.667885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 6508581553371486294
    , name: "/device:XLA_CPU:0"
    device_type: "XLA_CPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 1987290959829944833
    physical_device_desc: "device: XLA_CPU device"
    , name: "/device:XLA_GPU:0"
    device_type: "XLA_GPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 7094331008891992157
    physical_device_desc: "device: XLA_GPU device"
    ]
    >>>

Looking closely, however, there are warnings: the libraries cannot be found, so the GPU cannot be used ("Skipping registering GPU devices...").

    2020-01-30 16:06:10.616708: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64:/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64

CUDA 10.2 is installed, but TensorFlow appears to require the CUDA 10.0 libraries. I looked up the available versions with

    yum --showduplicates search cuda

and installed the 10.0 packages:

    yum install cuda-cublas-10-0-10.0.130-1.x86_64
    yum install cuda-cufft-10-0-10.0.130-1.x86_64
    yum install cuda-curand-10-0-10.0.130-1.x86_64
    yum install cuda-cusolver-10-0-10.0.130-1.x86_64
    yum install cuda-cusparse-10-0-10.0.130-1.x86_64
    yum install cuda-cudart-10-0-10.0.130-1.x86_64

Then I changed the line in ~/.bashrc to

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64:/usr/local/cuda-10.0/lib64/

With that, the warnings disappeared. While a tensorflow job is running, nvidia-smi shows:

    $ nvidia-smi
    Thu Jan 30 17:53:44 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce RTX 206...  Off  | 00000000:01:00.0 Off |                  N/A |
    |  0%   52C    P2    44W / 215W |   7692MiB /  7982MiB |      4%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0     18243      C   ...v/versions/anaconda3-2019.10/bin/python  7681MiB |
    +-----------------------------------------------------------------------------+

so the GPU appears to be in use.

Installing TensorFlow 2.3 (August 2020) †

    $ pip install -U tensorflow==2.3.0
    $ pip install -U tensorflow-gpu==2.3.0
    $ python
    >>> import tensorflow as tf
    2020-08-19 15:46:14.339401: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
    2020-08-19 15:46:14.339448: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

CUDA 11.0 is installed, but TensorFlow appears to require CUDA 10.1. I looked up the available versions with

    yum --showduplicates search cuda

and installed the 10.1 runtime:

    $ su
    yum install cuda-cudart-10-1-10.1.243-1.x86_64

After changing the line in ~/.bashrc to

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/TensorRT-7.0.0.11/lib:/usr/local/cuda/lib64:/usr/local/cuda-10.1/lib64/

the warning disappeared.

Checking the version †

    $ python -c 'import tensorflow as tf; print(tf.__version__)'
    2020-08-19 16:01:35.100518: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
    2.3.0

Test jobs †
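Before running longer jobs, it may help to confirm in a few lines that TensorFlow can see the GPU at all. The following is a minimal sketch, assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available). The helper name `visible_gpus` is made up for this note and is not part of TensorFlow.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf  # imported lazily so the script still runs without it
    except ImportError:
        return None
    # The list is empty when no GPU could be registered, for example because
    # CUDA libraries failed to load during import.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPUs visible to TensorFlow:", ", ".join(gpus))
    else:
        print("No GPU visible; check the dso_loader warnings printed on import")
```

An empty list here corresponds to the "Skipping registering GPU devices..." situation seen in the TensorFlow 2.0.1 log.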
TensorFlow test jobs †

Test jobs based on the book at https://www.amazon.co.jp/gp/product/4839969515/.
https://github.com/yusugomori/deeplearning-keras-tf2-torch

Problems †

Trying to use an RNN failed with an error related to cuDNN.
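The note does not record the actual error message, so the cause is unknown. One setting that is often tried for cuDNN initialization failures is to stop TensorFlow from reserving nearly all GPU memory at startup (the nvidia-smi output earlier shows 7681MiB of 7982MiB held by the python process). The sketch below assumes TensorFlow 2.x and assumes that memory pressure is the cause, which this note does not confirm. `enable_memory_growth` is a hypothetical helper name.

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    Returns the number of GPUs configured, or None if TensorFlow is not installed.
    Must be called before any GPU has been initialized in the process.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Raises RuntimeError if the GPU was already initialized.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured for memory growth:", enable_memory_growth())
```

If the RNN still fails after this, the other thing worth checking is whether the installed libcudnn version matches the one the TensorFlow build expects.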
Things I wanted to do †

Autoencoder †

Restricted Boltzmann machine †

How to load/save a Model and its Parameters in Keras †

https://qiita.com/supersaiakujin/items/b9c9da9497c2163d5a74

Colaboratory †

https://colab.research.google.com/notebooks/welcome.ipynb?hl=ja
Using this might be easier.

Sample programs †
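As one sample, the library-version mismatches described in the TensorFlow sections (CUDA 10.2 installed while TensorFlow 2.0.1 looked for `*.so.10.0`) can be spotted by scanning the import log for `Could not load dynamic library` warnings. The sketch below is illustrative only. The function names are invented here, and the suggested yum package prefix simply mirrors the `cuda-cublas-10-0` naming used in this note, which may not hold for other CUDA releases.

```python
import re

# Matches the dso_loader warning text quoted in this note.
MISSING_LIB = re.compile(r"Could not load dynamic library '(lib\w+)\.so\.([\d.]+)'")

# Only the CUDA toolkit libraries that appear in this note; anything else
# (e.g. libnvinfer from TensorRT) gets no package suggestion.
TOOLKIT_LIBS = {
    "libcudart", "libcublas", "libcufft",
    "libcurand", "libcusolver", "libcusparse",
}


def missing_libraries(log_text):
    """Return sorted, de-duplicated (library, version) pairs found in a log."""
    return sorted(set(MISSING_LIB.findall(log_text)))


def guess_yum_prefix(library, version):
    """Guess a package prefix such as 'cuda-cublas-10-0' (naming assumed from this note)."""
    if library not in TOOLKIT_LIBS:
        return None
    return "cuda-{}-{}".format(library[len("lib"):], version.replace(".", "-"))


if __name__ == "__main__":
    sample = (
        "Could not load dynamic library 'libcudart.so.10.0'; dlerror: ...\n"
        "Could not load dynamic library 'libcublas.so.10.0'; dlerror: ...\n"
    )
    for lib, ver in missing_libraries(sample):
        print(lib, ver, guess_yum_prefix(lib, ver))
```

The guessed prefix is only a starting point for `yum --showduplicates search cuda`; the exact package name and build number still have to be read from the yum output.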