Simple Switching Scripts for the Deep Learning AMI on AWS China

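A defining feature of modern society is a sense of alienation
We have the most convenient means of communication
Yet the greatest distance between one another
Humans are social animals; no one wants to grow old and die alone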

1. Switch to the latest PyTorch 1.7.1+cu110

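# /usr/local/cuda is a symlink; plain rm removes only the link, not the toolkit
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-11.0 /usr/local/cuda
# Add the NodeSource repository for the current Node.js LTS release
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
# Only if apt reports that the dpkg lock is held but no apt/dpkg process is running
sudo rm /var/lib/dpkg/lock
# optional
sudo rm /var/lib/dpkg/lock-frontend
sudo dpkg --configure -a
sudo apt install -y nodejs
# Not needed: the NodeSource nodejs package already bundles npm
# sudo apt install npm
source activate pytorch_latest_p37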
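
Both symlink lines above only repoint /usr/local/cuda. If you would rather do the switch from a Python script, a minimal sketch, assuming the AMI keeps each toolkit in /usr/local/cuda-<version>, is to create the new link under a temporary name and then rename it over the old one with `os.replace`, so /usr/local/cuda never briefly disappears. The `switch_cuda` helper, its `root` parameter and the `cuda.tmp-switch` temporary name are hypothetical; `root` only exists so the function can be tried on a scratch directory, and writing under /usr/local still needs root.

```python
import os
from pathlib import Path


def switch_cuda(version: str, root: str = "/usr/local") -> str:
    """Point <root>/cuda at <root>/cuda-<version> and return the new link target.

    Hypothetical helper: assumes toolkits live in <root>/cuda-<version>,
    as on the Deep Learning AMI. Writing under /usr/local requires root.
    """
    base = Path(root)
    target = base / f"cuda-{version}"
    if not target.is_dir():
        raise FileNotFoundError(f"CUDA toolkit not found: {target}")

    link = base / "cuda"
    tmp = base / "cuda.tmp-switch"  # hypothetical temporary link name
    # Clean up a temporary link left behind by an interrupted run
    if tmp.is_symlink():
        tmp.unlink()
    os.symlink(target, tmp)
    # rename(2) swaps the link in one atomic step; it refuses (IsADirectoryError)
    # to replace a real directory, so nothing but a symlink is ever overwritten
    os.replace(tmp, link)
    return os.readlink(link)
```

Run as root, `switch_cuda("11.0")` has the same effect as the `rm` plus `ln -s` pair above, but fails instead of deleting anything if /usr/local/cuda turns out to be a real directory.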

2. Update JupyterLab, install deep learning dependencies, and enable remote access

pip install -U pip
# Use the Tsinghua TUNA PyPI mirror (faster from mainland China)
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install -U --no-cache-dir jupyterlab ipywidgets pandas openpyxl sweetviz rich pretty_errors
pip install -U --no-cache-dir 'ray[tune]' xgboost pytorch-lightning
# Optional: Simplified Chinese UI for JupyterLab
pip install -U --no-cache-dir jupyterlab-language-pack-zh-CN
# Writes ~/.jupyter/jupyter_lab_config.py with every option commented out
jupyter lab --generate-config
CONFIG=~/.jupyter/jupyter_lab_config.py
# Uncomment and set each option; "# *" also matches versions that write "# c.ServerApp..." with a space
sed -i 's/^# *c\.ServerApp\.allow_remote_access = False/c.ServerApp.allow_remote_access = True/' "$CONFIG"
sed -i "s/^# *c\.ServerApp\.ip = 'localhost'/c.ServerApp.ip = '*'/" "$CONFIG"
sed -i 's/^# *c\.ServerApp\.open_browser = False/c.ServerApp.open_browser = False/' "$CONFIG"
# Port 9999 must also be allowed in the instance's security group
sed -i 's/^# *c\.ServerApp\.port = 8888/c.ServerApp.port = 9999/' "$CONFIG"
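
The `sed` calls uncomment one setting each and silently do nothing if the generated line looks slightly different. A sketch of the same edit in Python, assuming the generated file comments out defaults as `#c.ServerApp.<name> = <value>` or `# c.ServerApp.<name> = <value>`; `set_server_options` and `update_config_file` are hypothetical helpers, and appending a missing option is a design choice of this sketch, not something the sed version does.

```python
import re
from pathlib import Path


def set_server_options(text: str, options: dict) -> str:
    """Uncomment/overwrite c.ServerApp.<name> lines in jupyter_lab_config.py text.

    Mirrors the sed commands above: a commented-out default such as
    '#c.ServerApp.port = 8888' or '# c.ServerApp.port = 8888' is replaced
    by 'c.ServerApp.port = 9999'. If no such line exists, one is appended.
    """
    for name, value in options.items():
        new_line = f"c.ServerApp.{name} = {value!r}"
        # [ \t]* rather than \s* so the pattern never spans two lines
        pattern = re.compile(
            rf"^#?[ \t]*c\.ServerApp\.{re.escape(name)}[ \t]*=.*$", re.MULTILINE
        )
        # A function replacement keeps backslashes in the value literal
        text, count = pattern.subn(lambda _m: new_line, text, count=1)
        if count == 0:
            text = text.rstrip("\n") + "\n" + new_line + "\n"
    return text


def update_config_file(path: Path, options: dict) -> None:
    """Rewrite the config file in place with the given ServerApp options."""
    path.write_text(set_server_options(path.read_text(), options))
```

On the AMI this would be called as `update_config_file(Path.home() / ".jupyter" / "jupyter_lab_config.py", {"allow_remote_access": True, "ip": "*", "open_browser": False, "port": 9999})`; because values go through `repr`, strings come out quoted and booleans and ints do not.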

3. Verify that GPU PyTorch and CUDA/cuDNN work

# Silent if PyTorch imports correctly
import torch

# Silent if OK
a = torch.tensor(1.)

# Expected: tensor(1., device='cuda:0')
print(a.cuda())

# Silent if OK
from torch.backends import cudnn

# Expected: True
print(cudnn.is_available())

# Expected: True
print(cudnn.is_acceptable(a.cuda()))
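
When the snippet above fails, it helps to know which layer broke, for example when a notebook kernel runs in the wrong conda environment. A minimal sketch that wraps the same checks in a function and reports each layer instead of stopping at the first exception; `check_gpu_stack` is a hypothetical name. `torch.version.cuda` is the CUDA version PyTorch was built against, which should read 11.0 for the cu110 build.

```python
import importlib.util


def check_gpu_stack() -> dict:
    """Report, layer by layer, whether PyTorch, CUDA and cuDNN are usable."""
    report = {"torch": False, "cuda_version": None, "cuda": False, "cudnn": False}
    # Return early (without raising) if PyTorch is not installed in this env
    if importlib.util.find_spec("torch") is None:
        return report

    import torch
    from torch.backends import cudnn

    report["torch"] = True
    # CUDA version PyTorch was built with, e.g. '11.0'; None for CPU-only builds
    report["cuda_version"] = torch.version.cuda
    report["cuda"] = torch.cuda.is_available()
    if report["cuda"]:
        x = torch.tensor(1.0).cuda()
        report["cudnn"] = bool(cudnn.is_available() and cudnn.is_acceptable(x))
    return report


if __name__ == "__main__":
    print(check_gpu_stack())
```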

Appendix 1: Switching CUDA versions

# Check the current CUDA version, then run only ONE of the switch blocks below
nvcc --version

# CUDA 11.0
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-11.0 /usr/local/cuda

# CUDA 10.2
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda

# CUDA 10.1
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda

# CUDA 10.0
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda
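
Before switching, it helps to see which toolkits are actually installed and which one the symlink and `nvcc` report. A minimal sketch, assuming the AMI's /usr/local/cuda-<version> directory names and the `release X.Y` phrase that `nvcc --version` prints; `installed_cuda_versions`, `current_cuda` and `nvcc_release` are hypothetical helpers.

```python
import os
import re
from pathlib import Path

# Matches directory names such as "cuda-11.0" or "cuda-10.2"
_CUDA_DIR = re.compile(r"cuda-(\d+(?:\.\d+)*)")


def installed_cuda_versions(root: str = "/usr/local") -> list:
    """Versions of the cuda-<version> directories under root, oldest first."""
    versions = []
    for entry in Path(root).iterdir():
        match = _CUDA_DIR.fullmatch(entry.name)
        if match and entry.is_dir():
            versions.append(match.group(1))
    # Compare numerically, so "10.2" sorts before "11.0"
    return sorted(versions, key=lambda v: tuple(int(n) for n in v.split(".")))


def current_cuda(root: str = "/usr/local"):
    """Version <root>/cuda currently links to, or None if it is not a symlink."""
    link = Path(root) / "cuda"
    if not link.is_symlink():
        return None
    match = _CUDA_DIR.fullmatch(Path(os.readlink(link)).name)
    return match.group(1) if match else None


def nvcc_release(nvcc_output: str):
    """Release number (e.g. "11.0") from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None
```

Feeding `nvcc_release` the captured output of `nvcc --version` should agree with `current_cuda()` as long as /usr/local/cuda/bin is the first CUDA directory on PATH.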

Appendix 2: Switching Conda environments

Please use one of the following commands to start the required environment with the framework of your choice:
for MXNet(+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN) ____________________________________ source activate mxnet_p36
for MXNet(+Keras2) with Python2 (CUDA 10.1 and Intel MKL-DNN) ____________________________________ source activate mxnet_p27
for MXNet(+Amazon Elastic Inference) with Python3 _______________________________________ source activate amazonei_mxnet_p36
for MXNet(+Amazon Elastic Inference) with Python2 _______________________________________ source activate amazonei_mxnet_p27
for MXNet(+AWS Neuron) with Python3 ___________________________________________________ source activate aws_neuron_mxnet_p36
for TensorFlow(+Keras2) with Python3 (CUDA 10.0 and Intel MKL-DNN) __________________________ source activate tensorflow_p36
for TensorFlow(+Keras2) with Python2 (CUDA 10.0 and Intel MKL-DNN) __________________________ source activate tensorflow_p27
for TensorFlow 2(+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN) _______________________ source activate tensorflow2_p36
for TensorFlow 2(+Keras2) with Python2 (CUDA 10.1 and Intel MKL-DNN) _______________________ source activate tensorflow2_p27
for TensorFlow 2.2 with Python3 (CUDA 10.2 and Intel MKL-DNN) _______________________ source activate tensorflow2_latest_p37
for Tensorflow(+Amazon Elastic Inference) with Python2 _____________________________ source activate amazonei_tensorflow_p27
for Tensorflow(+Amazon Elastic Inference) with Python3 _____________________________ source activate amazonei_tensorflow_p36
for Tensorflow 2(+Amazon Elastic Inference) with Python2 __________________________ source activate amazonei_tensorflow2_p27
for Tensorflow 2(+Amazon Elastic Inference) with Python3 __________________________ source activate amazonei_tensorflow2_p36
for Tensorflow(+AWS Neuron) with Python3 _________________________________________ source activate aws_neuron_tensorflow_p36
for PyTorch 1.4 with Python3 (CUDA 10.1 and Intel MKL) _________________________________________ source activate pytorch_p36
for PyTorch 1.4 with Python2 (CUDA 10.1 and Intel MKL) _________________________________________ source activate pytorch_p27
for PyTorch 1.6 with Python3 (CUDA 10.1 and Intel MKL) __________________________________ source activate pytorch_latest_p36
for PyTorch (+AWS Neuron) with Python3 ______________________________________________ source activate aws_neuron_pytorch_p36
for PyTorch with(+Amazon Elastic Inference) with Python3 _______________________________source activate amazonei_pytorch_p36
for Chainer with Python2 (CUDA 10.0 and Intel iDeep) ___________________________________________ source activate chainer_p27
for Chainer with Python3 (CUDA 10.0 and Intel iDeep) ___________________________________________ source activate chainer_p36
for base Python2 (CUDA 10.0) _______________________________________________________________________ source activate python2
for base Python3 (CUDA 10.0) _______________________________________________________________________ source activate python3
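
Every environment line in the banner has the same shape, `for <description> ____ source activate <env>`, so the environment names can be extracted and searched rather than scanned by eye. A small sketch; `parse_env_banner` and `find_envs` are hypothetical helpers, and the pattern is inferred only from the lines shown above (one of them has no space before `source`).

```python
import re

# "for <description> ____ source activate <env>"; the space before
# "source" is optional because one banner line omits it
_LINE_RE = re.compile(r"^for (?P<desc>.+?) _+ ?source activate (?P<env>\S+)\s*$")


def parse_env_banner(banner: str) -> dict:
    """Map each conda environment name in the banner to its description."""
    envs = {}
    for line in banner.splitlines():
        match = _LINE_RE.match(line.strip())
        if match:
            envs[match.group("env")] = match.group("desc")
    return envs


def find_envs(envs: dict, keyword: str) -> list:
    """Environment names whose description contains keyword (case-insensitive)."""
    return [name for name, desc in envs.items() if keyword.lower() in desc.lower()]
```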

Appendix 3: Removing Conda environments

# Remove conda environments you do not need to free disk space
conda env list
conda env remove --name <env_name>
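
With more than twenty preinstalled environments, it is easier to list what to keep and generate the removal commands for the rest. `conda env list --json` prints the environment paths as `{"envs": [...]}`; a sketch, assuming named environments live under an `envs` directory (so the base install is skipped), that builds but does not run the commands. `env_names` and `removal_commands` are hypothetical helpers.

```python
import json
from pathlib import PurePosixPath


def env_names(conda_json: str) -> list:
    """Names of the named environments in `conda env list --json` output."""
    names = []
    for path in json.loads(conda_json)["envs"]:
        p = PurePosixPath(path)
        # Named envs live in <prefix>/envs/<name>; the base install does not
        if p.parent.name == "envs":
            names.append(p.name)
    return names


def removal_commands(names, keep) -> list:
    """Build (but do not run) a remove command for every env not in keep."""
    return [f"conda env remove --name {n}" for n in names if n not in keep]
```

On the AMI you would pass it the captured output of `conda env list --json` and a keep set such as `{"pytorch_latest_p37"}`, then review the printed commands before running them.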
