Chinese Colab Cookbook [transl.]

Colaboratory (Colab for short) lets you write and execute Python code in the browser with no configuration, free access to GPUs, and easy sharing.

Most people first came to Colab to grab Google's freebies. When Colab launched around 2018, its loudest selling point was free GPUs/TPUs: free computing power, just waiting for you. In the end, though, it was a game of building user habits. Sure enough, at the beginning of 2020 Google introduced the paid Colab Pro membership: better hardware allocation (among V100/T4/K80, Pro users get priority for the V100, XD), longer continuous run times (free users' code is stopped automatically after 12 hours), and so on, as a differentiated plan. Besides, if you run large-scale workloads on Colab, keeping the datasets behind them on Drive or GCS is an additional cost. So while cultivating potential paying users, Google gets several benefits at once!

“Difficulty of migration” Programmers who already know Jupyter will find Colab's interface familiar, so the cost of migrating is close to zero. Colab's interpreter can be switched between Py2/Py3, and between TF1.x/2.x, with a single line of code, which makes for a very comfortable user experience.

“Common scenarios” Take my own setup as an example: at my usual workplace I use a GPU-equipped host; in the cloud I have TPUs and GPUs provided through TFRC; part of my data sits on a remote NAS and part in GCS, with rarely used data in Coldline storage. So whenever I am away from my workstation and need to collaborate, demo something, or try an unfamiliar code snippet, I use Colab as a shared Jupyter in the cloud.

Colab cannot display Chinese labels correctly in Matplotlib plots (garbled Chinese characters)

When Matplotlib on Colab draws Chinese text, the characters come out garbled because no Chinese font can be found. You need to install a Chinese font on Colab manually.

  1. First, check which fonts are installed on the virtual machine:

Running the !fc-list :lang=zh command returns an empty result, which means the Ubuntu system on the Colab virtual machine has no Chinese fonts installed.

  2. Find, download, and install the font you need:

Font files have the extension .otf or .ttf. This example uses SimHei.

Ubuntu's font directory is /usr/share/fonts/truetype.

In Colab, a line starting with ! is executed as a system (shell) command.

# Download the font archive, unzip it, delete the archive, and move the font file
!wget "https://www.wfonts.com/download/data/2014/06/01/simhei/simhei.zip"
!unzip "simhei.zip"
!rm "simhei.zip"
!mv SimHei.ttf /usr/share/fonts/truetype/

  3. Test that Matplotlib now outputs and displays Chinese characters correctly:
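import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
%matplotlib inline

# Load the installed font file directly by path
path = '/usr/share/fonts/truetype/SimHei.ttf'
fontprop = fm.FontProperties(fname=path, size=13)

# Placeholder data for illustration; replace with your own X_test, y_test, test_predict
X_test = np.linspace(0, 10, 50)
y_test = np.sin(X_test)
test_predict = np.sin(X_test) + 0.1

figure = plt.figure(figsize=(8, 4), dpi=80)
ax = figure.add_axes((0.1, 0.1, 0.8, 0.8))
# '*-' draws star markers joined by a line; the color comes from the color argument
ax.plot(X_test, y_test, '*-', color=(1, 0, 0, 1), linewidth=2.0, label='真实值')  # "ground truth"
ax.plot(X_test, test_predict, '*-', color=(0, 0, 1, 1), linewidth=2.0, label='预测值')  # "predicted value"

# Pass the font explicitly to every text element that contains Chinese
ax.set_title("模型结果", fontproperties=fontprop)  # "model results"
ax.set_xlabel("X", fontproperties=fontprop)
ax.set_ylabel("Y", fontproperties=fontprop)
ax.legend(loc="lower left", prop=fontprop)
plt.show()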
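Passing fontproperties to every call gets repetitive. The following is a minimal sketch, assuming Matplotlib 3.2 or newer (for font_manager.fontManager.addfont) and the SimHei path used above, that registers the font once and makes it the default sans-serif font, so titles, labels, and legends pick it up automatically. The helper name use_chinese_font is hypothetical.

```python
import os

import matplotlib.pyplot as plt
from matplotlib import font_manager as fm


def use_chinese_font(path="/usr/share/fonts/truetype/SimHei.ttf"):
    """Register a font file with Matplotlib and make it the default sans-serif font.

    Returns the font's family name, or None if the file does not exist.
    (Hypothetical helper for illustration.)
    """
    if not os.path.exists(path):
        return None
    fm.fontManager.addfont(path)  # requires Matplotlib >= 3.2
    name = fm.FontProperties(fname=path).get_name()  # family name stored in the file
    # Put this font first in the sans-serif fallback list, without duplicates
    others = [f for f in plt.rcParams["font.sans-serif"] if f != name]
    plt.rcParams["font.family"] = "sans-serif"
    plt.rcParams["font.sans-serif"] = [name] + others
    # Draw minus signs as ASCII "-" in case the font lacks the Unicode minus glyph
    plt.rcParams["axes.unicode_minus"] = False
    return name


# Usage in a Colab cell, after the font has been installed:
# use_chinese_font()
# plt.plot([1, 2, 3], [3, 1, 2])
# plt.title("模型结果")  # "model results"; no fontproperties argument needed
# plt.show()
```

The trade-off is global state: every later plot in the notebook uses this font, which is usually what you want for a Chinese-language notebook.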

Using forms to pass variable input to Colab

Colab's interactive forms are a simple, clear, and handy feature. In the example below, the values chosen in the form are read as strings (str), and the actual variables are obtained by converting those strings. For example:

#@title Visualization of bivariate correlation confusion matrix
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
from sklearn.preprocessing import normalize
import matplotlib.font_manager as fm


# Dropdown fields: the selected option arrives as a str
sheet_no = '2' #@param ["0", "1", "2"]
sheet_no = int(sheet_no)              # "2" -> 2
norm_choice = '0' #@param ["0", "1"]
norm_choice = bool(int(norm_choice))  # bool("0") would be True, so convert to int first
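To illustrate the string-to-value conversion with other field types, here is a minimal sketch of a form cell. The field names (learning_rate, epochs, layer_sizes, use_norm) and the helper parse_int_list are hypothetical; the {type:...} annotations follow Colab's form syntax. Fields declared as number or integer are written into the code as numeric literals, so only the free-text and dropdown fields need converting. Outside Colab the #@param comments are ordinary comments, so the cell also runs as plain Python.

```python
#@title Training options (hypothetical example form)
learning_rate = 0.001  #@param {type:"number"}
epochs = 10  #@param {type:"integer"}
layer_sizes = "64, 32, 16"  #@param {type:"string"}
use_norm = "1"  #@param ["0", "1"]


def parse_int_list(text):
    """Turn a comma-separated form string such as '64, 32, 16' into [64, 32, 16]."""
    # int() tolerates surrounding spaces; empty pieces (e.g. a trailing comma) are skipped
    return [int(part) for part in text.split(",") if part.strip()]


layer_sizes = parse_int_list(layer_sizes)
use_norm = bool(int(use_norm))  # "1" -> True, "0" -> False
print(learning_rate, epochs, layer_sizes, use_norm)  # prints: 0.001 10 [64, 32, 16] True
```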

Linking Colab to Google Drive

In past experiments, obtaining, storing, and loading large amounts of training and test data was always a headache. In Colab, we can mount Google Drive into the virtual machine's file system, here at /content/drive:

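from google.colab import drive
drive.mount("/content/drive")  # asks for authorization the first time

print('Files in Drive:')
# Depending on the Colab version, this folder is named 'My Drive' or 'MyDrive'
!ls /content/drive/'My Drive'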
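Because drive.mount only works inside Colab, notebooks that should also run locally often guard the import. The following is a minimal sketch under that assumption: the helper get_data_root and the local data folder fallback are hypothetical, and it probes both 'MyDrive' and 'My Drive' since the folder name has differed across Colab versions.

```python
import os


def get_data_root(mountpoint="/content/drive"):
    """Mount Google Drive when running in Colab; otherwise use a local folder.

    Returns the directory to use as the data root. (Hypothetical helper.)
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        root = os.path.abspath("data")  # hypothetical local fallback folder
        os.makedirs(root, exist_ok=True)
        return root
    drive.mount(mountpoint)
    # The top-level Drive folder name depends on the Colab version
    for name in ("MyDrive", "My Drive"):
        candidate = os.path.join(mountpoint, name)
        if os.path.isdir(candidate):
            return candidate
    return mountpoint


root = get_data_root()
print("Data root:", root)
```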

After that, you can create and manipulate files with ordinary Linux shell commands:

# Working with files
# Create directories for the new project under "My Drive" (the writable part of the mount)
!mkdir -p "/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/input/train"
!mkdir -p "/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/input/test"
!mkdir -p "/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/input/valid"

# Download a single file (wget -O does not create missing directories, so create it first)
!mkdir -p "/content/drive/My Drive/Data/fashion_mnist"
!wget -O "/content/drive/My Drive/Data/fashion_mnist/train-images-idx3-ubyte.gz" http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz

# Download and unzip an archive
%env DIR=/content/drive/My Drive/Data/animals/cats_and_dogs

# Remove any existing copy of the directory, then recreate it
!rm -rf "$DIR"
!mkdir -pv "$DIR"
!wget -O "$DIR"/Cat_Dog_data.zip https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip

# Unzip quietly (-qq) into $DIR; -j discards the folder paths stored in the archive
!(cd "$DIR" && unzip -qqj Cat_Dog_data.zip -d .)
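The same rm/mkdir/wget/unzip sequence can also be written in plain Python, which avoids shell quoting problems with the space in "My Drive". This is a sketch using only the standard library (shutil, urllib.request, zipfile, pathlib); the function download_and_extract is hypothetical, and unlike unzip -j it keeps the archive's folder structure.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir, filename, fresh=True):
    """Download `url` to dest_dir/filename and unzip it into dest_dir.

    Returns the sorted relative paths of all files in dest_dir afterwards.
    (Hypothetical helper mirroring the shell commands above.)
    """
    dest = Path(dest_dir)
    if fresh and dest.exists():
        shutil.rmtree(dest)                      # like: rm -rf "$DIR"
    dest.mkdir(parents=True, exist_ok=True)      # like: mkdir -p "$DIR"
    archive = dest / filename
    urllib.request.urlretrieve(url, archive)     # like: wget -O "$DIR"/file.zip URL
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)                      # like: unzip file.zip -d "$DIR"
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())


# Usage in Colab (after mounting Drive):
# files = download_and_extract(
#     "https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip",
#     "/content/drive/My Drive/Data/animals/cats_and_dogs",
#     "Cat_Dog_data.zip",
# )
```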
