Python Environment and Basics

System Preparation

Install NVIDIA 2080Ti (driver + CUDA)

# remove the old driver if needed (quote the pattern so the shell does not expand it)
sudo apt-get purge 'nvidia*'
sudo apt autoremove

# add the graphics-drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
# install driver version 440
sudo apt install nvidia-driver-440
sudo reboot

# CUDA 10.1, if the system is Ubuntu 18
cd ~/Downloads
wget https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.168_418.67_linux.run
sudo sh cuda_10.1.168_418.67_linux.run
# remove the apt-installed CUDA toolkit if needed
sudo apt-get purge nvidia-cuda-toolkit
sudo apt-get purge --auto-remove nvidia-cuda-toolkit

# for Ubuntu 20.04, you can install the toolkit directly with apt
sudo apt update
sudo apt install nvidia-cuda-toolkit

# (previous) Cuda10.0
# cd Downloads
# wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
# sudo sh cuda_10.0.130_410.48_linux.run

Check CUDA version

  • Go to the CUDA folder
    cd /usr/local/cuda/
  • Open "version.txt"
    vi version.txt
  • Exit
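    Press Esc, type :q, then press Enter to exit

Solve the error: nvcc not found

    # add CUDA to the library and executable search paths (put these lines in ~/.bashrc to make them permanent)
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    export PATH=$PATH:/usr/local/cuda/bin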
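
To confirm from Python that nvcc can be found on PATH, a minimal sketch like the following may help. It uses only the standard library; the helper names `find_nvcc` and `nvcc_version_text` are hypothetical, and the sketch assumes that `nvcc --version` prints the toolkit release, as it does on a standard CUDA install.

```python
import shutil
import subprocess


def find_nvcc():
    """Return the full path of nvcc if it is on PATH, otherwise None."""
    return shutil.which("nvcc")


def nvcc_version_text():
    """Return the output of `nvcc --version`, or None if nvcc is missing."""
    nvcc = find_nvcc()
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    path = find_nvcc()
    if path is None:
        print("nvcc not found: check that /usr/local/cuda/bin is on PATH")
    else:
        print("nvcc found at", path)
        print(nvcc_version_text())
```

If the script reports that nvcc is missing in a new terminal, the exports were probably not added to the shell start-up file.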

GPU usage

watch -d -n 0.5 nvidia-smi

Find the processes that are using the GPU

sudo fuser -v /dev/nvidia*
# kill a process by its PID
kill -9 [PID NUMBER]
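
For scripted monitoring, nvidia-smi also has a CSV query mode. The sketch below parses that output into Python dictionaries. The function names are hypothetical, the sample values are made up for illustration, and the query assumes the `index`, `memory.used` and `memory.total` fields of `--query-gpu`.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" line per GPU (MiB)
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = [field.strip() for field in line.split(",")]
        gpus.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_memory(output)


# Example with a hard-coded (made-up) sample, so it runs without a GPU
sample = "0, 1024, 11019\n1, 0, 11019\n"
print(parse_gpu_memory(sample))
```

On a machine with a GPU, calling `query_gpu_memory()` replaces the hard-coded sample.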

Control GPU Visibility of PyTorch

CUDA_VISIBLE_DEVICES=1,2 python myscript.py

Check the number of GPUs in Python

import torch
print(torch.cuda.device_count())

Visualization of training process via Tensorboard

Record loss during training

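# if using PyTorch
import torch
from torch.utils.tensorboard import SummaryWriter
# log_dir=None writes to ./runs/<datetime>_<hostname><comment>; comment must be a string
writer = SummaryWriter(log_dir=None, comment='HBPN')

# visualize the model graph (Encoder must be a model instance, not the class)
dummy_input = torch.rand(32, 3, 48, 48)
writer.add_graph(Encoder, (dummy_input,))

# record losses at each epoch
writer.add_scalars('Loss/Epoch', {'epoch_loss': epoch_loss,
                                  'epoch_stage_1': epoch_stage_1,
                                  'epoch_stage_2': epoch_stage_2,
                                  }, epoch)
# close the writer once, after training has finished
writer.close()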
#-----------------------------------------------------------------------

# if using TensorFlow 1.x
import tensorflow as tf
writer = tf.summary.FileWriter('/save_path/', graph)  # graph: for example, sess.graph

# record the evaluated summary at each step i
writer.add_summary(summary, i)

Visualization

# open terminal and go to the project directory
tensorboard --logdir=runs
# open the returned link in a browser

Docker

# Portainer: a web UI for managing Docker
docker pull portainer/portainer
docker volume create portainer_data
docker run -d -p 9000:9000 --restart=always --name portainer -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer

# run a GPU container whose GUI windows are displayed on the host
sudo nvidia-docker run --rm -ti --ipc=host --volume=$HOME/.Xauthority:/root/.Xauthority:rw --net=host --env=DISPLAY --name Liwen_Demo -v ~/Demo_Liwen:/Demo_Liwen demo_liwen_v1

# save a container as an image, export the image to a tar file, and load it back
docker commit [container] mymysql:v1
docker save [image] > busybox.tar
docker load < busybox.tar

Virtual Environment

Python Environment: Conda Installation and Basic Usage

# Install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Create a new environment, activate it, and deactivate it
conda create -n myenv python=3.5
conda activate myenv
conda deactivate

# List the created environments
conda env list

# In a virtual environment, show the path of python
which python

# List installed packages
conda list

# Remove an environment
conda env remove --name myenv

# Install a list of packages from requirements.txt (using conda)
conda install --yes --file requirements.txt
# OR through pip
pip install -r requirements.txt

# Export the environment's dependencies to requirements.txt
pip freeze > requirements.txt

# To enter a conda environment from a bash script, initialize conda first.
# Script contents:
#!/bin/bash
export PATH=/root/miniconda3/bin/:$PATH
eval "$(command conda 'shell.bash' 'hook' 2> /dev/null)"
conda activate myenv

Install TensorFlow 1.* with GPU

Take TensorFlow 1.14 as an example: it relies on CUDA 10.0 and cuDNN 7.4 (7.6 also works). For the dependencies of other TensorFlow versions, check the tested build configurations on the TensorFlow website.
Before installation, check that the GPU driver supports CUDA 10: run "nvidia-smi" to see the driver version and the highest CUDA version it supports.

# Create a new environment, and activate it
conda create -n tf14 python=3.7
conda activate tf14

conda install cudatoolkit=10.0
conda install -c anaconda cudnn
conda install tensorflow-gpu=1.14

# test the GPU visibility
python
import tensorflow as tf
tf.test.is_gpu_available() # to check whether there is a GPU device

Python Packages Control and Basic Usage

Set an environment variable in Python

import os
os.environ["VARIABLE_NAME"] = "VALUE"

Install some packages

# fixes "ImportError: No module named 'past'"
pip install future

# Install OpenCV
pip install opencv-python 

# Install skimage
pip install scikit-image

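Get the current path in Python

# get the current working directory
import os
curPath = os.getcwd()

# add a directory to the module search path
import sys
print(sys.path)
# append the directory that contains example_file.py, not the file itself
sys.path.append('/path/to/the')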
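
Because `sys.path` entries are directories, a small helper can resolve the path and avoid adding the same entry twice. This is only a sketch: `add_to_sys_path` is a hypothetical name, not part of the standard library.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Append a directory to sys.path once; return its absolute path as a string."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Example: make modules in the current working directory importable
added = add_to_sys_path(".")
print(added in sys.path)
```

Calling the helper again with the same directory leaves `sys.path` unchanged.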

Make a directory if it does not exist

import os
directory = 'Folder'
# exist_ok=True avoids an error when the directory already exists
os.makedirs(directory, exist_ok=True)

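List all .jpg files in a folder (including subfolders)

import glob
import os

path = 'WHICH_PATH'
# "**" with recursive=True also matches files in subfolders
files = glob.glob(os.path.join(path, "**", "*.jpg"), recursive=True)

for f in files:
    print(f)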

Format conversion between OpenCV and PIL

import cv2
import numpy
from PIL import Image

# OpenCV (BGR array) to PIL (RGB image)
image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

# PIL (RGB image) to OpenCV (BGR array)
img = cv2.cvtColor(numpy.asarray(image), cv2.COLOR_RGB2BGR)

PIL image cannot be shown during debugging

Install the required package (on Linux, PIL's show() can use ImageMagick's display utility)

sudo apt-get install imagemagick

Training on multiple GPUs but testing on a single GPU

import torch
from collections import OrderedDict

state_dict = torch.load('myfile.pth.tar')
# create a new OrderedDict whose keys do not contain the `module.` prefix
# that nn.DataParallel adds when the model is saved
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.` (7 characters)
    new_state_dict[name] = v
# load params
model.load_state_dict(new_state_dict)
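
Slicing with `k[7:]` assumes every key starts with `module.`. A slightly more defensive variant strips the prefix only when it is present, so the same code also works for checkpoints saved from a single-GPU model. The helper name `strip_module_prefix` is hypothetical, and plain integers stand in for tensors so the sketch runs without PyTorch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key.

    Keys that do not start with the prefix are copied unchanged.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Example with plain values standing in for tensors
checkpoint = OrderedDict([("module.conv1.weight", 1), ("fc.bias", 2)])
print(list(strip_module_prefix(checkpoint).keys()))
# → ['conv1.weight', 'fc.bias']
```

With a real checkpoint, the result can be passed to `model.load_state_dict(...)` in the same way as `new_state_dict`.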

Screen: keep your work running even after the SSH connection is closed

Start a session with session_name

screen -S session_name

Detach from a screen session: press Ctrl+a, then d

To find the session ID, list the currently running screen sessions:

screen -ls

Restore (reattach to) a screen session

screen -r [name or id]

Kill a screen session by its session ID

screen -X -S [session ID] kill

Experiment Control

Running time

import time

tStart = time.time()
# ... code to be timed ...
print("--- %s seconds ---" % (time.time() - tStart))
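
When several parts of an experiment need timing, the same idea can be wrapped in a context manager. The sketch below is one possible way to do it: `timer` is a hypothetical helper, and it uses `time.perf_counter()`, which is intended for measuring short durations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label="block"):
    """Time the enclosed block and print the elapsed seconds on exit."""
    start = time.perf_counter()
    result = {"seconds": None}
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print("--- %s: %.4f seconds ---" % (label, result["seconds"]))


# Example: time a short sleep; the elapsed time is also stored in the dict
with timer("sleep") as elapsed:
    time.sleep(0.05)
```

The elapsed time stays available in `elapsed["seconds"]` after the block, which is convenient for logging.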

PyTorch GPU Control

Control which GPUs are used. For example, with two GPUs, make both of them visible. Set the variable before CUDA is initialized (safest: before importing torch).

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
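
If the GPU list comes from a script argument or a config file, a tiny helper keeps the string formatting in one place. This is a sketch only: `set_visible_gpus` is a hypothetical name, and the call has to happen before the first CUDA call in the process.

```python
import os


def set_visible_gpus(gpu_ids):
    """Restrict this process to the given GPU indices, e.g. [0, 1].

    Call this before CUDA is initialized (safest: before importing torch),
    because the variable is read when CUDA starts up.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


print(set_visible_gpus([0, 1]))
# → 0,1
```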

Video Technology

Video and Image Sequence Conversion

Resize a video

ffmpeg -i video1_trim.mp4 -vf scale=1036:540 video_1.mp4

Video to image sequence

ffmpeg -i video_1.mp4 -vf fps=30 vd_1/%04d.png

Image sequence to video

cat *.png | ffmpeg -f image2pipe -i - video_1.mp4
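
When the frames are numbered (as produced by the video-to-image command), ffmpeg can also read them directly by pattern, which lets you set the frame rate explicitly. The sketch below only builds the command from Python. `frames_to_video_cmd` is a hypothetical helper, and the codec options assume an ffmpeg build with libx264.

```python
import shlex
import subprocess


def frames_to_video_cmd(pattern, output, fps=30):
    """Build an ffmpeg command that encodes numbered frames into an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),  # frame rate of the input image sequence
        "-i", pattern,           # e.g. vd_1/%04d.png
        "-c:v", "libx264",       # assumes ffmpeg was built with libx264
        "-pix_fmt", "yuv420p",   # pixel format that most players accept
        output,
    ]


cmd = frames_to_video_cmd("vd_1/%04d.png", "video_1.mp4", fps=30)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```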
