Category: Notes

14 Posts

[Reading Notes] Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Source Paper: [ICCV'2017] Authors: Xun Huang, Serge Belongie Code:

Contributions

In this paper, the authors present a simple yet effective approach that, for the first time, enables arbitrary style transfer in real time. Arbitrary style transfer takes a content image $C$ and an arbitrary style image $S$ as inputs, and synthesizes an output image with the same content as $C$ and the same style as $S$.

Background

Batch Normalization

Given an input batch $x \in \mathbb{R}^{N \times C \times H \times W}$, batch normalization (BN) normalizes the mean and standard deviation for each individual feature channel: $$ \mathrm{BN}(x)=\gamma\left(\frac{x-\mu(x)}{\sigma(x)}\right)+\beta $$ where $\gamma, \beta \in \mathbb{R}^{C}$ are affine parameters learned from data, and $\mu(x), \sigma(x) \in \mathbb{R}^{C}$ are the mean and standard deviation computed across the batch and spatial dimensions, independently for each feature channel: $$ \mu_{c}(x)=\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{n c h w} $$ $$ \sigma_{c}(x)=\sqrt{\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W}\left(x_{n c h w}-\mu_{c}(x)\right)^{2}+\epsilon} $$

Instance Normalization

The original feed-forward stylization method [51] uses BN layers after the convolutional layers. Ulyanov et al. [52] found using Instance Normalization…
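The normalization statistics above carry over to the paper's AdaIN layer, which aligns the per-channel instance statistics (computed over spatial dimensions only) of the content features to those of the style features. A minimal NumPy sketch of that formula, as an illustration rather than the authors' implementation:

```python
import numpy as np

def instance_stats(x, eps=1e-5):
    # per-sample, per-channel mean and std over the spatial dims (H, W)
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    return mu, sigma

def adain(content, style, eps=1e-5):
    # AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y)
    mu_c, sigma_c = instance_stats(content, eps)
    mu_s, sigma_s = instance_stats(style, eps)
    return sigma_s * (content - mu_c) / sigma_c + mu_s

# toy feature maps of shape (N, C, H, W)
content = np.random.randn(1, 3, 8, 8)
style = 2.0 * np.random.randn(1, 3, 8, 8) + 5.0
out = adain(content, style)  # out now carries the style's channel statistics
```

After the transfer, each output channel has (up to the $\epsilon$ term) the mean and standard deviation of the corresponding style channel, which is exactly what the layer is designed to do.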
[Reading Notes] Collaborative Distillation for Ultra-Resolution Universal Style Transfer
Source Authors: Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang Paper: [CVPR2020]

Contributions

It proposes a new knowledge distillation method, "Collaborative Distillation", based on the exclusive collaborative relation between the encoder and its decoder. It proposes to restrict the student to learn a linear embedding of the teacher's outputs, which boosts its learning. Experimental work is done with different stylization frameworks, such as WCT and AdaIN.

Related Works

Style Transfer

WCT: Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., & Yang, M. H. (2017). Universal style transfer via feature transforms. arXiv preprint arXiv:1705.08086.
AdaIN: Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1501-1510).

Model Compression

low-rank decomposition
pruning
quantization
knowledge distillation

Knowledge distillation is a promising model compression method that transfers the knowledge of a large network (called the teacher) to a small network (called the student), where the knowledge can be softened probabilities (which reflect the inherent class-similarity structure known as dark knowledge) or sample relations (which…
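The "linear embedding" constraint can be pictured with a toy least-squares fit: if the teacher's features really lie in a linear image of the student's features, a single linear map reproduces them exactly. The shapes and names below are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy features: 64 spatial positions; the student has 16 channels, the teacher 32
student = rng.standard_normal((64, 16))
w_true = rng.standard_normal((16, 32))
teacher = student @ w_true  # here the teacher is exactly a linear embedding of the student

# fit the linear map in closed form (in actual training it would be learned jointly)
w_fit, *_ = np.linalg.lstsq(student, teacher, rcond=None)
residual = float(np.mean((student @ w_fit - teacher) ** 2))  # ~0 when the constraint holds
```

When the linear relation holds, the residual vanishes; in the distillation setting the student is trained so that such a map can match the teacher well, which is what "restricting the student to a linear embedding" buys.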
How to Schedule a Meeting via Zoom
Step 1: Open the "Zoom" software from the Desktop/Start menu
Step 2: Click "Schedule" on the Zoom homepage
Step 3: Set the basic information of the scheduled meeting
Step 4: Copy the invitation link from the Zoom homepage
Step 5: Paste the link and send it to others
An example:
Scaled-YOLOv4: Scaling Cross Stage Partial Network
In these reading notes: We reviewed some basic model scaling methods: width, depth, resolution, and compound scaling. We computed the operation amount of residual blocks and showed its relation to the input image size (square), the number of layers (linear), and the number of filters (square). We presented the proposed Cross-Stage Partial (CSP) method, which decreases the operations and improves the performance of basic CNN layers. The PPT can be downloaded from:
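The scaling relations summarized above can be checked with quick back-of-envelope arithmetic. A sketch counting multiply-accumulates for stride-1 3×3 convolutions (constants, shortcut paths, and CSP splitting omitted):

```python
def conv_macs(h, w, k, c_in, c_out):
    # multiply-accumulates of one k x k convolution, stride 1, same padding
    return h * w * k * k * c_in * c_out

def res_block_macs(h, w, c, n_blocks):
    # a simplified residual block: two 3x3 convolutions with c channels each
    return n_blocks * 2 * conv_macs(h, w, 3, c, c)

base = res_block_macs(64, 64, 128, 4)
# doubling resolution multiplies cost by 4 (square),
# doubling filters multiplies cost by 4 (square),
# doubling depth multiplies cost by 2 (linear)
```

This is exactly the image-size-squared, filters-squared, layers-linear relation the notes derive, and it is why compound scaling trades the three factors off jointly instead of growing one alone.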
Biscuits of Deep Learning
Manipulate Gradient

This section covers some techniques related to gradient-descent optimization.

Gradient Clipping

A good explanation: What is Gradient Clipping? Related paper: Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity (ICLR'2020). Sometimes the training loss is not stable; this may be caused by the exploding-gradient problem. A simple yet effective remedy is gradient clipping. Implemented in PyTorch [document]:

# inside the training loop
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm_value) # add this line
optimizer.step()

Gradient Centralization (ECCV'20)

Paper: Gradient Centralization: A New Optimization Technique for Deep Neural Networks. It normalizes the gradient to zero mean, which can speed up the training process and increase the generalization ability (see the repository).

Gradient Flooding (ICML'20)

Paper: Do We Need Zero Training Loss After Achieving Zero Training Error? It sets a threshold for the training loss. If the loss is lower than the threshold, the method penalizes the overflowing value to avoid overfitting. Just add one line of code (PyTorch) to implement it:

outputs = model(inputs)
loss = criterion(outputs, labels)
flood = (loss-b).abs()+b #…
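The flooding expression can be unpacked with plain numbers; `b` is the flood-level hyperparameter, and the values below are illustrative:

```python
def flood_loss(loss, b):
    # |loss - b| + b: same gradient magnitude as the raw loss, but the sign
    # flips once loss drops below b, so training "floats" around level b
    return abs(loss - b) + b

b = 0.10
above = flood_loss(0.30, b)  # loss above the flood level is left unchanged (0.30)
below = flood_loss(0.05, b)  # loss below the level is reflected upward (0.15)
```

In the training loop the same expression is applied to the criterion's output tensor before `backward()`, so gradient descent on the flooded loss performs gradient *ascent* whenever the raw loss dips under `b`.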
Experiment Control
Prepare the environment for experiment control. Tutorial from

Installation

To install Sacred on the client (e.g. a conda environment):

pip install sacred
pip install numpy pymongo

Server: database

# 1. Import the public key used by the package management system.
wget -qO - | sudo apt-key add -
# 2. Create a list file for MongoDB.
echo "deb [ arch=amd64,arm64 ] bionic/mongodb-org/4.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.2.list
# 3. Reload the local package database.
sudo apt-get update
# 4. Install the MongoDB packages.
sudo apt-get install -y mongodb-org
# 4.1 Prevent MongoDB upgrades when using apt-get.
echo "mongodb-org hold" | sudo dpkg --set-selections
echo "mongodb-org-server hold" | sudo dpkg --set-selections
echo "mongodb-org-shell hold" | sudo dpkg --set-selections
echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
echo "mongodb-org-tools hold" | sudo dpkg --set-selections
# Control the service.
sudo service mongod start
sudo service mongod stop
sudo service mongod restart
# Enable the mongod service to start automatically at boot.
sudo systemctl enable mongod && sudo systemctl start mongod
# create a new database to store our experiment #…
Useful tools
Auto notification of modification of a web page: select an area and relax; it will send an email alert when something changes. Just from:
Sentences in Paper Writing
Feature

Vocabulary: boost the representation power

A feature aggregation strategy is proposed to propagate information from early stages to the later ones. -- (Li, et al. 2019) "Rethinking on Multi-Stage Networks for Human Pose Estimation"

A multi-stage network is vulnerable to the information loss during repeated up- and down-sampling. To mitigate this issue, a cross-stage feature aggregation strategy is used to propagate multi-scale features from the early stages to the current stage in an efficient way. -- (Li, et al. 2019) "Rethinking on Multi-Stage Networks for Human Pose Estimation"

Features with different depths have different levels of abstraction of the image. -- (Sun, et al. 2018) FishNet

Feature concatenation is used when vertical and horizontal arrows meet. -- (Sun, et al. 2018) FishNet

In the context of the channels of a CNN, different channels are about different types of image features or raindrop features which cover a wide range of local image patterns. -- Deep Learning for Seeing Through Window With Raindrops

Then feature selection in a CNN is about assigning different weights to different…
Article Writing
Phrasebank Proof Reading Paraphrasing