BoTNet (2021-01): Embedding Self-Attention into ResNet Article: Bottleneck Transformers for Visual Recognition Paper: https://arxiv.org/abs/2101.11605 Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework; surpassing the previous best published single model and single scale results of ResNeSt evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy…

Source Paper: [ICCV'2017] https://arxiv.org/abs/1703.06868 Authors: Xun Huang, Serge Belongie Code: https://github.com/xunhuang1995/AdaIN-style Contributions In this paper, the authors present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. Arbitrary style transfer: takes a content image $C$ and an arbitrary style image $S$ as inputs, and synthesizes an output image with the same content as $C$ and the same style as $S$. Background Batch Normalization Given an input batch $x \in \mathbb{R}^{N \times C \times H \times W}$, batch normalization (BN) normalizes the mean and standard deviation for each individual feature channel: $$ \mathrm{BN}(x)=\gamma\left(\frac{x-\mu(x)}{\sigma(x)}\right)+\beta $$ where $\gamma , \beta \in \mathbb{R}^{C}$ are affine parameters learned from data. $\mu(x) , \sigma(x) \in \mathbb{R}^{C}$ are the mean and standard deviation, computed across the batch and spatial dimensions independently for each feature channel: $$ \mu_{c}(x)=\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{n c h w} $$ $$ \sigma_{c}(x)=\sqrt{\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W}\left(x_{n c h w}-\mu_{c}(x)\right)^{2}+\epsilon} $$ Instance Normalization The original feed-forward stylization method [51] uses BN layers after the convolutional layers. Ulyanov et al. [52] found using Instance Normalization…
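The batch normalization statistics defined above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function names `batch_norm_stats` and `batch_norm`, the default `eps`, and the random input are all illustrative assumptions.

```python
import numpy as np

def batch_norm_stats(x, eps=1e-5):
    """Per-channel mean and standard deviation of x with shape (N, C, H, W)."""
    # Average over batch (axis 0) and spatial dimensions (axes 2, 3).
    mu = x.mean(axis=(0, 2, 3))
    var = ((x - mu[None, :, None, None]) ** 2).mean(axis=(0, 2, 3))
    sigma = np.sqrt(var + eps)  # eps sits inside the square root, as in the formula
    return mu, sigma

def batch_norm(x, gamma, beta, eps=1e-5):
    """BN(x) = gamma * (x - mu(x)) / sigma(x) + beta, applied per channel."""
    mu, sigma = batch_norm_stats(x, eps)
    x_hat = (x - mu[None, :, None, None]) / sigma[None, :, None, None]
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

# Hypothetical input: 8 images, 4 channels, 16x16 feature maps.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 16, 16))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma` set to ones and `beta` to zeros, each channel of `y` should have approximately zero mean and unit standard deviation over the batch and spatial axes.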

Source Authors: Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang Paper: [CVPR2020] https://arxiv.org/abs/2003.08436 Code: https://github.com/mingsun-tse/collaborative-distillation Contributions It proposes a new knowledge distillation method, "Collaborative Distillation", based on the exclusive collaborative relation between the encoder and its decoder. It proposes to restrict the student to learn a linear embedding of the teacher's outputs, which boosts its learning. Experiments are done with different stylization frameworks, such as WCT and AdaIN. Related Works Style Transfer WCT: Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., & Yang, M. H. (2017). Universal style transfer via feature transforms. arXiv preprint arXiv:1705.08086. AdaIN: Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1501-1510). Model Compression: low-rank decomposition, pruning, quantization, knowledge distillation. Knowledge distillation is a promising model compression method that transfers the knowledge of large networks (called teachers) to small networks (called students), where the knowledge can be softened probabilities (which can reflect the inherent class similarity structure known as dark knowledge) or sample relations (which…

Involution (CVPR 2021 paper) Authors: Duo Li, Jie Hu, Changhu Wang et al. Paper: https://arxiv.org/pdf/2103.06255.pdf Code: https://github.com/d-li14/involution ...

Scaled-YOLOv4: Scaling Cross Stage Partial Network In these reading notes: We review the basic model scaling methods: width, depth, resolution, and compound scaling. We compute the number of operations in residual blocks and show how it scales with the input image size (quadratically), the number of layers (linearly), and the number of filters (quadratically). We present the proposed Cross-Stage Partial (CSP) method, which decreases the number of operations and improves the performance of basic CNN layers. The slides can be downloaded from: https://connectpolyu-my.sharepoint.com/:p:/g/personal/18048204r_connect_polyu_hk/ET9zlHku9TFApqdl1A5NTV8BjFXPLizhCMupm6Ohcbehig?e=hhLlyc
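The scaling relations mentioned in these notes (quadratic in image size, linear in layers, quadratic in filters) can be illustrated with a small operation counter. This is a rough sketch under stated assumptions, not code from the paper or the slides: it assumes a Darknet-style residual block (a 1x1 convolution halving the channels followed by a 3x3 convolution restoring them), stride 1, "same" padding, and counts multiply-accumulates only. The function names and the baseline sizes are hypothetical.

```python
def conv_ops(size, c_in, c_out, kernel):
    """Multiply-accumulates of one stride-1, same-padded conv on a size x size map."""
    return size * size * kernel * kernel * c_in * c_out

def residual_stack_ops(size, filters, layers):
    """Operations of `layers` assumed Darknet-style residual blocks."""
    # Assumed block layout: 1x1 conv (filters -> filters/2), then 3x3 conv (filters/2 -> filters).
    per_block = (conv_ops(size, filters, filters // 2, kernel=1)
                 + conv_ops(size, filters // 2, filters, kernel=3))
    return layers * per_block

# Hypothetical baseline: 64x64 feature map, 128 filters, 4 blocks.
base = residual_stack_ops(size=64, filters=128, layers=4)

# Doubling one factor at a time exposes how the cost scales.
ratio_size = residual_stack_ops(128, 128, 4) / base     # image size doubled
ratio_layers = residual_stack_ops(64, 128, 8) / base    # layer count doubled
ratio_filters = residual_stack_ops(64, 256, 4) / base   # filter count doubled

print(ratio_size, ratio_layers, ratio_filters)  # 4.0 2.0 4.0
```

Under these assumptions each block costs 5 · size² · filters² operations, so doubling the image size or the filter count quadruples the cost, while doubling the depth only doubles it.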

Paper Information Paper: YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design Authors: Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang Paper: https://arxiv.org/abs/2009.05697 GitHub: https://github.com/nightsnack/YOLObile Objective: Real-time object detection for mobile devices. Study notes and presentation: Download: https://connectpolyu-my.sharepoint.com/:p:/g/personal/18048204r_connect_polyu_hk/EcRbix5iqshBglmxuLurS-sBBFmbrk8chRkim1y54-yOXw?e=8Qdfmd

Abstract The paper introduces a position attention module and a channel attention module to capture global dependencies in the spatial and channel dimensions respectively. The proposed DANet adaptively integrates local semantic features using the self-attention mechanism. Outline (1) Brief review: attention mechanism, SENet (2) DANet: Dual Attention Network (3) Experiments: visualization and comparison (4) Conclusion Download: https://connectpolyu-my.sharepoint.com/:p:/g/personal/18048204r_connect_polyu_hk/EbgphNjvYP5Psw5gdgDjInQBs761z4x8FYboKXF2arT6kw?e=haTOHI

Presentation Slides

Presentation Slides Q&A

Objectives Deep learning is a currently popular machine learning method. Deep learning architectures are formed by composing several nonlinear transformations, with the goal of yielding more abstract and more useful representations/features. (i) Start with a revision of the basic principles of neural networks: neuron structure, examples of back-propagation, and the learning procedure and its iterations (preferably with experimental results). (ii) Discuss at least one type of deep learning architecture, with a way or ways to illustrate its working principle. (iii) You can also give a summary of different deep learning architectures and highlight their uses and significance, with a good explanation if possible. (iv) You can also illustrate the whole procedure for its use in object recognition/classification. Outline (1) Introduction to Deep Neural Networks and Convolutional Neural Networks (2) Milestones (some famous networks) (3) Deep Convolutional Neural Network (4) Conclusion Presentation Slides [2021-06-25] A revised version can be downloaded from Download_Link References [1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.”…