### Category: Study

8 Posts

BoTNet (2021-01): Embedding Self-Attention into ResNet. Article: Bottleneck Transformers for Visual Recognition. Paper: https://arxiv.org/abs/2101.11605 Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework; surpassing the previous best published single model and single scale results of ResNeSt evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy…
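To make "replacing the spatial convolutions with global self-attention" concrete, here is a minimal NumPy sketch of a simplified bottleneck block in which the 3×3 convolution is swapped for single-head global self-attention over all H×W positions. It is an illustration under stated assumptions, not the BoTNet implementation: the function names and `params` keys are hypothetical, 1×1 convolutions are written as matrix products, and batch normalization, multiple heads and position encodings are left out for brevity.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(x, w_q, w_k, w_v):
    """Single-head global self-attention over a (C, H, W) feature map.

    w_q, w_k, w_v are (C, C) projections, equivalent to 1x1 convolutions.
    Every spatial position attends to every other position.
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T            # (HW, C): one token per position
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(c))      # (HW, HW) attention weights
    out = attn @ v                            # (HW, C)
    return out.T.reshape(c, h, w)

def bot_block(x, params):
    """Hypothetical simplified bottleneck: 1x1 reduce -> MHSA -> 1x1 expand + residual."""
    y = np.maximum(np.einsum('oc,chw->ohw', params['reduce'], x), 0)   # 1x1 conv + ReLU
    y = np.maximum(global_self_attention(y, params['q'], params['k'], params['v']), 0)
    y = np.einsum('oc,chw->ohw', params['expand'], y)                  # 1x1 conv back to C
    return np.maximum(x + y, 0)                                        # residual + ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))
params = {
    'reduce': rng.standard_normal((16, 64)) * 0.1,
    'q': rng.standard_normal((16, 16)) * 0.1,
    'k': rng.standard_normal((16, 16)) * 0.1,
    'v': rng.standard_normal((16, 16)) * 0.1,
    'expand': rng.standard_normal((64, 16)) * 0.1,
}
print(bot_block(x, params).shape)  # (64, 8, 8)
```

The block keeps the bottleneck's input and output shapes, which is why it can be dropped into the last stage of a ResNet. Note that the attention matrix is (HW)×(HW), so the cost grows quadratically with resolution; this is one reason to use it only in the final, lowest-resolution blocks.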
Source Paper: [ICCV'2017] https://arxiv.org/abs/1703.06868 Authors: Xun Huang, Serge Belongie Code: https://github.com/xunhuang1995/AdaIN-style Contributions: In this paper, the authors present a simple yet effective approach that, for the first time, enables arbitrary style transfer in real time. Arbitrary style transfer takes a content image $C$ and an arbitrary style image $S$ as inputs, and synthesizes an output image with the same content as $C$ and the same style as $S$. Background. Batch Normalization: Given an input batch $x \in \mathbb{R}^{N \times C \times H \times W}$, batch normalization (BN) normalizes the mean and standard deviation of each individual feature channel: $$\mathrm{BN}(x)=\gamma\left(\frac{x-\mu(x)}{\sigma(x)}\right)+\beta$$ where $\gamma, \beta \in \mathbb{R}^{C}$ are affine parameters learned from data, and $\mu(x), \sigma(x) \in \mathbb{R}^{C}$ are the mean and standard deviation, computed across the batch and spatial dimensions independently for each feature channel: $$\mu_{c}(x)=\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{n c h w}$$ $$\sigma_{c}(x)=\sqrt{\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W}\left(x_{n c h w}-\mu_{c}(x)\right)^{2}+\epsilon}$$ Instance Normalization: The original feed-forward stylization method [51] uses BN layers after each convolutional layer. Ulyanov et al. [52] found using Instance Normalization…
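The BN formulas above can be checked with a short NumPy sketch. `batch_norm` follows them directly, pooling statistics over $(N, H, W)$ for each channel. `instance_norm` is added for contrast and follows the standard definition of Instance Normalization (statistics per sample and per channel, over $(H, W)$ only), which the excerpt cuts off before stating. Both function names and the default `eps` are illustrative choices, not code from the paper's repository.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """BN on x of shape (N, C, H, W): one mean/std per channel over (N, H, W)."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)                                  # mu_c(x)
    sigma = np.sqrt(((x - mu) ** 2).mean(axis=(0, 2, 3), keepdims=True) + eps)  # sigma_c(x)
    return gamma.reshape(1, -1, 1, 1) * (x - mu) / sigma + beta.reshape(1, -1, 1, 1)

def instance_norm(x, gamma, beta, eps=1e-5):
    """IN: same form, but one mean/std per (sample, channel) pair over (H, W) only."""
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = np.sqrt(((x - mu) ** 2).mean(axis=(2, 3), keepdims=True) + eps)
    return gamma.reshape(1, -1, 1, 1) * (x - mu) / sigma + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3, 5, 5)) * 2 + 1   # batch of 4 samples, 3 channels, 5x5
gamma, beta = np.ones(3), np.zeros(3)
y_bn = batch_norm(x, gamma, beta)
y_in = instance_norm(x, gamma, beta)
```

The only difference is the set of axes the statistics are pooled over. With BN, an individual sample's channel can keep a non-zero mean after normalization; with IN, every sample's channel statistics are reset, which is what makes IN act on per-image "style" statistics.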
Source Authors: Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang Paper: [CVPR2020] https://arxiv.org/abs/2003.08436 Code: https://github.com/mingsun-tse/collaborative-distillation Contributions: It proposes a new knowledge distillation method, "Collaborative Distillation", based on the exclusive collaborative relation between an encoder and its decoder. It constrains the student to learn a linear embedding of the teacher's outputs, which boosts its learning. Experiments are conducted with different stylization frameworks, such as WCT and AdaIN. Related Works. Style Transfer: WCT: Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., & Yang, M. H. (2017). Universal style transfer via feature transforms. arXiv preprint arXiv:1705.08086. AdaIN: Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1501-1510). Model Compression: low-rank decomposition, pruning, quantization, and knowledge distillation. Knowledge distillation is a promising model compression method that transfers the knowledge of a large network (the teacher) to a small network (the student), where the knowledge can be softened probabilities (which can reflect the inherent class-similarity structure known as dark knowledge) or sample relations (which…
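For the "softened probability" form of knowledge mentioned above, a minimal NumPy sketch of the generic distillation loss from Hinton et al. (2015) is shown below: both networks' logits are divided by a temperature $T$ before the softmax, and the student is penalized by the KL divergence from the teacher's softened distribution, scaled by $T^2$. This is the classic classification-style loss, not the Collaborative Distillation method of this paper (which distills encoder features for style transfer); the function names and the default `temperature=4.0` are assumptions for illustration.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(logits, temperature=1.0):
    return np.exp(log_softmax(logits, temperature))

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean KL(p_teacher || p_student) on temperature-softened outputs, times T^2."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)   # per-sample KL
    return float(np.mean(kl) * temperature ** 2)

teacher = np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]])
student = np.zeros((2, 3))
print(kd_loss(student, teacher))   # positive: the student has not matched the teacher yet
```

A higher temperature flattens the teacher's distribution, so the small probabilities on non-target classes (the "dark knowledge") carry more of the training signal.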
Paper Information Paper: YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design Authors: Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang Paper: https://arxiv.org/abs/2009.05697 Github: https://github.com/nightsnack/YOLObile Objective: Real-time object detection for mobile devices. Study notes and presentation: Download: https://connectpolyu-my.sharepoint.com/:p:/g/personal/18048204r_connect_polyu_hk/EcRbix5iqshBglmxuLurS-sBBFmbrk8chRkim1y54-yOXw?e=8Qdfmd
MeTriX MuX Visual Quality Assessment Package. The official website was not working when I tried it (08-Sep-2019), and the download link http://foulard.ece.cornell.edu/gaubatz/metrix_mux/metrix_mux_1.1.zip is invalid. I found a copy in a GitHub repository: https://github.com/sattarab/image-quality-tools/tree/master/metrix_mux Installation: The instructions in the index.html file are easy to follow; just run the command `configure_metrix_mux` at the MATLAB prompt (`>>`). For the "VIF" method there is a bug: a function name needs to be changed from "_m" to "_M". Content: It is a powerful toolbox that implements many quality assessment algorithms: 'MSE' (mean-squared error), 'PSNR' (peak signal-to-noise ratio), 'SSIM' (structural similarity index), 'MSSIM' (multi-scale SSIM index), 'VSNR' (visual signal-to-noise ratio), 'VIF' (visual information fidelity), 'VIFP' (pixel-based VIF), 'UQI' (universal quality index), 'IFC' (information fidelity criterion), 'NQM' (noise quality measure), 'WSNR' (weighted signal-to-noise ratio), and 'SNR' (signal-to-noise ratio). Usage: `distorted_ssim_index = metrix_mux(reference_image, distorted_image, 'SSIM')`
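As a quick reference for the two simplest metrics in the list, the sketch below computes MSE and PSNR in NumPy, using $\mathrm{PSNR} = 10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$. It is not the toolbox's MATLAB code: the function names are hypothetical, and it assumes grayscale 8-bit images (peak value 255). The toolbox may handle color images or other peak values differently.

```python
import numpy as np

def mse(reference, distorted):
    # Cast to float first: subtracting uint8 arrays directly would wrap around.
    ref = np.asarray(reference, dtype=np.float64)
    dis = np.asarray(distorted, dtype=np.float64)
    return float(np.mean((ref - dis) ** 2))

def psnr(reference, distorted, peak=255.0):
    err = mse(reference, distorted)
    if err == 0:
        return float('inf')          # identical images: PSNR is unbounded
    return float(10.0 * np.log10(peak ** 2 / err))

ref = np.zeros((4, 4), dtype=np.uint8)
dis = np.ones((4, 4), dtype=np.uint8)
print(mse(ref, dis), psnr(ref, dis))
```

Here every pixel differs by 1, so the MSE is 1.0 and the PSNR is $10 \log_{10}(255^2)$, about 48.13 dB.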