### Category: Notes

19 Posts

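Revisiting Global Statistics Aggregation for Improving Image Restoration (removing the "misalignment" in image restoration for a large performance gain)

Paper: Revisiting Global Statistics Aggregation for Improving Image Restoration (AAAI 2022)
arXiv: https://arxiv.org/pdf/2112.04491.pdf
Code: https://github.com/megvii-research/tlsc

Reference:
[1] Removing the "misalignment" in image restoration for a large performance gain, https://mp.weixin.qq.com/s/HRm6wPDBeopsmLtilF5K-A
[2] https://xiaoqiangzhou.cn/post/chu_revisiting/

Problem statement

"Specifically, with the increasing size of patches for testing, the performance increases in the case of UNet while it increases and then decreases in UNet-IN and UNet-SE cases."

For a trained UNet model tested on patches, performance improves as the input size grows (comment: larger inputs reduce boundary effects and allow better fusion of boundary information). However, if the UNet contains IN (Instance Norm, which normalizes over the spatial plane) or SE (channel attention / Squeeze-and-Excitation, which contains a global average pooling that averages over the spatial plane), performance first rises and then falls as the input size grows. This shows that, for existing models that use IN or SE, the strategy of training on patches and testing on full images is problematic.

The different ways in which global statistics are aggregated during training and testing cause a "misalignment", i.e. an inconsistency in the statistics distribution [1]. Statistics aggregated over patch features during training and over full-image features during testing follow different distributions, which degrades image restoration performance (a phenomenon that has been widely overlooked).

To address this, the authors propose a simple Test-time Local Statistics Converter (TLSC), which replaces the global statistics aggregation with a local one at test time. Without retraining or fine-tuning, it substantially improves the performance of existing image restoration methods [1].

Solution

Taking SE as an example, the original design uses global average pooling. The improved design uses local statistics calculation: each pixel is averaged over a local region whose size equals the training patch size. For border pixels, replicate padding is applied before the local statistics calculation. The paper extends the method to Instance Norm.

Results

The original HINet contains InstanceNorm; its performance improves with the proposed method. The original MPRNet contains SE modules; its performance improves with the proposed method. The data distribution also improves.

Another observation

Full-image training causes severe performance loss in low-level vision tasks. One explanation is that full-image training lacks cropping augmentation [2].

Code

    # ------------------------------------------------------------------------
    # Copyright (c) 2021 megvii-model. All Rights Reserved.
    # ------------------------------------------------------------------------
    """
    ## Revisiting Global Statistics Aggregation for Improving Image Restoration
    ## Xiaojie Chu, Liangyu Chen, Chengpeng Chen, Xin Lu
    """
    import torch
    from torch import nn
    from torch.nn import functional as F

    from basicsr.models.archs.hinet_arch import HINet
    from basicsr.models.archs.mprnet_arch import MPRNet

    train_size = (1, 3, 256, 256)

    class AvgPool2d(nn.Module):…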
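The local statistics calculation described in the solution can be sketched in plain Python on a single 2-D feature map. This is a minimal illustration and not the repository's implementation: the function names `local_mean` and `global_mean` are hypothetical, the window is assumed to be square, and replicate padding is emulated by clamping indices to the border.

```python
def global_mean(feature):
    """Global average pooling: one mean over the whole spatial plane."""
    values = [v for row in feature for v in row]
    return sum(values) / len(values)


def local_mean(feature, window):
    """Local average: each pixel gets the mean of a window x window region.

    Indices that fall outside the map are clamped to the nearest border
    pixel, which has the same effect as replicate padding (assumption:
    the window is square and roughly centred on the pixel).
    """
    height, width = len(feature), len(feature[0])
    half = window // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = 0.0
            for di in range(-half, window - half):
                for dj in range(-half, window - half):
                    r = min(max(i + di, 0), height - 1)  # clamp row index
                    c = min(max(j + dj, 0), width - 1)   # clamp column index
                    total += feature[r][c]
            out[i][j] = total / (window * window)
    return out
```

When the window covers the whole map exactly, the centre pixel's local mean coincides with the global mean; on a larger test image each pixel instead sees statistics over a region of training-patch size, which is the point of the test-time conversion.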
BoTNet (2021-01): Embedding Self-Attention into ResNet

Article: Bottleneck Transformers for Visual Recognition
Paper: https://arxiv.org/abs/2101.11605

Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework; surpassing the previous best published single model and single scale results of ResNeSt evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy…
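To make "global self-attention" over a feature map concrete, here is a minimal sketch in plain Python. It is not BoTNet's layer: it assumes standard scaled dot-product attention, uses identity projections for queries, keys and values, and omits multiple heads and position encodings. The function names are hypothetical. The only point it illustrates is that every spatial position attends to every other position, unlike a 3x3 convolution that only sees its neighbours.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_self_attention(features):
    """features: list of H*W vectors (the flattened spatial grid), each of length d.

    Assumption: queries, keys and values are the features themselves
    (identity projections); a real layer would learn these projections.
    """
    d = len(features[0])
    outputs = []
    for query in features:
        # Similarity of this position to every position in the map.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in features
        ]
        weights = softmax(scores)
        # Weighted sum of all value vectors.
        outputs.append(
            [sum(w * value[c] for w, value in zip(weights, features)) for c in range(d)]
        )
    return outputs
```

Each output vector is a convex combination of all input vectors, so the cost grows quadratically with the number of positions; that is consistent with the abstract applying it only in the final three bottleneck blocks.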

Extract a set of "*.tar.gz" files

    for f in *.tar.gz; do tar -xvf "$f"; done

List folders under a path

    import os

    path = "path"
    list_dirs = [name for name in os.listdir(path)
                 if os.path.isdir(os.path.join(path, name))]

HDF5

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data. Documentation for the HDF5 Python API (h5py): https://docs.h5py.org/en/stable/build.html

Installation

With conda: conda install h5py
With pre-built wheels: pip install h5py

Usage

    import time

    import h5py

    data_list = []  # replace with YOUR_DATA_LIST, a sequence of array-like records

    h5_file_name = "my_data.h5"
    h5_writer = h5py.File(h5_file_name, "a")  # file that stores the data
    for index, data in enumerate(data_list):
        h5_writer.create_dataset(f"{index:06}", data=data)  # save one record
        if (index + 1) % 1500 == 0:
            # Close and reopen the file every 1500 records to force a save.
            print("Finish processing one section\n")
            h5_writer.close()
            time.sleep(1)
            h5_writer = h5py.File(h5_file_name, "a")
    # when finished
    h5_writer.close()

References
[1] https://en.wikipedia.org/wiki/Hierarchical_Data_Format
[2] https://docs.h5py.org/en/stable/build.html

Acknowledgement: the code is from Qiuliang Ye.

Source

Paper: [ICCV 2017] https://arxiv.org/abs/1703.06868
Authors: Xun Huang, Serge Belongie
Code: https://github.com/xunhuang1995/AdaIN-style

Contributions

In this paper, the authors present a simple yet effective approach that for the first time enables arbitrary style transfer in real time. Arbitrary style transfer takes a content image $C$ and an arbitrary style image $S$ as inputs, and synthesizes an output image with the same content as $C$ and the same style as $S$.
Background

Batch Normalization

Given an input batch $x \in \mathbb{R}^{N \times C \times H \times W}$, batch normalization (BN) normalizes the mean and standard deviation for each individual feature channel:

$$\mathrm{BN}(x)=\gamma\left(\frac{x-\mu(x)}{\sigma(x)}\right)+\beta$$

where $\gamma, \beta \in \mathbb{R}^{C}$ are affine parameters learned from data, and $\mu(x), \sigma(x) \in \mathbb{R}^{C}$ are the mean and standard deviation, computed across the batch and spatial dimensions independently for each feature channel:

$$\mu_{c}(x)=\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{n c h w}$$

$$\sigma_{c}(x)=\sqrt{\frac{1}{N H W} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W}\left(x_{n c h w}-\mu_{c}(x)\right)^{2}+\epsilon}$$

Instance Normalization

The original feed-forward stylization method [51] uses BN layers after the convolutional layers. Ulyanov et al. [52] found using Instance Normalization…
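The batch-normalization statistics above can be checked with a small plain-Python sketch. It assumes the input is a nested list indexed as `x[n][c][h][w]`; the function names and the default `eps` value are illustrative choices, not taken from the paper.

```python
import math


def batch_norm_stats(x, eps=1e-5):
    """Per-channel mean and standard deviation over the N, H and W axes.

    x is a nested list indexed as x[n][c][h][w] (assumed layout).
    Returns (mus, sigmas), each a list of length C.
    """
    num_channels = len(x[0])
    mus, sigmas = [], []
    for ch in range(num_channels):
        # Gather every value of this channel across batch and spatial positions.
        values = [v for sample in x for row in sample[ch] for v in row]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        mus.append(mu)
        sigmas.append(math.sqrt(var + eps))
    return mus, sigmas


def batch_norm(x, gamma, beta, eps=1e-5):
    """Apply BN(x) = gamma * (x - mu) / sigma + beta channel by channel."""
    mus, sigmas = batch_norm_stats(x, eps)
    return [
        [
            [
                [gamma[ch] * (v - mus[ch]) / sigmas[ch] + beta[ch] for v in row]
                for row in sample[ch]
            ]
            for ch in range(len(sample))
        ]
        for sample in x
    ]
```

With `gamma = 1` and `beta = 0`, each output channel has approximately zero mean and unit standard deviation over the batch, which is what the formula for $\mathrm{BN}(x)$ expresses.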
Source

Authors: Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang
Paper: [CVPR 2020] https://arxiv.org/abs/2003.08436
Code: https://github.com/mingsun-tse/collaborative-distillation

Contributions

It proposes a new knowledge distillation method, "Collaborative Distillation", based on the exclusive collaborative relation between the encoder and its decoder. It proposes to restrict the student to learn a linear embedding of the teacher's outputs, which boosts its learning. Experiments are conducted with different stylization frameworks, such as WCT and AdaIN.

Related Works

Style Transfer

WCT: Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., & Yang, M. H. (2017). Universal style transfer via feature transforms. arXiv preprint arXiv:1705.08086.
AdaIN: Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1501-1510).

Model Compression

- low-rank decomposition
- pruning
- quantization
- knowledge distillation

Knowledge distillation is a promising model compression method that transfers the knowledge of large networks (called teachers) to small networks (called students), where the knowledge can be softened probability (which can reflect the inherent class similarity structure known as dark knowledge) or sample relations (which…
Step 1: Open the "Zoom" software from the Desktop or the Start menu.
Step 2: Click "Schedule" on the Zoom home page.
Step 3: Set the basic information of the scheduled meeting.
Step 4: Copy the invitation link from the Zoom home page.
Step 5: Paste the link and send it to others.

An example:
Scaled-YOLOv4: Scaling Cross Stage Partial Network

In these reading notes:

- We review some basic model scaling methods: width, depth, resolution, and compound scaling.
- We compute the number of operations of residual blocks and show how it relates to the input image size (quadratic), the number of layers (linear), and the number of filters (quadratic).
- We present the proposed Cross-Stage Partial (CSP) method, which reduces the number of operations and improves the performance of basic CNN layers.

The slides (PPT) can be downloaded from: https://connectpolyu-my.sharepoint.com/:p:/g/personal/18048204r_connect_polyu_hk/ET9zlHku9TFApqdl1A5NTV8BjFXPLizhCMupm6Ohcbehig?e=hhLlyc
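The scaling relations in the second point can be reproduced with a short operation count. The sketch below assumes a bottleneck-style residual layer (1x1 convolution to b/4 channels, 3x3 convolution at b/4 channels, 1x1 convolution back to b channels), stride 1 and "same" padding, and counts multiply-accumulates only. The function names are hypothetical and the block layout is an assumption about what the notes computed.

```python
def conv_ops(width, height, in_channels, out_channels, kernel_size):
    """Multiply-accumulate count of one stride-1, same-padded convolution."""
    return width * height * in_channels * out_channels * kernel_size * kernel_size


def residual_stage_ops(width, height, num_layers, base_channels):
    """Operations of num_layers bottleneck residual layers (assumed layout).

    Each layer: conv 1x1 (b -> b/4), conv 3x3 (b/4 -> b/4), conv 1x1 (b/4 -> b).
    base_channels is assumed to be divisible by 4.
    """
    reduced = base_channels // 4
    per_layer = (
        conv_ops(width, height, base_channels, reduced, 1)
        + conv_ops(width, height, reduced, reduced, 3)
        + conv_ops(width, height, reduced, base_channels, 1)
    )
    return num_layers * per_layer


base = residual_stage_ops(8, 8, 2, 16)
print(residual_stage_ops(16, 16, 2, 16) / base)  # image size doubled
print(residual_stage_ops(8, 8, 4, 16) / base)    # number of layers doubled
print(residual_stage_ops(8, 8, 2, 32) / base)    # number of filters doubled
```

Under these assumptions the total is 17·w·h·k·b²/16, so doubling the image side or the filter count multiplies the cost by four, while doubling the number of layers only doubles it.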