Category: CNN Network Architecture Design

2 Posts

TLSC (Test-time Local Statistics Converter)
Revisiting Global Statistics Aggregation for Improving Image Restoration (消除图像复原中的“misalignment”,性能大幅提升) Paper: Revisiting Global Statistics Aggregation for Improving Image Restoration (AAAI 2022) arXiv:https://arxiv.org/pdf/2112.04491.pdf Code: https://github.com/megvii-research/tlsc Reference: [1] 消除图像复原中的“misalignment”,性能大幅提升 https://mp.weixin.qq.com/s/HRm6wPDBeopsmLtilF5K-A [2] https://xiaoqiangzhou.cn/post/chu_revisiting/ 问题的提出: Specifically, with the increasing size of patches for testing, the performance increases in the case of UNet while it increases and then decreases in UNet-IN and UNet-SE cases. 对于一个训练好的UNet模型,使用patch作为测试输入时,随着输入尺寸的增加,性能会变好 (comment:消除了边界效应,更好的边界信息融合)。但是,如果UNet中包含了IN (Instance Norm, 对spatial 平面做归一化) 或者SE (Channel Attention/ Squeeze-and-Excitation, 中间包含了global average pooing, 对spatial平面做平均),增加输入的尺寸,性能先升后降。这就表明,对现有的模型,“训练是patch,测试是全图,使用IN和SE的策略” 存在问题。 训练与测试阶段的不同全局统计聚合计算方式就导致了"misalignment",即出现了统计分布不一致现象。[1] 训练/测试阶段的基于图像块/完整图像特征的统计聚合计算差异会导致不同的分布,进而导致图像复原的性能下降(该现象被广泛忽视了)。 为解决该问题,我们提出一种简单的TLSC(Test-time Local Statistics Converter)方案,它在测试阶段的区域统计聚合操作由全局替换为局部。无需重新训练或微调,所提方案可以大幅提升已有图像复原方案的性能。[1] 解决方案 以SE为例,原本的global average pooling做法: 改进后,local statistics calculation 做法。每一个像素都在一个local的区域内(大小等于训练时的尺寸)去做平均。 对于边缘像素,复制padding,再做local statistics calculation 方法在论文中被拓展到Instance Norm 实验结果 原始的HiNet包含InstanceNorm,使用提出的方法后,性能获得提升。 原始的MPRNet包含SE模块,使用提出的方法后,性能获得提升。 数据分布的提升。 Another observation Full-image training causes severe performance loss in low-level vision task. This is explained by that full-images training lacks cropping augmentation [2]. 代码 # ------------------------------------------------------------------------ # Copyright (c) 2021 megvii-model. All Rights Reserved. 
# ------------------------------------------------------------------------ """ ## Revisiting Global Statistics Aggregation for Improving Image Restoration ## Xiaojie Chu, Liangyu Chen, Chengpeng Chen, Xin Lu """ import torch from torch import nn from torch.nn import functional as F from basicsr.models.archs.hinet_arch import HINet from basicsr.models.archs.mprnet_arch import MPRNet train_size=(1,3,256,256) class AvgPool2d(nn.Module):…
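The local statistics calculation can be sketched as below: a minimal stand-alone version, not the repository's implementation. The names `local_avg_pool` and `LocalSE` and the fixed `window` parameter are illustrative assumptions; the idea is simply to average each pixel over a training-patch-sized window with replicate padding at the borders, so the SE squeeze sees the same statistics scale at test time as during training.

```python
import torch
from torch import nn
from torch.nn import functional as F

def local_avg_pool(x, window):
    """Average each spatial position over a local window (the training
    patch size) instead of the whole feature map. Border pixels are
    handled with replicate padding, as described in the paper."""
    _, _, h, w = x.shape
    kh, kw = min(window, h), min(window, w)
    # pad so the stride-1 pooling output keeps the input resolution
    x = F.pad(x, (kw // 2, (kw - 1) // 2, kh // 2, (kh - 1) // 2),
              mode='replicate')
    return F.avg_pool2d(x, kernel_size=(kh, kw), stride=1)

class LocalSE(nn.Module):
    """SE block whose squeeze step uses local instead of global pooling,
    so test-time statistics match the patch statistics seen in training.
    Illustrative sketch; names do not follow the official TLSC code."""
    def __init__(self, channels, reduction=16, window=256):
        super().__init__()
        self.window = window
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # per-pixel local statistics replace the single global average
        return x * self.excite(local_avg_pool(x, self.window))
```

Because the pooling is stride-1 and padding preserves resolution, the excitation becomes a per-pixel map rather than one scalar per channel, and a pretrained SE model can be converted at test time without retraining.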
BoTNet (Bottleneck Transformers)
BoTNet (2021-01): embedding Self-Attention into a ResNet

Paper: Bottleneck Transformers for Visual Recognition
arXiv: https://arxiv.org/abs/2101.11605

Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework; surpassing the previous best published single model and single scale results of ResNeSt evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy…
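The swap the abstract describes (global multi-head self-attention in place of the 3×3 spatial convolution inside a ResNet bottleneck) can be sketched as follows. This is a minimal illustrative version, not the paper's code: the class names `MHSA2d` and `BoTBlock` are mine, and the relative position encodings that BoTNet does use are omitted for brevity.

```python
import torch
from torch import nn

class MHSA2d(nn.Module):
    """Global multi-head self-attention over a 2D feature map, a sketch of
    the layer BoTNet substitutes for the 3x3 conv (relative position
    encodings omitted)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)

    def forward(self, x):
        n, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # (N, C, H, W) -> (N, heads, H*W, C // heads)
        shape = (n, self.heads, c // self.heads, h * w)
        q, k, v = (t.reshape(shape).transpose(-1, -2) for t in (q, k, v))
        attn = (q @ k.transpose(-1, -2) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(-1, -2).reshape(n, c, h, w)
        return out

class BoTBlock(nn.Module):
    """ResNet-style bottleneck with MHSA replacing the 3x3 conv."""
    def __init__(self, dim, heads=4):
        super().__init__()
        mid = dim // 4  # standard bottleneck 4x channel reduction
        self.body = nn.Sequential(
            nn.Conv2d(dim, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            MHSA2d(mid, heads),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, dim, 1, bias=False),
            nn.BatchNorm2d(dim),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))
```

Because the attention is global over H×W positions, its cost grows quadratically with resolution, which is why the paper only applies it in the final, lowest-resolution stage of the ResNet.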