paper: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Overall architecture

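The main modules are shown below.

(a) is the initial block.
(b) is the main building block of the overall architecture, modeled on the ResNet bottleneck, where: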

  • regularizer: Spatial Dropout, with p = 0.01 before bottleneck2.0 and p = 0.1 afterwards.
  • bottleneck downsampling: a max pooling layer is added to the main branch, and the first 1x1 projection is replaced with a 2x2 conv with stride 2.
  • bottleneck dilated: the conv is a dilated convolution.
  • bottleneck asymmetric: the conv is an asymmetric (factorized) convolution, e.g. a 5x1 conv followed by a 1x5 conv.
  • bottleneck upsampling (decoder): max pooling is replaced with max unpooling.
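
The pairing of max pooling in the downsampling bottleneck with max unpooling in the decoder can be illustrated with a small pure-Python sketch. It is not ENet's implementation: the function names `max_pool_2x2` and `max_unpool_2x2` and the 4x4 `feature_map` are hypothetical, and the sketch assumes a single-channel map with even height and width. The point is only that the argmax positions saved while pooling tell the unpooling step where to put each value back.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D list; also returns argmax positions."""
    h, w = len(x), len(x[0])  # assumed even (illustrative simplification)
    pooled, indices = [], []
    for i in range(0, h, 2):
        row_vals, row_idx = [], []
        for j in range(0, w, 2):
            # All four positions inside the current 2x2 window.
            window = [(i + di, j + dj) for di in (0, 1) for dj in (0, 1)]
            best = max(window, key=lambda p: x[p[0]][p[1]])
            row_vals.append(x[best[0]][best[1]])
            row_idx.append(best)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Place each pooled value back at its saved position; everything else is 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for row_vals, row_idx in zip(pooled, indices):
        for value, (r, c) in zip(row_vals, row_idx):
            out[r][c] = value
    return out


# Hypothetical single-channel feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 1],
]
pooled, indices = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, indices, 4, 4)
print(pooled)    # [[4, 5], [6, 7]]
print(restored)  # [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [0, 6, 0, 0]]
```

The unpooled map is sparse, which is why a decoder normally follows unpooling with convolutions that fill in the zeros.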

Key features:

  • use dilated convolutions to have a wide receptive field
  • early downsampling
  • a large encoder and a small decoder
  • use PReLUs, not ReLUs
  • factorizing filters by using 5x1 conv and 1x5 conv
  • regularization choice evolved from L2 weight decay, to stochastic depth, to Spatial Dropout
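
Two of the points above, the wide receptive field from dilated convolutions and the savings from factorized filters, come down to simple arithmetic. The sketch below is a rough illustration under stated assumptions: stride-1 convolutions, no bias terms, equal input and output channel counts, and illustrative values (dilation rates 2, 4, 8, 16 and 32 channels) chosen for the example rather than taken from these notes. The helper names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A dilated kernel spans (kernel_size - 1) * d + 1 input positions,
        # so each layer widens the field by (kernel_size - 1) * d.
        rf += (kernel_size - 1) * d
    return rf


def conv_weights(kh, kw, channels):
    """Weight count of a kh x kw conv with `channels` inputs and outputs, no bias."""
    return kh * kw * channels * channels


# Four 3x3 convs: plain versus dilated (rates are an illustrative assumption).
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [2, 4, 8, 16])
print(plain, dilated)  # 9 61

# One 5x5 conv versus a 5x1 conv followed by a 1x5 conv, 32 channels (assumed).
full_5x5 = conv_weights(5, 5, 32)
factorized = conv_weights(5, 1, 32) + conv_weights(1, 5, 32)
print(full_5x5, factorized)  # 25600 10240
```

With the same number of layers and weights, dilation grows the receptive field from 9 to 61 pixels per axis in this example, while the 5x1 + 1x5 pair covers the same 5x5 window with 40% of the weights.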

paper: LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation [ website | pytorch_code ]

Overall architecture

The encoder block and the decoder block are shown below, respectively.




  • uses ResNet18 as its encoder
  • links each encoder block to its corresponding decoder block, with the aim of recovering spatial information lost during downsampling
  • gives real-time performance even on the NVIDIA TX1 embedded system module
  • uses the class weighting scheme $w_{\text{class}} = \frac{1}{\ln(1.02 + p_{\text{class}})}$, which works better than mean average frequency weighting
  • evaluated on the Cityscapes and CamVid datasets
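
The class weighting formula above is easy to check numerically. The following is a minimal sketch, assuming that p_class is the fraction of training pixels belonging to a class; the function name `class_weights` and the pixel counts are hypothetical and only serve to show the shape of the weights.

```python
import math


def class_weights(pixel_counts, c=1.02):
    """w_class = 1 / ln(c + p_class), with p_class the pixel fraction of the class."""
    total = sum(pixel_counts.values())
    return {
        name: 1.0 / math.log(c + count / total)
        for name, count in pixel_counts.items()
    }


# Hypothetical pixel counts for three classes.
counts = {"road": 700, "car": 250, "pedestrian": 50}
weights = class_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.2f}")
```

Rare classes get larger weights, and because p_class lies in [0, 1] the weights stay bounded between 1/ln(2.02) (about 1.42) and 1/ln(1.02) (about 50.5), so a very rare class cannot dominate the loss.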
Experimental results
  • Cityscapes
  • CamVid

paper: The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

Overall architecture



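The DB (Dense Block), the layer used inside a DB, the TD (Transition Down, which reduces spatial size), and the TU (Transition Up, which increases spatial size) are shown below.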


  • uses the DenseNet architecture
  • evaluated on the CamVid and Gatech datasets
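
To make the DenseNet-style growth concrete, here is a rough sketch of the channel bookkeeping in a downsampling path built from Dense Blocks. It rests on assumptions not stated in these notes: each layer in a DB adds `growth_rate` feature maps that are concatenated to its input, the Transition Down step keeps the channel count unchanged, and the concrete numbers (48 initial channels, growth rate 16, blocks of 4, 5, and 7 layers) are illustrative. The function names are hypothetical.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Return the input width of every layer in a DB and the block's output width."""
    channels = in_channels
    per_layer_inputs = []
    for _ in range(num_layers):
        per_layer_inputs.append(channels)
        # Each layer's growth_rate new maps are concatenated to what it received.
        channels += growth_rate
    return per_layer_inputs, channels


def down_path(first_conv_channels, layers_per_block, growth_rate):
    """Channel count after the first conv and after each DB on the way down."""
    channels = first_conv_channels
    trace = [channels]
    for n in layers_per_block:
        _, channels = dense_block_channels(channels, n, growth_rate)
        # Assumption: Transition Down halves the resolution but keeps `channels`.
        trace.append(channels)
    return trace


trace = down_path(48, [4, 5, 7], 16)
print(trace)  # [48, 112, 192, 304]
```

The channel count grows linearly with the number of layers, which is why a decoder built from the same blocks has to be careful about what it carries through each Transition Up.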
Experimental results
  • CamVid