Review: Fast-SCNN
In this post, Fast-SCNN (fast segmentation convolutional neural network) [1] is briefly reviewed. This architecture targets real-time semantic segmentation: it reaches 123.5 frames per second on the Cityscapes dataset at full input resolution (1024 × 2048 px), while keeping the network capacity small.
Outline
- Fast-SCNN Architecture
- Learning to Downsample
- Experiment Results
- Ablation Studies
1. Fast-SCNN Architecture
- As shown above, Fast-SCNN is composed of four modules: Learning to Downsample, Global Feature Extractor, Feature Fusion, and Classifier.
- All modules are built using depth-wise separable convolution.
- The reason is that depth-wise separable convolution has become a key building block in many efficient DCNN architectures such as Xception, MobileNet, and ContextNet (a minimal sketch follows the parameter list below).
- The layout is shown above, where the horizontal lines separate the modules.
- Parameter explanation
- t: expansion factor of the bottleneck block
- c: number of output channels
- n: number of times that block is repeated
- s: stride parameter, applied only to the first block in each repeated sequence
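As a concrete illustration, here is a minimal PyTorch sketch of the two building blocks wired by the (t, c, n, s) parameters above: a depth-wise separable convolution and a MobileNetV2-style bottleneck stage. PyTorch itself and all class/function names here are my own assumptions, not from the paper.

```python
import torch.nn as nn

class DSConv(nn.Module):
    """Depth-wise separable convolution: a 3x3 depth-wise conv (one filter per
    channel, groups=in_ch) followed by a 1x1 point-wise conv that mixes channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Bottleneck(nn.Module):
    """Inverted residual block: expand channels by t, apply a depth-wise 3x3
    with stride s, project down to c channels; residual only when shapes match."""
    def __init__(self, in_ch, c, t, s):
        super().__init__()
        hidden = in_ch * t
        self.use_res = (s == 1 and in_ch == c)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, s, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, c, 1, bias=False),
            nn.BatchNorm2d(c),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out

def make_stage(in_ch, c, t, n, s):
    """Stack n bottleneck blocks; the stride s applies only to the first one."""
    layers = [Bottleneck(in_ch, c, t, s)]
    layers += [Bottleneck(c, c, t, 1) for _ in range(n - 1)]
    return nn.Sequential(*layers)
```

If I read the layout table correctly, a call such as make_stage(64, 64, t=6, n=3, s=2) would correspond to the first bottleneck stage of the global feature extractor.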
2. Learning to Downsample
- Current state-of-the-art real-time semantic segmentation methods are built from networks with two branches operating at distinct resolutions.
- These methods learn global information from a low-resolution version of the input image, while a shallow network refines the precision of the segmentation results at full input resolution.
- Since it is well known that DCNNs extract low-level features such as corners and edges in their first few layers, the authors argue that sharing feature computation between the low- and high-level branches in a shallow network block boosts performance (see the sketch below).
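A minimal sketch of this shared learning-to-downsample module, reusing the DSConv class from the earlier snippet. The channel sizes (32, 48, 64) and the three stride-2 stages (output at 1/8 input resolution) follow the paper's layout table, while the class name is my own.

```python
import torch.nn as nn

class LearningToDownsample(nn.Module):
    """Shared shallow stem: one standard conv plus two depth-wise separable
    convs, each with stride 2, so features come out at 1/8 input resolution."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        )
        self.dsconv1 = DSConv(32, 48, stride=2)  # DSConv from the sketch above
        self.dsconv2 = DSConv(48, 64, stride=2)

    def forward(self, x):
        return self.dsconv2(self.dsconv1(self.conv(x)))
```

Both the deep branch (global feature extractor) and the shallow skip branch then consume this module's output, which is exactly the shared computation the authors describe.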
3. Experiment Results
- The authors evaluate Fast-SCNN on the Cityscapes dataset.
- Furthermore, they add 20,000 weakly annotated images (coarse labels) during training.
- They report results for three groups: fine labels only, fine with coarse labeled data, and both.
- Only 19 classes are used for evaluation.
- For comparison, ContextNet, BiSeNet, GUN, ENet, and ICNet are chosen, as they are SOTA real-time semantic segmentation methods.
- For the runtime comparison, the proposed Fast-SCNN is split into two variants: Fast-SCNN cls and Fast-SCNN prob.
- The reason is that the softmax operation is costly at inference time; as a consequence, they replace softmax with argmax when the network is in inference mode (see the sketch after this list).
- Fast-SCNN cls denotes the version with softmax replaced by argmax.
- Fast-SCNN prob denotes the standard version.
- The tables above show that Fast-SCNN outperforms the other SOTA methods.
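The swap is trivial to sketch: because softmax is monotonic, the per-pixel argmax of the raw logits yields exactly the same class map as the argmax of the softmax probabilities, so the exponentials can be skipped when only hard labels are needed. A hedged illustration (the function name is mine, not from the paper):

```python
import torch

@torch.no_grad()
def predict(model, image, return_probs=False):
    logits = model(image)            # (N, num_classes, H, W)
    if return_probs:                 # 'prob' variant: per-pixel probabilities
        return torch.softmax(logits, dim=1)
    return logits.argmax(dim=1)      # 'cls' variant: per-pixel class IDs
```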
4. Ablation Studies
Pre-training and Weakly Labeled Data
- High-capacity DCNNs such as R-CNN and PSPNet have shown that performance can be boosted by pre-training on auxiliary tasks.
- Since the authors specify Fast-SCNN as having low capacity, they test its performance with and without pre-training, and with and without additional weakly labeled data.
- Also, it seems that the importance of pre-training and additional weakly labeled data for low-capacity DCNNs has not been studied before.
- As shown in Table 6, it seems that neither pre-training nor weakly labeled data boosts the performance of a low-capacity DCNN.
Zero-out Skip Connection
- The authors run this test to confirm whether the skip connection benefits Fast-SCNN or not.
- By zeroing out the skip connection, the mIoU drops from 69.22% to 64.30% on the validation dataset, and Figure 3 compares the results with and without the zeroed-out skip connection (a sketch of the ablation follows).
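A minimal way to run this kind of ablation, sketched with an illustrative fusion module of my own (not the paper's exact feature fusion block): the skip tensor is simply replaced by zeros so only the deep branch contributes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionWithOptionalSkip(nn.Module):
    """Toy fusion of a high-resolution skip branch with upsampled deep features;
    setting zero_skip=True reproduces the zero-out ablation."""
    def __init__(self, channels, zero_skip=False):
        super().__init__()
        self.zero_skip = zero_skip
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, skip_feat, deep_feat):
        # Upsample the low-resolution deep features to the skip branch's size.
        deep = F.interpolate(deep_feat, size=skip_feat.shape[2:],
                             mode='bilinear', align_corners=False)
        if self.zero_skip:
            skip_feat = torch.zeros_like(skip_feat)  # kill the skip contribution
        return self.project(skip_feat) + deep
```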
Lower Input Resolutions
- Since the authors are interested in embedded devices that lack full-resolution inputs or powerful computational resources, they study this setting with half and quarter input resolutions (see the sketch below).
- As shown in Table 7, the authors conclude that Fast-SCNN is directly applicable to lower input resolutions without modification.
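A hedged sketch of how such scaled evaluation is typically done (the helper name and resizing choices are mine; the paper does not publish this code): downsample the input, run the forward pass, then upsample the logits back to full size before scoring mIoU.

```python
import torch.nn.functional as F

def evaluate_at_scale(model, image, scale):
    """Run inference at a reduced input scale (0.5 or 0.25 for Table 7's
    half and quarter resolutions) and return full-size class predictions."""
    h, w = image.shape[2:]
    small = F.interpolate(image, scale_factor=scale,
                          mode='bilinear', align_corners=False)
    logits = model(small)
    logits = F.interpolate(logits, size=(h, w),
                           mode='bilinear', align_corners=False)
    return logits.argmax(dim=1)
```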
Reference
[1] Rudra P. K. Poudel, Stephan Liwicki, Roberto Cipolla, "Fast-SCNN: Fast Semantic Segmentation Network," arXiv:1902.04502, 2019. https://arxiv.org/abs/1902.04502