Paper: SQUEEZENET: ALEXNET-LEVEL ACCURACY WITH 50X FEWER PARAMETERS AND <0.5MB MODEL SIZE (with PyTorch code)

Why do we need smaller CNN architectures?

  • More efficient distributed training;
  • Less overhead when exporting new models to clients;
  • Feasible FPGA and embedded deployment.

SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques, SqueezeNet can be compressed to less than 0.5MB (510× smaller than AlexNet).

Fire module
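
The Fire module first squeezes the input with a 1x1 convolution layer, then expands it through parallel 1x1 and 3x3 convolution branches whose outputs are concatenated along the channel dimension. The implementation below also inserts batch normalization after every convolution, a common addition that is not in the original paper.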

import torch
import torch.nn as nn


class Fire(nn.Module):
    def __init__(self, in_channel, out_channel, squeeze_channel):
        super().__init__()
        # Squeeze layer: 1x1 convolutions reduce the number of input
        # channels seen by the 3x3 filters (strategies 1 and 2).
        self.squeeze = nn.Sequential(
            nn.Conv2d(in_channel, squeeze_channel, 1),
            nn.BatchNorm2d(squeeze_channel),
            nn.ReLU(inplace=True)
        )
        # Expand layer: parallel 1x1 and 3x3 branches, each producing
        # half of the output channels.
        self.expand_1x1 = nn.Sequential(
            nn.Conv2d(squeeze_channel, out_channel // 2, 1),
            nn.BatchNorm2d(out_channel // 2),
            nn.ReLU(inplace=True)
        )
        self.expand_3x3 = nn.Sequential(
            nn.Conv2d(squeeze_channel, out_channel // 2, 3, padding=1),
            nn.BatchNorm2d(out_channel // 2),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.squeeze(x)
        # Concatenate both expand branches along the channel dimension.
        x = torch.cat([
            self.expand_1x1(x),
            self.expand_3x3(x)
        ], dim=1)
        return x
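
As a quick shape check (the channel counts follow fire2 of the paper: 96 input channels squeezed to 16 and expanded to 128; the 55x55 spatial size is just an example), the module changes only the channel count:

x = torch.randn(1, 96, 55, 55)   # e.g. activations after an initial conv + maxpool
fire = Fire(in_channel=96, out_channel=128, squeeze_channel=16)
print(fire(x).shape)             # torch.Size([1, 128, 55, 55])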

Architecture

It was designed following three main strategies (a sketch of the full network appears after the list):

  • Replace 3x3 filters with 1x1 filters (Fire module);
  • Decrease the number of input channels to 3x3 filters (Fire module);
  • Downsample late in the network so that convolution layers have large activation maps.
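
To see how the strategies compose, here is a minimal sketch of the vanilla SqueezeNet macroarchitecture built from the Fire module above; the filter counts follow Table 1 of the paper, while the SqueezeNet class name and num_classes parameter are illustrative. Pooling is deferred until after fire4 and fire8 (strategy 3), and a 1x1 convolution plus global average pooling replaces fully connected layers in the classifier.

class SqueezeNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 7, stride=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            Fire(96, 128, 16),           # fire2
            Fire(128, 128, 16),          # fire3
            Fire(128, 256, 32),          # fire4
            nn.MaxPool2d(3, stride=2),   # downsampling deferred (strategy 3)
            Fire(256, 256, 32),          # fire5
            Fire(256, 384, 48),          # fire6
            Fire(384, 384, 48),          # fire7
            Fire(384, 512, 64),          # fire8
            nn.MaxPool2d(3, stride=2),
            Fire(512, 512, 64),          # fire9
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),               # dropout after fire9, as in the paper
            nn.Conv2d(512, num_classes, 1),  # conv10: 1x1 conv instead of a fully connected layer
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),         # global average pooling
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x.flatten(1)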