Update to mirrors.aliyun.com, reference this in step 1.
paper: Fast R-CNN
SPPnet solved R-CNN's main cost problem: R-CNN runs the CNN separately on each of the ~2k region proposals, which takes a lot of time. SPPnet runs the convolutional layers only once on the entire image (regardless of the number of windows), then applies spatial pyramid pooling (SPP) on the feature maps to extract a fixed-length feature for each window. However, SPPnet is not trained end-to-end: the extracted features must be written to disk, and classification and bbox regression are trained in separate stages. Fast R-CNN trains in a single stage with a multi-task loss and can back-propagate end-to-end.
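To make "compute the feature map once, then pool each window to a fixed length" concrete, here is a minimal pure-Python sketch of spatial pyramid max pooling over one window of a single-channel feature map. Assumptions: the window is already given in feature-map coordinates (the image-to-feature-map projection is omitted), the pyramid levels `(1, 2, 4)` are an illustrative choice rather than SPPnet's actual configuration, and the name `spp_window` is hypothetical. Fast R-CNN's RoI pooling layer is the special case with a single pyramid level (e.g. 7×7 for VGG16).

```python
import math


def spp_window(fmap, window, levels=(1, 2, 4)):
    """Max-pool one window of a single-channel feature map to a fixed length.

    fmap:   list of rows (H x W numbers), computed once for the whole image.
    window: (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    levels: pyramid levels (illustrative choice); level n gives n x n bins.
    Returns a list of sum(n * n for n in levels) values, whatever the window size.
    """
    x0, y0, x1, y1 = window
    w, h = x1 - x0, y1 - y0
    out = []
    for n in levels:
        for i in range(n):        # bin row
            for j in range(n):    # bin column
                # floor/ceil bin edges keep every bin non-empty for w, h >= 1
                ys = y0 + math.floor(i * h / n)
                ye = y0 + math.ceil((i + 1) * h / n)
                xs = x0 + math.floor(j * w / n)
                xe = x0 + math.ceil((j + 1) * w / n)
                out.append(max(fmap[y][x]
                               for y in range(ys, ye)
                               for x in range(xs, xe)))
    return out


if __name__ == "__main__":
    H = W = 8
    # Stand-in for a conv feature map, computed once per image.
    fmap = [[float(y * W + x) for x in range(W)] for y in range(H)]
    # Windows of different sizes reuse the same feature map.
    for win in [(0, 0, 8, 8), (2, 2, 5, 4)]:
        print(win, len(spp_window(fmap, win)))  # 21 values for each window
    # RoI pooling as a single pyramid level (2 x 2 here for brevity).
    print(spp_window(fmap, (0, 0, 8, 8), levels=(2,)))
```

Because the output length depends only on the pyramid levels, windows of any size can feed the same fully connected layers, while the expensive convolutions are shared across all windows.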
paper: Selective Kernel Networks
It is well known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, a property rarely considered in constructing CNNs. In the Selective Kernel (SK) unit, multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches.
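The fuse-and-select step can be sketched in pure Python as below: branch outputs are summed, global-average-pooled per channel into `s`, squeezed into a compact vector `z = ReLU(W s)` (in the paper `z` has dimension `d = max(C/r, L)`), and then, for each channel, a softmax across branches over `A_m[c] · z` gives the weights used to mix the branches. This is a minimal sketch under stated assumptions: the convolutions that produce the branch outputs (e.g. 3×3 and 5×5 kernels) are omitted and passed in as inputs, batch normalization after the squeeze FC is dropped, and `sk_select`, `W`, and `A` are hypothetical names for illustration.

```python
import math


def sk_select(branches, W, A):
    """Fuse and select step of an SK unit (BN omitted, convs assumed done).

    branches: M branch outputs; each is a list of C channels, and each
              channel is a flat list of spatial values (same length).
    W: d x C matrix (list of rows), the squeeze FC layer.
    A: list of M matrices, each C x d, one per branch.
    Returns (V, attn): V has C fused channels; attn[c] holds M weights.
    """
    M, C = len(branches), len(branches[0])
    # Fuse: element-wise sum over branches, then global average pooling.
    s = []
    for c in range(C):
        summed = [sum(vals) for vals in zip(*(br[c] for br in branches))]
        s.append(sum(summed) / len(summed))
    # Compact feature z = ReLU(W s).
    z = [max(0.0, sum(w * sc for w, sc in zip(row, s))) for row in W]
    # Select: per-channel softmax across branches, then weighted sum.
    V, attn = [], []
    for c in range(C):
        logits = [sum(a * zk for a, zk in zip(A[m][c], z)) for m in range(M)]
        top = max(logits)  # subtract max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        weights = [e / sum(exps) for e in exps]
        attn.append(weights)
        n = len(branches[0][c])
        V.append([sum(weights[m] * branches[m][c][i] for m in range(M))
                  for i in range(n)])
    return V, attn


if __name__ == "__main__":
    # Two branches (say, small- and large-kernel outputs), C = 2, 2 positions.
    branches = [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [5.0, 6.0]]]
    W = [[1.0, 1.0]]                       # d = 1
    A = [[[1.0], [1.0]], [[0.0], [0.0]]]   # favors branch 0
    V, attn = sk_select(branches, W, A)
    print(attn)
    print(V)
```

Because the weights for each channel sum to 1 across branches, the unit effectively chooses, per channel, how much of each receptive field size to use, and that choice depends on the input through `s` and `z`.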