Abstract
Deep convolutional neural networks accurately classify a diverse range of
natural images, but can be easily deceived when carefully designed, imperceptible
perturbations are embedded in the images. In this paper, we design a
multi-pronged training, input transformation, and image ensemble system that is
attack-agnostic and not easily estimated. Our system incorporates two novel
features. The first is a transformation layer that computes feature-level
polynomial kernels from class-level training data samples and iteratively
updates input image copies at inference time based on their feature kernel
differences to create an ensemble of transformed inputs. The second is a
classification system that incorporates the prediction of the undefended
network with a hard vote on the ensemble of filtered images. Our evaluations on
the CIFAR-10 dataset show that our system improves the robustness of an undefended
network against a variety of bounded and unbounded white-box attacks under
different distance metrics, while sacrificing little accuracy on clean images.
Against adaptive full-knowledge attackers creating end-to-end attacks, our
system successfully augments the existing robustness of adversarially trained
networks, to which our methods are most effectively applied.
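
As a rough illustration of the two ingredients named above, the following minimal Python sketch shows a polynomial kernel on feature vectors and a hard vote that folds in the undefended network's prediction. It is not the implementation evaluated in the paper: the function names, the kernel degree and offset, the tie-breaking rule, and the choice to count the undefended prediction as one extra vote are all assumptions made for illustration.

```python
from collections import Counter


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree on two feature vectors.

    The degree and coef0 defaults are illustrative assumptions.
    """
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (dot + coef0) ** degree


def hard_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def combine(undefended_label, ensemble_labels):
    """Combine the undefended prediction with the ensemble's predictions.

    Assumption: the undefended prediction counts as one extra vote and is
    listed first, so it wins ties against labels with an equal count.
    """
    return hard_vote([undefended_label, *ensemble_labels])
```

For example, `combine(3, [5, 5, 2])` selects class 5 because two transformed copies agree on it, while `combine(3, [3, 5])` keeps class 3.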