Abstract
Deep neural networks (DNNs) are increasingly integrated into a wide range of mobile and Internet-of-Things (IoT) applications, such as face recognition, object detection, and speech assistants. DNNs are now increasingly trained and used for inference on local, private data that mobile devices generate or collect, so that models suit users' personal needs and adapt quickly to changing environmental conditions. Recent efforts have enabled efficient DNN training on resource-constrained mobile devices. However, these efforts struggle to meet the immediate deep learning needs of many existing mobile devices, which vary substantially in computing capability. Edge computing is an alternative approach to deep learning on resource-constrained mobile devices. Unlike centralized cloud computing, edge computing lets mobile devices use a nearby computing server located at the edge of the network. Although edge computing offers clear advantages for deep learning, previous empirical studies show that its performance is highly sensitive to the bandwidth between edge servers and mobile devices. Current research therefore focuses on collaborative deep learning between mobile devices and edge servers, which can leverage the strengths of both on-device processing and computation offloading. In this proposal, we investigate three collaborative deep learning systems for edge intelligence.
First, we build a collaborative deep inference system between a resource-constrained mobile device and a powerful edge server, aiming to combine the strengths of on-device processing and computation offloading. Second, we present CrossVision, a distributed framework for real-time video analytics that retains all video data on the cameras while achieving low inference delay and high inference accuracy. Finally, we propose E3Pose, an energy-efficient, edge-assisted multi-camera system for real-time multi-human 3D pose estimation, built on the key idea of adaptive camera selection.