Abstract
<p>In modern data science, large-scale data with hidden structures are emerging rapidly across a wide range of applications. This thesis addresses two challenging problems in the analysis of high-dimensional data with hidden structures.</p>
<p>In the first part, we aim to estimate the latent network structure underlying large-scale multivariate point process data, a task that arises in numerous scientific and business applications. To characterize the complex processes underlying the observed data, we propose a new and flexible class of nonstationary Hawkes processes that allows both excitatory and inhibitory effects. We estimate the latent network structure using an efficient sparse least squares estimation approach. Using a thinning representation, we establish concentration inequalities for the first- and second-order statistics of the proposed Hawkes process. These results enable us to establish a non-asymptotic error bound and the selection consistency of the estimated parameters. Furthermore, we describe a statistic, based on the least squares loss, for testing whether the baseline is constant in time.</p>
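<p>To make the role of thinning concrete, the following is a minimal sketch of Ogata-style thinning used to simulate a Hawkes process. It is not the thesis's model: it assumes a single dimension, a constant baseline, a purely excitatory exponential kernel, and hypothetical names (<code>simulate_hawkes</code>, <code>mu</code>, <code>alpha</code>, <code>beta</code>, <code>horizon</code>) chosen only for illustration. Candidate points are drawn from a dominating Poisson process and each is kept with probability equal to the ratio of the true intensity to the dominating rate.</p>

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Assumed (illustrative) intensity, not the thesis's nonstationary model:
        lambda(t) = mu + sum over past events s of alpha * exp(-beta * (t - s))
    with mu > 0, alpha >= 0 (excitation only) and beta > 0.
    """
    if mu <= 0 or alpha < 0 or beta <= 0:
        raise ValueError("this sketch requires mu > 0, alpha >= 0, beta > 0")

    rng = random.Random(seed)
    events = []

    def intensity(t):
        # Intensity at time t given all events accepted so far.
        return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)

    t = 0.0
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = intensity(t)
        # Draw the next candidate from a Poisson process with rate lam_bar.
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        # Keep the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t):
            events.append(t)
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0, seed=1)
    print(len(times), "events; first few:", [round(x, 3) for x in times[:5]])
```

<p>In this sketch the ratio <code>alpha / beta</code> is the expected number of events directly triggered by one event, so values below 1 keep the simulated process from exploding. Handling inhibitory effects or a time-varying baseline, as in the proposed class, would require a different dominating rate and is not covered here.</p>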
<p>In the second part, we consider the problem of jointly modeling and clustering populations of tensors, an important task in many scientific and business fields. Specifically, we introduce a flexible high-dimensional tensor mixture model with heterogeneous covariances. The proposed model exploits the intrinsic structure of tensor data: its means are assumed to be low-rank and internally sparse, and its heterogeneous covariances are assumed to be separable and conditionally sparse. We develop an efficient high-dimensional expectation-conditional-maximization (HECM) algorithm that breaks the challenging optimization in the M-step into several simpler conditional optimization problems, each of which is convex, admits regularization, and has closed-form updates. We show that the proposed HECM algorithm, with an appropriate initialization, converges geometrically to a neighborhood that is within statistical precision of the true parameter.</p>