Statistical Inference of High-dimensional Data Structures: Point Processes and Tensors

Biao Cai

Dissertation

Open access

Doctor of Philosophy (PhD), University of Miami

2021-06

Abstract

Keywords: non-stationary multivariate Hawkes process; non-asymptotic error bound; selection consistency; high-dimensional ECM algorithm; tensor clustering; tensor decomposition

In modern data science, large-scale data with hidden structures are emerging rapidly in a wide range of applications. In this thesis, we address two challenging issues in analyzing high-dimensional data with hidden structures. In the first part, we aim to estimate the latent network structure underlying large-scale multivariate point process data. This research task is useful in numerous scientific and business applications. To characterize the complex processes underlying the observed data, we propose a new and flexible class of non-stationary Hawkes processes that allow both excitatory and inhibitory effects. We estimate the latent network structure using an efficient sparse least squares estimation approach. Using a thinning representation, we establish concentration inequalities for the first and second order statistics of the proposed Hawkes process. Such theoretical results enable us to establish the non-asymptotic error bound and the selection consistency of the estimated parameters. Furthermore, we describe a least squares loss based statistic for testing whether the baseline is constant in time.

In the second part, we consider the problem of jointly modeling and clustering populations of tensors. This is an important task in many scientific and business fields. Specifically, we introduce a flexible high-dimensional tensor mixture model with heterogeneous covariances. The proposed mixture model exploits the intrinsic structures of tensor data, and is assumed to have means that are low-rank and internally sparse as well as heterogeneous covariances that are separable and conditionally sparse. We develop an efficient high-dimensional expectation-conditional-maximization (HECM) algorithm that breaks the challenging optimization in the M-step into several simpler conditional optimization problems, each of which is convex, admits regularization and has closed-form updating formulas. We show that the proposed HECM algorithm, with an appropriate initialization, converges geometrically to a neighborhood that is within statistical precision of the true parameter.

Files and links (1)

pdf

bxc511S217.81 MB

Metrics

10 File views/downloads

142 Record Views

Details

Title: Statistical Inference of High-dimensional Data Structures: Point Processes and Tensors
Creators: Biao Cai
Contributors: Emma Jingfei Zhang (Committee Member); Yongtao Guan (Committee Member); Lan Wang (Committee Member); J. Sunil Rao (Committee Member)
Theses and Dissertations: Doctor of Philosophy (PhD), University of Miami; Dissertation
Degree in: Management Science
Date of defense: 2021-06-17
Academic Unit: MHBS - Management Science
Language: English
Resource Type: Dissertation
Record Identifier: 991031596386702976