Empirical Likelihood and Bootstrap Inference with Constraints

Author: Chunlin Wang

Abstract: Empirical likelihood and the bootstrap play influential roles in contemporary statistics. This thesis studies two distinct statistical inference problems, referred to as Part I and Part II, which relate to empirical likelihood and the bootstrap, respectively.

Part I concerns statistical inference on multiple groups of samples that contain excess zero observations. A distinctive feature of the target populations is that the distribution of each group is a non-standard mixture of a singular distribution at zero and a skewed nonnegative component. We propose modelling the nonnegative components with a semiparametric, multiple-sample density ratio model (DRM). Under this semiparametric setup, information from the combined samples can be used efficiently even when the underlying distributions are unspecified. We first study the problem of testing homogeneity of multiple nonnegative distributions when the data contain an excess of zeros. We develop a new empirical likelihood ratio (ELR) test for homogeneity and show that the ELR statistic has a $\chi^2$-type limiting distribution under the homogeneous null hypothesis. A nonparametric bootstrap procedure is proposed to calibrate the finite-sample distribution of the ELR statistic, and its consistency is established under both the null and alternative hypotheses. Simulation studies show that the bootstrap ELR test has a type I error rate close to the nominal level, is robust to changes in the underlying distributions, and is competitive with, and sometimes more powerful than, several popular one- and two-part tests. A real data example illustrates the advantages of the proposed test. We next investigate the problem of comparing the means of multiple nonnegative distributions with excess zero observations under the same semiparametric setup. We develop a unified inference framework based on our new ELR statistic and show that it has a $\chi^2$-type limiting distribution under a general null hypothesis. This allows us to construct a new test for mean equality. Simulation results show that the proposed ELR test performs favourably compared with existing tests for mean equality, especially when the correctly specified basis function in the DRM is the logarithm function. A real data set is analyzed to illustrate the advantages of the proposed method.

In Part II, we investigate the asymptotic behaviour of the commonly used bootstrap percentile confidence intervals when the parameters are subject to inequality constraints. We concentrate on the important one- and two-sample problems with data generated from distributions in the natural exponential family, and focus on quantifying the asymptotic coverage probabilities of percentile confidence intervals based on bootstrapping maximum likelihood estimators. We propose a novel local framework to study the subtle asymptotic behaviour of bootstrap percentile confidence intervals when the true parameter values are close to the boundary. Under this framework, we find that when the true parameter is on, or close to, the restriction boundary, the local asymptotic coverage probabilities can always exceed the nominal level in the one-sample case; in the two-sample case, however, they can, surprisingly, fall both below and above the nominal level. These results provide theoretical justification and guidance for applying the bootstrap percentile method to constrained inference problems.

The two parts of this thesis are connected by the common theme of constrained statistical inference. In Part I, the semiparametric density ratio model imposes an exponential tilting constraint, which is a type of equality constraint, on the parameter space. In Part II, we deal with inequality constraints, such as boundary or ordering constraints, on the parameter space. In both parts, an important regularity condition of traditional likelihood inference, namely that the parameters are interior points of the parameter space, is violated. The respective inference procedures therefore involve non-standard asymptotics that create new technical challenges.
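
As a rough illustration of the Part II setting, the following Python sketch computes a bootstrap percentile confidence interval for a constrained maximum likelihood estimator and estimates its coverage by simulation. It is not taken from the thesis: the normal model with unit variance, the constraint theta >= 0, the use of nonparametric resampling, and all function names (`constrained_mle`, `percentile_ci`, `coverage`) are assumptions made only for this example.

```python
import numpy as np


def constrained_mle(x, lower=0.0):
    """MLE of a normal mean (unit variance assumed) under theta >= lower."""
    return max(float(np.mean(x)), lower)


def percentile_ci(x, level=0.95, n_boot=2000, lower=0.0, seed=0):
    """Bootstrap percentile interval built from the constrained MLE.

    Assumption: nonparametric resampling of the observations; the thesis
    may use a different bootstrap scheme.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Each row of idx selects one bootstrap sample of size n.
    idx = rng.integers(0, n, size=(n_boot, n))
    # Constrained MLE on each bootstrap sample: truncate the mean at the boundary.
    boot = np.maximum(x[idx].mean(axis=1), lower)
    alpha = 1.0 - level
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def coverage(theta, n=50, n_rep=500, level=0.95, n_boot=500, seed=1):
    """Monte Carlo estimate of how often the interval contains theta."""
    rng = np.random.default_rng(seed)
    hits = 0
    for r in range(n_rep):
        x = rng.normal(theta, 1.0, size=n)
        lo, hi = percentile_ci(x, level=level, n_boot=n_boot, seed=r)
        hits += lo <= theta <= hi
    return hits / n_rep
```

Calling `coverage(0.0)` estimates the coverage when the true parameter lies on the boundary, and `coverage(1.0)` does so for an interior point. Comparing the two gives an informal feel for the boundary effect that the local asymptotic framework quantifies.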
