A Methodology for Two-Level Product Partition Model Estimation of Normal Means
Diver, Paul, Statistics - Graduate School of Arts and Sciences, University of Virginia
Holt, Jeffrey, Department of Statistics, University of Virginia
In many instances, a collection of items can be thought of as having two levels: an individual level, in which each item is uniquely identified, and a group level, defined by a set of known group membership labels. This dissertation addresses a two-level mean estimation and clustering problem using probability models, in which each item has an independent observation following a normal distribution with an item-specific unknown mean and constant variance. The two-level structure allows the individual-level observations to be aggregated and averaged by group membership index. This implies that each group index has an associated mean equal to the average of its members' means. The two-level setting is studied by adapting probability models that allow means to be equal at both the individual and group levels. The possibility of equal means at the group level implies a two-level mean condition that restricts the possible values of the means to be estimated and must therefore be incorporated into the model. These probability models, called product partition models, provide a logical and flexible framework for the problem and permit the use of popular computational tools. Given a set of items, a partition is an arrangement of these items into a collection of non-empty, non-overlapping subsets. Items in the same subset have observations with the same mean. Similarly, the group indices may be partitioned into group-level subsets. Random partitions at the group and individual levels jointly possess a probability distribution, which is updated through the information contained in the data. The resulting posterior distribution allows the unknown means to be estimated. Markov sampling adapted for the two-level setting assists in providing two estimates: a two-level product estimate, computed as a weighted average of posterior means summed over all possible pairs of group-level and individual-level partitions, and an estimate based on the posterior mode, the maximum a posteriori clustering of the data.
The posterior mode provides a clustering structure at both the individual and group levels, thereby automatically selecting the number of clusters at each level and avoiding the need to preset it, as other methods require. This dissertation extends the work of Crowley (1997) to the two-level setting with the two-level mean condition. Her work focuses on estimating the means of normally distributed observations with known variance equal to 1 in the one-level setting, where no group-level index information is used, whether or not it is known. Incorporating the two-level mean condition provides not only a logically consistent analysis of the means across levels but also superior mean estimation results, including when applied to the Major League Baseball batting average data studied in Crowley (1997) and elsewhere.
PHD (Doctor of Philosophy)
Product Partition Models
All rights reserved (no additional license for public reuse)