| Title: | Shrinkage Partition Distribution |
|---|---|
| Description: | Implementation of the distribution introduced in our paper "Dependent Random Partitions by Shrinking Toward an Anchor". |
| Authors: | David B. Dahl [aut, cre] (ORCID: <https://orcid.org/0000-0002-8173-1547>), Richard L. Warr [aut] (ORCID: <https://orcid.org/0000-0001-8508-3105>), Thomas P. Jensen [aut] (ORCID: <https://orcid.org/0009-0000-0015-3881>), Authors of the dependency Rust crates [ctb] (see inst/AUTHORS file for details) |
| Maintainer: | David B. Dahl <[email protected]> |
| License: | MIT + file LICENSE \| Apache License 2.0 |
| Version: | 0.2.20 |
| Built: | 2026-05-21 09:37:42 UTC |
| Source: | https://github.com/dbdahl/dbdahl.r-universe.dev |

This function specifies the centered partition distribution for a given anchor partition, shrinkage parameter, and baseline distribution.
CenteredPartition(anchor, shrinkage, baseline, useVI = TRUE, a = 1)

| anchor | An integer vector giving the anchor partition (a.k.a., center partition, location partition). |
|---|---|
| shrinkage | A scalar giving the value of the shrinkage parameter. |
| baseline | An object of class |
| useVI | Should the distance between a particular partition and the anchor partition be measured using the variation of information ( |
| a | A nonnegative scalar influencing the relative cost of placing two items in separate clusters when in truth they belong to the same cluster. This defaults to |

An object of class PartitionDistribution representing this partition distribution.
CenteredPartition(anchor=c(1,1,1,2,2), shrinkage=3, baseline=CRPPartition(nItems=5, concentration=1.5, discount=0.1))
CenteredPartition(anchor=c(1,1,1,2,2), shrinkage=3, baseline=UniformPartition(nItems=5))
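
The `useVI` argument refers to the variation of information between a candidate partition and the anchor. As an illustration only, the following base R sketch computes the standard (unweighted) variation of information between two label vectors. The name `vi_distance` is hypothetical, natural logarithms are an assumption, and the weighting controlled by `a` is not modeled, so this should not be read as the package's implementation.

```r
# Variation of information between two partitions given as label vectors.
# Hypothetical helper for illustration; not part of the package.
vi_distance <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 0)
  # Shannon entropy (natural log) of a probability table, ignoring empty cells.
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }
  # Joint distribution of cluster memberships across the two partitions.
  joint <- table(x, y) / length(x)
  # VI = 2 * H(X, Y) - H(X) - H(Y)
  2 * entropy(joint) - entropy(rowSums(joint)) - entropy(colSums(joint))
}

anchor <- c(1, 1, 1, 2, 2)
vi_distance(anchor, anchor)            # identical partitions are at distance zero
vi_distance(anchor, c(1, 1, 2, 2, 2))  # moving one item gives a positive distance
```

A distance of zero means the two label vectors describe the same partition, regardless of which integers are used as labels.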
This function specifies the Chinese restaurant process (CRP) partition distribution for given concentration and discount parameters.
CRPPartition(nItems, concentration, discount = 0)

| nItems | The number of items in each partition, as an integer greater than or equal to |
|---|---|
| concentration | The concentration parameter, as a numeric value greater than |
| discount | The discount parameter, as a numeric value in |

An object of class PartitionDistribution representing this
partition distribution.
CRPPartition(nItems=5, concentration=1.5, discount=0.1)
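
To make the roles of the concentration and discount parameters concrete, here is a minimal base R sketch of the log probability of a partition under the standard two-parameter CRP, built by seating items one at a time. The helper name `crp_log_prob`, the mapping of `concentration` and `discount` onto the usual CRP parameters, and the parameter constraints checked in the code are assumptions made for illustration; the package's own computation is done in Rust and may differ in detail.

```r
# Log probability of one partition (label vector) under the two-parameter CRP.
# Hypothetical helper for illustration; not part of the package.
crp_log_prob <- function(labels, concentration, discount = 0) {
  # Usual constraints for the two-parameter CRP (assumed here).
  stopifnot(discount >= 0, discount < 1, concentration > -discount)
  sizes <- integer(0)  # named vector of current cluster sizes
  log_p <- 0
  for (i in seq_along(labels)) {
    key <- as.character(labels[i])
    seated <- i - 1    # number of items already allocated
    is_old <- key %in% names(sizes)
    if (seated > 0) {
      denom <- concentration + seated
      if (is_old) {
        # Join an existing cluster of size n_j: weight n_j - discount.
        log_p <- log_p + log((sizes[[key]] - discount) / denom)
      } else {
        # Open a new cluster: weight concentration + (number of clusters) * discount.
        log_p <- log_p + log((concentration + length(sizes) * discount) / denom)
      }
    }
    sizes[key] <- if (is_old) sizes[[key]] + 1L else 1L
  }
  log_p
}

crp_log_prob(c(1, 1, 1, 2, 2), concentration = 1.5, discount = 0.1)
```

Because the CRP is exchangeable, the result depends only on the cluster sizes and not on the order in which items are seated.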
This function specifies the fixed partition distribution for a given anchor partition. This is a point-mass distribution at the anchor partition.
FixedPartition(anchor)

| anchor | An integer vector giving the anchor partition (a.k.a., center partition, location partition). |
|---|---|

An object of class PartitionDistribution representing this
partition distribution.
FixedPartition(anchor=c(1,1,1,2,2))
This function specifies the Jensen and Liu (2008) partition distribution for given concentration and permutation parameters.
JensenLiuPartition(concentration, permutation)

| concentration | The concentration parameter, as a numeric value greater than |
|---|---|
| permutation | A vector containing the integers |

An object of class PartitionDistribution representing this
partition distribution.
JensenLiuPartition(concentration=0.5, permutation=c(1,5,4,2,3))
This function specifies the location-scale partition distribution for a given anchor partition, shrinkage, concentration, and permutation. The shrinkage parameter is the reciprocal of the scale parameter.
LocationScalePartition(anchor, shrinkage, concentration, permutation)

| anchor | An integer vector giving the anchor partition (a.k.a., center partition, location partition). |
|---|---|
| shrinkage | A scalar giving the value of the shrinkage parameter, i.e., the reciprocal of the scale parameter. |
| concentration | The concentration parameter, as a numeric value greater than |
| permutation | A vector containing the integers |

An object of class PartitionDistribution representing this
partition distribution.
scale <- 0.1
LocationScalePartition(anchor=c(1,1,1,2,2), shrinkage=1/scale, concentration=0.6, permutation=c(1,5,4,2,3))
Users should not call this function. This is an internal function exported for the sake of developers of packages depending on this package.
mkDistrPtr(distr, excluded = NULL, included = NULL)

| distr | An object of class ‘PartitionDistribution’. |
|---|---|
| excluded | A character vector of explicitly excluded partition distributions. |
| included | A character vector of explicitly included partition distributions. |

A pointer
This function computes (the log of) the probability of the supplied partitions for the given partition distribution.
prPartition(distr, partition, log = TRUE)

| distr | A specification of the partition distribution, i.e., an object of class |
|---|---|
| partition | A matrix of integers giving cluster labels on the rows. For partition |
| log | A logical indicating whether the probability ( |

A numeric vector giving either probabilities or log probabilities for the supplied partitions.
CRPPartition, ShrinkagePartition,
LocationScalePartition, CenteredPartition,
samplePartition
concentration <- 1.0
discount <- 0.1
nSamples <- 3
distr <- CRPPartition(nItems=5, concentration=concentration, discount=discount)
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)
anchor <- c(1,1,1,2,2)
permutation <- c(1,5,4,2,3)
n_items <- length(permutation)
distr <- ShrinkagePartition(anchor=anchor, shrinkage=c(0,0,0,0.3,0.3), permutation=permutation, grit=0.2, CRPPartition(nItems=n_items, concentration=concentration, discount=discount))
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)
This function samples from a partition distribution.
samplePartition(
  distr,
  nSamples,
  randomizePermutation = FALSE,
  randomizeShrinkage = c("fixed", "common", "cluster", "idiosyncratic")[1],
  randomizeGrit = FALSE,
  shrinkage_shape = 5,
  shrinkage_rate = 1,
  grit_shape1 = 1,
  grit_shape2 = 1,
  nCores = 0
)

| distr | A specification of the partition distribution, i.e., an object of class |
|---|---|
| nSamples | An integer giving the number of partitions to sample. |
| randomizePermutation | Should the permutation be uniformly randomly sampled for each partition? |
| randomizeShrinkage | Should the shrinkage be random for each sample? Specifically, the shrinkage is the same for every observation and sampled from a gamma distribution with parameters |
| randomizeGrit | Should the grit of the |
| shrinkage_shape | The shape parameter of the gamma distribution for randomizing the shrinkage. |
| shrinkage_rate | The rate parameter of the gamma distribution for randomizing the shrinkage. |
| grit_shape1 | The first parameter of the beta distribution for randomizing the grit. |
| grit_shape2 | The second parameter of the beta distribution for randomizing the grit. |
| nCores | The number of CPU cores to use. A value of zero means that all cores on the system are used. |

Note that the centered partition distribution CenteredPartition
uses MCMC to sample.
An integer matrix containing a partition in each row using cluster label notation.
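
In cluster label notation, entry `i` of a row is the label of the cluster containing item `i`, so the same partition can be written with different labels. The following base R sketch relabels each row in order of first appearance so that rows can be compared. The helper name `canonicalize` and the example matrix are hypothetical, and whether the package itself returns labels in this canonical form is not stated here.

```r
# Relabel a partition so clusters are numbered 1, 2, ... in order of first appearance.
# Hypothetical helper for illustration; not part of the package.
canonicalize <- function(labels) match(labels, unique(labels))

# Three label vectors, one per row; the first two describe the same partition.
partitions <- rbind(
  c(1, 1, 1, 2, 2),
  c(7, 7, 7, 3, 3),
  c(1, 2, 1, 2, 2)
)

# apply() returns one column per input row, so transpose back to one partition per row.
canonical <- t(apply(partitions, 1, canonicalize))
canonical

# Rows that are identical after relabeling represent the same partition.
identical(canonical[1, ], canonical[2, ])
```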
CRPPartition, ShrinkagePartition,
LocationScalePartition, CenteredPartition,
prPartition
concentration <- 1.0
discount <- 0.1
nSamples <- 3
distr <- CRPPartition(nItems=5, concentration=concentration, discount=discount)
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)
anchor <- c(1,1,1,2,2)
permutation <- c(1,5,4,2,3)
n_items <- length(permutation)
distr <- ShrinkagePartition(anchor=anchor, shrinkage=c(0,0,0,0.3,0.3), permutation=permutation, grit=0.2, CRPPartition(nItems=n_items, concentration=concentration, discount=discount))
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)
This function specifies the shrinkage partition distribution for a given anchor partition, shrinkage, permutation, grit, and baseline distribution.
ShrinkagePartition(
  anchor,
  shrinkage,
  permutation,
  grit,
  baseline,
  optimized = TRUE
)

| anchor | An integer vector giving the anchor partition (a.k.a., center partition, location partition). |
|---|---|
| shrinkage | A numeric vector of length equal to the length of |
| permutation | A vector containing the integers |
| grit | A numeric value controlling the amount of clustering, with small values encouraging few clusters and large values encouraging more clusters. Values between 0 and 1 (exclusive) ensure that, as shrinkage goes to infinity, the partition distribution concentrates on the anchor partition. |
| baseline | An object of class |
| optimized | When the baseline distribution is a CRP with zero discount, should optimized calculations be used? |

An object of class PartitionDistribution representing this
partition distribution.
ShrinkagePartition(anchor=c(1,1,1,2,2), shrinkage=c(10,10,10,3,3), permutation=c(1,5,4,2,3), grit=0.1, baseline=CRPPartition(nItems=5, concentration=1.5, discount=0.1))
ShrinkagePartition(anchor=c(1,1,1,2,2), shrinkage=c(1,1,1,3,3), permutation=c(1,5,4,2,3), grit=0.2, baseline=UniformPartition(nItems=5))
ShrinkagePartition(anchor=c(1,1,1,2,2), shrinkage=c(0,0,0,3,3), permutation=c(1,5,4,2,3), grit=0.2, baseline=JensenLiuPartition(concentration=0.5, permutation=c(1,5,4,2,3)))
To aid prior elicitation for the shrinkage and grit parameters of the Shrinkage Partition (SP) distribution, this function evaluates, over a grid of shrinkage and grit values, the log of their joint prior density together with the expected Rand index and the expected entropy. The two expectations are estimated from samples of the SP distribution at each combination of shrinkage and grit. The computed quantities can then be displayed as shown in the examples below.
summarize_prior_on_shrinkage_and_grit(
  anchor,
  shrinkage_shape = 4,
  shrinkage_rate = 1,
  grit_shape1 = 2,
  grit_shape2 = 2,
  use_crp = TRUE,
  concentration = 1,
  shrinkage_n = 25,
  grit_n = 25,
  n_mc_samples = 100,
  domain_specification = list(n_mc_samples = 1000, percentile = 0.95),
  a = 1,
  n_cores = 0
)

| anchor | Anchor partition in the Shrinkage Partition (SP) distribution as a numeric vector of cluster labels. |
|---|---|
| shrinkage_shape | Shape parameter of the gamma prior distribution for the shrinkage parameter of the SP distribution. |
| shrinkage_rate | Rate parameter of the gamma prior distribution for the shrinkage parameter of the SP distribution. |
| grit_shape1 | First shape parameter of the beta prior distribution for the grit parameter of the SP distribution. |
| grit_shape2 | Second shape parameter of the beta prior distribution for the grit parameter of the SP distribution. |
| use_crp | Use the Chinese restaurant process (CRP) as the baseline distribution? If |
| concentration | Concentration parameter of the baseline distribution for the SP distribution. |
| shrinkage_n | Length of the evenly spaced grid for the shrinkage parameter of the SP distribution. |
| grit_n | Length of the evenly spaced grid for the grit parameter of the SP distribution. |
| n_mc_samples | Number of Monte Carlo samples used to compute the expectation of the Rand index and entropy. |
| domain_specification | List to control the domain for the computations, containing either: 1. Elements named |
| a | The argument |
| n_cores | Number of CPU cores to use, where 0 (default) indicates all cores. |

A list containing the following elements. Vectors shrinkage and grit give
the locations of the grid values. Matrix log_density gives the log of the
prior density at each combination of shrinkage and grit values. Likewise,
matrices expected_vi, expected_binder, expected_rand_index, and
expected_entropy give expectations at each combination of shrinkage and
grit values. See the examples below for one way to use this output.
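
As a rough illustration of what a log prior density over a shrinkage-by-grit grid looks like, the sketch below evaluates a gamma prior on shrinkage and a beta prior on grit with base R. Treating the two priors as independent, the grid end points, and all variable names are assumptions made for this example; the function itself chooses its domain from `domain_specification`.

```r
# Prior hyperparameters, matching the defaults shown in the usage above.
shrinkage_shape <- 4
shrinkage_rate <- 1
grit_shape1 <- 2
grit_shape2 <- 2

# Evenly spaced grids (end points are assumed for illustration).
shrinkage_grid <- seq(0.1, 12, length.out = 25)
grit_grid <- seq(0.02, 0.98, length.out = 25)

# Under the independence assumption, the joint log density is the sum of the
# marginal log densities; outer() fills one row per shrinkage value and one
# column per grit value.
log_density <- outer(
  dgamma(shrinkage_grid, shape = shrinkage_shape, rate = shrinkage_rate, log = TRUE),
  dbeta(grit_grid, shape1 = grit_shape1, shape2 = grit_shape2, log = TRUE),
  "+"
)

dim(log_density)
contour(shrinkage_grid, grit_grid, exp(log_density), xlab = "Shrinkage", ylab = "Grit")
```

A matrix laid out this way can be passed to `image` or `contour` with the two grids as the axes.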
D. B. Dahl, D. J. Johnson, and P. Müller (2022), Search Algorithms and Loss Functions for Bayesian Clustering, Journal of Computational and Graphical Statistics, 31(4), 1189-1201, doi:10.1080/10618600.2022.2069779.
anchor <- rep(1:4, each = 13)
out <- summarize_prior_on_shrinkage_and_grit(anchor, n_mc_samples = 100, n_cores = 1)
if (requireNamespace("fields") ) {
  fields::image.plot(out$shrinkage, out$grit, out$expected_rand_index, xlab = "Shrinkage", ylab = "Grit")
  contour(out$shrinkage, out$grit, exp(out$log_density), add = TRUE, labcex = 1.0)
}
image(out$shrinkage, out$grit, out$expected_entropy, xlab = "Shrinkage", ylab = "Grit")
contour(out$shrinkage, out$grit, exp(out$log_density), add = TRUE, labcex = 1.0)
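
For readers unfamiliar with the summaries plotted above, the following base R sketch shows how a Rand index and a partition entropy can be computed and averaged over sampled partitions. The helper names, the use of natural logarithms, measuring the Rand index against the anchor, and the small hand-written `samples` matrix are all assumptions for illustration; they are not output from the package.

```r
# Rand index: proportion of item pairs on which two partitions agree
# (both together or both apart). Hypothetical helper for illustration.
rand_index <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  same_x <- outer(x, x, "==")
  same_y <- outer(y, y, "==")
  pairs <- upper.tri(same_x)  # each unordered pair counted once
  mean(same_x[pairs] == same_y[pairs])
}

# Entropy (natural log) of the cluster-size proportions of one partition.
partition_entropy <- function(x) {
  p <- as.vector(table(x)) / length(x)
  -sum(p * log(p))
}

anchor <- c(1, 1, 1, 2, 2)
# Hand-written stand-in for sampled partitions, one per row.
samples <- rbind(
  c(1, 1, 1, 2, 2),
  c(1, 1, 2, 2, 2),
  c(1, 2, 3, 4, 5)
)

# Monte Carlo estimates are simple averages over the sampled rows.
expected_rand_index <- mean(apply(samples, 1, rand_index, y = anchor))
expected_entropy <- mean(apply(samples, 1, partition_entropy))
```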
This function specifies the uniform partition distribution.
UniformPartition(nItems)

| nItems | An integer giving the number of items in each partition. |
|---|---|

An object of class PartitionDistribution representing this
partition distribution.
UniformPartition(nItems=5)
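
A uniform distribution over partitions gives every set partition of the items the same probability, namely one over the Bell number of `nItems`. The base R sketch below computes Bell numbers with the Bell triangle; the helper name `bell_number` is hypothetical, and the sketch is meant only as a way to reason about the value of the probability, not as the package's implementation.

```r
# Bell number B_n (number of set partitions of n items) via the Bell triangle.
# Hypothetical helper for illustration; not part of the package.
bell_number <- function(n) {
  stopifnot(n >= 1)
  row <- 1  # first row of the Bell triangle
  if (n >= 2) {
    for (i in 2:n) {
      new_row <- numeric(i)
      new_row[1] <- row[length(row)]  # each row starts with the previous row's last entry
      for (j in 2:i) {
        new_row[j] <- new_row[j - 1] + row[j - 1]
      }
      row <- new_row
    }
  }
  row[length(row)]  # the last entry of row n is B_n
}

n_items <- 5
# Log probability of any single partition of n_items items under uniformity.
log_prob <- -log(bell_number(n_items))
```

Bell numbers grow quickly, so working with the log probability is usually the safer choice for larger `nItems`.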