Package 'gourd'

Title: Shrinkage Partition Distribution
Description: Implementation of the distribution introduced in our paper "Dependent Random Partitions by Shrinking Toward an Anchor".
Authors: David B. Dahl [aut, cre] (ORCID: <https://orcid.org/0000-0002-8173-1547>), Richard L. Warr [aut] (ORCID: <https://orcid.org/0000-0001-8508-3105>), Thomas P. Jensen [aut] (ORCID: <https://orcid.org/0009-0000-0015-3881>), Authors of the dependency Rust crates [ctb] (see inst/AUTHORS file for details)
Maintainer: David B. Dahl <[email protected]>
License: MIT + file LICENSE | Apache License 2.0
Version: 0.2.20
Built: 2026-05-21 09:37:42 UTC
Source: https://github.com/dbdahl/dbdahl.r-universe.dev

Probabilities for the Centered Partition Process

Description

This function specifies the centered partition distribution for a given anchor partition, shrinkage parameter, and baseline distribution.

Usage

CenteredPartition(anchor, shrinkage, baseline, useVI = TRUE, a = 1)

Arguments

anchor

An integer vector giving the anchor partition (a.k.a., center partition, location partition).

shrinkage

A scalar giving the value of the shrinkage parameter.

baseline

An object of class PartitionDistribution representing a partition distribution. Currently, only UniformPartition, JensenLiuPartition and CRPPartition are supported.

useVI

Should the distance between a particular partition and the anchor partition be measured using the variation of information (TRUE) or using Binder loss (FALSE)?

a

A nonnegative scalar influencing the relative cost of placing two items in separate clusters when in truth they belong to the same cluster. This defaults to 1, meaning equal costs.
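
As a rough illustration of the two distance choices selected by useVI, the base R sketch below computes an unnormalized Binder loss with relative cost a and the standard variation of information between a partition and an anchor. The helper names binder_loss and variation_of_information are hypothetical, the natural logarithm is an assumption, and the package's internal normalization and its treatment of a under VI may differ.

```r
# Hypothetical helper: unnormalized Binder loss between an estimate and a truth.
# 'a' is the cost of separating two items that truly belong together;
# '2 - a' is the cost of joining two items that truly belong apart.
binder_loss <- function(estimate, truth, a = 1) {
  same_est <- outer(estimate, estimate, "==")
  same_tru <- outer(truth, truth, "==")
  upper <- upper.tri(same_est)  # each unordered pair counted once
  split_cost <- a * sum(same_tru[upper] & !same_est[upper])
  merge_cost <- (2 - a) * sum(!same_tru[upper] & same_est[upper])
  split_cost + merge_cost
}

# Hypothetical helper: variation of information (natural log assumed),
# computed as 2 * H(X, Y) - H(X) - H(Y).
variation_of_information <- function(x, y) {
  joint <- table(x, y) / length(x)
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }
  2 * entropy(joint) - entropy(rowSums(joint)) - entropy(colSums(joint))
}

anchor <- c(1, 1, 1, 2, 2)
candidate <- c(1, 1, 2, 2, 2)
binder_loss(candidate, anchor, a = 1)
variation_of_information(candidate, anchor)
```

Both quantities are zero when the candidate equals the anchor up to relabeling, which is why either can serve as a distance from the anchor.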

Value

An object of class PartitionDistribution representing this partition distribution.

Examples

CenteredPartition(anchor=c(1,1,1,2,2), shrinkage=3,
                  baseline=CRPPartition(nItems=5, concentration=1.5, discount=0.1))

CenteredPartition(anchor=c(1,1,1,2,2), shrinkage=3,
                  baseline=UniformPartition(nItems=5))

Probabilities for the Chinese Restaurant Process (CRP) Partition Distribution

Description

This function specifies the Chinese restaurant process (CRP) partition distribution for given concentration and discount parameters.

Usage

CRPPartition(nItems, concentration, discount = 0)

Arguments

nItems

The number of items in each partition, as an integer greater than or equal to 1.

concentration

The concentration parameter, as a numeric value greater than -1*discount.

discount

The discount parameter, as a numeric value in [0,1).
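
To make the roles of concentration and discount concrete, the sketch below evaluates the log probability of a single partition under the standard two-parameter CRP formula in base R. It assumes CRPPartition follows that standard formula; the helper name crp_log_prob is hypothetical and this is not the package's own implementation.

```r
# Hypothetical helper: log probability of a partition (given as cluster labels)
# under the two-parameter Chinese restaurant process.
crp_log_prob <- function(labels, concentration, discount = 0) {
  stopifnot(discount >= 0, discount < 1, concentration > -discount)
  sizes <- as.vector(table(labels))
  n <- length(labels)
  k <- length(sizes)
  # Weight for opening clusters 2, ..., k.
  new_cluster <- sum(log(concentration + seq_len(k - 1) * discount))
  # Log of (concentration + 1)(concentration + 2)...(concentration + n - 1).
  denominator <- lgamma(concentration + n) - lgamma(concentration + 1)
  # Log of (1 - discount)(2 - discount)...(size - 1 - discount) per cluster.
  within <- sum(lgamma(sizes - discount) - lgamma(1 - discount))
  new_cluster - denominator + within
}

# All five partitions of three items; their probabilities should sum to one.
partitions <- list(c(1, 1, 1), c(1, 1, 2), c(1, 2, 1), c(1, 2, 2), c(1, 2, 3))
log_probs <- sapply(partitions, crp_log_prob, concentration = 1.5, discount = 0.1)
sum(exp(log_probs))
```

The constraint that concentration exceed -1*discount keeps every term inside the logarithms positive.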

Value

An object of class PartitionDistribution representing this partition distribution.

Examples

CRPPartition(nItems=5, concentration=1.5, discount=0.1)

Probabilities for the Fixed Partition Distribution

Description

This function specifies the fixed partition distribution for a given anchor partition. This is a point-mass distribution at the anchor partition.

Usage

FixedPartition(anchor)

Arguments

anchor

An integer vector giving the anchor partition (a.k.a., center partition, location partition).

Value

An object of class PartitionDistribution representing this partition distribution.

Examples

FixedPartition(anchor=c(1,1,1,2,2))

Probabilities for the Jensen Liu Partition Distribution

Description

This function specifies the Jensen and Liu (2008) partition distribution for given concentration and permutation parameters.

Usage

JensenLiuPartition(concentration, permutation)

Arguments

concentration

The concentration parameter, as a numeric value greater than -1*discount.

permutation

A vector containing the integers 1, 2, ..., n giving the order in which items are allocated to the partition.
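
A quick way to check that a vector is a valid permutation of 1, 2, ..., n before passing it along is sketched below in base R. The helper name is_permutation is hypothetical and not part of the package.

```r
# Hypothetical helper: TRUE when p contains each of 1, 2, ..., n exactly once.
is_permutation <- function(p) {
  n <- length(p)
  n > 0 && !anyNA(p) && all(sort(p) == seq_len(n))
}

is_permutation(c(1, 5, 4, 2, 3))  # valid ordering of five items
is_permutation(c(1, 5, 5, 2, 3))  # invalid: 5 repeats and 4 is missing

# A uniformly random permutation of five items.
set.seed(1)
random_permutation <- sample(5)
```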

Value

An object of class PartitionDistribution representing this partition distribution.

Examples

JensenLiuPartition(concentration=0.5, permutation=c(1,5,4,2,3))

Probabilities for the Location Scale Partition Distribution

Description

This function specifies the location scale partition distribution for a given anchor partition and given shrinkage, concentration, and permutation parameters. The shrinkage parameter is the reciprocal of the scale parameter.

Usage

LocationScalePartition(anchor, shrinkage, concentration, permutation)

Arguments

anchor

An integer vector giving the anchor partition (a.k.a., center partition, location partition).

shrinkage

A scalar giving the value of the shrinkage parameter, i.e., the reciprocal of the scale parameter.

concentration

The concentration parameter, as a numeric value greater than -1*discount.

permutation

A vector containing the integers 1, 2, ..., n giving the order in which items are allocated to the partition.

Value

An object of class PartitionDistribution representing this partition distribution.

Examples

scale <- 0.1
LocationScalePartition(anchor=c(1,1,1,2,2), shrinkage=1/scale,
                       concentration=0.6, permutation=c(1,5,4,2,3))

Make a Pointer to Partitional Distribution Parameters

Description

Users should not call this function. This is an internal function exported for the sake of developers of packages depending on this package.

Usage

mkDistrPtr(distr, excluded = NULL, included = NULL)

Arguments

distr

An object of class ‘PartitionDistribution’.

excluded

A character vector of explicitly excluded partition distributions.

included

A character vector of explicitly included partition distributions.

Value

A pointer


Compute the (Log of the) Probability of a Partition

Description

This function computes (the log of) the probability of the supplied partitions for the given partition distribution.

Usage

prPartition(distr, partition, log = TRUE)

Arguments

distr

A specification of the partition distribution, i.e., an object of class PartitionDistribution as returned by, for example, a function such as CRPPartition.

partition

A matrix of integers giving cluster labels on the rows. For partition k, items i and j are in the same cluster if and only if partition[k,i] == partition[k,j].

log

A logical indicating whether the probability (FALSE) or its natural logarithm (TRUE) is desired.

Value

A numeric vector giving either probabilities or log probabilities for the supplied partitions.
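
Because log = TRUE is the default, downstream code often needs to move between the log scale and the probability scale. The base R sketch below uses made-up log probabilities (not output from the package) and a hypothetical helper log_sum_exp to add probabilities without underflow.

```r
# Hypothetical helper: log of a sum of probabilities given their logs.
log_sum_exp <- function(log_values) {
  m <- max(log_values)
  m + log(sum(exp(log_values - m)))
}

# Made-up values standing in for a vector returned with log = TRUE.
log_probs <- c(-1.2, -2.5, -0.9)

probs <- exp(log_probs)                 # same values as with log = FALSE
total_log_prob <- log_sum_exp(log_probs)  # log of the total probability
```

Subtracting the maximum before exponentiating keeps the computation stable even when every log probability is very negative.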

See Also

CRPPartition, ShrinkagePartition, LocationScalePartition, CenteredPartition, samplePartition

Examples

concentration <- 1.0
discount <- 0.1
nSamples <- 3

distr <- CRPPartition(nItems=5, concentration=concentration, discount=discount)
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)

anchor <- c(1,1,1,2,2)
permutation <- c(1,5,4,2,3)
n_items <- length(permutation)

distr <- ShrinkagePartition(anchor=anchor, shrinkage=c(0,0,0,0.3,0.3),
             permutation=permutation, grit=0.2,
             CRPPartition(nItems=n_items, concentration=concentration, discount=discount))
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)

Sample From a Partition Distribution

Description

This function samples from a partition distribution.

Usage

samplePartition(
  distr,
  nSamples,
  randomizePermutation = FALSE,
  randomizeShrinkage = c("fixed", "common", "cluster", "idiosyncratic")[1],
  randomizeGrit = FALSE,
  shrinkage_shape = 5,
  shrinkage_rate = 1,
  grit_shape1 = 1,
  grit_shape2 = 1,
  nCores = 0
)

Arguments

distr

A specification of the partition distribution, i.e., an object of class PartitionDistribution as returned by, for example, a function such as CRPPartition.

nSamples

An integer giving the number of partitions to sample.

randomizePermutation

Should the permutation be uniformly randomly sampled for each partition?

randomizeShrinkage

Should the shrinkage be random for each sample? Specifically, the shrinkage is the same for every observation and is sampled from a gamma distribution with parameters shrinkage_shape and shrinkage_rate.

randomizeGrit

Should the grit of the ShrinkagePartition be sampled from a beta distribution for each partition? This is ignored for other distributions.

shrinkage_shape

The shape parameter of the gamma distribution for randomizing the shrinkage.

shrinkage_rate

The rate parameter of the gamma distribution for randomizing the shrinkage.

grit_shape1

The first parameter of the beta distribution for randomizing the grit.

grit_shape2

The second parameter of the beta distribution for randomizing the grit.

nCores

The number of CPU cores to use. A value of zero means that all cores on the system are used.
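
The gamma and beta distributions named in the arguments above can be explored directly in base R. The sketch below draws shrinkage and grit values using the default hyperparameters from the usage; how samplePartition applies such draws internally, and how the "common", "cluster", and "idiosyncratic" options differ, is not shown here.

```r
set.seed(42)
n_samples <- 3

# Shrinkage draws from a gamma distribution (shape 5, rate 1, so mean 5).
shrinkage_draws <- rgamma(n_samples, shape = 5, rate = 1)

# Grit draws from a beta distribution (shape1 = shape2 = 1 is uniform on (0, 1)).
grit_draws <- rbeta(n_samples, shape1 = 1, shape2 = 1)
```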

Details

Note that the centered partition distribution CenteredPartition uses MCMC to sample.

Value

An integer matrix containing a partition in each row using cluster label notation.
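
In cluster label notation, only the pattern of equal labels within a row matters, not the label values themselves. The base R sketch below uses a hand-made matrix (not package output) and a hypothetical helper canonical to relabel each row in order of first appearance and to count clusters.

```r
# Hand-made matrix standing in for sampled partitions, one partition per row.
samples <- rbind(c(1, 1, 1, 2, 2),
                 c(3, 3, 3, 7, 7),
                 c(1, 2, 1, 2, 3))

# Hypothetical helper: relabel clusters as 1, 2, ... in order of first appearance.
canonical <- function(labels) match(labels, unique(labels))

# Number of clusters in each sampled partition.
n_clusters <- apply(samples, 1, function(x) length(unique(x)))

# Rows 1 and 2 use different labels but describe the same partition.
same_partition <- identical(canonical(samples[1, ]), canonical(samples[2, ]))
```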

See Also

CRPPartition, ShrinkagePartition, LocationScalePartition, CenteredPartition, prPartition

Examples

concentration <- 1.0
discount <- 0.1
nSamples <- 3

distr <- CRPPartition(nItems=5, concentration=concentration, discount=discount)
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)

anchor <- c(1,1,1,2,2)
permutation <- c(1,5,4,2,3)
n_items <- length(permutation)

distr <- ShrinkagePartition(anchor=anchor, shrinkage=c(0,0,0,0.3,0.3),
             permutation=permutation, grit=0.2,
             CRPPartition(nItems=n_items, concentration=concentration, discount=discount))
x <- samplePartition(distr, nSamples, nCores=1)
prPartition(distr, x)

Probabilities for the Shrinkage Partition Distribution

Description

This function specifies the shrinkage partition distribution given an anchor partition, shrinkage, permutation, grit, and a baseline distribution.

Usage

ShrinkagePartition(
  anchor,
  shrinkage,
  permutation,
  grit,
  baseline,
  optimized = TRUE
)

Arguments

anchor

An integer vector giving the anchor partition (a.k.a., center partition, location partition).

shrinkage

A numeric vector of length equal to the length of anchor (i.e., the number of items) giving the shrinkage probability for each item. This can also be a scalar, in which case that value is used for each item.

permutation

A vector containing the integers 1, 2, ..., n giving the order in which items are allocated to the partition.

grit

A numeric value controlling the amount of clustering, with small values encouraging few clusters and large values encouraging more clusters. Values between 0 and 1 (exclusive) ensure that, as shrinkage goes to infinity, the partition distribution concentrates on the anchor partition.

baseline

An object of class PartitionDistribution representing a partition distribution. Currently, only UniformPartition, JensenLiuPartition and CRPPartition are supported.

optimized

When the baseline distribution is a CRP with zero discount, should optimized calculations be used?

Value

An object of class PartitionDistribution representing this partition distribution.

Examples

ShrinkagePartition(anchor=c(1,1,1,2,2), shrinkage=c(10,10,10,3,3),
                      permutation=c(1,5,4,2,3), grit=0.1,
                      baseline=CRPPartition(nItems=5, concentration=1.5, discount=0.1))

ShrinkagePartition(anchor=c(1,1,1,2,2), shrinkage=c(1,1,1,3,3),
                      permutation=c(1,5,4,2,3), grit=0.2,
                      baseline=UniformPartition(nItems=5))


ShrinkagePartition(anchor=c(1,1,1,2,2), shrinkage=c(0,0,0,3,3),
                      permutation=c(1,5,4,2,3), grit=0.2,
                      baseline=JensenLiuPartition(concentration=0.5, permutation=c(1,5,4,2,3)))

Summarize Implications of the Prior Distribution of the SP Distribution

Description

To aid prior elicitation for the shrinkage and grit parameters of the Shrinkage Partition (SP) distribution, this function computes the log of the joint prior density of the shrinkage and grit parameters over combinations of their values. For each combination, it also estimates the expected Rand index and the expected entropy using samples from the SP distribution. These computed quantities can then be displayed as shown in the examples below.
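
For readers unfamiliar with the two summaries, the base R sketch below computes a Rand index and a partition entropy and averages them over a few hand-made partitions. The helper names, the natural logarithm, and the choice to measure the Rand index against the anchor are assumptions made for illustration; the package's own computation may differ.

```r
# Hypothetical helper: proportion of item pairs on which two partitions agree.
rand_index <- function(x, y) {
  same_x <- outer(x, x, "==")
  same_y <- outer(y, y, "==")
  upper <- upper.tri(same_x)  # each unordered pair counted once
  mean(same_x[upper] == same_y[upper])
}

# Hypothetical helper: entropy of the cluster-size proportions (natural log).
partition_entropy <- function(labels) {
  p <- as.vector(table(labels)) / length(labels)
  -sum(p * log(p))
}

anchor <- c(1, 1, 2, 2)
# Hand-made partitions standing in for Monte Carlo samples.
samples <- rbind(c(1, 1, 2, 2),
                 c(1, 2, 1, 2),
                 c(1, 1, 1, 1))

# Monte Carlo style averages over the rows.
expected_rand_index <- mean(apply(samples, 1, rand_index, y = anchor))
expected_entropy <- mean(apply(samples, 1, partition_entropy))
```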

Usage

summarize_prior_on_shrinkage_and_grit(
  anchor,
  shrinkage_shape = 4,
  shrinkage_rate = 1,
  grit_shape1 = 2,
  grit_shape2 = 2,
  use_crp = TRUE,
  concentration = 1,
  shrinkage_n = 25,
  grit_n = 25,
  n_mc_samples = 100,
  domain_specification = list(n_mc_samples = 1000, percentile = 0.95),
  a = 1,
  n_cores = 0
)

Arguments

anchor

Anchor partition in the Shrinkage Partition (SP) distribution as a numeric vector of cluster labels.

shrinkage_shape

Shape parameter of the gamma prior distribution for the shrinkage parameter of the SP distribution.

shrinkage_rate

Rate parameter of the gamma prior distribution for the shrinkage parameter of the SP distribution.

grit_shape1

First shape parameter of the beta prior distribution for the grit parameter of the SP distribution.

grit_shape2

Second shape parameter of the beta prior distribution for the grit parameter of the SP distribution.

use_crp

Should the Chinese restaurant process (CRP) serve as the baseline distribution? If FALSE, the Jensen Liu partition (JLP) is used instead.

concentration

Concentration parameter of the baseline distribution for the SP distribution.

shrinkage_n

Length of the evenly-spaced grid for the shrinkage parameter of the SP distribution.

grit_n

Length of the evenly-spaced grid for the grit parameter of the SP distribution.

n_mc_samples

Number of Monte Carlo samples used to compute the expectation of the Rand index and entropy.

domain_specification

List to control the domain for the computations, containing either: 1. Elements named shrinkage_lim and grit_lim (each vectors of length two indicating the lower and upper bound), or 2. Elements named n_mc_samples (giving the number of samples to draw to learn the domain) and percentile (indicating which percentile should be used to determine the upper bound).

a

The argument a is a nonnegative scalar in [0,2] giving (for Binder loss) the cost of placing two items in separate clusters when in truth they belong to the same cluster. Without loss of generality, the cost (under Binder loss) of placing two items in the same cluster when in truth they belong to separate clusters is fixed at 2-a. For VI, a has a similar interpretation, although it is not a unit cost. See Dahl, Johnson, and Müller (2022).

n_cores

Number of CPU cores to use, where 0 (default) indicates all cores.

Value

A list containing the following elements. Vectors shrinkage and grit give the locations of grid values. Matrix log_density gives the log of the prior density at the combinations of shrinkage and grit values. Likewise, matrices expected_vi, expected_binder, expected_rand_index, and expected_entropy give expectations at the combinations of shrinkage and grit values. See the examples below for one way to use this output.
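
The shape of the grid and of the log_density matrix can be mimicked in base R. The sketch below builds a domain_specification list of the first kind (explicit limits, with made-up values), lays out evenly spaced grids, and evaluates a log joint prior density. It assumes the gamma prior on shrinkage and the beta prior on grit are independent, which is an assumption for illustration and not a statement about the package internals.

```r
# Made-up limits in the shrinkage_lim / grit_lim form of domain_specification.
domain_specification <- list(shrinkage_lim = c(0.1, 10), grit_lim = c(0.01, 0.99))

shrinkage_n <- 25
grit_n <- 25

# Evenly spaced grids over the requested domain.
shrinkage <- seq(domain_specification$shrinkage_lim[1],
                 domain_specification$shrinkage_lim[2], length.out = shrinkage_n)
grit <- seq(domain_specification$grit_lim[1],
            domain_specification$grit_lim[2], length.out = grit_n)

# Log joint prior density, assuming independent gamma(4, 1) and beta(2, 2) priors.
log_density <- outer(shrinkage, grit, function(s, g) {
  dgamma(s, shape = 4, rate = 1, log = TRUE) +
    dbeta(g, shape1 = 2, shape2 = 2, log = TRUE)
})

dim(log_density)  # rows index shrinkage, columns index grit
```

A matrix laid out this way can be passed to image or contour together with the two grid vectors.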

References

D. B. Dahl, D. J. Johnson, and P. Müller (2022), Search Algorithms and Loss Functions for Bayesian Clustering, Journal of Computational and Graphical Statistics, 31(4), 1189-1201, doi:10.1080/10618600.2022.2069779.

Examples

anchor <- rep(1:4, each = 13)
out <- summarize_prior_on_shrinkage_and_grit(anchor, n_mc_samples = 100, n_cores = 1)
if (requireNamespace("fields", quietly = TRUE)) {
  fields::image.plot(out$shrinkage, out$grit, out$expected_rand_index,
                     xlab = "Shrinkage", ylab = "Grit")
  contour(out$shrinkage, out$grit, exp(out$log_density), add = TRUE, labcex = 1.0)
}
image(out$shrinkage, out$grit, out$expected_entropy, xlab = "Shrinkage", ylab = "Grit")
contour(out$shrinkage, out$grit, exp(out$log_density), add = TRUE, labcex = 1.0)

Probabilities for the Uniform Partition Distribution

Description

This function specifies the uniform partition distribution.

Usage

UniformPartition(nItems)

Arguments

nItems

An integer giving the number of items in each partition.
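
Assuming "uniform" means that every set partition of nItems items is equally likely, the probability of any one partition is the reciprocal of the Bell number. The base R sketch below computes Bell numbers with the Bell triangle; the helper name bell_number is hypothetical and not part of the package.

```r
# Hypothetical helper: the number of set partitions of n items (n >= 1),
# computed with the Bell triangle.
bell_number <- function(n) {
  stopifnot(n >= 1)
  row <- 1
  for (i in seq_len(n - 1)) {
    new_row <- numeric(i + 1)
    new_row[1] <- row[i]  # each row starts with the last entry of the previous row
    for (j in seq_len(i)) {
      new_row[j + 1] <- new_row[j] + row[j]
    }
    row <- new_row
  }
  row[length(row)]
}

n_partitions <- bell_number(5)
uniform_log_prob <- -log(n_partitions)  # log probability of each partition
```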

Value

An object of class PartitionDistribution representing this partition distribution.

Examples

UniformPartition(nItems=5)