DiscretizeDistributions.jl

A Julia package for converting continuous and discrete probability distributions into discrete representations with interval-based support using IntervalArithmetic.jl.

The package provides functions to discretize univariate distributions into DiscreteNonParametric distributions where the support consists of IntervalArithmetic.Interval objects. Each interval [a, b) represents a probability mass over that range, computed using the cumulative distribution function (CDF) for continuous distributions or aggregated probability mass function (PMF) for discrete distributions.

Alternative Packages

In Julia, there are more lightweight approaches to discretization. This package creates a new distribution that matches the discrete approximation, which should make it faster to simulate from. However, this approach has more overhead than wrapping the existing pdf, so the following alternatives are recommended for fitting censored or discretized distributions (e.g. with Turing or another package):

  • pdf/logpdf/cdf/logcdf methods from [StatsDiscretizations.jl](https://github.com/nignatiadis/StatsDiscretizations.jl/tree/master)
  • [CensoredDistributions.jl](https://github.com/EpiAware/CensoredDistributions.jl), which can also account for double censoring and truncation

In R:

  • distcrete in the [distcrete](https://github.com/reconhub/distcrete) package
  • discretize in the [actuar](https://gitlab.com/vigou3/actuar) package

Limitations

  • Finite support: Distributions with infinite support are truncated using quantile bounds (default 0.1% and 99.9%)
  • Discrete distribution quirks: Discretizing already-discrete distributions has some limitations and edge cases
  • Non-integer discrete values: Discrete distributions with non-integer support may behave unexpectedly
  • Numeric means: Not all distributions have an exact mean available (e.g. truncated Gamma). The :unbiased method needs the mean, so where possible a fallback numerical mean is calculated using the trapezoid rule with trapezoid_points points.

Future Work

  • Develop better warnings for incompatible distributions
  • Support for multivariate distributions

API Overview

The package provides three main discretize methods:

  1. Fixed intervals: discretize(dist, interval_width) - Creates uniform intervals of specified width
  2. Custom boundaries: discretize(dist, boundaries) - Uses custom interval boundaries
  3. Pre-constructed intervals: discretize(dist, intervals) - Uses pre-built Interval objects

All methods return a DiscreteNonParametric distribution with support determined by the method parameter.

Method Parameter

The discretize functions accept a method parameter that controls the output format:

  • :interval (default): Returns IntervalArithmetic.Interval objects as support points
  • :left_aligned: Returns left endpoints of intervals as point masses
  • :centred: Returns interval midpoints as point masses
  • :right_aligned: Returns right endpoints of intervals as point masses
  • :unbiased: Returns mean-preserving point masses (requires equal interval widths)

using Distributions, DiscretizeDistributions

normal_dist = Normal(0, 1)

# Different output methods
intervals = discretize(normal_dist, 0.5; method=:interval)        # Interval objects
left_points = discretize(normal_dist, 0.5; method=:left_aligned)  # Left endpoints
center_points = discretize(normal_dist, 0.5; method=:centred)     # Midpoints
right_points = discretize(normal_dist, 0.5; method=:right_aligned) # Right endpoints

Unbiased Method

The :unbiased method provides mean-preserving discretization designed to minimize the difference between the original distribution's mean and the discretized distribution's mean. It follows the discretize function in the R package actuar.

# Unbiased discretization - preserves mean
normal_dist = Normal(2.0, 1.0)
unbiased_discrete = discretize(normal_dist, 0.2; method=:unbiased)

# Compare means
println("Original mean: ", mean(normal_dist))      # 2.0
println("Unbiased mean: ", mean(unbiased_discrete)) # ≈ 2.0
println("Centered mean: ", mean(discretize(normal_dist, 0.2; method=:centred)))

In this example both the :unbiased and :centred methods reproduce the mean, but :unbiased gives more control over the support. It places mass on the grid [min, min + interval, ..., max] or, for unbounded distributions, [lower_quantile, lower_quantile + interval, ..., upper_quantile], whereas :centred by necessity places mass on the midpoints [min + interval/2, min + 3*interval/2, ..., max - interval/2]. The :unbiased method requires the mean of the given distribution to be defined. Where Distributions provides no analytical mean but the mean does exist, a numerical mean is calculated instead.
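To make the comparison of support points concrete, the following is a minimal sketch that builds centred and left-aligned point masses by hand using only Distributions.jl. The grid `edges` is hand-picked to be symmetric about the mean and all variable names are illustrative assumptions. It does not reproduce the package's quantile-based grid or its :unbiased algorithm.

```julia
using Distributions

dist = Normal(2.0, 1.0)
h = 0.2
edges = collect(-2.0:h:6.0)            # hand-picked grid, symmetric about the mean 2.0

# Interval masses F(a_{i+1}) - F(a_i), renormalized to sum to 1
F = [cdf(dist, e) for e in edges]
masses = diff(F)
masses ./= sum(masses)

mid_points = edges[1:end-1] .+ h / 2   # support used by a centred representation
left_points = edges[1:end-1]           # support used by a left-aligned representation

centred_mean = sum(mid_points .* masses)   # ≈ 2.0 by symmetry of grid and distribution
left_mean = sum(left_points .* masses)     # lower than centred_mean by h / 2
```

The left-aligned mean is shifted by exactly half an interval width because every support point moves by h / 2 while the masses stay the same.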

Working with Results

using Distributions, DiscretizeDistributions, IntervalArithmetic

# Discretize a normal distribution
normal_dist = Normal(0, 1)
interval_dist = discretize(normal_dist, 0.5)

# The result has interval support
support(interval_dist)  # Vector of Interval{Float64} objects
probs(interval_dist)    # Corresponding probabilities

# Convert to point-based distributions
left_aligned = left_align_distribution(interval_dist)     # Use left endpoints
centered = centred_distribution(interval_dist)            # Use midpoints  
right_aligned = right_align_distribution(interval_dist)   # Use right endpoints

Mathematical Details

Continuous Distributions

For continuous distributions, discretisation computes probability masses using the cumulative distribution function (CDF):

\[P(X' \in [a_i, a_{i+1})) = F(a_{i+1}) - F(a_i)\]

where F(x) is the CDF of the continuous distribution X.
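As a rough illustration of this formula, the sketch below computes the interval masses by hand with Distributions.jl. The helper name `interval_masses` and the grid `edges` are assumptions made for this example and are not part of the package API. The package's own handling of bounds and tails may differ.

```julia
using Distributions

# Hypothetical helper (not part of DiscretizeDistributions): mass of each
# interval [a_i, a_{i+1}) as F(a_{i+1}) - F(a_i), renormalized to sum to 1.
function interval_masses(dist, edges)
    F = [cdf(dist, e) for e in edges]
    masses = diff(F)
    return masses ./ sum(masses)
end

edges = collect(-3.0:0.5:3.0)                  # 13 boundaries, 12 intervals
masses = interval_masses(Normal(0, 1), edges)
```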

Discrete Distributions

For discrete distributions, probability masses are aggregated over intervals using the probability mass function (PMF):

\[P(X' \in [a_i, a_{i+1})) = \sum_{k=\lceil a_i \rceil}^{\lfloor a_{i+1} \rfloor - 1} P(X = k) + P(X = \lfloor a_i \rfloor) \times (\lceil a_i \rceil - a_i) + P(X = \lfloor a_{i+1} \rfloor) \times (a_{i+1} - \lfloor a_{i+1} \rfloor)\]

All resulting discrete distributions are normalized to ensure probabilities sum to 1.
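For integer boundaries the two fractional correction terms in the formula vanish, so it reduces to a plain sum of PMF values. A minimal sketch of that special case, using only Distributions.jl, is given below. The variable names are illustrative, and this is not the package's internal code.

```julia
using Distributions

pois = Poisson(3.0)
edges = 0:2:10          # integer boundaries: [0,2), [2,4), ..., [8,10)

# Sum P(X = k) for k = a, ..., b - 1 in each interval [a, b)
masses = [sum(pdf(pois, k) for k in a:(b - 1))
          for (a, b) in zip(edges[1:end-1], edges[2:end])]

masses ./= sum(masses)  # renormalize, since the mass at k ≥ 10 is dropped
```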

Advanced Usage

Handling Unbounded Distributions

For distributions with infinite support, control truncation with quantile bounds:

# Normal distribution - unbounded in both directions  
normal_dist = Normal(0, 1)
discrete_normal = discretize(normal_dist, 0.2; min_quantile=0.005, max_quantile=0.995)

# Exponential distribution - unbounded above
exp_dist = Exponential(1.0)  
discrete_exp = discretize(exp_dist, 0.1; max_quantile=0.99)

# Result includes infinite tail intervals
support(discrete_exp)  # ends with a semi-infinite interval of the form interval(b, ∞)
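The quantile bounds used in the first example can be inspected directly with Distributions.jl. The sketch below only shows where the 0.5% and 99.5% quantiles of a standard normal fall and how much probability lies outside them. How the package aligns its grid to these bounds is not shown here, and the variable names are illustrative.

```julia
using Distributions

normal_dist = Normal(0, 1)

lower = quantile(normal_dist, 0.005)   # ≈ -2.576
upper = quantile(normal_dist, 0.995)   # ≈  2.576

# Probability lying outside [lower, upper], i.e. in the two tails
tail_mass = cdf(normal_dist, lower) + ccdf(normal_dist, upper)   # ≈ 0.01
```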

Custom Interval Structures

Create non-uniform discretisations with custom boundaries:

# Fine resolution near zero, coarser elsewhere
custom_boundaries = [-5.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 5.0]
discrete_custom = discretize(Normal(0, 1), custom_boundaries)

# Results in intervals: [(-∞,-5), [-5,-2), [-2,-1), ..., [5,∞)]
length(support(discrete_custom))  # 10 intervals (8 from boundaries + 2 infinite tails)
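The interval count can be checked by hand: adding the two infinite tails to the nine boundaries gives eleven edges and therefore ten masses. The following sketch uses only Distributions.jl and is an illustration under that assumption, not the package's implementation.

```julia
using Distributions

boundaries = [-5.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 5.0]
edges = [-Inf; boundaries; Inf]   # add the two infinite tails

# CDF differences over consecutive edges
F = [cdf(Normal(0, 1), e) for e in edges]
masses = diff(F)                  # 10 masses: 8 finite intervals + 2 tails
```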

Working with Pre-constructed Intervals

For advanced use cases, you can provide pre-constructed IntervalArithmetic.Interval objects:

using IntervalArithmetic

# Create custom intervals with specific properties
intervals = [
    interval(-2.0, -1.0),    # Standard interval
    interval(-1.0, 0.0),     # Adjacent interval
    interval(0.0, 2.0),      # Wider interval
    interval(2.0, Inf)       # Semi-infinite interval
]

# Discretize using these intervals
normal_dist = Normal(0, 1)
discrete_custom = discretize(normal_dist, intervals)

DiscretizeDistributions.discretize - Function
discretize(dist::Distributions.UnivariateDistribution, interval::Real;
           method=:interval, min_quantile=0.001, max_quantile=0.999)

Discretize a univariate distribution into a discrete distribution using fixed intervals.

This function converts a univariate distribution into a discrete one by dividing the distribution's support into intervals of fixed width and computing the probability mass in each interval.

Arguments

  • dist::Distributions.UnivariateDistribution: The distribution to discretize (continuous or discrete)
  • interval::Real: The width of each discretisation interval
  • method::Symbol=:interval: Method for representing the output distribution
    • :interval (default): Return IntervalArithmetic.Interval objects as support
    • :left_aligned: Convert intervals to left-aligned point values
    • :centred: Convert intervals to centered point values
    • :right_aligned: Convert intervals to right-aligned point values
    • :unbiased: Return unbiased point estimates (requires equal interval widths), designed such that the means match, see discretize from the R package actuar
  • min_quantile=0.001: Lower quantile bound for unbounded distributions
  • max_quantile=0.999: Upper quantile bound for unbounded distributions
  • trapezoid_points::Int=10000: Number of points for numerical integration of the mean (when needed)

Returns

  • DiscreteNonParametric: Discrete distribution with support determined by the method parameter

Details

For bounded distributions, the natural bounds are used. For unbounded distributions, the bounds are determined using the specified quantiles. The probability mass in each interval is computed using the CDF for continuous distributions or a pseudo-CDF for discrete distributions.

Examples

using Distributions, DiscretizeDistributions, IntervalArithmetic

# Discretize a normal distribution with interval width 0.5
normal_dist = Normal(0, 1)

# Different output methods
discrete_intervals = discretize(normal_dist, 0.5)                        # Intervals (default)
discrete_left = discretize(normal_dist, 0.5; method=:left_aligned)      # Left endpoints
discrete_center = discretize(normal_dist, 0.5; method=:centred)         # Midpoints
discrete_right = discretize(normal_dist, 0.5; method=:right_aligned)    # Right endpoints

# Compare means (centered method typically closest to original)
println("Original mean: ", mean(normal_dist))
println("Centered discretization mean: ", mean(discrete_center))

# Discretize a discrete distribution
poisson_dist = Poisson(3.0)
discrete_poisson = discretize(poisson_dist, 2; method=:centred)

discretize(dist::Distributions.UnivariateDistribution, interval::AbstractVector; method=:interval)

Discretize a univariate distribution using custom interval boundaries.

This function converts a univariate distribution into a discrete one using user-specified interval boundaries. The resulting distribution represents the probability mass in each interval.

Arguments

  • dist::Distributions.UnivariateDistribution: The distribution to discretize
  • interval::AbstractVector: Vector of interval boundaries (will be sorted automatically)
  • method::Symbol=:interval: Method for representing the output distribution
    • :interval (default): Return IntervalArithmetic.Interval objects as support
    • :left_aligned: Convert intervals to left-aligned point values
    • :centred: Convert intervals to centered point values
    • :right_aligned: Convert intervals to right-aligned point values
    • :unbiased: Return unbiased point estimates (requires equal interval widths), designed such that the means match, see discretize from the R package actuar
  • trapezoid_points::Int=10000: Number of points for numerical integration of the mean (when needed)

Returns

  • DiscreteNonParametric: Discrete distribution with support determined by the method parameter

Details

The input interval vector is automatically sorted and combined with distribution bounds. Probability masses are computed using the CDF for continuous distributions or pseudo-CDF for discrete distributions. The resulting distribution represents probability masses over intervals [a_i, a_{i+1}).

For the :unbiased method with unequal intervals, the function will warn and fall back to :centred.

Examples

using Distributions, DiscretizeDistributions, IntervalArithmetic

# Discretize using custom intervals
normal_dist = Normal(5, 2)
custom_intervals = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]

# Different output methods
discrete_intervals = discretize(normal_dist, custom_intervals)                      # Intervals
discrete_left = discretize(normal_dist, custom_intervals; method=:left_aligned)    # Left points
discrete_center = discretize(normal_dist, custom_intervals; method=:centred)       # Midpoints
discrete_right = discretize(normal_dist, custom_intervals; method=:right_aligned)  # Right points

# Support: intervals like [interval(-∞, 0.0), interval(0.0, 2.0), ..., interval(10.0, ∞)]

# Discrete distribution with custom intervals
poisson_dist = Poisson(3.0)
discrete_poisson = discretize(poisson_dist, [0.5, 2, 4, 6, 8, 10]; method=:centred)

discretize(dist::Distributions.UnivariateDistribution,
           interval::AbstractVector{IntervalArithmetic.Interval{X}}; method=:interval) where X <: Real

Discretize a univariate distribution using pre-constructed interval objects.

This function converts a univariate distribution into a discrete one using user-specified IntervalArithmetic.Interval objects. This is the core discretization method that all other discretize methods ultimately call.

Arguments

  • dist::Distributions.UnivariateDistribution: The distribution to discretize
  • interval::AbstractVector{IntervalArithmetic.Interval{X}}: Vector of pre-constructed intervals
  • method::Symbol=:interval: Method for representing the output distribution
    • :interval (default): Return intervals as support points
    • :left_aligned: Convert intervals to left-aligned point values
    • :centred: Convert intervals to centered point values
    • :right_aligned: Convert intervals to right-aligned point values
    • :unbiased: Return unbiased point estimates (requires equal interval widths), designed such that the means match, see discretize from the R package actuar
  • trapezoid_points::Int=10000: Number of points for numerical integration of the mean (when needed)

Returns

  • DiscreteNonParametric: Discrete distribution with support determined by the method parameter

Details

This method computes probability masses directly using the interval boundaries. For each interval [a, b), the probability is computed as cdf(dist, b) - cdf(dist, a). The resulting probabilities are normalized to sum to 1.

Examples

using Distributions, DiscretizeDistributions, IntervalArithmetic

# Create intervals manually
intervals = [interval(-1.0, 0.0), interval(0.0, 1.0), interval(1.0, 2.0)]

# Discretize using these intervals with different methods
normal_dist = Normal(0, 1)
discrete_intervals = discretize(normal_dist, intervals)                      # Intervals
discrete_centered = discretize(normal_dist, intervals; method=:centred)      # Midpoints
discrete_left = discretize(normal_dist, intervals; method=:left_aligned)     # Left endpoints

# Each method gives the same probabilities but different support representations

DiscretizeDistributions.left_align_distribution - Function
left_align_distribution(dist::Distributions.DiscreteNonParametric{IntervalArithmetic.Interval{T}, ...})

Convert an interval-based discrete distribution to a left-aligned point-based distribution.

This function takes a discrete distribution with interval support and creates a new distribution where each support point is positioned at the left endpoint (infimum) of the corresponding interval. Infinite intervals are automatically removed before conversion.

Arguments

  • dist::DiscreteNonParametric{Interval{T}, ...}: Input discrete distribution with interval support

Returns

  • DiscreteNonParametric{T, ...}: New distribution with left-aligned point support

Examples

using Distributions, DiscretizeDistributions, IntervalArithmetic

# Create an interval-based distribution
intervals = [interval(0.0, 1.0), interval(1.0, 2.0), interval(2.0, 3.0)]
masses = [0.3, 0.4, 0.3]  # not named probs, to avoid shadowing Distributions.probs
interval_dist = DiscreteNonParametric(intervals, masses, check_args=false)

# Convert to left-aligned points
left_aligned = left_align_distribution(interval_dist)
# Support becomes [0.0, 1.0, 2.0] (left endpoints of intervals)

DiscretizeDistributions.centred_distribution - Function
centred_distribution(dist::Distributions.DiscreteNonParametric{IntervalArithmetic.Interval{T}, ...})

Convert an interval-based discrete distribution to a centered point-based distribution.

This function takes a discrete distribution with interval support and creates a new distribution where each support point is positioned at the center (midpoint) of the corresponding interval. Infinite intervals are automatically removed before conversion.

Arguments

  • dist::DiscreteNonParametric{Interval{T}, ...}: Input discrete distribution with interval support

Returns

  • DiscreteNonParametric{T, ...}: New distribution with centered point support

Examples

using Distributions, DiscretizeDistributions, IntervalArithmetic

# Create an interval-based distribution
intervals = [interval(0.0, 1.0), interval(1.0, 2.0), interval(2.0, 3.0)]
masses = [0.3, 0.4, 0.3]  # not named probs, to avoid shadowing Distributions.probs
interval_dist = DiscreteNonParametric(intervals, masses, check_args=false)

# Convert to centered points
centered = centred_distribution(interval_dist)
# Support becomes [0.5, 1.5, 2.5] (midpoints of intervals)

DiscretizeDistributions.right_align_distribution - Function
right_align_distribution(dist::Distributions.DiscreteNonParametric{IntervalArithmetic.Interval{T}, ...})

Convert an interval-based discrete distribution to a right-aligned point-based distribution.

This function takes a discrete distribution with interval support and creates a new distribution where each support point is positioned at the right endpoint (supremum) of the corresponding interval. Infinite intervals are automatically removed before conversion.

Arguments

  • dist::DiscreteNonParametric{Interval{T}, ...}: Input discrete distribution with interval support

Returns

  • DiscreteNonParametric{T, ...}: New distribution with right-aligned point support

Examples

using Distributions, DiscretizeDistributions, IntervalArithmetic

# Create an interval-based distribution
intervals = [interval(0.0, 1.0), interval(1.0, 2.0), interval(2.0, 3.0)]
masses = [0.3, 0.4, 0.3]  # not named probs, to avoid shadowing Distributions.probs
interval_dist = DiscreteNonParametric(intervals, masses, check_args=false)

# Convert to right-aligned points
right_aligned = right_align_distribution(interval_dist)
# Support becomes [1.0, 2.0, 3.0] (right endpoints of intervals)