UncertainHistogramming 
Have you ever had a situation where you need to visualize a set of data as a histogram, except the data you have to visualize are each endowed with some amount of uncertainty? If so, this package is for you! UncertainHistogramming.jl is a lightweight Julia package to plot a density function for a given set of values with known uncertainties.
Background Information
An example application of the main exported abstract type, ContinuousHistogram, is to visualize a "histogram" of experimental values when each value has a measured experimental uncertainty. This is to be contrasted with normal histogramming, which assumes that each value is exact, meaning its uncertainty is zero.
For me, the need for this package first came about when I was running Monte Carlo simulations, where I needed to understand the underlying distribution of some observables. But, as anybody who has ever played around with Monte Carlo methods knows, each observable has a certain amount of statistical error. Thus, any regular histogram I would make when ignoring these statistical errors would not really expose the true distribution, as each data point could not entirely be claimed by a single histogram bin. So I invented the ContinuousHistogram as a somewhat tongue-in-cheek generalization of the regular histogram that takes data uncertainty into account.
This package provides functionality similar to kernel density estimation (KDE), except that here the data errors/uncertainties, which act as the kernel bandwidths, are all, in principle, different.
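To make the comparison with KDE concrete, write both estimators with a generic unit kernel K. A standard KDE uses one bandwidth h for every point, while a ContinuousHistogram gives each value x_i its own width ε_i. The equal 1/N weighting in the second expression is an assumption on our part, although it is consistent with the variances the package reports in the Example Usage section:

\[\hat{f}_h(y) = \frac{1}{N h} \sum_{i=1}^{N} K\left( \frac{y - x_i}{h} \right) \qquad \text{versus} \qquad H(y) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\epsilon_i} K\left( \frac{y - x_i}{\epsilon_i} \right).\]

Taking K to be the standard normal density turns each term of H into the Gaussian kernel G(y; x_i, ε_i); taking K equal to 1/2 on (-1, 1) and zero elsewhere turns it into the uniform kernel U(y; x_i, ε_i).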
A ContinuousHistogram is continuous in the sense of its domain. This is admittedly a bit confusing, but the discretization in a regular histogram comes from its bins, which partition its domain, not from its range. Of course, the range, or vertical values, of a regular histogram are jumpy, but only because those values are constant within each discrete bin. Most kernel functions are at least piecewise continuous (the uniform kernel, for example, jumps only at the edges of its support), and that is the same standard we take here.
Available ContinuousHistograms
We currently offer the following ContinuousHistograms, which implement their designated KernelDistribution and kernel functions:
- The GaussianHistogram, built on GaussianDistributions:
\[G(y; \mu_i, \sigma_i) = \frac{ \exp\left[ -\frac{ \left( y - \mu_i \right)^2 }{2 \sigma_i^2} \right] }{ \sigma_i \sqrt{2\pi}}.\]
- The UniformHistogram, built on UniformDistributions:
\[\mathcal{U}(y; x_i, \epsilon_i) = \begin{cases} \frac{1}{2\epsilon_i}, & y \in (x_i - \epsilon_i, x_i + \epsilon_i) \\ 0, & \mathrm{otherwise} \end{cases}.\]
Each ContinuousHistogram is built around value-error pairs. For example, in a GaussianHistogram each value-error pair is the mean and standard deviation of one Gaussian kernel, while in a UniformHistogram it is the center x_i and half-width ε_i of one uniform kernel.
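The two kernel formulas can be checked numerically with a short standalone Julia sketch that does not use the package at all. The helper names (gaussian_kernel, uniform_kernel, mixture) and the data are hypothetical, and the equal 1/N weighting of the kernels is an assumption. The sketch confirms that each kernel, and therefore each equally weighted mixture, integrates to 1.

```julia
# Hypothetical helpers written directly from the two kernel formulas;
# these are NOT functions exported by UncertainHistogramming.jl.
gaussian_kernel(y, μ, σ) = exp(-(y - μ)^2 / (2 * σ^2)) / (σ * sqrt(2π))

# Open interval (x - ε, x + ε), matching the uniform kernel definition.
uniform_kernel(y, x, ε) = (x - ε < y < x + ε) ? 1 / (2 * ε) : 0.0

# Equal-weight (assumed 1/N) mixture of one kernel per (value, error) pair.
mixture(kernel, y, data) = sum(kernel(y, v, e) for (v, e) in data) / length(data)

data = [(-1.0, 0.5), (0.0, 0.25), (2.0, 1.0)]   # illustrative value-error pairs
ys = range(-10.0, 10.0; length = 20_001)
dy = step(ys)

# Riemann sums: each mixture should integrate to approximately 1.
area_gauss = sum(mixture(gaussian_kernel, y, data) for y in ys) * dy
area_unif  = sum(mixture(uniform_kernel, y, data) for y in ys) * dy
```

The Gaussian sum is accurate to many digits, while the uniform sum is only accurate to about the grid spacing, because its kernels jump at the edges of their support.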
Example Usage
An example ContinuousHistogram can be constructed from the following Julia code for a simple Vector of value-error Tuples.
To start, load the following packages:
using Plots # One must Pkg.add this separately
using UncertainHistogramming
Plots.gr() # Use GR to reproduce the plot exactly
Next, define a list of (value, error)-Tuples:
values_errors = [(-3.5, 0.5),
                 (-1.5, 0.75),
                 (0, 0.25),
                 (1.5, 0.75),
                 (3.5, 0.5)]
From here, we're in a position to initialize both a GaussianHistogram ghist and a UniformHistogram uhist, and then push! the values_errors Vector into them.
ghist = GaussianHistogram()
uhist = UniformHistogram()
push!(ghist, values_errors)
GaussianHistogram{Float64}:
length = 5
moments = 1.1102230246251565e-16 6.1375 1.7763568394002505e-15 72.89453125
Statistics
mean = 1.1102230246251565e-16
variance = 6.1375
skewness = -1.7615310149552143e-17
kurtosis = -1.0648620173302747
push!(uhist, values_errors)
UniformHistogram{Float64}:
length = 5
moments = 1.1102230246251565e-16 5.9125 0.0 65.54296875
Statistics
mean = 1.1102230246251565e-16
variance = 5.9125
skewness = -1.369764503743595e-16
kurtosis = -1.125075426073508
Note that the non-central statistical moments are updated in an online manner. This means that, aside from the overhead of push!ing the two elements of each pair into the ContinuousHistogram's values and errors Vectors, keeping the statistics current costs only a constant amount of extra work per push!, and no additional pass over the data is needed to read them off.
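One way such an online update can work is sketched below in standalone Julia. This is an illustration, not the package's implementation: OnlineMoments, add!, stats and the *_raw_moments helpers are hypothetical names. The sketch relies on two standard facts. First, a raw moment E[Y^k] of an equal-weight mixture is the average of its kernels' raw moments. Second, the normal and uniform distributions have closed-form raw moments. Each push therefore only adds four numbers to running sums, and the sketch reproduces the variance and fourth raw moment printed above for both histograms.

```julia
# Hypothetical sketch; these names are NOT UncertainHistogramming.jl internals.
mutable struct OnlineMoments
    n::Int
    sums::Vector{Float64}  # running sums of the first four raw moments
end
OnlineMoments() = OnlineMoments(0, zeros(4))

# Closed-form raw moments E[Y^k], k = 1..4, of a single kernel.
gaussian_raw_moments(μ, σ) =
    (μ, μ^2 + σ^2, μ^3 + 3 * μ * σ^2, μ^4 + 6 * μ^2 * σ^2 + 3 * σ^4)
uniform_raw_moments(x, ε) =
    (x, x^2 + ε^2 / 3, x^3 + x * ε^2, x^4 + 2 * x^2 * ε^2 + ε^4 / 5)

# O(1) update per value-error pair: add the kernel's raw moments to the sums.
function add!(m::OnlineMoments, raw_moments, value, err)
    r = raw_moments(value, err)
    for k in 1:4
        m.sums[k] += r[k]
    end
    m.n += 1
    return m
end

# Statistics of the equal-weight mixture, derived from its averaged raw moments.
function stats(m::OnlineMoments)
    m1, m2, m3, m4 = m.sums ./ m.n
    variance = m2 - m1^2
    c3 = m3 - 3 * m1 * m2 + 2 * m1^3                   # third central moment
    c4 = m4 - 4 * m1 * m3 + 6 * m1^2 * m2 - 3 * m1^4   # fourth central moment
    return (mean = m1, variance = variance,
            skewness = c3 / variance^1.5, kurtosis = c4 / variance^2 - 3)
end

values_errors = [(-3.5, 0.5), (-1.5, 0.75), (0.0, 0.25), (1.5, 0.75), (3.5, 0.5)]
g, u = OnlineMoments(), OnlineMoments()
for (v, e) in values_errors
    add!(g, gaussian_raw_moments, v, e)
    add!(u, uniform_raw_moments, v, e)
end
stats(g).variance  # ≈ 6.1375
stats(u).variance  # ≈ 5.9125
```

The kurtosis returned here is the excess kurtosis (with 3 subtracted), which matches the values the package prints for this data.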
From here, we just need to define an input domain over which to evaluate the ContinuousHistograms:
x = LinRange(-6, 6, 3000)
3000-element LinRange{Float64, Int64}:
 -6.0,-5.996,-5.992,-5.988,-5.98399,…,5.97999,5.98399,5.988,5.992,5.996,6.0
Then, with the help of Plots.jl and RecipesBase.jl, we have
plot( plot(x, ghist; title = "\$ \\mathtt{GaussianHistogram}ming \$"),
plot(x, uhist; title = "\$ \\mathtt{UniformHistogram}ming \$");
size = (800, 600),
layout = (1, 2),
link = :both )
In the left plot, the GaussianHistogram is drawn as the solid blue curve, and the individual Gaussian kernels that make it up are drawn as the dashed orange curves. The right plot shows the same set of curves built from UniformDistributions instead. (The orange curves are only nonzero within their visible range; elsewhere they are hidden by the solid blue curve.)
I want to remark here that, thanks to Julia's multiple dispatch, once one properly defines the interface for a new type of ContinuousHistogram, the plotting functionality, along with the utilities and statistics, just works.
One may also supply the keyword argument nkernels to plot(x, hist) to change the number of kernels displayed. By default, nkernels == 5.
If the number of value-error pairs exceeds nkernels, that is nkernels < length(hist), then no kernels will be shown to save the end user from trying to understand an overly busy plot.
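The dispatch remark and the nkernels rule can be illustrated together with a small standalone Julia sketch. Everything in it is hypothetical: SketchHistogram, kernel, density and curves are illustrative names, not the package's actual interface, which this README does not spell out. Only kernel is type-specific. density and curves are written once for the abstract type, so any new subtype that defines kernel gets them for free. Whether the package rescales the dashed kernels is not stated here, so this sketch returns each kernel curve unscaled.

```julia
# Hypothetical types illustrating the dispatch pattern; NOT the package's API.
abstract type SketchHistogram end

struct SketchGaussian <: SketchHistogram
    values::Vector{Float64}
    errors::Vector{Float64}
end

struct SketchUniform <: SketchHistogram
    values::Vector{Float64}
    errors::Vector{Float64}
end

# The only type-specific method each concrete histogram must provide.
kernel(::SketchGaussian, y, μ, σ) = exp(-(y - μ)^2 / (2 * σ^2)) / (σ * sqrt(2π))
kernel(::SketchUniform, y, x, ε) = (x - ε < y < x + ε) ? 1 / (2 * ε) : 0.0

Base.length(h::SketchHistogram) = length(h.values)

# Generic code, written once for every subtype: equal-weight mixture density.
density(h::SketchHistogram, y) =
    sum(kernel(h, y, v, e) for (v, e) in zip(h.values, h.errors)) / length(h)

# Curves a plot recipe could draw: the total density plus the individual
# kernels, which are omitted entirely when nkernels < length(h).
function curves(h::SketchHistogram, xs; nkernels::Int = 5)
    total = [density(h, y) for y in xs]
    if nkernels < length(h)
        kernels = Vector{Float64}[]
    else
        kernels = [[kernel(h, y, v, e) for y in xs]
                   for (v, e) in zip(h.values, h.errors)]
    end
    return total, kernels
end

vals = [-3.5, -1.5, 0.0, 1.5, 3.5]
errs = [0.5, 0.75, 0.25, 0.75, 0.5]
gsketch = SketchGaussian(vals, errs)
usketch = SketchUniform(vals, errs)
xs = LinRange(-6, 6, 3000)
total, ks = curves(gsketch, xs)              # 5 kernels: nkernels == 5 by default
_, ks3 = curves(gsketch, xs; nkernels = 3)   # no kernels: 3 < length(gsketch)
```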
Add UncertainHistogramming.jl to your Julia environment
To add UncertainHistogramming.jl, press ] in the Julia REPL to enter pkg mode and type
pkg> add UncertainHistogramming
and presto! You now have full access to UncertainHistogramming.jl.