UncertainHistogramming
Have you ever needed to visualize a set of data as a histogram, where each data point carries some amount of uncertainty? If so, this package is for you! UncertainHistogramming.jl is a lightweight Julia package to plot a density function for a given set of values with known uncertainties.
Background Information
An example application of the main exported abstract struct, ContinuousHistogram, is to visualize a "histogram" of experimental values, when each value has a measured experimental uncertainty. This is in contrast to normal Histogramming, which assumes that each value is exact, meaning its uncertainty is zero().
For me, the need for this package first came about when I was running Monte Carlo simulations, where I needed to understand the underlying distribution of some observables. But, as anybody who has ever played around with Monte Carlo methods knows, each observable has a certain amount of statistical error. Thus, any regular histogram I made while ignoring these statistical errors would not really expose the true distribution, since no data point could be claimed entirely by a single histogram bin. So I invented the ContinuousHistogram as a somewhat tongue-in-cheek generalization of the regular histogram that takes data uncertainty into account.
This package provides functionality similar to kernel density estimation (KDE), but here the data errors/uncertainties, which act as the kernel bandwidths, are all, in principle, different.
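To make the comparison concrete: a standard KDE with kernel K and one shared bandwidth h is the first expression below, while the second lets each data point x_i carry its own width \epsilon_i. The equal 1/N weighting in the second expression is an assumption here, not something stated by the package, although it is consistent with the moments printed in the example usage.

\[\hat{f}_h(y) = \frac{1}{N h} \sum_{i=1}^{N} K\left( \frac{y - x_i}{h} \right), \qquad H(y) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\epsilon_i} K\left( \frac{y - x_i}{\epsilon_i} \right).\]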
A ContinuousHistogram is continuous in the sense of its domain. This is admittedly a bit confusing, but the discretization that occurs in a regular histogram comes from its bins, or its domain, not its range. Of course, the range, or vertical values, are jumpy, but that is because of the discrete nature of the regular histogram. Most kernel functions that exist are at least piecewise continuous in their range, which is the same standard we take here.
Available ContinuousHistograms
We currently offer the following ContinuousHistograms, which implement their designated KernelDistribution and kernel functions:
- The GaussianHistogram built on GaussianDistributions
\[G(y; \mu_i, \sigma_i) = \frac{ \exp\left[ -\frac{ \left( y - \mu_i \right)^2 }{2 \sigma_i^2} \right] }{ \sigma_i \sqrt{2\pi}}.\]
- The UniformHistogram built on UniformDistributions
\[\mathcal{U}(y; x_i, \epsilon_i) = \begin{cases} \frac{1}{2\epsilon_i}, & y \in (x_i - \epsilon_i, x_i + \epsilon_i) \\ 0, & \mathrm{otherwise} \end{cases}.\]
Each ContinuousHistogram is built around value-error pairs. For example, with the GaussianHistogram, each value-error pair gives the mean and standard deviation of one Gaussian.
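As a minimal sketch of how value-error pairs turn into a density, the two kernel formulas above can be written in plain Julia and averaged with equal weights. The helper names gaussian_kernel, uniform_kernel, and density are hypothetical and are not part of the package's API, and the equal-weight average is an assumption about how the kernels are combined.

```julia
# Plain-Julia sketch; these helper names are hypothetical and are NOT
# part of the UncertainHistogramming.jl API.

# Gaussian kernel G(y; μ, σ), following the formula above.
gaussian_kernel(y, μ, σ) = exp(-(y - μ)^2 / (2 * σ^2)) / (σ * sqrt(2 * π))

# Uniform kernel U(y; x, ϵ): constant 1/(2ϵ) on the open interval (x - ϵ, x + ϵ).
uniform_kernel(y, x, ϵ) = (x - ϵ < y < x + ϵ) ? 1 / (2 * ϵ) : 0.0

# Assumed combination rule: the equal-weight average of one kernel
# per (value, error) pair.
function density(kernel, y, value_errors)
    return sum(kernel(y, v, e) for (v, e) in value_errors) / length(value_errors)
end

value_errors = [(-3.5, 0.5), (-1.5, 0.75), (0.0, 0.25), (1.5, 0.75), (3.5, 0.5)]

g0 = density(gaussian_kernel, 0.0, value_errors)  # Gaussian density at y = 0
u0 = density(uniform_kernel, 0.0, value_errors)   # uniform density at y = 0
```

At y = 0 only the pair (0.0, 0.25) has a uniform kernel that is nonzero, so u0 is (1 / 0.5) / 5, while g0 also picks up the tails of the neighboring Gaussians.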
Example Usage
An example ContinuousHistogram can be constructed from the following Julia code for a simple Vector of value-error Tuples.
To start, first include the following packages:
using Plots # One must Pkg.add this separately
using UncertainHistogramming
Plots.gr() # Use GR to reproduce the plot exactly
Next, define a list of (value, error)-Tuples:
values_errors = [(-3.5, 0.5),
                 (-1.5, 0.75),
                 (0, 0.25),
                 (1.5, 0.75),
                 (3.5, 0.5)]
From here, we're in a position to initialize both a GaussianHistogram, ghist, and a UniformHistogram, uhist, and then push! the values_errors Vector into them.
ghist = GaussianHistogram()
uhist = UniformHistogram()
push!(ghist, values_errors)
GaussianHistogram{Float64}:
length = 5
moments = 1.1102230246251565e-16 6.1375 1.7763568394002505e-15 72.89453125
Statistics
mean = 1.1102230246251565e-16
variance = 6.1375
skewness = -1.7615310149552143e-17
kurtosis = -1.0648620173302747
push!(uhist, values_errors)
UniformHistogram{Float64}:
length = 5
moments = 1.1102230246251565e-16 5.9125 0.0 65.54296875
Statistics
mean = 1.1102230246251565e-16
variance = 5.9125
skewness = -1.369764503743595e-16
kurtosis = -1.125075426073508
Note that the non-central statistical moments are updated in an online manner. This means that, aside from the overhead of push!ing two elements into the ContinuousHistogram's values and errors Vectors, the cost of computing the statistics is amortized over the push! calls.
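To illustrate what an online update of the non-central moments can look like, here is a plain-Julia sketch for Gaussian kernels. It is not the package's implementation: RunningMoments, add_pair!, and the other helper names are hypothetical, and the equal weighting of the pairs is an assumption. Each new pair is folded into running averages of the first four raw moments in constant time, and the central statistics are derived from them on demand.

```julia
# Plain-Julia sketch of online raw-moment updates for Gaussian kernels.
# All names here are hypothetical, NOT UncertainHistogramming.jl API.

mutable struct RunningMoments
    n::Int                # number of (value, error) pairs seen so far
    m::Vector{Float64}    # running averages of the raw moments E[y^k], k = 1:4
end
RunningMoments() = RunningMoments(0, zeros(4))

# Raw moments of a single Gaussian with mean μ and standard deviation σ.
gaussian_raw_moments(μ, σ) = [μ,
                              μ^2 + σ^2,
                              μ^3 + 3 * μ * σ^2,
                              μ^4 + 6 * μ^2 * σ^2 + 3 * σ^4]

# Fold one pair into the running averages in O(1) time.
function add_pair!(acc::RunningMoments, μ, σ)
    acc.n += 1
    acc.m .+= (gaussian_raw_moments(μ, σ) .- acc.m) ./ acc.n
    return acc
end

# Central statistics derived from the raw moments on demand.
variance(acc) = acc.m[2] - acc.m[1]^2
function excess_kurtosis(acc)
    m1, m2, m3, m4 = acc.m
    c4 = m4 - 4 * m1 * m3 + 6 * m1^2 * m2 - 3 * m1^4   # fourth central moment
    return c4 / variance(acc)^2 - 3
end

acc = RunningMoments()
for (v, e) in [(-3.5, 0.5), (-1.5, 0.75), (0.0, 0.25), (1.5, 0.75), (3.5, 0.5)]
    add_pair!(acc, v, e)
end
```

Under these assumptions, variance(acc) and excess_kurtosis(acc) agree, up to floating-point rounding, with the variance and kurtosis printed for the GaussianHistogram above.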
From here, we just need to define an input domain for the ContinuousHistograms to be computed over as
x = LinRange(-6, 6, 3000)
3000-element LinRange{Float64, Int64}:
-6.0,-5.996,-5.992,-5.988,-5.98399,…,5.97999,5.98399,5.988,5.992,5.996,6.0
and then, with the help of Plots.jl and RecipesBase.jl, we have
plot(plot(x, ghist; title = "\$ \\mathtt{GaussianHistogram}ming \$"),
     plot(x, uhist; title = "\$ \\mathtt{UniformHistogram}ming \$");
     size = (800, 600),
     layout = (1, 2),
     link = :both)
In the left plot, one can see that the GaussianHistogram is plotted as the solid blue curve, and the individual Gaussian kernels that make it up are plotted as the dashed orange curves. The right plot shows the same set of curves, defined instead by the UniformDistribution. (The orange curves are nonzero only within their visible range; elsewhere they are hidden by the solid blue curve.)
I want to remark here that with the power of Julia's multiple dispatch, once one properly defines the interface for a new type of ContinuousHistogram, the plotting functionality, along with the utilities and statistics, just work.
One may also supply the keyword argument nkernels to plot(x, hist) to change the number of kernels displayed. By default, nkernels == 5.
If the number of value-error pairs exceeds nkernels, that is nkernels < length(hist), then no kernels will be shown, to save the end user from trying to understand an overly busy plot.
Add UncertainHistogramming.jl to your Julia environment
To add UncertainHistogramming.jl, simply press ] in the Julia REPL to enter pkg mode and type
pkg> add UncertainHistogramming
and presto! You now have full access to UncertainHistogramming.jl.