OnlineLogBinning API Reference

Documentation for OnlineLogBinning.

const MINIMUM_RX_64 = eps(Float64)
const MINIMUM_RX_32 = eps(Float32) 
const MINIMUM_RX_16 = eps(Float16)

Minimum allowable variance values, one per floating-point precision, selected according to the element type used in the least-squares fit. Any data stream variance smaller than these is suspiciously small, and an automated binning analysis should not be trusted in that case.


Defines the list of tested numerical types for OnlineLogBinning.jl.


These types are specifically given as:

  • Float16, Float32, Float64 for Real numbers.
  • ComplexF16, ComplexF32, ComplexF64 for Complex numbers.
BinningAccumulator{T}() where {T <: Number} (default T = Float64)
BinningAccumulator{T}(::Int) where {T <: Number} (default T = Float64)

Main data structure for the binning analysis. T == Float64 by default in the empty constructor. There are three constructors: an empty one, one that copy-constructs from a Vector of LevelAccumulators, and one that pre-allocates that Vector based on an anticipated data stream size.


  • LvlAccums::Vector{LevelAccumulator{T}}

The pre-allocating constructor requires OnlineLogBinning.jl v0.2.2 or higher.


julia> # Create a BinningAccumulator with the default type T == Float64

julia> bacc = BinningAccumulator()  
BinningAccumulator{Float64} with 0 binning levels.
0th Binning Level (unbinned data):
LevelAccumulator{Float64} with online fields:
    level    = 0
    num_bins = 0
    Taccum   = 0.0
    Saccum   = 0.0
    Paccum   = PairAccumulator{Float64}(true, [0.0, 0.0])

    Calculated Level Statistics:
    Current Mean             = NaN
    Current Variance         = -0.0
    Current Std. Deviation   = -0.0
    Current Var. of the Mean = NaN
    Current Std. Error       = NaN

julia> # Add a data stream using the push! function

julia> # (The data stream does not have to have a length == power of 2.)

julia> push!(bacc, [1, 2, 3, 4])
BinningAccumulator{Float64} with 2 binning levels.
0th Binning Level (unbinned data):
LevelAccumulator{Float64} with online fields:
    level    = 0
    num_bins = 4
    Taccum   = 10.0
    Saccum   = 5.0
    Paccum   = PairAccumulator{Float64}(true, [0.0, 0.0])

    Calculated Level Statistics:
    Current Mean             = 2.5
    Current Variance         = 1.6666666666666667
    Current Std. Deviation   = 1.2909944487358056
    Current Var. of the Mean = 0.4166666666666667
    Current Std. Error       = 0.6454972243679028

1th Binning Level:
LevelAccumulator{Float64} with online fields:
    level    = 1
    num_bins = 2
    Taccum   = 5.0
    Saccum   = 2.0
    Paccum   = PairAccumulator{Float64}(true, [0.0, 0.0])

    Calculated Level Statistics:
    Current Mean             = 2.5
    Current Variance         = 2.0
    Current Std. Deviation   = 1.4142135623730951
    Current Var. of the Mean = 1.0
    Current Std. Error       = 1.0

2th Binning Level:
LevelAccumulator{Float64} with online fields:
    level    = 2
    num_bins = 0
    Taccum   = 0.0
    Saccum   = 0.0
    Paccum   = PairAccumulator{Float64}(false, [0.0, 2.5])

    Calculated Level Statistics:
    Current Mean             = NaN
    Current Variance         = -0.0
    Current Std. Deviation   = -0.0
    Current Var. of the Mean = NaN
    Current Std. Error       = NaN
BinningAnalysisResult{T <: AbstractFloat}

Small struct recording whether a plateau was found in the RxValues of a BinningAccumulator (see _plateau_found), and what the plateau value is.


  • plateau_found::Bool: whether fit_RxValues found a plateau in the binned data.
  • RxAmplitude::T: the plateau value calculated by fit_RxValues.
    • If plateau_found == false, then RxAmplitude = length(X) for a datastream X, so that the error estimate is as large (as conservative) as possible.
  • effective_length::Int: the effective number of uncorrelated data points in the datastream X as calculated by

\[m_{\rm eff} = \mathtt{floor} \left( \frac{\mathtt{length}(X)}{R_X} \right).\]

  • binning_mean::T: the value of the mean as calculated by

\[\mathtt{mean}(X) = \frac{ T^{(0)} }{ m^{(0)} }.\]

  • binning_error::T: the value of the error as calculated by

\[\begin{aligned} \mathtt{error}(X) &= \sqrt{ \frac{ S^{(0)} }{ m_{\rm eff} \left( m^{(0)} - 1 \right) } } \\ &= \sqrt{ \left[ \mathtt{floor}\left( \frac{m^{(0)}}{R_X} \right) \right]^{-1} \, \frac{ S^{(0)} }{ m^{(0)} - 1 } }. \end{aligned}\]
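As a minimal sketch, the three derived fields can be evaluated directly from the level-0 accumulators $T^{(0)}$, $S^{(0)}$, $m^{(0)}$ and a plateau value $R_X$. The function binning_result_sketch is hypothetical and not part of OnlineLogBinning.jl. It only restates the formulas above in Julia.

```julia
# Hypothetical helper: evaluates the BinningAnalysisResult formulas above.
# T0, S0, m0 are the level-0 Taccum, Saccum, and num_bins; Rx is the plateau value.
function binning_result_sketch(T0::Real, S0::Real, m0::Integer, Rx::Real)
    m_eff = floor(Int, m0 / Rx)                 # effective_length
    mu = T0 / m0                                # binning_mean
    err = sqrt(S0 / (m_eff * (m0 - 1)))         # binning_error
    return (effective_length = m_eff, binning_mean = mu, binning_error = err)
end

# Level-0 accumulators after push!(bacc, [1, 2, 3, 4]), with Rx = 1 (no correlations)
res = binning_result_sketch(10.0, 5.0, 4, 1.0)
```

With $R_X = 1$ the error reduces to the ordinary standard error of the mean, which matches the Current Std. Error printed for level 0 in the BinningAccumulator example. Setting $R_X = m^{(0)}$ gives $m_{\rm eff} = 1$ and the largest possible error.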

LevelAccumulator{T <: Number}

Accumulator structure for a given binning level.


  • level::Int
    • Registers the binning level to which this accumulator is assigned
  • num_bins::Int
    • How many elements (i.e. bins) have been added to this accumulator
  • Taccum::T
    • Stands for Total Accumulator.
    • This represents the T accumulator for the mean: mean ≡ T / num_bins.
  • Saccum::T
    • Stands for Square Accumulator.
    • This represents the S accumulator for the variance: var ≡ S/(num_bins - 1).
  • Paccum::PairAccumulator{T}
    • An outward facing PairAccumulator to meet incoming data streams.
    • This accumulator processes the incoming data and then exports the Tvalue and Svalue into updates for Taccum and Saccum, respectively.
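The defining relations for the mean and variance can be sketched as plain functions of the (Taccum, Saccum, num_bins) triple, together with the var_of_mean and std_error relations given elsewhere in this reference. The helper names are hypothetical. The package's own methods take a LevelAccumulator instead.

```julia
# Hypothetical helpers mirroring the LevelAccumulator statistics.
# T = Taccum, S = Saccum, n = num_bins.
level_mean(T, n) = T / n
level_var(S, n) = S / (n - 1)
level_std(S, n) = sqrt(level_var(S, n))
level_var_of_mean(S, n) = level_var(S, n) / n
level_std_error(S, n) = sqrt(level_var_of_mean(S, n))

# Level 0 after push!(bacc, [1, 2, 3, 4]): Taccum = 10.0, Saccum = 5.0, num_bins = 4
T, S, n = 10.0, 5.0, 4
stats = (level_mean(T, n), level_var(S, n), level_var_of_mean(S, n), level_std_error(S, n))
```

With n = 0 the same expressions give NaN for the mean and -0.0 for the variance and standard deviation. This is exactly what the empty levels in the examples display.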
PairAccumulator{T <: Number}

Accumulator that directly faces an incoming data stream. Two values from that stream enter and are processed into the exported values of Tvalue and Svalue.


  • fullpair::Bool
    • A Boolean to keep track of which element of the pair is being accessed. Additionally, when fullpair == true then the contents are exported.
  • values::MVector{2, T}
    • The individual values taken from the data stream to be processed. Both Tvalue and Svalue rely on them being accessible.
getindex(bacc::BinningAccumulator; level)

Overload the [] notation to access the BinningAccumulator's LvlAccums at the binning level given by the level keyword.


julia> bacc = BinningAccumulator();

julia> bacc[level = 0]
LevelAccumulator{Float64} with online fields:
    level    = 0
    num_bins = 0
    Taccum   = 0.0
    Saccum   = 0.0
    Paccum   = PairAccumulator{Float64}(true, [0.0, 0.0])

    Calculated Level Statistics:
    Current Mean             = NaN
    Current Variance         = -0.0
    Current Std. Deviation   = -0.0
    Current Var. of the Mean = NaN
    Current Std. Error       = NaN

Return the number of LevelAccumulators in the BinningAccumulator (the binning levels plus the unbinned level 0).


julia> bacc = BinningAccumulator();

julia> push!(bacc, [1, 2, 3, 4, 3, 2, 1]); # Data stream with 7 elements

julia> length(bacc) # 2 binning levels plus 1 for the unbinned data
push!(bacc::BinningAccumulator, itr)

push! each value of the data stream itr through the BinningAccumulator.


julia> bacc = BinningAccumulator()
BinningAccumulator{Float64} with 0 binning levels.
0th Binning Level (unbinned data):
LevelAccumulator{Float64} with online fields:
    level    = 0
    num_bins = 0
    Taccum   = 0.0
    Saccum   = 0.0
    Paccum   = PairAccumulator{Float64}(true, [0.0, 0.0])

    Calculated Level Statistics:
    Current Mean             = NaN
    Current Variance         = -0.0
    Current Std. Deviation   = -0.0
    Current Var. of the Mean = NaN
    Current Std. Error       = NaN

julia> push!(bacc, [42, -26])
BinningAccumulator{Float64} with 1 binning levels.
0th Binning Level (unbinned data):
LevelAccumulator{Float64} with online fields:
    level    = 0
    num_bins = 2
    Taccum   = 16.0
    Saccum   = 2312.0
    Paccum   = PairAccumulator{Float64}(true, [0.0, 0.0])

    Calculated Level Statistics:
    Current Mean             = 8.0
    Current Variance         = 2312.0
    Current Std. Deviation   = 48.08326112068523
    Current Var. of the Mean = 1156.0
    Current Std. Error       = 34.0

1th Binning Level:
LevelAccumulator{Float64} with online fields:
    level    = 1
    num_bins = 0
    Taccum   = 0.0
    Saccum   = 0.0
    Paccum   = PairAccumulator{Float64}(false, [0.0, 8.0])

    Calculated Level Statistics:
    Current Mean             = NaN
    Current Variance         = -0.0
    Current Std. Deviation   = -0.0
    Current Var. of the Mean = NaN
    Current Std. Error       = NaN
push!(bacc::BinningAccumulator, value::Number)

Add a single value from the data stream into the online binning analysis. The single value enters at the bin at the lowest level.


julia> bacc = BinningAccumulator()
BinningAccumulator{Float64} with 0 binning levels.
0th Binning Level (unbinned data):
LevelAccumulator{Float64} with online fields:
    level    = 0
    num_bins = 0
    Taccum   = 0.0
    Saccum   = 0.0
    Paccum   = PairAccumulator{Float64}(true, [0.0, 0.0])

    Calculated Level Statistics:
    Current Mean             = NaN
    Current Variance         = -0.0
    Current Std. Deviation   = -0.0
    Current Var. of the Mean = NaN
    Current Std. Error       = NaN

julia> push!(bacc, 42)
BinningAccumulator{Float64} with 0 binning levels.
0th Binning Level (unbinned data):
LevelAccumulator{Float64} with online fields:
    level    = 0
    num_bins = 0
    Taccum   = 0.0
    Saccum   = 0.0
    Paccum   = PairAccumulator{Float64}(false, [0.0, 42.0])        

    Calculated Level Statistics:
    Current Mean             = NaN
    Current Variance         = -0.0
    Current Std. Deviation   = -0.0
    Current Var. of the Mean = NaN
    Current Std. Error       = NaN

Notice that Taccum and Saccum remain zero while num_bins == 0: they are only accumulated once per input pair, i.e. once Paccum.fullpair == true.
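The pairwise cascade that push! performs can be sketched in a few lines. This is a hypothetical re-implementation for illustration, not the package's code. SketchLevel, absorb_pair!, and sketch_push! are made-up names, and a pending value plays the role of a half-filled PairAccumulator. Each completed pair updates T and S with the pair formulas documented in this reference before its mean is pushed one level up.

```julia
# Hypothetical sketch of the pairwise cascade; not the OnlineLogBinning.jl implementation.
mutable struct SketchLevel
    num_bins::Int                      # values absorbed so far (always even)
    T::Float64                         # running total, like Taccum
    S::Float64                         # running sum of squared deviations, like Saccum
    pending::Union{Nothing,Float64}    # first element of an incomplete pair
end
SketchLevel() = SketchLevel(0, 0.0, 0.0, nothing)

# Fold a completed pair (x1, x2) into the level and return the pair mean.
function absorb_pair!(lvl::SketchLevel, x1::Float64, x2::Float64)
    m = lvl.num_bins
    Tp = x1 + x2                       # pairwise T value
    Sp = (x2 - x1)^2 / 2               # pairwise S value
    if m > 0
        lvl.S += Sp + m / (2 * (m + 2)) * (2 / m * lvl.T - Tp)^2
    else
        lvl.S = Sp                     # first pair: no cross term
    end
    lvl.T += Tp
    lvl.num_bins += 2
    return Tp / 2
end

# Push one value into the level at index `level` (index 1 = unbinned level 0).
function sketch_push!(levels::Vector{SketchLevel}, x::Real, level::Int = 1)
    level > length(levels) && push!(levels, SketchLevel())
    lvl = levels[level]
    if lvl.pending === nothing
        lvl.pending = Float64(x)       # wait for the second element of the pair
    else
        x1 = lvl.pending::Float64
        lvl.pending = nothing
        # the pair mean becomes one value of the next binning level
        sketch_push!(levels, absorb_pair!(lvl, x1, Float64(x)), level + 1)
    end
    return levels
end

levels = SketchLevel[]
foreach(x -> sketch_push!(levels, x), [1, 2, 3, 4])
```

The resulting three levels reproduce the num_bins, Taccum, and Saccum values printed in the BinningAccumulator example for push!(bacc, [1, 2, 3, 4]). This includes the unpaired 2.5 waiting at level 2.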

push!(pacc::PairAccumulator, value::Number)

Overload Base.push! for a PairAccumulator. Only a single value of type <: Number can be push!ed into this type of accumulator at a time.

show([io::IO = stdout], bacc::BinningAccumulator)

Overload Base.show for a human-readable display of a BinningAccumulator.

show([io = stdout], lacc::LevelAccumulator)

Overload Base.show for a human-readable display of a LevelAccumulator.

RxValue(bacc::BinningAccumulator, [trustworthy_only = true]; [trusting_cutoff])

Calculate the RxValues from the statistically trustworthy binning levels by default, or from all of them if trustworthy_only == false.

RxValue(bacc::BinningAccumulator, level)

Compute the $R_X$ quantity from the binning analysis. This quantity starts at $1$ for low binning levels and then gradually rises until the bins become statistically uncorrelated, at which point $R_X$ should saturate. Once saturated, the effective number of uncorrelated elements in a correlated data stream of size $M$ is given in terms of $R_X$ by $M / R_X$.
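This reference does not spell out how $R_X$ is computed at a given level. A common convention in binning analyses is assumed here rather than taken from OnlineLogBinning.jl. Under it, $R_X$ is the ratio of the level-$\ell$ variance of the mean to the level-0 variance of the mean. When $m^{(\ell)} = m^{(0)}/2^{\ell}$ this equals $2^{\ell}\,\sigma^2_{(\ell)}/\sigma^2_{(0)}$. The helper names are hypothetical.

```julia
# Hypothetical: R_X at a level under the ASSUMED definition
#   R_X = (variance of the mean at that level) / (variance of the mean at level 0)
var_of_mean_sketch(S, n) = S / (n - 1) / n      # from one level's Saccum and num_bins

rx_sketch(S_l, n_l, S0, n0) = var_of_mean_sketch(S_l, n_l) / var_of_mean_sketch(S0, n0)

# Saccum and num_bins for levels 0 and 1 after push!(bacc, [1, 2, 3, 4])
rx_level0 = rx_sketch(5.0, 4, 5.0, 4)
rx_level1 = rx_sketch(2.0, 2, 5.0, 4)
```

Under this assumption $R_X = 1$ at level 0 by construction, consistent with the description above. The level-1 value of 2.4 equals $2 \times 2.0 / (5/3)$, computed from the printed level variances.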


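Function to calculate the online $S_{1,m+2}$ summation, i.e. the sum of squared deviations once a new pair $(x_{m+1}, x_{m+2})$ is merged into the first $m$ data points: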

\[S_{1,m+2} = S_{1,m} + S_{m+1,m+2} + \frac{m}{2(m+2)}\left( \frac{2}{m} T_{1,m} - T_{m+1,m+2} \right)^2.\]

where $T_{1,m}$ is the running total of the first $m$ data points and $T_{m+1,m+2}$ is the pairwise Tvalue for the PairAccumulator.


For a single pair following the accumulation of $m$ data points, the $S$ function is

\[\begin{aligned} S_{m+1, m+2} &\equiv \sum_{k = m+1}^{m+2} \left( x_k - \frac{1}{2} T_{m+1,m+2} \right)^2 \\ &= \frac{1}{2}\left( x_{m+2} - x_{m+1} \right)^2. \end{aligned}\]

Thus, $S_{m+1,m+2}$ does not need to take $T_{m+1,m+2}$ as an argument.


For a single pair following the accumulation of $m$ data points, the $T$ function is

\[T_{m+1, m+2} \equiv \sum_{k = m+1}^{m+2} x_k = x_{m+1} + x_{m+2},\]

as expected.
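These relations can be checked numerically. Folding pairs in with the $S_{1,m+2}$ update should reproduce the two-pass sum of squared deviations $\sum_k (x_k - \bar{x})^2$. The function names below are hypothetical, and the sketch assumes an even number of data points so that every pair is complete.

```julia
# Hypothetical names; a numerical check of the pair formulas above.
pair_T(x1, x2) = x1 + x2               # T_{m+1,m+2}
pair_S(x1, x2) = (x2 - x1)^2 / 2       # S_{m+1,m+2}

# S_{1,m+2} from S_{1,m}, T_{1,m} (with m ≥ 2) and the next pair (x1, x2)
combine_S(S, T, m, x1, x2) =
    S + pair_S(x1, x2) + m / (2 * (m + 2)) * (2 / m * T - pair_T(x1, x2))^2

# Two-pass reference: sum of squared deviations from the mean
batch_S(xs) = (mu = sum(xs) / length(xs); sum(x -> (x - mu)^2, xs))

# Online version: start from the first pair, then fold in one pair at a time
function online_S(xs)
    @assert length(xs) >= 2 && iseven(length(xs))
    S, T = pair_S(xs[1], xs[2]), pair_T(xs[1], xs[2])
    for k in 3:2:length(xs)
        S = combine_S(S, T, k - 1, xs[k], xs[k + 1])   # k - 1 points already folded in
        T += pair_T(xs[k], xs[k + 1])
    end
    return S
end

xs = [0.3, -1.2, 4.5, 2.2, 0.0, 7.1, -3.3, 1.9]
online_S(xs) ≈ batch_S(xs)
```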

_plateau_found(bacc, fit) → Bool

Test whether a plateau has been found from the fit using the LsqFit.jl package. This includes finding reasonable values for the sigmoid parameters.


What counts as a plateau?

A plateau in the RxValues is defined to be present if all three of the following conditions hold:

  1. None of the computed level variances are too small.
  2. The amplitude of the sigmoid fit is positive.
  3. The inflection point of the sigmoid fit, θ₁ / θ₂, is less than max_trustworthy_level(levels).

If any of these conditions is violated, then we do not trust that the RxValues have actually converged to a single value. This means the data stream is not large enough to separate correlated data points from one another.
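The three conditions can be written as a single predicate. This is a hypothetical sketch, not the package's _plateau_found. It takes the per-level variances, the fitted sigmoid parameters $(A, \theta_1, \theta_2)$, and the highest trustworthy level directly. It assumes a variance counts as too small when it falls below the relevant minimum constant, using MINIMUM_RX_64 = eps(Float64) by default.

```julia
# Hypothetical predicate mirroring the three plateau conditions; not _plateau_found.
#   variances  : per-level variances of the binned data
#   A, θ₁, θ₂  : fitted sigmoid amplitude and parameters
#   max_level  : highest trustworthy binning level
#   min_var    : assumed "too small" threshold (eps(Float64), i.e. MINIMUM_RX_64)
function plateau_sketch(variances, A, θ₁, θ₂, max_level; min_var = eps(Float64))
    no_tiny_variances  = all(v -> v >= min_var, variances)   # condition 1
    positive_amplitude = A > 0                               # condition 2
    early_inflection   = θ₁ / θ₂ < max_level                 # condition 3
    return no_tiny_variances && positive_amplitude && early_inflection
end

plateau_sketch([1.2, 1.9, 2.3, 2.4], 2.4, 1.0, 1.5, 5)   # all three conditions hold
```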


Number of binned levels present: the length of the BinningAccumulator minus 1.


julia> bacc = BinningAccumulator();

julia> push!(bacc, [1, 2, 3, 4, 3, 2, 1]); # Data stream with 7 elements

julia> bin_depth(bacc) # Only 2 binned levels (the unbinned level 0 is not counted)
effective_uncorrelated_values(mvals, RxVal)

Calculation of the effective number of uncorrelated values in a correlated datastream:

\[m_{\rm eff} = \mathtt{floor} \left( \frac{ m^{(0)} }{R_X} \right).\]

levels_RxValues(bacc::BinningAccumulator, [trustworthy_only = true]; [trusting_cutoff = TRUSTING_CUTOFF])

Return a Tuple of identically sized Vectors. The first element of the Tuple holds the binning levels and the second holds the corresponding RxValues. If trustworthy_only == true, then only the trustworthy levels and values are returned. If trustworthy_only == false, then all levels and values are returned (except for the last level, which is typically not full).

This function is meant to make visualization more convenient and adds no functionality beyond what was already available.


Requires OnlineLogBinning.jl v0.3.0 or higher.

max_trustworthy_level(nelements; [trusting_cutoff])

Calculates the highest binning level that remains statistically trustworthy according to the TRUSTING_CUTOFF, $t_c$.

Given a number of elements in a data stream, $N$, this quantity is

\[\ell_{\rm max} = {\rm floor} \left[ \log_2 \left( \frac{N}{t_c} \right) \right].\]
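Here is a one-line sketch of this formula. It assumes $t_c = 64$, the default trustworthy_cutoff listed for trustworthy_level, because the actual value of TRUSTING_CUTOFF is not stated here. The name max_level_sketch is hypothetical.

```julia
# Hypothetical stand-in for max_trustworthy_level; t_c = 64 is an assumed default.
max_level_sketch(N::Integer; t_c::Integer = 64) = floor(Int, log2(N / t_c))

max_level_sketch(1_000)   # 1000 / 64 = 15.625 and floor(log2(15.625)) = 3
```

For $N < t_c$ the result is negative, meaning not even the unbinned level would be trustworthy.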

sigmoid(x, [amp = 1], [θ₁ = 0], [θ₂ = 1])

Calculate a Sigmoid at a given argument x. The Sigmoid function $S(x; A, \theta_1, \theta_2)$ is of the form

\[S(x; A, \theta_1, \theta_2) = \frac{A}{1 + \exp\left( \theta_1 - \theta_2 x \right)}.\]
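The formula translates directly into Julia. The name sigmoid_sketch is hypothetical, and its default arguments simply mirror the signature above.

```julia
# Sketch of the sigmoid above; defaults mirror sigmoid(x, [amp = 1], [θ₁ = 0], [θ₂ = 1]).
sigmoid_sketch(x, A = 1.0, θ₁ = 0.0, θ₂ = 1.0) = A / (1 + exp(θ₁ - θ₂ * x))

sigmoid_sketch(0.0)                  # exp(0) = 1, so this is 1 / 2
sigmoid_sketch(2.0, 3.0, 4.0, 2.0)   # x = θ₁ / θ₂, so this is A / 2 = 1.5
```

At $x = \theta_1/\theta_2$ the exponent vanishes and $S = A/2$, which is why θ₁ / θ₂ serves as the inflection point in the plateau test. As $x$ grows, $S$ approaches $A$, the plateau value.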

sigmoid_jacobian(x, pvals)

Calculate the "Jacobian" of first derivatives for a sigmoid to speed up the LsqFit.jl fitting. The derivatives are given by

\[\begin{aligned} \frac{\partial S}{\partial A} &= \frac{1}{1 + \exp\left( \theta_1 - \theta_2 x \right)}, \\ & \\ \frac{\partial S}{\partial \theta_1} &= -\frac{A \, \exp\left( \theta_1 - \theta_2 x \right) }{\left[ 1 + \exp\left( \theta_1 - \theta_2 x \right) \right]^2}, \\ & \\ \frac{\partial S}{\partial \theta_2} &= \frac{A \, x \, \exp\left( \theta_1 - \theta_2 x \right) }{\left[ 1 + \exp\left( \theta_1 - \theta_2 x \right) \right]^2}. \end{aligned}\]
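These partial derivatives can be checked against central finite differences. The names sigmoid_eval, sigmoid_jacobian_sketch, and fd_jacobian are hypothetical. The matrix has one row per x value and one column per parameter (A, θ₁, θ₂), the layout used for analytic Jacobians passed to LsqFit.jl's curve_fit.

```julia
# Sigmoid with parameter vector p = [A, θ₁, θ₂]
sigmoid_eval(x, p) = p[1] / (1 + exp(p[2] - p[3] * x))

# Analytic Jacobian from the derivatives above: rows = x values, columns = parameters
function sigmoid_jacobian_sketch(xs::AbstractVector, p::AbstractVector)
    J = zeros(length(xs), 3)
    for (i, x) in enumerate(xs)
        e = exp(p[2] - p[3] * x)
        J[i, 1] = 1 / (1 + e)                  # ∂S/∂A
        J[i, 2] = -p[1] * e / (1 + e)^2        # ∂S/∂θ₁
        J[i, 3] = p[1] * x * e / (1 + e)^2     # ∂S/∂θ₂
    end
    return J
end

# Central finite differences in each parameter, for comparison
function fd_jacobian(xs, p; h = 1e-6)
    J = zeros(length(xs), 3)
    for j in 1:3
        dp = zeros(3); dp[j] = h
        J[:, j] = (sigmoid_eval.(xs, Ref(p .+ dp)) .- sigmoid_eval.(xs, Ref(p .- dp))) ./ (2h)
    end
    return J
end

xs = collect(range(-2.0, 6.0; length = 9))
p = [2.4, 1.0, 1.5]
isapprox(sigmoid_jacobian_sketch(xs, p), fd_jacobian(xs, p); atol = 1e-6)
```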

std_error( bacc::BinningAccumulator )

Online measurement of the BinningAccumulator standard error.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
std_error( lacc::LevelAccumulator ) = sqrt(var_of_mean(lacc))

Online measurement of the LevelAccumulator standard error.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
trustworthy_level(level; [trustworthy_cutoff = 64])

A binning level is said to be a trustworthy_level if the number of bins it contains is greater than or equal to the trustworthy_cutoff.

The number of bins $N_{\rm bin}$ in any binning level is related to the number of elements $N$ and its binning level $\ell \in \{0, 1, \dots \}$ by

\[N_{\rm bin} = \frac{N}{2^{\ell}}.\]

This means that, for a given trustworthy_cutoff $t_c$, the maximum number of trustworthy_levels present is

\[{\rm Total}(\ell) = 1 + {\rm floor} \left[ \log_2 \left( \frac{N}{t_c} \right) \right],\]

where the extra 1 comes from assuming the original data stream has at least $t_c$ elements in it, making the $\ell = 0$ level a trustworthy_level.


In short, the statistics shown for these levels are not susceptible to low-number effects. The $\log_2$ term is calculated using max_trustworthy_level.
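The closed form for ${\rm Total}(\ell)$ can be checked by counting levels directly. In this hypothetical sketch, following $N_{\rm bin} = N / 2^{\ell}$, level $\ell$ is taken to hold N ÷ 2^ℓ bins (rounded down). It is counted as trustworthy when that number is at least $t_c$. The function names are made up for illustration.

```julia
# Hypothetical check of Total(ℓ): count trustworthy levels directly.
# Following N_bin = N / 2^ℓ, level `lev` is taken to hold N ÷ 2^lev bins.
function count_trustworthy(N::Integer, t_c::Integer)
    n_levels, lev = 0, 0
    while N ÷ 2^lev >= t_c
        n_levels += 1
        lev += 1
    end
    return n_levels
end

# Closed form from the text, valid for N ≥ t_c
total_trustworthy(N::Integer, t_c::Integer) = 1 + floor(Int, log2(N / t_c))

[(N, count_trustworthy(N, 64), total_trustworthy(N, 64)) for N in (100, 1_000, 12_345, 10^6)]
```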

var_of_mean( bacc::BinningAccumulator; [level = 0] )

Online measurement of the BinningAccumulator variance of the mean.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
var_of_mean( lacc::LevelAccumulator ) = var(lacc) / lacc.num_bins

Online measurement of the LevelAccumulator variance of the mean.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
mean( bacc::BinningAccumulator; [level = 0] )

Online measurement of the BinningAccumulator mean.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
mean( lacc::LevelAccumulator )

Online measurement of the LevelAccumulator mean.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
std( bacc::BinningAccumulator )

Online measurement of the BinningAccumulator standard deviation.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
std( lacc::LevelAccumulator ) = sqrt(var(lacc))

Online measurement of the LevelAccumulator standard deviation.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
var( bacc::BinningAccumulator; [level = 0] )

Online measurement of the BinningAccumulator variance.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.
var( lacc::LevelAccumulator )

Online measurement of the LevelAccumulator variance.

Additional information

  • This quantity is considered online even though it is not updated when data is push!ed from the stream.