PointProcessTools.jl

Exported data types

PointProcessTools.Record - Type

Record

  • events::Vector{<:AbstractFloat} -> Event history of the process
  • interval::Interval -> Time window of the observation

Represents an observed process. Event times are measured as time before present, so the present is represented as 0 and positive values represent past events. Negative values passed to the constructor are converted to positive.

Implemented functions: show, events_to_array, n_events, length, interval, start, finish, getindex.

The events can either be read from a csv file, by passing its path to the constructor, or be passed directly as a Vector. A csv file must contain a header, and its first column must contain the event times. See the folder Resources/data for an example.

Record("Resources/data/record.csv")
Record([1, 4, 5, 5.5, 8, 8.9])

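An interval can be provided as the last positional arguments. The examples below also pass proxy data (a csv path, or vectors of times and values) before the interval.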

Record("Resources/data/record.csv", "Resources/data/proxy.csv", 100000, period=10000)
Record([1, 4, 5, 5.5, 8, 8.9], [0, 2, 4, 6, 8, 10], [1, 2, 5, 4, 7, 3], 200000, 500000)

For csv files that contain additional categorical columns, keyword arguments can be passed to filter the events. Provide the pair col=[val1, val2, ..., valn] to keep only the events whose value in column col is one of val1, val2, ..., valn.

Record("Resources/data/record.csv", Composition=["Mafic", "Bimodal"])

PointProcessTools.Proxy - Function

Proxy

A continuous function constructed by linearly interpolating the proxy values. The actual data type of a Proxy is Interp, which is an alias for Interpolations.Extrapolation (from the Interpolations package), but the Proxy function acts as a constructor.

Implemented functions: show, minimum, maximum, get_xs, get_ys, -, interval, integral, .

To initialize, a path to a csv file can be provided. The file must contain the times of each observation in the first column and the values of each observation in the second. Look in the Resources/data folder for an example.

Proxy("Resources/data/proxy.csv")

There are two keyword arguments: period and shift.

period is for calculating the finite central difference of the function.

shift is for shifting the function backwards or forwards.

Proxy("Resources/data/proxy.csv", period=10000, shift=5000)

Instead of a file, it is possible to initialize a Proxy by passing the time and value columns as Vectors of equal length.

Proxy(collect(0:1000:1000000), rand(1001))

Optionally, an interval can be passed too: either one number, representing the total time span of the proxy, or two numbers, representing the start and end of the interval where the proxy is defined. See Interval.

Proxy("Resources/data/proxy.csv", 500000)
Proxy("Resources/data/proxy.csv", 100000, 300000)

PointProcessTools.Parameters - Type

Abstract type for dispatching on the type of parameters.

Used in CIF, simulate, time_transform.

Implemented methods for: length, collect and show.

Possible concrete subtypes are:

  • ParametersHP -> Homogeneous Poisson | 1 parameter | μ
  • ParametersIP -> Inhomogeneous Poisson | 2 parameters | μ, γ
  • ParametersHH -> Homogeneous Hawkes | 3 parameters | μ, α, β
  • ParametersIH -> Inhomogeneous Hawkes | 4 parameters | μ, γ, α, β

Usually initialized by the estimate function, but can also be initialized directly by providing the corresponding parameters.

rec = Record("Resources/data/record.csv")
params_hp = estimate("hp", rec) # ParametersHP
params_hh = PointProcessTools.ParametersHH(5e-5, 5e-5, 8e-5)

Exported functions

PointProcessTools.fit_test - Function

Performs the goodness-of-fit test for a given model and distance function. The number of simulations used in the bootstrap can be set with the n_sims keyword argument (default 1000).

The model may be provided either as a string or as an instance of the model type. The dist may be provided either as a string or as an instance of the distance type.

For inhomogeneous processes, the proxy field of rec must not be nothing.

Returns a named tuple with fields:

  • p -> returned p-value of the test
  • sim_dists -> simulated distances used in calculating the p-value
  • dist -> distance between the observed and the estimated process
  • params -> estimated parameters of the model
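
As a rough illustration of how a bootstrap p-value can be derived from the fields listed above, the sketch below takes the fraction of simulated distances that are at least as large as the observed one. The helper name bootstrap_p_value is hypothetical, and it is an assumption that fit_test uses this exact formula rather than a corrected variant.

```julia
# Hypothetical helper, not part of PointProcessTools.
# Assumption: the p-value is the share of simulated distances that are
# greater than or equal to the distance of the observed process.
function bootstrap_p_value(observed_dist::Real, sim_dists::AbstractVector{<:Real})
    isempty(sim_dists) && throw(ArgumentError("sim_dists must not be empty"))
    return count(>=(observed_dist), sim_dists) / length(sim_dists)
end

# Two of the four simulated distances are >= 0.3.
p = bootstrap_p_value(0.3, [0.1, 0.2, 0.3, 0.4])
```

A small p-value under this definition means the observed process is farther from the fitted model than most simulations of that model.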

See Record, distance, Model, simulate, estimate.

rec = Record("Resources/data/record.csv")
fit_test("hp", "ks", rec) # Perform the goodness-of-fit test for a Poisson process using the KS-distance
fit_test("hh", "lp", rec; n_sims=1000) # Perform the test for a Hawkes process using the Laplace distance.

PointProcessTools.estimate - Function

Estimate the parameters of an observed process as one of the supported models.

The 'model' may be provided either as a string or as an instance of the model type.

For the homogeneous Poisson model, the maximum likelihood estimator (MLE) can be calculated directly (Laub (2021)).
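
For reference, the closed-form estimator for a homogeneous Poisson process is the number of events divided by the length of the observation interval. The sketch below shows that formula only; the helper name hp_rate_mle and its argument order are hypothetical and are not the package's API.

```julia
# Hypothetical helper, not part of PointProcessTools.
# MLE of the rate of a homogeneous Poisson process: events per unit of time.
function hp_rate_mle(events::AbstractVector{<:Real}, start::Real, finish::Real)
    duration = abs(finish - start)
    duration > 0 || throw(ArgumentError("the interval must have positive length"))
    return length(events) / duration
end

# Six events observed over an interval of length 10.
mu_hat = hp_rate_mle([1, 4, 5, 5.5, 8, 8.9], 0, 10)
```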

For the inhomogeneous Poisson model, the MLE is approximated using a Newton optimization method.

For both variants of the Hawkes process, the MLE is approximated with an expectation maximization (EM) algorithm (E. Lewis, G. Mohler (2011)).

Returns an instance of Parameters.

See Model, Parameters, Record.

rec = Record("Resources/data/record.csv")
estimate("hp", rec) # Estimate the parameters for a Poisson process
estimate("hh", rec) # Estimate the parameters for a Hawkes process

PointProcessTools.simulate - Function

Simulate one realization of a point process with the given parameters and interval.

This function is dispatched on the type of 'params'. For inhomogeneous processes, a Proxy object is required to provide the intensity function.

Returns a vector containing the event times.
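
To make the simplest case concrete, a homogeneous Poisson process can be simulated by accumulating independent exponential waiting times until the end of the interval is reached. This is a minimal sketch under that textbook construction; the name simulate_hp_sketch and the rng keyword are hypothetical, forward-running time is assumed, and the package's own algorithm may differ.

```julia
using Random

# Hypothetical helper, not part of PointProcessTools.
# Draws exponential gaps with mean 1/μ and accumulates them on [start, finish).
function simulate_hp_sketch(μ::Real, start::Real, finish::Real; rng=Random.default_rng())
    μ > 0 || throw(ArgumentError("μ must be positive"))
    events = Float64[]
    t = start + randexp(rng) / μ
    while t < finish
        push!(events, t)
        t += randexp(rng) / μ
    end
    return events
end

# Unit-rate process on [0, 100) with a fixed seed for reproducibility.
sim_events = simulate_hp_sketch(1.0, 0, 100; rng=MersenneTwister(1))
```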

See Parameters, Record, Proxy.

simulate(ParametersHP(1), 0, 100) # Simulates a Poisson process with unit intensity over [0, 100]
params_ih = ParametersIH(1, 4, 2, 4)
proxy = Proxy(collect(0:100), log.(1:101)) # log.(1:101) avoids log(0) = -Inf at the first point
simulate(params_ih, 0, 100, proxy) # Simulate an inhomogeneous Hawkes process

PointProcessTools.CIF - Function

Return the conditional intensity function of a point process as a Proxy.

If only a model and a Record are passed, calculates the CIF of the process with parameters estimated from Record.

The specific model (Homogeneous Poisson, Inhomogeneous Poisson, Hawkes, etc...) is determined by the type of params.
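
As an illustration of what a conditional intensity looks like for the homogeneous Hawkes case, the sketch below evaluates a baseline rate plus an exponentially decaying contribution from every earlier event. The kernel form α·exp(-β·(t - s)) is an assumption (the package may use a different parameterization of α and β), the function name is hypothetical, and time is assumed to run forward rather than as time before present.

```julia
# Hypothetical helper, not part of PointProcessTools.
# Assumption: λ(t) = μ + Σ α * exp(-β * (t - s)) over events s strictly before t.
function hawkes_intensity_sketch(μ::Real, α::Real, β::Real,
                                 events::AbstractVector{<:Real}, t::Real)
    excitation = sum((α * exp(-β * (t - s)) for s in events if s < t); init=0.0)
    return μ + excitation
end

# Intensity one time unit after a single event at time 0.
λ_after = hawkes_intensity_sketch(1.0, 0.5, 2.0, [0.0], 1.0)
```

With no earlier events the sum is empty and the intensity reduces to the baseline μ, which is the homogeneous Poisson case.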

See Parameters, Record, Proxy.

rec = Record("Resources/data/record.csv", "Resources/data/proxy.csv")
CIF("hp", rec) # CIF of a Poisson process with parameters estimated from `rec`
CIF(ParametersIP(1e-4, 2e-2), rec) # CIF of an Inhomogeneous Poisson process

PointProcessTools.periodicities - Function

Calculates the Fourier transform of a point process. This method uses the structure of the data to improve the speed and precision of the calculations. Each event is represented as a shifted Dirac delta function, so an event that occurred at time t₀ is represented as δ(t - t₀). Since the Fourier transform is linear and the Fourier transform of δ(t - t₀) is F(ω) = exp(-iωt₀), where ω is the frequency, the transform of the whole record is the sum of these functions over all t₀ in the event history. This allows only specific components to be computed, which speeds up the computation. See M. Bartlett (1963), Statistical Estimation of Density Function.

Calculates the equivalent of the Fourier transform, but only for specific chosen periodicities. It is simply the sum of the complex exponential at the chosen frequency, evaluated at the time of each event in the record.
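
The sum of complex exponentials described above can be sketched in a few lines. The function name point_spectrum_sketch is hypothetical, and converting each period p to the frequency ω = 2π/p is an assumption about how periodicities are specified; this is not the package's implementation.

```julia
# Hypothetical helper, not part of PointProcessTools.
# For each period p, sums exp(-iωt) over all event times t, with ω = 2π / p.
function point_spectrum_sketch(events::AbstractVector{<:Real},
                               periods::AbstractVector{<:Real})
    return map(periods) do p
        ω = 2π / p
        sum((cis(-ω * t) for t in events); init=0.0im)  # cis(x) == exp(im * x)
    end
end

# Events spaced exactly 2 apart: the period-2 component adds up coherently,
# while the period-4 component cancels out.
spec = point_spectrum_sketch([0.0, 2.0, 4.0, 6.0], [2.0, 4.0])
```

Only the requested components are evaluated, so the cost is proportional to the number of events times the number of chosen periods.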

PointProcessTools.period_function - Function

Constructs the sine wave corresponding to a component from the Fourier transform. This function is not scaled with the magnitude of the component. The scaling can be done by simply multiplying the function by the magnitude of the component and dividing by the length of the record.

Non exported data types

PointProcessTools.Model - Type

Abstract type for dispatching on the type of model.

Used in CIF, simulate, time_transform.

Possible concrete subtypes:

  • HP -> Homogeneous Poisson
  • IP -> Inhomogeneous Poisson
  • HH -> Homogeneous Hawkes
  • IH -> Inhomogeneous Hawkes

Non exported functions

PointProcessTools.distance - Function

Calculates the distance between the empirical distribution of the interarrival times and a unit exponential distribution. It assumes transf_events are the time-transformed event times.

'dist' must be either "KS" for the Kolmogorov-Smirnov distance, or "Lp" for the distance using the Laplace transform (not case sensitive).

See time_transform.

rec = Record("Resources/data/record.csv")
params_hh = ParametersHH(1e-4, 1e-4, 3e-4)
distance("ks", time_transform(params_hh, rec))
distance("Lp", time_transform(params_hh, rec))

For the Kolmogorov-Smirnov distance, the Wikipedia article is sufficient.

For the Laplace distance, see this paper.
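
As a sketch of the Kolmogorov-Smirnov variant, the code below compares the empirical distribution of the gaps between transformed event times with the unit exponential CDF 1 - exp(-x). The function name is hypothetical, and it assumes the input holds only transformed event times starting from 0 (without the interval end that time_transform appends), so it is an illustration rather than the package's implementation.

```julia
# Hypothetical helper, not part of PointProcessTools.
# KS distance between the empirical CDF of the interarrival times and Exp(1).
function ks_distance_unit_exp(transformed_events::AbstractVector{<:Real})
    gaps = sort(diff(vcat(0.0, transformed_events)))
    n = length(gaps)
    n > 0 || throw(ArgumentError("at least one event is required"))
    cdf = 1 .- exp.(-gaps)                      # unit exponential CDF at each gap
    d_plus = maximum((1:n) ./ n .- cdf)         # empirical CDF above the model CDF
    d_minus = maximum(cdf .- (0:n-1) ./ n)      # model CDF above the empirical CDF
    return max(d_plus, d_minus)
end

# A single gap at the median of Exp(1) gives a distance of exactly one half.
d = ks_distance_unit_exp([log(2)])
```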

PointProcessTools.time_transform - Function

Returns the time transformed event history of the process given the conditional intensity function calculated with respect to the given parameters.

The returned times always lie on the interval from 0 to the time transform of the end of the definition interval, which is returned as the last element of the vector.

Used in distance for calculating the KS or Laplace distances.

The function is dispatched based on the type of 'params'.
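
For the homogeneous Poisson case the transform is just a rescaling of time by the rate, which makes the output layout described above easy to see. This is a minimal sketch assuming forward-running time and a constant intensity μ; the function name and argument order are hypothetical.

```julia
# Hypothetical helper, not part of PointProcessTools.
# Integrates the constant intensity μ from `start` to each event time and
# appends the transformed end of the interval as the last element.
function time_transform_hp_sketch(μ::Real, events::AbstractVector{<:Real},
                                  start::Real, finish::Real)
    transformed = μ .* (sort(events) .- start)
    return vcat(transformed, μ * (finish - start))
end

# Two events on [0, 10] with rate 0.5: the last element is the transformed end.
tt = time_transform_hp_sketch(0.5, [2.0, 6.0], 0.0, 10.0)
```

If the model is correct, the gaps between the transformed times behave like draws from a unit exponential distribution, which is what distance checks.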

See Parameters, Record.

rec = Record("Resources/data/record.csv")
params = ParametersHH(1e-4, 1e-4, 2e-4)
time_transform(params, rec)

PointProcessTools.likelihood - Function

Calculates the likelihood of an observed process with respect to the given parameters.

This function is dispatched on the type of 'params'.
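
For the homogeneous Poisson model the log-likelihood has the closed form n·log(μ) - μ·T, where n is the number of events and T the length of the observation interval. The sketch below evaluates that expression; the function name is hypothetical, and whether likelihood returns the likelihood itself or its logarithm is not stated here, so the log form is an assumption.

```julia
# Hypothetical helper, not part of PointProcessTools.
# Log-likelihood of a homogeneous Poisson process with rate μ.
function hp_loglikelihood_sketch(μ::Real, events::AbstractVector{<:Real},
                                 start::Real, finish::Real)
    μ > 0 || throw(ArgumentError("μ must be positive"))
    return length(events) * log(μ) - μ * abs(finish - start)
end

# Two events on an interval of length 10, evaluated at μ = 1.
ll = hp_loglikelihood_sketch(1.0, [1.0, 2.0], 0, 10)
```

The expression is maximized at μ = n / T, which is the directly computed estimator for this model.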

See Parameters, Record.

Tests

PointProcessTools.test_simulation - Function

Tests the simulation algorithm for a given model and record.

The function simulates the model n_sims times with the parameters estimated from the provided Record and compares the expected cumulative number of events over the process interval with the average cumulative number of events generated by the simulations.

Instead of a model as a first argument, it is possible to provide parameters.

The function plots both curves and their difference, so that any significant discrepancy between the two can be checked visually.

The function is dispatched based on the type of 'params'.

See simulate, Parameters, Record.

rec = Record("Resources/data/record.csv")
test_simulation("hh", rec)
test_simulation(ParametersHP(1e-4), rec, n_sims=1000, plot_results=true)

PointProcessTools.test_estimation - Function

Tests the estimation algorithm for a given model and record.

The function simulates n_estimations processes with parameters estimated from the provided Record and estimates the parameters from the simulations.

Instead of a model as a first argument, it is possible to provide parameters.

The function then plots the histogram of the distribution of the estimated parameters and the true parameters.

The function is dispatched based on the type of 'params'.

See estimate, Parameters, Record.

rec = Record("Resources/data/record.csv")
test_estimation("hh", rec)
test_estimation(ParametersHP(1e-4), rec; n_estimations=1000, plot_results=true)

PointProcessTools.test_fit_test - Function

Tests the goodness of fit algorithm for a given model and record.

Given a specific model_type and dist_type, the function estimates the parameters of rec as the given model type and simulates n_tests processes with these parameters. The function then runs the goodness-of-fit test on each of these simulations and collects the n_tests p-values.

The p-values are returned and, if the plot_results keyword is set to true, the function plots the distribution of the p-values. A distribution of p-values close to uniform means that the test is working correctly.

See fit_test, Model, distance, Record.

rec = Record("Resources/data/record.csv", "Resources/data/proxy.csv")
test_fit_test("ip", "ks", rec) # Test the goodness-of-fit test for an inhomogeneous Poisson process
test_fit_test("ih", "lp", rec; n_sims=1000, n_tests=1000, plot_results=true)
