noctiluca.util

noctiluca.util.clean

Everything to do with cleaning up experimental data

noctiluca.util.clean.split_trajectory_at_big_steps(traj, threshold)

Split a trajectory wherever the frame-to-frame displacement exceeds threshold, since such big steps suggest mislinkages.

Exception: two consecutive big steps whose overall displacement is smaller than the threshold indicate a single misconnected point in an otherwise fine trajectory. In this case we simply remove the misconnected point instead of splitting.

Parameters:
  • traj (Trajectory) – the trajectory to investigate

  • threshold (float) – the maximum allowed frame to frame displacement

Returns:

set of Trajectory

Notes

As of now, this only works on trajectories with N=1 (a single locus).

This really just checks for frame-to-frame connections exceeding the threshold. So if there are missing frames in a trajectory, the step across those missing data will not be considered.
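
Example

A minimal sketch of the intended usage; the trajectory here is a hypothetical single-locus, one-dimensional example with one big step:

>>> from noctiluca import Trajectory
... from noctiluca.util.clean import split_trajectory_at_big_steps
>>> traj = Trajectory([0.0, 0.1, 5.0, 5.1])             # big step between frames 1 and 2
... parts = split_trajectory_at_big_steps(traj, threshold=1.0)
... len(parts)                                          # the trajectory is split in two
2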

noctiluca.util.clean.split_dataset_at_big_steps(data, threshold)

Apply split_trajectory_at_big_steps to a whole data set

Parameters:
  • data (TaggedSet) – the dataset

  • threshold (float) – the maximum allowed step size

Returns:

TaggedSet – a new data set with the split trajectories
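
Example

Usage is analogous to split_trajectory_at_big_steps; a sketch, assuming data is an existing TaggedSet (e.g. from make_TaggedSet):

>>> from noctiluca.util.clean import split_dataset_at_big_steps
>>> data_clean = split_dataset_at_big_steps(data, threshold=1.0)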

noctiluca.util.mcmc

A small module for running MCMC sampling. It provides the abstract base class Sampler that can be subclassed for concrete sampling schemes.

class noctiluca.util.mcmc.MCMCRun(logLs=None, samples=None)

Bases: object

The results of an MCMC run

logLs

the likelihoods from the run

Type:

np.ndarray, dtype=float

samples

the actual MCMC sample

Type:

list of <sample data type>

Notes

The list of samples is not automatically copied upon initialization.

logLs_trunc()

Give only the likelihoods associated with the samples

Sampler.run returns likelihoods starting from the first iteration, but samples only after the burn-in period. This function cuts this initial overhang from the likelihood array.

Returns:

(len(samples),) np.ndarray

best_sample_logL()

Give the best sample (and log-likelihood)

Returns:

  • sample (<sample data type>) – the maximum likelihood estimate

  • logL (float) – the maximum likelihood value

acceptance_rate(criterion='sample_identity', comp_fun=None)

Calculate fraction of accepted moves

We can see whether a move was accepted or rejected by checking whether consecutive samples actually differ. There are three ways to do so: the syntactically correct one is sample comparison (using ==). However, since samples have a user-defined data type, the == operator is not necessarily defined. For mutable objects we can exploit that they are not copied for a rejected step, i.e. we can use an identity check (is). As a last resort, we can use the likelihood as a proxy: it is very unlikely that a move leaves the likelihood exactly unchanged.

Alternatively, you can provide a comparison function comp_fun(sample0, sample1)->bool that provides the comparison. This overrides the criterion setting.

Parameters:
  • criterion ({'sample_equality', 'sample_identity', 'likelihood_equality'}) – which method to use to determine whether a step was accepted or rejected.

  • comp_fun (callable, optional) – should return True if the two arguments compare equal, False otherwise

Returns:

float – the calculated acceptance rate
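
Example

A sketch, assuming myrun is an MCMCRun returned by Sampler.run and the samples are numeric (as in the normMCMC example below):

>>> import numpy as np
>>> rate = myrun.acceptance_rate()                                        # default: identity check
... rate = myrun.acceptance_rate(criterion='likelihood_equality')         # likelihood as proxy
... rate = myrun.acceptance_rate(comp_fun=lambda s0, s1: np.allclose(s0, s1))  # custom comparison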

evaluate(fun)

Evaluate a function on all samples

This exploits the fact that (e.g. if the sample data type is a user-defined class) many entries of samples will actually be the same object, so the function has to be evaluated significantly fewer than len(samples) times.

Parameters:

fun (callable of signature fun(sample) --> object) – the function to evaluate. It should expect a single sample as input and return something.

Returns:

list – a list of output values, in the order of samples.

Notes

This function is supposed to decrease computational cost. It is usually quicker, however, to use a vectorized function fun instead.
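
Example

A sketch, assuming the samples are arrays of parameters; the lambda is purely illustrative, since the sample data type is user-defined:

>>> first_components = myrun.evaluate(lambda params: params[0])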

class noctiluca.util.mcmc.Sampler

Bases: ABC

Abstract base class for MCMC sampling

To implement a sampling scheme, subclass this and override the abstract methods propose_update and logL; optionally, also the callbacks callback_logging and callback_stopping. See their documentation in this base class for more details.

The above methods take an argument params that is supposed to represent the parameters of the model. The structure of this representation (e.g. a list of parameters, a np.ndarray, or a custom class) is completely up to the user. Since all explicit handling of the parameters happens in the propose_update and logL methods, the code in this base class is completely independent from the parameterization.

config

some configuration values. See configure.

Type:

dict

Example

>>> import numpy as np
>>> class normMCMC(Sampler):
...     def __init__(self, stepsize=0.1):
...         self.stepsize = stepsize
...
...     def propose_update(self, current_value):
...         # Gaussian random walk proposal; symmetric, but we return both log-probabilities anyway
...         proposed_value = current_value + np.random.normal(scale=self.stepsize)
...         logp_forward = -0.5*(proposed_value - current_value)**2/self.stepsize**2 - 0.5*np.log(2*np.pi*self.stepsize**2)
...         logp_backward = -0.5*(current_value - proposed_value)**2/self.stepsize**2 - 0.5*np.log(2*np.pi*self.stepsize**2)
...         return proposed_value, logp_forward, logp_backward
...
...     def logL(self, value):
...         # target distribution: standard normal
...         return -0.5*value**2 - 0.5*np.log(2*np.pi)
...
...     def callback_logging(self, current_value, best_value):
...         print("Current value: {}".format(current_value))
>>> mc = normMCMC()
... mc.configure(iterations=100, burn_in=50, log_every=10, show_progress=True)
... myrun = mc.run(1)    # returns an MCMCRun object
abstract propose_update(params)

Propose an update step.

Parameters:

params (<user-specified parameter structure>) – the current parameters

Returns:

  • proposed_values (<user-specified parameter structure>) – the proposed new parameters

  • logp_forward (float) – (log of the) probability of proposing these values from the current ones

  • logp_backward (float) – (log of the) probability of proposing the current values from the proposed ones

abstract logL(params)

Calculate log-likelihood of the given parameters.

Parameters:

params (<user-specified parameter structure>) – the parameters to use

Returns:

float – the log-likelihood for the given parameters.

callback_logging(current_params, best_params)

Callback upon logging.

This function will be called whenever a status line is printed. It can thus be used to provide additional information about where the sampler is, or how the current values compare to the best ones found so far, etc.

Parameters:
  • current_params (<user-specified parameter structure>) – the current parameter set.

  • best_params (<user-specified parameter structure>) – the best (i.e. highest likelihood) parameter set the sampler has seen so far.

See also

propose_update, logL

callback_stopping(myrun)

Callback to enable early stopping

This function will be called at regular intervals (see the check_stopping_every setting in configure) to check whether sampling should stop. The decision is to be made based on the data generated so far, which is handed over as the myrun argument, an MCMCRun object.

Parameters:

myrun (MCMCRun) – the data generated so far

Returns:

stop (bool) – should be True if the sampling is to stop, False otherwise
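
Example

A sketch of a possible override; the plateau criterion, window of 50, and tolerance are made-up choices for illustration:

>>> import numpy as np
>>> class myMCMC(Sampler):
...     # ... propose_update and logL as usual ...
...     def callback_stopping(self, myrun):
...         # stop once the last 50 likelihoods barely fluctuate anymore
...         return len(myrun.logLs) >= 50 and np.std(myrun.logLs[-50:]) < 1e-3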

configure(iterations=1, burn_in=0, log_every=-1, check_stopping_every=-1, show_progress=False)

Set the configuration of the sampler.

Note that after calling this function once, you can also access single configuration entries via the attribute config.

Parameters:
  • iterations (int) – how many MCMC iterations to run total

  • burn_in (int) – how many steps to discard at the beginning

  • log_every (int) – print a status line every … steps. Set to -1 (default) to disable status lines.

  • check_stopping_every (int) – how often to call the callback_stopping. Set to -1 (default) to never check.

  • show_progress (bool) – whether to show a progress bar using tqdm

See also

run

run(initial_values)

Run the sampling scheme

Remember to run configure before this.

Parameters:

initial_values (<user-specified parameter structure>) – the initial values for the sampling scheme

Returns:

MCMCRun – the output data of the run. This is essentially a wrapper class for an array of likelihoods and the associated samples, plus some convenience functions.

See also

configure

Notes

The returned MCMCRun contains likelihoods for all steps starting at initialization, while the list of sampled parameters starts only after the burn-in period.

noctiluca.util.userinput

Convert user data to noctiluca objects (Trajectory, TaggedSet)

This module provides convenience functions to convert from various formats in which users might provide their data to the objects that noctiluca and libraries building on it use. These functions are designed to be used under the hood in downstream libraries, such that beginning users do not have to go the extra step of creating Trajectory/TaggedSet objects from their data. In the long run it is of course recommended to get familiar with these objects, as they provide some useful additional functionality.

For trajectories, the following inputs are accepted:

  • noctiluca.Trajectory, which is returned unchanged

  • pandas.DataFrame; columns should be named 'x', 'y', 'z' to indicate dimension, possibly with a suffix identifying different loci. So for example

    • ('x', 'y', 'x2', 'y2') would indicate a trajectory with two loci in two spatial dimensions

    • equivalent column specifications to the above include ('x_left', 'x_right', 'y_left', 'y_right'), ('x0', 'y0', 'x1', 'y1'), ('x_red', 'y_red', 'x_green', 'y_green')

    • the precise regular expression to identify columns containing positional data is '(x|y|z)(\d*|_.*)'.

    • for each locus all dimensions should exist, i.e. ('x0', 'y0', 'x0') would be invalid

    • dimensions should be labelled starting with 'x', i.e. a dataframe containing just a 'y' column would be invalid

    Beyond the positional data, the dataframe may contain a column 'frame' indicating the frame number that each detection belongs to. This column will be coerced to integer.

  • numpy array-like; anything that casts well to a numpy array is accepted. This case is simply forwarded to the noctiluca.Trajectory constructor, which accepts array-like structures. Possible shapes for array-like inputs:

    • (N, T, d) for a trajectory with N loci in d spatial dimensions over T time steps

    • (T, d) for a single-locus trajectory

    • (T,) for a trajectory with a single locus in a single spatial dimension.

  • pandas.Series, which is cast to a one dimensional numpy array. In doing so, we use the index of the series as frame number, i.e. pd.Series(data=[1, 2, 3], index=[5, 7, 8]) would be read as a trajectory with 4 frames, where the second one is missing.

For datasets (i.e. a collection of trajectories) we accept

  • noctiluca.TaggedSet, which is returned unchanged

  • any iterable of structures that make_Trajectory can deal with. Specifically this includes a list of numpy arrays (which then can be of different length / dimension) or a big numpy array, where the different trajectories are indexed along the first dimension.

  • pandas.DataFrame with column names conforming to the patterns outlined above for make_Trajectory, plus an additional column 'particle' indicating which detections belong to which trajectory. The name 'particle' for this column was chosen for compatibility with trackpy. Note that contrary to make_Trajectory, the specification of the 'frame' column is now mandatory.

More control over the input process (e.g. the assumed column names for pandas dataframes) can be attained by using the functions in this module explicitly, in which case additional keyword arguments can be specified. For yet finer control we recommend directly using the noctiluca.Trajectory and noctiluca.TaggedSet classes.

noctiluca.util.userinput.make_Trajectory(inpt, **kwargs)

Create a noctiluca.Trajectory from user specified data

Parameters:

inpt (<various>) – see module docstring for description.

Keyword Arguments:
  • t_column (str) – name of the column containing frame numbers when giving a pandas.DataFrame as inpt

  • pos_columns ((list of) list of str) – names of the columns containing the positional data when giving a pandas.DataFrame as inpt. This should be either a list of identifiers for the different dimensions (e.g. ['x', 'y']) or a list of such lists, if the trajectory has multiple loci (e.g. [['x1', 'y1'], ['x2', 'y2']]).

Returns:

noctiluca.Trajectory
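
Example

A sketch for the pandas.DataFrame case, using the default column names described in the module docstring:

>>> import pandas as pd
... from noctiluca.util.userinput import make_Trajectory
>>> df = pd.DataFrame({'frame': [0, 1, 3],
...                    'x': [1.0, 1.2, 1.1],
...                    'y': [0.5, 0.4, 0.3]})
... traj = make_Trajectory(df)     # single locus in two dimensions; frame 2 is missing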

noctiluca.util.userinput.make_TaggedSet(inpt, **kwargs)

Convert a collection of trajectories to noctiluca.TaggedSet

Parameters:

inpt (<various>) – see module docstring for description.

Keyword Arguments:
  • id_column (str) – the name of the column containing trajectory identifiers (when specifying a pandas.DataFrame). The data in this column does not have to be numerical. Defaults to 'particle'.

  • **kwargs (<other keyword arguments>) – forwarded to make_Trajectory; note in particular 't_column' and 'pos_columns'.
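
Example

A sketch for the pandas.DataFrame case; 'particle' identifies the trajectories and 'frame' is mandatory here:

>>> import pandas as pd
... from noctiluca.util.userinput import make_TaggedSet
>>> df = pd.DataFrame({'particle': [0, 0, 1, 1],
...                    'frame':    [0, 1, 0, 1],
...                    'x': [1.0, 1.1, 5.0, 5.2],
...                    'y': [0.5, 0.4, 2.0, 2.1]})
... data = make_TaggedSet(df)      # two single-locus trajectories in two dimensions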