noctiluca.io

noctiluca.io.hdf5

noctiluca.io.hdf5.registrystring(obj)
noctiluca.io.hdf5.writer(func)
noctiluca.io.hdf5.write(obj, name, hdf5_base)
noctiluca.io.hdf5.read(hdf5_container)
noctiluca.io.hdf5.new_group(hdf5_base, name)
noctiluca.io.hdf5.write_dict(obj, name, hdf5_base)
noctiluca.io.hdf5.read_group_as_dict(group)
noctiluca.io.hdf5.write_ndarray(obj, name, hdf5_base)
noctiluca.io.hdf5.read_dataset_as_ndarray(dset)
noctiluca.io.hdf5.write_iterable(obj, name, hdf5_base)
noctiluca.io.hdf5.read_iterable(hdf5_container)
noctiluca.io.hdf5.write_None(obj, name, hdf5_base)
noctiluca.io.hdf5.read_None(group)
noctiluca.io.hdf5.write_generic_class(obj, name, hdf5_base)
noctiluca.io.hdf5.read_generic_class(hdf5_container)
noctiluca.io.hdf5.ls(filename, group='/', depth=1)

List the top-level contents of a file (or of a group within a file)

Parameters:
  • filename (str or pathlib.Path) – the file to inspect

  • group (str) – the group whose contents to list

  • depth (int) – how many levels of content to recurse through when encountering subgroups

Returns:

list of str – the contents of the specified group, one string per item. Attributes are printed with their value and enclosed in braces {}, datasets are surrounded by brackets [], and groups are given by name only.
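The delimiters described above are enough to sort the returned strings by kind. The following is a minimal sketch, not part of noctiluca: classify_ls_entry is a hypothetical helper, and the sample entries are made up, since the exact spacing and attribute formatting of the output is not specified here.

```python
def classify_ls_entry(entry):
    """Guess the kind of item from one string of the list returned by ls."""
    text = entry.strip()  # assumption: entries may carry surrounding whitespace
    if text.startswith('{') and text.endswith('}'):
        return 'attribute'   # attributes are enclosed in braces
    if text.startswith('[') and text.endswith(']'):
        return 'dataset'     # datasets are surrounded by brackets
    return 'group'           # groups are given by name only

# Hypothetical entries, made up for illustration
sample = ['{description = test}', '[positions]', 'trajectories']
kinds = [classify_ls_entry(entry) for entry in sample]
```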

noctiluca.io.hdf5.check_group_or_attr(group)

Check whether we’re querying a group or just a single attribute

The syntax for querying a specific attribute is group/{attr} (as opposed to group/subgroup).

Parameters:

group (str or None) – the identifier to check/dissect

Returns:

group, name (str) – if the input conforms to the attribute syntax, it is dissected into group and name. Otherwise group is the same as input and name is None. Exception: if input is None, the output group is '/'.
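The dissection rule can be illustrated with a short stand-alone sketch. This is not the library's implementation: split_group_and_attr is a hypothetical name, and mapping a bare '{attr}' (no group part) to the root group '/' is an assumption.

```python
import re

# Matches "<group>/{<name>}" as well as a bare "{<name>}"
_ATTR_PATTERN = re.compile(r'^(?P<group>.*?)/?\{(?P<name>[^{}]+)\}$')

def split_group_and_attr(identifier):
    """Return (group, name); name is None unless the attribute syntax is used."""
    if identifier is None:
        return '/', None                  # documented exception for None
    match = _ATTR_PATTERN.match(identifier)
    if match is None:
        return identifier, None           # plain group path, returned unchanged
    group = match.group('group') or '/'   # assumption: empty group means root
    return group, match.group('name')

attr_query = split_group_and_attr('data/{description}')
group_query = split_group_and_attr('data/subgroup')
root_query = split_group_and_attr(None)
```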

noctiluca.io.load

Loading data from common formats into the Trajectory and TaggedSet structures used throughout the library

noctiluca.io.load.csv(filename, columns=['x', 'y', 't', 'id'], tags=None, meta_post={}, **kwargs)

Load data from a .csv file.

This uses np.genfromtxt and forwards all kwargs to it. By default the delimiter is ',' and string data is assumed to be UTF-8 encoded; both can be overridden through kwargs. Refer to the documentation of numpy.genfromtxt for the available options.

Parameters:
  • filename (string or file-like object) – the file to be read

  • columns (list) – how to interpret the columns in the file. Use any of these identifiers: {'x', 'y', 'z', 'x2', 'y2', 'z2', 't', 'id', None}, where 't' (mandatory) is the frame number, 'id' (mandatory) the trajectory id, and the remaining ones can be used to indicate spatial components of single or double-locus trajectories. Use None to indicate a column that should be ignored. Columns beyond the list given here will be ignored in any case. Finally, the data for any str identifier not matching one of the above will be written to a corresponding entry in the trajectory’s meta dict.

  • tags (str, list of str or set of str, optional) – the tag(s) to be associated with trajectories from this file

  • meta_post (dict, optional) – post-processing options for the meta data. Keys should be meta field names, values can be “unique” or “mean”. With the former, all the values in the corresponding column should be the same, and only that value (instead of the whole array) will be written into the meta field. With the latter we simply take the mean of the array.
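The two meta_post options can be sketched in plain Python. This is an illustration of the described behavior, not the library's code: postprocess_meta is a hypothetical helper, and raising ValueError when a "unique" column contains differing values is an assumption (the text above only says the values should be the same).

```python
def postprocess_meta(values, mode):
    """Reduce one meta column to a single value, following the meta_post rules."""
    values = list(values)
    if mode == 'unique':
        first = values[0]
        if any(value != first for value in values):
            # Assumption: differing values are treated as an error in this sketch
            raise ValueError("expected identical values for a 'unique' meta field")
        return first                      # store only the single shared value
    if mode == 'mean':
        return sum(values) / len(values)  # store the mean of the column
    raise ValueError(f"unknown post-processing mode: {mode!r}")

# Hypothetical meta columns, made up for illustration
laser_power = postprocess_meta([5.0, 5.0, 5.0], 'unique')
mean_intensity = postprocess_meta([1.0, 2.0, 3.0], 'mean')
```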

Returns:

TaggedSet – the loaded data set
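The rules for the columns argument can be summarized in a few lines. The sketch below only sorts the identifiers into roles and does not read any file; sort_columns and the 'intensity' column are hypothetical names used for illustration.

```python
SPATIAL = ('x', 'y', 'z', 'x2', 'y2', 'z2')
MANDATORY = ('t', 'id')

def sort_columns(columns):
    """Map each column identifier to its role and its index in the file."""
    missing = [name for name in MANDATORY if name not in columns]
    if missing:
        raise ValueError(f"missing mandatory column(s): {missing}")
    roles = {'core': {}, 'spatial': {}, 'meta': {}, 'ignored': []}
    for index, name in enumerate(columns):
        if name is None:
            roles['ignored'].append(index)   # None marks a column to skip
        elif name in MANDATORY:
            roles['core'][name] = index      # frame number and trajectory id
        elif name in SPATIAL:
            roles['spatial'][name] = index   # coordinates of one or two loci
        else:
            roles['meta'][name] = index      # any other name goes to meta
    return roles

roles = sort_columns([None, 't', 'id', 'x', 'y', 'intensity'])
```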

Examples

This function can be used to load data from pandas.DataFrame tables, if they conform to the format described above:

>>> import io
... import pandas as pd
... import noctiluca as nl
...
... # Set up a DataFrame containing some dummy data
... # Caveat to pay attention to: the order of the columns is important!
... df = pd.DataFrame()
... df['frame_no'] = [1, 2, 3]
... df['trajectory_id'] = [4, 4, 4]
... df['coord1'] = [1, 2, 3]
... df['coord2'] = [4, 5, 6]
...
... csv_stream = io.StringIO(df.to_csv())
... dataset = nl.io.load.csv(csv_stream,
...                          [None, 't', 'id', 'x', 'y'], # first column will be index
...                          delimiter=',',               # pandas' default
...                          skip_header=1,               # pandas prints a header line
...                         )
noctiluca.io.load.evalSPT(filename, tags={})

Load data in the format used by evalSPT

This is a shortcut for csv(filename, ['x', 'y', 't', 'id'], tags, delimiter=' ').

See also

csv
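To make the column layout concrete, here is a stand-alone sketch that reads space-delimited rows in the order x, y, t, id and groups them by trajectory id. It does not use noctiluca and returns plain tuples rather than a TaggedSet; group_rows_by_id and the sample rows are made up for illustration.

```python
import io
from collections import defaultdict

def group_rows_by_id(stream):
    """Collect (t, x, y) tuples per trajectory id from 'x y t id' rows."""
    trajectories = defaultdict(list)
    for line in stream:
        if not line.strip():
            continue                      # skip empty lines
        x, y, t, traj_id = line.split()   # columns: x, y, frame, trajectory id
        key = int(float(traj_id))
        trajectories[key].append((int(float(t)), float(x), float(y)))
    return dict(trajectories)

# Hypothetical file content, made up for illustration
text = "0.5 1.5 1 1\n0.7 1.4 2 1\n3.0 2.0 1 2\n"
trajectories = group_rows_by_id(io.StringIO(text))
```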

noctiluca.io.load.hdf5(filename, group='/')

Load data from an HDF5 file

Parameters:
  • filename (str or pathlib.Path) – the file to read

  • group (str) – which group in the file to read. Defaults to root, i.e. the whole file.

Returns:

dict or object – whatever is stored in the file

noctiluca.io.write

Some functions for writing trajectories / data sets to file

noctiluca.io.write.csv(data, filename, header=True, delimiter='\t')

A quick-and-dirty csv-writer. Might be updated eventually.

Parameters:
  • data (TaggedSet of Trajectory) – the data set to write to file

  • filename (str) – the file to write to

  • header (bool, optional) – whether to write a header line with column names

  • delimiter (chr, optional) – which character to use as delimiter

Notes

The columns in the file will be ['id', 'frame', 'x', 'y', 'z', 'x2', 'y2', 'z2'], where of course only those coordinates present in the data set will be written.

Missing frames, i.e. those where all of the coordinates are np.nan, will simply be omitted.

Since TaggedSet and Trajectory have more structure than can reasonably be represented in .csv files, this function has no aspirations of writing the whole structure to file. It can write only the “core” data, i.e. the actual trajectories.
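The notes above can be illustrated with a small stand-alone writer. This is a sketch under simplifying assumptions, not the library's implementation: write_rows is a hypothetical function, and trajectories are represented as a plain dict mapping an id to a list of (frame, coordinates) pairs instead of a TaggedSet.

```python
import csv
import io
import math

def write_rows(stream, trajectories, coords=('x', 'y'), header=True, delimiter='\t'):
    """Write one row per frame, skipping frames whose coordinates are all NaN."""
    writer = csv.writer(stream, delimiter=delimiter, lineterminator='\n')
    if header:
        writer.writerow(['id', 'frame', *coords])   # only coordinates present
    for traj_id, frames in trajectories.items():
        for frame, position in frames:
            if all(math.isnan(value) for value in position):
                continue                            # missing frame: omit the row
            writer.writerow([traj_id, frame, *position])

# Hypothetical 2D trajectory with one missing frame
nan = math.nan
data = {0: [(0, (1.0, 2.0)), (1, (nan, nan)), (2, (1.5, 2.5))]}
buffer = io.StringIO()
write_rows(buffer, data)
lines = buffer.getvalue().splitlines()
```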

noctiluca.io.write.mat(data, filename)

Write a dataset to MATLAB’s .mat format

This will produce a cell array containing the individual trajectories as structs. All the meta-data is passed along as well. The tags associated with the trajectory will be written to an entry 'noctiluca_tags'.

Parameters:
  • data (TaggedSet of Trajectory) – the data set to write

  • filename (str) – the file to write to

noctiluca.io.write.hdf5(data, filename, group=None)

Write to HDF5 file

Parameters:
  • data (TaggedSet or dict) – the data to write

  • filename (str or pathlib.Path) – where to write to

  • group (str) – where in the file to write the data. If unspecified, the file will be truncated and content written to the root node.

Notes

Caution is advised, since this function will silently overwrite existing data (this is most often the desired behavior).
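The difference between writing to the root and writing to a group can be modeled with a nested dict standing in for the file. This is only a toy model of the semantics described above; write_model is a hypothetical function, no HDF5 file is touched, and the sketch only handles dict data and named groups.

```python
def write_model(store, data, group=None):
    """Toy model: `store` is a nested dict standing in for the file contents."""
    if group is None:
        store.clear()            # "truncate": drop everything already stored
        store.update(data)       # then write the new content to the root
        return
    parts = [part for part in group.split('/') if part]
    if not parts:
        raise ValueError("this sketch only handles named groups")
    node = store
    for part in parts[:-1]:
        node = node.setdefault(part, {})   # create intermediate groups as needed
    node[parts[-1]] = data       # silently replaces whatever was stored here

store = {}
write_model(store, {'a': 1}, group='data')
write_model(store, {'b': 2}, group='other')
write_model(store, {'a': 3}, group='data')    # overwrites 'data', keeps 'other'
after_group_writes = dict(store)
write_model(store, {'c': 4})                  # no group: everything is replaced
```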

noctiluca.io.write.hdf5_subTaggedSet(data, filename, group, refTaggedSet=None)

Write a subset of an already stored TaggedSet to file

Sometimes it is handy to store subsets of data in a directly loadable way (i.e. as its own TaggedSet object). This would duplicate data and thus increase file size, so this function takes advantage of hdf5’s hard links to store the properly pruned TaggedSet by just linking to the corresponding entries in the full data set, which should be located in the same file at the refTaggedSet address.

Parameters:
  • data (TaggedSet) – a TaggedSet with some selection applied. The full data set (potentially with a different selection, this does not matter) should already be written to filename under the path refTaggedSet.

  • filename (str or pathlib.Path) – the file to store things in

  • group (str) – the location in the file where to store the new entry

  • refTaggedSet (str) – where the full data set is stored in the file.

Notes

Caution is advised, since this function will silently overwrite existing data (this is most often the desired behavior).

This function is intended for storing selections (subsets) of TaggedSets such that they can be read from file as complete TaggedSets themselves. Use cases include having a big dataset out of which you routinely need only a specific part. If your subset is identified by the tag 'subset' in the big dataset, this is equivalent to

>>> from noctiluca import io
...
... big_data = io.load.hdf5('file_with_big_data.h5', 'data')
... big_data.makeSelection(tags='subset')
... data = big_data.copySelection()
...
... # Loading can be reduced to a single line by storing the selection
... # alongside the full data set at write time:
... big_data.makeSelection()
... io.write.hdf5(big_data, 'file_with_big_data_and_subset.h5', 'data')
... big_data.makeSelection(tags='subset')
... io.write.hdf5_subTaggedSet(big_data, 'file_with_big_data_and_subset.h5',
...                            'data_subset', refTaggedSet='/data')
...
... # so now when loading the data, we can just do
... data = io.load.hdf5('file_with_big_data_and_subset.h5', 'data_subset')

Note that this merely shifts the selection step from loading to writing. It is nevertheless handy when the selection process is more involved than a simple tag, or when you distribute your data to others, who will appreciate an easy way to load just the relevant data.