[1]:

import numpy as np
import noctiluca as nl

In- & Output from/to file

SPT trajectories are commonly saved in .csv format, i.e. as human-readable text files; this satisfies most basic needs and is easy to read for anyone, human or machine. More versatile storage—e.g. including arbitrary meta data—can be achieved by the binary, tree-structured HDF5 format. noctiluca is equipped to handle both.

.csv files

noctiluca can read .csv files with minimal requirements on the format. Let’s take a look:

[2]:

# Write a file like what we might expect to see in the wild
filename = "example_data_in.csv"
with open(filename, 'wt') as f:
    f.write("trajid\tframe\tpos_x\tpos_y\tpos_z\n")
    f.write("5 \t  7 \t 0.3 \t  0.7 \t -1.2 \n")
    f.write("5 \t  8 \t 1.5 \t -0.4 \t -0.8 \n")
    f.write("5 \t 10 \t 1.1 \t -0.2 \t  0.1 \n")
    f.write("6 \t  4 \t 2.1 \t 11.3 \t  5.1 \n")
    f.write("6 \t  5 \t 2.3 \t 12.5 \t  4.7 \n")

print(f"Contents of {filename}:\n")
with open(filename, 'r') as f:
    for line in f:
        print(line[:-1])

Contents of example_data_in.csv:

trajid  frame   pos_x   pos_y   pos_z
5         7      0.3      0.7    -1.2
5         8      1.5     -0.4    -0.8
5        10      1.1     -0.2     0.1
6         4      2.1     11.3     5.1
6         5      2.3     12.5     4.7

[3]:

# Load .csv file as `TaggedSet`
data = nl.io.load.csv(filename, columns=['id', 't', 'x', 'y', 'z'], delimiter='\t', skip_header=1)

# Check what we got
for i, traj in enumerate(data):
    print(f"traj #{i}")
    print(traj[:])
    print()

traj #0
[[ 0.3  0.7 -1.2]
 [ 1.5 -0.4 -0.8]
 [ nan  nan  nan]
 [ 1.1 -0.2  0.1]]

traj #1
[[ 2.1 11.3  5.1]
 [ 2.3 12.5  4.7]]

Let’s take a look at the call to nl.io.load.csv() to understand how this works. We start from the end:

load.csv() accepts any keywords that numpy.genfromtxt() understands, most importantly the last two in the call above, delimiter and skip_header. The latter gives the number of lines at the start of the file to ignore; in this case we set it to 1, such that we start reading right at the data.
But wait, that means we ignore the column names completely? Yes! load.csv() does not care what your columns are called (or whether they have names at all, for that matter). How to process the data is instead specified by the columns argument. This list assigns pre-defined identifiers to the columns, in the order in which they appear in the file.
Two of those identifiers are mandatory: 'id' should be a column that has a unique entry for each trajectory, identifying which localization belongs to which trajectory; 't' should be a column containing integer frame numbers. Note that missing frames (above: frame 9 in trajectory 5) are simply patched with np.nan.
Beyond those two, possible identifiers are 'x', 'y', 'z', 'x2', 'y2', 'z2' for spatial coordinates of trajectories with up to three spatial dimensions and two particles; and None, which indicates that the corresponding column should be ignored.
Finally, any string not recognized as one of the defined identifiers means that the data from the corresponding column will be written to the meta-dict of the corresponding trajectory, with that string as key. Refer to nl.io.load.csv() for more details.
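To make the roles of delimiter, skip_header, and the 'id' / 't' columns more concrete, here is a minimal sketch in plain numpy. It is not noctiluca's implementation: split_and_patch() is a hypothetical helper written for this illustration, and autostrip=True is added here only to drop the padding whitespace around each field.

```python
import io
import numpy as np

# Same content as example_data_in.csv above, kept in memory for this sketch
raw = (
    "trajid\tframe\tpos_x\tpos_y\tpos_z\n"
    "5 \t  7 \t 0.3 \t  0.7 \t -1.2 \n"
    "5 \t  8 \t 1.5 \t -0.4 \t -0.8 \n"
    "5 \t 10 \t 1.1 \t -0.2 \t  0.1 \n"
    "6 \t  4 \t 2.1 \t 11.3 \t  5.1 \n"
    "6 \t  5 \t 2.3 \t 12.5 \t  4.7 \n"
)

# skip_header=1 drops the line with the column names; only column order matters
table = np.genfromtxt(io.StringIO(raw), delimiter='\t', skip_header=1, autostrip=True)

def split_and_patch(table):
    """Hypothetical helper: one array per trajectory, missing frames filled with NaN."""
    ids = table[:, 0]                  # plays the role of the 'id' column
    frames = table[:, 1].astype(int)   # plays the role of the 't' column
    trajectories = []
    for traj_id in np.unique(ids):
        mask = ids == traj_id
        t = frames[mask]
        # allocate every frame between first and last, then fill in the observed ones
        patched = np.full((t.max() - t.min() + 1, table.shape[1] - 2), np.nan)
        patched[t - t.min()] = table[mask, 2:]
        trajectories.append(patched)
    return trajectories

trajs = split_and_patch(table)
for i, traj in enumerate(trajs):
    print(f"traj #{i}")
    print(traj)
```

The first array has four rows, with the row for the missing frame 9 consisting of np.nan only, which mirrors the output of load.csv() shown above.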

Writing .csv files is straightforward:

[4]:

filename = "example_data_out.csv"
nl.io.write.csv(data, filename)

# Let's see what that produced
print(f"Contents of {filename}:\n")
with open(filename, 'r') as f:
    for line in f:
        print(line[:-1])

Contents of example_data_out.csv:

id      frame   x       y       z
0       0       0.3     0.7     -1.2
0       1       1.5     -0.4    -0.8
0       3       1.1     -0.2    0.1
1       0       2.1     11.3    5.1
1       1       2.3     12.5    4.7

Note that when loading data we do not keep track of the “real” trajectory ID, or frame numbers. Therefore, when writing back to file, these are just indexed starting at zero.
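The re-indexing can be illustrated with a short sketch in plain numpy. This is only an illustration of the idea, not the code behind nl.io.write.csv(); to_rows() is a hypothetical helper, and the arrays are typed in by hand to match the trajectories loaded above.

```python
import numpy as np

# The two trajectories from above, as plain arrays (NaN row = missing frame)
trajs = [
    np.array([[0.3, 0.7, -1.2],
              [1.5, -0.4, -0.8],
              [np.nan, np.nan, np.nan],
              [1.1, -0.2, 0.1]]),
    np.array([[2.1, 11.3, 5.1],
              [2.3, 12.5, 4.7]]),
]

def to_rows(trajs):
    """Hypothetical helper: (id, frame, x, y, z) rows, indexed from zero, skipping missing frames."""
    rows = []
    for traj_index, traj in enumerate(trajs):
        for frame_index, position in enumerate(traj):
            if np.isnan(position).any():
                continue  # a patched frame carries no localization, so nothing to write
            rows.append((traj_index, frame_index, *map(float, position)))
    return rows

rows = to_rows(trajs)
for row in rows:
    print("\t".join(str(value) for value in row))
```

Because only the position in the list and the row index within each array are available, the original ID 5 and frame 7 cannot be recovered; the gap at frame index 2 of the first trajectory is preserved, however.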

HDF5 (.h5 / .hdf5) files

HDF5 is a binary, tree-structured file format that is optimized for storage of large data structures, specifically numerical arrays. noctiluca implements an interface to store TaggedSet objects in HDF5 files:

[5]:

# still using the dataset from above
filename = "example_data_out.h5"
nl.io.write.hdf5(data, filename)

# Let's check what's in there
print(nl.io.hdf5.ls(filename))
print(nl.io.hdf5.ls(filename, '/_data'))
print(nl.io.hdf5.ls(filename, '/_data/0'))

['_data', '[_selected]', '_tags', '{_HDF5_ORIG_TYPE_ = 198593848_noctiluca.TaggedSet}']
['0', '1', '{_HDF5_ORIG_TYPE_ = list}']
['[data]', 'localization_error', 'meta', 'parity', '{_HDF5_ORIG_TYPE_ = 198593848_noctiluca.Trajectory}']

The structure of HDF5 files is similar to a file system: data is organized in “groups” that can contain “datasets” (i.e. numerical arrays, indicated by [] in the above output), “attributes” (meta data, indicated by {}), or subgroups. You can parse the file with nl.io.hdf5.ls() and inspect its structure, starting from the root group '/'.

Internally, noctiluca uses the h5py package to handle HDF5 files. See the h5py documentation for more details on HDF5 and how to work with it in Python (if you want/need to go beyond the base functionality exposed by noctiluca).
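If you want to see groups, datasets, and attributes directly, the following sketch builds a tiny file with h5py and lists one group in a notation similar to the output above. It is an illustration only: describe() is a hypothetical helper and not how nl.io.hdf5.ls() is implemented, and the file layout is a toy example rather than the layout noctiluca writes.

```python
import os
import tempfile

import h5py
import numpy as np

# Write to a temporary location so we do not touch the example files above
path = os.path.join(tempfile.mkdtemp(), "structure_demo.h5")

with h5py.File(path, "w") as f:
    group = f.create_group("trajectory")                       # a group, like a directory
    group.create_dataset("positions", data=np.zeros((4, 3)))   # a dataset (numerical array)
    group.attrs["comment"] = "toy example"                     # an attribute (meta data)
    group.create_group("meta")                                 # a subgroup

def describe(path, group="/"):
    """Hypothetical helper: list a group, marking datasets with [] and attributes with {}."""
    with h5py.File(path, "r") as f:
        node = f[group]
        entries = []
        for name, item in node.items():
            entries.append(f"[{name}]" if isinstance(item, h5py.Dataset) else name)
        entries += [f"{{{key} = {value}}}" for key, value in node.attrs.items()]
        return entries

listing = describe(path, "/trajectory")
print(listing)
```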

In the above example, the root group directly contains the TaggedSet we wrote to the file. It is often useful to write some documentation of what the file contains, like so:

[6]:

nl.io.write.hdf5({}, filename)                 # clear all file contents
nl.io.write.hdf5(data, filename, group='data') # write actual data
nl.io.write.hdf5("""
This is an example file, showing how to write HDF5 files. It contains the following:
+ 'data': sample `TaggedSet` used in the demonstration
"""[1:-1], filename, group='info')             # write a comment telling people what's in the file

# So now what does this look like?
print(nl.io.hdf5.ls(filename))
print()
print(nl.io.load.hdf5(filename, group='info'))

['data', '{_HDF5_ORIG_TYPE_ = dict}', "{info = This is an example file, showing how to write HDF5 files. It contains the following:\n+ 'data': sample `TaggedSet` used in the demonstration}"]

This is an example file, showing how to write HDF5 files. It contains the following:
+ 'data': sample `TaggedSet` used in the demonstration

Notes on the above snippet:

HDF5 files can be written incrementally; we first write the data, then add the description, and below we will add to that.
When group is not specified, like in the first line, the existing file is overwritten. Otherwise the specified group is added; this might overwrite existing entries, but will not delete anything else.
nl.io.load.hdf5() can be used to load any data from an HDF5 file, not just TaggedSet objects. Note the omnipresent attribute _HDF5_ORIG_TYPE_, which tells the loader what data type the data in this group should be loaded as.
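The difference between overwriting the whole file and adding to it has a direct analog in h5py's file modes, sketched below. This is an assumption-laden illustration of the semantics described in the notes, not a description of what nl.io.write.hdf5() does internally; the dataset and attribute names are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "incremental_demo.h5")

# Some pre-existing content
with h5py.File(path, "w") as f:
    f.create_dataset("old", data=np.arange(3))

# Mode "w" truncates: everything that was in the file before is gone
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.arange(5))

# Mode "a" opens the existing file for modification and keeps other entries
with h5py.File(path, "a") as f:
    f.attrs["info"] = "first description"
    f.attrs["info"] = "updated description"   # replaces only this one entry

with h5py.File(path, "r") as f:
    names = sorted(f.keys())
    info = f.attrs["info"]
    data = f["data"][:]

print(names, info, data)
```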

HDF5 files also support random access, which plays well with the selection machinery of TaggedSet. Specifically, you can save references to a subset of your data: if you have a big data set out of which you routinely need only a specific part, you can use write.hdf5_subTaggedSet():

[7]:

# Save a specific part of a data set as directly loadable `TaggedSet`
data.makeSelection(lambda traj, _: len(traj) > 2) # some selection on the "big data set"
nl.io.write.hdf5_subTaggedSet(data, filename,
                              group='important data', # where to store this subset
                              refTaggedSet='data',    # where the big data set is stored in the file
                             )

# Don't forget to update the description!
nl.io.write.hdf5(nl.io.load.hdf5(filename, 'info') + """
+ 'important data': subset of 'data' that is very important in its own right
"""[:-1], filename, 'info')

# Check what we have
print(nl.io.hdf5.ls(filename))
print()
print(nl.io.load.hdf5(filename, group='info'))
print()

# 'important data' is now loadable as `TaggedSet` by itself, and contains only the specified data
important_data = nl.io.load.hdf5(filename, 'important data')
print(type(important_data))
important_data.makeSelection() # just to demonstrate that there is no selection active here
print(f"len(important_data) = {len(important_data)}")

['data', 'important data', '{_HDF5_ORIG_TYPE_ = dict}', "{info = This is an example file, showing how to write HDF5 files. It contains the following:\n+ 'data': sample `TaggedSet` used in the demonstration\n+ 'important data': subset of 'data' that is very important in its own right}"]

This is an example file, showing how to write HDF5 files. It contains the following:
+ 'data': sample `TaggedSet` used in the demonstration
+ 'important data': subset of 'data' that is very important in its own right

<class 'noctiluca.taggedset.TaggedSet'>
len(important_data) = 1

Other formats

Beyond .csv and HDF5, noctiluca currently supports writing MATLAB .mat files, though of course we recommend using non-proprietary formats. If you have a favorite format that is currently not supported, feel free to submit an issue on GitHub or implement it yourself and submit a pull request.

Input from memory

If you have your data available in the form of some Python object (e.g. a pandas.DataFrame or numpy.ndarray), you can convert it to a TaggedSet (or Trajectory) using the make_TaggedSet() or make_Trajectory() functions. For example, you can circumvent nl.io.load.csv() by using pandas.read_csv() and then converting to TaggedSet:

[8]:

import pandas as pd

filename = "example_data_in.csv" # reuse the file we wrote above in the .csv example
df = pd.read_csv(filename, delimiter='\t')
data = nl.make_TaggedSet(df,
                         id_column='trajid',
                         t_column='frame',
                         pos_columns=['pos_x', 'pos_y', 'pos_z'],
                        )

# Resulting `TaggedSet` is the same as reading directly from .csv (compare above)
for i, traj in enumerate(data):
    print(f"traj #{i}")
    print(traj[:])
    print()

traj #0
[[ 0.3  0.7 -1.2]
 [ 1.5 -0.4 -0.8]
 [ nan  nan  nan]
 [ 1.1 -0.2  0.1]]

traj #1
[[ 2.1 11.3  5.1]
 [ 2.3 12.5  4.7]]

See nl.util.userinput for more details.
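To get a feeling for what such a conversion has to do with a DataFrame, here is a minimal sketch in plain pandas. It does not call noctiluca and is not its implementation; frames_to_arrays() is a hypothetical helper whose argument names merely echo the call above.

```python
import numpy as np
import pandas as pd

# Same data as in example_data_in.csv, typed in directly for this sketch
df = pd.DataFrame({
    "trajid": [5, 5, 5, 6, 6],
    "frame":  [7, 8, 10, 4, 5],
    "pos_x":  [0.3, 1.5, 1.1, 2.1, 2.3],
    "pos_y":  [0.7, -0.4, -0.2, 11.3, 12.5],
    "pos_z":  [-1.2, -0.8, 0.1, 5.1, 4.7],
})

def frames_to_arrays(df, id_column, t_column, pos_columns):
    """Hypothetical helper: one (frames x dimensions) array per trajectory, gaps as NaN."""
    arrays = []
    for _, group in df.groupby(id_column, sort=True):
        indexed = group.set_index(t_column)[pos_columns]
        # reindexing over the full frame range inserts NaN rows for missing frames
        full_range = range(indexed.index.min(), indexed.index.max() + 1)
        arrays.append(indexed.reindex(full_range).to_numpy())
    return arrays

arrays = frames_to_arrays(df,
                          id_column="trajid",
                          t_column="frame",
                          pos_columns=["pos_x", "pos_y", "pos_z"],
                         )
for i, array in enumerate(arrays):
    print(f"traj #{i}")
    print(array)
```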

The main purpose of make_TaggedSet() and make_Trajectory() is actually to provide downstream analysis libraries with an easy way to accept data in non-noctiluca formats, such that they are more independent. For examples see

which are written such that for a first pass the user does not have to be familiar with noctiluca.