noctiluca.io
noctiluca.io.hdf5
- noctiluca.io.hdf5.registrystring(obj)
- noctiluca.io.hdf5.writer(func)
- noctiluca.io.hdf5.write(obj, name, hdf5_base)
- noctiluca.io.hdf5.read(hdf5_container)
- noctiluca.io.hdf5.new_group(hdf5_base, name)
- noctiluca.io.hdf5.write_dict(obj, name, hdf5_base)
- noctiluca.io.hdf5.read_group_as_dict(group)
- noctiluca.io.hdf5.write_ndarray(obj, name, hdf5_base)
- noctiluca.io.hdf5.read_dataset_as_ndarray(dset)
- noctiluca.io.hdf5.write_iterable(obj, name, hdf5_base)
- noctiluca.io.hdf5.read_iterable(hdf5_container)
- noctiluca.io.hdf5.write_None(obj, name, hdf5_base)
- noctiluca.io.hdf5.read_None(group)
- noctiluca.io.hdf5.write_generic_class(obj, name, hdf5_base)
- noctiluca.io.hdf5.read_generic_class(hdf5_container)
- noctiluca.io.hdf5.ls(filename, group='/', depth=1)
List top-level contents of file (or group within a file)
- Parameters:
filename (str or pathlib.Path) – the file to inspect
group (str) – the group whose contents to list
depth (int) – how many levels of content to recurse through when encountering subgroups
- Returns:
list of str – the contents of the specified group, one string per item. Attributes are printed with their value and enclosed in braces {}, Datasets are surrounded by brackets [], Groups are just given by name.
- noctiluca.io.hdf5.check_group_or_attr(group)
Check whether we’re querying a group or just a single attribute
The syntax for querying a specific attribute is group/{attr} (as opposed to group/subgroup).
- Parameters:
group (str or None) – the identifier to check/dissect
- Returns:
group, name (str) – if the input conforms to the attribute syntax, it is dissected into group and name. Otherwise group is the same as the input and name is None. Exception: if the input is None, the output group is '/'.
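The dissection rule described above can be sketched in a few lines. This is a hypothetical re-implementation for illustration only, not the library's code: the name split_group_attr is made up, and mapping a bare {attr} (with no group in front) to the root group '/' is an assumption.

```python
def split_group_attr(group):
    """Sketch of the group/{attr} rule; hypothetical, not noctiluca's own code."""
    if group is None:
        # Documented exception: None maps to the root group
        return '/', None
    head, _, last = group.rpartition('/')
    if len(last) > 2 and last.startswith('{') and last.endswith('}'):
        # Attribute syntax: strip the braces from the last path component.
        # Assumption: an empty head is interpreted as the root group.
        return (head or '/'), last[1:-1]
    # Anything else is treated as a plain group path
    return group, None


print(split_group_attr('data/{comment}'))
print(split_group_attr('data/subgroup'))
print(split_group_attr(None))
```

Only the last path component is inspected, so braces elsewhere in the path do not trigger the attribute branch.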
noctiluca.io.load
Loading data from common formats into the Trajectory and TaggedSet structures used throughout the library
- noctiluca.io.load.csv(filename, columns=['x', 'y', 't', 'id'], tags=None, meta_post={}, **kwargs)
Load data from a .csv file.
This uses np.genfromtxt, and all kwargs are forwarded to it. By default, we assume the delimiter ',' and utf8 encoding for string data, but these can of course be changed. Refer to numpy.genfromtxt.
- Parameters:
filename (string or file-like object) – the file to be read
columns (list) – how to interpret the columns in the file. Use any of these identifiers: {'x', 'y', 'z', 'x2', 'y2', 'z2', 't', 'id', None}, where 't' (mandatory) is the frame number, 'id' (mandatory) the trajectory id, and the remaining ones can be used to indicate spatial components of single or double-locus trajectories. Use None to indicate a column that should be ignored. Columns beyond the list given here will be ignored in any case. Finally, the data for any str identifier not matching one of the above will be written to a corresponding entry in the trajectory’s meta dict.
tags (str, list of str or set of str, optional) – the tag(s) to be associated with trajectories from this file
meta_post (dict, optional) – post-processing options for the meta data. Keys should be meta field names, values can be “unique” or “mean”. With the former, all the values in the corresponding column should be the same, and only that value (instead of the whole array) will be written into the meta field. With the latter we simply take the mean of the array.
- Returns:
TaggedSet – the loaded data set
Examples
This function can be used to load data from pandas.DataFrame tables, if they conform to the format described above:
>>> import io
... import pandas as pd
... import noctiluca as nl
...
... # Set up a DataFrame containing some dummy data
... # Caveat to pay attention to: the order of the columns is important!
... df = pd.DataFrame()
... df['frame_no'] = [1, 2, 3]
... df['trajectory_id'] = [4, 4, 4]
... df['coord1'] = [1, 2, 3]
... df['coord2'] = [4, 5, 6]
...
... csv_stream = io.StringIO(df.to_csv())
... dataset = nl.io.load.csv(csv_stream,
...                          [None, 't', 'id', 'x', 'y'], # first column will be index
...                          delimiter=',',               # pandas' default
...                          skip_header=1,               # pandas prints a header line
...                         )
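The two meta_post modes described in the parameter list can be illustrated with a small standalone sketch. The helper postprocess_meta and the sample columns are hypothetical, and raising an error when a “unique” column holds differing values is an assumption; the text above only says the values should all be the same.

```python
def postprocess_meta(values, mode):
    """Reduce a per-frame column to one meta value (hypothetical helper)."""
    if mode == 'unique':
        distinct = set(values)
        if len(distinct) != 1:
            # Assumption: differing values are treated as an error
            raise ValueError(f"expected one unique value, found {len(distinct)}")
        return distinct.pop()
    if mode == 'mean':
        return sum(values) / len(values)
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up columns that would otherwise be stored as whole arrays in meta
laser_power = [0.5, 0.5, 0.5]
intensity = [10.0, 12.0, 14.0]

print(postprocess_meta(laser_power, 'unique'))
print(postprocess_meta(intensity, 'mean'))
```

In the terms of the loader, this would correspond to something like meta_post={'laser_power': 'unique', 'intensity': 'mean'}, with both column names being illustrative.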
- noctiluca.io.load.evalSPT(filename, tags={})
Load data in the format used by evalSPT
This is a shortcut for csv(filename, ['x', 'y', 't', 'id'], tags, delimiter=' ').
See also: csv
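To make the column layout implied by this shortcut concrete, here is a standalone sketch that parses space-delimited x y t id rows and groups them by trajectory id. It uses only the standard library and does not call noctiluca. The sample values are made up, and integer frame numbers and ids are an assumption about the file contents.

```python
import io
from collections import defaultdict

# Made-up sample in the assumed layout: x y t id, separated by single spaces
sample = io.StringIO(
    "1.0 4.0 1 7\n"
    "2.0 5.0 2 7\n"
    "0.5 0.1 1 8\n"
)

# Collect (frame, x, y) tuples per trajectory id
trajectories = defaultdict(list)
for line in sample:
    x, y, t, traj_id = line.strip().split(' ')
    trajectories[int(traj_id)].append((int(t), float(x), float(y)))

# Order each trajectory by frame number
for points in trajectories.values():
    points.sort()

for traj_id, points in sorted(trajectories.items()):
    print(traj_id, points)
```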
- noctiluca.io.load.hdf5(filename, group='/')
Load data from an HDF5 file
- Parameters:
filename (str or pathlib.Path) – the file to read
group (str) – which group in the file to read. Defaults to root, i.e. the whole file.
- Returns:
dict or object – whatever is stored in the file
noctiluca.io.write
Some functions for writing trajectories / data sets to file
- noctiluca.io.write.csv(data, filename, header=True, delimiter='\t')
A quick-and-dirty csv-writer. Might be updated eventually.
- Parameters:
data (TaggedSet of Trajectory) – the data set to write to file
filename (str) – the file to write to
header (bool, optional) – whether to write a header line with column names
delimiter (chr, optional) – which character to use as delimiter
Notes
The columns in the file will be ['id', 'frame', 'x', 'y', 'z', 'x2', 'y2', 'z2'], where of course only those coordinates present in the data set will be written.
Missing frames, i.e. those where all of the coordinates are np.nan, will simply be omitted.
Since TaggedSet and Trajectory have more structure than can reasonably be represented in .csv files, this function has no aspirations of writing the whole structure to file. It can write only the “core” data, i.e. the actual trajectories.
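The output layout described in these notes can be sketched with the standard library alone. This illustrates the format, not the function's implementation: the trajectory data is made up, and the id value 0 and zero-based frame numbering are assumptions.

```python
import csv
import io
import math

# Hypothetical single-locus 2D trajectory: one (x, y) pair per frame.
# The middle frame is missing, i.e. all of its coordinates are NaN.
frames = [(0.0, 1.0), (math.nan, math.nan), (0.2, 1.1)]

out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')

# Header: only the coordinates present in the data are listed
writer.writerow(['id', 'frame', 'x', 'y'])

for frame, coords in enumerate(frames):
    if all(math.isnan(c) for c in coords):
        continue  # missing frames are omitted entirely
    writer.writerow([0, frame, *coords])

print(out.getvalue())
```

Because missing frames are dropped rather than written as NaN rows, a reader has to rely on the frame column, not the row index, to recover the timing.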
- noctiluca.io.write.mat(data, filename)
Write a dataset to MATLAB’s .mat format
This will produce a cell array containing the individual trajectories as structs. All the meta-data is passed along as well. The tags associated with the trajectory will be written to an entry 'noctiluca_tags'.
- Parameters:
data (TaggedSet of Trajectory) – the data set to write
filename (str) – the file to write to
- noctiluca.io.write.hdf5(data, filename, group=None)
Write to HDF5 file
- Parameters:
data (TaggedSet or dict) – the stuff to write
filename (str or pathlib.Path) – where to write to
group (str) – where in the file to write the data. If unspecified, the file will be truncated and content written to the root node.
Notes
Caution is advised, since this function will silently overwrite existing data (this is most often the desired behavior).
- noctiluca.io.write.hdf5_subTaggedSet(data, filename, group, refTaggedSet=None)
Write a subset of an already stored TaggedSet to file
Sometimes it is handy to store subsets of data in a directly loadable way (i.e. as its own TaggedSet object). This would duplicate data and thus increase file size, so this function takes advantage of hdf5’s hard links to store the properly pruned TaggedSet by just linking to the corresponding entries in the full data set, which should be located in the same file at the refTaggedSet address.
- Parameters:
data (TaggedSet) – a TaggedSet with some selection applied. The full data set (potentially with a different selection, this does not matter) should already be written to filename under the path refTaggedSet.
filename (str or pathlib.Path) – the file to store things in
group (str) – the location in the file where to store the new entry
refTaggedSet (str) – where the full data set is stored in the file.
Notes
Caution is advised, since this function will silently overwrite existing data (this is most often the desired behavior).
This function is intended for storing selections (subsets) of TaggedSets such that they can be read from file as complete TaggedSets themselves. Use cases include having a big dataset, out of which you routinely need only a specific part. If your subset is identified by the tag subset in the big dataset, this is equivalent to
>>> from noctiluca import io
...
... big_data = io.load.hdf5('file_with_big_data.h5', 'data')
... big_data.makeSelection(tags='subset')
... data = big_data.copySelection()
...
... # This can be reduced to a single line by saving the selection beforehand
... # when saving the data:
... big_data.makeSelection()
... io.write.hdf5(big_data, 'file_with_big_data_and_subset.h5', 'data')
... big_data.makeSelection(tags='subset')
... io.write.hdf5_subTaggedSet(big_data, 'file_with_big_data_and_subset.h5',
...                            'data_subset', refTaggedSet='/data')
...
... # so now when loading the data, we can just do
... data = io.load.hdf5('file_with_big_data_and_subset.h5', 'data_subset')
Note that basically we just shifted the process of making the selection from loading to writing. This however can come in very handy when the selection process is more involved than a simple tag, or you distribute your data to others, who will appreciate an easy way to load just the relevant data.