API Reference
Top-level modules
h5features.data module
Provides the Data class to the h5features package.
class h5features.data.Data(items, labels, features, sparsity=None, check=True)
Bases: object
This class manages h5features data.
init_group(group, chunk_size)
Initializes an HDF5 group compliant with the stored data.
This method creates the datasets ‘items’, ‘labels’, ‘features’ and ‘index’ and leaves them empty.
Parameters:
- group (h5py.Group) – The group to initialize.
- chunk_size (float) – The size of a chunk in the file (in MB).
write_to(group, append=False)
Write the data to the given group.
Parameters:
- group (h5py.Group) – The group to write the data to. It is assumed that the group already exists or has been initialized to store h5features data (i.e. the method Data.init_group has been called).
- append (bool) – If False, any existing data in the group is overwritten. If True, the data is appended to the end of the group, and Data.is_appendable_to is assumed to be True for this group.
h5features.reader module
Provides the Reader class to the h5features package.
class h5features.reader.Reader(filename, groupname=None)
Bases: object
This class provides an interface for reading from h5features files.
A Reader object wraps an h5features file. When created, it loads the items and the index from the file. The read() method then allows fast access to features and times data.
Parameters:
- filename (str) – Path to the HDF5 file to read from.
- groupname (str) – Name of the group to read from in the file. If None, the file is assumed to contain exactly one group.
Raises: IOError – if filename is not an existing HDF5 file or if groupname is not a valid group in filename.
read(from_item=None, to_item=None, from_time=None, to_time=None)
Retrieve requested data coordinates from the h5features index.
Parameters:
- from_item (str) – Optional. Read the data starting from this item (defaults to the first stored item).
- to_item (str) – Optional. Read the data until reaching this item (defaults to from_item if it was specified, and to the last stored item otherwise).
- from_time (float) – Optional. Defaults to the beginning time in from_item. The specified time is included in the output.
- to_time (float) – Optional. Defaults to the ending time in to_item. The specified time is included in the output.
Returns: An instance of h5features.Data read from the file.
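The defaulting rules for from_item and to_item can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's code: resolve_item_range is a hypothetical helper, the stored items are assumed to be an ordered list of unique names, and raising IOError on an unknown item is an assumption.

```python
def resolve_item_range(items, from_item=None, to_item=None):
    """Return (first, last) positions in `items` selected by a read request.

    Hypothetical helper mirroring the documented defaults:
    - from_item defaults to the first stored item,
    - to_item defaults to from_item when it was given, else to the last item.
    """
    if not items:
        raise IOError('no items stored')

    # Resolve to_item first: its default depends on whether from_item was given.
    if to_item is None:
        to_item = items[-1] if from_item is None else from_item
    if from_item is None:
        from_item = items[0]

    for name in (from_item, to_item):
        if name not in items:
            raise IOError('unknown item: {}'.format(name))

    first, last = items.index(from_item), items.index(to_item)
    if first > last:
        raise IOError('from_item is stored after to_item')
    return first, last


stored = ['a.wav', 'b.wav', 'c.wav']
print(resolve_item_range(stored))                      # whole file
print(resolve_item_range(stored, from_item='b.wav'))   # a single item
print(resolve_item_range(stored, to_item='b.wav'))     # first item up to b.wav
```

Passing only from_item therefore selects a single item, which is the cheapest way to load one utterance from a large file.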
h5features.writer module
Provides the Writer class to the h5features package.
class h5features.writer.Writer(filename, chunk_size=0.1, version='1.1', mode='a')
Bases: object
This class provides an interface for writing to h5features files.
Parameters:
- filename (str) – The name of the HDF5 file to write to. For clarity you should use a ‘.h5’ or ‘.h5f’ extension, but this is not required by the package.
- chunk_size (float) – Optional. The size of a chunk in the file, in MB. Default is 0.1 MB. A chunk size below 8 KB is not allowed as it results in poor performance.
- version (str) – Optional. The file format version to write. Default is the latest version.
- mode (char) – Optional. The mode for opening an existing file: ‘a’ to append data to the file, ‘w’ to overwrite it.
Raises: IOError – if the file exists but is not HDF5, if the file cannot be opened, if the mode is not ‘a’ or ‘w’, if the chunk size is below 8 KB or if the requested version is not supported.
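Two of the checks listed under Raises, the mode and the 8-kilobyte lower bound on the chunk size, can be sketched as follows. check_writer_args is a hypothetical helper, not part of h5features, and it assumes 1 MB = 1000 KB so that the bound is 0.008 MB.

```python
def check_writer_args(chunk_size=0.1, mode='a'):
    """Raise IOError on arguments the Writer documentation rejects.

    Hypothetical helper. chunk_size is in MB; the 8 KB lower bound is
    taken to be 0.008 MB (assumption: 1 MB = 1000 KB).
    """
    min_chunk_mb = 0.008
    if mode not in ('a', 'w'):
        raise IOError("mode must be 'a' or 'w', got {!r}".format(mode))
    if chunk_size < min_chunk_mb:
        raise IOError('chunk size is below 8 KB: {} MB'.format(chunk_size))


check_writer_args()                      # the documented defaults are accepted
try:
    check_writer_args(chunk_size=0.001)  # 1 KB chunks are too small
except IOError as err:
    print('rejected:', err)
```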
write(data, groupname='h5features', append=False)
Write h5features data in a specified group of the file.
Parameters:
- data (dict) – A h5features.Data instance to be written to disk.
- groupname (str) – Optional. The name of the group in which to write the data.
- append (bool) – Optional. This parameter has no effect if groupname is not an existing group in the file. If True, try to append the new data to the group. If False (default), erase all data in the group before writing.
Raises: IOError – if append is requested but not possible.
h5features.converter module
Provides the Converter class to the h5features package.
class h5features.converter.Converter(filename, groupname='h5features', chunk=0.1)
Bases: object
This class allows conversion from various formats to h5features.
A Converter instance owns an h5features file and writes converted input files to it, in a specified group.
An input file is converted to h5features using the convert method, which chooses a concrete conversion method based on the input file extension.
Supported extensions are:
- .npz for numpy NPZ files
- .mat for Octave/Matlab files
- .h5 for h5features files. In this latter case, the files are simply converted to the latest version of the h5features data format.
Parameters:
- filename (str) – The h5features file to write to.
- groupname (str) – The group to write to in filename.
- chunk (float) – Size of a chunk in filename, in MB.
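Choosing a conversion routine from the file extension is commonly done with a small dispatch table. The sketch below shows that pattern only: the three convert_* functions are placeholders, CONVERTERS and convert are hypothetical names, and raising IOError for an unsupported extension is an assumption rather than documented behaviour.

```python
import os


def convert_npz(path):
    return 'npz converter called on ' + path


def convert_mat(path):
    return 'mat converter called on ' + path


def convert_h5(path):
    return 'h5features converter called on ' + path


# Map each supported extension to its (placeholder) conversion routine.
CONVERTERS = {'.npz': convert_npz, '.mat': convert_mat, '.h5': convert_h5}


def convert(path):
    """Pick a conversion routine from the extension of `path`."""
    ext = os.path.splitext(path)[1]
    try:
        converter = CONVERTERS[ext]
    except KeyError:
        raise IOError('unsupported extension: {!r}'.format(ext))
    return converter(path)


print(convert('features.npz'))
print(convert('legacy.h5'))
```

Supporting a new input format then only requires one more entry in the table.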
h5features.h5features module
Provides the read() and write() wrapper functions.
Note
For compatibility with h5features 1.0, this legacy top-level API has been kept in this module. Except in legacy code, it is better not to use it. Use the h5features.writer and h5features.reader modules instead.
h5features.h5features.read(filename, groupname=None, from_item=None, to_item=None, from_time=None, to_time=None, index=None)
Reads in a h5features file.
Parameters:
- filename (str) – Path to an HDF5 file, potentially serving as a container for many small files.
- groupname (str) – HDF5 group to read the data from. If None, the file is assumed to contain exactly one group.
- from_item (str) – Optional. Read the data starting from this item (defaults to the first stored item).
- to_item (str) – Optional. Read the data until reaching this item (defaults to from_item if it was specified, and to the last stored item otherwise).
- from_time (float) – Optional. Defaults to the beginning time in from_item. The specified time is included in the output.
- to_time (float) – Optional. Defaults to the ending time in to_item. The specified time is included in the output.
- index (int) – Optional. For faster access. TODO Document and test this.
Returns: A tuple (times, features) such that:
- times is a dictionary of 1D arrays (keys are items).
- features is a dictionary of 2D arrays (keys are items) with the ‘feature’ dimension along the columns and the ‘time’ dimension along the rows.
Note
All the data stored on disk between from_item and to_item is loaded and returned. It is the responsibility of the user to make sure that it fits into RAM.
h5features.h5features.simple_write(filename, group, times, features, item='item', mode='a')
Simplified version of write() when there is only one item.
h5features.h5features.write(filename, groupname, items, times, features, dformat='dense', chunk_size=0.1, sparsity=0.1, mode='a')
Write h5features data in a HDF5 file.
This function is a wrapper to the Writer class. It has three purposes:
- Check parameters for errors (see details below).
- Create Items, Times and Features objects.
- Send them to the Writer.
Parameters:
- filename (str) – HDF5 file to be written, potentially serving as a container for many small files. If the file does not exist, it is created. If the file is already a valid HDF5 file, try to append the data to it.
- groupname (str) – Name of the group to write the data in, or to append the data to if the group already exists in the file.
- items (list of str) – List of files from which the features were extracted. Items must not contain duplicates.
- times (list of 1D or 2D numpy arrays) – Time values for the features arrays. Elements of a 1D array are considered as the center of the time window associated with the features. A 2D array must have 2 columns corresponding to the begin and end timestamps of the features time window.
- features (list of 2D numpy arrays) – Features should have time along the rows and features along the columns (accommodating row-major storage in HDF5 files).
- dformat (str) – Optional. Which format to store the features in (sparse or dense). Default is dense.
- chunk_size (float) – Optional. In MB, tuning parameter corresponding to the size of a chunk in the HDF5 file. Ignored if the file already exists.
- sparsity (float) – Optional. Tuning parameter corresponding to the expected proportion (in [0, 1]) of non-zero elements on average in a single frame.
- mode (char) – Optional. The mode for opening an existing file: ‘a’ to append data to the file, ‘w’ to overwrite it.
Raises:
- IOError – if the filename is not valid or parameters are inconsistent.
- NotImplementedError – if dformat == ‘sparse’
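The parameter descriptions above imply that items, times and features run in parallel, with one row of features per time value. A minimal sketch of such a consistency check follows. check_alignment is a hypothetical helper, and the exact set of checks performed by h5features is not listed in this reference, so treat it as one plausible reading of "parameters are inconsistent".

```python
def check_alignment(items, times, features):
    """Raise IOError if items, times and features do not line up.

    Hypothetical helper. Only len() is used, so nested lists (as below)
    and numpy arrays are both accepted.
    """
    if not len(items) == len(times) == len(features):
        raise IOError('items, times and features must have the same length')
    for item, t, f in zip(items, times, features):
        # Time runs along the rows: one feature frame per time value.
        if len(t) != len(f):
            raise IOError('{}: {} time values but {} feature frames'
                          .format(item, len(t), len(f)))


items = ['a', 'b']
times = [[0.0, 0.1, 0.2], [0.0, 0.1]]                     # 1D times: window centers
features = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 0]]]   # frames x dimensions
check_alignment(items, times, features)                   # passes silently
```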
Low-level modules
h5features.entry module
Provides the Entry class to the h5features package.
class h5features.entry.Entry(name, data, dim, dtype, check=True)
Bases: object
The Entry class is the base class of h5features.Data entries.
It provides a shared interface to the classes Items, Times and Features, which together compose a Data.
h5features.entry.nb_per_chunk(item_size, item_dim, chunk_size)
Return the number of items that can be stored in one chunk.
Parameters:
- item_size (int) – Size of an item’s scalar component in bytes (e.g. for np.float64 this is 8).
- item_dim (int) – Item dimension (length of the second axis).
- chunk_size (float) – The size of a chunk, in MB.
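The arithmetic behind this function can be sketched as a division of the chunk size by the size of one item. The version below is an approximation written for illustration: nb_per_chunk_sketch is a hypothetical name, 1 MB is assumed to be 10**6 bytes, and the library may round differently or enforce a minimum.

```python
def nb_per_chunk_sketch(item_size, item_dim, chunk_size):
    """Number of items of `item_dim` scalars that fit in one chunk.

    Assumes 1 MB = 10**6 bytes (not confirmed by the reference).
    """
    chunk_bytes = chunk_size * 10 ** 6
    bytes_per_item = item_size * item_dim
    return int(chunk_bytes // bytes_per_item)


# 20-dimensional float64 items (8 bytes per scalar) in a 0.1 MB chunk:
# 100000 bytes / 160 bytes per item
print(nb_per_chunk_sketch(8, 20, 0.1))   # 625
```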
h5features.features module
Provides the Features class to the h5features package.
class h5features.features.Features(data, check=True, sparsetodense=False)
Bases: h5features.entry.Entry
This class manages features in h5features files.
Parameters:
- data (list of 2D numpy arrays) – Features must have time along the rows and features along the columns (accommodating row-major storage in HDF5 files).
- sparsetodense (bool) – If True, convert sparse matrices to dense when writing. Used for compatibility with 1.0.
Raises: IOError – if features are badly formatted.
class h5features.features.SparseFeatures(data, sparsity, check=True)
Bases: h5features.features.Features
This class is specialized for managing sparse matrices as features.
h5features.features.contains_empty(features)
Check whether the features data contain an empty array.
Parameters: features (list of numpy arrays) – The features data to check.
Returns: True if one of the arrays is empty, False otherwise.
h5features.features.parse_dformat(dformat, check=True)
Return dformat or raise if it is not ‘dense’ or ‘sparse’.
h5features.index module
Provides indexing facilities to the h5features package.
This index typically allows faster read access in large datasets and is transparent to the user.
Because the h5features package is designed to handle large datasets, features and times data are internally stored in a compact indexed representation.
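To give an intuition for what a compact indexed representation can look like, the sketch below concatenates the frames of all items and records, for each item, the position of its last frame. This layout is an assumption made for illustration; the actual on-disk format is not specified in this reference, and build_index and item_slice are hypothetical names.

```python
from itertools import accumulate


def build_index(frames_per_item):
    """Return, for each item, the position of its last frame in the
    concatenated features array (assumed layout)."""
    return [end - 1 for end in accumulate(frames_per_item)]


def item_slice(index, i):
    """Return the (start, stop) rows of item number i, stop excluded."""
    start = 0 if i == 0 else index[i - 1] + 1
    return start, index[i] + 1


index = build_index([3, 2, 4])   # three items with 3, 2 and 4 frames
print(index)                      # [2, 4, 8]
print(item_slice(index, 1))       # (3, 5)
```

With such an index, reading one item only requires two lookups and a single contiguous slice, whatever the size of the file.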
h5features.index.create_index(group, chunk_size)
Create an empty index dataset in the given group.
h5features.index.read_index(group, version='1.1')
Return the index stored in a h5features group.
Parameters:
- group (h5py.Group) – The group to read the index from.
- version (str) – The h5features version of the group.
Returns: A 1D numpy array of features indices.
h5features.index.write_index(data, group, append)
Write the data index to the given group.
Parameters:
- data (h5features.Data) – The data that is being indexed.
- group (h5py.Group) – The group where to write the index.
- append (bool) – If True, append the created index to the existing one in the group. If False, delete any existing data in the index.
h5features.items module
Provides the Items class to the h5features package.
class h5features.items.Items(data, check=True)
Bases: h5features.entry.Entry
This class manages items in h5features files.
Parameters: data (list of str) – A list of item names (e.g. files from which the features were extracted). Each name in the list must be unique.
Raises: IOError – if data is empty or if one or more names are not unique in the list.
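The two conditions under Raises can be checked in a few lines. The sketch below is not the library's implementation; check_items is a hypothetical helper that also reports which names are duplicated.

```python
from collections import Counter


def check_items(names):
    """Raise IOError if `names` is empty or contains duplicates."""
    if not names:
        raise IOError('the items list is empty')
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    if duplicates:
        raise IOError('duplicated items: ' + ', '.join(duplicates))


check_items(['a.wav', 'b.wav'])               # accepted
try:
    check_items(['a.wav', 'b.wav', 'a.wav'])  # 'a.wav' appears twice
except IOError as err:
    print(err)
```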
h5features.labels module
Provides the Labels class to the h5features package.
class h5features.labels.Labels(labels, check=True)
Bases: h5features.entry.Entry
This class manages labels related operations for h5features files.
Parameters:
- labels (list of numpy arrays) – Each element of the list contains the labels of an h5features item. Empty lists are not accepted. For all t in labels, t.ndim must be either 1 or 2.
  - 1D arrays contain the center timestamps of each frame of the related item.
  - 2D arrays contain the begin and end timestamps of each frame of the item, thus having t.ndim == 2 and t.shape[1] == 2.
- check (bool) – If True, raise on errors.
Raises: IOError – if the time format is not 1 or 2, or if labels arrays have different dimensions.
Returns: The parsed labels dimension, which is either 1 or 2 for 1D or 2D labels arrays respectively.
h5features.version module
Provides versioning facilities to the h5features package.
This module manages the h5features file format versions, specified as strings in the format ‘major.minor’. File format versions are independent of the h5features package version (but actually follow the same numbering scheme).
The module provides functions to list supported versions, read a version from a h5features file or check that a specific version is supported.
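Handling ‘major.minor’ strings is easiest when they are compared numerically rather than as text. The following sketch shows one way to do it. SUPPORTED holds example values chosen for illustration and is not read from the package, and parse_version, is_supported and latest are hypothetical names.

```python
SUPPORTED = ('0.1', '1.0', '1.1')   # example values, not read from the package


def parse_version(version):
    """Split a 'major.minor' string into a tuple of two integers."""
    major, minor = version.split('.')
    return int(major), int(minor)


def is_supported(version, supported=SUPPORTED):
    """Return True if `version` is one of the supported strings."""
    return version in supported


def latest(supported=SUPPORTED):
    """Return the highest version, comparing numerically."""
    return max(supported, key=parse_version)


print(parse_version('1.1'))   # (1, 1)
print(is_supported('2.0'))    # False
print(latest())               # 1.1
```

Comparing the parsed tuples avoids the pitfall of string comparison, where '1.10' would sort before '1.9'.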
h5features.version.is_same_version(version, group)
Return True if version and read_version(group) are equal.
h5features.version.is_supported_version(version)
Return True if the version is supported by h5features.