Class AMSDataStore

Class Documentation

class AMSDataStore
A class representing the persistent data of AMS.

The class abstracts the 'view' of AMS persistent data storage through
a SQL database that stores information regarding different files in the filesystem.

The SQL database categorizes files into three possible types:
    1. 'data' : A collection of files stored in some PFS directory that will train some model
    2. 'models' : A collection of torch-scripted models.
    3. 'candidates' : A collection of files stored in some PFS directory that can be added as data.

Every 'entry' is associated with a domain name, which provides the persistent abstraction.
For example, the domain 'EOS' maps to a set of files and models.

Public Functions

__init__(self, application_name, url)
Initializes the AMSDataStore instance. After initialization the kosh-store is closed and not connected.
is_open(self)
Checks whether the store is connected to a database.
open(self)
Opens the store and connects to the database.
close(self)
__enter__(self)
__exit__(self, exc_type, exc_val, exc_tb)
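The presence of __enter__ and __exit__ suggests the store can be used in a with statement. The following is a minimal sketch of that pattern using a hypothetical stand-in class (ToyStore), not AMSDataStore itself. It assumes that entering the context opens the connection and leaving it closes the connection, which the documentation above does not state explicitly; the URL string is also only illustrative.

```python
class ToyStore:
    """Hypothetical stand-in mimicking the documented open/close surface."""

    def __init__(self, application_name, url):
        self._application_name = application_name
        self._url = url
        self._session = None  # closed and not connected after init

    def is_open(self):
        return self._session is not None

    def open(self):
        if not self.is_open():
            self._session = object()  # placeholder for a real DB session
        return self

    def close(self):
        self._session = None

    def __enter__(self):
        # Assumption: entering the context opens the connection.
        return self.open()

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Assumption: leaving the context always closes the connection.
        # Returning False lets any exception propagate to the caller.
        self.close()
        return False


store = ToyStore("my_app", "sqlite:///toy.db")
state_before = store.is_open()
with store as s:
    state_inside = s.is_open()
state_after = store.is_open()
```

With this pattern the connection is released even if the body of the with block raises.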
find(self, domain_name=None, filename=None, entry_type=None, version=None)
add_data(self, domain_name, data_files=list(), version=None, metadata=dict())
Adds files to the kosh-store and associates them with the 'data' entry.

The function assumes data_files to always be in HDF5 format.

Args:
    data_files: A list of files to add in the entry
    version: The version to assign to all files
    metadata: The metadata to associate with these files
add_model(self, domain_name, model, test_error, val_error, version=None, metadata=dict())
Adds a model to the kosh-store and associates it with the 'models' entry.

The function assumes models to always be in TorchScript format.

Args:
    model: The path to the TorchScript model
    version: The version to assign to the model
    metadata: The metadata to associate with this model
add_candidates(self, domain_name, data_files=list(), version=None, metadata=dict())
Adds files to the kosh-store and associates them with the 'candidates' entry.

The function assumes candidates to always be in HDF5 format.

Args:
    data_files: A list of candidate files
    version: The version to assign to all files
    metadata: The metadata to associate with these files
remove_data(self, str domain_name, List[str] filenames, version=None, metadata=None, purge=True)
Removes files from the 'data' entry of the database and, if purge is set, from the filesystem.

Args:
    domain_name: The domain name these files belong to
    filenames: A list of files to be deleted
    version: An integer version to filter on, or None to skip version filtering
    metadata: Additional metadata used to narrow down which files are deleted
    purge: If True, also deletes the files from the filesystem
remove_models(self, str domain_name, List[str] models, version=None, metadata=None, purge=True)
Removes files from the 'models' entry of the database and, if purge is set, from the filesystem.

Args:
    domain_name: The domain name these files belong to
    models: A list of files to be deleted
    version: An integer version to filter on, or None to skip version filtering
    metadata: Additional metadata used to narrow down which files are deleted
    purge: If True, also deletes the files from the filesystem
remove_candidates(self, str domain_name, List[str] filenames, version=None, metadata=None, purge=True)
Removes files from the 'candidates' entry of the database and, if purge is set, from the filesystem.

Args:
    domain_name: The domain name these files belong to
    filenames: A list of files to be deleted
    version: An integer version to filter on, or None to skip version filtering
    metadata: Additional metadata used to narrow down which files are deleted
    purge: If True, also deletes the files from the filesystem
get_data_versions(self, domain_name, associate_files=False)
Returns a list of the versions existing for the 'data' entry.

Returns:
    A list of existing data versions in our database
get_model_versions(self, domain_name, associate_files=False)
Returns a list of the versions existing for the 'models' entry.

Returns:
    A list of existing model versions in our database
get_candidate_versions(self, domain_name, associate_files=False)
Returns a list of the versions existing for the 'candidates' entry.

Returns:
    A list of existing candidate versions in our database
get_files(self, str domain_name, str entry, versions=None)
Returns a list of paths to files for the specified version

Args:
    entry: The entry in the ensemble; one of 'candidates', 'models', 'data'
    versions: A list of versions we are looking for.
        If None, return all files in entry
        If "latest", return the latest version in the store

Returns:
    A list of existing files in the kosh-store
get_candidate_files(self, domain_name, versions=None)
Returns a list of paths to files for the specified version

Args:
    versions: A list of versions we are looking for.
        If None, return all files in entry
        If "latest", return the latest version in the store
        If a list, return only files matching these versions

Returns:
    A list of existing files in the kosh-store candidates ensemble
get_model_files(self, domain_name, versions=None)
Returns a list of paths to files for the specified version

Args:
    versions: A list of versions we are looking for.
        If None, return all files in entry
        If "latest", return the latest version in the store
        If a list, return only files matching these versions

Returns:
    A list of existing files in the kosh-store model ensemble
get_data_files(self, domain_name, versions=None)
Returns a list of paths to files for the specified version

Args:
    versions: A list of versions we are looking for.
        If None, return all files in entry
        If "latest", return the latest version in the store
        If a list, return only files matching these versions

Returns:
    A list of existing files in the kosh-store data ensemble
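The versions argument shared by the get_*_files functions acts as a three-way selector: None, "latest", or a list of versions. The sketch below illustrates that rule over in-memory (path, version) pairs. The helper select_files and the record layout are hypothetical, and versions are assumed to be comparable integers; this is not the library's implementation.

```python
def select_files(records, versions=None):
    """Filter (path, version) records following the documented selector rules."""
    if versions is None:
        # None: return every file in the entry.
        return [path for path, _ in records]
    if versions == "latest":
        # "latest": return only the files carrying the largest version.
        if not records:
            return []
        newest = max(v for _, v in records)
        return [path for path, v in records if v == newest]
    # Otherwise: treat the argument as a list of versions to match.
    wanted = set(versions)
    return [path for path, v in records if v in wanted]


records = [("a.h5", 1), ("b.h5", 2), ("c.h5", 2), ("d.h5", 3)]
all_files = select_files(records)
latest_files = select_files(records, "latest")
some_files = select_files(records, [1, 2])
```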
move(self, domain_name, dest_root_path, src_type, dest_type, filenames)
Moves files between directories and updates the respective db. It follows a "safe" approach: copy the file, add the copy, then delete the original, instead of moving the underlying file.

Args:
    dest_root_path: The directory to which the files are moved
    src_type: The ensemble name containing the original files
    dest_type: The ensemble name that receives the files
    filenames: The files to be moved

NOTE: The current implementation will lose all metadata associated with the original src files. We need to consider whether we want to "migrate" those to the destination entry dataset.
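The copy, add, delete sequence described for move can be sketched as follows. The helper safe_move and the dictionary of sets standing in for the database are hypothetical and used only for illustration; the real function updates the store rather than a dictionary. Like the implementation described in the NOTE, the sketch does not carry metadata over.

```python
import shutil
import tempfile
from pathlib import Path


def safe_move(db, src_type, dest_type, dest_root, filenames):
    """Copy each file, register the copy, then drop the original."""
    moved = []
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    for name in filenames:
        src = Path(name)
        dest = dest_root / src.name
        shutil.copy2(src, dest)         # 1. copy the file
        db[dest_type].add(str(dest))    # 2. add the copy to the destination entry
        db[src_type].discard(str(src))  # 3. delete the source record ...
        src.unlink()                    #    ... and the source file itself
        moved.append(str(dest))
    return moved


with tempfile.TemporaryDirectory() as tmp:
    src_file = Path(tmp) / "candidates" / "sample.h5"
    src_file.parent.mkdir()
    src_file.write_bytes(b"payload")
    db = {"candidates": {str(src_file)}, "data": set()}

    moved = safe_move(db, "candidates", "data", Path(tmp) / "data", [str(src_file)])
    src_gone = not src_file.exists()
    dest_ok = Path(moved[0]).read_bytes() == b"payload"
    db_after = {k: sorted(v) for k, v in db.items()}
```

Because the original is removed only after the copy has been written and registered, a failure during the copy leaves the source file and its record untouched.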
search(self, domain_name=None, entry=None, version=None, metadata=dict())
Searches for items in the database that match the metadata.

Args:
    entry: Which entry to search for ('data', 'models', 'candidates')
    version: The specific version to look for. When version is 'latest',
        the entry with the largest version is returned. If None, versions
        are not matched.
    metadata: A dictionary of key values to search in our database

Returns:
    A list of matching entries described as dictionaries
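The matching rules described for search can be illustrated over a list of dictionaries. The helper match_entries and the record layout below are hypothetical. The sketch also assumes that 'latest' is resolved after the entry and metadata filters have been applied, which the documentation does not specify.

```python
def match_entries(entries, entry=None, version=None, metadata=None):
    """Return the entries whose fields match every given filter."""
    metadata = metadata or {}
    hits = [
        e for e in entries
        if (entry is None or e["entry"] == entry)
        and all(e.get("metadata", {}).get(k) == v for k, v in metadata.items())
    ]
    if version == "latest":
        # Assumption: "latest" is the largest version among the filtered hits.
        if not hits:
            return []
        newest = max(e["version"] for e in hits)
        return [e for e in hits if e["version"] == newest]
    if version is not None:
        return [e for e in hits if e["version"] == version]
    return hits  # version=None: versions are not matched


entries = [
    {"entry": "models", "version": 1, "file": "m1.pt", "metadata": {"arch": "mlp"}},
    {"entry": "models", "version": 2, "file": "m2.pt", "metadata": {"arch": "mlp"}},
    {"entry": "models", "version": 3, "file": "m3.pt", "metadata": {"arch": "cnn"}},
    {"entry": "data", "version": 1, "file": "d1.h5", "metadata": {}},
]
latest_mlp = match_entries(entries, entry="models", version="latest",
                           metadata={"arch": "mlp"})
all_models = match_entries(entries, entry="models")
```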
__str__(self)
suggest_model_file_name(self, domain_name=None)
suggest_candidate_file_name(self, domain_name=None)
suggest_data_file_name(self, domain_name=None)

Public Static Attributes

entry_suffix = {"data": "h5", "models": "pt", "candidates": "h5"}
entry_mime_types = {"data": "hdf5", "models": "zip", "candidates": "hdf5"}
valid_entries = {"data", "candidates", "models"}
valid_dbs = {"sqlite", "mariadb"}
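These class attributes lend themselves to simple validation of entry types and file names. The sketch below copies two of the constants from above and adds a hypothetical helper, has_expected_suffix, which is not part of the documented API.

```python
from pathlib import Path

# Values copied from the static attributes listed above.
entry_suffix = {"data": "h5", "models": "pt", "candidates": "h5"}
valid_entries = {"data", "candidates", "models"}


def has_expected_suffix(entry, filename):
    """Hypothetical helper: check a file name against the suffix of its entry type."""
    if entry not in valid_entries:
        raise ValueError(f"unknown entry type: {entry!r}")
    return Path(filename).suffix == "." + entry_suffix[entry]


ok = has_expected_suffix("models", "net.pt")
mismatch = has_expected_suffix("data", "net.pt")
```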

Protected Functions

_add_entries(self, str domain_name, str entry_type, List[str] filenames, version=None, metadata=None)
Adds files of entry_type to the designated domain_name and associates the version and the metadata with those entries.

Args:
    domain_name: The domain_name of this entry
    entry_type: One of 'models', 'candidates', 'data'
    filenames: A list of files to add in the entry
    version: The version to assign to all files
    metadata: The metadata to associate with these files

Returns:
    None
_remove_entries(self, str domain_name, str entry_type, List[str] filenames, version=None, metadata=None, purge=True)
Removes files from the database and, if purge is set, from the filesystem.

Args:
    domain_name: The domain name these files belong to
    entry_type: The entry to look for the specified files
    filenames: A list of files to be deleted
    version: An integer version to filter on, or None to skip version filtering
    metadata: Additional metadata used to narrow down which files are deleted
    purge: If True, also deletes the files from the filesystem
_get_entry_versions(self, domain_name, entry_type, associate_files=False)
Returns a list of the versions existing for the specified entry.

Args:
    domain_name: The domain name the entry belongs to
    entry_type: The entry type we are looking for
    associate_files: Associate files in store with the versions

Returns:
    A list of the unique versions existing in our database or, when associate_files is set, a dictionary mapping each version to the list of files with that version
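The two return shapes described above can be illustrated over in-memory (path, version) pairs. The helper entry_versions is hypothetical, and sorting the version list is an assumption; the documentation only says the versions are unique.

```python
from collections import defaultdict


def entry_versions(records, associate_files=False):
    """Return unique versions, or a version -> files mapping when requested."""
    if not associate_files:
        # Assumption: versions are returned sorted.
        return sorted({v for _, v in records})
    grouped = defaultdict(list)
    for path, v in records:
        grouped[v].append(path)
    return dict(grouped)


records = [("a.h5", 1), ("b.h5", 2), ("c.h5", 2)]
versions_only = entry_versions(records)
versions_with_files = entry_versions(records, associate_files=True)
```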
_suggest_entry_file_name(self, entry, domain_name)

Protected Attributes

_application_name
_url
_session
_engine