Class AMSDataStore¶
Defined in File store.py
Class Documentation¶
- class AMSDataStore¶
A class representing the persistent data of AMS. The class abstracts the 'view' of AMS persistent data storage through a SQL database that stores information regarding different files in the filesystem. The SQL database categorizes files into three possible types:
1. 'data': A collection of files stored in some PFS directory that will train some model.
2. 'models': A collection of torch-scripted models.
3. 'candidates': A collection of files stored in some PFS directory that can be added as data.
Every 'entry' is associated with a domainName, which provides the persistent abstraction, e.g. 'EOS maps to a set of files and models'.
Public Functions
- __init__(self, application_name, url)¶
Initializes the AMSDataStore class. Upon initialization, the kosh-store is closed and not connected.
- is_open(self)¶
Check whether we are connected to a database
- open(self)¶
Open and connect to the database
- close(self)¶
- __enter__(self)¶
- __exit__(self, exc_type, exc_val, exc_tb)¶
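The presence of open, close, is_open, __enter__ and __exit__ suggests the store can be used in a with statement. The following is a minimal sketch of that lifecycle, not the actual implementation: StoreSketch is a hypothetical stand-in class, the constructor values are placeholders, and the behaviour of __enter__ (open) and __exit__ (close) is an assumption, since this page does not document them.

```python
class StoreSketch:
    """Hypothetical stand-in that mimics only the open/close lifecycle."""

    def __init__(self, application_name, url):
        self.application_name = application_name
        self.url = url
        self._connected = False  # closed and not connected after init

    def is_open(self):
        return self._connected

    def open(self):
        self._connected = True
        return self

    def close(self):
        self._connected = False

    def __enter__(self):
        # Assumption: entering the context opens the store.
        return self.open()

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Assumption: leaving the context closes the store, even on error.
        self.close()
        return False  # do not swallow exceptions


# Placeholder arguments, chosen only for illustration.
store = StoreSketch("demo_app", "demo.sql")
state_before = store.is_open()
with store as s:
    state_inside = s.is_open()
state_after = store.is_open()
```

Under these assumptions the store is connected only inside the with block, so a failure in the body cannot leave the connection open.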
- find(self, domain_name=None, filename=None, entry_type=None, version=None)¶
- add_data(self, domain_name, data_files=list(), version=None, metadata=dict())¶
Adds files to the kosh-store and associates them with the 'data' entry. The function assumes data_files to always be in hdf5 format.
Args:
    data_files: A list of files to add to the entry
    version: The version to assign to all files
    metadata: The metadata to associate with these files
- add_model(self, domain_name, model, test_error, val_error, version=None, metadata=dict())¶
Adds a model to the kosh-store and associates it with the 'models' entry. The function assumes models to always be in torchscript format.
Args:
    model: The path containing the torchscript model
    version: The version to assign to the model
    metadata: The metadata to associate with this model
- add_candidates(self, domain_name, data_files=list(), version=None, metadata=dict())¶
Adds files to the kosh-store and associates them with the 'candidates' entry. The function assumes candidates to always be in hdf5 format.
Args:
    data_files: A list of candidate files
    version: The version to assign to all files
    metadata: The metadata to associate with these files
- remove_data(self, str domain_name, List[str] filenames, version=None, metadata=None, purge=True)¶
Removes files from the 'data' entry of the database and from the filesystem.
Args:
    domain_name: The domain name these files belong to
    filenames: A list of files to be deleted
    version: An integer, or None if we do not need to filter based on version
    metadata: Additional metadata we can query and partially delete from
    purge: If set to True, the files are also deleted from the filesystem
- remove_models(self, str domain_name, List[str] models, version=None, metadata=None, purge=True)¶
Removes models from the 'models' entry of the database and from the filesystem.
Args:
    domain_name: The domain name these models belong to
    models: A list of model files to be deleted
    version: An integer, or None if we do not need to filter based on version
    metadata: Additional metadata we can query and partially delete from
    purge: If set to True, the files are also deleted from the filesystem
- remove_candidates(self, str domain_name, List[str] filenames, version=None, metadata=None, purge=True)¶
Removes files from the 'candidates' entry of the database and from the filesystem.
Args:
    domain_name: The domain name these files belong to
    filenames: A list of files to be deleted
    version: An integer, or None if we do not need to filter based on version
    metadata: Additional metadata we can query and partially delete from
    purge: If set to True, the files are also deleted from the filesystem
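The effect of the purge flag on the remove_* functions can be illustrated with a small sketch. This is only a model of the documented behaviour under stated assumptions: the database is replaced by a plain Python set of paths, and remove_files is a hypothetical helper, not part of AMSDataStore.

```python
import tempfile
from pathlib import Path


def remove_files(index, filenames, purge=True):
    """Drop paths from a hypothetical in-memory index; optionally delete them."""
    for name in filenames:
        index.discard(name)  # always remove the database record
        if purge:
            # Only with purge=True is the file itself deleted.
            Path(name).unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as root:
    kept = Path(root) / "kept.h5"
    purged = Path(root) / "purged.h5"
    kept.write_bytes(b"x")
    purged.write_bytes(b"y")
    index = {str(kept), str(purged)}

    remove_files(index, [str(kept)], purge=False)
    remove_files(index, [str(purged)], purge=True)

    kept_on_disk = kept.exists()
    purged_on_disk = purged.exists()
```

In this sketch both records leave the index, but only the file removed with purge=True disappears from disk.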
- get_data_versions(self, domain_name, associate_files=False)¶
Returns a list of versions existing for the data entry.
Returns:
    A list of existing versions in our database
- get_model_versions(self, domain_name, associate_files=False)¶
Returns a list of versions existing for the model entry.
Returns:
    A list of existing model versions in our database
- get_candidate_versions(self, domain_name, associate_files=False)¶
Returns a list of versions existing for the candidate entry.
Returns:
    A list of existing candidate versions in our database
- get_files(self, str domain_name, str entry, versions=None)¶
Returns a list of paths to files for the specified versions.
Args:
    entry: The entry in the ensemble; can be any of 'candidates', 'models', 'data'
    versions: A list of versions we are looking for.
        If None, return all files in the entry.
        If "latest", return the latest version in the store.
Returns:
    A list of existing files in the kosh-store
- get_candidate_files(self, domain_name, versions=None)¶
Returns a list of paths to files for the specified versions.
Args:
    versions: A list of versions we are looking for.
        If None, return all files in the entry.
        If "latest", return the latest version in the store.
        If a list, return only files matching these versions.
Returns:
    A list of existing files in the kosh-store candidates ensemble
- get_model_files(self, domain_name, versions=None)¶
Returns a list of paths to files for the specified versions.
Args:
    versions: A list of versions we are looking for.
        If None, return all files in the entry.
        If "latest", return the latest version in the store.
        If a list, return only files matching these versions.
Returns:
    A list of existing files in the kosh-store model ensemble
- get_data_files(self, domain_name, versions=None)¶
Returns a list of paths to files for the specified versions.
Args:
    versions: A list of versions we are looking for.
        If None, return all files in the entry.
        If "latest", return the latest version in the store.
        If a list, return only files matching these versions.
Returns:
    A list of existing files in the kosh-store data ensemble
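The three accepted forms of the versions argument (None, "latest", or a list) can be sketched as a plain filter. This is an illustration only: select_files and the (path, version) tuples are hypothetical stand-ins for the kosh-store query, and treating "latest" as the largest version number is an assumption taken from the description of search.

```python
def select_files(records, versions=None):
    """Filter (path, version) tuples the way the versions argument is described."""
    if versions is None:
        # No filter: every file in the entry.
        return [path for path, _ in records]
    if versions == "latest":
        if not records:
            return []
        # Assumption: "latest" means the largest version number.
        newest = max(version for _, version in records)
        return [path for path, version in records if version == newest]
    # Otherwise versions is a list: keep only files matching those versions.
    wanted = set(versions)
    return [path for path, version in records if version in wanted]


# Hypothetical store content.
records = [("a.h5", 1), ("b.h5", 2), ("c.h5", 2), ("d.h5", 3)]

all_files = select_files(records)
latest_files = select_files(records, "latest")
chosen_files = select_files(records, [1, 2])
```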
- move(self, domain_name, dest_root_path, src_type, dest_type, filenames)¶
Moves files between directories and updates the respective db. It follows a "safe" approach: copy, add, delete the file instead of moving the underlying file.
Args:
    src_type: The ensemble name containing the original files
    dest_root_path: The directory to which the files should be moved
    dest_type: The ensemble name of the destination files
    filenames: The files to be moved
NOTE: The current implementation will lose all metadata associated with the original src files. We need to consider whether we want to "migrate" those to the destination entry dataset.
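The "safe" copy, add, delete sequence described for move can be sketched with the standard library. This is a sketch under assumptions rather than the actual implementation: safe_move is a hypothetical helper, and the database is modelled as a dict mapping an entry type to a set of paths.

```python
import shutil
import tempfile
from pathlib import Path


def safe_move(src_file, dest_root_path, index, src_type, dest_type):
    """Copy, add, delete; `index` is a hypothetical dict: entry type -> set of paths."""
    src = Path(src_file)
    dest = Path(dest_root_path) / src.name
    shutil.copy2(src, dest)            # 1. copy the underlying file
    index[dest_type].add(str(dest))    # 2. add the copy to the destination entry
    index[src_type].discard(str(src))  # 3. drop the original record ...
    src.unlink()                       #    ... and delete the original file
    return str(dest)


with tempfile.TemporaryDirectory() as root:
    src_dir = Path(root) / "candidates"
    dst_dir = Path(root) / "data"
    src_dir.mkdir()
    dst_dir.mkdir()
    original = src_dir / "sample.h5"
    original.write_bytes(b"payload")

    index = {"candidates": {str(original)}, "data": set()}
    moved = safe_move(original, dst_dir, index, "candidates", "data")

    moved_exists = Path(moved).exists()
    original_exists = original.exists()
```

Because the original is deleted only after the copy has been registered, a failure in step 1 or 2 leaves the source file and its record untouched. As in the NOTE above, nothing in this sketch carries metadata over to the destination.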
- search(self, domain_name=None, entry=None, version=None, metadata=dict())¶
Search for items in the database that match the metadata.
Args:
    entry: Which entry to search for ('data', 'models', 'candidates')
    version: Specific version to look for. When 'version' is 'latest' we return the entry with the largest version. If None, we are not matching versions.
    metadata: A dictionary of key values to search in our database
Returns:
    A list of matching entries described as dictionaries
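The matching rules described for search can be sketched over a list of dictionaries. The record layout (keys "entry", "file", "version", "metadata") and the helper search_records are hypothetical; the real function queries the kosh-store, and the exact shape of the returned dictionaries is not specified on this page.

```python
def search_records(records, entry=None, version=None, metadata=None):
    """Return the records that match entry, version and every metadata key."""
    metadata = metadata or {}
    hits = [
        r for r in records
        if (entry is None or r["entry"] == entry)
        and all(r.get("metadata", {}).get(k) == v for k, v in metadata.items())
    ]
    if version == "latest":
        if not hits:
            return []
        # 'latest' keeps only the matches with the largest version.
        newest = max(r["version"] for r in hits)
        return [r for r in hits if r["version"] == newest]
    if version is not None:
        return [r for r in hits if r["version"] == version]
    return hits  # version=None: versions are not matched


# Hypothetical records.
records = [
    {"entry": "data", "file": "a.h5", "version": 1, "metadata": {"source": "sim"}},
    {"entry": "data", "file": "b.h5", "version": 2, "metadata": {"source": "sim"}},
    {"entry": "models", "file": "m.pt", "version": 2, "metadata": {"source": "train"}},
]

latest_sim = search_records(records, entry="data", version="latest",
                            metadata={"source": "sim"})
trained = search_records(records, metadata={"source": "train"})
```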
- __str__(self)¶
- suggest_model_file_name(self, domain_name=None)¶
- suggest_candidate_file_name(self, domain_name=None)¶
- suggest_data_file_name(self, domain_name=None)¶
Public Static Attributes
- entry_suffix = {"data": "h5", "models": "pt", "candidates": "h5"}¶
- entry_mime_types = {"data": "hdf5", "models": "zip", "candidates": "hdf5"}¶
- valid_entries = {"data", "candidates", "models"}¶
- valid_dbs = {"sqlite", "mariadb"}¶
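The static attributes above can be combined to validate an entry name and look up its file suffix and mime type. The dictionaries below are copied from this page; describe_entry is a hypothetical helper written for illustration and is not part of the class.

```python
# Values copied from the static attributes listed above.
entry_suffix = {"data": "h5", "models": "pt", "candidates": "h5"}
entry_mime_types = {"data": "hdf5", "models": "zip", "candidates": "hdf5"}
valid_entries = {"data", "candidates", "models"}


def describe_entry(entry):
    """Return (suffix, mime type) for a valid entry, or raise ValueError."""
    if entry not in valid_entries:
        raise ValueError(
            f"unknown entry {entry!r}; expected one of {sorted(valid_entries)}"
        )
    return entry_suffix[entry], entry_mime_types[entry]


model_info = describe_entry("models")
data_info = describe_entry("data")
```

Note that the valid entry name is 'models' (plural); passing 'model' raises ValueError in this sketch.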
Protected Functions
- _add_entries(self, str domain_name, str entry_type, List[str] filenames, version=None, metadata=None)¶
Adds files of entry_type to the designated domain_name and associates the version and the metadata with those entries.
Args:
    domain_name: The domain_name of this entry
    entry_type: Can be either 'models', 'candidates', 'data'.
    filenames: A list of files to add to the entry
    version: The version to assign to all files
    metadata: The metadata to associate with these files
Returns:
    None
- _remove_entries(self, str domain_name, str entry_type, List[str] filenames, version=None, metadata=None, purge=True)¶
Removes files from the database and from the filesystem.
Args:
    domain_name: The domain name these files belong to
    entry_type: The entry to look for the specified files
    filenames: A list of files to be deleted
    version: An integer, or None if we do not need to filter based on version
    metadata: Additional metadata we can query and partially delete from
    purge: If set to True, the files are also deleted from the filesystem
- _get_entry_versions(self, domain_name, entry_type, associate_files=False)¶
Returns a list of versions existing for the specified entry.
Args:
    domain_name: The domain name we are looking for
    entry_type: The entry type we are looking for
    associate_files: Associate files in store with the versions
Returns:
    A list of the unique existing versions in our database, or a dictionary of versions to lists associating files with the specific version
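The two return shapes controlled by associate_files can be sketched as follows. entry_versions and the (path, version) tuples are hypothetical stand-ins for the database query, and returning the versions in sorted order is an assumption made here for a deterministic result.

```python
def entry_versions(records, associate_files=False):
    """Return unique versions, or a dict mapping each version to its files."""
    if not associate_files:
        # Assumption: sorted order; the page only promises unique versions.
        return sorted({version for _, version in records})
    by_version = {}
    for path, version in records:
        by_version.setdefault(version, []).append(path)
    return by_version


# Hypothetical store content.
records = [("a.h5", 1), ("b.h5", 2), ("c.h5", 2)]

versions_only = entry_versions(records)
versions_with_files = entry_versions(records, associate_files=True)
```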
- _suggest_entry_file_name(self, entry, domain_name)¶