Importing data and D matrix preparation
MODULO can operate with two options.
Option 1: the snapshot matrix is provided by the user. MODULO accepts only numpy.array data matrices.
We use this feature in Example 1.
Option 2: a folder containing the data is provided and MODULO must assemble the snapshot matrix. We use this feature in Example 2.
When operating with MEMORY_SAVING=True, MODULO will refrain from working on the full matrix D:
- if this matrix was provided (Option 1), MODULO will break the data into partitions, store them in a local folder and delete D from memory.
- if only the data folder was provided (Option 2), then MODULO will group the snapshots in different partitions and store them in a local folder.
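The partitioning idea can be sketched with plain NumPy. This is only an illustration under stated assumptions, not MODULO's implementation: the helper `partition_columns`, the file names `part_k.npz` and the array key `di` are hypothetical, and the equal-width blocks plus one leftover block follow the rule stated for N_PARTITIONS in the documentation of _data_processing.

```python
import os
import tempfile

import numpy as np


def partition_columns(D, n_partitions, folder_out):
    """Split D column-wise into blocks and store each block on disk.

    Hypothetical helper for illustration; it is not part of MODULO.
    """
    n_t = D.shape[1]
    block = n_t // n_partitions  # columns in each regular partition
    bounds = [(i * block, (i + 1) * block) for i in range(n_partitions)]
    if n_t % n_partitions != 0:
        # Leftover columns go into one additional partition
        bounds.append((n_partitions * block, n_t))
    paths = []
    for k, (start, stop) in enumerate(bounds, start=1):
        # Assumed file naming, chosen only for this sketch
        path = os.path.join(folder_out, f"part_{k}.npz")
        np.savez(path, di=D[:, start:stop])
        paths.append(path)
    return paths


rng = np.random.default_rng(0)
D = rng.standard_normal((50, 10))  # 50 points in space, 10 snapshots
folder = tempfile.mkdtemp()
paths = partition_columns(D, n_partitions=3, folder_out=folder)
del D  # the full matrix is no longer kept in memory

# Reload one partition at a time to inspect the block widths
widths = [np.load(p)["di"].shape[1] for p in paths]
print(widths)
```

With 10 snapshots and 3 requested partitions, the sketch writes three blocks of 3 columns and a fourth block holding the single remaining column.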
The format expected for the data is .dat or .txt. Variants for .csv files are in preparation.
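As a rough illustration of Option 2, the sketch below assembles a snapshot matrix from numbered .dat files using `numpy.genfromtxt`. It is not MODULO's reader: the helper `assemble_snapshot_matrix`, the four-digit zero padding of the file index, the `Res` prefix and the choice to stack the components one after another within a snapshot are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def assemble_snapshot_matrix(folder_in, filename, n_t, h=0, f=0, c=0):
    """Read n_t numbered files and stack them as columns of D.

    Hypothetical helper for illustration; it is not part of MODULO.
    """
    columns = []
    for i in range(n_t):
        # Assumed naming: <filename>0000.dat, <filename>0001.dat, ...
        path = os.path.join(folder_in, f"{filename}{i:04d}.dat")
        # Skip h header lines and f footer lines while parsing
        data = np.genfromtxt(path, skip_header=h, skip_footer=f)
        # Drop the first c columns (e.g. the mesh grid), then stack the
        # remaining components one after another into a single snapshot
        columns.append(data[:, c:].flatten(order="F"))
    return np.column_stack(columns)


# Write three small synthetic files: columns are x, u, v on 4 grid points
folder = tempfile.mkdtemp()
x = np.linspace(0.0, 1.0, 4)
for i in range(3):
    table = np.column_stack([x, x + i, -x * i])
    np.savetxt(os.path.join(folder, f"Res{i:04d}.dat"), table,
               header="x u v", comments="")

D = assemble_snapshot_matrix(folder, "Res", n_t=3, h=1, c=1)
print(D.shape)  # (8, 3): 2 components x 4 points, 3 snapshots
```

Each file contributes one column of D, so the matrix has as many columns as there are snapshots and as many rows as components times points in space.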
The code that generates the matrix D is available in ReadData. The main function is _data_processing, documented below:
- modulo_vki.utils.read_db.ReadData._data_processing(D: array, FOLDER_OUT: str = './', N_PARTITIONS: int = 1, MR: bool = False, SAVE_D: bool = False, FOLDER_IN: str = './', filename: str = '', h: int = 0, f: int = 0, c: int = 0, N: int = 0, N_S: int = 0, N_T: int = 0)
First, if the D matrix is not provided, this method attempts to load the data and assemble the D matrix. Then, it performs pre-processing operations on the data matrix D. If MR=True, the mean (per each column, i.e. the snapshot at time t_i) is removed. If MEMORY_SAVING=True, the data matrix is split into partitions to optimize memory usage; moreover, D is stored on disk and removed from live memory. Finally, in this condition, the data type of the matrix itself is also changed from float64 to float32, with the same purpose.
- Parameters:
D – np.array data matrix D
FOLDER_OUT – str folder in which the data (partitions and/or data matrix itself) will be eventually saved.
MEMORY_SAVING – bool, optional If True, memory saving feature is activated. Passed through __init__
N_PARTITIONS – int In a memory saving environment, this parameter is the number of partitions into which the data matrix is split. If N_T is not a multiple of the number indicated by the user, i.e. if (N_T % N_PARTITIONS) != 0, then an additional partition is introduced that contains the remaining columns.
MR – bool, optional If True, it removes the mean (per column) from each snapshot
SAVE_D – bool, optional If True, the matrix D is saved to disk (in FOLDER_OUT). If the Memory Saving feature is active, this is performed by default.
FOLDER_IN – str, optional. Needed only if database=None. If the D matrix is not provided (database = None), it is read from the path FOLDER_IN.
filename – str, optional. Needed only if database=None. If the database is not provided, it is read from the files named filename. The files must be named "filenamexxxx.dat", where xxxx is the file number, running from 0 to the number of saved time steps.
h – int, optional. Needed only if database=None. Lines to be skipped from the header of filename.
f – int, optional. Needed only if database=None. Lines to be skipped from the footer of filename.
c – int, optional. Needed only if database=None. Columns to be skipped (for example, if the first c columns contain the mesh grid).
N – int, optional. Needed only if database=None. Number of components to be analysed.
N_S – int, optional. Needed only if database=None. Number of points in space.
N_T – int, optional. Needed only if database=None. Number of time steps (snapshots).
- Returns:
- There are four possible scenarios:
if N_PARTITIONS == 1 and MR = True, the return is D, D_MEAN (D_MEAN being the mean snapshot)
if N_PARTITIONS == 1 and MR = False, the return is D
if N_PARTITIONS > 1 and MR = True, the return is D_MEAN
if N_PARTITIONS > 1 and MR = False, the return is None
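The four return scenarios can be mimicked with a small stand-in function. This is a sketch under stated assumptions rather than MODULO's code: `preprocess_sketch` is a hypothetical name, no partitions are actually written, and the mean removal is interpreted as subtracting the mean snapshot (the average of the columns of D), which is what the returned D_MEAN suggests.

```python
import numpy as np


def preprocess_sketch(D, n_partitions=1, mr=False):
    """Hypothetical stand-in that mirrors the four return cases only."""
    d_mean = None
    if mr:
        # Assumed interpretation: the mean snapshot is the average over
        # the columns of D, giving one value per row (point in space)
        d_mean = D.mean(axis=1)
        D = D - d_mean[:, np.newaxis]
    if n_partitions > 1:
        # In the real workflow the partitions would be written to disk
        # here; the matrix itself is not returned
        return d_mean  # None when mr is False
    return (D, d_mean) if mr else D


D = np.arange(12, dtype=np.float64).reshape(3, 4)  # 3 points, 4 snapshots

D_mr, D_mean = preprocess_sketch(D, n_partitions=1, mr=True)
same_D = preprocess_sketch(D, n_partitions=1, mr=False)
only_mean = preprocess_sketch(D, n_partitions=2, mr=True)
nothing = preprocess_sketch(D, n_partitions=2, mr=False)

# Casting float64 -> float32 halves the memory footprint of the matrix
D32 = D.astype(np.float32)
print(D.nbytes, D32.nbytes)
```

Callers therefore need to unpack the result differently depending on N_PARTITIONS and MR; with more than one partition the matrix must be recovered from the stored partitions instead of the return value.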