API-cachecow
-
class
PyMimircache.top.cachecow.
Cachecow
(**kwargs)
The Cachecow class provides the top-level API.
-
open
(file_path, trace_type='p', data_type='c', **kwargs)
By default, this function opens a plain text trace, that is, a file in which each line contains one label.
Changing trace_type lets it open other types of trace. The supported trace types are:
trace_type   file type    requires init_params
“p”          plain text   No
“c”          csv          Yes
“b”          binary       Yes
“v”          vscsi        No
The effect is the same as calling the corresponding function (csv, binary, vscsi).
Parameters: - file_path – the path to the data
- trace_type – type of trace, “p” for plainText, “c” for csv, “v” for vscsi, “b” for binary
- data_type – the type of request label, can be either “c” for string or “l” for number (for example block IO LBA)
- kwargs – parameters for opening the trace
Returns: reader object
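To make the plain text format and the data_type option concrete, the sketch below reads one label per line from an in-memory file. It is not PyMimircache's reader: the helper name read_plain_text_labels is hypothetical, and treating “l” as "convert the label to an int" is an assumption made for illustration.

```python
import io


def read_plain_text_labels(lines, data_type="c"):
    """Yield one request label per non-empty line.

    data_type "c" keeps the label as a string; "l" converts it to an int
    (assumed here to model numeric labels such as block IO LBAs).
    """
    if data_type not in ("c", "l"):
        raise ValueError("data_type must be 'c' or 'l'")
    for line in lines:
        label = line.strip()
        if not label:
            continue  # skip blank lines
        yield int(label) if data_type == "l" else label


# A tiny hypothetical trace with three requests, one label per line.
trace = io.StringIO("100\n200\n100\n")
string_labels = list(read_plain_text_labels(trace, data_type="c"))
trace.seek(0)
numeric_labels = list(read_plain_text_labels(trace, data_type="l"))
```

With data_type “c” the labels stay strings, which works for arbitrary object IDs; with “l” they become numbers, which suits block addresses.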
-
csv
(file_path, init_params, data_type='c', block_unit_size=0, disk_sector_size=0, **kwargs)
Open a csv trace. init_params is a dictionary specifying the layout of the csv file; the possible keys are listed in the table below. Column (field) numbers begin at 1, so the first column is 1, the second is 2, and so on.
Parameters: - file_path – the path to the data
- init_params – params related to the csv file, see the table below or csvReader for details
- data_type – the type of request label, can be either “c” for string or “l” for number (for example block IO LBA)
- block_unit_size – the block size for a cache, currently storage system only
- disk_sector_size – the disk sector size of input file, storage system only
Returns: reader object
Keyword Argument   file type     Value Type   Default Value      Description
label              csv/binary    int          this is required   the column of the label of the request
fmt                binary        string       this is required   fmt string of binary data, same as python struct
header             csv           True/False   False              whether csv data has header
delimiter          csv           char         “,”                the delimiter separating fields in the csv file
real_time          csv/binary    int          NA                 the column of real time
op                 csv/binary    int          NA                 the column of operation (read/write)
size               csv/binary    int          NA                 the column of block/request size
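The 1-based column numbering is easy to get wrong, so here is a minimal sketch of how an init_params dictionary could be interpreted. It uses only Python's standard csv module and is not PyMimircache's csvReader; the helper read_csv_labels and the sample trace are hypothetical.

```python
import csv
import io


def read_csv_labels(text, init_params):
    """Return the label column of a csv trace described by init_params."""
    delimiter = init_params.get("delimiter", ",")
    has_header = init_params.get("header", False)
    # init_params counts columns from 1, Python lists count from 0.
    label_index = init_params["label"] - 1

    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    return [row[label_index] for row in reader if row]


# Hypothetical trace: real time in column 1, operation in column 2,
# request label (an LBA) in column 3.
sample = "time,op,lba\n1,read,4096\n2,write,8192\n3,read,4096\n"
init_params = {"label": 3, "real_time": 1, "op": 2, "header": True, "delimiter": ","}
labels = read_csv_labels(sample, init_params)
```

Only label is required; the other keys fall back to the defaults listed in the table.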
-
binary
(file_path, init_params, data_type='l', block_unit_size=0, disk_sector_size=0, **kwargs)
Open a binary trace file. For init_params, see the csv function.
Parameters: - file_path – the path to the data
- init_params – params related to the spec of the data, see csv above for details
- data_type – the type of request label, can be either “c” for string or “l” for number (for example block IO LBA)
- block_unit_size – the block size for a cache, currently storage system only
- disk_sector_size – the disk sector size of input file, storage system only
Returns: reader object
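Since fmt follows Python's struct format strings, a fixed-size binary record can be decoded as sketched below. This is an illustration only, not PyMimircache's binary reader: the helper read_binary_labels and the record layout are hypothetical, and the sketch assumes that field numbers in init_params are 1-based for binary traces as they are for csv.

```python
import struct


def read_binary_labels(data, init_params):
    """Return the label field of every fixed-size record in data."""
    record = struct.Struct(init_params["fmt"])
    label_index = init_params["label"] - 1  # assumed 1-based, as for csv
    return [fields[label_index] for fields in record.iter_unpack(data)]


# Hypothetical record layout: little-endian uint32 timestamp, then uint64 LBA.
init_params = {"fmt": "<IQ", "real_time": 1, "label": 2}
data = b"".join(
    struct.pack("<IQ", timestamp, lba)
    for timestamp, lba in [(1, 4096), (2, 8192), (3, 4096)]
)
labels = read_binary_labels(data, init_params)
```

The default data_type for binary is “l”, which matches the integer labels that a numeric struct field produces.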
-
vscsi
(file_path, block_unit_size=0, **kwargs)
Open a vscsi trace file.
Parameters: - file_path – the path to the data
- block_unit_size – the block size for a cache, currently storage system only
Returns: reader object
-
reset
()
Reset cachecow to its initial state, including:
- reset the reader to the beginning of the trace
-
close
()
Close the reader opened in cachecow; additional cleanup may be added in the future.
-
stat
(time_period=[-1, 0])
Obtain statistical information about the trace, including:
- number of requests
- number of unique items
- cold miss ratio
- a list of the 10 most popular objects, in the form (obj, num of requests)
- number of obj/block accessed only once
- frequency mean
- time span
Returns: a string of the information above
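Most of these statistics can be derived from a frequency count of the labels. The sketch below shows one way to compute them for an in-memory list of labels; it is not PyMimircache's implementation, the helper trace_stat and its dictionary keys are hypothetical, and time span is left out because it needs timestamps.

```python
from collections import Counter


def trace_stat(labels, top_n=10):
    """Compute basic trace statistics from a list of request labels."""
    freq = Counter(labels)
    num_req = len(labels)
    num_uniq = len(freq)
    return {
        "num_of_requests": num_req,
        "num_of_uniq_items": num_uniq,
        # Every first access to an item is a cold miss.
        "cold_miss_ratio": num_uniq / num_req if num_req else 0.0,
        "top_popular": freq.most_common(top_n),
        "num_accessed_once": sum(1 for count in freq.values() if count == 1),
        "frequency_mean": num_req / num_uniq if num_uniq else 0.0,
    }


stats = trace_stat(["a", "b", "a", "c", "a", "b"])
```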
-
get_frequency_access_list
(time_period=[-1, 0])
Obtain statistical information about the trace, including:
- number of requests
- number of unique items
- cold miss ratio
- a list of the 10 most popular objects, in the form (obj, num of requests)
- number of obj/block accessed only once
- frequency mean
- time span
Returns: a string of the information above
-
num_of_req
()
Returns: the number of requests in the trace
-
num_of_uniq_req
()
Returns: the number of unique requests in the trace
-
get_reuse_distance
()
Returns: an array of reuse distances
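For readers unfamiliar with the term, the reuse distance of a request is the number of distinct items accessed since the previous request for the same item. The sketch below is a deliberately naive O(N^2) illustration of that definition, not the algorithm PyMimircache uses; the helper name reuse_distances and the use of -1 for first accesses are assumptions.

```python
def reuse_distances(labels):
    """Return the reuse distance of each request in a list of labels.

    A first access has no previous request, so it is marked with -1
    (an assumed convention for this sketch).
    """
    last_seen = {}
    distances = []
    for i, label in enumerate(labels):
        if label in last_seen:
            # Count distinct items between the previous access and this one.
            between = labels[last_seen[label] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(-1)
        last_seen[label] = i
    return distances


rd = reuse_distances(["a", "b", "c", "a", "b", "b"])
```

Under LRU, a request hits in a cache of size s exactly when its reuse distance is smaller than s, which is why the reuse distance array is enough to derive the whole LRU hit ratio curve.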
-
get_hit_count_dict
(algorithm, cache_size=-1, cache_params=None, bin_size=-1, use_general_profiler=False, **kwargs)
Get the hit count of the given algorithm and return a dict mapping cache size -> hit count. Note that the hit counts are not cumulative (not a CDF): the hit count of size 2 does not include the hit count of size 1, so you need to sum them up to get a CDF.
Parameters: - algorithm – cache replacement algorithm
- cache_size – size of cache
- cache_params – parameters passed to the cache; some cache replacement algorithms require parameters, for example LRU-K, SLRU
- bin_size – if the algorithm is not LRU, the result is calculated by simulating the cache at cache sizes [0, bin_size, bin_size*2 … cache_size]; this is not required for LRU
- use_general_profiler – if the algorithm is LRU and you don’t want to use LRUProfiler, set this to True. Possible reasons for not using LRUProfiler: 1. LRUProfiler is too slow for your large trace, because the algorithm is O(NlogN) and single-threaded; 2. LRUProfiler has a bug (let me know if you find one).
- kwargs – other parameters including num_of_threads
Returns: a dict of hit counts of the given algorithm, mapping cache_size -> hit count
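Because the returned hit counts are per-size rather than cumulative, turning them into a CDF means taking a running sum in order of cache size. A minimal sketch, assuming the dict has the cache size -> hit count shape described above (the helper name cumulative_hit_counts and the sample numbers are hypothetical):

```python
from itertools import accumulate


def cumulative_hit_counts(hit_count_dict):
    """Convert per-size hit counts into cumulative hit counts."""
    sizes = sorted(hit_count_dict)
    running_totals = accumulate(hit_count_dict[size] for size in sizes)
    return dict(zip(sizes, running_totals))


# Hypothetical per-size hit counts.
hit_counts = {1: 5, 2: 3, 3: 0, 4: 2}
cdf = cumulative_hit_counts(hit_counts)
```

Dividing each cumulative value by the total number of requests then gives the hit ratio at that cache size.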
-
get_hit_ratio_dict
(algorithm, cache_size=-1, cache_params=None, bin_size=-1, use_general_profiler=False, **kwargs)
Get the hit ratio of the given algorithm and return a dict mapping cache size -> hit ratio.
Parameters: - algorithm – cache replacement algorithm
- cache_size – size of cache
- cache_params – parameters passed to the cache; some cache replacement algorithms require parameters, for example LRU-K, SLRU
- bin_size – if the algorithm is not LRU, the hit ratio is calculated by simulating the cache at cache sizes [0, bin_size, bin_size*2 … cache_size]; this is not required for LRU
- use_general_profiler – if the algorithm is LRU and you don’t want to use LRUProfiler, set this to True. Possible reasons for not using LRUProfiler: 1. LRUProfiler is too slow for your large trace, because the algorithm is O(NlogN) and single-threaded; 2. LRUProfiler has a bug (let me know if you find one).
- kwargs – other parameters including num_of_threads
Returns: a dict of hit ratios of the given algorithm, mapping cache_size -> hit ratio
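The bin_size sampling can be pictured as running one cache simulation per sampled size. The sketch below does this with a small LRU cache built on OrderedDict. LRU is used here only as a simple stand-in policy for the simulation approach (the documentation states that LRU itself is normally handled by LRUProfiler without sampling), and the helper names are hypothetical.

```python
from collections import OrderedDict


def lru_hit_ratio(labels, cache_size):
    """Simulate an LRU cache of cache_size items and return its hit ratio."""
    if cache_size <= 0 or not labels:
        return 0.0
    cache = OrderedDict()
    hits = 0
    for label in labels:
        if label in cache:
            hits += 1
            cache.move_to_end(label)  # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the least recently used item
            cache[label] = True
    return hits / len(labels)


def hit_ratio_dict(labels, cache_size, bin_size):
    """Simulate at sizes 0, bin_size, 2*bin_size, ... up to cache_size."""
    return {
        size: lru_hit_ratio(labels, size)
        for size in range(0, cache_size + 1, bin_size)
    }


trace = ["a", "b", "a", "c", "b", "a"]
ratios = hit_ratio_dict(trace, cache_size=3, bin_size=1)
```

A larger bin_size means fewer simulations and a coarser curve, which is the trade-off the parameter controls.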
-
profiler
(algorithm, cache_params=None, cache_size=-1, bin_size=-1, use_general_profiler=False, **kwargs)
Get a profiler instance. Most users should not need this.
Parameters: - algorithm – name of algorithm
- cache_params – parameters of given cache replacement algorithm
- cache_size – size of cache
- bin_size – bin_size for generalProfiler
- use_general_profiler – this option is for LRU only; if it is True, return a cGeneralProfiler for LRU, otherwise return an LRUProfiler for LRU.
Note: LRUProfiler does not require the cache_size/bin_size params; it does not sample and thus provides a smooth curve, but it is O(logN) at each step. In contrast, cGeneralProfiler samples the curve but uses O(1) at each step.
- kwargs – num_of_threads
Returns: a profiler instance
-
heatmap
(time_mode, plot_type, time_interval=-1, num_of_pixels=-1, algorithm='LRU', cache_params=None, cache_size=-1, **kwargs)
Plot heatmaps. The following heatmaps are currently supported:
- hit_ratio_start_time_end_time
- hit_ratio_start_time_cache_size (python only)
- avg_rd_start_time_end_time (python only)
- cold_miss_count_start_time_end_time (python only)
- rd_distribution
- rd_distribution_CDF
- future_rd_distribution
- dist_distribution
- reuse_time_distribution
Parameters: - time_mode – the type of time, can be “v” for virtual time, or “r” for real time
- plot_type – the name of plot types, see above for plot types
- time_interval – the time interval of one pixel
- num_of_pixels – if you don’t want to use time_interval, you can instead specify how many pixels you want in one dimension; note that this feature is not well tested
- algorithm – the algorithm to use for plotting the heatmap; this is not required for distance-related heatmaps such as rd_distribution
- cache_params – parameters passed to cache, some of the cache replacement algorithms require parameters, for example LRU-K, SLRU
- cache_size – The size of cache, this is required only for hit_ratio_start_time_end_time
- kwargs – other parameters for computation and plotting such as num_of_threads, figname
-
diff_heatmap
(time_mode, plot_type, algorithm1='LRU', time_interval=-1, num_of_pixels=-1, algorithm2='Optimal', cache_params1=None, cache_params2=None, cache_size=-1, **kwargs)
Plot the differential heatmap between two algorithms, computed as alg2 - alg1.
Parameters: - cache_size – size of cache
- time_mode – the time mode, “v” for virtual time or “r” for real time
- plot_type – same as the name in heatmap function
- algorithm1 – name of the first algorithm
- time_interval – same as in heatmap
- num_of_pixels – same as in heatmap
- algorithm2 – name of the second algorithm
- cache_params1 – parameters of the first algorithm
- cache_params2 – parameters of the second algorithm
- kwargs – other parameters, including num_of_threads
-
twoDPlot
(plot_type, **kwargs)
An aggregate function for all two-dimensional plots except the hit ratio curve.
plot type               required parameters        Description
cold_miss_count         time_mode, time_interval   cold miss count VS time
cold_miss_ratio         time_mode, time_interval   cold miss ratio VS time
request_rate            time_mode, time_interval   num of requests VS time
popularity              NA                         Percentage of obj VS frequency
rd_distribution         NA                         Num of req VS reuse distance
rt_distribution         NA                         Num of req VS reuse time
scan_vis_2d             NA                         mapping from original objID to sequential number
interval_hit_ratio      cache_size                 hit ratio of interval VS time
request_traffic_vol
obj_size_distribution
Parameters: - plot_type – type of the plot, see above
- kwargs – parameters related to plots, see twoDPlots module for detailed control over plots
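To show what a plot such as cold_miss_count computes before it is drawn, the sketch below counts first-time accesses per time interval. It is an illustration rather than the twoDPlots implementation: the helper name is hypothetical, and it assumes virtual time (time_mode “v”) advances by one per request, so time_interval is a number of requests.

```python
def cold_miss_count_per_interval(labels, time_interval):
    """Count cold misses (first accesses) in each window of time_interval requests."""
    if time_interval <= 0:
        raise ValueError("time_interval must be positive")
    seen = set()
    # One bucket per interval, rounding up so the last partial interval is kept.
    num_intervals = (len(labels) + time_interval - 1) // time_interval
    counts = [0] * num_intervals
    for virtual_time, label in enumerate(labels):
        if label not in seen:
            seen.add(label)
            counts[virtual_time // time_interval] += 1
    return counts


counts = cold_miss_count_per_interval(
    ["a", "b", "a", "c", "a", "b", "d"], time_interval=3
)
```

Dividing each count by the number of requests in its interval would give the corresponding cold_miss_ratio series.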
-
plotHRCs
(algorithm_list, cache_params=(), cache_size=-1, bin_size=-1, auto_resize=True, figname='HRC.png', **kwargs)
Plot hit ratio curves (HRCs).
Parameters: - algorithm_list – a list of algorithm(s)
- cache_params – the corresponding cache params for the algorithms; use None for algorithms that don’t require cache params. If none of the algorithms requires cache params, you don’t need to set this
- cache_size – maximal size of cache, use -1 for max possible size
- bin_size – bin size for non-LRU profiling
- auto_resize – when the max possible size is used or the specified cache size is too large, the hit ratio curve ends in a huge plateau; set auto_resize to True to cut off most of that plateau
- figname – name of figure
- kwargs – options: block_unit_size, num_of_threads, auto_resize_threshold, xlimit, ylimit, cache_unit_size;
save_gradually - save a figure every time the computation for one algorithm finishes;
label - user-defined labels to use instead of the algorithm list
-
plotMRCs
(algorithm_list, cache_params=(), cache_size=-1, bin_size=-1, figname='MRC.png', **kwargs)
Plot miss ratio curves (MRCs).
Parameters: - algorithm_list – a list of algorithm(s)
- cache_params – the corresponding cache params for the algorithms; use None for algorithms that don’t require cache params. If none of the algorithms requires cache params, you don’t need to set this
- cache_size – maximal size of cache, use -1 for max possible size
- bin_size – bin size for non-LRU profiling
- auto_resize – when the max possible size is used or the specified cache size is too large, the curve ends in a huge plateau; set auto_resize to True to cut off most of that plateau
- figname – name of figure
- kwargs – options: block_unit_size, num_of_threads, auto_resize_threshold, xlimit, ylimit, cache_unit_size;
save_gradually - save a figure every time the computation for one algorithm finishes;
label - user-defined labels to use instead of the algorithm list
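A miss ratio curve is the complement of a hit ratio curve, since every request is either a hit or a miss. If you already hold a cache size -> hit ratio mapping, the conversion is a one-liner; the helper name and sample values below are hypothetical.

```python
def miss_ratio_dict(hit_ratio_dict):
    """Convert a cache size -> hit ratio mapping into cache size -> miss ratio."""
    return {size: 1.0 - ratio for size, ratio in hit_ratio_dict.items()}


# Hypothetical hit ratios at three cache sizes.
hrc = {0: 0.0, 100: 0.25, 200: 0.5}
mrc = miss_ratio_dict(hrc)
```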
-
characterize
(characterize_type, cache_size=-1, **kwargs)
Use this function to obtain a series of plots about your trace. The available types are:
- short - short run time, fewer plots with less accuracy
- medium
- long
- all - most of the available plots with high accuracy; note that it can take a LONG time on a big trace
Parameters: - characterize_type – see above, options: short, medium, long, all
- cache_size – estimated cache size for the trace, if -1, PyMimircache will estimate the cache size
- kwargs – print_stat
Returns: trace stat string
-