API-cachecow

This module offers the top-level API to users. It currently supports four types of operations:

trace loading
trace information retrieving
trace profiling
plotting

Author: Jason Yang <peter.waynechina@gmail.com> 2017/08

class PyMimircache.top.cachecow.Cachecow(**kwargs)

cachecow class providing top level API

open(file_path, trace_type='p', data_type='c', **kwargs)

By default, this function opens a plain text trace, that is, a file in which each line contains one request label.

By changing trace_type, it can also open other types of traces. The supported trace types are:

trace_type	file type	require init_params
“p”	plain text	No
“c”	csv	Yes
“b”	binary	Yes
“v”	vscsi	No

the effect is the same as calling the corresponding function (csv, binary, vscsi)

Parameters:

file_path – the path to the data
trace_type – type of trace, “p” for plainText, “c” for csv, “v” for vscsi, “b” for binary
data_type – the type of request label, can be either “c” for string or “l” for number (for example block IO LBA)
kwargs – parameters for opening the trace

Returns:

reader object
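The plain text format described above can be sketched as follows. This is a minimal illustration, not part of PyMimircache: the helper names, the file name, and the labels are made up, and the import path in `open_with_cachecow` simply follows the class path shown on this page.

```python
import os
import tempfile


def write_plain_text_trace(labels, path):
    """Write one request label per line, the plain text layout described above."""
    with open(path, "w") as f:
        for label in labels:
            f.write(f"{label}\n")


def read_plain_text_trace(path):
    """Read the labels back; each non-empty line is one request."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def open_with_cachecow(path):
    """Open the trace through Cachecow.open (requires PyMimircache to be installed)."""
    from PyMimircache.top.cachecow import Cachecow  # path as documented on this page
    cow = Cachecow()
    return cow.open(path, trace_type="p", data_type="c")


# Hypothetical labels; a real trace would hold object IDs or block addresses.
trace_path = os.path.join(tempfile.mkdtemp(), "example_trace.txt")
write_plain_text_trace(["a", "b", "a", "c"], trace_path)
labels = read_plain_text_trace(trace_path)
```

Here `open_with_cachecow` is defined but not called, so the sketch runs even where PyMimircache is not installed.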

csv(file_path, init_params, data_type='c', block_unit_size=0, disk_sector_size=0, **kwargs)

open a csv trace. init_params is a dictionary specifying the layout of the csv file; the possible keys are listed in the table below. Column (field) numbers begin at 1, so the first column is 1, the second is 2, and so on.

Parameters:

file_path – the path to the data
init_params – params related to the csv file, see the table below or csvReader for details
data_type – the type of request label, can be either “c” for string or “l” for number (for example block IO LBA)
block_unit_size – the block size for a cache, currently storage system only
disk_sector_size – the disk sector size of input file, storage system only

Returns:

reader object

Keyword Argument	file type	Value Type	Default Value	Description
label	csv/ binary	int	this is required	the column of the label of the request
fmt	binary	string	this is required	fmt string of binary data, same as python struct
header	csv	True/False	False	whether csv data has header
delimiter	csv	char	“,”	the delimiter separating fields in the csv file
real_time	csv/ binary	int	NA	the column of real time
op	csv/ binary	int	NA	the column of operation (read/write)
size	csv/ binary	int	NA	the column of block/request size
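As a rough sketch of how such an init_params dictionary lines up with a csv file, the snippet below builds one from the keys in the table and uses Python's own csv module to show the effect of 1-based column numbers. The csv content and the `extract_column` helper are hypothetical; PyMimircache's csvReader is not used here.

```python
import csv
import io

# Hypothetical csv layout: timestamp, operation, label (LBA), size.
CSV_TEXT = """time,op,lba,size
1.0,read,1024,512
2.5,write,2048,4096
3.0,read,1024,512
"""

# Keys come from the table above; column numbers are 1-based.
init_params = {
    "label": 3,
    "real_time": 1,
    "op": 2,
    "size": 4,
    "header": True,
    "delimiter": ",",
}


def extract_column(text, params, key):
    """Return the values of the column named by `key`, honoring 1-based numbering."""
    reader = csv.reader(io.StringIO(text), delimiter=params.get("delimiter", ","))
    rows = list(reader)
    if params.get("header", False):
        rows = rows[1:]  # skip the header line
    col = params[key] - 1  # convert the 1-based column number to a 0-based index
    return [row[col] for row in rows]


labels = extract_column(CSV_TEXT, init_params, "label")
```

A dictionary like this would then be passed as the init_params argument of csv(), with data_type="l" if the labels are numbers.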

binary(file_path, init_params, data_type='l', block_unit_size=0, disk_sector_size=0, **kwargs)

open a binary trace file; for init_params, see the csv function above

Parameters:

file_path – the path to the data
init_params – params related to the spec of the data, see the table under csv above for details
data_type – the type of request label, can be either “c” for string or “l” for number (for example block IO LBA)
block_unit_size – the block size for a cache, currently storage system only
disk_sector_size – the disk sector size of input file, storage system only

Returns:

reader object
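Since fmt follows Python's struct format strings, the relationship between fmt and the field positions can be sketched with the struct module alone. The record layout below is hypothetical, and the sketch assumes that the 1-based numbering described for csv columns also applies to binary fields.

```python
import struct

# Hypothetical record layout: uint32 timestamp, uint16 op code, uint64 label.
FMT = "<IHQ"
record_size = struct.calcsize(FMT)

# fmt plus 1-based field positions, using the keys from the csv table.
init_params = {"fmt": FMT, "real_time": 1, "op": 2, "label": 3}


def iter_fields(data, params, key):
    """Yield the field named by `key` from each fixed-size record in `data`."""
    index = params[key] - 1  # field numbers start at 1
    for record in struct.iter_unpack(params["fmt"], data):
        yield record[index]


# Build three fake records in memory instead of reading a real trace file.
raw = b"".join(
    struct.pack(FMT, t, op, lba)
    for t, op, lba in [(1, 0, 1024), (2, 1, 2048), (3, 0, 1024)]
)
labels = list(iter_fields(raw, init_params, "label"))
```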

vscsi(file_path, block_unit_size=0, **kwargs)

open a vscsi trace file

Parameters:

file_path – the path to the data
block_unit_size – the block size for a cache, currently storage system only

Returns:

reader object

reset()

reset cachecow to its initial state, including resetting the reader to the beginning of the trace

close()

close the reader opened in cachecow; further cleanup may be added in the future

stat(time_period=[-1, 0])

obtain statistical information about the trace, including:

number of requests
number of unique items
cold miss ratio
a list of the 10 most popular objects, in the form (obj, num of requests)
number of objects/blocks accessed only once
frequency mean
time span

Returns:

a string of the information above
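To make the listed quantities concrete, here is a small pure-Python sketch that derives most of them from a list of request labels. The definitions used (cold miss ratio as unique items over requests, frequency mean as requests over unique items) are assumptions about what stat() reports, the dictionary keys are invented for this example, and the time span is left out because it needs timestamps.

```python
from collections import Counter


def trace_stat(requests):
    """Compute summary statistics similar to those listed above."""
    freq = Counter(requests)
    num_req = len(requests)
    num_uniq = len(freq)
    return {
        "num_of_requests": num_req,
        "num_of_unique_items": num_uniq,
        # every unique item misses exactly once, on its first access
        "cold_miss_ratio": num_uniq / num_req if num_req else 0.0,
        "top_10": freq.most_common(10),
        "accessed_once": sum(1 for count in freq.values() if count == 1),
        "frequency_mean": num_req / num_uniq if num_uniq else 0.0,
    }


stats = trace_stat(["a", "b", "a", "c", "a", "b"])
```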

get_frequency_access_list(time_period=[-1, 0])

obtain statistical information about the trace, including:

number of requests
number of unique items
cold miss ratio
a list of the 10 most popular objects, in the form (obj, num of requests)
number of objects/blocks accessed only once
frequency mean
time span

Returns:

a string of the information above

num_of_req()

Returns:

the number of requests in the trace

num_of_uniq_req()

Returns:

the number of unique requests in the trace

get_reuse_distance()

Returns:

an array of reuse distances
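For readers unfamiliar with the term, the sketch below computes reuse distances under the common definition: the number of distinct items accessed between two consecutive accesses to the same item. Using -1 for a first access is an assumption made for this example, and the quadratic loop is chosen for clarity, not speed; it is not how PyMimircache computes the array.

```python
def reuse_distances(requests):
    """Per request: distinct items seen since the previous access to the
    same item, or -1 if the item has not been seen before."""
    last_seen = {}
    distances = []
    for i, item in enumerate(requests):
        if item in last_seen:
            # distinct items strictly between the two accesses
            between = requests[last_seen[item] + 1:i]
            distances.append(len(set(between)))
        else:
            distances.append(-1)
        last_seen[item] = i
    return distances


rd = reuse_distances(["a", "b", "c", "a", "b", "b"])
```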

get_hit_count_dict(algorithm, cache_size=-1, cache_params=None, bin_size=-1, use_general_profiler=False, **kwargs)

get the hit count of the given algorithm and return a dict mapping cache size -> hit count. Note that the hit counts are not cumulative (not a CDF): the hit count of size 2 does not include the hit count of size 1, so you need to sum them up to get a CDF.

Parameters:

algorithm – cache replacement algorithms
cache_size – size of cache
cache_params – parameters passed to cache, some of the cache replacement algorithms require parameters, for example LRU-K, SLRU
bin_size – if algorithm is not LRU, then the hit ratio will be calculated by simulating cache at cache size [0, bin_size, bin_size*2 … cache_size], this is not required for LRU
use_general_profiler – if algorithm is LRU and you don’t want to use LRUProfiler, then set this to True, possible reason for not using a LRUProfiler: 1. LRUProfiler is too slow for your large trace because the algorithm is O(NlogN) and it uses single thread; 2. LRUProfiler has a bug (let me know if you found a bug).
kwargs – other parameters including num_of_threads

Returns:

a dict of hit counts of the given algorithm, mapping from cache_size -> hit count
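The summing step mentioned in the description can be sketched like this. The input dictionary and the trace length are hypothetical values, not output from a real trace, and `cumulative_hit_counts` is a helper written for this example.

```python
from itertools import accumulate


def cumulative_hit_counts(hit_count_dict):
    """Turn per-size hit counts into cumulative counts, ordered by cache size."""
    sizes = sorted(hit_count_dict)
    running = accumulate(hit_count_dict[size] for size in sizes)
    return dict(zip(sizes, running))


# Hypothetical per-size hit counts.
hit_counts = {1: 5, 2: 3, 3: 2}
cumulative = cumulative_hit_counts(hit_counts)

num_requests = 20  # hypothetical trace length
hit_ratios = {size: hits / num_requests for size, hits in cumulative.items()}
```

Dividing the cumulative counts by the number of requests gives a hit ratio per cache size, which is the shape of result that get_hit_ratio_dict is documented to return.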

get_hit_ratio_dict(algorithm, cache_size=-1, cache_params=None, bin_size=-1, use_general_profiler=False, **kwargs)

get hit ratio of the given algorithm and return a dict of mapping from cache size -> hit ratio

Parameters:

algorithm – cache replacement algorithms
cache_size – size of cache
cache_params – parameters passed to cache, some of the cache replacement algorithms require parameters, for example LRU-K, SLRU
bin_size – if algorithm is not LRU, then the hit ratio will be calculated by simulating cache at cache size [0, bin_size, bin_size*2 … cache_size], this is not required for LRU
use_general_profiler – if algorithm is LRU and you don’t want to use LRUProfiler, then set this to True, possible reason for not using a LRUProfiler: 1. LRUProfiler is too slow for your large trace because the algorithm is O(NlogN) and it uses single thread; 2. LRUProfiler has a bug (let me know if you found a bug).
kwargs – other parameters including num_of_threads

Returns:

a dict of hit ratios of the given algorithm, mapping from cache_size -> hit ratio
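The sampling described for bin_size, simulating the cache at sizes 0, bin_size, bin_size*2 and so on, can be illustrated with a toy simulator. The sketch uses a plain LRU cache as a stand-in because it is short to write, even though the library does not need bin_size for LRU; it is not the library's profiler, and both helper names are invented.

```python
from collections import OrderedDict


def sampled_cache_sizes(cache_size, bin_size):
    """Cache sizes 0, bin_size, 2*bin_size, ... up to cache_size."""
    return list(range(0, cache_size + 1, bin_size))


def lru_hit_ratio(requests, size):
    """Simulate an LRU cache holding `size` items and return its hit ratio."""
    cache = OrderedDict()
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        elif size > 0:
            if len(cache) >= size:
                cache.popitem(last=False)  # evict the least recently used
            cache[item] = True
    return hits / len(requests) if requests else 0.0


trace = [1, 2, 3, 1, 2, 3]
hit_ratio_dict = {s: lru_hit_ratio(trace, s) for s in sampled_cache_sizes(4, 2)}
```

One full simulation runs per sampled size, so a smaller bin_size gives a finer curve at a higher cost.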

profiler(algorithm, cache_params=None, cache_size=-1, bin_size=-1, use_general_profiler=False, **kwargs)

get a profiler instance; most users should not need to call this directly

Parameters:

algorithm – name of algorithm
cache_params – parameters of given cache replacement algorithm
cache_size – size of cache
bin_size – bin_size for generalProfiler
use_general_profiler – this option is for LRU only; if it is True, return a cGeneralProfiler for LRU, otherwise return an LRUProfiler for LRU.

Note: LRUProfiler does not require the cache_size/bin_size params; it does not sample and thus provides a smooth curve, but it is O(logN) at each step. In contrast, cGeneralProfiler samples the curve but uses O(1) at each step.
kwargs – num_of_threads

Returns:

a profiler instance

heatmap(time_mode, plot_type, time_interval=-1, num_of_pixels=-1, algorithm='LRU', cache_params=None, cache_size=-1, **kwargs)

plot heatmaps; the following plot types are currently supported:

hit_ratio_start_time_end_time
hit_ratio_start_time_cache_size (python only)
avg_rd_start_time_end_time (python only)
cold_miss_count_start_time_end_time (python only)
rd_distribution
rd_distribution_CDF
future_rd_distribution
dist_distribution
reuse_time_distribution

Parameters:

time_mode – the type of time, can be “v” for virtual time, or “r” for real time
plot_type – the name of plot types, see above for plot types
time_interval – the time interval of one pixel
num_of_pixels – if you don’t want to use time_interval, you can instead specify how many pixels you want in one dimension; note that this feature is not well tested
algorithm – what algorithm to use for plotting heatmap, this is not required for distance related heatmap like rd_distribution
cache_params – parameters passed to cache, some of the cache replacement algorithms require parameters, for example LRU-K, SLRU
cache_size – The size of cache, this is required only for hit_ratio_start_time_end_time
kwargs – other parameters for computation and plotting such as num_of_threads, figname

diff_heatmap(time_mode, plot_type, algorithm1='LRU', time_interval=-1, num_of_pixels=-1, algorithm2='Optimal', cache_params1=None, cache_params2=None, cache_size=-1, **kwargs)

plot the differential heatmap between two algorithms, computed as alg2 - alg1

Parameters:

cache_size – size of cache
time_mode – the type of time, “v” for virtual time or “r” for real time
plot_type – same as the name in heatmap function
algorithm1 – name of the first alg
time_interval – same as in heatmap
num_of_pixels – same as in heatmap
algorithm2 – name of the second algorithm
cache_params1 – parameters of the first algorithm
cache_params2 – parameters of the second algorithm
kwargs – include num_of_threads
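The "alg2 - alg1" convention amounts to an element-wise subtraction of two heatmap matrices. The following sketch shows that step on two tiny matrices; the values are invented and `diff_matrix` is a helper for this example, not a PyMimircache function.

```python
def diff_matrix(alg1_values, alg2_values):
    """Element-wise alg2 - alg1 for two equally shaped 2-D lists."""
    return [
        [v2 - v1 for v1, v2 in zip(row1, row2)]
        for row1, row2 in zip(alg1_values, alg2_values)
    ]


# Hypothetical per-pixel hit ratios for the two algorithms.
lru = [[0.25, 0.5], [0.5, 0.75]]
optimal = [[0.5, 0.75], [0.75, 1.0]]
diff = diff_matrix(lru, optimal)
```

With the defaults (algorithm1='LRU', algorithm2='Optimal'), positive values mark regions where the second algorithm does better than the first.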

twoDPlot(plot_type, **kwargs)

an aggregate function for all two-dimensional plots except the hit ratio curve

plot type	required parameters	Description
cold_miss_count	time_mode, time_interval	cold miss count VS time
cold_miss_ratio	time_mode, time_interval	cold miss ratio VS time
request_rate	time_mode, time_interval	num of requests VS time
popularity	NA	Percentage of obj VS frequency
rd_popularity	NA	Num of req VS reuse distance
rt_popularity	NA	Num of req VS reuse time
scan_vis_2d	NA	mapping from original objID to sequential number
interval_hit_ratio	cache_size	hit ratio of interval VS time

Parameters:

plot_type – type of the plot, see above
kwargs – parameters related to plots, see twoDPlots module for detailed control over plots
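Because the required keyword arguments differ per plot type, a small pre-check can catch a missing argument before a long computation starts. The sketch below copies the requirements from the table above; `missing_params` is a hypothetical helper, not part of the library.

```python
# Required keyword arguments per plot type, copied from the table above.
REQUIRED_PARAMS = {
    "cold_miss_count": ("time_mode", "time_interval"),
    "cold_miss_ratio": ("time_mode", "time_interval"),
    "request_rate": ("time_mode", "time_interval"),
    "popularity": (),
    "rd_popularity": (),
    "rt_popularity": (),
    "scan_vis_2d": (),
    "interval_hit_ratio": ("cache_size",),
}


def missing_params(plot_type, **kwargs):
    """Return the required keyword arguments that were not supplied."""
    if plot_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown plot type: {plot_type}")
    return [name for name in REQUIRED_PARAMS[plot_type] if name not in kwargs]


missing = missing_params("request_rate", time_mode="v")
```

An empty list means the call has everything the table asks for, for example `missing_params("popularity")`.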

plotHRCs(algorithm_list, cache_params=(), cache_size=-1, bin_size=-1, auto_resize=True, figname='HRC.png', **kwargs)

plot hit ratio curves (HRCs) for the given algorithms

Parameters:

algorithm_list – a list of algorithm(s)
cache_params – the corresponding cache params for the algorithms, use None for algorithms that don’t require cache params, if none of the alg requires cache params, you don’t need to set this
cache_size – maximal size of cache, use -1 for max possible size
bin_size – bin size for non-LRU profiling
auto_resize – when using the max possible size, or when the specified cache size is too large, you will get a huge plateau at the end of the hit ratio curve; set auto_resize to True to cut off most of that plateau
figname – name of figure
kwargs – options: block_unit_size, num_of_threads, auto_resize_threshold, xlimit, ylimit, cache_unit_size

save_gradually - save a figure every time the computation for one algorithm finishes

label - instead of using the algorithm list as labels, specify user-defined labels
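Since cache_params has to line up with algorithm_list, with None for algorithms that take no parameters, one way to build it is from a mapping. This is only a sketch: the helper is invented, and the name "LRU_K" and the "K" key are placeholders, not confirmed identifiers; check the cache implementations for the real names.

```python
def build_cache_params(algorithm_list, params_by_algorithm):
    """Return a list parallel to algorithm_list, with None where no params are given."""
    return [params_by_algorithm.get(name) for name in algorithm_list]


# "LRU" and "Optimal" appear as defaults on this page; "LRU_K" and its
# "K" parameter are hypothetical placeholders.
algorithms = ["LRU", "LRU_K", "Optimal"]
cache_params = build_cache_params(algorithms, {"LRU_K": {"K": 2}})
```

The two lists would then be passed together as algorithm_list and cache_params.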

characterize(characterize_type, cache_size=-1, **kwargs)

use this function to obtain a series of plots about your trace; the available types are:

short - short run time, fewer plots with less accuracy
medium
long
all - most of the available plots with high accuracy; note that it can take a long time on a big trace

Parameters:

characterize_type – see above, options: short, medium, long, all
cache_size – estimated cache size for the trace, if -1, PyMimircache will estimate the cache size
kwargs – print_stat

Returns:

trace stat string
