Analyzing storage system workloads
Word processed copy.
Thesis
Analysis of storage system workloads is important for a number of reasons. The analysis might be performed to understand the usage patterns of existing storage systems. It is very important for the architects to understand the usage patterns when designing and developing a new, or improving upon the existing design of a storage system. It is also important for a system administrator to understand the usage patterns when configuring and tuning a storage system. The analysis might also be performed to determine the relationship between any two given workloads. Before a decision is taken to pool storage resources to increase the throughput, there is need to establish whether the different workloads involved are correlated or not. Furthermore, the analysis of storage system workloads can be done to monitor the usage and to understand the storage requirements and behavior of system and application software. Another very important reason for analyzing storage system workloads, is the need to come up with correct workload models for storage system evaluation. For the evaluation, based on simulations or otherwise, to be reliable, one has to analyze, understand and correctly model the workloads. In our work we have developed a general too, called ESSWA (Enterprise Storage System Workload Analyzer) for analyzing storage system workloads, which has a number of advantages over other storage system workload analyzers described in literature. Given a storage system workload in the form of an I/O trace file containing data for the workload parameters, ESSWA gives statistics of the data. From the statistics one can derive mathematical models in the form of probability distribution functions for the workload parameters. The statistics and mathematical models describe only the particular workload for which they are produced. This is because storage system workload characteristics are sensitive to the file system and buffer pool design and implementation, so that the results of any analysis are less broadly applicable. We experimented with ESSWA by analyzing storage system workloads represented by three sets of I/O traces at our disposal. Our results, among other things show that: I/O request sizes are influenced by the operating system in use; the start addresses of I/O requests are somewhat influenced by the application; and the exponential probability density function, which is often used in simulation of storage systems to generate inter-arrival times of I/O requests, is not the best model for that purpose in the workloads that we analyzed. We found the Weibull, lognormal and beta probability density functions to be better models.