Lack of visibility into how information is being used can be extremely problematic in any data center, resulting in poor application performance, excessive operational costs, and over-investment in infrastructure hardware and software.
One of the biggest mysteries in modern-day data centers is the “working set,” which refers to the amount of data that a process or workflow uses in a given time period. Many administrators find working sets hard to define, let alone to understand and measure in terms of how they impact data center operations.
Virtualization helps by providing an ideal control plane for visibility into working set behavior, but hypervisors tend to present data in ways that can be easily misinterpreted, which can create more problems than it solves.
So how can data center administrators get the working set information they need in a manner that is most useful for proper planning, design, operations, and management?
What Is It?
For all practical purposes, a working set is the data most recently and most commonly accessed from persistent storage. But that simple explanation leaves a handful of terms that are difficult to qualify and quantify. What counts as recent? Does “amount” mean reads, writes, or both? What happens when the same data is written over and over again?
Determining a working set’s size helps administrators understand the behaviors of workloads for better design, operations, and optimization. For the same reason administrators pay attention to compute and memory demands, it is also important to understand storage characteristics like working sets. Understanding and accurately calculating working sets can have a profound effect on the performance consistency of a data center. Have you ever heard of a real workload performing poorly, or inconsistently, on a tiered storage array, hybrid array, or hyper-converged environment? That is because these architectures are extremely sensitive to right-sizing the caching layer, and failing to accurately account for the working set sizes of production workloads is a common cause of such issues.
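One way to make the idea concrete: given an I/O trace, the working set over a time window is the unique capacity touched in that window. The sketch below assumes a hypothetical trace of `(timestamp, block_address, op)` tuples and a fixed block size; the names and format are illustrative, not from any particular storage tool.

```python
# Sketch: estimate working-set size from a hypothetical I/O trace.
# Trace format (illustrative assumption): (timestamp_seconds, block_address, op).

BLOCK_SIZE = 4096  # assume fixed 4 KiB blocks for simplicity

def working_set_bytes(trace, start, end):
    """Unique capacity touched within the window [start, end)."""
    blocks = {addr for ts, addr, op in trace if start <= ts < end}
    return len(blocks) * BLOCK_SIZE

trace = [
    (0, 100, "R"), (1, 100, "R"), (2, 101, "W"),
    (3, 102, "R"), (4, 100, "W"), (5, 103, "W"),
]
print(working_set_bytes(trace, 0, 6))  # 4 unique blocks -> 16384 bytes
print(working_set_bytes(trace, 0, 3))  # 2 unique blocks -> 8192 bytes
```

Note how the answer depends entirely on the window chosen: the same trace yields different working set sizes over different periods, which is exactly why the time-period question matters.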
To explore this more, let’s review a few traits associated with working sets:
Working sets are driven by the workload, the applications driving the workload, and the virtual machines (VMs) on which they run. Whether the persistent storage is local, shared, or distributed matters little from the VMs’ perspective; the working set size will be largely the same.
Working sets always relate to a time period, but data activity is a continuum, with cycles over time.
A working set comprises both reads and writes. The proportion of each is important to know because reads and writes have different characteristics and place different demands on your storage system.
Working set size refers to an amount, or capacity, but the number of I/Os it takes to make up that capacity will vary due to ever-changing block sizes.
Data access types may be different. Is one block read a thousand times, or are a thousand blocks read one at a time? Are the writes mostly overwriting existing data, or is it new data? This is part of what makes workloads so unique.
Working set sizes evolve and change as your workloads and data center change. Like everything else, they are not static.
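The traits above can be sketched with a small characterization pass over a trace: counting the read/write mix, the unique blocks touched, repeated reads of hot blocks, and overwrites versus new data. The trace format and function name below are illustrative assumptions, not output from any real monitoring tool.

```python
from collections import Counter

# Sketch: characterize a hypothetical I/O trace of (block_address, op)
# pairs, where op is "R" or "W". Illustrative only.

def characterize(trace):
    ops = Counter(op for _, op in trace)
    seen = set()
    hot_reads = 0    # reads of a block touched earlier in the trace
    overwrites = 0   # writes to a block touched earlier in the trace
    for addr, op in trace:
        if addr in seen:
            if op == "R":
                hot_reads += 1
            else:
                overwrites += 1
        seen.add(addr)
    return {
        "reads": ops["R"],
        "writes": ops["W"],
        "unique_blocks": len(seen),
        "hot_reads": hot_reads,
        "overwrites": overwrites,
    }

trace = [(100, "R"), (100, "R"), (101, "W"), (101, "W"), (102, "R")]
print(characterize(trace))
```

Two traces with identical total I/O counts can produce very different profiles here, which is the point of the “one block read a thousand times versus a thousand blocks read once” question.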