Talking about the astronomical growth of data these days is like talking about the rising seas: the measurements don’t lie. The only difference is that the world’s oceans are not rising nearly as fast as the amount of the world’s data. Fortunately, finding storage for all that data is not quite as vexing as finding a place for all that extra saltwater. Nonetheless, data storage is still a big challenge.
Part of the problem is that while total enterprise data volumes are doubling every two years, storage budgets are only growing at around five percent per year. As a result, the attack on the data explosion has developed into a dual-front assault addressing both data size and storage costs.
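To make that gap concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures quoted above (data doubling every two years, budgets growing about five percent per year); the function names are made up for illustration, and the constant growth rates are a simplifying assumption rather than a forecast.

```python
def data_growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """How many times larger the data set is after `years` (doubling every two years)."""
    return 2.0 ** (years / doubling_period_years)


def budget_growth_factor(years: float, annual_rate: float = 0.05) -> float:
    """How many times larger the storage budget is after `years` (about 5% per year)."""
    return (1.0 + annual_rate) ** years


def required_cost_reduction(years: float) -> float:
    """Factor by which cost per stored byte must fall for the budget to keep pace."""
    return data_growth_factor(years) / budget_growth_factor(years)


if __name__ == "__main__":
    # Assumption: both rates stay constant over the whole period.
    for years in (2, 4, 6, 10):
        print(
            f"after {years:>2} years: data x{data_growth_factor(years):.1f}, "
            f"budget x{budget_growth_factor(years):.2f}, "
            f"cost per byte must drop x{required_cost_reduction(years):.1f}"
        )
```

Under those assumptions the data grows 32-fold in ten years while the budget grows by well under a factor of two, which is why shrinking the data and cutting the cost of storing it both matter.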
To help reduce the size of the data, storage vendors are including data reduction technologies in storage devices. According to a recent Network Computing article, data reduction technologies like de-duplication, compression, and thin provisioning can shrink data sets to between 1/4 and 1/10 of their original size.
- Data de-duplication is best for unstructured data sets, virtual machines, application services, virtual desktops, and test/development environments.
- Data compression works best on relational databases, OLTP, decision support systems, and data warehouses.
- Thin provisioning eliminates the pre-allocation of unused capacity, thus providing an efficient, on-demand storage consumption model.
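For readers who want to see the three ideas in miniature, the following is a toy Python sketch, not how any vendor's array actually implements them. The fixed 4 KB chunk size, the SHA-256 fingerprinting, the use of zlib, and every name in it (`deduplicate`, `restore`, `stored_size`, `ThinVolume`) are assumptions chosen for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; real systems often use variable-size chunking


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into chunks and keep only one copy of each distinct chunk."""
    store = {}   # fingerprint -> chunk bytes (each unique chunk stored once)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


def stored_size(store, compress: bool = False) -> int:
    """Bytes needed to hold the unique chunks, optionally zlib-compressed."""
    return sum(len(zlib.compress(c)) if compress else len(c) for c in store.values())


class ThinVolume:
    """Toy thin-provisioned volume: advertises a logical size, allocates blocks on first write."""

    def __init__(self, logical_blocks: int, block_size: int = CHUNK_SIZE):
        self.logical_blocks = logical_blocks
        self.block_size = block_size
        self._blocks = {}  # only blocks that have been written consume space

    def write(self, block_no: int, payload: bytes) -> None:
        if not 0 <= block_no < self.logical_blocks:
            raise IndexError("block outside the volume's logical size")
        self._blocks[block_no] = payload

    def read(self, block_no: int) -> bytes:
        if not 0 <= block_no < self.logical_blocks:
            raise IndexError("block outside the volume's logical size")
        # Unwritten blocks read back as zeros without ever being allocated.
        return self._blocks.get(block_no, b"\x00" * self.block_size)

    @property
    def allocated_blocks(self) -> int:
        return len(self._blocks)


if __name__ == "__main__":
    data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # three identical chunks plus one different
    store, recipe = deduplicate(data)
    print("original bytes:", len(data))
    print("after de-duplication:", stored_size(store))
    print("after de-duplication and compression:", stored_size(store, compress=True))

    volume = ThinVolume(logical_blocks=1_000_000)
    volume.write(42, b"hello")
    print("logical blocks:", volume.logical_blocks, "allocated:", volume.allocated_blocks)
```

The sample data is deliberately repetitive, so the savings it shows are far better than anything a real workload should expect.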
The key to maximum data reduction is to make sure all three of these technologies are employed as part of your storage solution. But that’s just part of the puzzle. To effectively reduce the cost of storage, the industry needs to develop more efficient methods to address how and where the data gets stored.
One popular scenario, espoused by storage expert Jim O’Reilly, is disk-to-disk-to-cloud, which keeps the most important and recent data on disk or flash and then moves it out tier by tier to the cloud. Gartner projects that by 2016, most Global 1000 companies will have stored customer-sensitive data in the public cloud. Within three years, at least 80% of ECM vendors will provide cloud service alternatives to their on-premises solutions.
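A disk-to-disk-to-cloud policy can be pictured as a simple age-based rule. The Python sketch below is only an illustration: the tier names, the 7-day and 90-day thresholds, and the `choose_tier` function are hypothetical, and a real policy would likely weigh importance and access patterns as well as age.

```python
from datetime import datetime, timedelta

# Assumption: illustrative thresholds only, ordered from fastest to slowest tier.
TIER_MAX_AGE = [
    ("primary", timedelta(days=7)),          # flash or fast disk for the most recent data
    ("secondary_disk", timedelta(days=90)),  # cheaper disk for data that is cooling off
]
ARCHIVE_TIER = "cloud"  # everything older ends up in the cloud


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick a storage tier from how long ago the data was last accessed."""
    age = now - last_accessed
    for tier, max_age in TIER_MAX_AGE:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    for days in (1, 30, 365):
        print(f"last accessed {days} days ago ->", choose_tier(now - timedelta(days=days), now))
```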
And then there’s the technology that decides what gets backed up and where it goes. That process is becoming increasingly software-defined, which offloads the heavy lifting (RDMA protocol handling, advanced data lifecycle management, caching, and compression) to software programs that take advantage of large amounts of CPU power in public and private clouds.
Software-defined flash memory storage systems are a big focus at IBM these days, as are future storage technologies. Earlier this year at the annual IBM Edge conference, the company revealed a 120 PB file system solution using a grid of IBM Power 775 servers and a staggering 200,000 disks. In addition, it revealed multi-cloud storage that allows users to access storage from multiple providers, PCM (phase-change memory) cards that are controlled by heat, and Liquid State Storage.
In the short run, hybrid storage arrays combine flash memory, disk, and tape with cloud archival capabilities. This mixture provides a manageable, flexible, and cost-effective solution. In the future, it's possible that Liquid State Storage might be tricky for air travelers, given TSA rules about liquids on the plane. Not to worry. By the time it's commercially available, we'll all be riding in submarines.