Why the Data Storage Industry is Changing

Data Storage

For years, organisations have been grappling with data backup and recovery issues: deciding what data needs to be kept in near-term storage for easy access, what can live at a second tier where it's still accessible (though perhaps not immediately), and what can be consigned to deep archives.

Big Problems

The rise of Big Data brings fresh dilemmas. Besides scale, there are characteristics of Big Data management that distinguish it from traditional approaches.

Big Data analysis often involves large data stores that are really only useful for a short period. Once the analysis is over, the data has served its purpose. The question arises as to whether it’s worth keeping at all, much less backing up.

Other issues include:

  • the relative immaturity of the Big Data tool set
  • the lack of people with the expertise necessary to deal with the technology
  • misaligned investments
  • inadequate hardware

Few Solutions

One approach is to have multiple copies of important data. In effect, this substitutes redundancy for data protection and disaster recovery.
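The intuition behind substituting redundancy for protection can be sketched with a little probability arithmetic. This is a simplified illustration, assuming independent device failures; the 3% annual failure rate is an assumption for the example, not a figure from any vendor or survey:

```python
# Rough sketch: probability of losing all copies of an object in one year,
# assuming independent device failures (an idealisation for illustration).

def annual_loss_probability(per_device_failure_rate: float, copies: int) -> float:
    """Data is lost only if every copy fails; failures assumed independent."""
    return per_device_failure_rate ** copies

# Hypothetical 3% annual failure rate per device:
for k in range(1, 5):
    p = annual_loss_probability(0.03, k)
    print(f"{k} copies -> loss probability {p:.2e}")
```

Each extra copy multiplies the loss probability by the per-device failure rate, which is why a handful of copies can stand in for a formal backup regime — at the cost of multiplying the storage bill.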

Others look at Big Data simply as raw material for analytics: source data which they can recreate at will, with no need for storage.

The Snowden Effect

An NTT Communications survey of 1,000 business leaders shows that, in light of whistleblower Edward Snowden's disclosures about the NSA, many are choosing more secure forms of storage over cloud computing. This trend mirrors efforts by individual countries like Brazil and Germany, which are encouraging regional online traffic to be routed locally, rather than through the US.

The vast scale of online surveillance is changing how businesses store commercially sensitive data - which could have a big impact on US technology companies like Facebook and Google. Not to mention the "global" nature of the Internet itself.

Speaking of Surveillance...

Changes to legislation are having an impact on the market for enterprise data storage in video surveillance. In Germany, for instance, the maximum retention time for video data has been increased from 24 hours to 48 hours, effectively doubling the maximum capacity requirements of storage systems.
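The capacity impact of such a rule change is simple to estimate. A minimal sizing sketch, where the camera count and bitrate are illustrative assumptions rather than figures from the source:

```python
# Back-of-envelope sizing for a video surveillance store.
# Camera count and per-camera bitrate below are illustrative assumptions.

def required_storage_gb(cameras: int, mbit_per_sec: float, retention_hours: int) -> float:
    """Total capacity needed to retain continuous footage from all cameras."""
    bytes_total = cameras * (mbit_per_sec * 1_000_000 / 8) * retention_hours * 3600
    return bytes_total / 1e9

# Doubling retention from 24 to 48 hours doubles the requirement:
gb_24 = required_storage_gb(cameras=50, mbit_per_sec=4.0, retention_hours=24)
gb_48 = required_storage_gb(cameras=50, mbit_per_sec=4.0, retention_hours=48)
print(f"24 h: {gb_24:.0f} GB, 48 h: {gb_48:.0f} GB")  # 24 h: 2160 GB, 48 h: 4320 GB
```

Retention time scales capacity linearly, so any increase in the legal maximum translates directly into a proportional increase in the addressable storage market.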

Increasingly, video surveillance is being used for business intelligence applications, as well as legal evidence. Both scenarios open up a market for greater investment in storing data for extended periods, or at a higher quality.

Network attached storage (NAS) is expected to compete aggressively with storage area network (SAN) solutions. Performance is becoming comparable, and NAS is also typically more cost effective to install and maintain.

Then, there's Flash

The evolution of storage has failed to keep pace with servers and networks, because of the limits of mechanical disk.

Flash memory is seen as the key to resolving this, but has not yet seen widespread adoption:

  • Flash is expensive; enterprise-grade flash typically costs more than $40 per GB.
  • The less costly MLC flash isn’t currently reliable enough for enterprise workloads.
  • Flash form factors currently lack the reliability, scale, and ease-of-integration of disk-based arrays.

Early adoption has largely come via the insertion of PCIe cards within servers. The challenge with server-local storage is that it's generally incompatible with existing database architectures and virtualisation.

Arrays

Arrays remain the optimal form factor for data centres. Storage arrays are easy to adopt, being plug-compatible with current systems, and are well optimised for existing application infrastructure.

Arrays also deliver features for business-critical workloads:

  • high availability (HA)
  • data snapshots
  • backup facilities
  • multi-parity RAID and self-healing from failures
  • replication

Shared storage arrays also make it easier to rebalance CPU and storage capacity, to maximise utilisation across both.

The Hybrid Option?

The hybrid option is an idealised mix of disk and flash. Traditional disk-based array vendors are now integrating large amounts of flash (as flash cache, flash tier, or both). The consensus view is that flash is just too expensive to use across an entire array.

The principal alternative to an all-flash array would be a flash/SATA disk hybrid, where either an administrator or the array controller ensures that the most active data resides in flash, and the less active data on SATA disk.
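That placement decision can be sketched very simply: rank blocks by access frequency and keep the hottest ones on flash. A toy illustration, where the block names, access log, and flash capacity are all assumptions for the example:

```python
# Toy tiering policy: place the most frequently accessed blocks on flash,
# the rest on SATA. The log and capacity figures are illustrative only.

from collections import Counter

def place_blocks(access_log: list[str], flash_capacity: int) -> dict[str, str]:
    """Assign each block to 'flash' or 'sata' by access frequency."""
    counts = Counter(access_log)
    hot = {blk for blk, _ in counts.most_common(flash_capacity)}
    return {blk: ("flash" if blk in hot else "sata") for blk in counts}

log = ["a", "a", "a", "b", "b", "c", "d", "a"]
placement = place_blocks(log, flash_capacity=2)
print(placement)  # 'a' and 'b' are accessed most, so they land on flash
```

Real array controllers work with caching heuristics, chunked migrations, and write-back policies rather than a static ranking, but the core trade-off is the same: a small, expensive hot tier in front of a large, cheap cold one.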

There are challenges, of course:

  • The complexity of optimising widely different storage media within a single controller architecture.
  • Performance predictability in the face of enormous latency differences between the tiers.
  • The caching/tiering model concentrates the write-intensive work on a relatively small pool of flash, requiring the highest-cost versions.
  • The chunk size for moving data from disk to flash tends to be multi-megabyte.

That said, for any performance-centric workload, flash is the better technology. Over the coming decade it may completely replace mechanical disk, providing higher performance, greater reliability, simpler operation, and lower cost.

In The Cloud

Enterprise-class hybrid cloud storage is now a reality. Three breakthroughs are contributing to this:

1. Enterprise NAS Performance

Addressing the issue of excessive latency.

2. Painless Migrations

Allowing enterprises to change storage configurations at will. Historically, organisations couldn't migrate from on-premises NAS to cloud storage without rewriting all their enterprise applications for the cloud, and then testing them there.

3. Switching Vendors and Storage Options on the Fly

With less physical dependence on storage equipment, buyers are no longer tied to a single vendor for three- to seven-year periods. They can now have different vendors for each need.