EMC kicked off EMC World 2014 with the announcement of their acquisition of DSSD, a shadowy company founded by eccentric geniuses from Sun Microsystems: Andy Bechtolsheim, Jeff Bonwick and Bill Moore. DSSD has been working for three years to accelerate data access and organization for huge datasets by building object storage capabilities directly onto a custom chipset. This is a fascinating story, and can be seen as another step in the struggle of vendors to own information intelligence and speed access to exabytes of data being created.
Like switching from a record player to an iPod
To understand the story, you have to start with virtualization: technology that disaggregates information and its management from the hardware that processes it. In the case of server virtualization, you separate the logical server (the operating system, configurations and the workload running on it) from the physical hardware of the server. This accomplishes two huge things. First, you can consolidate multiple workloads onto physical servers that were previously dedicated to single workloads at extremely low utilization. Second, you can move workloads from server to server, which gives you far more flexibility to mitigate server failures, migrate workloads and generally make changes without downtime.

Object storage is another form of virtualization, one that moves information management intelligence from the storage hardware system to the data itself. Basically, object storage uses logical addressing based on unique identifiers, rather than physical addressing based on the sector of the disk, tape or other media the data lives on. Each piece of data carries metadata along with it, allowing it to be identified, searched, verified and used regardless of what type of hardware it's stored on. Think of block storage as a record, where you find a song by moving the needle to a spot a particular distance from the edge of the vinyl, and object storage as music on an iPod, which finds songs by a code in software. You can add a song to a mix without making a new copy, just by creating a reference to it, and you can use metadata tags to sort and search. Object storage lets you do all of this with data.
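The addressing model described above can be sketched in a few lines of code. This is a toy in-memory illustration, not any vendor's actual API; all names here are invented for the example:

```python
import hashlib
import uuid

class ObjectStore:
    """Toy object store: data is found by a unique ID, not a disk location."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, **metadata) -> str:
        """Store data with metadata; return a location-independent object ID."""
        obj_id = str(uuid.uuid4())
        # A checksum travels with the object as metadata, so it can be
        # verified no matter what hardware it eventually lands on.
        metadata["sha256"] = hashlib.sha256(data).hexdigest()
        self._objects[obj_id] = (data, metadata)
        return obj_id

    def get(self, obj_id: str) -> bytes:
        """Retrieve by ID and verify integrity via the stored checksum."""
        data, meta = self._objects[obj_id]
        assert hashlib.sha256(data).hexdigest() == meta["sha256"]
        return data

    def search(self, **tags):
        """Find object IDs whose metadata matches all of the given tags."""
        return [oid for oid, (_, meta) in self._objects.items()
                if all(meta.get(k) == v for k, v in tags.items())]

store = ObjectStore()
song = store.put(b"...audio bytes...", artist="Miles Davis", genre="jazz")
playlist = [song]  # a "mix" is just a list of references, not a new copy
```

Notice that the caller never sees a sector or a device path, only the ID and the tags, which is exactly what lets the data move between hardware types without breaking anything that references it.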
Ebb & flow: commodity vs. custom
The next piece of the story is the tension between commodity hardware and specialized chipsets. The IT industry continually swings between the two in search of performance and cost advantages. Most recently, the trend has been toward commodity, low-cost, off-the-shelf hardware, with vendors using intelligent software to get the most out of the same x86 servers, spinning disk, flash storage, Ethernet interconnects and so on. Commodity hardware keeps costs down, makes systems more compatible and supportable in the long run, and reduces the risk of being locked in to a vendor with proprietary hardware. But the benefit that software optimizers can wring out of commodity hardware diminishes over time, making optimized hardware the next level of improvement, and if that hardware proves effective and widely adopted, it may become the next generation of commodity hardware.
The future today: object storage on custom chipsets with flash
DSSD builds custom chipsets that offload and accelerate the processing of object storage transactions (searches, reads, writes, deletes and changes) from the servers and applications that use the data. Specialized hardware and software could yield better results for complex processing of huge datasets, and could provide a significant advantage over companies attempting to accelerate data processing through software running on commodity hardware alone.
Every large storage vendor today, including EMC, Hitachi Data Systems, HP, IBM and NetApp, offers flash capacity in its systems, aimed at speeding up reads and writes. Some newer entrants offer purpose-built all-flash systems: EMC's recently purchased XtremIO, IBM's acquired Texas Memory Systems, Cisco's Whiptail, Pure Storage, Violin Memory, SolidFire, Nimbus Data and Kaminario. Innovators like Nimble Storage, Coho Data and Tegile deliver hybrids of flash and disk, aimed at balancing the acceleration of flash with the cost efficiency of spinning disk. Other vendors, like Fusion-io and Intel, offer server-side flash that bypasses centralized storage for accelerated reads and writes at the server. Still another approach, taken by Nutanix, SimpliVity, Oracle Exadata, IBM Netezza, Pivot3 and others, converges server and storage resources to optimize processing for key applications. Every single one of these approaches uses the same types of flash capacity from a few common suppliers, relying on software and network configurations to eke out advantages. Object storage approaches abound in the market, from software-focused players like Scality, Caringo and NetApp's StorageGRID to appliances from Cleversafe, Amplidata and EMC Atmos. None of these has seen dramatic success, with many of the solutions stuck in the doldrums of the archiving space and lacking the portability and consistency that users of tried-and-true POSIX-compliant file systems enjoy. Mega-scale cloud vendors Amazon, Google and Microsoft use object storage to achieve the massive scale they need, but because they build those systems for their own use, they have done little to popularize object storage for more general usage.
“Commoditize the other guys? Sure.”
EMC's purchase of DSSD may well marry object storage virtualization with flash acceleration, giving the company a significant advantage over vendors attempting to do each separately. It may also help EMC achieve another goal of virtualization: to be the intelligence in IT, forcing competitors to languish as undifferentiated commodities. The risk is that the technology won't work, or that if it does, customers will see it as too specialized, speculative or risky, and it will miss mainstream adoption in spite of a real technology advantage. Either way, EMC's willingness to buy early-stage but exciting technology both validates the importance of flash and object storage and gives the company an opportunity to change the way data is used at a fundamental level.