Scale-out Storage — It’s Time to Take Back Your Data!
In our previous blog piece, “It’s a Trap! — The True Cost of the Cloud,” we referenced an Andreessen Horowitz article called “The Cost of Cloud, a Trillion Dollar Paradox.” This recent gem indulges in more than a little heresy. It discusses Dropbox’s repatriation of user data from third-party cloud resources to in-house storage infrastructure and asserts, in italics, that “if you’re operating at scale, the cost of cloud can at least double your infrastructure bill.”
What? Cloud computing and operating at scale are practically synonymous. At a time when cloud services are seemingly the be-all and end-all of enterprise operations, stating that cloud costs more than on-prem is preposterous.
After all, the 2021 Flexera State of the Cloud report showed that 61% of survey respondents cited cost optimization as a top cloud initiative. Isn’t that why enterprises move to the cloud? Because the cloud is more resource-efficient than on-premises infrastructure? It must be. That would explain the year-over-year spending growth charted by Synergy Research Group.
“Over the last ten years we have seen … an explosion in the amount of data being generated and processed, resulting in an ever-growing need for data center capacity,” noted Synergy Research chief analyst John Dinsdale. “However, 60% of the servers now being sold are going into cloud providers’ data centers and not those of enterprises. … Clearly, companies have been voting with their wallets on what makes the most sense for them.”
Or…maybe not. Maybe those companies are following the herd and practicing “common wisdom” from a decade ago out of habit.
The Contrary Case — Is Cloud Storage Really Efficient?
Let’s back up to that Andreessen Horowitz piece. The author spotlights Dropbox’s 2018 S-1 SEC filing, but there’s more to the filing than what appeared in the article. For example, Andreessen Horowitz focuses on Dropbox’s free cash flow. However, both free cash flow and gross margin were heavily influenced by Dropbox’s significant revenue growth over that period, which somewhat obscures the point. The real story is in Dropbox’s Infrastructure Optimization initiative, which is worth quoting here (from page 67 of the filing).
“In recent years, we have taken several steps to improve the efficiency of the infrastructure that supports our platform. These efforts include an initiative that focused on migrating the vast majority of user data stored on the infrastructure of third-party service providers to our own lower-cost, custom-built infrastructure in co-location facilities that we directly lease and operate. In order to host user data on our own infrastructure, we leased or purchased infrastructure that is depreciated within our cost of revenue. During the migration to our internal infrastructure, we duplicated our users’ data between our internal infrastructure and that of our third-party service providers, resulting in higher storage costs. We reduced this practice over time until we completed the migration in the fourth quarter of 2016. … We expect to continue to realize benefits from expanding our internal infrastructure due to our operating scale and lower unit costs.”
Put simply, Dropbox incurred higher costs while it built out its internal infrastructure, essentially paying for data storage twice: once on internal systems and once on third-party services.
Once IT was confident that user data was safe and available, Dropbox canceled the outside services and began to depreciate its infrastructure investments. Higher up-front costs were soon offset by lower ongoing operational expenses.
The company noted, “Cost of revenue decreased $21.7 million or 6% during 2017, as compared to 2016, primarily due to a $35.1 million decrease in our infrastructure costs due to our Infrastructure Optimization.” Over the first two years of implementation, repatriating from the public cloud saved Dropbox $75 million.
Andreessen Horowitz viewed this topic largely through the lens of its impact on total market capitalization. Certainly, that calculus is valid and concerning, but we find the author’s key statement to be this: “Repatriation results in one-third to one-half the cost of running equivalent workloads in the cloud.”
We don’t disagree with this. In fact, anecdotal input from our customers indicates a 3x to 7x cost difference between public cloud and on-premises storage. However, there’s a lot going on under the hood of such statements, and it bears further discussion.
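The two claims are easier to compare on the same scale. The minimal Python sketch below is for illustration only: the function names and the $1,000,000 annual cloud bill are hypothetical, not figures from Dropbox or our customers. It converts “one-third to one-half the cost” into a cloud-to-on-prem multiple of 2x to 3x, then shows the annual savings that a 2x, 3x, or 7x multiple would imply for that hypothetical bill.

```python
from fractions import Fraction


def cloud_multiple(on_prem_share: Fraction) -> Fraction:
    """Cloud cost as a multiple of on-prem cost for the same workload.

    If repatriated infrastructure costs `on_prem_share` of the cloud bill,
    the cloud bill is 1 / on_prem_share times the on-prem cost.
    """
    if not 0 < on_prem_share <= 1:
        raise ValueError("on_prem_share must be in (0, 1]")
    return 1 / on_prem_share


def annual_savings(cloud_bill: float, multiple: float) -> float:
    """Savings from repatriating a workload whose cloud bill is `multiple`
    times what the same workload would cost on-prem."""
    return cloud_bill - cloud_bill / multiple


# Andreessen Horowitz: repatriation costs one-third to one-half of cloud.
low = cloud_multiple(Fraction(1, 2))   # cloud costs 2x on-prem
high = cloud_multiple(Fraction(1, 3))  # cloud costs 3x on-prem
print(f"a16z range: cloud costs {low}x to {high}x on-prem")

# Hypothetical $1,000,000/year cloud storage bill, for illustration only.
bill = 1_000_000
for m in (2, 3, 7):
    print(f"{m}x: save ${annual_savings(bill, m):,.0f} per year")
```

Under these assumptions, the low end of the 3x to 7x anecdotal range matches the top of the Andreessen Horowitz estimate, and the high end goes well beyond it.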
Why You Can Be Like Dropbox — Cloud Storage VS Scale-out Storage
This starts off like a Morpheus meme. “What if I told you…” that cloud and appliance storage weren’t as cost-effective as on-prem?
You’d probably say we were wrong. And you’d be right. Sometimes.
Cloud storage is absolutely the right choice in three cases:
1) when your datasets measure in the terabytes,
2) sometimes for disaster recovery, and
3) when you need ad hoc, short-term resources, such as for seasonal load spikes or for standing up a new project to try out.
Cloud storage can play an excellent role in enterprise storage strategy, and we advocate a hybrid approach with this in mind.
Cloud is not the right choice for petabyte-class, scale-out storage. Not anymore.
Let’s examine the options.
First, you’ve got public cloud services. You’re paying for infrastructure as a service, which often means someone else runs the hardware while you still pay to manage the software. If you don’t want to manage the software, the provider takes that over, too, and you have total vendor lock-in. Over time, they can keep raising rates, and you become the proverbial frog simmering in the kettle, with sides too high (because of lock-in) to climb out. The situation is even worse, and the lock-in even tighter, if you have to code your applications to the provider’s proprietary API.
Next, you’ve got storage appliances. Most are based on NFS, which isn’t suitable for a wide range of modern enterprise workloads. Some don’t reach their potential unless stocked with some form of non-volatile memory, which inflates the solution’s cost out of proportion to what most workloads need, all to push latency far lower than most storage applications truly require.
With both public cloud and storage appliances, it’s trendy now to advertise the solution as “software-defined.” One prominent vendor claims its appliance is “software-defined” because the software runs on the appliance.
That’s like saying you have an engine-based car.
Real Software-Defined Storage
A true software-defined storage solution is agnostic in every feasible way.
If you can’t run that storage software on any server, including repurposed, off-the-shelf hardware you’ve had sitting in a closet for a couple of years, then it’s not truly software-defined storage.
It’s a ploy to put you in the lock-in kettle.
Agnostic software-defined storage should be instantly accessible — just download, run, and get to work — and free from requirements to rack and stack someone’s branded enclosures. If you don’t have that, you have an appliance.
Cloud grew into its reputation as the de facto storage choice over the past 10 to 20 years because, in those days, most workloads were traditional enterprise workloads, not today’s mass-scale workloads. In this decade, every sizable business is digging into machine learning and big data. It’s a check-box item for staying competitive. If that doesn’t mean petabytes for your business now, it will soon. The nature of many, if not most, enterprise workloads has changed.
During that change, cloud providers have continually improved their storage software platforms. Yes, their scale and volume earn discounts on drives and servers, but their real advantage, the place where their profits skyrocket, is in their software. They have better tools than everyone else. Mainstream enterprises haven’t invested those years into storage optimization. Because of this, the gap between their storage efficiency and yours is too wide and too costly to bridge. You would have to put in those years and millions of development dollars, just like they did.
You would, in effect, have to be like Dropbox, except Dropbox had the advantage of being a storage company, so all of that investment went straight into its core business.
Quobyte’s Scale-out Storage Solution
Against that backdrop, Quobyte’s value proposition should be clear. We created a true software-defined storage solution with the efficiencies and optimizations enjoyed by the massive public cloud providers. You have the hardware. We leveled the playing field. With Quobyte, you can repatriate all those petabytes to local storage infrastructure and put them to work cost-efficiently.
As we mentioned, our customers claim a 3x to 7x cost advantage after migrating from the cloud to on-prem. You really can realize all those Dropbox-style advantages. But beyond that, you can take back control of your data, have complete sovereignty, and no longer be an amphibious hostage in a kettle.
The “common wisdom” of a decade ago no longer applies. It’s time to get smarter about scale-out storage.
Originally posted on Quobyte’s blog on September 3, 2021.