Your Big Data System in the Cloud: Getting it Right From the Start (Part Two in a Three-Part Series)
The concept of cloud computing—running workloads remotely over the Internet in a commercial provider’s data center—has analogies anchored in centuries past that serve to illustrate the imperatives, benefits, and sheer logic of cloud migration for 21st century organizations.
Power generation is a strikingly apt example. Turn back the clock to the mid-19th century when homes and businesses generated their own power—it was on-site, unreliable, and expensive. In 1882, Thomas Edison realized his vision of a central power station to distribute electricity to end-users in New York City’s main business district. Businesses no longer had to manage on-site power generation; they simply purchased the electricity they needed from a centralized plant run by dedicated professionals.
Over the ensuing decades, with major advances in power-generation technology, including nuclear power, Edison’s basic outsourcing model has in many ways evolved into the 21st century’s interconnected national grid, which is exceedingly efficient, ultra-reliable, and managed by experts in today’s highly complex power generation and distribution systems. Today, a home running a generator as its sole source of power, or even renting a generator in a building down the street, would be considered extremely unusual.
Computing has followed a path similar to power generation, from on-site, unreliable, and expensive servers in the 1960s, through the advent of the online, interconnected world enabled by the Web around 1990, to the widespread adoption of cloud computing in the 2010s. Many companies that simply “lifted and shifted” their computing workloads to the cloud have discovered that while they moved the generator, they haven’t realized any of the other advantages.
On the surface, the fundamental principle driving cloud adoption—that purchasing computing capacity from off-site cloud providers lets organizations focus on their respective missions, not on the technology required to support those missions—sounds simple, but it’s not.
In the booming digital world, people speak of ever-growing “mountains of data” or Big Data, and of head-spinning continuous change in technology. Enterprises face an overwhelming number of choices and options when procuring the right technology. It takes a roster of skilled and versatile technologists, like the specialists who work in CVP’s Data Science and Engineering Practice, to navigate customers through a migration to the cloud, from procurement to implementation.
The Big Data challenges that organizations face are daunting, as storing, mining, and analyzing vast amounts of data become ever more important for gaining mission-critical business intelligence and a competitive edge. With traditional, on-site computing, organizations must acquire servers and try to plan for variability in capacity needs and usage.
Cloud computing changes the game. The dynamic nature of the cloud lets customers buy exactly the capacity they need at any given time, in variable increments rather than the fixed capacity that constrains a typical corporate data center. This is comparable to a home or office drawing an unusually large amount of electricity when needed, as long as the correct wiring is in place.
Given the flexibility of the cloud, organizations moving data systems to the cloud are up against a dizzying assortment of performance and pricing options.
Cloud providers, such as Amazon, Google, and Microsoft, offer many different services, some of which do the same thing but in a different way. Clients who don’t have astute and experienced experts in cloud technology might end up buying the wrong service.
CVP has seen many clients who simply move their systems to the cloud without changing anything else, such as continuing to run expensive, outdated database software directly on virtual machines. Their invoices go up, their performance goes down, and they ask themselves, “What the heck was all the allure of cloud computing?”
CVP recently answered that question emphatically for a client planning to move a large data set to the Amazon Web Services (AWS) cloud to speed up processing and reduce costs, which were running about $100 per unit of work for the current, on-site system. Testing an array of AWS technical environments, CVP data technologists demonstrated that by deploying AWS Athena, an interactive query service that makes it easy to analyze data in Amazon’s S3 storage service using standard SQL, the customer could increase processing speed 24-fold and reduce the cost per unit of work to 6 cents. That is the allure of cloud computing.
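A few lines of Python make the arithmetic behind those figures explicit; the dollar amounts are the ones cited above, and the comparison is a simple ratio:

```python
# Cost comparison using the figures cited in this engagement.
onsite_cost_per_unit = 100.00   # dollars per unit of work, on-site system
athena_cost_per_unit = 0.06     # dollars per unit of work, with AWS Athena
speedup = 24                    # processing-speed improvement factor

# Ratio of old cost to new cost per unit of work.
cost_reduction = onsite_cost_per_unit / athena_cost_per_unit
print(f"Cost per unit of work fell by a factor of ~{cost_reduction:.0f}")
print(f"Processing ran {speedup}x faster")
```

In other words, the per-unit cost dropped by a factor of well over a thousand, on top of the 24-fold speedup.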
What’s more, Athena is serverless, so there is no infrastructure to set up or manage; users can start analyzing at any time without having to worry about scaling or system availability. And they don’t need to load their data into Athena; it works directly with data stored in S3.
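As an illustration, here is a minimal sketch of what submitting a standard-SQL query to Athena might look like from Python via boto3’s StartQueryExecution API. The database, table, and bucket names are hypothetical placeholders, and the submission call is left commented out because it requires configured AWS credentials:

```python
def build_athena_request(database: str, output_bucket: str, sql: str) -> dict:
    """Assemble the parameters Athena's StartQueryExecution API expects."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes each query's results back to S3 at this location.
        "ResultConfiguration": {"OutputLocation": f"s3://{output_bucket}/results/"},
    }

def run_query(params: dict) -> str:
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without AWS
    client = boto3.client("athena")
    response = client.start_query_execution(**params)
    return response["QueryExecutionId"]

# Hypothetical names; note no data is loaded into Athena -- it reads S3 directly.
params = build_athena_request(
    database="analytics_db",
    output_bucket="example-results-bucket",
    sql="SELECT region, COUNT(*) AS n FROM events GROUP BY region",
)
# query_id = run_query(params)  # uncomment once AWS credentials are configured
```

Because Athena queries data in place, the only setup is pointing it at an S3 location for input and output; there are no servers to provision.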
Looking ahead to the next decade, CVP data technologists expect to see the maturation of serverless computing yielding data systems that will cost even less and offer scalability that will be nearly unlimited. Additionally, all patching, provisioning, and backup will be provided by the cloud vendor. When it comes to building, maintaining, and managing mission-supporting technology, organization leaders can confidently say, “That’s somebody else’s problem.”
Next in the three-part series: A challenge for technology providers is building trust with clients who often worry about security and losing control of their data when it is moved off-site to a cloud environment. The next blog post will look at cultural resistance to change and how CVP works to mitigate it.