A lot goes into high-performance computing…at every level. That’s not breaking news. Yet so much of the discussion about what makes data analytics challenging in an AI, HPC, or other compute-intensive environment centers on software, platforms and applications alike, that some of the most challenging aspects of deploying a compute-intensive system go without meaningful consideration.
Over the years, and especially in the past five, as HPC and Big Data platforms host ever more demanding AI or Machine Learning environments, we’ve seen a range of issues come up for clients who assumed that if they got the software right, the rest was like Lego bricks: just piece it together, plug it in, and begin crunching data.
With this in mind, here are the top five considerations when evaluating an on-premises or cloud platform for your next deployment.
Hardware architecture matters as much as software architecture
One of the growing challenges for any compute-intensive deployment has come with the rise of cloud computing. Operating in the cloud is a good thing. Operating HPC or Big Data systems in the cloud is proving to be problematic. For one, cloud providers in many instances consider the hardware component of your system to be already handled. The common position is that you no longer have to worry about your hardware since you’re outsourcing it. And while it’s one thing to migrate your email or ERP system to the cloud, migrating your AI or Machine Learning work is a very different proposition.
The nature of intensive computing is that you’re working with not only massive amounts of data but with a complexity that requires a greater capacity for speed, volume, data integration, and compute power. It means that where “normal” computations take minutes, your computations may take hours or even days to complete. As much as software continues to evolve to handle these issues, without the appropriate hardware architecture, the software won’t run efficiently or simply won’t run.
Deploying systems with HPC, Big Data, or other extreme-scale capabilities is very much like auto racing. If you upgrade your engine to IndyCar specifications, you also need to put that supercharged engine into a frame and body that can handle its power and its needs. In this analogy, your engine is the software you want to use, and the hardware solution is the body, frame, and accessories that house that engine. Of course, the photo to the right may be too dramatic, but I hope it makes my point. An IndyCar engine can easily overpower a normal car frame. IndyCars need the right tires, the right aerodynamics, and the right shocks and steering to handle the speed and to sustain it. Likewise, your HPC hardware solution has to accommodate a similar number of details and considerations, some of which a cloud deployment may miss or simply not address.
Integration is where the vast majority of challenges occur
Two points here. First, I can’t count the number of times over the past two decades we’ve been asked to address installed HPC systems that don’t work. We’ve been called in by big manufacturers and by customers. In some cases, the issues were related to integrating the new system with the customer’s existing platform. Most often, though, not enough attention was paid before delivery to testing the software on the designed system: testing it with the apps running AI, and testing to ensure the new system works with the existing platform.
Second point: we commissioned independent market research late last year that asked HPC and Big Data practitioners to rate and rank their 1) experience with on-prem and cloud deployments, 2) the importance of hardware architecture and integration in designing and executing their data management strategies, and 3) how important it is to include a hardware architect on their systems design team. The results came from more than 500 practitioners across the country and we were surprised by some of the findings.
What wasn’t surprising is that integration was their top concern when asked to rate the challenges of installing a new HPC or Big Data server or cluster.
Deciding to migrate to the cloud or deploy on-premises is both a technical and a business decision
This is a good time to remember that we’re addressing AI, HPC, or Big Data, rather than standard computing. Cloud providers have been making their business case for a long time and in cases where companies are looking to minimize costs for operational software, the cloud is a good choice. For HPC and Big Data deployments, the business case is not so clear. To begin with, we again learned a lot from the market research study we commissioned in September 2018.
Results uncovered several stories of data center managers tasked with recommending a deployment strategy for their HPC and Big Data initiatives. Their findings indicated that the volume of core hours required to run their HPC jobs led to a wide difference in operating expenses between cloud and on-premises deployments. In one case, they calculated they would break even in just over a year, the point at which the capital expense of purchasing their HPC clusters would be overtaken by the ongoing operating expenses of a cloud deployment. Still, other aspects of the business decision include accessibility, performance, and security. Depending on your workflow, access can affect efficiency, both in terms of when you need compute time and how reliable your internet connection is. Where cybersecurity is concerned, some in the public sector, as well as those dealing with sensitive or mission-critical data, reported a reluctance to migrate to the cloud. One practitioner stated, “We deal with large scale public data. Public perception of our commitment to security would fall under much more scrutiny if we abdicated responsibility for our data to a cloud provider.”
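To make the break-even idea concrete, here is a minimal sketch of the arithmetic in Python. Every figure in it is a hypothetical placeholder rather than a number from the study, and the function name and parameters are my own. It also assumes a flat cloud rate per core hour and a steady monthly workload, which real deployments rarely have.

```python
import math


def break_even_months(capex, onprem_monthly_opex,
                      core_hours_per_month, cloud_rate_per_core_hour):
    """Return the first month in which cumulative cloud spend meets or
    exceeds cumulative on-premises spend, or None if it never does.

    All inputs are illustrative assumptions: a one-time purchase cost,
    a flat monthly on-premises operating cost, a steady workload, and a
    flat cloud price per core hour.
    """
    cloud_monthly = core_hours_per_month * cloud_rate_per_core_hour
    monthly_difference = cloud_monthly - onprem_monthly_opex
    if monthly_difference <= 0:
        # Cloud is no more expensive per month, so the purchase never pays off.
        return None
    return math.ceil(capex / monthly_difference)


# Hypothetical example values, not data from the market research.
months = break_even_months(
    capex=500_000,                  # cluster purchase price
    onprem_monthly_opex=10_000,     # power, cooling, staff, support
    core_hours_per_month=1_000_000,
    cloud_rate_per_core_hour=0.05,
)
print(months)
```

With these placeholder inputs the cloud bill runs well ahead of on-premises operating costs each month, so the purchase pays for itself in a little over a year. With a much lighter workload the function returns None, meaning the cloud stays cheaper under these assumptions.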
As for technical considerations, there are more than a few, but what stands out to me are the points I made above: HPC and Big Data systems (running AI, machine learning, or other compute-intensive applications) are complicated systems that, more often than not, are unable to “run themselves” the way a lot of operational software does. At the heart of a lot of HPC work are proprietary algorithms and other software that require consistent attention to run in an optimal environment. In fact, one of the questions in the market research asked practitioners to rate types of analytics work in terms of their preference for cloud or on-premises.
Focusing on your outcome requirements is the first and most important consideration
When considering any HPC or Big Data deployment, in the cloud or on-prem, it’s critical that you engage partners who first want to learn as much as they can about your business. What are you trying to achieve? What does success look like? They should also ask more technical questions, such as what software platform and applications you’re running or considering. And they should be interested in understanding your timeline and circumstances for your deliverables. Does your work require fast turnaround times for results? What should your results look like: simple dashboards or complex visualization?
These and other exploratory questions are critical to designing and building a system optimized for your work. The operative phrase here is: there is no such thing as HPC or Big Data…only YOUR HPC or Big Data.
Understanding the distinct phases of your data’s workflow can make all the difference
Data workflow is important to understand when designing and deploying an HPC or Big Data system. While data lifecycles refer to the stages of data within your organization, data workflow refers to three main phases that should be understood by anyone working with you on your HPC or Big Data deployment.
The three phases are: Capture and Store, Computation, and Visualization. Each of these phases brings questions that impact hardware architecture, software integration, and compute power needs.
Questions cover a range of considerations. How many data sources do you need to capture, and what work needs to be done to get them ready for your analytics work? When you want to run a job, how do you need to access your stored data? How complex are your computational requirements? Do you work with visual or text-based data? There are more issues to address, but I hope you get the idea.
Answers to these questions have a direct impact on what’s required to optimize a system for your needs.
In the cloud or in a fog?
Cloud computing has been and is an essential and useful business resource for a lot of what I’ll call operational software. The vast majority of Microsoft’s cloud business, for example, is with their Office 365 offerings – which we use here at PSSC Labs. We love the cloud for what it does well – simplifies our life and creates great efficiencies for a lot of our daily computing needs.
What isn’t as clear is how or whether the cloud provides these benefits for HPC, Big Data, or other compute-intensive applications. It may, but it’s well worth your while to consider the 5 topics above before making a decision on your next HPC or Big Data deployment. In many cases, simplifying and creating efficiency – and delivering your best work – requires expertise across the spectrum of both the software and hardware components required to reach your destination.
—– —– —– —–
Alex Lesser can be reached at firstname.lastname@example.org or on LinkedIn.