Bringing Hardware Acceleration to Data Science Without Touching Your Code
On March 14 of this year, Bigstream announced the availability of the Bigstream Hyper-Acceleration Layer on Amazon EMR and for on-prem use. After years of research and almost two years of stealth development, the introduction of Bigstream hyper-acceleration for Apache Spark, Spark SQL/DataFrames and other big data technologies adds a powerful, frictionless way to maximize the performance and economics of big data infrastructure.
But what is hyper-acceleration? What does it mean for data scientists, data engineers, DevOps teams and application architects? We sat down with Bigstream CEO Maysam Lavasani to get his perspective on hyper-acceleration, Apache Spark and the company that he co-founded.
There is a concept in the semiconductor industry called Dennard scaling. Dennard scaling is about reducing voltage at each technology node, keeping power roughly constant even though the number of transistors doubles with each node. It is related to Moore’s Law, the observation that the number of transistors on a chip keeps doubling from one generation to the next. Together, the two meant that by shrinking transistors and increasing the clock frequency, you kept getting free performance just by moving to the next generation of processors.
However, the limits of physics at some point restrict this improvement. That is why people in the business of building microprocessors started thinking that we need to look at alternatives. One of the most promising areas was nontraditional microarchitectures, what today we call accelerators. That is how, for example, GPUs came to life to accelerate specific applications like graphics processing, where conventional microarchitectures are not the best choice. Their designers came up with other types of microarchitectures, one example being SIMT (Single Instruction, Multiple Threads).
The problem with this disruption in Dennard scaling (or what some people call the End of Moore’s Law) is that it is becoming more and more serious as big data, machine learning, and AI gain in popularity. As a result, we’ve seen more chip industry investment in non-conventional architectures and in acceleration technologies. For instance, we have seen Intel acquire Altera, and statements from the Intel CEO that by 2020, 30% of Intel processors in data centers will have FPGAs.
Microsoft’s Project Catapult has ensured that almost every Azure server has an FPGA in it for networking, security acceleration, and machine learning. Other big companies are also building in-house solutions for acceleration. Google went ahead and built the TPU (Tensor Processing Unit). And of course, NVIDIA is flying high on a wave of increasing use of GPUs.
It has become clear that while cloud giants like Google, Microsoft and Facebook can apply various tactical solutions to accelerating different stages of the computation pipeline, there is no holistic, affordable approach to solve this problem for Fortune 1000 companies. The computing workloads that are perhaps most affected by this disruption of Moore’s Law are big data analytics and machine learning applications. These kinds of applications are performance critical, meaning that they need a lot of compute, and when you begin to experience processing bottlenecks at the CPU, these are the applications that get hurt the most.
If you look at these workloads and the industries that depend on them (especially AdTech, FinTech, Media, Healthcare, and Security), you will see the big data and machine learning ecosystems are mostly built on open source software (OSS) projects. A lot of open source software projects are driven by startups like Elasticsearch and Databricks, or cloud-scale companies like Facebook, Google, and LinkedIn. But the growing number and variety of big data platforms means that it is difficult to apply a cross-platform acceleration technology without great effort and expense. This creates a situation where numerous approaches are applied independently, creating additional complexity and unpredictability in the processing pipeline.
So we built Bigstream to solve these problems. We wanted to have a platform-independent, seamless acceleration substrate that can utilize the new generation of specialized hardware accelerators. At the same time, we wanted zero disruption for users of these big data and machine learning platforms. And that is the really hard part.
Getting performance out of specialized hardware without impacting developers and DevOps is a complex and difficult job. It is even harder to keep the same level of abstraction without introducing yet another collection of APIs that software developers have to worry about. We created Bigstream with the goal of providing “push button” hyper-acceleration: no additional training required, no extra programming to worry about. Just use the big data tools you have been using, and take advantage of Bigstream Hyper-acceleration to get the performance you need.
Hyper-acceleration is a technology that enables big data and machine learning applications to automatically utilize the power of unconventional hardware (e.g. GPUs, FPGAs) as well as software optimizations for many-core CPUs. The Bigstream Hyper-acceleration Layer (HaL) functions as a runtime system that sits between a software platform (such as Apache Spark or TensorFlow) and the underlying hardware to slice and distribute the computation between traditional CPU cores and different accelerator resources like FPGAs and GPUs.
There are a lot of approaches to accelerating some stages of a big data workload, but our approach is transparent to end users – in this case, data scientists, BI teams and application developers.
There is always a trade-off between the developer productivity benefits of generality and abstraction, and getting the best performance out of specialization and customization. As soon as you build a general, abstracted framework, whether in hardware, software, or a mix, you pay some cost. In other words, there is always some overhead associated with generalization and abstraction.
Take, for example, memory management. Generally speaking, you get better performance if you do it yourself, but the price you pay is the added complexity and the increased risk of making an error. On the other hand, if you want to get the best performance out of either hardware or software, that typically comes from a lot of fine-tuning, specialization, and customization. However, these customizations may not be portable or supportable across platforms.
But what we noticed is that for big data and especially machine learning, there are some common patterns in these customizations, namely data flow patterns, that can be generalized. So, when you put your applications in a certain template, what we call a dataflow template, you can generalize many types of optimizations and even the specializations at the application level.
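To make the idea of a dataflow template concrete, here is a minimal Python sketch. It is an illustration only, not Bigstream's implementation: the `Stage` class and the `run_naive` and `run_fused` functions are hypothetical names invented for this example. The point is that once an application is written as a declarative list of stages, a runtime can apply a generic optimization (here, fusing all stages into a single pass) without the application code changing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Stage:
    """One step of a (hypothetical) dataflow template."""
    kind: str                      # "map" or "filter"
    fn: Callable[[Any], Any]       # user-supplied function for this step


def run_naive(stages: List[Stage], data: Iterable[Any]) -> List[Any]:
    """Run one stage at a time, building a full intermediate list per stage."""
    rows = list(data)
    for stage in stages:
        if stage.kind == "map":
            rows = [stage.fn(r) for r in rows]
        elif stage.kind == "filter":
            rows = [r for r in rows if stage.fn(r)]
        else:
            raise ValueError(f"unknown stage kind: {stage.kind}")
    return rows


def run_fused(stages: List[Stage], data: Iterable[Any]) -> List[Any]:
    """Generic optimization: push each row through all stages in one pass."""
    out = []
    for row in data:
        keep = True
        for stage in stages:
            if stage.kind == "map":
                row = stage.fn(row)
            elif stage.kind == "filter":
                if not stage.fn(row):
                    keep = False
                    break          # row is dropped, skip remaining stages
            else:
                raise ValueError(f"unknown stage kind: {stage.kind}")
        if keep:
            out.append(row)
    return out


# The "application" is just a description of the data flow.
pipeline = [
    Stage("map", lambda x: x * 2),
    Stage("filter", lambda x: x % 3 == 0),
    Stage("map", lambda x: x + 1),
]

print(run_fused(pipeline, range(10)))  # [1, 7, 13, 19]
```

Both executors produce the same result for the same pipeline. The fused version avoids the intermediate lists, and the user who wrote `pipeline` never had to know which executor ran it.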
That’s the key challenge. There is always the trade-off between these two things: getting the best performance versus making things general. The solution at the very top level is to discover the common patterns in these applications, meaning the data flows and computational models. We have found that whatever customizations and specializations an engineer is doing can, in fact, be automated by tools, compilers and runtime systems.
So, this type of optimization is nothing new. Engineers are always looking to create generic templates for a specific computation model and then optimize it to get the best performance. This is done everywhere in the technology stack.
The database community has been looking at optimizing dataflow models, or what they call query plans, for a long time. If you look at the compiler industry, they’re looking at a very similar concept: control/dataflow graphs. All modern compilers extract a control/dataflow graph of the program and optimize it. Even modern microprocessors, at the hardware level, look at dataflow graphs at run time and try to optimize the scheduling of instructions.
But what we noticed is that if you want new accelerators to be first-class citizens in data center infrastructures, they need an easy-to-use optimization and customization process. So, there are problems like deciding what part of the computation/data flow should execute on the general-purpose processor (CPU) and which part should execute on the accelerator, the specialized hardware. This is something that we spend a lot of time thinking about as we work to extend Bigstream’s Hyper-acceleration technology.
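The CPU-versus-accelerator decision can be illustrated with a deliberately simple cost model. The Python sketch below is an assumption-laden toy, not how Bigstream makes this decision: the `Operator` class, the `place` and `total_ms` functions, and every number in it are made up for illustration. Each operator is offloaded only when its estimated accelerator time plus data transfer time beats its estimated CPU time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Operator:
    """One node of a dataflow with hypothetical cost estimates (milliseconds)."""
    name: str
    cpu_ms: float        # estimated run time on the CPU
    accel_ms: float      # estimated run time on the accelerator
    transfer_ms: float   # estimated cost of moving data to and from the accelerator


def place(ops: List[Operator]) -> Dict[str, str]:
    """Greedy per-operator placement: offload only if it is a net win."""
    placement = {}
    for op in ops:
        offload_cost = op.accel_ms + op.transfer_ms
        placement[op.name] = "accelerator" if offload_cost < op.cpu_ms else "cpu"
    return placement


def total_ms(ops: List[Operator], placement: Dict[str, str]) -> float:
    """Total estimated time for a given placement, assuming sequential execution."""
    total = 0.0
    for op in ops:
        if placement[op.name] == "accelerator":
            total += op.accel_ms + op.transfer_ms
        else:
            total += op.cpu_ms
    return total


# Made-up estimates for a three-operator data flow.
ops = [
    Operator("parse_csv", cpu_ms=120.0, accel_ms=20.0, transfer_ms=30.0),
    Operator("filter", cpu_ms=10.0, accel_ms=2.0, transfer_ms=30.0),
    Operator("aggregate", cpu_ms=80.0, accel_ms=15.0, transfer_ms=30.0),
]

placement = place(ops)
all_cpu = {op.name: "cpu" for op in ops}
print(placement)
print(total_ms(ops, placement), "ms vs", total_ms(ops, all_cpu), "ms on CPU only")
```

Note how the cheap `filter` stays on the CPU even though the accelerator runs it faster, because the transfer cost dominates. A real runtime would have to go further than this greedy model, for example by recognizing that adjacent offloaded operators can share a single transfer.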
In Part 2 of the interview, we will talk about some of the fundamentals of software-based performance acceleration and how it applies to Apache Spark and Spark SQL.