After holding for roughly 50 years, Moore’s Law is coming to an end. The law, which predicted a doubling in processor transistor count, and hence compute power, roughly every 18 to 24 months, has ceased to hold, largely for fundamental, physics-based reasons.
At the same time, enterprises are adopting big data and machine learning as a means of creating a competitive advantage. With input data sizes growing continually, these workloads have inspired a new generation of tools, such as Apache Hive, Apache Spark, and TensorFlow, that are pushing advanced analytics into the mainstream.
Large clustered systems have been employed to address these large computations, but cluster scaling alone is limited in the performance it can provide. Scale-up and scale-out strategies can work effectively for smaller workloads, but they run into diminishing returns as cluster sizes (scale out) or server capabilities (scale up) grow larger.
Hardware accelerators such as GPUs and FPGAs offer a way to deliver high performance and, in fact, to enhance the gains of scaling as well. To date, however, acceleration has had limited success due to a key gap, illustrated in Figure 1.
Today, there is no automated way for big data platforms such as Spark to leverage advanced field-programmable hardware. Consequently, data scientists, analysts and quants must work with performance engineers to fill the programming model gap illustrated in Figure 1. Though feasible, this process is typically inefficient and time-consuming.
The gap stems from the fact that data scientists, developers and quants are accustomed to programming big data platforms in a high-level language. Performance engineers, on the other hand, focus on programming at a low level, including field-programmable hardware. The scarcity of performance engineering resources, along with the additional implementation time, can therefore significantly lengthen the time to value of analytics when accelerating. In addition, the resulting solutions are typically difficult to change or update as analytics evolve.
Bigstream has developed technology to address this gap. The architecture is illustrated in Figure 2.
At a high level, Bigstream Hyper-acceleration automates the process of acceleration for users of big data platforms. It comprises compiler technology for both software acceleration via native C++ and FPGA acceleration via bitfile templates. As shown in Figure 2, this technology yields end-to-end performance gains of 2x to 30x for analytics, with zero code change.
The rest of this paper discusses performance results, use cases and technical details of Bigstream Hyper-acceleration.
The most common method of increasing cluster performance when data needs grow is scaling. Scale up increases the capability of cluster nodes, keeping their number the same. Scale out increases the number of nodes in the cluster, keeping their type the same. Mixed approaches that apply both scale up and scale out also exist.
Figure 3 illustrates the two approaches to scaling. In this example, both approaches increase the number of virtual CPUs (vCPUs) as the cluster scales, increasing the compute power. It is also possible to scale in other ways, such as network connections, memory, disk and other resources. In both scale up and scale out, however, the idea is to increase performance by adding resources.
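The two scaling approaches can be sketched in a few lines of Python. This is a minimal illustration only: the `Cluster` type, the `scale_up` and `scale_out` helpers, and the node sizes are hypothetical names and values chosen for the example, not part of any Bigstream or Spark API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cluster:
    """Hypothetical cluster description: identical nodes, each with some vCPUs."""
    nodes: int
    vcpus_per_node: int

    @property
    def total_vcpus(self) -> int:
        return self.nodes * self.vcpus_per_node


def scale_up(cluster: Cluster, factor: int) -> Cluster:
    """Keep the node count fixed and make each node more capable."""
    return replace(cluster, vcpus_per_node=cluster.vcpus_per_node * factor)


def scale_out(cluster: Cluster, factor: int) -> Cluster:
    """Keep the node type fixed and add more nodes."""
    return replace(cluster, nodes=cluster.nodes * factor)


# Illustrative sizes only; they are not taken from the paper's experiments.
base = Cluster(nodes=4, vcpus_per_node=4)
up = scale_up(base, 2)
out = scale_out(base, 2)
print(base.total_vcpus, up.total_vcpus, out.total_vcpus)
```

Both paths reach the same total vCPU count here; they differ in whether the added capacity sits inside each node or in additional nodes.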
Scaling, in almost all cases, yields a sub-linear performance increase as resources are added. That is, when the cluster is scaled by a factor of N, the performance increase is almost always less than N. As clusters grow large, these diminishing returns become severe. The technical reasons for this are listed below:
Acceleration such as that provided by Bigstream can improve the outlook for scaling in two ways: (1) it reduces the size of the cluster needed to reach a given performance level, and (2) it reduces the overhead of some of the above factors (e.g., network and I/O), thus reducing their impact.
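One way to build intuition for sub-linear scaling is a simple Amdahl-style model. This is a textbook approximation brought in for illustration, not the analysis behind the paper's figures; the function name and the overhead fractions below are assumptions.

```python
def modeled_speedup(n: float, overhead_fraction: float) -> float:
    """Amdahl-style speedup when resources grow by a factor of n.

    overhead_fraction is the share of baseline runtime that does not
    shrink as resources are added (an assumed, illustrative parameter).
    """
    if n <= 0 or not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("n must be positive and overhead_fraction in [0, 1]")
    return 1.0 / (overhead_fraction + (1.0 - overhead_fraction) / n)


for n in (1, 2, 4, 8, 16):
    high = modeled_speedup(n, 0.10)  # assumed 10% non-scaling overhead
    low = modeled_speedup(n, 0.05)   # assumed 5% after reducing that overhead
    print(f"{n:>2}x resources: {high:5.2f}x vs {low:5.2f}x (ideal {n}x)")
```

In this model, speedup always stays below N whenever the overhead fraction is non-zero, and lowering that fraction moves the curve closer to the ideal line.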
Figure 4 shows experimental results that illustrate the scaling issue, in this example for scale up. We ran two TPC-DS (http://www.tpc.org/tpcds/default.asp) benchmark queries on Amazon EMR using Spark, in various cluster scenarios. Moving from left to right in the figure, each point represents the performance seen with the given number of vCPUs (16, 32, 64, 128, 256). Thus, we scale up the cluster by 2x at each step. Speedup is calculated with respect to the datapoint labeled “Base”. The speedup of both benchmarks falls away from the blue linear line as the cluster scales, likely due to the reasons listed above.
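The speedup calculation described above can be sketched as follows. The sketch assumes that the smallest vCPU count is the "Base" datapoint, and the runtimes are invented placeholders rather than measured values; `speedups_vs_base` is a hypothetical helper name.

```python
def speedups_vs_base(runtimes_by_vcpus: dict[int, float]) -> dict[int, dict[str, float]]:
    """Compare measured speedup with ideal linear speedup.

    Assumption: the smallest vCPU count is treated as the "Base" datapoint.
    """
    base_vcpus = min(runtimes_by_vcpus)
    base_runtime = runtimes_by_vcpus[base_vcpus]
    report = {}
    for vcpus in sorted(runtimes_by_vcpus):
        speedup = base_runtime / runtimes_by_vcpus[vcpus]
        ideal = vcpus / base_vcpus  # linear scaling reference
        report[vcpus] = {
            "speedup": speedup,
            "ideal": ideal,
            "efficiency": speedup / ideal,
        }
    return report


# Placeholder runtimes in seconds, invented for illustration only.
example = {16: 400.0, 32: 250.0, 64: 160.0}
for vcpus, row in speedups_vs_base(example).items():
    print(vcpus, row)
```

An efficiency below 1.0 at a given cluster size corresponds to a point falling beneath the linear line in a plot of this kind.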
Figure 5 repeats the results from Figure 4 (dashed lines) and adds results for clusters using Bigstream software-based acceleration (solid lines). The accelerated curve displays a much gentler falloff with scaling than the Spark curve. In addition, comparing datapoints horizontally, we see that acceleration has the potential to allow a smaller cluster to outperform a larger one.
We see similar results in experiments with scale out. These results indicate that acceleration can work synergistically with scaling to provide maximum performance and a wide variety of performant configuration choices for the user. This, in turn, can result in total cost of ownership (TCO) savings. For cloud users, it enables the use of smaller clusters, or the use of the same cluster for a shorter amount of time, to complete a given analysis. For on-premises clusters, it allows more analyses to be completed per unit of operating time.
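The cloud cost argument comes down to simple arithmetic, sketched below. The node counts, hours and hourly price are hypothetical placeholders, and the sketch deliberately ignores any cost of the acceleration software or hardware itself.

```python
def cluster_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    """Cost of running a cluster: nodes x hours x hourly price per node."""
    return nodes * hours * price_per_node_hour


def savings_fraction(baseline_cost: float, accelerated_cost: float) -> float:
    """Fraction of the baseline cost saved (negative if the new cost is higher)."""
    return 1.0 - accelerated_cost / baseline_cost


# All figures below are hypothetical placeholders, not measured results.
baseline = cluster_cost(nodes=8, hours=2.0, price_per_node_hour=1.0)
same_cluster_shorter = cluster_cost(nodes=8, hours=1.0, price_per_node_hour=1.0)
smaller_cluster = cluster_cost(nodes=4, hours=2.0, price_per_node_hour=1.0)

print(savings_fraction(baseline, same_cluster_shorter))
print(savings_fraction(baseline, smaller_cluster))
```

Under these assumptions, halving either the runtime or the node count halves the bill; a real TCO estimate would also need to include accelerator pricing.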
As stated earlier, hardware-based acceleration has the highest performance potential. Adding an FPGA to a server can be a cost-effective way to speed up big data platforms, provided the added hardware can be easily leveraged. These chips typically cost a fraction of a full CPU-based server.
The performance results in Figure 6 represent a demonstration of Bigstream FPGA-based acceleration running on an FPGA instance. The results were obtained using a commodity FPGA platform with Bigstream Hyper-acceleration software installed. We ran 104 TPC-DS Spark benchmarks on the platform, first CPU-only (baseline) and then using the FPGA (accelerated). Speedup was calculated per benchmark by dividing the baseline runtime by the accelerated runtime.
We observed a maximum speedup of 5x and an average speedup of 3.3x, with zero code change to the benchmarks. As the Bigstream FPGA product evolves, we expect to use both multiple-FPGA configurations and a larger footprint on each chip. Both changes should increase the speedups reported here. This result demonstrates not only the performance advantage that hardware-based acceleration can provide, but also the usability Bigstream can provide for FPGA platforms.
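The per-benchmark calculation described above, together with the maximum and average, can be sketched as follows. The query names and runtimes are invented placeholders, `per_benchmark_speedup` is a hypothetical helper, and the sketch assumes that "average" means the arithmetic mean.

```python
from statistics import mean


def per_benchmark_speedup(
    baseline: dict[str, float], accelerated: dict[str, float]
) -> dict[str, float]:
    """Speedup per benchmark = baseline runtime / accelerated runtime."""
    mismatched = baseline.keys() ^ accelerated.keys()
    if mismatched:
        raise ValueError(f"benchmarks missing from one run: {sorted(mismatched)}")
    return {name: baseline[name] / accelerated[name] for name in baseline}


# Placeholder runtimes in seconds; the query names and values are invented.
baseline_s = {"query_a": 120.0, "query_b": 90.0, "query_c": 60.0}
accelerated_s = {"query_a": 30.0, "query_b": 45.0, "query_c": 20.0}

speedups = per_benchmark_speedup(baseline_s, accelerated_s)
print("max:", max(speedups.values()), "mean:", mean(speedups.values()))
```

The mean of per-benchmark ratios weights every query equally regardless of its runtime; a ratio of total runtimes would give a different summary figure.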
This section presents a technical overview of Bigstream technology as applied to Spark, and the role of its components. We focus on its relationship to the standard Spark architecture and how it enables acceleration transparently.
Figure 7 shows the basic components of standard Spark using YARN for resource management. The Spark components and associated roles are as follows:
Figure 8 above shows the Spark architecture with Bigstream acceleration integrated. Note that this illustration applies equally to software and hardware (many-core, GPU and FPGA) acceleration. The red arrows and red-outlined items indicate HaL components that are added at bootstrap time and can then provide acceleration throughout the course of multiple application executions. The Client Application, Driver and Resource Manager components, and the structure of the Master and Executors, all remain unchanged. Bigstream HaL does not require changes to anything in the system related to fault tolerance, storage management or resource management. It is designed to provide only an alternative execution substrate at the node level, one that is transparent to the rest of Spark. We describe the functions and interfaces of the components:
Bigstream’s current product focuses on acceleration of the Spark platform in the following use cases:
We offer the Bigstream acceleration layer to solve real-world big data problems. Engaged organizations can expect state-of-the-art acceleration technology and the expertise to realize significant, measurable ROI and staggering performance gains. Working with Bigstream will result in successful accelerated production deployments, as well as a better understanding of computing workloads. The optimal performance architecture will make big data and advanced analytics part of your competitive edge in any business area.