WITH BIGSTREAM TECHNOLOGY
After holding for 50 years, Moore’s Law is coming to an end. The law that predicted a doubling in processor compute power approximately every 18 months has ceased to hold largely because of fundamental, physics-based reasons.
At the same time, emerging big data technologies such as real-time and predictive analytics, machine learning, deep learning, natural language processing and artificial intelligence are becoming indispensable to tech-forward industries such as digital media, healthcare, financial services and telecommunications. With their input data needs ever-growing in size, these new big data workloads have inspired a new generation of tools such as Apache Hive, Apache Spark, and TensorFlow that are pushing advanced analytics into the mainstream.
Large clustered systems have been employed to address these large computations, but cluster scaling alone has its limitations. Analysts and infrastructure providers alike have turned to hardware accelerators such as GPUs and FPGAs to provide the needed performance. However, this has had limited success because of two conditions: high complexity and a lack of portability. These conditions lead to higher costs due to specialized support and skills requirements, additional programming time, increased test complexity, and the risk of operational instability. See Figure 1.
FIGURE 1. COMPLEXITY AND LACK OF PORTABILITY LIMIT ADOPTION OF HARDWARE ACCELERATION
Bigstream Solutions has pioneered an advanced data-flow computational architecture to solve the problem of providing performance with a minimum of complexity. The Bigstream Hyper-acceleration Layer™ (HaL) enables scaling of the analytics engines and accelerates time-to-insight for businesses. It provides orders of magnitude improvements in performance and scalability, while the architectural model results in greater resiliency and predictability of these new applications. Most critically, it can be leveraged with zero code change. Bigstream HaL can accelerate a variety of execution engines, with Apache Spark being the first supported platform. Others will follow.
FIGURE 2. BIGSTREAM HYPER-ACCELERATION LAYER REDUCES COMPLEXITY AND REQUIRES NO CODE CHANGES
The high-level Bigstream HaL architecture is shown in Figure 2 (compare to Figure 1). Bigstream HaL addresses the two issues with current acceleration described above because it is transparent from a user perspective and agnostic from a hardware adaptation perspective. That is, it presents the user with an unchanged programming/scripting interface, while automatically adapting to the underlying hardware to extract high performance.
Bigstream hyper-acceleration is achieved by translating the dataflow representation of a computation, which is specific to a platform such as Spark, into an optimized, platform-independent dataflow. Bigstream HaL is composed of two main approaches to hyper-acceleration: software acceleration and hardware acceleration.
Both are appropriate for cloud and on-premises deployment, and are transparent to end users. The first platform targeted by the technology is Spark. The additional big data technologies shown in Figure 2 are all being considered for the Bigstream product roadmap.
We have developed and benchmarked this technology. The specific technologies tested are Spark SQL and DataFrames, which are widely used in the data analytics community. In this section, we describe the evaluation setup and results. The benchmarks use techniques that apply to a wide set of currently relevant analytics areas. The results show that Bigstream HaL delivers best-of-breed performance while maintaining the simplicity of the existing Spark and Spark SQL interfaces. The latter aspect suggests it can readily be deployed in a production environment.
Deployment of Kafka data streaming in conjunction with Spark Streaming is a very important use case for a number of big data application areas. These include online or near-real-time applications such as at-watch network security, digital media bidding/auctioning, financial decision systems and many more. The basic setup of such a data processing system is shown below.
FIGURE 3. BIGSTREAM APPLIED TO A SPARK STREAMING + KAFKA APPLICATION ARCHITECTURE
Figure 3 is a schematic of our Spark Streaming/Kafka test setup, where a set of Kafka servers provides streaming data to Spark for processing. For efficient data consumption, the key parameter is the frequency with which Spark requests data from the Kafka servers. Data is requested and delivered to Spark in chunks called micro-batches. Micro-batches are then processed by executors as jobs composed of RDD operations.
For real-time or online operation, it is important that the processing time for each micro-batch be less than the interval between requests, so that each micro-batch is finished before the next one arrives. If this relationship is violated, the backlog of unprocessed data grows without bound, which is an untenable situation.
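The backlog condition can be illustrated with a small simulation. This is a minimal sketch, not part of the Spark, Kafka, or Bigstream APIs: the function name `pending_batches` and all timings are hypothetical, and it assumes that one micro-batch arrives at a fixed interval and that micro-batches are processed one at a time.

```python
def pending_batches(num_batches, batch_interval_s, processing_time_s):
    """Count micro-batches not yet finished when the last one arrives.

    Hypothetical model: batch i arrives at i * batch_interval_s, batches are
    processed sequentially, and each takes processing_time_s seconds.
    The count includes the batch that has just arrived.
    """
    free_at = 0.0          # time at which the processor is next idle
    finish_times = []
    for i in range(num_batches):
        arrival = i * batch_interval_s
        start = max(arrival, free_at)      # wait if the processor is busy
        free_at = start + processing_time_s
        finish_times.append(free_at)
    last_arrival = (num_batches - 1) * batch_interval_s
    return sum(1 for t in finish_times if t > last_arrival)


# Processing faster than the arrival interval: the backlog stays constant.
keeping_up = pending_batches(100, batch_interval_s=10, processing_time_s=5)

# Processing slower than the arrival interval: the backlog grows with time.
falling_behind_100 = pending_batches(100, batch_interval_s=10, processing_time_s=20)
falling_behind_200 = pending_batches(200, batch_interval_s=10, processing_time_s=20)

print(keeping_up, falling_behind_100, falling_behind_200)
```

In this model, doubling the observation window roughly doubles the backlog in the slow case, which is why reducing per-batch processing time matters for online operation.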
A POC partner provided a Spark Streaming application for the setup above to compare standard Spark against Bigstream HaL-accelerated Spark. We emphasize that all three measurements (unaccelerated Spark, Bigstream software acceleration, and Bigstream FPGA acceleration) were performed with no change to the user-level processing code and were run using the same Kafka broker configuration. In these tests, micro-batches of JSON data were almost 1 GB each in size.
CONFIGURATION                      BIGSTREAM SPEEDUP OVER SPARK
Spark (no acceleration)            1x (baseline)
Bigstream software acceleration    3x
Bigstream FPGA acceleration        13x
TABLE 1: BIGSTREAM HYPER-ACCELERATION RESULTS OVER SPARK
Table 1 shows the results for micro-batch processing times. Bigstream HaL shows a 3x and 13x speedup for software and hardware acceleration, respectively. This implies a potential for 3x and 13x improvement for online processing throughput. We expect that this level of performance improvement will widen the scope of problems that can be solved using a given configuration. Viewed another way, it enables cost savings by producing equal throughput for a smaller configuration.
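The two readings of the speedup (more throughput on the same configuration, or equal throughput on a smaller one) come down to simple arithmetic. The sketch below is illustrative only: the function names, the 12-node cluster, and the baseline rate are hypothetical, and it assumes that throughput scales linearly with node count and that every node sees the same speedup, which the measurements above do not by themselves establish.

```python
import math


def nodes_for_equal_throughput(baseline_nodes: int, speedup: float) -> int:
    """Nodes needed to match the baseline cluster's throughput.

    Assumption (not from the benchmark): throughput scales linearly with
    node count, and every node sees the same per-node speedup.
    """
    if baseline_nodes < 1 or speedup <= 0:
        raise ValueError("baseline_nodes must be >= 1 and speedup must be > 0")
    return math.ceil(baseline_nodes / speedup)


def throughput_at_fixed_size(baseline_gb_per_s: float, speedup: float) -> float:
    """Throughput of the same cluster once each micro-batch runs faster."""
    return baseline_gb_per_s * speedup


BASELINE_NODES = 12        # hypothetical cluster size
BASELINE_GB_PER_S = 0.5    # hypothetical baseline processing rate

for label, speedup in [("software", 3.0), ("FPGA", 13.0)]:
    nodes = nodes_for_equal_throughput(BASELINE_NODES, speedup)
    rate = throughput_at_fixed_size(BASELINE_GB_PER_S, speedup)
    print(f"{label}: {nodes} nodes for equal throughput, "
          f"or {rate} GB/s on {BASELINE_NODES} nodes")
```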
The TPC-DS benchmarks are well known for testing the performance of decision support solutions, e.g., big data systems. They are a set of SQL queries, along with associated test data generation tools, designed to evaluate the performance of big data systems that support SQL. For these tests, the queries were used without any changes.
The dataset used is the standard TPC-DS dataset with a size of 2 GB. The benchmarks are composed of standard, common SQL operations such as SELECT, WHERE (i.e., filters), GROUP BY, ORDER BY, LIMIT, and implicit JOIN operations expressed through multi-table SELECT ... WHERE clauses. We selected a set of 26 queries that cover a large set of SQL operations. Please see the TPC website for more details on the benchmarks.
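The operation types listed above can be seen together in a single query. The sketch below is not a TPC-DS query and does not use Spark: it runs on Python's built-in sqlite3 module against two tiny tables whose names, columns, and rows are invented purely for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables (not the TPC-DS schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sales (item_id INTEGER, quantity INTEGER, price REAL);
    INSERT INTO item VALUES (1, 'Books'), (2, 'Music'), (3, 'Books');
    INSERT INTO sales VALUES (1, 2, 10.0), (2, 1, 5.0), (3, 4, 2.5), (1, 1, 10.0);
""")

query = """
    SELECT i.category, SUM(s.quantity * s.price) AS revenue
    FROM sales s, item i              -- multi-table SELECT
    WHERE s.item_id = i.item_id       -- implicit JOIN condition
      AND s.quantity >= 1             -- filter
    GROUP BY i.category
    ORDER BY revenue DESC
    LIMIT 10
"""

rows = conn.execute(query).fetchall()
for category, revenue in rows:
    print(category, revenue)
conn.close()
```

The comma-separated FROM clause combined with an equality predicate in WHERE is what the text calls an implicit JOIN; a SQL engine is free to plan it exactly as it would an explicit JOIN ... ON.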
FIGURE 4. RESULTS FOR THE TPC-DS BENCHMARKS WITH BIGSTREAM FPGA ACCELERATION
The accelerated tests were run on an Intel-based Micron AC-520 FPGA. Figure 4 shows the speedup over running the same set of queries on a late-generation Xeon 6 CPU alone. The figure shows that Bigstream acceleration provides an average of 4x speedup for the 26 queries tested. The speedup is dependent on the mix of operators used by a particular benchmark, and went as high as 9.5x. As with the streaming results above, this performance improvement can be leveraged in either of two ways: to solve larger problems on a given configuration, or to reduce cost by producing equal throughput on a smaller configuration.
Amazon is now offering FPGA-powered “F” instances, and we are in the process of testing Bigstream hardware acceleration on these instances. In addition to providing acceleration, the product can be viewed as a usability vehicle. Using Bigstream, data scientists can now run their domain-specific applications on an FPGA instance such as “F” with zero code change.
This section presents a technical overview of Bigstream HaL and the role of its components. We focus on its relationship to the standard Spark architecture and how it enables acceleration transparently.
FIGURE 5: BASELINE SPARK ARCHITECTURE
Figure 5 shows the basic components of standard Spark using YARN for resource management. The Spark components and associated roles are as follows:
Spark Driver – Runs the client application and communicates with the Master to install the application to be run and the configurations for the cluster. The configurations include the number of Master and Core nodes as well as the memory size chosen for each.
Spark Master – Instantiates the Spark Executors, also known as the Core nodes. The Master must communicate with the Resource Manager with requests for resources as per the application needs. The Resource Manager system, in turn, allocates resources for Executor creation. The Master creates the stages of the application and distributes tasks to the Executors.
Spark Executor – Runs individual Spark tasks, reporting back to the Master when stages are completed. The computation proceeds in stages, generating parallelism among the Executor nodes. The faster the Executors can execute their individual task sets, the faster stages can finish, and therefore the faster the application finishes. In standard Spark, tasks are created as Java bytecode at runtime and downloaded to the Executors for execution.
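The link between Executor speed and application time can be sketched with a toy model. This is a simplification introduced here for illustration, not a description of the Spark scheduler: it assumes stages run one after another, each Executor runs its assigned tasks sequentially, and a stage ends when its slowest Executor finishes. The function names and task durations are hypothetical.

```python
def stage_time(task_times_per_executor):
    """A stage finishes when the slowest Executor finishes its task set."""
    return max(sum(tasks) for tasks in task_times_per_executor)


def application_time(stages):
    """Assume stages run strictly one after another."""
    return sum(stage_time(stage) for stage in stages)


# Two stages, two Executors each; numbers are task durations in seconds.
stages = [
    [[2.0, 2.0], [3.0]],      # stage 0: Executor A runs two tasks, B runs one
    [[1.0], [4.0, 1.0]],      # stage 1
]

baseline = application_time(stages)

# Halving every task duration (a uniform 2x per-task speedup) halves the total.
faster = application_time(
    [[[t / 2 for t in tasks] for tasks in stage] for stage in stages]
)
print(baseline, faster)
```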
FIGURE 6: BIGSTREAM HYPER-ACCELERATION ARCHITECTURE
Figure 6 above shows Spark architecture with Bigstream HaL acceleration integrated. Note that this illustration applies equally to software and hardware (many-core, GPU and FPGA) acceleration. The red arrows and red outlined items indicate HaL components that are added at bootstrap time and can then provide acceleration throughout the course of multiple application executions. The Client Application, Driver, Resource Manager components, and the structure of the Master and Executors all remain unchanged. Bigstream HaL does not require changes to anything in the system related to fault tolerance, storage management and resource management. It has been carefully designed only to provide an alternative execution substrate at a node level that is transparent to the rest of Spark. We describe the functions and interfaces of the components:
Spark Master – Generates the physical plan exactly as in standard Spark through the execution of the Catalyst optimizer. Note that the standard bytecode for Spark tasks is generated by the Master as normal.
Bigstream Runtime – The Bigstream runtime is a set of natively compiled C++ modules and their associated APIs that implement accelerated versions of Spark operations.
Streaming Compiler – The Bigstream Gorilla++ Streaming Compiler examines the physical plan and inspects/evaluates individual stages for potential optimized execution. Details of the evaluation are omitted here, but the output of the process is a set of calls into the Bigstream Runtime API, implementing each stage in the plan if deemed possible.
Spark Executor – Via a hook inserted at cluster bootstrap time, all Executors possess a pre-execution check that determines whether a stage has been accelerated. If so, the associated compiled module is called. Otherwise, the standard Java bytecode version is executed. It is important to note that this check is transparent to the programmer; she is unaware of whether a stage is running accelerated, except for the difference in performance. Thus, stages are accelerated optimistically, defaulting to being run as in standard Spark. As can be seen from this description, users of Bigstream HaL are presented with an interface identical to that of standard Spark. This also allows Bigstream HaL to be updated incrementally as features become available, making it easily extensible.
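The optimistic acceleration pattern described in the components above (compile what can be accelerated, check before execution, otherwise fall back) can be sketched as follows. This is a conceptual illustration only and not Bigstream or Spark code: `RUNTIME`, `compile_plan`, `run_stage`, the operator names, and the plan format are all hypothetical stand-ins, and plain Python functions take the place of natively compiled modules.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[int]
Op = Callable[[Rows], Rows]

# Hypothetical stand-in for the native runtime: operator name -> implementation.
RUNTIME: Dict[str, Op] = {
    "filter_even": lambda rows: [r for r in rows if r % 2 == 0],
    "double": lambda rows: [r * 2 for r in rows],
}


def compile_plan(plan: Dict[str, List[str]]) -> Dict[str, Op]:
    """Accelerate only the stages whose operators are all supported."""
    accelerated: Dict[str, Op] = {}
    for stage_id, ops in plan.items():
        if all(op in RUNTIME for op in ops):
            # Bind ops as a default argument so each stage keeps its own list.
            def run(rows: Rows, ops: Tuple[str, ...] = tuple(ops)) -> Rows:
                for op in ops:
                    rows = RUNTIME[op](rows)
                return rows
            accelerated[stage_id] = run
    return accelerated


def run_stage(stage_id: str, rows: Rows, standard_impl: Op,
              accelerated: Dict[str, Op]) -> Tuple[Rows, str]:
    """Pre-execution check: use the accelerated module if one exists."""
    impl = accelerated.get(stage_id)
    if impl is not None:
        return impl(rows), "accelerated"
    return standard_impl(rows), "standard"   # default path, as in standard Spark


plan = {"stage0": ["filter_even", "double"], "stage1": ["sort_desc"]}
accelerated = compile_plan(plan)

result0, path0 = run_stage(
    "stage0", [1, 2, 3, 4],
    lambda rows: [r * 2 for r in rows if r % 2 == 0], accelerated)
result1, path1 = run_stage(
    "stage1", result0,
    lambda rows: sorted(rows, reverse=True), accelerated)
print(path0, result0, path1, result1)
```

The caller passes the same arguments whichever path is taken, which mirrors the point that the check is invisible to user code; adding an entry to the hypothetical `RUNTIME` table is all it takes for more stages to become eligible.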
We are offering the Bigstream acceleration layer to solve real-world big data problems. Engaged organizations can expect state-of-the-art acceleration technology and the expertise to realize significant, measurable ROI and substantial performance gains. Working with Bigstream will result in successful accelerated production deployments, as well as a better understanding of computing workloads. The optimal performance architecture will make big data and advanced analytics part of your competitive edge in any business area.