Bigstream Benchmark Report Shows 54% Cost Savings on Spark Deployments

June benchmark results show a 2.89X speedup of Spark

By George Demarest, Vice President of Marketing, Bigstream

June 13, 2017

Bigstream is in the acceleration business. Our approach to hyper-acceleration is at the platform level, as opposed to point solutions that require special APIs, custom coding, and changes to IT and DevOps processes. This means that as we add new platforms and features, the entire platform benefits. And Bigstream customers reap the benefits without ever changing a line of code.

This month’s Bigstream Benchmark Report shows nearly 3X the performance of baseline open source Apache Spark 2.1 on the same configuration (a 2.89X average speedup on csv data). This is the result of a complete performance overhaul of Spark using Bigstream Hyper-acceleration technology. To get a better understanding of what Hyper-acceleration is, check out this interview with Bigstream CEO Maysam Lavasani.

Benchmarks are part of our daily processes, and from time to time it makes sense to share performance data as we refine and expand our hyper-acceleration product. The number and types of tests we run will evolve over time, but we use a combination of standard industry benchmarks like TPC-DS (decision support) and specific big data and machine learning use cases such as ETL, data ingest and parsing, SQL analytics, and vertical industry use cases like AdTech real-time bidding, FinServ trading systems, and retail analytics.

Here is a summary of this month’s results running Apache Spark with the Bigstream Hyper-acceleration Layer on Amazon EMR versus unaccelerated Apache Spark on the same configuration.

[Figures: June TPC-DS results summary for Avro data and for csv data]

Here is a look at the full results of the run:

[Figure: graph of the full June TPC-DS results]

All tests were done on standard Amazon EC2 server instances (m4.4xlarge) running on Intel Xeon processors.

For a full accounting of the results, take a look at the Bigstream Benchmark Report for July 2017. In that report, you will see the following:

  • Test environment: Amazon EMR cluster of four m4.4xlarge instances, Apache Spark 2.1.1
  • Average acceleration: 2.89X using csv data, 2.47X using Avro data
  • Maximum acceleration: 3.86X on TPC-DS Query #9 (csv), 3.11X on TPC-DS Query #27 (Avro)
  • Average monthly cost savings on Amazon EMR: 54% (csv) and 46% (Avro)
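As a rough sanity check on how a speedup figure relates to time saved, here is a minimal sketch. It assumes only that a job running S times faster takes 1/S of the original time; the function name `runtime_reduction` is ours for illustration and is not taken from the report. It is not the cost model used in the report.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of runtime eliminated when a job runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Average speedups quoted in the list above.
for label, speedup in [("csv", 2.89), ("Avro", 2.47)]:
    print(f"{label}: {speedup:.2f}X -> {runtime_reduction(speedup):.1%} less runtime")
```

This gives a runtime reduction of about 65% for csv and about 60% for Avro. The published cost savings (54% and 46%) are lower than these figures, so the savings calculation evidently takes more than raw runtime into account.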

Check out the report for more detail about the testing environment and how we worked out the cost savings. Things are going to get even more interesting when we publish our next benchmark report, which will include some testing of FPGA-powered servers.
