Data Ingest

While Data Ingest is a component of many data pipelines, it also stands alone as a key computing challenge that can greatly benefit from software and hardware acceleration. Fast Data Ingest is especially needed for high-volume Extract, Transform, Load (ETL) into Data Lakes, as well as for Hadoop-powered Enterprise Data Warehousing (EDW), IoT data capture, and a host of similar applications.

The need to accelerate data ingestion tools and ETL processes stems from the fact that moving data is often among the most expensive operations in a Big Data pipeline. Data engineering and performance engineering teams know the best option is to leave data in place, but for some operations there is no choice.

Bigstream focuses on these key parts of the Ingest pipeline:

  • High-speed data connectors to Amazon S3, Kafka and Hadoop File System (HDFS)
  • Hyperaccelerated compression and decompression of the GZIP, Snappy, DEFLATE, and LZO formats
  • Hyperaccelerated parsing of JSON, Avro, CSV, and Parquet data
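To see why these stages are worth accelerating, consider the pure-software baseline they replace: every byte of an ingested object is touched at least twice, once to decompress and once to parse. The sketch below illustrates that baseline for a gzip-compressed CSV object (the function name is illustrative, not part of Bigstream's API):

```python
import csv
import gzip
import io

def ingest_csv_gz(blob: bytes) -> list:
    """Baseline software ingest: decompress a gzip blob, then parse it as CSV.

    Both passes scan every byte, which is exactly the work that
    hardware/software acceleration offloads.
    """
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

# Simulate a compressed object as it might arrive from S3, Kafka, or HDFS.
raw = b"id,value\n1,alpha\n2,beta\n"
blob = gzip.compress(raw)
rows = ingest_csv_gz(blob)
```

At scale, both the decompression and the parsing pass become CPU-bound, which is why offloading them yields the speedups described below.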

Bigstream Hyperacceleration provides a performance boost at several critical stages of Big Data Ingest.

Bigstream benchmark testing shows 4X+ Hyperacceleration of CSV parsing, with work underway on similar speedups for JSON, Parquet, and more.

Bigstream Benchmark Report

Read the Bigstream Benchmark Report to see specific Apache Spark acceleration results.

Read Now