ETL/ELT

The disruption of traditional ETL and Enterprise Data Warehousing (EDW) has been one of the most notable impacts of Big Data platforms such as Hadoop, Spark, and NoSQL databases. Data Lakes now often provide many of the essential EDW functions that remain relevant in the era of machine learning and advanced analytics.

The key new ingredients of ELT/EDW in the Big Data era are a radical increase in unstructured data and the arrival of fast (streaming) data.

  • Faster response times for ad hoc queries and large-scale joins executed via Spark SQL
  • Rapid massive data ingest from Hadoop HDFS, Amazon S3, and Apache Kafka
  • Hyperaccelerated parsing of JSON, CSV, Parquet, and Avro data
  • Accelerated ELT, data cleansing, and data enrichment processes in Spark
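
To make the ELT pattern behind these points concrete, here is a minimal sketch: raw records are loaded first, then cleansed and enriched with a SQL join inside the engine. It uses Python's built-in SQLite as a stand-in for Spark SQL so that it runs anywhere; the sample events, the table names, and the `parse_events` and `run_elt` helpers are all hypothetical, and nothing here reflects Bigstream's or Spark's actual implementation.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from a JSON-lines feed.
RAW_EVENTS = [
    '{"user_id": 1, "amount": "19.99", "country": "us"}',
    '{"user_id": 2, "amount": "5.00", "country": "DE"}',
    '{"user_id": 2, "amount": null, "country": "de"}',
    'not valid json',
]

# Hypothetical reference data used for enrichment.
COUNTRY_NAMES = [("US", "United States"), ("DE", "Germany")]


def parse_events(lines):
    """Parse JSON lines, skipping records that fail to decode."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def run_elt(lines, countries):
    """Load raw data as-is, then cleanse and enrich it with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_events (user_id INTEGER, amount TEXT, country TEXT)"
    )
    conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")

    # Load: parsed records go into the staging table untransformed.
    conn.executemany(
        "INSERT INTO raw_events VALUES (:user_id, :amount, :country)",
        parse_events(lines),
    )
    conn.executemany("INSERT INTO countries VALUES (?, ?)", countries)

    # Transform: drop rows with missing amounts, normalize the country
    # code, cast the amount to a number, and join in the country name.
    rows = conn.execute(
        """
        SELECT e.user_id, CAST(e.amount AS REAL) AS amount, c.name
        FROM raw_events AS e
        JOIN countries AS c ON c.code = UPPER(e.country)
        WHERE e.amount IS NOT NULL
        ORDER BY e.user_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_elt(RAW_EVENTS, COUNTRY_NAMES):
        print(row)
```

In a Spark deployment the same load-then-transform shape would typically be expressed with DataFrame reads and a Spark SQL query; the in-memory database here only illustrates the ordering of the steps.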

Bigstream Benchmark Report

Read the Bigstream Benchmark Report to see specific Apache Spark acceleration results.
