Robust and Scalable ETL on Big Data with Apache Spark
John McCarthy Stage
ETL pipelines are a critical component of the data infrastructure of modern enterprises. As Big Data keeps growing without bound, organisations need to process and integrate much higher volumes of data, arriving from more sources and at much greater speed than ever before, and traditional data warehouses and their associated ETL/DI processes are struggling to keep pace. Building ETL data pipelines for big data processing on Apache Spark has become a viable choice for many, as it not only helps organisations dramatically reduce costs but also facilitates agile, iterative data discovery across legacy systems and big data sources. In this session, we present the feature-rich and flexible ADASTRA Framework for Big Data Integration, built on Apache Spark, which enables you to build robust, scalable and reliable data pipelines for your Data Lakes and Big Data environments. We will also discuss the benefits of a framework-based approach, drawing on valuable experience from successful customer projects.