Big Data Integration Workshop
Big data represents a major shift in the technology landscape. Whether you already run a big data environment or are just starting to shape your big data strategy, this workshop is a great opportunity to get hands-on experience with Pentaho design patterns for big data.
The Pentaho Big Data Integration Workshop covers blueprints and best practices related to Hadoop, giving you the opportunity to work directly with some of the most popular technologies in the Hadoop ecosystem. Using a combination of instructor-led presentations and hands-on exercises, the workshop teaches you Pentaho development methods for the following big data use cases:
1. Fill your data lake automatically – use metadata injection and real-time streaming with Kafka to populate your data lake without building a separate job for every source. You can find additional information about the blueprint here.
2. Build a streamlined data refinery – leverage MapReduce and Spark, and learn to combine the two successfully in a single refinery pipeline. You can find additional information about the blueprint here.
3. Enable end-user self-service analytics – deliver processed data into Impala and PostgreSQL, and allow direct analytical access to HBase through virtual database interfaces (Pentaho Data Services).
At the end of this full-day Pentaho workshop, data architects and developers will have hands-on experience with a complete, visual design environment to simplify and accelerate data preparation, integration, and analysis within a big data architecture.
|9:00am–9:30am||Registration and Breakfast|
|9:30am–10:15am||Welcome and Introduction|
|10:15am–12:00pm||Use Case 1: Fill Your Data Lake Automatically|
|1:00pm–2:30pm||Use Case 2: Build a Streamlined Data Refinery (SDR)|
|2:45pm–4:15pm||Use Case 3: Enable End User Self-Service Analytics|
|4:15pm–4:30pm||Summary and Q&A|
Module 1: Fill the Data Lake
- Use PDI to ingest file sources into Hadoop, leveraging metadata injection
- Use PDI with Kafka to ingest IoT data (see the sketch below)
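In the workshop, the Kafka ingestion is built visually with PDI's Kafka consumer step. For orientation, here is a minimal plain-Java sketch of the same ingestion pattern using the standard Kafka client API; the broker address, topic name, and group id are placeholder assumptions, not values from the workshop.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IotIngest {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker and consumer group -- replace with your cluster's values.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "iot-ingest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("iot-events")); // hypothetical topic name
            while (true) {
                // Poll for new sensor events and hand them to the next stage
                // (in the workshop, a PDI transformation writing into the data lake).
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```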
Module 2: Create a Data Refinery
- Use PDI to build Visual MapReduce jobs to transform and blend data
- Use PDI to orchestrate MapReduce and Spark jobs (see the orchestration sketch below)
- Use PDI to automatically deliver data into Impala and Postgres (see the delivery sketch below)
- Use Analyzer to create an analysis of the data delivered to Impala and Postgres
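PDI orchestrates Spark through job entries rather than code. As a rough analogue of what a submission step does, the sketch below uses Spark's own SparkLauncher API to launch an application and wait for it to finish; the jar path, main class, and master/deploy-mode settings are placeholder assumptions.

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SubmitRefineryJob {
    public static void main(String[] args) throws Exception {
        // Placeholder application jar and main class -- substitute your own.
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/jobs/refinery-transform.jar")
                .setMainClass("com.example.RefineryTransform")
                .setMaster("yarn")
                .setDeployMode("cluster")
                .startApplication();

        // Block until the cluster reports a terminal state for the job.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        System.out.println("Spark job finished with state: " + handle.getState());
    }
}
```

Similarly, delivery into Postgres is configured through PDI's table output step; the equivalent raw JDBC pattern looks like the following sketch. The connection URL, credentials, and table schema are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DeliverToPostgres {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details -- replace with your environment's.
        String url = "jdbc:postgresql://dbhost:5432/refinery";

        try (Connection conn = DriverManager.getConnection(url, "pentaho", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO refined_sales (region, amount) VALUES (?, ?)")) {
            // Batch the refined rows for efficient delivery;
            // refined_sales is a hypothetical target table.
            ps.setString(1, "EMEA");
            ps.setDouble(2, 1250.00);
            ps.addBatch();
            ps.executeBatch();
        }
    }
}
```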
Module 3: Self-Service Data Preparation
- Use Impala to blend IoT data and visualize the results in PDI (see the query sketch below)
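PDI reaches Impala through a standard JDBC connection. The minimal sketch below shows that access pattern directly; it assumes the Hive JDBC driver on the classpath (Impala speaks the HiveServer2 protocol, default port 21050, unsecured here), and the host, database, and table names are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryImpala {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver connecting to Impala's HiveServer2-compatible port;
        // host, port, and auth settings below are placeholder assumptions.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://impala-host:21050/default;auth=noSasl";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // iot_readings is a hypothetical table of ingested sensor data.
             ResultSet rs = stmt.executeQuery(
                     "SELECT device_id, AVG(temperature) AS avg_temp "
                     + "FROM iot_readings GROUP BY device_id")) {
            while (rs.next()) {
                System.out.println(rs.getString("device_id") + " -> " + rs.getDouble("avg_temp"));
            }
        }
    }
}
```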
Module 4: Self-Service Data Analysis
- Analyze HBase data with Analyzer by delivering it into Postgres
- Analyze HBase and Salesforce data with Analyzer, leveraging Pentaho Data Services (see the thin-driver sketch below)
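Pentaho Data Services expose a PDI transformation as a virtual SQL table over a thin JDBC driver, which is what lets Analyzer blend HBase and Salesforce data without staging it first. The client-side sketch below follows Pentaho's thin-driver convention for the driver class and URL; the host, port, credentials, and the "blended_accounts" service name are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryDataService {
    public static void main(String[] args) throws Exception {
        // Pentaho thin JDBC driver; host, port, and webapp name are
        // placeholders for your DI server.
        Class.forName("org.pentaho.di.trans.dataservice.jdbc.ThinDriver");
        String url = "jdbc:pdi://di-server:8080/kettle?webappname=pentaho-di";

        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement();
             // "blended_accounts" is a hypothetical data service that joins
             // HBase activity data with Salesforce account records.
             ResultSet rs = stmt.executeQuery("SELECT * FROM blended_accounts")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```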