Big Data Integration Workshop

Training Course

Big data represents a major shift in the technology landscape. If you are working with a big data environment or just starting to work on your big data strategy, this workshop is a great opportunity to get hands-on experience with Pentaho design patterns in big data.

The Pentaho Big Data Integration Workshop covers blueprints and best practices related to Hadoop.  You will have the opportunity to work hands-on with some of the most popular technologies in the Hadoop ecosystem. Using a combination of instructor-led presentations and hands-on exercises, the workshop teaches you Pentaho development methods for the following big data use cases:

1. Fill your data lake automatically – use metadata and real-time streaming with Kafka to load your data lake without manual steps.  You can find additional information about the blueprint here.

2. Build a streamlined data refinery – leverage MapReduce and Spark, and learn to combine them to build a streamlined data refinery.  You can find additional information about the blueprint here.

3. Enable end user self-service analytics – deliver processed data into Impala and PostgreSQL and allow direct analytical access to HBase using virtual database interfaces.

At the end of this full-day Pentaho workshop, data architects and developers will have hands-on experience with a complete, visual design environment to simplify and accelerate data preparation, integration, and analysis within a big data architecture.

Training Agenda

9:00am-9:30am Registration and Breakfast
9:30am-10:15am Welcome and Introduction
10:15am-12:00pm Use Case 1: Fill Your Data Lake Automatically
12:00pm-1:00pm Hosted Lunch
1:00pm-2:30pm Use Case 2: Build a Streamlined Data Refinery (SDR)
2:30pm-2:45pm Break
2:45pm-4:15pm Use Case 3: Enable End User Self-Service Analytics
4:15pm-4:30pm Summary and Q&A


Level: Advanced
Audience: Big Data Architects, ETL Developers, ETL Architects
Duration: 1 Day (7 hours)
Cost: $249.00 USD
Category: Big Data Integration

Upcoming Classes


Location    Oct 2016   Nov 2016   Dec 2016   Jan 2017   Feb 2017
Stuttgart   Oct 25     -          -          -          -

Class dates in bold are guaranteed to run!

During this hands-on training, you will work with:

  • Pentaho Data Integration (PDI) to manipulate big data
  • Pentaho Analyzer to analyze big data with Impala, Postgres and HBase
  • Pentaho Dashboard Designer to build and design reports


This hands-on training is an advanced course and is intended for those experienced in database design and development, SQL programming, and reporting. The course is designed to help users who are evaluating Pentaho for a big data implementation.

While there are no prerequisites for the training, prior experience with ETL and reporting tools is helpful in completing the course objectives. Attendees will also benefit from prior Hadoop development or administration experience.

Pentaho will provide all hardware and software required to participate in this workshop. Attendees will work with Pentaho as well as VMware virtual machine (VM) software with Cloudera and Postgres database sets.

Day 1

Module 1: Fill the Data Lake

Use PDI to ingest file sources into Hadoop, leveraging metadata injection
Use PDI with Kafka to ingest IoT data

Module 2: Create a Data Refinery

Use PDI to build Visual MapReduce jobs to transform and blend data
Use PDI to orchestrate MapReduce and Spark jobs
Use PDI to automatically deliver data into Impala and Postgres
Use Analyzer to create an analysis of data delivered to Impala and Postgres

Module 3: Self Service Data Preparation

Use Impala to blend IoT data and visualize it in PDI

Module 4: Self Service Data Analysis

Analyze HBase data with Analyzer by delivering it into Postgres
Analyze HBase and Salesforce data with Analyzer leveraging Pentaho Data Services

Onsite Training

For groups of six or more


Public Training


Classes marked with Confirmed are guaranteed to run. Sign up now while there is still space available!

