Big Data Integration Workshop

Training Course

Big data represents a major shift in the technology landscape. To tackle challenges related to the increased variety, velocity, and volume of information, companies are turning to Hadoop, NoSQL, and relational databases. If you are working in a big data environment or just starting on your big data strategy, this workshop is a great opportunity to get hands-on experience with Pentaho design patterns for big data.

The Pentaho Big Data Integration Workshop will teach you how to use Pentaho to implement three common big data use cases with Cloudera, MongoDB, and Postgres. Pentaho provides a data integration and analytics platform that supports the entire big data lifecycle. Using a combination of instructor-led presentations and hands-on exercises, the workshop teaches you Pentaho development methods for the following big data use cases:

1. EDW Optimization – offload and optimize your EDW with Cloudera

2. Streamlined Data Refinery – implement a streamlined data refinery with Cloudera and Postgres

3. Customer 360° View – create a single view of your customers with MongoDB

At the end of this full-day Pentaho workshop, data architects and developers will have hands-on experience with a complete, visual design environment to simplify and accelerate data preparation, integration, and analysis within a big data architecture.

Training Agenda

9:00am-9:30am Registration and Breakfast
9:30am-10:15am Welcome and Introduction
10:15am-12:00pm Use Case 1: EDW Optimization
12:00pm-1:00pm Hosted Lunch
1:00pm-2:30pm Use Case 2: Streamlined Data Refinery (SDR)
2:30pm-2:45pm Break
2:45pm-4:15pm Use Case 3: Creating a 360° View
4:15pm-4:30pm Summary and Q&A


Level: Advanced
Audience: Big Data Architects, ETL Developers, ETL Architects
Duration: 1 Day (7 hours)
Cost: $249.00 USD
Category: Big Data Integration

Upcoming Classes


Location    Date
Stuttgart   Oct 25, 2016

Class dates in bold are guaranteed to run!

During this hands-on training, you will work with:

  • Pentaho Data Integration (PDI) to manipulate big data
  • Pentaho Analyzer to analyze big data with Impala, Postgres and MongoDB
  • Pentaho Report Designer and Dashboard Designer to build and design reports


You will also orchestrate big data jobs in Pentaho for each use case:

  • EDW Optimization, with Cloudera data
  • Streamlined Data Refinery, with Postgres data
  • Customer 360° View, with MongoDB data

This hands-on training is an advanced course and is intended for those experienced in database design and development, SQL programming, and reporting. The course is designed to help users who are evaluating Pentaho for a big data implementation.

While there are no prerequisites for the training, prior experience with ETL and reporting tools is helpful in completing the course objectives. Attendees will also benefit from prior Hadoop development or administration experience.

Pentaho will provide all hardware and software required to participate in this workshop. Attendees will work with Pentaho as well as VMware virtual machine (VM) software with Cloudera, Postgres, and MongoDB data sets.

Day 1

Module 1: Enterprise Data Warehouse Optimization

Use PDI to load data into Hadoop
Use PDI to create visual MapReduce transformations
Use PDI to create a job to sequence and automate transformations
Use Analyzer and Dashboard Designer to analyze data in Hadoop

Module 2: Streamlined Data Refinery

Use PDI to load geolocation data into Hadoop
Use PDI to create Impala tables and execute Impala queries for data blending
Use PDI to load blended data to Postgres for analytics
Use Analyzer with Impala to analyze the blended data in Postgres

Module 3: Customer 360 View

Use PDI to create a MongoDB customer 360 data store
Use Analyzer for MongoDB to analyze customer data
Use Pentaho Report Designer to build a highly formatted production report
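
For readers new to the idea of a customer 360 data store, the sketch below shows the underlying pattern in plain Python: records from several source systems are merged into one nested document per customer. It is only an illustration under stated assumptions. The sample records, field names, and the `build_customer_360` helper are hypothetical and are not taken from the workshop labs, and the code does not use Pentaho or MongoDB APIs. In the workshop itself this kind of blending is built visually in PDI.

```python
# Hypothetical sample records from three source systems (illustrative only).
crm_records = [
    {"customer_id": 1, "name": "Ada Lopez", "email": "ada@example.com"},
    {"customer_id": 2, "name": "Ben Okoro", "email": "ben@example.com"},
]
order_records = [
    {"customer_id": 1, "order_id": "A-100", "total": 40.0},
    {"customer_id": 1, "order_id": "A-101", "total": 15.5},
    {"customer_id": 2, "order_id": "B-200", "total": 99.0},
]
support_records = [
    {"customer_id": 2, "ticket_id": "T-7", "status": "open"},
]


def build_customer_360(crm, orders, tickets):
    """Merge per-source records into one nested document per customer."""
    docs = {}

    # Start one document per customer from the CRM profile.
    for row in crm:
        docs[row["customer_id"]] = {
            "_id": row["customer_id"],
            "profile": {"name": row["name"], "email": row["email"]},
            "orders": [],
            "tickets": [],
        }

    # Attach orders to the matching customer; skip unknown customers.
    for row in orders:
        doc = docs.get(row["customer_id"])
        if doc is not None:
            doc["orders"].append(
                {"order_id": row["order_id"], "total": row["total"]}
            )

    # Attach support tickets the same way.
    for row in tickets:
        doc = docs.get(row["customer_id"])
        if doc is not None:
            doc["tickets"].append(
                {"ticket_id": row["ticket_id"], "status": row["status"]}
            )

    # Add a derived field computed from the embedded orders.
    for doc in docs.values():
        doc["lifetime_value"] = sum(o["total"] for o in doc["orders"])

    return list(docs.values())


customer_docs = build_customer_360(crm_records, order_records, support_records)
for doc in customer_docs:
    print(doc["_id"], doc["profile"]["name"], doc["lifetime_value"])
```

Each resulting dictionary has the shape of a document that could be stored in a document database such as MongoDB, with orders and tickets embedded so that a single lookup returns the full customer view.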

Onsite Training

For groups of six or more


Public Training


Classes marked with Confirmed are guaranteed to run. Sign up now while there is still space available!

