Pentaho Data Integration Advanced

Pentaho Data Integration Advanced (DI1500)

Training Course

This course is designed to build upon your fundamental knowledge of Pentaho Data Integration (PDI).   

Moving beyond the basics of creating transformations and jobs, you will learn how to use PDI in real-world project scenarios. You'll add PDI as a data source for a variety of visualization options, use PDI's streaming data processing capabilities, build transformations with metadata injection, and scale and tune the performance of your PDI solution.

This course focuses heavily on labs, giving you practical, hands-on application of the topics covered in each section.

Id: DI1500
Audience: Data Analyst, ETL Developer
Delivery Method: Instructor-led online, Private on-site, Public classroom
Duration: 2 Day(s)
Cost: $1,350.00 USD
Credits: 2
Category: Pentaho Data Integration



Upcoming Classes


Instructor-led online training

Location        Mar 2019          Apr 2019          May 2019        Jun 2019   Jul 2019
Online - APAC   Mar 28 – Mar 29                     May 2 – May 3
Online - NA                       Apr 11 – Apr 12

Class dates in bold are guaranteed to run!

Course Benefits

  • Increase productivity by learning advanced PDI techniques that improve efficiency and expand capabilities
  • Interactive, hands-on training materials significantly improve skill development and maximize retention

Skills Achieved

At the completion of this course, you should be able to:

  • Understand how to manage the project lifecycle in different development environments
  • Use PDI as a data source for Pentaho Report Designer, CDA, Data Services, and machine learning applications
  • Utilize PDI's streaming data processing capabilities with MQTT and Kafka
  • Reduce manual tasks by harnessing the power of metadata injection
  • Scale PDI by using Carte clustering, monitoring, and partitioning
  • Tune PDI with checkpoints and logging

This course is part of the Data Analyst learning path. Students already familiar with Pentaho Data Integration should take this course.

Students should complete DI1000 Pentaho Data Integration Fundamentals prior to attending this class. Comparable PDI experience is also acceptable. This course will not cover any fundamental concepts.

Students attending classroom courses in the United States are provided with a PC to use during class. Students attending courses outside the US should contact the Authorized Training Provider regarding PC requirements for Pentaho courses.

In general, if your training provider requires you to bring a PC to class, it must meet the following requirements. You can also verify your system against the Compatibility Matrix: List of Supported Products topic in the Pentaho Documentation site.

  • Windows XP or Windows 7 desktop operating system (for Macintosh support, please contact your Customer Success Manager)
  • RAM: at least 4GB
  • Hard drive space: at least 2GB for the software, and more for solution and content files
  • Processor: dual-core AMD64 or Intel EM64T
  • USB port

Online courses require a broadband Internet connection, a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to GoToTraining; see the GoToTraining documentation for its requirements. Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.

For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.

Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us for more information regarding on-site training requirements.

Day 1

Module 1: Metadata Injection

  Lesson 1: Overview of Metadata Injection Concepts

  Lesson 2: Metadata Injection Workflows

      Guided Demo: Standard Metadata Injection

      Guided Demo: Push Metadata Injection

      Guided Demo: Pull Metadata Injection

      Guided Demo: Push/Pull Metadata Injection

      Exercise: Push/Pull Metadata Injection

      Exercise: Phase Metadata Injection

      Guided Demo: Using Filters in Metadata Injection (2-phase Metadata Injection)

      Exercise: Retail Sales Case Study

Module 2: PDI as a Data Source

  Lesson 1: Report Designer

      Guided Demo: Pentaho Reporting Step

      Guided Demo: Pentaho Reporting - Parameters

      Demonstration: Report Designer - PDI Transformation

  Lesson 2: Community Data Access

      Guided Demo: Community Data Access

  Lesson 3: Data Services

      Guided Demo: Configuring a Twitter Data Service

  Lesson 4: Machine Learning

      Guided Demo: Retail Fraud

Day 2

Module 3: Data Streaming

  Lesson 1: MQTT

      Guided Demo: MQTT with GPS data

  Lesson 2: Kafka

      Demonstration: Using Kafka to Obtain a Streaming Twitter Feed in PDI

Module 4: Scalability

  Lesson 1: Clustering Carte Servers

      Guided Demo: Configure Master and Slave Server Nodes

      Guided Demo: Monitoring Master and Slave Server Nodes

      Guided Demo: Round-Robin vs. Copy

      Guided Demo: Clustering and Group by

  Lesson 2: Partitioning

      Guided Demo: Stream Partitioning

  Lesson 3: Checkpoints

      Exercise: Using Checkpoints to Restart Jobs

      Exercise: Using Checkpoints

Onsite Training

For groups of six or more

What Our Clients Are Saying

This class gave me the opportunity to spend time looking at new aspects of the PDI tool and to help me get kick started in new ways to use the software.

- Invoca, Inc.

Pentaho Data Integration Advanced Ratings

Averaged from 33 responses.
