Pentaho Data Integration

Onsite Training

For groups of six or more

Training Course

As data volumes grow continuously and the variety and velocity of data increase, organizations need fast and easy ways to harness data and gain insight from it. However, one of the biggest challenges facing IT organizations today is providing a consistent, single version of the truth across all sources of information in an analytics-ready format. With powerful extract, transform, and load (ETL) capabilities, an intuitive and rich graphical design environment, and an open, standards-based architecture, Pentaho Data Integration is increasingly chosen over proprietary and homegrown data integration tools.


Id: DI1000
Level: Introductory
Audience: Data Analyst
Delivery Method: Instructor-led online, Private on-site, Public classroom
Duration: 4 days
Cost: $2,600.00 USD
Credits: 4
Category: Pentaho Data Integration

Pentaho Data Integration provides a full ETL solution, including:

  • Rich graphical designer to empower ETL developers
  • Broad connectivity to any type of data, including diverse and big data
  • Enterprise scalability and performance, including in-memory caching
  • Big data integration, analytics, and reporting, including Hadoop, NoSQL, and traditional OLTP and analytic databases
  • Modern, open, standards-based architecture

Through a series of lectures and hands-on exercises covering theory, best practices, and design patterns, this course gives students the skills they need to maximize the value of data to the organization. It also helps prepare you for the Pentaho Data Integration Certification Exam.


Upcoming Classes


Location                          Dates (2014)
Mexico City (Spanish language)    May 12 – May 15; May 26 – May 29
Online (instructor-led)           May 13 – May 16; Jun 3 – Jun 6; Jun 24 – Jun 27
Geneva (French language)          May 19 – May 22
Zurich (English language)         Jun 23 – Jun 26
München (German language)         Jun 3 – Jun 6
Fulda (German language)           Jun 24 – Jun 27
Paris (French language)           Jun 23 – Jun 26

Classes in bold are guaranteed to run!

Course Benefits

  • Improve productivity by giving your data integration team the skills they need to succeed with Pentaho Data Integration
  • Gain real value from data immediately, including data from multiple sources such as Hadoop, NoSQL, and relational data stores, by using an easy-to-use graphical designer instead of coding in SQL or writing MapReduce functions in Java
  • Learn to deliver data to a wide variety of applications using Pentaho's out-of-the-box data standardization, enrichment and quality capabilities
  • Interactive, hands-on training materials significantly improve skill development and maximize retention

Skills Achieved

At the completion of this course, you should be able to:

  • Install Pentaho Data Integration
  • Create, preview, and run basic transformations containing steps and hops
  • View transformation results in the Step Metrics view and the Log view
  • Create a database connection and use Database Explorer to interact with a data source
  • Create more complex transformations that involve configuring the following steps: Table input, Table output, Text file output, CSV file input, Insert/Update, Add constants, Filter, Value Mapper, Stream lookup, Join rows, Merge join, Sort rows, Row normalizer, JavaScript, Dimension lookup/update, Database Lookup, Get Data from XML, Set Environment Variables, and Analytic query
  • Create transformations that use parameterized values
  • Map the structure of an online transaction processing database to the structure of an online analytical processing database
  • Load data from and write data to different data sources
  • Use ETL design patterns to populate a data warehouse
  • Create a transformation that handles slowly changing dimensions
  • Create Pentaho Data Integration jobs that: run multiple transformations, use variables, contain sub-jobs, provide built-in error notification, load and process multiple text files, and convert files into Microsoft Excel format
  • Configure logging for transformation steps and for job entries and examine the logged data
  • Configure error handling for transformation steps
  • Configure the Pentaho Enterprise Repository, including basic security
  • Use the Pentaho Enterprise Repository to: create folders; store transformations and jobs; move, lock, revise, delete, and restore artifacts
  • Schedule and monitor the execution of a transformation in Pentaho Data Integration and in the Pentaho Enterprise Console
  • Create and drop indexes using a transformation
  • Create a transformation that contains steps configured to run in a cluster, run the transformation in the cluster, examine the results, and monitor the transformation
  • Create a transformation that uses a partition schema to partition data to slave servers in the cluster

This course is the third course in the Data Analyst learning path. Students with prior database development or administration experience who are new to Pentaho Data Integration should take this course.

There are no prerequisites for this course, but some ETL experience is preferred.
Though not a requirement, attendees would benefit from taking Business Analytics User Console (BA1000) prior to taking this class to gain an overview of the Pentaho Business Analytics interface.

Students attending classroom courses in the United States are provided with a PC to use during class. Students attending courses outside the US should contact the Authorized Training Provider regarding PC requirements for Pentaho courses.

In general, if your training provider requires you to bring a PC to class, it must meet the following requirements. You can also verify your system against the Compatibility Matrix: List of Supported Products topic in the Pentaho InfoCenter:

  • Windows XP or Windows 7 desktop operating system (for Macintosh support, please contact your Customer Success Manager)
  • RAM: at least 4GB
  • Hard drive space: at least 2GB for the software, and more for solution and content files
  • Processor: dual-core AMD64 or Intel EM64T
  • DVD drive

Online courses require a broadband Internet connection, a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to the WebEx Training Center. For more information, see the WebEx Training Center requirements. Online courses use Pentaho’s cloud-based exercise environment: students are given access to a virtual machine on which they complete the exercises.

For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.

Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us for more information regarding on-site training requirements.

Day 1

Module 1: Pentaho Data Integration Overview

Exercise 1: Introducing Pentaho Data Integration

Module 2: Inputs and Outputs

Exercise 2: Inputs and Outputs

Module 3: Introduction to the Training Data

(Lecture and Demo)

Module 4: Data Warehouse Steps

Exercise 3: Data Warehouse Steps

Day 2

Module 5: Lookups

(Lecture and Demo)

Module 6: Field Transformations, Part 1

Exercise 4: Lookups and Field Transformations

Module 7: Set Transformations

Exercise 5: Set Transformations

Module 9: Field Transformations, Part 2

Exercise 6: Field Transformations, Part 2

Module 10: Loading the Time Dimension and the Fact Table

Exercise 7: Loading a Fact Table

Day 3

Module 11: Introduction to Jobs

Exercise 8: Creating a Job

Module 12: Advanced Job Concepts

Exercise 9: Advanced Job Concepts

Module 13: Common Scripting Uses

Exercise 10: Using JavaScript

Module 14: Dynamic Transformations

(Lecture and Demo)

Module 15: Using XML in Pentaho Data Integration

Exercise 11: Using XML

Module 16: Portable Transformations and Jobs

Exercise 12: Portable Transformations and Jobs

Day 4

Module 17: Logging

Exercise 13: Configuring Logging

Module 18: Error Handling in Transformations

Exercise 14: Error Handling in Transformations

Module 19: Pentaho Enterprise Repository

Exercise 15: Pentaho Enterprise Repository

Module 20: Scheduling and Monitoring

Exercise 16: Scheduling and Monitoring

Module 21: Pre- and Post-Processing

Exercise 17: Constraint and Index Management

Module 22: Interpreting Runtime Data

(Lecture and Demo)

(Optional) Module 23: Clustering and Partitioning

Exercise 18: Clustering and Partitioning