Pentaho Big Data Integration

Training Course

With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.

Pentaho provides the right set of tools to each user, all within a tightly coupled data integration and analytics platform that supports the entire big data lifecycle. For IT and developers, Pentaho provides a complete, visual design environment to simplify and accelerate data preparation and modeling. For business users, Pentaho provides visualization and exploration of data. And for data analysts and scientists, Pentaho provides full data discovery, exploration and predictive analytics.

Using a combination of instructor-led presentations and hands-on exercises, this course provides an overview of the big data capabilities within Pentaho Data Integration, including visualization tools. This course helps prepare you for the Pentaho Data Integration Certification Exam.


Id: DI1100
Level: Advanced
Audience: Data Analyst
Delivery Method: Instructor-led online, Private on-site
Duration: 1 Day
Cost: $650.00 USD
Credits: 1
Category: Pentaho Data Integration


8 hours

Upcoming Classes

No classes have been scheduled, but you can always Request a Quote.

Skills Achieved

At the completion of this course, you should be able to:

  • Use Pentaho Data Integration (and Pentaho MapReduce) to manipulate big data
  • Orchestrate big data jobs in Pentaho Data Integration
  • Visualize big data using Pentaho Instaview

This is the fourth course in the Data Analyst learning path. DI1100 is an advanced course intended for students experienced in both PDI and big data. Students who need a comprehensive overview of big data tools and technologies should take course DI2000 Big Data Fundamentals.

Before taking this class, students should complete course DI1000: Pentaho Data Integration or have equivalent field experience with Pentaho Data Integration.

Big data knowledge is also required. This course does not present an overview of the various big data tools and technologies. For such an overview, please register for DI2000 Big Data Fundamentals.

The content of this course is fully covered in DI2000 Big Data Fundamentals, so there is no need to register for both.

Some basic knowledge of the Linux operating system (CentOS) is required.

Online courses require a broadband Internet connection, the use of a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to the WebEx Training Center. For more information on WebEx Training Center requirements, see

Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.

For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.

Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at for more information regarding on-site training requirements.

Course Agenda

Day 1

  • Module 1: Overview of Pentaho Big Data and Business Intelligence
  • Module 2: PDI and HDFS Integration
  • Module 3: PDI and MapReduce Integration
  • Module 4: MapReduce Best Practices
  • Module 5: Hive and Impala
  • Module 6: PDI and NoSQL Datastores
  • Module 7: Analytical Columnar Databases
  • Module 8: Job Orchestration
  • Module 9: Oozie, Pig and Sqoop
  • Module 10: Reporting on Big Data
  • Module 11: Working with Pentaho Instaview
  • Module 12: Working with Amazon