Pentaho Data Integration Advanced
This course is designed to build upon your fundamental knowledge of Pentaho Data Integration (PDI).
Moving beyond the basics of creating transformations and jobs, you will learn how to use PDI in real-world project scenarios. You'll add PDI as a data source for a variety of visualization options, use PDI's streaming data processing capabilities, build transformations with metadata injection, and scale and performance-tune your PDI solution.
This course is lab-intensive, giving you practical, hands-on experience with the topics covered in each section.
- Improve productivity by learning advanced PDI techniques that improve efficiency and expand capabilities
- Interactive, hands-on training materials significantly improve skill development and maximize retention
At the completion of this course, you should be able to:
- Understand how to manage the project lifecycle in different development environments
- Use PDI as a data source for Pentaho Report Designer, CDA, Data Services, and machine learning applications
- Utilize PDI's streaming data processing capabilities with MQTT and Kafka
- Reduce manual tasks by harnessing the power of metadata injection
- Scale PDI by using Carte clustering, monitoring, and partitioning
- Tune PDI with checkpoints and logging
Students attending classroom courses in the United States are provided with a PC to use during class. Students attending courses outside the US should contact the Authorized Training Provider regarding PC requirements for Pentaho courses.
In general, if your training provider requires you to bring a PC to class, it must meet the following requirements. You can also verify your system against the Compatibility Matrix: List of Supported Products topic on the Pentaho Documentation site.
- Windows XP or Windows 7 desktop operating system (for Macintosh support, contact your Customer Success Manager)
- RAM: at least 4 GB
- Hard drive space: at least 2 GB for the software, plus additional space for solution and content files
- Processor: dual-core AMD64 or Intel EM64T
- USB port
Online courses require a broadband Internet connection, a modern web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to GoToTraining. For more information on GoToTraining requirements, see http://www.gotomeeting.com/online/training. Online courses use Pentaho's cloud-based exercise environment, and students are given access to a virtual machine for completing the exercises.
For online courses, students are provided with a secured, electronic course manual; printed manuals are not provided. Students are encouraged, but not required, to print the exercise book before class begins.
Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at firstname.lastname@example.org for more information regarding on-site training requirements.
Module 1: Metadata Injection
Lesson 1: Overview of Metadata Injection Concepts
Lesson 2: Metadata Injection Workflows
Guided Demo: Standard Metadata Injection
Guided Demo: Push Metadata Injection
Guided Demo: Pull Metadata Injection
Guided Demo: Push/Pull Metadata Injection
Exercise: Push/Pull Metadata Injection
Exercise: Phase Metadata Injection
Guided Demo: Using Filters in Metadata Injection (2-phase Metadata Injection)
Exercise: Retail Sales Case Study
Module 2: PDI as a Data Source
Lesson 1: Report Designer
Guided Demo: Pentaho Reporting Step
Guided Demo: Pentaho Reporting - Parameters
Demonstration: Report Designer - PDI Transformation
Lesson 2: Community Data Access
Guided Demo: Community Data Access
Lesson 3: Data Services
Guided Demo: Configuring a Twitter Data Service
Lesson 4: Machine Learning
Guided Demo: Retail Fraud
Module 3: Data Streaming
Lesson 1: MQTT
Guided Demo: MQTT with GPS data
Lesson 2: Kafka
Demonstration: Using Kafka to Obtain a Streaming Twitter Feed in PDI
Module 4: Scalability
Lesson 1: Clustering Carte Servers
Guided Demo: Configure Master and Slave Server Nodes
Guided Demo: Monitoring Master and Slave Server Nodes
Guided Demo: Round-Robin vs. Copy
Guided Demo: Clustering and Group by
Lesson 2: Partitioning
Guided Demo: Stream Partitioning
Lesson 3: Checkpoints
Exercise: Using Checkpoints to Restart Jobs
Exercise: Using Checkpoints