Predictive Analytics with PDI and Weka

Training Course

This course is designed to introduce you to the predictive analytics capabilities of Pentaho Data Integration and Weka. Building upon Pentaho Data Integration Fundamentals, you will learn how Pentaho tools assist you with:

  • Data Understanding
  • Data Preparation / Engineering
  • Predictive Modeling
  • Deployment and Operationalization

This course is heavily lab-based, giving you practical, hands-on experience with the topics covered in each section.


Id: DI3000
Level: Advanced
Audience: Data Analyst
Delivery Method: Instructor-led online, Private on-site, Public classroom
Duration: 2 Day(s)
Cost: $1,350.00 USD
Credits: 2
Category: Pentaho Data Integration


Upcoming Classes

No classes have been scheduled, but you can always Request a Quote.

Course Benefits

  • Improve productivity by giving your data science team the skills they need to succeed with Pentaho Data Integration and Weka
  • Apply predictive modeling in a real-world scenario to solve a business challenge
  • Improve skill development and maximize retention with interactive, hands-on training materials

Skills Achieved

At the completion of this course, you should be able to:

  • Prepare data for predictive modeling with Pentaho Data Integration
  • Build, evaluate, and export predictive models using Weka
  • Deploy and refresh predictive models using Pentaho Data Integration
  • Review predictive results in Pentaho Analyzer and Dashboards

This course is taught at a level that is accessible to most technical Pentaho Data Integration users. However, data science and statistics knowledge is required to apply the classroom concepts in real-world scenarios.

DI1000 Pentaho Data Integration Fundamentals is required prior to taking this course. Basic PDI functional knowledge is used throughout the course.

Students attending classroom courses in the United States are provided with a PC to use during class. Students attending courses outside the US should contact the Authorized Training Provider regarding PC requirements for Pentaho courses.

In general, if your training provider requires you to bring a PC to class, it must meet the following requirements. You can also verify your system against the Compatibility Matrix: List of Supported Products topic on the Pentaho Documentation site.

  • Windows XP or Windows 7 desktop operating system (for Macintosh support, please contact your Customer Success Manager)
  • RAM: at least 4GB
  • Hard drive space: at least 2GB for the software, and more for solution and content files
  • Processor: dual-core AMD64 or Intel EM64T
  • USB port

Online courses require a broadband Internet connection, a modern web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to GoToTraining. For more information, see the GoToTraining system requirements.

Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.

For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.

Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us for more information regarding on-site training requirements.

Day 1

Module 1: Introduction to Data Mining

  Lesson 1: Objectives

  Lesson 2: Definition, Tasks, and Processes

  Lesson 3: Overview of Pentaho Tools

Module 2: Data Understanding and Preparation / Engineering

  Lesson 1: Understand the ClearWireless Business and Problem Domain

  Lesson 2: Data Preparation with PDI

      Lab 1: Creating the Predictive Dataset

Module 3: Predictive Modeling

  Lesson 1: Knowledge Representation

  • Tables
  • Functions
  • Naive Bayes
  • Trees

  Lesson 2: Building Models in the Weka Explorer

      Lab 2: Using the Explorer

  • Exercise 1: Load ClearWireless data into Weka Explorer
  • Exercise 2: Review Data Characteristics
  • Exercise 3: Build a Logistic Regression Model to Predict "Added_Item_to_Cart"
  • Exercise 4: Build a Decision Tree to Predict "Added_Item_to_Cart"
  • Exercise 5: Review Results and Save Model

      Lab 3: Building Models in the Weka Knowledge Flow

  • Exercise 1: Develop a Knowledge Flow Process to Build and Save Models

  Lesson 3: Data Preparation Revisited

  • Missing Values
  • Algorithm Specific Data Preparation

Day 2

Module 4: Evaluating Predictive Models

  Lesson 1: Basic Evaluation Metrics

  Lesson 2: Ranking Performance

      Lab 4: Comparing the Ranking Performance of Two Classifiers in the Knowledge Flow

Module 5: Operationalizing Predictive Models

  Lesson 1: Deploying a Model in PDI

      Lab 5: Using the Weka Scoring PDI Step

  • Exercise 1: Importing a Weka Model into PDI
  • Exercise 2: Scoring Data with Weka Scoring

  Lesson 2: Refreshing / Rebuilding a Model in PDI

      Lab 6: Using the Knowledge Flow PDI Step

  • Exercise 1: Designing the Knowledge Flow Process and Configuring the Step
  • Exercise 2: Orchestrating Model Building and Scoring Transformations from a PDI Job

  Lesson 3: Viewing the Predictive Results in the Pentaho User Console

      Lab 7: Creating Visualizations for Viewing Predictive Results

  • Exercise 1: Using Pentaho Analyzer to Drill, Pivot, and Chart Predictive Data
  • Exercise 2: Using Pentaho Dashboard to Display Predictive Data

Onsite Training

For groups of six or more

What Our Clients Are Saying

Very good instruction on a complex topic

- Pentaho Corporation

Predictive Analytics with PDI and Weka Ratings

Averaged from 29 responses.

Training Organized
Training Objectives
Training Expectations
Training Curriculum
Training Labs
Training Overall
