Predictive Analytics with PDI and Weka (DI3000)
This course is designed to introduce you to the predictive analytics capabilities of Pentaho Data Integration and Weka. Building upon Pentaho Data Integration Fundamentals, you will learn how Pentaho tools assist you with:
- Data Understanding
- Data Preparation / Engineering
- Predictive Modeling
- Deployment and Operationalization
This course relies heavily on labs, giving you practical, hands-on experience with the topics covered in each section.
- Improve productivity by giving your data science team the skills they need to succeed with Pentaho Data Integration and Weka
- Apply predictive modeling in a real-world scenario to solve a business challenge
- Improve skill development and maximize retention with interactive, hands-on training materials
At the completion of this course, you should be able to:
- Prepare data for predictive modeling with Pentaho Data Integration
- Build, evaluate, and export predictive models using Weka
- Deploy and refresh predictive models using Pentaho Data Integration
- Review predictive results in Pentaho Analyzer and Dashboards
DI1000 Pentaho Data Integration Fundamentals is required prior to taking this course. Basic PDI functional knowledge is used throughout the course.
A comprehensive understanding of statistics and data science is needed to apply classroom concepts in real-world scenarios. However, the course is taught at a level that will be accessible to most technical PDI users.
Students attending classroom courses in the United States are provided with a PC to use during class. Students attending courses outside the US should contact the Authorized Training Provider regarding PC requirements for Pentaho courses.
In general, if your training provider requires you to bring a PC to class, it must meet the following requirements. You can also verify your system against the Compatibility Matrix: List of Supported Products topic on the Pentaho Documentation site.
- Operating system: Windows XP or Windows 7 desktop (for Macintosh support, please contact your Customer Success Manager)
- RAM: at least 4GB
- Hard drive space: at least 2GB for the software, and more for solution and content files
- Processor: dual-core AMD64 or Intel EM64T
- USB port
Online courses require a broadband Internet connection, the use of a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to GoToTraining. For more information on GoToTraining requirements, see http://www.gotomeeting.com/online/training. Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.
For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.
Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at firstname.lastname@example.org for more information regarding on-site training requirements.
Module 1: Introduction to Data Mining
Lesson 1: Objectives
Lesson 2: Definition, Tasks, and Processes
Lesson 3: Overview of Pentaho Tools
Module 2: Data Understanding and Preparation / Engineering
Lesson 1: Understand the ClearWireless Business and Problem Domain
Lesson 2: Data Preparation with PDI
Lab 1: Creating the Predictive Dataset
Module 3: Predictive Modeling
Lesson 1: Knowledge Representation
- Naive Bayes
Lesson 2: Building Models in the Weka Explorer
Lab 2: Using the Explorer
- Exercise 1: Load ClearWireless data into Weka Explorer
- Exercise 2: Review Data Characteristics
- Exercise 3: Build a Logistic Regression Model to Predict "Added_Item_to_Cart"
- Exercise 4: Build a Decision Tree to Predict "Added_Item_to_Cart"
- Exercise 5: Review Results and Save Model
Lab 3: Building Models in the Weka Knowledge Flow
- Exercise 1: Develop a Knowledge Flow Process to Build and Save Models
Lesson 3: Data Preparation Revisited
- Missing Values
- Algorithm Specific Data Preparation
Module 4: Evaluating Predictive Models
Lesson 1: Basic Evaluation Metrics
Lesson 2: Ranking Performance
Lab 4: Comparing the Ranking Performance of Two Classifiers in the Knowledge Flow
Module 5: Operationalizing Predictive Models
Lesson 1: Deploying a Model in PDI
Lab 5: Using the Weka Scoring PDI Step
- Exercise 1: Importing a Weka Model into PDI
- Exercise 2: Scoring Data with Weka Scoring
Lesson 2: Refreshing / Rebuilding a Model in PDI
Lab 6: Using the Knowledge Flow PDI Step
- Exercise 1: Designing the Knowledge Flow Process and Configuring the Step
- Exercise 2: Orchestrating Model Building and Scoring Transformations from a PDI Job
Lesson 3: Viewing the Predictive Results in the Pentaho User Console
Lab 7: Creating Visualizations for Viewing Predictive Results
- Exercise 1: Using Pentaho Analyzer to Drill, Pivot, and Chart Predictive Data
- Exercise 2: Using Pentaho Dashboard to Display Predictive Data