Pentaho Big Data Test Drive Workshop

Training Course

Many companies are turning to Hadoop, NoSQL, and analytical databases to tackle challenges related to the increased variety, velocity, and volume of information. The Pentaho Big Data Test Drive Workshop teaches you how to use Pentaho with these big data technologies by implementing four high-impact big data use cases through a combination of instructor-led presentations, demonstrations, and hands-on exercises. In this class you will use Pentaho to extract, prepare, and blend data from disparate sources in order to derive insights using visualizations and analytics.

You will gain hands-on experience for the following four Big Data use cases:

1. Fill the Data Lake - Use Pentaho to onboard new data sources into Hadoop.  As organizations scale data onboarding from just a few sources to hundreds or more, data engineering time and resources can be monopolized. The process typically involves creating hundreds of hard-coded data movement procedures in a practice that is often highly manual and error-prone. Practice ways to fill the data lake with Pentaho by ingesting multiple data sources. Simplify onboarding this data using Pentaho's proprietary metadata injection methodology.

2. Create a Data Refinery - Use Pentaho to process data at scale in Hadoop. Take your data lake to the next step by creating a data refinery using MapReduce, Impala, and Spark to streamline data processing and delivery. The data refinery becomes the landing and processing zone for data from many diverse sources before it is pushed downstream to an analytical database for rapid queries. When this is done, ETL and data management cost savings are scaled up, and big data becomes an essential part of the analytics process. Engineer data on Hadoop and Spark by processing, blending, and aggregating data for the business.

3. Self-Service Data Preparation - Use Pentaho to prepare data for the business using Impala.  Too many analysts are stuck with data preparation processes that rely on coding or scripting by a data engineer. Let teams do more with existing resources and skills by empowering a broad set of analysts to prepare the data they need in a self-service fashion without waiting on IT - but within the boundaries defined by IT. Prepare data with Impala to blend, bin, and visualize data from multiple sources.

4. Self-Service Analytics - Use Pentaho to analyze and report a 360° view with HBase, RDBMS, and Impala. The 360° view blends a variety of operational and transactional data sources to create an on-demand analytical view across customer touch points. It also includes providing customer-facing employees and partners with information made available inside everyday line-of-business applications. Develop a 360° view using Impala on a variety of data from HBase and RDBMS and visualize it within Pentaho.
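
The idea behind use case 1, replacing many hard-coded data movement procedures with one template driven by per-source metadata, can be sketched in a few lines of Python. This is only an illustration of the general pattern and is not Pentaho's metadata injection API; the SOURCES table, the source names, and the ingest function are all hypothetical.

```python
import csv
import io

# Hypothetical per-source metadata (assumption for illustration only).
# Each entry describes how to parse one source: its delimiter and the
# name and type of each column. None of these names come from Pentaho.
SOURCES = {
    "sensors": {"delimiter": ",", "fields": [("device_id", str), ("temp_c", float)]},
    "orders": {"delimiter": "|", "fields": [("order_id", int), ("amount", float)]},
}


def ingest(source_name, raw_text):
    """Parse delimited text into typed records using the source's metadata."""
    meta = SOURCES[source_name]
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Pair each column value with its (name, type) entry and cast it.
        record = {name: cast(value) for (name, cast), value in zip(meta["fields"], row)}
        records.append(record)
    return records
```

Onboarding a new source in this sketch means adding one entry to SOURCES rather than writing a new procedure, which is the property the use case description emphasizes.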

Training Agenda

9:00am-9:30am Registration and Breakfast
9:30am-10:15am Welcome and Introduction
10:15am-12:00pm Use Case 1: Fill the Data Lake
12:00pm-1:00pm Hosted Lunch
1:00pm-2:30pm Use Case 2: Create a Data Refinery
2:30pm-2:45pm Break
2:45pm-3:30pm Use Case 3: Self-Service Data Preparation
3:30pm-4:15pm Use Case 4: Self-Service Analytics
4:15pm-4:30pm Summary and Q&A


Level: Advanced
Audience: Technical architects, Technology leaders and stakeholders, Developers, Project team leaders and members
Duration: 1 Day
Cost: $100.00 USD
Category: Big Data Integration


7 hours

Upcoming Classes

No classes have been scheduled, but you can always Request a Quote.

During this hands-on training, you will:

  • Review the overall Pentaho platform and Hadoop architecture to set the foundation for a Pentaho-based big data solution.
  • Cover Pentaho blueprints on how to use Pentaho to address four big data use cases with platform components including Pentaho Data Integration (PDI) to integrate big data sources, Pentaho Analyzer to analyze big data with Impala, and Pentaho Report Designer and Dashboard Designer to build and design reports and dashboards.
  • Learn four high-value big data use cases - Fill the Data Lake, Create a Data Refinery, Self-Service Data Preparation, and Self-Service Analytics.


This hands-on training is intended for:

  • Technical architects
  • Technology leaders and stakeholders
  • Developers
  • Project team leaders and members

Attendees should have:

  • A high-level vision of the potential solution
  • Experience in database design and development, SQL programming, and reporting
  • Prior experience with ETL and reporting tools recommended
  • Basic understanding of big data technologies such as MapReduce, Spark, Kafka, Postgres, and HBase recommended

Pentaho will provide all hardware and software required to participate in this workshop. Attendees will be working with Pentaho as well as VMware virtual machine (VM) software with Cloudera and Postgres database sets.

Day 1

Module 1: Fill the Data Lake

Use PDI to ingest file sources into Hadoop, leveraging metadata injection
Use PDI with Kafka to ingest IoT data

Module 2: Create a Data Refinery

Use PDI to build Visual MapReduce jobs to transform and blend data
Use PDI to orchestrate MapReduce and Spark jobs

Module 3: Self-Service Data Preparation

Use Impala to blend IoT data and visualize in PDI

Module 4: Self-Service Analytics

Use Analyzer to create an analysis of data delivered by PDI to Impala and Postgres

Analyze HBase data with Analyzer by using PDI to bring it into Postgres

Analyze HBase and Postgres data with Analyzer leveraging PDI Data Services

Onsite Training

For groups of six or more
