Pentaho and Hadoop Framework Fundamentals

Training Course

This course introduces big data concepts using the Hadoop framework of technologies and Pentaho products. Building on Pentaho Data Integration Fundamentals, you will learn how Pentaho works with the following Hadoop framework technologies:

  • HDFS
  • Sqoop
  • Pig
  • Oozie
  • MapReduce
  • YARN
  • Hive
  • Impala
  • HBase
  • Flume
  • Spark

The course is heavily lab-based, giving you hands-on practice applying the topics covered in each section.

Description

Id: DI2000
Level: Advanced
Audience: Data Analyst
Delivery Method: Instructor-led online, Private on-site
Duration: 2 Day(s)
Cost: $1,350.00 USD
Credits: 2
Category: Pentaho Data Integration

Upcoming Classes

Instructor-led online training

Location        Dates
Online - EMEA   Dec 19 – Dec 20, 2016
Online          Jan 24 – Jan 25, 2017; Mar 7 – Mar 8, 2017

Class dates in bold are guaranteed to run!

Course Benefits

  • Improve productivity by giving your data integration team the skills they need to use Pentaho Data Integration with Hadoop data sources
  • Interactive, hands-on training materials significantly improve skill development and maximize retention

Skills Achieved

At the completion of this course, you should be able to:

  • Use Hadoop technologies from the native command line and with Pentaho Data Integration
  • Employ data ingestion and processing best practices

Target Audience

This course is for experienced Pentaho Data Integration (PDI) users who want to learn how PDI works with a wide variety of Hadoop framework technologies. The content is advanced and highly technical.

Prerequisites

Completion of DI1000 Pentaho Data Integration Fundamentals is required before taking this course, because basic working knowledge of PDI is assumed throughout.

Basic knowledge of the Linux operating system is also required.

Prior exposure to Hadoop concepts is not required but is beneficial.

Hardware and Software Requirements

Students attending classroom courses in the United States are provided with a PC to use during class. Students attending courses outside the US should contact the Authorized Training Provider regarding PC requirements for Pentaho courses.

In general, if your training provider requires you to bring a PC to class, it must meet the following requirements. You can also verify your system against the "Compatibility Matrix: List of Supported Products" topic on the Pentaho Documentation site.

  • Windows XP or Windows 7 desktop operating system (for Macintosh support, contact your Customer Success Manager)
  • RAM: at least 4 GB
  • Hard drive space: at least 2 GB for the software, plus additional space for solution and content files
  • Processor: dual-core AMD64 or Intel EM64T
  • USB port

Online courses require a broadband Internet connection, the use of a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to GoToTraining. For more information on GoToTraining requirements, see http://www.gotomeeting.com/online/training. Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.

For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.

Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at training@pentaho.com for more information regarding on-site training requirements.

Course Outline

Day 1

Module 1: Course Agenda and Structure

Module 2: Introduction to Pentaho and Big Data

      Exercise 1: Using the Virtual Exercise Environment


Module 3: Big Data Solutions Architectures

  Lesson 1: Batch Processing Architecture

  Lesson 2: Real-Time and Stream Processing Architecture

  Lesson 3: Mixed Batch and Real-Time Processing Architecture


Module 4: Hadoop and HDFS

  Lesson 1: Basics of HDFS

  Lesson 2: Working with HDFS in PDI

      Exercise 2: Reading and Writing Data with PDI and HDFS

  Lesson 3: HDFS and PDI Best Practices


Module 5: Hadoop Data Ingestion Tools

  Lesson 1: Apache Flume

  Lesson 2: Apache Sqoop

  Lesson 3: Ingestion Best Practices


Module 6: Data Processing in Hadoop using MapReduce

  Lesson 1: Understanding Hadoop MapReduce

  Lesson 2: MapReduce with Pentaho Data Integration

      Exercise 3: Using Pentaho MapReduce

  Lesson 3: MapReduce Best Practices


Module 7: Data Processing in Hadoop using Carte/YARN

  Lesson 1: YARN Architecture

  Lesson 2: MapReduce2 on YARN

  Lesson 3: PDI/Carte on YARN

Day 2

Module 8: Data Processing with Pig

  Lesson 1: Pig Basics

  Lesson 2: Using Pig in Data Integration


Module 9: Job Orchestration with PDI and Oozie

  Lesson 1: Oozie Basics

  Lesson 2: Oozie with PDI


Module 10: Overview of SQL on Hadoop - Best Practices

  Lesson 1: Hive Basics

  Lesson 2: Impala Basics

  Lesson 3: Using Hive / Impala with PDI

      Exercise 4: Working with Hive and Impala

  Lesson 4: Hive Best Practices


Module 11: Overview of HBase

  Lesson 1: HBase Basics

  Lesson 2: HBase with PDI

  Lesson 3: Using HBase with PDI MapReduce

      Exercise 5: Working with HBase

  Lesson 4: HBase and PDI Best Practices


Module 12: Overview of Spark

  Lesson 1: Spark Basics

  Lesson 2: Spark SQL

  Lesson 3: Spark Streaming

  Lesson 4: Spark MLlib and SparkR

  Lesson 5: Spark GraphX

  Lesson 6: Spark with PDI


Module 13: Reporting on Big Data

  Lesson 1: Pentaho Report Designer with Hadoop

  Lesson 2: Analyzer with Hadoop


Module 14: (Optional) PDI with Amazon Hadoop

Onsite Training

For groups of six or more

Pentaho and Hadoop Framework Fundamentals Ratings

Averaged from 55 responses.

Rated categories: Training Organized, Training Objectives, Training Expectations, Training Curriculum, Training Labs, Training Overall