Big Data Fundamentals

Training Course

With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.

Pentaho provides the right set of tools to each user, all within a tightly coupled data integration and analytics platform that supports the entire big data lifecycle. For IT and developers, Pentaho provides a complete, visual design environment to simplify and accelerate data preparation and modeling. For business users, Pentaho provides visualization and exploration of data. And for data analysts and scientists, Pentaho provides full data discovery, exploration and predictive analytics.

Using a combination of instructor-led presentations and hands-on exercises, this course gives an overview of big data technologies (the first day focuses on Hadoop) and of the Pentaho tools for working with big data and visualizing it. This course helps prepare you for the Pentaho Data Integration Certification Exam.

Description

Id: DI2000
Level: Intermediate
Audience: Data Analyst
Delivery Method: Instructor-led online, Private on-site
Duration: 2 Days
Cost: $1,950.00 USD
Credits: 3
Category: Pentaho Data Integration

DI2000 Big Data Fundamentals covers all of the content in DI1100 Pentaho Big Data Integration and adds material on big data concepts, tools and technologies.

If you complete the DI2000 Big Data Fundamentals course, you should not register for DI1100 Pentaho Big Data Integration.

Upcoming Classes

Online

Instructor-led online training

Online: May 6 – May 7, 2014

Skills Achieved

At the completion of this course, you should be able to:

  • Identify the purpose and value of various big data technologies: Hadoop, HDFS, Hive, MapReduce, NoSQL databases, etc.
  • Read and write data using HDFS
  • Orchestrate big data jobs in Pentaho Data Integration
  • Use Pentaho Data Integration (and Pentaho MapReduce) to manipulate big data
  • Read and write data using a NoSQL data source
  • Visualize big data using Pentaho InstaView

This course is a stand-alone course in the Data Analyst learning path. Students who need a comprehensive overview of big data tools and technologies should take this course instead of DI1100 Pentaho Big Data Integration.

Before taking this class, students should complete course DI1000: Pentaho Data Integration or have equivalent field experience with Pentaho Data Integration. Big data knowledge is helpful but not required. Some basic knowledge of the Linux operating system (CentOS) is required.

Though not a requirement, attendees would benefit from taking Business Analytics User Console (BA1000) prior to taking this class to gain an overview of the Pentaho Business Analytics interface.

Online courses require a broadband Internet connection, a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to the WebEx Training Center. For more information on WebEx Training Center requirements, see www.webex.com. Students are given access to a downloadable virtual machine used to complete the exercises.

Minimum hardware requirements for the host machine and the virtual machine are as follows:

Host Machine:

  • CPU: 2 Cores
  • RAM: 8 GB
  • Disk: 30 GB free

Virtual Machine:

  • CPU: 2 Cores
  • RAM: 6 GB
  • Disk: 30 GB

VMware client software (any one of the following products will work):

Windows:

  • VMware Workstation 10.0+ (free for 30 days; this is the preferred option for Windows)
  • VMware Player 6.0+ (free for personal use)

Mac:

  • VMware Fusion 6.0+ (free for 30 days)

Linux:

  • VMware Workstation 10.0+ (free for 30 days; this is the preferred option for Linux)
  • VMware Player 6.0+ (free)

For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.

Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at training@pentaho.com for more information regarding on-site training requirements.

Day 1 Agenda

Pentaho and Big Data

Big Data Overview and Architecture

Hadoop, HDFS and Flume

Writing Data to HDFS using Flume

Working with Structured Data

Working with MapReduce

Working with Pentaho MapReduce

Day 2 Agenda

Working with Hive

Working with Pentaho InstaView

Reporting on Big Data

Working with NoSQL Databases

Job Orchestration

Oozie, Pig and Sqoop

Transforming Data using Pig