Pentaho Big Data Test Drive Workshop
Many companies are turning to Hadoop, NoSQL, and analytical databases to tackle challenges related to the increased variety, velocity, and volume of information. The Pentaho Big Data Test Drive Workshop teaches you how to use Pentaho with these new big data technologies by implementing four high-impact big data use cases through a combination of instructor-led presentations, demonstrations, and hands-on exercises. In this class you will use Pentaho to extract, prepare, and blend data from disparate sources in order to derive insights using visualizations and analytics.
You will gain hands-on experience for the following four Big Data use cases:
1. Fill the Data Lake - Use Pentaho to onboard new data sources into Hadoop. As organizations scale data onboarding from just a few sources to hundreds or more, data engineering time and resources can be monopolized. The process typically involves creating hundreds of hard-coded data movement procedures in a practice that is often highly manual and error-prone. Practice ways to fill the data lake with Pentaho by ingesting multiple data sources. Simplify onboarding this data using Pentaho's proprietary metadata injection methodology.
2. Create a Data Refinery - Use Pentaho to process data at scale in Hadoop. Take your data lake to the next step by creating a data refinery that uses MapReduce, Impala, and Spark to streamline data processing and delivery. The data refinery becomes the landing and processing zone for data from many diverse sources before it is pushed downstream to an analytical database for rapid queries. When this is done, ETL and data management cost savings are scaled up, and big data becomes an essential part of the analytics process. Engineer data on Hadoop and Spark by processing, blending, and aggregating data for the business.
3. Self-Service Data Preparation - Use Pentaho to prepare data for the business using Impala. Too many analysts are stuck with data preparation processes that rely on coding or scripting by a data engineer. Let teams do more with existing resources and skills by empowering a broad set of analysts to prepare the data they need in a self-service fashion without waiting on IT - but within the boundaries defined by IT. Prepare data with Impala to blend, bin, and visualize data from multiple sources.
4. Self-Service Analytics - Use Pentaho to analyze and report on a 360° view with HBase, RDBMS, and Impala. The 360° view blends a variety of operational and transactional data sources to create an on-demand analytical view across customer touch points. It also provides customer-facing employees and partners with information made available inside everyday line-of-business applications. Develop a 360° view using Impala on a variety of data from HBase and RDBMS and visualize it within Pentaho.
| Time | Session |
| --- | --- |
| 9:00am - 9:30am | Registration and Breakfast |
| 9:30am - 10:15am | Welcome and Introduction |
| 10:15am - 12:00pm | Use Case 1: Fill the Data Lake |
| 1:00pm - 2:30pm | Use Case 2: Create a Data Refinery |
| 2:45pm - 3:30pm | Use Case 3: Self-Service Data Preparation |
| 3:30pm - 4:15pm | Use Case 4: Self-Service Analytics |
| 4:15pm - 4:30pm | Summary and Q&A |
During this hands-on training, you will:
- Review overall Pentaho platform and Hadoop architecture to set the foundation for a Pentaho-based big data solution.
- Cover Pentaho blueprints for addressing four big data use cases with platform components, including Pentaho Data Integration (PDI) to integrate big data sources, Pentaho Analyzer to analyze big data with Impala, and Pentaho Report Designer and Dashboard Designer to build reports and dashboards.
- Learn four high-value big data use cases - Fill the Data Lake, Create a Data Refinery, Self-Service Data Preparation, and Self-Service Analytics.
Attendees should have:
- Overall vision of potential solution at a high level
- Experience in database design and development, SQL programming, and reporting
- Prior experience with ETL and reporting tools recommended
- Basic understanding of different big data technologies such as MapReduce, Spark, Kafka, Postgres, and HBase recommended
Module 1: Fill the Data Lake
Use PDI to ingest file sources into Hadoop, leveraging metadata injection
Use PDI with Kafka to ingest IoT data
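In PDI these flows are built visually, with the Metadata Injection step supplying per-source settings to one reusable template transformation. As a tool-agnostic illustration of that idea, here is a minimal Python sketch; the source names, delimiters, and field lists are hypothetical examples, and the rows would land in HDFS rather than a Python list.

```python
import csv, io

# Metadata describing each source: its delimiter and the fields to keep.
# One generic routine driven by this table replaces hundreds of
# hard-coded, per-source ingestion jobs.
SOURCE_METADATA = {
    "web_logs":  {"delimiter": ",", "fields": ["ts", "url", "status"]},
    "pos_sales": {"delimiter": "|", "fields": ["ts", "store", "amount"]},
}

def ingest(source_name, raw_text):
    """Generic loader: its behavior is injected from SOURCE_METADATA."""
    meta = SOURCE_METADATA[source_name]
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    # Keep only the fields the metadata declares for this source.
    return [{f: row[f] for f in meta["fields"]} for row in reader]

rows = ingest("pos_sales", "ts|store|amount\n2016-01-01|42|19.99\n")
print(rows)  # [{'ts': '2016-01-01', 'store': '42', 'amount': '19.99'}]
```

Onboarding a new source then means adding one metadata entry, not writing a new job.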
Module 2: Create a Data Refinery
Use PDI to build Visual MapReduce jobs to transform and blend data
Use PDI to orchestrate MapReduce and Spark jobs
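The refinery pattern these exercises follow is the classic map/reduce shape: map raw events to (key, value) pairs, then reduce them to aggregates before pushing results downstream. In class this is built visually in PDI and executed on MapReduce or Spark; the sketch below shows the same logic in plain Python with hypothetical sensor events.

```python
from collections import defaultdict

# Hypothetical raw events landing in the refinery.
events = [
    {"device": "a", "reading": 3.0},
    {"device": "b", "reading": 5.0},
    {"device": "a", "reading": 7.0},
]

# Map phase: emit (device, reading) pairs from each event.
mapped = [(e["device"], e["reading"]) for e in events]

# Reduce phase: sum readings per device key.
totals = defaultdict(float)
for key, value in mapped:
    totals[key] += value

print(dict(totals))  # {'a': 10.0, 'b': 5.0}
```

The aggregated output is what gets pushed downstream to the analytical database for rapid queries.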
Module 3: Self-Service Data Preparation
Use Impala to blend IoT data and visualize in PDI
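The "blend and bin" preparation step runs in the workshop as Impala SQL generated through Pentaho; the Python sketch below shows the same logic (join two sources, then bucket a numeric column) with made-up customer and order records.

```python
# Source 1: a hypothetical customer-tier lookup (e.g., from an RDBMS).
customers = {"c1": "Gold", "c2": "Silver"}
# Source 2: hypothetical order amounts (e.g., from IoT or sales feeds).
orders = [("c1", 250.0), ("c2", 40.0), ("c1", 90.0)]

def bin_amount(amount):
    """Bucket order amounts into coarse bins for visualization."""
    if amount < 50:
        return "small"
    if amount < 200:
        return "medium"
    return "large"

# Blend: join orders to tiers by customer id, adding the bin label.
blended = [
    {"customer": cid, "tier": customers[cid], "bin": bin_amount(amt)}
    for cid, amt in orders
]
print(blended[0])  # {'customer': 'c1', 'tier': 'Gold', 'bin': 'large'}
```

The point of the self-service model is that an analyst defines the bins and blends without writing a data engineer's pipeline code.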
Module 4: Self-Service Analytics
Use Analyzer to create an analysis of data delivered by PDI to Impala and Postgres
Analyze HBase data with Analyzer by using PDI to bring it into Postgres
Analyze HBase and Postgres data with Analyzer, leveraging PDI Data Services
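Conceptually, the 360° view assembled in this module is a merge of records keyed by customer id from several stores. The sketch below uses in-memory dictionaries as stand-ins for the HBase, RDBMS, and Impala sources; all record contents are hypothetical.

```python
# Stand-ins for the three sources blended into the 360° view.
profile_db = {"c1": {"name": "Acme Corp", "segment": "enterprise"}}  # RDBMS
interactions = {"c1": {"last_ticket": "2016-03-02"}}                 # HBase
order_stats = {"c1": {"lifetime_value": 1200.0}}                     # Impala

def customer_360(cid):
    """Combine all sources into one on-demand analytical record."""
    view = {"customer_id": cid}
    for source in (profile_db, interactions, order_stats):
        view.update(source.get(cid, {}))
    return view

print(customer_360("c1")["lifetime_value"])  # 1200.0
```

In the workshop, PDI Data Services play a similar role: they expose the blended view on demand so Analyzer can query it without a separate materialization step.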