· Introduction
· The Motivation for Hadoop
o Problems with traditional large-scale systems
o Requirements for a new approach
· Hadoop: Basic Concepts
o An Overview of Hadoop
o The Hadoop Distributed File System
o Hands-On Exercise
o How MapReduce Works
o Anatomy of a Hadoop Cluster
o Other Hadoop Ecosystem Components
· Writing a MapReduce Program
o The MapReduce Flow
o Examining a Sample MapReduce Program
o Basic MapReduce API Concepts
o The Driver Code
o The Mapper
o The Reducer
o Hadoop’s Streaming API
o Using Eclipse for Rapid Development
o Hands-On Exercise
o The New MapReduce API
· Integrating Hadoop Into The Workflow
o Relational Database Management Systems
o Storage Systems
o Importing Data from RDBMSs With Sqoop
o Hands-On Exercise
o Importing Real-Time Data with Flume
o Accessing HDFS Using FuseDFS and Hoop
· Delving Deeper Into The Hadoop API
o More about ToolRunner
o Testing with MRUnit
o Reducing Intermediate Data With Combiners
o The configure and close methods for Map/Reduce Setup and Teardown
o Writing Partitioners for Better Load Balancing
o Hands-On Exercise
o Directly Accessing HDFS
o Using the Distributed Cache
· Common MapReduce Algorithms
o Sorting and Searching
o Indexing
o Machine Learning With Mahout
o Term Frequency – Inverse Document Frequency
o Word Co-Occurrence
o Hands-On Exercise
· Using Hive and Pig
o Hive Basics
o Pig Basics
o Hands-On Exercise
· Practical Development Tips and Techniques
o Debugging MapReduce Code
o Using LocalJobRunner Mode For Easier Debugging
o Retrieving Job Information with Counters
o Logging
o Splittable File Formats
o Determining the Optimal Number of Reducers
o Map-Only MapReduce Jobs
o Hands-On Exercise
· More Advanced MapReduce Programming
o Custom Writables and WritableComparables
o Saving Binary Data using SequenceFiles and Avro Files
o Creating InputFormats and OutputFormats
o Hands-On Exercise
· Joining Data Sets in MapReduce
o Map-Side Joins
o The Secondary Sort
o Reduce-Side Joins
· Graph Manipulation in Hadoop
o Introduction to graph techniques
o Representing graphs in Hadoop
o Implementing a sample algorithm: Single Source Shortest Path
· Creating Workflows With Oozie
o The Motivation for Oozie
o Oozie’s Workflow Definition Format
o Hands-On Exercise