·       
Introduction
·       
The Motivation
for Hadoop
o  
Problems with traditional
large-scale systems
o  
Requirements for a new approach
 
·       
Hadoop: Basic
Concepts
o  
An Overview of Hadoop
o  
The Hadoop Distributed File System
o  
Hands-On Exercise
o  
How MapReduce Works
o  
Anatomy of a Hadoop Cluster
o  
Other Hadoop Ecosystem Components
 
·       
Writing a MapReduce
Program
o  
The MapReduce Flow
o  
Examining a Sample MapReduce
Program
o  
Basic MapReduce API Concepts
o  
The Driver Code
o  
The Mapper
o  
The Reducer
o  
Hadoop’s Streaming API
o  
Using Eclipse for Rapid Development
o  
Hands-On Exercise
o  
The New MapReduce API
 
·       
Integrating
Hadoop Into The Workflow
o  
Relational Database Management
Systems
o  
Storage Systems
o  
Importing Data from RDBMSs With
Sqoop
o  
Hands-On Exercise
o  
Importing Real-Time Data with Flume
o  
Accessing HDFS Using FuseDFS and
Hoop
 
·       
Delving Deeper
Into The Hadoop API
o  
More about ToolRunner
o  
Testing with MRUnit
o  
Reducing Intermediate Data With
Combiners
o  
The configure and close methods for
Map/Reduce Setup and Teardown
o  
Writing Partitioners for Better
Load Balancing
o  
Hands-On Exercise
o  
Directly Accessing HDFS
o  
Using the Distributed Cache
 
·       
Common MapReduce
Algorithms
o  
Sorting and Searching
o  
Indexing
o  
Machine Learning With Mahout
o  
Term Frequency – Inverse Document
Frequency
o  
Word Co-Occurrence
o  
Hands-On Exercise
 
·       
Using Hive and
Pig
o  
Hive Basics
o  
Pig Basics
o  
Hands-On Exercise
 
·       
Practical
Development Tips and Techniques
o  
Debugging MapReduce Code
o  
Using LocalJobRunner Mode For
Easier Debugging
o  
Retrieving Job Information with
Counters
o  
Logging
o  
Splittable File Formats
o  
Determining the Optimal Number of
Reducers
o  
Map-Only MapReduce Jobs
o  
Hands-On Exercise
 
 
 
·       
More Advanced
MapReduce Programming
o  
Custom Writables and WritableComparables
o  
Saving Binary Data Using SequenceFiles
and Avro Files
o  
Creating InputFormats and OutputFormats
o  
Hands-On Exercise
 
·       
Joining Data Sets
in MapReduce
o  
Map-Side Joins
o  
The Secondary Sort
o  
Reduce-Side Joins
 
·       
Graph
Manipulation in Hadoop
o  
Introduction to Graph Techniques
o  
Representing Graphs in Hadoop
o  
Implementing a sample algorithm:
Single Source Shortest Path
 
·       
Creating
Workflows With Oozie
o  
The Motivation for Oozie
o  
Oozie’s Workflow Definition Format
o  
Hands-On Exercise