Big Data Hadoop

Introduction to Big Data, Hadoop and its Ecosystem, MapReduce and HDFS

  • What is Big Data?
  • Where does Hadoop fit in?
  • Hadoop Distributed File System
    • Replication
    • Block Size
    • Secondary NameNode
    • High Availability
  • Understanding YARN
    • Resource Manager
    • Node Manager
    • Differences between Hadoop 1.x and 2.x
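The replication and block-size topics above come down to simple arithmetic. The following is a minimal Java sketch of how HDFS splits a file into blocks and how replication multiplies raw storage. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The class and method names are hypothetical and do not use the Hadoop API.

```java
// Sketch of HDFS block and replication arithmetic (not the Hadoop API).
// Assumes Hadoop 2.x defaults: dfs.blocksize = 128 MB, dfs.replication = 3.
final class HdfsBlockMath {
    static final long MB = 1024L * 1024L;
    static final long DEFAULT_BLOCK_SIZE = 128 * MB; // assumed dfs.blocksize default
    static final int DEFAULT_REPLICATION = 3;        // assumed dfs.replication default

    /** Number of blocks a file occupies; the last block may be only partially filled. */
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    /** Size of the final block; HDFS does not pad it out to the full block size. */
    static long lastBlockBytes(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        long rem = fileBytes % blockBytes;
        return rem == 0 ? blockBytes : rem;
    }

    /** Total block replicas stored across all DataNodes. */
    static long replicaCount(long fileBytes, long blockBytes, int replication) {
        return blockCount(fileBytes, blockBytes) * replication;
    }

    /** Raw disk consumed cluster-wide: every byte is stored `replication` times. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 300 * MB;
        System.out.println(blockCount(file, DEFAULT_BLOCK_SIZE));                        // 3
        System.out.println(lastBlockBytes(file, DEFAULT_BLOCK_SIZE) / MB);               // 44
        System.out.println(replicaCount(file, DEFAULT_BLOCK_SIZE, DEFAULT_REPLICATION)); // 9
        System.out.println(rawBytes(file, DEFAULT_REPLICATION) / MB);                    // 900
    }
}
```

For a 300 MB file this gives three blocks. The last block holds only 44 MB. There are nine block replicas spread across DataNodes, and 900 MB of raw disk is used in total.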

Hadoop Installation & Setup

  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability
  • A Typical Production Cluster Setup
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Cloudera Single-Node Cluster

Deep Dive into MapReduce

  • How Does MapReduce Work?
  • How Does the Reducer Work?
  • How Does the Driver Work?
  • Define Combiners
  • Define Partitioners
  • Define Input Formats & Output Formats
  • Explain Shuffle and Sort
  • Define Map-Side Joins
  • How to Perform Reduce-Side Joins
  • Explain MRUnit
  • Distributed Cache
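The mapper, combiner, partitioner, shuffle-and-sort and reducer topics above can be traced end to end with a plain-Java simulation of the classic word-count job. This is a sketch under stated assumptions, not the Hadoop API. There are no `Mapper`, `Reducer` or `Job` classes here, and helper names such as `mapSplit`, `combine` and `partition` are hypothetical. The partition formula matches the one used by Hadoop's default `HashPartitioner`. However, here it is applied to `String.hashCode()` rather than to the hash of Hadoop's `Text` type, so key-to-reducer assignments will differ from a real job.

```java
import java.util.*;

// Plain-Java simulation of one word-count job: map -> combine -> partition -> shuffle/sort -> reduce.
// Not the Hadoop API; mapSplit/combine/partition/run are hypothetical helpers for illustration.
final class MiniMapReduce {

    /** Map phase: emit (word, 1) for every token in one input split. */
    static List<Map.Entry<String, Integer>> mapSplit(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : split.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) out.add(Map.entry(token, 1));
        }
        return out;
    }

    /** Combiner: a local reduce over one mapper's output, shrinking what is sent to reducers. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> local = new HashMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) local.merge(e.getKey(), e.getValue(), Integer::sum);
        return local;
    }

    /** Same formula as Hadoop's default HashPartitioner, applied here to String.hashCode(). */
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Runs the whole pipeline and returns one sorted word -> count map per reducer. */
    static List<SortedMap<String, Integer>> run(List<String> splits, int numReducers) {
        // Shuffle: each reducer collects, per key, the values from all mappers; TreeMap provides the sort.
        List<SortedMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) shuffled.add(new TreeMap<>());

        for (String split : splits) { // one simulated mapper per input split
            for (Map.Entry<String, Integer> e : combine(mapSplit(split)).entrySet()) {
                int r = partition(e.getKey(), numReducers);
                shuffled.get(r).computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }

        // Reduce phase: sum the grouped values for each key, in sorted key order.
        List<SortedMap<String, Integer>> results = new ArrayList<>();
        for (SortedMap<String, List<Integer>> group : shuffled) {
            SortedMap<String, Integer> reduced = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : group.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) sum += v;
                reduced.put(e.getKey(), sum);
            }
            results.add(reduced);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("the quick brown fox", "the lazy dog and the fox");
        System.out.println(run(splits, 2));
    }
}
```

Word counts are summed, and summation is associative and commutative. That is what makes the reduce logic safe to reuse as a combiner. A job computing an average, for example, could not reuse its reducer this way without changing the intermediate values it emits.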

Lab Exercises

  • Working with HDFS
  • Writing a WordCount Program
  • Writing a Custom Partitioner
  • MapReduce with a Combiner
  • Map-Side Join
  • Reduce-Side Joins
  • Unit Testing MapReduce
  • Running MapReduce in Local Job Runner Mode

Graph Problem Solving

  • What is a Graph?
  • How Does Graph Representation Work?
  • Breadth-First Search Algorithm
  • Graph Representation in MapReduce
  • How to Work with Graph Algorithms
  • Example of Graph MapReduce
  • Detailed Understanding of Pig
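The graph topics above lead naturally to parallel breadth-first search. In this pattern, each MapReduce pass pushes distances one hop further from the source, and a driver reruns the job until nothing changes. The sketch below simulates that pattern in plain Java for an unweighted graph. The adjacency-list encoding, the `INF` sentinel for unreached nodes and the method names are assumptions for illustration and are not a Hadoop API. Every node must appear as a key, with an empty list if it has no outgoing edges.

```java
import java.util.*;

// Plain-Java sketch of breadth-first search as repeated MapReduce passes (not the Hadoop API).
// Map: each node re-emits its own distance and offers (distance + 1) to every neighbor.
// Reduce: each node keeps the smallest distance it was offered.
final class BfsMapReduce {
    static final int INF = Integer.MAX_VALUE; // hypothetical sentinel for "not reached yet"

    /** One simulated MapReduce pass; every node in the graph must appear as a key. */
    static Map<String, Integer> iterate(Map<String, List<String>> graph, Map<String, Integer> dist) {
        // Map phase, with grouping by node id standing in for the framework's shuffle.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (Map.Entry<String, List<String>> node : graph.entrySet()) {
            int d = dist.get(node.getKey());
            grouped.computeIfAbsent(node.getKey(), k -> new ArrayList<>()).add(d); // keep own distance
            if (d == INF) continue; // unreached nodes have nothing to propagate
            for (String nbr : node.getValue()) {
                grouped.computeIfAbsent(nbr, k -> new ArrayList<>()).add(d + 1);
            }
        }
        // Reduce phase: the shortest candidate distance wins.
        Map<String, Integer> next = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            next.put(e.getKey(), Collections.min(e.getValue()));
        }
        return next;
    }

    /** Driver loop: rerun the pass until no distance changes (a fixed point). */
    static Map<String, Integer> bfs(Map<String, List<String>> graph, String source) {
        Map<String, Integer> dist = new HashMap<>();
        for (String n : graph.keySet()) dist.put(n, n.equals(source) ? 0 : INF);
        while (true) {
            Map<String, Integer> next = iterate(graph, dist);
            if (next.equals(dist)) return next;
            dist = next;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("D"),
            "D", List.of("E"),
            "E", List.of(),
            "F", List.of());
        System.out.println(new TreeMap<>(bfs(graph, "A")));
    }
}
```

The number of passes grows with the distance from the source to the farthest reachable node. This is why iterative graph algorithms are costly in classic MapReduce: each pass is a separate job that rereads and rewrites the whole graph.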

Data Formats (Avro)

  • Selecting a File Format
  • Tool Support for File Formats
  • Avro Schemas
  • Using Avro with Hive and Sqoop
  • Avro Schema Evolution
  • Compression

Introduction to HBase Architecture

  • What is HBase?
  • Where does it fit?
  • What is NoSQL?

Hadoop Cluster Setup and Running MapReduce Jobs

  • Multi-Node Cluster Setup Using Amazon EC2: Creating a 4-Node Cluster
  • Running MapReduce Jobs on the Cluster

Advanced MapReduce

  • Delving Deeper into the Hadoop API
  • More Advanced MapReduce Programming
  • Joining Data Sets in MapReduce
  • Graph Manipulation in Hadoop