SRI NARMADA IT TRAINING: Hadoop

Hadoop

· What is Big Data?

· What is Hadoop?

· What is the relation between Big Data and Hadoop?

· Why do we need Hadoop?

· Scenarios to adopt Hadoop Technology in REAL TIME Projects

· Challenges with Big Data

Ø Storage

Ø Processing

· How Hadoop is addressing Big Data Challenges

· Comparison with Other Technologies

Ø RDBMS

Ø Data Warehouse

Ø Teradata

· Different Components of the Hadoop Ecosystem

Ø Storage Components

Ø Processing Components

HDFS (Hadoop Distributed File System)

· What is a cluster Environment?

· Cluster Vs Hadoop Cluster

· Significance of HDFS in Hadoop

· Features of HDFS

· Storage aspects of HDFS

Ø Block

Ø How to Configure Block size

Ø Default Vs Configurable Block size

Ø Why is the HDFS Block Size so large?

Ø Design Principles of Block Size

· HDFS and MapReduce Architecture - 5 Daemons of Hadoop 1.x

Ø NameNode and its functionality

Ø DataNode and its functionality

Ø JobTracker and its functionality

Ø TaskTracker and its functionality

Ø Secondary NameNode and its functionality

· Replication in Hadoop - Failover Mechanism

Ø Data Storage in DataNodes

Ø Failover Mechanism in Hadoop - Replication

Ø Replication Configuration

Ø Custom Replication

Ø Design Constraints with Replication Factor

· Accessing HDFS

Ø CLI (Command Line Interface) and HDFS Commands

Ø Java-Based Approach

· Hadoop Archives
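The block size and replication topics above come down to simple arithmetic. The code below is a minimal sketch in plain Java. It is not a Hadoop API, and the class and method names are hypothetical. It estimates how many HDFS blocks a file occupies and how much raw disk space it consumes under these assumptions:

- The block size is 128 MB, the dfs.blocksize default in Hadoop 2.x and later. Hadoop 1.x defaulted to 64 MB.
- The replication factor is 3, the default dfs.replication.

```java
/** Back-of-the-envelope HDFS storage arithmetic (illustration only, not a Hadoop API). */
public class HdfsStorageEstimate {
    static final long MB = 1024L * 1024L;

    // Number of blocks a file is split into: ceiling(fileSize / blockSize).
    // HDFS does not pad the last block, so it only holds the leftover bytes.
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Raw cluster capacity used: every byte is stored replicationFactor times.
    static long rawBytes(long fileSizeBytes, int replicationFactor) {
        return fileSizeBytes * replicationFactor;
    }

    public static void main(String[] args) {
        long file = 1000 * MB;   // example: a 1,000 MB file
        long block = 128 * MB;   // assumed block size (Hadoop 2.x default dfs.blocksize)
        int replication = 3;     // assumed replication factor (default dfs.replication)
        long blocks = blockCount(file, block);
        System.out.println(blocks + " blocks, " + blocks * replication + " block replicas, "
                + rawBytes(file, replication) / MB + " MB raw storage");
    }
}
```

For a 1,000 MB file this prints "8 blocks, 24 block replicas, 3000 MB raw storage". The eighth block holds only the remaining 104 MB. Because HDFS does not pad a partial last block, the replication factor, not the block size, is what multiplies the disk footprint.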

MapReduce

· Why is Map Reduce essential in Hadoop?

· Processing daemons of Hadoop

Ø Job Tracker

§ Roles of Job Tracker

§ Drawbacks w.r.t Job Tracker failure in Hadoop Cluster

§ How to configure Job tracker in Hadoop Cluster

Ø Task Tracker

§ Roles of Task Tracker

§ Drawbacks w.r.t Task Tracker failure in Hadoop Cluster

· Input Split

Ø InputSplit

Ø Need Of Input Split in Map Reduce

Ø InputSplit Size

Ø InputSplit Size Vs Block Size

Ø InputSplit Vs Mappers

· Map Reduce Life Cycle

Ø Communication Mechanism of Job Tracker & Task Tracker

Ø Input Format Class

Ø Record Reader Class

Ø Success Case Scenarios

Ø Failure Case Scenarios

Ø Retry Mechanism in the Map Reduce

· MapReduce Programming Model

Ø Different phases of Map Reduce Algorithm

Ø Different Data types in Map Reduce

§ Primitive Data types Vs Map Reduce Data types

Ø How to write a basic Map Reduce Program

§ Driver Code

§ Mapper Code

§ Reducer Code

Ø Driver Code

§ Importance of Driver Code in a Map Reduce Program

§ How to identify the Driver code in Map Reduce Program

§ Different sections of Driver code

Ø Mapper Code

§ Importance of Mapper Phase in Map Reduce

§ How to Write a Mapper Class?

§ Methods in Mapper Class

Ø Reducer Code

§ Importance of Reduce phase in Map Reduce

§ How to Write Reducer Class?

§ Methods in Reducer Class

Ø IDENTITY MAPPER & IDENTITY REDUCER

Ø Input Formats in Map Reduce

§ TextInputFormat

§ KeyValueTextInputFormat

§ NLineInputFormat

§ DBInputFormat

§ SequenceFileInputFormat

§ How to use a specific Input Format in Map Reduce

Ø Output Formats in Map Reduce

§ TextOutputFormat

§ KeyValueTextOutputFormat

§ NLineOutputFormat

§ DBOutputFormat

§ SequenceFileOutputFormat

§ How to use a specific Output Format in Map Reduce

Ø Map Reduce API (Application Programming Interface)

§ New API

§ Deprecated API

Ø Combiner in Map Reduce

§ Is a Combiner mandatory in Map Reduce?

§ How to use the combiner class in Map Reduce

§ Performance tradeoffs w.r.t Combiner

Ø Partitioner in Map Reduce

§ Importance of the Partitioner class in Map Reduce

§ How to use the partitioner class in Map Reduce

§ HashPartitioner functionality

§ How to write a custom partitioner

Ø Compression Techniques in Map Reduce

§ Importance of Compression in Map Reduce

§ What is CODEC

§ Compression Types

§ GzipCodec

§ BZip2Codec

§ LzoCodec

§ SnappyCodec

§ Configurations w.r.t Compression Techniques

§ How to customize Compression for one job Vs all jobs

Ø Joins in Map Reduce

§ Map Side Join

§ Performance Trade Off

§ Distributed cache

Ø How to debug Map Reduce Jobs in Local and Pseudo-Distributed Mode

Ø Introduction to MapReduce Streaming

Ø Data Locality in Map Reduce

Ø Secondary Sorting Using Map Reduce
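The classic word-count example illustrates the Map Reduce flow outlined above: map, optional combiner, partitioner, shuffle/sort and reduce. The code below is a minimal in-memory simulation in plain Java, not the Hadoop Map Reduce API. Each string stands in for one input split, and the class and method names are hypothetical.

The partition method follows the formula of Hadoop's HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Here it is applied to Java String hash codes, so its partition numbers will not match those a real job computes for Text keys.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory simulation of the Map Reduce word-count data flow (illustration only, not the Hadoop API). */
public class WordCountSimulation {

    // Map phase: emit (word, 1) for every whitespace-separated token of one input record.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Combiner: pre-sums one map task's output locally, shrinking what gets shuffled.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            local.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(local.entrySet());
    }

    // Partitioner: HashPartitioner-style formula (non-negative hash modulo reducer count).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all the counts that arrived for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // One simulated map task per split, shuffle into per-reducer sorted maps, then one reduce task per partition.
    static Map<String, Integer> run(List<String> splits, int numReducers, boolean useCombiner) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps each reducer's keys sorted, like the sort phase
        }
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = map(split);
            if (useCombiner) {
                mapOutput = combine(mapOutput);
            }
            for (Map.Entry<String, Integer> e : mapOutput) {
                partitions.get(partition(e.getKey(), numReducers))
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : partitions) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("big data needs hadoop", "hadoop stores big data");
        System.out.println(run(splits, 2, true));
    }
}
```

Running main prints {big=2, data=2, hadoop=2, needs=1, stores=1}. The combiner applies the same summing logic as the reducer. That is why it can only reduce the volume of shuffled data: the final counts are the same with or without it, and for any number of reducers.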

Apache PIG

· Introduction to Apache Pig

· Map Reduce Vs Apache Pig

· SQL Vs Apache Pig

· Different data types in Pig

· Modes Of Execution in Pig

Ø Local Mode

Ø Map Reduce OR Distributed Mode

· Execution Mechanism

Ø Grunt Shell

Ø Script

Ø Embedded

· Transformations in Pig

· How to write a simple pig script

· How to develop a Complex Pig Script

· Bags, Tuples and fields in PIG

· UDFs in Pig

Ø Need of using UDFs in PIG

Ø How to use UDFs

Ø REGISTER keyword in PIG

· When to use Map Reduce & Apache PIG in REAL TIME Projects

HIVE

· Hive Introduction

· Need of Apache HIVE in Hadoop

· Hive Architecture

Ø Driver

Ø Compiler (including the Semantic Analyzer)

Ø Executor

· Metastore in Hive

Ø Importance of the Hive Metastore

Ø Embedded metastore configuration

Ø External metastore configuration

Ø Communication mechanism with Metastore

· Hive Integration with Hadoop

· Hive Query Language (HiveQL)

· Configuring Hive with a MySQL Metastore

· SQL Vs HiveQL

· Data Slicing Mechanisms

Ø Partitions In Hive

Ø Buckets In Hive

Ø Partitioning Vs Bucketing

Ø Real Time Use Cases

· Collection Data Types in Hive

Ø Array

Ø Struct

Ø Map

· User Defined Functions (UDFs) in HIVE

Ø UDFs

Ø UDAFs

Ø UDTFs

Ø Need of UDFs in HIVE

· Hive Serializer/Deserializer - SerDe

· HIVE - HBASE Integration

SQOOP

· Introduction to Sqoop

· MySQL client and Server Installation

· How to connect to a Relational Database using Sqoop

· Imports and Exports

§ Different flavors of Imports

§ Export

§ Hive-Imports

HBase

· HBase Introduction

· HDFS Vs HBase

· HBase Use Cases

· HBase Basics

Ø Column Architecture

Ø Scans

· HBase Architecture

· Clients

Ø REST

Ø Thrift

Ø Java Based

Ø Avro

· Map Reduce Integration

· Map Reduce over HBase

· HBase Admin

Ø Schema Definition

Ø Basic CRUD Operations

Flume

· Flume Introduction

· Flume Architecture

· Flume Master, Flume Collector and Flume Agent

· Flume Configurations

· Real Time Use Case using Apache Flume

Oozie

· Oozie Introduction

· Oozie Architecture

· Oozie Configuration Files

· Oozie Job Submission

Ø workflow.xml

Ø coordinator.xml

Ø job.coordinator.properties

YARN (Yet Another Resource Negotiator) - Next Generation Map Reduce

· What is YARN?

· YARN Architecture

Ø Resource Manager

Ø Application Master

Ø Node Manager

· When should we go ahead with YARN?

· Classic Map Reduce Vs YARN Map Reduce

· Different Configuration Files for YARN

Impala

· What is Impala?

· How can we use Impala for Query Processing?

· When should we go ahead with Impala?

· HIVE Vs Impala

· REAL TIME Use Cases with Impala

MongoDB (As part of NoSQL Databases)

· Need of NoSQL Databases

· Relational Vs Non-Relational Databases

· Introduction to MongoDB

· Features of MongoDB

· Installation of MongoDB

· MongoDB Basic Operations

Mahout (As a part of BIG DATA ANALYTICS)

· Introduction to Machine Learning (ML) Languages

· Types of Machine Learning

· Introduction to Apache MAHOUT

· Categories of Mahout Algorithms

· Real Time Use Case using the Classifier Algorithm of Mahout - Naive Bayes

Introduction to Scala

Hadoop Administration

· Hadoop Single Node Cluster Set Up (Hands-on Installation on Laptops)

Ø Operating System Installation

Ø JDK Installation

Ø SSH Configuration

Ø Dedicated Group & User Creation

Ø Hadoop Installation

Ø Different Configuration File Settings

Ø NameNode Format

Ø Starting the Hadoop Daemons

· Multi-Node Hadoop Cluster Set Up

Ø Network related settings

Ø Hosts Configuration

Ø Passwordless SSH Communication

Ø Hadoop Installation

Ø Configuration File Settings

Ø Name Node Format

Ø Starting the Hadoop Daemons

· PIG Installation (Hands-on Installation on Laptops)

Ø Local Mode

Ø Clustered Mode

Ø .bashrc File Configuration

· SQOOP Installation (Hands-on Installation on Laptops)

Ø Sqoop Installation with MySQL Client

· HIVE Installation (Hands-on Installation on Laptops)

Ø Local Mode

Ø Clustered Mode

· HBase Installation (Hands-on Installation on Laptops)

Ø Local Mode

Ø Clustered Mode

· OOZIE Installation (Hands-on Installation on Laptops)

· MongoDB Installation (Hands-on Installation on Laptops)

· Commissioning Of Nodes In Hadoop Cluster

· Decommissioning Of Nodes from Hadoop Cluster


CONTACT FOR DEMO : 7793954674, 9966524295
