SRI NARMADA IT TRAINING: Hadoop

Hadoop
·      What is Big Data?
·      What is Hadoop?
·      Relation between Big Data and Hadoop?
·      Why is Hadoop needed?
·      Scenarios for adopting Hadoop technology in REAL TIME Projects
·      Challenges with Big Data
Ø Storage
Ø Processing
·      How Hadoop addresses Big Data challenges
·      Comparison with Other Technologies
Ø RDBMS
Ø Data Warehouse
Ø Teradata
·      Different Components of the Hadoop Ecosystem
Ø Storage Components
Ø Processing Components
HDFS(Hadoop Distributed File System)
·      What is a cluster Environment?
·      Cluster Vs Hadoop Cluster
·      Significance of HDFS in Hadoop
·      Features of HDFS
·      Storage aspects of HDFS
Ø Block
Ø How to Configure Block size
Ø Default Vs Configurable Block size
Ø Why is the HDFS Block size so large?
Ø Design Principles of Block Size
·      HDFS Architecture - 5 Daemons of Hadoop
Ø NameNode and its functionality
Ø DataNode and its functionality
Ø JobTracker and its functionality
Ø TaskTracker and its functionality
Ø Secondary Name Node and its functionality
·      Replication in Hadoop - Fail Over Mechanism
Ø Data Storage in Data nodes
Ø Fail Over Mechanism in Hadoop - Replication
Ø Replication Configuration
Ø Custom Replication
Ø Design Constraints with Replication Factor
·      Accessing HDFS
Ø CLI (Command Line Interface) and HDFS Commands
Ø Java-Based Approach
·      Hadoop Archives

MapReduce
·      Why is Map Reduce essential in Hadoop?
·      Processing daemons of Hadoop
Ø Job Tracker
§  Roles of Job Tracker
§  Drawbacks w.r.t. Job Tracker failure in Hadoop Cluster
§  How to configure Job tracker in Hadoop Cluster
Ø Task Tracker
§  Roles of Task Tracker
§  Drawbacks w.r.t Task Tracker failure in Hadoop Cluster
·      Input Split
Ø InputSplit
Ø Need Of Input Split in Map Reduce
Ø InputSplit Size
Ø InputSplit Size Vs Block Size
Ø InputSplit Vs Mappers
·      Map Reduce Life Cycle
Ø Communication Mechanism of Job Tracker & Task Tracker
Ø Input Format Class
Ø Record Reader Class
Ø Success Case Scenarios
Ø Failure Case Scenarios
Ø Retry Mechanism in the Map Reduce
·      MapReduce Programming Model
Ø Different phases of Map Reduce Algorithm
Ø Different Data types in Map Reduce
§  Primitive Data types Vs Map Reduce Data types
Ø How to write a basic Map Reduce Program
§  Driver Code
§  Mapper Code
§  Reducer Code
Ø Driver Code
§  Importance of Driver Code in a Map Reduce Program
§  How to identify the Driver code in Map Reduce Program
§  Different sections of Driver code
Ø Mapper Code
§  Importance of Mapper Phase in Map Reduce
§  How to Write a Mapper Class?
§  Methods in Mapper Class
Ø Reducer Code
§  Importance of Reduce phase in Map Reduce
§  How to Write Reducer Class?
§  Methods in Reducer Class
Ø IDENTITY MAPPER & IDENTITY REDUCER
Ø Input Formats in Map Reduce
§  TextInputFormat
§  KeyValueTextInputFormat
§  NLineInputFormat
§  DBInputFormat
§  SequenceFileInputFormat
§  How to use the specific input format in Map Reduce
Ø Output Formats in Map Reduce
§  TextOutputFormat
§  KeyValueTextOutputFormat
§  NLineOutputFormat
§  DBOutputFormat
§  SequenceFileOutputFormat
§  How To use the specific Output format in Map Reduce
Ø Map Reduce API (Application Programming Interface)
§  New API
§  Deprecated API
Ø Combiner in Map Reduce
§  Is a combiner mandatory in Map Reduce?
§  How to use the combiner class in Map Reduce
§  Performance tradeoffs w.r.t. Combiner
Ø Partitioner in Map Reduce
§  Importance of partitioner class in Map Reduce
§  How to use the partitioner class in Map Reduce
§  HashPartitioner functionality
§  How to write a custom partitioner
Ø Compression Techniques in Map Reduce
§  Importance of Compression in Map Reduce
§  What is CODEC
§  Compression Types
§  GzipCodec
§  BZip2Codec
§  LzoCodec
§  SnappyCodec
§  Configurations w.r.t. Compression Techniques
§  How to customize compression for one job Vs all jobs
Ø Joins - in Map Reduce
§  Map Side Join
§  Performance Trade Off
§  Distributed cache
Ø How to debug MapReduce Jobs in Local and Pseudo Cluster Mode
Ø Introduction to MapReduce Streaming
Ø Data locality in Map Reduce
§  Secondary Sorting Using Map Reduce
Apache PIG
·      Introduction to Apache Pig
·      Map Reduce Vs Apache Pig
·      SQL Vs Apache Pig
·      Different data types in pig
·      Modes Of Execution in Pig
Ø Local Mode
Ø Map Reduce OR Distributed Mode
·      Execution Mechanism
Ø Grunt Shell
Ø Script
Ø Embedded
·      Transformations in Pig
·      How to write a simple pig script
·      How to develop the Complex Pig Script
·      Bags, Tuples and fields in PIG
·      UDFs in Pig
Ø Need of using UDFs in PIG
Ø How to use UDFs
Ø REGISTER key word in PIG
·      When to use Map Reduce & Apache PIG in REAL TIME Projects
HIVE
·      Hive Introduction
·      Need of Apache HIVE in Hadoop
·      Hive Architecture
Ø Driver
Ø Compiler
Ø Executor(Semantic Analyzer)
·      Meta Store in Hive
Ø Importance Of Hive Meta Store
Ø Embedded metastore configuration
Ø External metastore configuration
Ø Communication mechanism with Metastore
·      Hive Integration with Hadoop
·      Hive Query Language(Hive QL)
·      Configuring Hive with MySQL MetaStore
·      SQL VS Hive QL
·      Data Slicing Mechanisms
Ø Partitions In Hive
Ø Buckets In Hive
Ø Partitioning Vs Bucketing
Ø Real Time Use Cases
·      Collection Data Types in Hive
Ø Array
Ø Struct
Ø Map
·      User Defined Functions(UDFs) in HIVE
Ø UDFs
Ø UDAFs
Ø UDTFs
Ø Need of UDFs in HIVE
·      Hive Serializer/Deserializer - SerDe
·      HIVE - HBASE Integration
SQOOP
·      Introduction to Sqoop.
·      MySQL client and Server Installation
·      How to connect to Relational Database using Sqoop
·      Imports and Exports
Ø Different flavors of Imports
Ø Export
Ø Hive-Imports
HBase
·      HBase Introduction
·      HDFS Vs HBase
·      HBase Use Cases
·      HBase Basics
Ø Column Architecture
Ø Scans
·      HBase Architecture
·      Clients
Ø REST
Ø Thrift
Ø Java Based
Ø Avro
·      Map Reduce Integration
·      Map Reduce over HBase
·      HBase Admin
Ø Schema Definition
Ø Basic CRUD Operation
Flume
·      Flume Introduction
·      Flume Architecture
·      Flume Master, Flume Collector and Flume Agent
·      Flume Configurations
·      Real Time Use Case using Apache Flume

Oozie
·      Oozie Introduction
·      Oozie Architecture
·      Oozie Configuration Files
·      Oozie Job Submission
Ø Workflow.xml
Ø Coordinator.xml
Ø job.coordinator.properties
YARN (Yet Another Resource Negotiator) - Next Gen. Map Reduce
·      What is YARN?
·      YARN Architecture
Ø Resource Manager
Ø Application Master
Ø Node Manager
·      When should we go ahead with YARN
·      Classic Map Reduce Vs YARN Map Reduce
·      Different Configuration Files for YARN
Impala
·      What is Impala?
·      How can we use Impala for Query Processing?
·      When should we go ahead with Impala
·      HIVE Vs Impala
·      REAL TIME Use Cases with Impala
MongoDB(As part of NoSQL Databases)
·      Need of NoSQL Databases
·      Relational Vs Non-Relational Databases
·      Introduction to MongoDB
·      Features of MongoDB
·      Installation of MongoDB
·      MongoDB Basic Operations
Mahout(As a part of BIGDATA ANALYTICS)
·      Introduction to Machine Learning (ML) Languages
·      Types of Machine Learning
·      Introduction to Apache MAHOUT
·      Categories of Mahout Algorithms
·      Real Time Use Case using Classifier Algorithm of Mahout - Naive Bayes
Introduction to Scala
Hadoop Administration
·      Hadoop Single Node Cluster Set Up(Hands on installation on Laptops)
Ø Operating System Installation
Ø JDK Installation
Ø SSH Configuration
Ø Dedicated Group & User Creation
Ø Hadoop Installation
Ø Different Configuration Files Setting
Ø Name node format
Ø Starting the Hadoop Daemons
·      Multi Node Hadoop Cluster Set Up
Ø Network related settings
Ø Hosts Configuration
Ø Passwordless SSH Communication
Ø Hadoop Installation
Ø Configuration Files Setting
Ø Name Node Format
Ø Starting the Hadoop Daemons
·      PIG Installation (Hands on installation on Laptops)
Ø Local MODE
Ø Clustered Mode
Ø Bashrc file configuration
·      SQOOP Installation (Hands on installation on Laptops)
Ø Sqoop installation with MYSQL client
·      HIVE Installation (Hands on installation on Laptops)
Ø Local Mode
Ø Clustered Mode
·      HBase Installation (Hands on Installation on Laptops)
Ø Local Mode
Ø Clustered Mode
·      OOZIE Installation (Hands on Installation on Laptops)
·      MongoDB Installation (Hands on Installation on Laptops)
·      Commissioning Of Nodes In Hadoop Cluster
·      Decommissioning Of Nodes from Hadoop Cluster
