Syllabus

Hadoop (Big Data) Ecosystem

  • Motivation for Hadoop
  • Different types of projects by Apache
  • Role of projects in the Hadoop Ecosystem
  • Key technology foundations required for Big Data
  • Limitations and Solutions of existing Data Analytics Architecture
  • Comparison of traditional data management systems with Big Data management systems
  • Evaluate key framework requirements for Big Data analytics
  • Hadoop Ecosystem & Hadoop 2.x core components
  • Explain the relevance of real-time data
  • Explain how to use big and real-time data as a Business planning tool

Building Blocks

  • Quick tour of Java
  • Quick tour of Linux commands
  • Introduction to Cloudera VM / Cloudera Manager (Apache Ambari): download and usage instructions

Hadoop Cluster - Architecture - Configuration Files

  • Hadoop Master-Slave Architecture
  • The Hadoop Distributed File System - data storage
  • Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)
  • Hadoop Cluster setup - Installation
  • Hadoop 2.x Cluster Architecture
  • A typical enterprise cluster – Hadoop Cluster Modes

Hadoop Core Components - HDFS & MapReduce (YARN)

  • HDFS Overview & Data storage in HDFS
  • Getting data into Hadoop from a local machine and vice versa (Data Loading Techniques)
  • MapReduce Overview (traditional way vs. MapReduce way)
  • Concept of Mapper & Reducer
  • Understanding MapReduce program skeleton
  • Running a MapReduce job from the command line/Eclipse
  • Develop a MapReduce program in Java
  • Develop a MapReduce program with the Streaming API
  • Test and debug a MapReduce program at design time
  • How Partitioners and Reducers Work Together
  • Writing Custom Partitioners
  • Data Input and Output
  • Creating Custom Writable and WritableComparable Implementations

Data Integration using Sqoop, Flume & Talend

  • Integrating Hadoop into an existing Enterprise
  • Loading Data from an RDBMS into HDFS by Using Sqoop
  • Managing Real-Time Data Using Flume
  • Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
  • Introduction to Talend (community system)
  • Data loading to HDFS using Talend

Data Analysis using Pig

  • Introduction to Hadoop Data Analysis Tools
  • Introduction to Pig - MapReduce vs. Pig, Pig Use Cases
  • Pig Latin Program & Execution
  • Use Pig to automate the design and implementation of MapReduce applications
  • Data Analysis using Pig

Data Analysis using Hive

  • Introduction to Hive - Hive vs. Pig - Hive Use Cases
  • Discuss the Hive data storage principle
  • Explain the file formats and record formats supported by the Hive environment
  • Perform operations with data in Hive
  • HiveQL: Joining Tables, Dynamic Partitioning, Custom MapReduce Scripts
  • Hive Script, Hive UDF

Data Analysis using Impala

  • Introduction to Impala & Architecture
  • How Impala executes Queries and its importance
  • Hive vs. Pig vs. Impala
  • Extending Impala with User-Defined Functions
  • Improving Impala Performance

NoSQL Database - HBase

  • Introduction to NoSQL Databases and HBase
  • HBase vs. RDBMS, HBase Components, HBase Architecture
  • HBase Cluster Deployment

Hadoop - Other Analytics Tools

  • Introduction to the role of R in Hadoop Ecosystem
  • Introduction to Jasper Reports & creating reports by integrating with Hadoop
  • Role of Kafka & Avro in real projects