Big Data Training with Spark and Kafka

Duration: 8 weeks (50 Hrs)

Classes are held on weekends only

Program Summary:

Session 1

  • Introduction to Big Data and the Hadoop ecosystem
  • Why industry needs Big Data: advantages of Big Data over traditional RDBMS
  • Introduction to Big Data ecosystems
  • Understanding data formats and transformation techniques
  • HDFS and YARN architecture
  • MapReduce

Session 2

  • Understanding Hadoop and Hive
  • HDFS and Hive
  • Hive data types
  • Hive exercises
  • Advanced Hive features for performance
  • Hive exercises
  • Uses of Hive in real-life projects
  • Project-1

Session 3

  • Uses of shell scripting in Big Data projects
  • Shell and Hive exercises
  • Project-2
  • Introduction to Impala
  • Architecture of Impala
  • Impala exercises
  • Uses of Hive and Impala in real-life projects
  • Understanding Oozie as a scheduler
    • Oozie Coordinator
    • Setting up an Oozie job
  • Project-3

Session 4

  • Introduction to Sqoop
  • Understanding the capabilities of Sqoop and its underlying MapReduce jobs
  • In-class exercises
  • Sqoop in real-life projects
  • Project-4
  • Introduction to streaming, a new era of data analytics
  • Introduction to Kafka
  • Deep dive into Kafka architecture
  • Setting up Kafka for message generation

Session 5

  • Flume architecture
  • Using Flume to set up a streaming pipeline
  • Exercises on Flume agent setup
  • Kafka and Flume
  • Project-5
  • Scala from a functional perspective
  • Scala features for Big Data transformations

Session 6

  • Spark, a fast in-memory data processing engine
  • Spark architecture
  • Deep dive into Spark data transformation capabilities
  • Spark SQL with HDFS, Hive and Impala
  • Dealing with various data formats: JSON, XML, CSV, Parquet, text
  • Project-6

Session 7

  • Spark Streaming
  • Setting up streaming pipelines with Spark and Kafka
  • Considerations for zero-data-loss streaming pipelines
  • Dealing with the small-file issue and compaction
  • Project-7

Session 8

  • Understanding NoSQL databases
  • HBase
  • Spark and HBase
  • Final Project (practical questions and answers, exercises, final evaluation, and any topic that needs more attention)

Familiarity with:

Core Java, SQL, Linux

BIG DATA

The training material is developed using real-world use cases that are designed to give students a competitive career advantage.