Big Data Training with Spark and Kafka

Duration: 8 weeks

Classes only on Weekends

Program Summary:

Session 1

  • Introduction to Big data and Hadoop Ecosystem
  • Why industry needs Big data; advantages of Big data over traditional RDBMS
  • Introduction to Big data ecosystems
  • Understanding various data formats and transformation techniques
  • HDFS and YARN architecture
  • MapReduce
  • Understanding Hadoop and Hive
  • HDFS and Hive
  • Hive and data types
  • Hive exercises
  • Advanced Hive features for performance
  • Hive exercises
  • Uses of Hive in real-life projects
  • Uses of shell scripting in Big data projects
  • Shell and Hive exercises
  • Project-1

Session 2

  • Introduction to Impala
  • Architecture of Impala
  • Impala exercise
  • Uses of Hive and Impala in real-life projects
  • Understanding Oozie as a scheduler
  • Oozie coordinator
  • Setting up an Oozie job
  • Project-2

Session 3

  • Introduction to Sqoop
  • Understanding capabilities of Sqoop and underlying MapReduce
  • In-class exercises
  • Sqoop in real-life projects
  • Project-3

Session 4

  • Introduction to streaming, a new era of data analytics
  • Introduction to Kafka
  • Deep dive into Kafka architecture
  • Setting up Kafka for message generation
  • Flume architecture
  • Using Flume to set up a streaming pipeline
  • Exercises on Flume Agent setup
  • Kafka and Flume
  • Project-4

Session 5

  • Scala from a functional perspective
  • Scala features for Big data transformations
  • Spark, a fast in-memory data processing engine
  • Spark architecture
  • Deep dive into Spark data transformation capabilities
  • Spark SQL with HDFS, Hive and Impala
  • Dealing with various data formats: JSON, XML, CSV, Parquet, text
  • Project-5

Session 6

  • Spark Streaming
  • Setting up streaming pipelines with Spark and Kafka
  • Considerations for zero-data-loss streaming pipelines
  • Dealing with the small-files issue and compaction
  • Project-6

Session 7: KSQL

  • Concept of Confluent Kafka
  • Setting up the environment for Confluent Kafka
  • Using the Confluent Control Center
  • Understanding and using KStream, KTable
  • Understanding and using KSQL
  • Project-7

Session 8: Docker

  • Understanding microservices
  • Introduction to Docker and its uses
  • Docker installation and configuration
  • Understanding and working with containers
  • Inter-container communication and exposing services through ports
  • Understanding the Dockerfile
  • Container-based deployment
  • Docker Compose
  • Introduction to Kubernetes
  • Using Kubernetes
  • Final Project (practical questions and answers, exercises, final evaluation, and any topic that needs more attention)

Familiarity with:

Core Java, SQL, Linux

The training material is developed using real-world use cases that are designed to give students a competitive career advantage.