Data Science and Analytics

Duration: 40-45 hours

The training focuses on building industry-level skills for work in the Data Science and Data Engineering domains, with less theory and more exercises and projects.

Module 1: Introduction to Data Science

  1. What is Data Science?

Definition, Overview, and Role of a Data Scientist
Data Science vs. Data Analytics vs. Business Intelligence
Real-World Use Case: Airbnb’s Data Science for Price Optimization

  2. Why Learn Data Science?

Importance of Data-Driven Decisions
Industry Applications (Finance, Healthcare, E-commerce, etc.)
Case Study: How Netflix Uses Data Science to Enhance User Experience

  3. Data Science Workflow

Data Collection, Preparation, Modeling, Evaluation, and Deployment
Tools and Technologies (Python, R, SQL, Excel, etc.)

Module 2: Python for Data Science

  1. Introduction to Python Programming

Python Basics: Variables, Data Types, Control Structures
Data Structures: Lists, Dictionaries, Tuples, Sets
Functions, Loops, and Conditionals

  2. NumPy for Numerical Computing

Arrays, Element-Wise Operations, Array Manipulation
Case Study: Simulating Data for Stock Market Predictions

  3. Pandas for Data Manipulation

DataFrames, Series, Filtering, Merging, Grouping
Use Case: Analyzing Sales Data for Retail Companies

Module 3: Data Wrangling & Cleaning

  1. Data Cleaning

Handling Missing Data, Duplicates, Outliers, and Inconsistent Data
Tools: Pandas, NumPy, scikit-learn

  2. Feature Engineering

Creating New Features, Encoding Categorical Variables, Scaling, and Normalization
Use Case: Building a Credit Risk Model for a Bank
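
As a small illustration of two topics above, encoding a categorical variable and min-max scaling, here is a minimal sketch that uses only the Python standard library. The applicant records and field names are invented for this example; in practice these steps are more commonly done with Pandas or scikit-learn.

```python
# Hypothetical loan applicants; values are invented for illustration.
applicants = [
    {"income": 30000, "employment": "salaried"},
    {"income": 90000, "employment": "self-employed"},
    {"income": 60000, "employment": "salaried"},
]

def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Return the sorted category list and one 0/1 vector per value."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors

scaled_income = min_max_scale([a["income"] for a in applicants])
categories, encoded = one_hot([a["employment"] for a in applicants])

print(scaled_income)  # [0.0, 1.0, 0.5]
print(categories)     # ['salaried', 'self-employed']
print(encoded)        # [[1, 0], [0, 1], [1, 0]]
```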

  3. Data Transformation

Log Transform, Binning, Polynomial Features
Use Case: House Price Prediction by Transforming Features for Better Accuracy
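
The log transform and binning listed above can be sketched as follows. This is an illustrative example only: the prices, bin edges, and labels are made up, and the helper name bin_value is hypothetical.

```python
import math

# Hypothetical house prices; invented for illustration.
prices = [120_000, 250_000, 400_000, 2_500_000]

# log1p(x) = log(1 + x) compresses the long right tail of skewed values.
log_prices = [math.log1p(p) for p in prices]

def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value."""
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]  # above every listed edge: use the last label

edges = [200_000, 500_000]       # upper edges of the first two bins
labels = ["low", "mid", "high"]  # one more label than edges
price_bands = [bin_value(p, edges, labels) for p in prices]

print(price_bands)  # ['low', 'mid', 'mid', 'high']
```

After the transform, the largest price is about 1.2 times the smallest on the log scale instead of about 21 times on the raw scale, which is why skewed targets such as prices are often modeled in log space.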

Module 4: Data Visualization

  1. Introduction to Data Visualization

Importance of Visualization in Data Science
Tools: Matplotlib, Seaborn

  2. Exploratory Data Analysis (EDA)

Creating Histograms, Box Plots, Pair Plots, Heatmaps
Case Study: Visualization of Customer Churn Data for a Telecom Company

  3. Advanced Visualization Techniques

Using Plotly and Tableau for Interactive Dashboards
Case Study: Building a Sales Dashboard for a Retail Company

Module 5: Statistics & Probability for Data Science

  1. Descriptive Statistics

Measures of Central Tendency (Mean, Median, Mode)
Measures of Dispersion (Variance, Standard Deviation, Skewness, Kurtosis)
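
A minimal sketch of these measures using Python's built-in statistics module. The order counts are invented for illustration. Skewness and kurtosis are not part of that module and are left out here; libraries such as SciPy or Pandas provide them.

```python
import statistics

# Hypothetical daily order counts; invented for illustration.
orders = [12, 15, 15, 18, 40]

mean = statistics.mean(orders)          # arithmetic average, equals 20
median = statistics.median(orders)      # middle value when sorted, equals 15
mode = statistics.mode(orders)          # most frequent value, equals 15
variance = statistics.variance(orders)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(orders)      # sample standard deviation

print(mean, median, mode, variance, round(std_dev, 2))
```

The mean sits well above the median because of the single large value (40), a quick sign that the data is right-skewed.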

  2. Probability Distributions

Normal Distribution, Poisson, Binomial, Uniform
Use Case: Predicting Sales Trends Using Probability Distributions

  3. Hypothesis Testing

Null and Alternative Hypothesis, T-tests, Chi-Square, P-Values
Use Case: A/B Testing for Website Optimization
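
To make the A/B testing use case concrete, here is a sketch of a two-sided two-proportion z-test. Note that this is a z-test rather than the t-test or chi-square test named above; it is a common choice for comparing conversion rates at large sample sizes. The visitor and conversion counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 200 of 2000 visitors convert on page A, 260 of 2000 on page B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these made-up counts the p-value falls below the conventional 0.05 threshold, so the null hypothesis of equal conversion rates would be rejected.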

Module 6: Machine Learning Basics

  1. Introduction to Machine Learning

Supervised vs. Unsupervised Learning, Terminology, and Concepts

  2. Supervised Learning Algorithms

Linear Regression, Logistic Regression
Use Case: Predicting House Prices Using Linear Regression
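
A sketch of linear regression with a single feature, fitted with the closed-form least-squares formulas. The data is invented and deliberately lies on a perfect line so the result is easy to verify by hand; real house-price models would use several features and a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: floor area in square metres, price in thousands.
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]  # exactly 3 * area, so the fit is perfect

slope, intercept = fit_line(areas, prices)
predicted = slope * 100 + intercept  # predicted price for 100 square metres
print(slope, intercept, predicted)   # 3.0 0.0 300.0
```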

  3. Unsupervised Learning Algorithms

Clustering (K-Means, Hierarchical), Dimensionality Reduction (PCA)
Use Case: Customer Segmentation Using K-Means Clustering
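
The K-Means idea can be sketched in a few lines: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is a simplified version under stated assumptions: it seeds the centroids with the first k points (real implementations usually use random or k-means++ initialization) and runs a fixed number of iterations. The customer data is invented.

```python
def k_means(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (9, 80), (2, 12), (8, 90), (1, 14), (10, 85)]
centroids, clusters = k_means(customers, k=2)
print(clusters[0])  # [(1, 10), (2, 12), (1, 14)]
print(clusters[1])  # [(9, 80), (8, 90), (10, 85)]
```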

  4. Model Evaluation

Train/Test Split, Cross-Validation, Metrics (Accuracy, Precision, Recall, F1-Score)
Use Case: Evaluating a Fraud Detection Model in Banking
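
The four metrics above can be computed directly from the confusion-matrix counts. The sketch below uses invented labels for a small, imbalanced fraud example; scikit-learn offers equivalent functions for real projects.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)  # 0.8 0.5 0.5 0.5
```

Accuracy looks good at 0.8, yet only half of the fraud cases are caught (recall 0.5), which is why accuracy alone is a weak metric on imbalanced data.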

Module 7: Advanced Machine Learning

  1. Decision Trees and Random Forests

Building Trees, Feature Importance, Overfitting
Use Case: Predicting Employee Attrition Using Random Forests

  2. Gradient Boosting & XGBoost

Boosting Techniques, Hyperparameter Tuning
Use Case: Predicting Loan Default Using XGBoost

  3. Support Vector Machines

Concepts, Kernels, and Hyperplane
Use Case: Image Classification Using SVM

Module 8: Deep Learning

  1. Introduction to Neural Networks

Structure of a Neural Network, Forward and Backpropagation
Use Case: Handwritten Digit Classification Using Neural Networks

  2. Convolutional Neural Networks (CNNs)

Convolutions, Pooling, Dropout, and Architectures (LeNet, VGG)
Use Case: Image Recognition for Retail Product Detection

  3. Recurrent Neural Networks (RNNs) and LSTMs

Sequential Data, Long Short-Term Memory (LSTM)
Use Case: Predicting Stock Prices Using LSTMs

Module 9: Natural Language Processing (NLP)

  1. Introduction to NLP

Tokenization, Stop Words, Lemmatization, and Stemming
Use Case: Sentiment Analysis of Movie Reviews

  2. Text Vectorization

TF-IDF, Word2Vec, Embeddings
Use Case: Building a Text Classification Model for Spam Detection
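
A minimal sketch of TF-IDF weighting in plain Python. It assumes whitespace tokenization and the variant tf = count / document length, idf = log(N / document frequency); libraries such as scikit-learn use slightly different smoothing, so their numbers will differ. The messages are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency),
    one of several common TF-IDF variants.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical messages; invented for illustration.
messages = ["win a free prize now", "meeting at noon", "free lunch at noon"]
weights = tf_idf(messages)
print(weights[0]["prize"] > weights[0]["free"])  # True
```

"prize" appears in only one message while "free" appears in two, so "prize" receives the higher weight and is the more distinctive term for the first message.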

  3. Advanced NLP Techniques

BERT, GPT, Transformers
Use Case: Building a Chatbot Using Transformer Models

Module 10: Big Data & Cloud Computing

  1. Introduction to Big Data

Apache Spark, Distributed Computing
Use Case: Processing Large Datasets in Financial Services

  2. Data Science in the Cloud

AWS, Google Cloud, Azure for Data Science
Use Case: Deploying Machine Learning Models in AWS SageMaker

Module 11: Model Deployment & MLOps

  1. Introduction to MLOps

CI/CD for ML Models, Model Monitoring, and Management
Use Case: Deploying a Real-Time Fraud Detection Model in Production Using Docker and Kubernetes

  2. Model Deployment Techniques

Flask, FastAPI, Docker, Kubernetes for Model Serving
Use Case: Building a REST API for a Prediction Model

Module 12: Capstone Project & Real-World Use Cases

  1. Capstone Project

Choose a Real-World Data Science Problem (Predictive Analytics, NLP, or Computer Vision)
Full Pipeline: Data Collection, Cleaning, Modeling, Evaluation, and Deployment

  2. Real-world use cases will be discussed in class.

Prerequisites for the Program:

  1. Familiarity with a programming language such as Python.
  2. Familiarity with SQL and NoSQL databases.
  3. Basic understanding of Mathematics and Statistics.
  4. Basic Git knowledge (optional).
  5. Awareness of cloud resources such as Google Colab.
  6. Must be available for 8 hours of class per week, plus at least 2 hours a day for learning and projects beyond class hours.

Job Roles:
Data Scientist, Machine Learning Engineer, Data Analyst, Business Intelligence Developer, AI Specialist, Research Scientist (AI/ML), NLP Engineer, Analytics Consultant, Data Product Manager.

The training material is developed using real-world use cases that are designed to give students a competitive career advantage.