Data Science and Analytics
Duration 40-45 hours
The training focuses on developing and improving industry-level skills for work in the Data Science and Data Engineering domains. Less theory, more exercises and projects.
Module 1: Introduction to Data Science
Definition, Overview, and Role of a Data Scientist
Data Science vs. Data Analytics vs. Business Intelligence
Real-World Use Case: Airbnb’s Data Science for Price Optimization
Importance of Data-Driven Decisions
Industry Applications (Finance, Healthcare, E-commerce, etc.)
Case Study: How Netflix Uses Data Science to Enhance User Experience
Data Collection, Preparation, Modeling, Evaluation, and Deployment
Tools and Technologies (Python, R, SQL, Excel, etc.)
Module 2: Python for Data Science
Python Basics: Variables, Data Types, Control Structures
Data Structures: Lists, Dictionaries, Tuples, Sets
Functions, Loops, and Conditionals
NumPy: Arrays, Element-Wise Operations, Array Manipulation
Case Study: Simulating Data for Stock Market Predictions
Pandas: DataFrames, Series, Filtering, Merging, Grouping
Use Case: Analyzing Sales Data for Retail Companies
Module 3: Data Wrangling & Cleaning
Handling Missing Data, Duplicates, Outliers, and Inconsistent Data
Tools: Pandas, NumPy, scikit-learn
Creating New Features, Encoding Categorical Variables, Scaling, and Normalization
Use Case: Building a Credit Risk Model for a Bank
Log Transform, Binning, Polynomial Features
Use Case: House Price Prediction by Transforming Features for Better Accuracy
Module 4: Data Visualization
Importance of Visualization in Data Science
Tools: Matplotlib, Seaborn
Creating Histograms, Box Plots, Pair Plots, Heatmaps
Case Study: Visualization of Customer Churn Data for a Telecom Company
Using Plotly and Tableau for Interactive Dashboards
Case Study: Building a Sales Dashboard for a Retail Company
Module 5: Statistics & Probability for Data Science
Measures of Central Tendency (Mean, Median, Mode)
Measures of Dispersion (Variance, Standard Deviation, Skewness, Kurtosis)
Normal Distribution, Poisson, Binomial, Uniform
Use Case: Predicting Sales Trends Using Probability Distributions
Null and Alternative Hypothesis, T-tests, Chi-Square, P-Values
Use Case: A/B Testing for Website Optimization
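Illustrative exercise for the A/B testing use case: a minimal sketch of a chi-square test of independence on a 2x2 table (converted vs. not converted, for two page variants), using only the Python standard library. The function name `chi_square_2x2` and the visitor and conversion counts are hypothetical and chosen only for illustration. The p-value shortcut relies on the fact that, for one degree of freedom, the chi-square tail probability equals `erfc(sqrt(x / 2))`.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Chi-square test of independence on a 2x2 table (no continuity correction).

    conv_a, conv_b: number of conversions in variants A and B
    n_a, n_b:       number of visitors in variants A and B
    Assumes both outcomes (converted / not converted) occur at least once overall.
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal conversion rates
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical experiment: 200/2000 conversions for A, 250/2000 for B
chi2, p_value = chi_square_2x2(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.4f}")
```

A small p-value (commonly below 0.05) would lead to rejecting the null hypothesis that both variants convert at the same rate. In practice a library routine such as a statistics package would normally be used instead of a hand-written test.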
Module 6: Machine Learning Basics
Supervised vs. Unsupervised Learning, Terminology, and Concepts
Linear Regression, Logistic Regression
Use Case: Predicting House Prices Using Linear Regression
Clustering (K-Means, Hierarchical), Dimensionality Reduction (PCA)
Use Case: Customer Segmentation Using K-Means Clustering
Train/Test Split, Cross-Validation, Metrics (Accuracy, Precision, Recall, F1-Score)
Use Case: Evaluating a Fraud Detection Model in Banking
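Illustrative exercise for the evaluation metrics listed above: a minimal sketch that computes accuracy, precision, recall, and F1-score from binary labels in plain Python. The function name `classification_metrics` and the label lists are hypothetical; the labels mimic an imbalanced fraud dataset where 1 marks a fraudulent transaction.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 fraud cases out of 10 transactions
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as fraud detection, accuracy alone can look acceptable while recall is poor, which is why precision, recall, and F1 are reported together.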
Module 7: Advanced Machine Learning
Building Trees, Feature Importance, Overfitting
Use Case: Predicting Employee Attrition Using Random Forests
Boosting Techniques, Hyperparameter Tuning
Use Case: Predicting Loan Default Using XGBoost
Support Vector Machines (SVM): Concepts, Kernels, and Hyperplanes
Use Case: Image Classification Using SVM
Module 8: Deep Learning
Structure of a Neural Network, Forward and Backpropagation
Use Case: Handwritten Digit Classification Using Neural Networks
Convolutions, Pooling, Dropout, and Architectures (LeNet, VGG)
Use Case: Image Recognition for Retail Product Detection
Sequential Data, Long Short-Term Memory (LSTM)
Use Case: Predicting Stock Prices Using LSTMs
Module 9: Natural Language Processing (NLP)
Tokenization, Stop Words, Lemmatization, and Stemming
Use Case: Sentiment Analysis of Movie Reviews
TF-IDF, Word2Vec, Embeddings
Use Case: Building a Text Classification Model for Spam Detection
BERT, GPT, Transformers
Use Case: Building a Chatbot Using Transformer Models
Module 10: Big Data & Cloud Computing
Apache Spark, Distributed Computing
Use Case: Processing Large Datasets in Financial Services
AWS, Google Cloud, Azure for Data Science
Use Case: Deploying Machine Learning Models in AWS SageMaker
Module 11: Model Deployment & MLOps
CI/CD for ML Models, Model Monitoring, and Management
Use Case: Deploying a Real-Time Fraud Detection Model in Production Using Docker and Kubernetes
Flask, FastAPI, Docker, Kubernetes for Model Serving
Use Case: Building a REST API for a Prediction Model
Module 12: Capstone Project & Real-World Use Cases
Choose a Real-World Data Science Problem (Predictive Analytics, NLP, or Computer Vision)
Full Pipeline: Data Collection, Cleaning, Modeling, Evaluation, and Deployment
Pre-requisite for Program:
a day for learning and projects beyond class hours.
Job Roles:
Data Scientist, Machine Learning Engineer, Data Analyst, Business Intelligence Developer, AI Specialist, Research Scientist (AI/ML), NLP Engineer, Analytics Consultant, Data Product Manager.
The training material is developed using real-world use cases that are designed to give students a competitive career advantage.