## What is Data Science?

Data science uses scientific methods, processes, and algorithms to extract knowledge from structured and unstructured data. It is one of the most talked-about topics in the industry right now, and given the massive amounts of data being created, it has become a crucial part of almost every sector. Blue Shell tech provides the Best Data Science Training in Kochi.

### Why Data Science?

Unlike traditional structured data, most of the data generated today is unstructured or semi-structured. Extracting knowledge from such large volumes of unstructured data requires more complex, advanced analytical tools and algorithms. This is where data science comes in. Its popularity has grown over time, and companies have started to employ data science approaches to expand their operations and improve client loyalty. Data scientists are in high demand all across the world, and that demand keeps growing.

### Who is a Data Scientist?

A data scientist uses strong expertise in certain scientific disciplines to solve complex data problems. They extract information from structured and unstructured sources and present it in a form that is more useful than the raw data.

## Best Data Science Course in Kochi, Kerala

**Data Science Syllabus**

**Module – 1 (Python Basics)** **Introduction to Python (22 hours)**

- Welcome To The Course
- Software Installation
- Jupyter Notebook Tutorial
- Comments
- Variables, Operators, Data Types
- If-Else, For and While Loops
- Functions
- Lambda Expression
- Taking input from keyboard
- List
- Tuple
- Set
- Dictionary
- Modules and Packages
- Objects and Classes
- File Handling
- MySQL

- What is Artificial Intelligence
- Introduction To Data Science
- Real Time Use Cases Of Data Science
- Who is a Data Scientist?
- Data Science Project Lifecycle
- Skill sets needed for Data Scientist
- Difference between Data Engineer, Data Scientist and Data Analyst
- How to Transition into Data Science from Different Backgrounds
- Machine Learning
- Supervised vs Unsupervised
- Deep Learning vs Machine Learning
- INTERVIEW QUESTIONS ASSIGNMENT-1

**Module – 2 (Python Advanced)** **NumPy, Pandas (8 hours)**

- Introduction to NumPy
- Creating Arrays
- arange(), linspace(), etc.
- Creating Arrays of Random Numbers
- Basic Operations on an Array
- Applying Universal functions on an array
- Linear Algebra operations on an array
- NumPy Data Types
- Type Conversion
- Array Stacking

- Introduction to Pandas
- Creating DataFrames
- Reading data from CSV, Excel, etc. into a DataFrame and writing a DataFrame back to CSV, Excel
- Selection and Indexing
- Conditional Selection
- Groupby
- Pivot Table
- Merging, Joining, Concatenation
- Missing Value Treatment
- INTERVIEW QUESTIONS ASSIGNMENT-2

**Module – 3 (Visualisation)** **Visualisation: Matplotlib, Seaborn, Plotly (3 hours)**

- Line Plots
- Scatter Plots
- Pair Plots
- Histograms
- Heat Maps
- Bar Plots
- Stacked Bar Plots
- Pie Charts
- Box Plots
- Swarm Plots

**Module – 4 (Statistics)** **Statistics (8 hours)**

- Descriptive vs Inferential Statistics
- Mean, Median, Mode
- Central Limit Theorem
- Measures of Dispersion
- Interquartile Range
- Variance
- Standard Deviation
- Z score
- Pearson’s Product-Moment Correlation (r)
- R-square
- Adjusted R-square
- Normal Distribution
- Standard Normal Distribution
- Empirical rule of Normal Distribution
- What is an Outlier
- Outlier Detection and Removal
- Exploratory Data Analysis
- INTERVIEW QUESTIONS ASSIGNMENT-3
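
As a taste of the Z score, Inter Quartile Range, and outlier detection topics listed above, here is a minimal sketch using only Python's standard `statistics` module. The function names and the sample data are made up for illustration, and the 1.5 × IQR fence is one common convention rather than the only rule.

```python
import statistics

def z_scores(values):
    """Return the z-score of each value: (x - mean) / population std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

def iqr_outliers(values):
    """Return the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

# Made-up sample; 95 is deliberately planted as an outlier.
sample = [10, 12, 11, 13, 12, 11, 10, 95]
print(z_scores(sample))
print(iqr_outliers(sample))
```

Note that `statistics.quantiles` has more than one interpolation method, so quartile values can differ slightly from those reported by other tools.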

**Module – 5 (ML-Linear Regression)** **Linear Regression, Cost Function, Gradient Descent (10 hours)**

- Introduction to Machine Learning
- Supervised vs Unsupervised
- Regression vs Classification
- Bias and Variance tradeoff
- Cross Validation
- Linear Regression Theory
- Gradients/Derivative Theory
- Assumptions of Linear Regression
- Cost Function
- Optimize Cost function using Gradient Descent
- Mathematical Derivation
- Multicollinearity
- MAE
- MSE
- RMSE
- Multiple Linear Regression
- Polynomial Regression
- INTERVIEW QUESTIONS ASSIGNMENT-4
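
To illustrate the cost function and gradient descent topics above, the following is a rough sketch of fitting a straight line by batch gradient descent on mean squared error, in plain Python. The function names, learning rate, epoch count, and toy data are all assumptions chosen for this example, not values taken from the course.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b, mse(xs, ys, w, b), mse(xs, ys, w, b) ** 0.5)  # slope, intercept, MSE, RMSE
```

With a learning rate that is too large the updates diverge instead of converging, which is why the step size here is kept small for this particular data.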

**Module – 6 (ML-Logistic Regression, Algorithm Validation)** **Logistic Regression (8 hours)**

- Logistic Regression Theory
- Logistic function
- Classification Algorithm Validation
- Confusion Matrix
- Classification Report
- Recall
- Precision
- AUC
- ROC
- INTERVIEW QUESTIONS ASSIGNMENT-5
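
The confusion matrix, precision, and recall items above can be sketched in a few lines of plain Python. The labels below are invented for illustration, and the helper names are hypothetical; libraries such as scikit-learn provide their own equivalents.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up ground truth and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall(y_true, y_pred))
```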

**Module – 7 (ML-Naive Bayes, SVM)** **Naive Bayes, SVM (6 hours)**

- Naive Bayes classification
- Bayes Theorem
- Support Vector Machine (SVM)
- Support Vectors
- Kernel Trick
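
For the Bayes Theorem item in this module, a small worked example may help. The sketch below computes a posterior probability for a spam-filter style question; the function name and all the probabilities are invented purely for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam, a given word appears in
# 60% of spam messages and in 5% of non-spam messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, false_positive_rate=0.05)
print(p_spam_given_word)
```

Naive Bayes classification applies the same rule to many features at once, under the simplifying assumption that the features are independent given the class.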

**Module – 8 (Decision Tree, Random Forest)** **Decision Tree (6 hours)**

- What is the ID3 Algorithm
- Entropy
- Calculating Information Gain
- Overfitting, Underfitting, Best fit
- Random Forest
- What is Bootstrap
- Bagging
- Difference between Random Forest and Decision Tree
- Feature Selection using Random Forest
- Hyperparameter tuning
- INTERVIEW QUESTIONS ASSIGNMENT-6
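
The entropy and information gain items above are the quantities ID3 uses to choose a split. Here is a minimal sketch in plain Python, with hypothetical function names and a tiny made-up label set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - weighted

# Made-up labels and two candidate splits of them.
labels = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children look like the parent

print(entropy(labels))
print(information_gain(labels, perfect_split))
print(information_gain(labels, useless_split))
```

A split that produces pure children recovers all of the parent's entropy as gain, while a split that leaves the class mix unchanged gains nothing.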

**Module – 9 (KMeans)** **KMeans Clustering (2 hours)**

- Introduction to Unsupervised Machine Learning
- KMeans Theory
- How to decide K in KMeans
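
To make the KMeans theory item concrete, the sketch below runs the assign-then-update loop on one-dimensional data in plain Python. It assumes K = 2 and hand-picked starting centroids; the function name and the data are hypothetical, and choosing K itself (for example with an elbow plot) is not shown.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Run the basic k-means loop on 1-D data from the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids, clusters

# Made-up data with two obvious groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```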

**Module – 10 (PCA, Recommendation Systems)** **Principal Component Analysis (6 hours)**

- Introduction to Dimensionality Reduction
- PCA Theory discussion
- Eigenvalues, Eigenvectors
- Step-by-Step Detailed Mathematical Derivation
- Singular Value Decomposition
- Recommendation Systems
- Content-Based Filtering
- Collaborative Filtering
- INTERVIEW QUESTIONS ASSIGNMENT-7

**Module – 11 (NLP)** **Text Mining (3 hours)**

- Introduction to NLP
- Text Preprocessing Techniques using spaCy and NLTK
- Word Tokens
- Document Similarity
- StopWord Removal
- Lemmatization
- Stemming
- Count Vectorizer
- Tf-Idf Vectorizer
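
The count vectorizer and Tf-Idf items above can be illustrated without any library. The sketch below uses whitespace tokenization and the plain textbook weighting tf × log(N / df); this is an assumption made for clarity, and library implementations (scikit-learn's, for instance) use smoothed and normalized variants that give different numbers. The documents and function names are made up.

```python
import math
from collections import Counter

# Tiny made-up corpus, tokenized on whitespace.
docs = ["data science is fun", "data engineering is hard", "science needs data"]
tokenized = [doc.split() for doc in docs]
vocab = sorted({word for tokens in tokenized for word in tokens})

def count_vector(tokens):
    """Bag-of-words counts for one document, ordered by vocab."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

def idf(word):
    """Inverse document frequency: log(N / number of docs containing the word)."""
    doc_freq = sum(1 for tokens in tokenized if word in tokens)
    return math.log(len(tokenized) / doc_freq)

def tfidf_vector(tokens):
    """Term frequency (count / doc length) times idf, ordered by vocab."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) * idf(word) for word in vocab]

print(vocab)
print(count_vector(tokenized[0]))
print(tfidf_vector(tokenized[0]))
```

Under this weighting, a word that appears in every document (here, "data") gets a score of zero, while words unique to one document score highest.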