Shruti Pandit


GitHub | LinkedIn | Twitter

View the Project on GitHub shrutipandit707/portfolio

Data Science Portfolio

This website showcases projects I have completed in AI/ML and data science, with a GitHub link for each.

CNN-based model to accurately detect melanoma

I built a CNN-based model that can accurately detect melanoma. Melanoma is a type of cancer that can be deadly if not detected early, and it accounts for 75% of skin cancer deaths. A solution that can evaluate images and alert dermatologists to the presence of melanoma could substantially reduce the manual effort needed for diagnosis. Here is the GitHub link for the code: Melanoma Cancer Detection using CNN

Style Transfer and Object Detection using CNN

In this notebook, I go through two interesting applications of CNNs: style transfer and object detection.

Style transfer is a fun ‘artistic’ application of CNNs that applies various ‘styles’ to images. The main learning objective of studying style transfer is to understand how the basic ideas of CNNs can be used to write entirely new applications. Object detection is the problem of detecting objects in images. It is very commonly used in computer vision applications, such as driverless cars and extracting specific text from documents (such as Aadhaar cards and passports). Here is the GitHub link for the code: Style Transfer and Object Detection using CNN

Reading Digital Images using CNN

In this notebook, I read digital images using a CNN. Here is the GitHub link for the code: Reading Digital Images using CNN

Object Tracking in Videos using CNN

In this notebook, I track objects in videos using a CNN. In previous notebooks, we have seen the following pipeline:

Optical Character Recognition using CNN

This notebook walks through the process of extracting text from an image. OpenCV is used to preprocess the image, and the open-source Tesseract library then extracts the text from the preprocessed image.

Here is the GitHub link for the code: Optical Character Recognition using CNN

Predicting Dow Jones with News Headlines using RNN

In this notebook, I try to predict the Dow Jones using an RNN: Reddit news headlines are used to predict the movement of the Dow Jones Industrial Average.

Data source: https://www.kaggle.com/aaron7sun/stocknews

Data description: daily Open, High, Low and Close values of the Dow Jones from 2008-08-08 to 2016-07-01, along with the Reddit News headlines for those dates.

Methodology: for this project, we use GloVe to create our word embeddings, and CNNs followed by LSTMs to build our model. This model is based on the work in this paper: https://www.aclweb.org/anthology/C/C16/C16-1229.pdf

Here is the GitHub link for the code: Predicting Dow Jones with News Headlines using RNN

Text generation using RNN - Character Level

We build a C code generator by training an RNN on a large corpus of C code (the Linux kernel source). We have already downloaded the entire kernel folder and stored it in a local directory. You can download the C code used as source text from the following link: https://github.com/torvalds/linux/tree/master/kernel

Here is the GitHub link for the code: Text generation using RNN - Character Level
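The core data-preparation step for a character-level model can be sketched as below. This is a minimal illustration, not the notebook's code: the toy string stands in for the kernel source, and `make_sequences`, `seq_len` and `step` are hypothetical names and values. Each window of characters is paired with the character that follows it, which is what the RNN learns to predict.

```python
# Hypothetical stand-in for the kernel source; the project trains on the real kernel files.
text = "int main(void) { return 0; }\n"

# Character vocabulary and lookup tables in both directions.
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}


def make_sequences(text, seq_len, step=1):
    """Slide a window over the text. Each input is seq_len character indices;
    its target is the index of the character that immediately follows."""
    inputs, targets = [], []
    for start in range(0, len(text) - seq_len, step):
        window = text[start:start + seq_len]
        inputs.append([char_to_idx[c] for c in window])
        targets.append(char_to_idx[text[start + seq_len]])
    return inputs, targets


X, y = make_sequences(text, seq_len=10, step=3)
print(len(X), "training pairs")
print(repr("".join(idx_to_char[i] for i in X[0])), "->", repr(idx_to_char[y[0]]))
```

Before training, the index sequences would typically be one-hot encoded or passed through an embedding layer; at generation time, the model's predicted character is appended to the window and the window slides forward.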

Parts-of-speech Tagging using RNN

In this notebook, I classify words into their parts of speech and label them accordingly, a task known as part-of-speech tagging, or simply POS tagging.

Here is the GitHub link for the code: Parts-of-speech Tagging using RNN

Getting Started with OpenCV and its usage in images

This notebook has two main objectives:

  1. Getting familiar with OpenCV: its installation and some of its basic usage.
  2. Looking at the data I will use, which gives useful context for my later projects.

Here is the GitHub link for the code: Getting Started with OpenCV and its usage in images

Flower Image Recognition by CNN

In this notebook, I classify flower images using a CNN. To solve this image classification problem, the notebook uses various CNN concepts such as data augmentation, ablation, morphological transformations, normalisation, network building and hyperparameter tuning.

Here is the GitHub link for the code: Flower Image Recognition by CNN

Digit Recognition using CNN

A classic problem in the field of pattern recognition is handwritten digit recognition. Suppose that you have images of handwritten digits from 0 to 9, written by various people in boxes of a specific size, similar to the application forms in banks and universities. In this project, we apply a convolutional neural network (CNN) to this problem. Here is the GitHub link for the code: Digit Recognition using CNN
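The basic operation inside a CNN layer can be illustrated without any deep learning library. The sketch below, assuming a single-channel image, no padding and stride 1, slides a 3x3 kernel over a tiny hypothetical 5x5 image containing a vertical stroke (like part of a handwritten 1) and then applies ReLU. Like most deep learning libraries, it computes the "convolution" as a cross-correlation; in a trained CNN, the kernel weights are learned rather than hand-picked as they are here.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: no padding, stride 1, single channel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the kh x kw patch whose top-left corner is (r, c).
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise max(0, x), the usual activation after a convolution."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical 5x5 image with a vertical stroke in the middle column.
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Hand-picked vertical-line detector (hypothetical; a CNN would learn its kernels).
kernel = [[-1, 2, -1] for _ in range(3)]

features = conv2d_valid(image, kernel)
print(features)        # [[-3, 6, -3], [-3, 6, -3], [-3, 6, -3]]
print(relu(features))  # [[0, 6, 0], [0, 6, 0], [0, 6, 0]]
```

The 5x5 input shrinks to a 3x3 feature map, and the kernel responds strongly only where the stroke sits in the centre column of the window.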

Transfer Learning on Flower Dataset

For this project, we begin with a very popular machine learning dataset: Flowers. We go through the necessary steps to perform transfer learning on this dataset and then see how it helps on this problem. Here is the GitHub link for the code: Transfer Learning on Flower Dataset

Digit Recognition using Keras

A classic problem in the field of pattern recognition is handwritten digit recognition. Suppose that you have images of handwritten digits from 0 to 9, written by various people in boxes of a specific size, similar to the application forms in banks and universities. Here is the GitHub link for the code: Digit Recognition using Keras

House Price Prediction System with Feedforward Neural Networks using Keras

In this case study, we attempt to solve a real-world business problem using neural networks built with Keras. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: House Price Prediction System with Feedforward Neural Networks using Keras

House Price Prediction System with Feedforward Neural Networks

In this case study, we attempt to solve a real-world business problem using feedforward neural networks. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: House Price Prediction System with FeedForward Neural Networks

PCA on IRIS Dataset

For this project, we begin with a very popular machine learning dataset: ‘Iris’. We go through the necessary steps to perform PCA on a dataset and then see how it helps in visualising data that has more than two dimensions. Here is the GitHub link for the code: PCA On IRIS Dataset
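The core of PCA can be sketched in plain Python. To stay dependency-free, this sketch finds only the first principal component, using power iteration on the covariance matrix instead of the SVD or full eigendecomposition a library would typically use; the six rows are illustrative values in the Iris column layout, not the full dataset. The Rayleigh quotient gives the component's variance, and dividing it by the total variance gives the explained-variance ratio.

```python
# Illustrative rows in the Iris layout: sepal length, sepal width, petal length, petal width.
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.4, 3.2, 4.5, 1.5],
    [5.7, 2.8, 4.1, 1.3],
    [6.3, 3.3, 6.0, 2.5],
    [7.2, 3.0, 5.8, 1.6],
]


def center(data):
    """Subtract each column's mean so the data is centred at the origin."""
    n, d = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in data]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_component(cov, iters=500):
    """Power iteration: repeatedly apply the covariance matrix and normalise.
    Converges to the top eigenvector when the largest eigenvalue is unique."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


centered = center(rows)
cov = covariance(centered)
pc1 = first_component(cov)
eigenvalue = sum(a * b for a, b in zip(pc1, mat_vec(cov, pc1)))    # Rayleigh quotient
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))   # share of total variance
scores = [sum(a * b for a, b in zip(r, pc1)) for r in centered]    # 1-D projection of each row
print("PC1:", [round(x, 3) for x in pc1])
print("explained variance ratio:", round(explained, 3))
```

The sign of a principal component is arbitrary, so a library may return the negated vector; `scores` holds the coordinates one would plot along the first component.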

Example-Change of Basis-PCA

In this segment, we look at how to compute the transformation matrix M that lets us move between different sets of basis vectors. We generalise the conventions for moving from one basis to another so that the formula is easier to apply. Here is the GitHub link for the code: Change Of Basis-PCA
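One common convention can be sketched as follows, assuming each basis is stored as a matrix whose columns are its basis vectors in standard coordinates (the notebook may use a different convention). A vector with coordinates v_B in basis B has standard coordinates B·v_B, so moving from basis B1 to basis B2 uses M = B2⁻¹·B1. The helper names are hypothetical, and the inverse is limited to 2x2 bases.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]


def change_of_basis(b_from, b_to):
    """Return M such that coords_in_b_to = M @ coords_in_b_from.
    Each basis is a matrix whose columns are the basis vectors."""
    return mat_mul(inverse_2x2(b_to), b_from)


def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


standard = [[1, 0], [0, 1]]
B = [[2, 1],
     [1, 3]]  # columns: b1 = (2, 1), b2 = (1, 3)

v_in_B = [1, 1]                                      # means 1*b1 + 1*b2
v_std = apply(change_of_basis(B, standard), v_in_B)  # [3.0, 4.0] in standard coordinates
v_back = apply(change_of_basis(standard, B), v_std)  # approximately [1, 1] again
print(v_std, v_back)
```

Going from the standard basis to B, M is simply B⁻¹, which is why the round trip recovers the original coordinates up to floating-point rounding.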

Telecom Churn Case Study by PCA and Random Forest

In the telecom industry, customers can choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average annual churn rate of 15-25%. Since it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has become even more important than customer acquisition. Here is the GitHub link for the code: Telecom Churn Case Study by PCA and Random Forest

House Price Prediction System using PCA

In this case study, we attempt to solve a real-world business problem using PCA techniques. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: House Price Prediction System using PCA

Telecom Churn Case Study by PCA

In the telecom industry, customers can choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average annual churn rate of 15-25%. Since it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has become even more important than customer acquisition. Here is the GitHub link for the code: Telecom Churn Case Study by PCA

Unsupervised K-Prototype Clustering for Blood Transfusion Dataset

In this case study, we attempt to solve a real-world business problem using unsupervised K-Prototype clustering. The data relates to blood transfusion. Here is the GitHub link for the code: Unsupervised K-Prototype Clustering for Blood Transfusion Dataset

Unsupervised K-Mode Clustering for Banking Data

In this case study, we attempt to solve a real-world business problem using unsupervised K-Mode clustering. The data relates to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’). Here is the GitHub link for the code: Unsupervised K-Mode Clustering for Banking Data
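The k-modes idea can be sketched as follows: the distance between two records is the number of categorical attributes on which they differ, and each cluster centre is the column-wise mode rather than a mean. This is a minimal illustration of the algorithm, not the library used in the notebook; the records are hypothetical (job, marital status, housing loan) tuples, and the initial modes are picked by hand.

```python
from collections import Counter


def dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent category in each column (ties go to the first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, init_modes, max_iter=10):
    modes = [tuple(m) for m in init_modes]
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest mode.
        labels = [min(range(len(modes)), key=lambda k: dissimilarity(r, modes[k]))
                  for r in records]
        # Update step: recompute each cluster's mode; keep the old mode if a cluster is empty.
        new_modes = []
        for k in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            new_modes.append(column_modes(members) if members else modes[k])
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes


# Hypothetical customer records: (job, marital status, housing loan).
records = [
    ("admin.", "married", "yes"),
    ("admin.", "married", "no"),
    ("admin.", "single", "yes"),
    ("student", "single", "no"),
    ("student", "single", "no"),
    ("technician", "single", "no"),
]
labels, modes = k_modes(records, init_modes=[records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)
```

K-Prototype, used in the blood transfusion case study, combines this matching distance for categorical columns with a Euclidean distance for numeric ones.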

Unsupervised K-Means Clustering for Online Retail Data

In this case study, we attempt to solve a real-world business problem using unsupervised K-Means clustering. Online Retail is a transnational dataset that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based, registered, non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers. Here is the GitHub link for the code: Unsupervised K-Means Clustering for Online Retail Data
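Lloyd's algorithm, the standard way of fitting K-Means, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below assumes two numeric features per customer (the points are hypothetical, for example scaled recency and frequency values) and random initial centroids drawn from the data; library implementations typically add smarter initialisation such as k-means++ and several restarts.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids taken from the data
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return labels, centroids


# Hypothetical, already-scaled customer features: two well-separated groups.
points = [(1, 2), (2, 1), (1.5, 1.8), (10, 11), (11, 10), (10.5, 10.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because K-Means uses Euclidean distance, features such as monetary value and frequency are usually scaled first so that no single feature dominates the clustering.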

House Price Prediction System using XGBoost Regression

In this case study, we attempt to solve a real-world business problem using XGBoost regression. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: House Price Prediction System using XGBoost Regression

Employee Attrition Classification System by XGBoost Classification Technique

In this case study, we attempt to solve a real-world business problem using the XGBoost classification technique. We work through a risk analytics problem in the human resources domain and examine how data can be used effectively to solve business problems. The problem statement is to predict which factors led to employee attrition in a particular company. Here is the GitHub link for the code: Employee Attrition Classification System by XGBoost Classification Technique

Decision Tree for Heart Disease Prediction System

Heart disease is a leading cause of premature death in the world. Predicting the outcome of the disease is a challenging task. Data mining can be used to automatically infer diagnostic rules and help specialists make the diagnosis process more reliable. Here is the GitHub link for the code: Decision Tree for Heart Disease Prediction System
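A decision tree chooses each split by how much it reduces impurity. The sketch below, assuming the Gini criterion and a single numeric feature, scores every candidate threshold and keeps the one with the lowest weighted impurity; the ages and labels are hypothetical. Library implementations usually place thresholds midway between sorted values rather than at the values themselves, and they repeat this search over all features at every node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Try every threshold t (value <= t goes left, value > t goes right) on one
    numeric feature and return (t, weighted Gini impurity) for the best split."""
    n = len(values)
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical patients: one numeric feature and a 0/1 heart-disease label.
age = [40, 45, 50, 55, 60, 65]
disease = [0, 0, 0, 1, 1, 1]
print(best_split(age, disease))  # (50, 0.0)
```

A random forest, used in the companion heart disease project, grows many such trees on bootstrap samples with random feature subsets and combines their votes.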

Telecom Churn Case Study by Decision Tree Technique

In the telecom industry, customers can choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average annual churn rate of 15-25%. Since it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has become even more important than customer acquisition. Here is the GitHub link for the code: Telecom Churn Case Study by Decision Tree Technique

SVM Letter Recognition

In this case study, we use a support vector machine (SVM) to solve a letter recognition problem. Here is the GitHub link for the code: SVM Letter Recognition

Bernoulli Naive Bayes for SMS Classification System

In this case study, we attempt to solve a real-world problem using Bernoulli Naive Bayes. Here is the GitHub link for the code: Bernoulli Naive Bayes for SMS Classification System
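What distinguishes Bernoulli Naive Bayes is that each message is reduced to word presence or absence, and vocabulary words that are absent also contribute to the score through log(1 - p). A minimal sketch, assuming whitespace tokenisation, Laplace smoothing with `alpha = 1` and four hypothetical SMS messages (the notebook's preprocessing and data differ):

```python
import math


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Estimate priors and P(word present | class), smoothed over present/absent."""
    vocab = sorted({w for d in docs for w in d.split()})
    priors, likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [set(d.split()) for d, lab in zip(docs, labels) if lab == c]
        n_c = len(class_docs)
        priors[c] = n_c / len(docs)
        likelihood[c] = {w: (sum(w in d for d in class_docs) + alpha) / (n_c + 2 * alpha)
                         for w in vocab}
    return vocab, priors, likelihood


def predict_bernoulli_nb(doc, vocab, priors, likelihood):
    present = set(doc.split())
    scores = {}
    for c in priors:
        s = math.log(priors[c])
        for w in vocab:
            p = likelihood[c][w]
            # Present words add log(p); absent vocabulary words add log(1 - p).
            s += math.log(p) if w in present else math.log(1 - p)
        scores[c] = s
    return max(scores, key=scores.get)


# Hypothetical training messages.
docs = ["win cash prize now", "claim free prize now",
        "are we meeting today", "see you at lunch today"]
labels = ["spam", "spam", "ham", "ham"]
model = train_bernoulli_nb(docs, labels)
print(predict_bernoulli_nb("win free prize", *model))  # spam
print(predict_bernoulli_nb("meeting today", *model))   # ham
```

Words outside the training vocabulary are ignored, because the model only has presence probabilities for words it has seen.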

Multinomial Naive Bayes for SMS Classification System

In this case study, we attempt to solve a real-world problem using Multinomial Naive Bayes. Here is the GitHub link for the code: Multinomial Naive Bayes for SMS Classification System

Advertising Case Study for Simple Linear Regression

In this case study, we attempt to solve a real-world business problem using simple linear regression. We work through an advertising problem in the marketing domain and examine how data can be used effectively to solve business problems such as advertising-based sales prediction. Here is the GitHub link for the code: Advertising Case Study for Simple Linear Regression
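Simple linear regression has a closed-form least-squares solution: slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1·x̄, where Sxy and Sxx are the centred cross-product and squared sums. The sketch below computes both, plus R², from scratch; the TV spend and sales figures are hypothetical, not the case study's data.

```python
def fit_simple_linear_regression(x, y):
    """Closed-form least squares for y = b0 + b1 * x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def r_squared(x, y, b0, b1):
    """Share of the variance in y explained by the fitted line."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical TV advertising spend and sales figures.
tv = [10, 50, 90, 130, 170, 210]
sales = [7.5, 9.8, 12.1, 13.9, 16.4, 18.2]
b0, b1 = fit_simple_linear_regression(tv, sales)
print(f"sales ~ {b0:.3f} + {b1:.4f} * tv, R^2 = {r_squared(tv, sales, b0, b1):.3f}")
```

The residuals `yi - (b0 + b1 * xi)` computed inside `r_squared` are also what residual analysis inspects for patterns or non-constant variance.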

Random Forest for Heart Disease Prediction System

Heart disease is a leading cause of premature death in the world. Predicting the outcome of the disease is a challenging task. Data mining can be used to automatically infer diagnostic rules and help specialists make the diagnosis process more reliable. Here is the GitHub link for the code: Random Forest for Heart Disease Prediction System

Multinomial and Bernoulli Naive Bayes

To understand Multinomial and Bernoulli Naive Bayes, we start with a small example and walk through the end-to-end process. To begin, we take a few sentences and classify them into two classes: education or cinema. Each sentence represents one document. In real-world cases, a document can be any piece of text, such as an email, a news article, a book review or a tweet. The analysis and the algorithm involved do not depend on the type of document we use. Here is the GitHub link for the code: Multinomial and Bernoulli Naive Bayes
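The multinomial variant of the education/cinema example can be sketched as below. Unlike the Bernoulli model, it counts every occurrence of a word, and with Laplace smoothing P(w | c) = (count(w, c) + 1) / (total words in c + |V|). The four training sentences are hypothetical stand-ins for the notebook's example, and words never seen in training are ignored at prediction time.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate class priors and Laplace-smoothed word probabilities P(word | class)."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    priors, likelihood = {}, {}
    for c in sorted(set(labels)):
        counts = Counter(w for d, lab in zip(docs, labels) if lab == c for w in tokenize(d))
        total = sum(counts.values())
        priors[c] = labels.count(c) / len(labels)
        likelihood[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return set(vocab), priors, likelihood


def predict_multinomial_nb(doc, vocab, priors, likelihood):
    """Return the class with the highest log-posterior; repeated words count repeatedly."""
    words = [w for w in tokenize(doc) if w in vocab]  # unseen words are dropped
    scores = {c: math.log(priors[c]) + sum(math.log(likelihood[c][w]) for w in words)
              for c in priors}
    return max(scores, key=scores.get)


# Hypothetical training sentences.
docs = ["the teacher explained the lesson",
        "students attend the lecture at college",
        "the actor starred in the movie",
        "we watched a film at the cinema"]
labels = ["education", "education", "cinema", "cinema"]
model = train_multinomial_nb(docs, labels)
print(predict_multinomial_nb("a lecture by the teacher", *model))  # education
print(predict_multinomial_nb("the movie actor", *model))           # cinema
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.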

Time Series Johansen Impulse Demonstration

This is a time series demonstration of the Johansen test and impulse response analysis.

Here is the GitHub link for the code: Time Series Johansen Impulse Demonstration

Time Series Forecast Demonstration

This is a time series forecast demonstration.

Here is the GitHub link for the code: Time Series Forecast Demonstration

Time Series Forecast for Airline Passenger Prediction System

This dataset provides monthly totals of international airline passengers from 1949 to 1960. It is R’s built-in AirPassengers dataset. Here is the GitHub link for the code: Time Series Forecast for Airline Passenger Prediction System

Email Spam Detection and Classification using Linear SVM

Today, email has become one of the most widely used and efficient forms of communication for Internet users. Because of its popularity, email is also misused. One such misuse is the sending of unwanted, undesirable messages known as spam or junk mail. Email spam has several consequences: it reduces productivity, consumes extra mailbox space and time, spreads harmful viruses and potentially dangerous material, undermines the stability of mail servers, and forces users to spend a lot of time sorting incoming mail and deleting unwanted messages. Spam detection is therefore needed to reduce these effects. In this notebook, I propose a novel method for email spam detection using SVM and feature extraction. Here is the GitHub link for the code: Email Spam Detection and Classification using Linear SVM

IMDB Review Classification using Naive Bayes Technique

In this case study, we attempt to solve a real-world business problem using Naive Bayes. We build a Naive Bayes classifier to predict whether reviews from the IMDB dataset are positive or negative. Here is the GitHub link for the code: IMDB Review Classification using Naive Bayes Technique

House Price Prediction System using Random Forest

In this case study, we attempt to solve a real-world business problem using multiple linear regression and random forest techniques. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: House Price Prediction System using Random Forest

House Price Prediction System using Multiple Linear Regression and Decision Trees

In this case study, we attempt to solve a real-world business problem using multiple linear regression and decision tree techniques. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: House Price Prediction System using Multiple Linear Regression and Decision Trees

House Price Prediction System using Gradient Boosting Regression

In this case study, we attempt to solve a real-world business problem using gradient boosting regression. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: GBM Regression

Employee Attrition Classification System by GBM Classification

In this case study, we attempt to solve a real-world business problem using a gradient boosting classification algorithm. We work through a risk analytics problem in the human resources domain and examine how data can be used effectively to solve business problems. The problem statement is to predict which factors led to employee attrition in a particular company. Here is the GitHub link for the code: GBM Classification

Email Spam Detection and Classification using SVM

Today, email has become one of the most widely used and efficient forms of communication for Internet users. Because of its popularity, email is also misused. One such misuse is the sending of unwanted, undesirable messages known as spam or junk mail. Email spam has several consequences: it reduces productivity, consumes extra mailbox space and time, spreads harmful viruses and potentially dangerous material, undermines the stability of mail servers, and forces users to spend a lot of time sorting incoming mail and deleting unwanted messages. Spam detection is therefore needed to reduce these effects. In this notebook, I propose a novel method for email spam detection using SVM and feature extraction. Here is the GitHub link for the code: Email Spam Detection and Classification using SVM

Digit Recognition using SVM

A classic problem in the field of pattern recognition is handwritten digit recognition. Suppose that you have images of handwritten digits from 0 to 9, written by various people in boxes of a specific size, similar to the application forms in banks and universities. Here is the GitHub link for the code: Digit Recognition using SVM

Delhi Delight Case Study using Decision Tree Algorithm

We work at Delhi Delights!, a food delivery company in Delhi. It offers a premium membership called ‘Delighted Members’, which removes the delivery charge on orders. Lately, the number of purchases of this premium membership has been going down. Based on past data, Delhi Delights! wants to predict which customers will buy the ‘Delighted Members’ membership and which will not. Here is the GitHub link for the code: Delhi Delight Case Study using Decision Tree Algorithm

House Price Prediction System using Cross Validation with Linear Regression Technique

In this case study, we attempt to solve a real-world business problem using cross-validation with linear regression. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: Cross Validation for House Price Prediction
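K-fold cross-validation can be sketched as follows: the data is split into k folds, and each fold in turn is held out for validation while the model is fitted on the rest. This sketch assumes contiguous folds without shuffling and a single-feature linear regression; the area/price pairs and helper names are hypothetical. In practice, rows are often shuffled first so that each fold is representative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds (no shuffling);
    the first n_samples % k folds get one extra sample."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def fit_line(x, y):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def cross_val_mse(x, y, k):
    """Fit on k-1 folds, then return the mean squared error on each held-out fold."""
    scores = []
    for train, val in k_fold_indices(len(x), k):
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in val]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical (area, price) pairs, not taken from the project's dataset.
area = [5, 7, 8, 10, 12, 13, 15, 16, 18, 20]
price = [25, 33, 41, 48, 58, 62, 74, 79, 88, 99]
scores = cross_val_mse(area, price, k=5)
print([round(s, 2) for s in scores], "mean:", round(sum(scores) / len(scores), 2))
```

The average of the fold scores is a less noisy estimate of out-of-sample error than a single train/test split, which is why it is used to compare models or tune hyperparameters.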

Advanced Regression Case Study for Car Price Prediction

In this case study, we attempt to solve a real-world business problem using advanced regression techniques. We work through a car price prediction problem in the automobile domain and examine how data can be used effectively to solve business problems such as car price prediction. Here is the GitHub link for the code: Advanced Regression

Employee Attrition Case Study for a Company by AdaBoost Classification

In this case study, we attempt to solve a real-world business problem using the AdaBoost classification algorithm. We work through a risk analytics problem in the human resources domain and examine how data can be used effectively to solve business problems. The problem statement is to predict which factors led to employee attrition in a particular company. Here is the GitHub link for the code: Advanced Classification
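The heart of AdaBoost is its reweighting step. The sketch below shows one round of the classic discrete AdaBoost update for labels in {-1, +1}, assuming the weak learner's predictions are already available (they are hypothetical here): the learner receives a vote weight α = ½·ln((1 - err) / err), and misclassified samples gain weight for the next round. Some library implementations scale α differently, for example through a learning rate.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost for labels in {-1, +1}.
    Returns the learner's vote weight alpha and the renormalised sample weights."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a mistake, so mistakes are scaled by e^alpha and hits by e^-alpha.
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]  # hypothetical weak learner that misclassifies the last sample
alpha, weights = adaboost_round(weights, y_true, y_pred)
print(round(alpha, 4), [round(w, 4) for w in weights])  # 0.5493 [0.1667, 0.1667, 0.1667, 0.5]
```

With one of four equally weighted samples misclassified, err = 0.25 and α = ½·ln 3 ≈ 0.549; after renormalisation, the misclassified sample carries half of the total weight, so the next weak learner focuses on it.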

Advertising Case Study for Advanced Regression

In this case study, we attempt to solve a real-world business problem using advanced regression techniques. We work through an advertising problem in the marketing domain and examine how data can be used effectively to solve business problems such as advertising-based sales prediction. Here is the GitHub link for the code: Advanced Regression

House Price Prediction System using AdaBoost Regression

In this case study, we attempt to solve a real-world business problem using AdaBoost regression. We work through a house price prediction problem in the real estate domain and examine how data can be used effectively to solve business problems like this one. The problem statement is to predict house sales in a particular location and to understand which factors are responsible for higher property values. Here is the GitHub link for the code: AdaBoost Regression

Lending Club Case Study for Loan Defaulter Prediction System

In this case study, we attempt to solve a real-world business problem using exploratory data science techniques. We work through a risk analytics problem in the banking and financial domain and examine how data can be used effectively to solve business problems such as predicting loan defaulters at Lending Club. Here is the GitHub link for the code: Lending Club Case Study Code

Simple Linear Regression on Sales Data

In this case study, we attempt to solve a real-world business problem using simple linear regression. We work through a risk analytics problem in the sales domain and examine how data modelling can be used effectively to solve business problems, using residual analysis and by making and evaluating predictions on the test set. Here is the GitHub link for the code: Simple Linear Regression of Sales Data

Multiple Linear Regression on Bike Sharing Data

In this case study, we attempt to solve a real-world business problem using multiple linear regression: we build a multiple linear regression model to predict the demand for shared bikes. Here is the GitHub link for the code: Multiple Linear Regression of Bike Sharing Data

Multiple Linear Regression on Real Estate Data

Consider a real estate company that has a dataset containing the prices of properties in the Delhi region. It wishes to use the data to optimise the sale prices of the properties based on important factors such as area, bedrooms and parking. Essentially, the company wants:

  • To identify the variables affecting house prices, e.g. area, number of rooms, bathrooms, etc.
  • To create a linear model that quantitatively relates house prices to variables such as number of rooms, area, number of bathrooms, etc.
  • To know the accuracy of the model, i.e. how well these variables can predict house prices.

So interpretation is important!

The steps we will follow in this exercise are as follows:

Logistic Regression on Customer Churn Data

Our data, sourced from Kaggle, concerns customer churn at a telecommunications company, i.e. the rate at which commercial customers leave the platform on which they are currently (paying) customers. After carrying out the EDA process, we had a good sense of what the data tells us before processing, and we built a logistic regression classification model to predict whether a customer is at risk of churning from Telco’s platform. Here is the GitHub link for the code: Logistic Regression on Customer Churn Data
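Logistic regression models the churn probability as σ(w·x + b). A minimal sketch of fitting it by batch gradient descent on the mean log loss, assuming two hypothetical, pre-scaled features (tenure and monthly charges) and six made-up customers; this is not the notebook's implementation, and the learning rate and epoch count are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log loss, the gradient with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_churn_probability(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical, pre-scaled features: [tenure, monthly_charges]; label 1 = churned.
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.8, 0.2], [0.9, 0.3], [0.7, 0.1]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic_regression(X, y)
probs = [predict_churn_probability(x, w, b) for x in X]
preds = [int(p >= 0.5) for p in probs]
print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
print("predictions:", preds)
```

The 0.5 cut-off is only a default; for churn, a lower threshold is often chosen to catch more at-risk customers at the cost of more false alarms.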