Venkatesh Subramaniam
Data Scientist seeking new opportunities

  • Graduate Research and Teaching Assistant at Saint Peter's University
    September 2016

    ● Researching key factors that impact road accident and car theft rates by zip code to help Liberty Mutual better forecast premium costs and reduce payouts

    ● Traveled to Boston to present findings to 30+ analysts from Liberty Mutual and its subsidiaries

    ● Instructed a Machine Learning and Statistics bootcamp at Saint Peter's University

  • What It Took To Score The Top 2% On The Higgs Boson Kaggle Machine Learning Challenge
    August, 2016

    ● Classified events detected by ATLAS as either “tau tau decay of a Higgs Boson” or “background”

    ● Implemented Logistic Regression, Support Vector Machine, Random Forest, Gradient Boosting and Extreme Gradient Boosting; selected Extreme Gradient Boosting because it had the highest predictive accuracy

    ● Combined the models with feature engineering and used an ensemble method called stacking to improve the results further. The final model ranked 34th out of 2,100 contestants in the Kaggle competition, placing in the top 2%

  • Predicting the Road Accident Fatality Likelihood
    August, 2016

    ● Utilized Tableau, Python and R to analyze key factors including climate, crime rates, road dimensions and traffic density to determine which of these contribute to road accident fatalities in Seattle and Boston

    ● Scraped Wunderground weather, Cambridge accidents and Socrata 911 response data with Python

    ● Implemented Logistic Regression, Random Forest, Gradient Boosting and Hierarchical Clustering machine learning techniques; identified weather as the largest contributing factor

  • Web Scraping and Sentiment Analysis Of Yelp User Data
    July, 2016

    Scraped Yelp restaurant review data using BeautifulSoup and performed exploratory data analysis on the scraped data to obtain insights. Ran sentiment analysis on the user review text with VADER (Valence Aware Dictionary and sEntiment Reasoner). The sentiment analysis output indicates whether a restaurant's customer comments are positive or negative

  • NYC Data Science Academy
    May, 2016

    Selected to join an intensive 12-week data science bootcamp program.

    ● Distributed Computing and High Performance Computing.

    ● Machine learning - Time Series Analysis, Regression, Gradient Boosted Machines, Random Forest, Clustering, Principal Component Analysis, Support Vector Machines, and Neural Networks.

    ● Web scraping.

    ● Data manipulation, Data visualization and Interactive App development.

    ● Fundamental statistics, A/B Testing.

  • Saint Peter’s University
    Master Of Science in Data Science with Concentration in Business Analytics
    January, 2016
    Course Details

    ● Introduction to Data Science
    ● Statistical Programming
    ● Data Analysis and Decision Modeling
    ● Database & Data Warehousing
    ● Big Data Analytics
    ● Data Visualization
    ● Machine Learning
    ● Data Mining
    ● Predictive Analytics and Experimental Design
    ● Business Analytics
    ● Data Law, Ethics and Privacy
    ● Capstone: Business Analytics
  • Training in Analytics at INSOFE
    January, 2014 - January, 2015

    Certification Program, Big Data Analytics and Optimization:

    This 6-month classroom-based program (352-hour requirement) is certified for the quality of its Content, Pedagogy & Assessment by the LTI at Carnegie Mellon University, USA. It covers Statistics, Machine Learning Algorithms, Optimization and Big Data Analytics. In addition, the program covers current topics such as deep learning, spectral methods, kernel techniques, BSP, HAMA, SPARK, Pregel/Giraph, NUTCH, Social Graphs, Big Text Processing, Text Mining using RTM, HaaS (Hadoop as a service) and Apache MAHOUT.

    Skills acquired from this program:

    Programming Expertise: R, Hadoop and its Ecosystem

    Topical Expertise: Statistics fundamentals, Statistical Modeling, Data Analytics, Machine Learning, Text Mining, Optimization, Data Visualization, Communications and Ethics Issues in Analytics

    Techniques: Regression, Time Series, Decision Trees, Clustering, Association Rules, K-Nearest Neighbors, Neural Nets, SVM, Genetic Algorithms, Monte Carlo Simulations, Linear & Quadratic Programming.

  • Data Analyst at Accenture
    January, 2016

    ● Provided Business Intelligence (BI) data to analysts for an Australian bank; utilized the BI tools Informatica and SAP BusinessObjects Data Services, with data visualization in Tableau

    ● Created a relational schema in SQL that captured structured data including number of account types, transaction volume, and deposit and withdrawal amounts


  • Bachelor of Technology in Electronics and Communication Engineering, Amrita University
    June, 2010 - June, 2014


    • Communicative English
    • Calculus and Matrix Algebra
    • Computational Thinking and Problem Solving
    • Physics/Chemistry
    • Physics/Chemistry Lab
    • Workshop A/Workshop B
    • Engg. Drawing - CAD
    • Cultural Education I


    • Vector Calculus and Ordinary Differential Equations
    • Chemistry/Physics
    • Computer Programming
    • Solid State Devices
    • Fundamentals of Electrical Technology
    • Chemistry Lab. / Physics Lab
    • Workshop B / Workshop A
    • Computer Programming Lab
    • Cultural Education II


    • Humanities Elective I
    • Amrita Values Program I
    • Linear Algebra
    • Network Theory
    • Electromagnetic Theory
    • Digital Systems
    • Signals and Systems
    • Digital Systems Lab
    • Signals and Systems Lab


    • Humanities Elective II
    • Amrita Values Program II
    • Probability and Random Process
    • Electronic Circuits
    • Digital Signal Processing
    • Transmission Lines and Waveguides
    • Digital Signal Processing Lab
    • Electronic Circuits Lab I
    • Soft Skills I


    • Optimization Techniques
    • Linear Integrated Circuits
    • Control Engineering
    • Communication Theory
    • Microprocessor and Microcontroller
    • Circuits and Communication Lab
    • Microcontroller Lab
    • Soft Skills II
    • Live-in Lab


    • Digital Communication
    • Data Communication and Networks
    • Computer Organization and Architecture
    • VLSI Design
    • Elective I
    • VLSI Design Lab
    • Digital Communication Lab
    • Open Lab
    • Soft Skills III


    • Environmental Studies
    • Radio Frequency Engineering
    • Information Theory and Coding Techniques
    • Elective II
    • Elective III
    • Microwave Engineering Lab
    • Project Phase 1
    • Live-in Lab


    • Elective IV
    • Elective V
    • Project Phase 2

Predicting The Road Accident Fatality Likelihood

September 23, 2016

The costs of fatalities and injuries due to traffic accidents have a great impact on society. In recent years, researchers have paid increasing attention to determining the factors that significantly affect the severity of driver injuries caused by traffic accidents. Applying data mining techniques to traffic accident records can help us understand which characteristics of driver behaviour, roadway conditions and weather conditions are causally connected with different levels of injury severity.


What It Took to Score the Top 2% on the Higgs Boson Machine Learning Challenge

August 29, 2016

How do we pair machine learning with physics? Particle physics is a branch of physics that studies the elementary constituents of matter and radiation, and the interactions between them. Modern particle physics research focuses on subatomic particles, using particle accelerators to break atoms apart and detect their constituents (particles smaller than an atom). The ATLAS detector at CERN's Large Hadron Collider was built to search for the mysterious Higgs Boson, which is responsible for generating mass. The Higgs Boson is named after particle physicist Peter Higgs, who, with five other physicists, predicted the existence of such a particle in 1964.


Geospatial and Temporal Data Analysis on the New York City Taxi Trip Data

November 8, 2016

The New York City taxi business is an interesting field for data analysis. This analysis is done on Spark and concentrates on how long drivers wait for their next passenger after completing a ride, based on location. The data comes from the NYC Taxi and Limousine Commission. The geospatial and temporal data is put to good use in Spark to derive insights. The desired final result pairs each location with the average wait time for the next passenger, so the data can be used to find good places to pick up a customer with a shorter wait. The results are reassuring and sensible. For example, it is possible to get customers more quickly in Manhattan than in the Bronx, and the output shows exactly that.


Sentiment Analysis Of Yelp User Review Data

August 22, 2016

Social data provides important, real-time insights on consumer opinion – on lifestyle, habits, brands, and preferences. Because these opinions are unsolicited, they provide genuine insight into consumer feelings, and, as such, they should be valued. Yelp provides restaurant details including name, price, rating, address and reviews. The ratings given by users say how good a restaurant is, but are ratings alone really sufficient to give the correct information? No, because people who really hated a restaurant would comment on their experience. The same goes for a good experience. Thus, one would expect that performing sentiment analysis would give better insight into the masses’ opinions of restaurants.


International Visitation Analysis For United States

August 8, 2016

The U.S. Department of Commerce announced that 5.5 million international visitors traveled to the United States in January 2016, a one percent increase over January 2015. January 2016 registered the fifth straight month of increases in total U.S. visits.


Exploratory Analysis Of New York City Yellow Taxi Data

July 25, 2016

The New York City Taxi & Limousine Commission has released staggeringly detailed historical data covering over 1.1 billion individual taxi trips in the city from January 2009 through June 2015. Taken as a whole, the detailed trip-level data is more than just a vast list of taxi pickup and drop off coordinates. It specifies some other useful information about the number of passengers, pick up times, location and revenue.

I chose this project to understand the dynamics of the yellow taxi industry better. What kind of trips are made in cabs? Where do those trips occur? Does the number of passengers using the taxi follow any pattern? What are the predominant costs and locations of taxi trips, and what are the implications of these findings?

The primary goal of this analysis is to find useful insights that help yellow taxi cab drivers work smart, not hard.


Venkatesh is a graduate who holds a Bachelor's Degree in Electronics and Communication Engineering and is currently pursuing his Master's Degree (M.S.) in Data Science. He has experience as a Data Analyst at Accenture, where he worked on database processing and visualization of financial data. In addition, he has completed several freelance projects. In one of these projects he worked with an insurance company to predict road accidents and leverage that insight to reduce car insurance premiums for customers.
Venkatesh supplemented the theoretical background from his Master's degree and his practical experience by attending the NYC Data Science Academy bootcamp.


*Feel free to download my resume here:)