In a world where everyone is connected to the internet and sensors are everywhere, my main goals are to work as a data scientist and to master machine learning in order to make accurate predictions about the future.
The fields where I am most interested in applying machine learning and data science techniques to large datasets are transportation, health, social networks and cryptocurrencies. While finishing my Master's in Data Science, I work as an intern in the MyDrive team at TomTom Amsterdam and independently as a Professional Fitness Coach, helping people improve their body composition, health, fitness performance and long-term habits.
Other interests of mine are GNU/Linux administration, physics, athletic performance optimization, human nutrition and traveling. If you want to see the rest of my interests and skills, please check out my profile on LinkedIn.
Data Science Projects
These are some of my most recent data science projects. To access the full content, click on the images.
Google Trends Modeling: Predicting depression
- Extracted the search volume of depression-related keywords from Google Trends
- Using official surveys as ground truth, found the best model for predicting the percentage of people with depression in each US state (R^2 = 0.74)
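For readers unfamiliar with the R^2 figure quoted above, here is a minimal sketch of how the coefficient of determination is computed. The numbers are made up for illustration and are not the survey or Google Trends data from the project.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values: survey percentages vs. model predictions
survey = [4.0, 6.0, 8.0, 10.0]
model = [5.0, 6.0, 8.0, 9.0]
print(r_squared(survey, model))
```

A value of 1 means the predictions match the surveys exactly, while 0 means the model does no better than predicting the average for every state.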
Data Science challenge: Decision making
- Calculated the maximum price the company can pay to move to a new technology while keeping the move profitable
- Carried out data exploration and statistical analysis, and built a random forest model
Brain Oscillations and Network Activity
- Performed a spectral analysis of a patient's electroencephalography (EEG) time series under different conditions and found significant differences in the spectra in the occipital lobe
- Created a network representation of the brain and discovered which regions interact with each other to form communities, as well as their topology
Social and Temporal Text Analysis of the Italian Referendum
- Used 76 GB of Twitter data to detect the most influential users in the days leading up to the 2016 Italian referendum
- Performed a temporal analysis using SAX and clustering methods to find the most relevant words that classify the users based on their ideology
- Our results show a polarized network in which the No supporters carry more weight, in line with the outcome of the referendum
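As a rough illustration of the SAX step, assuming SAX here stands for Symbolic Aggregate approXimation, the sketch below turns a numeric series (for example, daily counts of a word) into a short string of letters that can then be compared or clustered. The function name, the use of the population standard deviation, and the sample data are assumptions for illustration, not the project's code.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word.

    Assumes len(series) is a multiple of n_segments.
    """
    mu, sigma = mean(series), pstdev(series)
    # 1. z-normalize (a constant series maps to all zeros)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # 2. Piecewise aggregate approximation: mean of each equal-width segment
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]
    # 3. Breakpoints that split the standard normal into equiprobable bins
    cuts = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    # 4. Map each segment mean to a letter: 'a' for the lowest bin, and so on
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in paa)


# Hypothetical daily counts of one word: quiet at first, then a burst
daily_counts = [0, 0, 0, 0, 10, 10, 10, 10]
print(sax(daily_counts, n_segments=2, alphabet_size=3))
```

Words whose usage follows a similar pattern over time end up with similar strings, which is what makes the representation convenient for clustering.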
Bayesian Inference: Predicting bodyweight changes
- Performed Bayesian inference on 2 years of personal data from one of my clients, and showed how bodyweight changes depend on physical activity, metabolic rate and energy intake
- Predicted with 80% confidence that a given diet will lead to weight loss
Recommender systems for books on big datasets
This software gives book recommendations based on a dataset of 1,200,000 ratings from 279,000 users on 271,000 books. The algorithms used are item-based and user-based collaborative filtering, and association rules.
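To give an idea of how item-based collaborative filtering works, here is a minimal sketch on a toy dataset. It scores each unread book by its cosine similarity to the books a user has already rated. The function names, the scoring rule and the ratings are assumptions for illustration; the actual project may differ.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as {user: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank unread books by similarity-weighted ratings of the user's books."""
    # Invert {user: {book: rating}} into {book: {user: rating}}
    by_book = defaultdict(dict)
    for u, books in ratings.items():
        for book, r in books.items():
            by_book[book][u] = r
    seen = ratings[user]
    scores = {}
    for candidate, vector in by_book.items():
        if candidate in seen:
            continue
        score = sum(cosine(vector, by_book[book]) * r for book, r in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale
ratings = {
    "ann": {"Dune": 5, "Emma": 1},
    "bob": {"Dune": 5, "Neuromancer": 5},
    "cat": {"Emma": 5, "Persuasion": 5},
    "dan": {"Dune": 4},
}
print(recommend(ratings, "dan"))
```

On a dataset of the size described above, the pairwise similarities would normally be precomputed and stored sparsely rather than recalculated for every request.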
House Prices: Advanced Regression Techniques
- Modeled house prices with an accuracy of 89%
- Used XGBoost with careful data preprocessing and feature engineering
Epidemic spreading in Social Networks
- Simulated and studied the spread of a disease on a social network following the standard SIR model
- Minimized the duration and harm of the epidemic with a limited number of vaccines by predicting and immunizing the most important nodes of the network
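The idea can be sketched with a small discrete-time SIR simulation. Here, "most important" is approximated by node degree, which is an assumption for illustration only; the project may have used a different importance measure. The graph, function names and parameters are likewise hypothetical.

```python
import random


def top_degree_nodes(graph, k):
    """Return the k nodes with the most neighbors (a simple importance proxy)."""
    return sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:k]


def simulate_sir(graph, seeds, beta, gamma, vaccinated=(), rng=None):
    """Run a discrete-time SIR epidemic on graph = {node: set_of_neighbors}.

    beta: per-step infection probability along each edge.
    gamma: per-step recovery probability (must be > 0 to terminate).
    Returns the number of nodes that were ever infected.
    """
    rng = rng or random.Random(0)
    state = {n: "S" for n in graph}
    for n in vaccinated:
        state[n] = "R"  # vaccinated nodes start out immune
    for n in seeds:
        if state[n] == "S":
            state[n] = "I"
    ever_infected = sum(1 for s in state.values() if s == "I")
    while any(s == "I" for s in state.values()):
        infected = [n for n, s in state.items() if s == "I"]
        newly = set()
        for n in infected:
            for nb in graph[n]:
                if state[nb] == "S" and rng.random() < beta:
                    newly.add(nb)
        for n in infected:
            if rng.random() < gamma:
                state[n] = "R"
        for n in newly:
            state[n] = "I"
        ever_infected += len(newly)
    return ever_infected


# Hypothetical star network: node 0 is connected to everyone else
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
print(simulate_sir(star, seeds=[1], beta=1.0, gamma=1.0))
print(simulate_sir(star, seeds=[1], beta=1.0, gamma=1.0,
                   vaccinated=top_degree_nodes(star, 1)))
```

In this toy network a single vaccine given to the hub stops the outbreak at the first infected node, whereas without it every node is reached.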
Flights Scraper: a dynamic web scraper to get the best flight deals
Suppose you want to fly somewhere any time in May and come back any time in July. There are about 900 possible date combinations for your trip, too many to track manually. That is why I wrote this software, which scrapes the dynamic website of the popular travel agency Kayak and finds all the flights that match your dates, along with all the relevant information.
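The size of the search space is easy to check. The sketch below enumerates every outbound/return date pair for May and July; the year 2024 and the helper name are arbitrary choices for illustration, and no scraping is involved.

```python
from datetime import date, timedelta
from itertools import product


def days_in_month(year, month):
    """Return every date in the given month."""
    first = date(year, month, 1)
    # First day of the following month (rolls over to January after December)
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    return [first + timedelta(days=i) for i in range((next_first - first).days)]


outbound = days_in_month(2024, 5)  # any day in May
inbound = days_in_month(2024, 7)   # any day in July
pairs = list(product(outbound, inbound))
print(len(pairs))  # 31 * 31 = 961
```

Each of these pairs corresponds to one search a person would otherwise have to run by hand.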
Book clustering by language using Jaccard similarity
Using the Jaccard similarity on the words of the books, this software lets you find the books that are written in the same language and cluster them together.
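A minimal sketch of the idea follows. The Jaccard similarity of two word sets is the size of their intersection divided by the size of their union, and texts in the same language share many common words. The greedy threshold-based grouping, the threshold value and the sample texts are assumptions for illustration, not necessarily the clustering method used in the project.

```python
import re


def word_set(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_language(books, threshold):
    """Greedily group titles whose text is similar to a cluster's first member."""
    clusters = []  # each cluster is a list of titles
    for title, text in books.items():
        words = word_set(text)
        for cluster in clusters:
            representative = word_set(books[cluster[0]])
            if jaccard(words, representative) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters


# Hypothetical one-sentence "books"
books = {
    "en1": "the cat and the dog are in the house",
    "en2": "the bird and the fish are in the garden",
    "es1": "el gato y el perro están en la casa",
    "es2": "el pájaro y el pez están en el jardín",
}
print(cluster_by_language(books, threshold=0.3))
```

With full-length books the shared vocabulary of function words is much larger, so the gap between same-language and cross-language similarity becomes clearer than in these short sentences.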