A portfolio of my data science projects from academic work, self-learning, and hobbies.
More information about me: LinkedIn
Graduate Thesis Research, Beck's Research Lab, University of Washington, Seattle, WA
Developing a Visualization Tool for Unsupervised Machine Learning Analysis on Genomics Data
- Applied K-means clustering and PCA to RNA-seq data using scikit-learn.
- Created a SQL Server database and wrote queries for data loading and extraction.
- Built an interactive web application and visualized analysis results with Tableau and Plotly Dash.
Keywords: K-Means Clustering / Principal Component Analysis / Data Visualization / Genomics / RNA-seq
Working Repository: DashOmics
Learning Repository: Learning-Tableau ; Learning-Dash
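The clustering and projection steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from DashOmics: the matrix shape, the three-group structure, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (assumption): 60 genes x 8 samples,
# drawn from three groups with different mean expression levels.
rng = np.random.default_rng(0)
expression = np.vstack([
    rng.normal(loc=mean, scale=0.5, size=(20, 8)) for mean in (0.0, 5.0, 10.0)
])

# Standardize each sample (column) so no single column dominates the distances.
scaled = StandardScaler().fit_transform(expression)

# Cluster genes (rows) by their expression profiles.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Project onto the first two principal components, e.g. for a 2-D scatter plot.
pca = PCA(n_components=2)
coords = pca.fit_transform(scaled)
```

The `labels` array can then color the points in a scatter plot of `coords`, which is the kind of view an interactive dashboard would expose.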
------------------------------------------------------------------------------------------------------------------------------------------------
DIRECT Data Science Trainee, Clean Energy Institute, University of Washington, Seattle, WA
- Analysis and Optimization of Lignin Pyrolysis Kinetic Model (Capstone Project): Developed an open-source package using SciPy to analyze a chemical kinetic model of lignin pyrolysis (involving 93 species and 406 reactions) and predict the temporal evolution of molecules and functional groups during the reaction
- Electricity Analysis and Suggestion System (Coursework Project): Implemented random forest and statistical analysis for an electricity generation suggestion model, and built a Tkinter GUI that presents prioritized resources and revenue plots drawn with matplotlib
Keywords: Data Mining / Python (pandas, matplotlib) / Random Forest / P-Value / GUI
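To give a feel for what integrating a kinetic model with SciPy looks like, here is a toy sketch. It is not the lignin model: the three-species chain A -> B -> C, the rate constants `K1` and `K2`, and the time span are all hypothetical, chosen only to keep the example small.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical rate constants (1/s) for a toy first-order chain A -> B -> C.
K1, K2 = 1.0, 0.5


def rates(t, y):
    """Right-hand side of the rate equations: d[A]/dt, d[B]/dt, d[C]/dt."""
    a, b, c = y
    return [-K1 * a, K1 * a - K2 * b, K2 * b]


# Integrate from t = 0 to t = 10 s, starting with pure A.
t_eval = np.linspace(0.0, 10.0, 101)
solution = solve_ivp(
    rates, (0.0, 10.0), [1.0, 0.0, 0.0],
    t_eval=t_eval, rtol=1e-8, atol=1e-10,
)

# Concentration of each species over time.
a, b, c = solution.y
```

A real model with dozens of species would build the right-hand side from a reaction list rather than writing it by hand, and might need a stiff solver; both are beyond this sketch.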
-
Data Analysis
- Data Cleaning_Airbnb_Listing: Use Python to clean Airbnb listing data (from a CSV file)
- Sberbank Data Analytics with Pandas: Walk through data loading and DataFrame creation, selection and querying, grouping and applying functions, plotting, and writing data to file with pandas and NumPy
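A small sketch of the kind of cleaning and grouping these projects cover. The table below is invented for illustration, and the column names (`neighbourhood`, `price`, `reviews`) are assumptions rather than the actual fields of either dataset.

```python
import io

import pandas as pd

# Tiny made-up listing table with common problems: a duplicated row,
# currency-formatted prices, and a missing review count.
raw_csv = io.StringIO(
    'id,neighbourhood,price,reviews\n'
    '1,Downtown,"$1,200.00",10\n'
    '2,Downtown,$80.00,\n'
    '3,Ballard,$120.00,4\n'
    '3,Ballard,$120.00,4\n'
)
listings = pd.read_csv(raw_csv)

# Drop exact duplicate rows.
listings = listings.drop_duplicates()

# Strip "$" and thousands separators, then convert price to a number.
listings["price"] = (
    listings["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Treat a missing review count as zero reviews.
listings["reviews"] = listings["reviews"].fillna(0).astype(int)

# Group and aggregate: mean price per neighbourhood.
mean_price = listings.groupby("neighbourhood")["price"].mean()
```

Whether a missing count should become zero or stay missing depends on the dataset; filling with zero is just one possible choice here.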
-
Web Crawling
- Web Crawling with BeautifulSoup: Use Python BeautifulSoup to collect and clean job listing data from indeed.com
- Web Crawling with Scrapy: Use the Scrapy framework to collect news from Newswire and generate a trending report for a given time period.
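As a rough illustration of the parsing step, the sketch below extracts fields from an inline HTML snippet with BeautifulSoup. The markup, class names (`job`, `title`, `company`), and values are hypothetical and do not reflect the real structure of any job site; fetching pages is left out entirely.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded listing page.
html = """
<div class="job"><h2 class="title"> Data Scientist </h2><span class="company">Acme</span></div>
<div class="job"><h2 class="title">ML Engineer</h2><span class="company"> Globex </span></div>
"""

# Parse with Python's built-in parser so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# Collect one record per listing card, trimming surrounding whitespace.
jobs = [
    {
        "title": card.find("h2", class_="title").get_text(strip=True),
        "company": card.find("span", class_="company").get_text(strip=True),
    }
    for card in soup.find_all("div", class_="job")
]
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame` for further cleaning.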
-
Machine Learning
- Genome-wide Prediction of Chromatin Accessibility Based on Gene Expression: Cleaned and preprocessed DNase-seq and RNA-seq data with high-dimensional features (p > 20,000); implemented and compared regularized logistic regression, random forest, and SVM, increasing prediction accuracy by 13%
- Unsupervised Machine Learning Techniques on *Omics Data: Applied K-means clustering and PCA to RNA-seq data using scikit-learn, and evaluated the models with the elbow method and silhouette analysis
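The elbow method and silhouette analysis mentioned above can be sketched as a loop over candidate cluster counts. This is an illustrative example on synthetic data with three well-separated groups; the data, the range of `k`, and the variable names are assumptions, not taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data (assumption): three well-separated groups of 30 points in 4-D.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(30, 4)) for center in (0.0, 5.0, 10.0)
])

inertias, silhouettes = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    # Elbow method: within-cluster sum of squares, to be plotted against k.
    inertias[k] = model.inertia_
    # Silhouette analysis: mean silhouette coefficient, higher is better.
    silhouettes[k] = silhouette_score(data, model.labels_)

# Pick the k with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
```

On real expression data the elbow is often less sharp than on a toy example, so the two criteria are usually read together rather than trusting either alone.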
-
Data Visualization
- A Visualization Tool for Unsupervised Machine Learning Analysis on Genomics Data: Built an interactive web application and visualized analysis results with Plotly Dash.
- Airbnb Visualization Analysis by Tableau