Kuala Lumpur, Malaysia • firstname.lastname@example.org • +601162078955
Master of Data Science degree with distinction (GPA: 3.98/4)
Kuala Lumpur, Malaysia
Feb 2018 - Oct 2019
B. Sc. degree in Computer Engineering; ranked first (GPA: 93.9%)
Sep 2012 - Sep 2016
IBM Data Science Professional Certificate (2019)
Data Science and Analytics Intern
Kuala Lumpur, Malaysia
April 2019 - July 2019
Data Analysis and Machine Learning: Pandas, NumPy, Scikit-learn, TensorFlow/Keras, XGBoost, LightGBM, etc. Also familiar with SAS and the Hadoop ecosystem.
Data Visualization and Dashboards: Matplotlib, Seaborn, Google Data Studio, Folium for maps, etc.
Databases: SQL. Web Scraping: BeautifulSoup, Selenium, HTTrack.
Cloud Computing: Google Cloud Platform (BigQuery for big data, Compute Engine, and Storage), Amazon Web Services (EC2, S3, and Route 53).
Languages: English: fluent (TOEFL iBT: 102). Arabic: native.
End-to-end data-science projects: In addition to the master's degree project mentioned above, a project was conducted to build a machine-learning model that predicts house prices from characteristics such as house size and construction year. The project data was cleaned and prepared, and exploratory data analysis was then applied. Multiple models were built and compared, including Linear Regression, K-Nearest Neighbors, Support Vector Machines, Neural Network, Random Forest, and Gradient Tree Boosting (XGBoost). The project report and code can be found at http://bit.ly/hp-pdf.
YouTube Trending Videos Analysis: Data on 40,000+ videos was analyzed and explored to gain insights into YouTube trending videos and to identify the characteristics they have in common. To that end, informative visualizations and tables were generated using Python, Pandas, Seaborn, and other tools. The Jupyter Notebook that contains the analysis code and results can be accessed at http://bit.ly/YT-analysis.
Kaggle Competitions: Participated in many Kaggle machine-learning competitions with regression and classification tasks. Some of them are:
Clustering and Comparing the Neighborhoods of New York City and Toronto: In this project, the neighborhoods were clustered into groups based on the similarity of the categories (types) of venues they contain. The Foursquare API was used to retrieve venue data. The project page can be found at http://bit.ly/clustnt.
Focus Phase: An open-source time-tracking command-line application with statistics and visualizations. It is built using Python and published on the Python Package Index. GitHub link: http://bit.ly/focus-phase.
S3upload: An open-source Python application that makes it faster to upload a large number of files to AWS S3. GitHub link: http://bit.ly/s3upload.