About

Human-centered behavioral data scientist who puts data to work for you. I use your data to understand your problems, your needs, and which solutions will work for you, much as a psychologist does during therapy. Connect with me on LinkedIn.

Basic Information
Age:
31
Email:
dustin.luchmee@protonmail.com
Address:
Philadelphia, PA
Language:
English
Professional Experience

2019 - 2022

HappyNeuron Inc - Product Owner

Philadelphia, PA

Product Owner
  • Spearheaded the creation of written and video content to attract prospective clients and communicate company solutions; presented company solutions at conferences.
  • Forged and fostered strong client relationships to drive sales of company solutions and promote satisfaction and retention.
  • Facilitated training sessions to coach clients in product knowledge and offer technical support; explained complex technical concepts in easy-to-understand terms.
  • Partnered with the consulting branch of Humans Matter to boost business development by responding to proposal requests.

2016 - 2019

Moss Rehabilitation Research Institute

Elkins Park, PA

Research Assistant, Neuroplasticity and Motor Behavior Laboratory
  • Assisted with study protocol design and implementation, and managed timelines for participant recruitment, data collection, and data analysis.
  • Trained junior laboratory members in data collection, analysis, procedures, and use of technology, and provided orientation to the institute and laboratory.
  • Performed preliminary analyses in Excel and SPSS to describe findings in the data.
  • Worked closely with the IRB coordinator to ensure the proper conduct of research and made modifications as needed to maintain adherence to protocols.
  • Conducted literature searches as needed to assist in review article and protocol design.
  • Designed and presented posters for conferences and created figures for posters and papers.
  • Provided feedback and assistance with implementation of the new patient registry.
  • Assisted and coordinated with scientists, patients, and fundraising teams for donor presentations.

2014 - 2015

University of Pennsylvania
Center for Cognitive Neuroscience

Philadelphia, PA

Research Intern
  • Managed participant recruitment for studies and the scheduling and running of neuroimaging sessions.
  • Conducted and assisted senior lab members with neurostimulation sessions and behavioral testing.
  • Shared and explained preliminary findings of studies with laboratory staff.
  • Managed diverse research studies including one study on executive functioning and the neural networks responsible for it, one study on smoking cessation and messaging, and one study on weight management and exercise.
Professional Skills

  • Data acquisition & preprocessing
  • Python programming
  • Data mining
  • Applied machine learning for data science
  • Data analysis
  • Natural language processing for data science
  • Data visualization
  • Social network analysis
  • Blogging
  • Content Development
  • Research methods
  • Data science
PROJECTS

Project Title: How frequently is depression discussed in r/Narcolepsy?

Techniques Used:

This project aimed to understand how frequently depression was discussed in a sample of data collected from the Reddit forum r/Narcolepsy. Techniques employed:

  • Exploratory data analysis
  • Text cleaning
  • TF-IDF
  • Cosine Similarity
  • NLTK
  • spaCy
  • Scikit-learn
  • Matplotlib
  • WordCloud

Summary: Patient forums are helpful for patients, caregivers, and others living with a medical condition. Information shared on these forums can be used to understand what struggles different medical communities have and what therapies may work for different conditions. Depression is a common psychological consequence of living with narcolepsy. In this project, I found that 20% of posts and comments pertained to depression.

Project Title: Big 5 Personality Inventory Analysis

Techniques Used:

The Big 5 Personality Inventory is an assessment that measures a person’s personality using questions that examine the 5 broad dimensions of personality. This assessment can be used to give someone a sense of self-awareness, help them find roles or workplace environments that they would enjoy, and even help with dating!
While popular, the Big 5 Personality Inventory does have limitations. First, the assessment is limited, with many critics concerned about the absence of a comprehensive theory. Second, individuals may answer questions in a way that they deem socially acceptable rather than true to their own nature. Lastly, personality changes over time as individuals mature or face new situations in life. Thus, the results of this test are not stable.
Nonetheless, this was a fun project to work on to connect data science with psychology! Techniques employed:

  • Exploratory data analysis
  • Data visualization
  • NumPy
  • Cosine Similarity
  • Seaborn

Project Title: Spotify Playlist Classification

Techniques Used:

Classification project that uses data scraped from Spotify using Spotify’s API. Techniques employed:

  • Exploratory data analysis
  • NumPy
  • Pandas
  • Seaborn
  • Scikit-learn
  • Logistic regression
  • SVM

Summary: Playlist classification using the variables artist, album, danceability, energy, key, loudness, mode, speechiness, instrumentalness, liveness, valence, and tempo. The analysis compared logistic regression against variations of SVM. Data was scraped using Spotify's API; the playlists selected were those I had recently listened to.

Project Title: Restaurant Recommendation

Techniques Used:

Restaurant recommendation project that uses data from restaurant reviews to recommend similar restaurants. Techniques employed:

  • Exploratory data analysis
  • Text cleaning
  • TF-IDF
  • Cosine Similarity
  • NLTK
  • Matplotlib

Summary: Each day, millions of people look for restaurants to try. In this project, I developed and deployed a content-based restaurant recommendation system that applies TF-IDF to customer reviews of restaurants.
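
The idea behind a TF-IDF, content-based recommender can be illustrated with a minimal sketch. This is not the project's code: it uses only the Python standard library instead of NLTK or scikit-learn, and the restaurant names, review text, and function names (`tfidf_vectors`, `cosine`, `recommend`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(reviews, target, top_n=1):
    """Rank the other restaurants by review similarity to the target restaurant."""
    names = list(reviews)
    vectors = dict(zip(names, tfidf_vectors([reviews[name] for name in names])))
    scores = [(cosine(vectors[target], vectors[name]), name)
              for name in names if name != target]
    return [name for _, name in sorted(scores, reverse=True)[:top_n]]


# Hypothetical toy data: one short review per restaurant.
reviews = {
    "Luigi's": "great pasta and wood fired pizza",
    "Napoli": "wood fired pizza and fresh pasta",
    "Sakura": "fresh sushi and hot ramen",
}
print(recommend(reviews, "Luigi's"))
```

In this toy example the two pizza restaurants share most of their vocabulary, so "Napoli" is ranked as the closest match to "Luigi's".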

Course: Data Science Capstone 1

Project Title: Sentiment analysis of Amazon product reviews

Techniques Used:

  • Exploratory data analysis
  • Text cleaning
  • spaCy
  • Scikit-learn
  • WordCloud
  • Sentiment analysis

Summary: Each day, millions of people leave reviews on Amazon regarding their experience with different products. Sentiment analysis of specific product types can provide insight into customer preferences and frustrations. Amazon product reviews for electronic products were explored, cleaned, and analyzed in order to determine which products were most successful and which ones were least liked by customers.

Course: Data Science Capstone 2

Project Title: Using tweets and news headlines to predict stock market day return

Techniques Used:

  • Exploratory data analysis
  • Text cleaning
  • Linear regression
  • Scikit-learn
  • WordCloud
  • Decision tree regression

Summary: Stock market prediction is a noteworthy task as investors are eager to earn additional income. The stock market is subject to volatility, especially from cultural influences such as when Manuel Locatelli shoved a Coke bottle away and demanded water. Using linear regression and decision tree regression, I aimed to see whether it is possible to predict day return using tweets and news headlines.

Course: Natural language processing with deep learning for data science

Project Title: Extracting tweet-like summaries from the news

Techniques Used:

  • Continuous Bag of Words (CBOW)
  • Scikit-learn
  • Text and data mining (TDM)
  • Inverse document frequency (IDF)
  • Logistic regression
  • ROUGE-L
  • Matplotlib

Summary: News stories can be further condensed into ‘tweet’-like summaries. I did this by operating a statistical engine, isolating inverse document frequencies, creating a logistic classifier, and evaluating the precision, accuracy, and recall of the resulting summaries. Box-and-whisker plots were used to visualize these metrics.
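
ROUGE-L, listed among the techniques above, scores a candidate summary by the longest common subsequence (LCS) of tokens it shares with a reference. The sketch below is illustrative only and is not the project's code: it assumes whitespace tokenization and a plain F1 (equal weight on precision and recall), and the function names and example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the match on equal tokens, otherwise carry the best so far.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return ROUGE-L precision, recall, and F1 for two whitespace-tokenized texts."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example sentences.
scores = rouge_l("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Here five of the six tokens form a common subsequence, so precision and recall are both 5/6.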

Project Title: Abstracting summaries of the news

Techniques Used:

  • Continuous Bag of Words (CBOW)
  • N-gram counting
  • Language modeling
  • Scikit-learn
  • Recitation function
  • Perplexity performance evaluation
  • Rambling functions

Summary: This project was performed in order to learn language modeling techniques. By operating a statistical engine, counting n-gram frequencies, building a language model with a model sampler, and writing recitation, rambling, and perplexity evaluation functions, I was able to summarize a news story about an upcoming Robert Downey Jr. film.
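
As a rough illustration of n-gram counting and perplexity evaluation, the sketch below trains a bigram model and scores sentences with it. It is not the project's code: add-one smoothing, the `<s>`/`</s>` boundary markers, the function names, and the training sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left-hand contexts over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = sentence.lower().split()
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(words)
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, vocab


def perplexity(model, sentence):
    """Perplexity of a sentence under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab_size = len(vocab) + 1  # one extra slot reserved for unseen words
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        prob = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical training sentences.
model = train_bigram(["the film opens in may", "the film stars a famous actor"])
seen = perplexity(model, "the film opens in may")
scrambled = perplexity(model, "may in opens film the")
print(seen < scrambled)
```

A sentence whose word order matches the training data receives a lower perplexity than the same words in scrambled order, which is the behavior a perplexity evaluation is meant to capture.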

Project Title: GloVe semantic representation

Techniques Used:

  • Co-occurrence matrix
  • Loss functions
  • GloVe loss and gradient functions
  • Sampling
  • Analogy testing

Summary: This project was performed in order to gain experience with iteration based learning. In this project, I had to build a model’s training data, weight co-occurrence matrices, implement a loss function, build a sampling function, operate the GloVe function, and evaluate the performance of the model.
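
The weighted least-squares objective at the heart of GloVe can be sketched in a few lines. This is an illustration rather than the project's code: plain Python lists stand in for tensors, the function and parameter names are hypothetical, and the defaults `x_max=100` and `alpha=0.75` are the values from the original GloVe paper, not necessarily the ones used in this project.

```python
import math


def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: down-weight rare co-occurrences, cap frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(cooc, w, w_ctx, b, b_ctx):
    """Sum of weighted squared errors between model scores and log co-occurrence counts.

    cooc maps (word_index, context_index) -> co-occurrence count.
    w and w_ctx are lists of word and context vectors; b and b_ctx are bias lists.
    """
    total = 0.0
    for (i, j), count in cooc.items():
        if count <= 0:
            continue  # log(0) is undefined; zero counts contribute nothing
        dot = sum(u * v for u, v in zip(w[i], w_ctx[j]))
        diff = dot + b[i] + b_ctx[j] - math.log(count)
        total += weight(count) * diff * diff
    return total


# Hypothetical toy setup: one co-occurring pair, two-dimensional zero vectors.
cooc = {(0, 1): 100.0}
vectors = [[0.0, 0.0], [0.0, 0.0]]
untrained = glove_loss(cooc, vectors, vectors, [0.0, 0.0], [0.0, 0.0])
fitted = glove_loss(cooc, vectors, vectors, [math.log(100.0), 0.0], [0.0, 0.0])
print(untrained, fitted)
```

With all parameters at zero the loss equals the squared log count; once the bias absorbs the log count, the loss drops to zero.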

Project Title: Conversational disentanglement

Techniques Used:

  • Tokenization
  • PyTorch
  • Loss function
  • Train function
  • Evaluation function
  • Position embedding
  • Time embedding

Summary: Conversational disentanglement allows individuals to know which speakers express which sentiments or attitudes. This project required tokenizing the text, constructing a PyTorch dataset, constructing a network architecture, constructing a loss function, building an optimizer, creating an evaluation function, and implementing position and time embeddings. In addition, a linear layer was used to evaluate the success of the network.

Project Title: Open Information Extraction (OIE)

Techniques Used:

  • Recurrent neural networks (RNN)
  • Tokenization
  • Part of speech tagging
  • Multimodal vector representation
  • Neural LSTM tagging

Summary: Recurrent neural networks can be used for tagging tasks. In this exercise, I created a recurrent neural network that performed part-of-speech (POS) tagging. The model was evaluated using scikit-learn’s classification report.

Project Title: Abstractive Summarization of Scientific Papers with BART

Techniques Used:

  • BART
  • Hugging Face
  • Text summarization
  • ROUGE metrics

Summary: Scientific texts are summarized in an abstract. In this project, my team and I fine-tuned a pre-trained BART model to perform an abstractive summarization of a scientific document in order to generate an abstract. In addition, this allowed us to gain further experience and understanding of using Hugging Face.

Course: Applied Machine Learning for Data Science

Project Title: New York City Airbnb Open Data via Kaggle

Techniques Used:

  • Exploratory data analysis
  • Pipeline building
  • Linear Regression
  • Decision Tree Regression

Summary: Airbnb listings within one area are priced differently for various reasons. In this assignment, I first performed exploratory data analysis with Matplotlib to explore the numerical variables. Next, I used scikit-learn to split the data into training and test sets. I then built a pipeline to produce usable training and testing data. Lastly, I compared the root mean square errors of linear regression and decision tree regression to see which model best predicted Airbnb pricing based on the available variables.

Project Title: Income Classification

Techniques Used:

  • Exploratory data analysis
  • Seaborn
  • Logistic regression
  • Scikit-learn
  • Support vector machine model
  • Poly support vector machine model
  • Naïve Bayes model
  • K-nearest neighbor
  • Hyperparameter tuning

Summary: Income determination of individuals has been an interest in the context of addressing social problems, such as poverty. In this assignment, classification techniques were tested to determine which model was best able to determine if an individual made over $50,000 USD per year. Exploratory data analysis was performed using Seaborn. Test and training data were separated using scikit-learn. Next, logistic regression, SVM, pSVM, naïve bayes, and KNN models were tested. It was determined that SVM was the best model to use in order to classify which individuals made over $50K.

Project Title: Supervised machine learning techniques for vehicle pricing

Techniques Used:

  • Exploratory data analysis
  • Seaborn
  • Linear regression
  • Logistic regression
  • Decision tree regression
  • Random forest regression
  • Scikit-learn

Summary: Vehicles of all brands and conditions vary in price across the world. In this project, regression techniques were employed to determine which model best predicted vehicle price. Exploratory data analysis was performed using Seaborn. Test and training data were separated using scikit-learn. Next, linear regression, logistic regression, decision tree regression, and random forest regression were tested. In this exercise, random forest regression was found to be the best predictor of vehicle pricing.

Project Title: Predicting heart disease

Techniques Used:

  • Exploratory data analysis
  • Seaborn
  • Logistic regression
  • Scikit-learn
  • Support vector machine model
  • Naïve Bayes model

Summary: Heart disease is a leading cause of death among adults in the United States. When people are admitted to the hospital, numerous data points are collected. In this assignment, we used common health data points to find which model most accurately predicted whether someone would develop heart disease. The models compared were logistic regression, SVM, and naïve Bayes. Logistic regression was found to be the best model for predicting whether someone is likely to develop heart disease.

EDUCATION

Dec 2021

Master's
Master of Science, Data Science

Drexel University | Philadelphia, PA

May 2013

Bachelor's
Bachelor of Arts, Psychology and Cognitive Science

University of Richmond | Richmond, VA

ACHIEVEMENTS

Publications:

  • Johnson, T., Ridgeway, G., Luchmee, D., & Kantak, S. (2021). Bimanual coordination during reach-to-grasp actions is sensitive to task goal with distinctions between left- and right-hemispheric stroke. Submitted.
  • Kantak, S., & Luchmee, D. (2020). Contralesional motor cortex is causally engaged during more dexterous actions of the paretic hand after stroke-A Preliminary report. Neuroscience Letters, 134751.
  • Kantak, S., McGrath, R., Zahedi, N., & Luchmee, D. (2017). Behavioral and neurophysiological mechanisms underlying motor skill learning in patients with post-stroke hemiparesis. Clinical Neurophysiology, 129(1), 1-12.
Research Presentations:

    • Kantak, S. & Luchmee, D. (2017, November). Dexterity requirements modulate ipsilateral motor corticospinal excitability in post-stroke individuals. Poster session presented at the annual Society for Neuroscience conference in Washington, D.C.