A decorative image of blue and red lines

Bena Smith

A collection of projects

Picture of Bena Smith
I am a data scientist and statistician excited about predictive analytics, natural language processing, AI agents, recommender systems, time series analysis and stochastic processes. I enjoy creating data-powered applications and solving mysteries using statistics. Below are projects I have worked on in school and during my free time.

Modeling Genetic Drift with Markov Chains

This is a collection of simulations and visualizations of genetic drift using Markov Chains.

Picture of a graph of 3 genotypes over time in red blue and green
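
As a rough illustration of how genetic drift can be framed as a Markov chain, here is a minimal sketch. It assumes the classic Wright-Fisher model with two alleles and a fixed number of gene copies, which may differ from the model used in the project. The names `wright_fisher_matrix` and `simulate_drift` are hypothetical and chosen only for this example.

```python
import random
from math import comb


def wright_fisher_matrix(n_copies):
    """Return P where P[i][j] is the chance of moving from i to j copies of allele A.

    Assumption: Wright-Fisher model, so the next generation's count is
    Binomial(n_copies, i / n_copies).
    """
    matrix = []
    for i in range(n_copies + 1):
        p = i / n_copies
        row = [
            comb(n_copies, j) * p**j * (1 - p) ** (n_copies - j)
            for j in range(n_copies + 1)
        ]
        matrix.append(row)
    return matrix


def simulate_drift(n_copies, start_count, generations, seed=None):
    """Simulate one path of the chain and return the allele count per generation."""
    rng = random.Random(seed)
    counts = [start_count]
    for _ in range(generations):
        p = counts[-1] / n_copies
        # Each gene copy in the next generation is drawn independently with probability p.
        next_count = sum(rng.random() < p for _ in range(n_copies))
        counts.append(next_count)
    return counts
```

Under this assumption, counts of 0 and `n_copies` are absorbing states: once an allele is lost or fixed, the chain stays there, which is the long-run behavior that drift plots typically show.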

Sentiment Explorer

This Streamlit application takes a user query for a company or product, searches the web for that query, and then:

  • Creates an interactive graph of webpage sentiments
  • Uses OpenAI to identify common features that users like and features that could be improved
Positive Waymo reviews

Mandelbrot Set

This Python program graphs the Mandelbrot set using an array of pixels, where each pixel's value corresponds to the divergence of f(z) = z^2 + c and c = x + yi is the complex number at (x, y).

 
A yellow and blue image of the Mandelbrot set fractal
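
The pixel-array idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's code: the function name `escape_counts`, the default viewing window, and the iteration cap are all hypothetical choices. It uses the standard escape-time test, where a point is treated as diverging once |z| exceeds 2.

```python
def escape_counts(width, height, max_iter=50,
                  x_range=(-2.0, 1.0), y_range=(-1.5, 1.5)):
    """Return a height-by-width grid of iteration counts for f(z) = z^2 + c.

    Each cell holds how many iterations ran before |z| exceeded 2,
    capped at max_iter for points that never escaped.
    Assumes width >= 2 and height >= 2.
    """
    grid = []
    for row in range(height):
        # Map the row index to the imaginary part y.
        y = y_range[0] + (y_range[1] - y_range[0]) * row / (height - 1)
        line = []
        for col in range(width):
            # Map the column index to the real part x.
            x = x_range[0] + (x_range[1] - x_range[0]) * col / (width - 1)
            c = complex(x, y)
            z = 0j
            count = 0
            while abs(z) <= 2.0 and count < max_iter:
                z = z * z + c
                count += 1
            line.append(count)
        grid.append(line)
    return grid
```

A grid like this can then be passed to a plotting library and colored by count, so that points inside the set (which reach `max_iter`) stand out from points that escape quickly.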

Song Recommender

This Python script uses the Spotify Million Playlist Dataset and Word2Vec to suggest songs similar to a searched song.

A PCA graph of song names. Songs that are similar to one another are closer together.
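
Once each song has an embedding vector, suggesting similar songs comes down to a nearest-neighbor lookup. The sketch below shows only that lookup step using cosine similarity. It does not train Word2Vec, and the vectors in `toy_vectors` are invented for illustration rather than taken from the dataset. The function names are hypothetical as well.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, top_n=3):
    """Return the top_n (song, similarity) pairs closest to the query song."""
    query_vec = vectors[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in vectors.items()
        if name != query  # do not recommend the searched song itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical 3-dimensional embeddings, made up for this example only.
# In a Word2Vec setup, vectors would instead be learned by treating each
# playlist as a "sentence" and each song as a "word" (an assumption here).
toy_vectors = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [0.0, 0.0, 1.0],
}
```

Calling `most_similar("song_a", toy_vectors, top_n=1)` ranks `song_b` first, because its vector points in nearly the same direction as the query's.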

NLP Search Engine and Sentiment Analysis

Using chat data from the American Bar Association’s pro bono chat service, my team and I developed:

  • A natural language processing-based search engine using Word2Vec
  • Sentiment analysis over chat category, time of day, and length of conversation

This project won the highest award at the American Statistical Association DataFest.

Chicago Crime Report

This report uses a dataset of Chicago crimes from 2002 to 2023. Our report includes:

  • Interactive R Shiny apps
  • Animated ggplots over time
  • Predictive and descriptive analysis using K-means and XGBoost
  • Detailed descriptions of our results and ethical considerations
GIF of a plot of Chicago "interference with officer" crimes by latitude and longitude, changing over time, with arrest rates by cluster

Spotify Playlist Tableau Dashboard

Using Spotify’s API, I retrieved my playlists and their tracks. Interact with the dashboards to explore the artists and genres in my music library.

A picture of a dashboard where a bar called "women in edm" is clicked and EDM artists and genres are displayed

Streamlit California Housing App

This site uses a dataset of California housing prices. The app includes:

  • Visualizations of the dataset
  • A prediction tool that lets the user select an area on the map and other features, which are used to predict house price with several machine learning models

Policies and Price Controls on the Research and Development of Orphan Drugs in the United States and the European Union

This is my graduate thesis at Cal Poly. I studied the differences in R&D spending and the development of drugs for rare diseases in the EU and US over time using generalized linear models and the Newey-West estimator for time series data.

Black text on a white background: POLICIES AND PRICE CONTROLS ON THE RESEARCH AND DEVELOPMENT OF ORPHAN DRUGS IN THE UNITED STATES AND THE EUROPEAN UNION A Thesis presented to the Faculty of California Polytechnic State University, San Luis Obispo