Bena Smith

A collection of projects

Welcome to my site! My project and personal interests include:

• Machine Learning (NLP, Predictive Analytics, Recommendation Systems, GenAI, Sentiment Analysis, Genetic Algorithms, Automation)

• Statistics (Time Series Analysis, Generalized Linear Models, Bayesian Statistics, Stochastic Processes/Markov Chains, Spatial Statistics)

• Art (Photography, Sketching, Graphs, Dancing, Electronic Music, Skateboarding)

• Working Collaboratively, Economics, Politics, History, Social Issues, Teaching, Brainstorming, Process Mapping, Meeting New People, Learning New Things, Presenting

LinkedIn ↗

Modeling Genetic Drift with Markov Chains

This is a collection of simulations and visualizations of genetic drift using Markov Chains.
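As a rough illustration of the idea, here is a minimal sketch of one such simulation. It assumes a Wright-Fisher-style model, which is a common way to treat drift as a Markov chain but is not necessarily the model used in this project. The state is the number of copies of one allele, and the next state depends only on the current one. The function names and parameters are hypothetical.

```python
import random


def wright_fisher_step(count, pop_size, rng):
    """Sample the next generation's allele count (assumed Wright-Fisher model).

    Each of the pop_size gene copies independently inherits the allele
    with probability equal to its current frequency.
    """
    freq = count / pop_size
    return sum(1 for _ in range(pop_size) if rng.random() < freq)


def simulate_drift(initial_count, pop_size, max_generations, seed=0):
    """Return the allele-count trajectory until fixation, loss, or the cap."""
    rng = random.Random(seed)
    trajectory = [initial_count]
    while len(trajectory) <= max_generations:
        count = trajectory[-1]
        # 0 and pop_size are absorbing states: the allele is lost or fixed.
        if count in (0, pop_size):
            break
        trajectory.append(wright_fisher_step(count, pop_size, rng))
    return trajectory


if __name__ == "__main__":
    path = simulate_drift(initial_count=10, pop_size=20, max_generations=500)
    print(path)
```

Running several trajectories with different seeds and plotting them together is one way to see how small populations reach fixation or loss quickly.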

Song Recommender

This Python script uses the Spotify Million Playlist Dataset and Word2Vec to suggest songs similar to a searched song. This video gives a high-level overview of how the algorithm works.

Sentiment Explorer

This Streamlit application takes a user query for a company or product, searches the web for that query, then:

Creates an interactive graph of webpage sentiments
Uses OpenAI to identify common features that users like and features that could be improved

Mandelbrot Set

This Python program graphs the Mandelbrot set on an array of pixels. Each pixel at (x, y) corresponds to the complex number c = x + yi, and its value reflects how quickly the iteration f(z) = z^2 + c diverges.
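The divergence test can be sketched with a standard escape-time loop. This is a minimal, dependency-free illustration rather than the project's actual code. The function names, the iteration cap, and the viewing window are assumptions, and the escape radius of 2 is the usual bound beyond which the iteration is known to diverge.

```python
def escape_time(c, max_iter=50):
    """Count iterations of z -> z^2 + c (from z = 0) before |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    # Never escaped within the cap: treat c as inside the set.
    return max_iter


def mandelbrot_grid(width, height, x_range=(-2.0, 1.0), y_range=(-1.0, 1.0),
                    max_iter=50):
    """Build a height-by-width grid of escape times (width, height >= 2)."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = []
    for row in range(height):
        # Row 0 is the top of the image, so y decreases as row increases.
        y = y_max - (y_max - y_min) * row / (height - 1)
        line = []
        for col in range(width):
            x = x_min + (x_max - x_min) * col / (width - 1)
            line.append(escape_time(complex(x, y), max_iter))
        grid.append(line)
    return grid


if __name__ == "__main__":
    for line in mandelbrot_grid(60, 24):
        print("".join("#" if n == 50 else "." for n in line))
```

The grid of escape times can be passed to any plotting library as an image, with the iteration count mapped to color.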

Policies and Price Controls on the Research and Development of Orphan Drugs in the United States and the European Union

This is my graduate thesis at Cal Poly. I studied the differences in R&D spending and the development of drugs for rare diseases in the EU and US over time using generalized linear models and the Newey-West estimator for time series data.

Friend-Pairing Service

I created a recommendation system with a Flask backend and React frontend for a friend-making service. Procedure:

Simulated 300 profiles with OpenAI
Embedded the profiles with the OpenAI sentence embedding API, so that similar people have closer vector embeddings
Grouped the simulated profiles using K-Nearest Neighbors
Users can fill out a form to be matched with similar people. Relevant events are also recommended
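The matching step can be illustrated with a small nearest-neighbor lookup. This sketch assumes cosine similarity as the closeness measure and uses tiny made-up vectors in place of real OpenAI embeddings. The profile names, vectors, and function names are all hypothetical, and a production system would more likely use a library implementation than this brute-force search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_profiles(query, profiles, k=2):
    """Return the names of the k profiles most similar to the query vector."""
    ranked = sorted(
        profiles.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy 3-dimensional stand-ins for embedding vectors (illustrative only).
PROFILES = {
    "ana": [0.9, 0.1, 0.0],
    "ben": [0.8, 0.2, 0.1],
    "cy": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    new_user = [1.0, 0.0, 0.0]
    print(nearest_profiles(new_user, PROFILES, k=2))
```

A new user's form answers would be embedded the same way as the stored profiles, so the query vector and the profile vectors live in the same space.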

Chicago Crime Report

This report uses a dataset of Chicago crimes from 2002 to 2023. Our report includes:

Interactive R Shiny apps
Animated ggplots over time
Predictive and descriptive analysis using K-means and XGBoost
Detailed descriptions of our results and ethical considerations

NLP Search Engine and Sentiment Analysis

Using chat data from the American Bar Association’s pro bono chat service, my team and I developed:

A natural language processing-based search engine using Word2Vec
Sentiment analysis over chat category, time of day, and length of conversation

This project won the highest award at the American Statistical Association DataFest.

Spotify Playlist Tableau Dashboard

Using Spotify’s API, I retrieved my playlists and their tracks. Interact with the dashboards to explore the artists and genres in my music library.

Streamlit California Housing App

This site uses a dataset of California housing prices. The app includes:

Visualizations of the dataset
House price predictions based on a user-selected map area and other features, using several regression and ensemble machine learning models, including Ridge Regression, Random Forest, and AdaBoost

GenAI SQL Agent

For a consumer packaged goods company, I created a GenAI-powered strategy report delivered via Microsoft Teams to 75 salespeople.

I used LangChain to query a sales database in plain language to return optimal sales strategies.