
Bena Smith
A collection of projects

I am a data scientist and statistician excited about predictive analytics, natural language processing, AI agents, recommender systems, time series analysis, and stochastic processes. I enjoy creating data-powered applications and solving mysteries using statistics. Below are projects I have worked on in school and during my free time.

Modeling Genetic Drift with Markov Chains
This is a collection of simulations and visualizations of genetic drift using Markov Chains.
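
As a rough illustration of the idea, here is a minimal sketch of genetic drift as a Markov chain. It assumes the standard Wright-Fisher model, which may not be the exact model used in the project; the function names `wright_fisher_matrix` and `simulate_drift` are made up for this example.

```python
import math
import random


def wright_fisher_matrix(n):
    """Build the (n+1) x (n+1) transition matrix of a Wright-Fisher chain.

    State i is the number of copies of one allele among n gene copies.
    Each generation resamples n copies binomially with success
    probability i / n (an assumption of this sketch).
    """
    matrix = []
    for i in range(n + 1):
        p = i / n
        row = [math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
        matrix.append(row)
    return matrix


def simulate_drift(n, start, generations, seed=None):
    """Simulate one allele-count trajectory by sampling from the chain."""
    rng = random.Random(seed)
    matrix = wright_fisher_matrix(n)
    states = list(range(n + 1))
    path = [start]
    for _ in range(generations):
        current = path[-1]
        # Draw the next state using the current state's row as weights.
        path.append(rng.choices(states, weights=matrix[current])[0])
    return path


if __name__ == "__main__":
    # One trajectory for 20 gene copies, starting at a 50% allele frequency.
    print(simulate_drift(n=20, start=10, generations=30, seed=1))
```

States 0 and n are absorbing in this model: once an allele is lost or fixed, the count never changes again, which is the behavior drift simulations typically visualize.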

Sentiment Explorer
This Streamlit application takes a user query for a company or product, searches the web for that query, then:
- Creates an interactive graph of webpage sentiments
- Uses OpenAI to identify features that users like and features that could be improved

Mandelbrot Set
This Python program graphs the Mandelbrot set using an array of pixels, where the value of each pixel (x, y) corresponds to the divergence of f(z) = z^2 + c with c = x + yi.
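
The pixel-by-pixel idea can be sketched in a few lines of plain Python. This is not the project's code: the escape radius of 2, the iteration cap, the plotted region, and the names `escape_time` and `mandelbrot_grid` are all assumptions chosen for illustration.

```python
def escape_time(c, max_iter=50):
    """Count iterations of z -> z^2 + c (from z = 0) before |z| exceeds 2.

    Returns max_iter if the orbit stays bounded for the whole budget,
    which is treated here as "inside the set".
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_grid(width, height, x_range=(-2.0, 1.0), y_range=(-1.5, 1.5), max_iter=50):
    """Map each pixel (col, row) to a complex number x + yi and score it."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = []
    for row in range(height):
        # Row 0 is the top of the image, so y decreases as row increases.
        y = y_max - (y_max - y_min) * row / (height - 1)
        line = []
        for col in range(width):
            x = x_min + (x_max - x_min) * col / (width - 1)
            line.append(escape_time(complex(x, y), max_iter))
        grid.append(line)
    return grid


if __name__ == "__main__":
    # Crude text rendering: '#' marks points that never escaped.
    for line in mandelbrot_grid(60, 24):
        print("".join("#" if value == 50 else "." for value in line))
```

A plotting library could color each pixel by its escape count instead of the two-symbol rendering used here.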

Song Recommender
This Python script uses the Spotify Million Playlist Dataset and Word2Vec to suggest songs similar to a searched song.

NLP Search Engine and Sentiment Analysis
Using chat data from the American Bar Association’s pro bono chat service, my team and I developed:
- A natural language processing-based search engine using Word2Vec
- Sentiment analysis over chat category, time of day, and length of conversation
This project won the highest award at the American Statistical Association DataFest.

Chicago Crime Report
This report uses a dataset of Chicago crimes from 2002 to 2023. It includes:
- Interactive R Shiny apps
- Animated ggplots over time
- Predictive and descriptive analysis using K-means and XGBoost
- Detailed descriptions of our results and ethical considerations

Spotify Playlist Tableau Dashboard
Using Spotify’s API, I retrieved my playlists and their tracks. Interact with the dashboards to explore the artists and genres in my music library.

Streamlit California Housing App
This site uses a dataset of California housing prices. The app includes:
- Visualizations of the dataset
- A price predictor: the user selects an area on a map along with other features, and several machine learning models use them to predict the house price

Policies and Price Controls on the Research and Development of Orphan Drugs in the United States and the European Union
This is my graduate thesis at Cal Poly. I studied the differences in R&D spending and the development of drugs for rare diseases in the EU and US over time using generalized linear models and the Newey-West estimator for time series data.
