
Bena Smith
A collection of projects

• Machine Learning (NLP, Predictive Analytics, Recommendation Systems, GenAI, Sentiment Analysis, Genetic Algorithms, Reinforcement Learning, Automation)
• Statistics (Generalized Linear Models, Time Series Analysis, Bayesian Statistics, Stochastic Processes/Markov Chains, Spatial Statistics)
• Art (Photography, Sketching, Graphs, Dancing, Electronic Music, Skateboarding)
• Working Collaboratively, Economics, Politics, History, Social Issues, Teaching, Brainstorming, Process Mapping, Meeting New People, Learning New Things, Presenting

Modeling Genetic Drift with Markov Chains
This is a collection of simulations and visualizations of genetic drift using Markov Chains.
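As a rough illustration of the idea, here is a minimal sketch of genetic drift as a Markov chain. It assumes the Wright-Fisher model, which the project itself may or may not use; the function names and parameters are hypothetical. The state is the number of copies of one allele, and each generation is resampled from the previous one.

```python
import random


def wright_fisher_step(count, pop_size, rng):
    """Draw the next allele count from Binomial(pop_size, count / pop_size)."""
    freq = count / pop_size
    return sum(1 for _ in range(pop_size) if rng.random() < freq)


def simulate_drift(initial_count, pop_size, max_generations, seed=0):
    """Return the allele-count trajectory, stopping early at loss or fixation.

    Assumption: a fixed population of `pop_size` gene copies and no
    selection or mutation, so 0 and `pop_size` are absorbing states.
    """
    rng = random.Random(seed)
    trajectory = [initial_count]
    for _ in range(max_generations):
        if trajectory[-1] in (0, pop_size):
            break  # absorbed: the allele is lost or fixed
        trajectory.append(wright_fisher_step(trajectory[-1], pop_size, rng))
    return trajectory


if __name__ == "__main__":
    path = simulate_drift(initial_count=10, pop_size=20, max_generations=1000)
    print(f"{len(path) - 1} generations simulated, final count {path[-1]}")
```

Running several seeds side by side shows the typical drift behavior: trajectories wander at random until the allele is either lost or fixed.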

Sentiment Explorer
This Streamlit application takes a company or product name as a query, searches the web for it, and then:
- Creates an interactive graph of webpage sentiments
- Uses OpenAI to identify common features that users like and features that could be improved

Mandelbrot Set
This Python program graphs the Mandelbrot set as an array of pixels. Each pixel at (x, y) corresponds to how quickly the iteration f(z) = z^2 + c diverges, where c is the complex number x + yi.
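The escape-time idea can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's actual code: the function names, the viewing window, and the iteration cap are all assumptions.

```python
def escape_time(c, max_iter=50):
    """Count iterations of z -> z^2 + c (starting at z = 0) before |z| exceeds 2.

    Returns max_iter if the orbit stays bounded for the whole budget,
    which is treated as "probably in the set".
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_grid(width, height, x_range=(-2.0, 1.0), y_range=(-1.5, 1.5), max_iter=50):
    """Build a height x width grid of escape times (width and height must be >= 2)."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = []
    for row in range(height):
        # Row 0 is the top of the image, so y decreases as row increases.
        y = y_max - (y_max - y_min) * row / (height - 1)
        line = []
        for col in range(width):
            x = x_min + (x_max - x_min) * col / (width - 1)
            line.append(escape_time(complex(x, y), max_iter))
        grid.append(line)
    return grid


if __name__ == "__main__":
    # Crude text rendering: '#' marks points that never escaped.
    for line in mandelbrot_grid(60, 24):
        print("".join("#" if n == 50 else "." for n in line))
```

A plotting library could map each escape time to a color instead of a character; the grid itself is the array of pixels described above.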

Policies and Price Controls on the Research and Development of Orphan Drugs in the United States and the European Union
This is my graduate thesis at Cal Poly. I studied the differences in R&D spending and the development of drugs for rare diseases in the EU and US over time using generalized linear models and the Newey-West estimator for time series data.

Song Recommender
This Python script uses the Spotify Million Playlist Dataset and Word2Vec to suggest songs similar to a searched song.

Friend-Pairing Service
I created a recommendation system for a friend-making service, with a Flask backend and a React frontend. Procedure:
- Simulated 300 profiles with OpenAI
- Embedded each profile with the OpenAI sentence embedding API, so similar people have closer vector embeddings
- Grouped the simulated profiles using K-Nearest Neighbors
- Users can fill out a form to be matched with similar people. Relevant events are also recommended
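The matching step can be sketched as a nearest-neighbor lookup over embedding vectors. This is a simplified sketch under stated assumptions: the three-dimensional vectors and profile names below are made-up toy data standing in for real embeddings, and cosine similarity is assumed as the closeness measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_profiles(query, profiles, k=2):
    """Return the names of the k profiles most similar to the query vector."""
    ranked = sorted(
        profiles.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embeddings; real ones would come from an embedding API
# and have hundreds or thousands of dimensions.
profiles = {
    "hiker": [0.9, 0.1, 0.0],
    "climber": [0.8, 0.2, 0.1],
    "gamer": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    new_user = [1.0, 0.0, 0.0]  # embedding of a newly submitted form (assumed)
    print(nearest_profiles(new_user, profiles, k=2))
```

A brute-force scan like this is fine for a few hundred profiles; larger collections would usually call for an indexed nearest-neighbor search.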

Chicago Crime Report
This report uses a dataset of Chicago crimes from 2002 to 2023. Our report includes:
- Interactive R Shiny apps
- Animated ggplots over time
- Predictive and descriptive analysis using K-means and XGBoost
- Detailed descriptions of our results and ethical considerations

NLP Search Engine and Sentiment Analysis
Using chat data from the American Bar Association’s pro bono chat service, my team and I developed:
- A natural language processing-based search engine using Word2Vec
- Sentiment analysis over chat category, time of day, and length of conversation
This project won the highest award at the American Statistical Association DataFest.

Spotify Playlist Tableau Dashboard
Using Spotify’s API, I retrieved my playlists and their tracks. Interact with the dashboards to explore the artists and genres in my music library.

Streamlit California Housing App
This site uses a dataset of California housing prices. The app includes:
- Visualizations of the dataset
- Price prediction: the user selects an area on a map along with other features, which are used to predict the house price with several regression and ensemble machine learning models, including Ridge Regression, Random Forest, and AdaBoost

GenAI SQL Agent
For a consumer packaged goods company, I created a GenAI-powered strategy report delivered via Microsoft Teams to 75 salespeople.
I used LangChain to query a sales database in plain language to return optimal sales strategies.
