
Brendan C. Smith
Lead Data Scientist
Data scientist and engineer at the intersection of machine learning, cloud infrastructure, and scalable data systems. From core platform engineering at AWS and Azure to lead data scientist at Best Egg, I bring full-stack technical depth to every problem.
Education
01/2022 – 12/2024
M.S. in Data Science
University of Texas at Austin
GPA: 3.89 / 4.00
- Created DNNs using PyTorch for the vision system of a racing simulator. Implemented networks for object detection, keypoint estimation, semantic segmentation, multi-action networks, and reinforcement learning for autonomous driving.
- Built neural networks for Natural Language Processing (NLP) applications such as semantic parsing and labeling, sentiment analysis, and language generation.
08/2012 – 05/2016
B.S. in Computer Science
University of Nebraska – Lincoln
GPA: 3.90 / 4.00
- Student in the honors program, Jeffrey S. Raikes School of Computer Science and Business Management.
- Minors in Business Management and Mathematics.
- Capstone project involved working with local businesses to implement software solutions.
Experience
09/2024 – 10/2025
Lead Data Scientist
Best Egg · Remote
Led data science and MLOps initiatives for a consumer fintech lending platform, owning credit risk modeling, ML pipeline infrastructure, and model deployment for the Flexible Rent product line and the core personal loan underwriting model.
- Built a next-generation customer expansion XGBoost model for the Flexible Rent Platform, enabling Best Egg to extend credit to applicants with thin or subprime bureau profiles by verifying healthy cash flows directly from bank account data — unlocking a segment traditional credit scoring would have declined.
- Integrated alternative data sources — bank transaction records, third-party payment histories, and bureau tradeline data — and engineered features across rolling windows (30-day, 90-day, 6-month) capturing spending volatility, income stability, deposit frequency, and debt-to-income trends.
- Constructed lagged and differenced features to surface momentum signals such as improving payment behavior or deteriorating cash reserves. Improved discriminative power by 23% (Gini coefficient) through iterative model tuning and feature selection.
- Designed end-to-end MLOps pipelines in Metaflow to automate the challenger model lifecycle — feature engineering, hyperparameter tuning, validation, and deployment — for the primary credit risk underwriting model, accelerating experiment-to-production velocity.
- Onboarded and mentored three other data scientists to contribute Metaflow flows to the project.
- Built an agentic RAG system with tool-use orchestration over internal documentation, including a Snowflake SQL integration for natural-language data queries, reducing time-to-insight for non-technical stakeholders.
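The rolling-window and lagged feature engineering described above can be sketched as follows. This is an illustrative pandas example, not the production code; the column names ('date', 'deposits', 'balance') and window choices are hypothetical stand-ins for the bank-transaction aggregates involved.

```python
import pandas as pd

def engineer_cashflow_features(txns: pd.DataFrame) -> pd.DataFrame:
    """Illustrative rolling and lagged features over daily account
    aggregates. Expects hypothetical columns 'date', 'deposits',
    and 'balance'."""
    df = txns.sort_values("date").set_index("date")
    out = pd.DataFrame(index=df.index)
    # Rolling windows capture the level and volatility of cash flows.
    out["deposits_30d_sum"] = df["deposits"].rolling("30D").sum()
    out["deposits_90d_std"] = df["deposits"].rolling("90D").std()
    # Lagged and differenced features surface momentum signals,
    # e.g. deteriorating cash reserves month over month.
    out["balance_lag_30d"] = df["balance"].shift(30, freq="D").reindex(df.index)
    out["balance_mom_delta"] = df["balance"] - out["balance_lag_30d"]
    return out.reset_index()
```

The same pattern extends to deposit frequency and debt-to-income trends by swapping in other aggregates and window lengths.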
04/2024 – 07/2024
Senior Data Engineer – Contract
Burns & McDonnell · Kansas City, MO
Contracted to build the ingestion and transformation layer of a Databricks lakehouse for a major engineering consultancy's enterprise data platform.
- Built Airflow DAGs to ingest full-load and incremental data from various SQL databases.
- Transformed records across bronze, silver, and gold stages using dbt within the Databricks medallion architecture.
- Designed a logging strategy integrated with Azure Monitor / Log Analytics.
01/2024 – 04/2024
Graduate Learning Facilitator – Machine Learning
University of Texas at Austin · Remote
Supported graduate-level ML instruction for UT Austin's M.S. in AI program while completing a graduate degree at the same institution.
- Supported instruction, grading, and course administration for a graduate-level Machine Learning course with a cohort of roughly 465 students.
05/2023 – 08/2023
Data Scientist
Propense.ai · Remote
Built data enrichment and recommendation capabilities for an early-stage B2B sales intelligence startup, turning third-party knowledge graphs into actionable prospecting signals.
- Augmented enterprise knowledge graphs with internal data to cold-start a B2B recommendation system, solving the new-platform data sparsity problem for market insights.
- Identified patterns in sales gaps by analyzing client and sales history data. Presented actionable insights to clients, securing 5 initial contracts at launch.
02/2023 – 05/2023
Quantitative Investments & Data Science Intern
Nexus Equities · Remote
Applied computer vision to commercial real estate underwriting, automating a manual site assessment step in the investment pipeline.
- Developed a PyTorch computer vision model to estimate the usable land area of outdoor storage facilities from satellite imagery, accelerating approximately 50 investment decisions.
10/2018 – 01/2021
Software Development Engineer – EC2 Core Platform
Amazon Web Services (AWS) · Seattle, WA
Owned critical EC2 host lifecycle services and built internal data infrastructure for one of AWS's largest and most operationally complex services.
- Owned and operated two critical services: one to drain customer instances from unhealthy EC2 hosts, and another to proactively recycle older hosts for re-provisioning. Identified and resolved deadlock conditions, cutting the cost of 'unsellable' capacity by $300k/month.
- Built a centralized data lake on AWS using Python and PySpark, ingesting real-time data from DynamoDB, RDS, S3, and Athena — replacing previously siloed EC2 internal datasets with a unified analytics platform.
- Deployed AWS Glue ETL pipelines to extract cross-regional data from 300+ internal production accounts and surfaced insights through QuickSight dashboards for TPM stakeholders.
- Implemented and deployed a capacity forecasting ML model integrated into the proactive host re-provisioning workflow, increasing turnover rate by up to 18% per region.
08/2016 – 10/2018
Software Development Engineer – HDInsight
Microsoft Azure · Redmond, WA
Shipped anomaly detection and platform reliability improvements for Azure's managed Hadoop service, collaborating directly with Apache open-source communities.
- Served as cross-org liaison for root cause analysis of regressions in Apache Hadoop ecosystem products (Spark, Kafka, etc.), coordinating fixes with upstream Apache engineers.
- Proposed, designed, and deployed time series anomaly detection ML models on Azure, improving alarm triggers and identifying cluster configurations with high customer impact. Reduced average time-to-detect (TTD) by ~55% and time-to-resolve (TTR) by ~20%.
- Joined a small v-team to refactor the HDInsight control plane, enabling flexible cluster shapes. Closed the feature gap with competitors while increasing service reliability KPIs and reducing COGS.
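The time series anomaly detection idea above can be illustrated with a minimal rolling z-score detector. This toy sketch stands in for the more sophisticated models deployed on Azure; the window and threshold values are arbitrary.

```python
import math
from collections import deque

def zscore_anomalies(series: list[float], window: int = 30,
                     threshold: float = 3.0) -> list[int]:
    """Flag indices whose value deviates from the trailing window's
    mean by more than `threshold` standard deviations."""
    anomalies: list[int] = []
    buf: deque[float] = deque(maxlen=window)
    for i, x in enumerate(series):
        if len(buf) == window:
            mean = sum(buf) / window
            std = math.sqrt(sum((v - mean) ** 2 for v in buf) / window)
            # A large deviation from recent behavior triggers an alarm.
            if std > 0 and abs(x - mean) / std > threshold:
                anomalies.append(i)
        buf.append(x)
    return anomalies
```

Lowering detection latency (TTD) in such a scheme comes from shrinking the window and tightening the threshold, at the cost of more false alarms.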
09/2015 – 05/2016
Design Studio Software Developer – Decision Science
Hudl · Lincoln, NE
Built A/B testing and model shipping workflows for ML-powered video highlight detection on a sports technology platform used by high school basketball teams and athletes.
- Developed a temporal convolutional neural network to automatically detect basketball highlights from user-uploaded video.
- Ran continuous A/B experiments measuring engagement, bounce rate, and stickiness to iterate on model effectiveness, achieving significant growth in active users (DAU/WAU).
Projects
Sports Bets Recommendation Platform
An end-to-end quantitative trading system that treats sports betting as a financial portfolio problem — applying risk management and optimization techniques from quantitative finance to identify and size value bets across over a dozen sports.
- Built a plugin-based platform in Python using protocol-driven architecture with command factories and a sport registry, enabling rapid onboarding of new sports and market types.
- Engineered an XGBoost/LightGBM ensemble pipeline with Optuna HPO, isotonic calibration, and walk-forward backtesting to produce calibrated probability estimates that feed the portfolio optimizer.
- Implemented mean-variance portfolio optimization with fractional Kelly criterion sizing and correlation-aware multi-market calibration across moneyline, spread, totals, and player prop markets.
- Integrated ESPN, Kalshi (RSA-authenticated), and The Odds API with Parquet caching, TTL expiration, and parallel async fetching to maintain a continuously updated view of available odds.
- Automated the full retraining-to-execution loop via the Kalshi API, enforcing position limits, bankroll allocation targets, and cross-sport correlation caps.
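The fractional Kelly sizing mentioned above can be sketched for a single binary bet. This is a minimal illustration of the criterion, not the portfolio optimizer itself (which also handles mean-variance tradeoffs and cross-market correlation); the default fraction is an arbitrary example.

```python
def kelly_fraction(p: float, decimal_odds: float,
                   fraction: float = 0.25) -> float:
    """Fractional Kelly stake (share of bankroll) for a binary bet.

    p            -- calibrated win probability from the model
    decimal_odds -- payout per unit staked, including the stake
    fraction     -- Kelly multiplier; fractional Kelly damps variance
    """
    b = decimal_odds - 1.0        # net odds received on a win
    edge = p * b - (1.0 - p)      # expected profit per unit staked
    if edge <= 0:
        return 0.0                # no value: do not bet
    return fraction * edge / b
```

Calibrated probabilities matter here: an overconfident model inflates `edge` and systematically over-sizes positions, which is why the pipeline applies isotonic calibration before sizing.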
Portfolio Website
A performant, zero-dependency portfolio site built to present work history and projects with fast load times, clean design, and a maintainable content architecture.
- Built with Next.js 16, React 19, and TypeScript, styled with Tailwind CSS 4 — statically exported for minimal bundle size and instant page loads.
- Separated all content into typed TypeScript data files, keeping UI components free of copy and enabling quick updates without touching JSX.
- Hosted on Vercel with automatic deploys from GitHub, including scroll-aware navigation, fade-in animations, and inline SVG icons with zero additional runtime dependencies.
Panel of Experts
A chatbot that improves LLM answer quality by querying OpenAI multiple times in parallel and synthesizing a consensus response — a simple implementation of self-consistency sampling (Wang et al. 2022).
- Queries OpenAI in parallel using LangChain's RunnableParallel and abatch, then feeds all expert responses into a consensus chain that reasons across them to produce a more reliable final answer.
- Demonstrates emergent problem-solving: the consensus moderator can solve reasoning puzzles that none of the individual expert responses answer correctly, by evaluating and combining multiple approaches.
- Built with LangChain, Chainlit, and OpenAI, with conversation memory, prompt templating, and streaming output.
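The self-consistency idea behind the project can be reduced to a toy majority vote over independently sampled answers. The actual system goes further, feeding all expert responses into a moderator chain that reasons across them rather than just counting; this sketch only shows the baseline voting step from Wang et al. 2022.

```python
from collections import Counter

def consensus_answer(expert_answers: list[str]) -> str:
    """Baseline self-consistency: sample several independent answers
    and keep the most common one after light normalization."""
    normalized = [a.strip().lower() for a in expert_answers]
    winner, _ = Counter(normalized).most_common(1)[0]
    return winner
```

A reasoning moderator improves on this vote because it can pick a minority answer whose derivation is sound, which is how the consensus can beat every individual expert.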
RAG Chat Application
A proof-of-concept for document-grounded conversational AI. This standalone version, with a Chainlit user interface, lets users upload PDFs and get accurate, sourced answers.
- Built with LangChain, Chainlit, ChromaDB, and OpenAI, demonstrating the core retrieval-augmented generation pattern used in enterprise document Q&A systems.
- Implemented document ingestion with PDFPlumber and recursive text splitting tuned for retrieval quality, indexed into an ephemeral ChromaDB vector store with OpenAI embeddings.
- Containerized with Docker and a devcontainer configuration, with pre-commit hooks and GitHub Actions CI.
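The retrieval-tuned text splitting step can be illustrated with a minimal sliding-window chunker. This is a simplified stand-in for LangChain's recursive splitter (which also respects paragraph and sentence boundaries); the size and overlap values are illustrative.

```python
def split_text(text: str, chunk_size: int = 500,
               overlap: int = 50) -> list[str]:
    """Fixed-size windows with overlap, so content near a hard
    boundary still appears intact in at least one chunk."""
    if len(text) <= chunk_size:
        return [text]
    chunks: list[str] = []
    start, step = 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Tuning chunk size and overlap trades retrieval precision (small chunks) against answer context (large chunks), which is the knob the bullet above refers to.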
Video Subtitling Tool
A practical automation tool that eliminates the manual workflow of video subtitling — extracting audio, transcribing with AI, and burning subtitles back in, all in a single command.
- Automates the full subtitling pipeline in a single CLI invocation: ffmpeg audio extraction, OpenAI Whisper transcription, SRT generation, and subtitle burn-in.
- Processes entire directories of video files idempotently, skipping already-subtitled outputs for safe re-runs against growing media libraries.
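The idempotent re-run behavior can be sketched as a simple existence check before processing each video. The output naming convention below is hypothetical, chosen only to illustrate the skip logic.

```python
from pathlib import Path

def needs_subtitling(video: Path, out_dir: Path) -> bool:
    """Idempotency check: skip videos whose subtitled output already
    exists, so re-runs over a growing library only touch new files."""
    output = out_dir / f"{video.stem}.subtitled{video.suffix}"
    return not output.exists()
```

The pipeline can then run `ffmpeg` extraction and Whisper transcription only for the videos this predicate selects, making repeated invocations safe.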
Diffbot Knowledge Graph Client
An open-source Python client for the Diffbot Knowledge Graph API, built to support enterprise knowledge graph augmentation work at Propense.ai and published on PyPI for the broader developer community.
- Developed an async Python client for B2B data enrichment via the Diffbot Knowledge Graph API, installable as pip install diffbot-kg.
- Implemented production-grade resilience patterns including token bucket rate limiting with aiolimiter and exponential backoff retries via tenacity, with Pydantic response models for type-safe API interaction.
- Tested with pytest using VCR cassettes for deterministic API replay, with CI via GitHub Actions.
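The token bucket idea the client implements with aiolimiter can be illustrated with a minimal synchronous version. This sketch is not the library's implementation, just the underlying mechanism: a fixed capacity of tokens refills at a steady rate, and each request consumes one.

```python
import time

class TokenBucket:
    """Minimal synchronous token bucket rate limiter."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Pairing this with exponential backoff (as the client does via tenacity) covers the two failure modes separately: the bucket prevents self-inflicted rate-limit errors, while backoff absorbs transient server-side ones.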
Flocking Simulation
An interactive boids simulation that demonstrates how complex, lifelike flocking behavior emerges from simple local rules applied to individual agents with no central coordination.
- Implemented the classic boids algorithm in Python with pygame — separation, alignment, and cohesion rules produce emergent flocking behavior in real time at 60 FPS.
- Added interactive mouse forces (attract/repel), runtime-adjustable speed and perception radius, and spawning controls for hands-on exploration of parameter effects.
- Architected with clean dataclass models and strict mypy typing, managed with uv and linted with ruff.