Brendan C. Smith

Lead Data Scientist

Data scientist and engineer at the intersection of machine learning, cloud infrastructure, and scalable data systems. From core platform engineering at AWS and Azure to lead data scientist at Best Egg, I bring full-stack technical depth to every problem.

Education

01/2022 – 12/2024

M.S. in Data Science

University of Texas at Austin

GPA: 3.89 / 4.00

  • Created DNNs using PyTorch for the vision system of a racing simulator. Implemented networks for object detection, keypoint estimation, semantic segmentation, multi-action networks, and reinforcement learning for autonomous driving.
  • Created neural networks for Natural Language Processing (NLP) applications such as Semantic Parsing & Labeling, Sentiment Analysis, and Language Generation.

08/2012 – 05/2016

B.S. in Computer Science

University of Nebraska – Lincoln

GPA: 3.90 / 4.00

  • Honors student in the Jeffrey S. Raikes School of Computer Science and Business Management.
  • Minors in Business Management and Mathematics.
  • Capstone project partnered with local businesses to design and implement software solutions.

Experience

09/2024 – 10/2025

Lead Data Scientist

Best Egg · Remote

Led data science and MLOps initiatives for a consumer fintech lending platform, owning credit risk modeling, ML pipeline infrastructure, and model deployment for the Flexible Rent product line and the core personal loan underwriting model.

  • Built a next-generation customer expansion XGBoost model for the Flexible Rent Platform, enabling Best Egg to extend credit to applicants with thin or subprime bureau profiles by verifying healthy cash flows directly from bank account data — unlocking a segment traditional credit scoring would have declined.
  • Integrated alternative data sources — bank transaction records, third-party payment histories, and bureau tradeline data — and engineered features across rolling windows (30-day, 90-day, 6-month) capturing spending volatility, income stability, deposit frequency, and debt-to-income trends.
  • Constructed lagged and differenced features to surface momentum signals such as improving payment behavior or deteriorating cash reserves. Improved discriminative power by 23% (Gini coefficient) through iterative model tuning and feature selection.
  • Designed end-to-end MLOps pipelines in Metaflow to automate the challenger model lifecycle — feature engineering, hyperparameter tuning, validation, and deployment — for the primary credit risk underwriting model, accelerating experiment-to-production velocity.
  • Mentored a team of 3 other Data Scientists, onboarding them to contribute Metaflow flows to the project.
  • Built an agentic RAG system with tool-use orchestration over internal documentation, including a Snowflake SQL integration for natural-language data queries, reducing time-to-insight for non-technical stakeholders.
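The rolling-window and lagged-feature pattern described above can be sketched in plain Python. This is a minimal illustration only: the deposit values, window sizes, and feature names are hypothetical, and the production pipeline worked over richer bank-transaction schemas and tooling not shown here.

```python
from statistics import mean, stdev

def rolling_features(deposits, window):
    """Rolling mean and volatility over the trailing `window` observations."""
    recent = deposits[-window:]
    return {
        f"mean_{window}": mean(recent),
        f"volatility_{window}": stdev(recent) if len(recent) > 1 else 0.0,
    }

def lagged_diff(series, lag=1):
    """Differenced series: positive values suggest improving momentum,
    negative values suggest deteriorating cash reserves."""
    return [curr - prev for prev, curr in zip(series, series[lag:])]

# Hypothetical monthly deposit totals for one applicant
deposits = [1200, 1250, 1100, 1400, 1500, 1550]
feats = rolling_features(deposits, window=3)
momentum = lagged_diff(deposits)
```

Features like these, computed per applicant across several window lengths, become columns in the XGBoost training matrix.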

04/2024 – 07/2024

Senior Data Engineer – Contract

Burns & McDonnell · Kansas City, MO

Contracted to build the ingestion and transformation layer of a Databricks lakehouse for a major engineering consultancy's enterprise data platform.

  • Built Airflow DAGs to ingest full-load and incremental data from various SQL databases.
  • Transformed records across bronze, silver, and gold stages using dbt within the Databricks medallion architecture.
  • Designed a logging strategy integrated with Azure Monitor / Log Analytics.

01/2024 – 04/2024

Graduate Learning Facilitator – Machine Learning

University of Texas at Austin · Remote

Supported graduate-level ML instruction for UT Austin's MS in AI program while completing the same degree.

  • Served as a teaching assistant for a graduate-level Machine Learning course, supporting instruction, grading, and administration for a cohort of approximately 465 students.

05/2023 – 08/2023

Data Scientist

Propense.ai · Remote

Built data enrichment and recommendation capabilities for an early-stage B2B sales intelligence startup, turning third-party knowledge graphs into actionable prospecting signals.

  • Augmented enterprise knowledge graphs with internal data to cold-start a B2B recommendation system, solving the new-platform data sparsity problem for market insights.
  • Identified patterns in sales gaps by analyzing client and sales history data. Presented actionable insights to clients, securing 5 initial contracts at launch.

02/2023 – 05/2023

Quantitative Investments & Data Science Intern

Nexus Equities · Remote

Applied computer vision to commercial real estate underwriting, automating a manual site assessment step in the investment pipeline.

  • Developed a PyTorch computer vision model to estimate the usable land area of outdoor storage facilities from satellite imagery, accelerating approximately 50 investment decisions.

10/2018 – 01/2021

Software Development Engineer – EC2 Core Platform

Amazon Web Services (AWS) · Seattle, WA

Owned critical EC2 host lifecycle services and built internal data infrastructure for one of AWS's largest and most operationally complex services.

  • Owned and operated two critical services: one to drain customer instances from unhealthy EC2 hosts, and another to proactively recycle older hosts for re-provisioning. Identified and resolved deadlock conditions, resulting in a $300k/month reduction in 'unsellable' rate.
  • Built a centralized data lake on AWS using Python and PySpark, ingesting real-time data from DynamoDB, RDS, S3, and Athena — replacing previously siloed EC2 internal datasets with a unified analytics platform.
  • Deployed AWS Glue ETL pipelines to extract cross-regional data from 300+ internal production accounts and surfaced insights through QuickSight dashboards for TPM stakeholders.
  • Implemented and deployed a capacity forecasting ML model integrated into the proactive host re-provisioning workflow, increasing turnover rate by up to 18% per region.

08/2016 – 10/2018

Software Development Engineer – HDInsight

Microsoft Azure · Redmond, WA

Shipped anomaly detection and platform reliability improvements for Azure's managed Hadoop service, collaborating directly with Apache open-source communities.

  • Served as cross-org liaison for root cause analysis of regressions in Apache Hadoop ecosystem products (Spark, Kafka, etc.), coordinating fixes with upstream Apache engineers.
  • Proposed, designed, and deployed time series anomaly detection ML models on Azure, improving alarm triggers and identifying cluster configurations with high customer impact. Reduced average time-to-detect (TTD) by ~55% and time-to-resolve (TTR) by ~20%.
  • Joined a small v-team to refactor the HDInsight control plane, enabling flexible cluster shapes. Closed the feature gap with competitors while increasing service reliability KPIs and reducing COGS.

09/2015 – 05/2016

Design Studio Software Developer – Decision Science

Hudl · Lincoln, NE

Built A/B testing and model-shipping workflows for ML-powered video highlight detection on a sports technology platform used by high school basketball teams and athletes.

  • Developed a temporal convolutional neural network to automatically detect basketball highlights from user-uploaded video.
  • Ran continuous A/B experiments measuring engagement, bounce rate, and stickiness to iterate on model effectiveness, achieving significant growth in active users (DAU/WAU).

Projects

Sports Bets Recommendation Platform

An end-to-end quantitative trading system that treats sports betting as a financial portfolio problem — applying risk management and optimization techniques from quantitative finance to identify and size value bets across over a dozen sports.

  • Built a plugin-based platform in Python using protocol-driven architecture with command factories and a sport registry, enabling rapid onboarding of new sports and market types.
  • Engineered an XGBoost/LightGBM ensemble pipeline with Optuna HPO, isotonic calibration, and walk-forward backtesting to produce calibrated probability estimates that feed the portfolio optimizer.
  • Implemented mean-variance portfolio optimization with fractional Kelly criterion sizing and correlation-aware multi-market calibration across moneyline, spread, totals, and player prop markets.
  • Integrated ESPN, Kalshi (RSA-authenticated), and The Odds API with Parquet caching, TTL expiration, and parallel async fetching to maintain a continuously updated view of available odds.
  • Automated the full retraining-to-execution loop via the Kalshi API, enforcing position limits, bankroll allocation targets, and cross-sport correlation caps.
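The fractional Kelly sizing mentioned above can be sketched as follows. The probability, odds, Kelly fraction, and stake cap are hypothetical illustration values; the actual optimizer also accounts for cross-market correlation and bankroll allocation targets.

```python
def kelly_fraction(p, decimal_odds):
    """Full Kelly fraction for a binary bet at decimal odds.

    b is net profit per unit staked; f* = (b*p - q) / b.
    """
    b = decimal_odds - 1.0
    q = 1.0 - p
    return (b * p - q) / b

def fractional_kelly(p, decimal_odds, fraction=0.25, cap=0.05):
    """Scale full Kelly down (quarter-Kelly here) and cap the stake;
    negative-edge bets are sized to zero."""
    f = kelly_fraction(p, decimal_odds)
    return max(0.0, min(f * fraction, cap))

# Hypothetical: model estimates 55% win probability at decimal odds of 2.0
stake = fractional_kelly(0.55, 2.0)  # fraction of bankroll to wager
```

Fractional Kelly trades some expected growth for much lower variance, which matters when the model's probability estimates are themselves noisy.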

Portfolio Website

A performant, zero-dependency portfolio site built to present work history and projects with fast load times, clean design, and a maintainable content architecture.

  • Built with Next.js 16, React 19, and TypeScript, styled with Tailwind CSS 4 — statically exported for minimal bundle size and instant page loads.
  • Separated all content into typed TypeScript data files, keeping UI components free of copy and enabling quick updates without touching JSX.
  • Hosted on Vercel with automatic deploys from GitHub, including scroll-aware navigation, fade-in animations, and inline SVG icons with zero additional runtime dependencies.

Panel of Experts

A chatbot that improves LLM answer quality by querying OpenAI multiple times in parallel and synthesizing a consensus response — a simple implementation of self-consistency sampling (Wang et al. 2022).

  • Queries OpenAI in parallel using LangChain's RunnableParallel and abatch, then feeds all expert responses into a consensus chain that reasons across them to produce a more reliable final answer.
  • Demonstrates emergent problem-solving: the consensus moderator can solve reasoning puzzles that none of the individual expert responses answer correctly, by evaluating and combining multiple approaches.
  • Built with LangChain, Chainlit, and OpenAI, with conversation memory, prompt templating, and streaming output.
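The fan-out-and-consensus pattern can be sketched with plain asyncio. Here the LLM call is a stub with canned answers, and the consensus step is simplified to a majority vote; the actual project samples a real model and synthesizes the final answer with a dedicated LangChain consensus chain.

```python
import asyncio
from collections import Counter

async def ask_expert(question, seed):
    """Stand-in for one sampled LLM call; a real version would hit the API
    with a nonzero temperature. Canned answers simulate sampling variance."""
    answers = {0: "42", 1: "42", 2: "41"}
    await asyncio.sleep(0)  # yield control, as a real network call would
    return answers[seed]

async def panel(question, n_experts=3):
    """Fan out n parallel samples, then take the majority as the consensus."""
    responses = await asyncio.gather(
        *(ask_expert(question, seed) for seed in range(n_experts))
    )
    consensus, _ = Counter(responses).most_common(1)[0]
    return consensus

answer = asyncio.run(panel("What is 6 * 7?"))
```

A reasoning moderator, unlike this majority vote, can also pick a minority answer when its derivation is the only sound one.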

RAG Chat Application

A proof-of-concept for document-grounded conversational AI. This standalone version, with a Chainlit user interface, lets users upload PDFs and get accurate, sourced answers.

  • Built with LangChain, Chainlit, ChromaDB, and OpenAI, demonstrating the core retrieval-augmented generation pattern used in enterprise document Q&A systems.
  • Implemented document ingestion with PDFPlumber and recursive text splitting tuned for retrieval quality, indexed into an ephemeral ChromaDB vector store with OpenAI embeddings.
  • Containerized with Docker and a devcontainer configuration, with pre-commit hooks and GitHub Actions CI.
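The recursive splitting idea, coarse separators first and finer ones only for pieces that are still too long, can be sketched in plain Python. This is a simplified mirror of the pattern in LangChain's RecursiveCharacterTextSplitter, which additionally merges small chunks and supports overlap.

```python
def recursive_split(text, max_len=100, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first; recurse with finer
    separators only for pieces that still exceed max_len."""
    if len(text) <= max_len:
        return [text]
    if not separators:
        # Hard fallback: fixed-size character windows.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for part in text.split(sep):
        if len(part) <= max_len:
            chunks.append(part)
        else:
            chunks.extend(recursive_split(part, max_len, rest))
    return [c for c in chunks if c.strip()]
```

Splitting on paragraph boundaries before sentences or words keeps semantically coherent units together, which directly improves retrieval quality.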

Video Subtitling Tool

A practical automation tool that eliminates the manual workflow of video subtitling — extracting audio, transcribing with AI, and burning subtitles back in, all in a single command.

  • Automates the full subtitling pipeline in a single CLI invocation: ffmpeg audio extraction, OpenAI Whisper transcription, SRT generation, and subtitle burn-in.
  • Processes entire directories of video files idempotently, skipping already-subtitled outputs for safe re-runs against growing media libraries.
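The idempotent skip logic can be sketched like this; the `_subtitled` output-naming convention here is illustrative, not necessarily the tool's actual convention.

```python
from pathlib import Path

def pending_videos(media_dir, suffix="_subtitled"):
    """Yield videos that don't yet have a subtitled counterpart,
    so re-runs against a growing library only touch new files."""
    for video in sorted(Path(media_dir).glob("*.mp4")):
        if video.stem.endswith(suffix):
            continue  # this file is itself an output
        output = video.with_name(f"{video.stem}{suffix}.mp4")
        if not output.exists():
            yield video
```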

Diffbot Knowledge Graph Client

An open-source Python client for the Diffbot Knowledge Graph API, built to support enterprise knowledge graph augmentation work at Propense.ai and published on PyPI for the broader developer community.

  • Developed an async Python client for B2B data enrichment via the Diffbot Knowledge Graph API, installable as pip install diffbot-kg.
  • Implemented production-grade resilience patterns including token bucket rate limiting with aiolimiter and exponential backoff retries via tenacity, with Pydantic response models for type-safe API interaction.
  • Tested with pytest using VCR cassettes for deterministic API replay, with CI via GitHub Actions.
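The published client delegates rate limiting to aiolimiter; as a sketch of the underlying token bucket idea in plain Python (rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second up to
    `capacity`, allowing short bursts while bounding the average rate."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)
burst = [bucket.try_acquire() for _ in range(3)]  # capacity allows 2, then deny
```

An async variant would sleep until a token is available instead of returning False, which is essentially what aiolimiter's AsyncLimiter does.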

Flocking Simulation

An interactive boids simulation that demonstrates how complex, lifelike flocking behavior emerges from simple local rules applied to individual agents with no central coordination.

  • Implemented the classic boids algorithm in Python with pygame — separation, alignment, and cohesion rules produce emergent flocking behavior in real time at 60 FPS.
  • Added interactive mouse forces (attract/repel), runtime-adjustable speed and perception radius, and spawning controls for hands-on exploration of parameter effects.
  • Architected with clean dataclass models and strict mypy typing, managed with uv and linted with ruff.
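The three rules can be sketched as a single velocity update. The gain values are illustrative; the actual simulation also applies perception radii, speed limits, and the interactive mouse forces described above.

```python
from dataclasses import dataclass

@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float

def steer(boid, neighbors, cohesion=0.01, separation=0.05, alignment=0.05):
    """One velocity update combining the three classic boids rules."""
    if not neighbors:
        return boid.vx, boid.vy
    n = len(neighbors)
    cx = sum(b.x for b in neighbors) / n - boid.x   # toward flock center
    cy = sum(b.y for b in neighbors) / n - boid.y
    sx = sum(boid.x - b.x for b in neighbors)       # away from crowding
    sy = sum(boid.y - b.y for b in neighbors)
    ax = sum(b.vx for b in neighbors) / n - boid.vx  # match flock velocity
    ay = sum(b.vy for b in neighbors) / n - boid.vy
    return (boid.vx + cohesion * cx + separation * sx + alignment * ax,
            boid.vy + cohesion * cy + separation * sy + alignment * ay)
```

Each boid sees only its neighbors, yet applying this update every frame produces coordinated flocking with no central controller.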

Skills

Languages & Frameworks

Python · SQL · R · PySpark · Java · JavaScript / TypeScript · C# · PyTorch · NumPy · Pandas · Polars · Scikit-learn · Matplotlib · Plotly · Seaborn

Machine Learning & MLOps

Deep Learning · Computer Vision · Forecasting · Recommendation Systems · Hyperparameter Tuning · ML Pipelines · Model Monitoring · Metaflow

Generative AI

LLMs · Agentic AI · Tool Use & Function Calling · Fine-tuning · RAG · Prompt Engineering · Embeddings · LangChain

Data Engineering

ETL Pipelines · Data Warehousing · Spark · Kafka · Hive · Hadoop Stack · Databricks

Cloud & Infrastructure

AWS · Azure · Databricks · Docker · REST APIs · FastAPI · SQL & NoSQL Databases · Microservices

Data Analysis

Statistical Modeling · Causal Inference · Time Series Analysis · Data Visualization

Get in Touch

Interested in working together? I'd love to hear from you.